Related Researcher

나승훈

Na, Seung-Hoon
Natural Language Processing Lab

Full metadata record

DC Field Value Language
dc.citation.number 2 -
dc.citation.startPage 8 -
dc.citation.title ACM TRANSACTIONS ON INFORMATION SYSTEMS -
dc.citation.volume 33 -
dc.contributor.author Na, Seung-Hoon -
dc.date.accessioned 2025-04-25T15:13:19Z -
dc.date.available 2025-04-25T15:13:19Z -
dc.date.created 2025-04-08 -
dc.date.issued 2015-02 -
dc.description.abstract The standard approach for term frequency normalization is based only on the document length. However, it does not distinguish the verbosity from the scope, these being the two main factors determining the document length. Because the verbosity and scope have largely different effects on the increase in term frequency, the standard approach can easily suffer from insufficient or excessive penalization depending on the specific type of long document. To overcome these problems, this article proposes two-stage normalization by performing verbosity and scope normalization separately, and by employing different penalization functions. In verbosity normalization, each document is prenormalized by dividing the term frequency by the verbosity of the document. In scope normalization, an existing retrieval model is applied in a straightforward manner to the prenormalized document, finally leading us to formulate our proposed verbosity normalized (VN) retrieval model. Experimental results carried out on standard TREC collections demonstrate that the VN model leads to marginal but statistically significant improvements over standard retrieval models. -
dc.identifier.bibliographicCitation ACM TRANSACTIONS ON INFORMATION SYSTEMS, v.33, no.2, pp.8 -
dc.identifier.doi 10.1145/2699669 -
dc.identifier.issn 1046-8188 -
dc.identifier.scopusid 2-s2.0-84923328913 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86826 -
dc.identifier.wosid 000351440200004 -
dc.language English -
dc.publisher ASSOC COMPUTING MACHINERY -
dc.title Two-Stage Document Length Normalization for Information Retrieval -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.relation.journalWebOfScienceCategory Computer Science, Information Systems -
dc.relation.journalResearchArea Computer Science -
dc.type.docType Article -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor document length normalization -
dc.subject.keywordAuthor Algorithms -
dc.subject.keywordAuthor Experimentation -
dc.subject.keywordAuthor Performance -
dc.subject.keywordAuthor Theory -
dc.subject.keywordAuthor Verbosity normalization -
dc.subject.keywordAuthor retrieval heuristics -
dc.subject.keywordAuthor term frequency -
dc.subject.keywordAuthor scope normalization -
dc.subject.keywordPlus MODELS -
dc.subject.keywordPlus TERM FREQUENCY NORMALIZATION -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.