Two-Stage Document Length Normalization for Information Retrieval

Na, Seung-Hoon

doi:10.1145/2699669

Scholarworks@UNIST

Related Researcher

나승훈

Na, Seung-Hoon: Natural Language Processing Lab

Full metadata record

DC Field	Value	Language
dc.citation.number	2	-
dc.citation.startPage	8	-
dc.citation.title	ACM TRANSACTIONS ON INFORMATION SYSTEMS	-
dc.citation.volume	33	-
dc.contributor.author	Na, Seung-Hoon	-
dc.date.accessioned	2025-04-25T15:13:19Z	-
dc.date.available	2025-04-25T15:13:19Z	-
dc.date.created	2025-04-08	-
dc.date.issued	2015-02	-
dc.description.abstract	The standard approach to term frequency normalization is based only on document length. However, it does not distinguish verbosity from scope, the two main factors that determine document length. Because verbosity and scope have largely different effects on the increase in term frequency, the standard approach can easily suffer from insufficient or excessive penalization, depending on the specific type of long document. To overcome these problems, this article proposes two-stage normalization, which performs verbosity and scope normalization separately and employs a different penalization function for each. In verbosity normalization, each document is prenormalized by dividing the term frequency by the verbosity of the document. In scope normalization, an existing retrieval model is applied in a straightforward manner to the prenormalized document, which leads to the proposed verbosity normalized (VN) retrieval model. Experiments on standard TREC collections demonstrate that the VN model yields marginal but statistically significant improvements over standard retrieval models.	-
dc.identifier.bibliographicCitation	ACM TRANSACTIONS ON INFORMATION SYSTEMS, v.33, no.2, pp.8	-
dc.identifier.doi	10.1145/2699669	-
dc.identifier.issn	1046-8188	-
dc.identifier.scopusid	2-s2.0-84923328913	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/86826	-
dc.identifier.wosid	000351440200004	-
dc.language	English	-
dc.publisher	ASSOC COMPUTING MACHINERY	-
dc.title	Two-Stage Document Length Normalization for Information Retrieval	-
dc.type	Article	-
dc.description.isOpenAccess	FALSE	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalResearchArea	Computer Science	-
dc.type.docType	Article	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	document length normalization	-
dc.subject.keywordAuthor	Algorithms	-
dc.subject.keywordAuthor	Experimentation	-
dc.subject.keywordAuthor	Performance	-
dc.subject.keywordAuthor	Theory	-
dc.subject.keywordAuthor	Verbosity normalization	-
dc.subject.keywordAuthor	retrieval heuristics	-
dc.subject.keywordAuthor	term frequency	-
dc.subject.keywordAuthor	scope normalization	-
dc.subject.keywordPlus	MODELS	-
dc.subject.keywordPlus	TERM FREQUENCY NORMALIZATION	-
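
The two stages described in the abstract can be illustrated with a minimal sketch. This record does not define verbosity or name the base retrieval model, so both are assumptions here: verbosity is taken to be the average term frequency (document length divided by the number of distinct terms), and BM25 stands in for the "existing retrieval model". The function names (verbosity, prenormalize, bm25_score, vn_scores) and the parameter defaults are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def verbosity(tokens):
    """ASSUMPTION: verbosity = average term frequency = length / distinct terms."""
    distinct = len(set(tokens))
    return len(tokens) / distinct if distinct else 1.0


def prenormalize(tokens):
    """Stage 1 (verbosity normalization): divide each raw tf by the verbosity."""
    v = verbosity(tokens)
    return {term: tf / v for term, tf in Counter(tokens).items()}


def bm25_score(query, doc_tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Stage 2 (scope normalization): BM25 applied to the prenormalized document.

    ASSUMPTION: BM25 is only a stand-in for the unnamed base model.
    """
    score = 0.0
    for term in query:
        tf = doc_tf.get(term, 0.0)
        if tf == 0.0:
            continue
        idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        length_penalty = k1 * (1.0 - b + b * doc_len / avg_len)
        score += idf * tf * (k1 + 1.0) / (tf + length_penalty)
    return score


def vn_scores(query, docs):
    """Score every tokenized document with the two stages applied in order."""
    pre = [prenormalize(d) for d in docs]
    # Length of a prenormalized document is the sum of its prenormalized tfs.
    lengths = [sum(p.values()) for p in pre]
    avg_len = sum(lengths) / len(lengths)
    df = Counter(term for d in docs for term in set(d))
    return [
        bm25_score(query, p, n, avg_len, df, len(docs))
        for p, n in zip(pre, lengths)
    ]


if __name__ == "__main__":
    concise = ["x", "y"]
    verbose = ["x", "x", "y", "y"]  # same content, every term repeated twice
    print(vn_scores(["x"], [concise, verbose]))
```

Under these assumptions, a document that merely repeats its content receives the same prenormalized term frequencies as its concise counterpart, so the two documents in the usage example get equal scores. Only the scope (here, the number of distinct terms) is left for the base model's length normalization to penalize.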
