Collection-based compound noun segmentation for Korean information retrieval

Kang, In-Su; Na, Seung-Hoon; Lee, Jong-Hyeok

doi:10.1007/s10791-006-9007-3

Scholarworks@UNIST

UNIST Library

Related Researcher

나승훈

Na, Seung-Hoon: Natural Language Processing Lab

Full metadata record

DC Field	Value	Language
dc.citation.endPage	631	-
dc.citation.number	5	-
dc.citation.startPage	613	-
dc.citation.title	INFORMATION RETRIEVAL	-
dc.citation.volume	9	-
dc.contributor.author	Kang, In-Su	-
dc.contributor.author	Na, Seung-Hoon	-
dc.contributor.author	Lee, Jong-Hyeok	-
dc.date.accessioned	2025-04-25T15:14:07Z	-
dc.date.available	2025-04-25T15:14:07Z	-
dc.date.created	2025-04-08	-
dc.date.issued	2006-11	-
dc.description.abstract	Compound noun segmentation is a key first step in language processing for Korean. Thus far, most approaches require some form of human supervision, such as pre-existing dictionaries, segmented compound nouns, or heuristic rules. As a result, they suffer from the unknown word problem, which can be overcome by unsupervised approaches. However, previous unsupervised methods normally do not consider all possible segmentation candidates, and/or rely on character-based segmentation clues such as bi-grams or all-length n-grams. They are therefore prone to falling into a local solution. To overcome this problem, this paper proposes an unsupervised segmentation algorithm that searches for the most likely segmentation result among all possible segmentation candidates using a word-based segmentation context. As word-based segmentation clues, a dictionary is automatically generated from a corpus. Experiments using three test collections show that our segmentation algorithm can be successfully applied to Korean information retrieval, improving on a dictionary-based longest-matching algorithm.	-
dc.identifier.bibliographicCitation	INFORMATION RETRIEVAL, v.9, no.5, pp.613 - 631	-
dc.identifier.doi	10.1007/s10791-006-9007-3	-
dc.identifier.issn	1386-4564	-
dc.identifier.scopusid	2-s2.0-33748808080	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/86845	-
dc.identifier.wosid	000240551900005	-
dc.language	English	-
dc.publisher	SPRINGER	-
dc.title	Collection-based compound noun segmentation for Korean information retrieval	-
dc.type	Article	-
dc.description.isOpenAccess	FALSE	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalResearchArea	Computer Science	-
dc.type.docType	Article	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	Korean information retrieval	-
dc.subject.keywordAuthor	compound noun segmentation	-
dc.subject.keywordAuthor	unsupervised method	-
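
The contrast drawn in the abstract, between scoring every segmentation candidate against a corpus-derived dictionary and greedy dictionary-based longest matching, can be illustrated with a small sketch. This is not the authors' algorithm: the record does not give its details, so the unsmoothed unigram scoring, the dynamic-programming search, and the names `build_dictionary`, `segment`, and `longest_match` are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_dictionary(corpus_tokens):
    """Count token frequencies; a stand-in for a corpus-derived dictionary."""
    return Counter(corpus_tokens)


def segment(compound, counts):
    """Return the highest-scoring split of `compound` into dictionary words.

    Dynamic programming covers every possible segmentation. Each word is
    scored with an unsmoothed unigram log-probability (an assumed model).
    Returns None when no complete split exists.
    """
    total = sum(counts.values())
    n = len(compound)
    # best[i] holds (log-probability, words) for the best split of compound[:i]
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = compound[start:end]
            if best[start] is None or word not in counts:
                continue
            score = best[start][0] + math.log(counts[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n] else None


def longest_match(compound, counts):
    """Greedy left-to-right longest-matching baseline without backtracking."""
    result, start = [], 0
    while start < len(compound):
        for end in range(len(compound), start, -1):
            if compound[start:end] in counts:
                result.append(compound[start:end])
                start = end
                break
        else:
            return None  # no dictionary word starts at this position
    return result


# Toy dictionary: greedy matching commits to "abc" and then gets stuck on "d".
toy = build_dictionary(["ab", "ab", "ab", "abc", "cd", "cd"])
toy_best = segment("abcd", toy)
toy_greedy = longest_match("abcd", toy)

# Korean example: 정보 (information) + 검색 (retrieval) + 시스템 (system).
korean = build_dictionary(["정보", "검색", "시스템", "정보", "검색"])
korean_best = segment("정보검색시스템", korean)
```

In the toy case the exhaustive search recovers `["ab", "cd"]`, while the greedy baseline returns `None` because it cannot undo its first choice. A unigram score of this kind tends to favor splits with fewer words, which is one reason a practical system would need a richer segmentation context than this sketch uses.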

ScholarWorks@UNIST was established as an OAK Project for the National Library of Korea.