Related Researcher

나승훈

Na, Seung-Hoon
Natural Language Processing Lab

Full metadata record

DC Field Value Language
dc.citation.endPage 631 -
dc.citation.number 5 -
dc.citation.startPage 613 -
dc.citation.title INFORMATION RETRIEVAL -
dc.citation.volume 9 -
dc.contributor.author Kang, In-Su -
dc.contributor.author Na, Seung-Hoon -
dc.contributor.author Lee, Jong-Hyeok -
dc.date.accessioned 2025-04-25T15:14:07Z -
dc.date.available 2025-04-25T15:14:07Z -
dc.date.created 2025-04-08 -
dc.date.issued 2006-11 -
dc.description.abstract Compound noun segmentation is a key first step in language processing for Korean. Thus far, most approaches require some form of human supervision, such as pre-existing dictionaries, segmented compound nouns, or heuristic rules. As a result, they suffer from the unknown word problem, which can be overcome by unsupervised approaches. However, previous unsupervised methods normally do not consider all possible segmentation candidates, and/or rely on character-based segmentation clues such as bi-grams or all-length n-grams. So, they are prone to falling into a local solution. To overcome the problem, this paper proposes an unsupervised segmentation algorithm that searches the most likely segmentation result from all possible segmentation candidates using a word-based segmentation context. As word-based segmentation clues, a dictionary is automatically generated from a corpus. Experiments using three test collections show that our segmentation algorithm is successfully applied to Korean information retrieval, improving a dictionary-based longest-matching algorithm. -
dc.identifier.bibliographicCitation INFORMATION RETRIEVAL, v.9, no.5, pp.613 - 631 -
dc.identifier.doi 10.1007/s10791-006-9007-3 -
dc.identifier.issn 1386-4564 -
dc.identifier.scopusid 2-s2.0-33748808080 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86845 -
dc.identifier.wosid 000240551900005 -
dc.language English -
dc.publisher SPRINGER -
dc.title Collection-based compound noun segmentation for Korean information retrieval -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.relation.journalWebOfScienceCategory Computer Science, Information Systems -
dc.relation.journalResearchArea Computer Science -
dc.type.docType Article -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor Korean information retrieval -
dc.subject.keywordAuthor compound noun segmentation -
dc.subject.keywordAuthor unsupervised method -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.