Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.citation.endPage | 631 | - |
| dc.citation.number | 5 | - |
| dc.citation.startPage | 613 | - |
| dc.citation.title | INFORMATION RETRIEVAL | - |
| dc.citation.volume | 9 | - |
| dc.contributor.author | Kang, In-Su | - |
| dc.contributor.author | Na, Seung-Hoon | - |
| dc.contributor.author | Lee, Jong-Hyeok | - |
| dc.date.accessioned | 2025-04-25T15:14:07Z | - |
| dc.date.available | 2025-04-25T15:14:07Z | - |
| dc.date.created | 2025-04-08 | - |
| dc.date.issued | 2006-11 | - |
| dc.description.abstract | Compound noun segmentation is a key first step in language processing for Korean. Thus far, most approaches require some form of human supervision, such as pre-existing dictionaries, segmented compound nouns, or heuristic rules. As a result, they suffer from the unknown word problem, which can be overcome by unsupervised approaches. However, previous unsupervised methods normally do not consider all possible segmentation candidates, and/or rely on character-based segmentation clues such as bi-grams or all-length n-grams. So, they are prone to falling into a local solution. To overcome the problem, this paper proposes an unsupervised segmentation algorithm that searches the most likely segmentation result from all possible segmentation candidates using a word-based segmentation context. As word-based segmentation clues, a dictionary is automatically generated from a corpus. Experiments using three test collections show that our segmentation algorithm is successfully applied to Korean information retrieval, improving a dictionary-based longest-matching algorithm. | - |
| dc.identifier.bibliographicCitation | INFORMATION RETRIEVAL, v.9, no.5, pp.613 - 631 | - |
| dc.identifier.doi | 10.1007/s10791-006-9007-3 | - |
| dc.identifier.issn | 1386-4564 | - |
| dc.identifier.scopusid | 2-s2.0-33748808080 | - |
| dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/86845 | - |
| dc.identifier.wosid | 000240551900005 | - |
| dc.language | English | - |
| dc.publisher | SPRINGER | - |
| dc.title | Collection-based compound noun segmentation for Korean information retrieval | - |
| dc.type | Article | - |
| dc.description.isOpenAccess | FALSE | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.type.docType | Article | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordAuthor | Korean information retrieval | - |
| dc.subject.keywordAuthor | compound noun segmentation | - |
| dc.subject.keywordAuthor | unsupervised method | - |
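
The abstract contrasts a search over all segmentation candidates with a dictionary-based longest-matching baseline. This record does not give the paper's actual scoring model, so the following is only a minimal illustrative sketch, assuming a unigram word-probability dictionary built from corpus token counts and a dynamic-programming search. The function names (`build_dictionary`, `segment`, `longest_match`) and the toy ASCII data are hypothetical and do not come from the paper, and the word-based segmentation context the abstract mentions is not modeled here.

```python
import math
from collections import Counter


def build_dictionary(corpus_tokens):
    """Turn raw corpus tokens into unigram probabilities.

    Assumption: this stands in for the automatically generated dictionary.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def segment(compound, word_prob):
    """Return the best-scoring split of `compound` into dictionary words.

    Dynamic programming covers every candidate segmentation whose parts are
    all dictionary words; the score is the sum of log unigram probabilities.
    Returns None when no complete segmentation exists.
    """
    n = len(compound)
    # best[i] = (best score for compound[:i], start index of its last word)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            word = compound[start:end]
            if word in word_prob and best[start][0] > -math.inf:
                score = best[start][0] + math.log(word_prob[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the back-pointers from the end to recover the words.
    parts, end = [], n
    while end > 0:
        start = best[end][1]
        parts.append(compound[start:end])
        end = start
    return parts[::-1]


def longest_match(compound, word_prob):
    """Greedy left-to-right longest-matching baseline (no backtracking)."""
    parts, i = [], 0
    while i < len(compound):
        for j in range(len(compound), i, -1):
            if compound[i:j] in word_prob:
                parts.append(compound[i:j])
                i = j
                break
        else:
            return None  # no dictionary word starts at position i
    return parts


# Toy data, purely illustrative.
toy_dict = build_dictionary(["the", "the", "them", "men", "me"])
print(segment("themen", toy_dict))
print(longest_match("themen", toy_dict))
```

On this toy input the greedy baseline commits to `them` and then cannot cover the remaining characters, while the exhaustive search still finds a complete split. This is the kind of local-solution failure the abstract attributes to methods that do not consider all candidates.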