Related Researcher

나승훈

Na, Seung-Hoon
Natural Language Processing Lab

Full metadata record

DC Field Value Language
dc.citation.number 3 -
dc.citation.startPage 37 -
dc.citation.title ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING -
dc.citation.volume 19 -
dc.contributor.author Kim, H. -
dc.contributor.author Na, Seung-Hoon -
dc.date.accessioned 2025-04-25T15:11:46Z -
dc.date.available 2025-04-25T15:11:46Z -
dc.date.created 2025-04-08 -
dc.date.issued 2020-05 -
dc.description.abstract There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing works on QE are based on supervised approaches using quality-annotated training data. However, QE training data quality scores readily become imbalanced or skewed: QE data are mostly composed of high translation quality sentence pairs but the data lack low translation quality sentence pairs. The use of imbalanced data with an induced quality estimator tends to produce biased translation quality scores with “high” translation quality scores assigned even to poorly translated sentences. To address the data imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing to construct more balanced QE training data by inserting greater uniformness to training data. The proposed uniformly interpolated balancing procedure is based on the preparation of two different types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain default skewed data in a naive manner without considering the imbalance by manually annotating qualities on MT outputs. Second, we obtain near-uniform data in a selective manner by manually annotating a subset only, which is selected from the automatically quality-estimated sentence pairs. Finally, we create uniformly interpolated balanced data by combining these two types of data, where one half originates from the default skewed data and the other half originates from the near-uniform data. We expect that uniformly interpolated balancing reflects the intrinsic skewness of the true quality distribution and manages the imbalance problem. Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing leads to robustness on both skewed and uniformly distributed quality test sets when compared to the test sets of other non-balanced datasets. © 2020 Association for Computing Machinery. -
dc.identifier.bibliographicCitation ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, v.19, no.3, pp.37 -
dc.identifier.doi 10.1145/3365916 -
dc.identifier.issn 2375-4699 -
dc.identifier.scopusid 2-s2.0-85078253410 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86794 -
dc.language English -
dc.publisher Association for Computing Machinery -
dc.title Uniformly interpolated balancing for robust prediction in translation quality estimation: A case study of English-Korean translation -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.type.docType Article -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor Imbalanced data -
dc.subject.keywordAuthor Predictor-Estimator -
dc.subject.keywordAuthor Translation quality estimation -
dc.subject.keywordAuthor Uniformly interpolated balancing -
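
The data-construction procedure summarized in the abstract (one half default skewed data, one half near-uniform data) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-width binning used to pick near-uniform candidates, and the assumption that estimated scores lie in [0, 1] are all hypothetical choices made for illustration. The abstract only states that the near-uniform subset is selected from automatically quality-estimated sentence pairs.

```python
import random
from collections import defaultdict


def select_near_uniform(pairs, est_scores, n_bins, per_bin, seed=0):
    """Pick up to `per_bin` pairs from each estimated-quality bin.

    Assumption (not from the record): est_scores lie in [0, 1] and
    equal-width bins are used to approximate a uniform distribution.
    The selected pairs would then be sent for manual annotation.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for pair, score in zip(pairs, est_scores):
        idx = min(int(score * n_bins), n_bins - 1)  # clamp score == 1.0
        bins[idx].append(pair)
    selected = []
    for idx in range(n_bins):
        bucket = bins.get(idx, [])
        rng.shuffle(bucket)
        selected.extend(bucket[:per_bin])
    return selected


def interpolate_balanced(skewed, near_uniform, total, seed=0):
    """Combine the two annotated sets: half skewed, half near-uniform.

    Raises ValueError if either set is smaller than its requested share.
    """
    rng = random.Random(seed)
    half = total // 2
    mixed = rng.sample(skewed, half) + rng.sample(near_uniform, total - half)
    rng.shuffle(mixed)
    return mixed
```

Under these assumptions, `select_near_uniform` would run on a large pool of automatically scored MT outputs before annotation, and `interpolate_balanced` would run afterwards on the two manually annotated sets to build the final training data.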

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.