Uniformly interpolated balancing for robust prediction in translation quality estimation: A case study of English-Korean translation

Kim, H.; Na, Seung-Hoon

doi:10.1145/3365916

Related Researcher

나승훈

Na, Seung-Hoon: Natural Language Processing Lab

Full metadata record

DC Field	Value	Language
dc.citation.number	3	-
dc.citation.startPage	37	-
dc.citation.title	ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING	-
dc.citation.volume	19	-
dc.contributor.author	Kim, H.	-
dc.contributor.author	Na, Seung-Hoon	-
dc.date.accessioned	2025-04-25T15:11:46Z	-
dc.date.available	2025-04-25T15:11:46Z	-
dc.date.created	2025-04-08	-
dc.date.issued	2020-05	-
dc.description.abstract	There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing work on QE is based on supervised approaches using quality-annotated training data. However, the quality scores in QE training data readily become imbalanced or skewed: QE data consist mostly of sentence pairs with high translation quality and lack sentence pairs with low translation quality. A quality estimator induced from imbalanced data tends to produce biased scores, assigning “high” translation quality scores even to poorly translated sentences. To address the data imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing, which constructs more balanced QE training data by introducing greater uniformity into the training data. The procedure is based on the preparation of two different types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain default skewed data in a naive manner, without considering the imbalance, by manually annotating the quality of MT outputs. Second, we obtain near-uniform data in a selective manner by manually annotating only a subset selected from the automatically quality-estimated sentence pairs. Finally, we create uniformly interpolated balanced data by combining these two types of data, where one half originates from the default skewed data and the other half originates from the near-uniform data. We expect that uniformly interpolated balancing reflects the intrinsic skewness of the true quality distribution and manages the imbalance problem. Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing leads to robustness on both skewed and uniformly distributed quality test sets when compared with other, non-balanced datasets. © 2020 Association for Computing Machinery.	-
dc.identifier.bibliographicCitation	ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, v.19, no.3, pp.37	-
dc.identifier.doi	10.1145/3365916	-
dc.identifier.issn	2375-4699	-
dc.identifier.scopusid	2-s2.0-85078253410	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/86794	-
dc.identifier.wosid	000582616600004	-
dc.language	English	-
dc.publisher	Association for Computing Machinery	-
dc.title	Uniformly interpolated balancing for robust prediction in translation quality estimation: A case study of English-Korean translation	-
dc.type	Article	-
dc.description.isOpenAccess	FALSE	-
dc.type.docType	Article	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	Imbalanced data	-
dc.subject.keywordAuthor	Predictor-Estimator	-
dc.subject.keywordAuthor	Translation quality estimation	-
dc.subject.keywordAuthor	Uniformly interpolated balancing	-
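
The data construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that automatically estimated scores lie in [0, 1], that "near-uniform" selection is approximated by drawing the same number of sentence pairs from equal-width score bins, and that the half-and-half combination is done by random sampling. The function names `select_near_uniform` and `interpolate_balanced` and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def select_near_uniform(candidates, n_bins, per_bin, rng):
    """Pick sentence pairs for manual annotation so that automatically
    estimated scores are spread roughly evenly over the score range.

    candidates: list of (pair_id, estimated_score), score assumed in [0, 1].
    Equal-width binning is an assumption, not a detail from the abstract.
    """
    bins = defaultdict(list)
    for pair_id, score in candidates:
        # Clamp so that a score of exactly 1.0 falls into the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append(pair_id)

    selected = []
    for idx in range(n_bins):
        pool = bins[idx]
        # Sparse bins contribute everything they have.
        selected.extend(rng.sample(pool, min(per_bin, len(pool))))
    return selected


def interpolate_balanced(skewed, near_uniform, total, rng):
    """Build a training set in which one half comes from the default skewed
    data and the other half from the near-uniform data."""
    half = total // 2
    rest = total - half
    if len(skewed) < half or len(near_uniform) < rest:
        raise ValueError("not enough examples to fill both halves")
    mixed = rng.sample(skewed, half) + rng.sample(near_uniform, rest)
    rng.shuffle(mixed)
    return mixed
```

In this sketch, the identifiers returned by `select_near_uniform` would still need manual quality annotation before being passed to `interpolate_balanced` as the near-uniform set; passing a seeded `random.Random` instance keeps the sampling reproducible.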
