Learning of indiscriminate distributions of document embeddings for domain adaptation

Park, Saerom; Lee, Woojin; Lee, Jaewook

doi:10.3233/IDA-184131

Full metadata record

DC Field	Value	Language
dc.citation.endPage	797	-
dc.citation.number	4	-
dc.citation.startPage	779	-
dc.citation.title	INTELLIGENT DATA ANALYSIS	-
dc.citation.volume	23	-
dc.contributor.author	Park, Saerom	-
dc.contributor.author	Lee, Woojin	-
dc.contributor.author	Lee, Jaewook	-
dc.date.accessioned	2023-12-21T18:40:25Z	-
dc.date.available	2023-12-21T18:40:25Z	-
dc.date.created	2023-05-09	-
dc.date.issued	2019-09	-
dc.description.abstract	Natural language processing (NLP) is an important application area for domain adaptation because the properties of texts depend on their corpus. However, a textual input is not inherently represented as a numerical vector. Many domain adaptation methods for NLP have been developed on the basis of numerical representations of texts instead of textual inputs. Thus, we develop a distributed representation learning method of words and documents for domain adaptation. The developed method addresses the domain separation problem of document embeddings from different domains, that is, the problem that the supports of the embeddings are separable across domains and the distributions of the embeddings can be discriminated. We propose a new method based on negative sampling. The proposed method learns document embeddings by assuming that the noise distribution depends on the domain. The proposed method moves a document embedding close to the embeddings of the important words in the document and keeps it away from the embeddings of words that occur frequently in both domains. For Amazon reviews, we verified that the proposed method outperformed other representation methods in terms of the indiscriminability of the distributions of the document embeddings through experiments such as visualizing them and calculating a proxy A-distance measure. We also performed sentiment classification tasks to validate the effectiveness of the document embeddings. The proposed method achieved consistently better results than other methods. In addition, we applied the learned document embeddings to the domain adversarial neural network method, which is a popular deep learning-based domain adaptation model. The proposed method obtained not only better performance on most datasets but also more stable convergence on all datasets than the other methods. Therefore, the proposed method is applicable to other domain adaptation methods for NLP that use numerical representations of documents or words.	-
dc.identifier.bibliographicCitation	INTELLIGENT DATA ANALYSIS, v.23, no.4, pp.779 - 797	-
dc.identifier.doi	10.3233/IDA-184131	-
dc.identifier.issn	1088-467X	-
dc.identifier.scopusid	2-s2.0-85073120748	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/64275	-
dc.identifier.wosid	000488816000004	-
dc.language	English	-
dc.publisher	IOS PRESS	-
dc.title	Learning of indiscriminate distributions of document embeddings for domain adaptation	-
dc.type	Article	-
dc.description.isOpenAccess	FALSE	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalResearchArea	Computer Science	-
dc.type.docType	Article	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	Domain adaptation	-
dc.subject.keywordAuthor	natural language processing	-
dc.subject.keywordAuthor	distributed representation	-
dc.subject.keywordAuthor	negative sampling	-
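
The abstract evaluates indiscriminability with a proxy A-distance measure but this record does not define it. A minimal sketch follows, assuming the commonly used form 2(1 - 2e), where e is the held-out error of a classifier trained to tell source embeddings from target embeddings. The nearest-centroid classifier, the function names, and the toy vectors are illustrative assumptions, not the authors' setup.

```python
def proxy_a_distance(domain_error: float) -> float:
    """Commonly used proxy A-distance form (assumed here): 2 * (1 - 2 * error)."""
    return 2.0 * (1.0 - 2.0 * domain_error)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def domain_error(source, target):
    """Held-out error of a nearest-centroid source-vs-target classifier.

    Hypothetical stand-in for the domain classifier: the first half of each
    domain fits the centroids, the second half is used for evaluation.
    """
    half_s, half_t = len(source) // 2, len(target) // 2
    c_src = centroid(source[:half_s])
    c_tgt = centroid(target[:half_t])
    held_out = [(v, False) for v in source[half_s:]]
    held_out += [(v, True) for v in target[half_t:]]
    # A point is predicted "target" when it lies closer to the target centroid.
    wrong = sum(
        1 for v, is_target in held_out
        if (sq_dist(v, c_tgt) < sq_dist(v, c_src)) != is_target
    )
    return wrong / len(held_out)


# Toy 2-D "document embeddings" with clearly separated domains (made-up data).
separated_src = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.0, 0.2]]
separated_tgt = [[5.0, 5.0], [5.1, 4.9], [4.9, 5.1], [5.0, 5.2]]

error = domain_error(separated_src, separated_tgt)
print(error, proxy_a_distance(error))
```

Under this form, a domain classifier at chance level (error 0.5) gives a distance of 0, which corresponds to indiscriminate embedding distributions, while a perfect domain classifier (error 0) gives the maximum value of 2.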
