Training Domain-Specific Korean Language Model for Aspect-based Review Analysis

Park, Hyerin

Scholarworks@UNIST

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Kim, Yeolib	-
dc.contributor.author	Park, Hyerin	-
dc.date.accessioned	2024-10-14T13:49:59Z	-
dc.date.available	2024-10-14T13:49:59Z	-
dc.date.issued	2024-08	-
dc.description.abstract	Korea's e-commerce market accounts for a significant portion of the total retail distribution market, and its growth has increased the need to analyze user-generated online reviews from multiple perspectives. Recent studies have used Aspect-based Sentiment Analysis (ABSA) to analyze these reviews comprehensively, enabling sentiment analysis of individual product aspects. However, ABSA research for the Korean language has been limited by the language's linguistic characteristics, such as its agglutinative nature and less strict spacing rules, which complicate the development of effective tokenizers and the acquisition of large datasets. To address these challenges, we aimed to develop a tokenizer tailored to Korean online reviews and train a domain-specific language model. We collected a total of 78 million online reviews across nine categories from Korea's largest shopping platform and applied preprocessing specific to review characteristics. Using morpheme analysis and SentencePiece, we created our tokenizer and trained a BERT model. Our model outperformed widely used models on tasks that involve recognizing aspect terms as single words, although it showed slightly lower performance on general natural language understanding tasks. These results demonstrate the model's effectiveness in understanding review-specific contexts and highlight its potential to enhance sentiment analysis in the Korean e-commerce domain.	-
dc.description.degree	Master	-
dc.description	School of Business Administration (Management Engineering)	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/84059	-
dc.identifier.uri	http://unist.dcollection.net/common/orgView/200000813850	-
dc.language	ENG	-
dc.publisher	Ulsan National Institute of Science and Technology	-
dc.subject	NLP	-
dc.subject	BERT	-
dc.subject	Tokenizer	-
dc.title	Training Domain-Specific Korean Language Model for Aspect-based Review Analysis	-
dc.type	Thesis	-
