Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Kim, Yeolib | -
dc.contributor.author | Park, Hyerin | -
dc.date.accessioned | 2024-10-14T13:49:59Z | -
dc.date.available | 2024-10-14T13:49:59Z | -
dc.date.issued | 2024-08 | -
dc.description.abstract | Korea's e-commerce market accounts for a significant share of the total retail distribution market, and its growth has increased the need to analyze user-generated online reviews from multiple perspectives. Recent studies have used Aspect-based Sentiment Analysis (ABSA) to analyze these reviews comprehensively, enabling sentiment analysis on individual aspects of products. However, ABSA research for the Korean language has been limited by its linguistic characteristics, such as its agglutinative nature and less strict spacing rules, which complicate the development of effective tokenizers and the acquisition of large datasets. To address these challenges, we aimed to develop a tokenizer tailored to Korean online reviews and train a domain-specific language model. We collected a total of 78 million online reviews across nine categories from Korea's largest shopping platform and performed preprocessing specific to review characteristics. Using morpheme analysis and SentencePiece, we created our tokenizer and trained a BERT model. Our model outperformed widely used models in tasks related to recognizing aspect terms as single words, although it showed slightly lower performance in general natural language understanding tasks. These results demonstrate the model's effectiveness in understanding review-specific contexts, highlighting its potential to enhance sentiment analysis in the Korean e-commerce domain. | -
dc.description.degree | Master | -
dc.description | School of Business Administration (Management Engineering) | -
dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/84059 | -
dc.identifier.uri | http://unist.dcollection.net/common/orgView/200000813850 | -
dc.language | ENG | -
dc.publisher | Ulsan National Institute of Science and Technology | -
dc.subject | NLP | -
dc.subject | BERT | -
dc.subject | Tokenizer | -
dc.title | Training Domain-Specific Korean Language Model for Aspect-based Review Analysis | -
dc.type | Thesis | -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.