Korea's e-commerce market accounts for a significant share of the country's total retail market, and its growth has increased the need to analyze user-generated online reviews from multiple perspectives. Recent studies have applied Aspect-based Sentiment Analysis (ABSA) to analyze these reviews comprehensively, since it enables sentiment to be assessed separately for each aspect of a product. However, ABSA research for Korean has been limited by the language's distinctive characteristics, such as its agglutinative morphology and loosely observed spacing rules, which complicate both the development of effective tokenizers and the acquisition of large datasets. To address these challenges, we aimed to develop a tokenizer tailored to Korean online reviews and to train a domain-specific language model. We collected a total of 78 million online reviews across nine categories from Korea's largest shopping platform and applied preprocessing tailored to the characteristics of review text. We built our tokenizer using morphological analysis and SentencePiece, and then trained a BERT model with it. Our model outperformed widely used models on tasks that require recognizing aspect terms as single words, although it performed slightly worse on general natural language understanding tasks. These results demonstrate the model's effectiveness in understanding review-specific contexts and highlight its potential to improve sentiment analysis in the Korean e-commerce domain.
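To illustrate why agglutination complicates tokenization, here is a minimal Python sketch; it is not the tokenizer described in this thesis. It assumes a small, hand-picked list of Korean particles (`PARTICLES`) and hypothetical helpers `split_particle` and `pre_segment` that split each space-delimited unit (eojeol) into a stem and a trailing particle, a crude stand-in for a real morphological analyzer. Without this step, a whitespace tokenizer treats 배송이, 배송은, and 배송도 ("delivery" followed by different particles) as three unrelated words; after it, the aspect term 배송 surfaces as one repeated unit.

```python
from collections import Counter
from typing import List

# Hypothetical, hand-picked list of common Korean particles (josa).
# A real morphological analyzer relies on a dictionary and a statistical
# model rather than a fixed suffix list.
PARTICLES = sorted(
    ["은", "는", "이", "가", "을", "를", "도", "에서", "으로", "로"],
    key=len,
    reverse=True,  # try longer particles first, e.g. "으로" before "로"
)


def split_particle(eojeol: str) -> List[str]:
    """Split one space-delimited unit into [stem, particle] if it ends in a known particle."""
    for particle in PARTICLES:
        # Require a non-empty stem so a bare particle is left intact.
        if eojeol.endswith(particle) and len(eojeol) > len(particle):
            return [eojeol[: -len(particle)], particle]
    return [eojeol]


def pre_segment(review: str) -> List[str]:
    """Crude morpheme-style pre-segmentation of a review before subword training."""
    tokens: List[str] = []
    for eojeol in review.split():
        tokens.extend(split_particle(eojeol))
    return tokens


reviews = ["배송이 빨라요", "배송은 느렸어요", "배송도 괜찮아요"]
whitespace_counts = Counter(w for r in reviews for w in r.split())
segmented_counts = Counter(t for r in reviews for t in pre_segment(r))
print(whitespace_counts["배송"], segmented_counts["배송"])  # 0 3
```

In a full pipeline, a subword model such as SentencePiece would then be trained on the pre-segmented text; that step is omitted here. Suffix matching like this also misfires on endings that are not particles (for example, the 은 in the adjective 좋은), which is why dictionary-based analyzers are used in practice.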
Publisher: Ulsan National Institute of Science and Technology
Degree: Master
Major: School of Business Administration (Management Engineering)