Full metadata record

DC Field Value Language
dc.contributor.advisor Comuzzi, Marco -
dc.contributor.author Kim, Yeonsu -
dc.date.accessioned 2025-04-04T13:49:43Z -
dc.date.available 2025-04-04T13:49:43Z -
dc.date.issued 2025-02 -
dc.description.abstract This research investigates the application of Large Language Models (LLMs) to enhance data preparation pipelines in Predictive Process Monitoring (PPM). PPM, a critical tool for analyzing event logs to predict future process behaviors, often suffers from issues such as missing values, semantic inconsistencies, and data noise. The study demonstrates the potential of LLMs to address these imperfections by leveraging their contextual understanding to improve data quality and predictive accuracy. The proposed LLM-driven pipeline integrates steps such as contextual transformation, information extraction, and text normalization, evaluated on two domain-specific datasets, Credit and Pub. Experimental results highlight the effectiveness of LLM-based imputation in handling semantic variability, particularly for homonym transformations, where performance metrics such as BERTScore and F1-scores show significant improvements. However, the study also identifies limitations, notably reduced performance under high synonym transformation levels and domain-specific linguistic complexities, especially in the Pub dataset. Comparative analysis reveals that LLMs excel in scenarios requiring semantic understanding, offering advantages over traditional rule-based imputation methods in certain contexts. The research emphasizes the complementary potential of combining LLMs with classic approaches, suggesting hybrid models for robust and scalable data preparation pipelines. This study contributes to the growing field of process mining by showcasing the feasibility of integrating advanced LLMs into PPM workflows. Future research directions include domain-specific fine-tuning, lightweight model development, and hybrid frameworks to optimize both automation and interpretability. Ultimately, these advancements aim to bridge the gap between raw data imperfections and actionable process insights, driving efficiency and accuracy in predictive analytics. -
dc.description.degree Master -
dc.description Department of Industrial Engineering -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86487 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000865871 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.subject Process Mining -
dc.title Optimizing Data Preparation Pipelines for Predictive Process Monitoring with Large Language Models -
dc.type Thesis -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.