
Data descriptions from large language models with influence estimation

Author(s)
Kim, Chaeri
Advisor
Kim, Taehwan
Issued Date
2024-02
URI
https://scholarworks.unist.ac.kr/handle/201301/82152
http://unist.dcollection.net/common/orgView/200000743449
Abstract
Deep learning models have been successful in many areas, but understanding their behavior remains a black box. Most prior explainable AI (XAI) approaches have focused on interpreting and explaining how models make predictions. In contrast, we take a different approach through the lens of data, because data is one of the most important factors in the success of deep learning models. We would like to understand how data can be explained, in the context of deep learning model training, via one of the most common media: language. Therefore, we propose a novel approach to understand and extract the information that explains each class in a dataset well, by incorporating knowledge from existing external knowledge bases accessed through large language models such as GPT-3.5. However, the extracted data descriptions may still include irrelevant information, so we propose to exploit influence estimation, along with the CLIP score, to choose the most informative textual descriptions. The selected textual descriptions may provide insight into what the trained model focuses on and utilizes when making predictions. Furthermore, because recent vision-language contrastive learning may provide cross-modal transferability, we propose a novel benchmark task of cross-modal transfer classification to examine the effectiveness of the data descriptions. In experiments on nine image classification datasets, the extracted text descriptions further boost the performance of a model trained only on images. This demonstrates that the proposed approach provides information that explains the characteristics of each dataset and helps the model to train, offering insight into, and inherent interpretability of, the model's decision process. In addition, we show that our approach may help mitigate model bias in text-to-image generation tasks.
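The description-selection step in the abstract (ranking LLM-generated candidate descriptions by influence estimates together with CLIP scores) can be illustrated with a minimal sketch. This is not the thesis's actual implementation: the scores are assumed precomputed, and the combination rule (sum of min-max-normalized scores) and the function name `select_descriptions` are hypothetical choices for illustration.

```python
def select_descriptions(candidates, k=2):
    """Rank candidate class descriptions by a combined score.

    candidates: list of (text, clip_score, influence_score) tuples,
    where both scores are assumed to be precomputed elsewhere.
    Returns the top-k description texts. The combination rule
    (sum of min-max-normalized scores) is illustrative only.
    """
    def normalize(values):
        lo, hi = min(values), max(values)
        # Guard against a degenerate case where all scores are equal.
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    clip = normalize([c[1] for c in candidates])
    infl = normalize([c[2] for c in candidates])
    ranked = sorted(zip(candidates, clip, infl),
                    key=lambda t: t[1] + t[2], reverse=True)
    return [c[0] for c, _, _ in ranked[:k]]
```

For example, a candidate with both a high CLIP score and a high influence estimate ranks above one that scores well on only a single criterion, which matches the abstract's motivation for combining the two signals.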
Publisher
Ulsan National Institute of Science and Technology
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.