Full metadata record

DC Field Value Language
dc.contributor.advisor Kim, Taehwan -
dc.contributor.author Ki, Youngbin -
dc.date.accessioned 2025-09-29T11:31:18Z -
dc.date.available 2025-09-29T11:31:18Z -
dc.date.issued 2025-08 -
dc.description.abstract While recent VideoQA studies use captions generated from frames as the main input for LLM reasoning, they focus primarily on selecting key frames and often overlook the content of the captions. In this work, we hypothesize that caption content directly influences the LLM's reasoning process. To validate this hypothesis, we establish an evaluation setting that isolates the effect of caption content. Our findings show that general captions frequently lack question-relevant information and sometimes even hinder reasoning. To address this issue, we propose a question-aware caption refinement framework that extracts question-related events and event-specific visual elements and incorporates them into refined captions. Extensive experiments across multiple datasets and baselines demonstrate that our refined captions consistently improve over general captions on both commonsense and non-commonsense questions. Specifically, for non-commonsense questions, our method improves accuracy by 11.8% on NExT-QA and 14.6% on IntentQA. These results empirically validate our hypothesis and highlight the importance of aligning caption content with the intent of the question to enable accurate and robust reasoning in VideoQA. -
dc.description.degree Master -
dc.description Graduate School of Artificial Intelligence -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/88266 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000905240 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.rights.embargoReleaseDate 9999-12-31 -
dc.rights.embargoReleaseTerms 9999-12-31 -
dc.subject Video Question Answering -
dc.title Question-aware Caption Refinement for Video Question Answering -
dc.type Thesis -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.