Question-aware Caption Refinement for Video Question Answering

Ki, Youngbin

Scholarworks@UNIST

UNIST Library

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Kim, Taehwan	-
dc.contributor.author	Ki, Youngbin	-
dc.date.accessioned	2025-09-29T11:31:18Z	-
dc.date.available	2025-09-29T11:31:18Z	-
dc.date.issued	2025-08	-
dc.description.abstract	While recent VideoQA studies use captions derived from video frames as the main input for LLM reasoning, they focus primarily on selecting key frames and often overlook the content of the captions themselves. In this work, we hypothesize that caption content directly influences the LLM's reasoning process. To validate this hypothesis, we establish an evaluation setting that isolates the effect of caption content. Our findings show that general captions frequently lack question-relevant information and sometimes even hinder reasoning. To address this issue, we propose a question-aware caption refinement framework that extracts question-related events and event-specific visual elements and incorporates them into refined captions. Extensive experiments across multiple datasets and baselines demonstrate that our refined captions consistently outperform general captions on both commonsense and non-commonsense questions. Specifically, for non-commonsense questions, our method improves accuracy by 11.8% on NExT-QA and 14.6% on IntentQA. These results empirically validate our hypothesis and highlight the importance of aligning caption content with the intent of the question to enable accurate and robust reasoning in VideoQA.	-
dc.description.degree	Master	-
dc.description	Graduate School of Artificial Intelligence	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/88266	-
dc.identifier.uri	http://unist.dcollection.net/common/orgView/200000905240	-
dc.language	ENG	-
dc.publisher	Ulsan National Institute of Science and Technology	-
dc.rights.embargoReleaseDate	9999-12-31	-
dc.rights.embargoReleaseTerms	9999-12-31	-
dc.subject	Video Question Answering	-
dc.title	Question-aware Caption Refinement for Video Question Answering	-
dc.type	Thesis	-
