Motion-based Video Sampling and Cutmix Augmentation for Natural Language-Based Vehicle Retrieval

Kim, Dongyoung

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Yoo, Jaejun	-
dc.contributor.author	Kim, Dongyoung	-
dc.date.accessioned	2024-10-14T13:50:39Z	-
dc.date.available	2024-10-14T13:50:39Z	-
dc.date.issued	2024-08	-
dc.description.abstract	Retrieving target vehicles through natural language descriptions is crucial for urban management within intelligent transportation systems. Existing methods use models like CLIP that exploit the relationship between text and visual data. Since conventional CLIP models take images as input, they utilize synthetic data, such as moving maps, to represent vehicle trajectories. However, these models struggle to comprehend the temporal aspects of video data. Researchers have attempted to improve temporal understanding by using various data augmentations and video encoders. Nonetheless, video encoders can only process a few frames at a time, and traditional frame sampling methods do not effectively capture the dynamics of vehicle movement. To address these issues, we propose a motion-based video sampling technique that efficiently harnesses the motion data of target vehicles. By leveraging state-of-the-art video foundation models and a re-ranking algorithm, we have improved the performance of models on public datasets for natural language-based vehicle retrieval. Additionally, the available benchmark dataset is unique, limited in size, and exhibits significant class imbalances. Therefore, we applied the Video CutMix augmentation algorithm and demonstrated through experiments that vehicle augmentation is feasible in addressing class imbalance.	-
dc.description.degree	Master	-
dc.description	Graduate School of Artificial Intelligence	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/84190	-
dc.identifier.uri	http://unist.dcollection.net/common/orgView/200000813499	-
dc.language	ENG	-
dc.publisher	Ulsan National Institute of Science and Technology	-
dc.title	Motion-based Video Sampling and Cutmix Augmentation for Natural Language-Based Vehicle Retrieval	-
dc.type	Thesis	-
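
The abstract names a motion-based video sampling technique but does not say how frames are chosen. Purely as an illustration, the sketch below assumes a tracker supplies the target vehicle's bounding-box centre for every frame, and picks frames that are evenly spaced along the distance travelled rather than evenly spaced in time. The function name `motion_based_sample`, its inputs, and the distance-based rule are all assumptions made for this example; they are not taken from the thesis.

```python
import numpy as np


def motion_based_sample(centers, num_frames):
    """Return frame indices spaced evenly along the vehicle's path length.

    Hypothetical helper, not the thesis's algorithm.
    centers: array-like of shape (T, 2), box centre (x, y) per frame.
    num_frames: how many frames the video encoder can accept.
    """
    centers = np.asarray(centers, dtype=float)
    if centers.ndim != 2 or centers.shape[1] != 2 or len(centers) == 0:
        raise ValueError("centers must have shape (T, 2) with T >= 1")
    if num_frames < 1:
        raise ValueError("num_frames must be >= 1")
    total_frames = len(centers)

    # Distance moved between consecutive frames, then cumulative distance
    # travelled up to each frame (frame 0 has travelled 0).
    steps = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    travelled = np.concatenate([[0.0], np.cumsum(steps)])

    if travelled[-1] == 0.0:
        # No motion at all: fall back to uniform sampling in time.
        return np.linspace(0, total_frames - 1, num_frames).round().astype(int)

    # Equally spaced distances along the path, each mapped to the first
    # frame at which the vehicle has travelled at least that far.
    targets = np.linspace(0.0, travelled[-1], num_frames)
    indices = np.searchsorted(travelled, targets, side="left")
    return np.clip(indices, 0, total_frames - 1)


# Toy track: the vehicle waits for five frames, then moves one unit per frame.
track = [(0, 0)] * 5 + [(x, 0) for x in range(1, 6)]
picked = motion_based_sample(track, num_frames=6)
```

On this toy track the sketch selects frames 0, 5, 6, 7, 8 and 9, whereas uniform sampling in time would spend three of its six frames on the stretch where the vehicle is not moving. Frames can repeat when the requested count exceeds the number of distinct positions.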
