
Full metadata record

DC Field Value Language
dc.contributor.advisor Yoo, Jaejun -
dc.contributor.author Kim, Dongyoung -
dc.date.accessioned 2024-10-14T13:50:39Z -
dc.date.available 2024-10-14T13:50:39Z -
dc.date.issued 2024-08 -
dc.description.abstract Retrieving target vehicles through natural language descriptions is crucial for urban management within intelligent transportation systems. Existing methods use models like CLIP that exploit the relationship between text and visual data. Since conventional CLIP models take images as input, they utilize synthetic data, such as motion maps, to represent vehicle trajectories. However, these models struggle to comprehend the temporal aspects of video data. Researchers have attempted to improve temporal understanding by using various data augmentations and video encoders. Nonetheless, video encoders can only process a few frames at a time, and traditional frame sampling methods do not effectively capture the dynamics of vehicle movement. To address these issues, we propose a motion-based video sampling technique that efficiently harnesses the motion data of target vehicles. By leveraging state-of-the-art video foundation models and a re-ranking algorithm, we have improved the performance of models on public datasets for natural language-based vehicle retrieval. Additionally, the available benchmark dataset is unique, limited in size, and exhibits significant class imbalances. Therefore, we applied the Video CutMix augmentation algorithm and demonstrated through experiments that vehicle augmentation is feasible in addressing class imbalance. -
dc.description.degree Master -
dc.description Graduate School of Artificial Intelligence -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/84190 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000813499 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.title Motion-based Video Sampling and Cutmix Augmentation for Natural Language-Based Vehicle Retrieval -
dc.type Thesis -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.