Semi-supervised multi-modal video action recognition with audio source localization guided mixup

Kang, Seok Un

Scholarworks@UNIST

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Kim, Tae Hwan	-
dc.contributor.author	Kang, Seok Un	-
dc.date.accessioned	2024-04-11T15:19:56Z	-
dc.date.available	2024-04-11T15:19:56Z	-
dc.date.issued	2024-02	-
dc.description.abstract	Video action recognition, the task of identifying the actions that occur in a video, is challenging but important. Because labeling videos is costly, semi-supervised learning (SSL) has been studied as a way to achieve good performance with only a small amount of labeled data. Prior work on semi-supervised video action recognition has focused mostly on a single modality, the visual stream. Video, however, is inherently multi-modal, so exploiting both visual and audio signals should further improve performance, a direction that has not been well explored. We therefore propose audio-visual SSL for video action recognition, which uses the visual and audio modalities together, even in the challenging setting where very few labeled examples are available. In addition, to make the most of the information in both modalities, we propose a novel audio source localization-guided mixup method that accounts for the inter-modal relations between the video and audio modalities. Experiments on the UCF-51, Kinetics-400, and VGGSound datasets demonstrate the superior performance of both the proposed audio-visual SSL action recognition framework and the audio source localization-guided mixup.	-
dc.description.degree	Master	-
dc.description	Graduate School of Artificial Intelligence	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/82146	-
dc.identifier.uri	http://unist.dcollection.net/common/orgView/200000743485	-
dc.language	ENG	-
dc.publisher	Ulsan National Institute of Science and Technology	-
dc.rights.embargoReleaseDate	9999-12-31	-
dc.rights.embargoReleaseTerms	9999-12-31	-
dc.title	Semi-supervised multi-modal video action recognition with audio source localization guided mixup	-
dc.type	Thesis	-
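
The abstract does not say how the localization map steers the mixup. Purely as an illustration, here is one plausible way an audio source localization map could guide a CutMix-style mixup between two clips; it is a hypothetical sketch, not the thesis's method. The function names `localization_map` and `guided_mixup`, the cosine-similarity localization, the `keep_ratio` parameter, and the linear mixing of the audio waveforms are all assumptions made for this example.

```python
import numpy as np


def localization_map(visual_feats: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Hypothetical audio source localization: cosine similarity between one
    audio embedding of shape (D,) and every spatial position of a visual
    feature map of shape (H, W, D). Returns an (H, W) map in which higher
    values mark regions that resemble the sound source."""
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + 1e-8)
    a = audio_emb / (np.linalg.norm(audio_emb) + 1e-8)
    return v @ a  # (H, W, D) @ (D,) -> (H, W)


def guided_mixup(frames1, frames2, audio1, audio2, y1, y2, loc_map2, keep_ratio=0.25):
    """Hypothetical CutMix-style mixup: paste the top `keep_ratio` fraction of
    spatial positions of clip 2 (ranked by its localization map) into clip 1.

    frames*: (T, H, W, C) clips; audio*: (L,) waveforms; y*: one-hot or soft
    labels; loc_map2: (H, W) localization map of clip 2 at frame resolution.
    """
    k = max(1, int(round(keep_ratio * loc_map2.size)))
    thresh = np.sort(loc_map2, axis=None)[-k]            # k-th largest score
    mask = (loc_map2 >= thresh).astype(frames1.dtype)    # 1 = take pixel from clip 2
    lam = float(mask.mean())                             # share of pixels from clip 2 (ties can raise it)
    m = mask[None, :, :, None]                           # broadcast over time and channels
    frames = (1.0 - m) * frames1 + m * frames2
    audio = (1.0 - lam) * audio1 + lam * audio2          # assumption: audio mixed linearly with the same weight
    y = (1.0 - lam) * y1 + lam * y2                      # labels weighted by the pasted area
    return frames, audio, y, lam


# Usage on toy data: clip 1 is all zeros, clip 2 is all ones.
loc = np.arange(16, dtype=float).reshape(4, 4)           # highest scores in the bottom row
frames, audio, y, lam = guided_mixup(
    np.zeros((2, 4, 4, 3)), np.ones((2, 4, 4, 3)),
    np.zeros(8), np.ones(8),
    np.array([1.0, 0.0]), np.array([0.0, 1.0]), loc,
)
print(lam)  # 0.25: only the bottom row of each frame comes from clip 2
```

Weighting the labels by the area actually pasted, rather than by an independently sampled coefficient, keeps the soft target consistent with how much of clip 2 is visible in the mixed clip, which is the usual rationale in CutMix-style augmentation.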
