Full metadata record

dc.contributor.advisor: Kim, Taehwan
dc.contributor.author: Lee, Taegyeong
dc.date.accessioned: 2024-10-14T13:50:42Z
dc.date.available: 2024-10-14T13:50:42Z
dc.date.issued: 2024-08
dc.description.abstract: Representing wild sounds as images is an important but challenging task, due to the lack of paired sound-image datasets and the significant differences in the characteristics of the two modalities. Previous studies have focused on generating images from sounds in limited categories or from music. In this paper, we propose a novel approach to generating images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and to visualize it. Lastly, we propose direct sound optimization with CLIPScore and AudioCLIP, and generate images with a diffusion-based model. Experiments show that our model generates high-quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.
dc.description.degree: Master
dc.description: Graduate School of Artificial Intelligence
dc.identifier.uri: https://scholarworks.unist.ac.kr/handle/201301/84195
dc.identifier.uri: http://unist.dcollection.net/common/orgView/200000813034
dc.language: ENG
dc.publisher: Ulsan National Institute of Science and Technology
dc.title: Generating Realistic Images from In-the-wild Sounds
dc.type: Thesis
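The abstract above outlines a three-step pipeline: caption the sound, weight the caption with audio and sentence attention, then optimize the conditioning signal against CLIPScore and AudioCLIP before decoding with a diffusion model. The Python sketch below only illustrates that flow; every function in it (audio_captioner, clip_text_embed, audioclip_audio_embed, diffusion_generate) is a hypothetical stub standing in for a real model, not the thesis implementation.

# Hedged sketch of the pipeline the abstract describes. All names, shapes,
# and the single optimized conditioning vector are assumptions for
# illustration; the thesis's actual models and loss details may differ.
import torch
import torch.nn.functional as F

EMB = 512  # assumed shared embedding width for the CLIP-style stubs

def audio_captioner(audio):
    # Step 1 (stub): an audio-captioning model would map sound to text.
    return "birds chirping beside a flowing stream"

def clip_text_embed(text):
    # Stub CLIP text encoder: a deterministic (within one run) unit-norm
    # vector per string, standing in for real text embeddings.
    g = torch.Generator().manual_seed(abs(hash(text)) % (2**31))
    v = torch.randn(EMB, generator=g)
    return v / v.norm()

def audioclip_audio_embed(audio):
    # Stub AudioCLIP audio encoder, projected into the same space.
    v = audio[:EMB]
    return v / v.norm()

def attention_weighted_caption(caption):
    # Step 2 (stub): weight each word's embedding; the thesis's audio
    # attention and sentence attention would learn these weights from
    # the sound itself rather than from caption self-similarity.
    words = caption.split()
    embs = torch.stack([clip_text_embed(w) for w in words])  # (n, EMB)
    scores = embs @ clip_text_embed(caption)                 # (n,)
    weights = F.softmax(scores, dim=0)
    return (weights.unsqueeze(1) * embs).sum(dim=0)

def diffusion_generate(cond):
    # Step 4 (stub): a diffusion decoder would turn the conditioning
    # vector into an image; here we return a placeholder tensor.
    return torch.zeros(3, 256, 256)

audio = torch.randn(1024)               # fake waveform features
caption = audio_captioner(audio)
z = attention_weighted_caption(caption).clone().requires_grad_(True)
text_target = clip_text_embed(caption)  # CLIPScore-style text target
audio_target = audioclip_audio_embed(audio)

opt = torch.optim.Adam([z], lr=1e-2)
for step in range(100):                 # Step 3: direct sound optimization
    opt.zero_grad()
    loss = -(F.cosine_similarity(z, text_target, dim=0)
             + F.cosine_similarity(z, audio_target, dim=0))
    loss.backward()
    opt.step()

image = diffusion_generate(z.detach())
print(caption, image.shape)

In the thesis the optimization would plausibly act on the diffusion model's latent or text embedding; the stub optimizes a single conditioning vector only to make the two-term loss structure concrete.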

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.