Generating Realistic Images from In-the-wild Sounds

Author(s)
Lee, Taegyeong; Kang, Jeonghun; Kim, Hyeonyu; Kim, Taehwan
Issued Date
2023-10-04
URI
https://scholarworks.unist.ac.kr/handle/201301/67783
Citation
IEEE International Conference on Computer Vision
Abstract
Representing wild sounds as images is an important but challenging task because paired sound-image datasets are scarce and the two modalities differ significantly in their characteristics. Previous studies have focused on generating images from sound in limited categories or from music. In this paper, we propose a novel approach to generating images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of the sound and to visualize it. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. Experiments show that our model generates high-quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.
Publisher
IEEE/CVF

