File Download

There are no files associated with this item.

  • Find it @ UNIST can give you direct access to the published full text of this article (UNISTARs only).
Related Researcher

Yoon, Sung Whan (윤성환)
Machine Intelligence and Information Learning Lab.


Full metadata record

DC Field Value Language
dc.citation.conferencePlace SI -
dc.citation.conferencePlace Singapore EXPO -
dc.citation.title International Conference on Learning Representations -
dc.contributor.author Lee, Jae-Jun -
dc.contributor.author Yoon, Sung Whan -
dc.date.accessioned 2025-02-13T12:05:07Z -
dc.date.available 2025-02-13T12:05:07Z -
dc.date.created 2025-02-12 -
dc.date.issued 2025-04-24 -
dc.description.abstract Learning with multiple modalities has recently demonstrated significant gains in many domains by maximizing the shared information across modalities. However, current approaches rely strongly on high-quality paired datasets, which allow co-training on paired labels across different modalities. In this context, we raise a pivotal question: Can a model of one modality synergize the training of models of other modalities, even without paired multimodal labels? Our answer is 'Yes'. Figuratively, we argue that a writer, i.e., a language model, can promote the training of a painter, i.e., a visual model, even without paired text-image ground truth. We theoretically argue that a superior representation can be achieved through the synergy between two different modalities without paired supervision. As proofs of concept, we confirm considerable performance gains from the synergy among visual, language, and audio models. From a theoretical viewpoint, we first establish a mathematical foundation for the synergy between two different-modality models, where each is trained on its own modality. From a practical viewpoint, our work aims to broaden the scope of multimodal learning to encompass the synergistic use of single-modality models, relieving the strong requirement of paired supervision. -
dc.identifier.bibliographicCitation International Conference on Learning Representations -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86224 -
dc.identifier.url https://openreview.net/forum?id=5BXWhVbHAK -
dc.language English -
dc.publisher International Conference on Learning Representations -
dc.title Can One Modality Model Synergize Training of Other Modality Models? -
dc.type Conference Paper -
dc.date.conferenceDate 2025-04-24 -
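
As a supplement to the abstract above, here is a loose, hypothetical sketch of the general idea it describes: a frozen language model shaping the training of a vision model without any paired image-text data. This is not the authors' method (the paper's full text is not deposited here); the toy encoder, the relational synergy loss, the 0.1 loss weight, and the stand-in text embeddings are all illustrative assumptions. The sketch regularizes the pairwise class-similarity geometry of a vision classifier toward that of a text model's class-name embeddings, which requires no paired samples, only shared class names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, txt_dim, img_dim = 10, 64, 128

# Stand-in for a frozen language model's class-name embeddings.
# (Assumption: in practice these would come from a pretrained text encoder.)
text_class_emb = F.normalize(torch.randn(num_classes, txt_dim), dim=-1)

# Toy vision model and classifier head (illustrative, not the paper's architecture).
vision_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, img_dim), nn.ReLU())
classifier = nn.Linear(img_dim, num_classes)

def synergy_loss(classifier_weight, text_emb):
    """Match the pairwise class-similarity structure of the vision
    classifier to that of the frozen text embeddings. Comparing
    similarity matrices means the two embedding dims need not match."""
    w = F.normalize(classifier_weight, dim=-1)
    return F.mse_loss(w @ w.t(), text_emb @ text_emb.t())

opt = torch.optim.Adam(
    list(vision_model.parameters()) + list(classifier.parameters()), lr=1e-3
)

# One training step on an image-only batch with ordinary class labels;
# no image-text pairs are ever used.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, num_classes, (8,))
logits = classifier(vision_model(images))
loss = F.cross_entropy(logits, labels) + 0.1 * synergy_loss(classifier.weight, text_class_emb)
opt.zero_grad()
loss.backward()
opt.step()
```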


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.