File Download

There are no files associated with this item.

  • Find it @ UNIST can give you direct access to the published full text of this article (UNISTARs only).
Related Researcher

Yoon, Sung Whan (윤성환)
Machine Intelligence and Information Learning Lab.


Full metadata record

DC Field Value Language
dc.citation.conferencePlace SI -
dc.citation.conferencePlace Singapore EXPO -
dc.citation.title International Conference on Learning Representations -
dc.contributor.author Lee, Jae-Jun -
dc.contributor.author Yoon, Sung Whan -
dc.date.accessioned 2025-02-13T12:05:07Z -
dc.date.available 2025-02-13T12:05:07Z -
dc.date.created 2025-02-12 -
dc.date.issued 2025-04-24 -
dc.description.abstract Learning with multiple modalities has recently demonstrated significant gains in many domains by maximizing the shared information across modalities. However, current approaches rely strongly on high-quality paired datasets, which allow co-training on paired labels across different modalities. In this context, we raise a pivotal question: Can a model of one modality synergize the training of models of other modalities, even without paired multimodal labels? Our answer is 'Yes'. Figuratively, we argue that a writer, i.e., a language model, can promote the training of a painter, i.e., a visual model, even without paired text-image ground truth. We theoretically argue that a superior representation can be achieved through the synergy between two different modalities without paired supervision. As proofs of concept, we confirm considerable performance gains from the synergy among visual, language, and audio models. From a theoretical viewpoint, we first establish a mathematical foundation for the synergy between two different-modality models, where each is trained on its own modality. From a practical viewpoint, our work aims to broaden the scope of multimodal learning to encompass the synergistic use of single-modality models, relieving the strong requirement of paired supervision. -
dc.identifier.bibliographicCitation International Conference on Learning Representations -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86224 -
dc.identifier.url https://openreview.net/forum?id=5BXWhVbHAK -
dc.language English -
dc.publisher International Conference on Learning Representations -
dc.title Can One Modality Model Synergize Training of Other Modality Models? -
dc.type Conference Paper -
dc.date.conferenceDate 2025-04-24 -
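
As a supplement to the abstract above, here is a loose, hypothetical sketch of the general idea it describes: a frozen language model shaping the training of a vision model without any paired image-text data. This is not the authors' method (the paper's full text is not deposited here); the toy encoder, the relational synergy loss, the 0.1 loss weight, and the stand-in text embeddings are all illustrative assumptions. The sketch regularizes the pairwise class-similarity geometry of a vision classifier toward that of a text model's class-name embeddings, which requires no paired samples, only shared class names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, txt_dim, img_dim = 10, 64, 128

# Stand-in for a frozen language model's class-name embeddings.
# (Assumption: in practice these would come from a pretrained text encoder.)
text_class_emb = F.normalize(torch.randn(num_classes, txt_dim), dim=-1)

# Toy vision model and classifier head (illustrative, not the paper's architecture).
vision_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, img_dim), nn.ReLU())
classifier = nn.Linear(img_dim, num_classes)

def synergy_loss(classifier_weight, text_emb):
    """Match the pairwise class-similarity structure of the vision
    classifier to that of the frozen text embeddings. Comparing
    similarity matrices means the two embedding dims need not match."""
    w = F.normalize(classifier_weight, dim=-1)
    return F.mse_loss(w @ w.t(), text_emb @ text_emb.t())

opt = torch.optim.Adam(
    list(vision_model.parameters()) + list(classifier.parameters()), lr=1e-3
)

# One training step on an image-only batch with ordinary class labels;
# no image-text pairs are ever used.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, num_classes, (8,))
logits = classifier(vision_model(images))
loss = F.cross_entropy(logits, labels) + 0.1 * synergy_loss(classifier.weight, text_class_emb)
opt.zero_grad()
loss.backward()
opt.step()
```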


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.