| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Kim, Taehwan | - |
| dc.contributor.author | Kim, Jongeun | - |
| dc.date.accessioned | 2024-10-14T13:50:41Z | - |
| dc.date.available | 2024-10-14T13:50:41Z | - |
| dc.date.issued | 2024-08 | - |
| dc.description.abstract | Prior research has focused on multilingual text and images in zero-shot settings due to the lack of multilingual image-text pair data. In contrast, to handle multilingual multimodal inputs directly, we introduce an Efficient Multilingual Multimodal Fusion (EMMF) network trained on machine-translated datasets. The multilingual and multimodal projected representations are aligned contrastively, alongside an autoregressive objective. Experiments on the xGQA dataset demonstrate that our model aligns representations more successfully than previous zero-shot methods and shows qualitative improvements over similar methods. | - |
| dc.description.degree | Master | - |
| dc.description | Graduate School of Artificial Intelligence | - |
| dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/84192 | - |
| dc.identifier.uri | http://unist.dcollection.net/common/orgView/200000813131 | - |
| dc.language | ENG | - |
| dc.publisher | Ulsan National Institute of Science and Technology | - |
| dc.title | Towards Efficient Multilingual Multimodal Fusion: A Contrastive Learning Approach Using Machine-Translation | - |
| dc.type | Thesis | - |