| dc.description.abstract |
Modern information systems increasingly capture heterogeneous signals in multiple modalities, including images, text, audio/video, and structured logs, creating strong incentives for multi-modal learning. By integrating complementary information and enforcing cross-modal consistency, multi-modal models can learn unified representations that exploit this complementarity, thereby improving accuracy, robustness, and generalization. However, deploying these methods beyond curated benchmarks introduces additional real-world requirements that are not addressed by predictive performance alone. Real-world applications must remain feasible as datasets continuously grow, remain resilient to incomplete, noisy, or partially missing inputs, and provide checkable rationales for decisions.
This dissertation advances a unified perspective that pursues two objectives: (i) strengthening multi-modal representations to improve downstream performance via heterogeneous integration, and (ii) introducing prototypes as a complementary design principle to mitigate real-world constraints. In the first study, on fashion e-commerce, we propose MDL-FR, an end-to-end framework that integrates visual and textual data and learns style prototypes capturing high-level structure, enabling style-aware outfit generation beyond compatibility-only recommendation. In the second study, on single-cell multi-omics integration, we propose CPG-AE, which replaces dense cell–cell interactions with a sparse cell–prototype graph and combines prototype-mediated message passing with a multi-modal fusion autoencoder to learn coherent joint embeddings. Across both domains, experimental results show that multi-modal architectures improve task performance, while learned prototypes provide compact anchors that enhance scalability, robustness to incomplete data, and evidence-oriented validation beyond aggregate metrics. |
- |