File Download

There are no files associated with this item.



Full metadata record

DC Field	Value
dc.contributor.advisor	Lim, Chiehyeon
dc.contributor.author	Sohn, Wonho
dc.date.accessioned	2026-03-26T22:14:06Z
dc.date.available	2026-03-26T22:14:06Z
dc.date.issued	2026-02
dc.description.abstract	Modern information systems increasingly capture heterogeneous signals across multiple modalities, including images, text, audio/video, and structured logs, creating strong incentives for multi-modal learning. By integrating complementary information and enforcing cross-modal consistency, multi-modal models can learn unified representations, thereby improving accuracy, robustness, and generalization. However, deploying these methods beyond curated benchmarks introduces real-world requirements that predictive performance alone does not address: applications must remain feasible as datasets grow continuously, stay resilient to incomplete, noisy, or partially missing inputs, and provide checkable rationales for their decisions.

This dissertation advances a unified perspective on two objectives: (i) strengthening multi-modal representations to improve downstream performance via heterogeneous integration, and (ii) introducing prototypes as a complementary design principle that mitigates real-world constraints. In the first study, on fashion e-commerce, we propose MDL-FR, an end-to-end framework that integrates visual and textual data and learns style prototypes capturing high-level structure, enabling style-aware outfit generation beyond compatibility-only recommendation. In the second study, on single-cell multi-omics integration, we propose CPG-AE, which replaces dense cell–cell interactions with a sparse cell–prototype graph and combines prototype-mediated message passing with a multi-modal fusion autoencoder to learn coherent joint embeddings (an illustrative sketch of such prototype-mediated message passing follows this record). Across both domains, experimental results show that multi-modal architectures improve task performance, while learned prototypes provide compact anchors that enhance scalability, robustness to incomplete data, and evidence-oriented validation beyond aggregate metrics.
dc.description.degree	Doctor
dc.description	Department of Industrial Engineering
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/90973
dc.identifier.uri	http://unist.dcollection.net/common/orgView/200000965089
dc.language	ENG
dc.publisher	Ulsan National Institute of Science and Technology
dc.rights.embargoReleaseDate	9999-12-31
dc.rights.embargoReleaseTerms	9999-12-31
dc.subject	Multi-modal learning, Prototype, Real-world application
dc.title	A Study on the Application of Multi-modal Learning for Real-World Challenges using Prototype
dc.type	Thesis
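
A note on the prototype mechanism referenced in the abstract: CPG-AE is described as replacing dense cell–cell interactions with a sparse cell–prototype graph and prototype-mediated message passing, but this record does not specify the mechanics. The NumPy sketch below illustrates that general idea under stated assumptions rather than the dissertation's actual implementation: the distance-based soft assignment, the temperature tau, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def prototype_message_passing(X, P, tau=1.0):
    """One round of prototype-mediated message passing (illustrative sketch).

    Routes information through K prototypes instead of a dense N x N
    cell-cell graph, so the graph has O(N*K) edges rather than O(N^2).

    X   : (N, d) array of cell (or item) embeddings
    P   : (K, d) array of prototype embeddings
    tau : softmax temperature (assumed hyperparameter)
    """
    # Soft-assign each cell to prototypes by negative squared distance.
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)   # (N, K)
    logits = -d2 / tau
    A = np.exp(logits - logits.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                          # rows sum to 1

    # Cells -> prototypes: each prototype aggregates its soft members.
    P_new = (A / (A.sum(axis=0, keepdims=True) + 1e-8)).T @ X  # (K, d)

    # Prototypes -> cells: each cell reads back a mixture of prototypes.
    X_new = A @ P_new                                          # (N, d)
    return X_new, P_new, A

# Toy usage: 1000 "cells" in a 16-d embedding, summarized by 8 prototypes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
P = rng.normal(size=(8, 16))
X_new, P_new, A = prototype_message_passing(X, P)
print(X_new.shape, P_new.shape, A.shape)   # (1000, 16) (8, 16) (1000, 8)
```

Routing messages through prototypes is what underwrites the abstract's scalability claim: the bipartite cell–prototype graph grows linearly in the number of cells, and the assignment matrix A doubles as a checkable rationale for which prototype "explains" each cell.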


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.