File Download

There are no files associated with this item.

Related Researcher

Lee, Jimin (이지민)
Radiation & Medical Intelligence Lab.

Detailed Information

Controllable Text-to-Image Synthesis for Multi-Modality MR Images

Author(s)
Kim, Kyuri; Na, Yoonho; Ye, Sung-Joon; Lee, Jimin; Ahn, Sung Soo; Park, Ji Eun; Kim, Hwiyoung
Issued Date
2024-01-03
DOI
10.1109/WACV57701.2024.00775
URI
https://scholarworks.unist.ac.kr/handle/201301/85118
Citation
2024 IEEE Winter Conference on Applications of Computer Vision (WACV 2024), pp. 7921-7930
Abstract
Generative modeling has seen significant advancements in recent years, especially in the realm of text-to-image synthesis. Despite this progress, the medical field has yet to fully leverage the capabilities of large-scale foundational models for synthetic data generation. This paper introduces a framework for text-conditional magnetic resonance (MR) imaging generation, addressing the complexities associated with multi-modality considerations. The framework comprises a pre-trained large language model, a diffusion-based prompt-conditional image generation architecture, and an additional denoising network for input structural binary masks. Experimental results demonstrate that the proposed framework is capable of generating realistic, high-resolution, and high-fidelity multi-modal MR images that align with medical language text prompts. Further, the study interprets the cross-attention maps of the generated results based on text-conditional statements. The contributions of this research lay a robust foundation for future studies in text-conditional medical image generation and hold significant promise for accelerating advancements in medical imaging research.
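
The abstract describes a three-part architecture: a pre-trained large language model for text encoding, a prompt-conditional diffusion denoiser, and an additional denoising path for structural binary masks. The sketch below illustrates, in PyTorch, one plausible way such conditioning interfaces could be wired: prompt embeddings enter via cross-attention, and the binary mask is concatenated with the noisy image along the channel axis. All class names (TinyTextEncoder, MaskConditionedDenoiser) are hypothetical placeholders for illustration only; this is not the authors' implementation, and the real framework uses a full pre-trained LLM and diffusion backbone rather than these toy modules.

    # Toy sketch of the conditioning flow, under the assumptions stated above.
    import torch
    import torch.nn as nn

    class TinyTextEncoder(nn.Module):
        """Stand-in for a pre-trained large language model (hypothetical)."""
        def __init__(self, vocab_size=1000, dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)

        def forward(self, token_ids):
            return self.embed(token_ids)  # (batch, seq_len, dim)

    class MaskConditionedDenoiser(nn.Module):
        """Stand-in for the prompt-conditional diffusion denoiser.
        The structural binary mask is concatenated with the noisy image
        along the channel axis; text conditioning enters via cross-attention."""
        def __init__(self, img_channels=1, dim=64):
            super().__init__()
            self.in_conv = nn.Conv2d(img_channels + 1, dim, 3, padding=1)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.out_conv = nn.Conv2d(dim, img_channels, 3, padding=1)

        def forward(self, noisy_img, mask, text_emb):
            h = self.in_conv(torch.cat([noisy_img, mask], dim=1))
            b, c, hh, ww = h.shape
            seq = h.flatten(2).transpose(1, 2)        # (b, H*W, c) as queries
            attn, _ = self.cross_attn(seq, text_emb, text_emb)
            h = (seq + attn).transpose(1, 2).reshape(b, c, hh, ww)
            return self.out_conv(h)                   # predicted noise

    # One illustrative denoising step (shapes only; no trained weights).
    enc, denoiser = TinyTextEncoder(), MaskConditionedDenoiser()
    tokens = torch.randint(0, 1000, (1, 8))           # fake prompt token ids
    noisy = torch.randn(1, 1, 64, 64)                 # noisy MR slice
    mask = torch.zeros(1, 1, 64, 64)                  # structural binary mask
    eps_hat = denoiser(noisy, mask, enc(tokens))
    print(eps_hat.shape)                              # torch.Size([1, 1, 64, 64])

In a full diffusion pipeline, a step like this would run once per timestep inside the reverse sampling loop, with the predicted noise used to update the image estimate.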
Publisher
Institute of Electrical and Electronics Engineers Inc.


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.