Controllable Text-to-Image Synthesis for Multi-Modality MR Images

Kim, Kyuri; Na, Yoonho; Ye, Sung-Joon; Lee, Jimin; Ahn, Sung Soo; Eun Park, Ji; Kim, Hwiyoung

doi:10.1109/WACV57701.2024.00775

Scholarworks@UNIST

Related Researcher

Lee, Jimin (이지민): Radiation & Medical Intelligence Lab.

Full metadata record

DC Field	Value	Language
dc.citation.conferencePlace	US	-
dc.citation.endPage	7930	-
dc.citation.startPage	7921	-
dc.citation.title	2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024	-
dc.contributor.author	Kim, Kyuri	-
dc.contributor.author	Na, Yoonho	-
dc.contributor.author	Ye, Sung-Joon	-
dc.contributor.author	Lee, Jimin	-
dc.contributor.author	Ahn, Sung Soo	-
dc.contributor.author	Eun Park, Ji	-
dc.contributor.author	Kim, Hwiyoung	-
dc.date.accessioned	2024-12-20T09:35:12Z	-
dc.date.available	2024-12-20T09:35:12Z	-
dc.date.created	2024-12-19	-
dc.date.issued	2024-01-03	-
dc.description.abstract	Generative modeling has seen significant advancements in recent years, especially in the realm of text-to-image synthesis. Despite this progress, the medical field has yet to fully leverage the capabilities of large-scale foundational models for synthetic data generation. This paper introduces a framework for text-conditional magnetic resonance (MR) imaging generation, addressing the complexities associated with multi-modality considerations. The framework comprises a pre-trained large language model, a diffusion-based prompt-conditional image generation architecture, and an additional denoising network for input structural binary masks. Experimental results demonstrate that the proposed framework is capable of generating realistic, high-resolution, and high-fidelity multi-modal MR images that align with medical language text prompts. Further, the study interprets the cross-attention maps of the generated results based on text-conditional statements. The contributions of this research lay a robust foundation for future studies in text-conditional medical image generation and hold significant promise for accelerating advancements in medical imaging research.	-
dc.identifier.bibliographicCitation	2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024, pp.7921 - 7930	-
dc.identifier.doi	10.1109/WACV57701.2024.00775	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/85118	-
dc.language	English	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.title	Controllable Text-to-Image Synthesis for Multi-Modality MR Images	-
dc.type	Conference Paper	-
dc.date.conferenceDate	2024-01-04	-
