| dc.description.abstract |
Automated medical report generation (MRG) has gained significant research value for its potential to reduce workload and prevent diagnostic errors. Despite recent advances, generating accurate radiology reports remains challenging, as existing models often struggle to visually ground their outputs in clinically important regions, which is critical for practical application. We identify three key factors that make visual grounding particularly difficult in medical imaging: the deficiency of visual cues in medical images, the imbalance of disease distribution, and the inherent frequency bias of the decoder, which tends to prioritize common findings over clinically important ones. In this work, we propose RDP-MRG, a medical report generation framework that mimics the radiologist's diagnostic process. Our approach follows a coarse-to-fine diagnostic process composed of three integrated stages. First, the model localizes suspicious regions at the macro-level diagnosis stage by amplifying subtle visual cues using anatomical and clinical knowledge (Visual Cue Amplification, VCA). Second, it identifies the corresponding organ and infers associated diseases for each localized region at the micro-level diagnosis stage (Visual Cue Embodiment, VCE). Finally, the model explicitly leverages the localized and inferred diagnostic information—lesions, organs, and diseases—as guidance to generate visually grounded reports (Visually Grounded Generation, VGG). We evaluate RDP-MRG on two benchmark datasets, MIMIC-CXR and IU-Xray. On MIMIC-CXR, our method achieves superior clinical accuracy among single-stage MRG models and attains performance comparable to or even exceeding that of two-stage MRG approaches. Furthermore, RDP-MRG establishes state-of-the-art zero-shot performance on IU-Xray, demonstrating strong cross-dataset generalizability.
Extensive experimental results further confirm that our coarse-to-fine diagnostic framework effectively addresses the key challenges in medical report generation, resulting in improved visual grounding and clinical efficacy. |