RDP-MRG: Imitating Radiologist’s Diagnosis Process for Enhanced Visual Grounding in Medical Report Generation

Kim, Jiwon



Author(s): Kim, Jiwon

Advisor: Lim, Chiehyeon

Issued Date: 2026-02

URI: https://scholarworks.unist.ac.kr/handle/201301/91052 http://unist.dcollection.net/common/orgView/200000965223

Abstract: Automated medical report generation (MRG) has attracted significant research interest for its potential to reduce workload and prevent diagnostic errors. Despite recent advances, generating accurate radiology reports remains challenging, as existing models often struggle to visually ground their output in clinically important regions, which is critical for practical application. We identify three key factors that make visual grounding particularly difficult in medical imaging: the deficiency of visual cues in medical images, the imbalance of the disease distribution, and the inherent frequency bias of the decoder, which tends to prioritize common findings over clinically important ones. In this work, we propose RDP-MRG, a medical report generation framework that mimics the radiologist's diagnostic process. Our approach follows a coarse-to-fine diagnostic process composed of three integrated stages. First, at the macro-level diagnosis stage, the model localizes suspicious regions by amplifying subtle visual cues using anatomical and clinical knowledge (Visual Cue Amplification, VCA). Second, at the micro-level diagnosis stage, it identifies the corresponding organ and infers the associated diseases for each localized region (Visual Cue Embodiment, VCE). Finally, the model explicitly uses the localized and inferred diagnostic information (lesions, organs, and diseases) as guidance to generate visually grounded reports (Visually Grounded Generation, VGG). We evaluate RDP-MRG on two benchmark datasets, MIMIC-CXR and IU-Xray. On MIMIC-CXR, our method achieves superior clinical accuracy among single-stage MRG models and attains performance comparable to, or even exceeding, that of two-stage MRG approaches. Furthermore, RDP-MRG establishes state-of-the-art zero-shot performance on IU-Xray, demonstrating strong cross-dataset generalizability. Extensive experimental results further confirm that our coarse-to-fine diagnostic framework effectively addresses the key challenges in medical report generation, resulting in improved visual grounding and clinical efficacy.
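
The three stages named in the abstract can be read as a simple data flow: regions first, then an organ and diseases per region, then a report conditioned on both. The sketch below only illustrates that ordering and is not the thesis implementation. Every name in it (`Finding`, `run_pipeline`, the `toy_*` functions, the box format) is a hypothetical placeholder, and the stand-in stages return fixed values instead of running any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical region format: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Finding:
    """Diagnostic information attached to one localized region (illustrative)."""
    region: Box
    organ: str = ""
    diseases: List[str] = field(default_factory=list)


def run_pipeline(
    image: object,
    localize: Callable[[object], List[Box]],
    embody: Callable[[object, Box], Finding],
    generate: Callable[[object, List[Finding]], str],
) -> str:
    # Stage 1, macro level (VCA in the abstract): propose suspicious regions.
    regions = localize(image)
    # Stage 2, micro level (VCE): attach an organ and diseases to each region.
    findings = [embody(image, region) for region in regions]
    # Stage 3 (VGG): generate the report with the findings as guidance.
    return generate(image, findings)


# Toy stand-ins so the sketch runs end to end; real stages would be learned models.
def toy_localize(image: object) -> List[Box]:
    return [(10, 20, 50, 60)]


def toy_embody(image: object, region: Box) -> Finding:
    return Finding(region=region, organ="left lung", diseases=["opacity"])


def toy_generate(image: object, findings: List[Finding]) -> str:
    return " ".join(
        f"{', '.join(item.diseases)} in the {item.organ}." for item in findings
    )


report = run_pipeline(None, toy_localize, toy_embody, toy_generate)
print(report)
```

Passing the stages in as callables keeps the ordering explicit while leaving each stage free to be replaced; how the actual model couples the stages during training is not stated in the abstract.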

Publisher: Ulsan National Institute of Science and Technology

Degree: Master

Major: Graduate School of Artificial Intelligence, Artificial Intelligence
