RDP-MRG: Imitating Radiologist’s Diagnosis Process for Enhanced Visual Grounding in Medical Report Generation

Author(s)
Kim, Jiwon
Advisor
Lim, Chiehyeon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/91052
http://unist.dcollection.net/common/orgView/200000965223
Abstract
Automated medical report generation (MRG) has attracted significant research interest for its potential to reduce workload and prevent diagnostic errors. Despite recent advances, generating accurate radiology reports remains challenging because existing models often fail to visually ground their output on clinically important regions, which is critical for practical application. We identify three key factors that make visual grounding particularly difficult in medical imaging: the deficiency of visual cues in medical images, the imbalance of disease distribution, and the decoder's inherent bias toward frequent findings, which leads it to prioritize common findings over clinically important ones. In this work, we propose RDP-MRG, a medical report generation framework that mimics the radiologist's diagnosis process. Our approach follows a coarse-to-fine diagnostic process composed of three integrated stages. First, at the macro-level diagnosis stage, the model localizes suspicious regions by amplifying subtle visual cues using anatomical and clinical knowledge (Visual Cue Amplification, VCA). Second, at the micro-level diagnosis stage, it identifies the corresponding organ and infers associated diseases for each localized region (Visual Cue Embodiment, VCE). Finally, the model explicitly uses the localized and inferred diagnostic information (lesions, organs, and diseases) as guidance to generate visually grounded reports (Visually Grounded Generation, VGG). We evaluate RDP-MRG on two benchmark datasets, MIMIC-CXR and IU-Xray. On MIMIC-CXR, our method achieves superior clinical accuracy among single-stage MRG models and performs comparably to, or even better than, two-stage MRG approaches. Furthermore, RDP-MRG establishes state-of-the-art zero-shot performance on IU-Xray, demonstrating strong cross-dataset generalizability. Extensive experimental results further confirm that our coarse-to-fine diagnostic framework effectively addresses the key challenges in medical report generation, resulting in improved visual grounding and clinical efficacy.
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Graduate School of Artificial Intelligence Artificial Intelligence

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.