File Download

There are no files associated with this item.

Full metadata record

DC Field Value Language
dc.contributor.advisor Lim, Chiehyeon -
dc.contributor.author Kim, Jiwon -
dc.date.accessioned 2026-03-26T22:15:16Z -
dc.date.available 2026-03-26T22:15:16Z -
dc.date.issued 2026-02 -
dc.description.abstract Automated medical report generation (MRG) has gained significant research value for its potential to reduce workload and prevent diagnostic errors. Despite recent advances, generating accurate radiology reports remains challenging, as existing models often struggle to ground their output visually in clinically important regions, which is critical for practical application. We identify three key factors that make visual grounding particularly difficult in medical imaging: a deficiency of visual cues in medical images, an imbalanced disease distribution, and the decoder's inherent frequency bias, which tends to prioritize common findings over clinically important ones. In this work, we propose RDP-MRG, a medical report generation framework that mimics a radiologist's diagnostic process. Our approach follows a coarse-to-fine diagnostic process composed of three integrated stages. First, at the macro-level diagnosis stage, the model localizes suspicious regions by amplifying subtle visual cues using anatomical and clinical knowledge (Visual Cue Amplification, VCA). Second, at the micro-level diagnosis stage, it identifies the corresponding organ and infers associated diseases for each localized region (Visual Cue Embodiment, VCE). Finally, the model explicitly leverages the localized and inferred diagnostic information (lesions, organs, and diseases) as guidance to generate visually grounded reports (Visually Grounded Generation, VGG). We evaluate RDP-MRG on two benchmark datasets, MIMIC-CXR and IU-Xray. On MIMIC-CXR, our method achieves superior clinical accuracy among single-stage MRG models and attains performance comparable to, or even exceeding, that of two-stage MRG approaches. Furthermore, RDP-MRG establishes state-of-the-art zero-shot performance on IU-Xray, demonstrating strong cross-dataset generalizability. Extensive experimental results further confirm that our coarse-to-fine diagnostic framework effectively addresses the key challenges in medical report generation, resulting in improved visual grounding and clinical efficacy. -
dc.description.degree Master -
dc.description Graduate School of Artificial Intelligence Artificial Intelligence -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/91052 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000965223 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.rights.embargoReleaseDate 9999-12-31 -
dc.rights.embargoReleaseTerms 9999-12-31 -
dc.subject Quantum Dots, Ligand Exchange, Ligand, Surface modification, CdSe, InP -
dc.title RDP-MRG: Imitating Radiologist’s Diagnosis Process for Enhanced Visual Grounding in Medical Report Generation -
dc.type Thesis -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.