Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

2
citations
#1094
in ICCV 2025
of 2701 papers
7
Top Authors
8
Data Points

Abstract

The rapid advancements in Vision Language Models (VLMs) have prompted the development of multi-modal medical assistant systems. Despite this progress, current models still have inherent probabilistic uncertainties, often producing erroneous or unverified responses-an issue with serious implications in medical applications. Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. However, these training-dependent strategies are costly and still lack sufficient alignment with clinical expertise. To address these issues, we propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training. This framework introduces an uncertainty estimation strategy to identify unreliable outputs. It then retrieves relevant references to assist experts in highlighting key terms and applies classifier-free guidance to refine the token embeddings of MedVLM, ensuring that the adjusted outputs are correct and align with expert highlights. Evaluations across three medical visual question answering benchmarks demonstrate that the proposed Expert-CFG, with 4.2B parameters and limited expert annotations, outperforms state-of-the-art models with 13B parameters. The results demonstrate the feasibility of deploying such a system in resource-limited settings for clinical use.

Citation History

Jan 24, 2026
0
Jan 26, 2026
0
Jan 26, 2026
0
Jan 27, 2026
2+2
Feb 3, 2026
2
Feb 13, 2026
2
Feb 13, 2026
2
Feb 13, 2026
2