Verbalized Uncertainty in Medical AI: Differential Diagnosis in Commercial LLMs

Title: Verbalized Uncertainty in Medical AI: Differential Diagnosis in Commercial LLMs

Authors: Yann Hombria Gawior, Steff Groefsema, Omer Tarik Ozyilmaz and Matias Valdenegro-Toro

Conference: IEEE CBMS 2026

Tags: large language models, medical AI, uncertainty estimation

Abstract:

Large Language Models (LLMs) have revolutionized large-scale data processing in healthcare settings, making diagnostic models more efficient and more readily available. Differential diagnoses can be generated freely, and concerned patients bring them into the clinic. However, many biases are present, and little is known about the relationship between a model's correctness and the confidence associated with its predictions. The current study analyzed three LLMs built for different purposes with respect to this relationship and visualized the calibration of medical LLMs. Separate sex-, age-, and pathology-stratified analyses were also performed to evaluate possible biases. Our results indicate that calibration shifts from overconfidence to underconfidence when medical LLMs are prompted for the five most likely diagnoses (top-5) instead of a single prediction. Moreover, we found no biases across sex or age groups, while a bias might exist for specific pathologies. We show that robust evaluation is key to trust in these medical LLMs and that more information is required before clinical adoption.
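
The abstract does not say how calibration was quantified. As an illustration only, the sketch below computes one common summary, the expected calibration error (ECE), together with a signed gap whose sign separates overconfidence from underconfidence. It assumes each verbalized confidence has been converted to a number in [0, 1] and that each case is marked correct when the true diagnosis matches the prediction (or appears in the top-5 list). The function name `calibration_gap` and the toy values are hypothetical and are not taken from the paper.

```python
def calibration_gap(confidences, correct, n_bins=10):
    """Return (ece, signed_gap) for confidences in [0, 1].

    ece        : bin-weighted mean of |mean confidence - accuracy|
    signed_gap : bin-weighted mean of (mean confidence - accuracy);
                 positive suggests overconfidence, negative underconfidence.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # 1.0 goes in the last bin
        bins[idx].append((conf, float(hit)))

    total = len(confidences)
    ece = 0.0
    signed_gap = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        weight = len(members) / total
        ece += weight * abs(mean_conf - accuracy)
        signed_gap += weight * (mean_conf - accuracy)
    return ece, signed_gap


# Toy values for illustration only (not results from the study).
single_conf = [0.9, 0.9, 0.9, 0.9]   # high stated confidence
single_hit = [1, 0, 1, 0]            # but only half the cases correct
print(calibration_gap(single_conf, single_hit))
```

A positive signed gap means the stated confidence exceeds the observed accuracy (overconfidence), and a negative one means the opposite. Running the same function on subsets of cases, for example split by sex, age group, or pathology, would give a rough stratified comparison under the same assumptions.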