Is fine-tuning useful in EHR-based prediction models? A use case on mortality prediction with longitudinal data from Spanish (SIDIAP) and UK (CPRD) populations aged over 65 years

Authors: Lucía Amalia Carrasco-Ribelles, Margarita Cabrera-Bean, Albert Prats-Uribe, Sara Khalid and Concepción Violán

Conference: IEEE CBMS 2025

Tags: all cause mortality, Attention mechanisms, clinical utility, decision curve analysis, EHR, electronic health record, External validation, Fine-tuning, Longitudinal data, Mortality, net benefit, office for national statistics, Recurrent neural networks and transfer learning

Abstract:

Transfer learning enables the reuse of models trained on large datasets, reducing data collection, computation time, and costs. While widely used in computer vision, its application to models based on electronic health records (EHRs) remains limited. This study evaluates whether fine-tuning an EHR-based model trained in one country on data from another country outperforms training a model from scratch.

EHR data from the SIDIAP (Spain) and CPRD (UK) databases were used to define, in each country, a cohort of individuals aged 65 or older followed between 2010 and 2019. For each country, a prediction model was trained and internally validated to predict 1-year mortality (country-specific model), then externally validated on and fine-tuned with the other country’s population (re-calibrated model). The models were based on ARIADNEhr, a previously validated architecture. Performance metrics, decision curve analysis, and attention maps were compared.

The cohorts included 1,456,052 participants from SIDIAP and 1,507,736 from CPRD, with similar demographics. Performance on the external cohort varied between -10.9% and +39.5%. Fine-tuning consistently improved external performance (1.8%–15.5%), enhanced model calibration and clinical utility, and preserved the key contributing variables. However, the fine-tuned models did not reach the performance of the country-specific models, showing a performance drop of between 14% and 20%. Fine-tuning may be useful in other fields, but further development may be required before it can be applied to tabular EHR-based prediction models.
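
For readers unfamiliar with decision curve analysis, the clinical-utility comparison mentioned above rests on the net benefit of a model at a chosen risk threshold. The following is a minimal sketch of the standard net benefit formula, not the authors' implementation; the function names and the toy labels and probabilities are illustrative assumptions and are not taken from the SIDIAP or CPRD cohorts.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold.

    Standard definition: TP/n - FP/n * threshold / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among those flagged as high risk.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every individual regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = died within one year), for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.1, 0.3, 0.6, 0.35, 0.05, 0.15]

for t in (0.2, 0.5):
    print(t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t))
```

A decision curve plots these values across a range of thresholds; a model is considered clinically useful at a threshold where its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.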