Fine-Tuning Transformer Models for Structuring Spanish Psychiatric Clinical Notes

Tags: BERT, Clinical notes, Deep Learning, NER, Psychiatry

Abstract:
The unstructured nature of psychiatric clinical notes poses a significant challenge for automated information extraction and data structuring. In this study, we explore the use of transformer-based language models to perform Named Entity Recognition (NER) on de-identified Spanish electronic health records (EHRs) provided by the Psychiatry Service of the Complejo Asistencial Universitario de León (CAULE). A manually annotated gold standard of 200 clinical notes was developed by domain experts to evaluate the performance of five models: BETO (cased and uncased), ALBETO, ClinicalBERT, and Bio_ClinicalBERT. Each model was fine-tuned and assessed under a strict exact-match criterion across six clinically relevant label types. Results show that ClinicalBERT, despite being pre-trained on English medical corpora, achieved the highest macro-average F1-score on the test set (80%). However, BETO-cased outperformed ClinicalBERT in four of the six label types, performing better in categories with higher syntactic variability. Lower-performing models, such as ALBETO and Bio_ClinicalBERT, struggled to generalize to Spanish psychiatric language, likely due to domain and language mismatches. This work highlights the effectiveness of transformer-based architectures for structuring psychiatric narratives in Spanish and provides a robust foundation for future clinical NLP applications in non-English contexts.
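For concreteness, the sketch below illustrates the kind of fine-tuning setup the abstract describes, using Hugging Face Transformers with the public BETO-cased checkpoint (`dccuchile/bert-base-spanish-wwm-cased`). This is not the authors' released code, and the BIO label list and example sentence are placeholders: the abstract does not enumerate the six label types.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_NAME = "dccuchile/bert-base-spanish-wwm-cased"  # public BETO (cased) checkpoint
LABELS = ["O", "B-ENT", "I-ENT"]  # placeholder BIO tags, not the paper's six label types
label2id = {lab: i for i, lab in enumerate(LABELS)}
id2label = {i: lab for lab, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS), id2label=id2label, label2id=label2id
)

def encode(words, word_tags):
    """Tokenize a pre-split sentence and align word-level BIO tags
    to subword pieces; -100 marks positions the loss should ignore."""
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    prev, aligned = None, []
    for wid in enc.word_ids():
        if wid is None:          # special tokens ([CLS], [SEP])
            aligned.append(-100)
        elif wid != prev:        # first subword keeps the word's tag
            aligned.append(label2id[word_tags[wid]])
        else:                    # continuation subwords are masked
            aligned.append(-100)
        prev = wid
    enc["labels"] = aligned
    return enc

# Hypothetical annotated sentence ("Patient with anxiety")
features = encode(["Paciente", "con", "ansiedad"], ["O", "O", "B-ENT"])
```

Features encoded this way feed a standard token-classification training loop; the strict exact-match evaluation the abstract mentions corresponds to entity-level precision, recall, and F1, as computed for example by the seqeval library.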