Title: Evaluating Schema-Awareness in Transformer Models for Synthetic Electronic Health Records

Authors: Amanda Bertgren, Fredrik Öhberg, Paolo Soda, Ulf Näslund, Patrik Wennberg and Christer Grönlund

Conference: IEEE CBMS 2026

Tags: electronic health records, synthetic data and transformer model

Abstract: Synthetic data can facilitate sharing of electronic health records whilst preserving the privacy of the individuals represented in the original dataset. However, conventional tabular models for synthetic data generation (SDG) are structurally limited by data quality and are inflexible with respect to pre-training when the dataset is small. Transformer models have the potential to overcome these challenges, but how to optimise their adaptation to SDG is still largely unexplored. Therefore, we evaluated the effect of schema-awareness embedded in weights and prompts in four experiments using a Phi-2 model fine-tuned on a cardiovascular disease risk dataset, in comparison to a CTGAN model. Our results showed that schema-awareness did not have a notable effect on synthetic data generation, whether embedded in weights or in prompts. However, all experiments based on the Phi-2 model achieved higher fidelity for numerical variables than CTGAN. All models performed similarly in terms of privacy preservation. Future studies should explore whether schema-awareness has an effect when the model is pre-trained on multiple datasets, and whether more or other types of schematic information affect SDG performance.
