Title: Benchmark Pipeline for Missing Data Imputation in Oncological Electronic Health Records

Conference: IEEE CBMS 2026

Tags: Breast Cancer, Data Quality, Electronic Health Records, Generative Models, Imputation and Missing Data

Abstract: Cancer-related Electronic Health Records (EHRs) integrate structured clinical variables, such as diagnoses, prescribed medications, laboratory measurements, and procedures, with detailed longitudinal data documenting clinical visit histories. While this heterogeneity enables large-scale observational research, workflow-driven missingness and inconsistent data capture complicate preprocessing and integration, potentially biasing statistical inference, survival modeling, and patient stratification. Robust and reproducible imputation strategies are therefore essential for reliable oncology analyses. In this work, we evaluate whether a standardized preprocessing pipeline, combining data harmonization, variable selection, outlier detection, and imputation, improves robustness in survival-oriented oncology studies. We systematically compare statistical approaches (Multiple Imputation by Chained Equations), machine learning methods (MissForest and k-nearest neighbors), and deep generative models (Variational Autoencoders and Generative Adversarial Imputation Networks). Experiments are conducted on a real-world cohort of breast cancer patients assembled from four hospital registries at Institut Paoli-Calmettes in Marseille, France, characterized by substantial and heterogeneous missingness. We assess reconstruction performance across methods and analyze how imputation choices propagate to downstream survival modeling and patient stratification. Our findings provide practical guidance for selecting imputation strategies in real-world cancer EHR studies, highlighting trade-offs between reconstruction accuracy and clinical validity while promoting reproducible analytical workflows.
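The abstract does not specify how reconstruction performance is measured. A common protocol, sketched here purely as an illustration, is to hide a fraction of the observed values, impute them, and score the imputed values against the hidden ground truth. Everything in the sketch is an assumption rather than the authors' pipeline: the synthetic data, the completely-at-random masking, the RMSE metric, and the function names are all hypothetical, and the column-mean and simplified k-nearest-neighbor imputers merely stand in for the methods the paper compares.

```python
import numpy as np

def mask_completely_at_random(X, rate, rng):
    """Hide a random fraction of observed entries; return masked copy and mask."""
    observed = ~np.isnan(X)
    hide = observed & (rng.random(X.shape) < rate)
    X_masked = X.copy()
    X_masked[hide] = np.nan
    return X_masked, hide

def impute_mean(X):
    """Baseline: replace each missing entry with its column mean."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def impute_knn(X, k=5):
    """Simplified kNN imputation (illustrative, not a library implementation).

    Distances are mean squared differences over columns observed in both rows;
    each missing entry is the average over those of the k nearest rows that
    observe the column, falling back to the column mean if none do.
    """
    fallback = impute_mean(X)
    missing = np.isnan(X)
    out = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        both = ~missing[i] & ~missing            # columns observed in both rows
        diff = np.where(both, X - X[i], 0.0)
        counts = both.sum(axis=1)
        dist = np.where(counts > 0,
                        (diff ** 2).sum(axis=1) / np.maximum(counts, 1),
                        np.inf)
        dist[i] = np.inf                         # a row is not its own neighbor
        neighbours = np.argsort(dist)[:k]
        for j in np.flatnonzero(missing[i]):
            donors = [n for n in neighbours
                      if not missing[n, j] and np.isfinite(dist[n])]
            out[i, j] = X[donors, j].mean() if donors else fallback[i, j]
    return out

def rmse(X_true, X_imputed, hide):
    """Root mean squared error restricted to the artificially hidden entries."""
    return float(np.sqrt(np.mean((X_true[hide] - X_imputed[hide]) ** 2)))

# Synthetic, correlated data standing in for numeric clinical variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X_true = latent + 0.1 * rng.normal(size=(200, 4))
X_masked, hide = mask_completely_at_random(X_true, rate=0.2, rng=rng)

for name, imputer in [("mean", impute_mean), ("knn", impute_knn)]:
    print(name, rmse(X_true, imputer(X_masked), hide))
```

Real EHR missingness is workflow-driven rather than completely at random, so a benchmark of this kind would normally vary the masking mechanism as well as the missingness rate, and would complement reconstruction error with downstream checks such as the survival analyses the abstract mentions.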
