Tags: Clinical Data, Machine Learning, Pancreatic Cancer, SHAP, Survival Prediction
Abstract:
Pancreatic cancer remains one of the most aggressive malignancies, with low survival rates and significant variability in patient outcomes. This study evaluates three machine learning models (Random Forest, Decision Tree, and XGBoost) for predicting patient survival at 3, 12, and 18 months, using data from the Complejo Asistencial Universitario de León (CAULE) Radiology Department. To analyze systematically how different features affect survival prediction, the dataset was structured into seven variable groups (G1-G7) incorporating demographic, clinical, and treatment-related information. To address the inherent class imbalance in survival prediction, an autoencoder-based synthetic data generation approach was applied to balance survival and non-survival cases across all timeframes. Hyperparameter tuning was performed, and experimental results indicate that Random Forest and XGBoost achieved comparable performance when trained on Group G7, both exceeding 81% accuracy at 3 months, 83% at 12 months, and 88% at 18 months. To enhance model interpretability, SHapley Additive exPlanations (SHAP) was applied to the best-performing model to identify the key factors influencing survival.
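To make the prediction task concrete, the following minimal sketch shows one way a survival time in months could be turned into binary labels at the 3-, 12-, and 18-month horizons, and how the class balance can shift from one horizon to the next. It is an illustration under stated assumptions rather than the study's actual preprocessing: the function names, the "survived at least h months" threshold, the absence of censored cases, and the toy survival times are all hypothetical.

```python
from collections import Counter

# Prediction horizons named in the abstract (months).
HORIZONS_MONTHS = (3, 12, 18)


def survival_labels(survival_months, horizons=HORIZONS_MONTHS):
    """Map each horizon to a list of binary labels.

    Assumption: a patient counts as a survivor (1) at horizon h if their
    recorded survival time is at least h months, and every survival time
    is fully observed (no censoring).
    """
    return {h: [int(m >= h) for m in survival_months] for h in horizons}


def class_balance(labels):
    """Return the fraction of positive (survivor) cases in a label list."""
    counts = Counter(labels)
    return counts[1] / len(labels)


# Made-up survival times in months, for illustration only (not study data).
toy_months = [1, 2, 4, 5, 7, 9, 11, 14, 20, 30]

labels = survival_labels(toy_months)
for h in HORIZONS_MONTHS:
    print(f"{h:>2} months: survivors fraction = {class_balance(labels[h]):.2f}")
```

In this toy cohort the survivor fraction falls from 0.80 at 3 months to 0.20 at 18 months, so the minority class is different at different horizons. That is the kind of imbalance a balancing step, such as the synthetic data generation described above, is meant to address separately for each timeframe.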
AI-Driven Survival Prediction in Pancreatic Cancer