Thyroid disorders, which impair the function of the thyroid gland, pose a substantial health challenge worldwide. Current research on thyroid disease emphasizes prediction with machine learning (ML) and deep learning (DL) algorithms, with particular interest in ensemble models. However, existing thyroid disease prediction models often struggle with imbalanced datasets, limited data sizes, and inadequate feature selection. This paper introduces a stacking model for predicting thyroid disease. The model uses Extreme Gradient Boosting (XGBoost) and a Multilayer Perceptron (MLP) as base models. These are integrated into a stacking framework in which Logistic Regression (LR) serves as the meta-model, aggregating the predictions of the base models to generate the final predictions. A feature selection technique is applied to identify significant features. In addition, SMOTE-ENN, a hybrid sampling technique, is incorporated to address class imbalance in the thyroid dataset, improving the model’s ability to learn effectively from both classes. SHapley Additive exPlanations (SHAP) are used to improve the interpretability and transparency of the proposed model. The proposed stacking model achieved an accuracy of 99.78%. The stacking technique combined the complementary strengths of XGBoost and MLP through the LR meta-model to boost predictive accuracy. Beyond improving prediction accuracy, the proposed model offered insight into the patterns and relationships among the features associated with thyroid disease.
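The stacking arrangement described above (two base models whose predictions feed an LR meta-model) can be sketched with scikit-learn's `StackingClassifier`. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example depends on a single library, all hyperparameters are arbitrary, and the feature selection, SMOTE-ENN resampling, and SHAP steps are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: NOT the thyroid dataset).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base models. GradientBoostingClassifier is a stand-in for XGBoost (assumption);
# the MLP is wrapped with a scaler because it is sensitive to feature scale.
base_models = [
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    )),
]

# Logistic Regression meta-model, trained on out-of-fold base-model
# probabilities produced by 5-fold cross-validation.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Training the meta-model on out-of-fold predictions (`cv=5`) keeps it from seeing base-model outputs on data those models were fitted to, which is the usual reason to prefer stacking over simply averaging in-sample predictions. The accuracy printed here reflects the synthetic data only and says nothing about the 99.78% figure reported for the thyroid dataset.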
ThyroStack: a Stacking Model for Thyroid Disease Prediction