Tags: Infant Mortality, Logistic Regression, Random Oversampling
Abstract:
The objective of the present study was to investigate characteristics of the mother, pregnancy, childbirth, and unborn child as possible risk and/or protective factors for infant mortality in a large dataset, using multiple logistic regression models and data balancing techniques.

Material and methods: This cross-sectional study investigated the mortality of children born in hospital units from singleton pregnancies and without congenital anomalies, using 167,928 birth records from the state of Rio de Janeiro in 2019 (DATASUS SIM and SINASC databases). Because the outcome (mortality) was rare relative to survival, the Random Oversampling (RO) data balancing technique was used to generate a dataset better suited for modelling. RO is a resampling method that adds records to the minority class by duplicating randomly selected existing minority-class records. The data were divided into training and test sets. Model performance was evaluated using AUC, Recall, Precision, F1-Score, and Accuracy. A final model was obtained by discarding all non-significant variables from the initial model fitted to the data.

Results: The performance measures of the final model (AUC: 0.85, Recall: 0.70, Precision: 0.05, F1-Score: 0.09, Accuracy: 0.91) indicated a good model fit. Apgar score, newborn birth weight, and gestational age were the variables with the largest impact on the outcome.

Conclusion: The results of this work may provide insights to inform public policy actions aimed at reducing infant mortality rates.
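A minimal sketch of random oversampling, using only the Python standard library, is below. It assumes a binary outcome coded 1 for death and 0 otherwise, features stored as a list of rows, and a 70/30 train/test split. None of these details come from the study, and the function names `split_train_test` and `random_oversample` are hypothetical. Oversampling is applied here only to the training split, a common precaution (assumed, not stated in the abstract) so that duplicated records cannot appear in both the training and test sets.

```python
import random
from collections import Counter


def split_train_test(X, y, test_fraction=0.3, seed=0):
    """Randomly split rows into training and test sets (70/30 by default, an assumption)."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size.

    Assumes a binary outcome; minority_label=1 (death) is an assumed coding.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority_idx)
    n_extra = n_majority - len(minority_idx)
    if not minority_idx or n_extra <= 0:
        return list(X), list(y)  # nothing to balance
    # Sample minority rows with replacement and append the copies.
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


if __name__ == "__main__":
    # Toy data (hypothetical): 1,000 records with a 2% positive outcome.
    X = [[i] for i in range(1000)]
    y = [1 if i % 50 == 0 else 0 for i in range(1000)]
    X_tr, y_tr, X_te, y_te = split_train_test(X, y)
    X_bal, y_bal = random_oversample(X_tr, y_tr)
    print("training before:", Counter(y_tr))
    print("training after:", Counter(y_bal))
    print("test (untouched):", Counter(y_te))
```

Because the added rows are exact copies, RO contributes no new information about the minority class; it only increases that class's weight during model fitting. The imbalanced-learn package offers the same idea as `imblearn.over_sampling.RandomOverSampler`.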
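The reported metrics are all derived from the confusion matrix. The sketch below uses hypothetical counts (a test set of 10,000 births with 60 deaths, not the study's data) chosen so that recall is 0.70 and precision is 0.05. It illustrates how accuracy can stay high while precision is low when the outcome is rare, and checks that an F1-Score of 0.09 follows from the reported precision and recall. The names `Confusion`, `metrics`, and `f1_from` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # deaths correctly predicted
    fp: int  # survivors predicted as deaths
    fn: int  # deaths missed by the model
    tn: int  # survivors correctly predicted


def metrics(c: Confusion) -> dict:
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 test births, 60 deaths (illustration only).
    example = Confusion(tp=42, fp=798, fn=18, tn=9142)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.4f}")
    # prints precision 0.0500, recall 0.7000, f1 0.0933, accuracy 0.9184
    print(f"F1 from reported precision and recall: {f1_from(0.05, 0.70):.2f}")
    # prints 0.09
```

With 9,940 survivors in this hypothetical test set, the 798 false positives barely dent accuracy but dominate the predicted-positive group, which is why precision falls to 0.05.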
Infant Mortality Prediction in the State of Rio de Janeiro Based on Maternal, Pregnancy, Delivery and Unborn Child Characteristics