Gestational diabetes mellitus (GDM) is highly heterogeneous, which makes personalized treatment challenging. We applied unsupervised machine learning to identify clinically relevant GDM clusters using data available in routine clinical practice from 2,682 women diagnosed with GDM (1,865 from Charité University Hospital in Berlin and 817 from the Medical University of Vienna). We compared several clustering algorithms, including k-means, k-medoids, and hierarchical clustering. Model selection was based on solution stability (Jaccard index), cluster compactness (silhouette score), and robustness assessed through two-fold cross-validation. External validation on independent cohorts was used to evaluate the generalizability of the identified clusters. The final model identified three distinct clusters using maternal age, pre-pregnancy body mass index, and glucose levels from the diagnostic oral glucose tolerance test. Each cluster was associated with different treatment needs and neonatal outcomes. These findings provide valuable clinical decision support that could help clinicians define more personalized treatment approaches and thereby improve both maternal and neonatal outcomes.
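The stability criterion mentioned above can be illustrated with a minimal sketch. It assumes one common way of using the Jaccard index for this purpose: each cluster of a reference solution is compared with the clusters obtained on a resampled or held-out fit, and the best overlap is kept. The function names and the toy labels are hypothetical and chosen for illustration only; they are not taken from the study, and the study's actual procedure may differ.

```python
from typing import Dict, Hashable, Sequence, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clusters_as_sets(ids: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Group sample ids by their cluster label."""
    groups: Dict[Hashable, Set[Hashable]] = {}
    for sample_id, label in zip(ids, labels):
        groups.setdefault(label, set()).add(sample_id)
    return groups


def clusterwise_stability(ref_ids, ref_labels, new_ids, new_labels):
    """For each reference cluster, return its best Jaccard match among the
    clusters of the second solution, using only samples present in both."""
    shared = set(ref_ids) & set(new_ids)
    ref = clusters_as_sets(ref_ids, ref_labels)
    new = clusters_as_sets(new_ids, new_labels)
    return {
        label: max((jaccard(members & shared, other & shared)
                    for other in new.values()), default=0.0)
        for label, members in ref.items()
    }


# Toy example (hypothetical labels): six samples clustered twice.
# Label names need not agree between solutions; only memberships matter.
ids = [0, 1, 2, 3, 4, 5]
reference_labels = [0, 0, 1, 1, 2, 2]
resampled_labels = ["b", "b", "a", "c", "c", "c"]

stability = clusterwise_stability(ids, reference_labels, ids, resampled_labels)
for label, score in sorted(stability.items()):
    print(f"cluster {label}: Jaccard stability = {score:.2f}")
```

In this toy case the first cluster is recovered exactly, while the other two are only partially recovered. In practice such scores would typically be averaged over many resamples before comparing candidate models.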
Identification and Validation of Gestational Diabetes Subgroups: A Data-Driven Approach