Predicting Shortened Pregnancy: XGBoost Outperforms in sPTB Risk Assessment
Predictive Models for Spontaneous Preterm Birth (sPTB)
In this study, we analyzed data from 3,082 pregnant women at our main hospital and 864 from the Chengxi branch. We developed predictive models for spontaneous preterm birth (sPTB) using five machine learning algorithms. The XGBoost model showed the best performance among these.
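The study reports model performance as area under the ROC curve (AUC). As a rough illustration of how candidate models can be ranked on that metric, the sketch below computes AUC from its rank-based definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and picks the highest-scoring model. The labels, scores, and model names are invented toy values, not study data, and the function name `roc_auc` is an assumption made for this example.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (hypothetical values): 1 = sPTB, 0 = term birth.
labels = [0, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}

aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

In practice the scores would be predicted probabilities on a held-out set, and a library routine would normally replace the hand-written function.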
We identified several key factors that influence sPTB. The main factors include alkaline phosphatase (ALP), alpha-fetoprotein (AFP), albumin (ALB), hematocrit (HCT), total cholesterol (TC), diastolic blood pressure (DBP), alanine aminotransferase (ALT), platelet count (PLT), height, and systolic blood pressure (SBP).
Statistical Findings
Of the 44 indicators evaluated, 24 differed significantly between the groups. We ran a collinearity analysis on these 24 indicators and excluded those that were highly correlated. Using the remaining indicators, we built five models: AdaBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Gradient Boosting (GB). The XGBoost model achieved an area under the curve (AUC) of 0.89, indicating strong discriminative ability.
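The collinearity screening can be pictured with a small sketch. The study does not state which correlation measure or cutoff it used, so the Pearson coefficient, the 0.8 threshold, the greedy keep-first strategy, and the marker names below are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def drop_collinear(columns, threshold=0.8):
    """Greedy filter: keep a feature only if |r| with every already-kept feature is <= threshold.

    `columns` maps feature name -> list of values; the threshold is an assumed cutoff.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: marker_b is almost a linear function of marker_a.
toy = {
    "marker_a": [1, 2, 3, 4, 5],
    "marker_b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "marker_c": [3, 1, 4, 1, 5],
}

selected = drop_collinear(toy)
print(selected)
```

Which member of a correlated pair is kept depends on column order here; a real analysis might instead keep the indicator with the stronger clinical rationale or univariate association.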
Role of XGBoost in Medicine
XGBoost is increasingly used in medical diagnostics because it is efficient, flexible, and able to handle large datasets. Researchers have applied it to risk prediction for a range of conditions, including heart disease and cancer.
Conclusion and Clinical Implications
Combining key laboratory parameters such as ALP and AFP with algorithms such as XGBoost can improve the prediction of sPTB. We recommend using the XGBoost model in clinical practice alongside ultrasound and clinical data, which should further improve predictive accuracy.
Study Limitations
Our study has limitations. All data came from a single institution (the main hospital and its Chengxi branch), which may limit how broadly the findings apply. We also did not include ultrasound data, notably cervical length, which is an important indicator of preterm birth risk.
To address these limitations, we plan to gather data from multiple centers across different regions so that the cohort covers more diverse patient populations. Future studies should also increase sample sizes and use prospective designs to validate our findings, and should consider factors such as ethnicity and socioeconomic status to improve the predictive model’s performance.
During data collection, we focused on a group of patients with uncomplicated pregnancies. While this helps maintain consistency, it may limit the generalizability of the findings.
In summary, this study highlights the potential of machine learning in predicting sPTB and emphasizes the need for comprehensive data to improve predictive accuracy.
