Models for predicting preterm birth have historically focused on babies considered very preterm, born at 28 to 32 weeks, or moderate to late preterm, born at 32 to 37 weeks. Only a few studies have looked at those born extremely preterm, before 28 weeks of gestation, yet these infants account for the vast majority of newborn deaths.
Aware that knowledge saves lives, Vanderbilt University Medical Center bioinformatics specialist You Chen, Ph.D., is leading an initiative to provide clinicians with predictive tools for extreme preterm birth (EPB).
In a study published in the Journal of Biomedical Informatics, Chen and colleagues identified five top risk factors for EPB and reported high predictive accuracy. For this work, they employed an ensemble of recurrent neural networks (RNNs), a machine learning approach that combines predictions from multiple models.
“We found that we could predict preterm birth at 20 weeks gestational age with high accuracy,” Chen said. “This is eight weeks earlier than before.”
Now, he is training and refining this model to make its predictions more accurate and more widely applicable.
Addressing a Gap in the Literature
Chen says the few studies that have focused on risk factors for EPB and predicting EPB occurrence have included only small cohorts and a small number of risk factors. Additionally, they have demonstrated poor predictive performance.
“As a result, EPB prediction models proposed to date do not allow sufficient time for families or healthcare organizations to plan optimal care for the newborn,” Chen said.
Weighting Risk Factors
In contrast, the model developed by Chen and colleagues identified several primary risk factors as being significantly associated with EPB, including twin pregnancy, systemic lupus erythematosus, short cervical length, hypertensive disorder and use of hydroxychloroquine sulfate.
While the existence of some of these risks is not news, the model is unique in that it identifies both individual and aggregated risks that can influence prognosis cumulatively.
“If a patient does not have a high risk based on any one factor, an obstetrician may not expect a preterm birth,” he said. “But this model accounts for combinations of low and moderate risks that might put them at high risk in the aggregate.”
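The way several modest risks can add up is easiest to see with a small worked example. The sketch below is not the study's model: it swaps in a plain additive logistic model, and the factor names and weights are hypothetical values chosen only for illustration. The baseline is set to about 1 percent, in line with the EPB rate reported for the Vanderbilt cohort.

```python
import math

# Baseline log-odds for a risk of about 1 percent.
BASELINE_LOG_ODDS = math.log(0.01 / 0.99)

# Hypothetical log-odds weights. These are NOT coefficients from the study.
HYPOTHETICAL_WEIGHTS = {
    "factor_a": 1.2,
    "factor_b": 1.0,
    "factor_c": 1.1,
}


def predicted_risk(present_factors):
    """Add the log-odds weight of each factor present, then convert to a probability."""
    log_odds = BASELINE_LOG_ODDS + sum(
        HYPOTHETICAL_WEIGHTS[name] for name in present_factors
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Risk when exactly one factor is present, and when all three are present.
single = {name: predicted_risk([name]) for name in HYPOTHETICAL_WEIGHTS}
combined = predicted_risk(list(HYPOTHETICAL_WEIGHTS))

for name, risk in single.items():
    print(f"{name} alone: {risk:.1%}")
print(f"all three together: {combined:.1%}")
```

With these made-up weights, each factor on its own keeps the predicted risk in the low single digits, while all three together push it several times higher. A neural network can capture richer interactions than this additive sketch, but the basic point about aggregation is the same.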
Putting the Model to Work
To develop and test the model, Chen drew on data from the electronic medical records (EMRs) of more than 25,000 deliveries that occurred at Vanderbilt over a 10-year period, during which EPBs accounted for about 1 percent of deliveries. He then trained a series of RNN models on these records and combined them into one ensemble model.
“An RNN model enables us to tap into the temporal information in each patient’s EMR,” Chen said. “We hypothesized that combining many RNNs would substantially increase our predictive power.”
Compared to traditional machine learning models, the RNN ensemble model yielded a significantly higher predictive performance, even at 20 weeks gestational age.
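For readers unfamiliar with the technique, the following is a minimal sketch of what "an ensemble of RNNs" means structurally. It is an assumption-laden illustration, not the architecture used in the study: the networks are tiny, randomly initialized and untrained, the visit encoding is invented, and the ensemble is a simple unweighted average. Each network reads a patient's visits in time order, carrying a hidden state forward, and the ensemble averages the resulting probabilities.

```python
import math
import random


def make_rnn(n_inputs, n_hidden, seed):
    """Create randomly initialized parameters for a small Elman-style RNN (untrained)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": matrix(n_hidden, n_inputs),   # input -> hidden weights
        "W_rec": matrix(n_hidden, n_hidden),  # previous hidden -> hidden weights
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }


def rnn_predict(params, visits):
    """Process visits in time order, then map the final hidden state to a probability."""
    hidden = [0.0] * len(params["w_out"])
    for x in visits:
        # New hidden state depends on the current visit and the previous hidden state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in_row, x))
                + sum(w * hj for w, hj in zip(w_rec_row, hidden))
            )
            for w_in_row, w_rec_row in zip(params["W_in"], params["W_rec"])
        ]
    logit = sum(w * h for w, h in zip(params["w_out"], hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_predict(members, visits):
    """Unweighted average of the member networks' probabilities."""
    scores = [rnn_predict(member, visits) for member in members]
    return sum(scores) / len(scores)


# Hypothetical patient: three visits, each encoded as four binary indicators.
visits = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
members = [make_rnn(n_inputs=4, n_hidden=3, seed=s) for s in range(5)]
risk = ensemble_predict(members, visits)
print(f"ensemble score: {risk:.3f}")
```

Because the weights here are random, the printed score carries no clinical meaning. In practice each member would be trained on patient records, and averaging tends to smooth out the errors of any single network.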
Implications for Delivery
Sarah Osmundson, M.D., a maternal-fetal medicine specialist at Vanderbilt, worked with Chen on the study and sees the model’s chief value as giving clinicians more time to prepare for a preterm birth.
“We really struggle to understand the etiology of preterm birth,” Osmundson said. “We have limited effective medications to prevent preterm birth. Further, medications may pause contractions and labor, but often they only delay birth by 24 to 48 hours.”
She says the biggest benefit of predicting preterm birth is the ability to risk-stratify patients.
“Early prediction can help providers ensure that the mother delivers at a location equipped and staffed to care for the baby’s inevitable health complications,” Osmundson said. “Some mothers may even temporarily move from a rural setting to an urban area to be near a tertiary neonatal center.”
Refinements in Progress
Chen has begun working on multiple refinements to the model, including the incorporation of polygenic risk scores for a more comprehensive risk assessment.
Another of his goals is to incrementally remove bias from the AI predictions.
“Models relying on EMRs or polygenic risk scores may have significantly different predictive performances for disparate subpopulations,” he said. “We believe that such errors could be resolved by training our model with data from larger and more diverse populations.”
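One common way to check for the kind of disparity Chen describes is to compute the same performance metric separately for each subpopulation. The sketch below does this for sensitivity, the share of actual cases the model catches. The group labels and records are made up for illustration, and the choice of sensitivity as the metric is an assumption rather than something the study specifies.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute sensitivity (true positives / actual positives) per group.

    records: iterable of (group, actual, predicted) tuples with 0/1 labels.
    Groups with no actual positives are omitted from the result.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for group, actual, predicted in records:
        if actual == 1:
            actual_pos[group] += 1
            if predicted == 1:
                true_pos[group] += 1
    return {group: true_pos[group] / n for group, n in actual_pos.items()}


# Made-up records for two hypothetical subpopulations.
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 0, 0),
]
rates = sensitivity_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: sensitivity {rate:.2f}")
```

A large gap between groups in a report like this would be one signal that the model needs more, or more representative, training data for the group it serves less well.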
Toward that end, the team plans to combine data from different health institutions and subpopulations, creating a pipeline toward a more generalizable application of their models.