Original Article

A new model based on artificial intelligence to screening preterm birth

Article: 2241100 | Received 10 Feb 2022, Accepted 21 Jul 2023, Published online: 30 Jul 2023

Abstract

Objective

The objective of this study is to create a new screening for spontaneous preterm birth (sPTB) based on artificial intelligence (AI).

Methods

This study included 524 singleton pregnancies from the 18th to the 24th week of gestation after transvaginal ultrasound cervical length (CL) analysis for screening sPTB < 35 weeks. The AI model was created with a stacking-based ensemble learning method (SBELM) using a neural network that combines CL < 25 mm, multivariate unadjusted logistic regression (LR), and the best-performing AI algorithm. Receiver operating characteristic (ROC) curves for predicting sPTB < 35 weeks, the area under the curve (AUC), sensitivity, specificity, accuracy, and positive and negative predictive values were calculated to evaluate CL < 25 mm, LR, the best AI algorithm, and SBELM.

Results

The most relevant variables identified by LR were cervical funneling, the index straight CL/internal angle inside the cervix (≤ 0.200), previous PTB < 37 weeks, previous curettage, no antibiotic treatment during pregnancy, weight (≤ 58 kg), no smoking, and CL < 30.9 mm. With the false-positive rate fixed at 10%, CL < 25 mm and SBELM presented, respectively: AUC of 0.318 and 0.808; sensitivity of 33.3% and 47.2%; specificity of 91.8% and 92.8%; positive predictive value of 23.1% and 32.7%; negative predictive value of 94.9% and 96.0%. The SBELM differed significantly from CL < 25 mm by T-test (p < .00001).

Conclusion

AI applied to clinical and ultrasonographic variables could be a viable option for screening of sPTB < 35 weeks, improving the performance of short cervix, with a low false-positive rate.

Introduction

The prediction of preterm birth (PTB) plays a crucial role in addressing the significant mortality rates among children [Citation1]. Currently, the most widely accepted predictor of PTB is a cervical length (CL) < 25 mm, which serves as the recognized cutoff value for positive screening in singleton pregnancies [Citation2]. Obstetric history, particularly the number of prior preterm births before 37 weeks or 34 weeks, can also serve as a screening method. However, although combining obstetric history with cervical length could improve screening performance, there is currently no readily available tool to facilitate such integration [Citation3,Citation4].

In most centers, patients in the mid-trimester of gestation are considered screen-positive if they have a short cervix (<25 mm) [Citation2] or a previous PTB, with a 10 percent false-positive rate for each criterion separately [Citation3]. According to Iams et al. [Citation2], the sensitivity of a short cervix <25 mm was 37.3%. Compared with screening for other pregnancy complications, such as preeclampsia [Citation5], the current performance of screening for preterm birth based solely on CL is significantly inferior.

Considering the global number of live births in 2014 (139.9 million), preterm birth (PTB) occurred in approximately 10.6% (14.84 million) of cases [Citation6]. According to DATASUS data from 2019, Brazil had a PTB rate of 11.4% before 37 weeks of gestation (http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sinasc/cnv/nvuf.def). PTB is the leading cause of mortality among children under the age of 5 [Citation7]. It is estimated that approximately 1.4 million deaths occur annually worldwide due to prematurity [Citation8]. Consequently, interventions to reduce PTB are of utmost importance, and the utilization of novel technologies such as Artificial Intelligence (AI) may offer a viable alternative [Citation9].

Over the last 10 years, the growth of data in modern society has created big data, and Artificial Intelligence has been extensively used to analyze it [Citation10]. Machine learning may use different models to improve on traditional approaches such as CL < 25 mm, including Logistic Regression, Extreme Gradient Boosting (XGBoost), Bagging Classifier, Linear Discriminant Analysis (LDA), Artificial Neural Network, AdaBoost Classifier, the stacking-based ensemble learning model (SBELM), and others [Citation11,Citation12]. Machine learning techniques have been increasingly applied in medicine and are regularly accepted in image recognition and other screening methods [Citation13,Citation14].

The main objective is to create a new screening for spontaneous preterm birth (sPTB) < 35 weeks, based on demographic, clinical, and ultrasonographic variables analyzed by AI.

Methods

This is a retrospective cohort analysis of 755 singleton pregnancies in the second trimester evaluated from October 2010 to August 2018 in the Sector for Prediction of Preterm Birth at the Federal University of São Paulo (UNIFESP). During this period, all patients between 18 and 24 weeks were offered, in addition to the anomaly scan, a transvaginal scan for the assessment of CL. This study was approved by the Ethics Committee (IRB) of the Federal University of São Paulo (Plataforma Brasil - http://plataformabrasil.saude.gov.br) number CAAE 30873613.8.0000.5505, on 3rd Sep 2014, with an amendment on 8th Aug 2020. All participants signed the consent form. Mechanical treatment for PTB consisted of a pessary or cervical cerclage. Cervical cerclage, by the McDonald technique, was systematically offered for typical cervical insufficiency [Citation15].

Only cases of preterm rupture of membranes and spontaneous onset of labor were included in the analysis. All non-spontaneous deliveries (n = 99) were excluded from this study, encompassing cases of labor induction (n = 62) and cesarean sections performed due to maternal or fetal complications (n = 37), such as gestational diabetes, preeclampsia, and fetal growth restriction (medically indicated delivery). Patients who received mechanical treatment for PTB were also excluded (n = 22): pessary (n = 11) and cervical cerclage (n = 11). A total of 110 patients were lost to follow-up, leaving 524 patients for analysis (Figure 1). Pregnancy outcome data were obtained from medical records or the government’s database of live birth registrations (SINASC). For cases of preterm delivery < 35 weeks, the classification considered whether it occurred spontaneously or was medically indicated.

Figure 1. Flowchart of the included patients.


All patients included in this study were participants in international randomized trials focusing on the systematic prediction and prevention of preterm birth (PTB) [Citation16–18]. Among them, those identified as high risk for PTB (defined as CL < 25 mm) were randomly assigned to receive either isolated progesterone treatment at a dosage of 200 mg/day or a combination of progesterone at 200 mg/day and a cervical pessary. However, patients who were randomized for the pessary intervention were subsequently excluded from the analysis, leaving only those who were randomized to receive progesterone. The administration of antibiotics followed a predefined protocol [Citation19].

Discriminatory aspects of measurements of cervix

Cervical length was measured with a transvaginal (TV) probe, with an empty maternal bladder. An adequate view of the cervical gland area was obtained, low pressure was applied over the cervix, and appropriate magnification was used. The cervical study consisted of three consecutive measurements without abdominal compression of the uterine fundus and another three measurements after abdominal compression of the uterine fundus for at least 20 s, to simulate a uterine contraction. From the TV images, we measured CL (A-B), the cervical outer diameter (X-Z), a two-segment measure (A-Y-B length) [Citation17], and the inside cervical angle (A-Y-B angle) (Figure 2); the angle is addressed further in the Discussion section. Care was taken to exclude from the CL the isthmus (B-C), where there was no cervical mucosa [Citation20]. Another measure used as a variable in this study was the index of straight CL over the internal angle inside the cervix (index = A-B/A-Y-B angle). The smallest straight measurement was used, together with its corresponding A-Y-B length and angle. The other measurements performed were A-C, A-C two-segment, and A-C trace line, as presented in Figure 2. This study aims to expand the understanding of cervical characteristics by investigating measurements not previously described in the literature, specifically the index of straight cervical length over the internal angle within the cervix (index = A-B/A-Y-B angle). By exploring these variables, we seek to enhance the available information on the cervix and contribute to its comprehensive assessment.
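The geometric relationship between these measurements can be sketched as follows. This is only an illustration: the landmark coordinates are hypothetical, and the units (millimetres for lengths, degrees for the A-Y-B angle) are an assumption, since the text does not state them.

```python
import math

def cervical_measures(a, y, b):
    """Return (straight A-B length, A-Y-B length, A-Y-B angle, index).

    a, y, b are (x, y) landmark coordinates. Lengths are assumed to be in
    millimetres and the angle in degrees; the source does not state units.
    """
    straight = math.dist(a, b)                    # A-B, straight cervical length
    two_segment = math.dist(a, y) + math.dist(y, b)  # A-Y-B length

    # Angle at vertex Y between the rays Y->A and Y->B.
    ya = (a[0] - y[0], a[1] - y[1])
    yb = (b[0] - y[0], b[1] - y[1])
    cos_angle = (ya[0] * yb[0] + ya[1] * yb[1]) / (math.hypot(*ya) * math.hypot(*yb))
    cos_angle = max(-1.0, min(1.0, cos_angle))    # guard against rounding drift
    angle = math.degrees(math.acos(cos_angle))

    index = straight / angle                      # index = A-B / A-Y-B angle
    return straight, two_segment, angle, index

# Hypothetical landmarks for a slightly curved cervix.
straight, two_segment, angle, index = cervical_measures((0.0, 0.0), (14.0, 6.0), (28.0, 0.0))
print(f"A-B = {straight:.1f}, A-Y-B = {two_segment:.1f}, angle = {angle:.1f}, index = {index:.3f}")
```

Under these assumptions, a shorter or straighter cervix (a small A-B length over a large angle) yields a lower index.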

Figure 2. Image representative of the cervix to demonstrate the principal measurements performed during TVS in our unit.


Selection of variables

In total, 33 variables were evaluated and compared. After the variables were determined, missing data were evaluated. A variable was excluded if more than 30% of its data were missing, and all possible corrections to the recorded data were made.

The ultrasound was preceded by a detailed appointment to collect all data relevant for screening sPTB < 35 weeks: history of preterm or term birth and the gestational age at which PTB occurred, cervical surgery or known uterine malformation, history of cervical incompetence, habits and addictions (smoking or drugs), medications (including antibiotics and progesterone), history of vaginal bleeding, bacterial infection during the current pregnancy, and the most common demographic characteristics [age, weight, height, body mass index (BMI), ethnicity, schooling, number of gestations, parity, miscarriage, and curettage]. A variable “Any Risk Factor to PTB” was created to group patients with previous PTB, cervical surgery, uterine malformation, or cervical incompetence.

First, dummy variables were created, i.e. each categorical variable was converted into numerical variables that take one of two values: zero or one. Subsequently, the dummies were vectorized and scored by their odds ratio (OR) for sPTB < 35 weeks, and finally the dummies were regrouped by similarity of OR. Non-categorical variables were split into deciles and the deciles were regrouped by similarity of their OR; for categorical (zero or one) variables the OR was used directly [Citation21].
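A minimal sketch of the decile-and-odds-ratio step described above, assuming each decile is compared against all other deciles. The function names are hypothetical, and the 0.5 continuity correction for empty cells (Haldane-Anscombe) is an assumption added here so the ratio stays defined; the text does not say how empty cells were handled, and the final regrouping of deciles by similar OR is not implemented.

```python
from statistics import quantiles

def odds_ratio(event_exposed, no_event_exposed, event_unexposed, no_event_unexposed):
    """Odds ratio from a 2x2 table.

    Assumption: if any cell is zero, 0.5 is added to every cell
    (Haldane-Anscombe correction) to avoid division by zero.
    """
    cells = [event_exposed, no_event_exposed, event_unexposed, no_event_unexposed]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)

def decile_bins(values):
    """Assign each value a decile index 0..9 using nine cut points."""
    cuts = quantiles(values, n=10)
    return [sum(v > cut for cut in cuts) for v in values]

def odds_ratio_per_bin(bins, outcomes):
    """OR of the event (outcome == 1) for each bin versus all other bins."""
    result = {}
    for k in sorted(set(bins)):
        a = sum(1 for bn, o in zip(bins, outcomes) if bn == k and o == 1)
        b = sum(1 for bn, o in zip(bins, outcomes) if bn == k and o == 0)
        c = sum(1 for bn, o in zip(bins, outcomes) if bn != k and o == 1)
        d = sum(1 for bn, o in zip(bins, outcomes) if bn != k and o == 0)
        result[k] = odds_ratio(a, b, c, d)
    return result
```

Deciles with similar OR values could then be merged into a single dummy variable, which is the regrouping step the text describes.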

Cervical length as control group

All screening techniques applied in this study were evaluated on the same 524 patients. CL < 25 mm is considered the gold standard for PTB screening and was therefore treated as the control for comparison: the results achieved with AI techniques were compared with those obtained using CL < 25 mm.

Criteria of selection of the variables

The inclusion of variables and algorithms in the model followed two predefined rules: 1) forward selection based on logistic regression (LR) to select variables (a variable needed a significance of at least 0.05 to be selected by the model) [Citation22]; 2) Pearson correlation among the variables was checked, and no variable had to be eliminated because all variables showed low correlation (0.30 to 0.50; −0.30 to −0.50) or very low correlation (0.00 to 0.30; 0.00 to −0.30) [Citation23].
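The correlation rule can be illustrated with a short sketch. The band labels follow the thresholds quoted above; assigning exactly 0.30 to the "low" band is an assumption, since the two quoted ranges share that boundary, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_band(r):
    """Classify |r| using the bands quoted in the text."""
    magnitude = abs(r)
    if magnitude < 0.30:
        return "very low"
    if magnitude <= 0.50:
        return "low"
    return "above the low band"  # would be a candidate for elimination
```

Applying `correlation_band` to every pair of selected variables reproduces the check: if no pair falls outside the two bands, no variable needs to be dropped.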

Cross-validation method

The analysis was conducted using the cross-validation method, whereby 70% of the total number of cases were utilized for adjusting and training the techniques, while the remaining 30% constituted the Test Group for testing and validating all the models subsequent to the training phase [Citation24].
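As described, this is a single 70/30 partition, often called hold-out validation. The sketch below shows one way such a split could be drawn. Stratifying by outcome and the fixed random seed are assumptions for illustration; the text does not say whether the split was stratified, and this code will not reproduce the study's exact group sizes.

```python
import random

def stratified_holdout(labels, test_fraction=0.30, seed=0):
    """Split indices into train/test, keeping the event rate similar in both.

    Assumption: stratification by label; the source does not specify it.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# 36 cases of sPTB < 35 weeks and 488 deliveries >= 35 weeks, as reported.
labels = [1] * 36 + [0] * 488
train, test = stratified_holdout(labels)
print(len(train), len(test))
```

With an outcome as rare as 36 events in 524 pregnancies, stratifying helps ensure the Test Group contains enough events to estimate sensitivity.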

Stepwise forward multivariate unadjusted logistic regression

A stepwise process is a mathematical approach used to select the most relevant variables for predicting spontaneous preterm birth (sPTB) occurring before 35 weeks of gestation [Citation25]. In this study, the stepwise analysis involved the inclusion of all 33 variables in a multivariate analysis [Citation26] and subsequently identified the statistically significant variables to compose the new predictive model. Once the most important variables were determined through logistic regression, their individual importance was further evaluated using SHAP analysis.

Artificial intelligence – evaluated algorithms (models)

The utilization of Artificial Intelligence (AI) involves inherent complexity, and this novel mathematical calculation introduces additional learning strategies, some of which may be regarded as “black box” approaches that offer limited insight into the underlying data structure [Citation27].

Different machine learning (ML) algorithms were applied to the same database, and their performance in discriminating the target (sPTB < 35 weeks) was evaluated by the Receiver Operating Characteristics (ROC) curve (area under the ROC curve, AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Among the available AI models, the following algorithms were tested: Extreme Gradient Boosting (XGBoost); Linear Discriminant Analysis (LDA); AdaBoost Classifier; Random Forest Classifier; and Decision Tree [Citation11,Citation27,Citation28]. The best algorithm was the one with the highest Kolmogorov-Smirnov test value [Citation29] in the 30% Test Group.
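A minimal sketch of how a Kolmogorov-Smirnov value can rank classifiers, assuming the statistic is the usual two-sample KS distance between the predicted scores of cases and non-cases. The text does not give the exact computation, so this is an interpretation, and the function name is hypothetical.

```python
def ks_statistic(scores_pos, scores_neg):
    """Two-sample KS statistic: the largest gap between the empirical
    cumulative distributions of the two score groups (0 = identical,
    1 = perfectly separated)."""
    thresholds = sorted(set(scores_pos) | set(scores_neg))
    best = 0.0
    for t in thresholds:
        cdf_pos = sum(s <= t for s in scores_pos) / len(scores_pos)
        cdf_neg = sum(s <= t for s in scores_neg) / len(scores_neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

Under this reading, the algorithm whose Test Group scores give the largest value separates sPTB < 35 weeks from later deliveries best.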

Stacking-based ensemble learning method

Multiple models were combined into a single score to create the Stacking-Based Ensemble Learning Model (SBELM), using a Neural Network. This method combines different models, leveraging the strengths of each, to achieve better performance [Citation30]. Based on the metrics obtained, logistic regression, the best Machine Learning (ML) algorithm, and CL < 25 mm were compared and showed distinct performance. The Kolmogorov-Smirnov test was used to identify the best-performing ML algorithm. To retain the current screening method (CL < 25 mm), the composite combined CL, logistic regression, and the best-performing AI algorithm (Figure 3).

Figure 3. Schematic representation of SBELM, stacking different algorithms to achieve better performance to explain the event (sPTB < 35 weeks) stacked by Neural Network. The stacking was composed of CL < 25 mm, LR, and the best algorithm was obtained by machine learning.


Statistical analysis

Significance was set at a two-sided p-value < .05, and 95% confidence intervals were used for logistic regression, AI models, and the comparison between ROC curves. The ROC curve and AUC were computed for CL < 25 mm, LR, the principal AI algorithm, and SBELM. Each model was compared with CL < 25 mm; the p-value was determined by the Shapiro-Wilk test and Student T-test, and a result was considered statistically significant when p < .05. Sensitivity, specificity, accuracy, and positive and negative predictive values were calculated with the false-positive rate fixed at 10% and 20%.
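Fixing the false-positive rate amounts to choosing the score cutoff that flags at most that share of unaffected pregnancies and then reading off the sensitivity. A minimal sketch, assuming higher scores mean higher risk; the function names are hypothetical.

```python
def threshold_at_fpr(scores, labels, max_fpr=0.10):
    """Lowest score cutoff whose false-positive rate does not exceed max_fpr.

    A pregnancy screens positive when score >= cutoff.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    chosen = float("inf")  # flags nobody if even the top score breaks the limit
    # Lowering the cutoff can only raise the FPR, so stop at the first violation.
    for t in sorted(set(scores), reverse=True):
        fpr = sum(s >= t for s in negatives) / len(negatives)
        if fpr > max_fpr:
            break
        chosen = t
    return chosen

def sensitivity_at(scores, labels, threshold):
    """Share of true cases (label 1) that screen positive at the cutoff."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)
```

Calling `threshold_at_fpr` with `max_fpr=0.20` gives the second operating point used in the study.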

Data collection, missing-data validation, and dummy vectorization were carried out with Microsoft Excel for Mac (Microsoft 365) and SQL Management Studio (SQL - Structured Query Language). Statistical analysis of CL < 25 mm, LR, the AI algorithms, and SBELM was performed with IBM SPSS Statistics, version 23.0 (SPSS Inc., Chicago, IL, USA) and Python (Anaconda) [Citation31].

Results

Demographic characteristics

Of the 524 cases in total, 488 (93.1%) deliveries occurred at ≥ 35 weeks and 36 (6.9%) at < 35 weeks. Demographic characteristics are presented in Table 1.

Table 1. Demographic characteristics of the population in the study and odds ratio.

Cervical length distribution and sPTB detection rate by the cervix

The ROC curve for CL < 25 mm has an AUC of 0.318 and it was considered the principal parameter of comparison for LR, Algorithms of AI, and SBELM. The sensitivity, specificity, accuracy, PPV and NPV obtained exclusively by CL < 25 mm for sPTB < 35 weeks (gold standard method) were, respectively, 33.3%, 91.8%, 87.8%, 23.1%, and 94.9%, keeping fixed a false positive rate of 10%.
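The reported percentages can be checked against the standard definitions. In the sketch below the confusion-matrix counts are not taken from the article: they are back-calculated from the 36 deliveries < 35 weeks, the 488 deliveries ≥ 35 weeks, and the percentages above, so they should be read as an inference.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Inferred counts for CL < 25 mm (assumption, not reported in the text):
# 12 of 36 cases detected, 40 of 488 non-cases flagged.
cl_under_25 = screening_metrics(tp=12, fp=40, fn=24, tn=448)
for name, value in cl_under_25.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 33.3%
# specificity: 91.8%
# accuracy: 87.8%
# ppv: 23.1%
# npv: 94.9%
```

These counts are the only whole-number combination that reproduces all five reported values to one decimal place with 36 cases and 488 non-cases.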

Stepwise forward multivariate unadjusted logistic regression

A total of 8 variables were considered relevant to explain sPTB < 35 weeks, based on 129 dummies that were grouped by similar OR to discriminate a higher risk of PTB. The AUC obtained with LR was 0.749.

The results of LR are presented in Table 2, including Coef β, SD, Wald, P-value, OR, and the 95% confidence interval (CI). The selected variables were the presence of funneling, index A-B/A-Y-B angle (index ≤ 0.200), weight (≤ 58 kg), previous preterm delivery < 37 weeks, no antibiotic treatment during pregnancy, previous curettage, no smoking, and CL (< 30.9 mm).

Table 2. Selected variables using logistic regression (LR).

The most important variables were selected by LR and are shown with their P-values. The value of Coef. β is the coefficient applied in the logit function to discriminate the cases with a higher risk of PTB in this population.
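The role of the coefficients can be illustrated with the standard logistic model, in which the linear predictor is the intercept plus the sum of each coefficient times its dummy variable, and each coefficient's odds ratio is its exponential. The variable names and numbers in the usage lines are hypothetical placeholders, not the fitted values from Table 2.

```python
import math

def logistic_probability(intercept, coefficients, features):
    """Predicted probability from a logistic model.

    coefficients and features are dicts keyed by variable name;
    a variable missing from features is treated as 0 (dummy absent).
    """
    linear = intercept + sum(
        beta * features.get(name, 0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

def odds_ratio_from_coef(beta):
    """Odds ratio associated with a one-unit change in the variable."""
    return math.exp(beta)

# Hypothetical coefficients, for illustration only.
example_coefficients = {"funneling": 1.2, "previous_ptb_lt_37w": 0.9}
risk = logistic_probability(-3.0, example_coefficients, {"funneling": 1})
print(f"predicted probability: {risk:.3f}")
```

A positive coefficient corresponds to an OR above 1, meaning the presence of that dummy raises the predicted risk.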

After logistic regression selected the final variables, a correlation test (Table 3) was performed to confirm that correlation among them was low. No selected variable has a correlation above 0.50, and the large majority are below 0.10.

Table 3. Correlation matrix.

The SHAP dependence plot was generated to show the interplay among variables and their impact on the predicted probability of preterm birth (Figure 4).

Figure 4. The SHAP dependence plot to evaluate the interaction between variables selected by logistic regression.


For logistic regression (LR), the sensitivity for preterm deliveries before 35 weeks was 38.9%, with a positive predictive value of 26.9%, at a false-positive rate of 10% (Table 5).

Artificial Intelligence

All variables were included in several AI algorithms. The models iterated over the variables during training (70.8% of the cases, n = 371) and testing (29.2% of the cases, n = 153), and by the end of the process they consistently improved on the result of LR. This was measured by the Kolmogorov-Smirnov test (KS2), and the ROC curve was obtained for each AI model. The best KS2 performance was obtained by XGBoost; all results are presented in Table 4.

Table 4. Different algorithms and their respective Kolmogorov-Smirnov test values. XGBoost achieved the best performance, with the highest Kolmogorov-Smirnov index (KS test = 0.489899).

The SHAP analysis was conducted using the variables selected by XGBoost, and the results are presented in Figure 5. The importance of each variable is depicted on the scale provided in the graphic.

Figure 5. The SHAP analysis for XGBoost demonstrates the importance of each variable.


Stacking-based ensemble learning model

The stacking-based ensemble learning model (SBELM), implemented with a neural network, exhibited the highest area under the curve (AUC) of all the techniques. SHAP analysis revealed the varying importance of each individual model in contributing to its overall performance. Figure 6 illustrates the contributions of CL < 25 mm, logistic regression (LR), and XGBoost, with the importance of these models presented as normalized values to facilitate comprehension.

Figure 6. SHAP analysis of the different models. Normalized importance of each algorithm which composed SBELM.


The neural network had one hidden layer with 3 (three) units (artificial neurons). The output layer represented one dependent variable with 2 (two) units and used the SoftMax activation function. The activation function of the hidden layer was the Hyperbolic Tangent, and the error function was Cross-Entropy (Figure 7).
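The architecture described here (three hidden tanh units, a two-unit softmax output, cross-entropy error) can be sketched as a forward pass. The input encoding (a 0/1 flag for CL < 25 mm plus the LR and XGBoost scores) and all weights below are hypothetical; the article does not report the fitted weights.

```python
import math

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> 3 tanh hidden units -> 2 softmax outputs."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    peak = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]         # probabilities for the two classes

def cross_entropy(probabilities, true_class):
    """Cross-entropy error for a single example."""
    return -math.log(probabilities[true_class])

# Hypothetical weights and one hypothetical patient:
# [CL < 25 mm flag, LR score, XGBoost score]
w_hidden = [[0.8, 1.5, 1.1], [-0.4, 0.9, 1.3], [0.2, -0.7, 0.6]]
b_hidden = [-0.5, 0.1, 0.0]
w_out = [[-1.0, -0.8, 0.3], [1.0, 0.8, -0.3]]
b_out = [0.5, -0.5]
probs = forward([1.0, 0.35, 0.42], w_hidden, b_hidden, w_out, b_out)
print(probs, cross_entropy(probs, true_class=1))
```

The second output can be read as the predicted pseudo-probability of sPTB < 35 weeks, and training would adjust the weights to reduce the cross-entropy over the training cases.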

Figure 7. Schematic representation of the neural network, with the models composing SBELM (left column), the hidden layer (middle column), and the output layer (right column). The thickness of the lines represents the importance of each algorithm in the respective hidden layer and the output layer.


The ROC curves and AUC for predicting the target were computed for CL < 25 mm, LR, XGBoost, and SBELM (Figure 8), and statistical significance was calculated by comparison with CL < 25 mm using the Student T-test. Sensitivity, specificity, accuracy, PPV, NPV, and AUC (ROC) at a 10% false-positive rate are presented in Table 5.

Figure 8. ROC curves of CL < 25 mm (green line), logistic regression (red line), XGBoost (blue line), and SBELM (purple line). Considering the area under the ROC curve, the best model (SBELM) performed better than CL < 25 mm and achieved AUC = 0.808 for the detection of sPTB < 35 weeks.


Table 5. AUC (ROC) with p-value, sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of the different models for screening sPTB < 35 weeks, with the false-positive rate (FPR) fixed at 10%.

Table 5 provides the most significant findings, at a 10% false-positive rate. However, it is worth considering another perspective by hypothetically fixing a 20% false-positive rate for the stacking-based ensemble learning model (SBELM). In this scenario, the SBELM shows higher sensitivity (75.0%), specificity (84.2%), accuracy (83.6%), positive predictive value (PPV) (26.0%), and negative predictive value (NPV) (97.0%) than a short cervix < 28.5 mm, the cervical length cutoff that corresponds to a similar 20% FPR, which gives sensitivity of 41.5%, specificity of 82.0%, accuracy of 78.8%, PPV of 16.3%, and NPV of 94.3%. Notably, with the machine learning technique (SBELM), sensitivity and PPV each increase by about 60% or more in relative terms (from 41.5% to 75.0% and from 16.3% to 26.0%, respectively), in addition to the improvements observed in all other measures.

A Student T-test comparing LR, XGBoost, and SBELM with CL < 25 mm was performed, and the p-values were < .0001 for all algorithms. The predicted pseudo-probability obtained by SBELM is shown in Figure 9; it is important to note that there was little overlap between the 0 and 1 classes.

Figure 9. Predicted pseudo-probability of preterm birth < 35 weeks, with few points of intersection between 0 and 1.


Discussion

A significant change in data generation and technology consumption based on AI has been experienced. During 2019 and 2020, the amount of processed information was larger than all the production developed by mankind so far [Citation32]. Health technologies have continuously been further developed since the beginning of medicine. Increasing knowledge in diagnostic, prevention, treatment, and rehabilitation has led to significant changes in healthcare systems [Citation33]. This is a unique moment in using AI techniques, where the current computational power linked to data production allows us to use algorithms which in the past were not possible and provides better adjustment of statistical models and patients’ research-based ratings.

Observing the current AI moment, it is possible to identify this technology in several everyday applications, such as search engines, engines for a credit decision, the process of image identification, among others, which inside an application could provide an automated and efficient process offering a differentiated experience for mankind.

According to Gartner [Citation10], the Data & Analytics area is increasingly gaining space in the market and is expected to grow even more in the forthcoming years. By 2022, 90% of corporate strategies were expected to mention data as an essential asset for the results of public or private companies.

Main findings

Therefore, the objective of this application is to enhance the process of PTB risk assessment, which can have significant implications for the perinatal outcome. The current methods exhibit low accuracy, leading to delays in the development of therapeutic interventions.

The implementation of AI, specifically the SBELM, led to a notable improvement in the screening sensitivity for sPTB occurring before 35 weeks, increasing from 33.3% with the use of CL < 25 mm to 47.2% in our specific population. It is worth highlighting that these patients were referred to a tertiary center specifically for CL screening. This advancement in PTB screening holds significant implications and has the potential to pave the way for the development of novel therapeutic interventions. Moreover, this model can be extrapolated to a larger population of women as it relies solely on ultrasound, a widely accessible method, and a computer program. It is important to emphasize that the primary objective of this study was not to serve as a diagnostic tool.

This research created new variables to better describe the cervix, and one of them was selected as relevant by the model. These variables could be evaluated in other studies.

Implication for research

Beta et al. [Citation4] created a similar screening method based on maternal characteristics, intending to associate serum markers, but without sonographic variables. The principal variables selected were previous PTB 34-36 weeks (OR 2.980 CI95% 1.887 − 4.704; p < .0001), smoking (OR 1.577 CI95% 1.143 − 2.174; p = .005), and assisted human reproduction (OR 1.722 CI95% 1.114 − 2.661; p = .014). These values were very similar to the ORs of this study.

Another study, by Celik et al. [Citation34], evaluated different scenarios to screen PTB using CL together with maternal demographic characteristics and obstetric history. With the false-positive rate fixed at 10%, the detection rate for 34–36 weeks’ gestation was 24.2% for CL < 25 mm and 29.3% for the combination of CL < 25 mm, obstetric history, and maternal characteristics. Compared with Celik et al. [Citation34], and given a similar target (gestational age 34-36 weeks), the same false-positive rate (10%), and similar variables (CL < 25 mm + maternal characteristics + obstetric history), the present study improved detection of sPTB < 35 weeks to 47.2% after AI assessment and the inclusion of additional variables.

Future studies can be conducted based on the findings of this article, and it would be of great importance to develop multicenter studies utilizing the same strategy. This approach can lead to the development of models tailored to different populations, thereby increasing the sensitivity for PTB and ultimately resulting in improved therapeutic options.

This article holds relevance in the context of PTB screening. The combination of high sensitivity with a low false-positive rate can effectively discern the risk of PTB in a specific population. These findings can serve as the basis for the formulation of new guidelines, not only for different scenarios such as longer cervix and infections but also for exploring new applications of transvaginal ultrasound examinations at different stages of pregnancy.

Implication for clinicians

The selection of the optimal criteria for the use of this new method is feasible. In order to enhance sensitivity while maintaining the same false-positive rate as CL < 25 mm, a fixed false-positive rate of 10% was applied. However, it is worth considering the high values of sensitivity (75.0%), positive predictive value (PPV), negative predictive value (NPV), specificity, and accuracy achieved with a false-positive rate of 20%. This 20% false-positive threshold, with its increased sensitivity, could be used to select patients for a second evaluation between 26 and 28 weeks of gestation.

The applicability of this calculator to the entire population is feasible. The improved performance of PTB screening, surpassing the sole use of cervical length, has the potential to provide therapeutic options that may not have been considered for certain individuals. Nonetheless, validation of these results in a low-risk population through randomized clinical trials or other retrospective studies utilizing a similar technique is imperative.

New modalities of AI were tested using exclusively these variables selected for this study and the performance of the Kolmogorov-Smirnov test was superior when using SBELM [Citation35].

Reflection of other results

The PTB research group developed a single form for measuring the cervix at clinical appointments. Unusual measurements were created to try to explain PTB, including the inside cervical angle (A-Y-B angle), index A-B/A-Y-B angle, B-C measure (isthmus), the A-C trace line, the A-C two straight measures, the transverse measure, and the anteroposterior measure (X-Z). New studies are needed to validate, or not, these findings. The inside cervical angle (A-Y-B angle), index A-B/A-Y-B angle, A-C straight measure, and B-C measure entered the logistic regression, and index A-B/A-Y-B angle was among the final variables selected by stepwise LR, which suggests the importance of this new, non-validated variable.

Over the years, CL < 25 mm has been chosen for PTB screening despite its modest sensitivity of around 37% [Citation2]. In other words, 63%, or almost two-thirds, of PTB cases are not detected by a short cervix. Consequently, more accurate predictors would be welcome. The ease of screening patients with a single measurement of moderate accuracy is the main reason this mediocre sensitivity has been accepted [Citation3]. With AI, it is possible to consider screening for PTB with better performance, without increasing the false-positive rate, and achieving an acceptable performance for a screening test.

Strengths and limitation

One of the strengths of this research lies in the innovative methodology based on SBELM, which resulted in the development of a novel AI algorithm specifically tailored for this study, in conjunction with logistic regression and superior AI algorithms for preterm birth prediction. While it may appear as a mere maneuver, it was a deliberate strategy aimed at optimizing the assessment of key factors associated with PTB risk.

Consequently, it becomes evident that cervical length is the most influential factor in PTB risk across the entire cohort, particularly in cases where CL is less than 20 mm. However, the question arises: Is CL equally significant in assessing PTB risk for pregnant women with CL measurements between 20 and 35 mm? Moreover, does CL hold any relevance in predicting PTB risk for patients with CL measurements exceeding 35 mm [Citation34]? These aspects are explored in the context of this new SBELM approach, which enables the screening of patients at risk of PTB regardless of their CL measurements, with high sensitivity and specificity.

The utilization of SBELM offers a valuable opportunity to identify patients at risk of PTB beyond the reliance on CL alone, thereby enhancing the screening process with improved sensitivity and specificity [Citation13].
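The stacking principle behind SBELM can be sketched as follows. This is only an illustration under stated assumptions, not the model built in this study: the meta-learner here is a single sigmoid unit, used as the simplest possible stand-in for the neural network described in the Methods, and every number in the toy data is invented. Each row holds three hypothetical base-model outputs: a short-cervix flag (CL < 25 mm), a logistic regression probability, and the probability from another algorithm.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_outputs, labels, lr=0.5, epochs=2000):
    """Fit one sigmoid unit on the stacked base-model outputs
    by batch gradient descent on the log-loss."""
    n_features = len(base_outputs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_outputs, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log-loss w.r.t. the linear term
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Combined risk for one patient given her base-model outputs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented base-model outputs: (short-cervix flag, LR probability, other model)
base_outputs = [
    (1, 0.80, 0.70), (1, 0.40, 0.60), (0, 0.70, 0.80), (0, 0.60, 0.65),
    (0, 0.20, 0.30), (0, 0.10, 0.20), (1, 0.15, 0.10), (0, 0.30, 0.25),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = preterm birth (invented outcomes)

weights, bias = fit_meta_learner(base_outputs, labels)
high = predict(weights, bias, (0, 0.75, 0.80))  # long cervix, high model risk
low = predict(weights, bias, (0, 0.10, 0.15))   # long cervix, low model risk
print(high > low)
```

The two example patients both have a cervix of 25 mm or more, so the short-cervix flag alone would not separate them; the stacked model ranks them differently because it also weighs the other base outputs. In a real stacking pipeline the base models would themselves be trained, and the meta-learner would be fitted on their out-of-fold predictions to avoid leakage.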

These results suggest that different characteristics could be involved at different CLs, and that this strategy can achieve better sensitivity without changes in the false-positive rate or in the positive and negative predictive values. This retrospective study is just the start of the process of increasing the detection of PTB. After this initial phase, a prospective study using this new methodology will be performed to check the real efficiency of the SBELM algorithm.

Preterm birth (PTB) imposes a significant economic and social burden, accounting for approximately 80% of child mortality before the age of 5 [Citation6]. In the United States alone, there are approximately 4.1 million preterm deliveries each year [Citation36]. When considering the cost associated with each PTB occurring between 28 and 36 weeks, which is estimated at around $12,500 per case [Citation37], the financial implications reach staggering figures, exceeding $51.2 billion annually.

Given the enormity of these numbers, it is crucial to urgently explore new and effective technologies for the detection of PTB. The current reliance on the measurement of cervical length alone, with a detection rate of approximately 37% [Citation2], necessitates the pursuit of alternative screening methods. In this regard, this study demonstrates the potential of AI technology in achieving a detection rate of nearly 50%. It is important to note that while this improvement in sensitivity is promising, it does have limitations, particularly in terms of diagnostic capability. However, for the purpose of screening, this level of sensitivity is considered acceptable.

Efforts to enhance PTB detection and screening are vital to mitigate its profound impact. The utilization of AI technology holds promise in this endeavor, offering a substantial improvement in sensitivity compared to the traditional approach based solely on CL < 25 mm.

Another strength of this study lies in the utilization of logistic regression as the foundation of the mathematical process. This approach enabled the demonstration of each step in the mathematical process, including the determination of algorithmic parameters such as Coef. ß, Wald, OR, and P-value for each cervical length measurement. Subsequently, the application of machine learning techniques was introduced, enhancing the sensitivity of PTB detection. Although the AI models employed in this study may be considered as “black boxes” in terms of their internal workings, it is important to highlight that they were built upon a well-established and transparent stepwise univariate forward LR framework. This approach ensures the reproducibility and transparency of the research process [Citation38].
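For readers less familiar with these quantities, the relationship between a logistic regression coefficient and the statistics listed above can be sketched in a few lines. This is a generic illustration, not the study's code: the function name is hypothetical, the inputs are invented, and the p-value uses the usual normal approximation for the Wald test.

```python
import math


def wald_summary(beta, se, z_crit=1.96):
    """Summarise one logistic-regression coefficient.

    beta: fitted coefficient on the log-odds scale
    se:   its standard error
    Returns the odds ratio, its 95% confidence interval, the Wald
    chi-square statistic, and the two-sided p-value.
    """
    z = beta / se
    return {
        "odds_ratio": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "wald": z * z,
        "p_value": math.erfc(abs(z) / math.sqrt(2)),
    }


# Invented example: a coefficient of ln(2) with a standard error of 0.5.
summary = wald_summary(math.log(2), 0.5)
print(round(summary["odds_ratio"], 3))  # 2.0
```

Because the confidence interval is built on the log-odds scale and then exponentiated, it is symmetric around the coefficient but not around the odds ratio itself.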

Furthermore, this study demonstrates another strength by promptly developing an online calculator to facilitate future research calculations, prospective phases, and technology validations. This calculator can also be utilized to benefit other populations by providing improved sensitivity and specificity. Given the complexity of PTB screening, involving various parameters such as different cervical measurements, clinical history, and obstetrical antecedents, having a dedicated calculator for this purpose is invaluable. The study by Nicolaides et al. [Citation39], which developed a similar calculator based on ultrasonographic and biochemical findings in the first trimester of pregnancy for screening aneuploidies, serves as an inspiration for this study. It is widely recognized as the leading approach for screening fetal aneuploidies on a global scale.

One of the limitations of this article is its methodology. A retrospective study does not, in essence, offer the best statistical power in medicine to discriminate an event, and this cohort was not conceived for this methodology. A randomized, double-blind, prospective, multicenter clinical trial designed exclusively for this objective has the best statistical power. However, a retrospective cohort was used to test this new technology; it could be considered adequate for this innovation, and the database could be useful during the implementation of the second phase, a prospective randomized clinical trial [Citation40].

Another limitation of this article is that it was conducted at a single research center, a tertiary referral center for PTB screening. Because of the demographic characteristics of our population, this could reduce the reproducibility of our results in low-risk populations. However, this particular population will be prospectively evaluated in the future, and if the findings are consistent, the methodology could probably be applied more widely, provided that the calculation is also performed, in comparison with CL < 25 mm, in a low-risk population in a multicenter study that better represents the general population.

The exclusion of cases with CL < 25 mm treated with a cervical pessary or cerclage is also a limitation of this study. These exclusions could introduce bias, although not necessarily a harmful one: a higher frequency of short cervix could mask the less prominent factors that act in longer cervices and that usually contribute less to determining PTB. This risk of bias may therefore have helped to give more weight to aspects that are usually undervalued.

The small number of cases (n = 524) in this study could be another limitation. After the exclusion of 33 cases with missing data, and of other cases treated with a cervical pessary or cervical cerclage, the statistical analysis was necessarily performed with this low number of patients. However, increasing the number of cases will require minimal time and effort to train the model.

Research on screening methods should be conducted without any treatment [Citation41], but nowadays it is unethical not to offer vaginal progesterone for short cervix, or antibiotics for amniotic fluid sludge, given the evidence from meta-analyses [Citation42,Citation43]. Thus, excluding mechanical treatments was a way to leave out major treatments for PTB on which there is no worldwide agreement [Citation44], thereby reducing the risk of bias for the new screening method, even considering the bias that could be created by vaginal progesterone and antibiotics. If this screening method proves effective in patients using progesterone or antibiotics, it could be useful in clinical practice, since patients with a medical indication would be receiving these treatments anyway.

This study introduces a novel cervical measurement, the angle inside the cervix (A-Y-B angle), and its association with cervical length (CL) forms the CL/angle index (index = A-B/A-Y-B angle). Notably, this variable demonstrates significant discriminatory effects within the group of cervix measurements ranging from 20 to 35 mm. Additionally, the isolated cervical angle holds importance in longer cervix measurements (> 35 mm), although these specific results are not presented in this article. The inclusion of this innovative measurement should be emphasized, as it contributes to the pool of variables analyzed in this study and may have relevance in machine learning models. Logistic regression analysis further supports the significance of this novel measurement, identifying it as the most relevant variable with an odds ratio of 1.533 (95% CI 0.481–4.887).
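The index itself is a simple ratio, which the following sketch makes explicit. The function names are hypothetical, and the units (millimetres for the straight A-B length, degrees for the A-Y-B angle) are an assumption made here for illustration; the cutoff of 0.200 is the one reported in the Results.

```python
def cl_angle_index(cl_mm, angle_deg):
    """Ratio of the straight cervical length (A-B) to the inside
    cervical angle (A-Y-B). Units are assumed: mm and degrees."""
    if angle_deg <= 0:
        raise ValueError("angle must be positive")
    return cl_mm / angle_deg


def index_is_low(cl_mm, angle_deg, cutoff=0.200):
    """True when the index is at or below the reported cutoff."""
    return cl_angle_index(cl_mm, angle_deg) <= cutoff


# Invented measurements, for illustration only.
print(cl_angle_index(30, 160))  # 0.1875
print(index_is_low(30, 160))    # True
print(index_is_low(36, 150))    # False
```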

Therefore, if this calculator is applied in real-world practice, the authors suggest that all measurements be performed exactly as described in the methodology; future research could then better explain the importance of each variable.

Hence, the incorporation of machine learning techniques in screening for PTB before 35 weeks of gestation holds great promise and can be implemented in clinical practice shortly after the calculator becomes available. The World Health Organization emphasizes the importance of screening tests being valid, reliable, cost-effective, acceptable, and accompanied by appropriate follow-up services. This new method shows the potential to meet these essential criteria for application in clinical settings [Citation41]. By enhancing the screening process, new possibilities emerge in the realm of PTB, as potential treatments can be offered to patients who would not have been identified through traditional short cervix screening alone. This advancement offers the possibility of reducing PTB rates. To validate these findings, further investigation is warranted, including larger retrospective studies involving low-risk populations and prospective randomized clinical trials.

The utilization of artificial intelligence and machine learning improves the ability to identify individuals at a higher risk of PTB before 35 weeks, irrespective of cervical length or previous PTB history. This approach represents a new perspective in medical screening, significantly enhancing the sensitivity and specificity compared to traditional methods such as cervical length less than 25 mm while maintaining a similar false-positive rate of 10%.

Author’s contribution

The authors’ contribution, the interpretation of the data, the article’s writing, the critical review of the intellectual content, and the final approval of the version to be published were similar. All authors accept responsibility for the paper as published.

Acknowledgments

The authors would like to thank the Health and Medical Equipment Division of Samsung Brazil for offering the WS80A ultrasound system and HERA W9 ultrasound system to perform the exams during the study. We are also grateful to Mr. Rudolf Wiedemann for his support with the present article’s English version.

Disclosure statement

The authors report no conflicts of interest, including relevant financial, personal, political, intellectual, or religious interests.

Data availability statement

Data available on request from the authors.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Mercer BM, Goldenberg RL, Das A, et al. The preterm prediction study: a clinical risk assessment system. Am J Obstet Gynecol. 1996;174(6):1885–1895. doi: 10.1016/s0002-9378(96)70225-9.
  • Iams JD, Goldenberg RL, Meis PJ, et al. The length of the cervix and the risk of spontaneous premature delivery. N Engl J Med. 1996;334(9):567–572. doi: 10.1056/NEJM199602293340904.
  • To MS, Skentou CA, Royston P, et al. Prediction of patient-specific risk of early preterm delivery using maternal history and sonographic measurement of cervical length: a population-based prospective study. Ultrasound Obstet Gynecol. 2006;27(4):362–367. doi: 10.1002/uog.2773.
  • Beta J, Akolekar R, Ventura W, et al. Prediction of spontaneous preterm delivery from maternal factors, obstetric history and placental perfusion and function at 11–13 weeks. Prenat Diagn. 2011;31(1):75–83. doi: 10.1002/pd.2662.
  • Rolnik DL, Wright D, Poon LCY, et al. ASPRE trial: performance of screening for preterm pre-eclampsia. Ultrasound Obstet Gynecol. 2017;50(4):492–495. doi: 10.1002/uog.18816.
  • Chawanpaiboon S, Vogel JP, Moller AB, et al. Global, regional, and national estimates of levels of preterm birth in 2014: a systematic review and modelling analysis. Lancet Glob Health. 2019;7(1):e37–e46. doi: 10.1016/S2214-109X(18)30451-0.
  • Goldenberg RL, Culhane JF, Iams JD, et al. Epidemiology and causes of preterm birth. Lancet. 2008;371(9606):75–84. doi: 10.1016/S0140-6736(08)60074-4.
  • United Nations Inter-Agency Group for Child Mortality Estimation. Levels and Trends in Child Mortality - Report 2018. 2018.
  • Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Heal Inf Sci Syst. 2014;2(1):1–10.
  • Beyer MA, Laney D. The importance of “big data”: a definition. Gartner; 2012.
  • Ayodele TO. Types of machine learning algorithms. In: Zhang Y, editor. New Advances in Machine Learning. Rijeka: IntechOpen; 2010. p. 21–48.
  • Geron A. Hands-on machine learning with Scikit-Learn, Keras & TensorFlow: concepts, tools, and techniques to build intelligent systems. Sebastopol (CA): O’Reilly Media; 2019.
  • Wang Y, Wang D, Geng N, et al. Stacking-based ensemble learning of decision trees for interpretable prostate cancer detection. Appl Soft Comput J. 2019;77:188–204. doi: 10.1016/j.asoc.2019.01.015.
  • Hosny A, Parmar C, Quackenbush J, et al. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500–510. doi: 10.1038/s41568-018-0016-5.
  • Arnold KC, Flint CJ. Cerclage for the management of cervical insufficiency. In: Arnold KC, Flint CJ, editors. Obstetrics Essentials. Cham: Springer; 2017. p. 173–177.
  • Pacagnella RC, Silva TV, Cecatti JG, et al. Pessary plus progesterone to prevent preterm birth in women with short cervixes: a randomized controlled trial. Obstet Gynecol. 2022;139(1):41–51. doi: 10.1097/AOG.0000000000004634.
  • Pacagnella RC, Mol B, Borovac-Pinheiro A, et al. A randomized controlled trial on the use of pessary plus progesterone to prevent preterm birth in women with short cervical length (P5 trial). BMC Pregnancy Childbirth. 2019;19(1):442. doi: 10.1186/s12884-019-2513-2.
  • Witkin SS, Moron AF, Ridenhour BJ, et al. Vaginal biomarkers that predict cervical length and dominant bacteria in the vaginal microbiomes of pregnant women. mBio. 2019;10(1):e02242-19. doi: 10.1128/mBio.02242-19.
  • Hatanaka AR, Franca MS, Hamamoto TENK, et al. Antibiotic treatment for patients with amniotic fluid “sludge” to prevent spontaneous preterm birth: a historically controlled observational study. Acta Obstet Gynecol Scand. 2019;98(9):1157–1163. doi: 10.1111/aogs.13603.
  • Kagan KO, Sonek J. How to measure cervical length. Ultrasound Obstet Gynecol. 2015;45(3):358–362. doi: 10.1002/uog.14742.
  • Manssori J. What is vectorization in machine learning? [Internet]. June 2020. 2020. Available from: https://towardsdatascience.com/what-is-vectorization-in-machine-learning-6c7be3e4440a.
  • Austin PC, Tu JV. Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. J Clin Epidemiol. 2004;57(11):1138–1146. doi: 10.1016/j.jclinepi.2004.04.003.
  • Mukaka MM. Statistics corner: a guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69–71.
  • Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Int Jt Conf Artif Intell. 1995;2:1137–1145.
  • Pampel FC. Logistic regression: a primer (quantitative applications in the social sciences). Thousand Oaks, CA: Sage Universal Paper; 2000.
  • Johnson RA, Wichern DW. Applied multivariate statistical analysis. 6th Ed. Upper Saddle River (NJ): Pearson Education; 2007.
  • Aha DW, Kibler D, Albert MK. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66. doi: 10.1007/BF00153759.
  • Santos RAF, De Barros RSM. Comparing FBTSeg and NNTree implementations with established ensemble methods. Proc - Int Conf Tools with Artif Intell ICTAI. 2012;1(11):898–903.
  • Conover W. Practical nonparametric statistics. 3rd ed. New York (NY): John Wiley & Sons; 1999.
  • Džeroski S, Ženko B. Is combining classifiers with stacking better than selecting the best one? Mach Learn. 2004;40(2):50–62.
  • Van Rossum GD. Python reference manual. Amsterdam: Centrum voor Wiskunde en Informatica; 1995.
  • Hammad KAI, Mohammed AIF, Zain JM, et al. Big data analysis and storage. Proc 2015 Int Conf Oper Excell Serv Eng. 2015;(9):648–659.
  • Ricciardi W, Pita Barros P, Bourek A, et al. How to govern the digital transformation of health services. Eur J Public Health. 2019;29(Supplement_3):7–12. doi: 10.1093/eurpub/ckz165.
  • Celik E, To M, Gajewska K, et al. Cervical length and obstetric history predict spontaneous preterm birth: development and validation of a model to provide individualized risk assessment. Ultrasound Obstet Gynecol. 2008;31(5):549–554. doi: 10.1002/uog.5333.
  • Pernía-Espinoza A, Fernandez-Ceniceros J, Antonanzas J, et al. Stacking ensemble with parsimonious base models to improve generalization capability in the characterization of steel bolted components. Appl Soft Comput J. 2018;70:737–750. doi: 10.1016/j.asoc.2018.06.005.
  • Martin JA, Hamilton BE, Sutton PD, et al. Births: final data for 2005. Natl Vital Stat Rep. 2007;56(6):1–103.
  • Russell RB, Green NS, Steiner CA, et al. Cost of hospitalization for preterm and low birth weight infants in the United States. Pediatrics. 2007;120(1):e1–e9. doi: 10.1542/peds.2006-2386.
  • He J, Baxter SL, Xu J, et al. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–36.
  • Nicolaides KH. Nuchal translucency and other first-trimester sonographic markers of chromosomal abnormalities. Am J Obstet Gynecol. 2004;191(1):45–67. doi: 10.1016/j.ajog.2004.03.090.
  • Moher D, Wells GA, Dulberg CS. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA. 1994;272(2):122–124. doi: 10.1001/jama.1994.03520020048013.
  • Wilson JMG, Jungner G. Principles and practice of screening for disease. J R Coll Gen Pract. 1968;16(4):318.
  • Pergialiotis V, Bellos I, Antsaklis A, et al. Presence of amniotic fluid sludge and pregnancy outcomes: a systematic review. Acta Obstet Gynecol Scand. 2020;99(11):1434–1443. doi: 10.1111/aogs.13893.
  • Romero R, Conde-Agudelo A, Da Fonseca E, et al. Vaginal progesterone for preventing preterm birth and adverse perinatal outcomes in singleton gestations with a short cervix: meta-analysis of individual patient data. Am J Obstet Gynecol. 2018;218(2):161–180. doi: 10.1016/j.ajog.2017.11.576.
  • Alfirevic Z, Owen J, Carreras Moratonas E, et al. Vaginal progesterone, cerclage or cervical pessary for preventing preterm birth in asymptomatic singleton pregnant women with a history of preterm birth and a sonographic short cervix. Ultrasound Obstet Gynecol. 2013;41(2):146–151. doi: 10.1002/uog.12300.