Review Article

Estimated fetal weight standards of the INTERGROWTH-21st project for the prediction of adverse outcomes: a systematic review with meta-analysis

Article: 2230510 | Received 13 Dec 2021, Accepted 23 Jun 2023, Published online: 05 Jul 2023

Abstract

Objective

To systematically review and assess the risk of bias in the literature evaluating the performance of INTERGROWTH-21st estimated fetal weight (EFW) standards to predict maternal, fetal and neonatal adverse outcomes.

Methods

Searches were performed in seven electronic databases (Scopus, Web of Science, Medline, Embase, Lilacs, Scielo and Google Scholar) using citation tools and keywords (intergrowth AND (standard OR reference OR formula OR model OR curve)), covering publications from 2014 to the last search on April 16th, 2021. We included full-text articles investigating the ability of INTERGROWTH-21st EFW standards to predict maternal, fetal or neonatal adverse outcomes in women with a singleton pregnancy who gave birth to infants with no congenital abnormalities. The study was registered on PROSPERO under the number CRD42020115462. Risk of bias was assessed with a customized instrument based on the CHARMS checklist and composed of nine domains. Meta-analysis was performed using relative risk (RR [95%CI]) and summary ROC curves on outcomes reported by two or more methodologically homogeneous studies.

Results

Sixteen studies evaluating fifteen different outcomes were selected. The risk of bias was high (>50% of studies with high risk) for two domains: blindness of assessment (81.3%) and calibration assessment (93.8%). Considering all the outcomes investigated, for 95% of the results, the specificity was above 73.0%, but the sensitivity was below 64.1%. Pooled results demonstrated a higher RR of neonatal small for gestational age (6.71 [5.51–8.17]), Apgar <7 at 5 min (2.17 [1.48–3.18]), and neonatal intensive care unit admission (2.22 [1.76–2.79]) for fetuses classified <10th percentile when compared to those classified above this limit. The main limitation of the study is the absence of heterogeneity exploration and publication bias investigation, because no outcome was evaluated by more than five studies.

Conclusions

The IG-21 EFW standard has low sensitivity and high specificity for adverse events of pregnancy. Classification <10th percentile identifies a high-risk group for developing maternal, fetal and neonatal adverse outcomes, especially neonatal small for gestational age, Apgar <7 at 5 min, and neonatal intensive care unit admission. Future studies should include blind assessment of outcomes, perform calibration analysis with continuous data, and evaluate alternative cutoff points.

Introduction

Estimated fetal weight (EFW) is routinely used during antenatal care for the screening of fetuses at risk of presenting adverse outcomes at birth, such as small- (SGA) and large- (LGA) for-gestational-age. The prenatal surveillance of fetal growth may lead to interventions to reduce stillbirth, morbidity, and postnatal mortality [Citation1].

EFW is calculated based on measures of fetal biometry, e.g. abdominal circumference (AC), head circumference (HC), biparietal diameter (BPD), and femur length (FL). Hammami et al. [Citation2] identified 70 different formulas for EFW derived from local studies with small samples in a systematic literature review (SLR) of 45 studies. The models provided by formulas incorporating three or more biometrical measurements have been shown to be more accurate than those with fewer parameters, with a particularly good performance of the Hadlock [Citation3] formula from measurements of HC, BPD, AC and FL. However, there is no consensus about which is the most accurate formula for EFW.

An international standard to predict EFW may enable a valid comparison between and within populations. In 2017, the INTERGROWTH-21st Project (IG-21) published the first international, multicenter, population-based EFW formula and proposed international standards for fetuses at 22–40 gestational weeks. Standardized data were obtained from eight geographically diverse countries and populations. The IG-21 EFW formula is a function of AC and HC based on 2,404 newborns who underwent the last ultrasound scan within 14 days before birth [Citation4].

Several studies have evaluated the ability of the IG-21 EFW standards to predict adverse outcomes, with conflicting results [Citation5–Citation8]. This SLR aims to synthesize the evidence on the performance of IG-21 EFW standards in predicting maternal, fetal and neonatal adverse outcomes.

Methods

This SLR is part of a larger study registered on PROSPERO under the number CRD42020115462, aiming to answer what is known about all IG-21 standards’ predictive ability, including newborn size, fetal growth, gestational weight gain, symphysis-fundal height, and EFW. The search strategy was developed and performed for the main study. This paper presents the findings for studies validating EFW standards. The study design consisted of a SLR followed by a meta-analysis (MA), both conducted taking into account the Preferred Reporting Items for Systematic reviews and Meta-Analyses Group guidelines (PRISMA) [Citation9].

The literature search comprised three steps: (1) forward search using citation analysis tools of the Scopus, Web of Science (WoS), Medline, and Google Scholar databases to identify studies that cited the five articles related to the IG-21 standards [Citation4, Citation10–Citation13]; (2) automatic search in Scopus, WoS, Medline, Embase, Lilacs, and Scielo using free-text terms; (3) backward search, manually checking the reference lists of the eligible studies. The complete search strategy and its results are described in Appendix S1.

Studies were eligible if they investigated the performance of the EFW standards for predicting maternal, fetal and neonatal adverse outcomes in an external dataset (i.e. not IG-21 data) and included women with singleton pregnancies who gave birth to infants with no congenital abnormalities. Inclusion was restricted to original full-text articles published in peer-reviewed journals in English, Spanish, or Portuguese since 2014.

We excluded studies that repeated the original modeling process in the validation data or refitted the models on new data, studies restricted to overweight and/or obese subjects, studies restricted to preterm births and studies that did not evaluate the association with adverse outcomes.

Study selection

The selection process started with title and abstract screening, followed by a full-text reading of potentially eligible publications. Both steps were performed by two independent reviewers. Disagreements were resolved by a third reviewer. When multiple reports of the same study/database were identified, the more detailed report was selected. Study selection was managed using Covidence [Citation14].

Data extraction

Data extraction was performed by one author using a structured questionnaire (Appendix S2). A second author reviewed the data, and any disagreement was resolved by consensus. The following information was extracted: year of publication, country and city where the study was conducted, sample size, study design, inclusion and exclusion criteria, sample characteristics (degree of risk, age, BMI, parity and gestational age at delivery), predictors and cutoff points used in the analysis, outcomes evaluated and their respective incidence, the method used to estimate fetal weight, time points of predictor assessment, the proportion of fetuses classified below the 10th percentile, statistical methods used and presentation of results. Data were synthesized and presented in tables.

Whenever possible, the numbers of true positives, false positives, false negatives, and true negatives were extracted for each outcome, together with the percentile chosen as the cutoff point. These data were used to calculate sensitivity (%), specificity (%), positive likelihood ratio (+LR), negative likelihood ratio (-LR), positive predictive value (PPV, %), negative predictive value (NPV, %), accuracy (%) and relative risks (RR), with their respective 95% confidence intervals (CI).
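The measures listed above can all be derived from a single 2x2 table. The following is a minimal sketch in Python, not the authors' code: the function name `diagnostic_metrics`, the example counts, and the choice of a log-scale (Katz) confidence interval for the RR are assumptions made for illustration. Screen-positive fetuses (e.g. EFW <10th percentile) are treated as the exposed group.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn, z=1.96):
    """Screening metrics from a 2x2 table (hypothetical helper).

    Assumes every cell is greater than zero. Screen-positive subjects
    (tp + fp) are the exposed group for the relative risk.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    exposed = tp + fp
    unexposed = fn + tn
    rr = (tp / exposed) / (fn / unexposed)
    # Standard error of log(RR); the log-scale interval is an assumption,
    # the source does not state which CI method was used.
    se_log_rr = math.sqrt(1 / tp - 1 / exposed + 1 / fn - 1 / unexposed)
    ci = (math.exp(math.log(rr) - z * se_log_rr),
          math.exp(math.log(rr) + z * se_log_rr))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "ppv": tp / exposed,
        "npv": tn / unexposed,
        "accuracy": (tp + tn) / (exposed + unexposed),
        "rr": rr,
        "rr_ci95": ci,
    }

# Hypothetical counts, not taken from any included study.
m = diagnostic_metrics(tp=30, fp=70, fn=70, tn=830)
print({k: v for k, v in m.items() if k != "rr_ci95"}, m["rr_ci95"])
```

With these illustrative counts the tool shows the pattern described in this review: specificity is high (about 92%) while sensitivity is low (30%).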

Whenever the numerical estimate of interest was presented graphically, without the exact value, it was estimated using the software “GetData Graph Digitizer” (1 study [Citation5]) [Citation15]. Where necessary, the corresponding author was contacted and asked for supplementary data, with a reminder thirty days after the first contact. Two authors were contacted, but neither provided the requested information [Citation16, Citation17].

Assessment of risk of bias

Risk of bias was assessed with a customized instrument based on the CHARMS checklist [Citation18]. The following domains were considered: study design, recruitment method, missing data, outcome definition and measurement, blindness of assessment, similarity with IG-21 methods (same ultrasound methodology [Citation19] to obtain the parameters and IG-21 formula to obtain EFW), calibration assessment, performance of discrimination and report of strengths and weaknesses.

Two authors independently performed the risk of bias assessment. Differences were solved through consensus. The instrument consisted of nine questions, one for each domain. For each question, answers were classified as low, high, or unclear risk of bias (Appendix S3).

Data synthesis

MA was performed to report the pooled RR and a summary receiver operating characteristic curve (SROC) for the outcomes investigated by two or more studies with similar cutoff points. To avoid methodological heterogeneity, studies using birth weight (BW) as a predictor and/or with the evaluation before 32 weeks of pregnancy were excluded from the MA. To estimate the effect size, we used a random-effects model weighted by the inverse of the variance. The proportion of the observed variance reflecting the true effect’s variance, rather than sampling error, was evaluated by the I² statistic [Citation20].

MA of RR was performed in Stata, version 12.1 [Citation21]. SROC was created using Review Manager 5.4 [Citation22].
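As an illustration of inverse-variance random-effects pooling and the I² statistic, the sketch below pools per-study log relative risks in Python. It is not the authors' Stata analysis: the text specifies only a random-effects model weighted by the inverse of the variance, so the DerSimonian-Laird estimator of between-study variance used here is an assumption, and the function name `pool_random_effects` and the input values are hypothetical.

```python
import math

def pool_random_effects(log_rr, se, z=1.96):
    """Pool log relative risks with DerSimonian-Laird random effects.

    log_rr: per-study log(RR); se: per-study standard errors of log(RR).
    """
    k = len(log_rr)
    w = [1 / s ** 2 for s in se]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in se]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    # I²: share of observed variance attributed to true heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled),
        "ci95": (math.exp(pooled - z * se_pooled),
                 math.exp(pooled + z * se_pooled)),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical inputs: three studies with log(RR) and its standard error.
result = pool_random_effects([math.log(2.0), math.log(2.5), math.log(1.8)],
                             [0.20, 0.30, 0.25])
print(result)
```

When the studies agree closely, the estimated between-study variance is zero and the random-effects result coincides with the fixed-effect one; larger I² values indicate that more of the spread between studies exceeds what sampling error alone would explain.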

Results

Study selection

Excluding duplicates, 1,621 studies were identified from the initial search, and 16 were included after reading the full text. The study selection process and the reasons for exclusions are outlined in Figure S1. The backward search did not retrieve any additional studies. The list of excluded papers after full-text reading is presented in Table S1. In addition to the predetermined exclusion criteria, we further excluded one study in which the sample was exclusively composed of gestations with ultrasonography (USG) evidence of AC < 3% or EFW < 10% [Citation23].

Study characteristics

Six studies were performed in the United States (35.3%) [Citation5–Citation8, Citation24]. Others were from Spain [Citation25–Citation27], Canada [Citation28, Citation29], Brazil [Citation17, Citation30], China [Citation31], India [Citation32], Australia [Citation33] and the United Kingdom [Citation34]. The studies with the largest sample sizes used population-based secondary data [Citation24, Citation33]. Unlike the smaller studies, these papers did not estimate EFW by the IG-21 formula but used BW as an estimate of fetal weight. The Canadian studies present the same limitation [Citation28, Citation29]. Other exceptions were three studies that obtained EFW using the Hadlock formulas [Citation17, Citation30, Citation32]. Thus, these studies did not evaluate the performance of the EFW formula but only the centile curves from the IG-21 Project. One study [Citation7] did not clearly state the methods used to obtain the EFW (Table 1).

Table 1. Summary of studies characteristics.

The most frequent cutoff of the IG-21 EFW standards used to predict adverse outcomes was the 10th percentile (n = 11), followed by the 5th (n = 5) and 90th percentiles (n = 5). Only Hua et al. [Citation5] and Sovio et al. [Citation34] used repeated measures. To achieve superior prediction power, subsequent analyses focused on the last USG for these studies (Table 1).

Most studies (9/12 with available data) described a low proportion of fetuses classified below the 10th percentile, ranging between 3.2 and 14.6%. All studies with proportions <6% of fetuses below the 10th percentile (n = 5) used BW or unclear methodologies to estimate the predictor (Table 1).

Risk of bias

Three studies presented a low risk of bias for six or seven domains [Citation28, Citation29, Citation34], eleven presented a low risk for four or five domains [Citation5–Citation8, Citation25, Citation27, Citation30–Citation33, Citation35], and two showed a low risk for three domains [Citation17, Citation24] (Figure 1A). More than 50% of studies presented a high or unclear risk of bias in the following domains: missing data (75.0%), outcome definition and measurement (56.3%), blindness of assessment (81.3%), similarity with the IG-21 methods (100%) and calibration assessment (93.8%) (Figure 1B).

Figure 1. Quality assessment evaluation. (A) Risk of bias according to selected domains for each study; (B) Proportion of studies with low, high, and unclear risk of bias for each selected domain.

Synthesis of results

The outcomes evaluated included Apgar <7 at 5 min [Citation7, Citation8, Citation29, Citation35], cesarean delivery for nonreassuring fetal status (NRFS) [Citation35], chronic villitis [Citation29], cord blood pH <7.1 [Citation7,Citation35], fetal vascular malperfusion [Citation29], instrumental delivery for NRFS [Citation35], maternal vascular malperfusion [Citation29], mechanical ventilation [Citation8], perinatal mortality [Citation5, Citation8, Citation33], neonatal hypoglycemia [Citation7, Citation8], neonatal SGA [Citation6, Citation7, Citation24, Citation30, Citation32, Citation34], neonatal LGA [Citation17, Citation24, Citation27, Citation34], neonatal intensive care unit (NICU) admission [Citation7, Citation8, Citation24], respiratory distress syndrome [Citation7] and stillbirth [Citation28, Citation33, Citation35] (Tables S2 and S3).

Seven studies investigated a composite outcome that captured the occurrence of one or more outcomes [Citation5, Citation7, Citation8, Citation25, Citation31, Citation33, Citation35]. However, the outcomes included in this composite were remarkably heterogeneous and were not included in the MA (Table S4).

In general, IG-21 presented low sensitivity and high specificity for the prediction of outcomes. Considering all studies, outcomes and cutoffs, 95% of the investigations presented a specificity above 73.0%, while for 95% of the investigations, the sensitivity was below 64.1% (Tables S2 and S3).

For analysis considering low percentiles as exposure, cutoffs varied between < p3 and < p25. Sensitivity varied between 3.0% (composite outcome; <p3) and 87.9% (composite outcome; <p11.61). Specificity varied between 80.5% (composite outcome; <p11.61) and 99.4% (neonatal SGA; <p10) (Table S2).

For analysis considering high percentiles as exposure, cutoffs varied between > p75 and > p97. Sensitivity varied between 1.1% for mechanical ventilation and 88.4% for neonatal LGA (>p90 for both). Specificity varied between 48.0% (mortality; >p75) and 93.0% (NICU admission; >p90) (Table S3).

The +LR was usually greater than 1 for all outcomes and cutoffs, while the -LR was lower than or equal to 1. The PPV was generally lower than 50%, while the NPV was higher than 90%. Higher PPV values were observed for neonatal SGA, although they were still lower than 86.9% (Tables S2 and S3).

The AUC values varied between 0.53 and 0.62 for the composite outcome. The exception was the study of Zhu et al. (2019) [Citation31], which presented an AUC of 0.90. Mixed results were expected for the composite outcome because of its heterogeneous definition. The highest AUC estimates were found for the prediction of neonatal SGA (between 0.83 and 0.90) (Table S5).

A detailed description of the eligibility criteria and population characteristics for each study is available in Table S6.

Pooled results for the cutoff <p10 demonstrated a higher RR of neonatal SGA (6.71 [95%CI: 5.51–8.17]), Apgar <7 at 5 min (2.17 [95%CI: 1.48–3.18]) and NICU admission (2.22 [95%CI: 1.76–2.79]) for fetuses classified below the 10th percentile by the IG-21 EFW standard when compared to those classified above this limit. The RRs of neonatal hypoglycemia and cord blood pH <7.1 were not statistically significant. Pooled results for the cutoff >p90 demonstrated a higher RR of neonatal LGA (6.15 [95%CI: 3.72–10.14]) for fetuses classified above the 90th percentile by the IG-21 EFW standard (Figure 2). The outcomes of neonatal hypoglycemia and LGA presented substantial heterogeneity.

Figure 2. Effect estimates and pooled results for outcomes evaluated by two or more studies with similar cutoffs. Only studies in which fetal weight was estimated by formulas based on USG measurements over 32 weeks of pregnancy were included. tp: true positives; fp: false positives; fn: false negatives; tn: true negatives; RR: relative risk; SGA: small for gestational age; NICU: neonatal intensive care unit; LGA: large for gestational age. For studies with a zero cell in the contingency table, the Stata command [metan] automatically adds 0.5 to all cells.
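The zero-cell adjustment mentioned in the caption can be sketched as follows. This is a simplified Python illustration of the behaviour described above (adding 0.5 to every cell of a table that contains a zero), not a reproduction of the metan source code; the function name `relative_risk_corrected` and the counts are hypothetical.

```python
def relative_risk_corrected(tp, fp, fn, tn, correction=0.5):
    """Relative risk with a continuity correction for zero cells.

    If any cell of the 2x2 table is zero, `correction` is added to all
    four cells before the risks are computed; otherwise the raw counts
    are used unchanged.
    """
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        tp, fp, fn, tn = (c + correction for c in cells)
    risk_exposed = tp / (tp + fp)      # outcome risk among screen-positives
    risk_unexposed = fn / (fn + tn)    # outcome risk among screen-negatives
    return risk_exposed / risk_unexposed

# Hypothetical table with no events among screen-positive fetuses.
print(relative_risk_corrected(tp=0, fp=10, fn=5, tn=85))
```

Without the correction this table would give an RR of exactly zero, whose logarithm is undefined, so the study could not contribute to a pooled estimate computed on the log scale.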

SROC showed great variability in sensitivity. Specificity was greater than 0.9 for all outcomes analyzed. The diagnostic accuracy of IG-21 EFW varied according to the outcome studied, following this descending order: neonatal SGA, neonatal LGA, Apgar <7 at 5 min, neonatal hypoglycemia, NICU admission and cord blood pH <7.1 (Figure 3).

Figure 3. SROC curve for outcomes evaluated by two or more studies with similar cutoffs. It included only studies whose fetal weight was estimated by formulas based on USG measurements over 32 weeks of pregnancy. SGA: small for gestational age; NICU: neonatal intensive care unit; LGA: large for gestational age.

Further investigation of the heterogeneity was not possible due to the small number of studies. For the same reason, sensitivity analysis and an evaluation of publication bias were not performed. According to the Cochrane Handbook, at least 10 studies are needed to apply tests for funnel plot asymmetry, because with fewer studies the power of the tests to distinguish chance from real asymmetry is too low [Citation20].

Discussion

This study synthesizes the evidence regarding the ability of the IG-21 EFW standard to predict adverse outcomes. In summary, we observed that the diagnostic accuracy of IG-21 is limited by its low sensitivity, -LR, and PPV. However, if used as a screening tool, it has good performance, with high specificity. The classification below the 10th percentile was associated with a higher risk for adverse outcomes, with significantly higher risks of neonatal SGA, Apgar <7 at 5 min and NICU admission.

Our results show high specificity and low sensitivity for the most commonly used cutoff points (10th and 90th percentiles). This means that the probability of developing adverse outcomes among those classified at the highest/lowest percentiles is high and the chance of false-positive screening is low. On the other hand, low sensitivity indicates a high rate of false-negative screening, which means that many individuals who are classified as having adequate EFW may still develop adverse outcomes.

The advantages of a screening tool with high specificity are sparing families the emotional impact of a false-positive result and reducing unnecessary procedures, interventions, and iatrogenic preterm births, which also saves valuable resources and, potentially, lives. However, the lack of sensitivity means that many cases of adverse outcomes are missed and do not receive timely interventions [Citation36].

The use of alternative cutoff points can improve discrimination power. More restrictive cutoffs increase the rate of false-negative results and should be avoided. More inclusive cutoffs are more sensitive, which could improve the performance of the IG-21 EFW standard. Only three studies reported the best cutoff according to receiver operating characteristic (ROC) curve analysis. Blue et al. [Citation6], Zhu et al. [Citation31] and Kato et al. [Citation30] identified percentiles 22, 11.61 and 40.9, respectively, as those optimizing the balance of true-positive, false-positive and false-negative results. The best cutoff point may vary according to the outcome of interest and its prevalence in each geographic region [Citation37].

PPV and NPV must be interpreted with caution since they are dependent on the outcome incidence and are not fixed characteristics of the test [Citation38]. PPV is directly influenced by the prevalence, while NPV is inversely affected by it. In this way, the low prevalence of SLR outcomes can partially explain the high NPV and low PPV observed.
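The dependence of PPV and NPV on prevalence follows directly from Bayes' theorem and can be checked numerically. The Python sketch below holds sensitivity and specificity fixed and varies only the prevalence; the function name `predictive_values` and all input values are illustrative assumptions, not estimates from the included studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence."""
    # Expected proportions of the population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical test: 50% sensitivity, 90% specificity.
for prevalence in (0.02, 0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.5, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

For the same test, PPV rises and NPV falls as the outcome becomes more common, which is why rare outcomes tend to show a high NPV and a low PPV regardless of the quality of the standard.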

Some included papers have unique characteristics that need to be highlighted. Nahirney et al. [Citation24], Choi et al. [Citation33], Hiersch et al. [Citation28] and Melamed et al. [Citation29] did not estimate the fetal weight but used the BW to categorize infants with SGA or LGA according to the IG-21 EFW standard [Citation4]. Lorusso et al. [Citation17], Vikraman & Elayedatt [Citation32], and Kato et al. [Citation30] estimated the fetal weight using Hadlock formulas. IG-21 EFW charts were not originally developed to classify BW or EFW derived by other formulas, and these strategies can bias the results in unexpected forms.

We did not expect to find studies applying BW to IG-21 EFW centiles in our literature search; therefore, this situation was not anticipated in our eligibility criteria. However, we decided to keep these studies in the SLR to emphasize the recurrent presence of this approach in the literature and to discourage other authors from following this methodology. The use of BW rather than EFW is expected to yield overly optimistic assessments of predictive performance. These studies were not included in our MA, and the exclusion of their results from the SLR does not change our main conclusions.

The risk of bias assessment indicated that important strategies to avoid bias and ensure transparency were not implemented or reported in the included studies. The lack of blindness in the outcome assessment may overestimate the method’s predictive ability, especially when the outcome requires subjective interpretation (e.g. Apgar score) [Citation18, Citation39]. In turn, the methods used to measure the predictor may influence the results in several ways. Studies do not clearly state the ultrasonographic procedures and measures of the parameters that compose the EFW formula, which is essential for readers to contextualize the results [Citation18]. Finally, calibration assessment is essential when predictions are used for clinical decisions [Citation40]. Especially for those studies investigating the accuracy of the IG-21 formula to predict BW, the calibration plot would provide a better idea of how it performs in each population and whether it produces valid measures for the outcome of interest.

It is expected that any growth standard would find an association with SGA-related outcomes when comparing below and above a particular cutoff. This aspect may be enriched by comparing the results of various standards. All papers included in this SLR compared the performance of the IG-21 standard with other methods. However, our literature review was not designed to accomplish this objective. Thus, it must be explored in future investigations.

This is the first study to synthesize data on the ability of IG-21 EFW standards to predict adverse maternal, fetal and neonatal outcomes worldwide, considering different cutoff points. The search process was extremely sensitive, with three steps in multiple databases. Moreover, we extracted and systematized quantitative data to perform an MA for some of the outcomes. As a limitation, we could not investigate the sources of heterogeneity in the MA results or the possibility of publication bias, since no outcome was evaluated by more than five studies.

EFW formulas and curves are considered important tools for predicting adverse pregnancy outcomes worldwide, especially SGA and LGA. The idea proposed by the IG-21 consortium of a population-based international standard derived prospectively, using a prescriptive approach, has a biological basis and makes sense in today's multicultural societies [Citation4]. The findings of the present SLR clarify this standard’s discrimination characteristics for its use in clinical practice.

Recommendations for future studies include using prospective data, standardized methods, and blinded assessment of outcomes. Researchers investigating any tool’s predictive value should assess and report its performance in terms of both calibration and discrimination. Studies aiming to confirm the usefulness of the IG-21 standards for predicting adverse outcomes can be improved by evaluating predefined and biologically plausible alternative cutoff points by ROC curve analysis and by using the IG-21 formula to obtain EFW instead of other formulas or BW. On the other hand, dichotomizing continuous predictors always means losing information or, even worse, might entail biased findings based on the data-driven selection of a “best cutoff point” [Citation41]. Thus, authors should consider analyzing EFW as a continuous variable, mainly when it is included in multivariable regression models, as suggested by Stirnemann et al. [Citation4].

Conclusion

Being classified as adequate EFW by IG-21 is insufficient to exclude a more rigorous follow-up due to the high false-negative rates. However, the evidence indicates that patients classified in extreme percentiles should be monitored more closely since this is a highly specific tool, meaning these individuals have a significantly increased risk of developing maternal, fetal and neonatal adverse outcomes; especially neonatal SGA, Apgar <7 at 5 min, and NICU admission. More standardized and methodologically robust research would benefit this area, as evidenced by the high or unclear risk of bias observed in some of the individual studies included in our review.

Author contributions

FR performed the search, obtained and interpreted the data for the work, and drafted the manuscript. FR, TRBC and RC participated in the screening process and full-text reading. FR and RC performed the risk of bias assessment. FR, TRBC, RC, DRF, MMS, EOO and GK contributed to the conception and design of the work, revised the manuscript for intellectual content, provided final approval for publication, and agreed to be accountable for all aspects of the work.

Acknowledgments

The authors thank Ronaldo Alves and Mylena Gonzalez for their contribution to screening studies for inclusion in this review.

Disclosure statement

GK, EOO, and MMS worked with the INTERGROWTH-21st consortium.

Additional information

Funding

This study was supported by the Brazilian National Research Council (CNPq) and the Brazilian Ministry of Health [grant numbers 408678/2017-8, 443770/2018-2]. The funders did not participate in conducting research or writing the paper.

References

  • McCowan LM, Figueras F, Anderson NH. Evidence-based national guidelines for the management of suspected fetal growth restriction: comparison, consensus, and controversy. Am J Obstet Gynecol. 2018;218(2S):S855–S868. DOI:10.1016/j.ajog.2017.12.004.
  • Hammami A, Zumaeta AM, Syngelaki A, et al. Ultrasonographic estimation of fetal weight: development of new model and assessment of performance of previous models. Ultrasound Obstet Gynecol. 2018;52(1):35–43. DOI:10.1002/uog.19066.
  • Hadlock FP, Harrist RB, Sharman RS, et al. Estimation of fetal weight with the use of head, body, and femur measurements-a prospective study. Am J Obstet Gynecol. 1985;151(3):333–337. DOI:10.1016/0002-9378(85)90298-4.
  • Stirnemann J, Villar J, Salomon LJ, et al. International estimated fetal weight standards of the INTERGROWTH-21st project. Ultrasound Obstet Gynecol. 2017;49(4):478–486. DOI:10.1002/uog.17347.
  • Hua X, Shen M, Reddy UM, et al. Comparison of the INTERGROWTH‐21st, national institute of child health and human development, and WHO fetal growth standards. Int J Gynaecol Obstet. 2018;143(2):156–163. DOI:10.1002/ijgo.12637.
  • Blue NR, Savabi M, Beddow ME, et al. The Hadlock method is superior to newer methods for the prediction of the birth weight percentile. J Ultrasound Med. 2019;38(3):587–596. DOI:10.1002/jum.14725.
  • Nwabuobi C, Camisasca-Lopina H, Leavitt K, et al. INTERGROWTH-21st and Hadlock growth standards to predict neonatal small for gestational age and short-term neonatal outcomes. Am J Obstet Gynecol. 2018;218(1):S310. DOI:10.1016/j.ajog.2017.11.044.
  • Kabiri D, Romero R, Gudicha DW, et al. Prediction of adverse perinatal outcomes by fetal biometry: a comparison of customized and population-based standards. Ultrasound Obstet Gynecol. 2020;55(2):177–188. DOI:10.1002/uog.20299.
  • Page MJ, McKenzie J, Bossuyt P, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2020;372:n71.
  • Cheikh Ismail L, Bishop DC, Pang R, et al. Gestational weight gain standards based on women enrolled in the fetal growth longitudinal study of the INTERGROWTH-21st project: a prospective longitudinal cohort study. BMJ. 2016;352:i555. DOI:10.1136/bmj.i555.
  • Papageorghiou AT, Ohuma EO, Gravett MG, et al. International standards for symphysis-fundal height based on serial measurements from the fetal growth longitudinal study of the INTERGROWTH-21st project: prospective cohort study in eight countries. BMJ. 2016;355:i5662. DOI:10.1136/bmj.i5662.
  • Papageorghiou AT, Ohuma EO, Altman DG, et al. International standards for fetal growth based on serial ultrasound measurements: the fetal growth longitudinal study of the INTERGROWTH-21st project. Lancet. 2014;384(9946):869–879. DOI:10.1016/S0140-6736(14)61490-2.
  • Villar J, Cheikh Ismail L, Victora CG, et al. International standards for newborn weight, length, and head circumference by gestational age and sex: the newborn Cross-Sectional study of the INTERGROWTH-21st project. Lancet. 2014;384(9946):857–868. DOI:10.1016/S0140-6736(14)60932-6.
  • Covidence systematic review software. Veritas Health Innovation, Melbourne, Australia. www.covidence.org
  • Fedorov S. GetData graph digitizer. 2002.
  • Cheng X, Folco EJ, Shimizu K, et al. Adiponectin induces pro-inflammatory programs in human macrophages and CD4+ T cells. J Biol Chem. 2012;287(44):36896–36904. DOI:10.1074/jbc.M112.409516.
  • Lorusso L, Kato DMP, Dalla Costa NRA, et al. Performance of local reference curve on the diagnosis of large for gestational age fetuses in diabetic pregnant women. J Matern Fetal Neonatal Med. 2022;35(10):1899–1906. DOI:10.1080/14767058.2020.1774539.
  • Moons KGM, de Groot JAH, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. DOI:10.1371/journal.pmed.1001744.
  • Papageorghiou AT, Sarris I, Ioannou C, et al. Ultrasound methodology used to construct the fetal growth standards in the INTERGROWTH-21st project. BJOG. 2013;120:27–32. DOI:10.1111/1471-0528.12313.
  • Higgins JPT, Cochrane Collaboration. Cochrane handbook for systematic reviews of interventions. 2nd ed. Hoboken (NJ): Wiley-Blackwell; 2020.
  • StataCorp. Stata Statistical Software: release 12. 2011.
  • The Cochrane Collaboration. Review Manager (RevMan). 2020.
  • Finneran MM, Ware CA, Russo J, et al. Use of birth weight- vs. ultrasound-derived fetal weight classification methods: implications for detection of abnormal umbilical artery Doppler. J Perinat Med. 2020;48(6):615–624. DOI:10.1515/jpm-2020-0068.
  • Nahirney M, Chaput K, Metcalfe A. Assessing the role of maternal race on the prediction of NICU admission by three growth charts: a cross-sectional study. J Matern Fetal Neonatal Med. 2021;34(8):1233–1240. DOI:10.1080/14767058.2019.1631791.
  • Morales-Roselló J, Cañada Martínez AJ, Scarinci E, et al. Comparison of cerebroplacental ratio, INTERGROWTH-21st standards, customized growth, and local population references for the prediction of fetal compromise: which is the best approach? Fetal Diagn Ther. 2019;46(5):341–352. DOI:10.1159/000497142.
  • Saviron-Cornudella R, Mariano Esteban L, Lerma D, et al. Comparison of fetal weight distribution improved by paternal height by Spanish standard versus Intergrowth 21st standard. J Perinat Med. 2018;46(7):750–759. DOI:10.1515/jpm-2016-0298.
  • Savirón-Cornudella R, Esteban LM, Aznar-Gimeno R, et al. Prediction of large for gestational age by ultrasound at 35 weeks and impact of ultrasound-delivery interval: comparison of 6 standards. Fetal Diagn Ther. 2021;48(1):15–23. DOI:10.1159/000510020.
  • Hiersch L, Lipworth H, Kingdom J, et al. Identification of the optimal growth chart and threshold for the prediction of antepartum stillbirth. Arch Gynecol Obstet. 2021;303(2):381–390. DOI:10.1007/s00404-020-05747-4.
  • Melamed N, Hiersch L, Aviram A, et al. Diagnostic accuracy of fetal growth charts for placenta-related fetal growth restriction. Placenta. 2021;105:70–77. DOI:10.1016/j.placenta.2021.01.022.
  • Kato DMP, Lorusso L, Bruns RF, et al. Performance of a local reference curve for predicting small for gestational age fetuses in pregnant women with HIV/AIDS. J Clin Ultrasound. 2021;49(4):322–327. DOI:10.1002/jcu.22961.
  • Zhu C, Ren Y-Y, Wu J-N, et al. A comparison of prediction of adverse perinatal outcomes between Hadlock and INTERGROWTH-21st standards at the third trimester. Biomed Res Int. 2019;2019:7698038. DOI:10.1155/2019/7698038.
  • Vikraman S, Elayedatt R. Prospective comparative evaluation of performance of fetal growth charts in the diagnosis of suboptimal fetal growth during third trimester ultrasound. J Fetal Med. 2020;7(2):103–110. DOI:10.1007/s40556-020-00244-9.
  • Choi SKY, Gordon A, Hilder L, et al. Performance of six birthweight and estimated fetal weight standards for predicting adverse perinatal outcomes: a 10-year nationwide population-based study. Ultrasound Obstet Gynecol. 2021;58(2):264–277. DOI:10.1002/uog.22151.
  • Sovio U, Smith GCS. Comparison of estimated fetal weight percentiles near term for predicting extremes of birthweight percentile. Am J Obstet Gynecol. 2021;224(3):292.e1–292.e19. DOI:10.1016/j.ajog.2020.08.054.
  • Savirón-Cornudella R, Esteban LM, Tajada-Duaso M, et al. Detection of adverse perinatal outcomes at term delivery using ultrasound estimated percentile weight at 35 weeks of gestation: comparison of five fetal growth standards. Fetal Diagn Ther. 2020;47(2):104–114. DOI:10.1159/000500453.
  • Callec R, Lamy C, Perdriolle-Galet E, et al. Impact on obstetric outcome of third-trimester screening for small-for-gestational-age fetuses. Ultrasound Obstet Gynecol. 2015;46(2):216–220. DOI:10.1002/uog.14755.
  • Habibzadeh F, Habibzadeh P, Yadollahie M. On determining the most appropriate test cut-off value: the case of tests with continuous results. Biochem Med. 2016;26(3):297–307. DOI:10.11613/BM.2016.034.
  • Ranganathan P, Aggarwal R. Common pitfalls in statistical analysis: understanding the properties of diagnostic tests – part 1. Perspect Clin Res. 2018;9(1):40–43. DOI:10.4103/picr.PICR_170_17.
  • Whiting P, Rutjes AWS, Reitsma B, et al. Sources of variation and bias in studies of diagnostic accuracy. Ann Intern Med. 2004;140(3):189–202. DOI:10.7326/0003-4819-140-3-200402030-00010.
  • Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology. 2010;21(1):128–138. DOI:10.1097/EDE.0b013e3181c30fb2.
  • Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25(1):127–141. DOI:10.1002/sim.2331.