Editorial

Artificial Intelligence and Cancer Clinical Research: III. Risk Prediction Models for Febrile Neutropenia in Patients Receiving Cancer Chemotherapy

Introduction

Chemotherapy-associated adverse events may have physical, emotional and economic effects as well as result in treatment dose reductions, delays or early stopping potentially compromising disease control and increasing cancer-related mortality (Citation1). Febrile neutropenia (FN) is defined as a single temperature ≥38.3 °C (or 38.0 °C for ≥1 h) in patients with moderate or severe neutropenia (<1,000 neutrophils per μl). FN is associated with considerable morbidity, mortality and costs requiring immediate evaluation and treatment (Citation2–4). While there is no International Classification of Diseases – Oncology (ICD-O) code specific for FN, the frequency of severe neutropenia is strongly correlated with FN and represents a useful predictive marker (Citation5). The risk of FN is directly related to the severity and duration of neutropenia following chemotherapy with the greatest risk of an initial FN event observed during the first cycle of treatment when most patients are receiving full dose chemotherapy (Citation6–8).
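The definition above can be written as a simple rule. The following is a minimal sketch only; the function and constant names are hypothetical, and the thresholds are taken directly from the definition in the text rather than from any particular guideline implementation.

```python
# Thresholds as stated in the text (names are illustrative assumptions).
FEVER_SINGLE_C = 38.3      # a single temperature reading at or above this value
FEVER_SUSTAINED_C = 38.0   # or this value when sustained for at least 1 hour
ANC_LIMIT_PER_UL = 1000    # moderate or severe neutropenia: < 1,000 neutrophils per microliter


def has_fever(temp_c: float, hours_sustained: float = 0.0) -> bool:
    """Fever criterion: one reading >= 38.3 C, or >= 38.0 C sustained for >= 1 h."""
    if temp_c >= FEVER_SINGLE_C:
        return True
    return temp_c >= FEVER_SUSTAINED_C and hours_sustained >= 1.0


def is_febrile_neutropenia(temp_c: float, anc_per_ul: float,
                           hours_sustained: float = 0.0) -> bool:
    """FN requires both the fever criterion and an ANC below 1,000 per microliter."""
    return anc_per_ul < ANC_LIMIT_PER_UL and has_fever(temp_c, hours_sustained)
```

For example, a temperature of 38.1 °C sustained for 30 minutes would not meet the fever criterion, whereas the same temperature sustained for 90 minutes would.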

Risk factors and risk models for febrile neutropenia

Multiple risk factors for FN in patients receiving cancer chemotherapy have been identified in an effort to guide more personalized clinical decision making and appropriate supportive care measures (Citation1). Recognizing that accurate clinical assessment of FN risk by clinicians is limited (Citation9), several groups have developed formal risk models for identifying patients receiving cancer chemotherapy at increased risk of FN. The validation of these models is based on both model discrimination between low and high-risk patients as well as model calibration demonstrating close agreement between predicted and observed outcomes (Citation10). While internal validation is ideally based on bootstrapping, external validation is essential for clinical application and needs to be based on a separate population similar to individuals in which the model will eventually be utilized (Citation10,Citation11).
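The two validation concepts described above, discrimination and calibration, can be illustrated with a short sketch. The data below are hypothetical and the function names are illustrative assumptions; the AUROC is computed as the proportion of event/non-event pairs in which the event case received the higher predicted risk, and calibration is summarized by comparing mean predicted and observed event rates within groups of increasing predicted risk.

```python
def auroc(y_true, y_prob):
    """Probability that a random event case is ranked above a random non-event case.

    Ties in predicted risk count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(y_true, y_prob, n_groups=2):
    """Return (mean predicted risk, observed event rate) per risk group."""
    pairs = sorted(zip(y_prob, y_true))
    rows = []
    for g in range(n_groups):
        start = g * len(pairs) // n_groups
        end = (g + 1) * len(pairs) // n_groups
        chunk = pairs[start:end]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Hypothetical outcomes (1 = FN event) and model-predicted risks for 8 patients.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]

print(auroc(y_true, y_prob))                 # discrimination
print(calibration_table(y_true, y_prob, 2))  # predicted vs observed by risk group
```

A model can discriminate well while being poorly calibrated, and vice versa, which is why both measures are reported.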

While the majority of modeling efforts have been based on relatively small patient numbers in selected disease or therapeutic settings (Citation12–16), more recent models have been based on larger, more diverse cohorts of patients with cancer. A large, nationwide US prospective real-world study of more than 4,000 ambulatory patients with solid tumors or lymphoma initiating a new chemotherapy regimen was specifically designed and conducted to develop a validated FN risk model (Citation17). Multivariable logistic regression analysis was conducted using split sample development data on 2,425 patients, demonstrating several factors significantly associated with the risk of FN in cycle 1 including previous chemotherapy, baseline leukopenia, renal dysfunction, several classes of myelosuppressive chemotherapeutic agents, planned relative dose intensity (RDI) >85%, and absence of prophylactic myeloid growth factor. Associations with other factors, such as cancer type, hepatic dysfunction or use of additional concurrent immunosuppressive drugs beyond chemotherapy, were not as strong although potentially clinically relevant (Citation17). Overall model performance in the development and internal random split sample validation datasets was encouraging, with an area under the receiver operating characteristic curve (AUROC) of 0.833 (95% CI: 0.813–0.852) and 0.805 (95% CI: 0.774–0.836), respectively. The model was subsequently validated separately in a small (N = 221) external dataset extracted from electronic health records and confirmed by medical record review of adult patients with newly diagnosed breast, ovarian or lung cancer or lymphoma who received chemotherapy (Citation18). Although model performance in the independent dataset was slightly lower than in the original study, it was judged as reasonable with an AUROC of 0.748. Further validation efforts are ongoing.

Improving conventional risk models for FN

Efforts to improve the performance of FN risk models have focused on adding potential risk factors unavailable in the original studies (Citation19,Citation20). Li and colleagues conducted a large retrospective cohort study of 15,279 adult patients receiving cancer chemotherapy at Kaiser Permanente Southern California to develop and evaluate first cycle FN risk models using an equal split sample technique incorporating additional risk factors such as rheumatoid and thyroid disorders as well as other comorbidities (Citation21). The original Lyman-Kuderer model described above served as a reference comparator model (Citation17). Despite small improvements in FN risk prediction in the first cycle of chemotherapy, these were not considered clinically significant, and no overall improvement in FN risk reclassification was observed (Citation21). Although this study appears to further validate the original model, there was no significant improvement in overall performance despite the development of an updated model with several additional risk factors (AUROC 0.72 versus 0.71). The authors point out that for models that include standard risk factors and demonstrate reasonable discrimination, very large independent associations of new covariates with outcome are needed to demonstrate a meaningful improvement in AUROC (Citation21,Citation22).

Variable shrinkage approaches to FN modeling

Standard linear regression attempts to find the regression coefficients that minimize the sum of squared residuals. We have previously discussed in some detail the limitations of variable selection based on stepwise regression, which prompted the development of other methods including variable shrinkage methods for multivariable modeling (Citation20). The least absolute shrinkage and selection operator (LASSO) is designed to facilitate variable selection and regularization to enhance model accuracy. This approach reduces the risk of overfitting by constraining the sum of the absolute values of the regression coefficients, limiting the effect of weakly prognostic variables and shrinking some coefficients exactly to zero. Overfitting may result in the probability of the outcome being underestimated in low-risk patients and overestimated in those at high risk. While LASSO may eliminate less important variables, it is not without some loss of predictive information. Alternatively, ridge regression shrinks coefficient estimates toward zero but does not eliminate them entirely, adding a penalty proportional to the sum of the squared coefficients. This may be particularly useful when there is multicollinearity, i.e., when predictors are highly correlated, as it reduces the variance of the coefficient estimates and reduces overfitting, especially in models with many predictors. However, as the shrunken coefficient estimates are biased, their interpretation can be more challenging. An intermediate approach is the elastic net, which combines the LASSO and ridge penalties to prevent overfitting and is also useful with multicollinearity. Support Vector Machines (SVMs) are supervised learning models used for classification and regression tasks that generate a decision boundary between data categories. The resulting models are less affected by outliers and are also quite robust to overfitting. While these approaches may facilitate variable selection and improve model accuracy, notably in settings where there are few events or the number of events per variable (EPV) is low, e.g., EPV < 10, they each make assumptions that must be assessed in the context of the specific model.
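The three shrinkage penalties discussed above can be stated compactly. The notation below is our own and is offered as a sketch for the linear regression case: y_i is the outcome, x_ij the covariates, λ ≥ 0 the penalty strength and α ∈ [0, 1] the elastic net mixing weight (parameterizations differ between software packages). For a binary outcome such as FN, the squared-error term would be replaced by the negative log-likelihood of the logistic model.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
  \;+\; \lambda\,P(\beta)

P_{\mathrm{LASSO}}(\beta) = \sum_{j=1}^{p} \lvert\beta_j\rvert, \qquad
P_{\mathrm{ridge}}(\beta) = \sum_{j=1}^{p} \beta_j^{2}, \qquad
P_{\mathrm{EN}}(\beta) = \alpha \sum_{j=1}^{p} \lvert\beta_j\rvert
                       + (1-\alpha) \sum_{j=1}^{p} \beta_j^{2}

% Equivalent constrained form of the LASSO, for some bound t \ge 0:
\min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert\beta_j\rvert \le t
```

Setting α = 1 recovers the LASSO and α = 0 recovers ridge regression; the intercept β_0 is conventionally left unpenalized.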

A Finnish study evaluated the electronic health records of 5,879 patients receiving cancer chemotherapy at a single hospital (Citation23). The primary outcome of neutropenic infection (NI), based on a surrogate endpoint of grade 4 neutropenia with an elevated serum C-reactive protein (>10 mg/l), occurred in 4% of patients. The authors developed a penalized regression model (least absolute shrinkage and selection operator; LASSO) to reduce the risk of overfitting and guide variable selection by constraining the sum of the absolute values of the regression coefficients to less than a fixed value. The model predicted NI with good discrimination with AUROCs of 0.84 (95% CI: 0.81–0.86) and 0.75 (95% CI: 0.69–0.77) in the development (N = 2,101) and validation (N = 1,937) datasets, respectively. The LASSO model including G-CSF use, cancer type, pretreatment neutrophil and platelet counts, intravenous chemotherapy, and planned dose intensity outperformed the previously discussed Lyman-Kuderer and Li models based on conventional logistic regression methods (P < .001). Of note, the LASSO model also was predictive of FN as an outcome with an AUROC of 0.77 (95% CI: 0.71–0.81).
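How a LASSO-type penalty performs variable selection, while a ridge-type penalty does not, can be seen in a simplified special case. The sketch below is not the method of the study above; it assumes an orthonormal design, in which each penalized coefficient has a closed form in terms of the unpenalized estimate. The coefficient values and predictor names are hypothetical.

```python
def lasso_shrink(b: float, lam: float) -> float:
    """Soft-thresholding of an unpenalized estimate b.

    Solves min 0.5*(b - beta)^2 + lam*|beta|, which is the per-coefficient
    LASSO problem when the predictors are orthonormal (an assumption here).
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b: float, lam: float) -> float:
    """Proportional shrinkage of an unpenalized estimate b.

    Solves min 0.5*(b - beta)^2 + 0.5*lam*beta^2 under the same assumption.
    """
    return b / (1.0 + lam)


# Hypothetical unpenalized coefficients for three predictors.
ols = {"strong_predictor": 1.20, "moderate_predictor": -0.50, "weak_predictor": 0.08}
lam = 0.10

lasso = {name: lasso_shrink(b, lam) for name, b in ols.items()}
ridge = {name: ridge_shrink(b, lam) for name, b in ols.items()}

for name in ols:
    print(f"{name}: unpenalized={ols[name]:+.3f} "
          f"lasso={lasso[name]:+.3f} ridge={ridge[name]:+.3f}")
```

In this illustration the weak predictor is removed by the LASSO-type rule but only reduced in magnitude by the ridge-type rule, which mirrors the distinction drawn in the text.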

Deep learning approaches to modeling

More recent risk modeling approaches have turned to Deep Learning (DL), a form of machine learning (ML) that uses multiple levels of processing to ‘learn’ representations of data at increasing levels of abstraction, exploring the detailed structure of large data sets and guiding how model parameters should change to improve data representation at each successive level (Citation24).

Artificial Neural Networks (ANNs) represent a class of DL algorithms inspired by our understanding of the structure and function of the human brain. They consist of interconnected nodes organized in layers, including an input layer, an output layer and one or more intermediate (hidden) layers where computations occur. Each connection between nodes has an associated weight, and each node computes a weighted sum of its inputs plus a bias term. An activation function is then applied to this weighted sum to introduce non-linearity into the model before the result is passed to the next layer. Weights are typically initialized at random and adjusted during training. The loss function measures how well the network’s predictions match the actual target values; common choices include mean squared error for regression and cross-entropy for classification. The difference between predicted and observed outputs is propagated backward through the network (backpropagation) to determine how each weight and bias should be adjusted to reduce the loss, with the learning rate controlling the size of each adjustment. The model is trained by optimizing the weights and biases to minimize the loss in a training dataset. Hyperparameters are further tuned by evaluating the model in a separate validation dataset. The precision, sensitivity and accuracy of the model are then assessed, with overall performance often based on the AUROC.
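The training cycle described above (forward pass, loss, backpropagation, weight update) can be sketched in a few dozen lines. This is a toy illustration under stated assumptions, not a clinical model: one hidden layer with tanh activation, a sigmoid output, cross-entropy loss and full-batch gradient descent. The data, layer size, learning rate and all names are hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward pass: weighted sums plus biases, then activation functions."""
    W1, b1, W2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, out


def mean_cross_entropy(X, y, params):
    """Loss function for binary classification, averaged over patients."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        _, p = forward(x, params)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train(X, y, n_hidden=3, lr=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    # Weights start at small random values; biases start at zero.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        params = (W1, b1, W2, b2)
        history.append(mean_cross_entropy(X, y, params))
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in zip(X, y):
            hidden, p = forward(x, params)
            # Backpropagation: gradient of the loss at the output node.
            d_out = (p - t) / len(X)
            gb2 += d_out
            for j in range(n_hidden):
                gW2[j] += d_out * hidden[j]
                # Chain rule through the tanh activation of hidden node j.
                d_hid = d_out * W2[j] * (1.0 - hidden[j] ** 2)
                gb1[j] += d_hid
                for i in range(n_in):
                    gW1[j][i] += d_hid * x[i]
        # Gradient descent step; the learning rate sets the adjustment size.
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
        b2 -= lr * gb2
    params = (W1, b1, W2, b2)
    history.append(mean_cross_entropy(X, y, params))
    return params, history


# Hypothetical data: two scaled risk factors per patient, outcome 1 = event.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [0.2, 0.1], [0.9, 0.8], [0.1, 0.3], [0.8, 0.9]]
y = [0, 0, 0, 1, 0, 1, 0, 1]

params, history = train(X, y)
print(f"loss before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

In practice such networks are built with libraries that compute the gradients automatically, and performance would be judged on data not used for training.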

TensorFlow is an open-source ML library developed by Google and designed primarily for DL models. TensorFlow provides application programming interfaces (APIs) such as Keras to facilitate model building and is supported on Windows and macOS as well as other platforms. Keras is an open-source neural network library written in Python that acts as a user-friendly, high-level API and is fully integrated into TensorFlow for building and training neural network models.

Gradient boosting is an ensemble ML approach used for regression and classification in which each sequential model (often a decision tree) tries to correct the errors made by its predecessors. With each iteration, a new model is fitted to the residuals (more generally, the negative gradient of the loss function) and added to the previous predictions, improving overall accuracy. In this way, the procedure performs gradient descent on the loss function, with each added model moving the predictions toward lower error. The final prediction is the sum of the contributions of all models, each typically scaled by a learning rate. Importantly, gradient boosting may provide information on the contribution of different model features. Gradient boosting is often implemented through fast, optimized tree-based libraries such as LightGBM and Extreme Gradient Boosting (XGBoost). Importantly, XGBoost includes L1 (LASSO-type) and L2 (ridge-type) regularization penalties to reduce overfitting, which is common when modeling real-world data. XGBoost is capable of efficiently handling large data sets and missing values, is accessible from a range of programming languages, and has shown strong predictive performance in many applications.
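The sequential fitting of models to residuals can be sketched as follows. This is a deliberately simplified illustration and not the XGBoost or LightGBM algorithm: it assumes a single predictor, squared-error loss (for which the negative gradient equals the residual), and one-split trees ("stumps") as the base model. The data and all names are hypothetical.

```python
def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def fit_stump(x, residuals):
    """Least-squares one-split tree on a single feature.

    Assumes x contains at least two distinct values.
    """
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_rounds=20, learning_rate=0.3):
    base = sum(y) / len(y)          # initial prediction: the overall mean
    preds = [base] * len(y)
    stumps = []
    mse_history = [mse(y, preds)]
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
        mse_history.append(mse(y, preds))
    return base, stumps, mse_history


def predict(base, stumps, xi, learning_rate=0.3):
    """Final prediction: base value plus the scaled sum of all stump outputs."""
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Hypothetical predictor values and continuous outcomes.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.1, 0.2, 0.1, 0.4, 0.5, 0.9, 0.8, 1.0]

base, stumps, mse_history = gradient_boost(x, y)
print(f"training MSE: start={mse_history[0]:.4f}, end={mse_history[-1]:.4f}")
```

For a binary outcome such as FN, a log-loss would typically replace the squared error, and production libraries add regularization, subsampling and handling of missing values on top of this basic loop. Note that a steadily falling training error does not by itself indicate better performance in new patients.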

FN models based on deep learning versus regression methods

Until now, there has been limited exploration of the ability of DL approaches to improve variable selection and model specification in FN risk model development compared to regression-based approaches including penalized regression models such as LASSO, ridge regression and elastic net (Citation20,Citation25). Cho and colleagues recently developed an FN risk model in 933 hospitalized patients with breast cancer receiving cancer chemotherapy using ML algorithms (Citation26). The training dataset consisted of 843 patients while validation was based on a limited dataset of 90 patients. Of note, patients with missing clinical, pathologic or therapeutic data were excluded. FN events during the first cycle were reported in 43.4% and 47.8% of the development and validation datasets, respectively. Both classical approaches and ML algorithms were studied, including LASSO and ridge regression along with support vector machine, decision tree, ANN and XGBoost algorithms. Factors associated with FN were selected from the data using recursive feature elimination, with prediction models generated for each ML algorithm. While the authors claim that the ML algorithms improve the prediction of FN in patients with breast cancer receiving chemotherapy, the differences in model performance were minimal across approaches, with AUROC ranging from 0.855 to 0.905 across all models and no algorithm clearly outperforming the others. The multiple limitations of this study, which limit the generalizability and utility of these results, include the small sample restricted to inpatients with breast cancer, likely selection bias associated with the unexplained high rates of FN not seen in other breast cancer studies, and the lack of formal analytic comparisons of model performance between algorithms.

Preliminary results from a comprehensive study of FN risk models based on advanced ML strategies compared to conventional methods in 86,161 real-world patients with solid and lymphoid malignancies receiving cancer chemotherapy were recently presented by Flanigan and colleagues (Citation27). The investigators utilized the Optum Research Database of claims from commercial and Medicare Advantage health plan enrollees between 2009 and 2020 including participants with ≥1 claim for a chemotherapy agent. The previously developed logistic regression model by Lyman, Kuderer and colleagues was validated in this population with an AUROC of 0.83 in the overall sample (Citation27). Likewise, the previously developed penalized logistic regression (LASSO) model by Li et al performed similarly in the overall sample (AUROC = 0.83) (Citation21). Using advanced ML techniques, the authors found that the XGBoost model provided the best fit. However, overall model performance based on AUROC ranged between 0.82 and 0.84 across the models studied to date, including logistic regression, LASSO, TensorFlow Keras, Gradient Boosting and XGBoost, with no clinically significant differences.

Another advanced ML modeling effort focused on a different but related outcome, i.e., predicting in-hospital mortality of patients with FN (Citation28). The investigators utilized the Nationwide Inpatient Sample (NIS) in the US to develop a risk model for mortality among patients with FN using ML techniques, including linear models such as ridge logistic regression and linear support vector machine, and non-linear models including gradient boosting tree and ANNs. The authors found that model performance was nearly identical across ML modeling strategies, with an AUROC of approximately 0.92 with all approaches.

Given the limitations of the specific populations utilized for model training and the potential for selection and algorithm bias, further independent evaluation of the potential benefit of these and other forthcoming FN risk models based on advanced ML approaches must be awaited before their comparative value can be firmly established.

Conclusions

Numerous risk factors and risk models for the occurrence of FN in patients receiving cancer chemotherapy have been developed in anticipation of targeting appropriate preventative measures toward those patients at greatest risk (Citation1). Many models have been developed based on retrospective studies of relatively small numbers of patients. An FN risk model based on logistic regression using a large prospective cohort study has been independently validated but has limitations in discriminative performance, similar to models attempting to predict poor outcomes from FN. In an effort to improve model performance, studies have more recently focused on identifying additional risk factors and the application of more advanced modeling approaches. Studies applying variable shrinkage methods to reduce overfitting due to limited events relative to the number of covariates have demonstrated modest improvement in model accuracy. The current focus on advanced ML approaches has only recently been explored in this setting. To date, modeling studies aided by ANNs, support vector machines and random forests have demonstrated very limited benefits over regression-based approaches and are plagued by persistent concerns over methodologic limitations.

As discussed previously, the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) Statement has recently been updated to include important evaluation criteria in studies involving artificial intelligence (Citation29). A risk of bias in the development and validation of such risk prediction models has been demonstrated, most commonly related to insufficient sample size. The use of split sample internal validation approaches and the lack of adherence to the appropriate methodological criteria also represent major limitations. Given the complexity and potential for bias in risk model studies utilizing advanced ML algorithms, considerably more research is needed with rigorous attention to optimal criteria for the development and validation of risk models as summarized in the TRIPOD Statement (Table 1) (Citation29,Citation30). Cancer clinical research, including prognostic and predictive modeling studies utilizing artificial intelligence and ML methods, must be rigorously designed, conducted, objectively evaluated and independently validated, considering both their strengths and significant limitations (Citation30,Citation31).

Table 1. The Transparent Reporting of multivariable prediction models for Individual Prognosis or Diagnosis and Artificial Intelligence (TRIPOD + AI) statement criteria (Citation29).

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Additional information

Funding

No funding was associated with this work. Dr Lyman reports consulting for Sandoz, AstraZeneca, BeyondSpring and G1 Therapeutics. Dr Kuderer reports consulting for Sandoz, AstraZeneca, Janssen, Pfizer, BMS, BeyondSpring, G1 Therapeutics, Seattle Genetics and Fresenius.

References

  • Kuderer NM, Desai A, Lustberg MB, Lyman GH. Mitigating acute chemotherapy-associated adverse events in patients with cancer. Nat Rev Clin Oncol. 2022;19(11):681–697. doi:10.1038/s41571-022-00685-3.
  • Lyman GH, Sparreboom A. Chemotherapy dosing in overweight and obese patients with cancer. Nat Rev Clin Oncol. 2013;10(8):451–459. doi:10.1038/nrclinonc.2013.108.
  • Denduluri N, Patt DA, Wang Y, Bhor M, Li X, Favret AM, et al. Dose delays, dose reductions, and relative dose intensity in patients with cancer who received adjuvant or neoadjuvant chemotherapy in community oncology practices. J Natl Compr Canc Netw. 2015;13(11):1383–1393. doi:10.6004/jnccn.2015.0166.
  • Kuderer NM, Dale DC, Crawford J, Cosler LE, Lyman GH. Mortality, morbidity, and cost associated with febrile neutropenia in adult cancer patients. Cancer. 2006;106(10):2258–2266. doi:10.1002/cncr.21847.
  • Mohanlal R, Ogenstad S, Lyman GH, Huang L, Blayney DW. Grade 4 neutropenia frequency as a binary risk predictor for adverse clinical consequences of chemotherapy‑induced neutropenia: a meta-analysis. Cancer Invest. 2023;41(4):369–378. doi:10.1080/07357907.2023.2179064.
  • Bodey GP, Buckley M, Sathe YS, Freireich EJ. Quantitative relationships between circulating leukocytes and infection in patients with acute leukemia. Ann Intern Med. 1966;64(2):328–340. doi:10.7326/0003-4819-64-2-328.
  • Crawford J, Dale DC, Lyman GH. Chemotherapy-induced neutropenia: risks, consequences, and new directions for its management. Cancer. 2004;100(2):228–237. doi:10.1002/cncr.11882.
  • Blackwell S, Crawford J. Filgrastim (r-metHuG-CSF) in the chemotherapy setting. In: Morstyn G, Dexter T, editors. Filgrastim (r-metHuG-CSF) in clinical practice. New York, NY: Marcel Dekker; 1994. p. 103–116.
  • Lyman GH, Dale DC, Legg JC, Abella E, Morrow PK, Whittaker S, et al. Assessing patients’ risk of febrile neutropenia: is there a correlation between physician-assessed risk and model-predicted risk? Cancer Med. 2015;4(8):1153–1160. doi:10.1002/cam4.454.
  • Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–138. doi:10.1097/EDE.0b013e3181c30fb2.
  • Lyman GH, Poniewierski MS. A patient risk model of chemotherapy-induced febrile neutropenia: lessons learned from the ANC study group. J Natl Compr Canc Netw. 2017;15(12):1543–1550. doi:10.6004/jnccn.2017.7038.
  • Pfeil AM, Vulsteke C, Paridaens R, Dieudonné A-S, Pettengell R, Hatse S, et al. Multivariable regression analysis of febrile neutropenia occurrence in early breast cancer patients receiving chemotherapy assessing patient-related, chemotherapy-related and genetic risk factors. BMC Cancer. 2014;14(1):201. doi:10.1186/1471-2407-14-201.
  • Dranitsaris G, Rayson D, Vincent M, Chang J, Gelmon K, Sandor D, et al. Identifying patients at high risk for neutropenic complications during chemotherapy for metastatic breast cancer with doxorubicin or pegylated liposomal doxorubicin: the development of a prediction model. Am J Clin Oncol. 2008;31(4):369–374. doi:10.1097/COC.0b013e318165c01d.
  • Hosmer W, Malin J, Wong M. Development and validation of a prediction model for the risk of developing febrile neutropenia in the first cycle of chemotherapy among elderly patients with breast, lung, colorectal, and prostate cancer. Support Care Cancer. 2011;19(3):333–341. doi:10.1007/s00520-010-0821-1.
  • Bozcuk H, Yıldız M, Artaç M, Kocer M, Kaya Ç, Ulukal E, et al. A prospectively validated nomogram for predicting the risk of chemotherapy-induced febrile neutropenia: a multicenter study. Support Care Cancer. 2015;23(6):1759–1767. doi:10.1007/s00520-014-2531-6.
  • Aapro M, Ludwig H, Bokemeyer C, Gascón P, Boccadoro M, Denhaerynck K, et al. Predictive modeling of the outcomes of chemotherapy-induced (febrile) neutropenia prophylaxis with biosimilar filgrastim (MONITOR-GCSF study). Ann Oncol. 2016;27(11):2039–2045. doi:10.1093/annonc/mdw309.
  • Lyman GH, Kuderer NM, Crawford J, Wolff DA, Culakova E, Poniewierski MS, et al. Predicting individual risk of neutropenic complications in patients receiving cancer chemotherapy. Cancer. 2011;117(9):1917–1927. doi:10.1002/cncr.25691.
  • Pawloski PA, Thomas AJ, Kane S, Vazquez-Benitez G, Shapiro GR, Lyman GH, et al. Predicting neutropenia risk in patients with cancer using electronic data. J Am Med Inform Assoc. 2017;24(e1):e129–e135. doi:10.1093/jamia/ocw131.
  • Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, et al. How to develop a more accurate risk prediction model when there are few events. BMJ. 2015;351:h3868. doi:10.1136/bmj.h3868.
  • Lyman GH, Msaouel P, Kuderer NM. Risk model development and validation in clinical oncology: lessons learned. Cancer Invest. 2023;41(1):1–11. doi:10.1080/07357907.2022.2137914.
  • Li Y, Family L, Chen LH, Page JH, Klippel Z, Xu L, et al. Value of incorporating newly identified risk factors into risk prediction for chemotherapy-induced febrile neutropenia. Cancer Med. 2018;7(8):4121–4131. doi:10.1002/cam4.1580.
  • Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157–172; discussion 207–212. doi:10.1002/sim.2929.
  • Venäläinen MS, Heervä E, Hirvonen O, Saraei S, Suomi T, Mikkola T, et al. Improved risk prediction of chemotherapy-induced neutropenia-model development and validation with real-world data. Cancer Med. 2022;11(3):654–663. doi:10.1002/cam4.4465.
  • LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444. doi:10.1038/nature14539.
  • Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. doi:10.1016/j.jclinepi.2019.02.004.
  • Cho B-J, Kim KM, Bilegsaikhan S-E, Suh YJ. Machine learning improves the prediction of febrile neutropenia in Korean inpatients undergoing chemotherapy for breast cancer. Sci Rep. 2020;10(1):14803. doi:10.1038/s41598-020-71927-6.
  • Flanigan J, Arani R, Johnson M, et al. Optimizing G-CSF prophylaxis with AI-based modeling: quantifying clinical characteristics associated with febrile neutropenia (FN) in cancer patients. Support Care Cancer. 2024;32(Suppl 1):5434.
  • Du X, Min J, Shah CP, Bishnoi R, Hogan WR, Lemas DJ, et al. Predicting in-hospital mortality of patients with febrile neutropenia using machine learning models. Int J Med Inform. 2020;139:104140. doi:10.1016/j.ijmedinf.2020.104140.
  • Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD + AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi:10.1136/bmj-2023-078378.
  • Lyman GH, Kuderer NM. Artificial intelligence in cancer clinical research: II. Development and validation of clinical prediction models. Cancer Invest. 2024:1–5. doi:10.1080/07357907.2024.2354991.
  • Lyman GH, Kuderer NM. Artificial intelligence in cancer clinical research: I. Introduction. Cancer Invest. 2024:1–4. doi:10.1080/07357907.2024.2347784.
