Machine Learning vs. Survival Analysis Models: a study on right censored heart failure data

Pages 1899-1916 | Received 03 Sep 2021, Accepted 26 Mar 2022, Published online: 11 Apr 2022
 

Abstract

Machine Learning models are known to capture the intricacies of data well, but native ML models cannot be used directly in time-to-event analysis because of censoring. In this paper, we explore the use of Machine Learning models in Survival Analysis using the right-censored Heart Failure Clinical Records Dataset. We first identify the most important features responsible for death due to heart failure using Recursive Feature Elimination, and then examine how Machine Learning models can be adapted to improve time-to-event analysis outcomes. To deal with censoring, the Machine Learning models are modified using Inverse Probability of Censoring Weighting (IPCW) and IPCW Bagging, and are trained on the processed dataset alongside various survival analysis models. The area under the time-dependent ROC curve (AUC) is used as the performance metric. The results show that the average AUC for the Survival Analysis models is 0.51, while that of the Machine Learning models processed using IPCW increased to 0.80 and that of the models processed using IPCW Bagging increased to 0.82. This indicates that Machine Learning models outperform Survival Analysis models in time-to-event analysis of this right-censored dataset and hence are better indicators of the risk of heart disease.
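
As a rough illustration of the IPCW idea mentioned above, the following Python sketch turns right-censored data into binary labels and sample weights at a fixed time horizon. It is not the authors' code. The function names (`censoring_survival`, `step_eval`, `ipcw_labels_weights`), the horizon `tau`, and the toy data are assumptions made for this example, and the Kaplan-Meier step uses a simplified handling of tied times.

```python
import numpy as np

def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the event.

    Returns the sorted unique observed times and G evaluated at each of them.
    Simplification (assumption): ties between events and censorings are not
    given special treatment.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(events, dtype=int)
    uniq = np.unique(times)
    surv = np.empty_like(uniq)
    g = 1.0
    for k, t in enumerate(uniq):
        at_risk = np.sum(times >= t)
        n_cens = np.sum((times == t) & (censored == 1))
        g *= 1.0 - n_cens / at_risk
        surv[k] = g
    return uniq, surv

def step_eval(uniq, surv, t, left_limit=False):
    """Evaluate the right-continuous step function G at t, or G(t-) if left_limit."""
    side = "left" if left_limit else "right"
    idx = np.searchsorted(uniq, t, side=side) - 1
    return 1.0 if idx < 0 else surv[idx]

def ipcw_labels_weights(times, events, tau):
    """Binary labels (event by tau) and IPCW weights for a hypothetical horizon tau."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    uniq, surv = censoring_survival(times, events)
    labels = ((times <= tau) & (events == 1)).astype(int)
    weights = np.zeros(len(times))
    for i, (t, d) in enumerate(zip(times, events)):
        if t <= tau and d == 1:
            # Event observed before the horizon: weight by 1 / G(T_i-).
            weights[i] = 1.0 / step_eval(uniq, surv, t, left_limit=True)
        elif t > tau:
            # Still under observation at the horizon: weight by 1 / G(tau).
            weights[i] = 1.0 / step_eval(uniq, surv, tau)
        # Censored before tau: status at tau is unknown, weight stays 0.
    return labels, weights

# Toy data (made up for illustration): follow-up times and event flags (1 = death).
toy_times = [2, 3, 5, 7, 8]
toy_events = [1, 0, 1, 0, 1]
toy_labels, toy_weights = ipcw_labels_weights(toy_times, toy_events, tau=6)
```

Weights of this kind can typically be passed to a classifier that accepts per-sample weights, for example through the `sample_weight` argument of a scikit-learn estimator's `fit` method. Subjects censored before the horizon then contribute nothing, and the remaining subjects are up-weighted to compensate.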

Disclosure statement

There is no conflict of interest.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Acknowledgements

The authors acknowledge the support provided by Indian Institute of Technology Hyderabad, India.

Notes

1 Code is adapted from Gonzalez Ginestet et al. (2021); R and Python are used for coding.

2 Cure fractions are calculated by considering mean values for the features which are not varying, i.e., mean EF is 38.08% and mean Creatinine is 1.394 mg/dL.

3 milliequivalents per litre

4 milligrams per decilitre

5 microlitre

6 micrograms per litre

7 The smooth functions considered are an adaptive smooth spline over the covariate EF and a thin plate regression spline over Creatinine multiplied by a factor of EF.
