568
Views
12
CrossRef citations to date
0
Altmetric
Original Research

Cardiovascular Disease Prediction by Machine Learning Algorithms Based on Cytokines in Kazakhs of China

, , , , , , , , , ORCID Icon, , ORCID Icon & show all
Pages 417-428 | Published online: 09 Jun 2021
 

Abstract

Background

Cardiovascular disease (CVD) is the leading cause of mortality worldwide. Accurately identifying subjects at high-risk of CVD may improve CVD outcomes. We sought to systematically examine the feasibility and performance of 7 widely used machine learning (ML) algorithms in predicting CVD risks.

Methods

The final analysis included 1508 Kazakh subjects in China without CVD at baseline who completed follow-up. All subjects were randomly divided into the training set (80%) and the test set (20%). L1-penalized logistic regression (LR), support vector machine with radial basis function (SVM), decision tree (DT), random forest (RF), k-nearest neighbors (KNN), Gaussian naive Bayes (NB), and extreme gradient boosting (XGB) were employed for prediction CVD outcomes. Ten-fold cross-validation was used during model developing and hyperparameters tuning in the training set. Model performance was evaluated in the test set in light of discrimination, calibration, and clinical usefulness. RF was applied to obtain the variable importance of included variables. Twenty-two variables, including sociodemographic characteristics, medical history, cytokines, and synthetic indices, were used for model development.

Results

Among 1508 subjects, 203 were diagnosed with CVD over a median follow-up of 5.17 years. All 7 models had moderate to excellent discrimination (AUC ranged from 0.770 to 0.872) and were well calibrated. LR and SVM performed identically with an AUC of 0.872 (95% CI: 0.829–0.907) and 0.868 (95% CI: 0.825–0.904), respectively. LR had the lowest Brier score (0.078) and the highest sensitivity (97.1%). Decision curve analysis indicated that SVM was slightly better than LR. The inflammatory cytokines, such as hs-CRP and IL-6, were identified as strong predictors of CVD.

Conclusion

SVM and LR can be applied to guide clinical decision-making in the Kazakh Chinese population, and further study is required to ensure their accuracies.

Acknowledgments

We sincerely thank those who participated in the study. We would also like to acknowledge the clinical laboratory of First Affiliated Hospital of Shihezi University School of Medicine for their work.

Data Sharing Statement

The data used to support the findings of this study are available from the corresponding author (Shuxia Guo, [email protected]) upon reasonable request.

Ethics Approval and Consent to Participate

This study was approved by the Institutional Ethics Review Board (IERB) of the First Affiliated Hospital of Shihezi University School of Medicine (IERB No. SHZ2010LL01). All of the participants provided their written informed consent prior to the start of the study. All methods were carried out in accordance with relevant guidelines and regulations.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Disclosure

The authors declare that they have no conflicts of interest.

Additional information

Funding

This study was funded by the Shihezi University Innovation Outstanding Young Talents Program (Natural Science) (No. CXPY201908), the National Natural Science Foundation of China (No. 81560551) and the Key Programs for Science and Technology from Xinjiang Production and Construction Corps (No. 2018AB017).