Abstract
Objective
Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose because of the disease’s heterogeneity. This study aimed to compare different machine learning models and identify the most effective one for classifying two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimum number of features.
Methods
After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were selected for the final analysis from the Frankfurt dataset, which contains asthma-related information on 205 patients. The Random Forest algorithm and the Chi-square test were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machine, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation.
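The stratified 10-fold split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `stratified_folds`, the round-robin assignment, the seed, and the label strings are assumptions made for the example. Only the class counts (70 and 57) come from the abstract.

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so that class proportions stay balanced.

    Hypothetical helper for illustration; library implementations may
    distribute the remainder samples differently.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share per class.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Class counts taken from the abstract; the label strings are placeholders.
labels = ["non_allergic"] * 70 + ["allergic"] * 57
folds = stratified_folds(labels)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]

for k, counts in enumerate(fold_counts):
    print(f"fold {k}: {dict(counts)}")
```

Each fold serves once as the test set while the remaining nine are used for training, so every patient is tested exactly once and the class ratio in each test fold stays close to that of the full cohort.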
Results
Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine with a linear kernel differentiated between predominantly allergic asthma and non-allergic asthma with the highest accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was the second-best classifier, with an overall accuracy of 76.2%.
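For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, true positive rate, true negative rate, and F1 score are computed from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study and are not meant to reproduce its results. Which asthma form counts as the positive class is likewise an assumption, since the abstract does not state it. ROC-AUC is omitted because it requires the classifier's continuous scores, not just the confusion matrix.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "true_positive_rate": tpr,
        "true_negative_rate": tnr,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * tpr / (precision + tpr),
    }


# Hypothetical counts for illustration only (not from the study).
metrics = binary_metrics(tp=8, fn=2, fp=1, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under cross-validation, such metrics are typically computed per fold and then averaged, or computed once on the pooled predictions; the abstract does not say which was done.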
Conclusion
Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features.
Supplemental data for this article is available online at www.tandfonline.com/ijas.
Acknowledgements
We thank Stefan Zielen, Sven Kluge, Helena Donath, Katherina Blümchen, Jordis Trischlera, and Johannes Schulze from Klinikum Goethe University (KGU) for providing the Frankfurt dataset for the present study.
Authors’ contribution
Piyush Bhardwaj: Conceptualization, Investigation, Software, Writing: Original Draft; Ashish Tyagi: Supervision, Writing: Review & Editing; Shashank Tyagi: Supervision, Writing: Review & Editing; Joana Antão: Writing: Review & Editing; Qichen Deng: Supervision, Writing: Review & Editing.
Declaration of interest
The authors declare no conflict of interest and there has been no financial support for this work.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.