Research Article

Artificial Intelligence-based Prediction of Diabetes and Prediabetes Using Health Checkup Data in Korea

Article: 2145644 | Received 18 Aug 2022, Accepted 04 Nov 2022, Published online: 21 Nov 2022

Figures & data

Figure 1. Flow chart of study participants.

Table 1. Checkup variables under study.

Figure 2. Machine learning model architecture: (a) logistic regression; (b) naïve Bayes; (c) support vector machine; (d) random forest; (e) extremely randomized tree; (f) extreme gradient boosting; (g) light gradient boosting machine; (h) multilayer perceptron.

Table 2. General characteristics of the two datasets in the study design.

Figure 3. Odds ratio plot for statistically significant features. Plot (a) shows the odds ratio for progressing from normal to prediabetes per one-unit increase in each feature; plot (b) shows the odds ratio for progressing from prediabetes to diabetes per one-unit increase in each feature.

ALB, Albumin; TBil, Total Bilirubin; Urine_SG, Urine Specific Gravity; FH_DM, Family History of Diabetes Mellitus; FBG, Fasting Blood Glucose; Hct, Hematocrit; BFP, Body Fat Percentage; PR, Pulse Rate; SBP, Systolic Blood Pressure; AST, Aspartate Aminotransferase; GGT, Gamma-Glutamyl Transferase; ESR, Erythrocyte Sedimentation Rate; DBP, Diastolic Blood Pressure; RDW, Red Cell Distribution Width; RBC, Red Blood Cell; HDL, High-density Lipoprotein; FVC, Forced Vital Capacity.

Table 3. The top 10 variables ranked by permutation feature importance for each ML model in the two datasets.

Table 4. Variable ranking for all 8 models by permutation feature importance.

Figure 4. Feature selection through the Boruta algorithm.

Table 5. Variables selected by the Boruta, SelectKBest, and Lasso methods.

Table 6. Performance measures of each classification algorithm.

Figure 5. Violin plots of FBG, HbA1c, and Hct for (a) prediabetes progression; (b) diabetes progression.

Figure 6. Confusion matrices for each machine learning algorithm: (a) prediabetes progression; (b) diabetes progression.

Figure 7. ROC curves for each machine learning algorithm: (a) prediabetes progression; (b) diabetes progression.
