
Predicting cervical cancer biopsy results using demographic and epidemiological parameters: a custom stacked ensemble machine learning approach

Article: 2143040 | Received 14 Aug 2022, Accepted 29 Oct 2022, Published online: 11 Nov 2022

Figures & data

Figure 1. Internal structure of the cervix.

Table 1. The various stages of cervical cancer

Table 2. Various studies that diagnose cervical cancer using machine learning approaches

Table 3. Description of the dataset

Figure 2. (a) Bar graph showing the percentage of null values in the dataset (b) The percentage of biopsy positive and biopsy negative results.

Figure 3. Count plots and density plots. (a) The number of patients who smoke (b) The number of patients who use hormonal contraceptives (c) The number of patients who use intra-uterine devices (d) The number of patients who have sexually transmitted diseases (e) Patient age distribution (f) The number of pregnancies (g) The number of sexual partners.

Figure 4. Multivariate analysis using box plots.

Figure 5. The impact of STDs on the biopsy result.

Figure 6. The presence of outliers in data. (a) Outliers before IQR treatment (b) After IQR treatment (Outliers removed).

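For readers who want to reproduce the IQR treatment shown in Figure 6, the following is a minimal sketch (not the authors' exact code) that drops rows falling outside the 1.5 × IQR fences of each numeric column; the DataFrame and column names are hypothetical.

```python
import pandas as pd

def remove_iqr_outliers(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Drop rows whose values fall outside the 1.5 * IQR fences of the given columns."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        mask &= df[col].between(lower, upper)
    return df[mask]

# Hypothetical usage on numeric risk-factor columns.
# clean_df = remove_iqr_outliers(df, ["Age", "Number of sexual partners"])
```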
Figure 7. Mutual information describing the relationships among the attributes used to diagnose cervical cancer.

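Mutual-information scores like those in Figure 7 can be computed with scikit-learn; this is only a sketch, with X (feature DataFrame) and y (binary biopsy label) as placeholders for the cleaned data.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# X: feature DataFrame, y: biopsy label (placeholders from the preprocessing steps).
mi_scores = mutual_info_classif(X, y, discrete_features="auto", random_state=0)
mi_series = pd.Series(mi_scores, index=X.columns).sort_values(ascending=False)
print(mi_series.head(10))  # attributes most informative about the biopsy result
```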
Figure 8. Pearson's correlation heatmap describing the relationships among the attributes used to diagnose cervical cancer.

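A heatmap along the lines of Figure 8 can be produced with pandas and seaborn; a sketch assuming the cleaned data are in a DataFrame named df.

```python
import matplotlib.pyplot as plt
import seaborn as sns

corr = df.corr(method="pearson")          # pairwise Pearson correlations
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", linewidths=0.5)
plt.title("Pearson correlation of cervical cancer risk attributes")
plt.tight_layout()
plt.show()
```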
Figure 9. Biopsy results before and after balancing. (a) Initial unbalanced data (b) Balanced data after using Borderline-SMOTE.

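Figure 9 reflects oversampling of the minority (biopsy-positive) class with Borderline-SMOTE. A minimal sketch using the imbalanced-learn implementation, applied to the training split only; variable names are placeholders.

```python
from collections import Counter
from imblearn.over_sampling import BorderlineSMOTE

sampler = BorderlineSMOTE(random_state=42)            # kind="borderline-1" by default
X_train_bal, y_train_bal = sampler.fit_resample(X_train, y_train)

print("before:", Counter(y_train))
print("after: ", Counter(y_train_bal))
```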
Figure 10. Various steps followed to predict biopsy results using machine learning.

Figure 11. Custom stacking architecture to predict cervical cancer biopsy results.

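The custom stacking architecture in Figure 11 combines several base learners under a meta-classifier. The exact composition of STACK A/B/C is defined in the article; the snippet below is only an illustrative scikit-learn StackingClassifier with a plausible choice of base models and a logistic-regression meta-learner.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative base learners; the paper's stacks use their own combinations.
base_learners = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True, random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                           # out-of-fold predictions feed the meta-learner
    stack_method="predict_proba",
)
stack.fit(X_train_bal, y_train_bal)  # placeholders for the balanced training data
```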
Table 4. Performance evaluation of classifiers without feature selection or Borderline-SMOTE (unbalanced dataset)

Figure 12. ROC curves obtained by the classifiers. (a) Logistic regression (b) Decision tree (c) KNN (d) SVM (e) Naïve Bayes (f) STACK A.

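ROC curves such as those in Figure 12 can be drawn from a fitted classifier's predicted probabilities; a sketch with scikit-learn, where clf, X_test and y_test are placeholders.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

y_score = clf.predict_proba(X_test)[:, 1]        # probability of a positive biopsy
fpr, tpr, _ = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")   # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```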
Figure 13. Precision-recall curves of the initial set of classifiers.

Table 5. Performance evaluation of the initial set of classifiers after data balancing and hyperparameter tuning

Figure 14. AUCs of the bagging, boosting and stacking classifiers. (a) Random forest (b) MLP (c) AdaBoost (d) CatBoost (e) LightGBM (f) XGBoost (g) Extra Trees (h) STACK B (i) STACK C.

Figure 15. Precision-recall curves of the bagging, boosting and stacking classifiers.

Figure 16. Confusion matrices: (a) STACK A (b) STACK B (c) STACK C.

Table 6. Performance evaluation of bagging and boosting classifiers after data balancing and hyperparameter tuning

Figure 17. Feature importance using SHAP. (a) Bee swarm plot (b) Mean SHAP values.

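SHAP summaries like those in Figure 17 (bee swarm and mean |SHAP| plots) can be generated along these lines for a tree-based model; the explainer type and inputs are assumptions, not the authors' exact code.

```python
import shap

explainer = shap.TreeExplainer(model)          # assumes a tree-based fitted model
shap_values = explainer.shap_values(X_test)

# Some models return one array per class; keep the positive-biopsy class.
if isinstance(shap_values, list):
    shap_values = shap_values[1]

shap.summary_plot(shap_values, X_test)                   # bee swarm plot (Figure 17a)
shap.summary_plot(shap_values, X_test, plot_type="bar")  # mean |SHAP| values (Figure 17b)
```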
Figure 18. Feature importance using random forest.

Figure 19. Feature importance using LIME (positive biopsy result).

Figure 20. Feature importance using LIME (negative biopsy result).

Figure 21. Feature importance using LIME graphs. (a) Positive biopsy result (b) Negative biopsy result.

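The LIME explanations in Figures 19-21 describe individual predictions. A minimal sketch with the lime package; feature_names, the class names and the chosen instance are placeholders rather than the authors' settings.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# feature_names: list of column names; X_train_bal / X_test / stack come from earlier steps.
explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train_bal),
    feature_names=feature_names,
    class_names=["Biopsy negative", "Biopsy positive"],
    mode="classification",
)

# Explain a single test instance; num_features caps the bars shown in the LIME plot.
exp = explainer.explain_instance(np.asarray(X_test)[0], stack.predict_proba, num_features=10)
exp.show_in_notebook(show_table=True)
```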
Figure 22. Feature importance using ELI5 for cervical cancer positive biopsy result.

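An ELI5 view like Figure 22 can be produced roughly as follows; the model, feature_names and the explained row are placeholders, and the model must be one of the estimator types ELI5 supports (e.g. a tree ensemble or linear model).

```python
import eli5

# Global feature weights of the fitted model.
eli5.show_weights(model, feature_names=feature_names)

# Per-feature contributions to one (positive-biopsy) prediction.
eli5.show_prediction(model, X_test.iloc[0], feature_names=feature_names)
```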
Figure 23. User interface of the prediction model deployed using “Gradio” to classify cervical cancer results.

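The Gradio interface in Figure 23 wraps the trained model in a simple web form. The sketch below assumes a recent Gradio release; the input fields, feature order and prediction function are illustrative, not the authors' deployed app.

```python
import gradio as gr
import numpy as np

def predict_biopsy(age, num_partners, smokes, hormonal_contraceptives, iud, stds):
    """Hypothetical prediction function; 'stack' is the trained stacked model."""
    features = np.array([[age, num_partners, smokes, hormonal_contraceptives, iud, stds]])
    prob = stack.predict_proba(features)[0, 1]
    return {"Biopsy positive": float(prob), "Biopsy negative": float(1 - prob)}

demo = gr.Interface(
    fn=predict_biopsy,
    inputs=[
        gr.Number(label="Age"),
        gr.Number(label="Number of sexual partners"),
        gr.Checkbox(label="Smokes"),
        gr.Checkbox(label="Hormonal contraceptives"),
        gr.Checkbox(label="IUD"),
        gr.Checkbox(label="STDs"),
    ],
    outputs=gr.Label(label="Predicted biopsy result"),
)
demo.launch()
```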
Table 7. Comparison of various studies