1,551
Views
18
CrossRef citations to date
0
Altmetric
Invited Review Articles

Potential value and impact of data mining and machine learning in clinical diagnostics

, , , , , ORCID Icon & ORCID Icon show all
Pages 275-296 | Received 29 Mar 2020, Accepted 26 Nov 2020, Published online: 19 Mar 2021

References

  • Naugler C, Church DL. Automation and artificial intelligence in the clinical laboratory. Crit Rev Clin Lab Sci. 2019;56(2):98–110.
  • Teng X, Gong Y. Research on application of machine learning in data mining. IOP Conf Ser Mater Sci Eng. 2018;392(6):062202.
  • Sahu H, Shrma S, Gondhalakar S. A brief overview on data mining survey. Int J Comput Technol Electr Eng. 2011;1(3):114–121.
  • Benke K, Benke G. Artificial intelligence and big data in public health. IJERPH. 2018;15(12):2796.
  • Bellinger C, Jabbar MSM, Zaïane O, et al. A systematic review of data mining and machine learning for air pollution epidemiology. BMC Public Health. 2017;17(1):907.
  • Sidey-Gibbons JA, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019;19(1):64.
  • Deo R. Machine learning in medicine. Circulation. 2015;132(20):1920–1930.
  • Gui C, Chan V. JUoWOMJ. Machine learning in medicine. UWOMJ. 2017;86(2):76–78.
  • Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380(14):1347–1358.
  • Gonoodi K, Tayefi M, Saberi-Karimian M, et al. An assessment of the risk factors for vitamin D deficiency using a decision tree model. Diabetes Metab Syndr. 2019;13(3):1773–1777.
  • Tayefi M, Saberi-Karimian M, Esmaeili H, et al. Evaluating of associated risk factors of metabolic syndrome by using decision tree. Comp Clin Pathol. 2018;27(1):215–223.
  • Tayefi M, Tajfard M, Saffar S, et al. hs-CRP is strongly associated with coronary heart disease (CHD): a data mining approach using decision tree algorithm. Comput Methods Programs Biomed. 2017;141:105–109.
  • Master SR, Mayer-Schönberger V. Learning from our mistakes: the future of validating complex diagnostics. Clin Chem. 2015;61(2):347–348.
  • Zheng T, Xie W, Xu L, et al. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–127.
  • Selvakumar S, Sheik Abdullah A, Suganya R. Decision support system for type II diabetes and its risk factor prediction using bee-based harmony search and decision tree algorithm. IJBET. 2019;29(1):46–67.
  • Vasudevan P. Iterative dichotomiser-3 algorithm in data mining applied to diabetes database. J Comput Sci. 2014;10(7):1151–1155.
  • Uemura H, Ghaibeh AA, Katsuura-Kamano S, et al. Systemic inflammation and family history in relation to the prevalence of type 2 diabetes based on an alternating decision tree. Sci Rep. 2017;7:45502.
  • Khan SR, Mohan H, Liu Y, et al. The discovery of novel predictive biomarkers and early-stage pathophysiology for the transition from gestational diabetes to type 2 diabetes. Diabetologia. 2019;62(4):687–703.
  • Amadid H, Johansen NB, Bjerregaard A-L, et al. Physical activity dimensions associated with impaired glucose metabolism. Med Sci Sports Exerc. 2017;49(11):2176–2184.
  • Ramezankhani A, Pournik O, Shahrabi J, et al. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran lipid and glucose study. Diabetes Res Clin Pract. 2014;105(3):391–398.
  • Jelinek HF, Stranieri A, Yatsko A, et al. Data analytics identify glycated haemoglobin co-markers for type 2 diabetes mellitus diagnosis. Comput Biol Med. 2016;75:90–97.
  • Serdar MA, Serteser M, Ucal Y, et al. An Assessment of HbA1c in diabetes mellitus and pre-diabetes diagnosis: a multi-centered data mining study. Appl Biochem Biotechnol. 2020;190(1):44–56.
  • Basu S, Raghavan S, Wexler DJ, et al. Characteristics associated with decreased or increased mortality risk from glycemic therapy among patients with type 2 diabetes and high cardiovascular risk: machine learning analysis of the ACCORD Trial. Dia Care. 2018;41(3):604–612.
  • Georga EI, Protopappas VC, Polyzos D, Fotiadis DI, editors. Online prediction of glucose concentration in type 1 diabetes using extreme learning machines. Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2015: IEEE.
  • Ling SH, San PP, Nguyen HT. Non-invasive hypoglycemia monitoring system using extreme learning machine for Type 1 diabetes. ISA Trans. 2016;64:440–446.
  • Asad M, Qamar U, Zeb B, et al. editors. Blood glucose level prediction with minimal inputs using feedforward neural network for diabetic type 1 patients. Proceedings of the 2019 11th International Conference on Machine Learning and Computing; 2019.
  • Rouhani S, MirSharif M. Data Mining Approach for the Early Risk Assessment of Gestational Diabetes Mellitus. Int J Knowledge Discov Bioinforma. 2018;8(1):1–11.
  • Zeevi D, Korem T, Zmora N, et al. Personalized nutrition by prediction of glycemic responses. Cell. 2015;163(5):1079–1094.
  • Teixeira PL, Wei W-Q, Cronin RM, et al. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J Am Med Inform Assoc. 2017;24(1):162–171.
  • Ramezankhani A, Kabir A, Pournik O, et al. Classification-based data mining for identification of risk patterns associated with hypertension in Middle Eastern population: a 12-year longitudinal study. Medicine (Baltimore). 2016;95(35):e4143.
  • Ferreira JP, Pizard A, Machu JL, et al. Plasma protein biomarkers and their association with mutually exclusive cardiovascular phenotypes: the FIBRO-TARGETS case–control analyses. Clin Res Cardiol. 2020;109(1):22–33.
  • Jhee JH, Lee S, Park Y, et al. Prediction model development of late-onset preeclampsia using machine learning-based methods. PLoS One. 2019;14(8):e0221202.
  • Worachartcheewan A, Nantasenamat C, Isarankura-Na-Ayudhya C, et al. Quantitative population-health relationship (QPHR) for assessing metabolic syndrome. EXCLI J. 2013;12(569):569–583.
  • Kim T, Kim J, Won J, et al. A decision tree-based approach for identifying urban-rural differences in metabolic syndrome risk factors in the adult Korean population. J Endocrinol Invest. 2012;35(9):847–852.
  • Grabauskytė I, Tamošiūnas A, Kavaliauskas M, et al. A comparison of decision tree induction with binary logistic regression for the prediction of the risk of cardiovascular diseases in adult men. Informatica (Netherlands). 2018;29(4):675–692.
  • Shi CH, Zhao HH, Hou N, Chen JX, et al. Identifying metabolite and protein biomarkers in Unstable angina in-patients by feature selection based data mining method. Chem Res Chin Univ. 2011;27(1):87–93.
  • Ambale-Venkatesh B, Yang X, Wu CO, et al. Cardiovascular event prediction by machine learning: the multi-ethnic study of atherosclerosis. Circ Res. 2017;121(9):1092–1101.
  • Tseng Y-J, Huang C-E, Wen C-N, et al. Predicting breast cancer metastasis by using serum biomarkers and clinicopathological data with machine learning technologies. Int J Med Inf. 2019;128:79–86.
  • Banaei N, Moshfegh J, Mohseni-Kabir A, Houghton JM, et al. Machine learning algorithms enhance the specificity of cancer biomarker detection using SERS-based immunoassays in microfluidic chips. RSC Adv. 2019;9(4):1859–1868.
  • Dai M, Chen X, Mo S, et al. Meta-signature LncRNAs serve as novel biomarkers for colorectal cancer: integrated bioinformatics analysis, experimental validation and diagnostic evaluation. Sci Rep. 2017;7:46572.
  • Kawaguchi T, Kakuma T, Yatsuhashi H, et al. Data mining reveals complex interactions of risk factors and clinical feature profiling associated with the staging of non-hepatitis B virus/non-hepatitis C virus-related hepatocellular carcinoma. Hepatol Res. 2011;41(6):564–571.
  • Yamada S, Kawaguchi A, Kawaguchi T, et al. Serum albumin level is a notable profiling factor for non‐B, non‐C hepatitis virus‐related hepatocellular carcinoma: a data‐mining analysis. Hepatol Res. 2014;44(8):837–845.
  • Estevez J, Chen VL, Podlaha O, et al. Differential serum cytokine profiles in patients with chronic hepatitis b, c, and hepatocellular carcinoma. Sci Rep. 2017;7(1):1–11.
  • Omran DAEH, Awad ABH, Mabrouk MAER, et al. Application of data mining techniques to explore predictors of HCC in Egyptian patients with HCV-related chronic liver disease. Asian Pac J Cancer Prev. 2015;16(1):381–385.
  • Khameneh ME, Sepehri MM, Saberifiroozi M. Using data mining for identify patients at high risk to hepatocellular carcinoma in the cirrhosis liver. Preliminary report. Govaresh. 2014;19(4):265–274.
  • Kawaguchi T, Tokushige K, Hyogo H, et al. A data mining-based prognostic algorithm for NAFLD-related hepatoma patients: a nationwide study by the Japan Study Group of NAFLD. Sci Rep. 2018;8(1):10434.
  • Tanaka T, Kurosaki M, Lilly LB, et al. Identifying candidates with favorable prognosis following liver transplantation for hepatocellular carcinoma: data mining analysis. J Surg Oncol. 2015;112(1):72–79.
  • Pattanapairoj S, Silsirivanit A, Muisuk K, et al. Improve discrimination power of serum markers for diagnosis of cholangiocarcinoma using data mining-based approach. Clin Biochem. 2015;48(10–11):668–673.
  • Hu X, Zhang P, Shang A, et al. A primary proteomic analysis of serum from patients with nonfunctioning pituitary adenoma. J Int Med Res. 2012;40(1):95–104.
  • Pan CC, De Yin X, Yu JR, et al. Establishment of a specific serum proteomic profile model for liver organ-specific metastasis of nasopharyngeal carcinoma by matrix-assisted laser desorption/ionization time of flight mass spectrometry. Tumor. 2013;33(9):814–819.
  • Imai S, Yamada T, Kasashi K, et al. Construction of a risk prediction model of vancomycin-associated nephrotoxicity to be used at the time of initial therapeutic drug monitoring: a data mining analysis using a decision tree model. J Eval Clin Pract. 2019;25(1):163–170.
  • Yin W, Yi Y, Guan X, et al. Preprocedural Prediction Model for Contrast‐Induced Nephropathy Patients. JAHA. 2017;6(2):e004498.
  • Aqlan F, Markle R, Shamsan A, editors. Data mining for chronic kidney disease prediction. Proceedings of the 67th Annual Conference and Expo of the Institute of Industrial Engineers 2017; 2017.
  • Rucci P, Mandreoli M, Gibertoni D, et al. A clinical stratification tool for chronic kidney disease progression rate based on classification tree analysis. Nephrol Dial Transplant. 2014;29(3):603–610.
  • Satake K, Shimizu Y, Sasaki Y, et al. Serum under-O-glycosylated IgA1 level is not correlated with glomerular IgA deposition based upon heterogeneity in the composition of immune complexes in IgA nephropathy. BMC Nephrol. 2014;15(1):89.
  • Chen T, Brewster P, Tuttle KR, et al. Prediction of cardiovascular outcomes with machine learning techniques: application to the cardiovascular outcomes in renal atherosclerotic lesions (CORAL) study. Int J Nephrol Renovasc Dis. 2019;12:49–58.
  • Yoo KD, Noh J, Lee H, et al. A machine learning approach using survival statistics to predict graft survival in kidney transplant recipients: a multicenter cohort study. Sci Rep. 2017;7(1):1–12.
  • Leone M, Bechis C, Baumstarck K, et al. Outcome of acute mesenteric ischemia in the intensive care unit: a retrospective, multicenter study of 780 cases. Intensive Care Med. 2015;41(4):667–676.
  • Daly MC, von Allmen D, Wong HR. Biomarkers to estimate the probability of complicated appendicitis. J Pediatr Surg. 2018;53(3):437–440.
  • Petersen MB, Tolver A, Husted L, et al. Repeated measurements of blood lactate concentration as a prognostic marker in horses with acute colitis evaluated with classification and regression trees (CART) and random forest analysis. Veterinary Journal. 2016;213:18–23.
  • Ma H, Xu C-f, Shen Z, et al. Application of machine learning techniques for clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in China. Biomed Res Int. 2018;2018:4304376.
  • Saad Y, Awad A, Alakel W, et al. Data mining of routine laboratory tests can predict liver disease progression in Egyptian diabetic patients with hepatitis C virus (G4) infection: a cohort study of 71 806 patients. Eur J Gastroenterol Hepatol. 2018;30(2):201–206.
  • Hashem S, Esmat G, Elakel W, et al. Comparison of machine learning approaches for prediction of advanced liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinform. 2018;15(3):861–868.
  • Ferrario M, Cambiaghi A, Brunelli L, et al. Mortality prediction in patients with severe septic shock: a pilot study using a target metabolomics approach. Sci Rep. 2016;6:20391.
  • Dente CJ, Bradley M, Schobel S, et al. Towards precision medicine: accurate predictive modeling of infectious complications in combat casualties. J Trauma Acute Care Surg. 2017;83(4):609–616.
  • Tambuyzer T, Guiza F, Boonen E, et al. Heart rate time series characteristics for early detection of infections in critically ill patients. J Clin Monit Comput. 2017;31(2):407–415.
  • Kurosaki M, Matsunaga K, Hirayama I, et al. A predictive model of response to peginterferon ribavirin in chronic hepatitis C using classification and regression tree analysis. Hepatology Research. 2010;40(3):251–260.
  • Khairy M, Fouad R, Mabrouk M, et al. The impact of interleukin 28b gene polymorphism on the virological response to combined pegylated interferon and ribavirin therapy in chronic HCV genotype 4 infected Egyptian patients using data mining analysis. Hepat Mon. 2013;13(6):e10509.
  • El Raziky M, Fathalah WF, Zakaria Z, et al. Predictors of virological response in 3235 chronic HCV Egyptian patients treated with peginterferon alpha-2a compared with peginterferon alpha-2b using statistical methods and data mining techniques. J Interferon Cytokine Res. 2016;36(5):338–346.
  • Shang G, Richardson A, Gahan ME, et al. Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining. J Med Virol. 2013;85(8):1334–1339.
  • Coelho-dos-Reis JG, Peruhype-Magalhães V, Pascoal-Xavier MA, GIPH, et al. Flow cytometric-based protocols for assessing anti-MT-2 IgG1 reactivity: high-dimensional data handling to define predictors for clinical follow-up of Human T-cell Leukemia virus type-1 infection. J Immunol Methods. 2017;444:36–46.
  • Turk G, Ghiglione Y, Hormanstorfer M, et al. Biomarkers of progression after HIV acute/early infection: nothing compares to CD4+ T-cell count? Viruses. 2018;10(1):34.
  • Sui M, Huang X, Li Y, et al. Application and comparison of laboratory parameters for forecasting severe hand-foot-mouth disease using logistic regression, discriminant analysis and decision tree. Clin Lab. 2016;62(06/2016):1023–1031.
  • Julio TA, Vernal S, Massaro JD, et al. Biological predictors shared by dementia and bullous pemphigoid patients point out a cross-antigenicity between BP180/BP230 brain and skin isoforms. Immunol Res. 2018;66(5):567–576.
  • Sarraf S, Tofighi G. editors. Deep learning-based pipeline to recognize Alzheimer's disease using fMRI data. Proceedings of the 2016 Future Technologies Conference (FTC); 2016: IEEE.
  • Byeon H. A prediction model for mild cognitive impairment using random forests. IJACSA. 2015;6(12):8–12.
  • Bang S, Son S, Roh H, et al. Quad-phased data mining modeling for dementia diagnosis. BMC Med Inform Decis Mak. 2017;17(Suppl 1):60.
  • Husain W, Xin LK, Jothi N. editors. Predicting generalized anxiety disorder among women using random forest approach. Proceedings of the 2016 3rd International Conference on Computer and Information Sciences (ICCOINS); 2016: IEEE.
  • Kotsilieris T, Pintelas E, Livieris I, et al. Reviewing machine learning techniques for predicting anxiety disorders. University of Patras; 2018. (Technical Report TR01-18).
  • Das D, Ito J, Kadowaki T, et al. An interpretable machine learning model for diagnosis of Alzheimer's disease. PeerJ. 2019;7(3):e6543.
  • Safaee P, Noorossana R, Heidari K, et al. Using decision tree to predict serum ferritin level in women with anemia. Tehran Uni Med J, 2016;74(1):50–57. Available from: https://www.sid.ir/en/journal/ViewPaper.aspx?ID=506429
  • Girardeau Y, Jannot A-S, Chatellier G, et al. Association between borderline dysnatremia and mortality insight into a new data mining approach. BMC Med Inform Decis Mak. 2017;17(1):152.
  • Lee J-Y, Kim H-J. Identification of major risk factors association with respiratory diseases by data mining. J Korea Inf Sci Soc. 2014;25(2):373–384.
  • Uthoff J, Stephens MJ, Newell JD, Jr, et al. Machine learning approach for distinguishing malignant and benign lung nodules utilizing standardized perinodular parenchymal features from CT. Med Phys. 2019;46(7):3207–3216.
  • Franssen FM, Alter P, Bar N, et al. Personalized medicine for patients with COPD: where are we? Int J Chron Obstruct Pulmon Dis. 2019;14:1465–1484.
  • Shah SA, Velardo C, Farmer A, et al. Exacerbations in chronic obstructive pulmonary disease: identification and prediction using a digital health system. J Med Internet Res. 2017;19(3):e69.
  • Finkelstein J, Cheol Jeong I. Machine learning approaches to personalize early prediction of asthma exacerbations. Ann N Y Acad Sci. 2017;1387(1):153–165.
  • Topalovic M, Laval S, Aerts J-M, Belgian Pulmonary Function Study investigators, et al. Automated interpretation of pulmonary function tests in adults with respiratory complaints. Respiration. 2017;93(3):170–178.
  • Chen H-Y, Chuang C-H, Yang Y-J, et al. Exploring the risk factors of preterm birth using data mining. Expert Syst Appl. 2011;38(5):5384–5387.
  • Firouzi Jahantigh F, Nazarnejad R, Jahantigh F. M. Investigating the risk factors for low birth weight using data mining: a case study of Imam Ali hospital, Zahedan, Iran. Journal of Mazandaran University of Medical Sciences. 2016;25(133):171–182.
  • Pourahmad S, Hamdami E, Vaziri F, et al. Comparison of the Effective Factors of Preterm Birth Versus Low Birth Weight in Southern Iran Using Artificial Neural Network. IJWHR. 2017;5(1):55–59.
  • Ferreira D, Oliveira A, Freitas A. Applying data mining techniques to improve diagnosis in neonatal jaundice. BMC Med Inform Decis Mak. 2012;12(1):143.
  • Shirwaikar RD, Acharya D, Makkithaya K, et al. Optimizing neural networks for medical data sets: a case study on neonatal apnea prediction. Artif Intell Med. 2019;98:59–76.
  • Adnan MHBM, Husain W, editors. A hybrid approach using Naïve Bayes and Genetic Algorithm for childhood obesity prediction. Proceedings of the 2012 International Conference on Computer & Information Science (ICCIS); 2012: IEEE.
  • Zhang S, Tjortjis C, Zeng X, et al. Comparing data mining methods with logistic regression in childhood obesity prediction. Inf Syst Front. 2009;11(4):449–460.
  • Dugan TM, Mukhopadhyay S, Carroll A, et al. Machine learning techniques for prediction of early childhood obesity. Appl Clin Inform. 2015;6(3):506–520.
  • Winiarti S, Yuliansyah H, Purnama AA. Identification of Toddlers’ Nutritional Status Using Data Mining Approach. Int J Adv Comput Sci Appl. 2018;9(1):164–169.
  • Momand Z, Mongkolnam P, Kositpanthavong P, Chan JH, editors. Data mining based prediction of malnutrition in afghan children. Proceedings of the 2020 12th International Conference on Knowledge and Smart Technology (KST): IEEE.
  • Gonçalves JM, Portela F, Santos MF, et al. Predict sepsis level in intensive medicine–data mining approach. In: Advances in Information Systems and Technologies. Berlin, Heidelberg: Springer; 2013 p. 201–211.
  • Desautels T, Calvert J, Hoffman J, et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med Inform. 2016;4(3):e28.
  • Fialho AS, Cismondi F, Vieira SM, et al. Data mining using clinical physiology at discharge to predict ICU readmissions. Expert Syst Appl. 2012;39(18):13158–13165.
  • Martin CM, Vogel C, Grady D, et al. Implementation of complex adaptive chronic care: the Patient Journey Record system (PaJR)). J Eval Clin Pract. 2012;18(6):1226–1234.
  • Loh TP, Cervinski MA, Katayev A, et al. Recommendations for laboratory informatics specifications needed for the application of patient-based real time quality control. Clin Chim Acta. 2019;495:625–629.
  • Ma C, Cheng X, Xue F, et al. Validation of an approach using only patient big data from clinical laboratories to establish reference intervals for thyroid hormones based on data mining. Clin Biochem. 2020;80:25–30.
  • Jones GRD. Validating common reference intervals in routine laboratories. Clin Chim Acta. 2014;432:119–121.
  • Ahmed S, Zierk J, Khan AH. Establishment of reference intervals for alkaline phosphatase in Pakistani children using a data mining approach. Lab Med. 2020;51(5):484–490.
  • Farrell C-JL, Nguyen L, Carter AC. Data mining for age-related TSH reference intervals in adulthood. Clin Chem Lab Med. 2017;55(10):e213–e215.
  • Farrell CJL, Nguyen L, Carter AC. Parathyroid hormone: data mining for age-related reference intervals in adults. Clin Endocrinol (Oxf). 2018;88(2):311–317.
  • Ozarda Y, Higgins V, Adeli K. Verification of reference intervals in routine clinical laboratories: practical challenges and recommendations. Clin Chem Lab Med. 2018;57(1):30–37.
  • Worachartcheewan A, Schaduangrat N, Prachayasittikul V, et al. Data mining for the identification of metabolic syndrome status. Excli J. 2018;17:72–88.
  • Tahmassebi A, Wengert GJ, Helbich TH, et al. Impact of machine learning with multiparametric magnetic resonance imaging of the breast for early prediction of response to neoadjuvant chemotherapy and survival outcomes in breast cancer patients. Invest Radiol. 2019;54(2):110–117.
  • Imai S, Yamada T, Kasashi K, et al. Usefulness of a decision tree model for the analysis of adverse drug reactions: evaluation of a risk prediction model of vancomycin-associated nephrotoxicity constructed using a data mining procedure. J Eval Clin Pract. 2017;23(6):1240–1246.
  • Speiser JL, Lee WM, Karvellas CJ, et al. Predicting outcome on admission and post-admission for acetaminophen-induced acute liver failure using classification and regression tree models. PLoS One. 2015;10(4):e0122929.
  • Winiarti S, Yuliansyah H, Andi A. Identification of toddlers’ nutritional status using data mining approach. IJACSA. 2018;9(1):164–169.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.