ABSTRACT
Data Mining (DM) is a set of techniques for analysing data from different perspectives and summarising it into useful information. Data mining has been increasingly used in medicine, especially in oncology. Data preprocessing is the most important step of the knowledge extraction process and improves the performance of DM models. Breast cancer (BC) has become the most common cancer among females worldwide and the leading cause of death among women. This paper presents a systematic mapping study that analyses and synthesises studies on the application of preprocessing techniques for DM tasks in breast cancer. To this end, 66 relevant articles published between 2000 and October 2018 were selected and analysed according to five criteria: year/channel of publication, research type, medical task, empirical type and preprocessing task. The results show that conferences and journals are the most targeted publication sources, researchers were more interested in applying preprocessing techniques for the diagnosis of BC, historical-based evaluation was the most used empirical type in the evaluation of preprocessing techniques in BC, and data reduction was the most investigated preprocessing task in BC. However, a low number of papers discussed treatment, which encourages researchers to devote more effort to this task.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Imane Chlioui
Imane Chlioui is a Ph.D. student at the Computer Science and Systems Analysis School (ENSIAS, University Mohammed V, Rabat, Morocco) and a member of the Software Project Management Research Team. Her doctoral research investigates the impact of missing data techniques on breast cancer classification. She received her engineering degree in 2015 from the Computer Science and Systems Analysis School (ENSIAS).
Ali Idri
Ali Idri is a Full Professor at the Computer Science and Systems Analysis School (ENSIAS, University Mohammed V, Rabat, Morocco). He received his Master and Doctorate of 3rd Cycle in Computer Science from the University of Mohamed V in 1994 and 1997 respectively. He received his Ph.D. in Cognitive and Computer Sciences from the University of Quebec at Montreal in 2003. He has been the head of the Software Project Management Research Team since 2010 and the Chair of the department Web and Mobile Engineering for the period 2014-2020. He was the principal investigator of several leading national and international projects. He was ranked 3rd among the Top-Ten researchers in the field of software effort estimation according to the study “Research Patterns and Trends in Software Effort Estimation” (Information and Software Technology 91 (2017) 1–21). He was recently ranked 2nd among the Top-Ten researchers conducting Systematic Mapping Studies in Software Engineering according to the study “Landscaping systematic mapping studies in software engineering: A tertiary study”. He is an Associate Editor of BMC Medical Informatics and Decision Making (https://bmcmedinformdecismak.biomedcentral.com/). He is an Expert Evaluator of the CNRST (http://www.cnrst.ma/index.php/fr/). He is very active in the fields of software engineering, machine learning and medical informatics and has published more than 180 papers in well-recognized journals and conferences.
Ibtissam Abnane
Ibtissam Abnane is an Assistant Professor at the Computer Science and Systems Analysis School (ENSIAS, University Mohammed V, Rabat, Morocco). She received her engineering degree in Computer Science from the National School of Applied Sciences of Safi (ENSAS) in 2013. She received her Ph.D. from the Computer Science and Systems Analysis School (ENSIAS, University Mohammed V, Rabat, Morocco) in 2018. She is a member of the Software Project Management Research Team. She works in the fields of software engineering, machine learning and medical informatics.