Special Issue: 4th MICCAI workshop on Deep Learning in Medical Image Analysis

Data preprocessing in knowledge discovery in breast cancer: systematic mapping study

Pages 547-561 | Received 18 Aug 2019, Accepted 13 Feb 2020, Published online: 27 Feb 2020

