Research Article

Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis
