Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis

Jue Zhang, School of Information and Technology, Northwest University, Xi’an, China; School of Information Engineering, Yulin University, Yulin, China

Li Chen, School of Information and Technology, Northwest University, Xi’an, China. Correspondence: [email protected]

Abstract

To address the two-class imbalanced classification problem that arises in breast cancer diagnosis, we propose a hybrid model based on sample selection that combines Random Over Sampling Examples, K-means and a Support Vector Machine (RK-SVM). Random Over Sampling Examples (ROSE) is used to balance the dataset, which in turn improves the diagnostic accuracy of the Support Vector Machine (SVM). A distinguishing sample selection factor is introduced via clustering, which encourages the selection of samples near the class boundary. The purpose of clustering here is to reduce the risk of removing useful samples and to improve the efficiency of sample selection. To test the performance of the new hybrid classifier, it is applied to breast cancer datasets and to three other datasets from the University of California Irvine (UCI) machine learning repository that are commonly used in class-imbalanced learning. The experimental results show that the proposed hybrid method outperforms most of the competing algorithms in terms of G-mean and accuracy. The results also show that the method performs well on binary classification problems.
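The abstract reports both G-mean and accuracy. The sketch below illustrates why the two can disagree on an imbalanced two-class problem. It assumes the standard definition of G-mean, the square root of sensitivity times specificity; the abstract does not state the formula the paper uses. The function names and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy labels: 90 majority (0) and 10 minority (1) samples.
y_true = [0] * 90 + [1] * 10

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that gets 80 of 90 majority and 8 of 10 minority samples right.
balanced = [0] * 80 + [1] * 10 + [1] * 8 + [0] * 2

print("always-majority:", accuracy(y_true, always_majority), g_mean(y_true, always_majority))
print("balanced:       ", accuracy(y_true, balanced), g_mean(y_true, balanced))
```

In this toy case the always-majority classifier has the higher accuracy but a G-mean of zero, because it never detects the minority class. This is the usual motivation for reporting G-mean alongside accuracy in class-imbalanced learning.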

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work is supported by the National Natural Science Foundation of China under Grant No. 51866015 and by the Shaanxi Technology Committee Industrial Public Relation Project (No. 2018GY-146).
