Abstract
The support vector machine (SVM) has been successfully applied to various classification areas with great flexibility and a high level of classification accuracy. However, the SVM is not suitable for the classification of large or imbalanced datasets because of significant computational problems and a classification bias toward the dominant class. The SVM combined with the k-means clustering (KM-SVM) is a fast algorithm developed to accelerate both the training and the prediction of SVM classifiers by using the cluster centers obtained from the k-means clustering. In the KM-SVM algorithm, however, the penalty of misclassification is treated equally for each cluster center even though the contributions of different cluster centers to the classification can be different. In order to improve classification accuracy, we propose the WKM–SVM algorithm which imposes different penalties for the misclassification of cluster centers by using the number of data points within each cluster as a weight. As an extension of the WKM–SVM, the recovery process based on WKM–SVM is suggested to incorporate the information near the optimal boundary. Furthermore, the proposed WKM–SVM can be successfully applied to imbalanced datasets with an appropriate weighting strategy. Experiments show the effectiveness of our proposed methods.
Mathematics Subject Classification:
Acknowledgments
The authors are grateful to the editor and the reviewers for their constructive and insightful comments and suggestions, which helped to dramatically improve the quality of this article. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0009204).