ABSTRACT
In high-dimensional data classification, several attempts have been made to achieve variable selection by replacing the ℓ2-penalty of the support vector machine (SVM) with other penalties. However, these high-dimensional SVM methods usually do not take into account special structure among the covariates (features). In this article, we consider a classification problem in which the covariates are ordered in some meaningful way and the number of covariates p can be much larger than the sample size n. We propose a structured sparse SVM to tackle this type of problem, which combines a non-convex sparsity penalty with a cubic spline estimation procedure (i.e. penalizing second-order derivatives of the coefficients) in the SVM. From a theoretical point of view, the proposed method satisfies the local oracle property. Simulations show that the method performs well in both feature selection and classification accuracy. A real-data application is conducted to illustrate the benefits of the method.
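To make the penalized objective concrete, the sketch below writes out one plausible form of the criterion described above: hinge loss plus a non-convex sparsity penalty plus a roughness penalty on second-order differences of the ordered coefficients. This is an illustrative assumption, not the paper's exact formulation; the SCAD penalty, the discrete second-difference approximation, and the function names are all choices made here for illustration.

```python
import numpy as np

def second_diff_penalty(beta):
    # Roughness penalty: sum of squared second-order differences of the
    # coefficient vector -- a discrete analogue of penalizing the second
    # derivative of the coefficient curve over ordered features.
    d2 = beta[2:] - 2.0 * beta[1:-1] + beta[:-2]
    return float(np.sum(d2 ** 2))

def scad_penalty(beta, lam, a=3.7):
    # SCAD non-convex penalty (one common choice of non-convex penalty;
    # the paper's exact penalty may differ), applied elementwise and summed.
    b = np.abs(beta)
    p = np.where(
        b <= lam,
        lam * b,
        np.where(
            b <= a * lam,
            (2.0 * a * lam * b - b ** 2 - lam ** 2) / (2.0 * (a - 1.0)),
            lam ** 2 * (a + 1.0) / 2.0,
        ),
    )
    return float(p.sum())

def objective(beta, X, y, lam1, lam2):
    # Structured sparse SVM criterion (hypothetical form):
    # average hinge loss + sparsity penalty + smoothness penalty.
    # y is assumed to take values in {-1, +1}.
    margins = 1.0 - y * (X @ beta)
    hinge = np.maximum(0.0, margins).mean()
    return hinge + scad_penalty(beta, lam1) + lam2 * second_diff_penalty(beta)
```

Note that the smoothness term vanishes for coefficient vectors that are linear in the feature index, so it only discourages curvature, while the SCAD term drives small coefficients exactly to zero.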
Acknowledgments
We would like to thank the editor, associate editor and two reviewers for their constructive comments that have led to a significant improvement of the manuscript.
Disclosure statement
No potential conflict of interest was reported by the author(s).