Abstract
Classification and feature selection play an important role in knowledge discovery in high-dimensional data. Although the penalized support vector machine (SVM) is among the most powerful methods for classification and automatic feature selection in high-dimensional feature spaces, it is not directly applicable to ultrahigh-dimensional cases, in which the number of features far exceeds the sample size. In this paper, we propose an efficient two-step method that simultaneously performs classification and identifies important features in ultrahigh-dimensional models. Specifically, we first develop an independence screening procedure to reduce the dimensionality of the feature space to a moderate scale. A penalized SVM is then fitted on the dimension-reduced feature space to further select important features and estimate their coefficients. Implementation of the proposed two-step method is not limited by the dimensionality of the model and substantially reduces the computational cost. Numerical examples and a real data analysis demonstrate the finite-sample performance of our proposal.
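The two-step idea can be illustrated with a minimal sketch in plain Python. It is not the procedure developed in the paper: the screening utility (absolute difference of class means), the L1 penalty, the proximal subgradient solver, the simulated data, and all names and tuning values (`screen`, `fit_l1_svm`, `lam`, `step`, `d`) are assumptions made only for illustration.

```python
import random


def screen(X, y, d):
    """Step 1 (assumed utility): rank features by the absolute difference
    of class means and keep the indices of the d top-ranked features."""
    p = len(X[0])
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for j in range(p):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    top = sorted(range(p), key=lambda j: -scores[j])[:d]
    return sorted(top)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return max(abs(v) - t, 0.0) * (1 if v > 0 else -1)


def fit_l1_svm(X, y, lam=0.02, step=0.05, epochs=300):
    """Step 2 (assumed solver): minimize mean hinge loss + lam * ||w||_1
    by proximal subgradient descent; the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            margin = label * (sum(wj * xj for wj, xj in zip(w, row)) + b)
            if margin < 1:  # only margin violators contribute a subgradient
                for j in range(d):
                    grad_w[j] -= label * row[j] / n
                grad_b -= label / n
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Simulated data (hypothetical): n = 100 samples, p = 500 features,
# of which only the first three carry signal.
random.seed(0)
n, p, d = 100, 500, 10
signal = [0, 1, 2]
y = [1 if i % 2 == 0 else -1 for i in range(n)]
X = [[random.gauss(0, 1) + (1.5 * label if j in signal else 0.0)
      for j in range(p)] for label in y]

kept = screen(X, y, d)                          # step 1: screening
X_reduced = [[row[j] for j in kept] for row in X]
w, b = fit_l1_svm(X_reduced, y)                 # step 2: penalized fit

selected = [j for j, wj in zip(kept, w) if wj != 0.0]
predictions = [1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1
               for row in X_reduced]
accuracy = sum(pr == label for pr, label in zip(predictions, y)) / n
print("kept after screening:", kept)
print("selected by penalized fit:", selected)
print("training accuracy:", accuracy)
```

In this sketch the expensive penalized fit only ever sees `d` columns, so its cost does not grow with `p`; the screening step itself is a single pass over the features.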
Acknowledgements
We thank the anonymous reviewers and the associate editor for their valuable comments and suggestions, which significantly improved the presentation of the paper and led to a more detailed exposition.
Disclosure statement
No potential conflict of interest was reported by the author(s).