Abstract
Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by $\ell_1$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current article presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $f(\beta)$ and adds the penalty $\frac{\rho}{2}\operatorname{dist}(\beta, S_k)^2$ capturing the squared Euclidean distance of the parameter vector $\beta$ to the sparsity set $S_k$, where at most $k$ components of $\beta$ are nonzero. If $\beta_\rho$ represents the minimum of the objective $f(\beta) + \frac{\rho}{2}\operatorname{dist}(\beta, S_k)^2$, then $\beta_\rho$ tends to the constrained minimum of $f(\beta)$ over $S_k$ as $\rho$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power. Supplementary materials for this article are available online.
Supplementary Materials
Appendix: The file “appendix.pdf” provides derivations for both Algorithm MM and Algorithm SD, a description of simulated datasets, implementation details, and stability results for variable selection. (.pdf)
Julia code: The file “SparseSVM.zip” contains Julia code to reproduce our numerical experiments. The software is also available at https://github.com/alanderos91/SparseSVM.jl. The contents are structured as a Julia project so that software and data dependencies are handled automatically; see the project’s README for details. (.zip)