ABSTRACT
We consider computational and statistical issues in high-dimensional Bayesian model selection under Gaussian spike-and-slab priors. To avoid the large matrix computations required by a standard Gibbs sampler, we propose a novel Gibbs sampler, called "Skinny Gibbs," that is far more scalable to high-dimensional problems in both memory and computational efficiency. In particular, its computational complexity grows only linearly in p, the number of predictors, while it retains strong model selection consistency even when p is much greater than the sample size n. The present article focuses on logistic regression because of its broad applicability as a representative member of the generalized linear models. A simulation study comparing the proposed method with several leading variable selection methods shows that Skinny Gibbs performs strongly, consistent with our theoretical results. Supplementary materials for this article are available online.
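To illustrate the general spike-and-slab Gibbs mechanism the abstract refers to, the sketch below implements a deliberately simplified sampler for a normal-means model (not the paper's Skinny Gibbs for logistic regression): each coordinate's inclusion indicator is drawn from its conditional distribution, then the coefficient is drawn from the corresponding conjugate spike or slab posterior. All variance and prior-inclusion settings here are illustrative assumptions, and each sweep costs O(p), mirroring the linear-in-p scaling highlighted above.

```python
import numpy as np

def spike_slab_gibbs(y, sigma2=1.0, tau0=0.05, tau1=2.0, q=0.1,
                     n_iter=2000, burn=500, seed=0):
    """Toy Gibbs sampler for y_j ~ N(beta_j, sigma2) with a Gaussian
    spike-and-slab prior: beta_j ~ z_j N(0, tau1^2) + (1-z_j) N(0, tau0^2),
    z_j ~ Bernoulli(q). Returns posterior inclusion probabilities.
    (Illustrative only -- not the Skinny Gibbs algorithm of the paper.)"""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    p = len(y)
    incl = np.zeros(p)

    def marg_loglik(v):
        # log density of y_j marginalized over beta_j: y_j ~ N(0, v)
        return -0.5 * (np.log(2 * np.pi * v) + y**2 / v)

    for it in range(n_iter):
        # Conditional for z_j: compare slab vs. spike marginal likelihoods.
        l1 = marg_loglik(sigma2 + tau1**2) + np.log(q)
        l0 = marg_loglik(sigma2 + tau0**2) + np.log(1.0 - q)
        prob1 = 1.0 / (1.0 + np.exp(l0 - l1))
        z = (rng.random(p) < prob1).astype(int)

        # Conjugate normal draw for beta_j given z_j (spike or slab variance).
        tau2 = np.where(z == 1, tau1**2, tau0**2)
        post_var = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
        beta = rng.normal(post_var * y / sigma2, np.sqrt(post_var))  # O(p) sweep

        if it >= burn:
            incl += z
    return incl / (n_iter - burn)
```

Coordinates with large observed values receive inclusion probabilities near one, while near-zero coordinates are shrunk toward the spike; the paper's Skinny Gibbs achieves the analogous effect for logistic regression without forming large matrices.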
Supplementary Materials
Part A: Proofs of Theorems 1 and 2.
Part B: A discussion about the connection between L0 penalization and Skinny Gibbs.
Part C: A discussion about an unbiasedness property of Skinny Gibbs.
Part D: Skinny Gibbs algorithm using the Polya-Gamma scale mixture representation (Polson, Scott, and Windle 2013) of the logistic distribution.
Part E: A discussion on the stability and convergence of Skinny Gibbs chains for the empirical studies of Sections 4 and 5.
Part F: Additional simulation results for high correlation and weak signal settings that are not presented in the article.
Acknowledgment
The authors thank Professor Faming Liang for providing the code to perform model selection based on Bayesian Subset Regression.
Funding
This research was partially supported by NSF Awards DMS-1307566, DMS-1607840, and DMS-1811768, and by Chinese National Natural Science Projects 11129101, 11501123, and 11690012.