Abstract
The impracticality of posterior sampling has prevented the widespread adoption of spike-and-slab priors in high-dimensional applications. To alleviate the computational burden, optimization strategies have been proposed that quickly find local posterior modes. Trading off uncertainty quantification for computational speed, these strategies have enabled spike-and-slab deployments at scales that would previously have been infeasible. We build on one recent development in this strand of work: the Spike-and-Slab LASSO procedure. Instead of optimization, however, we explore multiple avenues for posterior sampling, some traditional and some new. Intrigued by the speed of Spike-and-Slab LASSO mode detection, we explore the possibility of sampling from an approximate posterior by performing MAP optimization on many independently perturbed datasets. To this end, we explore Bayesian bootstrap ideas and introduce a new class of jittered Spike-and-Slab LASSO priors with random shrinkage targets. These priors are a key constituent of the Bayesian Bootstrap Spike-and-Slab LASSO (BB-SSL) method proposed here. BB-SSL turns fast optimization into approximate posterior sampling. Beyond its scalability, we show that BB-SSL has strong theoretical support. Indeed, we find that the induced pseudo-posteriors contract around the truth at a near-optimal rate in sparse normal-means and in high-dimensional regression. We compare our algorithm to traditional Stochastic Search Variable Selection (under Laplace priors) as well as many state-of-the-art methods for shrinkage priors. We show, both in simulations and on real data, that our method fares very well in these comparisons, often providing substantial computational gains. Supplementary materials for this article are available online.
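The core idea sketched in the abstract — drawing approximate posterior samples by repeatedly re-solving a MAP problem on Dirichlet-reweighted data — can be illustrated on a toy problem. The sketch below is not the full BB-SSL procedure (no spike-and-slab penalty, no jittered shrinkage targets); it only demonstrates the Bayesian bootstrap mechanism, where each replicate draws uniform Dirichlet weights and the weighted "MAP" reduces to a weighted mean. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=200)  # toy data with true mean 1.0

B = 1000  # number of bootstrap replicates
# Dirichlet(1, ..., 1) weights, generated as normalized iid Exp(1) draws
g = rng.exponential(1.0, size=(B, x.size))
w = g / g.sum(axis=1, keepdims=True)

# Each row of weights defines a reweighted objective; its minimizer
# (here, simply the weighted mean) is one pseudo-posterior draw.
samples = w @ x
```

In BB-SSL the inner step is a fast Spike-and-Slab LASSO optimization on the perturbed objective rather than a closed-form weighted mean, but the outer loop has the same embarrassingly parallel structure.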
Supplementary Materials
Appendix: File “appendix” containing proofs, a discussion of connections to NPL, details of the computational complexity analysis, and additional experimental results mentioned in the article. (pdf file)
Code: File “code” containing R scripts to perform the simulations and experiments described in the article. (zipped file)
Notes
1 Here we are not necessarily assuming that , and the above formula is hence slightly different from Theorem 3.1 of Ročková and George (2018).
2 Note that if , then , which brings us back to the uniform Dirichlet distribution.
3 In fact, under the uniform Dirichlet distribution, the marginal distribution becomes . Since , the distribution of converges to Inverse-Gamma(1,1), which exhibits a skewed shape, in sharp contrast to the symmetric Gaussian distribution of the true posterior.
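The inline expressions in note 3 were lost in extraction, so the exact quantity is not recoverable here. Assuming the standard facts about uniform Dirichlet weights — the marginal of one coordinate of a Dirichlet(1, ..., 1) vector is Beta(1, n-1), so n times that coordinate is approximately Exp(1) for large n, and its reciprocal is then approximately Inverse-Gamma(1,1) — a quick simulation consistent with the note's limiting claim is:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10_000, 100_000

# Marginal coordinate of Dirichlet(1, ..., 1): w1 ~ Beta(1, n-1).
w1 = rng.beta(1.0, n - 1, size=reps)

# For large n, n*w1 ~ Exp(1), so 1/(n*w1) ~ Inverse-Gamma(1, 1),
# whose median is 1/ln(2) ~ 1.44 and whose density is right-skewed.
inv = 1.0 / (n * w1)
```

Here the choice of 1/(n*w1) as the studied quantity is an assumption made for illustration; the note's own (omitted) expression may differ, but the skewed Inverse-Gamma(1,1) limit it describes matches this construction.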
4 In order for to be a bounded real number smaller than 1, we would need . For example, when , for a random matrix whose elements are generated independently from a Gaussian distribution, we have (Vivo, Majumdar, and Bohigas 2007). So in order for such a sequence Cn (s.t. ) to exist, we need . We can choose and under such settings.