Abstract
Several strategies have been developed recently to ensure valid inference after model selection; some of these are easy to compute, while others fare better in terms of inferential power. In this article, we consider a selective inference framework for Gaussian data. We propose a new method for inference through approximate maximum likelihood estimation. Our goals are twofold: (a) to achieve better inferential power with the aid of randomization, and (b) to bypass expensive MCMC sampling from exact conditional distributions that are hard to evaluate in closed form. We construct approximate inferential quantities, such as p-values and confidence intervals, by solving a fairly simple convex optimization problem. We illustrate the potential of our method across a wide range of signal-to-noise ratios in simulations. On a cancer gene expression dataset, we find that our method improves upon the inferential power of some commonly used strategies for selective inference. Supplementary materials for this article are available online.
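As a loose illustration of the kind of computation involved (not the method developed in this article), the one-dimensional toy sketch below reports a Gaussian statistic only when a randomized version of it crosses a threshold, and then obtains the selective MLE by numerically maximizing the conditional log-likelihood. In this toy case the selection probability is available in closed form; the article's contribution concerns approximating such quantities when they are not. The function name neg_selective_loglik and all numerical settings are hypothetical.

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Toy setting: Y ~ N(mu, 1); the parameter is "selected" for inference only if
# the randomized statistic Y + omega exceeds a threshold, with omega ~ N(0, tau^2).
mu_true, tau, threshold = 1.0, 0.5, 0.0
y = rng.normal(mu_true, 1.0)
omega = rng.normal(0.0, tau)
selected = (y + omega) > threshold

def neg_selective_loglik(mu, y, tau, threshold):
    """Negative log-likelihood of mu conditional on the selection event
    {Y + omega > threshold}, with the randomization omega marginalized out."""
    loglik = norm.logpdf(y, loc=mu, scale=1.0)             # unconditional Gaussian term
    loglik += norm.logcdf((y - threshold) / tau)           # log P(select | Y = y)
    loglik -= norm.logcdf((mu - threshold) / np.sqrt(1.0 + tau ** 2))  # log P(select; mu)
    return -loglik

if selected:
    res = minimize_scalar(neg_selective_loglik, bounds=(-10.0, 10.0),
                          args=(y, tau, threshold), method="bounded")
    print(f"observed y = {y:.3f}, selective MLE of mu = {res.x:.3f}")

The selective MLE is pulled back toward the threshold relative to the raw observation y, reflecting the conditioning on selection; in higher dimensions the normalizing probability is intractable, which is where an approximation computed via convex optimization becomes useful.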
Supplementary Materials
The supplementary materials contain proofs of the technical results, provide additional examples demonstrating the soft-truncated likelihood, establish asymptotic guarantees for the approximate selective MLE, and illustrate the generalization of our method to multiple convex queries.
Acknowledgments
S.P. would like to sincerely thank and acknowledge Veera Baladandayuthapani and Yujia Pan for their input on the analysis of the TCGA dataset. S.P. is immensely thankful to Xuming He and Liza Levina for offering valuable comments on an initial draft of the article. The authors thank the anonymous reviewers for their many insightful suggestions on earlier drafts of the article.