ABSTRACT
Selecting a model using the data introduces selection uncertainty. When this uncertainty is ignored, subsequent classical inference is invalid, because it assumes that the model is given before the analysis and is correctly specified in all its aspects. Selective inference deals with the randomness induced by selection by conditioning confidence intervals and p-values on the set of data values that lead to the same model selection as the observed data. The main challenge is the characterization of this selection event. We develop an algorithm for conducting approximate post-selection inference for parameters after model selection events that may not be characterizable as polyhedra. We apply it to the adaptive lasso, the adaptive elastic net and the group lasso. Experiments on simulated and real data illustrate that the algorithm successfully controls the false-positive rate and is computationally efficient.
Disclosure statement
The authors report there are no competing interests to declare.
Data availability
The dataset used in Section 6.1 is publicly available in the software R [34] as data(birthwt) from the package MASS. The data from Section 6.2 are also publicly available in R [34] from the package gglasso as data(bardet).
Notes
1 Nine outliers, of which 2.465 is the largest, are excluded from Figure (b).
2 Eighteen outliers, of which 2.480 is the largest, are excluded from Figure (b).
3 Twenty-eight outliers, of which 2.227 is the largest, are excluded from Figure (b).
4 Seven outliers, of which 2.380 is the largest, are excluded from Figure (b).
5 Eight outliers, of which 1.870 is the largest, are excluded from Figure .