Abstract
We propose a test of many zero parameter restrictions in a high dimensional linear iid regression model with k_n regressors. The test statistic is formed by estimating key parameters one at a time, based on many low dimension regression models with nuisance terms. The parsimoniously parameterized models identify whether the original parameter of interest is or is not zero. Estimating fixed low dimension sub-parameters ensures greater estimator accuracy; it requires neither a sparsity assumption nor, therefore, a regularized estimator; it is computationally fast compared to, for example, de-biased Lasso; and using only the largest in a sequence of weighted estimators reduces test statistic complexity and therefore estimation error. We provide a parametric wild bootstrap for p-value computation, and prove the test is consistent and has nontrivial local-to-null power, at a drift rate governed by the covariate fourth moment.
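To make the idea concrete, the following is a minimal schematic sketch, not the paper's exact estimator: for each hypothesized-zero coefficient, fit a low dimension OLS regression of y on that covariate plus a small fixed set of controls, collect the t-statistics, and compare their maximum against a Gaussian-multiplier (parametric wild) bootstrap distribution generated under the null. All function and variable names here are illustrative assumptions, and the choice of controls and multiplier law is a simplification of the procedure described in the abstract.

```python
# Schematic sketch of a "many low dimension regressions" max-test with a
# parametric wild bootstrap p-value. Hypothetical helper, not the authors' code.
import numpy as np


def max_test_pvalue(y, X, controls, n_boot=500, rng=None):
    """Max |t| over one-at-a-time low dimension OLS fits of y on each
    column of X plus the fixed controls; p-value by Gaussian-multiplier
    wild bootstrap under the null that all X coefficients are zero."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, k = X.shape

    def tstats(yv):
        out = np.empty(k)
        for j in range(k):
            # Low dimension design: intercept, the j-th covariate, controls.
            Z = np.column_stack([np.ones(n), X[:, j], controls])
            beta, *_ = np.linalg.lstsq(Z, yv, rcond=None)
            resid = yv - Z @ beta
            sigma2 = resid @ resid / (n - Z.shape[1])
            cov = sigma2 * np.linalg.inv(Z.T @ Z)
            out[j] = beta[1] / np.sqrt(cov[1, 1])  # t-stat on X[:, j]
        return out

    t_obs = np.max(np.abs(tstats(y)))

    # Null model uses only the controls; perturb its residuals with
    # standard normal multipliers (parametric wild bootstrap).
    Z0 = np.column_stack([np.ones(n), controls])
    b0, *_ = np.linalg.lstsq(Z0, y, rcond=None)
    fit0, res0 = Z0 @ b0, y - Z0 @ b0
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        yb = fit0 + res0 * rng.standard_normal(n)
        t_boot[b] = np.max(np.abs(tstats(yb)))
    return np.mean(t_boot >= t_obs)
```

Because each fit involves only a handful of regressors, no regularization or tuning parameter is needed, which is the source of the computational advantage over de-biased Lasso noted in Note 3.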
Acknowledgments
This manuscript benefitted from the expert commentary and suggestions of two referees and an associate editor.
Disclosure Statement
No potential conflict of interest was reported by the author.
Notes
1 As a testament to the increasing amount of available data, in November 2022 the 27th General Conference on Weights and Measures adopted new numerical prefixes handling, among other magnitudes, 10^30.
2 This may carry over to (semi)nonparametric settings with a fixed low dimension parameter and an infinite dimensional function θ_0. See Chen and Pouzo (2015) for simulation evidence of sieve based bootstrapped t-tests.
3 Using Matlab (with coordinate descent and ADMM algorithms) via a SLURM scheduler on the Longleaf cluster at UNC, with 128 workers on 1 node, and setting n = 100, 200 regressors, and 1000 bootstrap samples, one bootstrapped p-value under H0 for the DBL max-statistic with 5-fold cross-validation took 335 sec (5.6 min). Increasing the number of regressors to 480, as in our simulation study, yielded a computation time of 8109 sec (135.2 min). The bootstrapped parsimonious max-test, by comparison, took only 4.17 and 5.1 sec, respectively, yielding up to a 1600x computation time gain. Increasing n to 250 with 1144 regressors, DBL required 50.21 hr, while our method took just 5.2 sec (roughly 35,000x faster). See Section 4 for complete simulation details.
4 In our design x_t is bounded or Gaussian, and is therefore L_p-bounded for any p > 0.
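For concreteness, the L_p-boundedness invoked here can be written out as the standard moment condition (this display is our gloss, assuming the usual L_p norm):

\[
\|x_t\|_p \equiv \bigl(E|x_t|^p\bigr)^{1/p} < \infty \quad \text{for every } p > 0,
\]

which holds trivially when x_t is bounded, and holds for Gaussian x_t because every moment of a normal random variable is finite.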
5 See Dezeure et al. (2015) for discussion of the R package hdi (high dimensional inference). At least for the class of hypotheses considered in this article, we anticipate that estimating many low dimension models by least squares will always dominate de-biased Lasso in terms of computation time, irrespective of the software or solver used. A comprehensive comparison across software platforms and numerical solvers, however, is left for subsequent research.
6 We suppress the argument λ when there is no risk of confusion.