ABSTRACT
Classical two-sample permutation tests for equality of distributions have exact size in finite samples, but they fail to control size for testing equality of parameters that summarize each distribution. This article proposes permutation tests for equality of parameters that are estimated at root-n or slower rates. Our general framework applies to both parametric and nonparametric models, with two samples or one sample split into two subsamples. Our tests have correct size asymptotically while preserving exact size in finite samples when distributions are equal. They incur no loss of local asymptotic power relative to tests that use asymptotic critical values. We propose confidence sets with correct coverage in large samples that also have exact coverage in finite samples if distributions are equal up to a transformation. We apply our theory to four commonly used hypothesis tests of nonparametric functions evaluated at a point. Lastly, simulations show good finite-sample properties, and two empirical examples illustrate our tests in practice. Supplementary materials for this article are available online.
Supplementary Materials
The online supplementary materials contain a PDF document with all the proofs, tables, and graphs referred to in the article. They also include replication code for the simulations and empirical examples.
Acknowledgments
We thank Roger Koenker, Marcelo Moreira, Joe Romano, Azeem Shaikh, Xiaofeng Shao, the editor, associate editor, and two anonymous referees for many insightful comments. The article benefited from feedback given by seminar participants at the U. of Sao Paulo (FEARP), U. of Michigan, Boston U., U. of Essex, IAAE-Rotterdam, Bristol-ESG, KER Int’l Conference, UIUC, and Penn State. Bertanha acknowledges financial support received while visiting the Kenneth C. Griffin Department of Economics at the University of Chicago, where part of this work was conducted.
Notes
1 The lack of size control of the classical permutation test outside of the sharp null has been studied for a long time (e.g., Romano 1990). Theorem 2.1 confirms the lack of size control in our setting.
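The size distortion mentioned in Note 1 can be seen in a small Monte Carlo experiment. The sketch below is only an illustration under assumptions of our own choosing, not the article's procedure or setting: the parameter is a difference in means, the data are normal with equal means but unequal variances and unequal sample sizes (so the sharp null of equal distributions is false while the parameter null is true), and the helper names `perm_pvalue` and `null_rejection_rate` are hypothetical. It contrasts the classical permutation test based on the raw difference in means with a version that studentizes the statistic by a Welch-type standard error.

```python
import numpy as np


def perm_pvalue(x, y, studentize, n_perm, rng):
    """Two-sided permutation p-value for H0: E[X] = E[Y] (hypothetical helper)."""
    n1 = x.size
    pooled = np.concatenate([x, y])

    def stat(a, b):
        # Difference in means along the last axis (works for 1-D and 2-D inputs)
        diff = a.mean(axis=-1) - b.mean(axis=-1)
        if not studentize:
            return diff
        # Welch-type standard error, computed separately within each group
        se = np.sqrt(a.var(axis=-1, ddof=1) / a.shape[-1]
                     + b.var(axis=-1, ddof=1) / b.shape[-1])
        return diff / se

    t_obs = stat(x, y)
    # Each row of idx is a uniformly random permutation of the pooled indices
    idx = rng.random((n_perm, pooled.size)).argsort(axis=1)
    perm = pooled[idx]
    t_perm = stat(perm[:, :n1], perm[:, n1:])
    # The observed statistic counts as one permutation (the identity)
    return (1 + np.sum(np.abs(t_perm) >= abs(t_obs))) / (n_perm + 1)


def null_rejection_rate(studentize, n_sims=300, n_perm=299, alpha=0.05, seed=0):
    """Simulated size when means are equal but variances and sample sizes differ."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, 3.0, size=40)   # small group, large variance
        y = rng.normal(0.0, 1.0, size=200)  # large group, small variance
        rejections += perm_pvalue(x, y, studentize, n_perm, rng) <= alpha
    return rejections / n_sims


print("classical (unstudentized):", null_rejection_rate(studentize=False))
print("studentized:", null_rejection_rate(studentize=True))
```

With these settings the unstudentized test should reject far more often than 5%, because the permutation distribution pools the two variances and understates the spread of the difference in means; the studentized version should stay close to the nominal level. Exact figures depend on the seed and the number of simulations.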
2 Alternative resampling methods such as the bootstrap and subsampling have the same local asymptotic power as the permutation test because they produce critical values that are consistent for the standard Gaussian critical values under the null hypothesis.
3 Suppose one uses an estimator that assumes the null hypothesis is true; that is, is consistent for under the null hypothesis but has a different probability limit under the alternative. Regardless of whether the null is true, such an estimator applied to a random permutation of the data is generally consistent for , and Corollary 2.1 remains true. Consistency for comes from the fact that an estimator applied to a random permutation behaves as if it were applied to data from a mixture distribution, where the null is always true (Section B.3.2 in the supplementary materials).
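The mixture argument in Note 3 can be checked numerically for the simplest estimator, a sample mean. In this sketch (the function name `permuted_subsample_means` and all distributions are hypothetical choices), the two samples have different means, so the null is false, yet after a random permutation each subsample is drawn from the pooled data. Each pooled observation comes from the first distribution with proportion n1 / (n1 + n2), so both permuted subsample means should settle near the mean of the mixture with that weight.

```python
import numpy as np


def permuted_subsample_means(x, y, rng):
    """Means of the first n1 and last n2 observations after one random permutation."""
    pooled = np.concatenate([x, y])
    perm = rng.permutation(pooled)
    return perm[: x.size].mean(), perm[x.size:].mean()


rng = np.random.default_rng(1)
n1, n2 = 3000, 7000
x = rng.normal(2.0, 1.0, size=n1)   # sample 1 has mean 2
y = rng.normal(-1.0, 1.0, size=n2)  # sample 2 has mean -1, so the null is false

# Mean of the mixture that puts weight n1 / (n1 + n2) on sample 1's distribution
lam = n1 / (n1 + n2)
mixture_mean = lam * 2.0 + (1.0 - lam) * (-1.0)

m1, m2 = permuted_subsample_means(x, y, rng)
print(mixture_mean, m1, m2)  # both permuted means should be close to mixture_mean
```

Because both permuted subsamples estimate the same mixture quantity, their difference is centered near zero whether or not the original null holds, which is the sense in which the null is "always true" for permuted data.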
4 We cannot split the sample based on X being above or below 0 as we do in Section 3.3. If we split the sample based on X and the distribution of X is asymmetric, it becomes impossible to identify the one-sided limit of f at 0 using only data from either sample, as required by Assumption 2.1.
5 The data transformation is used to obtain finite sample exactness when distributions are equal up to a transformation , but it is not necessary for correct asymptotic coverage. In fact, for the null hypothesis , one may construct a permutation test that compares the value of with the critical values from . The test has correct size asymptotically, and the confidence set constructed by inverting has correct coverage in large samples.
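Note 5 refers to confidence sets obtained by inverting permutation tests after a data transformation. The specific transformation and null hypothesis are not reproduced here, so the following is a minimal sketch under stated assumptions: the parameter is a difference in means, the transformation is a location shift that adds the hypothesized value `theta0` to the second sample, and the confidence set is the set of grid values the test fails to reject. If the shifted second sample has the same distribution as the first, the permutation test at `theta0` is exact. Using a Welch-type studentized statistic is our own choice for this sketch, and all function names are hypothetical.

```python
import numpy as np


def welch_stat(a, b):
    """Studentized difference in means along the last axis."""
    se = np.sqrt(a.var(axis=-1, ddof=1) / a.shape[-1]
                 + b.var(axis=-1, ddof=1) / b.shape[-1])
    return (a.mean(axis=-1) - b.mean(axis=-1)) / se


def shift_perm_pvalue(x, y, theta0, n_perm, rng):
    """Permutation p-value for H0: E[X] - E[Y] = theta0, after shifting Y by theta0."""
    y_shifted = y + theta0  # assumed transformation: a location shift
    pooled = np.concatenate([x, y_shifted])
    t_obs = welch_stat(x, y_shifted)
    # Each row of idx is a uniformly random permutation of the pooled indices
    idx = rng.random((n_perm, pooled.size)).argsort(axis=1)
    perm = pooled[idx]
    t_perm = welch_stat(perm[:, : x.size], perm[:, x.size:])
    return (1 + np.sum(np.abs(t_perm) >= abs(t_obs))) / (n_perm + 1)


def permutation_confidence_set(x, y, grid, alpha=0.05, n_perm=499, seed=0):
    """Grid points theta0 that the permutation test does not reject at level alpha."""
    rng = np.random.default_rng(seed)
    return np.array([th for th in grid
                     if shift_perm_pvalue(x, y, th, n_perm, rng) > alpha])
```

For example, with `grid = np.linspace(-1.0, 3.0, 81)`, `permutation_confidence_set(x, y, grid)` returns the retained grid values; the grid resolution limits how precisely the endpoints are located.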
6 For more details on bias correction, see the discussion in Section 3.1. Section D.5 of the supplementary materials demonstrates the validity of our permutation tests with the LPR estimator.
7 SB takes longer than SP to compute because it requires re-estimation of for two different bandwidths and multiple observations i (Cao-Abad 1991, p. 2227). We compare the computation times of all tests in the two empirical illustrations of the next section.
8 For the t-test, we obtain critical values from the simulated distribution under the null hypothesis and keep those critical values to examine the simulated rejection rates under the alternative hypotheses. For the permutation, bootstrap, and subsampling tests, we numerically search for a nominal level α that yields a simulated rejection rate of 5% under the null hypothesis. Once that artificial nominal level is found, we fix it and compute the simulated rejection probabilities under the various alternative hypotheses. In case of ties, rejection may be randomized (Equation (2.8)) so that the numerical search can find a solution.
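The size-adjustment step in Note 8 can be sketched as a bisection over the nominal level. Equation (2.8) is not reproduced here; the sketch assumes instead the standard randomized rule for a right-tailed permutation test, which rejects with a fractional probability when the observed statistic ties with the critical value. Without randomization, ties make the simulated null rejection rate a step function of α that can jump over 5%; with randomization the rate is continuous and non-decreasing in α, so bisection can hit the target. The function names and the integer-valued example draws are hypothetical.

```python
import numpy as np


def randomized_rejection(t_obs, t_perm, alpha):
    """Rejection probability of a right-tailed permutation test, randomized at ties.

    t_perm: statistics from random permutations. The observed statistic is
    appended as the identity permutation, so there are M = len(t_perm) + 1 values.
    """
    values = np.append(t_perm, t_obs)
    m = values.size
    n_greater = np.sum(values > t_obs)
    n_equal = np.sum(values == t_obs)  # at least 1, because t_obs is included
    # Probability 1 when at most M * alpha values are >= t_obs, probability 0
    # when at least M * alpha values are > t_obs, and a fraction in between.
    return float(np.clip((m * alpha - n_greater) / n_equal, 0.0, 1.0))


def size_adjusted_alpha(null_draws, target=0.05, tol=1e-9):
    """Bisection for the nominal alpha whose simulated null rejection rate equals target.

    null_draws: list of (t_obs, t_perm) pairs simulated under the null hypothesis.
    """
    def rate(alpha):
        return np.mean([randomized_rejection(t, tp, alpha) for t, tp in null_draws])

    lo, hi = 0.0, 1.0  # rate(0) = 0 and rate(1) = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(2)
# Integer-valued statistics create many ties, which is where randomization matters
draws = [(rng.integers(0, 5), rng.integers(0, 5, size=99)) for _ in range(200)]
alpha_star = size_adjusted_alpha(draws)
```

The returned `alpha_star` would then be held fixed while computing rejection rates under the alternatives, mirroring the two-step procedure described in the note.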
9 We downloaded the datasets from Michal Kolesar’s repository, https://github.com/kolesarm.