Theory and Methods

Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests

Pages 1308-1323 | Received 01 Sep 2012, Published online: 19 Dec 2013
 

Abstract

Some existing nonparametric two-sample tests for equality of multivariate distributions perform unsatisfactorily when the two sample sizes are unbalanced. In particular, the power of these tests tends to diminish with increasingly unbalanced sample sizes. In this article, we propose a new testing procedure to solve this problem. The proposed test, based on the nearest neighbor method by Schilling, employs a novel ensemble subsampling scheme to remedy this issue. More specifically, the test statistic is a weighted average of a collection of statistics, each associated with a randomly selected subsample of the data. We derive the asymptotic distribution of the test statistic under the null hypothesis and show that the new test is consistent against all alternatives when the ratio of the sample sizes either goes to a finite limit or tends to infinity. Via simulated data examples we demonstrate that the new test has increasing power with increasing sample size ratio when the size of the smaller sample is fixed. The test is applied to a real-data example in the field of corporate finance. Supplementary materials for this article are available online.
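The idea of averaging nearest-neighbor statistics over random subsamples can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes equal weights rather than the weighted average described above, draws balanced subsamples of the larger sample, and uses a permutation p-value in place of the asymptotic null distribution. The function names (`schilling_stat`, `ensemble_stat`, `permutation_pvalue`) and the parameters `k`, `n_sub`, and `n_perm` are hypothetical and chosen only for illustration.

```python
import random


def schilling_stat(x, y, k=3):
    """Nearest-neighbor statistic in the style of Schilling: the fraction of
    (point, neighbor) pairs in the pooled sample that share a sample label."""
    pooled = [(p, 0) for p in x] + [(p, 1) for p in y]
    n = len(pooled)
    same = 0
    for i, (p, label) in enumerate(pooled):
        # Squared Euclidean distances from point i to every other pooled point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, (q, _) in enumerate(pooled)
            if j != i
        )
        same += sum(1 for _, j in dists[:k] if pooled[j][1] == label)
    return same / (n * k)


def ensemble_stat(small, large, k=3, n_sub=20, rng=None):
    """Equal-weight average (an assumption of this sketch) of the statistic
    over n_sub random subsamples of `large`, each the same size as `small`."""
    rng = rng or random.Random(0)
    m = len(small)
    stats = [schilling_stat(small, rng.sample(large, m), k) for _ in range(n_sub)]
    return sum(stats) / len(stats)


def permutation_pvalue(small, large, k=3, n_sub=20, n_perm=99, seed=0):
    """Permutation p-value: large values of the statistic indicate that the
    two samples tend to separate, so the test rejects for large values."""
    rng = random.Random(seed)
    observed = ensemble_stat(small, large, k, n_sub, rng)
    pooled = list(small) + list(large)
    m = len(small)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign sample labels at random
        if ensemble_stat(pooled[:m], pooled[m:], k, n_sub, rng) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Points are passed as tuples of coordinates, with `small` the smaller sample. Subsampling the larger sample keeps each individual statistic balanced, while averaging over many subsamples lets the full larger sample contribute.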

View correction statement: “Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests”

Acknowledgments

The authors thank Joseph Chang and Ye Luo for helpful discussions. Their sincere gratitude also goes to three anonymous reviewers, an Associate Editor, and the Co-editor Xuming He for many constructive comments and suggestions.

Notes

“Unconstrained firms are firms that are not labeled as constrained firms.” “Constrained firms do not pay dividends, do not have a net equity or debt purchase (not both) over the event quarter, and have a Tobin's Q greater than one at the end of the event quarter” (Korajczyk and Lévy 2003).

The raw data are from the COMPUSTAT database, the CRSP database, the Board of Governors of the Federal Reserve System H.15 database, and the U.S. Bureau of Labor Statistics CPI database. The cleaned data and R code are available upon request.

This relatively short and conceptual proof was suggested by one of our anonymous referees. A more explicit alternative proof can be found in the supplementary materials.
