Abstract
A triangle statistic is proposed for testing the equality of two multivariate continuous distributions in high-dimensional settings based on sample interpoint distances. Given two independent p-dimensional random samples, a triangle can be formed by randomly selecting one observation from one sample and two observations from the other sample. The triangle statistic estimates the probability that the distance between the two observations from the same distribution is the largest, the middle, or the smallest side of the triangle formed by these three observations. We show that the test based on the triangle statistic is asymptotically distribution-free under the null hypothesis of equal but unknown continuous distribution functions. The triangle test is compared with other nonparametric tests through a simulation study. The triangle statistic is well defined when the number of variables p is larger than the number of observations m, and its computational complexity is independent of p, making it suitable for high-dimensional settings.
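As an illustration only, the construction described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `triangle_proportions` and the arguments `xs` and `ys` are hypothetical, Euclidean distance is assumed as the interpoint distance, every triangle with one vertex in the first sample and two in the second is enumerated rather than sampled at random, and neither the triangles of the opposite type nor the asymptotic null distribution of the test is covered.

```python
import math
from itertools import combinations


def triangle_proportions(xs, ys):
    """Estimate how often the same-sample side of a triangle is the
    largest, middle, or smallest of its three sides.

    xs, ys: sequences of equal-length numeric tuples (hypothetical names).
    Each triangle uses one point from xs and two points from ys.
    Euclidean distance is an assumption of this sketch.
    """
    counts = {"largest": 0, "middle": 0, "smallest": 0}
    total = 0
    for x in xs:
        for y1, y2 in combinations(ys, 2):
            same = math.dist(y1, y2)                      # side within one sample
            cross = (math.dist(x, y1), math.dist(x, y2))  # sides between samples
            # Number of cross-sample sides strictly shorter than the
            # same-sample side: 0 -> smallest, 1 -> middle, 2 -> largest.
            # Ties have probability zero for continuous data; here a tie
            # is simply treated as "not shorter".
            n_shorter = sum(c < same for c in cross)
            counts[("smallest", "middle", "largest")[n_shorter]] += 1
            total += 1
    if total == 0:
        raise ValueError("need at least one point in xs and two in ys")
    return {label: n / total for label, n in counts.items()}


# One triangle with sides 1, 2 (same-sample) and 3: the same-sample side is the middle one.
print(triangle_proportions([(0.0, 0.0)], [(1.0, 0.0), (3.0, 0.0)]))
# → {'largest': 0.0, 'middle': 1.0, 'smallest': 0.0}
```

When both samples come from the same continuous distribution, the three sides of a triangle are exchangeable, so each of the three proportions should be close to 1/3; marked departures from 1/3 suggest that the distributions differ. Only interpoint distances enter the counting step, so the dimension p affects nothing beyond the distance computations.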
Acknowledgements
The authors thank two referees and the associate editor for helpful comments that led to better presentation of the article.