Abstract
Many statistical methodologies for high-dimensional data assume the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge, none of them can properly control the Type I error when the dimension exceeds the number of observations. In this work, we propose a novel nonparametric test based on nearest neighbor information. The proposed method guarantees asymptotic Type I error control in the high-dimensional setting. Simulation studies verify the empirical size performance of the proposed test when the dimension grows with the sample size and, at the same time, demonstrate the superior power of the new test relative to alternative methods. We also illustrate our approach on two widely used datasets from the high-dimensional classification and clustering literature, where deviation from the normality assumption may lead to invalid conclusions.
Supplementary Materials
The supplementary materials contain the R code used to reproduce the simulation results in the paper.
Acknowledgments
The authors thank the associate editor and two referees for their helpful and constructive comments, which have improved the quality and presentation of the article. The authors also thank the reviewers for their suggestions on various numerical alternatives, including the improved multivariate Shapiro–Wilk test and Fisher's test.