Abstract
Motivated by the advent of high-dimensional, highly correlated data, this work studies the limit behavior of the empirical cumulative distribution function (ecdf) of standard normal random variables under arbitrary correlation. First, we provide a necessary and sufficient condition for convergence of the ecdf to the standard normal distribution. Next, under general correlation, we show that the ecdf limit is a random, possibly infinite, mixture of normal distribution functions that depends on a number of latent variables and can serve as an asymptotic approximation to the ecdf in high dimensions. We provide conditions under which the dimension of the ecdf limit, defined as the smallest number of effective latent variables, is finite. Estimates of the latent variables are provided and their consistency is proved. We demonstrate these methods in a real high-dimensional data example from brain imaging, where we show that, although the study exhibits apparently strongly significant results, these can be entirely explained by correlation, as captured by the asymptotic approximation developed here. Supplementary materials for this article are available online.
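To make the idea of a random normal-mixture limit concrete, the sketch below simulates the simplest special case: equicorrelated standard normals driven by a single latent variable. This one-factor setup, along with the names `simulate_equicorrelated`, `conditional_limit`, and `max_gap`, is an assumption chosen for illustration and is not the general construction or the estimation method of the article. In this case each variable can be written as Z_i = sqrt(rho) W + sqrt(1 - rho) e_i, and conditional on W the ecdf at x approaches Phi((x - sqrt(rho) W) / sqrt(1 - rho)) rather than Phi(x).

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_equicorrelated(n, rho, rng):
    """Draw n standard normals with common pairwise correlation rho.

    Illustrative one-factor model: Z_i = sqrt(rho) * W + sqrt(1 - rho) * e_i,
    where W and the e_i are independent standard normals.
    Returns the latent variable W and the sample.
    """
    w = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    z = [a * w + b * rng.gauss(0.0, 1.0) for _ in range(n)]
    return w, z


def ecdf(sample, x):
    """Empirical cdf of the sample evaluated at x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def conditional_limit(x, w, rho):
    """Limit of the ecdf at x given the latent variable W = w."""
    return std_normal_cdf((x - math.sqrt(rho) * w) / math.sqrt(1.0 - rho))


def max_gap(n=20000, rho=0.5, seed=0):
    """Largest gap between the ecdf and its conditional limit on a small grid."""
    rng = random.Random(seed)
    w, z = simulate_equicorrelated(n, rho, rng)
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    return max(abs(ecdf(z, x) - conditional_limit(x, w, rho)) for x in grid)
```

Calling `max_gap()` for a large `n` should return a small value, whereas comparing the same ecdf with `std_normal_cdf` directly can show a sizable discrepancy whenever the drawn latent variable is far from zero. With `rho = 0` the conditional limit reduces to the standard normal distribution function.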
Additional information
Notes on contributors
David Azriel
David Azriel is a lecturer at the Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa 32000, Israel, and Postdoctoral Research Associate in the Department of Statistics of the Wharton School of the University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 19104 (E-mail: [email protected]).
Armin Schwartzman
Armin Schwartzman is Associate Professor at the Department of Statistics, North Carolina State University, Raleigh, NC 27695 (E-mail: [email protected]).
The authors are grateful to Philip Reiss from the Department of Child and Adolescent Psychiatry, New York University School of Medicine, for providing the brain imaging data. This work was partially supported by NIH grant R01-CA157528.