Journal of the American Statistical Association Volume 118, 2023 - Issue 541

Theory and Methods

A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach

Hao Chen, Department of Statistics, University of California at Davis, CA

Yin Xia (corresponding author), Department of Statistics, School of Management, Fudan University

Pages 719-731 | Received 20 Dec 2019, Accepted 04 Jul 2021, Published online: 31 Aug 2021

https://doi.org/10.1080/01621459.2021.1953507

References

Anderson, T. W. (2003), An Introduction to Multivariate Statistical Analysis (3rd ed.), New York: Wiley-Interscience.
Baringhaus, L., and Henze, N. (1988), “A Consistent Test for Multivariate Normality Based on the Empirical Characteristic Function,” Metrika, 35, 339–348. DOI: 10.1007/BF02613322.
Bartoszyński, R., Pearl, D. K., and Lawrence, J. (1997), “A Multidimensional Goodness-of-Fit Test Based on Interpoint Distances,” Journal of the American Statistical Association, 92, 577–586. DOI: 10.1080/01621459.1997.10474010.
Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013), “Valid Post-Selection Inference,” The Annals of Statistics, 41, 802–837. DOI: 10.1214/12-AOS1077.
Bickel, P. J., and Levina, E. (2004), “Some Theory for Fisher’s Linear Discriminant Function, Naive Bayes’, and Some Alternatives When There are Many More Variables Than Observations,” Bernoulli, 10, 989–1010. DOI: 10.3150/bj/1106314847.
Bickel, P. J., and Levina, E. (2008), “Regularized Estimation of Large Covariance Matrices,” The Annals of Statistics, 36, 199–227. DOI: 10.1214/009053607000000758.
Cai, T., and Liu, W. (2011), “A Direct Estimation Approach to Sparse Linear Discriminant Analysis,” Journal of the American Statistical Association, 106, 1566–1577. DOI: 10.1198/jasa.2011.tm11199.
Cai, T. T., Liu, W., and Xia, Y. (2014), “Two-Sample Test of High Dimensional Means Under Dependency,” Journal of the Royal Statistical Society, Series B, 76, 349–372. DOI: 10.1111/rssb.12034.
Chan, H. P., and Walther, G. (2015), “Optimal Detection of Multi-Sample Aligned Sparse Signals,” The Annals of Statistics, 43, 1865–1895. DOI: 10.1214/15-AOS1328.
Chen, H. (2019), “Sequential Change-Point Detection Based on Nearest Neighbors,” The Annals of Statistics, 47, 1381–1407. DOI: 10.1214/18-AOS1718.
Chen, H., Chen, X., and Su, Y. (2018), “A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data,” Journal of the American Statistical Association, 113, 1146–1155. DOI: 10.1080/01621459.2017.1307757.
Chen, H., and Friedman, J. H. (2017), “A New Graph-Based Two-Sample Test for Multivariate and Object Data,” Journal of the American Statistical Association, 112, 397–409. DOI: 10.1080/01621459.2016.1147356.
Chen, H., and Zhang, N. (2015), “Graph-Based Change-Point Detection,” The Annals of Statistics, 43, 139–176. DOI: 10.1214/14-AOS1269.
Dettling, M. (2004), “Bagboosting for Tumor Classification With Gene Expression Data,” Bioinformatics, 20, 3583–3593. DOI: 10.1093/bioinformatics/bth447.
Doornik, J. A., and Hansen, H. (2008), “An Omnibus Test for Univariate and Multivariate Normality,” Oxford Bulletin of Economics and Statistics, 70, 927–939. DOI: 10.1111/j.1468-0084.2008.00537.x.
Fan, J., and Fan, Y. (2008), “High Dimensional Classification Using Features Annealed Independence Rules,” The Annals of Statistics, 36, 2605.
Fan, J., Feng, Y., and Wu, Y. (2009), “Network Exploration Via the Adaptive Lasso and SCAD Penalties,” The Annals of Applied Statistics, 3, 521. DOI: 10.1214/08-AOAS215.
Fattorini, L. (1986), “Remarks on the Use of Shapiro–Wilk Statistic for Testing Multivariate Normality,” Statistica, 46, 209–217.
Friedman, J., Hastie, T., and Tibshirani, R. (2008), “Sparse Inverse Covariance Estimation With the Graphical Lasso,” Biostatistics, 9, 432–441. DOI: 10.1093/biostatistics/kxm045.
Friedman, J. H., and Rafsky, L. C. (1979), “Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests,” The Annals of Statistics, 697–717. DOI: 10.1214/aos/1176344722.
Gordon, G. J., Jensen, R. V., Hsiao, L.-L., Gullans, S. R., Blumenstock, J. E., Ramaswamy, S., Richards, W. G., Sugarbaker, D. J., and Bueno, R. (2002), “Translation of Microarray Data Into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma,” Cancer Research, 62, 4963–4967.
Henze, N. (1988), “A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences,” The Annals of Statistics, 16, 772–783. DOI: 10.1214/aos/1176350835.
Henze, N., and Wagner, T. (1997), “A New Approach to the BHEP Tests for Multivariate Normality,” Journal of Multivariate Analysis, 62, 1–23. DOI: 10.1006/jmva.1997.1684.
Henze, N., and Zirkler, B. (1990), “A Class of Invariant Consistent Tests for Multivariate Normality,” Communications in Statistics-Theory and Methods, 19, 3595–3617. DOI: 10.1080/03610929008830400.
Jin, J., and Wang, W. (2016), “Influential Features PCA for High Dimensional Clustering,” The Annals of Statistics, 44, 2323–2359.
Jurečková, J., and Kalina, J. (2012), “Nonparametric Multivariate Rank Tests and Their Unbiasedness,” Bernoulli, 229–251. DOI: 10.3150/10-BEJ326.
Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016), “Exact Post-Selection Inference, With Application to the Lasso,” The Annals of Statistics, 44, 907–927. DOI: 10.1214/15-AOS1371.
Liu, K., Zhang, R., and Mei, Y. (2019), “Scalable Sum-Shrinkage Schemes for Distributed Monitoring Large-Scale Data Streams,” Statistica Sinica, 29, 1–22.
Liu, Q., Lee, J., and Jordan, M. (2016), “A Kernelized Stein Discrepancy for Goodness-of-Fit Tests,” in International Conference on Machine Learning, pp. 276–284.
Liu, W. (2013), “Gaussian Graphical Model Estimation With False Discovery Rate Control,” The Annals of Statistics, 41, 2948–2978.
Ma, S., Gong, Q., and Bohnert, H. J. (2007), “An Arabidopsis Gene Network Based on the Graphical Gaussian Model,” Genome Research, 17. DOI: 10.1101/gr.6911207.
Mai, Q., Zou, H., and Yuan, M. (2012), “A Direct Approach to Sparse Discriminant Analysis in Ultra-High Dimensions,” Biometrika, 99, 29–42. DOI: 10.1093/biomet/asr066.
Mardia, K. V. (1970), “Measures of Multivariate Skewness and Kurtosis With Applications,” Biometrika, 57, 519–530. DOI: 10.1093/biomet/57.3.519.
Marozzi, M. (2015), “Multivariate Multidistance Tests for High-Dimensional Low Sample Size Case-Control Studies,” Statistics in Medicine, 34, 1511–1526. DOI: 10.1002/sim.6418.
Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. (2008), “Sparse Permutation Invariant Covariance Estimation,” Electronic Journal of Statistics, 2, 494–515. DOI: 10.1214/08-EJS176.
Royston, J. (1983), “Some Techniques for Assessing Multivarate Normality Based on the Shapiro–Wilk W,” Applied Statistics, 32, 121–133. DOI: 10.2307/2347291.
Schilling, M. F. (1986), “Multivariate Two-Sample Tests Based on Nearest Neighbors,” Journal of the American Statistical Association, 81, 799–806. DOI: 10.1080/01621459.1986.10478337.
Shapiro, S. S., and Wilk, M. B. (1965), “An Analysis of Variance Test for Normality (Complete Samples),” Biometrika, 52, 591–611. DOI: 10.1093/biomet/52.3-4.591.
Smith, S. P., and Jain, A. K. (1988), “A Test to Determine the Multivariate Normality of a Data Set,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 757–761. DOI: 10.1109/34.6789.
Taylor, J., and Tibshirani, R. (2018), “Post-Selection Inference for ℓ1-Penalized Likelihood Models,” Canadian Journal of Statistics, 46, 41–61. DOI: 10.1002/cjs.11313.
Villasenor Alva, J. A., and Estrada, E. G. (2009), “A Generalization of Shapiro–Wilk’s Test for Multivariate Normality,” Communications in Statistics-Theory and Methods, 38, 1870–1883. DOI: 10.1080/03610920802474465.
Wang, T., and Samworth, R. J. (2018), “High Dimensional Change Point Estimation Via Sparse Projection,” Journal of the Royal Statistical Society, Series B, 80, 57–83. DOI: 10.1111/rssb.12243.
Xia, Y., Cai, T., and Cai, T. T. (2015), “Testing Differential Networks With Applications to the Detection of Gene–Gene Interactions,” Biometrika, 102, 247–266. DOI: 10.1093/biomet/asu074.
Xie, Y., and Siegmund, D. (2013), “Sequential Multi-Sensor Change-Point Detection,” The Annals of Statistics, 41, 670–692. DOI: 10.1214/13-AOS1094.
Yuan, M. (2010), “High Dimensional Inverse Covariance Matrix Estimation Via Linear Programming,” Journal of Machine Learning Research, 11, 2261–2286.
Yuan, M., and Lin, Y. (2007), “Model Selection and Estimation in the Gaussian Graphical Model,” Biometrika, 94, 19–35. DOI: 10.1093/biomet/asm018.
Zhou, M., and Shao, Y. (2014), “A Powerful Test for Multivariate Normality,” Journal of Applied Statistics, 41, 351–363. DOI: 10.1080/02664763.2013.839637.
