Theory and Methods

A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach

Pages 719-731 | Received 20 Dec 2019, Accepted 04 Jul 2021, Published online: 31 Aug 2021

References

  • Anderson, T. W. (2003), An Introduction to Multivariate Statistical Analysis (3rd ed.), New York: Wiley-Interscience.
  • Baringhaus, L., and Henze, N. (1988), “A Consistent Test for Multivariate Normality Based on the Empirical Characteristic Function,” Metrika, 35, 339–348. DOI: 10.1007/BF02613322.
  • Bartoszyński, R., Pearl, D. K., and Lawrence, J. (1997), “A Multidimensional Goodness-of-Fit Test Based on Interpoint Distances,” Journal of the American Statistical Association, 92, 577–586. DOI: 10.1080/01621459.1997.10474010.
  • Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013), “Valid Post-Selection Inference,” The Annals of Statistics, 41, 802–837. DOI: 10.1214/12-AOS1077.
  • Bickel, P. J., and Levina, E. (2004), “Some Theory for Fisher’s Linear Discriminant Function, Naive Bayes’, and Some Alternatives When There are Many More Variables Than Observations,” Bernoulli, 10, 989–1010. DOI: 10.3150/bj/1106314847.
  • Bickel, P. J., and Levina, E. (2008), “Regularized Estimation of Large Covariance Matrices,” The Annals of Statistics, 36, 199–227. DOI: 10.1214/009053607000000758.
  • Cai, T., and Liu, W. (2011), “A Direct Estimation Approach to Sparse Linear Discriminant Analysis,” Journal of the American Statistical Association, 106, 1566–1577. DOI: 10.1198/jasa.2011.tm11199.
  • Cai, T. T., Liu, W., and Xia, Y. (2014), “Two-Sample Test of High Dimensional Means Under Dependency,” Journal of the Royal Statistical Society, Series B, 76, 349–372. DOI: 10.1111/rssb.12034.
  • Chan, H. P., and Walther, G. (2015), “Optimal Detection of Multi-Sample Aligned Sparse Signals,” The Annals of Statistics, 43, 1865–1895. DOI: 10.1214/15-AOS1328.
  • Chen, H. (2019), “Sequential Change-Point Detection Based on Nearest Neighbors,” The Annals of Statistics, 47, 1381–1407. DOI: 10.1214/18-AOS1718.
  • Chen, H., Chen, X., and Su, Y. (2018), “A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data,” Journal of the American Statistical Association, 113, 1146–1155. DOI: 10.1080/01621459.2017.1307757.
  • Chen, H., and Friedman, J. H. (2017), “A New Graph-Based Two-Sample Test for Multivariate and Object Data,” Journal of the American Statistical Association, 112, 397–409. DOI: 10.1080/01621459.2016.1147356.
  • Chen, H., and Zhang, N. (2015), “Graph-Based Change-Point Detection,” The Annals of Statistics, 43, 139–176. DOI: 10.1214/14-AOS1269.
  • Dettling, M. (2004), “Bagboosting for Tumor Classification With Gene Expression Data,” Bioinformatics, 20, 3583–3593. DOI: 10.1093/bioinformatics/bth447.
  • Doornik, J. A., and Hansen, H. (2008), “An Omnibus Test for Univariate and Multivariate Normality,” Oxford Bulletin of Economics and Statistics, 70, 927–939. DOI: 10.1111/j.1468-0084.2008.00537.x.
  • Fan, J., and Fan, Y. (2008), “High Dimensional Classification Using Features Annealed Independence Rules,” The Annals of Statistics, 36, 2605.
  • Fan, J., Feng, Y., and Wu, Y. (2009), “Network Exploration Via the Adaptive Lasso and SCAD Penalties,” The Annals of Applied Statistics, 3, 521. DOI: 10.1214/08-AOAS215.
  • Fattorini, L. (1986), “Remarks on the Use of Shapiro–Wilk Statistic for Testing Multivariate Normality,” Statistica, 46, 209–217.
  • Friedman, J., Hastie, T., and Tibshirani, R. (2008), “Sparse Inverse Covariance Estimation With the Graphical Lasso,” Biostatistics, 9, 432–441. DOI: 10.1093/biostatistics/kxm045.
  • Friedman, J. H., and Rafsky, L. C. (1979), “Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests,” The Annals of Statistics, 697–717. DOI: 10.1214/aos/1176344722.
  • Gordon, G. J., Jensen, R. V., Hsiao, L.-L., Gullans, S. R., Blumenstock, J. E., Ramaswamy, S., Richards, W. G., Sugarbaker, D. J., and Bueno, R. (2002), “Translation of Microarray Data Into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma,” Cancer Research, 62, 4963–4967.
  • Henze, N. (1988), “A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences,” The Annals of Statistics, 16, 772–783. DOI: 10.1214/aos/1176350835.
  • Henze, N., and Wagner, T. (1997), “A New Approach to the BHEP Tests for Multivariate Normality,” Journal of Multivariate Analysis, 62, 1–23. DOI: 10.1006/jmva.1997.1684.
  • Henze, N., and Zirkler, B. (1990), “A Class of Invariant Consistent Tests for Multivariate Normality,” Communications in Statistics-Theory and Methods, 19, 3595–3617. DOI: 10.1080/03610929008830400.
  • Jin, J., and Wang, W. (2016), “Influential Features PCA for High Dimensional Clustering,” The Annals of Statistics, 44, 2323–2359.
  • Jurečková, J., and Kalina, J. (2012), “Nonparametric Multivariate Rank Tests and Their Unbiasedness,” Bernoulli, 229–251. DOI: 10.3150/10-BEJ326.
  • Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016), “Exact Post-Selection Inference, With Application to the Lasso,” The Annals of Statistics, 44, 907–927. DOI: 10.1214/15-AOS1371.
  • Liu, K., Zhang, R., and Mei, Y. (2019), “Scalable Sum-Shrinkage Schemes for Distributed Monitoring Large-Scale Data Streams,” Statistica Sinica, 29, 1–22.
  • Liu, Q., Lee, J., and Jordan, M. (2016), “A Kernelized Stein Discrepancy for Goodness-of-Fit Tests,” in International Conference on Machine Learning, pp. 276–284.
  • Liu, W. (2013), “Gaussian Graphical Model Estimation With False Discovery Rate Control,” The Annals of Statistics, 41, 2948–2978.
  • Ma, S., Gong, Q., and Bohnert, H. J. (2007), “An Arabidopsis Gene Network Based on the Graphical Gaussian Model,” Genome Research, 17. DOI: 10.1101/gr.6911207.
  • Mai, Q., Zou, H., and Yuan, M. (2012), “A Direct Approach to Sparse Discriminant Analysis in Ultra-High Dimensions,” Biometrika, 99, 29–42. DOI: 10.1093/biomet/asr066.
  • Mardia, K. V. (1970), “Measures of Multivariate Skewness and Kurtosis With Applications,” Biometrika, 57, 519–530. DOI: 10.1093/biomet/57.3.519.
  • Marozzi, M. (2015), “Multivariate Multidistance Tests for High-Dimensional Low Sample Size Case-Control Studies,” Statistics in Medicine, 34, 1511–1526. DOI: 10.1002/sim.6418.
  • Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. (2008), “Sparse Permutation Invariant Covariance Estimation,” Electronic Journal of Statistics, 2, 494–515. DOI: 10.1214/08-EJS176.
  • Royston, J. (1983), “Some Techniques for Assessing Multivariate Normality Based on the Shapiro–Wilk W,” Applied Statistics, 32, 121–133. DOI: 10.2307/2347291.
  • Schilling, M. F. (1986), “Multivariate Two-Sample Tests Based on Nearest Neighbors,” Journal of the American Statistical Association, 81, 799–806. DOI: 10.1080/01621459.1986.10478337.
  • Shapiro, S. S., and Wilk, M. B. (1965), “An Analysis of Variance Test for Normality (Complete Samples),” Biometrika, 52, 591–611. DOI: 10.1093/biomet/52.3-4.591.
  • Smith, S. P., and Jain, A. K. (1988), “A Test to Determine the Multivariate Normality of a Data Set,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 757–761. DOI: 10.1109/34.6789.
  • Taylor, J., and Tibshirani, R. (2018), “Post-Selection Inference for ℓ1-Penalized Likelihood Models,” Canadian Journal of Statistics, 46, 41–61. DOI: 10.1002/cjs.11313.
  • Villasenor Alva, J. A., and Estrada, E. G. (2009), “A Generalization of Shapiro–Wilk’s Test for Multivariate Normality,” Communications in Statistics-Theory and Methods, 38, 1870–1883. DOI: 10.1080/03610920802474465.
  • Wang, T., and Samworth, R. J. (2018), “High Dimensional Change Point Estimation Via Sparse Projection,” Journal of the Royal Statistical Society, Series B, 80, 57–83. DOI: 10.1111/rssb.12243.
  • Xia, Y., Cai, T., and Cai, T. T. (2015), “Testing Differential Networks With Applications to the Detection of Gene–Gene Interactions,” Biometrika, 102, 247–266. DOI: 10.1093/biomet/asu074.
  • Xie, Y., and Siegmund, D. (2013), “Sequential Multi-Sensor Change-Point Detection,” The Annals of Statistics, 41, 670–692. DOI: 10.1214/13-AOS1094.
  • Yuan, M. (2010), “High Dimensional Inverse Covariance Matrix Estimation Via Linear Programming,” Journal of Machine Learning Research, 11, 2261–2286.
  • Yuan, M., and Lin, Y. (2007), “Model Selection and Estimation in the Gaussian Graphical Model,” Biometrika, 94, 19–35. DOI: 10.1093/biomet/asm018.
  • Zhou, M., and Shao, Y. (2014), “A Powerful Test for Multivariate Normality,” Journal of Applied Statistics, 41, 351–363. DOI: 10.1080/02664763.2013.839637.
