High-Dimensional and Big Data

Cross-Validated Loss-based Covariance Matrix Estimator Selection in High Dimensions

Pages 601-612 | Received 06 Nov 2021, Accepted 28 Jul 2022, Published online: 07 Oct 2022

References

  • Amezquita, R. A., Lun, A. T. L., Becht, E., Carey, V. J., Carpp, L. N., Geistlinger, L., Marini, F., Rue-Albrecht, K., Risso, D., Soneson, C., Waldron, L., Pagès, H., Smith, M. L., Huber, W., Morgan, M., Gottardo, R., and Hicks, S. C. (2020), “Orchestrating Single-Cell Analysis with Bioconductor,” Nature Methods, 17, 137–145. DOI: 10.1038/s41592-019-0654-x.
  • Anderson, T. (2003), An Introduction to Multivariate Statistical Analysis (3rd ed.), Hoboken, NJ: Wiley.
  • Bai, J. (2003), “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71, 135–171. DOI: 10.1111/1468-0262.00392.
  • Bartz, D. (2016), “Cross-Validation based Nonlinear Shrinkage,” Available at https://arxiv.org/abs/1611.00798.
  • Bennett, G. (1962), “Probability Inequalities for the Sum of Independent Random Variables,” Journal of the American Statistical Association, 57, 33–45. DOI: 10.1080/01621459.1962.10482149.
  • Bickel, P. J., and Levina, E. (2008a), “Covariance Regularization by Thresholding,” Annals of Statistics, 36, 2577–2604. Available at http://www.jstor.org/stable/25464728.
  • Bickel, P. J., and Levina, E. (2008b), “Regularized Estimation of Large Covariance Matrices,” Annals of Statistics, 36, 199–227. Available at http://www.jstor.org/stable/25464621.
  • Boileau, P., Hejazi, N. S., Collica, B., van der Laan, M. J., and Dudoit, S. (2021), “cvCovEst: Cross-validated Covariance Matrix Estimator Selection and Evaluation in R,” Journal of Open Source Software, 6, 3273. DOI: 10.21105/joss.03273.
  • Breiman, L., and Spector, P. (1992), “Submodel Selection and Evaluation in Regression. The X-random Case,” International Statistical Review, 60, 291–319. DOI: 10.2307/1403680.
  • Cai, T., and Liu, W. (2011), “Adaptive Thresholding for Sparse Covariance Matrix Estimation,” Journal of the American Statistical Association, 106, 672–684. DOI: 10.1198/jasa.2011.tm10560.
  • Cai, T. T., Zhang, C.-H., and Zhou, H. H. (2010), “Optimal Rates of Convergence for Covariance Matrix Estimation,” Annals of Statistics, 38, 2118–2144.
  • Coyle, J., and Hejazi, N. (2018), “origami: A Generalized Framework for Cross-Validation in R,” Journal of Open Source Software, 3, 512. DOI: 10.21105/joss.00512.
  • Dudoit, S., and van der Laan, M. J. (2005), “Asymptotics of Cross-validated Risk Estimation in Estimator Selection and Performance Assessment,” Statistical Methodology, 2, 131–154. DOI: 10.1016/j.stamet.2005.02.003.
  • Efron, B. (2012), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Cambridge: Cambridge University Press.
  • Fan, J., Fan, Y., and Lv, J. (2008), “High Dimensional Covariance Matrix Estimation using a Factor Model,” Journal of Econometrics, 147, 186–197. Econometric Modelling in Finance and Risk Management: An Overview. Available at http://www.sciencedirect.com/science/article/pii/S0304407608001346.
  • Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society, Series B, 75, 603–680. DOI: 10.1111/rssb.12016.
  • Fan, J., Liao, Y., and Wang, W. (2016), “Projected Principal Component Analysis in Factor Models,” Annals of Statistics, 44, 219–254.
  • Fan, J., Wang, W., and Zhong, Y. (2019), “Robust Covariance Estimation for Approximate Factor Models,” Journal of Econometrics, 208, 5–22. Special Issue on Financial Engineering and Risk Management. DOI: 10.1016/j.jeconom.2018.09.003.
  • Fang, Y., Wang, B., and Feng, Y. (2016), “Tuning-parameter Selection in Regularized Estimations of Large Covariance Matrices,” Journal of Statistical Computation and Simulation, 86, 494–509. DOI: 10.1080/00949655.2015.1017823.
  • Friedman, J., Hastie, T., and Tibshirani, R. (2001), The Elements of Statistical Learning, volume 1, Springer Series in Statistics, New York: Springer.
  • Golub, G. H., and Van Loan, C. F. (1996), Matrix Computations (3rd ed.), Baltimore, MD: Johns Hopkins University Press.
  • Györfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002), A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, New York: Springer.
  • Johnstone, I. M. (2001), “On the Distribution of the Largest Eigenvalue in Principal Components Analysis,” Annals of Statistics, 29, 295–327.
  • Johnstone, I. M., and Lu, A. Y. (2009), “On Consistency and Sparsity for Principal Components Analysis in High Dimensions,” Journal of the American Statistical Association, 104, 682–693. DOI: 10.1198/jasa.2009.0121.
  • Lam, C., and Fan, J. (2009), “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation,” Annals of Statistics, 37, 4254–4278.
  • Ledoit, O., and Wolf, M. (2004), “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 88, 365–411. DOI: 10.1016/S0047-259X(03)00096-4.
  • Ledoit, O., and Wolf, M. (2012), “Nonlinear Shrinkage Estimation of Large-Dimensional Covariance Matrices,” Annals of Statistics, 40, 1024–1060.
  • Ledoit, O., and Wolf, M. (2015), “Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions,” Journal of Multivariate Analysis, 139, 360–384.
  • Ledoit, O., and Wolf, M. (2018), “Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices,” ECON – Working papers 264, Department of Economics - University of Zurich. Available at https://EconPapers.repec.org/RePEc:zur:econwp:264.
  • Marčenko, V. A., and Pastur, L. A. (1967), “Distribution of Eigenvalues for some Sets of Random Matrices,” Mathematics of the USSR-Sbornik, 1, 457–483. DOI: 10.1070/SM1967v001n04ABEH001994.
  • McInnes, L., Healy, J., and Melville, J. (2018), “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” arXiv, 1802.03426. Available at http://arxiv.org/abs/1802.03426.
  • Onatski, A. (2012), “Asymptotics of the Principal Components Estimator of Large Factor Models with Weakly Influential Factors,” Journal of Econometrics, 168, 244–258. DOI: 10.1016/j.jeconom.2012.01.034.
  • Poincaré, H. (1912), Calcul des probabilités (Vol. 1), Paris: Gauthier-Villars.
  • R Core Team. (2021), R: A Language and Environment for Statistical Computing, Vienna: R Foundation for Statistical Computing. Available at https://www.R-project.org/.
  • Risso, D., and Cole, M. (2020), scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets. R package version 2.3.17.
  • Robbins, H. (1964), “The Empirical Bayes Approach to Statistical Decision Problems,” Annals of Mathematical Statistics, 35, 1–20. DOI: 10.1214/aoms/1177703729.
  • Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized Thresholding of Large Covariance Matrices,” Journal of the American Statistical Association, 104, 177–186. DOI: 10.1198/jasa.2009.0101.
  • Schäfer, J., and Strimmer, K. (2005), “A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics,” Statistical Applications in Genetics and Molecular Biology, 4. DOI: 10.2202/1544-6115.1175.
  • Smith, S. (2005), “Covariance, Subspace, and Intrinsic Cramér-Rao Bounds,” IEEE Transactions on Signal Processing, 53, 1610–1630. DOI: 10.1109/TSP.2005.845428.
  • Stock, J. H., and Watson, M. W. (2002), “Forecasting Using Principal Components from a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. Available at http://www.jstor.org/stable/3085839.
  • Tasic, B., Menon, V., Nguyen, T. N., Kim, T. K., Jarsky, T., Yao, Z., Levi, B., Gray, L. T., Sorensen, S. A., Dolbeare, T., Bertagnolli, D., Goldy, J., Shapovalova, N., Parry, S., Lee, C., Smith, K., Bernard, A., Madisen, L., Sunkin, S. M., Hawrylycz, M., Koch, C., and Zeng, H. (2016), “Adult Mouse Cortical Cell Taxonomy Revealed by Single Cell Transcriptomics,” Nature Neuroscience, 19, 335–346. DOI: 10.1038/nn.4216.
  • van der Laan, M. J., and Dudoit, S. (2003), “Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples,” Working paper 130, University of California, Berkeley. Available at https://biostats.bepress.com/ucbbiostat/paper130/.
  • van der Vaart, A. W., Dudoit, S., and van der Laan, M. J. (2006), “Oracle Inequalities for Multi-Fold Cross Validation,” Statistics and Decisions, 24, 351–371. DOI: 10.1524/stnd.2006.24.3.351.
  • Zeisel, A., Muñoz-Manchado, A. B., Codeluppi, S., Lönnerberg, P., La Manno, G., Juréus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., Rolny, C., Castelo-Branco, G., Hjerling-Leffler, J., and Linnarsson, S. (2015), “Cell Types in the Mouse Cortex and Hippocampus Revealed by Single-Cell RNA-seq,” Science, 347, 1138–1142. DOI: 10.1126/science.aaa1934.
