Journal of Computational and Graphical Statistics Volume 32, 2023 - Issue 2

High-Dimensional and Big Data

Cross-Validated Loss-based Covariance Matrix Estimator Selection in High Dimensions

Philippe Boileau, Graduate Group in Biostatistics and Center for Computational Biology, UC Berkeley, Berkeley, CA

https://orcid.org/0000-0002-4850-2507

Nima S. Hejazi, Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA

https://orcid.org/0000-0002-7127-2789

Mark J. van der Laan, Department of Statistics, Division of Biostatistics, and Center for Computational Biology, UC Berkeley, Berkeley, CA

Sandrine Dudoit, Department of Statistics, Division of Biostatistics, and Center for Computational Biology, UC Berkeley, Berkeley, CA (corresponding author)

https://orcid.org/0000-0002-6069-8629

Pages 601-612 | Received 06 Nov 2021, Accepted 28 Jul 2022, Published online: 07 Oct 2022

https://doi.org/10.1080/10618600.2022.2110883

References

Amezquita, R. A., Lun, A. T. L., Becht, E., Carey, V. J., Carpp, L. N., Geistlinger, L., Marini, F., Rue-Albrecht, K., Risso, D., Soneson, C., Waldron, L., Pagès, H., Smith, M. L., Huber, W., Morgan, M., Gottardo, R., and Hicks, S. C. (2020), “Orchestrating Single-Cell Analysis with Bioconductor,” Nature Methods, 17, 137–145. DOI: 10.1038/s41592-019-0654-x.
Anderson, T. (2003), An Introduction to Multivariate Statistical Analysis (3rd ed.), Hoboken, NJ: Wiley.
Bai, J. (2003), “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71, 135–171. DOI: 10.1111/1468-0262.00392.
Bartz, D. (2016), “Cross-Validation based Nonlinear Shrinkage,” Available at https://arxiv.org/abs/1611.00798.
Bennett, G. (1962), “Probability Inequalities for the Sum of Independent Random Variables,” Journal of the American Statistical Association, 57, 33–45. DOI: 10.1080/01621459.1962.10482149.
Bickel, P. J., and Levina, E. (2008a), “Covariance Regularization by Thresholding,” Annals of Statistics, 36, 2577–2604. Available at http://www.jstor.org/stable/25464728.
Bickel, P. J., and Levina, E. (2008b), “Regularized Estimation of Large Covariance Matrices,” Annals of Statistics, 36, 199–227. Available at http://www.jstor.org/stable/25464621.
Boileau, P., Hejazi, N. S., Collica, B., van der Laan, M. J., and Dudoit, S. (2021), “cvCovEst: Cross-validated Covariance Matrix Estimator Selection and Evaluation in R,” Journal of Open Source Software, 6, 3273. DOI: 10.21105/joss.03273.
Breiman, L., and Spector, P. (1992), “Submodel Selection and Evaluation in Regression. The X-random Case,” International Statistical Review, 60, 291–319. DOI: 10.2307/1403680.
Cai, T., and Liu, W. (2011), “Adaptive Thresholding for Sparse Covariance Matrix Estimation,” Journal of the American Statistical Association, 106, 672–684. DOI: 10.1198/jasa.2011.tm10560.
Cai, T. T., Zhang, C.-H., and Zhou, H. H. (2010), “Optimal Rates of Convergence for Covariance Matrix Estimation,” Annals of Statistics, 38, 2118–2144.
Coyle, J., and Hejazi, N. (2018), “origami: A Generalized Framework for Cross-Validation in R,” Journal of Open Source Software, 3, 512. DOI: 10.21105/joss.00512.
Dudoit, S., and van der Laan, M. J. (2005), “Asymptotics of Cross-validated Risk Estimation in Estimator Selection and Performance Assessment,” Statistical Methodology, 2, 131–154. DOI: 10.1016/j.stamet.2005.02.003.
Efron, B. (2012), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Cambridge: Cambridge University Press.
Fan, J., Fan, Y., and Lv, J. (2008), “High Dimensional Covariance Matrix Estimation using a Factor Model,” Journal of Econometrics, 147, 186–197. Econometric modelling in finance and risk management: An overview. Available at http://www.sciencedirect.com/science/article/pii/S0304407608001346.
Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society, Series B, 75, 603–680. DOI: 10.1111/rssb.12016.
Fan, J., Liao, Y., and Wang, W. (2016), “Projected Principal Component Analysis in Factor Models,” Annals of Statistics, 44, 219–254.
Fan, J., Wang, W., and Zhong, Y. (2019), “Robust Covariance Estimation for Approximate Factor Models,” Journal of Econometrics, 208, 5–22. Special Issue on Financial Engineering and Risk Management. DOI: 10.1016/j.jeconom.2018.09.003.
Fang, Y., Wang, B., and Feng, Y. (2016), “Tuning-parameter Selection in Regularized Estimations of Large Covariance Matrices,” Journal of Statistical Computation and Simulation, 86, 494–509. DOI: 10.1080/00949655.2015.1017823.
Friedman, J., Hastie, T., and Tibshirani, R. (2001), The Elements of Statistical Learning (Vol. 1), Springer Series in Statistics, New York: Springer.
Golub, G. H., and Van Loan, C. F. (1996), Matrix Computations (3rd ed.), Baltimore, MD: Johns Hopkins University Press.
Györfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002), A Distribution-Free Theory of Nonparametric Regression, Springer Series in Statistics, New York: Springer.
Johnstone, I. M. (2001), “On the Distribution of the Largest Eigenvalue in Principal Components Analysis,” Annals of Statistics, 29, 295–327.
Johnstone, I. M., and Lu, A. Y. (2009), “On Consistency and Sparsity for Principal Components Analysis in High Dimensions,” Journal of the American Statistical Association, 104, 682–693. DOI: 10.1198/jasa.2009.0121.
Lam, C., and Fan, J. (2009), “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation,” Annals of Statistics, 37, 4254–4278.
Ledoit, O., and Wolf, M. (2004), “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 88, 365–411. DOI: 10.1016/S0047-259X(03)00096-4.
Ledoit, O., and Wolf, M. (2012), “Nonlinear Shrinkage Estimation of Large-Dimensional Covariance Matrices,” Annals of Statistics, 40, 1024–1060.
Ledoit, O., and Wolf, M. (2015), “Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions,” Journal of Multivariate Analysis, 139, 360–384.
Ledoit, O., and Wolf, M. (2018), “Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices,” ECON – Working papers 264, Department of Economics - University of Zurich. Available at https://EconPapers.repec.org/RePEc:zur:econwp:264.
Marčenko, V. A., and Pastur, L. A. (1967), “Distribution of Eigenvalues for some Sets of Random Matrices,” Mathematics of the USSR-Sbornik, 1, 457–483. DOI: 10.1070/SM1967v001n04ABEH001994.
McInnes, L., Healy, J., and Melville, J. (2018), “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” arXiv, 1802.03426. Available at http://arxiv.org/abs/1802.03426.
Onatski, A. (2012), “Asymptotics of the Principal Components Estimator of Large Factor Models with Weakly Influential Factors,” Journal of Econometrics, 168, 244–258. DOI: 10.1016/j.jeconom.2012.01.034.
Poincaré, H. (1912), Calcul des probabilités (Vol. 1), Paris: Gauthier-Villars.
R Core Team. (2021), R: A Language and Environment for Statistical Computing, Vienna: R Foundation for Statistical Computing. Available at https://www.R-project.org/.
Risso, D., and Cole, M. (2020), scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets. R package version 2.3.17.
Robbins, H. (1964), “The Empirical Bayes Approach to Statistical Decision Problems,” Annals of Mathematical Statistics, 35, 1–20. DOI: 10.1214/aoms/1177703729.
Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized Thresholding of Large Covariance Matrices,” Journal of the American Statistical Association, 104, 177–186. DOI: 10.1198/jasa.2009.0101.
Schäfer, J., and Strimmer, K. (2005), “A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics,” Statistical Applications in Genetics and Molecular Biology, 4. DOI: 10.2202/1544-6115.1175.
Smith, S. (2005), “Covariance, Subspace, and Intrinsic Cramér-Rao Bounds,” IEEE Transactions on Signal Processing, 53, 1610–1630. DOI: 10.1109/TSP.2005.845428.
Stock, J. H., and Watson, M. W. (2002), “Forecasting Using Principal Components from a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. Available at http://www.jstor.org/stable/3085839.
Tasic, B., Menon, V., Nguyen, T. N., Kim, T. K., Jarsky, T., Yao, Z., Levi, B., Gray, L. T., Sorensen, S. A., Dolbeare, T., Bertagnolli, D., Goldy, J., Shapovalova, N., Parry, S., Lee, C., Smith, K., Bernard, A., Madisen, L., Sunkin, S. M., Hawrylycz, M., Koch, C., and Zeng, H. (2016), “Adult Mouse Cortical Cell Taxonomy Revealed by Single Cell Transcriptomics,” Nature Neuroscience, 19, 335–346. DOI: 10.1038/nn.4216.
van der Laan, M. J., and Dudoit, S. (2003), “Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples,” Working paper 130, University of California, Berkeley. Available at https://biostats.bepress.com/ucbbiostat/paper130/.
van der Vaart, A. W., Dudoit, S., and van der Laan, M. J. (2006), “Oracle Inequalities for Multi-Fold Cross Validation,” Statistics and Decisions, 24, 351–371. DOI: 10.1524/stnd.2006.24.3.351.
Zeisel, A., Muñoz-Manchado, A. B., Codeluppi, S., Lönnerberg, P., La Manno, G., Juréus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., Rolny, C., Castelo-Branco, G., Hjerling-Leffler, J., and Linnarsson, S. (2015), “Cell Types in the Mouse Cortex and Hippocampus Revealed by Single-Cell RNA-seq,” Science, 347, 1138–1142. DOI: 10.1126/science.aaa1934.
