Theory and Methods

Distributed Estimation for Principal Component Analysis: An Enlarged Eigenspace Analysis

Pages 1775-1786 | Received 05 Apr 2020, Accepted 01 Feb 2021, Published online: 06 Apr 2021

References

  • Allen-Zhu, Z., and Li, Y. (2016), “LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain,” in Advances in Neural Information Processing Systems (NIPS).
  • Bair, E., Hastie, T., Paul, D., and Tibshirani, R. (2006), “Prediction by Supervised Principal Components,” Journal of the American Statistical Association, 101, 119–137. DOI: 10.1198/016214505000000628.
  • Banerjee, M., Durot, C., and Sen, B. (2019), “Divide and Conquer in Nonstandard Problems and the Super-Efficiency Phenomenon,” The Annals of Statistics, 47, 720–757. DOI: 10.1214/17-AOS1633.
  • Battey, H., Fan, J., Liu, H., Lu, J., and Zhu, Z. (2018), “Distributed Testing and Estimation Under Sparse High Dimensional Models,” The Annals of Statistics, 46, 1352. DOI: 10.1214/17-AOS1587.
  • Bengio, Y., Courville, A., and Vincent, P. (2013), “Representation Learning: A Review and New Perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798–1828. DOI: 10.1109/TPAMI.2013.50.
  • Cai, T. T., Ma, Z., and Wu, Y. (2013), “Sparse PCA: Optimal Rates and Adaptive Estimation,” The Annals of Statistics, 41, 3074–3110. DOI: 10.1214/13-AOS1178.
  • Cai, T. T., and Zhang, A. (2018), “Rate-Optimal Perturbation Bounds for Singular Subspaces With Applications to High-Dimensional Statistics,” The Annals of Statistics, 46, 60–89. DOI: 10.1214/17-AOS1541.
  • Chen, X., Liu, W., Mao, X., and Yang, Z. (2020), “Distributed High-Dimensional Regression Under a Quantile Loss Function,” Journal of Machine Learning Research, 21, 1–43.
  • Chen, X., Liu, W., and Zhang, Y. (2019), “Quantile Regression Under Memory Constraint,” The Annals of Statistics, 47, 3244–3273.
  • Chen, X., Liu, W., and Zhang, Y. (2021), “First-Order Newton-Type Estimator for Distributed Estimation and Inference,” Journal of the American Statistical Association, in press, DOI: 10.1080/01621459.2021.1891925.
  • Davis, C., and Kahan, W. M. (1970), “The Rotation of Eigenvectors by a Perturbation. III,” SIAM Journal on Numerical Analysis, 7, 1–46.
  • Fan, J., Guo, Y., and Wang, K. (2019), “Communication-Efficient Accurate Statistical Estimation,” arXiv no. 1906.04870.
  • Fan, J., Wang, D., Wang, K., and Zhu, Z. (2019), “Distributed Estimation of Principal Eigenspaces,” The Annals of Statistics, 47, 3009–3031.
  • Fan, J., and Wang, W. (2017), “Asymptotics of Empirical Eigen-Structure for Ultra-High Dimensional Spiked Covariance Model,” The Annals of Statistics, 45, 1342–1374.
  • Frank, L. E., and Friedman, J. H. (1993), “A Statistical View of Some Chemometrics Regression Tools,” Technometrics, 35, 109–135.
  • Garber, D., and Hazan, E. (2015), “Fast and Simple PCA via Convex Optimization,” arXiv no. 1509.05647.
  • Garber, D., Hazan, E., Jin, C., Kakade, S. M., Musco, C., Netrapalli, P., and Sidford, A. (2016), “Faster Eigenvector Computation via Shift-and-Invert Preconditioning,” in Proceedings of the International Conference on Machine Learning (ICML).
  • Garber, D., Shamir, O., and Srebro, N. (2017), “Communication-Efficient Algorithms for Distributed Stochastic Principal Component Analysis,” in Proceedings of the International Conference on Machine Learning (ICML).
  • Horowitz, J. L. (2009), Semiparametric and Nonparametric Methods in Econometrics (Vol. 12), New York: Springer.
  • Hotelling, H. (1933), “Analysis of a Complex of Statistical Variables Into Principal Components,” Journal of Educational Psychology, 24, 417–441.
  • Hristache, M., Juditsky, A., and Spokoiny, V. (2001), “Direct Estimation of the Index Coefficient in a Single-Index Model,” The Annals of Statistics, 29, 595–623.
  • Janzamin, M., Sedghi, H., and Anandkumar, A. (2014), “Score Function Features for Discriminative Learning: Matrix and Tensor Framework,” arXiv no. 1412.2863.
  • Jeffers, J. (1967), “Two Case Studies in the Application of Principal Component Analysis,” Journal of the Royal Statistical Society, Series C, 16, 225–236.
  • Johnson, R., and Zhang, T. (2013), “Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction,” in Advances in Neural Information Processing Systems (NIPS).
  • Johnstone, I. M. (2001), “On the Distribution of the Largest Eigenvalue in Principal Components Analysis,” The Annals of Statistics, 29, 295–327.
  • Johnstone, I. M., and Lu, A. Y. (2009), “On Consistency and Sparsity for Principal Components Analysis in High Dimensions,” Journal of the American Statistical Association, 104, 682–693.
  • Jolliffe, I. T. (1982), “A Note on the Use of Principal Components in Regression,” Journal of the Royal Statistical Society, Series C, 31, 300–303.
  • Jordan, M. I., Lee, J. D., and Yang, Y. (2019), “Communication-Efficient Distributed Statistical Inference,” Journal of the American Statistical Association, 114, 668–681.
  • Lee, J. D., Liu, Q., Sun, Y., and Taylor, J. E. (2017), “Communication-Efficient Sparse Regression,” Journal of Machine Learning Research, 18, 1–30.
  • Li, K.-C. (1992), “On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein’s Lemma,” Journal of the American Statistical Association, 87, 1025–1039. DOI: 10.1080/01621459.1992.10476258.
  • Pearson, K. (1901), “LIII. On Lines and Planes of Closest Fit to Systems of Points in Space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2, 559–572. DOI: 10.1080/14786440109462720.
  • Rigollet, P., and Hütter, J.-C. (2015), “High Dimensional Statistics,” lecture notes for MIT Course 18.S997.
  • Shamir, O. (2016), “Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity,” in Proceedings of the International Conference on Machine Learning (ICML).
  • Shamir, O., Srebro, N., and Zhang, T. (2014), “Communication Efficient Distributed Optimization Using an Approximate Newton-Type Method,” in Proceedings of the International Conference on Machine Learning (ICML).
  • Shi, C., Lu, W., and Song, R. (2018), “A Massive Data Framework for M-Estimators With Cubic-Rate,” Journal of the American Statistical Association, 113, 1698–1709. DOI: 10.1080/01621459.2017.1360779.
  • Stein, C. M. (1981), “Estimation of the Mean of a Multivariate Normal Distribution,” The Annals of Statistics, 9, 1135–1151. DOI: 10.1214/aos/1176345632.
  • Van Loan, C., and Golub, G. (2012), Matrix Computations (3rd ed.), Baltimore, MD: Johns Hopkins University Press.
  • Vershynin, R. (2012), “Introduction to the Non-Asymptotic Analysis of Random Matrices,” in Compressed Sensing, pp. 210–268.
  • Volgushev, S., Chao, S.-K., and Cheng, G. (2019), “Distributed Inference for Quantile Regression Processes,” The Annals of Statistics, 47, 1634–1662. DOI: 10.1214/18-AOS1730.
  • Vu, V. Q., and Lei, J. (2013), “Minimax Sparse Principal Subspace Estimation in High Dimensions,” The Annals of Statistics, 41, 2905–2947. DOI: 10.1214/13-AOS1151.
  • Wang, X., Yang, Z., Chen, X., and Liu, W. (2019), “Distributed Inference for Linear Support Vector Machine,” Journal of Machine Learning Research, 20, 1–41.
  • Wen, Z., and Yin, W. (2013), “A Feasible Method for Optimization With Orthogonality Constraints,” Mathematical Programming, 142, 397–434. DOI: 10.1007/s10107-012-0584-1.
  • Xu, Z. (2018), “Gradient Descent Meets Shift-and-Invert Preconditioning for Eigenvector Computation,” in Advances in Neural Information Processing Systems (NIPS).
  • Yang, Z., Balasubramanian, K., and Liu, H. (2017), “On Stein’s Identity and Near-Optimal Estimation in High-Dimensional Index Models,” arXiv no. 1709.08795.
  • Yu, Y., Wang, T., and Samworth, R. J. (2014), “A Useful Variant of the Davis–Kahan Theorem for Statisticians,” Biometrika, 102, 315–323. DOI: 10.1093/biomet/asv008.
  • Zhang, Y., Duchi, J., and Wainwright, M. (2015), “Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm With Minimax Optimal Rates,” Journal of Machine Learning Research, 16, 3299–3340.
  • Zhao, T., Cheng, G., and Liu, H. (2016), “A Partially Linear Framework for Massive Heterogeneous Data,” The Annals of Statistics, 44, 1400–1437. DOI: 10.1214/15-AOS1410.
