1,940
Views
12
CrossRef citations to date
0
Altmetric
Theory and Methods

Distributed Estimation for Principal Component Analysis: An Enlarged Eigenspace Analysis

, , &
Pages 1775-1786 | Received 05 Apr 2020, Accepted 01 Feb 2021, Published online: 06 Apr 2021
 

Abstract

The growing size of modern datasets brings many challenges to the existing statistical estimation approaches, which calls for new distributed methodologies. This article studies distributed estimation for a fundamental statistical machine learning problem, principal component analysis (PCA). Despite the massive literature on top eigenvector estimation, much less is presented for the top-L-dim (L > 1) eigenspace estimation, especially in a distributed manner. We propose a novel multi-round algorithm for constructing top-L-dim eigenspace for distributed data. Our algorithm takes advantage of shift-and-invert preconditioning and convex optimization. Our estimator is communication-efficient and achieves a fast convergence rate. In contrast to the existing divide-and-conquer algorithm, our approach has no restriction on the number of machines. Theoretically, the traditional Davis–Kahan theorem requires the explicit eigengap assumption to estimate the top-L-dim eigenspace. To abandon this eigengap assumption, we consider a new route in our analysis: instead of exactly identifying the top-L-dim eigenspace, we show that our estimator is able to cover the targeted top-L-dim population eigenspace. Our distributed algorithm can be applied to a wide range of statistical problems based on PCA, such as principal component regression and single index model. Finally, we provide simulation studies to demonstrate the performance of the proposed distributed estimator.

Supplementary Materials

In the supplementary material, we first provide detailed proofs of all our theoretical results in Section 3. We then study two applications of our distributed PCA: principal component regression and single index model. We provide theoretical findings and conduct additional numerical experiments for these two applications.

Acknowledgments

The authors would like to thank the editor-in-chief, the anonymous associated editor, and two anonymous referees for many useful suggestions and feedback that greatly improve the article.

Funding

Xi Chen is supported by NSF grant IIS-1845444. Jason D. Lee is supported by NSF grant CCF-2002272. Yun Yang is supported by NSF grant DMS-1810831.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 343.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.