Theory and Methods

Fast, Exact Bootstrap Principal Component Analysis for p > 1 Million

Pages 846-860 | Received 01 May 2014, Published online: 18 Aug 2016
 

Abstract

Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components (PCs) from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap PCs, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap PCs are limited to the same n-dimensional subspace and can be efficiently represented by their low-dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low-dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first three PCs based on 1000 bootstrap samples to be calculated on a standard laptop in 47 min, as opposed to approximately 4 days with standard methods. Supplementary materials for this article are available online.
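The subspace argument in the abstract can be illustrated with a short NumPy sketch. This is a hedged illustration rather than the authors' implementation: it assumes uncentered data stored as a p × n matrix `Y` with factorization Y = VDU′ (thin SVD), and the function name `fast_bootstrap_pcs` and its arguments are hypothetical. Each bootstrap sample is a column resampling of `Y`, so only an n × n matrix needs to be decomposed per bootstrap sample, and each bootstrap PC is stored as n coordinates with respect to `V`.

```python
import numpy as np

def fast_bootstrap_pcs(Y, K, n_boot, rng):
    """Sketch: bootstrap PCs via low-dimensional coordinates (uncentered case).

    Y is p x n (columns are subjects). Returns the original PC basis V (p x n),
    the n x K coordinates of the leading bootstrap PCs, their singular values,
    and the resampling indices used for each bootstrap sample.
    """
    # One thin SVD of the full data: Y = V @ diag(d) @ Ut
    V, d, Ut = np.linalg.svd(Y, full_matrices=False)
    DUt = d[:, None] * Ut            # n x n; all later work happens at this size
    n = Y.shape[1]
    coords = np.empty((n_boot, n, K))
    svals = np.empty((n_boot, K))
    indices = np.empty((n_boot, n), dtype=int)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        # SVD of the resampled n x n matrix instead of the p x n bootstrap sample
        A, s, _ = np.linalg.svd(DUt[:, idx], full_matrices=False)
        coords[b] = A[:, :K]               # low-dimensional coordinates of the PCs
        svals[b] = s[:K]
        indices[b] = idx
    return V, coords, svals, indices

# Small demonstration with simulated data (p = 500, n = 12)
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 12))
V, coords, svals, indices = fast_bootstrap_pcs(Y, K=2, n_boot=5, rng=rng)

# A p-dimensional bootstrap PC is recovered only when needed, as V @ coords[b]
first_bootstrap_pcs = V @ coords[0]
```

The recovered vectors agree with a direct SVD of the resampled data up to the usual sign ambiguity of singular vectors; any sign or rotation alignment across bootstrap samples is left out of this sketch.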

View correction statement:
Correction

Notes

1 Note, if the data have been centered, then n − 1 basis vectors are sufficient. For brevity of notation though, we will generally refer to the subspace under either scenario, centered or uncentered, as n-dimensional.

2 The bootstrap score matrix is equal to $D^b (U^b)'$, and the variances explained by each bootstrap PC are equal to the diagonals of $\frac{1}{n-1}(D^b)^2$. These variances explained can also be expressed as a proportion of the total variance of the bootstrap sample, which can be calculated as $\mathrm{trace}(\mathrm{var}(Y^b)) = \frac{1}{n-1}\|DU'P^b\|_F^2 = \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(DU'P^b\right)^2_{[i,j]}$.
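The variance calculation in this note can be sketched as follows. This is an assumption-laden illustration: the data are treated as uncentered, the original sample is assumed to factor as Y = VDU′, and the helper name `bootstrap_variance_explained` is hypothetical. The point is that both the per-component variances and the total variance come from the resampled n × n matrix alone.

```python
import numpy as np

def bootstrap_variance_explained(d, Ut, idx, K):
    """Sketch: variance explained by the first K bootstrap PCs.

    d and Ut come from the thin SVD Y = V @ diag(d) @ Ut of the original data;
    idx holds the resampled column (subject) indices of one bootstrap sample.
    """
    n = len(idx)
    M = (d[:, None] * Ut)[:, idx]             # resampled n x n matrix
    s = np.linalg.svd(M, compute_uv=False)    # bootstrap singular values
    var_k = s[:K] ** 2 / (n - 1)              # variance explained per PC
    total = np.sum(M ** 2) / (n - 1)          # squared Frobenius norm / (n - 1)
    return var_k, var_k / total

# Small demonstration with simulated data (p = 300, n = 10)
rng = np.random.default_rng(1)
Y = rng.standard_normal((300, 10))
V, d, Ut = np.linalg.svd(Y, full_matrices=False)
idx = rng.integers(0, 10, size=10)
var_k, prop_k = bootstrap_variance_explained(d, Ut, idx, K=10)
```

With K equal to n, the proportions sum to one, since the squared singular values of a matrix sum to its squared Frobenius norm.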

3 Here, the correlation operation is taken across the p elements of the vector, without the operation’s common statistical interpretation that each vector element is a new observation of a random variable.

4 In practice, we calculate the diagonals of $V\,\mathrm{cov}(A^b_{[,k]})\,V'$ as the row sums of $(V\,\mathrm{cov}(A^b_{[,k]})) \circ V$, where $\circ$ denotes element-wise multiplication as opposed to traditional matrix multiplication.
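The row-sum shortcut in this note avoids forming a p × p matrix. A minimal NumPy sketch follows; the function name `pc_pointwise_variance` and the B × n layout of `A_k` (one row of coordinates per bootstrap sample, assumed already sign-aligned) are assumptions made for illustration.

```python
import numpy as np

def pc_pointwise_variance(V, A_k):
    """Sketch: diagonal of V @ cov(A_k) @ V.T without building a p x p matrix.

    V is p x n; A_k is B x n, where row b holds the n low-dimensional
    coordinates of the k-th PC in bootstrap sample b.
    """
    C = np.cov(A_k, rowvar=False)         # n x n covariance of the coordinates
    # Row sums of the element-wise product equal the diagonal of V @ C @ V.T
    return np.sum((V @ C) * V, axis=1)    # length-p vector of variances

# Small demonstration with simulated inputs (p = 40, n = 6, B = 50)
rng = np.random.default_rng(2)
V = np.linalg.qr(rng.standard_normal((40, 6)))[0]   # orthonormal p x n basis
A_k = rng.standard_normal((50, 6))
pointwise_var = pc_pointwise_variance(V, A_k)
pointwise_se = np.sqrt(pointwise_var)
```

The intermediate product `V @ C` is only p × n, so memory use stays linear in p.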

5 One interpretation of CIs constructed from rotation-adjusted bootstrap PCs is that if the population PC matrix is rotated toward each sample from the population, then the average pointwise coverage of rotation-adjusted CIs should be approximately 100α%.

6 The computational complexity of finding the appropriate rotation matrix in each bootstrap sample depends on taking the SVD of the K × K matrix $(V^b_{[,1:K]})'T = (A^b_{[,1:K]})'V'T$, where $V'T$ can be precalculated before the bootstrap procedure.

7 In each bootstrap sample, the variance explained by the columns of $T$ is equal to the variance of the resampled data after a projection onto the space spanned by $T$. The projected data are equal to $T(T'T)^{-1}T'Y^b = \left(T(T'T)^{-1}T'V\right)DU'P^b$, where the factor $T(T'T)^{-1}T'V$ does not depend on the bootstrap sample and can be precalculated before the bootstrap procedure.
