Theory and Methods

Fast, Exact Bootstrap Principal Component Analysis for p > 1 Million

Pages 846-860 | Received 01 May 2014, Published online: 18 Aug 2016
 

Abstract

Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components (PCs) from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap PCs, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap PCs are limited to the same n-dimensional subspace and can be efficiently represented by their low-dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low-dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first three PCs based on 1000 bootstrap samples to be calculated on a standard laptop in 47 min, as opposed to approximately 4 days with standard methods. Supplementary materials for this article are available online.
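The subspace argument in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy sizes, the variable names (`Y`, `V`, `scores`, `A_b`), the helper `bootstrap_pcs_lowdim`, and the choice to recenter each resample are assumptions made for illustration. The sketch takes one SVD of the full sample, resamples only the n × n score matrix, and recovers a p-dimensional bootstrap PC only when it is explicitly needed.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 2000, 20                          # toy sizes (hypothetical)
Y = rng.standard_normal((p, n))          # columns are subjects
Y = Y - Y.mean(axis=1, keepdims=True)    # center each measurement

# One-time SVD of the original sample: Y = V @ diag(d) @ Ut
V, d, Ut = np.linalg.svd(Y, full_matrices=False)
scores = d[:, None] * Ut                 # n x n coordinates of the subjects

def bootstrap_pcs_lowdim(scores, idx):
    """Low-dimensional PC coordinates and singular values for one resample.

    Assumption of this sketch: each resample is recentered before its SVD.
    """
    resampled = scores[:, idx]
    resampled = resampled - resampled.mean(axis=1, keepdims=True)
    A_b, d_b, _ = np.linalg.svd(resampled, full_matrices=False)
    return A_b, d_b

idx = rng.integers(0, n, size=n)         # subjects drawn with replacement
A_b, d_b = bootstrap_pcs_lowdim(scores, idx)

# p-dimensional bootstrap PCs are V @ A_b; form them only if required.
Vb_fast = V @ A_b[:, :3]

# Reference computation on the full p x n resample, for comparison only.
Yb = Y[:, idx]
Yb = Yb - Yb.mean(axis=1, keepdims=True)
Vb_direct, db_direct, _ = np.linalg.svd(Yb, full_matrices=False)
```

In this sketch the per-resample SVD acts on an n × n matrix, so its cost does not grow with p; only the one-time SVD and any final multiplication by `V` touch the p-dimensional data.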

View correction statement:
Correction

Notes

1 Note, if the data have been centered, then n − 1 basis vectors are sufficient. For brevity of notation though, we will generally refer to the subspace under either scenario, centered or uncentered, as n-dimensional.

2 The bootstrap score matrix is equal to $D^b U^{b\prime}$, and the variances explained by each bootstrap PC are equal to the diagonals of $(1/(n-1))(D^b)^2$. These variances explained can also be expressed as a proportion of the total variance of the bootstrap sample, which can be calculated as $\mathrm{trace}(\mathrm{var}(Y^b)) = (1/(n-1))\|DU'P^b\|_F^2 = (1/(n-1))\sum_{i=1}^{n}\sum_{j=1}^{n}(DU'P^b)^2_{[i,j]}$.
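A minimal sketch of the calculation in note 2, assuming a random n × n matrix as a stand-in for the resampled scores $DU'P^b$ (the names `scores`, `idx`, and `prop` are illustrative, not from the article). It shows that the proportion of variance explained needs only the singular values and the sum of squared entries of that small matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 15                                   # toy sample size (hypothetical)
scores = rng.standard_normal((n, n))     # stand-in for D U'
idx = rng.integers(0, n, size=n)         # subjects drawn with replacement
resampled = scores[:, idx]               # same as D U' P^b for a selection matrix P^b

# Singular values of the resampled scores play the role of diag(D^b).
d_b = np.linalg.svd(resampled, compute_uv=False)

var_explained = d_b**2 / (n - 1)              # diagonals of (1/(n-1)) (D^b)^2
total_var = np.sum(resampled**2) / (n - 1)    # (1/(n-1)) * squared Frobenius norm
prop = var_explained / total_var              # proportion explained per bootstrap PC
```

Because the squared singular values of a matrix sum to its squared Frobenius norm, the proportions in `prop` sum to one.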

3 Here, the correlation operation is taken across the p elements of the vector, without the operation’s common statistical interpretation that each vector element is a new observation of a random variable.

4 In practice, we calculate the diagonals of $V\,\mathrm{cov}(A^b_{[,k]})\,V'$ by the row sums of $(V\,\mathrm{cov}(A^b_{[,k]})) \circ V$, where $\circ$ denotes element-wise multiplication as opposed to traditional matrix multiplication.
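The identity in note 4 can be checked numerically. The following is a small sketch with made-up sizes and random inputs; `A_k` is a hypothetical array whose rows stand in for bootstrap draws of $A^b_{[,k]}$. The row-sum form avoids ever building the p × p matrix, which would be infeasible for large p.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, B = 500, 10, 200                   # toy sizes (hypothetical)
V = rng.standard_normal((p, n))          # stand-in for the p x n matrix V
A_k = rng.standard_normal((B, n))        # rows: B bootstrap draws of the k-th coordinate vector

C = np.cov(A_k, rowvar=False)            # n x n covariance across bootstrap draws

# Row sums of (V C) o V give diag(V C V') with O(p n^2) work and O(p n) memory.
var_fast = np.sum((V @ C) * V, axis=1)

# Naive version for comparison; it forms a p x p matrix and is only viable for small p.
var_naive = np.diag(V @ C @ V.T)

se = np.sqrt(var_fast)                   # pointwise bootstrap standard errors
```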

5 One interpretation of CIs constructed from rotation-adjusted bootstrap PCs is that, if the population PC matrix is rotated toward each sample from the population, then the average pointwise coverage of rotation-adjusted CIs should be approximately 100α%.

6 The computational complexity of finding the appropriate rotation matrix in each bootstrap sample depends on taking the SVD of the K × K matrix $V^{b\prime}_{[,1:K]}T = A^{b\prime}_{[,1:K]}V'T$, where $V'T$ can be precalculated before the bootstrap procedure.
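A sketch of the K × K computation in note 6, under the assumption that the "appropriate rotation" is the standard orthogonal Procrustes rotation toward a target matrix T. The toy sizes, the random orthonormal stand-ins for V and $A^b$, and the choice of T as the first K columns of V are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, K = 400, 12, 3                                  # toy sizes (hypothetical)
V, _ = np.linalg.qr(rng.standard_normal((p, n)))      # orthonormal p x n stand-in for V
T = V[:, :K]                                          # hypothetical target matrix (p x K)

VtT = V.T @ T                                         # n x K, computed once before the bootstrap

# Stand-in for one bootstrap sample's low-dimensional coordinates A^b.
A_b, _ = np.linalg.qr(rng.standard_normal((n, n)))

# K x K matrix V^b[, 1:K]' T, obtained without forming the p-dimensional V^b.
M = A_b[:, :K].T @ VtT

# Orthogonal Procrustes: the rotation R minimizing ||V^b[, 1:K] R - T||_F.
W, _, Zt = np.linalg.svd(M)
R = W @ Zt
A_rot = A_b[:, :K] @ R                                # rotated low-dimensional coordinates
```

Only the one-time product `V.T @ T` involves p; each bootstrap sample then costs an n × K by K × K amount of work plus one K × K SVD.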

7 In each bootstrap sample, the variance explained by the columns of T is equal to the variance of the resampled data after a projection onto the space spanned by T. The projected data are equal to $T(T'T)^{-1}T'Y^b = (T(T'T)^{-1}T'V)DU'P^b$, where the matrix $T(T'T)^{-1}T'V$ does not depend on the bootstrap sample and can be precalculated before the bootstrap procedure.
