Research Article

Fast Robust Correlation for High-Dimensional Data

Pages 184-198 | Received 03 Apr 2019, Accepted 24 Sep 2019, Published online: 01 Nov 2019

Figures & data

Table 1 Computation times (in seconds) of various correlation matrices as a function of the dimension d, for n = 1000 observations.

Fig. 1 Location influence functions at ρ = 0 for different transformations g.

Fig. 2 The proposed transformation (22) with default constants b = 1.5 and c = 4.

Table 2 Correlation measures based on transformations g with their breakdown value ε*, efficiency, gross-error sensitivity γ*, rejection point δ*, and correlation between X and g(X).

Fig. 3 Illustration of wrapping a standardized sample {z1,…,zn}. Values in the interval [−b,b] are left unchanged, whereas values outside [−c,c] are zeroed. The intermediate values are “folded” inward so they still play a role.

Fig. 4 Bias and MSE of correlation measures based on transformation, for uncontaminated Gaussian data with sample size 100.

Fig. 5 MSE of the correlation measures in Figure 4 with 10% of outliers placed at k = 3 (left) and k = 5 (right).

Fig. 6 Bias and MSE of other robust correlation measures, for uncontaminated Gaussian data with sample size 100.

Fig. 7 MSE of the correlation measures in Figure 6 with 10% of outliers placed at k = 3 (left) and k = 5 (right).

Fig. 8 Left panel: power of dCor (dashed black curve) and its robust version (blue curve) for bivariate X and Y with distribution t(1) and independence except for X1=Y1 versus the sample size n. Right panel: power of dCor and its robust version for d-dimensional X and Y with distribution t(1) and n = 100, as a function of the dimension d.

Fig. 9 Prostate data: cellmap of the genes with the largest number of flagged cells.

Fig. 10 Frames 60, 100, and 200 of the video data.

Fig. 11 First loading vector of the video data, for classical PCA (upper left), Spearman correlation (upper right), Huber’s ψ (lower left), and wrapping (lower right).

Fig. 12 Residuals of the video data, for classical PCA (upper left), Spearman correlation (upper right), Huber’s ψ (lower left), and wrapping (lower right).

Supplemental material
