Theory and Methods

Detecting Abrupt Changes in the Presence of Local Fluctuations and Autocorrelated Noise

Pages 2147-2162 | Received 15 May 2020, Accepted 19 Mar 2021, Published online: 18 May 2021

Figures & data

Fig. 1 Segmentations of well-log data: wild binary segmentation using the strengthened Schwarz information criterion (top); segmentation under squared error loss with the penalty inflated to account for autocorrelation in measurement error (middle); optimal segmentation from DeCAFS with the default penalty (bottom). Each plot shows the data (black line), the estimated mean (red line), and the changepoint locations (vertical blue dashed lines).

Fig. 2 Top row: projections of data $v$ for detecting a change in the middle of $n = 100$ data points. Random walk model (top-left) for $\sigma_\eta^2$ of 0.03 (black), 0.02 (red) and 0.01 (green); AR(1) plus random walk model (top-right) for $\sigma_\eta^2 = 0.01$ and $\phi$ of 0.4 (black), 0.2 (red) and 0.1 (green). In both plots the blue line shows the standard cusum projection. Bottom row: noncentrality parameter for a $\chi_1^2$ test of a change using the optimal projection (solid line) and the cusum projection (dashed line), for a change of size 1 in the middle of the data, as we vary $n$. Out-fill asymptotics (bottom-left), where $(\sigma_\eta^2, \phi)$ is (0.0025, 0) (black), (0.01, 0) (red), (0.0025, 0.5) (green) and (0.01, 0.5) (blue); in-fill asymptotics (bottom-right), where for $n = 50$, $(\sigma_\eta^2, \phi)$ is (0.0025, 0) (black), (0.01, 0) (red), (0.0025, 0.5) (green) and (0.01, 0.5) (blue).
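
For readers unfamiliar with the "standard cusum projection" mentioned in this caption, the following is a minimal sketch of the textbook CUSUM contrast vector for a single change at position `tau` in `n` points. The function names are illustrative and not taken from the article, and the sketch assumes IID unit-variance noise; it does not reproduce the article's optimal projection for the random walk or AR(1) models.

```python
import math

def cusum_projection(n, tau):
    """Textbook CUSUM contrast for a change after the first `tau` of `n` points.

    The vector has unit Euclidean norm and sums to zero, so projecting the
    data onto it compares the mean before and after `tau`.
    """
    if not 0 < tau < n:
        raise ValueError("tau must satisfy 0 < tau < n")
    left = math.sqrt((n - tau) / (n * tau))     # weight on points 1..tau
    right = -math.sqrt(tau / (n * (n - tau)))   # weight on points tau+1..n
    return [left] * tau + [right] * (n - tau)

def project(v, y):
    """Inner product of the contrast vector with the data."""
    return sum(a * b for a, b in zip(v, y))

# Noise-free step of size 1 in the middle of n = 100 points.
n, tau = 100, 50
v = cusum_projection(n, tau)
step = [0.0] * tau + [1.0] * (n - tau)
stat = project(v, step)

# Under the IID unit-variance assumption, the squared projection of the mean
# is the noncentrality parameter of the chi-squared test with 1 degree of
# freedom: delta^2 * tau * (n - tau) / n.
noncentrality = stat ** 2
```

With a change of size 1 at the midpoint, the noncentrality parameter under this assumption grows linearly in `n`, which is the IID baseline against which the curves in the bottom row can be read.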

Fig. 3 Four different change scenarios. Top-left: no change present; top-right: change pattern with 19 different changes; bottom-left: up changes only; bottom-right: up-down changes of the same magnitude. In this particular example, data were generated from an AR model with $\phi = 0.7$, $\sigma_\nu = 2$.
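
As a rough illustration of how data like the up-down scenario could be generated, the sketch below simulates a piecewise-constant mean with alternating jumps, an optional random-walk drift, and AR(1) noise. The model form (observation = signal + AR(1) noise, signal = random walk plus jumps), the function name `simulate`, and all argument names are assumptions made for this example; the changepoint positions and jump size in the usage lines are invented, not the article's settings.

```python
import math
import random

def simulate(n, changepoints, jump, phi, sigma_nu, sigma_eta=0.0, seed=0):
    """Simulate y_t = mu_t + eps_t under an assumed model.

    mu_t  : previous mu plus random-walk noise (sd sigma_eta), plus a jump of
            alternating sign (+jump, -jump, ...) at each changepoint index.
    eps_t : AR(1) noise, eps_t = phi * eps_{t-1} + nu_t, nu_t ~ N(0, sigma_nu^2).
    Returns (y, mu) as two lists of length n.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for a stationary AR(1)")
    rng = random.Random(seed)
    cps = set(changepoints)
    mu, sign = 0.0, 1.0
    # Start the noise from its stationary distribution.
    eps = rng.gauss(0.0, sigma_nu / math.sqrt(1.0 - phi * phi))
    y, signal = [], []
    for t in range(n):
        if t in cps:
            mu += sign * jump   # abrupt change
            sign = -sign        # next change goes the other way
        mu += rng.gauss(0.0, sigma_eta)             # local fluctuation
        eps = phi * eps + rng.gauss(0.0, sigma_nu)  # autocorrelated noise
        signal.append(mu)
        y.append(mu + eps)
    return y, signal

# Illustrative call using the caption's phi = 0.7 and sigma_nu = 2;
# the changepoints and jump size are made up for the example.
y, mu = simulate(n=1000, changepoints=[250, 500, 750], jump=10.0,
                 phi=0.7, sigma_nu=2.0)
```

Setting `sigma_eta=0` gives a pure AR(1) noise model around a piecewise-constant mean; a positive `sigma_eta` adds the local fluctuations referred to in the article title.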

Fig. 4 F1 scores on the 4 different scenarios. In A, a pure AR(1) process over a range of values of $\phi$, for fixed $\sigma_\nu = 2$, $\sigma_\eta = 0$ and a change of magnitude 10. In B, a pure AR(1) process with fixed $\phi = 0.85$ and changes in the signal of various magnitudes. In C, the full model with $\phi = 0.85$ for a range of values of $\sigma_\eta$. The gray line represents the cross-section between parameter values in A, B, and C. AR1Seg est. and DeCAFS est. refer to the segmentations of the respective algorithms with estimated parameters. Note that in B the results from DeCAFS and DeCAFS est. overlap, so only one line is visible. Other algorithms use the true parameter values.

Fig. 5 F1 score on different scenarios with AR(2) noise as we vary $\phi_2$. Data simulated fixing $\sigma_\nu = 2$, $\sigma_\eta = 0$ and $\phi_1 = 0.3$ over a change of size 20.

Fig. 6 In A, the F1 score on the 4 scenarios for the sinusoidal model with fixed amplitude of 15, changes of size 5 and IID Gaussian noise with a variance of 4, as we vary the frequency of the sinusoidal process. In B, an example of a realization for the up-down scenario; vertical segments mark the estimated changepoint locations of DeCAFS (in light green) and AR1Seg (in blue).

Fig. 7 Top: comparison of the F1 score in A1, precision in A2 and MSE in A3, for DeCAFS (in green) and LAVA (in red) with oracle initial parameters, and the corresponding results with estimated initial parameters (in lighter colours), on the up-down scenario for a random walk signal over a range of values of $\sigma_\eta$. Bottom: the first 250 observations of two realizations of the experiment with $\sigma_\eta$ equal to 0.5 in B1 and $\sigma_\eta$ equal to 2 in B2. The continuous lines over the data points represent the signal estimates of DeCAFS and LAVA, and the vertical lines below show their estimated changepoint locations.

Fig. 8 Data on 2000 bp of the plus strand of the Bacillus subtilis chromosome. Gray dots show the original data. The solid red line represents the estimated signal of DeCAFS with a penalty of $10 \log(n)$. The dashed black line represents the estimated signal of hmmTiling.

Fig. 9 Benchmark comparisons. The number of promoters (left) and terminators (right) correctly predicted on the plus strand, M(δ), using a 22 bp distance cutoff, as a function of the number of predicted breakpoints, R(δ). Solid black lines are the results of hmmTiling (as reported in Figure 4 of Nicolas et al. 2009). Dotted black lines are the results of hmmTiling when considering all probes rather than only those called transitions. Solid red lines are the results of DeCAFS using $\beta = 8 \log(n)$ for promoters and $5 \log(n)$ for terminators. These values were learned on the minus strand using a data-driven approach. The thin dark-green diagonal line represents $y = x$.
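
One plausible reading of "correctly predicted using a 22 bp distance cutoff" is that an annotated promoter or terminator counts as recovered when at least one predicted breakpoint lies within 22 bp of it. The sketch below implements that reading only; the function name, arguments, and example positions are hypothetical, and the exact matching rule used in the benchmark may differ.

```python
from bisect import bisect_left

def count_matched(annotated, predicted, cutoff=22):
    """Count annotated positions with a predicted breakpoint within `cutoff` bp.

    `annotated` and `predicted` are iterables of genomic positions (integers).
    Each annotated position is counted at most once.
    """
    pred = sorted(predicted)
    matched = 0
    for pos in annotated:
        # First predicted breakpoint at or beyond the left edge of the window.
        i = bisect_left(pred, pos - cutoff)
        if i < len(pred) and pred[i] <= pos + cutoff:
            matched += 1
    return matched

# Made-up positions: 100 and 900 have a breakpoint within 22 bp, 500 does not.
annotated = [100, 500, 900]
predicted = [90, 530, 905]
m = count_matched(annotated, predicted)   # matched annotations
r = len(predicted)                        # number of predicted breakpoints
```

Plotting the matched count against the number of predicted breakpoints as the detection threshold varies would give a curve of the kind shown in the figure, bounded above by the line $y = x$.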
Supplemental material
