
A General Framework for Constructing Locally Self-Normalized Multiple-Change-Point Tests


Abstract

We propose a general framework for constructing self-normalized multiple-change-point tests with time series data. The only building block is a user-specified single-change-detecting statistic, which covers a large class of popular methods, including the cumulative sum process, outlier-robust rank statistics, and order statistics. The proposed test statistic requires neither robust and consistent estimation of nuisance parameters, selection of bandwidth parameters, nor pre-specification of the number of change points. The finite-sample performance shows that the proposed test is size-accurate, robust against misspecification of the alternative hypothesis, and more powerful than existing methods. Case studies of the Shanghai-Hong Kong Stock Connect turnover are provided.

1 Introduction

Testing structural stability is critical in statistical inference. There is a considerable amount of literature and practical interest in testing for the existence of change points. Csörgö and Horváth (Citation1988, Citation1997) provided comprehensive solutions to the at-most-one-change (AMOC) problem using both classical parametric and nonparametric approaches. One approach to the multiple-change-point problem assumes a fixed number of change points $m$. The test statistics are constructed by dividing the data into segments and maximizing a sum of AMOC statistics computed within each segment; see Antoch and Jarušková (Citation2013) and Shao and Zhang (Citation2010). However, such approaches are vulnerable to misspecification of the true number of change points $M$, and the computational cost often increases exponentially with $m$. Another approach uses sequential estimation (Vostrikova Citation1981; Bai and Perron Citation1998). This class of methods sequentially performs AMOC tests and produces estimates if the no-change-point hypothesis is rejected. Because the structural difference of the data before and after a particular change point may be less apparent in the presence of other change points, AMOC-type tests may lose power under the multiple-change-point setting.

Localization can be used to improve power. Bauer and Hackl (Citation1980) introduced the moving sum approach, which Chu, Hornik, and Kuan (Citation1995) and Eichinger and Kirch (Citation2018) later applied to change-point detection. This method selects a fixed window size and performs inference on each subsample. Fryzlewicz (Citation2014) proposed wild binary segmentation, which draws subsamples without fixing the window size, to avoid window size selection. Both algorithms attempt to find a subsample containing only one change point to boost power without specifying $m$. If the standard CUSUM process is used, the methods require consistent and change-point-robust estimation of the long-run variance, which can be nontrivial and may require tuning a bandwidth parameter (Andrews Citation1991). Worse still, different bandwidths can noticeably influence the tests (Kiefer and Vogelsang Citation2005). To avoid this challenging estimation, Lobato (Citation2001) first proposed studentizing by a self-normalizer whose limit is proportional to the nuisance parameter, so that the resulting statistic converges to a pivotal distribution. Shao and Zhang (Citation2010) later applied self-normalization to the AMOC problem. For the multiple-change-point problem, Zhang and Lavitas (Citation2018) proposed a self-normalized test that intuitively scans for the first and last change points. However, it may compromise power when larger changes occur in the middle.

In this article, our contributions are as follows: (a) We propose a novel localized self-normalization framework that transforms AMOC statistics into powerful and size-accurate multiple-change-point tests. The framework applies to a broad class of time series models and a large class of AMOC statistics, including the CUSUM process and outlier-robust change-detecting statistics, instead of being tied to one specific process as in existing methods. (b) The methods support multiple-change-point detection in general model parameters, and the resulting tests achieve promising power. (c) Our test is also shown to be asymptotically locally powerful. (d) The novel self-normalizers can be computed recursively. (e) The test allows an intuitive extension to estimation with good theoretical properties.

The rest of the article is structured as follows. Section 2 reviews the self-normalization method proposed by Shao and Zhang (Citation2010) under the AMOC problem. Section 3.1 presents the proposed framework, and Sections 3.2–3.4 explore applications to some popular change-detecting processes and extensions to estimation. In Section 4, the differences between various self-normalized approaches are comprehensively discussed, along with some implementation issues. Simulation results are presented in Section 5, which show good size accuracy and a substantial improvement in power. The article ends with a real-data analysis of the Shanghai-Hong Kong Stock Connect turnover in Section 6. Proofs of the main results are presented in the Appendix. A supplement containing additional simulation results, recursive formulas, algorithms, and an extension to long-range dependent time series is also included. An R package, "SNmct", is available online.

2 Introduction of Self-Normalization

Consider the signal-plus-noise model $X_i = \mu_i + Z_i$, where $\mu_i = E(X_i)$ for $i = 1, \ldots, n$ and $\{Z_i : i \in \mathbb{Z}\}$ is a stationary noise sequence. The AMOC testing problem is formulated as follows:
(2.1) $H_0: \mu_1 = \cdots = \mu_n$;
(2.2) $H_1: \mu_1 = \cdots = \mu_{k_1} \neq \mu_{k_1+1} = \cdots = \mu_n$ for some $1 < k_1 < n$.

Let $\xi_{k_1,k_2} = \sum_{i=k_1}^{k_2} X_i$ if $1 \le k_1 \le k_2 \le n$, and $\xi_{k_1,k_2} = 0$ otherwise. A popular statistic for detecting a single change point is the CUSUM process, defined as
(2.3) $C_n(\lfloor nt \rfloor) = n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor} (X_i - n^{-1}\xi_{1,n})$,
where $t \in [0,1]$ and $\lfloor nt \rfloor$ denotes the integer part of $nt$. The limiting distribution of the functional in (2.3) is established under the following assumption.

Assumption 2.1.

As $n \to \infty$, $\{n^{-1/2}\sum_{i=1}^{\lfloor nt\rfloor} Z_i : t \in [0,1]\} \Rightarrow \{\sigma B(t) : t \in [0,1]\}$, where $\sigma^2 = \lim_{n\to\infty} \mathrm{var}(\xi_{1,n})/n \in (0, \infty)$ is the long-run variance, $B(\cdot)$ is the standard Brownian motion, and "$\Rightarrow$" denotes convergence in distribution in the Skorokhod space (Billingsley Citation1999).

Assumption 2.1 is known as the functional central limit theorem or Donsker's invariance principle, and it is satisfied under standard regularity conditions. For example, Herrndorf (Citation1984) proved the functional central limit theorem for dependent data under some mixing conditions; Wu (Citation2007) later proved the strong convergence for stationary processes using physical and predictive dependence measures. Under the null hypothesis $H_0$, by the continuous mapping theorem, we have $\{C_n(\lfloor nt\rfloor) : t \in [0,1]\} \Rightarrow \{\sigma\{B(t) - tB(1)\} : t \in [0,1]\}$.

Classically, the celebrated Kolmogorov–Smirnov test statistic, defined as $KS_n(\sigma) = \sup_{k=1,\ldots,n} |C_n(k)/\sigma|$, can be used for the AMOC problem. Because $\sigma$ is typically unknown, we estimate it by an estimator $\hat\sigma$ that is consistent under $H_0$ and $H_1$; see Chan (Citation2022a, Citation2022b) for some possible estimators. Hence, $KS_n(\hat\sigma)$ converges to $\sup_{t\in(0,1)}|B(t) - tB(1)|$, which follows the Kolmogorov distribution. Shao and Zhang (Citation2010) proposed to bypass the estimation of $\sigma$ by normalizing $C_n(k)$ by a nondegenerate standardizing random process called a self-normalizer:
$V_n(k) = n^{-2}\{\sum_{i=1}^{k}(\xi_{1,i} - \frac{i}{k}\xi_{1,k})^2 + \sum_{i=k+1}^{n}(\xi_{i,n} - \frac{n-i+1}{n-k}\xi_{k+1,n})^2\}$,
where $k = 1, \ldots, n-1$. The resulting self-normalized test statistic for the AMOC problem is $S_n^{(1)}$, where
(2.4) $S_n^{(1)} = \sup_{k=1,\ldots,n-1} S_n^{(1)}(k)$ and $S_n^{(1)}(k) = C_n^2(k)/V_n(k)$.
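The AMOC statistic in (2.4) is simple to compute directly from cumulative sums. Below is a minimal Python sketch (an illustration only, not the authors' SNmct implementation; the function names are ours) of $C_n(k)$, $V_n(k)$, and $S_n^{(1)}$:

```python
import numpy as np

def cusum(x):
    """CUSUM process: C_n(k) = n^{-1/2} * sum_{i<=k} (X_i - mean(X)), k = 1..n."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x - x.mean()) / np.sqrt(len(x))

def self_normalizer(x, k):
    """Shao-Zhang self-normalizer V_n(k) at a candidate change point k (1 <= k <= n-1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    left = np.cumsum(x[:k])                       # xi_{1,i} for i = 1..k
    i = np.arange(1, k + 1)
    v1 = np.sum((left - i / k * left[-1]) ** 2)
    right = np.cumsum(x[k:][::-1])[::-1]          # xi_{i,n} for i = k+1..n
    j = np.arange(k + 1, n + 1)
    v2 = np.sum((right - (n - j + 1) / (n - k) * right[0]) ** 2)
    return (v1 + v2) / n ** 2

def sn_amoc_statistic(x):
    """S_n^{(1)} = sup_k C_n(k)^2 / V_n(k), cf. (2.4)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = cusum(x)
    return max(c[k - 1] ** 2 / self_normalizer(x, k) for k in range(1, n))
```

Because the ratio is scale-free, no long-run variance estimate is needed.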

Under Assumption 2.1 and $H_0$, the limiting distribution of $S_n^{(1)}$ is nondegenerate and pivotal. The nuisance parameter $\sigma^2$ is asymptotically cancelled between the numerator $C_n^2(k)$ and the denominator $V_n(k)$ on the right-hand side of (2.4). Moreover, because there is no change point in the intervals $[1, k_1]$ and $[k_1+1, n]$ under $H_1$, the self-normalizer at the true change point, $V_n(k_1)$, is invariant to the change; therefore, their proposed test does not suffer from the well-known nonmonotonic power problem; see, for example, Vogelsang (Citation1999).

However, this appealing feature no longer exists in the multiple-change-point setting. Moreover, their proposed self-normalizer is specifically designed for normalizing the CUSUM process, which may not be the best choice if the data have a heavy-tailed distribution. In the next section, we propose a framework for extending user-specified AMOC statistics to a multiple-change-point self-normalized test using the localization idea.

3 General Framework

3.1 Locally Self-Normalized Test Statistic

We consider the multiple-change-point testing problem, that is, to test the $H_0$ in (2.1) against
(3.1) $H_1: \mu_1 = \cdots = \mu_{k_1} \neq \mu_{k_1+1} = \cdots = \mu_{k_2} \neq \cdots \neq \mu_{k_M+1} = \cdots = \mu_n$,
for unknown change-point times $1 < k_1 < \cdots < k_M < n$ and an unknown number of change points $M \ge 1$. In this section, combining the advantages of localization and self-normalization, we lay out the underlying principles to define a general class of statistics for testing the existence of multiple change points.

Generally, for the AMOC problem, one may consider any specific global change-detecting process $\{D_n(\lfloor nt\rfloor)\}_{0\le t\le 1}$. For some positive definite matrix $\sigma_D$ and some empirical process $D(\cdot)$ with a pivotal distribution, $\{D_n(\lfloor nt\rfloor)\}_{0\le t\le 1}$ converges weakly to $\sigma_D D(\cdot)$ in the Skorokhod space. Typically, $D_n(k)$ is designed to detect whether $k$ is a change point for $k = 1, \ldots, n$. We generalize the input global change-detecting process into a localized change-detecting statistic, defined as
(3.2) $L_n^{(D)}(k|s,e) = \left(\frac{n}{e-s+1}\right)^{1/2}\left[D_n(k) - D_n(s-1) - \frac{k-s+1}{e-s+1}\{D_n(e) - D_n(s-1)\}\right]$,
where $1 \le s \le k \le e \le n$. By the continuous mapping theorem, $L_n^{(D)}(\cdot|s,e)$ converges in distribution to a random variable proportional to the nuisance parameter $\sigma_D$. A self-normalizer, constructed as a function of the global change-detecting process to cancel $\sigma_D$, is defined as
(3.3) $V_n^{(D)}(k|s,e) = \frac{k-s+1}{(e-s+1)^2}\sum_{j=s}^{k} L_n^{(D)}(j|s,k)^{\otimes 2} + \frac{e-k}{(e-s+1)^2}\sum_{j=k+1}^{e} L_n^{(D)}(j|k+1,e)^{\otimes 2}$,
where $A^{\otimes 2} = AA^{T}$ for any matrix $A$. For $1 \le s \le k \le e \le n$, define
(3.4) $T_n^{(D)}(k|s,e) = L_n^{(D)}(k|s,e)^{T}\, V_n^{(D)}(k|s,e)^{-1}\, L_n^{(D)}(k|s,e)$,
and set $T_n^{(D)}(k|s,e) = 0$ otherwise. By the continuous mapping theorem, it is not difficult to see that $T_n^{(D)}(\cdot|\cdot)$ is asymptotically pivotal. Consequently, $T_n^{(D)}(k|s,e)$ is a generalized locally self-normalized (LSN) statistic for detecting whether $k$ is a change point over the local window $[s,e]$. To infer whether $k$ is a change point, we aggregate $T_n^{(D)}(k|\cdot)$ over all of the symmetric local windows and propose the score function, defined as
(3.5) $T_n^{(D)}(k) = \sup_{n\epsilon \le d \le n} T_n^{(D)}(k|k-d, k+1+d)$,
where $0 < \epsilon < 1/2$ is a fixed local tuning parameter, which is similarly used in, for example, Huang, Volgushev, and Shao (Citation2015) and Zhang and Lavitas (Citation2018); see Remark 3.1 for the rationale for considering only the symmetric local windows. In particular, they suggested choosing $\epsilon = 0.1$, which is used throughout this article.
Essentially, (3.5) compares the local subsamples of length $d+1$ before and after time $k$, that is, $S_{\mathrm{before}} = \{X_{k-d}, \ldots, X_k\}$ and $S_{\mathrm{after}} = \{X_{k+1}, \ldots, X_{k+1+d}\}$, for each possible width $d$ that is not too small. The statistic $T_n^{(D)}(k)$ records the change corresponding to the most discernible $d$; see Section 3.2 for more discussion on the change-detecting ability of the LSN statistic. Finally, to capture the effects of all potential change points, we aggregate the scores $T_n^{(D)}(k)$ across $k$ and obtain the proposed test statistic,
(3.6) $T_n^{(D)} = \frac{1}{n - 2\lfloor \epsilon n\rfloor - 1}\sum_{k=\lfloor \epsilon n\rfloor + 1}^{n - \lfloor \epsilon n\rfloor - 1} T_n^{(D)}(k)$.
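The construction (3.2)–(3.6) depends on $D_n$ only through its increments, so it can be coded once for any supplied change-detecting process. The following scalar-case Python sketch is illustrative only (the function names are ours, and for brevity the supremum over $d$ is taken over a coarse grid of widths rather than every admissible width):

```python
import numpy as np

def lsn_stat(D, k, s, e):
    """T_n^{(D)}(k|s,e) in (3.4) for a scalar process; D = (D_n(0), ..., D_n(n)), D[0] = 0."""
    n = len(D) - 1

    def L(j, a, b):  # localized change-detecting statistic (3.2) on window [a, b]
        return np.sqrt(n / (b - a + 1)) * (
            D[j] - D[a - 1] - (j - a + 1) / (b - a + 1) * (D[b] - D[a - 1]))

    # self-normalizer (3.3): aggregates L over the two half-windows [s, k] and [k+1, e]
    V = ((k - s + 1) * sum(L(j, s, k) ** 2 for j in range(s, k + 1))
         + (e - k) * sum(L(j, k + 1, e) ** 2 for j in range(k + 1, e + 1))
         ) / (e - s + 1) ** 2
    return L(k, s, e) ** 2 / V if V > 0 else 0.0

def lsn_test_statistic(D, eps=0.1, n_widths=10):
    """Two-layer aggregation (3.5)-(3.6): sup over symmetric widths d, then mean over k."""
    n = len(D) - 1
    lo = int(np.ceil(eps * n))
    scores = []
    for k in range(lo + 1, n - lo):
        widths = np.unique(np.linspace(lo, min(k - 1, n - k - 1), n_widths, dtype=int))
        scores.append(max((lsn_stat(D, k, k - d, k + 1 + d) for d in widths), default=0.0))
    return float(np.mean(scores))
```

Any global process (CUSUM, rank-based, or otherwise) can be plugged in as the array `D`.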

Our framework constructs multiple-change-point tests through a two-layer aggregation process. The first layer of aggregation, across $d$, lets the data choose the window sizes. We remark that the performance of the moving sum (MOSUM) statistic, which is a special case of the localized CUSUM with a fixed window size, depends critically on the window size (Eichinger and Kirch Citation2018). To handle window size selection, Cho and Kirch (Citation2022) proposed localized pruning, which first takes the union of fixed-bandwidth change-point estimators produced by the MOSUM method or other multiscale methods over a specific grid of bandwidths, and then prunes down the size of the first-step estimator. Although the procedure may work for point estimation, it is unclear whether it can be extended to construct powerful tests. The proposed framework employs an automatic window selection, based on the strength of the discrepancy between $S_{\mathrm{before}}$ and $S_{\mathrm{after}}$ according to the LSN statistic, prior to the estimation stage, thereby avoiding additional pruning. Nevertheless, both authors suggested considering multiple bandwidths for robustness.

The purpose of the second layer is to gather all information on all possible change points. Generally, users may choose their preferred aggregation operators, for example, the median, the trimmed mean, and so on. Formally, for any finite set of indices $S \subset \mathbb{Z}$ of size $|S|$, we say that $P$ is an aggregation operator if $P_{i\in S}\, a_i$ maps $\{a_i \in \mathbb{R} : i \in S\}$ to a real number. For example, if $P = \max$, then $P_{i\in S}\, a_i = \max_{i\in S} a_i$ gives the maximum; and if $P = \mathrm{mean}$, then $P_{i\in S}\, a_i = \sum_{i\in S} a_i/|S|$ gives the sample mean. Hence, the most general form of our proposed test statistic is
$T_n^{(D)} = T_n^{(D)}[P^{(1)}, P^{(2)}] = P^{(2)}_{\epsilon n < k < n - \epsilon n}\left[P^{(1)}_{n\epsilon \le d \le n}\left\{T_n^{(D)}(k|k-d, k+1+d)\right\}\right]$,
where $P^{(1)}$ and $P^{(2)}$ are some aggregation operators. The optimal choices of aggregation operators depend on the number of change points and their respective sizes. By construction, the scores (3.5) remain large over the neighborhoods of the true change points. Intuitively, the average score captures evidence from all change points and their neighborhoods, whereas the maximum score can only capture evidence from the single change point corresponding to the highest score. According to our experience and the simulation results provided in Section C.12 of the supplement, the gain in power is noticeable when $P^{(2)} = \mathrm{mean}$ and all the changes are of the same size. However, the gain diminishes as the portions of smaller changes increase relative to those of larger changes. Nevertheless, $T_n^{(D)}[\max, \mathrm{mean}]$ still has greater power in the cases considered. Therefore, we recommend selecting $P^{(2)} = \mathrm{mean}$, as used in the rest of this article.

Our approach is more flexible as it supports a general change-detecting process $D_n(\cdot)$ and general aggregation operators $P^{(1)}$ and $P^{(2)}$. Moreover, the locally self-normalized statistic is a function of the supplied global change-detecting process. Therefore, recursive computation of the locally self-normalized statistics, regardless of the input change-detecting process, is possible for our approach. Briefly, our framework is an automatic procedure for generalizing any single-change-detecting statistic $D_n(\cdot)$ to a multiple-change-detecting statistic $T_n^{(D)}$. We demonstrate this feature through various examples. Section 3.2 presents the theoretical details of the proposed framework under the popular CUSUM process. An application to outlier-robust statistics is demonstrated in Section 3.3. An extension to testing changes in general parameters is discussed in Section 3.4.

Remark 3.1

(Symmetric windows). Let the score function based on nonsymmetric windows be $\tilde T_n^{(D)}(k) = \sup_{\epsilon n \le d_0, d_1 \le n} T_n^{(D)}(k|k-d_0, k+1+d_1)$. The corresponding test statistic is $\tilde T_n^{(D)} = \sum_{k=\lfloor\epsilon n\rfloor+1}^{n-\lfloor\epsilon n\rfloor-1} \tilde T_n^{(D)}(k)/(n - 2\lfloor\epsilon n\rfloor - 1)$. According to our experience, the gain in power from nonsymmetric windows is marginal, but the computational cost increases exponentially; see the simulation results provided in Section C.10 of the supplement. Therefore, symmetric windows are recommended to balance cost and power.

3.2 CUSUM Statistic

To improve the power of the CUSUM process under the multiple-change-point setting, we generalize it to a CUSUM LSN statistic according to our proposed framework. In particular, the localized CUSUM statistic can be rewritten as
(3.7) $L_n^{(C)}(k|s,e) = \left(\frac{n}{e-s+1}\right)^{1/2}\left[C_n(k) - C_n(s-1) - \frac{k-s+1}{e-s+1}\{C_n(e) - C_n(s-1)\}\right] = \frac{(k-s+1)(e-k)}{(e-s+1)^{3/2}}\left\{\frac{\xi_{s,k}}{k-s+1} - \frac{\xi_{k+1,e}}{e-k}\right\}$,
where $1 \le s \le k \le e \le n$. Therefore, the localized CUSUM statistic compares the sample means of the data from time $s$ to $k$ and from $k+1$ to $e$. Because (3.7) is not normalized, according to (3.3)–(3.4), the localized self-normalizer and the locally self-normalized CUSUM statistic are, respectively,
$V_n^{(C)}(k|s,e) = \frac{k-s+1}{(e-s+1)^2}\sum_{j=s}^{k} L_n^{(C)}(j|s,k)^2 + \frac{e-k}{(e-s+1)^2}\sum_{j=k+1}^{e} L_n^{(C)}(j|k+1,e)^2$ and $T_n^{(C)}(k|s,e) = \frac{L_n^{(C)}(k|s,e)^2}{V_n^{(C)}(k|s,e)}$.

To infer how likely $k$ is to be a change-point location, we consider all symmetric local windows centered at $k$ and define the score function according to (3.5) as
(3.8) $T_n^{(C)}(k) = \sup_{n\epsilon \le d \le n} T_n^{(C)}(k|k-d, k+1+d)$,
where $\epsilon = 0.1$. In Figure 1, the proposed score function (3.8) is compared with the Shao and Zhang (Citation2010) method. Clearly, $T_n^{(C)}(\cdot)$ achieves local maxima near all true change-point locations, while $S_n^{(1)}(\cdot)$ does not. Because the $S_n^{(1)}$ self-normalizers are not localized, they include data containing change points and are, therefore, inflated by other changes. Other possible reasons include the reduced change-point detection capacity of the global CUSUM in the presence of other changes and a nonmonotonic change-point configuration. As a remark, the under-performance of $S_n^{(1)}$ comes with the benefit of smaller computational complexity; for more discussion, see Section 4.1. Finally, we obtain the proposed test statistic by aggregating the scores $T_n^{(C)}(k)$ across $k$ as follows:
(3.9) $T_n^{(C)} = \frac{1}{n - 2\lfloor\epsilon n\rfloor - 1}\sum_{k=\lfloor\epsilon n\rfloor+1}^{n-\lfloor\epsilon n\rfloor-1} T_n^{(C)}(k)$,
which captures the effects of all potential change points. For $j = 1, \ldots, M$, define $\pi_j = k_j/n$ and $\Delta_j = \mu_{k_j+1} - \mu_{k_j}$ to be the $j$th relative change-point time and the corresponding change, respectively. Theorem 3.1 states the limiting distribution and the consistency of $T_n^{(C)}$.
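In the mean-change case, the closed form in (3.7) lets the localized statistic be evaluated from partial sums alone. A small illustrative Python sketch of the score $T_n^{(C)}(k)$ (the function names and the coarse width grid are our choices, not part of the SNmct package):

```python
import numpy as np

def cusum_contrast(S, s, k, e):
    """L_n^{(C)}(k|s,e) via the closed form (3.7); S[i] = X_1 + ... + X_i, S[0] = 0."""
    left = (S[k] - S[s - 1]) / (k - s + 1)       # sample mean of X_s..X_k
    right = (S[e] - S[k]) / (e - k)              # sample mean of X_{k+1}..X_e
    return (k - s + 1) * (e - k) / (e - s + 1) ** 1.5 * (left - right)

def cusum_score(x, k, eps=0.1, n_widths=10):
    """Score T_n^{(C)}(k) in (3.8), maximizing the LSN statistic over symmetric widths."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    S = np.concatenate([[0.0], np.cumsum(x)])
    lo = int(np.ceil(eps * n))
    best = 0.0
    for d in np.unique(np.linspace(lo, min(k - 1, n - k - 1), n_widths, dtype=int)):
        s, e = k - d, k + 1 + d
        # self-normalizer (3.3); the j = k and j = e terms vanish and are skipped
        V = ((k - s + 1) * sum(cusum_contrast(S, s, j, k) ** 2 for j in range(s, k))
             + (e - k) * sum(cusum_contrast(S, k + 1, j, e) ** 2 for j in range(k + 1, e))
             ) / (e - s + 1) ** 2
        if V > 0:
            best = max(best, cusum_contrast(S, s, k, e) ** 2 / V)
    return best
```

The score is large near a true change point and moderate elsewhere, in line with Figure 1.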

Fig. 1 (a) A realization of data (black solid line) with 5 change-points (blue dashed lines) and its mean function (red solid line). (b) Shao and Zhang’s (Citation2010) score function Sn(1)(k). (c) The proposed score function Tn(C)(k).


Theorem 3.1

(Limiting distribution and consistency). Suppose Assumption 2.1 is satisfied. (i) Under $H_0$, $T_n^{(C)} \to_d \mathcal{T}$ for any $0 < \epsilon < 1/2$, where
(3.10) $\mathcal{T} = (1-2\epsilon)^{-1}\int_{\epsilon}^{1-\epsilon}\sup_{\delta > \epsilon}\frac{L(t|t-\delta, t+\delta)^2}{V(t|t-\delta, t+\delta)}\,dt$,
(3.11) $L(t|\tau_1, \tau_2) = \frac{1}{(\tau_2 - \tau_1)^{1/2}}\left[B(t) - B(\tau_1) - \frac{t-\tau_1}{\tau_2-\tau_1}\{B(\tau_2) - B(\tau_1)\}\right]$,
(3.12) $V(t|t-\delta, t+\delta) = \frac{1}{4\delta}\left\{\int_{t-\delta}^{t} L(\tau|t-\delta, t)^2\,d\tau + \int_{t}^{t+\delta} L(\tau|t, t+\delta)^2\,d\tau\right\}$.

(ii) Under $H_1$, if there exist $j \in \{1, \ldots, M\}$ and $C > 0$ such that the $j$th change-point time satisfies $\epsilon < \min(\pi_j - \pi_{j-1}, \pi_{j+1} - \pi_j)$ and the corresponding $j$th change magnitude satisfies $|\Delta_j| = |\mu_{k_j+1} - \mu_{k_j}| > C$ as $n \to \infty$, then $\lim_{n\to\infty} T_n^{(C)} = \infty$ in probability.

The test statistic $T_n^{(C)}$ has a pivotal limiting distribution under $H_0$, the quantiles of which can be found by simulation; see Table 1. See also Section 4 for more discussion on the critical value adjustment procedure for handling strong serial correlation when the sample size is small. Under $H_1$, the test has asymptotic power 1 if the change magnitude is at least a nonzero constant. Theorem 3.2 concerns a local alternative hypothesis that contains change points with diminishing change magnitudes.

Table 1. Simulated quantiles of $\mathcal{T}$, that is, the limiting distribution of $T_n^{(C)}$.

Theorem 3.2

(Local limiting power). Under $H_1$, parameterize the $j$th change magnitude $|\Delta_j(n)| = |\mu_{k_j+1} - \mu_{k_j}|$ as a function of $n$. If there exist $j \in \{1, \ldots, M\}$ and $0.5 < \kappa \le 1$ such that the $j$th change point satisfies $\epsilon < \min(\pi_j - \pi_{j-1}, \pi_{j+1} - \pi_j)$ and $|\Delta_j(n)| \asymp n^{\kappa - 1}$ as $n \to \infty$, then $\lim_{n\to\infty} T_n^{(C)} = \infty$ in probability.

Motivated by Figure 1, a natural change-point set estimator $\hat\Pi$ can be intuitively obtained from the points corresponding to the local maxima of the general score function $T_n^{(D)}(\cdot)$ in (3.5). Specifically, the estimator is defined as
(3.13) $\hat\Pi^{(D)} = \left\{k \in \{\lfloor n\epsilon\rfloor, \ldots, n - \lfloor n\epsilon\rfloor\} : T_n^{(D)}(k) = \max_{k - n\epsilon < j \le k + n\epsilon} T_n^{(D)}(j),\ T_n^{(D)}(k) > \varrho\right\}$,
where $\varrho$ is a decision threshold and $D$ is the global change-detecting process. In particular, if the CUSUM process is used, that is, $D = C$, Lemma 3.3 and Theorem 3.4 show the consistency of the proposed point estimator $\hat\Pi^{(C)}$.

Lemma 3.3.

Let $\pi_j$ be the $j$th relative change point and $k_j$ be the $j$th change-point time such that $k_j/n \to \pi_j$ as $n\to\infty$ for $j = 1, \ldots, M$, with $k_0 = 0$ and $k_{M+1} = n$. Let $h$ be any non-change-point time such that $|h/n - \pi_j| \ge c > 0$ as $n\to\infty$ for all $j$. Define $E(X_i) = \mu + \sum_{j=1}^{M} \Delta_j \mathbb{1}(i/n > \pi_j)$. If $\pi_j - \pi_{j-1} > \epsilon$ and $\Delta_j \asymp n^{\kappa_j - 1}$ as $n\to\infty$, where $0.5 < \kappa_j \le 1$ for all $j$, then for $\zeta_{n,j} = o(n^{2\kappa_j - 1})$,
$P\{T_n^{(C)}(k_j) > \zeta_{n,j}\} \to 1$ and $P\{T_n^{(C)}(h) > \zeta_{n,j}\} \to 0$.

Theorem 3.4.

Assume that for all $j = 1, \ldots, M$, where $M$ is a finite constant, $\pi_j - \pi_{j-1} > \epsilon$ and $\Delta_j \asymp n^{\kappa_j - 1}$ as $n\to\infty$, where $0.5 < \kappa_j \le 1$. Let $\hat\pi_1^{(C)} < \hat\pi_2^{(C)} < \cdots < \hat\pi_{\hat M}^{(C)}$ be the elements of $\hat\Pi^{(C)}$. If $\varrho = o(n^{2\bar\kappa - 1})$, where $\bar\kappa = \min_{j=1,\ldots,M}\kappa_j$, then for any small enough $\eta > 0$ such that $\eta < \epsilon$,
$P\{\max_{j=1,\ldots,M}\min_{j'=1,\ldots,\hat M}|\pi_j - \hat\pi_{j'}^{(C)}| > \eta\} \to 0$.

By Lemma 3.3, the optimal threshold $\varrho$ depends on the unknown change magnitudes. If the sizes of the changes are further assumed to be $O(1)$, then we can set $\varrho = c n^{\upsilon}$ with $0 < \upsilon < 1$. Theoretically, it is challenging to derive an optimal choice of $c$ and $\upsilon$ without a specific cost function that trades off the accuracy and the size of $\hat\Pi^{(C)}$. It is possible to select $(c, \upsilon)$ across a class of change-point models through a large-scale simulation. As point estimation is beyond the scope of this article, we leave it as an open question for further study. From our experience, choosing $\varrho = \sqrt{n}$ yields reasonable results. Because the score functions are computed during the computation of the proposed test statistic $T_n^{(C)}$, the point estimator can be obtained directly.
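The estimator (3.13) amounts to scanning the score sequence for well-separated local maxima above a threshold. A minimal Python sketch (illustrative; the function name is ours, and the default threshold is one heuristic choice consistent with $\varrho = cn^{\upsilon}$, $0 < \upsilon < 1$):

```python
import numpy as np

def change_point_estimator(scores, eps=0.1, rho=None):
    """Estimator in the spirit of (3.13): select times k whose score is the maximum
    over a +/- n*eps neighborhood and exceeds the threshold rho.
    `scores[k-1]` holds the score T(k); entries outside the trimmed range may be 0."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    w = max(1, int(n * eps))
    rho = np.sqrt(n) if rho is None else rho   # heuristic threshold (varrho = c * n^upsilon)
    picked = []
    for k in range(n):
        lo, hi = max(0, k - w + 1), min(n, k + w + 1)
        if scores[k] > rho and scores[k] >= max(scores[lo:hi]):
            picked.append(k + 1)               # report 1-based change-point time
    return picked
```

Since the scores are already produced while computing the test statistic, the estimator comes at essentially no extra cost.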

Remark 3.2

(Improving point estimation). The change-point estimation may be improved by further integrating other screening algorithms, for example, the screening and ranking algorithm (Niu and Zhang Citation2012) and the scanning procedure proposed by Yau and Zhao (Citation2016), to perform model selection in order to avoid overestimating the number of change points. Alternatively, our proposed test can be applied to stepwise estimation, for example, binary segmentation (Vostrikova Citation1981), wild binary segmentation (Fryzlewicz Citation2014), and SNCP (Zhao, Jiang, and Shao Citation2021); see Section G of the supplement for information on how to integrate our method with existing screening methods.

3.3 Outlier-Robust Statistics

An outlier-robust nonparametric change-point test can be constructed using the Wilcoxon two-sample statistic. The corresponding $D_n(\cdot)$ is
(3.14) $W_n(\lfloor nt\rfloor) = \frac{1}{n^{3/2}}\sum_{i=1}^{\lfloor nt\rfloor}\sum_{j=\lfloor nt\rfloor+1}^{n}\left(\mathbb{1}\{X_j \le X_i\} - \frac{1}{2}\right) = n^{-3/2}\left(\sum_{i=1}^{\lfloor nt\rfloor} R_i - \frac{\lfloor nt\rfloor}{n}\sum_{i=1}^{n} R_i\right)$,
for $t \in [0,1]$, where $R_i = \sum_{j=1}^{n}\mathbb{1}\{X_j \le X_i\}$ is the rank of $X_i$ when there are no ties in $X_1, \ldots, X_n$. It is assumed that the data are weakly dependent and can be represented as a functional of an absolutely regular process. Under some mixing conditions, we have $\{W_n(\lfloor nt\rfloor)\} \Rightarrow \sigma_W\{B(t) - tB(1)\}$, where $\sigma_W^2 = \sum_{k\in\mathbb{Z}}\mathrm{cov}\{F(X_0), F(X_k)\}$ and $F(\cdot)$ is the distribution function of $X_1$ with a bounded density; see Theorem 3.1 in Dehling et al. (Citation2015). From the principles in Section 3.1, we may use (3.14) to construct a self-normalized multiple-change-point Wilcoxon test that eliminates the nuisance parameter $\sigma_W$, with test statistic $T_n^{(W)} = T_n^{(W)}[\max, \mathrm{mean}]$; see the corollary below.
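The rank form on the right of (3.14) makes $W_n$ computable from a single ranking in $O(n\log n)$ time. An illustrative Python sketch (assuming no ties, as in the text):

```python
import numpy as np

def wilcoxon_process(x):
    """W_n(k), k = 0..n, via the rank form in (3.14):
    n^{-3/2} * (sum_{i<=k} R_i - (k/n) * sum_{i<=n} R_i), assuming no ties."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1            # R_i = rank of X_i
    cum = np.concatenate([[0.0], np.cumsum(ranks)])
    k = np.arange(n + 1)
    return (cum - k / n * cum[-1]) / n ** 1.5
```

Because the process depends on the data only through ranks, a single gross outlier perturbs it by at most $O(n^{-1/2})$.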

Corollary 3.5

(Limiting distribution of $T_n^{(W)}$). Under the regularity conditions in Theorem 3.1 of Dehling et al. (Citation2015) and $H_0$, we have $T_n^{(W)} \to_d \mathcal{T}$.

The Hodges–Lehmann statistic is another popular alternative to the CUSUM statistic in (2.3). Its global change-detecting process is
(3.15) $H_n(\lfloor nt\rfloor) = n^{-3/2}\lfloor nt\rfloor(n - \lfloor nt\rfloor)\,\mathrm{median}\{(X_i - X_j) : 1 \le i \le \lfloor nt\rfloor < j \le n\}$,
for $t \in [0,1]$, where $\mathrm{median}(S)$ denotes the sample median of a set $S$. It outperforms the CUSUM test under skewed and heavy-tailed distributions. Under the regularity conditions in Dehling, Fried, and Wendler (Citation2020), we have $\{H_n(\lfloor nt\rfloor)\} \Rightarrow \sigma_H u(0)^{-1}\{B(t) - tB(1)\}$, where $u(x) = \int_{\mathbb{R}} f(y)f(x+y)\,dy$, $\sigma_H^2 = \sum_{k\in\mathbb{Z}}\mathrm{cov}\{F(X_0), F(X_k)\}$, $f$ is the density of $X_1$, and $F$ is the distribution function of $X_1$. Similarly, we can apply the Hodges–Lehmann statistic to test for the existence of multiple change points. The corresponding test statistic is $T_n^{(H)} = T_n^{(H)}[\max, \mathrm{mean}]$. Its limiting distribution is presented in Corollary 3.6. To our knowledge, no existing self-normalized test uses the Hodges–Lehmann statistic. It is included to demonstrate the generality of the proposed self-normalization framework and to detect change points in heavy-tailed data.
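For intuition, $H_n$ in (3.15) replaces the difference of segment means with the median of all between-segment pairwise differences, which is what makes it robust to heavy tails. An illustrative Python sketch evaluating (3.15) at a single split point (a direct $O(k(n-k))$ computation; faster algorithms exist):

```python
import numpy as np

def hodges_lehmann(x, k):
    """H_n(k) in (3.15): n^{-3/2} * k * (n-k) * median{X_i - X_j : 1 <= i <= k < j <= n}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diffs = (x[:k, None] - x[None, k:]).ravel()   # all between-segment differences
    return k * (n - k) * np.median(diffs) / n ** 1.5
```

Evaluating this at every $k$ yields the global process to be supplied to the framework of Section 3.1.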

Corollary 3.6

(Limiting distribution of $T_n^{(H)}$). Under the regularity conditions in Theorem 1 of Dehling, Fried, and Wendler (Citation2020) and $H_0$, we have $T_n^{(H)} \to_d \mathcal{T}$.

3.4 Extension for Testing Change in General Parameters

Instead of testing changes in the mean, one may be interested in other quantities, for example, variances, quantiles, and model parameters. Let $\theta_i = P(F_i^{(h)}) \in \mathbb{R}^q$, where $h \in \mathbb{N}$, $P(\cdot)$ is a functional, $F_i^{(h)}$ is the joint distribution function of $Y_i = (X_i, \ldots, X_{i+h-1})$ for $i = 1, \ldots, n-h+1$, and $q \in \mathbb{N}$. For example, for $h = 1$, $\mu_i = \int_{\mathbb{R}} x\,dF_i^{(1)}(x)$ and $\sigma_i^2 = \int_{\mathbb{R}} x^2\,dF_i^{(1)}(x) - \{\int_{\mathbb{R}} x\,dF_i^{(1)}(x)\}^2$ are the marginal mean and variance at time $i$, respectively. For $h = 2$, the lag-1 autocovariance at time $i$ is $\gamma_i(1) = \int_{\mathbb{R}^2}(x_i - \mu_i)(x_{i-1} - \mu_{i-1})\,dF_i^{(2)}(x_{i-1}, x_i)$. The hypotheses in (2.1) and (3.1) are redefined by replacing the $\mu_i$'s by the $\theta_i$'s. A possible global change-detecting process is
(3.16) $G_n(\lfloor nt\rfloor) = \frac{\lfloor nt\rfloor(n - \lfloor nt\rfloor)}{n^{3/2}}\left(\hat\theta_{1,\lfloor nt\rfloor} - \hat\theta_{\lfloor nt\rfloor+1,n}\right)$,
for $t \in [0,1]$; see, for example, Shao (Citation2010), Shao and Zhang (Citation2010), and Shao (Citation2015). If $\hat\theta_{i,j} = (j-i+1)^{-1}\sum_{t=i}^{j} X_t$, then $G_n(\cdot) = C_n(\cdot)$. Therefore, (3.16) generalizes (2.3) from testing changes in the $\mu_i$'s to testing changes in the $\theta_i$'s. The final test statistic is $T_n^{(G)} = T_n^{(G)}[\max, \mathrm{mean}]$.
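As a concrete example of (3.16), take $h = 1$ and let $\theta$ be the marginal variance; any plug-in estimator $\hat\theta_{s,e}$ can be substituted. An illustrative Python sketch (a direct $O(n^2)$ evaluation for clarity; names are ours):

```python
import numpy as np

def param_process(x, theta_hat):
    """G_n(k) in (3.16): k(n-k)/n^{3/2} * (theta_hat(X_1..X_k) - theta_hat(X_{k+1}..X_n))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    G = np.zeros(n + 1)
    for k in range(2, n - 1):                    # need >= 2 points per side for a variance
        G[k] = k * (n - k) / n ** 1.5 * (theta_hat(x[:k]) - theta_hat(x[k:]))
    return G

# example: scanning for a change in the marginal variance (theta = variance, h = 1)
variance = lambda z: np.var(z)
```

Replacing `variance` by a quantile, an autocovariance, or a fitted model parameter yields the corresponding change-detecting process.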

The limiting distribution of (3.16) requires standard regularity conditions for handling statistical functionals. Based on the sample $Y_s, \ldots, Y_e$, define the empirical distribution of the $Y_i$'s by $\hat F_{s,e}^{(h)} = (e-s+1)^{-1}\sum_{i=s}^{e}\delta_{Y_i}$, where $\delta_y$ is a point mass at $y \in \mathbb{R}^h$. Assume that $\hat\theta_{s,e} = P(\hat F_{s,e}^{(h)})$ is asymptotically linear in the following sense, as $e - s \to \infty$:
(3.17) $P(\hat F_{s,e}^{(h)}) = P(F^{(h)}) + (e-s+1)^{-1}\sum_{i=s}^{e}\mathrm{IF}(Y_i|P, F^{(h)}) + R_{s,e}$,
where $R_{s,e}$ is a remainder term, $F^{(h)} = F_s^{(h)} = \cdots = F_e^{(h)}$, and $\mathrm{IF}(y|P, F^{(h)}) = \lim_{\epsilon\to 0}\{P((1-\epsilon)F^{(h)} + \epsilon\,\delta_y) - P(F^{(h)})\}/\epsilon$ is the influence function; see Wasserman (Citation2006).

Corollary 3.7

(Limiting distribution of $T_n^{(G)}$). For $t \in [0,1]$ and $h \in \mathbb{N}$, define $I_n(\lfloor nt\rfloor) = n^{-1/2}\sum_{i=1}^{\lfloor nt\rfloor}\mathrm{IF}(Y_i|P, F^{(h)})$, where $F^{(h)} = F_1^{(h)} = \cdots = F_{n-h+1}^{(h)}$ under $H_0$. Suppose that (i) $E\{\mathrm{IF}(Y_i|P, F^{(h)})\} = 0$ for all $i$; (ii) $\{I_n(\lfloor nt\rfloor) : t \in [0,1]\} \Rightarrow \{\sigma_G B^{(q)}(t) : t \in [0,1]\}$ for some positive definite matrix $\sigma_G$, where $B^{(q)}(\cdot)$ is a $q$-dimensional standard Brownian motion and $q \in \mathbb{N}$; and (iii) $\sup_{k=1,\ldots,n-1}|R_{1,k}| + |R_{k+1,n}| = o_p(n^{-1/2})$. Then $T_n^{(G)} \to_d \mathcal{T}^{(q)}$, where
$\mathcal{T}^{(q)} = (1-2\epsilon)^{-1}\int_{\epsilon}^{1-\epsilon}\sup_{\delta>\epsilon} L^{(q)}(t|t-\delta,t+\delta)^{T}\, V^{(q)}(t|t-\delta,t+\delta)^{-1}\, L^{(q)}(t|t-\delta,t+\delta)\,dt$,
where $L^{(q)}$ is defined as $L$ in (3.11) but with $B(\cdot)$ replaced by $B^{(q)}(\cdot)$, and $V^{(q)}$ is defined as $V$ in (3.12) but with $L(\cdot|\cdot,\cdot)^2$ replaced by $L^{(q)}(\cdot|\cdot,\cdot)^{\otimes 2}$. In particular, $\mathcal{T}^{(1)} = \mathcal{T}$.

4 Discussion and Implementation

4.1 Comparison with Existing Methods

Shao and Zhang (Citation2010) extended the self-normalized one-change-point test (2.4) to a supervised multiple-change-point test tailored for testing $m$ change points, where $m$ is pre-specified. The test statistic is
(4.1) $S_n^{(m)} = \sup_{(k_1,\ldots,k_m)\in\Omega_n^{(m)}(\epsilon)}\sum_{j=1}^{m} T_n^{(C)}(k_j|k_{j-1}, k_{j+1})$,
where $\Omega_n^{(m)}(\epsilon) = \{k_1, \ldots, k_m : \forall j \in \{1,\ldots,m\},\ \epsilon n \le \min(k_j - k_{j-1}, k_{j+1} - k_j)\}$, $k_0 = 1$, and $k_{m+1} = n$. The trimmed region $\Omega_n^{(m)}(\epsilon)$ prevents estimates from being computed with too few observations. Later, Zhang and Lavitas (Citation2018) proposed an unsupervised self-normalized multiple-change-point test that bypasses specifying $m$, defined as
(4.2) $Z_n = \sup_{(k_1,k_2)\in\Omega_n^{(2)}(\epsilon)} T_n^{(C)}(k_1|1,k_2) + \sup_{(k_1,k_2)\in\Omega_n^{(2)}(\epsilon)} T_n^{(C)}(k_2|k_1,n)$.

Although $S_n^{(m)}$, $Z_n$, and $T_n^{(C)}$ all use LSN CUSUM statistics, that is, $T_n^{(C)}(k|s,e)$, as building blocks, they place different restrictions on the local windows and aggregate the LSN statistics in different ways. The Shao and Zhang (Citation2010) multiple-change-point test has strict control over the $m$ local windows because the boundaries of a window relate to the preceding and subsequent windows. If the number of change points is misspecified, some windows may not contain exactly one change point. If $m < M$, the self-normalizers are not robust to changes. If $m > M$, some degrees of freedom are lost in trying to detect change points that do not exist. Both cases may lead to a significant loss of power. Moreover, the computational cost increases exponentially with $m$. Zhang and Lavitas's (Citation2018) approach sets the left end of the window to 1 in the forward scan, and the right end to $n$ in the backward scan. Therefore, their approach tends to scan for the first change point $k_1 \in [1, e]$ and the last change point $k_M \in [s, n]$ for some $e$ and $s$, which may lead to a loss of power; see Section 5.2. In contrast, our approach scans for all possible change points because windows can start and end at any time with the same computational complexity; see Section C.9 of the supplement. The score function at each time takes $O(n)$ steps to compute and $O(1)$ space to store using recursive computation. Because the test statistic is computed as a function of $O(n)$ scores, it has a computational cost of $O(n^2)$ and a memory requirement of $O(n)$. Table 2 summarizes the comparison. As a remark, the self-normalized segmentation algorithms proposed by Jiang, Zhao, and Shao (Citation2021) and Zhao, Jiang, and Shao (Citation2021) are also based on localized CUSUM contrast-type change-detecting statistics; in the testing stage, a nested window set is introduced to control the computational cost by considering only a finite set of windows. In contrast, we develop recursive LSN statistics that consider all symmetric windows of size greater than $\epsilon n$, which possibly increases power.

Table 2. Comparisons between various change-point tests in terms of (a) finite-sample size accuracy with respect to the nominal Type-I error rate; (b) power under several change-point numbers $M$; (c) time complexity of computing the test statistics based on a sample of size $n$; (d) robustness against outliers; and (e) requirements of computing a long-run variance estimate $\hat\sigma_n^2$ and specifying a target number of change points $m$, where $\hat\sigma_n^2$ is assumed to be computed in $O(n)$ steps.

4.2 Finite-n Adjusted Critical Values

In the time series context, the accuracy of the invariance principle, that is, Assumption 2.1, may deteriorate when the sample size is small (Kiefer and Vogelsang Citation2005). Therefore, the asymptotic theory in, for example, Theorem 3.1 may not apply when serial dependence is strong and the sample size is small. This may lead to severe size distortion, observed in both existing self-normalized and non-self-normalized methods, including the proposed tests; see Section 5 and Section C.13 of the supplement for simulation evidence. For the multiple-change-point problem (3.1) in mean, we propose a finite-sample adjusted critical value procedure based on the strength of serial dependence to mitigate this problem.

We propose to compute a critical value $c_\alpha(n, \rho)$ by matching the lag-1 autocorrelation function (ACF) value $\rho$ and the sample size $n$ for various specified significance levels $\alpha \in (0,1)$. The values of $c_\alpha(n, \rho)$ are tabulated in Section D of the supplement for $\alpha \in \{0.01, 0.05, 0.1\}$, $n \in \{100, 200, \ldots, 1000, 2000, \ldots, 10000\}$, and $\rho \in \{0, \pm 0.1, \ldots, \pm 0.9\}$. The testing procedure is outlined as follows.

  1. Compute the sample lag-1 ACF $\hat\rho$ of $\{X_{i+b_n} - X_i\}_{i=1}^{n-b_n}$, where $b_n = \lfloor n^{1/3}\rfloor$ is the differencing parameter.

  2. Obtain the critical value $c_\alpha(n, \hat\rho)$ by matching $n$ and $\hat\rho$. Use linear interpolation if $n$ and/or $\hat\rho$ do not lie on the grid.

  3. Reject the null hypothesis if $T_n^{(\bullet)} > c_\alpha(n, \hat\rho)$, where $\bullet \in \{C, W, H, G\}$.
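Step 1 can be written compactly; differencing with gap $b_n$ removes almost all of the piecewise-constant mean, so level shifts barely affect $\hat\rho$. An illustrative Python sketch:

```python
import numpy as np

def rho_hat(x):
    """Sample lag-1 ACF of the differenced series {X_{i+b_n} - X_i}, b_n = floor(n^{1/3})."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = int(np.floor(n ** (1 / 3)))
    d = x[b:] - x[:-b]                    # differencing removes a piecewise-constant mean
    d = d - d.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d ** 2))
```

For weakly dependent noise, the lag-1 ACF of the differenced series approaches the lag-1 ACF of the noise as $b_n$ grows.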

The adjustment procedure borrows from the idea of bootstrap-based critical value simulation procedures; see, for example, Zhou (Citation2013) and Pešta and Wendler (Citation2020). The limiting distribution is pivotal by Theorem 3.1 and Corollaries 3.5–3.7. Therefore, the critical value simulation can be performed prior to the computation of the test statistic. Because $\hat\rho$ depends on the differenced data with a suitable lag, the change points (if any) have a negligible effect on $\hat\rho$.

The consistency of $\hat\rho$ is developed under the following framework. Let $X_i = \mu_i + Z_i$, where the $\mu_i$'s are deterministic and the $Z_i$'s are zero-mean stationary noises. Define $Z_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i)$, where the $\varepsilon_i$'s are independent and identically distributed (iid) random variables and $g$ is some measurable function. Let $\varepsilon_0'$ be an iid copy of $\varepsilon_0$ and $Z_i' = g(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$. For $p > 1$ and $\|Z_i\|_p < \infty$, the physical dependence measure (Wu Citation2011) is defined as $\lambda_p(i) = \|Z_i - Z_i'\|_p$, where $\|\cdot\|_p = (E|\cdot|^p)^{1/p}$. Theorem 4.1 states that $\hat\rho$ is a consistent estimator even under a large number of change points $M$. In particular, if $b_n = \lfloor n^{1/3}\rfloor$ and $|\Delta^*| < \infty$, then Theorem 4.1 guarantees that $\hat\rho$ is consistent for $\rho$ if $M = o(n^{2/3})$. The proposed adjustment affects only the finite-sample performance because the critical value $c_\alpha(n, \rho)$ is constant as a function of $\rho$ as $n \to \infty$.

Theorem 4.1.

Assume that E(Z_1^4) < ∞ and Σ_{i=1}^∞ λ_4(i) < ∞. Define |Δ*| = sup_{1≤j≤M} |Δj|. Let bn be an ℕ-valued sequence. If bn/n + 1/bn + (bn M/n)Δ*^2 → 0 as n → ∞, then ρ̂ → ρ in probability.

Although it is possible to match the ACF at a higher lag, the AR(1) model is adopted to balance accuracy and computational cost; see Section C.13 of the supplement. The sensitivity analysis of the differencing parameter bn in Section C.14 of the supplement also verifies that this choice has a minimal effect on the performance of the tests. The simulation results show that the tests have remarkably accurate size even when the true model is not an AR(1) model; see Section 5 and Section C of the supplement. This indicates that the selected AR(1) model has certain explanatory power. If another approximation model is more appropriate for a specific application, then the AR(1) model should be replaced. Users may choose between the standard testing procedure and the proposed adjustment, whichever is appropriate.

5 Simulation Experiments

5.1 Setting and Overview

Throughout Section 5, the experiments are designed as follows. The time series is generated from a signal-plus-noise model: Xi = μi + Zi. The values of the μi's will be specified in Sections 5.2 and 5.3. The zero-mean noises Zi's are simulated from a stationary bilinear autoregressive (BAR) model: (5.1) Zi = (ϖ + ϑε_i)Z_{i−1} + ε_i, i = 1, …, n, where the ε_i's are independent standard normal random variables, and ϖ, ϑ ∈ ℝ are such that ϖ^2 + ϑ^2 < 1. The BAR model is a class of nonlinear time series models that has been extensively studied in the literature; see, for example, Granger and Anderson (Citation1978), Rao (Citation1981), and Rao and Gabr (Citation1984). The bilinear time series model is widely used in control theory (Bruni, DiPillo, and Koch Citation1974), econometrics (Weiss Citation1986), and seismology (Dargahi-Noubary, Laycock, and Rao Citation1978), fields in which occasional sharp spikes occur in sample paths. Similar findings are obtained under other noise models, including the autoregressive moving-average model, threshold AR model, and absolute value nonlinear AR model. Because of space constraints, these results are presented in the supplement. The critical value is chosen according to the adjustment procedure described in Section 4.2.
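A minimal sketch of a generator for the BAR noise (5.1); the burn-in length is our choice for approximating stationarity and is not specified in the article:

```python
import numpy as np

def simulate_bar(n, varpi, vartheta, burn_in=500, seed=None):
    """Simulate Z_i = (varpi + vartheta * eps_i) * Z_{i-1} + eps_i with
    iid standard normal eps_i, under the condition varpi^2 + vartheta^2 < 1."""
    assert varpi ** 2 + vartheta ** 2 < 1, "stationarity condition violated"
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn_in)
    z = np.empty(n + burn_in)
    z[0] = eps[0]
    for i in range(1, n + burn_in):
        z[i] = (varpi + vartheta * eps[i]) * z[i - 1] + eps[i]
    return z[burn_in:]                 # discard the burn-in portion
```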

5.2 Size and Power

In this section, we examine the size and power of different tests when there exist various numbers of change points. Following the suggestions of Huang, Volgushev, and Shao (Citation2015) and Zhang and Lavitas (Citation2018), we choose ϵ = 0.1 in Sn(2), Sn(3), Zn and in our proposed tests for a fair comparison. Suppose that the change points are evenly spaced and the mean change directions are alternating (increasing or decreasing): (5.2) μi = Δ Σ_{j=1}^{M} (−1)^{j+1} 1{i/n > j/(M+1)}, where M ∈ {1, …, 6} denotes the number of change points, and Δ ∈ ℝ controls the magnitude of the mean changes. Then, μi ≡ 0 under H0.
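The alternating mean function (5.2) is straightforward to transcribe; a sketch (the function name is ours):

```python
import numpy as np

def mean_function(n, delta, m):
    """mu_i = delta * sum_{j=1}^{m} (-1)^(j+1) * 1{i/n > j/(m+1)}:
    m evenly spaced change points with alternating shift directions."""
    i = np.arange(1, n + 1)
    mu = np.zeros(n)
    for j in range(1, m + 1):
        mu = mu + (-1) ** (j + 1) * (i / n > j / (m + 1))
    return delta * mu
```

For delta = 0 the null hypothesis of a constant (zero) mean holds.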

All tests are computed at the nominal size α = 5%. The null rejection rates α̂, for sample sizes n ∈ {200, 400}, are presented in Table 3. To summarize the results, we further report the sample root mean squared error (RMSE) of α̂ over all cases of ϑ ∈ {−0.8, −0.5, 0.5, 0.8} and ϖ ∈ {−0.8, −0.5, −0.3, 0, 0.3, 0.5, 0.8} for each test and each n. Specifically, RMSEn = [|H|^{−1} Σ_{(ϑ,ϖ)∈H} {α̂n(ϑ,ϖ) − α}^2]^{1/2}, where H is the set of (ϑ,ϖ) pairs used in the simulation, |H| is the cardinality of H, and α̂n(ϑ,ϖ) is the empirical size under the BAR model with parameters ϑ and ϖ and sample size n. The self-normalized tests generally have a more accurate size than the non-self-normalized test KSn. This finding is consistent with that of Shao (Citation2015). The self-normalized approach is a special case of the fixed-bandwidth approach with the largest possible bandwidth and thus achieves the smallest size distortion, as observed by Kiefer and Vogelsang (Citation2002). In comparison, Tn(W) and Tn(H) control the size most accurately among the self-normalized tests. In particular, our proposed tests have the least severe under-size problem, which is observed in existing self-normalized tests (i.e., Sn(1), Sn(2), Sn(3), and Zn) when ϖ < 0. Moreover, the test proposed by Shao and Zhang (Citation2010), Sn(m), suffers from an increasing size distortion as m increases. This finding is consistent with that of Zhang and Lavitas (Citation2018). The fact that Tn(W) and Tn(H) outperform Tn(C) can be attributed to the outlier robustness of rank and order statistics. This advantage is particularly noticeable in bilinear time series, as this model is well known for producing sudden high-amplitude oscillations that mimic structures in, for example, explosion and earthquake data in seismology; see Section 5.2 of Rao and Gabr (Citation1984). However, in the more standard ARMA models, Tn(C) performs as well as Tn(W) and Tn(H); see the supplement for the detailed results.
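The RMSE summary is a plain sample root mean squared deviation of the empirical sizes from the nominal level; for concreteness (the function name is ours):

```python
import numpy as np

def size_rmse(alpha_hats, alpha=0.05):
    """RMSE_n = sqrt(|H|^{-1} * sum over (vartheta, varpi) in H of
    {alpha_hat_n(vartheta, varpi) - alpha}^2)."""
    a = np.asarray(alpha_hats, dtype=float)
    return float(np.sqrt(np.mean((a - alpha) ** 2)))
```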

Table 3. Null rejection rates α̂ at nominal size α=5% under BAR model and mean function (5.2).

The power curve against Δ is computed at 5% nominal size using 210 replications with n = 200. Figure 2 displays the size-adjusted power when ϖ = ϑ = 0.5. The (unadjusted) power is presented in the supplement. The results with various values of ϖ and ϑ are similar; thus, they are also presented in the supplement. Generally, Tn(W) and Tn(H) have the highest power. They outperform Tn(C) because of their robustness against outliers. When only the CUSUM-type tests are considered, the non-self-normalized test KSn performs better than the self-normalized tests when M = 1. However, when M > 1, KSn significantly underperforms because it is tailor-made for the AMOC alternative (2.2). Although Sn(1) has the highest power among the self-normalized CUSUM tests when M = 1, it suffers from the notorious non-monotonic power problem when M > 1 because its self-normalizer is not robust to multiple change points. Thus, Sn(1) is not a consistent test when M > 1.

Fig. 2 Size-adjusted power under BAR model with ϖ=ϑ=0.5, n = 200 and mean function (5.2).


Surprisingly, our proposed test Tn(C) outperforms the tests of Shao and Zhang (Citation2010), Sn(m), even when M is correctly specified. This finding demonstrates that knowing M is not necessarily advantageous because a structural change can be identified by observing the data around it without examining the entire segmented series. Moreover, as discussed in Section 4.1, Sn(m) defines the local windows restrictively. It accumulates errors if the boundaries of the local windows differ from the actual change points. It is also interesting that, compared with Sn(1), the tests Sn(2) and Sn(3) are less sensitive to misspecification of M; however, they are still less powerful than our proposed tests.

5.3 Ability to Capture Effects of All Changes

The proposed tests are robust against multiple change points compared with existing methods. Consider the three-change-point setting in which the magnitude of the changes at the first and the last change points is only half that of the second change point. Specifically, define the mean function for case 1 as (5.2) with M = 3 and for case 2 as μi = Δ{(1/2)·1{i > n/4} − 1{i > 2n/4} + (1/2)·1{i > 3n/4}}.
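For concreteness, the two mean configurations can be coded side by side (case 1 is (5.2) with M = 3; case 2 halves the first and last shifts; the function names are ours):

```python
import numpy as np

def mean_case1(n, delta):
    """Case 1: (5.2) with M = 3, i.e., full-size alternating shifts."""
    i = np.arange(1, n + 1)
    return delta * (1.0 * (i / n > 0.25) - 1.0 * (i / n > 0.5)
                    + 1.0 * (i / n > 0.75))

def mean_case2(n, delta):
    """Case 2: half-magnitude changes at the first and last change points:
    mu_i = delta * (0.5*1{i > n/4} - 1{i > 2n/4} + 0.5*1{i > 3n/4})."""
    i = np.arange(1, n + 1)
    return delta * (0.5 * (i > n / 4) - 1.0 * (i > 2 * n / 4)
                    + 0.5 * (i > 3 * n / 4))
```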

From Figure 3, Zn and Sn(2) lose approximately half of their power in Case 2 compared with that in Case 1. In contrast, the power of our tests only decreases by about 1/3 when Δ ≤ 1, while it remains roughly the same when Δ ≥ 2. Therefore, our tests are more powerful and can capture a larger class of structural changes than existing methods. In contrast, Sn(2) and Zn tend to capture only the first and/or the last change points.

Fig. 3 Size-adjusted power under the BAR model with ϖ=ϑ=0.5, n = 200 and mean functions as in cases 1 and 2.


5.4 Change in General Parameters

In many applications, trends may exist in the data. Therefore, we consider the multiple-change-point problem in the mean and median trend models. Specifically, we consider the regression model Φ(Xi) = αi + βi(i/n) with Φ(·) = E(·) and Φ(·) = median(·) under the following multiple-change-point model: αi = Δ_{α,0} + Σ_{j=1}^{M} Δ_{α,j} 1(i/n > πj) and βi = Δ_{β,0} + Σ_{j=1}^{M} Δ_{β,j} 1(i/n > πj) for i = 1, …, n, where M ∈ ℕ, 0 < π1 < ⋯ < πM < 1, and Δ_{α,0}, …, Δ_{α,M}, Δ_{β,0}, …, Δ_{β,M} ∈ ℝ. Note that Φ(·) = E(·) and Φ(·) = median(·) refer to the mean trend and median trend models, respectively. Our goal is to test H0 against H1 with the μi's replaced by the βi's.

To handle changes in general parameters, the global change-detecting process Gn(·) in (3.16) is used. Let θ̂s:e be an estimator of the common value of βs, …, βe based on the observations Xs, …, Xe under H0. The estimators θ̂s:e are obtained using ordinary least-squares regression under the mean trend model, whereas they are obtained using quantile regression under the median trend model, computed by the R package “quantreg”. Let Tn(G,mean) and Tn(G,med) be the mean trend and median trend versions of the test statistic Tn(G) in Section 3.4, respectively. In our simulation, we generate the data as Xi = Δ Σ_{j=0}^{M} (−1)^j [{(i − jn/(M+1))/(n/(M+1))} ∧ 1] 1{i/n > j/(M+1)} + Zi, i = 1, …, n, where the Zi's are simulated from the BAR model (5.1), M ∈ {1, …, 6} denotes the number of change points, and Δ ∈ ℝ controls the magnitude of the changes. Briefly, the model implies an uptrend in mean and median before every odd-indexed change point until level Δ is reached. After reaching Δ at an odd-indexed change point, the trend shifts downward and decreases to 0 at the next change point. The empirical rejection rates are reported in Table 4. From the results, Tn(G,mean) suffers a less severe under-size distortion than Tn(G,med). Both tests have decent power in the multiple-change-point setting and improve significantly from n = 200 to n = 400. For reference, the size-adjusted power is also reported in Section C.11 of the supplement.
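The noiseless trend design can be sketched directly from its verbal description (linear ramps up to Δ before each odd-indexed change point and back down to 0 at the next); the helper below is our illustrative construction, not the article's exact formula:

```python
import numpy as np

def trend_signal(n, delta, m):
    """Piecewise-linear trend with m evenly spaced change points: even-indexed
    segments ramp from 0 up to delta, odd-indexed segments ramp back to 0."""
    t = np.arange(1, n + 1) / n                       # rescaled time i/n
    seg = np.minimum((t * (m + 1)).astype(int), m)    # segment index 0, ..., m
    frac = t * (m + 1) - seg                          # position within segment
    return delta * np.where(seg % 2 == 0, frac, 1.0 - frac)
```

The signal attains delta exactly at the odd-indexed change points and returns to 0 at the even-indexed ones.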

Table 4. Empirical rejection rates (%) of the mean and median trend tests, Tn(G,mean) and Tn(G,med), respectively, at 5% nominal size under the BAR model.

6 Shanghai-Hong Kong Stock Connect Turnover

We use our proposed method to perform a change-point analysis of the Shanghai-Hong Kong Stock Connect Southbound Turnover index (Bloomberg code: AHXHT index) from March 23, 2017 to March 22, 2022 (n = 1232). The data are retrieved from Bloomberg. The Stock Connect is a channel that allows mainland China and Hong Kong investors to access each other's stock markets. The southbound line enables investors in mainland China to invest in the Hong Kong stock market. Studies have shown that the Stock Connect improves mutual market liquidity and capital market liberalization (Bai and Chow Citation2017; Huo and Ahmed Citation2017; Xu et al. Citation2020). Therefore, it is of practical interest to determine whether change points exist in the Stock Connect Southbound daily turnover.

We investigate whether changes in trend exist in the log daily turnovers. Under the mean and median trend models, we consider the test statistics Tn(G,mean) and Tn(G,med) specified in Section 5.4. The resulting test statistics are 47.46 and 40.64, respectively. Both p-values are less than 1%. The trend change-point location estimates are indicated by red vertical lines in Figure 4. The mean and median trend analyses agree on four estimated change points: March 7, 2018, January 24, 2019, October 2, 2019, and January 12, 2021. The median trend analysis detects an additional change point on May 25, 2020. The first change point likely corresponds to the China–United States trade war. After the trade war began in January 2018, the Stock Connect turnover displayed a downtrend. An uptrend is detected from 2019 until the COVID-19 events, and it stops after the beginning of 2021.

Fig. 4 Estimated (a) mean trend and (b) median trend change points are indicated by red vertical lines. The orange lines indicate the fitted regression lines within each region separated by the estimated change points. Data are retrieved from Bloomberg.


7 Conclusion

Our method improves on existing change-point tests and has several advantages: (i) it has high power and size accuracy, (ii) it requires neither specification of the number of change points nor consistent estimation of nuisance parameters, (iii) a consistent estimate of the change points can be naturally produced, (iv) general change-detecting statistics, for example, rank and order statistics, can be used to enhance robustness, (v) it can test a change in a general parameter of interest, and (vi) it applies to a wide range of time series models. summarizes the properties of the proposed tests. Moreover, our proposed framework is driven by intuitive principles. Given a single-change-detecting statistic, our framework generalizes it to a multiple-change-detecting statistic. We anticipate that future work will apply our framework to non-time-series data, for example, spatial and spatio-temporal data.

8 Proofs of Theorems

Proof of Theorems 3.1 and 3.2.

(i) Under H0 and by the continuous mapping theorem, Assumption 2.1 implies that {Cn(⌊nt⌋): t ∈ [0,1]} ⇒ {σ(B(t) − tB(1)): t ∈ [0,1]}. Note that Tn(C) is a composite function of Cn(·) through (3.2), (3.3), (3.4), (3.5), and (3.6), each of which is a continuous and measurable map. By the continuous mapping theorem, we obtain Tn(C) →d T in (3.10). The limiting distribution T is well defined because Vn(C)(k | k−d, k+1+d) converges to a nonnegative and nondegenerate distribution for any nϵ ≤ d ≤ n and ϵn + 1 ≤ k ≤ n − ϵn − 1, for ϵ > 0.

(ii) For consistency, we consider the jth relative change-point time πj such that the corresponding change magnitude satisfies |Δj| ≥ n^{κ−1}, where 0.5 < κ ≤ 1. Let Δ* = Δj and k* = ⌊nπj⌋. Under this assumption, there exist c ∈ ℝ+ and 3/2 − κ ≤ Υ < 1 such that ϵ + cn^{Υ}/n < min(πj − π_{j−1}, π_{j+1} − πj) is satisfied for a large enough n. Therefore, there is only one change point in the interval k* ± (ϵn + cn^{Υ}). It suffices to consider d = ⌊ϵn⌋. For all k such that |k − k*| ≤ bn^{Υ}, where 0 < b < c, we can decompose (8.1) Ln(C)(k|k−d,k+d+1)^2 = [(d+1)/8](X̄(k−d):k − X̄(k+1):(k+d+1))^2 = [(d+1)/8]({Z̄(k−d):k − Z̄(k+1):(k+d+1)} − {1 − |k*−k|/(d+1)}Δ*)^2 ≥ (1/8)[√(d+1){Z̄(k−d):k − Z̄(k+1):(k+d+1)} − {(d+1−bn^{Υ})/√(d+1)}Δ*]^2 = {Op(1) + Ξ1,n}^2, where |Ξ1,n| ≍ n^{κ−1/2}. Equation (8.1) holds uniformly in k for |k − k*| ≤ bn^{Υ}. Next, for the self-normalizer Vn(C)(k|k−d,k+d+1), we consider two cases: k ≥ k* and k < k*. If k ≥ k*, there is only one change point in the interval [k−d, k]. Following similar calculations as in (8.1), we have Ln(C)(j|k−d,k)^2 ≤ Ln(C)(k*|k−d,k)^2 for all j = k−d, …, k, if n is large enough. Also, because there is no change point in the interval [k+1, k+1+d], Ln(C)(j|k+1,k+1+d)^2 = Op(1) for all j = k+1, …, k+1+d. So, Vn(C)(k|k−d,k+d+1) = [1/{4(d+1)}]{Σ_{j=k−d}^{k} Ln(C)(j|k−d,k)^2 + Σ_{j=k+1}^{k+1+d} Ln(C)(j|k+1,k+1+d)^2} ≤ (1/4)Ln(C)(k*|k−d,k)^2 + Op(1) = [(d+1−bn^{Υ})^2(bn^{Υ})^2/{2(d+1)^3}]{Z̄(k−d):k* − Z̄k*:k − Δ*}^2 + Op(1) ≤ {Op(1) + Ξ2,n}^2, where |Ξ2,n| ≍ n^{Υ+κ−3/2}. For the case in which k < k*, the analysis is similar and Vn(C) is of the same order. Therefore, there exists a constant c0 such that Tn(C)(k) ≥ {(Op(1) + Ξ1,n)/(Op(1) + Ξ2,n)}^2 = {op(1) + c0 n^{1−Υ}}^2,

for all k = k* − bn^{Υ}, …, k* + bn^{Υ}. Because Tn(C)(k) ≥ 0 for all k, we have Tn(C) ≥ n^{−1} Σ_{k=k*−bn^{Υ}}^{k*+bn^{Υ}} Tn(C)(k) ≥ {(2bn^{Υ}+1)/n}{op(1) + c0 n^{1−Υ}}^2 = cn^{1−Υ}{op(1) + 1}^2,

for a large enough n, where c > 0 is a constant. Consequently, Tn(C) → ∞ in probability as n → ∞ because Υ < 1. □

Proof of Lemma 3.3.

By the assumption stated in the lemma, there exists an ϵ such that πj is the unique relative change-point time in the interval [πj − ϵ, πj + ϵ]. Let d = ⌊nϵ⌋ and kj = ⌊nπj⌋. Let Z̄s:e = (e − s + 1)^{−1} Σ_{i=s}^{e} Zi, where Zi = Xi − E(Xi). Then, Ln(C)(kj | kj−d, kj+1+d)^2 = (1/8){√(d+1)(Z̄(kj−d):kj − Z̄(kj+1):(kj+1+d)) − √(d+1)Δj}^2 = Op(1) + Op(n^{κj−1/2}) + O(n^{2κj−1}).

Because there is no change point in the intervals [kj−d, kj] and (kj+1, kj+1+d], the self-normalizer Vn(C)(kj | kj−d, kj+1+d) is free of Δj and therefore is Op(1). Then, letting ζn,j = o(n^{2κj−1}) as n → ∞, ζn,j^{−1} Tn(C)(kj) ≥ ζn,j^{−1} Ln(C)(kj | kj−d, kj+1+d)^2 / Vn(C)(kj | kj−d, kj+1+d) → ∞ in probability.

Therefore, P{Tn(C)(kj) > ζn,j} → 1. For any time i, there are Mi− ≥ 0 change point(s) in the interval [0, i] and Mi+ ≥ 0 change point(s) in the interval [i+1, n]. Let ki,0 = i and ki,a = min(k ∈ {k1, …, kM, n}: k > ki,a−1) if a = 1, 2, …, and ki,a = max(k ∈ {0, k1, …, kM, n}: k < ki,a+1) if a = −1, −2, ….

Therefore, ki,a denotes the time of the |a|th change point after time i if a > 0, and before time i if a < 0. Likewise, let Δi,a be the size of the change corresponding to ki,a for |a| > 0, and Δi,0 = 0. Let Δ*s:e = sup_{i=s,…,e−1} |E(Xi+1) − E(Xi)|, and define the set of time indices in (s, e) that are asymptotically away from all change points as Ωs:e = {i ∈ [s, e] ∩ ℤ: lim_{n→∞} |i/n − πj| > 0 for all j = 1, …, M}.

Consider non-change-point time hΩ1,n. For all j=1,,M,dnϵ, and time iΩ(hd):(h+1+d), we have Mi+Mi+1 and Ln(C)(i|hd,h+1+d)2=d+18{Z¯hd:iZ¯i+1:h+1+d+1(MiMhd>0)v=(MiMhd)1(ki,v+1ki,vih+d+1l=(MiMhd)vΔi,l)v=0Mi+Mh+1+d+({ki,v+1(h+1+d)}ki,vh+1+dil=(MiMhd)vΔi,l)}2=Op[n{Δ(hd):(h+1+d)*}2].

Using the result, the self-normalizer is at least of the same order because, for some c > 0, Vn(C)(h|h−d,h+1+d) ≥ [1/{4(d+1)}]{Σ_{j∈Ω(h−d):h} Ln(C)(j|h−d,h)^2 + Σ_{j∈Ω(h+1):(h+1+d)} Ln(C)(j|h+1,h+1+d)^2} ≥ cn{Δ*(h−d):(h+1+d)}^2.

Therefore, Tn(C)(h) = Op(1), implying P{sup_{s∈[πj−ϵ, πj+ϵ]∖{πj}} Tn(C)(⌊ns⌋) > ζn,j} → 0. □

Proof of Theorem 3.4.

For any small enough η > 0 such that η < ϵ, let Aj := {min_{j′=1,…,M̂} |πj − π̂j′| > η} be the event that the absolute difference between the jth true change point and the closest estimator in Π̂(C) is greater than η. Let Ψj(ϵ,η) = [πj−ϵ, πj+ϵ] ∖ [πj−η, πj+η]. By the definition of Π̂(C) in (3.13) with D = C, we can represent Aj as Aj = {sup_{t∈[πj−η,πj+η]} Tn(C)(⌊nt⌋) ≤ sup_{s∈Ψj(ϵ,η)} Tn(C)(⌊ns⌋)} ∪ {sup_{t∈[πj−η,πj+η]} Tn(C)(⌊nt⌋) ≤ ϱ} ⊆ {Tn(C)(⌊nπj⌋) ≤ sup_{s∈Ψj(ϵ,η)} Tn(C)(⌊ns⌋)} ∪ {Tn(C)(⌊nπj⌋) ≤ ϱ}.

Using Lemma 3.3, we have, for all j = 1, …, M as n → ∞, P(Aj) ≤ P{Tn(C)(⌊nπj⌋) < sup_{s∈Ψj(ϵ,η)} Tn(C)(⌊ns⌋)} + P{Tn(C)(⌊nπj⌋) ≤ ϱ} → 0.

Then, P(max_{j=1,…,M} min_{j′=1,…,M̂} |πj − π̂j′| > η) = P(⋃_{j=1}^{M} Aj) ≤ Σ_{j=1}^{M} P(Aj) → 0. □

Supplementary Materials

The supplementary note contains additional simulation results, finite-n adjusted critical values, recursive formulas, algorithms, and proofs of corollaries. Extension to long-range dependent time series is discussed. Implementation code is provided in the R library SNmct.


Acknowledgments

The authors would like to thank the anonymous referees, an associate editor, and the editor for their constructive comments that improved the scope and presentation of the article.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

This research was partially supported by grants GRF-14304420, 14306421, and 14307922 provided by the Research Grants Council of HKSAR.

References

  • Andrews, D. W. (1991), “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59, 817–858.
  • Antoch, J., and Jarušková, D. (2013), “Testing for Multiple Change Points,” Computational Statistics, 28, 2161–2183.
  • Bai, J., and Perron, P. (1998), “Estimating and Testing Linear Models with Multiple Structural Changes,” Econometrica, 66, 47–78.
  • Bai, Y., and Chow, D. Y. P. (2017), “Shanghai-Hong Kong Stock Connect: An Analysis of Chinese Partial Stock Market Liberalization Impact on the Local and Foreign Markets,” Journal of International Financial Markets, Institutions and Money, 50, 182–203.
  • Bauer, P., and Hackl, P. (1980), “An Extension of the MOSUM Technique for Quality Control,” Technometrics, 22, 1–7.
  • Billingsley, P. (1999), Convergence of Probability Measures. Wiley Series in Probability and Statistics (2nd ed.), New York: Wiley.
  • Bruni, C., DiPillo, G., and Koch, G. (1974), “Bilinear Systems: An Appealing Class of “Nearly Linear” Systems in Theory and Applications,” IEEE Transactions on Automatic Control, 19, 334–348. DOI: 10.1109/TAC.1974.1100617.
  • Chan, K. W. (2022a), “Mean-Structure and Autocorrelation Consistent Covariance Matrix Estimation,” Journal of Business & Economic Statistics, 40, 201–215. DOI: 10.1080/07350015.2020.1796397.
  • Chan, K. W. (2022b), “Optimal Difference-based Variance Estimators in Time Series: A General Framework,” Annals of Statistics, 50, 1376–1400.
  • Cho, H., and Kirch, C. (2022), “Two-Stage Data Segmentation Permitting Multiscale Change Points, Heavy Tails and Dependence,” Annals of the Institute of Statistical Mathematics, 74, 653–684. DOI: 10.1007/s10463-021-00811-5.
  • Chu, C.-S. J., Hornik, K., and Kaun, C.-M. (1995), “MOSUM Tests for Parameter Constancy,” Biometrika, 82, 603–617. DOI: 10.1093/biomet/82.3.603.
  • Csörgö, M., and Horváth, L. (1988), “Nonparametric Methods for Changepoint Problems,” Handbook of Statistics, 7, 403–425.
  • Csörgö, M., and Horváth, L. (1997), Limit Theorems in Change-point Analysis, Wiley Series in Probability and Statistics, New York: John Wiley.
  • Dargahi-Noubary, G., Laycock, P., and Rao, T. S. (1978), “Non-Linear Stochastic Models for Seismic Events with Applications in Event Identification,” Geophysical Journal International, 55, 655–668. DOI: 10.1111/j.1365-246X.1978.tb05934.x.
  • Dehling, H., Fried, R., Garcia, I., and Wendler, M. (2015), “Change-Point Detection Under Dependence based on Two-Sample U-Statistics,” in Asymptotic Laws and Methods in Stochastics, eds. D. Dawson, R. Kulik, M. Ould Haye, B. Szyszkowicz, and Y. Zhao, pp. 195–220, New York: Springer.
  • Dehling, H., Fried, R., and Wendler, M. (2020), “A Robust Method for Shift Detection in Time Series,” Biometrika, 107, 647–660. DOI: 10.1093/biomet/asaa004.
  • Eichinger, B., and Kirch, C. (2018), “A MOSUM Procedure for the Estimation of Multiple Random Change Points,” Bernoulli, 24, 526–564. DOI: 10.3150/16-BEJ887.
  • Fryzlewicz, P. (2014), “Wild Binary Segmentation for Multiple Change-Point Detection,” Annals of Statistics, 42, 2243–2281.
  • Granger, C. W. J., and Anderson, A. P. (1978), An Introduction to Bilinear Time Series Models, Gottingen: Vandenhoeck & Ruprecht.
  • Herrndorf, N. (1984), “A Functional Central Limit Theorem for Weakly Dependent Sequences of Random Variables,” Annals of Probability, 12, 141–153.
  • Huang, Y., Volgushev, S., and Shao, X. (2015), “On Self-Normalization for Censored Dependent Data,” Journal of Time Series Analysis, 36, 109–124. DOI: 10.1111/jtsa.12096.
  • Huo, R., and Ahmed, A. D. (2017), “Return and Volatility Spillovers Effects: Evaluating the Impact of Shanghai-Hong Kong Stock Connect,” Economic Modelling, 61, 260–272. DOI: 10.1016/j.econmod.2016.09.021.
  • Jiang, F., Zhao, Z., and Shao, X. (2021), “Modelling the Covid-19 Infection Trajectory: A Piecewise Linear Quantile Trend Model,” Journal of the Royal Statistical Society, Series B, 84, 1589–1607. DOI: 10.1111/rssb.12453.
  • Kiefer, N. M., and Vogelsang, T. J. (2002), “Heteroskedasticity-Autocorrelation Robust Standard Errors using the Bartlett Kernel Without Truncation,” Econometrica, 70, 2093–2095. DOI: 10.1111/1468-0262.00366.
  • Kiefer, N. M., and Vogelsang, T. J. (2005), “A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests,” Econometric Theory, 21, 1130–1164.
  • Lobato, I. N. (2001), “Testing that a Dependent Process is Uncorrelated,” Journal of the American Statistical Association, 96, 1066–1076. DOI: 10.1198/016214501753208726.
  • Niu, Y. S., and Zhang, H. (2012), “The Screening and Ranking Algorithm to Detect DNA Copy Number Variations,” The Annals of Applied Statistics, 6, 1306–1326. DOI: 10.1214/12-AOAS539.
  • Pešta, M., and Wendler, M. (2020), “Nuisance-Parameter-Free Changepoint Detection in Non-Stationary Series,” Test, 29, 379–408. DOI: 10.1007/s11749-019-00659-1.
  • Rao, T. S. (1981), “On the Theory of Bilinear Time Series Models,” Journal of the Royal Statistical Society, Series B, 43, 244–255. DOI: 10.1111/j.2517-6161.1981.tb01177.x.
  • Rao, T. S., and Gabr, M. M. (1984), An Introduction to Bispectral Analysis and Bilinear Time Series Models (Vol. 24, 1st ed.), New York: Springer-Verlag.
  • Shao, X. (2010), “A Self-Normalized Approach to Confidence Interval Construction in Time Series,” Journal of the Royal Statistical Society, Series B, 72, 343–366. DOI: 10.1111/j.1467-9868.2009.00737.x.
  • Shao, X. (2015), “Self-Normalization for Time Series: A Review of Recent Developments,” Journal of the American Statistical Association, 110, 1797–1817.
  • Shao, X., and Zhang, X. (2010), “Testing for Change Points in Time Series,” Journal of the American Statistical Association, 105, 1228–1240. DOI: 10.1198/jasa.2010.tm10103.
  • Vogelsang, T. J. (1999), “Sources of Nonmonotonic Power When Testing for a Shift in Mean of a Dynamic Time Series,” Journal of Econometrics, 88, 283–299. DOI: 10.1016/S0304-4076(98)00034-7.
  • Vostrikova, L. Y. (1981), “Detecting ‘Disorder’ in Multidimensional Random Processes,” Soviet Mathematics - Doklady, 259, 55–59.
  • Wasserman, L. (2006), All of Nonparametric Statistics (1st ed.), New York: Springer-Verlag.
  • Weiss, A. A. (1986), “Arch and Bilinear Time Series Models: Comparison and Combination,” Journal of Business & Economic Statistics, 4, 59–70. DOI: 10.2307/1391387.
  • Wu, W. B. (2007), “Strong Invariance Principles for Dependent Random Variables,” Annals of Probability, 35, 2294–2320.
  • Wu, W. B. (2011), “Asymptotic Theory for Stationary Processes,” Statistics and Its Interface, 4, 207–226.
  • Xu, K., Zheng, X., Pan, D., Xing, L., and Zhang, X. (2020), “Stock Market Openness and Market Quality: Evidence from the Shanghai-Hong Kong Stock Connect Program,” Journal of Financial Research, 43, 373–406. DOI: 10.1111/jfir.12210.
  • Yau, C. Y., and Zhao, Z. (2016), “Inference for Multiple Change Points in Time Series via Likelihood Ratio Scan Statistics,” Journal of the Royal Statistical Society, Series B, 78, 895–916. DOI: 10.1111/rssb.12139.
  • Zhang, T., and Lavitas, L. (2018), “Unsupervised Self-Normalized Change-Point Testing for Time Series,” Journal of the American Statistical Association, 113, 637–648. DOI: 10.1080/01621459.2016.1270214.
  • Zhao, Z., Jiang, F., and Shao, X. (2021), “Segmenting Time Series via Self-Normalization,” arXiv preprint arXiv:2112.05331.
  • Zhou, Z. (2013), “Heteroscedasticity and Autocorrelation Robust Structural Change Detection,” Journal of the American Statistical Association, 108, 726–740. DOI: 10.1080/01621459.2013.787184.