
High-Dimensional Time Series Segmentation via Factor-Adjusted Vector Autoregressive Modeling

Received 06 Apr 2022, Accepted 14 Jul 2023, Published online: 26 Jul 2023

Abstract

Vector autoregressive (VAR) models are popularly adopted for modeling high-dimensional time series, and their piecewise extensions allow for structural changes in the data. In VAR modeling, the number of parameters grows quadratically with the dimensionality, which necessitates the sparsity assumption in high dimensions. However, it is debatable whether such an assumption is adequate for handling datasets exhibiting strong serial and cross-sectional correlations. We propose a piecewise stationary time series model that simultaneously allows for strong correlations as well as structural changes, where pervasive serial and cross-sectional correlations are accounted for by a time-varying factor structure, and any remaining idiosyncratic dependence between the variables is handled by a piecewise stationary VAR model. We propose an accompanying two-stage data segmentation methodology which fully addresses the challenges arising from the latency of the component processes. Its consistency in estimating both the total number and the locations of the change points in the latent components is established under conditions considerably more general than those in the existing literature. We demonstrate the competitive performance of the proposed methodology on simulated datasets and an application to U.S. blue chip stocks data. Supplementary materials for this article are available online.

1 Introduction

Vector autoregressive (VAR) models are popular for modeling cross-sectional and serial correlations in multivariate, possibly high-dimensional time series, with applications in, for example, finance (Barigozzi and Hallin 2017), biology (Shojaie and Michailidis 2010) and genomics (Michailidis and d'Alché-Buc 2013). Within such settings, the importance of data segmentation is well-recognized, and several methods exist for detecting change points in VAR models in both fixed (Kirch, Muhsal, and Ombao 2015) and high dimensions (Safikhani and Shojaie 2022; Wang et al. 2019; Bai, Safikhani, and Michailidis 2020; Maeng, Eckley, and Fearnhead 2022).

VAR modeling quickly becomes a high-dimensional problem as the number of parameters grows quadratically with the dimensionality. Accordingly, most existing methods for detecting change points in high-dimensional, piecewise stationary VAR processes assume sparsity (Basu and Michailidis 2015). However, it is debatable whether highly sparse models are appropriate for some applications. For example, Giannone, Lenza, and Primiceri (2021) note the difficulty of identifying sparse predictive representations for several macroeconomic applications.

We illustrate the inadequacy of the sparsity assumption on a volatility panel dataset (see Section 5.3 for its description). Figure 1(a) shows that as the dimensionality increases, the leading eigenvalue of the spectral density matrix at frequency 0 (i.e., the long-run covariance) estimated from the data also increases linearly. This indicates the presence of strong serial and cross-sectional correlations that cannot be accommodated by sparse VAR models. In Figure 1(b), we report the logged and truncated p-values obtained from fitting a VAR(5) model to the same dataset (truncation level chosen at log(3.858×10⁻⁶) by Bonferroni correction with the significance level 0.1) via ridge regression, see Cule, Vineis, and De Iorio (2011). Strong dependence observed for most pairs of the variables further confirms that we cannot infer a sparse pairwise relationship from such data. On the other hand, Figure 1(c) shows that once we estimate factors driving the strong correlations and adjust for their presence, there is evidence that the remaining dependence in the data can be modeled as being sparse. Together, plots (c)–(f) display that the relationship between a pair of variables (after factor-adjustment) varies over time, particularly at the level of industrial sectors. Here, the intervals are chosen according to the data segmentation result reported in Section 5.3. This example highlights the importance of (i) accounting for the dominant correlations prior to fitting a model under the sparsity assumption, and (ii) detecting structural changes when analyzing time series datasets covering a long period.

Fig. 1 (a) The two largest eigenvalues of the long-run covariance matrix estimated from the volatility panel analyzed in Section 5.3 (March 18, 2008–July 07, 2009, n = 223) with subsets of cross-sections randomly sampled 100 times for each given dimension p∈{5,…,72} (x-axis). (b) and (c): logged and truncated p-values from fitting a VAR(5) model to the same dataset without and with factor-adjustment. (d)–(f): logged and truncated p-values similarly obtained with factor-adjustment from the same variables over different periods. In (b)–(f), for each pair of variables, the minimum p-value over the five lags is reported. Corresponding tickers are given in x- and y-axes and industrial sectors are indicated by the colors and boundaries drawn.

Motivated by the aforementioned characteristics of high-dimensional time series data, factor-adjusted regression modeling has increasingly gained popularity (Fan, Ke, and Wang 2020; Fan, Masini, and Medeiros 2021; Krampe and Margaritella 2021). The factor-adjusted VAR model proposed by Barigozzi, Cho, and Owens (2022) assumes that a handful of common factors capture strong serial and cross-sectional correlations, such that it is reasonable to assume a sparse VAR model on the remaining component to capture idiosyncratic, variable-specific dependence. We extend this framework by proposing a new, piecewise stationary factor-adjusted VAR model and develop FVARseg, an accompanying change point detection methodology. Below we summarize the methodological and theoretical contributions made in this article.

1.1 Generality of the Modeling Framework

We decompose the data into two piecewise stationary latent processes: one is driven by factors and accounts for dominant serial and cross-sectional correlations, and the other models sparse pairwise dependence via a VAR model. We adopt the most general approach to factor modeling and allow both components to undergo changes which, in the case of the latter, are attributed to shifts in the VAR parameters. To the best of our knowledge, such a general model, simultaneously permitting the presence of common factors and change points, has not been studied in the literature previously. Accordingly, we are not aware of any method that can comprehensively address the data segmentation problem considered in this article.

1.2 Methodological Novelty

The idea of scanning the data for changes over moving windows has successfully been applied to a variety of data segmentation problems (Preuss, Puchstein, and Dette 2015; Eichinger and Kirch 2018; Chen, Wang, and Wu 2022). We propose FVARseg, a two-stage methodology that combines this idea with statistics carefully designed to have good detection power against different types of changes in the two latent components. In Stage 1 of FVARseg, motivated by the observation that dominant factor-driven correlations appear as leading eigenvalues in the frequency domain (see, e.g., Figure 1(a)), we propose a detector statistic that contrasts the local spectral density matrix estimators from neighboring moving windows in operator norm, which is well-suited to detect changes in the factor-driven component.

In Stage 2 for detecting change points in the latent piecewise stationary VAR process, we deliberately avoid estimating the latent process which may incur large errors. Instead, we make use of (i) the Yule-Walker equation that relates autocovariances (ACV) and VAR parameters, and (ii) the availability of local ACV estimators of the latent VAR process after Stage 1. Combining these ingredients, we propose a novel detector statistic that enjoys methodological simplicity as well as statistical efficiency. Further, through sequential evaluation of the detector statistic, the second-stage procedure requires the estimation of local VAR parameters at selected locations only. Consequently it is highly competitive computationally when both the sample size and the dimensionality are large.

1.3 Theoretical Consistency

FVARseg achieves consistency in estimating the total number and locations of the change points in both the piecewise stationary factor-driven and VAR processes. Our theoretical analysis is conducted in a setting considerably more general than those commonly adopted in the literature, permitting dependence across stationary segments and heavy-tailedness of the data. We also derive the rate of localization for each stage of FVARseg where we make explicit the influence of tail behavior and the size of changes. In particular, under Gaussianity, the estimators from Stage 1 nearly match the minimax optimal rate derived for the simpler, covariance change point detection problem.

The rest of the article is structured as follows. Section 2 introduces the piecewise stationary factor-adjusted VAR model. Section 3 describes the two stages of FVARseg, the proposed data segmentation methodology, and Section 4 establishes its theoretical consistency. Section 5 demonstrates the performance of FVARseg empirically, and Section 6 concludes the article. R code is available from https://github.com/haeran-cho/fvarseg.

1.4 Notation

Let I and O denote an identity matrix and a matrix of zeros whose dimensions depend on the context. For a random variable X and ν ≥ 1, denote ‖X‖ν = (E|X|^ν)^{1/ν}. Given a matrix A = [a_{ii′}, 1 ≤ i ≤ m, 1 ≤ i′ ≤ n], we denote by A* its transposed complex conjugate. We define its element-wise l∞-, l1- and l2-norms by |A|∞ = max_{i,i′} |a_{ii′}|, |A|1 = Σ_{i,i′} |a_{ii′}| and |A|2 = (Σ_{i,i′} |a_{ii′}|²)^{1/2}, and its spectral and induced L1- and L∞-norms by ‖A‖, ‖A‖1 = max_{1≤i′≤n} Σ_{i=1}^{m} |a_{ii′}| and ‖A‖∞ = max_{1≤i≤m} Σ_{i′=1}^{n} |a_{ii′}|, respectively. For positive definite A, we denote its minimum eigenvalue by ‖A‖min. For two real numbers, a∨b = max(a, b) and a∧b = min(a, b). For two sequences {a_n} and {b_n}, we write a_n ≍ b_n if, for some constants C1, C2 > 0, there exists N ∈ ℕ such that C1 ≤ a_n b_n^{−1} ≤ C2 for all n ≥ N.
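As a concrete illustration of the matrix norms above, the following sketch (ours, not part of the paper; helper names are our own) evaluates them on a small matrix:

```python
import numpy as np

# Illustration of the norms defined above (our own helper names).
def elementwise_linf(A):   # |A|_inf = max_{i,i'} |a_{ii'}|
    return np.max(np.abs(A))

def induced_l1(A):         # ||A||_1 = maximum absolute column sum
    return np.max(np.abs(A).sum(axis=0))

def induced_linf(A):       # ||A||_inf = maximum absolute row sum
    return np.max(np.abs(A).sum(axis=1))

def spectral(A):           # ||A|| = largest singular value
    return np.linalg.norm(A, 2)

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
print(elementwise_linf(A))  # 4.0
print(induced_l1(A))        # max(1+3, 2+4) = 6.0
print(induced_linf(A))      # max(1+2, 3+4) = 7.0
```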

2 Piecewise Stationary Factor-Adjusted VAR Model

2.1 Background

A zero-mean, p-variate process ξt follows a VAR(d) model if it satisfies
(1) ξt = A1 ξ_{t−1} + … + Ad ξ_{t−d} + Γ^{1/2} εt,
where Al ∈ R^{p×p}, 1 ≤ l ≤ d, determine how future values of the series depend on their past. The p-variate random vector εt = (ε1t, …, εpt)′ has entries εit which are iid for all i and t with E(εit) = 0 and var(εit) = 1. The positive definite matrix Γ ∈ R^{p×p} is the covariance matrix of the innovations of the VAR process.
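As a minimal illustration of (1), the following sketch (our own, in Python with numpy rather than the authors' R code; all settings are illustrative) simulates a stationary VAR(1) with Γ = I and recovers A1 by least squares:

```python
import numpy as np

# Minimal sketch of (1): simulate a stationary VAR(1) with Gamma = I and
# recover A1 by regressing xi_t on xi_{t-1}. Settings are illustrative only.
rng = np.random.default_rng(0)
p, n, burn = 3, 2000, 200
A1 = 0.5 * np.eye(p)                      # spectral radius 0.5 < 1: stationary
xi = np.zeros((n + burn, p))
for t in range(1, n + burn):
    xi[t] = A1 @ xi[t - 1] + rng.standard_normal(p)
xi = xi[burn:]                            # discard burn-in

# Least-squares estimate: solve xi_t ~ A1 xi_{t-1} column-by-column
B, *_ = np.linalg.lstsq(xi[:-1], xi[1:], rcond=None)
A1_hat = B.T                              # B estimates A1'
```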

A factor-driven component exhibits strong cross-sectional and/or serial correlations by “loading” finite-dimensional factors linearly. Among many, the generalized dynamic factor model (GDFM, Forni et al. 2000, 2015) provides the most general approach (see Appendix D for further discussions), and defines the p-variate factor-driven component χt as
(2) χt = B(L)ut = Σ_{l=0}^{∞} Bl u_{t−l}.

For fixed q, the q-variate random vector ut = (u1t, …, uqt)′ contains the common factors which are shared across the variables and time, and the ujt are assumed to be iid for all j and t with E(ujt) = 0 and var(ujt) = 1. The matrix of square-summable filters B(L) = Σ_{l=0}^{∞} Bl L^l, with the lag operator L and Bl ∈ R^{p×q}, serves the role of loadings under (2).

Barigozzi, Cho, and Owens (2022) propose a factor-adjusted VAR model, where the observations are assumed to be decomposed as a sum of the two latent components ξt and χt in (1)–(2), with pervasive correlations in the data accounted for by χt and the remaining dependence captured by ξt. In the next section, we introduce its piecewise stationary extension where both the factor-driven and VAR processes are allowed to undergo structural changes.

2.2 Model

We observe a zero-mean, p-variate piecewise stationary process Xt = χt + ξt where
(3) χt = χt[k] = B[k](L)ut for θχ,k + 1 ≤ t ≤ θχ,k+1, 0 ≤ k ≤ Kχ, and
ξt = ξt[k] = Σ_{l=1}^{d} Al[k] ξ_{t−l} + (Γ[k])^{1/2} εt for θξ,k + 1 ≤ t ≤ θξ,k+1, 0 ≤ k ≤ Kξ.

Here, θχ,k, 1 ≤ k ≤ Kχ, denote the change points in the piecewise stationary factor-driven component χt such that at each θχ,k, the filter of loadings B[k](L) undergoes a change. We permit the factor number to vary over time as qk ≤ q, with the factor ut[k] ∈ R^{qk} associated with χt[k] being a sub-vector of ut ∈ R^q. Similarly, θξ,k, 1 ≤ k ≤ Kξ, denote the change points in the piecewise stationary VAR process ξt at which the VAR parameters undergo shifts; we permit the VAR innovation covariance matrix to vary as Γ[k] but our interest lies in detecting changes in the VAR parameters, and the VAR order may vary over time as dk ≤ d with Al[k] = O for l ≥ dk + 1. By convention, we denote θχ,0 = θξ,0 = 0 and θχ,Kχ+1 = θξ,Kξ+1 = n. In line with the factor modeling literature, we assume that χt and ξt are uncorrelated through having E(ujt εit′) = 0 for any i, j, t and t′.

The model (3) does not require that the change points in χt and ξt are aligned, or that Kχ = Kξ. Our goal is to estimate the total number and locations of the change points for both of the piecewise stationary latent processes. Importantly, we allow {ξt[k], t ∈ Z} (resp. {χt[k], t ∈ Z}) to be dependent across k through sharing the innovations εt (resp. ut). This makes our model considerably more general than those found in the literature on (high-dimensional) data segmentation under VAR models (Wang et al. 2019; Safikhani and Shojaie 2022; Bai, Safikhani, and Michailidis 2022) which assume independence across the segments. Data segmentation under factor models has been considered by Barigozzi, Cho, and Fryzlewicz (2018) and Li, Li, and Fryzlewicz (2023) but they adopt a static approach to factor modeling.

2.3 Assumptions

We introduce assumptions that ensure the (asymptotic) identifiability of the two latent processes in (3), which are framed in terms of spectral properties, as well as controlling the degree of dependence in the data. Denote by Γχ[k](l) = E(χ_{t−l}[k](χt[k])′) the ACV matrix of χt[k] at lag l ∈ Z, and its spectral density matrix at frequency ω ∈ [−π, π] by Σχ[k](ω) = (2π)^{−1} Σ_{l=−∞}^{∞} Γχ[k](l) e^{−ιlω} with ι = √−1. Then, μχ,j[k](ω), 1 ≤ j ≤ qk, denote the real, positive eigenvalues of Σχ[k](ω) ordered by decreasing size. We similarly define Γξ[k](l), Σξ[k](ω) and μξ,j[k](ω) for ξt[k].

Assumption 2.1.

For each 0 ≤ k ≤ Kχ, the following holds: there exist a positive integer p0 ≥ 1, pairs of functions ω ↦ αj[k](ω) and ω ↦ βj[k](ω) for ω ∈ [−π, π] and 1 ≤ j ≤ qk, and rk,j ∈ (0, 1] satisfying rk,1 ≥ … ≥ rk,qk, such that for all p ≥ p0,
β1[k](ω) ≥ μχ,1[k](ω)/p^{rk,1} ≥ α1[k](ω) > … > βqk[k](ω) ≥ μχ,qk[k](ω)/p^{rk,qk} ≥ αqk[k](ω) > 0.

If rk,j = 1 for all 1 ≤ j ≤ qk, as frequently assumed in the literature (Fan, Liao, and Mincheva 2013; Forni et al. 2015), we are in the presence of qk factors which are equally pervasive for the whole cross-section of χt[k]. If rk,j < 1 for some j, we permit the presence of ‘weak’ factors. Since our primary interest lies in change point analysis, we later introduce a related but distinct condition on the size of change in χt in Assumption 4.2.

Assumption 2.2.

  1. det(I − Σ_{l=1}^{d} Al[k] z^l) ≠ 0 for all |z| ≤ 1 and 0 ≤ k ≤ Kξ.

  2. mε ≤ min_{0≤k≤Kξ} ‖Γ[k]‖min ≤ max_{0≤k≤Kξ} ‖Γ[k]‖ ≤ Mε for some constants 0 < mε ≤ Mε.

  3. Consider the Wold decomposition ξt[k] = Σ_{l=0}^{∞} Dl[k](Γ[k])^{1/2} ε_{t−l} where Dl[k] = [Dl,ij[k], 1 ≤ i, j ≤ p]. Then, there exist constants Ξ > 0 and ς > 2 such that we have Cij, 1 ≤ i, j ≤ p, satisfying max{max_{1≤j≤p} Σ_{i=1}^{p} Cij, max_{1≤i≤p} Σ_{j=1}^{p} Cij, max_{1≤i≤p} Σ_{j=1}^{p} Cij²} ≤ Ξ, with which max_{0≤k≤Kξ} |Dl,ij[k]| ≤ Cij(1 + l)^{−ς} for all l ≥ 0.

  4. min_{0≤k≤Kξ} inf_{ω∈[−π,π]} μξ,p[k](ω) ≥ mξ for some fixed constant mξ > 0.

Assumption 2.3.

There exist constants Ξ > 0 and ς > 2 such that for all l ≥ 0, max_{0≤k≤Kχ} max_{1≤i≤p} |Bl,i·[k]|2 ≤ Ξ(1 + l)^{−ς} and max_{0≤k≤Kχ} max_{1≤j≤qk} |Bl,·j[k]|2 ≤ Ξ(1 + l)^{−ς}.

Assumptions 2.2(i)–(ii) are standard conditions in the literature (Lütkepohl 2005; Basu and Michailidis 2015). Under condition (iii) and Assumption 2.3, the time-varying serial dependence in Xt (across all segments) decays at an algebraic rate. Assumption 2.2(iii) allows for mild cross-correlations in ξt[k] while ensuring that μξ,1[k](ω) is uniformly bounded:

Proposition 2.1.

Under Assumption 2.2, there exists some Mξ > 0, depending only on Mε, Ξ and ς, such that max_{0≤k≤Kξ} sup_{ω∈[−π,π]} μξ,1[k](ω) ≤ Mξ.

Remark 2.1.

Proposition 2.1, together with Assumption 2.2(iv), establishes the boundedness of the eigenvalues of Σξ[k](ω), which is commonly assumed in the high-dimensional VAR literature for the consistency of Lasso estimators. Assumption 2.2(iv) holds if there exists some constant Ξ < ∞ satisfying max(max_{1≤j≤p} Σ_{l=1}^{d} |Al,·j[k]|1, max_{1≤i≤p} Σ_{l=1}^{d} |Al,i·[k]|1) ≤ Ξ (Basu and Michailidis 2015). When d = 1, we have Dl[k] = (A1[k])^l such that if max(‖A1[k]‖1, ‖A1[k]‖∞) ≤ γ for some γ ∈ (0, 1), Assumption 2.2(iii) is readily satisfied with max(‖Dl[k]‖1, ‖Dl[k]‖∞) ≤ γ^l.

From Assumption 2.1 and Proposition 2.1, the latent components in (3) are asymptotically identifiable as p → ∞, thanks to the gap between μχ,qk[k](ω), which diverges with p, and μξ,1[k](ω), which is uniformly bounded; this agrees with the phenomenon observed in Figure 1(a).

3 Methodology

3.1 Stage 1: Factor-Driven Component Segmentation

3.1.1 Change Point Detection

The spectral density matrix of χt is given by Σχ[k](ω) = (2π)^{−1} B[k](e^{−ιω})(B[k](e^{−ιω}))* for θχ,k + 1 ≤ t ≤ θχ,k+1, that is, it varies over time in a piecewise constant manner with change points at θχ,k, 1 ≤ k ≤ Kχ. By Weyl’s inequality, Assumption 2.1 and Proposition 2.1 jointly indicate a gap in the eigenvalues of the (time-varying) spectral density matrix of Xt, that is, those attributed to the factor-driven component diverge with p while the remaining ones are bounded for all p. This suggests an approach that looks for changes in χt from the behavior of Xt in the frequency domain, which we further justify below.

Example 3.1.

Suppose that χt contains a single change point at t = θχ,1 at which a new factor is introduced, that is, χt[0] = B[0](L)ut[0] and χt[1] = B[1](L)ut[1] = B[0](L)ut[0] + b(L)vt with ut[1] = ((ut[0])′, vt)′, which leads to Σχ[1](ω) − Σχ[0](ω) = b(e^{−ιω})b*(e^{−ιω})/(2π). Then, from the uncorrelatedness between χt and ξt and Proposition 2.1, the time-varying spectral density of Xt, say Σx,t(ω), satisfies
‖θχ,1^{−1} Σ_{t=1}^{θχ,1} Σx,t(ω) − (n − θχ,1)^{−1} Σ_{t=θχ,1+1}^{n} Σx,t(ω)‖ = ‖b(e^{−ιω})b*(e^{−ιω})‖/(2π) + O(1).
That is, the change in the spectral density of χt is detectable as a change in the time-varying spectral density matrix of Xt in operator norm, with the size of change diverging with p as ‖(b(e^{−ιω}))*b(e^{−ιω})‖ does so under Assumption 2.1.
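A numeric check of the rank-one structure in Example 3.1, in a simplified special case of our own making (the new factor vt is white noise loaded statically through a vector b, so the spectral density changes by bb′/(2π) at every frequency):

```python
import numpy as np

# Simplified check of Example 3.1 (our simplification: static loading b,
# white-noise new factor): the spectral density change is b b' / (2*pi).
p = 50
b = np.ones(p)                          # pervasive loading: |b|^2 = p
Delta = np.outer(b, b) / (2 * np.pi)    # Sigma_chi^[1](w) - Sigma_chi^[0](w)
eig = np.sort(np.linalg.eigvalsh(Delta))[::-1]
# Leading eigenvalue equals |b|^2/(2*pi), i.e. it grows linearly in p;
# all remaining eigenvalues vanish (a rank-one change).
```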

Thus, we detect changes in χt by scanning for any large change in the spectral density matrix of Xt measured in operator norm, and propose the following moving window-based approach. Given a bandwidth G, we estimate the local spectral density matrix of Xt by
(4) Σ̂x,v(ω, G) = (2π)^{−1} Σ_{l=−m}^{m} K(l/m) Γ̂x,v(l, G) exp(−ιlω) for G ≤ v ≤ n,
where K(·) denotes the Bartlett kernel, m ≍ G^β the kernel bandwidth with β ∈ (0, 1), and
(5) Γ̂x,v(l, G) = G^{−1} Σ_{t=v−G+1+l}^{v} X_{t−l} Xt′ for l ≥ 0, and Γ̂x,v(l, G) = (Γ̂x,v(−l, G))′ for l < 0.
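The estimators (4)–(5) can be sketched as follows (our own Python illustration with 0-based time indices, rather than the paper's 1-based convention; β = 0.5 is an illustrative choice):

```python
import numpy as np

# Sketch of the local estimators in (4)-(5); X is an (n, p) array and time
# indices are 0-based (the paper uses 1-based t).
def local_acv(X, v, l, G):
    # Gamma_hat_{x,v}(l, G) for l >= 0: average X_{t-l} X_t' over the window
    ts = np.arange(v - G + l, v)           # 0-based version of t = v-G+1+l, ..., v
    return sum(np.outer(X[t - l], X[t]) for t in ts) / G

def local_spectral(X, v, G, beta=0.5):
    m = max(1, int(G ** beta))             # kernel bandwidth m ~ G^beta
    p = X.shape[1]
    acv = {l: local_acv(X, v, l, G) for l in range(m + 1)}
    def gamma(l):                          # Gamma(-l) = Gamma(l)'
        return acv[l] if l >= 0 else acv[-l].T
    omegas = 2 * np.pi * np.arange(m + 1) / (2 * m + 1)
    S = np.zeros((m + 1, p, p), dtype=complex)
    for j, w in enumerate(omegas):
        for l in range(-m, m + 1):
            S[j] += (1 - abs(l) / m) * gamma(l) * np.exp(-1j * l * w)  # Bartlett
        S[j] /= 2 * np.pi
    return omegas, S

rng = np.random.default_rng(7)
X = rng.standard_normal((300, 2))          # white noise: spectrum ~ I/(2*pi)
omegas, S = local_spectral(X, v=200, G=150)
```

By construction each S[j] is Hermitian, since the Bartlett weights are symmetric in the lag and Γ̂(−l) = Γ̂(l)′.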

Then the following statistic
(6) Tχ,v(ω, G) = ‖Σ̂x,v(ω, G) − Σ̂x,v+G(ω, G)‖, G ≤ v ≤ n − G,
serves as a good proxy of the difference in local spectral density matrices of χt over Iv(G) = {v − G + 1, …, v} and Iv+G(G) = {v + 1, …, v + G}. To make it more precise, let Σχ,v(ω, G) denote the weighted average Σ_{k=0}^{Kχ} wχ,k(v) Σχ[k](ω) with weights wχ,k(v) corresponding to the proportion of χt, t ∈ Iv(G), belonging to χt[k] (see (F.1)). Then, Tχ,v*(ω, G) = ‖Σχ,v(ω, G) − Σχ,v+G(ω, G)‖, as a function of v, linearly increases and then decreases around the change points, with a peak of size ‖Σχ[k](ω) − Σχ[k+1](ω)‖ formed at v = θχ,k for all 1 ≤ k ≤ Kχ, provided that the bandwidth G is not too large (in the sense of Assumption 4.2(ii)). The detector statistic Tχ,v(ω, G) is designed to approximate Tχ,v*(ω, G) when χt is not directly observed, and thus is well-suited to detect and locate the change points therein. Unlike other methods for detecting changes in the factor structure (e.g., Li, Li, and Fryzlewicz 2023), we do not require the number of factors, either for each segment or for the whole dataset, as an input for the construction of Tχ,v(ω, G).

Once Tχ,v(ωl, G) is evaluated at the Fourier frequencies ωl = 2πl/(2m + 1), 0 ≤ l ≤ m, we adapt the maximum-check of Eichinger and Kirch (2018) for simultaneous detection of the multiple change points. Taking the pointwise maximum over the frequencies at each given location v, we check if Tχ,v(ω(v), G) exceeds some threshold κn,p, where ω(v) denotes the frequency at which Tχ,v(ωl, G) is maximized, that is, ω(v) = argmax_{ωl: 0≤l≤m} Tχ,v(ωl, G). If so, it provides evidence that a change point θχ,k is located near the time point v, but some care is needed to avoid detecting duplicate estimators, since the detector statistic is expected to take a large value over an interval containing θχ,k. Therefore, denoting by I ⊂ {G, …, n − G} the set containing all time points at which Tχ,v(ω(v), G) > κn,p, we regard θ̂ = argmax_{v∈I} Tχ,v(ω(v), G) as a change point estimator if it is a local maximizer of Tχ,v(ω(θ̂), G) within an interval of radius ηG centered at θ̂ with some η ∈ (0, 1), that is, Tχ,θ̂(ω(θ̂), G) ≥ max_{θ̂−ηG<v≤θ̂+ηG} Tχ,v(ω(θ̂), G). Once θ̂ is added to the set of final estimators, say Θ̂χ, in order to avoid the risk of duplicate estimators, we remove the interval of radius G centered at θ̂ from I, and repeat the same procedure with the maximizer of Tχ,v(ω(v), G) over the time points v remaining in I until the set I is empty. Algorithm 1 in Appendix C outlines the steps of Stage 1 of FVARseg.
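The iterative threshold-and-local-maximum rule above can be sketched on a generic detector sequence (a toy tent-shaped T rather than the spectral statistic; tie-breaking and the handling of rejected candidates are simplified relative to Algorithm 1):

```python
import numpy as np

# Simplified sketch of the peak-selection rule described above, applied to a
# generic detector sequence T[v]; kappa and the radii follow the text.
def select_peaks(T, G, kappa, eta=0.5):
    n = len(T)
    active = set(v for v in range(G, n - G) if T[v] > kappa)
    estimates = []
    while active:
        v_star = max(active, key=lambda v: T[v])
        lo, hi = max(0, v_star - int(eta * G)), min(n, v_star + int(eta * G) + 1)
        if T[v_star] >= max(T[lo:hi]):                    # local maximizer check
            estimates.append(v_star)
        active -= set(range(v_star - G, v_star + G + 1))  # remove radius-G interval
    return sorted(estimates)

# Toy detector with two tent-shaped peaks at v = 100 and v = 250
n, G = 400, 50
T = np.zeros(n)
for theta in (100, 250):
    for v in range(n):
        T[v] = max(T[v], 1.0 - abs(v - theta) / G)   # peak height 1, width G
print(select_peaks(T, G, kappa=0.5))  # -> [100, 250]
```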

3.1.2 Post-Segmentation Factor Adjustment

Following the detection of change points in χt, we are able to estimate the segment-specific quantities related to χt[k]. In view of the second stage of FVARseg, which detects change points in ξt, we describe how to estimate Γχ[k](l), with which we can estimate the ACV of ξt.

For each k = 0, …, K̂χ, we first estimate the spectral density of Xt over the segment {θ̂χ,k + 1, …, θ̂χ,k+1} by Σ̂x[k](ω) as in (4), using the sample ACV computed from the segment (we use the same kernel bandwidth m for simplicity). Then, noting that the spectral density matrix of χt[k] is of rank qk under (3), we estimate it from the eigendecomposition of Σ̂x[k](ωl) by retaining only the qk largest eigenvalues, say μ̂x,j[k](ωl), and the associated eigenvectors êx,j[k](ωl), and then estimate the ACV of χt[k] by the inverse Fourier transform, that is,
(7) Σ̂χ[k](ωl) = Σ_{j=1}^{qk} μ̂x,j[k](ωl) êx,j[k](ωl)(êx,j[k](ωl))* and Γ̂χ[k](l) = 2π/(2m + 1) Σ_{l′=−m}^{m} Σ̂χ[k](ωl′) e^{ιωl′l}.

The estimators in (7) require the factor number qk as an input. We refer to Hallin and Liška (2007) for an information criterion (IC)-based estimator of qk that makes use of the postulated eigengap in the spectral density matrix of Xt.
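The eigen-truncation and inverse Fourier transform in (7) can be sketched as follows (our own illustration; the spectral estimates `S_hat` and the factor number `q` are taken as given inputs):

```python
import numpy as np

# Sketch of (7). S_hat[l'] estimates the spectral density at frequency
# omega_{l'} = 2*pi*l'/(2m+1), l' = -m, ..., m; q is the factor number.
def factor_acv(S_hat, m, q, lags):
    p = S_hat[0].shape[0]
    omegas = [2 * np.pi * lp / (2 * m + 1) for lp in range(-m, m + 1)]
    S_chi = []
    for S in S_hat:
        vals, vecs = np.linalg.eigh(S)          # ascending eigenvalues
        vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder to descending
        S_chi.append((vecs[:, :q] * vals[:q]) @ vecs[:, :q].conj().T)
    acv = {}
    for l in lags:
        A = np.zeros((p, p), dtype=complex)
        for w, S in zip(omegas, S_chi):
            A += S * np.exp(1j * w * l)         # inverse Fourier transform
        acv[l] = 2 * np.pi / (2 * m + 1) * A
    return acv

# Toy input: a constant rank-one spectrum v v' at every frequency
m = 2
v = np.array([1.0, 2.0, -1.0])
S_hat = [np.outer(v, v) for _ in range(2 * m + 1)]
acv = factor_acv(S_hat, m, q=1, lags=[0, 1])
# acv[0] recovers 2*pi * v v'; acv[1] vanishes (the spectrum is flat in omega)
```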

3.2 Stage 2: Piecewise VAR Process Segmentation

Applying the existing VAR segmentation methods in our setting requires estimating the n × p elements of the latent piecewise stationary VAR process ξt, which introduces additional errors and possibly results in the loss of statistical efficiency. In addition, as discussed in Appendix A.2, the existing methods tend to be computationally demanding, for example by evaluating the Lasso estimators O(n²) times in a dynamic programming algorithm, or by solving a large fused Lasso objective function of dimension np²d. Instead, since we can estimate the local ACV of ξt from the post-segmentation factor-adjustment in Stage 1, our proposed methodology for segmenting the latent VAR component avoids estimating ξt directly. Also, as described below, the proposed method evaluates the local VAR parameters at carefully selected locations only, and thus is computationally efficient.

Specifically, our approach makes use of the Yule-Walker equation (Lütkepohl 2005). Let β[k] = [A1[k], …, Ad[k]]′ ∈ R^{(pd)×p} contain all VAR parameters in the kth segment. Then, it is related to the ACV matrices Γξ[k](l) = E(ξ_{t−l}[k](ξt[k])′) as G[k]β[k] = g[k], where
(8) G[k] = [ Γξ[k](0) Γξ[k](−1) ⋯ Γξ[k](−d+1) ; ⋮ ⋱ ⋮ ; Γξ[k](d−1) Γξ[k](d−2) ⋯ Γξ[k](0) ] and g[k] = [ Γξ[k](1) ; ⋮ ; Γξ[k](d) ],
with G[k] being invertible due to Assumption 2.2(iv). We propose to use this estimating equation in combination with the local ACV estimators of ξt obtained as described below.
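To make (8) concrete in the simplest case d = 1 (where G[k] = Γξ[k](0), g[k] = Γξ[k](1) and β[k] = (A1[k])′), the following sketch (ours, illustrative settings) recovers A1 from the first two sample autocovariances of a simulated VAR(1):

```python
import numpy as np

# Sketch of (8) with d = 1: solve Gamma(0) beta = Gamma(1) and read off
# A1 = beta'. Settings are illustrative only.
rng = np.random.default_rng(1)
p, n = 4, 20000
A1 = np.diag([0.6, 0.4, -0.3, 0.5])         # stationary: spectral radius < 1
xi = np.zeros((n, p))
for t in range(1, n):
    xi[t] = A1 @ xi[t - 1] + rng.standard_normal(p)

Gamma0 = xi[:-1].T @ xi[:-1] / (n - 1)      # sample ACV at lag 0
Gamma1 = xi[:-1].T @ xi[1:] / (n - 1)       # sample E(xi_{t-1} xi_t')
beta_hat = np.linalg.solve(Gamma0, Gamma1)  # solves G beta = g
A1_hat = beta_hat.T                         # beta = A1'
```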

For a given bandwidth G and the interval Iv(G) = {v − G + 1, …, v}, we estimate the ACV of ξt for t ∈ Iv(G) by Γ̂ξ,v(l, G) = Γ̂x,v(l, G) − Γ̂χ,v(l, G). Here, Γ̂x,v(l, G) is defined in (5) and Γ̂χ,v(l, G) is a weighted average of Γ̂χ[k](l), 0 ≤ k ≤ K̂χ, the estimators of the ACV of χt[k] in (7), with the weights given by the proportion of Iv(G) covered by the kth segment (see (F.16) for the precise definition). Replacing Γξ[k](l) with Γ̂ξ,v(l, G), we obtain Ĝv(G) estimating a weighted average of G[k], and similarly ĝv(G). Then, we propose to scan
Tξ,v(β̂, G) = |||(Ĝv(G)β̂ − ĝv(G)) − (Ĝv+G(G)β̂ − ĝv+G(G))|||
with some inspection parameter β̂ ∈ R^{(pd)×p} and a matrix norm |||·|||. We motivate this statistic by considering its population counterpart Tξ,v*(β̂, G) = |||(Gv(G)β̂ − gv(G)) − (Gv+G(G)β̂ − gv+G(G))|||. With appropriately chosen G (see Assumption 4.4(ii)), Tξ,v*(β̂, G) = 0 if v is far from all the change points in ξt, that is, mink |v − θξ,k| ≥ G, while it is tent-shaped near the change points with a local maximum at v = θξ,k, provided that
(9) G[k−1](β̂ − β[k−1]) ≠ G[k](β̂ − β[k]).

For the inspection parameter, we adopt an l1-regularized Yule-Walker estimator which, at some v° ∈ {G, …, n}, solves the following constrained l1-minimization problem
(10) β̂v°(G) = argmin_{β∈R^{(pd)×p}} |β|1 subject to |Ĝv°(G)β − ĝv°(G)|∞ ≤ λn,p,
with a tuning parameter λn,p > 0. Assuming stationarity, a similar approach has been proposed for the estimation of high-dimensional VAR models in Han, Lu, and Liu (2015), and extended to the case where the VAR process of interest is latent in Barigozzi, Cho, and Owens (2022). The l∞-constraint in (10) naturally leads to the choice |||·||| = |·|∞, resulting in the following detector statistic: Tξ,v(β̂, G) = |(Ĝv(G)β̂ − ĝv(G)) − (Ĝv+G(G)β̂ − ĝv+G(G))|∞.
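A worked special case of (10), of our own making, gives intuition for the geometry: when Ĝv°(G) = I, the program decouples entrywise and its solution is soft-thresholding of ĝv°(G) at level λ:

```python
import numpy as np

# Special case of (10) with G_hat = I (our illustration): minimizing |beta|_1
# subject to |beta - g_hat|_inf <= lam decouples entrywise, and each entry is
# the soft-thresholded value of g_hat.
def soft_threshold(g, lam):
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

g_hat = np.array([1.0, 0.05, -0.5])
beta_hat = soft_threshold(g_hat, 0.1)   # entries smaller than lam are zeroed
```

This illustrates how the l∞-constraint shrinks small entries of the Yule-Walker solution exactly to zero, yielding a sparse inspection parameter.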

For good detection power, the condition in (9) suggests using an estimator of β[k−1] or β[k] in place of β̂ for detecting θξ,k. Therefore, we propose to evaluate Tξ,v(β̂v°(G), G) for v ≥ v°, with β̂v°(G) updated sequentially at locations strategically selected as follows.

First, we estimate β[0] by β̂ = β̂G(G) in (10) with v° = G and scan the data using Tξ,v(β̂, G), v ≥ v°. When Tξ,v(β̂, G) exceeds some threshold, say πn,p, at v = θˇ for the first time, it signifies that a change has occurred in the neighborhood. Reducing the search for a change point to {θˇ, …, θˇ + G}, we identify a change point estimator as the local maximizer θ̂ξ,1 = argmax_{θˇ≤v≤θˇ+G} Tξ,v(β̂, G). Then, updating β̂ with β̂v°(G) obtained at v° = θ̂ξ,1 + (η + 1)G for some η ∈ (0, 1] (i.e., only using an interval of length G located strictly to the right of θ̂ξ,1 for its computation), we continue screening Tξ,v(β̂, G), v ≥ v°, until it next exceeds πn,p. These steps of screening Tξ,v(β̂, G) and updating β̂ are repeated iteratively until the end of the data sequence is reached. Algorithm 2 in Appendix C outlines the steps of the Stage 2 methodology.
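The scan-detect-reestimate control flow above can be sketched generically (our toy uses a piecewise-constant mean rather than a VAR, and `detector`/`estimate` stand in for Tξ,v(β̂, G) and β̂v°(G); only the sequential logic is illustrated):

```python
import numpy as np

# Sketch of the sequential Stage 2 control flow. detector(v, beta) stands in
# for T_{xi,v}(beta_hat, G) and estimate(v0) for beta_hat_{v0}(G); the toy
# versions below only illustrate the scan-detect-reestimate loop.
def stage2_scan(n, G, detector, estimate, pi_thresh, eta=1.0):
    cps, v0 = [], G
    beta = estimate(v0)
    v = v0
    while v <= n - G:
        if detector(v, beta) > pi_thresh:          # first exceedance: v = theta_check
            window = range(v, min(v + G, n - G) + 1)
            theta_hat = max(window, key=lambda u: detector(u, beta))
            cps.append(theta_hat)
            v0 = theta_hat + int((eta + 1) * G)    # re-estimate strictly to the right
            if v0 > n - G:
                break
            beta, v = estimate(v0), v0
        else:
            v += 1
    return cps

# Toy example: a mean shift at t = 200 in a piecewise-constant sequence
mu = np.concatenate([np.zeros(200), np.ones(200)])
n, G = len(mu), 50
detector = lambda v, beta: abs(mu[v:v + G].mean() - beta)
estimate = lambda v0: mu[v0 - G:v0].mean()
print(stage2_scan(n, G, detector, estimate, pi_thresh=0.25))  # -> [200]
```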

Figure 2 illustrates that although ξt is latent, at each iteration, β̂v°(G) does as well as its oracle counterpart (obtained as in (10) with the sample ACV of ξt replacing Γ̂ξ,v(l, G)). Computationally, this strategy benefits from the fact that the costly solution of the l1-minimization problem in (10) is required (at most) Kξ + 1 times with an appropriately chosen threshold πn,p (see Theorem 4.3). We further demonstrate numerically the competitiveness of Stage 2 as a standalone method for VAR time series segmentation in Section 5.2, and provide an in-depth comparative study with the existing methods in Appendix A.2.

Fig. 2 Illustration of Stage 2 applied to a realization from (M1) of Section 5.2 with G = 300 and d = 1. Top: The solid curve represents Tξ,v(β̂,G), v°≤v≤θˇ+G, computed at the three iterations of Steps 1–3 of Algorithm 2. At each iteration, we use β̂=β̂v°(G) estimated from each of the sections of the data highlighted in the x-axis (left to right); the corresponding estimators are plotted in the bottom panel and, for comparison, we also plot the estimators obtained in the oracle setting where ξt is observable (all plots have the identical z-axis range). The locations of v°, θˇ and θ̂ in Algorithm 2, and θξ,k, are denoted by the vertical long-dashed, dot-dashed, dotted and dashed lines, respectively. The horizontal line represents πn,p chosen as described in Section 5.1.

4 Theoretical Properties

4.1 Consistency of Stage 1 of FVARseg

We carry out our theoretical investigation under two different regimes with respect to the tail behavior of ut and εt; in particular, the weaker condition in Assumption 4.1(i) permits heavy-tailed innovations, while the existing literature on (piecewise stationary) VAR modeling in high dimensions commonly adopts Gaussianity as in (ii).

Assumption 4.1.

We assume either of the following conditions.

  1. There exists ν > 4 such that max{E(|ujt|^ν), E(|εit|^ν)} ≤ μν < ∞.

  2. ut ∼ iid Nq(0, I) and εt ∼ iid Np(0, I).

In establishing the consistency of Stage 1, we opt to measure the size of changes in χt using Δχ,k(ω) = Σχ[k](ω) − Σχ[k−1](ω), 1 ≤ k ≤ Kχ, the difference in the spectral density matrices of χt from neighboring segments. As Δχ,k(ω) is Hermitian, we can always find the jth largest (in modulus), real-valued eigenvalue of Δχ,k(ω), which we denote by μj(Δχ,k(ω)), with μ1(Δχ,k(ω)) = ‖Δχ,k(ω)‖. Recall that m ≍ G^β for some β ∈ (0, 1) denotes the bandwidth used in local spectral density estimation, see (4).

Assumption 4.2.

  1. For each 1 ≤ k ≤ Kχ, the following holds: there exist a positive integer p0 ≥ 1, pairs of functions ω ↦ aj[k](ω) and ω ↦ bj[k](ω) for ω ∈ [−π, π] and j = 1, 2, and rk,1 ∈ (0, 1] and rk,2 ∈ [0, 1] satisfying rk,1 ≥ rk,2, such that
b1[k](ω) ≥ μ1(Δχ,k(ω))/p^{rk,1} ≥ a1[k](ω) > b2[k](ω) ≥ μ2(Δχ,k(ω))/p^{rk,2} ≥ a2[k](ω) ≥ 0 for all p ≥ p0.
Besides, we assume that the functions ω ↦ p^{−rk,1} μ1(Δχ,k(ω)) are Lipschitz continuous with bounded Lipschitz constants. Then, for Δχ,k = max_{ω∈[−π,π]} μ1(Δχ,k(ω)), we have max_{1≤k≤Kχ} Δχ,k^{−1} · p(ψn ∨ m^{−1}) = o(1), where
(11) ψn = n^{2/ν} √m log^{2+2/ν}(G)/G ∨ √(m log(n)/G) under Assumption 4.1(i), and ψn = √(m log(n)/G) under Assumption 4.1(ii).

  2. The bandwidth G = Gn satisfies Gn → ∞ as n → ∞ while fulfilling
(12) min{min_{0≤k≤Kχ}(θχ,k+1 − θχ,k), min_{0≤k≤Kξ}(θξ,k+1 − θξ,k)} ≥ 2G.

Assumption 4.2 specifies the detection lower bound, determined by mink Δχ,k and mink(θχ,k+1 − θχ,k) (through G), for all Kχ change points in χt to be detectable by Stage 1. In the literature on change point detection in factor models, a common assumption is that the change is large enough to appear as extra “factors” (Duan, Bai, and Han 2022), in light of which condition (i) is a reasonable one. It further requires μ1(Δχ,k(ω)) to be distinct from the rest. In fact, the remaining μj(Δχ,k(ω)), j ≥ 2, are allowed to be exactly zero, which is the case in Example 3.1; here, we have Δχ,1 = max_ω (2π)^{−1}‖(b(e^{−ιω}))*b(e^{−ιω})‖, where b(z) is a p-variate vector of factor loading filters. The rate p(ψn ∨ m^{−1}) represents the bias-variance tradeoff when estimating the local spectral density matrix of χt by Σ̂x,v(ω, G) (see Proposition F.6). It is possible to find the rate of the kernel bandwidth m that minimizes this rate depending on the tail behavior of Xit (e.g., m ≍ (G/log(n))^{1/3} under Gaussianity), but we choose to explicitly highlight the role of this tuning parameter in our results.

Theorem 4.1.

Suppose that Assumptions 2.1–2.3, 4.1, and 4.2 hold. Let $\kappa_{n,p}$ satisfy
$$2Mp(\psi_n \vee m^{-1}) < \kappa_{n,p} < \frac{1}{2} \min_{1 \le k \le K_\chi} \Delta_{\chi,k} - Mp(\psi_n \vee m^{-1})$$
for some constant $M > 0$. Then, there exists a set $\mathcal{M}_{n,p}^\chi$ with $P(\mathcal{M}_{n,p}^\chi) \to 1$ as $n, p \to \infty$, such that the following holds for $\hat\Theta_\chi = \{\hat\theta_{\chi,k},\ 1 \le k \le \hat K_\chi : \hat\theta_{\chi,1} < \cdots < \hat\theta_{\chi,\hat K_\chi}\}$ returned by Stage 1 of FVARseg, on $\mathcal{M}_{n,p}^\chi$ for large enough $n$ and $p$:

  (a) $\hat K_\chi = K_\chi$ and $\max_{1 \le k \le K_\chi} |\hat\theta_{\chi,k} - \theta_{\chi,k}| \le \epsilon_0 G$ for some $\epsilon_0 \in (0, 1/2)$ with $\eta \in (2\epsilon_0, 1]$.

  (b) There exists a constant $c_0 > 0$ such that for all $1 \le k \le K_\chi$, $|\hat\theta_{\chi,k} - \theta_{\chi,k}| \le c_0 \rho_{n,p}^{[k]}$, where
$$\rho_{n,p}^{[k]} = \Big(\frac{\Delta_{\chi,k}}{p}\Big)^{-2} \times \begin{cases} m^{\frac{\nu}{\nu-2}} (G K_\chi)^{\frac{2}{\nu-2}} & \text{under Assumption 4.1(i)}, \\ m \log(G K_\chi) & \text{under Assumption 4.1(ii)}. \end{cases}$$

Remark 4.1.

  1. In Theorem 4.1(b), $\rho_{n,p}^{[k]}$ reflects the difficulty associated with estimating the individual change point $\theta_{\chi,k}$, manifested by $(p^{-1}\Delta_{\chi,k})^{-2}$. In the Gaussian case (Assumption 4.1(ii)), the localization rate $\rho_{n,p}^{[k]}$ is always sharper than $G$ due to Assumption 4.2(i). Considering the problem of covariance change point detection in independent, sub-Gaussian random vectors in high dimensions, Wang, Yu, and Rinaldo (2021) derive the minimax lower bound on the localization rate in their Lemma 3.2, and $\rho_{n,p}^{[k]}$ matches this rate up to $m \log(n)$; here, the dependence on the kernel bandwidth $m$ is attributed to the fact that we consider a time series segmentation problem, that is, a change may occur in the ACV of $\chi_t$ at lags other than zero. If heavier tails are permitted (Assumption 4.1(i)), $\rho_{n,p}^{[k]}$ can be tighter than $\epsilon_0 G$, for example when $\Delta_{\chi,k} \asymp p$, $K_\chi$ is fixed and $m \asymp G^\beta$ for some $\beta \in (0, 1 - 4/\nu)$.

  2. Empirically, replacing $\hat\theta$ with $\tilde\theta = \arg\max_{v \in \mathcal{I}}\, \mathrm{avg}_l\, T_{\chi,v}(\omega_l, G)$ returns a more stable location estimator, where $\mathrm{avg}_l$ denotes the average operator over $l = 0, \ldots, m$. We can derive the localization rate for $\tilde\theta$ similarly as in Theorem 4.1(b), with $\tilde\Delta_{\chi,k} = \pi^{-1} \int_0^\pi \|\Delta_{\chi,k}(\omega)\|\, d\omega$ in place of $\Delta_{\chi,k}$. Our numerical results in Section 5.2 are based on this estimator.
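As an illustration, the averaged location estimator of Remark 4.1 reduces to an argmax over frequency-averaged detector values; the sketch below uses synthetic stand-ins for $T_{\chi,v}(\omega_l, G)$, not the actual FVARseg statistics:

```python
import numpy as np

# Hypothetical detector values T[v, l] = T_{chi,v}(omega_l, G) over a grid of
# time points v and Fourier frequencies omega_0, ..., omega_m (synthetic data).
rng = np.random.default_rng(0)
n_v, m = 200, 8
T = rng.random((n_v, m + 1))
T[120, :] += 5.0  # plant a pronounced peak at v = 120

# Averaged estimator of Remark 4.1: average over frequencies, then argmax over v.
theta_tilde = int(np.argmax(T.mean(axis=1)))
print(theta_tilde)  # 120
```

Averaging over the frequencies $\omega_l$ before maximizing tends to smooth out frequency-specific noise in the detector, which is why this variant is observed to be more stable in practice.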

Next, we establish the consistency of $\hat\Gamma_\chi^{[k]}(l)$ in (7) in estimating the segment-specific ACV of $\chi_t^{[k]}$, under the following assumption on the strength of factors.

Assumption 4.3.

Assumption 2.1 holds with $r_{k,j} = 1$ for all $1 \le j \le q_k$ and $0 \le k \le K_\chi$.

Theorem 4.2.

Suppose that Assumption 4.3 holds in addition to the assumptions made in Theorem 4.1, and define $\rho_{n,p} = \max_{1 \le k \le K_\chi} \min(\epsilon_0 G, \rho_{n,p}^{[k]})$. Also let
$$\vartheta_{n,p} = \begin{cases} \dfrac{m (np)^{2/\nu} \log^{7/2}(p)}{G} \vee \sqrt{\dfrac{m \log(np)}{G}} & \text{under Assumption 4.1(i)}, \\[1ex] \sqrt{\dfrac{m \log(np)}{G}} & \text{under Assumption 4.1(ii)}. \end{cases}$$

Then on $\mathcal{M}_{n,p}^\chi$ defined in Theorem 4.1, for some finite integer $d \in \mathbb{N}$, we have
$$\max_{0 \le k \le K_\chi} \max_{0 \le l \le d} \big|\hat\Gamma_\chi^{[k]}(l) - \Gamma_\chi^{[k]}(l)\big| = O_P\Big( \Big(\vartheta_{n,p} \vee \frac{1}{m} \vee \rho_{n,p} G^{-1}\Big)\, p \Big).$$

It is possible to work under the weaker Assumption 2.1 and trace the effect of weak factors, or to bound estimation errors measured in different norms. Corollary E.16 of Barigozzi, Cho, and Owens (2022) derives such results in the stationary setting, where an additional multiplicative factor of $p^{2(1 - \min_k r_{k,q_k})}$ appears in the $O_P$-bound of Theorem 4.2. We work under the stronger Assumption 4.3 as it simplifies the presentation of Theorem 4.2, which plays an important role in the investigation into Stage 2 of FVARseg. Also, the factor number estimator of Hallin and Liška (2007), which achieves consistency under the general dynamic factor model we adopt in (3), requires that $r_{k,q_k} = 1$ for all $0 \le k \le K_\chi$. Forni et al. (2004) argue that this assumption is a natural one requiring the influence of the common shocks to be, in some sense, "stationary along the cross-sections," and that it is compatible with the cross-sectional ordering being completely arbitrary.

4.2 Consistency of Stage 2 of FVARseg

Suppose that the tuning parameter for the $\ell_1$-regularized Yule-Walker estimation problem in (10) is set, with some constant $M > 0$ and $\vartheta_{n,p}$ and $\rho_{n,p}$ defined in Theorem 4.2, as
(13) $$\lambda_{n,p} = M \Big( \max_{0 \le k \le K_\xi} \|\beta^{[k]}\|_1 + 1 \Big) \Big( \vartheta_{n,p} \vee \frac{1}{m} \vee \rho_{n,p} G^{-1} \Big)\, p.$$

This choice reflects the error of $\hat\Gamma_{\xi,v}(l, G)$ in estimating the local ACV of $\xi_t$ over all $v$ and $l$.

The following assumption imposes conditions on the size of the changes in VAR parameters and the minimum spacing between the change points.

Assumption 4.4.

  (i) For each $1 \le k \le K_\xi$, let $\Delta_{\xi,k} = \mathcal{G}^{[k]} (\beta^{[k]} - \beta^{[k-1]})$. Then,
$$\max_{1 \le k \le K_\xi} \big(1 \vee \|\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1}\|_1\big) \frac{\lambda_{n,p}}{|\Delta_{\xi,k}|} = o(1).$$

  (ii) The bandwidth $G$ fulfills (12), that is, $\min_{0 \le k \le K_\xi} (\theta_{\xi,k+1} - \theta_{\xi,k}) \ge 2G$.

Remark 4.2.

We choose to measure the size of change using $|\Delta_{\xi,k}|$. From Assumption 2.2(iv), we have $\Delta_{\xi,k} = O$ iff $\beta^{[k]} - \beta^{[k-1]} = O$. In the related literature, the $\ell_2$-norm $|\beta^{[k]} - \beta^{[k-1]}|_2$ scaled by the global sparsity (given by the union of the supports of all $\beta^{[k]}$, $0 \le k \le K_\xi$) is used to measure the size of change, where this global sparsity may be much greater than that of $\Delta_{\xi,k}$ when $K_\xi$ is large, see Appendix A.2. In some instances, we have $\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1} = I$, for example when $d = 1$ and $A_1^{[k]} = A_1^{[k-1]}$, such that Assumption 4.4(i) becomes $\lambda_{n,p} = o(\min_k |\Delta_{\xi,k}|)$. More generally, bounding $\|\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1}\|_1$ implicitly assumes (approximate) sparsity of the second-order structure of $\xi_t$. When $d = 1$, we have $\mathcal{G}^{[k]} = \sum_{l=0}^\infty (A_1^{[k]})^l \Gamma^{[k]} [(A_1^{[k]})^\top]^l$, such that the boundedness of $\|\mathcal{G}^{[k]}\|_1$ and $\|(\mathcal{G}^{[k]})^{-1}\|_1$ follows when $A_1^{[k]}$ and $\Gamma^{[k]}$ are block diagonal with fixed block size (Wang and Tsay 2022). For general $d \ge 1$, we have $\|\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1}\|_1$ bounded if the $\mathcal{G}^{[k]}$ are strictly diagonally dominant (see Definition 6.1.9 of Horn and Johnson (1985) and Han, Lu, and Liu (2015)), which is met, for example, when the $A_l^{[k]}$ are diagonal with their diagonal entries fulfilling $\gamma_{\xi,ii}^{[k]}(0) > 2 \sum_{l=1}^{d-1} |\gamma_{\xi,ii}^{[k]}(l)|$ (where $\Gamma_\xi^{[k]}(l) = [\gamma_{\xi,ii'}^{[k]}(l)]_{i,i'}$); this trivially holds when $d = 1$.
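The strict diagonal dominance condition invoked above is straightforward to check numerically; a minimal sketch with toy numbers (the matrix `G` and the autocovariances `gamma` are hypothetical, chosen only to illustrate the two checks):

```python
import numpy as np

def strictly_diagonally_dominant(M):
    """Check |M[i,i]| > sum_{j != i} |M[i,j]| for every row i
    (Definition 6.1.9 of Horn and Johnson (1985))."""
    abs_M = np.abs(np.asarray(M, dtype=float))
    off_diag_row_sums = abs_M.sum(axis=1) - np.diag(abs_M)
    return bool(np.all(np.diag(abs_M) > off_diag_row_sums))

# Toy autocovariances gamma(0), gamma(1), gamma(2) of one coordinate of xi_t
# with d = 3: the sufficient condition gamma(0) > 2 * sum_{l>=1} |gamma(l)|.
gamma = [1.0, 0.2, 0.1]
print(gamma[0] > 2 * sum(abs(g) for g in gamma[1:]))  # True

# A strictly diagonally dominant toy matrix standing in for G^{[k]}.
G = np.array([[4.0, 1.0, 0.5], [0.5, 3.0, 1.0], [1.0, 1.0, 5.0]])
print(strictly_diagonally_dominant(G))  # True
```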

Theorem 4.3.

Suppose that Assumption 4.4 holds in addition to the assumptions made in Theorem 4.2. With $\lambda_{n,p}$ chosen as in (13), we set $\pi_{n,p}$ to satisfy
$$2\lambda_{n,p} < \pi_{n,p} < \frac{1}{2} \min_{1 \le k \le K_\xi} |\Delta_{\xi,k}|.$$

Then, there exists a set $\mathcal{M}_{n,p}^\xi$ with $P(\mathcal{M}_{n,p}^\xi) \to 1$ as $n, p \to \infty$, such that the following holds for $\hat\Theta_\xi = \{\hat\theta_{\xi,k},\ 1 \le k \le \hat K_\xi : \hat\theta_{\xi,1} < \cdots < \hat\theta_{\xi,\hat K_\xi}\}$ returned by Stage 2 of FVARseg, on $\mathcal{M}_{n,p}^\xi$ for large enough $n$:

  (a) $\hat K_\xi = K_\xi$ and $\max_{1 \le k \le K_\xi} |\hat\theta_{\xi,k} - \theta_{\xi,k}| \le \epsilon_0 G$ for some $\epsilon_0 \in (0, 1/2)$ with $\eta \in (\epsilon_0, 1]$.

  (b) There exists a constant $c_0 > 0$ such that for all $1 \le k \le K_\xi$ satisfying $\{\theta_{\xi,k} - 2G + 1, \ldots, \theta_{\xi,k} + 2G\} \cap \Theta_\chi = \emptyset$, we have $|\hat\theta_{\xi,k} - \theta_{\xi,k}| \le c_0 \varrho_{n,p}^{[k]}$, where
$$\varrho_{n,p}^{[k]} = |\Delta_{\xi,k}|^{-2} \Big(1 + \max_{0 \le k \le K_\xi} \|\beta^{[k]}\|_1\Big) \times \begin{cases} (G K_\xi p)^{\frac{2}{\nu-2}} \log^{\frac{3\nu}{\nu-2}}(p) & \text{under Assumption 4.1(i)}, \\ \log(G K_\xi p) & \text{under Assumption 4.1(ii)}. \end{cases}$$

Due to the sequential nature of FVARseg, the success of Stage 2 is conditional on that of Stage 1, which occurs on a set of asymptotic probability one, see Theorem 4.1. Theorem 4.3(a) establishes that Stage 2 of FVARseg consistently detects all $K_\xi$ change points within the distance $\epsilon_0 G$, where $\epsilon_0$ can be made arbitrarily small as $n, p \to \infty$ under Assumption 4.4(i). Theorem 4.3(b) shows that a further refined localization rate can be derived for $\theta_{\xi,k}$ when it is sufficiently far from the change points in the factor-driven component. If, say, $\theta_{\xi,k}$ lies close to $\theta_{\chi,k'}$, a change point in $\chi_t$, the error in estimating the local ACV of $\xi_t$ due to the bias in $\hat\theta_{\chi,k'}$ prevents applying the arguments involved in the refinement to such $\theta_{\xi,k}$. The refined rate $\varrho_{n,p}^{[k]}$ is always tighter than $G$ under Gaussianity.

It is of independent interest to consider the cases where $\chi_t$ is stationary (i.e., $K_\chi = 0$) or where we directly observe the piecewise stationary VAR process (i.e., $X_t = \xi_t$). Consistency of Stage 2 of FVARseg readily extends to such settings, and the improved localization rates in Theorem 4.3(b) apply to all the estimators. Also, a further improvement is attained in the heavy-tailed situation (Assumption 4.1(i)) if $\xi_t$ is directly observable. For the full statement of the results, we refer to Corollary A.1 in Appendix A, where we also provide a detailed comparison between Stage 2 of FVARseg and existing VAR segmentation methods (which do not take into account the possible presence of factors), both theoretically and numerically.

5 Empirical Results

5.1 Numerical Considerations

5.1.1 Multiscale Extension

The bandwidth $G$ is required to be large enough to provide good local estimators of the spectral density of $\chi_t$ (Stage 1) and of the VAR parameters (Stage 2). However, if $G$ is too large, we may have windows that contain two or more change points when scanning the data, which violates Assumptions 4.2(ii) and 4.4(ii). Cho and Kirch (2022) note the lack of adaptivity of a single-bandwidth moving window procedure in the presence of multiscale change points (a mixture of large changes over short intervals and smaller changes over long intervals), and advocate the use of multiple bandwidths. Accordingly, we also propose to apply FVARseg with a range of bandwidths and prune down the outputs using a "bottom-up" method (Messer et al. 2014; Meier, Kirch, and Cho 2021). Let $\hat\Theta(G)$ denote the output from Stage 1 or 2 with a bandwidth $G$. Given a set of bandwidths $\mathcal{G} = \{G_h,\ 1 \le h \le H : G_1 < \cdots < G_H\}$, we first add all estimators obtained with the finest bandwidth $G_1$ to the set of final estimators $\hat\Theta$ and then, sequentially for $h \ge 2$, accept $\hat\theta \in \hat\Theta(G_h)$ iff $\min_{\check\theta \in \hat\Theta} |\hat\theta - \check\theta| \ge G_h/2$. In simulation studies, we use $\mathcal{G}_\chi = \{[n/10], [n/8], [n/6], [n/4]\}$ for Stage 1, and $\mathcal{G}_\xi$ generated as an equispaced sequence of length 4 between $[2.5p]$ and $[n/4]$ for Stage 2. The choice of $\mathcal{G}_\xi$ is motivated by the simulation results of Barigozzi, Cho, and Owens (2022) in the stationary setting, where the $\ell_1$-regularized estimator in (10) was observed to perform well when the sample size exceeds $2p$.
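The bottom-up pruning rule described above can be sketched as follows (the function name and the toy estimator sets are ours, for illustration only, not FVARseg's implementation):

```python
def bottom_up_merge(estimates_by_bandwidth):
    """Prune change point estimators obtained with bandwidths G_1 < ... < G_H:
    accept everything from the finest bandwidth, then accept an estimator from
    a coarser bandwidth G_h only if it lies at least G_h / 2 away from all
    previously accepted estimators."""
    accepted = []
    for G, ests in sorted(estimates_by_bandwidth.items()):  # finest first
        for theta in sorted(ests):
            if all(abs(theta - t) >= G / 2 for t in accepted):
                accepted.append(theta)
    return sorted(accepted)

# Hypothetical outputs from three bandwidths on the same series: the estimator
# 245 (resp. 260) duplicates 250 and is pruned; 500 is new and is kept.
out = bottom_up_merge({100: [250, 700], 150: [245, 500], 200: [260]})
print(out)  # [250, 500, 700]
```

Finer bandwidths are trusted first because they localize large changes over short intervals; coarser bandwidths only contribute estimators that are not already explained by an accepted one.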

5.1.2 Speeding Up Stage 1

The computational bottleneck of FVARseg is the computation of $T_{\chi,v}(\omega_l, G)$ in Stage 1, which involves the singular value decomposition (SVD) of a $p \times p$ matrix at multiple frequencies and over time. We propose to evaluate $T_{\chi,v}(\omega_l, G)$ on a grid $v \in \{G + a b_n : 0 \le a \le (n - 2G)/b_n\}$ with $b_n = 2\log(n)$. This may incur an additional bias of at most $b_n/2 = \log(n)$ in change point location estimation, which is asymptotically negligible in view of Theorem 4.1, but reduces the computational load by a factor of $b_n$.
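A possible discretization of this evaluation grid, assuming $b_n = 2\log(n)$ and rounding to integer time indices (the rounding convention is ours):

```python
import math

def evaluation_grid(n, G):
    """Coarse grid v in {G + a * b_n : 0 <= a <= (n - 2G) / b_n} with
    b_n = 2 * log(n), reducing the number of SVD evaluations by a factor
    of roughly b_n compared to scanning every time point."""
    b_n = 2 * math.log(n)
    return [round(G + a * b_n) for a in range(int((n - 2 * G) / b_n) + 1)]

grid = evaluation_grid(n=1000, G=100)
print(len(grid), grid[0], grid[-1])
```

For $n = 1000$ and $G = 100$, the grid spans the admissible range $[G, n - G]$ with spacing $b_n \approx 13.8$ instead of evaluating all $n - 2G = 800$ candidate points.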

5.1.3 Selection of Thresholds

The theoretically permitted ranges of $\kappa_{n,p}$ and $\pi_{n,p}$ (see Theorems 4.1 and 4.3) depend on constants which are not accessible or difficult to estimate in practice. This is an issue commonly encountered by data segmentation methods which involve localized testing, and often a reasonable solution is found by large-scale simulations, an approach we also take. We use simulations to derive a simple rule for selecting the threshold as a function of $n$, $p$, and $G$. For this, we (i) propose a scaling for each of the two detector statistics adopted in Stages 1 and 2 which reduces its dependence on the data generating process, and (ii) fit a linear model for an appropriate percentile of the scaled detector statistics obtained from simulated datasets. Specifically, we simulate $B = 100$ time series following (3) with $K_\chi = K_\xi = 0$ using the models considered in Section 5.2, and record the maximum of the scaled detector statistics $T_{\chi,v}^\circ(G)$ and $T_{\xi,v}^\circ(G)$ over $v$ on each realization. Here, the scaling terms are obtained from the first $G$ observations only, as
$$T_{\chi,v}^\circ(G) = \max_{0 \le l \le m} \frac{T_{\chi,v}(\omega_l, G)}{T_{\chi,G}(\omega_l, G)} \quad \text{and} \quad T_{\xi,v}^\circ(G) = \frac{T_{\xi,v}(\hat\beta_G(G), G)}{\max_{0 \le l \le d} \big|\hat\Gamma_{\xi,[G/2]}(l, [G/2]) - \hat\Gamma_{\xi,G}(l, [G/2])\big|}.$$

Generating the data with varying $(n, p, q, d)$ and repeating the above procedure with multiple choices of $G$, we fit a linear model to the $100(1-\tau)$th percentile of $\log(\max_v T_{\chi,v}^\circ(G))$ with $\log\log(n)$ and $\log(G)$ as regressors ($R^2_{\mathrm{adj}} = 0.9651$), and use the fitted model to derive a threshold for given $n$ and $G$ that is then applied to the similarly scaled $T_{\chi,v}^\circ(G)$. Analogously, we regress the $100(1-\tau)$th percentile of $\log(\max_v T_{\xi,v}^\circ(G))$ onto $\log\log(n)$, $\log\log(p)$, and $\log(G)$ ($R^2_{\mathrm{adj}} = 0.985$), and from the fitted model find a threshold applied to the scaled $T_{\xi,v}^\circ(\hat\beta, G)$ given $n$, $p$, and $G$. The choice of the regressors is motivated by the definitions of $\psi_n$ and $\vartheta_{n,p}$ which appear in Theorems 4.1 and 4.3. The high values of $R^2_{\mathrm{adj}}$ indicate the excellent fit of the linear models and, consequently, that the threshold selection rule is insensitive to the data generating process. When Stage 2 is used as a standalone method for segmenting observed VAR processes, a smaller threshold is recommended, in line with Corollary A.1, and we find that $\pi_{n,p} = 1$ works well with the proposed scaling.
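A minimal sketch of this simulation-based threshold rule; the null statistics below are synthetic stand-ins (an arbitrary positive distribution) for the scaled detector maxima, and the fitted coefficients are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
tau = 0.05

# For each (n, G), record the 100(1 - tau)-th percentile of the log of the
# null detector maxima; in the actual procedure these come from B = 100
# simulated series with no change points.
design, response = [], []
for n in (500, 1000, 2000):
    for G in (50, 100, 200):
        null_max = np.log(G) * rng.chisquare(df=5, size=100)  # synthetic
        design.append([1.0, np.log(np.log(n)), np.log(G)])
        response.append(np.quantile(np.log(null_max), 1 - tau))

# Least-squares fit of the percentile on loglog(n) and log(G).
coef, *_ = np.linalg.lstsq(np.array(design), np.array(response), rcond=None)

def threshold(n, G):
    """Threshold for a new (n, G) from the fitted linear model."""
    return np.exp(coef[0] + coef[1] * np.log(np.log(n)) + coef[2] * np.log(G))

print(threshold(1500, 150) > 0)  # True
```

The regression is fit once offline; at run time only the fitted coefficients are needed to produce a threshold for any new $(n, G)$.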

5.1.4 Other Tuning Parameters

While data-adaptive methods exist for selecting the kernel window size $m$ in (4) (Politis 2003), we find that simply setting $m = \max(1, G^{1/3})$ for given $G$ works well for the purpose of data segmentation. The results are not highly sensitive to the choice of $\eta$ in Stage 1, and we use $\eta = 0.5$ throughout. In Stage 2, we find that not trimming off the data when estimating the VAR parameters, by setting $\eta = 0$, does not hurt the numerical performance. In the factor adjustment, we select the segment-specific factor number $q_k$ using the IC-based approach of Hallin and Liška (2007). Krampe and Margaritella (2021) propose to jointly select the (static) factor number and the VAR order using an IC but, generally, the validity of IC is not well understood for VAR order selection in high dimensions. In our simulations, following the practice in the literature on VAR segmentation, we regard $d$ as known but also investigate the sensitivity of FVARseg when $d$ is misspecified. In analyzing the panel of daily volatilities (Section 5.3), we use $d = 5$, which has the interpretation of the number of trading days per week. Finally, we select $\lambda_{n,p}$ in (10) via cross validation as in Barigozzi, Cho, and Owens (2022).

5.2 Simulation Studies

In the simulations, we consider the cases when the factor-driven component is present ($\chi_t \ne 0$) and when it is not ($\chi_t = 0$). For the former, we consider two models for generating $\chi_t$ with $q = 2$. In the first model, referred to as (C1), $\chi_t$ admits a static factor model representation while in the second model (C2), it does not; empirically, the task of factor structure estimation is observed to be more challenging under (C2) (Forni et al. 2017; Barigozzi, Cho, and Owens 2022). We generate $\xi_t$ as piecewise stationary Gaussian VAR($d$) processes with $d \in \{1, 2\}$ and a parameter $\beta$ that controls the size of the change (with smaller $\beta$ indicating a smaller change). We refer to Appendix B.1 for the full descriptions of the simulation models and for an overview of the 24 data generating processes, which also contains information about the sets of change points $\Theta_\chi$ and $\Theta_\xi$; under each setting, we generate 100 realizations. Below, we provide a summary of the findings from the simulation studies; the tables reporting the results can be found in Appendix B.2.

Table 1 Data generating processes for simulation studies.

To the best of our knowledge, there does not exist a methodology that comprehensively addresses the change point problem under the model (3). Therefore, under (M1)–(M2), we compare Stage 1 of FVARseg with the method proposed in Barigozzi, Cho, and Fryzlewicz (2018), referred to as BCF hereafter, on their performance at detecting changes in $\chi_t$. While BCF has a step for detecting change points in the remainder component, it does so nonparametrically unlike Stage 2 of FVARseg, which may lead to an unfair comparison. Hence, we separately consider (M3) with $X_t = \xi_t$, where we compare the Stage 2 method with VARDetect (Safikhani, Bai, and Michailidis 2022), a block-wise variant of Safikhani and Shojaie (2022).

5.2.1 Results under (M1)–(M2)

Overall, FVARseg achieves good accuracy in estimating the total number and locations of the change points for both $\chi_t$ and $\xi_t$ across different data generating processes. Under (M1), adopting the static factor model for generating $\chi_t$, FVARseg shows similar performance to BCF in detecting $\Theta_\chi$ when the dimension is small ($p = 50$), but the latter tends to over-estimate the number of change points as $p$ increases. Also, FVARseg outperforms the binary segmentation-based BCF in change point localization. BCF requires as an input an upper bound on the number of global factors, say $\bar q$, that includes the ones attributed to the change points, and its performance is sensitive to this choice. In (M1), we have $\bar q \le 3q(K_\chi + 1)$ (which is supplied to BCF) while in (M2), $\chi_t^{[k]}$ does not admit a static factor representation and accordingly such a $\bar q$ does not exist (we set $\bar q = 2q$ for BCF). Accordingly, BCF tends to under-estimate the number of change points under (M2). Generally, the task of detecting change points in $\xi_t$ is aggravated by the presence of change points in $\chi_t$ due to the sequential nature of FVARseg, and Stage 2 performs better when $K_\chi = 0$ both in terms of detection and localization accuracy, which agrees with the observations made in Corollary A.1(a).

Between (M1) and (M2), the latter poses a more challenging setting for the Stage 2 methodology. This may be attributed to (i) the difficulty posed by the data generating scenario (C2), which is observed to make the estimation tasks related to the latent VAR process more difficult (Barigozzi, Cho, and Owens 2022), and (ii) the fact that $\Theta_\chi = \Theta_\xi$, where the estimation bias from Stage 1 has a worse effect on the performance of Stage 2 compared to when $\Theta_\chi$ and $\Theta_\xi$ do not overlap, see the discussion below Theorem 4.3.

5.2.2 Results under (M3)

Table B.2 shows that Stage 2 of FVARseg outperforms VARDetect in all criteria considered, particularly as $p$ increases. VARDetect struggles to detect any change point when the change is weak (recall that $\beta = 0.6$ is used when $d = 1$, which makes the size of the change at $\theta_{\xi,2}$ small) or when $d = 2$. FVARseg is faster than VARDetect in most situations except when $(d, p, K_\xi) = (1, 50, 0)$, sometimes by more than 10 times, for example when $d = 2$ and there is no change point in the data. Additionally, Stage 2 of FVARseg is less sensitive to over-specification of the VAR order ($d = 2$ is used when in fact $d = 1$). When it is under-specified, there is a slight loss of detection power, as expected. Generally, an increase in the VAR order brings an increase in the number of VAR parameters, which impacts the empirical performance. Compared to the results obtained under (M1)–(M2), the localization performance of the Stage 2 method improves in the absence of the factor-driven component, even though the size of the changes under (M3) tends to be smaller. This confirms the theoretical findings reported in Corollary A.1(b) in Appendix A. Although not reported here, when the full FVARseg methodology is applied to the data generated under (M3), the Stage 1 method does not detect any spurious change points, as desired.

5.3 Application: U.S. Blue Chip Data

We consider daily stock prices of $p = 72$ U.S. blue chip companies across industry sectors between January 3, 2000 and February 16, 2022 ($n = 5568$ days), retrieved from the Wharton Research Data Services; the list of companies and their corresponding sectors can be found in Appendix E. Following Diebold and Yilmaz (2014), we measure the volatility using $\sigma_{it}^2 = 0.361 (p_{it}^{\text{high}} - p_{it}^{\text{low}})^2$, where $p_{it}^{\text{high}}$ (resp. $p_{it}^{\text{low}}$) denotes the maximum (resp. minimum) log-price of stock $i$ on day $t$, and set $X_{it} = \log(\sigma_{it}^2)$.
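The volatility transform can be computed directly from the daily high/low log-prices; the numbers below are illustrative, not taken from the dataset:

```python
import math

def log_volatility(p_high, p_low):
    """Range-based volatility proxy following Diebold and Yilmaz (2014):
    sigma^2 = 0.361 * (log-high - log-low)^2, transformed as X = log(sigma^2)."""
    sigma2 = 0.361 * (p_high - p_low) ** 2
    return math.log(sigma2)

# Example: a stock whose log-price ranges over 0.05 within a day.
x = log_volatility(p_high=4.62, p_low=4.57)
print(x)
```

The log transform maps the positive, heavily right-skewed volatility proxy onto the real line, making it more amenable to the (approximately linear) factor-adjusted VAR model.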

We apply FVARseg to detect change points in the panel of volatility measures $\{X_{it},\ 1 \le i \le p;\ 1 \le t \le n\}$. With $n_0 = 252$ denoting the number of trading days per year, we apply Stage 1 with bandwidths chosen as an equispaced sequence of length 4 between $[n_0/4]$ and $2n_0$, implicitly setting the minimum distance between two neighboring change points to three months. Based on the empirical sample size requirement for VAR parameter estimation (see Section 5.1), we apply Stage 2 with bandwidths chosen as an equispaced sequence of length 4 between $2.5p$ and $2n_0$. The VAR order is set at $d = 5$, which corresponds to the number of trading days in each week, and the rest of the tuning parameters are selected as in Section 5.1. Table 2 reports the segmentation results.

Table 2 Sets of change point estimators returned by FVARseg.

Stage 1 detects four change points around the Great Financial Crisis between 2007 and 2009, and the last two estimators from Stage 1 correspond to the onset (2020-02-20) and the end (2020-04-07) of the stock market crash brought on by the instability due to the COVID-19 pandemic. Given the clustering of change points between 2007 and 2009, an alternative approach is to adopt a locally stationary factor model as in Barigozzi et al. (2021). However, such a model does not allow the number of factors to vary over time, whereas we observe the contrary to be the case when applying the IC-based method of Hallin and Liška (2007) to each segment defined by $\hat\Theta_\chi$, see Table 3. This supports that it is more appropriate to model the changes in the factor-driven component of this dataset as abrupt changes rather than as smooth transitions.

Table 3 Estimated number of factors $\hat q_k$ from $\{X_t,\ \hat\theta_{\chi,k} + 1 \le t \le \hat\theta_{\chi,k+1}\}$, $k = 0, \ldots, 7$.

The estimators from Stage 2 are spread across the period under consideration. The accompanying figures illustrate how the linkages between different companies vary over the four segments identified between 2003 and 2011, particularly at the level of industrial sectors, although this information is not used by FVARseg.

To further validate the segmentation obtained by FVARseg, we perform a forecasting exercise. Two approaches, referred to as (F1) and (F2), are adopted to build forecasting models, where the difference lies in how a sub-sample of $\{X_u,\ u \le t - 1\}$ is chosen to forecast $X_t$. Simply put, (F1) uses only the observations belonging to the same segment as $X_t$ for constructing the forecast of $\chi_t$ (resp. $\xi_t$) according to the segmentation defined by $\hat\Theta_\chi$ (resp. $\hat\Theta_\xi$), while (F2) ignores the presence of the most recent change point estimator. We expect (F1) to give more accurate predictions if the data undergoes structural changes at the detected change points. On the other hand, if some of the change point estimators are spurious, (F2) is expected to produce better forecasts since it makes use of more observations. We select $\mathcal{T}$, the set of time points at which to perform forecasting, such that each $t \in \mathcal{T}$ does not belong to the first two segments (i.e., $t \ge \max(\hat\theta_{\chi,2}, \hat\theta_{\xi,2}) + 1$), and there are at least $n_0$ observations to build a forecast model separately for $\chi_t$ and $\xi_t$, respectively. Denoting by $\hat L_\chi(v) = \max\{0 \le k \le \hat K_\chi : \hat\theta_{\chi,k} + 1 \le v\}$ the index of the $\hat\theta_{\chi,k}$ nearest to and strictly left of $v$, and similarly defining $\hat L_\xi(v)$, this means that $\min(\hat L_\chi(t), \hat L_\xi(t)) \ge 2$ and $\min(t - \hat\theta_{\chi,\hat L_\chi(t)}, t - \hat\theta_{\xi,\hat L_\xi(t)}) \ge n_0$ for all $t \in \mathcal{T}$. We have $|\mathcal{T}| = 1600$. For such $t \in \mathcal{T}$, we obtain $\hat X_t(N) = \hat\chi_t(N_1) + \hat\xi_t(N_2)$ for some $N = (N_1, N_2)$, where $\hat\chi_t(N_1)$ denotes an estimator of the best linear predictor of $\chi_t$ given $X_{t-l}$, $1 \le l \le N_1$, and $\hat\xi_t(N_2)$ is defined analogously. The difference between the two approaches we take lies in the selection of $N$.

(F1) We set $N_1 = t - \hat\theta_{\chi,\hat L_\chi(t)} - 1$ and $N_2 = t - \hat\theta_{\xi,\hat L_\xi(t)} - 1$.

(F2) We set $N_1 = t - \hat\theta_{\chi,\hat L_\chi(t)-1} - 1$ and $N_2 = t - \hat\theta_{\xi,\hat L_\xi(t)-1} - 1$.

Barigozzi, Cho, and Owens (2022) propose two methods for estimating the best linear predictors of $\chi_t$ and $\xi_t$ under a stationary factor-adjusted VAR model, one based on a more restrictive assumption on the factor structure ("restricted") than the other ("unrestricted"); we refer to the paper for their detailed descriptions. Both estimators are combined with the two approaches (F1) and (F2). Table 4 reports the summary of the forecasting errors, measured as $\mathrm{FE}_t^{\mathrm{avg}} = |X_t - \hat X_t(N)|_2^2 / |X_t|_2^2$ and $\mathrm{FE}_t^{\mathrm{max}} = |X_t - \hat X_t(N)|_\infty / |X_t|_\infty$, obtained from combining the different best linear predictors with (F1)–(F2). According to all evaluation criteria, (F1) produces forecasts that are more accurate than (F2) regardless of the forecasting method, which supports the validity of the change point estimators returned by FVARseg.
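The two forecast error measures, interpreting the norms as the squared relative $\ell_2$ error and the relative maximum error, can be sketched as:

```python
import numpy as np

def forecast_errors(x, x_hat):
    """FE_avg: squared l2 norm of the forecast error relative to that of the
    observation; FE_max: max-norm of the error relative to that of the
    observation (the max-norm reading of the second measure is our assumption)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    fe_avg = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
    fe_max = np.max(np.abs(x - x_hat)) / np.max(np.abs(x))
    return fe_avg, fe_max

# Toy cross-section of p = 3 series at a single time point t.
fe_avg, fe_max = forecast_errors([1.0, 2.0, 2.0], [1.0, 2.0, 1.0])
print(round(float(fe_avg), 4), float(fe_max))  # 0.1111 0.5
```

Both measures are scale-free, so they can be averaged over $t \in \mathcal{T}$ and compared across forecasting methods.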

Table 4 Mean and standard errors of $\mathrm{FE}_t^{\mathrm{avg}}$ and $\mathrm{FE}_t^{\mathrm{max}}$ for $t \in \mathcal{T}$, where $|\mathcal{T}| = 1600$.

6 Conclusions

We consider the problem of high-dimensional time series segmentation under a piecewise stationary, factor-adjusted VAR model which, adopting the most general approach to time series factor modeling, permits pervasive cross-sectional and serial correlations in the data while accommodating structural changes. FVARseg proceeds in two stages, detecting change points in the factor-driven component and in the idiosyncratic VAR process separately, and fully addresses the challenges arising from the presence of latent factors. Theoretical consistency of FVARseg is established under general conditions permitting heavy tails and dependence across the stationary segments, and we derive estimation rates that make explicit the influence of the tail behavior and the size of changes. It is competitive both theoretically and computationally in comparison with existing methods that are proposed for special instances of the proposed factor-adjusted VAR model.

Supplementary Materials

In the supplement, Section F contains preliminary lemmas and all proofs of the theoretical results stated in the paper. Section B presents further information on the simulation studies, and Section A includes further discussions on Stage 2 of FVARseg.


Disclosure Statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

Cho is grateful to the support of the Leverhulme Trust via the Research Project Grant RPG-2019-390. Maeng, Eckley and Fearnhead gratefully acknowledge the financial support of the Engineering and Physical Sciences Research Council (EPSRC) via StatScale (EP/N031938/1). Eckley is also grateful to the EPSRC for the financial support of EP/T025964/1, whilst Fearnhead also acknowledges the support of EP/V053590/1. Open Access funding was provided by Durham University.

References

  • Bai, P., Safikhani, A., and Michailidis, G. (2020), “Multiple Change Points Detection in Low Rank and Sparse High Dimensional Vector Autoregressive Models,” IEEE Transactions on Signal Processing, 68, 3074–3089. DOI: 10.1109/TSP.2020.2993145.
  • Bai, P., Safikhani, A., and Michailidis, G. (2022), “Multiple Change Point Detection in Reduced Rank High Dimensional Vector Autoregressive Models,” Journal of the American Statistical Association (to appear). DOI: 10.1080/01621459.2022.2079514.
  • Barigozzi, M., Cho, H., and Fryzlewicz, P. (2018), “Simultaneous Multiple Change-Point and Factor Analysis for High-Dimensional Time Series,” Journal of Econometrics, 206, 187–225. DOI: 10.1016/j.jeconom.2018.05.003.
  • Barigozzi, M., Cho, H., and Owens, D. (2022), “FNETS: Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series,” arXiv preprint arXiv:2201.06110. DOI: 10.1080/07350015.2023.2257270.
  • Barigozzi, M., and Hallin, M. (2017), “A Network Analysis of the Volatility of High Dimensional Financial Series,” Journal of the Royal Statistical Society, Series C, 66, 581–605. DOI: 10.1111/rssc.12177.
  • Barigozzi, M., Hallin, M., Soccorsi, S., and von Sachs, R. (2021), “Time-Varying General Dynamic Factor Models and the Measurement of Financial Connectedness,” Journal of Econometrics, 222, 324–343. DOI: 10.1016/j.jeconom.2020.07.004.
  • Basu, S., and Michailidis, G. (2015), “Regularized Estimation in Sparse High-Dimensional Time Series Models,” The Annals of Statistics, 43, 1535–1567. DOI: 10.1214/15-AOS1315.
  • Chen, L., Wang, W., and Wu, W. B. (2022), “Inference of Breakpoints in High-Dimensional Time Series,” Journal of the American Statistical Association, 117, 1951–1963, DOI: 10.1080/01621459.2021.1893178.
  • Cho, H., and Kirch, C. (2022), "Two-Stage Data Segmentation Permitting Multiscale Change Points, Heavy Tails and Dependence," Annals of the Institute of Statistical Mathematics, 74, 653–684. DOI: 10.1007/s10463-021-00811-5.
  • Cule, E., Vineis, P., and De Iorio, M. (2011), “Significance Testing in Ridge Regression for Genetic Data,” BMC Bioinformatics, 12, 1–15. DOI: 10.1186/1471-2105-12-372.
  • Diebold, F. X., and Yilmaz, K. (2014), “On the Network Topology of Variance Decompositions: Measuring the Connectedness of Financial Firms,” Journal of Econometrics, 182, 119–134. DOI: 10.1016/j.jeconom.2014.04.012.
  • Duan, J., Bai, J., and Han, X. (2022), “Quasi-Maximum Likelihood Estimation of Break Point in High-Dimensional Factor Models,” Journal of Econometrics (to appear). DOI: 10.1016/j.jeconom.2021.12.011.
  • Eichinger, B., and Kirch, C. (2018), “A MOSUM Procedure for the Estimation of Multiple Random Change Points,” Bernoulli, 24, 526–564. DOI: 10.3150/16-BEJ887.
  • Fan, J., Ke, Y., and Wang, K. (2020), “Factor-Adjusted Regularized Model Selection,” Journal of Econometrics, 216, 71–85. DOI: 10.1016/j.jeconom.2020.01.006.
  • Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society, Series B, 75, 603–680. DOI: 10.1111/rssb.12016.
  • Fan, J., Masini, R., and Medeiros, M. C. (2021), “Bridging Factor and Sparse Models,” arXiv preprint arXiv:2102.11341.
  • Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000), “The Generalized Dynamic Factor Model: Identification and Estimation,” The Review of Economics and Statistics, 82, 540–554. DOI: 10.1162/003465300559037.
  • Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2004), "The Generalized Dynamic Factor Model: Consistency and Rates," Journal of Econometrics, 119, 231–255.
  • Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2015), “Dynamic Factor Models with Infinite-Dimensional Factor Spaces: One-Sided Representations,” Journal of Econometrics, 185, 359–371. DOI: 10.1016/j.jeconom.2013.10.017.
  • Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2017), “Dynamic Factor Models with Infinite-Dimensional Factor Space: Asymptotic Analysis,” Journal of Econometrics, 199, 74–92.
  • Giannone, D., Lenza, M., and Primiceri, G. E. (2021), “Economic Predictions with Big Data: The Illusion of Sparsity,” ECB Working Paper 2542, European Central Bank.
  • Hallin, M., and Liška, R. (2007), “Determining the Number of Factors in the General Dynamic Factor Model,” Journal of the American Statistical Association, 102, 603–617. DOI: 10.1198/016214506000001275.
  • Han, F., Lu, H., and Liu, H. (2015), “A Direct Estimation of High Dimensional Stationary Vector Autoregressions,” Journal of Machine Learning Research, 16, 3115–3150.
  • Horn, R. A., and Johnson, C. R. (1985), Matrix Analysis, Cambridge: Cambridge University Press.
  • Kirch, C., Muhsal, B., and Ombao, H. (2015), “Detection of Changes in Multivariate Time Series with Application to EEG Data,” Journal of the American Statistical Association, 110, 1197–1216. DOI: 10.1080/01621459.2014.957545.
  • Krampe, J., and Margaritella, L. (2021), “Dynamic Factor Models with Sparse VAR Idiosyncratic Components,” arXiv preprint arXiv:2112.07149.
  • Li, Y.-N., Li, D., and Fryzlewicz, P. (2023), “Detection of Multiple Structural Breaks in Large Covariance Matrices,” Journal of Business and Economic Statistics, 41, 846–861, DOI: 10.1080/07350015.2022.2076686.
  • Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
  • Maeng, H., Eckley, I., and Fearnhead, P. (2022), "Collective Anomaly Detection in High-Dimensional VAR Models," Statistica Sinica (to appear).
  • Meier, A., Kirch, C., and Cho, H. (2021), “mosum: A Package for Moving Sums in Change-Point Analysis,” Journal of Statistical Software, 97, 1–42. DOI: 10.18637/jss.v097.i08.
  • Messer, M., Kirchner, M., Schiemann, J., Roeper, J., Neininger, R., and Schneider, G. (2014), “A Multiple Filter Test for the Detection of Rate Changes in Renewal Processes with Varying Variance,” Annals of Applied Statistics, 8, 2027–2067.
  • Michailidis, G., and d'Alché Buc, F. (2013), "Autoregressive Models for Gene Regulatory Network Inference: Sparsity, Stability and Causality Issues," Mathematical Biosciences, 246, 326–334. DOI: 10.1016/j.mbs.2013.10.003.
  • Politis, D. N. (2003), “Adaptive Bandwidth Choice,” Journal of Nonparametric Statistics, 15, 517–533. DOI: 10.1080/10485250310001604659.
  • Preuss, P., Puchstein, R., and Dette, H. (2015), “Detection of Multiple Structural Breaks in Multivariate Time Series,” Journal of the American Statistical Association, 110, 654–668. DOI: 10.1080/01621459.2014.920613.
  • Safikhani, A., Bai, Y., and Michailidis, G. (2022), “Fast and Scalable Algorithm for Detection of Structural Breaks in Big VAR Models,” Journal of Computational and Graphical Statistics, 31, 176–189. DOI: 10.1080/10618600.2021.1950005.
  • Safikhani, A., and Shojaie, A. (2022), “Joint Structural Break Detection and Parameter Estimation in High-Dimensional Nonstationary VAR Models,” Journal of the American Statistical Association, 117, 251–264. DOI: 10.1080/01621459.2020.1770097.
  • Shojaie, A., and Michailidis, G. (2010), “Discovering Graphical Granger Causality Using the Truncating Lasso Penalty,” Bioinformatics, 26, i517–i523. DOI: 10.1093/bioinformatics/btq377.
  • Wang, D., and Tsay, R. S. (2022), “Rate-Optimal Robust Estimation of High-Dimensional Vector Autoregressive Models,” arXiv preprint arXiv:2107.11002.
  • Wang, D., Yu, Y., and Rinaldo, A. (2021), “Optimal Covariance Change Point Localization in High Dimensions,” Bernoulli, 27, 554–575. DOI: 10.3150/20-BEJ1249.
  • Wang, D., Yu, Y., Rinaldo, A., and Willett, R. (2019), “Localizing Changes in High-Dimensional Vector Autoregressive Processes,” arXiv preprint arXiv:1909.06359.