
High-Dimensional Time Series Segmentation via Factor-Adjusted Vector Autoregressive Modeling

Received 06 Apr 2022, Accepted 14 Jul 2023, Published online: 26 Jul 2023

Abstract

Vector autoregressive (VAR) models are popularly adopted for modeling high-dimensional time series, and their piecewise extensions allow for structural changes in the data. In VAR modeling, the number of parameters grows quadratically with the dimensionality, which necessitates the sparsity assumption in high dimensions. However, it is debatable whether such an assumption is adequate for handling datasets exhibiting strong serial and cross-sectional correlations. We propose a piecewise stationary time series model that simultaneously allows for strong correlations as well as structural changes, where pervasive serial and cross-sectional correlations are accounted for by a time-varying factor structure, and any remaining idiosyncratic dependence between the variables is handled by a piecewise stationary VAR model. We propose an accompanying two-stage data segmentation methodology which fully addresses the challenges arising from the latency of the component processes. Its consistency in estimating both the total number and the locations of the change points in the latent components is established under conditions considerably more general than those in the existing literature. We demonstrate the competitive performance of the proposed methodology on simulated datasets and an application to U.S. blue chip stocks data. Supplementary materials for this article are available online.

1 Introduction

Vector autoregressive (VAR) models are popular for modeling cross-sectional and serial correlations in multivariate, possibly high-dimensional time series, with applications in, for example, finance (Barigozzi and Hallin 2017), biology (Shojaie and Michailidis 2010) and genomics (Michailidis and d'Alché-Buc 2013). Within such settings, the importance of data segmentation is well-recognized, and several methods exist for detecting change points in VAR models in both fixed (Kirch, Muhsal, and Ombao 2015) and high dimensions (Safikhani and Shojaie 2022; Wang et al. 2019; Bai, Safikhani, and Michailidis 2020; Maeng, Eckley, and Fearnhead 2022).

VAR modeling quickly becomes a high-dimensional problem as the number of parameters grows quadratically with the dimensionality. Accordingly, most existing methods for detecting change points in high-dimensional, piecewise stationary VAR processes assume sparsity (Basu and Michailidis 2015). However, it is debatable whether highly sparse models are appropriate for some applications. For example, Giannone, Lenza, and Primiceri (2021) note the difficulty of identifying sparse predictive representations for several macroeconomic applications.

We illustrate the inadequacy of the sparsity assumption on a volatility panel dataset (see Section 5.3 for its description). Figure 1(a) shows that as the dimensionality increases, the leading eigenvalue of the spectral density matrix at frequency 0 (i.e., the long-run covariance) estimated from the data also increases linearly. This indicates the presence of strong serial and cross-sectional correlations that cannot be accommodated by sparse VAR models. In Figure 1(b), we report the logged and truncated p-values obtained from fitting a VAR(5) model to the same dataset (truncation level chosen at log(3.858×10⁻⁶) by Bonferroni correction with the significance level 0.1) via ridge regression, see Cule, Vineis, and De Iorio (2011). Strong dependence observed for most pairs of the variables further confirms that we cannot infer a sparse pairwise relationship from such data. On the other hand, Figure 1(c) shows that once we estimate factors driving the strong correlations and adjust for their presence, there is evidence that the remaining dependence in the data can be modeled as being sparse. Together, plots (c)–(f) display that the relationship between a pair of variables (after factor-adjustment) varies over time, particularly at the level of industrial sectors. Here, the intervals are chosen according to the data segmentation result reported in Section 5.3. This example highlights the importance of (i) accounting for the dominant correlations prior to fitting a model under the sparsity assumption, and (ii) detecting structural changes when analyzing time series datasets covering a long period.

Fig. 1 (a) The two largest eigenvalues of the long-run covariance matrix estimated from the volatility panel analyzed in Section 5.3 (March 18, 2008–July 07, 2009, n = 223) with subsets of cross-sections randomly sampled 100 times for each given dimension p∈{5,…,72} (x-axis). (b) and (c): logged and truncated p-values from fitting a VAR(5) model to the same dataset without and with factor-adjustment. (d)–(f): logged and truncated p-values similarly obtained with factor-adjustment from the same variables over different periods. In (b)–(f), for each pair of variables, the minimum p-value over the five lags is reported. Corresponding tickers are given in x- and y-axes and industrial sectors are indicated by the colors and boundaries drawn.

Motivated by the aforementioned characteristics of high-dimensional time series data, factor-adjusted regression modeling has increasingly gained popularity (Fan, Ke, and Wang 2020; Fan, Masini, and Medeiros 2021; Krampe and Margaritella 2021). The factor-adjusted VAR model proposed by Barigozzi, Cho, and Owens (2022) assumes that a handful of common factors capture strong serial and cross-sectional correlations, such that it is reasonable to assume a sparse VAR model on the remaining component to capture idiosyncratic, variable-specific dependence. We extend this framework by proposing a new, piecewise stationary factor-adjusted VAR model and develop FVARseg, an accompanying change point detection methodology. Below we summarize the methodological and theoretical contributions made in this article.

1.1 Generality of the Modeling Framework

We decompose the data into two piecewise stationary latent processes: one is driven by factors and accounts for dominant serial and cross-sectional correlations, and the other models sparse pairwise dependence via a VAR model. We adopt the most general approach to factor modeling and allow both components to undergo changes which, in the case of the latter, are attributed to shifts in the VAR parameters. To the best of our knowledge, such a general model, simultaneously permitting the presence of common factors and change points, has not been studied in the literature previously. Accordingly, we are not aware of any method that can comprehensively address the data segmentation problem considered in this article.

1.2 Methodological Novelty

The idea of scanning the data for changes over moving windows has successfully been applied to a variety of data segmentation problems (Preuss, Puchstein, and Dette 2015; Eichinger and Kirch 2018; Chen, Wang, and Wu 2022). We propose FVARseg, a two-stage methodology that combines this idea with statistics carefully designed to have good detection power against different types of changes in the two latent components. In Stage 1 of FVARseg, motivated by the observation that dominant factor-driven correlations appear as leading eigenvalues in the frequency domain (see, e.g., Figure 1(a)), we propose a detector statistic that contrasts the local spectral density matrix estimators from neighboring moving windows in operator norm, which is well-suited to detect changes in the factor-driven component.

In Stage 2 for detecting change points in the latent piecewise stationary VAR process, we deliberately avoid estimating the latent process which may incur large errors. Instead, we make use of (i) the Yule-Walker equation that relates autocovariances (ACV) and VAR parameters, and (ii) the availability of local ACV estimators of the latent VAR process after Stage 1. Combining these ingredients, we propose a novel detector statistic that enjoys methodological simplicity as well as statistical efficiency. Further, through sequential evaluation of the detector statistic, the second-stage procedure requires the estimation of local VAR parameters at selected locations only. Consequently it is highly competitive computationally when both the sample size and the dimensionality are large.

1.3 Theoretical Consistency

FVARseg achieves consistency in estimating the total number and locations of the change points in both the piecewise stationary factor-driven and VAR processes. Our theoretical analysis is conducted in a setting considerably more general than those commonly adopted in the literature, permitting dependence across stationary segments and heavy-tailedness of the data. We also derive the rate of localization for each stage of FVARseg where we make explicit the influence of tail behavior and the size of changes. In particular, under Gaussianity, the estimators from Stage 1 nearly match the minimax optimal rate derived for the simpler, covariance change point detection problem.

The rest of the article is structured as follows. Section 2 introduces the piecewise stationary factor-adjusted VAR model. Section 3 describes the two stages of FVARseg, the proposed data segmentation methodology, and Section 4 establishes its theoretical consistency. Section 5 demonstrates the performance of FVARseg empirically, and Section 6 concludes the article. R code is available from https://github.com/haeran-cho/fvarseg.

1.4 Notation

Let I and O denote an identity matrix and a matrix of zeros whose dimensions depend on the context. For a random variable X and ν ≥ 1, denote ‖X‖ν = (E|X|^ν)^{1/ν}. Given a matrix A = [a_{ii′}, 1 ≤ i ≤ m, 1 ≤ i′ ≤ n], we denote by A* its transposed complex conjugate. We define its element-wise l∞-, l1- and l2-norms by |A|∞ = max_{i,i′} |a_{ii′}|, |A|1 = Σ_{i,i′} |a_{ii′}| and |A|2 = (Σ_{i,i′} |a_{ii′}|²)^{1/2}, and its spectral and induced L1- and L∞-norms by ‖A‖, ‖A‖1 = max_{1≤i′≤n} Σ_{i=1}^{m} |a_{ii′}| and ‖A‖∞ = max_{1≤i≤m} Σ_{i′=1}^{n} |a_{ii′}|, respectively. For positive definite A, we denote its minimum eigenvalue by ‖A‖min. For two real numbers, a∨b = max(a, b) and a∧b = min(a, b). For two sequences {a_n} and {b_n}, we write a_n ≍ b_n if, for some constants C1, C2 > 0, there exists N ∈ ℕ such that C1 ≤ a_n b_n^{−1} ≤ C2 for all n ≥ N.
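As a concrete illustration of the matrix norms above, the following sketch (ours, not part of the paper; helper names are our own) evaluates them on a small matrix:

```python
import numpy as np

# Illustration of the norms defined above (our own helper names).
def elementwise_linf(A):   # |A|_inf = max_{i,i'} |a_{ii'}|
    return np.max(np.abs(A))

def induced_l1(A):         # ||A||_1 = maximum absolute column sum
    return np.max(np.abs(A).sum(axis=0))

def induced_linf(A):       # ||A||_inf = maximum absolute row sum
    return np.max(np.abs(A).sum(axis=1))

def spectral(A):           # ||A|| = largest singular value
    return np.linalg.norm(A, 2)

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
print(elementwise_linf(A))  # 4.0
print(induced_l1(A))        # max(1+3, 2+4) = 6.0
print(induced_linf(A))      # max(1+2, 3+4) = 7.0
```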

2 Piecewise Stationary Factor-Adjusted VAR Model

2.1 Background

A zero-mean, p-variate process ξt follows a VAR(d) model if it satisfies
(1) ξt = A1 ξ_{t−1} + … + Ad ξ_{t−d} + Γ^{1/2} εt,
where Al ∈ R^{p×p}, 1 ≤ l ≤ d, determine how future values of the series depend on their past. The p-variate random vector εt = (ε1t, …, εpt)′ has entries εit which are iid for all i and t with E(εit) = 0 and var(εit) = 1. The positive definite matrix Γ ∈ R^{p×p} is the covariance matrix of the innovations of the VAR process.
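As a minimal illustration of (1), the following sketch (our own, in Python with numpy rather than the authors' R code; all settings are illustrative) simulates a stationary VAR(1) with Γ = I and recovers A1 by least squares:

```python
import numpy as np

# Minimal sketch of (1): simulate a stationary VAR(1) with Gamma = I and
# recover A1 by regressing xi_t on xi_{t-1}. Settings are illustrative only.
rng = np.random.default_rng(0)
p, n, burn = 3, 2000, 200
A1 = 0.5 * np.eye(p)                      # spectral radius 0.5 < 1: stationary
xi = np.zeros((n + burn, p))
for t in range(1, n + burn):
    xi[t] = A1 @ xi[t - 1] + rng.standard_normal(p)
xi = xi[burn:]                            # discard burn-in

# Least-squares estimate: solve xi_t ~ A1 xi_{t-1} column-by-column
B, *_ = np.linalg.lstsq(xi[:-1], xi[1:], rcond=None)
A1_hat = B.T                              # B estimates A1'
```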

A factor-driven component exhibits strong cross-sectional and/or serial correlations by “loading” finite-dimensional factors linearly. Among many, the generalized dynamic factor model (GDFM, Forni et al. 2000, 2015) provides the most general approach (see Appendix D for further discussions), and defines the p-variate factor-driven component χt as
(2) χt = B(L)ut = Σ_{l=0}^{∞} Bl u_{t−l}.

For fixed q, the q-variate random vector ut = (u1t, …, uqt)′ contains the common factors which are shared across the variables and time, and the ujt are assumed to be iid for all j and t with E(ujt) = 0 and var(ujt) = 1. The matrix of square-summable filters B(L) = Σ_{l=0}^{∞} Bl L^l, with the lag operator L and Bl ∈ R^{p×q}, serves the role of loadings under (2).

Barigozzi, Cho, and Owens (2022) propose a factor-adjusted VAR model, where the observations are assumed to be decomposed as a sum of the two latent components ξt and χt in (1)–(2), with pervasive correlations in the data accounted for by χt and the remaining dependence captured by ξt. In the next section, we introduce its piecewise stationary extension where both the factor-driven and VAR processes are allowed to undergo structural changes.

2.2 Model

We observe a zero-mean, p-variate piecewise stationary process Xt = χt + ξt where
(3) χt = χt[k] = B[k](L)ut for θχ,k + 1 ≤ t ≤ θχ,k+1, 0 ≤ k ≤ Kχ, and
ξt = ξt[k] = Σ_{l=1}^{d} Al[k] ξ_{t−l} + (Γ[k])^{1/2} εt for θξ,k + 1 ≤ t ≤ θξ,k+1, 0 ≤ k ≤ Kξ.

Here, θχ,k, 1 ≤ k ≤ Kχ, denote the change points in the piecewise stationary factor-driven component χt such that at each θχ,k, the filter of loadings B[k](L) undergoes a change. We permit the factor number to vary over time as qk ≤ q, with the factor ut[k] ∈ R^{qk} associated with χt[k] being a sub-vector of ut ∈ R^q. Similarly, θξ,k, 1 ≤ k ≤ Kξ, denote the change points in the piecewise stationary VAR process ξt at which the VAR parameters undergo shifts; we permit the VAR innovation covariance matrix to vary as Γ[k] but our interest lies in detecting changes in the VAR parameters, and the VAR order may vary over time as dk ≤ d with Al[k] = O for l ≥ dk + 1. By convention, we denote θχ,0 = θξ,0 = 0 and θχ,Kχ+1 = θξ,Kξ+1 = n. In line with the factor modeling literature, we assume that χt and ξt are uncorrelated through having E(ujt εit′) = 0 for any i, j, t and t′.

The model (3) does not require that the change points in χt and ξt are aligned, or that Kχ = Kξ. Our goal is to estimate the total number and locations of the change points for both of the piecewise stationary latent processes. Importantly, we allow {ξt[k], t ∈ Z} (resp. {χt[k], t ∈ Z}) to be dependent across k through sharing the innovations εt (resp. ut). This makes our model considerably more general than those found in the literature on (high-dimensional) data segmentation under VAR models (Wang et al. 2019; Safikhani and Shojaie 2022; Bai, Safikhani, and Michailidis 2022) which assume independence across the segments. Data segmentation under factor models has been considered by Barigozzi, Cho, and Fryzlewicz (2018) and Li, Li, and Fryzlewicz (2023) but they adopt a static approach to factor modeling.

2.3 Assumptions

We introduce assumptions that ensure the (asymptotic) identifiability of the two latent processes in (3), which are framed in terms of spectral properties, as well as controlling the degree of dependence in the data. Denote by Γχ[k](l) = E(χ_{t−l}[k](χt[k])′) the ACV matrix of χt[k] at lag l ∈ Z, and its spectral density matrix at frequency ω ∈ [−π, π] by Σχ[k](ω) = (2π)^{−1} Σ_{l=−∞}^{∞} Γχ[k](l) e^{−ιlω} with ι = √−1. Then, μχ,j[k](ω), 1 ≤ j ≤ qk, denote the real, positive eigenvalues of Σχ[k](ω) ordered by decreasing size. We similarly define Γξ[k](l), Σξ[k](ω) and μξ,j[k](ω) for ξt[k].

Assumption 2.1.

For each 0 ≤ k ≤ Kχ, the following holds: there exist a positive integer p0 ≥ 1, pairs of functions ω ↦ αj[k](ω) and ω ↦ βj[k](ω) for ω ∈ [−π, π] and 1 ≤ j ≤ qk, and rk,j ∈ (0, 1] satisfying rk,1 ≥ … ≥ rk,qk, such that for all p ≥ p0,
β1[k](ω) ≥ μχ,1[k](ω)/p^{rk,1} ≥ α1[k](ω) > … > βqk[k](ω) ≥ μχ,qk[k](ω)/p^{rk,qk} ≥ αqk[k](ω) > 0.

If rk,j = 1 for all 1 ≤ j ≤ qk, as frequently assumed in the literature (Fan, Liao, and Mincheva 2013; Forni et al. 2015), we are in the presence of qk factors which are equally pervasive for the whole cross-section of χt[k]. If rk,j < 1 for some j, we permit the presence of ‘weak’ factors. Since our primary interest lies in change point analysis, we later introduce a related but distinct condition on the size of change in χt in Assumption 4.2.

Assumption 2.2.

  1. det(I − Σ_{l=1}^{d} Al[k] z^l) ≠ 0 for all |z| ≤ 1 and 0 ≤ k ≤ Kξ.

  2. mε ≤ min_{0≤k≤Kξ} ‖Γ[k]‖min ≤ max_{0≤k≤Kξ} ‖Γ[k]‖ ≤ Mε for some constants 0 < mε ≤ Mε.

  3. Consider the Wold decomposition ξt[k] = Σ_{l=0}^{∞} Dl[k](Γ[k])^{1/2} ε_{t−l} where Dl[k] = [Dl,ij[k], 1 ≤ i, j ≤ p]. Then, there exist constants Ξ > 0 and ς > 2 such that we have Cij, 1 ≤ i, j ≤ p, satisfying max{max_{1≤j≤p} Σ_{i=1}^{p} Cij, max_{1≤i≤p} Σ_{j=1}^{p} Cij, max_{1≤i≤p} Σ_{j=1}^{p} Cij²} ≤ Ξ, with which max_{0≤k≤Kξ} |Dl,ij[k]| ≤ Cij(1 + l)^{−ς} for all l ≥ 0.

  4. min_{0≤k≤Kξ} inf_{ω∈[−π,π]} μξ,p[k](ω) ≥ mξ for some fixed constant mξ > 0.

Assumption 2.3.

There exist constants Ξ > 0 and ς > 2 such that for all l ≥ 0, max_{0≤k≤Kχ} max_{1≤i≤p} |Bl,i·[k]|2 ≤ Ξ(1 + l)^{−ς} and max_{0≤k≤Kχ} max_{1≤j≤qk} |Bl,·j[k]|2 ≤ Ξ(1 + l)^{−ς}.

Assumptions 2.2(i)–(ii) are standard conditions in the literature (Lütkepohl 2005; Basu and Michailidis 2015). Under condition (iii) and Assumption 2.3, the time-varying serial dependence in Xt (across all segments) decays at an algebraic rate. Assumption 2.2(iii) allows for mild cross-correlations in ξt[k] while ensuring that μξ,1[k](ω) is uniformly bounded:

Proposition 2.1.

Under Assumption 2.2, there exists some Mξ > 0, depending only on Mε, Ξ and ς, such that max_{0≤k≤Kξ} sup_{ω∈[−π,π]} μξ,1[k](ω) ≤ Mξ.

Remark 2.1.

Proposition 2.1, together with Assumption 2.2(iv), establishes the boundedness of the eigenvalues of Σξ[k](ω), which is commonly assumed in the high-dimensional VAR literature for the consistency of Lasso estimators. Assumption 2.2(iv) holds if there exists some constant Ξ < ∞ satisfying max(max_{1≤j≤p} Σ_{l=1}^{d} |Al,·j[k]|1, max_{1≤i≤p} Σ_{l=1}^{d} |Al,i·[k]|1) ≤ Ξ (Basu and Michailidis 2015). When d = 1, we have Dl[k] = (A1[k])^l such that if max(‖A1[k]‖1, ‖A1[k]‖∞) ≤ γ for some γ ∈ (0, 1), Assumption 2.2(iii) is readily satisfied with max(‖Dl[k]‖1, ‖Dl[k]‖∞) ≤ γ^l.

From Assumption 2.1 and Proposition 2.1, the latent components in (3) are asymptotically identifiable as p → ∞, thanks to the gap between μχ,qk[k](ω), which diverges with p, and μξ,1[k](ω), which is uniformly bounded; this agrees with the phenomenon observed in Figure 1(a).

3 Methodology

3.1 Stage 1: Factor-Driven Component Segmentation

3.1.1 Change Point Detection

The spectral density matrix of χt is given by Σχ[k](ω) = (2π)^{−1} B[k](e^{−ιω})(B[k](e^{−ιω}))* for θχ,k + 1 ≤ t ≤ θχ,k+1, that is, it varies over time in a piecewise constant manner with change points at θχ,k, 1 ≤ k ≤ Kχ. By Weyl’s inequality, Assumption 2.1 and Proposition 2.1 jointly indicate a gap in the eigenvalues of the (time-varying) spectral density matrix of Xt, that is, those attributed to the factor-driven component diverge with p while the remaining ones are bounded for all p. This suggests an approach that looks for changes in χt from the behavior of Xt in the frequency domain, which we further justify below.

Example 3.1.

Suppose that χt contains a single change point at t = θχ,1 at which a new factor is introduced, that is, χt[0] = B[0](L)ut[0] and χt[1] = B[1](L)ut[1] = B[0](L)ut[0] + b(L)vt with ut[1] = ((ut[0])′, vt)′, which leads to Σχ[1](ω) − Σχ[0](ω) = b(e^{−ιω})b*(e^{−ιω})/(2π). Then, from the uncorrelatedness between χt and ξt and Proposition 2.1, the time-varying spectral density of Xt, say Σx,t(ω), satisfies
‖θχ,1^{−1} Σ_{t=1}^{θχ,1} Σx,t(ω) − (n − θχ,1)^{−1} Σ_{t=θχ,1+1}^{n} Σx,t(ω)‖ = ‖b(e^{−ιω})b*(e^{−ιω})‖/(2π) + O(1).
That is, the change in the spectral density of χt is detectable as a change in the time-varying spectral density matrix of Xt in operator norm, with the size of change diverging with p as ‖(b(e^{−ιω}))*b(e^{−ιω})‖ does so under Assumption 2.1.
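A numeric check of the rank-one structure in Example 3.1, in a simplified special case of our own making (the new factor vt is white noise loaded statically through a vector b, so the spectral density changes by bb′/(2π) at every frequency):

```python
import numpy as np

# Simplified check of Example 3.1 (our simplification: static loading b,
# white-noise new factor): the spectral density change is b b' / (2*pi).
p = 50
b = np.ones(p)                          # pervasive loading: |b|^2 = p
Delta = np.outer(b, b) / (2 * np.pi)    # Sigma_chi^[1](w) - Sigma_chi^[0](w)
eig = np.sort(np.linalg.eigvalsh(Delta))[::-1]
# Leading eigenvalue equals |b|^2/(2*pi), i.e. it grows linearly in p;
# all remaining eigenvalues vanish (a rank-one change).
```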

Thus, we detect changes in χt by scanning for any large change in the spectral density matrix of Xt measured in operator norm, and propose the following moving window-based approach. Given a bandwidth G, we estimate the local spectral density matrix of Xt by
(4) Σ̂x,v(ω, G) = (2π)^{−1} Σ_{l=−m}^{m} K(l/m) Γ̂x,v(l, G) exp(−ιlω) for G ≤ v ≤ n,
where K(·) denotes the Bartlett kernel, m ≍ G^β the kernel bandwidth with β ∈ (0, 1), and
(5) Γ̂x,v(l, G) = G^{−1} Σ_{t=v−G+1+l}^{v} X_{t−l} Xt′ for l ≥ 0, and Γ̂x,v(l, G) = (Γ̂x,v(−l, G))′ for l < 0.
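The estimators (4)–(5) can be sketched as follows (our own Python illustration with 0-based time indices, rather than the paper's 1-based convention; β = 0.5 is an illustrative choice):

```python
import numpy as np

# Sketch of the local estimators in (4)-(5); X is an (n, p) array and time
# indices are 0-based (the paper uses 1-based t).
def local_acv(X, v, l, G):
    # Gamma_hat_{x,v}(l, G) for l >= 0: average X_{t-l} X_t' over the window
    ts = np.arange(v - G + l, v)           # 0-based version of t = v-G+1+l, ..., v
    return sum(np.outer(X[t - l], X[t]) for t in ts) / G

def local_spectral(X, v, G, beta=0.5):
    m = max(1, int(G ** beta))             # kernel bandwidth m ~ G^beta
    p = X.shape[1]
    acv = {l: local_acv(X, v, l, G) for l in range(m + 1)}
    def gamma(l):                          # Gamma(-l) = Gamma(l)'
        return acv[l] if l >= 0 else acv[-l].T
    omegas = 2 * np.pi * np.arange(m + 1) / (2 * m + 1)
    S = np.zeros((m + 1, p, p), dtype=complex)
    for j, w in enumerate(omegas):
        for l in range(-m, m + 1):
            S[j] += (1 - abs(l) / m) * gamma(l) * np.exp(-1j * l * w)  # Bartlett
        S[j] /= 2 * np.pi
    return omegas, S

rng = np.random.default_rng(7)
X = rng.standard_normal((300, 2))          # white noise: spectrum ~ I/(2*pi)
omegas, S = local_spectral(X, v=200, G=150)
```

By construction each S[j] is Hermitian, since the Bartlett weights are symmetric in the lag and Γ̂(−l) = Γ̂(l)′.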

Then the following statistic
(6) Tχ,v(ω, G) = ‖Σ̂x,v(ω, G) − Σ̂x,v+G(ω, G)‖, G ≤ v ≤ n − G,
serves as a good proxy of the difference in local spectral density matrices of χt over Iv(G) = {v − G + 1, …, v} and Iv+G(G) = {v + 1, …, v + G}. To make it more precise, let Σχ,v(ω, G) denote the weighted average Σ_{k=0}^{Kχ} wχ,k(v) Σχ[k](ω) with weights wχ,k(v) corresponding to the proportion of χt, t ∈ Iv(G), belonging to χt[k] (see (F.1)). Then, Tχ,v*(ω, G) = ‖Σχ,v(ω, G) − Σχ,v+G(ω, G)‖, as a function of v, linearly increases and then decreases around the change points, with a peak of size ‖Σχ[k](ω) − Σχ[k+1](ω)‖ formed at v = θχ,k for all 1 ≤ k ≤ Kχ, provided that the bandwidth G is not too large (in the sense of Assumption 4.2(ii)). The detector statistic Tχ,v(ω, G) is designed to approximate Tχ,v*(ω, G) when χt is not directly observed, and thus is well-suited to detect and locate the change points therein. Unlike other methods for detecting changes in the factor structure (e.g., Li, Li, and Fryzlewicz 2023), we do not require the number of factors, either for each segment or for the whole dataset, as an input for the construction of Tχ,v(ω, G).

Once Tχ,v(ωl, G) is evaluated at the Fourier frequencies ωl = 2πl/(2m + 1), 0 ≤ l ≤ m, we adapt the maximum-check of Eichinger and Kirch (2018) for simultaneous detection of the multiple change points. Taking the pointwise maximum over the frequencies at each given location v, we check if Tχ,v(ω(v), G) exceeds some threshold κn,p, where ω(v) denotes the frequency at which Tχ,v(ωl, G) is maximized, that is, ω(v) = argmax_{ωl: 0≤l≤m} Tχ,v(ωl, G). If so, it provides evidence that a change point θχ,k is located near the time point v, but some care is needed to avoid detecting duplicate estimators, since the detector statistic is expected to take a large value over an interval containing θχ,k. Therefore, denoting by I ⊂ {G, …, n − G} the set containing all time points at which Tχ,v(ω(v), G) > κn,p, we regard θ̂ = argmax_{v∈I} Tχ,v(ω(v), G) as a change point estimator if it is a local maximizer of Tχ,v(ω(θ̂), G) within an interval of radius ηG centered at θ̂ with some η ∈ (0, 1), that is, Tχ,θ̂(ω(θ̂), G) ≥ max_{θ̂−ηG<v≤θ̂+ηG} Tχ,v(ω(θ̂), G). Once θ̂ is added to the set of final estimators, say Θ̂χ, in order to avoid the risk of duplicate estimators, we remove the interval of radius G centered at θ̂ from I, and repeat the same procedure with the maximizer of Tχ,v(ω(v), G) over the time points v remaining in I until the set I is empty. Algorithm 1 in Appendix C outlines the steps of Stage 1 of FVARseg.
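The iterative threshold-and-local-maximum rule above can be sketched on a generic detector sequence (a toy tent-shaped T rather than the spectral statistic; tie-breaking and the handling of rejected candidates are simplified relative to Algorithm 1):

```python
import numpy as np

# Simplified sketch of the peak-selection rule described above, applied to a
# generic detector sequence T[v]; kappa and the radii follow the text.
def select_peaks(T, G, kappa, eta=0.5):
    n = len(T)
    active = set(v for v in range(G, n - G) if T[v] > kappa)
    estimates = []
    while active:
        v_star = max(active, key=lambda v: T[v])
        lo, hi = max(0, v_star - int(eta * G)), min(n, v_star + int(eta * G) + 1)
        if T[v_star] >= max(T[lo:hi]):                    # local maximizer check
            estimates.append(v_star)
        active -= set(range(v_star - G, v_star + G + 1))  # remove radius-G interval
    return sorted(estimates)

# Toy detector with two tent-shaped peaks at v = 100 and v = 250
n, G = 400, 50
T = np.zeros(n)
for theta in (100, 250):
    for v in range(n):
        T[v] = max(T[v], 1.0 - abs(v - theta) / G)   # peak height 1, width G
print(select_peaks(T, G, kappa=0.5))  # -> [100, 250]
```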

3.1.2 Post-Segmentation Factor Adjustment

Following the detection of change points in χt, we are able to estimate the segment-specific quantities related to χt[k]. In view of the second stage of FVARseg, which detects change points in ξt, we describe how to estimate Γχ[k](l), with which we can estimate the ACV of ξt.

For each k = 0, …, K̂χ, we first estimate the spectral density of Xt over the segment {θ̂χ,k + 1, …, θ̂χ,k+1} by Σ̂x[k](ω) as in (4), using the sample ACV computed from the segment (we use the same kernel bandwidth m for simplicity). Then, noting that the spectral density matrix of χt[k] is of rank qk under (3), we estimate it from the eigendecomposition of Σ̂x[k](ωl) by retaining only the qk largest eigenvalues, say μ̂x,j[k](ωl), and the associated eigenvectors êx,j[k](ωl), and then estimate the ACV of χt[k] by the inverse Fourier transform, that is,
(7) Σ̂χ[k](ωl) = Σ_{j=1}^{qk} μ̂x,j[k](ωl) êx,j[k](ωl)(êx,j[k](ωl))* and Γ̂χ[k](l) = 2π/(2m + 1) Σ_{l′=−m}^{m} Σ̂χ[k](ωl′) e^{ιωl′l}.

The estimators in (7) require the factor number qk as an input. We refer to Hallin and Liška (2007) for an information criterion (IC)-based estimator of qk that makes use of the postulated eigengap in the spectral density matrix of Xt.
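The eigen-truncation and inverse Fourier transform in (7) can be sketched as follows (our own illustration; the spectral estimates `S_hat` and the factor number `q` are taken as given inputs):

```python
import numpy as np

# Sketch of (7). S_hat[l'] estimates the spectral density at frequency
# omega_{l'} = 2*pi*l'/(2m+1), l' = -m, ..., m; q is the factor number.
def factor_acv(S_hat, m, q, lags):
    p = S_hat[0].shape[0]
    omegas = [2 * np.pi * lp / (2 * m + 1) for lp in range(-m, m + 1)]
    S_chi = []
    for S in S_hat:
        vals, vecs = np.linalg.eigh(S)          # ascending eigenvalues
        vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder to descending
        S_chi.append((vecs[:, :q] * vals[:q]) @ vecs[:, :q].conj().T)
    acv = {}
    for l in lags:
        A = np.zeros((p, p), dtype=complex)
        for w, S in zip(omegas, S_chi):
            A += S * np.exp(1j * w * l)         # inverse Fourier transform
        acv[l] = 2 * np.pi / (2 * m + 1) * A
    return acv

# Toy input: a constant rank-one spectrum v v' at every frequency
m = 2
v = np.array([1.0, 2.0, -1.0])
S_hat = [np.outer(v, v) for _ in range(2 * m + 1)]
acv = factor_acv(S_hat, m, q=1, lags=[0, 1])
# acv[0] recovers 2*pi * v v'; acv[1] vanishes (the spectrum is flat in omega)
```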

3.2 Stage 2: Piecewise VAR Process Segmentation

Applying the existing VAR segmentation methods in our setting requires estimating the n × p elements of the latent piecewise stationary VAR process ξt, which introduces additional errors and possibly results in the loss of statistical efficiency. In addition, as discussed in Appendix A.2, the existing methods tend to be computationally demanding, for example by evaluating the Lasso estimators O(n²) times in a dynamic programming algorithm, or by solving a large fused Lasso objective function of dimension np²d. Instead, since we can estimate the local ACV of ξt from the post-segmentation factor-adjustment in Stage 1, our proposed methodology for segmenting the latent VAR component avoids estimating ξt directly. Also, as described below, the proposed method evaluates the local VAR parameters at carefully selected locations only, and thus is computationally efficient.

Specifically, our approach makes use of the Yule-Walker equation (Lütkepohl 2005). Let β[k] = [A1[k], …, Ad[k]]′ ∈ R^{(pd)×p} contain all VAR parameters in the kth segment. Then, it is related to the ACV matrices Γξ[k](l) = E(ξ_{t−l}[k](ξt[k])′) as G[k]β[k] = g[k], where
(8) G[k] = [ Γξ[k](0) Γξ[k](−1) ⋯ Γξ[k](−d+1) ; ⋮ ⋱ ⋮ ; Γξ[k](d−1) Γξ[k](d−2) ⋯ Γξ[k](0) ] and g[k] = [ Γξ[k](1) ; ⋮ ; Γξ[k](d) ],
with G[k] being invertible due to Assumption 2.2(iv). We propose to use this estimating equation in combination with the local ACV estimators of ξt obtained as described below.
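To make (8) concrete in the simplest case d = 1 (where G[k] = Γξ[k](0), g[k] = Γξ[k](1) and β[k] = (A1[k])′), the following sketch (ours, illustrative settings) recovers A1 from the first two sample autocovariances of a simulated VAR(1):

```python
import numpy as np

# Sketch of (8) with d = 1: solve Gamma(0) beta = Gamma(1) and read off
# A1 = beta'. Settings are illustrative only.
rng = np.random.default_rng(1)
p, n = 4, 20000
A1 = np.diag([0.6, 0.4, -0.3, 0.5])         # stationary: spectral radius < 1
xi = np.zeros((n, p))
for t in range(1, n):
    xi[t] = A1 @ xi[t - 1] + rng.standard_normal(p)

Gamma0 = xi[:-1].T @ xi[:-1] / (n - 1)      # sample ACV at lag 0
Gamma1 = xi[:-1].T @ xi[1:] / (n - 1)       # sample E(xi_{t-1} xi_t')
beta_hat = np.linalg.solve(Gamma0, Gamma1)  # solves G beta = g
A1_hat = beta_hat.T                         # beta = A1'
```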

For a given bandwidth G and the interval Iv(G) = {v − G + 1, …, v}, we estimate the ACV of ξt for t ∈ Iv(G) by Γ̂ξ,v(l, G) = Γ̂x,v(l, G) − Γ̂χ,v(l, G). Here, Γ̂x,v(l, G) is defined in (5) and Γ̂χ,v(l, G) is a weighted average of Γ̂χ[k](l), 0 ≤ k ≤ K̂χ, the estimators of the ACV of χt[k] in (7), with the weights given by the proportion of Iv(G) covered by the kth segment (see (F.16) for the precise definition). Replacing Γξ[k](l) with Γ̂ξ,v(l, G), we obtain Ĝv(G) estimating a weighted average of G[k], and similarly ĝv(G). Then, we propose to scan
Tξ,v(β̂, G) = |||(Ĝv(G)β̂ − ĝv(G)) − (Ĝv+G(G)β̂ − ĝv+G(G))|||
with some inspection parameter β̂ ∈ R^{(pd)×p} and a matrix norm |||·|||. We motivate this statistic by considering its population counterpart Tξ,v*(β̂, G) = |||(Gv(G)β̂ − gv(G)) − (Gv+G(G)β̂ − gv+G(G))|||. With appropriately chosen G (see Assumption 4.4(ii)), Tξ,v*(β̂, G) = 0 if v is far from all the change points in ξt, that is, mink |v − θξ,k| ≥ G, while it is tent-shaped near the change points with a local maximum at v = θξ,k, provided that
(9) G[k−1](β̂ − β[k−1]) ≠ G[k](β̂ − β[k]).

For the inspection parameter, we adopt an l1-regularized Yule-Walker estimator which, at some v° ∈ {G, …, n}, solves the following constrained l1-minimization problem
(10) β̂v°(G) = argmin_{β∈R^{(pd)×p}} |β|1 subject to |Ĝv°(G)β − ĝv°(G)|∞ ≤ λn,p,
with a tuning parameter λn,p > 0. Assuming stationarity, a similar approach has been proposed for the estimation of high-dimensional VAR models in Han, Lu, and Liu (2015), and extended to the case where the VAR process of interest is latent in Barigozzi, Cho, and Owens (2022). The l∞-constraint in (10) naturally leads to the choice |||·||| = |·|∞, resulting in the following detector statistic: Tξ,v(β̂, G) = |(Ĝv(G)β̂ − ĝv(G)) − (Ĝv+G(G)β̂ − ĝv+G(G))|∞.
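A worked special case of (10), of our own making, gives intuition for the geometry: when Ĝv°(G) = I, the program decouples entrywise and its solution is soft-thresholding of ĝv°(G) at level λ:

```python
import numpy as np

# Special case of (10) with G_hat = I (our illustration): minimizing |beta|_1
# subject to |beta - g_hat|_inf <= lam decouples entrywise, and each entry is
# the soft-thresholded value of g_hat.
def soft_threshold(g, lam):
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

g_hat = np.array([1.0, 0.05, -0.5])
beta_hat = soft_threshold(g_hat, 0.1)   # entries smaller than lam are zeroed
```

This illustrates how the l∞-constraint shrinks small entries of the Yule-Walker solution exactly to zero, yielding a sparse inspection parameter.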

For good detection power, the condition in (9) suggests using an estimator of β[k−1] or β[k] in place of β̂ for detecting θξ,k. Therefore, we propose to evaluate Tξ,v(β̂v°(G), G) for v ≥ v°, with β̂v°(G) updated sequentially at locations strategically selected as follows.

First, we estimate β[0] by β̂ = β̂G(G) in (10) with v° = G and scan the data using Tξ,v(β̂, G), v ≥ v°. When Tξ,v(β̂, G) exceeds some threshold, say πn,p, at v = θˇ for the first time, it signifies that a change has occurred in the neighborhood. Reducing the search for a change point to {θˇ, …, θˇ + G}, we identify a change point estimator as the local maximizer θ̂ξ,1 = argmax_{θˇ≤v≤θˇ+G} Tξ,v(β̂, G). Then, updating β̂ with β̂v°(G) obtained at v° = θ̂ξ,1 + (η + 1)G for some η ∈ (0, 1] (i.e., only using an interval of length G located strictly to the right of θ̂ξ,1 for its computation), we continue screening Tξ,v(β̂, G), v ≥ v°, until it next exceeds πn,p. These steps of screening Tξ,v(β̂, G) and updating β̂ are repeated iteratively until the end of the data sequence is reached. Algorithm 2 in Appendix C outlines the steps of the Stage 2 methodology.
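The scan-detect-reestimate control flow above can be sketched generically (our toy uses a piecewise-constant mean rather than a VAR, and `detector`/`estimate` stand in for Tξ,v(β̂, G) and β̂v°(G); only the sequential logic is illustrated):

```python
import numpy as np

# Sketch of the sequential Stage 2 control flow. detector(v, beta) stands in
# for T_{xi,v}(beta_hat, G) and estimate(v0) for beta_hat_{v0}(G); the toy
# versions below only illustrate the scan-detect-reestimate loop.
def stage2_scan(n, G, detector, estimate, pi_thresh, eta=1.0):
    cps, v0 = [], G
    beta = estimate(v0)
    v = v0
    while v <= n - G:
        if detector(v, beta) > pi_thresh:          # first exceedance: v = theta_check
            window = range(v, min(v + G, n - G) + 1)
            theta_hat = max(window, key=lambda u: detector(u, beta))
            cps.append(theta_hat)
            v0 = theta_hat + int((eta + 1) * G)    # re-estimate strictly to the right
            if v0 > n - G:
                break
            beta, v = estimate(v0), v0
        else:
            v += 1
    return cps

# Toy example: a mean shift at t = 200 in a piecewise-constant sequence
mu = np.concatenate([np.zeros(200), np.ones(200)])
n, G = len(mu), 50
detector = lambda v, beta: abs(mu[v:v + G].mean() - beta)
estimate = lambda v0: mu[v0 - G:v0].mean()
print(stage2_scan(n, G, detector, estimate, pi_thresh=0.25))  # -> [200]
```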

Figure 2 illustrates that although ξt is latent, at each iteration, β̂v°(G) does as well as its oracle counterpart (obtained as in (10) with the sample ACV of ξt replacing Γ̂ξ,v(l, G)). Computationally, this strategy benefits from the fact that the costly solution of the l1-minimization problem in (10) is required (at most) Kξ + 1 times with an appropriately chosen threshold πn,p (see Theorem 4.3). We further demonstrate numerically the competitiveness of Stage 2 as a standalone method for VAR time series segmentation in Section 5.2, and provide an in-depth comparative study with the existing methods in Appendix A.2.

Fig. 2 Illustration of Stage 2 applied to a realization from (M1) of Section 5.2 with G = 300 and d = 1. Top: The solid curve represents Tξ,v(β̂,G), v°≤v≤θˇ+G, computed at the three iterations of Steps 1–3 of Algorithm 2. At each iteration, we use β̂=β̂v°(G) estimated from each of the sections of the data highlighted in the x-axis (left to right); the corresponding estimators are plotted in the bottom panel and, for comparison, we also plot the estimators obtained in the oracle setting where ξt is observable (all plots have the identical z-axis range). The locations of v°, θˇ and θ̂ in Algorithm 2, and θξ,k, are denoted by the vertical long-dashed, dot-dashed, dotted and dashed lines, respectively. The horizontal line represents πn,p chosen as described in Section 5.1.

4 Theoretical Properties

4.1 Consistency of Stage 1 of FVARseg

We carry out our theoretical investigation under two different regimes with respect to the tail behavior of ut and εt; in particular, the weaker condition in Assumption 4.1(i) permits heavy-tailed innovations, while the existing literature on (piecewise stationary) VAR modeling in high dimensions commonly adopts Gaussianity as in (ii).

Assumption 4.1.

We assume either of the following conditions.

  1. There exists ν > 4 such that max{E(|ujt|^ν), E(|εit|^ν)} ≤ μν < ∞.

  2. ut ∼ iid Nq(0, I) and εt ∼ iid Np(0, I).

In establishing the consistency of Stage 1, we opt to measure the size of changes in χt using Δχ,k(ω) = Σχ[k](ω) − Σχ[k−1](ω), 1 ≤ k ≤ Kχ, the difference in the spectral density matrices of χt from neighboring segments. As Δχ,k(ω) is Hermitian, we can always find the jth largest (in modulus), real-valued eigenvalue of Δχ,k(ω), which we denote by μj(Δχ,k(ω)), with μ1(Δχ,k(ω)) = ‖Δχ,k(ω)‖. Recall that m ≍ G^β for some β ∈ (0, 1) denotes the bandwidth used in local spectral density estimation, see (4).

Assumption 4.2.

  1. For each 1 ≤ k ≤ Kχ, the following holds: there exist a positive integer p0 ≥ 1, pairs of functions ω ↦ aj[k](ω) and ω ↦ bj[k](ω) for ω ∈ [−π, π] and j = 1, 2, and rk,1 ∈ (0, 1] and rk,2 ∈ [0, 1] satisfying rk,1 ≥ rk,2, such that
b1[k](ω) ≥ μ1(Δχ,k(ω))/p^{rk,1} ≥ a1[k](ω) > b2[k](ω) ≥ μ2(Δχ,k(ω))/p^{rk,2} ≥ a2[k](ω) ≥ 0 for all p ≥ p0.
Besides, we assume that the functions ω ↦ p^{−rk,1} μ1(Δχ,k(ω)) are Lipschitz continuous with bounded Lipschitz constants. Then, for Δχ,k = max_{ω∈[−π,π]} μ1(Δχ,k(ω)), we have max_{1≤k≤Kχ} Δχ,k^{−1} · p(ψn ∨ m^{−1}) = o(1), where
(11) ψn = n^{2/ν} √m log^{2+2/ν}(G)/G ∨ √(m log(n)/G) under Assumption 4.1(i), and ψn = √(m log(n)/G) under Assumption 4.1(ii).

  2. The bandwidth G = Gn satisfies Gn → ∞ as n → ∞ while fulfilling
(12) min{min_{0≤k≤Kχ}(θχ,k+1 − θχ,k), min_{0≤k≤Kξ}(θξ,k+1 − θξ,k)} ≥ 2G.

Assumption 4.2 specifies the detection lower bound, determined by mink Δχ,k and mink(θχ,k+1 − θχ,k) (through G), for all Kχ change points in χt to be detectable by Stage 1. In the literature on change point detection in factor models, a common assumption is that the change is large enough to appear as extra “factors” (Duan, Bai, and Han 2022), in light of which condition (i) is a reasonable one. It further requires μ1(Δχ,k(ω)) to be distinct from the rest. In fact, the remaining μj(Δχ,k(ω)), j ≥ 2, are allowed to be exactly zero, which is the case in Example 3.1; here, we have Δχ,1 = max_ω (2π)^{−1}‖(b(e^{−ιω}))*b(e^{−ιω})‖, where b(z) is a p-variate vector of factor loading filters. The rate p(ψn ∨ m^{−1}) represents the bias-variance tradeoff when estimating the local spectral density matrix of χt by Σ̂x,v(ω, G) (see Proposition F.6). It is possible to find the rate of the kernel bandwidth m that minimizes this rate depending on the tail behavior of Xit (e.g., m ≍ (G/log(n))^{1/3} under Gaussianity), but we choose to explicitly highlight the role of this tuning parameter in our results.

Theorem 4.1.

Suppose that Assumptions 2.1–2.3, 4.1, and 4.2 hold. Let $\kappa_{n,p}$ satisfy
$$2Mp(\psi_n \vee m^{-1}) < \kappa_{n,p} < \frac{1}{2} \min_{1 \le k \le K_\chi} \Delta_{\chi,k} - Mp(\psi_n \vee m^{-1})$$
for some constant $M > 0$. Then, there exists a set $\mathcal{M}_{n,p}^\chi$ with $P(\mathcal{M}_{n,p}^\chi) \to 1$ as $n, p \to \infty$, such that the following holds for $\hat\Theta_\chi = \{\hat\theta_{\chi,k},\ 1 \le k \le \hat K_\chi : \hat\theta_{\chi,1} < \cdots < \hat\theta_{\chi,\hat K_\chi}\}$ returned by Stage 1 of FVARseg, on $\mathcal{M}_{n,p}^\chi$ for large enough $n$ and $p$:

  (a) $\hat K_\chi = K_\chi$ and $\max_{1 \le k \le K_\chi} |\hat\theta_{\chi,k} - \theta_{\chi,k}| \le \epsilon_0 G$ for some $\epsilon_0 \in (0, 1/2)$ with $\eta \in (2\epsilon_0, 1]$.

  (b) There exists a constant $c_0 > 0$ such that for all $1 \le k \le K_\chi$, $|\hat\theta_{\chi,k} - \theta_{\chi,k}| \le c_0 \rho_{n,p}^{[k]}$, where
$$\rho_{n,p}^{[k]} = \Big(\frac{\Delta_{\chi,k}}{p}\Big)^{-2} \times \begin{cases} m^{\frac{\nu}{\nu-2}} (G K_\chi)^{\frac{2}{\nu-2}} & \text{under Assumption 4.1(i)}, \\ m \log(G K_\chi) & \text{under Assumption 4.1(ii)}. \end{cases}$$

Remark 4.1.

  1. In Theorem 4.1(b), $\rho_{n,p}^{[k]}$ reflects the difficulty associated with estimating the individual change point $\theta_{\chi,k}$, manifested by $(p^{-1}\Delta_{\chi,k})^{-2}$. In the Gaussian case (Assumption 4.1(ii)), the localization rate $\rho_{n,p}^{[k]}$ is always sharper than $G$ due to Assumption 4.2(i). Considering the problem of covariance change point detection in independent, sub-Gaussian random vectors in high dimensions, Wang, Yu, and Rinaldo (2021) derive the minimax lower bound on the localization rate in their Lemma 3.2, and $\rho_{n,p}^{[k]}$ matches this rate up to $m \log(n)$; here, the dependence on the kernel bandwidth $m$ is attributed to the fact that we consider a time series segmentation problem, that is, a change may occur in the ACV of $\chi_t$ at lags other than zero. If heavier tails are permitted (Assumption 4.1(i)), $\rho_{n,p}^{[k]}$ can be tighter than $\epsilon_0 G$, for example when $\Delta_{\chi,k} \asymp p$, $K_\chi$ is fixed and $m \asymp G^\beta$ for some $\beta \in (0, 1 - 4/\nu)$.

  2. Empirically, replacing $\hat\theta$ with $\tilde\theta = \arg\max_{v \in \mathcal{I}}\, \mathrm{avg}_l\, T_{\chi,v}(\omega_l, G)$ returns a more stable location estimator, where $\mathrm{avg}_l$ denotes the average operator over $l = 0, \ldots, m$. We can derive the localization rate for $\tilde\theta$ similarly as in Theorem 4.1(b), with $\tilde\Delta_{\chi,k} = \pi^{-1} \int_0^\pi \|\Delta_{\chi,k}(\omega)\|\, d\omega$ in place of $\Delta_{\chi,k}$. Our numerical results in Section 5.2 are based on this estimator.
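As an illustration, the averaged location estimator of Remark 4.1 reduces to an argmax over frequency-averaged detector values; the sketch below uses synthetic stand-ins for $T_{\chi,v}(\omega_l, G)$, not the actual FVARseg statistics:

```python
import numpy as np

# Hypothetical detector values T[v, l] = T_{chi,v}(omega_l, G) over a grid of
# time points v and Fourier frequencies omega_0, ..., omega_m (synthetic data).
rng = np.random.default_rng(0)
n_v, m = 200, 8
T = rng.random((n_v, m + 1))
T[120, :] += 5.0  # plant a pronounced peak at v = 120

# Averaged estimator of Remark 4.1: average over frequencies, then argmax over v.
theta_tilde = int(np.argmax(T.mean(axis=1)))
print(theta_tilde)  # 120
```

Averaging over the frequencies $\omega_l$ before maximizing tends to smooth out frequency-specific noise in the detector, which is why this variant is observed to be more stable in practice.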

Next, we establish the consistency of $\hat\Gamma_\chi^{[k]}(l)$ in (7) in estimating the segment-specific ACV of $\chi_t^{[k]}$, under the following assumption on the strength of factors.

Assumption 4.3.

Assumption 2.1 holds with $r_{k,j} = 1$ for all $1 \le j \le q_k$ and $0 \le k \le K_\chi$.

Theorem 4.2.

Suppose that Assumption 4.3 holds in addition to the assumptions made in Theorem 4.1, and define $\rho_{n,p} = \max_{1 \le k \le K_\chi} \min(\epsilon_0 G, \rho_{n,p}^{[k]})$. Also let
$$\vartheta_{n,p} = \begin{cases} \dfrac{m (np)^{2/\nu} \log^{7/2}(p)}{G} \vee \sqrt{\dfrac{m \log(np)}{G}} & \text{under Assumption 4.1(i)}, \\[1ex] \sqrt{\dfrac{m \log(np)}{G}} & \text{under Assumption 4.1(ii)}. \end{cases}$$

Then on $\mathcal{M}_{n,p}^\chi$ defined in Theorem 4.1, for some finite integer $d \in \mathbb{N}$, we have
$$\max_{0 \le k \le K_\chi} \max_{0 \le l \le d} \big|\hat\Gamma_\chi^{[k]}(l) - \Gamma_\chi^{[k]}(l)\big| = O_P\Big( \Big(\vartheta_{n,p} \vee \frac{1}{m} \vee \rho_{n,p} G^{-1}\Big)\, p \Big).$$

It is possible to work under the weaker Assumption 2.1 and trace the effect of weak factors, or to bound estimation errors measured in different norms. Corollary E.16 of Barigozzi, Cho, and Owens (2022) derives such results in the stationary setting, where an additional multiplicative factor of $p^{2(1 - \min_k r_{k,q_k})}$ appears in the $O_P$-bound of Theorem 4.2. We work under the stronger Assumption 4.3 as it simplifies the presentation of Theorem 4.2, which plays an important role in the investigation into Stage 2 of FVARseg. Also, the factor number estimator of Hallin and Liška (2007), which achieves consistency under the general dynamic factor model we adopt in (3), requires that $r_{k,q_k} = 1$ for all $0 \le k \le K_\chi$. Forni et al. (2004) argue that this assumption is a natural one requiring the influence of the common shocks to be, in some sense, "stationary along the cross-sections," and that it is compatible with the cross-sectional ordering being completely arbitrary.

4.2 Consistency of Stage 2 of FVARseg

Suppose that the tuning parameter for the $\ell_1$-regularized Yule-Walker estimation problem in (10) is set, with some constant $M > 0$ and $\vartheta_{n,p}$ and $\rho_{n,p}$ defined in Theorem 4.2, as
(13) $$\lambda_{n,p} = M \Big( \max_{0 \le k \le K_\xi} \|\beta^{[k]}\|_1 + 1 \Big) \Big( \vartheta_{n,p} \vee \frac{1}{m} \vee \rho_{n,p} G^{-1} \Big)\, p.$$

This choice reflects the error of $\hat\Gamma_{\xi,v}(l, G)$ in estimating the local ACV of $\xi_t$ over all $v$ and $l$.

The following assumption imposes conditions on the size of the changes in VAR parameters and the minimum spacing between the change points.

Assumption 4.4.

  (i) For each $1 \le k \le K_\xi$, let $\Delta_{\xi,k} = \mathcal{G}^{[k]} (\beta^{[k]} - \beta^{[k-1]})$. Then,
$$\max_{1 \le k \le K_\xi} \big(1 \vee \|\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1}\|_1\big) \frac{\lambda_{n,p}}{|\Delta_{\xi,k}|} = o(1).$$

  (ii) The bandwidth $G$ fulfills (12), that is, $\min_{0 \le k \le K_\xi} (\theta_{\xi,k+1} - \theta_{\xi,k}) \ge 2G$.

Remark 4.2.

We choose to measure the size of change using $|\Delta_{\xi,k}|$. From Assumption 2.2(iv), we have $\Delta_{\xi,k} = O$ iff $\beta^{[k]} - \beta^{[k-1]} = O$. In the related literature, the $\ell_2$-norm $|\beta^{[k]} - \beta^{[k-1]}|_2$ scaled by the global sparsity (given by the union of the supports of all $\beta^{[k]}$, $0 \le k \le K_\xi$) is used to measure the size of change, where this global sparsity may be much greater than that of $\Delta_{\xi,k}$ when $K_\xi$ is large, see Appendix A.2. In some instances, we have $\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1} = I$, for example when $d = 1$ and $A_1^{[k]} = A_1^{[k-1]}$, such that Assumption 4.4(i) becomes $\lambda_{n,p} = o(\min_k |\Delta_{\xi,k}|)$. More generally, bounding $\|\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1}\|_1$ implicitly assumes (approximate) sparsity of the second-order structure of $\xi_t$. When $d = 1$, we have $\mathcal{G}^{[k]} = \sum_{l=0}^\infty (A_1^{[k]})^l \Gamma^{[k]} [(A_1^{[k]})^\top]^l$, such that the boundedness of $\|\mathcal{G}^{[k]}\|_1$ and $\|(\mathcal{G}^{[k]})^{-1}\|_1$ follows when $A_1^{[k]}$ and $\Gamma^{[k]}$ are block diagonal with fixed block size (Wang and Tsay 2022). For general $d \ge 1$, we have $\|\mathcal{G}^{[k]} (\mathcal{G}^{[k-1]})^{-1}\|_1$ bounded if the $\mathcal{G}^{[k]}$ are strictly diagonally dominant (see Definition 6.1.9 of Horn and Johnson (1985) and Han, Lu, and Liu (2015)), which is met, for example, when the $A_l^{[k]}$ are diagonal with their diagonal entries fulfilling $\gamma_{\xi,ii}^{[k]}(0) > 2 \sum_{l=1}^{d-1} |\gamma_{\xi,ii}^{[k]}(l)|$ (where $\Gamma_\xi^{[k]}(l) = [\gamma_{\xi,ii'}^{[k]}(l)]_{i,i'}$); this trivially holds when $d = 1$.
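The strict diagonal dominance condition invoked above is straightforward to check numerically; a minimal sketch with toy numbers (the matrix `G` and the autocovariances `gamma` are hypothetical, chosen only to illustrate the two checks):

```python
import numpy as np

def strictly_diagonally_dominant(M):
    """Check |M[i,i]| > sum_{j != i} |M[i,j]| for every row i
    (Definition 6.1.9 of Horn and Johnson (1985))."""
    abs_M = np.abs(np.asarray(M, dtype=float))
    off_diag_row_sums = abs_M.sum(axis=1) - np.diag(abs_M)
    return bool(np.all(np.diag(abs_M) > off_diag_row_sums))

# Toy autocovariances gamma(0), gamma(1), gamma(2) of one coordinate of xi_t
# with d = 3: the sufficient condition gamma(0) > 2 * sum_{l>=1} |gamma(l)|.
gamma = [1.0, 0.2, 0.1]
print(gamma[0] > 2 * sum(abs(g) for g in gamma[1:]))  # True

# A strictly diagonally dominant toy matrix standing in for G^{[k]}.
G = np.array([[4.0, 1.0, 0.5], [0.5, 3.0, 1.0], [1.0, 1.0, 5.0]])
print(strictly_diagonally_dominant(G))  # True
```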

Theorem 4.3.

Suppose that Assumption 4.4 holds in addition to the assumptions made in Theorem 4.2. With $\lambda_{n,p}$ chosen as in (13), we set $\pi_{n,p}$ to satisfy
$$2\lambda_{n,p} < \pi_{n,p} < \frac{1}{2} \min_{1 \le k \le K_\xi} |\Delta_{\xi,k}|.$$

Then, there exists a set $\mathcal{M}_{n,p}^\xi$ with $P(\mathcal{M}_{n,p}^\xi) \to 1$ as $n, p \to \infty$, such that the following holds for $\hat\Theta_\xi = \{\hat\theta_{\xi,k},\ 1 \le k \le \hat K_\xi : \hat\theta_{\xi,1} < \cdots < \hat\theta_{\xi,\hat K_\xi}\}$ returned by Stage 2 of FVARseg, on $\mathcal{M}_{n,p}^\xi$ for large enough $n$:

  (a) $\hat K_\xi = K_\xi$ and $\max_{1 \le k \le K_\xi} |\hat\theta_{\xi,k} - \theta_{\xi,k}| \le \epsilon_0 G$ for some $\epsilon_0 \in (0, 1/2)$ with $\eta \in (\epsilon_0, 1]$.

  (b) There exists a constant $c_0 > 0$ such that for all $1 \le k \le K_\xi$ satisfying $\{\theta_{\xi,k} - 2G + 1, \ldots, \theta_{\xi,k} + 2G\} \cap \Theta_\chi = \emptyset$, we have $|\hat\theta_{\xi,k} - \theta_{\xi,k}| \le c_0 \varrho_{n,p}^{[k]}$, where
$$\varrho_{n,p}^{[k]} = |\Delta_{\xi,k}|^{-2} \Big(1 + \max_{0 \le k \le K_\xi} \|\beta^{[k]}\|_1\Big) \times \begin{cases} (G K_\xi p)^{\frac{2}{\nu-2}} \log^{\frac{3\nu}{\nu-2}}(p) & \text{under Assumption 4.1(i)}, \\ \log(G K_\xi p) & \text{under Assumption 4.1(ii)}. \end{cases}$$

Due to the sequential nature of FVARseg, the success of Stage 2 is conditional on that of Stage 1, which occurs on a set of asymptotic probability one, see Theorem 4.1. Theorem 4.3(a) establishes that Stage 2 of FVARseg consistently detects all $K_\xi$ change points within the distance $\epsilon_0 G$, where $\epsilon_0$ can be made arbitrarily small as $n, p \to \infty$ under Assumption 4.4(i). Theorem 4.3(b) shows that a further refined localization rate can be derived for $\theta_{\xi,k}$ when it is sufficiently far from the change points in the factor-driven component. If, say, $\theta_{\xi,k}$ lies close to $\theta_{\chi,k'}$, a change point in $\chi_t$, the error in estimating the local ACV of $\xi_t$ due to the bias in $\hat\theta_{\chi,k'}$ prevents applying the arguments involved in the refinement to such $\theta_{\xi,k}$. The refined rate $\varrho_{n,p}^{[k]}$ is always tighter than $G$ under Gaussianity.

It is of independent interest to consider the cases where $\chi_t$ is stationary (i.e., $K_\chi = 0$) or where we directly observe the piecewise stationary VAR process (i.e., $X_t = \xi_t$). Consistency of Stage 2 of FVARseg readily extends to such settings, and the improved localization rates in Theorem 4.3(b) apply to all the estimators. Also, a further improvement is attained in the heavy-tailed situation (Assumption 4.1(i)) if $\xi_t$ is directly observable. For the full statement of the results, we refer to Corollary A.1 in Appendix A, where we also provide a detailed comparison between Stage 2 of FVARseg and existing VAR segmentation methods (which do not take into account the possible presence of factors), both theoretically and numerically.

5 Empirical Results

5.1 Numerical Considerations

5.1.1 Multiscale Extension

The bandwidth $G$ is required to be large enough to provide good local estimators of the spectral density of $\chi_t$ (Stage 1) and of the VAR parameters (Stage 2). However, if $G$ is too large, we may have windows that contain two or more change points when scanning the data, which violates Assumptions 4.2(ii) and 4.4(ii). Cho and Kirch (2022) note the lack of adaptivity of a single-bandwidth moving window procedure in the presence of multiscale change points (a mixture of large changes over short intervals and smaller changes over long intervals), and advocate the use of multiple bandwidths. Accordingly, we also propose to apply FVARseg with a range of bandwidths and prune down the outputs using a "bottom-up" method (Messer et al. 2014; Meier, Kirch, and Cho 2021). Let $\hat\Theta(G)$ denote the output from Stage 1 or 2 with a bandwidth $G$. Given a set of bandwidths $\mathcal{G} = \{G_h,\ 1 \le h \le H : G_1 < \cdots < G_H\}$, we first add all estimators obtained with the finest bandwidth $G_1$ to the set of final estimators $\hat\Theta$ and then, sequentially for $h \ge 2$, accept $\hat\theta \in \hat\Theta(G_h)$ iff $\min_{\check\theta \in \hat\Theta} |\hat\theta - \check\theta| \ge G_h/2$. In simulation studies, we use $\mathcal{G}_\chi = \{[n/10], [n/8], [n/6], [n/4]\}$ for Stage 1, and $\mathcal{G}_\xi$ generated as an equispaced sequence of length 4 between $[2.5p]$ and $[n/4]$ for Stage 2. The choice of $\mathcal{G}_\xi$ is motivated by the simulation results of Barigozzi, Cho, and Owens (2022) in the stationary setting, where the $\ell_1$-regularized estimator in (10) was observed to perform well when the sample size exceeds $2p$.
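The bottom-up pruning rule described above can be sketched as follows (the function name and the toy estimator sets are ours, for illustration only, not FVARseg's implementation):

```python
def bottom_up_merge(estimates_by_bandwidth):
    """Prune change point estimators obtained with bandwidths G_1 < ... < G_H:
    accept everything from the finest bandwidth, then accept an estimator from
    a coarser bandwidth G_h only if it lies at least G_h / 2 away from all
    previously accepted estimators."""
    accepted = []
    for G, ests in sorted(estimates_by_bandwidth.items()):  # finest first
        for theta in sorted(ests):
            if all(abs(theta - t) >= G / 2 for t in accepted):
                accepted.append(theta)
    return sorted(accepted)

# Hypothetical outputs from three bandwidths on the same series: the estimator
# 245 (resp. 260) duplicates 250 and is pruned; 500 is new and is kept.
out = bottom_up_merge({100: [250, 700], 150: [245, 500], 200: [260]})
print(out)  # [250, 500, 700]
```

Finer bandwidths are trusted first because they localize large changes over short intervals; coarser bandwidths only contribute estimators that are not already explained by an accepted one.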

5.1.2 Speeding Up Stage 1

The computational bottleneck of FVARseg is the computation of $T_{\chi,v}(\omega_l, G)$ in Stage 1, which involves the singular value decomposition (SVD) of a $p \times p$ matrix at multiple frequencies and over time. We propose to evaluate $T_{\chi,v}(\omega_l, G)$ on a grid $v \in \{G + a b_n : 0 \le a \le (n - 2G)/b_n\}$ with $b_n = 2\log(n)$. This may incur an additional bias of at most $b_n/2 = \log(n)$ in change point location estimation, which is asymptotically negligible in view of Theorem 4.1, but reduces the computational load by a factor of $b_n$.
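A possible discretization of this evaluation grid, assuming $b_n = 2\log(n)$ and rounding to integer time indices (the rounding convention is ours):

```python
import math

def evaluation_grid(n, G):
    """Coarse grid v in {G + a * b_n : 0 <= a <= (n - 2G) / b_n} with
    b_n = 2 * log(n), reducing the number of SVD evaluations by a factor
    of roughly b_n compared to scanning every time point."""
    b_n = 2 * math.log(n)
    return [round(G + a * b_n) for a in range(int((n - 2 * G) / b_n) + 1)]

grid = evaluation_grid(n=1000, G=100)
print(len(grid), grid[0], grid[-1])
```

For $n = 1000$ and $G = 100$, the grid spans the admissible range $[G, n - G]$ with spacing $b_n \approx 13.8$ instead of evaluating all $n - 2G = 800$ candidate points.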

5.1.3 Selection of Thresholds

The theoretically permitted ranges of $\kappa_{n,p}$ and $\pi_{n,p}$ (see Theorems 4.1 and 4.3) depend on constants which are not accessible or difficult to estimate in practice. This is an issue commonly encountered by data segmentation methods which involve localized testing, and often a reasonable solution is found by large-scale simulations, an approach we also take. We use simulations to derive a simple rule for selecting the threshold as a function of $n$, $p$, and $G$. For this, we (i) propose a scaling for each of the two detector statistics adopted in Stages 1 and 2 which reduces its dependence on the data generating process, and (ii) fit a linear model for an appropriate percentile of the scaled detector statistics obtained from simulated datasets. Specifically, we simulate $B = 100$ time series following (3) with $K_\chi = K_\xi = 0$ using the models considered in Section 5.2, and record the maximum of the scaled detector statistics $T_{\chi,v}^\circ(G)$ and $T_{\xi,v}^\circ(G)$ over $v$ on each realization. Here, the scaling terms are obtained from the first $G$ observations only, as
$$T_{\chi,v}^\circ(G) = \max_{0 \le l \le m} \frac{T_{\chi,v}(\omega_l, G)}{T_{\chi,G}(\omega_l, G)} \quad \text{and} \quad T_{\xi,v}^\circ(G) = \frac{T_{\xi,v}(\hat\beta_G(G), G)}{\max_{0 \le l \le d} \big|\hat\Gamma_{\xi,[G/2]}(l, [G/2]) - \hat\Gamma_{\xi,G}(l, [G/2])\big|}.$$

Generating the data with varying $(n, p, q, d)$ and repeating the above procedure with multiple choices of $G$, we fit a linear model to the $100(1-\tau)$th percentile of $\log(\max_v T_{\chi,v}^\circ(G))$ with $\log\log(n)$ and $\log(G)$ as regressors ($R^2_{\mathrm{adj}} = 0.9651$), and use the fitted model to derive a threshold for given $n$ and $G$ that is then applied to the similarly scaled $T_{\chi,v}^\circ(G)$. Analogously, we regress the $100(1-\tau)$th percentile of $\log(\max_v T_{\xi,v}^\circ(G))$ onto $\log\log(n)$, $\log\log(p)$, and $\log(G)$ ($R^2_{\mathrm{adj}} = 0.985$), and from the fitted model find a threshold applied to the scaled $T_{\xi,v}^\circ(\hat\beta, G)$ given $n$, $p$, and $G$. The choice of the regressors is motivated by the definitions of $\psi_n$ and $\vartheta_{n,p}$ which appear in Theorems 4.1 and 4.3. The high values of $R^2_{\mathrm{adj}}$ indicate the excellent fit of the linear models and, consequently, that the threshold selection rule is insensitive to the data generating process. When Stage 2 is used as a standalone method for segmenting observed VAR processes, a smaller threshold is recommended, in line with Corollary A.1, and we find that $\pi_{n,p} = 1$ works well with the proposed scaling.
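A minimal sketch of this simulation-based threshold rule; the null statistics below are synthetic stand-ins (an arbitrary positive distribution) for the scaled detector maxima, and the fitted coefficients are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
tau = 0.05

# For each (n, G), record the 100(1 - tau)-th percentile of the log of the
# null detector maxima; in the actual procedure these come from B = 100
# simulated series with no change points.
design, response = [], []
for n in (500, 1000, 2000):
    for G in (50, 100, 200):
        null_max = np.log(G) * rng.chisquare(df=5, size=100)  # synthetic
        design.append([1.0, np.log(np.log(n)), np.log(G)])
        response.append(np.quantile(np.log(null_max), 1 - tau))

# Least-squares fit of the percentile on loglog(n) and log(G).
coef, *_ = np.linalg.lstsq(np.array(design), np.array(response), rcond=None)

def threshold(n, G):
    """Threshold for a new (n, G) from the fitted linear model."""
    return np.exp(coef[0] + coef[1] * np.log(np.log(n)) + coef[2] * np.log(G))

print(threshold(1500, 150) > 0)  # True
```

The regression is fit once offline; at run time only the fitted coefficients are needed to produce a threshold for any new $(n, G)$.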

5.1.4 Other Tuning Parameters

While data-adaptive methods exist for selecting the kernel window size $m$ in (4) (Politis 2003), we find that simply setting $m = \max(1, G^{1/3})$ for given $G$ works well for the purpose of data segmentation. The results are not highly sensitive to the choice of $\eta$ in Stage 1, and we use $\eta = 0.5$ throughout. In Stage 2, we find that not trimming off the data when estimating the VAR parameters, by setting $\eta = 0$, does not hurt the numerical performance. In the factor adjustment, we select the segment-specific factor number $q_k$ using the IC-based approach of Hallin and Liška (2007). Krampe and Margaritella (2021) propose to jointly select the (static) factor number and the VAR order using an IC but, generally, the validity of IC is not well understood for VAR order selection in high dimensions. In our simulations, following the practice in the literature on VAR segmentation, we regard $d$ as known but also investigate the sensitivity of FVARseg when $d$ is misspecified. In analyzing the panel of daily volatilities (Section 5.3), we use $d = 5$, which has the interpretation of the number of trading days per week. Finally, we select $\lambda_{n,p}$ in (10) via cross validation as in Barigozzi, Cho, and Owens (2022).

5.2 Simulation Studies

In the simulations, we consider the cases when the factor-driven component is present ($\chi_t \ne 0$) and when it is not ($\chi_t = 0$). For the former, we consider two models for generating $\chi_t$ with $q = 2$. In the first model, referred to as (C1), $\chi_t$ admits a static factor model representation while in the second model (C2), it does not; empirically, the task of factor structure estimation is observed to be more challenging under (C2) (Forni et al. 2017; Barigozzi, Cho, and Owens 2022). We generate $\xi_t$ as piecewise stationary Gaussian VAR($d$) processes with $d \in \{1, 2\}$ and a parameter $\beta$ that controls the size of the change (with smaller $\beta$ indicating a smaller change). We refer to Appendix B.1 for the full descriptions of the simulation models and for an overview of the 24 data generating processes, which also contains information about the sets of change points $\Theta_\chi$ and $\Theta_\xi$; under each setting, we generate 100 realizations. Below, we provide a summary of the findings from the simulation studies; the tables reporting the results can be found in Appendix B.2.

Table 1 Data generating processes for simulation studies.

To the best of our knowledge, there does not exist a methodology that comprehensively addresses the change point problem under the model (3). Therefore, under (M1)–(M2), we compare Stage 1 of FVARseg with the method proposed in Barigozzi, Cho, and Fryzlewicz (2018), referred to as BCF hereafter, on their performance at detecting changes in $\chi_t$. While BCF has a step for detecting change points in the remainder component, it does so nonparametrically unlike Stage 2 of FVARseg, which may lead to an unfair comparison. Hence, we separately consider (M3) with $X_t = \xi_t$, where we compare the Stage 2 method with VARDetect (Safikhani, Bai, and Michailidis 2022), a block-wise variant of Safikhani and Shojaie (2022).

5.2.1 Results under (M1)–(M2)

Overall, FVARseg achieves good accuracy in estimating the total number and locations of the change points for both $\chi_t$ and $\xi_t$ across different data generating processes. Under (M1), adopting the static factor model for generating $\chi_t$, FVARseg shows similar performance to BCF in detecting $\Theta_\chi$ when the dimension is small ($p = 50$), but the latter tends to over-estimate the number of change points as $p$ increases. Also, FVARseg outperforms the binary segmentation-based BCF in change point localization. BCF requires as an input an upper bound on the number of global factors, say $\bar q$, that includes the ones attributed to the change points, and its performance is sensitive to this choice. In (M1), we have $\bar q \le 3q(K_\chi + 1)$ (which is supplied to BCF) while in (M2), $\chi_t^{[k]}$ does not admit a static factor representation and accordingly such a $\bar q$ does not exist (we set $\bar q = 2q$ for BCF). Accordingly, BCF tends to under-estimate the number of change points under (M2). Generally, the task of detecting change points in $\xi_t$ is aggravated by the presence of change points in $\chi_t$ due to the sequential nature of FVARseg, and Stage 2 performs better when $K_\chi = 0$ both in terms of detection and localization accuracy, which agrees with the observations made in Corollary A.1(a).

Between (M1) and (M2), the latter poses a more challenging setting for the Stage 2 methodology. This may be attributed to (i) the difficulty posed by the data generating scenario (C2), which is observed to make the estimation tasks related to the latent VAR process more difficult (Barigozzi, Cho, and Owens 2022), and (ii) the fact that $\Theta_\chi = \Theta_\xi$, where the estimation bias from Stage 1 has a worse effect on the performance of Stage 2 compared to when $\Theta_\chi$ and $\Theta_\xi$ do not overlap, see the discussion below Theorem 4.3.

5.2.2 Results under (M3)

Table B.2 shows that Stage 2 of FVARseg outperforms VARDetect in all criteria considered, particularly as $p$ increases. VARDetect struggles to detect any change point when the change is weak (recall that $\beta = 0.6$ is used when $d = 1$, which makes the size of the change at $\theta_{\xi,2}$ small) or when $d = 2$. FVARseg is faster than VARDetect in most situations except when $(d, p, K_\xi) = (1, 50, 0)$, sometimes by more than 10 times, for example when $d = 2$ and there is no change point in the data. Additionally, Stage 2 of FVARseg is less sensitive to over-specification of the VAR order ($d = 2$ is used when in fact $d = 1$). When it is under-specified, there is a slight loss of detection power, as expected. Generally, an increase in the VAR order brings an increase in the number of VAR parameters, which impacts the empirical performance. Compared to the results obtained under (M1)–(M2), the localization performance of the Stage 2 method improves in the absence of the factor-driven component, even though the size of the changes under (M3) tends to be smaller. This confirms the theoretical findings reported in Corollary A.1(b) in Appendix A. Although not reported here, when the full FVARseg methodology is applied to the data generated under (M3), the Stage 1 method does not detect any spurious change points, as desired.

5.3 Application: U.S. Blue Chip Data

We consider daily stock prices of $p = 72$ U.S. blue chip companies across industry sectors between January 3, 2000 and February 16, 2022 ($n = 5568$ days), retrieved from the Wharton Research Data Services; the list of companies and their corresponding sectors can be found in Appendix E. Following Diebold and Yilmaz (2014), we measure the volatility using $\sigma_{it}^2 = 0.361 (p_{it}^{\text{high}} - p_{it}^{\text{low}})^2$, where $p_{it}^{\text{high}}$ (resp. $p_{it}^{\text{low}}$) denotes the maximum (resp. minimum) log-price of stock $i$ on day $t$, and set $X_{it} = \log(\sigma_{it}^2)$.
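The volatility transform can be computed directly from the daily high/low log-prices; the numbers below are illustrative, not taken from the dataset:

```python
import math

def log_volatility(p_high, p_low):
    """Range-based volatility proxy following Diebold and Yilmaz (2014):
    sigma^2 = 0.361 * (log-high - log-low)^2, transformed as X = log(sigma^2)."""
    sigma2 = 0.361 * (p_high - p_low) ** 2
    return math.log(sigma2)

# Example: a stock whose log-price ranges over 0.05 within a day.
x = log_volatility(p_high=4.62, p_low=4.57)
print(x)
```

The log transform maps the positive, heavily right-skewed volatility proxy onto the real line, making it more amenable to the (approximately linear) factor-adjusted VAR model.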

We apply FVARseg to detect change points in the panel of volatility measures $\{X_{it},\ 1 \le i \le p;\ 1 \le t \le n\}$. With $n_0 = 252$ denoting the number of trading days per year, we apply Stage 1 with bandwidths chosen as an equispaced sequence of length 4 between $[n_0/4]$ and $2n_0$, implicitly setting the minimum distance between two neighboring change points to three months. Based on the empirical sample size requirement for VAR parameter estimation (see Section 5.1), we apply Stage 2 with bandwidths chosen as an equispaced sequence of length 4 between $2.5p$ and $2n_0$. The VAR order is set at $d = 5$, which corresponds to the number of trading days in each week, and the rest of the tuning parameters are selected as in Section 5.1. Table 2 reports the segmentation results.

Table 2 Sets of change point estimators returned by FVARseg.

Stage 1 detects four change points around the Great Financial Crisis between 2007 and 2009, and the last two estimators from Stage 1 correspond to the onset (2020-02-20) and the end (2020-04-07) of the stock market crash brought on by the instability due to the COVID-19 pandemic. Given the clustering of change points between 2007 and 2009, an alternative approach is to adopt a locally stationary factor model as in Barigozzi et al. (2021). However, such a model does not allow the number of factors to vary over time, whereas we observe the contrary to be the case when applying the IC-based method of Hallin and Liška (2007) to each segment defined by $\hat\Theta_\chi$, see Table 3. This supports that it is more appropriate to model the changes in the factor-driven component of this dataset as abrupt changes rather than as smooth transitions.

Table 3 Estimated number of factors $\hat q_k$ from $\{X_t,\ \hat\theta_{\chi,k} + 1 \le t \le \hat\theta_{\chi,k+1}\}$, $k = 0, \ldots, 7$.

The estimators from Stage 2 are spread across the period under consideration. The accompanying figures illustrate how the linkages between different companies vary over the four segments identified between 2003 and 2011, particularly at the level of industrial sectors, although this information is not used by FVARseg.

To further validate the segmentation obtained by FVARseg, we perform a forecasting exercise. Two approaches, referred to as (F1) and (F2), are adopted to build forecasting models, where the difference lies in how a sub-sample of $\{X_u,\ u \le t - 1\}$ is chosen to forecast $X_t$. Simply put, (F1) uses only the observations belonging to the same segment as $X_t$ for constructing the forecast of $\chi_t$ (resp. $\xi_t$) according to the segmentation defined by $\hat\Theta_\chi$ (resp. $\hat\Theta_\xi$), while (F2) ignores the presence of the most recent change point estimator. We expect (F1) to give more accurate predictions if the data undergoes structural changes at the detected change points. On the other hand, if some of the change point estimators are spurious, (F2) is expected to produce better forecasts since it makes use of more observations. We select $\mathcal{T}$, the set of time points at which to perform forecasting, such that each $t \in \mathcal{T}$ does not belong to the first two segments (i.e., $t \ge \max(\hat\theta_{\chi,2}, \hat\theta_{\xi,2}) + 1$), and there are at least $n_0$ observations to build a forecast model separately for $\chi_t$ and $\xi_t$, respectively. Denoting by $\hat L_\chi(v) = \max\{0 \le k \le \hat K_\chi : \hat\theta_{\chi,k} + 1 \le v\}$ the index of the $\hat\theta_{\chi,k}$ nearest to and strictly left of $v$, and similarly defining $\hat L_\xi(v)$, this means that $\min(\hat L_\chi(t), \hat L_\xi(t)) \ge 2$ and $\min(t - \hat\theta_{\chi,\hat L_\chi(t)}, t - \hat\theta_{\xi,\hat L_\xi(t)}) \ge n_0$ for all $t \in \mathcal{T}$. We have $|\mathcal{T}| = 1600$. For such $t \in \mathcal{T}$, we obtain $\hat X_t(N) = \hat\chi_t(N_1) + \hat\xi_t(N_2)$ for some $N = (N_1, N_2)$, where $\hat\chi_t(N_1)$ denotes an estimator of the best linear predictor of $\chi_t$ given $X_{t-l}$, $1 \le l \le N_1$, and $\hat\xi_t(N_2)$ is defined analogously. The difference between the two approaches we take lies in the selection of $N$.

(F1) We set $N_1 = t - \hat\theta_{\chi,\hat L_\chi(t)} - 1$ and $N_2 = t - \hat\theta_{\xi,\hat L_\xi(t)} - 1$.

(F2) We set $N_1 = t - \hat\theta_{\chi,\hat L_\chi(t)-1} - 1$ and $N_2 = t - \hat\theta_{\xi,\hat L_\xi(t)-1} - 1$.

Barigozzi, Cho, and Owens (2022) propose two methods for estimating the best linear predictors of $\chi_t$ and $\xi_t$ under a stationary factor-adjusted VAR model, one based on a more restrictive assumption on the factor structure ("restricted") than the other ("unrestricted"); we refer to the paper for their detailed descriptions. Both estimators are combined with the two approaches (F1) and (F2). Table 4 reports the summary of the forecasting errors, measured as $\mathrm{FE}_t^{\mathrm{avg}} = |X_t - \hat X_t(N)|_2^2 / |X_t|_2^2$ and $\mathrm{FE}_t^{\mathrm{max}} = |X_t - \hat X_t(N)|_\infty / |X_t|_\infty$, obtained from combining the different best linear predictors with (F1)–(F2). According to all evaluation criteria, (F1) produces forecasts that are more accurate than (F2) regardless of the forecasting method, which supports the validity of the change point estimators returned by FVARseg.
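The two forecast error measures, interpreting the norms as the squared relative $\ell_2$ error and the relative maximum error, can be sketched as:

```python
import numpy as np

def forecast_errors(x, x_hat):
    """FE_avg: squared l2 norm of the forecast error relative to that of the
    observation; FE_max: max-norm of the error relative to that of the
    observation (the max-norm reading of the second measure is our assumption)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    fe_avg = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
    fe_max = np.max(np.abs(x - x_hat)) / np.max(np.abs(x))
    return fe_avg, fe_max

# Toy cross-section of p = 3 series at a single time point t.
fe_avg, fe_max = forecast_errors([1.0, 2.0, 2.0], [1.0, 2.0, 1.0])
print(round(float(fe_avg), 4), float(fe_max))  # 0.1111 0.5
```

Both measures are scale-free, so they can be averaged over $t \in \mathcal{T}$ and compared across forecasting methods.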

Table 4 Mean and standard errors of $\mathrm{FE}_t^{\mathrm{avg}}$ and $\mathrm{FE}_t^{\mathrm{max}}$ for $t \in \mathcal{T}$, where $|\mathcal{T}| = 1600$.

6 Conclusions

We consider the problem of high-dimensional time series segmentation under a piecewise stationary, factor-adjusted VAR model which, adopting the most general approach to time series factor modeling, permits pervasive cross-sectional and serial correlations in the data while accommodating structural changes. FVARseg proceeds in two stages, detecting change points in the factor-driven component and in the idiosyncratic VAR process separately, and fully addresses the challenges arising from the presence of latent factors. Theoretical consistency of FVARseg is established under general conditions permitting heavy tails and dependence across the stationary segments, and we derive estimation rates that make explicit the influence of the tail behavior and the size of changes. It is competitive both theoretically and computationally in comparison with existing methods that are proposed for special instances of the proposed factor-adjusted VAR model.

Supplementary Materials

In the supplement, Section F contains preliminary lemmas and all proofs of the theoretical results stated in the paper. Section B presents further information on the simulation studies, and Section A includes further discussions on Stage 2 of FVARseg.


Disclosure Statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

Cho is grateful to the support of the Leverhulme Trust via the Research Project Grant RPG-2019-390. Maeng, Eckley and Fearnhead gratefully acknowledge the financial support of the Engineering and Physical Sciences Research Council (EPSRC) via StatScale (EP/N031938/1). Eckley is also grateful to the EPSRC for the financial support of EP/T025964/1, whilst Fearnhead also acknowledges the support of EP/V053590/1. Open Access funding was provided by Durham University.

References

  • Bai, P., Safikhani, A., and Michailidis, G. (2020), “Multiple Change Points Detection in Low Rank and Sparse High Dimensional Vector Autoregressive Models,” IEEE Transactions on Signal Processing, 68, 3074–3089. DOI: 10.1109/TSP.2020.2993145.
  • Bai, P., Safikhani, A., and Michailidis, G. (2022), “Multiple Change Point Detection in Reduced Rank High Dimensional Vector Autoregressive Models,” Journal of the American Statistical Association (to appear). DOI: 10.1080/01621459.2022.2079514.
  • Barigozzi, M., Cho, H., and Fryzlewicz, P. (2018), “Simultaneous Multiple Change-Point and Factor Analysis for High-Dimensional Time Series,” Journal of Econometrics, 206, 187–225. DOI: 10.1016/j.jeconom.2018.05.003.
  • Barigozzi, M., Cho, H., and Owens, D. (2022), “FNETS: Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series,” arXiv preprint arXiv:2201.06110. DOI: 10.1080/07350015.2023.2257270.
  • Barigozzi, M., and Hallin, M. (2017), “A Network Analysis of the Volatility of High Dimensional Financial Series,” Journal of the Royal Statistical Society, Series C, 66, 581–605. DOI: 10.1111/rssc.12177.
  • Barigozzi, M., Hallin, M., Soccorsi, S., and von Sachs, R. (2021), “Time-Varying General Dynamic Factor Models and the Measurement of Financial Connectedness,” Journal of Econometrics, 222, 324–343. DOI: 10.1016/j.jeconom.2020.07.004.
  • Basu, S., and Michailidis, G. (2015), “Regularized Estimation in Sparse High-Dimensional Time Series Models,” The Annals of Statistics, 43, 1535–1567. DOI: 10.1214/15-AOS1315.
  • Chen, L., Wang, W., and Wu, W. B. (2022), “Inference of Breakpoints in High-Dimensional Time Series,” Journal of the American Statistical Association, 117, 1951–1963, DOI: 10.1080/01621459.2021.1893178.
  • Cho, H., and Kirch, C. (2022), "Two-Stage Data Segmentation Permitting Multiscale Change Points, Heavy Tails and Dependence," Annals of the Institute of Statistical Mathematics, 74, 653–684. DOI: 10.1007/s10463-021-00811-5.
  • Cule, E., Vineis, P., and De Iorio, M. (2011), “Significance Testing in Ridge Regression for Genetic Data,” BMC Bioinformatics, 12, 1–15. DOI: 10.1186/1471-2105-12-372.
  • Diebold, F. X., and Yilmaz, K. (2014), “On the Network Topology of Variance Decompositions: Measuring the Connectedness of Financial Firms,” Journal of Econometrics, 182, 119–134. DOI: 10.1016/j.jeconom.2014.04.012.
  • Duan, J., Bai, J., and Han, X. (2022), “Quasi-Maximum Likelihood Estimation of Break Point in High-Dimensional Factor Models,” Journal of Econometrics (to appear). DOI: 10.1016/j.jeconom.2021.12.011.
  • Eichinger, B., and Kirch, C. (2018), “A MOSUM Procedure for the Estimation of Multiple Random Change Points,” Bernoulli, 24, 526–564. DOI: 10.3150/16-BEJ887.
  • Fan, J., Ke, Y., and Wang, K. (2020), “Factor-Adjusted Regularized Model Selection,” Journal of Econometrics, 216, 71–85. DOI: 10.1016/j.jeconom.2020.01.006.
  • Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society, Series B, 75, 603–680. DOI: 10.1111/rssb.12016.
  • Fan, J., Masini, R., and Medeiros, M. C. (2021), “Bridging Factor and Sparse Models,” arXiv preprint arXiv:2102.11341.
  • Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000), “The Generalized Dynamic Factor Model: Identification and Estimation,” The Review of Economics and Statistics, 82, 540–554. DOI: 10.1162/003465300559037.
  • Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2004), "The Generalized Dynamic Factor Model: Consistency and Rates," Journal of Econometrics, 119, 231–255.
  • Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2015), “Dynamic Factor Models with Infinite-Dimensional Factor Spaces: One-Sided Representations,” Journal of Econometrics, 185, 359–371. DOI: 10.1016/j.jeconom.2013.10.017.
  • Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2017), “Dynamic Factor Models with Infinite-Dimensional Factor Space: Asymptotic Analysis,” Journal of Econometrics, 199, 74–92.
  • Giannone, D., Lenza, M., and Primiceri, G. E. (2021), “Economic Predictions with Big Data: The Illusion of Sparsity,” ECB Working Paper 2542, European Central Bank.
  • Hallin, M., and Liška, R. (2007), “Determining the Number of Factors in the General Dynamic Factor Model,” Journal of the American Statistical Association, 102, 603–617. DOI: 10.1198/016214506000001275.
  • Han, F., Lu, H., and Liu, H. (2015), “A Direct Estimation of High Dimensional Stationary Vector Autoregressions,” Journal of Machine Learning Research, 16, 3115–3150.
  • Horn, R. A., and Johnson, C. R. (1985), Matrix Analysis, Cambridge: Cambridge University Press.
  • Kirch, C., Muhsal, B., and Ombao, H. (2015), “Detection of Changes in Multivariate Time Series with Application to EEG Data,” Journal of the American Statistical Association, 110, 1197–1216. DOI: 10.1080/01621459.2014.957545.
  • Krampe, J., and Margaritella, L. (2021), “Dynamic Factor Models with Sparse VAR Idiosyncratic Components,” arXiv preprint arXiv:2112.07149.
  • Li, Y.-N., Li, D., and Fryzlewicz, P. (2023), “Detection of Multiple Structural Breaks in Large Covariance Matrices,” Journal of Business and Economic Statistics, 41, 846–861, DOI: 10.1080/07350015.2022.2076686.
  • Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
  • Maeng, H., Eckley, I., and Fearnhead, P. (2022), "Collective Anomaly Detection in High-Dimensional VAR Models," Statistica Sinica (to appear).
  • Meier, A., Kirch, C., and Cho, H. (2021), “mosum: A Package for Moving Sums in Change-Point Analysis,” Journal of Statistical Software, 97, 1–42. DOI: 10.18637/jss.v097.i08.
  • Messer, M., Kirchner, M., Schiemann, J., Roeper, J., Neininger, R., and Schneider, G. (2014), “A Multiple Filter Test for the Detection of Rate Changes in Renewal Processes with Varying Variance,” Annals of Applied Statistics, 8, 2027–2067.
  • Michailidis, G., and d'Alché Buc, F. (2013), "Autoregressive Models for Gene Regulatory Network Inference: Sparsity, Stability and Causality Issues," Mathematical Biosciences, 246, 326–334. DOI: 10.1016/j.mbs.2013.10.003.
  • Politis, D. N. (2003), “Adaptive Bandwidth Choice,” Journal of Nonparametric Statistics, 15, 517–533. DOI: 10.1080/10485250310001604659.
  • Preuss, P., Puchstein, R., and Dette, H. (2015), “Detection of Multiple Structural Breaks in Multivariate Time Series,” Journal of the American Statistical Association, 110, 654–668. DOI: 10.1080/01621459.2014.920613.
  • Safikhani, A., Bai, Y., and Michailidis, G. (2022), “Fast and Scalable Algorithm for Detection of Structural Breaks in Big VAR Models,” Journal of Computational and Graphical Statistics, 31, 176–189. DOI: 10.1080/10618600.2021.1950005.
  • Safikhani, A., and Shojaie, A. (2022), “Joint Structural Break Detection and Parameter Estimation in High-Dimensional Nonstationary VAR Models,” Journal of the American Statistical Association, 117, 251–264. DOI: 10.1080/01621459.2020.1770097.
  • Shojaie, A., and Michailidis, G. (2010), “Discovering Graphical Granger Causality Using the Truncating Lasso Penalty,” Bioinformatics, 26, i517–i523. DOI: 10.1093/bioinformatics/btq377.
  • Wang, D., and Tsay, R. S. (2022), “Rate-Optimal Robust Estimation of High-Dimensional Vector Autoregressive Models,” arXiv preprint arXiv:2107.11002.
  • Wang, D., Yu, Y., and Rinaldo, A. (2021), “Optimal Covariance Change Point Localization in High Dimensions,” Bernoulli, 27, 554–575. DOI: 10.3150/20-BEJ1249.
  • Wang, D., Yu, Y., Rinaldo, A., and Willett, R. (2019), “Localizing Changes in High-Dimensional Vector Autoregressive Processes,” arXiv preprint arXiv:1909.06359.