
FNETS: Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series


Abstract

We propose FNETS, a methodology for network estimation and forecasting of high-dimensional time series exhibiting strong serial- and cross-sectional correlations. We operate under a factor-adjusted vector autoregressive (VAR) model which, after accounting for pervasive co-movements of the variables by common factors, models the remaining idiosyncratic dynamic dependence between the variables as a sparse VAR process. Network estimation of FNETS consists of three steps: (i) factor-adjustment via dynamic principal component analysis, (ii) estimation of the latent VAR process via l1-regularized Yule-Walker estimator, and (iii) estimation of partial correlation and long-run partial correlation matrices. In doing so, we learn three networks underpinning the VAR process, namely a directed network representing the Granger causal linkages between the variables, an undirected one embedding their contemporaneous relationships and finally, an undirected network that summarizes both lead-lag and contemporaneous linkages. In addition, FNETS provides a suite of methods for forecasting the factor-driven and the idiosyncratic VAR processes. Under general conditions permitting tails heavier than the Gaussian one, we derive uniform consistency rates for the estimators in both network estimation and forecasting, which hold as the dimension of the panel and the sample size diverge. Simulation studies and real data application confirm the good performance of FNETS.

1 Introduction

Vector autoregressive (VAR) models are popularly adopted for time series analysis in economics and finance. Fitting a VAR model to the data enables inferring dynamic interdependence between the variables as well as forecasting the future. VAR models are particularly appealing for network analysis since estimating the nonzero elements of the VAR parameter matrices, a.k.a. transition matrices, recovers directed edges between the components of vector time series in a Granger causality network. In addition, by estimating a precision matrix (inverse of the covariance matrix) of the VAR innovations, we can also define a network capturing contemporaneous linear dependencies. For the network interpretation of VAR modeling, see for example, Dahlhaus (Citation2000), Eichler (Citation2007), Billio et al. (Citation2012), Ahelegbey, Billio, and Casarin (Citation2016), Barigozzi and Brownlees (Citation2019), Guðmundsson and Brownlees (Citation2021), and Uematsu and Yamagata (Citation2023).

Estimation of VAR models quickly becomes a high-dimensional problem as the number of parameters grows quadratically with the dimensionality. There is a mature literature on estimation of high-dimensional VAR models under the sparsity (Hsu, Hung, and Chang Citation2008; Basu and Michailidis Citation2015; Han, Lu, and Liu Citation2015; Kock and Callot Citation2015; Nicholson et al. Citation2020; Krampe and Paparoditis Citation2021; Masini, Medeiros, and Mendes Citation2022; Adamek, Smeekes, and Wilms Citation2023) and low-rank plus sparsity (Basu, Li, and Michailidis Citation2019) assumptions; see also Bańbura, Giannone, and Reichlin (2010) for a Bayesian approach. In all of the above, either explicitly or implicitly, the spectral density of the time series is required to have eigenvalues that are uniformly bounded over frequencies. Indeed, this condition is crucial for controlling the deviation bounds involved in the theoretical investigation of regularized estimators.

Lin and Michailidis (Citation2020) observe that for VAR processes, this assumption restricts the parameters to be either dense but small in their magnitude (which makes their estimation using shrinkage-based methods challenging) or highly sparse, while Giannone, Lenza, and Primiceri (Citation2021) note the difficulty of identifying sparse predictive representations in many economic applications. Moreover, some datasets typically exhibit strong serial and cross-sectional correlations and violate the bounded spectrum assumption. The left panel of Figure 1 illustrates this phenomenon: as the dimensionality increases, a volatility panel dataset (see Section 5.3 for its description) exhibits a linear increase in the leading eigenvalue of the estimate of its spectral density matrix at frequency 0 (i.e., the long-run covariance). The right panel visualizes the outcome of fitting a VAR(5) model to the same dataset without making any adjustment for the strong correlations (see the caption for further details), from which we cannot infer a meaningful, sparse pairwise relationship.

Fig. 1 Left: The two largest eigenvalues (y-axis) of the long-run covariance matrix estimated from the volatility panel analyzed in Section 5.3 (March 2008 to March 2009, n = 252) with subsets of cross-sections randomly sampled 100 times for each given dimension p ∈ {5, …, 46} (x-axis). Right: logged and truncated p-values (truncation level chosen by Bonferroni correction with the significance level 0.1) from fitting a VAR(5) model to the same dataset using ridge regression and generating p-values corresponding to each coefficient as described in Cule, Vineis, and De Iorio (Citation2011). For each pair of variables (corresponding tickers given in x- and y-axes), the minimum p-value over the five lags is reported.

In this article, we propose to model high-dimensional time series by means of a factor-adjusted VAR approach, which simultaneously accounts for strong serial and cross-sectional correlations attributed to factors, as well as sparse, idiosyncratic correlations among the variables that remain after factor adjustment. We take the most general approach to factor modeling based on the generalized dynamic factor model, where factors are dynamic in the sense that they are allowed to have not only contemporaneous but also lagged effects on the variables (Forni et al. Citation2000). We propose FNETS, a suite of tools accompanying the model for estimation and forecasting with a particular focus on network analysis, which addresses the challenges arising from the latency of the VAR process as well as high dimensionality.

We make the following methodological and theoretical contributions.

  1. We propose an $\ell_1$-regularized Yule-Walker estimation method for estimating the factor-adjusted, idiosyncratic VAR, while permitting the number of nonzero parameters to slowly grow with the dimensionality. Estimating the VAR parameters and the inverse of the innovation covariance, and then combining them, allows us to define three networks underlying the latent VAR process, namely a directed network representing Granger causal linkages, an undirected one underpinning their contemporaneous relationships, as well as an undirected network summarizing both. Under general conditions permitting weak factors and heavier tails than the sub-Gaussian one, we show the consistency of FNETS in estimating the edge sets of these networks, which holds uniformly over all $p^2$ entries of the networks (Propositions 3.3 and 3.5).

  2. We provide new consistency rates for the estimation and forecasting approaches considered by Forni et al. (2005, Citation2017), which hold uniformly for the entire cross-sections of p-dimensional time series (Propositions 4.1 and B.2). In doing so, we establish uniform consistency of the estimators of high-dimensional spectral density matrices of the factor-driven and the idiosyncratic components, extending the results of Zhang and Wu (Citation2021) to the presence of latent factors.

Our approach differs from the existing ones for factor-adjusted regression problems (Fan, Ke, and Wang Citation2020; Fan, Masini, and Medeiros Citation2021; Fan, Lou, and Yu Citation2023; Krampe and Margaritella Citation2021), as (i) it allows for the presence of dynamic factors, thus including all possible dynamic linear co-dependencies, and (ii) it relies only on the estimators of the autocovariances of the latent idiosyncratic process, and avoids estimating the entire latent process and controlling the errors arising from such a step, which increase with the sample size. The price to pay for the generality of the factor modeling in (i) is an extra term appearing in the rate of consistency, which represents the bandwidth for spectral density estimation required for factor-adjustment in the frequency domain. We make explicit the role played by this bandwidth in the theoretical results, and also present the results under a more restricted static factor model for ease of comparison. We mention two more differences between this article and Fan, Masini, and Medeiros (Citation2021) and Fan, Lou, and Yu (Citation2023). First, they additionally consider the problem of testing hypotheses on the idiosyncratic covariance and the adequacy of factor/sparse regression, while we focus on network estimation. Second, their methods accommodate models for the idiosyncratic component other than VAR.

FNETS is another take on the popular low-rank plus sparsity modeling framework in the high-dimensional learning literature. Also, it is in line with a frequently adopted practice in financial time series analysis, where factor-driven common components, representing the systematic sources of risk, are removed prior to inferring a network structure via (sparse) regression modeling and identifying the most central nodes representing the systemic sources of risk (Diebold and Yilmaz 2014; Barigozzi and Brownlees Citation2019). We provide a rigorous theoretical treatment of this empirical approach by accounting for the effect of the factor-adjustment step on the second-step regression.

The rest of the article is organized as follows. Section 2 introduces the factor-adjusted VAR model. Sections 3 and 4 describe the network estimation and forecasting methodologies comprising FNETS, respectively, and provide their theoretical consistency. In Section 5, we demonstrate the good estimation and forecasting performance of FNETS on a panel of volatility measures. Section 6 concludes the article, and all the proofs and complete simulation results are presented in Supplementary Appendix. The R software fnets implementing FNETS is available from CRAN (Barigozzi, Cho, and Owens Citation2023).

Notations

By $I$, $O$, and $\mathbf{0}$, we denote an identity matrix, a matrix of zeros, and a vector of zeros whose dimensions depend on the context. For a matrix $A = [a_{ij},\, 1 \le i \le m,\, 1 \le j \le n]$, we denote by $A^\top$ its transpose. The element-wise $\ell_\infty$-, $\ell_0$-, $\ell_1$- and $\ell_2$-norms are denoted by $|A|_\infty = \max_{1 \le i \le m} \max_{1 \le j \le n} |a_{ij}|$, $|A|_0 = \sum_{i=1}^m \sum_{j=1}^n \mathbb{I}_{\{a_{ij} \ne 0\}}$, $|A|_1 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|$ and $|A|_2 = (\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2)^{1/2}$. The Frobenius, spectral, induced $L_1$- and $L_\infty$-norms are denoted by $\|A\|_F = |A|_2$, $\|A\| = \sqrt{\Lambda_{\max}(A^\top A)}$ (with $\Lambda_{\max}(A)$ and $\Lambda_{\min}(A)$ denoting the largest and smallest eigenvalues of $A$ in modulus), $\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|$ and $\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|$. Let $A_{i\cdot}$ and $A_{\cdot k}$ denote the $i$th row and the $k$th column of $A$. For two real numbers, set $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. Given two sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n = O(b_n)$ if, for some finite constant $C > 0$, there exists $N \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$ such that $|a_n| |b_n|^{-1} \le C$ for all $n \ge N$; we denote by $O_P$ the stochastic boundedness. We write $a_n \asymp b_n$ when $a_n = O(b_n)$ and $b_n = O(a_n)$. Throughout, $L$ denotes the lag operator and $\iota = \sqrt{-1}$. Finally, $\mathbb{I}_A = 1$ if the event $A$ takes place and 0 otherwise.
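As a quick sanity check on these definitions, the snippet below (our own illustration, using numpy; none of it comes from the article) computes each norm for a small matrix and confirms, for instance, that the induced $L_1$- and $L_\infty$-norms are the maximum absolute column and row sums, respectively.

```python
import numpy as np

# Illustration (ours) of the matrix norms defined above, for an arbitrary 2x2 matrix.
A = np.array([[1.0, -2.0],
              [0.0,  3.0]])

elt_max  = np.max(np.abs(A))                             # |A|_inf: element-wise maximum
elt_l0   = np.count_nonzero(A)                           # |A|_0: number of nonzero entries
elt_l1   = np.sum(np.abs(A))                             # |A|_1: sum of absolute entries
elt_l2   = np.sqrt(np.sum(A ** 2))                       # |A|_2 = ||A||_F (Frobenius norm)
spectral = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # ||A||: spectral norm
ind_L1   = np.max(np.sum(np.abs(A), axis=0))             # ||A||_1: max column sum
ind_Linf = np.max(np.sum(np.abs(A), axis=1))             # ||A||_inf: max row sum

print(elt_max, elt_l0, elt_l1, ind_L1, ind_Linf)         # 3.0 3 6.0 5.0 3.0
```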

2 Factor-Adjusted Vector Autoregressive Model

Consider a zero-mean, second-order stationary $p$-variate process $X_t = (X_{1t}, \ldots, X_{pt})^\top$, $1 \le t \le n$, which is decomposed into the sum of two latent components: a factor-driven, common component $\chi_t = (\chi_{1t}, \ldots, \chi_{pt})^\top$, and an idiosyncratic component $\xi_t = (\xi_{1t}, \ldots, \xi_{pt})^\top$ modeled as a VAR process. That is, $X_t = \chi_t + \xi_t$ where
(1) $\chi_t = B(L) u_t = \sum_{l=0}^{\infty} B_l u_{t-l}$ with $u_t = (u_{1t}, \ldots, u_{qt})^\top$, and
(2) $A(L)\xi_t = \xi_t - \sum_{l=1}^{d} A_l \xi_{t-l} = \Gamma^{1/2}\varepsilon_t$ with $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{pt})^\top$.

In (1), the latent random vector $u_t$, referred to as the vector of common factors or common shocks, is assumed to satisfy $\mathrm{E}(u_t) = \mathbf{0}$ and $\mathrm{cov}(u_t) = I_q$, and is loaded on each $\chi_{it}$ via square summable, one-sided filters $B_{ij}(L) = \sum_{l=0}^{\infty} B_{l,ij} L^l$, where $B_l = [B_{l,ij},\, 1 \le i \le p,\, 1 \le j \le q] \in \mathbb{R}^{p \times q}$. This defines the generalized dynamic factor model (GDFM) proposed by Forni et al. (Citation2000) and Forni and Lippi (Citation2001), which provides the most general approach to high-dimensional time series factor modeling.

In (2), the idiosyncratic component $\xi_t$ is modeled as a VAR($d$) process for some finite positive integer $d$, with innovations $\Gamma^{1/2}\varepsilon_t$, where $\Gamma \in \mathbb{R}^{p \times p}$ is some positive definite matrix, $\Gamma^{1/2}$ is its symmetric square root matrix, and $\mathrm{E}(\varepsilon_t) = \mathbf{0}$ and $\mathrm{cov}(\varepsilon_t) = I_p$. We assume that $\xi_t$ is causal (see Assumption 2.3 (i)), that is, it admits the Wold representation
(3) $\xi_t = D(L)\Gamma^{1/2}\varepsilon_t = \sum_{l=0}^{\infty} D_l \Gamma^{1/2}\varepsilon_{t-l}$ with $D(L) = A^{-1}(L)$,
such that $\Gamma^{1/2}\varepsilon_t$ is seen as a vector of idiosyncratic shocks loaded on each $\xi_{it}$ via square summable, one-sided filters $D_{ik}(L) = \sum_{l=0}^{\infty} D_{l,ik} L^l$, where $D_l = [D_{l,ik},\, 1 \le i, k \le p]$. After accounting for the dominant cross-sectional dependence in the data (both contemporaneous and lagged) by factors, it is reasonable to assume that the dependencies left in $\xi_t$ are weak and, therefore, that the VAR structure is sufficiently sparse. Discussion of the precise requirement on the sparsity of $A_l$, $1 \le l \le d$, and $\Gamma^{-1}$ is deferred to Section 3.
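To make the data-generating process concrete, the following toy simulation (ours; all dimensions and parameter values are illustrative and not taken from the article) draws a panel from (1)–(2) with the factor filter truncated at lag 1 and a sparse, diagonal VAR(1) idiosyncratic component:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 10, 2

# Common component (1), with the filter B(L) truncated at lag 1:
# chi_t = B_0 u_t + B_1 u_{t-1}, where cov(u_t) = I_q.
B0 = rng.normal(size=(p, q))
B1 = 0.5 * rng.normal(size=(p, q))
u = rng.normal(size=(n + 1, q))
chi = u[1:] @ B0.T + u[:-1] @ B1.T

# Idiosyncratic component (2): sparse VAR(1), xi_t = A_1 xi_{t-1} + e_t,
# here with a diagonal (hence stable) A_1 and Gamma = I_p for simplicity.
A1 = 0.4 * np.eye(p)
xi = np.zeros((n + 1, p))
for t in range(1, n + 1):
    xi[t] = A1 @ xi[t - 1] + rng.normal(size=p)
xi = xi[1:]

X = chi + xi   # observed n x p panel
```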

Remark 2.1.

A special case of the GDFM is the popularly adopted static factor model, where the factors are loaded only contemporaneously (see e.g., Stock and Watson Citation2002; Bai Citation2003; Fan, Liao, and Mincheva Citation2013). This is formalized in Assumption 4.1, where we consider forecasting under a static representation. A sufficient condition to obtain a static representation from the GDFM in (1) is to assume $B(L) = \sum_{l=0}^{s} B_l L^l$ for some finite integer $s \ge 0$. For example, if $s = 0$, the model reduces to $\chi_t = B_0 u_t$, while if $s > 0$, it can be written as $\chi_t = \Lambda F_t$ with $\Lambda = [B_l,\, 0 \le l \le s]$ and $F_t = (u_t^\top, \ldots, u_{t-s}^\top)^\top$. Under the static factor model, $X_t$ admits a factor-augmented VAR representation (see Remark 4.1).

In the remainder of this section, we list the assumptions required for identification and estimation of (1)–(2). Since $\chi_t$ and $\xi_t$ are latent, some assumptions are required to ensure their (asymptotic) identifiability, and these are made in the frequency domain. Denote by $\Sigma_x(\omega)$ the spectral density matrix of $X_t$ at frequency $\omega \in [-\pi, \pi]$, and by $\mu_{x,j}(\omega)$ its dynamic eigenvalues, which are real-valued and ordered in decreasing order. We similarly define $\Sigma_\chi(\omega)$, $\mu_{\chi,j}(\omega)$, $\Sigma_\xi(\omega)$ and $\mu_{\xi,j}(\omega)$.

Assumption 2.1.

There exist a positive integer $p_0 \ge 1$, constants $\rho_j \in (3/4, 1]$ with $\rho_1 \ge \cdots \ge \rho_q$, and pairs of continuous functions $\omega \mapsto \alpha_{\chi,j}(\omega)$ and $\omega \mapsto \beta_{\chi,j}(\omega)$ for $\omega \in [-\pi, \pi]$ and $1 \le j \le q$, such that for all $p \ge p_0$,
$$\beta_{\chi,1}(\omega) \ge p^{-\rho_1}\mu_{\chi,1}(\omega) \ge \alpha_{\chi,1}(\omega) > \cdots > \beta_{\chi,q}(\omega) \ge p^{-\rho_q}\mu_{\chi,q}(\omega) \ge \alpha_{\chi,q}(\omega) > 0.$$

Under the assumption, if $\rho_j = 1$ for all $1 \le j \le q$, then we are in the presence of $q$ factors that are equally pervasive for the whole cross-section. The left panel of Figure 1 depicts the case when $\rho_1 = 1$. If $\rho_j < 1$ for some $j$, we permit the presence of "weak" factors, and our theoretical analysis explicitly reflects this; see, for example, Onatski (Citation2012) and Freyaldenhoven (Citation2021) for static factor models permitting weak factors. When weak factors are present, the ordering of the variables becomes important as $p \to \infty$, whereas the case of linearly diverging factor strengths is compatible with completely arbitrary cross-sectional ordering. The requirement that $\rho_j > 3/4$ is a minimal one, and generally larger values of $\rho_j$ are required as the dimensionality increases and heavier tails are permitted, as discussed later.

Assumptions 2.2 and 2.3 are made to control the serial dependence in Xt.

Assumption 2.2.

There exist some constants $\Xi > 0$ and $\varsigma > 2$ such that for all $l \ge 0$,
$$\max_{1 \le i \le p} |B_{l,i\cdot}|_2 \le \Xi(1+l)^{-\varsigma} \quad\text{and}\quad \Big(\sum_{j=1}^{q} |B_{l,\cdot j}|_2^2\Big)^{1/2} \le \Xi(1+l)^{-\varsigma}.$$

Assumption 2.3.

  1. $d$ is a finite positive integer and $\det(A(z)) \ne 0$ for all $|z| \le 1$.

  2. There exist some constants $0 < m_\varepsilon \le M_\varepsilon$ such that $\|\Gamma\| \le M_\varepsilon$ and $\Lambda_{\min}(\Gamma) \ge m_\varepsilon$.

  3. There exists a constant $m_\xi > 0$ such that $\inf_{\omega \in [-\pi, \pi]} \mu_{\xi,p}(\omega) \ge m_\xi$.

  4. There exist some constants $\Xi > 0$ and $\varsigma > 2$ such that $|D_{l,ik}| \le C_{ik}(1+l)^{-\varsigma}$ for all $l \ge 0$, with $\max\big\{\max_{1 \le k \le p} \sum_{i=1}^{p} C_{ik},\ \max_{1 \le i \le p} \sum_{k=1}^{p} C_{ik},\ \max_{1 \le i \le p} \sum_{k=1}^{p} C_{ik}^2\big\} \le \Xi$.

Assumption 2.3 (i) and (ii) are standard in the literature (Lütkepohl Citation2005) and imply that ξt is causal and has finite and nonzero covariance. Under Assumptions 2.2 and 2.3 (iv) (imposed on the Wold decomposition of ξt in (3)), the serial dependence in Xt decays at an algebraic rate. Further, we obtain a uniform bound for μξ,j(ω) under Assumption 2.3 (iv):

Proposition 2.1.

Under Assumption 2.3, there exists a constant $B_\xi > 0$, depending only on $M_\varepsilon$, $\Xi$ and $\varsigma$ defined in Assumption 2.3 (ii) and (iv), such that $\sup_{\omega \in [-\pi, \pi]} \mu_{\xi,1}(\omega) \le B_\xi$.

Remark 2.2.

Proposition 2.1 and Assumption 2.3 (iii) jointly establish the uniform boundedness of $\mu_{\xi,1}(\omega)$ and $\mu_{\xi,p}(\omega)$, which is commonly assumed in the literature on high-dimensional VAR estimation via $\ell_1$-regularization. A sufficient condition for Assumption 2.3 (iii) is that $\max\big\{\max_{1 \le i \le p} \sum_{l=1}^{d} |A_{l,i\cdot}|_1,\ \max_{1 \le j \le p} \sum_{l=1}^{d} |A_{l,\cdot j}|_1\big\} \le \Xi$ for some constant $\Xi > 0$ (Basu and Michailidis Citation2015). Further, when, for example, $d = 1$, Assumption 2.3 (iv) follows if $\|A_1\|_1 \vee \|A_1\|_\infty \le \gamma < 1$, since $\max(\|D_l\|_1, \|D_l\|_\infty) \le \Xi\gamma^l$ with $D_l = A_1^l$.
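The geometric decay in the last sentence can be checked numerically; the snippet below (our illustration, with an arbitrary choice of a stable $A_1$) verifies $\max(\|A_1^l\|_1, \|A_1^l\|_\infty) \le \gamma^l$ for $l = 1, \ldots, 10$, which holds by submultiplicativity of induced norms.

```python
import numpy as np

# Numerical check of Remark 2.2 (d = 1): if the induced L1 and L-infinity
# norms of A_1 are at most gamma < 1, the Wold coefficients D_l = A_1^l
# satisfy max(||D_l||_1, ||D_l||_inf) <= gamma^l.  A_1 is an illustrative choice.
A1 = np.array([[0.5, 0.2, 0.0],
               [0.0, 0.4, 0.1],
               [0.1, 0.0, 0.3]])
gamma = max(np.abs(A1).sum(axis=0).max(),   # ||A_1||_1
            np.abs(A1).sum(axis=1).max())   # ||A_1||_inf
assert gamma < 1

Dl = np.eye(3)
for l in range(1, 11):
    Dl = Dl @ A1                                 # D_l = A_1^l
    norm_1   = np.abs(Dl).sum(axis=0).max()      # ||D_l||_1
    norm_inf = np.abs(Dl).sum(axis=1).max()      # ||D_l||_inf
    assert max(norm_1, norm_inf) <= gamma ** l + 1e-12
```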

The two latent components $\chi_t$ and $\xi_t$, and the number of factors $q$, are identified thanks to the large gap between the eigenvalues of their spectral density matrices, which follows from Assumption 2.1 and Proposition 2.1. Then, by Weyl's inequality, the $q$th dynamic eigenvalue $\mu_{x,q}(\omega)$ diverges almost everywhere in $[-\pi, \pi]$ as $p \to \infty$, whereas $\mu_{x,q+1}(\omega)$ is uniformly bounded for any $p \in \mathbb{N}$ and $\omega$. This property is exploited in the FNETS methodology as later described in Section 3.2. It is worth stressing that Assumption 2.1 and Proposition 2.1 jointly constitute both a necessary and sufficient condition for the process $X_t$ to admit the dynamic factor representation in (1); see Forni and Lippi (Citation2001).

Finally, we characterize the common and idiosyncratic innovations.

Assumption 2.4.

  1. $\{u_t\}_{t \in \mathbb{Z}}$ is a sequence of zero-mean, $q$-dimensional martingale difference vectors with $\mathrm{cov}(u_t) = I_q$, and $u_{it}$ and $u_{jt}$ are independent for all $1 \le i, j \le q$ with $i \ne j$ and all $t \in \mathbb{Z}$.

  2. $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ is a sequence of zero-mean, $p$-dimensional martingale difference vectors with $\mathrm{cov}(\varepsilon_t) = I_p$, and $\varepsilon_{it}$ and $\varepsilon_{jt}$ are independent for all $1 \le i, j \le p$ with $i \ne j$ and all $t \in \mathbb{Z}$.

  3. $\mathrm{E}(u_{jt}\varepsilon_{it'}) = 0$ for all $1 \le j \le q$, $1 \le i \le p$ and $t, t' \in \mathbb{Z}$.

  4. There exist some constants $\nu > 4$ and $\mu_\nu > 0$ such that $\max\big\{\max_{1 \le j \le q} \mathrm{E}(|u_{jt}|^\nu),\ \max_{1 \le i \le p} \mathrm{E}(|\varepsilon_{it}|^\nu)\big\} \le \mu_\nu$.

Assumption 2.4 (i) and (ii) allow the common and idiosyncratic innovations to be sequences of martingale differences, relaxing the assumption of serial independence found in Forni et al. (Citation2017). Condition (iii) is standard in the factor modeling literature. Under (iv), we require that the innovations have ν>4 moments, which is considerably weaker than Gaussian or sub-Weibull tails assumed in the literature on VAR modeling of high-dimensional time series (Basu and Michailidis Citation2015; Kock and Callot Citation2015; Wong, Li, and Tewari Citation2020; Krampe and Paparoditis Citation2021; Masini, Medeiros, and Mendes Citation2022). Similar approaches to ours, based only on moment conditions, are found in Wu and Wu (Citation2016) who investigate the Lasso performance in deterministic designs under functional dependence, and Adamek, Smeekes, and Wilms (Citation2023) who assume instead near-epoch-dependence. In Appendix F, we separately consider the case when ut and εt are Gaussian for the sake of comparison.

3 Network Estimation via FNETS

3.1 Networks Underpinning Factor-Adjusted VAR Processes

Under the latent VAR model in (2), we can define three types of networks underpinning the interconnectedness of Xt after factor adjustment (Barigozzi and Brownlees Citation2019).

Let $\mathcal{V} = \{1, \ldots, p\}$ denote the set of vertices representing the $p$ time series. First, the transition matrices $A_l = [A_{l,ii'},\, 1 \le i, i' \le p]$ encode the directed network $\mathcal{N}^{\mathrm{G}} = (\mathcal{V}, \mathcal{E}^{\mathrm{G}})$ representing Granger causal linkages, with
(4) $\mathcal{E}^{\mathrm{G}} = \{(i, i') \in \mathcal{V} \times \mathcal{V}:\ A_{l,ii'} \ne 0 \text{ for some } 1 \le l \le d\}$
as the set of edges. Here, the presence of an edge $(i, i') \in \mathcal{E}^{\mathrm{G}}$ indicates that $\xi_{i',t-l}$ Granger causes $\xi_{it}$ at some lag $1 \le l \le d$.

The second network contains undirected edges representing contemporaneous dependence between the VAR innovations $\Gamma^{1/2}\varepsilon_t$, and is denoted by $\mathcal{N}^{\mathrm{C}} = (\mathcal{V}, \mathcal{E}^{\mathrm{C}})$; we have $(i, i') \in \mathcal{E}^{\mathrm{C}}$ iff the partial correlation between the $i$th and $i'$th elements of $\Gamma^{1/2}\varepsilon_t$ is nonzero. Specifically, letting $\Gamma^{-1} = \Delta = [\delta_{ii'},\, 1 \le i, i' \le p]$, the set of edges is given by
(5) $\mathcal{E}^{\mathrm{C}} = \big\{(i, i') \in \mathcal{V} \times \mathcal{V}:\ i \ne i' \text{ and } \delta_{ii'}/\sqrt{\delta_{ii}\,\delta_{i'i'}} \ne 0\big\}$.

Finally, we summarize the aforementioned lead-lag and contemporaneous relations between the variables in a single, undirected network $\mathcal{N}^{\mathrm{L}} = (\mathcal{V}, \mathcal{E}^{\mathrm{L}})$ by means of the long-run partial correlations of $\xi_t$. Let $\Omega = [\omega_{ii'},\, 1 \le i, i' \le p]$ denote the long-run partial covariance matrix of $\xi_t$, that is, $\Omega = (\Sigma_\xi(0))^{-1} = 2\pi A^\top(1)\Delta A(1)$ under (2). Then, the set of edges of $\mathcal{N}^{\mathrm{L}}$ is
(6) $\mathcal{E}^{\mathrm{L}} = \big\{(i, i') \in \mathcal{V} \times \mathcal{V}:\ i \ne i' \text{ and } \omega_{ii'}/\sqrt{\omega_{ii}\,\omega_{i'i'}} \ne 0\big\}$.

Generally, $\mathcal{E}^{\mathrm{L}}$ is larger than $\mathcal{E}^{\mathrm{G}} \cup \mathcal{E}^{\mathrm{C}}$; see Appendix C for a sufficient condition for the absence of an edge $(i, i')$ from $\mathcal{N}^{\mathrm{L}}$. In the remainder of Section 3, we describe the network estimation methodology of FNETS which, consisting of three steps, estimates the three networks while fully accounting for the challenges arising from not directly observing the VAR process $\xi_t$, and investigate its theoretical properties.
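The three edge sets can be read off directly from $A_l$, $\Delta$ and $\Omega$ once these matrices are available; the sketch below (our illustration with toy matrices, using a small numerical tolerance in place of exact zero checks for $\Omega$) does exactly that:

```python
import numpy as np

# Building the three edge sets of Section 3.1 from given (here: toy)
# VAR parameters A_l and innovation precision Delta = Gamma^{-1}.
d, p = 1, 4
A1 = np.zeros((p, p)); A1[0, 1] = 0.5; A1[2, 3] = -0.3
Delta = np.eye(p); Delta[0, 2] = Delta[2, 0] = 0.4

# Granger network N^G: directed edge (i, i') iff A_{l,ii'} != 0 for some l.
E_G = {(i, j) for i in range(p) for j in range(p) if A1[i, j] != 0}

# Contemporaneous network N^C: edge iff the partial correlation
# delta_{ii'} / sqrt(delta_{ii} delta_{i'i'}) is nonzero (i != i').
E_C = {(i, j) for i in range(p) for j in range(p)
       if i != j and Delta[i, j] != 0}

# Long-run network N^L: from Omega = 2 pi A(1)' Delta A(1) with A(1) = I - A_1.
A_of_1 = np.eye(p) - A1
Omega = 2 * np.pi * A_of_1.T @ Delta @ A_of_1
E_L = {(i, j) for i in range(p) for j in range(p)
       if i != j and abs(Omega[i, j]) > 1e-10}

print(sorted(E_G))   # [(0, 1), (2, 3)]
```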

3.2 Step 1: Factor Adjustment via Dynamic PCA

As described in Section 2, under our model (1)–(2), there exists a large gap in μx,j(ω), the dynamic eigenvalues of the spectral density matrix of Xt, between those attributed to the factors (jq) and those which are not (jq+1). With the goal of estimating the autocovariance (ACV) matrix of the latent VAR process ξt, we exploit this gap in the factor-adjustment step based on dynamic principal component analysis (PCA); see Chapter 9 of Brillinger (Citation1981) for the definition of dynamic PCA and Forni et al. (Citation2000) for its use in the estimation of GDFM. Throughout, we treat q as known and refer to Hallin and Liška (Citation2007) for its consistent estimation under (1).

Denote the ACV matrices of $X_t$ by $\Gamma_x(l) = \mathrm{E}(X_{t-l} X_t^\top)$ for $l \ge 0$ and $\Gamma_x(l) = \Gamma_x^\top(-l)$ for $l \le -1$, and analogously define $\Gamma_\chi(l)$ and $\Gamma_\xi(l)$ with $\chi_t$ and $\xi_t$ replacing $X_t$, respectively. Then, $\Sigma_x(\omega)$ and $\Gamma_x(l)$ satisfy $\Sigma_x(\omega) = (2\pi)^{-1}\sum_{l=-\infty}^{\infty} \Gamma_x(l)\exp(-\iota l \omega)$ for all $\omega \in [-\pi, \pi]$. Motivated by this, we estimate $\Sigma_x(\omega)$ by
(7) $$\widehat\Sigma_x(\omega) = \frac{1}{2\pi}\sum_{l=-m}^{m} K\Big(\frac{l}{m}\Big)\,\widehat\Gamma_x(l)\exp(-\iota l \omega),$$
with the sample ACV $\widehat\Gamma_x(l) = n^{-1}\sum_{t=l+1}^{n} X_{t-l} X_t^\top$ when $l \ge 0$, $\widehat\Gamma_x(l) = \widehat\Gamma_x^\top(-l)$ for $l < 0$, and the kernel bandwidth $m = \lfloor n^\beta \rfloor$ for some $\beta \in (0, 1)$. We adopt the Bartlett kernel as $K(\cdot)$, which ensures positive semi-definiteness of $\widehat\Sigma_x(\omega)$ (see Appendix F.2.4). Then, we evaluate $\widehat\Sigma_x(\omega)$ at the $2m+1$ Fourier frequencies $\omega_k$, $-m \le k \le m$ (where $\omega_k = 2\pi k/(2m+1)$ for $0 \le k \le m$, and $\omega_k = -\omega_{|k|}$ for $-m \le k \le -1$), and estimate $\Sigma_\chi(\omega_k)$ by retaining the contribution from the $q$ largest eigenvalues and eigenvectors only. That is, we obtain $\widehat\Sigma_\chi(\omega_k) = \sum_{j=1}^{q} \widehat\mu_{x,j}(\omega_k)\,\widehat{e}_{x,j}(\omega_k)(\widehat{e}_{x,j}(\omega_k))^*$ (with $*$ denoting the transposed complex conjugate), where $\widehat\mu_{x,1}(\omega) \ge \cdots \ge \widehat\mu_{x,q}(\omega)$ denote the $q$ leading eigenvalues of $\widehat\Sigma_x(\omega)$ and $\widehat{e}_{x,j}(\omega)$ the associated (normalized) eigenvectors. From this, an estimator of $\Gamma_\chi(l)$ at a given lag $l \in \mathbb{Z}$ is obtained via the inverse Fourier transform as $\widehat\Gamma_\chi(l) = 2\pi(2m+1)^{-1}\sum_{k=-m}^{m} \widehat\Sigma_\chi(\omega_k)\exp(\iota l \omega_k)$ and finally, we estimate the ACV matrices of $\xi_t$ with $\widehat\Gamma_\xi(l) = \widehat\Gamma_x(l) - \widehat\Gamma_\chi(l)$, by virtue of Assumption 2.4 (iii).
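A minimal numpy sketch of this factor-adjustment step follows (the function name and the default bandwidth exponent are our choices, made for illustration; the article's own implementation is the R package fnets). It computes (7) with the Bartlett kernel on the $2m+1$ Fourier frequencies, truncates the eigendecomposition at $q$ terms, and returns $\widehat\Gamma_\xi(l) = \widehat\Gamma_x(l) - \widehat\Gamma_\chi(l)$:

```python
import numpy as np

def factor_adjust(X, q, lags, beta=1/3):
    """Sketch of Step 1 (dynamic PCA): returns {l: Gamma_hat_xi(l)} for l in `lags`.
    X is n x p; q is the number of dynamic factors (treated as known)."""
    n, p = X.shape
    m = int(n ** beta)                                # kernel bandwidth m = n^beta

    # Sample autocovariances, with Gamma_hat_x(l) = Gamma_hat_x(-l)' for l < 0.
    def acv(l):
        if l >= 0:
            return X[: n - l].T @ X[l:] / n
        return acv(-l).T

    # Smoothed spectral density estimate (7) with the Bartlett kernel,
    # evaluated at the 2m+1 Fourier frequencies omega_k = 2 pi k / (2m+1).
    ks = np.arange(-m, m + 1)
    omegas = 2 * np.pi * ks / (2 * m + 1)
    Sigma_x = np.zeros((len(ks), p, p), dtype=complex)
    for l in range(-m, m + 1):
        w = 1 - abs(l) / m                            # Bartlett kernel K(l/m)
        Sigma_x += w * acv(l)[None] * np.exp(-1j * l * omegas)[:, None, None]
    Sigma_x /= 2 * np.pi

    # Keep the q leading (eigenvalue, eigenvector) pairs at each frequency.
    Sigma_chi = np.zeros_like(Sigma_x)
    for k in range(len(ks)):
        mu, e = np.linalg.eigh(Sigma_x[k])            # ascending eigenvalues
        Sigma_chi[k] = (e[:, -q:] * mu[-q:]) @ e[:, -q:].conj().T

    # Inverse Fourier transform: Gamma_hat_chi(l), then Gamma_hat_xi(l).
    out = {}
    for l in lags:
        G_chi = (2 * np.pi / (2 * m + 1)) * np.sum(
            Sigma_chi * np.exp(1j * l * omegas)[:, None, None], axis=0)
        out[l] = acv(l) - G_chi.real
    return out
```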

3.3 Step 2: Estimation of VAR Parameters and NG

Recalling the VAR($d$) model in (2), let $\beta = [A_1, \ldots, A_d]^\top \in \mathbb{R}^{(pd) \times p}$ denote the matrix collecting all the VAR parameters. When $\xi_t$ is directly observable, $\ell_1$-regularized least squares or maximum likelihood estimators have been proposed for $\beta$; see the references given in the Introduction. In the context of factor-adjusted regression modeling, where the aim is to estimate the regression structure in the latent idiosyncratic process, it has been proposed to apply the $\ell_1$-regularization methods after estimating the entire latent process by, say, $\widehat\xi_t$ (Fan, Ke, and Wang Citation2020; Fan, Masini, and Medeiros Citation2021; Fan, Lou, and Yu Citation2023; Krampe and Margaritella Citation2021). However, such an approach possibly suffers from a lack of statistical efficiency due to having to control the estimation errors in $\widehat\xi_t$ uniformly for all $1 \le t \le n$. Instead, we make use of the Yule-Walker (YW) equation $\beta = G^{-1} g$, where
$$G = \begin{bmatrix} \Gamma_\xi(0) & \Gamma_\xi(-1) & \cdots & \Gamma_\xi(-d+1) \\ \Gamma_\xi(1) & \Gamma_\xi(0) & \cdots & \Gamma_\xi(-d+2) \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_\xi(d-1) & \Gamma_\xi(d-2) & \cdots & \Gamma_\xi(0) \end{bmatrix} \quad\text{and}\quad g = \begin{bmatrix} \Gamma_\xi(1) \\ \Gamma_\xi(2) \\ \vdots \\ \Gamma_\xi(d) \end{bmatrix},$$
with $G$ always invertible since $\Lambda_{\min}(G) \ge 2\pi m_\xi > 0$ by Assumption 2.3 (iii). We propose to estimate $\beta$ by a regularized YW estimator based on $\widehat G$ and $\widehat g$, which are obtained by replacing $\Gamma_\xi(l)$ with the $\widehat\Gamma_\xi(l)$ derived in Step 1 of FNETS via dynamic PCA, in the definitions of $G$ and $g$, respectively.

To handle the high dimensionality, we consider an $\ell_1$-regularized estimator for $\beta$ which solves the following $\ell_1$-penalized $M$-estimation problem
(8) $$\widehat\beta = \arg\min_{M \in \mathbb{R}^{pd \times p}} \operatorname{tr}\big(M^\top \widehat G M - 2 M^\top \widehat g\big) + \lambda |M|_1$$
with a tuning parameter $\lambda > 0$. Note that the matrix $\widehat G$ is guaranteed to be positive semi-definite (see Appendix F.2.4), thus the problem in (8) is convex with a global minimizer. We note the similarity between (8) and the Lasso estimator, but our estimator is specifically tailored to the problem of estimating the parameters of the latent VAR process $\xi_t$ by means of second-order moments only, and thus differs fundamentally from the Lasso-type estimators proposed for high-dimensional VAR estimation. In Appendix A, we propose an alternative estimator based on a constrained $\ell_1$-minimization approach closely related to the Dantzig selector (Candès and Tao Citation2007).

Once the VAR parameters are estimated, we propose to estimate the edge set of $\mathcal{N}^{\mathrm{G}}$ in (4) by the set of indices of the nonzero elements of a thresholded version of $\widehat\beta$, denoted by $\widehat\beta(t) = [\widehat\beta_{ij} \cdot \mathbb{I}_{\{|\widehat\beta_{ij}| > t\}}]$, with some threshold $t > 0$.
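The convex problem (8) can be solved, for instance, by proximal gradient descent (ISTA), since the smooth part has gradient $2(\widehat G M - \widehat g)$ and the $\ell_1$ penalty has a closed-form proximal map (soft-thresholding). The sketch below is our own illustrative solver, not the one used in the fnets package, followed by the thresholding step that estimates $\mathcal{E}^{\mathrm{G}}$:

```python
import numpy as np

def soft_threshold(x, a):
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def lasso_yw(G_hat, g_hat, lam, n_iter=500):
    """Illustrative ISTA solver (ours) for the l1-penalised Yule-Walker
    problem (8).  G_hat: (pd x pd), g_hat: (pd x p); returns beta_hat (pd x p)."""
    G_hat = (G_hat + G_hat.T) / 2                      # enforce symmetry numerically
    step = 1.0 / (2 * np.linalg.norm(G_hat, 2) + 1e-12)  # 1 / Lipschitz constant
    M = np.zeros_like(g_hat)
    for _ in range(n_iter):
        grad = 2 * (G_hat @ M - g_hat)                 # gradient of tr(M'GM - 2M'g)
        M = soft_threshold(M - step * grad, step * lam)
    return M

def threshold(beta_hat, t):
    """Thresholded estimator beta_hat(t), whose support estimates E^G."""
    return beta_hat * (np.abs(beta_hat) > t)
```

With $\widehat G = I$ the objective decouples entrywise and the minimizer is $\mathrm{soft}(\widehat g, \lambda/2)$, which provides a simple correctness check for the solver.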

3.4 Step 3: Estimation of NC and NL

Recall that the edge sets of $\mathcal{N}^{\mathrm{C}}$ and $\mathcal{N}^{\mathrm{L}}$ defined in (5)–(6) are given by the supports of $\Delta$ and $\Omega$. Given $\widehat\beta$ in (8), which estimates $\beta$, a natural estimator of $\Gamma$ arises from the YW equation $\Gamma = \Gamma_\xi(0) - \sum_{l=1}^{d} A_l \Gamma_\xi(l) = \Gamma_\xi(0) - \beta^\top g$, as $\widehat\Gamma = \widehat\Gamma_\xi(0) - \widehat\beta^\top \widehat g$. Then, we propose to estimate $\Delta = \Gamma^{-1}$ via constrained $\ell_1$-minimization as
(9) $$\check\Delta = \arg\min_{M \in \mathbb{R}^{p \times p}} |M|_1 \quad\text{subject to}\quad |\widehat\Gamma M - I|_\infty \le \eta,$$
where $\eta > 0$ is a tuning parameter. This approach was originally proposed for estimating the precision matrix of independent data (Cai, Liu, and Luo Citation2011), which we extend to the time series setting. Since $\check\Delta = [\check\delta_{ii'},\, 1 \le i, i' \le p]$ is not guaranteed to be symmetric, a symmetrization step is performed to obtain $\widehat\Delta = [\widehat\delta_{ii'},\, 1 \le i, i' \le p]$ with $\widehat\delta_{ii'} = \check\delta_{ii'} \cdot \mathbb{I}_{\{|\check\delta_{ii'}| \le |\check\delta_{i'i}|\}} + \check\delta_{i'i} \cdot \mathbb{I}_{\{|\check\delta_{i'i}| < |\check\delta_{ii'}|\}}$. Then, the edge set of $\mathcal{N}^{\mathrm{C}}$ in (5) is estimated by the support of the thresholded estimator $\widehat\Delta(t_\delta) = [\widehat\delta_{ii'} \cdot \mathbb{I}_{\{|\widehat\delta_{ii'}| > t_\delta\}},\, 1 \le i, i' \le p]$ with some threshold $t_\delta > 0$.
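Problem (9) decouples across the columns of $M$, and each column is a linear program; a sketch (ours, using `scipy.optimize.linprog` and the standard split $m = m^+ - m^-$) of the estimator together with the symmetrization step:

```python
import numpy as np
from scipy.optimize import linprog

def clime(Gamma_hat, eta):
    """Illustrative CLIME-type solver (ours) for (9): column-by-column LPs,
    followed by the symmetrisation step keeping the smaller-magnitude entry."""
    p = Gamma_hat.shape[0]
    Delta_check = np.zeros((p, p))
    c = np.ones(2 * p)                    # minimise |m|_1 = sum(m_plus + m_minus)
    # Constraints |Gamma_hat (m+ - m-) - e_i|_inf <= eta, written as A_ub x <= b_ub.
    A_ub = np.vstack([np.hstack([Gamma_hat, -Gamma_hat]),
                      np.hstack([-Gamma_hat, Gamma_hat])])
    for i in range(p):
        e = np.zeros(p); e[i] = 1.0
        b_ub = np.concatenate([e + eta, eta - e])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        Delta_check[:, i] = res.x[:p] - res.x[p:]
    # Symmetrise: for each (i, i'), keep the entry with the smaller magnitude.
    D, Dt = Delta_check, Delta_check.T
    return np.where(np.abs(D) <= np.abs(Dt), D, Dt)
```

For a diagonal $\widehat\Gamma$ the LPs have closed-form solutions (each diagonal entry is shrunk toward zero by $\eta$), which gives a simple check of the implementation.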

Finally, we estimate $\Omega = 2\pi A^\top(1) \Delta A(1)$ by replacing $A(1)$ and $\Delta$ with their estimators. We adopt the thresholded estimator $\widehat\beta(t) = [\widehat A_1(t), \ldots, \widehat A_d(t)]^\top$ to obtain $\widehat A(1) = I - \sum_{l=1}^{d} \widehat A_l(t)$, and set $\widehat\Omega = 2\pi \widehat A^\top(1) \widehat\Delta \widehat A(1)$. Analogously, the edge set of $\mathcal{N}^{\mathrm{L}}$ in (6) is obtained by thresholding $\widehat\Omega = [\widehat\omega_{ii'},\, 1 \le i, i' \le p]$ with some threshold $t_\omega > 0$, as the support of $\widehat\Omega(t_\omega) = [\widehat\omega_{ii'} \cdot \mathbb{I}_{\{|\widehat\omega_{ii'}| > t_\omega\}},\, 1 \le i, i' \le p]$.
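The final stage of Step 3 is then a matter of matrix algebra; a short sketch (ours; the inputs stand for the thresholded $\widehat A_l(t)$ and $\widehat\Delta$ produced by the previous steps):

```python
import numpy as np

def longrun_partial(A_thr_list, Delta_hat, t_omega):
    """Compute Omega_hat = 2 pi A_hat(1)' Delta_hat A_hat(1) from thresholded
    VAR estimates, then threshold Omega_hat to estimate the edge set of N^L."""
    p = Delta_hat.shape[0]
    A1 = np.eye(p) - sum(A_thr_list)          # A_hat(1) = I - sum_l A_hat_l(t)
    Omega_hat = 2 * np.pi * A1.T @ Delta_hat @ A1
    edges = {(i, j) for i in range(p) for j in range(p)
             if i != j and abs(Omega_hat[i, j]) > t_omega}
    return Omega_hat, edges
```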

3.5 Theoretical Properties

We prove the consistency of FNETS in network estimation by establishing the theoretical properties of each of its three steps in Sections 3.5.1–3.5.3. Then, in Section 3.5.4, we present the results for a special case where $\chi_t$ admits a static representation, $n \asymp p$ and $\mathrm{E}(|X_{it}|^\nu) < \infty$ for some $\nu > 8$, for ease of comparing our results to the existing ones.

Hereafter, we define
(10) $$\psi_n = \frac{m}{n^{1-2/\nu}} \vee \sqrt{\frac{m\log(m)}{n}} \quad\text{and}\quad \vartheta_{n,p} = \frac{m(np)^{2/\nu}\log^{7/2}(p)}{n} \vee \sqrt{\frac{m\log(mp)}{n}},$$
where the dependence of these quantities on $\nu$ is omitted for simplicity.

3.5.1 Factor Adjustment via Dynamic PCA

We first establish the consistency of the dynamic PCA-based estimator of Γχ(l).

Theorem 3.1.

Suppose that Assumptions 2.1, 2.2, 2.3, and 2.4 are met. Then, for any finite positive integer $s \ge d$, as $n, p \to \infty$,
$$\max_{l:\,|l| \le s} \frac{1}{p}\big\|\widehat\Gamma_\chi(l) - \Gamma_\chi(l)\big\|_F = O_P\Big(q p^{2(1-\rho_q)}\Big(\psi_n \vee \frac{1}{m} \vee \frac{1}{\sqrt{p}}\Big)\Big),$$
$$\max_{l:\,|l| \le s} \big|\widehat\Gamma_\chi(l) - \Gamma_\chi(l)\big|_\infty = O_P\Big(q p^{2(1-\rho_q)}\Big(\vartheta_{n,p} \vee \frac{1}{m} \vee \frac{1}{\sqrt{p}}\Big)\Big).$$

Remark 3.1.

Theorem 3.1 is complemented by Proposition F.15 in the Appendix, which establishes the consistency of the spectral density matrix estimator $\widehat\Sigma_\chi(\omega)$ uniformly over $\omega \in [-\pi, \pi]$ in both Frobenius and $\ell_\infty$-norms.

  1. Both $\psi_n$ and $\vartheta_{n,p}$ in (10) increase with the bandwidth $m$. It is possible to find $m$ that minimizes, for example, $\vartheta_{n,p} \vee m^{-1}$ which, roughly speaking, represents the bias-variance tradeoff in the estimation of the spectral density matrix $\Sigma_x(\omega)$. For example, in light-tailed settings with large enough $\nu$, the choice $m \asymp (n\log^{-1}(np))^{1/3}$ leads to the minimal rate in $\ell_\infty$-norm $\vartheta_{n,p} \vee m^{-1} \asymp (\log(np)/n)^{1/3}$, which nearly matches the optimal nonparametric rate when using the Bartlett kernel as in (7) (Priestley Citation1982, p. 463).

  2. Consistency in Frobenius norm depends on $\psi_n$, which tends to zero as $n \to \infty$ without placing any constraint on the relative rate of divergence between $n$ and $p$. Consistency in $\ell_\infty$-norm is determined by $\vartheta_{n,p}$, which depends on the interplay between the dimensionality and the tail behavior. Generally, the estimation error worsens as weaker factors are permitted ($\rho_q < 1$ in Assumption 2.1) and as $p$ grows, and also when $\nu$ is small such that heavier tails are permitted. Consider the case when all factors are strong (i.e., $\rho_j = 1$). If $p \asymp n$, then $\ell_\infty$-consistency holds with an appropriately chosen $m = n^\beta$, $\beta \in (0, 1)$, that leads to $\vartheta_{n,p} = o(1)$, provided that $\nu > 4$. When all moments of $u_{jt}$ and $\varepsilon_{it}$ exist, we achieve $\ell_\infty$-consistency even in the ultra high-dimensional case where $\log(p) = o(n)$.

From Theorem 3.1, the following proposition immediately follows.

Proposition 3.2.

Suppose that the conditions in Theorem 3.1 are met and let Assumption 2.1 hold with $\rho_j = 1$, $1 \le j \le q$. Then, $P(\mathcal{E}_{n,p}) \to 1$ as $n, p \to \infty$, where
(11) $$\mathcal{E}_{n,p} = \Big\{\max_{-d \le l \le d} \big|\widehat\Gamma_\xi(l) - \Gamma_\xi(l)\big|_\infty \le C_\xi\Big(\vartheta_{n,p} \vee \frac{1}{m} \vee \frac{1}{\sqrt{p}}\Big)\Big\}$$
for some constant $C_\xi > 0$.

From Proposition 3.2, we have $\ell_\infty$-consistency of $\widehat\Gamma_\xi(l)$ in the presence of strong factors. Although it is possible to trace the effect of weak factors on the estimation of $\Gamma_\xi(l)$ (see Corollary F.17), we make this simplifying assumption to streamline the presentation of the theoretical results of the subsequent Steps 2–3 of FNETS.

Remark 3.2.

In Appendix F.2.8, we show that if $\chi_t$ admits the static representation discussed in Remark 2.1, the rate in Proposition 3.2 is further improved as
(12) $$\max_{l:\,|l| \le d} \big|\widehat\Gamma_\xi(l) - \Gamma_\xi(l)\big|_\infty = O_P\Big(\tilde\vartheta_{n,p} \vee \frac{1}{\sqrt{p}}\Big) \quad\text{with}\quad \tilde\vartheta_{n,p} = \frac{p^{2/\nu}\log^3(p)}{n^{1-2/\nu}} \vee \sqrt{\frac{\log(p)}{n}}.$$

The term $\tilde\vartheta_{n,p}$ comes from bounding $\max_{l:\,|l| \le d} |\widehat\Gamma_x(l) - \Gamma_x(l)|_\infty$. Hence, the improved rate in (12) is comparable to the rate attained when we directly observe $\xi_t$, apart from the presence of the $p^{-1/2}$ term, which is due to the presence of latent factors; similar observations are made in Theorem 3.1 of Fan, Liao, and Mincheva (Citation2013).

3.5.2 Estimation of VAR Parameters and NG

We measure the sparsity of $\beta$ by $s_{0,j} = |\beta_{\cdot j}|_0$, $s_0 = \sum_{j=1}^{p} s_{0,j}$ and $s_{\mathrm{in}} = \max_{1 \le j \le p} s_{0,j}$. When $d = 1$, the quantity $s_{\mathrm{in}}$ coincides with the maximum in-degree per node of $\mathcal{N}^{\mathrm{G}}$.

Proposition 3.3.

Suppose that $C_\xi s_{\mathrm{in}}(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2}) \le \pi m_\xi/16$, where $m_\xi$ is defined in Assumption 2.3 (iii). Also, set $\lambda \ge 4 C_\xi(\|\beta\|_1 + 1)(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2})$ in (8). Then, conditional on $\mathcal{E}_{n,p}$ defined in (11), we have
$$|\widehat\beta - \beta|_\infty \vee |\widehat\beta - \beta|_2 \le \frac{32\, s_{\mathrm{in}}\, \lambda}{\pi m_\xi} \quad\text{and}\quad |\widehat\beta - \beta|_1 \le \frac{8\, s_0\, \lambda}{\pi m_\xi}.$$

Following Loh and Wainwright (Citation2012), the proof of Proposition 3.3 proceeds by showing that, conditional on $\mathcal{E}_{n,p}$, the matrix $\widehat G$ meets a restricted eigenvalue condition (Bickel, Ritov, and Tsybakov Citation2009) and the deviation bound is controlled as $|\widehat G \beta - \widehat g|_\infty \le \lambda/4$. Then, thanks to Proposition 3.2, as $n, p \to \infty$, the estimation errors of $\widehat\beta$ in $\ell_\infty$- and $\ell_1$-norms are bounded as in Proposition 3.3 with probability tending to one.

Remark 3.3.

As noted in Remark 2.2, the boundedness of $\mu_{\xi,j}(\omega)$ follows from that of $\|\beta\|_1 = \max_{1 \le j \le p} \sum_{l=1}^{d} |A_{l,j\cdot}|_1$, in which case the $\|\beta\|_1$ appearing in the assumed lower bound on $\lambda$ does not inflate the rate of the estimation errors. In the light-tailed situation, with the optimal bandwidth $m \asymp (n\log^{-1}(np))^{1/3}$ as specified in Remark 3.1(b), it is required that $s_{\mathrm{in}} = O((n\log^{-1}(np))^{1/3} \wedge \sqrt{p})$, which still allows the number of nonzero entries in each row of $A_l$ to grow with $p$. Here, the exponent $1/3$, in place of the $1/2$ often found in the literature, comes from adopting the most general approach to time series factor modeling, which necessitates selecting a bandwidth for frequency domain-based factor adjustment.

For sign consistency of the Lasso estimator, the (almost) necessary and sufficient condition is the so-called irrepresentable condition (Zhao and Yu Citation2006), which is known to be highly stringent (Tardivel and Bogdan Citation2022). Alternatively, Medeiros and Mendes (Citation2016) propose an adaptive Lasso estimator with data-driven weights for high-dimensional VAR estimation when ξt is directly observed. Instead, we propose to additionally threshold β̂ and obtain β̂(t), whose support consistently estimates the edge set of NG.

Corollary 3.4.

Suppose that the conditions of Proposition 3.3 are met. If $$(13)\quad \min_{(i,j):\,\beta_{ij}\ne 0}|\beta_{ij}| > 2t$$ with $t = 32 s_{\mathrm{in}}\lambda/(\pi m_\xi)$, then we have $\mathrm{sign}(\widehat\beta(t)) = \mathrm{sign}(\beta)$ conditional on $E_{n,p}$.
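The thresholded estimator $\widehat\beta(t)$ can be sketched as a hard-thresholding of $\widehat\beta$, which zeroes out small entries while preserving the sign of the retained ones, as required for sign consistency (an illustration, not the packaged implementation):

```python
import numpy as np

def hard_threshold(beta_hat, t):
    """Keep entries of beta_hat exceeding t in absolute value, zero the rest;
    the support of the result estimates the edge set of NG."""
    return np.where(np.abs(beta_hat) > t, beta_hat, 0.0)
```

For instance, thresholding `[0.5, -0.05, 0.2]` at `t = 0.1` removes only the middle entry, leaving the signs of the survivors intact.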

3.5.3 Estimation of NC and NL

Let $s_\delta(\varrho) = \max_{1\le i\le p}\sum_{i'=1}^p |\delta_{ii'}|^\varrho$, $\varrho\in[0,1)$, denote the (weak) sparsity of $\Delta = [\delta_{ii'},\ 1\le i,i'\le p]$. Also, define $s_{\mathrm{out}} = \max_{1\le j\le p}\sum_{l=1}^d |A_{l,\cdot j}|_0$ which, complementing $s_{\mathrm{in}}$, represents the sparsity of the out-going edges of NG. Analogously to Proposition 3.3, we establish deterministic guarantees for $\widehat\Delta$ and $\widehat\Omega$ conditional on $E_{n,p}$.

Proposition 3.5.

Suppose that the conditions of Proposition 3.3 are met, and set $\eta = C s_{\mathrm{in}}\|\Delta\|_1(\|\beta\|_1+1)(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2})$ in (9), with $C$ depending only on $C_\xi$ and $m_\xi$. Then, conditional on $E_{n,p}$ defined in (11), we have:

  1. $|\widehat\Delta-\Delta|_\infty \le 4\|\Delta\|_1\eta$ and $\|\widehat\Delta-\Delta\|_1 \le 2 s_\delta(\varrho)\,(4\|\Delta\|_1\eta)^{1-\varrho}$.

  2. If also $s_{\mathrm{out}}\, t \le \|A(1)\|_1$ with $t$ chosen as in Corollary 3.4, then $|\widehat\Omega-\Omega|_\infty \le 4\pi\|A(1)\|_1\big(3\|\Delta\|_1 s_{\mathrm{out}}\, t + 16\|A(1)\|_1\|\Delta\|_1\eta\big)$.

Together with Assumption 2.3 (ii), Proposition 3.5(i) indicates asymptotic positive definiteness of $\widehat\Delta$ provided that $\Delta$ is sufficiently sparse, as measured by $\|\Delta\|_1$ and $s_\delta(\varrho)$. By definition, NL combines NG and NC and, consequently, its sparsity structure is determined by the sparsity of the other two networks, which is reflected in Proposition 3.5(ii). Specifically, the term $\|A(1)\|_1$ is related to the out-going edges of NG and satisfies $\|A(1)\|_1 \le \max_{1\le j\le p}\sum_{l=1}^d |A_{l,\cdot j}|_1$, where the boundedness of the right-hand side is sufficient for the boundedness of $\mu_{\xi,j}(\omega)$ (Remark 2.2). Also, $\|\Delta\|_1$ reflects the sparsity of the edge set of NC, and the tuning parameter $\eta$ depends on the sparsity of the in-coming edges of NG through $\|\beta\|_1 = \max_{1\le j\le p}\sum_{l=1}^d |A_{l,j\cdot}|_1$ and $s_{\mathrm{in}}$.

As in Corollary 3.4, we can show the consistency of the thresholded estimators $\widehat\Delta(t_\delta)$ and $\widehat\Omega(t_\omega)$ in estimating the edge sets of NC and NL, respectively.

Corollary 3.6.

Suppose that the conditions of Proposition 3.5 are met. Conditional on $E_{n,p}$:

  1. If $\min_{(i,i'):\,\delta_{ii'}\ne 0}|\delta_{ii'}| > 2t_\delta$ with $t_\delta = 4\|\Delta\|_1\eta$, we have $\mathrm{sign}(\widehat\Delta(t_\delta)) = \mathrm{sign}(\Delta)$.

  2. If $\min_{(i,i'):\,\omega_{ii'}\ne 0}|\omega_{ii'}| > 2t_\omega$ with $t_\omega = 4\pi\|A(1)\|_1\big(3\|\Delta\|_1 s_{\mathrm{out}}\, t + 16\|A(1)\|_1\|\Delta\|_1\eta\big)$, we have $\mathrm{sign}(\widehat\Omega(t_\omega)) = \mathrm{sign}(\Omega)$.

3.5.4 The Case of the Static Factor Model

For ease of comparison between FNETS and the existing results, we focus on the static factor model setting discussed in Remark 2.1, and assume that $n \asymp p$ and $\max(\|\beta\|_1, \|A(1)\|_1) = O(1)$. Then, from Remark 3.2 and the proof of Proposition 3.3, we obtain $\max_{1\le j\le p}|\widehat\beta_{\cdot j}-\beta_{\cdot j}|_2 = O_P(s_{\mathrm{in}}\sqrt{\log(n)/n})$ provided that $\nu > 8$, such that the condition in (13) is written with $t \asymp s_{\mathrm{in}}\sqrt{\log(n)/n}$. That is, $\widehat\beta$ and its thresholded counterpart, proposed for the estimation of the latent VAR process, perform as well as the benchmark derived under independence and Gaussianity in the Lasso literature (van de Geer, Bühlmann, and Zhou 2011). In the same setting, the factor-adjusted regression estimation method of Fan, Masini, and Medeiros (2021), when applied to the problem of VAR parameter estimation, yields an estimator $\widehat\beta^{\mathrm{FARM}}$ which attains $\max_{1\le j\le p}|\widehat\beta^{\mathrm{FARM}}_{\cdot j}-\beta_{\cdot j}|_2 = O_P(s_{\mathrm{in}} n^{-1/2+5/\nu})$ under strong mixing conditions; see their Theorem 3. The larger $O_P$-bound compared to ours stems from the fact that $\widehat\beta^{\mathrm{FARM}}$ requires the estimation of $\xi_{it}$ for all $i$ and $t$, the error of which increases with $n$ as well as $p$. This demonstrates the efficacy of adopting our regularized YW estimator.

Continuing with the same setting, Proposition 3.5 implies that $$|\widehat\Delta-\Delta|_\infty = O_P\Big(\|\Delta\|_1^2\, s_{\mathrm{in}}\sqrt{\log(n)/n}\Big) \quad\text{and}\quad |\widehat\Omega-\Omega|_\infty = O_P\Big(\big(s_{\mathrm{out}} \vee \|\Delta\|_1^2\big)\, s_{\mathrm{in}}\sqrt{\log(n)/n}\Big).$$

The former is comparable (up to $s_{\mathrm{in}}$) to the results in Theorem 4 of Cai, Liu, and Luo (2011), derived for estimating a sparse precision matrix of independent random vectors.

4 Forecasting via FNETS

4.1 Forecasting under the Static Factor Model Representation

For a given time horizon $a\ge 0$, the best linear predictor of $\chi_{n+a}$ based on $\chi_{n-l}$, $l\ge 0$, under (1) is $$(14)\quad \chi_{n+a|n} = \sum_{l=0}^\infty B_{l+a}u_{n-l}.$$ Following Forni et al. (2005), we consider a forecasting method for the factor-driven component which estimates $\chi_{n+a|n}$ under a restricted GDFM that admits a static representation of finite dimension. We formalize the static factor model discussed in Remark 2.1 in the following assumption.

Assumption 4.1.

  1. There exist two finite positive integers $m_1$ and $m_2$ such that $m_1+1\ge m_2$, $\chi_t = M^{(1)}(L)f_t$ and $f_t = M^{(2)}(L)u_t$, where $M^{(1)}(L) = \sum_{l=0}^{m_1}M^{(1)}_l L^l$ with $M^{(1)}_l\in\mathbb{R}^{p\times q}$, $M^{(2)}(L) = \sum_{l=0}^{m_2}M^{(2)}_l L^l$ with $M^{(2)}_l\in\mathbb{R}^{q\times q}$, and $\det(M^{(2)}(z))\ne 0$ for all $|z|\le 1$.

  2. Let $\mu_{\chi,j}$, $1\le j\le r$, denote the $j$th largest eigenvalue of $\Gamma_\chi(0)$. Then, there exist a positive integer $p_0\ge 1$, constants $\varrho_j\in(7/8,1]$ with $\varrho_1\ge\cdots\ge\varrho_r$, and pairs of positive constants $(\alpha_{\chi,j},\beta_{\chi,j})$, $1\le j\le r$, such that for all $p\ge p_0$, $$\beta_{\chi,1}\ge \frac{\mu_{\chi,1}}{p^{\varrho_1}}\ge \alpha_{\chi,1} > \beta_{\chi,2}\ge \frac{\mu_{\chi,2}}{p^{\varrho_2}}\ge \cdots \ge \alpha_{\chi,r-1} > \beta_{\chi,r}\ge \frac{\mu_{\chi,r}}{p^{\varrho_r}}\ge \alpha_{\chi,r} > 0.$$

In part (i), $\chi_t$ admits a static representation with $r = q(m_1+1)$ factors: $\chi_t = \Lambda F_t$, where $\Lambda = [M^{(1)}_l,\ 0\le l\le m_1]$, $F_t = (f_t',\ldots,f_{t-m_1}')'$ and $f_t = M^{(2)}(L)u_t$. The condition that $m_1+1\ge m_2$ is made for convenience, and the proposed estimator of $\chi_{n+a|n}$ can be modified accordingly when it is relaxed.

Remark 4.1.

Under Assumption 4.1 (i), the $r$-vector of static factors, $F_t$, is driven by the $q$-dimensional common shocks $u_t$. If $q < r$, Anderson and Deistler (2008) show that $F_t$ always admits a VAR($h$) representation, $F_t = \sum_{l=1}^h G_l F_{t-l} + Hu_t$, for some finite positive integer $h$ and $H\in\mathbb{R}^{r\times q}$. Then, $X_t$ has a factor-augmented VAR representation: $$X_t = \Lambda\sum_{l=1}^h G_lF_{t-l} + \Lambda Hu_t + \sum_{l=1}^d A_l\xi_{t-l} + \Gamma^{1/2}\varepsilon_t = \sum_{l=1}^{d\vee h} C_lF_{t-l} + \sum_{l=1}^d A_lX_{t-l} + \nu_t,$$ with $C_l = \Lambda G_l\mathbb{I}_{\{l\le h\}} - A_l\Lambda\mathbb{I}_{\{l\le d\}}$ and $\nu_t = \Lambda Hu_t + \Gamma^{1/2}\varepsilon_t$. This model generalizes the factor-augmented forecasting model considered by Stock and Watson (2002), where only the factor-driven component is present, and it is also considered by Fan, Masini, and Medeiros (2021).

It immediately follows from Proposition 2.1 that $\|\Gamma_\xi(0)\| \le 2\pi B_\xi$. This, combined with Assumption 4.1 (ii), indicates the presence of a large gap in the eigenvalues of $\Gamma_x(0)$, which allows the asymptotic identification of $\chi_t$ and $\xi_t$ in the time domain, as well as that of the number of static factors $r$. Throughout, we treat $r$ as known, and refer to, for example, Bai and Ng (2002) and Ahn and Horenstein (2013) for its estimation.

Let $(\mu_{\chi,j},e_{\chi,j})$, $1\le j\le r$, denote the pairs of eigenvalues and eigenvectors of $\Gamma_\chi(0)$, ordered such that $\mu_{\chi,1}\ge\cdots\ge\mu_{\chi,r}$. Then, $\Gamma_\chi(0) = E_\chi M_\chi E_\chi'$ with $M_\chi = \mathrm{diag}(\mu_{\chi,j},\ 1\le j\le r)$ and $E_\chi = [e_{\chi,j},\ 1\le j\le r]$. Under Assumption 4.1 (i), $\chi_{n+a|n}$ in (14) satisfies $$\chi_{n+a|n} = \mathrm{Proj}(\chi_{n+a}\,|\,F_{n-l},\ l\ge 0) = \mathrm{Proj}(\chi_{n+a}\,|\,F_n) = \Gamma_\chi(a)E_\chi M_\chi^{-1}E_\chi'\chi_n,$$ where $\mathrm{Proj}(\cdot\,|\,z)$ denotes the linear projection operator onto the linear space spanned by $z$. When $a = 0$, we trivially have $\chi_{t|n} = \chi_t$ for $t\le n$. Then, a natural estimator of $\chi_{n+a|n}$ is $$(15)\quad \widehat\chi^{\mathrm{res}}_{n+a|n} = \widehat\Gamma_\chi(a)\widehat E_\chi\widehat M_\chi^{-1}\widehat E_\chi' X_n,$$ where $(\widehat\mu_{\chi,j},\widehat e_{\chi,j})$, $1\le j\le r$, denote the pairs of eigenvalues and eigenvectors of $\widehat\Gamma_\chi(0)$, and $\widehat\Gamma_\chi(l)$, $l\in\{0,a\}$, are estimated as described in Section 3.2. As a by-product, setting $a = 0$ yields the in-sample estimator $\widehat\chi^{\mathrm{res}}_t = \widehat E_\chi\widehat E_\chi' X_t$ for $1\le t\le n$.
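The linear algebra behind (15) can be sketched in a few lines of numpy; the autocovariance estimates and the number of static factors $r$ are assumed to be available from the earlier steps, so this illustrates the projection only, not the full estimation pipeline.

```python
import numpy as np

def chi_forecast_res(Gamma_chi_a, Gamma_chi_0, X_n, r):
    """Sketch of the restricted forecast in (15):
    chi_hat_{n+a|n} = Gamma_chi(a) E M^{-1} E' X_n,
    with (M, E) the r leading eigenpairs of Gamma_chi(0)."""
    vals, vecs = np.linalg.eigh(Gamma_chi_0)   # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:r]         # indices of the r largest
    E = vecs[:, order]
    M_inv = np.diag(1.0 / vals[order])
    return Gamma_chi_a @ E @ M_inv @ E.T @ X_n
```

As a sanity check, with `a = 0` (passing the lag-0 autocovariance twice) and an exactly rank-$r$ covariance, the formula collapses to the in-sample projection $E E' X$, as in the text.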

Remark 4.2.

Our proposed estimator $\widehat\chi^{\mathrm{res}}_{n+a|n}$ differs from that of Forni et al. (2005), as they estimate the factor space via generalized PCA on $\widehat\Gamma_\chi(0)$. This in effect replaces $\widehat E_\chi$ in (15) with the eigenvectors of $W^{-1}\widehat\Gamma_\chi(0)$, where $W$ is a diagonal matrix containing the estimators of the sample variance of $\xi_t$. Such an approach may gain in efficiency compared to ours in the same way a weighted least squares estimator is more efficient than the ordinary one in the presence of heteroscedasticity. However, since we investigate the consistency of $\widehat\chi^{\mathrm{res}}_{n+a|n}$ without deriving its asymptotic distribution, we do not explore such an approach in this article.

In Appendix B, we present an alternative forecasting method that operates under an unrestricted GDFM, that is, it does not require Assumption 4.1. Denoting the resulting forecast by $\widehat\chi^{\mathrm{unr}}_{n+a|n}$, we compare its performance with that of $\widehat\chi^{\mathrm{res}}_{n+a|n}$ in numerical studies.

Once the VAR parameters are estimated by $\widehat\beta = [\widehat A_1,\ldots,\widehat A_d]$ as in (8), we produce a forecast of $\xi_{n+a}$ given $X_t$, $t\le n$, by estimating the best linear predictor $\xi_{n+a|n} = \sum_{l=1}^d A_l\xi_{n+a-l|n}$ (with $\xi_{t|n} = \xi_t$ for $t\le n$), as $$(16)\quad \widehat\xi_{n+a|n} = \sum_{l=1}^{\max(1,a)-1}\widehat A_l\widehat\xi_{n+a-l|n} + \sum_{l=\max(1,a)}^{d}\widehat A_l\widehat\xi_{n+a-l}.$$

When $a\le d$, the in-sample estimators appearing in (16) are obtained as $\widehat\xi_t = X_t - \widehat\chi_t$ for $n+a-d\le t\le n$, with either $\widehat\chi^{\mathrm{res}}_t$ or $\widehat\chi^{\mathrm{unr}}_t$ as $\widehat\chi_t$.
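The recursion in (16) amounts to iterating the fitted VAR forward, substituting forecasts for unobserved future values. A minimal sketch, with the in-sample estimates $\widehat\xi_t$ and the matrices $\widehat A_l$ assumed given:

```python
import numpy as np

def var_forecast(A, xi, a):
    """Iterate an estimated VAR(d) a steps ahead, as in (16).
    A:  list [A_1, ..., A_d] of (p, p) coefficient matrices;
    xi: (n, p) array of in-sample estimates, most recent row last."""
    d = len(A)
    hist = [xi[-l] for l in range(1, d + 1)]   # xi_n, xi_{n-1}, ..., xi_{n-d+1}
    for _ in range(a):                         # each pass: next 1-step forecast
        nxt = sum(A[l] @ hist[l] for l in range(d))
        hist = [nxt] + hist[:-1]               # forecasts replace observations
    return hist[0]
```

With a univariate toy VAR(1), coefficient 0.5 and last observation 2.0, the one- and two-step forecasts are 1.0 and 0.5, matching the recursion by hand.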

4.2 Theoretical Properties

Proposition 4.1 establishes the consistency of $\widehat\chi^{\mathrm{res}}_{n+a|n}$ in estimating the best linear predictor of $\chi_{n+a}$, where we make explicit the effects of the presence of weak factors, both dynamic (as measured by $\mu_{\chi,j}(\omega)$ in Assumption 2.1) and static (as measured by $\mu_{\chi,j}$ in Assumption 4.1 (ii)), and of the tail behavior (through $\psi_n$ and $\vartheta_{n,p}$ defined in (10)).

Proposition 4.1.

Suppose that the conditions in Theorem 3.1 are met and, in addition, that Assumption 4.1 holds. Then, for any finite $a\ge 0$, we have $$\big|\widehat\chi^{\mathrm{res}}_{n+a|n} - \chi_{n+a|n}\big|_\infty = O_P\Big(p^{4-2\rho_q-2\varrho_r}\big(\psi_n \vee p^{\varrho_r-1}\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2}\big)\Big).$$

As noted in Remark 3.1 (c), weaker factors and heavier tails impose a stronger requirement on the dimensionality $p$. If all factors are strong ($\varrho_r = 1$), the rate becomes $\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2}$. When $a = 0$, Proposition 4.1 provides in-sample estimation consistency for any given $t\le n$. The next proposition accounts for the irreducible error in $\chi_{n+a|n}$, with which we conclude the analysis of the forecasting error $|\widehat\chi^{\mathrm{res}}_{n+a|n} - \chi_{n+a}|_\infty$ when $a\ge 1$.

Proposition 4.2.

Suppose that Assumptions 2.2 and 2.4 hold. Then, for any finite $a\ge 1$, $$\big|\chi_{n+a|n} - \chi_{n+a}\big|_\infty = O_P\big(q^{1/\nu}\mu_\nu^{1/\nu}\log^{1/2}(p)\big).$$

Recall the definition of $s_{\mathrm{in}}$ given in Section 3.5. The next proposition investigates the performance of $\widehat\xi_{n+a|n}$ when $a = 1$, which can easily be extended to any finite $a\ge 2$.

Proposition 4.3.

Suppose that the in-sample estimator of $\xi_t$ and $\widehat\beta$ satisfy $$(17)\quad \big|\widehat\xi_{n+1-l} - \xi_{n+1-l}\big|_\infty = O_P(\bar\zeta_{n,p})\ \text{ for }\ 1\le l\le d \quad\text{and}\quad |\widehat\beta-\beta|_1 = O_P(s_{\mathrm{in}}\zeta_{n,p}).$$

Also, let Assumptions 2.3 and 2.4 hold. Then, $$\big|\widehat\xi_{n+1|n} - \xi_{n+1}\big|_\infty = O_P\Big(s_{\mathrm{in}}\zeta_{n,p}\big(\log^{1/2}(p)\,p^{1/\nu}\mu_\nu^{1/\nu} + \bar\zeta_{n,p}\big) + \|\beta\|_1\bar\zeta_{n,p} + p^{1/\nu}\mu_\nu^{1/\nu}\Big).$$

Either of the in-sample estimators $\widehat\chi^{\mathrm{res}}_t$ (described in Section 4.1) or $\widehat\chi^{\mathrm{unr}}_t$ (Appendix B) can be used in place of $\widehat\chi_t$. Accordingly, the rate $\bar\zeta_{n,p}$ in (17) is inherited from that of $\widehat\chi^{\mathrm{res}}_t$ (given in Proposition 4.1) or $\widehat\chi^{\mathrm{unr}}_t$ (Proposition B.2 (iii)). From the proof of Proposition 3.3, we have $\zeta_{n,p} \asymp (\|\beta\|_1+1)(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2})$ in (17).

5 Numerical Studies

5.1 Tuning Parameter Selection

We briefly discuss the choice of the tuning parameters for FNETS. For full details, see Owens, Cho, and Barigozzi (Citation2023) that accompanies its R implementation available on CRAN (Barigozzi, Cho, and Owens Citation2023).

Related to χt

We set the kernel bandwidth at $m = 4(n/\log(n))^{1/3}$, based on the case where a sufficiently large number of moments exists and $n\asymp p$ (Remark 3.1 (b)). In the simulation studies reported in Appendix E, we treat the number of factors $q$ (required for Step 1 of FNETS) as known, and also treat the number of static factors $r$ (for generating the forecast) as known when it is finite; when $\chi_t$ does not admit a static factor model (i.e. $r = \infty$), we use the value returned by the ratio-based estimator of Ahn and Horenstein (2013). In the real data analysis reported in Section 5.3, we estimate both $q$ and $r$, the former with the estimator proposed in Hallin and Liška (2007) and the latter as in Ahn and Horenstein (2013).

Related to ξt

We select the tuning parameter $\lambda$ in (8) jointly with the VAR order $d$ by adopting cross validation (CV); in time series settings, a similar approach is explored in Wang and Tsay (2022). For this, the data is partitioned into $M$ consecutive folds with indices $I_l = \{n_l+1,\ldots,n_{l+1}\}$ where $n_l = \min(\lceil ln/M\rceil, n)$, $0\le l\le M$, and each fold is split into $I_l^{\mathrm{train}} = \{n_l+1,\ldots,\lfloor(n_l+n_{l+1})/2\rfloor\}$ and $I_l^{\mathrm{test}} = I_l\setminus I_l^{\mathrm{train}}$. Then, with $\widehat\beta_l^{\mathrm{train}}(\mu,b)$ obtained from $\{X_t,\ t\in I_l^{\mathrm{train}}\}$ with the tuning parameter $\mu$ and the VAR order $b$, we evaluate $$\mathrm{CV}(\mu,b) = \sum_{l=1}^M \mathrm{tr}\Big(\widehat\Gamma_{\xi,l}^{\mathrm{test}}(0) - \big(\widehat\beta_l^{\mathrm{train}}(\mu,b)\big)'\widehat g_l^{\mathrm{test}}(b) - \big(\widehat g_l^{\mathrm{test}}(b)\big)'\widehat\beta_l^{\mathrm{train}}(\mu,b) + \big(\widehat\beta_l^{\mathrm{train}}(\mu,b)\big)'\widehat G_l^{\mathrm{test}}(b)\widehat\beta_l^{\mathrm{train}}(\mu,b)\Big),$$ where $\widehat\Gamma_{\xi,l}^{\mathrm{test}}(l)$, $\widehat G_l^{\mathrm{test}}(b)$, and $\widehat g_l^{\mathrm{test}}(b)$ are generated analogously to $\widehat\Gamma_\xi(l)$, $\widehat G$, and $\widehat g$, respectively, using the test set $\{X_t,\ t\in I_l^{\mathrm{test}}\}$. The measure $\mathrm{CV}(\mu,b)$ approximates the prediction error while accounting for the fact that we do not directly observe $\xi_t$. Minimizing it over varying $\mu$ and $b$, we select $\lambda$ and $d$. In simulation studies, we treat $d$ as known, while in the real data analysis we select it from the set $\{1,\ldots,5\}$ via CV. For selecting $\eta$ in (9), we adopt the Burg matrix divergence-based CV measure $$\mathrm{CV}(\mu) = \sum_{l=1}^M \mathrm{tr}\big(\widehat\Delta_l^{\mathrm{train}}(\mu)\widehat\Gamma_l^{\mathrm{test}}\big) - \log\big|\widehat\Delta_l^{\mathrm{train}}(\mu)\widehat\Gamma_l^{\mathrm{test}}\big| - p.$$
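A sketch of the fold construction and of one summand of $\mathrm{CV}(\mu,b)$ follows; the estimators themselves ($\widehat\beta$, $\widehat G$, $\widehat g$) are assumed to be computed elsewhere, indices are 0-based, and the $\widehat\Gamma^{\mathrm{test}}_{\xi,l}(0)$ term is dropped from the criterion since it does not involve $\widehat\beta$ (an illustration only, not the packaged implementation).

```python
import numpy as np

def cv_folds(n, M):
    """Partition {0, ..., n-1} into M consecutive folds, each split into
    a train half and a test half, as described in Section 5.1."""
    bounds = [min(int(np.ceil(l * n / M)), n) for l in range(M + 1)]
    folds = []
    for l in range(M):
        fold = list(range(bounds[l], bounds[l + 1]))
        half = len(fold) // 2
        folds.append((fold[:half], fold[half:]))  # (train, test) indices
    return folds

def cv_criterion(G_test, g_test, b_train):
    """One summand of CV(mu, b): tr(-b' g - g' b + b' G b), i.e. the
    prediction-error quadratic form with the constant Gamma(0) term omitted."""
    return np.trace(-b_train.T @ g_test - g_test.T @ b_train
                    + b_train.T @ G_test @ b_train)
```

For example, `cv_folds(10, 2)` yields train/test index pairs `([0, 1], [2, 3, 4])` and `([5, 6], [7, 8, 9])`.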

For both CV procedures, we set M = 1 in the numerical results reported below. In simulation studies, we compare the estimators with their thresholded counterparts in estimating the network edge sets with the thresholds t,tδ and tω selected according to a data-driven approach motivated by Liu, Zhang, and Liu (Citation2021). Details are in Appendix D.

5.2 Simulations

In Appendix E, we investigate the estimation and forecasting performance of FNETS on datasets simulated under a variety of settings, from Gaussian innovations $u_t$ and $\varepsilon_t$ with (E1) $\Delta = I$ and (E2) $\Delta \ne I$, to (E3) heavy-tailed ($t_5$) innovations with $\Delta = I$, and when $\chi_t$ is generated from (C1) fully dynamic or (C2) static factor models. In addition, we consider the "oracle" setting (C0) $\chi_t = 0$ where, in the absence of the factor-driven component, the results obtained can serve as a benchmark. For comparison, we consider the factor-adjusted regression method of Fan, Masini, and Medeiros (2021) and present the performance of their estimator of the VAR parameters and forecasts.

5.3 Application to a Panel of Volatility Measures

We investigate the interconnectedness in a panel of volatility measures and evaluate its out-of-sample forecasting performance using FNETS. For this purpose, we consider a panel of p = 46 stock prices retrieved from the Wharton Research Data Service, of US companies which are all classified as "financials" according to the Global Industry Classification Standard; a list of company names and industry groups is found in Appendix G. The dataset spans the period between January 3, 2000 and December 31, 2012 (3267 trading days). Following Diebold and Yilmaz (2014), we measure volatility using the high-low range as $\sigma_{it}^2 = 0.361(p_{it}^{\mathrm{high}} - p_{it}^{\mathrm{low}})^2$, where $p_{it}^{\mathrm{high}}$ and $p_{it}^{\mathrm{low}}$ denote, respectively, the maximum and the minimum log-price of stock $i$ on day $t$, and set $X_{it} = \log(\sigma_{it}^2)$; Brownlees and Gallo (2010) support this choice of volatility measure over more sophisticated alternatives.
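The volatility proxy is straightforward to compute from daily high and low prices; a minimal sketch (hypothetical prices, for illustration):

```python
import numpy as np

def log_range_volatility(price_high, price_low):
    """High-low range volatility proxy of Diebold and Yilmaz (2014):
    sigma_it^2 = 0.361 * (p_high - p_low)^2 on the log-price scale,
    where 0.361 ~ 1/(4 log 2) is the Parkinson range constant;
    returns the model input X_it = log(sigma_it^2)."""
    p_high = np.log(np.asarray(price_high, dtype=float))  # max log-price of the day
    p_low = np.log(np.asarray(price_low, dtype=float))    # min log-price of the day
    sigma2 = 0.361 * (p_high - p_low) ** 2
    return np.log(sigma2)
```

Applied column-wise to a (days × stocks) array of highs and lows, this yields the panel $X_{it}$ fed into FNETS.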

5.3.1 Network Analysis

We focus on the period 03/2006–02/2010 corresponding to the Great Financial Crisis. We partition the data into four segments of length n = 252 each (corresponding to the number of trading days in a single year) and on each segment, we apply FNETS to estimate the three networks NG,NC, and NL described in Section 3.1.

Each row of Fig. 2 plots the heat maps of the matrices underlying the three networks of interest. For all four segments, the CV-based approach described in Section 5.1 returns d = 1 from the candidate VAR order set {1,…,5}. Hence, in each row, the left panel represents the estimator $\widehat A_1 = \widehat\beta$, and the middle and the right panels show the (long-run) partial correlations from the corresponding $\widehat\Delta$ and $\widehat\Omega$ (with their diagonals set to zero). The locations of the nonzero elements estimate the edge sets of the corresponding networks, and the hues represent the (signed) edge weights.

Fig. 2 Heat maps of the estimators of the VAR transition matrices Â1, partial correlations from Δ̂ and long-run partial correlations from Ω̂ (left to right), which in turn estimate the networks NG,NC, and NL, respectively, over three selected periods. The grouping of the companies according to their industry classifications are indicated by the axis label colors. The heat maps in the left column are in the scale of [−0.81,0.81] while the others are in the scale of [−1,1], with red hues denoting large positive values and blue hues large negative values.

Prior to March 2007, all networks exhibit a low degree of interconnectedness but the number of edges increases considerably in 03/2007–02/2008 due mainly to an overall increase in dynamic co-dependencies and a prominent role of banks (blue group) not only in NG but also in NC. In 03/2008–02/2009, the companies belonging to the insurance sector (red group) play a central role and in 03/2009–02/2010, the companies become highly interconnected with two particular firms having many outgoing edges in NG. Also, while most edges in NL, which captures the overall long-run dependence, have positive weights across time and companies, their weights become negative in this last segment. We highlight that FNETS is able to capture the aforementioned group-specific activities although this information is not supplied to the estimation method.

5.3.2 Forecasting

We perform a rolling window-based forecasting exercise on the trading days in 2012. Starting from $T = 3016$ (the first trading day in 2012), we forecast $X_{T+1}$ as $\widehat X_{T+1|T}(n) = \widehat\chi_{T+1|T}(n) + \widehat\xi_{T+1|T}(n)$, where $\widehat\chi_{T+1|T}(n)$ (resp. $\widehat\xi_{T+1|T}(n)$) denotes the forecast of $\chi_{T+1}$ (resp. $\xi_{T+1}$) using the preceding $n$ data points $\{X_t,\ T-n+1\le t\le T\}$. We set $n = 252$. After the forecast $\widehat X_{T+1|T}(n)$ is generated, we update $T \leftarrow T+1$ and repeat the above procedure until $T = 3267$ (the last trading day in 2012) is reached.

For χ̂T+1|T(n), we consider the forecasting methods derived under the static factor model (Section 4.1, denoted by χ̂T+1|Tres(n)) and unrestricted GDFM (Appendix B, χ̂T+1|Tunr(n)). Following the analysis in Section 5.3.1, we set d = 1 when producing ξ̂T+1|T(n). Additionally, we report the forecasting performance of FarmPredict (Fan, Masini, and Medeiros Citation2021), which first fits an AR model to each of the p series (“AR”), projects the residuals on their principal components, and then fits VAR models to what remains via Lasso. Combining the three steps gives the final forecast X̂T+1|TFARM(n). The forecast produced by the first step univariate AR modeling, denoted by X̂T+1|TAR(n), is also included for comparison.

We evaluate the performance of $\widehat X_{T+1|T}$ using two error measures, $$\mathrm{FE}^{\mathrm{avg}}_{T+1} = \frac{|X_{T+1} - \widehat X_{T+1|T}|_2^2}{|X_{T+1}|_2^2} \quad\text{and}\quad \mathrm{FE}^{\mathrm{max}}_{T+1} = \frac{|X_{T+1} - \widehat X_{T+1|T}|_\infty}{|X_{T+1}|_\infty};$$ see Table 1 for a summary of the forecasting results. Among the forecasts generated by FNETS, the one based on $\widehat\chi^{\mathrm{res}}_{T+1|T}(n)$ performs best in this exercise, outperforming $\widehat X^{\mathrm{AR}}_{T+1|T}(n)$ and $\widehat X^{\mathrm{FARM}}_{T+1|T}(n)$ according to both $\mathrm{FE}^{\mathrm{avg}}$ and $\mathrm{FE}^{\mathrm{max}}$ on average. As noted in Appendix E.2.2, the forecast based on $\widehat\chi^{\mathrm{unr}}_{T+1|T}$ shows instabilities and is generally outperformed by the one based on $\widehat\chi^{\mathrm{res}}_{T+1|T}$, but nonetheless performs reasonably well. Given the high level of co-movements and persistence in the data, the good performance of FNETS is mainly attributed to the way we forecast the factor-driven component, which is based on the estimators derived under the GDFM that fully exploit all the dynamic co-dependencies (see also the results obtained by Barigozzi and Hallin 2017 on a similar dataset).
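Under the normalization reconstructed above (squared $\ell_2$ and $\ell_\infty$ forecast errors relative to the magnitude of $X_{T+1}$), the two measures can be computed as follows (a sketch):

```python
import numpy as np

def forecast_errors(X, X_hat):
    """FE_avg = |X - X_hat|_2^2 / |X|_2^2 and FE_max = |X - X_hat|_inf / |X|_inf,
    the relative forecast error measures of Section 5.3.2."""
    X = np.asarray(X, dtype=float)
    diff = X - np.asarray(X_hat, dtype=float)
    fe_avg = np.sum(diff ** 2) / np.sum(X ** 2)
    fe_max = np.max(np.abs(diff)) / np.max(np.abs(X))
    return fe_avg, fe_max
```

For example, forecasting `[2, 0]` by `[1, 0]` gives `fe_avg = 0.25` and `fe_max = 0.5`.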

Table 1 Mean, median and standard errors of FET+1avg and FET+1max on the trading days in 2012 for X̂T+1|T(n) in comparison with AR and FarmPredict (Fan, Masini, and Medeiros Citation2021) forecasts.

6 Conclusions

We propose and study the asymptotic properties of FNETS, a network estimation and forecasting methodology for high-dimensional time series under a dynamic factor-adjusted VAR model. Our estimation strategy fully takes into account the latency of the VAR process of interest via regularized YW estimation which, distinguished from existing approaches, brings methodological simplicity as well as theoretical benefits. We investigate the theoretical properties of FNETS under general conditions permitting weak factors and tails heavier than the sub-Gaussian ones commonly imposed in the high-dimensional VAR literature, and provide new insights into how the quantities determining the sparsity of the networks underpinning VAR processes, the factor strength and the tail behavior jointly affect the estimation of those networks. Simulation studies and an application to a panel of financial time series show that FNETS is particularly useful for network analysis, as it is able to discover group structures as well as produce accurate forecasts for highly co-moving and persistent time series such as log-volatilities. The R package fnets implementing FNETS is available from CRAN (Barigozzi, Cho, and Owens 2023).


Supplementary Materials

The supplementary materials contain the descriptions of the alternative estimation and forecasting methods, selection of the tuning parameters and the complete simulation results, in addition to all the proofs of the theoretical results.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Supported by the Leverhulme Trust (RPG-2019-390).

References

  • Adamek, R., Smeekes, S., and Wilms, I. (2023), “Lasso Inference for High-Dimensional Time Series,” Journal of Econometrics, 235, 1114–1143. DOI: 10.1016/j.jeconom.2022.08.008.
  • Ahelegbey, D. F., Billio, M., and Casarin, R. (2016), “Bayesian Graphical Models for Structural Vector Autoregressive Processes,” Journal of Applied Econometrics, 31, 357–386. DOI: 10.1002/jae.2443.
  • Ahn, S. C., and Horenstein, A. R. (2013), “Eigenvalue Ratio Test for the Number of Factors,” Econometrica, 81, 1203–1227.
  • Anderson, B. D., and Deistler, M. (2008), “Generalized Linear Dynamic Factor Models–A Structure Theory,” in 47th IEEE Conference on Decision and Control, pp. 1980–1985.
  • Bai, J. (2003), “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71, 135–171. DOI: 10.1111/1468-0262.00392.
  • Bai, J., and Ng, S. (2002), “Determining the Number of Factors in Approximate Factor Models,” Econometrica, 70, 191–221. DOI: 10.1111/1468-0262.00273.
  • Bańbura, M., Giannone, D., and Reichlin, L. (2010), “Large Bayesian Vector Auto Regressions,” Journal of Applied Econometrics, 25, 71–92. DOI: 10.1002/jae.1137.
  • Barigozzi, M., and Brownlees, C. (2019), “NETS: Network Estimation for Time Series,” Journal of Applied Econometrics, 34, 347–364. DOI: 10.1002/jae.2676.
  • Barigozzi, M., Cho, H., and Owens, D. (2023), fnets: Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series, R package version 0.1.5.
  • Barigozzi, M., and Hallin, M. (2017), “Generalized Dynamic Factor Models and Volatilities: Estimation and Forecasting,” Journal of Econometrics, 201, 307–321. DOI: 10.1016/j.jeconom.2017.08.010.
  • Basu, S., Li, X., and Michailidis, G. (2019), “Low Rank and Structured Modeling of High-Dimensional Vector Autoregressions,” IEEE Transactions on Signal Processing, 67, 1207–1222. DOI: 10.1109/TSP.2018.2887401.
  • Basu, S., and Michailidis, G. (2015), “Regularized Estimation in Sparse High-Dimensional Time Series Models,” The Annals of Statistics, 43, 1535–1567. DOI: 10.1214/15-AOS1315.
  • Bickel, P. J., Ritov, Y., and Tsybakov, A. B. (2009), “Simultaneous Analysis of Lasso and Dantzig Selector,” The Annals of Statistics, 37, 1705–1732. DOI: 10.1214/08-AOS620.
  • Billio, M., Getmansky, M., Lo, A. W., and Pelizzon, L. (2012), “Econometric Measures of Connectedness and Systemic Risk in the Finance and Insurance Sectors,” Journal of Financial Economics, 104, 535–559. DOI: 10.1016/j.jfineco.2011.12.010.
  • Brillinger, D. R. (1981), Time Series: Data Analysis and Theory, Philadelphia, PA: SIAM.
  • Brownlees, C., and Gallo, G. (2010), “Comparison of Volatility Measures: A Risk Management Perspective,” Journal of Financial Econometrics, 8, 29–56. DOI: 10.1093/jjfinec/nbp009.
  • Cai, T., Liu, W., and Luo, X. (2011), “A Constrained l1 Minimization Approach to Sparse Precision Matrix Estimation,” Journal of the American Statistical Association, 106, 594–607.
  • Candès, E. and Tao, T. (2007), “The Dantzig Selector: Statistical Estimation when p is much larger than n,” The Annals of Statistics, 35, 2313–2351. DOI: 10.1214/009053607000000532.
  • Cule, E., Vineis, P., and De Iorio, M. (2011), “Significance Testing in Ridge Regression for Genetic Data,” BMC Bioinformatics, 12, 1–15. DOI: 10.1186/1471-2105-12-372.
  • Dahlhaus, R. (2000), “Graphical Interaction Models for Multivariate Time Series,” Metrika, 51, 157–172. DOI: 10.1007/s001840000055.
  • Diebold, F. X., and Yilmaz, K. (2014), “On the Network Topology of Variance Decompositions: Measuring the Connectedness of Financial Firms,” Journal of Econometrics, 182, 119–134. DOI: 10.1016/j.jeconom.2014.04.012.
  • Eichler, M. (2007), “Granger Causality and Path Diagrams for Multivariate Time Series,” Journal of Econometrics, 137, 334–353. DOI: 10.1016/j.jeconom.2005.06.032.
  • Fan, J., Ke, Y., and Wang, K. (2020), “Factor-Adjusted Regularized Model Selection,” Journal of Econometrics, 216, 71–85. DOI: 10.1016/j.jeconom.2020.01.006.
  • Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society, Series B, 75, 603–680. DOI: 10.1111/rssb.12016.
  • Fan, J., Lou, Z., and Yu, M. (2023), “Are Latent Factor Regression and Sparse Regression Adequate?,” Journal of the American Statistical Association (in press). DOI: 10.1080/01621459.2023.2169700.
  • Fan, J., Masini, R., and Medeiros, M. C. (2021), “Bridging Factor and Sparse Models,” arXiv preprint arXiv:2102.11341 .
  • Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000), “The Generalized Dynamic Factor Model: Identification and Estimation,” The Review of Economics and Statistics, 82, 540–554. DOI: 10.1162/003465300559037.
  • ———(2005), “The Generalized Dynamic Factor Model: One-Sided Estimation and Forecasting,” Journal of the American Statistical Association, 100, 830–840.
  • Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2017), “Dynamic Factor Models with Infinite-Dimensional Factor Space: Asymptotic Analysis,” Journal of Econometrics, 199, 74–92. DOI: 10.1016/j.jeconom.2017.04.002.
  • Forni, M., and Lippi, M. (2001), “The Generalized Dynamic Factor Model: Representation Theory,” Econometric Theory, 17, 1113–1141. DOI: 10.1017/S0266466601176048.
  • Freyaldenhoven, S. (2021), “Factor Models with Local Factors–Determining the Number of Relevant Factors,” Journal of Econometrics, 229, 80–102. DOI: 10.1016/j.jeconom.2021.04.006.
  • Giannone, D., Lenza, M., and Primiceri, G. E. (2021), “Economic Predictions With Big Data: The Illusion of Sparsity,” Econometrica, 89, 2409–2437. DOI: 10.3982/ECTA17842.
  • Guðmundsson, G. S., and Brownlees, C. (2021), “Detecting Groups in Large Vector Autoregressions,” Journal of Econometrics, 225, 2–26. DOI: 10.1016/j.jeconom.2021.03.012.
  • Hallin, M., and Liška, R. (2007), “Determining the Number of Factors in the General Dynamic Factor Model,” Journal of the American Statistical Association, 102, 603–617. DOI: 10.1198/016214506000001275.
  • Han, F., Lu, H., and Liu, H. (2015), “A Direct Estimation of High Dimensional Stationary Vector Autoregressions,” Journal of Machine Learning Research, 16, 3115–3150.
  • Hsu, N.-J., Hung, H.-L., and Chang, Y.-M. (2008), “Subset Selection for Vector Autoregressive Processes Using Lasso,” Computational Statistics & Data Analysis, 52, 3645–3657. DOI: 10.1016/j.csda.2007.12.004.
  • Kock, A. B., and Callot, L. (2015), “Oracle Inequalities for High Dimensional Vector Autoregressions,” Journal of Econometrics, 186, 325–344. DOI: 10.1016/j.jeconom.2015.02.013.
  • Krampe, J., and Margaritella, L. (2021), “Dynamic Factor Models with Sparse VAR Idiosyncratic Components,” arXiv preprint arXiv:2112.07149.
  • Krampe, J., and Paparoditis, E. (2021), “Sparsity Concepts and Estimation Procedures for High-Dimensional Vector Autoregressive Models,” Journal of Time Series Analysis, 42, 554–579. DOI: 10.1111/jtsa.12586.
  • Lin, J., and Michailidis, G. (2020), “Regularized Estimation of High-Dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models,” Journal of Machine Learning Research, 21, 1–51.
  • Liu, B., Zhang, X., and Liu, Y. (2021), “Simultaneous Change Point Inference and Structure Recovery for High Dimensional Gaussian Graphical Models,” Journal of Machine Learning Research, 22, 1–62.
  • Loh, P.-L., and Wainwright, M. J. (2012), “High-Dimensional Regression with Noisy and Missing Data: Provable Guarantees with Nonconvexity,” The Annals of Statistics, 40, 1637–1664. DOI: 10.1214/12-AOS1018.
  • Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
  • Masini, R. P., Medeiros, M. C., and Mendes, E. F. (2022), “Regularized Estimation of High-Dimensional Vector Autoregressions with Weakly Dependent Innovations,” Journal of Time Series Analysis, 43, 532–557. DOI: 10.1111/jtsa.12627.
  • Medeiros, M. C., and Mendes, E. F. (2016), “l1-regularization of High-Dimensional Time-Series Models with Non-Gaussian and Heteroskedastic Errors,” Journal of Econometrics, 191, 255–271.
  • Nicholson, W. B., Wilms, I., Bien, J., and Matteson, D. S. (2020), “High Dimensional Forecasting via Interpretable Vector Autoregression,” Journal of Machine Learning Research, 21, 1–52.
  • Onatski, A. (2012), “Asymptotics of the Principal Components Estimator of Large Factor Models with Weakly Influential Factors,” Journal of Econometrics, 168, 244–258. DOI: 10.1016/j.jeconom.2012.01.034.
  • Owens, D., Cho, H., and Barigozzi, M. (2023), “fnets: An R Package for Network Estimation and Forecasting via Factor-Adjusted VAR Modelling,” The R Journal (to appear).
  • Priestley, M. (1982), Spectral Analysis and Time Series, London: Academic Press.
  • Stock, J. H., and Watson, M. W. (2002), “Forecasting Using Principal Components from a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. DOI: 10.1198/016214502388618960.
  • Tardivel, P. J., and Bogdan, M. (2022), “On the Sign Recovery by Least Absolute Shrinkage and Selection Operator, Thresholded Least Absolute Shrinkage and Selection Operator, and Thresholded Basis Pursuit Denoising,” Scandinavian Journal of Statistics, 49, 1636–1668. DOI: 10.1111/sjos.12568.
  • Uematsu, Y., and Yamagata, T. (2023), “Discovering the Network Granger Causality in Large Vector Autoregressive Models,” arXiv preprint arXiv:2303.15158.
  • van de Geer, S., Bühlmann, P., and Zhou, S. (2011), “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models,” Electronic Journal of Statistics, 5, 688–749. DOI: 10.1214/11-EJS624.
  • Wang, D., and Tsay, R. S. (2022), “Rate-Optimal Robust Estimation of High-Dimensional Vector Autoregressive Models,” arXiv preprint arXiv:2107.11002.
  • Wong, K. C., Li, Z., and Tewari, A. (2020), “Lasso Guarantees for β-mixing Heavy-Tailed Time Series,” The Annals of Statistics, 48, 1124–1142. DOI: 10.1214/19-AOS1840.
  • Wu, W.-B., and Wu, Y. N. (2016), “Performance Bounds for Parameter Estimates of High-Dimensional Linear Models with Correlated Errors,” Electronic Journal of Statistics, 10, 352–379. DOI: 10.1214/16-EJS1108.
  • Zhang, D., and Wu, W. B. (2021), “Convergence of Covariance and Spectral Density Estimates for High-Dimensional Locally Stationary Processes,” The Annals of Statistics, 49, 233–254. DOI: 10.1214/20-AOS1954.
  • Zhao, P., and Yu, B. (2006), “On Model Selection Consistency of Lasso,” Journal of Machine Learning Research, 7, 2541–2563.