
FNETS: Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series


Abstract

We propose FNETS, a methodology for network estimation and forecasting of high-dimensional time series exhibiting strong serial- and cross-sectional correlations. We operate under a factor-adjusted vector autoregressive (VAR) model which, after accounting for pervasive co-movements of the variables by common factors, models the remaining idiosyncratic dynamic dependence between the variables as a sparse VAR process. Network estimation of FNETS consists of three steps: (i) factor-adjustment via dynamic principal component analysis, (ii) estimation of the latent VAR process via l1-regularized Yule-Walker estimator, and (iii) estimation of partial correlation and long-run partial correlation matrices. In doing so, we learn three networks underpinning the VAR process, namely a directed network representing the Granger causal linkages between the variables, an undirected one embedding their contemporaneous relationships and finally, an undirected network that summarizes both lead-lag and contemporaneous linkages. In addition, FNETS provides a suite of methods for forecasting the factor-driven and the idiosyncratic VAR processes. Under general conditions permitting tails heavier than the Gaussian one, we derive uniform consistency rates for the estimators in both network estimation and forecasting, which hold as the dimension of the panel and the sample size diverge. Simulation studies and real data application confirm the good performance of FNETS.

1 Introduction

Vector autoregressive (VAR) models are popularly adopted for time series analysis in economics and finance. Fitting a VAR model to the data enables inferring dynamic interdependence between the variables as well as forecasting the future. VAR models are particularly appealing for network analysis since estimating the nonzero elements of the VAR parameter matrices, a.k.a. transition matrices, recovers directed edges between the components of vector time series in a Granger causality network. In addition, by estimating a precision matrix (inverse of the covariance matrix) of the VAR innovations, we can also define a network capturing contemporaneous linear dependencies. For the network interpretation of VAR modeling, see for example, Dahlhaus (Citation2000), Eichler (Citation2007), Billio et al. (Citation2012), Ahelegbey, Billio, and Casarin (Citation2016), Barigozzi and Brownlees (Citation2019), Guðmundsson and Brownlees (Citation2021), and Uematsu and Yamagata (Citation2023).

Estimation of VAR models quickly becomes a high-dimensional problem as the number of parameters grows quadratically with the dimensionality. There is a mature literature on estimation of high-dimensional VAR models under the sparsity (Hsu, Hung, and Chang Citation2008; Basu and Michailidis Citation2015; Han, Lu, and Liu Citation2015; Kock and Callot Citation2015; Nicholson et al. Citation2020; Krampe and Paparoditis Citation2021; Masini, Medeiros, and Mendes Citation2022; Adamek, Smeekes, and Wilms Citation2023) and low-rank plus sparsity (Basu, Li, and Michailidis Citation2019) assumptions; see also Bańbura, Giannone, and Reichlin (2010) for a Bayesian approach. In all of the above, either explicitly or implicitly, the spectral density of the time series is required to have eigenvalues that are uniformly bounded over frequencies. Indeed, this condition is crucial for controlling the deviation bounds involved in the theoretical investigation of regularized estimators.

Lin and Michailidis (Citation2020) observe that for VAR processes, this assumption restricts the parameters to be either dense but small in their magnitude (which makes their estimation using shrinkage-based methods challenging) or highly sparse, while Giannone, Lenza, and Primiceri (Citation2021) note the difficulty of identifying sparse predictive representations in many economic applications. Moreover, some datasets typically exhibit strong serial and cross-sectional correlations and violate the bounded spectrum assumption. The left panel of Figure 1 illustrates this phenomenon: as the dimensionality increases, a volatility panel dataset (see Section 5.3 for its description) exhibits a linear increase in the leading eigenvalue of the estimate of its spectral density matrix at frequency 0 (i.e., the long-run covariance). The right panel visualizes the outcome of fitting a VAR(5) model to the same dataset without making any adjustment for the strong correlations (see the caption for further details), from which we cannot infer a meaningful, sparse pairwise relationship.

Fig. 1 Left: The two largest eigenvalues (y-axis) of the long-run covariance matrix estimated from the volatility panel analyzed in Section 5.3 (March 2008 to March 2009, n = 252) with subsets of cross-sections randomly sampled 100 times for each given dimension p ∈ {5, …, 46} (x-axis). Right: logged and truncated p-values (truncation level chosen by Bonferroni correction with the significance level 0.1) from fitting a VAR(5) model to the same dataset using ridge regression and generating p-values corresponding to each coefficient as described in Cule, Vineis, and De Iorio (Citation2011). For each pair of variables (corresponding tickers given in x- and y-axes), the minimum p-value over the five lags is reported.

In this article, we propose to model high-dimensional time series by means of a factor-adjusted VAR approach, which simultaneously accounts for strong serial and cross-sectional correlations attributed to factors, as well as sparse, idiosyncratic correlations among the variables that remain after factor adjustment. We take the most general approach to factor modeling based on the generalized dynamic factor model, where factors are dynamic in the sense that they are allowed to have not only contemporaneous but also lagged effects on the variables (Forni et al. Citation2000). We propose FNETS, a suite of tools accompanying the model for estimation and forecasting with a particular focus on network analysis, which addresses the challenges arising from the latency of the VAR process as well as high dimensionality.

We make the following methodological and theoretical contributions.

  1. We propose an $\ell_1$-regularized Yule-Walker estimation method for estimating the factor-adjusted, idiosyncratic VAR, while permitting the number of nonzero parameters to slowly grow with the dimensionality. Estimating the VAR parameters and the inverse of the innovation covariance, and then combining them, allows us to define three networks underlying the latent VAR process, namely a directed network representing Granger causal linkages, an undirected one underpinning their contemporaneous relationships, as well as an undirected network summarizing both. Under general conditions permitting weak factors and heavier tails than the sub-Gaussian one, we show the consistency of FNETS in estimating the edge sets of these networks, which holds uniformly over all $p^2$ entries of the networks (Propositions 3.3 and 3.5).

  2. We provide new consistency rates for the estimation and forecasting approaches considered by Forni et al. (2005, Citation2017), which hold uniformly for the entire cross-sections of p-dimensional time series (Propositions 4.1 and B.2). In doing so, we establish uniform consistency of the estimators of high-dimensional spectral density matrices of the factor-driven and the idiosyncratic components, extending the results of Zhang and Wu (Citation2021) to the presence of latent factors.

Our approach differs from the existing ones for factor-adjusted regression problems (Fan, Ke, and Wang Citation2020; Fan, Masini, and Medeiros Citation2021; Fan, Lou, and Yu Citation2023; Krampe and Margaritella Citation2021), as (i) it allows for the presence of dynamic factors, thus including all possible dynamic linear co-dependencies, and (ii) it relies only on the estimators of the autocovariances of the latent idiosyncratic process, and avoids estimating the entire latent process and controlling the errors arising from such a step, which increase with the sample size. The price to pay for the generality of the factor modeling in (i) is an extra term appearing in the rate of consistency, which represents the bandwidth for spectral density estimation required for factor-adjustment in the frequency domain. We make explicit the role played by this bandwidth in the theoretical results, and also present the results under a more restricted static factor model for ease of comparison. We mention two more differences between this article and Fan, Masini, and Medeiros (Citation2021) and Fan, Lou, and Yu (Citation2023). First, they additionally consider the problem of testing hypotheses on the idiosyncratic covariance and the adequacy of factor/sparse regression, while we focus on network estimation. Second, their methods accommodate models for the idiosyncratic component other than VAR.

FNETS is another take on the popular low-rank plus sparsity modeling framework in the high-dimensional learning literature. Also, it is in line with a frequently adopted practice in financial time series analysis, where factor-driven common components, representing the systematic sources of risk, are removed prior to inferring a network structure via (sparse) regression modeling and identifying the most central nodes representing the systemic sources of risk (Diebold and Yilmaz 2014; Barigozzi and Brownlees Citation2019). We provide a rigorous theoretical treatment of this empirical approach by accounting for the effect of the factor-adjustment step on the second-step regression.

The rest of the article is organized as follows. Section 2 introduces the factor-adjusted VAR model. Sections 3 and 4 describe the network estimation and forecasting methodologies comprising FNETS, respectively, and provide their theoretical consistency. In Section 5, we demonstrate the good estimation and forecasting performance of FNETS on a panel of volatility measures. Section 6 concludes the article, and all the proofs and complete simulation results are presented in Supplementary Appendix. The R software fnets implementing FNETS is available from CRAN (Barigozzi, Cho, and Owens Citation2023).

Notations

By $I$, $O$, and $\mathbf{0}$, we denote an identity matrix, a matrix of zeros, and a vector of zeros whose dimensions depend on the context. For a matrix $A = [a_{ij},\, 1 \le i \le m,\, 1 \le j \le n]$, we denote by $A^\top$ its transpose. The element-wise $\ell_\infty$-, $\ell_0$-, $\ell_1$- and $\ell_2$-norms are denoted by $|A|_\infty = \max_{1 \le i \le m} \max_{1 \le j \le n} |a_{ij}|$, $|A|_0 = \sum_{i=1}^m \sum_{j=1}^n \mathbb{I}_{\{a_{ij} \ne 0\}}$, $|A|_1 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|$ and $|A|_2 = (\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2)^{1/2}$. The Frobenius, spectral, induced $L_1$- and $L_\infty$-norms are denoted by $\|A\|_F = |A|_2$, $\|A\| = \sqrt{\Lambda_{\max}(A^\top A)}$ (with $\Lambda_{\max}(A)$ and $\Lambda_{\min}(A)$ denoting the largest and smallest eigenvalues of $A$ in modulus), $\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|$ and $\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|$. Let $A_{i\cdot}$ and $A_{\cdot k}$ denote the $i$th row and the $k$th column of $A$. For two real numbers, set $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. Given two sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n = O(b_n)$ if, for some finite constant $C > 0$, there exists $N \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$ such that $|a_n| |b_n|^{-1} \le C$ for all $n \ge N$; we denote by $O_P$ the stochastic boundedness. We write $a_n \asymp b_n$ when $a_n = O(b_n)$ and $b_n = O(a_n)$. Throughout, $L$ denotes the lag operator and $\iota = \sqrt{-1}$. Finally, $\mathbb{I}_A = 1$ if the event $A$ takes place and 0 otherwise.
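As a quick sanity check on these definitions, the snippet below (our own illustration, using numpy; none of it comes from the article) computes each norm for a small matrix and confirms, for instance, that the induced $L_1$- and $L_\infty$-norms are the maximum absolute column and row sums, respectively.

```python
import numpy as np

# Illustration (ours) of the matrix norms defined above, for an arbitrary 2x2 matrix.
A = np.array([[1.0, -2.0],
              [0.0,  3.0]])

elt_max  = np.max(np.abs(A))                             # |A|_inf: element-wise maximum
elt_l0   = np.count_nonzero(A)                           # |A|_0: number of nonzero entries
elt_l1   = np.sum(np.abs(A))                             # |A|_1: sum of absolute entries
elt_l2   = np.sqrt(np.sum(A ** 2))                       # |A|_2 = ||A||_F (Frobenius norm)
spectral = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # ||A||: spectral norm
ind_L1   = np.max(np.sum(np.abs(A), axis=0))             # ||A||_1: max column sum
ind_Linf = np.max(np.sum(np.abs(A), axis=1))             # ||A||_inf: max row sum

print(elt_max, elt_l0, elt_l1, ind_L1, ind_Linf)         # 3.0 3 6.0 5.0 3.0
```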

2 Factor-Adjusted Vector Autoregressive Model

Consider a zero-mean, second-order stationary $p$-variate process $X_t = (X_{1t}, \ldots, X_{pt})^\top$, $1 \le t \le n$, which is decomposed into the sum of two latent components: a factor-driven, common component $\chi_t = (\chi_{1t}, \ldots, \chi_{pt})^\top$, and an idiosyncratic component $\xi_t = (\xi_{1t}, \ldots, \xi_{pt})^\top$ modeled as a VAR process. That is, $X_t = \chi_t + \xi_t$ where
(1) $\chi_t = B(L) u_t = \sum_{l=0}^{\infty} B_l u_{t-l}$ with $u_t = (u_{1t}, \ldots, u_{qt})^\top$, and
(2) $A(L)\xi_t = \xi_t - \sum_{l=1}^{d} A_l \xi_{t-l} = \Gamma^{1/2}\varepsilon_t$ with $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{pt})^\top$.

In (1), the latent random vector $u_t$, referred to as the vector of common factors or common shocks, is assumed to satisfy $\mathrm{E}(u_t) = \mathbf{0}$ and $\mathrm{cov}(u_t) = I_q$, and is loaded on each $\chi_{it}$ via square summable, one-sided filters $B_{ij}(L) = \sum_{l=0}^{\infty} B_{l,ij} L^l$, where $B_l = [B_{l,ij},\, 1 \le i \le p,\, 1 \le j \le q] \in \mathbb{R}^{p \times q}$. This defines the generalized dynamic factor model (GDFM) proposed by Forni et al. (Citation2000) and Forni and Lippi (Citation2001), which provides the most general approach to high-dimensional time series factor modeling.

In (2), the idiosyncratic component $\xi_t$ is modeled as a VAR($d$) process for some finite positive integer $d$, with innovations $\Gamma^{1/2}\varepsilon_t$, where $\Gamma \in \mathbb{R}^{p \times p}$ is some positive definite matrix, $\Gamma^{1/2}$ is its symmetric square root matrix, and $\mathrm{E}(\varepsilon_t) = \mathbf{0}$ and $\mathrm{cov}(\varepsilon_t) = I_p$. We assume that $\xi_t$ is causal (see Assumption 2.3 (i)), that is, it admits the Wold representation
(3) $\xi_t = D(L)\Gamma^{1/2}\varepsilon_t = \sum_{l=0}^{\infty} D_l \Gamma^{1/2}\varepsilon_{t-l}$ with $D(L) = A^{-1}(L)$,
such that $\Gamma^{1/2}\varepsilon_t$ is seen as a vector of idiosyncratic shocks loaded on each $\xi_{it}$ via square summable, one-sided filters $D_{ik}(L) = \sum_{l=0}^{\infty} D_{l,ik} L^l$, where $D_l = [D_{l,ik},\, 1 \le i, k \le p]$. After accounting for the dominant cross-sectional dependence in the data (both contemporaneous and lagged) by factors, it is reasonable to assume that the dependencies left in $\xi_t$ are weak and, therefore, that the VAR structure is sufficiently sparse. Discussion of the precise requirement on the sparsity of $A_l$, $1 \le l \le d$, and $\Gamma^{-1}$ is deferred to Section 3.
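To make the data-generating process concrete, the following toy simulation (ours; all dimensions and parameter values are illustrative and not taken from the article) draws a panel from (1)–(2) with the factor filter truncated at lag 1 and a sparse, diagonal VAR(1) idiosyncratic component:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 10, 2

# Common component (1), with the filter B(L) truncated at lag 1:
# chi_t = B_0 u_t + B_1 u_{t-1}, where cov(u_t) = I_q.
B0 = rng.normal(size=(p, q))
B1 = 0.5 * rng.normal(size=(p, q))
u = rng.normal(size=(n + 1, q))
chi = u[1:] @ B0.T + u[:-1] @ B1.T

# Idiosyncratic component (2): sparse VAR(1), xi_t = A_1 xi_{t-1} + e_t,
# here with a diagonal (hence stable) A_1 and Gamma = I_p for simplicity.
A1 = 0.4 * np.eye(p)
xi = np.zeros((n + 1, p))
for t in range(1, n + 1):
    xi[t] = A1 @ xi[t - 1] + rng.normal(size=p)
xi = xi[1:]

X = chi + xi   # observed n x p panel
```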

Remark 2.1.

A special case of the GDFM is the popularly adopted static factor model, where the factors are loaded only contemporaneously (see e.g., Stock and Watson Citation2002; Bai Citation2003; Fan, Liao, and Mincheva Citation2013). This is formalized in Assumption 4.1, where we consider forecasting under a static representation. A sufficient condition to obtain a static representation from the GDFM in (1) is to assume $B(L) = \sum_{l=0}^{s} B_l L^l$ for some finite integer $s \ge 0$. For example, if $s = 0$, the model reduces to $\chi_t = B_0 u_t$, while if $s > 0$, it can be written as $\chi_t = \Lambda F_t$ with $\Lambda = [B_l,\, 0 \le l \le s]$ and $F_t = (u_t^\top, \ldots, u_{t-s}^\top)^\top$. Under the static factor model, $X_t$ admits a factor-augmented VAR representation (see Remark 4.1).

In the remainder of this section, we list the assumptions required for identification and estimation of (1)–(2). Since $\chi_t$ and $\xi_t$ are latent, some assumptions are required to ensure their (asymptotic) identifiability, and these are made in the frequency domain. Denote by $\Sigma_x(\omega)$ the spectral density matrix of $X_t$ at frequency $\omega \in [-\pi, \pi]$, and by $\mu_{x,j}(\omega)$ its dynamic eigenvalues, which are real-valued and ordered in decreasing order. We similarly define $\Sigma_\chi(\omega)$, $\mu_{\chi,j}(\omega)$, $\Sigma_\xi(\omega)$ and $\mu_{\xi,j}(\omega)$.

Assumption 2.1.

There exist a positive integer $p_0 \ge 1$, constants $\rho_j \in (3/4, 1]$ with $\rho_1 \ge \cdots \ge \rho_q$, and pairs of continuous functions $\omega \mapsto \alpha_{\chi,j}(\omega)$ and $\omega \mapsto \beta_{\chi,j}(\omega)$ for $\omega \in [-\pi, \pi]$ and $1 \le j \le q$, such that for all $p \ge p_0$,
$$\beta_{\chi,1}(\omega) \ge p^{-\rho_1}\mu_{\chi,1}(\omega) \ge \alpha_{\chi,1}(\omega) > \cdots > \beta_{\chi,q}(\omega) \ge p^{-\rho_q}\mu_{\chi,q}(\omega) \ge \alpha_{\chi,q}(\omega) > 0.$$

Under the assumption, if $\rho_j = 1$ for all $1 \le j \le q$, then we are in the presence of $q$ factors that are equally pervasive for the whole cross-section. The left panel of Figure 1 depicts the case when $\rho_1 = 1$. If $\rho_j < 1$ for some $j$, we permit the presence of "weak" factors, and our theoretical analysis explicitly reflects this; see, for example, Onatski (Citation2012) and Freyaldenhoven (Citation2021) for static factor models permitting weak factors. When weak factors are present, the ordering of the variables becomes important as $p \to \infty$, whereas the case of linearly diverging factor strengths is compatible with completely arbitrary cross-sectional ordering. The requirement that $\rho_j > 3/4$ is a minimal one, and generally larger values of $\rho_j$ are required as the dimensionality increases and heavier tails are permitted, as discussed later.

Assumptions 2.2 and 2.3 are made to control the serial dependence in Xt.

Assumption 2.2.

There exist some constants $\Xi > 0$ and $\varsigma > 2$ such that for all $l \ge 0$,
$$\max_{1 \le i \le p} |B_{l,i\cdot}|_2 \le \Xi(1+l)^{-\varsigma} \quad\text{and}\quad \Big(\sum_{j=1}^{q} |B_{l,\cdot j}|_2^2\Big)^{1/2} \le \Xi(1+l)^{-\varsigma}.$$

Assumption 2.3.

  1. $d$ is a finite positive integer and $\det(A(z)) \ne 0$ for all $|z| \le 1$.

  2. There exist some constants $0 < m_\varepsilon \le M_\varepsilon$ such that $\|\Gamma\| \le M_\varepsilon$ and $\Lambda_{\min}(\Gamma) \ge m_\varepsilon$.

  3. There exists a constant $m_\xi > 0$ such that $\inf_{\omega \in [-\pi, \pi]} \mu_{\xi,p}(\omega) \ge m_\xi$.

  4. There exist some constants $\Xi > 0$ and $\varsigma > 2$ such that $|D_{l,ik}| \le C_{ik}(1+l)^{-\varsigma}$ for all $l \ge 0$, with $\max\big\{\max_{1 \le k \le p} \sum_{i=1}^{p} C_{ik},\ \max_{1 \le i \le p} \sum_{k=1}^{p} C_{ik},\ \max_{1 \le i \le p} \sum_{k=1}^{p} C_{ik}^2\big\} \le \Xi$.

Assumption 2.3 (i) and (ii) are standard in the literature (Lütkepohl Citation2005) and imply that ξt is causal and has finite and nonzero covariance. Under Assumptions 2.2 and 2.3 (iv) (imposed on the Wold decomposition of ξt in (3)), the serial dependence in Xt decays at an algebraic rate. Further, we obtain a uniform bound for μξ,j(ω) under Assumption 2.3 (iv):

Proposition 2.1.

Under Assumption 2.3, there exists a constant $B_\xi > 0$, depending only on $M_\varepsilon$, $\Xi$ and $\varsigma$ defined in Assumption 2.3 (ii) and (iv), such that $\sup_{\omega \in [-\pi, \pi]} \mu_{\xi,1}(\omega) \le B_\xi$.

Remark 2.2.

Proposition 2.1 and Assumption 2.3 (iii) jointly establish the uniform boundedness of $\mu_{\xi,1}(\omega)$ and $\mu_{\xi,p}(\omega)$, which is commonly assumed in the literature on high-dimensional VAR estimation via $\ell_1$-regularization. A sufficient condition for Assumption 2.3 (iii) is that $\max\big\{\max_{1 \le i \le p} \sum_{l=1}^{d} |A_{l,i\cdot}|_1,\ \max_{1 \le j \le p} \sum_{l=1}^{d} |A_{l,\cdot j}|_1\big\} \le \Xi$ for some constant $\Xi > 0$ (Basu and Michailidis Citation2015). Further, when, for example, $d = 1$, Assumption 2.3 (iv) follows if $\|A_1\|_1 \vee \|A_1\|_\infty \le \gamma < 1$, since $\max(\|D_l\|_1, \|D_l\|_\infty) \le \Xi\gamma^l$ with $D_l = A_1^l$.
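The geometric decay in the last sentence can be checked numerically; the snippet below (our illustration, with an arbitrary choice of a stable $A_1$) verifies $\max(\|A_1^l\|_1, \|A_1^l\|_\infty) \le \gamma^l$ for $l = 1, \ldots, 10$, which holds by submultiplicativity of induced norms.

```python
import numpy as np

# Numerical check of Remark 2.2 (d = 1): if the induced L1 and L-infinity
# norms of A_1 are at most gamma < 1, the Wold coefficients D_l = A_1^l
# satisfy max(||D_l||_1, ||D_l||_inf) <= gamma^l.  A_1 is an illustrative choice.
A1 = np.array([[0.5, 0.2, 0.0],
               [0.0, 0.4, 0.1],
               [0.1, 0.0, 0.3]])
gamma = max(np.abs(A1).sum(axis=0).max(),   # ||A_1||_1
            np.abs(A1).sum(axis=1).max())   # ||A_1||_inf
assert gamma < 1

Dl = np.eye(3)
for l in range(1, 11):
    Dl = Dl @ A1                                 # D_l = A_1^l
    norm_1   = np.abs(Dl).sum(axis=0).max()      # ||D_l||_1
    norm_inf = np.abs(Dl).sum(axis=1).max()      # ||D_l||_inf
    assert max(norm_1, norm_inf) <= gamma ** l + 1e-12
```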

The two latent components $\chi_t$ and $\xi_t$, and the number of factors $q$, are identified thanks to the large gap between the eigenvalues of their spectral density matrices, which follows from Assumption 2.1 and Proposition 2.1. Then, by Weyl's inequality, the $q$th dynamic eigenvalue $\mu_{x,q}(\omega)$ diverges almost everywhere in $[-\pi, \pi]$ as $p \to \infty$, whereas $\mu_{x,q+1}(\omega)$ is uniformly bounded for any $p \in \mathbb{N}$ and $\omega$. This property is exploited in the FNETS methodology as later described in Section 3.2. It is worth stressing that Assumption 2.1 and Proposition 2.1 jointly constitute both a necessary and sufficient condition for the process $X_t$ to admit the dynamic factor representation in (1); see Forni and Lippi (Citation2001).

Finally, we characterize the common and idiosyncratic innovations.

Assumption 2.4.

  1. $\{u_t\}_{t \in \mathbb{Z}}$ is a sequence of zero-mean, $q$-dimensional martingale difference vectors with $\mathrm{cov}(u_t) = I_q$, and $u_{it}$ and $u_{jt}$ are independent for all $1 \le i, j \le q$ with $i \ne j$ and all $t \in \mathbb{Z}$.

  2. $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ is a sequence of zero-mean, $p$-dimensional martingale difference vectors with $\mathrm{cov}(\varepsilon_t) = I_p$, and $\varepsilon_{it}$ and $\varepsilon_{jt}$ are independent for all $1 \le i, j \le p$ with $i \ne j$ and all $t \in \mathbb{Z}$.

  3. $\mathrm{E}(u_{jt}\varepsilon_{it'}) = 0$ for all $1 \le j \le q$, $1 \le i \le p$ and $t, t' \in \mathbb{Z}$.

  4. There exist some constants $\nu > 4$ and $\mu_\nu > 0$ such that $\max\big\{\max_{1 \le j \le q} \mathrm{E}(|u_{jt}|^\nu),\ \max_{1 \le i \le p} \mathrm{E}(|\varepsilon_{it}|^\nu)\big\} \le \mu_\nu$.

Assumption 2.4 (i) and (ii) allow the common and idiosyncratic innovations to be sequences of martingale differences, relaxing the assumption of serial independence found in Forni et al. (Citation2017). Condition (iii) is standard in the factor modeling literature. Under (iv), we require that the innovations have ν>4 moments, which is considerably weaker than Gaussian or sub-Weibull tails assumed in the literature on VAR modeling of high-dimensional time series (Basu and Michailidis Citation2015; Kock and Callot Citation2015; Wong, Li, and Tewari Citation2020; Krampe and Paparoditis Citation2021; Masini, Medeiros, and Mendes Citation2022). Similar approaches to ours, based only on moment conditions, are found in Wu and Wu (Citation2016) who investigate the Lasso performance in deterministic designs under functional dependence, and Adamek, Smeekes, and Wilms (Citation2023) who assume instead near-epoch-dependence. In Appendix F, we separately consider the case when ut and εt are Gaussian for the sake of comparison.

3 Network Estimation via FNETS

3.1 Networks Underpinning Factor-Adjusted VAR Processes

Under the latent VAR model in (2), we can define three types of networks underpinning the interconnectedness of Xt after factor adjustment (Barigozzi and Brownlees Citation2019).

Let $\mathcal{V} = \{1, \ldots, p\}$ denote the set of vertices representing the $p$ time series. First, the transition matrices $A_l = [A_{l,ii'},\, 1 \le i, i' \le p]$ encode the directed network $\mathcal{N}^{\mathrm{G}} = (\mathcal{V}, \mathcal{E}^{\mathrm{G}})$ representing Granger causal linkages, with
(4) $\mathcal{E}^{\mathrm{G}} = \{(i, i') \in \mathcal{V} \times \mathcal{V}:\ A_{l,ii'} \ne 0 \text{ for some } 1 \le l \le d\}$
as the set of edges. Here, the presence of an edge $(i, i') \in \mathcal{E}^{\mathrm{G}}$ indicates that $\xi_{i',t-l}$ Granger causes $\xi_{it}$ at some lag $1 \le l \le d$.

The second network contains undirected edges representing contemporaneous dependence between the VAR innovations $\Gamma^{1/2}\varepsilon_t$, and is denoted by $\mathcal{N}^{\mathrm{C}} = (\mathcal{V}, \mathcal{E}^{\mathrm{C}})$; we have $(i, i') \in \mathcal{E}^{\mathrm{C}}$ iff the partial correlation between the $i$th and $i'$th elements of $\Gamma^{1/2}\varepsilon_t$ is nonzero. Specifically, letting $\Gamma^{-1} = \Delta = [\delta_{ii'},\, 1 \le i, i' \le p]$, the set of edges is given by
(5) $\mathcal{E}^{\mathrm{C}} = \big\{(i, i') \in \mathcal{V} \times \mathcal{V}:\ i \ne i' \text{ and } \delta_{ii'}/\sqrt{\delta_{ii}\,\delta_{i'i'}} \ne 0\big\}$.

Finally, we summarize the aforementioned lead-lag and contemporaneous relations between the variables in a single, undirected network $\mathcal{N}^{\mathrm{L}} = (\mathcal{V}, \mathcal{E}^{\mathrm{L}})$ by means of the long-run partial correlations of $\xi_t$. Let $\Omega = [\omega_{ii'},\, 1 \le i, i' \le p]$ denote the long-run partial covariance matrix of $\xi_t$, that is, $\Omega = (\Sigma_\xi(0))^{-1} = 2\pi A^\top(1)\Delta A(1)$ under (2). Then, the set of edges of $\mathcal{N}^{\mathrm{L}}$ is
(6) $\mathcal{E}^{\mathrm{L}} = \big\{(i, i') \in \mathcal{V} \times \mathcal{V}:\ i \ne i' \text{ and } \omega_{ii'}/\sqrt{\omega_{ii}\,\omega_{i'i'}} \ne 0\big\}$.

Generally, $\mathcal{E}^{\mathrm{L}}$ is larger than $\mathcal{E}^{\mathrm{G}} \cup \mathcal{E}^{\mathrm{C}}$; see Appendix C for a sufficient condition for the absence of an edge $(i, i')$ from $\mathcal{N}^{\mathrm{L}}$. In the remainder of Section 3, we describe the network estimation methodology of FNETS which, consisting of three steps, estimates the three networks while fully accounting for the challenges arising from not directly observing the VAR process $\xi_t$, and investigate its theoretical properties.
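The three edge sets can be read off directly from $A_l$, $\Delta$ and $\Omega$ once these matrices are available; the sketch below (our illustration with toy matrices, using a small numerical tolerance in place of exact zero checks for $\Omega$) does exactly that:

```python
import numpy as np

# Building the three edge sets of Section 3.1 from given (here: toy)
# VAR parameters A_l and innovation precision Delta = Gamma^{-1}.
d, p = 1, 4
A1 = np.zeros((p, p)); A1[0, 1] = 0.5; A1[2, 3] = -0.3
Delta = np.eye(p); Delta[0, 2] = Delta[2, 0] = 0.4

# Granger network N^G: directed edge (i, i') iff A_{l,ii'} != 0 for some l.
E_G = {(i, j) for i in range(p) for j in range(p) if A1[i, j] != 0}

# Contemporaneous network N^C: edge iff the partial correlation
# delta_{ii'} / sqrt(delta_{ii} delta_{i'i'}) is nonzero (i != i').
E_C = {(i, j) for i in range(p) for j in range(p)
       if i != j and Delta[i, j] != 0}

# Long-run network N^L: from Omega = 2 pi A(1)' Delta A(1) with A(1) = I - A_1.
A_of_1 = np.eye(p) - A1
Omega = 2 * np.pi * A_of_1.T @ Delta @ A_of_1
E_L = {(i, j) for i in range(p) for j in range(p)
       if i != j and abs(Omega[i, j]) > 1e-10}

print(sorted(E_G))   # [(0, 1), (2, 3)]
```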

3.2 Step 1: Factor Adjustment via Dynamic PCA

As described in Section 2, under our model (1)–(2), there exists a large gap in μx,j(ω), the dynamic eigenvalues of the spectral density matrix of Xt, between those attributed to the factors (jq) and those which are not (jq+1). With the goal of estimating the autocovariance (ACV) matrix of the latent VAR process ξt, we exploit this gap in the factor-adjustment step based on dynamic principal component analysis (PCA); see Chapter 9 of Brillinger (Citation1981) for the definition of dynamic PCA and Forni et al. (Citation2000) for its use in the estimation of GDFM. Throughout, we treat q as known and refer to Hallin and Liška (Citation2007) for its consistent estimation under (1).

Denote the ACV matrices of $X_t$ by $\Gamma_x(l) = \mathrm{E}(X_{t-l} X_t^\top)$ for $l \ge 0$ and $\Gamma_x(l) = \Gamma_x^\top(-l)$ for $l \le -1$, and analogously define $\Gamma_\chi(l)$ and $\Gamma_\xi(l)$ with $\chi_t$ and $\xi_t$ replacing $X_t$, respectively. Then, $\Sigma_x(\omega)$ and $\Gamma_x(l)$ satisfy $\Sigma_x(\omega) = (2\pi)^{-1}\sum_{l=-\infty}^{\infty} \Gamma_x(l)\exp(-\iota l \omega)$ for all $\omega \in [-\pi, \pi]$. Motivated by this, we estimate $\Sigma_x(\omega)$ by
(7) $$\widehat\Sigma_x(\omega) = \frac{1}{2\pi}\sum_{l=-m}^{m} K\Big(\frac{l}{m}\Big)\,\widehat\Gamma_x(l)\exp(-\iota l \omega),$$
with the sample ACV $\widehat\Gamma_x(l) = n^{-1}\sum_{t=l+1}^{n} X_{t-l} X_t^\top$ when $l \ge 0$, $\widehat\Gamma_x(l) = \widehat\Gamma_x^\top(-l)$ for $l < 0$, and the kernel bandwidth $m = \lfloor n^\beta \rfloor$ for some $\beta \in (0, 1)$. We adopt the Bartlett kernel as $K(\cdot)$, which ensures positive semi-definiteness of $\widehat\Sigma_x(\omega)$ (see Appendix F.2.4). Then, we evaluate $\widehat\Sigma_x(\omega)$ at the $2m+1$ Fourier frequencies $\omega_k$, $-m \le k \le m$ (where $\omega_k = 2\pi k/(2m+1)$ for $0 \le k \le m$, and $\omega_k = -\omega_{|k|}$ for $-m \le k \le -1$), and estimate $\Sigma_\chi(\omega_k)$ by retaining the contribution from the $q$ largest eigenvalues and eigenvectors only. That is, we obtain $\widehat\Sigma_\chi(\omega_k) = \sum_{j=1}^{q} \widehat\mu_{x,j}(\omega_k)\,\widehat{e}_{x,j}(\omega_k)(\widehat{e}_{x,j}(\omega_k))^*$ (with $*$ denoting the transposed complex conjugate), where $\widehat\mu_{x,1}(\omega) \ge \cdots \ge \widehat\mu_{x,q}(\omega)$ denote the $q$ leading eigenvalues of $\widehat\Sigma_x(\omega)$ and $\widehat{e}_{x,j}(\omega)$ the associated (normalized) eigenvectors. From this, an estimator of $\Gamma_\chi(l)$ at a given lag $l \in \mathbb{Z}$ is obtained via the inverse Fourier transform as $\widehat\Gamma_\chi(l) = 2\pi(2m+1)^{-1}\sum_{k=-m}^{m} \widehat\Sigma_\chi(\omega_k)\exp(\iota l \omega_k)$ and finally, we estimate the ACV matrices of $\xi_t$ with $\widehat\Gamma_\xi(l) = \widehat\Gamma_x(l) - \widehat\Gamma_\chi(l)$, by virtue of Assumption 2.4 (iii).
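A minimal numpy sketch of this factor-adjustment step follows (the function name and the default bandwidth exponent are our choices, made for illustration; the article's own implementation is the R package fnets). It computes (7) with the Bartlett kernel on the $2m+1$ Fourier frequencies, truncates the eigendecomposition at $q$ terms, and returns $\widehat\Gamma_\xi(l) = \widehat\Gamma_x(l) - \widehat\Gamma_\chi(l)$:

```python
import numpy as np

def factor_adjust(X, q, lags, beta=1/3):
    """Sketch of Step 1 (dynamic PCA): returns {l: Gamma_hat_xi(l)} for l in `lags`.
    X is n x p; q is the number of dynamic factors (treated as known)."""
    n, p = X.shape
    m = int(n ** beta)                                # kernel bandwidth m = n^beta

    # Sample autocovariances, with Gamma_hat_x(l) = Gamma_hat_x(-l)' for l < 0.
    def acv(l):
        if l >= 0:
            return X[: n - l].T @ X[l:] / n
        return acv(-l).T

    # Smoothed spectral density estimate (7) with the Bartlett kernel,
    # evaluated at the 2m+1 Fourier frequencies omega_k = 2 pi k / (2m+1).
    ks = np.arange(-m, m + 1)
    omegas = 2 * np.pi * ks / (2 * m + 1)
    Sigma_x = np.zeros((len(ks), p, p), dtype=complex)
    for l in range(-m, m + 1):
        w = 1 - abs(l) / m                            # Bartlett kernel K(l/m)
        Sigma_x += w * acv(l)[None] * np.exp(-1j * l * omegas)[:, None, None]
    Sigma_x /= 2 * np.pi

    # Keep the q leading (eigenvalue, eigenvector) pairs at each frequency.
    Sigma_chi = np.zeros_like(Sigma_x)
    for k in range(len(ks)):
        mu, e = np.linalg.eigh(Sigma_x[k])            # ascending eigenvalues
        Sigma_chi[k] = (e[:, -q:] * mu[-q:]) @ e[:, -q:].conj().T

    # Inverse Fourier transform: Gamma_hat_chi(l), then Gamma_hat_xi(l).
    out = {}
    for l in lags:
        G_chi = (2 * np.pi / (2 * m + 1)) * np.sum(
            Sigma_chi * np.exp(1j * l * omegas)[:, None, None], axis=0)
        out[l] = acv(l) - G_chi.real
    return out
```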

3.3 Step 2: Estimation of VAR Parameters and NG

Recalling the VAR($d$) model in (2), let $\beta = [A_1, \ldots, A_d]^\top \in \mathbb{R}^{(pd) \times p}$ denote the matrix collecting all the VAR parameters. When $\xi_t$ is directly observable, $\ell_1$-regularized least squares or maximum likelihood estimators have been proposed for $\beta$; see the references given in the Introduction. In the context of factor-adjusted regression modeling, where the aim is to estimate the regression structure in the latent idiosyncratic process, it has been proposed to apply the $\ell_1$-regularization methods after estimating the entire latent process by, say, $\widehat\xi_t$ (Fan, Ke, and Wang Citation2020; Fan, Masini, and Medeiros Citation2021; Fan, Lou, and Yu Citation2023; Krampe and Margaritella Citation2021). However, such an approach possibly suffers from a lack of statistical efficiency due to having to control the estimation errors in $\widehat\xi_t$ uniformly for all $1 \le t \le n$. Instead, we make use of the Yule-Walker (YW) equation $\beta = G^{-1} g$, where
$$G = \begin{bmatrix} \Gamma_\xi(0) & \Gamma_\xi(-1) & \cdots & \Gamma_\xi(-d+1) \\ \Gamma_\xi(1) & \Gamma_\xi(0) & \cdots & \Gamma_\xi(-d+2) \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_\xi(d-1) & \Gamma_\xi(d-2) & \cdots & \Gamma_\xi(0) \end{bmatrix} \quad\text{and}\quad g = \begin{bmatrix} \Gamma_\xi(1) \\ \Gamma_\xi(2) \\ \vdots \\ \Gamma_\xi(d) \end{bmatrix},$$
with $G$ always invertible since $\Lambda_{\min}(G) \ge 2\pi m_\xi > 0$ by Assumption 2.3 (iii). We propose to estimate $\beta$ by a regularized YW estimator based on $\widehat G$ and $\widehat g$, which are obtained by replacing $\Gamma_\xi(l)$ with the $\widehat\Gamma_\xi(l)$ derived in Step 1 of FNETS via dynamic PCA, in the definitions of $G$ and $g$, respectively.

To handle the high dimensionality, we consider an $\ell_1$-regularized estimator for $\beta$ which solves the following $\ell_1$-penalized $M$-estimation problem
(8) $$\widehat\beta = \arg\min_{M \in \mathbb{R}^{pd \times p}} \operatorname{tr}\big(M^\top \widehat G M - 2 M^\top \widehat g\big) + \lambda |M|_1$$
with a tuning parameter $\lambda > 0$. Note that the matrix $\widehat G$ is guaranteed to be positive semi-definite (see Appendix F.2.4), thus the problem in (8) is convex with a global minimizer. We note the similarity between (8) and the Lasso estimator, but our estimator is specifically tailored to the problem of estimating the parameters of the latent VAR process $\xi_t$ by means of second-order moments only, and thus differs fundamentally from the Lasso-type estimators proposed for high-dimensional VAR estimation. In Appendix A, we propose an alternative estimator based on a constrained $\ell_1$-minimization approach closely related to the Dantzig selector (Candès and Tao Citation2007).

Once the VAR parameters are estimated, we propose to estimate the edge set of $\mathcal{N}^{\mathrm{G}}$ in (4) by the set of indices of the nonzero elements of a thresholded version of $\widehat\beta$, denoted by $\widehat\beta(t) = [\widehat\beta_{ij} \cdot \mathbb{I}_{\{|\widehat\beta_{ij}| > t\}}]$, with some threshold $t > 0$.
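The convex problem (8) can be solved, for instance, by proximal gradient descent (ISTA), since the smooth part has gradient $2(\widehat G M - \widehat g)$ and the $\ell_1$ penalty has a closed-form proximal map (soft-thresholding). The sketch below is our own illustrative solver, not the one used in the fnets package, followed by the thresholding step that estimates $\mathcal{E}^{\mathrm{G}}$:

```python
import numpy as np

def soft_threshold(x, a):
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def lasso_yw(G_hat, g_hat, lam, n_iter=500):
    """Illustrative ISTA solver (ours) for the l1-penalised Yule-Walker
    problem (8).  G_hat: (pd x pd), g_hat: (pd x p); returns beta_hat (pd x p)."""
    G_hat = (G_hat + G_hat.T) / 2                      # enforce symmetry numerically
    step = 1.0 / (2 * np.linalg.norm(G_hat, 2) + 1e-12)  # 1 / Lipschitz constant
    M = np.zeros_like(g_hat)
    for _ in range(n_iter):
        grad = 2 * (G_hat @ M - g_hat)                 # gradient of tr(M'GM - 2M'g)
        M = soft_threshold(M - step * grad, step * lam)
    return M

def threshold(beta_hat, t):
    """Thresholded estimator beta_hat(t), whose support estimates E^G."""
    return beta_hat * (np.abs(beta_hat) > t)
```

With $\widehat G = I$ the objective decouples entrywise and the minimizer is $\mathrm{soft}(\widehat g, \lambda/2)$, which provides a simple correctness check for the solver.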

3.4 Step 3: Estimation of NC and NL

Recall that the edge sets of $\mathcal{N}^{\mathrm{C}}$ and $\mathcal{N}^{\mathrm{L}}$ defined in (5)–(6) are given by the supports of $\Delta$ and $\Omega$. Given $\widehat\beta$ in (8), which estimates $\beta$, a natural estimator of $\Gamma$ arises from the YW equation $\Gamma = \Gamma_\xi(0) - \sum_{l=1}^{d} A_l \Gamma_\xi(l) = \Gamma_\xi(0) - \beta^\top g$, as $\widehat\Gamma = \widehat\Gamma_\xi(0) - \widehat\beta^\top \widehat g$. Then, we propose to estimate $\Delta = \Gamma^{-1}$ via constrained $\ell_1$-minimization as
(9) $$\check\Delta = \arg\min_{M \in \mathbb{R}^{p \times p}} |M|_1 \quad\text{subject to}\quad |\widehat\Gamma M - I|_\infty \le \eta,$$
where $\eta > 0$ is a tuning parameter. This approach was originally proposed for estimating the precision matrix of independent data (Cai, Liu, and Luo Citation2011), which we extend to the time series setting. Since $\check\Delta = [\check\delta_{ii'},\, 1 \le i, i' \le p]$ is not guaranteed to be symmetric, a symmetrization step is performed to obtain $\widehat\Delta = [\widehat\delta_{ii'},\, 1 \le i, i' \le p]$ with $\widehat\delta_{ii'} = \check\delta_{ii'} \cdot \mathbb{I}_{\{|\check\delta_{ii'}| \le |\check\delta_{i'i}|\}} + \check\delta_{i'i} \cdot \mathbb{I}_{\{|\check\delta_{i'i}| < |\check\delta_{ii'}|\}}$. Then, the edge set of $\mathcal{N}^{\mathrm{C}}$ in (5) is estimated by the support of the thresholded estimator $\widehat\Delta(t_\delta) = [\widehat\delta_{ii'} \cdot \mathbb{I}_{\{|\widehat\delta_{ii'}| > t_\delta\}},\, 1 \le i, i' \le p]$ with some threshold $t_\delta > 0$.
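Problem (9) decouples across the columns of $M$, and each column is a linear program; a sketch (ours, using `scipy.optimize.linprog` and the standard split $m = m^+ - m^-$) of the estimator together with the symmetrization step:

```python
import numpy as np
from scipy.optimize import linprog

def clime(Gamma_hat, eta):
    """Illustrative CLIME-type solver (ours) for (9): column-by-column LPs,
    followed by the symmetrisation step keeping the smaller-magnitude entry."""
    p = Gamma_hat.shape[0]
    Delta_check = np.zeros((p, p))
    c = np.ones(2 * p)                    # minimise |m|_1 = sum(m_plus + m_minus)
    # Constraints |Gamma_hat (m+ - m-) - e_i|_inf <= eta, written as A_ub x <= b_ub.
    A_ub = np.vstack([np.hstack([Gamma_hat, -Gamma_hat]),
                      np.hstack([-Gamma_hat, Gamma_hat])])
    for i in range(p):
        e = np.zeros(p); e[i] = 1.0
        b_ub = np.concatenate([e + eta, eta - e])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        Delta_check[:, i] = res.x[:p] - res.x[p:]
    # Symmetrise: for each (i, i'), keep the entry with the smaller magnitude.
    D, Dt = Delta_check, Delta_check.T
    return np.where(np.abs(D) <= np.abs(Dt), D, Dt)
```

For a diagonal $\widehat\Gamma$ the LPs have closed-form solutions (each diagonal entry is shrunk toward zero by $\eta$), which gives a simple check of the implementation.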

Finally, we estimate $\Omega = 2\pi A^\top(1) \Delta A(1)$ by replacing $A(1)$ and $\Delta$ with their estimators. We adopt the thresholded estimator $\widehat\beta(t) = [\widehat A_1(t), \ldots, \widehat A_d(t)]^\top$ to obtain $\widehat A(1) = I - \sum_{l=1}^{d} \widehat A_l(t)$, and set $\widehat\Omega = 2\pi \widehat A^\top(1) \widehat\Delta \widehat A(1)$. Analogously, the edge set of $\mathcal{N}^{\mathrm{L}}$ in (6) is obtained by thresholding $\widehat\Omega = [\widehat\omega_{ii'},\, 1 \le i, i' \le p]$ with some threshold $t_\omega > 0$, as the support of $\widehat\Omega(t_\omega) = [\widehat\omega_{ii'} \cdot \mathbb{I}_{\{|\widehat\omega_{ii'}| > t_\omega\}},\, 1 \le i, i' \le p]$.
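The final stage of Step 3 is then a matter of matrix algebra; a short sketch (ours; the inputs stand for the thresholded $\widehat A_l(t)$ and $\widehat\Delta$ produced by the previous steps):

```python
import numpy as np

def longrun_partial(A_thr_list, Delta_hat, t_omega):
    """Compute Omega_hat = 2 pi A_hat(1)' Delta_hat A_hat(1) from thresholded
    VAR estimates, then threshold Omega_hat to estimate the edge set of N^L."""
    p = Delta_hat.shape[0]
    A1 = np.eye(p) - sum(A_thr_list)          # A_hat(1) = I - sum_l A_hat_l(t)
    Omega_hat = 2 * np.pi * A1.T @ Delta_hat @ A1
    edges = {(i, j) for i in range(p) for j in range(p)
             if i != j and abs(Omega_hat[i, j]) > t_omega}
    return Omega_hat, edges
```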

3.5 Theoretical Properties

We prove the consistency of FNETS in network estimation by establishing the theoretical properties of each of its three steps in Sections 3.5.1–3.5.3. Then, in Section 3.5.4, we present the results for a special case where $\chi_t$ admits a static representation, $n \asymp p$ and $\mathrm{E}(|X_{it}|^\nu) < \infty$ for some $\nu > 8$, for ease of comparing our results to the existing ones.

Hereafter, we define
(10) $$\psi_n = \frac{m}{n^{1-2/\nu}} \vee \sqrt{\frac{m\log(m)}{n}} \quad\text{and}\quad \vartheta_{n,p} = \frac{m(np)^{2/\nu}\log^{7/2}(p)}{n} \vee \sqrt{\frac{m\log(mp)}{n}},$$
where the dependence of these quantities on $\nu$ is omitted for simplicity.

3.5.1 Factor Adjustment via Dynamic PCA

We first establish the consistency of the dynamic PCA-based estimator of Γχ(l).

Theorem 3.1.

Suppose that Assumptions 2.1, 2.2, 2.3, and 2.4 are met. Then, for any finite positive integer $s \ge d$, as $n, p \to \infty$,
$$\max_{l:\,|l| \le s} \frac{1}{p}\big\|\widehat\Gamma_\chi(l) - \Gamma_\chi(l)\big\|_F = O_P\Big(q p^{2(1-\rho_q)}\Big(\psi_n \vee \frac{1}{m} \vee \frac{1}{\sqrt{p}}\Big)\Big),$$
$$\max_{l:\,|l| \le s} \big|\widehat\Gamma_\chi(l) - \Gamma_\chi(l)\big|_\infty = O_P\Big(q p^{2(1-\rho_q)}\Big(\vartheta_{n,p} \vee \frac{1}{m} \vee \frac{1}{\sqrt{p}}\Big)\Big).$$

Remark 3.1.

Theorem 3.1 is complemented by Proposition F.15 in the Appendix, which establishes the consistency of the spectral density matrix estimator $\widehat\Sigma_\chi(\omega)$ uniformly over $\omega \in [-\pi, \pi]$ in both Frobenius and $\ell_\infty$-norms.

  1. Both $\psi_n$ and $\vartheta_{n,p}$ in (10) increase with the bandwidth $m$. It is possible to find $m$ that minimizes, for example, $\vartheta_{n,p} \vee m^{-1}$ which, roughly speaking, represents the bias-variance tradeoff in the estimation of the spectral density matrix $\Sigma_x(\omega)$. For example, in light-tailed settings with large enough $\nu$, the choice $m \asymp (n\log^{-1}(np))^{1/3}$ leads to the minimal rate in $\ell_\infty$-norm $\vartheta_{n,p} \vee m^{-1} \asymp (\log(np)/n)^{1/3}$, which nearly matches the optimal nonparametric rate when using the Bartlett kernel as in (7) (Priestley Citation1982, p. 463).

  2. Consistency in Frobenius norm depends on $\psi_n$, which tends to zero as $n \to \infty$ without placing any constraint on the relative rate of divergence between $n$ and $p$. Consistency in $\ell_\infty$-norm is determined by $\vartheta_{n,p}$, which depends on the interplay between the dimensionality and the tail behavior. Generally, the estimation error worsens as weaker factors are permitted ($\rho_q < 1$ in Assumption 2.1) and as $p$ grows, and also when $\nu$ is small such that heavier tails are permitted. Consider the case when all factors are strong (i.e., $\rho_j = 1$). If $p \asymp n$, then $\ell_\infty$-consistency holds with an appropriately chosen $m = n^\beta$, $\beta \in (0, 1)$, that leads to $\vartheta_{n,p} = o(1)$, provided that $\nu > 4$. When all moments of $u_{jt}$ and $\varepsilon_{it}$ exist, we achieve $\ell_\infty$-consistency even in the ultra high-dimensional case where $\log(p) = o(n)$.

From Theorem 3.1, the following proposition immediately follows.

Proposition 3.2.

Suppose that the conditions in Theorem 3.1 are met and let Assumption 2.1 hold with $\rho_j = 1$, $1 \le j \le q$. Then, $P(\mathcal{E}_{n,p}) \to 1$ as $n, p \to \infty$, where
(11) $$\mathcal{E}_{n,p} = \Big\{\max_{-d \le l \le d} \big|\widehat\Gamma_\xi(l) - \Gamma_\xi(l)\big|_\infty \le C_\xi\Big(\vartheta_{n,p} \vee \frac{1}{m} \vee \frac{1}{\sqrt{p}}\Big)\Big\}$$
for some constant $C_\xi > 0$.

From Proposition 3.2, we have $\ell_\infty$-consistency of $\widehat\Gamma_\xi(l)$ in the presence of strong factors. Although it is possible to trace the effect of weak factors on the estimation of $\Gamma_\xi(l)$ (see Corollary F.17), we make this simplifying assumption to streamline the presentation of the theoretical results of the subsequent Steps 2–3 of FNETS.

Remark 3.2.

In Appendix F.2.8, we show that if $\chi_t$ admits the static representation discussed in Remark 2.1, the rate in Proposition 3.2 is further improved as
(12) $$\max_{l:\,|l| \le d} \big|\widehat\Gamma_\xi(l) - \Gamma_\xi(l)\big|_\infty = O_P\Big(\tilde\vartheta_{n,p} \vee \frac{1}{\sqrt{p}}\Big) \quad\text{with}\quad \tilde\vartheta_{n,p} = \frac{p^{2/\nu}\log^3(p)}{n^{1-2/\nu}} \vee \sqrt{\frac{\log(p)}{n}}.$$

The term $\tilde\vartheta_{n,p}$ comes from bounding $\max_{l:\,|l| \le d} |\widehat\Gamma_x(l) - \Gamma_x(l)|_\infty$. Hence, the improved rate in (12) is comparable to the rate attained when we directly observe $\xi_t$, apart from the presence of the $p^{-1/2}$ term, which is due to the presence of latent factors; similar observations are made in Theorem 3.1 of Fan, Liao, and Mincheva (Citation2013).

3.5.2 Estimation of VAR Parameters and NG

We measure the sparsity of $\beta$ by $s_{0,j} = |\beta_{\cdot j}|_0$, $s_0 = \sum_{j=1}^{p} s_{0,j}$ and $s_{\mathrm{in}} = \max_{1 \le j \le p} s_{0,j}$. When $d = 1$, the quantity $s_{\mathrm{in}}$ coincides with the maximum in-degree per node of $\mathcal{N}^{\mathrm{G}}$.

Proposition 3.3.

Suppose that $C_\xi s_{\mathrm{in}}(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2}) \le \pi m_\xi/16$, where $m_\xi$ is defined in Assumption 2.3 (iii). Also, set $\lambda \ge 4 C_\xi(\|\beta\|_1 + 1)(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2})$ in (8). Then, conditional on $\mathcal{E}_{n,p}$ defined in (11), we have
$$|\widehat\beta - \beta|_\infty \vee |\widehat\beta - \beta|_2 \le \frac{32\, s_{\mathrm{in}}\, \lambda}{\pi m_\xi} \quad\text{and}\quad |\widehat\beta - \beta|_1 \le \frac{8\, s_0\, \lambda}{\pi m_\xi}.$$

Following Loh and Wainwright (Citation2012), the proof of Proposition 3.3 proceeds by showing that, conditional on $\mathcal{E}_{n,p}$, the matrix $\widehat G$ meets a restricted eigenvalue condition (Bickel, Ritov, and Tsybakov Citation2009) and the deviation bound is controlled as $|\widehat G \beta - \widehat g|_\infty \le \lambda/4$. Then, thanks to Proposition 3.2, as $n, p \to \infty$, the estimation errors of $\widehat\beta$ in $\ell_\infty$- and $\ell_1$-norms are bounded as in Proposition 3.3 with probability tending to one.

Remark 3.3.

As noted in Remark 2.2, the boundedness of $\mu_{\xi,j}(\omega)$ follows from that of $\|\beta\|_1 = \max_{1 \le j \le p} \sum_{l=1}^{d} |A_{l,j\cdot}|_1$, in which case the $\|\beta\|_1$ appearing in the assumed lower bound on $\lambda$ does not inflate the rate of the estimation errors. In the light-tailed situation, with the optimal bandwidth $m \asymp (n\log^{-1}(np))^{1/3}$ as specified in Remark 3.1(b), it is required that $s_{\mathrm{in}} = O((n\log^{-1}(np))^{1/3} \wedge \sqrt{p})$, which still allows the number of nonzero entries in each row of $A_l$ to grow with $p$. Here, the exponent $1/3$, in place of the $1/2$ often found in the literature, comes from adopting the most general approach to time series factor modeling, which necessitates selecting a bandwidth for frequency domain-based factor adjustment.

For sign consistency of the Lasso estimator, the (almost) necessary and sufficient condition is the so-called irrepresentable condition (Zhao and Yu Citation2006), which is known to be highly stringent (Tardivel and Bogdan Citation2022). Alternatively, Medeiros and Mendes (Citation2016) propose an adaptive Lasso estimator with data-driven weights for high-dimensional VAR estimation when ξt is directly observed. Instead, we propose to additionally threshold β̂ and obtain β̂(t), whose support consistently estimates the edge set of NG.

Corollary 3.4.

Suppose that the conditions of Proposition 3.3 are met. If $$(13)\quad \min_{(i,j):\,\beta_{ij}\ne 0}|\beta_{ij}| > 2t$$ with $t = 32 s_{\mathrm{in}}\lambda/(\pi m_\xi)$, then we have $\mathrm{sign}(\widehat\beta(t)) = \mathrm{sign}(\beta)$ conditional on $E_{n,p}$.
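The thresholded estimator $\widehat\beta(t)$ can be sketched as a hard-thresholding of $\widehat\beta$, which zeroes out small entries while preserving the sign of the retained ones, as required for sign consistency (an illustration, not the packaged implementation):

```python
import numpy as np

def hard_threshold(beta_hat, t):
    """Keep entries of beta_hat exceeding t in absolute value, zero the rest;
    the support of the result estimates the edge set of NG."""
    return np.where(np.abs(beta_hat) > t, beta_hat, 0.0)
```

For instance, thresholding `[0.5, -0.05, 0.2]` at `t = 0.1` removes only the middle entry, leaving the signs of the survivors intact.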

3.5.3 Estimation of NC and NL

Let $s_\delta(\varrho) = \max_{1\le i\le p}\sum_{i'=1}^p |\delta_{ii'}|^\varrho$, $\varrho\in[0,1)$, denote the (weak) sparsity of $\Delta = [\delta_{ii'},\ 1\le i,i'\le p]$. Also, define $s_{\mathrm{out}} = \max_{1\le j\le p}\sum_{l=1}^d |A_{l,\cdot j}|_0$ which, complementing $s_{\mathrm{in}}$, represents the sparsity of the out-going edges of NG. Analogously to Proposition 3.3, we establish deterministic guarantees for $\widehat\Delta$ and $\widehat\Omega$ conditional on $E_{n,p}$.

Proposition 3.5.

Suppose that the conditions of Proposition 3.3 are met, and set $\eta = C s_{\mathrm{in}}\|\Delta\|_1(\|\beta\|_1+1)(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2})$ in (9), with $C$ depending only on $C_\xi$ and $m_\xi$. Then, conditional on $E_{n,p}$ defined in (11), we have:

  1. $|\widehat\Delta-\Delta|_\infty \le 4\|\Delta\|_1\eta$ and $\|\widehat\Delta-\Delta\|_1 \le 2 s_\delta(\varrho)\,(4\|\Delta\|_1\eta)^{1-\varrho}$.

  2. If also $s_{\mathrm{out}}\, t \le \|A(1)\|_1$ with $t$ chosen as in Corollary 3.4, then $|\widehat\Omega-\Omega|_\infty \le 4\pi\|A(1)\|_1\big(3\|\Delta\|_1 s_{\mathrm{out}}\, t + 16\|A(1)\|_1\|\Delta\|_1\eta\big)$.

Together with Assumption 2.3 (ii), Proposition 3.5(i) indicates asymptotic positive definiteness of $\widehat\Delta$ provided that $\Delta$ is sufficiently sparse, as measured by $\|\Delta\|_1$ and $s_\delta(\varrho)$. By definition, NL combines NG and NC and, consequently, its sparsity structure is determined by the sparsity of the other two networks, which is reflected in Proposition 3.5(ii). Specifically, the term $\|A(1)\|_1$ is related to the out-going edges of NG and satisfies $\|A(1)\|_1 \le \max_{1\le j\le p}\sum_{l=1}^d |A_{l,\cdot j}|_1$, where the boundedness of the right-hand side is sufficient for the boundedness of $\mu_{\xi,j}(\omega)$ (Remark 2.2). Also, $\|\Delta\|_1$ reflects the sparsity of the edge set of NC, and the tuning parameter $\eta$ depends on the sparsity of the in-coming edges of NG through $\|\beta\|_1 = \max_{1\le j\le p}\sum_{l=1}^d |A_{l,j\cdot}|_1$ and $s_{\mathrm{in}}$.

As in Corollary 3.4, we can show the consistency of the thresholded estimators $\widehat\Delta(t_\delta)$ and $\widehat\Omega(t_\omega)$ in estimating the edge sets of NC and NL, respectively.

Corollary 3.6.

Suppose that the conditions of Proposition 3.5 are met. Conditional on $E_{n,p}$:

  1. If $\min_{(i,i'):\,\delta_{ii'}\ne 0}|\delta_{ii'}| > 2t_\delta$ with $t_\delta = 4\|\Delta\|_1\eta$, we have $\mathrm{sign}(\widehat\Delta(t_\delta)) = \mathrm{sign}(\Delta)$.

  2. If $\min_{(i,i'):\,\omega_{ii'}\ne 0}|\omega_{ii'}| > 2t_\omega$ with $t_\omega = 4\pi\|A(1)\|_1\big(3\|\Delta\|_1 s_{\mathrm{out}}\, t + 16\|A(1)\|_1\|\Delta\|_1\eta\big)$, we have $\mathrm{sign}(\widehat\Omega(t_\omega)) = \mathrm{sign}(\Omega)$.

3.5.4 The Case of the Static Factor Model

For ease of comparison between FNETS and the existing results, we focus on the static factor model setting discussed in Remark 2.1, and assume that $n \asymp p$ and $\max(\|\beta\|_1, \|A(1)\|_1) = O(1)$. Then, from Remark 3.2 and the proof of Proposition 3.3, we obtain $\max_{1\le j\le p}|\widehat\beta_{\cdot j}-\beta_{\cdot j}|_2 = O_P(s_{\mathrm{in}}\sqrt{\log(n)/n})$ provided that $\nu > 8$, such that the condition in (13) is written with $t \asymp s_{\mathrm{in}}\sqrt{\log(n)/n}$. That is, $\widehat\beta$ and its thresholded counterpart, proposed for the estimation of the latent VAR process, perform as well as the benchmark derived under independence and Gaussianity in the Lasso literature (van de Geer, Bühlmann, and Zhou 2011). In the same setting, the factor-adjusted regression estimation method of Fan, Masini, and Medeiros (2021), when applied to the problem of VAR parameter estimation, yields an estimator $\widehat\beta^{\mathrm{FARM}}$ which attains $\max_{1\le j\le p}|\widehat\beta^{\mathrm{FARM}}_{\cdot j}-\beta_{\cdot j}|_2 = O_P(s_{\mathrm{in}} n^{-1/2+5/\nu})$ under strong mixing conditions; see their Theorem 3. The larger $O_P$-bound compared to ours stems from the fact that $\widehat\beta^{\mathrm{FARM}}$ requires the estimation of $\xi_{it}$ for all $i$ and $t$, the error of which increases with $n$ as well as $p$. This demonstrates the efficacy of adopting our regularized YW estimator.

Continuing with the same setting, Proposition 3.5 implies that $$|\widehat\Delta-\Delta|_\infty = O_P\Big(\|\Delta\|_1^2\, s_{\mathrm{in}}\sqrt{\log(n)/n}\Big) \quad\text{and}\quad |\widehat\Omega-\Omega|_\infty = O_P\Big(\big(s_{\mathrm{out}} \vee \|\Delta\|_1^2\big)\, s_{\mathrm{in}}\sqrt{\log(n)/n}\Big).$$

The former is comparable (up to $s_{\mathrm{in}}$) to the results in Theorem 4 of Cai, Liu, and Luo (2011), derived for estimating a sparse precision matrix of independent random vectors.

4 Forecasting via FNETS

4.1 Forecasting under the Static Factor Model Representation

For a given time horizon $a\ge 0$, the best linear predictor of $\chi_{n+a}$ based on $\chi_{n-l}$, $l\ge 0$, under (1) is $$(14)\quad \chi_{n+a|n} = \sum_{l=0}^\infty B_{l+a}u_{n-l}.$$ Following Forni et al. (2005), we consider a forecasting method for the factor-driven component which estimates $\chi_{n+a|n}$ under a restricted GDFM that admits a static representation of finite dimension. We formalize the static factor model discussed in Remark 2.1 in the following assumption.

Assumption 4.1.

  1. There exist two finite positive integers $m_1$ and $m_2$ such that $m_1+1\ge m_2$, $\chi_t = M^{(1)}(L)f_t$ and $f_t = M^{(2)}(L)u_t$, where $M^{(1)}(L) = \sum_{l=0}^{m_1}M^{(1)}_l L^l$ with $M^{(1)}_l\in\mathbb{R}^{p\times q}$, $M^{(2)}(L) = \sum_{l=0}^{m_2}M^{(2)}_l L^l$ with $M^{(2)}_l\in\mathbb{R}^{q\times q}$, and $\det(M^{(2)}(z))\ne 0$ for all $|z|\le 1$.

  2. Let $\mu_{\chi,j}$, $1\le j\le r$, denote the $j$th largest eigenvalue of $\Gamma_\chi(0)$. Then, there exist a positive integer $p_0\ge 1$, constants $\varrho_j\in(7/8,1]$ with $\varrho_1\ge\cdots\ge\varrho_r$, and pairs of positive constants $(\alpha_{\chi,j},\beta_{\chi,j})$, $1\le j\le r$, such that for all $p\ge p_0$, $$\beta_{\chi,1}\ge \frac{\mu_{\chi,1}}{p^{\varrho_1}}\ge \alpha_{\chi,1} > \beta_{\chi,2}\ge \frac{\mu_{\chi,2}}{p^{\varrho_2}}\ge \cdots \ge \alpha_{\chi,r-1} > \beta_{\chi,r}\ge \frac{\mu_{\chi,r}}{p^{\varrho_r}}\ge \alpha_{\chi,r} > 0.$$

In part (i), $\chi_t$ admits a static representation with $r = q(m_1+1)$ factors: $\chi_t = \Lambda F_t$, where $\Lambda = [M^{(1)}_l,\ 0\le l\le m_1]$, $F_t = (f_t',\ldots,f_{t-m_1}')'$ and $f_t = M^{(2)}(L)u_t$. The condition that $m_1+1\ge m_2$ is made for convenience, and the proposed estimator of $\chi_{n+a|n}$ can be modified accordingly when it is relaxed.

Remark 4.1.

Under Assumption 4.1 (i), the $r$-vector of static factors, $F_t$, is driven by the $q$-dimensional common shocks $u_t$. If $q < r$, Anderson and Deistler (2008) show that $F_t$ always admits a VAR($h$) representation, $F_t = \sum_{l=1}^h G_l F_{t-l} + Hu_t$, for some finite positive integer $h$ and $H\in\mathbb{R}^{r\times q}$. Then, $X_t$ has a factor-augmented VAR representation: $$X_t = \Lambda\sum_{l=1}^h G_lF_{t-l} + \Lambda Hu_t + \sum_{l=1}^d A_l\xi_{t-l} + \Gamma^{1/2}\varepsilon_t = \sum_{l=1}^{d\vee h} C_lF_{t-l} + \sum_{l=1}^d A_lX_{t-l} + \nu_t,$$ with $C_l = \Lambda G_l\mathbb{I}_{\{l\le h\}} - A_l\Lambda\mathbb{I}_{\{l\le d\}}$ and $\nu_t = \Lambda Hu_t + \Gamma^{1/2}\varepsilon_t$. This model generalizes the factor-augmented forecasting model considered by Stock and Watson (2002), where only the factor-driven component is present, and it is also considered by Fan, Masini, and Medeiros (2021).

It immediately follows from Proposition 2.1 that $\|\Gamma_\xi(0)\| \le 2\pi B_\xi$. This, combined with Assumption 4.1 (ii), indicates the presence of a large gap in the eigenvalues of $\Gamma_x(0)$, which allows the asymptotic identification of $\chi_t$ and $\xi_t$ in the time domain, as well as that of the number of static factors $r$. Throughout, we treat $r$ as known, and refer to, for example, Bai and Ng (2002) and Ahn and Horenstein (2013) for its estimation.

Let $(\mu_{\chi,j},e_{\chi,j})$, $1\le j\le r$, denote the pairs of eigenvalues and eigenvectors of $\Gamma_\chi(0)$, ordered such that $\mu_{\chi,1}\ge\cdots\ge\mu_{\chi,r}$. Then, $\Gamma_\chi(0) = E_\chi M_\chi E_\chi'$ with $M_\chi = \mathrm{diag}(\mu_{\chi,j},\ 1\le j\le r)$ and $E_\chi = [e_{\chi,j},\ 1\le j\le r]$. Under Assumption 4.1 (i), $\chi_{n+a|n}$ in (14) satisfies $$\chi_{n+a|n} = \mathrm{Proj}(\chi_{n+a}\,|\,F_{n-l},\ l\ge 0) = \mathrm{Proj}(\chi_{n+a}\,|\,F_n) = \Gamma_\chi(a)E_\chi M_\chi^{-1}E_\chi'\chi_n,$$ where $\mathrm{Proj}(\cdot\,|\,z)$ denotes the linear projection operator onto the linear space spanned by $z$. When $a = 0$, we trivially have $\chi_{t|n} = \chi_t$ for $t\le n$. Then, a natural estimator of $\chi_{n+a|n}$ is $$(15)\quad \widehat\chi^{\mathrm{res}}_{n+a|n} = \widehat\Gamma_\chi(a)\widehat E_\chi\widehat M_\chi^{-1}\widehat E_\chi' X_n,$$ where $(\widehat\mu_{\chi,j},\widehat e_{\chi,j})$, $1\le j\le r$, denote the pairs of eigenvalues and eigenvectors of $\widehat\Gamma_\chi(0)$, and $\widehat\Gamma_\chi(l)$, $l\in\{0,a\}$, are estimated as described in Section 3.2. As a by-product, setting $a = 0$ yields the in-sample estimator $\widehat\chi^{\mathrm{res}}_t = \widehat E_\chi\widehat E_\chi' X_t$ for $1\le t\le n$.
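The linear algebra behind (15) can be sketched in a few lines of numpy; the autocovariance estimates and the number of static factors $r$ are assumed to be available from the earlier steps, so this illustrates the projection only, not the full estimation pipeline.

```python
import numpy as np

def chi_forecast_res(Gamma_chi_a, Gamma_chi_0, X_n, r):
    """Sketch of the restricted forecast in (15):
    chi_hat_{n+a|n} = Gamma_chi(a) E M^{-1} E' X_n,
    with (M, E) the r leading eigenpairs of Gamma_chi(0)."""
    vals, vecs = np.linalg.eigh(Gamma_chi_0)   # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:r]         # indices of the r largest
    E = vecs[:, order]
    M_inv = np.diag(1.0 / vals[order])
    return Gamma_chi_a @ E @ M_inv @ E.T @ X_n
```

As a sanity check, with `a = 0` (passing the lag-0 autocovariance twice) and an exactly rank-$r$ covariance, the formula collapses to the in-sample projection $E E' X$, as in the text.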

Remark 4.2.

Our proposed estimator $\widehat\chi^{\mathrm{res}}_{n+a|n}$ differs from that of Forni et al. (2005), as they estimate the factor space via generalized PCA on $\widehat\Gamma_\chi(0)$. This in effect replaces $\widehat E_\chi$ in (15) with the eigenvectors of $W^{-1}\widehat\Gamma_\chi(0)$, where $W$ is a diagonal matrix containing the estimators of the sample variance of $\xi_t$. Such an approach may gain in efficiency compared to ours in the same way a weighted least squares estimator is more efficient than the ordinary one in the presence of heteroscedasticity. However, since we investigate the consistency of $\widehat\chi^{\mathrm{res}}_{n+a|n}$ without deriving its asymptotic distribution, we do not explore such an approach in this article.

In Appendix B, we present an alternative forecasting method that operates under an unrestricted GDFM, that is, it does not require Assumption 4.1. Denoting the resulting forecast by $\widehat\chi^{\mathrm{unr}}_{n+a|n}$, we compare its performance with that of $\widehat\chi^{\mathrm{res}}_{n+a|n}$ in numerical studies.

Once the VAR parameters are estimated by $\widehat\beta = [\widehat A_1,\ldots,\widehat A_d]$ as in (8), we produce a forecast of $\xi_{n+a}$ given $X_t$, $t\le n$, by estimating the best linear predictor $\xi_{n+a|n} = \sum_{l=1}^d A_l\xi_{n+a-l|n}$ (with $\xi_{t|n} = \xi_t$ for $t\le n$), as $$(16)\quad \widehat\xi_{n+a|n} = \sum_{l=1}^{\max(1,a)-1}\widehat A_l\widehat\xi_{n+a-l|n} + \sum_{l=\max(1,a)}^{d}\widehat A_l\widehat\xi_{n+a-l}.$$

When $a\le d$, the in-sample estimators appearing in (16) are obtained as $\widehat\xi_t = X_t - \widehat\chi_t$ for $n+a-d\le t\le n$, with either $\widehat\chi^{\mathrm{res}}_t$ or $\widehat\chi^{\mathrm{unr}}_t$ as $\widehat\chi_t$.
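The recursion in (16) amounts to iterating the fitted VAR forward, substituting forecasts for unobserved future values. A minimal sketch, with the in-sample estimates $\widehat\xi_t$ and the matrices $\widehat A_l$ assumed given:

```python
import numpy as np

def var_forecast(A, xi, a):
    """Iterate an estimated VAR(d) a steps ahead, as in (16).
    A:  list [A_1, ..., A_d] of (p, p) coefficient matrices;
    xi: (n, p) array of in-sample estimates, most recent row last."""
    d = len(A)
    hist = [xi[-l] for l in range(1, d + 1)]   # xi_n, xi_{n-1}, ..., xi_{n-d+1}
    for _ in range(a):                         # each pass: next 1-step forecast
        nxt = sum(A[l] @ hist[l] for l in range(d))
        hist = [nxt] + hist[:-1]               # forecasts replace observations
    return hist[0]
```

With a univariate toy VAR(1), coefficient 0.5 and last observation 2.0, the one- and two-step forecasts are 1.0 and 0.5, matching the recursion by hand.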

4.2 Theoretical Properties

Proposition 4.1 establishes the consistency of $\widehat\chi^{\mathrm{res}}_{n+a|n}$ in estimating the best linear predictor of $\chi_{n+a}$, where we make explicit the effects of the presence of weak factors, both dynamic (as measured by $\mu_{\chi,j}(\omega)$ in Assumption 2.1) and static (as measured by $\mu_{\chi,j}$ in Assumption 4.1 (ii)), and of the tail behavior (through $\psi_n$ and $\vartheta_{n,p}$ defined in (10)).

Proposition 4.1.

Suppose that the conditions in Theorem 3.1 are met and, in addition, that Assumption 4.1 holds. Then, for any finite $a\ge 0$, we have $$\big|\widehat\chi^{\mathrm{res}}_{n+a|n} - \chi_{n+a|n}\big|_\infty = O_P\Big(p^{4-2\rho_q-2\varrho_r}\big(\psi_n \vee p^{\varrho_r-1}\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2}\big)\Big).$$

As noted in Remark 3.1 (c), weaker factors and heavier tails impose a stronger requirement on the dimensionality $p$. If all factors are strong ($\varrho_r = 1$), the rate becomes $\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2}$. When $a = 0$, Proposition 4.1 provides in-sample estimation consistency for any given $t\le n$. The next proposition accounts for the irreducible error in $\chi_{n+a|n}$, with which we conclude the analysis of the forecasting error $|\widehat\chi^{\mathrm{res}}_{n+a|n} - \chi_{n+a}|_\infty$ when $a\ge 1$.

Proposition 4.2.

Suppose that Assumptions 2.2 and 2.4 hold. Then, for any finite $a\ge 1$, $$\big|\chi_{n+a|n} - \chi_{n+a}\big|_\infty = O_P\big(q^{1/\nu}\mu_\nu^{1/\nu}\log^{1/2}(p)\big).$$

Recall the definition of $s_{\mathrm{in}}$ given in Section 3.5. The next proposition investigates the performance of $\widehat\xi_{n+a|n}$ when $a = 1$, which can easily be extended to any finite $a\ge 2$.

Proposition 4.3.

Suppose that the in-sample estimator of $\xi_t$ and $\widehat\beta$ satisfy $$(17)\quad \big|\widehat\xi_{n+1-l} - \xi_{n+1-l}\big|_\infty = O_P(\bar\zeta_{n,p})\ \text{ for }\ 1\le l\le d \quad\text{and}\quad |\widehat\beta-\beta|_1 = O_P(s_{\mathrm{in}}\zeta_{n,p}).$$

Also, let Assumptions 2.3 and 2.4 hold. Then, $$\big|\widehat\xi_{n+1|n} - \xi_{n+1}\big|_\infty = O_P\Big(s_{\mathrm{in}}\zeta_{n,p}\big(\log^{1/2}(p)\,p^{1/\nu}\mu_\nu^{1/\nu} + \bar\zeta_{n,p}\big) + \|\beta\|_1\bar\zeta_{n,p} + p^{1/\nu}\mu_\nu^{1/\nu}\Big).$$

Either of the in-sample estimators $\widehat\chi^{\mathrm{res}}_t$ (described in Section 4.1) or $\widehat\chi^{\mathrm{unr}}_t$ (Appendix B) can be used in place of $\widehat\chi_t$. Accordingly, the rate $\bar\zeta_{n,p}$ in (17) is inherited from that of $\widehat\chi^{\mathrm{res}}_t$ (given in Proposition 4.1) or $\widehat\chi^{\mathrm{unr}}_t$ (Proposition B.2 (iii)). From the proof of Proposition 3.3, we have $\zeta_{n,p} \asymp (\|\beta\|_1+1)(\vartheta_{n,p} \vee m^{-1} \vee p^{-1/2})$ in (17).

5 Numerical Studies

5.1 Tuning Parameter Selection

We briefly discuss the choice of the tuning parameters for FNETS. For full details, see Owens, Cho, and Barigozzi (Citation2023) that accompanies its R implementation available on CRAN (Barigozzi, Cho, and Owens Citation2023).

Related to χt

We set the kernel bandwidth at $m = 4(n/\log(n))^{1/3}$, based on the case where a sufficiently large number of moments exists and $n\asymp p$ (Remark 3.1 (b)). In the simulation studies reported in Appendix E, we treat the number of factors $q$ (required for Step 1 of FNETS) as known, and also treat the number of static factors $r$ (for generating the forecast) as known when it is finite; when $\chi_t$ does not admit a static factor model (i.e. $r = \infty$), we use the value returned by the ratio-based estimator of Ahn and Horenstein (2013). In the real data analysis reported in Section 5.3, we estimate both $q$ and $r$, the former with the estimator proposed in Hallin and Liška (2007) and the latter as in Ahn and Horenstein (2013).

Related to ξt

We select the tuning parameter $\lambda$ in (8) jointly with the VAR order $d$ by adopting cross validation (CV); in time series settings, a similar approach is explored in Wang and Tsay (2022). For this, the data is partitioned into $M$ consecutive folds with indices $I_l = \{n_l+1,\ldots,n_{l+1}\}$ where $n_l = \min(\lceil ln/M\rceil, n)$, $0\le l\le M$, and each fold is split into $I_l^{\mathrm{train}} = \{n_l+1,\ldots,\lfloor(n_l+n_{l+1})/2\rfloor\}$ and $I_l^{\mathrm{test}} = I_l\setminus I_l^{\mathrm{train}}$. Then, with $\widehat\beta_l^{\mathrm{train}}(\mu,b)$ obtained from $\{X_t,\ t\in I_l^{\mathrm{train}}\}$ with the tuning parameter $\mu$ and the VAR order $b$, we evaluate $$\mathrm{CV}(\mu,b) = \sum_{l=1}^M \mathrm{tr}\Big(\widehat\Gamma_{\xi,l}^{\mathrm{test}}(0) - \big(\widehat\beta_l^{\mathrm{train}}(\mu,b)\big)'\widehat g_l^{\mathrm{test}}(b) - \big(\widehat g_l^{\mathrm{test}}(b)\big)'\widehat\beta_l^{\mathrm{train}}(\mu,b) + \big(\widehat\beta_l^{\mathrm{train}}(\mu,b)\big)'\widehat G_l^{\mathrm{test}}(b)\widehat\beta_l^{\mathrm{train}}(\mu,b)\Big),$$ where $\widehat\Gamma_{\xi,l}^{\mathrm{test}}(l)$, $\widehat G_l^{\mathrm{test}}(b)$, and $\widehat g_l^{\mathrm{test}}(b)$ are generated analogously to $\widehat\Gamma_\xi(l)$, $\widehat G$, and $\widehat g$, respectively, using the test set $\{X_t,\ t\in I_l^{\mathrm{test}}\}$. The measure $\mathrm{CV}(\mu,b)$ approximates the prediction error while accounting for the fact that we do not directly observe $\xi_t$. Minimizing it over varying $\mu$ and $b$, we select $\lambda$ and $d$. In simulation studies, we treat $d$ as known, while in the real data analysis we select it from the set $\{1,\ldots,5\}$ via CV. For selecting $\eta$ in (9), we adopt the Burg matrix divergence-based CV measure $$\mathrm{CV}(\mu) = \sum_{l=1}^M \mathrm{tr}\big(\widehat\Delta_l^{\mathrm{train}}(\mu)\widehat\Gamma_l^{\mathrm{test}}\big) - \log\big|\widehat\Delta_l^{\mathrm{train}}(\mu)\widehat\Gamma_l^{\mathrm{test}}\big| - p.$$
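A sketch of the fold construction and of one summand of $\mathrm{CV}(\mu,b)$ follows; the estimators themselves ($\widehat\beta$, $\widehat G$, $\widehat g$) are assumed to be computed elsewhere, indices are 0-based, and the $\widehat\Gamma^{\mathrm{test}}_{\xi,l}(0)$ term is dropped from the criterion since it does not involve $\widehat\beta$ (an illustration only, not the packaged implementation).

```python
import numpy as np

def cv_folds(n, M):
    """Partition {0, ..., n-1} into M consecutive folds, each split into
    a train half and a test half, as described in Section 5.1."""
    bounds = [min(int(np.ceil(l * n / M)), n) for l in range(M + 1)]
    folds = []
    for l in range(M):
        fold = list(range(bounds[l], bounds[l + 1]))
        half = len(fold) // 2
        folds.append((fold[:half], fold[half:]))  # (train, test) indices
    return folds

def cv_criterion(G_test, g_test, b_train):
    """One summand of CV(mu, b): tr(-b' g - g' b + b' G b), i.e. the
    prediction-error quadratic form with the constant Gamma(0) term omitted."""
    return np.trace(-b_train.T @ g_test - g_test.T @ b_train
                    + b_train.T @ G_test @ b_train)
```

For example, `cv_folds(10, 2)` yields train/test index pairs `([0, 1], [2, 3, 4])` and `([5, 6], [7, 8, 9])`.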

For both CV procedures, we set M = 1 in the numerical results reported below. In simulation studies, we compare the estimators with their thresholded counterparts in estimating the network edge sets with the thresholds t,tδ and tω selected according to a data-driven approach motivated by Liu, Zhang, and Liu (Citation2021). Details are in Appendix D.

5.2 Simulations

In Appendix E, we investigate the estimation and forecasting performance of FNETS on datasets simulated under a variety of settings, from Gaussian innovations $u_t$ and $\varepsilon_t$ with (E1) $\Delta = I$ and (E2) $\Delta \ne I$, to (E3) heavy-tailed ($t_5$) innovations with $\Delta = I$, and when $\chi_t$ is generated from (C1) fully dynamic or (C2) static factor models. In addition, we consider the "oracle" setting (C0) $\chi_t = 0$ where, in the absence of the factor-driven component, the results obtained can serve as a benchmark. For comparison, we consider the factor-adjusted regression method of Fan, Masini, and Medeiros (2021) and present the performance of their estimator of the VAR parameters and forecasts.

5.3 Application to a Panel of Volatility Measures

We investigate the interconnectedness in a panel of volatility measures and evaluate its out-of-sample forecasting performance using FNETS. For this purpose, we consider a panel of p = 46 stock prices retrieved from the Wharton Research Data Service, of US companies which are all classified as "financials" according to the Global Industry Classification Standard; a list of company names and industry groups is found in Appendix G. The dataset spans the period between January 3, 2000 and December 31, 2012 (3267 trading days). Following Diebold and Yilmaz (2014), we measure volatility using the high-low range as $\sigma_{it}^2 = 0.361(p_{it}^{\mathrm{high}} - p_{it}^{\mathrm{low}})^2$, where $p_{it}^{\mathrm{high}}$ and $p_{it}^{\mathrm{low}}$ denote, respectively, the maximum and the minimum log-price of stock $i$ on day $t$, and set $X_{it} = \log(\sigma_{it}^2)$; Brownlees and Gallo (2010) support this choice of volatility measure over more sophisticated alternatives.
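The volatility proxy is straightforward to compute from daily high and low prices; a minimal sketch (hypothetical prices, for illustration):

```python
import numpy as np

def log_range_volatility(price_high, price_low):
    """High-low range volatility proxy of Diebold and Yilmaz (2014):
    sigma_it^2 = 0.361 * (p_high - p_low)^2 on the log-price scale,
    where 0.361 ~ 1/(4 log 2) is the Parkinson range constant;
    returns the model input X_it = log(sigma_it^2)."""
    p_high = np.log(np.asarray(price_high, dtype=float))  # max log-price of the day
    p_low = np.log(np.asarray(price_low, dtype=float))    # min log-price of the day
    sigma2 = 0.361 * (p_high - p_low) ** 2
    return np.log(sigma2)
```

Applied column-wise to a (days × stocks) array of highs and lows, this yields the panel $X_{it}$ fed into FNETS.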

5.3.1 Network Analysis

We focus on the period 03/2006–02/2010 corresponding to the Great Financial Crisis. We partition the data into four segments of length n = 252 each (corresponding to the number of trading days in a single year) and on each segment, we apply FNETS to estimate the three networks NG,NC, and NL described in Section 3.1.

Each row of Fig. 2 plots the heat maps of the matrices underlying the three networks of interest. For all four segments, the CV-based approach described in Section 5.1 returns d = 1 from the candidate VAR order set {1,…,5}. Hence, in each row, the left panel represents the estimator $\widehat A_1 = \widehat\beta$, and the middle and the right panels show the (long-run) partial correlations from the corresponding $\widehat\Delta$ and $\widehat\Omega$ (with their diagonals set to zero). The locations of the nonzero elements estimate the edge sets of the corresponding networks, and the hues represent the (signed) edge weights.

Fig. 2 Heat maps of the estimators of the VAR transition matrices Â1, partial correlations from Δ̂ and long-run partial correlations from Ω̂ (left to right), which in turn estimate the networks NG,NC, and NL, respectively, over three selected periods. The grouping of the companies according to their industry classifications are indicated by the axis label colors. The heat maps in the left column are in the scale of [−0.81,0.81] while the others are in the scale of [−1,1], with red hues denoting large positive values and blue hues large negative values.

Prior to March 2007, all networks exhibit a low degree of interconnectedness but the number of edges increases considerably in 03/2007–02/2008 due mainly to an overall increase in dynamic co-dependencies and a prominent role of banks (blue group) not only in NG but also in NC. In 03/2008–02/2009, the companies belonging to the insurance sector (red group) play a central role and in 03/2009–02/2010, the companies become highly interconnected with two particular firms having many outgoing edges in NG. Also, while most edges in NL, which captures the overall long-run dependence, have positive weights across time and companies, their weights become negative in this last segment. We highlight that FNETS is able to capture the aforementioned group-specific activities although this information is not supplied to the estimation method.

5.3.2 Forecasting

We perform a rolling window-based forecasting exercise on the trading days in 2012. Starting from $T = 3016$ (the first trading day in 2012), we forecast $X_{T+1}$ as $\widehat X_{T+1|T}(n) = \widehat\chi_{T+1|T}(n) + \widehat\xi_{T+1|T}(n)$, where $\widehat\chi_{T+1|T}(n)$ (resp. $\widehat\xi_{T+1|T}(n)$) denotes the forecast of $\chi_{T+1}$ (resp. $\xi_{T+1}$) using the preceding $n$ data points $\{X_t,\ T-n+1\le t\le T\}$. We set $n = 252$. After the forecast $\widehat X_{T+1|T}(n)$ is generated, we update $T \leftarrow T+1$ and repeat the above procedure until $T = 3267$ (the last trading day in 2012) is reached.

For χ̂T+1|T(n), we consider the forecasting methods derived under the static factor model (Section 4.1, denoted by χ̂T+1|Tres(n)) and unrestricted GDFM (Appendix B, χ̂T+1|Tunr(n)). Following the analysis in Section 5.3.1, we set d = 1 when producing ξ̂T+1|T(n). Additionally, we report the forecasting performance of FarmPredict (Fan, Masini, and Medeiros Citation2021), which first fits an AR model to each of the p series (“AR”), projects the residuals on their principal components, and then fits VAR models to what remains via Lasso. Combining the three steps gives the final forecast X̂T+1|TFARM(n). The forecast produced by the first step univariate AR modeling, denoted by X̂T+1|TAR(n), is also included for comparison.

We evaluate the performance of $\widehat X_{T+1|T}$ using two error measures, $$\mathrm{FE}^{\mathrm{avg}}_{T+1} = \frac{|X_{T+1} - \widehat X_{T+1|T}|_2^2}{|X_{T+1}|_2^2} \quad\text{and}\quad \mathrm{FE}^{\mathrm{max}}_{T+1} = \frac{|X_{T+1} - \widehat X_{T+1|T}|_\infty}{|X_{T+1}|_\infty};$$ see Table 1 for a summary of the forecasting results. Among the forecasts generated by FNETS, the one based on $\widehat\chi^{\mathrm{res}}_{T+1|T}(n)$ performs best in this exercise, outperforming $\widehat X^{\mathrm{AR}}_{T+1|T}(n)$ and $\widehat X^{\mathrm{FARM}}_{T+1|T}(n)$ according to both $\mathrm{FE}^{\mathrm{avg}}$ and $\mathrm{FE}^{\mathrm{max}}$ on average. As noted in Appendix E.2.2, the forecast based on $\widehat\chi^{\mathrm{unr}}_{T+1|T}$ shows instabilities and is generally outperformed by the one based on $\widehat\chi^{\mathrm{res}}_{T+1|T}$, but nonetheless performs reasonably well. Given the high level of co-movements and persistence in the data, the good performance of FNETS is mainly attributed to the way we forecast the factor-driven component, which is based on the estimators derived under the GDFM that fully exploit all the dynamic co-dependencies (see also the results obtained by Barigozzi and Hallin 2017 on a similar dataset).
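Under the normalization reconstructed above (squared $\ell_2$ and $\ell_\infty$ forecast errors relative to the magnitude of $X_{T+1}$), the two measures can be computed as follows (a sketch):

```python
import numpy as np

def forecast_errors(X, X_hat):
    """FE_avg = |X - X_hat|_2^2 / |X|_2^2 and FE_max = |X - X_hat|_inf / |X|_inf,
    the relative forecast error measures of Section 5.3.2."""
    X = np.asarray(X, dtype=float)
    diff = X - np.asarray(X_hat, dtype=float)
    fe_avg = np.sum(diff ** 2) / np.sum(X ** 2)
    fe_max = np.max(np.abs(diff)) / np.max(np.abs(X))
    return fe_avg, fe_max
```

For example, forecasting `[2, 0]` by `[1, 0]` gives `fe_avg = 0.25` and `fe_max = 0.5`.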

Table 1 Mean, median and standard errors of FET+1avg and FET+1max on the trading days in 2012 for X̂T+1|T(n) in comparison with AR and FarmPredict (Fan, Masini, and Medeiros Citation2021) forecasts.

6 Conclusions

We propose and study the asymptotic properties of FNETS, a network estimation and forecasting methodology for high-dimensional time series under a dynamic factor-adjusted VAR model. Our estimation strategy fully takes into account the latency of the VAR process of interest via regularized YW estimation which, distinguished from existing approaches, brings methodological simplicity as well as theoretical benefits. We investigate the theoretical properties of FNETS under general conditions permitting weak factors and tails heavier than the sub-Gaussian ones commonly imposed in the high-dimensional VAR literature, and provide new insights into how the quantities determining the sparsity of the networks underpinning VAR processes, the factor strength and the tail behavior jointly affect the estimation of those networks. Simulation studies and an application to a panel of financial time series show that FNETS is particularly useful for network analysis, as it is able to discover group structures as well as produce accurate forecasts for highly co-moving and persistent time series such as log-volatilities. The R package fnets implementing FNETS is available from CRAN (Barigozzi, Cho, and Owens 2023).


Supplementary Materials

The supplementary materials contain the descriptions of the alternative estimation and forecasting methods, selection of the tuning parameters and the complete simulation results, in addition to all the proofs of the theoretical results.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Supported by the Leverhulme Trust (RPG-2019-390).

References

  • Adamek, R., Smeekes, S., and Wilms, I. (2023), “Lasso Inference for High-Dimensional Time Series,” Journal of Econometrics, 235, 1114–1143. DOI: 10.1016/j.jeconom.2022.08.008.
  • Ahelegbey, D. F., Billio, M., and Casarin, R. (2016), “Bayesian Graphical Models for Structural Vector Autoregressive Processes,” Journal of Applied Econometrics, 31, 357–386. DOI: 10.1002/jae.2443.
  • Ahn, S. C., and Horenstein, A. R. (2013), “Eigenvalue Ratio Test for the Number of Factors,” Econometrica, 81, 1203–1227.
  • Anderson, B. D., and Deistler, M. (2008), “Generalized Linear Dynamic Factor Models–A Structure Theory,” in 47th IEEE Conference on Decision and Control, pp. 1980–1985.
  • Bai, J. (2003), “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71, 135–171. DOI: 10.1111/1468-0262.00392.
  • Bai, J., and Ng, S. (2002), “Determining the Number of Factors in Approximate Factor Models,” Econometrica, 70, 191–221. DOI: 10.1111/1468-0262.00273.
  • Bańbura, M., Giannone, D., and Reichlin, L. (2010), “Large Bayesian Vector Auto Regressions,” Journal of Applied Econometrics, 25, 71–92. DOI: 10.1002/jae.1137.
  • Barigozzi, M., and Brownlees, C. (2019), “NETS: Network Estimation for Time Series,” Journal of Applied Econometrics, 34, 347–364. DOI: 10.1002/jae.2676.
  • Barigozzi, M., Cho, H., and Owens, D. (2023), fnets: Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series, R package version 0.1.5.
  • Barigozzi, M., and Hallin, M. (2017), “Generalized Dynamic Factor Models and Volatilities: Estimation and Forecasting,” Journal of Econometrics, 201, 307–321. DOI: 10.1016/j.jeconom.2017.08.010.
  • Basu, S., Li, X., and Michailidis, G. (2019), “Low Rank and Structured Modeling of High-Dimensional Vector Autoregressions,” IEEE Transactions on Signal Processing, 67, 1207–1222. DOI: 10.1109/TSP.2018.2887401.
  • Basu, S., and Michailidis, G. (2015), “Regularized Estimation in Sparse High-Dimensional Time Series Models,” The Annals of Statistics, 43, 1535–1567. DOI: 10.1214/15-AOS1315.
  • Bickel, P. J., Ritov, Y., and Tsybakov, A. B. (2009), “Simultaneous Analysis of Lasso and Dantzig Selector,” The Annals of Statistics, 37, 1705–1732. DOI: 10.1214/08-AOS620.
  • Billio, M., Getmansky, M., Lo, A. W., and Pelizzon, L. (2012), “Econometric Measures of Connectedness and Systemic Risk in the Finance and Insurance Sectors,” Journal of Financial Economics, 104, 535–559. DOI: 10.1016/j.jfineco.2011.12.010.
  • Brillinger, D. R. (1981), Time Series: Data Analysis and Theory, Philadelphia, PA: SIAM.
  • Brownlees, C., and Gallo, G. (2010), “Comparison of Volatility Measures: A Risk Management Perspective,” Journal of Financial Econometrics, 8, 29–56. DOI: 10.1093/jjfinec/nbp009.
  • Cai, T., Liu, W., and Luo, X. (2011), “A Constrained l1 Minimization Approach to Sparse Precision Matrix Estimation,” Journal of the American Statistical Association, 106, 594–607.
  • Candès, E. and Tao, T. (2007), “The Dantzig Selector: Statistical Estimation when p is much larger than n,” The Annals of Statistics, 35, 2313–2351. DOI: 10.1214/009053607000000532.
  • Cule, E., Vineis, P., and De Iorio, M. (2011), “Significance Testing in Ridge Regression for Genetic Data,” BMC Bioinformatics, 12, 1–15. DOI: 10.1186/1471-2105-12-372.
  • Dahlhaus, R. (2000), “Graphical Interaction Models for Multivariate Time Series,” Metrika, 51, 157–172. DOI: 10.1007/s001840000055.
  • Diebold, F. X., and Yilmaz, K. (2014), “On the Network Topology of Variance Decompositions: Measuring the Connectedness of Financial Firms,” Journal of Econometrics, 182, 119–134. DOI: 10.1016/j.jeconom.2014.04.012.
  • Eichler, M. (2007), “Granger Causality and Path Diagrams for Multivariate Time Series,” Journal of Econometrics, 137, 334–353. DOI: 10.1016/j.jeconom.2005.06.032.
  • Fan, J., Ke, Y., and Wang, K. (2020), “Factor-Adjusted Regularized Model Selection,” Journal of Econometrics, 216, 71–85. DOI: 10.1016/j.jeconom.2020.01.006.
  • Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society, Series B, 75, 603–680. DOI: 10.1111/rssb.12016.
  • Fan, J., Lou, Z., and Yu, M. (2023), “Are Latent Factor Regression and Sparse Regression Adequate?,” Journal of the American Statistical Association (in press). DOI: 10.1080/01621459.2023.2169700.
  • Fan, J., Masini, R., and Medeiros, M. C. (2021), “Bridging Factor and Sparse Models,” arXiv preprint arXiv:2102.11341 .
  • Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000), “The Generalized Dynamic Factor Model: Identification and Estimation,” The Review of Economics and Statistics, 82, 540–554. DOI: 10.1162/003465300559037.
  • ———(2005), “The Generalized Dynamic Factor Model: One-Sided Estimation and Forecasting,” Journal of the American Statistical Association, 100, 830–840.
  • Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2017), “Dynamic Factor Models with Infinite-Dimensional Factor Space: Asymptotic Analysis,” Journal of Econometrics, 199, 74–92. DOI: 10.1016/j.jeconom.2017.04.002.
  • Forni, M., and Lippi, M. (2001), “The Generalized Dynamic Factor Model: Representation Theory,” Econometric Theory, 17, 1113–1141. DOI: 10.1017/S0266466601176048.
  • Freyaldenhoven, S. (2021), “Factor Models with Local Factors–Determining the Number of Relevant Factors,” Journal of Econometrics, 229, 80–102. DOI: 10.1016/j.jeconom.2021.04.006.
  • Giannone, D., Lenza, M., and Primiceri, G. E. (2021), “Economic Predictions With Big Data: The Illusion of Sparsity,” Econometrica, 89, 2409–2437. DOI: 10.3982/ECTA17842.
  • Guðmundsson, G. S., and Brownlees, C. (2021), “Detecting Groups in Large Vector Autoregressions,” Journal of Econometrics, 225, 2–26. DOI: 10.1016/j.jeconom.2021.03.012.
  • Hallin, M., and Liška, R. (2007), “Determining the Number of Factors in the General Dynamic Factor Model,” Journal of the American Statistical Association, 102, 603–617. DOI: 10.1198/016214506000001275.
  • Han, F., Lu, H., and Liu, H. (2015), “A Direct Estimation of High Dimensional Stationary Vector Autoregressions,” Journal of Machine Learning Research, 16, 3115–3150.
  • Hsu, N.-J., Hung, H.-L., and Chang, Y.-M. (2008), “Subset Selection for Vector Autoregressive Processes Using Lasso,” Computational Statistics & Data Analysis, 52, 3645–3657. DOI: 10.1016/j.csda.2007.12.004.
  • Kock, A. B., and Callot, L. (2015), “Oracle Inequalities for High Dimensional Vector Autoregressions,” Journal of Econometrics, 186, 325–344. DOI: 10.1016/j.jeconom.2015.02.013.
  • Krampe, J., and Margaritella, L. (2021), “Dynamic Factor Models with Sparse VAR Idiosyncratic Components,” arXiv preprint arXiv:2112.07149.
  • Krampe, J., and Paparoditis, E. (2021), “Sparsity Concepts and Estimation Procedures for High-Dimensional Vector Autoregressive Models,” Journal of Time Series Analysis, 42, 554–579. DOI: 10.1111/jtsa.12586.
  • Lin, J., and Michailidis, G. (2020), “Regularized Estimation of High-Dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models,” Journal of Machine Learning Research, 21, 1–51.
  • Liu, B., Zhang, X., and Liu, Y. (2021), “Simultaneous Change Point Inference and Structure Recovery for High Dimensional Gaussian Graphical Models,” Journal of Machine Learning Research, 22, 1–62.
  • Loh, P.-L., and Wainwright, M. J. (2012), “High-Dimensional Regression with Noisy and Missing Data: Provable Guarantees with Nonconvexity,” The Annals of Statistics, 40, 1637–1664. DOI: 10.1214/12-AOS1018.
  • Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
  • Masini, R. P., Medeiros, M. C., and Mendes, E. F. (2022), “Regularized Estimation of High-Dimensional Vector Autoregressions with Weakly Dependent Innovations,” Journal of Time Series Analysis, 43, 532–557. DOI: 10.1111/jtsa.12627.
  • Medeiros, M. C., and Mendes, E. F. (2016), “l1-regularization of High-Dimensional Time-Series Models with Non-Gaussian and Heteroskedastic Errors,” Journal of Econometrics, 191, 255–271.
  • Nicholson, W. B., Wilms, I., Bien, J., and Matteson, D. S. (2020), “High Dimensional Forecasting via Interpretable Vector Autoregression,” Journal of Machine Learning Research, 21, 1–52.
  • Onatski, A. (2012), “Asymptotics of the Principal Components Estimator of Large Factor Models with Weakly Influential Factors,” Journal of Econometrics, 168, 244–258. DOI: 10.1016/j.jeconom.2012.01.034.
  • Owens, D., Cho, H., and Barigozzi, M. (2023), “fnets: An R Package for Network Estimation and Forecasting via Factor-Adjusted VAR Modelling,” The R Journal (to appear).
  • Priestley, M. (1982), Spectral Analysis and Time Series, London: Academic Press.
  • Stock, J. H., and Watson, M. W. (2002), “Forecasting Using Principal Components from a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. DOI: 10.1198/016214502388618960.
  • Tardivel, P. J., and Bogdan, M. (2022), “On the Sign Recovery by Least Absolute Shrinkage and Selection Operator, Thresholded Least Absolute Shrinkage and Selection Operator, and Thresholded Basis Pursuit Denoising,” Scandinavian Journal of Statistics, 49, 1636–1668. DOI: 10.1111/sjos.12568.
  • Uematsu, Y., and Yamagata, T. (2023), “Discovering the Network Granger Causality in Large Vector Autoregressive Models,” arXiv preprint arXiv:2303.15158.
  • van de Geer, S., Bühlmann, P., and Zhou, S. (2011), “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models,” Electronic Journal of Statistics, 5, 688–749. DOI: 10.1214/11-EJS624.
  • Wang, D., and Tsay, R. S. (2022), “Rate-Optimal Robust Estimation of High-Dimensional Vector Autoregressive Models,” arXiv preprint arXiv:2107.11002.
  • Wong, K. C., Li, Z., and Tewari, A. (2020), “Lasso Guarantees for β-mixing Heavy-Tailed Time Series,” The Annals of Statistics, 48, 1124–1142. DOI: 10.1214/19-AOS1840.
  • Wu, W.-B., and Wu, Y. N. (2016), “Performance Bounds for Parameter Estimates of High-Dimensional Linear Models with Correlated Errors,” Electronic Journal of Statistics, 10, 352–379. DOI: 10.1214/16-EJS1108.
  • Zhang, D., and Wu, W. B. (2021), “Convergence of Covariance and Spectral Density Estimates for High-Dimensional Locally Stationary Processes,” The Annals of Statistics, 49, 233–254. DOI: 10.1214/20-AOS1954.
  • Zhao, P., and Yu, B. (2006), “On Model Selection Consistency of Lasso,” Journal of Machine Learning Research, 7, 2541–2563.