Research Article

Mean-Structure and Autocorrelation Consistent Covariance Matrix Estimation


Abstract

We consider estimation of the asymptotic covariance matrix in nonstationary time series. A nonparametric estimator that is robust against unknown forms of trends and a possibly divergent number of change points (CPs) is proposed. It is algorithmically fast because neither a search for CPs, estimation of trends, nor cross-validation is required. Together with our proposed automatic optimal bandwidth selector, the resulting estimator is both statistically and computationally efficient. It is, therefore, useful in many statistical procedures, for example, CP detection and construction of simultaneous confidence bands of trends. Empirical studies on four stock market indices are also discussed.

1 Introduction

In many real applications, the observed time series $\{Y_i\}_{i=1}^n$ is a contaminated version of an ideal stationary time series $\{X_i\}_{i=1}^n$. The contamination may consist of an unknown trend, seasonality, and abrupt change points (CPs). This type of nonstationary time series is commonly encountered in Econometrics, Risk Management, Neurology, Genetics, Ecology, etc. (see, e.g., Horváth, Kokoszka, and Steinebach Citation1999; Granger and Hyung Citation2004; Banerjee and Urga Citation2005; Mikkonen et al. Citation2014; Kirch, Muhsal, and Ombao Citation2015). As a result, assessing the stationarity of $Y_i$ is usually indispensable before conducting inference and modeling. Many tests for this purpose require estimating the asymptotic covariance matrix (ACM) of $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$, namely, $\Sigma = \lim_{n\to\infty} n\,\mathrm{var}(\bar{X}_n)$. Their performance relies on an efficient estimator of $\Sigma$ that is robust against the mean and autocorrelation structures. This article addresses the problem of mean-structure and autocorrelation consistent (MAC) estimation of $\Sigma$.

One classical problem in assessing stability is CP detection. A large class of CP tests is based on the cumulative sum (CUSUM) process (e.g., Brown, Durbin, and Evans Citation1975; Ploberger and Krämer Citation1992; Jirak Citation2015). Among them, the celebrated Kolmogorov–Smirnov (KS) test is arguably the most commonly used in detecting a mean shift. In the univariate case, the KS test statistic usually requires an estimator of the asymptotic variance constant (AVC) $\sigma^2$, that is, the univariate analog of $\Sigma$, for standardization (see Csörgő and Horváth Citation1997). However, without a jump robust estimator of $\sigma^2$, the KS test may not be monotonically powerful with respect to the jump magnitude (see Vogelsang Citation1999; Crainiceanu and Vogelsang Citation2007; Juhl and Xiao Citation2009). Indeed, the power may even vanish completely (see for a visualization of this phenomenon). Moreover, many existing approaches are either restricted to one CP or do not fully eliminate the nonmonotone problem. Furthermore, for multidimensional CP tests (e.g., Horváth, Kokoszka, and Steinebach Citation1999), there is no robust estimator of $\Sigma$. Although Shao and Zhang (Citation2010) proposed a self-normalized KS test, which does not require estimating $\Sigma$, it sacrifices power in return. Its power may even vanish completely under a misspecified alternative (see Section 5.4).

Besides detection of CPs, there are many more dedicated procedures for assessing the stability of the mean, for example, testing the existence of structural breaks in trends, constructing simultaneous confidence bands (SCB) of trends, and testing non-constancy of trends (see Wu, Woodroofe, and Mentz Citation2001; Wu Citation2004; Wu and Zhao Citation2007, and references therein). All of the aforementioned procedures require an estimator of $\sigma^2$ that is jump robust as well as trend robust. It is worth mentioning that even in the absence of CPs and trend, estimation of $\sigma^2$ is already difficult because it requires specifying a bandwidth parameter (see, e.g., Andrews Citation1991; Newey and West Citation1994). To the best of our knowledge, there is no jump and trend robust estimator of $\Sigma$ that is equipped with an optimal bandwidth estimator.

In view of the above problems, this article proposes a single-pass (i.e., neither CPs nor trends need to be estimated) and fully nonparametric estimator of $\Sigma$ for general multidimensional time series. It is consistent and robust even if there is a divergent number of CPs and a nonconstant trend of unknown form. Furthermore, a closed-form formula for the optimal bandwidth is derived so that users do not need to resort to computationally intensive cross-validation. Hence, the resulting estimator is MAC, statistically efficient, and computationally fast.

The remaining part of the article is organized as follows. Section 2 reviews standard estimation methods for $\Sigma$. Section 3 motivates the proposed jump robust estimator and presents the key theoretical results. Section 4 extends the estimator to trend robustness; implementation issues and generalizations are also discussed. Section 5 illustrates the finite-sample performance. Section 6 presents empirical studies on stock market indices. Section 7 concludes the article.

2 Review of Asymptotic Covariance Estimation

2.1 Mathematical Setup

Suppose the observed time series $\{Y_i \in \mathbb{R}^d\}_{i=1}^n$, $d \in \mathbb{N}$, is generated from $Y_i = \mu(i/n) + X_i$, where $\mu : [0,1] \to \mathbb{R}^d$ is a mean function; and $\{X_i \in \mathbb{R}^d\}_{i \in \mathbb{Z}}$ is strictly stationary and ergodic with mean $\mathbb{E}X_i = 0$, $i \in \mathbb{Z}$, and autocovariance function (ACVF) $\Gamma_k := \mathbb{E}(X_k X_0^{\top})$, $k \in \mathbb{Z}$. Also denote the symmetrized ACVF by $\Pi_k := (\Gamma_k + \Gamma_{-k})/2$, $k \in \mathbb{Z}$. The ACM of $\bar{X}_n := n^{-1}\sum_{i=1}^n X_i$ is defined by
$$\Sigma := \lim_{n\to\infty} n\,\mathrm{var}(\bar{X}_n) = \sum_{k=-\infty}^{\infty} \Gamma_k = \sum_{k=-\infty}^{\infty} \Pi_k, \qquad (1)$$
provided that the limit exists. Note that the ACM is also known as the time-average covariance matrix, the long-run covariance matrix, and the (scaled) spectral density at zero frequency.

The unknown mean $\mu(\cdot)$ combines trend, seasonality, and jump discontinuities:
$$\mu(i/n) := f(i/n) + \sum_{j=0}^{J} \xi_j\, \mathbb{1}\{D_j \le i < D_{j+1}\}, \qquad (2)$$
where $f := f_n : [0,1] \to \mathbb{R}^d$ is a sequence of continuous functions; $J := J_n$ is the number of CPs; $\xi_j := \xi_{j,n}$ is the mean shift from $f$ in the time period $[D_j, D_{j+1})$, for $j = 0, \ldots, J$; and $D_j := D_{j,n}$ is the $j$th CP, for $j = 1, \ldots, J$, such that $1 \equiv D_0 < D_1 < \cdots < D_J < D_{J+1} \equiv n+1$. Here the indicator $\mathbb{1}\{E\} = 1$ if the event $E$ occurs, otherwise $\mathbb{1}\{E\} = 0$. Without loss of generality, assume $\xi_{j-1} \ne \xi_j$ for all $j = 1, \ldots, J$. For simplicity, we write $\mu_i := \mu(i/n)$.

The unobservable time series $\{X_i\}$ is assumed to admit a causal representation $X_i = g(\mathcal{F}_i)$, where $g(\cdot)$ is a $d$-dimensional measurable function, $\mathcal{F}_i := (\ldots, \varepsilon_{i-1}, \varepsilon_i)$; and $\{\varepsilon_i\}_{i\in\mathbb{Z}}$ are independent and identically distributed multidimensional vectors of innovations (see Wu Citation2005). This framework is general enough to cover many commonly used models, for example, the autoregressive moving average (ARMA) model, the Volterra series, the bilinear (BL) model, the threshold AR model, and the generalized AR conditional heteroscedastic (GARCH) model (see, e.g., Wu Citation2011; Degras et al. Citation2012). More multivariate examples defined under this framework can be found in Sections 1 and 2 of Wu and Zaffaroni (Citation2018).

2.2 Mathematical Notations

The following notations are used in the article. Denote $\mathbb{N} = \{1, 2, \ldots\}$ and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. For $a \in \mathbb{R}$, $\lfloor a \rfloor$ and $\lceil a \rceil$ are the floor and ceiling of $a$, respectively. For $a, b \in \mathbb{R}$, denote $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. When the sample size $n$ is clear, denote $[[a]] = (2 \vee a) \wedge (n-1)$. For real sequences $\{a_n\}$ and $\{b_n\}$, write $a_n \sim b_n$ if $a_n/b_n \to 1$; $a_n = o(b_n)$ if $a_n/b_n \to 0$; and $a_n = O(b_n)$ if there are $M > 0$ and $N$ such that $|a_n/b_n| \le M$ for all $n \ge N$.

Matrices and vectors are written in boldface, while scalars are written in normal face. The $(r, s)$th element of a matrix $A$ is denoted by $A^{[r,s]}$. The $u$th component of a vector $\mu$ is denoted by $\mu^{[u]}$. In the one-dimensional case (i.e., $d = 1$), the ACM in (1) is written as $\Sigma$, $\Sigma^{[1,1]}$, or $\sigma^2$, and the mean function in (2) is written as $\mu(\cdot)$ or $\mu^{[1]}(\cdot)$.

For any matrix $A$, denote its entry-wise absolute value by $|A|$, its trace by $\mathrm{tr}(A)$, its transpose by $A^{\top}$, its column-by-column vectorization by $\mathrm{vec}(A)$, and $A^{\otimes 2} = A A^{\top}$. The diagonalization of a vector $v$ is denoted by $\mathrm{diag}(v)$, that is, a diagonal matrix whose diagonal elements are the elements of $v$. Denote the column vector of ones, the column vector of zeros, and the identity matrix by $\mathbf{1}$, $\mathbf{0}$, and $I$, respectively.

For any real random variable $Z$ and any $p \ge 1$, denote $\|Z\|_p = (\mathbb{E}|Z|^p)^{1/p}$. For any vector-valued random variable $Z$, we write $Z \in \mathcal{L}^p$ if $\|Z^{[u]}\|_p < \infty$ for all $u$. If $\varepsilon_1, \ldots, \varepsilon_n$ are independently and identically distributed (iid) as the standard normal distribution, we write $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, 1)$. If $\varepsilon$ and $\varepsilon'$ are iid, then we say that $\varepsilon'$ is an iid copy of $\varepsilon$.

2.3 Estimation in Stationary Time Series

Suppose that $\mu_1 = \cdots = \mu_n$. There are three standard classes of methods to estimate $\Sigma$. The first one is the subsampling method (see, e.g., Meketon and Schmeiser Citation1984; Carlstein Citation1986; Song and Schmeiser Citation1995; Politis, Romano, and Wolf Citation1999; Chan and Yau Citation2017b). For instance, the overlapping batch means (OBM) estimator is
$$\hat{\Sigma}_{\mathrm{OBM},n} := \frac{l}{n-l+1} \sum_{i=l}^{n} \Big( \frac{1}{l} \sum_{j=i-l+1}^{i} \hat{X}_j \Big)^{\otimes 2}, \qquad (3)$$
where $l \in \mathbb{N} \cap (1, n)$ is the batch size, $\hat{X}_i := Y_i - \bar{Y}_n$, and $\bar{Y}_n := n^{-1}\sum_{i=1}^n Y_i$. The second one is the kernel method (see, e.g., Newey and West Citation1987; Andrews Citation1991; Politis Citation2011). For example, the Bartlett kernel and the quadratic spectral (QS) kernel estimators are, respectively,
$$\hat{\Sigma}_{\mathrm{Bart},n} := \sum_{k=-l}^{l} \mathrm{Bart}(k/l)\, \hat{\Gamma}_k \quad\text{and}\quad \hat{\Sigma}_{\mathrm{QS},n} := \sum_{k=-(n-1)}^{n-1} \mathrm{QS}(k/l)\, \hat{\Gamma}_k, \qquad (4)$$
where $\hat{\Gamma}_k := n^{-1}\sum_{i=|k|+1}^{n} \hat{X}_i \hat{X}_{i-|k|}^{\top}$, $\mathrm{Bart}(t) := (1-|t|)\,\mathbb{1}(|t| \le 1)$, and $\mathrm{QS}(t) := \frac{25}{12\pi^2 t^2}\{\sin(6\pi t/5)/(6\pi t/5) - \cos(6\pi t/5)\}$. The third one is based on the resampling method (see, e.g., Künsch Citation1989; Politis and Romano Citation1994; Paparoditis and Politis Citation2001; Lahiri Citation2003). Recently, a new class of estimators based on orthonormal sequences was proposed (see, e.g., Phillips Citation2005; Sun Citation2013). The choice of kernel or orthonormal sequences is discussed in Lazarus et al. (Citation2018). Besides, Müller (Citation2014) studied the problem under strong autocorrelation.

Estimation of $\Sigma$ is important because it is usually required in the inference of $\mu$, for example, in the construction of SCBs for $\mu$ and in output analysis for Markov chain Monte Carlo (see Flegal and Jones Citation2010; Chan and Yau Citation2016, Citation2017a; Liu and Flegal Citation2018). All three methods above require specifying an unknown bandwidth $l$ (or the batch size, block size, etc.). In practice, $l$ is crucial to the performance of the estimators, but its optimal value is notoriously difficult to estimate (see, e.g., Politis Citation2003; Hirukawa Citation2010).
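For concreteness, the OBM and Bartlett kernel estimators in (3) and (4) can be sketched for a univariate series as follows (a minimal illustration, not the authors' code; the function names and the demeaning convention are our own):

```python
import numpy as np

def obm_est(y, l):
    """Overlapping batch means estimator of the AVC sigma^2, cf. (3)."""
    n = len(y)
    x = y - y.mean()                                   # demeaned series X-hat
    bm = np.convolve(x, np.ones(l) / l, mode="valid")  # all overlapping batch means of size l
    return l / (n - l + 1) * np.sum(bm ** 2)

def bartlett_est(y, l):
    """Bartlett kernel estimator of sigma^2, cf. (4)."""
    n = len(y)
    x = y - y.mean()
    s = x.var()                                        # lag-0 term, Gamma-hat_0
    for k in range(1, l + 1):
        gk = np.mean(x[k:] * x[:-k]) * (n - k) / n     # Gamma-hat_k with 1/n normalization
        s += 2 * (1 - k / l) * gk                      # Bartlett weight (1 - k/l), both lags +-k
    return s
```

Both functions return the same order of magnitude on stationary data; their sensitivity to a shifted mean is exactly the nonrobustness issue discussed in Section 2.4.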

2.4 Estimation in Nonstationary Time Series

Suppose that $\mu_i \ne \mu_j$ for some $i \ne j$. In this case, as far as we know, (i) all existing estimators of $\Sigma$ are restricted to particular forms of mean structure; and (ii) the estimators are not equipped with an optimal bandwidth selector. Some representative estimators are listed below and are summarized in Table 1. For reference, the precise formulas of the estimators are presented in Section C.1 of the supplementary materials.

Table 1 A summary of the robust estimators introduced in Section 2.4, where AC, CV, J, W, and WZ represent the estimators proposed in Altissimo and Corradi (Citation2003), Crainiceanu and Vogelsang (Citation2007), Jirak (Citation2015), Wu (Citation2004), and Wu and Zhao (Citation2007), respectively.

Altissimo and Corradi (Citation2003) proposed estimating $\sigma^2$ by applying a standard kernel estimator to the time series after de-trending by a local mean estimator. The resulting estimator is consistent when the mean is a piecewise constant function with finitely many breaks. However, there are some drawbacks. First, they did not derive the optimal bandwidth. It is possible that the optimal bandwidth of the modified estimator is different from that of the standard kernel estimator. Second, the modified estimator introduces an extra tuning parameter, that is, the bandwidth of the local mean estimator. This bandwidth has to be chosen carefully to obtain a consistent estimator of $\sigma^2$, but its optimal value remains unsolved. A similar method was proposed by Juhl and Xiao (Citation2009) in a hypothesis testing context. However, their estimator is inconsistent under nonstationarity.

Crainiceanu and Vogelsang (Citation2007) found that a CP test has nonmonotonic power if a non-robust estimator of $\sigma^2$ is used. They proposed an estimator of $\sigma^2$ that is robust to one CP. Their idea is to estimate a potential CP and then de-mean the observed time series before and after the estimated CP separately, so that the standard methods in Section 2.3 can be applied to estimate $\sigma^2$. Their remedy mitigates the nonmonotone problem, but it still has some drawbacks. First, it allows a single CP only, and the trend must be piecewise constant. In reality, these assumptions may not be satisfied (see Section 6.1). Second, the optimal bandwidth is estimated by the parametric plug-in method proposed by Andrews (Citation1991). If the parametric model is misspecified, its performance is doubtful. Recently, Jirak (Citation2015) proposed a similar de-trending method for estimating $\sigma^2$ robustly, but the optimal bandwidth selection issue was not addressed.

Wu (Citation2004) and Wu and Zhao (Citation2007) proposed using the first-order difference of nonoverlapping batch means (NBMs) to construct robust estimators of $\sigma^2$. There are some drawbacks. First, NBM-type estimators are less efficient than their overlapping batch means counterparts in terms of mean-squared error (MSE) (see Politis, Romano, and Wolf Citation1999). Thus, their estimators suffer a significant loss in $\mathcal{L}^2$ efficiency. It is worth noting that there is no trivial way to extend their NBM-type estimators to the more efficient OBM-type estimators (see Remark C.2 in the supplementary materials). Second, they did not derive the optimal bandwidth.

Remark 2.1.

Gonçalves and White (Citation2002) proved that two block bootstrap estimators (Künsch Citation1989; Politis and Romano Citation1994) are consistent under a mild nonconstant mean structure, namely $U_n := \sum_{i=1}^n (\mu_i - \bar{\mu}_n)^2/n = o(1/l)$, where $l$ is the block size used in the estimators, and $\bar{\mu}_n = \sum_{i=1}^n \mu_i/n$. However, $U_n = o(1/l)$ does not hold if there is one nontrivial jump in the mean. For example, if $\mu_i = \mathbb{1}(i \ge n/2)$, that is, the mean jumps from 0 to 1 at $n/2$, then $U_n \to 1/4 \ne 0$. Gallant and White (Citation1988) documented similar results in the context of heteroscedasticity and autocorrelation consistent (HAC) variance estimation. In their Theorem 6.8, they showed that standard HAC estimators are biased unless the mean is constant.
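The limit $U_n \to 1/4$ in Remark 2.1 is easy to verify numerically; the following sketch (our own illustration) evaluates $U_n$ for the single-jump mean $\mu_i = \mathbb{1}(i \ge n/2)$:

```python
import numpy as np

def U_n(mu):
    """Mean-structure fluctuation U_n = sum_i (mu_i - mean(mu))^2 / n."""
    return np.mean((mu - mu.mean()) ** 2)

n = 10_000
mu = (np.arange(1, n + 1) >= n / 2).astype(float)  # mean jumps from 0 to 1 at n/2
print(U_n(mu))  # close to 1/4, so U_n = o(1/l) fails for any diverging block size l
```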

3 Jump Robustness

3.1 Motivation

Throughout Section 3, $\mu(\cdot)$ is assumed to be a piecewise constant function with $J$ jumps, that is, $f(x) \equiv 0$. This assumption will be relaxed in Section 4.1. Our proposal is to use a differencing technique consecutively (see Remark 3.1). If the number of jumps $J$ and the magnitudes of the jumps $|\xi_j - \xi_{j-1}|$ are not too large, then each value in the lag-1 difference sequence $\{Y_i - Y_{i-1}\}_{i=2}^n$ is approximately of mean zero. In this case, we have
$$\mathbb{E}\Big\{ \frac{1}{2(n-1)} \sum_{i=2}^{n} (Y_i - Y_{i-1})^{\otimes 2} \Big\} \approx \frac{1}{2}(2\Gamma_0 - \Gamma_1 - \Gamma_{-1}) = \Pi_0 - \Pi_1.$$

So, the semi-average of the lag-1 difference sequence is a potential estimator of $\Pi_0 - \Pi_1$, the spread between the symmetrized ACVF at lag 0 and lag 1. Similarly, a potential estimator of $\Pi_0 - \Pi_k$ is the semi-average of the lag-$k$ difference sequence
$$\hat{\Psi}_k := \frac{1}{2(n - |k|)} \sum_{i=|k|+1}^{n} (Y_i - Y_{i-|k|})^{\otimes 2}, \qquad |k| = 0, 1, \ldots, n-1. \qquad (5)$$

The convention $\hat{\Psi}_t := \hat{\Psi}_{\lceil |t| \rceil \wedge (n-1)}$ is used for $t \in \mathbb{R}$. The summability of $\Pi_k$ in (1) implies $\Pi_L \to 0$ as $L \to \infty$. Hence, $\hat{\Psi}_L$ approximates $\Pi_0$ for large $L$. The bi-differencing estimator
$$\hat{\Pi}_k^{(L)} := \hat{\Psi}_L - \hat{\Psi}_k, \qquad |k| = 0, 1, \ldots, n-1, \qquad (6)$$
is, thus, a potential estimator of $\Pi_k$ when $L$ is large. Observe that the sample mean $\bar{Y}_n$ is not involved in the definition of $\hat{\Pi}_k^{(L)}$; therefore, we can estimate the ACVFs without estimating the mean $\mu$. The concept of bi-differencing is new. The first and second differencing operations (5) and (6) remove the first- and second-order offsets, that is, $\mu(\cdot)$ and $\Pi_0$, respectively. A graphical illustration of the bi-differencing concept can be found in Section A of the supplementary materials. Using the representation $\Sigma = \sum_{k=-\infty}^{\infty} \Pi_k$ in (1), we may use a “naive” estimator of $\Sigma$ as follows:
$$\hat{\Sigma}_{\mathrm{naive},n} := \sum_{k=-l}^{l} K(k/l)\, \hat{\Pi}_k^{(L)} = \sum_{k=-l}^{l} K(k/l)\,(\hat{\Psi}_L - \hat{\Psi}_k), \qquad (7)$$
where $l \in \mathbb{N}$ is a bandwidth, $K(\cdot)$ is a kernel function, and $L = c_0 l$ for some $c_0 \ge 1$.
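A univariate sketch of $\hat{\Psi}_k$ in (5) and the naive estimator (7) may look as follows (our own minimal illustration; the Bartlett-type kernel $K(t) = 1 - |t|$ and $c_0 = 1$ are assumed for concreteness):

```python
import numpy as np

def psi_hat(y, k):
    """Semi-average of the lag-k difference sequence, cf. (5)."""
    k = abs(k)
    if k == 0:
        return 0.0                       # lag-0 differences are identically zero
    d = y[k:] - y[:-k]                   # lag-k differences; approx. mean-free under small jumps
    return np.sum(d ** 2) / (2 * (len(y) - k))

def sigma_naive(y, l, c0=1):
    """Naive bi-differencing estimator (7) with kernel K(t) = 1 - |t|."""
    L = min(c0 * l, len(y) - 1)          # truncate L at n - 1
    return sum((1 - abs(k) / l) * (psi_hat(y, L) - psi_hat(y, k))
               for k in range(-l, l + 1))
```

Note that the single correction term $\hat{\Psi}_L$ is shared by all lags, which is exactly the source of the inflated variance discussed next.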

However, $\mathrm{MSE}(\hat{\Sigma}_{\mathrm{naive},n}^{[r,s]}) \to 0$ slowly for all $r, s \in \{1, \ldots, d\}$. We demonstrate this through a simple Monte Carlo experiment in Section B of the supplementary materials. An explanation of the slow convergence of $\hat{\Sigma}_{\mathrm{naive},n}$ is that the “same correction term” $\hat{\Psi}_L$ is used for all $\hat{\Psi}_k$, $k = 0, \ldots, l$, in (7). So, the variance of the “aggregated correction term” $\sum_{k=-l}^{l} K(k/l)\hat{\Psi}_L = O(l)\hat{\Psi}_L$ increases quadratically with $l$. This is a huge loss in $\mathcal{L}^2$ efficiency because the variance of a standard ACM estimator increases only linearly with $l$ (see Andrews Citation1991, Proposition 1(a)). Our strategy is to replace $\hat{\Pi}_k^{(L)}$ in (7) by $\hat{\Pi}_k^{(L_k)}$ with an appropriately chosen sequence $\{L_k \in \mathbb{R}^+\}_{k \in \mathbb{Z}}$. This sequence should satisfy the following two conditions.

  1. (Bias condition) $L_k \to \infty$ as $l \to \infty$ for each $k$, so that $\hat{\Psi}_{L_k} \approx \Pi_0$ for each $k$. This ensures that $\hat{\Pi}_k^{(L_k)} = \hat{\Psi}_{L_k} - \hat{\Psi}_k$ is able to approximate $\Pi_k$ accurately with a small bias.

  2. (Variance condition) $L_k$ increases with $|k|$ for each $l$, so that asymptotically different correction terms $\hat{\Psi}_{L_0}, \ldots, \hat{\Psi}_{L_l}$ are used to correct $\hat{\Psi}_0, \ldots, \hat{\Psi}_l$ for each $l$. Since $\hat{\Psi}_{L_0}, \ldots, \hat{\Psi}_{L_l}$ are not perfectly correlated, this helps reduce the variance of the new “aggregated correction term” $\sum_{k=-l}^{l} K(k/l)\hat{\Psi}_{L_k}$.

Hence, $L_k$ should increase with both $l$ and $|k|$. One such choice of $L_k$ is a linear combination of $l$ and $|k|$ with positive weights, that is, $L_k = c_0 l + c_1 |k|$, $c_0, c_1 \in \mathbb{R}^+$. Note that $\hat{\Sigma}_{\mathrm{naive},n}$ sets $c_1 = 0$, which violates the variance condition. In the remaining part of this article, we will demonstrate that $L_k = c_0 l + c_1 |k|$ is sufficient to produce optimal results (see Remark 3.2).

Remark 3.1.

Difference-based estimators are not new. They have been used in time series analysis and robust estimation (see, e.g., Anderson Citation1971; Hall, Kay, and Titterington Citation1990; Dette, Munk, and Wagner Citation1998; Hall and Horowitz Citation2013). However, they are restricted to the estimation of the marginal variance $\Gamma_0$. Differently, we aim at estimating the ACM $\Sigma = \sum_{k \in \mathbb{Z}} \Gamma_k$. This is a harder problem than estimating $\Gamma_0$, and it requires a new technique, which we call bi-differencing. Our bi-differencing technique is partially motivated by the bipower variation (Barndorff-Nielsen and Shephard Citation2004) in the context of testing for jumps in a continuous-time series.

Remark 3.2.

As we will show in Theorems 3.1 and 3.2, a linear form of $L_k$ already achieves the optimal convergence rate. Although an incremental improvement may be possible by using a more general form of $L_k$, we leave it for future investigation.

3.2 Proposed Robust Estimators and Overview of Main Results

For estimation of $\Sigma$, we can use the polynomial kernel $K_q(x) = (1 - |x|^q)\,\mathbb{1}\{|x| \le 1\}$ for some $q \in \mathbb{N}$. The jump robust estimator of the ACM $\Sigma$ is then defined by
$$\hat{\Sigma}_{0,q,n} := \sum_{k=-l}^{l} K_q(|k|/l)\, \hat{\Pi}_k = \sum_{k=-l}^{l} \Big\{ 1 - \Big|\frac{k}{l}\Big|^q \Big\} \big\{ \hat{\Psi}_{c_0 l + c_1 |k|} - \hat{\Psi}_k \big\}, \qquad (8)$$
where $l = l_n \in \mathbb{N} \cap (1, n)$. Using other kernels in (8), for example, $\mathrm{QS}(\cdot)$, is also possible; however, we focus on the polynomial kernel $K_q(\cdot)$ in this article to avoid complication. Extension to other kernels is routine. Users may choose their favorite kernel and their favorite sequence $L_k$. Relative to $l$, these choices have slightly less impact on $\hat{\Sigma}_{p,q,n}$, at least in the first-order asymptotics. The effect of the kernel choice on higher-order asymptotics (see, e.g., Lazarus et al. Citation2018) is theoretically interesting. However, it is beyond the scope of this article, and we leave it for future research.

Suppose that $\Sigma_q := \sum_{k=-\infty}^{\infty} |k|^q \Pi_k$ exists and its entries are finite. Under the conditions in Theorems 3.1 and 3.2 and part (1) of Corollary 4.1 (to be presented in Sections 3.3 and 4.1), the value of $\mathrm{MSE}(\hat{\Sigma}_{0,q,n}^{[r,s]}) = \mathbb{E}(\hat{\Sigma}_{0,q,n}^{[r,s]} - \Sigma^{[r,s]})^2$ is given by
$$\mathrm{MSE}(\hat{\Sigma}_{0,q,n}^{[r,s]}) \sim (\Sigma_q^{[r,s]})^2 \frac{1}{l^{2q}} + \Big[ \frac{4q^2(1+c_1)\{\Sigma^{[r,r]}\Sigma^{[s,s]} + (\Sigma^{[r,s]})^2\}}{(q+1)(2q+1)} \Big] \frac{l}{n} \qquad (9)$$
for each $r, s$. Hence, if $l = O(n^{1/(1+2q)})$, then $\mathrm{MSE}(\hat{\Sigma}_{0,q,n}^{[r,s]}) = O(n^{-2q/(1+2q)})$, which is the optimal convergence rate achieved by the standard estimators (see, e.g., Andrews Citation1991). In other words, the proposed robust estimator $\hat{\Sigma}_{0,q,n}$ is rate-optimal in the $\mathcal{L}^2$ sense.

From (9), the MSE of the proposed estimator $\hat{\Sigma}_{0,q,n}^{[r,s]}$ depends on $\Sigma_q^{[r,s]}$. Hence, its MSE-optimal bandwidth $l$ also depends on $\Sigma_q$. As a result, a robust estimator of $\Sigma_q$ is also important for estimating the optimal bandwidth. This phenomenon is similar to the classical results in non-robust estimation of $\Sigma$ (see, e.g., Andrews Citation1991). It motivates us to study robust estimation of all of $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$. Similar to (8), our proposed jump robust estimator of $\Sigma_p = \sum_{k \in \mathbb{Z}} |k|^p \Pi_k$ ($p \in \mathbb{N}_0$) is defined as
$$\hat{\Sigma}_{p,q,n} := \hat{\Sigma}_{p,q,n}(Y_{1:n}, l, c_0, c_1) := \sum_{k=-l}^{l} K_q(|k|/l)\, |k|^p\, \hat{\Pi}_k. \qquad (10)$$
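In the univariate case, the estimator (10) with $\hat{\Pi}_k = \hat{\Psi}_{c_0 l + c_1|k|} - \hat{\Psi}_k$ can be sketched as follows (a minimal illustration under our own default choices $c_0 = c_1 = 1$; not the authors' implementation):

```python
import numpy as np

def psi_hat(y, k):
    """Semi-average of the lag-k difference sequence, cf. (5)."""
    k = min(abs(int(round(k))), len(y) - 1)   # convention: clip the lag at n - 1
    if k == 0:
        return 0.0
    d = y[k:] - y[:-k]
    return np.sum(d ** 2) / (2 * (len(y) - k))

def sigma_pqn(y, l, p=0, q=2, c0=1.0, c1=1.0):
    """Jump robust estimator (10) with polynomial kernel K_q(x) = 1 - |x|^q."""
    s = 0.0
    for k in range(-l, l + 1):
        kq = 1 - (abs(k) / l) ** q                                # K_q(|k|/l)
        pi_k = psi_hat(y, c0 * l + c1 * abs(k)) - psi_hat(y, k)   # bi-differenced Pi-hat_k
        s += kq * abs(k) ** p * pi_k
    return s
```

The lag-dependent correction term $\hat{\Psi}_{c_0 l + c_1|k|}$ is the only change relative to the naive estimator (7), yet it is what restores the linear-in-$l$ variance growth.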

The statistical meanings of $p$ and $q$ are summarized in the first two rows of Table 2.

Table 2 Summary of the statistical meanings of p, q, P and their associated quantities.

3.3 Theoretical Results

We develop a general estimation procedure that takes various levels of serial dependence into account. For each $P \in \mathbb{N}_0$, define $\Upsilon_P := \sum_{k=-\infty}^{\infty} |k|^P |\Pi_k|$. The finiteness of $\Upsilon_P$ characterizes the strength of serial dependence; thus, it usually serves as an assumption for proving consistency of estimators (see, e.g., Politis Citation2011, Theorem 1). More precisely, we follow Chan and Yau (Citation2017b) to define the coefficient of serial dependence of $\{X_i\}$ by
$$\mathrm{CSD}(X) := \sup\{P \in \mathbb{N}_0 : \Upsilon_P^{[\star,\star]} < \infty\}, \quad\text{where}\quad \Upsilon_P^{[\star,\star]} := \max_{r,s \in \{1,\ldots,d\}} \Upsilon_P^{[r,s]}. \qquad (11)$$

Clearly, the larger the value of $\mathrm{CSD}(X)$, the weaker the serial dependence. For example, consider a univariate fractional Gaussian noise process (Davies and Harte Citation1987), defined as a Gaussian process with ACVF $\Gamma_k = a(|k| + c)^{-b}$ for each $k$, where $a, c > 0$ and $b > 1$. In this model, $\Upsilon_P < \infty$ if and only if $P < b - 1$. Hence, $\mathrm{CSD}(X) = \sup\{P \in \mathbb{N}_0 : P < b - 1\}$. More examples and their associated values of $\Upsilon_P$ can be found in Appendix B of Chan and Yau (Citation2017b). Some multivariate examples can be found in Section C.3 of Chan and Yau (Citation2017a). As we shall see in Section 3.3.1, the assumption on the CSD plays a critical role in controlling the bias of the estimator $\hat{\Sigma}_{p,q,n}$.

Asymptotic theories are built on the framework of dependence measures (see Wu Citation2005). Recall that $X_i = g(\mathcal{F}_i)$ and $\mathcal{F}_i := (\ldots, \varepsilon_{i-1}, \varepsilon_i)$ (see Section 2.1). Let $\varepsilon_j'$ be an iid copy of $\varepsilon_j$. Denote $X_{i,\{j\}} := g(\mathcal{F}_{i,\{j\}})$ and $\mathcal{F}_{i,\{j\}} := (\mathcal{F}_{j-1}, \varepsilon_j', \varepsilon_{j+1}, \ldots, \varepsilon_i)$. Define the physical dependence measure and its aggregated value by, respectively,
$$\delta_{4,i}^{[u]} := \big\| X_i^{[u]} - X_{i,\{0\}}^{[u]} \big\|_4 \quad\text{and}\quad \Delta_4^{[u]} := \sum_{i=0}^{\infty} \delta_{4,i}^{[u]}.$$

For example, consider a univariate linear process (Brockwell and Davis Citation1991, Definition 3.2.1) defined as $X_i = \sum_{j=0}^{\infty} c_j \varepsilon_{i-j}$, where $\{c_j\}$ are real coefficients such that $\sum_{j=0}^{\infty} |c_j| < \infty$, and $\{\varepsilon_j\}$ are iid noises such that $\mathbb{E}|\varepsilon_0|^4 < \infty$. Then $\delta_{4,i}^{[1]} = K|c_i|$ for each $i$, and $\Delta_4^{[1]} = K \sum_{j=0}^{\infty} |c_j| < \infty$, where $K = \|\varepsilon_i - \varepsilon_i'\|_4 < \infty$. More univariate examples and their associated values of the physical dependence measure can be found in Examples 1–11 of Wu (Citation2011). See also Models I–VI in Example 1 of Chan and Yau (Citation2017a) for some multivariate examples. Finiteness of $\Delta_4^{[\star]} := \max_{u \in \{1,\ldots,d\}} \Delta_4^{[u]}$ (i.e., Assumption 3.1) is a mild and easily verifiable condition for studying asymptotic properties (see Wu Citation2007).

Assumption 3.1

(Short-range dependence). The time series $\{X_i\}$ satisfies $\Delta_4^{[\star]} < \infty$.

Assumption 3.1 rules out time series having very strong serial dependence, for example, time series with $\Sigma^{[r,s]} = \infty$. Indeed, Assumption 3.1 implies the existence of $\Sigma$. More importantly, it leads to the invariance principle for the (scaled) partial sum $\sum_{i=1}^{\lfloor tn \rfloor} X_i/\sqrt{n}$ for $0 \le t \le 1$, which is required for deriving the variance of $\hat{\Sigma}_{p,q,n}$. Assumption 3.1 is satisfied by many important time series models, including the aforementioned linear process and the ARMA and BL models (see, e.g., Wu Citation2005; Liu and Wu Citation2010). Note also that some parallel formulations of dependence, such as the strong mixing coefficient (Rosenblatt Citation1985), have been widely adopted by researchers. However, mixing-type assumptions are sometimes difficult to verify. On the contrary, Assumption 3.1 is more easily verifiable (see Wu Citation2011).

We also need to regularize the size of the bandwidth $l$. Denote $\mathcal{J} := \{1, \ldots, J\}$ and $\mathcal{J}_u := \{j \in \mathcal{J} : \mu_{D_j - 1}^{[u]} \ne \mu_{D_j}^{[u]}\}$ for $u = 1, \ldots, d$.

Assumption 3.2

(Conditions on $l$). The bandwidth $l = l_n$ satisfies (i) $l \to \infty$ as $n \to \infty$, (ii) $l = o(n)$ as $n \to \infty$, and (iii) $\{(c_0 + c_1) \vee 1\}\, l \le \inf_{u \in \{1,\ldots,d\}} \inf_{j \in \mathcal{J}_u} (D_{j+1} - D_j)$.

In Assumption 3.2, conditions (i) and (ii) require that the size of $l$ cannot be too small or too large, respectively. These conditions are commonly required in the small-$l$ subsampling approach (i.e., $l/n \to 0$) (see Politis, Romano, and Wolf Citation1999). Condition (iii) states that two consecutive CPs cannot be too close within the same component of the time series. Indeed, condition (iii) is stronger than needed, but it makes the derivations easier.

3.3.1 Bias and Variance Expressions

Let $\chi := \mathbb{1}\{c_1 \ne 1 \text{ or } c_0 \ne F_{p,q}\}$, where $F_{p,q} := (p+2)(p+q+2)/\{(p+1)(p+q+1)\}$. Also let
$$a_n := \sup_{u \in \{1,\ldots,d\}} \sup_{j \in \mathcal{J}_u} \big| \mu_{D_j}^{[u]} - \mu_{D_j - 1}^{[u]} \big|.$$

Also recall that $J = J_n$ denotes the number of CPs (see (2)). The bias of the jump robust estimator, $\mathrm{Bias}(\hat{\Sigma}_{p,q,n}^{[r,s]}) := \mathbb{E}(\hat{\Sigma}_{p,q,n}^{[r,s]}) - \Sigma_p^{[r,s]}$, is given below.

Theorem 3.1

(Bias of the estimator). Suppose that $X_1 \in \mathcal{L}^2$, $f(x) \equiv 0$, $\mathrm{CSD}(X) = P \ge p + q$, and Assumption 3.2 holds, where $p \in \mathbb{N}_0$ and $q \in \mathbb{N}$. Then, for $c_0, c_1 \in \mathbb{R}^+$ and $r, s \in \{1, \ldots, d\}$,
$$\mathrm{Bias}(\hat{\Sigma}_{p,q,n}^{[r,s]}) = -\frac{1}{l^q}\, \Sigma_{p+q}^{[r,s]} + r_n^{\mathrm{bias}}, \qquad r_n^{\mathrm{bias}} = O\Big\{ \frac{l^{p+1}}{n} \Big( l^{\chi} + \frac{l^2}{n} \Big) a_n^2 J_n \Big\} + o\Big( \frac{1}{l^q} \Big). \qquad (12)$$

In Theorem 3.1, the assumption $\mathrm{CSD}(X) \ge p + q$ controls the rate of decay of the ACVF. For a fixed $p$, if the value of $\mathrm{CSD}(X)$ is larger, then $q$ can be taken larger, and the autocorrelation is weaker. Consequently, the autocorrelations at large lags introduce only a small bias to $\hat{\Sigma}_{p,q,n}^{[r,s]}$. Hence, it makes sense that the magnitude of the leading term of the bias in (12), that is, $|\Sigma_{p+q}^{[r,s]}|/l^q = O(1/l^q)$, is decreasing in $q$. Besides, $J_n$ and $a_n$ determine the frequency of the CPs and the magnitude of the jumps, respectively. From (12), if $a_n^2 J_n$ is not too large so that $r_n^{\mathrm{bias}} = o(1/l^q)$, the dominating term of the asymptotic bias is $\Sigma_{p+q}^{[r,s]}/l^q$ in magnitude. Consequently, $\hat{\Sigma}_{p,q,n}^{[r,s]}$ is asymptotically unbiased as $l \to \infty$. Technical conditions for controlling $r_n^{\mathrm{bias}}$ are discussed in Corollary 4.1. Moreover, $c_0$ and $c_1$ do not affect the first-order asymptotic bias of $\hat{\Sigma}_{p,q,n}$.

Define $\Xi^{[r,s]} := \Sigma^{[r,r]}\Sigma^{[s,s]} + (\Sigma^{[r,s]})^2$. The variance of $\hat{\Sigma}_{p,q,n}^{[r,s]}$ is given below.

Theorem 3.2

(Variance of the estimator). Suppose that $X_1 \in \mathcal{L}^{\nu}$ for some $\nu > 4$, $f(x) \equiv 0$, and Assumptions 3.1 and 3.2 hold. If $p \in \mathbb{N}_0$, $q \in \mathbb{N}$, $c_0, c_1 \in \mathbb{R}^+$, and $r, s \in \{1, \ldots, d\}$, then
$$\mathrm{var}(\hat{\Sigma}_{p,q,n}^{[r,s]}) = \frac{4q^2(1+c_1)\, \Xi^{[r,s]}\, l^{1+2p}}{(2p+1)(2p+q+1)(2p+2q+1)\, n} + r_n^{\mathrm{var}}, \qquad r_n^{\mathrm{var}} = O\Big[ \frac{l^{1+2p}}{n} \Big\{ \frac{l^2}{n}(1 + a_n^2 J_n) + o(1) \Big\} \Big]. \qquad (13)$$

Theorem 3.2 requires Assumption 3.1 because its proof relies on the invariance principle, which is guaranteed by Assumption 3.1 (see, e.g., Wu Citation2005). However, the detailed strength of serial dependence (i.e., the CSD) is not important for deriving (13). Besides, unlike the asymptotic bias, the variance of $\hat{\Sigma}_{p,q,n}$ also depends on $L_k$; however, only $c_1$, not $c_0$, is relevant. Since $c_1$ determines the speed of divergence of $L_k$ as $|k| \to \infty$, the variance in (13) is naturally increasing in $c_1$. Note that $c_0$ and $c_1$ are not tuning parameters for balancing the leading terms of the bias and variance because they are not involved in the leading term of (12). Although $c_0$ and $c_1$ can be chosen optimally by balancing the second-order bias and variance, that is, $r_n^{\mathrm{bias}}$ and $r_n^{\mathrm{var}}$, the effect on $\hat{\Sigma}_{p,q,n}^{[r,s]}$ is relatively incremental.

Consider $l = O(n^{\theta})$ for some $\theta \in (0, 1)$. The MSE-optimal value of $\theta$ can be found by balancing the squared bias and the variance of $\hat{\Sigma}_{p,q,n}^{[r,s]}$ so that the MSE is minimized. Assume $a_n^2 J_n$ grows sufficiently slowly so that $r_n^{\mathrm{bias}} = o(1/l^q)$ and $r_n^{\mathrm{var}} = o(l^{1+2p}/n)$ (see Corollary 4.1 for explicit conditions that guarantee this). In this case, Theorems 3.1 and 3.2 imply that
$$\mathrm{MSE}(\hat{\Sigma}_{p,q,n}^{[r,s]}) = \{\mathrm{Bias}(\hat{\Sigma}_{p,q,n}^{[r,s]})\}^2 + \mathrm{var}(\hat{\Sigma}_{p,q,n}^{[r,s]}) = O(1/l^{2q}) + O(l^{1+2p}/n). \qquad (14)$$

If $l = O(n^{\theta^{\star}})$, then (14) achieves its minimum order, that is, $\mathrm{MSE}(\hat{\Sigma}_{p,q,n}^{[r,s]}) = O(n^{-\lambda^{\star}})$, where
$$\theta^{\star} := 1/(1 + 2p + 2q) \quad\text{and}\quad \lambda^{\star} := 2q/(1 + 2p + 2q). \qquad (15)$$

Note that the superscript “$\star$” indicates optimal values. It is worth mentioning that the robust estimator $\hat{\Sigma}_{p,q,n}$ achieves the same optimal $\mathcal{L}^2$ convergence rate as its non-robust counterparts (see, e.g., Andrews Citation1991; Chan and Yau Citation2017b).
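The rate calculation in (14)–(15) can be verified mechanically: with $l \asymp n^{\theta}$, the squared bias is of order $n^{-2q\theta}$ and the variance of order $n^{(1+2p)\theta - 1}$, and $\theta^{\star}$ equates the two exponents. A small sketch (our own check) using exact rational arithmetic:

```python
from fractions import Fraction

def optimal_rate(p, q):
    """Balance the exponents of squared bias and variance, cf. (14)-(15)."""
    theta = Fraction(1, 1 + 2 * p + 2 * q)   # theta* from (15)
    bias_sq_exp = -2 * q * theta             # order of 1/l^{2q} in powers of n
    var_exp = (1 + 2 * p) * theta - 1        # order of l^{1+2p}/n in powers of n
    assert bias_sq_exp == var_exp            # exponents balance exactly at theta*
    return theta, -bias_sq_exp               # (theta*, lambda*)

theta, lam = optimal_rate(p=0, q=2)
# For (p, q) = (0, 2): theta* = 1/5 and lambda* = 4/5, matching (15).
```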

3.3.2 Theoretically Optimal Bandwidth

In this subsection, we derive the optimal $l \sim \phi n^{\theta^{\star}}$, $\phi \in \mathbb{R}^+$, such that the MSE of $\hat{\Sigma}_{p,q,n}$ is optimized up to the first order, including its proportionality constant.

Suppose $r_n^{\mathrm{bias}} = o(1/l^q)$ and $r_n^{\mathrm{var}} = o(l^{1+2p}/n)$. Let $W$ be a weight matrix specifying the entry-wise importance of $\Sigma_p$. For example, $W = (\mathbb{1}\{r \le s\})_{r,s=1}^d$ puts equal weight on each element of the upper triangular part (including the diagonal) of $\Sigma_p$. Write $W \ge 0$ if $W^{[r,s]} \ge 0$ for all $r, s$, and $W^{[r,s]} > 0$ for at least one pair of $r, s$. From now on, assume $W \ge 0$. Denote $\mathcal{W} := \mathrm{diag}\{\mathrm{vec}(W)\}$ (see Section 2.2 for the definitions of $\mathrm{diag}(\cdot)$ and $\mathrm{vec}(\cdot)$). Then the optimal value of $\phi$ is the minimizer of
$$\mathrm{AMSE}_{p,q,W}(\hat{\Sigma}_{p,q,\cdot}) := \lim_{n\to\infty} n^{2q/(1+2p+2q)}\, \mathrm{MSE}_W(\hat{\Sigma}_{p,q,n}), \quad\text{where}\quad \mathrm{MSE}_W(\hat{\Sigma}_{p,q,n}) := \mathbb{E}\big[ \{\mathrm{vec}(\hat{\Sigma}_{p,q,n} - \Sigma_p)\}^{\top}\, \mathcal{W}\, \{\mathrm{vec}(\hat{\Sigma}_{p,q,n} - \Sigma_p)\} \big]. \qquad (16)$$

The weighted MSE in (16) generalizes the $\mathcal{L}^2$ risk under the Frobenius norm $\|\cdot\|_F$ because, for any square matrix $A$, $\{\mathrm{vec}(A)\}^{\top}\mathcal{W}\{\mathrm{vec}(A)\} = \mathrm{tr}(A A^{\top}) = \|A\|_F^2$ if $W = \mathbf{1}\mathbf{1}^{\top}$. A similar weighting rule is adopted by Andrews (Citation1991) and Chan and Yau (Citation2017a). By Theorems 3.1 and 3.2, the optimal value of $\phi$ is $\phi^{\star}$, where
$$\phi^{\star} := \phi_{p,q}^{\star} := \Big\{ \frac{(2p+q+1)(2p+2q+1)\, \kappa_{p+q}}{2q(1+c_1)} \Big\}^{\theta^{\star}}, \qquad (17)$$
$$\kappa_{p+q} := \frac{(\mathrm{vec}\,\Sigma_{p+q})^{\top}\, \mathcal{W}\, (\mathrm{vec}\,\Sigma_{p+q})}{(\mathrm{vec}\,\Sigma)^{\top}\, \mathcal{W}\, (\mathrm{vec}\,\Sigma) + \mathrm{tr}\{\mathcal{W}(\Sigma \otimes \Sigma)\}}. \qquad (18)$$

Here $A \otimes B$ is the Kronecker product of $A$ and $B$. Note that the size of the optimal bandwidth $l \sim \phi^{\star} n^{\theta^{\star}}$ depends on the two parameters $\theta^{\star}$ and $\phi^{\star}$.

  • The parameter $\theta^{\star} = 1/(1 + 2P)$, with $P = p + q$, controls the divergence rate of $l = O(n^{\theta^{\star}})$. Recall from Table 2 that if $P$ is small, then the serial dependence is strong. Hence, it makes sense to have a larger optimal bandwidth $l$ to cover more autocovariances.

  • The parameter $\phi^{\star}$ controls the leading coefficient of $l$. The value of $\phi^{\star}$ depends on the unknown $\kappa_{p+q}$. We interpret this quantity for univariate $\{X_i\}$. In this case, $\kappa_{p+q} = (\Sigma_{p+q}/\Sigma)^2/2$, which is not purely increasing in the strength of autocorrelation. Indeed, it also depends on the sign of the autocorrelation. For example, if $\Gamma_k = \rho^{|k|}$ for some $\rho \in (-1, 1)$, then $\kappa_2 = 2\rho^2/(1-\rho)^4$, which is not an increasing function of $|\rho|$. This interesting phenomenon also exists in standard variance estimation (see Andrews Citation1991, (5.1) and (5.2)). Estimation of $\phi^{\star}$ is presented in Section 4.3.
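The closed form $\kappa_2 = 2\rho^2/(1-\rho)^4$ for the ACVF $\Gamma_k = \rho^{|k|}$ can be checked by truncating the series $\Sigma = \sum_k \Gamma_k$ and $\Sigma_2 = \sum_k k^2 \Gamma_k$ (our own numerical sketch; the truncation point $K$ is an arbitrary choice):

```python
def kappa2(rho, K=2000):
    """kappa_2 = (Sigma_2 / Sigma)^2 / 2 for the univariate ACVF Gamma_k = rho^|k|."""
    sigma = sum(rho ** abs(k) for k in range(-K, K + 1))            # Sigma, truncated
    sigma2 = sum(k * k * rho ** abs(k) for k in range(-K, K + 1))   # Sigma_2, truncated
    return (sigma2 / sigma) ** 2 / 2

# Closed form from the text: 2 * rho**2 / (1 - rho)**4; e.g., rho = 0.5 gives 8.
```

Comparing the two for positive and negative $\rho$ illustrates the sign dependence noted above.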

Formula (17) handles all entries of $\Sigma$ simultaneously. If the dependence structures of $\{X_i\}$ vary dramatically across entries, we may construct an entry-adaptive optimal bandwidth. Let $e_u := (0, \ldots, 0, 1, 0, \ldots, 0)^{\top}$ be the $u$th elementary $d$-vector, that is, $e_u^{[v]} = \mathbb{1}\{v = u\}$ for all $v \in \{1, \ldots, d\}$. Setting $W = e_r e_s^{\top}$, we can produce the optimal bandwidth for the $(r, s)$th entry of $\Sigma$. The resulting optimal asymptotic MSE (AMSE) of $\hat{\Sigma}_{p,q,n}^{[r,s]}$ is given by
$$n^{\lambda_{p,q}^{\star}}\, \mathbb{E}\big( \hat{\Sigma}_{p,q,n}^{[r,s]} - \Sigma_p^{[r,s]} \big)^2 \to \frac{1}{1-\lambda^{\star}} \Big\{ \frac{\lambda^{\star}(1+c_1)}{2p+q+1} \Big\}^{\lambda^{\star}} \Xi^{[r,s]} \Big\{ \frac{(\Sigma_{p+q}^{[r,s]})^2}{\Xi^{[r,s]}} \Big\}^{1-\lambda^{\star}}. \qquad (19)$$

4 Extension, Discussion, and Implementation

4.1 Extension to Trend Robustness

In this section, we consider the full generalization of $\{Y_i\}$; that is, the assumption $f(x) \equiv 0$ is removed. We measure the amount of fluctuation of $f = f_n$ by
$$b_n := \sup_{u \in \{1,\ldots,d\}} \sup_{x \ne x' \in [0,1]} \Big| \frac{f_n^{[u]}(x) - f_n^{[u]}(x')}{x - x'} \Big|,$$
which is small if the fluctuation of $f_n$ does not grow too fast with $n$. The following two theorems state the bias and variance of $\hat{\Sigma}_{p,q,n}$ when $\mu(\cdot)$ consists of jumps and trends.

Theorem 4.1

(Bias of the estimator). If the assumption $f(x) \equiv 0$ is removed, then, under all other conditions of Theorem 3.1, (12) holds with $r_n^{\mathrm{bias}}$ replaced by $R_n^{\mathrm{bias}} = r_n^{\mathrm{bias}} + O\{l^{p+3}(b_n^2 + J_n a_n b_n)/n^2\}$.

Theorem 4.2

(Variance of the estimator). If the assumption $f(x) \equiv 0$ is removed, then, under all other conditions of Theorem 3.2, (13) holds with $r_n^{\mathrm{var}}$ replaced by $R_n^{\mathrm{var}} = r_n^{\mathrm{var}} + O(l^{4+2p} b_n^2/n^3)$.

The optimal bandwidth in (17) and the optimal MSE in (19) remain valid, provided that $R_n^{\mathrm{bias}} = o(1/l^q)$ and $R_n^{\mathrm{var}} = o(l^{1+2p}/n)$. Consequently, $\hat{\Sigma}_{p,q,n}$ achieves the optimal convergence rate even in the presence of jumps and continuous trends. The remainder terms $R_n^{\mathrm{bias}}$ and $R_n^{\mathrm{var}}$ are influenced by (i) the jump effect $a_n^2 J_n$, (ii) the trend effect $b_n^2$, and (iii) their joint effect $a_n b_n J_n$. Using these three factors, we define the following classes of mean functions:
$$\mathcal{M} := \{\mu(\cdot) : a_n^2 J_n = o(n^{\theta^{\star}(p+q-\chi)}),\; a_n b_n J_n + b_n^2 = o(n^{\theta^{\star}(3p+3q-1)})\},$$
$$\mathcal{M}' := \{\mu(\cdot) : a_n^2 J_n = o(n^{\theta^{\star}(p+2q-\chi)}),\; a_n b_n J_n + b_n^2 = o(n^{\theta^{\star}(3p+4q-1)})\}.$$

Both M′ and M include only reasonably well-behaved mean functions μ(·) for which the aforementioned effects (i), (ii), and (iii) are small. Clearly, M′ ⊆ M. Simple conditions for controlling R_n^bias and R_n^var are given below.

Corollary 4.1

Assume the conditions in Theorems 4.1 and 4.2. Let l = O(n^θ).

  1. If μ(·) ∈ M′, then R_n^bias = o(1/l^q) and R_n^var = o(l^{1+2p}/n).

  2. If μ(·) ∈ M, then R_n^bias = o(1) and R_n^var = o(1).

The above results remain valid if Rnbias and Rnvar are replaced by rnbias and rnvar, respectively.

Corollary 4.1 ensures that Σ̂p,q,n is L2-consistent if μ(·) belongs to the well-behaved class M. If μ(·) belongs to the more well-behaved class M′, we also have the optimality results (17) and (19), which imply that the convergence rate of the estimator Σ̂p,q,n is not affected by jumps and trends. However, the standard (non-robust) estimators, for example, Σ̂OBM,n in (3) and Σ̂QS,n in (4), are not guaranteed to be consistent if μ(·) ∈ M.

For example, consider the estimator Σ̂0,2,n with l = O(n^{1/5}), and any c0 > 0 and c1 ≥ 1. It is L2-consistent and satisfies the optimality results (17) and (19) if μ(·) belongs to(20) M′ = {μ(·) : a_n²J_n = o(n^{1/5}), a_n b_n J_n + b_n² = o(n)}.(20)

The class M′ in (20) includes (but is not restricted to) mean functions having piecewise Lipschitz-continuous trends with at most J_n = o(n^{1/5}) bounded jumps. Note that such J_n is allowed to diverge to infinity as n → ∞ (see the last row of the comparison table for a summary and a comparison with existing robust estimators). We illustrate Corollary 4.1 through a simple simulation experiment. Let Xi = 0.5X_{i−1} + 0.5ε_{i−1} + εi, where εi ~iid N(0,1). Consider(21) μ(t) = 4 × 1{0.2 ≤ t < 0.3} + 2e^{2t} + sin(8πt),(21) which consists of two CPs, an exponentially increasing trend, and a periodic structure. In this case, a_n = 4, b_n = 4(e² + 2π), and J_n = 2. Hence, the mean function (21) is a member of the class M′ defined in (20). Figure 1(a) shows a typical realization of Yi = Xi + μ_i (1 ≤ i ≤ 400), and Figure 1(b) shows the density functions of Σ̂0,2,n and Σ̂QS,n. The proposed estimator Σ̂0,2,n concentrates around the true value Σ = 9, whereas the standard estimator Σ̂QS,n is clearly off the targeted value.
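The stated target value Σ = 9 can be verified directly: for an ARMA(1,1) process with unit-variance innovations, the long-run variance equals (1 + b)²/(1 − a)², a standard fact (the squared transfer function at frequency zero). A quick check for the model above with a = b = 0.5:

```python
# Long-run variance (AVC) of the ARMA(1,1) noise X_i = a*X_{i-1} + b*eps_{i-1} + eps_i
# with Var(eps_i) = sigma_eps2: Sigma = sigma_eps2 * (1 + b)^2 / (1 - a)^2.
def arma11_long_run_variance(a, b, sigma_eps2=1.0):
    return sigma_eps2 * (1 + b) ** 2 / (1 - a) ** 2

print(arma11_long_run_variance(0.5, 0.5))  # 9.0, matching the true value Sigma = 9
```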

Fig. 1 (a) A typical realization of the time series with the mean function defined in (21). (b) The density functions of Σ̂0,2,n and Σ̂QS,n when n = 400. The true value is Σ = 9.


4.2 Comparison With Standard Estimators

The estimator Σ̂p,q,n sacrifices statistical efficiency to gain robustness. In this section, we investigate how much efficiency is lost.

The proposed robust estimator Σ̂0,1,n uses the Bartlett kernel K1(·) ≡ Bart(·). So, we compare it with the standard non-robust Bartlett kernel estimator Σ̂Bart,n defined in (4). Denote the optimal bandwidths for Σ̂0,1,n and Σ̂Bart,n by l*0,1 and l*Bart, respectively. According to (15) and (17), and Equation (5.2) of Andrews (1991), they are given by l*0,1 ≍ {3κ1 n/(2(1+c1))}^{1/3} and l*Bart ≍ (3κ1 n/2)^{1/3}, respectively, where κ1 is defined in (18). Denote the resulting optimal estimators by Σ̂*0,1,n and Σ̂*Bart,n, respectively. The ratio of their weighted MSEs (see (16)) is given below.

Proposition 4.1

Assume the conditions in Theorems 3.1 and 3.2. Let c0, c1 > 0, W ≠ 0, and μ(t) = 0 for all t ∈ [0,1]. Then MSE_W(Σ̂*0,1,n)/MSE_W(Σ̂*Bart,n) → (1+c1)^{2/3} > 1.

According to Proposition 4.1, the non-robust estimator Σ̂*Bart,n is asymptotically more efficient than the robust estimator Σ̂*0,1,n, which is to be expected. Note that the efficiency loss is smaller if c1 is smaller. However, in finite samples, setting c1 ≈ 0 may degenerate the estimator to the naive estimator Σ̂naive,n defined in (7). Hence, using a small c1 > 0 is suggested only if the sample size n is extremely large. Practical suggestions for selecting c1 are discussed in Section 4.3.

Besides, we also compare our estimator with the most promising (univariate) robust estimator, proposed by Wu, Woodroofe, and Mentz (2001), Wu (2004), and Wu and Zhao (2007), namely,(22) σ̂²WZ3,n := l{2(m−1)}^{−1} Σ_{k=2}^{m} (A_k − A_{k−1})²,(22) where m = ⌊n/l⌋, and A_k = l^{−1} Σ_{i=(k−1)l+1}^{kl} Y_i is the kth non-overlapping batch mean (NBM) for k = 1,…,m. The optimal MSE of σ̂²WZ3,n was not derived by the authors; for reference, we derive it under the constant-mean assumption. Applying techniques similar to those in Theorems 3.1 and 3.2, we have Bias(σ̂²WZ3,n) ≍ Σ1/l and var(σ̂²WZ3,n) ≈ 7Σ²l/(2n). The optimal bandwidth is l* ≍ {4Σ1²n/(7Σ²)}^{1/3}. Consequently, MSE(Σ̂*0,1,n)/MSE(σ̂*²WZ3,n) → {8(1+c1)/21}^{2/3}. In particular, when c1 = 1, our estimator Σ̂*0,1,n is uniformly better than σ̂*²WZ3,n, and MSE(Σ̂*Bart,n) : MSE(Σ̂*0,1,n) : MSE(σ̂*²WZ3,n) ≈ 1.00 : 1.59 : 1.90 when n is large and their respective optimal bandwidths are used.
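The closing ratios 1.00 : 1.59 : 1.90 follow from the two displayed MSE ratios; a quick numeric check with c1 = 1:

```python
# Reproduce the limiting MSE ratios of Section 4.2 with c1 = 1:
# MSE(Bartlett*) : MSE(Sigma*_{0,1,n}) : MSE(sigma*_{WZ3,n}^2) ~ 1.00 : 1.59 : 1.90.
c1 = 1
r_mac_vs_bart = (1 + c1) ** (2 / 3)            # from Proposition 4.1
r_mac_vs_wz3 = (8 * (1 + c1) / 21) ** (2 / 3)  # MSE(MAC*) / MSE(WZ3*), derived above
r_wz3_vs_bart = r_mac_vs_bart / r_mac_vs_wz3   # implied MSE(WZ3*) / MSE(Bartlett*)
print(round(r_mac_vs_bart, 2), round(r_wz3_vs_bart, 2))  # 1.59 1.9
```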

4.3 Choices of q, c0, c1, and l

The best estimator in Wu and Zhao (2007) has an MSE of size O(n^{−2/3}), whereas our proposed estimator Σ̂0,q,n has a much smaller MSE, namely O{n^{−2q/(1+2q)}}, if q > 1. In practice, if there is no prior information, we suggest q = 2, that is, assuming CSD(X) = 2, which is essentially equivalent to the assumption ϒ2 < ∞ made by Paparoditis and Politis (2001).

Although we develop the theory for all c1 > 0, it makes little sense, statistically or intuitively, to use c1 ∈ (0,1). To see this, observe that Π̂k(L_k) = Ψ̂_{L_k} − Ψ̂_{|k|} is a reasonable estimator of Πk only if L_k > |k|, which is satisfied for all k if and only if c1 ≥ 1. Hence, it is sensible (but not necessary) to assume c1 ∈ [1,∞), among which c1 = 1 minimizes the AMSE. So, c1 = 1 is suggested in practice. For q > 1, Σ̂p,q,n has the same AMSE for any c0 > 0; hence, c0 does not affect the asymptotic behavior. We illustrate in Section C.4 of the supplementary materials that the finite-sample performance of Σ̂p,q,n is essentially the same for any c0 that is not close to zero. In practice, we suggest c0 = 1 as the default choice.

If an initial pilot estimate of Σp is needed, we can use Σ̂p,q,n with a rate-optimal bandwidth l = O(n^θ). In practice, we suggest l = [[2n^θ]], where [[t]] := (2 ∨ ⌈t⌉) ∧ (n−1). According to our simulation experience, this rule-of-thumb bandwidth gives reasonably good performance. Using the notation in (10), we denote the resulting pilot estimator by(23) Σ̂†p,q,n := Σ̂p,q,n(Y1:n, l = [[2n^{1/(1+2p+2q)}]], c0 = 1, c1 = 1).(23)

In particular, for estimating Σ ≡ Σ0, our recommended default estimator is as simple as(24) Σ̂†0,2,n = Σ_{k=−l}^{l} (1 − |k/l|²)(Ψ̂_{l+|k|} − Ψ̂_{|k|}),(24) where l = [[2n^{1/5}]] and Ψ̂_h = {2(n−|h|+1)}^{−1} Σ_{i=|h|+1}^{n} (Yi − Y_{i−|h|})². If a more accurate estimate of Σp is needed, we can use Σ̂p,q,n with the fully optimal bandwidth l* ≍ ϕn^θ. From (17), ϕ is a function of Σ and Σ_{p+q}, so the value of ϕ is unknown. We propose to first estimate Σ and Σ_{p+q} by the pilot estimators Σ̂†0,2,n and Σ̂†p+q,2,n. Then ϕ is consistently estimated by plugging these estimated values into (17) and (18), that is,(25) ϕ̂ := [(2p+q+1)(2p+2q+1)(vec Σ̂†p+q,2,n)⊤ W (vec Σ̂†p+q,2,n) / (2q(1+c1){(vec Σ̂†0,2,n)⊤ W (vec Σ̂†0,2,n) + tr{W(Σ̂†0,2,n ⊗ Σ̂†0,2,n)}})]^{1/(1+2p+2q)}.(25)

Using l̂ := [[ϕ̂n^θ]], the estimator Σ̂p,q,n is equipped with the asymptotically optimal bandwidth. The resulting estimator(26) Σ̂‡p,q,n := Σ̂p,q,n(Y1:n, l = [[ϕ̂n^{1/(1+2p+2q)}]], c0 = 1, c1 = 1)(26) is called the qth-order MAC estimator of Σp. It can be computed by Algorithm 1. The R package MAC implements it.
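As an illustration, a univariate sketch of the default estimator (24) might look as follows. The ceiling-based bandwidth rule is a simplified stand-in for the integer-truncation operator [[·]], and the jump experiment at the end only illustrates the robustness mechanism: since only differences Y_i − Y_{i−h} enter Ψ̂_h, a level shift perturbs the estimate through just the few straddling terms.

```python
import math
import random

def psi_hat(y, h):
    # Psi_hat_h = sum_{i=h+1}^{n} (Y_i - Y_{i-h})^2 / {2(n - h + 1)}; Psi_hat_0 = 0.
    n = len(y)
    if h == 0:
        return 0.0
    return sum((y[i] - y[i - h]) ** 2 for i in range(h, n)) / (2 * (n - h + 1))

def mac_default(y, l=None):
    # Univariate sketch of (24) with the q = 2 polynomial kernel weights (1 - |k/l|^2).
    n = len(y)
    if l is None:
        l = math.ceil(2 * n ** 0.2)  # rule-of-thumb bandwidth ~ 2 n^{1/5}
    return sum((1 - (abs(k) / l) ** 2) * (psi_hat(y, l + abs(k)) - psi_hat(y, abs(k)))
               for k in range(-l, l + 1))

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]  # iid noise, true sigma^2 = 1
y = [xi + (5.0 if i >= 10000 else 0.0) for i, xi in enumerate(x)]  # add one big jump
print(round(mac_default(x), 2), round(mac_default(y), 2))  # both close to 1
```

Note that no CP search or trend estimation is involved: the difference-based Ψ̂_h absorbs the mean structure automatically.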

Algorithm 1: Proposed MAC estimator Σ̂p,q,n for estimating Σp

[1] Input:

[2] (i) Y1:n, the d-dimensional time series;

[3] (ii) p, the order of the estimand Σp (set p = 0 for estimation of the ACM Σ);

[4] (iii) q, the order of the polynomial kernel Kq(·) (set q = 2 by default);

[5] (iv) c0, c1, the parameters (set c0 = c1 = 1 by default); and

[6] (v) W, the d × d weight matrix (set W[r,s] = 1{r ≤ s} for each 1 ≤ r, s ≤ d by default).

[7] begin

[8] Compute Σ̂†0,2,n and Σ̂†p+q,2,n according to (23);

[9] Compute ϕ̂ according to (25);

[10] Compute the estimated optimal bandwidth l̂ = [[ϕ̂n^{1/(1+2p+2q)}]];

[11] Compute Σ̂‡p,q,n = Σ̂p,q,n(Y1:n, l = l̂, c0, c1) according to (10).

return Σ̂‡p,q,n, the MAC estimator of Σp.

4.4 Discussion on Robustness to Heteroscedasticity

Thus far, we have assumed that the noise sequence {Xi} is stationary (i.e., without heteroscedasticity). Now, suppose that {Xi} is not stationary but satisfies E(Xi) = 0. In this case, we define the finite-n version of Σ0 = lim_{n→∞} n var(Y̅n) by(27) Σ0,n := n var(Y̅n) = n E(X̅n X̅n⊤) = n^{−1} Σ_{1≤i,j≤n} E(X_i X_j⊤) = Σ_{|k|<n} Π_{k,n},(27) where Π_{k,n} = (2n)^{−1} Σ_{i=1+|k|}^{n} E(X_i X_{i−|k|}⊤ + X_{i−|k|} X_i⊤). Following the arguments in Section 3.1, it is not hard to see that Π̂k(L) still approximates Π_{k,n}. Thus, it is not surprising that the proposed estimator Σ̂p,q,n continues to be consistent for Σ_{p,n} := Σ_{|k|<n} |k|^p Π_{k,n}. Similar to Section 8 of Andrews (1991), we can extend the consistency results to heteroscedastic time series. Suppose the regularity conditions of Theorem 4.1, Theorem 4.2, and part (1) of Corollary 4.1 are satisfied except for the following changes.

  • The stationarity of the noise sequence {Xi} is removed. However, we still require that there is some ν > 4 such that Xi ∈ L^ν and E(Xi) = 0 for all i ∈ Z.

  • The assumption CSD(X) = p+q is changed to CSD*(X) = p+q, where CSD*(X) := sup{P ∈ N : max_{r,s∈{1,…,d}} Σ_{k=−∞}^{∞} |k|^P sup_{i≥1} |E(Xi[r] X_{i−k}[s])| < ∞}.

We also define, for each P ∈ N0, Σ_{P,*}[r,s] := Σ_{k=−∞}^{∞} |k|^P sup_{i≥1} |E(Xi[r] X_{i−k}[s])| and Ξ*[r,s] := Σ_{0,*}[r,r] Σ_{0,*}[s,s] + (Σ_{0,*}[r,s])².

Note that CSD*(X) = P implies that Σ_{P,*}[r,s] < ∞ and Ξ*[r,s] < ∞ for each r, s. Under the modified regularity conditions, the conclusions of Theorems 4.1 and 4.2 are updated to(28) limsup_{n→∞} l^{2q} {E(Σ̂p,q,n[r,s]) − Σ_{p,n}[r,s]}² ≤ (Σ_{p+q,*}[r,s])²,(28) (29) limsup_{n→∞} n l^{−(1+2p)} var(Σ̂p,q,n[r,s]) ≤ 4q²(1+c1) Ξ*[r,s] / {(2p+1)(2p+q+1)(2p+2q+1)}(29)

for all r, s ∈ {1,…,d}. If l = O(n^{1/(1+2p+2q)}), then (28) and (29) imply that limsup_{n→∞} n^{2q/(1+2p+2q)} E(Σ̂p,q,n[r,s] − Σ_{p,n}[r,s])² ≤ C for some C < ∞. Hence, Σ̂p,q,n is a consistent estimator of Σ_{p,n} with the optimal convergence rate. Examples and the finite-sample performance of Σ̂p,q,n in the heteroscedastic case are shown in Section 5.3.

5 Finite Sample Performance

5.1 Efficiency and Robustness Against One Jump

We compare Σ̂0,q,n with the following estimators in terms of efficiency and robustness.

  • (CV) Crainiceanu and Vogelsang (2007) proposed to estimate one potential CP D1 and then construct a de-trended process, say {X̂iCV}. The modified OBM estimator σ̂²CV,n is defined by applying the estimator (3) to {X̂iCV} instead of {X̂i}. Andrews's (1991) AR(1) plug-in rule is used to select the optimal batch size.

  • (WZ) Wu and Zhao (2007) used the NBMs {A_k} to estimate σ². They proposed σ̂²WZ1,n := πl{4(m−1)²}^{−1}(Σ_{k=2}^{m} |A_k − A_{k−1}|)² and σ̂²WZ2,n := l{2z²_{3/4}}^{−1}(median_{k∈{2,…,m}} |A_k − A_{k−1}|)², together with σ̂²WZ3,n defined in (22), where median_{k∈K} x_k denotes the median of {x_k}_{k∈K}, and z_p is the 100p% quantile of N(0,1). They showed, under regularity conditions, that σ̂²WZ1,n and σ̂²WZ2,n are weakly consistent if l = [[n^{5/8}]], and that σ̂²WZ3,n is L2-consistent with MSE(σ̂²WZ3,n) = O(n^{−2/3}) if l ≍ n^{1/3}. For σ̂²WZ3,n, we implement it with an estimated optimal bandwidth obtained via our proposed method (see Section 4.2 for more details). Denote these three estimators by WZ1, WZ2, and WZ3, respectively.

  • (AC) Altissimo and Corradi (2003) proposed using the Bartlett kernel estimator after locally detrending the mean. The bandwidth is selected by cross-validation. Denote the resulting estimator by σ̂²AC,n. They proved that σ̂²AC,n is consistent.

  • (MAC) We use the estimators Σ̂‡0,1,n and Σ̂‡0,2,n, as well as the pilot estimator Σ̂†0,2,n. Denote them by MAC(1), MAC(2), and MAC(P), respectively.

Their detailed formulas are presented in Section C.1 of the supplementary materials for reference. Recall that σ̂²CV,n is robust to one CP without trend; σ̂²WZ1,n, σ̂²WZ2,n, and σ̂²WZ3,n are proved to be robust to trends only; σ̂²AC,n is only proved to be robust to finitely many CPs; and the proposed estimators Σ̂‡0,1,n and Σ̂‡0,2,n are robust to both trends and a divergent number of CPs. If there is at most one CP, then σ̂²CV,n serves as an oracle estimator because, in practice, we rarely know in advance that there is at most one CP.

Consider the ARMA(1,1) model Yi = Xi + μi, where Xi = aX_{i−1} + bε_{i−1} + εi and εi ~iid N(0,1), for i = 1,…,n. In particular, consider a = b = 0.2, 0.4, 0.6 (Models A1–A3, respectively), n = 400 × 4^j for j = 0,…,3, and five different mean sequences μi = ξ × 1{i ≥ n/2} for ξ = 0,…,4. The MSEs are estimated using 2000 independent replications. The lack of efficiency (MSE0) is measured by the MSE when ξ = 0, whereas the lack of robustness is measured by the standard deviation (MSEsd) of the MSEs across ξ ∈ {0, 1,…,4}. A smaller MSE0 and a smaller MSEsd imply higher efficiency and robustness, respectively.
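The two summary measures can be sketched as follows; the replication numbers below are hypothetical and serve only to show how MSE0 and MSEsd are computed from the Monte Carlo output:

```python
import statistics

def mse(estimates, truth):
    # Monte Carlo MSE of a list of replicated estimates around the true value.
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def summary_measures(estimates_by_xi, truth):
    # MSE0 = MSE at jump size xi = 0 (efficiency);
    # MSEsd = standard deviation of the MSEs across xi (robustness).
    mses = [mse(est, truth) for est in estimates_by_xi]
    return mses[0], statistics.stdev(mses)

# Hypothetical replications of an estimator of sigma^2 = 2 at xi = 0,...,4.
reps = [[2.1, 1.9, 2.0], [2.2, 1.8, 2.1], [2.3, 1.7, 2.0], [2.4, 1.6, 2.2], [2.5, 1.5, 2.3]]
mse0, mse_sd = summary_measures(reps, 2.0)
print(mse0, mse_sd)
```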

The results are shown in Figure 2. Clearly, σ̂²WZ1,n and σ̂²WZ2,n perform badly in terms of both efficiency and robustness. The major competitor σ̂²WZ3,n performs reasonably well on both measures; however, it is less efficient than all of our proposed estimators (Σ̂‡0,1,n, Σ̂‡0,2,n, Σ̂†0,2,n) in nearly all cases. The estimator σ̂²AC,n is quite efficient when the mean is constant; however, it loses its efficiency when the jump size is large. For example, when n = 400, its MSE inflates by 407% when the jump magnitude ξ increases from 0 to 4. Besides, the cross-validation step makes it computationally inefficient.

Fig. 2 The values of log(MSE0) and log(MSEsd) are plotted against n, where MSE0 denotes the MSE when the jump size ξ = 0, and MSEsd denotes the standard deviation of the MSEs across different ξ. Recall that smaller MSE0 and smaller MSEsd imply higher efficiency and robustness, respectively. Note that σ̂²AC,n is computed only when n ≤ 400 because it requires a computationally intensive cross-validation step. The horizontal axis is plotted on the logarithmic scale for better visualization.


The proposed estimators Σ̂‡0,1,n and Σ̂‡0,2,n perform the best in nearly all cases. The advantage of Σ̂‡0,2,n becomes more pronounced as n increases. The pilot estimator Σ̂†0,2,n also performs quite well, so it is justifiable to use it as an initial guess. It is remarked that Σ̂†0,2,n performs very well in Model A3 because its default tuning parameter accidentally matches the theoretically optimal value. However, this advantage is not general (see, e.g., the experiment in Section 5.3).

5.2 Robustness Against Trend and Multiple Jumps

In this subsection, we investigate the robustness against both trends and jumps. Consider the same models for {Xi} as in Section 5.1, but with the mean function replaced by μi = (i/n) × 1{0.4 ≤ i/n < 0.7} + (5i/n − 4)² × 1{i/n ≥ 0.7}. Figure 3 shows a typical realization of {Yi} in Model A2. Observe that the trend effect and the jump effect are not obvious because they are masked by the intrinsic variability of the noise {Xi}. This scenario mimics the situation in which the observed time series looks stationary but has, in fact, been contaminated by a hardly noticeable nonconstant trend and structural breaks.

Fig. 3 Thin solid line: A realization of {Yi} in Model A2 of length n = 400. Thick solid line: The nonconstant mean function {μi} in Section 5.2. Dotted vertical lines: The change points.


The simulation results are visualized in Figure 4. First, the MSE of the previously oracle-like estimator σ̂²CV,n does not decrease with n because it is no longer consistent when the mean is not a piecewise constant function. The estimators σ̂²WZ1,n and σ̂²WZ2,n again perform poorly. The estimator σ̂²WZ3,n and our proposed Σ̂‡0,1,n, Σ̂‡0,2,n, and Σ̂†0,2,n perform well; among them, σ̂²WZ3,n performs least well, whereas Σ̂‡0,2,n and Σ̂†0,2,n perform most promisingly. The take-home message is that even if the trend is relatively insignificant, its impact on estimators of σ² can be catastrophic, especially when the mean structure is misspecified.

Fig. 4 The values of log{MSE(·)} of different estimators are plotted against the sample size n in Models A1–A3. Here the mean function consists of nonconstant trends and multiple jumps (see Section 5.2 and Figure 3). Note that the horizontal axis is plotted on the logarithmic scale.


5.3 Multivariate Time Series With Heteroscedastic Errors

We consider estimation of Σ0,n (defined in (27)) for a bivariate time series {Yi = (Yi1, Yi2)⊤}_{i=1}^{n} with time-varying means and heteroscedastic errors. Let Yij = μij + τij Xij for i = 1,…,n and j = 1, 2, where μij is the mean, Xij is a stationary noise, and τij creates heteroscedasticity. Two mean sequences are used: (i) μij = 0 for all i, j; and (ii) μi1 = i/n, μi2 = 1{i/n > 1/3}. We set τi1 = 1 + i/(4n), τi2 = 1 + sin(4πi/n)/(4n), and generate {Xij} as follows: (Xi1, Xi2)⊤ = [0.27, 0.09; 0.18, 0.18](X_{i−1,1}, X_{i−1,2})⊤ + [0.01, 0.14; 0.28, 0.08]ε_{i−1} + εi, for i = 1, 2,…,n, where ε0,…,εn are independent standard bivariate normal random vectors.

The proposed estimators Σ̂‡0,1,n, Σ̂‡0,2,n, and Σ̂†0,2,n are evaluated. We compare them with the standard Bartlett kernel estimator Σ̂Bart,n and the QS kernel estimator Σ̂QS,n (see (4)). The bandwidths of Σ̂Bart,n and Σ̂QS,n are selected by Andrews's (1991) vector AR(1) plug-in rule. As far as we know, in the multivariate setting, no other estimator has been proved to be consistent and optimal in the presence of a nonconstant mean, autocorrelation, and heteroscedasticity. The results are shown in Figure 5. We also repeat the experiment with homoscedastic errors, that is, τij = 1 for all i, j. Since the results are very similar to the heteroscedastic case, we present them only in the supplementary materials.

Fig. 5 The values of log{MSE_W(·)} for Σ̂Bart,n, Σ̂QS,n, Σ̂‡0,1,n, Σ̂‡0,2,n, and Σ̂†0,2,n in the heteroscedastic case are plotted against n, where vec(W) = (1, 1/2, 1/2, 1)⊺ is used and MSE_W(·) is defined in (16). The left and right plots show the results for the constant-mean and nonconstant-mean cases, respectively. Note that the horizontal axes are plotted on the logarithmic scale.


From Figure 5, all five estimators are consistent in the constant-mean case. However, Σ̂Bart,n and Σ̂QS,n are no longer consistent when the mean is not constant. On the other hand, the mean structure does not affect the performance of Σ̂‡0,1,n, Σ̂‡0,2,n, and Σ̂†0,2,n, which verifies the claimed consistency and robustness. In addition, although the pilot estimator Σ̂†0,2,n does not perform as well as the optimal estimator Σ̂‡0,2,n, it still gives sufficiently good results, which supports using Σ̂†0,2,n as an initial estimator in practice.

5.4 Change-Point Detection

In this subsection, we consider the CP detection problem, that is, testing H0: EY1 = ⋯ = EYn against H1: ∃D1 such that EY1 = ⋯ = EY_{D1−1} ≠ EY_{D1} = ⋯ = EYn. We analyze (i) whether CP tests are monotonically powerful with respect to the jump magnitude |EY_{D1} − EY_{D1−1}|; and (ii) their power losses under a misspecified alternative hypothesis.

Let Tn(k) := n^{−1/2} Σ_{i=1}^{k} X̂i be the CUSUM process of X̂i = Yi − Y̅n. The standard KS test statistic is defined by Tn := max_{k∈{1,…,n}} |Tn(k)|/σ̂, where σ̂ is a consistent estimator of σ. Then, H0 is rejected at the 5% level if Tn > 1.358. Alternatively, the self-normalized KS test of Shao and Zhang (2010) can be used. Following them, we compare

  • (SZ) their self-normalized KS test, and

  • (KS) the standard KS tests with different estimators of σ², namely, σ̂²A,n, σ̂²CV,n, σ̂²JX,n, σ̂²WZ3,n, σ̂²AC,n, and Σ̂‡0,2,n, where σ̂²A,n is the Bartlett kernel estimator with Andrews's AR(1) plug-in selector of l; the estimator σ̂²JX,n was proposed by Juhl and Xiao (2009); and all other estimators are defined in Section 5.1.
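A minimal sketch of the standard KS statistic described above; any consistent estimate σ̂ can be plugged in:

```python
import math

def ks_statistic(y, sigma_hat):
    # T_n = max_k |n^{-1/2} sum_{i<=k} (Y_i - Y_bar)| / sigma_hat;
    # reject H0 at the 5% level when T_n > 1.358.
    n = len(y)
    ybar = sum(y) / n
    cumsum, peak = 0.0, 0.0
    for yi in y:
        cumsum += yi - ybar          # CUSUM of the demeaned series
        peak = max(peak, abs(cumsum))
    return peak / (math.sqrt(n) * sigma_hat)

print(ks_statistic([1.0, 2.0, 3.0, 4.0], sigma_hat=1.0))  # 1.0
```

The quality of σ̂ is exactly what separates the KS variants compared in this section.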

Detailed formulas of the above CP tests and estimators of σ² are presented in Section C.3 of the supplementary materials for reference. Consider the bilinear model Yi = Xi + μi, where Xi = (a + bεi)X_{i−1} + εi and εi ~iid N(0,1), for i = 1,…,n. The physical dependence measure decays at the rate δ_{4,n}[1] = O(ϱ^n), where ϱ = (a² + b²)^{1/2} (see Wu 2005, 2011). A larger ϱ implies stronger serial dependence. We use a = 0.33, 0.36, 0.39 and b = 0.5, 0.6, 0.7 so that ϱ ≈ 0.6, 0.7, 0.8, respectively. Denote them by Models B1–B3, respectively.
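Reading the displayed decay parameter as ϱ = (a² + b²)^{1/2}, an assumption consistent with the stated values, the three (a, b) pairs can be checked numerically:

```python
import math

# Models B1-B3: rho = sqrt(a^2 + b^2) should reproduce 0.6, 0.7, 0.8 (to 2 decimals).
pairs = {(0.33, 0.5): 0.6, (0.36, 0.6): 0.7, (0.39, 0.7): 0.8}
for (a, b), target in pairs.items():
    print(a, b, round(math.hypot(a, b), 2), target)
```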

Both the SZ and KS tests assume that, when H0 is false, the mean function is piecewise constant with one CP. If this is actually the case, we say that the alternative hypothesis is correctly specified; otherwise, the alternative hypothesis is said to be misspecified. We consider the following two alternative hypotheses in the experiments:

  • (correctly specified alternative) H1: μi=ξ× 1{i/n>1/4} for i=1,,n; and

  • (misspecified alternative) H1′: μi = ξ(1 + e^{−10i/n+5}) × 1{i/n > 1/4} for all i = 1,…,n,

where the value of ξ ∈ R controls the jump magnitude. If ξ = 0, both H1 and H1′ reduce to H0. Under H1, the mean jumps to ξ at i = ⌊n/4⌋ + 1 and then stays constant, whereas, under H1′, the mean shoots up at i = ⌊n/4⌋ + 1 and then decays towards ξ. In practice, a CP may arrive like H1′ instead of H1; hence, a good CP test should be powerful in both cases. A good size-α CP test should satisfy the following four properties, where α ∈ (0,1).

  • (Size correctness) The probability of rejecting H0 is close to α when H0 is correct.

  • (Powerfulness) The probability of rejecting H0 is high when H0 is incorrect.

  • (Monotonicity of power) The power is increasing with the magnitude of jump |ξ|.

  • (Robustness) The test is still powerful under misspecified alternative hypotheses.
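The two alternative mean shapes can be sketched as follows; the decaying form used for H1′ below is our reading of the display, chosen (up to the exact constants, which are hypothetical) so that the mean overshoots just after the CP and settles back towards ξ:

```python
import math

def mu_correct(i, n, xi):
    # H1: a clean level shift to xi after i/n = 1/4.
    return xi if i / n > 0.25 else 0.0

def mu_misspec(i, n, xi):
    # H1': shoots up just after i/n = 1/4, then decays towards xi as i/n -> 1.
    return xi * (1 + math.exp(-10 * i / n + 5)) if i / n > 0.25 else 0.0

n, xi = 1000, 1.0
print(mu_misspec(260, n, xi))   # large overshoot just after the change point
print(mu_misspec(1000, n, xi))  # close to xi at the end of the sample
```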

The simulation is conducted for n = 100, 400, 800 with nominal size α = 5%. Since the results are similar under different models, we only report the results under Model B2 here (see Figure 6). The full results are deferred to Section C.3 of the supplementary materials; the size-adjusted power curves are also presented there for reference. Under H1, all tests except KS(A) and KS(JX) have monotone power with respect to |ξ|. The test KS(WZ3) commits a Type I error more frequently than the nominal level even when the sample size is large. This over-size phenomenon is due to the use of an inefficient estimator of σ². The power curves are largely the same for KS(CV), KS(AC), and KS(MAC(2)), as they are essentially the same test. Observe that SZ is significantly less powerful when 0 < ξ < 1.

Fig. 6 The powers of the CP tests defined in Section 5.4 are plotted against the jump magnitude ξ under Model B2. The scenarios under the well-specified alternative H1 and the misspecified alternative H1′ are shown in the upper and lower plots, respectively. Dashed horizontal lines indicate the significance level α = 5% and zero. Note that the horizontal axis is plotted on the logarithmic scale for better visualization.


Under H1′, all tests except KS(MAC(2)) and KS(WZ3) immediately lose all power when |ξ| > 0. In particular, SZ remains powerless even when n and ξ are large. This is undesirable because SZ is very sensitive to whether the alternative hypothesis is well specified. KS(A), KS(CV), KS(JX), and KS(AC) are not powerful because they use inconsistent or inefficient estimators of σ². This gives a warning sign against using these tests in practice. It is worth emphasizing that KS(WZ3) seems more powerful than KS(MAC(2)); however, this is only because KS(WZ3) rejects too frequently regardless of whether H0 is true. Hence, the apparently more powerful KS(WZ3) test is not reliable. Among all the tests above, our proposed test KS(MAC(2)) is the only monotonically powerful test that has accurate size and is insensitive to misspecification of the alternative hypothesis.

6 Empirical Studies

6.1 Change Point Detection in S&P 500 Index

The Standard & Poor’s 500 (S&P 500) Index is a stock market index based on 500 representative companies in the USA. The daily adjusted close prices of the index, from 3 January 2006 to 30 December 2011 (n = 1511), are investigated. The dataset can be downloaded from http://finance.yahoo.com/quote/%5EGSPC/history. The financial crisis of 2008 is believed to have had a tremendous impact on the global stock market. We suspect that it led to an abrupt change in the stock market. Testing this claim is important since a discontinuous impact implies that the economy may have undergone a structural change.

Denote the logarithm of the S&P 500 Index by Yi. Observe that there is an obvious trend in Yi (see Figure 7). A standard approach is to study the return series yi := Yi − Y_{i−1} to get rid of the trend component. This differencing step is essential for many standard CP tests, for example, the SZ and KS tests presented in Section 5.4, because they cannot handle trends. Using the CUSUM-type CP estimator D̂1 (see (1) of the supplementary materials for its formula), we estimate the CP to be 10 March 2009. It is remarked that the same CP is detected by the method described in Altissimo and Corradi (2003). Hence, the CP test fails to capture the 2008 financial crisis. Indeed, testing H0: Ey1 = ⋯ = Eyn by the KS(MAC) test defined in Section 5.4, we fail to reject H0 at the 5% level. We conclude that the 2008 financial crisis had no jump impact on the returns yi. Since taking the difference of Yi may cancel out the potential jump effect, it seems desirable to analyze Yi directly (see Vogelsang 1999 for a similar analysis). Using the CP test proposed by Wu and Zhao (2007), we can test H0: the mean function i ↦ EYi is continuous, against H1: the mean function i ↦ EYi has a jump discontinuity. The test statistic is Qn := (kn σ̂)^{−1} max_{kn ≤ i ≤ n−kn} |Σ_{j=i+1}^{i+kn} Yj − Σ_{j=i−kn+1}^{i} Yj|, where σ̂² is a consistent estimator of σ², that is, the AVC of {Yi}, and kn = ⌊n^{0.6}⌋. Then H0 is rejected if Qn is large. Using MAC(2) to estimate σ, we obtain σ̂ = 0.0517 and find that H0 is rejected at any reasonable level. It is remarked that σ is estimated to be 0.0434 by the estimator WZ3. Although this estimate is a bit smaller than our proposed estimate, the same conclusion for testing H0 is obtained if it is used in the test statistic Qn. Although Wu and Zhao (2007) did not provide an estimator of the CP, they argue that if i + 1 is a discontinuity point, then the difference of the averages inside the statistic Qn should be large. Following their idea, D̂WZ := 1 + argmax_{kn ≤ i ≤ n−kn} |Σ_{j=i+1}^{i+kn} Yj − Σ_{j=i−kn+1}^{i} Yj| is a reasonable estimator of the CP. The estimated CP, D̂WZ, is 7 October 2008 (see Figure 7). It indicates the 2008 financial crisis quite accurately and coincides with our understanding of the stock market.
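The window-comparison idea behind D̂WZ can be sketched as follows (0-based indexing; the window length k_n and the toy series are illustrative, following the reconstructed display):

```python
def jump_location(y, k):
    # Slide two adjacent windows of length k and locate the largest discrepancy
    # between their sums; return 1 + argmax, as in the definition of D_WZ.
    n = len(y)
    best_i, best_gap = None, -1.0
    for i in range(k, n - k):          # i plays the role of k_n <= i <= n - k_n
        forward = sum(y[i:i + k])      # sum_{j=i+1}^{i+k_n} Y_j in 1-based notation
        backward = sum(y[i - k:i])     # sum_{j=i-k_n+1}^{i} Y_j
        gap = abs(forward - backward)
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i + 1

y = [0.0] * 100 + [1.0] * 100          # a clean level shift at position 101 (1-based)
print(jump_location(y, k=10))  # 101
```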

Fig. 7 Time series plot of {Yi}, that is, the daily S&P 500 Index (3 January 2006–30 December 2011) on the log scale (see Section 6.1). The vertical dotted line indicates the value of D̂WZ estimated by the statistic in Wu and Zhao (2007). Here σ² is estimated by MAC(2).


6.2 Simultaneous Change Point Detection in Several Indices

Besides the S&P 500 Index mentioned in Section 6.1, several other stock market indices are commonly used by traders, for example, the Dow Jones Index, the Nasdaq Composite, and the Russell 2000. In this subsection, we investigate whether these four market indices can be used simultaneously to detect the 2008 financial crisis more precisely.

Consider the squared daily returns, which can be used as proxies for daily volatilities, of the aforementioned four indices over the period 1 July 2008–30 December 2008 (see Figure 8). Applying the CUSUM-type CP estimator D̂1 (see (1) of the supplementary materials for its formula) to each index individually, we obtain the same CP, 29 September 2008. It is remarked that the no-CP null hypothesis is rejected at the 5% level by the test KS(MAC(2)) for each individual index.

Fig. 8 The squared returns of four stock indices (1 July 2008–30 December 2008) (see Section 6.2). The two vertical lines denote the CP locations. The earlier and later CPs are detected by the multivariate and univariate CUSUM CP estimators, respectively.


Since these stock market indices are highly correlated and are believed to follow the market trend very closely, a CP (if any) is likely to appear in all of them simultaneously. Hence, using multivariate time series to detect a CP can be more accurate and precise. Applying the multivariate version of the KS CP test (Horváth, Kokoszka, and Steinebach 1999) to the four indices, we detect the CP to be 15 September 2008. From Figure 8, the squared returns between 15 and 29 September are slightly higher than in the first portion of the series. Hence, using multivariate time series helps detect such small changes. Consequently, multivariate tests are potentially more useful in practice. It is also remarked that the null hypothesis of no simultaneous CP is rejected at the 5% level by the CP test of Horváth, Kokoszka, and Steinebach (1999) equipped with our proposed MAC(2) estimator.

7 Conclusions

In this article, we propose an estimator of the ACM in nonstationary time series. The estimator has several desirable features: (i) it is robust against unknown trends and a divergent number of jumps; (ii) it is optimal in the sense that an asymptotically correct optimal bandwidth can be implemented robustly; (iii) it is statistically efficient since it attains the optimal L2 convergence rate under different strengths of serial dependence; (iv) it is computationally fast because neither numerical optimization, trend estimation, nor CP detection is required; and (v) it is handy because its formula can be as simple as (24).

Some applications of the estimator are illustrated. In particular, we find that the CP test equipped with the proposed estimator is the only available test that is monotonically powerful and insensitive to a misspecified alternative hypothesis.

Supplementary Materials

Supplementary materials include graphical illustration, additional simulation results, and proofs. The R-package MAC for computing the proposed estimator is also provided.


Acknowledgments

The author thanks the editor Christian Hansen, the associate editor, and two reviewers for their detailed and insightful comments. The author also gratefully thanks Neil Shephard for his helpful advice on improving the estimator as well as Xiao-Li Meng, Jim Stock and Pierre Jacob for fruitful discussions.

Additional information

Funding

This research was supported by the Direct Grant (4053356) provided by the Chinese University of Hong Kong, and the Early Career Scheme (24306919) provided by the University Grant Committee of HKSAR.

References