Empirical likelihood inference in autoregressive models with time-varying variances

Abstract

This paper develops an empirical likelihood ($EL$) inference procedure for parameters in autoregressive models whose error variances are scaled by an unknown nonparametric time-varying function. Compared with existing methods based on non-parametric and semi-parametric estimation, the proposed test statistic avoids estimating the variance function, while maintaining the asymptotic chi-square distribution under the null. Simulation studies demonstrate that the proposed $EL$ procedure (a) is more stable, i.e., depends less on the change points in the error variances, and (b) gets closer to the desired confidence level than the traditional test statistics.

1. Introduction

In macroeconomic and financial applications, heteroscedasticity arises in many time series models, and ignoring it often leads to inefficient estimation and unreliable inference. Violations of homoscedasticity are usually studied in two forms, ‘conditional heteroscedasticity’ and ‘unconditional heteroscedasticity’.

Non-constant volatility is described as ‘conditional heteroscedasticity’ when future periods of high and low volatility cannot be identified in advance. Engle (1982) and Bollerslev (1986) proposed the ARCH and GARCH models, in which the mean function can be estimated efficiently via quasi-maximum likelihood or other adaptive procedures. More complicated GARCH models have since been proposed to allow for conditional heteroscedasticity, for instance, varying coefficient GARCH models (see Polzehl & Spokoiny, 2006) and spline GARCH models (see Engle & Rangel, 2008). Time-varying volatility is often used to describe conditional heteroscedasticity. Drees and Starica (2002) and Starica (2003) used a non-stationary framework to analyse time series of S&P 500 returns, and found that this approach outperformed the GARCH-type models.

‘Unconditional heteroscedasticity’ refers to variables with identifiable variability over time, for example seasonal variability, such as electricity usage. Hansen (1995) considered the linear regression model with deterministically trending regressors only, in which the error is an $AR(p)$ process scaled by a continuous function of time. The autoregressive model is nested as a special case when the conditional error variance of the model is a function of a covariate that follows a nearly integrated stochastic process with no deterministic drift. For the constant coefficient autoregressive model with time-varying variances ($ARTV$), which is the subject of this article, Phillips and Xu (2006) used the ordinary least squares method together with nonparametric estimation of the variance function to construct three heteroscedasticity-robust test statistics, and proved that they are asymptotically standard normal. Xu and Phillips (2008) proposed heteroscedasticity-robust adaptive estimation for $ARTV$. However, the performance of the methods in Phillips and Xu (2006) and Xu and Phillips (2008) relies on an appropriate choice of the bandwidth used in the non-parametric function estimation.

Motivated by the ‘empirical likelihood’ ($EL$) approach, this article aims to develop a test statistic that is more stable, in the sense that it depends less on the change points in the error variances, and that avoids the problem of selecting a bandwidth. The $EL$ approach was introduced by Owen (1988, 1990, 1991) to construct confidence intervals in a nonparametric setting; see also Owen (2001). Because the $EL$ approach is nonparametric, the distribution of the data need not be specified, and yet more efficient estimates of the parameters can be obtained. The $EL$ approach lets the data determine the shape of confidence regions without estimating the variance of the test statistic, and it is Bartlett correctable (DiCiccio et al., 1991). It has been applied in various settings, such as generalised linear models (Kolaczyk, 1994), local linear smoothers (Chen & Qin, 2000), partially linear models (Shi & Lau, 2000), parametric and semi-parametric models in multi-response regression (Chen & Ingrid, 2009), linear regression with censored data (Zhou & Li, 2008), plug-in estimates of nuisance parameters in estimating equations in the context of survival analysis (Li & Wang, 2003; Qin & Jing, 2001), heteroscedastic partially linear models (Lu, 2009), GARCH models (Chan & Ling, 2006), variable selection (Han et al., 2013; Variyath & Chen, 2010) and the analysis of longitudinal data (Qiu & Wu, 2015). Qin and Lawless (1994) linked the $EL$ with finitely many estimating equations, which serve as finitely many equality constraints.
To the best of our knowledge, no published work applies the $EL$ approach to constant coefficient autoregressive models with time-varying innovation variances; this article does so.

The remainder of the paper proceeds as follows. Section 2 describes the autoregressive model with time-varying variances and discusses the main assumptions. Section 3 reviews the existing methods. Section 4 develops the empirical likelihood inference procedure with theoretical guarantees. Section 5 conducts simulation studies to evaluate the finite sample performance of the proposed method compared with alternative methods. Section 6 briefly concludes. Technical details and proofs of the main results are relegated to the Appendix.

2. Autoregressive model with time-varying variances

The constant coefficient autoregressive model with time-varying variances is described as follows, (1) $\begin{aligned} Y_{t} & = β_{0} + β_{1} Y_{t - 1} + β_{2} Y_{t - 2} + \dots + β_{p} Y_{t - p} + u_{t} \\ & = X_{t - 1}^{⊤} β_{o} + u_{t}, \end{aligned}$ (2) $u_{t} = σ_{t} ϵ_{t}, \quad t = 1, \dots, T,$ where $⊤$ denotes transpose, $X_{t - 1} = (1, Y_{t - 1}, \dots, Y_{t - p})^{⊤} \in R^{p + 1}$ is the vector of covariates, and $β_{o} = (β_{0}, β_{1}, \dots, β_{p})^{⊤} \in R^{p + 1}$ is the true parameter vector of interest, with $β_{p} \neq 0$, and the lag order p finite and known. We assume that $\{σ_{t}\}$ is a deterministic sequence of time t, satisfying (3) $σ_{t} = g (t / T),$ and $\{ϵ_{t}\}$ is a martingale difference sequence with respect to $F_{t}$, where $F_{t} = σ (ϵ_{s} : s \leq t)$ is the σ-field generated by $\{ϵ_{s} : s \leq t\}$, with $E (ϵ_{t}^{2} ∣ F_{t - 1}) = 1$, $\text{a.s.}$, for all t. Thus, the conditional variance of $\{u_{t}\}$ is fully characterised by the multiplicative factor $σ_{t}$ in (2), i.e., (4) $E (u_{t}^{2} ∣ F_{t - 1}) = σ_{t}^{2} = g^{2} (t / T), \quad \text{a.s.}$ Suppose that the data are generated from models (1)–(2), and we observe a sample containing T + p observations, denoted by $\{Y_{- p + 1}, Y_{- p + 2}, \dots, Y_{0}, Y_{1}, \dots, Y_{T}\}$.
The main goals are to make inferences about the true parameter vector $β_{o}$ in models (1)–(2), i.e., testing the null hypothesis, (5) $H_{0} : β_{o} = b_{0},$ where $b_{0} = (b_{0, 0}, b_{0, 1}, \dots, b_{0, p})^{⊤} \in R^{p + 1}$, and constructing a confidence region for $β_{o}$.
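
To make the data-generating process in (1)–(3) concrete, the following is a minimal simulation sketch in Python (standard library only). The function name `simulate_artv`, the Gaussian choice for $ϵ_{t}$ and the initialisation of the pre-sample values at the mean are illustrative assumptions, not part of the model specification above.

```python
import random


def simulate_artv(beta, g, T, seed=0):
    """Simulate model (1)-(2): Y_t = beta[0] + sum_k beta[k] * Y_{t-k} + g(t/T) * eps_t.

    beta : [beta_0, beta_1, ..., beta_p]
    g    : function on (0, 1] returning the scale sigma_t = g(t/T)
    Returns the T + p values Y_{-p+1}, ..., Y_0, Y_1, ..., Y_T as a list.
    """
    rng = random.Random(seed)
    p = len(beta) - 1
    mu = beta[0] / (1.0 - sum(beta[1:]))   # mean of Y_t under the stability condition
    y = [mu] * p                           # assumption: pre-sample values set to mu
    for t in range(1, T + 1):
        # lags = (Y_{t-1}, ..., Y_{t-p}), read from the end of the path so far
        lags = [y[-k] for k in range(1, p + 1)]
        mean_t = beta[0] + sum(b * lag for b, lag in zip(beta[1:], lags))
        eps_t = rng.gauss(0.0, 1.0)        # i.i.d. N(0, 1), so E(eps_t^2 | past) = 1
        y.append(mean_t + g(t / T) * eps_t)
    return y


# Illustrative call: AR(1) whose error scale jumps from 1 to 5 at r = 0.5.
path = simulate_artv([0.0, 0.5], lambda r: 1.0 if r < 0.5 else 5.0, T=200)
```

Any martingale difference sequence with unit conditional variance could replace the Gaussian draws; the Gaussian case is used here only because it is the simplest one to generate.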

Section 4 will present our proposed empirical likelihood inference, after Section 3 describes the estimation methods in Phillips and Xu (Citation2006).

To facilitate the discussion of the main results and the comparison with related existing methods, the following conditions from Phillips and Xu (2006) and Xu and Phillips (2008) are considered.

Conditions

(A1) $g (\cdot)$ in (3) and (4) is a measurable and strictly positive function on the interval $(0, 1]$ such that $0 < \inf_{r \in (0, 1]} g (r) \leq \sup_{r \in (0, 1]} g (r) < \infty$, and $g (r)$ satisfies a Lipschitz condition except at a finite number of points of discontinuity;
(A2) Let L be the lag operator. The equation $1 - β_{1} L - β_{2} L^{2} - \dots - β_{p} L^{p} = 0$ has all roots outside the unit circle;
(A3) $\{ϵ_{t}\}$ satisfies $E (ϵ_{t} ∣ F_{t - 1}) = 0$ and $E (ϵ_{t}^{2} ∣ F_{t - 1}) = 1$, $\text{a.s.}$, for all t;
(A4) $\sup_{t} E (| ϵ_{t}^{4 ν} |) < \infty$ for some $ν > 1$.

Remark 2.1

In condition (A1), the function g is integrable on the interval $(0, 1]$ to any finite order. For brevity, we write $\int_{0}^{1} g^{m} (x) d x$ as $\int g^{m}$ for any finite positive integer $m \geq 1$ .
Condition (A2) is the stability condition which, for a constant $g (\cdot)$ and homoskedastic $\{ϵ_{t}\}$, would ensure that $\{Y_{t}\}$ is stationary or asymptotically covariance-stationary. Under condition (A2), the mean μ of $Y_{t}$ is given by $μ = \frac{β_{0}}{1 - β_{1} - \dots - β_{p}},$ and $Y_{t}$ has the Wold representation $Y_{t} = μ + \sum_{i = 0}^{\infty} α_{i} u_{t - i},$ where $α_{0} = 1$, $\{α_{i}\}$ satisfies $α_{i} - β_{1} α_{i - 1} - \dots - β_{p} α_{i - p} = 0$ for $i > 0$ (with $α_{i} = 0$ for $i < 0$), and $\sum_{i = 0}^{\infty} | α_{i} | < \infty$. Define Ω to be the $p \times p$ matrix with $(i, j)$-th element $γ_{| i - j |}$, where $γ_{k} = \sum_{i = 0}^{\infty} α_{i} α_{i + k} < \infty$.
Condition (A3) ensures that $\{ϵ_{t}\}$ is a martingale difference sequence and, at the same time, stipulates that $E (u_{t}^{2} ∣ F_{t - 1}) = g^{2} (t / T)$ does not depend on past events; in other words, models (1)–(2) are unconditionally heteroscedastic.

3. Existing methods

Regarding the estimation of $β_{o}$ in models (1)–(2), Phillips and Xu (2006) reviewed the ordinary least squares ($OLS$) estimator $\hat{β}$, and showed that under the stated conditions, as $T \to \infty$, (6) $\begin{aligned} \sqrt{T} (\hat{β} - β_{o}) & = \Big(\frac{1}{T} \sum_{t = 1}^{T} X_{t - 1} X_{t - 1}^{⊤}\Big)^{- 1} \Big(\frac{1}{\sqrt{T}} \sum_{t = 1}^{T} X_{t - 1} u_{t}\Big) \\ & \overset{D}{\to} N (0, Λ), \end{aligned}$ where $\overset{D}{\to}$ denotes convergence in distribution, $Λ = Ω_{1}^{- 1} Ω_{2} Ω_{1}^{- 1}$, $Ω_{1}$ and $Ω_{2}$ are the $(p + 1) \times (p + 1)$ matrices (7) $\begin{aligned} Ω_{1} & = \begin{pmatrix} 1 & μ l_{p}^{⊤} \\ μ l_{p} & μ^{2} + (\int g^{2}) Ω \end{pmatrix}, \\ Ω_{2} & = \begin{pmatrix} (\int g^{2}) & μ (\int g^{2}) l_{p}^{⊤} \\ μ (\int g^{2}) l_{p} & μ^{2} (\int g^{2}) + (\int g^{4}) Ω \end{pmatrix}, \end{aligned}$ $l_{p} = (1, \dots, 1)^{⊤} \in R^{p}$ is a vector of ones, and μ and Ω are as defined in Remark 2.1.
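
As a worked special case of (6)–(7), consider (as an illustrative assumption) a zero-mean $AR (1)$ model without intercept, $Y_{t} = β_{1} Y_{t - 1} + u_{t}$ with $| β_{1} | < 1$, so that $μ = 0$ and only the lower-right entries of $Ω_{1}$ and $Ω_{2}$ are relevant. The Wold coefficients are then $α_{i} = β_{1}^{i}$, and the quantities in Remark 2.1 and (7) reduce to $\begin{aligned} γ_{0} & = \sum_{i = 0}^{\infty} β_{1}^{2 i} = \frac{1}{1 - β_{1}^{2}}, \qquad Ω_{1} = (\int g^{2}) γ_{0}, \qquad Ω_{2} = (\int g^{4}) γ_{0}, \\ Λ & = Ω_{1}^{- 1} Ω_{2} Ω_{1}^{- 1} = (1 - β_{1}^{2}) \frac{\int g^{4}}{(\int g^{2})^{2}} . \end{aligned}$ By the Cauchy–Schwarz inequality, $(\int g^{2})^{2} \leq \int g^{4}$, with equality only when g is constant almost everywhere. Hence, in this special case, $Λ \geq 1 - β_{1}^{2}$, the value obtained under homoscedasticity, and the gap grows with the variability of $g^{2}$.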

Since g is typically unknown, the asymptotic covariance matrix Λ in (6) must be estimated, and this can be done in several ways. First, by applying a weighted sum of squared $OLS$ residuals using kernel smoothing, originally proposed by Nadaraya (1964) and Watson (1964) for the estimation of regression functions, they proposed a consistent non-parametric estimator of the function $g^{2} (r)$ for $r \in [0, 1]$, (8) ${\hat{g}}^{2} (r) = \sum_{t = 1}^{T} w_{r, t} {\hat{u}}_{t}^{2},$ where ${\hat{u}}_{t} = Y_{t} - X_{t - 1}^{⊤} \hat{β}$ is the $OLS$ residual and the weights $w_{r, t}$, $t = 1, \dots, T$, are defined as (9) $w_{r, t} = \Big\{ \sum_{s = 1}^{T} K \Big(\frac{[T r] - s}{T h_{T}}\Big) \Big\}^{- 1} K \Big(\frac{[T r] - t}{T h_{T}}\Big),$ where the kernel function $K (\cdot) : R \mapsto [0, \infty)$ is assumed to satisfy $0 \leq K (z) \leq C_{1} < \infty$ uniformly in z and $\int_{- \infty}^{\infty} K (z) d z < C_{2} < \infty$ for some constants $C_{1}$ and $C_{2}$, and $h_{T}$ is a bandwidth parameter depending on T. The bandwidth $h_{T}$ is selected by cross-validation, i.e., by minimising the average squared prediction error (see Wong, 1983), (10) $CV (b) = \frac{1}{T} \sum_{s = 1}^{T} \{ {\hat{u}}_{s}^{2} - {\hat{g}}_{- s}^{2} (s / T) \}^{2},$ with respect to the candidate bandwidth b, where ${\hat{g}}_{- s}^{2} (r) = \sum_{t = 1, t \neq s}^{T} w_{r, t} {\hat{u}}_{t}^{2}$ is computed with bandwidth b. Phillips and Xu (2006) suggested the following three consistent estimators of the asymptotic covariance matrix Λ when g is unknown.
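
The kernel estimator (8)–(9) and the cross-validation criterion (10) can be sketched as follows. This is a minimal standard-library Python illustration under stated assumptions: the Gaussian kernel is one admissible choice of $K$, the function names are hypothetical, and the bandwidth is searched over a user-supplied grid rather than by a numerical optimiser.

```python
import math


def gaussian_kernel(z):
    """Standard normal density, one admissible choice of the kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_weights(r, T, h):
    """Weights w_{r,t}, t = 1..T, from (9); they sum to one."""
    tr = math.floor(T * r + 1e-9)          # [Tr], guarded against float round-off
    k = [gaussian_kernel((tr - t) / (T * h)) for t in range(1, T + 1)]
    total = sum(k)
    return [v / total for v in k]


def g2_hat(r, resid, h, leave_out=None):
    """Estimate (8) of g^2(r) from residuals u_1..u_T.

    leave_out = s drops the s-th term (1-based), giving the leave-one-out
    version used inside the cross-validation criterion (10).
    """
    w = kernel_weights(r, len(resid), h)
    return sum(wt * u * u
               for t, (wt, u) in enumerate(zip(w, resid), start=1)
               if t != leave_out)


def cv_score(resid, h):
    """Cross-validation criterion (10) evaluated at bandwidth h."""
    T = len(resid)
    return sum((resid[s - 1] ** 2 - g2_hat(s / T, resid, h, leave_out=s)) ** 2
               for s in range(1, T + 1)) / T


def select_bandwidth(resid, grid):
    """Return the bandwidth in `grid` with the smallest CV score."""
    return min(grid, key=lambda h: cv_score(resid, h))
```

Each evaluation of `cv_score` costs on the order of $T^{2}$ kernel evaluations, which is acceptable for sample sizes in the low hundreds but would call for a vectorised implementation on longer series.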

The first estimator of the asymptotic covariance matrix is (11) ${\hat{Λ}}_{1} = T \Big(\sum_{t = 1}^{T} X_{t - 1} X_{t - 1}^{⊤}\Big)^{- 1} \Big(\sum_{t = 1}^{T} {\hat{u}}_{t}^{2} X_{t - 1} X_{t - 1}^{⊤}\Big) \Big(\sum_{t = 1}^{T} X_{t - 1} X_{t - 1}^{⊤}\Big)^{- 1} .$
The second estimator of the asymptotic covariance matrix is (12) ${\hat{Λ}}_{2} = {\hat{Ω}}_{1}^{- 1} \Big(T^{- 1} \sum_{t = 1}^{T} {\hat{u}}_{t}^{2} X_{t - 1} X_{t - 1}^{⊤}\Big) {\hat{Ω}}_{1}^{- 1},$ where the factor $T^{- 1}$ puts the middle term on the same scale as in (11), and the matrix ${\hat{Ω}}_{1}$ is defined as ${\hat{Ω}}_{1} = \begin{pmatrix} 1 & \hat{μ} l_{p}^{⊤} \\ \hat{μ} l_{p} & {\hat{μ}}^{2} + (T^{- 1} \sum_{t = 1}^{T} {\hat{u}}_{t}^{2}) \hat{Ω} \end{pmatrix},$ where $\hat{μ}$ and $\hat{Ω}$ are obtained by replacing $β_{o}$ with $\hat{β}$ in the expressions of μ and Ω in Remark 2.1.
The third estimator of the asymptotic covariance matrix is (13) ${\hat{Λ}}_{3} = {\hat{Ω}}_{1}^{- 1} {\tilde{Ω}}_{2} {\hat{Ω}}_{1}^{- 1},$ where the matrix ${\tilde{Ω}}_{2}$ is defined as ${\tilde{Ω}}_{2} = \begin{pmatrix} \int {\hat{g}}^{2} & \hat{μ} (\int {\hat{g}}^{2}) l_{p}^{⊤} \\ \hat{μ} (\int {\hat{g}}^{2}) l_{p} & {\hat{μ}}^{2} (\int {\hat{g}}^{2}) + (\int {\hat{g}}^{4}) \hat{Ω} \end{pmatrix} .$

Based on the above three estimators ${\hat{Λ}}_{j}$ of the true covariance matrix Λ, Phillips and Xu (Citation2006) constructed three test statistics $t_{j}$ , j = 1, 2, 3, for the true parameter vector $β_{o}$ , stated as follows.

Lemma 3.1

Theorem 2(ii) in Phillips and Xu (2006)

Assume that $\hat{β}$ is the $OLS$ estimator of $β_{o}$. Then, under the above assumptions and the null hypothesis (5), it follows that (14) $t_{j} = \frac{\sqrt{T} ({\hat{β}}_{k} - b_{0, k})}{(({\hat{Λ}}_{j})_{k k})^{1 / 2}} \overset{D}{\to} N (0, 1), \quad \text{as } T \to \infty,$ where $({\hat{Λ}}_{j})_{k k}$ is the $(k, k)$-th element of the matrix ${\hat{Λ}}_{j}$, j = 1, 2, 3, defined in (11), (12) and (13), respectively.

Hence, a large sample level $100 (1 - α) \%$ confidence region for $β_{o}$ based on the Normal approximation (14) is given by (15) $ℜ_{j, α} = \{ b : T (\hat{β} - b)^{⊤} [\mathrm{diag} \{ ({\hat{Λ}}_{j})_{k, k}, k = 0, 1, \dots, p \}]^{- 1} (\hat{β} - b) \leq χ_{p; 1 - α}^{2} \},$ where $\mathrm{diag} \{ ({\hat{Λ}}_{j})_{k, k}, k = 0, 1, \dots, p \}$ is the diagonal matrix formed from the main diagonal of ${\hat{Λ}}_{j}$, j = 1, 2, 3, and $χ_{p; 1 - α}^{2}$ denotes the $100 (1 - α)$th percentile of the chi-square distribution $χ_{p}^{2}$ with p degrees of freedom.
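
For intuition, the statistic $t_{1}$ in (14) and the region (15) can be written out in the simplest setting. The sketch below assumes a zero-mean $AR (1)$ model without intercept, so that $\hat{β}$, ${\hat{Λ}}_{1}$ in (11) and the region (15) are all scalar quantities; the function names and the hard-coded critical value 1.96 (the 0.975 quantile of the standard Normal distribution, giving a 95% interval) are illustrative choices.

```python
import math


def ols_ar1(y):
    """OLS slope in Y_t = beta_1 * Y_{t-1} + u_t, for y = [Y_0, Y_1, ..., Y_T]."""
    sxy = sum(y[t - 1] * y[t] for t in range(1, len(y)))
    sxx = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return sxy / sxx


def lambda1_hat(y):
    """Scalar version of the sandwich estimator (11)."""
    T = len(y) - 1
    b = ols_ar1(y)
    sxx = sum(y[t - 1] ** 2 for t in range(1, T + 1))
    # Squared OLS residuals weight the squared regressor in the middle term.
    meat = sum((y[t] - b * y[t - 1]) ** 2 * y[t - 1] ** 2 for t in range(1, T + 1))
    return T * meat / sxx ** 2


def t1_statistic(y, b0):
    """Statistic (14) with j = 1 for the null value b0."""
    T = len(y) - 1
    return math.sqrt(T) * (ols_ar1(y) - b0) / math.sqrt(lambda1_hat(y))


def t1_interval(y, z=1.96):
    """Scalar version of (15): all b with |t_1(b)| <= z."""
    T = len(y) - 1
    b = ols_ar1(y)
    half_width = z * math.sqrt(lambda1_hat(y) / T)
    return b - half_width, b + half_width
```

A two-sided test at the 5% level would reject the null value `b0` when `abs(t1_statistic(y, b0)) > 1.96`, which is equivalent to `b0` falling outside `t1_interval(y)`.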

4. Proposed method

In terms of the practical performance of the three tests $t_{j}$ in (14), however, simulation results reveal two major issues, arising from the estimation of the asymptotic covariance matrix and from the selection of the bandwidth. To address these problems, the proposed empirical likelihood approach is applied to test parameters in models (1)–(2).

To construct an empirical likelihood function, the estimating equations are defined by means of (16) $W_{t} (b) = X_{t - 1} \cdot (Y_{t} - X_{t - 1}^{⊤} b),$ for a generic model parameter $b \in R^{p + 1}$. Since $X_{t - 1}$ is $F_{t - 1}$-measurable, condition (A3) implies that $E (W_{t} (β_{o})) = E (X_{t - 1} g (t / T) ϵ_{t}) = g (t / T) E (X_{t - 1} ϵ_{t}) = 0$ holds for the true parameter vector $β_{o}$. Based on (16), we define the empirical likelihood for the parameter $b$ by $L (b) = \sup \Big\{ \prod_{t = 1}^{T} q_{t} : q_{t} \geq 0, \sum_{t = 1}^{T} q_{t} = 1, \sum_{t = 1}^{T} q_{t} W_{t} (b) = 0 \Big\} .$ By the Lagrange multiplier method, we have ${\hat{q}}_{t} (b) = \frac{1}{T} \{ 1 + {\hat{λ}}^{⊤} W_{t} (b) \}^{- 1}, \quad t = 1, \dots, T,$ where $\hat{λ} = \hat{λ} (b) \in R^{p + 1}$ is the solution of the equations (17) $\frac{1}{T} \sum_{t = 1}^{T} \frac{W_{t} (b)}{1 + {\hat{λ}}^{⊤} W_{t} (b)} = 0 .$ We also note that $\prod_{t = 1}^{T} q_{t}$, subject to the constraints $q_{t} \geq 0$ and $\sum_{t = 1}^{T} q_{t} = 1$, attains its maximum $(1 / T)^{T}$ at $q_{t} = 1 / T$. Thus, the empirical likelihood ratio at $b$ is defined by $ELR (b) = \prod_{t = 1}^{T} \{ {\hat{q}}_{t} (b) T \}^{- 1} = \prod_{t = 1}^{T} \{ 1 + {\hat{λ}}^{⊤} W_{t} (b) \} .$ Taking logarithms and multiplying by 2, we obtain the corresponding empirical log-likelihood ratio, (18) $ℓ (b) = 2 \sum_{t = 1}^{T} \log \{ 1 + {\hat{λ}}^{⊤} W_{t} (b) \} .$ Theorem 4.1 below provides the asymptotic null distribution of $ℓ (β_{o})$.

Theorem 4.1

Assume that conditions (A1)–(A4) hold. Then, under the null hypothesis (5), the limiting distribution of $ℓ (β_{o})$ is the chi-square distribution with p degrees of freedom, i.e., (19) $ℓ (β_{o}) \overset{D}{\to} χ_{p}^{2}, \quad \text{as } T \to \infty .$

According to Theorem 4.1, the empirical likelihood ratio confidence region for the true value $β_{o}$ can be constructed as follows: (20) $ℜ_{EL, α} = \{ b : ℓ (b) \leq χ_{p; 1 - α}^{2} \},$ where $χ_{p; 1 - α}^{2}$ is defined below (15). Combined with (20), Theorem 4.1 implies Corollary 4.1.

Corollary 4.1

Under the conditions of Theorem 4.1, $P (β_{o} \in ℜ_{EL, α}) \to 1 - α$ as $T \to \infty$.
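
The computation of $ℓ (b)$ in (18) and of the region (20) can be sketched in the scalar case. The code below assumes a zero-mean $AR (1)$ model without intercept, so that $W_{t} (b)$ and $\hat{λ}$ are scalars and the limiting distribution has one degree of freedom. Solving (17) by bisection, scanning a grid of candidate values and the function names are illustrative choices made here, not the procedure used in the simulations.

```python
import math

CHI2_1_95 = 3.8415  # 0.95 quantile of the chi-square distribution with 1 degree of freedom


def estimating_functions(y, b):
    """W_t(b) = Y_{t-1} * (Y_t - b * Y_{t-1}), t = 1..T, for y = [Y_0, ..., Y_T]."""
    return [y[t - 1] * (y[t] - b * y[t - 1]) for t in range(1, len(y))]


def el_log_ratio(w, n_iter=200):
    """l = 2 * sum log(1 + lam * w_t), with lam solving sum w_t / (1 + lam * w_t) = 0."""
    w_min, w_max = min(w), max(w)
    if not (w_min < 0.0 < w_max):
        return math.inf                     # 0 lies outside the convex hull of the w_t
    # The weights q_t are all positive only for lam in (-1/w_max, -1/w_min).
    lo, hi = -1.0 / w_max, -1.0 / w_min
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        # The left-hand side of (17) is strictly decreasing in lam.
        if sum(x / (1.0 + mid * x) for x in w) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * x) for x in w)


def el_interval(y, grid, crit=CHI2_1_95):
    """Smallest and largest grid value b with l(b) <= crit, or None if there is none."""
    inside = [b for b in grid if el_log_ratio(estimating_functions(y, b)) <= crit]
    return (min(inside), max(inside)) if inside else None
```

No variance estimate or bandwidth appears anywhere in this computation, which is the practical point of the approach: the only inputs are the estimating functions $W_{t} (b)$ and a chi-square critical value. For a vector parameter, the bisection step would have to be replaced by a multivariate root-finder or a convex optimiser.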

5. Simulation evaluation

In this section, simulation studies are conducted to compare the finite sample performance of five methods described in Sections 3–4:

Ordinary least squares without the heteroscedasticity correction ( $O L S$ ),
$t_{1}$ , $t_{2}$ , $t_{3}$ ,
the proposed empirical likelihood ( $E L$ ) procedure.

The zero-mean $AR (1)$ model with time-varying variance is considered as follows: $Y_{t} = β_{0, 1} Y_{t - 1} + g (t / T) ϵ_{t},$ where $\{ϵ_{t}\} \overset{\text{i.i.d.}}{\sim} N (0, 1)$. The kernel function $K (\cdot)$ is the standard Normal density function, $K (x) = \frac{1}{\sqrt{2 π}} \exp (- \frac{x^{2}}{2}), \quad - \infty < x < \infty,$ and the bandwidth parameter is selected by the cross-validation criterion (10). We consider $H_{0} : β_{0, 1} = β_{1}$ with known values of $β_{1}$.

Three kinds of variance functions $g^{2} (r)$ are considered in the following simulations: a model with a single abrupt change point, a model with two abrupt change points, and a model with a continuous variance function.

Model 1: A single abrupt point model, $g^{2} (r) = σ_{0}^{2} + (σ_{1}^{2} - σ_{0}^{2}) I_{\{r \geq κ\}}, \quad r \in [0, 1] .$ Model 1 corresponds to the case of a single abrupt change of the error variance from $σ_{0}^{2}$ to $σ_{1}^{2}$ at time $[κ T]$, where the break point κ takes values in the set $\{0.1, 0.5, 0.9\}$. The ratio of post-break to pre-break standard deviations, $δ = σ_{1} / σ_{0}$, takes values in the set $\{0.2, 1, 5\}$, where $σ_{0} = 1$.

Model 2: Two abrupt points model, $g^{2} (r) = σ_{0}^{2} + (σ_{1}^{2} - σ_{0}^{2}) I_{\{κ_{0} < r \leq κ_{1}\}} + (σ_{2}^{2} - σ_{0}^{2}) I_{\{κ_{1} < r\}}, \quad r \in [0, 1] .$ Model 2 corresponds to the case of two abrupt changes of the error variance, from $σ_{0}^{2}$ to $σ_{1}^{2}$ and from $σ_{1}^{2}$ to $σ_{2}^{2}$. The break points $(κ_{0}, κ_{1})$ take the values $(0.1, 0.9)$; $(σ_{0}^{2}, σ_{1}^{2}, σ_{2}^{2})$ are from the set $\{(0.2, 5, 0.2), (5, 0.2, 5)\}$.

Model 3: Continuous function variance model, $g^{2} (r) = σ_{0}^{2} + (σ_{1}^{2} - σ_{0}^{2}) r^{m}, \quad r \in [0, 1] .$ In Model 3 the variance of the errors changes continuously from $σ_{0}^{2}$ to $σ_{1}^{2}$. We take m in the set $\{1, 2\}$ and $δ = σ_{1} / σ_{0}$ in the set $\{0.2, 5\}$, where $σ_{0}^{2} = 1$.

Model 1 and Model 3 are the same as in Cavaliere (2004), Cavaliere and Taylor (2007) and Phillips and Xu (2006). Simulations are carried out for values of the parameter of interest $β_{1}$ in the set $\{0.1, 0.5, 0.9\}$, with a nominal size of $5 \%$. The sample size T is taken from $\{60, 200\}$. The number of Monte Carlo replications is 5000.
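
The three variance functions are simple to code. The sketch below (standard-library Python; the function names and the midpoint-rule integrator are illustrative assumptions) also evaluates the integrals $\int g^{2}$ and $\int g^{4}$ that enter (7).

```python
def g2_model1(r, kappa, delta, sigma0=1.0):
    """Model 1: variance jumps from sigma0^2 to (delta * sigma0)^2 at r = kappa."""
    sigma1 = delta * sigma0
    return sigma0 ** 2 + (sigma1 ** 2 - sigma0 ** 2) * (1.0 if r >= kappa else 0.0)


def g2_model2(r, kappa0, kappa1, var0, var1, var2):
    """Model 2: variances var0, var1, var2 on (0, kappa0], (kappa0, kappa1], (kappa1, 1]."""
    middle = 1.0 if kappa0 < r <= kappa1 else 0.0
    last = 1.0 if kappa1 < r else 0.0
    return var0 + (var1 - var0) * middle + (var2 - var0) * last


def g2_model3(r, m, delta, sigma0=1.0):
    """Model 3: variance moves continuously from sigma0^2 to (delta * sigma0)^2."""
    sigma1 = delta * sigma0
    return sigma0 ** 2 + (sigma1 ** 2 - sigma0 ** 2) * r ** m


def integral(f, n=10000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n


def variance_ratio(g2):
    """Ratio (int g^4) / (int g^2)^2 for a variance function g2."""
    return integral(lambda r: g2(r) ** 2) / integral(g2) ** 2
```

For Model 1, the ratio $\int g^{4} / (\int g^{2})^{2}$ takes the same value, approximately 5.48, at $(κ, δ) = (0.1, 0.2)$ and at $(κ, δ) = (0.9, 5)$, and equals 1 when $δ = 1$.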

The simulation results consist of two parts. The first part, displayed in Tables 1, 2 and 3, assesses the rejection rates of the five methods under the null hypothesis.

Table 1. Comparison of the rejection rates of five methods in Model 1 for $β_{1} \in \{0.1, 0.5, 0.9\}$, $κ \in \{0.1, 0.5, 0.9\}$, $δ \in \{0.2, 1, 5\}$ and the sample size $T \in \{60, 200\}$, based on 5000 replications.

Table 2. Comparison of the rejection rates of five methods in Model 2 for $β_{1} \in \{0.1, 0.5, 0.9\}$, $[κ_{0}, κ_{1}] = [0.1, 0.9]$, $[σ_{0}, σ_{1}, σ_{2}] \in \{[0.2, 5, 0.2], [5, 0.2, 5]\}$ and the sample size $T \in \{60, 200\}$, based on 5000 replications.

Table 3. Comparison of the rejection rates of five methods in Model 3 for $β_{1} \in \{0.1, 0.5, 0.9\}$, $m \in \{1, 2\}$, $δ \in \{0.2, 5\}$ and the sample size $T \in \{60, 200\}$, based on 5000 replications.

The second part comprises Figures 1–3, which evaluate the rejection rates of the methods $OLS$, $t_{1}$, $t_{2}$, $t_{3}$ and $EL$ as the parameter $β_{1}$ under the alternatives increases.

Figure 1. The relationship between the rejection rates of $OLS$, $t_{1}$, $t_{2}$, $t_{3}$, $EL$ and the true coefficient $β_{1}$ in Model 1 (a single abrupt point model), with abrupt point $κ = 0.1$ and $δ = 0.2$. The true parameter $β_{1}$ increases gradually from 0.1 to 0.9. (a) Sample size T = 60; (b) sample size T = 200.

From these simulations, we draw the following conclusions.

First, the $OLS$-based test is inefficient and unreliable under heteroscedastic innovations. From Table 1, the $OLS$-based test heavily overrejects the null hypothesis when the null is true, and its size distortion is largest under $(κ, δ) \in \{(0.1, 0.2), (0.9, 5)\}$. In addition, except for homoscedastic innovations, the size distortion does not shrink as the sample size increases, which is also shown in Figures and . From Table , the $OLS$-based test performs better than in Table ; however, the rejection rate falls as the sample size increases. The results of the $OLS$-based test in Table are similar to those in Table .
Second, the performance of $t_{2}$ and $t_{3}$ depends on the numerical value of the true parameter and on the pattern of the variance function. Figures 1, 2 and 3 show an interesting phenomenon: the rejection rates of $t_{2}$ and $t_{3}$ tend to increase with the parameter and grow larger when $β_{1} > 0.5$. The rejection rate of $t_{2}$ is far greater than the nominal size $5 \%$ when the parameter is close to unity, namely $β_{1} = 0.9$. In particular, $t_{2}$ and $t_{3}$ overaccept the null hypothesis, with rejection rates less than or equal to $5 \%$, when $β_{1} < 0.5$. On the contrary, $t_{2}$ and $t_{3}$ overreject the null hypothesis when $β_{1} > 0.9$. Similar conclusions follow from Tables 1–3. So neither $t_{2}$ nor $t_{3}$ is a stable test for the $ARTV$ model.
Third, both $EL$ and $t_{1}$ are stable tests for the $ARTV$ model, and $EL$ outperforms $t_{1}$. From Tables 1–3, we find that $EL$ and $t_{1}$ overreject the null hypothesis when the null is true. From Figures 1–3, the rejection rate of $EL$ is almost a horizontal line and is closer to the nominal level $5 \%$ than that of $t_{1}$, except in Figure (a) when the sample size is 60. When the sample size is 200, the rejection rate of $EL$ is close to the nominal size of $5 \%$ and does not depend on the numerical value of the parameter, even if the true value of $β_{1}$ is close to unity ($β_{1} = 0.9$). $EL$ has the smallest size distortion overall and avoids correcting the variance. The simulation results generally support the asymptotic results. $EL$ is more stable and performs better than $OLS$, $t_{1}$, $t_{2}$ and $t_{3}$ for testing the parameters of $ARTV$. So $EL$ seems to be the better choice.

Figure 2. The relationship between the rejection rates of $OLS$, $t_{1}$, $t_{2}$, $t_{3}$, $EL$ and the true coefficient $β_{1}$ in Model 2 (two abrupt points model), with abrupt points $κ_{0} = 0.1$, $κ_{1} = 0.9$ and $[σ_{0}, σ_{1}, σ_{2}] = [0.2, 5, 0.2]$. The true parameter $β_{1}$ increases gradually from 0.1 to 0.9. (a) Sample size T = 60; (b) sample size T = 200.

Figure 3. The relationship between the rejection rates of $OLS$, $t_{1}$, $t_{2}$, $t_{3}$, $EL$ and the true coefficient $β_{1}$ in Model 3 (continuous function variance model), with m = 1 and $δ = 0.2$. The true parameter $β_{1}$ increases gradually from 0.1 to 0.9. (a) Sample size T = 60; (b) sample size T = 200.

6. Conclusion

This article focuses on the empirical likelihood approach for autoregressive models with error terms scaled by an unknown nonparametric time-varying function. The empirical likelihood ratio test statistic avoids estimating the unknown variance function in the presence of heteroscedastic error terms. Simulation results for three different models show that the empirical likelihood test is more stable than the other four test statistics. Possible extensions include improving the efficiency of the statistic by using different estimating equations, and locating the abrupt change points when they exist.

Acknowledgments

The authors thank the editor, Prof. Jun Shao, and two anonymous reviewers for helpful comments. Yu Han was supported by the Scientific Research Foundation of Jilin Education (JJKH20200102KJ). The work of C. Zhang was partially supported by U.S. National Science Foundation grants DMS-2013486 and DMS-1712418, and by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Yu Han

Yu Han received his Ph.D. degree in mathematical statistics from Jilin University in 2012. He is currently an associate research fellow in the Educational Supervision and Evaluation Center of Northeast Electrical Power University. His current research interests are time series analysis and non-parametric and semi-parametric estimation and inference. He was a visiting scholar at the Department of Statistics, University of Wisconsin-Madison, between 2013 and 2014. He has published 12 papers and has worked on 3 projects as a principal investigator or participant, of which one has been completed and two are ongoing.

Chunming Zhang

Chunming Zhang is Professor of Statistics at the University of Wisconsin-Madison. Her research interests range from statistical learning and data mining, statistical methods with applications to imaging data, neuroinformatics and bioinformatics, multiple testing, large-scale simultaneous inference and applications, statistical methods in financial econometrics, non- and semi-parametric estimation and inference, to functional and longitudinal data analysis.

References

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327. https://doi.org/10.1016/0304-4076(86)90063-1
Cavaliere, G. (2004). Unit root tests under time-varying variance shifts. Econometric Reviews, 23, 259–292. https://doi.org/10.1081/ETC-200028215
Cavaliere, G., & Taylor, A. M. R. (2007). Testing for unit roots in time series models with nonstationary volatility. Journal of Econometrics, 140(2), 919–947. https://doi.org/10.1016/j.jeconom.2006.07.019
Chan, N. H., & Ling, S. Q. (2006). Empirical likelihood for GARCH models. Econometric Theory, 22(3), 403–428. https://doi.org/10.1017/S0266466606060208
Chen, S. X., & Van Keilegom, I. (2009). A review on empirical likelihood methods for regression. Test, 18(3), 415–447. https://doi.org/10.1007/s11749-009-0159-5
Chen, S. X., & Qin, Y. (2000). Empirical likelihood confidence intervals for local linear smoothers. Biometrika, 87, 946–953. https://doi.org/10.1093/biomet/87.4.946
DiCiccio, T., Hall, P., & Romano, J. (1991). Empirical likelihood is Bartlett-correctable. Annals of Statistics, 19(2), 1053–1061. https://doi.org/10.1214/aos/1176348137
Drees, H., & Starica, C. (2002). A simple non-stationary model for stock returns (Working paper). Chalmers University of Technology.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica, 50, 987–1008. https://doi.org/10.2307/1912773
Engle, R. F., & Rangel, J. G. (2008). The spline-GARCH model for low-frequency volatility and its global macroeconomic causes. The Review of Financial Studies, 21(3), 1187–1222. https://doi.org/10.1093/rfs/hhn004
Han, Y., Jin, Y. H., & Chen, M. (2013). Empirical likelihood-based subset selection for partially linear autoregressive models. Acta Mathematicae Applicatae Sinica, English Series, 29(4), 793–808. https://doi.org/10.1007/s10255-013-0256-9
Hansen, B. E. (1995). Regression with nonstationary volatility. Econometrica, 63, 1113–1132. https://doi.org/10.2307/2171723
Kolaczyk, E. D. (1994). Empirical likelihood for generalized linear models. Statistica Sinica, 4, 199–218. http://www3.stat.sinica.edu.tw/statistica/oldpdf/A4n111.pdf
Li, G., & Wang, Q. H. (2003). Empirical likelihood regression analysis for right censored data. Statistica Sinica, 13, 51–68. https://www.jstor.org/stable/24307094?seq=1
Lu, X. W. (2009). Empirical likelihood for heteroscedastic partially linear models. Journal of Multivariate Analysis, 100, 387–396. https://doi.org/10.1016/j.jmva.2008.05.006
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications, 9(1), 141–142. https://doi.org/10.1137/1109020
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249. https://doi.org/10.1093/biomet/75.2.237
Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Annals of Statistics, 18, 90–120. https://doi.org/10.1214/aos/1176347494
Owen, A. B. (1991). Empirical likelihood for linear models. Annals of Statistics, 19(4), 1725–1747. https://doi.org/10.1214/aos/1176348368
Owen, A. B. (2001). Empirical likelihood. Chapman and Hall.
Phillips, P. C. B., & Xu, K. L. (2006). Inference in autoregression under heteroskedasticity. Journal of Time Series Analysis, 27, 289–308. https://doi.org/10.1111/jtsa.2006.27.issue-2
Polzehl, J., & Spokoiny, V. (2006). Varying coefficient GARCH versus local constant volatility modeling: Comparison of predictive power (Working paper). Weierstrass Institute for Applied Analysis and Stochastics.
Qiu, J., & Wu, L. (2015). A moving blocks empirical likelihood method for longitudinal data. Biometrics, 71, 616–624. https://doi.org/10.1111/biom.12317
Qin, G., & Jing, B. Y. (2001). Empirical likelihood for censored linear regression. Scandinavian Journal of Statistics, 28, 661–673. https://doi.org/10.1111/sjos.2001.28.issue-4
Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. Annals of Statistics, 22, 300–325. https://doi.org/10.1214/aos/1176325370
Shi, J., & Lau, T. S. (2000). Empirical likelihood for partially linear models. Journal of Multivariate Analysis, 72(1), 132–148. https://doi.org/10.1006/jmva.1999.1866
Starica, C. (2003). Is GARCH(1,1) as good a model as the Nobel prize accolades would imply? (Working paper). Chalmers University of Technology.
Xu, K. L., & Phillips, P. C. B. (2008). Adaptive estimation of autoregressive models with time-varying variances. Journal of Econometrics, 142, 265–280. https://doi.org/10.1016/j.jeconom.2007.06.001
Variyath, A. M., Chen, J. H., & Abraham, B. (2010). Empirical likelihood based variable selection. Journal of Statistical Planning and Inference, 140, 971–981. https://doi.org/10.1016/j.jspi.2009.09.025
Watson, G. S. (1964). Smooth regression analysis. Sankhya Series A, 26, 359–372.
Wong, W. H. (1983). On the consistency of cross validation in kernel nonparametric regression. Annals of Statistics, 11, 1136–1141. https://doi.org/10.1214/aos/1176346327
Zhou, M., & Li, G. (2008). Empirical likelihood analysis of the Buckley-James estimator. Journal of Multivariate Analysis, 99, 649–664. https://doi.org/10.1016/j.jmva.2007.02.007

Appendix. Proofs of main results

Before proving Theorem 4.1, we first establish Lemmas A.1–A.2. To simplify notation, we write $\hat{λ} = \hat{λ} (β_{o})$ and $W_{t} = W_{t} (β_{o})$.

Lemma A.1

Assume that conditions (A1)–(A4) hold. Then

(A1) $\frac{1}{\sqrt{T}} \sum_{t = 1}^{T} W_{t} \overset{D}{\to} N (0, Ω_{2}),$

(A2) $\frac{1}{T} \sum_{t = 1}^{T} W_{t} W_{t}^{⊤} \overset{P}{\to} Ω_{2},$

where $\overset{D}{\to}$ and $\overset{P}{\to}$ denote convergence in distribution and convergence in probability, respectively.

Proof.

Both results follow directly from Lemma 1(iii)–(iv) of Phillips and Xu (2006), which completes the proof of Lemma A.1.
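As a numerical illustration of Lemma A.1, the following Monte Carlo sketch compares the variance of $T^{-1/2} \sum_{t} W_{t}$ across replications with the average of $T^{-1} \sum_{t} W_{t}^{2}$; by (A1) and (A2) the two should be close. The setting is an assumption made only for this example: an AR(1) model without intercept, so that $W_{t} = Y_{t-1} u_{t}$ is scalar, Gaussian innovations, and a hypothetical step function $g$ equal to 1 before a break at `kappa` and `delta` afterwards. The function name `lemma_a1_check` is likewise illustrative.

```python
import numpy as np

def lemma_a1_check(T=200, beta=0.5, kappa=0.5, delta=5.0,
                   reps=2000, seed=0, burn=100):
    """Monte Carlo comparison of Var(T^{-1/2} sum W_t) and mean(T^{-1} sum W_t^2).

    Assumed model (illustration only): Y_t = beta * Y_{t-1} + g(t/T) * eps_t,
    eps_t iid N(0, 1), g(r) = 1 for r < kappa and delta otherwise.
    """
    rng = np.random.default_rng(seed)
    # Rescaled time t/T; the burn-in period uses r = 0, i.e. g = 1.
    r = np.concatenate([np.zeros(burn), np.arange(1, T + 1) / T])
    g = np.where(r < kappa, 1.0, delta)
    u = g * rng.standard_normal((reps, T + burn))   # u_t = g(t/T) * eps_t
    y = np.zeros((reps, T + burn + 1))
    for t in range(1, T + burn + 1):                # AR(1) recursion, all replications at once
        y[:, t] = beta * y[:, t - 1] + u[:, t - 1]
    x = y[:, burn:-1]                               # Y_0, ..., Y_{T-1}
    W = x * u[:, burn:]                             # W_t = Y_{t-1} u_t at the true beta
    s = W.sum(axis=1) / np.sqrt(T)                  # T^{-1/2} sum_t W_t, one value per replication
    v = (W ** 2).mean(axis=1)                       # T^{-1} sum_t W_t^2, one value per replication
    return s.var(), v.mean()

if __name__ == "__main__":
    var_s, mean_v = lemma_a1_check()
    print("variance of the scaled sum:", var_s)
    print("average of the second-moment estimate:", mean_v)
```

The two printed numbers estimate the same limit $Ω_{2}$ in this scalar setting, so their ratio should be near one up to Monte Carlo error.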

Lemma A.2

Assume that conditions (A1)–(A3) hold. Then $\hat{λ} = O_{P} (T^{- 1 / 2}) .$

Proof.

From (17), we have

$0 = \frac{1}{T} \sum_{t = 1}^{T} W_{t} - \frac{1}{T} \sum_{t = 1}^{T} \frac{W_{t} W_{t}^{⊤}}{1 + {\hat{λ}}^{⊤} W_{t}} \hat{λ} .$

By (A1) of Lemma A.1,

$\frac{‖ \hat{λ} ‖_{2}}{1 + ‖ \hat{λ} ‖_{2} \max_{t} ‖ W_{t} ‖_{2}} ‖ \frac{1}{T} \sum_{t = 1}^{T} W_{t} W_{t}^{⊤} ‖ \leq {‖ \frac{1}{T} \sum_{t = 1}^{T} W_{t} ‖}_{2} = O_{P} (T^{- 1 / 2}) .$

According to conditions (A1) and (A4), we have $E (| Y_{t} |^{4 ν}) < \infty$ for some $ν > 1$, and then

(A3) $\begin{aligned} \max_{t} ‖ W_{t} ‖_{2} & = \max_{t} ‖ X_{t - 1} (Y_{t} - β_{o}^{⊤} X_{t - 1}) ‖_{2} = \max_{t} ‖ X_{t - 1} u_{t} ‖_{2} \\ & = \max_{t} ‖ X_{t - 1} g (t / T) ϵ_{t} ‖_{2} = o_{P} (T^{\frac{1}{4 ν}}) . \end{aligned}$

Combining these bounds with (A2) of Lemma A.1 and an argument similar to that used in Owen (1991) completes the proof of Lemma A.2.
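The final step, which the proof delegates to Owen (1991), can be sketched as follows. This is an illustrative reading of that argument rather than the authors' own derivation. The symbols $c_{T}$, $Z_{T}$ and $A_{T}$ are shorthand introduced only for this sketch, and it is assumed that $Ω_{2}$ is nonzero so that $c_{T}$ stays bounded away from zero in probability.

```latex
% Shorthand (introduced here, not in the paper):
%   c_T = ‖ T^{-1} \sum_t W_t W_t^⊤ ‖   (converges in probability to ‖Ω_2‖ > 0 by (A2)),
%   Z_T = \max_t ‖W_t‖_2 = o_P(T^{1/(4ν)})   by (A3),
%   A_T = ‖ T^{-1} \sum_t W_t ‖_2 = O_P(T^{-1/2})   by (A1).
\begin{aligned}
\frac{‖\hat{λ}‖_{2}}{1 + ‖\hat{λ}‖_{2} Z_{T}} \, c_{T} \leq A_{T}
  &\;\Longrightarrow\; ‖\hat{λ}‖_{2} \, (c_{T} - A_{T} Z_{T}) \leq A_{T}, \\
A_{T} Z_{T} = o_{P}(T^{\frac{1}{4ν} - \frac{1}{2}}) = o_{P}(1) \quad (ν > 1)
  &\;\Longrightarrow\; ‖\hat{λ}‖_{2} \leq \frac{A_{T}}{c_{T} - o_{P}(1)} = O_{P}(T^{-1/2}).
\end{aligned}
```

The first implication multiplies through by the positive denominator and collects the terms in $‖\hat{λ}‖_{2}$; the second uses that $c_{T} - A_{T} Z_{T}$ is positive with probability tending to one.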

Proof of Theorem 4.1.

Suppose that $β_{o}$ is the true parameter. Applying a Taylor expansion to (18), we have

(A4) $\begin{aligned} ℓ (β_{o}) & = 2 \sum_{t = 1}^{T} \log (1 + {\hat{λ}}^{⊤} W_{t}) \\ & = 2 \sum_{t = 1}^{T} \{ {\hat{λ}}^{⊤} W_{t} - \frac{1}{2} ({\hat{λ}}^{⊤} W_{t})^{2} \} + r_{T}, \end{aligned}$

where, in light of (A2) of Lemma A.1 and Lemma A.2, $r_{T}$ satisfies in probability the following inequality for some constant $C > 0$:

$\begin{aligned} | r_{T} | & \leq C \sum_{t = 1}^{T} | {\hat{λ}}^{⊤} W_{t} |^{3} \\ & \leq C ‖ \hat{λ} ‖_{2}^{3} \max_{1 \leq t \leq T} ‖ W_{t} ‖_{2} \sum_{t = 1}^{T} ‖ W_{t} ‖_{2}^{2} = o_{P} (1) . \end{aligned}$

By (A2) of Lemma A.1, Lemma A.2 and arguments similar to those above, we have

(A5) $\sum_{t = 1}^{T} \frac{({\hat{λ}}^{⊤} W_{t})^{3}}{1 + {\hat{λ}}^{⊤} W_{t}} = o_{P} (1) .$

By (17), we obtain

(A6) $\begin{aligned} 0 & = \sum_{t = 1}^{T} \frac{{\hat{λ}}^{⊤} W_{t}}{1 + {\hat{λ}}^{⊤} W_{t}} \\ & = \sum_{t = 1}^{T} ({\hat{λ}}^{⊤} W_{t}) - \sum_{t = 1}^{T} ({\hat{λ}}^{⊤} W_{t})^{2} + \sum_{t = 1}^{T} \frac{({\hat{λ}}^{⊤} W_{t})^{3}}{1 + {\hat{λ}}^{⊤} W_{t}} . \end{aligned}$

By (A5) and (A6), we obtain

(A7) $\sum_{t = 1}^{T} ({\hat{λ}}^{⊤} W_{t}) = \sum_{t = 1}^{T} ({\hat{λ}}^{⊤} W_{t})^{2} + o_{P} (1) .$

Again by (17), we obtain

$\begin{aligned} 0 & = \sum_{t = 1}^{T} \frac{W_{t}}{1 + {\hat{λ}}^{⊤} W_{t}} = \sum_{t = 1}^{T} W_{t} \{ 1 - {\hat{λ}}^{⊤} W_{t} + \frac{({\hat{λ}}^{⊤} W_{t})^{2}}{1 + {\hat{λ}}^{⊤} W_{t}} \} \\ & = \sum_{t = 1}^{T} W_{t} - \sum_{t = 1}^{T} (W_{t} W_{t}^{⊤}) \hat{λ} + \sum_{t = 1}^{T} \frac{W_{t} ({\hat{λ}}^{⊤} W_{t})^{2}}{1 + {\hat{λ}}^{⊤} W_{t}} . \end{aligned}$

By Lemma A.1 and (A3), we have

$\begin{aligned} \frac{1}{T} \sum_{t = 1}^{T} {‖ \frac{W_{t} ({\hat{λ}}^{⊤} W_{t})^{2}}{1 + {\hat{λ}}^{⊤} W_{t}} ‖}_{2} & \leq C ‖ \hat{λ} ‖_{2}^{2} \max_{t} ‖ W_{t} ‖_{2} \frac{1}{T} \sum_{t = 1}^{T} ‖ W_{t} ‖_{2}^{2} \\ & = o_{P} (T^{- 1 / 2}) . \end{aligned}$

Thus, we have

$\begin{aligned} \hat{λ} & = {(\sum_{t = 1}^{T} W_{t} W_{t}^{⊤})}^{- 1} \sum_{t = 1}^{T} W_{t} + {(\frac{1}{T} \sum_{t = 1}^{T} W_{t} W_{t}^{⊤})}^{- 1} \{ \frac{1}{T} \sum_{t = 1}^{T} \frac{W_{t} ({\hat{λ}}^{⊤} W_{t})^{2}}{1 + {\hat{λ}}^{⊤} W_{t}} \} \\ & = {(\sum_{t = 1}^{T} W_{t} W_{t}^{⊤})}^{- 1} \sum_{t = 1}^{T} W_{t} + o_{P} (T^{- 1 / 2}) . \end{aligned}$

Substituting this expression for $\hat{λ}$ into (A4) and using (A7), we have

$\begin{aligned} ℓ (β_{o}) & = \sum_{t = 1}^{T} {\hat{λ}}^{⊤} W_{t} W_{t}^{⊤} \hat{λ} + o_{P} (1) \\ & = {(T^{- 1 / 2} \sum_{t = 1}^{T} W_{t})}^{⊤} {(T^{- 1} \sum_{t = 1}^{T} W_{t} W_{t}^{⊤})}^{- 1} (T^{- 1 / 2} \sum_{t = 1}^{T} W_{t}) + o_{P} (1) . \end{aligned}$

By (A1) and (A2) of Lemma A.1, the leading quadratic form converges in distribution to a chi-square limit, which completes the proof of Theorem 4.1.
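The quantities in this proof can be computed directly. The sketch below solves the estimating equation (17) for $\hat{λ}$ by damped Newton steps, evaluates $ℓ (b)$ from (18), and also evaluates the quadratic form that the proof shows to be asymptotically equivalent. It is a minimal illustration under stated assumptions, not the authors' implementation: the AR(1) model without intercept, the Gaussian innovations, the step-function choice of $g$, and all function names (`simulate_ar1`, `w_terms`, `el_lambda`, `el_stat`, `quad_form`) are hypothetical choices made for this example.

```python
import numpy as np

def simulate_ar1(T, beta, kappa, delta, rng, burn=100):
    """Assumed model: Y_t = beta*Y_{t-1} + g(t/T)*eps_t, eps_t iid N(0,1),
    with a hypothetical step function g: 1 before the break at kappa, delta after."""
    r = np.concatenate([np.zeros(burn), np.arange(1, T + 1) / T])
    u = np.where(r < kappa, 1.0, delta) * rng.standard_normal(T + burn)
    y = np.zeros(T + burn + 1)
    for t in range(1, T + burn + 1):
        y[t] = beta * y[t - 1] + u[t - 1]
    return y[burn:]                        # Y_0, ..., Y_T

def w_terms(y, b):
    """W_t(b) = X_{t-1} (Y_t - b X_{t-1}) with X_{t-1} = Y_{t-1}; shape (T, 1)."""
    x = y[:-1]
    return (x * (y[1:] - b * x))[:, None]

def el_lambda(W, max_iter=100, tol=1e-10):
    """Solve sum_t W_t / (1 + lam' W_t) = 0, i.e. equation (17), for lam.

    Uses Newton steps on the concave function sum_t log(1 + lam' W_t),
    halving the step until the function does not decrease and all
    1 + lam' W_t stay positive."""
    T, d = W.shape
    lam = np.zeros(d)

    def objective(l):
        z = 1.0 + W @ l
        return -np.inf if np.any(z <= 0) else np.log(z).sum()

    for _ in range(max_iter):
        z = 1.0 + W @ lam
        grad = (W / z[:, None]).sum(axis=0)            # T times the left side of (17)
        if np.linalg.norm(grad) <= tol * T:
            break
        hess = (W / z[:, None] ** 2).T @ W             # sum_t W_t W_t' / (1 + lam' W_t)^2
        step = np.linalg.solve(hess, grad)
        s, f0 = 1.0, objective(lam)
        while objective(lam + s * step) < f0 and s > 1e-12:
            s *= 0.5
        lam = lam + s * step
    return lam

def el_stat(W):
    """Empirical likelihood ratio statistic from (18): 2 * sum_t log(1 + lam' W_t)."""
    lam = el_lambda(W)
    return 2.0 * np.log(1.0 + W @ lam).sum()

def quad_form(W):
    """Leading term in the proof: (sum W_t)' (sum W_t W_t')^{-1} (sum W_t)."""
    S = W.sum(axis=0)
    return float(S @ np.linalg.solve(W.T @ W, S))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = simulate_ar1(T=200, beta=0.5, kappa=0.5, delta=5.0, rng=rng)
    W = w_terms(y, b=0.5)                  # evaluated at the true coefficient
    print("EL statistic:", el_stat(W))
    print("quadratic-form approximation:", quad_form(W))
```

The normalising factors $T^{-1/2}$ and $T^{-1}$ in the quadratic form cancel, which is why `quad_form` works with unscaled sums. The Newton solver requires that zero lie inside the convex hull of the $W_{t}$; in the scalar case this simply means that the $W_{t}$ take both signs.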


Table 1. Comparison of the rejection rates of five methods in Model 1 for $β_{1} \in \{0.1, 0.5, 0.9\}$, $κ \in \{0.1, 0.5, 0.9\}$, $δ \in \{0.2, 1, 5\}$ and the sample size $T \in \{60, 200\}$, based on 5000 replications.

Table 2. Comparison of the rejection rates of five methods in Model 2 for $β_{1} \in \{0.1, 0.5, 0.9\}$, $[κ_{0}, κ_{1}] = [0.1, 0.9]$, $[σ_{0}, σ_{1}, σ_{2}] \in \{[0.2, 5, 0.2], [5, 0.2, 5]\}$ and the sample size $T \in \{60, 200\}$, based on 5000 replications.

Table 3. Comparison of the rejection rates of five methods in Model 3 for $β_{1} \in \{0.1, 0.5, 0.9\}$, $m \in \{1, 2\}$, $δ \in \{0.2, 5\}$ and the sample size $T \in \{60, 200\}$, based on 5000 replications.