
Empirical likelihood inference in autoregressive models with time-varying variances

Yu Han & Chunming Zhang
Pages 129-138 | Received 25 Dec 2020, Accepted 05 Apr 2021, Published online: 22 Apr 2021

Abstract

This paper develops an empirical likelihood (EL) inference procedure for the parameters of autoregressive models whose error variances are scaled by an unknown nonparametric time-varying function. Compared with existing methods based on non-parametric and semi-parametric estimation, the proposed test statistic avoids estimating the variance function while retaining an asymptotic chi-square distribution under the null. Simulation studies demonstrate that the proposed EL procedure (a) is more stable, i.e., depends less on the change points in the error variances, and (b) attains coverage closer to the nominal confidence level than the traditional test statistics.

1. Introduction

In macroeconomic and financial applications, heteroscedasticity is common in time series models, and ignoring it often leads to inefficient estimation and unreliable inference. Work on heteroscedasticity has therefore focused mainly on the effects of violating homoscedasticity, usually in two forms: ‘conditional heteroscedasticity’ and ‘unconditional heteroscedasticity’.

Non-constant volatility is described as ‘conditional heteroscedasticity’ when future periods of high and low volatility cannot be identified in advance. Engle (1982) and Bollerslev (1986) proposed the ARCH and GARCH models, respectively, and efficient estimation of the mean function is available via quasi-maximum likelihood and other adaptive procedures. More elaborate GARCH models have been proposed to allow for conditional heteroscedasticity, for instance, varying coefficient GARCH models (see Polzehl & Spokoiny, 2006) and spline GARCH models (see Engle & Rangel, 2008). Time-varying volatility is often used to describe conditional heteroscedasticity. Drees and Starica (2002) and Starica (2003) used a non-stationary framework to analyse the time series of S&P 500 returns, and found that this approach outperformed GARCH-type models.

‘Unconditional heteroscedasticity’ is the relevant notion for variables with identifiable deterministic variability, such as seasonal patterns in electricity usage. Hansen (1995) considered the linear regression model with deterministically trending regressors only, in which the error is an AR(p) process scaled by a continuous function of time. The autoregressive model is also nested as a special case when the conditional error variance of the model is a function of a covariate that has the form of a nearly integrated stochastic process with no deterministic drift. For the constant coefficient autoregressive model with time-varying variances (ARTV) discussed in this article, Phillips and Xu (2006) combined ordinary least squares estimation with nonparametric estimation of the variance function to obtain three heteroscedasticity-robust test statistics, and proved their asymptotic standard normal distributions. Xu and Phillips (2008) proposed heteroscedasticity-robust adaptive estimation for the ARTV model. The performance of the methods in Phillips and Xu (2006) and Xu and Phillips (2008), however, relies on an appropriate choice of the bandwidth used in the nonparametric estimation of the variance function.

Motivated by the ‘empirical likelihood’ (EL) approach, this article aims to develop a test statistic that is more stable, namely, one that depends less on the change points in the error variances and avoids the problem of bandwidth selection. The EL approach was introduced by Owen (1988), Owen (1990) and Owen (1991) to construct confidence intervals in a nonparametric setting; see Owen (2001) for an overview. Because the EL approach is nonparametric, the distribution of the data need not be specified, while more efficient estimates of the parameters can still be obtained. The EL approach lets the data determine the shape of confidence regions without estimating the variance of the test statistic, and it is Bartlett correctable (DiCiccio et al., 1991). The EL approach has been applied in many settings: generalised linear models in Kolaczyk (1994), local linear smoothers in Chen and Qin (2000), partially linear models in Shi and Lau (2000), parametric and semi-parametric models in multiresponse regression in Chen and Ingrid (2009), linear regression with censored data in Zhou and Li (2008), plug-in estimates of nuisance parameters in estimating equations in the context of survival analysis in Li and Wang (2003) and Qin and Jing (2001), heteroscedastic partially linear models in Lu (2009), GARCH models in Chan and Ling (2006), variable selection in Han et al. (2013) and Variyath and Chen (2010), and the analysis of longitudinal data in Qiu and Wu (2015). Qin and Lawless (1994) linked EL with finitely many estimating equations, which serve as finitely many equality constraints. To the best of our knowledge, no published work has applied the EL approach to constant coefficient autoregressive models with time-varying variances. This article considers such models with time-varying innovation variance using the EL approach.

The remainder of the paper proceeds as follows. Section 2 describes the autoregressive model with time-varying variances and discusses the main assumptions. Section 3 reviews the existing methods. Section 4 develops the empirical likelihood inference procedure with theoretical guarantees. Section 5 conducts simulation studies to evaluate the finite sample performance of the proposed method against alternative methods. Section 6 briefly concludes. Technical details and proofs of the main results are relegated to the Appendix.

2. Autoregressive model with time-varying variances

The constant coefficient autoregressive model with time-varying variances is described as follows,
$$Y_t=\beta_0+\beta_1Y_{t-1}+\beta_2Y_{t-2}+\cdots+\beta_pY_{t-p}+u_t=X_{t-1}^\top\beta^{o}+u_t,\tag{1}$$
$$u_t=\sigma_t\epsilon_t,\quad t=1,\ldots,T,\tag{2}$$
where $\top$ denotes transpose, $X_{t-1}=(1,Y_{t-1},\ldots,Y_{t-p})^\top\in\mathbb{R}^{p+1}$ is the vector of covariates, and $\beta^{o}=(\beta_0,\beta_1,\ldots,\beta_p)^\top\in\mathbb{R}^{p+1}$ is the true parameter vector of interest, with $\beta_p\ne0$, and the lag order $p$ finite and known. We assume that $\{\sigma_t\}$ is a deterministic sequence in time $t$, satisfying
$$\sigma_t=g(t/T),\tag{3}$$
and $\{\epsilon_t\}$ is a martingale difference sequence with respect to $\mathcal{F}_t$, where $\mathcal{F}_t=\sigma(\epsilon_s:s\le t)$ is the $\sigma$-field generated by $\{\epsilon_s:s\le t\}$, with $E(\epsilon_t^2\mid\mathcal{F}_{t-1})=1$, a.s., for all $t$. Thus, the conditional variance of $\{u_t\}$ is fully characterised by the multiplicative factor $\sigma_t$ in (2), i.e.,
$$E(u_t^2\mid\mathcal{F}_{t-1})=\sigma_t^2=g^2(t/T),\quad\text{a.s.}\tag{4}$$
Suppose that the data are generated from models (1)–(2), and we observe a sample containing $T+p$ observations, denoted by $\{Y_{-p+1},Y_{-p+2},\ldots,Y_0,Y_1,\ldots,Y_T\}$. The main goals are to make inferences about the true parameter vector $\beta^{o}$ in models (1)–(2), i.e., testing the null hypothesis,
$$H_0:\ \beta^{o}=b_0,\tag{5}$$
where $b_0=(b_{0,0},b_{0,1},\ldots,b_{0,p})^\top\in\mathbb{R}^{p+1}$, and constructing a confidence region for $\beta^{o}$.
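To make the data-generating mechanism concrete, the following minimal Python sketch simulates a sample of length $T+p$ from models (1)–(2). It is illustrative only: the function name `simulate_artv`, the Gaussian choice for $\{\epsilon_t\}$, the burn-in length, and the treatment of $g$ during the burn-in are our assumptions, not part of the paper.

```python
import numpy as np

def simulate_artv(beta, g, T, burn_in=200, rng=None):
    """Simulate Y_{-p+1},...,Y_T from Y_t = X_{t-1}'beta + g(t/T) eps_t with iid N(0,1) eps_t.

    beta : array of length p+1, intercept first, as in model (1).
    g    : callable variance-scale function on (0, 1], as in (3).
    """
    rng = np.random.default_rng(rng)
    beta = np.asarray(beta, dtype=float)
    p = beta.size - 1
    n = burn_in + T + p
    y = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(p, n):
        # map the retained observations to (0, 1]; the burn-in reuses g near 0
        r = min(max((t - burn_in - p + 1) / T, 1.0 / T), 1.0)
        lags = y[t - p:t][::-1]                      # (Y_{t-1}, ..., Y_{t-p})
        y[t] = beta[0] + beta[1:] @ lags + g(r) * eps[t]
    return y[burn_in:]                               # length T + p

# example: AR(1) with an abrupt variance change from 1 to 25 halfway through the sample
y = simulate_artv([0.0, 0.5], lambda r: 1.0 if r < 0.5 else 5.0, T=200, rng=1)
```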

Section 4 will present our proposed empirical likelihood inference, after Section 3 describes the estimation methods in Phillips and Xu (2006).

To facilitate the discussion of the main results and the comparison with related existing methods, the following conditions from Phillips and Xu (2006) and Xu and Phillips (2008) are considered.

Conditions

  (A1) $g(\cdot)$ in (3) and (4) is a measurable and strictly positive function on the interval $(0,1]$ such that $0<\inf_{r\in(0,1]}g(r)\le\sup_{r\in(0,1]}g(r)<\infty$, and $g(r)$ satisfies a Lipschitz condition except at a finite number of points of discontinuity;

  (A2) Suppose that $L$ is the lag operator. Then $1-\beta_1L-\beta_2L^2-\cdots-\beta_pL^p=0$ has all its roots outside the unit circle;

  (A3) $\{\epsilon_t\}$ satisfies $E(\epsilon_t\mid\mathcal{F}_{t-1})=0$ and $E(\epsilon_t^2\mid\mathcal{F}_{t-1})=1$, a.s., for all $t$;

  (A4) $\sup_t E(|\epsilon_t|^{4\nu})<\infty$ for some $\nu>1$.

Remark 2.1

  1. Under condition (A1), the function $g$ is integrable on the interval $(0,1]$ to any finite order. For brevity, we write $\int_0^1 g^m(x)\,dx$ as $(g^m)$ for any finite positive integer $m\ge1$.

  2. Condition (A2) is the stability condition which, for a constant $g(\cdot)$ and homoskedastic $\{\epsilon_t\}$, would ensure that $\{Y_t\}$ is stationary or asymptotically covariance-stationary. Under condition (A2), the mean $\mu$ of $Y_t$ is given by $\mu=\beta_0/(1-\beta_1-\cdots-\beta_p)$, and $Y_t$ has the Wold representation $Y_t=\mu+\sum_{i=0}^{\infty}\alpha_iu_{t-i}$, where $\{\alpha_i\}$ satisfies $\alpha_i-\beta_1\alpha_{i-1}-\cdots-\beta_p\alpha_{i-p}=0$ for $i>0$, with $\alpha_0=1$ and $\alpha_i=0$ for $i<0$, and $\sum_{i=1}^{\infty}|\alpha_i|<\infty$. Define $\Omega$ to be the matrix with $(i,j)$-th element $\gamma_{|i-j|}$, where $\gamma_k=\sum_{i=0}^{\infty}\alpha_i\alpha_{i+k}<\infty$.

  3. Condition (A3) ensures that $\{\epsilon_t\}$ is a martingale difference sequence and, at the same time, stipulates that $E(u_t^2\mid\mathcal{F}_{t-1})=g^2(t/T)$ does not depend on past events; in other words, models (1)–(2) are unconditionally heteroscedastic.

3. Existing methods

Regarding the estimation of $\beta^{o}$ in models (1)–(2), Phillips and Xu (2006) reviewed the ordinary least squares (OLS) estimator $\hat\beta$, and showed that under the stated conditions, as $T\to\infty$,
$$\sqrt{T}(\hat\beta-\beta^{o})=\Big(\frac{1}{T}\sum_{t=1}^{T}X_{t-1}X_{t-1}^\top\Big)^{-1}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}X_{t-1}u_t\Big)\ \xrightarrow{\ D\ }\ N(0,\Lambda),\tag{6}$$
where $\xrightarrow{\ D\ }$ stands for convergence in distribution, $\Lambda=\Omega_1^{-1}\Omega_2\Omega_1^{-1}$, and $\Omega_1$ and $\Omega_2$ are the $(p+1)\times(p+1)$ matrices
$$\Omega_1=\begin{pmatrix}1 & \mu l_p^\top\\ \mu l_p & \mu^2 l_pl_p^\top+(g^2)\,\Omega\end{pmatrix},\qquad
\Omega_2=\begin{pmatrix}(g^2) & \mu (g^2)\,l_p^\top\\ \mu (g^2)\,l_p & \mu^2(g^2)\,l_pl_p^\top+(g^4)\,\Omega\end{pmatrix},\tag{7}$$
$l_p=(1,\ldots,1)^\top\in\mathbb{R}^{p}$ is a vector of ones, and $\mu$ and $\Omega$ are as defined in Remark 2.1.

Since $g$ is typically unknown, the asymptotic covariance matrix $\Lambda$ in (6) must be estimated, and this can be done in several ways. First, by applying a weighted sum of squared OLS residuals using kernel smoothing, originally proposed by Nadaraya (1964) and Watson (1964) for the estimation of regression functions, they proposed a consistent nonparametric estimator of the function $g^2(r)$ for $r\in[0,1]$,
$$\hat g^2(r)=\sum_{t=1}^{T}w_{r,t}\,\hat u_t^2,\tag{8}$$
where $\hat u_t=Y_t-X_{t-1}^\top\hat\beta$ is the OLS residual and the weights $w_{r,t}$, $t=1,\ldots,T$, are defined as
$$w_{r,t}=\Big\{\sum_{s=1}^{T}K\Big(\frac{[Tr]-s}{Th_T}\Big)\Big\}^{-1}K\Big(\frac{[Tr]-t}{Th_T}\Big),\tag{9}$$
where the kernel function $K(\cdot):\mathbb{R}\to[0,\infty)$ is assumed to satisfy $0\le K(z)\le C_1<\infty$ uniformly in $z$ and $\int K(z)\,dz\le C_2<\infty$, for some constants $C_1$ and $C_2$; $h_T$ is a bandwidth parameter depending on $T$. The selection of the bandwidth parameter $h_T$ uses the cross-validation procedure, i.e., minimises the averaged squared prediction errors (see Wong, 1983),
$$\mathrm{CV}(b)=\frac{1}{T}\sum_{s=1}^{T}\{\hat u_s^2-\hat g_s^2(s/T)\}^2,\tag{10}$$
with respect to $b$, where $\hat g_s^2(r)=\sum_{t=1,t\ne s}^{T}w_{r,t}\,\hat u_t^2$. Phillips and Xu (2006) suggested the following three consistent estimators of the asymptotic covariance matrix $\Lambda$ when $g$ is unknown.
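The following short Python sketch illustrates how $\hat g^2(r)$ in (8)–(9) and the criterion (10) can be computed. It rests on assumptions not fixed by the paper: a Gaussian kernel, a plain grid search over candidate bandwidths, and leave-one-out weights renormalised over $t\ne s$; the function names are ours.

```python
import numpy as np

def g2_hat(r, resid, h):
    """Kernel-weighted estimate of g^2(r) from squared OLS residuals, as in (8)-(9)."""
    T = resid.size
    t = np.arange(1, T + 1)
    k = np.exp(-0.5 * ((np.floor(T * r) - t) / (T * h)) ** 2)   # Gaussian kernel K
    return np.sum(k / k.sum() * resid ** 2)

def cv_bandwidth(resid, grid):
    """Select the bandwidth minimising the leave-one-out criterion (10) over a grid."""
    T = resid.size
    t = np.arange(1, T + 1)
    best_h, best_cv = grid[0], np.inf
    for h in grid:
        cv = 0.0
        for s in range(1, T + 1):
            k = np.exp(-0.5 * ((s - t) / (T * h)) ** 2)
            k[s - 1] = 0.0                       # drop observation s, then renormalise
            g2_s = np.sum(k / k.sum() * resid ** 2)
            cv += (resid[s - 1] ** 2 - g2_s) ** 2
        if cv / T < best_cv:
            best_h, best_cv = h, cv / T
    return best_h
```

In (9) the kernel argument is $([Tr]-t)/(Th_T)$, so at $r=s/T$ the weights are centred exactly at observation $s$, which is why the leave-one-out version above simply zeroes out the $s$-th kernel weight.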

  • The first estimator of the asymptotic covariance matrix is
$$\hat\Lambda_1=T\Big(\sum_{t=1}^{T}X_{t-1}X_{t-1}^\top\Big)^{-1}\Big(\sum_{t=1}^{T}\hat u_t^2X_{t-1}X_{t-1}^\top\Big)\Big(\sum_{t=1}^{T}X_{t-1}X_{t-1}^\top\Big)^{-1}.\tag{11}$$

  • The second estimator of the asymptotic covariance matrix is
$$\hat\Lambda_2=\hat\Omega_1^{-1}\Big(\frac{1}{T}\sum_{t=1}^{T}\hat u_t^2X_{t-1}X_{t-1}^\top\Big)\hat\Omega_1^{-1},\tag{12}$$
where the matrix $\hat\Omega_1$ is defined as
$$\hat\Omega_1=\begin{pmatrix}1 & \hat\mu l_p^\top\\ \hat\mu l_p & \hat\mu^2 l_pl_p^\top+\big(T^{-1}\sum_{t=1}^{T}\hat u_t^2\big)\hat\Omega\end{pmatrix},$$
where $\hat\mu$ and $\hat\Omega$ are obtained by replacing $\beta^{o}$ with $\hat\beta$ in the expressions for $\mu$ and $\Omega$ in Remark 2.1.

  • The third estimator of the asymptotic covariance matrix is
$$\hat\Lambda_3=\hat\Omega_1^{-1}\tilde\Omega_2\hat\Omega_1^{-1},\tag{13}$$
where the matrix $\tilde\Omega_2$ is defined as
$$\tilde\Omega_2=\begin{pmatrix}(\hat g^2) & \hat\mu(\hat g^2)\,l_p^\top\\ \hat\mu(\hat g^2)\,l_p & \hat\mu^2(\hat g^2)\,l_pl_p^\top+(\hat g^4)\,\hat\Omega\end{pmatrix}.$$

Based on the above three estimators $\hat\Lambda_j$ of the true covariance matrix $\Lambda$, Phillips and Xu (2006) constructed three test statistics $t_j$, $j=1,2,3$, for the true parameter vector $\beta^{o}$, stated as follows.

Lemma 3.1

(Theorem 2(ii) in Phillips and Xu, 2006)

Assume that $\hat\beta$ is the OLS estimator of $\beta^{o}$. Then, under the above assumptions and the null hypothesis (5), it follows that
$$t_j=\frac{\sqrt{T}(\hat\beta_k-b_{0,k})}{\{(\hat\Lambda_j)_{kk}\}^{1/2}}\ \xrightarrow{\ D\ }\ N(0,1),\quad\text{as }T\to\infty,\tag{14}$$
where $(\hat\Lambda_j)_{kk}$ is the $(k,k)$-th element of the matrix $\hat\Lambda_j$, $j=1,2,3$, defined in (11), (12) and (13), respectively.

Hence, a large-sample level $100(1-\alpha)\%$ confidence region for $\beta^{o}$ based on the above Normal approximation (14) is given by
$$\mathcal{R}_{j,\alpha}=\Big\{b:\ T(\hat\beta-b)^\top\big[\operatorname{diag}\{(\hat\Lambda_j)_{kk},\,k=0,1,\ldots,p\}\big]^{-1}(\hat\beta-b)\le\chi^2_{p;1-\alpha}\Big\},\tag{15}$$
where $\operatorname{diag}\{(\hat\Lambda_j)_{kk},\,k=0,1,\ldots,p\}$ is the diagonal matrix formed from the main diagonal of $\hat\Lambda_j$, $j=1,2,3$, and $\chi^2_{p;1-\alpha}$ denotes the $100(1-\alpha)$th quantile of the chi-square distribution $\chi^2_p$ with $p$ degrees of freedom.
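As a concrete illustration of the OLS fit and the first heteroscedasticity-robust statistic, the Python sketch below computes $\hat\beta$, the White-type matrix $\hat\Lambda_1$ in (11), and the statistics $t_1$ in (14). The function name `ols_t1` and the column layout (intercept first, then $Y_{t-1},\ldots,Y_{t-p}$) are our own conventions, not the authors'.

```python
import numpy as np

def ols_t1(y, p, b0):
    """OLS fit of the AR(p) in (1) and the t_1 statistics of (14) built from (11).

    y : array holding Y_{-p+1},...,Y_T (length T + p);  b0 : hypothesised (p+1)-vector.
    """
    T = y.size - p
    Y = y[p:]                                                        # responses Y_1,...,Y_T
    X = np.column_stack([np.ones(T)] +
                        [y[p - j: p - j + T] for j in range(1, p + 1)])  # (1, Y_{t-1},...,Y_{t-p})
    S = X.T @ X
    beta_hat = np.linalg.solve(S, X.T @ Y)
    u_hat = Y - X @ beta_hat                                         # OLS residuals
    meat = (X * (u_hat ** 2)[:, None]).T @ X                         # sum_t u_t^2 X_{t-1} X_{t-1}'
    Lambda1 = T * np.linalg.solve(S, np.linalg.solve(S, meat).T)     # T S^{-1} (meat) S^{-1}
    t1 = np.sqrt(T) * (beta_hat - np.asarray(b0, dtype=float)) / np.sqrt(np.diag(Lambda1))
    return beta_hat, t1

# a coefficient hypothesis is rejected at the 5% level when |t1[k]| > 1.96
```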

4. Proposed method

However, in terms of practical performance, simulation results for the three tests $t_j$ in (14) reveal two major issues, arising from the estimation of the asymptotic covariance matrix and from the selection of the bandwidth. To address these problems, the proposed empirical likelihood approach is applied to test the parameters in models (1)–(2).

To construct an empirical likelihood function, the estimating equations are defined through
$$W_t(b)=X_{t-1}(Y_t-X_{t-1}^\top b),\tag{16}$$
for a generic model parameter $b\in\mathbb{R}^{p+1}$. According to condition (A3), $E\{W_t(\beta^{o})\}=E\{X_{t-1}g(t/T)\epsilon_t\}=g(t/T)E(X_{t-1}\epsilon_t)=0$ holds for the true parameter vector $\beta^{o}$. Based on (16), we define the empirical likelihood for the parameter $b$ by
$$L(b)=\sup\Big\{\prod_{t=1}^{T}q_t:\ q_t\ge0,\ \sum_{t=1}^{T}q_t=1,\ \sum_{t=1}^{T}q_tW_t(b)=0\Big\}.$$
By the method of Lagrange multipliers, we have $\hat q_t(b)=\frac{1}{T}\{1+\hat\lambda^\top W_t(b)\}^{-1}$, $t=1,\ldots,T$, where $\hat\lambda=\hat\lambda(b)\in\mathbb{R}^{p+1}$ is the solution of the equations
$$\frac{1}{T}\sum_{t=1}^{T}\frac{W_t(b)}{1+\hat\lambda^\top W_t(b)}=0.\tag{17}$$
We also note that $\prod_{t=1}^{T}q_t$, subject to the constraints $q_t\ge0$ and $\sum_{t=1}^{T}q_t=1$, attains its maximum $(1/T)^T$ at $q_t=1/T$. Thus, the empirical likelihood ratio at $b$ is defined by
$$\mathrm{ELR}(b)=\prod_{t=1}^{T}\{\hat q_t(b)\,T\}^{-1}=\prod_{t=1}^{T}\{1+\hat\lambda^\top W_t(b)\}.$$
Taking the log transformation of the above equation, we get the corresponding empirical log-likelihood ratio,
$$\ell(b)=2\sum_{t=1}^{T}\log\{1+\hat\lambda^\top W_t(b)\}.\tag{18}$$
In addition, Theorem 4.1 below provides the asymptotic null distribution of $\ell(\beta^{o})$.
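A minimal Python sketch of $\ell(b)$ in (18) is given below; it is not the authors' code. The inner problem (17) for $\hat\lambda$ is solved by a Newton iteration with a simple backtracking safeguard, a common practical choice, and the function assumes the convex-hull (feasibility) condition holds so that a solution of (17) exists. The `intercept` flag, our addition, switches between $X_{t-1}=(1,Y_{t-1},\ldots,Y_{t-p})^\top$ and the zero-mean design used in Section 5.

```python
import numpy as np

def el_log_ratio(y, p, b, intercept=True, max_iter=100, tol=1e-10):
    """Empirical log-likelihood ratio ell(b) in (18), with W_t(b) as in (16).

    y : array holding Y_{-p+1}, ..., Y_T (length T + p);  b : candidate parameter vector.
    """
    T = y.size - p
    Y = y[p:]
    cols = ([np.ones(T)] if intercept else []) + [y[p - j: p - j + T] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    W = X * (Y - X @ np.asarray(b, dtype=float))[:, None]    # rows are W_t(b)'
    lam = np.zeros(X.shape[1])                               # Lagrange multiplier lambda_hat
    for _ in range(max_iter):
        d = 1.0 + W @ lam
        grad = (W / d[:, None]).sum(axis=0)                  # left-hand side of (17), times T
        hess = -(W / d[:, None] ** 2).T @ W                  # its Jacobian with respect to lambda
        step = np.linalg.solve(hess, grad)
        gamma = 1.0                                          # backtrack to keep 1 + lambda'W_t > 0
        while np.any(1.0 + W @ (lam - gamma * step) <= 1e-8):
            gamma *= 0.5
        lam_new = lam - gamma * step
        if np.max(np.abs(lam_new - lam)) < tol:
            lam = lam_new
            break
        lam = lam_new
    return 2.0 * np.sum(np.log(1.0 + W @ lam))
```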

Theorem 4.1

Assume that conditions (A1)–(A4) hold. Then, under the null hypothesis (5), the limiting distribution of $\ell(\beta^{o})$ is the chi-square distribution with $p$ degrees of freedom, i.e.,
$$\ell(\beta^{o})\ \xrightarrow{\ D\ }\ \chi^2_p,\quad\text{as }T\to\infty.\tag{19}$$

According to Theorem 4.1, the empirical likelihood confidence region for the true value $\beta^{o}$ can be constructed as follows:
$$\mathcal{R}_{EL,\alpha}=\{b:\ \ell(b)\le\chi^2_{p;1-\alpha}\},\tag{20}$$
where $\chi^2_{p;1-\alpha}$ is defined below (15). Combined with (20), Theorem 4.1 implies Corollary 4.1.

Corollary 4.1

Under the conditions of Theorem 4.1, $P(\beta^{o}\in\mathcal{R}_{EL,\alpha})\to1-\alpha$, as $T\to\infty$.
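A usage sketch of the resulting test follows, building on the `el_log_ratio` function sketched above (assumed to be in scope). The chi-square degrees of freedom default to $p$, as stated in Theorem 4.1, but are exposed as a parameter.

```python
import numpy as np
from scipy.stats import chi2

def el_test(y, p, b0, alpha=0.05, df=None, intercept=True):
    """Reject H_0: beta^o = b_0 when ell(b_0) exceeds the chi-square quantile used in (20)."""
    df = p if df is None else df
    stat = el_log_ratio(y, p, np.asarray(b0, dtype=float), intercept=intercept)
    return stat > chi2.ppf(1 - alpha, df=df)
```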

5. Simulation evaluation

In this section, simulation studies are conducted to compare the finite sample performance of five methods described in Sections 3 and 4:

  • Ordinary least squares without the heteroscedasticity correction (OLS),

  • t1, t2, t3,

  • the proposed empirical likelihood (EL) procedure.

The zero-mean AR(1) model with time-varying variance is considered:
$$Y_t=\beta_{0,1}Y_{t-1}+g(t/T)\epsilon_t,$$
where $\{\epsilon_t\}\overset{\text{i.i.d.}}{\sim}N(0,1)$. The kernel function $K(\cdot)$ is the standard Normal density,
$$K(x)=\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{x^2}{2}\Big),\quad -\infty<x<\infty,$$
and the bandwidth parameter is selected by the cross-validation criterion (10). We consider $H_0:\ \beta_{0,1}=\beta_1$ with known values of $\beta_1$.

Three kinds of variance functions $g^2(r)$ are considered in the following simulations: a single abrupt point model, a two abrupt points model, and a continuous function variance model, as follows.

Model 1: A single abrupt point model, $g^2(r)=\sigma_0^2+(\sigma_1^2-\sigma_0^2)I\{r\ge\kappa\}$, $r\in[0,1]$. Model 1 corresponds to a single abrupt change of the error variance from $\sigma_0^2$ to $\sigma_1^2$ at time $[\kappa T]$, where the break point $\kappa$ takes values in $\{0.1,0.5,0.9\}$. The ratio of post-break to pre-break standard deviations, $\delta=\sigma_1/\sigma_0$, takes values in $\{0.2,1,5\}$, with $\sigma_0=1$.

Model 2: A two abrupt points model, $g^2(r)=\sigma_0^2+(\sigma_1^2-\sigma_0^2)I\{\kappa_0<r\le\kappa_1\}+(\sigma_2^2-\sigma_0^2)I\{\kappa_1<r\}$, $r\in[0,1]$. Model 2 corresponds to two abrupt changes of the error variance, from $\sigma_0^2$ to $\sigma_1^2$ and then from $\sigma_1^2$ to $\sigma_2^2$. The time break points $(\kappa_0,\kappa_1)$ take the values $(0.1,0.9)$, and $(\sigma_0^2,\sigma_1^2,\sigma_2^2)$ is taken from the set $\{(0.2,5,0.2),(5,0.2,5)\}$.

Model 3: A continuous function variance model, $g^2(r)=\sigma_0^2+(\sigma_1^2-\sigma_0^2)r^{m}$, $r\in[0,1]$. In Model 3 the error variance changes continuously from $\sigma_0^2$ to $\sigma_1^2$. We take $m$ in the value set $\{1,2\}$ and $\delta=\sigma_1/\sigma_0$ in the value set $\{0.2,5\}$, with $\sigma_0^2=1$.
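For reference, the three variance specifications can be coded as in the sketch below. It is illustrative only: the parameter defaults merely echo values listed above, and the triple in Model 2 is read as variance levels, following the displayed formula for $g^2$.

```python
import numpy as np

def g2_model1(r, kappa=0.5, delta=5.0):
    """Model 1: single abrupt change from sigma_0^2 = 1 to sigma_1^2 = delta^2 at r = kappa."""
    r = np.asarray(r, dtype=float)
    return 1.0 + (delta ** 2 - 1.0) * (r >= kappa)

def g2_model2(r, kappa0=0.1, kappa1=0.9, v=(5.0, 0.2, 5.0)):
    """Model 2: two abrupt changes; v holds the levels (sigma_0^2, sigma_1^2, sigma_2^2)."""
    r = np.asarray(r, dtype=float)
    return np.select([r <= kappa0, r <= kappa1], [v[0], v[1]], default=v[2])

def g2_model3(r, m=1, delta=5.0):
    """Model 3: smooth change, g^2(r) = 1 + (delta^2 - 1) r^m with sigma_0^2 = 1."""
    r = np.asarray(r, dtype=float)
    return 1.0 + (delta ** 2 - 1.0) * r ** m
```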

Model 1 and Model 3 are the same as in Cavaliere (2004), Cavaliere and Taylor (2007) and Phillips and Xu (2006). Simulations are run as the parameter of interest β1 increases over the set {0.1,0.5,0.9}, with nominal size 5%. The sample size T takes values in {60,200}. The number of Monte Carlo replications is 5000.
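The size experiment can be organised as in the sketch below, which strings together the helper functions sketched earlier (`simulate_artv`, `g2_model1`, `el_log_ratio`) and assumes those definitions are in scope. It uses the zero-mean AR(1) design, hence no intercept and a single estimating equation (p = 1); it is illustrative rather than the authors' code.

```python
import numpy as np
from scipy.stats import chi2

def el_rejection_rate(beta1=0.5, kappa=0.1, delta=0.2, T=200, reps=5000, alpha=0.05, seed=0):
    """Empirical rejection rate of the EL test under the null H_0: beta_{0,1} = beta1."""
    rng = np.random.default_rng(seed)
    cutoff = chi2.ppf(1 - alpha, df=1)                     # p = 1 for the zero-mean AR(1)
    g = lambda r: float(np.sqrt(g2_model1(r, kappa, delta)))
    rejections = 0
    for _ in range(reps):
        y = simulate_artv([0.0, beta1], g, T, rng=rng)
        if el_log_ratio(y, 1, [beta1], intercept=False) > cutoff:
            rejections += 1
    return rejections / reps
```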

Simulation results include two parts. The first part, displayed in Tables 1–3, assesses the rejection rates of the five methods under the null hypothesis.

Table 1. Comparison of the rejection rates of five methods in Model 1 for $\beta_1\in\{0.1,0.5,0.9\}$, $\kappa\in\{0.1,0.5,0.9\}$, $\delta\in\{0.2,1,5\}$ and the sample size $T\in\{60,200\}$, based on 5000 replications.

Table 2. Comparison of the rejection rates of five methods in Model 2 for $\beta_1\in\{0.1,0.5,0.9\}$, $[\kappa_0,\kappa_1]=[0.1,0.9]$, $[\sigma_0,\sigma_1,\sigma_2]\in\{[0.2,5,0.2],[5,0.2,5]\}$ and the sample size $T\in\{60,200\}$, based on 5000 replications.

Table 3. Comparison of the rejection rates of five methods in Model 3 for $\beta_1\in\{0.1,0.5,0.9\}$, $m\in\{1,2\}$, $\delta\in\{0.2,5\}$ and the sample size $T\in\{60,200\}$, based on 5000 replications.

The second part includes Figures 1–3, which evaluate the rejection rates of the methods OLS, t1, t2, t3 and EL as the parameter β1 under the alternatives increases.

Figure 1. The relationship between the rejection rates of OLS, t1, t2, t3, EL and the true coefficient β1 in Model 1 (a single abrupt point model). The abrupt point κ=0.1, δ=0.2. The true parameter β1 increases gradually from 0.1 to 0.9. (a) The sample T = 60; (b) the sample T = 200.


From these simulations, we draw the following conclusions.

  1. First, the OLS-based test is inefficient and unreliable under heteroscedastic innovations. From Table 1, the OLS-based test overwhelmingly overrejects the null hypothesis when the null is true, and has the largest size distortion under (κ,δ)∈{(0.1,0.2),(0.9,5)}. In addition, the size distortion does not shrink as the sample size increases, except under homoscedastic innovations, which is also visible in the figures. From Table 2, the OLS-based test performs better than in Table 1, although its rejection rate decreases as the sample size increases. The results of the OLS-based test in Table 3 are similar to those in Table 1.

  2. Second, the performance of t2 and t3 depends on the numerical value of the true parameter and on the pattern of the variance function. From Figures 1–3, an interesting phenomenon is that the rejection rates of t2 and t3 tend to increase with the parameter and grow larger once β1 exceeds 0.5. The rejection rate of t2 is far greater than the nominal size 5% when the parameter is close to unity, namely β1=0.9. In particular, t2 and t3 underreject the null hypothesis, with rejection rates below the nominal 5%, when β1<0.5; on the contrary, they overreject when β1 is close to 0.9. Similar conclusions can be drawn from Tables 1–3. So neither t2 nor t3 is a stable test for the ARTV model.

  3. Third, both EL and t1 are stable tests for the ARTV model, and EL outperforms t1. From Tables 1–3, both EL and t1 overreject the null hypothesis when the null is true. From Figures 1–3, the rejection rate of EL is almost a horizontal line and is closer to the nominal level 5% than that of t1, except in one of the panels (a), where the sample size is 60. When the sample size is 200, EL's rejection rate is close to the nominal size 5% and does not depend on the numerical value of the parameter, even when the true value of β1 is close to unity (β1=0.9). Overall, EL has the smallest size distortion and avoids estimating the variance function. The simulation results generally support the asymptotic theory. EL is more stable and performs better than OLS, t1, t2 and t3 for testing the parameters of the ARTV model, so EL appears to be the better choice.

Figure 2. The relationship between the rejection rates of OLS, t1, t2, t3, EL and the true coefficient β1 in Model 2 (two abrupt points model). The abrupt points κ0=0.1, κ1=0.9, [σ0,σ1,σ2]=[0.2,5,0.2]. The true parameter β1 increases gradually from 0.1 to 0.9. (a) The sample T = 60; (b) the sample T = 200.


Figure 3. The relationship between the rejection rates of OLS, t1, t2, t3, EL and the true coefficient β1 in Model 3 (continuous function variance model), with m = 1, δ=0.2. The true parameter β1 increases gradually from 0.1 to 0.9. (a) The sample T = 60; (b) the sample T = 200.


6. Conclusion

This article develops an empirical likelihood approach for autoregressive models whose error terms are scaled by an unknown nonparametric time-varying function. The empirical likelihood ratio test statistic avoids estimating the unknown variance function in the presence of heteroscedastic error terms. Simulation results for three different models show that the empirical likelihood test is more stable than the other four test statistics. Possible extensions include improving the efficiency of the statistic by using different estimating equations, and locating the abrupt time points when they exist.

Acknowledgments

The authors thank the editor, Prof. Jun Shao, and two anonymous reviewers for helpful comments. Yu Han was supported by the Scientific Research Foundation of Jilin Education (JJKH20200102KJ). The work of C. Zhang was partially supported by U.S. National Science Foundation grants DMS-2013486 and DMS-1712418, and by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Yu Han was supported by the Scientific Research Foundation of Jilin Education [grant number JJKH20200102KJ]. The work of C. Zhang was partially supported by U.S. National Science Foundation [grant numbers DMS-2013486 and DMS-1712418], and by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.

Notes on contributors

Yu Han

Yu Han received his Ph.D. degree in mathematical statistics from Jilin University in 2012. He is currently an associate research fellow in the Educational Supervision and Evaluation Center of Northeast Electric Power University. His research interests are time series analysis and non-parametric and semi-parametric estimation and inference. He was a visiting scholar in the Department of Statistics, University of Wisconsin-Madison, from 2013 to 2014. He has published 12 papers and has worked on three projects as a principal investigator or participant; one has been completed and two are ongoing.

Chunming Zhang

Chunming Zhang is Professor of Statistics at the University of Wisconsin-Madison. Her research interests range from statistical learning and data mining, statistical methods with applications to imaging data, neuroinformatics and bioinformatics, multiple testing, large-scale simultaneous inference and applications, statistical methods in financial econometrics, non- and semi-parametric estimation and inference, to functional and longitudinal data analysis.

References

Appendix. Proofs of main results

Before proving Theorem 4.1, we first establish Lemmas A.1–A.2. To simplify notation, we write $\hat\lambda=\hat\lambda(\beta^{o})$ and $W_t=W_t(\beta^{o})$.

Lemma A.1

Assume that conditions (A1)–(A4) hold. Then
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}W_t\ \xrightarrow{\ D\ }\ N(0,\Omega_2),\tag{A1}$$
$$\frac{1}{T}\sum_{t=1}^{T}W_tW_t^\top\ \xrightarrow{\ P\ }\ \Omega_2,\tag{A2}$$
where $\xrightarrow{\ P\ }$ denotes convergence in probability.

Proof.

By Lemma 1(iii)–(iv) in Phillips and Xu (2006), the proof of Lemma A.1 is complete.

Lemma A.2

Assume that conditions (A1)–(A4) hold. Then $\hat\lambda=O_P(T^{-1/2})$.

Proof.

From (17), we have
$$0=\frac{1}{T}\sum_{t=1}^{T}W_t-\frac{1}{T}\sum_{t=1}^{T}\frac{W_tW_t^\top}{1+\hat\lambda^\top W_t}\,\hat\lambda.$$
By (A1) of Lemma A.1,
$$\frac{\|\hat\lambda\|_2}{1+\|\hat\lambda\|_2\max_t\|W_t\|_2}\Big\|\frac{1}{T}\sum_{t=1}^{T}W_tW_t^\top\Big\|\le\Big\|\frac{1}{T}\sum_{t=1}^{T}W_t\Big\|_2=O_P(T^{-1/2}).$$
According to conditions (A1) and (A4), we have $E(|Y_t|^{4\nu})<\infty$ for some $\nu>1$, and then
$$\max_t\|W_t\|_2=\max_t\|X_{t-1}(Y_t-\beta^{o\top}X_{t-1})\|_2=\max_t\|X_{t-1}u_t\|_2=\max_t\|X_{t-1}g(t/T)\epsilon_t\|_2=o_P(T^{1/(4\nu)}).\tag{A3}$$
From (A2) of Lemma A.1 and a similar argument to that used in Owen (1991), the proof of Lemma A.2 is completed.

Proof of Theorem 4.1.

Since $\beta^{o}$ is the true parameter vector, applying a Taylor expansion to (18), we have
$$\ell(\beta^{o})=2\sum_{t=1}^{T}\log(1+\hat\lambda^\top W_t)=2\sum_{t=1}^{T}\Big\{\hat\lambda^\top W_t-\tfrac12(\hat\lambda^\top W_t)^2\Big\}+r_T,\tag{A4}$$
where, in light of Lemma A.1 (A2) and Lemma A.2, the remainder $r_T$ satisfies, for some constant $C>0$,
$$|r_T|\le C\sum_{t=1}^{T}|\hat\lambda^\top W_t|^3\le C\|\hat\lambda\|_2^3\max_{1\le t\le T}\|W_t\|_2\sum_{t=1}^{T}\|W_t\|_2^2=o_P(1).$$
By Lemma A.1 (A2), Lemma A.2 and similar arguments as above, we have
$$\sum_{t=1}^{T}\frac{(\hat\lambda^\top W_t)^3}{1+\hat\lambda^\top W_t}=o_P(1).\tag{A5}$$
By (17), we obtain
$$0=\sum_{t=1}^{T}\frac{\hat\lambda^\top W_t}{1+\hat\lambda^\top W_t}=\sum_{t=1}^{T}(\hat\lambda^\top W_t)-\sum_{t=1}^{T}(\hat\lambda^\top W_t)^2+\sum_{t=1}^{T}\frac{(\hat\lambda^\top W_t)^3}{1+\hat\lambda^\top W_t}.\tag{A6}$$
By (A5) and (A6), we obtain
$$\sum_{t=1}^{T}(\hat\lambda^\top W_t)=\sum_{t=1}^{T}(\hat\lambda^\top W_t)^2+o_P(1).\tag{A7}$$
Again by (17), we obtain
$$0=\sum_{t=1}^{T}\frac{W_t}{1+\hat\lambda^\top W_t}=\sum_{t=1}^{T}W_t\Big\{1-\hat\lambda^\top W_t+\frac{(\hat\lambda^\top W_t)^2}{1+\hat\lambda^\top W_t}\Big\}=\sum_{t=1}^{T}W_t-\sum_{t=1}^{T}(W_tW_t^\top)\hat\lambda+\sum_{t=1}^{T}\frac{W_t(\hat\lambda^\top W_t)^2}{1+\hat\lambda^\top W_t}.$$
By Lemma A.1 and (A3), we have
$$\Big\|\frac{1}{T}\sum_{t=1}^{T}\frac{W_t(\hat\lambda^\top W_t)^2}{1+\hat\lambda^\top W_t}\Big\|_2\le C\|\hat\lambda\|_2^2\max_t\|W_t\|_2\,\frac{1}{T}\sum_{t=1}^{T}\|W_t\|_2^2=o_P(T^{-1/2}).$$
Thus, we have
$$\hat\lambda=\Big(\sum_{t=1}^{T}W_tW_t^\top\Big)^{-1}\sum_{t=1}^{T}W_t+\Big(\frac{1}{T}\sum_{t=1}^{T}W_tW_t^\top\Big)^{-1}\Big\{\frac{1}{T}\sum_{t=1}^{T}\frac{W_t(\hat\lambda^\top W_t)^2}{1+\hat\lambda^\top W_t}\Big\}=\Big(\sum_{t=1}^{T}W_tW_t^\top\Big)^{-1}\sum_{t=1}^{T}W_t+o_P(T^{-1/2}).$$
Substituting this expression for $\hat\lambda$ into (A4) and (A7), we have
$$\ell(\beta^{o})=\sum_{t=1}^{T}\hat\lambda^\top W_tW_t^\top\hat\lambda+o_P(1)=\Big(T^{-1/2}\sum_{t=1}^{T}W_t\Big)^\top\Big(T^{-1}\sum_{t=1}^{T}W_tW_t^\top\Big)^{-1}\Big(T^{-1/2}\sum_{t=1}^{T}W_t\Big)+o_P(1).$$
The proof of Theorem 4.1 is completed by applying Lemma A.1.