Homogeneity Pursuit in Single Index Models based Panel Data Analysis

Pages 386-401 | Published online: 15 Oct 2019
 

Abstract

Panel data analysis is an important topic in statistics and econometrics. Traditionally, in panel data analysis, all individuals are assumed to share the same unknown parameters, e.g., the same coefficients of the covariates when linear models are used, and the differences between individuals are accounted for by cluster effects. This kind of modelling only makes sense when the main interest is in the global trend, because it cannot tell us anything about the individual attributes, which are sometimes very important. In this paper, we propose a model for panel data analysis based on single index models with an embedded homogeneity structure, which builds the individual attributes into the model while remaining parsimonious. We develop a data-driven approach to identify the homogeneity structure, and estimate the unknown parameters and functions based on the identified structure. Asymptotic properties of the resulting estimators are established. Intensive simulation studies show that the resulting estimators work very well when the sample size is finite. Finally, the proposed model is applied to a public financial dataset and a UK climate dataset, and the results reveal some interesting findings.
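To make the setting concrete, the following sketch simulates panel data of the kind targeted by the proposed model, $y_{it}=g_i(X_{it}^T\beta_i)+\epsilon_{it}$, in which coefficients and link functions are shared within groups of individuals. The group structure, link functions and dimensions below are invented for illustration and are not the designs used in the paper.

```python
# A minimal simulation sketch of a panel single index model with a homogeneity
# structure, y_it = g_i(X_it' beta_i) + eps_it.  Group assignments, link
# functions and sample sizes are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
m, T, p = 6, 200, 2                              # individuals, time points, covariates

# Coefficients share values within (assumed) groups: individuals 0-2 vs 3-5.
beta_bar = np.where(np.arange(m)[:, None] < 3, [0.5, -1.0], [1.5, 0.8])
beta = np.hstack([np.ones((m, 1)), beta_bar])    # first component fixed at 1

# Two (assumed) groups of link functions.
links = [np.sin, np.tanh]
group_g = np.array([0, 0, 0, 1, 1, 1])

X = rng.normal(size=(m, T, p + 1))
index = np.einsum('itk,ik->it', X, beta)         # single index X_it' beta_i
y = np.array([links[group_g[i]](index[i]) for i in range(m)])
y += 0.2 * rng.normal(size=(m, T))               # noise eps_it
```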

Acknowledgements

The authors sincerely thank the Editor, Professor Christian Hansen, the Associate Editor, and two anonymous reviewers for their insightful comments, which significantly improved the paper. The research of Heng Lian is supported by Hong Kong RGC General Research Fund 11301718, and by Project 11871411 from NSFC and the Shenzhen Research Institute, City University of Hong Kong. This research is also supported by the National Natural Science Foundation of China (Grant Number 71833004).

Appendix

The Appendix provides additional notation (Section A.1) and brief technical proofs supporting Section 3 (Sections A.2–A.5).

A.1  Additional notations

Let $\epsilon_t=(\epsilon_{1t},\ldots,\epsilon_{mt})^T$, $y_t=(y_{1t},\ldots,y_{mt})^T$, $X_t=(X_{1t}^T,\ldots,X_{mt}^T)^T$. Due to assumption (C3), there exists $\theta_0=(\theta_{01}^T,\ldots,\theta_{0m}^T)^T$, $\theta_{0i}=(\theta_{0i1},\ldots,\theta_{0iK})^T$, such that $\sup_x|g_{0i}(x)-\theta_{0i}^TB(x)|\le CK^{-2}$. Here and below we use $C$ to denote a generic positive constant whose value can change even on the same line. We use $\|\cdot\|_{op}$ to denote the operator norm of a matrix (the operator norm is the same as the largest singular value) and $\|\cdot\|$ to denote the Frobenius norm of a matrix. We use $\|\cdot\|_{L_2}$ to denote the $L_2$ norm of functions, and $\|\cdot\|_\infty$ is the sup-norm for vectors (the maximum absolute value of the components).

Assume the true partition of the components of $\theta_0$ and $\bar\beta_0$ is given by $\cup_{h=1}^{H_1}G_{1,h}=\{1,\ldots,mK\}$ and $\cup_{h=1}^{H_2}G_{2,h}=\{1,\ldots,mp\}$, respectively. The unique values of the components of $\theta_0$ and $\bar\beta_0$ are denoted by $\xi_0=(\xi_{01},\ldots,\xi_{0H_1})^T\in\mathbb{R}^{H_1}$ and $\eta_0=(\eta_{01},\ldots,\eta_{0H_2})^T\in\mathbb{R}^{H_2}$, respectively. Let $J_i^{G_1}$ be the $K\times H_1$ binary matrix whose $(k,h)$ entry is 1 if $\theta_{0ik}=\xi_{0h}$ and 0 otherwise. We have $\theta_{0i}=J_i^{G_1}\xi_0$. Similarly, we define $J_i^{G_2}$ such that $\bar\beta_{0i}=J_i^{G_2}\eta_0$. The sizes of $G_{1,h}$ and $G_{2,h}$ are denoted by $|G_{1,h}|$ and $|G_{2,h}|$. Finally, let $D^{G_1}$ and $D^{G_2}$ be the diagonal matrices with entries $|G_{1,h}|^{1/2}$ and $|G_{2,h}|^{1/2}$, respectively.
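As a quick numerical check of this notation (the group labels below are invented for illustration), the following sketch builds the membership matrices $J_i^{G_1}$, recovers $\theta_{0i}=J_i^{G_1}\xi_0$, and verifies that stacking the $J_i^{G_1}$ and rescaling by $(D^{G_1})^{-1}$ yields an orthonormal matrix, as used in (A.1) below.

```python
# A small numerical sketch of the membership matrices J_i^{G} and the diagonal
# matrix D^{G} of A.1, using an illustrative grouping of m*K spline coefficients.
# With D^{G} = diag(|G_h|^{1/2}), the stacked, rescaled matrix is orthonormal.
import numpy as np

m, K, H1 = 3, 4, 2
# Assumed group label (0 or 1) of each of the K coefficients of each individual.
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [0, 0, 0, 1]])
xi0 = np.array([-0.7, 1.3])                        # unique values xi_0

J = [np.eye(H1)[labels[i]] for i in range(m)]      # J_i^{G1}: K x H1 binary
theta0 = [Ji @ xi0 for Ji in J]                    # theta_0i = J_i^{G1} xi_0

sizes = np.bincount(labels.ravel(), minlength=H1)  # |G_{1,h}|
D = np.diag(np.sqrt(sizes))                        # D^{G1}
O = np.vstack(J) @ np.linalg.inv(D)                # theta-block of O in (A.1)
print(np.allclose(O.T @ O, np.eye(H1)))            # True: orthonormal columns
```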

A.2  Proof summary

We first define the oracle estimator as the minimizer $(\hat\theta,\hat\beta)$ of
$$\min_{\theta,\bar\beta}\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-B^T(X_{it}^T\beta_i)\theta_i\right)^2,$$
where $\beta_i=(1,\bar\beta_i^T)^T=(1,\beta_{i1},\ldots,\beta_{ip})^T$ and $\theta_i=(\theta_{i1},\ldots,\theta_{iK})^T$, with the constraint that components of $\bar\beta=(\bar\beta_1^T,\ldots,\bar\beta_m^T)^T$ in the same partition take the same value and components of $\theta=(\theta_1^T,\ldots,\theta_m^T)^T$ in the same partition take the same value. Here the partition is assumed to be the true partition, hence the name "oracle".
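As a concrete illustration (not the authors' implementation), the following sketch evaluates this oracle least-squares criterion for given unique values $\xi$ and $\eta$ and known group labels; a simple polynomial basis stands in for the B-spline basis $B(\cdot)$, and all shapes are assumptions for the example.

```python
# A minimal sketch of the oracle least-squares criterion with known partitions.
# groups_theta[i] (length K) and groups_beta[i] (length p) give, for individual
# i, the index of the unique value shared by each coefficient.
import numpy as np

def oracle_loss(xi, eta, X, y, groups_theta, groups_beta, K=4):
    """Sum of squared residuals when theta_i = J_i^{G1} xi and beta_i = (1, J_i^{G2} eta)."""
    m, T, _ = X.shape
    loss = 0.0
    for i in range(m):
        theta_i = xi[groups_theta[i]]            # K-vector built from unique values
        beta_i = np.concatenate(([1.0], eta[groups_beta[i]]))
        u = X[i] @ beta_i                        # single index X_it' beta_i
        B = np.vander(u, K, increasing=True)     # stand-in for the spline basis B(.)
        loss += np.sum((y[i] - B @ theta_i) ** 2)
    return loss
```

Minimizing this criterion over $(\xi,\eta)$, e.g. with a general-purpose optimizer, gives the oracle estimator in this toy setting.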

Below we first show that the oracle estimator satisfies the asymptotic normality properties stated in Theorem 2 (we also obtain the convergence rate and asymptotic normality for the entire vectors $\beta$ and $\theta$; see, for example, (A.9) and (A.10)). Also, noting that all arguments carry over when the partition used in the oracle estimator is finer than the true partition, Theorem 1 follows directly as the special case in which each component of $\theta$ and $\beta$ forms its own group in the partition. Then we show that the change points can be consistently estimated, so that the estimator obtained in stage 3 is, with probability approaching one, exactly the same as the oracle estimator using the true partition, and Theorem 2 is proved. The rest of the Appendix contains a sketch of the proofs outlined above; more details, as well as several lemmas, are relegated to the supplementary material.

A.3  Proof of asymptotic property for the oracle estimator

In this part we consider the asymptotic properties of the oracle estimator, denoted by $(\hat\theta,\hat\beta)$ in this section, which assumes knowledge of the true partitions. For clarity of presentation, the proof is split into several steps.

STEP 1. Prove the convergence rate $\|\hat\theta-\theta_0\|+\|\hat\beta-\beta_0\|=O_p\big(\sqrt{(H_1+H_2)/T}+\sqrt{m}K^{-2}\big)$.

In this section, when we use $\theta$, we always assume $\theta_i=J_i^{G_1}\xi$ for some $\xi\in\mathbb{R}^{H_1}$ (that is, the components of $\theta$ are partitioned in the same way as those of the true $\theta_0$). It is easy to see that $\|\theta-\theta_0\|=\|D^{G_1}(\xi-\xi_0)\|$. Similarly, we always assume $\bar\beta_i=J_i^{G_2}\eta$ for some $\eta\in\mathbb{R}^{H_2}$, and $\|\beta-\beta_0\|=\|D^{G_2}(\eta-\eta_0)\|$.

Define $r_T=\sqrt{(H_1+H_2)/T}+\sqrt{m}K^{-2}$. We only need to show that
$$\inf_{\|\beta-\beta_0\|^2+\|\theta-\theta_0\|^2=Lr_T^2}\;\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-\theta_i^TB(X_{it}^T\beta_i)\right)^2-\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)^2>0$$
with probability approaching one, if $L$ is large enough.

We have
$$\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-\theta_i^TB(X_{it}^T\beta_i)\right)^2-\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)^2=\sum_{i,t}\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)^2-2\sum_{i,t}(\epsilon_{it}-r_{it})\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right),$$
where $r_{it}=\theta_{0i}^TB(X_{it}^T\beta_{0i})-g_{0i}(X_{it}^T\beta_{0i})$ with $|r_{it}|\le CK^{-2}$.

Furthermore, some algebra shows
$$\begin{aligned}&\sum_{i,t}\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)^2\\
&\quad=T\left((\xi^T-\xi_0^T)D^{G_1},(\eta^T-\eta_0^T)D^{G_2}\right)\begin{pmatrix}(D^{G_1})^{-1}&0\\0&(D^{G_2})^{-1}\end{pmatrix}\begin{pmatrix}(J_1^{G_1})^T&0&\cdots&(J_m^{G_1})^T&0\\0&(J_1^{G_2})^T&\cdots&0&(J_m^{G_2})^T\end{pmatrix}\operatorname{diag}\big(\tilde A_1,\tilde A_2,\ldots,\tilde A_m\big)\\
&\qquad\times\begin{pmatrix}J_1^{G_1}&0\\0&J_1^{G_2}\\\vdots&\vdots\\J_m^{G_1}&0\\0&J_m^{G_2}\end{pmatrix}\begin{pmatrix}(D^{G_1})^{-1}&0\\0&(D^{G_2})^{-1}\end{pmatrix}\begin{pmatrix}D^{G_1}(\xi-\xi_0)\\D^{G_2}(\eta-\eta_0)\end{pmatrix},\end{aligned}$$
where
$$\tilde A_i=\frac{1}{T}\sum_{t=1}^T\begin{pmatrix}B(X_{it}^T\beta_i)\\\theta_{0i}^TB'(X_{it}^T\beta_i^*)\bar X_{it}\end{pmatrix}\begin{pmatrix}B^T(X_{it}^T\beta_i)&\theta_{0i}^TB'(X_{it}^T\beta_i^*)\bar X_{it}^T\end{pmatrix},\qquad 1\le i\le m,$$
$B'(\cdot)=(B_1'(\cdot),\ldots,B_K'(\cdot))^T$ denotes the vector of first derivatives of the basis functions, and $\beta_i^*$ lies between $\beta_{0i}$ and $\beta_i$.

With the help of Lemma 3 in the Supplementary Material (provided in a separate file), and noting that
$$O:=\begin{pmatrix}J_1^{G_1}&0\\0&J_1^{G_2}\\\vdots&\vdots\\J_m^{G_1}&0\\0&J_m^{G_2}\end{pmatrix}\begin{pmatrix}(D^{G_1})^{-1}&0\\0&(D^{G_2})^{-1}\end{pmatrix}\tag{A.1}$$
is an orthonormal matrix (that is, $O^TO=I$), we can get
$$\sum_{i,t}\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)^2\ge CT\left(\|D^{G_1}(\xi-\xi_0)\|^2+\|D^{G_2}(\eta-\eta_0)\|^2\right)=CT\left(\|\theta-\theta_0\|^2+\|\beta-\beta_0\|^2\right).\tag{A.2}$$

Now consider the term $\sum_{i,t}(\epsilon_{it}-r_{it})\big(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\big)$. We can show
$$\sum_{i,t}\epsilon_{it}\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)\le\sqrt{\|D^{G_1}(\xi-\xi_0)\|^2+\|D^{G_2}(\eta-\eta_0)\|^2}\;\left\|\sum_tO^T\begin{pmatrix}B(X_{1t}^T\beta_1)\epsilon_{1t}\\\theta_{01}^TB'(X_{1t}^T\beta_1^*)\bar X_{1t}\epsilon_{1t}\\\vdots\\B(X_{mt}^T\beta_m)\epsilon_{mt}\\\theta_{0m}^TB'(X_{mt}^T\beta_m^*)\bar X_{mt}\epsilon_{mt}\end{pmatrix}\right\|,$$
where $O$ is as defined in (A.1), and further calculations reveal that
$$E\left\|\sum_tO^T\begin{pmatrix}B(X_{1t}^T\beta_1)\epsilon_{1t}\\\theta_{01}^TB'(X_{1t}^T\beta_1^*)\bar X_{1t}\epsilon_{1t}\\\vdots\\B(X_{mt}^T\beta_m)\epsilon_{mt}\\\theta_{0m}^TB'(X_{mt}^T\beta_m^*)\bar X_{mt}\epsilon_{mt}\end{pmatrix}\right\|^2\le\operatorname{tr}(OO^T)\cdot\left\|\sum_{1\le t,t'\le T}\operatorname{diag}\big(A_{1,|t-t'|},A_{2,|t-t'|},\ldots,A_{m,|t-t'|}\big)\right\|_{op},\tag{A.3}$$
where
$$A_{i,|t-t'|}=E\left[\begin{pmatrix}B(X_{it}^T\beta_i)\\\theta_{0i}^TB'(X_{it}^T\beta_i^*)\bar X_{it}\end{pmatrix}\begin{pmatrix}B^T(X_{it'}^T\beta_i)&\theta_{0i}^TB'(X_{it'}^T\beta_i^*)\bar X_{it'}^T\end{pmatrix}\epsilon_{it}\epsilon_{it'}\right].$$

By Lemma 4 in the Supplementary Material (provided in a separate file), and noting that $\operatorname{tr}(OO^T)=H_1+H_2$ (since $O^TO=I_{H_1+H_2}$), we have
$$\sum_{i,t}\epsilon_{it}\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)=O_p\left(\sqrt{\big(\|\theta-\theta_0\|^2+\|\beta-\beta_0\|^2\big)(H_1+H_2)T}\right).\tag{A.4}$$

Finally, using the Cauchy–Schwarz inequality,
$$\sum_{i,t}r_{it}\left(\theta_i^TB(X_{it}^T\beta_i)-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)\le C\sqrt{mT}K^{-2}\cdot O_p\left(\sqrt{T\big(\|\theta-\theta_0\|^2+\|\beta-\beta_0\|^2\big)}\right).\tag{A.5}$$

Combining (A.2)–(A.5), we obtain
$$\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-\theta_i^TB(X_{it}^T\beta_i)\right)^2-\sum_{i=1}^m\sum_{t=1}^T\left(y_{it}-\theta_{0i}^TB(X_{it}^T\beta_{0i})\right)^2>0$$
with probability approaching one, if $\|\beta-\beta_0\|^2+\|\theta-\theta_0\|^2=Lr_T^2$ with $L$ sufficiently large. Thus there is a local minimizer $(\hat\theta,\hat\beta)$ with $\|\hat\beta-\beta_0\|+\|\hat\theta-\theta_0\|=O_p(r_T)$.

STEP 2. Proof of convergence rate of β̂ and its asymptotic normality.

Let $\Pi_i$ be the $T\times K$ matrices, $i=1,\ldots,m$, with rows $\Pi_{it}^T=B^T(X_{it}^T\beta_{0i})$. Define $V_{it}=g_{0i}'(X_{it}^T\beta_{0i})\bar X_{it}$ and $P_i=\Pi_i(\Pi_i^T\Pi_i)^{-1}\Pi_i^T$ with rows $P_{it}^T=\Pi_{it}^T(\Pi_i^T\Pi_i)^{-1}\Pi_i^T$. We write, for any $(\theta,\beta)$ with $\|\theta-\theta_0\|^2\le Cr_T^2$ and $\|\beta-\beta_0\|^2\le CH_2/T$,
$$\sum_{i,t}\left(y_{it}-\theta_i^TB(X_{it}^T\beta_i)\right)^2=\sum_{i,t}\left(\epsilon_{it}-\Pi_{it}^T(\theta_i-\theta_{0i})-V_{it}^T(\bar\beta_i-\bar\beta_{0i})-R_{it}\right)^2,$$

where
$$R_{it}=\left\{\theta_{0i}^TB(X_{it}^T\beta_{0i})-g_{0i}(X_{it}^T\beta_{0i})\right\}+\left\{(\theta_i-\theta_{0i})^T\big(B(X_{it}^T\beta_i)-B(X_{it}^T\beta_{0i})\big)\right\}+\left\{\theta_{0i}^T\big(B(X_{it}^T\beta_i)-B(X_{it}^T\beta_{0i})-B'(X_{it}^T\beta_{0i})\bar X_{it}^T(\bar\beta_i-\bar\beta_{0i})\big)\right\}+\left\{\big(\theta_{0i}^TB'(X_{it}^T\beta_{0i})-g_{0i}'(X_{it}^T\beta_{0i})\big)\bar X_{it}^T(\bar\beta_i-\bar\beta_{0i})\right\}=R_{it1}+R_{it2}(\theta_i,\beta_i),$$
with $R_{it1}=\theta_{0i}^TB(X_{it}^T\beta_{0i})-g_{0i}(X_{it}^T\beta_{0i})$ and $R_{it2}(\theta_i,\beta_i)$ (or $R_{it2}$ for short) containing the remaining terms. Using $\|\theta-\theta_0\|^2+\|\beta-\beta_0\|^2\le Cr_T^2$, we can show
$$\sum_{i,t}R_{it2}^2=O_p\left(Tr_T^4K^3+Tr_T^2K^{-2}\right).\tag{A.6}$$

We then orthogonalize the parametric part with respect to the nonparametric part by writing
$$\sum_{i,t}\left(\epsilon_{it}-\Pi_{it}^T(\theta_i-\theta_{0i})-V_{it}^T(\bar\beta_i-\bar\beta_{0i})-R_{it}\right)^2=\sum_{i,t}\left(\epsilon_{it}-\Pi_{it}^T(\alpha_i-\alpha_{0i})-(V_{it}-V_i^TP_{it})^T(\bar\beta_i-\bar\beta_{0i})-R_{it1}-R_{it2}(M_i(\alpha_i,\beta_i))\right)^2,$$
where $\alpha_i=\theta_i+(\Pi_i^T\Pi_i)^{-1}\Pi_i^TV_i\bar\beta_i$, $\alpha_{0i}=\theta_{0i}+(\Pi_i^T\Pi_i)^{-1}\Pi_i^TV_i\bar\beta_{0i}$, $V_i=(V_{i1},\ldots,V_{iT})^T$, and $M_i$ is the one-to-one mapping that maps $(\alpha_i,\beta_i)$ to $(\theta_i,\beta_i)$. Below we write $R_{it2}(M_i(\alpha_i,\beta_i))$ as $R_{it2}$ and $R_{it2}(M_i(\hat\alpha_i,\hat\beta_i))$ as $\hat R_{it2}$, and note that $R_{it2}(M_i(\hat\alpha_i,\beta_0))=0$. Then,
$$\begin{aligned}0\ge{}&\sum_{i,t}\left(\epsilon_{it}-\Pi_{it}^T(\hat\alpha_i-\alpha_{0i})-(V_{it}-V_i^TP_{it})^T(\hat{\bar\beta}_i-\bar\beta_{0i})-R_{it1}-\hat R_{it2}\right)^2-\sum_{i,t}\left(\epsilon_{it}-\Pi_{it}^T(\hat\alpha_i-\alpha_{0i})-R_{it1}\right)^2\\
={}&\sum_{i,t}(\hat\eta-\eta_0)^T(J_i^{G_2})^T(V_{it}-V_i^TP_{it})(V_{it}^T-P_{it}^TV_i)J_i^{G_2}(\hat\eta-\eta_0)-2\sum_{i,t}(\hat\eta-\eta_0)^T(J_i^{G_2})^T(V_{it}-V_i^TP_{it})\epsilon_{it}\\
&-2\sum_{i,t}\hat R_{it2}\epsilon_{it}+\sum_{i,t}\hat R_{it2}^2+2\sum_{i,t}\left((\hat\eta-\eta_0)^T(J_i^{G_2})^T(V_{it}-V_i^TP_{it})+\hat R_{it2}\right)\left(\Pi_{it}^T(\hat\alpha_i-\alpha_{0i})+R_{it1}\right)\\
&+2\sum_{i,t}\hat R_{it2}\cdot(\hat\eta-\eta_0)^T(J_i^{G_2})^T(V_{it}-V_i^TP_{it}).\end{aligned}\tag{A.7}$$

The first term above is
$$\sum_{i,t}(\hat\eta-\eta_0)^T(J_i^{G_2})^T(V_{it}-V_i^TP_{it})(V_{it}^T-P_{it}^TV_i)J_i^{G_2}(\hat\eta-\eta_0)=T(\hat\eta-\eta_0)^TD^{G_2}O_2^T\operatorname{diag}\big(\hat C_1,\hat C_2,\ldots,\hat C_m\big)O_2D^{G_2}(\hat\eta-\eta_0),$$
where $O_2=\big((J_1^{G_2})^T,\ldots,(J_m^{G_2})^T\big)^T(D^{G_2})^{-1}$ is an $mp\times H_2$ orthonormal matrix and $\hat C_i=\sum_t(V_{it}-V_i^TP_{it})(V_{it}-V_i^TP_{it})^T/T$.

Let $C_i=E\big[(g_{0i}'(X_{it}^T\beta_{0i}))^2(\bar X_{it}-E[\bar X_{it}|X_{it}^T\beta_{0i}])(\bar X_{it}-E[\bar X_{it}|X_{it}^T\beta_{0i}])^T\big]$. Lemma 5 in the Supplementary Material (provided in a separate file) shows that $\max_i\|\hat C_i-C_i\|_{op}=o_p(1)$. Based on this, the first term in (A.7) is bounded below by $CT\|D^{G_2}(\hat\eta-\eta_0)\|^2$.
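As a numerical illustration of these projection quantities (not part of the proof), the sketch below computes $\Pi_i$, the projection $P_i$, the partialled-out regressors $V_{it}-V_i^TP_{it}$, and the sample matrix $\hat C_i$ for one individual; a polynomial basis and an assumed link derivative stand in for the B-spline basis and $g_{0i}'$.

```python
# A minimal numerical sketch of the partialling-out used in STEP 2: the
# parametric regressors V_it are projected off the space spanned by the spline
# design Pi_i, leaving V_it - V_i' P_it.  Basis and data are illustrative.
import numpy as np

rng = np.random.default_rng(1)
T, p, K = 200, 2, 5
beta0 = np.array([1.0, 0.5, -1.0])                 # (1, beta_bar_0i)
g0_prime = lambda u: np.cos(u)                     # assumed derivative of g_0i

X = rng.normal(size=(T, p + 1))
u = X @ beta0                                      # single index X_it' beta_0i
Pi = np.vander(u, K, increasing=True)              # T x K design matrix Pi_i
V = g0_prime(u)[:, None] * X[:, 1:]                # V_it = g_0i'(u) * Xbar_it

P = Pi @ np.linalg.solve(Pi.T @ Pi, Pi.T)          # projection P_i onto span(Pi_i)
V_tilde = V - P @ V                                # rows are (V_it - V_i' P_it)'

C_hat = V_tilde.T @ V_tilde / T                    # the sample matrix C_hat_i
```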

Now consider the second term in (A.7). We have
$$\sum_{i,t}(\hat\eta-\eta_0)^T(J_i^{G_2})^T(V_{it}-V_i^TP_{it})\epsilon_{it}\le\|D^{G_2}(\hat\eta-\eta_0)\|\cdot\left\|\sum_tO_2^T\begin{pmatrix}(V_{1t}-V_1^TP_{1t})\epsilon_{1t}\\\vdots\\(V_{mt}-V_m^TP_{mt})\epsilon_{mt}\end{pmatrix}\right\|,$$
and, with more detailed analysis similar to that leading to (A.3), we get
$$E\left[\left\|\sum_tO_2^T\begin{pmatrix}(V_{1t}-V_1^TP_{1t})\epsilon_{1t}\\\vdots\\(V_{mt}-V_m^TP_{mt})\epsilon_{mt}\end{pmatrix}\right\|^2\right]=O(H_2T)\tag{A.8}$$
and thus the second term in (A.7) is $O_p\big(\sqrt{H_2T}\,\|D^{G_2}(\hat\eta-\eta_0)\|\big)$. The remaining terms in (A.7) can be shown to be of order $o_p(1)$. Summarizing the bounds for the different terms in (A.7), we get
$$\|D^{G_2}(\hat\eta-\eta_0)\|^2-\|D^{G_2}(\hat\eta-\eta_0)\|\,O_p\big(\sqrt{H_2/T}\big)-o_p(1/T)\le0,$$
which implies $\|\hat\beta-\beta_0\|=\|D^{G_2}(\hat\eta-\eta_0)\|=O_p(\sqrt{H_2/T})$.

To get asymptotic normality, we define
$$\tilde\eta=\eta_0+\left(\sum_{i,t}(J_i^{G_2})^T(V_{it}-V_i^TP_{it})(V_{it}^T-P_{it}^TV_i)J_i^{G_2}\right)^{-1}\sum_{i,t}(J_i^{G_2})^T(V_{it}-V_i^TP_{it})\epsilon_{it}.$$
Then for any unit vector $a_2\in\mathbb{R}^{H_2}$, we have
$$a_2^TD^{G_2}(\tilde\eta-\eta_0)=T^{-1}a_2^T\left(O_2^T\operatorname{diag}\big(\hat C_1,\hat C_2,\ldots,\hat C_m\big)O_2\right)^{-1}\cdot\sum_tO_2^T\begin{pmatrix}(V_{1t}-V_1^TP_{1t})\epsilon_{1t}\\\vdots\\(V_{mt}-V_m^TP_{mt})\epsilon_{mt}\end{pmatrix}.$$

Consider
$$b_2:=T^{-1}a_2^T\left(O_2^T\operatorname{diag}\big(C_1,C_2,\ldots,C_m\big)O_2\right)^{-1}\cdot\sum_tO_2^T\begin{pmatrix}(V_{1t}-\Phi_{1t})\epsilon_{1t}\\\vdots\\(V_{mt}-\Phi_{mt})\epsilon_{mt}\end{pmatrix}.$$
Using the central limit theorem under mixing conditions, for example the results in Bardet et al. (2008), we can show that
$$\sqrt{T}\,\nu_{2,T}^{-1/2}b_2\rightarrow_dN(0,1),$$
where
$$\nu_{2,T}=a_2^T(O_2^TCO_2)^{-1}O_2^T\Sigma_2O_2(O_2^TCO_2)^{-1}a_2,\qquad C=\operatorname{diag}\big(C_1,C_2,\ldots,C_m\big),\qquad\Sigma_2=\frac{1}{T}\sum_{1\le t,t'\le T}\operatorname{diag}\big(C_{1,|t-t'|},C_{2,|t-t'|},\ldots,C_{m,|t-t'|}\big),$$

where $C_{i,|t-t'|}=E\big[\epsilon_{it}\epsilon_{it'}g_{0i}'(X_{it}^T\beta_{0i})g_{0i}'(X_{it'}^T\beta_{0i})(\bar X_{it}-E[\bar X_{it}|X_{it}^T\beta_{0i}])(\bar X_{it'}-E[\bar X_{it'}|X_{it'}^T\beta_{0i}])^T\big]$. Furthermore, it can be shown that
$$|a_2^TD^{G_2}(\tilde\eta-\eta_0)-b_2|=o_p(1/\sqrt{T})$$
and
$$|a_2^TD^{G_2}(\hat\eta-\eta_0)-a_2^TD^{G_2}(\tilde\eta-\eta_0)|=o_p(1/\sqrt{T}),$$
which establishes the asymptotic normality of $\hat\eta$.

Since $\hat{\bar\beta}-\bar\beta_0=O_2D^{G_2}(\hat\eta-\eta_0)$, $b_2^T(\hat{\bar\beta}-\bar\beta_0)=b_2^TO_2D^{G_2}(\hat\eta-\eta_0)$ is asymptotically normal. That is, for any unit vector $b_2\in\mathbb{R}^{mp}$,
$$\sqrt{T}\,\kappa_{2,T}^{-1/2}b_2^T(\hat{\bar\beta}-\bar\beta_0)\rightarrow_dN(0,1),\tag{A.9}$$
where $\kappa_{2,T}:=b_2^TO_2(O_2^TCO_2)^{-1}O_2^T\Sigma_2O_2(O_2^TCO_2)^{-1}O_2^Tb_2$.

STEP 3. Proof of the convergence rate of θ̂ and its asymptotic normality.

To get the convergence rate of $\hat\theta$, as for $\hat\beta$, we perform a projection, which is now a projection for the nonparametric part. Let $A_{0i}:=\arg\min_AE\|B(X_{it}^T\beta_{0i})-g_{0i}'(X_{it}^T\beta_{0i})A\bar X_{it}\|^2$. Obviously, we have
$$A_{0i}=E\big[g_{0i}'(X_{it}^T\beta_{0i})B(X_{it}^T\beta_{0i})\bar X_{it}^T\big]\left(E\big[(g_{0i}'(X_{it}^T\beta_{0i}))^2\bar X_{it}\bar X_{it}^T\big]\right)^{-1}.$$

Writing now
$$\sum_{i,t}\left(y_{it}-\theta_i^TB(X_{it}^T\beta_i)\right)^2=\sum_{i,t}\left(\epsilon_{it}-(\Pi_{it}^T-Q_{it}^T\Pi_i)(\theta_i-\theta_{0i})-V_{it}^T(\gamma_i-\gamma_{0i})-R_{it1}-R_{it2}\big(\theta_i,\gamma_i-(V_i^TV_i)^{-1}V_i^T\Pi_i\theta_i\big)\right)^2,$$
where $Q_{it}^T$ is the $t$-th row of $Q_i=V_i(V_i^TV_i)^{-1}V_i^T$, $\gamma_i=\bar\beta_i+(V_i^TV_i)^{-1}V_i^T\Pi_i\theta_i$ and $\gamma_{0i}=\bar\beta_{0i}+(V_i^TV_i)^{-1}V_i^T\Pi_i\theta_{0i}$, we can show $\|\hat\theta-\theta_0\|^2=O_p(H_1/T)$ and establish its asymptotic normality by arguments similar to those used for $\hat\beta$. In particular, we have that for any unit vector $b_1\in\mathbb{R}^{mK}$,
$$\sqrt{T}\,\kappa_{1,T}^{-1/2}b_1^T(\hat\theta-\theta_0)\rightarrow_dN(0,1),\tag{A.10}$$
where
$$\kappa_{1,T}=b_1^TO_1(O_1^TDO_1)^{-1}O_1^T\Sigma_1O_1(O_1^TDO_1)^{-1}O_1^Tb_1,\qquad D=\operatorname{diag}\big(D_1,D_2,\ldots,D_m\big),\qquad\Sigma_1=\frac{1}{T}\sum_{1\le t,t'\le T}\operatorname{diag}\big(D_{1,|t-t'|},D_{2,|t-t'|},\ldots,D_{m,|t-t'|}\big),$$
$$D_i=E\left[\big(B(X_{it}^T\beta_{0i})-g_{0i}'(X_{it}^T\beta_{0i})A_{0i}\bar X_{it}\big)\big(B(X_{it}^T\beta_{0i})-g_{0i}'(X_{it}^T\beta_{0i})A_{0i}\bar X_{it}\big)^T\right],$$
$$D_{i,|t-t'|}=E\left[\epsilon_{it}\epsilon_{it'}\big(B(X_{it}^T\beta_{0i})-g_{0i}'(X_{it}^T\beta_{0i})A_{0i}\bar X_{it}\big)\big(B(X_{it'}^T\beta_{0i})-g_{0i}'(X_{it'}^T\beta_{0i})A_{0i}\bar X_{it'}\big)^T\right].$$

A.4  Proof of Theorems 1 and 2

We now consider the proofs of Theorems 1 and 2 as special cases of (A.9) and (A.10). Consider first Theorem 2, under the additional assumption that the true partition is used. As shown previously, the asymptotic variance of $\hat{\bar\beta}-\bar\beta_0$ is $T^{-1}O_2\Theta_2O_2^T$, where $\Theta_2=(O_2^TCO_2)^{-1}O_2^T\Sigma_2O_2(O_2^TCO_2)^{-1}$. From our proof, it is easy to see that the eigenvalues of $\Theta_2$ are bounded and bounded away from zero. By the definition of the $mp\times H_2$ matrix $O_2$, it is easy to see that its row corresponding to $\beta_{ij}$, say denoted by $O_2^{(ij)T}$, has a single nonzero entry $1/\sqrt{m_{ij}}$. Let $e_{ij}=\sqrt{m_{ij}}\,O_2^{(ij)}$, which is a unit vector; then the asymptotic variance of $\hat\beta_{ij}-\beta_{0ij}$ is $(m_{ij}T)^{-1}e_{ij}^T\Theta_2e_{ij}$.

The asymptotic variance of $\hat\theta_i-\theta_{0i}$ is $T^{-1}J_i^{G_1}(D^{G_1})^{-1}\bar\Theta_1(D^{G_1})^{-1}(J_i^{G_1})^T$, where
$$\bar\Theta_1=(O_1^TDO_1)^{-1}O_1^T\Sigma_1O_1(O_1^TDO_1)^{-1}$$
has eigenvalues bounded and bounded away from zero. By the definition of $J_i^{G_1}$ and $D^{G_1}$, each row of the $K\times H_1$ matrix $J_i^{G_1}(D^{G_1})^{-1}$ has a single nonzero entry $1/\sqrt{m_i}$, and thus, if we define $K_i=\sqrt{m_i}\,J_i^{G_1}(D^{G_1})^{-1}$, it is easy to verify directly that $\|K_i^Tv\|$ is bounded away from zero and infinity for any unit vector $v$. Also, we have $\|B(x)\|\le C\sqrt{K}$. Thus the asymptotic variance of $B^T(x)\hat\theta_i-B^T(x)\theta_{0i}$ can be written as $\frac{K}{m_iT}b^T(x)\Theta_1b(x)$, if we define $b(x)=K_i^TB(x)/\|K_i^TB(x)\|$ and $\Theta_1=\bar\Theta_1\|K_i^TB(x)\|^2/K$.

For Theorem 1, since the result is standard and is also a special case of Theorem 2, we omit repeating the arguments above. The quantities $\tilde e_{ij}$, $\tilde b(x)$, $\tilde\Theta_1$ and $\tilde\Theta_2$ are defined as above, based on the trivial structure in which each single parameter forms its own group in the partition.

The proof of Theorem 2 would be complete if we can establish consistency of homogeneity pursuit based on change point detection. That is, we need to show that the true partition can be identified with probability approaching one.

First, we can show that $\|D^{G_2}(\hat\eta-\eta_0)\|_\infty^2=O_p(\log(Tm)/T)$ and $\|D^{G_1}(\hat\xi-\xi_0)\|_\infty^2=O_p(\log(Tm)/T)$. The general strategy for establishing these bounds is similar to that used for the convergence rate of $\hat\beta$, using a slightly different projection, and one needs to carefully construct bounds that are valid uniformly over the components of $\hat\eta$.

Then we use the ordered sequence $b_{(1)}\le\cdots\le b_{(n)}$ ($n=mp$) for illustration, with estimated change points $\hat k_0=0<\hat k_1<\cdots<\hat k_{\hat H_2}=n$. The true ordered sequence of $\beta$ is $\beta_{0(1)}\le\cdots\le\beta_{0(n)}$ with change points $k_h$, $h=0,\ldots,H_2$. Let $\gamma_2=\min_{2\le h\le H_2}|\beta_{0(k_{h-1}+1)}-\beta_{0(k_{h-1})}|$ be the minimum jump size. The sup-norm convergence results established above, when specialized to the estimator in stage 1, imply that $\|\tilde\beta-\beta_0\|_\infty=O_p(a_T)$, where $a_T=\sqrt{\log(Tm)/T}$. On the event $\{\|\tilde\beta-\beta_0\|_\infty\le Ca_T\}$, it is easy to see that
$$\max_{u-1<k<e}|\Delta_{u,e}(k)-\Delta_{u,e}^0(k)|\le C\sqrt{n}\,a_T,\tag{A.11}$$
where
$$\Delta_{u,e}^0(k)=\sqrt{\frac{(e-k)(k-u+1)}{e-u+1}}\left|\frac{\sum_{l=k+1}^e\beta_{0(l)}}{e-k}-\frac{\sum_{l=u}^k\beta_{0(l)}}{k-u+1}\right|.$$

Now suppose $u-1$ and $e$ are both change points and there is at least one change point inside $(u-1,e)$. Let $\hat k=\arg\max_{u-1<k<e}\Delta_{u,e}(k)$ and $k_0=\arg\max_{u-1<k<e}\Delta_{u,e}^0(k)$. We prove consistency by way of contradiction. Suppose $\hat k$ is not one of the true change points. Using some results in Venkatraman (1992) and Cho and Fryzlewicz (2012), we can show that this would lead to $\Delta_{u,e}(k_0)>\Delta_{u,e}(\hat k)$ by (A.11), a contradiction to the definition of $\hat k$. Also, in this case, it is easy to see that
$$\max_{u-1<k<e}\Delta_{u,e}(k)\ge\max_{u-1<k<e}\Delta_{u,e}^0(k)-C\sqrt{n}\,a_T\ge C\gamma_2-C\sqrt{n}\,a_T>\delta_2.$$

Now suppose $u-1$ and $e$ are still both change points but there is no other change point inside $(u-1,e)$. In this case, using (A.11), it is easy to see that
$$\max_{u-1<k<e}\Delta_{u,e}(k)\le C\sqrt{n}\,a_T.$$

Since we refrain from further partitioning the interval $(u-1,e)$ if and only if $\max_{u-1<k<e}\Delta_{u,e}(k)<\delta_2$, and $\sqrt{n}\,a_T\ll\delta_2\ll\gamma_2$, we see that the algorithm consistently identifies exactly the true change points in $\beta_0$.
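For intuition, the following is a minimal sketch of this change-point step: binary segmentation on a sorted sequence of stage-1 estimates using the CUSUM-type statistic $\Delta_{u,e}(k)$ above. The toy sequence and the threshold $\delta_2$ are illustrative assumptions, not the tuning used in the paper.

```python
# A minimal sketch of the homogeneity-pursuit step: binary segmentation on the
# sorted stage-1 estimates using the CUSUM-type statistic Delta_{u,e}(k).
import numpy as np

def cusum(b, u, e, k):
    """Delta_{u,e}(k) for the sorted sequence b (1-based indices, u <= k < e)."""
    left, right = b[u - 1:k], b[k:e]               # b_(u..k) and b_(k+1..e)
    w = np.sqrt(len(left) * len(right) / (len(left) + len(right)))
    return w * abs(right.mean() - left.mean())

def segment(b, u, e, delta2, cps):
    """Recursively split (u-1, e]; record detected change points in cps."""
    if e - u < 1:
        return
    ks = np.arange(u, e)                           # candidate split points
    stats = np.array([cusum(b, u, e, k) for k in ks])
    if stats.max() < delta2:                       # refrain from splitting
        return
    k_hat = ks[stats.argmax()]
    cps.append(int(k_hat))
    segment(b, u, k_hat, delta2, cps)
    segment(b, k_hat + 1, e, delta2, cps)

b_sorted = np.sort(np.array([0.49, 0.51, 0.50, 1.52, 1.48, 1.50]))
cps = []
segment(b_sorted, 1, len(b_sorted), delta2=0.5, cps=cps)
print(sorted(cps))                                 # [3]: groups {1..3} and {4..6}
```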

The proof for change point detection in θ is the same, and the proof of Theorem 2 is complete.

A.5  Proof of Theorem 3

For the first statement, we just need to note that $\bar\beta$ is the minimizer of
$$\min_a\sum_{i=1}^m\|\beta_i-a\|^2,$$
and all $\check\beta_i$ are the same; thus
$$\frac{1}{mp}\sum_{i=1}^m\|\check\beta_i-\beta_i\|^2\ge\frac{1}{mp}\sum_{i=1}^m\|\beta_i-\bar\beta\|^2\ge c.$$
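For the reader's convenience (this identity is not spelled out in the original proof), the reason $\bar\beta=m^{-1}\sum_{i=1}^m\beta_i$ minimizes $\sum_{i=1}^m\|\beta_i-a\|^2$ is the elementary decomposition
$$\sum_{i=1}^m\|\beta_i-a\|^2=\sum_{i=1}^m\|\beta_i-\bar\beta\|^2+m\|\bar\beta-a\|^2,$$
which holds because the cross term $2\sum_{i=1}^m(\beta_i-\bar\beta)^T(\bar\beta-a)$ vanishes; the right-hand side is clearly minimized at $a=\bar\beta$.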

Similarly we can show the second statement.
