Abstract
Panel data analysis is an important topic in statistics and econometrics. Traditionally, all individuals in a panel are assumed to share the same unknown parameters, e.g. the same coefficients of covariates when linear models are used, and the differences between individuals are accounted for by cluster effects. This kind of modelling makes sense only when the main interest is in the global trend, because it cannot tell us anything about the individual attributes, which are sometimes very important. In this paper, we propose a model for panel data analysis based on single index models embedded with homogeneity, which builds individual attributes into the model while remaining parsimonious. We develop a data-driven approach to identify the structure of homogeneity, and estimate the unknown parameters and functions based on the identified structure. Asymptotic properties of the resulting estimators are established. Intensive simulation studies show that the resulting estimators also work very well in finite samples. Finally, the proposed model is applied to a public financial dataset and a UK climate dataset, and the results reveal some interesting findings.
Acknowledgements
The authors sincerely thank the Editor Professor Christian Hansen, the Associate Editor and two anonymous reviewers for their insightful comments, which significantly improved the paper. The research of Heng Lian is supported by Hong Kong RGC General Research Fund 11301718, and by Project 11871411 from the NSFC and the Shenzhen Research Institute, City University of Hong Kong. This research is also supported by the National Natural Science Foundation of China (Grant Number 71833004).
Appendix
The Appendix provides additional notation in Section A.1 and brief technical proofs supporting Section 3 in Sections A.2–A.5.
A.1 Additional notations
Due to assumption (C3), there exists a positive constant such that the required bound holds. Here and below we use $C$ to denote a generic positive constant whose value can change even on the same line. We use $\|\cdot\|$ to denote the operator norm of a matrix (the operator norm is the same as the largest singular value) and $\|\cdot\|_F$ to denote the Frobenius norm of a matrix. We use $\|\cdot\|_{L_2}$ to denote the $L_2$ norm of functions, and $\|\cdot\|_\infty$ is the sup-norm for vectors (the maximum absolute value of the components).
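As a quick numerical illustration of the two matrix norms just defined (a toy example of ours): for the diagonal matrix $A = \mathrm{diag}(3, 4)$, the operator norm is $\|A\| = 4$, its largest singular value, while the Frobenius norm is $\|A\|_F = \sqrt{3^2 + 4^2} = 5$.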
Assume the true partitions of the components of the two parameter vectors are given. The unique values of the components of each parameter vector are collected in a corresponding vector of distinct values. For each parameter vector, define the binary membership matrix whose $(k, h)$ entry is 1 if the $k$-th component belongs to the $h$-th group of the partition and 0 otherwise, so that each parameter vector is the product of its membership matrix and its vector of unique values. The sizes of the two partitions are denoted accordingly. Finally, for each partition define the diagonal matrix with the corresponding entries on its diagonal.
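To fix ideas, here is a toy instance of the membership-matrix construction just described; the symbols $\beta_0$, $b_0$ and $\Pi$ are placeholders we introduce for illustration and need not match the paper's notation. If $\beta_0 = (2, 2, 5)^\top$, its unique values are $b_0 = (2, 5)^\top$ and
\[
\beta_0 = \Pi b_0, \qquad \Pi = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
so the partition has two groups, of sizes 2 and 1, and each row of $\Pi$ has a single nonzero entry indicating group membership.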
A.2 Proof summary
We first define the oracle estimator as the minimizer of the fitting criterion under the constraint that components of the first parameter vector in the same partition take the same value and components of the second parameter vector in the same partition take the same value. Here we assume the partition is the true partition, hence the name “oracle”.
Below we first show that the oracle estimator satisfies the asymptotic normality properties stated in Theorem 2 (we also obtain the convergence rate and asymptotic normality for the entire parameter vectors; see, for example, (A.9) and (A.10)). Noting that all arguments carry over when the partition used in the oracle estimator is finer than the true partition, Theorem 1 follows directly as the special case in which each component of each parameter vector forms its own group in the partition. We then show that the change points can be consistently estimated, so that the estimator obtained in stage 3 is, with probability approaching one, exactly the oracle estimator based on the true partition, which proves Theorem 2. The rest of the Appendix contains a sketch of the proofs outlined above, together with several lemmas, while more details are relegated to the supplementary material.
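As a schematic illustration of the constrained (oracle) fit, consider the following least-squares form, written entirely in placeholder notation of ours ($y_{it}$ responses, $x_{it}$ covariates, $B(\cdot)$ a spline basis vector with coefficient vector $d$, and $\Pi$ the true membership matrix); it should be read as a sketch rather than the paper's exact criterion:
\[
(\hat b, \hat d) \;=\; \arg\min_{b,\, d}\; \sum_{i=1}^{n} \sum_{t=1}^{T} \Big\{ y_{it} - B\big(x_{it}^\top \Pi b\big)^\top d \Big\}^2 .
\]
Because candidate coefficient vectors are parametrized as $\Pi b$, the within-group equality constraints are built in automatically.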
A.3 Proof of the asymptotic properties of the oracle estimator
In this part we consider the asymptotic properties of the oracle estimator, which assumes knowledge of the true partitions. For clarity of presentation, the proof is split into several steps.
STEP 1. Prove the convergence rate of the oracle estimator.
Throughout this step, every candidate value of the first parameter vector is assumed to respect the true partition, that is, its components are partitioned in the same way as those of the true parameter vector; similarly, every candidate value of the second parameter vector is assumed to respect its true partition.
Define the difference between the objective function evaluated at a candidate parameter and its value at the truth. We only need to show that this difference is positive with probability approaching one, uniformly over the boundary of a ball of radius $L$ times the target rate centred at the truth, if $L$ is large enough. The difference decomposes into several terms, which we bound in turn.
Furthermore, some algebra based on a Taylor expansion gives the corresponding representation, in which the first derivatives of the basis functions are evaluated at an intermediate point lying between the candidate and the true values.
With the help of Lemma 3 in the SUPPLEMENTARY MATERIAL provided in a separate file, and noting that the matrix defined in (A.1) is orthonormal, we can get the bound (A.2).
Now consider the cross term. We can show a representation in which $O$ is as defined in (A.1), and further calculations reveal the bound (A.3). By Lemma 4 in the SUPPLEMENTARY MATERIAL provided in a separate file, we then have (A.4).
Finally, using the Cauchy–Schwarz inequality, we obtain (A.5).
Combining (A.2)–(A.5), the difference is positive with probability approaching one if $L$ is sufficiently large. Thus there is a local minimizer within the stated rate of the true parameters.
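The logic of STEP 1 is the standard “ball” argument for M-estimators. Schematically, with $Q_n$ a generic objective function, $\psi_0$ the true parameter and $r_n$ the target rate (all placeholder symbols of ours):
\[
P\Big( \inf_{\|\psi - \psi_0\| = L r_n} Q_n(\psi) > Q_n(\psi_0) \Big) \to 1
\quad\Longrightarrow\quad
P\Big( \text{a local minimizer } \hat\psi \text{ exists with } \|\hat\psi - \psi_0\| \le L r_n \Big) \to 1,
\]
since a continuous function that is everywhere larger on the sphere than at the centre must attain a local minimum inside the ball.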
STEP 2. Proof of the convergence rate of the first parameter vector and its asymptotic normality.
Introduce matrices whose rows are built from the covariates and the basis functions. For any candidate parameters satisfying the stated constraints, we expand the objective function into a leading term and a remainder collecting the remaining terms. Using the rate established in STEP 1, we can show (A.6).
We then orthogonalize the parametric part with respect to the nonparametric part, using the one-to-one mapping between the constrained and the unconstrained parametrizations. With this notation in place, we obtain (A.7). The first term in (A.7) is a quadratic form involving an orthonormal matrix arising from this projection.
Lemma 5 in the SUPPLEMENTARY MATERIAL, provided in a separate file, supplies the key bound; based on this, the first term in (A.7) is bounded below accordingly.
Now consider the second term in (A.7). With some more detailed analyses, similar to those leading to (A.3), we get (A.8), and thus the second term in (A.7) is of smaller order. The remaining terms in (A.7) can likewise be shown to be negligible. Summarizing the bounds for the different terms in (A.7), we obtain the claimed convergence rate.
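The orthogonalization above is the usual projection device in semiparametric profiling. Schematically, in placeholder notation of ours, given a parametric design matrix $Z$ and a nonparametric (basis) design matrix $B$, one replaces $Z$ by its residual after projecting onto the column space of $B$:
\[
\tilde Z = (I - P_B) Z, \qquad P_B = B (B^\top B)^{-1} B^\top,
\]
so that the parametric and nonparametric directions are orthogonal and the corresponding terms in (A.7) can be bounded separately.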
To get asymptotic normality, consider, for any unit vector, the corresponding linear combination of the estimator. Using the central limit theorem under mixing conditions, for example the results in Bardet et al. (2008), the leading term is asymptotically normal, and the remaining terms can be shown to be negligible, which establishes the asymptotic normality of the transformed quantity. Since the estimator is a fixed linear transformation of this quantity, it is asymptotically normal as well. That is, for any unit vector, (A.9) holds, with the asymptotic variance defined therein.
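Schematically, the argument has the familiar linearization form (again in placeholder notation of ours, with $\hat\theta$ the estimator, $v$ a unit vector, $a_{it}$ deterministic weights and $\epsilon_{it}$ the errors):
\[
\sqrt{nT}\, v^\top \big(\hat\theta - \theta_0\big) \;=\; \frac{1}{\sqrt{nT}} \sum_{i=1}^{n} \sum_{t=1}^{T} a_{it}\, \epsilon_{it} \;+\; o_P(1) \;\xrightarrow{d}\; N\big(0,\; v^\top \Sigma v\big),
\]
where the central limit theorem for mixing sequences (e.g., Bardet et al., 2008) is applied to the weighted error sum.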
STEP 3. Proof of the convergence rate of the second parameter vector and its asymptotic normality.
To get the convergence rate of the second parameter vector, as for the first one, we perform a projection, which is now the projection with respect to the nonparametric part. Writing the relevant rows analogously, we can show the convergence rate and asymptotic normality by arguments similar to those used in STEP 2. In particular, for any unit vector, (A.10) holds, with the asymptotic variance defined therein.
A.4 Proof of Theorems 1 and 2
We now consider the proof of Theorems 1 and 2 as special cases of (A.9) and (A.10). Consider first Theorem 2, under the additional assumption that the true partition is used. As shown previously, the asymptotic variance of the first estimator is a quadratic form in a matrix whose eigenvalues, as is easy to see from our proof, are bounded and bounded away from zero. By the definition of the membership matrix, its row corresponding to $\beta_{ij}$ has a single nonzero entry; normalizing this row gives a unit vector, and the asymptotic variance of the estimator of $\beta_{ij}$ then follows from (A.9).
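In other words, component-wise results are read off from (A.9) by choosing the unit vector appropriately. For instance (our placeholder notation): if $\sqrt{nT}\, v^\top (\hat\theta - \theta_0) \to N(0, v^\top \Sigma v)$ for every unit vector $v$, then taking $v = e_k$, the $k$-th standard basis vector, yields $\sqrt{nT}\, (\hat\theta_k - \theta_{0,k}) \to N(0, \Sigma_{kk})$.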
The asymptotic variance of the second estimator has an analogous form, with the middle matrix again having eigenvalues bounded and bounded away from zero. By definition, each row of the corresponding membership matrix has a single nonzero entry, and it is easy to verify directly that the resulting quadratic form is bounded away from zero and infinity for any unit vector $v$. Thus the asymptotic variance can be written in the form stated in Theorem 2, after suitable definitions of the limiting quantities.
For Theorem 1, since the result is standard and is also a special case of Theorem 2, we omit repeating the arguments above. The relevant quantities are defined as above, based on the trivial structure in which each single parameter forms its own group in the partition.
The proof of Theorem 2 will be complete once we establish consistency of the homogeneity pursuit based on change point detection; that is, we need to show that the true partition can be identified with probability approaching one.
First, we can show sup-norm convergence rates for the stage-1 estimators. The general strategy for establishing these is similar to that used for the convergence rates above, using a slightly different projection, and one needs to carefully construct bounds that are valid uniformly over the components.
We use one sequence for illustration, with estimated change points obtained from the ordered stage-1 estimates. The true ordered sequence has change points $k_h$, and we let the minimum jump size be the smallest gap between distinct consecutive true values. The sup-norm convergence results established above, when specialized to the stage-1 estimator, imply that the estimates are uniformly close to the true values on an event whose probability approaches one. On this event, it is easy to see that (A.11) holds.
Now suppose both endpoints of the current interval are change points and there is at least one true change point strictly inside the interval. We prove consistency by way of contradiction: suppose the point selected by the algorithm is not one of the true change points. Using some results in Venkatraman (1992) and Cho and Fryzlewicz (2012), we can show that this would contradict (A.11) and the definition of the selected point. Also, in this case, it is easy to see that the detection criterion exceeds the threshold.
Now suppose both endpoints are still change points but there is no other change point inside the interval. In this case, using (A.11), it is easy to see that the detection criterion falls below the threshold.
Since we refrain from further partitioning an interval if and only if the detection criterion falls below the threshold, we see that the algorithm identifies exactly the true change points with probability approaching one.
The proof for change point detection in the other sequence is the same, and the proof of Theorem 2 is complete.
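For intuition, here is a minimal runnable sketch of binary-segmentation change point detection applied to a sorted sequence of stage-1 estimates, in the spirit of Venkatraman (1992) and Cho and Fryzlewicz (2012). The CUSUM statistic, the threshold delta, the function names, and the toy data are illustrative choices of ours, not the paper's implementation.

import numpy as np

def cusum(x, s, e):
    # CUSUM statistic on x[s:e] for every candidate split point.
    seg = x[s:e]
    n = len(seg)
    idx = np.arange(1, n)                    # size of the left segment
    csum = np.cumsum(seg)[:-1]               # partial sums of the left segment
    left_mean = csum / idx
    right_mean = (seg.sum() - csum) / (n - idx)
    scale = np.sqrt(idx * (n - idx) / n)     # standard CUSUM weighting
    return scale * np.abs(left_mean - right_mean)

def segment(x, s, e, delta, found):
    # Recursively split [s, e) while the maximal CUSUM exceeds the threshold.
    if e - s < 2:
        return
    stats = cusum(x, s, e)
    k = int(np.argmax(stats))
    if stats[k] <= delta:                    # no detectable change point: stop
        return
    cp = s + k + 1                           # first index of the right segment
    found.append(cp)
    segment(x, s, cp, delta, found)
    segment(x, cp, e, delta, found)

# Toy usage: two groups of coefficients with one jump between them.
est = np.array([0.98, 1.01, 1.02, 2.49, 2.50, 2.51])  # hypothetical sorted estimates
found = []
segment(est, 0, len(est), 0.5, found)
print(sorted(found))                         # [3]: groups {0, 1, 2} and {3, 4, 5}

In the homogeneity-pursuit context, the detected change points in the ordered estimates induce the estimated partition, which, by the consistency argument above, coincides with the true partition with probability approaching one.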
A.5 Proof of Theorem 3
For the first statement, we just need to note that the estimator is the minimizer of the stated criterion and all the relevant quantities are equal, from which the claim follows. Similarly, we can show the second statement.