Abstract
Panel data analysis is an important topic in statistics and econometrics. Traditionally, all individuals in a panel are assumed to share the same unknown parameters, e.g. the same coefficients of covariates when linear models are used, and the differences between individuals are accounted for by cluster effects. This kind of modelling makes sense only when the main interest is in the global trend, because it cannot tell us anything about the individual attributes, which are sometimes very important. In this paper, we propose a model for panel data analysis based on single index models embedded with homogeneity, which builds individual attributes into the model while remaining parsimonious. We develop a data-driven approach to identify the structure of homogeneity, and estimate the unknown parameters and functions based on the identified structure. Asymptotic properties of the resulting estimators are established. Intensive simulation studies show that the resulting estimators also work very well in finite samples. Finally, the proposed model is applied to a public financial dataset and a UK climate dataset, and the results reveal some interesting findings.
Acknowledgements
The authors sincerely thank the Editor Professor Christian Hansen, the Associate Editor and two anonymous reviewers for their insightful comments, which significantly improved the paper. The research of Heng Lian is supported by Hong Kong RGC General Research Fund 11301718, and by Project 11871411 from the NSFC and the Shenzhen Research Institute, City University of Hong Kong. This research is also supported by the National Natural Science Foundation of China (Grant Number 71833004).
Appendix
The Appendix provides additional notation in Section A.1 and brief technical proofs supporting Section 3 in Sections A.2–A.5.
A.1 Additional notations
Due to assumption (C3), there exists a positive constant such that the required bound holds. Here and below we use $C$ to denote a generic positive constant whose value can change even on the same line. We use $\|\cdot\|$ to denote the operator norm of a matrix (the operator norm is the same as the largest singular value) and $\|\cdot\|_F$ to denote the Frobenius norm of a matrix. We use $\|\cdot\|_{L_2}$ to denote the $L_2$ norm of functions, and $\|\cdot\|_\infty$ is the sup-norm for vectors (the maximum absolute value of the components).
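As a quick numerical illustration of the two matrix norms just defined (a toy example of ours): for the diagonal matrix $A = \mathrm{diag}(3, 4)$, the operator norm is $\|A\| = 4$, its largest singular value, while the Frobenius norm is $\|A\|_F = \sqrt{3^2 + 4^2} = 5$.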
Assume the true partitions of the components of the two parameter vectors are given. The unique values of the components of each parameter vector are collected in a corresponding vector of distinct values. For each parameter vector, define the binary membership matrix whose $(k, h)$ entry is 1 if the $k$-th component belongs to the $h$-th group of the partition and 0 otherwise, so that each parameter vector is the product of its membership matrix and its vector of unique values. The sizes of the two partitions are denoted accordingly. Finally, for each partition define the diagonal matrix with the corresponding entries on its diagonal.
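To fix ideas, here is a toy instance of the membership-matrix construction just described; the symbols $\beta_0$, $b_0$ and $\Pi$ are placeholders we introduce for illustration and need not match the paper's notation. If $\beta_0 = (2, 2, 5)^\top$, its unique values are $b_0 = (2, 5)^\top$ and
\[
\beta_0 = \Pi b_0, \qquad \Pi = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
so the partition has two groups, of sizes 2 and 1, and each row of $\Pi$ has a single nonzero entry indicating group membership.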
A.2 Proof summary
We first define the oracle estimator as the minimizer of the fitting criterion under the constraint that components of the first parameter vector in the same partition take the same value and components of the second parameter vector in the same partition take the same value. Here we assume the partition is the true partition, hence the name “oracle”.
Below we first show that the oracle estimator satisfies the asymptotic normality properties stated in Theorem 2 (we also obtain the convergence rate and asymptotic normality for the entire parameter vectors; see, for example, (A.9) and (A.10)). Noting that all arguments carry over when the partition used in the oracle estimator is finer than the true partition, Theorem 1 follows directly as the special case in which each component of each parameter vector forms its own group in the partition. We then show that the change points can be consistently estimated, so that the estimator obtained in stage 3 is, with probability approaching one, exactly the oracle estimator based on the true partition, which proves Theorem 2. The rest of the Appendix contains a sketch of the proofs outlined above, together with several lemmas, while more details are relegated to the supplementary material.
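As a schematic illustration of the constrained (oracle) fit, consider the following least-squares form, written entirely in placeholder notation of ours ($y_{it}$ responses, $x_{it}$ covariates, $B(\cdot)$ a spline basis vector with coefficient vector $d$, and $\Pi$ the true membership matrix); it should be read as a sketch rather than the paper's exact criterion:
\[
(\hat b, \hat d) \;=\; \arg\min_{b,\, d}\; \sum_{i=1}^{n} \sum_{t=1}^{T} \Big\{ y_{it} - B\big(x_{it}^\top \Pi b\big)^\top d \Big\}^2 .
\]
Because candidate coefficient vectors are parametrized as $\Pi b$, the within-group equality constraints are built in automatically.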
A.3 Proof of the asymptotic properties of the oracle estimator
In this part we consider the asymptotic properties of the oracle estimator, which assumes knowledge of the true partitions. For clarity of presentation, the proof is split into several steps.
STEP 1. Prove the convergence rate of the oracle estimator.
Throughout this step, every candidate value of the first parameter vector is assumed to respect the true partition, that is, its components are partitioned in the same way as those of the true parameter vector; similarly, every candidate value of the second parameter vector is assumed to respect its true partition.
Define the difference between the objective function evaluated at a candidate parameter and its value at the truth. We only need to show that this difference is positive with probability approaching one, uniformly over the boundary of a ball of radius $L$ times the target rate centred at the truth, if $L$ is large enough. The difference decomposes into several terms, which we bound in turn.
Furthermore, some algebra based on a Taylor expansion gives the corresponding representation, in which the first derivatives of the basis functions are evaluated at an intermediate point lying between the candidate and the true values.
With the help of Lemma 3 in the SUPPLEMENTARY MATERIAL provided in a separate file, and noting that the matrix defined in (A.1) is orthonormal, we can get the bound (A.2).
Now consider the cross term. We can show a representation in which $O$ is as defined in (A.1), and further calculations reveal the bound (A.3). By Lemma 4 in the SUPPLEMENTARY MATERIAL provided in a separate file, we then have (A.4).
Finally, using the Cauchy–Schwarz inequality, we obtain (A.5).
Combining (A.2)–(A.5), the difference is positive with probability approaching one if $L$ is sufficiently large. Thus there is a local minimizer within the stated rate of the true parameters.
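The logic of STEP 1 is the standard “ball” argument for M-estimators. Schematically, with $Q_n$ a generic objective function, $\psi_0$ the true parameter and $r_n$ the target rate (all placeholder symbols of ours):
\[
P\Big( \inf_{\|\psi - \psi_0\| = L r_n} Q_n(\psi) > Q_n(\psi_0) \Big) \to 1
\quad\Longrightarrow\quad
P\Big( \text{a local minimizer } \hat\psi \text{ exists with } \|\hat\psi - \psi_0\| \le L r_n \Big) \to 1,
\]
since a continuous function that is everywhere larger on the sphere than at the centre must attain a local minimum inside the ball.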
STEP 2. Proof of the convergence rate of the first parameter vector and its asymptotic normality.
Introduce matrices whose rows are built from the covariates and the basis functions. For any candidate parameters satisfying the stated constraints, we expand the objective function into a leading term and a remainder collecting the remaining terms. Using the rate established in STEP 1, we can show (A.6).
We then orthogonalize the parametric part with respect to the nonparametric part, using the one-to-one mapping between the constrained and the unconstrained parametrizations. With this notation in place, we obtain (A.7). The first term in (A.7) is a quadratic form involving an orthonormal matrix arising from this projection.
Lemma 5 in the SUPPLEMENTARY MATERIAL, provided in a separate file, supplies the key bound; based on this, the first term in (A.7) is bounded below accordingly.
Now consider the second term in (A.7). With some more detailed analyses, similar to those leading to (A.3), we get (A.8), and thus the second term in (A.7) is of smaller order. The remaining terms in (A.7) can likewise be shown to be negligible. Summarizing the bounds for the different terms in (A.7), we obtain the claimed convergence rate.
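The orthogonalization above is the usual projection device in semiparametric profiling. Schematically, in placeholder notation of ours, given a parametric design matrix $Z$ and a nonparametric (basis) design matrix $B$, one replaces $Z$ by its residual after projecting onto the column space of $B$:
\[
\tilde Z = (I - P_B) Z, \qquad P_B = B (B^\top B)^{-1} B^\top,
\]
so that the parametric and nonparametric directions are orthogonal and the corresponding terms in (A.7) can be bounded separately.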
To get asymptotic normality, consider, for any unit vector, the corresponding linear combination of the estimator. Using the central limit theorem under mixing conditions, for example the results in Bardet et al. (2008), the leading term is asymptotically normal, and the remaining terms can be shown to be negligible, which establishes the asymptotic normality of the transformed quantity. Since the estimator is a fixed linear transformation of this quantity, it is asymptotically normal as well. That is, for any unit vector, (A.9) holds, with the asymptotic variance defined therein.
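Schematically, the argument has the familiar linearization form (again in placeholder notation of ours, with $\hat\theta$ the estimator, $v$ a unit vector, $a_{it}$ deterministic weights and $\epsilon_{it}$ the errors):
\[
\sqrt{nT}\, v^\top \big(\hat\theta - \theta_0\big) \;=\; \frac{1}{\sqrt{nT}} \sum_{i=1}^{n} \sum_{t=1}^{T} a_{it}\, \epsilon_{it} \;+\; o_P(1) \;\xrightarrow{d}\; N\big(0,\; v^\top \Sigma v\big),
\]
where the central limit theorem for mixing sequences (e.g., Bardet et al., 2008) is applied to the weighted error sum.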
STEP 3. Proof of the convergence rate of the second parameter vector and its asymptotic normality.
To get the convergence rate of the second parameter vector, as for the first one, we perform a projection, which is now the projection with respect to the nonparametric part. Writing the relevant rows analogously, we can show the convergence rate and asymptotic normality by arguments similar to those used in STEP 2. In particular, for any unit vector, (A.10) holds, with the asymptotic variance defined therein.
A.4 Proof of Theorems 1 and 2
We now consider the proof of Theorems 1 and 2 as special cases of (A.9) and (A.10). Consider first Theorem 2, under the additional assumption that the true partition is used. As shown previously, the asymptotic variance of the first estimator is a quadratic form in a matrix whose eigenvalues, as is easy to see from our proof, are bounded and bounded away from zero. By the definition of the membership matrix, its row corresponding to $\beta_{ij}$ has a single nonzero entry; normalizing this row gives a unit vector, and the asymptotic variance of the estimator of $\beta_{ij}$ then follows from (A.9).
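In other words, component-wise results are read off from (A.9) by choosing the unit vector appropriately. For instance (our placeholder notation): if $\sqrt{nT}\, v^\top (\hat\theta - \theta_0) \to N(0, v^\top \Sigma v)$ for every unit vector $v$, then taking $v = e_k$, the $k$-th standard basis vector, yields $\sqrt{nT}\, (\hat\theta_k - \theta_{0,k}) \to N(0, \Sigma_{kk})$.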
The asymptotic variance of the second estimator has an analogous form, with the middle matrix again having eigenvalues bounded and bounded away from zero. By definition, each row of the corresponding membership matrix has a single nonzero entry, and it is easy to verify directly that the resulting quadratic form is bounded away from zero and infinity for any unit vector $v$. Thus the asymptotic variance can be written in the form stated in Theorem 2, after suitable definitions of the limiting quantities.
For Theorem 1, since the result is standard and is also a special case of Theorem 2, we omit repeating the arguments above. The relevant quantities are defined as above, based on the trivial structure in which each single parameter forms its own group in the partition.
The proof of Theorem 2 will be complete once we establish consistency of the homogeneity pursuit based on change point detection; that is, we need to show that the true partition can be identified with probability approaching one.
First, we can show sup-norm convergence rates for the stage-1 estimators. The general strategy for establishing these is similar to that used for the convergence rates above, using a slightly different projection, and one needs to carefully construct bounds that are valid uniformly over the components.
We use one sequence for illustration, with estimated change points obtained from the ordered stage-1 estimates. The true ordered sequence has change points $k_h$, and we let the minimum jump size be the smallest gap between distinct consecutive true values. The sup-norm convergence results established above, when specialized to the stage-1 estimator, imply that the estimates are uniformly close to the true values on an event whose probability approaches one. On this event, it is easy to see that (A.11) holds.
Now suppose both endpoints of the current interval are change points and there is at least one true change point strictly inside the interval. We prove consistency by way of contradiction: suppose the point selected by the algorithm is not one of the true change points. Using some results in Venkatraman (1992) and Cho and Fryzlewicz (2012), we can show that this would contradict (A.11) and the definition of the selected point. Also, in this case, it is easy to see that the detection criterion exceeds the threshold.
Now suppose both endpoints are still change points but there is no other change point inside the interval. In this case, using (A.11), it is easy to see that the detection criterion falls below the threshold.
Since we refrain from further partitioning an interval if and only if the detection criterion falls below the threshold, we see that the algorithm identifies exactly the true change points with probability approaching one.
The proof for change point detection in the other sequence is the same, and the proof of Theorem 2 is complete.
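For intuition, here is a minimal runnable sketch of binary-segmentation change point detection applied to a sorted sequence of stage-1 estimates, in the spirit of Venkatraman (1992) and Cho and Fryzlewicz (2012). The CUSUM statistic, the threshold delta, the function names, and the toy data are illustrative choices of ours, not the paper's implementation.

import numpy as np

def cusum(x, s, e):
    # CUSUM statistic on x[s:e] for every candidate split point.
    seg = x[s:e]
    n = len(seg)
    idx = np.arange(1, n)                    # size of the left segment
    csum = np.cumsum(seg)[:-1]               # partial sums of the left segment
    left_mean = csum / idx
    right_mean = (seg.sum() - csum) / (n - idx)
    scale = np.sqrt(idx * (n - idx) / n)     # standard CUSUM weighting
    return scale * np.abs(left_mean - right_mean)

def segment(x, s, e, delta, found):
    # Recursively split [s, e) while the maximal CUSUM exceeds the threshold.
    if e - s < 2:
        return
    stats = cusum(x, s, e)
    k = int(np.argmax(stats))
    if stats[k] <= delta:                    # no detectable change point: stop
        return
    cp = s + k + 1                           # first index of the right segment
    found.append(cp)
    segment(x, s, cp, delta, found)
    segment(x, cp, e, delta, found)

# Toy usage: two groups of coefficients with one jump between them.
est = np.array([0.98, 1.01, 1.02, 2.49, 2.50, 2.51])  # hypothetical sorted estimates
found = []
segment(est, 0, len(est), 0.5, found)
print(sorted(found))                         # [3]: groups {0, 1, 2} and {3, 4, 5}

In the homogeneity-pursuit context, the detected change points in the ordered estimates induce the estimated partition, which, by the consistency argument above, coincides with the true partition with probability approaching one.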
A.5 Proof of Theorem 3
For the first statement, we just need to note that the estimator is the minimizer of the stated criterion and all the relevant quantities are equal, from which the claim follows. Similarly, we can show the second statement.