Abstract
In this paper, we investigate variable selection for varying coefficient errors-in-variables (EV) models with longitudinal data in which some covariates are measured with additive errors. We propose a variable selection method based on a bias-corrected penalized quadratic inference function (pQIF), which combines basis function approximation of the coefficient functions with a bias-corrected quadratic inference function (QIF) and shrinkage estimation. The proposed method accounts for both the measurement errors in the covariates and the within-subject correlation, and it simultaneously estimates and selects the non-zero nonparametric coefficient functions. With an appropriate choice of the tuning parameters, we establish the consistency of the variable selection method and the sparsity of the regularized estimators. The finite sample performance of the proposed method is assessed through simulation studies, and its utility is further demonstrated via a real data analysis.
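For concreteness, the estimator described above is the minimizer of a penalized quadratic form; the display below is a schematic sketch in generic notation (the symbols $\bar g_n$, $\hat C_n$, $\beta_k$ and $p_\lambda$ are ours and need not match the paper's own displays):

\[
\hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta} \Bigl\{\, n\,\bar g_n(\beta)^{\top}\,\hat C_n(\beta)^{-1}\,\bar g_n(\beta) \;+\; n \sum_{k=1}^{p} p_{\lambda}\bigl(\lVert \beta_k \rVert\bigr) \Bigr\},
\]

where $\bar g_n$ is the sample mean of the bias-corrected extended score vectors, $\hat C_n$ is their sample covariance matrix, $\beta_k$ collects the spline coefficients of the $k$th coefficient function, and $p_{\lambda}(\cdot)$ is a folded-concave (e.g., SCAD-type) penalty with tuning parameter $\lambda$.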
Acknowledgments
We thank the editor and reviewers for their helpful comments that significantly improved the manuscript.
Appendix 1. Derivation of Equation (9)
(9)
According to the above, we can see that
For simplicity, denote
According to
for
so we can get
where
and
The derivation of Equation (9) is thus complete.
Appendix 2. Proofs of theorems
Lemma 1.
Suppose that Conditions C2 and C8 hold. Then there exists a constant
that satisfies
(22)
Lemma 1 is Corollary 6.21 of Schumaker (2007); the proof is omitted here.
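In the standard B-spline setting, the bound asserted in (22) typically takes the following form (a sketch under the smoothness assumption in Condition C2; the precise constant and exponent are as in Schumaker's Corollary 6.21):

\[
\sup_{t}\;\bigl|\alpha_k(t) - B(t)^{\top}\beta_k^{0}\bigr| \;\le\; C\,K^{-r}, \qquad k = 1,\dots,p,
\]

where $B(t)$ denotes the $K$-dimensional B-spline basis, $\beta_k^{0}$ is the best approximating spline coefficient vector for the $k$th coefficient function $\alpha_k(t)$, and $r$ is its order of smoothness.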
Lemma 2.
Assume that Conditions C1-C11 hold. Then we have
where “$\xrightarrow{\,P\,}$” denotes convergence in probability and “$\xrightarrow{\,d\,}$” denotes convergence in distribution.
Proof.
We first prove part (i). According to Equation (12), we can obtain the first derivative of
with respect to β as
(23)
Consider the kth block matrix of as
Now, we prove
as
where
Clearly we get
as
According to Equation (10), we see that
is the sample covariance matrix of Σu, which implies that
is the mean of some sample covariance matrices and
as
According to the plug-in principle, we get
Under Condition C9, we can get
To prove
denote
where
we can get
and
From Conditions C4-C7, is bounded. By the law of large numbers,
So we get
and
where
The proof of part (i) is complete.
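Schematically, the law-of-large-numbers step at the heart of part (i) is of the following form (generic notation; $g_i$ denotes the bias-corrected extended score of subject $i$):

\[
\hat C_n(\beta_0) \;=\; \frac{1}{n}\sum_{i=1}^{n} g_i(\beta_0)\,g_i(\beta_0)^{\top} \;\xrightarrow{\;P\;}\; \mathrm{E}\bigl[g_1(\beta_0)\,g_1(\beta_0)^{\top}\bigr],
\]

since the summands are independent across subjects and have bounded moments under Conditions C4-C7.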
Next, we prove part (ii). We first prove
Consider the kth block matrix of
as
where
Obviously, we have and
From Conditions C4-C7, is bounded, and we can get
by the law of large numbers. Similarly, we have
and satisfies.
In addition, according to the Cauchy-Schwarz inequality, we have
Therefore,
Under Condition C8 and Lemma 1, we have
From the definition of
and by the law of large numbers,
By the definition of
and the central limit theorem, we can get
So we have
and
Next, we prove
According to the above conclusions, we have
where
Furthermore, we get
Obviously, according to the above conclusions, we can get
Under Conditions C4-C7, following Tian, Xue, and Liu (2014), for any
which satisfies
and
Similarly, for any
such that
then
Using the Cauchy-Schwarz inequality, for any
such that
So, we know that
and
satisfy the Lyapunov condition for the central limit theorem. In addition, we have
under Condition C5, so we get
such that
which implies that
satisfies the Lyapunov condition for the central limit theorem. Thus
According to Condition C4, we have
So
The proof of Lemma 2 is completed.
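For reference, the Lyapunov condition invoked twice in the proof above takes the following standard form for a triangular array of row-independent random variables $\{X_{ni}\}$ (generic notation):

\[
\frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} \mathrm{E}\,\bigl|X_{ni} - \mathrm{E}X_{ni}\bigr|^{2+\delta} \;\longrightarrow\; 0 \quad \text{for some } \delta > 0, \qquad s_n^{2} = \sum_{i=1}^{n} \operatorname{Var}(X_{ni}),
\]

which implies $s_n^{-1}\sum_{i=1}^{n}\bigl(X_{ni} - \mathrm{E}X_{ni}\bigr) \xrightarrow{\;d\;} N(0,1)$.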
Lemma 3.
Suppose that the regularity Conditions C1-C11 hold; then
Proof.
Following Tian, Xue, and Liu (2014), applying a Taylor expansion to
at
we have
where
is a three-dimensional array of
By Lemma 2, we can see that
Under Condition C4, we have
So, we have
Similarly, we get
where
is a four-dimensional array
By the definition of and
so we have
Using Lemma 2, we get
Hence we have
So we get
The proof of Lemma 3 is completed.
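Schematically, the expansions used in this proof have the following form (generic notation, with $\beta^{*}$ an intermediate point between $\beta$ and $\beta_0$):

\[
\dot Q_n(\beta) \;=\; \dot Q_n(\beta_0) + \ddot Q_n(\beta_0)\,(\beta - \beta_0) + \tfrac{1}{2}\,\dddot Q_n(\beta^{*})\bigl[(\beta - \beta_0),\,(\beta - \beta_0)\bigr],
\]

where $\dddot Q_n(\beta^{*})$ is the three-dimensional array of third-order derivatives, acting here as a bilinear map on $\beta - \beta_0$; the analogous expansion of $\ddot Q_n$ involves the four-dimensional array of fourth-order derivatives.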
Proof of Theorem 1.
From Lemma 1, we have
Suppose
and
To prove Theorem 1, it suffices to show that, for any
there exists a large constant C such that
(24)
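Inequality (24) is the standard local-minimizer device; schematically, with $\delta_n$ denoting the target convergence rate, it asserts (in generic notation) that

\[
P\Bigl\{\, \inf_{\lVert u \rVert = C} \bar Q_n\bigl(\beta^{0} + \delta_n u\bigr) \;>\; \bar Q_n\bigl(\beta^{0}\bigr) \Bigr\} \;\ge\; 1 - \epsilon,
\]

which implies that, with probability at least $1 - \epsilon$, the objective has a local minimizer $\hat\beta$ inside the ball $\{\beta^{0} + \delta_n u : \lVert u \rVert \le C\}$, so that $\lVert \hat\beta - \beta^{0} \rVert = O_p(\delta_n)$.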
When
(24) is always true, so we may assume
Without loss of generality, assume
and
we have
Applying a Taylor expansion to
at
we have
where
lies between β and
According to Lemmas 1 and 2, we can get
and
Therefore we have
Obviously,
When C is large enough,
So when C is large enough,
Assume and
When n is large enough, we have
By the definition of the penalty function, we get
So, for any given
there exists a large enough C that satisfies Equation (24), which further implies that there exists
which satisfies
Note that
By the same arguments as above, we can get
Therefore, invoking
we have
With Lemma 1, we get
Thus, we complete the proof of Theorem 1.
Proof of Theorem 2.
Assume for
and
are non-zero coefficient functions. So we get the corresponding regression parameter space
as
For denote
where 0 is an
vector of zeros. From Lemma 1 and Xue, Qu, and Zhou (2010), we have
and
To prove Theorem 2, it is sufficient to show that, for any
and
is true with probability tending to 1.
where
lies between
and β, t lies between
and
Furthermore, we get
According to Lemmas 2 and 3, we have
From Conditions C10 and C11,
So for any and
is true with probability tending to 1. This completes the proof of Theorem 2.
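The sign argument behind this proof is standard; written for a single component $\beta_j$ belonging to a zero coefficient function (generic notation), one shows that

\[
\frac{\partial \bar Q_n(\beta)}{\partial \beta_j} \;=\; n\,\lambda\Bigl\{\, p_{\lambda}'\bigl(|\beta_j|\bigr)\,\operatorname{sgn}(\beta_j) \;+\; O_p\bigl(\lambda^{-1} n^{-1/2}\bigr) \Bigr\},
\]

so that for $|\beta_j|$ in an $O(n^{-1/2})$ neighborhood of zero the penalty term dominates the sign of the derivative under Conditions C10 and C11, forcing the minimizer to lie at $\beta_j = 0$ with probability tending to 1.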
Proof of Theorem 3.
Following Wang, Li, and Tsai (2007) and Tian, Xue, and Liu (2014), we define three mutually exclusive sets:
where
and
represent the underfitted, correctly fitted, and overfitted models,
respectively. Then the theorem can be proved by comparing
and
Here we consider two separate cases.
Case I: When
we have
By the law of large numbers and the continuous mapping theorem, we can get
Case II: When
we have
and
So we can get
Both cases hold in probability by the law of large numbers and the continuous mapping theorem. This completes the proof of Theorem 3.
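In summary, the two cases combine into the usual tuning-consistency statement (a sketch, assuming a BIC-type criterion in the spirit of Wang, Li, and Tsai (2007); the notation is generic):

\[
P\Bigl(\, \inf_{\lambda \in \Omega_{-} \cup \Omega_{+}} \mathrm{BIC}_{\lambda} \;>\; \mathrm{BIC}_{\lambda_n} \Bigr) \;\longrightarrow\; 1,
\]

where $\Omega_{-}$ and $\Omega_{+}$ collect the tuning parameters that produce underfitted and overfitted models, respectively, and $\lambda_n$ is a reference sequence identifying the true model; hence the selected tuning parameter yields the correctly fitted model with probability tending to 1.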