ABSTRACT
We consider varying coefficient models, which extend classical linear regression models in the sense that the regression coefficients are replaced by functions of certain variables (for example, time), and the covariates are also allowed to depend on other variables. Varying coefficient models are popular in longitudinal and panel data studies, and have been applied in fields such as finance and health sciences. We consider longitudinal data and estimate the coefficient functions with the flexible B-spline technique. An important question in a varying coefficient model is whether an estimated coefficient function is statistically different from a constant (or zero). We develop testing procedures based on the estimated B-spline coefficients, making use of convenient properties of the B-spline basis. Our method allows longitudinal data in which repeated measurements of an individual can be correlated. We obtain the asymptotic null distribution of the test statistic. The power of the proposed testing procedures is illustrated on simulated data, where we highlight the importance of including the correlation structure of the response variable, and on real data.
Acknowledgments
We would like to thank the Editor and the referees for their detailed reading and very valuable comments on the manuscript.
Funding
M. Ahkim's research was supported by the Special Research Fund (BOF) of Universiteit Antwerpen [grant number 42FA070300FFB5994]. A. Verhasselt gratefully acknowledges support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) and the FWO [grant number 1.5.137.13N]. The infrastructure of the VSC—Flemish Supercomputer Center, funded by the Hercules Foundation and the Flemish Government—department EWI, was used for the simulations.
Appendix A. Notation
1. For a real-valued function h on
2. Let
Appendix B. Assumptions
Assumption 1.
1. The observation times t_ij, j = 1, …, N_i, i = 1, …, n, are chosen independently according to a distribution function F_T(t) on
2. The eigenvalues η_0(t), …, η_d(t) of
3. There exists a positive constant M_5 such that |X_p(t)| ⩽ M_5 for
4. There exists a positive constant M_6 such that
5.
These conditions are commonly used (e.g., Huang, Wu and Zhou Citation2004) and are satisfied in many practical examples. As for Assumption 1.1, when dealing with deterministic time points we can replace this assumption by
for some distribution function FT having a Lebesgue density function fT which is bounded away from zero and infinity, uniformly over
, where
and
is the indicator function (Huang, Wu and Zhou Citation2004). Note that we do not assume zero modeling bias, since we allow the number of knots to increase to infinity.
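As a side check of the B-spline machinery underlying the estimators, the partition-of-unity property of a B-spline basis can be verified numerically. The following sketch implements the Cox–de Boor recursion; the clamped knot vector, degree, and evaluation grid are illustrative choices, not taken from the paper.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at the
    points t via the Cox-de Boor recursion. `knots` must be non-decreasing."""
    t = np.atleast_1d(t)
    n_basis = len(knots) - degree - 1
    # degree-0 basis: indicators of the half-open knot intervals
    B = np.array([(knots[i] <= t) & (t < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float)
    # make the basis right-continuous at the last knot
    last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
    B[last, t == knots[-1]] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((B.shape[0] - 1, len(t)))
        for i in range(B.shape[0] - 1):
            denom1 = knots[i + d] - knots[i]
            denom2 = knots[i + d + 1] - knots[i + 1]
            if denom1 > 0:
                B_new[i] += (t - knots[i]) / denom1 * B[i]
            if denom2 > 0:
                B_new[i] += (knots[i + d + 1] - t) / denom2 * B[i + 1]
        B = B_new
    return B[:n_basis]

# clamped cubic knot vector on [0, 1] with three interior knots
degree = 3
knots = np.r_[[0.0] * (degree + 1), [0.25, 0.5, 0.75], [1.0] * (degree + 1)]
t = np.linspace(0, 1, 101)
B = bspline_basis(t, knots, degree)
print(np.allclose(B.sum(axis=0), 1.0))  # partition of unity
```

The basis functions are nonnegative and sum to one at every point of [0, 1], which is one of the "nice properties" of the B-spline basis exploited in the testing procedures.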
Appendix C. Theorem of Tan (Citation1977)
In the proofs of Theorems 3 and 4 we need the following lemma, based on Theorem 3.1 of Tan (Citation1977).
Lemma 1.
Let Z ∼ N(μ, V) with V invertible and Q = Z′AZ, where A is a real symmetric matrix. Then Q = ∑_{i = 1}^{k} λ_iχ²(r_i, θ_i²), where the χ²(r_i, θ_i²) are independent non-central chi-square variables, λ_1, …, λ_k are the non-zero distinct eigenvalues of VA with algebraic multiplicities r_1, …, r_k, respectively, and
where VA has the spectral decomposition VA = ∑_{j = 1}^{k} λ_jE_j. Moreover, we have that
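A small Monte Carlo sketch can illustrate the representation in Lemma 1 in the central case μ = 0, where all θ_i² = 0 and E[Q] = ∑_i λ_ir_i = tr(VA). The dimension, matrices, and seed below are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# a random covariance matrix V (positive definite) and a symmetric A
M = rng.standard_normal((N, N))
V = M @ M.T + N * np.eye(N)
A = rng.standard_normal((N, N))
A = (A + A.T) / 2

# non-zero eigenvalues of VA; VA is similar to the symmetric matrix
# V^{1/2} A V^{1/2}, so its eigenvalues are real
lam = np.linalg.eigvals(V @ A).real

# draw Z ~ N(0, V) and form the quadratic form Q = Z'AZ
L = np.linalg.cholesky(V)
Z = rng.standard_normal((200_000, N)) @ L.T
Q = np.einsum('ij,jk,ik->i', Z, A, Z)

# Monte Carlo check of E[Q] = sum_i lambda_i r_i = tr(VA)
print(Q.mean(), lam.sum())
```

The sample mean of Q agrees with tr(VA), consistent with Q being a λ-weighted sum of (here central) chi-square variables.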
Appendix D. Proof of Theorem 1
Proof.
Under hypothesis H1 we have that β_p(t) = ∑_l α_{pl}B_{pl}(t; q_p) and α_{pl} = c_p for l = 1, …, m_p; p = 0, …, d. Therefore
and
Hence, we obtain that
The specified distribution of Q_1 ∼ ∑_{i = 1}^{k} λ_iχ²(r_i, θ_i²) follows from Lemma 1 in Appendix C with 0 = ∑_i λ_iθ_i². We now show that ∑_{i = 1}^{k} r_i = N − dim and that all θ_i = 0. Note that the idempotent matrix has eigenvalues 0 and 1. Therefore we have the decomposition
, where
is the eigenspace corresponding to the eigenvalue λ = b of the matrix
. Moreover,
has dimension
. Denote by
the eigenspace of the eigenvalue λ = 0 of the matrix
. One can verify that
. Hence, in order to find the eigenvectors corresponding to a non-zero eigenvalue we can restrict to the space
. This also means that the λi are eigenvalues of
. Since
is positive definite and 0 = ∑_i λ_iθ_i², we obtain that all θ_i = 0. The eigenspace of
has dimension N, and therefore we have
It remains to show that Q1 and Q2 are independent. By Theorem 3.2 of Tan (Citation1977) Q1 and Q2 are independent if and only if
(A1)
It takes little effort to verify the equation above by noting that
.
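To illustrate the flavor of the independence criterion used here (a Craig–Sakamoto-type condition in the style of Tan's Theorem 3.2: two quadratic forms Z′AZ and Z′BZ in the same normal vector Z ∼ N(0, V) are independent when AVB = 0), here is a hypothetical sketch with V = I and complementary projection matrices. The dimension and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_sim = 5, 100_000

# With V = I_N, take A an orthogonal projection and B its complement,
# so that A V B = A @ B = 0, the condition for independence of Z'AZ and Z'BZ.
U, _ = np.linalg.qr(rng.standard_normal((N, 2)))
A = U @ U.T          # rank-2 orthogonal projection
B = np.eye(N) - A    # complementary projection; A @ B = 0

Z = rng.standard_normal((n_sim, N))
Q1 = np.einsum('ij,jk,ik->i', Z, A, Z)
Q2 = np.einsum('ij,jk,ik->i', Z, B, Z)
print(np.corrcoef(Q1, Q2)[0, 1])  # approximately 0
```

The sample correlation between the two quadratic forms is numerically negligible, as the condition predicts; in the proof above, verifying (A1) plays exactly this role for Q_1 and Q_2.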
Appendix E. Proof of Theorem 2
Proof.
The proof of this theorem follows the same lines as the proof of Theorem 3 in Li, Xu and Liu (Citation2011); some details, however, differ due to our longitudinal setting. Recall the definition of (see Appendix A). Set
, then
. We can also write
, so that under hypothesis H0 we obtain
Note that
under H0. Hence
so
Denote
. We define
Using Lemma 1, we obtain that
where γ2 and θ2i are specified in Lemma 1. Denote
and
. To prove Theorem 2, we need to show that
(A2)
Some mathematical preparation is needed to prove (A2). The Takagi factorization of
leads to a matrix G ∈ ℝ^{(N − dim) × N} such that
Throughout, ‖A‖ (‖c‖) denotes the Frobenius (Euclidean) norm of a matrix A (vector c), and ⟨a, b⟩ denotes the standard inner product of vectors a and b. Let
, then
where
Let
. Note that if
, then there is nothing to prove, since in that case ξ_0 = ξ_1 and η_0 = η_1, so we proceed with the case
. We also have that
from which it follows that
Define an orthogonal transformation T ∈ ℝ^{(N − dim) × (N − dim)} with first row equal to
and let
We obtain the expressions
Therefore
(A3) since for a mean zero normal variable Z we have the property
. Now
and TGG′T′ = I_{N − dim}. We want to bound
Let b = (b_1, b_2, …, b_N) denote the first row of the orthogonal matrix TG; then ‖b‖ = 1. Also denote by c_1, …, c_N the columns of
. Using the fact
which is obtained by the Cauchy–Schwarz inequality, and the symmetry of
, we have that
Using the previous inequality, we can continue from equation (A3) to obtain
(A4)
Let
, then
and
. Analogously to (A4), we obtain
(A5) since for any orthogonal transformation
, the variance of the first component of
, where
is given by the (1, 1) entry of the matrix
Note that
and
are independent multivariate normal random vectors, because on the one hand
and on the other hand, by the same argument as in (A1),
from which we find that
Hence
Fix a t > 0, then
(A6) For the last inequality, since η1 and ξ1 are independent, and η1 and ξ0 are independent, we have that
where f is the density function of the multivariate normal distribution
Continuing from equation (A6), with k a positive real number,
(A7) where
is the maximum of the density function of ξ_0 (the Markov inequality is applied in (A7)). Substitute
in (A7) to find that
and by (A6) we obtain that for all t ⩾ 0
On the other hand, we obtain in a similar fashion
(A8) where
is the maximum of the density function of the random variable η_0. Substitute in (A8)
to finally establish
(A9)
Note that
since H′H and G′G are idempotent matrices, so 0 and 1 are their only eigenvalues. Then by (A3), (A5), and (A9), it follows that
Appendix F. Rate of convergence
In Theorem 2 we assume (9). We shed more light on this rate by assuming that
is bounded (N_max = max_{i = 1, …, n} N_i and N_min = min_{i = 1, …, n} N_i),
and
Suppose that subjects with an equal number of repeated measurements have the same time points; we do not need this assumption if the correlation structure does not depend on time, as is the case with any time-independent correlation structure.
For the first part we use the fact that (Li, Xu and Liu Citation2011). Thus the first part is bounded by
Bounding
For the second part, we note that there is no closed-form expression for the density function of a linear combination of chi-square variables (see Bausch (Citation2013) among others). However, we obtain a reasonable bound on which is the maximum of the density of ∑_{i = 1}^{k} λ_iχ²(r_i).
First, it does not hold that r_i = 1 for all i. To prove this, suppose otherwise, i.e., r_i = 1 for all i. Then, by Theorem 1, we have k = ∑_{i = 1}^{k} r_i = N − dim. Next, we obtain a bound on k. We argue, as in the proof of Theorem 1, that to find a bound on k we may restrict to the eigenspace of eigenvalue 1 of
. Thus, restricting to
, we only look at the number of positive eigenvalues of W^{1/2}VW^{1/2}, which is a block diagonal matrix. By the restriction on the time points (see above), W^{1/2}VW^{1/2} contains at most N_max − N_min + 1 different block matrices, with dimensions not exceeding N_max. Hence, the number of different positive eigenvalues does not exceed N_max(N_max − N_min + 1), i.e., k ⩽ N_max(N_max − N_min + 1). Since by assumption all r_i = 1, it should hold that
(A10)
Divide (A10) by N. Since N_max/N_min is bounded by C > 0 and N_max/n → 0, and using the fact that N ⩾ nN_min, we obtain that the left-hand side is 1 + o(1) while the right-hand side is o(1). This is a contradiction. Hence, there is a 1 ⩽ j ⩽ k such that r_j > 1.
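The role of the condition r_j > 1 can be seen from the chi-square densities themselves: a χ²(1) density is unbounded at zero, while for r ⩾ 2 the density of a scaled chi-square λχ²(r) is bounded (for r = 2, by 1/(2λ)), and the convolution of a bounded density with any other density stays bounded by the same constant. A short SciPy sketch (the grid and scale λ = 3 are illustrative):

```python
import numpy as np
from scipy.stats import chi2

x = np.linspace(1e-6, 10, 100_000)

# chi2(1) blows up near 0: its density is unbounded
print(chi2.pdf(1e-6, df=1))          # very large

# chi2(2) density is (1/2)exp(-x/2), with maximum 1/2 at 0
print(chi2.pdf(x, df=2).max())

# density of lam * chi2(2) is f(x/lam)/lam, bounded by 1/(2*lam)
lam = 3.0
print(chi2.pdf(x / lam, df=2).max() / lam)
```

This is why establishing the existence of some r_j > 1 is the key step before bounding the maximum of the density of ∑_{i = 1}^{k} λ_iχ²(r_i).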
Also, we can write ∑_{i = 1}^{k} λ_iχ²(r_i) as the sum of a scaled chi-square distribution and the remaining part, where λ_max ≔ max_i λ_i is assumed to be an eigenvalue with an eigenvector in
. Moreover, we assume that
. The density of this sum is a convolution which is bounded by
(after a small calculation). Moreover, by Theorem 2.1 of Wolkowicz and Styan (Citation1980) we know that
since V contains only ones on its diagonal. Hence, we have derived
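As a numerical sanity check of the Wolkowicz and Styan (1980) eigenvalue bound λ_max ⩽ m + s√(N − 1), with m = tr(V)/N and s² = tr(V²)/N − m² (so m = 1 when V is a correlation matrix), here is a sketch on a randomly generated correlation matrix; the dimension and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30

# build a random correlation matrix: ones on the diagonal
M = rng.standard_normal((N, 2 * N))
S = M @ M.T
d = np.sqrt(np.diag(S))
V = S / np.outer(d, d)

m = np.trace(V) / N                  # equals 1 for a correlation matrix
s2 = np.trace(V @ V) / N - m ** 2
lam_max = np.linalg.eigvalsh(V).max()

# Wolkowicz-Styan, Theorem 2.1: lam_max <= m + s * sqrt(N - 1)
print(lam_max, m + np.sqrt(s2 * (N - 1)))
```

With m = 1 this gives the explicit bound 1 + s√(N − 1) on λ_max used in the derivation above.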