
Decomposition and reproducing property of local polynomial equivalent kernels in varying coefficient models

Pages 357-372 | Received 29 Jun 2022, Accepted 15 May 2023, Published online: 30 May 2023

ABSTRACT

We consider local polynomial estimation for varying coefficient models and derive the corresponding equivalent kernels, which provide insight into the role of smoothing on the data and fill a gap in the literature. We show that the asymptotic equivalent kernels admit an explicit decomposition into three parts: the inverse of the conditional moment matrix of the covariates given the smoothing variable, the covariate vector, and the equivalent kernels of univariable local polynomials. We discuss a finite-sample reproducing property, which leads to zero bias in linear models with interactions between the covariates and polynomials of the smoothing variable. By expressing the model in a centered form, the equivalent kernels for estimating the intercept function are asymptotically identical to those of univariable local polynomials, and the estimators of the slope functions are local analogues of the slope estimators in linear models, with weights assigned by equivalent kernels. Two examples are given to illustrate the weighting schemes and the reproducing property.


1. Introduction

Motivated by the analysis of complex data, several flexible regression models have been developed over the last three decades. Among these are the varying coefficient models (Hastie and Tibshirani 1993). They differ from classical linear models in that the regression coefficients are no longer constants but functions of a smoothing variable. The model has the form
$$Y=\sum_{g=1}^{d}a_g(U)X_g+\epsilon=(1,\mathbf{x}^{T})\,a(U)+\epsilon,\tag{1}$$
where $\mathbf{x}=(X_2,\ldots,X_d)^{T}$ are continuous covariates, $X_1\equiv 1$, $U$ is a continuous smoothing variable, $a(U)=(a_1(U),\ldots,a_d(U))^{T}$ is the vector of coefficient functions, and $\epsilon$ is the error term with $E(\epsilon\,|\,U,\mathbf{x})=0$ and $\mathrm{Var}(\epsilon\,|\,U,\mathbf{x})=\sigma^2(U)$. When $d=1$, model (1) reduces to a univariable nonparametric model. If the varying coefficients are constants, i.e. $a_g(U)=a_g$, $g=1,\ldots,d$, then (1) is the multiple linear regression model. The model retains general nonparametric characteristics and allows nonlinear interactions between the smoothing variable $U$ and the covariates $\mathbf{x}$. Methods for estimating $a(\cdot)$ include the local polynomial approach (Fan and Zhang 1999), smoothing splines (Hastie and Tibshirani 1993; Chiang, Rice, and Wu 2001), and penalised splines (Ruppert, Wand, and Carroll 2003). Overviews of the methodology of varying coefficient models are given in Fan and Zhang (2008) and Park, Mammen, Lee, and Lee (2015).

This paper focuses on the local polynomial approach. When $d=1$, with local polynomial fitting of $p$th order, it is well known that the estimators are linear smoothers (linear in the $Y_i$'s) and the associated equivalent kernels are available (Fan and Gijbels 1996). The equivalent kernels give insight into the role of kernel smoothing on the data, and the corresponding estimator of $a_1(\cdot)$ has a remarkable 'reproducing' property (Tsybakov 2009, p. 36): it reproduces polynomials of degree $p$. However, to our knowledge, there are no results in the literature on equivalent kernels for local polynomial estimators in the varying coefficient model (1). In this paper, we fill this gap by deriving asymptotic equivalent kernels for estimating $a_g(\cdot)$, $g=1,\ldots,d$, and their derivatives, providing explicit forms of the connections between the equivalent kernels for general $d$ and those for $d=1$, and studying an extension of the reproducing property. The contributions of our paper include the following: (i) under some conditions, the asymptotic equivalent kernels corresponding to estimating $a_g^{(\nu)}(\cdot)$, $g=1,\ldots,d$, $\nu=0,\ldots,p$, in (1) have explicit decomposition forms that connect them to those of $d=1$ (Theorem 3.2); (ii) the finite-sample equivalent kernels corresponding to estimating $a_g^{(\nu)}(u)$, $g=1,\ldots,d$, $\nu=0,\ldots,p$, reproduce the $\nu$th derivative of polynomials of degree $p$ in $u$ (Proposition 3.1); (iii) with centered covariates, the estimators of $a_k(\cdot)$, $k=2,\ldots,d$, are local analogues of the slope estimators in linear models (Corollary 3.2).

We start the discussion with local linear fitting ($p=1$, $d=2$) in Theorem 3.1, and then extend the results to general $d$ in (1) with $p$th order local polynomial fitting in Theorem 3.2. Equations (16) and (26) in Theorems 3.1 and 3.2, respectively, show that there are direct connections between the asymptotic equivalent kernels of (1) and those of univariable local polynomial regression ((1) with $d=1$). The second equation in (26) shows that the asymptotic equivalent kernels decompose into three main parts: the inverse of the conditional moment matrix of $(1,\mathbf{x}^{T})$ given $U=u_0$, the covariate vector, and the equivalent kernels of univariable local polynomials. The finite-sample reproducing property is given in Proposition 3.1, which leads to zero bias when the true regression mean is a multiple linear model with interactions between $\mathbf{x}$ and polynomials of $U$ up to order $p$. In Section 3.3, we present two corollaries of Theorems 3.1 and 3.2 for centered $\mathbf{x}$. It turns out that the centered form leads to simpler and more interpretable results: the equivalent kernels for estimating $a_1^{(\nu)}(\cdot)$ in (1) with general $d$ are asymptotically identical to those for $d=1$, and the estimators of $a_2^{(\nu)}(\cdot),\ldots,a_d^{(\nu)}(\cdot)$ may be expressed asymptotically in a form analogous to the slope estimators in linear models, with weights assigned by equivalent kernels. This interpretation appears to be new in the literature. We conjecture that these equivalent kernel results may be useful for developing methodology when the responses are random objects, i.e. Fréchet regression. Petersen and Müller (2019) propose to utilise the Euclidean local linear weights to fit Fréchet local linear regression, so Fréchet varying coefficient models will be an interesting topic for future research.

The article is organised as follows. In Section 2, we summarise the local polynomial approach for estimating (1) (Fan and Zhang 1999) and the equivalent kernels of local polynomial regression (Fan and Gijbels 1996), with the reproducing property in Proposition 2.1 (Tsybakov 2009). We present the main results in Section 3 and give two examples in Section 4 to illustrate, respectively, the weighting schemes of the equivalent kernels when $d=2$ and $p=1$, and the reproducing property of the equivalent kernels when $d=2$ and $p=2$. Proofs of Proposition 3.1 and Theorem 3.2 are provided in the Appendix.

2. Background

Consider a random sample $\{(Y_i,U_i,X_{i2},\ldots,X_{id}),\,i=1,\ldots,n\}$ from model (1) and define $X_{i1}\equiv 1$, $i=1,\ldots,n$. In this article, we adopt the local polynomial approach (Fan and Gijbels 1996) for estimating the coefficient functions $a_g(\cdot)$, $g=1,\ldots,d$, in (1). For $U_i$ in a neighbourhood of a grid point $u_0$, $a_g(U_i)$ is approximated locally by a polynomial of order $p$, $\sum_{j=0}^{p}(a_g^{(j)}(u_0)/j!)(U_i-u_0)^j$, based on a Taylor expansion. Estimation is then carried out by weighted least squares (Fan and Zhang 1999):
$$\min_{\beta}\;\sum_{i=1}^{n}\Big(Y_i-\sum_{g=1}^{d}\Big[\sum_{j=0}^{p}\beta_{g,j}(U_i-u_0)^j\Big]X_{ig}\Big)^2 K_h(U_i-u_0),\tag{2}$$
where $\beta=(\beta_{1,0},\ldots,\beta_{1,p},\ldots,\beta_{d,0},\ldots,\beta_{d,p})^{T}$, $K(\cdot)$ is a symmetric probability density function, $h$ is the bandwidth determining the size of the local neighbourhood, and $K_h(\cdot)=K(\cdot/h)/h$. Throughout the paper, the dependence of $\beta$ on $u_0$ and $h$ is suppressed when no ambiguity results. Under some conditions, (2) has a unique solution, denoted by $\hat\beta=(\hat\beta_{1,0},\ldots,\hat\beta_{1,p},\ldots,\hat\beta_{d,0},\ldots,\hat\beta_{d,p})^{T}$. Clearly, $\hat\beta_{g,0}(u_0)$ estimates the $a_g(u_0)$ of interest, and $j!\,\hat\beta_{g,j}(u_0)$ estimates the $j$th derivative $a_g^{(j)}(u_0)$. The criterion (2) and its solution can be expressed in matrix notation. Let
$$X_{u_0}=\begin{bmatrix}1&\cdots&(U_1-u_0)^p&\cdots&X_{1d}&\cdots&X_{1d}(U_1-u_0)^p\\ \vdots& &\vdots& &\vdots& &\vdots\\ 1&\cdots&(U_n-u_0)^p&\cdots&X_{nd}&\cdots&X_{nd}(U_n-u_0)^p\end{bmatrix},$$
let $W_{u_0}$ be the $n\times n$ diagonal matrix of weights $K_h(U_i-u_0)$, $i=1,\ldots,n$, and let $\mathbf{y}=(Y_1,\ldots,Y_n)^{T}$. Then (2) can be expressed as $\min_{\beta}(\mathbf{y}-X_{u_0}\beta)^{T}W_{u_0}(\mathbf{y}-X_{u_0}\beta)$, which yields
$$\hat\beta(u_0)=(X_{u_0}^{T}W_{u_0}X_{u_0})^{-1}X_{u_0}^{T}W_{u_0}\mathbf{y}.\tag{3}$$
The estimator of $a(u_0)$ is
$$\hat a(u_0)=(\hat\beta_{1,0}(u_0),\ldots,\hat\beta_{d,0}(u_0))^{T}=(I_d\otimes e_{p+1,1}^{T})\,\hat\beta(u_0),\tag{4}$$
where $\otimes$ denotes the Kronecker product, $e_{p+1,k}$ is a column vector of length $p+1$ with 1 at the $k$th position and 0 elsewhere, and $I_d$ is the $d$-dimensional identity matrix. Let $\zeta_{g,\nu+1}$ be a column vector of length $d(p+1)$ with 1 at the $[(g-1)(p+1)+(\nu+1)]$th position and 0 elsewhere, $g=1,\ldots,d$, $\nu=0,\ldots,p$. Then $\hat\beta_{g,\nu}(u_0)=\zeta_{g,\nu+1}^{T}\hat\beta(u_0)$. In other words, if $\zeta_{g,\nu+1}$ is partitioned into $d$ groups of length $p+1$, then $\zeta_{g,\nu+1}$ indicates the $(\nu+1)$th position in the $g$th group, and $\zeta_{g,\nu+1}=e_{d,g}\otimes e_{p+1,\nu+1}$. In the special case of $d=1$, $\zeta_{1,\nu+1}=e_{p+1,\nu+1}$.
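The estimator (3) is an ordinary weighted least squares fit on a locally expanded design matrix, so it can be computed in a few lines. The following is our own minimal sketch in Python/NumPy (the names epanechnikov and local_poly_vc are ours, and the Epanechnikov kernel is one admissible choice of $K$, not a requirement of the method):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)

def local_poly_vc(y, U, X, u0, h, p=1):
    """Solve (3): beta_hat(u0) = (X'WX)^{-1} X'Wy for model (1).

    y: (n,) responses; U: (n,) smoothing variable;
    X: (n, d) covariates with X[:, 0] == 1; p: local polynomial order.
    Returns beta_hat ordered as (beta_{1,0},...,beta_{1,p},...,beta_{d,0},...,beta_{d,p}).
    """
    n, d = X.shape
    t = U - u0
    powers = t[:, None] ** np.arange(p + 1)          # (U_i - u0)^j, j = 0..p
    # Row i of X_{u0} is the Kronecker product x~_(i) (x) (1, ..., (U_i - u0)^p)
    Xu0 = np.einsum('ig,ij->igj', X, powers).reshape(n, d * (p + 1))
    w = epanechnikov(t / h) / h                      # K_h(U_i - u0)
    Sn = Xu0.T @ (w[:, None] * Xu0)                  # S_n = X' W X
    return np.linalg.solve(Sn, Xu0.T @ (w * y))

# a_g(u0) is estimated by beta_hat[(g-1)*(p+1)];
# its j-th derivative by j! * beta_hat[(g-1)*(p+1) + j].
```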

The behaviour of $\hat\beta(u_0)$ differs according to whether $u_0$ is an interior or a boundary point. In this paper, we consider interior points only. An informative tool for understanding $\hat\beta(u_0)$ is the equivalent kernel. For $d=1$ in model (1), the weight function $W_\nu(\cdot)$ for $\hat\beta_{1,\nu}(u_0)$, i.e. $\hat\beta_{1,\nu}(u_0)=\sum_{i=1}^{n}W_\nu((U_i-u_0)/h)\,Y_i$, $\nu=0,\ldots,p$, is given as follows (Fan and Gijbels 1996, p. 63):
$$W_\nu(t)=e_{p+1,\nu+1}^{T}S_n^{-1}\{1,th,\ldots,(th)^p\}^{T}K(t)/h,\tag{5}$$
where $S_n=X_{u_0}^{T}W_{u_0}X_{u_0}$. For an interior point $u_0$, $W_\nu(\cdot)$ satisfies the following discrete moment conditions (Fan and Gijbels 1996, pp. 63, 103):
$$\sum_{i=1}^{n}(U_i-u_0)^q\,W_\nu\Big(\frac{U_i-u_0}{h}\Big)=\delta_{\nu,q},\qquad 0\le\nu,q\le p,\tag{6}$$
where $\delta_{\nu,q}$ is the indicator of $\{\nu=q\}$. For $\nu=0$, (6) is referred to as the 'reproducing' property by Tsybakov (2009, p. 36, Proposition 1.12), because $\hat\beta_{1,0}(\cdot)$ reproduces polynomials of degree $p$. Here we extend Tsybakov's statement to general $\nu=0,\ldots,p$ in the following proposition, which shows that the local polynomial kernel approach has a derivative reproducing property.

Proposition 2.1

Let $P(\cdot)$ be a polynomial of degree $p$. Then (6) implies the reproducing property for $\nu=0,\ldots,p$:
$$\nu!\sum_{i=1}^{n}P(U_i)\,W_\nu\Big(\frac{U_i-u_0}{h}\Big)=\frac{d^\nu}{du^\nu}P(u)\Big|_{u=u_0}.\tag{7}$$

The proof of Proposition 2.1 is given in Tsybakov (2009, pp. 36–37); it is straightforward from (6) and hence omitted. For $\nu=1,\ldots,p$, the reproducing property (7) means that the weight function $\nu!\,W_\nu(\cdot)$ corresponding to $\nu!\,\hat\beta_{1,\nu}(\cdot)$ reproduces the $\nu$th derivative of any polynomial $P(\cdot)$ of degree $p$. In particular, polynomials of degree less than $\nu$ are shrunk to 0.
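Proposition 2.1 is a finite-sample identity and can be checked numerically. A small sketch under our own naming (it assumes the hypothetical epanechnikov helper from the previous sketch is in scope; the sample design, bandwidth, and polynomial are arbitrary choices of ours):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, p, h, u0 = 400, 3, 0.15, 0.5
U = rng.uniform(0, 1, n)

t = U - u0
powers = t[:, None] ** np.arange(p + 1)              # local design (1, ..., (U_i - u0)^p)
w = epanechnikov(t / h) / h                          # K_h(U_i - u0)
Sn = powers.T @ (w[:, None] * powers)
# Row nu of Wmat holds the weights W_nu((U_i - u0)/h) of (5)
Wmat = np.linalg.solve(Sn, (powers * w[:, None]).T)

P = lambda u: 2 - u + 3 * u**2 + 0.5 * u**3          # a polynomial of degree p = 3
targets = [P(u0),                                    # P(u0)
           -1 + 6 * u0 + 1.5 * u0**2,                # P'(u0)
           6 + 3 * u0,                               # P''(u0)
           3.0]                                      # P'''(u0)
for nu in range(p + 1):
    lhs = factorial(nu) * np.sum(P(U) * Wmat[nu])    # nu! sum_i P(U_i) W_nu(.)
    print(nu, lhs, targets[nu])                      # equal up to floating-point rounding
```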

Let $S_p=(\mu_{i+j})_{0\le i,j\le p}$ with $\mu_{i+j}$ the $(i+j)$th moment of $K(\cdot)$. In asymptotic form (Fan and Gijbels 1996, p. 64),
$$\hat\beta_{1,\nu}(u_0)=\frac{1}{nh^{\nu+1}f_U(u_0)}\sum_{i=1}^{n}K_\nu^{*}\Big(\frac{U_i-u_0}{h}\Big)Y_i\,(1+o_p(1)),\qquad \nu=0,\ldots,p,$$
where $f_U(\cdot)$ is the density function of $U$ and
$$K_\nu^{*}(t)=e_{p+1,\nu+1}^{T}S_p^{-1}(1,t,\ldots,t^p)^{T}K(t)=\Big(\sum_{j=0}^{p}t^j s^{(\nu+1,j+1)}\Big)K(t)\tag{8}$$
with $s^{(i,j)}$ the $(i,j)$th element of $S_p^{-1}$. The function $K_\nu^{*}(\cdot)$ is the asymptotic equivalent kernel for $\hat\beta_{1,\nu}$, satisfying the following property (Fan and Gijbels 1996, p. 64):
$$\int t^q K_\nu^{*}(t)\,dt=\delta_{\nu,q},\qquad 0\le\nu,q\le p,\tag{9}$$
which is an asymptotic version of (6); that is, $K_\nu^{*}(\cdot)$ is a kernel of order $(\nu,p+1)$ (see Gasser, Müller, and Mammitzsch 1985 for the definition). It has been shown in Fan and Gijbels (1996) that local polynomials with $p-\nu=1$ outperform those with $p-\nu=0$ asymptotically.
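The kernels $K_\nu^{*}$ in (8) and the moment conditions (9) are easy to tabulate numerically. The sketch below is ours; it reuses the hypothetical epanechnikov helper and approximates the moments $\mu_m$ by Riemann sums, building $K_\nu^{*}$ for $p=1$ and verifying (9):

```python
import numpy as np

def K_star(nu, p, t):
    """K*_nu(t) of (8) for the Epanechnikov kernel, via the moment matrix S_p."""
    tt = np.linspace(-1, 1, 20001)
    dt = tt[1] - tt[0]
    k = epanechnikov(tt)
    mu = [(tt**m * k).sum() * dt for m in range(2 * p + 1)]   # moments mu_m of K
    Sp = np.array([[mu[i + j] for j in range(p + 1)] for i in range(p + 1)])
    e = np.zeros(p + 1); e[nu] = 1.0
    coef = np.linalg.solve(Sp, e)                    # s^{(nu+1, j+1)}, j = 0..p
    return sum(c * t**j for j, c in enumerate(coef)) * epanechnikov(t)

t = np.linspace(-1, 1, 20001); dt = t[1] - t[0]
for nu in range(2):                                  # p = 1: K*_0 and K*_1
    Ks = K_star(nu, 1, t)
    print(nu, [round(float((t**q * Ks).sum() * dt), 6) for q in range(2)])
    # prints (int t^q K*_nu(t) dt) for q = 0, 1, which is ~ delta_{nu,q}, cf. (9)
```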

In the next section, we derive the equivalent kernels of $\hat\beta_{g,\nu}(u_0)$, $g=1,\ldots,d$, $\nu=0,\ldots,p$, for the varying coefficient model (1), and investigate their reproducing property and their connection to $K_\nu^{*}(\cdot)$ in (8).

3. Results

3.1. Local linear case with d = 2

For clarity of presentation, we start with the simple case $d=2$ and $p=1$, i.e.
$$Y=a_1(U)+a_2(U)X_2+\epsilon.\tag{10}$$
For a given interior point $u_0$, with $p=1$, the matrix $S_n$ defined following (5) is
$$S_n=\begin{bmatrix}\mathbb{S}_{11}&\mathbb{S}_{12}\\ \mathbb{S}_{21}&\mathbb{S}_{22}\end{bmatrix},\qquad \mathbb{S}_{jk}=\begin{bmatrix}S_{jk0}&S_{jk1}\\ S_{jk1}&S_{jk2}\end{bmatrix},\qquad S_{jkl}=\sum_{i=1}^{n}X_{ij}X_{ik}(U_i-u_0)^l K_h(U_i-u_0).$$
Under model (10), based on (3), $\hat\beta_{g,\nu}(u_0)$, $g=1,2$, $\nu=0,1$, is a linear smoother:
$$\hat\beta_{g,\nu}(u_0)=\sum_{i=1}^{n}W_\nu^{g}\Big(\frac{U_i-u_0}{h};X_{i2}\Big)Y_i,\tag{11}$$
where
$$W_\nu^{g}\Big(\frac{U_i-u_0}{h};X_{i2}\Big)=\zeta_{g,\nu+1}^{T}S_n^{-1}\Big(\begin{bmatrix}1\\X_{i2}\end{bmatrix}\otimes\begin{bmatrix}1\\U_i-u_0\end{bmatrix}\Big)K_h(U_i-u_0).\tag{12}$$
It is straightforward to show that $W_\nu^{g}(\cdot\,;\cdot)$ satisfies the following discrete moment conditions: for $\nu,q=0,1$ and $g=1,2$,
$$\sum_{i=1}^{n}(U_i-u_0)^q\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};X_{i2}\Big)=\delta_{g,1}\delta_{\nu,q};\qquad \sum_{i=1}^{n}X_{i2}(U_i-u_0)^q\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};X_{i2}\Big)=\delta_{g,2}\delta_{\nu,q}.\tag{13}$$
Equation (13) provides four moment conditions for each combination of $(g,\nu)$. For $g=1$, $W_\nu^{1}(\cdot\,;\cdot)$, $\nu=0,1$, satisfies the same reproducing property as in Proposition 2.1, by the first equation in (13). In addition, by the second equation in (13), $W_\nu^{1}(\cdot\,;\cdot)$ shrinks the covariates $X_{i2}$ and the interaction terms $X_{i2}U_i$ to 0. In other words, the equivalent kernels corresponding to estimating $a_1^{(\nu)}(\cdot)$ for $d=2$ satisfy more properties than (6) in the $d=1$ case. For $g=2$, we list the properties of $W_\nu^{2}(\cdot\,;\cdot)$ below; the general statement is given in Proposition 3.1 in Section 3.2.

For simplicity, denote $u_{i0}=(U_i-u_0)/h$. When $g=2$, (13) gives the following results:

  1. When $\nu=q=0$: $\sum_{i=1}^{n}W_0^{2}(u_{i0};X_{i2})=0$, i.e. constants are shrunk to 0, and $\sum_{i=1}^{n}X_{i2}W_0^{2}(u_{i0};X_{i2})=1$, i.e. the first derivative with respect to $x_2$ is reproduced ($\frac{\partial}{\partial x_2}x_2=1$).

  2. When $\nu=0$, $q=1$: $\sum_{i=1}^{n}(U_i-u_0)W_0^{2}(u_{i0};X_{i2})=0$, equivalently $\sum_{i=1}^{n}U_iW_0^{2}(u_{i0};X_{i2})=0$, i.e. linear terms in the $U_i$ are shrunk to 0; and $\sum_{i=1}^{n}X_{i2}(U_i-u_0)W_0^{2}(u_{i0};X_{i2})=0$, equivalently $\sum_{i=1}^{n}X_{i2}U_iW_0^{2}(u_{i0};X_{i2})=u_0$, reproducing the first partial derivative with respect to $x_2$ ($\frac{\partial}{\partial x_2}(ux_2)|_{u_0}=u_0$).

  3. When $\nu=1$, $q=0$: $\sum_{i=1}^{n}W_1^{2}(u_{i0};X_{i2})=0$ and $\sum_{i=1}^{n}X_{i2}W_1^{2}(u_{i0};X_{i2})=0$, shrinking constants and linear terms in the $X_{i2}$ to 0.

  4. When $\nu=q=1$: $\sum_{i=1}^{n}(U_i-u_0)W_1^{2}(u_{i0};X_{i2})=0$, equivalently $\sum_{i=1}^{n}U_iW_1^{2}(u_{i0};X_{i2})=0$, shrinking linear terms in the $U_i$ to 0; and $\sum_{i=1}^{n}X_{i2}(U_i-u_0)W_1^{2}(u_{i0};X_{i2})=1$, equivalently $\sum_{i=1}^{n}X_{i2}U_iW_1^{2}(u_{i0};X_{i2})=1$, reproducing the mixed first partial derivative with respect to both $x_2$ and $u$ ($\frac{\partial^2}{\partial u\,\partial x_2}(ux_2)=1$).

  5. From 1–4 above, the finite-sample bias is zero when estimating $E(Y|U,X_2)=b_0+b_1U+b_2X_2+b_3UX_2$ (verified numerically in the sketch following this list).
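These finite-sample identities can be verified directly. The sketch below is our own check (reusing the hypothetical epanechnikov helper, with the design of Example 4.1 as one convenient choice); it forms the weights (12) and evaluates the sums in (13) for every $(g,\nu,q)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, u0 = 500, 0.1, 0.5
U = rng.uniform(0, 1, n)
X2 = rng.uniform(-(1 + U) / 2, (1 + U) / 2)          # the design of Example 4.1

t = U - u0
X = np.column_stack([np.ones(n), X2])
design = np.einsum('ig,ij->igj', X, t[:, None] ** np.arange(2)).reshape(n, 4)
w = epanechnikov(t / h) / h
Sn = design.T @ (w[:, None] * design)
Wall = np.linalg.solve(Sn, (design * w[:, None]).T)  # rows: (g,nu) = (1,0),(1,1),(2,0),(2,1)

for row, (g, nu) in zip(Wall, [(1, 0), (1, 1), (2, 0), (2, 1)]):
    for q in (0, 1):
        s1 = np.sum(t**q * row)                      # = delta_{g,1} delta_{nu,q} by (13)
        s2 = np.sum(X2 * t**q * row)                 # = delta_{g,2} delta_{nu,q} by (13)
        print((g, nu, q), round(s1, 12), round(s2, 12))
```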

Next we study the asymptotic forms of $W_\nu^{g}(\cdot\,;\cdot)$, $\nu=0,1$, $g=1,2$, to understand the local asymptotic behaviour of $\hat\beta_{g,\nu}(\cdot)$. Let $\tilde{\mathbf{x}}=(1,X_2,\ldots,X_d)^{T}$, let $f(x_2,\ldots,x_d|u)$ be the conditional density of $\mathbf{x}$ given $U=u$, and let $\Omega_d(u)$ be the conditional expectation of $\tilde{\mathbf{x}}\tilde{\mathbf{x}}^{T}$ given $U=u$, with $(j,k)$th element $r_{jk}(u)=E(X_jX_k|U=u)$, $j,k=1,\ldots,d$. Mimicking the derivation of (8) in Fan and Gijbels (1996) and using the asymptotic forms of the $S_{jkl}$ in Zhang and Lee (2000, equation (5.2)), we obtain
$$S_n^{-1}=\frac{1}{nf_U(u_0)}\Big([\Omega_2(u_0)]^{-1}\otimes H_1^{-1}S_1^{-1}H_1^{-1}\Big)\Big(1+O_p\big(\{\log n/(nh)\}^{1/2}\big)\Big),\tag{14}$$
where $H_p=\mathrm{diag}(1,h,\ldots,h^p)$. It can be seen from (14) that when $r_{12}(u_0)=r_{21}(u_0)=E(X_2|U=u_0)=0$, $\Omega_2(u_0)$ is a diagonal matrix. The following theorem gives the explicit forms of the asymptotic equivalent kernels of $\hat\beta_{g,\nu}(u_0)$ for $g=1,2$ and $\nu=0,1$, and provides their moment properties.

Theorem 3.1

Consider a random sample $\{(Y_i,U_i,X_{i2}),\,i=1,\ldots,n\}$ from model (10) ($d=2$ in (1)) with local linear ($p=1$) estimators. For an interior point $u_0$, assume that $f(x_2|u_0)$ is bounded away from 0 and $\infty$ and has compact support. Conditional on $\{U_i,X_{i2}\}_{i=1}^{n}$ and under Conditions A in the Appendix, $\hat\beta_{g,\nu}(u_0)$, $g=1,2$, $\nu=0,1$, has the asymptotic form
$$\hat\beta_{g,\nu}(u_0)=\frac{1}{nh^{\nu+1}f_U(u_0)}\sum_{i=1}^{n}K_{g,\nu}^{*}\Big(\frac{U_i-u_0}{h};X_{i2}\Big)Y_i\,(1+o_p(1)),\tag{15}$$
where the equivalent kernel is
$$\begin{aligned}K_{g,\nu}^{*}\Big(\frac{U_i-u_0}{h};X_{i2}\Big)&=\zeta_{g,\nu+1}^{T}\,[\Omega_2(u_0)\otimes S_1]^{-1}\Big(\begin{bmatrix}1\\X_{i2}\end{bmatrix}\otimes\begin{bmatrix}1\\ (U_i-u_0)/h\end{bmatrix}\Big)K\Big(\frac{U_i-u_0}{h}\Big)\\&=e_{2,g}^{T}\,[\Omega_2(u_0)]^{-1}\begin{bmatrix}1\\X_{i2}\end{bmatrix}K_\nu^{*}\Big(\frac{U_i-u_0}{h}\Big)\\&=\big(\tau^{(g,1)}(u_0)+X_{i2}\tau^{(g,2)}(u_0)\big)\big(s^{(\nu+1,1)}+s^{(\nu+1,2)}(U_i-u_0)/h\big)K\Big(\frac{U_i-u_0}{h}\Big)\end{aligned}\tag{16}$$
with $\tau^{(j,l)}(u_0)$ and $s^{(j,l)}$ the $(j,l)$th elements of $[\Omega_2(u_0)]^{-1}$ and $S_1^{-1}$, respectively. Then for $\nu,q=0,1$,
$$\int t^q\Big(\int K_{g,\nu}^{*}(t;x_2)f(x_2|u_0)\,dx_2\Big)dt=\delta_{g,1}\delta_{\nu,q},\qquad \int t^q\Big(\int x_2K_{g,\nu}^{*}(t;x_2)f(x_2|u_0)\,dx_2\Big)dt=\delta_{g,2}\delta_{\nu,q},\tag{17}$$
which are the asymptotic counterparts of (13).

The results of Theorem 3.1 are a special case ($d=2$, $p=1$) of Theorem 3.2, and hence the proof of Theorem 3.1 follows from that of Theorem 3.2. Expression (16) gives different decomposition forms of $K_{g,\nu}^{*}(\cdot\,;\cdot)$; interpretations of Theorem 3.1 are discussed as follows.

  • From the first equation in (16), $K_{g,\nu}^{*}(\cdot\,;\cdot)$ consists of three main parts: (i) the inverse of the Kronecker product of the conditional moment matrix $\Omega_2(u_0)$ and the kernel moment matrix $S_1$, (ii) the Kronecker product of the covariate vector $(1,X_{i2})^{T}$ and the local linear term $(1,(U_i-u_0)/h)^{T}$, and (iii) the kernel function $K(\cdot)$.

  • By the second equation in (16), $K_{g,\nu}^{*}(\cdot\,;\cdot)$ can be rewritten as a matrix product of (i) the $g$th row of $[\Omega_2(u_0)]^{-1}$, (ii) the covariate vector $(1,X_{i2})^{T}$, and (iii) the equivalent kernel $K_\nu^{*}(\cdot)$ of local linear regression in (8). This expression shows the connection of $K_{g,\nu}^{*}(\cdot\,;x_2)$ to $K_\nu^{*}(\cdot)$ in (8).

  • The kernel $\nu!\,K_{1,\nu}^{*}(\cdot\,;\cdot)$ for estimating $a_1^{(\nu)}(\cdot)$ under model (10) with $d=2$ clearly differs from (8) for $d=1$. More explicitly, from (16),
$$\hat\beta(u_0)=\sum_{i=1}^{n}\frac{[\det(\Omega_2(u_0))]^{-1}}{nf_U(u_0)}\begin{bmatrix}r_{22}(u_0)-r_{12}(u_0)X_{i2}\\ h^{-1}\mu_2^{-1}\big((U_i-u_0)/h\big)\big(r_{22}(u_0)-r_{12}(u_0)X_{i2}\big)\\ X_{i2}-r_{21}(u_0)\\ h^{-1}\mu_2^{-1}\big((U_i-u_0)/h\big)\big(X_{i2}-r_{21}(u_0)\big)\end{bmatrix}K_h(U_i-u_0)\,Y_i\,(1+o_p(1)).\tag{18}$$
Based on (18), even when $X_2$ and $U$ are independent (so that $r_{12}=E(X_2)$ and $r_{22}=E(X_2^2)$ are free of $u_0$), $K_{1,\nu}^{*}(\cdot\,;\cdot)$ still involves the $X_{i2}$'s. Thus a sufficient condition for the equivalent kernels $K_{1,\nu}^{*}(\cdot\,;\cdot)$ to be identical to those in (8) for $d=1$ is $r_{12}(\cdot)\equiv 0$; that is, given $U$, $X_2$ has conditional mean 0.

  • For $g=1,2$, the equivalent kernel for estimating the first derivative $a_g'(\cdot)$ is connected to that for estimating $a_g(\cdot)$: $K_{g,1}^{*}(t;\cdot)=\mu_2^{-1}t\,K_{g,0}^{*}(t;\cdot)$. This is analogous to $K_1^{*}(t)=\mu_2^{-1}tK_0^{*}(t)$ for $p=1$ in (8).

  • Equation (17) involves the conditional density $f(x_2|u)$ and shows the moment properties of $K_{g,\nu}^{*}(t;x_2)$ with respect to this conditional density when estimating $a_1^{(\nu)}(\cdot)$ and $a_2^{(\nu)}(\cdot)$ in (10).

Next we discuss the decomposition and reproducing property of the equivalent kernels for model (1) with general $d$ and $p$.

3.2. The case with general d and p

The results in Theorem 3.1 and the reproducing property (13) are extended to general $d$ and $p$ in this subsection. For ease of notation, let $u_{i0}=(1,U_i-u_0,\ldots,(U_i-u_0)^p)^{T}$ and
$$X=\begin{bmatrix}1&X_{12}&\cdots&X_{1d}\\ \vdots&\vdots& &\vdots\\ 1&X_{n2}&\cdots&X_{nd}\end{bmatrix}=\begin{bmatrix}\tilde{\mathbf{x}}_{(1)}^{T}\\ \vdots\\ \tilde{\mathbf{x}}_{(n)}^{T}\end{bmatrix}=\begin{bmatrix}1&\mathbf{x}_{(1)}^{T}\\ \vdots&\vdots\\ 1&\mathbf{x}_{(n)}^{T}\end{bmatrix},\tag{19}$$
where $\tilde{\mathbf{x}}_{(i)}^{T}$ is the $i$th row of $X$ and $\mathbf{x}_{(i)}=(X_{i2},\ldots,X_{id})^{T}$ excludes the intercept. For $0\le\nu\le p$ and $1\le g\le d$, $\hat\beta_{g,\nu}(u_0)$ is written as
$$\hat\beta_{g,\nu}(u_0)=\sum_{i=1}^{n}W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)Y_i,\tag{20}$$
where the weight function $W_\nu^{g}(\cdot\,;\mathbf{x}_{(i)}^{T})$ is
$$W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\zeta_{g,\nu+1}^{T}S_n^{-1}\big(\tilde{\mathbf{x}}_{(i)}\otimes u_{i0}\big)K_h(U_i-u_0)\tag{21}$$
with $\zeta_{g,\nu+1}$ defined in Section 2. Based on (21), we show in the Appendix that $W_\nu^{g}(\cdot\,;\mathbf{x}_{(i)}^{T})$ enjoys the following property, analogous to (13): for $0\le\nu,q\le p$, $2\le k\le d$, and $1\le g\le d$,
$$\sum_{i=1}^{n}(U_i-u_0)^q\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\delta_{g,1}\delta_{\nu,q},\qquad \sum_{i=1}^{n}X_{ik}(U_i-u_0)^q\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\delta_{g,k}\delta_{\nu,q}.\tag{22}$$
We state the reproducing property of $W_\nu^{g}(\cdot\,;\mathbf{x}_{(i)}^{T})$ formally in the following proposition.

Proposition 3.1

Let $P(\cdot)$ be a polynomial of degree $p$ and $Q(\cdot,x_k)=x_kP(\cdot)$, $k=2,\ldots,d$. Then (22) implies the reproducing property: for $\nu=0,\ldots,p$,
$$\nu!\sum_{i=1}^{n}P(U_i)\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\delta_{g,1}\Big(\frac{d^\nu}{du^\nu}P(u)\Big|_{u=u_0}\Big);\tag{23}$$
and for $k=2,\ldots,d$,
$$\nu!\sum_{i=1}^{n}Q(U_i;X_{ik})\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\delta_{g,k}\frac{\partial}{\partial x_k}\Big(\frac{\partial^\nu}{\partial u^\nu}Q(u,x_k)\Big|_{u=u_0}\Big).\tag{24}$$

The outline of the proof of Proposition 3.1 is given in the Appendix. Proposition 3.1 shows that the equivalent kernel $\nu!\,W_\nu^{1}(\cdot\,;\cdot)$ for estimating $a_1^{(\nu)}(\cdot)$ reproduces the $\nu$th derivative of polynomials in the $U_i$ of degree $p$, while shrinking the $Q(U_i;X_{ik})$ to 0. Moreover, by (24), the equivalent kernel $\nu!\,W_\nu^{k}(\cdot\,;\cdot)$, $k=2,\ldots,d$, for estimating $a_k^{(\nu)}(\cdot)$ reproduces the $\nu$th derivative of $a_k(\cdot)$ when $a_k(\cdot)$ is a polynomial of degree $p$. These results imply that the finite-sample bias is zero when estimating $E(Y|U,X_2,\ldots,X_d)=\sum_{j=0}^{p}b_{1j}U^j+\sum_{k=2}^{d}\sum_{j=0}^{p}b_{kj}U^jX_k$, and that the polynomial reproducing property in Proposition 2.1 for $d=1$ remains valid for the interaction terms between the covariates $X_k$ and $p$th order polynomials of $U$ under (1) with general $d$.
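Because the zero-bias statement is exact in finite samples, it can be demonstrated in code: with a noiseless response of the displayed polynomial-interaction form, the local polynomial fit recovers the coefficient functions at $u_0$ exactly. A sketch under our own naming ($d=3$, $p=2$ are arbitrary choices; local_poly_vc is the hypothetical helper from the Section 2 sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, h, u0 = 300, 2, 0.2, 0.4
U = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(-1, 1, n)])  # d = 3

a1 = lambda u: 1 + u - 2 * u**2      # coefficient functions: polynomials of degree <= p
a2 = lambda u: -1 + 0.5 * u**2
a3 = lambda u: 2 * u
y = a1(U) + a2(U) * X[:, 1] + a3(U) * X[:, 2]        # noiseless response

beta = local_poly_vc(y, U, X, u0, h, p=p)
print(beta[0] - a1(u0))              # beta_{1,0} - a_1(u0) = 0 up to rounding
print(beta[p + 1] - a2(u0))          # beta_{2,0} - a_2(u0) = 0
print(2 * beta[2] - (-4.0))          # 2! beta_{1,2} - a_1''(u0) = 0
```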

Theorem 3.2 below gives the decomposition and moment properties of the asymptotic equivalent kernels for general $d$ and $p$, extending Theorem 3.1.

Theorem 3.2

Consider a random sample $\{(Y_i,U_i,\mathbf{x}_{(i)}^{T}),\,i=1,\ldots,n\}$ from model (1) with local $p$th order polynomial estimators, $p\ge 0$. For an interior point $u_0$, assume that $f(x_2,\ldots,x_d|u_0)$ is bounded away from 0 and $\infty$ and has compact support. Conditional on $\{U_i,\mathbf{x}_{(i)}^{T}\}_{i=1}^{n}$ and under Conditions A in the Appendix, $\hat\beta_{g,\nu}(u_0)$ has the following asymptotic form for $0\le\nu\le p$ and $1\le g\le d$:
$$\hat\beta_{g,\nu}(u_0)=\frac{1}{nh^{\nu+1}f_U(u_0)}\sum_{i=1}^{n}K_{g,\nu}^{*}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)Y_i\,(1+o_p(1)),\tag{25}$$
where the equivalent kernel is
$$\begin{aligned}K_{g,\nu}^{*}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)&=\zeta_{g,\nu+1}^{T}\,(\Omega_d(u_0)\otimes S_p)^{-1}\big(\tilde{\mathbf{x}}_{(i)}\otimes H_p^{-1}u_{i0}\big)K\Big(\frac{U_i-u_0}{h}\Big)\\&=e_{d,g}^{T}\,[\Omega_d(u_0)]^{-1}\tilde{\mathbf{x}}_{(i)}\,K_\nu^{*}\Big(\frac{U_i-u_0}{h}\Big)\\&=\Big(\tau^{(g,1)}(u_0)+\sum_{j=2}^{d}X_{ij}\tau^{(g,j)}(u_0)\Big)\Big(\sum_{l=0}^{p}\Big(\frac{U_i-u_0}{h}\Big)^l s^{(\nu+1,l+1)}\Big)K\Big(\frac{U_i-u_0}{h}\Big)\end{aligned}\tag{26}$$
with $\tau^{(j,l)}(u_0)$ and $s^{(j,l)}$ the $(j,l)$th elements of $[\Omega_d(u_0)]^{-1}$ and $S_p^{-1}$, respectively. The moment properties of $K_{g,\nu}^{*}(t;x_2,\ldots,x_d)$ are as follows:
$$\int t^q\Big(\int K_{g,\nu}^{*}(t;x_2,\ldots,x_d)f(x_2,\ldots,x_d|u_0)\,dx_2\cdots dx_d\Big)dt=\delta_{g,1}\delta_{\nu,q},$$
$$\int t^q\Big(\int x_kK_{g,\nu}^{*}(t;x_2,\ldots,x_d)f(x_2,\ldots,x_d|u_0)\,dx_2\cdots dx_d\Big)dt=\delta_{g,k}\delta_{\nu,q},\tag{27}$$
where $k=2,\ldots,d$ and $q=0,\ldots,p$. Equation (27) is the asymptotic counterpart of (22) and contains $(p+1)d$ conditions for each combination of $(g,\nu)$.

The proof of Theorem 3.2 is given in the Appendix. From (26), the equivalent kernel for $\hat\beta_{g,\nu}(u_0)$ has decomposition forms analogous to the case $d=2$, $p=1$ in Theorem 3.1, while involving higher-order local polynomials and more covariates. Some interpretations of (26) are given below:

  • The second equation in (26) shows that $K_{g,\nu}^{*}(\cdot\,;\cdot)$ consists of three parts: (i) $[\Omega_d(u_0)]^{-1}$, the inverse of the conditional moment matrix of $\tilde{\mathbf{x}}$ given $U=u_0$, (ii) the covariate vector $\tilde{\mathbf{x}}_{(i)}$, and (iii) the equivalent kernel $K_\nu^{*}(\cdot)$ of univariable local polynomials in (8).

  • The second equation in (26) not only gives an explicit connection between $K_{g,\nu}^{*}(\cdot\,;\cdot)$ and $K_\nu^{*}(\cdot)$; the product $[\Omega_d(u_0)]^{-1}\tilde{\mathbf{x}}_{(i)}$ is also analogous to the form of the slope estimators $(X^{T}X)^{-1}\tilde{\mathbf{x}}_{(i)}$ in classical linear models.

  • Based on the first equation in (26), an alternative form of $\hat\beta(u_0)$ gives another connection to the case $d=1$:
$$\hat\beta(u_0)=\sum_{i=1}^{n}\frac{1}{nf_U(u_0)}\Big(\big([\Omega_d(u_0)]^{-1}\tilde{\mathbf{x}}_{(i)}\big)\otimes\big[H_p^{-1}S_p^{-1}H_p^{-1}u_{i0}\big]\Big)K_h(U_i-u_0)\,Y_i\,(1+o_p(1)),$$
which reduces to the case in Section 2 when $d=1$. (A numerical check of the Kronecker factorisation behind these forms is sketched below.)
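The factorisation used in (26) and in the display above is the mixed-product property of the Kronecker product, $(A\otimes B)^{-1}(x\otimes u)=(A^{-1}x)\otimes(B^{-1}u)$. A short numerical check of this identity, with arbitrary positive definite matrices of our choosing standing in for $\Omega_d(u_0)$ and $S_p$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3)); A = A @ A.T + 3 * np.eye(3)   # stands in for Omega_d(u0)
B = rng.normal(size=(2, 2)); B = B @ B.T + 2 * np.eye(2)   # stands in for S_p
x, u = rng.normal(size=3), rng.normal(size=2)              # x~_(i) and the local design vector

lhs = np.linalg.solve(np.kron(A, B), np.kron(x, u))
rhs = np.kron(np.linalg.solve(A, x), np.linalg.solve(B, u))
print(np.allclose(lhs, rhs))                               # True
```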

3.3. Centering covariates

For classical linear models, centering the covariates is useful for interpreting their effects, and the least squares slope estimators are the same with or without centering. In this subsection, we explore an analogous centered form of (1) and show that the resulting asymptotic equivalent kernels for estimating $a_1(\cdot)$ are identical to $K_\nu^{*}(\cdot)$ in (8) for the $d=1$ case. Moreover, the varying coefficient model may then be interpreted as a locally multiple linear model with interactions.

Let $\bar X_k$ be the sample mean of the $k$th covariate $\{X_{ik}\}_{i=1}^{n}$, $k=2,\ldots,d$. Rewrite model (1) in terms of centered covariates for the $i$th observation, $i=1,\ldots,n$:
$$Y_i=\big(a_1(U_i)+\bar X_2a_2(U_i)+\cdots+\bar X_da_d(U_i)\big)+\sum_{k=2}^{d}a_k(U_i)(X_{ik}-\bar X_k)+\epsilon_i.\tag{28}$$
It is straightforward to see that the coefficient functions $a_k(\cdot)$, $k=2,\ldots,d$, are the same whether or not the covariates are centered, while the intercept function $a_1(\cdot)$ changes. For ease of notation, define $X_{ik}^{c}\equiv X_{ik}-\bar X_k$, $k=2,\ldots,d$,
$$\begin{bmatrix}X_{12}-\bar X_2&\cdots&X_{1d}-\bar X_d\\ \vdots& &\vdots\\ X_{n2}-\bar X_2&\cdots&X_{nd}-\bar X_d\end{bmatrix}\equiv\begin{bmatrix}X_{12}^{c}&\cdots&X_{1d}^{c}\\ \vdots& &\vdots\\ X_{n2}^{c}&\cdots&X_{nd}^{c}\end{bmatrix}\equiv\begin{bmatrix}(\mathbf{x}_{(1)}^{c})^{T}\\ \vdots\\ (\mathbf{x}_{(n)}^{c})^{T}\end{bmatrix},\tag{29}$$
and let $M_{xx}(u)$ be the $(d-1)\times(d-1)$ matrix with $(j-1,k-1)$th element $E(X_{ij}^{c}X_{ik}^{c}|u)\equiv r_{jk}^{c}(u)$, $j,k=2,\ldots,d$. When the conditional density of $X_k$ given $U=u$ is well defined, $E(X_{ik}|u)$ is a function of $u$ and $E(X_{ik}^{c}|u)=0$. The following corollary for $d=2$ and $p=1$ is a special case of Theorem 3.1, either when the $X_{i2}$ are centered or when $E(X_2|U=u)=r_{12}(u)=0$.
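As in linear models, centering leaves the local WLS slope blocks unchanged in finite samples, because centering is a nonsingular linear reparametrisation of the local design that only mixes the intercept-group columns into the covariate groups. A sketch of this invariance (ours, reusing the hypothetical local_poly_vc helper; model and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, h, u0 = 400, 1, 0.15, 0.5
U = rng.uniform(0, 1, n)
X2 = rng.normal(size=n)
y = np.sin(2 * np.pi * U) + (1 + U**2) * X2 + 0.1 * rng.normal(size=n)

X_raw = np.column_stack([np.ones(n), X2])
X_cen = np.column_stack([np.ones(n), X2 - X2.mean()])
b_raw = local_poly_vc(y, U, X_raw, u0, h, p=p)
b_cen = local_poly_vc(y, U, X_cen, u0, h, p=p)
# The slope block (beta_{2,0}, beta_{2,1}) is invariant; only the intercept block changes
print(np.allclose(b_raw[p + 1:], b_cen[p + 1:]))           # True
```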

Corollary 3.1

Under the conditions of Theorem 3.1, results (a) and (b) below hold when the $X_{i2}$ are centered.

(a)

$K_{1,\nu}^{*}(\cdot)$, $\nu=0,1$, is identical to $K_\nu^{*}(\cdot)$ in (8) and does not involve the $X_{i2}^{c}$'s;

(b)

the form of $K_{2,\nu}^{*}(\cdot\,;\cdot)$, $\nu=0,1$, becomes simpler:

$$\begin{aligned}K_{2,0}^{*}\Big(\frac{U_i-u_0}{h};X_{i2}^{c}\Big)&=\Big(\frac{X_{i2}^{c}}{r_{22}^{c}(u_0)}\Big)K\Big(\frac{U_i-u_0}{h}\Big),\\ K_{2,1}^{*}\Big(\frac{U_i-u_0}{h};X_{i2}^{c}\Big)&=\Big(\frac{X_{i2}^{c}}{r_{22}^{c}(u_0)}\Big)K_1^{*}\Big(\frac{U_i-u_0}{h}\Big)=\Big(\frac{X_{i2}^{c}}{r_{22}^{c}(u_0)}\Big)\Big(\frac{U_i-u_0}{h\mu_2}\Big)K\Big(\frac{U_i-u_0}{h}\Big)\\&=\Big(\frac{U_i-u_0}{h\mu_2}\Big)K_{2,0}^{*}\Big(\frac{U_i-u_0}{h};X_{i2}^{c}\Big).\end{aligned}\tag{30}$$
(c)

When $E(X_2|u_0)=0$, the results in (a) and (b) hold without centering the $X_{i2}$, and $X_{i2}^{c}$ in (30) can be replaced by $X_{i2}$.

Corollary 3.1 shows that with the $X_{i2}^{c}$, the equivalent kernels $K_{2,\nu}^{*}(\cdot\,;\cdot)$, $\nu=0,1$, corresponding to $\hat\beta_{2,0}(u_0)$ and $\hat\beta_{2,1}(u_0)$ respectively involve the factor $X_{i2}^{c}/r_{22}^{c}(u_0)$. In addition, when $X_2$ and $U$ are independent, $r_{22}^{c}$ is a constant free of $u_0$, and further standardising $X_{i2}$ in (28) by $X_{i2}^{\mathrm{std}}=X_{i2}^{c}/\sqrt{r_{22}^{c}}$, $i=1,\ldots,n$, leads to $K_{2,\nu}^{*}((U_i-u_0)/h;X_{i2}^{\mathrm{std}})=X_{i2}^{\mathrm{std}}\,K_{1,\nu}^{*}((U_i-u_0)/h)$, $\nu=0,1$.

Let us explore Corollary 3.1 further by also centering the $Y_i$, denoted $Y_i^{c}$. With the centered observations $\{(Y_i^{c},U_i,X_{i2}^{c}),\,i=1,\ldots,n\}$ and based on (30),
$$\hat\beta_{2,0}(u_0)=\Bigg[\frac{n^{-1}\sum_{i=1}^{n}X_{i2}^{c}Y_i^{c}K_h(U_i-u_0)}{r_{22}^{c}(u_0)f_U(u_0)}\Bigg](1+o_p(1)).\tag{31}$$
The denominator of (31) can be viewed as a local variance of $X_2$ at $u_0$, while the numerator can be interpreted as the locally weighted sample covariance between the $X_{i2}$ and the $Y_i$, with weights assigned by $K_h(U_i-u_0)$ around $u_0$, denoted $\widehat{\mathrm{Cov}}_{u_0}(X_2,Y)_K$. Hence $\hat\beta_{2,0}(u_0)$ may be interpreted as $\widehat{\mathrm{Cov}}_{u_0}(X_2,Y)_K/[\widehat{\mathrm{Var}}(X_2|U=u_0)f_U(u_0)]$. This enhances the interpretation of $\hat\beta_{2,0}(u_0)$ as an estimator of $a_2(u_0)$ and presents a local analogue of the slope in simple linear regression. From (30), $\hat\beta_{2,1}(u_0)$, which estimates $a_2'(\cdot)$, has the analogous interpretation $\widehat{\mathrm{Cov}}_{u_0}(X_2,Y)_{K_1^{*}}/[\widehat{\mathrm{Var}}(X_2|U=u_0)f_U(u_0)]$, with weights assigned by $K_1^{*}(\cdot)$. When $d=2$ and $p=1$, (28) can clearly be interpreted as a locally multiple linear model with interactions, since $E(Y_i^{c}|u_0)\approx\hat\beta_{1,0}(u_0)+\hat\beta_{1,1}(u_0)(U_i-u_0)+\hat\beta_{2,0}(u_0)X_{i2}^{c}+\hat\beta_{2,1}(u_0)X_{i2}^{c}(U_i-u_0)$.
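This covariance-over-variance reading of (31) can be made concrete. The sketch below (ours, reusing the hypothetical epanechnikov helper, with an arbitrary model and seed) compares the kernel-weighted covariance-to-variance ratio with the target $a_2(u_0)$; the agreement is only asymptotic, so the two quantities are close rather than equal:

```python
import numpy as np

rng = np.random.default_rng(5)
n, h, u0 = 2000, 0.1, 0.5
U = rng.uniform(0, 1, n)
X2 = rng.uniform(-(1 + U) / 2, (1 + U) / 2)          # E(X2 | U) = 0, as in Example 4.1
y = np.sin(2 * np.pi * U) + (1 + U**2) * X2 + 0.1 * rng.normal(size=n)

w = epanechnikov((U - u0) / h) / h                   # K_h(U_i - u0)
Yc, X2c = y - y.mean(), X2 - X2.mean()
cov_local = np.sum(w * X2c * Yc) / n                 # numerator of (31): Cov-hat_{u0}(X2, Y)_K
var_local = np.sum(w * X2c**2) / n                   # ~ Var(X2 | U = u0) f_U(u0) = (1.5**2 / 12) * 1
print(cov_local / var_local, 1 + u0**2)              # both ~ a_2(0.5) = 1.25
```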

We now present a Corollary of Theorem 3.2 that extends the results in Corollary 3.1 to the case with general d and p.

Corollary 3.2

Under the conditions of Theorem 3.2, with centered covariates $\{X_{ik}^{c},\,i=1,\ldots,n,\ k=2,\ldots,d\}$:

(a)

the asymptotic equivalent kernel $K_{1,\nu}^{*}(\cdot)$ corresponding to estimating $a_1^{(\nu)}(\cdot)$ is identical to $K_\nu^{*}(\cdot)$ in (8), $\nu=0,\ldots,p$;

(b)

the asymptotic equivalent kernel $K_{g,\nu}^{*}(\cdot\,;\cdot)$, $g=2,\ldots,d$, $\nu=0,\ldots,p$, possesses the simpler form
$$K_{g,\nu}^{*}\Big(\frac{U_i-u_0}{h};(\mathbf{x}_{(i)}^{c})^{T}\Big)=e_{d,g}^{T}\,[M_{xx}(u_0)]^{-1}\,\mathbf{x}_{(i)}^{c}\,K_\nu^{*}\Big(\frac{U_i-u_0}{h}\Big).\tag{32}$$

(c)

Suppose that the $Y_i$ are also centered, as $Y_i^{c}$. Then for $\nu=0,\ldots,p$,
$$(\hat\beta_{2,\nu}(u_0),\ldots,\hat\beta_{d,\nu}(u_0))^{T}=n^{-1}h^{-\nu}\,[f_U(u_0)M_{xx}(u_0)]^{-1}\,\widehat{\mathrm{Cov}}_{u_0}(\mathbf{x},Y)_{K_\nu^{*}}\,(1+o_p(1)).\tag{33}$$

Corollary 3.2(a) shows that with centered covariates, the asymptotic equivalent kernels $K_{1,\nu}^{*}(\cdot)$, $\nu=0,\ldots,p$, corresponding to estimating $a_1^{(\nu)}(\cdot)$ for $d\ge 2$ are identical to $K_\nu^{*}(\cdot)$ in (8) for $d=1$. For $\nu=0$, the expression (33) shows that $(\hat\beta_{2,0}(u_0),\ldots,\hat\beta_{d,0}(u_0))$ is a local analogue of the slope estimators in multiple linear regression, since $M_{xx}(u_0)$ is approximately the conditional variance matrix of $\mathbf{x}$ given $U=u_0$. The derivative terms $(\hat\beta_{2,\nu}(u_0),\ldots,\hat\beta_{d,\nu}(u_0))$, $\nu=1,\ldots,p$, can be interpreted asymptotically as $[M_{xx}(u_0)]^{-1}\widehat{\mathrm{Cov}}_{u_0}(\mathbf{x},Y)_{K_\nu^{*}}$ through the equivalent kernels $K_\nu^{*}(\cdot)$ of $d=1$. We conjecture that these interpretations may be useful for developing methodology for Fréchet regression (Petersen and Müller 2019), where the responses are random objects in a metric space. Petersen and Müller propose adopting Euclidean local linear weights to fit Fréchet local linear regression; in a similar spirit, the equivalent kernels in Theorem 3.2 and Corollary 3.2 may be utilised to develop Fréchet varying coefficient models.

4. Examples

In Example 4.1, we demonstrate the weighting schemes of the equivalent kernels in (13) and Corollary 3.1 when $d=2$ and $p=1$. Example 4.2 illustrates the reproducing property in Proposition 3.1 when $d=2$ and $p=2$. For illustration, the Epanechnikov kernel is used with a pre-specified bandwidth $h=0.1$. The issues of bandwidth selection and the choice of local polynomial order are beyond the scope of this work; the reader may consult the related discussion in Fan and Zhang (2008) and Park et al. (2015).

Example 4.1

Let $(X_2\,|\,U=u)\sim\mathrm{Uniform}\big(-(1+u)/2,\,(1+u)/2\big)$, where $U$ is $\mathrm{Uniform}(0,1)$. For this example, $r_{12}(u)=E(X_2|U=u)=0$, $r_{22}(u)=(1+u)^2/12$, and $X_2$ and $U$ are not independent. Since the equivalent kernels (8) for estimating $a_1^{(\nu)}(\cdot)$ are known in the literature, we illustrate the equivalent kernels for estimating $a_2^{(\nu)}(\cdot)$, $\nu=0,1$. A random sample $\{(U_i,X_{i2}),\,i=1,\ldots,n\}$ of size $n=100$ was drawn, and at the fixed point $u_0=0.5$ its neighbourhood $(0.4,0.6)$ contains 18 data points, $i=39,\ldots,56$.
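The quantities compared in Figure 1 and discussed in the bullet points below can be reproduced along the following lines (our own reconstruction with an arbitrary seed, not the paper's sample; epanechnikov is the hypothetical helper from the Section 2 sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
n, h, u0 = 100, 0.1, 0.5
U = np.sort(rng.uniform(0, 1, n))
X2 = rng.uniform(-(1 + U) / 2, (1 + U) / 2)

# Finite-sample weights W_0^2 of beta_hat_{2,0}: row (g,nu) = (2,0) of Sn^{-1} X' W
t = U - u0
X = np.column_stack([np.ones(n), X2])
design = np.einsum('ig,ij->igj', X, t[:, None] ** np.arange(2)).reshape(n, 4)
w = epanechnikov(t / h) / h
W20 = np.linalg.solve(design.T @ (w[:, None] * design), (design * w[:, None]).T)[2]

r22 = (1 + u0) ** 2 / 12                             # r_22(u0) for this design
K20 = (X2 / r22) * epanechnikov(t / h)               # K*_{2,0} of (30); E(X2|U) = 0, no centering needed
asym = X2 * K20 / np.sum(X2 * K20)                   # normalised asymptotic weights (dashed lines)
exact = X2 * W20                                     # finite-sample weights (solid lines); sum to 1 by (13)
print(np.sum(exact), np.sum(asym))                   # 1.0, 1.0
```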

  • For $\hat\beta_{2,0}$ estimating $a_2(\cdot)$: Figure 1(a) shows the finite-sample weights (solid line) $\{X_{i2}W_0^{2}((U_i-0.5)/h;X_{i2}),\,i=39,\ldots,56\}$, whose sum equals 1 ((13) with $g=2$, $\nu=0$, and $q=0$), reproducing the first derivative with respect to $x_2$. The asymptotic weights $\{X_{i2}K_{2,0}^{*}((U_i-0.5)/h;X_{i2})/(nh),\,i=39,\ldots,56\}$, with $K_{2,0}^{*}(\cdot)$ from (30), are normalised to sum to 1 and shown as the dashed line in Figure 1(a).

  • Figure 1(b) shows $\{X_{i2}(U_i-0.5)W_0^{2}((U_i-0.5)/h;X_{i2}),\,i=39,\ldots,56\}$ (solid line), whose sum equals 0 ((13) with $g=2$, $\nu=0$, and $q=1$), so that $\sum_{i=39}^{56}X_{i2}U_iW_0^{2}((U_i-0.5)/h;X_{i2})=u_0=0.5$, reproducing the first derivative of $ux_2$ with respect to $x_2$ at $u_0=0.5$. The corresponding normalised asymptotic weights $X_{i2}(U_i-0.5)K_{2,0}^{*}((U_i-0.5)/h;X_{i2})/(nh)$ are shown as the dashed line.

  • For $\hat\beta_{2,1}$ estimating $a_2'(\cdot)$: Figure 1(c) shows the finite-sample weights (solid line) $\{X_{i2}W_1^{2}((U_i-0.5)/h;X_{i2}),\,i=39,\ldots,56\}$, whose sum is 0, i.e. linear terms in the $X_{i2}$ are shrunk to 0. The normalised asymptotic weights $X_{i2}K_{2,1}^{*}((U_i-0.5)/h;X_{i2})/(nh^2)$, with $K_{2,1}^{*}(\cdot)$ from (30), are shown as the dashed line.

  • Figure 1(d) shows $\{X_{i2}(U_i-0.5)W_1^{2}((U_i-0.5)/h;X_{i2}),\,i=39,\ldots,56\}$, whose sum is 1, i.e. $\sum_{i=39}^{56}X_{i2}U_iW_1^{2}((U_i-0.5)/h;X_{i2})=1$, reproducing the mixed first partial derivative with respect to $x_2$ and $u$, together with the normalised asymptotic weights $X_{i2}(U_i-0.5)K_{2,1}^{*}((U_i-0.5)/h;X_{i2})/(nh^2)$ (dashed line).

  • In contrast to univariable local linear regression, where the weights are typically concentrated around the target point of estimation, the weights in Figure 1(a,c) are influenced by the covariates $X_{i2}$ and the conditional variance function $r_{22}^{c}(\cdot)$ (Corollary 3.1), and may not be concentrated around the target point.

Figure 1. Example 4.1 of Section 4: comparison between the exact weight function (13) (solid lines) and its normalised asymptotic form (30) (dashed lines), for $\hat\beta_{2,0}$ with (a) $q=0$ and (b) $q=1$, and for $\hat\beta_{2,1}$ with (c) $q=0$ and (d) $q=1$.


Example 4.2

We take the $X_{i2}$ and $U_i$ as in Example 4.1, and set $P(U_i)=U_i^2-1.5U_i$ to illustrate the reproducing property in Proposition 3.1 when $d=2$ and $p=2$. Again let $u_0=0.5$ and $h=0.1$. Figure 2(a) plots the points $(U_i,P(U_i))$ as +'s, $i=39,\ldots,56$; the weights $W_0^{1}((U_i-u_0)/h;X_{i2})$ for $g=1$ and $\nu=0$, reproducing $P(0.5)=-0.5$ (a solid black-square point), are plotted as circles connected by lines. A few of the weights are negative, since the weights are not necessarily nonnegative when $p=2$ and $d=2$. Figure 2(b,c) are similar to Figure 2(a) except that $\nu=1$ and 2, respectively; i.e. the + points $(U_i,P^{(\nu)}(U_i))$ and the weights $W_\nu^{1}((U_i-u_0)/h;X_{i2})$ reproducing $P^{(\nu)}(0.5)$ ($P'(0.5)=-0.5$ and $P''(0.5)=2$, solid black-square points) are shown. The variation of the weights increases with $\nu$, as shown by the scale of the $y$-axis.

Figure 2. Example 4.2 of Section 4. (a)–(c) for $g=1$ and $\nu=0,1,2$, respectively: the + points are $(U_i,P^{(\nu)}(U_i))$, $i=39,\ldots,56$, and the weights $W_\nu^{1}((U_i-u_0)/h;X_{i2})$ reproducing $P^{(\nu)}(0.5)$ (a solid black-square point) are plotted as circles connected by lines. (d)–(f) for $g=2$ and $\nu=0,1,2$, respectively: the + points $(U_i,\frac{\partial^\nu}{\partial u^\nu}Q(U_i,X_{i2}))$ and the weights $W_\nu^{2}((U_i-u_0)/h;X_{i2})$ reproducing $\frac{\partial}{\partial x_2}\big(\frac{\partial^\nu}{\partial u^\nu}Q(u,x_2)|_{u=0.5}\big)$ (a solid black-square point) are shown.


An analogous illustration is given in Figure 2(d–f) for $g=2$ and $Q(U_i,X_{i2})=X_{i2}P(U_i)$. The + points $(U_i,\frac{\partial^\nu}{\partial u^\nu}Q(U_i,X_{i2}))$ and the weights $W_\nu^{2}((U_i-u_0)/h;X_{i2})$ reproducing $\frac{\partial}{\partial x_2}\big(\frac{\partial^\nu}{\partial u^\nu}Q(u,x_2)|_{u=0.5}\big)=P^{(\nu)}(0.5)$ (a solid black-square point) are shown. Again, the variation of the weights increases with $\nu$, as shown by the scale of the $y$-axis. The difference in the weights between $g=1$ and $g=2$ is visually obvious when comparing Figure 2(a–c) with Figure 2(d–f).
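The reproducing sums of this example are exact and can be checked in code. A sketch (ours, with an arbitrary seed rather than the paper's sample, reusing the hypothetical epanechnikov helper) evaluates $\nu!\sum_i P(U_i)W_\nu^{1}$ for $\nu=0,1,2$:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(7)
n, p, h, u0 = 100, 2, 0.1, 0.5
U = np.sort(rng.uniform(0, 1, n))
X2 = rng.uniform(-(1 + U) / 2, (1 + U) / 2)
P = lambda u: u**2 - 1.5 * u                         # P(U_i) = U_i^2 - 1.5 U_i

t = U - u0
X = np.column_stack([np.ones(n), X2])
design = np.einsum('ig,ij->igj', X, t[:, None] ** np.arange(p + 1)).reshape(n, 2 * (p + 1))
w = epanechnikov(t / h) / h
Wall = np.linalg.solve(design.T @ (w[:, None] * design), (design * w[:, None]).T)

targets = [P(u0), 2 * u0 - 1.5, 2.0]                 # P(0.5) = -0.5, P'(0.5) = -0.5, P''(0.5) = 2
for nu in range(p + 1):                              # rows 0..2 of Wall are the W_nu^1 weights
    print(factorial(nu) * np.sum(P(U) * Wall[nu]), targets[nu])   # exact match, Proposition 3.1
```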

Acknowledgments

We thank the Editor, an Associate Editor, and two referees for constructive suggestions and insightful comments.

Disclosure statement

No potential conflict of interest was reported by the author(s).


Funding

C-Y Wu and L-H Huang were partially supported by the Ministry of Science and Technology, Taiwan, under Grants 107-2118-M-007-002-MY2, 105-2118-M-007-006-MY2 and 103-2118-M-007-001-MY2.

References

  • Chiang, C.-T., Rice, J.A., and Wu, C.O. (2001), ‘Smoothing Spline Estimation for Varying Coefficient Models with Repeatedly Measured Dependent Variables’, Journal of the American Statistical Association, 96, 605–619.
  • Fan, J., and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, London: Chapman & Hall.
  • Fan, J., and Zhang, W. (1999), ‘Statistical Estimation in Varying Coefficient Models’, Annals of Statistics, 27, 1491–1518.
  • Fan, J., and Zhang, W. (2008), ‘Statistical Methods with Varying Coefficient Models’, Statistics and its Interface, 1, 179–195.
  • Gasser, T., Müller, H.-G., and Mammitzsch, V. (1985), ‘Kernels for Nonparametric Curve Estimation’, Journal of the Royal Statistical Society: Series B Statistical Methodology, 47, 238–252.
  • Hastie, T.J., and Tibshirani, R.J. (1993), ‘Varying-Coefficient Models’, Journal of the Royal Statistical Society: Series B Statistical Methodology, 55, 757–796.
  • Park, B.U., Mammen, E., Lee, Y.K., and Lee, E.R. (2015), ‘Varying Coefficient Regression Models: a Review and New Developments’, International Statistical Review, 83, 36–64.
  • Petersen, A., and Müller, H.-G. (2019), ‘Fréchet Regression for Random Objects with Euclidean Predictors’, Annals of Statistics, 47, 691–719.
  • Ruppert, D., Wand, M.P., and Carroll, R.J. (2003), Semiparametric Regression, London: Cambridge University Press.
  • Tsybakov, A.B. (2009), Introduction to Nonparametric Estimation, New York: Springer-Verlag.
  • Zhang, W., and Lee, S.Y. (2000), ‘Variable Bandwidth Selection in Varying-Coefficient Models’, Journal of Multivariate Analysis, 74, 116–134.

Appendix

Conditions A

The following assumptions are taken from Zhang and Lee (2000).

(A1)

$EX_j^{2s}<\infty$ for some $s>2$, $j=2,\ldots,d$.

(A2)

Let $a_g^{(l)}$ denote the $l$th derivative of $a_g(\cdot)$; $a_g^{(p+1)}(\cdot)$ is continuous in a neighbourhood of $u_0$ for $g=1,\ldots,d$. Further, assume $a_g^{(p+1)}(u_0)\neq 0$ for $g=1,\ldots,d$.

(A3)

The functions $r_{jk}(\cdot)$, $j,k=1,\ldots,d$, and $\sigma^2(\cdot)$ have bounded second derivatives in a neighbourhood of $u_0$.

(A4)

The marginal density $f_U(u)$ of $U$ has a continuous second derivative in some neighbourhood of $u_0$, and $f_U(u_0)\neq 0$.

(A5)

The kernel function $K(\cdot)$ is a symmetric density function with compact support.

Proof

Proof of (22) and Proposition 3.1

From (21), the LHS of the first equation in (22) is
$$\sum_{i=1}^{n}(U_i-u_0)^q\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\zeta_{g,\nu+1}^{T}S_n^{-1}\sum_{i=1}^{n}(U_i-u_0)^q\big(\tilde{\mathbf{x}}_{(i)}\otimes u_{i0}\big)K_h(U_i-u_0)=\zeta_{g,\nu+1}^{T}S_n^{-1}S_n\zeta_{1,q+1}=\delta_{g,1}\delta_{\nu,q}.$$
Analogously, the LHS of the second equation in (22) is
$$\sum_{i=1}^{n}X_{ik}(U_i-u_0)^q\,W_\nu^{g}\Big(\frac{U_i-u_0}{h};\mathbf{x}_{(i)}^{T}\Big)=\zeta_{g,\nu+1}^{T}S_n^{-1}\sum_{i=1}^{n}X_{ik}(U_i-u_0)^q\big(\tilde{\mathbf{x}}_{(i)}\otimes u_{i0}\big)K_h(U_i-u_0)=\zeta_{g,\nu+1}^{T}S_n^{-1}S_n\zeta_{k,q+1}=\delta_{g,k}\delta_{\nu,q}.$$
Hence (22) is obtained. We now show the results of Proposition 3.1. For (23), since $P(\cdot)$ is a polynomial of degree $p$,
$$P(U_i)=P(u_0)+P'(u_0)(U_i-u_0)+\cdots+\frac{P^{(p)}(u_0)}{p!}(U_i-u_0)^p.$$
Plugging this expansion into the LHS of (23), the RHS of (23) follows from the first equation in (22). Since $Q(\cdot,x_k)=x_kP(\cdot)$, analogous arguments yield (24).

Proof

Proof of Theorem 3.2

From (14), the matrix $S_n^{-1}$ for general $d$ and $p$ has the asymptotic form (Zhang and Lee 2000, equation (5.2))
$$S_n^{-1}=\frac{1}{nf_U(u_0)}\big([\Omega_d(u_0)]^{-1}\otimes H_p^{-1}S_p^{-1}H_p^{-1}\big)\Big(1+O_p\big(\{\log n/(nh)\}^{1/2}\big)\Big).$$
Then, based on (3),
$$\hat\beta(u_0)=\sum_{i=1}^{n}\frac{1}{nhf_U(u_0)}\big([\Omega_d(u_0)]^{-1}\otimes H_p^{-1}S_p^{-1}H_p^{-1}\big)\big(\tilde{\mathbf{x}}_{(i)}\otimes u_{i0}\big)K\Big(\frac{U_i-u_0}{h}\Big)Y_i\,(1+o_p(1)).$$
By properties of the Kronecker product, the equivalent kernel corresponding to $\hat\beta_{g,\nu}(u_0)=\zeta_{g,\nu+1}^{T}\hat\beta(u_0)$, with $\zeta_{g,\nu+1}=e_{d,g}\otimes e_{p+1,\nu+1}$, is derived as follows:
$$\begin{aligned}\hat\beta_{g,\nu}(u_0)&=\sum_{i=1}^{n}\frac{1}{nh^{1+\nu}f_U(u_0)}\big(e_{d,g}^{T}[\Omega_d(u_0)]^{-1}\tilde{\mathbf{x}}_{(i)}\big)\Big(e_{p+1,\nu+1}^{T}S_p^{-1}H_p^{-1}u_{i0}\,K\Big(\frac{U_i-u_0}{h}\Big)\Big)Y_i\,(1+o_p(1))\\&=\sum_{i=1}^{n}\frac{1}{nh^{1+\nu}f_U(u_0)}\Big(e_{d,g}^{T}[\Omega_d(u_0)]^{-1}\tilde{\mathbf{x}}_{(i)}\,K_\nu^{*}\Big(\frac{U_i-u_0}{h}\Big)\Big)Y_i\,(1+o_p(1)).\end{aligned}$$
To show the moment property (27), again by properties of the Kronecker product, for $k=1,\ldots,d$,
$$\begin{aligned}&\int\Big(\int\zeta_{g,\nu+1}^{T}(\Omega_d(u_0)\otimes S_p)^{-1}\Big(\begin{bmatrix}x_k\\ \vdots\\ x_dx_k\end{bmatrix}\otimes\begin{bmatrix}t^q\\ \vdots\\ t^{p+q}\end{bmatrix}\Big)K(t)\,f(x_2,\ldots,x_d|u_0)\,dx_2\cdots dx_d\Big)dt\\&\quad=\zeta_{g,\nu+1}^{T}(\Omega_d(u_0)\otimes S_p)^{-1}\int\Big(\begin{bmatrix}r_{1k}(u_0)\\ \vdots\\ r_{dk}(u_0)\end{bmatrix}\otimes\begin{bmatrix}t^q\\ \vdots\\ t^{p+q}\end{bmatrix}\Big)K(t)\,dt\\&\quad=\zeta_{g,\nu+1}^{T}(\Omega_d(u_0)\otimes S_p)^{-1}\big[(\Omega_d(u_0)e_{d,k})\otimes(S_pe_{p+1,q+1})\big]\\&\quad=\zeta_{g,\nu+1}^{T}(\Omega_d(u_0)\otimes S_p)^{-1}(\Omega_d(u_0)\otimes S_p)\,\zeta_{k,q+1}=\delta_{g,k}\delta_{\nu,q}.\end{aligned}$$