Partially Linear Additive Regression with a General Hilbertian Response

ABSTRACT

In this article we develop semiparametric regression techniques for fitting partially linear additive models. The methods are for a general Hilbert-space-valued response. They use a powerful technique of additive regression in profiling out the additive nonparametric components of the models, which necessarily involves additive regression of the nonadditive effects of covariates. We show that the estimators of the parametric components are $\sqrt{n}$ -consistent and asymptotically Gaussian under weak conditions. We also prove that the estimators of the nonparametric components, which are random elements taking values in a space of Hilbert-space-valued maps, achieve the univariate rate of convergence regardless of the dimension of covariates. We present some numerical evidence for the success of the proposed method and discuss real data applications. Supplementary materials for this article are available online.

1 Introduction

We study semiparametric regression techniques for analyzing a finite- or infinite-dimensional response variable. The response variable takes values in a general separable Hilbert space. The model consists of a parametric and a nonparametric part. The parametric part is linear in a covariate (predictor) vector, say $X = {(X_{1}, \dots, X_{p})}^{⊤}$ for $p \geq 1$, and the nonparametric part models the effect of another covariate vector, say $Z = {(Z_{1}, \dots, Z_{d})}^{⊤}$ for $d \geq 1$. To avoid the curse of dimensionality in estimating the nonparametric part, we assume that the nonparametric effect is additive in $Z$, that is, it is the sum of the unknown nonparametric effects of the individual covariates $Z_{k}$. We consider two scenarios for $X$. In one, both $X_{j}$ and $Z_{k}$ are real-valued; in the other, $X_{j}$ take values in the Hilbert space where the response variable takes values while $Z_{k}$ are real-valued. The new techniques are important extensions of the partially linear additive regression for scalar responses coupled with scalar covariates studied by Yu, Mammen, and Park (2011). In this article, we develop techniques for estimating the models and provide theory that supports the methodology.

Our framework of Hilbert-space-valued (henceforth, Hilbertian) responses can be specialized to various data types. Its coverage is broad enough to include random variables, random vectors, random functions, random densities, compositional random vectors, infinite sequence random vectors, and compositional random functions. All these types are commonly encountered in today’s data environments. The present work also provides a base toward semiparametric regression for other response spaces, such as Riemannian manifolds and Lie groups, along the way paved by Lin, Müller, and Park (2022). The latter work pioneered a link connecting additive regression for manifold-valued responses to Hilbertian additive regression (Jeon and Park 2020) via the Riemannian logarithmic map. Despite its importance, semiparametric regression with Hilbertian responses remains unexplored. To the best of our knowledge, this is the first attempt to study a semiparametric regression model for general Hilbertian responses.

We prove that our estimators of the parametric effects of $X_{j}$ are $\sqrt{n}$-consistent when $X_{j}$ are real-valued. We also derive the joint asymptotic distribution of the parametric estimators. For Hilbertian $X_{j}$, we show that the corresponding parametric estimators are still $\sqrt{n}$-consistent if the Hilbert space where $X_{j}$ take values is finite-dimensional, while they achieve a slightly slower rate if the Hilbert space is infinite-dimensional. Furthermore, we show that our methodology of estimating the nonparametric effect of the covariate $Z$ is free from the curse of dimensionality, that is, it affords a univariate rate of convergence regardless of the dimension of $Z$. This is in contrast with the partially linear modeling approach without additivity structure in the nonparametric part, which suffers from the dimensionality problem. Our approach improves the estimation not only of the nonparametric part but also of the parametric part. Indeed, we demonstrate, theoretically and numerically, that using additivity structure in the nonparametric part leads to an efficiency gain in the estimation of the parametric part. It turns out that the gain is larger if $E (X_{j} | Z = \cdot)$ are farther from being additive.

The present work is not a direct extension of Yu, Mammen, and Park (2011). Dealing with a general Hilbert space, instead of the conventional $R$, as the space of the values of the response variable requires a number of innovations in developing the relevant methodology and theory. In case $X_{j}$ takes values in $R$, the theory requires assessing the stochastic magnitude of terms of the form $n^{- 1} \sum_{i = 1}^{n} U_{i} \cdot \hat{g} (Z_{i j})$ for some $U_{i}$, which is a known function of $(X_{i}, Z_{i})$, and for some stochastic (random) map $\hat{g}$, which is a random element taking values in the space of Hilbertian maps $g : S \to H$ for a Hilbert space $H$ and a compact subset $S$ of $R$. The usual way of analyzing such a $\hat{g}$-weighted average is to consider a large set $G$ to which $\hat{g}$ belongs with high probability, and then derive the maximal stochastic size of $n^{- 1} \sum_{i = 1}^{n} U_{i} \cdot g (Z_{i j})$ over $g \in G$ using an entropy bound for the set $G$. The set $G$ embodies Hilbertian maps $g : S \to H$. In case $H$ is finite-dimensional, as with Euclidean spaces, the entropy of $G$ is finite although the dimension of $G$ may be infinite, particularly when $G$ is a class of nonparametric maps $g$. In case $H$ is infinite-dimensional, however, the entropy of $G$ is infinite so that it is not feasible to apply the empirical process theory directly to the $\hat{g}$-weighted average. To resolve this difficulty we take the Hilbertian norm of the $\hat{g}$-weighted average, and convert its maximal stochastic size to that of a $\tilde{g}$-weighted average where $\tilde{g}$ is a stochastic map from $S$ to $R$. The conversion with the associated calculation of the sizes of various stochastic terms is one of the challenges we tackle in this article.
For Hilbertian X_j taking values in $H$ , however, a similar idea leads to dealing with $n^{- 1} \sum_{i = 1}^{n} \hat{η} (V_{i}, Z_{i j})$ , where $V_{i}$ is an $H$ -valued function of $(X_{i}, Z_{i})$ and $\hat{η}$ is a stochastic map from $H \times S$ to $R$ . It turns out that the class embodying the latter stochastic map with a high probability does not necessarily have a finite entropy. Thus, the case of Hilbertian X_j needs a different treatment, which is another difficulty we resolve in this work.

Our proposals are related to Jeon and Park (2020), which developed a smooth backfitting technique for additive regression with Hilbertian responses. The latter, however, is for additive models without parametric component X, and for additive regression of additive effect, that is, for estimating the “assumed” additive effect of Z. It does not cover cases where the additivity model assumption is violated in additive regression. In contrast, our work is for models with linear effect of X in addition to the additive nonparametric effect of the covariate vector Z. Dealing with the additional linear effect necessarily involves additive regression of nonadditive effects. In this respect, our theory for additive regression is more general than the one in Jeon and Park (2020). Moreover, our modeling approach allows for discrete type covariates (ordinal or nominal), which arise in a variety of statistical problems.

In the case of a functional response, say $Y (\cdot) : T \to R$ for a domain $T \subset R^{q}$, which is a special case of a Hilbertian response, one might think of applying a technique for scalar responses, such as the one in Yu, Mammen, and Park (2011), in a pointwise manner to $Y (t)$ for each $t \in T$ and combining the pointwise results $\hat{Y} (t)$ to construct $\hat{Y} (\cdot)$. However, this naive approach is problematic since the resulting $\hat{Y} (\cdot)$ is not guaranteed to take values in the space where $Y (\cdot)$ comes from. This is particularly the case when $Y (\cdot)$ has constraints, for example, $Y (\cdot) \geq 0$ and $\int_{T} Y (t) d t = 1$, like a random density. Similarly, for a compositional response $Y \equiv (Y_{1}, \dots, Y_{k})$ with $0 < Y_{j} < 1$ and $\sum_{j = 1}^{k} Y_{j} \equiv 1$, component-wise regression with $Y_{j}$ for each j does not give simplex-valued $\hat{Y}$ as a whole. Our approach does not have these drawbacks when specialized to functional and compositional data since it applies directly to the observations of functional $Y (\cdot)$ and of compositional vector Y, respectively, as data objects.

A few past works on semiparametric regression for real-valued responses include the estimation of partially linear models without additivity structure in the nonparametric part (Bhattacharya and Zhao 1997; Liang 2006), and of partially linear additive models (Opsomer and Ruppert 1999; Liang et al. 2008; Yu, Mammen, and Park 2011; Lee, Han, and Park 2018). Among the latter four studying partially linear additive models, Opsomer and Ruppert (1999) and Liang et al. (2008) employed the ordinary backfitting technique (Buja, Hastie, and Tibshirani 1989) to estimate the additive nonparametric part, while Yu, Mammen, and Park (2011) and Lee, Han, and Park (2018) used the smooth backfitting technique (Mammen, Linton, and Nielsen 1999). The smooth backfitting method has been proved to be successful in various structured nonparametric models under weak conditions (Yu, Park, and Mammen 2008; Linton, Sperlich, and van Keilegom 2008; Lee, Mammen, and Park 2010, 2012; Zhang, Park, and Wang 2013; Bissantz et al. 2016; Han and Park 2018; Han, Müller, and Park 2020). In contrast, the ordinary backfitting is known to work under stronger conditions on covariates (Opsomer and Ruppert 1997).

2 Additive Regression

Our approach to the semiparametric regression requires the estimation of “best” approximations of $E (W | Z = \cdot)$ for W being a general Hilbertian random element, or a real-valued random variable in the respective spaces of “additive” maps. This section is devoted to characterizing the best approximations and their estimation. We assume that Z is supported on ${[0, 1]}^{d}$ .

2.1 Projection on Additive Function Space

We consider a general separable Hilbert space, denoted by $H$. Euclidean spaces, L² spaces, Bayes-Hilbert spaces and simplices are special cases of $H$. Here and below, we denote $H$-valued maps, random elements taking values in $H$ and their values in $H$ by bold-faced symbols. Note that we also use bold-faced symbols to denote random vectors and their realizations. Let $0$ be the zero vector in $H$, $〈 \cdot, \cdot 〉$ the inner product of $H$ and $| | \cdot | |$ the associated norm. We denote by $\oplus$ and $⊙$ the operations of vector addition and of scalar multiplication in $H$, respectively. Let $⊖$ denote the subtraction operation defined by $a ⊖ b = a \oplus (- 1) ⊙ b$.
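As one concrete instance of the abstract operations $\oplus$, $⊙$ and $⊖$, the sketch below uses the Aitchison geometry on the open simplex, a commonly used Hilbert-space structure for compositional data. The choice of this geometry and the helper names (`closure`, `oplus`, `odot`, `ominus`, `clr`, `inner`, `norm`) are assumptions made for illustration; the text above only states that simplices are special cases of $H$.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so that its entries sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def oplus(a, b):
    """Vector addition on the simplex (entrywise product, then closure)."""
    return closure(np.asarray(a, dtype=float) * np.asarray(b, dtype=float))

def odot(c, a):
    """Scalar multiplication on the simplex (entrywise power, then closure)."""
    return closure(np.asarray(a, dtype=float) ** c)

def ominus(a, b):
    """Subtraction, defined as a (+) ((-1) (.) b) exactly as in the text."""
    return oplus(a, odot(-1.0, b))

def clr(a):
    """Centered log-ratio transform, which turns the simplex operations into
    ordinary vector operations on a subspace of R^k."""
    log_a = np.log(np.asarray(a, dtype=float))
    return log_a - log_a.mean()

def inner(a, b):
    """Inner product induced by the clr transform."""
    return float(clr(a) @ clr(b))

def norm(a):
    return float(np.sqrt(inner(a, a)))

a = closure([0.2, 0.3, 0.5])
b = closure([0.6, 0.1, 0.3])
zero = closure(np.ones(3))      # zero vector of this space: the uniform composition
diff = ominus(a, b)             # still a composition: positive entries summing to one
```

In this geometry every result of $\oplus$, $⊙$ and $⊖$ stays in the simplex, which is the property that component-wise regression of compositional responses lacks.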

We let $H_{add} (H)$ denote the space of additive maps $g : {[0, 1]}^{d} \to H$ such that $E (| | g (Z) | |^{2}) < \infty$ and $g (z) = g_{1} (z_{1}) \oplus \dots \oplus g_{d} (z_{d})$ for some $g_{j} : [0, 1] \to H$. For a random element W taking values in $H$ with $E (| | W | |^{2}) < \infty$, let $m_{W, +} = m_{W, 1} \oplus \dots \oplus m_{W, d}$ denote the projection of the mean regression map $m_{W} := E (W | Z = \cdot)$ onto $H_{add} (H)$. That is, $m_{W, +} = \arg\min_{g_{+} \in H_{add} (H)} \int_{{[0, 1]}^{d}} {‖ m_{W} (z) ⊖ g_{+} (z) ‖}^{2} f (z) d z,$ (2.1) where f is the joint density of Z. For the definition of the conditional expectation for Hilbertian W, we refer to Bosq (2000). The minimizer $m_{W, +}$ with $m_{W, +} (z) = m_{W, 1} (z_{1}) \oplus \dots \oplus m_{W, d} (z_{d})$ satisfies $\begin{aligned} 0 & = \int_{{[0, 1]}^{d - 1}} \frac{f (z)}{f_{j} (z_{j})} ⊙ (m_{W} (z) ⊖ m_{W, 1} (z_{1}) ⊖ \dots ⊖ m_{W, d} (z_{d})) d z_{- j} \\ & = E (W | Z_{j} = z_{j}) ⊖ \underset{k \neq j}{\oplus} \int_{0}^{1} \frac{f_{j k} (z_{j}, z_{k})}{f_{j} (z_{j})} ⊙ m_{W, k} (z_{k}) d z_{k} ⊖ m_{W, j} (z_{j}) \end{aligned}$ (2.2) for each $1 \leq j \leq d$ and for all $z_{j}$ with $f_{j} (z_{j}) > 0$. Here and below, $z_{- j}$ for a d-vector z is $(z_{1}, \dots, z_{j - 1}, z_{j + 1}, \dots, z_{d})$, and $f_{j}$ and $f_{j k}$ are the densities of $Z_{j}$ and $(Z_{j}, Z_{k})$, respectively. The integrals in (2.2) are special cases of the so-called “Bochner integral” (Jeon and Park 2020), the latter being for Banach-space-valued maps. From (2.2), we see that the additive map $m_{W, +}$ is characterized as the solution of the following system of Hilbertian integral equations: $m_{W, j} (z_{j}) = E (W | Z_{j} = z_{j}) ⊖ \underset{k \neq j}{\oplus} \int_{0}^{1} \frac{f_{j k} (z_{j}, z_{k})}{f_{j} (z_{j})} ⊙ m_{W, k} (z_{k}) d z_{k}, \quad 1 \leq j \leq d .$ (2.3)

We note that (2.3) does not define a component tuple $(m_{W, j} : 1 \leq j \leq d)$, but only the sum of its entries, $m_{W, +} = m_{W, 1} \oplus \dots \oplus m_{W, d}$. Obviously, if $(m_{W, j} : 1 \leq j \leq d)$ satisfies (2.3), then $(m_{W, 1} ⊖ c, m_{W, 2} \oplus c, m_{W, 3}, \dots, m_{W, d})$ also satisfies (2.3) for any constant $c \in H$. We also note that $m_{W, +} \in H_{add} (H)$ is different from the regression map $m_{W}$ unless $m_{W}$ belongs to $H_{add} (H)$. Thus, the problem of estimating $m_{W, +}$, which we present in Section 2.2, is different from that of estimating $m_{W}$ under the assumption $m_{W} \in H_{add} (H)$, the latter having been studied by Jeon and Park (2020).

2.2 Estimation of Additive Projection

We now discuss the estimation of $m_{W, +}$ . The basic idea is to replace the marginal regression maps $E (W | Z_{j} = \cdot)$ and the densities f_j and f_jk in (2.3) by the corresponding kernel estimators and then to solve the resulting system of equations. We note that our method requires the estimation of $m_{X_{j}, +}$ for real-valued X_j and of $m_{X_{j}, +}$ for Hilbertian $X_{j}$ , as well. Below, we describe the method for W taking values in a general Hilbert space, since specialization to the case of random variables is immediate from the general treatment.

Suppose that we have n observations, $(Z_{i}, W_{i}), 1 \leq i \leq n$, with $Z_{i} = {(Z_{i 1}, \dots, Z_{i d})}^{⊤}$. We estimate the marginal regression map $M_{W, j} (z_{j}) := E (W | Z_{j} = z_{j})$ for $z_{j} \in [0, 1]$ by ${\hat{M}}_{W, j} (z_{j}) = {\hat{f}}_{j} {(z_{j})}^{- 1} \cdot n^{- 1} ⊙ \oplus_{i = 1}^{n} (K_{h_{j}} (z_{j}, Z_{i j}) ⊙ W_{i}),$ (2.4) where ${\hat{f}}_{j} (z_{j}) = n^{- 1} \sum_{i = 1}^{n} K_{h_{j}} (z_{j}, Z_{i j})$ is a kernel estimator of $f_{j}$. We also estimate $f_{j k}$ by ${\hat{f}}_{j k} (z_{j}, z_{k}) = n^{- 1} \sum_{i = 1}^{n} K_{h_{j}} (z_{j}, Z_{i j}) K_{h_{k}} (z_{k}, Z_{i k})$. Here and throughout the article, $K_{h} (\cdot, \cdot)$ is a normalized kernel function defined by $K_{h} (u, v) = \frac{K_{h} (u - v)}{\int_{0}^{1} K_{h} (t - v) d t}, \quad u, v \in [0, 1],$ where h > 0 is the bandwidth, $K_{h} (w) = K (w / h) / h$ and $K (\cdot) \geq 0$ is the baseline kernel function. Normalized kernels of this type have been used in the literature (Yu, Park, and Mammen 2008; Lee, Mammen, and Park 2010, 2012; Jeon and Park 2020). The normalized kernel has the property that $\int_{0}^{1} K_{h} (u, v) d u = 1$ for all $v \in [0, 1]$, which gives $\int_{0}^{1} {\hat{f}}_{j} (z_{j}) d z_{j} = 1$ and $\int_{0}^{1} {\hat{f}}_{j k} (z_{j}, z_{k}) d z_{k} = {\hat{f}}_{j} (z_{j})$ for all $z_{j} \in [0, 1]$. Plugging these estimators into (2.3) gives the following estimated system of Hilbertian integral equations: ${\hat{m}}_{W, j} (z_{j}) = {\hat{M}}_{W, j} (z_{j}) ⊖ \underset{k \neq j}{\oplus} \int_{0}^{1} \frac{{\hat{f}}_{j k} (z_{j}, z_{k})}{{\hat{f}}_{j} (z_{j})} ⊙ {\hat{m}}_{W, k} (z_{k}) d z_{k}, \quad 1 \leq j \leq d .$ (2.5)
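The normalized kernel and the estimators ${\hat{f}}_{j}$, ${\hat{f}}_{j k}$ and ${\hat{M}}_{W, j}$ can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes $H = R^{2}$, an Epanechnikov baseline kernel, a common bandwidth, simulated data, and trapezoidal quadrature on a grid in place of exact integrals. All function and variable names are hypothetical.

```python
import numpy as np

def baseline_kernel(w):
    """Epanechnikov kernel, an illustrative choice of K >= 0."""
    return 0.75 * np.clip(1.0 - w**2, 0.0, None)

def quad_weights(grid):
    """Trapezoidal weights: quad_weights(grid) @ values approximates an integral."""
    w = np.zeros_like(grid)
    dx = np.diff(grid)
    w[:-1] += dx / 2.0
    w[1:] += dx / 2.0
    return w

def normalized_kernel(grid, obs, h):
    """Matrix of K_h(u, v) for u on the grid (rows) and v among the observations
    (columns). The normalizer int_0^1 K_h(t - v) dt uses the same quadrature."""
    raw = baseline_kernel((grid[:, None] - obs[None, :]) / h) / h
    return raw / (quad_weights(grid) @ raw)

rng = np.random.default_rng(0)
n, h = 200, 0.15
grid = np.linspace(0.0, 1.0, 201)
wq = quad_weights(grid)
Z = rng.uniform(size=(n, 2))                                      # Z_i in [0, 1]^2
W = np.column_stack([np.sin(2 * np.pi * Z[:, 0]), Z[:, 1] ** 2])  # W_i in R^2

K1 = normalized_kernel(grid, Z[:, 0], h)      # shape (grid points, n)
K2 = normalized_kernel(grid, Z[:, 1], h)
f1 = K1.mean(axis=1)                          # density estimate for Z_1 on the grid
f12 = K1 @ K2.T / n                           # joint density estimate on grid x grid
M1 = (K1 @ W) / n / f1[:, None]               # marginal regression estimate (2.4)
```

Because the same quadrature rule is used for the normalizer and for later integrals, the identities $\int_{0}^{1} {\hat{f}}_{j} = 1$ and $\int_{0}^{1} {\hat{f}}_{j k} (z_{j}, z_{k}) d z_{k} = {\hat{f}}_{j} (z_{j})$ hold exactly at the discrete level in this sketch.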

In the supplement S.7, we will show that the system of equations (2.5) has a unique solution, which we denote by ${\hat{m}}_{W, +}$. Just as the system of equations at (2.3) determines only $m_{W, +} = m_{W, 1} \oplus \dots \oplus m_{W, d}$, the system (2.5) by itself defines only ${\hat{m}}_{W, +} = {\hat{m}}_{W, 1} \oplus \dots \oplus {\hat{m}}_{W, d}$, not the individual components ${\hat{m}}_{W, j}$. The solution ${\hat{m}}_{W, +}$ does not have a closed form, but can be computed by iteration starting with an initial tuple $({\hat{m}}_{W, j}^{[0]} : 1 \leq j \leq d)$. We may simply take ${\hat{m}}_{W, j}^{[0]} \equiv 0$, for example. In the rth ($r \geq 1$) cycle of the iteration, we update ${\hat{m}}_{W, j}^{[r - 1]}$ by $\begin{aligned} {\hat{m}}_{W, j}^{[r]} (z_{j}) = {} & {\hat{M}}_{W, j} (z_{j}) ⊖ \oplus_{k = 1}^{j - 1} \int_{0}^{1} \frac{{\hat{f}}_{j k} (z_{j}, z_{k})}{{\hat{f}}_{j} (z_{j})} ⊙ {\hat{m}}_{W, k}^{[r]} (z_{k}) d z_{k} \\ & ⊖ \oplus_{k = j + 1}^{d} \int_{0}^{1} \frac{{\hat{f}}_{j k} (z_{j}, z_{k})}{{\hat{f}}_{j} (z_{j})} ⊙ {\hat{m}}_{W, k}^{[r - 1]} (z_{k}) d z_{k}, \quad 1 \leq j \leq d . \end{aligned}$ (2.6)

In the supplement S.7, we will show that ${\hat{m}}_{W, +}^{[r]} : = {\hat{m}}_{W, 1}^{[r]} \oplus \dots \oplus {\hat{m}}_{W, d}^{[r]}$ converges to ${\hat{m}}_{W, +}$ as $r \to \infty$ , see Proposition S.1. In the iteration, we can evaluate the Bochner integrals in (2.6) by the conventional Lebesgue integrals, provided that ${\hat{m}}_{W, j}^{[0]}$ are linear in $W_{i}$ , that is, ${\hat{m}}_{W, j}^{[0]} (z_{j}) = n^{- 1} ⊙ \oplus_{i = 1}^{n} (w_{i j} (z_{j}) ⊙ W_{i})$ for some real-valued functions $w_{i j} : [0, 1] \to R$ . This is because the marginal ${\hat{M}}_{W, j}$ are also linear in $W_{i}$ so that all the subsequent updates ${\hat{m}}_{W, j}^{[r]}$ for $r \geq 1$ are linear in $W_{i}$ . For a linear form $g (u) = n^{- 1} ⊙ \oplus_{i = 1}^{n} v_{i} (u) ⊙ W_{i}$ with real-valued weight functions $v_{i} : [0, 1] \to R$ , it holds that $\int_{0}^{1} {\hat{f}}_{j k} (z_{j}, u) ⊙ g (u) d u = n^{- 1} ⊙ \oplus_{i = 1}^{n} (\int_{0}^{1} v_{i} (u) \cdot {\hat{f}}_{j k} (z_{j}, u) d u) ⊙ W_{i},$ where the integral on the left side is in Bochner sense while the one on the right hand side is a Lebesgue integral.
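The iteration (2.6) can be sketched on a grid as below. This is an illustrative discretization under stated assumptions, not the authors' code: $H = R^{q}$, a common bandwidth, an Epanechnikov-shaped kernel, trapezoidal quadrature for the integrals, a fixed number of cycles instead of a stopping rule, and simulated data whose regression map is deliberately nonadditive. The names `smooth_backfit`, `quad_weights` and `normalized_kernel` are hypothetical.

```python
import numpy as np

def quad_weights(grid):
    """Trapezoidal weights: quad_weights(grid) @ values approximates an integral."""
    w = np.zeros_like(grid)
    dx = np.diff(grid)
    w[:-1] += dx / 2.0
    w[1:] += dx / 2.0
    return w

def normalized_kernel(grid, obs, h):
    """K_h(u, v) on grid x observations; constant factors of the baseline
    kernel cancel in the normalization, so they are omitted."""
    raw = np.clip(1.0 - ((grid[:, None] - obs[None, :]) / h) ** 2, 0.0, None)
    return raw / (quad_weights(grid) @ raw)

def smooth_backfit(Z, W, grid, h, n_cycles=50):
    """Run n_cycles cycles of (2.6), starting from the zero tuple.
    Returns the components on the grid, the marginal estimates M and the
    discretized integral operators T."""
    n, d = Z.shape
    wq = quad_weights(grid)
    K = [normalized_kernel(grid, Z[:, j], h) for j in range(d)]
    f = [Kj.mean(axis=1) for Kj in K]
    M = [(K[j] @ W) / n / f[j][:, None] for j in range(d)]
    # T[j, k] @ g approximates int fhat_jk(z_j, u) / fhat_j(z_j) * g(u) du
    T = {(j, k): (K[j] @ K[k].T / n) * wq / f[j][:, None]
         for j in range(d) for k in range(d) if j != k}
    m = np.zeros((d, grid.size, W.shape[1]))
    for _ in range(n_cycles):
        for j in range(d):
            # components k < j are already updated in this cycle,
            # components k > j still hold the values of the previous cycle
            m[j] = M[j] - sum(T[j, k] @ m[k] for k in range(d) if k != j)
    return m, M, T

rng = np.random.default_rng(1)
n = 300
Z = rng.uniform(size=(n, 2))
signal = (Z[:, 0] * Z[:, 1])[:, None] * np.array([1.0, -1.0])   # not additive in Z
W = signal + 0.1 * rng.normal(size=(n, 2))
grid = np.linspace(0.0, 1.0, 101)
m, M, T = smooth_backfit(Z, W, grid, h=0.2)
m_plus = m[0][:, None, :] + m[1][None, :, :]    # additive fit on the grid x grid
```

Since every update is a linear function of the $W_{i}$, the output is linear in the responses, which mirrors the remark above that Bochner integrals reduce to Lebesgue integrals of real-valued weights.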

The regression map $m_{W}$ is not assumed to be additive and thus estimating the additive map $m_{W, +}$ is considered additive regression of nonadditive effect, which is different from additive regression of additive effect, developed by Mammen, Linton, and Nielsen (1999) and Jeon and Park (2020). The very core of the matter in the present case regarding additive regression is that, when we write $W = m_{W, +} (Z) \oplus ϱ_{W}$ with $ϱ_{W} := W ⊖ m_{W, +} (Z)$ as an error term, $E (ϱ_{W} | Z) = m_{W} (Z) ⊖ m_{W, +} (Z) \neq 0$. Instead, by the definition of $m_{W, +}$, we have $E (〈 ϱ_{W}, δ_{+} (Z) 〉) = 0$ for all $δ_{+} \in H_{add} (H)$. This implies that, for all $1 \leq k \leq d$ and for all $δ_{k} : [0, 1] \to H$ with $E (| | δ_{k} (Z_{k}) | |^{2}) < \infty$, $0 = E (〈 ϱ_{W}, δ_{k} (Z_{k}) 〉) = E (〈 E (ϱ_{W} | Z_{k}), δ_{k} (Z_{k}) 〉)$. This gives $E (ϱ_{W} | Z_{k}) = 0, \quad 1 \leq k \leq d .$ (2.7)

We show that the latter is enough for the theory of ${\hat{m}}_{W, +}$ , see the supplement S.7 for details.

3 Hilbertian PLAM with Scalar Covariates

We introduce a partially linear additive model for an $H$-valued response Y and covariate vectors $X \equiv {(X_{1}, \dots, X_{p})}^{⊤}$ and $Z \equiv {(Z_{1}, \dots, Z_{d})}^{⊤}$ taking values in $R^{p}$ and $R^{d}$, respectively, and present a way of estimating the model. We allow the entries of X to be discrete (nominal or ordinal) while assuming that all entries of Z are of continuous type supported on $[0, 1]$. We assume that $E (X_{j}^{2}) < \infty$ for all $1 \leq j \leq p$.

3.1 The Model

The Hilbertian partially linear additive model (HPLAM) is given by $Y = β_{0} \oplus \oplus_{j = 1}^{p} (X_{j} ⊙ β_{j}) \oplus \oplus_{k = 1}^{d} m_{k} (Z_{k}) \oplus ε,$ (3.1)

where $β_{j}$ are unknown constants in $H$, $m_{k} : [0, 1] \to H$ are unknown component maps with $E (| | m_{k} (Z_{k}) | |^{2}) < \infty$ satisfying the constraints $E (m_{k} (Z_{k})) = 0, \quad 1 \leq k \leq d,$ (3.2) and $ε$ is a Hilbertian error term such that $E (ε | X, Z) = 0$ and $E (| | ε | |^{2}) < \infty$. Below, we give a proposition that establishes the identifiability of $β_{j}$ and $m_{k}$ in the model (3.1). Let $P_{U}$ denote the distribution of a random vector or element U. We make the following weak assumption.

(A0) The joint distribution $P_{X, Z}$ has a density with respect to the product measure $P_{X} \otimes P_{Z}$, which is bounded away from zero on the support of $P_{X, Z}$. The marginal distribution $P_{Z}$ has a density f with respect to the Lebesgue measure such that $f (z) > 0$ for all $z \in {[0, 1]}^{d}$, and the support of $P_{X}$ is not contained in a hyperplane in $R^{p}$.

Proposition 3.1.

Let $α_{j} \in H$ for $0 \leq j \leq p$ be Hilbertian constants and $g_{k} : [0, 1] \to H$ for $1 \leq k \leq d$ be Hilbertian maps with $E (| | g_{k} (Z_{k}) | |^{2}) < \infty$ and $E (g_{k} (Z_{k})) = 0$. Assume the condition (A0). Then, $E ({‖ α_{0} \oplus \oplus_{j = 1}^{p} (X_{j} ⊙ α_{j}) \oplus \oplus_{k = 1}^{d} g_{k} (Z_{k}) ‖}^{2}) = 0$ (3.3) implies $α_{j} = 0$ and $g_{k} (Z_{k}) = 0$ a.s. for all $0 \leq j \leq p$ and $1 \leq k \leq d$.

A proof of Proposition 3.1 is given in the supplement S.1. With (3.2), we have $β_{0} = E (Y) ⊖ \oplus_{j = 1}^{p} E (X_{j}) ⊙ β_{j}$ , so that by plugging the expression into (3.1) we may rewrite the model (3.1) as $Y^{c} = \oplus_{j = 1}^{p} X_{j}^{c} ⊙ β_{j} \oplus \oplus_{k = 1}^{d} m_{k} (Z_{k}) \oplus ε,$ where $Y^{c} = Y ⊖ E (Y)$ and $X_{j}^{c} = X_{j} - E (X_{j})$ .

3.2 Estimation of Parametric Components

Here, we discuss the estimation of $β_{j}$ using the profiling technique (Severini and Wong 1992) in conjunction with the estimation of additive projections presented in Section 2.2. Let $m_{+} = m_{1} \oplus \dots \oplus m_{d} \in H_{add} (H)$ and put $β = {(β_{1}, \dots, β_{p})}^{⊤}$. Note that $β$ does not involve $β_{0}$. Then, under the model specification (3.1), $(β, m_{+}) = \underset{b \in H^{p}, g_{+} \in H_{add} (H)}{\arg\min} E (| | Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ b) ⊖ g_{+} (Z) | |^{2}),$ (3.4) where $X^{c} = {(X_{1}^{c}, \dots, X_{p}^{c})}^{⊤}$. Here and below, $c^{⊤} ⊙ b = \oplus_{j = 1}^{p} (c_{j} ⊙ b_{j})$ for $c = {(c_{1}, \dots, c_{p})}^{⊤} \in R^{p}$ and $b = {(b_{1}, \dots, b_{p})}^{⊤} \in H^{p}$. Considering $m_{Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ b), +}$ for each $b \in H^{p}$, which solves the system of equations (2.3) with $W = Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ b)$, and noting that $m_{Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ β), +} = m_{+}$, we get $β = \underset{b \in H^{p}}{\arg\min} E (| | Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ b) ⊖ m_{Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ b), +} (Z) | |^{2}) .$

According to the arguments leading to Corollary S.1 in the supplement S.7, $m_{W, +}$ is linear in W in the sense that $m_{(W \oplus W'), +} = m_{W, +} \oplus m_{W', +}$ and $m_{W \oplus c, +} = m_{W, +} \oplus c$ for all Hilbertian variables $W, W'$ and Hilbertian constant c. Also, it holds that $m_{(W_{1} ⊙ b_{1}) \oplus (W_{2} ⊙ b_{2}), +} = (m_{W_{1}, +} ⊙ b_{1}) \oplus (m_{W_{2}, +} ⊙ b_{2})$ for all Hilbertian constants $b_{j} \in H$ and random variables $W_{j}$. Thus, $m_{Y^{c} ⊖ ({(X^{c})}^{⊤} ⊙ b), +} = m_{Y^{c}, +} ⊖ (m_{X^{c}, +}^{⊤} ⊙ b)$. Furthermore, we also get $\begin{aligned} Y^{c} ⊖ m_{Y^{c}, +} (Z) & = (Y ⊖ E (Y)) ⊖ (m_{Y, +} (Z) ⊖ E (Y)) \\ & = Y ⊖ m_{Y, +} (Z), \end{aligned}$ and likewise $X^{c} - m_{X^{c}, +} (Z) = X - m_{X, +} (Z)$. This gives $β = \underset{b \in H^{p}}{\arg\min} E ({‖ (Y ⊖ m_{Y, +} (Z)) ⊖ {(X - m_{X, +} (Z))}^{⊤} ⊙ b ‖}^{2}) .$ (3.5)

We estimate $β$ by minimizing an empirical version of the objective functional at (3.5). Define ${\tilde{X}}_{i j} = X_{i j} - {\hat{m}}_{X_{j}, +} (Z_{i}), \quad {\tilde{Y}}_{i} = Y_{i} ⊖ {\hat{m}}_{Y, +} (Z_{i}),$ where ${\hat{m}}_{X_{j}, +}$ for $1 \leq j \leq p$ and ${\hat{m}}_{Y, +}$, respectively, are the solutions of (2.5) for $W = X_{j}$ and $W = Y$. Put ${\hat{m}}_{X, +} = {({\hat{m}}_{X_{1}, +}, \dots, {\hat{m}}_{X_{p}, +})}^{⊤}$. Then, ${\tilde{X}}_{i} := {({\tilde{X}}_{i 1}, \dots, {\tilde{X}}_{i p})}^{⊤} = X_{i} - {\hat{m}}_{X, +} (Z_{i})$. We propose to estimate $β$ by $\hat{β} = \underset{b \in H^{p}}{\arg\min} n^{- 1} \sum_{i = 1}^{n} {‖ {\tilde{Y}}_{i} ⊖ {\tilde{X}}_{i}^{⊤} ⊙ b ‖}^{2} .$ (3.6)

We present an explicit form of $\hat{β}$ defined at (3.6). The Gâteaux derivative of the objective functional in (3.6) at $\hat{β}$ in an arbitrary direction $δ \in H^{p}$ equals $\begin{aligned} & - 2 n^{- 1} \sum_{i = 1}^{n} 〈 {\tilde{X}}_{i}^{⊤} ⊙ δ, {\tilde{Y}}_{i} ⊖ {\tilde{X}}_{i}^{⊤} ⊙ \hat{β} 〉 \\ & \quad = - 2 n^{- 1} \sum_{j = 1}^{p} \sum_{i = 1}^{n} 〈 δ_{j}, {\tilde{X}}_{i j} ⊙ ({\tilde{Y}}_{i} ⊖ {\tilde{X}}_{i}^{⊤} ⊙ \hat{β}) 〉 \\ & \quad = - 2 n^{- 1} \sum_{j = 1}^{p} 〈 δ_{j}, \oplus_{i = 1}^{n} {\tilde{X}}_{i j} ⊙ ({\tilde{Y}}_{i} ⊖ {\tilde{X}}_{i}^{⊤} ⊙ \hat{β}) 〉 . \end{aligned}$

Equating the above derivative with zero for all $δ \in H^{p}$ gives $\oplus_{i = 1}^{n} {\tilde{X}}_{i j} ⊙ ({\tilde{Y}}_{i} ⊖ {\tilde{X}}_{i}^{⊤} ⊙ \hat{β}) = 0, 1 \leq j \leq p .$

From this we get that, whenever the p × p matrix $\sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤}$ is invertible, $\hat{β} = {(\sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤})}^{- 1} ⊙ (\oplus_{i = 1}^{n} {\tilde{X}}_{i} ⊙ {\tilde{Y}}_{i}),$ (3.7) where $c ⊙ v_{0} = {(c_{1} ⊙ v_{0}, \dots, c_{p} ⊙ v_{0})}^{⊤} \in H^{p}$ for $v_{0} \in H$ and $c = {(c_{1}, \dots, c_{p})}^{⊤} \in R^{p}$, and $A ⊙ b = {(\oplus_{j = 1}^{p} a_{1 j} ⊙ b_{j}, \dots, \oplus_{j = 1}^{p} a_{p j} ⊙ b_{j})}^{⊤}$ for a p × p real matrix $A = (a_{i j})$ and $b = {(b_{1}, \dots, b_{p})}^{⊤} \in H^{p}$. With $\hat{β} = {({\hat{β}}_{1}, \dots, {\hat{β}}_{p})}^{⊤}$ defined at (3.7) we estimate $β_{0}$ in the model (3.1) by ${\hat{β}}_{0} = \bar{Y} ⊖ \oplus_{j = 1}^{p} {\bar{X}}_{j} ⊙ {\hat{β}}_{j},$ where ${\bar{X}}_{j} = n^{- 1} \sum_{i = 1}^{n} X_{i j}$ and $\bar{Y} = n^{- 1} ⊙ (\oplus_{i = 1}^{n} Y_{i})$. In Section 7.1, we will show that $n^{- 1} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤}$ is invertible with probability tending to one and that $\hat{β}$ is $\sqrt{n}$-consistent and $\sqrt{n} ⊙ (\hat{β} ⊖ β)$ converges to a mean zero Gaussian random element taking values in $H$.
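When $H = R^{q}$, the closed form (3.7) reduces to an ordinary matrix computation. The sketch below is a hedged illustration of that special case only: the arrays `X_tilde` and `Y_tilde` are simulated stand-ins for the profiled quantities ${\tilde{X}}_{i}$ and ${\tilde{Y}}_{i}$ (the additive projections are not computed here), and all names and data-generating choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 500, 3, 4

# Stand-ins for the profiled covariates and responses. In the actual procedure
# these would be X_i minus its estimated additive projection and Y_i minus its
# estimated additive projection, both evaluated at Z_i.
X_tilde = rng.normal(size=(n, p))
beta_true = rng.normal(size=(p, q))          # row j plays the role of beta_j in R^q
Y_tilde = X_tilde @ beta_true + 0.1 * rng.normal(size=(n, q))

gram = X_tilde.T @ X_tilde                   # p x p matrix: sum_i X~_i X~_i^T
cross = X_tilde.T @ Y_tilde                  # row j: sum_i X~_ij (.) Y~_i
beta_hat = np.linalg.solve(gram, cross)      # closed form (3.7) for H = R^q

# First-order condition: sum_i X~_ij (.) (Y~_i - X~_i^T (.) beta_hat) = 0 for all j.
score = X_tilde.T @ (Y_tilde - X_tilde @ beta_hat)
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a numerical design choice of this sketch; it yields the same estimator whenever the Gram matrix is invertible.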

3.3 Estimation of Nonparametric Components

We estimate $m_{+} = \oplus_{k = 1}^{d} m_{k}$ in the model (3.1) by ${\hat{m}}_{+}$, which solves the system of equations at (2.5) with $W_{i} = Y_{i} ⊖ {\hat{β}}_{0} ⊖ (X_{i}^{⊤} ⊙ \hat{β})$. From the linearity of ${\hat{m}}_{W, +}$ in $(W_{1}, \dots, W_{n})$ (Corollary S.1 in the supplement S.7), it holds that ${\hat{m}}_{+} (z) = {\hat{m}}_{Y, +} (z) ⊖ {\hat{β}}_{0} ⊖ ({\hat{m}}_{X, +} {(z)}^{⊤} ⊙ \hat{β})$, where ${\hat{m}}_{X, +} (z) = {({\hat{m}}_{X_{1}, +} (z), \dots, {\hat{m}}_{X_{p}, +} (z))}^{⊤}$. To estimate the individual component maps $m_{k}$ satisfying the constraints $E (m_{k} (Z_{k})) = 0$, we put the following constraints on ${\hat{m}}_{k}$: $\int_{0}^{1} {\hat{f}}_{k} (u) ⊙ {\hat{m}}_{k} (u) d u = 0, \quad 1 \leq k \leq d,$ (3.8) where the integrals are in Bochner sense. Then, the constraints (3.8) identify the individual ${\hat{m}}_{k}$ uniquely such that ${\hat{m}}_{+} = {\hat{m}}_{1} \oplus \dots \oplus {\hat{m}}_{d}$.

To see how the constraints identify a unique set of ${\hat{m}}_{k}$, let $({\tilde{m}}_{1}, \dots, {\tilde{m}}_{d})$ be an arbitrary tuple such that ${\hat{m}}_{+} = {\tilde{m}}_{1} \oplus \dots \oplus {\tilde{m}}_{d}$. Choose ${\hat{m}}_{k} = {\tilde{m}}_{k} ⊖ \int_{0}^{1} {\hat{f}}_{k} (z_{k}) ⊙ {\tilde{m}}_{k} (z_{k}) d z_{k}$. Obviously, each ${\hat{m}}_{k}$ satisfies (3.8). It also holds that $\oplus_{k = 1}^{d} \int_{0}^{1} {\hat{f}}_{k} (z_{k}) ⊙ {\tilde{m}}_{k} (z_{k}) d z_{k} = 0,$ (3.9) which gives ${\hat{m}}_{+} = {\hat{m}}_{1} \oplus \dots \oplus {\hat{m}}_{d}$. To see (3.9), we note that, since ${\hat{m}}_{+} = {\tilde{m}}_{1} \oplus \dots \oplus {\tilde{m}}_{d}$, the tuple $({\tilde{m}}_{1}, \dots, {\tilde{m}}_{d})$ satisfies $\begin{aligned} {\hat{f}}_{k} (z_{k}) ⊙ {\tilde{m}}_{k} (z_{k}) = {} & {\hat{f}}_{k} (z_{k}) ⊙ {\hat{M}}_{Y ⊖ {\hat{β}}_{0} ⊖ (X^{⊤} ⊙ \hat{β}), k} (z_{k}) \\ & ⊖ \underset{l \neq k}{\oplus} \int_{0}^{1} {\hat{f}}_{k l} (z_{k}, z_{l}) ⊙ {\tilde{m}}_{l} (z_{l}) d z_{l} \end{aligned}$ for all k. Integrating both sides over $z_{k} \in [0, 1]$ then gives, for each $1 \leq k \leq d$, $\begin{aligned} \oplus_{l = 1}^{d} \int_{0}^{1} {\hat{f}}_{l} (z_{l}) ⊙ {\tilde{m}}_{l} (z_{l}) d z_{l} & = \int_{0}^{1} {\hat{f}}_{k} (z_{k}) ⊙ {\hat{M}}_{Y ⊖ {\hat{β}}_{0} ⊖ (X^{⊤} ⊙ \hat{β}), k} (z_{k}) d z_{k} \\ & = n^{- 1} ⊙ \oplus_{i = 1}^{n} (Y_{i} ⊖ {\hat{β}}_{0} ⊖ X_{i}^{⊤} ⊙ \hat{β}) \\ & = 0 . \end{aligned}$

The first equality in the above equations follows from $\int_{0}^{1} {\hat{f}}_{k l} (z_{k}, z_{l}) d z_{k} = {\hat{f}}_{l} (z_{l})$ , and the second from $\int_{0}^{1} K_{h_{k}} (z_{k}, Z_{i k}) d z_{k} \equiv 1$ in view of (2.4).
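The centering step behind (3.8) can be illustrated on a grid. The sketch below is an assumption-laden toy, not the authors' code: $H = R^{2}$, a made-up density and made-up component maps stand in for the estimates, and integrals are replaced by trapezoidal quadrature. It shows that centering enforces (3.8) and that moving a constant between two components does not change the centered tuple.

```python
import numpy as np

def quad_weights(grid):
    """Trapezoidal weights: quad_weights(grid) @ values approximates an integral."""
    w = np.zeros_like(grid)
    dx = np.diff(grid)
    w[:-1] += dx / 2.0
    w[1:] += dx / 2.0
    return w

def center(m_tilde, f_hat, wq):
    """Subtract int f_hat(u) (.) m_tilde(u) du from m_tilde (grid x q array)."""
    return m_tilde - (wq * f_hat) @ m_tilde

grid = np.linspace(0.0, 1.0, 201)
wq = quad_weights(grid)

# Hypothetical marginal density estimates, normalized to integrate to one.
f1 = 1.0 + 0.5 * np.cos(2 * np.pi * grid)
f1 /= wq @ f1
f2 = 1.0 + 0.3 * np.sin(2 * np.pi * grid)
f2 /= wq @ f2

# An arbitrary decomposition of an additive map with values in R^2.
m1_tilde = np.column_stack([np.sin(2 * np.pi * grid) + 0.7, grid**2])
m2_tilde = np.column_stack([grid - 0.7, np.cos(np.pi * grid)])

# The same additive map after moving a constant c from one component to the other.
c = np.array([3.0, -1.5])
m1_shift, m2_shift = m1_tilde - c, m2_tilde + c

m1_hat, m2_hat = center(m1_tilde, f1, wq), center(m2_tilde, f2, wq)
m1_alt, m2_alt = center(m1_shift, f1, wq), center(m2_shift, f2, wq)
```

For an actual solution of (2.5), identity (3.9) additionally guarantees that the centered components still sum to ${\hat{m}}_{+}$; the made-up components here are not such a solution, so that property is not checked.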

3.4 Hilbertian PLM

We discuss briefly the estimation of the Hilbertian partially linear model (HPLM): $Y = β_{0} \oplus \oplus_{j = 1}^{p} (X_{j} ⊙ β_{j}) \oplus m (Z) \oplus ε$ , where m is allowed to be nonadditive, that is, m may not belong to $H_{add} (H)$ . It deserves being discussed in this article since there has been no study on this model for Hilbertian responses, to the best of our knowledge, and it helps to understand our numerical results to be presented in Section 5.

Let ${\hat{m}}_{Y}$ be a multivariate kernel estimator of $m_{Y} := E (Y | Z = \cdot)$ defined by ${\hat{m}}_{Y} (z) = {(\sum_{i = 1}^{n} \prod_{j = 1}^{d} K_{h_{j}} (z_{j}, Z_{i j}))}^{- 1} ⊙ \oplus_{i = 1}^{n} (\prod_{j = 1}^{d} K_{h_{j}} (z_{j}, Z_{i j})) ⊙ Y_{i} .$ (3.10)

Likewise, let ${\hat{m}}_{X}$ be a multivariate kernel estimator of $m_{X} : = E (X | Z = \cdot)$ , which is obtained by changing $Y_{i}$ in (3.10) to $X_{i}$ and the vector operations for $H$ to those for $R^{p}$ . Let $ϵ_{Y} = Y ⊖ m_{Y} (Z)$ , which should be differentiated from $ε = Y ⊖ E (Y | X, Z)$ . Put $ϵ_{X} = X - m_{X} (Z)$ . Also, let $ϵ_{Y, i} = Y_{i} ⊖ m_{Y} (Z_{i})$ and $ϵ_{X, i} = X_{i} - m_{X} (Z_{i})$ . Define ${\hat{ϵ}}_{Y, i} = Y_{i} ⊖ {\hat{m}}_{Y} (Z_{i}), {\hat{ϵ}}_{X, i} = X_{i} - {\hat{m}}_{X} (Z_{i}) .$

Applying the idea of profiling, explored in Section 3.2 for the HPLAM, now to the HPLM, we may estimate $β = {(β_{1}, \dots, β_{p})}^{⊤}$ by ${\hat{β}}^{PLM} = {(\sum_{i = 1}^{n} {\hat{ϵ}}_{X, i} {\hat{ϵ}}_{X, i}^{⊤})}^{- 1} ⊙ (\oplus_{i = 1}^{n} {\hat{ϵ}}_{X, i} ⊙ {\hat{ϵ}}_{Y, i}),$ (3.11) and $β_{0}$ by ${\hat{β}}_{0}^{PLM} = \bar{Y} ⊖ \oplus_{j = 1}^{p} {\bar{X}}_{j} ⊙ {\hat{β}}_{j}^{PLM}$ , where ${\hat{β}}_{j}^{PLM}$ is the jth entry of ${\hat{β}}^{PLM}$ . We note that ${\hat{β}}^{PLM}$ takes the same form as $\hat{β}$ at (3.7) with ${\hat{ϵ}}_{X, i}$ and ${\hat{ϵ}}_{Y, i}$ taking the roles of ${\tilde{X}}_{i}$ and ${\tilde{Y}}_{i}$ , respectively.
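The profiling steps behind (3.10) and (3.11) can be sketched numerically. The following is only an illustration under stated assumptions: the response space is taken to be $R^{m}$ with ordinary vector operations (so that ⊕, ⊖, and ⊙ become +, -, and scalar multiplication), a Gaussian product kernel stands in for the normalized kernel $K_{h_{j}}$ , and the function names `nw_weights` and `plm_profile` are hypothetical.

```python
import numpy as np

def nw_weights(Z, h):
    """Row-normalized product-kernel weights evaluated at the sample points.

    Z: (n, d) covariates, h: (d,) bandwidths. A Gaussian kernel is an
    assumption of this sketch, not the kernel used in the article.
    """
    diff = (Z[:, None, :] - Z[None, :, :]) / h      # (n, n, d)
    K = np.exp(-0.5 * diff**2).prod(axis=2)         # product over coordinates
    return K / K.sum(axis=1, keepdims=True)

def plm_profile(Y, X, Z, h):
    """Profile out m(Z), then solve least squares in the spirit of (3.11)."""
    W = nw_weights(Z, h)
    eX = X - W @ X                                  # X_i - m_X_hat(Z_i), (n, p)
    eY = Y - W @ Y                                  # Y_i - m_Y_hat(Z_i), (n, m)
    beta = np.linalg.solve(eX.T @ eX, eX.T @ eY)    # (p, m)
    beta0 = Y.mean(axis=0) - X.mean(axis=0) @ beta  # intercept in R^m
    return beta0, beta
```

Each row of the returned `beta` plays the role of one ${\hat{β}}_{j}^{PLM}$ , discretized on m coordinates. For a genuinely Hilbertian response, the vector operations would have to be replaced by those of the space at hand.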

Let $ϱ_{X} = X - m_{X, +} (Z)$ . Note that $ϱ_{X} = ϵ_{X}$ if all $E (X_{j} | Z = \cdot)$ are additive maps, that is, belong to $H_{add} (R)$ . According to the theory in Section 7.1, under the HPLAM model (3.1), the asymptotic variances of ${\hat{β}}_{j}^{PLM}$ are equal to or larger than those of the respective ${\hat{β}}_{j}$ ; see Equation (7.2): $\begin{matrix} {〈 (Σ^{PLM} - Σ) (a), a 〉}_{tp} & = E [(〈 ε, a_{1} 〉, \dots, 〈 ε, a_{p} 〉) \cdot ({(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} - {(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1}) \\ & \quad \cdot {(〈 ε, a_{1} 〉, \dots, 〈 ε, a_{p} 〉)}^{⊤}] . \end{matrix}$ (7.2) The gains of ${\hat{β}}_{j}$ over ${\hat{β}}_{j}^{PLM}$ get larger as the diagonal entries of ${[E (ϵ_{X} ϵ_{X}^{⊤})]}^{- 1}$ grow farther away from the respective diagonal entries of ${[E (ϱ_{X} ϱ_{X}^{⊤})]}^{- 1}$ .

4 Hilbertian PLAM with Hilbertian Covariates

In this section we consider the case where the covariates in the parametric part are also Hilbertian. We continue to use the notation $X \equiv {(X_{1}, \dots, X_{p})}^{⊤}$ for Hilbertian $X_{j}$ . Assume that $E (| | X_{j} | |^{2}) < \infty$ for all $1 \leq j \leq p$ .

4.1 The Model

Here, we assume that the Hilbert spaces for $X_{j}$ are the same as $H$ . In Section 4.3, we discuss the case where the spaces for $X_{j}$ are different from each other or from $H$ . We study the following Hilbertian partially linear additive model $Y = β_{0} \oplus \oplus_{j = 1}^{p} (β_{j} ⊙ X_{j}) \oplus \oplus_{k = 1}^{d} m_{k} (Z_{k}) \oplus ε,$ (4.1) where $β_{0}$ is an unknown constant in $H$ but β_j for $1 \leq j \leq p$ are now unknown constants in $R$ . The component maps $m_{k}$ and the error term $ε$ are as in the model (3.1) with the constraints (3.2). As in Section 3.1, we find $β_{0} = E (Y) ⊖ \oplus_{j = 1}^{p} (β_{j} ⊙ E (X_{j}))$ and thus may rewrite the model (4.1) as $Y^{c} = \oplus_{j = 1}^{p} (β_{j} ⊙ X_{j}^{c}) \oplus \oplus_{k = 1}^{d} m_{k} (Z_{k}) \oplus ε,$ (4.2) where $Y^{c} = Y ⊖ E (Y)$ and $X_{j}^{c} = X_{j} ⊖ E (X_{j})$ .

Under a weak assumption, the constraints (3.2) also entail the identifiability of the model (4.1). To state a proposition for the identifiability, we introduce new symbols for matrices and vectors of Hilbertian elements. Let $〈〈 a, b 〉〉$ for $a = {(a_{1}, \dots, a_{p})}^{⊤} \in H^{p}$ and $b = {(b_{1}, \dots, b_{p})}^{⊤} \in H^{p}$ denote the p × p matrix whose (j, k) element equals $〈 a_{j}, b_{k} 〉$ . Also, let $〈〈 a, c 〉$ and $〈 c, a 〉〉$ for $a \in H^{p}$ and $c \in H$ denote $p \times 1$ and $1 \times p$ vectors, respectively, whose jth entries are $〈 a_{j}, c 〉$ and $〈 c, a_{j} 〉$ . In this notation, $E (〈〈 X^{c}, X^{c} 〉〉)$ denotes the p × p matrix whose (j, k)th element equals $E (〈 X_{j}^{c}, X_{k}^{c} 〉)$ . We make the following assumption.

(B0) The joint distribution $P_{X, Z}$ has a density with respect to the product measure $P_{X} \otimes P_{Z}$ , which is bounded away from zero on the support of $P_{X, Z}$ . The marginal distribution $P_{Z}$ has a density f with respect to the Lebesgue measure such that $f (z) > 0$ for all $z \in {[0, 1]}^{d}$ , and the matrix $E (〈〈 X^{c}, X^{c} 〉〉)$ is positive-definite.

Proposition 4.1.

Let $α_{0} \in H$ and $α_{j} \in R$ for $1 \leq j \leq p$ be constants and $g_{k} : [0, 1] \to H$ for $1 \leq k \leq d$ satisfy $E (| | g_{k} (Z_{k}) | |^{2}) < \infty$ and $E (g_{k} (Z_{k})) = 0$ . Assume the condition (B0). Then, $E ({‖ α_{0} \oplus \oplus_{j = 1}^{p} (α_{j} ⊙ X_{j}) \oplus \oplus_{k = 1}^{d} g_{k} (Z_{k}) ‖}^{2}) = 0$ (4.3) implies $α_{0} = 0, α_{j} = 0$ and $g_{k} (Z_{k}) = 0$ a.s. for all $1 \leq j \leq p$ and $1 \leq k \leq d$ .

A proof of Proposition 4.1 can be found in the supplement S.2.

4.2 Estimation of the Model

Put $β = {(β_{1}, \dots, β_{p})}^{⊤} \in R^{p}$ . From (4.2), it holds that $(β, m_{+}) = \underset{b \in R^{p}, g_{+} \in H_{add} (H)}{argmin} E ({‖ Y^{c} ⊖ (\oplus_{j = 1}^{p} b_{j} ⊙ X_{j}^{c}) ⊖ g_{+} (Z) ‖}^{2}),$ where $b = {(b_{1}, \dots, b_{p})}^{⊤} \in R^{p}$ . Noting that $m_{+} = m_{U, +}$ with $U = Y^{c} ⊖ \oplus_{j = 1}^{p} (β_{j} ⊙ X_{j}^{c})$ , and using that $m_{W, +}$ is linear in W for any Hilbertian W, we get $β = \underset{b \in R^{p}}{argmin} E ({‖ (Y ⊖ m_{Y, +} (Z)) ⊖ (\oplus_{j = 1}^{p} b_{j} ⊙ (X_{j} ⊖ m_{X_{j}, +} (Z))) ‖}^{2}) .$

Let ${\tilde{Y}}_{i} = Y_{i} ⊖ {\hat{m}}_{Y, +} (Z_{i})$ and ${\tilde{X}}_{i j} = X_{i j} ⊖ {\hat{m}}_{X_{j}, +} (Z_{i})$ . We estimate $β$ by $\hat{β} = \underset{b \in R^{p}}{argmin} n^{- 1} \sum_{i = 1}^{n} {‖ {\tilde{Y}}_{i} ⊖ (\oplus_{j = 1}^{p} b_{j} ⊙ {\tilde{X}}_{i j}) ‖}^{2} .$ (4.4)

Put ${\tilde{X}}_{i} = {({\tilde{X}}_{i 1}, \dots, {\tilde{X}}_{i p})}^{⊤}$ and let $〈〈 {\tilde{X}}_{i}, {\tilde{X}}_{i} 〉〉$ denote the p × p matrix whose (j, k)th element is given by $〈 {\tilde{X}}_{i j}, {\tilde{X}}_{i k} 〉$ . Also, let $〈〈 {\tilde{X}}_{i}, {\tilde{Y}}_{i} 〉$ denote the p-vector whose jth element is given by $〈 {\tilde{X}}_{i j}, {\tilde{Y}}_{i} 〉$ . By taking the Gâteaux derivative of the objective functional in (4.4), we find that $\hat{β} = {(\sum_{i = 1}^{n} 〈〈 {\tilde{X}}_{i}, {\tilde{X}}_{i} 〉〉)}^{- 1} \cdot \sum_{i = 1}^{n} 〈〈 {\tilde{X}}_{i}, {\tilde{Y}}_{i} 〉,$ (4.5) provided that $\sum_{i = 1}^{n} 〈〈 {\tilde{X}}_{i}, {\tilde{X}}_{i} 〉〉$ is invertible. In Section 7.2, we show that $n^{- 1} \sum_{i = 1}^{n} 〈〈 {\tilde{X}}_{i}, {\tilde{X}}_{i} 〉〉$ is invertible with probability tending to one. With $\hat{β} = {({\hat{β}}_{1}, \dots, {\hat{β}}_{p})}^{⊤}$ defined at (4.5) we estimate $β_{0}$ in the model (4.1) by ${\hat{β}}_{0} = \bar{Y} ⊖ (\oplus_{j = 1}^{p} {\hat{β}}_{j} ⊙ {\bar{X}}_{j})$ , where ${\bar{X}}_{j} = n^{- 1} ⊙ \oplus_{i = 1}^{n} X_{i j}$ and $\bar{Y} = n^{- 1} ⊙ \oplus_{i = 1}^{n} Y_{i}$ .
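The closed form (4.5) reduces to a p × p linear system once the inner products are available. As a rough sketch, assume the elements of $H$ are functions recorded on a common grid of m points and that the inner product is approximated by a weighted sum with quadrature weights `w`; both the discretization and the name `hilbert_ls` are assumptions made for illustration only.

```python
import numpy as np

def hilbert_ls(X_t, Y_t, w):
    """Solve the normal equations behind (4.5) on a discretized Hilbert space.

    X_t: (n, p, m) residualized covariates, Y_t: (n, m) residualized
    responses, w: (m,) quadrature weights so that <a, b> ~ sum_t w_t a_t b_t.
    """
    # G[j, k] = sum_i <X_ij, X_ik>  and  c[j] = sum_i <X_ij, Y_i>
    G = np.einsum("ijt,ikt,t->jk", X_t, X_t, w)
    c = np.einsum("ijt,it,t->j", X_t, Y_t, w)
    return np.linalg.solve(G, c)   # fails if the Gram matrix G is singular
```

The call to `np.linalg.solve` mirrors the invertibility proviso stated after (4.5): it raises an error when the summed Gram matrix is singular.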

We also estimate $m_{+}$ by the solution ${\hat{m}}_{+}$ of the system of equations (2.5), ${\hat{m}}_{W, j} (z_{j}) = {\hat{M}}_{W, j} (z_{j}) ⊖ \underset{k \neq j}{\oplus} \int_{0}^{1} \frac{{\hat{f}}_{j k} (z_{j}, z_{k})}{{\hat{f}}_{j} (z_{j})} ⊙ {\hat{m}}_{W, k} (z_{k}) d z_{k}, \quad 1 \leq j \leq d,$ with $W_{i} = Y_{i} ⊖ {\hat{β}}_{0} ⊖ (\oplus_{j = 1}^{p} {\hat{β}}_{j} ⊙ X_{i j}) .$

By arguing as in Section 3.3, we may show that ${\hat{m}}_{+}$ is uniquely decomposed into ${\hat{m}}_{+} = \oplus_{j = 1}^{d} {\hat{m}}_{j}$ with the constraints at (3.8).

4.3 Discussion

One may wonder if the linear part of the model (4.1) covers the standard function-on-function linear regression model when $X_{j}$ and Y are infinite-dimensional functional variables. In fact, such a functional linear model reduces to the model (3.1), not to the one at (4.1). To see this, consider the case with p = 1 for simplicity, that is, in the linear part of the model, there is only one covariate, say X, taking values in $L^{2} (S)$ with $S \subset R$ . Let the response variable $Y \equiv Y (\cdot)$ take values in $L^{2} (T)$ for some $T \subset R$ . The standard function-on-function linear regression model (Ramsay and Silverman 2005; Yao, Müller, and Wang 2005; Benatia, Carrasco, and Florens 2017) with the covariate $X \equiv X (\cdot)$ is given by $L_{θ} : L^{2} (S) \to L^{2} (T)$ such that $L_{θ} (x) (v) = β_{0} (v) + \int_{S} θ (u, v) x (u) d u, \quad x \equiv x (\cdot) \in L^{2} (S),$ (4.6) where $β_{0} \in L^{2} (T)$ and $θ \in L^{2} (S \times T)$ are considered as parameters. Let $\{ϕ_{j} : j \geq 1\}$ and $\{ψ_{k} : k \geq 1\}$ be known bases of $L^{2} (S)$ and $L^{2} (T)$ , respectively. An approach in, for example, Ramsay and Silverman (2005) is to take, as the space for $θ (\cdot, \cdot)$ , a tensor product of finite-dimensional subspaces of $L^{2} (S)$ and $L^{2} (T)$ . In this approach θ is represented as $θ (u, v) = \sum_{j = 1}^{L_{1}} \sum_{k = 1}^{L_{2}} θ_{j k} ϕ_{j} (u) ψ_{k} (v), \quad θ_{j k} \in R,$ (4.7) for some $L_{1}, L_{2} \geq 1$ . In our discussion, we allow $L_{2} = \infty$ while we assume L₁ is finite and fixed. For the functional covariate $X (\cdot)$ , put $ξ_{j} = \int_{S} ϕ_{j} (u) X (u) d u, \quad β_{j} (v) = \sum_{k = 1}^{L_{2}} θ_{j k} ψ_{k} (v) .$

Then, plugging the representation at (4.7) into the model (4.6) we get $L_{θ} (X) (v) = β_{0} (v) + \sum_{j = 1}^{L_{1}} ξ_{j} \cdot β_{j} (v), \quad v \in T .$ (4.8)

Letting ξ_j and $β_{j} (\cdot) \in L^{2} (T)$ take the respective roles of X_j and $β_{j}$ in the model (3.1), the model (4.8) reduces to a special case of the linear part of the model (3.1) in Section 3.1. The above observation manifests that our model at (3.1) accommodates even infinite-dimensional θ in the standard function-on-function linear regression model (4.6), and the theory developed in Section 7.1 applies directly to this case.
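The reduction from (4.6) to (4.8) rests on computing the scores $ξ_{j} = \int_{S} ϕ_{j} (u) X (u) d u$ for a known basis. A minimal sketch follows, assuming $S = [0, 1]$ , the orthonormal Fourier basis, and a curve observed on a grid; the basis choice and the helper names `trapezoid`, `fourier_basis`, and `scores` are illustrative assumptions rather than anything prescribed by the article.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal rule for values observed on a (possibly uneven) grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def fourier_basis(j, u):
    """Orthonormal basis of L2([0, 1]): 1, sqrt(2)cos(2 pi u), sqrt(2)sin(2 pi u), ..."""
    if j == 1:
        return 1.0
    k = j // 2
    trig = math.cos if j % 2 == 0 else math.sin
    return math.sqrt(2.0) * trig(2.0 * math.pi * k * u)

def scores(x_vals, grid, L1):
    """Approximate xi_j = int phi_j(u) X(u) du for j = 1, ..., L1."""
    return [trapezoid([fourier_basis(j, u) * x for u, x in zip(grid, x_vals)], grid)
            for j in range(1, L1 + 1)]
```

The resulting list of L₁ scalars is what would enter the linear part of the model (3.1) in place of the scalar covariates X_j.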

One may think of an alternative basis generating θ, based on the functional principal components of $X (\cdot)$ and $Y (\cdot)$ . In this way, θ is represented as at (4.7) but now with $ϕ_{j}$ and ψ_k being the eigenfunctions of the respective covariance operators of $X (\cdot)$ and $Y (\cdot)$ . The main difference from the previous approach is that $ϕ_{j}$ and ψ_k are unknown. Thus, when the unknown basis is incorporated into the model at (4.6), the ξ_j at (4.8) are not observed but need to be estimated. Put ${\hat{ξ}}_{i j} = \int_{S} {\hat{ϕ}}_{j} (u) X_{i} (u) d u, \quad 1 \leq j \leq L_{1},$ where ${\hat{ϕ}}_{j}$ are the eigenfunctions of the sample covariance operator based on $\{X_{i} (\cdot) : 1 \leq i \leq n\}$ . Then, the estimation of the model (3.1), which incorporates (4.8) into the linear part, boils down to an errors-in-variables problem with “small” errors in observing the true covariate values $ξ_{i j} : = \int_{S} ϕ_{j} (u) X_{i} (u) d u$ . Certainly, the “measurement errors,” ${\hat{ξ}}_{i j} - ξ_{i j}$ , affect the estimation of $β_{j} = β_{j} (\cdot)$ and $m_{k}$ in the model (3.1). However, it is different from a typical errors-in-variables problem since the measurement errors vanish as $n \to \infty$ . Elaborating more on this, we let $L_{1} \equiv L_{1, n}$ now diverge as $n \to \infty$ . If $E (| | X | |_{L^{2} (S)}^{4}) < \infty$ and the eigenvalues λ_j corresponding to $ϕ_{j}$ satisfy the standard separation condition that $λ_{j} - λ_{j + 1} \geq C j^{- ς}$ for some constants C > 0 and $ς > 2$ as in, for example, Hall and Horowitz (2007), then Lemma 2.3 in Horváth and Kokoszka (2012) assures that $\max_{1 \leq j \leq L_{1, n}} | | {\hat{ϕ}}_{j} (\cdot) - ϕ_{j} (\cdot) | |_{L^{2} (S)} = O_{p} (n^{- 1 / 2} \cdot L_{1, n}^{ς})$ . The latter implies that $\max_{1 \leq j \leq L_{1, n}} | {\hat{ξ}}_{i j} - ξ_{i j} | = O_{p} (n^{- 1 / 2} \cdot L_{1, n}^{ς})$ for each i. If we further assume that $E (\exp (| | X | |_{L^{2} (S)} / C_{0})) < \infty$ for some constant $C_{0} > 0$ , then we may show that $\max_{1 \leq i \leq n} \max_{1 \leq j \leq L_{1, n}} | {\hat{ξ}}_{i j} - ξ_{i j} | = O_{p} (n^{- 1 / 2} \cdot L_{1, n}^{ς} \cdot \log n)$ . Following the arguments as in, for example, Jeon and Van Bever (2022), we expect that this would add an extra error of size $O_{p} (n^{- 1 / 2} \cdot L_{1, n}^{ς} \cdot \log n)$ to the errors of ${\hat{β}}_{j}$ and ${\hat{m}}_{k}$ presented in Section 7.1.

The model (4.1), as it stands, assumes that the spaces for $X_{j}$ are the same as $H$ , the space for Y. However, the limitation is not essential. Basically, there is an isometric isomorphism that maps the space for $X_{j}$ , say $H_{j}$ , to $H$ , provided that $dim (H_{j}) = dim (H)$ . Let it be called $g_{j} : H_{j} \to H$ . Then, one may let Y depend on $X_{j}, 1 \leq j \leq p$ , via $g_{j}$ , so that one may postulate $Y = β_{0} \oplus \oplus_{j = 1}^{p} (β_{j} ⊙ g_{j} (X_{j})) \oplus \oplus_{k = 1}^{d} m_{k} (Z_{k}) \oplus ε .$ (4.9)

The methodology of estimating the above model (4.9) and the associated theory follow directly from those for the model (4.1) by letting $g_{j} (X_{j})$ in the model (4.9) take the roles of $X_{j}$ in the model (4.1).

One may also be interested in the case where the covariates in the nonparametric part, now denoted by $Z_{k}$ for $1 \leq k \leq d$ , are also Hilbertian. Here, we briefly discuss a method designed for finite-dimensional $Z_{k}$ . Dealing with infinite-dimensional $Z_{k}$ in our framework is infeasible since then one needs to estimate nonparametric functions defined on infinite-dimensional domains. Let $Z_{k}$ take values in a compact subset $C_{k}$ of a q_k-dimensional Hilbert space $H_{k}$ . Then, one may consider a version of the model (3.1) or of the model (4.1) with $m_{k}$ now being defined as a map from $C_{k}$ to $H$ . For a general Hilbertian W, one may also obtain the following version of the system of equations at (2.5): ${\hat{m}}_{W, j} (z_{j}) = {\hat{M}}_{W, j} (z_{j}) ⊖ \underset{k \neq j}{\oplus} \int_{C_{k}} \frac{{\hat{f}}_{j k} (z_{j}, z_{k})}{{\hat{f}}_{j} (z_{j})} ⊙ {\hat{m}}_{W, k} (z_{k}) d μ_{k} (z_{k}), \quad 1 \leq j \leq d .$ (4.10)

Here, μ_k is the pushforward measure induced by the q_k-dimensional Lebesgue measure ${Leb}_{q_{k}}$ given by $μ_{k} (A) = {Leb}_{q_{k}} (η_{k} (A))$ for $A \subset H_{k}$ , and $η_{k} : H_{k} \to R^{q_{k}}$ is an isometric isomorphism. Also, ${\hat{M}}_{W, j}, {\hat{f}}_{j}$ and ${\hat{f}}_{j k}$ are kernel-based estimators of $E (W | Z_{j} = \cdot)$ , the density f_j of $Z_{j}$ and f_jk of $(Z_{j}, Z_{k})$ , respectively. We refer to Jeon, Park, and Van Keilegom (2021) for concrete forms of these estimators. Then, one is able to estimate the model with Hilbertian $Z_{k}$ in the same way as we estimate the models (3.1) and (4.1), solving the system of equations (4.10) for various choices of W.

5 Simulation Studies

5.1 Bandwidth Selection

The construction of ${\hat{m}}_{X_{j}, +}$ for $1 \leq j \leq p$ and ${\hat{m}}_{Y, +}$ , in computing $\hat{β}$ , requires choosing a set of bandwidths $(h_{k} : 1 \leq k \leq d)$ . There is another set of bandwidths we need to select to construct ${\hat{m}}_{+}$ , solving (2.5) with $W_{i} = Y_{i} ⊖ {\hat{β}}_{0} ⊖ (X_{i}^{⊤} ⊙ \hat{β})$ . In our theoretical development in Section 7, we allow these bandwidth sets to be different from each other. In our numerical study, however, we simply took ${\hat{m}}_{+} (z) = {\hat{m}}_{Y, +} (z) ⊖ {\hat{β}}_{0} ⊖ ({\hat{m}}_{X, +} {(z)}^{⊤} ⊙ \hat{β})$ from the already calculated ${\hat{β}}_{0}, \hat{β}$ , ${\hat{m}}_{Y, +}$ and ${\hat{m}}_{X, +}$ based on a single set of bandwidths $(h_{k} : 1 \leq k \leq d)$ . To choose this bandwidth set, we used the CBS (Coordinate-wise Bandwidth Selection) algorithm introduced in Jeon and Park (2020), instead of a full-dimensional grid search. The algorithm is reproduced in the supplement S.3. However, for the approach of HPLM without additive modeling introduced in Section 3.4, we used the bandwidths obtained from a full-dimensional grid search. The CBS algorithm is based on a cross-validation criterion. In the simulation study discussed in the remainder of Section 5, we used 10-fold cross-validation, while in the real data applications presented in Section 6, we employed 5-fold cross-validation. For numerical integration in the implementation of the iterative algorithm at (2.6), we used a trapezoidal rule with 101 equally spaced grid points.

5.2 Data Generating Models

We conducted simulation studies for the model (3.1) with a density response $Y \equiv Y (\cdot)$ taking values in the Bayes-Hilbert space $B ([- 1 / 2, 1 / 2])$ defined in the supplement S.4. We considered several cases with p = 2 and d = 2 or 3. For the covariates Z_j in the nonparametric part, we took $Z_{j} = Φ (U_{j}), 1 \leq j \leq 3$ , where $Φ$ is the cumulative distribution function of N(0, 1) and $U \equiv {(U_{1}, U_{2}, U_{3})}^{⊤}$ is a multivariate normal random vector with $E (U_{j}) = 0, var (U_{j}) = 1$ and $cov (U_{j}, U_{k}) = ρ$ for all $1 \leq j \neq k \leq 3$ . This allows for dependence among Z_j when $ρ \neq 0$ . For X₁ in the parametric part, we took $X_{1} = X_{1, α}$ with $X_{1, α} = α \cdot η_{1}^{A} (Z_{1}, Z_{2}) + \sqrt{1 - α^{2}} \cdot η_{1}^{B} (Z_{1}, Z_{2}) + ϵ_{X_{1}}, \quad 0 \leq α \leq 1.$

Here, $ϵ_{X_{1}}$ is $N (0, 1 / 2)$ independent of Z and $\begin{matrix} η_{1}^{A} (z_{1}, z_{2}) & = \sqrt{72} (z_{1} - \frac{1}{2}) (z_{2} - \frac{1}{2}), \\ η_{1}^{B} (z_{1}, z_{2}) & = \sqrt{3} (z_{1} + z_{2} - 1), \quad (z_{1}, z_{2}) \in {[0, 1]}^{2} . \end{matrix}$

We note that $η_{1}^{A}$ is nonadditive while $η_{1}^{B}$ is additive such that $\int_{{[0, 1]}^{2}} {(η_{1}^{A} (z_{1}, z_{2}))}^{2} d z_{1} d z_{2} = \int_{{[0, 1]}^{2}} {(η_{1}^{B} (z_{1}, z_{2}))}^{2} d z_{1} d z_{2} = 1 / 2,$ (5.1) $\int_{{[0, 1]}^{2}} η_{1}^{A} (z_{1}, z_{2}) g_{+} (z_{1}, z_{2}) d z_{1} d z_{2} = 0$ (5.2) for all $g_{+} : {[0, 1]}^{2} \to R$ with $g_{+} (z_{1}, z_{2}) = g_{1} (z_{1}) + g_{2} (z_{2})$ for some g_j. For X₂, we set $X_{2} = η_{2} (Z_{1}, Z_{2}) + ϵ_{X_{2}}$ , where $ϵ_{X_{2}}$ is $N (0, 1 / 2)$ independent of Z and $ϵ_{X_{1}}$ , and η₂ is given by $η_{2} (z_{1}, z_{2}) = {(\frac{2}{9} - \frac{1}{8})}^{- 1 / 2} (4 | z_{1} - \frac{1}{2} | \cdot | z_{2} - \frac{1}{2} | - \frac{1}{4}), \quad (z_{1}, z_{2}) \in {[0, 1]}^{2},$

so that it is nonadditive with $\int_{{[0, 1]}^{2}} η_{2} {(z_{1}, z_{2})}^{2} d z_{1} d z_{2} = 1 / 2.$ (5.3)

We note that, if α = 0, then $E (X_{1, α} | (Z_{1}, Z_{2}) = \cdot) = η_{1}^{B} = m_{X_{1, α}, +} \in H_{add} (R)$ . Because of (5.2), $η_{1}^{A}$ is perpendicular to $H_{add} (R)$ when ρ = 0, in which case Z₁ and Z₂ are independent with densities $f_{1} = f_{2} = I_{[0, 1]} (\cdot)$ , in the sense that $E (η_{1}^{A} (Z_{1}, Z_{2}) g_{+} (Z_{1}, Z_{2})) = 0$ for all $g_{+} \in H_{add} (R)$ . This implies that $m_{X_{1, α}, +} = \sqrt{1 - α^{2}} \cdot η_{1}^{B}$ when ρ = 0, and thus $E (X_{1, α} | (Z_{1}, Z_{2}) = \cdot) = α \cdot η_{1}^{A} + \sqrt{1 - α^{2}} \cdot η_{1}^{B}$ gets farther away from $m_{X_{1, α}, +}$ as α increases, so that α controls the degree of departure of $E (X_{1, α} | (Z_{1}, Z_{2}) = \cdot)$ from $m_{X_{1, α}, +}$ . We also note that (5.1) and (5.3) give $E (X_{1, α}^{2}) = E (X_{2}^{2}) = 1 \text{ when } ρ = 0.$ (5.4)
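The covariate design can be explored numerically. The sketch below draws $Z_{j} = Φ (U_{j})$ from an equicorrelated normal vector via a shared-factor construction (an assumption that requires $0 \leq ρ \leq 1$ ), and checks on a midpoint grid that $η_{1}^{A}$ has squared integral 1/2, as in (5.1), and is orthogonal to one arbitrarily chosen additive function, as in (5.2). The helper names and the test function $z_{1}^{2} + \exp (z_{2})$ are hypothetical.

```python
import math
import random
from statistics import NormalDist

def draw_z(rho, rng, d=3):
    """One draw of (Phi(U_1), ..., Phi(U_d)) with var(U_j) = 1, cov(U_j, U_k) = rho."""
    shared = rng.gauss(0.0, 1.0)   # common factor inducing the correlation
    phi = NormalDist().cdf
    return [phi(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(d)]

def eta1A(z1, z2):
    """The nonadditive component eta_1^A from the text."""
    return math.sqrt(72.0) * (z1 - 0.5) * (z2 - 0.5)

def midpoint_integral_2d(f, N=200):
    """Midpoint-rule approximation of the integral of f over [0, 1]^2."""
    pts = [(i + 0.5) / N for i in range(N)]
    return sum(f(a, b) for a in pts for b in pts) / N**2

# Squared integral of eta_1^A, which should be close to 1/2.
int_eta_sq = midpoint_integral_2d(lambda a, b: eta1A(a, b) ** 2)
# Integral against an additive function g_1(z_1) + g_2(z_2), which should vanish.
int_cross = midpoint_integral_2d(lambda a, b: eta1A(a, b) * (a**2 + math.exp(b)))
```

The orthogonality check concerns the Lebesgue measure on the unit square, which is the distribution of $(Z_{1}, Z_{2})$ only when ρ = 0.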

We considered four models to generate $Y_{i} (\cdot)$ . The first model is given by $\begin{matrix} Model 1 : Y (\cdot) & = (X_{1, α} ⊙ β_{1} (\cdot)) \oplus (X_{2} ⊙ β_{2} (\cdot)) \\ \oplus \oplus_{k = 1}^{2} m_{k} (Z_{k}, \cdot) \oplus ε (\cdot), ρ = 0. \end{matrix}$

For $ε (\cdot)$ in Model 1 and others below, we took $ε (t) = ω (t) / \int_{- 1 / 2}^{1 / 2} ω (u) d u$ where $ω (\cdot)$ is a linear interpolation of $ω (t_{i})$ for $t_{i} = - 1 / 2 + (i - 1) / 100, 1 \leq i \leq 101$ and $ω (t_{i})$ are iid $Lognormal (0, 1)$ , independent of $(ϵ_{X_{1}}, ϵ_{X_{2}})$ and $(Z_{1}, Z_{2}, Z_{3})$ . Put $W (t) = \log ω (t)$ . Then, $\begin{matrix} | | ε (\cdot) | |_{B}^{2} & = \int_{- 1 / 2}^{1 / 2} {(\log ω (t) - \int_{- 1 / 2}^{1 / 2} \log ω (u) d u)}^{2} d t \\ & ≃ \frac{1}{100} \sum_{i = 1}^{100} W {(t_{i})}^{2} - {(\frac{1}{100} \sum_{i = 1}^{100} W (t_{i}))}^{2}, \end{matrix}$ where $| | \cdot | |_{B}$ is the norm for the space $B$ defined through (S.6) in the supplementary materials. Since $W (t_{i})$ are iid N(0, 1), we get $E (| | ε (\cdot) | |_{B}^{2}) ≃ 99 / 100 ≃ 1$ . The coefficients $β_{j} (\cdot)$ , which themselves are densities on $[- 1 / 2, 1 / 2]$ , are chosen as follows: $\begin{matrix} β_{1} (t) & = {[\int_{- 1 / 2}^{1 / 2} {(1 + \frac{2}{5} \sin (4 u π))}^{C_{1}} d u]}^{- 1} {(1 + \frac{2}{5} \sin (4 t π))}^{C_{1}} \cdot I (| t | \leq 1 / 2), \\ β_{2} (t) & = {[\int_{- 1 / 2}^{1 / 2} {(\frac{6}{5} - \frac{9}{5} {(| u | - \frac{1}{6})}_{+})}^{C_{2}} d u]}^{- 1} {(\frac{6}{5} - \frac{9}{5} {(| t | - \frac{1}{6})}_{+})}^{C_{2}} \cdot I (| t | \leq 1 / 2), \end{matrix}$ where $a_{+} = \max \{a, 0\}$ and C_j are chosen so that $| | β_{j} (\cdot) | |_{B} = 1$ . Because of (5.4), it holds that $E (| | X_{1, α} ⊙ β_{1} (\cdot) | |_{B}^{2}) = E (| | X_{2} ⊙ β_{2} (\cdot) | |_{B}^{2}) = 1 \text{ when } ρ = 0.$
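The approximation $E (| | ε (\cdot) | |_{B}^{2}) ≃ 99 / 100$ can be checked by a small Monte Carlo experiment. The sketch below works only with the Riemann-sum expression displayed above, evaluated at 100 iid N(0, 1) values of $W (t_{i})$ ; the number of replications, the seed, and the function name are arbitrary choices made for illustration.

```python
import random

def sq_norm_B_approx(W):
    """Riemann-sum approximation from the text: mean(W^2) - mean(W)^2."""
    n = len(W)
    mean = sum(W) / n
    return sum(w * w for w in W) / n - mean * mean

rng = random.Random(1)
reps = 5000
# Average the approximate squared norm over independent error curves.
avg_sq_norm = sum(
    sq_norm_B_approx([rng.gauss(0.0, 1.0) for _ in range(100)])
    for _ in range(reps)
) / reps
```

The average `avg_sq_norm` should land near 0.99, in line with the stated value.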

The components m₁ and m₂ are also normalized. Specifically, $\begin{matrix} m_{1} (z_{1}, t) & = {(\int_{- 1 / 2}^{1 / 2} (2 - \exp (z_{1}) | u |)^{c_{1}} d u)}^{- 1} {(2 - \exp (z_{1}) | t |)}^{c_{1}} \\ \cdot I (| t | \leq 1 / 2), \\ m_{2} (z_{2}, t) & = {(\int_{- 1 / 2}^{1 / 2} \cos {(u π / 2)}^{c_{2} \cdot (z_{2} + z_{2}^{3})} d u)}^{- 1} \cos {(t π / 2)}^{c_{2} \cdot (z_{2} + z_{2}^{3})} \\ \cdot I (| t | \leq 1 / 2), \end{matrix}$ where c₁ and c₂ are chosen so that $E (| | m_{j} (Z_{j}, \cdot) | |_{B}^{2}) = 1$ for j = 1, 2.

As discussed briefly in Section 3.4 and demonstrated by our theory in Section 7.1, our estimator $\hat{β}$ gets more stable than ${\hat{β}}^{PLM}$ at (3.11), which ignores the additivity structure in the nonparametric part of the model, if $ϱ_{X} - ϵ_{X} = m_{X} (Z) - m_{X, +} (Z)$ grows farther away from the zero vector. Put $Δ^{PLAM} = {[E (ϱ_{X} ϱ_{X}^{⊤})]}^{- 1}, \quad Δ^{PLM} = {[E (ϵ_{X} ϵ_{X}^{⊤})]}^{- 1} .$ (5.5)

In Model 1 where ρ = 0, we may find exact formulas for $Δ^{PLAM}$ and $Δ^{PLM}$ . Indeed, when ρ = 0, we get $E (ϱ_{X_{1, α}} | Z) = α \cdot η_{1}^{A} (Z_{1}, Z_{2})$ from (5.2) and $E (ϱ_{X_{2}} | Z) = η_{2} (Z_{1}, Z_{2}) - η_{2, +} (Z_{1}, Z_{2})$ , where $η_{2, +}$ is the projection of η₂ onto $H_{add} (R)$ . Since $ϵ_{X_{1}}$ and $ϵ_{X_{2}}$ are independent, $ϱ_{X_{1, α}}$ and $ϱ_{X_{2}}$ are also independent conditional on Z. From these, we may find that the ith diagonal entries $Δ_{i, i}^{PLAM}$ and $Δ_{i, i}^{PLM}$ of $Δ^{PLAM}$ and $Δ^{PLM}$ , respectively, are: $\begin{matrix} Δ_{1, 1}^{PLAM} & = \frac{2}{2 α^{2} E (η_{1}^{A} {(Z)}^{2}) + 1}, \\ Δ_{2, 2}^{PLAM} & = \frac{2}{2 E {(η_{2} (Z) - η_{2, +} (Z))}^{2} + 1}, \\ Δ_{1, 1}^{PLM} & = 2, \\ Δ_{2, 2}^{PLM} & = 2. \end{matrix}$ (5.6)

From the formula (5.6) and according to Theorem 7.1 with the discussion that follows, we get that, under Model 1, the asymptotic gains by ${\hat{β}}_{j}$ against ${\hat{β}}_{j}^{PLM}$ are $n^{- 1} \cdot (Δ_{j, j}^{PLM} - Δ_{j, j}^{PLAM}) \cdot E (| | ε (\cdot) | |_{B}^{2}) ≃ n^{- 1} \cdot (Δ_{j, j}^{PLM} - Δ_{j, j}^{PLAM}) .$ (5.7)

Through the simulation with various values of α under Model 1, we may see how these theoretical gains by ${\hat{β}}_{j}$ take effect empirically.

For the second model we specialized $X_{1, α}$ to $X_{1, 1}$ , that is, took α = 1, and also fixed $ρ = 1 / 2$ in the generation of $Z_{i}$ . Other than that, it is the same as Model 1. The third and fourth models have the same parametric part as the second one. The third model has a nonadditive map in the nonparametric part, while the fourth has a three-dimensional additive map. Specifically, $\begin{matrix} \text{Model 2:} \quad Y (\cdot) & = (X_{1, 1} ⊙ β_{1} (\cdot)) \oplus (X_{2} ⊙ β_{2} (\cdot)) \\ & \quad \oplus \oplus_{k = 1}^{2} m_{k} (Z_{k}, \cdot) \oplus ε (\cdot), \quad ρ = 1 / 2; \\ \text{Model 3:} \quad Y (\cdot) & = (X_{1, 1} ⊙ β_{1} (\cdot)) \oplus (X_{2} ⊙ β_{2} (\cdot)) \\ & \quad \oplus m_{1, 2} (Z_{1}, Z_{2}, \cdot) \oplus ε (\cdot), \quad ρ = 1 / 2; \\ \text{Model 4:} \quad Y (\cdot) & = (X_{1, 1} ⊙ β_{1} (\cdot)) \oplus (X_{2} ⊙ β_{2} (\cdot)) \\ & \quad \oplus \oplus_{k = 1}^{3} m_{k} (Z_{k}, \cdot) \oplus ε (\cdot), \quad ρ = 1 / 2. \end{matrix}$

Here, the density-valued maps m₃ and $m_{1, 2}$ are given by $\begin{matrix} m_{3} (z_{3}, t) & = {(\int_{- 1 / 2}^{1 / 2} {(1 + u)}^{c_{3} \cdot \sin (2 π z_{3})} d u)}^{- 1} {(1 + t)}^{c_{3} \cdot \sin (2 π z_{3})} \\ \cdot I (| t | \leq 1 / 2), \\ m_{1, 2} (z_{1}, z_{2}, t) & = {(\int_{- 1 / 2}^{1 / 2} \exp (- c_{1, 2} \cdot u^{2} \log (1 + z_{1} / 2 + z_{2} / 2)) d u)}^{- 1} \\ \cdot \exp (- c_{1, 2} \cdot t^{2} \log (1 + z_{1} / 2 + z_{2} / 2)) \cdot I (| t | \leq 1 / 2), \end{matrix}$ where c₃ and $c_{1, 2}$ are chosen so that $E (| | m_{1, 2} (Z_{1}, Z_{2}, \cdot) | |_{B}^{2}) = E (| | m_{3} (Z_{3}, \cdot) | |_{B}^{2}) = 1$ . We considered Model 3 to see the sensitivity of our approach to the violation of additivity in the nonparametric part, and Model 4 to learn the effect of increased dimension of Z, in comparison with Model 2.

5.3 Simulation Results

We generated N = 500 pseudo samples of sizes n = 200 and 400 according to the four models specified in Section 5.2. We computed Monte Carlo approximations of $\begin{matrix} MSE ({\hat{β}}_{j}) & : = E (| | {\hat{β}}_{j} (\cdot) ⊖ β_{j} (\cdot) | |_{B}^{2}) = SB ({\hat{β}}_{j}) + var ({\hat{β}}_{j}), \\ SB ({\hat{β}}_{j}) & : = | | E ({\hat{β}}_{j} (\cdot)) ⊖ β_{j} (\cdot) | |_{B}^{2}, \\ var ({\hat{β}}_{j}) & : = E (| | {\hat{β}}_{j} (\cdot) ⊖ E ({\hat{β}}_{j} (\cdot)) | |_{B}^{2}) \end{matrix}$ (5.8) and those for ${\hat{β}}_{j}^{PLM}$ . For the estimators of the nonparametric parts, we approximated $\begin{matrix} MISE ({\hat{m}}_{+}) & : = \int E (| | {\hat{m}}_{+} (z, \cdot) ⊖ m_{+} (z, \cdot) | |_{B}^{2}) d z = ISB ({\hat{m}}_{+}) + IV ({\hat{m}}_{+}), \\ ISB ({\hat{m}}_{+}) & : = \int | | E ({\hat{m}}_{+} (z, \cdot)) ⊖ m_{+} (z, \cdot) | |_{B}^{2} d z, \\ IV ({\hat{m}}_{+}) & : = \int E ({‖ {\hat{m}}_{+} (z, \cdot) ⊖ E ({\hat{m}}_{+} (z, \cdot)) ‖}_{B}^{2}) d z \end{matrix}$ (5.9) and those for the PLM, based on the 500 pseudo samples. The target $m_{+}$ in this computation was the centered version of $m_{1} \oplus m_{2}$ or $m_{1} \oplus m_{2} \oplus m_{3}$ , that is, for Models 1 and 2, for example, $m_{+} (z, \cdot) = [m_{1} (z_{1}, \cdot) ⊖ E (m_{1} (Z_{1}, \cdot))] \oplus [m_{2} (z_{2}, \cdot) ⊖ E (m_{2} (Z_{2}, \cdot))] .$ (5.10)

The centering introduces the coefficient $β_{0} = m_{0}$ , which for Models 1 and 2 is given by $m_{0} (\cdot) = E (m_{1} (Z_{1}, \cdot)) \oplus E (m_{2} (Z_{2}, \cdot))$ .
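The decomposition at (5.8) is an exact algebraic identity when the expectations are replaced by Monte Carlo averages. A minimal sketch, assuming that each estimate is stored as a vector in $R^{m}$ and that the Euclidean norm stands in for $| | \cdot | |_{B}$ (both assumptions of this illustration; the function name is hypothetical):

```python
import numpy as np

def mse_decomposition(estimates, truth):
    """Monte Carlo MSE, squared bias, and variance as in (5.8).

    estimates: (N, m) array of N replicated estimates, truth: (m,) target.
    The Euclidean norm replaces the Bayes-Hilbert norm in this sketch.
    """
    mean_est = estimates.mean(axis=0)
    mse = np.mean(np.sum((estimates - truth) ** 2, axis=1))
    sb = np.sum((mean_est - truth) ** 2)
    var = np.mean(np.sum((estimates - mean_est) ** 2, axis=1))
    return mse, sb, var
```

The same computation, integrated over a grid of z values, gives the MISE, ISB, and IV at (5.9).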

Table 1 reports the values of the measures for Model 1. For Model 1, the theoretical gain by ${\hat{β}}_{2}$ against ${\hat{β}}_{2}^{PLM}$ does not depend on α as shown by (5.6) and (5.7). This is confirmed by the MSE values in the column of β₂ in the table. For ${\hat{β}}_{1}$ against ${\hat{β}}_{1}^{PLM}$ , however, we get from (5.1), (5.6), and (5.7) that $n^{- 1} \cdot (Δ_{1, 1}^{PLM} - Δ_{1, 1}^{PLAM}) = n^{- 1} \cdot 2 α^{2} / (α^{2} + 1)$ . The MSE values in the column of β₁ are roughly in accordance with this formula. For example, in case α = 1 and n = 200, the theoretical value equals $5 \times 10^{- 3}$ while the corresponding empirical value is $(10.63 - 5.17) \times 10^{- 3} = 5.46 \times 10^{- 3}$ . The results in the table also demonstrate that our approach leads to more accurate estimation for the nonparametric part as well.
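The theoretical gain for ${\hat{β}}_{1}$ quoted above follows from (5.6) and (5.7) by simple arithmetic, which the following sketch reproduces. The function name is hypothetical, and the default value 1/2 for $E (η_{1}^{A} {(Z)}^{2})$ is the one given by (5.1) when ρ = 0.

```python
def theoretical_gain(alpha, n, e_eta_sq=0.5):
    """Asymptotic gain n^{-1} (Delta_PLM - Delta_PLAM) for beta_1 under Model 1."""
    delta_plm = 2.0
    delta_plam = 2.0 / (2.0 * alpha**2 * e_eta_sq + 1.0)
    return (delta_plm - delta_plam) / n

gain = theoretical_gain(alpha=1.0, n=200)   # 0.005, i.e. 5 x 10^{-3}
```

With α = 0 the gain is zero, which matches the observation that $ϱ_{X} = ϵ_{X}$ when the conditional mean of the covariate is additive.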

Table 1 Mean squared error (MSE), squared bias (SB), and variance of ${\hat{β}}_{j}$ , as defined at (5.8), and mean integrated squared error (MISE), integrated squared bias (ISB), and integrated variance (IV), as defined at (5.9), under Model 1, multiplied by 10³.


Comparing the values for different sample sizes, we see that the results in Table 1 obey the theoretical rates of convergence. The rate for the parametric part is $n^{- 1}$ for both PLAM and PLM, while those for the nonparametric part equal $n^{- 4 / 5}$ for our method and $n^{- 2 / 3}$ for the PLM, provided that the corresponding optimal bandwidth sizes are used. We find that, in estimating m₀ and β_j, the ratios of the MSE values for n = 400 against n = 200 roughly coincide with the theoretical value ${(400 / 200)}^{- 1} = 1 / 2$ . In estimating the nonparametric part $m_{+}$ , the ratios of the MISE values are close to the theoretical values ${(400 / 200)}^{- 4 / 5} ≃ 0.574$ and ${(400 / 200)}^{- 2 / 3} ≃ 0.630$ , respectively, for our estimator and for the PLM. For instance, in case α = 0, the empirical ratio $26.06 / 44.11 ≃ 0.590$ for our estimator is roughly the same as 0.574, while $42.66 / 68.10 ≃ 0.626$ for the PLM approximates well the theoretical value 0.630.
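The theoretical ratios used in this comparison come from evaluating ${(400 / 200)}^{- r}$ at the rate exponent r of each estimator. A short sketch of the arithmetic (the function name is hypothetical; the two empirical ratios are the ones quoted in the text for α = 0):

```python
def rate_ratio(n_small, n_large, exponent):
    """Predicted error ratio when the error decays like n^{-exponent}."""
    return (n_large / n_small) ** (-exponent)

plam_theory = rate_ratio(200, 400, 4 / 5)   # about 0.574
plm_theory = rate_ratio(200, 400, 2 / 3)    # about 0.630
plam_empirical = 26.06 / 44.11              # MISE ratio quoted for our estimator
plm_empirical = 42.66 / 68.10               # MISE ratio quoted for the PLM
```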

Table 2 reports the values of the measures at (5.8) and (5.9) for Models 2–4. Recall that, in these models, X₁ is the same as the one under Model 1 when α = 1, but $Z_{j} = Φ (U_{j})$ are correlated with each other via the correlation $ρ = 1 / 2$ among U_j. From the table, we first learn that our proposal continues to dominate the PLM when the covariates in the nonparametric part are correlated, when their effects are nonadditive, or when their number increases. The comparison between the values for Model 2 and those for Model 1 with α = 1 reveals the effect of correlation among covariates. We observe that the presence of the correlation increased the values of MSE and MISE slightly for our estimators. For the PLM, in estimating β_j, the asymptotic theory at (7.1) in Section 7.1 tells that introducing correlation between Z_j does not affect the asymptotic MSE properties of ${\hat{β}}_{j}^{PLM}$ because $ϵ_{X_{1}}$ and $ϵ_{X_{2}}$ are independent of Z in our simulation setting. This is confirmed empirically by comparing the values, corresponding to the PLM, in the columns of β_j for Model 2 in Table 2, with those for Model 1 (α = 1) in Table 1.

Table 2 Values of measures of performance for ${\hat{β}}_{j}$ and ${\hat{m}}_{+}$ ( ${\hat{m}}_{1, 2}$ ) under Models 2, 3, and 4 where Z_j are correlated, multiplied by 10³.


Now, we note that Models 3 and 4 differ from Model 2 only in the nonparametric part. In the computation of the values for Model 3, the target $m_{1, 2}$ was actually the centered version as at (5.10) for $m_{+}$ . The MSE values in the columns of β_j for Model 3, in comparison with those for Model 2, demonstrate empirically the insensitivity of both ${\hat{β}}_{j}$ and ${\hat{β}}_{j}^{PLM}$ to non-additivity in the nonparametric part, as asserted by Theorem 7.1 and the discussion that follows. Indeed, our theory tells that $ϱ_{X} = X - m_{X, +} (Z)$ and $ϵ_{X} = X - m_{X} (Z)$ determine the asymptotic distributions of $\hat{β}$ and ${\hat{β}}^{PLM}$ , respectively, and these have nothing to do with the structure of the nonparametric part. As for the estimation of the nonparametric part, the MISE values of both our estimator and the PLM for Model 3 are smaller than the corresponding ones for Model 2. This is due to the fact that $m_{1, 2}$ is easier to estimate than $m_{1} \oplus m_{2}$ . Comparing the values for Models 2 and 4, we see that the increased dimension of Z does not affect the precision of the estimators of β_j. It increases the MISE values of the nonparametric part, moderately for our estimator but substantially for the PLM. This illustrates the dimensionality problem with the PLM approach. Finally, we find that, like the results in Table 1, those in Table 2 are in accordance with the corresponding theoretical rates of convergence.

Tables 1 and 2 also report the average computing time, given a set of bandwidths chosen by a cross-validation criterion. The results indicate that the calculation of the Hilbertian PLAM estimators takes two to three times as long as the computation of the Hilbertian PLM estimators, except in the case of Model 4 where the covariate dimension in the nonparametric part is higher than in the other models. The comparison of computing time shows how much extra computation is needed for the smooth backfitting iteration, which the Hilbertian PLAM requires while the Hilbertian PLM does not.

6 Application to U.S. Presidential Election Data

We present a real data application for the model (4.1). Another real data application for the model (3.1) can be found in the supplement S.6. It is believed that the underlying political orientation and population characteristics of a region affect an election result for the region. To validate this belief in the United States, we analyzed the 2020 U.S. presidential election data. We put the observed proportions of votes earned by the Democratic Party ( $Y_{i 1}$ ), the Republican Party ( $Y_{i 2}$ ) and the rest ( $Y_{i 3}$ ) for the ith state into a vector $Y_{i} = (Y_{i 1}, Y_{i 2}, Y_{i 3})$ , and took the compositional vectors as the observations of the response Y. We considered these three categories since the first two parties received the major spotlight in the election. We note that Y satisfies the two constraints $0 < Y_{j} < 1$ and $\sum_{j = 1}^{3} Y_{j} = 1$ , and thus, it takes values in $S_{1}^{3} = \{(a_{1}, a_{2}, a_{3}) \in R^{3} : 0 < a_{j} < 1, \sum_{j = 1}^{3} a_{j} = 1\}$ , which is a two-dimensional Hilbert space as detailed in the supplement S.5. As a covariate, we considered a compositional vector $X_{1}$ that comprises the proportions of votes earned by the same parties in the 2016 U.S. presidential election. We added three other covariates in the model building, which are the proportion of people who have a bachelor's or higher degree (Z₁), per capita income (Z₂) and median age (Z₃). The observations on these socio-demographic variables Z_j were obtained from https://www.census.gov/acs/www/data/data-tables-and-tools/data-profiles/2019/. The observed value $X_{i 1}$ of the Hilbertian covariate $X_{1}$ is considered to represent the political orientation of the ith state, and the values of the socio-demographic covariates, Z_ij for $1 \leq j \leq 3$ , measure the state’s education, wealth and age levels, respectively.
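To make the two-dimensional structure of $S_{1}^{3}$ concrete, the following sketch assumes the standard Aitchison geometry on the simplex; the supplement S.5 remains the authoritative description. The function names `closure` and `clr` and the vote shares are hypothetical illustrations, not data from the study. The sketch maps a composition to centered log-ratio coordinates, which sum to zero and therefore lie in a two-dimensional subspace of $R^{3}$ .

```python
import numpy as np

def closure(a):
    """Rescale a strictly positive vector so that its entries sum to one."""
    a = np.asarray(a, dtype=float)
    if np.any(a <= 0):
        raise ValueError("all parts must be strictly positive")
    return a / a.sum()

def clr(y):
    """Centered log-ratio coordinates of a composition (assumed Aitchison geometry)."""
    log_y = np.log(closure(y))
    return log_y - log_y.mean()

# Hypothetical vote shares (Democratic, Republican, other) for one state.
y = closure([0.51, 0.47, 0.02])
coords = clr(y)
print(coords, coords.sum())  # the coordinates sum to zero up to rounding
```

Because the three coordinates always satisfy one linear constraint, two free coordinates remain, which matches the dimension stated in the text.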

We applied the Hilbertian partially linear additive regression approach based on the model (4.1) with the covariates $X_{1}, Z_{1}, Z_{2}$ and Z₃. In doing so, we applied trapezoidal rules based on two different grids to approximate the integrals in the iterative algorithm at (2.6): one with 101 equally spaced points (Dense) and the other with 21 (Sparse). We also used two different bandwidth selectors: one was the CBS bandwidth introduced in Section 5.1, and the other an optimally chosen bandwidth for density estimation (KDE) according to the R function “h.amise” in the R package “kedd” (version 1.0.3). Based on these options we implemented three combinations: Dense-CBS, Sparse-CBS, and Dense-KDE. As a competing approach, we also chose the Hilbertian additive regression approach of Jeon and Park (2020), which uses only the scalar covariates $(Z_{1}, Z_{2}, Z_{3})$ , with the Dense-CBS scheme. We compared the prediction performance of these approaches via the leave-one-out average squared prediction error (ASPE) defined by $\begin{matrix} n^{- 1} \sum_{i = 1}^{n} | | Y_{i} ⊖ {\hat{Y}}_{i}^{(- i)} | |^{2} \\ = n^{- 1} \sum_{i = 1}^{n} {(2 \times 3)}^{- 1} \sum_{j = 1}^{3} \sum_{k = 1}^{3} {(\log (Y_{i j} / Y_{i k}) - \log ({\hat{Y}}_{i j}^{(- i)} / {\hat{Y}}_{i k}^{(- i)}))}^{2}, \end{matrix}$ where n = 51 and ${\hat{Y}}_{i}^{(- i)} \equiv ({\hat{Y}}_{i 1}^{(- i)}, {\hat{Y}}_{i 2}^{(- i)}, {\hat{Y}}_{i 3}^{(- i)})$ is the predicted value of $Y_{i}$ obtained without the ith observation. The above definition of ASPE is based on the geometry of $S_{1}^{3}$ given in the supplement S.5. The values of the ASPE for our approach with the Dense-CBS, Sparse-CBS, and Dense-KDE schemes were respectively 0.063, 0.063, and 0.070, while the application of Jeon and Park (2020) gave 0.198. Thus, our partially linear additive modeling approach incorporating the compositional covariate improved the prediction performance greatly. The grid size in the numerical integration did not have a significant effect on the prediction performance, whereas the CBS scheme, which takes into account the values of Y, outperformed the scheme based on KDE.
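The ASPE formula above can be evaluated directly from its pairwise log-ratio form. The sketch below is only an illustration: the function names and the two rows of observed and predicted compositions are hypothetical, and the leave-one-out predictions are assumed to have been computed already by whatever fitting routine is in use.

```python
import numpy as np

def sq_aitchison_dist(y, y_hat):
    """Squared distance ||y (-) y_hat||^2 via the pairwise log-ratio formula."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    d = np.log(y) - np.log(y_hat)        # d_j = log(y_j / y_hat_j)
    diff = d[:, None] - d[None, :]       # log(y_j/y_k) - log(y_hat_j/y_hat_k)
    return float((diff ** 2).sum() / (2 * y.size))

def aspe(Y, Y_hat):
    """Average of the squared distances over the rows of Y and Y_hat."""
    return float(np.mean([sq_aitchison_dist(y, yh) for y, yh in zip(Y, Y_hat)]))

# Hypothetical observed compositions and leave-one-out predictions.
Y = np.array([[0.50, 0.48, 0.02], [0.40, 0.57, 0.03]])
Y_hat = np.array([[0.52, 0.46, 0.02], [0.43, 0.55, 0.02]])
print(aspe(Y, Y_hat))
```

The double sum equals the sum of squared centered log-ratio differences, so the same value can be obtained from `d - d.mean()` without forming all pairs.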

To see the effects of the covariates, we estimated the model (4.1) using the whole dataset with the Dense-CBS scheme. We obtained ${\hat{β}}_{1} = 2.454$ . Such a large positive value demonstrates that party support in the 2020 U.S. election is strongly tied to that in the 2016 election. Thus, the empirical result provides strong evidence that the underlying political orientation is an important determinant of the presidential election result. The estimated component maps are depicted in Figure 1, which visualizes the effects of the socio-demographic variables on the election results. A clear lesson from the estimated maps is that the approval rate for the Democratic candidate remains unchanged as education level or income level increases, while it decreases for the Republican candidate and increases for the group of other candidates. The effects of age do not seem to have a monotone pattern.

Fig. 1 The estimated component maps ${\hat{m}}_{k}$ for the population characteristics based on the proposed method applied to the U.S. election data.


7 Theoretical Development

7.1 Asymptotic Theory: Scalar Covariates

Here, we discuss the case where $X_{i}$ and $Z_{i}$ take values in $R^{p}$ and ${[0, 1]}^{d}$ , respectively. We assume that $(X_{i}, Z_{i}, ε_{i})$ for $1 \leq i \leq n$ are iid copies of $(X, Z, ε)$ such that $E (ε | X, Z) = 0$ . Recall that $m_{X_{j}, +}$ , for each $1 \leq j \leq p$ , is the solution of (2.3) with $W = X_{j}$ , $m_{X, +} = {(m_{X_{1}, +}, \dots, m_{X_{p}, +})}^{⊤} : {[0, 1]}^{d} \to R^{p}, ϱ_{X_{j}} = X_{j} - m_{X_{j}, +} (Z)$ and $ϱ_{X} = {(ϱ_{X_{1}}, \dots, ϱ_{X_{p}})}^{⊤}$ . We let $(h_{k} : 1 \leq k \leq d)$ denote the bandwidth set we use in the construction of ${\hat{m}}_{X, +}$ and ${\hat{m}}_{Y, +}$ for $\hat{β}$ and ${\hat{β}}_{0}$ . Recall ${\hat{m}}_{+} = {\hat{m}}_{Y ⊖ {\hat{β}}_{0} ⊖ (X^{⊤} ⊙ \hat{β}), +}$ , which solves (2.5) with $W_{i} = Y_{i} ⊖ {\hat{β}}_{0} ⊖ (X_{i}^{⊤} ⊙ \hat{β})$ . We let $(b_{k} : 1 \leq k \leq d)$ denote the bandwidth set we use in the construction of ${\hat{m}}_{+}$ with $\hat{β}$ and ${\hat{β}}_{0}$ at hand.

7.1.1 Technical Assumptions and Invertibility of $n^{- 1} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤}$

We collect the technical assumptions we use for our theoretical development.

(A1) The joint density f is bounded away from zero and infinity on its support ${[0, 1]}^{d}$ and all f_jk are continuously differentiable on ${[0, 1]}^{2}$ .
(A2) The real-valued additive functions $m_{X_{j}, +}$ are twice continuously differentiable on ${[0, 1]}^{d}$ and the $H$ -valued $m_{k}$ are twice continuously Fréchet differentiable on $[0, 1]$ .
(A3) The Hilbertian error variable $ε$ satisfies $E (| | ε | |^{α}) < \infty$ for some $α > 12 / 5$ when $H$ is finite-dimensional and $α > 5 / 2$ when $H$ is infinite-dimensional. Also, there exists a constant $0 < C_{1} < \infty$ such that $E (| | ε | |^{2} | X, Z) \leq C_{1}$ a.s.
(A4) The p × p matrix $E (ϱ_{X} ϱ_{X}^{⊤})$ is invertible.
(A5) There exist constants $0 < L, C_{2} < \infty$ such that $\max_{1 \leq j \leq p} E (\exp (| X_{j} | / L) | Z) \leq C_{2}$ a.s.
(A6) The baseline kernel K is symmetric and positive on $(- 1, 1)$ but it vanishes on $R ∖ (- 1, 1)$ and its first derivative is Lipschitz continuous.
(A7) The bandwidths h_k satisfy $h_{k} ≍ n^{- γ}$ for some γ with $1 / 6 < γ < \min \{1 / 3, (α - 2) / α\}$ when $H$ is finite-dimensional, and with $1 / 5 < γ < \min \{1 / 3, (α - 2) / α\}$ when $H$ is infinite-dimensional.
(A8) The bandwidths b_k satisfy that $b_{k} \to 0$ and $\underset{n}{\lim \inf} n^{c_{k}} b_{k} > 0$ for some constants $c_{k} < (α - 2) / α$ for all $1 \leq k \leq d$ , and ${(n b_{j} b_{k})}^{- 1} \log n \to 0$ for all $1 \leq j \neq k \leq d$ .

The conditions (A1) and (A2) for $m_{k}$ are typically assumed in the smooth backfitting literature; see Jeon and Park (2020). The moment conditions at (A3) ensure $(α - 2) / α > 1 / 6$ for finite-dimensional $H$ and $(α - 2) / α > 1 / 5$ for infinite-dimensional $H$ , so that the bandwidth ranges in (A7) are nonempty. The range for infinite-dimensional $H$ does not cover the optimal size for univariate nonparametric smoothing, which is $n^{- 1 / 5}$ , but we note that the bandwidths h_k are for estimating the parametric part of the model, not the nonparametric part. For the latter, we use b_k, for which we assume (A8). The condition (A4) ensures that the matrix $n^{- 1} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤}$ is invertible with probability tending to one, which we detail below. The exponential moment condition (A5) is assumed in order to use empirical process theory (e.g., van de Geer 2000) in developing the technical discussion for ${\hat{β}}_{j}$ . The condition (A6) is stronger than the typical one in kernel smoothing, which requires only that K itself be Lipschitz continuous. The stronger condition is required to make our estimator ${\hat{m}}_{+}$ smoother, so that it belongs to an $H$ -valued function class with a proper entropy.
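The interplay between the moment condition (A3) and the bandwidth range (A7) can be checked with exact arithmetic. The helper below is a hypothetical illustration (the name `gamma_range` is not from the article): it returns the open interval of admissible exponents γ for a given α, or `None` when the interval is empty, which happens exactly at the boundary values of α excluded by (A3).

```python
from fractions import Fraction

def gamma_range(alpha, finite_dim=True):
    """Open interval (lo, hi) of exponents gamma allowed by (A7), or None if empty."""
    alpha = Fraction(alpha)
    lo = Fraction(1, 6) if finite_dim else Fraction(1, 5)
    hi = min(Fraction(1, 3), (alpha - 2) / alpha)
    return (lo, hi) if lo < hi else None

print(gamma_range(Fraction(12, 5)))                  # boundary alpha: empty range
print(gamma_range(3))                                # finite-dimensional H
print(gamma_range(Fraction(5, 2), finite_dim=False)) # boundary alpha: empty range
print(gamma_range(4, finite_dim=False))              # infinite-dimensional H
```

For α at least 3 the upper end of the range is 1/3, so larger moments do not widen the range further.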

We now state a proposition that demonstrates the invertibility of $n^{- 1} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤}$ .

Proposition 7.1.

Assume (A1), (A2) for $m_{X_{j}, +}$ and (A4)–(A7). Then, $n^{- 1} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤} = E (ϱ_{X} ϱ_{X}^{⊤}) + O_{p} (n^{- 2 γ} + n^{- (1 - γ) / 2} \sqrt{\log n}),$ so that $n^{- 1} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{⊤}$ is invertible with probability tending to one.

7.1.2 Estimation of Parametric Components

We present the limit distribution of $\hat{β}$ defined at (3.7). For this we need to introduce some notation. First, we let $H^{p}$ be equipped with an inner product ${〈 \cdot, \cdot 〉}_{tp}$ defined by ${〈 a, b 〉}_{tp} = \sum_{j = 1}^{p} 〈 a_{j}, b_{j} 〉$ for $a = {(a_{1}, \dots, a_{p})}^{⊤}$ and $b = {(b_{1}, \dots, b_{p})}^{⊤}$ in $H^{p}$ . Let $| | \cdot | |_{tp}$ be the associated norm such that $| | a | |_{tp}^{2} = \sum_{j = 1}^{p} | | a_{j} | |^{2}$ . For $e \in H$ , define $e \otimes e : H^{p} \to H^{p}$ , an outer product in $H^{p}$ , by $(e \otimes e) (a) = {(〈 e, a_{1} 〉 ⊙ e, \dots, 〈 e, a_{p} 〉 ⊙ e)}^{⊤} .$

For a general $H^{p}$ -valued random element V, a linear operator $S : H^{p} \to H^{p}$ such that $cov ({〈 V, a 〉}_{tp}, {〈 V, b 〉}_{tp}) = {〈 S (a), b 〉}_{tp}, a, b \in H^{p}$ is called the covariance operator of V. For a random vector U taking values in $R^{p}$ , $cov ({〈 U ⊙ ε, a 〉}_{tp}, {〈 U ⊙ ε, b 〉}_{tp}) = {〈 E [U U^{⊤} ⊙ (ε \otimes ε) (a)], b 〉}_{tp},$ so that $E [U U^{⊤} ⊙ (ε \otimes ε)] : H^{p} \to H^{p}$ , defined by $E [U U^{⊤} ⊙ (ε \otimes ε)] (a) = E [U U^{⊤} ⊙ (ε \otimes ε) (a)]$ , is the covariance operator of $U ⊙ ε$ . Let $G (0_{p}, S)$ denote a Gaussian random element taking values in $H^{p}$ with mean $0_{p} = {(0, \dots, 0)}^{⊤} \in H^{p}$ and covariance operator S, that is, the real-valued random variable ${〈 G (0_{p}, S), a 〉}_{tp}$ for any $a \in H^{p}$ is normally distributed with mean 0 and variance ${〈 S (a), a 〉}_{tp}$ . Let $Σ = E [{(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1} \cdot ϱ_{X} ϱ_{X}^{⊤} \cdot {(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1} ⊙ (ε \otimes ε)]$ , which is the covariance operator of ${(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1} ϱ_{X} ⊙ ε$ . Note that, if $ε$ is independent of $(X, Z)$ , then $Σ$ reduces to ${(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1} ⊙ E (ε \otimes ε)$ .
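The covariance identity for $U ⊙ ε$ holds realization by realization before expectations are taken, and this can be verified numerically. The sketch below assumes $H = R^{q}$ with the Euclidean inner product, stores elements of $H^{p}$ as p-by-q arrays, and assumes that a p × p matrix M acts on $v \in H^{p}$ by ${(M ⊙ v)}_{j} = \oplus_{k} M_{j k} ⊙ v_{k}$ ; these conventions and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 4                    # U takes values in R^p; H = R^q is an assumption
u = rng.normal(size=p)         # one realization of U
e = rng.normal(size=q)         # one realization of the error
a = rng.normal(size=(p, q))    # rows a_j are elements of H
b = rng.normal(size=(p, q))

def inner_tp(v, w):
    """<v, w>_tp = sum_j <v_j, w_j> for p-by-q arrays v and w."""
    return float(np.sum(v * w))

def outer_op(e, a):
    """(e ⊗ e)(a): row j equals <e, a_j> e."""
    return np.outer(a @ e, e)

u_dot_e = np.outer(u, e)       # U ⊙ ε: row j equals u_j * e
lhs = inner_tp(u_dot_e, a) * inner_tp(u_dot_e, b)
rhs = inner_tp(np.outer(u, u) @ outer_op(e, a), b)
print(lhs, rhs)                # the two values agree up to rounding
```

Taking expectations on both sides, together with the fact that $〈 U ⊙ ε, a 〉_{tp}$ has mean zero when $E (ε | U) = 0$ , gives the covariance formula in the text.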

Theorem 7.1.

Under the assumptions (A1)–(A7), it holds that $\begin{matrix} \sqrt{n} ⊙ (\hat{β} ⊖ β) & = n^{- 1 / 2} ⊙ (\oplus_{i = 1}^{n} {(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1} ϱ_{X, i} ⊙ ε_{i}) + o_{p} (1) \\ & ⇝ G (0_{p}, Σ) . \end{matrix}$

The following corollary for ${\hat{β}}_{0} = \bar{Y} ⊖ (\oplus_{j = 1}^{p} {\bar{X}}_{j} ⊙ {\hat{β}}_{j})$ is an immediate consequence of Theorem 7.1. Recall that $β_{0} = E (Y) ⊖ (\oplus_{j = 1}^{p} E (X_{j}) ⊙ β_{j})$ .

Corollary 7.1.

Under the assumptions (A1)–(A7), it holds that $| | {\hat{β}}_{0} ⊖ β_{0} | | = O_{p} (n^{- 1 / 2})$ .

7.1.3 Comparison with HPLM

We make a theoretical comparison of $\hat{β}$ and ${\hat{β}}^{PLM}$ , the latter of which is defined at (3.11). Let $Σ^{PLM} = E [{(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} \cdot ϵ_{X} ϵ_{X}^{⊤} \cdot {(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} ⊙ (ε \otimes ε)]$ . Then, along the lines of the proof of Theorem 7.1 in the supplement S.11, we may prove that, under suitable conditions, $\begin{matrix} \sqrt{n} ⊙ ({\hat{β}}^{PLM} ⊖ β) ⇝ G (0_{p}, Σ^{PLM}) . \end{matrix}$ (7.1)

A special case of (7.1) for $H = R$ and d = 1 has been derived by Liang (2006). The result (7.1) for ${\hat{β}}^{PLM}$ is valid under the PLM and thus remains true under the PLAM as well. In case $ε$ is independent of $(X, Z)$ , $Σ^{PLM}$ reduces to ${(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} ⊙ E (ε \otimes ε)$ . The following definition is an extension of the classical notion of “asymptotic efficiency” for real-valued parameters to that for Hilbertian parameters.

Definition 7.1.

Let ${\hat{θ}}_{1}$ and ${\hat{θ}}_{2}$ be estimators of a parameter $θ$ in a separable Hilbert space $H$ such that $s_{n} ⊙ ({\hat{θ}}_{j} ⊖ θ) ⇝ G (0, Σ_{j})$ for a sequence s_n and covariance operators $Σ_{j}$ for j = 1, 2. We say that ${\hat{θ}}_{1}$ is asymptotically more efficient than ${\hat{θ}}_{2}$ if $Σ_{2} - Σ_{1}$ is a nonnegative definite operator, that is, $〈 (Σ_{2} - Σ_{1}) (h), h 〉 \geq 0$ for all $h \in H$ .

We note that ${(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} - {(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1}$ is nonnegative definite and it is zero only if $m_{X} = m_{X, +}$ . A direct computation shows that $\begin{matrix} {〈 (Σ^{PLM} - Σ) (a), a 〉}_{tp} \\ = E [(〈 ε, a_{1} 〉, \dots, 〈 ε, a_{p} 〉) \cdot ({(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} - {(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1}) \\ \cdot {(〈 ε, a_{1} 〉, \dots, 〈 ε, a_{p} 〉)}^{⊤}], \end{matrix}$ (7.2) which implies that $\hat{β}$ is asymptotically more efficient than ${\hat{β}}^{PLM}$ in the sense of Definition 7.1. Let d_j denote the jth diagonal entry of ${(E ϵ_{X} ϵ_{X}^{⊤})}^{- 1} - {(E ϱ_{X} ϱ_{X}^{⊤})}^{- 1}$ . Then, under the PLAM (3.1), it holds that $n \cdot (| | {\hat{β}}_{j}^{PLM} ⊖ β_{j} | |^{2} - | | {\hat{β}}_{j} ⊖ β_{j} | |^{2}) = S_{n j} + o_{p} (1)$ with $E (S_{n j}) = d_{j} \cdot E (| | ε | |^{2})$ , which is the gain by ${\hat{β}}_{j}$ against ${\hat{β}}_{j}^{PLM}$ under (3.1).
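One way to see why the difference of inverses is nonnegative definite, assuming $m_{X} (Z) = E (X | Z)$ , is to write $ϱ_{X} = ϵ_{X} + Δ (Z)$ with $Δ = m_{X} - m_{X, +}$ . Since $E (ϵ_{X} | Z) = 0$ , the cross terms vanish and $E (ϱ_{X} ϱ_{X}^{⊤}) = E (ϵ_{X} ϵ_{X}^{⊤}) + E (Δ Δ^{⊤})$ , so the claim follows from the order-reversing property of matrix inversion. The sketch below checks that property numerically on randomly generated stand-in matrices; they are not estimates from any data.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
G = rng.normal(size=(p, p))
A = G @ G.T + np.eye(p)      # stands in for E(eps_X eps_X^T), positive definite
D = rng.normal(size=(p, 2))
C = D @ D.T                  # stands in for E(Delta Delta^T), nonnegative definite
B = A + C                    # stands in for E(rho_X rho_X^T)

gap = np.linalg.inv(A) - np.linalg.inv(B)
eigvals = np.linalg.eigvalsh((gap + gap.T) / 2)  # symmetrize against rounding
print(eigvals.min())         # nonnegative up to rounding error
print(np.diag(gap))          # the d_j of the text: nonnegative diagonal entries
```

When C is the zero matrix, which corresponds to $m_{X} = m_{X, +}$ , the gap vanishes and the two estimators have the same asymptotic covariance operator.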

7.1.4 Estimation of Nonparametric Components

As we discussed in Section 3.3, ${\hat{m}}_{+}$ is uniquely decomposed into ${\hat{m}}_{+} = \oplus_{j = 1}^{d} {\hat{m}}_{j}$ with the constraints at (3.8). We study the error rates for ${\hat{m}}_{+}$ and ${\hat{m}}_{k}$ . The asymptotic results presented here do not follow immediately from the results in Jeon and Park (2020) since $Y_{i} ⊖ {\hat{β}}_{0} ⊖ (X_{i}^{⊤} ⊙ \hat{β}) \neq m_{+} (Z_{i}) \oplus ε_{i}$ . Furthermore, as seen in the assumption (A8), our work allows for a flexible range for the bandwidths b_k, instead of assuming $b_{k} ≍ n^{- 1 / 5}$ as in Jeon and Park (2020). Put $κ_{k} (τ) = b_{k}^{τ} + {(n b_{k})}^{- 1 / 2} \sqrt{\log n}$ for $τ = 1, 2$ and $1 \leq k \leq d$ . Also, let $δ_{k} = \max_{l \neq k} b_{l}^{2} + n^{- 1 / 2} \cdot {(\min_{l \neq k} b_{l})}^{- 1 / 2}$ .

Theorem 7.2.

Assume (A1)–(A8). Then, it holds that, for each $1 \leq k \leq d$ , $\begin{matrix} \sup_{z_{k} \in [2 b_{k}, 1 - 2 b_{k}]} | | {\hat{m}}_{k} (z_{k}) ⊖ m_{k} (z_{k}) | | & = O_{p} (κ_{k} (2) + δ_{k}), \\ \sup_{z_{k} \in [0, 1]} | | {\hat{m}}_{k} (z_{k}) ⊖ m_{k} (z_{k}) | | & = O_{p} (κ_{k} (1) + δ_{k}) . \end{matrix}$

Consequently, $\begin{matrix} \sup_{z \in [2 b_{1}, 1 - 2 b_{1}] \times \dots \times [2 b_{d}, 1 - 2 b_{d}]} | | {\hat{m}}_{+} (z) ⊖ m_{+} (z) | | & = O_{p} (\max_{1 \leq k \leq d} κ_{k} (2)), \\ \sup_{z \in {[0, 1]}^{d}} | | {\hat{m}}_{+} (z) ⊖ m_{+} (z) | | & = O_{p} (\max_{1 \leq k \leq d} κ_{k} (1)) . \end{matrix}$

In the case where $b_{k} ≍ n^{- 1 / 5}$ for all $1 \leq k \leq d$ , the rates in Theorem 7.2 reduce to the usual ones for univariate smoothing. Indeed, for each $1 \leq k \leq d$ , $\begin{matrix} \sup_{z_{k} \in [2 b_{k}, 1 - 2 b_{k}]} | | {\hat{m}}_{k} (z_{k}) ⊖ m_{k} (z_{k}) | | & = O_{p} (n^{- 2 / 5} \sqrt{\log n}), \\ \sup_{z_{k} \in [0, 1]} | | {\hat{m}}_{k} (z_{k}) ⊖ m_{k} (z_{k}) | | & = O_{p} (n^{- 1 / 5}) . \end{matrix}$
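The reduction to the univariate rates can be checked by plugging $b_{k} = n^{- 1 / 5}$ into the definitions of $κ_{k} (τ)$ and $δ_{k}$ . The helpers below are a small sketch with hypothetical names and an arbitrary sample size chosen for illustration.

```python
import math

def kappa(n, b_k, tau):
    """kappa_k(tau) = b_k^tau + (n b_k)^(-1/2) sqrt(log n)."""
    return b_k ** tau + math.sqrt(math.log(n) / (n * b_k))

def delta(n, b_others):
    """delta_k = max_{l != k} b_l^2 + n^(-1/2) (min_{l != k} b_l)^(-1/2)."""
    return max(b_others) ** 2 + (n * min(b_others)) ** -0.5

n = 10 ** 5
b = n ** -0.2                                # b_k = n^(-1/5) for every k
interior = kappa(n, b, 2) + delta(n, [b, b])
boundary = kappa(n, b, 1) + delta(n, [b, b])
# Ratios to the stated rates stay bounded as n grows.
print(interior / (n ** -0.4 * math.sqrt(math.log(n))))
print(boundary / n ** -0.2)
```

With this bandwidth choice, $κ_{k} (2)$ equals $n^{- 2 / 5} (1 + \sqrt{\log n})$ and $δ_{k}$ equals $2 n^{- 2 / 5}$ , so the interior rate is governed by the stochastic term while the boundary rate is governed by the bias term $b_{k}$ .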

We also derive the asymptotic distributions of ${\hat{m}}_{+}$ and ${\hat{m}}_{k}$ , which we defer to the supplement S.15.

7.2 Asymptotic Theory: Hilbertian Covariates

Here, we discuss the case where the covariates in the linear part of the model, which we denoted by $X_{j}$ for $1 \leq j \leq p$ , take values in $H$ . Define now $ϱ_{X_{j}} = X_{j} ⊖ m_{X_{j}, +} (Z)$ . Recall that $E (〈〈 ϱ_{X}, ϱ_{X} 〉〉)$ is the p × p matrix whose (j, k) element equals $E (〈 ϱ_{X_{j}}, ϱ_{X_{k}} 〉)$ . We first introduce some technical assumptions for Hilbertian $X_{j}$ . We note that the assumptions (A1), (A6) and (A8) in Section 7.1 still apply in the present case, which we call (B1), (B6), and (B8), respectively, here. The corresponding versions of the others are given below.

(B2) The $H$ -valued $m_{X_{j}, +}$ are twice continuously Fréchet differentiable on ${[0, 1]}^{d}$ and the $H$ -valued $m_{k}$ at (4.1) are twice continuously Fréchet differentiable on $[0, 1]$ .
(B3) The Hilbertian error variable $ε$ at (4.1) satisfies $E (| | ε | |^{α}) < \infty$ for some $α > 12 / 5$ . Also, there exists a constant $0 < C_{1} < \infty$ such that $E (| | ε | |^{2} | X, Z) \leq C_{1}$ a.s.
(B4) The p × p matrix $E (〈〈 ϱ_{X}, ϱ_{X} 〉〉)$ is invertible.
(B5) There exist constants $0 < L, C_{2} < \infty$ such that $\max_{1 \leq j \leq p} E (\exp (| | X_{j} | | / L) | Z) \leq C_{2}$ a.s.
(B7) The bandwidths h_k satisfy $h_{k} ≍ n^{- γ}$ for some γ with $1 / 6 < γ < \min \{1 / 3, (α - 2) / α\}$ .

The conditions (B3) and (B7) are the same as (A3) and (A7), respectively, for finite-dimensional $H$ . For infinite-dimensional $H$ , we are able to derive only a slower rate for $\hat{β}$ and ${\hat{β}}_{0}$ ; see the second parts of Theorem 7.3 and Corollary 7.2 below. For such a rate of convergence, we find that (B3) and (B7) are sufficient. The following proposition is an analogue of Proposition 7.1 for Hilbertian $X_{j}$ .

Proposition 7.2.

Assume (B1), (B2) for $m_{X_{j}, +}$ and (B4)–(B7). Then, $n^{- 1} \sum_{i = 1}^{n} 〈〈 {\tilde{X}}_{i}, {\tilde{X}}_{i} 〉〉 = E (〈〈 ϱ_{X}, ϱ_{X} 〉〉) + O_{p} (n^{- 2 γ} + n^{- (1 - γ) / 2} \sqrt{\log n}),$ so that $n^{- 1} \sum_{i = 1}^{n} 〈〈 {\tilde{X}}_{i}, {\tilde{X}}_{i} 〉〉$ is invertible with probability tending to one.

We now derive the asymptotic properties of $\hat{β}$ . Note that $E (〈〈 ϱ_{X}, ε 〉〈 ε, ϱ_{X} 〉〉)$ is the p × p matrix whose (j, k) element is given by $E (〈 ϱ_{X_{j}}, ε 〉〈 ϱ_{X_{k}}, ε 〉)$ . Define $Σ_{p} = {(E (〈〈 ϱ_{X}, ϱ_{X} 〉〉))}^{- 1} \cdot E (〈〈 ϱ_{X}, ε 〉〈 ε, ϱ_{X} 〉〉) \cdot {(E (〈〈 ϱ_{X}, ϱ_{X} 〉〉))}^{- 1} .$

Let $N_{p} (0_{p}, Σ_{p})$ denote the multivariate normal distribution with mean $0_{p} = {(0, \dots, 0)}^{⊤} \in R^{p}$ and covariance matrix Σ_p.
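The sandwich form of $Σ_{p}$ suggests a simple sample analog, sketched below under the assumption that $H = R^{q}$ with the Euclidean inner product. This is not a procedure described in the article: the function name is hypothetical, the inputs are synthetic, and in practice the unobserved $ϱ_{X, i}$ and $ε_{i}$ would have to be replaced by estimated residuals.

```python
import numpy as np

def sandwich_cov(rho, eps):
    """Sample analog of Sigma_p.

    rho: array of shape (n, p, q); rho[i, j] plays the role of rho_{X_j} for unit i.
    eps: array of shape (n, q); eps[i] plays the role of the error for unit i.
    """
    n = rho.shape[0]
    A = np.einsum("ijq,ikq->jk", rho, rho) / n   # average of <rho_j, rho_k>
    s = np.einsum("ijq,iq->ij", rho, eps)        # s[i, j] = <rho_ij, eps_i>
    B = s.T @ s / n                              # average of <rho_j, eps><rho_k, eps>
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

# Synthetic inputs for illustration only.
rng = np.random.default_rng(2)
rho = rng.normal(size=(200, 2, 5))
eps = rng.normal(size=(200, 5))
Sigma_p_hat = sandwich_cov(rho, eps)
print(Sigma_p_hat)
```

The result is a symmetric, nonnegative definite p × p matrix by construction, mirroring the structure of $Σ_{p}$ .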

Theorem 7.3.

Assume (B1)–(B7). If $H$ is finite-dimensional, then it holds that $\begin{matrix} \sqrt{n} (\hat{β} - β) & = n^{- 1 / 2} \cdot \sum_{i = 1}^{n} {(E 〈〈 ϱ_{X}, ϱ_{X} 〉〉)}^{- 1} 〈〈 ϱ_{X, i}, ε_{i} 〉〉 + o_{p} (1) \\ & ⇝ N_{p} (0_{p}, Σ_{p}) . \end{matrix}$

If $H$ is infinite-dimensional, then $\hat{β} - β = O_{p} (n^{- 2 γ} \log n + n^{- (1 - γ) / 2} {(\log n)}^{3 / 2})$ .

The following corollary for ${\hat{β}}_{0} = \bar{Y} ⊖ \oplus_{j = 1}^{p} {\hat{β}}_{j} ⊙ {\bar{X}}_{j}$ is an immediate consequence of Theorem 7.3. Recall that $β_{0} = E (Y) ⊖ \oplus_{j = 1}^{p} β_{j} ⊙ E (X_{j})$ .

Corollary 7.2.

Assume (B1)–(B7). If $H$ is finite-dimensional, then $| | {\hat{β}}_{0} ⊖ β_{0} | | = O_{p} (n^{- 1 / 2})$ . If $H$ is infinite-dimensional, then $| | {\hat{β}}_{0} ⊖ β_{0} | | = O_{p} (n^{- 2 γ} \log n + n^{- (1 - γ) / 2} {(\log n)}^{3 / 2})$ .
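To see how the infinite-dimensional rate compares with $n^{- 1 / 2}$ , one can ignore logarithmic factors and look at the polynomial exponent $\min \{2 γ, (1 - γ) / 2\}$ over the range of γ in (B7). The sketch below uses a hypothetical helper name and assumes α is large enough that the upper end of the range is 1/3.

```python
def poly_exponent(gamma):
    """Exponent r such that the bound is n^(-r) up to logarithmic factors."""
    return min(2 * gamma, (1 - gamma) / 2)

# Grid over the open interval (1/6, 1/3) allowed by (B7).
grid = [1 / 6 + i * (1 / 3 - 1 / 6) / 1000 for i in range(1, 1000)]
best = max(grid, key=poly_exponent)
print(best, poly_exponent(best))
```

The two terms balance at γ = 1/5, where the exponent is 2/5, so the bound for infinite-dimensional $H$ is slower than the parametric rate for every admissible bandwidth.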

Next, we present the asymptotic properties of ${\hat{m}}_{+}$ and ${\hat{m}}_{k}$ satisfying the constraints at (3.8). As in Section 7.1, we let $(b_{k} : 1 \leq k \leq d)$ denote the set of bandwidths for defining (2.5) with $W_{i} = Y_{i} ⊖ {\hat{β}}_{0} ⊖ \oplus_{j = 1}^{p} ({\hat{β}}_{j} ⊙ X_{i j})$ . Recall $κ_{k} (τ) = b_{k}^{τ} + {(n b_{k})}^{- 1 / 2} \sqrt{\log n}$ and $δ_{k} = \max_{l \neq k} b_{l}^{2} + n^{- 1 / 2} \cdot {(\min_{l \neq k} b_{l})}^{- 1 / 2}$ . Put ${\tilde{δ}}_{k} = δ_{k}$ if $H$ is finite-dimensional, and ${\tilde{δ}}_{k} = δ_{k} + n^{- 2 γ} {(\log n)}^{2} + n^{- (1 - γ) / 2} {(\log n)}^{5 / 2}$ if $H$ is infinite-dimensional.

Theorem 7.4.

Assume (B1)–(B8). Then, Theorem 7.2 continues to hold with δ_k replaced by ${\tilde{δ}}_{k}$ .

We also derive the asymptotic distributions of ${\hat{m}}_{+}$ and ${\hat{m}}_{k}$ in the case of Hilbertian $X_{j}$ , which we defer to the supplement S.18.

Supplementary Materials

The supplementary material contains the description of the CBS algorithm, the geometries of Bayes-Hilbert spaces and simplices, and an additional real data application. It also includes two additional theorems for the asymptotic distributions of the estimators of the nonparametric components of the partially linear additive models, and all technical proofs.


Acknowledgments

The authors thank an Associate Editor and two referees for giving thoughtful and constructive comments on the earlier versions of the article.

Additional information

Funding

Byeong U. Park’s research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2019R1A2C3007355). Kyusang Yu’s research was supported by the National Research Foundation of Korea (NRF) funded by the Korea government (MSIT) NRF-2021R1A4A5032622.

References

Benatia, D., Carrasco, M., and Florens, J.-P. (2017), “Functional Linear Regression with Functional Response,” Journal of Econometrics, 201, 269–291. DOI: 10.1016/j.jeconom.2017.08.008.
Bhattacharya, P. K., and Zhao, P.-L. (1997), “Semiparametric Inference in a Partial Linear Model,” The Annals of Statistics, 25, 244–262. DOI: 10.1214/aos/1034276628.
Bissantz, N., Dette, H., Hildebrandt, T., and Bissantz, K. (2016), “Smooth Backfitting in Additive Inverse Regression,” Annals of the Institute of Statistical Mathematics, 68, 827–853. DOI: 10.1007/s10463-015-0517-x.
Bosq, D. (2000), Linear Processes in Function Spaces, New York: Springer.
Buja, A., Hastie, T., and Tibshirani, R. (1989), “Linear Smoothers and Additive Models,” (with discussion), The Annals of Statistics, 17, 453–555. DOI: 10.1214/aos/1176347115.
Hall, P., and Horowitz, J. L. (2007), “Methodology and Convergence Rates for Functional Linear Regression,” The Annals of Statistics, 35, 70–91. DOI: 10.1214/009053606000000957.
Han, K., and Park, B. U. (2018), “Smooth Backfitting for Errors-in-Variables Additive Models,” The Annals of Statistics, 46, 2216–2250. DOI: 10.1214/17-AOS1617.
Han, K., Müller, H.-G., and Park, B. U. (2020), “Additive Functional Regression for Densities as Responses,” Journal of the American Statistical Association, 115, 997–1010. DOI: 10.1080/01621459.2019.1604365.
Horváth, L., and Kokoszka, P. (2012), Inference for Functional Data with Applications, New York: Springer.
Jeon, J. M., and Park, B. U. (2020), “Additive Regression with Hilbertian Responses,” The Annals of Statistics, 48, 2671–2697. DOI: 10.1214/19-AOS1902.
Jeon, J. M., Park, B. U., and Van Keilegom, I. (2021), “Additive Regression with Non-Euclidean Responses and Predictors,” The Annals of Statistics, 49, 2611–2641. DOI: 10.1214/21-AOS2048.
Jeon, J. M., and Van Bever, G. (2022), “Additive Regression with General Imperfect Variables,” arXiv:2212.05745.
Lee, E. R., Han, K., and Park, B. U. (2018), “Estimation of Errors-in-Variables Partially Linear Additive Models,” Statistica Sinica, 28, 2353–2373.
Lee, Y. K., Mammen, E., and Park, B. U. (2010), “Backfitting and Smooth Backfitting for Additive Quantile Models,” The Annals of Statistics, 38, 2857–2883. DOI: 10.1214/10-AOS808.
Lee, Y. K., Mammen, E., and Park, B. U. (2012), “Flexible Generalized Varying Coefficient Regression Models,” The Annals of Statistics, 40, 1906–1933.
Liang, H. (2006), “Estimation in Partially Linear Models and Numerical Comparisons,” Computational Statistics & Data Analysis, 50, 675–687. DOI: 10.1016/j.csda.2004.10.007.
Liang, H., Thurston, S. W., Ruppert, D., Apanasovich, T., and Hauser, R. (2008), “Additive Partially Linear Models with Measurement Errors,” Biometrika, 95, 667–678. DOI: 10.1093/biomet/asn024.
Lin, Z., Müller, H.-G., and Park, B. U. (2022), “Additive Models for Symmetric Positive-Definite Matrices and Lie Groups,” Biometrika (to appear). DOI: 10.1093/biomet/asac055.
Linton, O., Sperlich, S., and van Keilegom, I. (2008), “Estimation of a Semiparametric Transformation Model,” The Annals of Statistics, 36, 686–718. DOI: 10.1214/009053607000000848.
Mammen, E., Linton, O., and Nielsen, J. P. (1999), “The Existence and Asymptotic Properties of a Backfitting Projection Algorithm under Weak Conditions,” The Annals of Statistics, 27, 1443–1490. DOI: 10.1214/aos/1017939138.
Opsomer, J. D., and Ruppert, D. (1997), “Fitting a Bivariate Additive Model by Local Polynomial Regression,” The Annals of Statistics, 25, 186–211. DOI: 10.1214/aos/1034276626.
Opsomer, J. D., and Ruppert, D. (1999), “A Root-n Consistent Backfitting Estimator for Semiparametric Additive Modeling,” Journal of Computational and Graphical Statistics, 8, 715–732.
Ramsay, J. O., and Silverman, B. W. (2005), Functional Data Analysis, New York: Springer.
Severini, T., and Wong, W. (1992), “Profile Likelihood and Conditionally Parametric Models,” The Annals of Statistics, 20, 1768–1802. DOI: 10.1214/aos/1176348889.
van de Geer, S. (2000), Empirical Processes in M-Estimation, Cambridge: Cambridge University Press.
Yao, F., Müller, H.-G., and Wang, J.-L. (2005), “Functional Linear Regression Analysis for Longitudinal Data,” The Annals of Statistics, 33, 2873–2903. DOI: 10.1214/009053605000000660.
Yu, K., Mammen, E., and Park, B. U. (2011), “Semi-Parametric Regression: Efficiency Gains from Modeling the Nonparametric Part,” Bernoulli, 17, 736–748. DOI: 10.3150/10-BEJ296.
Yu, K., Park, B. U., and Mammen, E. (2008), “Smooth Backfitting in Generalized Additive Models,” The Annals of Statistics, 36, 228–260. DOI: 10.1214/009053607000000596.
Zhang, X., Park, B. U., and Wang, J.-L. (2013), “Time-Varying Additive Models for Longitudinal Data,” Journal of the American Statistical Association, 108, 983–998. DOI: 10.1080/01621459.2013.778776.
Table 1 Mean squared error (MSE), squared bias (SB), and variance of ${\hat{β}}_{j}$ , as defined at (5.8), and mean integrated squared error (MISE), integrated squared bias (ISB), and integrated variance (IV), as defined at (5.9), under Model 1, multiplied by 10³.

Table 2 Values of measures of performance for ${\hat{β}}_{j}$ and ${\hat{m}}_{+}$ ( ${\hat{m}}_{1, 2}$ ) under Models 2, 3, and 4 where Z_j are correlated, multiplied by 10³.
