
Efficient estimation of smoothing spline with exact shape constraints

Pages 55-69 | Received 14 Feb 2019, Accepted 24 Jan 2020, Published online: 07 Feb 2020

ABSTRACT

The smoothing spline is a popular method in non-parametric function estimation. In the analysis of data from real applications, specific shapes are often required of the estimated function to ensure that it does not deviate from domain knowledge. In this work, we focus on constructing an exact shape-constrained smoothing spline with efficient estimation. Here 'exact' means that the shape constraint is imposed on an infinite set, such as an interval in the one-dimensional case. The estimation thus becomes a so-called semi-infinite optimisation problem with an infinite number of constraints. The proposed method establishes a sufficient and necessary condition for transforming the exact shape constraints into a finite number of constraints, leading to efficient estimation of the shape-constrained functions. The performance of the proposed methods is evaluated by both simulation and real case studies.

1. Introduction

In recent years, non-parametric smoothing methods have gained popularity in various science and engineering areas such as economics, biology, and smart manufacturing. One advantage of these methods is that they do not assume a strong parametric structure for the underlying model. However, in the analysis of data from real applications, researchers often demand a specific shape for the estimated function to ensure that it does not deviate from their domain knowledge. For example, monotonicity is often required in the estimation of dose-response functions (Kelly & Rice, Citation1990) in medicine. The degradation curve of scaffolds in engineered scaffold fabrication is often required to be monotonic and concave (Zeng, Deng, & Yang, Citation2016). The estimation of human growth curves (Ducharme & Fontez, Citation2004) in biometrics and of utility functions (Matzkin, Citation1991) in economics often requires concavity.

Among various non-parametric smoothers, spline smoothing and kernel smoothing are quite popular, and the theoretical and numerical properties of these techniques have been well studied; see Wahba (Citation1990) and Green and Silverman (Citation1993) for thorough discussions of spline smoothing, and Fan and Gijbels (Citation1996) and Wand and Jones (Citation1995) for kernel smoothing. Unlike their unconstrained counterparts, shape-constrained smoothers have not received as much attention in the statistics literature. As pointed out in Delecroix and Thomas-Agnan (Citation2000), most isotonic estimates are based on splines rather than on kernels, since enforcing the restrictions at the minimisation step appears to be a natural solution.

Various spline methods enable shape constraints. For example, the B-spline is a popular approach because of a special property: non-decreasing coefficients imply a non-decreasing resulting function (Brezger & Steiner, Citation2003; Dierckx, Citation1980; He & Shi, Citation1998; Kelly & Rice, Citation1990). There are also I-spline methods (Curry & Schoenberg, Citation1966; Ramsay, Citation1988) that integrate non-negative M-splines to construct monotone smoothers. Meyer (Citation2008) defined C-splines, which impose the shape constraint at each observation; combinations of monotonicity and convexity can be imposed by such regression splines. Meyer (Citation2012) also developed penalised splines under shape constraints. Liao and Meyer (Citation2017) proposed change-point estimators based on constrained splines. Meyer (Citation2018) proposed a constrained generalised additive model using an iteratively reweighted cone projection algorithm. The smoothing spline is another popular approach, of which the most notable works include Wang and Li's (Citation2008) second-order cone programming to create a monotone smoothing spline and Turlach's (Citation2005) approach of adaptively adding constraints to create a shape-constrained smoothing spline.

In this work, we consider the smoothing spline for the study of exact shape constraints. Here 'exact' means that the shape constraint is imposed on an infinite set, such as an interval in the one-dimensional case. This leads to a so-called semi-infinite optimisation problem with an infinite number of constraints. Suppose that the observational data are $(x_i, y_i)$ for $i = 1, \ldots, n$, and assume $L \le x_1 < \cdots < x_n \le U$. The exact shape-constrained smoothing spline is defined as the solution of the following optimisation problem:

$$\underset{f}{\text{minimise}} \;\; \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 + \lambda \int_a^b \left[f^{(2)}(t)\right]^2 dt, \tag{1a}$$

$$\text{subject to} \;\; f^{(r)}(x) \ge 0 \quad \text{for } x \in [L, U], \tag{1b}$$

where $f^{(r)}(x)$ is the $r$th derivative of $f(x)$ and $\lambda \ge 0$ is a tuning parameter. The formulation in (1a) without the constraint is the well-known smoothing spline problem, whose solution is the cubic smoothing spline over the class of twice-differentiable functions. The framework with r = 1 and r = 2 corresponds to the monotone non-decreasing and convex shape constraints, respectively. The monotone decreasing or concave constraint is obtained by reversing the inequality sign in (1b). One can pursue any one of the constraints, or several of them, under this framework. For example, a non-global constraint, such as convex for $x \le 0$ and concave for $x > 0$, is possible. A mixed constraint, such as the combination of concave and monotone increasing, can also be applied.

The challenge for the estimation in (1) is that the constraints are defined on an infinite set, which implies an infinite number of constraints. By taking advantage of the close connection between the natural cubic spline and the smoothing spline, the proposed method uses a suitable representation of the smoothing spline to establish a sufficient and necessary condition for transforming the exact shape constraints in (1b) into a finite number of constraints. The resulting solution for the case r = 2 is straightforward; the challenge arises when r = 1. To the best of our knowledge, an exact solution for r = 1 has yet to appear in the literature. To facilitate the computation of parameter estimation, we also develop an efficient algorithm based on matrix approximation for large data.

The remainder of the paper is organised as follows. Section 2 revisits the connection between natural cubic splines and smoothing splines. The proposed exact shape-constrained smoothing spline is detailed in Section 3. An efficient computational algorithm for parameter estimation is developed in Section 4. Sections 5 and 6 evaluate the performance of the proposed method through a simulation study and an application to real-life data. We conclude with some discussion in Section 7.

2. Revisiting the natural cubic spline and the smoothing spline

The natural cubic spline (NCS) plays an essential role in the smoothing spline. Suppose that the observed data are $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_1 < \cdots < x_n$. An NCS $f(x)$ with knots at $x_1, \ldots, x_n$ is a piecewise polynomial of degree up to three with breakpoints at $x_1, \ldots, x_n$. In addition, $f(x)$ is twice continuously differentiable at the knots and linear beyond the boundary.

Let $f(x)$ be an NCS with knots at $x_1, \ldots, x_n$. By definition, $f(x)$ can be expressed as

$$f(x) = \begin{cases} f_0(x) = c_0 x + d_0, & x < x_1,\\ f_i(x) = a_i x^3 + b_i x^2 + c_i x + d_i, & x_i \le x < x_{i+1} \text{ for } i = 1, \ldots, n-1,\\ f_n(x) = c_n x + d_n, & x \ge x_n, \end{cases} \tag{2a}$$

with restrictions

$$f_i(x_{i+1}) = f_{i+1}(x_{i+1}), \quad f_i'(x_{i+1}) = f_{i+1}'(x_{i+1}), \quad f_i''(x_{i+1}) = f_{i+1}''(x_{i+1}) \quad \text{for } i = 0, \ldots, n-1. \tag{2b}$$

The derivatives of $f(x)$ are obtained by differentiating each polynomial while maintaining the relevant continuity constraints; please refer to Appendix 1 for explicit expressions. This piecewise polynomial representation of an NCS in (2) is the key to formulating the shape constraints in Section 3. For the estimation of $f(x)$, however, there exists another representation for computational purposes. Specifically, we first estimate $f(x_1), \ldots, f(x_n)$ by writing them as a linear combination of specific basis functions and estimating the corresponding coefficients. Then the value-second derivative representation (Green & Silverman, Citation1993) recovers the entire function $f(x)$. As a result, the problem in (1a) can be converted into a ridge regression-like problem that can be solved efficiently.

Let $\mathbf{1}_n$ be the length-$n$ vector of ones, $\mathbf{x} = (x_1, \ldots, x_n)^T$ and $\mathbf{g} = (f(x_1), \ldots, f(x_n))^T$. Without loss of generality, we assume $\mathbf{x}$ is centred with zero mean. We can construct the banded matrices $Q$ and $R$ according to Equations (A1) and (A2) in Appendix A.1. The linear mixed model representation described in Appendix A.2 allows us to rewrite the NCS formulation in (2) as

$$\mathbf{g} = \mathbf{1}_n \theta_0 + \mathbf{x}\, \theta_1 + A\boldsymbol{\beta}, \tag{3}$$

where $A = Q(Q^T Q)^{-1} R^{1/2} \in \mathbb{R}^{n \times (n-2)}$, and $\theta_0$, $\theta_1$ and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{n-2})^T$ are parameters. By construction, the matrix $A$ has full rank. It is easy to check that $\mathbf{1}_n^T \mathbf{x} = 0$, $\mathbf{1}_n^T A = \mathbf{0}$ and $\mathbf{x}^T A = \mathbf{0}$. Hence $\{\mathbf{1}_n, \mathbf{x}, A\}$ forms a basis of the $n$-dimensional Euclidean space. Furthermore, if we define the matrix $K = Q R^{-1} Q^T \in \mathbb{R}^{n \times n}$, then

$$\int \left[f''(t)\right]^2 dt = \mathbf{g}^T K \mathbf{g} = \boldsymbol{\beta}^T \boldsymbol{\beta}. \tag{4}$$

It is worth pointing out that the underlying model for the smoothing spline can be considered as a natural cubic spline with knots at $x_1, \ldots, x_n$ and at most $k = 2[n/2] + 2$ additional knots whose locations are unknown (Utreras, Citation1985). Here we make the mild assumption that $f(x)$ is a natural cubic spline with knots at $x_1, \ldots, x_n$. Combining this assumption with the results in (3) and (4), the smoothing spline problem in (1a) is equivalent to

$$\underset{(\theta_0, \theta_1, \boldsymbol{\beta})}{\text{minimise}} \;\; \sum_{i=1}^{n} \left(y_i - \theta_0 - x_i\theta_1 - A_i\boldsymbol{\beta}\right)^2 + \lambda\, \boldsymbol{\beta}^T\boldsymbol{\beta}, \tag{5a}$$

where $A_i$ is the $i$th row of the matrix $A$. This is simply a ridge regression problem with response vector $\mathbf{y}$ and covariate matrix $(\mathbf{1}_n, \mathbf{x}, A)$, without penalising $\theta_0$ and $\theta_1$.
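To make the ridge form of (5a) concrete, the following is a minimal Python sketch (not the authors' implementation) that builds $Q$, $R$ and $A = Q(Q^TQ)^{-1}R^{1/2}$ from Appendix A.1 and solves the unconstrained problem; the function names `make_QR` and `fit_ss_ridge` are our own labels.

```python
import numpy as np
from scipy.linalg import sqrtm

def make_QR(x):
    """Banded matrices Q (n x (n-2)) and R ((n-2) x (n-2)) from (A1)-(A2)."""
    n = len(x)
    h = np.diff(x)                                   # h_i = x_{i+1} - x_i
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):                           # column j <-> interior knot x_{j+2}
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j < n - 3:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q, R

def fit_ss_ridge(x, y, lam):
    """Solve the unconstrained ridge problem (5a); theta_0, theta_1 unpenalised."""
    n = len(x)
    x = x - x.mean()                                 # centre x as assumed in Section 2
    Q, R = make_QR(x)
    A = Q @ np.linalg.inv(Q.T @ Q) @ np.real(sqrtm(R))   # A = Q (Q^T Q)^{-1} R^{1/2}
    X = np.column_stack([np.ones(n), x, A])
    P = np.diag([0.0, 0.0] + [1.0] * (n - 2))        # penalise beta only
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)   # (theta_0, theta_1, beta)
```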

Once the estimate of $\boldsymbol{\theta} = (\theta_0, \theta_1, \boldsymbol{\beta})^T$ is obtained, the entire estimated function can be constructed by following the procedure in Appendix A.3. One can also obtain the piecewise polynomial representation of the estimated function by following the steps in Appendix A.4. That is, the piecewise polynomial representation is fully specified when $(\theta_0, \theta_1, \boldsymbol{\beta})$ are known.

3. Exact shape-constrained smoothing spline

To impose the exact shape constraint, one major difficulty is that the inequality constraint in (1b) cannot be guaranteed by simply enforcing the constraints at $x_1, \ldots, x_n$. Because of this, many computable 'solutions' to the shape-constrained smoothing spline problem in (1a) and (1b) are approximations of some kind. Some approximate by assuming that the resulting function is a natural cubic spline with knots at all data points (Turlach, Citation2005; Wang & Li, Citation2008); others approximate by discretising the infinite constraint (1b) into a finite number of constraints (Mammen & Thomas-Agnan, Citation1999; Nagahara & Martin, Citation2013; Villalobos & Wahba, Citation1987).

It is known that the solution to the exact shape-constrained smoothing spline in (1a) and (1b) is a natural cubic spline with knots at $x_1, \ldots, x_n$ and at most $k = 2[n/2] + 2$ additional knots whose locations are unknown, as proved in Theorem 3.3 of Utreras (Citation1985). Unfortunately, this result is of limited practical use because the locations of the additional knots are unknown. It does suggest, however, that a natural cubic spline with knots at all data points is an adequate approximation to the theoretically correct model characterised by Utreras (Citation1985). We therefore develop the proposed method under the assumption that the estimated model is a natural cubic spline with knots at all data points. Specifically, we propose a representation using only n−1 constraints that is equivalent to the infinite constraint (1b) for r = 1 or 2. Compared to Turlach (Citation2005), who took an adaptive approach of adding constraints and thus changing the underlying quadratic program for parameter estimation, the proposed method optimises over a larger underlying model space while maintaining the exactness of the shape constraint. Unlike Wang and Li (Citation2008), which only handles the monotonicity constraint (r = 1), the proposed method also handles the convexity constraint (r = 2) and can easily be extended to mixed and non-global constraints.

The key idea of the proposed method is to use the piecewise polynomial representation of the NCS to provide a sufficient and necessary condition for converting the constraint (1b), for r = 2 and r = 1, into the form $c(\boldsymbol{\theta}; \mathbf{x}) \succeq \mathbf{0}$, where $\succeq$ denotes element-wise greater than or equal to. The shape-constrained smoothing spline problem can then be expressed as

$$\underset{(\theta_0, \theta_1, \boldsymbol{\beta})}{\text{minimise}} \;\; \sum_{i=1}^{n} \left(y_i - \theta_0 - x_i\theta_1 - A_i\boldsymbol{\beta}\right)^2 + \lambda\, \boldsymbol{\beta}^T\boldsymbol{\beta}, \tag{5a}$$

$$\text{subject to} \;\; c(\boldsymbol{\theta}; \mathbf{x}) \succeq \mathbf{0}. \tag{5b}$$

This formulation can be optimised by many standard optimisation methods that accept non-linear constraints.
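For illustration, a minimal sketch of passing (5a)-(5b) to an off-the-shelf constrained optimiser is given below. We use scipy's trust-constr solver purely as a stand-in for the ralg/OpenOPT solver of Section 4; `make_QR` is from the earlier Section 2 sketch, and `c_fn` is any function returning the constraint vector of Theorem 3.1 or 3.2.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import minimize, NonlinearConstraint

def fit_scss(x, y, lam, c_fn, theta0):
    """Sketch: minimise (5a) subject to the element-wise constraint c(theta; x) >= 0."""
    Q, R = make_QR(x)
    A = Q @ np.linalg.inv(Q.T @ Q) @ np.real(sqrtm(R))
    X = np.column_stack([np.ones(len(x)), x, A])

    def objective(theta):
        r = y - X @ theta
        return r @ r + lam * theta[2:] @ theta[2:]   # theta_0, theta_1 unpenalised

    nlc = NonlinearConstraint(c_fn, 0.0, np.inf)     # c(theta; x) >= 0, element-wise
    res = minimize(objective, theta0, method="trust-constr", constraints=[nlc])
    return res.x
```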

The shape constraint (1b) for r = 1 (monotonicity) and r = 2 (convexity) is addressed in Theorems 3.1 and 3.2 below, respectively. Mixed constraints can be achieved by combining the corresponding constraints.

Theorem 3.1

For the smoothing spline, the monotone non-decreasing constraint, defined as $f'(x) \ge 0$, holds if and only if constraint (5b) holds with

$$c(\boldsymbol{\theta}; \mathbf{x}) = \left(c_1(\boldsymbol{\theta}; \mathbf{x}), \ldots, c_{n-1}(\boldsymbol{\theta}; \mathbf{x})\right)^T, \quad \text{where} \quad c_i(\boldsymbol{\theta}; \mathbf{x}) = \begin{cases} \min\!\left(f'(x_i),\, f'(x_{i+1}),\, f'\!\left(-\tfrac{b_i}{3a_i}\right)\right), & \text{if } -\tfrac{b_i}{3a_i} \in (x_i, x_{i+1}) \text{ and } a_i > 0,\\[4pt] \min\!\left(f'(x_i),\, f'(x_{i+1})\right), & \text{otherwise.} \end{cases} \tag{6}$$

Proof.

Based on the piecewise polynomial representation of the NCS, the first derivative $f'(x)$ is

$$f'(x) = \begin{cases} f_0'(x) = c_0, & x < x_1,\\ f_i'(x) = 3a_i x^2 + 2b_i x + c_i, & x_i \le x < x_{i+1} \text{ for } i = 1, \ldots, n-1,\\ f_n'(x) = c_n, & x \ge x_n, \end{cases} \tag{7}$$

with restrictions $f_i'(x_{i+1}) = f_{i+1}'(x_{i+1})$ and $f_i''(x_{i+1}) = f_{i+1}''(x_{i+1})$ for $i = 0, \ldots, n-1$. Clearly, $f'(x)$ is a continuous piecewise polynomial of at most second order on each interval $[x_1, x_2), \ldots, [x_{n-1}, x_n)$, and constant on the boundary intervals $(-\infty, x_1)$ and $[x_n, \infty)$. For $i = 1, \ldots, n-1$, if $a_i = 0$, then $f_i'(x)$ is a linear function on $[x_i, x_{i+1})$, so $f'(x) \ge \min(f'(x_i), f'(x_{i+1}))$ on the interval. If $a_i \ne 0$, then $f_i'(x)$ is a parabola on $[x_i, x_{i+1})$ with stationary point at $-b_i/(3a_i)$; specifically:

  1. If $a_i < 0$, $f_i'(x)$ is concave, so regardless of the location of the stationary point, $f'(x) \ge \min(f'(x_i), f'(x_{i+1}))$ on $[x_i, x_{i+1})$.

  2. If $a_i > 0$, $f_i'(x)$ is a convex parabola whose stationary point may or may not lie in $[x_i, x_{i+1})$.

    1. If the stationary point is not in the interval, $f_i'(x)$ is monotone on $[x_i, x_{i+1})$, so $f'(x) \ge \min(f'(x_i), f'(x_{i+1}))$ on the interval.

    2. If the stationary point is in the interval, the minimum can be attained at either a boundary or the stationary point, so $f'(x) \ge \min\!\left(f'(x_i), f'(x_{i+1}), f'(-b_i/(3a_i))\right)$ on the interval.

Non-negativity on the boundary intervals $(-\infty, x_1)$ and $[x_n, \infty)$ holds if $c_0 \ge 0$ and $c_n \ge 0$. No extra constraints are needed, however: by the continuity of $f'(x)$, $c_0 = f_1'(x_1)$ and $c_n = f_{n-1}'(x_n)$, so non-negativity is already ensured by $c_1(\boldsymbol{\theta}; \mathbf{x}) \ge 0$ and $c_{n-1}(\boldsymbol{\theta}; \mathbf{x}) \ge 0$.

If $c(\boldsymbol{\theta}; \mathbf{x}) \succeq \mathbf{0}$, then each piecewise polynomial $f_i'(x)$ for $i = 0, \ldots, n$ is non-negative, which in turn implies that the whole function $f'(x)$ is non-negative. Therefore, validity of inequality (5b) implies the monotone non-decreasing constraint.

The other direction is obvious by definition.
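As a concrete illustration of condition (6), the following sketch evaluates $c(\boldsymbol{\theta}; \mathbf{x})$ given the piecewise cubic coefficients $a_i, b_i, c_i$ of (2a) (recoverable from $\boldsymbol{\theta}$ via Appendix A.4); the function name is our own.

```python
import numpy as np

def monotone_constraints(a, b, c, x):
    """Vector c(theta; x) of Theorem 3.1: lower bounds on f' over each [x_i, x_{i+1})."""
    fp = lambda t, i: 3 * a[i] * t**2 + 2 * b[i] * t + c[i]   # f_i'(t)
    out = np.empty(len(x) - 1)
    for i in range(len(x) - 1):
        vals = [fp(x[i], i), fp(x[i + 1], i)]
        if a[i] > 0:                                 # convex parabola: interior minimum possible
            t_star = -b[i] / (3 * a[i])              # stationary point of f_i'
            if x[i] < t_star < x[i + 1]:
                vals.append(fp(t_star, i))
        out[i] = min(vals)                           # require out >= 0 element-wise
    return out
```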

Theorem 3.2

For the smoothing spline, the convexity constraint, defined as $f''(x) \ge 0$, holds if and only if constraint (5b) holds with

$$c(\boldsymbol{\theta}; \mathbf{x}) = \left(c_1(\boldsymbol{\theta}; \mathbf{x}), \ldots, c_{n-1}(\boldsymbol{\theta}; \mathbf{x})\right)^T, \quad \text{where} \quad c_i(\boldsymbol{\theta}; \mathbf{x}) = \min\!\left(f''(x_i),\, f''(x_{i+1})\right). \tag{8}$$

Proof.

Based on the piecewise polynomial representation of the NCS, the second derivative $f''(x)$ is

$$f''(x) = \begin{cases} f_0''(x) = 0, & x < x_1,\\ f_i''(x) = 6a_i x + 2b_i, & x_i \le x < x_{i+1} \text{ for } i = 1, \ldots, n-1,\\ f_n''(x) = 0, & x \ge x_n, \end{cases} \tag{9}$$

with restrictions $f_i''(x_{i+1}) = f_{i+1}''(x_{i+1})$ for $i = 0, \ldots, n-1$. Thus $f''(x)$ is a continuous piecewise linear function on the intervals $(-\infty, x_1)$, $[x_1, x_2), \ldots, [x_{n-1}, x_n)$ and $[x_n, \infty)$. Since $f''(x) = 0$ for any $x \le x_1$ or $x \ge x_n$, we only need to consider $f''(x)$ for $x \in (x_1, x_n)$. For any $i$, linearity of $f_i''(x)$ implies $f''(x) \ge \min(f''(x_i), f''(x_{i+1}))$ on the interval $[x_i, x_{i+1})$. If $c(\boldsymbol{\theta}; \mathbf{x}) \succeq \mathbf{0}$, each piecewise polynomial $f_i''(x)$ for $i = 0, \ldots, n$ is non-negative, which in turn implies that the whole function $f''(x)$ is non-negative. Therefore, inequality (5b) implies the convexity constraint.

The other direction is obvious by definition.
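The analogous sketch for (8) is even simpler, since $f''$ is linear on each interval and its minimum is always attained at an endpoint:

```python
import numpy as np

def convex_constraints(a, b, x):
    """Vector c(theta; x) of Theorem 3.2: min of f'' at the two interval endpoints."""
    fpp = lambda t, i: 6 * a[i] * t + 2 * b[i]       # f_i''(t), linear in t
    return np.array([min(fpp(x[i], i), fpp(x[i + 1], i))
                     for i in range(len(x) - 1)])
```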

The theorems above are stated for global constraints, i.e. constraints that hold on the entire domain $[L, U]$ of the function. To extend the results to mixed and non-global constraints, we can apply Theorems 3.1 and 3.2 to local intervals $[L_j, U_j]$, where $L \le L_j \le U_j \le U$. In addition, we can impose up to second-order smoothness constraints on the boundaries of the local intervals. A general procedure is as follows:

  1. Let $L_i$ for $i = 1, \ldots, M$ be the points where monotonicity/convexity changes, and set $L_0 = L$ and $L_{M+1} = U$. Partition the domain of $f(x)$ into $(L_0, L_1], \ldots, (L_M, L_{M+1})$. As a result, $f(x)$ on each interval requires only one shape constraint.

  2. On each interval, impose the constraint according to Theorem 3.1 or Theorem 3.2.

  3. For $i = 1, \ldots, M$, add $f'(L_i) = 0$ for a monotone constraint or $f''(L_i) = 0$ for a convexity constraint.

Step 3 is important because it prevents the stationary/inflection point of the estimated function from floating between the knots immediately below and above the point where monotonicity/convexity changes.

4. Efficient algorithm for parameter estimation

Note that the optimisation problem described in (5a) and (5b) is a quadratic program with non-linear constraints. Our implementation in Python is based on the ralg solver under the OpenOPT platform. The ralg solver resembles the quasi-Newton method with adaptive space dilation developed by Shor and Zhurbenko (Citation1971). Two advantages of this choice are that it accepts user-provided first derivatives and allows a large number of constraints.

4.1. Computation for large data

When the number of observations $n$ is large, the growing dimensions of the $n \times (n-2)$ matrix $A$ and of the constraint vector $c(\boldsymbol{\theta}; \mathbf{x})$ become the bottleneck for efficient computation. To address this challenge, we consider approximating the matrix $A$ by a matrix $A^*$ with far fewer columns, whose number is governed by a fixed integer $K \ll n-2$ that is independent of $n$. This also reduces the dimension of $\boldsymbol{\theta}$ to a fixed size of order $K \ll n$.

Following the mixed model formulation of Wand and Ormerod (Citation2008) for semiparametric regression, we approximate $A$ with $A^* = BL$, where $B$ is an $n \times (K+4)$ matrix and $L$ is a $(K+4) \times (K+2)$ matrix.

The construction of $B$ is as follows. First, define the padded knot sequence $\kappa_1, \ldots, \kappa_{K+8}$ by
$$a = \kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = x_1 - \epsilon, \qquad \kappa_5 = x_1,$$
$$\kappa_6, \ldots, \kappa_{K+3} \text{ equal to the } \left(\tfrac{1}{K+1}, \tfrac{2}{K+1}, \ldots, \tfrac{K}{K+1}\right) \times 100\text{th percentiles of } x_1, \ldots, x_n,$$
$$\kappa_{K+4} = x_n, \qquad b = \kappa_{K+5} = \kappa_{K+6} = \kappa_{K+7} = \kappa_{K+8} = x_n + \epsilon.$$
Then construct the B-spline basis functions (Hastie, Tibshirani, & Friedman, Citation2001) $B_{1,4}(x), \ldots, B_{K+4,4}(x)$ on the knot sequence $\kappa_1, \ldots, \kappa_{K+8}$ via
$$B_{i,1}(x) = \begin{cases} 1 & \text{if } x \in [\kappa_i, \kappa_{i+1}),\\ 0 & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, K+7,$$
and the recursion
$$B_{i,m}(x) = \frac{x - \kappa_i}{\kappa_{i+m-1} - \kappa_i}\, B_{i,m-1}(x) + \frac{\kappa_{i+m} - x}{\kappa_{i+m} - \kappa_{i+1}}\, B_{i+1,m-1}(x),$$
for $m = 2, 3, 4$ and $i = 1, \ldots, K+8-m$. The matrix $B$ then has $(i,j)$th entry $b_{i,j} = B_{j,4}(x_i)$.
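A minimal sketch of this Cox-de Boor recursion is given below; it takes the padded knot sequence $\kappa_1, \ldots, \kappa_{K+8}$ as given (however the interior percentiles are chosen) and returns the $n \times (K+4)$ matrix $B$. The function name is our own.

```python
import numpy as np

def bspline_design(x, kappa):
    """B-spline design matrix: entry (i, j) is B_{j,4}(x_i) on the K+8 knots kappa."""
    x = np.asarray(x, dtype=float)
    nk = len(kappa)                                  # K + 8 knots
    B = np.zeros((len(x), nk - 1))
    for i in range(nk - 1):                          # order-1 (indicator) basis
        B[:, i] = (kappa[i] <= x) & (x < kappa[i + 1])
    for m in range(2, 5):                            # raise the order: m = 2, 3, 4
        Bm = np.zeros((len(x), nk - m))
        for i in range(nk - m):
            left = np.zeros(len(x))
            right = np.zeros(len(x))
            if kappa[i + m - 1] > kappa[i]:          # skip zero-width (repeated) knots
                left = (x - kappa[i]) / (kappa[i + m - 1] - kappa[i]) * B[:, i]
            if kappa[i + m] > kappa[i + 1]:
                right = (kappa[i + m] - x) / (kappa[i + m] - kappa[i + 1]) * B[:, i + 1]
            Bm[:, i] = left + right
        B = Bm
    return B                                         # n x (K+4)
```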

The construction of the matrix $L$ is as follows. First, define a $(K+4) \times (K+4)$ matrix $\Omega$ with $(i,j)$th entry

$$\Omega_{i,j} = \int_a^b B_{i,4}''(x)\, B_{j,4}''(x)\, dx, \tag{10}$$

where $B_{i,4}''(x)$ is the second derivative of $B_{i,4}(x)$ for $i = 1, \ldots, K+4$:
$$B_{i,4}''(x) = 6\left\{\frac{B_{i,2}(x)}{(\kappa_{i+3} - \kappa_i)(\kappa_{i+2} - \kappa_i)} - \left(\frac{1}{(\kappa_{i+4} - \kappa_{i+1})(\kappa_{i+3} - \kappa_{i+1})} + \frac{1}{(\kappa_{i+3} - \kappa_i)(\kappa_{i+3} - \kappa_{i+1})}\right) B_{i+1,2}(x) + \frac{B_{i+2,2}(x)}{(\kappa_{i+4} - \kappa_{i+1})(\kappa_{i+4} - \kappa_{i+2})}\right\}.$$
By the spectral decomposition, $\Omega$ can be written as $\Omega = UDU^T$, where $U \in \mathbb{R}^{(K+4) \times (K+4)}$ is an orthogonal matrix and $D \in \mathbb{R}^{(K+4) \times (K+4)}$ is a diagonal matrix with $K+2$ positive entries and two zero entries on the diagonal. Let $\mathbf{d}$ be the vector of the $K+2$ positive diagonal entries of $D$, and let $U_x \in \mathbb{R}^{(K+4) \times (K+2)}$ contain the columns of $U$ corresponding to those positive entries. Then $L$ is constructed as $L = U_x\, \mathrm{diag}(\mathbf{d}^{-1/2})$, where $\mathrm{diag}(\mathbf{d})$ denotes a diagonal matrix with diagonal entries equal to the vector $\mathbf{d}$.
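Given the penalty matrix $\Omega$ from (10) (computed by any quadrature), the spectral step that produces $L$ is a few lines; a sketch:

```python
import numpy as np

def make_L(Omega, rtol=1e-10):
    """L = U_x diag(d^{-1/2}): drop the two (numerically) zero eigenvalues of Omega."""
    evals, U = np.linalg.eigh(Omega)                 # Omega = U D U^T, ascending order
    keep = evals > rtol * evals.max()                # the K + 2 positive eigenvalues
    d = evals[keep]
    Ux = U[:, keep]
    return Ux @ np.diag(d ** -0.5)                   # (K+4) x (K+2)
```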

Wand and Ormerod (Citation2008) show that when $K = n-2$, $A^* = BL = A$. When $K < n-2$, substituting $A^*$ for $A$ reduces the length of the parameter vector $\boldsymbol{\theta} = (\theta_0, \theta_1, \boldsymbol{\beta})$, since the length of $\boldsymbol{\beta}$ is now governed by $K$ rather than by $n$. Another advantage of using $A^*$ is that, other than the requirement $K \ll n$, the choice of $K$ is independent of $n$.

The essence of this approximation lies in the construction of the matrix $B$ above. The choice of $K$ controls the length of the knot sequence used in constructing the B-spline basis functions. When $K$ is at its maximum of $n-2$, all data points are used as knots and no approximation occurs. When $K < n-2$, a proper subset of the data is used as the knot sequence. This reduction in the length of the knot sequence amounts to approximating the full data with a properly chosen (equally spaced in terms of percentiles) subset, which in turn reduces the dimension from $A$ to $A^*$. For a proper choice of $K$, a large value close to $n$ may not yield a significant computational gain, while a small value could make the approximation inaccurate. From our empirical experience, $K = 30$ provides a good balance between computational gain and approximation quality.

4.2. Tuning parameter selection

The tuning parameter $\lambda$ controls the smoothness of the estimated function. As $\lambda \to \infty$, the estimated function approaches a linear fit (smoothest); as $\lambda \to 0$, it tends to the natural cubic spline interpolant (roughest). Although incorporating appropriate monotonicity and/or convexity constraints helps smooth out unnecessary roughness in the estimated function, the choice of $\lambda$ for the exact shape-constrained smoothing spline is still important for obtaining a good fit. In this work, we select the value of $\lambda$ that minimises the mean squared error under k-fold cross-validation.
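A sketch of the cross-validation loop is below; `fit_predict` stands for any routine (hypothetical here) that fits the constrained spline on the training fold and predicts at held-out points, e.g. an SCSS fit followed by the function recovery of Appendix A.3.

```python
import numpy as np

def select_lambda(x, y, lam_grid, fit_predict, k=10, seed=0):
    """Pick the lambda minimising k-fold cross-validated mean squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    cv_err = []
    for lam in lam_grid:
        sse = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(len(x)), test)
            order = np.argsort(x[train])             # knots must stay sorted
            pred = fit_predict(x[train][order], y[train][order], lam, x[test])
            sse += np.sum((y[test] - pred) ** 2)
        cv_err.append(sse / len(x))
    return lam_grid[int(np.argmin(cv_err))]
```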

5. Simulations

In this section, we evaluate the performance of our proposed method, the shape-constrained smoothing spline (SCSS), under various combinations of non-linear functions and errors. The methods in comparison include Brunk's isotonic non-parametric estimator (Brunk), the unconstrained smoothing spline (SS), the proposed SCSS with a global monotone non-decreasing constraint (SCSS-Monotone), the proposed SCSS with a problem-specific convexity constraint (SCSS-Convex), and the proposed SCSS with both global monotone non-decreasing and problem-specific convexity constraints (SCSS-Mixed). In addition, we compare the regression spline under the same three types of constraints, denoted RS-Monotone, RS-Convex, and RS-Mixed, respectively. We also compare the B-spline method under the monotone non-decreasing constraint (BS-Monotone) and the problem-specific convexity constraint (BS-Convex). The regression spline and B-spline methods are implemented using the R packages cgam and splines2, respectively. Note that Wang and Li (Citation2008) kindly provided their code for comparison, but it requires a commercial optimiser, MOSEK, to which we currently have no access.

The simulation data are generated from the model $y_i = f(x_i) + \epsilon_i$, $i = 1, \ldots, 50$, where $x_i$ is distributed uniformly between −10 and 10. The function $f(x)$ varies across the following examples:

  • Example 1: $f(x) = 1/(1 + e^{-x})$ is an increasing function which is convex for $x < 0$ and concave for $x > 0$;

  • Example 2: $f(x) = x^3/10^3$ is an increasing function which is concave for $x < 0$ and convex for $x > 0$;

  • Example 3: $f(x) = 0 \cdot I_{-10 \le x \le -3} + 0.2\, I_{-3 < x \le 0} + 0.5\, I_{0 < x \le 5} + 0.8\, I_{5 < x \le 8} + 1 \cdot I_{8 < x \le 10}$ is a non-decreasing step function;

  • Example 4: $f(x) = (20x^2 + x^3)/3000$ is a non-monotone function which is convex for $x > -20/3$ and concave for $x < -20/3$;

  • Example 5: $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ is an increasing function which is convex everywhere.

Figure 1 visualises the above functions. For each example, three distributions for $\epsilon$ are considered: (1) the normal distribution $N(0, \sigma^2)$; (2) the Student's t distribution with 10 degrees of freedom; and (3) the Beta(3, 2) distribution. The error distributions are centred to have zero mean and standard deviation $\sigma = 0.4$. This simulation setup is identical to Wang and Li (Citation2008), except that we add the globally convex function in Example 5. Each setting is repeated 500 times. The mean squared prediction error $\mathrm{MSPE}(\hat{f}) = \frac{1}{n}\sum_{i=1}^{n}(\hat{f}(x_i) - f(x_i))^2$ is used as the evaluation criterion.
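For reproducibility, a sketch of this data-generating process is given below; the rescaling constants follow from the stated requirement that each error distribution be centred with standard deviation $\sigma = 0.4$ (the t(10) distribution has standard deviation $\sqrt{10/8}$, and Beta(3, 2) has mean 0.6 and standard deviation 0.2).

```python
import numpy as np

rng = np.random.default_rng(0)

examples = {                                         # true functions of Examples 1-5
    1: lambda x: 1 / (1 + np.exp(-x)),
    2: lambda x: x**3 / 1e3,
    3: lambda x: np.select([x <= -3, x <= 0, x <= 5, x <= 8],
                           [0.0, 0.2, 0.5, 0.8], default=1.0),
    4: lambda x: (20 * x**2 + x**3) / 3000,
    5: lambda x: (np.exp(x / 20) - np.exp(-0.5)) / (np.exp(0.5) - np.exp(-0.5)),
}

def simulate(example, n=50, sigma=0.4, error="normal"):
    """Draw x ~ Uniform(-10, 10) and y = f(x) + eps with sd(eps) = sigma."""
    x = np.sort(rng.uniform(-10, 10, n))
    if error == "normal":
        eps = rng.normal(0.0, sigma, n)
    elif error == "t":
        eps = rng.standard_t(10, n) * sigma / np.sqrt(10 / 8)   # rescale t(10)
    else:
        eps = (rng.beta(3, 2, n) - 0.6) * sigma / 0.2           # centre/rescale Beta(3,2)
    return x, examples[example](x) + eps
```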

Figure 1. Comparison of the five true functions used in simulation studies.

Table 1 reports the averaged MSPE (×100) and its standard error (×100) over the 500 repetitions for the methods in comparison. The proposed methods with the appropriate constraint clearly yield smaller MSPE than the other methods. Note that it is important to impose an appropriate constraint: in Example 4, which has a quadratic-shaped function, SCSS-Monotone performs worse than SS, since the monotone constraint is not proper there. When the convexity constraint is imposed instead, SCSS-Convex performs much better than the other methods. It is worth pointing out that the B-spline method is comparable to the proposed method in Example 1, but not in the other examples.

Table 1. Simulation studies comparing SCSS and other estimators under different function and error settings. Averaged MSPE (×100) and its standard error (×100) over 500 repetitions are reported. NA entries correspond to methods with no applicable settings.

Figures A1 to A6 in Appendix 3 report the estimator percentiles (2.5% and 97.5%) of SCSS and the other estimators for Examples 4 and 5. SCSS-Monotone, SCSS-Convex and SCSS-Mixed produce slightly smoother percentile intervals than the other methods, especially at the left and right boundaries of the x axis. Example 4 reveals the behaviour of SCSS under a mis-specified monotone constraint: on the interval $[-10, 0]$, the true function in Example 4 is monotone decreasing, but SCSS-Monotone is constrained to be non-decreasing.

To further examine the rate of convergence, we conduct a numerical study of the proposed method as the sample size grows. Taking Example 5 for illustration, we let the sample size increase gradually from n = 50 to n = 300. At each sample size, we measure the discrepancy between $f(x)$ and $\hat{f}(x)$ by $\int (f(x) - \hat{f}(x))^2\, dx$. Table 2 shows that as the sample size increases, the function estimated by SCSS gets closer to the true function. Figure 2 reports the convergence of the estimated function in Example 5 under the normal error and convexity constraint. Results for the other two constraints can be found in Appendix 3.
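The discrepancy measure can be approximated on a fine grid; a sketch, assuming `f_hat` is a vectorised callable for the fitted spline (e.g. built from Appendix A.3):

```python
import numpy as np
from scipy.integrate import simpson

def integrated_sq_error(f_true, f_hat, a=-10.0, b=10.0, m=2001):
    """Approximate the integral of (f - fhat)^2 over [a, b] by Simpson's rule."""
    t = np.linspace(a, b, m)
    return simpson((f_true(t) - f_hat(t)) ** 2, x=t)
```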

Figure 2. SCSS convergence for the function $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ in Example 5 under normal error and convexity constraint.

Table 2. Simulation studies measuring the convergence of SCSS based on the integrated mean squared errors.

6. Real data analysis

In this section, we evaluate the performance of the proposed SCSS method in comparison with the regular smoothing spline (SS). The Auto MPG data from the UCI Machine Learning Repository (Lichman, Citation2013) are used for demonstration. This dataset on fuel consumption contains 398 observations with nine attributes: fuel consumption (miles per gallon), number of cylinders, engine displacement (cubic inches), horsepower, vehicle weight (pounds), time to accelerate from 0 to 60 mph (seconds), model year (modulo 100), origin of car (1: American, 2: European, 3: Japanese), and car name. For both methods, the tuning parameter $\lambda$ is chosen by minimising the average mean squared error from 10-fold cross-validation.

We first analyse the relationship between vehicle weight (weight) and fuel consumption (mpg). Figure 3 confirms the intuition of a negative correlation between weight and mpg. Without any constraint, SS provides a monotone estimate that is consistent with this intuition. It is reassuring to see that SCSS-Monotone provides an estimate that almost overlaps its unconstrained counterpart.

Figure 3. Comparison between unconstrained (SS) and monotone-constrained (SCSS) smoothing splines for the Auto MPG data. The response is mpg, modelled as a function of weight.

Next we consider the vehicle's engine displacement (displacement) as the predictor variable instead of weight. In general, one would expect a negative correlation between mpg and displacement. Figure 4 reveals the potential problem when prior knowledge of the function shape is not incorporated: the wiggly function estimated by SS contradicts the expectation of a monotone decreasing relationship between mpg and displacement, while the proposed SCSS-Monotone captures the pattern of the data quite well.

Figure 4. Comparison between unconstrained (SS) and monotone-constrained (SCSS) smoothing splines for the Auto MPG data. The response is mpg, modelled as a function of displacement.

In short, when one has prior shape information about the function to be estimated, it is better to incorporate it into the estimation process. The proposed shape-constrained smoothing spline can effectively help avoid unexpected results from a non-parametric method.

7. Discussion

In this work, we proposed imposing exact shape constraints on the smoothing spline while enabling efficient estimation. Compared to other methods based on the same fundamental assumption that the resulting function is an NCS with knots at all data points, the proposed SCSS method guarantees that the constraints hold exactly and also allows mixed and/or non-global constraints.

The technique developed in the SCSS method can be extended to the additive model, the partially linear regression model, the non-parametric model with covariates, etc. The theoretical investigation of the asymptotic convergence of SCSS can be quite challenging because of the exact (i.e. over an interval) constraint. Some theoretical results are available for functional ANOVA using splines with shape constraints at a given finite set of locations (Dai & Chien, Citation2017), but such results cannot easily be extended to the smoothing spline with exact constraints. Another direction for future study is to formulate necessary and sufficient conditions for function bound constraints and log-convexity constraints. In addition, efficient optimisation methods that exploit the structure of a quadratic program with non-linear constraints could further enhance the proposed method.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by National Science Foundation [1634867].

Notes on contributors

Vincent Chan

Vincent Chan obtained his Ph.D. from the Department of Statistics at the University of Wisconsin-Madison. His research interests include the single index model and regularization.

Kam-Wah Tsui

Kam-Wah Tsui is a professor in the Department of Statistics at the University of Wisconsin-Madison. His research interests include Bayesian analysis, decision theory, survey sampling, and statistical inference.

Yanran Wei

Yanran Wei is a Ph.D. student in the Department of Statistics at Virginia Tech. Her research interests include Big Data analytics and statistical modeling in financial applications.

Zhiyang Zhang

Zhiyang Zhang is a faculty member in the Department of Statistics at Virginia Tech. Her research interests include modeling complex data and statistical modeling in chemistry applications.

Xinwei Deng

Xinwei Deng is an associate professor in the Department of Statistics at Virginia Tech. His research interests include machine learning, design of experiments, and the interface between machine learning and experimental design.

References

  • Brezger, A., & Steiner, W. J. (2003). Monotonic regression based on Bayesian P-splines: An application to estimating price response functions from store-level scanner data. Tech. rep., Discussion paper//Sonderforschungsbereich 386 der Ludwig-Maximilians-Universität München.
  • Curry, H. B., & Schoenberg, I. J. (1966). On Pólya frequency functions IV: The fundamental spline functions and their limits. Journal D'analyse Mathématique, 17, 71–107. doi: 10.1007/BF02788653
  • Dai, X., & Chien, P. (2017). Minimax optimal rates of estimation in functional anova models with derivatives. arXiv:1706.00850.
  • Delecroix, M., & Thomas-Agnan, C. (2000). Spline and Kernel regression under shape restrictions. In Smoothing and Regression: Approaches, Computation, and Application (pp. 109–133).
  • Dierckx, P. (1980). Algorithm 42: An algorithm for cubic spline fitting with convexity constraints. Computing, 24, 349–371. doi: 10.1007/BF02237820
  • Ducharme, G. R., & Fontez, B. (2004). A smooth test of goodness-of-fit for growth curves and monotonic nonlinear regression models. Biometrics, 60, 977–986. doi: 10.1111/j.0006-341X.2004.00253.x
  • Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications: Monographs on statistics and applied probability. New York: CRC Press.
  • Green, P. J. (1987). Penalized likelihood for general semi-parametric regression models. International Statistical Review / Revue Internationale de Statistique, 55(3), 245.
  • Green, P. J., & Silverman, B. W. (1993). Nonparametric regression and generalized linear models: A roughness penalty approach. New York: CRC Press.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer series in statistics (Vol. 1). Berlin: Springer.
  • He, X., & Shi, P. (1998). Monotone B-spline smoothing. Journal of the American Statistical Association, 93, 643–650.
  • Kelly, C., & Rice, J. (1990). Monotone smoothing with application to dose-response curves and the assessment of synergism. Biometrics, 46(4), 1071–1085. doi: 10.2307/2532449
  • Liao, X., & Meyer, M. C. (2017). Change-point estimation using shape-restricted regression splines. Journal of Statistical Planning and Inference, 188, 8–21. doi: 10.1016/j.jspi.2017.03.007
  • Lichman, M. (2013). UCI machine learning repository.
  • Mammen, E., & Thomas-Agnan, C. (1999). Smoothing splines and shape restrictions. Scandinavian Journal of Statistics, 26, 239–252. doi: 10.1111/1467-9469.00147
  • Matzkin, R. L. (1991). Semiparametric estimation of monotone and concave utility functions for polychotomous choice models. Econometrica: Journal of the Econometric Society, 59, 1315–1327. doi: 10.2307/2938369
  • Meyer, M. C. (2008). Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2, 1013–1033. doi: 10.1214/08-AOAS167
  • Meyer, M. C. (2012). Constrained penalized splines. Canadian Journal of Statistics, 40, 190–206. doi: 10.1002/cjs.10137
  • Meyer, M. C. (2018). Constrained partial linear regression splines. Statistica Sinica, 28, 277–292.
  • Nagahara, M., & Martin, C. F. (2013). Monotone smoothing splines using general linear systems. Asian Journal of Control, 15, 461–468. doi: 10.1002/asjc.557
  • Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3, 425–441. doi: 10.1214/ss/1177012761
  • Shor, N. Z., & Zhurbenko, N. (1971). The minimization method using space dilatation in direction of difference of two sequential gradients. Kibernetika, 3, 51–59.
  • Turlach, B. A. (2005). Shape constrained smoothing using smoothing splines. Computational Statistics, 20, 81–104. doi: 10.1007/BF02736124
  • Utreras, F. I. (1985). Smoothing noisy data under monotonicity constraints existence, characterization and convergence rates. Numerische Mathematik, 47, 611–625. doi: 10.1007/BF01389460
  • Villalobos, M., & Wahba, G. (1987). Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities. Journal of the American Statistical Association, 82, 239–248. doi: 10.1080/01621459.1987.10478426
  • Wahba, G. (1990). Spline models for observational data. Philadelphia, Pennsylvania: Siam.
  • Wand, M., & Jones, M. (1995). Kernel smoothing. Monographs on statistics and applied probability (Vol. 60).
  • Wand, M. P., & Ormerod, J. T. (2008). On Semiparametric regression with O'sullivan penalized splines. Australian & New Zealand Journal of Statistics, 50(2), 179–198. doi: 10.1111/j.1467-842X.2008.00507.x
  • Wang, X., & Li, F. (2008). Isotonic smoothing spline regression. Journal of Computational and Graphical Statistics, 17, 21–37. doi: 10.1198/106186008X285627
  • Zeng, L., Deng, X., & Yang, J. (2016). Constrained hierarchical modeling of degradation data in tissue-engineered scaffold fabrication. IIE Transactions, 48, 16–33. doi: 10.1080/0740817X.2015.1019164
  • Zhang, D., Lin, X., Raz, J., & Sowers, M. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association, 93(442), 710. doi: 10.1080/01621459.1998.10473723

Appendices

Appendix 1

We provide some preliminary material about the natural cubic spline (NCS) and the smoothing spline. Interested readers may refer to Wahba (Citation1990) and Green and Silverman (Citation1993) for details.

A.1. Value-second derivative representation

The value-second derivative representation allows an NCS to be specified simply by its value and second derivative at each knot. This representation provides a link between the entire NCS $f(x)$ and $(x_i, f(x_i))$ for $i = 1, \ldots, n$. Define $g_i = f(x_i)$ and $\gamma_i = f''(x_i)$ for $i = 1, \ldots, n$, and let $\mathbf{g} = (g_1, \ldots, g_n)^T$ and $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_n)^T$. Note that owing to the natural boundary conditions of an NCS, $\gamma_1 = \gamma_n = 0$. In addition, construct the $n \times (n-2)$ matrix $Q$ and the $(n-2) \times (n-2)$ matrix $R$ as follows:

$$Q = \begin{pmatrix} h_1^{-1} & & & \\ -h_1^{-1} - h_2^{-1} & h_2^{-1} & & \\ h_2^{-1} & \ddots & & \\ & \ddots & & h_{n-2}^{-1} \\ & & & -h_{n-2}^{-1} - h_{n-1}^{-1} \\ & & & h_{n-1}^{-1} \end{pmatrix}, \tag{A1}$$

$$R = \begin{pmatrix} \tfrac{1}{3}(h_1 + h_2) & \tfrac{1}{6}h_2 & & 0 \\ \tfrac{1}{6}h_2 & \tfrac{1}{3}(h_2 + h_3) & \ddots & \\ & \ddots & \ddots & \tfrac{1}{6}h_{n-2} \\ 0 & & \tfrac{1}{6}h_{n-2} & \tfrac{1}{3}(h_{n-2} + h_{n-1}) \end{pmatrix}, \tag{A2}$$

where $h_i = x_{i+1} - x_i$ for $i = 1, \ldots, n-1$. By construction, the matrix $R$ is strictly positive-definite.

Lemma A.1

Theorem 2.1 in Green and Silverman (Citation1993)

The vectors $\mathbf{g}$ and $\boldsymbol{\gamma}$ specify a natural cubic spline $f$ if and only if the condition

$$Q^T \mathbf{g} = R\boldsymbol{\gamma} \tag{A3}$$

is satisfied. If (A3) is satisfied, then the roughness penalty satisfies

$$\int \left[f''(t)\right]^2 dt = \boldsymbol{\gamma}^T R \boldsymbol{\gamma} = \mathbf{g}^T K \mathbf{g}, \tag{A4}$$

where $K = Q R^{-1} Q^T$.

This value-second derivative representation provides a formula to recover the entire NCS from $x_i$ and $f(x_i)$ for $i = 1, \ldots, n$.

A.2. Linear mixed model representation

The linear mixed model representation of an NCS allows us to express $f(x_1), \ldots, f(x_n)$ as a linear combination of specific basis functions. Let the function $f$ be an NCS on the interval $[a, b]$, and denote by $L^2[a,b]$ the space of square-integrable functions on $[a,b]$. Let $\mathcal{H} = \{f : f, f' \text{ are absolutely continuous},\ f'' \in L^2[a,b]\}$ be the second-order Sobolev space containing the NCS functions, with the norm

$$\|f\|^2 = \left(\int_a^b f(x)\,dx\right)^2 + \left(\int_a^b f'(x)\,dx\right)^2 + \int_a^b \left[f''(x)\right]^2 dx.$$

Wahba (Citation1990) shows that $\mathcal{H}$ is a reproducing kernel Hilbert space that can be decomposed into a direct sum of three orthogonal subspaces, $\mathcal{H} = \{1\} \oplus \mathcal{H}_0 \oplus \mathcal{H}_1$, where $\{1\}$ is the mean subspace, $\mathcal{H}_0 = \{f : f''(x) = 0\}$ is the linear contrast subspace, and $\mathcal{H}_1 = \{f : \int_a^b f(x)\,dx = 0,\ \int_a^b f'(x)\,dx = 0,\ f'' \in L^2[a,b]\}$ is the non-linear subspace. This decomposition means that any NCS function $f \in \mathcal{H}$ can be uniquely decomposed into the sum of a constant part, a linear part and a non-linear part:

$$f(x) = \theta_0 + x\theta_1 + f_1(x), \tag{A5}$$

for some function $f_1 \in \mathcal{H}_1$.

Knowing that the solution is necessarily an NCS with knots at $x_1, \ldots, x_n$, one particular form of Equation (A5) is given by the linear mixed model representation (Green, Citation1987; Zhang, Lin, Raz, & Sowers, Citation1998):

$$\mathbf{g} = \mathbf{1}_n\theta_0 + \mathbf{x}\,\theta_1 + A\boldsymbol{\beta}, \tag{A6}$$

where $A = Q(Q^TQ)^{-1}R^{1/2}$ is an $n \times (n-2)$ matrix, $\mathbf{1}_n$ is the length-$n$ vector of ones and $\mathbf{x} = (x_1, \ldots, x_n)^T$.

Appendix 2

The linear mixed model representation is used for efficient computation of the NCS, while the piecewise polynomial representation is used for formulating shape constraints on the same NCS. The connection between the two representations is stated as follows.

A.3. Specifying the NCS function f from x and g

Given $\mathbf{x}$, the matrices $Q$ and $R$ can be constructed as shown in Appendix A.1. The second derivative vector $\boldsymbol{\gamma}$ can then be obtained from Lemma A.1 as

$$\boldsymbol{\gamma} = R^{-1}Q^T\mathbf{g}, \tag{A7}$$

since $R$ is of full rank by construction. From Section 2.4.1 in Green and Silverman (Citation1993), the derivatives of $f(\cdot)$ at the knots $x_1$ and $x_n$ are
$$g_1' = \frac{g_2 - g_1}{x_2 - x_1} - \frac{1}{6}(x_2 - x_1)\gamma_2, \qquad g_n' = \frac{g_n - g_{n-1}}{x_n - x_{n-1}} + \frac{1}{6}(x_n - x_{n-1})\gamma_{n-1},$$
respectively. Finally, with $h_i = x_{i+1} - x_i$, the following formula, summarised from Section 2.4.2 in Green and Silverman (Citation1993), gives the entire NCS $f$:

$$f(t) = \begin{cases} g_1 - (x_1 - t)\,g_1', & t \le x_1,\\[4pt] \dfrac{(t - x_i)g_{i+1} + (x_{i+1} - t)g_i}{h_i} - \dfrac{(t - x_i)(x_{i+1} - t)}{6}\left[\left(1 + \dfrac{t - x_i}{h_i}\right)\gamma_{i+1} + \left(1 + \dfrac{x_{i+1} - t}{h_i}\right)\gamma_i\right], & x_i < t < x_{i+1},\ i \in \{1, \ldots, n-1\},\\[4pt] g_n + (t - x_n)\,g_n', & t > x_n. \end{cases}$$

Hence the NCS $f$ is fully specified; that is, we can reconstruct $f$ from $\mathbf{x}$ and $\mathbf{g}$.
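A sketch of this recovery, with `make_QR` from the Section 2 sketch providing $Q$ and $R$:

```python
import numpy as np

def ncs_eval(t, x, g, Q, R):
    """Evaluate the NCS f at points t from knot values g, via (A7) and
    Section 2.4.2 of Green and Silverman (1993)."""
    n = len(x)
    gamma = np.zeros(n)                               # natural BCs: gamma_1 = gamma_n = 0
    gamma[1:-1] = np.linalg.solve(R, Q.T @ g)         # (A7)
    h = np.diff(x)
    d1 = (g[1] - g[0]) / h[0] - h[0] * gamma[1] / 6.0        # g'_1
    dn = (g[-1] - g[-2]) / h[-1] + h[-1] * gamma[-2] / 6.0   # g'_n
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.empty_like(t)
    for k, tk in enumerate(t):
        if tk <= x[0]:
            out[k] = g[0] - (x[0] - tk) * d1          # linear left extrapolation
        elif tk >= x[-1]:
            out[k] = g[-1] + (tk - x[-1]) * dn        # linear right extrapolation
        else:
            i = np.searchsorted(x, tk) - 1            # tk in [x_i, x_{i+1})
            u, v = tk - x[i], x[i + 1] - tk
            out[k] = ((u * g[i + 1] + v * g[i]) / h[i]
                      - u * v / 6.0 * ((1 + u / h[i]) * gamma[i + 1]
                                       + (1 + v / h[i]) * gamma[i]))
    return out
```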

A.4. Specifying the piecewise polynomial representation given f

The resulting function $f$ can be used to compute all coefficients $a_i$, $b_i$, $c_i$ and $d_i$ of the piecewise polynomial representation of $f$. The steps are as follows:

  1. Given $\mathbf{x}$ and the function values $\mathbf{g}$, obtain $\boldsymbol{\gamma}$ from (A7).

  2. Calculate $(f'(x_1), \ldots, f'(x_n))^T$ by Equations (2.20) and (2.21) in Green and Silverman (Citation1993):
$$f'(x_i) = \frac{g_{i+1} - g_i}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i)(2\gamma_i + \gamma_{i+1})}{6}, \quad i = 1, \ldots, n-1; \qquad f'(x_n) = \frac{g_n - g_{n-1}}{x_n - x_{n-1}} + \frac{\gamma_{n-1}(x_n - x_{n-1})}{6}.$$

  3. Obtain $c_0 = f_0'(x_1) = f'(x_1)$ and $c_n = f_n'(x_n) = f'(x_n)$ based on the piecewise polynomial representation.

  4. Using the facts $f''(x_{i+1}) = 6a_ix_{i+1} + 2b_i$ and $f''(x_i) = 6a_ix_i + 2b_i$, we get $a_i = \frac{1}{6}\,\frac{\gamma_{i+1} - \gamma_i}{x_{i+1} - x_i}$ for $i = 1, \ldots, n-1$.

  5. With the $a_i$'s from the previous step, obtain $b_i = \frac{1}{2}(\gamma_i - 6a_ix_i)$ for $i = 1, \ldots, n-1$.

  6. With $a_i$, $b_i$ and $f'(x_i)$ from the previous steps, obtain $c_i = f'(x_i) - 3a_ix_i^2 - 2b_ix_i$ for $i = 1, \ldots, n-1$.

  7. Obtain $d_n = f(x_n) - c_nx_n$, $d_0 = f(x_1) - c_0x_1$, and $d_i = f(x_i) - a_ix_i^3 - b_ix_i^2 - c_ix_i$ for $i = 1, \ldots, n-1$.
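These steps translate directly into code; a sketch (our own function names), with $\boldsymbol{\gamma}$ from (A7):

```python
import numpy as np

def knot_derivs(x, g, gamma):
    """Step 2: f'(x_i) from Eqs. (2.20)-(2.21) of Green and Silverman (1993)."""
    h = np.diff(x)
    gp = np.empty(len(x))
    gp[:-1] = (g[1:] - g[:-1]) / h - h * (2 * gamma[:-1] + gamma[1:]) / 6.0
    gp[-1] = (g[-1] - g[-2]) / h[-1] + gamma[-2] * h[-1] / 6.0
    return gp

def piecewise_coeffs(x, g, gamma):
    """Steps 3-7: coefficients (a_i, b_i, c_i, d_i) of (2a) on each [x_i, x_{i+1})."""
    gp = knot_derivs(x, g, gamma)
    a = (gamma[1:] - gamma[:-1]) / (6.0 * np.diff(x))          # Step 4
    b = (gamma[:-1] - 6.0 * a * x[:-1]) / 2.0                  # Step 5
    c = gp[:-1] - 3.0 * a * x[:-1]**2 - 2.0 * b * x[:-1]       # Step 6
    d = g[:-1] - a * x[:-1]**3 - b * x[:-1]**2 - c * x[:-1]    # Step 7
    c0, d0 = gp[0], g[0] - gp[0] * x[0]                        # left linear piece
    cn, dn = gp[-1], g[-1] - gp[-1] * x[-1]                    # right linear piece
    return a, b, c, d, (c0, d0), (cn, dn)
```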

Appendix 3

Figure A1. The estimator percentiles (2.5% and 97.5%) of SCSS for the function $f(x) = (20x^2 + x^3)/3000$ in Example 4 under normal error.

Figure A2. The estimator percentiles (2.5% and 97.5%) of SCSS for the function $f(x) = (20x^2 + x^3)/3000$ in Example 4 under t error.

Figure A3. The estimator percentiles (2.5% and 97.5%) of SCSS for the function $f(x) = (20x^2 + x^3)/3000$ in Example 4 under beta error.

Figure A4. The estimator percentiles (2.5% and 97.5%) of SCSS for the function $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ in Example 5 under normal error.

Figure A5. The estimator percentiles (2.5% and 97.5%) of SCSS for the function $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ in Example 5 under t error.

Figure A6. The estimator percentiles (2.5% and 97.5%) of SCSS for the function $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ in Example 5 under beta error.

Figure A7. SCSS convergence for the function $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ in Example 5 under normal error and monotone constraint.

Figure A8. SCSS convergence for the function $f(x) = (e^{x/20} - e^{-10/20})/(e^{10/20} - e^{-10/20})$ in Example 5 under normal error and mixed constraint.
