ABSTRACT
Spline smoothing is a widely used nonparametric method that allows data to speak for themselves. Due to its complexity and flexibility, fitting smoothing spline models is usually computationally intensive, which may become prohibitive with large datasets. To overcome memory and CPU limitations, we propose four divide-and-recombine (D&R) approaches for fitting cubic splines with large datasets. We consider two approaches to dividing the data: random and sequential. For each division approach, we consider two approaches to recombining. These D&R approaches are implemented in parallel without communication. Extensive simulations show that these D&R approaches are scalable and perform comparably to the method that uses the whole data. The sequential D&R approaches are spatially adaptive, which leads to better performance than the method that uses the whole data when the underlying function is spatially inhomogeneous.
Supplementary Material: Simulation Codes
1. fit only: folder that contains four R code files to compute spline estimates and system runtime for three cases with GML choice of the smoothing parameter and σ = 0.1.
2. with pstd: folder that contains four R code files to compute confidence intervals, average MSE, squared bias, and variance for three cases with GML choice of the smoothing parameter and σ = 0.1.