Short Technical Notes

Divide and Recombine Approaches for Fitting Smoothing Spline Models with Large Datasets

Pages 677-683 | Received 01 Apr 2017, Published online: 06 Jun 2018

ABSTRACT

Spline smoothing is a widely used nonparametric method that allows the data to speak for themselves. Because of its complexity and flexibility, fitting smoothing spline models is usually computationally intensive, which can become prohibitive for large datasets. To overcome memory and CPU limitations, we propose four divide and recombine (D&R) approaches for fitting cubic splines to large datasets. We consider two ways to divide the data, random and sequential, and for each way of dividing we consider two ways to recombine. These D&R approaches are implemented in parallel without communication. Extensive simulations show that the D&R approaches are scalable and perform comparably to the method that uses the whole data. The sequential D&R approaches are spatially adaptive, which leads to better performance than the whole-data method when the underlying function is spatially inhomogeneous.
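The abstract describes the D&R pattern only at a high level. The following is a minimal Python sketch of that pattern, not the authors' implementation (their code is the R supplement listed below). Each subset is fit with a cubic smoothing spline in the Reinsch value/second-derivative form using dense matrices, which is practical only for modest subset sizes. Several choices here are assumptions: the smoothing parameter is chosen by GCV instead of the GML criterion used in the paper; random division is recombined by averaging the subset fits; and sequential division is recombined by using each block's fit on its own stretch of x. The paper's two recombination rules per division are not spelled out in this abstract, so these are illustrative stand-ins. All helper names (`reinsch_matrices`, `fit_spline_gcv`, `eval_spline`, `dr_random`, `dr_sequential`) and the sine test function are hypothetical.

```python
import numpy as np


def reinsch_matrices(x):
    """Banded matrices Q (n x (n-2)) and R ((n-2) x (n-2)) of the Reinsch
    (value/second-derivative) form of a natural cubic spline; x sorted, distinct."""
    n, h = len(x), np.diff(x)
    Q, R = np.zeros((n, n - 2)), np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j], Q[j + 2, j] = 1.0 / h[j], 1.0 / h[j + 1]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q, R


def fit_spline_gcv(x, y, lams=np.logspace(-10, 0, 21)):
    """Minimize sum((y - g)^2) + lam * int g''^2 over natural cubic splines,
    picking lam by GCV (an assumed stand-in for the paper's GML criterion)."""
    n = len(x)
    Q, R = reinsch_matrices(x)
    K = Q @ np.linalg.solve(R, Q.T)  # penalty matrix: int g''^2 = g.T @ K @ g
    best = None
    for lam in lams:
        S = np.linalg.inv(np.eye(n) + lam * K)  # smoother (hat) matrix
        g = S @ y
        gcv = n * np.sum((y - g) ** 2) / (n - np.trace(S)) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, g)
    g = best[1]
    gamma = np.zeros(n)  # second derivatives at the knots, zero at the ends (natural)
    gamma[1:-1] = np.linalg.solve(R, Q.T @ g)
    return g, gamma


def eval_spline(x, g, gamma, t):
    """Evaluate the fitted spline at t; t outside [x[0], x[-1]] is clamped
    to the range (a simplification; a natural spline extrapolates linearly)."""
    t = np.clip(t, x[0], x[-1])
    i = np.clip(np.searchsorted(x, t) - 1, 0, len(x) - 2)
    h, a, b = x[i + 1] - x[i], t - x[i], x[i + 1] - t
    linear = (a * g[i + 1] + b * g[i]) / h
    cubic = a * b * ((1 + a / h) * gamma[i + 1] + (1 + b / h) * gamma[i]) / 6.0
    return linear - cubic


def dr_random(x, y, t, k, rng):
    """Random division into k subsets; recombine by averaging the k fits."""
    fits = []
    for part in np.array_split(rng.permutation(len(x)), k):
        part = np.sort(part)  # x is assumed sorted, so sorted indices keep x sorted
        g, gamma = fit_spline_gcv(x[part], y[part])
        fits.append(eval_spline(x[part], g, gamma, t))
    return np.mean(fits, axis=0)


def dr_sequential(x, y, t, k):
    """Sequential division into k contiguous blocks of sorted x; each block's
    fit is used only for evaluation points falling in that block."""
    blocks = np.array_split(np.arange(len(x)), k)
    starts = np.array([x[b[0]] for b in blocks])
    owner = np.clip(np.searchsorted(starts, t, side="right") - 1, 0, k - 1)
    out = np.empty(len(t))
    for j, b in enumerate(blocks):
        g, gamma = fit_spline_gcv(x[b], y[b])
        mask = owner == j
        out[mask] = eval_spline(x[b], g, gamma, t[mask])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 1000)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # sigma = 0.1
    t = np.linspace(0.0, 1.0, 201)
    truth = np.sin(2 * np.pi * t)  # hypothetical test function
    for name, est in [("random", dr_random(x, y, t, 4, rng)),
                      ("sequential", dr_sequential(x, y, t, 4))]:
        print(name, "MSE:", np.mean((est - truth) ** 2))
```

Each subset fit reads only its own data, so the loop bodies could be dispatched to separate processes (for example with `concurrent.futures`) with no communication until recombination. Because every sequential block selects its own smoothing parameter, that variant can smooth more in flat regions and less in rough ones. This is one plausible reading of the spatial adaptivity described above.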

Supplementary Material: Simulation Code

1.

fit only: folder containing four R code files that compute spline estimates and system runtimes for the three simulation cases, using the GML choice of the smoothing parameter and σ = 0.1.

(a)

fit_only.R: R code file containing six functions that compute estimates with the four D&R methods and two other methods.

(b)

simple_sin_s1.R: R code file to compute system runtime for case 1.

(c)

sin_s1.R: R code file to compute system runtime for case 2.

(d)

doppler_s1.R: R code file to compute system runtime for case 3.

2.

with pstd: folder containing four R code files that compute confidence intervals, average MSE, squared bias, and variance for the three simulation cases, using the GML choice of the smoothing parameter and σ = 0.1.

(a)

fit_only.R: R code file containing six functions that compute estimates and confidence intervals with the four D&R methods and two other methods.

(b)

simple_sin_s1.R: R code file to compute confidence intervals, average MSE, squared bias and variance for case 1.

(c)

sin_s1.R: R code file to compute confidence intervals, average MSE, squared bias and variance for case 2.

(d)

doppler_s1.R: R code file to compute confidence intervals, average MSE, squared bias, and variance for case 3, and to generate Figure 1 (typical spline estimates for case 3).
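The "with pstd" scripts report average MSE, squared bias, and variance. Their exact averaging is not shown here, so the sketch below assumes the usual simulation definitions: for M replicate fits f̂_m evaluated at points x_i, the pointwise MSE (1/M) Σ_m (f̂_m(x_i) − f(x_i))² splits exactly into the squared bias (f̄(x_i) − f(x_i))² plus the variance (1/M) Σ_m (f̂_m(x_i) − f̄(x_i))², where f̄ is the mean fit, and each term is then averaged over the points. The function name `mse_bias_variance` is hypothetical, and Python stands in for the supplement's R.

```python
import numpy as np


def mse_bias_variance(estimates, truth):
    """Decompose simulation error over replicates.

    estimates: array (n_reps, n_points), one row of fitted values per simulated dataset.
    truth: array (n_points,) with the true function at the same points.
    Returns (average MSE, average squared bias, average variance), each
    averaged over the evaluation points; MSE = squared bias + variance exactly.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mean_fit = estimates.mean(axis=0)                 # pointwise mean over replicates
    mse = np.mean((estimates - truth) ** 2)
    bias2 = np.mean((mean_fit - truth) ** 2)
    var = np.mean((estimates - mean_fit) ** 2)        # divisor M (ddof=0) keeps the identity exact
    return mse, bias2, var
```

With an unbiased divisor of M − 1 for the variance, the identity holds only approximately, which is worth checking against the scripts before comparing numbers.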

Additional information

Funding

This research was supported by a grant from the National Science Foundation (DMS-1507620). The authors acknowledge support from the Center for Scientific Computing from the CNSI, MRL: an NSF MRSEC (DMR-1121053).
