Abstract
The additive model is a popular nonparametric regression method due to its ability to retain modeling flexibility while avoiding the curse of dimensionality. The backfitting algorithm is an intuitive and widely used numerical approach for fitting additive models. However, its application to large datasets may incur a prohibitive computational cost, making it infeasible in practice. To address this problem, we propose a novel approach called independence-encouraging subsampling (IES) to select a subsample from big data for training additive models. Inspired by the minimax optimality of an orthogonal array (OA), which has pairwise independent predictors and covers the range of each predictor uniformly, the IES approach selects a subsample that approximates an OA to achieve minimax optimality. Our asymptotic analyses demonstrate that an IES subsample converges to an OA and that the backfitting algorithm over the subsample converges to a unique solution even if the predictors are highly dependent in the full data. The proposed IES method is shown to be numerically appealing via simulations and a real data application. Theoretical proofs, R codes, and supplementary numerical results are accessible online as supplementary materials.
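To make the two ingredients named above concrete, the following minimal R sketch illustrates (i) a toy, grid-based subsample selection in the spirit of IES, binning each predictor so the selected points cover every predictor's range uniformly with roughly balanced level pairs, and (ii) backfitting on the subsample with smoothing splines. This is our illustration under stated assumptions, not the authors' IES algorithm or released code; the bin count q, the per-cell draw, and all variable names are hypothetical choices for a two-predictor example.

## Minimal sketch, assuming a two-predictor additive model.
## Not the authors' IES implementation; q and m_per_cell are illustrative.
set.seed(1)
N  <- 10000                        # full-data size
x1 <- runif(N); x2 <- runif(N)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(N, sd = 0.1)

## Toy OA-flavored subsampling: bin each predictor into q levels and
## draw up to m_per_cell points from every (level, level) cell, so the
## subsample covers each predictor uniformly with balanced level pairs.
q <- 5; m_per_cell <- 4
b1 <- cut(x1, breaks = q, labels = FALSE)
b2 <- cut(x2, breaks = q, labels = FALSE)
idx <- unlist(lapply(split(seq_len(N), list(b1, b2)), function(cell)
  cell[sample.int(length(cell), min(m_per_cell, length(cell)))]))

## Backfitting on the subsample: iterate smoothing-spline fits to the
## partial residuals, centering each component for identifiability.
xs1 <- x1[idx]; xs2 <- x2[idx]; ys <- y[idx]
alpha <- mean(ys)
f1 <- f2 <- numeric(length(idx))
for (iter in 1:25) {
  f1 <- predict(smooth.spline(xs1, ys - alpha - f2), xs1)$y
  f1 <- f1 - mean(f1)
  f2 <- predict(smooth.spline(xs2, ys - alpha - f1), xs2)$y
  f2 <- f2 - mean(f2)
}

The stratified draw above only crudely encourages uniform coverage and pairwise balance; the paper's IES criterion selects the subsample so that it provably converges to an OA, which underlies the unique backfitting solution stated in the abstract.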
Supplementary Materials
Online supplementary materials:
provide proofs of the theoretical results, a discussion of the existence of orthogonal arrays, and supporting numerical results.
Code:
provides R code to replicate our results and to apply the method to other applications.
Acknowledgments
We thank the editor, associate editor, and three referees for their valuable comments and suggestions.
Disclosure Statement
The authors report there are no competing interests to declare.