Theory and Methods

A Flexible and Fast Method for Automatic Smoothing

Pages 643-652 | Received 01 Feb 1988, Published online: 27 Feb 2012
 

Abstract

The choice of smoothing parameter or bandwidth is crucial when applying nonparametric regression estimators such as kernel estimators. The optimal choice depends on the data at hand. A data-driven bandwidth selection, close to the optimal one, would make these curve estimators objective, more reliable, and easier to use. Criteria that minimize the residual mean squared error, such as cross-validation, have frequently been proposed to estimate the optimal smoothing parameter. Empirical and theoretical evidence indicates that cross-validation rules and related methods lead to rather variable estimated optimal bandwidths. The method presented here builds on estimating the asymptotically optimal bandwidth from the data. Since estimators for the residual variance and for an asymptotic expression for the bias are plugged into the asymptotic formula, such selection rules are called “plug-in” estimators. The functional that quantifies bias is approximated by the integrated squared second derivative of the regression function. This functional is determined by an iterative procedure with good convergence properties in theory and practice. A theoretical large-sample analysis shows that the plug-in estimator is attractive in terms of variability, with a relative rate $O_P(n^{-1/2})$ for smooth functions. In contrast, cross-validation leads to a relative rate $O_P(n^{-1/10})$. Despite the estimation of second derivatives involved, asymptotic properties are still adequate for nonsmooth functions. This robustness is desirable for a nonparametric approach. An extensive simulation study confirms the theoretical findings: The plug-in estimator of the bandwidth has much lower variability than cross-validation estimators for a broad variety of situations, including nonsmooth functions. Despite its asymptotic background, it works well for sample sizes as small as n = 15–25. It applies to fixed and to random design, and to equally spaced and nonequally spaced design. Additional assets are computational speed and the great flexibility of this approach. It can also be extended to estimating the optimal bandwidth when determining derivatives of a regression function or probability densities and spectral densities.
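
The abstract gives no formulas, so the following is only a minimal sketch of the plug-in idea under stated assumptions, not the authors' algorithm. It assumes an equally spaced design on [0, 1], a Gaussian kernel, and the standard textbook form of the asymptotically optimal bandwidth, h = (sigma^2 R(K) L / (n mu_2(K)^2 theta))^(1/5), where theta is the integrated squared second derivative of the regression function over an interior interval of length L. The first-difference variance estimator, the interior interval [delta, 1 - delta], the starting value 1/n, the inflation factor n^(1/10) for the pilot bandwidth, the iteration count, and all function names are illustrative choices made for this sketch.

```python
import numpy as np

# Constants for the Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi).
R_K = 1.0 / (2.0 * np.sqrt(np.pi))  # integral of K(u)^2
MU2_K = 1.0                         # integral of u^2 * K(u)


def gauss_second_deriv(u):
    """Second derivative of the Gaussian kernel: K''(u) = (u^2 - 1) K(u)."""
    return (u ** 2 - 1.0) * np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def residual_variance(y):
    """Difference-based estimate of the residual variance.

    Uses first differences of neighbouring observations (an assumption of
    this sketch; other difference schemes are possible).
    """
    d = np.diff(y)
    return np.sum(d ** 2) / (2.0 * (len(y) - 1))


def integrated_sq_second_deriv(x, y, g, lo, hi, n_grid=101):
    """Estimate the integral of m''(t)^2 over [lo, hi] with pilot bandwidth g."""
    n = len(x)
    grid = np.linspace(lo, hi, n_grid)
    u = (grid[:, None] - x[None, :]) / g
    # Kernel estimate of m'' for an equally spaced design with spacing 1/n.
    m2 = gauss_second_deriv(u) @ y / (n * g ** 3)
    # Trapezoidal rule for the integral of m2^2 over the grid.
    step = grid[1] - grid[0]
    return step * (np.sum(m2 ** 2) - 0.5 * (m2[0] ** 2 + m2[-1] ** 2))


def plugin_bandwidth(x, y, delta=0.2, n_iter=11):
    """Iterate the plug-in formula, starting from a very small bandwidth."""
    n = len(x)
    lo, hi = delta, 1.0 - delta       # interior interval, avoids boundary effects
    sigma2 = residual_variance(y)
    h = 1.0 / n                       # assumed starting value
    for _ in range(n_iter):
        g = h * n ** 0.1              # assumed inflated pilot bandwidth for m''
        theta = integrated_sq_second_deriv(x, y, g, lo, hi)
        h = (sigma2 * R_K * (hi - lo) / (n * MU2_K ** 2 * theta)) ** 0.2
    return h


# Usage on simulated data (hypothetical example, not from the article).
rng = np.random.default_rng(0)
n = 200
x = (np.arange(n) + 0.5) / n
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(n)
h_hat = plugin_bandwidth(x, y)
print(f"plug-in bandwidth: {h_hat:.4f}")
```

The pilot bandwidth for the second derivative is deliberately larger than the bandwidth for the curve itself, because derivative estimates are noisier. Restricting the integral to an interior interval is a simplification that sidesteps boundary corrections, which a careful implementation would need to handle.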
