Abstract
In applications of the k-nearest neighbour technique (kNN) with real-valued attributes of interest (Y), predictions are biased for units whose ancillary values of X have poor or no representation in a sample of n units. In this article a model-assisted calibration is proposed that reduces this unit-level extrapolation bias. The bias is estimated as the difference between model-based predictions of Y given the X-values of the true k nearest units and those given the X-values of the k selected reference units. Calibrated kNN predictions are then obtained by adding this difference to the original kNN prediction. The relationship between Y and X is modelled using decorrelated X-variables scaled to the interval [0,1], with Bernstein basis functions capturing changes in Y as a function of changes in X. Three examples with actual forest inventory data from Italy, the USA and Finland demonstrated that calibrated kNN predictions were, on average, closer to their true values than non-calibrated predictions. Calibrated predictions also had a range much closer to the actual range of Y than non-calibrated predictions.
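The calibration step described above can be illustrated with a minimal sketch. It is not the implementation used in the article: it assumes a single X-variable, and an ordinary least-squares line stands in for the Bernstein basis model. The function names (`fit_line`, `nearest`, `calibrated_knn`) and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a stand-in for the article's working model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def nearest(x0, xs, k):
    """Indices of the k values in xs that are closest to x0."""
    return sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]


def calibrated_knn(x0, pop_x, sample_x, sample_y, k=3):
    """Return (plain kNN prediction, calibrated kNN prediction) for a unit at x0."""
    model = fit_line(sample_x, sample_y)

    # Plain kNN: mean Y of the k selected reference units in the sample.
    ref = nearest(x0, sample_x, k)
    knn_pred = mean(sample_y[i] for i in ref)

    # True k nearest units in the population (X known for all units, Y unknown).
    true_nn = nearest(x0, pop_x, k)

    # Difference in model-based predictions: true neighbours minus reference units.
    correction = (mean(model(pop_x[i]) for i in true_nn)
                  - mean(model(sample_x[i]) for i in ref))
    return knn_pred, knn_pred + correction


# Toy data (assumed): Y = 2*X, sample covers only the lower half of the X range.
pop_x = [i / 10 for i in range(11)]
sample_x = [0.0, 0.1, 0.2, 0.3, 0.4]
sample_y = [2 * x for x in sample_x]

knn, cal = calibrated_knn(1.0, pop_x, sample_x, sample_y, k=2)
print(knn, cal)  # approximately 0.7 and 1.9
```

In this toy case the target unit at X = 1.0 lies outside the sampled range, so the plain kNN prediction (about 0.7) falls well short of the mean Y of its true nearest neighbours (1.9), whereas the calibrated prediction recovers it because the stand-in model happens to be exact here.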
Acknowledgements
Data for the IT population were kindly made available by Dr Piermaria Corona, University of Tuscany, Department of Forest Environment and Resources. We are grateful to three anonymous journal referees and the Editor for numerous constructive and helpful comments and suggestions on an earlier version of this manuscript.