Abstract
Let X, X_1, X_2, … be independent and identically distributed ℝ^d-valued random variables and let m: ℝ^d → ℝ be a measurable function such that a density f of Y = m(X) exists. The problem of estimating f based on a sample of the distribution of (X, Y) and on additional independent observations of X is considered. Two kernel density estimates are compared: the standard kernel density estimate based on the y-values of the sample of (X, Y), and a kernel density estimate based on artificially generated y-values corresponding to the additional observations of X. It is shown that, under suitable smoothness assumptions on f and m, the rate of convergence of the L_1 error of the latter estimate is better than that of the standard kernel density estimate. Furthermore, a density estimate defined as a convex combination of these two estimates is considered, and a data-driven choice of its parameters (the bandwidths and the weight of the convex combination) is proposed and analysed.
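The two estimates and their convex combination can be illustrated with a minimal sketch. It is not the construction analysed in the paper: it assumes d = 1, a Gaussian kernel, and a least-squares polynomial fit as the surrogate for m that generates the artificial y-values. The function names, the test function `m`, the sample sizes, and the fixed bandwidths `h1`, `h2` and weight `w` are all hypothetical; the data-driven parameter choice mentioned above is not reproduced here.

```python
import numpy as np

def kde(points, grid, h):
    """Gaussian kernel density estimate on `grid` with bandwidth h."""
    u = (grid[:, None] - points[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

def combined_estimate(x, y, x_extra, grid, h1, h2, w, degree=3):
    """Convex combination of a standard KDE and a surrogate-based KDE."""
    # Assumption: m is approximated by a polynomial least-squares fit (d = 1).
    coeffs = np.polyfit(x, y, degree)
    # Artificial y-values for the additional observations of X.
    y_artificial = np.polyval(coeffs, x_extra)
    f_standard = kde(y, grid, h1)              # uses the observed y-values
    f_surrogate = kde(y_artificial, grid, h2)  # uses the artificial y-values
    return w * f_standard + (1.0 - w) * f_surrogate

rng = np.random.default_rng(0)
m = lambda t: 2.0 * t + np.sin(t)   # hypothetical smooth m, for illustration only
x = rng.normal(size=50)             # small sample of (X, Y)
y = m(x)
x_extra = rng.normal(size=5000)     # additional observations of X
grid = np.linspace(-15.0, 15.0, 601)

f_hat = combined_estimate(x, y, x_extra, grid, h1=0.8, h2=0.3, w=0.2)
total_mass = f_hat.sum() * (grid[1] - grid[0])  # Riemann sum, should be close to 1
```

Setting `w = 1` recovers the standard kernel density estimate and `w = 0` the purely surrogate-based one. The smaller bandwidth for the second estimate reflects only that many more artificial values are available in this toy setting.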
Acknowledgements
The authors would like to thank an associate editor and two anonymous referees for various comments which helped to improve the first version of this paper.
Funding
The first two authors would like to thank the German Research Foundation (DFG) for funding this project within the Collaborative Research Centre 805. The third author would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) [grant number RGPIN 270-2010].