Abstract
This paper tackles the problem of constructing a nonparametric predictor when the latent variables are only partially observed. A convenient predictor for this task is the random forest algorithm in conjunction with the so-called CART criterion. The proposed technique performs a partial imputation of the missing values in the data set, in a way that yields both a consistent estimator of the regression function and a partial recovery of the missing values. The imputation is done by iteratively assigning the missing values to the cells of the trees so as to maximise the CART criterion. A proof of the consistency of the random forest estimator is given in the case where each latent variable is missing completely at random (MCAR).
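To make the idea of assigning missing values to cells while maximising the CART criterion more concrete, here is a minimal sketch of a single split, not the authors' implementation (the reproduction code is linked in the Notes). It assumes a simplification not stated above: for one coordinate and one cell, all observations whose value is missing are sent together to either the left or the right child, and the pair (threshold, side) with the largest empirical variance reduction is kept. The function names `cart_criterion` and `best_split_with_missing` are hypothetical. The chosen side places each missing value in an interval, which is one way to read "partial recovery".

```python
import numpy as np


def cart_criterion(y, go_left):
    """Empirical CART (variance-reduction) criterion for splitting a cell.

    y is the vector of responses in the cell and go_left a boolean mask
    selecting the left child. Degenerate splits score 0.
    """
    n = y.size
    left, right = y[go_left], y[~go_left]
    if left.size == 0 or right.size == 0:
        return 0.0
    total = np.sum((y - y.mean()) ** 2)
    within = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
    return (total - within) / n


def best_split_with_missing(x, y):
    """Best threshold on one coordinate when some entries of x are NaN.

    Hypothetical simplification: all missing entries are assigned as a
    block to one child, whichever side maximises the CART criterion.
    Returns (threshold, missing_go_left, criterion).
    """
    missing = np.isnan(x)
    observed = np.unique(x[~missing])
    best = (None, None, -np.inf)
    # Candidate thresholds: midpoints between consecutive observed values.
    for z in (observed[:-1] + observed[1:]) / 2:
        observed_left = np.zeros(x.size, dtype=bool)
        observed_left[~missing] = x[~missing] < z
        for missing_left in (True, False):
            # Missing entries join the left child only if missing_left is True.
            go_left = observed_left | missing if missing_left else observed_left
            crit = cart_criterion(y, go_left)
            if crit > best[2]:
                best = (z, missing_left, crit)
    return best
```

For example, with `x = [1, 2, 3, 4, nan, nan]` and `y = [0, 0, 10, 10, 10, 10]`, the sketch splits at 2.5 and sends both missing entries to the right child, implicitly placing their values above 2.5.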
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Code to reproduce our results can be found at https://github.com/IrvingGomez/RandomForestsSimulations.