
On the consistency of a random forest algorithm in the presence of missing entries

Pages 400-434 | Received 22 Sep 2021, Accepted 23 May 2023, Published online: 06 Jun 2023
 

Abstract

This paper tackles the problem of constructing a nonparametric predictor when the latent variables are observed with incomplete information. A convenient predictor for this task is the random forest algorithm combined with the so-called CART criterion. The proposed technique performs a partial imputation of the missing values in the data set that yields both a consistent estimator of the regression function and a partial recovery of the missing values. The imputation is carried out by iteratively assigning the missing values to the cells of the tree so as to maximise the CART criterion. A proof of the consistency of the random forest estimator is given in the case where each latent variable is missing completely at random (MCAR).
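The idea of assigning missing values to tree cells so as to maximise the CART criterion can be illustrated for a single split along one coordinate. The sketch below is not the authors' method: it assumes the usual variance-reduction form of the CART criterion for regression and a hypothetical greedy rule that places each observation with a missing coordinate (encoded as NaN), one at a time, in whichever child cell gives the larger criterion. The function names `cart_criterion` and `split_with_missing` are illustrative only, and the paper's actual assignment rule may differ.

```python
import numpy as np


def cart_criterion(y, go_left):
    """Empirical CART criterion (reduction in mean squared error) for one split.

    y       : responses of the observations in the current cell
    go_left : boolean mask, True if the observation goes to the left child
    """
    n = len(y)
    total = np.sum((y - y.mean()) ** 2) / n
    left, right = y[go_left], y[~go_left]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split: no reduction in squared error
    within = (np.sum((left - left.mean()) ** 2)
              + np.sum((right - right.mean()) ** 2)) / n
    return total - within


def split_with_missing(x, y, z):
    """Hypothetical greedy rule for the split {x < z} vs {x >= z}.

    Observed coordinates follow the threshold z; each observation whose
    coordinate is missing (NaN) is then assigned, in turn, to the child
    that maximises the CART criterion computed on the observations
    placed so far. Returns the final mask and its CART criterion.
    """
    missing = np.isnan(x)
    go_left = np.zeros(len(x), dtype=bool)
    go_left[~missing] = x[~missing] < z  # observed values follow the threshold
    assigned = ~missing                  # observations already placed in a child
    for i in np.flatnonzero(missing):
        scores = {}
        for side in (True, False):
            go_left[i] = side
            mask = assigned.copy()
            mask[i] = True
            scores[side] = cart_criterion(y[mask], go_left[mask])
        go_left[i] = scores[True] >= scores[False]  # keep the better side
        assigned[i] = True
    return go_left, cart_criterion(y, go_left)
```

For example, with `x = np.array([0.1, 0.2, np.nan, 0.8, 0.9, np.nan])`, `y = np.array([0.0, 0.0, 0.1, 1.0, 1.0, 0.9])` and threshold `z = 0.5`, the observation with response 0.1 is sent to the left cell and the one with response 0.9 to the right cell. Each assignment also gives a partial recovery of the missing coordinate, since it is then known only to lie below or above `z`.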


Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Code to reproduce our results can be found at https://github.com/IrvingGomez/RandomForestsSimulations.
