Abstract
Random forests (RFs), introduced by Leo Breiman in 2001, are a very effective statistical method. The complex mechanism of the method makes theoretical analysis difficult. Therefore, simplified versions of RFs, called purely random forests (PRFs), which are easier to handle theoretically, have been considered. In this paper, we study the variance of such forests. First, we show a general upper bound which emphasises the fact that a forest reduces the variance. We then introduce a simple variant of PRFs, which we call purely uniformly random forests. For this variant, in the context of regression problems with a one-dimensional predictor space, we show that both random trees and RFs reach the minimax rate of convergence. In addition, we prove that, compared with random trees, RFs improve accuracy by reducing the estimator variance by a factor of three-fourths.
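The construction alluded to above (cut points drawn independently of the data, each tree predicting the within-cell mean, the forest averaging the trees) can be illustrated by the following minimal Python sketch. This is not the paper's formal estimator; function names such as `fit_purely_uniform_tree` and the parameter choices are illustrative assumptions for a one-dimensional predictor space [0, 1].

```python
import numpy as np

def fit_purely_uniform_tree(X, Y, k, rng):
    """One purely uniformly random tree on [0, 1]: k cut points are drawn
    uniformly, independently of the data (hence "purely" random); each
    resulting cell predicts the mean of the responses Y falling in it."""
    cuts = np.sort(rng.uniform(0.0, 1.0, size=k))
    edges = np.concatenate(([0.0], cuts, [1.0]))          # k + 1 cells
    idx = np.clip(np.searchsorted(edges, X, side="right") - 1, 0, k)
    cell_means = np.empty(k + 1)
    for c in range(k + 1):
        in_cell = Y[idx == c]
        # Fall back to the global mean for empty cells.
        cell_means[c] = in_cell.mean() if in_cell.size else Y.mean()
    return edges, cell_means

def predict_tree(x, edges, cell_means):
    """Piecewise-constant tree estimate at the points x."""
    c = np.clip(np.searchsorted(edges, x, side="right") - 1,
                0, len(cell_means) - 1)
    return cell_means[c]

def purf_predict(x, X, Y, n_trees=100, k=10, seed=0):
    """Forest estimate: the average of n_trees independent random-tree
    estimates; averaging is what reduces the variance of a single tree."""
    rng = np.random.default_rng(seed)
    preds = [predict_tree(x, *fit_purely_uniform_tree(X, Y, k, rng))
             for _ in range(n_trees)]
    return np.mean(preds, axis=0)
```

Because the partitions are drawn independently of the data, the randomness of the trees can be analysed separately from the sampling noise, which is what makes this family of forests theoretically tractable.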
Acknowledgements
The author would like to thank Pascal Massart for very valuable discussions, Jean-Michel Poggi and Nicolas Verzelen for interesting remarks, and Lucile Debrosse for her precious help with the English language. The author also thanks two anonymous referees who helped to significantly improve this paper.