Abstract
SubBag is a technique that combines bagging and the random subspace method to generate ensemble classifiers with good generalization capability. In practice, a hyperparameter K of SubBag (the number of randomly selected features used to build each base classifier) must be specified beforehand. In this article, we propose using the out-of-bag instances to determine the optimal value of K in SubBag. Experiments on several real-world data sets from the UCI repository show that the proposed method enables SubBag to achieve optimal performance in nearly all of the cases considered. Moreover, it requires fewer computational resources than the cross-validation procedure.
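To make the idea concrete, the following is a minimal pure-Python sketch of choosing K from out-of-bag (OOB) error, under stated assumptions rather than as the authors' implementation. Each base classifier is trained on a bootstrap sample (bagging) restricted to K randomly chosen features (random subspace); every training instance is then classified only by the members whose bootstrap sample excluded it, and the K with the lowest resulting OOB error is selected. The nearest-centroid base learner, the helper names (`fit_centroids`, `predict_centroids`, `subbag_oob_error`, `select_k_oob`, `make_toy_data`), the ensemble size of 50, and the toy data are all hypothetical illustrations; the paper's base learner and exact OOB estimate may differ.

```python
import random
from collections import Counter


def fit_centroids(X, y, rows, feats):
    """Hypothetical base learner: per-class mean of X restricted to `feats`."""
    sums, counts = {}, Counter()
    for i in rows:
        acc = sums.setdefault(y[i], [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += X[i][f]
        counts[y[i]] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroids(model, x, feats):
    """Assign x to the class whose centroid is nearest (squared Euclidean) in the subspace."""
    def sq_dist(cent):
        return sum((x[f] - m) ** 2 for f, m in zip(feats, cent))
    return min(model, key=lambda c: sq_dist(model[c]))


def subbag_oob_error(X, y, K, n_estimators=50, seed=0):
    """OOB error of a SubBag ensemble in which every member sees K random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        rows = [rng.randrange(n) for _ in range(n)]  # bagging: bootstrap sample
        feats = rng.sample(range(p), K)              # random subspace: K of p features
        model = fit_centroids(X, y, rows, feats)
        in_bag = set(rows)
        for i in range(n):
            if i not in in_bag:                      # only members that never saw i vote
                votes[i][predict_centroids(model, X[i], feats)] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no out-of-bag votes; increase n_estimators")
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


def select_k_oob(X, y, n_estimators=50, seed=0):
    """Return the K in 1..p with the lowest OOB error (smallest K on ties) and all errors."""
    p = len(X[0])
    errors = {K: subbag_oob_error(X, y, K, n_estimators, seed) for K in range(1, p + 1)}
    best = min(errors, key=errors.get)
    return best, errors


def make_toy_data(seed=1, n=60):
    """Hypothetical two-class data: feature 0 is informative, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([3.0 * label + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best_k, errors = select_k_oob(X, y)
    print("OOB error per K:", errors)
    print("selected K:", best_k)
```

This also shows where the computational saving over cross validation comes from: each candidate K requires fitting only one ensemble, so searching all p values costs p ensemble fits, whereas 10-fold cross validation would need 10p.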
Acknowledgments
The authors would like to acknowledge the support from the Chinese National Basic Research Program (973 Program, No. 2007CB311002), the National Natural Science Foundation of China (No. 60675013), the Chinese Social Science Foundation of the Ministry of Education of China (No. 09YJA790174), as well as the Fundamental Research Funds for the Central Universities of China.
Notes
“•” and “○” indicate that SubBag with K* is significantly better and worse, respectively, than the corresponding algorithm at the significance level α = 0.05.