Abstract
A novel ensemble-based feature selection method, designated ensemble partial least squares regression coefficients (EPRC), was developed. It consists of two steps: generating a series of different single feature selectors and aggregating them to reach a consensus. Specifically, bootstrap resampling was used to generate diverse single feature selectors, and the absolute values of the regression coefficients of the partial least squares (PLS) model were used to rank the features. Next, the feature rankings from the single feature selectors were aggregated by a weighted-sum approach. Finally, coupled with a regression model, the features selected by EPRC were evaluated through cross-validation and on an independent test set. Experiments constructing spectroscopic analysis models on three near-infrared spectroscopy (NIRS) datasets showed that EPRC located key wavelengths, improved regression performance, and was more stable and more interpretable to domain experts.
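The two steps summarized above (bootstrap-generated PLS selectors, then weighted-sum aggregation of their rankings) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `pls1_coefficients` and `eprc_scores`, the NIPALS-style PLS1 fit on mean-centered data, the use of rank positions as per-selector scores, and the uniform default weights are all assumptions made for illustration, since the abstract does not specify the weighting scheme or the number of latent variables.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a single-response PLS model (NIPALS-style).

    Assumption: data are only mean-centered, not scaled.
    """
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight vector: covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        q_a = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - q_a * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    if not W:
        return np.zeros(X.shape[1])
    W = np.column_stack(W)
    P = np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def eprc_scores(X, y, n_components=2, n_bootstrap=50, weights=None, seed=0):
    """Aggregate bootstrap PLS feature rankings by a weighted sum.

    Assumption: each selector scores a feature by its rank position
    (0 = smallest |coefficient|); weights default to uniform.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if weights is None:
        weights = np.ones(n_bootstrap)
    weights = np.asarray(weights, dtype=float)
    scores = np.zeros(p)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)   # bootstrap sample, with replacement
        coef = pls1_coefficients(X[idx], y[idx], n_components)
        ranks = np.argsort(np.argsort(np.abs(coef)))
        scores += weights[b] * ranks
    return scores / weights.sum()


# Usage on synthetic data (hypothetical; not one of the NIRS datasets).
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(80, 30))
y_demo = 2.0 * X_demo[:, 5] + 0.1 * demo_rng.normal(size=80)
demo_scores = eprc_scores(X_demo, y_demo)
top_features = np.argsort(demo_scores)[::-1][:5]  # highest consensus first
```

In a workflow like the one described, the top-ranked features would then be passed to a regression model and assessed by cross-validation and on a held-out test set. Weights based on each selector's predictive quality could be supplied through the `weights` argument.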
Acknowledgments
The authors acknowledge Dr. H.B. Lu for fruitful discussions on optimization theory. The authors acknowledge Dr. H. Wang for help with molecular structures.
Disclosure statement
No potential conflict of interest was reported by the authors.