ABSTRACT
Validation of multivariate models is important for a wide range of chemical applications, yet it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to that single validation set. In addition, it provides no statistics regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs are reported with confidence intervals, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
Acknowledgments
Ahmet Aloglu, Xinyi Wang, and Zewei Chen are thanked for their helpful comments. Jim Harnly and the USDA ARS are thanked for supplying the ginseng UV spectra and partial support of this project. John Kalivas at Idaho State University is thanked for supplying the NIR Wheat Data. Tecator is thanked for making the meat dataset publicly available.