ABSTRACT
We investigated how statistical validation parameters depend on the sample size used in fitting multivariate linear curves. R² and related internal parameters were misleading because they overestimated the goodness-of-fit of models at small sample sizes. Cross-validation metrics followed the correct trends. The leave-one-out and leave-many-out results could be scaled to be nearly identical by correcting the degrees of freedom of the models. Validation parameters from y-randomization and x-randomization were calculated, and the two methods gave nearly identical results. We suggest using the simplest method in both cases. The external parameters followed the correct trends with respect to sample size, but their sensitivity differed. We plotted the Roy-Ojha metrics in 2D and coloured them according to other external parameters to provide an easy classification of models. Rank correlations were calculated between the performance parameters. Up to a certain sample size, goodness-of-fit and robustness were distinguishable, but above it the parameters were redundant. The external-internal pairs were weakly correlated. Our data show that all three aspects of validation are necessary at small sample sizes, but the internal check of robustness is not informative above a given sample size.
Acknowledgements
The authors thank the participants of the Conferentia Chemometrica conference, held in Karcag (Hungary) in September 2019, for fruitful discussions. The investigation was partly supported by grant NKFI K-128136.
Description of the supplementary material
In the supplementary material, we clarified two questions on validation parameters for which we found several misinterpretations or incomplete conclusions in the literature. Table S1 collects the differences in the interpretation of validation parameters between unconstrained linear regression and other model types. Figure S1 shows biased models with shift and scale to demonstrate the behaviour of CCC and R². This demonstration seemed necessary because of the several misinterpretations that originate from the merely conditional equivalence of R² and the square of the Pearson correlation coefficient. We found that the signalling power of CCC was certainly not larger than that of R².
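The conditional equivalence mentioned above can be illustrated with a minimal sketch. It is not the calculation behind Figure S1. It assumes a one-descriptor ordinary least-squares fit on synthetic data, takes CCC to be Lin's concordance correlation coefficient, and uses arbitrary shift (+1.0) and scale (x1.5) values; all function and variable names are hypothetical.

```python
import numpy as np


def r2(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot


def pearson_r2(y_obs, y_pred):
    """Square of the Pearson correlation coefficient."""
    return np.corrcoef(y_obs, y_pred)[0, 1] ** 2


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient (Lin's definition, assumed here)."""
    m_obs, m_pred = np.mean(y_obs), np.mean(y_pred)
    cov = np.mean((y_obs - m_obs) * (y_pred - m_pred))
    return 2.0 * cov / (np.var(y_obs) + np.var(y_pred) + (m_obs - m_pred) ** 2)


# Synthetic data (assumption): one descriptor, linear response plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y_obs = 2.0 * x + rng.normal(scale=0.3, size=50)

# Unconstrained least-squares line with intercept.
slope, intercept = np.polyfit(x, y_obs, 1)
y_fit = slope * x + intercept

# Biased "models": the fitted values with an added shift or a scale factor.
predictions = {
    "unbiased": y_fit,
    "shifted": y_fit + 1.0,
    "scaled": 1.5 * y_fit,
}

results = {
    name: (r2(y_obs, y_pred), pearson_r2(y_obs, y_pred), ccc(y_obs, y_pred))
    for name, y_pred in predictions.items()
}

for name, (r2_val, pr2_val, ccc_val) in results.items():
    print(f"{name:9s} R2={r2_val:.3f}  Pearson r^2={pr2_val:.3f}  CCC={ccc_val:.3f}")
```

For the unbiased fit, R² and the squared Pearson coefficient coincide. After the shift or the scaling, the squared Pearson coefficient is unchanged, whereas R² and CCC both decrease.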
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary material
Supplementary data for this article can be accessed at: https://doi.org/10.1080/1062936X.2021.1890208.