18th International Conference on QSAR in Environmental and Health Sciences (QSAR 2018)

Modelling methods and cross-validation variants in QSAR: a multi-level analysis

Pages 661-674 | Received 11 Jul 2018, Accepted 24 Jul 2018, Published online: 30 Aug 2018
 

ABSTRACT

Prediction performance often depends on the cross-validation and test validation protocols applied. Several combinations of cross-validation (CV) variants and model-building techniques were used to reveal the complexity of this dependence. Two case studies (acute toxicity data) were examined, applying five-fold CV (in random, contiguous and Venetian blind forms) and leave-one-out CV. External test sets showed the effects of, and differences between, the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made by the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and to contiguous block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared based on their basic performance parameters (r² and Q²). MLR produced the largest gap between r² and Q², while PCR gave the smallest. Although PCR is the best validated and most balanced technique, SVM always outperformed the other methods when experimental values were the benchmark. Variable selection was advantageous, and the modelling method had a larger influence than the CV variant.
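The CV variants named in the abstract differ only in how samples are assigned to folds. The following is a minimal sketch of that assignment, not the authors' implementation: the function name `cv_fold_labels`, its parameters and the fixed random seed are assumptions made for illustration. Note that the contiguous and Venetian blind schemes depend on the order of the samples in the data set, which the abstract does not specify.

```python
import numpy as np

def cv_fold_labels(n_samples, n_folds=5, scheme="venetian", seed=0):
    """Return one fold label per sample (hypothetical helper, for illustration)."""
    idx = np.arange(n_samples)
    if scheme == "venetian":
        # Venetian blind: every n_folds-th sample shares a fold (0,1,2,3,4,0,1,...)
        return idx % n_folds
    if scheme == "contiguous":
        # Contiguous blocks: consecutive samples share a fold (0,0,...,1,1,...)
        return idx * n_folds // n_samples
    if scheme == "random":
        # Random: balanced fold sizes, but shuffled assignment
        rng = np.random.default_rng(seed)
        return rng.permutation(idx % n_folds)
    if scheme == "loo":
        # Leave-one-out: each sample is its own fold
        return idx
    raise ValueError(f"unknown scheme: {scheme}")

if __name__ == "__main__":
    for scheme in ("venetian", "contiguous", "random", "loo"):
        print(scheme, cv_fold_labels(10, scheme=scheme))
```

In each CV round, the samples carrying one label are held out and the model is fitted on the rest. If the samples are sorted (for example by response value), contiguous blocks leave out a whole region of the data at once, whereas the Venetian blind scheme spreads each fold across the full range.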

Acknowledgement

This work was supported by the National Research, Development and Innovation Office of Hungary (NKFIH, grants K 119269 and KH_17 125608).

Disclosure statement

No potential conflict of interest was reported by the authors.
