18th International Conference on QSAR in Environmental and Health Sciences (QSAR 2018)

Modelling methods and cross-validation variants in QSAR: a multi-level analysis

Pages 661-674 | Received 11 Jul 2018, Accepted 24 Jul 2018, Published online: 30 Aug 2018

