Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters

Pages 683-700 | Received 17 Jun 2015, Accepted 16 Aug 2015, Published online: 05 Oct 2015

