Research Article

Cross-validation strategies in QSPR modelling of chemical reactions

Pages 207-219 | Received 18 Sep 2020, Accepted 26 Jan 2021, Published online: 19 Feb 2021

