References
- Royal Society of Chemistry. ChemSpider. Available at http://www.chemspider.com/.
- The PubChem Project. Available at http://pubchem.ncbi.nlm.nih.gov/.
- Open Data. From Wikipedia, the free encyclopedia. Available at https://en.wikipedia.org/wiki/Open_Data.
- A.M. Clark and S. Ekins, Open source Bayesian models. 2. Mining a “big dataset” to create and validate models with ChEMBL, J. Chem. Inf. Model. 55 (2015), pp. 1246–1260.
- A.M. Clark, S. Ekins, and A.J. Williams, Redefining cheminformatics with intuitive collaborative mobile apps, Mol. Inform. 31 (2012), pp. 569–584.
- I.V. Tetko, D.M. Lowe, and A.J. Williams, The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS, J. Cheminf. 8 (2016).
- Wikidata, the free and open knowledge base. Available at https://www.wikidata.org/wiki/Wikidata:Main_Page.
- S. Ekins and A.J. Williams, Precompetitive preclinical ADME/Tox data: Set it free on the web to facilitate computational model building and assist drug development, Lab Chip 10 (2010), pp. 13–22.
- D. Ballabio, M. Vasighi, V. Consonni, and M. Kompany-Zareh, Genetic algorithms for architecture optimisation of counter-propagation artificial neural networks, Chemom. Intell. Lab. Syst. 105 (2011), pp. 56–64.
- K. Mansouri, A. Abdelaziz, A. Rybacka, A. Roncaglioni, A. Tropsha, A. Varnek, A. Zakharov, A. Worth, A. Richard, C.M. Grulke, D. Trisciuzzi, D. Fourches, D. Horvath, E. Benfenati, E. Muratov, E.B. Wedebye, F. Grisoni, G.F. Mangiatordi, G.M. Incisivo, H. Hong, H.W. Ng, I.V. Tetko, I. Balabin, J. Kancherla, J. Shen, J. Burton, M. Nicklaus, M. Cassotti, N.G. Nikolov, O. Nicolotti, P.L. Andersson, Q. Zang, R. Politi, R.D. Beger, R. Todeschini, R. Huang, S. Farag, S.A. Rosenberg, S. Slavov, X. Hu, and R.S. Judson, CERAPP: Collaborative estrogen receptor activity prediction project, Environ. Health Perspect. 124 (2016), pp. 1023–1033.
- K. Mansouri, T. Ringsted, D. Ballabio, R. Todeschini, and V. Consonni, Quantitative structure–activity relationship models for ready biodegradability of chemicals, J. Chem. Inf. Model. 53 (2013), pp. 867–878.
- S.H. Slavov, B.A. Pearce, D.A. Buzatu, J.G. Wilkes, and R.D. Beger, Complementary PLS and KNN algorithms for improved 3D-QSDAR consensus modeling of AhR binding, J. Cheminf. 5 (2013), p. 47.
- P. Gramatica, Principles of QSAR models validation: Internal and external, QSAR Comb. Sci. 26 (2007), pp. 694–701.
- D. Fourches, E. Muratov, and A. Tropsha, Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research, J. Chem. Inf. Model. 50 (2010), pp. 1189–1204.
- D. Fourches, E. Muratov, and A. Tropsha, Trust, but verify II: A practical guide to chemogenomics data curation, J. Chem. Inf. Model. 56 (2016), pp. 1243–1252.
- A.J. Williams, S. Ekins, and V. Tkachenko, Towards a gold standard: Regarding quality in public domain chemistry databases and approaches to improving the situation, Drug Discov. Today 17 (2012), pp. 685–701.
- M.R. Berthold, N. Cebron, F. Dill, T.R. Gabriel, T. Kötter, T. Meinl, et al., KNIME: The Konstanz Information Miner, in Studies in Data Analysis, Machine Learning and Applications: Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7–9, 2007, C. Preisach, H. Burkhardt, L. Schmidt-Thieme and R. Decker, eds., Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 319–326.
- U.S. EPA, EPI Suite Data. Available at http://esc.syrres.com/interkow/EpiSuiteData_ISIS_SDF.htm.
- Scientific Databases. Available at http://www.srcinc.com/what-we-do/environmental/scientific-databases.html.
- K. Mansouri, Estimating Degradation and Fate of Organic Pollutants by QSAR Modeling, LAP LAMBERT Academic Publishing, Saarbrücken, Germany, 2013.
- F. Sahigara, K. Mansouri, D. Ballabio, A. Mauri, V. Consonni, and R. Todeschini, Comparison of different approaches to define the applicability domain of QSAR models, Molecules 17 (2012), pp. 4791–4810.
- I.V. Tetko, V.Y. Tanchuk, and A.E.P. Villa, Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices, J. Chem. Inf. Comput. Sci. 41 (2001), pp. 1407–1421.
- K. Mansouri, V. Consonni, M.K. Durjava, B. Kolar, T. Öberg, and R. Todeschini, Assessing bioaccumulation of polybrominated diphenyl ethers for aquatic species by QSAR modeling, Chemosphere 89 (2012), pp. 433–444.
- A. Lang, Comparing EPI Suite MP Prediction to MPModel. Available at http://onschallenge.wikispaces.com/EPISuite.
- G. Nicola, M.R. Berthold, M.P. Hedrick, and M.K. Gilson, Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME, Database J. Biol. Databases Curation 2015 (2015), bav087.
- Chemical table file. Available at https://en.wikipedia.org/wiki/Chemical_table_file.
- D. Weininger, SMILES. 3. DEPICT. Graphical depiction of chemical structures, J. Chem. Inf. Comput. Sci. 30 (1990), pp. 237–243.
- SMILES: Simplified Molecular-Input Line-Entry System. Available at https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system.
- CAS Registry Number. Available at https://en.wikipedia.org/wiki/CAS_Registry_Number.
- ACD/Labs, ACD/ChemFolder, version 12, ACD/Labs, 2016. Available at http://www.acdlabs.com/products/enterprise/components/cfe/compare.php.
- ACD/Labs, ACD/Name, ACD/Labs, 2016. Available at http://www.acdlabs.com/products/draw_nom/nom/name.
- EPAM, Indigo Toolkit, EPAM, 2016. Available at http://lifescience.opensource.epam.com/indigo.
- ACD/Labs, ACD/Dictionary, ACD/Labs, 2016. Available at http://www.acdlabs.com/products/draw_nom/draw/chemsketch/index.php.
- DSSTox. Available at http://www.epa.gov/ncct/dsstox/.
- A.M. Richard, R.S. Judson, K.A. Houck, C.M. Grulke, P. Volarath, I. Thillainadarajah, C. Yang, J. Rathman, M.T. Martin, J.F. Wambaugh, T.B. Knudsen, J. Kancherla, K. Mansouri, G. Patlewicz, A.J. Williams, S.B. Little, K.M. Crofton, and R.S. Thomas, ToxCast chemical landscape: Paving the road to 21st century toxicology, Chem. Res. Toxicol. 29 (2016), pp. 1225–1251.
- ChemIDplus. Available at http://chem.sis.nlm.nih.gov/chemidplus.
- W.E. Winkler, String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage, in Proceedings of the Section on Survey Research Methods, American Statistical Association, 1990, pp. 354–359.
- C.W. Yap, PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints, J. Comput. Chem. 32 (2011), pp. 1466–1474.
- R. Leardi and A. Lupiáñez González, Genetic algorithms applied to feature selection in PLS regression: How and when to use them, Chemom. Intell. Lab. Syst. 41 (1998), pp. 195–207.
- L. Ståhle and S. Wold, Partial least squares analysis with cross-validation for the two-class problem: A Monte Carlo study, J. Chemom. 1 (1987), pp. 185–196.
- S. Wold, M. Sjöström, and L. Eriksson, PLS-regression: A basic tool of chemometrics, Chemom. Intell. Lab. Syst. 58 (2001), pp. 109–130.
- P. Filzmoser, B. Liebmann, and K. Varmuza, Repeated double cross validation, J. Chemom. 23 (2009), pp. 160–171.
- V. Consonni, D. Ballabio, and R. Todeschini, Comments on the definition of the Q2 parameter for QSAR validation, J. Chem. Inf. Model. 49 (2009), pp. 1669–1678.
- V. Consonni, D. Ballabio, and R. Todeschini, Evaluation of model predictive ability by external validation techniques, J. Chemom. 24 (2010), pp. 194–201.
- G.M. Maggiora, On outliers and activity cliffs. Why QSAR often disappoints, J. Chem. Inf. Model. 46 (2006), p. 1535.
- M.M.W.B. Hendriks, J.H. de Boer, A.K. Smilde, and D.A. Doornbos, Multicriteria decision making, Chemom. Intell. Lab. Syst. 16 (1992), pp. 175–191.
- H.R. Keller, D.L. Massart, and J.P. Brans, Multicriteria decision making: A case study, Chemom. Intell. Lab. Syst. 11 (1991), pp. 175–189.
- P.J. Lewi and J. Van Hoof, Multicriteria decision making using Pareto optimality and PROMETHEE preference ranking, Chemom. Intell. Lab. Syst. 16 (1992), pp. 139–144.
- M. Pavan, A. Mauri, and R. Todeschini, Total ranking models by the genetic algorithm variable subset selection (GA-VSS) approach for environmental priority settings, Anal. Bioanal. Chem. 380 (2004), pp. 430–444.
- M. Pavan and R. Todeschini, Chapter 2: Total-order ranking methods, Data Handl. Sci. Technol. 27 (2008), pp. 51–72.
- M. Pavan and R. Todeschini, Multicriteria decision making methods, in Comprehensive Chemometrics, Elsevier, Amsterdam, 2009, pp. 591–629.
- MathWorks, MATLAB version 8.2, MathWorks, 2015. Available at http://www.mathworks.com.
- iCSS Chemistry Dashboard. Available at http://comptox.epa.gov.
- E. Furusjö, A. Svenson, M. Rahmberg, and M. Andersson, The importance of outlier detection and training set selection for reliable environmental QSAR predictions, Chemosphere 63 (2006), pp. 99–108.
- R. Guha, D. Dutta, P.C. Jurs, and T. Chen, R-NN curves: An intuitive approach to outlier detection using a distance based method, J. Chem. Inf. Model. 46 (2006), pp. 1713–1722.
- L. Molnár, G.M. Keserű, Á. Papp, Z. Gulyás, and F. Darvas, A neural network based prediction of octanol–water partition coefficients using atomic fragmental descriptors, Bioorg. Med. Chem. Lett. 14 (2004), pp. 851–853.
- T. Öberg, A QSAR for baseline toxicity: Validation, domain of application, and prediction, Chem. Res. Toxicol. 17 (2004), pp. 1630–1637.
- B. Bhhatarai, W. Teetz, T. Liu, T. Öberg, N. Jeliazkova, N. Kochev, O. Pukalov, I.V. Tetko, S. Kovarich, E. Papa, and P. Gramatica, CADASTER QSPR models for predictions of melting and boiling points of perfluorinated chemicals, Mol. Inform. 30 (2011), pp. 189–204.
- M. Yadav, S. Joshi, A. Nayarisseri, A. Jain, A. Hussain, and T. Dubey, Global QSAR modeling of LogP values of phenethylamines acting as adrenergic alpha-1 receptor agonists, Interdiscip. Sci. Comput. Life Sci. 5 (2013), pp. 150–154.
- BIOVIA Enhanced Stereochemical Representation. Available at http://accelrys.com/products/pdf/enhanced-stereochemical-representation.pdf.
- S.R. Heller, A. McNaught, I. Pletnev, S. Stein, and D. Tchekhovskoi, InChI, the IUPAC International Chemical Identifier, J. Cheminf. 7 (2015), p. 23.
- M. Waldman, R. Fraczkiewicz, and R.D. Clark, Tales from the war on error: The art and science of curating QSAR data, J. Comput. Aided Mol. Des. 29 (2015), pp. 897–910.
- I.V. Tetko, I. Sushko, A.K. Pandey, H. Zhu, A. Tropsha, E. Papa, T. Öberg, R. Todeschini, D. Fourches, and A. Varnek, Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection, J. Chem. Inf. Model. 48 (2008), pp. 1733–1746.
- D. Young, T. Martin, R. Venkatapathy, and P. Harten, Are the chemical structures in your QSAR correct?, QSAR Comb. Sci. 27 (2008), pp. 1337–1345.
- ACToR Dashboard. Available at https://actor.epa.gov/actor/.
- iCSS ToxCast Dashboard. Available at http://actor.epa.gov/dashboard2/.
- CPCat: Chemical and Product Categories. Available at http://actor.epa.gov/cpcat/faces/home.xhtml.