Review

Assigning confidence to molecular property prediction

Pages 1009-1023 | Received 25 Feb 2021, Accepted 29 Apr 2021, Published online: 15 Jun 2021

