Review

The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction

Pages 1045-1056 | Received 11 Jan 2021, Accepted 08 Mar 2021, Published online: 19 Mar 2021

References

  • Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.
  • Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. ArXiv200514165 Cs, 2020.
  • Silver D, Huang A, Maddison CJ, et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484–489.
  • Goodfellow I, Bengio Y, Courville A. Deep learning; adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press; 2016.
  • LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
  • Vamathevan J, Clark D, Czodrowski P, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18(6):463–477.
  • Schneider P, Walters WP, Plowright AT, et al. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov. 2020;19(5):353–364.
  • Freedman DH. Hunting for new drugs with AI. Nature. 2019;576(7787):S49–S53.
  • Yang X, Wang Y, Byrne R, et al. Concepts of artificial intelligence for computer-assisted drug discovery. Chem Rev. 2019;119(18):10520–10594.
  • Properzi F, Taylor K, Steedman M, et al. Intelligent drug discovery; Deloitte Centre for Health Solutions; Deloitte University, B-1831 Diegem, Berkenlaan. 2019.
  • AI for drug discovery, biomarker development, and advanced R&D landscape overview 2019/Q2; Deep Knowledge Analytics “Pharma Division”. 2019. [cited 2021 Dec 31]. Available from: https://ai-pharma.dka.global/quarter-2-2019/.
  • Stokes JM, Yang K, Swanson K, et al. A deep learning approach to antibiotic discovery. Cell. 2020;180(4):688–702.e13.
  • Callaway E. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature. 2020;588(7837):203–204.
  • Senior AW, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020;577(7792):706–710.
  • Ekins S, Puhl AC, Zorn KM, et al. Exploiting machine learning for end-to-end drug discovery and development. Nat Mater. 2019;18(5):435–441.
  • Unterthiner T, Mayr A. Deep learning as an opportunity in virtual screening. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook NY USA. 2014; 27.
  • Struble TJ, Alvarez JC, Brown SP, et al. Current and future roles of artificial intelligence in medicinal chemistry synthesis. J Med Chem. 2020;63(16):8667–8682.
  • Walters WP, Barzilay R. Applications of deep learning in molecule generation and molecular property prediction. Acc Chem Res. 2020; acs.accounts.0c00699. DOI: https://doi.org/10.1021/acs.accounts.0c00699.
  • Merck molecular activity challenge. [cited 2020 Nov 29]. Available from: https://kaggle.com/c/MerckActivity
  • Ma J, Sheridan RP, Liaw A, et al. Deep neural nets as a method for quantitative structure–activity relationships. J Chem Inf Model. 2015;55(2):263–274.
  • Mayr A, Klambauer G, Unterthiner T, et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci. 2018;9(24):5441–5451.
  • Gaulton A, Bellis LJ, Bento AP, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(D1):D1100–D1107.
  • Jordan AM. Artificial intelligence in drug design—the storm before the calm? ACS Med Chem Lett. 2018;9(12):1150–1152.
  • Xu Y, Li X, Yao H, et al. Neural networks in drug discovery: current insights from medicinal chemists. Future Med Chem. 2019;11(14):1669–1672.
  • Su M, Feng G, Liu Z, et al. Tapping on the black box: how is the scoring power of a machine-learning scoring function dependent on the training set? J Chem Inf Model. 2020;60(3):1122–1136.
  • Liu R, Wang H, Glover KP, et al. Dissecting machine-learning prediction of molecular activity: is an applicability domain needed for quantitative structure–activity relationship models based on deep neural networks? J Chem Inf Model. 2019;59(1):117–126.
  • Paul D, Sanap G, Shenoy S, et al. Artificial intelligence in drug discovery and development. Drug Discov Today. 2020;26(1):S1359644620304256.
  • Brown N, Ertl P, Lewis R, et al. Artificial intelligence in chemistry and drug design. J Comput Aided Mol Des. 2020;34(7):709–715.
  • Brown N, Cambruzzi J, Cox PJ, et al. Big data in drug discovery. In: Progress in Medicinal Chemistry. Vol. 57, Elsevier; 2018. p. 277–356.
  • Zhu H. Big data and artificial intelligence modeling for drug discovery. Annu Rev Pharmacol Toxicol. 2020;60(1):573–589.
  • Irwin BWJ, Levell JR, Whitehead TM, et al. Practical applications of deep learning to impute heterogeneous drug discovery data. J Chem Inf Model. 2020;60(6):2848–2857.
  • Zhang Y, Yang Q. A survey on multi-task learning. ArXiv170708114 Cs, 2018.
  • Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22(10):1345–1359.
  • Brigato L, Iocchi L. A close look at deep learning with small data. ArXiv200312843 Cs Stat, 2020.
  • David L, Thakkar A, Mercado R, et al. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminformatics. 2020;12(1):56.
  • Lo YC, Rensi SE, Torng W, et al. Machine learning in chemoinformatics and drug discovery. Drug Discov Today. 2018;23(8):1538–1546. DOI: https://doi.org/10.1016/j.drudis.2018.05.010
  • Durant JL, Leland BA, Henry DR, et al. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42(6):1273–1280.
  • Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742–754.
  • Fieser LF, Ettlinger MG, Fawaz G. Naphthoquinone antimalarials. XV. Distribution between organic solvents and aqueous buffers. J Am Chem Soc. 1948;70(10):3228–3232.
  • Cherkasov A, Muratov EN, Fourches D, et al. QSAR modeling: where have you been? Where are you going to? J Med Chem. 2014;57(12):4977–5010.
  • Todeschini R, Consonni V. Molecular descriptors for chemoinformatics: volume I: alphabetical listing/volume II: appendices, references. 1st ed. Methods and Principles in Medicinal Chemistry. Vol. 41, Wiley; 2009. DOI: https://doi.org/10.1002/9783527628766
  • Topliss JG, Costello RJ. Chance correlations in structure-activity studies using multiple regression analysis. J Med Chem. 1972;15(10):1066–1068.
  • Koutsoukas A, Paricharak S, Galloway WRJD, et al. How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space. J Chem Inf Model. 2014;54(1):230–242.
  • Bender A, Jenkins JL, Scheiber J, et al. How similar are similarity searching methods? A principal component analysis of molecular descriptor space. J Chem Inf Model. 2009;49(1):108–119.
  • Dearden JC, Cronin MTD, Kaiser KLE. How not to develop a quantitative structure–activity or structure–property relationship (QSAR/QSPR). SAR QSAR Environ Res. 2009;20(3–4):241–266.
  • Chuang KV, Gunsalus LM, Keiser MJ. Learning molecular representations for medicinal chemistry: miniperspective. J Med Chem. 2020;63(16):8705–8722.
  • Ponzoni I, Sebastián-Pérez V, Requena-Triguero C, et al. Hybridizing feature selection and feature learning approaches in QSAR modeling for drug discovery. Sci Rep. 2017;7(1):2403.
  • Yang K, Swanson K, Jin W, et al. Analyzing learned molecular representations for property prediction. J Chem Inf Model. 2019;59(8):3370–3388.
  • Wu Z, Ramsundar B, Feinberg EN, et al. MoleculeNet: a benchmark for molecular machine learning. Chem Sci. 2018;9(2):513–530.
  • Korotcov A, Tkachenko V, Russo DP, et al. Comparison of deep learning with multiple machine learning methods and metrics using diverse drug discovery data sets. Mol Pharm. 2017;14(12):4462–4475.
  • Koutsoukas A, Monaghan KJ, Li X, et al. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminformatics. 2017;9(1):42.
  • Tsou LK, Yeh S-H, Ueng S-H, et al. Comparative study between deep learning and QSAR classifications for TNBC inhibitors and novel GPCR agonist discovery. Sci Rep. 2020;10(1):16771.
  • Sejnowski TJ. The unreasonable effectiveness of deep learning in artificial intelligence. Proc Natl Acad Sci USA. 2020;117(48):30033–30038.
  • Gilmer J, Schoenholz SS, Riley PF, et al. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning; Proceedings of Machine Learning Research; PMLR: International Convention Centre, Sydney, Australia, 2017;70:1263–1272.
  • Li Y, Tarlow D, Brockschmidt M, et al. Gated graph sequence neural networks. ArXiv151105493 Cs Stat, 2017.
  • Schütt KT, Arbabzadah F, Chmiela S, et al. Quantum-chemical insights from deep tensor neural networks. Nat Commun. 2017;8(1):13890.
  • Chakravarti SK, Alla SRM. Descriptor free QSAR modeling using deep learning with long short-term memory neural networks. Front Artif Intell. 2019;2:17.
  • Lusci A, Pollastri G, Baldi P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model. 2013;53(7):1563–1575.
  • Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. ArXiv170603762 Cs, 2017.
  • Öztürk H, Özgür A, Schwaller P, et al. Exploring chemical space using natural language processing methodologies for drug discovery. Drug Discov Today. 2020;25(4):689–705.
  • Schwaller P, Laino T, Gaudin T, et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci. 2019;5(9):1572–1583.
  • Nayak P, Silberfarb A, Chen R, et al. Transformer based molecule encoding for property prediction. ArXiv201103518 Q-bio, 2020.
  • Netzeva TI, Worth AP, Aldenberg T, et al. Current status of methods for defining the applicability domain of (Quantitative) structure-activity relationships: the report and recommendations of ECVAM workshop 52. Altern Lab Anim. 2005;33(2):155–173.
  • Maple HJ, Clayden N, Baron A, et al. Developing degraders: principles and perspectives on design and chemical space. MedChemComm. 2019;10(10):1755–1764.
  • Chopra R, Sadok A, Collins I, et al. Evaluation of the approaches to targeted protein degradation for drug discovery. Drug Discov Today Technol. 2019;31:5–13.
  • Costales MG, Childs-Disney JL, Haniff HS, et al. How we think about targeting RNA with small molecules. J Med Chem. 2020;63(17):8880–8900.
  • Hanser T, Barber C, Guesné S, et al. Applicability domain: towards a more formal framework to express the applicability of a model and the confidence in individual predictions. Advances in Computational Toxicology. Hong H, editor. Vol. 30, Cham: Challenges and Advances in Computational Chemistry and Physics; Springer International Publishing; 2019. 215–232.
  • Jaworska J, Nikolova-Jeliazkova N, Aldenberg T. QSAR applicability domain estimation by projection of the training set in descriptor space: a review. Altern Lab Anim. 2005;33(5):445–459.
  • Mathea M, Klingspohn W, Baumann K. Chemoinformatic classification methods and their applicability domain. Mol Inform. 2016;35(5):160–180.
  • Carrió P, Pinto M, Ecker G, et al. Applicability Domain Analysis (ADAN): a robust method for assessing the reliability of drug property predictions. J Chem Inf Model. 2014;54(5):1500–1511.
  • Klingspohn W, Mathea M, Ter Laak A, et al. Efficiency of different measures for defining the applicability domain of classification models. J Cheminformatics. 2017;9(1):44.
  • Hirschfeld L, Swanson K, Yang K, et al. Uncertainty quantification using neural networks for molecular property prediction. J Chem Inf Model. 2020;60(8):3770–3780.
  • Eklund M, Norinder U, Boyer S, et al. The application of conformal prediction to the drug discovery process. Ann Math Artif Intell. 2015;74(1):117–132.
  • Kim S, Chen J, Cheng T, et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019;47(D1):D1102–D1109.
  • Wong SC, Gatt A, Stamatescu V, et al. Understanding data augmentation for classification: when to warp? In 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA); IEEE: Gold Coast, Australia, 2016; pp 1–6. DOI: https://doi.org/10.1109/DICTA.2016.7797091.
  • Bjerrum EJ. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv:1703.07076 [cs], 2017.
  • Asilar E, Hemmerich J, Ecker GF. Image based liver toxicity prediction. J Chem Inf Model. 2020;60(3):1111–1121.
  • Borrel A, Huang R, Sakamuru S, et al. High-throughput screening to predict chemical-assay interference. Sci Rep. 2020;10(1):3986.
  • Sheridan RP. Time-split cross-validation as a method for estimating the goodness of prospective prediction. J Chem Inf Model. 2013;53(4):783–790.
  • Rifaioglu AS, Atas H, Martin MJ, et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform. 2019;20(5):1878–1912.
  • Gaulton A, Hersey A, Nowotka M, et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017;45(D1):D945–D954.
  • Irwin JJ, Tang KG, Young J, et al. ZINC20—A free ultralarge-scale chemical database for ligand discovery. J Chem Inf Model. 2020;60(12):6065–6073.
  • Pejó B. The good, the bad, and the ugly: quality inference in federated learning. ArXiv200706236 Cs Stat, 2020.
  • Simpson PB, Wilkinson GF. What makes a drug discovery consortium successful? Nat Rev Drug Discov. 2020;19(11):737–738.
  • Hinkson IV, Madej B, Stahlberg EA. Accelerating therapeutics for opportunities in medicine: a paradigm shift in drug discovery. Front Pharmacol. 2020;11. DOI: https://doi.org/10.3389/fphar.2020.00770
  • Year 1 announcement. [cited 2021 Mar 1]. Available from: https://www.melloddy.eu/y1announcement
  • Wild DJ. Mining large heterogeneous data sets in drug discovery. Expert Opin Drug Discov. 2009;4(10):995–1004.
  • Riley P. Three pitfalls to avoid in machine learning. Nature. 2019;572(7767):27–29.
  • D’Amour A, Heller K, Moldovan D, et al. Underspecification presents challenges for credibility in modern machine learning. ArXiv201103395 Cs Stat, 2020.
  • Cai C, Wang S, Xu Y, et al. Transfer learning for drug discovery. J Med Chem. 2020;63(16):8683–8694.
  • Duvenaud DK, Maclaurin D, Iparraguirre J, et al. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook NY USA. 2015;28.
  • Coley CW, Barzilay R, Green WH, et al. Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Model. 2017;57(8):1757–1772.
  • Abbasi K, Poso A, Ghasemi J, et al. Deep transferable compound representation across domains and tasks for low data drug discovery. J Chem Inf Model. 2019;59(11):4528–4539.
  • Wang S, Guo Y, Wang Y, et al. SMILES-BERT: large scale unsupervised pre-training for molecular property prediction. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics; ACM: Niagara Falls NY USA, 2019; pp 429–436. DOI: https://doi.org/10.1145/3307339.3342186.
  • Li X, Fourches D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J Cheminformatics. 2020;12(1):27.
  • Ramsundar B, Liu B, Wu Z, et al. Is multitask deep learning practical for pharma? J Chem Inf Model. 2017;57(8):2068–2076.
  • Feinberg EN, Joshi E, Pande VS, et al. Improvement in ADMET prediction with multitask deep featurization. J Med Chem. 2020;63(16):8835–8848.
  • Feinberg EN, Sur D, Wu Z, et al. PotentialNet for molecular property prediction. ACS Cent Sci. 2018;4(11):1520–1530.
  • Montanari F, Kuhnke L. A multitask deep learning model for physico-chemical property prediction. Gordon Research Conference in Computer Aided Drug Design. 2019.
  • Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. ArXiv170303400 Cs, 2017.
  • Olier I, Sadawi N, Bickerton GR, et al. Meta-QSAR: a large-scale application of meta-learning to drug design and discovery. Mach Learn. 2018;107(1):285–311.
  • Nguyen CQ, Kreatsoulas C, Branson KM. Meta-learning GNN initializations for low-resource molecular property prediction. ArXiv200305996 Phys. Stat, 2020.
