ABSTRACT
Introduction
Artificial intelligence (AI) has seen a massive resurgence in recent years, with widespread success in computer vision, natural language processing, and games. Creating similarly robust and accurate AI models for ADME/Tox endpoint and activity prediction would be revolutionary for drug discovery pipelines. There have been numerous demonstrations of successful applications, but a key challenge remains: how generalizable are these predictive models?
Areas covered
The authors present a summary of current promising components of AI models in the context of early drug discovery where ADME/Tox endpoint and activity prediction is the main driver of the iterative drug design process. Following that is a review of applicability domains and dataset construction considerations which determine generalizability bottlenecks for AI deployment. Further reviewed is the role of promising learning frameworks – multitask, transfer, and meta learning – which leverage auxiliary data to overcome issues of generalizability.
Expert opinion
The authors conclude that the most promising direction toward integrating reliable and informative AI models into the drug discovery pipeline is a conjunction of learned feature representations, deep learning, and novel learning frameworks. Such a solution would address the sparse and incomplete datasets that are available for key endpoints related to drug discovery.
Acknowledgments
The authors thank Dr. S Szalma (Takeda Global Head of Computational Biology), Dr. L Hamann (Takeda Head of Drug Discovery Sciences), and Professor I Tsigelny (San Diego Supercomputer Center) for their critical reading of the manuscript.
Declaration of interest
SS Bahmanyar and JC Baber are both employees of Takeda Pharmaceuticals. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.
Article highlights
AI harnesses computational power to make predictions from data. It has proven effective in other domains and is poised to do the same in drug discovery.
AI models that make robust in silico predictions for activities and ADME/Tox endpoints can help make drug candidate selection more efficient and cost-effective, delivering safe and efficacious medicines for the patient.
The main challenge in creating accurate and applicable AI models is that the available experimental data are heterogeneous, noisy, and sparse, so appropriate data curation and data collection are of the utmost importance.
Ultimately, making AI models more generalizable to novel situations is the only way to keep up with the fast-paced and dynamic nature of drug discovery.
The most promising generalizable AI models apply deep learning to learned feature representations, integrated under learning frameworks that allow models to aggregate information from similar but distinct ADME/Tox endpoint and activity prediction tasks.
This box summarizes key points contained in the article.