Artificial Intelligence

Insights into artificial intelligence utilisation in drug discovery

Pages 304-308 | Received 01 Jan 2024, Accepted 05 Feb 2024, Published online: 22 Feb 2024

Introduction

Drug discovery is the process of identifying and evaluating molecules for their ability to safely modulate disease, ultimately aiming to make medicines that enhance patient health and quality of lifeCitation1. Historically, this field was marked by serendipitous discoveries such as Fleming’s discovery of penicillin, where breakthroughs occurred by chance. The 1960s and 1970s marked a shift towards a more rational approach, with advancements like X-ray crystallography. These advancements not only provided deep insights into protein structures but also laid the groundwork for computational drug discovery, a process that utilizes computer-based algorithms and simulations to predict and analyze the behaviour of potential drug compounds. This evolution in methodology was highlighted in an article in Fortune magazine on 5 October 1981, which described Merck’s implementation of computational methods, sparking significant excitement in the fieldCitation2. Subsequently, the rise of biotechnology and genomics in the late twentieth century further advanced drug discovery into a multidisciplinary endeavour, emphasizing experimental models like assay-based testing for detailed drug interaction insights, translational models to connect research with clinical application, and clinical models essential for conducting human trials to evaluate drug safety and efficacyCitation3.

High-throughput screening (HTS) is a method of testing large numbers of molecules for biological responses using automation and minimal design. Despite its efficiency, HTS has suffered from a low rate of successful hits, and such limitations often restrict it to extensive research programs. Recent studies have continued to explore these limitations, suggesting improvements in HTS methodologiesCitation4,Citation5.

Despite advancements in such traditional methods, the complex, time-consuming, and costly nature of drug discovery persists. On average, developing a drug from natural sources or innovative techniques can take 10–15 years and cost over $2.5 billionCitation6. In response to these challenges, the drug development industry, driven by the need to optimize costs and efficiency, has increasingly turned towards artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL). AI refers to the simulation of human intelligence in machines programmed to think and act like humans. ML involves the use of algorithms that learn patterns from data in order to perform different tasks. DL is a more advanced subset of ML that uses multi-layered neural networks to efficiently identify complex patterns in data. This shift marks a new era in drug discoveryCitation7.

With the expansion of chemical and biological data, AI is revolutionizing the field, promising enhanced efficiency, reduced costs, and improved success rates in drug development. The applications of AI extend across various stages, including target identification and validation, hit discovery, lead prioritization, drug property prediction, de novo drug design, chemical synthesis, and drug repurposing (Figure 1). Despite these significant advances, the full potential of AI, ML, and DL in drug discovery, especially in bridging the unpredictability of early discoveries with the precision now required, is an area of ongoing exploration. This editorial highlights the current landscape of AI in drug discovery, critically analyzing its capabilities and potential, and forecasting its future trajectory.

Figure 1. The application of AI methodologies in drug discovery.


Discussion

AI in target identification and validation

AI tools offer robust methods for systematically analysing complex biological datasets to identify, prioritize, and evaluate potential targets that are efficacious, safe, and druggableCitation8,Citation9. For instance, Kang and colleagues developed a novel DL model, “Highlight on Target Sequence” (HoTS). This model predicts binding regions in target protein sequences based on patterns learned from protein-ligand interactions. Employing HoTS, they identified a novel putative binding site on the P2X3 receptor. This site was then utilized for virtual screening, leading to the discovery of novel hit compounds with distinct chemical structures from known P2X3R antagonists. The use of HoTS not only increased the hit rate significantly compared to previous random screenings but also accelerated the discovery of novel chemical entities targeting P2X3RCitation10.

Neural networks are pivotal in AI-driven biological research, where DL techniques are used to decipher complex biological dataCitation11. DL methods are increasingly used for disease target identification and drug discovery. An example of this is Insilico Medicine’s PandaOmics, which was employed to analyze gene dysregulation, altered pathways, and potential therapy targets for amyotrophic lateral sclerosis (ALS). Using 20 AI-driven models and bioinformatics tools, PandaOmics identified 28 therapy targets, which were validated using Drosophila models of c9ALS, demonstrating the platform’s success in target discoveryCitation11. This work also emphasizes the role of AI in enhancing the precision and efficiency of the drug discovery process, especially in tackling intricate neurological disorders like ALS. It paves the way for more targeted and effective treatments for complex diseases, showcasing the transformative potential of AI in medical research.

Graph-based neural networks, such as graph convolutional networks (GCN) and graph attention networks (GAT), have gained attention for their ability to navigate complex biological networks effectivelyCitation11. These advanced DL networks have shown exceptional efficiency in predicting protein-protein interactions (PPI), a crucial aspect of understanding cellular functions. Jha et al.Citation12 further explored these predictions, using GCN and GAT neural networks to predict protein interactions based on the structural details and sequence properties of proteins. Protein graphs were constructed using 3D atomic coordinates from Protein Data Bank files, with additional insights from AlphaFold. While these methods represent significant advancements in comprehending complex biological processes through prediction, their complexity can limit their practical application. Moreover, their effectiveness is partly contingent on the quality of the structures obtained from databases, which may not always be comprehensive or accurateCitation11.
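To make the idea of a protein graph more concrete, the following is a minimal sketch of how 3D coordinates might be turned into a contact graph and passed through one graph-convolution step. It is not the implementation used by Jha et al.: the one-point-per-residue coordinates, the 8 Å cutoff, the toy features, and the function names `contact_graph` and `gcn_layer` are all assumptions made for illustration, and the layer uses the commonly cited symmetric normalisation of the adjacency matrix.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix: residues i and j are linked if closer than `cutoff` (assumed value, in angstroms)."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][h] for f in range(n_in)))
             for h in range(n_out)] for i in range(n)]

# Toy input (made up): four residues on a line, the last one far from the others.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = [[1.0], [0.0], [0.0], [1.0]]   # one hypothetical feature per residue
weights = [[1.0]]                         # untrained placeholder weights

adj = contact_graph(coords)
hidden = gcn_layer(adj, features, weights)
```

In this toy case the isolated fourth residue keeps its own feature unchanged, while the first residue shares its signal with its neighbour. A GAT layer would replace the fixed normalisation with learned attention weights over the same edges.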

AI in hit discovery, filtration, and lead optimisation

Following target identification, hit discovery constitutes a critical juncture in drug development, determining the potential success or failure of new therapeutic candidates. Traditionally, techniques such as high-throughput and virtual screening are employed across extensive chemical spaces to identify and prioritize hit molecules for further development stages. The advent of AI-powered virtual screening has markedly transformed this phase, enabling more efficient processing, often within days or weeks. This shift owes much to advancements in data availability, enhanced computing capacities, and the integration of supercomputing technologies like GPUs and parallel processingCitation13. Additionally, a major issue in traditional approaches is the high incidence of false positives. AI-driven computational filters have been instrumental in addressing this issue, effectively identifying and excluding artifactual compounds, thereby streamlining the selection of viable hitsCitation14.

AI tools have also been pivotal in refining the docking process, identifying the most promising molecular scaffolds in expansive libraries. A notable example of AI’s application in this domain is the work of Gentile et al.Citation15. They introduced an AI technique called “deep docking”, which narrows screening to 1% of the ligands, effectively reducing a massive library of 1 billion ligands by a factor of 100. As a result, the screening process becomes significantly more efficient and manageable. However, it is important to note that such DL approaches require substantial computational resources and depend heavily on the accuracy of the generated structures for screening. While offering remarkable efficiency, these complexities can sometimes limit the practical application of “deep docking” in scenarios with novel or less-characterized targets, highlighting the need for balanced consideration in AI’s evolving role in drug discovery. The same method was applied by Anh-Tien Ton and colleagues to target the main protease (Mpro) of SARS-CoV-2Citation16. They managed to screen all 1.3 billion compounds from the ZINC15 library to identify the top 1,000 potential ligands for the SARS-CoV-2 Mpro protein. This result is further evidence of AI’s transformative impact on drug discovery, showcasing its ability to significantly streamline and enhance the efficiency of the screening process.
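The general idea of using a cheap model to avoid docking every ligand can be sketched as below. This is a deliberately simplified, single-round illustration and not the published deep docking pipeline: the `dock` function is a stand-in for an expensive docking run, each molecule is reduced to one made-up descriptor, and a least-squares line replaces the deep neural network. All names and numbers are assumptions.

```python
import random

def dock(x):
    """Stand-in for an expensive docking run; lower scores are better (toy function)."""
    return -10.0 * x * x

def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b, used here as a cheap surrogate model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def surrogate_filter(library, sample_size, keep_fraction, rng):
    """Dock a small sample, fit the surrogate, and keep the best-predicted fraction."""
    sample = rng.sample(library, sample_size)
    a, b = fit_line(sample, [dock(x) for x in sample])   # only `sample_size` docking calls
    ranked = sorted(library, key=lambda x: a * x + b)    # most negative prediction first
    n_keep = max(1, int(len(library) * keep_fraction))
    return ranked[:n_keep]

rng = random.Random(0)
library = [rng.random() for _ in range(10_000)]          # one toy descriptor per molecule
kept = surrogate_filter(library, sample_size=100, keep_fraction=0.01, rng=rng)
```

Here only 100 of the 10,000 toy molecules are ever docked, yet the retained 1% contains the best scorers because the surrogate preserves their ranking. With real ligands the surrogate is imperfect, which is why the quality of the training sample and of the docking scores matters so much.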

Once promising "lead" drug compounds are identified, AI plays a crucial role in prioritizing them for further evaluation by ranking these molecules. This AI-based approach offers a more advanced method compared to traditional ranking techniques. Hongjian Li and colleagues reviewed AI scoring functions (SFs) for drug lead optimization in the 2015–2019 periodCitation17. Their findings revealed that the performance gap between AI scoring functions and classical SFs has widened over time. For instance, on the CASF-2007 benchmark, where R_p denotes the Pearson correlation between predicted and measured binding affinities, AI-based RF-Score v3 improved from an R_p of 0.803 to 0.836, significantly outperforming classical SFs like GlideScore-XP (R_p = 0.457). This 1.83-fold higher correlation (0.836 versus 0.457) is attributed to methodological advancements and the growing availability of data.
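As a short worked example of the metric quoted above, the sketch below computes a Pearson correlation for a handful of made-up affinity values and reproduces the 1.83-fold ratio from the two reported R_p figures. The toy affinities and the function name `pearson_r` are illustrative assumptions; only 0.836 and 0.457 come from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up binding affinities (e.g. pKd) for five complexes.
measured = [4.0, 5.5, 6.1, 7.3, 8.0]
predicted = [4.4, 5.1, 6.5, 7.0, 8.2]
r_toy = pearson_r(measured, predicted)

# Ratio of the two R_p values reported in the text.
fold = 0.836 / 0.457
print(round(fold, 2))  # 1.83
```

Note that the fold figure is a ratio of correlation coefficients, not a ratio of error rates, so it describes how much more closely the predictions track the measurements rather than how much smaller the prediction error is.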

AI in drug properties prediction

Predicting physicochemical properties, bioactivity, and toxicity is crucial in drug discovery. Xiangxiang Zeng and colleagues have developed a DL framework called ImageMol, a self-supervised learning system trained on a dataset of 10 million unlabelled drug-like moleculesCitation18. Their model has shown the ability to assess properties of these molecules, including drug metabolism, brain penetration, toxicity, and molecular target profiles. It has also demonstrated accuracy in identifying COVID-19 treatments and molecules that combat SARS-CoV-2Citation19. Jang et al. have introduced a DL model specifically designed to predict and quantify drug-drug interactions (DDIs). This model predicts the fold change (FC) of the area under the time-concentration curve (AUC) within a narrow range of ±0.5959. Moreover, its prediction accuracy across AUC fold-change ranges is 75.77% for 0.8–1.25 fold, 86.68% for 0.67–1.5 fold, and 94.76% for 0.5–2 foldCitation20. While its precision in predictions is a notable strength, the model’s dependence on data quality may pose challenges for its adaptability across different drugs.
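One plausible reading of the range-based accuracies above is the share of predictions whose predicted-to-observed ratio falls inside each fold window. The sketch below computes that quantity on made-up values; it is an assumption about how such a metric could be calculated, not the evaluation code of Jang et al., and `within_fold` is a hypothetical helper.

```python
def within_fold(predicted, observed, low, high):
    """Share of predictions whose predicted/observed ratio lies in [low, high]."""
    ratios = [p / o for p, o in zip(predicted, observed)]
    return sum(low <= r <= high for r in ratios) / len(ratios)

# Made-up AUC fold changes for five drug pairs.
observed = [1.0, 2.0, 4.0, 1.5, 3.0]
predicted = [1.1, 1.5, 7.0, 1.6, 1.4]

for low, high in [(0.8, 1.25), (0.67, 1.5), (0.5, 2.0)]:
    share = within_fold(predicted, observed, low, high)
    print(f"{low}-{high} fold: {share:.0%}")
```

Because each window contains the previous one, the reported share can only stay the same or grow as the window widens, which matches the increasing pattern of the three percentages in the text.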

AI in de novo drug design

De novo drug design, which involves creating new compounds with specific pharmacological and physicochemical characteristics from the ground up, is increasingly being facilitated by AI-based methods, which are now commonly used for molecular generation tasks.

Yi Fang and colleagues have recently presented their novel method known as the Quality Assessment-Based Drug Design (QADD) approach. This innovative approach employs a multiobjective deep reinforcement learning process to systematically create molecules that possess several desirable characteristics. Their study offered insights into the efficiency of QADD in optimizing multiple molecular properties and creating molecules with high drug potentialCitation21.

However, one of the challenges is that sequence-based de novo generators often produce invalid outputs. A recent study proposed a solution using post hoc error correction techniquesCitation22. The authors utilised transformer models, commonly used in natural language processing, to transform invalid Simplified Molecular Input Line Entry System (SMILES) representations into valid ones. The findings revealed that correcting the outputs of SMILES-based models significantly increased their validity, leading to the generation of diverse molecules. Moreover, this research emphasized how post hoc error correction of SMILES can enhance the discovery of drug candidates.
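To illustrate what an "invalid output" can look like, the sketch below applies a crude syntax check to a few SMILES strings and reports the fraction that pass. It is a rough illustration only: the check covers balanced parentheses, balanced square brackets, and paired single-digit ring closures, and it ignores valence rules, aromaticity, and two-digit ring closures. In practice a chemistry-aware parser would be used instead, and `looks_valid` is a hypothetical helper, not part of the cited study.

```python
def looks_valid(smiles):
    """Crude syntax check: brackets and parentheses balance, ring-closure digits are paired."""
    depth = 0
    open_rings = set()
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
        elif in_bracket:
            continue                  # digits inside [...] are isotopes, charges or H counts
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}        # first occurrence opens a ring, second closes it
    return depth == 0 and not open_rings and not in_bracket

# Hypothetical generator output: three well-formed strings and two truncated ones.
generated = ["c1ccccc1", "CC(=O)O", "c1ccccc", "CC(C", "[NH4+]"]
validity = sum(looks_valid(s) for s in generated) / len(generated)
print(validity)  # 0.6
```

The two failing strings show the typical failure modes of sequence generators: an unclosed ring and an unclosed branch. A post hoc corrector would aim to repair such strings rather than discard them.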

AI in chemical synthesis

AI has made notable strides in the field of drug synthesis, greatly enhancing the efficiency and accuracy of chemical synthesis processesCitation23,Citation24. Junliang Liu and Jason E. HeinCitation25 emphasized that combining automation and real-time reaction monitoring with AI substantially accelerates the identification of ideal reaction conditions and supports error-free autonomous synthesis. Real-time monitoring, coupled with predictive modelling and AI-enhanced platforms, can streamline experimental procedures considerably. However, relying on automated approaches carries the danger of underestimating the complexity of chemical reactions, which might result in oversimplified explanations or results.

AI in drug repurposing

AI offers significant potential in drug repurposing through leveraging the extensive knowledge base of existing drugs, gathered from various sources, and applying it within densely interconnected networks. The architecture of DL models is particularly adept at uncovering relationships between biological pathways and targets, enhancing the efficiency of drug repurposing efforts. A notable application of this approach was by Zeng and colleagues, who developed a comprehensive knowledge graph to identify potential therapeutics for COVID-19. This graph included millions of publications and mapped the relationships between them, leading to the identification of 41 promising drugs. These findings, while preliminary, highlight the transformative potential of AI in drug repurposing. However, this area faces significant challenges, such as the requirement for substantial computational resources to manage and analyze such large-scale networks. Another innovative strategy, proposed by Yang and Agarwal, involves building an ML model that focuses on the side effects of known drugs. This model could illuminate new paths for drug repurposing by pinpointing specific areas for further investigation.
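The knowledge-graph idea can be illustrated with a very small example. The sketch below stores made-up (head, relation, tail) triples and ranks drugs by how many of their targets are linked to a disease of interest. This simple path counting is a stand-in for the DL-based scoring used in real studies, and every entity name, relation name, and the function `rank_candidates` are hypothetical.

```python
from collections import defaultdict

# Made-up knowledge graph as (head, relation, tail) triples.
triples = [
    ("drug_A", "targets", "protein_X"),
    ("drug_B", "targets", "protein_Y"),
    ("drug_B", "targets", "protein_X"),
    ("drug_C", "targets", "protein_Z"),
    ("protein_X", "associated_with", "disease_D"),
    ("protein_Y", "associated_with", "disease_D"),
    ("protein_Z", "associated_with", "disease_E"),
]

def rank_candidates(triples, disease):
    """Rank drugs by the number of two-step paths drug -> protein -> disease."""
    targets_of = defaultdict(set)
    linked = set()
    for head, relation, tail in triples:
        if relation == "targets":
            targets_of[head].add(tail)
        elif relation == "associated_with" and tail == disease:
            linked.add(head)
    scores = {drug: len(targets & linked) for drug, targets in targets_of.items()}
    hits = [(drug, score) for drug, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

ranking = rank_candidates(triples, "disease_D")
print(ranking)  # [('drug_B', 2), ('drug_A', 1)]
```

Even at this scale the computational concern raised above is visible: every query touches all triples, so graphs built from millions of publications need indexing and learned embeddings rather than exhaustive path enumeration.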

AI limitations in drug discovery

The application of AI in drug discovery, despite its potential, is not without limitations and challenges. Key among these is the issue of bias within AI algorithms, which can emerge from unrepresentative datasets or flawed training methodologies. Such biases risk skewing drug efficacy assessments and potentially overlooking treatments for less represented conditions. Ethical concerns also play a critical role, particularly regarding patient data privacy. The automation brought about by AI raises questions about the future roles of researchers and healthcare professionals, underlining the need for ethical oversight. Moreover, the inherent limitations of current AI technologies, constrained by data availability and interpretative capabilities, cannot be overlooked. These limitations underscore AI’s reliance on extensive computational resources, which increases costs and limits accessibility, especially for resource-constrained institutions.

Milestones and future directions

The application of AI in drug discovery has indeed transitioned from a speculative concept to a tangible reality, marked by several significant milestones. Exscientia announced in early 2020 the entry of their AI-designed drug, DSP-1181, into human clinical trials. The drug was intended to treat obsessive-compulsive disorder (OCD). Although detailed results from these trials and the ongoing status of the drug have not yet been made public, the advancement of DSP-1181 to clinical trials within just two years of development highlights the efficiency and potential of AI technologies in expediting the drug discovery process. In February 2022, Insilico Medicine advanced its novel AI-discovered and AI-designed drug into Phase 1 trials in record time. This achievement was accomplished in 30 months, which is significantly faster and more cost-effective than conventional methods of drug discovery.

January 2023 marked an additional step forward when AbSci successfully created and validated de novo antibodies derived entirely by artificial intelligence. Shortly thereafter, in February 2023, the FDA granted its first Orphan Drug Designation to an AI-generated drug; Insilico Medicine plans to commence a global Phase II trial early the following year.

Conclusion

AI’s arrival in drug discovery marks a fundamental shift, moving from traditional and labour-intensive approaches to an efficient, data-driven one. Currently, the application of ML and DL has revolutionized various stages of drug development. This includes target identification and validation, hit discovery, lead optimization, the prediction of drug properties, and chemical synthesis. AI’s capacity for analysing vast datasets to accurately predict results has significantly accelerated drug discovery processes, while simultaneously reducing costs and increasing the likelihood of success. However, AI utilization comes with limitations, especially in algorithmic biases, ethical concerns, and the requirement for substantial computational power. Moreover, the real-world efficacy and safety of AI-driven drugs still need to be validated. In sum, while AI holds considerable promise for advancing drug discovery, its full impact on developing safe, effective treatments depends on overcoming current technological and ethical hurdles. The future of AI in this field rests on advancing these technologies within a framework that prioritizes rigorous validation and ethical considerations.

Transparency

Author contributions

The authors contributed equally to the study design, data extraction, and manuscript drafting and reviewing.

Acknowledgement

We would like to thank Prof. Mohammad A. Ghattas for reviewing the manuscript and providing insightful comments to improve its quality.

Declaration of funding

No funding was received to produce this article.

Declaration of financial/other relationships

The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

References

  • Bateman TJ. Chapter 29 - Drug discovery. In: Huang SM, Lertora JJL, Vicini P, Atkinson AJ, editors. Atkinson’s principles of clinical pharmacology (Fourth Edition). Boston: Academic Press; 2022. p. 563–572.
  • Sliwoski G, Kothiwale S, Meiler J, et al. Computational methods in drug discovery. Pharmacol Rev. 2014;66(1):334–395. doi:10.1124/pr.112.007336.
  • Zhou SF, Zhong WZ. Drug design and discovery: principles and applications. Molecules. 2017;22(2):279. doi:10.3390/molecules22020279.
  • Singh AV, Bansod G, Mahajan M, et al. Digital transformation in toxicology: improving communication and efficiency in risk assessment. ACS Omega. 2023;8(24):21377–21390. doi:10.1021/acsomega.3c00596.
  • Singh AV, Varma M, Laux P, et al. Artificial intelligence and machine learning disciplines with the potential to improve the nanotoxicology and nanomedicine fields: a comprehensive review. Arch Toxicol. 2023;97:963–979. doi:10.1007/s00204-023-03471-x.
  • DiMasi JA, Grabowski HG, Hansen RW. Innovation in the pharmaceutical industry: new estimates of R&D costs. J Health Econ. 2016;47:20–33. doi:10.1016/j.jhealeco.2016.01.012.
  • Al Meslamani AZ, Jarab AS, Ghattas MA. The role of machine learning in healthcare responses to pandemics: maximizing benefits and filling gaps. J Med Econ. 2023;26(1):777–780. doi:10.1080/13696998.2023.2224018.
  • Paudel N, Rai M, Adhikari S, et al. Green extraction, phytochemical profiling, and biological evaluation of Dysphania ambrosioides: an in silico and in vitro medicinal investigation. J Herbs Spices Med Plants. 2023;2023:1–18. doi:10.1080/10496475.2023.2267467.
  • Rai M, Singh AV, Paudel N, et al. Herbal concoction unveiled: a computational analysis of phytochemicals’ pharmacokinetic and toxicological profiles using novel approach methodologies (NAMs). Curr Res Toxicol. 2023;5:100118. doi:10.1016/j.crtox.2023.100118.
  • Kang KM, Lee I, Nam H, et al. AI-based prediction of new binding site and virtual screening for the discovery of novel P2X3 receptor antagonists. Eur J Med Chem. 2022;240:114556. doi:10.1016/j.ejmech.2022.114556.
  • Yoo J, Kim TY, Joung I, et al. Industrializing AI/ML during the end-to-end drug discovery process. Curr Opin Struct Biol. 2023;79:102528. doi:10.1016/j.sbi.2023.102528.
  • Jha K, Saha S, Singh H. Prediction of protein-protein interaction using graph neural networks. Sci Rep. 2022;12(1):8360. doi:10.1038/s41598-022-12201-9.
  • Kimber TB, Chen Y, Volkamer A. Deep learning in virtual screening: recent applications and developments. Int J Mol Sci. 2021;22(9):4435. doi:10.3390/ijms22094435.
  • Mahgoub RE, Atatreh N, Ghattas MA. Using filters in virtual screening: a comprehensive guide to minimize errors and maximize efficiency. Annu Rep Med Chem. 2022;59:99–136. https://www.researchgate.net/publication/365111978_Using_filters_in_virtual_screening_A_comprehensive_guide_to_minimize_errors_and_maximize_efficiency.
  • Gentile F, Agrawal V, Hsing M, et al. Deep docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent Sci. 2020;6(6):939–949. doi:10.1021/acscentsci.0c00229.
  • Ton AT, Gentile F, Hsing M, et al. Rapid identification of potential inhibitors of SARS-CoV-2 main protease by deep docking of 1.3 billion compounds. Mol Inform. 2020;39(8):e2000028. doi:10.1002/minf.202000028.
  • Li H, Sze KH, Lu G, et al. Machine-learning scoring functions for structure-based drug lead optimization. Vol. 10, USA: Wiley Interdisciplinary Reviews: Computational Molecular Science. Blackwell Publishing Inc.; 2020.
  • Li H, Sze KH, Lu G, et al. Machine-learning scoring functions for structure-based drug lead optimization. WIREs Comput Mol Sci. 2020;10(5):e1465. doi:10.1002/wcms.1465.
  • Zeng X, Xiang H, Yu L, et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat Mach Intell. 2022;4(11):1004–1016. doi:10.1038/s42256-022-00557-6.
  • Jang HY, Song J, Kim JH, et al. Machine learning-based quantitative prediction of drug exposure in drug-drug interactions using drug label information. NPJ Digit Med. 2022;5(1):88. doi:10.1038/s41746-022-00639-0.
  • Fang Y, Pan X, Shen HB. De novo drug design by iterative multiobjective deep reinforcement learning with graph-based molecular quality assessment. Bioinformatics. 2023;39(4):157. doi:10.1093/bioinformatics/btad157.
  • Schoenmaker L, Béquignon OJM, Jespers W, et al. UnCorrupt SMILES: a novel approach to de novo design. J Cheminform. 2023;15(1):22. doi:10.1186/s13321-023-00696-x.
  • Rai M, Paudel N, Sakhrie M, et al. Perspective on quantitative structure–toxicity relationship (QSTR) models to predict hepatic biotransformation of xenobiotics. Livers. 2023;3(3):448–462. doi:10.3390/livers3030032.
  • Mone NS, Syed S, Ravichandiran P, et al. Synergistic and additive effects of menadione in combination with antibiotics on multidrug-resistant Staphylococcus aureus: insights from structure-function analysis of naphthoquinones. ChemMedChem. 2023;18(24):e202300328. doi:10.1002/cmdc.202300328.
  • Zeng X, Song X, Ma T, et al. Repurpose open data to discover therapeutics for COVID-19 using deep learning. J Proteome Res. 2020;19(11):4624–4636. doi:10.1021/acs.jproteome.0c00316.