Editorial

What are the current challenges for machine learning in drug discovery and repurposing?

Pages 423-425 | Received 18 Nov 2021, Accepted 04 Mar 2022, Published online: 08 Mar 2022

1. Introduction

Machine learning (ML) approaches have been increasingly adopted for computer-assisted drug discovery in recent years. This rapid progress is mostly due to the large-scale data sets available for ML model training, along with the development of deep learning (DL) and other artificial intelligence (AI) approaches that enable an integrated use of heterogeneous compound and target data, ranging from chemical structure to biochemical, in vitro, in vivo, and clinical endpoints for treatment response modeling. Exciting opportunities to apply ML and AI methods occur in all stages of drug discovery and development, including target identification and validation, prediction of molecular structure and function, and compound screening and optimization [Citation1]. Based on high-quality training data, supervised learning and feature selection offer a systematic means for identifying predictive biomarkers for precision medicine; for instance, using clinical data, such predictive approaches may provide real-world evidence of treatment responses for designing marker-based trials and improving patient selection. In particular, cancer research and precision oncology have witnessed enormous progress in the past few years in the application of ML to cancer detection and diagnosis, identification of new targets for anticancer drug discovery and repurposing, and prediction of treatment outcomes for cancer patients [Citation2].

However, as was recently demonstrated by the fast pace of development and application of ML models for COVID-19 detection and prognostication, the eventual health-care impact of new models may remain rather limited unless more attention is paid to the rigorous and often lengthy process of model validation and to potential biases in the training data sets. Strikingly, a recent report showed that none of the ML models developed so far using chest radiographs and CT scans are of potential clinical use for COVID-19 management, mainly due to methodological flaws and a high or unclear risk of bias in the imaging datasets used for model training [Citation3]. Another important application of ML in the fight against COVID-19 is drug screening and repurposing. Since developing new drugs is costly and time-consuming, several efforts have been directed toward exploring existing drugs for COVID-19 treatment. However, after the first year of the COVID-19 pandemic, drug repurposing had not produced new treatment solutions [Citation4]. This does not imply the failure of the drug repurposing concept per se, but rather that the development of clinic-ready treatments takes time, even with new and accelerated approaches. Indeed, an increasing number of repurposed drugs continue to be considered in clinical trials, along with several novel candidates.

To address these challenges, a number of recent community efforts have set guidelines for improving the quality and applicability of ML models. These include benchmarking the models in well-curated external validation data against established standards and quality metrics, implemented as community-wide recommendations for reporting ML-based studies [Citation5], along with standards and best practices for code publication and workflow automation [Citation6]. A broad adoption of these recommendations by ML developers and life science journals will improve future assessment and reproducibility of ML-based studies. Equally important as making code and data available to others is implementing the ML models as easy-to-use web applications, which promotes their wider use in the community, including among researchers without programming skills or bioinformatics support, while ensuring that the modeling standards are maintained. This is especially important for complex DL algorithms, as the lack of user-friendly solutions may hinder their practical use by drug discovery and repurposing experts.

Another timely issue, for which some research solutions exist but are not yet widely available in drug discovery pipelines, is how to effectively integrate multi-modal and multi-scale data for guiding the drug discovery process and for predicting drug responses both in preclinical models and in patients. A wide range of learning approaches has been developed for systematic identification of drug repurposing leads based on selected big data resources, such as drug structure and target profiles combined with multi-omics data from cell-based models [Citation7]. What is currently lacking are web-based platforms that integrate all these data into an easy-to-access summary that the drug discovery and repurposing community can use when prioritizing leads for further preclinical and clinical development. Similarly, despite the development of increasingly accurate AI algorithms for drug response prediction in preclinical disease models, including patient-derived primary cells, organoids and xenografts [Citation8], there are still limited solutions available for making accurate and actionable predictions of clinical responses in patients.

2. Current and emerging challenges

Below, I highlight four critical challenges, ranging from preclinical to clinical development and practical implementation, which will require novel ML solutions to support a truly data-driven, yet actionable and transparent decision-making process that speeds up drug discovery and reduces failure rates in the clinical development phases.

  1. Drug combinations are often required to treat complex diseases such as viral infections and advanced cancers [Citation9,Citation10]. For instance, combinations of kinase inhibitors or single molecules that inhibit multiple kinases may significantly enhance treatment efficacy and duration and combat treatment resistance in cancer [Citation11]. Due to the exponentially increasing number of potential compound and target combinations, however, systematic design and screening of the phenotypic effects across the combinatorial space pose both computational and experimental challenges. While many ML models have been developed for prediction of pairwise drug-dose combination responses, systematic prediction of higher-order combination effects with more than two drugs or targets remains an unsolved challenge. Tensor learning models have enabled accurate prediction of pairwise drug combination dose-response matrices in cancer cell lines [Citation9], and this computationally efficient learning approach could also be applied to other types of preclinical models with available large-scale pharmacogenomic data, with the aim of prioritizing the most potent combinations for further in vitro or in vivo testing, including higher-order combinations among new drug molecules and doses.

  2. Most current ML models for treatment response prediction consider efficacy as the primary outcome, while neglecting potential toxicity or selective efficacy (the difference between efficacy in disease cells and toxicity in healthy cells), even though these are critical factors for the success of clinical development. Accordingly, there is a need for careful assessment and prediction of the toxic effects of compounds already in preclinical and in silico models, to balance the trade-off between treatment efficacy and tolerable toxicity and thereby speed up and de-risk the next steps of drug development; an example is next-generation kinase inhibitors with enhanced selectivity and reduced adverse effects [Citation12]. The use of single-cell data together with ML models has shown promise for finding anticancer combinations that selectively co-inhibit malignant cells while avoiding inhibition of nonmalignant cells [Citation13], thereby increasing their likelihood of clinical success. Transfer learning and in silico cell type deconvolution approaches [Citation14] may provide practical solutions that avoid the need to generate massive amounts of still quite expensive single-cell data when predicting responders to combination therapies, their toxic effects, and even the dosages that optimize both efficacy and safety.

  3. The in silico treatment response predictions must also be validated in patient data and clinical outcomes. Such real-world evidence for the ML predictions is critical for clinical development and for establishing their practical utility and clinical impact in the decision-making process (e.g. an early no-go decision if the compound is considered to have too severe toxic effects). Many of the current challenges faced when applying ML to drug discovery, especially in clinical development, stem from the fact that current standards for AI algorithms do not match those required for clinical research. Therefore, systematic and comprehensive high-quality clinical data sets are critically needed for ML model validation. The new discovery workflows need to be carefully validated for accuracy and reproducibility, against community-consensus performance metrics, under a wide range of realistic scenarios, and not only in a limited set of benchmark datasets. Sensitive patient data can be shared and reused with technologies that compartmentalize code and data, or through the so-called ‘model to data’ concept [Citation15], which enables federated learning to leverage patient-level data for model development and systematic benchmarking.

  4. Despite the growing number of drug discovery applications, most ML and especially DL models are still ‘black boxes’ and often remain elusive to interpretation by the human expert [Citation16]. The mathematical models implemented as web-based decision support systems need to be made transparent to the users; in particular, the expert systems must expose their prediction rules, together with uncertainty estimates, so that users can gain confidence in the predictions. Interpretable, transparent, and explainable models involve a clear description of the optimization objectives (e.g. synergy, efficacy and/or toxicity), and quantitative performance and uncertainty evaluation (e.g. cross-validation, conformal predictions, and real-world validations), which help the users to decide how and when to use the algorithms to obtain valid results and improve decision-making. For instance, human-in-the-loop reinforcement learning could use expert advice in an interactive decision-making process to improve the interpretability of the ML-guided drug discovery process. However, regular quality monitoring and assurance are needed after deployment to confirm robust and improved performance of the models over time and across multiple discovery applications.
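To give a sense of scale for the combinatorial space described in challenge 1, the following minimal Python sketch counts dose-specific combinations. The library size (100 compounds) and the number of dose levels (5 per compound) are illustrative assumptions and are not taken from any of the cited studies; the function name `combination_space` is likewise hypothetical.

```python
from math import comb


def combination_space(n_drugs: int, order: int, n_doses: int) -> int:
    """Count dose-specific combinations of `order` distinct drugs.

    Assumes every drug is tested at the same number of dose levels and
    that the order of drugs within a combination does not matter.
    """
    # Ways to pick the drugs, times ways to assign one dose level to each.
    return comb(n_drugs, order) * n_doses ** order


if __name__ == "__main__":
    # Hypothetical screen: 100 compounds, 5 dose levels per compound.
    for order in (2, 3, 4):
        n = combination_space(100, order, 5)
        print(f"order {order}: {n:,} dose-specific combinations")
```

Under these assumptions, the count rises from 123,750 pairwise conditions to 20,212,500 triplets and 2,450,765,625 quadruplets, which illustrates why exhaustive experimental screening of higher-order combinations is impractical and why predictive prioritization is needed.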
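As one way to attach the uncertainty estimates mentioned in challenge 4 to a model's predictions, the sketch below illustrates split conformal prediction for a regression-type response. It is a generic textbook construction under stated assumptions, not the method of any cited study: `toy_model` is a hypothetical stand-in for a trained response predictor, and the calibration and test data are synthetic.

```python
import math
import random


def conformal_quantile(abs_residuals, alpha):
    """Return the split-conformal interval half-width for miscoverage alpha."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample correction
    if rank > n:
        return math.inf  # calibration set too small for this alpha
    return sorted(abs_residuals)[rank - 1]


def prediction_interval(model, x, half_width):
    """Symmetric interval around the model's point prediction."""
    y_hat = model(x)
    return y_hat - half_width, y_hat + half_width


def toy_model(dose):
    """Hypothetical stand-in for a trained drug response predictor."""
    return 2.0 * dose


def simulate_coverage(n_cal=200, n_test=1000, alpha=0.1, seed=0):
    """Calibrate on synthetic data and return the empirical test coverage."""
    rng = random.Random(seed)

    def sample():
        dose = rng.uniform(0.0, 1.0)
        response = 2.0 * dose + rng.gauss(0.0, 0.3)  # synthetic "truth"
        return dose, response

    # Held-out calibration residuals determine the interval width.
    calibration = [sample() for _ in range(n_cal)]
    residuals = [abs(y - toy_model(x)) for x, y in calibration]
    half_width = conformal_quantile(residuals, alpha)

    hits = 0
    for _ in range(n_test):
        x, y = sample()
        low, high = prediction_interval(toy_model, x, half_width)
        hits += low <= y <= high
    return hits / n_test


if __name__ == "__main__":
    print(f"empirical coverage: {simulate_coverage():.3f}")
```

With exchangeable calibration and test data, intervals built this way are expected to cover the true response in roughly a fraction 1 - alpha of new cases. That guarantee no longer holds automatically when the deployment data differ from the calibration data, which is one reason for the post-deployment quality monitoring noted above.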

3. Expert opinion

Once the above challenges have been properly addressed, both by implementation of improved methods and by their deployment and adoption, I am confident that the next generation of ML- and AI-guided expert systems will provide truly impactful and trustworthy computational tools that deliver actionable and testable hypotheses for accelerated discovery of novel drugs, therapeutic targets and drug combinations. Importantly, ML developers should adhere to the community standards for model and code sharing, benchmarking and reproducibility, especially for clinical development. Similarly, journals in the field should keep enforcing standardized best-practice checklists for ML-based discoveries as a requirement for publication. Only in this way can we, as a research community, provide new solutions, for instance, to drug efficacy and safety evaluations based on ML modeling and integrated multi-modal analyses. As has been demonstrated during the COVID-19 pandemic, open science and rapid sharing of research results are critically required for community-wide benchmarking and crowdsourced development of novel, much-needed therapeutics for existing and emerging diseases. I strongly believe that by working together, we can all benefit from more rational and systematic drug discovery and optimization approaches, which will have a significant impact on drug discovery pipelines, and eventually on public health.

Declaration of interest

The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

References

  • Vamathevan J, Clark D, Czodrowski P, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18(6):463–477.
  • Bhinder B, Gilvary C, Madhukar NS, et al. Artificial intelligence in cancer research and precision medicine. Cancer Discov. 2021;11(4):900–915.
  • Roberts M, Driggs D, Thorpe M, et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell. 2021;3(3):199–217.
  • Muratov EN, Amaro R, Andrade CH, et al. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem Soc Rev. 2021;50(16):9121–9151.
  • Walsh I, Fishman D, Garcia-Gasulla D, et al. DOME: recommendations for supervised machine learning validation in biology. Nat Methods. 2021;18(10):1122–1127.
  • Heil BJ, Hoffman MM, Markowetz F, et al. Reproducibility standards for machine learning in the life sciences. Nat Methods. 2021;18(10):1132–1135.
  • Tanoli Z, Vähä-Koskela M, Aittokallio T. Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin Drug Discov. 2021;16(9):977–989.
  • Ballester PJ, Stevens R, Haibe-Kains B, et al. Artificial intelligence for drug response prediction in disease models. Brief Bioinform. 2022;23(1):bbab450.
  • Julkunen H, Cichonska A, Gautam P, et al. Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun. 2020;11(1):6136.
  • White J, Schiffer JT, Bender R, et al. Drug combinations as a first line of defense against coronaviruses and other emerging viruses. mBio. 2021;12(6):e0334721.
  • Attwood MM, Fabbro D, Sokolov AV, et al. Trends in kinase drug discovery: targets, indications and inhibitor design. Nat Rev Drug Discov. 2021;20(11):839–861.
  • Cohen P, Cross D, Jänne PA. Kinase drug discovery 20 years after imatinib: progress and future directions. Nat Rev Drug Discov. 2021;20(7):551–569.
  • Ianevski A, Lahtela J, Javarappa KK, et al. Patient-tailored design for selective co-inhibition of leukemic cell subpopulations. Sci Adv. 2021;7(8):eabe4038.
  • Avila Cobos F, Alquicira-Hernandez J, Powell JE, et al. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun. 2020;11(1):5650.
  • Guinney J, Saez-Rodriguez J. Alternative models for sharing confidential biomedical data. Nat Biotechnol. 2018;36(5):391–392.
  • Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. Nat Mach Intell. 2020;2(10):573–584.
