ABSTRACT
Introduction
Natural products (NPs) are a desirable source of new therapeutics due to their structural diversity and evolutionarily optimized bioactivities. NPs and their derivatives account for roughly 70% of approved pharmaceuticals. However, the rate at which novel NPs are discovered has decreased. To accelerate the microbial NP discovery process, machine learning (ML) is being applied to numerous areas of NP discovery and development.
Areas covered
This review explores the utility of ML across the microbial NP drug discovery pipeline, with concrete examples for each major phase: genome mining, dereplication, and biological target prediction. The authors also discuss how ML can be applied to semi-synthetic approaches to drug discovery.
Expert opinion
Despite the important role that microbial NPs play in the development of novel drugs, their discovery has declined due to challenges associated with the conventional discovery process. ML is positioned to overcome these limitations given its ability to model complex datasets and generalize to novel chemical and sequence space. However, ML comes with its own limitations, which must be considered for its successful implementation. The authors stress the importance of continuing to build high-quality, open-access NP datasets to further increase the utility of ML in NP discovery.
Article highlights
Microbial natural products are a promising source of novel therapeutics.
Machine learning approaches are being increasingly applied to relieve bottlenecks throughout the microbial natural product discovery process.
Machine learning has enabled the exploration of novel biosynthetic gene clusters due to its ability to generalize to new sequence space.
Machine learning has been applied to the interpretation of metabolomic data, which can be leveraged for the efficient dereplication of microbial secondary metabolites.
Machine learning has facilitated biological target prediction, providing insight into the mechanisms of action of natural products.
Generative machine learning models have improved the design of natural product-inspired chemical libraries by preserving various chemical features that are important for the bioactivity of natural products.
Declaration of interest
JM Stokes is co-founder and scientific director of Phare Bio. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.