ABSTRACT
Introduction
After the initial wave of antibiotic discovery, few novel classes of antibiotics have emerged, the most recent dating back to the 1980s. Furthermore, the pace of antibiotic drug discovery has not kept up with the increasing prevalence of antibiotic resistance. However, the growing amount of available data favors the use of machine learning techniques (MLT) in drug discovery projects, e.g. for building regression and classification models and for ranking and virtually screening compounds.
Areas covered
In this review, the authors cover some applications of MLT in medicinal chemistry, focusing on the development of new antibiotics and on the prediction of resistance and its mechanisms. The aim of this review is to illustrate the main advantages and disadvantages of these approaches and the major trends in studies from the past 5 years.
Expert opinion
The application of MLT to antibacterial drug discovery can aid the selection of new, potent lead compounds with desirable pharmacokinetic and toxicity profiles for further optimization. The increasing volume of available data, together with steady improvements in computational power and algorithms, means that we are experiencing a transition in how we tackle modern problems such as drug resistance: our decisions are becoming data-driven, and experiments can be focused on hypotheses suggested by the data.
Article highlights
The current burden of bacterial infections and the emergence of resistance to available drugs urgently call for new efforts to design and discover antibacterial agents.
Machine learning techniques are considered powerful tools for generating predictive models for virtual screening campaigns aimed at discovering new antibacterials, and their use has become increasingly common in recent decades.
Some machine learning algorithms were developed to mimic human reasoning and/or decision-making processes, allowing them to model non-linear relationships and make predictions on unseen data; this is a major difference from classical statistical approaches.
Methods such as Support Vector Machines (SVM), Artificial Neural Networks (ANNs), Decision Trees (DT) and Random Forests (RF) have been employed to generate predictive models more often than other classification and regression methods.
Whole-genome sequencing datasets, often available from diagnostics, can be analyzed with MLT to understand and discover novel resistance mechanisms in bacterial populations, ultimately helping to prioritize and discover molecular targets.
This box summarizes key points contained in the article.
Acknowledgments
The authors would like to thank Leonardo Hiroyuki Santos Momo and Joe Joiner for their critical reading and useful comments on this manuscript.
Declaration of interest
The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.