
Expert Opinion on Drug Discovery Volume 19, 2024 - Issue 6


Open access


Review

Another string to your bow: machine learning prediction of the pharmacokinetic properties of small molecules

Davide Bassani, Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland (corresponding author)

Neil John Parrott, Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland

Nenad Manevski, Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland

Jitao David Zhang, Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland (corresponding author), ORCID: https://orcid.org/0000-0002-3085-0909

Pages 683-698 | Received 23 Oct 2023, Accepted 23 Apr 2024, Published online: 10 May 2024

https://doi.org/10.1080/17460441.2024.2348157

ABSTRACT

Introduction

Prediction of pharmacokinetic (PK) properties is crucial for drug discovery and development. Machine-learning (ML) models, which use statistical pattern recognition to learn correlations between input features (such as chemical structures) and target variables (such as PK parameters), are being increasingly used for this purpose. To embed ML models for PK prediction into workflows and to guide future development, a solid understanding of their applicability, advantages, limitations, and synergies with other approaches is necessary.

Areas covered

This narrative review discusses the design and application of ML models to predict PK parameters of small molecules, especially in light of established approaches including in vitro-in vivo extrapolation (IVIVE) and physiologically based pharmacokinetic (PBPK) models. The authors illustrate scenarios in which the three approaches are used and emphasize how they enhance and complement each other. In particular, they highlight the achievements, the state of the art, and the potential of applying machine learning to PK prediction through a comprehensive literature review.

Expert opinion

ML models, when carefully crafted, regularly updated, and appropriately used, empower users to prioritize molecules with favorable PK properties. Informed practitioners can leverage these models to improve the efficiency of the drug discovery and development process.

KEYWORDS:

Drug discovery
pharmacokinetics
ADME
machine learning
PBPK
IVIVE

Article highlights

Predicting pharmacokinetic (PK) properties of drug candidates in animals and in humans is an essential task for drug discovery and development.
Machine learning is increasingly applied to predict absorption, distribution, metabolism, and excretion (ADME) and PK properties of small molecules.
Machine-learning (ML) models complement established methods including in vitro-in vivo extrapolation (IVIVE) and physiologically based pharmacokinetic (PBPK) modeling, enhancing the ability to design and prioritize molecules with favorable PK properties.
Successful prediction of PK parameters with ML models requires high-quality and continuously updated data, a reliable infrastructure, mechanisms to regularly assess model performance and retrain models when necessary, feedback and retrospective analyses comparing predictions with observations, and research and education on how to integrate the models into drug discovery workflows.
ML-based PK prediction warrants further research, in particular on enriching data, improving model interpretability, reducing bias, and exploring synergies with other models, especially in clinical settings.
The integration of ML models with IVIVE and PBPK approaches can provide a more comprehensive understanding of a drug’s behavior, potentially improving the efficiency of drug discovery and development.

List of abbreviations

AAFE	=	Absolute average fold error, a measure of how close model predictions are to the observed values. It is a positive number greater than or equal to 1; the higher the value, the further the predictions are from the observed values (and therefore the poorer the performance). It is defined as follows, where n is the number of observations: $AAFE = 10^{\frac{1}{n}\sum_{i=1}^{n} \left| \log_{10}\left(\frac{prediction_{i}}{observation_{i}}\right) \right|}$
AFE	=	Average fold error, a measure of the average over- or underestimation of a predicted property by a model. Its value is positive; values above 1 indicate a tendency toward overprediction, and values below 1 a tendency toward underprediction. It is defined as follows (n = number of observations): $AFE = 10^{\frac{1}{n}\sum_{i=1}^{n} \log_{10}\left(\frac{prediction_{i}}{observation_{i}}\right)}$
ADME	=	Absorption, distribution, metabolism, and excretion.
AUC	=	Area under the plasma concentration-time curve, defined as the definite integral of the plasma concentration of a drug over time.
CLp	=	Plasma clearance, the volume of plasma cleared of the drug per unit of time.
Cmax	=	Maximum plasma concentration, also written as C_max.
DL	=	Deep learning. A term used to describe machine-learning methodologies that use neural network architectures with multiple layers.
DNN	=	Deep neural network.
F%	=	Bioavailability, expressed as the percentage of the administered dose that reaches the systemic circulation. It is calculated as the ratio between the dose-normalized AUC for the administration route of interest and the dose-normalized AUC for intravenous administration, for which F% = 100% by definition.
GMFE	=	Geometric mean fold error, synonymous with AAFE.
GNN	=	Graph neural network, a neural network architecture for graph-based learning. For instance, 2D or 3D structures of small molecules can be represented as a graph, i.e. a collection of nodes (atoms) and edges (bonds).
HT	=	High-throughput.
HT-PBPK	=	High-throughput PBPK modeling.
IVIVE	=	In vitro-in vivo extrapolation.
MESN	=	Multi embedding-based synthetic network.
MLP	=	Multilayer perceptron, a classical artificial neural network architecture in which every neuron is connected to all neurons in the previous and next layers.
MPS	=	Microphysiological systems.
NAM	=	New approach methodologies.
NN	=	Neural network, synonymous with artificial neural network in this context.
PBPK	=	Physiologically based pharmacokinetic modeling.
PCA	=	Principal component analysis.
PD	=	Pharmacodynamics.
PK	=	Pharmacokinetics.
ML	=	Machine learning.
ODE	=	Ordinary differential equation.
PopPK	=	Population pharmacokinetics.
QML	=	Quantum machine learning.
RMSE	=	Root mean square error, a measure of goodness of fit. It is defined as the square root of the mean of the squared residuals (pred = prediction; obs = observation; n = number of prediction/observation pairs evaluated): $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (pred_{i} - obs_{i})^{2}}$
R²	=	Coefficient of determination. There are multiple definitions of R². In our context, it is defined with the equation below (the bar over observations means the average observation), which is a relative measure of discrepancy between predictions and observations, compared with a null model which uses just the average value of the observations as prediction. $R^{2} = 1 - \frac{\sum {(observation - prediction)}^{2}}{\sum {(observation - \overline{observations})}^{2}}$
RF	=	Random forest, a class of machine learning models built with an ensemble of individual decision trees. Each decision tree makes its prediction, and the random forest model makes predictions by pooling individual predictions.
SVR	=	Support vector regression, a variant of the machine-learning algorithm known as the support vector machine (SVM). SVM is a supervised learning algorithm for classification tasks. It works by mapping input data into a higher-dimensional space and finding a decision boundary (a hyperplane) there that separates the classes. The term ‘support vector’ refers to the data points that lie closest to the decision boundary. SVR adapts SVM to regression tasks.
t_1/2	=	Half-life, the time required for the plasma concentration of a substance to decrease by half.
Tmax	=	Time point at which Cmax is observed in the plasma concentration-time curve of a substance, also written as T_max or t_max.
Vss	=	Volume of distribution at steady state. It represents the theoretical volume into which a drug is distributed under steady-state conditions.
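The evaluation metrics defined above (AFE, AAFE/GMFE, RMSE, R²), together with AUC and F%, can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: AUC is approximated with the linear trapezoidal rule over the sampled time points only (no extrapolation to infinity), F% uses dose-normalized AUCs and therefore assumes dose-proportional (linear) pharmacokinetics, the function names are hypothetical, and the example values are invented for illustration.

```python
import math
from typing import Sequence


def _paired_length(predictions: Sequence[float], observations: Sequence[float]) -> int:
    """Check that predictions and observations are paired and non-empty; return n."""
    if len(predictions) != len(observations) or len(predictions) == 0:
        raise ValueError("predictions and observations must be non-empty and of equal length")
    return len(predictions)


def afe(predictions: Sequence[float], observations: Sequence[float]) -> float:
    """Average fold error: 10 ** mean(log10(pred / obs)).

    Values > 1 suggest overprediction, < 1 underprediction.
    All values must be strictly positive (log of a ratio).
    """
    n = _paired_length(predictions, observations)
    total = sum(math.log10(p / o) for p, o in zip(predictions, observations))
    return 10 ** (total / n)


def aafe(predictions: Sequence[float], observations: Sequence[float]) -> float:
    """Absolute average fold error (GMFE): 10 ** mean(|log10(pred / obs)|), always >= 1."""
    n = _paired_length(predictions, observations)
    total = sum(abs(math.log10(p / o)) for p, o in zip(predictions, observations))
    return 10 ** (total / n)


def rmse(predictions: Sequence[float], observations: Sequence[float]) -> float:
    """Square root of the mean squared residual (pred - obs), in the units of the data."""
    n = _paired_length(predictions, observations)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predictions, observations)) / n)


def r_squared(predictions: Sequence[float], observations: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, relative to always predicting the mean observation."""
    n = _paired_length(predictions, observations)
    mean_obs = sum(observations) / n
    ss_res = sum((o - p) ** 2 for p, o in zip(predictions, observations))
    ss_tot = sum((o - mean_obs) ** 2 for o in observations)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are identical")
    return 1 - ss_res / ss_tot


def auc_trapezoid(times: Sequence[float], concentrations: Sequence[float]) -> float:
    """AUC from the first to the last sample by the linear trapezoidal rule.

    No extrapolation beyond the last sample (AUC(0-tlast), not AUC(0-inf)).
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two paired time/concentration samples")
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for t0, t1, c0, c1 in zip(times, times[1:], concentrations, concentrations[1:])
    )


def bioavailability_percent(auc_route: float, dose_route: float,
                            auc_iv: float, dose_iv: float) -> float:
    """F% from dose-normalized AUCs; assumes linear (dose-proportional) PK."""
    return 100 * (auc_route / dose_route) / (auc_iv / dose_iv)


if __name__ == "__main__":
    # Invented toy values, for illustration only.
    pred, obs = [2.0, 5.0, 10.0], [1.0, 5.0, 20.0]
    print(round(afe(pred, obs), 3))        # 1.0: 2-fold over- and underprediction cancel
    print(round(aafe(pred, obs), 3))       # 1.587
    print(round(rmse(pred, obs), 3))       # 5.802
    print(round(r_squared(pred, obs), 3))  # 0.497
    auc_oral = auc_trapezoid([0, 1, 2, 4], [0, 10, 8, 4])
    print(auc_oral)                        # 26.0
    print(round(bioavailability_percent(auc_oral, 10.0, 40.0, 5.0), 1))  # 32.5
```

The toy example shows why AFE and AAFE are usually reported together: a 2-fold overprediction and a 2-fold underprediction cancel in AFE (which equals 1), while AAFE still reveals an average fold error of about 1.59. Because both fold-error metrics work on ratios, they are insensitive to the scale of the parameter, whereas RMSE is expressed in the units of the data.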

Acknowledgments

The authors would like to thank Stephen Fowler, Andrea Andrews-Morger, Julia Pletz, and Leonid Komissarov for their valuable comments and feedback. The authors are indebted to the input of many colleagues in the department of Pharmaceutical Science, and the support of Fabian Birzele, Sherri Dudal and Marianne Manchester. The authors also thank Matthew Wright from Genentech, who shared with us valuable experience and helpful suggestions.

Declaration of Interest

D Bassani was kindly supported by the Roche Postdoc Fellowship (RPF). All authors are employees of Hoffmann-La Roche. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Reviewer Disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/17460441.2024.2348157.

Additional information

Funding

This paper was not funded.
