Review

Assigning confidence to molecular property prediction

Pages 1009-1023 | Received 25 Feb 2021, Accepted 29 Apr 2021, Published online: 15 Jun 2021
 

ABSTRACT

Introduction: Computational modeling has advanced rapidly over recent decades. Recently, machine learning has emerged as a powerful and cost-effective strategy to learn from existing datasets and perform predictions on unseen molecules. Accordingly, the explosive rise of data-driven techniques raises an important question: What confidence can be assigned to molecular property predictions and what techniques can be used?

Areas covered: The authors discuss popular strategies for predicting molecular properties, their corresponding uncertainty sources and methods to quantify uncertainty. First, they consider how dataset bias and size, data-driven property prediction and feature design affect confidence. Next, they discuss property simulation via computations of binding affinity in detail. Lastly, they investigate how these uncertainties propagate to generative models, which are usually coupled with property predictors.

Expert opinion: Computational techniques are paramount to reduce the prohibitive cost of brute-force experimentation during exploration. The authors believe that assessing uncertainty in property prediction models is essential whenever closed-loop drug design campaigns relying on high-throughput virtual screening are deployed. Accordingly, considering sources of uncertainty leads to better-informed validations, more reliable predictions and more realistic expectations of the entire workflow. Overall, this increases confidence in the predictions and, ultimately, accelerates drug design.

Article highlights

  • For many important properties in drug discovery, only a limited amount of high-quality data is available. Moreover, the data might only be available for specific structural families, which can introduce bias.

  • Predictive models obtained using inductive inference are never formally correct and inherently uncertain. There are often additional sources of uncertainty that stem from noise or imprecision on target measurements. Many ML methods are available for representing uncertainty.

  • Property predictors are provided inputs that are considered important for estimating molecular properties.

  • A key property of interest in drug discovery is the binding affinity of a ligand to a receptor of interest.

  • Generative models have recently been introduced as techniques for discovering drugs with desired properties. Molecular property predictors are often attached to these generative models.

This box summarizes key points contained in the article.
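
As one illustration of the uncertainty-representation methods mentioned in the highlights, the sketch below shows split conformal prediction (CP) for a regression-type property. It is a minimal example under stated assumptions, not a method taken from the article: the function name `split_conformal_interval`, the use of absolute residuals as the nonconformity score, and the synthetic data are all hypothetical choices made for illustration. It assumes a held-out calibration set of measured values and the corresponding predictions from some already-trained model.

```python
import numpy as np

def split_conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Return (lower, upper) intervals targeting about (1 - alpha) coverage.

    Hypothetical helper: cal_true/cal_pred are measured and predicted
    property values on a held-out calibration set; test_pred holds
    predictions for new molecules.
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    # Nonconformity score (an assumption): absolute calibration residual.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))

    # If the calibration set is too small for the requested coverage,
    # the only valid interval is unbounded.
    q = scores[k - 1] if k <= n else np.inf
    return test_pred - q, test_pred + q

# Synthetic stand-in data; real use would take model predictions and assay values.
rng = np.random.default_rng(0)
cal_pred = rng.normal(size=200)
cal_true = cal_pred + rng.normal(scale=0.5, size=200)
lower, upper = split_conformal_interval(cal_true, cal_pred, [0.0, 1.0], alpha=0.1)
print(lower, upper)
```

The interval width here is the same for every molecule, because the score ignores how far a query lies from the training data. The coverage guarantee also rests on the calibration and test molecules being exchangeable, which dataset bias toward specific structural families can violate.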

Abbreviations

ML - machine learning

ADMET - absorption, distribution, metabolism, excretion and toxicity

QSAR - quantitative structure-activity relationships

AD - applicability domain

CP - conformal prediction

GP - Gaussian processes

BNN - Bayesian neural network

SMILES - simplified molecular input line entry system

Cryo-EM - cryo-electron microscopy

FEP - free energy perturbation

MD - molecular dynamics

VAE - variational autoencoders

GAN - generative adversarial networks

RL - reinforcement learning

GAs - genetic algorithms

Acknowledgments

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Office of Naval Research.

Declaration of interest

A Aspuru-Guzik is a co-founder and the Chief Visionary Officer at Kebotix Inc. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Additional information

Funding

R Pollice acknowledges funding through a Postdoc Mobility fellowship by the Swiss National Science Foundation (SNSF, Project No. 191127). RJ Hickman gratefully acknowledges NSERC for provision of the Postgraduate Scholarships-Doctoral Program (PGSD3-534584-2019). M Aldeghi is supported by a Postdoctoral Fellowship of the Vector Institute. VA Voelz and MFD Hurley acknowledge support in part by National Institutes of Health grant 1R01GM123296. A Aspuru-Guzik thanks Anders G. Frøseth for his generous support. A Aspuru-Guzik also acknowledges the generous support of Natural Resources Canada and the Canada 150 Research Chairs program. The authors also acknowledge the Department of Navy award (N00014-19-1-2134) issued by the Office of Naval Research.
