ABSTRACT
Introduction: The cost of in vivo and in vitro screening of ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and type of chemical identifiers used, influence the modelability of the data.
Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources and the modelability of 31 selected datasets were assessed using specific criteria derived in this study.
Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size, available in a user-friendly format with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and publishing data in an appropriate format are discussed.
Article highlights
Over 140 datasets covering 24 ADME properties were identified from the literature
Characteristics influencing the modelability of available data were investigated
31 benchmark datasets were assessed according to defined criteria for modelability
Recommendations are provided for publishing and assessing suitability for modelling
This box summarizes key points contained in the article.
Declaration of interest
The authors were supported by the European Chemical Agency. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Supplemental data
Supplemental data for this article can be accessed here.