ABSTRACT
Introduction
Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation.
Areas covered
An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed.
Expert opinion
The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs requires not only methods that scale with size but also smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. Whether these models can fulfill this ideal remains unclear, because validation is difficult as long as the covered CSs remain mainly virtual and lack experimental verification.
Article highlights
Large chemical spaces include compound collections, combinatorial libraries, DNA-encoded chemical libraries, and virtual chemical spaces explored by generative models.
Molecular representations and similarity metrics suitable for large CS exploration are discussed.
Approaches to characterizing and comparing CSs are discussed.
Large chemical spaces require specialized algorithms for efficient navigation, and these algorithms may be limited to neighborhoods of structurally similar molecules.
Smarter navigation approaches are required that focus on better but not necessarily larger molecule selections.
Validation of deep generative models remains challenging as long as CSs of these models remain virtual without experimental verification.
Declaration of interest
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.