Editorial

Activity cliff clusters as a source of structure–activity relationship information

Abstract

The activity cliff (AC) concept is widely applied in medicinal chemistry. ACs are formed by compounds with small structural changes having large differences in potency. Accordingly, ACs are a primary source of structure–activity relationship (SAR) information. Through large-scale compound data mining it has been shown that the vast majority of ACs are formed in a coordinated manner by groups of structurally analogous compounds with significant potency variations. In network representations coordinated ACs form clusters of varying size but frequently recurrent topology. Recently, computational methods have been introduced to systematically organize AC clusters and extract SAR information from them. AC clusters are widely distributed over compound activity classes and represent a rich source of SAR information. These clusters can be visualized in AC networks and isolated. However, it is challenging to extract SAR information from such clusters and make this information available to the practice of medicinal chemistry. Therefore, it is essential to go beyond subjective case-by-case analysis and design computational approaches to systematically access SAR information associated with AC clusters.

1. Introduction

According to their original definition, activity cliffs (ACs) are formed by pairs of structurally similar compounds that have a large difference in potency [1,2]. This definition raises three major points of consideration: the similarity criterion, the potency difference criterion, and the focus on compound pairs. For a meaningful analysis of ACs, compound similarity and potency difference criteria must be set and consistently applied. In addition, given the focus on compound pairs, ACs were originally assessed in ‘isolation’, without considering their structural and activity context.

On the basis of our experience, we favor a structurally conservative definition that restricts the formation of ACs to structural analogs and requires a minimal potency difference of two orders of magnitude between cliff partners, assessed on the basis of (assay-independent) equilibrium constants whenever possible [3]. Following our approach, structural relationships are established by applying a modified matched molecular pair (MMP) formalism [4,5]. An MMP represents a pair of compounds that are only distinguished by a structural change at a single site [4]. In addition, we limit such changes to small replacements typically observed in analog series by applying transformation size restrictions to MMPs [5]. Pairs of active compounds that form transformation size-restricted MMPs and differ in potency by at least 100-fold are termed MMP-cliffs [5]. MMP-cliffs represent our preferred AC definition, which is consistently applied in the following. Moreover, departing from the compound pair focus, we assess ACs in their structure–activity context.
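The MMP-cliff criterion described above can be illustrated with a minimal sketch. It assumes that transformation size-restricted MMPs have already been computed elsewhere and that potency is available as a negative decadic logarithm (e.g., pKi), so that a 100-fold difference corresponds to 2 log units. The function name, compound identifiers, and potency values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple


def find_mmp_cliffs(
    mmps: List[Tuple[str, str]],
    pki: Dict[str, float],
    min_log_diff: float = 2.0,
) -> List[Tuple[str, str, float]]:
    """Return MMPs whose partners differ in potency by >= min_log_diff log units.

    Assumption: `mmps` already contains only transformation size-restricted
    MMPs of compounds sharing the same activity; `pki` holds logarithmic
    potency values (2 log units = 100-fold difference).
    """
    cliffs = []
    for a, b in mmps:
        diff = abs(pki[a] - pki[b])
        if diff >= min_log_diff:
            cliffs.append((a, b, diff))
    return cliffs


# Invented example data (not taken from ChEMBL).
mmps = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]
pki = {"c1": 9.0, "c2": 6.5, "c3": 7.5}
cliffs = find_mmp_cliffs(mmps, pki)
print(cliffs)
```

In this toy example, only the pair c1/c2 (2.5 log units) qualifies as an MMP-cliff; the other two MMPs fall below the threshold.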

Through large-scale compound data mining, we have determined that the vast majority of current ACs are formed in a ‘coordinated’ manner [6,7]. This means that multiple ACs are formed by groups of structural analogs with large potency variations. These activity cliffs often overlap because individual compounds in groups of analogs are involved in the formation of more than one cliff [6,7]. The identification of coordinated ACs has extended the AC concept. Although coordinated ACs are associated with higher structure–activity relationship (SAR) information content than ACs considered on the basis of individual compound pairs [3], their analysis is a complex task.

Herein, we discuss what has been learned about coordinated ACs thus far and introduce computational approaches designed to explore and exploit them.

2. Activity cliff distribution

To guide the discussion, we provide AC statistics for bioactive compounds available in ChEMBL (version 17) [8], the major public repository of compounds and activity data from medicinal chemistry sources. The results are summarized in Table 1. A total of 15,543 well-defined ACs were detected in 282 compound activity classes on the basis of high-confidence activity data (confidence criteria were applied according to [7]). On average, 23.5% of all active compounds were involved in the formation of ACs (and 4.7% of all qualifying MMPs sharing the same activity represented ACs).

Table 1. Activity cliff statistics.

The more than 15,000 currently available ACs provide a large knowledge base for medicinal chemistry. Importantly, 95% of these ACs are formed in a coordinated manner, as also reported in Table 1. Hence, the analysis of coordinated ACs should take center stage in SAR exploration.

3. AC networks

In light of the above, a key question was how to best characterize coordinated ACs. For this purpose, network representations were designed to visualize these AC arrangements [9]. In AC networks, nodes represent compounds and edges between nodes indicate the formation of cliffs. For ChEMBL (version 17), a global AC network was generated on the basis of 72,494 active compounds with defined equilibrium constants belonging to 661 different (target-based) activity classes. A subset of 11,787 active compounds was found to participate in a total of 15,543 ACs, only 771 (5%) of which were formed by isolated compound pairs. Nearly 87% of the compounds comprising the AC network formed coordinated ACs, giving rise to > 1100 distinct clusters. Because the preferred MMP-cliff definition was applied here, no similarity threshold values are responsible for cluster formation. Rather, a cluster is formed by a group (subset) of compounds forming MMP relationships with each other (MMP-cliffs), but not with other data set compounds. Thus, cluster formation is driven by exclusive structural relationships. A subset of 369 AC clusters was found to consist of 6 to 15 compounds, and 18 clusters contained > 50 compounds (the largest clusters contained > 100 compounds). Figure 1 shows a selection of exemplary clusters of limited size from the network having different compositions and topologies.
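The separation of isolated cliffs from AC clusters described above can be sketched as follows. The sketch assumes that a cluster corresponds to a connected component of the AC network (nodes are compounds, edges are MMP-cliffs); this is an interpretation of the text rather than the published implementation, and the function name and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ac_clusters(cliffs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group compounds into connected components of the AC network."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in cliffs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting all compounds reachable via cliffs.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Invented example: four compounds forming overlapping cliffs plus one isolated pair.
cliffs = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
clusters = ac_clusters(cliffs)
isolated_pairs = [c for c in clusters if len(c) == 2]
coordinated = [c for c in clusters if len(c) > 2]
print(len(isolated_pairs), len(coordinated))
```

Components with exactly two compounds correspond to isolated ACs, whereas larger components contain coordinated ACs.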

Figure 1. Schematic activity cliff network. Exemplary clusters from the global AC network with irregular or recurrent main topologies and hybrids of these topologies are shown. In addition, for each topological category, the total number of clusters in the network is provided.

Network analysis revealed that coordinated ACs produced many individual clusters of different compositions and sizes. Most of these clusters only consisted of compounds sharing the same activity (i.e., only ∼ 8.5% of the compounds participated in the formation of one or more multi-target ACs). Interestingly, ∼ 800 AC clusters were found to display three recurrent main topologies (termed the ‘star’, ‘chain’, and ‘rectangle’ topology) and extensions of these topologies, regardless of compound activity [9], as illustrated in Figure 1. Thus, many AC clusters can be assigned to a small number of topological motifs, which aids in SAR exploration and explains the significance of recurrent topologies.
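A simple way to assign a cluster to one of the three main topologies is to inspect its node degrees. The following sketch rests on assumed graph-theoretical readings of the terms that are not spelled out in the text: a ‘star’ is taken to be one hub connected to all other (otherwise unconnected) compounds, a ‘chain’ a linear path, and a ‘rectangle’ a cycle of four compounds. Extensions and hybrids are not handled and are reported as irregular; the input is assumed to be the edge list of a single connected cluster.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_topology(edges: Iterable[Tuple[str, str]]) -> str:
    """Classify one connected AC cluster by its degree sequence (assumed definitions)."""
    unique_edges = {frozenset(e) for e in edges}
    degree: Counter = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1

    n = len(degree)          # number of compounds
    m = len(unique_edges)    # number of cliffs
    degs = sorted(degree.values())

    if n == 2:
        return "isolated pair"
    if n >= 4 and m == n - 1 and degs == [1] * (n - 1) + [n - 1]:
        return "star"
    if n >= 3 and m == n - 1 and degs == [1, 1] + [2] * (n - 2):
        return "chain"
    if n == 4 and m == 4 and degs == [2, 2, 2, 2]:
        return "rectangle"
    return "irregular"


# Invented example clusters.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
chain = [("a", "b"), ("b", "c"), ("c", "d")]
rectangle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(classify_topology(star), classify_topology(chain), classify_topology(rectangle))
```

Note that a three-compound path satisfies both the star and the chain description; the sketch arbitrarily reports it as a chain.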

4. AC clusters and SAR information

Network analysis clearly established AC clusters as a basic unit for the analysis of coordinated ACs and the SAR information they contain. As discussed above, AC clusters represent a large knowledge base for SAR analysis. However, dissecting individual clusters for a given activity class is an arduous task. Hence, a key question for AC cluster analysis was how to systematically and consistently organize clusters according to SAR features and extract SAR information from them. To these ends, two computational approaches have recently been introduced.

4.1 Cluster indices and index map

For the systematic classification and characterization of AC clusters, two numerical indices were defined: the ‘MMP index’, which quantified the structural similarity of compounds in AC clusters, and the ‘Core index’, which characterized the AC diversity within clusters (i.e., quantified to what extent ACs were formed on the basis of the same MMP core structure or different cores) [10]. The MMP index accounted for the ratio of MMP relationships over all pairs of compounds in a cluster. Within larger clusters, AC sequences often displayed progressive structural modifications between participating compounds. Accordingly, low MMP index scores reflected the prevalence of compounds with gradual structural modifications and high scores the presence of structurally analogous compounds. Furthermore, if the majority of ACs in a cluster were formed based upon the same core structure, a low Core index score was obtained. By contrast, if ACs were predominantly formed in sequences with gradual structural changes, the Core index score was high. Figure 2 shows representative examples of AC clusters in the index map, and Figure 3 shows the structures of compounds within two clusters with high MMP index (cluster 1, top) or high Core index scores (cluster 2, bottom).
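The MMP index as described above, that is, the ratio of MMP relationships over all compound pairs in a cluster, can be sketched in a few lines. The function name and data are hypothetical, and the sketch assumes that all MMP relationships among cluster compounds (not only those qualifying as cliffs) are supplied. The Core index is not sketched because its formula is not given in the text.

```python
from typing import Iterable, Tuple


def mmp_index(compounds: Iterable[str], mmp_pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of all compound pairs in a cluster that form an MMP."""
    members = set(compounds)
    n = len(members)
    all_pairs = n * (n - 1) // 2
    if all_pairs == 0:
        return 0.0
    # Count each unordered MMP once and ignore pairs outside the cluster.
    mmps_in_cluster = {
        frozenset(p) for p in mmp_pairs if len(set(p)) == 2 and set(p) <= members
    }
    return len(mmps_in_cluster) / all_pairs


# Invented examples: four analogs forming all six possible MMPs ...
analogs = ["a", "b", "c", "d"]
all_mmps = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
# ... versus four compounds related only through a sequence of three MMPs.
sequence_mmps = [("a", "b"), ("b", "c"), ("c", "d")]

print(mmp_index(analogs, all_mmps))       # 1.0
print(mmp_index(analogs, sequence_mmps))  # 0.5
```

A score of 1.0 indicates that every compound pair in the cluster forms an MMP, whereas a cluster held together by a sequence of gradual modifications yields a lower score.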

Figure 2. Activity cliff clusters in the index map. Activity cliff clusters depicted in Figure 1 have different locations in the index map depending on their individual cluster scores.

Figure 3. Index map-dependent differences in activity cliff cluster composition. Two representative activity cliff clusters with high MMP index (cluster 1; MMP index = 1.0) or high Core index score (cluster 2; Core index = 0.75) are shown together with the compounds forming these clusters. Varying modification sites are highlighted using different shades of gray.

The comparison of compound structure relationships and AC patterns in clusters provides direct access to SAR information. Therefore, by combining the MMP and Core indices, an ‘index map’ was obtained, which enabled a systematic organization of AC clusters. Figure 4 shows an exemplary index map for a compound activity class. Based on index threshold values of 0.5, the map was divided into four quadrants (regions I–IV). For example, the upper left region I contained clusters comprising close structural analogs and having low AC diversity, whereas the lower right region III contained clusters of compounds with partly different cores and high AC diversity. Typically, all four regions were populated in index maps, albeit to varying extents, depending on the compound classes under study [10]. From index maps, clusters with different SAR characteristics were selected and further analyzed.
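The division of the index map into quadrants can be sketched as a simple threshold test. The axis assignment used here (Core index horizontal, MMP index vertical) is an assumption inferred from the description of regions I and III; the numbering of the two remaining quadrants is not stated in the text, so they are labeled by position only. The treatment of values exactly at the threshold is also an assumption.

```python
def index_map_region(mmp_idx: float, core_idx: float, threshold: float = 0.5) -> str:
    """Assign a cluster to an index map quadrant (assumed axes: x = Core, y = MMP)."""
    high_mmp = mmp_idx >= threshold    # close structural analogs
    high_core = core_idx >= threshold  # high AC diversity (different cores)
    if high_mmp and not high_core:
        return "upper left (region I)"
    if not high_mmp and high_core:
        return "lower right (region III)"
    if high_mmp and high_core:
        return "upper right"
    return "lower left"


# Invented index scores for four hypothetical clusters.
print(index_map_region(1.0, 0.2))   # upper left (region I)
print(index_map_region(0.3, 0.75))  # lower right (region III)
```

Clusters falling into different quadrants can then be selected for further analysis according to the SAR characteristics of interest.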

Figure 4. Activity class example. The index map distribution of activity cliff clusters formed by adenosine A3 receptor antagonists (ChEMBL target ID 256) is shown.

The index map approach has been applied to systematically analyze AC clusters formed by currently available type I inhibitors of 266 kinase targets (covering ∼ 50% of the human kinome) and study SARs associated with prioritized clusters [11]. A global AC network based on IC50 values available for type I inhibitors contained 357 AC clusters formed by 2986 compounds [11] and yielded an index map distribution similar to the one described above.

4.2 Matching molecular series from clusters

In addition to cluster indexing, organization, and prioritization, another computational method was devised for the immediate extraction of SAR information from AC clusters. This approach automatically isolates matching molecular series (MMS) from AC clusters and detects relationships between these series [12]. MMS were introduced as an extension of the MMP concept used to define ACs. An MMS represents a series of compounds that are only distinguished by a structural change at a single site (accordingly, compounds comprising an MMS form all possible pairwise MMP relationships). A survey of AC clusters from ChEMBL (version 17) revealed that ∼ 76% of all clusters contained multiple MMS of varying length [12]. Detecting large numbers of MMS in individual clusters (e.g., 10, 20, 50 or more) was not unusual. Hence, MMS from AC clusters represented a substantial and chemically intuitive source of SAR information. In addition, relationships between different MMS originating from the same cluster were assessed [12]. To this end, compounds forming an MMS were ordered according to increasing potency. Then, pairs of MMS were compared to determine if they shared one or more compounds (which frequently occurred in AC clusters). As illustrated in Figure 5, a shared compound defined a transition point between two MMS. Accordingly, pairs of MMS were compared along their potency gradient and it was determined whether transitioning from one MMS to another (by following well-defined structural relationships) led to a further increase in maximal compound potency. Figure 5 shows pairs of MMS belonging to different (potency-based) relationship categories. Hence, the consideration of MMS pairs with one or more transition points further increased the amount of SAR information that could be obtained from AC clusters compared to the analysis of individual series.
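The comparison of MMS pairs described above can be illustrated with a minimal sketch. It assumes that MMS have already been extracted from a cluster, that potency is given on a logarithmic scale, and that the potency gain of a transition is measured as the difference between the maximal potencies of the two series. Function names, compound identifiers, and potency values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def order_by_potency(series: Sequence[str], potency: Dict[str, float]) -> List[str]:
    """Order the compounds of an MMS according to increasing potency."""
    return sorted(series, key=lambda c: potency[c])


def transition_gain(
    mms_a: Sequence[str],
    mms_b: Sequence[str],
    potency: Dict[str, float],
) -> Optional[Tuple[List[str], float]]:
    """Return shared compounds (transition points) and the gain in maximal
    potency (log units) when moving from mms_a to mms_b; None if the two
    series share no compound."""
    shared = sorted(set(mms_a) & set(mms_b))
    if not shared:
        return None
    gain = max(potency[c] for c in mms_b) - max(potency[c] for c in mms_a)
    return shared, gain


# Invented example: two series sharing compound "s".
potency = {"a1": 5.0, "a2": 6.0, "s": 7.0, "b1": 8.0, "b2": 8.5}
mms_a = ["a2", "s", "a1"]
mms_b = ["b2", "s", "b1"]

print(order_by_potency(mms_a, potency))  # ['a1', 'a2', 's']
result = transition_gain(mms_a, mms_b, potency)
print(result)  # (['s'], 1.5)
```

Here, transitioning from the first to the second series via the shared compound raises the maximal potency by 1.5 log units, which would exceed a threshold of one order of magnitude.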

Figure 5. Matching molecular series from activity cliff clusters. Two exemplary pairs of MMS belonging to different relationship categories are depicted. For cluster 3 (MMS pair with no potency gain) and cluster 4 (MMS pair with a significant potency gain), the shared compound, MMP cores, and corresponding substitution sites are shown. For clarity, clusters and MMS of small size were selected for display. Modification sites are highlighted in gray.

Across ChEMBL compounds, > 700 MMS pairs were detected where transitioning between series resulted in a further compound potency gain of at least one order of magnitude. These pairs originated from ∼ 150 AC clusters and 88 different compound activity classes [12].

5. Conclusions

We have introduced coordinated ACs that dominate AC populations and give rise to the formation of clusters in AC network representations. The notion of coordinated ACs further extends the AC concept and is of significance for medicinal chemistry. This is the case because AC clusters have higher SAR information content than cliffs considered on the basis of compound pairs. However, this SAR information is often difficult to extract from clusters. For this purpose, the first computational approaches have recently been introduced, as discussed herein.

6. Expert opinion

The finding that the vast majority of ACs were formed in a coordinated manner triggered a revision and further extension of the AC concept. In this context, a key question was how to best rationalize and visualize coordinated AC arrangements. To these ends, AC network analysis made important contributions, leading to the identification of AC clusters of in part recurrent topology. AC clusters are of high interest for the practice of medicinal chemistry because they contain much more SAR information than ACs considered at the level of individual compound pairs. However, although a large body of AC clusters was identified, a major difficulty was how to organize and prioritize these clusters and extract SAR information from them, without the need to subjectively assess clusters on a case-by-case basis. Accordingly, the development of computational approaches for systematic AC cluster analysis and SAR exploration was of prime relevance. Progress has recently been made. The introduction of cluster indices and index maps has made it possible to organize AC clusters in a formally consistent manner according to SAR characteristics. Moreover, applying the MMS concept, the extraction of interpretable SAR information from AC clusters has been automated. A general plus of the MMS-based approach, including the identification of series with shared compounds, is its intuitive nature. MMS-based SAR information is immediately accessible by medicinal chemists (who are used to inspecting series of analogs). Because these MMS represent AC sequences, which can also be transformed into compound series following potency gradients, their SAR information content is generally high. We believe that compound optimization efforts focusing on popular therapeutic targets, for which much compound activity data is already available in the public domain and/or inside pharma settings, should take available SAR information into careful consideration. 
To aid in this process, future AC research will likely concentrate on the development and refinement of additional computational tools, with an emphasis on ease of use and interpretation, for which the MMS-based methodology discussed herein is thought to represent an instructive prototype.

Declaration of interest

The authors are supported by the University of Bonn. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Bibliography

  [1] Maggiora GM. On outliers and activity cliffs – why QSAR often disappoints. J Chem Inf Model 2006;46(4):1535–5
  [2] Stumpfe D, Bajorath J. Exploring activity cliffs in medicinal chemistry. J Med Chem 2012;55(7):2932–42
  [3] Stumpfe D, Hu Y, Dimova D, et al. Recent progress in understanding activity cliffs and their utility in medicinal chemistry. J Med Chem 2014;57(1):18–28
  [4] Kenny PW, Sadowski J. Structure modification in chemical databases. In: Oprea TI, editor. Chemoinformatics in Drug Discovery. Wiley-VCH; Weinheim, Germany: 2005. pp 271–85
  [5] Hu X, Hu Y, Vogt M, et al. MMP-cliffs: systematic identification of activity cliffs on the basis of matched molecular pairs. J Chem Inf Model 2012;52(5):1138–45
  [6] Hu Y, Stumpfe D, Bajorath J. Advancing the activity cliff concept [v1; ref status: indexed, http://f1000r.es/1wf] F1000Res. 2013;2:199
  [7] Stumpfe D, de la Vega de León A, Dimova D, et al. Advancing the activity cliff concept, part II [v1; ref status: indexed, http://f1000r.es/34p] F1000Res. 2014;3:75
  [8] Gaulton A, Bellis LJ, Bento AP, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 2011;40:D1100–7
  [9] Stumpfe D, Dimova D, Bajorath J. Composition and topology of activity cliff clusters formed by bioactive compounds. J Chem Inf Model 2014;54(2):451–61
  [10] Dimova D, Stumpfe D, Bajorath J. A method for the evaluation of structure-activity relationship information associated with coordinated activity cliffs. J Med Chem 2014;57(15):6553–63
  [11] Dimova D, Stumpfe D, Bajorath J. Systematic assessment of coordinated activity cliffs formed by kinase inhibitors and detailed characterization of activity cliff clusters and associated SAR information. Eur J Med Chem 2015;90:414–27
  [12] Dimova D, Bajorath J. Extraction of structure-activity relationship information from activity cliff clusters via matching molecular series. Eur J Med Chem 2014;87:454–60
