489
Views
2
CrossRef citations to date
0
Altmetric
Research Articles (GP2A Conference)

Description and assessment of a model for GSK-3β database virtual screening

, , &
Pages 152-157 | Received 26 Sep 2008, Accepted 30 Jan 2009, Published online: 19 Feb 2010

Abstract

Molecular docking was used in order to prioritize organic syntheses or experimental evaluations. Different GSK-3β protein models were generated in silico from a known X-ray structure. A set of 42 known inhibitors were then flexibly docked into each rigid model and re-scored with various functions, which led to different rankings. The biological activities of the chemicals were then compared to each set of results and one of the rigid models emerged in combination with two scoring functions as giving the best correlation. This methodology constitutes an easy and accurate way to generate reliable models for virtual database screening.

Introduction

Protein kinases are at the crossroads of multiple signaling pathways controlling the cell cycle and/or apoptosis, which are deregulated in cancer. As such, they constitute ideal therapeutic targets. In total, 518 different kinases have been described, and this family occupies first rank among the families of genes struck by genetic alterations in tumors. Pharmacologically, they can easily be targeted, and several protein kinase inhibitors have now successfully been developed as drugs. For instance, trastuzumab (Herceptin®), which is a humanized monoclonal antibody targeting the HER2neu receptor, is used in metastatic breast cancerCitation1. Imatinib mesylate (Gleevec®), which is a small molecule inhibitor of the tyrosine kinases BCR-ABL, KIT, and PDGFRA/B, is efficient in the treatment of chronic myeloid leukemia, digestive stromal tumors, and idiopathic hypereosinophiliaCitation2.

Kinases are responsible for the phosphorylation of miscellaneous proteins and present a high sequence similarity, with conserved structural and regulatory elements such as the adenosine phosphate (ATP) binding site, which is the target of most kinase inhibitors. This results in a paradox in obtaining selective inhibitors: a ligand may act on several kinases leading to undesired toxicity. However, this problem may be addressed by kinase profile assays to measure specific affinities between different proteinsCitation3,Citation4 or even between isoforms of a kinaseCitation5.

Glycogen synthase kinase-3 (GSK-3) belongs to the CMGC family (containing CDK, MAPK, GSK-3, and CLK), and has stirred up much interest since it is connected to several pathologies such as diabetesCitation6, Alzheimer diseaseCitation7, neurodegenerative disordersCitation8, and cancerCitation9. Two isoforms exist with similar functions and substrate affinitiesCitation10.

Many inhibitors of GSK-3β have been reported in the literature (with known affinity) along with X-ray complexes deposited in the Protein Data Bank (PDB). In the course of our search for new anti-cancer drugs, we are interested in developing in silico models of some kinases in order to perform virtual screening (VS). VS is an ensemble of computational procedures aiming at selecting the most promising compounds for experimental screening from an electronic database. Because of the reasons stated above, GSK-3β seemed a worthy candidate to start our studies. Such models should be as reliable as possible, which means giving a high correlation between biological and calculated activities, but they should also be simple to process and allow fast screening of large chemical databases.

Among the docking programs developed for VS, DockCitation11 presents a number of advantages: a robust algorithm allowing an efficient sampling of the conformational space, a reasonable average running time per ligand, and the availability of an executable multi-processor. This software has been used extensively for a number of projects and is free to academics. When evaluating the fitness of a given chemical for a specific protein, one has to determine the free energy of the resulting ligand–protein complex with a scoring function. This prediction is performed during the docking process but it may also be realized afterwards as a means to compare the different techniques to calculate the stability of the obtained data. Since the chosen docking program does not allow protein flexibility, we docked a set of known inhibitors in different structures of the rigid receptor.

In this report, we describe our methodology in selecting a combination of protein model-scoring functions to perform virtual screening of large chemical databases using Dock software.

Materials and methods

Preparations of the receptor protein models

The X-ray structure 1Q3DCitation12 was selected from the PDBCitation13; it is a dimer of two GSK-3β nearly identical chains incorporating staurosporine as a ligand in the active site. Insight II (2000)Citation14 was used to complete the missing residues then to add hydrogens and charges according to a consistent valence force field (CVFF)Citation15. This structure was chosen because the ligand staurosporine is the most hindering known inhibitor while being less flexible; it presents the largest interaction surface of the kinase binding site. With the following process, our purpose was to obtain different states of the kinase that would allow binding of a wide variety of ligands.

In order to obtain several models of the receptor, the structure was divided into two monomers whose similarity was verified by superimposition. The staurosporine was fixed along with the backbone and the structure was minimized during 1000 cycles using a steepest-descent, then a conjugate algorithm to a final root mean square (rms) gradient ≤0.01 kJ Å−1 mol−1. This led to the generation of the first receptor model, which was structurally very similar to the PDB protein. Then a constraint was applied to the backbone and gradually relaxed after the minimization steps; five models were thus obtained with the respective tether force constants of 100, 50, 25, 10 and 0 kcal/Å2. Finally, the constraint applied on staurosporine was also relaxed to lead to the last receptor structure. From now on, these successive models will be referred as Start, T100, T50, T25, T10, T0, Relax. Each structure was obtained from the previous one as described above, and led to discrete to severe differences in terms of both side-chain displacements and backbone moves.

Construction of the ligand databases

All ligands were built with the Insight II Builder module; CVFF charges were applied with hydrogen atoms present and the structures were minimized by a combination of the algorithms described above. A first structurally homogeneous training set () was prepared with 13 indirubines with known affinities ranging from 5 nM to >100 µM16.

Figure 1. Structures of the indirubines composing training set 1.

Figure 1.  Structures of the indirubines composing training set 1.

These were chosen to start our study since this family of chemicals is one of the most prominent among kinase inhibitors. A second training set with much more structural diversity () was prepared from three scaffoldsCitation17 following a procedure similar to the one described above. The reported biological affinities of these 29 chemicals for GSK-3β ranged from 5 nM to 23 µM. Two of these 29 molecules bear a piperidine cycle which is flexible (compounds 23–25 and 26–28), yet rings are treated as rigid by the used docking software. As a consequence, a Monte-Carlo conformational search was performed, and three different representative conformations were chosen since there is no way to predict which conformation is more likely to be active. The ranking calculated for such ligands is a mean of the rankings of the three conformations.

Figure 2. Structures of the compounds composing training set 2. The three representative conformations of the piperidine ring are displayed for compounds 26–28 (as obtained by Monte-Carlo conformational search).

Figure 2.  Structures of the compounds composing training set 2. The three representative conformations of the piperidine ring are displayed for compounds 26–28 (as obtained by Monte-Carlo conformational search).

Virtual screening with Dock

Dock version 5.4 was used as a multi-processor executable on an IBM cluster of 32 Xeon 2.8 GHz hyper-threaded processors, under the Linux operating system. The following process was repeated for each protein model generated as described above.

First, a Connolly surface of the binding site was generated by using a 1.4 Å probe radius, followed by the generation of a set of overlapping spheres that were then clustered according to their spatial distribution. The spheres located too distant from the binding site were manually eliminated in the final model cluster. To compute interaction energies, 3D grids of 0.3 Å resolution were centered on the binding site. These energy-scoring grids were obtained by using an all-atom model and a distance-dependent dielectric function with an infinite cutoff. The size of the grid box was chosen to enclose all selected spheres using an extra margin of 8 Å. A flexible ligand docking was then performed starting with a selection and matching of an anchor cycle within a maximum of 2000 orientations, followed by growth of the ligand with 100 configurations per cycle. The final step included energy minimizations with the Dock scoring function to generate a binding mode with the best energy score. The best 50 generated conformations for each ligand were stored for further scoring. Five independent docking experiments were realized with different random seeds. The resulting ranking was a mean of these five results. In order to check the relevance of the obtained conformations, the root mean square deviation (RMSD) was calculated with crystallographic data as reference when available. At least one close conformation (RMSD <2.0 Å) was found among the 10 best ranked compounds after docking, with very close energy values as calculated using the Dock scoring function.

Re-scoring of the conformations generated by Dock

In order to compare the performance of the scoring techniques, several scoring functions free to academics were used after the docking experiments. Every conformation issued from each of the five Dock experiments was re-evaluated. For the following functions, the ligands were re-ranked by the mean of their best results in each experiment. Correlation coefficients were calculated for all re-scoring results with the biological ranking as the known matrix (linear regression). All the obtained r values were superior to the r0 calculated for α = 1%, meaning that the variables (calculated rank and experimental affinity) were independent while correlatedCitation18.

GBSACitation19 is a force field-based scoring function readily implemented within the Dock software, based on the generalized-born surface area (GB/SA) model of solvation for small molecules. XScoreCitation20 is an empirical scoring function developed from protein–ligand complexes from the PDB, composed of three independent functions (HM, HP, HS) plus a fourth obtained by combining the former (AV). DrugScoreCitation21,Citation22 is a knowledge-based scoring function using potential mean force. Two DrugScore versions have been published with different training sets (_PDB and _CSD); each incorporates three different components (PAIR, SURF, and PAIRSURF).

Results and discussion

The methodology we describe herein was developed in order to overcome some of the problems one can encounter when using VS. The first problem is dependent on the docking software: in order to be able to screen huge libraries, it is necessary to limit the number of degrees of freedom to reduce computation time. In our case, we used a version of the Dock software that allows flexibility of the ligands, but treats the protein as a rigid object. The second problem results from the former, since it is well known that a receptor protein is flexible; one has to take into account the possibility for the side chains of the active site to move as a result of the insertion of a ligand (induced fit). Moreover, it has been previously reported that kinases are subject to conformational changes during activationCitation23, and several methods have been presented to overcome this phenomenonCitation24. A new method based on molecular dynamics was recently proposed to simulate the flexibility of the active site, and it might also be an alternative to our proposed protocolCitation25. We devised a new way to obtain several models from a PDB structure (see “Materials and methods”) that presents the advantage of being inexpensive in terms of computational time and very easy to perform. With seven rigid “pictures” of the desired active site, we introduced conformational diversity into the target receptor. Thus, the VS of GSK-3β with two training sets of ligands was realized in a very short time, and we were able to compare the performance of a combination of protein model-scoring functions that would yield results in accordance with the known inhibitory activity of the chosen chemicals.

Indirubines have been reported as mimics for the ATP purine moiety and tested for kinase inhibition with success. The important structural similarity within this class of compounds prompted our interest in testing our docking parameters with this first training set. The best poses obtained after each docking experiment were then re-evaluated with several scoring functions and the ligands were ranked according to their respective results (number 1 was assigned to the best ligand while number 13 described the worst). The rankings obtained were means of five docking then re-scoring processes, and were used to calculate the regression coefficient for a scoring function, as plotted in .

Figure 3. Correlation coefficients calculated after training set 1 docking and re-scoring. HM, HP, HS, and AV are the four components of XScore. P_, S_, and PS_ stand for the PAIR, SURF, and PAIRSURF functions of DrugScore_CSD or DrugScore_PDB.

Figure 3.  Correlation coefficients calculated after training set 1 docking and re-scoring. HM, HP, HS, and AV are the four components of XScore. P_, S_, and PS_ stand for the PAIR, SURF, and PAIRSURF functions of DrugScore_CSD or DrugScore_PDB.

Observation of the variations shows a dependence on the protein model, while the calculated correlations span from a negative value up to 80%. The four functions embedded in XScore displayed the same tendency, with a dramatic decrease with the T10 model and much better results with the Relax model. The GBSA function showed a very poor performance when applied on this training set, while the Dock function was better, following the same inclination. The DrugScore functions (PAIR or PAIRSURF) achieved the best results with the starting model (80%), and decreased gradually while relaxing the protein backbone. The SURF function, which gave identical results whatever the version of DrugScore, presented a contrary aptitude, following the XScore trend. The advantage of DrugScore appeared when comparing the correlation coefficients obtained for the five experiments (up to 87%), meaning a more stable aspect and a better convergence in the selected poses, especially when considering the starting model (SD).

A second training set was then screened with the docking parameters devised for the indirubines, then submitted to the re-scoring processes described above. In this case, the chemical diversity was much higher with three different scaffolds, along with the increased number of ligands; this led to lower correlation coefficients as shown in .

Figure 4. Correlation coefficients calculated after training set 2 docking and re-scoring. HM, HP, HS, and AV are the four components of XScore. P_, S_, and PS_ stand for the PAIR, SURF, and PAIRSURF functions of DrugScore_CSD or DrugScore_PDB.

Figure 4.  Correlation coefficients calculated after training set 2 docking and re-scoring. HM, HP, HS, and AV are the four components of XScore. P_, S_, and PS_ stand for the PAIR, SURF, and PAIRSURF functions of DrugScore_CSD or DrugScore_PDB.

Again, a dependence on the protein model was observed; however, variations were weaker compared to the results obtained with the first training set. The Dock scoring function displayed nearly no dependence, while the GBSA function decreased rapidly to negative values and the XScore components were close to zero (note that the HS function showed a better trend than its congeners). The DrugScore function performed rather better than the average again, still with the exception of the SURF function, which was rather disappointing again. As observed with the first training set, the best correlation was obtained by the combination of PAIR_CSD and the SD model, even though both PAIRSURF functions also displayed good results. The correlation coefficients were lower with this collection of compounds, but DrugScore functions again outperformed the others (see ).

Table 1. Representative correlation coefficients.

Many reasons have been discussed elsewhere to explain the individual performance of the scoring functions, but so far no general rules have been advanced about how to select an adequate one: the trial and error process still seems to be the only answer.

As expected, the correlations from the second training set stand lower than those obtained with indirubines. Nonetheless, the combination of the SD protein model with DrugScore has given reasonably good results in terms of ligand ranking compared to the known biological activity. Interestingly, the SD model, which is closest to the crystal structure obtained from the PDB, proved to be efficient enough. Since false positives or false negatives are unavoidableCitation26 in VS, there is no other way to identify new ligands but to experimentally test these. With such a procedure as described herein, quick and easy to set up, we expect to be able to shorten considerably the list of compounds to screen biologically or to synthesize. Considering the huge number of chemically available compounds, there is still a need to prioritize the top list of compounds to consider for further experimentation. From this point of view and because computational methods have improved while calculation means have increased, it is now possible to envision VS of large chemical databases as an essential step in a drug design process.

Further work is in progress with other kinases, and an automatized process is under testing in order to perform reverse screening (looking for the target of a specific inhibitor).

Acknowledgments

We address our most sincere thanks to Professor J.-P. Mazat for very helpful discussions in the field of statistics. IECB is gratefully acknowledged for the use of its PC cluster. All scoring functions used in this report (XScore, DrugScore, Dock, and GBSA) were obtained from their respective authors to whom we address our best regards.

Declaration of interest

The authors wish to express their most sincere thanks to the Association pour la Recherche contre le Cancer (ARC) and Ligue Nationale Contre le Cancer (Comité de la Dordogne, Comité de la Gironde) for their financial support.

References

  • Slamon DJ, Leyland-Jones B, Shak S, Fuchs H, Paton V, Bajamonde A, et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N Engl J Med 2001;344:783–92.
  • Druker BJ, Sawyers CL, Kantarjian H, Resta DJ, Reese SF, Ford JM, et al. Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome. N Engl J Med 2001;344:1038–42.
  • Davies SP, Reddy H, Caivano M, Cohen P. Specificity and mechanism of action of some commonly used protein kinase inhibitors. Biochem J 2000;351:95–105.
  • Fabian MA, Biggs WH 3rd, Treiber DK, Atteridge CE, Azimioara MD, Benedetti MG, et al. A small molecule-kinase interaction map for clinical kinase inhibitors. Nat Biotechnol 2005;23:329–36.
  • Fitzgerald CE, Patel SB, Becker JW, Cameron PM, Zaller D, Pikounis VB, et al. Structural basis for p38α MAP kinase quinazolinone and pyridol-pyrimidine inhibitor specificity. Nat Struct Biol 2003;10:764–9.
  • Eldar-Finkelman H, Schreyer SA, Shinohara MM, LeBoeuf RC, Krebs EG. Increased glycogen synthase kinase-3 activity in diabetes- and obesity-prone C57BL/6J mice. Diabetes 1999;48:1662–6.
  • Takashima A, Yamaguchi H, Noguchi K, Michel G, Ishiguro K, Sato K, et al. Amyloid β peptide induces cytoplasmic accumulation of amyloid protein precursor via tau protein kinase I/glycogen synthase kinase-3 βin rat hippocampal neurons. Neurosci Lett 1995;198:83–6.
  • Martinez A, Castro A, Dorronsoro I, Alonso M. Glycogen synthase kinase 3 (GSK-3) inhibitors as new promising drugs for diabetes, neurodegeneration, cancer, and inflammation. Med Res Rev 2002;22:373–84.
  • Ougolkov AV, Fernandez-Zapico ME, Savoy DN, Urrutia RA, Billadeau DD. Glycogen synthase kinase-3β participates in nuclear factor κB-mediated gene transcription and cell survival in pancreatic cancer cells. Cancer Res 2005;65:2076–2081.
  • Woodgett JR. Molecular cloning and expression of glycogen synthase kinase-3/factor A. EMBO J 1990;9:2431–8.
  • Moustakas DT, Lang PT, Pegg S, Pettersen E, Kuntz ID, Brooijmans N, et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des 2006;20:601–19.
  • Bertrand JA, Thieffine S, Vulpetti A, Cristiani C, Valsasina B, Knapp S, et al. Structural characterization of the GSK-3β active site using selective and non-selective ATP-mimetic inhibitors. J Mol Biol 2003;333:393–407.
  • Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res 2000;28:235–42.
  • http://www.accelrys.com/products/
  • Dauber-Osguthorpe P, Roberts VA, Osguthorpe DJ, Wolff J, Genest M, Hagler AT. Structure and energetics of ligand binding to proteins: Escherichia coli dihydrofolate reductase-trimethoprim, a drug-receptor system. Proteins Struct Func Gen 1988;4:31–47.
  • Meijer L, Skaltsounis AL, Magiatis P, Polychronopoulos P, Knockaert M, Leost M, et al. GSK-3-selective inhibitors derived from Tyrian purple indirubins. Chem Biol 2003;10:1255–66.
  • Taha MO, Bustanji Y, Al-Ghussein MA, Mohammad M, Zalloum H, Al-Masri IM, et al. Pharmacophore modeling, quantitative structure-activity relationship analysis, and in silico screening reveal potent glycogen synthase kinase-3β inhibitory activities for cimetidine, hydroxychloroquine, and gemifloxacin. J Med Chem 2008;51:2062–77.
  • Schwartz D, ed. Méthodes Statistiques à l’Usage des Médecins et des Biologistes. Paris: Flammarion Médecine-Sciences, 1993.
  • Liu HY, Kuntz ID, Zou X. Pairwise GB/SA scoring function for structure-based drug design. J Phys Chem B 2004;108: 5453–62.
  • Wang R, Lai L, Wang S. Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J Comput Aided Mol Des 2002;16:11–26.
  • Gohlke H, Hendlich M, Klebe G. Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol 2000;295:337–56.
  • Velec HF, Gohlke H, Klebe G. DrugScore(CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J Med Chem 2005;48:6296–303.
  • Huse M, Kuriyan J. The conformational plasticity of protein kinases. Cell 2002;109:275–82.
  • Cavasotto CN, Abagyan RA. Protein flexibility in ligand docking and virtual screening to protein kinases. J Mol Biol 2004;337:209–25.
  • Withers IM, Mazanetz MP, Wang H, Fischer PM, Laughton CA. Active site pressurization: a new tool for structure-guided drug design and other studies of protein flexibility. J Chem Inf Model 2008;48:1448–54.
  • Alvarez JC. High-throughput docking as a source of novel drug leads. Curr Opin Chem Biol 2004;8:365–70.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.