Estimating affinities of calcium ions to proteins


Abstract

Ca²⁺-ions have a range of affinities to different proteins, depending on the various functions of these proteins. This makes the determination of Ca²⁺-protein affinities an interesting subject for functional studies. We have investigated the performance of two methods – Fold-X and AutoDock vina – in the prediction of Ca²⁺-protein affinities. Both methods, although based on different energy functions, showed virtually the same correlation with experimental affinities. Guided by insight from experiment, we further derived a simple linear model based on the solvent accessible surface of Ca²⁺ that had practically the same performance in terms of absolute errors as the more complex docking methods.

Introduction

Calcium ions (in the following termed Ca²⁺-ions or Ca²⁺) are important signalling agents that mediate a large number of intra- and extra-cellular processes,^Citation1 for instance blood clotting, neurotransmission, or muscle contraction. Many of these processes involve proteins that bind Ca²⁺ more or less transiently, and accordingly, with a wide range of Ca²⁺-protein affinities.^Citation2 In order to study the functional mechanisms of these proteins, it is desirable to determine these affinities. For some proteins these affinities have been determined experimentally,^Citation3 but in our experience these data are not available for most of the Ca²⁺-binding proteins. Given that the Protein Data Bank (PDB^Citation4) currently contains about five thousand structures of such proteins, it would be attractive for mechanistic studies to have a method at hand to quickly estimate Ca²⁺-protein affinities based on the structures of the corresponding complexes.

Theoretically, it should be possible to compute the Ca²⁺-protein affinity by free energy techniques based on physical models, eg, by pulling the ion out of its pocket with a series of umbrella potentials in molecular dynamics simulations and integrating over the potential of mean force,^Citation5 or by the Molecular Mechanics – Poisson-Boltzmann/Surface Area (MM-PB/SA) method.^Citation6 Despite their indisputable potential, these techniques have so far not demonstrated quantitative agreement with experiment in the case of Ca²⁺-protein affinity, and, in addition, they are relatively costly in terms of computational resources. An alternative would be an estimation using empirical approaches. An early attempt towards a fast estimation of Ca²⁺-protein affinity was the work by Boguta et al^Citation7 who related secondary structure information to Ca²⁺-protein affinity. They found that for some proteins these relations could be used to classify their affinities, while for other proteins their scheme was less successful.^Citation8 More recently, Schymkowitz et al^Citation9 have published Fold-X, a method and empirical force field developed for, amongst other things, the fast prediction of the binding sites and affinities of metal ions, including Ca²⁺ and its rival Mg²⁺. Since Fold-X is a kind of docking method for special ligands, it would be interesting to compare the predictive performance of this method with that of a non-specialized state-of-the-art docking method in order to assess the advancement achieved by the special parametrization of protein-metal-ion interactions in Fold-X.

The importance of Ca²⁺-protein binding has over the years led to a large body of experimental work from which a qualitative picture has emerged of the factors that govern Ca²⁺-protein affinity.^{Citation2,Citation3} It is particularly notable that Ca²⁺-binding usually seems to be dominated by a gain in entropy, probably due to the release of water molecules from the solvation shell of the ion; in other words: the fewer ligands of the protein-bound Ca²⁺ are water molecules (and the more are functional groups of the protein), the tighter the binding. Such qualitative models may also be helpful for guiding the development of methods for affinity estimation.

In what follows we will address some of the points raised above, namely, we will compare the correlation of experimental affinities with affinities predicted by Fold-X and the state-of-the-art docking method AutoDock vina.^Citation10 The surprising finding of this comparison has prompted further study to possibly identify simpler computational models with the same power and speed in the estimation of Ca²⁺-protein affinities. We will show that such models can be found and that they reflect knowledge attained by experimental work.

Materials and methods

We compared two docking methods for their ability to estimate Ca²⁺-protein affinities, Fold-X^Citation9 and AutoDock vina.^Citation10 Fold-X has been published as a method for the prediction of positions of metal-ions on proteins, and for the prediction of affinities between proteins and metal-ions. The energetic model underlying Fold-X has an ad hoc form with a number of parameters that have been fitted to experimental data.^Citation9 We obtained Fold-X versions 2.5.2 and 3.0b3 as executables for Linux from the respective server, including documentation. Despite considerable efforts we were not able to generate estimates of affinities of metal-ions to proteins using the commands described in the documentation of the software; we suspect that the corresponding option in those versions of the software is dysfunctional. Fortunately, the authors of Fold-X have offered as Table 4 in their supplementary material to Ref^Citation9 a list of 48 Ca²⁺-binding pockets in 19 X-ray structures, mostly with experimentally determined affinities, and affinities predicted with an earlier version of Fold-X. Hence, we took these data (“Fold-X dataset”) as the basis for the comparison, specifically the columns “experimental energy” and “predicted energy” of Table 4 in Ref.^Citation9 For five of the binding pockets, two experimental energies were given; since the differences between the first and second energies were relatively small, only the first value was considered in each case. For ease of comparison with Ref^Citation9 we give all affinities in units of kcal/mol (1 kcal/mol = 4.1868 kJ/mol).

The 19 X-ray structures of the Fold-X dataset were retrieved from the Protein Data Bank (PDB^Citation4) for comparative analysis with AutoDock vina.^Citation10 AutoDock vina version 1.0.3 for Linux was downloaded as an executable from the website of its authors.

Usually, some information is missing from X-ray structures that is needed for energy calculations. Most importantly, this is the case for hydrogen positions, including also hydrogen bond networks. Related to this is the possibility to optimize X-ray structures by flipping carbonyl-oxygens and –NH groups (both groups have similar electron densities). Finally, sometimes X-ray structures contain atomic overlaps that can be removed relatively easily. To see whether by considering these effects, affinity predictions can be improved, we used three different protocols with AutoDock vina.

In the first protocol, PDB files were prepared with the AutoDockTools suite of AutoDock 4^Citation11 by removing water molecules, non-standard residues, and alternate positions of residues. Then polar hydrogens were added to the protein in standard orientation without rotational optimization, and Gasteiger charges^Citation12 were computed for protein atoms by AutoDock. Finally, all Ca²⁺-ions in the protein were redocked with AutoDock vina using default parameters, except for the “search space”, ie, the volume in which the optimal docking position is searched for. Since we were interested in affinities at the crystallographically determined positions of the Ca²⁺-ions, and not in finding optimal positions, we restricted the search space to the minimum allowed by AutoDock vina, namely a cube of 1 Å (0.1 nm) edge length in x-, y-, and z-directions centered on the crystallographic position of each Ca²⁺-ion. This protocol was applied to all Ca²⁺-ions in the Fold-X dataset.
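The restricted search space of the first protocol can be written down as an AutoDock vina configuration file. The following Python sketch builds such a configuration for a single Ca²⁺-ion. The file names and the example coordinates are hypothetical placeholders, not values from this study; only the standard vina options receptor, ligand, out, center_x/y/z and size_x/y/z are assumed.

```python
def vina_box_config(receptor, ligand, center, size=1.0, out="redocked.pdbqt"):
    """Return the text of a vina configuration file whose search box is a
    cube of edge length `size` (in Angstrom) centered on `center`."""
    x, y, z = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        # Box center: the crystallographic position of the Ca2+ ion.
        f"center_x = {x:.3f}",
        f"center_y = {y:.3f}",
        f"center_z = {z:.3f}",
        # Box edge lengths: 1 Angstrom in each direction, as in the text.
        f"size_x = {size:.1f}",
        f"size_y = {size:.1f}",
        f"size_z = {size:.1f}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: file names and coordinates are made up.
config_text = vina_box_config("protein.pdbqt", "ca_ion.pdbqt", (12.345, -3.210, 7.000))
print(config_text)
```

The resulting text could be saved to a file and passed to vina with its --config option, once per Ca²⁺-ion.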

In the second protocol we introduced in the preparation of the protein structures a further step in which the “reduce” method^Citation13 was used to optimize positions of hydrogen atoms around crystallographically determined heavy-atom positions, including also potential flips of amide-groups in side chains of asparagine and glutamine. Otherwise this second protocol had the same elements as the first, including also the restrained docking of Ca²⁺-ions with AutoDock vina.

The third protocol was similar to the first one, but additionally optimized hydrogen positions with pdb2pqr,^Citation14 including debumping to avoid steric clashes and optimization of the hydrogen bonding network.

For computations of solvent accessible surfaces (SAS) we used MSMS version 2.5.7^Citation15 with standard atomic radii for protein atoms and 1 Å for Ca²⁺. All statistical analyses were carried out with R.^Citation16

Results and discussion

Comparison of Fold-X and AutoDock vina

We first analyzed the correlation of Ca²⁺-affinity predictions by Fold-X and AutoDock vina with experimental data in the Fold-X dataset. To this end, linear models were fitted using the least-squares algorithm. The Fold-X predictions had a Pearson correlation coefficient r of 0.67 with a least-squares fitted straight line ΔG_exp = 3.4221 kcal/mol + 0.5854 ΔG_pred (Figure 1). The predictions by AutoDock vina (using the first protocol described in “Materials and methods”) had r = 0.71 with a best-fit line ΔG_exp = −2.393 kcal/mol + 7.309 ΔG_pred (Figure 2). Both correlation coefficients are different from zero with high significance according to t-tests (p ≈ 10⁻⁶). Although the range of predictions by Fold-X has a much better overlap with the range of experimental affinities, the linear correlation as given by r is slightly worse than that obtained with the AutoDock vina predictions. However, closer inspection of the data shows that the ranking of r-values is mainly based on a single value, the outlier in the lower left corner in Figure 2. This outlier is given in the Fold-X dataset with a negative experimental affinity of −1.2 kcal/mol, though without reference to an experimental source of that value. The corresponding structure in the Fold-X dataset is that of a calmodulin of Paramecium tetraurelia^Citation17 (PDB entry 1exr), and the conspicuous affinity value probably refers to an unusual fifth Ca²⁺-ion bound to a pocket that, according to the crystallographers, probably had been created by crystal contacts and thus may be without functional relevance. In the Fold-X dataset there is no prediction given for this pocket, while AutoDock vina produces the mentioned outlier. We can interpret this complex as representative of an extremely weakly bound Ca²⁺, and if we replace the experimental value of −1.2 kcal/mol by 0 kcal/mol (ie, zero affinity) the correlation coefficient r of AutoDock vina and experiment is practically unchanged at 0.71. If we omit this experimental value altogether, the value of r drops to 0.63, which is somewhat lower than r = 0.67 of Fold-X with experiment. If we consider the 95% confidence intervals based on a t-test for r we obtain [0.46, 0.81] for Fold-X vs experiment and [0.41, 0.78] for AutoDock vina vs experiment. Thus, the correlation coefficients between Fold-X predictions and experiment, and AutoDock vina predictions and experiment are virtually equal. This is astonishing since Fold-X had been at least partially calibrated with the same dataset, as mentioned in Ref,^Citation8 while AutoDock vina has probably not been specifically developed to solve the problem for which we have employed it here.
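The quantities used in this comparison (Pearson r, the least-squares line, and a 95% confidence interval for r) can be sketched in a few lines of Python. This is an illustration only: the data points are invented, and the confidence interval uses the Fisher z-transformation, which is an assumption about how such intervals are commonly computed and not a statement about the exact procedure used here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Invented toy data: predicted and experimental affinities in kcal/mol.
dg_pred = [2.0, 4.5, 5.0, 7.5, 9.0, 10.5]
dg_exp = [4.0, 6.5, 5.5, 8.0, 8.5, 10.0]

r = pearson_r(dg_pred, dg_exp)
intercept, slope = least_squares_line(dg_pred, dg_exp)
low, high = fisher_ci(r, len(dg_pred))
print(f"r = {r:.2f}, fit: dG_exp = {intercept:.2f} + {slope:.2f} * dG_pred")
print(f"95% CI for r: [{low:.2f}, {high:.2f}]")
```

Overlapping confidence intervals of two correlation coefficients, as reported above for Fold-X and AutoDock vina, indicate that the data do not separate the two methods.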

Figure 1 Correlation of Fold-X predictions and experiment. The straight line is a least-square fit between experimentally determined Ca²⁺-protein affinities (ΔG_exp) and affinities predicted with Fold-X (ΔG_pred). Pearson correlation coefficient is r = 0.67. Data.^Citation8


Figure 2 Correlation of AutoDock vina predictions and experiment. The dashed straight line is the least-square fit between experimentally determined Ca²⁺-protein affinities (ΔG_exp) and affinities predicted with AutoDock vina (ΔG_pred). Pearson correlation coefficient is r = 0.71. If the outlier in the lower-left is dropped, r decreases to 0.63 (solid line). Experimental affinities.^Citation8

The two other protocols used in conjunction with AutoDock vina (see “Materials and methods”) had little effect and did not improve the correlation with experiment (r = 0.71 and r = 0.69 for the second and third protocol, respectively). This may be due to the fact that these protocols mainly affect hydrogen positions, while Ca²⁺-binding pockets usually are dominated by anionic groups with few protons.

A simple model for estimating Ca²⁺–protein affinities

The fact that Fold-X and AutoDock vina showed the same correlation with experiment could be a consequence of the two underlying energy models capturing the same dominating cause of Ca²⁺-protein affinity. If this hypothesis is true, we should find a strongly decreased correlation of the two models with experiment after removing the contribution of that dominating cause from the data.

A candidate for such a dominating effect mentioned in the introduction is the entropy gain due to water molecules that are released from the first hydration shell of Ca²⁺ on binding of the ion to the protein. In other words, the more water molecules are still attached to the protein-bound Ca²⁺, the lower the entropy gain and thus, the lower the affinity. This argument suggests an avenue to a computational test of the above hypothesis of a dominating factor: if we assume that the number of water molecules attached to the protein-bound Ca²⁺ is proportional to the solvent accessible surface (SAS) of the ion, we should expect a negative linear correlation of experimental free energy of binding ΔG_exp and SAS. The part of ΔG_exp not explained by the correlation with SAS is then contained in the residuals e_i = αSAS_i + β −ΔG_exp,i, with α and β slope and intercept, respectively, of the least-squares fitted linear model, and SAS_i and ΔG_exp,i the SAS and experimental affinity, respectively, of the ith Ca²⁺ in the Fold-X dataset. If the above hypothesis of a dominating factor is true, there should be a much smaller correlation between the predicted ΔG_FoldX,i (and ΔG_vina,i) and e_i as compared to the correlation between ΔG_FoldX,i (and ΔG_vina,i) and ΔG_exp,i. In the following we carry out this partial correlation analysis.
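The residual analysis outlined above can be sketched as follows: fit ΔG_exp linearly on SAS, form the residuals e_i = αSAS_i + β − ΔG_exp,i, and compare the rank correlation of a predictor with ΔG_exp to its rank correlation with e. All numbers below are invented for illustration, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def sas_residuals(sas, dg_exp):
    """Residuals e_i = alpha*SAS_i + beta - dG_exp,i of the least-squares line."""
    n = len(sas)
    mx, my = sum(sas) / n, sum(dg_exp) / n
    alpha = sum((a - mx) * (b - my) for a, b in zip(sas, dg_exp)) / sum((a - mx) ** 2 for a in sas)
    beta = my - alpha * mx
    return [alpha * s + beta - g for s, g in zip(sas, dg_exp)]


# Invented toy data (SAS in square Angstrom, energies in kcal/mol).
sas = [0.0, 0.0, 1.5, 4.0, 7.5, 12.0, 16.0]
dg_exp = [10.5, 9.0, 9.5, 7.0, 6.5, 3.0, 1.0]
dg_pred = [9.0, 9.5, 8.0, 7.5, 5.0, 4.0, 2.5]

e = sas_residuals(sas, dg_exp)
print("rho(SAS, dG_exp)     =", round(spearman_rho(sas, dg_exp), 2))
print("rho(dG_pred, dG_exp) =", round(spearman_rho(dg_pred, dg_exp), 2))
print("rho(dG_pred, e)      =", round(spearman_rho(dg_pred, e), 2))
```

If SAS captures the dominating effect, the absolute value of the last correlation should be clearly smaller than that of the second one.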

As solvent probe radius we first assumed 1.4 Å, a value that is frequently used to model a molecular “water-sphere”. With this sphere the distribution of SAS values was strongly skewed with a peak at the lowest SAS values. In fact, in thirteen of the binding pockets in the Fold-X dataset, Ca²⁺ was not accessible at all (SAS = 0 Å²). An Anderson-Darling test rejected with high significance that the SAS values are normally distributed. Therefore, the correlation with ΔG_exp was not tested with the Pearson correlation coefficient r but with the Spearman rank correlation ρ. We found that ρ = −0.52 was significantly different from zero (significance level 0.05, P = 2 · 10⁻⁴) (see Table 1). This correlation dropped only slightly when the outlier discussed above was omitted.

Table 1 Spearman rank correlation ρ of four models with experimental affinities ΔG_exp and residuals e


The correlation of ΔG_FoldX with the residuals e_i of the least-squares fitted linear model ΔG_exp(SAS) was somewhat lower (ρ = 0.52) than the correlation of ΔG_FoldX with ΔG_exp, but remained highly significant (P = 4 · 10⁻⁴). The same was true for ΔG_vina (ρ = 0.50, P = 4 · 10⁻⁴). In view of our hypothesis this means that there is an effect on the affinity that can be formulated in terms of SAS, but that this effect may not be the dominating one.

Our argument has so far neglected the fact that the crystal structures are results of an averaging process. A protein at ambient temperatures explores many conformations, so that the solvent accessibility of Ca²⁺-ions computed for the crystal structure may not reflect its true accessibility. Since Ca²⁺-ions in proteins are often surrounded by a tightly packed first coordination shell and thus have minimum solvent accessibility, we expect that conformational flexibility could perturb that packing and thus lead on average to a higher solvent accessibility. An approach that takes mobility into account could be to simulate the molecular dynamics and compute the accessibility as a thermodynamic average. As this is computationally expensive, and we were more interested in a fast approximation, we tried to find a faster alternative that works in a similar direction. In a sense, the higher accessibility due to the protein flexibility can be mimicked by using a probe with a smaller radius. We therefore carried out the partial correlation analysis described above with a series of smaller probe radii between 1.4 Å and 0.3 Å. The highest correlations of SAS with ΔG_exp were obtained with 0.4 Å and 0.5 Å. As the numerically safe minimum probe radius in MSMS (see “Materials and methods”) is 0.5 Å, we completed our analysis with this value (see Table 1 and Figure 3).

Figure 3 Correlation of SAS (probe radius 0.5 Å) and experiment. The straight line is the least-squares fit between experimentally determined Ca²⁺-protein affinities (ΔG_exp) and the solvent accessible surfaces (SAS) of the Ca²⁺-ions. Spearman rank correlation coefficient ρ is −0.53 (P = 2 · 10⁻⁴ for null-hypothesis ρ = 0). Experimental affinities.^Citation8


While the correlation ρ of SAS with ΔG_exp changed only marginally from −0.52 to −0.53 and the p-value remained constant, the correlation of the residuals with ΔG_FoldX and ΔG_vina, respectively, dropped more strongly to 0.39 and 0.34 with p-values of 0.01 and 0.02 indicating no longer highly significant correlation. Omission of the discussed outlier does not change the picture; conversely, one could argue that a fully solvent exposed Ca²⁺ with ΔG = 0 (which is approximately the case for the outlier) should be included in the data to represent the limiting case of no binding. Overall we can conclude that SAS with probe radius of 0.5 Å (SAS_0.5) indeed models a dominating effect on affinity.

For a simple model to predict ΔG from SAS we did a least-squares fit of SAS_0.5 and ΔG_exp and found

$\Delta G_{exp} \approx -0.63 \cdot SAS_{0.5} + 11.95$ (1)

with SAS_0.5 in Å² and ΔG_exp in kcal/mol. Following the above argument we replaced in the fit the outlier by the point given by theoretical values for an unbound Ca²⁺, ie, SAS_0.5 = 4π(r_Ca-ion + r_probe)² = 4π1.5² = 18.85 and ΔG = 0, and enforced inclusion of this point (18.85, 0) in the linear model. The mean of the absolute errors of this model on the Fold-X dataset was 1.9 kcal/mol, which is not much larger than the mean absolute error of 1.8 kcal/mol of Fold-X itself against ΔG_exp.
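As an illustration, the model of Eq. (1) and a fit that is forced through a fixed point can be written as below. The coefficients −0.63 and 11.95 and the point (18.85, 0) are taken from the text; the constrained least-squares formula is one possible way to enforce inclusion of a point and is an assumption, since the fitting procedure is not specified in more detail. The function names are hypothetical.

```python
def predict_dg(sas_05, slope=-0.63, intercept=11.95):
    """Estimate dG (kcal/mol) from SAS_0.5 (square Angstrom) using Eq. (1)."""
    return slope * sas_05 + intercept


def fit_through_point(x, y, x0, y0=0.0):
    """Least-squares line constrained to pass through (x0, y0).

    Minimizes sum((y_i - y0 - slope*(x_i - x0))**2) over the slope and
    returns (slope, intercept).
    """
    num = sum((a - x0) * (b - y0) for a, b in zip(x, y))
    den = sum((a - x0) ** 2 for a in x)
    slope = num / den
    return slope, y0 - slope * x0


# A fully buried ion (SAS_0.5 = 0) gets the maximum estimate of the model.
print(predict_dg(0.0))

# Invented toy data; the line is forced through the unbound point (18.85, 0).
sas = [0.0, 2.0, 5.0, 9.0, 14.0]
dg_exp = [11.0, 10.5, 9.0, 6.5, 3.5]
slope, intercept = fit_through_point(sas, dg_exp, x0=18.85)
print(f"dG = {slope:.2f} * SAS + {intercept:.2f}")
```

With the coefficients of Eq. (1), the estimate at SAS_0.5 = 18.85 is close to zero, consistent with the enforced point.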

We assessed the robustness of the model of Eq. (1) in a leave-one-out test: each of the ΔG_exp values was left out of the fitting procedure, and then a model was derived from the other values. The ΔG_exp of the Ca²⁺-ion left out was then predicted by applying the new model to the SAS_0.5 value of the left-out Ca²⁺-ion. This was iterated over all ΔG values. The resulting mean of absolute errors was 2.0 kcal/mol.
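The leave-one-out procedure can be sketched in Python as follows. The sketch refits an ordinary least-squares line in each round, which is a simplifying assumption (the constraint through the unbound point is not applied here), and the data are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def leave_one_out_mae(x, y, fit=fit_line):
    """Mean absolute error of predictions for each point left out in turn."""
    x, y = list(x), list(y)
    errors = []
    for i in range(len(x)):
        # Fit on all points except the i-th one.
        slope, intercept = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Predict the left-out value and record the absolute error.
        errors.append(abs(slope * x[i] + intercept - y[i]))
    return sum(errors) / len(errors)


# Invented toy data (SAS_0.5 in square Angstrom, dG_exp in kcal/mol).
sas = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 18.0]
dg_exp = [11.5, 10.0, 10.5, 8.0, 6.5, 2.0, 0.5]
print(f"leave-one-out MAE: {leave_one_out_mae(sas, dg_exp):.2f} kcal/mol")
```

A leave-one-out error close to the error on the full dataset, as found above (2.0 vs 1.9 kcal/mol), suggests that the model does not depend strongly on any single data point.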

Thus, we have achieved our goal of a simple and fast computational procedure that allows an estimation of Ca²⁺-protein affinities based on the structure of the complexes. Judged from the numerical experiments described above, the accuracy of the method should be high enough to classify Ca²⁺-binding pockets into weakly or strongly binding. The accuracy is limited by several factors, of which we mention two: First, as pointed out in Ref^Citation8 the experimental data on which the model has been based may in part not satisfy modern standards. Second, the simple model of Eq. (1) completely neglects that the binding of calcium is often accompanied by global re-arrangements of protein conformation that also affect the free energy of binding.^Citation2

Finally, we can speculate how Fold-X and AutoDock vina with their different energy functions could nevertheless capture the effect expressed in terms of SAS that we mainly attribute to the entropy gain due to release of water molecules bound to the solvated Ca²⁺. Neither the energy function of Fold-X nor that of AutoDock vina contains a term that explicitly takes into account this physical effect. However, both Fold-X and AutoDock vina evaluate the Ca²⁺-protein affinity essentially by estimating interactions of Ca²⁺ with the atoms of the protein lining the binding pocket. According to our experience these pockets are dominated by anionic groups and groups with negative partial charges. The direct interaction of Ca²⁺ with such negative groups is in fact taken into account by Fold-X and AutoDock vina, and this may be the cause of the apparent correlation with SAS: the more negative groups are around a Ca²⁺, the lower the predicted affinity due to direct interaction, but also the lower the SAS of that Ca²⁺, because each of the neighboring groups will supplant water molecules and lead to their release.

Acknowledgments/disclosures

The authors report no conflicts of interest in this work. Funding by BMBF grant number 01EZ0933 is gratefully acknowledged.

References

1. Berridge MJ, Bootman MD, Roderick HL. Calcium signalling: dynamics, homeostasis and remodelling. Nat Rev Mol Cell Biol. 2003;4:517–529. PMID: 12838335
2. Gifford JL, Walsh MP, Vogel HJ. Structures and metal-ion-binding properties of the Ca2+-binding helix-loop-helix ef-hand motifs. Biochem J. 2007;405:199–221. PMID: 17590154
3. Linse S, Forsén S. Determinants that govern high-affinity calcium binding. Adv Second Messenger Phosphoprotein Res. 1995;30:89–151. PMID: 7695999
4. Berman HM, Westbrook J, Feng Z. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. PMID: 10592235
5. Kobayashi C, Takada S. Protein grabs a ligand by extending anchor residues: molecular simulation for Ca2+ binding to calmodulin loop. Biophys J. 2006;90:3043–3051. PMID: 16473902
6. Zhao J, Nelson DJ, Huo S. Potential influence of asp in the Ca2+ coordination position 5 of parvalbumin on the calcium-binding affinity: a computational study. J Inorg Biochem. 2006;100:1879–1887. PMID: 16965819
7. Boguta G, Stepkowski D, Bierzyński A. Theoretical estimation of the calcium-binding constants for proteins from the troponin c superfamily based on a secondary structure prediction method. i. estimation procedure. J Theor Biol. 1988;135:41–61. PMID: 3256716
8. Boguta G, Stepkowski D, Bierzyński A. Theoretical estimation of the calcium-binding constants for proteins from the troponin c superfamily based on a secondary structure prediction method. ii. applications. J Theor Biol. 1988;135:63–73. PMID: 3256717
9. Schymkowitz JW, Rousseau F, Martins IC, Ferkinghoff-Borg J, Stricher F, Serrano L. Prediction of water and metal binding sites and their affinities by using the fold-x force field. Proc Natl Acad Sci U S A. 2005;102:10147–10152. PMID: 16006526
10. Trott O, Olson AJ. Autodock vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2009;31:455–461. PMID: 19499576
11. Morris GM, Huey R, Lindstrom W. Autodock4 and autodocktools4: Automated docking with selective receptor flexibility. J Comput Chem. 2009;30:2785–2791. PMID: 19399780
12. Gasteiger J, Marsili M. Iterative partial equalization of orbital electronegativity – rapid access to atomic charges. Tetrahedron. 1980;36:3219–3228.
13. Word JM, Lovell SC, Richardson JS, Richardson DC. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol. 1999;285:1735–1747. PMID: 9917408
14. Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA. Pdb2pqr: an automated pipeline for the setup of poisson-boltzmann electrostatics calculations. Nucleic Acids Res. 2004;32:W665–667. PMID: 15215472
15. Sanner MF, Olson AJ, Spehner JC. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers. 1996;38:305–320. PMID: 8906967
16. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2006. ISBN 3-900051-07-0. http://www.R-project.org
17. Wilson MA, Brunger AT. The 1.0 a crystal structure of ca(2+)-bound calmodulin: an analysis of disorder and implications for functionally relevant plasticity. J Mol Biol. 2000;301:1237–1256. PMID: 10966818
