Editorial

Peptide retention prediction in reversed-phase chromatography: proteomic applications

Pages 1-4 | Published online: 09 Jan 2014

The vast majority of proteomics analyses are performed in a bottom-up fashion, where targeted proteins are subjected to proteolytic digestion followed by reversed-phase (RP) liquid chromatography-mass spectrometry (LC-MS) analysis of the resulting peptide mixture. LC-MS as a hybrid analytical technique has undergone significant developments in the past two decades, mostly driven by the MS component since the introduction of the soft electrospray ionization (ESI) and MALDI methods in the early 1990s. While improvements in the performance of modern MS equipment are slowing down, greater attention is being paid to the use of the ‘auxiliary’ information provided by the RPLC component of LC-MS. In proteomic peptide analyses, LC and MS can be viewed as complementary separation techniques, with MS possessing much higher separation power – thus, the masses of analytes (and their fragments) can be calculated and measured with high precision. In an attempt to provide the same ‘predictive power’ to the RPLC component, proteomic researchers are pursuing an improved understanding of the peptide separation mechanism. Better peptide retention prediction models allow peptide retention to be used as a second constraint in protein identification, both in shotgun proteomics [1] and for streamlining scheduled multiple reaction monitoring (MRM; selected reaction monitoring [SRM]) method development [2]. The availability of extended peptide retention datasets and the need to develop more accurate prediction models have driven numerous studies in this field and led to a retention prediction ‘renaissance’ in the proteomics era [3–12]. We envision peptide retention prediction as being an important component of proteomic software applications in both instrument control and data analysis.

Peptide retention prediction in RPLC has been a popular topic since the early 1980s. The fundamental assumption is that peptide hydrophobicity can be computed as the sum of the hydrophobicities (retention coefficients) of its constituent amino acids [13]. These ‘purely additive’ models were built by optimizing these 20 parameters against observed peptide retention time values. In this early work, the limited size of retention datasets (typically 25–50 peptides) led to significant over-fitting and to discrepancies between the retention coefficients assigned in different studies. Only recently has a consensus of retention values for the naturally occurring amino acids in the C18 phase been determined by independent RPLC measurements [14,15]. Multiple early studies indicated that secondary structure and nearest-neighbor effects could significantly alter a peptide’s chromatographic behavior. However, models attempting to address the sequence-dependent character of peptide retention have only emerged recently with the availability of large peptide datasets in the proteomics era.
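As a minimal sketch of the additive assumption, the snippet below sums per-residue retention coefficients. The coefficient values are placeholders invented for illustration (they are not a published scale), and `additive_hydrophobicity` is a hypothetical helper name.

```python
# Placeholder retention coefficients: invented for illustration only,
# NOT taken from any published hydrophobicity scale.
RETENTION_COEFFICIENTS = {
    "W": 12.0, "F": 11.0, "L": 9.5, "I": 8.5, "M": 6.0,
    "V": 5.0, "Y": 4.5, "P": 2.0, "C": 1.0, "A": 1.0,
    "E": 0.5, "T": 0.5, "D": 0.0, "Q": -0.5, "S": -0.5,
    "G": -0.5, "N": -1.0, "R": -1.5, "H": -2.0, "K": -2.5,
}


def additive_hydrophobicity(peptide, coefficients=RETENTION_COEFFICIENTS):
    """Sum the retention coefficients of all residues in the sequence."""
    return sum(coefficients[residue] for residue in peptide)


# Two peptides with the same composition but a different residue order
# receive the same value, because the model ignores sequence.
forward = additive_hydrophobicity("LGAK")
reverse = additive_hydrophobicity("KAGL")
print(forward, reverse)
```

Because the sum depends only on composition, such a model cannot distinguish sequence isomers, which is the limitation the sequence-dependent models discussed next try to address.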

Sequence Specific Retention Calculator (SSRCalc) was the first algorithm to address these issues. The authors [3] proposed a mechanism and quantitative description of the ion-pairing effect on the apparent hydrophobicity of N-terminal residues at sequence positions 1–3. Most tryptic peptides carry a free N-terminal amino group that is positively charged under acidic eluent conditions. A cloud of anions from the ion-pairing modifier interacts with the positively charged N-terminus, reducing the apparent hydrophobicities of N-terminal residues. A similar mechanism affects the retention contribution from residues located next to internal Lys, Arg and His. A further study by Petritis et al. [4] attempted to differentiate up to 25 positions inside a peptide chain by assigning retention coefficients specific to each position from the N- and C-termini. This required the collection of a very large training set of approximately 345,000 peptides. Another important sequence-specific feature is the formation of amphipathic helices: α-helical structures that form two distinctive faces, one hydrophobic and one hydrophilic. Upon interaction with a hydrophobic surface, the peptide orients so that its hydrophobic face contacts the surface, resulting in extremely high retention. While the most advanced models [4–6] addressed helicity in some fashion, reliable computation of peptide helicity and of its effect on chromatographic retention remains crucial for progress in the field.
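The N-terminal effect described above can be sketched as a position-dependent weighting of the additive sum. This is only an illustration of the idea, not the SSRCalc algorithm: the coefficients, the three weights and the function name are all hypothetical.

```python
# Hypothetical values chosen only to illustrate the idea.
COEFFICIENTS = {"L": 10.0, "A": 2.0, "G": 0.0}
# Assumed down-weighting of residues at sequence positions 1, 2 and 3.
N_TERMINAL_WEIGHTS = (0.4, 0.6, 0.8)


def position_corrected_hydrophobicity(
    peptide, coefficients=COEFFICIENTS, weights=N_TERMINAL_WEIGHTS
):
    """Additive sum in which the first few residues contribute less."""
    total = 0.0
    for index, residue in enumerate(peptide):
        # Residues beyond the weighted positions keep their full coefficient.
        weight = weights[index] if index < len(weights) else 1.0
        total += weight * coefficients[residue]
    return total


# The same composition now gives different values depending on where
# the hydrophobic residue sits relative to the N-terminus.
print(position_corrected_hydrophobicity("LGGGG"))
print(position_corrected_hydrophobicity("GGGGL"))
```

A full sequence-specific model would also treat residues next to internal Lys, Arg and His, as noted in the text; that is omitted here for brevity.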

Overall, we have witnessed significant improvement in the development of RPLC predictive models over the past decade. The typical reported peptide retention prediction accuracy has increased from R² linear correlation values of approximately 0.91–0.92 to 0.96–0.98. However, there are still a number of applied and fundamental problems that restrict the introduction of retention prediction to common proteomic practice, and these will be described in the following sections.
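For reference, the accuracy figure quoted above can be computed as follows. This sketch assumes that the ‘R² linear correlation’ is the squared Pearson correlation between predicted hydrophobicity and observed retention time; the numbers in the example are made up.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum(
        (p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed)
    )
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return covariance * covariance / (var_p * var_o)


# Made-up predicted hydrophobicities and observed retention times (min).
predicted = [10.0, 20.0, 30.0, 40.0]
observed = [12.0, 21.0, 33.0, 41.0]
print(r_squared(predicted, observed))
```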

A great diversity of chromatographic systems (columns & eluents) is employed

It is common knowledge within the chromatographic community that alteration of RPLC chromatographic conditions will lead to variation in separation selectivity, but such understanding is not typical for the field of proteomics. The use of different ion-pairing modifiers (formic, acetic and trifluoroacetic [TFA] acids) alters the separation selectivity. However, it is common to see the application of retention prediction models without consideration of the eluent composition. Furthermore, there are more than 400 RP sorbents commercially available, so particular attention should be paid when using retention prediction models: the bonded chemistry type (C8 or C18), end-capping chemistry, pore size and carbon load of the sorbent should all correspond to the conditions used for the employed algorithm development.

Krokhin and coworkers addressed the issue of chromatographic condition diversity by developing four different versions of the SSRCalc model: 300 Å – TFA, 100 Å – TFA, 100 Å – formic acid and 100 Å – pH 10 [5,16]. These conditions cover the most popular RPLC applications in proteomics: TFA for off-line fractionation, including LC-MALDI MS; formic acid for ESI; and pH 10 for off-line fractionation in 2D RP-RP/MS. For better results, the authors strongly recommend selecting RPLC conditions as close as possible to those of one of the available predictive models.

Petritis et al. [4] optimized their normalized elution time (NET) predictor using an artificial neural network approach. A very large peptide dataset was collected using a mixture of TFA and acetic acid as eluent components. The selectivity of this model is very similar to that of TFA-only prediction algorithms, since TFA is the dominant ion-pairing agent in their mixed system. This limits the applicability of the NET predictor, as most bottom-up applications use the formic acid modifier, which is compatible with ESI.

Several authors have proposed constructing predictive models that are optimized ‘on the spot’ using a portion of the sample’s identified peptides for training [6,9,10]. This provides greater flexibility and theoretically allows for any RPLC conditions. These approaches utilize fast optimization techniques such as support vector regression (SVR), minimizing the time required for optimization. However, the required size of the training dataset might be an obstacle to their widespread adoption. It was recently demonstrated that a simple additive model reaches its maximum accuracy (R² ≈ 0.945) using a 200–300 peptide training dataset [15], but an advanced sequence-specific algorithm (R² ≈ 0.97–0.98) requires significantly more peptides. There is no guarantee that every chromatographic run under investigation will provide a sufficient number of confident sequence identifications to build a predictive model.
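The ‘on the spot’ idea can be sketched as follows: fit a model on part of the identified peptides from a run, then check it on the rest. Note that this sketch swaps in ridge-regularized least squares on amino acid composition in place of SVR, uses a reduced five-letter alphabet, and runs on synthetic, noise-free data generated from hypothetical coefficients; none of these choices come from the cited studies.

```python
import random

ALPHABET = "AGKLW"  # reduced alphabet to keep the demo small
# Hypothetical "true" coefficients used only to generate synthetic data.
TRUE_COEFFICIENTS = {"A": 1.0, "G": 0.0, "K": -2.0, "L": 9.0, "W": 12.0}


def composition(peptide):
    """Residue counts in ALPHABET order."""
    return [peptide.count(residue) for residue in ALPHABET]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_additive_model(peptides, retention, ridge=1e-6):
    """Least-squares retention coefficients from the normal equations."""
    rows = [composition(p) for p in peptides]
    n = len(ALPHABET)
    xtx = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    xty = [sum(r[i] * t for r, t in zip(rows, retention)) for i in range(n)]
    return dict(zip(ALPHABET, solve(xtx, xty)))


def predict(peptide, coefficients):
    return sum(coefficients[residue] for residue in peptide)


# Synthetic "identified peptides" from one run (seeded for repeatability).
rng = random.Random(0)
peptides = [
    "".join(rng.choice(ALPHABET) for _ in range(rng.randint(6, 12)))
    for _ in range(60)
]
retention = [predict(p, TRUE_COEFFICIENTS) for p in peptides]

# Train on the first 40 peptides, evaluate on the remaining 20.
fitted = fit_additive_model(peptides[:40], retention[:40])
max_error = max(
    abs(predict(p, fitted) - predict(p, TRUE_COEFFICIENTS))
    for p in peptides[40:]
)
print(fitted, max_error)
```

With real, noisy data and all 20 amino acids, the number of confidently identified peptides available for training becomes the limiting factor, which is the concern raised in the paragraph above.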

Reproducibility & quality of chromatographic separations

The reproducibility and quality of chromatographic separations should be monitored with the same precision and attention to detail as the calibration of a mass spectrometer. Several research groups have proposed spiking the sample with a mixture of standard synthetic peptides to monitor the performance of chromatographic systems [17,18], namely the peak shapes and the correct distribution of peaks across the chromatogram. These peptides are chosen or designed to span a wide range of hydrophobicities, allowing monitoring of the acetonitrile gradient and calibration of the whole chromatographic space for a more accurate application of retention prediction models.
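One simple way to use spiked standards for monitoring is to compare their retention times in each run with those from a reference run. The sketch below assumes hypothetical standard names, retention times and a 0.5 min tolerance; it is not a procedure taken from the cited work.

```python
def retention_drift(reference_rt, observed_rt):
    """Per-standard retention time shift (observed minus reference), in min."""
    return {name: observed_rt[name] - reference_rt[name] for name in reference_rt}


def run_is_acceptable(reference_rt, observed_rt, tolerance_min=0.5):
    """True if every standard elutes within the tolerance of its reference."""
    drift = retention_drift(reference_rt, observed_rt)
    return all(abs(shift) <= tolerance_min for shift in drift.values())


# Hypothetical retention times (min) of three spiked standard peptides.
reference = {"std_1": 10.0, "std_2": 20.0, "std_3": 30.0}
good_run = {"std_1": 10.25, "std_2": 20.25, "std_3": 29.75}
drifted_run = {"std_1": 10.25, "std_2": 20.25, "std_3": 32.0}

print(run_is_acceptable(reference, good_run))
print(run_is_acceptable(reference, drifted_run))
```

A shift confined to late-eluting standards, as in the second run, would point at the gradient rather than at a uniform delay.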

Unification of peptide hydrophobicity scales

Unification of peptide hydrophobicity scales will simplify inter-laboratory data transfer and comparison. Most prediction models report a unitless hydrophobicity or NET value. Krokhin and Spicer [17] suggested using hydrophobicity index units, which correspond to the acetonitrile percentage needed to elute a peptide from the RP column. These values were carefully measured for the six members of the peptide retention standard, then mapped against the output values for the four versions of SSRCalc. In our opinion, expressing a peptide’s hydrophobicity in hydrophobicity index (acetonitrile %) units represents a viable method to unify the various scales.
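The mapping between a model’s unitless output and hydrophobicity index units can be sketched as a straight-line fit through the standard peptides. The linear form and every number below are assumptions made for illustration; the six value pairs are synthetic and are not the published standard values.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data for six standards: unitless model output and the
# acetonitrile percentage at which each standard elutes (both made up).
model_output = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
hydrophobicity_index = [4.5, 7.0, 12.0, 17.0, 22.0, 27.0]

slope, intercept = fit_line(model_output, hydrophobicity_index)


def to_hydrophobicity_index(value):
    """Convert a unitless model output to acetonitrile % units."""
    return slope * value + intercept


print(to_hydrophobicity_index(24.0))
```

One such mapping would be needed per model version and chromatographic system, since the unitless scales differ between them.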

Accurate blind comparison of the algorithms

It has become common practice for authors to claim superior performance of their models [4,6,10]. Often, these conclusions are based on limited datasets or on outdated versions of competitors’ models. The field of peptide retention prediction will benefit from the creation of standard consensus peptide retention datasets for accurate blind comparison of the algorithms. This will eliminate many of the controversial conclusions and accelerate advancements in the field.

Fundamental chromatographic problems

There are still a number of fundamental chromatographic problems that need to be addressed. A simplified view of peptide RPLC represents separation as a ‘catch and release’ process: peptides elute from the column when a particular concentration of organic solvent is reached. In reality, peptides are constantly moving through the column with different accelerations across the acetonitrile gradient. This may result in varying selectivity (retention order) when the gradient slope, flow rate or column size is altered. Recently, Spicer et al. [19] measured peptide slope values, S, in the fundamental equation of linear solvent strength theory, which describes these variations, and proposed an algorithm to compute them. Thus, new analytical techniques for peptide analysis have allowed this fundamental problem of peptide chromatography, first formulated in the mid-1980s, to be solved. However, further studies are needed to incorporate these findings into current retention prediction models and SRM experiment design software.
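The effect of S on retention order can be illustrated with the standard linear solvent strength relations, log k = log k_w − S·φ for isocratic retention and t_R = (t_0/b)·log(2.3·k_0·b + 1) + t_0 + t_D with b = S·Δφ·t_0/t_G for a linear gradient. The sketch below is not the algorithm of Spicer et al.; the two peptides, their log k_w and S values, and the column parameters are hypothetical.

```python
import math


def gradient_retention_time(
    log_kw, s, gradient_time,
    t0=1.0, delta_phi=0.5, phi_start=0.0, dwell_time=0.0,
):
    """Retention time (min) in a linear gradient under LSS theory.

    log_kw: log10 retention factor at zero organic solvent
    s: slope of log10 k versus organic fraction phi
    gradient_time: duration of the gradient, in min
    t0: column dead time; delta_phi: change in organic fraction
    """
    k0 = 10 ** (log_kw - s * phi_start)   # retention factor at gradient start
    b = s * delta_phi * t0 / gradient_time  # gradient steepness parameter
    return (t0 / b) * math.log10(math.log(10) * k0 * b + 1.0) + t0 + dwell_time


# Two hypothetical peptides with different S values.
peptide_low_s = {"log_kw": 3.0, "s": 15.0}
peptide_high_s = {"log_kw": 5.0, "s": 30.0}

for gradient_time in (10.0, 100.0):
    t_low = gradient_retention_time(gradient_time=gradient_time, **peptide_low_s)
    t_high = gradient_retention_time(gradient_time=gradient_time, **peptide_high_s)
    first = "low-S" if t_low < t_high else "high-S"
    print(f"{gradient_time:.0f} min gradient: {first} peptide elutes first")
```

With these assumed parameters the elution order of the two peptides swaps between the 10 min and the 100 min gradient, which is the kind of selectivity change a ‘catch and release’ picture cannot reproduce.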

Of even greater complexity is the propensity of peptides to form amphipathic helical structures. Despite years of fundamental structural biology studies, there is no quantitative model to describe the interactions of peptide α-helices with hydrophobic surfaces. The main reason was the inability to accurately measure these interactions for a large number of analytes. Proteomic measurements combined with advanced retention prediction modeling will allow such measurements, separating the ‘helical’ component of peptide retention from other factors. The ability to estimate peptide helicity will ultimately not only benefit peptide retention prediction, but also improve the understanding of peptide interactions in biological hydrophobic environments.
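Amphipathicity itself can be quantified from sequence alone. The sketch below computes a helical hydrophobic moment, a commonly used measure that is brought in here purely for illustration: it is not a retention model and is not taken from this editorial. It assumes an ideal α-helix with 100° between successive residues and a made-up two-letter hydrophobicity scale.

```python
import math

# Made-up two-letter scale: Leu hydrophobic, Lys hydrophilic.
SCALE = {"L": 1.0, "K": -1.0}


def hydrophobic_moment(peptide, scale=SCALE, angle_deg=100.0):
    """Mean helical hydrophobic moment for an ideal alpha-helix.

    Each residue's hydrophobicity is treated as a vector pointing away
    from the helix axis; successive residues are rotated by angle_deg.
    """
    angle = math.radians(angle_deg)
    sin_sum = sum(scale[r] * math.sin(i * angle) for i, r in enumerate(peptide))
    cos_sum = sum(scale[r] * math.cos(i * angle) for i, r in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum) / len(peptide)


# Same composition (9 Leu, 9 Lys), different arrangement.
amphipathic = "LKKLLKKLLKLLKKLLKK"  # Leu residues cluster on one helix face
blocky = "LLLLLLLLLKKKKKKKKK"       # Leu residues wrap around the helix

print(hydrophobic_moment(amphipathic), hydrophobic_moment(blocky))
```

The two sequences have identical composition, so an additive model scores them equally, while the moment separates the one whose hydrophobic residues line up on a single face.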

Financial & competing interests disclosure

This work was supported by the Natural Sciences and Engineering Research Council of Canada. The author has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

References

  1. Strittmatter EF, Kangas LJ, Petritis K et al. Application of peptide LC retention time information in a discriminant function for peptide identification by tandem mass spectrometry. J. Proteome Res. 3, 760–769 (2004).
  2. Lange V, Picotti P, Domon B et al. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 4, 222 (2008).
  3. Krokhin OV, Craig R, Spicer V et al. An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC: its application to protein peptide mapping by off-line HPLC-MALDI MS. Mol. Cell Proteomics 3, 908–919 (2004).
  4. Petritis K, Kangas LJ, Yan B et al. Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information. Anal. Chem. 78, 5026–5039 (2006).
  5. Krokhin OV. Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-Å pore size C18 sorbents. Anal. Chem. 78, 7785–7795 (2006).
  6. Moruz L, Tomazela D, Kall L. Training, selection, and robust calibration of retention time models for targeted proteomics. J. Proteome Res. 9, 5209–5216 (2010).
  7. Shinoda K, Sugimoto M, Yachie N et al. Prediction of liquid chromatographic retention times of peptides generated by protease digestion of the Escherichia coli proteome using artificial neural networks. J. Proteome Res. 5, 3312–3317 (2006).
  8. Gorshkov AV, Tarasova IA, Evreinov VV et al. Liquid chromatography at critical conditions: comprehensive approach to sequence-dependent retention time prediction. Anal. Chem. 78, 7770–7777 (2006).
  9. Klammer AA, Yi X, MacCoss MJ et al. Improving tandem mass spectrum identification using peptide retention time prediction across diverse chromatography conditions. Anal. Chem. 79, 6111–6118 (2007).
  10. Pfeifer N, Leinenbach A, Huber CG et al. Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics. BMC Bioinformatics 8, 468 (2007).
  11. Gilar M, Xie H, Jaworski A. Utility of retention prediction model for investigation of peptide separation selectivity in reversed-phase liquid chromatography: impact of concentration of trifluoroacetic acid, column temperature, gradient slope and type of stationary phase. Anal. Chem. 82, 265–275 (2010).
  12. Baczek T, Kaliszan R. Predictions of peptides’ retention times in reversed-phase liquid chromatography as a new supportive tool to improve protein identification in proteomics. Proteomics 9, 835–847 (2009).
  13. Meek JL. Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition. Proc. Natl Acad. Sci. USA 77, 1632–1636 (1980).
  14. Mant CT, Kovacs JM, Kim HM et al. Intrinsic amino acid side-chain hydrophilicity/hydrophobicity coefficients determined by reversed-phase high-performance liquid chromatography of model peptides: comparison with other hydrophilicity/hydrophobicity scales. Biopolymers 92, 573–595 (2009).
  15. Shamshurin D, Spicer V, Krokhin OV. Defining intrinsic hydrophobicity of amino acids’ side chains in random coil conformation. Reversed-phase liquid chromatography of designed synthetic peptides vs. random peptide data sets. J. Chromatogr. A 1218, 6348–6355 (2011).
  16. Dwivedi RC, Spicer V, Harder M et al. Practical implementation of 2D HPLC scheme with accurate peptide retention prediction in both dimensions for high-throughput bottom-up proteomics. Anal. Chem. 80, 7036–7042 (2008).
  17. Krokhin OV, Spicer V. Peptide retention standards and hydrophobicity indexes in reversed-phase high-performance liquid chromatography of peptides. Anal. Chem. 81, 9522–9530 (2009).
  18. Tarasova IA, Guryca V, Pridatchenko ML et al. Standardization of retention time data for AMT tag proteomics database generation. J. Chromatogr. B. Analyt. Technol. Biomed. Life Sci. 877, 433–440 (2009).
  19. Spicer V, Grigoryan M, Gotfrid A et al. Predicting retention time shifts associated with variation of the gradient slope in peptide RP-HPLC. Anal. Chem. 82, 9678–9685 (2010).
