Editorial

Report from the 2nd Annual US HUPO Meeting on the HUPO Human Plasma Proteome Project

Pages 165-168 | Published online: 09 Jan 2014

The Plasma Proteome Project (PPP) of the Human Proteome Organization (HUPO) was discussed in a workshop at the US HUPO Annual Meeting in Boston (MA, USA) on March 14, 2006. Highlights presented here include ties to the Disease Biomarkers Initiative, deeper mining of the plasma proteome, a robust ongoing bioinformatics effort, ties to the Antibody Production Initiative and consensus on preferred specimens for plasma and serum studies.

Introduction

From initial discussions in 2002 through 2005, the HUPO PPP completed an ambitious Pilot Phase with the publication of a special issue of Proteomics in August 2005, which contained 28 papers on many aspects of the human plasma proteome, both collaborative analyses and laboratory-specific studies Citation[1]. The title, Exploring the Human Plasma Proteome, captures the nature of this phase of the work, in which 35 laboratories in 13 countries provided datasets using their preferred technologies and interpretations on HUPO PPP reference specimens. Independent analyses from the raw spectra or mass-to-charge peak lists were also performed. The PPP datasets are publicly available for additional collaborative and independent analyses at the University of Michigan (MI, USA) Citation[101], European Bioinformatics Institute Citation[102] and Institute for Systems Biology Citation[103]. First at the Munich Congress of HUPO Citation[2] and then at the US HUPO Boston meeting, investigators and other interested parties were invited to in-depth workshops on the progress made, the compelling issues to be addressed, and the feasible actions for the Next Phase of the PPP (co-chairs Young-Ki Paik, Matthias Mann and Gilbert Omenn) under the HUPO Initiatives chair Sam Hanash.

Goals

The long-term scientific goals of the HUPO initiatives are to advance the capabilities and reproducibility of proteomics technologies and to build a foundation for disease biomarker discovery, validation and applications. The PPP goals are:

Comprehensive analysis of human plasma and serum proteins

Characterization of physiological, pathological and pharmacological sources of variation within individuals over time

Determination of the extent of variation across individuals and populations

Boston Plasma Proteome Project Workshop

A very well-attended workshop on March 14 addressed five aims for the Next Phase of the PPP:

Ties to disease biomarker studies

Specimen fractionation and technologies for deeper mining of the plasma proteome

Objectives of an ongoing robust bioinformatics effort

Basis for nominating proteins for production of antibodies

Gaining consensus on the preferred specimen for blood samples

Disease biomarkers

The meeting began with the disease biomarker topic, as a continuation of the discussion the night before at the Disease Biomarkers Workshop convened by Salvatore Sechi of the National Institutes of Health on March 13. The key point of that preceding session was the connection of organ- and disease-based studies with collection and analysis of plasma specimens from the same participants, so that candidate biomarkers discovered in the tissues or proximal fluids (cerebrospinal fluid, nipple aspirate, urine, bronchoalveolar lavage, bile, pancreatic juice, tears and semen) could be sought in more readily obtained specimens of blood. The PPP database is a useful resource for investigators who identify proteins that are over- or underexpressed in tissues and wonder whether the same proteins have already been detected in plasma or serum.

Sechi pointed out that relevant reference specimens from patients with well-characterized disorders may help advance the work. The National Institute of Diabetes and Digestive and Kidney Diseases is providing funded investigators with citrated-plasma specimens from, for example, patients with type 2 diabetes, individuals with prediabetes and normal (normoglycemic) individuals. There ensued considerable discussion about metabolic disorders for which plasma would be the primary specimen, since there may be no tissue-level counterpart to a biopsied or surgically removed tumor specimen. Also, prominent comorbidities, such as high blood pressure and being overweight, were noted for cardiovascular diseases, diabetes and cancers. David States stressed that investigators interested in variation in a group of interest should plan to analyze at least five individuals for a reliable mean, and 25 individuals for a reliable estimate of the standard deviation; others noted that such analyses may need to be performed in replicate samples, perhaps triplicate. Several participants called for standard operating procedures for specimens and for peptide, protein and plasma reference standards (see later). It was noted that the US FDA has issued an advisory document for submission of gene expression data and would probably welcome suggestions for such an advisory for proteomic analyses to be submitted as part of New Drug Applications.
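One way to see why a standard deviation needs more individuals than a mean is to compare the sampling uncertainty of each estimate. The sketch below is an illustration only: it assumes roughly normally distributed measurements and uses the standard large-sample approximation for the relative uncertainty of a sample standard deviation. The function names are hypothetical, and the report does not state the reasoning behind the figures of five and 25 individuals.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the sample mean for n independent measurements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


def relative_error_of_sd(n):
    """Approximate relative standard error of the sample standard deviation.

    Assumes normally distributed data; uses 1 / sqrt(2 * (n - 1)).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return 1.0 / math.sqrt(2.0 * (n - 1))


if __name__ == "__main__":
    for n in (5, 25):
        sem = standard_error_of_mean(1.0, n)  # in units of the true SD
        rel = relative_error_of_sd(n)
        print(f"n={n}: SE of mean = {sem:.3f} SD, relative SE of SD = {rel:.3f}")
```

Under these assumptions, the estimated standard deviation from five individuals carries a relative uncertainty of roughly 35%, which falls to roughly 14% with 25 individuals.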

Both high- and low-abundance proteins may be of interest. The PPP Pilot Phase sought to define the sensitivity of various technologies and combinations of technologies to detect proteins as far down the scale of dynamic range as possible, presumably enhancing the prospects of detecting proteins secreted by cells, released from cells during turnover or injury, or released from cells undergoing apoptosis or other cell death processes. The PPP datasets contain large numbers of such proteins Citation[1]. Others noted that high-abundance proteins may have post-translational modifications as a result of disease processes, including proteolysis to peptides in vivo or ex vivo Citation[3]. Acute-phase reactants, such as serum amyloid A, transthyretin, haptoglobin and C-reactive protein, have been reported frequently in association with various diseases. It is unlikely that these proteins would be highly specific, unless there were particular modifications. Abundant proteins, starting with albumin and lipoproteins, may be convenient carriers of low-abundance proteins and protein fragments. Finally, immunoglobulins may be highly informative of disease processes that generate immunogenic protein antigens, such as in cancers that are associated with circulating autoantibodies Citation[4,5]. These autoantibodies represent a biological amplification of the tumor antigen signal, perhaps by 1000-fold or more, which should greatly increase the feasibility of detecting the tumor signal.

Deeper mining of the plasma proteome

Ruedi Aebersold provided comments in advance about an emerging strategy to build a glycosite database of peptides and proteins containing N- and O-glycosyl side chains, part of a broader strategy to identify proteotypic peptides, unique sequences for as many specific proteins as possible, labeled with heavy isotopes suitable for spiking patient specimens or normal specimens. The result would be mass pairs for the peptides, permitting specific, sensitive identification and reliable relative quantitation Citation[6,7]. Data are readily available at Citation[103].
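The mass-pair idea can be sketched in a few lines. The example below is a minimal illustration, not the method used by any group named here: it assumes that the heavy-labeled peptide is spiked at a known amount and that the light (endogenous) and heavy forms give the same signal response per unit amount. The function names and the numbers in the usage lines are hypothetical, and the mass shift is left as an input because the report does not specify which isotope labels are used.

```python
def mz_separation(mass_shift_da, charge):
    """Expected m/z spacing between the light and heavy peaks of a pair."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift_da / charge


def endogenous_amount(light_intensity, heavy_intensity, spiked_amount):
    """Estimate the endogenous peptide amount from a light/heavy peak pair.

    Assumes equal response for both forms, so the intensity ratio
    equals the amount ratio.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy peak intensity must be positive")
    return spiked_amount * light_intensity / heavy_intensity


if __name__ == "__main__":
    # Hypothetical values chosen only to show the calculation.
    print(mz_separation(mass_shift_da=8.0, charge=2))
    print(endogenous_amount(light_intensity=500.0,
                            heavy_intensity=1000.0,
                            spiked_amount=10.0))
```

Because both members of the pair are measured in the same spectrum, the ratio is less sensitive to run-to-run variation than an absolute intensity would be.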

David Speicher reminded the group of the many significant findings of the Pilot Phase of the PPP. He noted the very dynamic nature of the plasma proteome and the value of documenting where results agree and differ, both among similar technologies and between different ones. He emphasized the need for the Next Phase to address higher reproducibility, higher throughput and quantitation. He referred to the calibration standards from the Association of Biomolecular Resource Facilities (ABRF), and noted that using 50 proteins at high, equal concentration is just a starting point in the modification of mixtures to have a very broad range of concentrations across the proteins, including many at low abundances. He also called for more complex reference specimens, namely plasma with spiking for specified aims. Of course, investigation of elaborate variables cannot be performed on a voluntary basis or shoestring budget. He also called for multiple aliquots of a specimen, perhaps three for replicate analysis, and the others to be saved for targeted analysis of proteins identified by other methods, including multiple reaction methods.

Bill Hancock described advantages of glycoprotein analyses. The glycoproteome reflects especially secreted proteins, focuses on post-translational modifications that are important to protein functions and eliminates albumin. Selectivity among lectins permits subfractionation for proteins containing fucosyl, sialyl or other particular glycoconjugates. Glycoproteins may be relatively protected against proteases. Glycoproteins frequently have high charge heterogeneity (e.g., 21 charge forms for tissue plasminogen activator and 50 for prostate-specific antigen), which leads to smearing on gels and chromatography. Most glycoproteins also have unglycosylated components corresponding to 10–20% by weight, and thus it is wise to analyze both lectin-bound and flow-through fractions. It is desirable to avoid EDTA or citrate in order to have calcium in the binding buffer; but it is important to remove fibrinogen, to prevent clotting in the column with Ca2+. In most instances, it is also useful to remove immunoglobulins. There are changes in glycoproteins in patients with cancers, diabetes, cardiovascular disease, autoimmune disorders and other conditions.

Phil Andrews noted that every aspect of proteomics technology platforms is changing rapidly. Various instruments have complementary advantages and limitations. Phosphoprotein analyses are now feasible, along with other post-translational modifications. Andrews has been leading ABRF projects with standard peptide and protein mixtures. A standard mixture of 50 proteins was sent to 120 laboratories, of which 60 sent in their analytical findings. A total of 12 (20% of the responding laboratories) had 100% correct hits for up to 48 proteins called, while the percentage of correct hits dropped off substantially over the remaining 80% of laboratories. There was no correlation of instrument or other obvious technology variable with success rate. For example, LCQ gave higher rates than LTQ (ThermoFinnigan) or ABI 4800 (Applied Biosystems) among results submitted. The conclusion is that the operator is the key variable, together with the intensity of analysis. Mass spectrometry is still an art.

Martin Steffen emphasized the cellular component of blood, especially lymphocytes, which may reflect changes in primary tissues or be the site of important pathologic processes. Other white blood cells, platelets, red blood cells and circulating cancer or other cells may contribute to the plasma proteome findings.

Proteomics informatics

David States indicated that specific challenges include specificity of protein inferences from peptides deduced from mass spectrometry using various software tools and databases, quantitation of protein or peptide concentrations, detection of inherited, splicing or environmental sources of protein variation, tissue origins of circulating proteins, and post-translational modifications. Steve Stein emphasized the importance of signal-to-noise discrimination and the highly variable results with five different search engines, and even different versions or different settings of the same search engine. Jimmy Eng explained that PeptideAtlas brings together datasets from various sources into a single analytical pipeline. He recommended the paper by Kapp and coworkers on analysis of one dataset with five different search engines in the PPP special issue, including effects of choice of detailed parameters Citation[8]. Nathan Edwards and others also emphasized the unpredictable changes in widely used gene and protein databases as updates are periodically incorporated.

A validated resource of antibodies

Mathias Uhlen described the Protein Atlas and HUPO Antibody Production Initiative Citation[9]. He noted that, although there are 50,000 commercial antibody preparations available, it is a highly redundant and poorly validated resource; for example, there are 1400 antibodies against p53. At Citation[104], one can observe the immunohistochemical patterns of 718 antibodies (against 660 different proteins) in 48 normal human tissues from three individuals and in tumors of 20 different sites from (usually) 12 patients each. To date, he has not produced antibodies against plasma proteins, and he noted that probing the plasma with antibodies is difficult, due to the dynamic range and complexity. The present and planned (up to 10,000) antibodies can be used for immunohistochemistry, western blots, immunoprecipitation and quality assurance against other methods. There is a trade-off in using polyclonal antibodies, which recognize multiple epitopes and may react in the presence of post-translational modifications, are much easier to work with, and are much simpler and cheaper to produce than monoclonals. Uhlen’s group is adding approximately ten new antibodies per day. However, polyclonal antibodies are not a renewable resource and require extensive quality assurance each time. Monoclonal antibodies can be produced on a sustained basis, but have opposite deficiencies.

Xiaohang Fang described the capability of the chicken immunoglobulin Y polyclonal antibody production system both for customized production and for depletion of 12 or more abundant proteins. The group briefly discussed the lists of plasma proteins in the PPP Core Dataset of 3020 proteins based on two or more peptide IDs Citation[1], and the 889 proteins in the truncated dataset after application of adjustments for protein length and for multiple comparisons testing Citation[10]. Matching proteins identified by mass spectrometry to proteins detected by antibodies runs into numerous mismatch problems due to multiple synonyms and protein families, and due to uncertainties about the epitope and protein specificity of the antibodies. Fang found 60 of the 889 matched by name to 2000 antibodies in the GenWay Biotech, Inc. inventory.
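The two operations mentioned above, keeping proteins supported by two or more peptide IDs and matching protein lists by name, can be sketched as follows. This is a simplified illustration with hypothetical function names and made-up inputs; it assumes that "two or more peptide IDs" means two or more distinct peptide sequences, and it does not reproduce the PPP's actual inclusion rules or the adjustments for protein length and multiple comparisons described in Citation[10].

```python
from collections import defaultdict


def proteins_with_min_peptides(peptide_hits, min_peptides=2):
    """Return protein IDs supported by at least min_peptides distinct peptides.

    peptide_hits is an iterable of (protein_id, peptide_sequence) pairs.
    """
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide_sequence in peptide_hits:
        peptides_by_protein[protein_id].add(peptide_sequence)
    return sorted(
        protein_id
        for protein_id, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    )


def match_by_name(protein_names, antibody_target_names):
    """Naive name matching after trimming whitespace and lowercasing.

    Synonyms and protein-family names are NOT resolved, which is one
    source of the mismatch problems described in the text.
    """
    def normalize(name):
        return name.strip().lower()

    targets = {normalize(name) for name in antibody_target_names}
    return sorted(name for name in protein_names if normalize(name) in targets)


if __name__ == "__main__":
    # Made-up identifiers and sequences, for illustration only.
    hits = [("P1", "AAK"), ("P1", "CCR"), ("P2", "DDK"), ("P2", "DDK"), ("P3", "EEK")]
    print(proteins_with_min_peptides(hits))
    print(match_by_name(["Haptoglobin", "Transthyretin"], ["haptoglobin ", "Prealbumin"]))
```

In the usage lines, the second protein fails to match because it is listed under a synonym in the antibody inventory, which shows why matching by name alone tends to undercount true overlaps.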

Specimen collection standard operating protocols

There was consensus that investigators need clear guidance in the form of standard operating procedures for choice of specimen to collect and details of specimen collection, processing, handling, aliquotting, freezing and thawing. HUPO PPP stalwarts Gerard Siest, Frank Vitzthum, Dan Chan (Johns Hopkins) and Young-Ki Paik provided inputs in advance, and many others present at the workshop agreed that the preferred specimen is K2-EDTA plasma. Plasma proteins have less ex vivo variability and less proteolysis than serum proteins from the same donors. Of course, for certain kinds of assays as in all of clinical chemistry, serum or other special conditions may be required; for example, if calcium-dependent enzyme activities are to be assayed, EDTA would not be appropriate.

David Warunek of BD Biosciences reiterated their offer to prepare a reference specimen (or specimens) for the Next Phase work. Ruth Vanbogelen of Pfizer Inc. offered to share their well-tested standard operating procedures.

Conclusion

A PPP working group will be established to turn the specimen collection consensus into draft standard operating protocols. Another will bring together guidance on peptide, protein and plasma reference standards, following the National Institute of Standards and Technology/National Cancer Institute workshop of August 2005 Citation[11]. Major laboratories that are able to conduct large-scale analyses with advanced technology platforms with their own funding, and that are willing to submit their data for cross-laboratory PPP analyses, will constitute the participating laboratory group for the Next Phase of the PPP.

References

  • Omenn GS, States DJ, Adamski MR et al. Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245 (2005).
  • Omenn GS, Paik Y-K, Speicher D. The HUPO Plasma Proteome Project: a report from the Munich Congress. Proteomics 6, 9–11 (2006).
  • Villanueva J, Shaffer DR, Philip J et al. Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 116, 271–284 (2006).
  • Brichory FM, Misek DE, Yim AM et al. An immune response manifested by the common occurrence of annexins I and II autoantibodies and high circulating levels of IL-6 in lung cancer. Proc. Natl Acad. Sci. USA 98, 9824–9829 (2001).
  • Wang X, Yu J, Sreekumar A et al. Autoantibody signatures in prostate cancer. New Engl. J. Med. 353, 1224–1235 (2005).
  • Kuster B, Schirle M, Mallick P, Aebersold R. Scoring proteomes with proteotypic peptide probes. Nature Rev. Mol. Cell Biol. 6, 577–583 (2005).
  • Craig R, Cortens JP, Beavis RC. The use of proteotypic peptide libraries for protein identification. Rapid Commun. Mass Spectrom. 19, 1844–1850 (2005).
  • Kapp EA, Schutz F, Connolly LM et al. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 5, 3475–3490 (2005).
  • Uhlen M, Ponten F. Antibody-based proteomics for human tissue profiling. Mol. Cell. Proteomics 4(4), 384–393 (2005).
  • States DJ, Omenn GS, Blackwell TW et al. Deriving high confidence protein identifications from a HUPO collaborative study of human serum and plasma. Nature Biotechnol. (2006) (In Press).
  • Barker PE, Wagner PD, Stein SE, Bunk DM, Srivastava S, Omenn GS. Standards for plasma and serum proteomics in early cancer detection: a needs assessment report from the NIST-NCI SMART Workshop, August 18–19, 2005. Clin. Chem. (2006) (In Press).

Websites

  • University of Michigan: Human Proteome Organization Plasma Proteome Project www.bioinformatics.med.umich.edu/hupo/ppp
  • European Bioinformatics Institute www.ebi.ac.uk/pride
  • Institute for Systems Biology www.peptideatlas.org/repository
  • Human Protein Atlas www.proteinatlas.org
