Review

Proteogenomics in the context of the Human Proteome Project (HPP)

Pages 267-275 | Received 22 Oct 2018, Accepted 08 Jan 2019, Published online: 28 Jan 2019
 

ABSTRACT

Introduction: The technological and scientific progress achieved in the Human Proteome Project (HPP) has provided the scientific community with a new set of experimental and bioinformatic methods for the challenging field of shotgun and SRM/MRM-based proteomics. The requirements for a protein to be considered experimentally validated are now well established, and information about the human proteome is available in the neXtProt database, while targeted proteomic assays are stored in SRMAtlas. However, the study of the missing proteins remains an outstanding issue.

Areas covered: This review focuses on the implementation of proteogenomic methods designed to improve the detection and validation of the missing proteins. It traces the evolution of methodological strategies that combine different omic technologies and exploit large publicly available datasets, taking the Chromosome 16 Consortium as a reference.

Expert commentary: Proteogenomics and other data analysis strategies implemented within the C-HPP initiative could serve as guidance for completing the catalog of human proteins in the near future. Moreover, in the coming years we will probably witness their use in the B/D-HPP initiative to take a step forward in understanding the implications of proteins in human biology and disease.

Article highlights

  • Proteogenomics is a promising area of research in several technological and scientific fields, especially biology, biomedicine and, in the last few years, clinical biomarker discovery.

  • Proteogenomic methods rest on two basics: the creation of customized protein sequence databases for the proteomic searches, and the subsequent statistical analysis to estimate the FDR of the results, taking into account the size effect introduced by these databases.

  • One of the key objectives of the HPP is the experimental detection, in a biological matrix and using stringent statistical thresholds, of the proteins annotated in the neXtProt database with protein evidence levels PE2, PE3 and PE4 (missing proteins, MPs).

  • The availability in public repositories such as GEO and PRIDE of large numbers of high-throughput experiments on the human transcriptome (microarrays, RNA-Seq) and proteome (shotgun proteomics) has allowed the development of new bioinformatic workflows that search for MPs by reanalyzing these datasets in accordance with the HPP guidelines.

  • The integration of transcriptomic and proteomic experiments (proteogenomics) has been used to characterize the missing proteins and increase our knowledge about them. The results describe the functions and pathways in which they are involved and their tissue specificity, and serve as guidance for the design of validation experiments in specific biological matrices (for example, brain and testis tissues and embryonic cell lines).

  • The study of peptide detectability using a machine learning approach reveals the limitations of MS in detecting a subset of peptides, especially MP peptides.

  • Unfortunately, even the predictions made with the most sophisticated algorithms are difficult to validate. New experimental approaches, such as protein enrichment or depletion strategies, and new biological matrices must be incorporated into the project in order to complete the human proteome catalog. The bioinformatic methods provided by the HPP scientific community to study the MPs can also be applied in the B/D-HPP initiative to investigate the roles of human proteins in cellular processes and human diseases.

This box summarizes key points contained in the article.
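
The FDR estimation mentioned in the highlights can be illustrated with a minimal sketch. It assumes a target-decoy strategy, which is a common choice but is not named in the text above, and it assumes that a higher score means a better match. The `PSM` class, the function names and the 1% default threshold are hypothetical and chosen only for illustration. Because decoys are drawn from the same customized database as the targets, a larger database yields more random high-scoring matches and therefore a stricter score cutoff, which is one way the database size effect shows up in practice.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match (hypothetical minimal record)."""
    peptide: str
    score: float      # assumption: higher score = better match
    is_decoy: bool    # True if matched against a decoy sequence


def qvalues(psms):
    """Return (psm, q-value) pairs sorted by descending score.

    FDR at each rank is estimated as decoys / targets among all
    matches scoring at least as well. The q-value is the smallest
    FDR at which a given match would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Walk from the worst score upwards, keeping the running minimum,
    # so that q-values never decrease as the score gets worse.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return list(zip(ranked, qvals))


def accept(psms, threshold=0.01):
    """Keep target matches whose q-value is at or below the threshold."""
    return [p for p, q in qvalues(psms) if not p.is_decoy and q <= threshold]


if __name__ == "__main__":
    demo = [
        PSM("PEPTIDEA", 10.0, False),
        PSM("PEPTIDEB", 9.0, False),
        PSM("PEPTIDEC", 8.0, False),
        PSM("DECOY1", 7.0, True),
        PSM("PEPTIDED", 6.0, False),
        PSM("DECOY2", 5.0, True),
    ]
    for psm in accept(demo):
        print(psm.peptide, psm.score)
```

In a real reanalysis the same filtering would typically be repeated at the peptide and protein levels, since a threshold applied only to spectrum matches does not by itself control the error rate of the resulting protein list.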

Declaration of interest

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Reviewer declarations

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Supplementary material

Supplemental data for this article are available online.

Additional information

Funding

The CIMA Bioinformatics Platform is a member of the ProteoRed-ISCIII platform. This work was supported by PRBB-ISCIII (PT13/0001/0002), PRB3-ISCIII (PT17/0019/0013), Departamento de Salud of Gobierno de Navarra (33/2015) and Ministerio de Economía y Competitividad co-financed by FEDER funds (DPI2015-68982-R) to V. Segura; PhD funding from Ministerio de Economía y Competitividad (BES-2016-079065) to J. G González-Gomariz; and PRBB-ISCIII staff funding to M. López-Sánchez.
