ABSTRACT
Proteogenomics, the alliance of proteomics, transcriptomics, genomics and bioinformatics, was first proposed for refining genome annotation using experimental data acquired on gene products. With high-throughput analysis of proteins made possible by next-generation tandem mass spectrometers, proteogenomics is greatly improving human genome annotation per se, and is helping to decipher the numerous gene and protein modifications occurring during development, aging, illness and cancer progression. Further efforts are required to obtain a comprehensive picture of human genes, their products, their functions, and their drift over time or in reaction to microbiota and pathogen stimuli. This should be done not only to obtain a general overview of the human population, but also to gain specific information at the individual level. This review focuses on the clinical implications of proteogenomics: novel insights into fundamental biology, better characterization of pathogens and parasites, discovery of novel diagnostic approaches for cancer, and personalized medicine.
Financial & competing interests disclosure
This study was supported by the Commissariat à l’Energie Atomique et aux Energies Alternatives and the Agence Nationale de la Recherche (ANR-12-BSV6-0012-01 & ANR-14-CE21-0006-02). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Proteogenomics combines the strengths of different omics approaches. Based on the six-frame translation of genome or RNAseq transcriptome sequences, proteogenomics allows quick identification of key proteins from a shotgun proteomic dataset. Better annotation of translation starts and splicing events can also be obtained from the peptides recorded by tandem mass spectrometry.
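As an illustration of the six-frame translation mentioned above, the following minimal Python sketch translates a nucleotide sequence in the three forward and three reverse-complement reading frames. It assumes the standard genetic code and an unambiguous DNA alphabet; the function names and the short example sequence are hypothetical and chosen only for illustration. It does not reproduce any specific proteogenomics pipeline.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any real genome.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In practice, the translated frames would be split at stop codons (`*`) and the resulting open reading frames written to a protein database against which the MS/MS spectra are searched.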
Proteogenomics is now the standard approach for discovering novel human genes and documenting the different proteoforms that can be produced from each of the ~20,000 human protein-coding genes. To date, approximately 18% of human genes still lack any peptide sequence evidence from mass spectrometry. Progress toward covering the whole human proteome is being made through a multinational proteogenomics consortium.
Omics-based personalized monitoring is starting to be implemented, representing an important breakthrough in personalized medicine. Proteogenomics allows individual polymorphisms to be taken into account when analyzing proteomic data. Although the cost of long-term studies on large cohorts limits its widespread adoption, proteogenomic approaches hold considerable promise.
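One way to picture how an individual polymorphism can be taken into account is to apply a single amino acid substitution to a reference protein and list the tryptic peptides that exist only in the variant form. The sketch below is a simplified illustration: the protein sequence, the substitution and the function names are hypothetical, and the digestion rule assumed here (cleavage after K or R unless followed by P, no missed cleavages) is the common textbook rule for trypsin rather than the setting of any particular search engine.

```python
import re


def apply_substitution(protein, position, ref, alt):
    """Apply a single amino acid substitution at a 1-based position."""
    if protein[position - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, "
            f"found {protein[position - 1]}"
        )
    return protein[:position - 1] + alt + protein[position:]


def tryptic_peptides(protein):
    """In silico digestion: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def variant_specific_peptides(protein, position, ref, alt):
    """Return peptides produced by the variant but not by the reference."""
    variant = apply_substitution(protein, position, ref, alt)
    reference_set = set(tryptic_peptides(protein))
    return sorted(set(tryptic_peptides(variant)) - reference_set)


if __name__ == "__main__":
    # Hypothetical reference sequence and A4V substitution.
    print(variant_specific_peptides("MKTAYIAKQR", 4, "A", "V"))
```

Variant-specific peptides of this kind could then be appended to the search database, so that spectra arising from the individual's own proteoforms are not left unassigned.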
Onco-proteogenomic approaches have been implemented to gain a better understanding of cancer pathways. Because of the substantial genomic drift of cancer tissues, novel proteoforms not yet present in human protein sequence databases can be detected by tandem mass spectrometry when unassigned MS/MS spectra are searched. These unexpected peptides or proteoforms could be used as novel diagnostic biomarkers, or at least for personalized monitoring.
Proteogenomics has improved the characterization of the protein sequence repertoire of numerous human pathogens, mostly bacteria. Complex protozoan parasites are also under study to improve their genome annotation and to better describe their cellular mechanisms. New knowledge could emerge from proteogenomics-inspired characterization of microbe–host interactions, and novel drugs targeting the most sensitive molecular players could consequently be proposed.