Review

Peptidomics and proteogenomics: background, challenges and future needs

Pages 643-659 | Received 23 Jun 2021, Accepted 10 Sep 2021, Published online: 24 Sep 2021
 

ABSTRACT

Introduction

With available genomic data and related information, it is becoming possible to better highlight mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development.

Areas covered

Peptidomics and proteogenomics are two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both technologies add a remarkably large amount of data to the pool of information on various peptides associated with gene mutations or genome remodeling. A literature search was performed in the PubMed database and is up to date.

Expert Opinion

This article lists various techniques used for peptidomic and proteogenomic analyses. It also explains various bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. Their role in deciphering disease pathways, cancer research, and biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements to overcome the current limitations for their effective clinical use are also discussed.

Box 1 Reference protein sequence databases

  • RefSeq: RefSeq is the major database of non-redundant sequences, developed and maintained by the National Center for Biotechnology Information (NCBI). Most archaeal and bacterial genome sequences are generated and annotated using the prokaryotic genome annotation pipeline, while some are manually curated. For eukaryotes, two methods are followed: one uses the eukaryotic genome annotation pipeline and the other uses the International Nucleotide Sequence Database Collaboration (INSDC). In the genome annotation pipeline, annotation is based on already available transcript sequence data, protein homology, ab initio prediction, or curated transcripts and proteins [157]. The annotated genomes are publicly available, so that users can download the sequence data or perform homology BLAST searches. The database is regularly maintained and curated, and it is updated as the amount of sequence information grows. It also contains information on specific loci, such as RefSeqGene or fungal internal transcribed spacer (ITS) sequences [157]. RefSeq records can be accessed through the NCBI website.

  • UniProt: The UniProt database is maintained by research groups from the Swiss Institute of Bioinformatics (SIB), the Protein Information Resource (PIR) and the European Bioinformatics Institute (EBI). Sequence information can be retrieved in batches using UniProt identifiers, and the files are available in various formats such as TEXT, XML, RDF and FASTA.
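
As a minimal sketch of what batch retrieval by identifier can look like once a FASTA file has been downloaded, the Python snippet below parses UniProt-style FASTA text into a dictionary keyed by accession and then selects a batch of entries. It assumes the usual UniProt header layout (`>db|ACCESSION|ENTRY_NAME description`). The function name `parse_uniprot_fasta`, the accessions and the sequences are hypothetical and were made up for illustration; they are not taken from the article or from UniProt.

```python
def parse_uniprot_fasta(text):
    """Return {accession: sequence} from UniProt-style FASTA text."""
    records = {}
    accession = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed header layout: >db|ACCESSION|ENTRY_NAME description
            fields = line[1:].split("|")
            if len(fields) >= 3:
                accession = fields[1]
            else:
                # Fall back to the first token for non-UniProt headers
                accession = (line[1:].split() or [""])[0]
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped; join them into one string
            records[accession] += line
    return records


# Made-up entries for illustration only; not real UniProt records.
EXAMPLE_FASTA = """\
>sp|X00001|TOY1_HUMAN Hypothetical toy protein 1
MKTAYIAKQR
QISFVKSHFS
>tr|X00002|TOY2_HUMAN Hypothetical toy protein 2
MVHLTPEEKS
"""

records = parse_uniprot_fasta(EXAMPLE_FASTA)

# Retrieve a batch of sequences by (hypothetical) identifier
wanted = ["X00002", "X00001"]
batch = {acc: records[acc] for acc in wanted if acc in records}
for acc, seq in batch.items():
    print(acc, len(seq), seq)
```

A plain dictionary is enough for a small reference set; for proteome-scale files, an indexed reader from an established library would likely be a better choice.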

Article highlights

  • Proteogenomics and peptidomics involve the identification of peptides.

  • Various bioinformatics tools enable researchers to perform genome-wide association studies and provide answers to various biological questions.

  • Advanced proteogenomic techniques can help identify single nucleotide variations (SNVs) and assess their impact on disease etiology.

  • Proteogenomic studies have also been performed on cancer tissues to identify potential biomarkers, such as chromosomal hotspots and proteins with altered expression that play critical roles in disease development.
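
To make the link between an SNV and a variant peptide concrete, the following is a minimal sketch, under stated assumptions, of how a single nucleotide change in a coding sequence alters the translated peptide. It uses the standard genetic code; the helper names (`translate`, `apply_snv`) and the short coding sequence are hypothetical and chosen only for illustration. It is not the workflow of any specific tool discussed in the article.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence (DNA, frame 0) up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds, pos, ref, alt):
    """Return cds with the base at 0-based position pos changed from ref to alt."""
    if cds[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {cds[pos]}")
    return cds[:pos] + alt + cds[pos + 1:]


# Toy coding sequence, for illustration only
toy_cds = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
reference_peptide = translate(toy_cds)
variant_peptide = translate(apply_snv(toy_cds, 19, "A", "T"))

print(reference_peptide)  # MVHLTPEEKS
print(variant_peptide)    # MVHLTPVEKS
```

In a proteogenomic search, variant peptides derived in this way would typically be appended to the reference protein database so that spectra from mutated peptides can be matched.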

Disclosure statement

The authors have no relevant affiliations or financial interests with organizations or companies that have a financial interest in or conflict with the topics or materials discussed in the manuscript. This includes employment, consulting, honoraria, stock ownership or options, expert testimony, grants received or applied for, or patents or royalties.

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Supplementary material

Supplemental data for this article can be accessed here.

Additional information

Funding

The authors thank the Portuguese Foundation for Science and Technology (FCT), European Union, QREN, FEDER and COMPETE for funding UnIC - Unidade de Investigação Cardiovascular (UIDB/00051/2020 and UIDP/00051/2020), iBiMED (UIDB/04501/2020, POCI-01-0145-FEDER-007628) and LAQV/REQUIMTE (UIDB/50006/2020) research units. RV is supported by individual fellowship grants (IF/00286/2015). VT is supported by the Mahidol University research grant and the Thailand Research Fund (IRN60W0004). This work is funded by national funds (OE), through FCT – Fundação para a Ciência e a Tecnologia, I.P., in the scope of the framework contract foreseen in the numbers 4, 5 and 6 of the article 23, of the Decree-Law 57/2016, of August 29, changed by Law 57/2017, of July 19. COST Action CA19144 European Venom Network (EUVEN). We acknowledge the MASSFIITB Facility at IIT Bombay supported by the Department of Biotechnology (BT/PR13114/INF/22/206/2015) and Ministry of Human Resource Development, Government of India (MHRD-UAY Phase-II Project (IITB_001) to SS. MC acknowledges IIT Bombay Post-Doctoral Fellowship.
