ABSTRACT
Introduction
The growing availability of genomic data and related information is making it possible to pinpoint mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development.
Areas covered
Peptidomics and proteogenomics are two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both add a large amount of data to the pool of information on peptides associated with gene mutations or genome remodeling. The literature search was performed in the PubMed database and is up to date.
Expert Opinion
This article lists the techniques used for peptidomic and proteogenomic analyses. It also explains the bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. The role of both approaches in deciphering disease pathways, in cancer research, and in biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements for overcoming the current limitations to their effective clinical use are discussed.
Box 1 Reference protein sequence databases
RefSeq: RefSeq is the major database of non-redundant sequences developed and maintained by the National Center for Biotechnology Information (NCBI). Most archaeal and bacterial genome sequences are generated and annotated using the prokaryotic genome annotation pipeline, while some are manually curated. For eukaryotes, two methods are followed: one uses the eukaryotic genome annotation pipeline and the other relies on the International Nucleotide Sequence Database Collaboration (INSDC). In the genome annotation pipeline, annotation is based on already available transcript sequence data, protein homology, ab initio prediction, or curated transcripts and proteins [157]. The annotated genomes are publicly available, so users can download the sequence data or perform homology searches with BLAST. The database is regularly maintained and curated and is updated as the amount of sequence information grows. It also contains information on specific loci, such as RefSeqGene records or fungal internal transcribed spacer (ITS) sequences [157]. RefSeq records can be accessed through the NCBI website.
UniProt: The UniProt database is maintained by research groups from the Swiss Institute of Bioinformatics (SIB), the Protein Information Resource (PIR), and the European Bioinformatics Institute (EBI). Sequence information can be retrieved in batches using UniProt identifiers, and the files are available in various formats, including TEXT, XML, RDF, and FASTA.
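As an illustration of batch retrieval by identifier, the following minimal Python sketch builds one download address per UniProt identifier and parses FASTA text into records. It is not taken from the article: the base address of the UniProt REST service, the file extensions mapped to each format, and the helper names (entry_urls, parse_fasta, accession_of) are assumptions made for this example, and the record shown is a toy entry rather than a real protein.

```python
from typing import Dict, Iterator, List, Tuple

# Assumption: public UniProt REST endpoint; the article does not state an address.
UNIPROT_BASE = "https://rest.uniprot.org/uniprotkb"

# Formats named in the text, mapped to assumed file extensions.
FORMAT_EXT: Dict[str, str] = {
    "TEXT": "txt",
    "XML": "xml",
    "RDF": "rdf",
    "FASTA": "fasta",
}


def entry_urls(accessions: List[str], fmt: str = "FASTA") -> List[str]:
    """Build one download URL per UniProt identifier (hypothetical helper)."""
    ext = FORMAT_EXT[fmt.upper()]
    return [f"{UNIPROT_BASE}/{acc}.{ext}" for acc in accessions]


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header: str) -> str:
    """Extract the identifier from a 'db|ACCESSION|NAME ...' style header."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    fields = header.split()
    return fields[0] if fields else ""


# Toy FASTA text (not a real UniProt entry), used only to exercise the parser.
TOY_FASTA = """>sp|X00001|TOY1_HUMAN Toy protein one
MKTAYIAK
QRQISFVK
>sp|X00002|TOY2_HUMAN Toy protein two
MSHHWGYG
"""

records = {accession_of(h): seq for h, seq in parse_fasta(TOY_FASTA)}
print(entry_urls(["X00001", "X00002"], "fasta"))
print(records)
```

In practice, the text returned from each address would be passed to parse_fasta in place of the toy string; the download step is left out here so that the sketch runs without network access.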
Article highlights
Proteogenomics and peptidomics involve the identification of peptides.
Various bioinformatics tools enable researchers to perform genome-wide association studies and to answer a range of biological questions.
Advanced proteogenomic techniques can help clarify the implications of single nucleotide variations (SNVs) and their impact on disease etiology.
Proteogenomic studies have also been performed on cancer tissues to identify potential biomarkers, such as chromosomal hotspots and proteins with altered expression that play critical roles in disease development.
Disclosure statement
The authors have no relevant affiliations or financial interests with organizations or companies that have a financial interest in or conflict with the topics or materials discussed in the manuscript. This includes employment, consulting, honoraria, stock ownership or options, expert testimony, grants received or applied for, or patents or royalties.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.
Supplementary material
Supplemental data for this article can be accessed here.