Review

Metaproteomic data analysis at a glance: advances in computational microbial community proteomics

Pages 757-769 | Received 23 May 2016, Accepted 30 Jun 2016, Published online: 20 Jul 2016

References

  • Rodríguez-Valera F. Environmental genomics, the big picture? FEMS Microbiol Lett. 2004;231(2):153–158.
  • Bäckhed F, Ley RE, Sonnenburg JL, et al. Host-bacterial mutualism in the human intestine. Science. 2005;307(5717):1915–1920.
  • Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: the unseen majority. Proc Natl Acad Sci U S A. 1998;95(12):6578–6583.
  • Pace NR. A molecular view of microbial diversity and the biosphere. Science. 1997;276(5313):734–740.
  • Konopka A. What is microbial community ecology? ISME J. 2009;3(11):1223–1230.
  • Wilmes P, Bond PL. Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 2006;14(2):92–97.
  • Wilmes P, Bond PL. The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms. Environ Microbiol. 2004;6(9):911–920.
  • Seifert J, Herbst FA, Halkjaer Nielsen P, et al. Bioinformatic progress and applications in metaproteogenomics for bridging the gap between genomic sequences and metabolic functions in microbial communities. Proteomics. 2013;13(18–19):2786–2804.
  • Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods. 2014;11(11):1114–1125.
  • Armengaud J, Trapp J, Pible O, et al. Non-model organisms, a species endangered by proteogenomics. J Proteomics. 2014;105:5–18.
  • Mehlan H, Schmidt F, Weiss S, et al. Data visualization in environmental proteomics. Proteomics. 2013;13(18–19):2805–2821.
  • Oveland E, Muth T, Rapp E, et al. Viewing the proteome: how to visualize proteomics data? Proteomics. 2015;15(8):1341–1355.
  • Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005;33(Database issue):D501–504.
  • Apweiler R, Bairoch A, Wu CH, et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004;32(Database issue):D115–119.
  • Yates A, Akanni W, Amode MR, et al. Ensembl 2016. Nucleic Acids Res. 2016;44(D1):D710–716.
  • Suzek BE, Huang H, McGarvey P, et al. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007;23(10):1282–1288.
  • Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5(11):976–989.
  • Perkins DN, Pappin DJ, Creasy DM, et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20(18):3551–3567.
  • Geer LY, Markey SP, Kowalak JA, et al. Open mass spectrometry search algorithm. J Proteome Res. 2004;3(5):958–964.
  • Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20(9):1466–1467.
  • Diament BJ, Noble WS. Faster SEQUEST searching for peptide identification from tandem mass spectra. J Proteome Res. 2011;10(9):3871–3879.
  • Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res. 2007;6(2):654–661.
  • Eng JK, Jahan TA, Hoopmann MR. Comet: an open-source MS/MS sequence database search tool. Proteomics. 2013;13(1):22–24.
  • Kim S, Pevzner PA. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun. 2014;5:5277.
  • Dorfer V, Pichler P, Stranzl T, et al. MS Amanda, a universal identification algorithm optimized for high accuracy tandem mass spectra. J Proteome Res. 2014;13(8):3679–3684.
  • Cox J, Neuhauser N, Michalski A, et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J Proteome Res. 2011;10(4):1794–1805.
  • Sturm M, Bertsch A, Gröpl C, et al. OpenMS - an open-source software framework for mass spectrometry. BMC Bioinformatics. 2008;9:163.
  • Keller A, Shteynberg D. Software pipeline and data analysis for MS/MS proteomics: the trans-proteomic pipeline. Methods Mol Biol. 2011;694:169–189.
  • Kremer LP, Leufken J, Oyunchimeg P, et al. Ursgal, universal python module combining common bottom-up proteomics tools for large-scale analysis. J Proteome Res. 2016;15(3):788–794.
  • Vaudel M, Barsnes H, Berven FS, et al. An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics. 2011;11(5):996–999.
  • Vaudel M, Burkhart JM, Zahedi RP, et al. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat Biotechnol. 2015;33(1):22–24.
  • Käll L, Storey JD, Noble WS. QVALITY: non-parametric estimation of q-values and posterior error probabilities. Bioinformatics. 2009;25(7):964–966.
  • Wedge DC, Krishna R, Blackhurst P, et al. FDRAnalysis: a tool for the integrated analysis of tandem mass spectrometry identification results from multiple search engines. J Proteome Res. 2011;10(4):2088–2094.
  • Yadav AK, Kadimi PK, Kumar D, et al. ProteoStats–a library for estimating false discovery rates in proteomics pipelines. Bioinformatics. 2013;29(21):2799–2800.
  • Gonnelli G, Stock M, Verwaeren J, et al. A decoy-free approach to the identification of peptides. J Proteome Res. 2015;14(4):1792–1798.
  • Kwon T, Choi H, Vogel C, et al. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. J Proteome Res. 2011;10(7):2949–2958.
  • Shteynberg D, Deutsch EW, Lam H, et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics. 2011;10(12):M111 007690.
  • Käll L, Canterbury JD, Weston J, et al. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods. 2007;4(11):923–925.
  • Frank A, Pevzner P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal Chem. 2005;77(4):964–973.
  • Chi H, Chen H, He K, et al. pNovo+: de novo peptide sequencing using complementary HCD and ETD tandem mass spectra. J Proteome Res. 2013;12(2):615–625.
  • Ma B, Zhang K, Hendrie C, et al. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003;17(20):2337–2342.
  • Ma B. Novor: real-time peptide de novo sequencing software. J Am Soc Mass Spectrom. 2015;26(11):1885–1894.
  • Muth T, Weilnböck L, Rapp E, et al. DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra. J Proteome Res. 2014;13(2):1143–1146.
  • Leprevost FV, Valente RH, Borges DL, et al. PepExplorer: a similarity-driven tool for analyzing de novo sequencing results. Mol Cell Proteomics. 2014;13(9):2480–2489.
  • Renard BY, Xu B, Kirchner M, et al. Overcoming species boundaries in peptide identification with Bayesian information criterion-driven error-tolerant peptide search (BICEPS). Mol Cell Proteomics. 2012;11(7):M111 014167.
  • Huson DH, Auch AF, Qi J, et al. MEGAN analysis of metagenomic data. Genome Res. 2007;17(3):377–386.
  • Mesuere B, Devreese B, Debyser G, et al. Unipept: tryptic peptide-based biodiversity analysis of metaproteome samples. J Proteome Res. 2012;11(12):5773–5780.
  • Muth T, Behne A, Heyer R, et al. The MetaProteomeAnalyzer: a powerful open-source software suite for metaproteomics data analysis and interpretation. J Proteome Res. 2015;14(3):1557–1565.
  • Schneider T, Schmid E, De Castro JV Jr., et al. Structure and function of the symbiosis partners of the lung lichen (Lobaria pulmonaria L. Hoffm.) analyzed by metaproteomics. Proteomics. 2011;11(13):2752–2756.
  • Penzlin A, Lindner MS, Doellinger J, et al. Pipasic: similarity and expression correction for strain-level identification and quantification in metaproteomics. Bioinformatics. 2014;30(12):i149–156.
  • Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–29.
  • Galperin MY, Makarova KS, Wolf YI, et al. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 2015;43(Database issue):D261–269.
  • Szklarczyk D, Franceschini A, Wyder S, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447–452.
  • Huerta-Cepas J, Szklarczyk D, Forslund K, et al. EggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016;44(D1):D286–293.
  • Ogata H, Goto S, Sato K, et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27(1):29–34.
  • Krieger CJ, Zhang P, Mueller LA, et al. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. 2004;32(Database issue):D438–442.
  • Apweiler R, Attwood TK, Bairoch A, et al. InterPro–an integrated documentation resource for protein families, domains and functional sites. Bioinformatics. 2000;16(12):1145–1150.
  • Amann RI, Ludwig W, Schleifer KH. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev. 1995;59(1):143–169.
  • Kuhring M, Renard BY. Estimating the computational limits of detection of microbial non-model organisms. Proteomics. 2015;15(20):3580–3584.
  • Denef VJ, Shah MB, Verberkmoes NC, et al. Implications of strain- and species-level sequence divergence for community and isolate shotgun proteomic analysis. J Proteome Res. 2007;6(8):3152–3161.
  • Verberkmoes NC, Russell AL, Shah M, et al. Shotgun metaproteomics of the human distal gut microbiota. ISME J. 2009;3(2):179–189.
  • Hettich RL, Pan C, Chourey K, et al. Metaproteomics: harnessing the power of high performance mass spectrometry to identify the suite of proteins that control metabolic activities in microbial communities. Anal Chem. 2013;85(9):4203–4214.
  • Daniel H, Moghaddas Gholami A, Berry D, et al. High-fat diet alters gut microbiota physiology in mice. ISME J. 2014;8(2):295–308.
  • Vizcaíno JA, Foster JM, Martens L. Proteomics data repositories: providing a safe haven for your data and acting as a springboard for further research. J Proteomics. 2010;73(11):2136–2146.
  • Griss J, Perez-Riverol Y, Lewis S, et al. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods. 2016. doi:10.1038/nmeth.3902.
  • Zickmann F, Renard BY. IPred - integrating ab initio and evidence based gene predictions to improve prediction accuracy. BMC Genomics. 2015;16:134.
  • Morris RM, Nunn BL, Frazar C, et al. Comparative metaproteomics reveals ocean-scale shifts in microbial nutrient utilization and energy transduction. ISME J. 2010;4(5):673–685.
  • Rooijers K, Kolmeder C, Juste C, et al. An iterative workflow for mining the human intestinal metaproteome. BMC Genomics. 2011;12:6.
  • Kolmeder CA, de Been M, Nikkilä J, et al. Comparative metaproteomics and diversity analysis of human intestinal microbiota testifies for its temporal stability and expression of core functions. PLoS One. 2012;7(1):e29913.
  • Kolmeder CA, Ritari J, Verdam FJ, et al. Colonic metaproteomic signatures of active bacteria and the host in obesity. Proteomics. 2015;15(20):3544–3552.
  • Vaudel M, Sickmann A, Martens L. Current methods for global proteome identification. Expert Rev Proteomics. 2012;9(5):519–532.
  • Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods. 2007;4(3):207–214.
  • Colaert N, Degroeve S, Helsens K, et al. Analysis of the resolution limitations of peptide identification algorithms. J Proteome Res. 2011;10(12):5555–5561.
  • Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics. 2010;73(11):2092–2123.
  • Noble WS. Mass spectrometrists should search only for peptides they care about. Nat Methods. 2015;12(7):605–608.
  • Tanca A, Palomba A, Deligios M, et al. Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture. PLoS One. 2013;8(12):e82981.
  • Blakeley P, Overton IM, Hubbard SJ. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J Proteome Res. 2012;11(11):5221–5234.
  • Castellana N, Bafna V. Proteogenomics to discover the full coding content of genomes: a computational perspective. J Proteomics. 2010;73(11):2124–2135.
  • Muth T, Kolmeder CA, Salojärvi J, et al. Navigating through metaproteomics data: a logbook of database searching. Proteomics. 2015;15(20):3439–3453.
  • Jeong K, Kim S, Bandeira N. False discovery rates in spectral identification. BMC Bioinformatics. 2012;13(Suppl 16):S2.
  • Keller A, Nesvizhskii AI, Kolker E, et al. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002;74(20):5383–5392.
  • Renard BY, Timm W, Kirchner M, et al. Estimating the confidence of peptide identifications without decoy databases. Anal Chem. 2010;82(11):4314–4318.
  • Kim S, Gupta N, Pevzner PA. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J Proteome Res. 2008;7(8):3354–3363.
  • Howbert JJ, Noble WS. Computing exact p-values for a cross-correlation shotgun proteomics score function. Mol Cell Proteomics. 2014;13(9):2467–2479.
  • Verheggen K, Barsnes H, Martens L. Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory. Proteomics. 2014;14(4–5):367–377.
  • Ning K, Fermin D, Nesvizhskii AI. Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets. Proteomics. 2010;10(14):2712–2718.
  • Kertesz-Farkas A, Keich U, Noble WS. Tandem mass spectrum identification via cascaded search. J Proteome Res. 2015;14(8):3027–3038.
  • Jagtap P, Goslinga J, Kooren JA, et al. A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. Proteomics. 2013;13(8):1352–1357.
  • Yılmaz S, Victor B, Hulstaert N, et al. A pipeline for differential proteomics in unsequenced species. J Proteome Res. 2016;15(6):1963–1970.
  • Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410.
  • Allmer J. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Rev Proteomics. 2011;8(5):645–657.
  • Cantarel BL, Erickson AR, VerBerkmoes NC, et al. Strategies for metagenomic-guided whole-community proteomics of complex microbial environments. PLoS One. 2011;6(11):e27173.
  • Han Y, Ma B, Zhang K. SPIDER: software for protein identification from sequence tags with de novo sequencing error. J Bioinform Comput Biol. 2005;3(3):697–716.
  • Ma B, Johnson R. De novo sequencing and homology searching. Mol Cell Proteomics. 2012;11(2):O111 014902.
  • Nesvizhskii AI, Aebersold R. Interpretation of shotgun proteomic data: the protein inference problem. Mol Cell Proteomics. 2005;4(10):1419–1440.
  • Kolmeder CA, de Vos WM. Metaproteomics of our microbiome - developing insight in function and activity in man and model systems. J Proteomics. 2014;97:3–16.
  • Serang O, Noble W. A review of statistical methods for protein identification using tandem mass spectrometry. Stat Interface. 2012;5(1):3–20.
  • Huang T, Wang J, Yu W, et al. Protein inference: a review. Brief Bioinform. 2012;13(5):586–614.
  • Martens L, Hermjakob H. Proteomics data validation: why all must provide data. Mol Biosyst. 2007;3(8):518–522.
  • Jagtap P, McGowan T, Bandhakavi S, et al. Deep metaproteomic analysis of human salivary supernatant. Proteomics. 2012;12(7):992–1001.
  • Huson DH, Mitra S, Ruscheweyh HJ, et al. Integrative analysis of environmental sequences using MEGAN4. Genome Res. 2011;21(9):1552–1560.
  • Mesuere B, Willems T, Van der Jeugt F, et al. Unipept web services for metaproteomics analysis. Bioinformatics. 2016;32(11):1746–1748.
  • Vandermarliere E, Mueller M, Martens L. Getting intimate with trypsin, the leading protease in proteomics. Mass Spectrom Rev. 2013;32(6):453–465.
  • Chourey K, Nissen S, Vishnivetskaya T, et al. Environmental proteomics reveals early microbial community responses to biostimulation at a uranium- and nitrate-contaminated site. Proteomics. 2013;13(18–19):2921–2930.
  • Goecks J, Nekrutenko A, Taylor J, et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86.
  • Jagtap PD, Blakely A, Murray K, et al. Metaproteomic analysis using the Galaxy framework. Proteomics. 2015;15(20):3553–3565.
  • Püttker S, Kohrs F, Benndorf D, et al. Metaproteomics of activated sludge from a wastewater treatment plant - a pilot study. Proteomics. 2015;15(20):3596–3601.
  • Theuerl S, Kohrs F, Benndorf D, et al. Community shifts in a well-operating agricultural biogas plant: how process variations are handled by the microbiome. Appl Microbiol Biotechnol. 2015;99(18):7791–7803.
  • Barsnes H, Martens L. Crowdsourcing in proteomics: public resources lead to better experiments. Amino Acids. 2013;44(4):1129–1137.
  • Huntley RP, Sawford T, Mutowo-Meullenet P, et al. The GOA database: gene ontology annotation updates for 2015. Nucleic Acids Res. 2015;43(Database issue):D1057–1063.
  • Tatusov RL, Fedorova ND, Jackson JD, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41.
  • Letunic I, Yamada T, Kanehisa M, et al. iPath: interactive exploration of biochemical pathways and networks. Trends Biochem Sci. 2008;33(3):101–103.
  • Altman T, Travers M, Kothari A, et al. A systematic comparison of the MetaCyc and KEGG pathway databases. BMC Bioinformatics. 2013;14:112.
  • Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43(D1):D257–260.
  • Finn RD, Tate J, Mistry J, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36(Database issue):D281–288.
  • Attwood TK, Beck ME, Bleasby AJ, et al. PRINTS–a database of protein motif fingerprints. Nucleic Acids Res. 1994;22(17):3590–3596.
  • Haft DH, Selengut JD, White O. The TIGRFAMs database of protein families. Nucleic Acids Res. 2003;31(1):371–373.
  • Seifert J, Taubert M, Jehmlich N, et al. Protein-based stable isotope probing (protein-SIP) in functional metaproteomics. Mass Spectrom Rev. 2012;31(6):683–697.
  • Sachsenberg T, Herbst FA, Taubert M, et al. MetaProSIP: automated inference of stable isotope incorporation rates in proteins for functional metaproteomics. J Proteome Res. 2015;14(2):619–627.
  • Venable JD, Dong MQ, Wohlschlegel J, et al. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat Methods. 2004;1(1):39–45.
  • Panchaud A, Scherl A, Shaffer SA, et al. Precursor acquisition independent from ion count: how to dive deeper into the proteomics ocean. Anal Chem. 2009;81(15):6481–6488.
  • Gillet LC, Navarro P, Tate S, et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics. 2012;11(6):O111 016717.
  • Cain JA, Solis N, Cordwell SJ. Beyond gene expression: the impact of protein post-translational modifications in bacteria. J Proteomics. 2014;97:265–286.
  • Chick JM, Kolippakkam D, Nusinow DP, et al. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol. 2015;33(7):743–749.
  • Shortreed MR, Wenger CD, Frey BL, et al. Global identification of protein post-translational modifications in a single-pass database search. J Proteome Res. 2015;14(11):4714–4720.
