Review

Proteomic repository data submission, dissemination, and reuse: key messages

Pages 297-310 | Received 28 Oct 2022, Accepted 07 Dec 2022, Published online: 26 Dec 2022

References

  • Martens L, Vizcaino JA. A golden age for working with public proteomics data. Trends Biochem Sci. 2017;42(5):333–341.
  • Perez-Riverol Y, Alpi E, Wang R, et al. Making proteomics data accessible and reusable: current state of proteomics databases and repositories. Proteomics. 2015;15(5–6):930–949.
  • Vizcaino JA, Deutsch EW, Wang R, et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol. 2014;32(3):223–226.
  • Deutsch EW, Orchard S, Binz PA, et al. Proteomics standards initiative: fifteen years of progress and future work. J Proteome Res. 2017;16(12):4288–4298.
  • Perez-Riverol Y, Bai J, Bandla C, et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022;50(D1):D543–D552.
  • Desiere F, Deutsch EW, King NL, et al. The PeptideAtlas project. Nucleic Acids Res. 2006;34(Database issue):D655–D658.
  • Deutsch EW. The PeptideAtlas project. Methods Mol Biol. 2010;604:285–296.
  • Moriya Y, Kawano S, Okuda S, et al. The jPOST environment: an integrated proteomics data repository and database. Nucleic Acids Res. 2019;47(D1):D1218–D1224.
  • Chen T, Ma J, Liu Y, et al. iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res. 2022;50(D1):D1522–D1527.
  • Sharma V, Eckels J, Schilling B, et al. Panorama Public: a public repository for quantitative data sets processed in Skyline. Mol Cell Proteomics. 2018;17(6):1239–1244.
  • Deutsch EW, Perez-Riverol Y, Carver J, et al. Universal spectrum identifier for mass spectra. Nat Methods. 2021;18(7):768–770.
  • Choi M, Carver J, Chiva C, et al. MassIVE.quant: a community resource of quantitative mass spectrometry-based proteomics datasets. Nat Methods. 2020;17(10):981–984.
  • Jarnuczak AF, Najgebauer H, Barzine M, et al. An integrated landscape of protein expression in human cancer. Sci Data. 2021;8(1):115.
  • Moreno P, Fexova S, George N, et al. Expression Atlas update: gene and protein expression in multiple species. Nucleic Acids Res. 2022;50(D1):D129–D140.
  • Samaras P, Schmidt T, Frejno M, et al. ProteomicsDB: a multi-omics and multi-organism resource for life science research. Nucleic Acids Res. 2020;48(D1):D1153–D1163.
  • Fenyo D, Beavis RC. The GPMDB REST interface. Bioinformatics. 2015;31(12):2056–2058.
  • Ramasamy P, Turan D, Tichshenko N, et al. Scop3P: a comprehensive resource of human phosphosites within their full context. J Proteome Res. 2020;19(8):3478–3486.
  • Brunet MA, Brunelle M, Lucier JF, et al. OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes. Nucleic Acids Res. 2019;47(D1):D403–D410.
  • Olexiouk V, Van Criekinge W, Menschaert G. An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res. 2018;46(D1):D497–D502.
  • Shao X, Taha IN, Clauser KR, et al. MatrisomeDB: the ECM-protein knowledge database. Nucleic Acids Res. 2020;48(D1):D1136–D1144.
  • Deutsch EW, Csordas A, Sun Z, et al. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 2017;45(D1):D1100–D1106.
  • Deutsch EW, Bandeira N, Sharma V, et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 2020;48(D1):D1145–D1152.
  • Wu P, Heins ZJ, Muller JT, et al. Integration and analysis of CPTAC proteomics data in the context of cancer genomics in the cBioPortal. Mol Cell Proteomics. 2019;18(9):1893–1898.
  • Wang M, Wang J, Carver J, et al. Assembling the community-scale discoverable human proteome. Cell Syst. 2018;7(4):412–421 e415.
  • Brunet MA, Lucier JF, Levesque M, et al. OpenProt 2021: deeper functional annotation of the coding potential of eukaryotic genomes. Nucleic Acids Res. 2021;49(D1):D380–D388.
  • Marcu A, Bichmann L, Kuchenbecker L, et al. HLA Ligand Atlas: a benign reference of HLA-presented peptides to improve T-cell-based cancer immunotherapy. J Immunother Cancer. 2021;9(4):e002071.
  • Brenes AJ, Hukelmann JL, Spinelli L, et al. The immunological proteome resource. bioRxiv. 2022;2022.08.29.505666.
  • Perez-Riverol Y, Csordas A, Bai J, et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 2019;47(D1):D442–D450.
  • Ternent T, Csordas A, Qi D, et al. How to submit MS proteomics data to ProteomeXchange via the PRIDE database. Proteomics. 2014;14(20):2233–2241.
  • Vizcaino JA, Mayer G, Perkins S, et al. The mzIdentML data standard version 1.2, supporting advances in proteome informatics. Mol Cell Proteomics. 2017;16(7):1275–1285.
  • Hoffmann N, Rein J, Sachsenberg T, et al. mzTab-M: a data standard for sharing quantitative results in mass spectrometry metabolomics. Anal Chem. 2019;91(5):3302–3310.
  • Griss J, Jones AR, Sachsenberg T, et al. The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Mol Cell Proteomics. 2014;13(10):2765–2775.
  • Okuda S, Watanabe Y, Moriya Y, et al. jPOSTrepo: an international standard data repository for proteomes. Nucleic Acids Res. 2017;45(D1):D1107–D1111.
  • Pino LK, Searle BC, Bollinger JG, et al. The Skyline ecosystem: informatics for quantitative mass spectrometry proteomics. Mass Spectrom Rev. 2020;39(3):229–244.
  • Dai C, Fullgrabe A, Pfeuffer J, et al. A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun. 2021;12(1):5854.
  • Griss J, Perez-Riverol Y, Hermjakob H, et al. Identifying novel biomarkers through data mining – a realistic scenario? Proteomics Clin Appl. 2015;9(3–4):437–443.
  • Perez-Riverol Y, Ternent T, Koch M, et al. OLS Client and OLS Dialog: open source tools to annotate public omics datasets. Proteomics. 2017;17(19):1700244.
  • Perkins DN, Pappin DJ, Creasy DM, et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20(18):3551–3567.
  • Uszkoreit J, Perez-Riverol Y, Eggers B, et al. Protein inference using PIA workflows and PSI standard file formats. J Proteome Res. 2019;18(2):741–747.
  • Uszkoreit J, Maerkens A, Perez-Riverol Y, et al. PIA: an intuitive protein inference engine with a web-based user interface. J Proteome Res. 2015;14(7):2988–2997.
  • Pfeuffer J, Sachsenberg T, Alka O, et al. OpenMS - A platform for reproducible analysis of mass spectrometry data. J Biotechnol. 2017;261:142–148.
  • Sinitcyn P, Hamzeiy H, Salinas Soto F, et al. MaxDIA enables library-based and library-free data-independent acquisition proteomics. Nat Biotechnol. 2021;39(12):1563–1573.
  • Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367–1372.
  • Choi M, Chang CY, Clough T, et al. MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics. 2014;30(17):2524–2526.
  • Tyanova S, Temu T, Sinitcyn P, et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods. 2016;13(9):731–740.
  • Jorissen RN, Gibbs P, Christie M, et al. Metastasis-associated gene expression changes predict poor outcomes in patients with Dukes stage B and C colorectal cancer. Clin Cancer Res. 2009;15(24):7642–7651.
  • Kim MS, Pinto SM, Getnet D, et al. A draft map of the human proteome. Nature. 2014;509(7502):575–581.
  • Craig R, Cortens JP, Beavis RC. Open source system for analyzing, validating, and storing protein identification data. J Proteome Res. 2004;3(6):1234–1242.
  • van Wijk KJ, Leppert T, Sun Q, et al. The Arabidopsis PeptideAtlas: harnessing worldwide proteomics data to create a comprehensive community proteomics resource. Plant Cell. 2021;33(11):3421–3453.
  • Omenn GS, Lane L, Lundberg EK, et al. Metrics for the human proteome project 2016: progress on identifying and characterizing the human proteome, including post-translational modifications. J Proteome Res. 2016;15(11):3951–3960.
  • Kalyuzhnyy A, Eyers PA, Eyers CE, et al. Profiling the human phosphoproteome to estimate the true extent of protein phosphorylation. J Proteome Res. 2022;21(6):1510–1524.
  • Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20(9):1466–1467.
  • Shteynberg DD, Deutsch EW, Campbell DS, et al. PTMProphet: fast and accurate mass modification localization for the trans-proteomic pipeline. J Proteome Res. 2019;18(12):4262–4272.
  • Ramsbottom KA, Prakash A, Perez-Riverol Y, et al. Method for independent estimation of the false localization rate for phosphoproteomics. J Proteome Res. 2022;21(7):1603–1615.
  • Taus T, Kocher T, Pichler P, et al. Universal and confident phosphorylation site localization using phosphoRS. J Proteome Res. 2011;10(12):5354–5362.
  • Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods. 2014;11(11):1114–1125.
  • Chong C, Coukos G, Bassani-Sternberg M. Identification of tumor antigens with immunopeptidomics. Nat Biotechnol. 2022;40(2):175–188.
  • Umer HM, Audain E, Zhu Y, et al. Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides. Bioinformatics. 2021;38(5):1470–1472.
  • Barsnes H, Vaudel M. SearchGUI: a highly adaptable common interface for proteomics search and de novo engines. J Proteome Res. 2018;17(7):2552–2555.
  • Yates AD, Allen J, Amode RM, et al. Ensembl Genomes 2022: an expanding genome resource for non-vertebrates. Nucleic Acids Res. 2022;50(D1):D996–D1003.
  • Cunningham F, Allen JE, Allen J, et al. Ensembl 2022. Nucleic Acids Res. 2022;50(D1):D988–D995.
  • Vaudel M, Burkhart JM, Zahedi RP, et al. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat Biotechnol. 2015;33(1):22–24.
  • Cote RG, Jones P, Martens L, et al. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinformatics. 2007;8(1):401.
  • Deutsch EW, Mendoza L, Shteynberg D, et al. A guided tour of the trans-proteomic pipeline. Proteomics. 2010;10(6):1150–1159.
  • Kim S, Pevzner PA. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun. 2014;5(1):5277.
  • Eng JK, Jahan TA, Hoopmann MR. Comet: an open-source MS/MS sequence database search tool. Proteomics. 2013;13(1):22–24.
  • Frank A, Pevzner P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal Chem. 2005;77(4):964–973.
  • Perez-Riverol Y, Bai M, da Veiga Leprevost F, et al. Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol. 2017;35(5):406–409.
  • Perez-Riverol Y, Zorin A, Dass G, et al. Quantifying the impact of public omics data. Nat Commun. 2019;10(1):3512.
  • Perez-Riverol Y, Moreno P. Scalable data analysis in proteomics and metabolomics using BioContainers and workflows engines. Proteomics. 2020;20(9):e1900147.
  • Neely BA. Cloudy with a chance of peptides: accessibility, scalability, and reproducibility with cloud-hosted environments. J Proteome Res. 2021;20(4):2076–2082.
  • Solntsev SK, Shortreed MR, Frey BL, et al. Enhanced global post-translational modification discovery with MetaMorpheus. J Proteome Res. 2018;17(5):1844–1851.
  • Kong AT, Leprevost FV, Avtonomov DM, et al. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods. 2017;14(5):513–520.
  • Fahrner M, Foll MC, Gruning BA, et al. Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework. Gigascience. 2022;11. DOI:10.1093/gigascience/giac005.
  • Walzer M, Garcia-Seisdedos D, Prakash A, et al. Implementing the reuse of public DIA proteomics datasets: from the PRIDE database to Expression Atlas. Sci Data. 2022;9(1):335.
  • Bichmann L, Gupta S, Rosenberger G, et al. DIAproteomics: a multifunctional data analysis pipeline for data-independent acquisition proteomics and peptidomics. J Proteome Res. 2021;20(7):3758–3766.
  • Savitski MM, Wilhelm M, Hahne H, et al. A scalable approach for protein false discovery rate estimation in large proteomic data sets. Mol Cell Proteomics. 2015;14(9):2394–2404.
  • Serang O, Kall L. Solution to statistical challenges in proteomics is more statistics, not less. J Proteome Res. 2015;14(10):4099–4103.
  • Omenn GS, Lane L, Lundberg EK, et al. Metrics for the human proteome project 2015: progress on the human proteome and guidelines for high-confidence protein identification. J Proteome Res. 2015;14(9):3452–3460.
  • Deutsch EW, Lane L, Overall CM, et al. Human proteome project mass spectrometry data interpretation guidelines 3.0. J Proteome Res. 2019;18(12):4108–4116.
  • Omenn GS, Lane L, Overall CM, et al. The 2022 report on the human proteome from the HUPO Human proteome project. J Proteome Res. 2022. DOI:10.1021/acs.jproteome.2c00498.
  • Perez-Riverol Y, Vizcaino JA, Griss J. Future prospects of spectral clustering approaches in proteomics. Proteomics. 2018;18(14):e1700454.
  • Griss J, Perez-Riverol Y, Lewis S, et al. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods. 2016;13(8):651–656.
  • Schaab C, Geiger T, Stoehr G, et al. Analysis of high accuracy, quantitative proteomics data in the MaxQB database. Mol Cell Proteomics. 2012;11(3):M111.014068.
  • Wang M, Herrmann CJ, Simonovic M, et al. Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics. 2015;15(18):3163–3168.
  • Montague E, Janko I, Stanberry L, et al. Beyond protein expression, MOPED goes multi-omics. Nucleic Acids Res. 2015;43(Database issue):D1145–D1151.
  • Schwanhausser B, Busse D, Li N, et al. Global quantification of mammalian gene expression control. Nature. 2011;473(7347):337–342.
  • Lundgren DH, Hwang SI, Wu L, et al. Role of spectral counting in quantitative proteomics. Expert Rev Proteomics. 2010;7(1):39–53.
  • Wisniewski JR, Hein MY, Cox J, et al. A “proteomic ruler” for protein copy number and concentration estimation without spike-in standards. Mol Cell Proteomics. 2014;13(12):3497–3506.
  • Carvalho-Silva D, Pierleoni A, Pignatelli M, et al. Open targets platform: new developments and updates two years on. Nucleic Acids Res. 2019;47(D1):D1056–D1065.
  • Pinter N, Glatzer D, Fahrner M, et al. MaxQuant and MSstats in Galaxy enable reproducible cloud-based analysis of quantitative proteomics experiments for everyone. J Proteome Res. 2022;21(6):1558–1565.
  • Bai M, Deng J, Dai C, et al. LFQ-based peptide and protein intensity downstream analysis. 2022.
  • Xu T, Park SK, Venable JD, et al. ProLuCID: an improved SEQUEST-like algorithm with enhanced sensitivity and specificity. J Proteomics. 2015;129:16–24.
  • Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–589.
  • Wen B, Zeng WF, Liao Y, et al. Deep Learning in Proteomics. Proteomics. 2020;20(21–22):e1900335.
  • Meyer JG. Deep learning neural network tools for proteomics. Cell Rep Methods. 2021;1(2):100003.
  • Gessulat S, Schmidt T, Zolg DP, et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods. 2019;16(6):509–518.
  • Tiwary S, Levy R, Gutenbrunner P, et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat Methods. 2019;16(6):519–525.
  • Demichev V, Messner CB, Vernardis SI, et al. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat Methods. 2020;17(1):41–44.
  • Yang Y, Liu X, Shen C, et al. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun. 2020;11(1):146.
  • Wen B, Li K, Zhang Y, et al. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nat Commun. 2020;11(1):1759.
  • Bouwmeester R, Gabriels R, Hulstaert N, et al. DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nat Methods. 2021;18(11):1363–1369.
  • Li K, Jain A, Malovannaya A, et al. DeepRescore: leveraging deep learning to improve peptide identification in immunopeptidomics. Proteomics. 2020;20(21–22):e1900334.
  • Declercq A, Bouwmeester R, Hirschler A, et al. MS²Rescore: data-driven rescoring dramatically boosts immunopeptide identification rates. Mol Cell Proteomics. 2022;21(8):100266.
  • Qin C, Luo X, Deng C, et al. Deep learning embedder method and tool for mass spectra similarity search. J Proteomics. 2021;232:104070.
  • Bittremieux W, May DH, Bilmes J, et al. A learned embedding for efficient joint analysis of millions of mass spectra. Nat Methods. 2022;19(6):675–678.
  • Rehfeldt T, Gabriels R, Bouwmeester R, et al. ProteomicsML: an online platform for community-curated datasets and tutorials for machine learning in proteomics. 2022.
