Abstract
Proteomics is a data-rich discipline that makes extensive use of separation tools, mass spectrometry and bioinformatics to analyze and interpret the features and dynamics of the proteome. A major challenge for the field is how to store and manage proteomics data so that they become permanent and can be mined with current and future tools. This article details our experience in developing a commercial proteomic information management system. We identify the challenges faced in data acquisition, workflow management, data permanence, security, and data interpretation and analysis, together with the solutions implemented to address them. Finally, we provide a perspective on data management in proteomics and its implications for academic and industry-based researchers working in this field.
Acknowledgements
We would like to acknowledge the efforts of Elizabeth Shaw, Balaji Srinivasan, Paul Morabito, Hiren Joshi, Abi Manoharan, Diane Sexton, Michel Poelman, Jian Wang, Ray Oreo, Jie Wang, Sophia He, Daryl Radivojevic, Nicole Steindl, Ted Zhou, Daniel Burr, Edmond Breen, Mervyn Thomas, Wendy Holstein and input from users in the Discovery and Technology teams for the development of the BioinformatIQ information management system.
Financial & competing interests disclosure
Marc R Wilkins and F Keith Junius hold shares in Proteome Systems Ltd. Proteome Systems Ltd owns the intellectual property developed in the construction of the proteomic information management system described in this paper. Paul Bizannes and Philip E Doggett are current employees of Proteome Systems Ltd. All other authors are former employees of Proteome Systems Ltd. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
Notes
*Clinical source data are numerous and varied. This table shows just a few categories of these data; some clinical research forms hold over 100 different patient variables. Individual patient variables hold data in many forms, including numerical values, categories, short text answers and free-form, long-text descriptions.