Editorial

How could IonStar challenge the current status quo of quantitative proteomics in large sample cohorts?

Pages 541-543 | Received 31 Jul 2017, Accepted 15 Jun 2018, Published online: 26 Jun 2018

1. Current status of quantitative proteomics in large cohorts

Quantitative proteomics is a powerful tool for investigating global protein changes in clinical and pharmaceutical cohorts. To obtain reliable quantitative results, several key features are desirable: (1) high quantitative accuracy and precision; (2) large sampling capacity in a single batch; (3) in-depth and reproducible protein measurement (i.e. high proteome coverage and low missing data levels); (4) low false-positive discovery of significantly altered proteins. Because of their theoretically unlimited sampling capacity, label-free approaches appear to be a more logical choice than labeling approaches for large-cohort proteomics analysis[Citation1]. Based on the source of quantitative information, label-free methods can be classified into MS1- and MS2-based methods. MS2-based methods typically rely on Data-Dependent Acquisition (DDA) for quantification, representative examples being spectral counting (SpC) and MS2 ion intensities[Citation2]. However, as the number of samples increases, MS2-DDA quantification usually suffers from a drastic decrease in the number of proteins that are reproducibly quantified in all samples, owing to the stochastic nature of DDA[Citation3] as well as routinely practiced DDA features that increase identification depth, such as dynamic exclusion[Citation4]. MS2-based Data-Independent Acquisition (MS2-DIA), which acquires data based on relatively wide MS2 windows, has emerged as a promising alternative to MS2-DDA to alleviate missing data and enhance quantitative quality in large sample cohorts. Nonetheless, challenges remain, including limited depth of identification/quantification[Citation5] and a lack of publicly accessible measures for false discovery rate estimation[Citation6]. Progress has been made in improving the performance of MS2-DIA methods, e.g. spectral library-free peptide identification (e.g. PECAN, DIA-Umpire, Spectronaut™ Pulsar) and incorporation of MS1 information for quantification (e.g. DIA-Umpire, Skyline), yet these methods warrant further evaluation and application[Citation5,Citation7].
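A back-of-the-envelope illustration, not taken from the cited studies, may help show why the set of proteins quantified in every sample shrinks as a cohort grows. Assume, purely for illustration, that a given protein is quantified in each run independently with the same probability p, and that the cohort has n runs; both symbols are introduced here, and real DDA sampling is neither independent between runs nor uniform across proteins.

```latex
P(\text{quantified in all } n \text{ runs}) = p^{\,n},
\qquad
0.95^{20} \approx 0.36,
\qquad
0.95^{100} \approx 0.006
```

Under this toy model, even a 95% per-run detection rate leaves only about a third of such proteins complete across 20 runs, and well under 1% across 100 runs.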

MS1-based methods, in comparison, acquire quantitative information from high-resolution MS1 measurement, while MS2 information is employed merely for assigning peptide identities to individual MS1 quantitative features. Because the acquisition and calculation of MS1 ion intensities or peak areas under the curve (AUCs) are completely independent of MS2-DDA, the impact of MS2 stochasticity on peptide/protein quantification can be minimized. By applying chromatographic alignment to correct inter-run deviation of peptide retention time (RT), together with feature propagation strategies such as the accurate mass and time tag (AMT) approach[Citation8] to infer peptide identities across the entire dataset, MS1-based methods hold great potential for achieving sensitive and reproducible protein quantification in sizable sample sets[Citation9]. Prominent examples of commercial and open-source MS1-based methods include Proteome Discoverer, MaxQuant, OpenMS, Skyline, PEPPeR, Census, Superhirn, Dante and Rosetta Elucidator[Citation10]. However, it has also been shown that the missing data problem still prevails, especially in large-cohort analysis, where the gain in quantified proteins from larger sample sets can readily be canceled out by a snowballing missing data problem rooted in suboptimal reproducibility of the experimental procedures and of the algorithms for feature generation and processing[Citation11]. More recently, a few MS1-based quantitative packages (e.g. Progenesis QI, DeMix-Q, FeatureFinderIdentification) have reported low-missing-data quantification through improved feature detection and/or propagation algorithms; nonetheless, these packages either suffer from suboptimal data quality rooted in insufficient feature quality control or have not been demonstrated in large-scale analyses (e.g. >20 samples)[Citation12–Citation14]. Overall, these problems constitute a dilemma for comprehensive and reliable proteomics analysis of large sample cohorts.

2. Principles and components of IonStar

IonStar is an MS1-based quantitative method devised for large-cohort proteomics analysis, which substantially alleviates issues associated with quantitative precision, missing data, and false-positive discovery of protein changes[Citation9,Citation15]. IonStar features a series of experimental procedures designed to ensure efficient, consistent and extensive peptide production as well as sensitive and robust liquid chromatography-mass spectrometry (LC-MS) analysis across many samples, plus a data processing pipeline to achieve precise and reproducible protein measurement in large sample sets. The experimental procedures start with a surfactant cocktail-aided extraction/precipitation/on-pellet digestion (SEPOD) protocol (Shen et al., manuscript in revision), which uses a high-concentration surfactant cocktail for exhaustive protein extraction, extensive protein denaturation, and effective removal of detrimental matrix components. Peptides are separated with a high-capacity and reproducible reversed-phase nano-LC approach on a 100-cm-long column with a large-i.d. trap, and are detected by an ultra-high-field (UHF) Orbitrap mass spectrometer at high MS1 resolution (e.g. 120k). The detailed setup and performance evaluation of the IonStar experimental procedures are described by Shen et al.[Citation9]. Overall, these procedures set a solid foundation for high-quality MS1-based quantification in large sample cohorts.

For data processing, IonStar adopts a profile-based workflow, which detects features after alignment, as opposed to feature-based workflows (e.g. OpenMS, MaxQuant, PEPPeR, Superhirn). IonStar employs the ChromAlign algorithm for global RT correction across the dataset[Citation16]. Compared with conventional one-step, two-dimensional alignment algorithms, which often consider m/z, RT, and/or a reduced isotopic envelope, ChromAlign conducts a two-step, three-dimensional alignment by including high-resolution MS1 mass peaks of high-abundance signals as an additional dimension, substantially improving the efficiency and reproducibility of RT correction in large-scale LC-MS runs derived from complex peptide mixtures. In our results, ChromAlign decreased RT deviation by >97% in a 20-sample dataset, whereas another prevalent algorithm decreased RT deviation by only ~50%[Citation15]. This alignment strategy significantly improves the accuracy of quantitative feature assignment and enables high-quality feature generation using a straightforward Direct Ion-Current Extraction (DICE) method (similar strategies are referred to as extracted ion chromatogram (XIC)-based feature detection[Citation10]), which employs narrow m/z-RT windows, enabled by high-resolution MS1 detection, to extract the ion currents of a precursor and robustly propagate the feature across the entire dataset. The generated feature sets, termed “frames”, record the corresponding MS1 ion currents and MS2 scan numbers from every run, which are then matched with database search results to retrieve peptide IDs. Compared with popular Peak Property-Based (PPB) methods (or scan-based feature detection[Citation10]), which generate features individually in each run and then propagate them using AMT-like approaches plus additional requirements for the peak (e.g. peak shape, isotopic envelope), the DICE method, in conjunction with ChromAlign, proves to be more sensitive (i.e. it generates more valid quantitative features) and markedly reduces mismatching and missing data on the feature level[Citation15]. These two steps of the IonStar data processing pipeline have been developed and incorporated into the SIEVE package from Thermo Scientific. However, because only limited quality control is performed during feature generation, low-quality features caused by chemical/instrumental noise and co-eluting peaks may be included, compromising quantitative quality and subsequent data interpretation. Therefore, post-feature-generation quality control measures are necessary to remove outliers in quantification. In IonStar, a multivariate mean-variation modeling algorithm, PCOut[Citation17], is adopted to exclude peptides containing low-quality quantitative features by examining the principal component properties of the quantitative ratios between experimental conditions and control for peptides assigned to the same protein. The PCOut algorithm is particularly effective for high-dimensional data such as proteomics quantification results involving multiple conditions, which are common in clinical and pharmaceutical studies. Finally, IonStar also provides a number of options for data normalization and aggregation, which can be evaluated and selected by users.
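The idea of extracting ion current within a narrow m/z-RT window can be illustrated with a minimal Python sketch. This is not the IonStar or SIEVE implementation: the function name `extract_ion_current`, the in-memory scan layout, and the default tolerances (10 ppm, 0.5 min) are all assumptions chosen for illustration, and the runs are assumed to be RT-aligned already.

```python
from typing import List, Tuple

# Hypothetical scan layout: (retention time in min, [(m/z, intensity), ...])
Scan = Tuple[float, List[Tuple[float, float]]]


def extract_ion_current(
    scans: List[Scan],
    target_mz: float,
    rt_center: float,
    ppm_tol: float = 10.0,       # placeholder m/z tolerance, not from the source
    rt_half_width: float = 0.5,  # placeholder RT half-window in minutes
) -> Tuple[List[Tuple[float, float]], float]:
    """Sum MS1 intensity inside a narrow m/z-RT window and integrate it.

    Returns the extracted trace as (rt, ion current) pairs and its
    trapezoidal area under the curve.
    """
    mz_delta = target_mz * ppm_tol * 1e-6
    trace = []
    for rt, peaks in sorted(scans, key=lambda s: s[0]):
        if abs(rt - rt_center) > rt_half_width:
            continue  # scan lies outside the RT window
        current = sum(i for mz, i in peaks if abs(mz - target_mz) <= mz_delta)
        trace.append((rt, current))

    # Trapezoidal integration over consecutive points of the trace
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(trace, trace[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return trace, area
```

Because the same window is applied to every aligned run, each run yields a value for the frame even when no MS2 spectrum was triggered there, which is the property that keeps feature-level missing data low.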

3. Significantly improved proteomic quantification using IonStar

So far, IonStar, or its underlying ion-current-based concept, has been applied to the quantitative proteomics analysis of a variety of biological systems and has led to the publication of over 20 peer-reviewed papers. Most recently, we have shown that in an N = 100 biological sample set, IonStar achieved an excellent depth of >7,000 unique protein groups with ~8% intragroup variation for technical replicates and an extremely low missing data rate (~0.5% on the protein level), in contrast to a 10–20% missing data rate in 10 to 20 replicates using prevalent MS1-based methods[Citation15]. To provide a more comprehensive comparison between IonStar and other popular methods, we prepared a spike-in sample set containing a human protein mixture as a constant background (>90% of total amount) and five different levels of an E. coli protein mixture as the subset of proteins that changes (3%, 4.5%, 6%, 7.5%, and 9%, N = 4 per spike-in level). The quantitative performance of IonStar was benchmarked on this human-E. coli sample set against a number of label-free quantitative methods, including SpC, Proteome Discoverer 2.1, OpenMS, and MaxQuant. We found that IonStar offered far superior quantification of the human-E. coli proteome compared with the other methods, as shown by much lower missing data (0.1% vs. 16.5–43.4% on the protein level), better quantitative accuracy/precision (~5% vs. 10–19% intra-group variation), the widest protein abundance range, and the highest sensitivity/specificity for identifying altered proteins (<5% vs. 8–34% false altered-protein discovery rate)[Citation15]. These results suggest the great potential of IonStar to address the current dilemma of quantitative proteomics for in-depth and reliable large-cohort analysis.
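For readers who want to compute comparable figures on their own data, the three metrics quoted above can be sketched as follows. The function names, the dictionary-based table layout, and the exact definitions are assumptions made for illustration (in particular, a "false" altered protein is taken here to be a constant-background human protein flagged as changed); they are not the evaluation code behind the cited benchmark.

```python
import math
from statistics import mean, stdev


def missing_rate(table):
    """Fraction of protein-by-sample cells with no value (None or NaN).

    `table` maps a protein ID to its list of per-sample intensities.
    """
    cells = [v for row in table.values() for v in row]
    missing = sum(1 for v in cells if v is None or math.isnan(v))
    return missing / len(cells)


def intragroup_cv(values):
    """Coefficient of variation (%) for one protein within one group."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if len(present) < 2:
        return float("nan")  # CV is undefined with fewer than two values
    return 100.0 * stdev(present) / mean(present)


def false_altered_discovery_rate(flagged, species):
    """Share of proteins flagged as altered that come from the constant
    human background (assumed definition for a spike-in design)."""
    if not flagged:
        return 0.0
    false_hits = sum(1 for p in flagged if species[p] == "human")
    return false_hits / len(flagged)
```

Reporting the missing rate over all protein-by-sample cells, rather than per run, is what makes the figure sensitive to cohort size.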

4. Future perspectives of IonStar

For the IonStar pipeline, there is still much room for improvement, e.g. in proteome coverage. Over 50% of all ‘frames’ generated during the DICE procedure remain unannotated because the MS2-DDA approach yields no valid peptide IDs for them; retrieving the IDs of these unexploited ‘frames’ would therefore be an effective strategy to boost the quantitative depth of IonStar. We anticipate that this can be achieved by constructing an MS1-peptide ID library containing high-accuracy m/z and standardized RT values of peptides (standardized by either endogenous or spiked-in peptide standard sets) and developing an accurate and robust algorithm to match the library entries to unassigned frames. The MS1-peptide ID library can be established either by experimental measurement (similar to building the spectral libraries used in SWATH[Citation18]) or by in silico peptide m/z-RT prediction[Citation19]. Moreover, we will optimize a set of algorithms to provide feature assignment with high sensitivity and specificity. Success of this strategy will further enhance the capability of IonStar for more in-depth analysis of large sample cohorts.
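The proposed library matching has not been implemented in the work described here, so the following is only a speculative sketch of what the simplest form of such a matcher might look like. The types `LibraryEntry` and `Frame`, the function `match_frame`, and the tolerance defaults (5 ppm, 1 min) are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class LibraryEntry(NamedTuple):
    peptide: str
    mz: float
    rt: float  # standardized retention time, in minutes


class Frame(NamedTuple):
    frame_id: int
    mz: float
    rt: float  # standardized retention time, in minutes


def match_frame(
    frame: Frame,
    library: List[LibraryEntry],
    ppm_tol: float = 5.0,  # placeholder tolerance, not from the source
    rt_tol: float = 1.0,   # placeholder tolerance, not from the source
) -> Optional[str]:
    """Return the peptide whose library m/z and RT both fall within
    tolerance of the frame, or None when there is no candidate or more
    than one (ambiguous matches are rejected to favor specificity)."""
    hits = [
        entry
        for entry in library
        if abs(entry.mz - frame.mz) / entry.mz * 1e6 <= ppm_tol
        and abs(entry.rt - frame.rt) <= rt_tol
    ]
    return hits[0].peptide if len(hits) == 1 else None
```

Rejecting ambiguous matches trades sensitivity for specificity; a practical algorithm would more likely score candidates and control the error rate, which is the optimization the paragraph above anticipates.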

5. Conclusion

For clinical and pharmaceutical proteomics investigations, the capability of a proteomics method to reproducibly and accurately quantify proteins in large sample cohorts is a key prerequisite. Problems with proteome coverage and missing data rate that arise as the number of replicates increases must be addressed to ensure comprehensive and informative analysis. While most existing approaches fail to meet these demands, we have demonstrated that IonStar, a quantitative proteomics approach combining a suite of well-optimized experimental procedures and an ion current-based quantification pipeline, achieves accurate and precise proteome-wide quantification with extremely low levels of missing values and false-positive discovery of altered proteins, and thus markedly challenges the current status quo of quantitative proteomics. With the future implementation of a novel MS1-peptide ID library matching method, we believe that IonStar will be one of the methods of choice for low-cost, reliable and in-depth proteomics investigation of large cohorts.

Declaration of interest

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Additional information

Funding

This article was not funded.

References

  • Cox J, Hein MY, Luber CA, et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014;13:2513–2526.
  • Chen YY, Chambers MC, Li M, et al. IDPQuantify: combining precursor intensity with spectral counts for protein and peptide quantification. J Proteome Res. 2013;12:4111–4121.
  • Tabb DL, Vega-Montoto L, Rudnick PA, et al. Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J Proteome Res. 2010;9:761–776.
  • Zhang Y, Wen Z, Washburn MP, et al. Effect of dynamic exclusion duration on spectral count based quantitative proteomics. Anal Chem. 2009;81:6317–6326.
  • Navarro P, Kuharev J, Gillet LC, et al. A multicenter study benchmarks software tools for label-free proteome quantification. Nat Biotechnol. 2016;34:1130–1136.
  • Tsou CC, Avtonomov D, Larsen B, et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods. 2015;12:258–264.
  • Ting YS, Egertson JD, Bollinger JG, et al. PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat Methods. 2017;14:903–908.
  • Smith RD, Anderson GA, Lipton MS, et al. An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics. 2002;2:513–523.
  • Shen X, Shen S, Li J, et al. An IonStar experimental strategy for MS1 ion current-based quantification using ultrahigh-field Orbitrap: reproducible, in-depth, and accurate protein measurement in large cohorts. J Proteome Res. 2017;16:2445–2456.
  • Sandin M, Teleman J, Malmstrom J, et al. Data processing methods and quality control strategies for label-free LC-MS protein quantification. Biochim Biophys Acta. 2014;1844:29–41.
  • Bruderer R, Bernhardt OM, Gandhi T, et al. Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol Cell Proteomics. 2015;14:1400–1410.
  • Chawade A, Sandin M, Teleman J, et al. Data processing has major impact on the outcome of quantitative label-free LC-MS analysis. J Proteome Res. 2015;14:676–687.
  • Weisser H, Choudhary JS. Targeted feature detection for data-dependent shotgun proteomics. J Proteome Res. 2017;16:2964–2974.
  • Zhang B, Kall L, Zubarev RA. DeMix-Q: quantification-centered data processing workflow. Mol Cell Proteomics. 2016;15:1467–1478.
  • Shen X, Shen S, Li J, et al. IonStar enables high-precision, low-missing-data proteomics quantification in large biological cohorts. Proc Natl Acad Sci USA. 2018;115:4767–4776.
  • Sadygov RG, Maroto FM, Huhmer AF. ChromAlign: a two-step algorithmic procedure for time alignment of three-dimensional LC-MS chromatographic surfaces. Anal Chem. 2006;78:8207–8217.
  • Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Comput Stat Data Anal. 2008;52:1694–1711.
  • Schubert OT, Gillet LC, Collins BC, et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc. 2015;10:426–441.
  • Moruz L, Hoopmann MR, Rosenlund M, et al. Mass fingerprinting of complex mixtures: protein inference from high-resolution peptide masses and predicted retention times. J Proteome Res. 2013;12:5730–5741.
