
Predicting Patient Survival from Proteomic Profile using Mass Spectrometry Data: An Empirical Study

Pages 485-498 | Received 18 Jul 2011, Accepted 25 Oct 2011, Published online: 20 Nov 2012

Abstract

Predicting the survival times of patients from the proteomic profiles of bodily fluids, such as plasma and serum, has been of interest in biomedical research. In this article, we address this problem for patient serum, using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry data from non-small cell lung cancer (NSCLC) patients. Because the number of features in a mass spectrum is much larger than the study sample size, fitting a traditional linear regression model of survival time on a large number of proteomic features is not feasible. Hence, we consider latent factor and regularized (penalized) methods for fitting such models in order to predict patient survival from the mass spectrometry features. Extensive numerical studies involving both simulated and real mass spectrometry data are used to compare four popular regression methods on processed spectra: partial least squares (PLS), sparse partial least squares (SPLS), the least absolute shrinkage and selection operator (LASSO), and elastic net regularization. Right censoring is handled through residual-based multiple imputation. The results, measured by the mean squared error of fit and of prediction, vary considerably with the method used, its tuning parameters, and the features selected after preprocessing. Overall, the more complex methods, such as the elastic net and SPLS, perform better provided their operational parameters are chosen carefully via cross validation. For survival time prediction, we recommend the elastic net based on a selected set of features.
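The abstract compares penalized regressions (LASSO and elastic net) whose tuning parameters are chosen by cross validation when features far outnumber samples. As a minimal sketch, assuming a continuous response (for example, log survival times after censored values have already been imputed) and plain NumPy, the code below fits the elastic net by cyclic coordinate descent in the glmnet-style parameterization and scores a penalty value by K-fold cross-validated mean squared error of prediction. The function names `elastic_net` and `cv_mse`, the simulated data, and the penalty grid are hypothetical illustrations, not the authors' code; the residual-based imputation step and the PLS/SPLS methods are not shown.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net(X, y, lam, alpha=0.5, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the elastic net objective
        (1/(2n)) ||y - b0 - X b||^2 + lam * (alpha ||b||_1 + (1 - alpha)/2 ||b||^2).
    alpha = 1 gives the LASSO, alpha = 0 gives ridge regression.
    The intercept b0 is unpenalized and handled by centering X and y."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n     # (1/n) x_j^T x_j for each feature
    beta = np.zeros(p)
    resid = yc.copy()                       # current residual yc - Xc @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:            # constant feature: leave at zero
                continue
            old = beta[j]
            # (1/n) x_j^T (residual with feature j's contribution added back)
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            if new != old:
                resid -= Xc[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:                # stop once no coefficient moves appreciably
            break
    return y_mean - x_mean @ beta, beta


def cv_mse(X, y, lam, alpha=0.5, k=5, seed=0):
    """Average K-fold cross-validated mean squared error of prediction for one (lam, alpha)."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        b0, b = elastic_net(X[train_idx], y[train_idx], lam, alpha)
        pred = b0 + X[test_idx] @ b
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Hypothetical simulated data with many more features than samples (p >> n).
    rng = np.random.default_rng(1)
    n, p = 40, 200
    X = rng.normal(size=(n, p))
    true_beta = np.zeros(p)
    true_beta[:5] = 2.0
    y = X @ true_beta + rng.normal(scale=0.5, size=n)

    # Choose the penalty from a small illustrative grid by cross validation.
    grid = [0.05, 0.1, 0.2, 0.5, 1.0]
    scores = {lam: cv_mse(X, y, lam) for lam in grid}
    best = min(scores, key=scores.get)
    b0, beta = elastic_net(X, y, best)
    print("chosen lambda:", best, "nonzero features:", np.flatnonzero(beta))
```

Setting `alpha=1.0` turns the same routine into the LASSO, so one function covers two of the four methods compared. In a fuller analysis one would typically cross-validate over `alpha` as well as `lam`, since the abstract stresses that results depend heavily on these operational parameters.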

[Supplementary materials are available for this article. Go to the publisher's online edition of Communications in Statistics—Simulation and Computation for the following free supplemental resource: a file containing tables and figures showing the mean squared error of fit in a simulated model, the estimated mean squared error of fit for the Milan NSCLC data, the median value of the optimum number of steps or number of components based on minimization of EMSEP, the mean squared error of prediction in a simulated model, observed versus fitted values in the Milan NSCLC data, and feature identification.]


Acknowledgments

This work was supported by grants from the National Science Foundation (NSF-DMS-0805559 to Susmita Datta) and the National Institutes of Health (NIH-CA133844 to Susmita Datta). We thank David P. Carbone for kindly providing us the Milan NSCLC Data. We thankfully acknowledge Johannes Voortman and Thang V. Pham for graciously sharing the Netherlands NSCLC Data with us. We thank the referees for their constructive comments.
