Predicting Patient Survival from Proteomic Profile using Mass Spectrometry Data: An Empirical Study

Farida Mostajabi Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, Kentucky, USA

Somnath Datta Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, Kentucky, USA

Susmita Datta Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, Kentucky, USA

Abstract

Predicting the survival times of patients from the proteomic profile of bodily fluids, such as plasma and serum, has been of interest in biomedical research. In this article, we consider this problem for patient serum, using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) data from non-small cell lung cancer (NSCLC) patients. Because the number of features in a mass spectrum is much larger than the study sample size, traditional linear regression modeling of survival times on a large number of proteomic features is not feasible. Hence, we consider latent factor and regularized/penalized methods for fitting such models in order to predict patient survival from the mass spectrometry features. Extensive numerical studies involving both simulated and real mass spectrometry data are used to compare four popular regression methods, namely partial least squares (PLS), sparse partial least squares (SPLS), the least absolute shrinkage and selection operator (LASSO), and elastic net regularization, on processed spectra. Right censoring is handled through residual-based multiple imputation. The results, measured by mean squared error of fit and prediction, vary considerably with the method used, its tuning parameters, and the features selected after preprocessing. Overall, more complex methods such as the elastic net and SPLS perform better, provided the operational parameters are chosen carefully via cross-validation. For survival time prediction, we recommend using the elastic net based on a selected set of features.
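To make the recommended approach concrete, the following is a minimal sketch of an elastic net fit with its penalty chosen by cross-validation in a setting with more features than samples. It is not the authors' implementation: it ignores censoring and the multiple-imputation step, assumes centered data with no intercept, and uses a plain coordinate-descent solver. All names (`elastic_net`, `cv_select`, `lam`, `alpha`) and the toy data are hypothetical.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator that produces exact zeros under an L1 penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Assumes centered X and y (no intercept); alpha=1 gives the LASSO."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def mse(X, y, beta):
    """Mean squared error of the linear predictor X beta."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
               for row, yi in zip(X, y)) / len(y)

def cv_select(X, y, lams, alpha, k=5):
    """Return the penalty in `lams` with the smallest k-fold prediction error."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    best = None
    for lam in lams:
        err = 0.0
        for test_idx in folds:
            held = set(test_idx)
            Xtr = [X[i] for i in range(n) if i not in held]
            ytr = [y[i] for i in range(n) if i not in held]
            beta = elastic_net(Xtr, ytr, lam, alpha)
            err += mse([X[i] for i in test_idx], [y[i] for i in test_idx], beta)
        if best is None or err < best[1]:
            best = (lam, err)
    return best[0]

# Toy data (hypothetical): 20 samples, 30 features, only the first two matter.
random.seed(0)
n, p = 20, 30
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + random.gauss(0, 0.1) for row in X]
alpha = 0.5
# Smallest penalty at which every coefficient is shrunk to zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p)) / alpha
grid = [lam_max * f for f in (0.5, 0.1, 0.02)]
best_lam = cv_select(X, y, grid, alpha)
beta_hat = elastic_net(X, y, best_lam, alpha)
```

In this sketch `alpha` mixes the L1 and L2 penalties and is held fixed, while only `lam` is tuned; in practice both would typically be searched over a grid, and the response would first be completed for censored observations.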

[Supplementary materials are available for this article. Go to the publisher's online edition of Communications in Statistics—Simulation and Computation for the following free supplemental resource: a file containing tables and figures showing the mean squared error of fit in a simulated model, the estimated mean squared error of fit for the Milan, NSCLC data, median value of the optimum number of steps or number of components based on minimization of EMSEP, mean squared error of prediction in a simulated model, observed versus fitted values in Milan NSCLC data and feature identification.]

Acknowledgments

This work was supported by grants from the National Science Foundation (NSF-DMS-0805559 to Susmita Datta) and the National Institutes of Health (NIH-CA133844 to Susmita Datta). We thank David P. Carbone for kindly providing us with the Milan NSCLC data. We gratefully acknowledge Johannes Voortman and Thang V. Pham for graciously sharing the Netherlands NSCLC data with us. We thank the referees for their constructive comments.
