Original Articles

A Predictive Risk Probability Approach for Microarray Data with Survival as an Endpoint

Pages 841-852 | Received 21 Aug 2007, Accepted 05 Feb 2008, Published online: 10 Sep 2008
 

Abstract

Gene expression profiling has played an important role in cancer risk classification and has shown promising results. Because gene expression profiling often involves selecting a set of top-ranked genes for analysis, it is important to evaluate how modeling performance varies with the number of top-ranked genes incorporated in the model. We used a colon cancer data set collected at Moffitt Cancer Center as an example and ranked genes based on the univariate Cox proportional hazards model. Sets of top-ranked genes were selected for evaluation by choosing the top k ranked genes for k = 1 to 12,500. The analysis indicated considerable variation in classification outcomes as the number of top-ranked genes changed. We developed a predictive risk probability approach to accommodate this variation by identifying a range for the number of top-ranked genes. For each number of top-ranked genes, the procedure classifies each patient as high risk (score = 1) or low risk (score = 0). The classifications are then averaged, giving a risk score between 0 and 1 and thus a ranking of each patient's need for further treatment. We applied this approach to the colon cancer data set and demonstrated its strength by three criteria. First, a univariate Cox proportional hazards model showed that the predictive risk probability classification was highly statistically significant (log-rank χ² statistic = 110, p-value < 10⁻¹⁶). Second, a survival tree model used the risk probability to partition patients into five risk groups with well-separated survival curves (log-rank χ² statistic = 215). In addition, the risk group status identified a small set of risk genes that may be practical for biological validation. Third, a resampling analysis of the risk probability suggested that the variation pattern of the log-rank χ² in the colon cancer data set was unlikely to be caused by chance.
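The averaging step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how patients are classified for a given k, so the classifier here is an assumption: `classify_top_k` is a hypothetical stand-in that sums the signed expression of the top k genes and splits patients at the median. The gene ranking (`ranked_genes`) and effect directions (`signs`) are assumed to be precomputed, for example from the univariate Cox fits, and the toy data are invented purely for illustration.

```python
from statistics import median


def classify_top_k(expr, ranked_genes, signs, k):
    """Hypothetical per-k classifier (an assumption, not the authors' method).

    expr: one list of gene expression values per patient.
    ranked_genes: gene indices, best-ranked first (assumed precomputed).
    signs: +1 or -1 per gene, the assumed direction of its effect on risk.
    Returns 1 (high risk) or 0 (low risk) for each patient.
    """
    top = ranked_genes[:k]
    scores = [sum(signs[g] * patient[g] for g in top) for patient in expr]
    cutoff = median(scores)  # median split is an illustrative choice
    return [1 if s > cutoff else 0 for s in scores]


def risk_probability(expr, ranked_genes, signs, k_values):
    """Average the 0/1 classifications over all k in k_values."""
    totals = [0] * len(expr)
    for k in k_values:
        labels = classify_top_k(expr, ranked_genes, signs, k)
        for i, label in enumerate(labels):
            totals[i] += label
    return [t / len(k_values) for t in totals]


# Toy data: 4 patients x 3 genes (invented values).
expr = [
    [2.0, 3.0, 0.5],
    [0.1, 0.2, 0.3],
    [1.5, 0.1, 2.0],
    [0.2, 1.2, 0.1],
]
ranked_genes = [0, 2, 1]  # assumed ranking, best gene first
signs = [1, -1, 1]        # assumed effect directions

probs = risk_probability(expr, ranked_genes, signs, k_values=[1, 2, 3])
print(probs)
```

In this toy example the patient at index 2 is classified as high risk for every k and so receives a risk probability of 1, while the patient at index 3 is never classified as high risk and receives 0. The other two patients fall in between, which is the kind of graded ranking the abstract describes.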

ACKNOWLEDGMENTS

This work was supported by grants from the National Institutes of Health (2P30CA76292-08, R01CA112215-03, and R01CA098522-05). The authors acknowledge the use of the services provided by Research Computing, University of South Florida.

The views presented in this article are those of the authors and do not necessarily represent those of the U.S. Food and Drug Administration.

Notes

Censoring rate = 58%.
