Abstract
Building a subset model by backward stepwise logistic regression for the purpose of prediction relies on two separate criteria: a selection criterion and a stopping criterion. SAS/IML programs were written to run Monte Carlo simulations to determine the best α level for the stopping criterion under three selection criteria: the log-likelihood ratio statistic (LR), the score statistic (SC), and Wald's statistic (WD). Performance was evaluated using Efron's (1986) estimated true error rate of prediction. In our study, the best α varied between 0.24 and 0.40. For the LR and SC selection criteria, the best α decreased significantly as the number of predictor variables increased, but for WD it did not. Our overall recommendation is to use LR or SC as the selection criterion with a stopping criterion of 0.20 ≤ α ≤ 0.40, with the further refinement that the fewer the predictor variables, the larger the α level should be.
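To make the interplay of the two criteria concrete, the following is a minimal sketch of a single backward-elimination step using the LR selection criterion. It is an illustration only, not the SAS/IML code used in the study. It assumes that the maximized log-likelihoods of the current model and of each model refitted without one predictor are already available, and that each predictor contributes one degree of freedom. The function names `lr_p_value` and `backward_step` are hypothetical.

```python
import math


def lr_p_value(loglik_full, loglik_reduced):
    """P-value of the likelihood-ratio test for dropping one predictor (1 df assumed)."""
    lr_stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lr_stat / 2.0))


def backward_step(loglik_full, loglik_without, alpha):
    """Return the predictor to drop at this step, or None to stop.

    loglik_without maps each candidate predictor name to the maximized
    log-likelihood of the model refitted without that predictor.
    """
    if not loglik_without:
        return None
    p_values = {name: lr_p_value(loglik_full, ll)
                for name, ll in loglik_without.items()}
    # Selection criterion: the least significant predictor is the candidate.
    weakest = max(p_values, key=p_values.get)
    # Stopping criterion: drop it only if its p-value exceeds alpha.
    return weakest if p_values[weakest] > alpha else None
```

Under this rule a larger α makes removal harder, so the final model retains more predictors. Repeating the step, with the models refitted after each removal, until it returns `None` gives the full backward procedure.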
Acknowledgments
This study was supported by a postgraduate scholarship from the Natural Sciences and Engineering Research Council (Canada) and by grants from the National Cancer Institute of Canada (015046) and the Natural Sciences and Engineering Research Council (Canada) (9280-03). We thank the editor and referees for their comments, which significantly improved the quality of this article.
Notes
Lack-of-fit test: F(22, 5) = 0.67 (p-value = 0.7683).