Abstract
Testing multiple null hypotheses in two stages, to decide which of them can be rejected or accepted at the first stage and which should be followed up for further testing with additional observations, is important in many scientific studies. We develop two procedures, each with two different combination functions, Fisher's and Simes’, to combine p-values from the two stages, given prespecified boundaries on the first-stage p-values in terms of the false discovery rate (FDR), while controlling the overall FDR at a desired level. The FDR control is proved when the pairs of first- and second-stage p-values are independent and those corresponding to the null hypotheses are identically distributed as a pair (p_1, p_2) satisfying the p-clud property. Simulations show that our two-stage procedures (1) can yield significant power improvements over the first-stage Benjamini–Hochberg (BH) procedure, relative to the improvement offered by the ideal BH procedure that one would have used had the second-stage data been available for all the hypotheses, and can continue to control the FDR under some dependence situations, and (2) can offer considerable cost savings compared to the ideal BH procedure. The procedures are illustrated with a real gene expression dataset. Supplementary materials for this article are available online.
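The building blocks named in the abstract can be sketched in a few lines of Python. This is only an illustration of the standard definitions of Fisher's and Simes' combination functions for two p-values and of the single-stage BH step-up procedure; it is not the two-stage procedure developed in the article, which additionally relies on first-stage boundaries that are not reproduced here. The function names are hypothetical, and the closed form used for Fisher's combination assumes independent, uniformly distributed null p-values.

```python
import math


def fisher_combine(p1, p2):
    """Fisher's combination of two p-values (hypothetical helper).

    For independent uniform null p-values, P(U1 * U2 <= c) = c * (1 - ln c),
    which equals the chi-square (4 df) tail of -2 * (ln p1 + ln p2).
    """
    c = p1 * p2
    if c == 0.0:
        return 0.0
    return c * (1.0 - math.log(c))


def simes_combine(p1, p2):
    """Simes' combination of two p-values: min(2 * smaller, larger)."""
    return min(2.0 * min(p1, p2), max(p1, p2))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Single-stage BH step-up procedure.

    Returns the sorted indices of the rejected hypotheses.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p-value is at or below rank * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    print(fisher_combine(0.05, 0.05))
    print(simes_combine(0.01, 0.5))
    print(benjamini_hochberg([0.6, 0.001, 0.041, 0.008, 0.039], alpha=0.05))
```

In a two-stage setting, a combined p-value of this kind would be computed only for hypotheses that are neither rejected nor accepted at the first stage; how the boundaries and critical values are chosen is the subject of the article itself.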
Acknowledgments
This work is based on Jingjing's PhD thesis under the supervision of Sarkar. The research of Sarkar and Guo was supported by NSF Grants DMS-1006344, 1309273 and DMS-1006021, 1309162, respectively. We thank the AE and two referees, whose comments led to a much improved presentation.
Notes
The H function for Simes' combination function is also given in an unpublished manuscript, Chen, J., Sarkar, S. K. and Bretz, F. (2011). “Finding Critical Values with Prefixed Early Stopping Boundaries and Controlled Type I Error for Two-Stage Combination Test.”