ABSTRACT
Clinical trials typically feature two types of multiple inference: testing more than one null hypothesis and testing at multiple time points. These modes of multiplicity are closely related mathematically but distinct statistically and philosophically. Regulatory agencies require strong control of the family-wise error rate (FWER), the risk of falsely rejecting any null hypothesis at any analysis. In group sequential designs, the correlations between test statistics at interim analyses and the final analysis are therefore routinely used to obtain less conservative critical values. However, the same type of correlation between different comparisons, endpoints or sub-populations is less commonly exploited. As a result, commonly applied procedures often control the FWER conservatively in practice.
Repeated testing of the same null hypothesis may give changing results, as when the hypothesis is rejected at an interim analysis but accepted at the final analysis. The mathematically correct overall rejection is then at odds with an inference-theoretic approach and with common sense. We discuss these two issues, incorporating correlations and interpreting time-changing conclusions, and provide case studies where power can be increased while adhering to sound statistical principles.
Acknowledgments
We would like to thank two anonymous referees and the Associate Editor for providing thoughtful suggestions that greatly improved the content of the paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed on the publisher’s website.