ABSTRACT
We propose a sequential, anytime-valid method to test the conditional independence of a response Y and a predictor X given a random vector Z. The proposed test is based on e-statistics and test martingales, which generalize likelihood ratios and allow valid inference at arbitrary stopping times. In accordance with the recently introduced model-X setting, our test depends on the availability of the conditional distribution of X given Z, or at least a sufficiently sharp approximation thereof. Within this setting, we derive a general method for constructing e-statistics for testing conditional independence, show that it leads to growth-rate optimal e-statistics for simple alternatives, and prove that our method yields tests with asymptotic power one in the special case of a logistic regression model. A simulation study demonstrates that the approach is competitive in power with established sequential and nonsequential testing methods, and that it is robust to violations of the model-X assumption. Supplementary materials for this article are available online.
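To make the idea of a model-X e-statistic concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical toy model: Z is standard normal, X is binary with known conditional law P(X = 1 | Z = z) = sigmoid(z), and the working alternative for the response is a logistic model P(Y = 1 | X = x, Z = z) = sigmoid(b x + z) with a fixed, user-chosen coefficient b. All function names and parameters (`e_statistic`, `sequential_test`, `b`, `beta`) are assumptions made for illustration. The e-statistic is the likelihood of Y under the working alternative divided by that likelihood averaged over the known law of X given Z; its running product is a test martingale under conditional independence.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def bernoulli_pmf(y, p):
    # Probability that a Bernoulli(p) variable equals y (y is 0 or 1).
    return p if y == 1 else 1.0 - p


def e_statistic(x, y, z, b):
    # Assumed known model-X law (hypothetical): P(X = 1 | Z = z) = sigmoid(z).
    px1 = sigmoid(z)
    # Numerator: working alternative q(y | x, z) = Bernoulli(sigmoid(b*x + z)).
    num = bernoulli_pmf(y, sigmoid(b * x + z))
    # Denominator: q(y | x', z) averaged over x' drawn from P(X | Z = z).
    den = (px1 * bernoulli_pmf(y, sigmoid(b + z))
           + (1.0 - px1) * bernoulli_pmf(y, sigmoid(z)))
    return num / den


def sequential_test(beta, b=1.0, alpha=0.05, max_n=2000, seed=0):
    # Simulate the toy model with true coefficient beta (beta = 0 means
    # Y is conditionally independent of X given Z) and multiply the
    # e-statistics; stop as soon as the product reaches 1 / alpha.
    rng = random.Random(seed)
    log_martingale = 0.0
    threshold = math.log(1.0 / alpha)
    for n in range(1, max_n + 1):
        z = rng.gauss(0.0, 1.0)
        x = 1 if rng.random() < sigmoid(z) else 0
        y = 1 if rng.random() < sigmoid(beta * x + z) else 0
        log_martingale += math.log(e_statistic(x, y, z, b))
        if log_martingale >= threshold:
            return True, n  # reject conditional independence at time n
    return False, max_n  # no rejection within max_n observations
```

In this sketch, the conditional expectation of the e-statistic given Y and Z equals one whenever X is conditionally independent of Y given Z, whatever the true law of Y given Z. By Ville's inequality, the probability that the running product ever reaches 1/alpha is then at most alpha, which is what permits stopping at arbitrary times.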
Acknowledgments
We are grateful to Aaditya Ramdas, Yaniv Romano, and three anonymous referees for their helpful feedback on this article. Part of the computations were performed on UBELIX (https://ubelix.unibe.ch/), the HPC cluster of the University of Bern.
Notes
1 E-statistics are commonly known as e-variables; we use the former to stress data dependence.