1,768
Views
33
CrossRef citations to date
0
Altmetric
Theory and Methods

Metropolized Knockoff Sampling

, , &
Pages 1413-1427 | Received 05 Mar 2019, Accepted 30 Jan 2020, Published online: 17 Mar 2020
 

Abstract

Model-X knockoffs is a wrapper that transforms essentially any feature importance measure into a variable selection algorithm, which discovers true effects while rigorously controlling the expected fraction of false positives. A frequently discussed challenge to apply this method is to construct knockoff variables, which are synthetic variables obeying a crucial exchangeability property with the explanatory variables under study. This article introduces techniques for knockoff generation in great generality: we provide a sequential characterization of all possible knockoff distributions, which leads to a Metropolis–Hastings formulation of an exact knockoff sampler. We further show how to use conditional independence structure to speed up computations. Combining these two threads, we introduce an explicit set of sequential algorithms and empirically demonstrate their effectiveness. Our theoretical analysis proves that our algorithms achieve near-optimal computational complexity in certain cases. The techniques we develop are sufficiently rich to enable knockoff sampling in challenging models including cases where the covariates are continuous and heavy-tailed, and follow a graphical model such as the Ising model. Supplementary materials for this article are available online.

Supplementary Materials

Appendix: Appendix of the article. (.pdf file)

Division of Mathematical Sciences;

Acknowledgments

S. B. and E.C. would like to thank Yaniv Romano and Matteo Sesia for useful comments on an early version of this work. L. J. and W. W. would like to thank Jun Liu for fruitful discussions on MCMC.

Notes

1 See Appendix F.6 for a simulation study demonstrating the relationship between power and absolute correlation.

2 In the presence of a response Y, we also require X˜Y|X, which is easily satisfied by procedures that generate X˜ from X without looking at Y.

3 We use (W1|W2) to denote the conditional distribution of W 1 given W 2. We use subscript k to mean the vector with the kth coordinate removed, and 1:k to mean the first k coordinates of the vector. We use the subscript 1:0 to mean an empty vector.

4 More generally, we take as acceptance probability γ α with γ(0,1]. In this work, γ is set to 1 as default, except in Section 3.3 and Appendix F.2, which are cases where tuning γ is recommended.

5 The proposal distribution at step j only depends on the conditional density of (Xj|Xj,X1:(j1)*,X˜1:(j1)) which can be easily shown to be symmetric in the first j – 1 pairs.

6 In this section, when not explicitly specified, a variable is set to its observed value, for example, P(X1|X2=z2,X3,X˜1,X1*) is shorthand for P(X1=x1|X2=z2,X3=x3,X˜1=x˜1,X1*=x1*).

7 This simple fact follows from the running intersection property; we refer the reader to Lemma 1 in Appendix A.

8 A chordal graph is a graph such that any cycle of length 4 or larger has a chord.

9 The hardware varies across simulations, but each CPU is between 2.5 GHz and 3.3 GHz.

Additional information

Funding

E. C. was partially supported by the Office of Naval Research under grant N00014-16-1-2712, by the National Science Foundation via DMS 1712800, and by a generous gift from TwoSigma. S. B. was supported by a Ric Weiland Graduate Fellowship.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 343.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.