Theory and Methods

Structure–Adaptive Sequential Testing for Online False Discovery Rate Control

Pages 732-745 | Received 25 Feb 2020, Accepted 09 Jul 2021, Published online: 17 Nov 2021
 

Abstract

Consider the online testing of a stream of hypotheses where a real-time decision must be made before the next data point arrives. The error rate is required to be controlled at all decision points. Conventional simultaneous testing rules are no longer applicable due to the more stringent error constraints and the absence of future data. Moreover, the online decision-making process may come to a halt when the total error budget, or alpha-wealth, is exhausted. This work develops a new class of structure-adaptive sequential testing (SAST) rules for online false discovery rate (FDR) control. A key element in our proposal is a new alpha-investing algorithm that precisely characterizes the gains and losses in sequential decision making. SAST captures time-varying structures of the data stream, learns the optimal threshold adaptively in an ongoing manner, and optimizes the alpha-wealth allocation across different time periods. We present theory and numerical results to show that SAST is asymptotically valid for online FDR control and achieves substantial power gain over existing online testing rules.

Supplementary material

The supplementary material contains the proofs of main theorems, other theoretical results and additional numerical results.

Notes

1 $\mathcal{T}$ may be taken either as $\{1, 2, \ldots, T\}$ on a growing domain or as a set of points that lie on a fixed-domain regular grid, $\{1/T, 2/T, \ldots, (T-1)/T, 1\}$, with $T \to \infty$.

2 As pointed out by a referee, (2) should be understood as the “average” FDR under the random mixture model (1): the expectation $E$ is taken over both $\theta_t$ and $X_t$. Our theory should be read with this “average” FDR in mind.

3 The asymptotic equivalence can be shown by arguments similar to those in Basu et al. (2018) for proving the equivalence between the marginal FDR and the FDR. Empirically, the two power measures yield almost identical patterns in our simulations.

4 In situations where the empirical null is more appropriate (Efron 2004), $f_0$ can first be estimated using the method in Jin and Cai (2007) and then treated as known.

5 In situations where the online FDR analysis must start without prior data, we suggest first applying an existing method such as LOND and then switching to SAST as more data are acquired.

6 Let $T$ be a prespecified large integer denoting the number of tests we tentatively plan to conduct. The bandwidth $h_T$ is determined by $T$ and held fixed throughout the entire testing period, which is allowed to extend beyond $T$. The choice of $h_T$ is discussed in detail in Section 4.1. We recommend using the same $h_T$ in both Equations (1) and (2) to stabilize the performance.

7 These prespecified regions serve as proxies for the signals of interest that we wish to discover.
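
Note 5 mentions LOND as a method for starting an online FDR analysis without prior data. The following is a minimal Python sketch of LOND as it is commonly stated: hypothesis t is rejected when its p-value is at most beta_t * (D + 1), where D is the number of discoveries made before time t and the nonnegative sequence beta_t sums to the target level alpha. The function name `lond` and the particular choice beta_t = alpha * 6 / (pi^2 * t^2) are assumptions made for illustration; they are not taken from the article, and this sketch is not the SAST procedure.

```python
import math


def lond(p_values, alpha=0.05):
    """Illustrative LOND rule; returns one reject/accept decision per p-value.

    Assumption: beta_t = alpha * 6 / (pi^2 * t^2), which sums to alpha
    over t = 1, 2, ... Other summable sequences could be used instead.
    """
    norm = 6.0 / math.pi ** 2  # normalizes sum over t of 1/t^2 to 1
    decisions = []
    discoveries = 0  # number of rejections made so far
    for t, p in enumerate(p_values, start=1):
        beta_t = alpha * norm / t ** 2
        # The threshold grows with the number of earlier discoveries.
        threshold = beta_t * (discoveries + 1)
        reject = p <= threshold
        decisions.append(reject)
        discoveries += int(reject)
    return decisions


if __name__ == "__main__":
    # Decisions are made one at a time, using only past information.
    print(lond([0.001, 0.5, 0.004, 0.9]))
```

Because the threshold at time t depends only on earlier decisions, each decision can be made before the next data point arrives. The same p-value can be rejected or not depending on how many discoveries came before it, which is why such a rule can serve as a warm start before a structure-adaptive method takes over.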

Additional information

Funding

The research of Wenguang Sun was supported in part by NSF grant DMS-2015339.
