940
Views
23
CrossRef citations to date
0
Altmetric
Theory and Methods

Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans

, , &
Pages 789-801 | Received 01 Nov 2015, Published online: 06 Jun 2018
 

ABSTRACT

The scan statistic is by far the most popular method for anomaly detection, being popular in syndromic surveillance, signal and image processing, and target detection based on sensor networks, among other applications. The use of the scan statistics in such settings yields a hypothesis testing procedure, where the null hypothesis corresponds to the absence of anomalous behavior. If the null distribution is known, then calibration of a scan-based test is relatively easy, as it can be done by Monte Carlo simulation. When the null distribution is unknown, it is less straightforward. We investigate two procedures. The first one is a calibration by permutation and the other is a rank-based scan test, which is distribution-free and less sensitive to outliers. Furthermore, the rank scan test requires only a one-time calibration for a given data size making it computationally much more appealing. In both cases, we quantify the performance loss with respect to an oracle scan test that knows the null distribution. We show that using one of these calibration procedures results in only a very small loss of power in the context of a natural exponential family. This includes the classical normal location model, popular in signal processing, and the Poisson model, popular in syndromic surveillance. We perform numerical experiments on simulated data further supporting our theory and also on a real dataset from genomics. Supplementary materials for this article are available online.

Supplementary Materials

The online supplementary materials contain additional proofs and results. Please refer to https://arxiv.org/abs/1508.03002 for the full proofs.

Acknowledgments

We would like to thank Daniel Neill for references on fast scans; Martin Kulldorff for references on permutation scans, for bringing the reference (Jung and Cho Citation2015) to our attention, and for helpful comments; Bing Zhou for help with the real data application of Section 5.4; and Tengyao Wang and Richard Samworth for discussions about the properties of ϒ0. Comments of an anonymous referee helped improve the narrative and presentation. Some of the work was done while ET was at Technische Universiteit Eindhoven and MW was at the University of California, San Diego. The basic concept of this article arose out of conversations between some of the authors at the 2013 Mathematical Statistics conference in Luminy, France.

Notes

1 In fact, this setting might have been the original motivation for the work on the scan statistic (Wallenstein Citation2009).

2 This article was made public after our article was posted on the arxiv.org. To the best of our knowledge, this other publication became publicly available on October 20, 2015 (doi:10.1186/s12942-015-0024-6), a couple of months after ours appeared online on August 12, 2015.

3 The latter explains why, in two-sample testing, methods based on ranks were feasible decades before methods based on permutations, which typically require access to a computer.

4 For two distribution (functions) on the real line, F and G, we say that G stochastically dominates F if G(t) ⩽ F(t) for all tR. We denote this by GF.

5 A family of densities (fθ: θ ∈ Θ), where ΘR, has the monotone likelihood ratio property if fθ'(x)/fθ(x) is increasing in x when θ′ > θ.

6 Throughout, the observations are ranked in increasing order of magnitude.

7 This is based on a personal communication from Richard J. Samworth and Tengyao Wang, who got interested in this question after one of the present authors presented this work at Cambridge University.

8 Of course, it could be calibrated by permutation, but this would make the procedure much more like the permutation scan test (with the same high-computational burden), somewhat far from the intentions of Cai, Jeng, and Li (Citation2012).

Additional information

Funding

This work was partially supported by a grant from the U.S. National Science Foundation (DMS 1223137) and a grant from the Nederlandse organisatie voor Wetenschappelijk Onderzoek (NWO 613.001.114).

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 343.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.