On Exact Computation of Tukey Depth Central Regions: Journal of Computational and Graphical Statistics: Vol 33, No 2

Abstract

The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the d-dimensional space whose Tukey depth exceeds given thresholds k. We address the problem of fast and exact computation of those central regions. First, we analyze an efficient Algorithm (A) from Liu, Mosler, and Mozharovskyi, and prove that it yields exact results in dimension d = 2, or for a low threshold k in arbitrary dimension. We provide examples where Algorithm (A) fails to recover the exact Tukey depth region for d > 2, and propose a modification that is guaranteed to be exact. We express the problem of computing the exact central region in its dual formulation, and use that viewpoint to demonstrate that further substantial improvements to our algorithm are unlikely. An efficient C++ implementation of our exact algorithm is freely available in the R package TukeyRegion.
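To make the definition of the depth and of a central region concrete, here is a minimal brute-force sketch for d = 2. It is not Algorithm (A) or (B) from the article and not the TukeyRegion implementation. The function names `tukey_depth_2d` and `in_central_region` are hypothetical, the depth is the integer-valued sample depth (the smallest number of data points in a closed halfplane containing x), and membership in the region is taken as "depth at least k", which is an assumed convention.

```python
def tukey_depth_2d(x, data):
    """Integer-valued Tukey depth of x with respect to a planar dataset.

    Brute force, O(n^2): the minimum over closed halfplanes whose boundary
    passes through x is attained for a boundary line that is an infinitesimal
    rotation of a line through x and some data point, so only those lines
    are inspected. Uses exact arithmetic when coordinates are integers.
    """
    diffs = [(p[0] - x[0], p[1] - x[1]) for p in data]
    at_x = sum(1 for d in diffs if d == (0, 0))   # points equal to x always count
    others = [d for d in diffs if d != (0, 0)]
    if not others:
        return at_x
    best = len(others)
    for ax, ay in others:
        left = right = fwd = back = 0
        for bx, by in others:
            cross = ax * by - ay * bx
            if cross > 0:
                left += 1            # strictly left of the line through x along a
            elif cross < 0:
                right += 1           # strictly right of that line
            elif ax * bx + ay * by > 0:
                fwd += 1             # on the line, same side of x as a
            else:
                back += 1            # on the line, opposite side of x
        # A tiny rotation sends "fwd" to one side and "back" to the other;
        # take the cheapest of the four resulting halfplanes.
        best = min(best, min(left, right) + min(fwd, back))
    return at_x + best


def in_central_region(x, data, k):
    """Assumed convention: x belongs to the region if its depth is at least k."""
    return tukey_depth_2d(x, data) >= k


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    for point in [(1, 1), (0, 0), (5, 5)]:
        print(point, tukey_depth_2d(point, square))
```

Checking points one at a time like this only tests membership; it does not produce the region itself as a polytope, which is the problem the article addresses.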

Supplementary Materials

An updated R package TukeyRegion, version 0.1.6.3, in which the novel exact Algorithm (B) is implemented.
A pdf file with an additional Algorithm (A³), motivated by an extension of Algorithm (A) with k = 2. Using the dual graph, we present a dataset where this possible simplification of Algorithms (B) and (C) also fails to recover the central region. Further, we propose to use the dual graph for heuristic assessment of the quality of approximation by non-exact algorithms such as Algorithms (A), (A²) or (A³). This file also contains detailed results of the complete simulation study.
A Mathematica notebook with functions for computing the dual graph of X, together with interactive visualizations of all the examples provided in this article.
Complete R source code for the simulation studies performed in Section 4 and the supplementary material.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1 We consider only the depth for datasets, that is, the sample depth. For general measures the depth is typically scaled to the interval $[0, 1]$, which is obtained by dividing our expression for hD by n. For our purposes, the integer-valued version of the depth is more convenient to work with, and this minor difference entails no loss of generality.

2 By the barycenter we mean the expected value of the uniform distribution on this convex compact set.
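As an illustration of this definition in the planar case only, the following sketch computes the barycenter of a convex polygon with the standard shoelace-based centroid formula. The function name `polygon_barycenter` is hypothetical, and the vertices are assumed to be listed in order around the boundary; the article's setting is a convex set in general dimension d, which this sketch does not cover.

```python
def polygon_barycenter(vertices):
    """Barycenter (mean of the uniform distribution) of a planar polygon.

    `vertices` is a list of (x, y) pairs in boundary order, either
    orientation. Raises ValueError for degenerate (zero-area) input.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("need at least three vertices")
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0      # signed area contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    # Centroid formula: C = (1 / (6A)) * sum(...), and 6A = 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


if __name__ == "__main__":
    # A square with one extra vertex on its bottom edge.
    polygon = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]
    print(polygon_barycenter(polygon))
```

The barycenter generally differs from the average of the vertices: for the polygon in the usage example the vertex average is (1, 0.8), while the barycenter is the center of the square.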

3 A set of $n \geq d + 1$ points in $R^{d}$ is in general position if no d + 1 of these points lie in a hyperplane.
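The general position condition can be checked directly from this definition. The sketch below is a brute-force test, with hypothetical function names `exact_det` and `in_general_position`: for every subset of d + 1 points it forms the d × d matrix of differences from the first point and tests whether its determinant vanishes, which happens exactly when the d + 1 points lie in a common hyperplane. Exact rational arithmetic is used to avoid rounding issues; the cost grows combinatorially in n, so this is meant only for small examples.

```python
from fractions import Fraction
from itertools import combinations


def exact_det(matrix):
    """Exact determinant via Gaussian elimination over rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result            # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def in_general_position(points):
    """True if no d + 1 of the points lie in a common hyperplane of R^d."""
    d = len(points[0])
    if len(points) < d + 1:
        raise ValueError("need at least d + 1 points")
    for subset in combinations(points, d + 1):
        base = subset[0]
        diffs = [[q[j] - base[j] for j in range(d)] for q in subset[1:]]
        if exact_det(diffs) == 0:
            return False
    return True


if __name__ == "__main__":
    print(in_general_position([(0, 0), (2, 0), (2, 2), (0, 2)]))
    print(in_general_position([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```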

4 The convex hull of A is defined as the intersection of all convex sets that contain A; its affine hull is the intersection of all translations of vector subspaces (that is, affine subspaces of $R^{d}$) that contain A.

5 At this step we slightly simplify Algorithm 2 from [LMM]. In the original version, only two relevant halfspaces $H \in H (k)$ of this type are found in Step 2(d) [LMM, p. 686]. Our inclusion of (possibly) more than two relevant halfspaces in (A₂) makes Algorithm (A) search through more ridges. Thus, if Algorithm 2 from [LMM] is exact, then so must be our Algorithm (A). This difference is of no importance for our exposition, and does not alter any of our conclusions.

6 The extreme cases $k \geq n/2$ are not interesting, because clearly $hD_{k}(X) = \emptyset$ if $k > n/2$ (see e.g., Liu, Luo, and Zuo 2020, Theorem 1). Furthermore, for n even and $k = n/2$, if the set $hD_{n/2}(X)$ is non-empty, then X is a halfspace symmetric (Zuo and Serfling 2000b) configuration of points. By Liu, Luo, and Zuo (2020, Proposition 1), for d > 2 this is impossible. For d = 2 and the nontrivial case n > 2 this is possible only for $hD_{n/2}(X)$ a single-point set (Zuo and Serfling 2000b, Theorem 3.1), a situation which is not covered by RidgeSearch. In fact, it can be shown that for X sampled from an absolutely continuous distribution in dimension d = 2, $hD_{n/2}(X)$ is either empty or a sample point from X, with probability one (Pokorný, Laketa, and Nagy 2023).

7 It must be noted, however, that the time complexity of Step (C₃) might exceed that of Steps (C₁) and (C₂). For numerical evidence, see Tables 3 and 4 in [LMM], where computation times for Steps (C₁) and (C₂) are reported (times without brackets) together with times for Step (C₃) (times in brackets).

Additional information

Funding

This work was partially supported by a mobility grant 8J21FR013 of the Czech Ministry of Education, Youth and Sports and the Programme Barrande (Campus France) mobility grant 46745VD of the French Ministry of Europe and Foreign Affairs and French Ministry of Higher Education and Research. P. Laketa was supported by the OP RDE project “International mobility of research, technical and administrative staff at the Charles University,” grant CZ.02.2.69/0.0/0.0/18_053/0016976. The work of S. Nagy was supported by the Czech Science Foundation (EXPRO project no. 19-28231X). The work of P. Mozharovskyi was supported by the Young Researcher Grant of the French National Agency for Research (ANR JCJC 2021) in category Artificial Intelligence (registered under the number ANR-21-CE23-0029-01).
