Efficient Computing

On Exact Computation of Tukey Depth Central Regions

Pages 699-713 | Received 09 Jun 2022, Accepted 07 Aug 2023, Published online: 21 Nov 2023
 

Abstract

The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the d-dimensional space whose Tukey depth exceeds given thresholds k. We address the problem of fast and exact computation of those central regions. First, we analyze an efficient Algorithm (A) from Liu, Mosler, and Mozharovskyi, and prove that it yields exact results in dimension d = 2, or for a low threshold k in arbitrary dimension. We provide examples where Algorithm (A) fails to recover the exact Tukey depth region for d > 2, and propose a modification that is guaranteed to be exact. We express the problem of computing the exact central region in its dual formulation, and use that viewpoint to demonstrate that further substantial improvements to our algorithm are unlikely. An efficient C++ implementation of our exact algorithm is freely available in the R package TukeyRegion.
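To make the definitions in the abstract concrete, the following is a minimal brute-force sketch of the integer-valued sample Tukey depth in dimension d = 2, together with a membership test for a central region. It is not Algorithm (A), (B), or (C), and it is not the TukeyRegion API. The function names `tukey_depth_2d` and `in_central_region` are hypothetical, and the sketch assumes that the depth counts sample points in closed halfplanes and that the central region collects the points of depth at least k.

```python
def tukey_depth_2d(x, points):
    """Integer-valued sample Tukey depth of x w.r.t. a planar point set.

    Assumption: depth = minimum number of sample points in a closed
    halfplane whose boundary line passes through x.
    Runs in O(n^2) time; exact for integer or Fraction coordinates.
    """
    diffs = [(px - x[0], py - x[1]) for px, py in points]
    # Sample points equal to x lie in every closed halfplane through x.
    at_x = sum(1 for d in diffs if d == (0, 0))
    others = [d for d in diffs if d != (0, 0)]
    if not others:
        return at_x
    best = len(points)
    for vx, vy in others:
        # Classify all points relative to the line through x with direction v.
        left = right = ahead = behind = 0
        for wx, wy in others:
            cross = vx * wy - vy * wx
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif vx * wx + vy * wy > 0:
                ahead += 1   # on the line, same side of x as v
            else:
                behind += 1  # on the line, opposite side of x
        # Tilting the line slightly keeps one open side plus one of the
        # two rays; the cheapest of the four choices is a candidate minimum.
        best = min(best, min(left, right) + min(ahead, behind) + at_x)
    return best


def in_central_region(x, points, k):
    """True if x has depth at least k (assumed definition of the region)."""
    return tukey_depth_2d(x, points) >= k
```

For example, with the sample `[(0, 0), (4, 0), (0, 4), (1, 1)]`, calling `tukey_depth_2d((1, 1), ...)` evaluates the depth of the interior sample point. Dividing the result by n gives the depth scaled into [0,1].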

Supplementary Materials

  • An updated R package TukeyRegion, version 0.1.6.3 where the novel exact Algorithm (B) is implemented.

  • A PDF file with an additional Algorithm (A3), motivated by an extension of Algorithm (A) with k = 2. Using the dual graph, we present a dataset where this possible simplification of Algorithms (B) and (C) also fails to recover the central region. Further, we propose to use the dual graph for a heuristic assessment of the quality of approximation achieved by non-exact algorithms such as Algorithms (A), (A2), or (A3). This file also contains detailed results of the complete simulation study.

  • A Mathematica notebook with functions for computing the dual graph of X, containing also interactive visualizations of all the examples provided in this article.

  • Complete R source codes for the simulation studies performed in Section 4 and the supplementary material.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1 We consider only the depth for datasets, that is, the sample depth. For general measures, the depth is typically scaled into the interval [0,1], which is obtained by dividing our expression for hD by n. For our purposes, the integer-valued version of the depth is more convenient to work with; this minor difference entails no loss of generality.

2 By the barycenter we mean the expected value of the uniform distribution on this convex compact set.

3 A set of n ≥ d + 1 points in R^d is in general position if no d + 1 of these points lie in a hyperplane.
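As an illustration of this condition in dimension d = 2, where it says that no three points are collinear, here is a small sketch. The function name `in_general_position_2d` is hypothetical, and the check is exact only for integer or rational coordinates.

```python
from itertools import combinations


def in_general_position_2d(points):
    """True if no three of the given planar points lie on a common line."""
    for (ax, ay), (bx, by), (cx, cy) in combinations(points, 3):
        # Zero cross product of (b - a) and (c - a) means a, b, c are collinear.
        if (bx - ax) * (cy - ay) - (by - ay) * (cx - ax) == 0:
            return False
    return True
```

The check inspects all triples, so it takes O(n^3) time and is meant only for small examples.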

4 The convex hull of A is defined as the intersection of all convex sets that contain A; its affine hull is the intersection of all translations of vector subspaces (that is, affine subspaces of R^d) that contain A.

5 At this step we slightly simplify Algorithm 2 from [LMM]. In the original version, only two relevant halfspaces HH(k) of this type are found in Step 2(d) [LMM, p. 686]. Of course, our inclusion of (possibly) more than two relevant halfspaces in (A2) makes Algorithm (A) search through more ridges. Thus, if Algorithm 2 from [LMM] is exact, then so is our Algorithm (A). This difference is of no importance for our exposition, and does not alter any of our conclusions.

6 The extreme cases k ≥ n/2 are not interesting, because clearly hD_k(X) = ∅ if k > n/2 (see e.g., Liu, Luo, and Zuo 2020, Theorem 1). Furthermore, for n even and k = n/2, if the set hD_{n/2}(X) is non-empty, then X is a halfspace symmetric (Zuo and Serfling 2000b) configuration of points. By Liu, Luo, and Zuo (2020, Proposition 1), for d > 2 this is impossible. For d = 2 and the nontrivial case n > 2, this is possible only for hD_{n/2}(X) a single-point set (Zuo and Serfling 2000b, Theorem 3.1), a situation which is not covered by RidgeSearch. In fact, it can be shown that for X sampled from an absolutely continuous distribution in dimension d = 2, hD_{n/2}(X) is either empty or a sample point from X, with probability one (Pokorný, Laketa, and Nagy 2023).

7 It must be, however, noted that the time complexity of Step (C3) might exceed that of Steps (C1) and (C2). For numerical evidence, see Tables 3 and 4 in [LMM] where computation times for Steps (C1) and (C2) are reported (times without brackets) together with times for Step (C3) (times in brackets).

Additional information

Funding

This work was partially supported by a mobility grant 8J21FR013 of the Czech Ministry of Education, Youth and Sports and the Programme Barrande (Campus France) mobility grant 46745VD of the French Ministry of Europe and Foreign Affairs and French Ministry of Higher Education and Research. P. Laketa was supported by the OP RDE project “International mobility of research, technical and administrative staff at the Charles University,” grant CZ.02.2.69/0.0/0.0/18_053/0016976. The work of S. Nagy was supported by Czech Science Foundation (EXPRO project n. 19-28231X). The work of P. Mozharovskyi was supported by the Young Researcher Grant of the French National Agency for Research (ANR JCJC 2021) in category Artificial Intelligence (registered under the number ANR-21-CE23-0029-01).
