Testing exchangeability of multivariate distributions

Abstract

Although a number of tests of bivariate exchangeability (i.e. bivariate symmetry of bivariate distributions) are available, the literature is void of tests of whether a multivariate distribution with more than two dimensions is exchangeable. In this paper, multivariate permutation tests of exchangeability of multivariate distributions are proposed, which are based on the non-parametric combination methodology, i.e. on combining non-parametric bivariate exchangeability tests. Numerical experiments on real as well as simulated multivariate data with more than two dimensions are presented. The multivariate permutation test turns out to be typically more powerful than a bivariate exchangeability test performed only over a single pair of variables, and also more suitable than tests exploiting the approaches of Benjamini–Yekutieli or Bonferroni.

1. Introduction

Let us consider a random vector $X \in R^{p}$. Let $X_{1}, \dots, X_{n}$ with a fixed n represent p-dimensional random vectors following the probability distribution $L (X)$. The distribution $L (X)$ is called exchangeable (interchangeable, permutable) if it holds that $L (P X) = L (X)$ (1) for all permutation matrices P, i.e. for all square matrices P of size $p \times p$ with values 0 or 1 such that each row and each column contains the value 1 exactly once. In other words, (1) means that the indexing of the variables of $L (X)$ is irrelevant and $L (X)$ is permutation invariant. This work is interested in testing exchangeability of $L (X)$ based on given data, i.e. testing the null hypothesis $H_{0}$ that (1) holds against the general alternative hypothesis $H_{1}$ stating that $H_{0}$ does not hold.

Attention has been paid to exchangeable distributions mainly for the bivariate case (p = 2). For a bivariate random vector $(X_{1}, X_{2})^{T}$, exchangeability (1) is equivalent to $L (X_{1}, X_{2}) = L (X_{2}, X_{1})$ (2). Bivariate exchangeability, often denoted as bivariate symmetry, can be interpreted as symmetry with respect to permutation of axes or symmetry along the axis of the first quadrant (i.e. the line $x_{1} = x_{2}$). We note that (2) in the bivariate case naturally implies $L (X_{1}) = L (X_{2})$ (3). Already the first bivariate symmetry tests in the 1970s [Citation18,Citation45] were motivated by the practical question of whether a medical treatment has an effect or not. A number of bivariate symmetry tests are available, which are suitable for paired data allowing for a (possibly high) correlation of the two variables within pairs. Still, we are not aware of any test for distributions with more than two dimensions, although these could represent very natural extensions of tests for the bivariate case. Only in the context of copulas, a test of exchangeability of copulas based on empirical processes was proposed in [Citation14] for an arbitrary dimension. To give a motivating example for distributions with $p \geq 3$, the intake of five nutrients (calcium, iron, protein, vitamin A, vitamin D) on a sample of women was investigated in [Citation11], where a test of exchangeability of the multivariate distribution of the 5 variables would be useful. Another example may be a study of the effect of a treatment (e.g. of the COVID-19 vaccine) applied in three sequential doses.

Testing exchangeability of multivariate distributions may also be motivated within a broader context of tests of various forms of symmetry of multivariate continuous distributions. It was recently suggested to replace testing the exchangeability of a distribution by testing axial symmetry, e.g. using the test based on (multivariate) directional quantiles of [Citation19]; indeed, if a multivariate distribution is exchangeable after a shift, then it is symmetric around the axis of the first orthant. Another connection mentioned in [Citation20] is related to p independent univariate distributions, which are assumed to be the same up to their location; their joint distribution (after a shift) has to be exchangeable. Thus, the test of whether their distributions are the same (up to a shift) may be performed as a test of exchangeability of the joint distribution (again up to a shift). Other useful relationships between exchangeability of a distribution and some symmetry concepts (defined in [Citation33,Citation39]) were also described in [Citation19,Citation20].

It is important to stress that this paper is interested in permuting the variables of a multivariate distribution but not in permuting observations. The latter is connected to the concept of an exchangeable sequence of random variables (or arrays; see Section 7 of [Citation25]), which is defined as a sequence invariant to permuting the observations, in other words, as a sequence with an exchangeable distribution [Citation7]. Exchangeability of a sequence represents a weaker property compared to independence of the coordinates, and the theory of finite exchangeable random vectors stems from de Finetti's theorem as overviewed in [Citation21]. Exchangeability of a sequence of random variables has found applications within conformal inference, which can be characterized as a method for constructing valid prediction errors for new sequentially added observations; conformal inference has become popular for neural networks or random forests [Citation26].

Section 2 recalls available tests of bivariate symmetry. In the methodological Section 3, multivariate permutation tests of exchangeability of distributions with more than two dimensions are proposed. The performance of the tests is investigated on real data in Section 4 and on simulated data in Section 5. Multivariate permutation tests turn out to outperform tests constructed from multiple comparison procedures of Benjamini–Yekutieli or Bonferroni. Section 6 brings conclusions.

2. Available bivariate symmetry tests

We now consider i.i.d. bivariate random vectors $(X_{11}, X_{21})^{T}, \dots, (X_{1 n}, X_{2 n})^{T}$. Particular available tests of the null hypothesis of bivariate symmetry are overviewed in this section. Testing the exchangeability of a distribution is desirable, especially if it can be performed in a non-parametric, distribution-free, coordinate-free way by means of simple and powerful tests not relying on strong assumptions. Bivariate symmetry tests can typically be performed as exact (permutation) tests; advantages of permutation tests in the context of bivariate symmetry were recalled in [Citation3] or [Citation10] and for the context of various other symmetry hypotheses in [Citation35]. We stress that all tests mentioned in this paper are performed on raw data without any normalization.
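
As an illustration of how such a permutation test may be organized, the following minimal sketch relies on the fact that, under bivariate symmetry, swapping the two coordinates within any pair leaves the joint distribution unchanged. The function names, the placeholder statistic (absolute mean difference) and the Monte Carlo p-value with the +1 correction are assumptions made for this sketch and are not taken from any of the cited tests.

```python
import random


def abs_mean_diff(x1, x2):
    """Placeholder statistic: absolute mean of the within-pair differences."""
    n = len(x1)
    return abs(sum(b - a for a, b in zip(x1, x2))) / n


def symmetry_permutation_pvalue(x1, x2, statistic=abs_mean_diff, B=999, seed=0):
    """Monte Carlo permutation p-value for bivariate symmetry.

    Each permutation swaps the coordinates of every pair independently
    with probability 1/2; large values of `statistic` count as evidence
    against symmetry.
    """
    rng = random.Random(seed)
    observed = statistic(x1, x2)
    count = 0
    for _ in range(B):
        y1, y2 = [], []
        for a, b in zip(x1, x2):
            if rng.random() < 0.5:
                a, b = b, a  # swap the two coordinates of this pair
            y1.append(a)
            y2.append(b)
        if statistic(y1, y2) >= observed:
            count += 1
    # +1 in numerator and denominator keeps the p-value strictly positive
    return (count + 1) / (B + 1)
```

Any of the statistics reviewed in this section could in principle be passed as `statistic`, provided that large values indicate a departure from symmetry.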

Bell and Haller [Citation3] overviewed several equivalent formulations of bivariate symmetry. They derived the likelihood ratio test for normal data in the form $L R = \frac{1 - r^{2} (S, D)}{1 + | \bar{D} | / σ_{D}}$ with $\bar{D} = \sum_{i = 1}^{n} D_{i} / n$ and $σ_{D}^{2} = \sum_{i = 1}^{n} (D_{i} - \bar{D})^{2} / n$ (4), where $r (S, D)$ denotes the Pearson correlation coefficient between $(S_{1}, \dots, S_{n})^{T}$ and $(D_{1}, \dots, D_{n})^{T}$ with $D_{i} = X_{2 i} - X_{1 i}$ and $S_{i} = X_{1 i} + X_{2 i}$ for $i = 1, \dots, n$. Moreover, all distribution-free tests of bivariate symmetry were proven to be based on permutations in [Citation3], where the Wilcoxon signed-rank test was disqualified from being used for testing bivariate symmetry, as its power was shown to equal only the specified level α.
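
The statistic (4) can be evaluated directly from the paired data. The sketch below follows the formula exactly as stated in the text, with the variances and the covariance divided by n; the function name `lr_statistic` is hypothetical, and the code assumes that neither $S$ nor $D$ is constant, so that the correlation is defined.

```python
import math


def lr_statistic(x1, x2):
    """Evaluate the LR statistic (4) for paired samples x1, x2."""
    n = len(x1)
    s = [a + b for a, b in zip(x1, x2)]  # S_i = X_1i + X_2i
    d = [b - a for a, b in zip(x1, x2)]  # D_i = X_2i - X_1i
    s_bar = sum(s) / n
    d_bar = sum(d) / n
    var_s = sum((v - s_bar) ** 2 for v in s) / n
    var_d = sum((v - d_bar) ** 2 for v in d) / n  # sigma_D^2 with divisor n
    cov = sum((u - s_bar) * (v - d_bar) for u, v in zip(s, d)) / n
    r = cov / math.sqrt(var_s * var_d)  # Pearson correlation r(S, D)
    return (1.0 - r ** 2) / (1.0 + abs(d_bar) / math.sqrt(var_d))
```

The value lies between 0 and 1 and equals 1 when $S$ and $D$ are uncorrelated and $\bar{D} = 0$; small values speak against symmetry.
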
The test of Hollander [Citation18] is based on empirical distribution functions ${\hat{F}}_{n} (x_{1}, x_{2})$ and ${\hat{F}}_{n} (x_{2}, x_{1})$. The exact size and power of the Hollander test, which represents a rank-based analogue of a Cramér-von Mises type test [Citation11], may be computed for small samples by the algorithm of Hilton and Gee [Citation17], which is implemented in the R package NSM3. Quessy [Citation35] considered a related statistic based on the characteristic function (instead of the empirical distribution function) and used the theory of V-statistics to derive its asymptotic null distribution.
Yanagimoto and Sibuya [Citation45] proposed and investigated rank tests of bivariate symmetry based on the maximal invariant statistic. Let us now use the notation $(U_{1}, V_{1})^{T}, \dots, (U_{n}, V_{n})^{T}$ for the permutation of the random sample $(X_{11}, X_{21})^{T}, \dots, (X_{1 n}, X_{2 n})^{T}$ such that $U_{i} \leq U_{i + 1}$, and $V_{i} \leq V_{i + 1}$ if $U_{i} = U_{i + 1}$. Further, let $Z_{1}, \dots, Z_{n}$ be the corresponding permutation of $\operatorname{sgn} (X_{11} - X_{21}), \dots, \operatorname{sgn} (X_{1 n} - X_{2 n})$. Let $R_{i}$ be the rank of $U_{i}$ in the pooled sample $\{X_{11}, X_{21}, \dots, X_{1 n}, X_{2 n}\}$ and let $S_{i}$ be the rank of $V_{i}$ in the pooled sample. The maximal invariant statistic is the vector of triplets $(R_{1}, S_{1}, Z_{1})^{T}, \dots, (R_{n}, S_{n}, Z_{n})^{T}$ and the paper considered the linear (bivariate) rank test statistic in the form $T = \sum_{i = 1}^{n} Z_{i} (φ (R_{i}) - φ (S_{i}))$ (5) with a given score function φ and investigated its unbiasedness.
Snijders [Citation41] also considered rank tests (5) and derived locally most powerful (LMP) rank tests and their asymptotic normality using a Hájek-type theorem (as in Section 6.1 of [Citation13]). Particularly, he proved (5) with Wilcoxon scores, i.e. the test statistic $W = \sum_{i = 1}^{n} Z_{i} (R_{i} - S_{i})$ (6), to be the LMP rank test assuming $X_{1}$ and $X_{2}$ to come from logistic distributions.
Ernst and Schucany [Citation10] focused on testing the null hypothesis formulated as (3). They proposed a bivariate test statistic so that one component compares the means between $X_{1}$ and $X_{2}$ and the other compares their variances. The test is based on the Mahalanobis distance of the test statistic from the origin.
Modarres [Citation32] proposed five tests of bivariate symmetry. The first three are obtained as two-sample tests of equality of distribution functions based on Euclidean interpoint distances; they compare the distribution of the raw data with that of the data reflected along the line $x_{1} = x_{2}$ :
1. Runs test (based on evaluating a minimum spanning tree).
2. Nearest neighbor test (application of the test of Henze [Citation15] to bivariate symmetry).
3. Rank test of equality of multivariate distribution functions (application of the test of Maa et al. [Citation28] to bivariate symmetry).
4. Sign test (based on dividing the observations into 6 regions, performed as a standard test about the probability of a binomial distribution).
5. Bootstrap test based on $\max \{{\hat{F}}_{n} (x_{1 i}, x_{2 i}) - {\hat{F}}_{n} (x_{2 i}, x_{1 i})\}$, where ${\hat{F}}_{n}$ denotes the empirical distribution function; the test requires bootstrap estimation of the null distribution of the test statistic.
Concerning the rank test, Modarres [Citation32] described it as rejecting bivariate symmetry for large values of the test statistic M; however, we can immediately think of a simplistic example in which the test statistic decreases as we deviate from $H_{0}$. Therefore, we recommend performing the rank test as a two-sided test, rejecting $H_{0}$ for very large or very small values of the test statistic. The justification is that a violation of bivariate symmetry makes the test statistic very small or very large. The null distribution of M is the same as that of the Wilcoxon rank-sum statistic for two samples, where each contains n observations. In analogy, the Wilcoxon rank-sum test (Wilcoxon two-sample test) based on interpoint distances, which was theoretically investigated in [Citation22], also rejects for very large or very small values of the test statistic.
Rao and Raghunath [Citation37] developed a non-parametric test for a more general situation of symmetry about a line (possibly different from the line $x_{1} = x_{2}$). The test performs a tedious partition of the sample space into subjectively chosen sets (clusters) and the test statistic is obtained as a deviance measure comparing true and expected counts in the clusters.
A bivariate symmetry test for competing risks in survival analysis was proposed in [Citation9] and its saddle point approximation was derived in [Citation1].

3. Tests of exchangeability of multivariate distributions

Because there seems to be no direct way of obtaining exchangeability tests for distributions with more than two dimensions, we propose to construct such tests by combining (dependent) permutation tests of bivariate symmetry applied to individual pairs of variables. The proposed non-parametric combination methodology is presented in Section 3.1. For comparison, tests constructed from multiple comparison procedures of Benjamini–Yekutieli or Bonferroni are also considered; these are described in Section 3.2. This paper concentrates on data coming from continuous distributions. For categorical data, which can be represented in square contingency tables, exchangeability of the distribution is known as complete symmetry and we refer to Section 10.7 of [Citation2] for its treatment.

Using the notation of Section 1, $H_{0}$ in (1) is replaced by a different null hypothesis $H_{0}^{*} : L (X_{j}, X_{k}) = L (X_{k}, X_{j})$ for all $j = 1, \dots, p - 1$, $k = j + 1, \dots, p$ (7), formulated for the total number $D = p (p - 1) / 2$ of all pairs of variables. We thus consider testing the composite null hypothesis $H_{0}^{*}$, which may be expressed as $H_{0}^{*} : \bigcap_{j = 1}^{p - 1} \bigcap_{k = j + 1}^{p} H_{0 j k}$, where $H_{0 j k} : L (X_{j}, X_{k}) = L (X_{k}, X_{j})$ (8), against $H_{1}^{*} : \bigcup_{j = 1}^{p - 1} \bigcup_{k = j + 1}^{p} H_{1 j k}$, where $H_{1 j k}$ states that $H_{0 j k}$ does not hold (9). However, testing (7) is not equivalent to testing the null hypothesis (1): exchangeability (1) may fail to be true while pairwise exchangeability is fulfilled for each individual pair.
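
The gap between pairwise and full exchangeability can be checked on a small example. The sketch below uses a hypothetical discrete distribution on $\{0, 1\}^{3}$ that is not taken from the paper (which concentrates on continuous data); the probabilities and function names are chosen here only to show that all three bivariate margins can be symmetric while the joint distribution is not permutation invariant.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

F = Fraction
# Hypothetical probability mass function on {0, 1}^3 (sums to 1).
pmf = {
    (0, 0, 0): F(1, 10), (1, 1, 1): F(1, 10),
    (1, 0, 0): F(2, 10), (0, 1, 0): F(1, 10), (0, 0, 1): F(1, 10),
    (1, 1, 0): F(1, 10), (1, 0, 1): F(1, 10), (0, 1, 1): F(2, 10),
}


def pairwise_symmetric(pmf, p=3):
    """Check L(X_j, X_k) = L(X_k, X_j) for every pair j < k."""
    for j, k in combinations(range(p), 2):
        marginal = {}
        for x, prob in pmf.items():
            key = (x[j], x[k])
            marginal[key] = marginal.get(key, 0) + prob
        for a, b in product((0, 1), repeat=2):
            if marginal.get((a, b), 0) != marginal.get((b, a), 0):
                return False
    return True


def fully_exchangeable(pmf, p=3):
    """Check invariance of the joint pmf under all coordinate permutations."""
    for perm in permutations(range(p)):
        for x, prob in pmf.items():
            y = tuple(x[i] for i in perm)
            if pmf.get(y, 0) != prob:
                return False
    return True
```

For this distribution, `pairwise_symmetric(pmf)` holds while `fully_exchangeable(pmf)` fails, because the points (1, 0, 0) and (0, 1, 0) carry different probabilities.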

3.1. Multivariate permutation test

As bivariate symmetry tests are often performed as permutation tests [Citation35], it is the most natural approach to exploit the non-parametric combination methodology using one of the approaches due to Fisher, Lipták or Tippett. In such an approach, the dependence of the tests is implicitly taken into account by means of the permutation strategy (Section 1.2 of [Citation6]), irrespective of the dependence relations. The approach, which is free of assumptions about the distribution of the data, yields a global (combined) p-value and may be interpreted as a multivariate permutation test.

The implementation of the non-parametric combination methodology is straightforward following Algorithm 1 formulated for a general p, based on combining the total number of $D = p (p - 1) / 2$ tests of bivariate symmetry for all pairs of variables. In (11), ${\hat{Q}}_{d} (z)$ is used instead of $\frac{1}{B} \sum_{b = 1}^{B} 1 [| T_{d}^{* (b)} | \geq z]$, $z > 0$, $d = 1, \dots, D$ (10), where $1 [\cdot]$ denotes the indicator function; the reason is the better ability of ${\hat{Q}}_{d} (z)$ to keep the probability of type I error in numerical experiments [Citation6].
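
For concreteness, the plain estimate (10) can be computed as in the following sketch. The second function shows a smoothed variant with 1/2 added to the count and B + 1 in the denominator, which is common in the non-parametric combination literature; whether it coincides with ${\hat{Q}}_{d} (z)$ of (11) is an assumption, since that formula is not reproduced here. Both function names are hypothetical.

```python
def exceedance_fraction(perm_stats, z):
    """Estimate (10): share of permutation statistics with |T*| >= z."""
    B = len(perm_stats)
    return sum(1 for t in perm_stats if abs(t) >= z) / B


def smoothed_exceedance(perm_stats, z):
    """Assumed smoothed variant: (1/2 + count) / (B + 1), never 0 or 1."""
    B = len(perm_stats)
    count = sum(1 for t in perm_stats if abs(t) >= z)
    return (0.5 + count) / (B + 1)
```

The smoothed version stays strictly inside (0, 1), which matters when the values are later passed through a logarithm or an inverse normal distribution function.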

In the experiments with multivariate data, we use tests obtained by combining these bivariate symmetry tests:

The test of Hollander [Citation18].
The likelihood ratio (LR) test of Bell and Haller [Citation3]; this is the only test that rejects for small values of the test statistic.
The rank test of Snijders [Citation41] with Wilcoxon scores.
The sign test of Modarres [Citation32].
The nearest neighbor (NN) test of Modarres [Citation32] based on interpoint distances.
The rank test of Modarres [Citation32] based on interpoint distances (say ID-rank test) with Wilcoxon scores; as explained in Section 2, we (unlike the original test) reject for very large or for very small values of the test statistic.

Algorithm 1 can be described as an adaptation of the general algorithm of [Citation6], using one of these combination functions as a special case:

Fisher (omnibus) $ψ (q_{1}, \dots, q_{D}) = - 2 \sum_{d} \log (q_{d})$ ;
Lipták $ψ (q_{1}, \dots, q_{D}) = \sum_{d} Φ^{- 1} (1 - q_{d})$ ;
Tippett $ψ (q_{1}, \dots, q_{D}) = \max_{d} (1 - q_{d})$ .
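
The three combining functions listed above translate directly into code. The sketch below is a minimal illustration using only the Python standard library; the function names are hypothetical, and the inputs $q_{d}$ are assumed to lie strictly between 0 and 1, since otherwise the logarithm or the inverse normal distribution function is undefined.

```python
import math
from statistics import NormalDist


def fisher(q):
    """Fisher (omnibus) combination: -2 * sum of log q_d."""
    return -2.0 * sum(math.log(v) for v in q)


def liptak(q):
    """Liptak combination: sum of standard normal quantiles of 1 - q_d."""
    std_normal = NormalDist()
    return sum(std_normal.inv_cdf(1.0 - v) for v in q)


def tippett(q):
    """Tippett combination: the largest value of 1 - q_d."""
    return max(1.0 - v for v in q)
```

In all three cases, large values of the combined statistic correspond to small partial p-values and thus speak against the global null hypothesis.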

As discussed in Section 4.2.4 of [Citation34], Tippett's combination is recommended if only one or a few (but not all) sub-alternatives are true, and that of Lipták [Citation27] performs well when all sub-alternatives are jointly true. Fisher's combination is the most popular in practice [Citation12]; it is considered intermediate between the two others and thus suitable when no prior expectation is available. Finding the optimal combining function for given data appears, however, impossible (Section 4.2.2 of [Citation34]).

Properties of the multivariate permutation tests of this section follow from the non-parametric combination methodology. Particularly, the constructed tests keep the probability of type I error at the specified level and the distribution of the test statistics under $H_{0}^{*}$ does not depend on the underlying distribution of the data. This is true in spite of the individual p-values being dependent, and the three choices of ψ used here fulfil the assumptions of Section 1.2 of [Citation6]. If the individual (pairwise) tests are consistent, the multivariate permutation test is consistent. If the individual tests are unbiased, the multivariate permutation test is unbiased (Section 4.3 of [Citation8,Citation34]).
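
To make the overall procedure more tangible, the following sketch outlines one possible non-parametric combination test with Fisher's combining function. It is not a reproduction of Algorithm 1: the permutation scheme (an independent random reordering of the coordinates within each observation, applied jointly to all pairs), the placeholder partial statistic (absolute mean difference), the smoothed significance estimate and all names are assumptions made for this illustration.

```python
import math
import random
from itertools import combinations


def pair_stats(data, pairs):
    """Placeholder partial statistics: |mean(X_k - X_j)| for each pair."""
    n = len(data)
    return [abs(sum(row[k] - row[j] for row in data)) / n for j, k in pairs]


def npc_global_pvalue(data, B=999, seed=0):
    """Global p-value combining all pairwise statistics (Fisher)."""
    rng = random.Random(seed)
    p = len(data[0])
    pairs = list(combinations(range(p), 2))
    D = len(pairs)

    # Row 0 holds the observed statistics, rows 1..B the permuted ones.
    stats = [pair_stats(data, pairs)]
    for _ in range(B):
        # The same within-row reordering feeds all D partial statistics,
        # so their dependence is carried into the permutation distribution.
        permuted = [rng.sample(row, p) for row in data]
        stats.append(pair_stats(permuted, pairs))

    def significance(d, z):
        count = sum(1 for b in range(1, B + 1) if stats[b][d] >= z)
        return (count + 0.5) / (B + 1)

    combined = []
    for row in stats:
        q = [significance(d, row[d]) for d in range(D)]
        combined.append(-2.0 * sum(math.log(v) for v in q))

    observed = combined[0]
    return sum(1 for c in combined[1:] if c >= observed) / B
```

Here `data` is a list of n observations, each a list of p numbers. Replacing the placeholder statistic by one of the bivariate symmetry statistics of Section 2, or Fisher's function by that of Lipták or Tippett, only changes `pair_stats` or the line computing `combined`.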

3.2. Testing exchangeability of a distribution based on multiple comparisons

For the sake of comparison, we also consider exchangeability tests for distributions with dimensionality $p \geq 3$ based on multiple comparisons. Although the methods of this section are not primarily designed for the task and are intended for post hoc comparisons (after a global test of a composite null hypothesis), we use them here for constructing a global test of $H_{0}^{*}$ based on (corrected) p-values of pairwise tests of bivariate symmetry. We use here the notation $p_{1}, \dots, p_{D}$ for p-values of the pairwise tests, which will be arranged in ascending order as $p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(D)}$ (13).

Benjamini–Yekutieli. The most popular multiple testing procedure keeping the false discovery rate (FDR), defined as the percentage of false positive tests (incorrectly rejecting the null hypothesis) among all significant tests, below the chosen level α is the approach of Benjamini and Hochberg (B-H) [Citation4] for independent tests. An extension of Benjamini and Yekutieli (B-Y) [Citation5] is suitable for (potentially) dependent statistics even if the structure of the dependence is not known. It is convenient to express the (global) test based on the B-Y procedure by means of the scheme $H_{0}^{*} \text{ is rejected} \iff \prod_{d = 1}^{D} 1 [p_{(d)} > \frac{d}{D} {(\sum_{h = 1}^{D} \frac{1}{h})}^{- 1} α] = 0$ (14). Some other procedures for testing high-dimensional data controlling for FDR were presented in the overview [Citation23].

The Bonferroni method represents the simplest approach to multiple tests that keeps the family-wise error rate (FWER), defined as the probability of at least one incorrect rejection of the null hypothesis (i.e. making at least one type I error) among all tests, under the specified level α. We now use it to construct the (very conservative) global test according to the scheme $H_{0}^{*} \text{ is rejected} \iff p_{(1)} \leq α / D$ (15).
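
Both global decision rules of this section are simple functions of the pairwise p-values and can be sketched as follows; the function names are hypothetical. Scheme (14) rejects as soon as at least one ordered p-value does not exceed its B-Y threshold, and scheme (15) compares only the smallest p-value with α / D.

```python
def by_thresholds(D, alpha=0.05):
    """Thresholds (d / D) * alpha / sum_{h=1}^{D} (1 / h) for d = 1, ..., D."""
    harmonic = sum(1.0 / h for h in range(1, D + 1))
    return [d * alpha / (D * harmonic) for d in range(1, D + 1)]


def by_global_reject(pvalues, alpha=0.05):
    """Global test (14): reject if some p_(d) is at or below its threshold."""
    ordered = sorted(pvalues)
    thresholds = by_thresholds(len(ordered), alpha)
    return any(p <= t for p, t in zip(ordered, thresholds))


def bonferroni_global_reject(pvalues, alpha=0.05):
    """Global test (15): reject if the smallest p-value is at most alpha / D."""
    return min(pvalues) <= alpha / len(pvalues)
```

For D = 3 and α = 0.05, `by_thresholds(3)` gives approximately 0.009, 0.018 and 0.027, and the Bonferroni bound is approximately 0.017.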

4. Analysis of real datasets

We consider two real datasets to illustrate the performance of the tests of exchangeability of distributions with p = 3. All computations of this paper were performed in R software [Citation36] exploiting additional packages (NSM3, spdep, purrr, FNN, and BioConductor).

The first dataset contains gene expression (GE) measurements acquired in the study described in [Citation31]. Gene expressions for $p = 38\,950$ gene transcripts were measured on 24 individuals having a cerebrovascular stroke and 24 control persons. First, the Limma methodology (Linear Models for Microarray Data of [Citation40]) was applied to find the most differentially expressed genes. The data are used here only for the n = 24 patients and with the p = 3 genes that contribute the most to the separation between the two groups of individuals.

We are interested in testing whether the joint distribution of the three considered genes is exchangeable. The results of permutation tests of bivariate symmetry applied to individual pairs of variables (1-2, 1-3, and 2-3, i.e. with D = 3 in the notation of Algorithm 1) for the GE dataset are presented in Table 1. The multivariate permutation tests based on test statistics of bivariate symmetry tests are presented there as well. Tests of bivariate symmetry for the second and third gene have smaller p-values compared to tests evaluated for other pairs of genes. For all the tests, the multivariate permutation tests yield the smallest p-values, if a suitable combination method is chosen; however, the best of the three combination methods in terms of power turns out to be different for different bivariate symmetry tests. Most commonly (although not always), the approach of Tippett yields the best results here.

Table 1. P-values of tests for the gene expressions (GE) dataset of Section 4 with p = 3 and n = 24.

The second dataset is a subset of the Australian athletes (AA) dataset with p = 3. This dataset, available e.g. in the R package DAAG [Citation29], with the red blood cell count ( $X_{1}$ ), white blood cell count ( $X_{2}$ ), and hemoglobin concentration ( $X_{3}$ ), was analyzed, e.g. in [Citation16] or [Citation24]. Graphical visualizations reveal the variables to be very far from being permutable; thus, we further consider 1000 random subsamples from the dataset, where each contains only n = 20 measurements. If we consider all tests presented in Table 1 for the GE dataset, the average p-values across the 1000 subsamples of the AA dataset are all highly significant and below 0.001. This also makes the global tests using Bonferroni correction and the B-Y procedure significant in all situations under consideration.

5. Simulations

The aim of the simulations is to compare the performance of the tests of exchangeability of a distribution proposed in Section 3 for testing the null hypothesis $H_{0}^{*}$ against $H_{1}^{*}$ . The performance of permutation tests of bivariate symmetry applied to pairs of variables is investigated as well. The simulations are performed for data generated from four different models. In simulations A, B, C, and E, we randomly generate $κ = 1000$ samples and for each of them, the permutation tests (bivariate or multivariate) are always performed with 1000 permutations.

Simulation A. Using n = 20, we independently generate 3-dimensional data from the normal distribution $N_{3} (0, I_{3})$, where $I_{3}$ denotes the unit matrix of size $3 \times 3$. A selected percentage (ranging from 0 to $20 %$ ) of the observations is replaced by values, which are independently generated from $N_{3} (μ, I_{3} / 10)$, where $μ = (7, - 1, 1)^{T}$. The results in the form of empirical rejection frequencies (rates) are presented in Table 2.¹
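
The data-generating mechanism of Simulation A may be sketched as follows. The text does not specify how the replaced observations are selected, so the sketch assumes that a fixed number (the contamination fraction times n, rounded) of randomly chosen rows is replaced; the function name and its arguments are hypothetical.

```python
import math
import random


def simulate_a(n=20, contamination=0.1, seed=0):
    """Generate one contaminated sample in the spirit of Simulation A."""
    rng = random.Random(seed)
    mu = (7.0, -1.0, 1.0)
    sd_outlier = math.sqrt(0.1)  # covariance I_3 / 10 means variance 0.1

    # Clean data from N_3(0, I_3): independent standard normal coordinates.
    data = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]

    # Assumption: replace a fixed number of randomly selected rows.
    m = round(contamination * n)
    for i in rng.sample(range(n), m):
        data[i] = [rng.gauss(center, sd_outlier) for center in mu]
    return data
```

With `contamination=0.0` the sample satisfies $H_{0}$, while larger fractions shift mass towards $μ$ and break the symmetry between the first coordinate and the other two.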

Table 2. Simulation A: Empirical rejection frequencies (in $%$ ).

Simulation B. Using n = 30, we independently generate 3-dimensional data from the multivariate $t_{1}$ distribution with scale matrix $I_{3}$. A selected percentage (ranging from 0 to $20 %$ ) of the observations is replaced by values, which are independently generated from the non-central multivariate $t_{1}$ distribution with scale matrix $I_{3} / 10$ and with non-centrality parameter (i.e. mode) equal to $μ = (12, 5, 0)^{T}$. The results in the form of empirical rejection frequencies are presented in Table 3.

Table 3. Simulation B: Empirical rejection frequencies (in $%$ ).

Simulation C. Using n = 30, we independently generate 4-dimensional data from the multivariate logistic distribution [Citation30] with the vector of location parameters $(1, 1, 1, 1)^{T}$ and the vector of scale parameters $(1, 1, 1, 1)^{T}$. A selected percentage (ranging from 0 to $25 %$ ) of the observations is replaced by values, which are independently generated from the multivariate logistic distribution with the vector of location parameters $(6, 6, - 1, - 1)^{T}$ and the vector of scale parameters $(0.3, 0.3, 0.3, 0.3)^{T}$. The results in the form of averaged empirical rejection frequencies are visualized in Figure 1, where the horizontal lines correspond to $5 %$. The average rejection rates of bivariate symmetry tests applied to variables 1 and 3 are shown in Figure 1 (left); testing for the pairs of variables 1-4, 2-3, and 2-4 yields analogous results due to the structure of the data. Average rejection rates of the multivariate permutation test using Tippett's method are shown in Figure 1 (middle) and using Lipták's method in Figure 1 (right).

Figure 1. Simulation C: empirical rejection rates for the permutation test of bivariate symmetry for variables 1 and 3 (left), and for the multivariate permutation test using the approach of Fisher (middle) and Tippett (right). The curves correspond to (1) Hollander (stars), (2) LR (circles), (3) rank test of Snijders (squares), (4) sign test (triangles), (5) NN test (diamonds), and (6) ID-rank test (plus signs).

Simulation D. Simulation D is aimed at comparing the methods for larger values of p. Using n = 20, we independently generate p-dimensional data from the multivariate normal distribution $N_{p} (μ, I_{p})$, where $μ_{i} = 1 [i \leq m]$ is considered for $i = 1, \dots, p$ with a given m. We always use 1000 permutations; $κ = 1000$ is used for p = 10, but only $κ = 100$ is used for p = 100. The results are presented in Table 4.

Table 4. Simulation D: Empirical rejection frequencies (in $%$ ).

To discuss the results of all the simulations, the multivariate permutation tests hold the probability of type I error at the $5 %$ level. The power of all the tests increases with increasing contamination of the datasets, i.e. as the data become more distant from $H_{0}$. Comparing the individual tests, the largest powers are obtained with Hollander's test and the ID-rank test (i.e. in the novel two-sided version justified in Section 2). The most interesting result seems to be that the computations confirm that the multivariate permutation tests are more powerful than bivariate symmetry tests applied to an individual pair of variables. Concerning the non-parametric combination methodology, Tippett's approach yields the largest power and Fisher's is slightly weaker, leaving Lipták's approach behind.

The Benjamini–Yekutieli or Bonferroni approaches, which are primarily designed for post hoc comparisons, unsurprisingly reduce the type I errors and also attain lower powers. For p = 3, the global test based on the B-Y procedure compares $p_{(1)}$ with 0.009, $p_{(2)}$ with 0.018, and $p_{(3)}$ with 0.027, and the test based on the Bonferroni procedure has its probability of type I error equal to precisely 0.017. For a small deviation from $H_{0}^{*}$ with p = 3, the powers of the B-Y or Bonferroni global tests turn out to be extremely low (much below those of the multivariate permutation tests), and rapidly increase with an increasing deviation from $H_{0}^{*}$. For a larger p, the probability of type I error of B-Y and Bonferroni drops even more, which is very apparent here for p = 100 with the corresponding D = 4950 pairs to be compared. Tippett's approach outperforms the B-Y or Bonferroni approaches in all considered situations under $H_{1}^{*}$ for p = 10 as well as for p = 100.

Simulation E. To recall that the test procedures of Section 3.1 were formulated after replacing $H_{0}$ by $H_{0}^{*}$, we present one more simulation for data that violate $H_{0}$ and remain as close to $H_{0}^{*}$ as possible. To approximate such a situation, we start with the non-contaminated data from Simulation A. These are modified to decrease the density around the points $(- m, - m, - m)^{T}$, $(m, m, - m)^{T}$, $(- m, m, m)^{T}$, and $(m, - m, m)^{T}$ with m = 1.5 by always finding the 3 observations closest to these points in terms of the Euclidean distance. These are replaced by data around the points $(m, - m, - m)^{T}$, $(- m, m, - m)^{T}$, $(- m, - m, m)^{T}$, and $(m, m, m)^{T}$, which are generated as normally distributed with expectations equal to these points and with covariance matrix $I_{3} / 10$. The results in Table 5 show the pairwise tests to have their powers very slightly above 0.05, being outperformed by multivariate permutation tests. The results of the latter remain nevertheless quite low for such a strong violation of $H_{0}$, so this specific design reveals the limitation of replacing $H_{0}$ by $H_{0}^{*}$.

Table 5. Simulation E: Empirical rejection frequencies (in $%$ ).

6. Conclusions

While the literature seems void of exchangeability tests for distributions with $p \geq 3$, this paper investigates several possibilities for their construction based on combining bivariate symmetry tests performed over individual pairs of variables. We recommend performing the multivariate tests as multivariate permutation tests obtained by the non-parametric combination methodology. The computations over the presented real as well as simulated multivariate data reveal the multivariate permutation tests to be more powerful than a bivariate exchangeability test performed only over a single pair of variables.

Numerous available studies comparing the Bonferroni correction with B-H and/or B-Y approaches [Citation44] have most often not considered the non-parametric combination methodology. This is because the non-parametric combination methodology is a specific approach tailor-made for combining permutation tests. Multivariate permutation tests keep the probability of type I error under the specified level. This does not, however, hold for tests based on the approaches of B-H, B-Y or Bonferroni, which are all primarily designed for post hoc comparisons; these approaches turn out in the simulations of Section 5 to be unsuitable for testing a composite (global) null hypothesis. All the multivariate tests presented in this paper are computationally demanding for larger values of p, which is a natural property of all tests based on permutations. Let S denote the complexity of computing an individual test statistic and B the number of permutations (as in Algorithm 1). The computational complexity of the multivariate permutation test, which can be expressed as $(D + 1) B S$, is fully comparable to that of the approaches of B-Y or Bonferroni, which both have the complexity of $D B S$.

Finally, we can say that the multivariate tests of this paper consider p-values as significance probabilities, as is typical in statistical practice, not reflecting that they are actually random variables. A promising approach based on expected p-values (EPV) developed in [Citation38] and extended in [Citation42,Citation43] for the context of multiple testing seems, however, not to have been extended to the context of permutation tests (as EPV-based testing needs a specified null distribution of the test statistic and its known or estimated distribution under the alternative) and to the context with a composite alternative hypothesis.

Acknowledgments

The authors would like to thank Miroslav Šiman for the discussion. The authors received valuable input from the reviewers and an associate editor.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The work is supported by the grants GA21-05325S (J. Kalina) and GA22-02067S (P. Janáček) of the Czech Science Foundation.

Notes

1 A confidence interval for a rejection frequency $\pi$ based on its estimate $\hat{\pi}$ obtained in simulations may be computed as $\hat{\pi} \pm SE (\hat{\pi})$, where $SE (\hat{\pi}) = 1.96 \cdot \sqrt{\hat{\pi} (1 - \hat{\pi}) / \kappa}$ is 1.96 times the standard error of $\hat{\pi}$, if the lower bound is non-negative. When simulating $\kappa = 1000$ random samples, it approximately holds here that $SE (\hat{\pi}) \doteq 0.03$ for $\hat{\pi} \in (0.2, 0.8)$ and $SE (\hat{\pi}) \doteq 0.02$ for $\hat{\pi} \in (0.05, 0.2]$ or for $\hat{\pi} \in [0.8, 0.95)$.
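The arithmetic of this note can be checked with a short Python sketch. The function names `ci_half_width` and `confidence_interval` are hypothetical, and clipping the interval to $[0, 1]$ is an assumption added here for convenience; the note itself only requires the lower bound to be non-negative.

```python
import math


def ci_half_width(pi_hat, kappa, z=1.96):
    """z times the binomial standard error of a rejection frequency
    estimated from kappa simulated samples."""
    return z * math.sqrt(pi_hat * (1.0 - pi_hat) / kappa)


def confidence_interval(pi_hat, kappa, z=1.96):
    """Approximate 95% interval; clipping to [0, 1] is an added assumption."""
    h = ci_half_width(pi_hat, kappa, z)
    return max(0.0, pi_hat - h), min(1.0, pi_hat + h)


for pi_hat in (0.1, 0.5, 0.9):
    print(pi_hat, round(ci_half_width(pi_hat, 1000), 3))
```

For $\kappa = 1000$ the half-width is largest at $\hat{\pi} = 0.5$ and shrinks towards the boundaries, which is why the note reports two different approximate values.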

References

E.F. Abd-Elfattah, Bivariate symmetry tests for complete and competing risks data: A saddle point approach, J. Stat. Comput. Simul. 87 (2017), pp. 1269–1275.
A. Agresti, Categorical Data Analysis, 2nd ed., Wiley, Hoboken, 2002.
C.B. Bell and H.S. Haller, Bivariate symmetry tests: parametric and nonparametric, Ann. Math. Stat. 40 (1969), pp. 259–269.
Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: A practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B. 57 (1995), pp. 289–300.
Y. Benjamini and D. Yekutieli, The control of the false discovery rate in multiple testing under dependency, Ann. Stat. 29 (2001), pp. 1165–1188.
S. Bonnini, L. Corain, M. Marozzi, and L. Salmaso, Nonparametric Hypothesis Testing: Rank and Permutation Methods with Applications in R, Wiley, New York, 2014.
D. Commenges, Transformations which preserve exchangeability and application to permutation tests, J. Nonparametr. Stat. 15 (2003), pp. 171–185.
L. Corain and L. Salmaso, Nonparametric permutation and combination-based multivariate control charts with applications in microelectronics, Appl. Stoch. Models Bus. Ind. 29 (2013), pp. 334–349.
J.V. Deshpande, A test for bivariate symmetry of dependent competing risks, Biom. J. 32 (2007), pp. 737–746.
M.D. Ernst and W.R. Schucany, A class of permutation tests of bivariate interchangeability, J. Am. Stat. Assoc. 94 (1999), pp. 273–284.
C. Genest, J. Nešlehová, and J.F. Quessy, Tests of symmetry for bivariate copulas, Ann. Inst. Stat. Math. 64 (2012), pp. 811–834.
R.A. Giancristofaro and C. Brombin, Overview of nonparametric combination-based permutation tests for multivariate multi-sample problems, Statistica 74 (2014), pp. 233–246.
J. Hájek, Z. Šidák, and P.K. Sen, Theory of Rank Tests, 2nd ed., Academic Press, San Diego, 1999.
M. Harder and U. Stadtmüller, Testing exchangeability of copulas in arbitrary dimension, J. Nonparametr. Stat. 29 (2017), pp. 40–60.
N. Henze, A multivariate two-sample test based on the number of nearest neighbor type coincidences, Ann. Stat. 16 (1988), pp. 772–783.
N. Henze, Z. Hlávka, and S.G. Meintanis, Testing for spherical symmetry via the empirical characteristic function, Statistics 48 (2014), pp. 1282–1296.
J.F. Hilton and L. Gee, The size and power of the exact bivariate symmetry test, Comput. Stat. Data Anal. 26 (1997), pp. 53–69.
M. Hollander, A nonparametric test for bivariate symmetry, Biometrika 58 (1971), pp. 203–212.
Š. Hudecová and M. Šiman, Testing axial symmetry by means of directional regression quantiles, Electron. J. Stat. 15 (2021), pp. 2690–2715.
Š. Hudecová and M. Šiman, Testing symmetry around a subspace, Stat. Pap. 62 (2021), pp. 2491–2508.
S. Janson, T. Konstantopoulos, and L. Yuan, On a representation theorem for finitely exchangeable random vectors, J. Math. Anal. Appl. 442 (2016), pp. 703–714.
J. Jurečková and J. Kalina, Nonparametric multivariate rank tests and their unbiasedness, Bernoulli 18 (2012), pp. 229–251.
J. Kalina, Classification methods for high-dimensional data, Biocybern. Biomed. Eng. 34 (2014), pp. 10–18.
J. Kalina, Common multivariate estimators of location and scatter capture the symmetry of the underlying distribution, Commun. Stat. Simul. 50 (2021), pp. 2845–2857.
O. Kallenberg, Probabilistic Symmetries and Invariance Principles, Springer, New York, 2005.
A.K. Kuchibhotla, Exchangeability, conformal prediction, and rank tests, preprint (2021). Available at https://arxiv.org/abs/2005.06095v3.
T. Lipták, On the combination of independent tests, Magyar Tud. Akad. Mat. Kutató Int. Közl. 3 (1958), pp. 171–197.
J.F. Maa, D.K. Pearl, and R. Bartoszyński, Reducing multidimensional two-sample data to one-dimensional interpoint comparisons, Ann. Stat. 24 (1996), pp. 1069–1074.
J.H. Maindonald and W.J. Braun, DAAG: Data analysis and graphics data and functions, R package version 1.22, 2015. Available at https://cran.r-project.org/web/packages/DAAG.
H.J. Malik and B. Abraham, Multivariate logistic distributions, Ann. Stat. 3 (1973), pp. 588–590.
M. Marozzi, A. Mukherjee, and J. Kalina, Interpoint distance tests for high-dimensional comparison studies, J. Appl. Stat. 47 (2020), pp. 653–665.
R. Modarres, Tests of bivariate exchangeability, Int. Stat. Rev. 76 (2008), pp. 203–213.
H. Oja, Multivariate Nonparametric Methods with R: An Approach Based on Spatial Signs and Ranks, Springer, New York, 2010.
F. Pesarin and L. Salmaso, Permutation Tests for Complex Data: Theory, Applications and Software, Wiley, New York, 2010.
J.F. Quessy, On consistent nonparametric statistical tests of symmetry hypotheses, Symmetry 8 (2016), Article 31.
R Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, 2018. Available at http://www.R-project.org.
K.S.M. Rao and M. Raghunath, A simple nonparametric test for bivariate symmetry about a line, J. Stat. Plan. Inference 142 (2012), pp. 430–444.
H. Sackrowitz and E. Samuel-Cahn, P-values as random variables: Expected p values, Am. Stat. 53 (1999), pp. 326–331.
R. Serfling, Multivariate symmetry and asymmetry, in Encyclopedia of Statistical Sciences, S. Kotz, N. Balakrishnan, C.B. Read, B. Vidakovic, eds., 2nd ed., Vol. 8, Wiley, New York, 2006, pp. 5338–5345.
G.K. Smyth, Limma: Linear models for microarray data, in Bioinformatics and Computational Biology Solutions Using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber, eds., Springer, New York, 2005, pp. 397–420.
T. Snijders, Rank tests for bivariate symmetry, Ann. Stat. 9 (1981), pp. 1087–1095.
A. Vexler and J. Yu, To t-test or not to t-test? A p-values-based point of view in the receiver operating characteristic curve framework, J. Comput. Biol. 25 (2018), pp. 541–550.
A. Vexler, J. Yu, Y. Zhao, A.D. Hutson, and G. Gurevich, Expected p-values in light of an ROC curve analysis applied to optimal multiple testing procedures, Stat. Methods Med. Res. 27 (2018), pp. 3560–3576.
T. White, J. van der Ende, and T.E. Nichols, Beyond Bonferroni revisited: Concerns over inflated false positive research findings in the fields of conservation genetics, biology, and medicine, Conserv. Genet. 20 (2019), pp. 927–937.
T. Yanagimoto and M. Sibuya, Test of symmetry of a bivariate distribution, Sankhya A 38 (1976), pp. 105–115.

