Testing and Inference

Comparing Two Samples Through Stochastic Dominance: A Graphical Approach

Pages 551-566 | Received 29 Dec 2021, Accepted 26 May 2022, Published online: 19 Jul 2022
 

Abstract

Nondeterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm and the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which unpredictable outcomes arise. These measures can be modeled as random variables and compared with each other via their expected values or with more sophisticated tools such as null hypothesis statistical tests. In this article, we propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one of the random variables stochastically dominates the other one. Then, we present a graphical method that decomposes into quantiles (i) the proposed dominance measure and (ii) the probability that one of the random variables takes lower values than the other. For illustrative purposes, we reevaluate the experimentation of an already published work with the proposed methodology and show that additional conclusions, missed by the other methods, can be inferred. Additionally, the software package RVCompare was created as a convenient way of applying and experimenting with the proposed framework.
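As a rough illustration of the two quantities the abstract refers to, the following minimal sketch estimates a cumulative distribution function from a sample and the probability that one random variable takes lower values than the other. The function names and the sample values are hypothetical, and counting ties as one half is an assumption made here, not a definition taken from the article.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function x -> fraction of observations that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def prob_lower(sample_a, sample_b):
    """Estimate P(X_A < X_B) over all pairs of observations.

    Ties contribute 0.5 each (an assumption of this sketch).
    """
    score = 0.0
    for a in sample_a:
        for b in sample_b:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(sample_a) * len(sample_b))


# Hypothetical observations (minimization: lower is better).
sample_a = [0.8, 1.1, 1.3, 2.0]
sample_b = [1.0, 1.5, 1.9, 2.4]

cdf_a = empirical_cdf(sample_a)
cdf_b = empirical_cdf(sample_b)
estimate = prob_lower(sample_a, sample_b)
```

Here `cdf_a(x) >= cdf_b(x)` at a given `x` means that the first sample has at least as large a share of observations at or below `x` as the second, which is the pointwise comparison that stochastic dominance builds on.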

Supplementary Material

RVCompare: With this R package, users can compute the CP and CD of two distributions, given their probability density functions. Furthermore, it can be used to produce the proposed cumulative difference-plot, given the observed data. (The package can be directly installed from CRAN and is also available in the GitHub repo https://github.com/EtorArza/RVCompare).

Reproducibility: Alongside the article, we provide the code to generate the figures in this article and replicate the experimentation. For instructions on how to install the dependencies and replicate the results, refer to the README.md file in the repository. (GitHub repo https://github.com/EtorArza/SupplementaryPaperRVCompare).

Appendices: To keep the length of the article at a reasonable size, the appendices have been moved to another document. (The appendices are available for download at https://doi.org/10.5281/zenodo.6528669).

Disclosure Statement

The authors report that there are no competing interests to declare.

Acknowledgments

This work was funded in part by the Spanish Ministry of Science, Innovation and Universities through PID2019-106453GA-I00/AEI/10.13039/501100011033 and the BCAM Severo Ochoa excellence accreditation SEV-2017-0718; by the Basque Government through the Research Groups 2019-2021 IT1244-19, ELKARTEK Program (project code KK-2020/00049) and BERC 2018-2021 program.

Notes

1 We follow an example in the Keras library (Chollet et al. 2015), and train the neural network for one epoch.

2 The source of the package RVCompare can be found at github.com/EtorArza/RVCompare. The code to reproduce every figure in the article is available at github.com/EtorArza/SupplementaryPaperRVCompare.

3 Without loss of generality, minimization is assumed in this article.

4 Note that XAXB is not equivalent to XBXA.

5 We define $X_A + \lambda$ as the random variable that is sampled in two steps: first obtain an observation from $X_A$ and then add $\lambda$ to this observation. We define $\lambda \cdot X_A$ in a similar way.
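The two-step sampling in this note can be sketched as follows. This is a minimal illustration: the sampler is passed in as a zero-argument function, and the names `sample_shifted`, `sample_scaled`, and `draw_a`, as well as the standard normal distribution, are hypothetical choices made for the example.

```python
import random


def sample_shifted(draw, lam):
    """One observation of X_A + lam: draw from X_A, then add lam."""
    observation = draw()
    return observation + lam


def sample_scaled(draw, lam):
    """One observation of lam * X_A: draw from X_A, then multiply by lam."""
    observation = draw()
    return lam * observation


# Hypothetical X_A: a standard normal random variable.
rng = random.Random(0)
draw_a = lambda: rng.gauss(0.0, 1.0)

shifted = [sample_shifted(draw_a, 0.5) for _ in range(5)]
scaled = [sample_scaled(draw_a, 2.0) for _ in range(5)]
```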

6 The probability density function of $M_{[1-\tau,\tau]}(X_{B_1}, X_{B_2})$ is defined as $(1-\tau) \cdot g_{B_1}(x) + \tau \cdot g_{B_2}(x)$. Note that $\tau \in [0,1]$.
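A minimal sketch of this mixture, assuming the weights are 1 − τ for the first component and τ for the second. The helper names and the two normal components are hypothetical and serve only as an example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, tau, pdf_1, pdf_2):
    """Density of the mixture: (1 - tau) * pdf_1(x) + tau * pdf_2(x)."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return (1.0 - tau) * pdf_1(x) + tau * pdf_2(x)


def sample_mixture(rng, tau, draw_1, draw_2):
    """Draw from the second component with probability tau, else from the first."""
    return draw_2() if rng.random() < tau else draw_1()


# Hypothetical components: N(0, 1) and N(3, 0.5).
g_1 = lambda x: normal_pdf(x, 0.0, 1.0)
g_2 = lambda x: normal_pdf(x, 3.0, 0.5)
density_at_one = mixture_pdf(1.0, 0.25, g_1, g_2)
```

Setting `tau` to 0 or 1 recovers the first or the second component, respectively.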

7 See https://etorarza.github.io/pages/2021-interactive-comparing-RV.html for an interactive example that illustrates the above point.

8 The bootstrap method (Efron and Tibshirani 1993) involves considering the observed values as a population from which random samples with replacement are drawn. These samples are then used to estimate the upper and lower pointwise confidence intervals of the cumulative distribution of $Y_A$ and $Y_B$. Since a pointwise estimation of the confidence interval is used, we can expect that a portion proportional to $\alpha$ will fall outside the confidence band.

Note that we are interested in an overall confidence of $1-\alpha$; that is, we want the cumulative distributions of $Y_A$ and $Y_B$ to be inside their confidence bands at the same time with this level of confidence (Goeman and Solari 2014; Bauer 1991). This means that we have to use a higher confidence level for each band: (1α).
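The resampling idea in this note can be sketched as a percentile bootstrap of the empirical cumulative distribution. This is an illustration under stated assumptions, not the article's exact procedure: the function names, the default number of resamples, and the percentile rule are choices made here, and the per-band confidence level is left as the parameter `level` for the caller to set.

```python
import random
from bisect import bisect_right


def ecdf_at(sorted_sample, x):
    """Empirical CDF of an already sorted sample, evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def bootstrap_cdf_band(sample, grid, level=0.95, n_boot=1000, seed=0):
    """Pointwise percentile-bootstrap band for the CDF at each point of grid."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    rng = random.Random(seed)
    n = len(sample)
    curves = []
    for _ in range(n_boot):
        # Treat the observed values as a population; resample with replacement.
        resample = sorted(rng.choices(sample, k=n))
        curves.append([ecdf_at(resample, x) for x in grid])
    lo_idx = int((1.0 - level) / 2.0 * (n_boot - 1))
    hi_idx = n_boot - 1 - lo_idx
    lower, upper = [], []
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


# Hypothetical observations and evaluation grid.
observed = [0.8, 1.1, 1.3, 2.0, 2.2, 2.9]
grid = [0.0, 1.2, 2.1, 3.5]
lower, upper = bootstrap_cdf_band(observed, grid, level=0.9, n_boot=200, seed=1)
```

Because the band is built point by point, each grid point is covered at the requested level on its own; covering two curves simultaneously requires a stricter per-band level, as the note explains.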

9 Note that if the random variables being compared take values in a maximization setting (higher values are preferred), then the random variables need to be redefined as their additive inverses (this simply means the sampled values are multiplied by −1) before generating the cumulative difference-plot. With this change, the interpretation of the cumulative difference-plot is consistent and intuitive: for either minimization or maximization, the most desirable values that the random variables take are compared on the left side of the cumulative difference-plot. If the difference is positive on the left side of the cumulative difference-plot, then the best values that $X_A$ takes are better than the best values that $X_B$ takes. Similarly, the worst values are compared on the right side of the cumulative difference-plot: if the difference is positive on this side, then the worst values of $X_A$ are better than the worst values of $X_B$.
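The sign change described in this note amounts to a single preprocessing step, sketched below. The helper name `to_minimization` and the accuracy values are hypothetical.

```python
def to_minimization(sample, higher_is_better):
    """Multiply observations by -1 when higher values are preferred."""
    if higher_is_better:
        return [-value for value in sample]
    return list(sample)


# Hypothetical accuracies (maximization): the best value, 0.93, becomes the lowest.
accuracies = [0.91, 0.88, 0.93]
as_minimization = to_minimization(accuracies, higher_is_better=True)
```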

10 The definition of what constitutes data with a more extreme statistic value is not the same for every null hypothesis test; it depends on the test being used.

11 For paired data, the Wilcoxon signed-rank test (Wilcoxon 1945) or the sign test (Conover and Conover 1980) should be used. However, in the context of this article, the samples observed from the random variables are not paired. In this article, we consider the Mann–Whitney test as it is probably the most well-known nonparametric test for unpaired data, although note that more modern alternatives have been proposed (Ledwina and Wyłupek 2012; Baumgartner, Weiß, and Schindler 1998; Biswas and Ghosh 2014).
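For readers unfamiliar with the Mann–Whitney test, the following minimal sketch computes its U statistics for two unpaired samples from pooled midranks. Only the statistic is computed, not a p-value; the function names and the sample values are hypothetical.

```python
def midranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_A, U_B), where U_A counts pairs with a > b (ties count 0.5)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical unpaired observations.
u_a, u_b = mann_whitney_u([0.8, 1.1, 1.3, 2.0], [1.0, 1.5, 1.9, 2.4])
```

Dividing `u_b` by the number of pairs gives the proportion of pairs in which the observation from the first sample is lower than the one from the second.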

12 The source code to replicate this experiment is available in the file mann_whitney_counter_example.R in our GitHub repository.
