Comparing Two Samples Through Stochastic Dominance: A Graphical Approach. Journal of Computational and Graphical Statistics, Vol. 32, No. 2

Abstract

Nondeterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm and the total reward of a reinforcement learning agent in a chaotic environment are just two examples of measurements with unpredictable outcomes. These measurements can be modeled as random variables and compared with each other through their expected values or through more sophisticated tools such as null hypothesis statistical tests. In this article, we propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one random variable stochastically dominates that of the other. Then, we present a graphical method that breaks down, quantile by quantile, (i) the proposed dominance measure and (ii) the probability that one of the random variables takes lower values than the other. For illustrative purposes, we reevaluate the experiments of a previously published work with the proposed methodology and show that additional conclusions, missed by the other methods, can be drawn. Additionally, we developed the software package RVCompare as a convenient way of applying and experimenting with the proposed framework.
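A rough empirical sketch of the two quantities named above, assuming minimization, may help fix ideas. It assumes that (ii) is estimated over all pairs of observations as $P(X_{A} < X_{B}) + \frac{1}{2} P(X_{A} = X_{B})$, and that the dominance measure (i) can be approximated by the fraction of quantile levels $u \in (0, 1)$ at which the empirical quantile of $X_{A}$ is lower than that of $X_{B}$, again counting ties as one half. These readings and the function names `prob_lower` and `dominance_proportion` are illustrative assumptions, not the exact definitions or the API of RVCompare.

```python
import math
from typing import Sequence


def prob_lower(a: Sequence[float], b: Sequence[float]) -> float:
    """Estimate P(X_A < X_B) + 0.5 * P(X_A = X_B) over all (a, b) pairs.

    Assumption: ties are counted as one half.
    """
    wins = 0.0
    for x in a:
        for y in b:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))


def empirical_quantile(sorted_sample: Sequence[float], u: float) -> float:
    """Smallest observed value x whose empirical CDF F(x) is >= u."""
    n = len(sorted_sample)
    k = max(math.ceil(u * n), 1)  # 1-based index of the order statistic
    return sorted_sample[k - 1]


def dominance_proportion(a: Sequence[float], b: Sequence[float],
                         grid_size: int = 10_000) -> float:
    """Fraction of quantile levels where A's quantile is lower than B's.

    Assumption (hypothetical reading of the dominance measure): ties count
    as one half, and u is sampled at the midpoints of a uniform grid.
    """
    sa, sb = sorted(a), sorted(b)
    score = 0.0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size  # midpoints avoid u = 0 and u = 1
        qa, qb = empirical_quantile(sa, u), empirical_quantile(sb, u)
        if qa < qb:
            score += 1.0
        elif qa == qb:
            score += 0.5
    return score / grid_size
```

For `a = [1, 2, 3, 4]` and `b = [2, 3, 4, 5]`, `prob_lower` returns 0.71875 and `dominance_proportion` returns 1.0: every empirical quantile of `a` lies one unit below the corresponding quantile of `b`, yet some individual draws of `a` are larger than some draws of `b`. This is one way in which the two quantities can disagree.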


Supplementary Material

RVCompare: With this R package, users can compute the $C_{P}$ and $C_{D}$ of two distributions, given their probability density functions. It can also produce the proposed cumulative difference-plot from observed data. (The package can be installed directly from CRAN and is also available in the GitHub repository https://github.com/EtorArza/RVCompare).
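RVCompare's own R functions are not reproduced here. As a minimal sketch of how a probability like $C_{P}$ can be computed from two density functions, the Python code below assumes that $X_{A}$ and $X_{B}$ are independent, that $C_{P}$ is read as $P(X_{A} < X_{B}) + \frac{1}{2} P(X_{A} = X_{B})$, and that an interval `[lo, hi]` contains essentially all of the probability mass. It discretizes both densities with the midpoint rule and treats two draws falling in the same grid cell as a tie. The names `cp_from_pdfs` and `uniform_pdf` are hypothetical.

```python
from typing import Callable

Density = Callable[[float], float]


def cp_from_pdfs(pdf_a: Density, pdf_b: Density,
                 lo: float, hi: float, n: int = 40_000) -> float:
    """Approximate P(X_A < X_B) + 0.5 * P(X_A = X_B) from two densities.

    Assumptions: X_A and X_B are independent, and [lo, hi] holds
    essentially all of their probability mass.
    """
    h = (hi - lo) / n
    cdf_b = 0.0  # probability mass of X_B in the cells already visited
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h   # midpoint of cell i
        mass_a = pdf_a(x) * h    # P(X_A falls in cell i)
        mass_b = pdf_b(x) * h    # P(X_B falls in cell i)
        # X_B in a later cell counts fully; the same cell counts as a tie.
        total += mass_a * ((1.0 - cdf_b - mass_b) + 0.5 * mass_b)
        cdf_b += mass_b
    return total


def uniform_pdf(a: float, b: float) -> Density:
    """Density of the continuous uniform distribution on [a, b]."""
    return lambda x: 1.0 / (b - a) if a <= x <= b else 0.0
```

For $X_{A} \sim U(0, 1)$ and $X_{B} \sim U(0.5, 1.5)$, `cp_from_pdfs(uniform_pdf(0, 1), uniform_pdf(0.5, 1.5), -1.0, 3.0)` returns approximately 0.875, the exact value $7/8$. A finer grid trades running time for accuracy when the densities are smooth.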

Reproducibility: Alongside the article, we provide the code to generate its figures and replicate the experiments. For instructions on how to install the dependencies and replicate the results, refer to the README.md file in the repository. (GitHub repository https://github.com/EtorArza/SupplementaryPaperRVCompare).

Appendices: To keep the article at a reasonable length, the appendices have been moved to a separate document. (The appendices are available for download at https://doi.org/10.5281/zenodo.6528669).

Disclosure Statement

The authors report that there are no competing interests to declare.

Acknowledgments

This work was funded in part by the Spanish Ministry of Science, Innovation and Universities through PID2019-106453GA-I00/AEI/10.13039/501100011033 and the BCAM Severo Ochoa excellence accreditation SEV-2017-0718; by the Basque Government through the Research Groups 2019-2021 IT1244-19, ELKARTEK Program (project code KK-2020/00049) and BERC 2018-2021 program.

Notes

1 We follow an example from the Keras library (Chollet et al. 2015) and train the neural network for one epoch.

2 The source of the package RVCompare can be found at github.com/EtorArza/RVCompare. The code to reproduce every figure in the article is available at github.com/EtorArza/SupplementaryPaperRVCompare.

3 Without loss of generality, minimization is assumed in this article.

4 Note that $X_{A} ⊁ X_{B}$ is not equivalent to $X_{B} ≻ X_{A}$: it is possible that neither random variable dominates the other.
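The point of this note can be checked on small samples. The sketch below assumes the usual first-order definition adapted to minimization: $X_{A}$ dominates $X_{B}$ when the cumulative distribution function of $X_{A}$ is never below that of $X_{B}$ and is strictly above it somewhere. It applies this to empirical CDFs, which only change at observed values, so checking the pooled observations is enough. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ecdf(sorted_sample: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Empirical first-order dominance of A over B under minimization.

    Assumption: A dominates B if A's ECDF is never below B's and is
    strictly above it at some point (A tends to take lower values).
    """
    sa, sb = sorted(a), sorted(b)
    points = sorted(set(sa) | set(sb))  # ECDFs only change at these points
    diffs = [ecdf(sa, x) - ecdf(sb, x) for x in points]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)
```

With `a = [1, 10]` and `b = [5, 6]` the two empirical CDFs cross, so both `dominates(a, b)` and `dominates(b, a)` are `False`: $X_{A} ⊁ X_{B}$ holds without $X_{B} ≻ X_{A}$ holding.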

5 We define $X_{A} + λ$ as the random variable that is sampled in two steps: first obtain an observation from $X_{A}$, and then add λ to this observation. We define $λ \cdot X_{A}$ in a similar way, multiplying the observation by λ instead of adding λ to it.
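The two-step definition translates directly into a composition of samplers. In this minimal sketch, a sampler is assumed to be any function that takes a `random.Random` instance and returns one observation; the names `shift` and `scale` are illustrative and not part of RVCompare.

```python
import random
from typing import Callable

# Hypothetical representation: a sampler draws one observation from an RNG.
Sampler = Callable[[random.Random], float]


def shift(sampler: Sampler, lam: float) -> Sampler:
    """Sampler for X + lam: draw an observation from X, then add lam."""
    return lambda rng: sampler(rng) + lam


def scale(sampler: Sampler, lam: float) -> Sampler:
    """Sampler for lam * X: draw an observation from X, then multiply by lam."""
    return lambda rng: lam * sampler(rng)
```

For example, `scale(shift(x_a, 1.0), 2.0)` draws from $X_{A}$, adds 1, and then doubles the result, so it samples $2 \cdot (X_{A} + 1)$.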

6 The probability density function of $M_{[1 - τ, τ]}(X_{B_1}, X_{B_2})$ is defined as $(1 - τ) \cdot g_{B_1}(x) + τ \cdot g_{B_2}(x)$, where $τ \in [0, 1]$.
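The mixture can be read in two equivalent ways: through its density, as in this note, or operationally, by first choosing $X_{B_2}$ with probability τ (and $X_{B_1}$ otherwise) and then sampling from the chosen variable. A minimal Python sketch of both readings follows; the helper names `mixture_pdf` and `sample_mixture` are hypothetical.

```python
import random
from typing import Callable

Density = Callable[[float], float]
Draw = Callable[[random.Random], float]  # hypothetical: one draw from an RNG


def mixture_pdf(g_b1: Density, g_b2: Density, tau: float) -> Density:
    """Density (1 - tau) * g_B1(x) + tau * g_B2(x) of the mixture."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return lambda x: (1.0 - tau) * g_b1(x) + tau * g_b2(x)


def sample_mixture(draw_b1: Draw, draw_b2: Draw, tau: float,
                   rng: random.Random) -> float:
    """Draw from X_B2 with probability tau, otherwise from X_B1."""
    # rng.random() lies in [0, 1), so the comparison succeeds with probability tau.
    return draw_b2(rng) if rng.random() < tau else draw_b1(rng)
```

With τ = 0 the mixture reduces to $X_{B_1}$ and with τ = 1 to $X_{B_2}$, so sweeping τ over $[0, 1]$ interpolates between the two random variables.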

7 See https://etorarza.github.io/pages/2021-interactive-comparing-RV.html for an interactive example that illustrates the above point.

8 The bootstrap method (Efron and Tibshirani 1993) treats the observed values as a population from which random samples are drawn with replacement. These resamples are then used to estimate the pointwise upper and lower confidence intervals of the cumulative distribution functions of $Y_{A}$ and $Y_{B}$. Since the confidence intervals are estimated pointwise, we can expect a portion of each curve proportional to α to fall outside its confidence band.

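Note that we are interested in an overall confidence level of $1 - α$; that is, we want the cumulative distribution functions of $Y_{A}$ and $Y_{B}$ to lie inside their respective confidence bands simultaneously with this level of confidence (Goeman and Solari 2014; Bauer 1991). Assuming the two samples are independent, the probability that both curves lie inside their bands is the product of the individual confidence levels. Therefore, each band must use the higher confidence level $\sqrt{1 - α}$, since $\sqrt{1 - α} \cdot \sqrt{1 - α} = 1 - α$.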
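A minimal sketch of the procedure, assuming a percentile bootstrap of the empirical CDF evaluated on a fixed grid of points. The grid, the percentile rule, the default `n_boot`, and the function names are illustrative choices, not necessarily those used in RVCompare. The code uses the per-band level $\sqrt{1 - α}$ from the note (about 0.9747 for α = 0.05) and would be called once for the sample of $Y_{A}$ and once for that of $Y_{B}$.

```python
import math
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf_at(sorted_sample: Sequence[float], x: float) -> float:
    """Fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def bootstrap_cdf_band(sample: Sequence[float], grid: Sequence[float],
                       alpha: float = 0.05, n_boot: int = 1000,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Pointwise percentile-bootstrap band for the CDF of one sample.

    Assumption: the band uses level sqrt(1 - alpha), so that two
    independent bands (one per sample) jointly reach level 1 - alpha.
    """
    rng = random.Random(seed)
    level = math.sqrt(1.0 - alpha)
    tail = (1.0 - level) / 2.0  # probability left out in each tail
    n = len(sample)
    # curves[j][i] = ECDF of bootstrap resample j evaluated at grid[i]
    curves = []
    for _ in range(n_boot):
        resample = sorted(rng.choice(sample) for _ in range(n))
        curves.append([ecdf_at(resample, x) for x in grid])
    lo_idx = int(math.floor(tail * (n_boot - 1)))
    hi_idx = int(math.ceil((1.0 - tail) * (n_boot - 1)))
    lower, upper = [], []
    for i in range(len(grid)):
        values = sorted(c[i] for c in curves)
        lower.append(values[lo_idx])
        upper.append(values[hi_idx])
    return lower, upper
```

Because each point is covered only with probability $\sqrt{1 - α}$, parts of a curve may still leave the band, which is the pointwise caveat stated in the first paragraph of this note.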

9 Note that if the random variables being compared take values in a maximization setting (higher values are preferred), they must be redefined as their additive inverses (that is, the sampled values are multiplied by $-1$) before generating the cumulative difference-plot. With this change, the interpretation of the cumulative difference-plot is consistent and intuitive for both minimization and maximization: the left side of the plot compares the most desirable values that the random variables take. If the difference is positive on the left side, then the best values that $X_{A}$ takes are better than the best values that $X_{B}$ takes. Similarly, the right side of the plot compares the worst values: if the difference is positive on this side, then the worst values of $X_{A}$ are better than the worst values of $X_{B}$.
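A small sketch of why negation keeps the left side of the plot for the best values: after multiplying rewards by $-1$, sorting in ascending order (as a cumulative distribution function does) puts the highest original reward first. The helper name `best_to_worst` and its `maximize` flag are hypothetical.

```python
from typing import List, Sequence


def best_to_worst(sample: Sequence[float], maximize: bool) -> List[float]:
    """Order observations from most to least desirable, in minimization form.

    Maximization samples are negated first (multiplied by -1), so that lower
    always means better and ascending order runs from best to worst.
    """
    values = [-x for x in sample] if maximize else list(sample)
    return sorted(values)
```

For rewards `[10, 12, 15]`, `best_to_worst([10, 12, 15], maximize=True)` gives `[-15, -12, -10]`: the best reward, 15, now sits at the lowest quantile, exactly as the lowest cost would in a minimization problem.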

10 What counts as data with a more extreme statistic value is not the same for every null hypothesis test; it depends on the test being used.

11 For paired data, the Wilcoxon signed-rank test (Wilcoxon 1945) or the sign test (Conover and Conover 1980) should be used. However, in the context of this article, the samples observed from the random variables are not paired. In this article, we consider the Mann–Whitney test because it is probably the best-known nonparametric test for unpaired data, although more modern alternatives have been proposed (Ledwina and Wyłupek 2012; Baumgartner, Weiß, and Schindler 1998; Biswas and Ghosh 2014).
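For reference, a minimal sketch of the Mann–Whitney U statistic for two unpaired samples, with a two-sided p-value from the large-sample normal approximation. It applies no tie or continuity correction, which is a simplification; exact or corrected p-values are preferable for small samples or many ties. The function name is hypothetical, and this is not the implementation used in the article's experiments. Dividing U by $n \cdot m$ gives the pairwise estimate of the probability that $X_{A}$ takes the lower value, with ties counted as one half.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """U statistic for sample a and a two-sided normal-approximation p-value.

    U counts pairs (x, y) with x < y, plus one half for each tie.
    Simplification: no tie or continuity correction in the variance.
    """
    n, m = len(a), len(b)
    u = sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n * m / 2.0                          # E[U] under the null hypothesis
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)  # sd of U under the null (no ties)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))      # equals 2 * (1 - Phi(|z|))
    return u, p
```

For `a = [1, 2, 3, 4]` and `b = [2, 3, 4, 5]`, U is 11.5 out of 16 pairs; swapping the samples gives 4.5, and the two-sided p-value is the same either way.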

12 The source code to replicate this experiment is available in the file mann_whitney_counter_example.R in our GitHub repository.
