ABSTRACT
Online controlled experiments (i.e., A/B tests) have recently become an extremely valuable tool for internet and technology companies, which use them for advertising, product development, product improvement, customer acquisition, and customer retention, to name a few purposes. The data-driven decisions that result from these experiments have traditionally been informed by null hypothesis significance tests and analyses based on p-values. Attention has since been drawn to the shortcomings of hypothesis testing, and emphasis has been placed on developing new methodologies that overcome them. We propose the use of posterior probabilities to facilitate comparisons that account for practical equivalence and that quantify the probability that a result is practically meaningful, as opposed to merely statistically significant. We call these posterior probabilities comparative probability metrics (CPMs). This Bayesian methodology provides a flexible and intuitive means of making meaningful comparisons by directly calculating, for example, the probability that two groups are practically equivalent, or the probability that one group is practically superior to another. In this article, we describe a unified framework for constructing and estimating such probabilities, and we develop a sample size determination methodology that may be used to determine how much data are required to calculate trustworthy CPMs.
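To make the idea of a CPM concrete, the following is a minimal sketch rather than the framework developed in the article. It assumes two groups with binary outcomes, independent Beta(1, 1) priors on the conversion rates, and a practical-equivalence margin `delta` on the difference in rates; the function name `cpm_beta_binomial`, the prior choice, and the Monte Carlo estimator are all illustrative assumptions.

```python
import random


def cpm_beta_binomial(x1, n1, x2, n2, delta, draws=20_000, seed=0):
    """Estimate posterior probabilities of practical equivalence and
    practical superiority for two conversion rates (hypothetical helper).

    x1, x2: observed conversions; n1, n2: group sizes.
    delta:  margin within which the two rates count as practically equivalent.
    """
    rng = random.Random(seed)
    equivalent = superior_1 = superior_2 = 0
    for _ in range(draws):
        # Assumed Beta(1, 1) priors give Beta(1 + x, 1 + n - x) posteriors.
        theta1 = rng.betavariate(1 + x1, 1 + n1 - x1)
        theta2 = rng.betavariate(1 + x2, 1 + n2 - x2)
        diff = theta1 - theta2
        if abs(diff) <= delta:
            equivalent += 1
        elif diff > delta:
            superior_1 += 1
        else:
            superior_2 += 1
    # Each draw falls in exactly one region, so the estimates sum to 1.
    return {
        "equivalent": equivalent / draws,
        "group1_superior": superior_1 / draws,
        "group2_superior": superior_2 / draws,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the article.
    print(cpm_beta_binomial(x1=120, n1=1000, x2=100, n2=1000, delta=0.01))
```

Each returned value is a direct probability statement about practical relevance given the data, which is the kind of quantity a p-value does not provide.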
Supplementary Materials
The supplementary materials contain the simulation results described in Sections 2.2, 2.3, and 4.2, and are available online. The data associated with the examples from Section 3 are also available online.
Acknowledgments
The authors thank two anonymous reviewers and an associate editor for insightful comments that were very helpful in improving the article.
Funding
This work was supported by the Natural Sciences and Engineering Research Council of Canada through an Undergraduate Student Research Award and Grant RGPIN-2019-04212.