Special Section on Roles of Hypothesis Testing, p-Values, and Decision-Making in Biopharmaceutical Research

Is the p-Value a Suitable Basis for the Construction of Measures of Evidence? Comment on “The Role of p-Values in Judging the Strength of Evidence and Realistic Replication Expectations”

Pages 28-29 | Received 01 Sep 2020, Accepted 16 Sep 2020, Published online: 05 Nov 2020

Dr. Gibson is to be congratulated for having enriched the wealth of articles written in response to the 2016 ASA statement on p-values with a valuable and thoughtful contribution. We particularly appreciate that a biostatistician holding a senior position at one of the world's leading pharmaceutical companies joins the group of expert statistical scientists who do not share the view that p-values ought to be deprived of their role as a major backbone of inference in the empirical sciences as a whole. Remarkably, the colleagues taking a skeptical perspective on the initiative given a platform in one of the flagship journals of the ASA include even top-ranking representatives of that highly respected scientific organization: Karen Kafadar (Citation2019), who in an editorial in Amstat News emphatically warned of the potentially detrimental consequences of abandoning "our well-researched and theoretically sound statistical methodology," was the President of the ASA when she wrote this! Fortunately, historical experience suggests that p-values and hypothesis tests are fairly robust against initiatives aiming to eliminate them from the practice of statistical data analysis. Indeed, a campaign to ban p-values had already been launched in the 1980s by several influential epidemiologists. In that earlier version, the campaign aimed at replacing hypothesis testing with confidence interval estimation as the standard methodology of inference in scientific data analysis (cf. Rothman Citation1978; Salsburg Citation1985; Gardner and Altman Citation1986). Some of its main proponents organized a symposium devoted to the topic at the 1986 Annual Conference of the Society for Epidemiologic Research (SER) and chaired it wearing T-shirts with "no p-value" logos. There is little, if any, evidence that the pro-confidence-interval initiative had a substantial impact on scientific practice and statistical education.
As striking evidence to the contrary, a glance at a recent issue of the American Journal of Epidemiology may suffice: although p-values are strongly discouraged in the instructions to authors, numerous articles reporting results of significance tests are still published in that journal. One indicator suggesting that the recent campaign can be expected to share the same fate is that, up to now, no steps have been taken to change the title of the magazine published jointly by the ASA and the Royal Statistical Society to promote the public understanding of statistics. The title still reads "Significance," and quite a few additional facts make it very unlikely that methods of significance testing will really become obsolete in the near future.

In its more technical part, Gibson (Citation2020) focuses on proposals for making the p-value a sensible measure of evidence by rescaling it through a suitable one-to-one transformation. We have serious doubts that finding such a transformation is a worthwhile objective. The major reasons for this view are the following.

  1. Distinguishing between good and poor measures of evidence requires that one be able to define this intrinsically elusive term without operationalizing it through a specific rule for its quantification. The definition offered by Gibson ("Evidence is information indicating the degree to which a proposition is valid.") is nonoperational but fairly problematic from a logical point of view: a proposition of whatever kind is either valid or not, implying that validity cannot be regarded as an attribute admitting of degrees.

  2. In many areas of biopharmaceutical research and the medical sciences, one is regularly urged to make a decision on whether or not the "evidence" made available on a given problem is sufficient for considering it solved and putting the corresponding action(s) into effect. The p-value in its original form seems much better suited for decision making than a transform like −log10(p), which does not admit a direct probabilistic interpretation. By experience, most people have a much more clear-cut sense of orders of magnitude for probabilities than for other scales. The high importance of decision making in the medical sciences was already emphasized by Fleiss (Citation1986a, 1986b, 1986c) in his convincing refutation of the alleged inadequacy of significance testing as a basic tool of scientific inference.

  3. In the simple example of the two-sample Gauss test for the difference between two normal means, with unpaired samples of common known variance, the p-value, and thus any transform of it, is a function of the normalized point estimate √(n/2)·|δ̂|/σ. The latter indicates how much larger the observed value of the natural estimator of the parameter of interest is than its standard error, and it seems at least as well suited for measuring the "evidence" for a deviation of the true parameter value from zero as any continuous monotone function of this computationally straightforward quantity. Except for the factor √(n/2), it coincides with the estimated "effect size" in the sense of the terminology popularized, primarily among researchers in the social sciences, by Cohen (Citation1962).
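To make points 2 and 3 concrete, the following sketch (all numbers hypothetical, function names ours) computes the normalized point estimate √(n/2)·|δ̂|/σ for the two-sample Gauss test, the corresponding two-sided p-value, and the −log10 transform discussed above, using only the standard library:

```python
import math

def normalized_estimate(delta_hat, sigma, n):
    """Normalized point estimate sqrt(n/2) * |delta_hat| / sigma for the
    two-sample Gauss test with n observations per group and known common
    standard deviation sigma (unpaired samples)."""
    return math.sqrt(n / 2.0) * abs(delta_hat) / sigma

def two_sided_p(z):
    """Two-sided p-value of the Gauss test: a monotone decreasing
    function of the normalized estimate z (standard-normal CDF via erf)."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Hypothetical data: observed mean difference 0.5, sigma = 1, n = 32 per group
z = normalized_estimate(0.5, 1.0, 32)      # sqrt(32/2) * 0.5 = 2.0
p = two_sided_p(z)
print(f"statistic = {z:.3f}, p = {p:.4f}, -log10(p) = {-math.log10(p):.3f}")
```

Here p ≈ 0.045 reads directly as a tail probability, whereas its −log10 transform has no such interpretation; and since the statistic, the p-value, and the transform are all monotone functions of one another, none of them carries more information than the normalized estimate itself.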

An important issue on which we fully agree with the author of the article under discussion concerns the suitability, claimed in a number of references cited there, of the so-called Bayes factor as an option for replacing the p-value with an improved measure of evidence. First, the term is misleading, since it refers to a quantity that is not intrinsically Bayesian but plays a prominent role in frequentist inference under the heading of the likelihood ratio. Second, the concept can easily be made precise only in the practically rather irrelevant case of a pair of one-point hypotheses, since otherwise H0 or H1 or both correspond to a whole, typically non-finite subfamily of densities of the outcome variable. Third, and most important, we know of no convincing argument warranting the claim that, as a measure of "evidence," the Bayes factor makes more sense than the p-value.
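In the one case where the Bayes factor is unambiguous, a pair of one-point hypotheses, it reduces exactly to the ordinary likelihood ratio, as this sketch with hypothetical normal-mean hypotheses illustrates (all numbers and names are ours, for illustration only):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_factor_point(x, mu0, mu1, sigma=1.0):
    """For the point hypotheses H0: mu = mu0 vs H1: mu = mu1 and one
    observation x, the Bayes factor in favor of H1 is exactly the
    frequentist likelihood ratio of the two densities."""
    return normal_pdf(x, mu1, sigma) / normal_pdf(x, mu0, sigma)

# Hypothetical: one observation x = 1.0, H0: mu = 0 vs H1: mu = 1
print(bayes_factor_point(1.0, 0.0, 1.0))   # exp(0.5), about 1.649
# An observation equidistant from both means favors neither hypothesis:
print(bayes_factor_point(0.5, 0.0, 1.0))   # exactly 1.0
```

Once H0 or H1 is composite, the numerator or denominator requires averaging the density over a whole subfamily of parameter values, and the quantity is no longer uniquely defined without further (prior) input.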

References

  • Cohen, J. (1962), “The Statistical Power of Abnormal-Social Psychological Research: A Review,” The Journal of Abnormal and Social Psychology, 65, 145–153. DOI: 10.1037/h0045186.
  • Fleiss, J. L. (1986a), “Significance Tests Have a Role in Epidemiologic Research: Reactions to A.M. Walker,” American Journal of Public Health, 76, 559–560. DOI: 10.2105/ajph.76.5.559.
  • Fleiss, J. L. (1986b), “Confidence Intervals vs Significance Tests: Quantitative Interpretation,” American Journal of Public Health, 76, 587.
  • Fleiss, J. L. (1986c), “Dr. Fleiss Responds,” American Journal of Public Health, 76, 1033–1044.
  • Gardner, M. J., and Altman, D. G. (1986), “Confidence Intervals Rather Than P Values: Estimation Rather Than Hypothesis Testing,” British Medical Journal (Clinical Research Edition), 292, 746–750. DOI: 10.1136/bmj.292.6522.746.
  • Gibson, E. W. (2020), “The Role of p-Values in Judging the Strength of Evidence and Realistic Replication Expectations,” Statistics in Biopharmaceutical Research, doi:10.1080/19466315.2020.1724560.
  • Kafadar, K. (2019), “Statistics and Unintended Consequences,” Amstat News, 504, 3–4.
  • Rothman, K. J. (1978), “A Show of Confidence,” The New England Journal of Medicine, 299, 1362–1363. DOI: 10.1056/NEJM197812142992410.
  • Salsburg, D. S. (1985), “The Religion of Statistics as Practiced in Medical Journals,” The American Statistician, 39, 220–223.
