Editorial

The Importance of Effect Size Reporting in Communication Research Reports


Abstract

Reflecting Communication Research Reports' shift to requiring (rather than suggesting) effect size reporting with all manuscripts as of Volume 34, a few remarks explaining the context of this shift for submitting authors are offered, including a brief explanation of the complementary relationship between p-values and effect sizes, and the reasons that both are critical to providing readers with a more complete sense of any manuscript’s evidential value.

Nearly 23 years ago, the American Psychological Association (APA, Citation1994) first made mention of the importance of effect size reporting, albeit through a vague statement “encouraging” the presentation of effect-size information (critiqued for its ambiguity by Fidler, Citation2010). As a journal that uses APA guidelines to direct our own manuscript style, we have informally required effect size measures for years. Beginning with Volume 34, Communication Research Reports (CRR)Footnote1 added the following requirement to manuscripts being submitted for publication consideration:

All data analyses should report relevant effect size statistics, and we recommend reporting other diagnostics (such as, but not limited to: observed power, tests of significance inflation, and test of insufficient variance, among others) at the Author’s discretion (Communication Research Reports, Citation2017, para. 4).

As a journal that is primarily concerned with reporting empirical work from a social science perspective (Bowman, Citation2016), the requirement that all authors report effect size estimates is a recognition of perhaps one of the most critical outcomes of empirical research: the magnitude of an observed effect (Lakens, Citation2013). Authors commonly rely solely on tests of statistical significance, usually with an a priori probability level of p < .05, to frame their effects. Indeed, the p < .05 significance level is so ubiquitous in communication studies that direct interpretation of the value is often not considered (and is surprisingly difficult; cf. Aschwanden, Citation2015). In simplest terms, when a finding is statistically significant (using an a priori p-value), one has evidence that the probability of observing a result equal to or more extreme than the one actually observed, assuming the null hypothesis is correct, is low (in the case of p < .05, less than 5%).Footnote2
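Stated formally (a standard restatement consistent with the conditional-probability definition of p that Kline, Citation2013, elaborates; the notation is not drawn from the editorial itself), the p-value for an observed test statistic can be written as:

$$ p = \Pr\!\left( \, |T| \geq |t_{\mathrm{obs}}| \;\middle|\; H_{0}\ \text{is true} \right) $$

where T is the test statistic under the null hypothesis H0 and t_obs is its observed value. Nothing in this expression speaks to the size of the effect itself, which is the gap effect sizes are meant to fill.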

However, p-values are only able to inform the binary decision as to whether or not a given treatment has an effect that is statistically distinguishable from the null—they are unable to offer insight into the magnitude of the actual effect (Sullivan & Feinn, Citation2012). Effect sizes such as Cohen’s d and R2 or η2 allow scholars to report on the magnitude of their effects, and primers such as that offered by Cohen (Citation1988) provide a basic framework for interpreting those effects in terms of being comparatively small (Cohen’s d = .2 corresponds to about 1% of observed variance explained), moderate (Cohen’s d = .5 corresponds to about 6% of observed variance explained), or large (Cohen’s d = .8 corresponds to about 14% of observed variance explained).Footnote3 Notably, Cohen cautioned against using this framework as a de facto determination of an effect’s practical magnitude, and encouraged researchers to frame effects within the context in which they are being observed. Weber and Popova (Citation2012) suggest scholars benchmark findings against effect sizes in previous work.
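To make the correspondence between Cohen’s d and variance explained concrete, the following is a minimal sketch of one common conversion, r = d / √(d² + 4), which assumes two groups of equal size (the function name is illustrative, not taken from any cited source):

```python
# A minimal sketch (not code from the editorial): converting Cohen's d into an
# approximate proportion of variance explained via r = d / sqrt(d^2 + 4),
# a common conversion that assumes two groups of equal size.
import math

def d_to_variance_explained(d: float) -> float:
    """Return r^2, the approximate proportion of variance explained by Cohen's d."""
    r = d / math.sqrt(d ** 2 + 4)
    return r ** 2

for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f} -> about {d_to_variance_explained(d):.1%} of variance explained")
# d = 0.2 -> about 1.0%; d = 0.5 -> about 5.9%; d = 0.8 -> about 13.8%
```

Running the sketch reproduces the approximate 1%, 6%, and 14% figures cited above (more precisely, 1.0%, 5.9%, and 13.8%).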

Critical to the relationship between significance levels and effect sizes is that although the formulas for both include observed means and standard deviations, only significance tests consider sample size in their calculation (in the denominator). The end result is that, for the same observed effect, larger sample sizes lead to lower p-values regardless of the actual magnitude of that effect. Consider an observed mean difference of .10 between groups with equal standard deviations of .50. The magnitude of this effect would be Cohen’s d = .2 (a small effect) regardless of whether it was observed between groups of two people, 20 people, or 2,000 (setting aside the discussion of effect size stability; cf. Lakens & Evers, Citation2014). Yet, the observed effect—the same 1% of explained variance—would not be detected with conventional statistical power (80%, at the two-tailed p < .05 level) until the sample size reached a minimum of N = 788 (394 per group). Notably, this example highlights two related concerns with a sole reliance on p-values: (a) authors often use p-values as their primary evidential value in rejecting the null hypothesis, while (b) disregarding the magnitude of an observed effect that might not have practical impact.
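As a hedged illustration of where a figure in the neighborhood of N = 788 comes from, the sketch below computes Cohen’s d for this example and then solves for the per-group sample size needed to detect it with 80% power at two-tailed α = .05. It assumes the statsmodels package is available and is not code from the editorial:

```python
# A sketch under stated assumptions: Cohen's d for the worked example, plus the
# per-group n needed to detect d = .2 with 80% power at two-tailed alpha = .05.
# Requires the statsmodels package.
import math

from statsmodels.stats.power import TTestIndPower

mean_difference = 0.10   # observed difference between the two group means
pooled_sd = 0.50         # equal standard deviations in both groups
d = mean_difference / pooled_sd  # Cohen's d = 0.2, a "small" effect

n_per_group = math.ceil(
    TTestIndPower().solve_power(
        effect_size=d, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
    )
)
print(f"Cohen's d = {d:.2f}")          # 0.20
print(f"n per group = {n_per_group}")  # 394
print(f"total N = {2 * n_per_group}")  # 788
```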

While p-values represent the probability of observing a given finding (or one more extreme) under the assumption of chance alone, they offer no insight into the “extremity” of the effect itself—and they place more evidential value on the sample size than on the effect size. Should one interpret an effect size of 1% as more relevant simply because it was culled from a sample of 788 (as compared to similarly representative samples of 78 or 78,000)? It seems inappropriate to use sample size as sole evidence of a hypothesis’s impact without further elaboration as to the magnitude of the effect. Effect sizes work in tandem with p-values: the p-value gives an estimate of findings being systematic with some probability, and the effect size quantifies the magnitude of the finding. Finally, effect sizes are critical for meta-analytic work that attempts to synthesize findings across several studies in order to estimate an overall observed effect (Schmidt & Hunter, Citation2015).

The issues inherent to the discussion of effect sizes—and the broader critiques of statistical conclusion validity—are complex and beyond the scope of any one short editorial. The end goal here is to raise awareness among potential CRR authors as to the reasons why the journal has shifted toward being stricter with respect to the requirement of effect size reporting and interpretation, as an effort to improve the evidential value of manuscripts published in CRR.Footnote4

Acknowledgments

I wish to sincerely thank my numerous colleagues who offered peer review of this editorial prior to publication.

Notes

[1] I should note that under Editor Don Stacks (Vols. 31-33), effect size estimates were increasingly strongly recommended, as I can attest to from my own experiences as an author and reviewer.

[2] A more comprehensive definition of p as a conditional probability is offered by Kline (Citation2013). Notably, CRR is generally aware of emerging perspectives in the social sciences that encourage alternatives to null-hypothesis significance tests (NHST) due in part to their reliance on assuming the null hypothesis to be a priori correct, such as Bayesian methods that allow for direct comparisons between the statistical likelihood of competing hypotheses, such as the predicted and null hypotheses (Eddy, Citation2004); by contrast, the logic underpinning NHST only allows one to reject or fail to reject the null. We welcome submissions based on Bayesian statistical models, as well as other approaches such as the equivalence tests proffered by Weber and Popova (Citation2012).

[3] A number of tools exist online to help authors and readers translate between different effect sizes, such as those provided by Lenhard and Lenhard (Citation2016).

[4] For authors looking for additional guidance, there is a broad range of effect sizes available based on one’s focus (cf. Lakens, Citation2013, for a primer on effect sizes for mean comparisons; Schmidt and Hunter, Citation2015, for explanation and application of myriad effect size measures for meta-analyses).

References

  • American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: APA.
  • Aschwanden, C. (2015, November 24). Not even scientists can easily explain p-values. FiveThirtyEight. Retrieved from http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/
  • Bowman, N. D. (2016). Research reports as the “nuts and bolts” of communication research. Communication Research Reports, 33(2), 87. doi: 10.1080/08824096.2016.1174536
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Communication Research Reports. (2017). Instructions for authors. Retrieved from http://www.tandfonline.com/action/authorSubmission?show=instructions&journalCode=rcrr20
  • Eddy, S. R. (2004). What is Bayesian statistics? Nature Biotechnology, 22, 1177–1178. doi: 10.1038/nbt0904-1177
  • Edwards, M. A., & Roy, S. (2017). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34(1), 51–61. doi: 10.1089/ees.2016.0223
  • Fidler, F. (2010). The American Psychological Association publication manual, 6th edition: Implications for statistics education. In C. Reading (Ed.), Data and context in statistics education: Towards an evidence-based society. Voorburg, Netherlands: International Association of Statistics Education.
  • Kline, R. (2013). Beyond significance testing. Washington, DC: APA.
  • Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863. doi: 10.3389/fpsyg.2013.00863
  • Lakens, D., & Evers, E. R. K. (2014). Sailing from the seas of chaos into the corridor of stability. Perspectives on Psychological Science, 9(3), 278–292. doi: 10.1177/1745691614528520
  • Lenhard, W., & Lenhard, A. (2016). Calculation of effect sizes. Bibergau, Germany: Psychometrica. Retrieved from https://www.psychometrica.de/effect_size.html doi: 10.13140/RG.2.1.3478.4245
  • Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: Sage.
  • Sullivan, G. M., & Feinn, R. (2012). Using effect sizes—or why the p-value is not enough. Journal of Graduate Medical Education, 4(3), 279–282. doi: 10.4300/JGME-D-12-00156.1
  • Weber, R., & Popova, L. (2012). Testing equivalence in communication research: Theory and application. Communication Methods & Measures, 6, 190–213. doi: 10.1080/19312458.2012.703834
