Research Article

Strong-Form Frequentist Testing in Communication Science: Principles, Opportunities, and Challenges

