Reforming Institutions: Changing Publication Policies and Statistical Education

The World of Research Has Gone Berserk: Modeling the Consequences of Requiring “Greater Statistical Stringency” for Scientific Publication

Harlan Campbell & Paul Gustafson
Pages 358-373 | Received 15 Mar 2018, Accepted 23 Nov 2018, Published online: 20 Mar 2019

References

  • Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” Technical report, PeerJ Preprints.
  • Aycaguer, L. C. S., and Galbán, P. A. (2013), “Explicación Del Tamaño Muestral Empleado: Una Exigencia Irracional De Las Revistas Biomédicas” [“Explanation of the Sample Size Used: An Irrational Demand of Biomedical Journals”], Gaceta Sanitaria, 27, 53–57.
  • Bakker, M., van Dijk, A., and Wicherts, J. M. (2012), “The Rules of the Game Called Psychological Science,” Perspectives on Psychological Science, 7, 543–554. DOI: 10.1177/1745691612459060.
  • Begley, C. G., and Ellis, L. M. (2012), “Drug Development: Raise Standards for Preclinical Cancer Research,” Nature, 483, 531–533. DOI: 10.1038/483531a.
  • Begley, C. G., and Ioannidis, J. P. (2015), “Reproducibility in Science: Improving the Standard for Basic and Preclinical Research,” Circulation Research, 116, 116–126. DOI: 10.1161/CIRCRESAHA.114.303819.
  • Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., Fehr, E., Fidler, F., Field, A. P., Forster, M., George, E. I., Gonzalez, R., Goodman, S., Green, E., Green, D. P., Greenwald, A. G., Hadfield, J. D., Hedges, L. V., Held, L., Hua Ho, T., Hoijtink, H., Hruschka, D. J., Imai, K., Imbens, G., Ioannidis, J. P. A., Jeon, M., Holland Jones, J., Kirchler, M., Laibson, D., List, J., Little, R., Lupia, A., Machery, E., Maxwell, S. E., McCarthy, M., Moore, D. A., Morgan, S. L., Munafò, M., Nakagawa, S., Nyhan, B., Parker, T. H., Pericchi, L., Perugini, M., Rouder, J., Rousseau, J., Savalei, V., Schönbrodt, F. D., Sellke, T., Sinclair, B., Tingley, D., Van Zandt, T., Vazire, S., Watts, D. J., Winship, C., Wolpert, R. L., Xie, Y., Young, C., Zinman, J., and Johnson, V. E. (2018), “Redefine Statistical Significance,” Nature Human Behaviour, 2, 6–10. DOI: 10.1038/s41562-017-0189-z.
  • Bland, J. M. (2009), “The Tyranny of Power: Is There a Better Way to Calculate Sample Size?” BMJ, 339, b3985. DOI: 10.1136/bmj.b3985.
  • Borm, G. F., den Heijer, M., and Zielhuis, G. A. (2009), “Publication Bias Was Not a Good Reason to Discourage Trials With Low Power,” Journal of Clinical Epidemiology, 62, 47–53. DOI: 10.1016/j.jclinepi.2008.02.017.
  • Bosker, T., Mudge, J. F., and Munkittrick, K. R. (2013), “Statistical Reporting Deficiencies in Environmental Toxicology,” Environmental Toxicology and Chemistry, 32, 1737–1739. DOI: 10.1002/etc.2226.
  • Brembs, B., Button, K., and Munafò, M. (2013), “Deep Impact: Unintended Consequences of Journal Rank,” Frontiers in Human Neuroscience, 7, 1–12.
  • Burt, T., Button, K., Thom, H., Noveck, R., and Munafò, M. R. (2017), “The Burden of the ‘False-Negatives’ in Clinical Development: Analyses of Current and Alternative Scenarios and Corrective Measures,” Clinical and Translational Science, 10, 470–479. DOI: 10.1111/cts.12478.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., and Munafò, M. R. (2013a), “Empirical Evidence for Low Reproducibility Indicates Low Pre-study Odds,” Nature Reviews Neuroscience, 14, 877. DOI: 10.1038/nrn3475-c6.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., and Munafò, M. R. (2013b), “Power Failure: Why Small Sample Size Undermines the Reliability of Neuroscience,” Nature Reviews Neuroscience, 14, 365–376. DOI: 10.1038/nrn3475.
  • Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., et al. (2016), “Evaluating Replicability of Laboratory Experiments in Economics,” Science, 351, 1433–1436. DOI: 10.1126/science.aaf0918.
  • Campbell, H., and Gustafson, P. (2018), “Conditional Equivalence Testing: An Alternative Remedy for Publication Bias,” PLoS One, 13, e0195145. DOI: 10.1371/journal.pone.0195145.
  • Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P., and Willmes, K. (2015), “Registered Reports: Realigning Incentives in Scientific Publishing,” Cortex, 66, A1–A2. DOI: 10.1016/j.cortex.2015.03.022.
  • Charles, P., Giraudeau, B., Dechartres, A., Baron, G., and Ravaud, P. (2009), “Reporting of Sample Size Calculation in Randomised Controlled Trials,” BMJ, 338, b1732. DOI: 10.1136/bmj.b1732.
  • Charlton, B. G., and Andras, P. (2006), “How Should We Rate Research?: Counting Number of Publications May Be Best Research Performance Measure,” BMJ, 332, 1214–1215. DOI: 10.1136/bmj.332.7551.1214-c.
  • Coffman, L. C., and Niederle, M. (2015), “Pre-analysis Plans Have Limited Upside, Especially Where Replications Are Feasible,” Journal of Economic Perspectives, 29, 81–98. DOI: 10.1257/jep.29.3.81.
  • Cohen, J. (1962), “The Statistical Power of Abnormal-Social Psychological Research: A Review,” The Journal of Abnormal and Social Psychology, 65, 145–153. DOI: 10.1037/h0045186.
  • Cohen, B. A. (2017), “How Should Novelty Be Valued in Science?” eLife, 6, e28699.
  • Contopoulos-Ioannidis, D. G., Ntzani, E., and Ioannidis, J. (2003), “Translation of Highly Promising Basic Science Research Into Clinical Applications,” The American Journal of Medicine, 114, 477–484. DOI: 10.1016/S0002-9343(03)00013-5.
  • Dasgupta, P., and Maskin, E. (1987), “The Simple Economics of Research Portfolios,” The Economic Journal, 97, 581–595. DOI: 10.2307/2232925.
  • de Winter, J., and Happee, R. (2013), “Why Selective Publication of Statistically Significant Results Can Be Effective,” PLoS One, 8, e66463.
  • Djulbegovic, B., and Hozo, I. (2007), “When Should Potentially False Research Findings Be Considered Acceptable?” PLoS Medicine, 4, e26. DOI: 10.1371/journal.pmed.0040026.
  • Djulbegovic, B., Kumar, A., Magazin, A., Schroen, A. T., Soares, H., Hozo, I., Clarke, M., Sargent, D., and Schell, M. J. (2011), “Optimism Bias Leads to Inconclusive Results—An Empirical Study,” Journal of Clinical Epidemiology, 64, 583–593. DOI: 10.1016/j.jclinepi.2010.09.007.
  • Dumas-Mallet, E., Button, K. S., Boraud, T., Gonon, F., and Munafò, M. R. (2017), “Low Statistical Power in Biomedical Science: A Review of Three Human Research Domains,” Royal Society Open Science, 4, 160254. DOI: 10.1098/rsos.160254.
  • Fanelli, D. (2011), “Negative Results Are Disappearing From Most Disciplines and Countries,” Scientometrics, 90, 891–904. DOI: 10.1007/s11192-011-0494-7.
  • Fidler, F., Burgman, M. A., Cumming, G., Buttrose, R., and Thomason, N. (2006), “Impact of Criticism of Null-hypothesis Significance Testing on Statistical Reporting Practices in Conservation Biology,” Conservation Biology, 20, 1539–1544. DOI: 10.1111/j.1523-1739.2006.00525.x.
  • Fiedler, K. (2017), “What Constitutes Strong Psychological Science? The (Neglected) Role of Diagnosticity and a Priori Theorizing,” Perspectives on Psychological Science, 12, 46–61. DOI: 10.1177/1745691616654458.
  • Fire, M., and Guestrin, C. (2018), “Over-optimization of Academic Publishing Metrics: Observing Goodhart’s Law in Action,” arXiv preprint arXiv:1809.07841.
  • Fraley, R. C., and Vazire, S. (2014), “The n-pact Factor: Evaluating the Quality of Empirical Journals With Respect to Sample Size and Statistical Power,” PLoS One, 9, e109019. DOI: 10.1371/journal.pone.0109019.
  • Freedman, L. P., Cockburn, I. M., and Simcoe, T. S. (2015), “The Economics of Reproducibility in Preclinical Research,” PLoS Biology, 13, e1002165. DOI: 10.1371/journal.pbio.1002165.
  • Fritz, A., Scherndl, T., and Kühberger, A. (2013), “A Comprehensive Review of Reporting Practices in Psychological Journals: Are Effect Sizes Really Enough?” Theory & Psychology, 23, 98–122. DOI: 10.1177/0959354312436870.
  • Gall, T., Ioannidis, J., and Maniadis, Z. (2017), “The Credibility Crisis in Research: Can Economics Tools Help?” PLoS Biology, 15, e2001846. DOI: 10.1371/journal.pbio.2001846.
  • Gaudart, J., Huiart, L., Milligan, P. J., Thiebaut, R., and Giorgi, R. (2014), “Reproducibility Issues in Science, Is p Value Really the Only Answer?” Proceedings of the National Academy of Sciences, 111, E1934. DOI: 10.1073/pnas.1323051111.
  • Gelman, A., and Carlin, J. (2014), “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors,” Perspectives on Psychological Science, 9, 641–651. DOI: 10.1177/1745691614551642.
  • Gervais, W. M., Jewell, J. A., Najle, M. B., and Ng, B. K. (2015), “A Powerful Nudge? Presenting Calculable Consequences of Underpowered Research Shifts Incentives Toward Adequately Powered Designs,” Social Psychological and Personality Science, 6, 847–854. DOI: 10.1177/1948550615584199.
  • Goodman, S., and Greenland, S. (2007), “Assessing the Unreliability of the Medical Literature: A Response to ‘Why Most Published Research Findings Are False,’” Johns Hopkins University, Department of Biostatistics Working Papers.
  • Gornitzki, C., Larsson, A., and Fadeel, B. (2015), “Freewheelin’ Scientists: Citing Bob Dylan in the Biomedical Literature,” BMJ, 351, h6505. DOI: 10.1136/bmj.h6505.
  • Greenwald, A. G. (1975), “Consequences of Prejudice Against the Null Hypothesis,” Psychological Bulletin, 82, 1–20. DOI: 10.1037/h0076157.
  • Greve, W., Bröder, A., and Erdfelder, E. (2013), “Result-blind Peer Reviews and Editorial Decisions: A Missing Pillar of Scientific Culture,” European Psychologist, 18, 286–294. DOI: 10.1027/1016-9040/a000144.
  • Hagen, K. (2016), “Novel or Reproducible: That is the Question,” Glycobiology, 26, 429.
  • Halpern, S. D., Karlawish, J. H., and Berlin, J. A. (2002), “The Continuing Unethical Conduct of Underpowered Clinical Trials,” JAMA, 288, 358–362. DOI: 10.1001/jama.288.3.358.
  • Hazra, A., and Gogtay, N. (2016), “Biostatistics Series Module 5: Determining Sample Size,” Indian Journal of Dermatology, 61, 496–504. DOI: 10.4103/0019-5154.190119.
  • Higginson, A. D., and Munafò, M. R. (2016), “Current Incentives for Scientists Lead to Underpowered Studies With Erroneous Conclusions,” PLoS Biology, 14, e2000995. DOI: 10.1371/journal.pbio.2000995.
  • Hubbard, R., and Bayarri, M. J. (2003), “Confusion Over Measures of Evidence (p’s) Versus Errors (α’s) in Classical Statistical Testing,” The American Statistician, 57, 171–178. DOI: 10.1198/0003130031856.
  • IntHout, J., Ioannidis, J. P., and Borm, G. F. (2016), “Obtaining Evidence by a Single Well-powered Trial or Several Modestly Powered Trials,” Statistical Methods in Medical Research, 25, 538–552. DOI: 10.1177/0962280212461098.
  • Ioannidis, J. P. (2005), “Why Most Published Research Findings Are False,” PLoS Medicine, 2, e124. DOI: 10.1371/journal.pmed.0020124.
  • Ioannidis, J. P. (2008), “Why Most Discovered True Associations Are Inflated,” Epidemiology, 19, 640–648. DOI: 10.1097/EDE.0b013e31818131e7.
  • Ioannidis, J. P., Hozo, I., and Djulbegovic, B. (2013), “Optimal Type I and Type II Error Pairs When the Available Sample Size Is Fixed,” Journal of Clinical Epidemiology, 66, 903–910. DOI: 10.1016/j.jclinepi.2013.03.002.
  • Johnson, V. E. (2013), “Revised Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences, 110, 19313–19317. DOI: 10.1073/pnas.1313476110.
  • Johnson, V. E. (2014), “Reply to Gelman, Gaudart, Pericchi: More Reasons to Revise Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences, 111, E1936–E1937. DOI: 10.1073/pnas.1400338111.
  • Kimmelman, J., Mogil, J. S., and Dirnagl, U. (2014), “Distinguishing Between Exploratory and Confirmatory Preclinical Research Will Improve Translation,” PLoS Biology, 12, e1001863. DOI: 10.1371/journal.pbio.1001863.
  • Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., et al. (2018), “Justify Your Alpha,” Nature Human Behaviour, 2, 168–171. DOI: 10.1038/s41562-018-0311-x.
  • Lamberink, H. J., Otte, W. M., Sinke, M. R., Lakens, D., Glasziou, P. P., Tijdink, J. K., and Vinkers, C. H. (2018), “Statistical Power of Clinical Trials Increased While Effect Size Remained Stable: An Empirical Analysis of 136,212 Clinical Trials Between 1975 and 2014,” Journal of Clinical Epidemiology, 102, 123–128. DOI: 10.1016/j.jclinepi.2018.06.014.
  • Lane, D. M., and Dunlap, W. P. (1978), “Estimating Effect Size: Bias Resulting From the Significance Criterion in Editorial Decisions,” British Journal of Mathematical and Statistical Psychology, 31, 107–112. DOI: 10.1111/j.2044-8317.1978.tb00578.x.
  • Lee, C. J., and Schunn, C. D. (2011), “Social Biases and Solutions for Procedural Objectivity,” Hypatia, 26, 352–373. DOI: 10.1111/j.1527-2001.2011.01178.x.
  • Leek, J. T., and Jager, L. R. (2017), “Is Most Published Research Really False?” Annual Review of Statistics and Its Application, 4, 109–122. DOI: 10.1146/annurev-statistics-060116-054104.
  • Lenth, R. V. (2001), “Some Practical Guidelines for Effective Sample Size Determination,” The American Statistician, 55, 187–193. DOI: 10.1198/000313001317098149.
  • Makel, M. C., Plucker, J. A., and Hegarty, B. (2012), “Replications in Psychology Research: How Often Do They Really Occur?” Perspectives on Psychological Science, 7, 537–542. DOI: 10.1177/1745691612460688.
  • Martin, G., and Clarke, R. M. (2017), “Are Psychology Journals Anti-Replication? A Snapshot of Editorial Practices,” Frontiers in Psychology, 8, 1–6.
  • McLaughlin, A. (2011), “In Pursuit of Resistance: Pragmatic Recommendations for Doing Science Within One’s Means,” European Journal for Philosophy of Science, 1, 353–371. DOI: 10.1007/s13194-011-0030-x.
  • McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2017), “Abandon Statistical Significance,” arXiv preprint arXiv:1709.07588.
  • Miller, J., and Ulrich, R. (2016), “Optimizing Research Payoff,” Perspectives on Psychological Science, 11, 664–691. DOI: 10.1177/1745691616649170.
  • Moonesinghe, R., Khoury, M. J., and Janssens, A. C. J. (2007), “Most Published Research Findings Are False–But a Little Replication Goes a Long Way,” PLoS Medicine, 4, e28. DOI: 10.1371/journal.pmed.0040028.
  • Mudge, J. F., Baker, L. F., Edge, C. B., and Houlahan, J. E. (2012), “Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests,” PLoS One, 7, e32734. DOI: 10.1371/journal.pone.0032734.
  • Nord, C. L., Valton, V., Wood, J., and Roiser, J. P. (2017), “Power-up: A Reanalysis of ‘Power Failure’ in Neuroscience Using Mixture Modelling,” Journal of Neuroscience, 3592-16.
  • Open Science Collaboration (2015), “Estimating the Reproducibility of Psychological Science,” Science, 349, aac4716.
  • Ploutz-Snyder, R. J., Fiedler, J., and Feiveson, A. H. (2014), “Justifying Small-n Research in Scientifically Amazing Settings: Challenging the Notion That Only ‘Big-n’ Studies Are Worthwhile,” Journal of Applied Physiology, 116, 1251–1252. DOI: 10.1152/japplphysiol.01335.2013.
  • Richard, F. D., Bond, C. F., Jr., and Stokes-Zoota, J. J. (2003), “One Hundred Years of Social Psychology Quantitatively Described,” Review of General Psychology, 7, 331–363. DOI: 10.1037/1089-2680.7.4.331.
  • Roos, J. M. (2017), “Measuring the Effects of Experimental Costs on Sample Sizes,” under review, available at: http://www.jasonmtroos.com/assets/media/papers/Experimental_Costs_Sample_Sizes.pdf
  • Sakaluk, J. K. (2016), “Exploring Small, Confirming Big: An Alternative System to the New Statistics for Advancing Cumulative and Replicable Psychological Research,” Journal of Experimental Social Psychology, 66, 47–54. DOI: 10.1016/j.jesp.2015.09.013.
  • Schulz, K. F., and Grimes, D. A. (2005), “Sample Size Calculations in Randomised Trials: Mandatory and Mystical,” The Lancet, 365, 1348–1353. DOI: 10.1016/S0140-6736(05)61034-3.
  • Senn, S. (2012), “Misunderstanding Publication Bias: Editors Are Not Blameless After All,” F1000Research, 1.
  • Siler, K., Lee, K., and Bero, L. (2015), “Measuring the Effectiveness of Scientific Gatekeeping,” Proceedings of the National Academy of Sciences, 112, 360–365. DOI: 10.1073/pnas.1418218112.
  • Smaldino, P. E., and McElreath, R. (2016), “The Natural Selection of Bad Science,” Royal Society Open Science, 3, 160384. DOI: 10.1098/rsos.160384.
  • Song, F., Loke, Y., and Hooper, L. (2014), “Why Are Medical and Health-related Studies Not Being Published? A Systematic Review of Reasons Given by Investigators,” PLoS One, 9, e110418. DOI: 10.1371/journal.pone.0110418.
  • Spiegelhalter, D. (2017), “Trust in Numbers,” Journal of the Royal Statistical Society, Series A, 180, 948–965. DOI: 10.1111/rssa.12302.
  • Stanley, T., Jarrell, S. B., and Doucouliagos, H. (2010), “Could It Be Better to Discard 90% of the Data? A Statistical Paradox,” The American Statistician, 64, 70–77. DOI: 10.1198/tast.2009.08205.
  • Sterling, T. D., Rosenbaum, W. L., and Weinkam, J. J. (1995), “Publication Decisions Revisited: The Effect of the Outcome of Statistical Tests on the Decision to Publish and Vice Versa,” The American Statistician, 49, 108–112. DOI: 10.2307/2684823.
  • Szucs, D., and Ioannidis, J. P. (2016), “Empirical Assessment of Published Effect Sizes and Power in the Recent Cognitive Neuroscience and Psychology Literature,” bioRxiv, 071530.
  • van Assen, M. A., van Aert, R. C., Nuijten, M. B., and Wicherts, J. M. (2014), “Why Publishing Everything is More Effective Than Selective Publishing of Statistically Significant Results,” PLoS One, 9, e84896. DOI: 10.1371/journal.pone.0084896.
  • van Dijk, D., Manor, O., and Carey, L. B. (2014), “Publication Metrics and Success on the Academic Job Market,” Current Biology, 24, R516–R517. DOI: 10.1016/j.cub.2014.04.039.
  • Vasishth, S., and Gelman, A. (2017), “The Illusion of Power: How the Statistical Significance Filter Leads to Overconfident Expectations of Replicability,” arXiv preprint arXiv:1702.00556.
  • Walker, A. M. (1995), “Low Power and Striking Results—A Surprise But Not a Paradox,” The New England Journal of Medicine, 332, 1091–1092. DOI: 10.1056/NEJM199504203321609.
  • Wei, Y., and Chen, F. (2018), “Lowering the p Value Threshold–Reply,” JAMA, 320, 937–938.
  • Wilson, B. M., and Wixted, J. T. (2018), “The Prior Odds of Testing a True Effect in Cognitive and Social Psychology,” Advances in Methods and Practices in Psychological Science, 2515245918767122.
  • Yeung, A. W. (2017), “Do Neuroscience Journals Accept Replications? A Survey of Literature,” Frontiers in Human Neuroscience, 11, 468.