Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science

Pages 246–261 | Received 24 Feb 2018, Accepted 17 Aug 2018, Published online: 20 Mar 2019

References

  • Allen, J. P. (1972), “Summary of Scientific Results,” Apollo 15 Preliminary Science Report, NASA SP-289, NASA Manned Spacecraft Center, Washington, DC.
  • Ambroise, C., and McLachlan, G. J. (2002), “Selection Bias in Gene Extraction on the Basis of Microarray Gene-Expression Data,” Proceedings of the National Academy of Sciences, 99, 6562–6566. DOI: 10.1073/pnas.102102699.
  • Argamon, S. E. (2017), “Don’t Strengthen Statistical Significance—Abolish It,” American Scientist, Macroscope blog [online], available at https://www.americanscientist.org/blog/macroscope/dont-strengthen-statistical-significance-abolish-it
  • Bailey, D. (2018), “Why Outliers Are Good for Science,” Significance, 15, 14–19. DOI: 10.1111/j.1740-9713.2018.01105.x.
  • Barnett, V. (1999), Comparative Statistical Inference (3rd ed.), Chichester, UK: Wiley.
  • Berger, J. O., and Berry, D. A. (1988), “Statistical Analysis and the Illusion of Objectivity,” American Scientist, 76, 159–165.
  • Berk, R., Brown, L., and Zhao, L. (2010), “Statistical Inference After Model Selection,” Journal of Quantitative Criminology, 26, 217–236. DOI: 10.1007/s10940-009-9077-7.
  • Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013), “Valid Post-Selection Inference,” Annals of Statistics, 41, 802–837. DOI: 10.1214/12-AOS1077.
  • Berry, D. (2016), “P-values Are Not What They’re Cracked Up To Be,” Online supplement to Wasserstein & Lazar (2016).
  • Box, G. E. P. (1976), “Science and Statistics,” Journal of the American Statistical Association, 71, 791–799. DOI: 10.1080/01621459.1976.10480949.
  • Box, G. E. P. (1999), “Statistics as a Catalyst to Learning by Scientific Method Part II—A Discussion,” Journal of Quality Technology, 31, 16–29.
  • Box, G. E. P., Hunter, J. S., and Hunter, W. G. (2005), Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.), Hoboken, NJ: Wiley.
  • Carpenter, D. (2010), Reputation and Power: Organizational Image and Pharmaceutical Regulation at the FDA, Princeton, NJ: Princeton University Press.
  • Chatfield, C. (1995), “Model Uncertainty, Data Mining and Statistical Inference” (with discussion), Journal of the Royal Statistical Society, Series A, 158, 419–466. DOI: 10.2307/2983440.
  • Chow, S.-C., and Chang, M. (2012), Adaptive Design Methods in Clinical Trials (2nd ed.), Boca Raton, FL: CRC Press.
  • Cleveland, W. S. (1979), “Robust Locally Weighted Regression and Smoothing Scatterplots,” Journal of the American Statistical Association, 74, 829–836. DOI: 10.1080/01621459.1979.10481038.
  • Cleveland, W. S. (1993), Visualizing Data, Summit, NJ: Hobart Press.
  • Cleveland, W. S. (1994), The Elements of Graphing Data (2nd ed.), Summit, NJ: Hobart Press.
  • Cochran, W. G. (1977), Sampling Techniques (3rd ed.), New York: Wiley.
  • Coleman, D., and Gunter, B. (2014), A DOE Handbook: A Simple Approach to Basic Statistical Design of Experiments, Seattle, WA: Amazon CreateSpace.
  • Couzin-Frankel, J. (2013), “When Mice Mislead,” Science, 342, 922–925. DOI: 10.1126/science.342.6161.922.
  • Cox, D. R. (1975), “A Note on Data-Splitting for the Evaluation of Significance Levels,” Biometrika, 62, 441–444. DOI: 10.1093/biomet/62.2.441.
  • Cox, G. M. (1957), “Statistical Frontiers,” Journal of the American Statistical Association, 52, 1–12. DOI: 10.1080/01621459.1957.10501361.
  • Dahl, F. A., Grotle, M., Benth, J. J. S., and Natvig, B. (2008), “Data Splitting as a Countermeasure Against Hypothesis Fishing: With a Case Study of Predictors for Low Back Pain,” European Journal of Epidemiology, 23, 237–242. DOI: 10.1007/s10654-008-9230-x.
  • David, H. A. (1968), “Gini’s Mean Difference Rediscovered,” Biometrika, 55, 573–575. DOI: 10.2307/2334264.
  • Diaconis, P. (1985), “Theories of Data Analysis: From Magical Thinking Through Classical Statistics,” in Exploring Data Tables, Trends, and Shapes, eds. D. C. Hoaglin, F. Mosteller, and J. W. Tukey, New York: Wiley, pp. 1–36.
  • Dickhaus, T. (2014), Simultaneous Statistical Inference, Berlin: Springer.
  • Dijkstra, T. K. (ed.) (1988), On Model Uncertainty and its Statistical Implications (Lecture Notes in Economics and Mathematical Systems, Vol. 307), Berlin: Springer.
  • Draper, D. (1995), “Assessment and Propagation of Model Uncertainty” (with discussion), Journal of the Royal Statistical Society, Series B, 57, 45–97. DOI: 10.1111/j.2517-6161.1995.tb02015.x.
  • Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. (2016), “Preserving Statistical Validity in Adaptive Data Analysis,” [online], available at https://arxiv.org/abs/1411.2664v3
  • Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia, PA: Society for Industrial and Applied Mathematics.
  • Efron, B. (1983), “Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation,” Journal of the American Statistical Association, 78, 316–331.
  • Efron, B. (2014), “Estimation and Accuracy After Model Selection,” Journal of the American Statistical Association, 109, 991–1007.
  • Efron, B., and Hastie, T. (2016), Computer-Age Statistical Inference: Algorithms, Evidence, and Data Science, New York: Cambridge University Press.
  • Ehrenberg, A. S. C. (1990), “A Hope for the Future of Statistics: MSOD,” The American Statistician, 44, 195–196. DOI: 10.1080/00031305.1990.10475717.
  • Faraway, J. (2016), “Does Data Splitting Improve Prediction?” Statistics and Computing, 26, 49–60. DOI: 10.1007/s11222-014-9522-9.
  • Feller, W. (1969), “Are Life Scientists Overawed by Statistics? (Too Much Faith in Statistics),” Scientific Research, 4, 24–29.
  • Fisher, R. A. (1922), “On the Mathematical Foundations of Theoretical Statistics,” Philosophical Transactions of the Royal Society of London A, 222, 309–368. DOI: 10.1098/rsta.1922.0009.
  • Fisher, R. A. (1926), “The Arrangement of Field Experiments,” Journal of the Ministry of Agriculture of Great Britain, 33, 503–513.
  • Fragoso, T. M., Bertoli, W., and Louzada, F. (2018), “Bayesian Model Averaging: A Systematic Review and Conceptual Classification,” International Statistical Review, 86, 1–28. DOI: 10.1111/insr.12243.
  • Freedman, D. A. (1983), “A Note on Screening Regression Equations,” The American Statistician, 37, 152–155. DOI: 10.2307/2685877.
  • Freedman, D. A. (1991), “Statistical Models and Shoe Leather,” Sociological Methodology, 21, 291–313.
  • Freedman, D. A. (1995), “Issues in the Foundations of Statistics: Probability and Statistical Models,” Foundations of Science, 1, 19–39.
  • Freedman, D. A. (1999), “From Association to Causation: Some Remarks on the History of Statistics,” Statistical Science, 14, 243–258.
  • Freedman, D. A. (2008), “Oasis or Mirage?,” Chance, 21, 59–61.
  • Gauch, R. R. (2009), It’s Great! Oops, No It Isn’t: Why Clinical Research Can’t Guarantee the Right Medical Answers, Berlin: Springer.
  • Geisser, S. (2006), Modes of Parametric Statistical Inference, Hoboken, NJ: Wiley.
  • Gelman, A. (2003), “A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-Fit Testing,” International Statistical Review, 71, 369–382. DOI: 10.1111/j.1751-5823.2003.tb00203.x.
  • Gelman, A. (2015), “Statistics and Research Integrity,” European Science Editing, 41, 13–14.
  • Gelman, A. (2016), “The Problems with P-values Are Not Just With P-values,” Online supplement to Wasserstein & Lazar (2016).
  • Gelman, A., and Loken, E. (2014), “The Statistical Crisis in Science,” American Scientist, 102, 460–465. DOI: 10.1511/2014.111.460.
  • Gigerenzer, G., and Marewski, J. (2015), “Surrogate Science: The Idol of a Universal Method for Scientific Inference,” Journal of Management, 41, 421–440. DOI: 10.1177/0149206314547522.
  • Goldin-Meadow, S. (2016), “Why Preregistration Makes Me Nervous,” Association for Psychological Science Observer, [online], available at https://www.psychologicalscience.org/observer/why-preregistration-makes-me-nervous
  • Gong, G. (1986), “Cross-Validation, the Jackknife, and the Bootstrap: Excess Error Estimation in Forward Logistic Regression,” Journal of the American Statistical Association, 81, 108–113. DOI: 10.1080/01621459.1986.10478245.
  • Grolemund, G., and Wickham, H. (2014), “A Cognitive Interpretation of Data Analysis” (with discussion), International Statistical Review, 82, 184–213. DOI: 10.1111/insr.12028.
  • Gunter, B., and Tong, C. (2017), “What Are the Odds!? The ‘Airport Fallacy’ and Statistical Inference,” Significance, 14, 38–41. DOI: 10.1111/j.1740-9713.2017.01057.x.
  • Harrell, F. E., Jr. (2015), Regression Modeling Strategies: with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (2nd ed.), New York: Springer.
  • Harrell, F. E., Jr. (2018), “Improving Research Through Safer Learning from Data,” Statistical Thinking blog [online], available at http://www.fharrell.com/post/improve-research/
  • Harris, R. (2017), Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions, New York: Basic Books.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.), New York: Springer.
  • Hjort, N. L., and Claeskens, G. (2003), “Frequentist Model Average Estimators,” Journal of the American Statistical Association, 98, 879–899. DOI: 10.1198/016214503000000828.
  • Hjorth, U. (1989), “On Model Selection in the Computer Age,” Journal of Statistical Planning and Inference, 23, 101–115. DOI: 10.1016/0378-3758(89)90043-8.
  • Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (eds.) (1983), Understanding Robust and Exploratory Data Analysis, New York: Wiley.
  • Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (eds.) (1985), Exploring Data Tables, Trends, and Shapes, New York: Wiley.
  • Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (eds.) (1991), Fundamentals of Exploratory Analysis of Variance, New York: Wiley.
  • Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999), “Bayesian Model Averaging: A Tutorial,” Statistical Science, 14, 382–417.
  • Holmes, S. (2018), “Statistical Proof? The Problem of Irreproducibility,” Bulletin of the American Mathematical Society, 55, 31–55. DOI: 10.1090/bull/1597.
  • Huber, P. J. (1985), “Data Analysis: In Search of an Identity,” in Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (Vol. 1), Belmont, CA: Wadsworth, pp. 65–78.
  • International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (1997), ICH Harmonised Tripartite Guideline: General Considerations for Clinical Trials, E8.
  • International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (1998), ICH Harmonised Tripartite Guideline: Statistical Principles for Clinical Trials, E9.
  • Kallus, N. (2018), “Optimal A Priori Balance in the Design of Controlled Experiments,” Journal of the Royal Statistical Society, Series B, 80, 85–112. DOI: 10.1111/rssb.12240.
  • Koopmans, T. C. (1949), “Identification Problems in Economic Model Construction,” Econometrica, 17, 125–144. DOI: 10.2307/1905689.
  • Koyama, T. (2011), “Dynamite Plots,” [online], available at http://biostat.mc.vanderbilt.edu/wiki/Main/DynamitePlots
  • Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., and Baker, C. I. (2009), “Circular Analysis in Systems Neuroscience: The Dangers of Double Dipping,” Nature Neuroscience, 12, 535–540. DOI: 10.1038/nn.2303.
  • Krzywinski, M., and Altman, N. (2014), “Visualizing Samples with Box Plots,” Nature Methods, 11, 119–120.
  • Kuehl, R. O. (2000), Design of Experiments: Statistical Principles of Research Design and Analysis (2nd ed.), Pacific Grove, CA: Duxbury.
  • Kuhn, T. S. (1970), The Structure of Scientific Revolutions (2nd ed.), Chicago, IL: University of Chicago Press.
  • Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016), “Exact Post-Selection Inference, with Application to the Lasso,” Annals of Statistics, 44, 907–927. DOI: 10.1214/15-AOS1371.
  • Leeb, H. (2009), “Conditional Predictive Inference Post Model Selection,” Annals of Statistics, 37, 2838–2876. DOI: 10.1214/08-AOS660.
  • Leek, J. T., Scharpf, R. B., Corrada Bravo, H., Simcha, D., Langmead, B., Johnson, W. E., Geman, D., Baggerly, K., and Irizarry, R. A. (2010), “Tackling the Widespread and Critical Impact of Batch Effects in High-throughput Data,” Nature Reviews Genetics, 11, 733–739. DOI: 10.1038/nrg2825.
  • Lew, M. (2016), “Three Inferential Questions, Two Types of P-value,” Online supplement to Wasserstein & Lazar (2016).
  • Lewis, J. A. (1999), “Statistical Principles for Clinical Trials (ICH E9) An Introductory Note on an International Guideline,” Statistics in Medicine, 18, 1903–1904. DOI: 10.1002/(SICI)1097-0258(19990815)18:15<1903::AID-SIM188>3.0.CO;2-F.
  • Lithgow, G. J., Driscoll, M., and Phillips, P. (2017), “A Long Journey to Reproducible Results,” Nature, 548, 387–388, available at https://www.nature.com/news/a-long-journey-to-reproducible-results-1.22478?WT.mc_id=FBK_NatureNews&sf108251523=1
  • Lock Morgan, K., and Rubin, D. B. (2012), “Rerandomization to Improve Covariate Balance in Experiments,” Annals of Statistics, 40, 1263–1282. DOI: 10.1214/12-AOS1008.
  • McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2017), “Abandon Statistical Significance,” [online], available at https://arxiv.org/abs/1709.07588.
  • McShane, B. B., and Gelman, A. (2017), “Abandon Statistical Significance,” Nature, 551, 558, available at https://www.nature.com/articles/d41586-017-07522-z
  • Mallows, C. L. (1983), “Data Description,” in Scientific Inference, Data Analysis, and Robustness, eds. G. E. P. Box, T. Leonard, and C.-F. Wu, New York: Academic Press, pp. 135–151.
  • Mallows, C. L. (1998), “The Zeroth Problem,” The American Statistician, 52, 1–9.
  • Mallows, C. L., and Walley, P. (1980), “A Theory of Data Analysis?” in Proceedings of the Business and Economics Statistics Section, American Statistical Association, pp. 8–14.
  • Maronna, R., Martin, D., and Yohai, V. (2006), Robust Statistics: Theory and Methods, Hoboken, NJ: Wiley.
  • Mogil, J. S., and Macleod, M. R. (2017), “No Publication Without Confirmation,” Nature, 542, 409–411, available at https://www.nature.com/news/no-publication-without-confirmation-1.21509
  • Montgomery, D. C. (2017), Design and Analysis of Experiments (9th ed.), Hoboken, NJ: Wiley.
  • Moore, D. S. (1992), “What is Statistics?” in Perspectives on Contemporary Statistics, eds. D. C. Hoaglin, and D. S. Moore, Washington, DC: Mathematical Association of America, pp. 1–17.
  • Moses, L. E. (1992), “The Reasoning of Statistical Inference,” in Perspectives on Contemporary Statistics, eds. D. C. Hoaglin, and D. S. Moore, Washington, DC: Mathematical Association of America, pp. 107–122.
  • Mosteller, F., and Tukey, J. W. (1977), Data Analysis and Regression, Reading, MA: Addison-Wesley.
  • Motulsky, H. J. (2014), “Common Misconceptions About Data Analysis and Statistics,” Journal of Pharmacology and Experimental Therapeutics, 351, 200–205. DOI: 10.1124/jpet.114.219170.
  • Munafo, M. R., and Davey Smith, G. (2018), “Robust Research Needs Many Lines of Evidence,” Nature, [online], available at https://www.nature.com/articles/d41586-018-01023-3
  • Murray, D. M. (1998), Design and Analysis of Group-Randomized Trials (Monographs in Epidemiology and Biostatistics, Vol. 27), New York: Oxford University Press.
  • Nelder, J. A. (1986), “Statistics, Science and Technology,” Journal of the Royal Statistical Society, Series A, 149, 109–121. DOI: 10.2307/2981525.
  • Nosek, B. A., Ebersole, C. R., DeHaven, A. C., and Mellor, D. T. (2018), “The Preregistration Revolution,” Proceedings of the National Academy of Sciences, 115, 2600–2606. DOI: 10.1073/pnas.1708274114.
  • Pearson, R. K. (2011), Exploring Data in Engineering, the Sciences, and Medicine, New York: Oxford University Press.
  • Piantadosi, S. (2017), Clinical Trials: A Methodologic Perspective (3rd ed.), Hoboken, NJ: Wiley.
  • Picard, R. R., and Cook, R. D. (1984), “Cross-Validation of Regression Models,” Journal of the American Statistical Association, 79, 575–583. DOI: 10.1080/01621459.1984.10478083.
  • Pikounis, B. (2001), “One-Factor Comparative Studies,” in Applied Statistics in the Pharmaceutical Industry, eds. S. P. Millard, and A. Krause, New York: Springer, pp. 17–40.
  • Raftery, A., Madigan, D., and Hoeting, J. (1993), “Model Selection and Accounting for Model Uncertainty in Linear Regression Models,” Technical Report No. 262, University of Washington (Seattle), Department of Statistics.
  • Reunanen, J. (2003), “Overfitting in Making Comparisons Between Variable Selection Methods,” Journal of Machine Learning Research, 3, 1371–1382.
  • Robbins, N. B. (2013), Creating More Effective Graphs, Wayne, NJ: Chart House.
  • Rousseeuw, P. J., and Croux, C. (1993), “Alternatives to the Median Absolute Deviation,” Journal of the American Statistical Association, 88, 1273–1283. DOI: 10.1080/01621459.1993.10476408.
  • Scott, S. (2013), “Pre-Registration Would Put Science in Chains,” Times Higher Education Supplement, [online], available at https://www.timeshighereducation.com/comment/opinion/pre-registration-would-put-science-in-chains/2005954.article
  • Seife, C. (2000), “CERN’s Gamble Shows Perils, Rewards of Playing the Odds,” Science, 289, 2260–2262. DOI: 10.1126/science.289.5488.2260.
  • Sheiner, L. B. (1997), “Learning Versus Confirming in Clinical Drug Development,” Clinical Pharmacology and Therapeutics, 61, 275–291.
  • Shell, E. R. (2016), “Hurdling Obstacles: Meet Marcia McNutt, Scientist, Administrator, Editor, and Now National Academy of Sciences President,” Science, 353, 116–119.
  • Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2011), “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science, 22, 1359–1366. DOI: 10.1177/0956797611417632.
  • Snee, R. D. (1986), “In Pursuit of Total Quality,” Quality Progress, 20, 25–31.
  • Stone, M. (1974), “Cross-Validatory Choice and Assessment of Statistical Predictions,” Journal of the Royal Statistical Society, Series B, 36, 111–147. DOI: 10.1111/j.2517-6161.1974.tb00994.x.
  • Taleb, N. N. (2007), The Black Swan: The Impact of the Highly Improbable, New York: Random House.
  • Taylor, J. E., and Tibshirani, R. (2015), “Statistical Learning and Selective Inference,” Proceedings of the National Academy of Sciences, 112, 7629–7634. DOI: 10.1073/pnas.1507583112.
  • The OPERA Collaboration (2012), “Measurement of the Neutrino Velocity with the OPERA Detector in the CNGS Beam,” Journal of High Energy Physics [online], available at https://doi.org/10.1007/JHEP10(2012)093
  • Thompson, S. K. (2012), Sampling (3rd ed.), Hoboken, NJ: Wiley.
  • Tibshirani, R. (1996), “Regression Shrinkage and Selection Via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x.
  • Tong, C., and Lock, A. (2015), “A Computational Procedure for Mean Kinetic Temperature Using Unequally Spaced Data,” in Proceedings of the Biopharmaceutical Section, American Statistical Association, pp. 2065–2070.
  • Touboul, P., Metris, G., Rodrigues, M., Andre, Y., Baghi, Q., Berge, J., Boulanger, D., Bremer, S., Carle, P., Chhun, R., et al. (2017), “MICROSCOPE Mission: First Results of a Space Test of the Equivalence Principle,” Physical Review Letters, 119, 231101.
  • Tukey, J. W. (1962), “The Future of Data Analysis,” Annals of Mathematical Statistics, 33, 1–67. DOI: 10.1214/aoms/1177704711.
  • Tukey, J. W. (1969), “Analyzing Data: Sanctification or Detective Work?,” American Psychologist, 24, 83–91.
  • Tukey, J. W. (1973), “Exploratory Data Analysis as Part of a Larger Whole,” in Proceedings of the Eighteenth Conference on the Design of Experiments in Army Research Development and Testing (Part I), Durham, NC: Army Research Office, pp. 1–10.
  • Tukey, J. W. (1977), Exploratory Data Analysis, Reading, MA: Addison-Wesley.
  • Wasserman, L. (2006), All of Nonparametric Statistics, New York: Springer.
  • Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on P-Values: Context, Process, and Purpose,” The American Statistician, 70, 129–133. DOI: 10.1080/00031305.2016.1154108.
  • Wenmackers, S., and Vanpoucke, D. E. P. (2012), “Models and Simulations in Material Science: Two Cases Without Error Bars,” Statistica Neerlandica, 66, 339–355. DOI: 10.1111/j.1467-9574.2011.00519.x.
  • Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., and van Assen, M. A. L. M. (2016), “Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking,” Frontiers in Psychology, 7, 1832.
  • Winer, B. J., Brown, D. R., and Michels, K. M. (1991), Statistical Principles in Experimental Design (3rd ed.), Boston, MA: McGraw-Hill.
  • Youden, W. J. (1972), “Enduring Values,” Technometrics, 14, 1–11. DOI: 10.1080/00401706.1972.10488878.