Discussion

50 Years of Data Science

Pages 745-766 | Received 01 Aug 2017, Published online: 19 Dec 2017
