
Teaching Creative and Practical Data Science at Scale

Pages S27-S39 | Published online: 22 Mar 2021

