DATA SCIENCE

A Guide to Teaching Data Science

Pages 382-391 | Received 22 Dec 2016, Published online: 14 Nov 2018
 

ABSTRACT

Demand for data science education is surging, and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate that the main priority is to bring applications to the forefront, as proposed by Nolan and Speed in 1999. We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch in 1999, and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used by statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course. Supplementary materials for this article are available online.

Acknowledgments

The authors thank Joe Blitzstein and Hanspeter Pfister, the creators of CS109, from which they borrowed several logistical ideas; David Robinson, their tidyverse and ggplot2 guru, for advice and for his guest lectures; Alyssa Frazee, who helped develop the movie ratings lecture; Joe Paulson for suggesting Google polls; Héctor Corrada-Bravo for advice on teaching machine learning; GitHub Education for providing free private repositories; Garrett Grolemund, Sherri Rose, and Christine Choirat for presenting guest lectures; all of the TAs from their Introduction to Data Science course BIO260 (Luis Campos, Stephanie Chan, Brian Feeny, Ollie McDonald, Hilary Parker, Kela Roberts, Claudio Rosenberg, Ayshwarya Subramanian); and GroupLens for giving them permission to adapt and redistribute part of the ‘ml-latest’ MovieLens dataset on their course website. The authors also thank Jeff Leek for comments and suggestions that improved the manuscript and Scott Zeger for a helpful discussion.

Supplementary Materials

The supplementary material contains a discussion of how past contributions from statistics have influenced today's data science and a discussion of expanding and updating the statistics curriculum. It also contains a description of the individual case studies discussed in this article.

Additional information

Funding

The authors thank NIH grant R25GM114818 for partial support in creating the teaching materials.
