Editorial

Integrating Computing in the Statistics and Data Science Curriculum: Creative Structures, Novel Skills and Habits, and Ways to Teach Computational Thinking

Pages S1-S3 | Published online: 22 Mar 2021

ABSTRACT

Nolan and Temple Lang argued for the fundamental role of computing in the statistics curriculum. In the intervening decade, the statistics education community has acknowledged that computational skills are as important to statistics and data science practice as mathematics. There remains a notable gap, however, between our intentions and our actions. In this special issue of the Journal of Statistics and Data Science Education, we have assembled a collection of articles that (1) suggest creative structures to integrate computing, (2) describe novel data science skills and habits, and (3) propose ways to teach computational thinking. We believe that it is critical for the community to redouble our efforts to embrace sophisticated computing in the statistics and data science curriculum. We hope that these articles provide useful guidance for the community to move these efforts forward.

1 Introduction

In their 2010 article “Computing in the Statistics Curriculum,” Deborah Nolan and Duncan Temple Lang noted that “computational literacy and programming are as fundamental to statistical practice and research as mathematics” and that “these changes necessitate re-evaluation of the training and education practices in statistics” (Nolan and Temple Lang 2010). We could not agree more about the fundamental role of computing and the need for change at all educational levels. Over the last decade, we have seen the role of computing in the statistics curriculum change and grow. The tools have become better, computing is now more established in almost every classroom, and arguably most importantly, the development and success of modern statistics has been enhanced by ideas of computational thinking.

Before introducing the articles in this special issue, we reflect on the questions originally posed by Nolan and Temple Lang:

  1. When they graduate, what ought our students be able to do computationally, and are we preparing them adequately in this regard?

  2. Do we provide students the essential skills needed to engage in statistical problem solving and keep abreast of new technologies as they evolve?

  3. Do our students build the confidence needed to overcome computational challenges to, for example, reliably design and run a synthetic experiment or carry out a comprehensive data analysis?

  4. Overall, are we doing a good job preparing students who are ready to engage in and succeed at statistical inquiry?

Nolan and Temple Lang also provided a damning critique of the status quo at the time:

Many statisticians advocate—or at least practice—the approach in which students are told to learn how to program by themselves, from each other, or from their teaching assistant in a two-week “crash course” in basic syntax at the start of a course. Let us reflect on how effective this approach has been. Can our students compute confidently, reliably, and efficiently? We find that this do-it-yourself “lite” approach sends a strong signal that the material is not of intellectual importance relative to the material covered in lectures. In addition, students pick up bad habits, misunderstandings, and, more importantly, the wrong concepts. They learn just enough to get what they need done, but they do not learn the simple ways to do things nor take the time to abstract what they have learned and assimilate these generalities. Their initial knowledge shapes the way they think in the future and typically severely limits them, making some tasks impossible (p. 100).

We concur that such an approach to computation is insufficient and at times counterproductive.

What has happened in the intervening decade? We believe that there is a growing consensus on the importance of computational literacy and computing in the statistics and data science curriculum. The American Statistical Association’s updated Guidelines for Undergraduate Programs in Statistics (American Statistical Association 2014), the revised GAISE (Guidelines for Assessment and Instruction in Statistics Education) College report (American Statistical Association 2016), and the National Academies of Science, Engineering, and Medicine’s consensus study on “Data Science for Undergraduates: Opportunities and Options” (National Academies of Science, Engineering, and Medicine 2018) provide detailed rationales for the fundamental role computing plays in statistical thinking. The Association for Computing Machinery Data Science Task Force is working to characterize important computing capacities for data science programs (Association for Computing Machinery Data Science Task Force 2019).

More pointedly, George Cobb (2015) noted a convergence of mathematics, computation, and context in statistics education and called for a deep rethinking of the curriculum from the ground up. The “Mere Renovation Is Too Little Too Late” article sparked 19 spirited responses and a provocative rejoinder (more on the “tear-down” metaphor) that challenged the community in a number of fundamental ways (Various 2015).

We envisioned this special issue as a way both to highlight innovations and approaches that have helped move the profession forward and to identify places where future work is needed. Many of these articles work to answer the questions posed by Nolan and Temple Lang, as well as questions they had not anticipated in 2010.

The set of articles included in the special issue can be organized into three non-mutually exclusive clusters that take different approaches to address the questions laid out by Nolan and Temple Lang. The first approach features creative structures for changing how we integrate computing into the learning of statistics. The second approach focuses on novel or technical data science skills and habits. The third reflects that, more and more, statistics educators are embracing and teaching ideas of computational thinking.

2 Creative Structures

Restructuring how we conceive of a syllabus and how we teach particular material is never a small task. However, as different individuals modernize their own courses, we can all learn from their experiences. Both Çetinkaya-Rundel and Ellison (2021) and Donoghue, Voytek, and Ellis (2021) describe creative and modern data science courses that fold together aspects of statistical inference with vital computational skills. Schwab-McCoy, Baker, and Gasper (2021) report on a study describing the emerging consensus of the elements of a data science course. Kim and Henke (2021) present some of the technical aspects vital to getting a solid computational course up and running. A less technical approach is described by Burckhardt, Nugent, and Genovese (2021) using the suite of materials implemented by their integrated statistics learning environment (ISLE). An immersive data science living and learning community is presented by Gundlach and Ward (2021). Finally, Theobold, Hancock, and Mannheimer (2021) describe an alternative to course learning through a series of workshops.

3 Novel or Technical Data Science Skills and Habits

The world of data science is rapidly changing, and it can be incredibly difficult to keep up. Many of the articles in this special issue focus on new, important, and exciting skills and tools that are essential for students if they want to contribute in today’s data-centric world. Boehm and Hanlon (2021) and Çetinkaya-Rundel and Ellison (2021) discuss the full cycle of a data science project. Kim and Hardin (2021) take it one step further and describe the importance of iterating on the full cycle. A few specific skills are laid out in detail: Dogucu and Çetinkaya-Rundel (2021) describe web scraping, and Adams et al. (2021) explore techniques for working with multivariate data. Beckman et al. (2021) compare ways of incorporating Git in the statistical classroom so that students have the skills to hit the ground running in jobs and in their own data projects, reinforcing the value of reproducible workflows as a foundation for reproducible research.

4 Computational Thinking

The last approach may be the most difficult for statistics and data science educators to embrace and implement in their own classes. The value of bringing in ideas of software engineering or computational thinking is that they help create a mindset that empowers students to simultaneously think both statistically and computationally. Wing (2006) described how computing can impact a field, for example, “Computer science’s contribution to biology goes beyond the ability to search through vast amounts of sequence data looking for patterns. The hope is that data structures and algorithms—our computational abstractions and methods—can represent the structure of proteins in ways that elucidate their function. Computational biology is changing the way biologists think.”

As a discipline, we are embracing the many ways that computing is changing how statisticians think. Woodard and Lee (2021) report on a study in which students talked through their thought process as they performed computational tasks; their results show, somewhat unsurprisingly, that computing is difficult and not intuitive. Schwab-McCoy, Baker, and Gasper (2021) describe the challenge in front of us to teach computational thinking effectively. Donoghue, Voytek, and Ellis (2021), who describe debugging, and Theobold, Hancock, and Mannheimer (2021), who discuss the teaching of iteration (a fundamental component of algorithmic thinking), speak to integrating small pieces of computational thinking within the data science curriculum. Reinhart and Genovese (2021) describe an entire course that is a cross between software engineering and statistics, providing insight into the types of skills that many statisticians (at all levels) need for success.

5 What Would Deb and Duncan Say?

As we worked on the special issue, we thought that there would be value in asking the authors of Nolan and Temple Lang (2010) to share their thoughts about the articles, what they saw as most valuable, and to peer into the future. Their incisive and provocative retrospective leads off the special issue (Nolan and Temple Lang 2021).

6 Conclusion

The articles in the special issue encourage us to redouble our efforts to embrace computing in the classroom, to constantly push ourselves to learn more tools, and to let computational thinking make our own work better. We believe that the leading thinkers of the next decade will be those who seamlessly knit together tools from both statistics and computing and that how we think about statistics will be informed by complementary computational thinking. To forge ahead we need to cultivate computing foundations throughout the statistics paradigm. It is our hope that the articles in this special issue initiate a new way of thinking for you and your students.

Funding

This work was supported by Division of Information and Intelligent Systems (1923388).

References

  • Adams, B., Baller, D., Jonas, B., Joseph, A.-C., and Cummiskey, K. (2021), “Computational Skills for Multivariable Thinking in Introductory Statistics,” Journal of Statistics and Data Science Education, 29, 1–21, DOI: 10.1080/10691898.2020.1852139.
  • American Statistical Association (2014), “Curriculum Guidelines for Undergraduate Programs in Statistical Science,” available at https://www.amstat.org/asa/education/Curriculum-Guidelines-for-Undergraduate-Programs-in-Statistical-Science.aspx.
  • American Statistical Association (2016), “Guidelines for Assessment and Instruction in Statistics Education (GAISE) Revised College Report,” available at https://www.amstat.org/asa/education/Guidelines-for-Assessment-and-Instruction-in-Statistics-Education-Reports.aspx.
  • Association for Computing Machinery Data Science Task Force (2019), “Computing Competencies for Undergraduate Data Science Curricula: Draft 2,” available at http://dstf.acm.org/DSReportDraft2Full.pdf. Accessed 2021 January 24.
  • Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., and Tackett, M. (2021), “Implementing Version Control With Git and GitHub as a Learning Objective in Statistics and Data Science Courses,” Journal of Statistics and Data Science Education, 29, 1–35, DOI: 10.1080/10691898.2020.1848485.
  • Boehm, F. J., and Hanlon, B. M. (2021), “What Is Happening on Twitter? A Framework for Student Research Projects With Tweets,” Journal of Statistics and Data Science Education, 29, DOI: 10.1080/10691898.2020.1848486.
  • Burckhardt, P., Nugent, R., and Genovese, C. R. (2021), “Teaching Statistical Concepts and Modern Data Analysis With a Computing-Integrated Learning Environment,” Journal of Statistics and Data Science Education, 29, 1–28, DOI: 10.1080/10691898.2020.1854637.
  • Çetinkaya-Rundel, M., and Ellison, V. (2021), “A Fresh Look at Introductory Data Science,” Journal of Statistics and Data Science Education, 29, 1–11, DOI: 10.1080/10691898.2020.1804497.
  • Cobb, G. (2015), “Mere Renovation Is Too Little Too Late: We Need to Rethink Our Undergraduate Curriculum From the Ground Up,” The American Statistician, 69, 266–282, DOI: 10.1080/00031305.2015.1093029.
  • Dogucu, M., and Çetinkaya-Rundel, M. (2021), “Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities,” Journal of Statistics and Data Science Education, 29, 1–11, DOI: 10.1080/10691898.2020.1787116.
  • Donoghue, T., Voytek, B., and Ellis, S. E. (2021), “Teaching Creative and Practical Data Science at Scale,” Journal of Statistics and Data Science Education, 29, 1–22, DOI: 10.1080/10691898.2020.1860725.
  • Gundlach, E., and Ward, M. D. (2021), “The Data Mine: Enabling Data Science Across the Curriculum,” Journal of Statistics and Data Science Education, 29, 1–14, DOI: 10.1080/10691898.2020.1848484.
  • Kim, A. Y., and Hardin, J. (2021), “‘Playing the Whole Game’: A Data Collection and Analysis Exercise With Google Calendar,” Journal of Statistics and Data Science Education, 29, 1–10, DOI: 10.1080/10691898.2020.1799728.
  • Kim, B., and Henke, G. (2021), “Easy-to-Use Cloud Computing for Teaching Data Science,” Journal of Statistics and Data Science Education, 29, 1–18, DOI: 10.1080/10691898.2020.1860726.
  • National Academies of Science, Engineering, and Medicine (2018), “Data Science for Undergraduates: Opportunities and Options,” available at https://nas.edu/envisioningds.
  • Nolan, D. A., and Temple Lang, D. (2010), “Computing in the Statistics Curriculum,” The American Statistician, 64, 97–107, DOI: 10.1198/tast.2010.09132.
  • Nolan, D. A., and Temple Lang, D. (2021), “Computing in the Statistics Curricula: A 10-Year Retrospective,” Journal of Statistics and Data Science Education, 29, DOI: 10.1080/10691898.2020.1862609.
  • Reinhart, A., and Genovese, C. R. (2021), “Expanding the Scope of Statistical Computing: Training Statisticians to Be Software Engineers,” Journal of Statistics and Data Science Education, 29, 1–23, DOI: 10.1080/10691898.2020.1845109.
  • Schwab-McCoy, A., Baker, C. M., and Gasper, R. E. (2021), “Data Science in 2020: Computing, Curricula, and Challenges for the Next 10 Years,” Journal of Statistics and Data Science Education, 29, 1–17, DOI: 10.1080/10691898.2020.1851159.
  • Theobold, A. S., Hancock, S. A., and Mannheimer, S. (2021), “Designing Data Science Workshops for Data-Intensive Environmental Science Research,” Journal of Statistics and Data Science Education, 29, 1–31, DOI: 10.1080/10691898.2020.1854636.
  • Various (2015), “Discussion Papers: Mere Renovation Is Too Little Too Late: We Need to Rethink Our Undergraduate Curriculum From the Ground Up,” The American Statistician, 69, available at https://nhorton.people.amherst.edu/mererenovation.
  • Wing, J. M. (2006), “Computational Thinking,” Communications of the ACM, 49, 33–35, DOI: 10.1145/1118178.1118215.
  • Woodard, V., and Lee, H. (2021), “How Students Use Statistical Computing in Problem Solving,” Journal of Statistics and Data Science Education, 29, 1–18, DOI: 10.1080/10691898.2020.1847007.