Teacher’s Corner

Facilitating Authentic Practice for Early Undergraduate Statistics Students

Pages 433-444 | Received 27 Mar 2020, Accepted 11 Oct 2020, Published online: 14 Dec 2020
 

Abstract

In current curricula, authentic statistical practice generally only occurs in capstone projects undertaken by advanced undergraduate and Master’s students. We argue that deferring practice is a mistake: undergraduate students should gain experience through repeated practice from their first years onward, so that they reach heightened levels of confidence and competence prior to graduation. However, statistical practice is not a “one size fits all” enterprise: for instance, elements of a capstone experience, such as extensive data preprocessing, may be out of place in earlier practice settings due to less-experienced students’ relative lack of coding skill. We describe a course we have implemented at Carnegie Mellon University, currently open to second-year students, that provides a circumscribed opportunity for statistical practice that limits coding breadth, uses fully curated data, treats statistical learning models as “gray boxes” to be understood qualitatively, and provides open-ended semester-long projects that students pursue outside of class. We show how pre- and post-course assessment tests and retrospective surveys indicate clear gains in the students’ knowledge of, and attitudes toward, statistical practice. Given its clear benefits, we feel that statistics and data science programs should offer a course like the one we describe to all undergraduate students pursuing statistics and data science degrees.

Supplementary Material

Fall 2018 Assessment and Retrospection: Text and Results: This document provides the texts of the pre- and post-course assessment tests and the retrospective surveys given in Fall 2018, along with results and commentary (.pdf file)

Introduction to Statistical Research Methodology: Highlighted Concepts: This document provides important analysis-specific concepts that are highlighted in ISRM; it is meant to complement the higher-level learning objectives discussion of Section 5 (.pdf file)

Lab: Best-Subset Selection: An example of a solution set for a lab that focuses upon best-subset selection. The accompanying lecture is available online at the class GitHub site. Other lab solution sets are available from the author upon request (.html file)

Acknowledgments

The author would like to thank Rebecca Nugent, Joel Greenhouse, Christopher Genovese, and others within the department for supporting the creation of Introduction to Statistical Research Methodology and helping to ensure its ongoing success, and to also thank Sarah Woodley and Emily Weiss for guidance on learning objectives and assessment.

Notes

1 As an illustration, we contrast our focus to that of, for example, Loy, Kuiper, and Chihara (Citation2019), whose tutorials include eight that focus on data processing and visualization and a ninth that introduces students to classification and regression tree models.

2 We note here that while the focus of this article is on how to provide statistical practice to early undergraduates, what we describe can be, and has been, extended to provide training in statistical practice to graduate students and faculty from outside of the data sciences, as well as to those in industry via executive education, etc. (see, e.g., https://pefreeman.github.io/SLSW_2019).

3 In this course, students apply exploratory data analysis, elementary probability, hypothesis testing, and empirical research methods to data drawn from interdisciplinary case studies, and write up their results in scientific reports. We do not use a textbook. Students may receive credit for the course by achieving a score of 4 or 5 on the AP Statistics exam.

4 The one most typically taken, Methods for Statistics & Data Science (MSDS), extends Reasoning with Data by covering the basics of simple and multiple linear regression, ANOVA, logistic regression, classification, and clustering. Like the previous course, it features the writing of reports and has no textbook. Note that ISRM, which can also be used to fulfill the intermediate data analysis requirement but is typically taken along with MSDS, does not specifically leverage material covered in MSDS or Reasoning with Data; rather, it developed organically from the author’s experiences, described below, in providing research opportunities to early undergraduates.

6 For course details and materials (lectures and datasets), see https://github.com/pefreeman/36-290. Note that laboratory assignments are not provided but are available upon request.

7 Alternatively, one can use RStudio Cloud.

8 Examples include https://education.rstudio.com and https://rstudio.cloud/learn/primers.

9 A student conventionally takes this course prior to taking upper-division regression courses, so as to be prepared to process data and to code statistical analyses.

10 We provide an example of a lab assignment as Supplementary Material.

12 It is important to note that given the conditions of the course, the research question and information about the data are provided a priori, and thus students simply need to summarize that information. However, they are made aware that in more typical situations (like an advanced-undergraduate capstone project) the client should always be queried for this information.

13 Weeks 8 and 12 focused on the semester projects and Week 14 was Thanksgiving week. While this article is being prepared, plans are being made for teaching in the Fall 2020 semester in the face of the continued SARS-CoV-2 pandemic. Our university has mandated a hybrid approach for classes with 25 or fewer students, in which we combine in-person teaching to those on campus in a de-densified environment, remote teaching to those who can attend class sessions live via Zoom, and asynchronous content. Having surveyed the incoming course cohort, we can confidently predict that the majority of students will attend the class in person, if allowed to return to campus by the university. Additionally, given the experience gained teaching similar lab- and project-based classes in the second half of the Spring 2020 semester, we can also confidently state that we will not have to alter the course timeline, content, and assignments. The primary concern will be integrating remote and asynchronous students into the class culture. For remote students, integration will be accomplished by mandating that all students, even those inside the classroom, use Zoom, so that we can use (a) breakout rooms to, for example, bring friend groups together at social distance, and (b) screen sharing to minimize contact between students asking questions and both the author and the course teaching assistant. Recordings of Zoom sessions will be made, additional office hours will be scheduled, and additional support will be provided for those who will participate asynchronously. The final two weeks of the course, post-Thanksgiving, will be purely remote and we will forgo traditional class periods to shift to Zoom meetings held with each semester project group at times appropriate for the group participants.

14 There are many lower-level, analysis-specific details that comprise ISRM’s foundational knowledge that we will not list here. The interested reader will find these details within the Supplementary Material.

15 The use of diagnostic tests and retrospective surveys of attitudes for research purposes was approved by the CMU Institutional Review Board as STUDY2018_00000373.

16 The quantitative results of the retrospective survey completed by the Fall 2018 class cohort indicate that current students also perceive that these benefits exist, although it is still too early to determine fully how the course has affected these students’ performances in upper-division classes, etc.
