
Increasing Student Access to and Readiness for Statistical Competitions

Pages 258-263 | Published online: 16 Feb 2023

Abstract

Statistical competitions like ASA DataFest and the Women in Data Science (WiDS) Datathon give students valuable experience working with real, challenging data. By participating, students practice important statistics and data science skills including data wrangling, visualization, modeling, communication, and teamwork. However, while advanced students may have already acquired these skills over the course of their undergraduate program, students with less experience often need additional preparation to participate. In this article, we discuss strategies and targeted activities for helping lower-level students feel comfortable and prepared to compete in events like DataFest. We also share how we used these tools to create a low-stakes DataFest preparation course at our institution. Supplementary materials for this article are available online.

1 Introduction

As interest in undergraduate statistics and data science programs has grown, so has student participation in data analysis competitions. One such competition is the American Statistical Association (ASA) DataFest, which began in 2011 with 30 participants at UCLA (Gould 2014). Today, more than 2000 participants from dozens of institutions in the United States and Canada take part in DataFest. Other competitions include events like the Women in Data Science (WiDS) Datathon (WiDS 2021), the NFL Big Data Bowl (NFL Football Operations 2022), numerous Kaggle competitions (Polak and Cook 2021; Kaggle 2022), and challenges at individual institutions.

These competitions typically involve students working in teams to analyze large, complex datasets, guided by research questions from the competition sponsors, organizers, or clients. For example, the 2017 iteration of DataFest used data from millions of searches on Expedia.com, while the 2022 WiDS Datathon involved predicting building energy use with data from Climate Change AI and Lawrence Berkeley National Laboratory (Data: WiDS Datathon, Version 1 2022). Sometimes questions of interest are broad and open-ended—DataFest events often give awards for Best Insight, Best Visualization, and Best Use of External Data—while other competitions have a more specific goal in mind. Kaggle competitions, for instance, usually judge participants on how well their models predict on unseen, held-out test data (Polak and Cook 2021).

1.1 Benefits of Competitions

Working with challenging data to address research questions requires participants to clean and visualize data, choose appropriate summary statistics and models, and communicate their results to competition judges. These features are similar to benefits of open-ended course projects (Bailey, Spence, and Sinn 2013), and are aligned with many GAISE recommendations (GAISE College Report ASA Revision Committee 2016). Competitions are therefore an excellent opportunity for students to engage in the full, iterative process of data analysis described by Wickham and Grolemund (2016) and Kim and Hardin (2021) (including data collection, if participants search for outside data to aid their analysis).

Furthermore, students gain additional exposure to real data and collaboration (Çetinkaya-Rundel and Stangl 2013; Bray 2014; Gould 2014). These are valuable experiences when applying for jobs and internships, and can help shape future career choices. Previous research also supports the benefits of competitions; Polak and Cook (2021) found a small but statistically significant increase in learning associated with participation in in-class Kaggle competitions, while Van Nuland et al. (2015), Calnon, Gifford, and Agah (2012), and Carpio Cañada et al. (2015) found improvements in test scores after participating in anatomy, robotics, and artificial intelligence competitions.

1.2 Challenges for Lower-Level Students

However, the greater size and complexity of competition data, and the open-ended nature of many research questions, make it challenging for lower-level students to access these competitions. By “lower-level” students we mean those who have taken only one or two statistics courses; these students may be familiar with linear regression and some basic computing, but have not seen more advanced topics like clustering, dimension reduction, decision trees, and hierarchical models. Moreover, even if introductory statistics courses involve data analysis projects, lower-level students are unlikely to have worked with data of the size and complexity found in DataFest or the WiDS Datathon, and simply have less practice with the full data analysis cycle. Finally, lower-level students may feel uncomfortable competing against more advanced students, or joining their teams.

1.3 Contributions

In this article, we describe strategies we used to encourage our lower-level students to participate in DataFest 2022 and to help prepare them for the competition. We also present activities and tips for other educators who want to help students prepare to compete in a statistical competition such as DataFest. Our approach involves two main steps. First, we created two levels of competition, with teams assigned to each level based on their previous background in statistics. This created a specific space within DataFest for the lower-level students we sought to recruit. We then created a series of activities which focused on helping students develop and practice important skills for successful participation in DataFest. We presented these activities to students in the form of a low-stakes one-credit course in the semester leading up to DataFest. We describe the different levels, the course, and the activities we created in Section 2. In Section 3, we summarize the results of our efforts, and in Section 4 we discuss possible modifications and tips for other instructors.

All of the resources and activities used in our course can be found on our website (https://datafest-prep.github.io/), and the source files are available on our GitHub repository (https://github.com/datafest-prep/datafest-prep.github.io). The project discussed in this article was deemed exempt from human subjects research approval by our institution’s IRB.

2 Materials and Methods

2.1 ASA DataFest

DataFest is an annual data analysis competition which takes place over the course of one weekend, typically from Friday evening to Sunday afternoon. A new dataset is used each year, and is kept secret from students until it is revealed at the beginning of the competition. A wide range of data has been used in past years, and the dataset is typically large and rich enough to yield many possible approaches for analysis.

While DataFest began at UCLA, competitions are now held at multiple sites; those interested in hosting an official ASA DataFest can find information at https://ww2.amstat.org/education/datafest/. Events are typically held in-person in March through May, but remote events have been used in response to the COVID-19 pandemic. Different sites may hold their events on different weekends, but all official events use the same data, and participants are prohibited from sharing any information about the data until after all events have occurred. Furthermore, participants sign agreements that they will delete copies of the data after the event, and will not share specifics outside the event.

Over the course of the competition weekend, students work in teams to analyze the data and answer high-level questions posed by the data provider, who serves as the client for the competition. At the end of the event, students present their results to a panel of judges which evaluates their work and awards prizes for Best Insight, Best Visualization, and Best Use of External Data.

2.2 Competition Levels

In working to encourage lower-level students to engage in DataFest, our first step was to focus on creating a sense of belonging for our lower-level students. Our goal was to communicate clearly to students that they were wanted and that they had a place to succeed in DataFest regardless of their background in statistics. Creating a sense of belonging is important for student recruitment and retention (O’Keeffe 2013), and our approach to doing this was to divide our DataFest competition into two different competition levels. Levels were based solely on the statistics and data science background of each student, and teams only competed against other teams at their competition level. Our two levels are:

  • Level 1: Students who have taken only 1–2 statistics/data science courses. These are students who have no more than regression knowledge at the level of STAT2 (Cannon et al. 2018).

  • Level 2: Students who have taken three or more statistics/data science courses.

Historically at Wake Forest University, DataFest has been attended by Level 2 students, primarily juniors and seniors with multiple statistics and computer science courses. The goal with creating these two levels was to make it clear that Level 1 students were wanted at DataFest and that they had a chance to be competitive. Accordingly, Level 1 was designed to target our lower-level population of interest. The strategy of creating competition levels could be adapted to suit the needs of an individual campus, for instance creating a level for graduate students or a level for intermediate students.

Because teams only competed against others in their competition level, we also adapted the award system for DataFest. Typically, DataFest has awards for Best Insight, Best Visualization, and Best Use of External Data (American Statistical Association 2022). At DataFest 2022 at Wake Forest, we had a winner and a runner-up for Level 1, and a winner and a runner-up for Level 2. In other words, our two Levels were not judged against one another. Judging for the two levels was conducted in separate rooms, with each judge assigned to one of the levels, which also reduced the number of presentations each judge needed to assess.

As a practical note, dividing the teams into two levels requires querying the students about their statistics background and manually assigning each student to a level. Teams were chosen by the students themselves, though we helped place a few individuals who registered on their own. When students chose their teams, teams with all Level 1 students were assigned to Level 1, and teams with all Level 2 students or with a mixture of Level 1 and Level 2 students were assigned to Level 2. We note that we had only one such mixed team, with four Level 2 students and one Level 1 student.

2.3 Recruiting and the Course

In order for the competition levels discussed in Section 2.2 to aid in the process of recruiting students, it was important to clearly communicate the structure we were creating for DataFest to our target students (Level 1 students). To advertise DataFest, in Fall 2021 we E-mailed all students in STA 112, the second-semester statistics course at Wake Forest University, which covers R (R Core Team 2021), linear regression, and some logistic regression using Cannon et al. (2018). Most students in STA 112 are in their first or second year. In this E-mail we described DataFest and the two different competition levels, emphasizing that students would compete only against other teams at their level as well as highlighting the potential benefits of attending such a competition. Separate E-mails advertising DataFest were also sent in Fall 2021 and Spring 2022 to all majors and minors in our department.

We also introduced another strategy targeted at helping our Level 1 students succeed in DataFest: STA 175. STA 175 Statistical Competitions is a one-credit pass/fail course that ran in Spring 2022. The course does not fulfill any major, minor, or general education requirement. Instead, its purpose was to engage Level 1 students in activities designed to help them develop and practice the skills they would need to succeed in DataFest. The course was deliberately designed to be low-stakes, to encourage lower-level students interested in DataFest to take it and increase their preparation for the competition. Accordingly, the course was set up like a weekly club meeting. Each week the students met with their teams for 50 min and worked together on an activity. These activities (which we discuss in detail in Section 2.4) were designed as guided tutorials to help students learn and practice skills necessary for success at DataFest. Both faculty members who volunteered to lead this course moved around the room, discussing the activity with students and answering questions. Students earned a “Pass” in STA 175 if they attended class each week, worked through the activity, and participated fully in DataFest.

Registration for the course was not required to compete in DataFest, but was encouraged for lower-level students. STA 175 was advertised in the DataFest recruiting E-mails sent in Fall 2021, recommending that interested lower-level students sign up for STA 175 to practice for the competition. Here is an excerpt:

If you are interested in DataFest, you can sign up for STA 175, a one credit pass/fail course that will meet once a week. The only pre-req is STA 112. No homework, no tests, but a chance to learn about R and learn about working with real data. This is especially useful for those who are relatively new to R or looking to learn more about it. If you participate in class and participate fully in DataFest, you pass!

We note that while the STA 175 course was targeted at Level 1 students, Level 2 students were also able to take the course if they so chose. Our final enrollment of 35 students included 27 Level 1 students and 8 Level 2 students.

The choice to present the DataFest preparation activities in the form of a class allowed the faculty running DataFest to have an accurate count of the students who would be participating in DataFest, which helped with budgeting for food and other necessities for the event. It also meant that “Statistical Competitions” appeared on the transcripts for our students, which is an asset to students as they apply for jobs or for admission to graduate programs. However, instead of presenting these activities in a class format, the structure of meeting once a week to work through activities could easily be used as a structure for a statistics/data science club. The activities could also be used by an instructor to build DataFest skills into an existing course. For example, several activities focus on making a data analysis plan to answer an open-ended research question and choosing appropriate statistical methods. These activities lend themselves well to group discussions and could be used in regression and modeling courses as in-class group activities. Further activity details are provided in the following section.

2.4 Activities

Once students were recruited into DataFest and the preparatory STA 175 course, the next step was to equip them with the skills they would need to handle the large, messy data that is a key component of DataFest. We did this through a series of nine activities, presented to the students in the nine weeks leading up to DataFest 2022. In this section, we discuss the skills these activities emphasized, the dataset used for most of the activities, and which activities focused on which Competition Skill.

2.4.1 Competition Skills

Based on our previous experiences with DataFest (one of us has mentored students as both a graduate student and faculty member, the other participated as an undergraduate), we identified the following Competition Skills as skills or experiences the students needed for success in DataFest:

  1. Working in teams

  2. Coding in teams (Collaborative Coding)

  3. Data visualization

  4. Data cleaning/data wrangling

  5. Working with large, messy data

  6. Getting started with an open-ended research question

  7. Choosing appropriate statistical methods based on the client request

  8. Presenting/explaining results

  9. Planning and time management

The first two Competition Skills (working in teams and coding in teams) are critical because most statistical competitions, and indeed statistical analysis in practice, involve students working in teams. However, Level 1 students may not have experience working in teams, and likely would not have experience trying to tackle a data analysis with different team members coding on different computers. Our activities encouraged teamwork and collaborative coding in two ways. First, students worked within their competition teams for the entire semester, which allowed students to hone their ability to work as a group and identify the strengths of each team member. It also provided opportunities for students to bond and get to know one another. This process of building community within a classroom is one technique for supporting diverse learners (Taylor et al. 2022). Second, we included guidance throughout the activities on how to efficiently divide work between team members. For example, Activity 4 asks each team member to explore a different variable in EDA, while Activity 8 suggests that different team members work on visualizing relationships, identifying important groups in the data, and finding missing or incorrect data. To encourage collaborative coding, our students worked on shared projects on RStudio Server Pro accounts, but the activities can also be done with other installations of R and RStudio.

The next three Competition Skills focus on the practical knowledge students would need to work with the large and messy data that is at the core of statistical competitions. Our Level 1 students had some familiarity with R, but typically only at the level of making basic visualizations and fitting regression models on relatively small, tidy datasets. DataFest data is generally at least 1 GB in size, and is typically larger. The data may be contained in multiple files and is almost never in a format that students can use without cleaning and wrangling. Because of this, our STA 175 activities were designed to help students learn to use R (R Core Team 2021) to make large data more manageable. These activities also provided a resource that students could reference during the competition.
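For instance, a first wrangling step on building-energy data might look like the dplyr sketch below. This is only an illustration of the pattern the activities practice, not course material: the column names are hypothetical, and a toy tibble stands in for the large files students actually faced.

```r
library(dplyr)

# Toy stand-in for a large building-energy file (hypothetical columns);
# in practice, data this shape would be read from disk with readr::read_csv().
buildings <- tibble::tibble(
  site_eui   = c(120, NA, 85, 240),       # energy use intensity (hypothetical)
  floor_area = c(5000, 1200, 0, 8000),
  year_built = c(1978, 1990, 1985, 2003)
)

buildings_clean <- buildings %>%
  filter(!is.na(site_eui), floor_area > 0) %>%        # drop missing/invalid rows
  mutate(decade_built = 10 * (year_built %/% 10)) %>% # derive a grouping variable
  group_by(decade_built) %>%
  summarize(mean_eui = mean(site_eui), n = n(), .groups = "drop")
```

Even a small pipeline like this exercises the filter/mutate/group/summarize habits that make a competition-scale file manageable.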

The last four Competition Skills focused on choosing appropriate methodology and explaining both methods and results to clients, all within the generally short competition time frame (DataFest, for instance, lasts roughly 48 hr). Most Level 1 students had not had experience choosing methods based on the research question of a client, nor had they needed to communicate a process or results to a client. Our activities allowed students to practice these skills in a safe space and to receive feedback as they worked. For example, Activity 8 walks students through the initial steps of tackling an open-ended research question. In this activity, students are given a client request and asked to turn this research question into a statistical question which can be answered with the available data, after performing exploratory data analysis. They then assess which of the statistical methods they have learned are most appropriate to answer this statistical question.

In the context of DataFest, the “client” is the data provider, as the organization or individual providing the data typically gives a desired goal or list of goals for the analysis. These goals are generally open-ended, which requires students to think critically about how to address the needs of the client (the data provider) within the 48 hr competition period. A secondary client for DataFest competitors is the panel of judges to whom they must present their results. To prepare students for this, Activity 9 tasks students with explaining their approach to a statistical analysis to an imaginary panel of judges. The activity provides detail on ways to adapt their explanation of their process and results to suit the needs of their client, focusing on clarity, addressing research goals, and presentation.

2.4.2 The Activities

In total, we designed nine activities to teach or reinforce the Competition Skills for the Level 1 students in STA 175. Students worked on one activity per week, devoting the entire 50 min class period to that week’s activity with their team. The two faculty members circulated around the room, answering questions, providing suggestions, and talking with the teams.

In Table 1, we indicate which of the nine activities focuses on each of the nine Competition Skills. Note that Time Management is not explicitly listed in Table 1, as the timed nature of the class period gave students practice managing their time and learning to work on a task in a fixed time frame. We also integrated guidance on planning analyses throughout the activities, with several activities ending by asking students to plan how they would proceed with the next steps. Other skills, like data visualization and wrangling, were specifically targeted in early activities and then used throughout the course.

Table 1 A map of the competition skills to the activities that teach or reinforce these concepts.

We note that these activities assume students have some basic familiarity with R. We provide guides for installing/accessing R and getting started with R Markdown, but if students have never seen R, we recommend including one introductory activity on R before using the STA 175 activities. Examples of such activities can be found in a variety of sources, including OpenIntro Statistics (Diez, Cetinkaya-Rundel, and Barr 2019).

2.4.3 The Data

A critical component of creating these activities was selecting data that would reflect many of the key elements of a DataFest dataset. This means the dataset needed to be large and complicated, with some missing or messy data, and some possible supervised learning research questions. Because of this, we built most of our activities around a dataset that was being used for a different statistical competition, specifically the 2022 WiDS Datathon (WiDS 2021). The dataset for the WiDS Datathon was created by Climate Change AI and Lawrence Berkeley National Laboratory, and the data was hosted on Kaggle (Data: WiDS Datathon, Version 1 2022). The dataset has over 70,000 rows and more than 30 features.

The stated goal in the WiDS Datathon was to “analyze differences in building energy efficiency, creating models to predict building energy consumption” (Data: WiDS Datathon, Version 1 2022), and we mirrored this in many of our activities. Each row in the dataset represents a specific building and a variety of categorical and numeric variables about that building are provided. The dataset also provides information about the region the building is located in, along with climate and weather information for that region. More detailed information about this dataset, including a list of variables and a link to download the data, can be found at https://www.kaggle.com/competitions/widsdatathon2022/overview.
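Since Level 1 students know linear regression, a natural first pass at the Datathon’s prediction goal is an ordinary lm() fit. The following R sketch is purely illustrative: the variable names and toy values are invented stand-ins for the actual Kaggle columns.

```r
# Toy stand-in for the building-energy data (hypothetical variable names).
buildings <- data.frame(
  site_eui   = c(100, 140, 90, 210, 160),       # energy use intensity (response)
  floor_area = c(4000, 6000, 3500, 9000, 7000),
  year_built = c(1980, 1975, 2001, 1960, 1988)
)

# A Level 1 first model: predict energy use from building characteristics.
fit <- lm(site_eui ~ floor_area + year_built, data = buildings)

# Predict energy use for a new (hypothetical) building.
new_building <- data.frame(floor_area = 5000, year_built = 1995)
pred <- predict(fit, newdata = new_building)
```

From here, students can compare candidate models by their prediction error on held-out rows, mirroring how Kaggle-style competitions are judged.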

3 Results

3.1 Benefits of the Course Structure

By using data from a real competition (the WiDS Datathon), students were able to practice with data similar in size and complexity to what they might see in DataFest. This provided an opportunity to make mistakes, and learn from those errors, in a low-stakes environment with direct access to our help and guidance. Previous literature argues that making and correcting mistakes is an important part of learning (Dweck 2006; DeBrincat 2015; Hoffman and Elmi 2021), and that providing a safe space to make mistakes is valuable for students (Evans 2017). The course also provided extended practice with group work, which has been recommended to promote student learning (Garfield 1993; Keeler and Steinhorst 1995; Vance 2021).

During DataFest, which ran over a weekend from Friday evening to Sunday afternoon, students were able to ask us for help and guidance on the data and their analyses. We were both available in person all day Saturday, and we had a constant stream of students asking for advice. By the time the competition occurred, students had several months of experience interacting with us in class, which gave them time to get comfortable approaching us, and possibly other mentors, for help.

3.2 Student Participation and Feedback

Our goal was to encourage Level 1 students to participate in DataFest, and the number of Level 1 participants increased substantially in 2022 compared to previous years. Table 2 displays the counts of students participating in DataFest since 2019 (the first year Wake Forest students could elect to participate in DataFest). Though 2022 was the first year we divided the students into competition levels, we were able to count the number of Level 1 and Level 2 students in previous years by looking back at rosters from previous competitions and assigning students to the appropriate levels. In 2019, all students were Level 2 students. In 2022, after the application of our methods, nearly half the participants were Level 1 students, with 7 Level 1 teams and 8 Level 2 teams. We note that all but two of our Level 1 competitors participated in STA 175.

Table 2 Number of Wake Forest students participating in DataFest at each level.

All 7 of our Level 1 teams successfully completed the competition and presented to our panel of judges. This suggests that deliberately recruiting and supporting Level 1 students helped them feel welcome and able to compete in DataFest.

To directly capture student feelings about DataFest and the STA 175 preparation course, we surveyed all participants after the competition to ask about their experiences. Some survey and course evaluation responses are summarized in Table 3, showing that respondents found the course valuable and Level 1 students appreciated the different competition levels.

Table 3 Results from student surveys and course evaluations.

This student feedback, and their completion of STA 175 and DataFest when both were optional, suggests that students found STA 175 useful for preparing for DataFest.

4 Discussion

To help make DataFest more accessible to lower-level students, we divided our students into two competition levels, and created a one-credit optional, low-stakes course to help our students prepare for DataFest with a series of targeted activities. Enrollment of lower-level students subsequently increased, and 100% of our Level 1 students who replied to a survey said they were more comfortable competing against other Level 1 teams. Student feedback on surveys and course evaluations also suggests that students found our course and activities valuable preparation for DataFest.

In addition, in our experience DataFest provides a great way for students to prepare for job interviews. For instance, several of our students have had to do 48 hr data analysis tasks as part of their job interviews. Others have been asked during the interview to describe how they would approach a particular analysis task and why. These skills directly relate to skills used in DataFest, where students get to practice choosing a method appropriate to a task, executing that method in a short period of time, and orally presenting (and defending) their results. These benefits of DataFest have been previously discussed in Ullman et al. (2020), with discussion participants also noting that DataFest is an opportunity for students to connect with the broader statistical community and local professionals.

While our materials were designed to prepare students for DataFest, similar skills are required for other statistical competitions. Our activities could also be used outside of a dedicated course, for example by a data analysis club, or for individual study. For students with little or no computing experience, we recommend providing an additional initial activity that introduces students to R. We may also add an additional data wrangling activity to future iterations of the course, as data preparation is often one of the most challenging steps in a competition. We note that more data would need to be collected to assess the benefits of these activities for students competing in other statistical competitions.

For instructors interested in helping students prepare for a competition, we summarize our experiences with the following tips:

  • Use complex data, such as data from previous competitions. Competition data is typically much larger and messier than what students have seen in homework assignments and class projects, and it is important to expose students to these challenges.

  • Actively recruit students with less experience. Make it clear that these students belong in competitions too, and share any resources to help them prepare.

  • Give students practice working in teams, and getting to know the instructor.

  • Make practice low-stakes, for example by making the course Pass/Fail and grading assignments only for effort or completeness. This allows students to use the activities for practice, rather than worrying about their grade or making mistakes, and also reduces the burden on instructors who are volunteering their time.

Supplementary Materials

The dataset used in the activities is available in the supplementary materials.

Supplemental material

Supplemental Material

Download Zip (15.4 MB)

Acknowledgments

Thank you to our students for participating in STA 175 and in DataFest, and for providing helpful feedback on their experiences. Thank you to Julie Wise and Caroline Bowen for their help organizing DataFest at Wake Forest, and to Jihyeon Kwon for helping students during the event. Finally, thank you to Cody Stevens and the Wake Forest DEAC HPC Cluster for setting up RStudio Server Pro and for invaluable technical support during DataFest.

Data Availability Statement

All data are fully reported in the article.

Disclosure Statement

The authors report there are no competing interests to declare.

References

  • American Statistical Association. (2022), “ASA DataFest in a Box,” Available at https://ww2.amstat.org/education/datafest/datafestinabox.cfm.
  • Bailey, B., Spence, D. J., and Sinn, R. (2013), “Implementation of Discovery Projects in Statistics,” Journal of Statistics Education, 21.
  • Bray, A. (2014), “A Festival of Data: Student Perspectives,” AMSTAT News: The Membership Magazine of the American Statistical Association, 3–4.
  • Calnon, M., Gifford, C. M., and Agah, A. (2012), “Robotics Competitions in the Classroom: Enriching Graduate-level Education in Computer Science and Engineering,” Global Journal of Engineering Education, 14, 6–13.
  • Cannon, A. R., Cobb, G. W., Hartlaub, B. A., Legler, J. M., Lock, R. H., Moore, T. L., Rossman, A. J., and Witmer, J. (2018), STAT2: Modeling with Regression and ANOVA, New York: W. H. Freeman.
  • Carpio Cañada, J., Mateo Sanguino, T., Merelo Guervós, J., and Rivas Santos, V. (2015), “Open Classroom: Enhancing Student Achievement on Artificial Intelligence through an International Online Competition,” Journal of Computer Assisted Learning, 31, 14–31.
  • Çetinkaya-Rundel, M., and Stangl, D. (2013), “Taking A Chance in the Classroom: A Celebration of Data,” Chance, 26, 43–46.
  • Data: WiDS Datathon, Version 1. (2022), “Climate Change AI (CCAI) and Lawrence Berkeley National Laboratory (Berkeley Lab),” Available at https://www.kaggle.com/competitions/widsdatathon2022/overview. Retrieved January 10, 2022.
  • DeBrincat, D. (2015), “Yes, No, Wait, What?: The Benefits of Student Mistakes in the Classroom,” The History Teacher, 49, 9–34.
  • Diez, D. M., Cetinkaya-Rundel, M., and Barr, C. D. (2019), OpenIntro Statistics (4th ed.), Boston, MA: OpenIntro Inc.
  • Dweck, C. S. (2006), Mindset: The New Psychology of Success, New York: Random House.
  • Evans, M. (2017), “Providing Students with Real Experience while Maintaining a Safe Place to Make Mistakes,” Journalism Education, 6, 76–83.
  • GAISE College Report ASA Revision Committee. (2016), “Guidelines for Assessment and Instruction in Statistics Education College Report,” Available at http://www.amstat.org/education/gaise.
  • Garfield, J. (1993), “Teaching Statistics Using Small-Group Cooperative Learning,” Journal of Statistics Education, 1.
  • Gould, R. (2014), “Datafest: Celebrating Data in the Data Deluge,” in Sustainability in Statistics Education. Proceedings of the Ninth International Conference on Teaching Statistics, pp. 1–4.
  • Hoffman, H. J., and Elmi, A. F. (2021), “Do Students Learn More from Erroneous Code? Exploring Student Performance and Satisfaction in an Error-free Versus an Error-full SAS® Programming Environment,” Journal of Statistics and Data Science Education, 29, 228–240.
  • Kaggle. (2022), “Competitions,” Available at https://www.kaggle.com/competitions.
  • Keeler, C. M., and Steinhorst, R. K. (1995), “Using Small Groups to Promote Active Learning in the Introductory Statistics Course: A Report From the Field,” Journal of Statistics Education, 3.
  • Kim, A. Y., and Hardin, J. (2021), ““Playing the Whole Game”: A Data Collection and Analysis Exercise with Google Calendar,” Journal of Statistics and Data Science Education, 29, S51–S60.
  • NFL Football Operations. (2022), “Big Data Bowl,” Available at https://operations.nfl.com/gameday/analytics/big-data-bowl/.
  • O’Keeffe, P. (2013), “A Sense of Belonging: Improving Student Retention,” College Student Journal, 47, 605–613.
  • Polak, J., and Cook, D. (2021), “A Study on Student Performance, Engagement, and Experience With Kaggle InClass Data Challenges,” Journal of Statistics and Data Science Education, 29, 63–70.
  • R Core Team. (2021), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
  • Taylor, L. L., Doehler, K., VanKrevelen, R., Weaver, M. A., and Trocki, A. D. (2022), “A Case Study of Strategies for Intentionally Building Course Community to Support Diverse Learners in an Introductory Statistics Course,” Teaching Statistics, 44, 48–58.
  • Ullman, J., Kolaczyk, E., Brachman, R., Bray, A., Ziganto, D., Uzzo, S., Cramer, C., and Borner, K. (2020), “Meeting #4: Alternative Mechanisms for Data Science Education,” in Roundtable on Data Science Postsecondary Education: A Compilation of Meeting Highlights. Washington DC: The National Academies Press.
  • Van Nuland, S. E., Roach, V. A., Wilson, T. D., and Belliveau, D. J. (2015), “Head to Head: The Role of Academic Competition in Undergraduate Anatomical Education,” Anatomical Sciences Education, 8, 404–412. DOI: 10.1002/ase.1498.
  • Vance, E. A. (2021), “Using Team-Based Learning to Teach Data Science,” Journal of Statistics and Data Science Education, 29, 277–296.
  • Wickham, H., and Grolemund, G. (2016), R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, Sebastopol, CA: O’Reilly Media.
  • WiDS. (2021), “Announcing the 5th Annual WiDS Datathon 2022 Challenge: Using Data Science to Mitigate Climate Change,” Available at https://www.widsconference.org/blog_archive/announcing-the-5th-annual-wids-datathon-2022-challenge-using-data-science-to-mitigate-climate-change.