Research Article

Studying the Opportunities Provided by an Applied High School Mathematics Course: Explorations in Data Science


Abstract

This article reports on a multi-method study of a high school course in data science. It finds that students who take data science take more mathematics courses than those who do not; that there are more under-represented students in data science than is typical for other advanced mathematics courses; and that students who take data science are more positive about a future in STEM and tend to be older. Analysis of students’ writing shows that they are very positive about the course, appreciating the relevance of the content, the opportunity to investigate ideas, the chance to learn challenging, applied content, and the opportunity to think creatively. On an assessment of data and functions given to students in data science and Algebra 2 courses, the students in data science scored significantly higher.

1 Introduction

Mathematics is a broad subject, but for a long time schools in the United States have valued a single pathway that focuses on algebra and trigonometry as students progress through high school (Dana Center 2022; Levitt 2022). For a minority of students (approximately 16%), this pathway culminates in calculus in high school (National Center for Education Statistics 2022). For many students, the narrowness of the course options and the procedural nature of the content (LaMar, Leshin, and Boaler 2020) lead to a dislike of mathematics and an end to any mathematical interest (Drew 2011). When students decide to stop taking mathematics courses, they lose the opportunity to continue in many STEM fields, because higher education pathways in STEM generally require mathematics knowledge and a willingness to take courses in, and engage with, the subject. This loss of students, which is not equally distributed across demographic groups, creates a problem for our society. If we do not have a diverse group of people working in STEM, we risk unseen biases infiltrating STEM solutions, as well as a narrowing of students’ career and life options (D’Ignazio and Klein 2020).

Even students who choose to continue past Algebra 2 and trigonometry often enter what has been termed the “race to calculus” (Bressoud 2017). Because the sequence leading to calculus contains more courses than there are years of high school, students can take calculus in high school only if they are accelerated in middle school. This means they must compress content and learn at a faster pace, at the expense of depth, which leaves many students lacking understanding (Drew 2011). Bressoud (2017) revealed the extent of the problem when he studied over 800,000 students taking calculus and found that two thirds of students either retook the course or took a lower-level course in college; only 19% continued to the next course, Calculus II. The division of students into different pathways in middle school also leads to something more insidious: a racially tracked system (National Center for Education Statistics 2022). This results from several forces that work to preclude the placement of Black and Brown students in higher tracks, including low teacher expectations (Lawyers’ Committee for Civil Rights of the San Francisco Bay Area 9 (LCCR) 2013) and pressure from more privileged parents.

The decision to create a high school mathematics pathway consisting of geometry, algebra, and trigonometry was made in 1892 by a committee of ten men (National Education Association of the United States 1894; Fiss 2012). Many have noted that the endurance of this pathway, with little change in its emphasis, is remarkable, particularly given the changing nature of mathematical demands in the world (Lawyers’ Committee for Civil Rights of the San Francisco Bay Area 9 2013). In 1995, statistics was introduced as an alternative to calculus (Scheaffer and Jacobbe 2014), and an Advanced Placement (AP) course in statistics was first offered in 1997 (Piccolino 1996). However, there is a widespread perception, held by many teachers and parents, that calculus gives an advantage in college admissions (Burdman 2019). It is unclear how accurate this perception is, as many students are admitted to colleges without taking calculus. Nevertheless, the belief that calculus carries significant weight in college admissions has meant that most high-achieving students engage in the race to calculus, whether or not calculus is valuable for their future pathways. Yet only 16% of students get there; for too many students, Algebra 2 is the end of their mathematical journey (National Center for Education Statistics 2022).

While mathematics curricula have stagnated, remaining better suited to the needs of people in the 1900s, the last ten years have witnessed a data explosion and a greater need than ever before for students to learn the mathematical methods and tools that will allow them to make sense of data in the world, constructing mathematical models that give insight into patterns. University teams have now created high school data science courses. In 2014, the University of California (UC) system began to accept these courses, and in 2020 it communicated that students could take a data science course in lieu of Algebra 2 (Burdman 2023). In 2023, a campaign to discredit data science offerings led the UC Board of Admissions and Relations with Schools (BOARS) to reconsider this decision (Burdman 2023). The group opposed to data science gives two main arguments. The first is that students may choose to take data science in high school but later realize they need a course in Algebra 2 to major in STEM in college. The second is the claim that the content of data science is less “rigorous” than the content of algebra (Harman 2023). This article considers the validity of both claims using data from four school districts that teach a course in data science.

Despite the caution now being displayed by the UC system, multiple states across the United States have created new high school mathematics pathways that include an option to take a course in data science. The promise of data science courses is that they teach students mathematical content that is more meaningful to their lives, help students see a future in STEM, and provide an important prelude to a course in statistics. Currently, 17 states have K-12 data science efforts underway. Fourteen of these programs are in high school mathematics, in a variety of formats (Alabama, Arizona, Arkansas, California, Georgia, Maryland, Massachusetts, Michigan, Ohio, Oklahoma, Oregon, Utah, Virginia, and Washington) (“CA Adopts K-12 Data Science” 2023); other states have pursued data science opportunities in Career & Technical Education, Computer Science, and interdisciplinary programs.

Despite their promise, courses in data science are in their infancy, and only a few research studies examining their impact exist. This article provides one of the first multi-method, detailed studies of students’ experiences in a high school data science course, drawing on records from over 6500 students in multiple schools and districts. It asks four research questions: How does a data science course impact students’ mathematics pathways? How do students experience a project-based data science course? What do students learn in a project-based data science course? And how does a data science course impact students’ ideas about a future in STEM? These questions are answered using quantitative analyses of pathways, student and teacher interviews, quick writes, student surveys, and a test of data and functions given to students in data science and Algebra 2 classes. Any study of students’ experiences in courses, particularly when taught by innovative teachers, raises questions as to whether positive outcomes can be scaled more broadly. But as this is the first large-scale study of a data science course in public high schools, it seems valuable in highlighting outcomes that are at least possible.

2 Background

2.1 The Data Science Course

The data science course at the focus of the current study was created at Stanford University and is titled Explorations in Data Science.¹ The course is project-based and designed around a data science cycle of inquiry, shown in Figure 1.

Fig. 1 Data science cycle of inquiry.


The curriculum follows an eight-unit timeline that develops an understanding of univariate data, bivariate data, simulations and probability, clustering with numerical and categorical data, prioritization models, and preference algorithms, helping students learn to model univariate, bivariate, and multivariate data with functions. The course culminates in students exploring a data science question of their own. Throughout the eight units, students are introduced to different technology tools, including Google Sheets and Slides, CODAP, Google Colab, and Tableau. These tools give students the skills to apply what they learn in each unit to a real-world dataset. Datasets include data generated by the students and their classmates, as well as open-source and publicly available datasets. Topics are chosen to be relevant to students’ lives, such as water usage in their community, predicting music choices, data on living in different states from the American Community Survey, and colorism in popular media.
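The clustering listed above is commonly done with k-means, the algorithm the course introduces in Unit 5. As a minimal, hypothetical sketch (not taken from the course materials), the function below implements the standard Lloyd’s algorithm in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. The function name `kmeans`, the first-k initialization, and the toy data are illustrative assumptions.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids).

    Hypothetical sketch: centroids start at the first k points so the result is
    deterministic; real tools usually use random or k-means++ initialization.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Toy two-feature data (hypothetical values) forming two visible groups.
data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(data, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
print(centroids)  # mean position of each cluster
```

Starting from fixed centroids keeps the example reproducible; with real data, results can depend on the starting points, which is one reason libraries rerun k-means from several initializations.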

The course introduces challenging mathematics concepts integrated with technology and embedded in authentic contexts. Below is a list of examples:

  • In Unit 2, students practice sampling, comparing state- and local-level survey results to identify and interpret the needs of their home state.

  • In Unit 3, students develop understanding of bivariate relationships through linear regression while exploring water usage in their household and across cities.

  • In Unit 4, students gain an introduction to programming skills by building a song shuffle simulator. Students begin writing if-then statements and loops to generate a simulation program. Students then dive into theoretical, experimental, and conditional probability, applying these concepts to the data they collected from their programs.

  • In Unit 5, students begin developing foundational linear algebra concepts while working on a project exploring colorism in popular media. Students learn how to work in higher-dimensional spaces and are introduced to k-means clustering, applying their probability skills to deepen their understanding of the concept.

  • In Unit 6, students create a prioritization model, developing skills in weighting, normalization, and sensitivity analysis as they build their own mathematical models.

  • In Unit 7, students explore machine learning, modeling with polynomial equations and conditional probability while developing an introductory understanding of vectors and matrices.

  • In the final unit the students conduct their own data science investigation.
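The weighting, normalization, and sensitivity analysis named in Unit 6 can be illustrated with a small weighted-sum model. The sketch below is a hypothetical example, not the course’s own materials: it min-max scales each criterion onto [0, 1] (flipping criteria where lower raw values are better), combines the criteria with normalized weights, and then reruns the ranking with a different weight to see whether the order changes. The function names, options, criteria, and numbers are all invented for illustration.

```python
def normalize(values, higher_is_better=True):
    """Min-max scale raw values onto [0, 1], flipping the scale if lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # a constant criterion cannot separate the options
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def rank_options(options, criteria, weights):
    """Rank options by a weighted sum of normalized criteria, best first.

    options:  {option name: {criterion: raw value}}
    criteria: {criterion: True if higher raw values are better}
    weights:  {criterion: non-negative importance weight}
    """
    names = list(options)
    total_weight = sum(weights[c] for c in criteria)
    scores = dict.fromkeys(names, 0.0)
    for criterion, higher_is_better in criteria.items():
        column = normalize([options[n][criterion] for n in names], higher_is_better)
        for name, value in zip(names, column):
            scores[name] += weights[criterion] / total_weight * value
    return sorted(names, key=scores.get, reverse=True), scores


# Hypothetical options and criteria, invented for illustration.
options = {
    "A": {"median_rent": 1000, "sunny_days": 200},
    "B": {"median_rent": 1500, "sunny_days": 300},
    "C": {"median_rent": 1200, "sunny_days": 150},
}
criteria = {"median_rent": False, "sunny_days": True}  # lower rent is better

baseline, _ = rank_options(options, criteria, {"median_rent": 1, "sunny_days": 1})
# Sensitivity analysis: triple the weight on sunshine and see whether the order changes.
shifted, _ = rank_options(options, criteria, {"median_rent": 1, "sunny_days": 3})
print(baseline, shifted)  # → ['A', 'B', 'C'] ['B', 'A', 'C']
```

Here the top choice flips when one weight changes, which is exactly the kind of result a sensitivity analysis is meant to surface: the ranking depends on value judgments encoded in the weights.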

The Common Core standards addressed in the course come primarily from mathematics, but computer science and English language arts standards are also met.

In addition to mathematical and statistical content, the data science course that is the focus of this article emphasizes ethical data science practices, introducing students to the ways people generate data and to practices that will help them become careful consumers of data. Topics such as spurious correlations, manipulated data visualizations, and data privacy build students’ language and understanding around data ethics and best practices. Students are also invited to explore the privacy policies of electronic devices or apps they use, to understand the ways in which companies collect data and to consider ethical challenges. In some projects, students are specifically asked to comment on the data ethics of their projects and to think about their own practices.

Because the course is project-based, students learn through collaboration and exploration and are assessed through project rubrics rather than tests. Throughout the course, students are reminded of growth mindset messages (Dweck 2006), which emphasize the value of challenge as an opportunity for growth and learning.

The course is freely available, and in 2022–23 it was being taught by 1800 teachers in 1400 school districts across the United States to over 160,000 students. The teachers of the courses reported that 47% of the students were girls or non-binary, 57% were students of color, and 68% had not been mathematically advanced in middle school. The six classes providing data for the current study came from five schools in four California districts in two distinct geographic regions; these schools were invited to take part in the study as early adopters of the course.

2.2 The Need for Data Science Education

The first modern usage of the term “Data Science” is often attributed to a position paper by William Cleveland in the International Statistical Review in 2001, where he defined data science as “the analysis of data to solve problems posed in terms of the subject matter under investigation” and explained that this work is done not solely by statisticians but also by subject matter analysts who have a vested interest in learning from their data (Cleveland 2001, p. 22). Over the past 20 years, many definitions of data science and data literacy have been published, all of which center on the same themes. The current study draws on a definition of data science from the National Science Foundation: “the processes and systems that enable us to extract knowledge or insight from data in various forms and translate it into action” (Berman et al. 2018, p. 67). Additionally, we use Wolff et al.’s (2016, p. 23) definition of data literacy:

“the ability to ask and answer real-world questions from large and small data sets through an inquiry process, with consideration of ethical use of data. It is based on core practical and creative skills, with the ability to extend knowledge of specialist data handling skills according to goals. These include the abilities to select, clean, analyze, visualize, critique, and interpret data, as well as to communicate stories from data and use data as part of a design process.”

Over 20 years ago, Cleveland (2001) wrote, “Education in data science does many things. It trains statisticians. But just as important it trains non-statisticians, conveying how valuable data science is for learning about the world.” While the world undoubtedly needs a more diverse set of people who can fill the role of data scientists (D’Ignazio and Klein 2020), in the 21st century all citizens need to understand data and its use. All people generate, share, and interpret data in the course of their everyday lives. Unfortunately, research shows that students are not well prepared to make sense of data (Wineburg et al. 2016; Educational Data Systems 2018). In California, only eleven percent of eleventh graders meet standards related to data analysis (Educational Data Systems 2018). Failure to prepare citizens to make sense of their data-filled world is particularly dangerous in complex digital and social media environments, where misinformation runs rampant (Engel 2017) and people are left unequipped to separate fact from fiction (Wineburg et al. 2016; Zucker, Noyce, and McCullough 2020).

Figure 2 shows the percentage of students in California meeting grade-level standards in data analysis in 2018 (Educational Data Systems 2018). In the elementary grades, students meet the minimal data-related standards at very high rates. The middle grades of the Common Core are filled with data standards, but fewer than 50% of students met those standards, and by high school only eleven percent of 11th graders met the data-related standards (Educational Data Systems 2018).

Fig. 2 Students from 21 school districts in California meeting data science standards assessed through MARS assessments (n = 14,574). Adapted from: Silicon Valley Mathematics Initiative’s Mathematics Assessment Collaborative: Mathematics Assessment Service (MARS) and California Assessment of Student Performance and Progress (CAASPP) Technical Report by Educational Data Systems, 2018.


Studies have shown that young people lack data literacy (Konold et al. 2015) and are not critical assessors of the validity of data presented to them (Wineburg et al. 2016; Zucker, Noyce, and McCullough 2020). For example, Konold et al. (2015) found that when faced with data visualizations, K-12 students often focus on individual data points, such as an outlier, maximum, or minimum, but often do not consider the data in aggregate. This narrow focus on individual points results in a failure to interpret the global patterns and trends that a visualization may summarize. Researchers across disciplines have raised the concern that a lack of data literacy in our society has contributed to the viral spread of misinformation and poses a general threat to social progress and democracy (Gould 2010; Gould et al. 2016; O’Neil 2016; Wineburg et al. 2016; Engel 2017; Noble 2018; Erickson et al. 2019; Zucker, Noyce, and McCullough 2020).

2.3 The Emergence of Data Science Education in K-12

The urgency of data science education within the K-12 space has been made apparent by the need to prepare students to be “data savvy” citizens (Finzer 2013; Engel 2017) and to better align with the needs of modern employers and workplaces (Levitt 2022). The integration of data science into K-12 coursework would also broaden and improve mathematics pathway options at the secondary level, with a number of benefits, including: (a) closer alignment to diverse post-secondary interests, (b) better demonstration of the ways mathematics applies to students’ personal interests, and (c) opportunities to address the history of inequitable outcomes patterned by race, gender, language, and socioeconomic status (Burdman 2018; Berry III and Larson 2019; Daro and Asturias 2019; Dana Center 2020; Reed et al. 2023). Improvements could come about through the provision of new courses and through improved course design in both data science and Algebra courses (Stigler and Son 2023). Together, these arguments led to a call for change in secondary mathematics to include a focus on data science (Boaler and Levitt 2019).

While contemporary high school data science coursework has been around since at least 2014 (see, e.g., the Introduction to Data Science (IDS) curriculum, available online: https://www.ucladatascienceed.org/), efforts toward this change have expanded greatly in recent years. The expansion of data science learning options across K-12 (LaMar and Boaler 2021) has provided an opportunity for the field of data science education research to emerge, but with some uncertainty as to where the field should situate itself. For example, in 2020 the Journal of the Learning Sciences released a special issue on data science called “Situating Data Science: Exploring How Relationships to Data Shape Learning.” In 2021, the Journal of Statistics Education changed its official name to the Journal of Statistics and Data Science Education, and over the past three years Mathematics Teacher: Learning and Teaching PK-12 has published at least eleven articles meant to encourage and support mathematics teachers to integrate data science topics into their content (Custer and Simic-Muller 2023). At the moment, the research field is still in its infancy, with the majority of published work focused on conceptualizing the field, content, and pedagogy (Erickson et al. 2019; Bargagliotti, Arnold, and Franklin 2021; Lee, Wilkerson, and Lanouette 2021; Arnold et al. 2022; De Veaux et al. 2022; Kahn et al. 2022; Dogucu, Johnson, and Ott 2023; Hazzan and Mike 2023; Msweli, Mawela, and Twinomurinzi 2023) and reviewing the available technology tools (Konold 2007; Pimentel, Horton, and Wilkerson 2022; Moon et al. 2023).

Given this research focus, few studies provide data on the experience of secondary teachers and students engaging with data science teaching and learning at scale. Two studies have investigated high school teachers’ understandings of working with data or learning to teach data science topics (Gould et al. 2016; Gould, Bargagliotti, and Johnson 2017). The studies featuring student participants focus almost entirely on the implementation of data science learning opportunities in informal settings, such as special units within other coursework and after-school or summer programs (Ben-Zvi and Arcavi 2001; Kahn 2020; DesPortes et al. 2022; Kahn et al. 2022; Hedges and Given 2023). Three recent research studies focused on the implementation of high school data science as a mathematics course (Gould et al. 2016; Heinzman 2022; Reed et al. 2023), all of them examining the same curriculum: Introduction to Data Science (IDS). Gould et al. (2016) offer a short roundtable paper that gives an overview of the IDS curriculum and some of the lessons learned from implementing it, including the challenges faced by teachers (i.e., engaging in the data science cycle) and some quantitative measures of student achievement.

Heinzman (2022) provided the first study sharing findings on the student experience in a high school data science course. This case study focused on the experience of 19 students enrolled in the IDS course and drew on classroom observations, a survey, two focus groups with students, and one teacher interview as data sources. Heinzman found that the students expressed feelings of agency and belonging within the course as they were given the tools and power to explore data they found personally interesting. In addition, the students valued their peers as resources and built a vibrant classroom community.

Finally, Reed et al. (2023) studied the California Mathematics Readiness Challenge Initiative, which aimed to increase equitable access to advanced mathematics courses for students in their final year of high school. Six developers of advanced innovative math (AIM) courses, which serve as viable alternatives to traditional 12th grade math classes, partnered with local California public universities to conduct the courses and provide professional development support to their teachers. Introduction to Data Science was one of the six courses; the other five focused on topics like quantitative reasoning and discrete mathematics. The high school students enrolled in the AIM courses wanted to pursue higher education but would otherwise have opted not to enroll in a 12th-year math course, for reasons including poor experiences in past math classes, lack of access to course options due to hierarchical course sequencing, or poor advising. The results of Reed and colleagues’ study indicated that teachers of these AIM courses, including Introduction to Data Science, noticed improved educational outcomes for their students, including increased feelings of community and increased confidence in mathematics. The researchers also found that the likelihood of completing course requirements for California State University and University of California eligibility increased by 3–10 percentage points (Reed et al. 2023).

2.4 Project-Based Learning

The study presented in this paper focuses on the impact of a project-based high school data science course on students. Project-based learning has decades of evidence supporting its positive impact on student engagement, motivation, learning, and achievement (Kokotsaki, Menzies, and Wiggins 2016). One of its benefits is that it allows students to experience content that comes from the world and that they can investigate in depth. Data science is a natural fit for a project-based approach, as it involves data from the world, and datasets lend themselves to exploration and rich questions. Kokotsaki, Menzies, and Wiggins (2016, p. 1) define project-based learning as “an active student-centered form of instruction which is characterized by students’ autonomy, constructive investigations, goal-setting, collaboration, communication and reflection within real-world practices.”

In classrooms using a project-based learning approach, students are actively engaged in the learning process. Research shows that learning through a project-based approach can positively affect student achievement (Craig and Marshall 2019), creativity (de Oliveira Biazus and Mahtari 2022), motivation (Shin 2018), and critical thinking (Holmes and Hwang 2016). To synthesize the impact of project-based learning on student learning, Chen and Yang (2019) conducted a meta-analysis of 30 journal articles published between 1998 and 2017, representing 12,585 students from 189 different schools across nine countries. Their analysis showed a positive effect size of 0.71 standard deviations, indicating that project-based learning has a medium to large effect on student academic achievement, as measured by standardized and researcher-developed assessments, compared with a traditional, teacher-led approach (Chen and Yang 2019). While research on project-based learning has a long history, research on data science education is still in its infancy, as data science only emerged as a field at the turn of the century.
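An effect size such as the 0.71 reported by Chen and Yang (2019) is a standardized mean difference: the gap between two group means expressed in standard deviation units. A minimal sketch of one common version, Cohen’s d with a pooled standard deviation, is shown below. It assumes two independent groups and is not necessarily the estimator used in that meta-analysis, which combines effects across many studies; the function name `cohens_d` and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, comparison):
    """Standardized mean difference using the pooled sample standard deviation.

    d = (mean_T - mean_C) / s_pooled, where
    s_pooled^2 = ((n_T - 1) * s_T^2 + (n_C - 1) * s_C^2) / (n_T + n_C - 2).
    """
    n_t, n_c = len(treatment), len(comparison)
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(comparison)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(comparison)) / math.sqrt(pooled_variance)


# Hypothetical assessment scores for two small groups.
project_based = [2, 4, 6]
teacher_led = [1, 3, 5]
print(cohens_d(project_based, teacher_led))  # means differ by 1, pooled SD is 2
```

Read this way, an effect of 0.71 means the average project-based student scored about 0.71 pooled standard deviations above the average comparison student.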

3 Study Design and Analysis Methods

3.1 Research Questions and Setting

The study at the center of this paper was designed around four broad questions:

  • How does a data science course impact students’ mathematics pathways?

  • How do students experience a project-based data science course?

  • What do students learn in a project-based data science course?

  • How does a data science course impact students’ ideas about a future in STEM?

The questions were chosen to address the ways a data science course might impact students’ high school mathematics pathways and experiences. The focus of the study was six classes in five high schools across four separate districts and two distinct geographic regions in California. The demographics of the schools are shown in Table 1. As the table shows, the schools were racially and economically diverse and varied widely in size and in students’ proficiency in mathematics. Data were collected in these schools during the 2021–22 school year. The current study was approved by the Stanford Institutional Review Board under protocol ID number 57693. Teachers were recruited into the study through a district call-out for those interested in teaching data science and participating in a study. Teachers who expressed interest were then provided with forms to consent to participate in the study. Prior to filming in classes and collecting student data, students were asked to have their parents complete a consent form, and they were given the opportunity to assent to be included in the research. Students who opted not to participate were still able to fully participate in the class, but their information was not included in the study, and they were not filmed.

Table 1 Demographic profiles of participating schools.

3.2 Data Sources and Analysis Methods

Table 2 shows which data sources were obtained from each of the schools and districts included in the study. In addition to the main classes included in the study, quick writes were solicited via an anonymous form from teachers implementing the class across the United States; these sources make up the majority of the quick writes included in the study. Table 3 shows how the data sources address each of the four research questions.

Table 2 Data sources by region, school, and class.

Table 3 Research questions and supporting data sources.

3.2.1 Quantitative Analysis of Mathematics Courses

Course enrollment, achievement, demographic, and socioeconomic data for over 24,500 students, from the 2018–2019 through the 2021–2022 academic years, were provided by the districts and allowed for an observational analysis of the amount of mathematics taken by students as data science was first offered.² Using subsets of the available data for 11th and 12th grade students, we compared the number of semesters of mathematics taken by students who took data science and by those who did not. The data also allowed us to compare the composition of the group of students who chose to add data science to their mathematics pathway with that of students who chose other advanced mathematics courses.
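The semesters comparison described above can be sketched as a grouped average. This is a minimal illustration under assumed field names (`grade`, `took_data_science`, `math_semesters`) that are hypothetical, not the districts’ actual schema, and the records are invented. Because the data are observational, a difference in group means describes course-taking patterns but does not by itself show that taking data science caused them.

```python
from statistics import mean


def mean_semesters_by_group(records, grades=(11, 12)):
    """Average semesters of math for data science vs. other students in the given grades.

    Each record is a dict with hypothetical keys 'grade', 'took_data_science',
    and 'math_semesters'; the districts' real field names are not known here.
    """
    groups = {True: [], False: []}
    for record in records:
        if record["grade"] in grades:
            groups[record["took_data_science"]].append(record["math_semesters"])
    # Skip an empty group rather than letting statistics.mean raise an error.
    return {took_ds: mean(values) for took_ds, values in groups.items() if values}


# Invented example records; grade 10 is excluded by the default filter.
records = [
    {"grade": 12, "took_data_science": True, "math_semesters": 8},
    {"grade": 12, "took_data_science": True, "math_semesters": 7},
    {"grade": 12, "took_data_science": False, "math_semesters": 6},
    {"grade": 11, "took_data_science": False, "math_semesters": 4},
    {"grade": 10, "took_data_science": True, "math_semesters": 5},
]
print(mean_semesters_by_group(records))
```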

3.2.2 Quick Writes

In addition to the six main classrooms that were the primary focus of the study, additional data were gathered from a broader cross section of classrooms around the United States. Teachers were recruited from a pool of candidates who had completed a professional development program and indicated they would be teaching the data science curriculum during the 2021–2022 school year. Teachers were asked to have their students complete an anonymous quick write about their experience in, and feelings about, the course. The prompts for the quick writes were:

  • What are your thoughts about this data science course? What do you like, or not like, about it?

  • How would you describe this class to another student who is thinking about taking it next year?

Quick writes from 138 students in five classrooms were analyzed by a team of researchers. Of these classrooms, three were in California, one in New York, and one in Florida. Only 24 quick writes in this dataset came from a school in the more focused study. Because the quick writes were anonymous, no demographic data are available for individual student responses.

The student quick writes were coded in multiple rounds. This began with open coding of the quick writes, which produced bottom-up codes (Emerson, Fretz, and Shaw 2011) and a codebook (Saldaña 2015), shared in Appendix B. Codes were applied at the question-response level, and multiple codes could be applied to the same response. A sentiment analysis was also conducted to estimate whether students’ feelings about the course were positive, negative, or neutral; example excerpts and definitions showing how sentiment was determined are included in the codebook in Appendix B. As the team coded, additional potential codes were identified and brought to the broader team for discussion. When general agreement had been reached, the team conducted inter-rater reliability testing and obtained a Cohen’s kappa of 0.7, which falls in the range Landis and Koch (1977) describe as substantial agreement. After the team discussed differing interpretations, 100% agreement was reached. The team then divided excerpts by code, and all team members reviewed all codes; disagreements were discussed and adjudicated to reach consensus. Code counts and the content of each tagged code were summarized and shared with the full research team. Once coding was complete, the full research team came together to discuss the codes from the quick writes along with results from the student and teacher interviews and to determine what themes were present in the data. In the final analysis, the 22 codes were consolidated into four overarching themes that appeared across the qualitative data sources.
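Cohen’s kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed share of matching labels and p_e is the chance agreement implied by each rater’s label frequencies. The sketch below computes it for two raters who assign one label per excerpt, such as the positive, negative, or neutral sentiment labels, and maps the value to Landis and Koch’s (1977) descriptive bands. The labels and function names are hypothetical; the study’s multi-code setting would need kappa computed per code or some other adaptation, and the paper does not say which was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch_band(kappa):
    """Descriptive label from Landis and Koch (1977) for a kappa value."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical sentiment labels from two coders for six quick writes.
coder_1 = ["positive", "positive", "negative", "neutral", "positive", "negative"]
coder_2 = ["positive", "negative", "negative", "neutral", "positive", "negative"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3), landis_koch_band(kappa))  # → 0.739 substantial
```

In this toy example the coders agree on five of six excerpts (83%), but kappa is lower (about 0.74) because some of that agreement would be expected by chance alone.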

3.2.3 Observation and Interviews

During the school year, each classroom was observed and filmed three times (fall, winter, and spring); each recording captured one entire class period. These observations were used to prompt discussion in teacher interviews.

To understand student and teacher experiences of the course, 17 students from four of the five schools were interviewed in small groups of 3–6, and seven teachers from all five schools were also interviewed. Teachers were invited to be interviewed on three occasions to capture data from different parts of the course; three of the teachers were interviewed more than once.

In both student and teacher interviews the participants were asked to reflect on lessons, particularly focusing on levels of student engagement and on the nature of the learning in the course. Teachers were also asked to comment on any modifications that they made to the prescribed instructional materials and compare the course to other mathematics courses that they were teaching or had taught.

The codebook that was developed for the quick writes was used to code the student and teacher interviews. The unit of analysis for coding was a complete idea expressed by a student or teacher. Two senior researchers on the project coded transcripts of the teacher and student interviews and achieved an initial inter-rater reliability (Campbell et al. 2013) of 85%. Excerpts where the coding differed were discussed until 100% agreement was reached.

3.2.4 Student Surveys

A survey was distributed during spring 2021 to students in a subset of classes in schools 1, 2, and 3 that volunteered to participate (see ). The survey was designed to measure students’ interests in and feelings about mathematics and STEM fields before the Data Science class was offered. The survey was repeated in spring 2022; however, only schools 2 and 3 participated in this post survey (see ). Survey administration yielded 500 students with pre-post observational data from schools 2 and 3. However, only 25 data science students completed the post survey, and of those only 12 had also completed the pre survey. Because of the large number of missing pre surveys for data science students, the pre-post design was dropped, and the analysis was limited to the post surveys, comparing data science and non-data science students. The three survey outcomes are students’ beliefs about having a future in STEM, students’ interest in pursuing STEM, and students’ confidence in making sense of data and online information.

The belief about having a future in STEM is measured with a single self-report rating item, “I know I have a future in STEM if I want one”, where 1 = disagree and 10 = agree.

Interest in pursuing STEM is measured using three 5-point Likert items: “How interested are you in pursuing a future career in STEM?” (0 = not at all interested to 4 = very interested), “How motivated are you to take STEM classes in college?”, and “How motivated are you to major in a STEM field in college?” (0 = ‘not at all’ to 4 = ‘a great deal’). The ratings for the three items are summed into a single STEM interest score ranging from 0 to 12, with a Cronbach’s alpha reliability of 0.93.
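Cronbach’s alpha compares the sum of the individual item variances with the variance of the summed scale score: when items move together, the total varies much more than the items do separately and alpha approaches 1. A minimal sketch, assuming a complete response matrix with one row per student and one column per item; the five students’ ratings below are hypothetical and are not survey data.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a complete matrix: one row per student, one column per item."""
    k = len(responses[0])                      # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))              # transpose rows into per-item columns
    totals = [sum(row) for row in responses]   # each student's summed scale score
    sum_item_var = sum(pvariance(col) for col in items)
    # Population vs. sample variance cancels in the ratio, as long as both use the same one.
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


# Hypothetical 0-4 ratings from five students on three interest items (not survey data).
ratings = [[4, 4, 3], [3, 3, 3], [1, 2, 1], [0, 0, 1], [2, 2, 2]]
alpha = cronbach_alpha(ratings)
```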

Confidence in data science skills relevant to STEM is measured with four items asking students to rate how confident they are in their skills to “Use data to make a decision”, “Create a convincing argument using data as evidence”, “Assess the reliability of claims made in online sources (e.g., social media, news sites, etc.)”, and “Interpret data presented in graphs, charts, and tables”. Each item is rated with a scale from 1 to 10 and the sum of the four ratings is used as a single indicator of confidence. Scores range from 4 to 40 and the measure has a Cronbach’s alpha reliability of 0.90.
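Both survey scales are simple sums of item ratings. A small helper such as the hypothetical `scale_score` below makes the stated ranges explicit (0 to 12 for interest, 4 to 40 for confidence) and rejects out-of-range or incomplete responses. The article does not say how incomplete responses were handled, so raising an error here is an assumption, as are the example ratings.

```python
def scale_score(ratings, n_items, low, high):
    """Sum item ratings into one composite after checking count and range."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    if any(not low <= r <= high for r in ratings):
        raise ValueError(f"each rating must lie between {low} and {high}")
    return sum(ratings)


# STEM interest: three items rated 0-4, so composites run from 0 to 12.
interest = scale_score([3, 2, 4], n_items=3, low=0, high=4)  # 9 (hypothetical student)
# Data confidence: four items rated 1-10, so composites run from 4 to 40.
confidence = scale_score([7, 8, 6, 9], n_items=4, low=1, high=10)  # 30 (hypothetical)
```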

Three variables were used to control for some of the potential preexisting differences between the data science and non-data science groups of students: gender, support to pursue STEM, and grade. Gender is measured categorically, with students indicating “male”, “female”, or “non-binary”. Support to pursue STEM was measured with the question “Do you have people in your life that encourage you to learn STEM subjects?”. Finally, grade was measured as students’ self-reported current grade (9, 10, 11, or 12).

Multivariate Analysis of Covariance (MANCOVA) was used to determine whether there were significant differences in the means of the data science and non-data science students on the three dependent variables while controlling for group differences in gender, having support to pursue STEM, and grade.
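One common way to obtain a MANCOVA test for a two-level group effect is to compare the residual sums-of-squares-and-cross-products (SSCP) matrices of a model with and without the group indicator, which yields Wilks’ lambda. The numpy sketch below illustrates that computation on simulated data. It is not the authors’ analysis; it assumes categorical covariates such as gender have already been dummy-coded into numeric columns, and the simulated outcomes merely stand in for the three survey scores. For applied work, statsmodels’ `MANOVA.from_formula(...).mv_test()` reports Wilks’ lambda alongside other multivariate statistics.

```python
import numpy as np


def residual_sscp(X, Y):
    """Residual sums-of-squares-and-cross-products matrix after regressing Y on X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid.T @ resid


def mancova_group_test(Y, group, covariates):
    """Wilks' lambda and exact F for a two-level group effect, adjusting for covariates."""
    n, p = Y.shape                                       # p = number of outcomes
    reduced = np.column_stack([np.ones(n), covariates])  # intercept + covariates
    full = np.column_stack([reduced, group])             # ... + 0/1 group indicator
    E = residual_sscp(full, Y)                           # error SSCP, full model
    E_plus_H = residual_sscp(reduced, Y)                 # error SSCP without the group term
    wilks = np.linalg.det(E) / np.linalg.det(E_plus_H)
    df_error = n - np.linalg.matrix_rank(full)
    # For a single-df effect, Wilks' lambda converts exactly to F(p, df_error - p + 1).
    df2 = df_error - p + 1
    f_stat = (1 - wilks) / wilks * df2 / p
    return wilks, f_stat, (p, df2)


# Simulated data (not study data): 3 outcomes, 2 numeric covariates, 30 students per group.
rng = np.random.default_rng(0)
n = 60
group = np.repeat([0, 1], n // 2)
covariates = rng.normal(size=(n, 2))
slopes = np.array([[1.0, 0.5, 0.0], [0.0, 0.2, 1.0]])   # covariate effects on each outcome
Y_null = covariates @ slopes + rng.normal(size=(n, 3))  # no group effect
Y_shift = Y_null + 2.0 * group[:, None]                 # group shifts every outcome by 2
wilks_null, f_null, _ = mancova_group_test(Y_null, group, covariates)
wilks_shift, f_shift, dfs = mancova_group_test(Y_shift, group, covariates)
```

Smaller values of Wilks’ lambda indicate a stronger group effect; a p-value would come from the F distribution with the returned degrees of freedom.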

3.2.5 Assessment of Data and Functions

Toward the end of the school year, students in data science and Algebra 2 classes at schools 1 through 4 were given an assessment of data and functions that used data to test outcomes addressed in Algebra 1 and 2. Students were asked to build and make sense of algebraic functions. The first question, drawn from the National Assessment of Educational Progress (NAEP) 12th grade test, asked students to make sense of measures of center that could be used to describe a dataset and to choose which statistic would be most appropriate for different datasets. The second question, adapted from a Core Plus curriculum unit on linear functions, asked students to describe relationships they saw in a table of data and to develop a linear algebraic model, which they were then asked to interpret. The content of both questions is expected to be taught in Algebra 1 and 2 courses. The full assessment of data and functions is shared in Appendix A.

The test was taken by 692 students across the schools. These data were linked with course enrollment, achievement, demographic, and socioeconomic data provided by schools 1, 3, and 4. The test results were analyzed using Analysis of Covariance (ANCOVA), with a series of fixed-effect covariates used to control for preexisting differences, including differences in mathematics experience: whether the student had already taken Algebra 1; whether the student had already taken Algebra 2; most recent prior math achievement (the Algebra 1 or Algebra 2 Grade Point Average (GPA), whichever was available); grade level; age; attendance; sex; indicators of race and ethnicity; Free and Reduced Lunch (FRL) status; English Language Learner (ELL) status; special education status; and whether the student had ever been suspended.
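The covariate list mixes numeric measures with categorical ones, and the prior-achievement covariate follows a fallback rule. The sketch below shows one way such a rule and the race and ethnicity indicators might be coded. The field names, the choice to prefer the Algebra 2 GPA when both grades exist (reading “most recent” literally), and the single-category race coding are all assumptions for illustration; White is used as the reference category because the results report effects relative to being White.

```python
def most_recent_algebra_gpa(record):
    """Return the Algebra 2 GPA if present, otherwise the Algebra 1 GPA (assumed rule)."""
    for field in ("algebra2_gpa", "algebra1_gpa"):  # hypothetical field names
        if record.get(field) is not None:
            return record[field]
    return None  # no Algebra grade: the student lacks complete covariate data


def dummy_code(value, levels, reference):
    """0/1 indicator for every level except the reference category."""
    return {f"is_{level}": int(value == level) for level in levels if level != reference}


# Hypothetical student record (not study data).
student = {"algebra1_gpa": 3.0, "algebra2_gpa": None, "race_ethnicity": "Latine"}
gpa = most_recent_algebra_gpa(student)  # falls back to 3.0
race_indicators = dummy_code(student["race_ethnicity"],
                             levels=["White", "Latine", "NativeAmerican", "Asian"],
                             reference="White")
# {'is_Latine': 1, 'is_NativeAmerican': 0, 'is_Asian': 0}
```

Omitting the reference level keeps the indicators from being perfectly collinear with the model intercept.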

4 Results

4.1 Math Pathways

A subset of 3,018 12th grade students from the two regions, who had been enrolled for at least three consecutive years including 2021–2022 and who had complete achievement, demographic, and socioeconomic data, was used to investigate whether taking data science was associated with the number of semesters of math classes students completed by the end of 2022 (Footnote 3). The sample contained 67 data science and 2,951 non-data science students from two school districts. Table 4 shows the demographic breakdown of the data science and non-data science student groups used for this analysis.

Table 4 Demographic characteristics of 12th grade students enrolled and not enrolled in Data Science in 2021–2022.

A series of tests of association revealed that, compared with non-data science students, the data science students had slightly lower prior Algebra achievement (p = 0.023) and were slightly older (p = 0.024), and that the data science courses included more White students (p = 0.006), more Native American students (p < 0.001), fewer Asian students (p < 0.001), and more Latine students (p < 0.001).

The average number of semesters of math classes taken over the three consecutive years from 2019–2020 to 2021–2022 (including the Algebra 1 and 2 course series, Geometry, Pre-Calculus, AP Calculus, AP Statistics, other math classes, and Data Science) is compared across the two groups using an Analysis of Covariance (ANCOVA). This analysis controls for preexisting differences between the two groups in sex, age, race and ethnicity indicators, years enrolled in the school, most recent Algebra course grade, Free and Reduced Lunch (FRL) status, English Language Learner (ELL) status, Special Education (SPED) status, number of suspensions, and attendance rate.
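In an ANCOVA of this kind, the adjusted group difference can be read off a linear regression of the outcome on an intercept, the covariates, and a 0/1 group indicator, and the group F test compares that model with one that omits the indicator. A minimal numpy sketch under the usual ANCOVA assumption of equal covariate slopes in both groups; the simulated data stand in for semesters of math and the covariate set, and this is not the authors’ code.

```python
import numpy as np


def ancova_group_effect(y, group, covariates):
    """Covariate-adjusted group difference and its 1-df F test via nested OLS models."""
    n = len(y)
    reduced = np.column_stack([np.ones(n), covariates])  # intercept + covariates
    full = np.column_stack([reduced, group])             # ... + 0/1 group indicator

    def fit(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return beta, float(resid @ resid)                # coefficients, residual SS

    beta_full, sse_full = fit(full)
    _, sse_reduced = fit(reduced)
    df_error = n - np.linalg.matrix_rank(full)
    f_stat = (sse_reduced - sse_full) / (sse_full / df_error)
    # With common covariate slopes, the group coefficient is the adjusted mean difference.
    return beta_full[-1], f_stat, df_error


# Simulated data (not study data) with a true adjusted difference of 0.4.
rng = np.random.default_rng(1)
n = 200
group = np.repeat([0, 1], n // 2)
covariates = rng.normal(size=(n, 2))
y = 5 + 0.4 * group + covariates @ np.array([0.5, -0.3]) + rng.normal(scale=0.1, size=n)
diff, f_stat, df_error = ancova_group_effect(y, group, covariates)
```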

The ANCOVA results indicate that the assumption of equal error variance across the two groups is met (F(1, 3001) = 6.97, p = 0.180). Of the covariates included in the analysis, being female (F(1, 3001) = 9.94, p = 0.007), being Native American (relative to being White) (F(1, 3001) = 29.18, p < 0.001), having been enrolled for fewer years (F(1, 3001) = 373.99, p < 0.001), participating in a Free and Reduced Lunch program (F(1, 3001) = 14.06, p < 0.001), and being classified as an English Language Learner (F(1, 3001) = 1.51, p = 0.001) are associated with having taken less math over the same three-year period. Being Asian (relative to being White) is associated with having taken more math (F(1, 3001) = 25.09, p < 0.001).
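Levene’s test, a standard check of the equal-error-variance assumption, is a one-way ANOVA on each observation’s absolute deviation from its group mean (it can be applied to the outcome or to model residuals). A minimal pure-Python sketch on made-up numbers follows; the article does not state which software or variant (mean- or median-centered) was used, and `scipy.stats.levene(..., center='mean')` computes the mean-centered version shown here.

```python
from statistics import fmean


def levene_f(*groups):
    """Mean-centred Levene statistic and its degrees of freedom."""
    # Absolute deviation of each observation from its own group's mean.
    z = []
    for g in groups:
        m = fmean(g)
        z.append([abs(x - m) for x in g])
    z_means = [fmean(g) for g in z]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = fmean([v for g in z for v in g])
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (between / (k - 1)) / (within / (n - k)), (k - 1, n - k)


# Made-up values: the second group is twice as spread out as the first.
f_stat, dfs = levene_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # F = 3.6 / 1.75, df = (1, 8)
```

A large p-value for this statistic, as reported above, means the data give no evidence that the groups’ variances differ.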

After controlling for differences between the two groups on the set of covariates, 12th grade students who took data science took, on average, 0.39 more semesters of mathematics over the same three-year period than students who did not take data science (M(DS) = 6.14, M(No DS) = 5.75, Diff = 0.39, 95% CI [.10, .67]; F(1, 3001) = 6.97, p = 0.008; see supplementary Table S1 for the full between-subjects effects ANCOVA results) (Footnote 4). In other words, enrolling in data science is associated with taking more math over a comparable period of time. A similar analysis of the subset of 3,485 11th grade students, of whom 30 had enrolled in data science, shows a similar pattern of results (Diff = 0.17) but lacks the statistical power to detect a significant difference (see supplementary Table S2).

Taken together, the analysis shows that although data science attracts more White students than other math classes, it also attracts more Latine and Native American students (Footnote 5), who have historically been underrepresented in high-level courses. Importantly, taking data science is also associated with taking more mathematics courses overall.

4.2 Students’ Experiences of Data Science

As described above, three forms of data were used to investigate students’ experiences of the data science course: teacher interviews, student interviews, and 138 students’ anonymous responses to an online quick write form that asked them to share their thoughts about the data science course. The three forms of data were coded by a team of researchers, and four themes emerged across the datasets that most accurately described students’ responses to the class: the relevance of the content, the opportunity to investigate in a project-based course, the challenge of the work, and the opportunity to think “outside of the box.” In the sections that follow, student reflections are shared first, followed by teacher reflections.

4.2.1 Students: Applicability of Content

The theme of applicability of the content students were learning emerged from 71% of the students interviewed and was the most frequent code in the quick writes, applied to 13% of all comments. Students talked about being grateful for learning how to analyze, model, and apply data to real-life situations. As students talked about this, many of them contrasted the usefulness of this knowledge with that of the other mathematics courses they had taken, for example:

And then the equations, the numbers that we use in previous math classes, like integrated math 1 or 2, it is just numbers and we’re not going to really use that in any reasoning about anything. But here, once we have numbers, we use them to make different types of data that we can understand and visually see.

Some of the students related the applicability of the content to the jobs they thought may be in their futures:

I think the aspects of data representation we learn are really important for careers in any form of mathematics, finance of business, which apply to many of what we plan to go into or major in. Even if we do not, knowing how to represent and interpret data visually are important tools for understanding data on the news, the stock market, and in cases of public health (like the pandemic).

Students appreciated working with real-world data, noting the value of investigating real problems, such as water shortages and colorism in the media, and of using data and developing models to produce insights and potential solutions. Students noted that a data science course gave them the opportunity to apply the mathematics they had previously learned:

It’s applying knowledge we already know (…) and is teaching information that we can actually take to almost every part of our lives going forwards, something most math classes cannot. All of the subjects covered so far seem important to me both in the digital age of data and graphs, but also in visualizing information in a new light.

Students expressed gratitude that the data science process had helped them develop data literacy and learn about the ethics and threats of data. The students were aware of the importance of learning about data ethics for navigating data in the outside world:

Besides being extremely fun and engaging with many hands-on materials, I have learned so much about the data science process, and despite it sounding cliche, it has really made me think differently about the outside world. I really enjoy the conversations that we have in class, about ethics, manipulating data, and so much more.

Students appreciated applying technology and mathematics to relevant situations, as one student described:

I really like this course! I love math that I can use outside of high school. Learning how to use programs like CODAP and Google Sheets is very useful. I like being able to manipulate data and find trends. So far the class has been awesome and I cannot wait to continue with more projects.

4.2.2 Students: Opportunities to Investigate

A second major theme, which emerged from 35% of the students interviewed, the teacher interviews, and 13% of all quick write comments, related to the project-based nature of the course. Multiple students described the opportunity to investigate ideas deeply, which they contrasted with their experiences of other math courses that they described as requiring memorization of methods. Students enjoyed the project-based nature of the course, and their comments also spoke to the ways that learning through exploration had increased their learning and understanding:

I really have enjoyed the course so far. I love how we get right into doing projects and applying what we learned instead of lecturing for months before doing any meaningful work. I also like how realistic and “real” the work is, as I feel like I am doing work that actually applies to real life and I see myself doing very similar things in my future.

Eleven percent of student comments from the quick writes made comparisons with other math courses, with the overwhelming majority of students talking about the irrelevance of previous courses, and dissatisfaction with manipulating procedures. Students described the opportunities to work collaboratively, investigating and connecting ideas, as meaningful for them, allowing them to learn the content deeply. Students frequently contrasted the learning opportunities they received with other courses in which they had to “simply regurgitate” information. As one student noted:

It isn’t about memorizing a bunch of different equations, or finding a side or angle, but understanding what the data means and how you can manipulate it to find answers for any given question.

Students contrasted the learning in data science with courses that focused on memorization:

I like this course a lot. There is little that you have to memorize, and it is mostly project-based. I like that format more than the format of normal courses where you are constantly memorizing, and then being assessed on what you memorized.

4.2.3 Students: Challenging Content

A third theme that emerged from 76% of the students interviewed, and 3% of the quick write comments, was the challenge of the content. Students particularly described the difficulty of learning the technology embedded in the course, which introduces students to coding, and to using spreadsheets and data visualization tools. These tools included programs such as Tableau, CODAP, and Google Sheets, Docs, Slides, and Colab.

We are learning new stuff. I forgot how to code, since third grade. They always give you a little lesson. When we really got into coding, it was so hard for me. I was practicing most of the time, so I got into coding again. It was a little difficult.

Some of the students described the algebraic requirements of the course (developing and using algebraic functions) as the most difficult part:

The hardest part for me is figuring out which functions I need to learn to do what I want the end product to do.

Notably, when students talked about the difficulty of the content, many of them concluded that the project-based nature of the course had made the ideas accessible:

It [the course] is both easier and difficult. Easier to comprehend, but tackling a more difficult subject, since coding itself is a more difficult form of math. But the class itself is explaining it all easier here, so its easier to comprehend and easier to do. In short, it’s being taught better and so it makes it easier for us all to – well – get.

Some students used the quick writes to describe the difficulty of previous math courses, noting that the difficulty was compounded by a lack of pedagogical structures that would have allowed them to develop understanding. They contrasted this with the groupwork employed in their data science course, which allowed students to move forward even when the learning was particularly challenging:

A lot of opportunities to meet new people, because there is a ton of discussing, which also helps when I am struggling, because we all work together to figure out the problem. So far it is one of my favorite classes this year.:)

Some of the students even noted that other math classes were characterized by divisions between students who were successful and others who were not, but the data science course, despite being challenging, was not creating such divisions, as one student reflected in interview:

It is real interesting. We have, in this class specifically, a lot of people who are very strong in math and we have a lot of people who aren’t coming from very math strong backgrounds, and no one is falling behind. That’s very interesting to me, because normally, by this point in the semester, in Algebra 2, or any other math class, there’s a huge divide between the people who are really on top of it, and the people who are really falling behind and struggling. In this class I don’t see that at all. Everyone’s on the same page.

4.2.4 Students: Creative, “Out of the Box” Thinking

Finally, a fourth theme that featured in 41% of student interviews and 4% of quick writes was the theme of “thinking outside of the box.” In interviews and quick writes students spoke about the opportunities to engage with creativity, something that had been absent in their previous mathematics experiences. The students described being creative with the process of data collection, and with the creation of data visualizations:

I like it, it is an interesting course to take. I like that it shows that data science does not have to be boring or all about numbers. I also like that there is a lot of creative ways to illustrate a graph and get to be creative with it.

Students also described the course stretching and expanding their minds:

It gives you a lot of, you have to think outside the box a lot, so it kind of expands your mind and your creativity.

4.2.5 Students: Overall Feelings about Course

The team of researchers also rated the quick write comments for their overall tone. This showed that 60% of comments were positive, 30% were neutral (offering no clear tone or an even mix of positive and negative), and 10% were negative. The negative comments related mainly to the difficulty of coding, the need to write reflections and write up project results, and the lack of explicit directions (probably due to the responsibility students needed to take in projects, compared with other math classes in which they answered specific questions). Taken as a whole, the vast majority of student comments (90%) either described the course with neutral or mixed sentiment (30%) or described the ways the course had enhanced their learning opportunities (60%).
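The tone percentages are simple shares of the coded comments. The sketch below uses ten hypothetical labels chosen to reproduce the reported 60/30/10 split; the actual number of rated comments is not given in this section.

```python
from collections import Counter


def sentiment_shares(labels):
    """Whole-number percentage of comments carrying each sentiment label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * counts[label] / total)
            for label in ("positive", "neutral", "negative")}


# Hypothetical labels for ten comments (not study data).
labels = ["positive"] * 6 + ["neutral"] * 3 + ["negative"]
shares = sentiment_shares(labels)  # {'positive': 60, 'neutral': 30, 'negative': 10}
not_negative = shares["positive"] + shares["neutral"]  # 90
```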

The coding of teacher interviews produced themes that overlapped with the student interview and quick write analysis, with three of the four themes reported above emerging from the teacher data (applicability of content, opportunities to investigate and challenging content).

4.2.6 Teachers: Applicability of Content

Fifty-three percent of the teachers’ comments were coded as applicability of content. Teachers cited the opportunities to apply ideas as a reason for the high levels of engagement they witnessed in their classes:

I feel like they see the value of what we’re doing, and they see the real-world applications of what it is, as opposed to in my Algebra 2, they don’t make those connections as easily.

Notably, the opportunities to apply ideas meant that students were no longer asking teachers, “When are we going to use this?”

But overall, as far as engagement is concerned, I think the students are interested in it. They’re having fun. They seem to be at least a lot more; I don’t get any complaints about what we’re doing, whereas in Algebra 2: “Why are we learning this?”, that kind of thing. We don’t get: “Why are we learning this?” ever. So, if there’s anything like the value of; they know why they’re learning this. They can see it; it’s immediate.

4.2.7 Teachers: Opportunities to Investigate

The code of ‘opportunities to investigate’ was applied to 18% of the teacher comments. The teachers also described the affordances of a project-based class, in which students collaborated and investigated ideas. As one teacher noted:

I’m letting them learn, letting them explore and teach each other… a very cooperative atmosphere. They’re working together on things.

Other teachers noted the opportunities for student agency provided by the investigative content.

The great part about the class is there’s a lot more room for students to explore what they’re interested in.

For example, I love how they discovered that you can just start typing a function and it comes up, and that’s nothing I taught them. They discovered that. It’s like, hey, how many other functions are there? And what other questions might we want to know about this data? So I want to prod and push their thinking about, now that we know we can do this, what does it tell us? And what other ways that weren’t in the group task can we actually look at this data? Maybe do a little searching. And just get them to expand their minds about how they can look at the data. But most importantly, after doing that, what does it tell them?

Teachers noted that the content of data science opened space for students’ ideas to be brought to the forefront, as students themselves chose datasets to investigate and methods of data investigation and analysis.

4.2.8 Teachers: Challenging Content

Teachers reflected on the difficulty of the content in the course, noting that it differed from a more typical Algebra 2 course and that its challenge lay less in procedural repetition and more in critical thinking:

I think it prepares students in a different way. So, I think it actually teaches them much more of the scientific thinking process. I think it teaches them how to acquire new skills that are necessary to become someone that does STEM. So, I think what it’s teaching is actually more the thinking skills and the practical day-to-day operations of someone who works in the STEM field. So, I think that’s what’s good.

Some teachers had heard the claim that data science was not as “rigorous” an option as Algebra 2, and they rejected the idea, pointing out that the course demanded a different kind of rigor. Twenty-nine percent of teacher comments talked about the opportunities for students to engage in higher-order thinking, noting that such opportunities had been rare in their other high school courses:

And I don’t think there’s a lot of classes at the high school level that lets them have that open, analytical push into higher, thought processes - into a lot of different areas and avenues. And for that, I don’t think people realize that’s what this class is allowing to have happen. And I think it’s really needed for the kids.

In challenging the idea that the course was less rigorous than Algebra courses, teachers noted, again, the applicability of the content and the need to think about data science differently:

It’s not a low-level math class, (.) There are different sets of knowledge… I worked as a software engineer in San Francisco, and there was a data science team, and that data science team used Tableau for what they were doing. So that was just like, oh, that’s awesome. This is something that is actually being used in the real world that students are learning about. So, I just wanted to point out that there’s real world application to the stuff going on that I can clearly see.

For all the reasons captured by the previous three themes, teachers spoke about their appreciation of teaching the course. They regarded the opportunities for students to work collaboratively and investigate ideas as important for the students, and these opportunities made teaching the course rewarding. Some of the teachers particularly noted that the applied nature of the content motivated students, which helped them in their role:

And so I think that’s one of the greatest things is as a teacher, you don’t have to make an argument for it. This is why you’re doing it. It’s more just like, “Oh yeah, this makes sense. Yeah, this is cool, and I want to learn how to do that.” So that’s great about it.

Others simply noted their appreciation for the opportunity to teach the class:

I think if my colleagues really thought about it, they would all want to steal it from me.

All seven of the teachers interviewed related their increased teaching enjoyment to the engagement of the students, with high levels of appreciation for students being given the opportunity to think creatively, develop higher-order thinking, and apply mathematical ideas to the world.

4.3 Student Survey Results

Multivariate Analysis of Covariance (MANCOVA) was conducted on data from the subset of 347 students with complete post survey scores and demographic data to evaluate differences between data science (n = 25) and non-data science (n = 322) students’ beliefs about having a future in STEM, interest in pursuing STEM, and confidence in making sense of data and online information. Table 5 shows the demographic characteristics of the two groups. Students in data science were more likely to report having people who encouraged them to take STEM (p = 0.003).

Table 5 Available demographic characteristics of data science and non-data science students with post survey data.

As described in Section 3.2.4, three variables (gender, support to pursue STEM, and grade level) were used to control for some of the potential preexisting differences between the data science and non-data science groups of students. The MANCOVA results indicate that the assumption of equality of covariance matrices (p = 0.523) and the assumption of equal variances across the two groups for each of the three outcome variables (p = 0.512, p = 0.455, and p = 0.986, respectively) are met. Of the covariates included in the analysis, students who self-reported as non-binary had a higher average interest in STEM than males (F(1, 341) = 80.54, p = 0.002). In addition, students who indicated having people in their lives who encouraged them to learn STEM had, on average, stronger interest in STEM (F(1, 341) = 55.70, p < 0.001), greater confidence in data science skills (F(1, 341) = 97.24, p < 0.001), and a stronger belief that they had a future in STEM (F(1, 341) = 13.52, p < 0.001). Finally, students’ grade level was negatively associated with confidence (F(1, 341) = 10.70, p = 0.039).

As shown in Table 6, after controlling for differences in the covariates across the two groups, data science students were, on average, more positive than non-data science students about having a future in STEM if they wanted one (Diff = 1.22, 95% CI [.17, 2.27]; F(1, 341) = 5.25, p = 0.039). They were also, on average, more interested in pursuing STEM (Diff = 1.54, 95% CI [.32, 2.76]; F(1, 341) = 6.19, p = 0.013) and reported higher confidence in their ability to make sense of data and information on the internet (Diff = 5.60, 95% CI [2.23, 8.96]; F(1, 341) = 10.70, p < 0.001; see supplemental Table S3 for the full set of MANCOVA results) (Footnote 6).

Table 6 Mean estimates for non-data science and data science students.

Additional survey data collected in spring 2023 from 11th and 12th grade students taking the data science course in eight high schools in the same districts supported the findings of increased confidence in data science skills and interest in STEM. The mean total confidence score on the same instrument described above was 27.42 (n = 323, 95% CI [26.68, 28.13]). In addition, 50% (150/301) of students stated that the data science course had increased their interest in a STEM career, and 36% stated that the course had increased their interest in pursuing higher education. These percentages, though not large, suggest that the course made a meaningful difference for an important group of students who may not have been considering either STEM or higher education as options.
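The two kinds of summary in this paragraph, a mean with a 95% confidence interval and a percentage, can be computed as below. This sketch assumes a normal-approximation interval, which is reasonable with several hundred respondents; the article does not state how its interval was constructed, so the reported interval may have been computed differently. The percentage example uses the counts given in the text.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI (assumed method; suits n in the hundreds)."""
    m = fmean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, (m - half_width, m + half_width)


def percent(part, whole):
    """Percentage of respondents in a category."""
    return 100 * part / whole


# Share reporting increased interest in a STEM career, from the counts in the text.
interest_share = percent(150, 301)  # about 49.8, reported as 50%
```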

4.4 Assessment of Data and Functions

For the main analysis, each student’s total percent score on the assessment of data and functions (described in Section 3.2.5) was calculated by summing the score on the first question and the scores on each sub-question of the second question, dividing by the total possible score, and multiplying by 100. The assessment and the points for each section are included in Appendix A. The Cronbach’s alpha reliability of the total percent score is 0.74, suggesting that the assessment of data and functions is an internally consistent measure of student achievement. The difference in the average percent scores of data science and Algebra 2 students on the assessment is compared using a second Analysis of Covariance with the covariates listed in Section 3.2.5.
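The scoring rule above reduces to a one-line formula. In the sketch below, the point scheme (4 points for the first question and four 4-point sub-questions for the second) is a hypothetical placeholder for the actual values in Appendix A, and the student’s points are invented.

```python
def percent_score(q1_points, q2_sub_points, max_points):
    """Total percent: (Q1 points + all Q2 sub-question points) / maximum possible * 100."""
    earned = q1_points + sum(q2_sub_points)
    if not 0 <= earned <= max_points:
        raise ValueError("earned points must lie between 0 and the maximum")
    return 100 * earned / max_points


# Hypothetical point scheme (the real one is in Appendix A): Q1 worth 4 points,
# Q2 split into four sub-questions worth 4 points each, for 20 points in total.
MAX_POINTS = 4 + 4 * 4
score = percent_score(3, [4, 2, 3, 1], MAX_POINTS)  # 13 of 20 points -> 65.0
```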

A total of 692 students in the two regions took the assessment of data and functions. Demographic and covariate data were then linked to students’ scores on the assessment. Based on the available data, independent sample t-tests showed that the data science students were more likely to have already taken at least one semester of Algebra 1 (p = 0.009) and Algebra 2 (p < 0.001), had lower prior math grades (p = 0.007), were in a higher grade (p < 0.001), were older (p < 0.001), were more likely to be Native American (p = 0.008) or Latine (p < 0.012), and were more likely to be enrolled in a FRL program (p = 0.024), be ELL (p < 0.017), and be in a special education program (p < 0.072).

Of the 692 students who took the assessment, 449 had complete demographic and covariate data. After outliers were removed (Footnote 7), 407 students (60 in data science and 347 in Algebra 2) who had completed the assessment of data and functions were used for the main analysis. Table 7 shows the demographic characteristics of the subset of students for whom complete data were available and who completed the assessment of data and functions.

Table 7 Demographic characteristics of the subset of students who completed the assessment of data and functions and for whom complete demographic data was available.

The assumption of equality of variance is met for the analysis (F(3, 403) = 0.606, p = 0.611). Of the covariates included, only 'most recent Algebra GPA' was significantly associated with performance on the assessment (F(1, 386) = 9.87, p = 0.002). On average, data science students scored higher on the assessment of data and functions than Algebra 2 students (data science M = 58.5, Algebra 2 M = 50.6, difference = 7.93, 95% CI [1.82, 14.03]; F(1, 386) = 6.51, p = 0.011; see supplemental Table S4 for full ANCOVA results and Note 8 on the regional factor).
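The core logic of an ANCOVA group comparison can be sketched with a single covariate: the raw group difference is adjusted by the pooled within-group regression slope, and the F statistic compares a model with group and covariate against a model with the covariate alone. This is a simplified illustration assuming one covariate (standing in for something like most recent Algebra GPA) and two groups; the study's model included several covariates and region as a fixed factor, so its estimates cannot be reproduced with this code. All data values are hypothetical.

```python
from statistics import mean


def _centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy) computed around the sample means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def ancova_two_groups(x1, y1, x0, y0):
    """One-covariate ANCOVA for two groups under a common-slope model.

    Returns (adjusted_difference, F, (df1, df2)); the adjusted difference
    is group 1 minus group 0 at a common covariate value.
    """
    # Within-group sums, pooled across the two groups.
    s1, s0 = _centered_sums(x1, y1), _centered_sums(x0, y0)
    sxx_w, sxy_w, syy_w = (s1[i] + s0[i] for i in range(3))
    b_w = sxy_w / sxx_w  # pooled within-group slope

    # Adjusted difference: raw mean gap minus slope times covariate gap.
    adj_diff = (mean(y1) - mean(y0)) - b_w * (mean(x1) - mean(x0))

    # Full model (group + covariate): residual SS of within-group fit.
    sse_full = syy_w - sxy_w ** 2 / sxx_w
    # Reduced model (covariate only): one regression over all students.
    sxx_t, sxy_t, syy_t = _centered_sums(x1 + x0, y1 + y0)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t

    df2 = len(x1) + len(x0) - 3  # intercept, group, covariate
    f_stat = (sse_reduced - sse_full) / (sse_full / df2)
    return adj_diff, f_stat, (1, df2)


if __name__ == "__main__":
    # Hypothetical covariate (prior GPA) and percent scores per group.
    gpa_ds, score_ds = [2.1, 2.8, 3.4, 2.5], [52, 61, 70, 55]
    gpa_alg, score_alg = [2.9, 3.3, 3.7, 2.6], [48, 55, 63, 45]
    diff, f_stat, dfs = ancova_two_groups(gpa_ds, score_ds, gpa_alg, score_alg)
    print(f"adjusted difference = {diff:.2f}, F{dfs} = {f_stat:.2f}")
```

Adjusting for the covariate matters here because the groups differed on prior grades: a group with lower prior grades that scores similarly on the outcome receives a positive adjustment.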

Sixty-two percent of the data science students had already taken at least one semester of Algebra 2. The analysis controlled for this difference and found that prior Algebra 2 coursework did not give students an advantage on the assessment. A repeated analysis restricted to students with no prior Algebra 2 coursework confirms the result: on average, data science students still achieved higher scores than Algebra 2 students (F(1, 343) = 5.13, p = 0.024; see Table S5).

5 Limitations

It is important to acknowledge that these results, along with the earlier quantitative results on the number of semesters of mathematics taken, are exploratory. Because few data science students were available for the analyses, it was not possible to fit a more complete hierarchical model with random effects for teacher or school. Causal inference is also limited because students could not be randomly assigned to take data science or not. Although our analyses accounted for important potential confounders, other variables may help explain the observed differences. A wider evaluation across more schools is needed to understand fully the effects of making data science available.

Another limitation of this study is that the teachers of the data science course were “early adopters” of the course, and their teaching of the content may not be representative of teachers drawn from a wider pool. Nevertheless, the strong performance of the data science students on the assessment of data and functions, which required substantial use and application of algebra, is promising.

More generally, a limitation of this study is that it compares a project-based data science course with traditionally taught Algebra 2 courses, and many of the student comments related to the project-based nature of the course and the opportunities to investigate ideas. This limitation is mitigated by the fact that it is partly the nature of data science content that encourages students to apply and investigate ideas, as those working in data and statistics education have pointed out (Erickson et al. 2019; Bargagliotti, Arnold, and Franklin 2021). Researchers note the active role students play when they ask questions of data and set up and analyze mathematical models, activities that do not arise as naturally when students are learning the content of Algebra 2. This is not to say that a project-based Algebra 2 course could not be developed, but the way the content of Algebra 2 has been prescribed has often led educators to design more passive learning experiences for students (LaMar, Leshin, and Boaler 2020). It could be argued that a more valuable course for students is one that allows them to learn algebraic concepts through the use and application of data.

6 Conclusion

This article described a multi-method study of high school students taking a course in data science, adding to other emerging studies (Gould et al. 2016; Heinzman 2022; Reed et al. 2023) that show the potential of data science content for high student engagement and for learning applied mathematics. This evidence seems particularly important in light of the critiques now surfacing about the validity of data science courses. One critique is that a data science course lacks rigor and that a true test of students' mathematical prowess is their success, or not, in courses centered on algebraic manipulation (Harman 2023). But the data from this study, drawn from the descriptions of students and teachers, show that although the course content is reported to be challenging, its rigor is of a different kind, resting on the need to use and apply methods and to master various technological tools in the service of building and analyzing mathematical models. Commentators claiming that data science is less rigorous than Algebra are often employing a narrow definition of mathematical rigor.

A second critique offered by opponents of data science is that students may choose to take a data science course in high school but later realize that they needed Algebra 2 to pursue a career in STEM. Data from this study address this critique in two ways. First, the quantitative data show that, in this study, students who take data science take more mathematics courses, with 62% of the data science students also taking Algebra 2. This suggests that a data science course may not pull students away from Algebra 2 but may instead provide another mathematical option. Second, on an assessment of data and functions, students in the data science course achieved at significantly higher levels than students taking Algebra 2, suggesting that they were better able to develop and interpret algebraic functions. Although the number of students involved limits the conclusions that can be drawn, the data at least suggest that, rather than missing out on the content of algebra, students who took data science deepened and extended their understanding of algebra and developed the ability to use it to make sense of data in the world.

Most of the students in the study had taken both data science and Algebra 2, but for some students the data science course prompted an interest in STEM, which created a need to take additional courses to be ready for calculus in college. That data science content serves to reengage students in this way and, importantly, gives them the belief that they can succeed is positive and should prompt both schools and colleges to offer students more flexibility. One mechanism could be the provision of bridging courses and new courses that lead from the content of data science to the mathematics of calculus.

In addition to giving students the opportunity to learn and apply mathematical content, data science serves an important role in disrupting the inequities that plague high school mathematics pathways. As described in the introduction, the calculus pathway requires students to be advanced in middle school, which has filtered most students, particularly Black and Brown students, out of calculus and ultimately away from STEM (National Center for Education Statistics 2022). The data science course that was the focus of this study was taken by a more racially diverse group of students than is typical of high-level mathematics courses, and rather than limiting students' options, as some have claimed, it sparked an interest in STEM that students had not previously developed. A 35-year veteran teacher who had been teaching a different data science course submitted these comments to California's state board of education meeting as part of the California mathematics framework adoption process:

For the last 6 years I had the amazing opportunity to teach a full high school data science course, this course transformed my own teaching practices in these later years, and transformed the lives of many students. Data science by nature brings equity into the classroom as special ed students, English language learners and calculus students work side by side, each is an expert in their domain area. Students who had a dislike for math suddenly were transformed into math lovers, they became skilled in statistical analysis, and computer programming, and critical thinking, valuable skills needed to navigate this world. Students found themselves capable of communicating their mathematical findings and interpretations, a natural outcome, based on the high engagement to research current issues and to share their capstone research findings. Since I also taught AP statistics, I saw many students who never would have taken an AP math course take AP statistics. The DS course completely prepared them for this. I witnessed their success with many of them wishing to major in statistics or data science.

Joy Straub, high school mathematics teacher (https://drive.google.com/file/d/14zL-etWRLXcLkTFXHzUisyCmAPf1n-_Z/view)

The question requires you to show your work and explain your reasoning. You may use drawings, words, and numbers in your explanation. Your answer should be clear enough so that another person could read it and understand your thinking. It is important that you show all of your work.

The table below shows the number of people each week getting a vaccination in two regions. The mean (average) and the median values are shared at the end of the table.

(a) Which statistic, the mean or the median, would you use to describe the vaccination numbers for the 5 weeks in region A? Justify your answer.

(b) Which statistic, the mean or the median, would you use to describe the vaccination numbers for the 5 weeks in region B? Justify your answer.
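The reasoning this item targets can be illustrated with made-up numbers, since the original table is not reproduced here: when one week's count is far from the others, the mean is pulled toward it while the median is not. The weekly counts below are hypothetical and are not the values students saw.

```python
from statistics import mean, median

# Hypothetical weekly vaccination counts (NOT the assessment's data):
# one region is steady, the other has a single spike week.
region_steady = [410, 395, 402, 388, 405]
region_spike = [150, 160, 155, 900, 165]

for name, weeks in [("steady", region_steady), ("spike", region_spike)]:
    print(name, "mean:", mean(weeks), "median:", median(weeks))
```

For the steady region the two statistics nearly agree; for the spike region the mean sits well above four of the five weeks, which is why a justification usually turns on whether the unusual week should dominate the summary.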

A threat to the development of a data science pathway in schools is the framing of data science as a low-level alternative to calculus. But teachers of the course argue against this framing, pointing to the challenge of the content, and the value of students with different mathematical backgrounds working together and learning together.

For some students, data science content is an important addition to their calculus pathways, giving them their first opportunity to exercise agency (Boaler and Greeno 2000) in learning mathematics and to apply the content they have been learning. For others, such courses prompt an interest in statistics, leading to an AP Statistics course. And for all students, data science courses help develop the ability to read data, create mathematical models, use technological tools, and interpret findings, all important understandings for young people taking their place in our newly data-filled world.

Supplemental material

For the Math Pathways, Student Survey, and Assessment of Data and Functions analyses of covariance, supplementary tables provide the full variance component estimate results. These tables can be used to understand the magnitude of the contribution of the different covariates in explaining the variation in the outcome scores for each analysis.

Data Availability Statement

Deidentified data and code can be found at https://osf.io/5gb4m/

Disclosure Statement

Several of the authors of this paper were involved in writing the data science curriculum that is the subject of the study. Authors have also been involved in advocating for Data Science as an additional mathematics pathway in K-12 mathematics curriculum.

Additional information

Funding

This report is based on research funded in part by the Bill and Melinda Gates Foundation. Additional funding for this work was also provided by Valhalla Foundation. The findings and conclusions contained within are those of the authors and do not necessarily reflect the positions or policies of the Bill and Melinda Gates Foundation or Valhalla Foundation.

Notes

2 School 2 from the Central Valley region was unable to provide the enrollment and sociodemographic data needed for the analysis, and the district in the Southern California region provided data for four additional schools beyond the two focal schools.

3 The three-year consecutive window was used to ensure a fair comparison across the two groups and to limit the data loss that would occur with a 4-year window.

4 The normality assumption for the dependent variable was not met, so a rank transformation was applied to the dependent variable. This non-parametric approach allowed parametric tests to be run on the rank scores. Analyses confirmed that the assumption of homogeneity of variances was met after transformation, F(1, 3016) = .116, p = .733. These results also revealed a significant group effect on the ranked dependent variable, F(1, 3001) = 6.71, p = .010, indicating that the groups differed significantly in average rank and confirming that the reported ANCOVA results are robust to violations of normality.
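A minimal sketch of the rank transformation step, assuming the common convention of giving tied scores the average of their ranks (the note does not state how ties were handled). SciPy's `scipy.stats.rankdata` uses the same average-tie convention by default. The ranked scores would then replace the raw scores as the dependent variable in the same ANCOVA; that step is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


if __name__ == "__main__":
    # Hypothetical outcome values with one tie.
    print(average_ranks([3, 7, 7, 1, 5]))
```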

5 There were insufficient numbers of Black students in the research schools to see participation patterns.

6 Multivariate outlier analysis resulted in the removal of two non-Data Science student cases. The assumptions of equality of covariance across the dependent variables for the two groups and of equality of error variance for each dependent variable are met. Although the normality assumption was not met in all cells of the design for each dependent variable, MANCOVA results are robust to violations of normality in the data.

7 Outliers were identified by examining box plots of the distribution of test scores in the two groups. A total-score threshold of 18 percent was selected, retaining only cases scoring above it and removing about 9% of cases. The threshold corresponded to a natural gap in the distribution for both groups. Most removed cases were students who had scored zero, indicating limited engagement with the assessment.
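The filtering rule can be sketched as below, assuming (as the "> 18 percent" wording suggests) that only scores strictly above 18 percent are retained; the function name and sample scores are hypothetical. As a consistency check on the counts in Section 4, going from 449 to 407 cases removes 42 cases, about 9.4% of 449, in line with "about 9%".

```python
def retain_engaged(scores, threshold=18.0):
    """Keep percent scores strictly above the threshold.

    Returns (kept_scores, share_removed). Assumes the '> 18 percent' rule,
    so a score of exactly 18 is dropped.
    """
    if not scores:
        raise ValueError("no scores supplied")
    kept = [s for s in scores if s > threshold]
    return kept, 1 - len(kept) / len(scores)


if __name__ == "__main__":
    # Hypothetical percent scores, including zeros from non-engaged students.
    kept, removed = retain_engaged([0, 0, 44.4, 61.1, 11.1, 77.8, 55.6])
    print(kept, f"{removed:.0%} removed")
```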

8 Given the objective socioeconomic differences in the two geographic regions in the study, geographic region was included as a fixed factor that might interact with the effect of being enrolled in Data Science. As expected, students from one of the regions performed significantly better than students in the other region. However, the interaction with being enrolled in Data Science was non-significant. Hence, only the main effect of the Data Science course is reported.

References

  • Arnold, P., Bargagliotti, A., Franklin, C., and Gould, R. (2022), “Bringing Complex Data into the Classroom,” Harvard Data Science Review, 4, 1–11. DOI: 10.1162/99608f92.4ec90534.
  • Bargagliotti, A., Arnold, P., and Franklin, C. (2021), “GAISE II: Bringing Data into Classrooms,” Mathematics Teacher: Learning and Teaching PK-12, 114, 424–435. DOI: 10.5951/MTLT.2020.0343.
  • Ben-Zvi, D., and Arcavi, A. (2001), “Junior High School Students’ Construction of Global Views of Data and Data Representations,” Educational Studies in Mathematics, 45, 35–65.
  • Berman, F., Rutenbar, R., Hailpern, B., Christensen, H., Davidson, S., Estrin, D., Franklin, M., Martonosi, M., Raghavan, P., Stodden, V., and Szalay, A. S. (2018), “Realizing the Potential of Data Science,” Communications of the ACM, 61, 67–72. DOI: 10.1145/3188721.
  • Berry III, R. Q., and Larson, M. R. (2019), “The Need to Catalyze Change in High School Mathematics,” Phi Delta Kappan, 100, 39–44. DOI: 10.1177/0031721719834027.
  • Boaler, J., and Levitt, S. D. (2019), “Opinion: Modern High School Math Should be About Data Science—Not Algebra 2,” The Los Angeles Times, October 23. Available at https://www.latimes.com/opinion/story/2019-10-23/math-high-school-algebra-data-statistics
  • Boaler, J., and Greeno, J. G. (2000), “Identity, Agency, and Knowing in Mathematics Worlds,” Multiple Perspectives on Mathematics Teaching and Learning, 1, 171–200.
  • Bressoud, D. (2017), “Introduction,” in The Role of Calculus in the Transition from High School to College Mathematics: Report of the Workshop Held at the MAA Carriage House, ed. D. Bressoud, pp. 3–12. Mathematical Association of America and National Council of Teachers of Mathematics.
  • Burdman, P. (2018), “The Mathematics of Opportunity: Rethinking the Role of Math in Educational Equity,” Just Equations. Available at https://justequations.org/resource/the-mathematics-of-opportunity-report
  • Burdman, P. (2019), “Why Calculus? Why Indeed?” Just Equations, Blog, High School Math Policies. Available at https://justequations.org/blog/why-calculus-why-indeed
  • Burdman, P. (2023), “Rx for Data Science Education Debates: More Light and Less Heat,” Just Equations, Blog, High School Math Policies. Available at https://justequations.org/blog/rx-for-data-science-education-debates-more-light-and-less-heat
  • CA Adopts K-12 Data Science Education. (2023), Available at https://www.datascience4everyone.org/post/ca-adopts-k-12-data-science-education
  • Campbell, J. L., Quincy, C., Osserman, J., and Pedersen, O. K. (2013), “Coding in-Depth Semistructured Interviews: Problems of Unitization and Intercoder Reliability and Agreement,” Sociological Methods & Research, 42, 294–320. DOI: 10.1177/0049124113500475.
  • Chen, C. H., and Yang, Y. C. (2019), “Revisiting the Effects of Project-Based Learning on Students’ Academic Achievement: A Meta-Analysis Investigating Moderators,” Educational Research Review, 26, 71–81. DOI: 10.1016/j.edurev.2018.11.001.
  • Cleveland, W. S. (2001), “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,” International Statistical Review, 69, 21–26. DOI: 10.1111/j.1751-5823.2001.tb00477.x.
  • Craig, T. T., and Marshall, J. (2019), “Effect of Project-Based Learning on High School Students’ State-Mandated, Standardized Math and Science Exam Performance,” Journal of Research in Science Teaching, 56, 1461–1488. DOI: 10.1002/tea.21582.
  • Custer, D. B., and Simic-Muller, K. (2023), “The Data Revolution,” Mathematics Teacher: Learning and Teaching PK-12, 116, 78–79. DOI: 10.5951/MTLT.2022.0322.
  • D’ignazio, C., and Klein, L. F. (2020), Data Feminism. Cambridge, MA: The MIT Press.
  • Dana Center. (2020), “Launch Years: A New Vision for the Transition from High School to Postsecondary Mathematics,” The University of Texas at Austin: Charles A. Dana Center. Available at https://utdanacenter.org/launchyears
  • Dana Center. (2022), “Re-Envisioning Mathematics Pathways to Expand Opportunities: The Landscape of High School to Postsecondary Course Sequences,” The University of Texas at Austin: Charles A. Dana Center. Available at https://edstrategy.org/wp-content/uploads/2022/07/Re-Envisioning-Mathematics-Pathways-to-Expand-Opportunities_FINAL.pdf
  • Daro, P., and Asturias, H. (2019), “Branching Out: Designing High School Math Pathways for Equity,” Just Equations. Available at https://justequations.org/resource/branching-out-designing-high-school-math-pathways-for-equity
  • de Oliveira Biazus, M., and Mahtari, S. (2022), “The Impact of Project-Based Learning (PjBL) Model on Secondary Students’ Creative Thinking Skills,” International Journal of Essential Competencies in Education, 1, 38–48. DOI: 10.36312/ijece.v1i1.752.
  • De Veaux, R., Hoerl, R., Snee, R., and Velleman, P. (2022), “Toward Holistic Data Science Education,” Statistics Education Research Journal, 21, 2–2. DOI: 10.52041/serj.v21i2.40.
  • DesPortes, K., Vacca, R., Tes, M., Woods, P., and Matuk, C. (2022), “Dancing with Data: Embodying the Numerical and Humanistic Sides of Data,” in Proceedings of the 16th International Conference of the Learning Sciences-ICLS 2022, pp. 305–312, International Society of the Learning Sciences.
  • Dogucu, M., Johnson, A. A., and Ott, M. (2023), “Framework for Accessible and Inclusive Teaching Materials for Statistics and Data Science Courses,” Journal of Statistics and Data Science Education, 31, 144–150. DOI: 10.1080/26939169.2023.2165988.
  • Drew, C. (2011), “Why Science Majors Change Their Minds (It’s Just So Darn Hard),” The New York Times. Available at https://www.nytimes.com/2011/11/06/education/edlife/why-science-majors-change-their-mind-its-just-so-darn-hard.html
  • Dweck, C. S. (2006), Mindset: The New Psychology of Success, New York: Random House.
  • Educational Data Systems. (2018), Silicon Valley Mathematics Initiative’s Mathematics Assessment Collaborative: Mathematics Assessment Service (MARS) and California Assessment of Student Performance and Progress (CAASPP) Technical Report.
  • Emerson, R. M., Fretz, R. I., and Shaw, L. L. (2011), Writing Ethnographic Fieldnotes, Chicago, IL: The University of Chicago Press.
  • Engel, J. (2017), “Statistical Literacy for Active Citizenship: A Call for Data Science Education,” Statistics Education Research Journal, 16, 44–49. DOI: 10.52041/serj.v16i1.213.
  • Erickson, T., Wilkerson, M., Finzer, W., and Reichsman, F. (2019), “Data Moves,” Technology Innovations in Statistics Education, 12, 1–25. DOI: 10.5070/T5121038001.
  • Finzer, W. (2013), “The Data Science Education Dilemma,” Technology Innovations in Statistics Education, 7, 1–10. DOI: 10.5070/T572013891.
  • Fiss, A. (2012), “Problems of Abstraction: Defining an American Standard for Mathematics Education at the Turn of the Twentieth Century,” Science & Education, 21, 1185–1197. DOI: 10.1007/s11191-011-9413-9.
  • Gould, R. (2010), “Statistics and the Modern Student,” International Statistical Review, 78, 297–315. DOI: 10.1111/j.1751-5823.2010.00117.x.
  • Gould, R., Bargagliotti, A., and Johnson, T. (2017), “An Analysis of Secondary Teachers’ Reasoning with Participatory Sensing Data,” Statistics Education Research Journal, 16, 305–334. DOI: 10.52041/serj.v16i2.194.
  • Gould, R., Machado, S., Ong, C., Johnson, T., Molyneux, J., Nolen, S., Tangmunarunkit, H., Trusela, L., and Zanontian, L. (2016), “Teaching Data Science to Secondary Students: The Mobilize Introduction to Data Science Curriculum,” Promoting Understanding of Statistics about Society. In Proceedings of the Roundtable Conference of the International Association of Statistics Education (IASE), Berlin, Germany, ed. J. Engel.
  • Harman, A. (2023), “In California, a Math Problem: Does Data Science = Algebra II,” New York Times, July 13, Available at https://www.nytimes.com/2023/07/13/us/california-math-data-science-algebra.html
  • Hazzan, O., and Mike, K. (2023), Guide to Teaching Data Science: An Interdisciplinary Approach, Cham: Springer Nature.
  • Hedges, S., and Given, K. (2023), “Addressing Confirmation Bias in Middle School Data Science Education,” Foundations of Data Science, 5, 223–243. DOI: 10.3934/fods.2021035.
  • Heinzman, E. (2022), “‘I Love Math Only If It’s Coding’: A Case Study of Student Experiences in an Introduction to Data Science Course,” Statistics Education Research Journal, 21, 5. DOI: 10.52041/serj.v21i2.43.
  • Holmes, V. L., and Hwang, Y. (2016), “Exploring the Effects of Project-Based Learning in Secondary Mathematics Education,” The Journal of Educational Research, 109, 449–463. DOI: 10.1080/00220671.2014.979911.
  • Kahn, J. (2020), “Learning at the Intersection of Self and Society: The Family Geobiography as a Context for Data Science Education,” Journal of the Learning Sciences, 29, 57–80. DOI: 10.1080/10508406.2019.1693377.
  • Kahn, J. B., Peralta, L. M., Rubel, L. H., Lim, V. Y., Jiang, S., and Herbel-Eisenmann, B. (2022), “Notice, Wonder, Feel, Act, and Reimagine as a Path toward Social Justice in Data Science Education,” Educational Technology & Society, 25, 80–92.
  • Kokotsaki, D., Menzies, V., and Wiggins, A. (2016), “Project-Based Learning: A Review of the Literature,” Improving Schools, 19, 267–277. DOI: 10.1177/1365480216659733.
  • Konold, C. (2007), “Designing a Data Analysis Tool for Learners,” in Thinking with Data: The 33rd Annual Carnegie Symposium on Cognition, eds. M. Lovett and P. Shah. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Konold, C., Higgins, T., Russell, S. J., and Khalil, K. (2015), “Data Seen through Different Lenses,” Educational Studies in Mathematics, 88, 305–325. DOI: 10.1007/s10649-013-9529-8.
  • LaMar, T., and Boaler, J. (2021), “The Importance and Emergence of K-12 Data Science,” Phi Delta Kappan, 103, 49–53. DOI: 10.1177/00317217211043627.
  • LaMar, T., Leshin, M., and Boaler, J. (2020), “The Derailing Impact of Content Standards – an Equity Focused District Held Back by Narrow Mathematics,” International Journal of Educational Research Open, 1, 100015. DOI: 10.1016/j.ijedro.2020.100015.
  • Landis, J. R., and Koch, G. G. (1977), “The Measurement of Observer Agreement for Categorical Data,” Biometrics, 33, 159–174.
  • Lawyers’ Committee for Civil Rights of the San Francisco Bay Area (LCCR). (2013), “Held Back: Addressing Misplacement of 9th Grade Students in Bay Area School Math Classes,” Available at https://lccrsf.org/wp-content/uploads/HELD-BACK-9th-Grade-Math-Misplacement.pdf
  • Lee, V. R., Wilkerson, M. H., and Lanouette, K. (2021), “A Call for a Humanistic Stance toward K-12 Data Science Education,” Educational Researcher, 50, 664–672. DOI: 10.3102/0013189X211048810.
  • Levitt, S. D. (2022), “Rethinking Math Education,” Education Next, 22, 66–71. Available at https://www.proquest.com/scholarly-journals/rethinking-math-education/docview/2733270884/se-2
  • Moon, P. F., Israel-Fishelson, R., Tabak, R., and Weintrop, D. (2023), “The Tools Being Used to Introduce Youth to Data Science,” in Proceedings of the 22nd Annual ACM Interaction Design and Children Conference, pp. 150–159. DOI: 10.1145/3585088.3589363.
  • Msweli, N. T., Mawela, T., and Twinomurinzi, H. (2023), “Data Science Education – a Scoping Review,” Journal of Information Technology Education: Research, 22, 263–294. DOI: 10.28945/5173.
  • National Center for Education Statistics. (2022), High School Mathematics and Science Course Completion. Condition of Education, U.S. Department of Education, Institute of Education Sciences. Retrieved August 28, 2023, from https://nces.ed.gov/programs/coe/indicator/sod
  • National Education Association of the United States. (1894), Report of the Committee of Ten on Secondary School Studies: With the Reports of the Conferences Arranged by the Committee, New York, Cincinnati, Chicago: The American Book Company.
  • Noble, S. U. (2018), Algorithms of Oppression: How Search Engines Reinforce Racism, New York: New York University Press.
  • O’Neil, C. (2016), Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, New York: Broadway Books.
  • Piccolino, A. V. (1996), “The Advanced Placement Course in Statistics: Increasing Students’ Options,” The Mathematics Teacher, 89, 376–377. DOI: 10.5951/MT.89.5.0376.
  • Pimentel, D. R., Horton, N. J., and Wilkerson, M. H. (2022), Tools to Support Data Analysis and Data Science in K-12 Education, Washington, DC: National Academies of Science.
  • Reed, S., Bracco, K., Kurlaender, M., and Merritt, C. (2023), “Innovating High School Math Through K-12 and Higher Education Partnerships,” Policy Analysis for California Education (PACE). Available at https://files.eric.ed.gov/fulltext/ED628238.pdf
  • Saldaña, J. (2015), The Coding Manual for Qualitative Researchers, Thousand Oaks, CA: Sage.
  • Scheaffer, R. L., and Jacobbe, T. (2014), “Statistics Education in the K-12 Schools of the United States: A Brief History,” Journal of Statistics Education, 22, 1–13. DOI: 10.1080/10691898.2014.11889705.
  • Shin, M. H. (2018), “Effects of Project-Based Learning on Students’ Motivation and Self-Efficacy,” English Teaching, 73, 95–114. DOI: 10.15858/engtea.73.1.201803.95.
  • Stigler, J., and Son, J. (2023), “Don’t Force a False Choice between Algebra and Data Science,” EdSource, Jan 2023. Available at https://edsource.org/2023/dont-force-a-false-choice-between-algebra-and-data-science/684817
  • Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016), “Evaluating Information: The Cornerstone of Civic Online Reasoning,” Stanford Digital Repository. Available at https://purl.stanford.edu/fv751yt5934
  • Wolff, A., Gooch, D., Montaner, J. J. C., Rashid, U., and Kortuem, G. (2016), “Creating an Understanding of Data Literacy for a Data-Driven Society,” The Journal of Community Informatics, 12. DOI: 10.15353/joci.v12i3.3275.
  • Zucker, A., Noyce, P., and McCullough, A. (2020), “JUST SAY NO!: Teaching Students to Resist Scientific Misinformation,” The Science Teacher, 87, 24–29. DOI: 10.2505/4/tst20_087_05_24.

A Appendix A

Questions from the assessment of data and functions given to students in Algebra 2 and in Data Science.

Assessment Item 1—This is a question adapted from a question in the National Assessment of Educational Progress (NAEP) bank of questions. Score range 0 to 5.

Assessment Item 2—This question is adapted from the textbook: Core Plus Mathematics Course 1, Unit 3 Linear Functions. Score range 0 to 13.

The data in the table linked below shows how average daily food supply (in calories) is related to life expectancy (in years) and infant mortality rates (in deaths per 1000 births) in a sample of countries.

(a) Make scatter plots of the (daily calories, life expectancy) and (daily calories, infant mortality) data. Feel free to use any tools you would like to create these plots. Study the patterns in the table and scatter plots and use them to answer the following questions.

(b) If you used Google Sheets, please share a link with permission to view your plot. To change the share settings, click on the share button in the top right corner and set to “anyone with link can view.”

(c) What seems to be the general relationship between daily calories and life expectancy in the sample countries?

(d) Economists might use a linear model to predict the increase of life expectancy or decrease of infant mortality for various increases in food supply. Determine a linear model for calculating life expectancy from calories using the (daily calories, life expectancy) data pattern.

(e) What story does this data tell?

(f) What other data would you like to collect to draw stronger conclusions?
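Part (d) asks for a linear model. One common way to obtain one is ordinary least squares, sketched below; the item itself does not require a particular method (students could, for example, use a spreadsheet trendline), and the data points are hypothetical because the linked table is not reproduced here. On Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return a, b


# Hypothetical (daily calories, life expectancy) pairs, NOT the item's data.
calories = [2000, 2400, 2800, 3200]
life_exp = [58, 64, 69, 74]
a, b = fit_line(calories, life_exp)
print(f"life expectancy ~ {a:.1f} + {b:.4f} * calories")
```

The slope is read as years of life expectancy per additional daily calorie, which connects part (d) to the interpretive questions in parts (e) and (f).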

B Appendix B

This codebook was used for coding the quick writes and adapted to code interviews as well.