Assessing the Association Between Precourse Metrics of Student Preparation and Student Performance in Introductory Statistics: Results from Early Data on Simulation-Based Inference vs. Nonsimulation-Based Inference


ABSTRACT

The recent simulation-based inference (SBI) movement in algebra-based introductory statistics courses (Stat 101) has provided preliminary evidence of improved student conceptual understanding and retention. However, little is known about whether these positive effects are preferentially distributed across types of students entering the course. We consider how two metrics of Stat 101 student preparation (precourse performance on a concept inventory and math ACT score) may or may not be associated with end-of-course student performance on concept inventories. Students across all preparation levels tended to show improvement in Stat 101, but more improvement was observed across all student preparation levels in early versions of an SBI course. Furthermore, students' gains tended to be similar regardless of whether students entered the course with more preparation or less. Recent data on a sample of students using a current version of an SBI course showed similar results, though direct comparison with non-SBI students was not possible. Overall, our analysis provides additional evidence that SBI curricula are effective at improving students' conceptual understanding of statistical ideas postcourse regardless of student preparation. Further work is needed to better understand nuances of student improvement based on other student demographics and prior coursework, as well as instructor and institutional variables.


1. Introduction

Students taking college-level, algebra-based introductory statistics courses (Stat 101) come to the course with widely varying levels of preparation. For example, some students have completed high school algebra or college algebra, some have completed high school statistics, and still others have completed high school precalculus, calculus, or AP Statistics. A challenge when teaching Stat 101 is accommodating these varying levels of preparation so that all students can learn effectively: strong students are not bored, weaker students are not left behind, and all students improve on key measures of conceptual understanding.

Numerous studies have examined variables related to student performance in statistics courses. Prior mathematical skills, as measured by the ACT and a basic math skills test, were strong predictors of student performance (Johnson and Kuennen 2006), and the type and order of college-level mathematics courses taken was strongly associated with student performance in an introductory statistics course in a business school (Green et al. 2009). Another study of student performance in business statistics suggested that both performance in an algebra course and college GPA were strong predictors of performance in the course (Rochelle and Dotterweich 2007). Two recent studies demonstrated that college GPA, as well as ACT, were strong predictors of student performance in a general, multidisciplinary introductory statistics course (Li, Uvah, and Amin 2012; Wang, Tu, and Shieh 2007). This finding is in line with other research indicating that poor mathematical training in high school leads to challenges learning statistics in college (Gnaldi 2006). Another study, focusing on performance in an introductory statistics course for psychology students, found that age and an assessment of algebra skills were strong predictors of student performance (Lester 2007). A recent multi-institutional study concluded that increased mathematical coursework in high school was associated with students later taking more and harder statistics courses in college, and performing better in those courses (Dupuis et al. 2011). Two studies that considered both mathematical competencies and student attitudes found that both were significant predictors of student performance (Cherney and Cooney 2005; Silvia et al. 2008).

These studies examining student performance have generally focused on mathematical and quantitative reasoning (e.g., ACT score) and general measures of student performance (e.g., course grade and GPA), as these metrics are readily available. However, these studies are limited in at least three important ways. First, they mainly predict students' end-of-course performance from prior predictors: they fail to consider what factors may be associated with students' change in statistical understanding throughout the course. For example, are students who end the course at a high level of conceptual understanding simply doing so because they started with a high level of understanding? Second, in these studies student performance is generally measured using course grades rather than valid and reliable measures of students' conceptual understanding of statistics. Finally, these studies primarily examine student performance in a single course and do not draw comparisons across curricula. In particular, given the recent gain in popularity of simulation-based inference (SBI) as an approach for teaching introductory statistics (Tintle et al. 2015; Tintle et al. 2011), an important and unaddressed question is whether student abilities and background are associated with student performance differently in SBI curricula versus traditional curricula.

In this article, we address these limitations by focusing primarily on how students' conceptual understanding of statistics changes from the beginning of the course to the end, using a valid and reliable instrument. First, we explore how students' growth in conceptual understanding may differ depending on students' mathematical or statistical abilities when they enter the course, as measured by ACT score, a statistics concepts pretest, or college GPA. For two institutions, we have data both before and after a switch to an early-SBI curriculum, and we compare student performance both across ability/preparation groups and across curricula. We examine overall performance as well as performance within different subscales. Second, for a larger set of institutions using an SBI curriculum, we evaluate whether students at all ability/preparation levels show similar improvement in conceptual understanding overall and within subscales, in light of recent results showing improved conceptual performance among students in simulation-based courses (Chance and McGaughey 2014; Tintle et al. 2014; Tintle et al. 2012; Tintle et al. 2011).

2. Methods

In our analysis, we considered students using three different curricula: (1) students using a traditional (consensus curriculum) Stat 101 textbook (denoted “consensus”), (2) students using a preliminary version of a simulation-based inference curriculum (denoted “early-SBI”), and (3) students using a full version of a simulation-based inference curriculum (denoted “SBI”). The following three sections briefly summarize each curriculum, and Section 2.4 explains which students and assessments were used with each curriculum.

2.1. Consensus Curriculum (Consensus)

For more than a decade, Stat 101 has had a generally accepted consensus curriculum focused on the normal distribution and its derivatives, such as the t-distribution, for conducting statistical inference (Malone et al. 2010; Scheaffer 1997). Students at two institutions (both small, Midwestern liberal arts colleges) used textbooks following a consensus curriculum in 2007 and spring 2011 before switching to SBI; see Tintle et al. (2014, 2012, 2011) for additional details on this curriculum and its implementation.

2.2. Early Simulation-Based Inference Curriculum (Early-SBI)

The same two institutions, which used the consensus curriculum, switched to an early version of an SBI curriculum in subsequent years (2009 and the 2011–2012 academic year, respectively). In this curriculum, unlike the consensus curriculum, formal statistical inference is introduced early in the course, motivated through tactile and computer-aided simulations rather than formal, mathematical representations of sampling distributions. Notably, student profiles were similar before and after the switch to the early-SBI curriculum, and a subset of instructors taught the course both before and after the switch. See other manuscripts for a full description of the early version of this curriculum, its implementation, and other student and instructor details (Tintle et al. 2013; Tintle et al. 2012, 2011).

2.3. Simulation-Based Inference Curriculum (SBI)

Significant revisions were made to the early-SBI curriculum in recent years, primarily with regard to the ordering of topics, while maintaining the focus on SBI. For example, in the early-SBI curriculum simulation-based methods were covered in the first half of the course and theory-based methods in the latter half, while in the new approach chapters are organized around data contexts, meaning the simulation-based and theory-based methods are presented back-to-back for each type of data analysis. This curriculum was used at numerous institutions in the 2013–2014 academic year and was recently published in its first edition, which is only very modestly different from the version used in 2013–2014 (Tintle et al. 2016).

2.4. Samples and Assessments

The samples and assessments used in this analysis have been described in detail elsewhere (consensus and early-SBI: Tintle et al. 2014; Tintle et al. 2012, 2011; SBI: Chance, Wong, and Tintle, n.d.); we provide a brief overview here. The consensus curriculum sample consists of 289 students from two institutions, covering two separate semesters and multiple instructor-sections of approximately 30 students per instructor-section. All students completed the Comprehensive Assessment of Outcomes in Statistics (CAOS; delMas et al. 2007) during the first week of classes and again during the final week of class. Response rates were over 85% per section. The early-SBI sample consists of 366 students from the same two institutions as the consensus sample, with similar demographic characteristics, collected in the semesters shortly after the consensus sample data were obtained. The assessment was taken online and outside of class, with a small incentive (homework points) for completion, but not for performance.

The SBI sample consists of 1078 students from 34 instructor-sections across 13 institutions, including one community college, one private university, two high school AP Statistics courses, four private liberal arts colleges, and five large public universities. These instructors administered a modified version of the CAOS test, which replaced some questions that most students tend to answer correctly or incorrectly with alternative formulations or alternative questions altogether. See Chance et al. (2017) for details. Test administration varied somewhat between sections, but generally followed the same pattern as described for the consensus and early-SBI samples. All data collection was approved by the ABC College Institutional Review Board. Composite ACT scores were gathered from the institutional research office for a single institution, while college GPA was gathered via self-report from all students in the SBI sample.

2.5. Statistical Analysis

Statistical analyses examine pre- to postcourse changes in test performance, either on the test as a whole or on subscales of the test covering similar topics (see Tintle et al. 2011 for details). In general, analyses examine either the statistical significance of changes from pretest to posttest (paired t-tests) or whether change scores (posttest minus pretest) differ significantly across subgroups (e.g., SBI vs. non-SBI curricula). In linear models testing for differences in change scores across the consensus and early-SBI curricula, institution is adjusted for as a covariate (change (posttest minus pretest) = curriculum + institution). Model conditions (e.g., normality) were assessed and deemed to be sufficiently met by the data. A significance level of 0.05 is used throughout.
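To make the analysis concrete, the following is a minimal sketch of these two analyses in Python, assuming a hypothetical data file with columns pretest, posttest, curriculum, and institution; it mirrors the paired t-test and the change-score model described above, and is not the authors' actual code.

```python
# Minimal sketch of the analyses described above (not the authors' actual code).
# Assumes a CSV with hypothetical columns:
#   pretest, posttest  - percent correct on CAOS (0-100)
#   curriculum         - "consensus" or "early_SBI"
#   institution        - institution identifier
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_csv("caos_scores.csv")          # hypothetical file name
df["change"] = df["posttest"] - df["pretest"]

# Paired t-test of pre- vs. postcourse scores within one curriculum
sbi = df[df["curriculum"] == "early_SBI"]
t_stat, p_val = stats.ttest_rel(sbi["posttest"], sbi["pretest"])
print(f"paired t = {t_stat:.2f}, p = {p_val:.4f}")

# Linear model for change scores, adjusting for institution:
#   change = curriculum + institution
model = smf.ols("change ~ C(curriculum) + C(institution)", data=df).fit()
print(model.summary())
```

The second model corresponds directly to the "change = curriculum + institution" specification above, so the coefficient on curriculum estimates the curricular difference in improvement after adjusting for institution.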

3. Results

3.1. Consensus Curriculum vs. Early-SBI: Overall Results

On average, students in the early-SBI curriculum showed similar performance on the CAOS pretest to students taking the consensus curriculum (consensus mean: 47.7% vs. early-SBI mean: 44.6%) when data were combined across both institutions. The distribution of pretest scores approximately followed that of nationally representative samples (delMas et al. 2007). We separated students into three groups of approximately equal size (tertile split) based on pretest scores: 40% or fewer correct (Low performance), between 40% and 50% correct (Middle performance), or 50% or more correct (High performance). Table 1 provides the full details of the distribution of pretest and posttest scores for the two curricula, and Figure 1 provides a visual representation of these data.
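As an illustration only (not the authors' code), the tertile-style grouping and the per-group summaries reported in Table 1 could be computed along these lines, again assuming the hypothetical column names introduced earlier.

```python
# Illustrative sketch of the pretest grouping described above; file and
# column names are hypothetical, and boundary handling is approximate.
import pandas as pd

df = pd.read_csv("caos_scores.csv")
df["change"] = df["posttest"] - df["pretest"]

# Cut points from the text: <=40% (Low), 40-50% (Middle), >50% (High)
df["pretest_group"] = pd.cut(df["pretest"],
                             bins=[0, 40, 50, 100],
                             labels=["Low", "Middle", "High"],
                             include_lowest=True)

# Mean pre/post scores and mean change by group and curriculum (cf. Table 1)
summary = (df.groupby(["pretest_group", "curriculum"], observed=True)
             [["pretest", "posttest", "change"]]
             .mean()
             .round(1))
print(summary)
```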

Table 1. Pre- and postcourse CAOS scores stratified by precourse performance and curriculum.

Figure 1. Student performance on CAOS stratified by pretest score and curriculum.


Table 1 shows that all three precourse performance groups performed better using the early-SBI curriculum (by between 2 and 3 percentage points), but only for the middle performing group was the difference statistically significant (p = 0.046) after adjusting for institutional differences.

Table 2 shows results when students were stratified by their ACT scores (again, using an approximate tertile split for this sample). As in Table 1, the early-SBI curriculum shows improved performance in all three groups compared to the consensus curriculum; in this case, however, the results were significant in each of the three groups (mean changes of 8.2 percentage points (p = 0.006), 4.7 percentage points (p = 0.028), and 6.0 percentage points (p = 0.014) for the low, middle, and high groups, respectively). Figure 2 summarizes these data visually, illustrating gains for all students, but larger gains for the early-SBI students across ACT score groups.

Table 2. Pre- and postcourse CAOS scores stratified by ACT score and curriculum.

Figure 2. Student performance on CAOS stratified by ACT score and curriculum.


3.2. Consensus Curriculum vs. Early-SBI: Results by Subscale

Table 3 shows the change in percentage correct from pretest to posttest for each of the nine individual subscales of the CAOS test (delMas et al. 2007), by curriculum, for students in the lowest precourse performance group (40% or fewer questions correct on the pretest). Across the nine subscales of the CAOS test, three showed significant improvement: data collection and design, tests of significance, and probability. After adjusting for institution, these lower performing students showed approximately an 8.4 percentage point improvement on tests of significance (p = 0.03), a 9.4 percentage point improvement on data collection and design issues (p = 0.02), and a 15.8 percentage point improvement on probability topics (p = 0.005). Data collection and design showed a similar positive impact of the early-SBI curriculum for students with medium (40–50%) and high (50%+) performance on the pretest (11.5 point curricular impact (p = 0.01) and 7.7 point curricular impact (p = 0.06), respectively), though the improvement was only statistically significant for the medium group and was borderline significant for the high performance group. Probability and simulation likewise showed a statistically significant curricular impact only for the medium performing group (14.4 point impact; p = 0.03), with the highest group showing a 2.5 point improvement from the early-SBI curriculum, a nonsignificant difference (p = 0.60). Tests of significance also showed more improvement for early-SBI students relative to the consensus curriculum (6.5 points (p = 0.09) and 8.2 points (p = 0.02) for medium and high students, respectively), though this was only approaching significance for the medium students. On the descriptive statistics subscale, medium students using the early-SBI curriculum showed significantly less improvement than those using the consensus curriculum (10 point decline; p = 0.01). Appendix Tables A and B give full results for students in the Middle and High pretest performance groups (the Appendix is available in the online supplemental files).

Table 3. Pre- and postcourse CAOS scores by subscale and curriculum—low performing students.

3.3. SBI Curriculum Results: Overall and by Subscale

Table 4 shows the overall performance of SBI students on the modified CAOS across multiple institutions when stratified by pretest score or by self-reported GPA. As was done earlier, pretest concept score groups were created by breaking the sample into three approximately equal-size groups (tertiles) based on percentage correct (Low: 40% or less correct, n = 291; Middle: between 40% and 55% correct, n = 422; High: 55% or better, n = 365). College GPA groups consisted of students with GPAs of B or worse (Low: less than or equal to 3.0; n = 193), GPAs between B+ and A− (Middle: between 3.0 and 3.7; n = 654), and GPAs in the A range (High: 3.7 or above; n = 231).

Table 4. Pre- and postcourse concept scores stratified by precourse performance and GPA among SBI students in 2013–2014 (n = 1078).

All subgroups show significant improvement. Improvement is largest for the least well-prepared group when stratifying by pretest score (Low (15.2) vs. Middle (8.1): p < 0.001; Low (15.2) vs. High (4.0): p < 0.001), and largest for students with the highest GPAs when stratifying by GPA (Low (7.3) vs. Middle (8.1): p = 0.41; Low (7.3) vs. High (11.1): p = 0.002).
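The subgroup comparisons above amount to comparing mean change scores between groups. A minimal sketch of such a comparison follows, using the hypothetical columns introduced earlier; this is one plausible implementation, not necessarily the exact test the authors used.

```python
# Illustrative comparison of mean change scores between two pretest groups
# (e.g., Low vs. High); the data file and column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("sbi_scores.csv")
df["change"] = df["posttest"] - df["pretest"]

low = df.loc[df["pretest_group"] == "Low", "change"]
high = df.loc[df["pretest_group"] == "High", "change"]

# Two-sample t-test on change scores between the two groups
t_stat, p_val = stats.ttest_ind(low, high)
print(f"Low mean change = {low.mean():.1f}, "
      f"High mean change = {high.mean():.1f}, p = {p_val:.4f}")
```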

Table 5 shows improvement scores within groups of similar conceptual questions. For the group of students with the lowest pretest scores, significant improvement is seen within all seven conceptual areas. When stratifying by GPA, four of the seven groups of questions showed significant improvement for students with the lowest GPAs.

Table 5. Pre- and postcourse conceptual understanding by subscale—lower performing students (based on either pretest group or GPA) in 2013–2014.

Similar patterns are observed for students in typical and more well-prepared groups as well (see Appendix Tables C and D). In particular, the middle performing group showed improvement on 5 of 7 scales when stratifying by either pretest or GPA, while the highest group showed improvement on 3 of 7 scales when stratifying by pretest and 7 of 7 when stratifying by GPA.

4. Discussion

Early implementations of SBI curricula for Stat 101 have shown promising results, with better postcourse performance on conceptual tests and better retention of that understanding after the course. When stratifying students based on precourse conceptual test performance, ACT score, or self-reported GPA, all groups of students showed significant improvement in postcourse understanding, with some groups showing significantly greater improvement with SBI than with the consensus curriculum. Improvement among students using an early-SBI curriculum was greatest in areas related to data collection and design, tests of significance, and probability/simulation, all areas emphasized by the curriculum. A later version of the curriculum showed significant improvement across all topics, with the largest improvements in tests of significance, confidence intervals, and probability/simulation.

In particular, we have demonstrated that the SBI curriculum considered here appears to work well for students across a range of preparation levels, regardless of prior statistical abilities, mathematical abilities, or general academic performance (college GPA). This provides a strong case that the accessible, tactile, and conceptual approach at the core of SBI is helping to level the playing field in introductory statistics courses, rather than simply making the top students better or benefiting only weaker students. Minor decreases in performance in descriptive statistics were noted in early versions of the SBI curriculum (Tintle et al. 2011), but were not observed in assessments of more recent SBI curricula (Chance et al. 2017; Tintle et al. 2014).

Some limitations of our analysis are worth noting. First, the later version of the SBI curriculum was not compared against a non-SBI curriculum, limiting the conclusions that can be drawn from the analysis. Results generally show similar trends as in the early version of the curriculum, but further work is necessary to make comparisons across student preparation levels and curricula. Second, tests were taken outside of class and for completion credit. Future work is needed to assess the impact of environment or different incentives on student performance on conceptual assessments in introductory statistics. Third, limited information was available on courses students might have taken prior to the course (e.g., calculus, AP Statistics). Future work is needed to capture this information and assess its ability to better explain student performance in Stat 101 courses. Finally, recent explorations of student performance by our group on larger datasets (Chance, Wong, and Tintle 2017) have used more sophisticated modeling approaches (e.g., hierarchical models) to account for a variety of additional instructor and institutional variables, and such approaches should be considered in future work.

It is worth noting that although, in places, we focused on pretest performance to stratify the sample, we recognize the impact of regression to the mean when comparing pretest to posttest scores. With this in mind, we looked at alternative stratification approaches, including ACT score and self-reported college GPA. Broad results are generally similar, showing that students across preparation levels improve in statistical thinking and reasoning. However, further work could consider different prior tests of statistical reasoning to classify students, as well as other demographic variables (e.g., race, socioeconomic status) and their association with student performance in simulation-based inference curricula. Further, we note that using tertiles to split students based on ACT score yielded only 33% of the sample in the lowest ACT group (ACT < 22), whereas in national samples 63% of students have ACT scores below 22 (ACT 2017). Additional exploration of samples, especially among academically underperforming students, is warranted. Finally, another important area for further work is the consideration of student attitudes, especially among weaker students, and how these attitudes affect students' growth in conceptual understanding of statistical concepts.


Funding

National Science Foundation DUE-1323210.

References

  • ACT. (2017), https://www.act.org/content/dam/act/unsecured/documents/NormsChartMCandComposite-Web2015-16.pdf. Accessed August 2017.
  • Chance, B., and McGaughey, K. (2014), “Impact of a Simulation/Randomization-Based Curriculum on Student Understanding of p-Values and Confidence Intervals,” Proceedings of the 9th International Conference on Teaching Statistics.
  • Chance, B., Wong, J., and Tintle, N. (2017), “Student Performance in Curricula Centered on Simulation-Based Inference: A Preliminary Report,” Journal of Statistics Education.
  • Cherney, I. D., and Cooney, R. R. (2005), “The Mathematics and Statistics Perception Scale,” Transactions of the Nebraska Academy of Sciences, 30, 1–8.
  • delMas, R., Garfield, J., Ooms, A., and Chance, B. (2007), “Assessing Students' Conceptual Understanding After a First Course in Statistics,” Statistics Education Research Journal, 6(2), 28–58.
  • Dupuis, D. N., Medhanie, A., Harwell, M., Lebeau, B., Monson, D., and Post, T. R. (2011), “A Multi-Institutional Study of the Relationship Between High School Mathematics Achievement and Performance in Introductory College Statistics,” Statistics Education Research Journal, 11(1), 4–20.
  • Gnaldi, M. (2006), “The Relationship Between Poor Numerical Abilities and Subsequent Difficulty in Accumulating Statistical Knowledge,” Teaching Statistics, 28(2), 49–53.
  • Green, J. J., Stone, C. C., Zegeye, A., and Charles, T. A. (2009), “How Much Math Do Students Need to Succeed in Business and Economics Statistics? An Ordered Probit Analysis,” Journal of Statistics Education, 17(3).
  • Johnson, M., and Kuennen, E. (2006), “Basic Math Skills and Performance in an Introductory Statistics Course,” Journal of Statistics Education, 14(2).
  • Lester, D. (2007), “Predicting Performance in a Psychological Statistics Course,” Psychological Reports, 101, 334.
  • Li, K., Uvah, J., and Amin, R. (2012), “Predicting Students' Performance in Elements of Statistics,” US-China Review, 10, 875–884. Retrieved from http://files.eric.ed.gov/fulltext/ED537981.pdf.
  • Malone, C., Gabrosek, J., Curtiss, P., and Race, M. (2010), “Resequencing Topics in an Introductory Applied Statistics Course,” The American Statistician, 64(1), 52–58.
  • Rochelle, C. F., and Dotterweich, D. (2007), “Student Success in Business Statistics,” Journal of Economics and Finance Education, 6(1).
  • Scheaffer, R. (1997), “Discussion to New Pedagogy and New Content: The Case of Statistics,” International Statistical Review, 65(2), 156–158.
  • Silvia, G., Matteo, C., Francesca, C., and Caterina, P. (2008), “Who Failed the Introductory Statistics Examination? A Study on a Sample of Psychology Students,” International Conference on Mathematics Education.
  • Tintle, N., Chance, B., Cobb, G., Rossman, A., Roy, S., Swanson, T., and VanderStoep, J. (2016), Introduction to Statistical Investigations (1st ed.), Hoboken, NJ: Wiley.
  • Tintle, N. L., Chance, B., Cobb, G., Rossman, A., Roy, S., Swanson, T., and VanderStoep, J. (2013), “Challenging the State of the Art in Post-Introductory Statistics: Preparation, Concepts and Pedagogy,” Proceedings of the 59th ISI World Statistics Congress, (Session IPS032), 295–300.
  • Tintle, N. L., Chance, B. L., Cobb, G. W., Roy, S., Swanson, T., and VanderStoep, J. (2015), “Combating Anti-Statistical Thinking Using Simulation-Based Methods Throughout the Undergraduate Curriculum,” The American Statistician, 69, 362–370.
  • Tintle, N. L., Rogers, A., Chance, B., Cobb, G., Rossman, A., Roy, S., … VanderStoep, J. (2014), “Quantitative Evidence for the Use of Simulation and Randomization in the Introductory Statistics Course,” Proceedings of the 9th International Conference on Teaching Statistics.
  • Tintle, N., Topliff, K., VanderStoep, J., Holmes, V., and Swanson, T. (2012), “Retention of Statistical Concepts in a Preliminary Randomization-Based Introductory Statistics Curriculum,” Statistics Education Research Journal, 11(1), 21–40.
  • Tintle, N., VanderStoep, J., Holmes, V., Quisenberry, B., and Swanson, T. (2011), “Development and Assessment of a Preliminary Randomization-Based Introductory Statistics Curriculum,” Journal of Statistics Education, 19(1), 1–25.
  • Wang, J.-T., Tu, S.-Y., and Shieh, Y.-Y. (2007), “A Study on Student Performance in the College Introductory Statistics Course,” AMATYC Review, 29(1), 54–62.

Appendix

Table A. Pre- and postcourse conceptual understanding by subscale—middle-performing students.

Table B. Pre- and postcourse conceptual understanding by subscale—high-performing students.

Table C. Pre- and postcourse conceptual understanding by subscale—middle-performing students (based on either pretest group or GPA) in 2013–2014.

Table D. Pre- and postcourse conceptual understanding by subscale—high-performing students (based on either pretest group or GPA) in 2013–2014.