Theory, Contexts, and Mechanisms

Heterogeneity in Mathematics Intervention Effects: Evidence from a Meta-Analysis of 191 Randomized Experiments

Pages 584-634 | Received 11 Sep 2020, Accepted 07 Oct 2021, Published online: 21 Jan 2022
 

Abstract

Since the standards-based education movement began in the early 1990s, mathematics education reformers have developed and evaluated many interventions to support students in mastering more rigorous content. We conducted a systematic review and meta-analysis of U.S. PreK-12 mathematics intervention effects from 1991 to 2017 to study sources of heterogeneity. From more than 9,000 published and unpublished study reports, we found 191 randomized controlled trials that met our inclusion criteria, with 1,109 effect size estimates representing more than a quarter of a million students. The average effect size on student mathematics achievement was 0.31, with wide heterogeneity: most effects ranged from −0.60 to 1.23. Two modeling approaches, meta-regression and machine learning, provided converging evidence that outcome measure type (researcher-created vs. standardized) and technology delivery (vs. teacher or interventionist delivery) were predictors of effect size. Intervention type, intervention length, grade level, and publication year were also identified as potentially explanatory factors.

Notes

1 N = 1 confounds occur when the intervention or comparison group contains only one study unit.

2 Our broader project coded 282 RCT studies, but this manuscript’s analyses focus on the 191 studies with at least one business-as-usual (BAU) control group. We excluded 91 studies that had only alternative treatment comparisons (and no BAU group) due to the complexity and lack of clear methodological guidance on analyzing such studies (e.g., the effect size could be positive or negative if the choice of the “main” intervention is not clear).

3 We recognize that blended or hybrid intervention types may be important and missing parts of this broad typology and we reflect on this issue in the limitations section.

4 Appropriate summary statistics were not always available to calculate SMDs for all effects. We queried authors for the missing information, which yielded some success in obtaining the necessary data to calculate SMDs. Our response rate for queries was 42%.

5 The estimated prediction intervals were based on a standard normal distribution: PI = g ± 1.96τ, where g is the estimated average effect and τ is the estimated between-effect standard deviation.
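
As a rough illustration of the formula in note 5, the sketch below computes the interval in Python. The function name `prediction_interval` is hypothetical and not taken from the article. The inputs 0.31 and 0.47 are the mean and standard deviation stated in note 9; because those values are rounded, the result only approximates the range reported in the abstract.

```python
def prediction_interval(g, tau, z=1.96):
    """Return (lower, upper) for PI = g ± z * tau.

    g   : estimated average effect
    tau : estimated between-effect standard deviation
    z   : standard normal critical value (1.96 for a 95% interval)
    """
    half_width = z * tau
    return g - half_width, g + half_width


# Rounded inputs from note 9 (assumed here purely for illustration).
lower, upper = prediction_interval(0.31, 0.47)
print(f"{lower:.2f} to {upper:.2f}")
```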

6 We also conducted exploratory analyses using the Meta-CART package (Li et al., 2020), but the results indicated worse predictive performance than even standard linear meta-regression models.

7 The primary intervention type was always coded; thus, if a curriculum intervention also included some pedagogical strategies, it was only coded as a curriculum intervention.

8 Studies often included schools from more than one locale setting (e.g., urban and suburban); thus, these percentages sum to greater than 100%.

9 This estimate assumed that the effect distribution is normally distributed with a mean of 0.31 and a standard deviation of 0.47 (see Mathur & VanderWeele, 2019). We used cluster bootstrap sampling at the study level, not the effect size level, to compute the confidence intervals (see the Methods section for further detail).
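
To make the study-level resampling in note 9 concrete, here is a minimal sketch of a cluster bootstrap with a percentile interval. It is not the authors' procedure: the function name `cluster_bootstrap_ci`, the toy effect sizes, the unweighted mean used as the statistic, and the number of replicates are all assumptions chosen for illustration. The one property it is meant to show is that whole studies are resampled with replacement, so every effect size from a sampled study stays together.

```python
import random
from statistics import mean


def cluster_bootstrap_ci(effects_by_study, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling studies, not effect sizes."""
    rng = random.Random(seed)
    study_ids = list(effects_by_study)
    estimates = []
    for _ in range(n_boot):
        # Draw as many studies as in the data, with replacement.
        sampled = rng.choices(study_ids, k=len(study_ids))
        # Keep every effect size belonging to each sampled study.
        pooled = [e for sid in sampled for e in effects_by_study[sid]]
        estimates.append(statistic(pooled))
    estimates.sort()
    lower_idx = int((alpha / 2) * n_boot)
    upper_idx = int((1 - alpha / 2) * n_boot) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical toy data: study id -> effect sizes reported by that study.
toy_effects = {
    "study_a": [0.10, 0.25],
    "study_b": [0.40],
    "study_c": [0.05, 0.30, 0.55],
}
ci_low, ci_high = cluster_bootstrap_ci(toy_effects, mean)
print(ci_low, ci_high)
```

Resampling at the study level respects the dependence among effect sizes nested in the same study, which resampling individual effect sizes would ignore.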

10 Assumed correlation reflects whether a correlation was imputed in calculating the effect size, which applies to three scenarios: (1) standard deviations for pre-post gain scores were reported, which had to be corrected; (2) the effect size was based on an ANCOVA F-test statistic, but the model R² value was not reported; and (3) the effect size was based on an unstandardized regression coefficient and its standard error, but the posttest standard deviation was not reported. In total, this designation applied to 10 of 1,109 effect sizes (1%).

11 Outcome-intervention alignment was operationalized as the proportion of overlap between the outcome domains covered in the outcome measure and the content covered in the intervention. For example, if a study used an outcome measure that measured number sense and basic operations, but the intervention focused only on number sense, the alignment score would be 0.50. If the intervention had focused on both number sense and basic operations, the alignment score would have been 1.0. If the intervention had not focused on either number sense or basic operations, the alignment score would have been 0.0.
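
The three cases in note 11 can be reproduced with a short sketch. It assumes that each domain counts equally and that the denominator is the number of domains in the outcome measure, which matches the worked example in the note but is not spelled out as a general rule there. The function name `alignment_score` and the "geometry" domain in the last case are hypothetical.

```python
def alignment_score(outcome_domains, intervention_domains):
    """Share of the outcome measure's domains that the intervention also covers."""
    outcome = set(outcome_domains)
    if not outcome:
        raise ValueError("the outcome measure must cover at least one domain")
    covered = outcome & set(intervention_domains)
    return len(covered) / len(outcome)


outcome = ["number sense", "basic operations"]

partial = alignment_score(outcome, ["number sense"])                   # 0.5
full = alignment_score(outcome, ["number sense", "basic operations"])  # 1.0
none = alignment_score(outcome, ["geometry"])                          # 0.0
print(partial, full, none)
```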

Additional information

Funding

This study was supported by the U.S. Institute of Education Sciences (IES) under grant [R305A170146]. Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily represent the views of the IES.
