Measurement, Statistics, and Research Design

Time-Indexed Effect Size for Educational Research and Evaluation: Reinterpreting Program Effects and Achievement Gaps in K–12 Reading and Math

Pages 193-213 | Published online: 03 Jan 2018
 

ABSTRACT

Through a synthesis of test publisher norms and national longitudinal data sets, this study provides new national norms of academic growth in K–12 reading and math to help reinterpret conventional effect sizes in time units. We propose a time-indexed effect-size metric that estimates how long it would take an “untreated” control group to reach the treatment group outcome, expressed in terms familiar to educators: years and months of schooling. The metric supplements conventional effect-size metrics, such as Cohen's d, by taking into account the different amounts of time needed for learning at different ages or grade levels. Through applications to Project STAR small-class effects and NAEP racial achievement gaps, we demonstrate how to interpret and use the metric. It is expected to provide a more developmentally appropriate context for interpreting the size of an effect, a step toward bridging the gap between educational research and practice.
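The core idea of expressing an effect in time units can be illustrated with a minimal sketch. It assumes that growth is constant over the interval and that the annual growth norm is given in the same standard-deviation units as the conventional effect size. The function name, the 9-month school year, and the example values are hypothetical; this is not the authors' estimation procedure.

```python
def time_indexed_effect(d, annual_growth_sd, months_per_school_year=9):
    """Convert a standardized effect size into schooling time.

    d                      -- conventional effect size (e.g., Cohen's d)
    annual_growth_sd       -- assumed annual growth of the untreated group,
                              in the same SD units as d (hypothetical input)
    months_per_school_year -- assumed length of a school year in months
    """
    if annual_growth_sd <= 0:
        # With zero or negative growth the control group never catches up,
        # so the conversion is undefined.
        raise ValueError("annual growth must be positive")
    years = d / annual_growth_sd
    return years, years * months_per_school_year


# Hypothetical example: an effect of 0.20 SD where typical growth is 0.40 SD per year.
years, months = time_indexed_effect(0.20, 0.40)
print(f"{years:.2f} school years ({months:.1f} months)")
```

Under this reading, the same effect size translates into more schooling time wherever annual growth is slower, and the result becomes unstable as the growth rate approaches zero.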

Funding

This study was supported by a US Department of Education research grant (R305D090021).

Notes

1. The full linear growth model can be estimated via hierarchical linear modeling (HLM) as follows (see Raudenbush & Bryk, 2002):

Level 1 Model (Time Level):

$Y_{ti} = p_{0i} + p_{1i}(\text{Time})_{ti} + e_{ti}$,

where (Time) represents elapsed time since the first data collection point.

Level 2 Model (Person Level):

$p_{0i} = b_{00} + b_{01}(\text{Treatment})_i + r_{0i}$

$p_{1i} = b_{10} + b_{11}(\text{Treatment})_i + r_{1i}$,

where (Treatment) represents treatment group dummy codes, with control as the referent.

$b_{00}$ and $b_{10}$ represent the expected intercept and slope for the control group; the coefficients for (Treatment), $b_{01}$ and $b_{11}$, indicate deflections from these estimates for the intervention group.

The linearity assumption works for typical school intervention studies, which have no more than three repeated measures of an outcome in each grade (e.g., baseline, midway, and termination). It may be neither realistic nor necessary for studies with more frequent measurements, such as multiwave curriculum-based measurement (CBM), which allows more sensitive detection of small changes over a short period and more flexible HLM modeling of intervention effects based on students’ growth trajectories (see Shin et al., 2004).
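Substituting the Level 2 equations into the Level 1 equation gives the combined form of the linear growth model. This is a routine algebraic step rather than something stated in the note. It shows that the coefficient on the Treatment-by-Time product, b11, is the treatment effect on the growth rate, while b01 is the group difference at the first data collection point:

```latex
Y_{ti} = b_{00} + b_{01}(\text{Treatment})_i + b_{10}(\text{Time})_{ti}
       + b_{11}(\text{Treatment})_i(\text{Time})_{ti}
       + r_{0i} + r_{1i}(\text{Time})_{ti} + e_{ti}
```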

2. IRT-based vertical equating procedures assume sufficient test overlap between adjacent test levels and a combined norming sample of students from multiple grades who take the tests.

3. The ECLS-K and NELS 8th-grade tests are closely aligned with each other, as both adopted similar assessment frameworks and test items (Najarian, Pollack, & Sorongon, 2009).

4. Since sampling designs were similar across the three tests, only sample-size differences were considered for weighting (see Appendix). Our sensitivity analysis showed that the synthesis results without differential weights were very similar. The growth norms were highly convergent among the three tests, with all pairwise correlation coefficients at or above .99.

5. The 12th-grade Black-White gaps in school-year units are extremely large and may have been exaggerated if the 12th-grade students’ true academic-growth rates were underestimated due to possible deterioration of test-taking motivation at the end of high school. The validity of these interpretations remains questionable and needs further investigation in light of the entire growth trajectory; the near-zero 12th-grade growth rate is an unexpected deviation from the slow but steady pattern of achievement gains observed during the lower grades in high school.

6. Bloom et al. (2008) also used some longitudinal data to show the same patterns as they found with the cross-sectional data; the magnitudes of differences between the two types of effect-size estimates were typically small (less than 0.10 of a standard deviation). Since their samples for this comparison were drawn from two urban school districts, there is a need for using a nationally representative sample to compare cross-sectional and longitudinal results and verify the generalizability of their finding.

7. Enacted by state law in 1995, Wisconsin's Student Achievement Guarantee in Education (SAGE) program began as a 5-year pilot program in the 1996–97 school year to test the hypothesis that smaller classes in elementary schools raise the academic achievement of disadvantaged students.

8. Weighted means can be obtained by using within-study sample-size information ($n_1$ and $n_2$ for treatment and comparison groups) and weighting each study's effect-size estimate by the inverse of its variance. The variance estimate for d can be approximated as follows (see Cooper & Hedges, 1994):

$$\mathrm{Var}(d) = \left[\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2 - 2)}\right]\left[\frac{n_1 + n_2}{n_1 + n_2 - 2}\right].$$
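The inverse-variance weighting described in this note can be sketched as follows. The sketch reads the variance formula as [(n1 + n2)/(n1 n2) + d²/(2(n1 + n2 − 2))] × [(n1 + n2)/(n1 + n2 − 2)]; the placement of the "− 2" terms is an assumption about the original typesetting. The function names and the example studies are hypothetical.

```python
def effect_size_variance(d, n1, n2):
    """Approximate sampling variance of a standardized mean difference d.

    Assumes the formula
    [(n1+n2)/(n1*n2) + d^2 / (2*(n1+n2-2))] * [(n1+n2)/(n1+n2-2)].
    """
    df = n1 + n2 - 2
    if df <= 0:
        raise ValueError("need n1 + n2 > 2")
    return ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * df)) * ((n1 + n2) / df)


def weighted_mean_effect(studies):
    """Inverse-variance weighted mean of effect sizes.

    studies -- iterable of (d, n1, n2) tuples, one per study
    """
    studies = list(studies)
    weights = [1.0 / effect_size_variance(d, n1, n2) for d, n1, n2 in studies]
    weighted_sum = sum(w * d for w, (d, _, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical studies: (effect size, treatment n, comparison n)
example = [(0.10, 40, 40), (0.30, 200, 200)]
print(round(weighted_mean_effect(example), 3))
```

Because larger studies have smaller variances, they receive larger weights, so the pooled estimate in the example lies closer to the effect from the larger study.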

9. MAT, TN, and SAT used an equating-of-levels program in which students took two adjacent levels of the tests. MAT 8th edition comprises a battery of 14 overlapping test levels (Harcourt, 2002). TN 2nd edition comprises a battery of 12 overlapping test levels (CTB/McGraw-Hill, 2003). SAT 10th edition comprises a battery of 13 overlapping test levels (Harcourt, 2004). ECLS-K and NELS:88 used equating based on a common set of anchor items across adjacent grade forms, with most content areas represented in all grade forms (Najarian et al., 2009; Pollack et al., 2005; Rock et al., 1995).

10. Forty-seven percent of the STAR kindergarten sample attended rural schools. Approximately 48 percent of the STAR students qualified for free or reduced-price lunch, compared to approximately 29 percent of public school students nationally (in 1987–1988). Approximately 33 percent of the STAR sample were minority students, of whom 98 percent were Black. In contrast, 22 percent of the entire SAT7 spring standardization sample were minority students, of whom 55 percent were Black (Psychological Corporation, 1985). Similarly, in our longitudinal sample (ECLS-K), 34 percent were minority students, of whom 47 percent were Black.
