ABSTRACT
We investigate the impact of a relatively brief cross-curricular intervention, Word Generation, on middle school students' learning of taught academic vocabulary. Students (n = 8,382) in forty-four middle schools in three urban districts were randomly assigned to treatment or control conditions. Treatment teachers implemented the program with minimal support and varying levels of commitment. Students in treatment schools scored almost a point higher on the curriculum-based vocabulary posttests than those in control schools (Hedges's g = 0.094, p < 0.05). Although there was no main treatment effect on the standardized measures of students' general vocabulary knowledge or reading comprehension, baseline-by-treatment interactions at the school and student levels attenuated the Matthew Effect in reading and vocabulary growth.
Funding
This work was supported by the Institute of Education Sciences [Grant number R305A090555].
Notes
1 Not all schools participated in the ways that had been agreed upon with the district leaders. School 32 participated as both a control and a Word Generation school, with eighth graders in the control condition and sixth graders in the treatment condition. School 37 was assigned to treatment but did not implement the program at all. Two other schools assigned to control (24 and 36) and one other school assigned to treatment (33) dropped out of the study and did not provide data.
2 To be sure that missing data did not unduly influence our results, we replicated descriptive tables and basic models with multiply imputed (MI) data sets created using the multivariate normal model (Little & Rubin, 2002). We imputed pretest and posttest scores for each student with missing data on any of the three achievement measures, using all of the achievement data from that student, information about district and grade level, and demographic information about the student's school (PERCENT_FARM, PERCENT_ELL). Despite important developments in the field with respect to conducting MI in multilevel contexts, we were not confident in models that imputed school-level or teaching-team-level data, so we fit basic models. Each model used the mi estimate command with xtmix in Stata on the MI data sets, predicting outcome measures from each pretest and controlling for grade level (school was the grouping variable). The fully imputed data set included 11,015 students in each model. These models looked very similar to those we have presented here. The coefficients for the variable associated with Word Generation participation were TREAT (WG vocabulary) = 0.972, p = 0.15; TREAT (general vocabulary) = 1.04, n.s.; and TREAT (reading comprehension) = 0.147, n.s. These estimates do not properly account for student nesting, so we have more confidence in the full HLM models we present; however, these MI models align with and complement our other models, and we take them to suggest that our results are not unduly influenced by nonrandom missingness.
3 This resulted in 11 very small teams being eliminated from the analysis (including teams with only two or three students who contributed pretest and posttest data in the grade level). We fit models with different exclusion criteria at the team level and found consistent results in our estimates of treatment effects on taught words. With no limit on the number of students per teaching-team group, TREAT = 0.872 (Nstudents = 8,465, Nteams = 118); when we limit the sample to teams with more than four students, TREAT = 0.883 (Nstudents = 8,459, Nteams = 116); when we limit the sample to teams with more than eight students, TREAT = 0.954 (Nstudents = 8,421, Nteams = 110); and when we limit the sample to teams that contributed more than 12 students, TREAT = 1.03 (Nstudents = 8,338, Nteams = 102).
4 We calculated effect sizes with the following equation:
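The displayed equation appears not to have survived in this version of the text. A reconstruction assuming the standard Hedges's g definition (the effect-size statistic reported in the abstract), with T and C denoting the treatment and control groups, is:

```latex
g = \frac{\bar{X}_{T} - \bar{X}_{C}}{SD_{pooled}}
```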
5 We calculated pooled standard deviation with the following equation:
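The displayed equation appears not to have survived in this version of the text. A reconstruction assuming the standard pooled-standard-deviation formula, with n and SD denoting each group's sample size and standard deviation, is:

```latex
SD_{pooled} = \sqrt{\frac{(n_{T} - 1)\,SD_{T}^{2} + (n_{C} - 1)\,SD_{C}^{2}}{n_{T} + n_{C} - 2}}
```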
6 In our exploratory models we also found school-mean general vocabulary by treatment interactions in predicting general vocabulary posttests (β = −0.156, p < 0.01, model deviance = 66,830.3) and reading comprehension (β = −0.352, p < 0.01, model deviance = 71,415.1). We also found an interaction between treatment and school-level percentage of students eligible for free and reduced-price lunch in predicting posttest reading comprehension (β = 0.400, p < 0.01, model deviance = 71,416.8). However, none of these interactions remained significant when we also included interactions with baseline WG vocabulary. In each case, the models including baseline WG vocabulary interactions fit better (based on model fit statistics), as we anticipated. We present the best-fitting models.
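The effect-size computation described in notes 4 and 5 can be sketched in code. This is a minimal illustration assuming the standard pooled-SD and Hedges's g formulas; the group summaries passed in below are made-up placeholders, not study data, and the small-sample correction term is the conventional one rather than anything specified in the text.

```python
import math

def pooled_sd(n_t, sd_t, n_c, sd_c):
    # Pooled standard deviation across treatment (T) and control (C) groups,
    # weighting each group's variance by its degrees of freedom.
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))

def hedges_g(mean_t, mean_c, n_t, sd_t, n_c, sd_c):
    # Hedges's g: standardized mean difference with the usual
    # small-sample bias correction applied.
    d = (mean_t - mean_c) / pooled_sd(n_t, sd_t, n_c, sd_c)
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction

# Hypothetical group summaries (illustrative only, not values from the study).
print(round(hedges_g(51.0, 50.0, 4000, 10.0, 4000, 11.0), 3))
```

With large groups, as here, the correction factor is very close to 1, so g is nearly identical to the uncorrected standardized mean difference.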