An experimental assessment of the effects of reading software products on reading test scores

Pages 1-26 | Received 03 Nov 2009, Accepted 31 Dec 2009, Published online: 30 Apr 2010
 

Abstract

In response to a Congressional mandate in the No Child Left Behind Act, the effectiveness of nine reading instructional software applications was investigated using an experimental design and a large sample of American schools, teachers, and students. The software packages were selected based on evidence of effectiveness from previous research. Five of the applications were for first‐grade students and four were for fourth‐grade students. Standardized tests were administered in the fall and spring of the 2004–2005 school year, classrooms were observed, and school records data were collected. The findings indicated that the presence of the software altered instructional activities, but test scores were not statistically significantly higher in classrooms in which the software applications were used. For a subset of teachers and schools, an additional year of follow‐up was conducted to examine whether effectiveness was moderated by teachers having a year of experience using the products. The results were mixed, with measured effects being lower in the first grade and higher in the fourth grade.

Acknowledgements

We thank Barbara Means, Robert Murphy, William Penuel, and their staff at SRI International for their significant contributions to the observational study on which the implementation analysis draws heavily, and Audrey Pendleton at the US Department of Education’s Institute of Education Sciences for her contributions. The research reported here was supported by contract ED‐01‐CO‐0039/0007 with Mathematica Policy Research, Inc.; however, the views expressed here are those of the authors and do not necessarily reflect the views or policies of the Institute of Education Sciences. The authors are responsible for any errors.

Notes

1. Multiplying the total days of product usage per year (76) by the average minutes of use per day of usage (23) yields 1748 minutes of annual use, which is divided by 180 school days to arrive at an average use of 9.7 minutes per actual school day.
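For concreteness, a minimal sketch of this arithmetic (values taken from the note; Python used purely for illustration):

```python
# Arithmetic from note 1: average product use per school day.
days_of_usage = 76         # days the product was used during the year
minutes_per_day_used = 23  # average minutes of use on days it was used
school_days = 180          # length of the school year in days

annual_minutes = days_of_usage * minutes_per_day_used  # 76 * 23 = 1748
average_use = annual_minutes / school_days             # 1748 / 180
print(f"{average_use:.1f} minutes per school day")     # -> 9.7
```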

2. The observation protocol called for observers to observe periods during which treatment teachers were using reading products. To the extent possible, control classrooms were observed during the same part of the reading period in which treatment teachers used the products. For example, if treatment teachers in a particular school used the product during the last half of the reading period, observers attempted to observe control classrooms during the last half of the reading period. If observers had instead visited classrooms at random times, the differences shown in the table may have been smaller, because the products would not have been in use during some of the observations.

3. Observers estimated the proportion of students who were doing something other than the assigned academic task during a one‐minute segment using categories such as walking around the classroom for reasons unrelated to the task, talking with other students on topics unrelated to the task, talking with other students while the teacher was addressing the whole class, sitting at the computer for long periods with no interactions with keyboard or mouse, or sleeping or having their heads on their desks.

4. The main model estimated a single effect for all five products by including an indicator that a student was in a treatment classroom, regardless of which product was used in the classroom. This approach essentially averages the individual product effects with weights proportional to each product’s number of classrooms in the study: products with more classrooms contribute more to the estimated effect.
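A minimal sketch of this pooling approach, using synthetic data with hypothetical column names (spring_score, fall_score, treatment, classroom_id) and simplifying the study’s hierarchical model to a two‐level random‐intercept regression:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (not study data): students nested in
# classrooms, with half the classrooms assigned a product.
rng = np.random.default_rng(0)
classrooms = np.repeat(np.arange(40), 15)          # 40 classrooms x 15 students
treatment = (classrooms % 2).astype(int)           # alternating assignment
fall = rng.normal(50, 10, classrooms.size)
spring = fall + rng.normal(2, 8, classrooms.size)  # no built-in product effect
df = pd.DataFrame({"classroom_id": classrooms, "treatment": treatment,
                   "fall_score": fall, "spring_score": spring})

# A single treatment indicator pools all products into one average effect;
# a classroom random intercept accounts for the clustering of students.
model = smf.mixedlm("spring_score ~ treatment + fall_score",
                    data=df, groups=df["classroom_id"])
print(model.fit().summary())
```

Because the indicator ignores which product a classroom used, the coefficient on treatment is an average across products, weighted by how many classrooms each product has.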

5. Effect sizes are calculated by dividing the score difference shown in the table by the standard deviation of the distribution of spring test scores for the full control group.
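A minimal sketch of this calculation (function name and numbers are illustrative, not study values):

```python
import numpy as np

def effect_size(score_difference, control_spring_scores):
    # Score difference divided by the standard deviation of the full
    # control group's spring test-score distribution, per this note.
    return score_difference / np.std(control_spring_scores, ddof=1)

rng = np.random.default_rng(0)
control = rng.normal(50, 20, 1000)          # illustrative control-group scores
print(round(effect_size(1.8, control), 2))  # ~0.09 for a 1.8-point difference
```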

6. The study also examined whether products reduced the proportion of students who were low scorers on the SAT 9 by creating an indicator of whether a student fell in the lower third of scores and estimating a three‐level model with that indicator as the outcome. Whether the student was below the 33rd percentile on the pretest was used as a covariate, along with the same set of variables used for the score model above. The results indicated that products did not have a statistically significant effect on whether students were low scorers. Two‐level models were also estimated for students in each quartile (based on the fall score) to assess product effects across the achievement distribution. Estimated effects and p‐values for the four quartiles were 0.35 (.82), 0.82 (.53), 1.77 (.19), and 0.25 (.86).
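A minimal sketch of the low‐scorer analysis, reusing the synthetic df from the sketch under note 4 and collapsing the study’s three‐level model to a single‐level logistic regression for brevity:

```python
# Outcome: student's spring score falls in the lower third of scores.
df["low_spring"] = (df["spring_score"]
                    <= df["spring_score"].quantile(1 / 3)).astype(int)
# Covariate: student was below the 33rd percentile on the fall pretest.
df["low_fall"] = (df["fall_score"]
                  <= df["fall_score"].quantile(1 / 3)).astype(int)

logit = smf.logit("low_spring ~ treatment + low_fall", data=df).fit()
print(logit.summary())  # 'treatment' tests the effect on being a low scorer
```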

7. School effects were estimated using a regression model in which test scores are regressed on student and teacher characteristics and the treatment indicator is interacted with an indicator for each school. Standard errors were adjusted for classroom clustering. Effect sizes for schools are calculated by dividing the score difference for each school by the standard deviation of the distribution of control group test scores. An alternate effect size was calculated by dividing the score difference for each school by the standard deviation of the distribution of control group test scores for students in that school. The alternate effect sizes show the same pattern between schools, but are more variable.
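A minimal sketch of the school‐by‐school estimation, again reusing the synthetic df with hypothetical names (a school_id column is added for illustration); classroom clustering is handled with cluster‐robust standard errors:

```python
df["school_id"] = df["classroom_id"] // 4  # 4 classrooms per synthetic school

# Interacting treatment with school yields one treatment effect per school;
# C(school_id) absorbs school-specific intercepts.
ols = smf.ols(
    "spring_score ~ C(school_id) + treatment:C(school_id) + fall_score",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["classroom_id"]})

school_effects = ols.params.filter(like="treatment")
# Dividing each school's score difference by the control group's
# spring-score SD expresses it as an effect size, per notes 5 and 7.
sd_control = df.loc[df["treatment"] == 0, "spring_score"].std()
print(school_effects / sd_control)
```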

8. As with first grade, the study team worked with districts to identify hardware and software needs such as computers, headphones, memory, and operating system upgrades.

9. As noted in the previous section, the study itself may have increased usage because it purchased hardware and software upgrades directly (which allowed it not to go through district procurement processes), paid teachers honoraria for attending training on using products, and relayed information to product developers about issues with using products that the team observed during classroom visits.

10. The observation protocol called for observers to observe reading periods during which treatment teachers were using products. If observers had instead visited classrooms at random times, the differences shown in the table may have been smaller, because the products would not have been in use during some of the observations.

11. Additional analyses also found that products did not affect whether students were low scorers. Two‐level models also were estimated for students in each quartile (based on the fall score) to assess product effects across the score distribution. Estimated effects (in NCE units) and p‐values for the four quartiles were −0.32 (.69), 0.51 (.64), 1.77 (.21), and 0.94 (.71). None are statistically different from zero.

12. Because the sample of teachers and students in this analysis is restricted to teachers who remained in the same school and grade level for the second year, it differs from the sample used to estimate the first‐year effects reported above; as a result, the first‐year estimates presented here differ from those reported above.
