Abstract
In this study we use data from the Early Childhood Longitudinal Study (ECLS) third- and fifth-grade samples to investigate teacher judgments of student achievement, the extent to which they offer a picture of student mathematics achievement similar to that given by standardized test scores, and whether classroom assessment practices moderate the relationship between the two measures. Results indicate that teacher ratings correlate strongly with standardized test scores; however, this relationship varies considerably across teachers, and this variation is associated with certain classroom assessment practices. Furthermore, the evidence suggests that teachers evaluate student performance not in absolute terms but relative to other students in the school, and that they may adjust their grading for some students, perhaps based on perceived differences in need and/or ability.
Notes
1Detailed psychometric information for the direct cognitive assessments used in ECLS is presented by Pollack, Najarian, Rock, and Atkins-Burnett (2005) and Pollack, Rock, Weiss, and Atkins-Burnett (2005).
2The third-grade survey additionally inquired about attendance, cooperativeness, and ability to take directions. Those items are not included in our study.
3The ECLS weighting schemes are not specifically designed for use with multilevel models; the properties of the variance parameter estimates for cross-level interactions obtained from weighted multilevel models are therefore unknown (Asparouhov, 2006).
4The third- and fifth-grade data were collected at different points in time and thus do not allow for direct longitudinal comparisons (i.e., changes across grades cannot be separated from changes across time). However, to the extent that it can be assumed that no sweeping national changes occurred between 2002 and 2004, the weighted statistics in the table characterize the reported assessment practices of teachers of nationally representative samples of third- and fifth-grade students in that period, and thus permit rough comparisons across grades.
aInteraction parameters were estimated as single-level weighted regressions with cluster-adjusted standard errors. Each block of variables was entered into the model separately; thus, for each grade the table reflects four separate models, one for each set of background and assessment practice variables.
5Other potential alternative explanations are less conceptually appealing. For example, differences in instructional sensitivity between ARS and DCA scores could affect their overall variance but should not systematically affect the proportion of variance across classrooms and schools. Differences in alignment to standards or content taught could influence the distribution, but those should in fact increase variance at the school level or higher (e.g., district or state), not classroom variance within schools.