Intervention, Evaluation, and Policy Studies

Performance Evaluations as a Measure of Teacher Effectiveness When Implementation Differs: Accounting for Variation across Classrooms, Schools, and Districts

Pages 510-531 | Received 01 Apr 2021, Accepted 04 Nov 2021, Published online: 03 Mar 2022
 

Abstract

We use statewide data from Massachusetts to investigate teacher performance evaluations as a measure of teaching effectiveness. Schools tend to classify most of their teachers as proficient, but we document substantial variation across schools in the extent to which ratings differentiate teachers. Using event study and teacher fixed effects designs, we verify that these patterns are driven by differences in the application of standards rather than differences in the distribution of teacher quality. When we evaluate teachers’ movement from schools with greater differentiation in their evaluations to schools with less differentiation using an event study design, we find that the probability of receiving the highest performance rating drops by about 5 percentage points and, at least in the first year, the probability of receiving one of the lowest two ratings drops by 5 percentage points. As a result, even after regression adjustment, teacher evaluation ratings generally provide unreliable predictions of future teacher evaluations after teachers switch schools. These findings suggest that policymakers and researchers should exercise caution when using performance evaluation ratings to compare teachers in different contexts.

Acknowledgments

We wish to thank ESE for providing the data we utilize, Bingjie Chen for excellent research assistance, and Claire Abbott, Matthew Steinberg, and attendees of the 2017 APPAM Fall Conference for comments that improved the manuscript. All opinions expressed in this article are those of the authors and do not necessarily reflect the views of ESE, the study’s sponsors, or the institutions with which the authors are affiliated. Any errors are attributable to the authors.

Notes

1 The patterns are similar when we categorize districts based on the within-district ratings variance: 10% of teachers in low variance districts and 22% of teachers in high variance districts receive ratings other than proficient.

2 The timeline depends on a teacher’s career stage and prior evaluation results. The cycle begins with a self-assessment by the teacher and the development of a professional growth plan. During the implementation of the growth plan, teachers receive periodic feedback through a formative assessment process. The summative evaluation occurs at least annually for beginning and low-performing teachers and at least biennially for teachers previously earning one of the top two ratings. Teachers on a biennial review cycle receive a formative assessment in the alternating year. We include these formative ratings in the analyses in this article, although the results are not sensitive to using summative performance ratings only.

3 Third year teachers (or teachers new to a district for three years) must be rated proficient on all four standards to receive tenure.

4 This simplified conceptual model assumes a single dimension of teacher quality, but emerging evidence suggests that teacher effects on non-cognitive outcomes are not highly correlated with their effects on student test scores (Gershenson, 2016; Jackson, 2018; Kraft, 2019; Liu & Loeb, 2019), providing evidence of multiple dimensions of teacher quality.

5 The model in Equation (1) is equivalent to a model where raters have rater-specific thresholds $\mu_{lj} = \mu_l \beta_j$. The model could also be extended to allow idiosyncratic variation in the threshold values by indexing each value by $j$ (e.g., $\mu_{lj}$). Such a model would allow raters to compress segments of the ratings distribution by differing amounts.

6 Teachers do appear multiple times in the data for different school switches, although this is not common. We define all covariates according to the switching event (i.e., include a teacher-by-switch effect rather than a teacher effect) and include teachers as a level of clustering to account for dependence across mobility events.

7 Much of the prior work on classroom composition effects has focused on self-contained classrooms in the elementary grades where teachers have only one classroom assignment. We replicate this analysis using a sample of self-contained classrooms in grades 4 and 5 in Supplementary Appendix A and find similar patterns, although the effects of classroom composition appear somewhat stronger in that context. A one standard deviation increase in the achievement distribution raises the overall rating by about 0.4–0.6 points.

8 We describe the procedure for constructing leave-out predictions of teacher performance in more detail in Supplementary Appendix B.

9 Because we estimate these regressions separately by variance group, this redefinition has no effect on the estimated coefficients.

10 We do not have identifiers for the evaluator, so we cannot test directly for the existence of rater effects. Nonetheless, the year-to-year correlation of teacher performance evaluations is lower in years in which schools change principals (0.59) than in years when they do not (0.67).

11 We report similar analyses using data from other school districts in Supplementary Appendix C. Given the relatively small number of district-to-district transitions, these estimates are considerably less precise than those using data from other schools.
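
The comparison described in Note 10 can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the function names, and the toy ratings below are all hypothetical, and the numeric coding of ratings is an assumption. The sketch computes the year-to-year Pearson correlation of ratings separately for teacher-years in which the school changed principals and those in which it did not.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def correlation_by_principal_change(records):
    """Split (prior_rating, current_rating, principal_changed) records by the
    principal_changed flag and return the rating correlation in each group."""
    result = {}
    for flag in (True, False):
        prior = [p for p, _, changed in records if changed == flag]
        current = [c for _, c, changed in records if changed == flag]
        result[flag] = pearson(prior, current)
    return result


# Hypothetical toy data; ratings are coded 1-4 purely for illustration.
toy_records = [
    (2, 2, False), (3, 3, False), (4, 4, False), (3, 3, False),
    (2, 3, True), (3, 3, True), (4, 3, True), (3, 4, True),
]
result = correlation_by_principal_change(toy_records)
print("principal changed:", result[True])
print("principal stable: ", result[False])
```

A lower correlation in the principal-change group, as in the toy data, is consistent with rater effects but does not identify them, which matches the indirect framing of the note.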

Additional information

Funding

This research was supported by a grant from the Massachusetts Department of Elementary and Secondary Education (ESE) and by the National Center for Analysis of Longitudinal Data in Education Research (CALDER), which is funded by a consortium of foundations. For more information about CALDER funders, see www.caldercenter.org/about-calder.
