METHODOLOGICAL STUDIES

Disentangling Disadvantage: Can We Distinguish Good Teaching From Classroom Composition?


Abstract

This article investigates the use of teacher value-added estimates to assess the distribution of effective teaching across students of varying socioeconomic disadvantage in the presence of classroom composition effects. Via simulations, we examine how accurately commonly used teacher value-added estimators rank teachers (measured by the rank correlation between true and estimated teacher effects) and how well they recover a parameter representing the distribution of effective teaching. We consider various scenarios of teacher assignment, within-teacher variability in classroom composition, the importance of classroom composition effects, and the presence of student unobserved heterogeneity. No single model recovers unbiased estimates of the distribution parameter in all the scenarios we consider, and the models that rank teacher effectiveness most accurately do not necessarily recover distribution parameter estimates with the least bias. Because true teacher sorting in real data is seldom known, we recommend that analysts incorporate contextual information into their decisions about model choice, and we offer some guidance on how to do so.

Notes

Although alternative parameters could be used to measure how well estimators recover teacher effects and the degree of systematic teacher assignment with respect to student disadvantage, we assess the recovery of true teacher effects with the rank correlation between true and estimated effects because it relates most directly to other simulation studies of how well teacher value-added models recover true teacher effects (e.g., Guarino, Reckase, & Wooldridge, Citation2013). Similarly, we use the correlation between estimated teacher effects and each teacher's aggregate proportion of disadvantaged students as our distribution parameter estimate because it is the most natural way to model teacher assignment with classroom effects in our simulation setup, described later, and it allows us to quantify bias intuitively.
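As an illustration only (not the authors' code), the two measures can be computed from teacher-level arrays in a simulation as in the following sketch; the function names, and the use of SciPy's Spearman correlation, are assumptions for the example.

import numpy as np
from scipy.stats import spearmanr

def rank_correlation(true_effects, estimated_effects):
    # Spearman rank correlation between true and estimated teacher effects:
    # how well the model orders teachers, ignoring the scale of the estimates.
    rho, _ = spearmanr(true_effects, estimated_effects)
    return rho

def distribution_correlation(estimated_effects, teacher_prop_obd):
    # Correlation between estimated teacher effects and each teacher's
    # aggregate proportion of disadvantaged (OBD) students: the estimated
    # "distribution of effective teaching" parameter.
    return float(np.corrcoef(estimated_effects, teacher_prop_obd)[0, 1])

# Hypothetical usage with one simulated replication (arrays of length n_teachers):
# rho = rank_correlation(true_te, est_te)
# dist = distribution_correlation(est_te, prop_obd_by_teacher)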

Test scores are standardized to have a mean of 0 and a standard deviation of 1 in each year. Similar results were obtained when using students’ math test scores instead of reading test scores. The results for math are available from the authors upon request.
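For concreteness, a minimal sketch of the within-year standardization, assuming a long-format pandas DataFrame with hypothetical columns 'year' and 'score':

import pandas as pd

def standardize_by_year(df: pd.DataFrame, score_col: str = "score") -> pd.Series:
    # Standardize test scores to mean 0 and standard deviation 1 within each year.
    by_year = df.groupby("year")[score_col]
    return (df[score_col] - by_year.transform("mean")) / by_year.transform("std")

# df["score_std"] = standardize_by_year(df)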

For teacher fixed-effects models we include an intermediate step in which we “shrink” teacher effect estimates from step one to account for heterogeneity in the amount of information available to estimate each teacher's contribution to student achievement (Jacob & Lefgren, Citation2007).
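A minimal sketch of this shrinkage step, assuming raw teacher fixed-effect estimates and their standard errors are available from the first step (Jacob & Lefgren-style empirical Bayes shrinkage; variable names are illustrative, not the authors' code):

import numpy as np

def shrink_teacher_effects(estimates, standard_errors):
    # Pull each noisy teacher-effect estimate toward the grand mean in
    # proportion to its estimated reliability.
    estimates = np.asarray(estimates, dtype=float)
    sampling_var = np.asarray(standard_errors, dtype=float) ** 2
    grand_mean = estimates.mean()
    # Signal variance: variance of the raw estimates net of average sampling noise.
    signal_var = max(estimates.var(ddof=1) - sampling_var.mean(), 0.0)
    reliability = signal_var / (signal_var + sampling_var)
    return grand_mean + reliability * (estimates - grand_mean)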

This distribution replicates the distribution of school-average OBD observed in administrative data from a large, urban school district.

A detailed description of our simulation's setup can be found in Appendix C.

The standard deviation of this error distribution is restricted so that the resulting variance of the error in the test-score equation, ν_it, has feasible values.

Baseline scores are, therefore, not correlated with students’ OBD status, but subsequent scores are.

Note that our data-generating process does not include school effects that are separate from the combined effects of teacher, classroom, and student characteristics. Disentangling teacher and school effects reliably requires that numerous teachers are observed for multiple years before and after changing schools. Although these conditions would be straightforward to generate within simulated data, they would have limited applicability to many analytic situations, where separating school and teacher effects requires large panel data sets and several assumptions about the stability of teacher effects across contexts (see, e.g., Jackson, Citation2012; Mansfield, Citation2010; and Xu, Ozek, and Corritore, 2012). For tractability of the current analysis, we focus just on distinguishing teacher effects from classroom composition effects.

It is difficult to compare these parameter estimates from teacher value-added models with those from prior studies for two reasons. First, many prior studies of teacher value-added do not report parameter estimates from student-level regressions. Second, among those that report parameter estimates, it is often the case that the set of student-, classroom-, and teacher-level covariates included in the regression either (a) is more comprehensive than ours (e.g., Clotfelter et al., Citation2007), or (b) includes student fixed effects that absorb time-invariant student characteristics (e.g., Clotfelter, Ladd, & Vigdor, Citation2010; Hannaway et al., 2010; Rockoff, Citation2004). Having said that, available parameter estimates are quantitatively similar to the ones we employ in our simulations. For example, using student achievement data for grades 3–5 from North Carolina and a much richer specification, Clotfelter et al. (2007) report estimates of β of approximately 0.7 and of λ of approximately −0.1. Similarly, using high school achievement data from North Carolina, Clotfelter et al. (2010) report estimates of α of −0.06 and of β of approximately 0.4.
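For orientation, the parameters discussed here correspond to coefficients in a levels achievement equation of roughly the following form; this is a hedged reconstruction, not the article's equation reproduced verbatim (the teacher effect τ_j, classroom shock ζ_it, and error ν_it follow the notation used elsewhere in these notes):

A_{it} = \beta A_{i,t-1} + \alpha\,\mathrm{OBD}_{it} + \lambda\,\overline{\mathrm{OBD}}_{jt} + \tau_{j} + \zeta_{it} + \nu_{it}

where β is the persistence coefficient on the lagged score, α the coefficient on the student's own disadvantage indicator, and λ the classroom composition (proportion OBD) effect.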

We also performed simulation exercises varying the persistence parameter β to smaller values (0.4 and 0.2), with similar conclusions. Model performance in recovering true teacher effects decreases at lower levels of persistence. No such clear pattern emerges, however, in the distribution parameter. Results for these lower persistence cases are available from the authors upon request. We do not consider other scenarios for the choice of α because we are primarily concerned with features of the data-generating process that we believe are potentially most relevant for the problem we study.

Specifically, we expand equation (5) to allow for an additional student unobserved heterogeneity term drawn from a standard uniform distribution, constructed so that it is negatively correlated with students' OBD status. The average correlation between student unobserved heterogeneity and OBD status in our simulated data is approximately −0.6. Although we acknowledge that this is a rather high correlation, and that this scenario may represent an extreme case in which a large share of the unobserved heterogeneity is absorbed by observed student disadvantage, we still think the presented results are informative.
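One way to generate such a term (a sketch under our own assumptions, not the article's code) is a Gaussian copula: draw two correlated standard normals, threshold one to obtain the binary OBD indicator, and map the other to (0, 1) through the normal CDF so its marginal is standard uniform. The OBD probability and latent correlation below are illustrative.

import numpy as np
from scipy.stats import norm

def simulate_obd_and_heterogeneity(n_students, p_obd=0.5, latent_corr=-0.75, seed=0):
    rng = np.random.default_rng(seed)
    cov = [[1.0, latent_corr], [latent_corr, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_students)
    obd = (z[:, 0] > norm.ppf(1.0 - p_obd)).astype(int)  # binary disadvantage flag
    heterogeneity = norm.cdf(z[:, 1])                     # standard uniform marginal
    return obd, heterogeneity

# obd, het = simulate_obd_and_heterogeneity(10_000)
# np.corrcoef(obd, het)[0, 1]  # roughly -0.6 for a latent correlation near -0.75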

We could also have considered models that control for other classroom composition variables, such as classroom-average test scores. We do not do so for two reasons. First, doing so would considerably increase the number of scenarios we consider. Second, classroom-average lagged test scores are correlated with the classroom proportion OBD but do not enter the data-generating process directly; controlling for classroom-average lagged scores is, therefore, an intermediate case between controlling and not controlling for the classroom proportion OBD.

We also obtained estimates using teacher fixed-effects methods without empirical Bayes adjustments and results were similar to the adjusted ones presented here.

We also fitted equations in gains by omitting the lagged test score from the set of explanatory variables and using gains in test scores as our dependent variable instead. In general, models in gains did worse than models in levels in recovering the size of teacher contributions to student outcomes. When estimating the degree of sorting, models in levels also did better than models in gains in the presence of sorting of teachers into certain classrooms. However, in the absence of student sorting, models in gains perform better at recovering the distribution parameter. Results are available from the authors upon request.
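Schematically (in simplified notation, not the article's exact equations), the levels specification regresses the current score on the lagged score and covariates, whereas the gains specification imposes a persistence coefficient of one and moves the lagged score to the left-hand side:

\text{levels: } A_{it} = \beta A_{i,t-1} + X_{it}'\gamma + \tau_{j} + \varepsilon_{it}
\qquad
\text{gains: } A_{it} - A_{i,t-1} = X_{it}'\gamma + \tau_{j} + \varepsilon_{it}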

There is another potential source of bias in the fixed-effects estimator if teachers have some of the same students in consecutive years. It is well known that panel data models that include both student fixed effects and a lagged dependent variable lead to biased estimates (Nickell, Citation1981). Although our specifications use teacher fixed effects rather than student fixed effects, a similar bias will occur in our levels specification if some students have the same teacher in consecutive years. For these "repeater" students, the lagged test score will be correlated with the true teacher effect for the current year, part of which will be in the error term of the estimated model. This could bias all the estimated coefficients in the equation if the time-varying unobserved classroom shocks (ζ_it) are correlated over time, and this correlation could be accentuated in practice if classroom composition does not vary sufficiently over time. In our limited classroom variability scenario, in which students and teachers stay in the same school, one-sixth of students on average have the same teacher in consecutive years. However, by construction we do not allow the unobserved classroom shocks to be correlated over time, so we do not expect this source of bias to be important for our simulation results. It should be stressed, however, that in actual data available to us from several large, urban districts, up to 15 percent of students were taught by the same teacher in consecutive years. Students can have the same teacher in consecutive years for multiple reasons, including grade retention and the limited number of teachers in smaller districts.
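A minimal sketch (with hypothetical column names, not the authors' code) of how the share of such repeater students can be computed from a long student-year panel:

import pandas as pd

def repeater_share(panel: pd.DataFrame) -> float:
    # `panel` has one row per student-year with columns
    # 'student_id', 'year', and 'teacher_id' (illustrative names).
    panel = panel.sort_values(["student_id", "year"])
    same_teacher = panel.groupby("student_id")["teacher_id"].shift() == panel["teacher_id"]
    consecutive_year = panel.groupby("student_id")["year"].diff() == 1
    repeaters = panel.loc[same_teacher & consecutive_year, "student_id"].unique()
    return len(repeaters) / panel["student_id"].nunique()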

In other analyses (results not shown, available upon request), we also estimated rank and distribution correlations with only one year of achievement data (plus a baseline score). In those simulations, the rank correlation of true and estimated teacher random effects more closely resembles the rank correlation of true and estimated teacher effects from the aggregated-residuals model than that from the fixed-effects models. Similarly, when using less data, estimates of the distribution correlation from random-effects models resemble those from aggregated-residuals models. The result that, with the minimum amount of achievement data, random-effects and aggregated-residuals models perform similarly in their ability to rank teachers is consistent with the simulation results in Guarino, Reckase, and Wooldridge (Citation2013). Unlike Guarino, Reckase, and Wooldridge (2013), however, we find that with minimal data, fixed-effects estimates always appear to rank teachers more accurately than estimates from aggregated-residuals or random-effects models. We hypothesize that this divergence stems from the fact that our data-generating process, unlike theirs, includes peer effects.

Even if teacher assignment is random, the covariance between the contemporaneous classroom proportion of student disadvantage and lagged classroom-average scores is non-zero. With limited variability in classroom composition, a teacher is assigned a similar "type" of student year after year, so that there is systematic correlation between classroom composition, teacher effectiveness, and student learning over time. Limited variability in classroom composition is, therefore, not a concern in aggregated-residuals models: by treating teacher effects as the portion of test scores that is uncorrelated with the covariates included in the model, aggregated residuals effectively impose orthogonality between teacher value-added and classroom composition.
