Is Teacher Value Added a Matter of Scale? The Practical Consequences of Treating an Ordinal Scale as Interval for Estimation of Teacher Effects

Pages 52-70 | Published online: 14 Oct 2016
 

ABSTRACT

Research shows that assuming a test scale is equal-interval can be problematic, especially when the assessment is being used to achieve a policy aim like evaluating growth over time. However, little research considers whether teacher value added is sensitive to the underlying test scale, and in particular whether treating an ordinal scale as interval might lead to erroneous conclusions about teacher quality. This article addresses the issue by estimating teacher value added, then applying extremely mild nonlinear transformations to the original scale and re-estimating the value added. Although by definition at most one of these scales can be equal-interval, all are treated as if interval-scaled when estimating value added. Results show that value added is sensitive to the scale used. While rank orderings of teachers do not change radically by transformation, even mild departures from the original scale can change a teacher’s odds of being considered high- or low-performing by a factor of 5.
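The scale dependence described in the abstract can be illustrated with a deliberately exaggerated toy example. The sketch below is not the article's model: it compares mean gains for two hypothetical classes rather than estimating TVA from lagged scores and covariates, and the function `stretch_top` is an assumed transformation that is far from mild, chosen only to make the effect visible. All scores and names are invented for illustration.

```python
from statistics import mean

def mean_gain(prior, current, transform=lambda x: x):
    """Mean of (transformed current score - transformed prior score)."""
    return mean(transform(c) - transform(p) for p, c in zip(prior, current))

def stretch_top(score):
    """Hypothetical strictly increasing transformation (for positive scores)
    that widens intervals at the top of the scale."""
    return score ** 2 / 100

# Hypothetical classes: teacher A serves lower-scoring students than teacher B.
prior_a, current_a = [100, 100], [110, 110]
prior_b, current_b = [200, 200], [208, 208]

original = (mean_gain(prior_a, current_a), mean_gain(prior_b, current_b))
rescaled = (mean_gain(prior_a, current_a, stretch_top),
            mean_gain(prior_b, current_b, stretch_top))

# Compare the two teachers on each scale. The transformation preserves every
# student's rank, yet it can change which teacher shows the larger mean gain.
print("A ahead on original scale:", original[0] > original[1])
print("A ahead on rescaled scale:", rescaled[0] > rescaled[1])
```

Both scales order students identically, so the data alone cannot say which comparison is the right one unless one scale is known to be equal-interval.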

Notes

1 When, as in this study, prior-year test scores are used as an independent variable (rather than using gain scores as the dependent variable), the outcome test score is often standardized to place TVA estimates for teachers at different grade levels, or based on different outcome years, into a common metric. This approach differs from Ballou’s (2009) and Briggs and Domingue’s (2012) studies on the issue of TVA and scale dependence, which use the original, unstandardized scale scores.
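As a minimal sketch of the standardization this note describes, the following converts scores to z-scores within each grade-by-year cell. The record layout, the field names (`grade`, `year`, `score`), and the use of the population standard deviation are assumptions made for illustration; the article does not specify these details.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_within(records):
    """Return copies of records with a 'z' field: the score standardized
    within its (grade, year) cell. Assumes each cell has nonzero spread."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["grade"], r["year"])].append(r["score"])
    # Mean and population standard deviation for each grade-by-year cell.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}
    out = []
    for r in records:
        m, s = stats[(r["grade"], r["year"])]
        out.append({**r, "z": (r["score"] - m) / s})
    return out

# Hypothetical scale scores for two grades in one year.
records = [
    {"grade": 4, "year": 2010, "score": 200},
    {"grade": 4, "year": 2010, "score": 220},
    {"grade": 4, "year": 2010, "score": 240},
    {"grade": 5, "year": 2010, "score": 300},
    {"grade": 5, "year": 2010, "score": 310},
    {"grade": 5, "year": 2010, "score": 320},
]
standardized = standardize_within(records)
```

After this step, each grade-by-year cell has mean 0 and standard deviation 1, which is what places estimates from different grades and years on a common metric.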

2 In general, the TVA model used in this study closely approximates the one used by the State of Florida to estimate teacher contributions to student learning, although there are key differences. For one, TVA estimates in this study’s model are pooled across years rather than taken as gains between 2 years. For another, there are differences in the covariates used, due mainly to the availability of additional relevant variables in M-DCPS. For example, this study’s model includes student suspensions and expulsions, as well as several classroom- and school-level variables. The model in this study also drops teachers from the analysis who have fewer than 10 students upon which to base TVA estimates.

3 Although standardization of test scores within grade and year largely renders these fixed effects redundant, they are nonetheless included to match common practice in TVA estimation.

4 TVA estimates were also run using grade-by-year interactions. Because the results did not differ substantively, only results from models using the basic year and grade fixed effects are reported.

5 These transformations are applied by year, such that both the current and lagged test score distributions are transformed using the same beta parameters.

6 Quantile-quantile plots of transformed versus untransformed score distributions in grades 4 and 5 show much less departure from each other than similar plots for untransformed scores in adjacent grades. Quantile-quantile plots are available on request.

7 For the remainder of the article, for clarity, “decile” will refer to the baseline (i.e., prior-year) student scores in a teacher’s class and “percentile” to rankings of teachers based on TVA. Note that because decile classifications are based on a rank transformation of student prior-year test scores, they are invariant across the original scale and transformations 1–4.
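The invariance claimed in this note follows from the fact that a strictly increasing transformation leaves ranks unchanged. A minimal sketch, assuming a simple rank-based decile rule and using the natural logarithm as a stand-in monotone transformation (the article's own transformations are not reproduced here, and the scores are hypothetical):

```python
import math

def decile_labels(scores):
    """Assign each score a decile from 1 to 10 based on its rank.
    Ties are broken by original position (sorted() is stable)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 10 // n + 1
    return labels

# Hypothetical prior-year scale scores (illustrative values only).
prior = [212, 305, 267, 198, 341, 289, 254, 230, 318, 276,
         221, 299, 262, 207, 333, 284, 249, 238, 311, 271]

# Stand-in strictly increasing transformation; not one of the article's four.
transformed = [math.log(s) for s in prior]

same = decile_labels(prior) == decile_labels(transformed)
print("Deciles unchanged by the transformation:", same)
```

Any other strictly increasing function could replace `math.log` here without changing the decile labels, which is why the decile groupings can be held fixed while TVA is compared across scales.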

8 Despite remaining agnostic about whether the original scale is interval, this article’s approach privileges the original scale by using it as the baseline against which to compare TVA based on other scales. However, if all of the transformations are plausible, then one could also justifiably compare the two scales that are the most different in terms of skewness (i.e., transformations 1 and 4). This approach of comparing the extremes is avoided for two reasons. First, the original scale is the one used in practice and therefore warrants some emphasis. Second, comparing the transformed to the original is more conservative than comparing transformed scales at either extreme. Nonetheless, analyses (not reported here) were also conducted to compare TVA using transformation 1 to transformation 4, and the results showed greater reordering of teachers than in the reported results, which compare either transformation to TVA using the original scale.

9 One exception is that the means shift upward under transformations 3 and 4 in decile 10, which helps explain why those two transformations reduce the odds of a teacher in that decile being in the bottom 5% but have relatively little effect on the odds of a teacher being in the top 5%. That is, the effect of TVA variances in decile 10 under transformations 3 and 4 is increased by the shift in means for low-performing teachers, but mitigated by the shift in means for high-performing teachers.
