Original Articles

Validating Arguments for Observational Instruments: Attending to Multiple Sources of Variation

Pages 88-106 | Published online: 20 Sep 2012
 

Abstract

Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be used to assess a validity argument constructed for the Mathematical Quality of Instruction instrument, focusing in particular on the effects of varying the rater pool, subject matter content, observation procedure, and district context. Variation in the subject matter content of lessons did not affect rater agreement with master scores, but the evaluation of other portions of the validity argument varied according to the composition of the rater pool, observation procedure, and district context. These results demonstrate the need for conducting such analyses, especially for classroom observation instruments that are subject to multiple sources of variation.

Notes

1. This is also the case in portfolio assessments, where the content of the portfolio entries can vary within and across classrooms (see, e.g., Koretz, Stecher, Klein, & McCaffrey, 1994).

2. A fifth major dimension, Classroom Work is Connected to Mathematics, consists of a single item and is not discussed here.

3. This use of the term “accuracy” is more restrictive than is often found in the measurement literature. We use it here to avoid the more cumbersome “rater agreement with master scores” below.

4. The hiring cut score is based on a deviance metric, where we calculate raters' average absolute deviations from the master score. We do not use this metric here, as it provides the same picture of hiring and calibration practices as the percent correct score.

5. There were many more item-segment combinations for arithmetic/algebra than for measurement/geometry due to the smaller number of measurement/geometry lessons in these studies.

a n = 291.

*p < .05.

**p < .01.

***p < .001.

