Original Articles

Validating Arguments for Observational Instruments: Attending to Multiple Sources of Variation

Pages 88-106 | Published online: 20 Sep 2012
 

Abstract

Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be used to assess a validity argument constructed for the Mathematical Quality of Instruction instrument, focusing in particular on the effects of varying the rater pool, subject matter content, observation procedure, and district context. Variation in the subject matter content of lessons did not affect rater agreement with master scores, but the evaluation of other portions of the validity argument varied according to the composition of the rater pool, observation procedure, and district context. These results demonstrate the need for conducting such analyses, especially for classroom observation instruments that are subject to multiple sources of variation.

Notes

1. This is also the case in portfolio assessments where the content of the portfolio entries can vary within and across classrooms (see, e.g., Koretz, Stecher, Klein, & McCaffrey, 1994).

2. A fifth major dimension, Classroom Work is Connected to Mathematics, consists of a single item and is not discussed here.

3. This use of the term “accuracy” is more restrictive than is often found in the measurement literature. We use it here to avoid the more cumbersome “rater agreement with master scores” below.

4. The hiring cut score is based on a deviance metric, where we calculate raters' average absolute deviations from the master score. We do not use this metric here, as it provides the same picture of hiring and calibration practices as the percent correct score.

5. There were many more item-segment combinations for arithmetic/algebra than for measurement/geometry due to the smaller number of measurement/geometry lessons in these studies.

a n = 291.

*p < .05.

**p < .01.

***p < .001.
