Educational Research and Evaluation
An International Journal on Theory and Practice
Volume 13, 2007 - Issue 5

Assessment as Judgment-in-Context: Analysing how teachers evaluate students' writing [1]

Pages 401-434 | Received 10 Jul 2006, Accepted 23 Aug 2007, Published online: 30 May 2008
 

Abstract

In this paper, we analyse teachers' judgments of students' written texts. We document how teachers use evidence in ways that depend both on their knowledge of the students and on the assessment framework they need to use. We analyse teachers' judgments by contrasting the structures of assessments made using teachers' normal classroom judgment processes with those made using an external set of “benchmark” standards. We show how the tension between demands for system-wide assessment validity and localised contextually sensitive site validity impacts on the richness and consistency of the judgment processes. We conclude that current understandings of teacher judgment processes that operate in everyday assessment practices generally fail to account for the complexity and dynamism of this routine classroom activity. Furthermore, we demonstrate that the methodology of judgment analysis, combined with think-aloud protocols, has the potential to shed light on the complexities associated with the operation of judgment in educational assessment.

Notes

1. The research reported in this article was partially funded by a research grant from the Australian Research Council (#A79906109) to the three authors.

2. For our purposes, a "cue" refers to any piece of information that a teacher might potentially draw upon, or refer to, to inform a judgment (Snow, 1968).

3. The regression weights reflect cue importance and, for ease of interpretation, are typically transformed to relative weights (summing to 100%) summarising the percentage of emphasis the teacher gave each cue when making judgments.
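The transformation described in this note can be sketched as follows (the raw weight values below are hypothetical, chosen only to illustrate the normalisation):

```python
import numpy as np

# Hypothetical absolute regression weights for four cues.
raw_weights = np.array([0.42, 0.15, 0.28, 0.15])

# Relative weights: each cue's share of the summed absolute weights,
# expressed as a percentage so that the profile sums to 100%.
relative_weights = 100 * np.abs(raw_weights) / np.abs(raw_weights).sum()
print(relative_weights)  # approximately [42, 15, 28, 15]
```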

4. A configural judgment policy involves either the use of cues in a nonlinear fashion (such as with quadratic cue-judgment relationships) or in an interactive fashion (where the impact of one cue on judgment depends upon specific values of another cue) or both. Policy capturing that does not consider configural cue use is essentially evaluating cue main effects.
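A minimal sketch of this distinction, using simulated data in which the judge genuinely uses two cues interactively (the cue names and coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
spelling = rng.normal(size=n)  # hypothetical cue 1
ideas = rng.normal(size=n)     # hypothetical cue 2

# Simulated judgments containing a genuine cue interaction (configural use).
judgment = (0.5 * spelling + 0.3 * ideas + 0.4 * spelling * ideas
            + rng.normal(scale=0.2, size=n))

# Main-effects-only policy capturing: linear in the cues.
X_main = np.column_stack([np.ones(n), spelling, ideas])
beta_main, *_ = np.linalg.lstsq(X_main, judgment, rcond=None)

# Configural policy capturing: add quadratic and interaction terms.
X_conf = np.column_stack([np.ones(n), spelling, ideas,
                          spelling**2, ideas**2, spelling * ideas])
beta_conf, *_ = np.linalg.lstsq(X_conf, judgment, rcond=None)

def r_squared(X, beta, y):
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print(r_squared(X_main, beta_main, judgment))  # misses the interaction
print(r_squared(X_conf, beta_conf, judgment))  # captures it
```

The configural model recovers the interaction term that a main-effects analysis cannot represent.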

5. The labelling of the G and C terms is retained as the standard nomenclature for referring to policy similarity and unmodelled policy similarity (Tucker, 1964). More generally, the technical terms associated with Lens Model and policy-capturing research evolved with the work of Hursch, Hammond, and Hursch (1964) and continue in use today (Goldstein, 2004; Yen, 2005, provides a recent educational example).
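Under the usual lens-model assumptions (linear models fitted separately to the judge's ratings and to the criterion over the same cues), G and C can be computed as sketched below; the cue structure and coefficients here are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 150, 3
cues = rng.normal(size=(n, k))  # hypothetical cue values

# Simulated criterion ("ecology" side) and judgments (judge side).
criterion = cues @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=n)
judgment = cues @ np.array([0.5, 0.4, 0.1]) + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), cues])

def fit(y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ beta
    return pred, y - pred

pred_e, resid_e = fit(criterion)
pred_s, resid_s = fit(judgment)

# G: correlation between the two linear models' predictions
# (modelled policy similarity).
G = np.corrcoef(pred_s, pred_e)[0, 1]

# C: correlation between the two models' residuals
# (unmodelled policy similarity).
C = np.corrcoef(resid_s, resid_e)[0, 1]
print(round(G, 2), round(C, 2))
```

With independent noise on the two sides, C hovers near zero while G reflects how closely the two fitted policies align.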

6. Steps were taken to ensure that the writing instances in this subsample did not come from the classrooms of any of the teachers participating in the investigation.

7. A full description of all 26 objective text features is available from the first author on request.

8. The immediately deleted Coded Text Features concerned the number of illegal words in the writing, the number of inconsistent uses of tense, and whether the work was handwritten or typed. This left a total of 23 Coded Text Features for the cue condensation analysis. The only two remaining Coded Text Features that were not positively skewed, and therefore were not log-transformed, were adherence to genre and use of paragraphing.
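A sketch of the log transformation applied to one positively skewed count feature (the counts and the log(x + 1) form are illustrative assumptions; any log transform on counts must handle zeros in some such way):

```python
import numpy as np

# Hypothetical counts for one positively skewed text feature,
# e.g., the number of spelling errors per script.
spelling_errors = np.array([0, 1, 1, 2, 3, 5, 8, 21])

# log(x + 1) accommodates the zero counts while pulling in the long right tail.
transformed = np.log1p(spelling_errors)
print(transformed.round(2))
```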

9. We used a principal components approach to condensing the Coded Text Feature cues. This exploratory technique was used because we had no a priori theory about how such cues would be organised in this sample of students' writings. There was no coherent framework available within the empirical literature that would give sufficient guidance for a confirmatory approach.

10. The generalisability and stability of the principal components that we identified rested on the number and variety of texts that teachers brought to the study as their “in-context” pieces, coupled with the 25 we selected for the “out-of-context” pieces, rather than on the number of teachers in the sample (which was largely irrelevant to this analysis). With 23 variables, 520 instances of writing (a 22.6-to-1 ratio) were considered more than sufficient for producing a stable solution.
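The adequacy argument in this note can be made concrete with a small sketch: principal components extracted via eigendecomposition of the correlation matrix, using simulated data of the same dimensions (520 texts by 23 features; the data are random stand-ins, not the study's scores):

```python
import numpy as np

rng = np.random.default_rng(2)
n_texts, n_features = 520, 23  # roughly 22.6 writing instances per variable

# Simulated stand-in for the standardised Coded Text Feature scores.
Z = rng.normal(size=(n_texts, n_features))

# Principal components via eigendecomposition of the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores: projections of the data on the eigenvectors.
scores = Z @ eigvecs

print(round(n_texts / n_features, 1))  # 22.6
print(round(eigvals.sum(), 1))         # total variance = number of features: 23.0
```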

11. A full description of all 26 finely grained think-aloud features is available from the first author on request.

12. The process for computing meaningful relative cue weights involved recording the squared semipartial correlation associated with each cue at the step of its entry into the judgment model. These squared semipartial correlations were then transformed into relative cue weights by dividing each by the total of the 14 squared semipartial correlations (a procedure described in Cooksey, 1996, p. 170). Technically, the resulting relative cue weights were interpretable as the proportion of total explainable variance uniquely attributable to each specific cue at the point at which it entered the model.
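This computation can be sketched as an incremental-R² routine: at each step of entry, the squared semipartial correlation of the entering cue equals the increase in R² it produces. The data, entry order, and coefficients below are hypothetical:

```python
import numpy as np

def incremental_sq_semipartials(X, y, entry_order):
    """Squared semipartial correlation of each cue at its step of entry,
    i.e., the increase in R-squared when that cue joins the model."""
    n = len(y)

    def r2(cols):
        Xd = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        return 1 - resid.var() / y.var()

    sq_sp, entered, prev_r2 = {}, [], 0.0
    for c in entry_order:
        entered.append(c)
        new_r2 = r2(entered)
        sq_sp[c] = new_r2 - prev_r2
        prev_r2 = new_r2
    return sq_sp

# Hypothetical cue data and judgments.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.7, 0.4, 0.2]) + rng.normal(scale=0.3, size=100)

sq_sp = incremental_sq_semipartials(X, y, entry_order=[0, 1, 2])
total = sum(sq_sp.values())
relative = {c: 100 * v / total for c, v in sq_sp.items()}
print({c: round(w, 1) for c, w in relative.items()})
```

Dividing by the total, as in the note, makes the relative weights sum to 100%.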
