Abstract
This research investigated the accuracy (agreement with the original marking and grading) of examiners’ holistic judgements of the quality of examination scripts that were close together in overall mark. For a History and a Physics exam, examiners considered pairs of scripts (with marks removed) and made three types of judgement: (1) Absolute – which grade each script was worth; (2) Relative – which of the pair was better in terms of overall quality; (3) Confidence – how confident they were about judgements (1) and (2). In both subjects, relative judgements were more accurate than absolute judgements, and judgements rated as ‘very confident’ were more accurate than other judgements. In Physics, the further apart the two scripts were in overall mark, the greater the likelihood of a correct relative judgement, but in History this expected pattern was not found. Despite differences between the research setting and the use of expert judgement in grading the live examinations, these results suggest that the current procedures do not use expert judgement in the most effective way.
Notes
1. As diagnosed by the Hosmer and Lemeshow (2000) goodness of fit test – see SAS online documentation: http://support.sas.com/documentation/cdl/en/statug/59654/HTML/default/statug_logistic_sect039.htm.