Abstract
Comparative Judgement (CJ) is an increasingly widely investigated method in assessment for creating a scale, for example of the quality of essays. One area that has attracted attention in CJ studies is the optimisation of the selection of pairs of objects for judgement. One approach is known as adaptive comparative judgement (ACJ). It has been claimed in the literature that ACJ produces very high reliability, often higher than can be obtained by conventional marking. Bramley showed by simulation that adaptivity can substantially inflate the apparent reliability in ACJ. The empirical study described here compared an adaptive with a non-adaptive CJ study using GCSE English essays. An all-play-all set of comparisons of a subset of the essays allowed the extent of scale inflation to be quantified: the reported adaptive reliability was 0.97 whereas the deflated value was 0.84. The value from the non-adaptive study was 0.72. However, the scale from the non-adaptive study correlated slightly higher with external variables, suggesting the non-adaptive study was no less valid than the adaptive one.
Acknowledgement
We would like to thank Chris Wheadon and Brian Henderson of NoMoreMarking™ for their help with running this study.
Notes
1. http://digitalassess.com/adaptive-comparative-judgement-a-groundbreaking-tool-for-assessment-and-learning/ Retrieved December 13, 2016.
2. In fact, Pollitt’s (2015) simulations were themselves based on a flawed circular argument: that a realistic value for the SD of the generating distribution in a simulation can be derived from previous real ACJ studies – but the claim is that the scale separations and hence SDs in these studies were inflated.
3. The General Certificate of Secondary Education: a high-stakes examination taken at the end of the period of compulsory schooling by 16-year-olds in England, Wales and Northern Ireland.
4. To calculate the confidence limits, the standard error for the difference between two essays (sediff) was calculated as √(se₁² + se₂²), where se₁ and se₂ are the standard errors of the two essays’ measures. Then 1.96 × sediff was added to and subtracted from the difference in measures (β₁ – β₂), and the resulting values were used in Equation (1) to derive the predicted proportion and hence number of wins.
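The calculation in note 4 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. It assumes that sediff is the root sum of squares of the two essays' standard errors, and that Equation (1), which is not reproduced in this section, is the usual logistic (Bradley-Terry/Rasch) form p = exp(β₁ – β₂) / (1 + exp(β₁ – β₂)). The function and parameter names are hypothetical.

```python
import math


def predicted_wins_interval(beta1, se1, beta2, se2, n_comparisons, z=1.96):
    """Return (lower, point, upper) predicted wins for essay 1 over essay 2.

    Assumptions (not confirmed by the source):
      * sediff = sqrt(se1^2 + se2^2)
      * Equation (1) is the logistic model p = 1 / (1 + exp(-(beta1 - beta2)))
    """
    diff = beta1 - beta2                      # difference in measures
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # assumed form of sediff

    def p_win(d):
        # Assumed Equation (1): probability that essay 1 beats essay 2
        return 1.0 / (1.0 + math.exp(-d))

    # Add and subtract z * sediff on the logit scale, then convert each
    # value to a proportion and scale by the number of comparisons.
    proportions = (
        p_win(diff - z * se_diff),
        p_win(diff),
        p_win(diff + z * se_diff),
    )
    return tuple(n_comparisons * p for p in proportions)


# Example with made-up values: two essays compared 10 times
lower, point, upper = predicted_wins_interval(1.0, 0.3, 0.0, 0.4, 10)
```

Because the limits are formed on the logit scale before being converted to proportions, the resulting interval for the number of wins always stays between 0 and the number of comparisons.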