
The effect of adaptivity on the reliability coefficient in adaptive comparative judgement

Pages 43-58 | Received 23 Dec 2016, Accepted 07 Dec 2017, Published online: 05 Jan 2018
 

Abstract

Comparative Judgement (CJ) is an increasingly widely investigated method in assessment for creating a scale, for example of the quality of essays. One area that has attracted attention in CJ studies is the optimisation of the selection of pairs of objects for judgement. One approach is known as adaptive comparative judgement (ACJ). It has been claimed in the literature that ACJ produces very high reliability, often higher than can be obtained by conventional marking. Bramley showed by simulation that adaptivity can substantially inflate the apparent reliability in ACJ. The empirical study described here compared an adaptive with a non-adaptive CJ study using GCSE English essays. An all-play-all set of comparisons of a subset of the essays allowed the extent of scale inflation to be quantified: the reported adaptive reliability was 0.97 whereas the deflated value was 0.84. The value from the non-adaptive study was 0.72. However, the scale from the non-adaptive study correlated slightly higher with external variables, suggesting the non-adaptive study was no less valid than the adaptive one.

Acknowledgement

We would like to thank Chris Wheadon and Brian Henderson of NoMoreMarking™ for their help with running this study.

Notes

2. In fact, Pollitt’s (2015) simulations were themselves based on a flawed circular argument: that a realistic value for the SD of the generating distribution in a simulation can be derived from previous real ACJ studies – but the claim is that the scale separations and hence SDs in these studies were inflated.

3. The General Certificate of Secondary Education – a high stakes examination taken at the end of the period of compulsory schooling by 16 year olds in England, Wales and Northern Ireland.

4. To calculate the confidence limits, the standard error for the difference between two essays ($se_{\text{diff}}$) was calculated as $\sqrt{se_1^2 + se_2^2}$. Then $1.96 \times se_{\text{diff}}$ was added and subtracted from the difference in measures ($\beta_1 - \beta_2$) and used in Equation (1) to derive the predicted proportion and hence number of wins.
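
The calculation in Note 4 can be sketched in a few lines of Python. This is an illustration only, not the authors' code. Equation (1) is not reproduced in this excerpt, so the sketch assumes it has the usual logistic (Rasch / Bradley-Terry) form, p = exp(β1 − β2) / (1 + exp(β1 − β2)). It also assumes the standard error of the difference is the square root of the sum of the two squared standard errors. The function name `predicted_win_limits` and the parameter `n_comparisons` (the number of times the pair was compared) are hypothetical.

```python
import math


def predicted_win_limits(beta1, beta2, se1, se2, n_comparisons, z=1.96):
    """Return (lower, point, upper) predicted wins for essay 1 over essay 2.

    beta1, beta2  : estimated measures of the two essays (logits)
    se1, se2      : standard errors of those measures
    n_comparisons : hypothetical count of comparisons between the pair
    z             : normal deviate for the interval (1.96 for 95%)
    """
    # Standard error of the difference, assuming independent errors.
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    diff = beta1 - beta2

    def p_win(d):
        # ASSUMED form of Equation (1): logistic function of the difference.
        return 1.0 / (1.0 + math.exp(-d))

    # Apply the assumed model at the lower limit, the estimate, and the upper limit.
    limits = (diff - z * se_diff, diff, diff + z * se_diff)
    return tuple(n_comparisons * p_win(d) for d in limits)
```

Because the logistic function is monotonic, the limits on the difference in measures map directly onto limits for the predicted proportion, which are then scaled by the number of comparisons to give limits on the number of wins.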
