
A comparative judgement approach to the large-scale assessment of primary writing in England

Pages 46-64 | Received 24 Aug 2018, Accepted 29 Nov 2019, Published online: 05 Dec 2019

