
A comparative judgement approach to the large-scale assessment of primary writing in England

Pages 46-64 | Received 24 Aug 2018, Accepted 29 Nov 2019, Published online: 05 Dec 2019

