On the reliability of high-stakes teacher assessment

Pages 91-105 | Published online: 18 Jan 2013

