Classification accuracy in Key Stage 2 National Curriculum tests in England

Pages 22-42 | Published online: 01 Feb 2013
