References
- Baird, J.-A. 2007. “Alternative Conceptions of Comparability”. In Chap. 4 of Techniques for Monitoring the Comparability of Examination Standards, edited by P. Newton, J. Baird, H. Goldstein, H. Patrick, and P. Tymms. London: Qualifications and Curriculum Authority.
- Bakker, M., P. Sanders, D. Beijaard, E. Roelofs, D. Tigelaar, and N. Verloop. 2008. “De betrouwbaarheid en generaliseerbaarheid van competentiebeoordelingen op basis van een videodossier” [Reliability and Generalizability of Performance Judgments Based on a Video Portfolio]. Pedagogische Studiën 85 (4): 240–260.
- Brennan , R. L. 2001 . Generalizability Theory , New York , NY : Springer .
- Brookhart , S. M. 2003 . Developing Measurement Theory for Classroom Assessment Purposes and Uses . Educational Measurement: Issues and Practice , 224 : 5 – 12 .
- Buckendahl , C. W. , Yang , Y. and Ferdous , A. 2003 . An Alternative Strategy for Estimating Decision Consistency Reliability , Lincoln , NE : University of Nebraska .
- Cardinet , J. , Johnson , S. and Pini , G. 2010 . Applying Generalizability Theory Using EduG , New York , NY : Taylor and Francis .
- Chester , M. D. 2003 . Multiple Measures and High Stakes Decisions. A Framework for Combining Measures . Educational Measurement: Issues and Practice , 22 ( 2 ) : 32 – 41 .
- Cohen , J. 1960 . A Coefficient of Agreement for Nominal Scales . Educational and Psychological Measurement , 20 : 37 – 46 .
- Cohen , J. 1968 . Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit . Psychological Bulletin , 70 : 213 – 220 .
- Crisp , V. and Novakovic , N. 2009 . Are all Assessments Equal? The Comparability of Demands of College-based Assessments in a Vocationally Related Qualification . Research in Post-compulsory Education , 141 : 1 – 18 .
- Cronbach , L. J. 1951 . Coefficient Alpha and the Internal Structure of Tests . Psychometrika , 163 : 297 – 334 .
- Cronbach , L. J. and Gleser , G. C. 1957 . Psychological Tests and Personnel Decisions , Urbana , IL : University of Illinois Press .
- Cronbach , L. J. , Linn , R. L. , Brennan , R. L. and Haertel , E. H. 1997 . Generalizability Analysis for Performance Assessments of Student Achievement or School Effectiveness . Educational and Psychological Measurement , 573 : 373 – 399 .
- Crooks , T. J. , Kane , M. T. and Cohen , A. S. 1996 . Threats to the Valid Use of Assessments . Assessment in Education , 3 ( 3 ) : 265 – 265 .
- Douglas, K. M. 2007. “General Method for Estimating the Classification Reliability of Complex Decisions Based on Configural Combinations of Multiple Assessment Scores.” PhD thesis, University of Maryland, USA.
- Downing , S. M. 2004 . Reliability: On the Reproducibility of Assessment Data . Medical Education , 38 : 1006 – 1012 .
- Driessen , E. W. , Tarwijk , J. V. , Overeem , K. , Vermunt , J. D. and Van der Vleuten , C. P. M. 2005 . Conditions for Successful Use of Portfolio for Reflection . Medical Education , 39 : 1230 – 1235 .
- Driessen , E. , van der Vleuten , C. , Schuwirth , L. , van Tartwijk , J. and Vermunt , J. 2005 . The Use of Qualitative Research Criteria for Portfolio Assessment as an Alternative to Reliability Evaluation: A Case Study . Medical Education , 39 : 214 – 220 .
- Dunbar , S. B. , Koretz , D. and Hoover , H. D. 1991 . Quality Control in the Development and Use of Performance Assessments . Applied Measurement in Education , 4 : 289 – 304 .
- Ebel , R. L. and Frisbie , D. A. 1991 . Essentials of Educational Measurement , Englewood Cliffs , NJ : Prentice-Hall .
- Eraut, M., S. Steadman, J. Trill, and J. Parkes. 1996. “The Assessment of NVQs.” Research Report No. 4, University of Sussex: Brighton.
- Good, R. 2002. “Using Discriminant Analysis as a Method of Combining Multiple Measures of Student Performance.” Paper presented at the annual meeting of the AERA, New Orleans, USA, April 1–5.
- Greatorex, J. 2002. “Two Heads are Better than One: Standardizing The Judgements of National Vocational Qualification Assessors.” Paper presented at the British Educational Research Association Conference, Exeter, UK, September 4–6.
- Greatorex , J. 2005 . Assessing the Evidence: Different Types of NVQ Evidence and their Impact on Reliability and Fairness . Journal of Vocational Education and Training , 572 : 149 – 164 .
- Greatorex, J., and M. Shannon. 2003. “How Can NVQ Assessors’ Judgements be Standardized?” Paper presented at the annual conference of the British Educational Research Association, Edinburgh, September 6–8.
- Gwet , K. 2002 . Inter-rater Reliability: Dependency on Trait Prevalence and Marginal Homogeneity . Statistical Methods for Inter-Rater Reliability Assessment , 2 : 1 – 9 .
- Gwet , K. 2008 . Computing Inter-rater Reliability and its Variance in the Presence of High Agreement . British Journal of Mathematical and Statistical Psychology , 61 : 29 – 48 .
- Holsgrove, G. 2010. “Reliability Issues in the Assessment of Small Cohorts.” General Medical Council GMC Supplementary guidance.
- Johnson , M. 2006 . A Review of Vocational Research in the UK 2002–2006: Measurement and Accessibility Issues . International Journal of Training Research , 42 : 48 – 71 .
- Johnson, M. 2008a. “Assessing at the Borderline: Judging a Vocationally Related Portfolio Holistically.” Issues in Educational Research 18 (1): 26–43.
- Johnson , M. 2008b . Exploring Assessor Consistency in a Health and Social Care Qualification Using a Sociocultural Perspective . Journal of Vocational Education and Training , 602 : 173 – 187 .
- Kane , M. 2006 . “ Validation ” . In Educational Measurement , 4th ed. , Edited by: Brennan , R. L. 17 – 64 . Westport , CT : Praeger .
- Kneebone , R. L. , Kidd , J. , Nestel , D. , Barnet , A. , Lo , B. , King , R. , Yang , G. Z. and Brown , R. 2005 . Blurring the Boundaries: Scenario-based Simulation in a Clinical Setting . Medical Education , 39 ( 6 ) : 580 – 587 .
- Kuder , G. and Richardson , M. 1937 . The Theory of the Estimation of Test Reliability . Psychometrika , 23 : 151 – 160 .
- Lane, S., and C. A. Stone. 2006. “Performance Assessment.” In Educational Measurement, edited by R. L. Brennan, 387–430. Westport, CT: Praeger.
- Lord , F. M. and Novick , M. R. 1968 . Statistical Theories of Mental Test Scores , Reading , MA : Addison-Welsley .
- Melville , C. , Rees , M. , Brookfield , D. and Anderson , J. 2004 . Portfolios for Assessment of Paediatric Specialist Registrars . Medical Education , 3810 : 1117 – 1125 .
- Mislevy , R. J. 2009 . “ Validity from the Perspective of Model-based Reasoning ” . In The Concept of Validity: Revisions, New Directions and Applications , Edited by: Lissitz , R. L. 83 – 108 . Charlotte , NC : Information Age .
- Mitchell , L. and Bartram , D. 1994 . The Place of Knowledge and Understanding in the Development of National Vocational Qualifications and Scottish Vocational Qualifications , Sheffield : Employment Dept .
- Murphy , D. J. , Bruce , D. A. , Mercer , S. W. and Eva , K. W. 2009 . Chap. 16 of The Reliability of Workplace-based Assessment in Postgraduate Medical Education and Training: A National Evaluation in General Practice in the United Kingdom . Advances in Health Sciences Education , 142 : 219 – 232 .
- Murphy, R., P. Burke, S. Content, M. Frearson, J. Gillispie, M. Hadfield, R. Rainbow, J. Wallis, and J. Wilmut. 1995. “The Reliability of Assessment of NVQs.” Report presented to NCVQ, School of Education, University of Nottingham.
- Norcini, J. J. 2010. “Workplace-based Assessment”. Chap. 16 In Understanding Medical Education: Evidence, Theory and Practice, edited by T. Swanwick, 232–245. Oxford: John Wiley & Sons.
- Norcini , J. J. , Blank , L. L. , Duffy , F. D. and Fortna , G. S. 2003 . The Mini-CEX: A Method for Assessing Clinical Skills . Annals of Internal Medicine , 138 ( 6 ) : 476 – 481 .
- Osburn , H. G. 2000 . Coefficient Alpha and Related Internal Consistency Reliability Coefficients . Psychological Methods , 5 : 343 – 355 .
- Parkes , J. 2007 . Reliability as Argument . Educational Measurement: Issues and Practice , 264 : 2 – 10 .
- Pollitt , A. 2012 . The Method of Adaptive Comparative Judgement . Assessment in Education: Principles, Policy & Practice , 19 : 281 – 300 .
- Sijtsma , K. 2009 . On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha . Psychometrika , 741 : 107 – 120 .
- Smith , J. K. 2003 . Reconsidering Reliability in Classroom Assessment and Grading . Educational Measurement: Issues and Practice , 224 : 26 – 33 .
- Vleuten , C. P. M. and Schuwirth , L. W. T. 2005 . Assessing Professional Competence: From Methods to Programmes . Medical Education , 393 : 309 – 317 .
- Vleuten , C. P. M. , van Luyk , S. J. and Swanson , D. B. 1988 . Reliability Generalizability of the Maastricht Skills Test . Research in Medical Education , 27 : 228 – 233 .
- Wass , V. and Jolly , B. 2001 . Does Observation Add to the Validity of the Long Case? . Medical Education , 35 : 729 – 734 .
- Whittington , D. 1999 . Making Room for Values and Fairness: Teaching Reliability and Validity in the Classroom Context . Educational Measurement: Issues and Practice , 181 : 14 – 22 .
- Wolf , A. 1995 . Competence-based Assessment , Buckingham : Open University Press .
- Wolf , A. 1998 . Portfolio Assessment as National Policy: The National Council for Vocational Qualifications and its Quest for a Pedagogical Revolution . Assessment in Education, Policy and Practice , 5 ( 3 ) : 413 – 445 .
- Zegers , F. E. 1991 . Coefficients for Interrater Agreement . Applied Psychological Measurement , 15 : 321 – 333 .