Original Articles

Quantifying error in OSCE standard setting for varying cohort sizes: A resampling approach to measuring assessment quality
