Abstract
Introduction: OSCEs are commonly conducted in multiple cycles (different circuits, times, and locations), yet the potential for students’ allocation to different OSCE cycles is rarely considered as a source of variance—perhaps in part because conventional psychometrics provide limited insight.
Methods: We used Many Facet Rasch Modeling (MFRM) to estimate the influence of “examiner cohorts” (the combined influence of the examiners in the cycle to which each student was allocated) on students’ scores within a fully nested multi-cycle OSCE.
Results: Observed average scores for examiner cohorts varied by 8.6%, but model-adjusted estimates showed a smaller range of 4.4%. Most students’ scores were only slightly altered by the model; the greatest score increase was 5.3%, and the greatest score decrease was −3.6%, with 2 students passing who would otherwise have failed.
Discussion: Despite using 16 examiners per cycle, examiner variability did not completely counter-balance, resulting in an influence of OSCE cycles on students’ scores. Assumptions were required for the MFRM analysis; innovative procedures to overcome these limitations and strengthen OSCEs are discussed.
Conclusions: OSCE cycle allocation has the potential to exert a small but unfair influence on students’ OSCE scores; these little-considered influences should challenge our assumptions and design of OSCEs.
Acknowledgements
The authors thank Dr. Emyr Benbow, Manchester Medical School, for invaluable expertise in our discussions of OSCE design and conduct, and Dr. Peter D. MacMillan for his thoughtful review and suggestions for improvement of this manuscript.
Disclosure statement
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.
Glossary
Many Facet Rasch Modeling (MFRM): A form of psychometric analysis, derived from item response theory, which models the comparative influence of a number of “facets” (variables that can have an influence) on examinees’ scores. The modeling is then able to estimate the score each examinee would have received if they had encountered a completely neutral example of each facet (for example a completely neutral examiner), and provide an estimated “fair score”.
Bond, T., & Fox, C. (2012). Applying the Rasch Model: Fundamental Measurement in the Human Sciences (2nd ed.). New York & London: Routledge.
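To make the "fair score" idea in the MFRM glossary entry concrete, the following is a minimal Python sketch, not the analysis used in this study. It assumes a common rating-scale formulation of the model, in which the log-odds of scoring in category k rather than k−1 equal examinee ability minus station difficulty minus examiner-cohort severity minus a step threshold. The function names and all parameter values (ability, difficulty, severity, thresholds) are hypothetical and chosen only for illustration.

```python
import math


def category_probabilities(ability, difficulty, severity, thresholds):
    """Probability of each score category 0..K under a rating-scale MFRM.

    All arguments are in logits. thresholds[k - 1] is the step parameter
    for moving from category k - 1 to category k (assumed formulation).
    """
    # Category 0 is the reference (log-weight 0); each later category adds
    # (ability - difficulty - severity - threshold) to the running total.
    log_weights = [0.0]
    for step in thresholds:
        log_weights.append(log_weights[-1] + ability - difficulty - severity - step)
    peak = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, difficulty, severity, thresholds):
    """Model-expected score: sum of category values weighted by probability."""
    probs = category_probabilities(ability, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values, for illustration only.
thresholds = [-1.0, 0.0, 1.0]  # a 0-3 rating scale
ability = 0.5                  # examinee ability
difficulty = 0.0               # station difficulty
cohort_severity = 0.4          # a slightly stringent examiner cohort

# Score expected from the cohort the examinee actually met.
with_cohort = expected_score(ability, difficulty, cohort_severity, thresholds)

# "Fair score": the same examinee and station, but a neutral cohort (severity 0).
fair = expected_score(ability, difficulty, 0.0, thresholds)

print(f"expected score with stringent cohort: {with_cohort:.3f}")
print(f"fair score with neutral cohort:       {fair:.3f}")
```

In this sketch the fair score exceeds the expected score under the stringent cohort, which mirrors the direction of adjustment described in the glossary entry: an examinee who met a more severe examiner cohort is adjusted upward once that cohort's severity is set to neutral.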
Notes on contributors
Peter Yeates, MRCP, PhD, is a Lecturer in Medical Education Research at Keele University School of Medicine, and a consultant in Acute and Respiratory Medicine in Pennine Acute Hospitals NHS Trust. His research focuses on assessor variability and assessor cognition within health professionals’ education.
Stefanie S. Sebok-Syer, PhD, is a postdoctoral fellow at the Centre for Education Research and Innovation specializing in measurement, assessment, and evaluation. Her main interests include exploring the rating behavior of assessors, particularly in high-stakes assessment contexts.