Abstract
Introduction: In recent decades, there has been a move towards standardized models of assessment in which all students sit the same test (e.g. OSCE). By contrast, in a sequential test the examination is in two parts: a “screening” test (S1) that all candidates take, and a second test (S2) that only the weaker candidates sit. This article investigates the diagnostic accuracy of this assessment design, and examines failing students’ subsequent performance under this model.
Methods: Using recent undergraduate knowledge and performance data, we compare S1 “decisions” to S2 overall pass/fail decisions to assess diagnostic accuracy in a sequential model. We also evaluate the longitudinal performance of failing students using changes in percentile ranks over a full repeated year.
Findings: We find a small but important improvement in diagnostic accuracy under a sequential model (on the order of 2–4% of students misclassified under a traditional model). Further, after a resit year, weaker students’ rankings relative to their peers improve by 20–30 percentile points.
Discussion: These findings provide strong empirical support for the theoretical arguments in favor of a sequential testing model of assessment, particularly that diagnostic accuracy and longitudinal assessment outcomes post-remediation for the weakest students are both improved.
Ethics
The University of Leeds gave permission for this anonymized data to be used for research. The co-chairs of the University of Leeds School of Medicine ethics committee confirmed to the authors that formal ethics approval for this study was not required as it involved the use of routinely collected student assessment data which were fully anonymized prior to analysis.
Disclosure statement
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.
Glossary
Sequential testing: In a sequential test (Pell et al. 2013), the examination is in two parts, beginning with a ‘screening’ test that all candidates take, often set with a higher passing threshold to avoid false positive decisions. Students not achieving this threshold have not failed the test at this point, but are required to take a further ‘additional’ test to provide more evidence as to their ‘true’ performance. The full sequence must be blueprinted collectively, and careful consideration should be given to the selection of topics/domains assessed in the first part of the sequence.
Pell, G., Fuller, R., Homer, M. and Roberts, T. 2013. Advancing the objective structured clinical examination: sequential testing in theory and practice. Medical Education. 47(6), pp. 569–577.
Notes on contributors
Matt Homer, BSc, MSc, PhD, CStat, is an Associate Professor, working in both the Schools of Medicine and Education. His medical education research focuses on psychometrics and assessment quality, particularly related to OSCEs and knowledge tests.
Richard Fuller, MA, MBChB, FRCP, FacadMed, is a consultant physician, Professor of Medical Education and Director of the undergraduate degree programme at Leeds Institute of Medical Education. His research interests focus on the ‘personalisation’ of assessment, to support individual learner journeys, through application of intelligent assessment design in campus and workplace based assessment formats, assessor behaviours, mobile technology delivered assessment and the impact of sequential testing methodologies.
Godfrey Pell, BEng, MSc, C.Stat, C.Sci, is a principal research fellow emeritus at Leeds Institute of Medical Education, who has a strong background in management. His research focuses on quality within the OSCE, including theoretical and practical applications. He acts as an assessment consultant to a number of medical schools.
Notes
1 This is calculated as (cut score – mean score)/SD.
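In symbols, and using notation assumed here rather than taken from the article (with $c$ the cut score, $\mu$ the cohort mean score, and $\sigma$ the standard deviation of scores), this standardized distance is:

```latex
z = \frac{c - \mu}{\sigma}
```

A positive value indicates a cut score above the cohort mean; for example, a cut score one standard deviation above the mean gives $z = 1$.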