Original Articles

Quantifying error in OSCE standard setting for varying cohort sizes: A resampling approach to measuring assessment quality

Pages 181-188 | Published online: 24 Apr 2015
 

Abstract

Background: The use of the borderline regression method (BRM) is a widely accepted standard setting method for OSCEs. However, it is unclear whether this method is appropriate for use with small cohorts (e.g. specialist post-graduate examinations).

Aims and methods: This work uses an innovative application of resampling methods to four pre-existing OSCE data sets (with between 17 and 21 stations each) from two institutions to investigate how the robustness of the BRM changes as the cohort size varies. Using a variety of metrics, the ‘quality’ of an OSCE is evaluated for cohorts ranging from approximately n = 300 down to n = 15. Estimates of the standard error in station-level and overall pass marks, the R2 coefficient, and Cronbach’s alpha are all calculated as cohort size varies.

Results and conclusion: For larger cohorts (n > 200), the standard error in the overall pass mark is small (less than 0.5%), and for individual stations is of the order of 1–2%. These errors grow as the sample size reduces, with cohorts of less than 50 candidates showing unacceptably large standard error. Alpha and R2 also become unstable for small cohorts. The resampling methodology is shown to be robust and has the potential to be more widely applied in standard setting and medical assessment quality assurance and research.

Practice points

  • Using resampling methods one can calculate the error (standard error) in the pass mark calculations under examinee-centred methods of standard setting, such as the borderline regression method (BRM).

  • For large cohorts (e.g. 200+), the BRM pass mark has only a small standard error (<0.5%), providing additional validation of the BRM as a defensible method of standard setting.

  • The standard error in the pass mark grows as the cohort size decreases, and, particularly at the station level, becomes unacceptably large for small cohorts (e.g. n < 50).

  • Small institutions should take care to estimate standard errors in pass marks and ensure that their pass/fail decisions under the BRM are sufficiently defensible.

  • Resampling methods provide robust and conceptually straightforward ways to calculate (standard) error in a range of post-hoc metrics.
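The approach described above can be sketched in code. The following is a minimal illustration (in Python rather than the authors' R code, which is not reproduced here) of the borderline regression method combined with a candidate-level bootstrap to estimate the standard error of a station pass mark. The synthetic data, the 0–4 global rating scale with a borderline grade of 2, and all function names are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def brm_pass_mark(scores, ratings, borderline=2.0):
    """Borderline regression method: regress station scores on examiners'
    global ratings and read off the predicted score at the borderline grade."""
    slope, intercept = np.polyfit(ratings, scores, 1)
    return intercept + slope * borderline

def bootstrap_se(scores, ratings, n_boot=2000, borderline=2.0, seed=0):
    """Standard error of the BRM pass mark, estimated by resampling
    candidates with replacement and recomputing the pass mark each time."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    marks = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # candidate-level resample
        marks[b] = brm_pass_mark(scores[idx], ratings[idx], borderline)
    return marks.std(ddof=1)

# Illustrative synthetic cohort: global ratings on a 0-4 scale,
# checklist scores out of 100 that track the rating with noise.
rng = np.random.default_rng(42)
ratings = rng.integers(0, 5, size=300).astype(float)
scores = 40 + 10 * ratings + rng.normal(0, 8, size=300)

se_large = bootstrap_se(scores, ratings)            # full cohort, n = 300
se_small = bootstrap_se(scores[:30], ratings[:30])  # small cohort, n = 30
print(f"SE (n=300): {se_large:.2f}  SE (n=30): {se_small:.2f}")
```

As the abstract reports, shrinking the cohort inflates the standard error of the pass mark (roughly in proportion to 1/√n), which is why small cohorts yield unstable BRM cut scores.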

Acknowledgements

The authors thank Rob Long from the University of Leeds for help in writing the R code used in this study.

Declarations of interest: The authors report no declarations of interest.

