
Quantifying error in OSCE standard setting for varying cohort sizes: A resampling approach to measuring assessment quality

Abstract

Background: The borderline regression method (BRM) is a widely accepted standard setting method for OSCEs. However, it is unclear whether this method is appropriate for use with small cohorts (e.g. specialist post-graduate examinations).

Aims and methods: This work uses an innovative application of resampling methods to four pre-existing OSCE data sets (17 to 21 stations each) from two institutions to investigate how the robustness of the BRM changes as the cohort size varies. Using a variety of metrics, the ‘quality’ of an OSCE is evaluated for cohorts of approximately n = 300 down to n = 15. Estimates of the standard error in station-level and overall pass marks, the R2 coefficient, and Cronbach’s alpha are all calculated as cohort size varies.

Results and conclusion: For larger cohorts (n > 200), the standard error in the overall pass mark is small (less than 0.5%), and for individual stations it is of the order of 1–2%. These errors grow as the sample size decreases, with cohorts of fewer than 50 candidates showing unacceptably large standard errors. Alpha and R2 also become unstable for small cohorts. The resampling methodology is shown to be robust and has the potential to be more widely applied in standard setting and in medical assessment quality assurance and research.
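The article itself presents no code (the analysis was carried out in R), but the following Python sketch illustrates the general resampling idea described above: the BRM pass mark for a station is the checklist score predicted, by a linear regression of score on global grade, at the borderline grade, and its standard error at a given cohort size is estimated from the spread of pass marks across repeated resamples. The simulated data, the 1–5 grading scale, and the bootstrap-style draw with replacement are illustrative assumptions only, not the authors' exact scheme.

```python
import numpy as np

def brm_pass_mark(scores, grades, borderline_grade=2):
    """Borderline regression for one station: regress checklist score on
    global grade and read off the fitted score at the borderline grade."""
    slope, intercept = np.polyfit(grades, scores, 1)
    return intercept + slope * borderline_grade

def pass_mark_standard_error(scores, grades, cohort_size,
                             n_replicates=1000, seed=None):
    """Estimate the standard error of the BRM pass mark for a given cohort
    size: draw many resampled cohorts, recompute the pass mark for each,
    and take the standard deviation across replicates."""
    rng = np.random.default_rng(seed)
    marks = np.empty(n_replicates)
    for i in range(n_replicates):
        idx = rng.choice(len(scores), size=cohort_size, replace=True)
        marks[i] = brm_pass_mark(scores[idx], grades[idx])
    return marks.mean(), marks.std(ddof=1)

# Illustrative single-station data: 300 candidates, scores out of 100,
# global grades on a hypothetical 1 (fail) to 5 (excellent) scale.
rng = np.random.default_rng(0)
grades = rng.integers(1, 6, size=300)
scores = np.clip(40 + 10 * grades + rng.normal(0, 8, size=300), 0, 100)

for n in (300, 100, 50, 15):
    mark, se = pass_mark_standard_error(scores, grades, n, seed=1)
    print(f"cohort size {n:>3}: pass mark {mark:.1f}%, SE {se:.2f}%")
```

Because the whole pass mark calculation is rerun on each resample, rather than relying on an analytic formula, the same wrapper can be pointed at essentially any downstream quality metric.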

Practice points

  • Using resampling methods, one can calculate the (standard) error in pass mark calculations under examinee-centred methods of standard setting, such as the borderline regression method (BRM).

  • For large cohorts (e.g. 200+), the BRM pass mark has only a small standard error (<0.5%), providing additional validation of the BRM as a defensible method of standard setting.

  • The standard error in the pass mark grows as the cohort size decreases, and, particularly at the station level, becomes unacceptably large for small cohorts (e.g. n < 50).

  • Institutions with small cohorts should take care to estimate standard errors in pass marks and ensure that their pass/fail decisions under the BRM are sufficiently defensible.

  • Resampling methods provide robust and conceptually straightforward ways to calculate (standard) error in a range of post-hoc metrics (see the sketch after this list).
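To illustrate the last practice point, the sketch below (same illustrative Python setup as above, with hypothetical helper names and simulated scores) applies the identical resampling wrapper to Cronbach’s alpha; R2 or any other post-hoc metric could be substituted for the metric function.

```python
import numpy as np

def cronbach_alpha(station_scores):
    """Cronbach's alpha for a (candidates x stations) score matrix."""
    k = station_scores.shape[1]
    item_variance = station_scores.var(axis=0, ddof=1).sum()
    total_variance = station_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variance / total_variance)

def resampled_metric(score_matrix, metric, cohort_size,
                     n_replicates=1000, seed=None):
    """Mean and spread of a post-hoc metric across resampled cohorts
    of the requested size."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_replicates)
    for i in range(n_replicates):
        idx = rng.choice(score_matrix.shape[0], size=cohort_size, replace=True)
        values[i] = metric(score_matrix[idx])
    return values.mean(), values.std(ddof=1)

# Illustrative OSCE: 300 candidates x 18 stations, scores out of 100.
rng = np.random.default_rng(0)
ability = rng.normal(60, 10, size=(300, 1))  # per-candidate effect
station_scores = np.clip(ability + rng.normal(0, 12, size=(300, 18)), 0, 100)

for n in (300, 100, 50, 15):
    mean_alpha, sd_alpha = resampled_metric(station_scores, cronbach_alpha,
                                            n, seed=1)
    print(f"cohort size {n:>3}: alpha {mean_alpha:.2f} (SD {sd_alpha:.2f})")
```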

Acknowledgements

The authors thank Rob Long from the University of Leeds for his help in writing the R code used in this study.

Declarations of interest: The authors report no declarations of interest.
