Original Articles

Assessor training: its effects on criterion‐based assessment in a medical context

Pages 143-154 | Received 19 Apr 2007, Accepted 15 Mar 2008, Published online: 20 Jun 2008
 

Abstract

Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, this is often at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as Objective Structured Clinical Examinations (OSCEs), are widely used to assess clinical competence at both the undergraduate and postgraduate levels. However, the issue of setting the pass marks, or passing standard, for such examinations remains contentious. The arrangements for particular OSCE assessment activities usually involve many different assessors and have practical aspects that cannot be exactly duplicated across the cohort. These complexities therefore raise issues with respect to the robustness of the comparative student grading mechanism. This article addresses one aspect affecting the reliability of assessment, namely the effect of assessor training on the awarding of student marks. The article also investigates the gender interaction between assessors, trained and untrained, and students. The findings, which are based on a detailed analysis of final‐year OSCE marks, indicate that untrained assessors award higher marks than trained assessors, and that a gender interaction exists; more specifically, the use of untrained assessors tends to benefit female students over male students. The tension between reliability and validity has been particularly important in the field of medical education for a number of years, with medical students in the latter stages of their courses often being required to demonstrate competence in a variety of simulated clinical activities with different patients, in front of different assessors, in different hospitals and on different days. The complex nature of the OSCE arrangements raises serious questions as to the robustness of the setting of the pass/fail boundary and, to a lesser extent, the honours boundaries.

Notes

1. For a summary of many of the government initiatives, see Pell (Citation2002).

2. This is the method of standard setting used to turn the OSCE marks discussed in this article into pass/fail decisions. In simple terms, at each station assessors award criterion‐based marks, which are totalled, together with an overall global grade; the marks are then regressed on the grades to calculate the pass mark. A minimal illustrative sketch of this calculation is given after these notes.

3. At each station, a particular clinical task is set for the student to carry out.

4. See Pell and Roberts (Citation2006) for a detailed description of the Borderline Regression Method as implemented at Leeds Medical School.

5. This is an internal quality initiative implemented at the University of Leeds.

6. A detailed explanation of the borderline method may be found in Dauphinee et al. (Citation1997).

7. In addition to the 12 single stations, which each lasted about five minutes, there were six double stations, which lasted 13 minutes and carried approximately double the marks. The marks for these double stations are not included in any of the analysis presented in this article.

8. The actual number of individual assessors employed over a year will be slightly lower than this, since some assessors examine more than one OSCE in a year. In particular, there is some overlap between assessors for Years 3 and 5, and for the resits more generally.

9. The authors would have preferred to take account of any hierarchical clustering present in the data, but since this is a retrospective study and the data were anonymized, this was not possible.

10. In addition to the analysis presented in this article, the authors have also decomposed the data to see whether it was possible to identify any systematic causes of variance beyond the fixed effects described above. For example, the variance of the marks awarded by trained and untrained assessors is remarkably similar, and the patterns described in the analysis apply equally to the upper and lower halves of the student cohort when split by performance. Hence, we are confident that no important additional fixed effects are present in the data.

11. The same direction and magnitude of difference in mean marks was also found at the six double stations. At these stations, the untrained assessors' mean mark was 2.3 marks higher than that of the trained assessors. (Double stations, lasting 13 minutes instead of the 6 minutes allowed for the single stations, carry approximately twice the marks.)

12. The inference here is based on the assumption that the use of only trained assessors would not have affected the pass mark. Whilst not strictly true, this effect would not be particularly large (a few marks), and the overall conclusions in terms of the number of additional failures are not materially affected.

13. For other year‐groups, the corresponding results were: Y3, p < 0.001 (F = 33.27, df = 3); Y4, p = 0.045 (F = 2.736, df = 3).

14. For other year‐groups, the corresponding results were: Y3, p < 0.001 (F = 32.89, df = 3); Y4, p = 0.080 (F = 2.268, df = 3). A hedged sketch of the kind of four‐group comparison these degrees of freedom imply is given after these notes.

15. These extra requirements are for the two major groups of stations in ‘examination’ and ‘history taking’.
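By way of illustration, the sketch below shows a borderline regression pass‐mark calculation of the kind described in note 2 for a single station. The data, the numeric grade coding and the choice of regression routine are assumptions made purely for illustration; the implementation actually used at Leeds is the one described in Pell and Roberts (Citation2006) and may differ in detail.

```python
# Minimal sketch of a borderline regression pass-mark calculation (note 2).
# All data and the grade coding below are invented for illustration only.
import numpy as np

# Criterion-based checklist marks awarded at one station, one per student.
marks = np.array([14, 17, 22, 25, 27, 30, 31, 33, 35, 38], dtype=float)

# Overall global grades for the same students, coded numerically
# (assumed coding: 0 = clear fail, 1 = borderline, 2 = clear pass, 3 = good).
grades = np.array([0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)

# Regress the checklist marks on the global grades (simple least squares).
slope, intercept = np.polyfit(grades, marks, deg=1)

# The station pass mark is the predicted checklist mark at the
# 'borderline' grade.
BORDERLINE_GRADE = 1.0
pass_mark = slope * BORDERLINE_GRADE + intercept
print(f"Station pass mark: {pass_mark:.1f}")
```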

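Notes 13 and 14 report F‐tests with three degrees of freedom, which is consistent with a comparison across four groups, plausibly the four assessor–student gender pairings examined in the article. The sketch below shows how such a four‐group comparison might be run; the grouping, the simulated marks and the test routine are assumptions for illustration only and should not be read as the authors' exact analysis.

```python
# Hedged sketch of a four-group comparison consistent with df = 3
# (notes 13 and 14). The grouping by assessor/student gender pairing and
# the simulated marks are illustrative assumptions, not the study's data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Simulated station marks for four hypothetical assessor-student pairings.
male_assessor_male_student = rng.normal(26.0, 4.0, 200)
male_assessor_female_student = rng.normal(26.5, 4.0, 200)
female_assessor_male_student = rng.normal(26.0, 4.0, 200)
female_assessor_female_student = rng.normal(27.5, 4.0, 200)

# One-way ANOVA across the four groups; between-groups df = 4 - 1 = 3.
f_stat, p_value = f_oneway(
    male_assessor_male_student,
    male_assessor_female_student,
    female_assessor_male_student,
    female_assessor_female_student,
)
print(f"F = {f_stat:.2f}, df = 3, p = {p_value:.4f}")
```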