
A realist evaluation of how, why and when objective structured clinical exams (OSCEs) are experienced as an authentic assessment of clinical preparedness

Received 28 Nov 2023, Accepted 02 Apr 2024, Published online: 18 Apr 2024

Abstract

Introduction

Whilst rarely researched, the authenticity with which Objective Structured Clinical Exams (OSCEs) simulate practice is arguably critical to making valid judgements about candidates’ preparedness to progress in their training. We studied how and why an OSCE gave rise to different experiences of authenticity for different participants under different circumstances.

Methods

We used Realist evaluation, collecting data through interviews and focus groups with participants from four UK medical schools who had taken part in an OSCE which aimed to enhance authenticity.

Results

Several features of OSCE stations (realistic, complex, complete cases, sufficient time, autonomy, props, guidelines, limited examiner interaction etc) combined to enable students to project into their future roles, judge and integrate information, consider their actions and act naturally. When this occurred, their performances felt like an authentic representation of their clinical practice. This didn’t work all the time: focusing on unavoidable differences with practice, incongruous features, anxiety and preoccupation with examiners’ expectations sometimes disrupted immersion, producing inauthenticity.

Conclusions

The perception of authenticity in OSCEs appears to originate from an interaction of station design with individual preferences and contextual expectations. Whilst tentatively suggesting ways to promote authenticity, more understanding is needed of candidates’ interaction with simulation and scenario immersion in summative assessment.

Practice points

  • OSCEs comprise a simulated assessment of candidates’ clinical skills. The authenticity with which OSCEs simulate practice arguably contributes to the validity of the assessment.

  • Authenticity within an OSCE cannot be assumed but appears to emerge from the interaction of station design, individual preferences and contextual expectations of assessment.

  • Several actions can be tentatively recommended to promote OSCE authenticity, but they require considered application in practice as their influence depends on context.

  • Enabling candidates to demonstrate an authentic representation of their usual clinical performance within OSCEs may enhance fairness whilst promoting greater engagement in clinical learning.

Introduction

Objective structured clinical exams (OSCEs) are a key component of many assessment programmes within health professionals’ education internationally because of their ability to provide comparable, blueprinted assessments for high-stakes purposes (Boursicot et al. 2021). Whilst beneficial, one observation is inescapable: OSCEs assess candidates’ performance within simulated rather than real practice (Adamo 2003). Simulated assessment requires that students, simulated patients, and examiners engage in a “fiction contract” (Dieckmann et al. 2007): they need to suspend disbelief in order to inhabit the simulated scenario, particularly if it departs from the reality of their prior clinical experience. When OSCEs are used to determine whether learners are ready to progress into the next phase of training or practice, their validity depends (at least in part) on how authentically they represent the typical work, judgement, skills, and behaviours from the practice into which candidates will progress (Hodges 2003b). Following Lavoie et al. (2020), we assert that whilst authenticity encompasses elements of fidelity (how well the scenario replicates practice), it also includes how learners interact with the scenarios and their subjective responses to them. Relating this to Kane’s model of validity (Cook et al. 2015), if the OSCE poorly evokes how a candidate will think or behave in practice, then the extrapolation domain of the validity argument will be challenged.

Despite the importance of authenticity to OSCEs’ validity, there has been little research on this topic. The majority of the OSCE literature focuses on psychometric properties (Swanson and van der Vleuten 2013). Commonly, content evidence is sought as part of the evidentiary chain which supports score interpretation for a particular purpose (Downing 2003; Kane 2013), but this evidence typically involves establishing how comprehensively the stations within the OSCE sample a blueprint (Kreptul and Thomas 2016). Whilst content alignment and sampling are critical, this evidence does not address authenticity at the level of station design.

The small body of literature on authenticity within OSCEs has highlighted a number of themes. Fragmentation of clinical tasks (Gupta et al. 2010; van der Vleuten et al. 2010; Nasir et al. 2014), or scenarios which poorly represent practice (either through unrealistic patient presentations or because the actions expected of candidates are incongruent with usual responses in clinical practice (Hyde et al. 2022)), contribute to inauthenticity. Poor ‘task time agreement’ (i.e. time pressure) causes candidates to hurry, altering their interaction with (simulated) patients (Gormley et al. 2016) and/or simply preventing the testing of many realistic scenarios (Marwaha 2011). Simulated patients’ (SPs’) portrayals of cases may be inauthentic, either because SPs comply with examination instructions more readily than is typical of real patients or because scenario scripting reflects professional stereotypes of patients (Gormley et al. 2016; Hyde et al. 2022). “The OSCE” itself has been conceptualised as theatre (Hodges 2003a; Gormley et al. 2016), in which the triadic interaction between candidate, examiner and simulated patient produces a ritualised and rehearsed show which interferes with more authentic interaction between the candidate and simulated patient. Moreover, candidates’ expectations around how stations are ‘scored’ may produce perverse incentives towards inauthentic behaviour (Gormley et al. 2016). Harrison (2018) describes a lack of agency among candidates within observed scenarios, which inhibits them from displaying their mastery of a skill. More concerningly, learners’ avoidance of clinical learning prior to OSCEs (Rudland et al. 2008) may emanate from misalignment between authentic practice and their sometimes-formulaic expectations of OSCEs.

Whilst these critiques might challenge the utility of OSCEs, we, like others (Chan and Rashid 2023), assert that there are several reasons to persevere. Firstly, workplace-based assessments are not without challenges (Watling et al. 2016; Schumacher et al. 2022; LaDonna et al. 2017; Phinney et al. 2022; Melvin et al. 2019; Yepes-Rios et al. 2016; Sebok-Syer et al. 2018). Secondly, simulated assessment enables testing of (particularly undergraduate) preparedness for scenarios which candidates would rarely or never manage in practice. Thirdly, OSCEs demonstrate a commitment to equivalent and therefore comparable testing of candidates. Nevertheless, OSCEs’ contribution to programmes of assessment would be enhanced if their authenticity could be improved. Given that relatively little work has considered how the authenticity of OSCEs might be enhanced, we explored how students’ and examiners’ interaction with station design and OSCE conduct may give rise to authenticity (or its absence) within OSCEs. To do this we focused on a comparatively novel approach to station design in use at one of the participating institutions, whilst acknowledging that similar approaches are used in different ways by many institutions. For reference, these were called “authentic OSCE stations” (see Methods section). We then asked the question: “How do candidates and examiners use and interact with authentic OSCE stations and how, why, under what circumstances, and for whom does this influence the perception of the OSCE being an authentic test of candidates’ preparedness for their future roles?”

Methods

Data were collected within the AD-Equiv study (Yeates et al. 2022), which aimed to enhance the authenticity, diagnostic accuracy and equivalence of a distributed graduation-level OSCE. We adopted a Realist ontology and epistemology, which holds that whilst mind-independent realities exist, knowledge of these realities is constructed by observing and deducing how different contexts interact with particular mechanisms to produce different outcomes for different people (Pawson and Tilley 1997). We adopted the Realist evaluation approach (Duddy and Wong 2022; Greenhalgh et al. 2017), gathering data through realist interviews (Manzano 2016; Greenhalgh et al. 2017) and focus groups with students, examiners and simulated patients who had recently experienced authentic OSCE stations. Focus groups were held immediately after OSCEs to explore students’ perspectives whilst the experience remained recent, using the interplay between participants to explore why different students responded differently. Interviews, conversely, allowed more in-depth individual responses and reflections, with between a few hours’ and a few weeks’ hindsight. A comparison of our method with the RAMESES II checklist (Wong et al. 2016) is provided in Supplementary Appendix 3.

The intervention: introducing ‘authentic’ OSCE stations

Authentic OSCE stations were designed using the following principles: stations should feature a scenario from the typical work of UK foundation doctors (PGY1-2); (as far as practical) candidates should complete the whole patient encounter; station information should describe the clinical context; and (rather than instructing them to perform specific actions) students should use their clinical judgement to decide and then do what they would do in practice. Interaction with the examiner was kept to a minimum to encourage agency by students. Pilot work ensured task-time agreement for these newer authentic stations. Supplementary Appendix 2 contains further details of the OSCE.

Population, sampling and recruitment

Our study population comprised participants (candidates, examiners and simulated patients) in OSCEs for students in the latter stages of UK undergraduate medical training. We sampled purposively (aiming for diversity of geography and schools) from four UK medical schools (one from each of the UK’s nations: England, Northern Ireland, Scotland and Wales), including established and comparatively new medical schools, recruiting from the 4th (penultimate) and 5th (final) years in these schools. Participants were recruited via email and verbal invitation.

Initial programme theory

A central aspect of any theory-driven realist investigation is to develop an initial programme theory (IPT). An IPT can be used to frame and understand how, for whom, why, and under what contexts an intervention is expected to work (Duddy and Wong 2022).

Our IPT posited that the combination of elements in authentic OSCE stations would collectively increase authenticity for candidates because they would enable demonstration of students’ ability to work through a clinical task in a manner more closely representing routine clinical practice. We expected this to be mediated by candidates’ understanding of the task and the degree of challenge the stations presented.

Data collection

Interviews were performed via Microsoft Teams within 3 months of the OSCE. Student focus groups were performed in person immediately after the OSCE. All data were audio-recorded and transcribed.

Researchers used a question template which explored comparisons between participants’ perceptions of authentic OSCE stations and their usual OSCE station format, focusing on how and why elements of the intervention (content, timing, stimulus materials, whole tasks etc) evoked different responses and produced different outcomes for different stations, different people or within different schools. Consistent with recommendations for realist interviewing (Greenhalgh et al. 2017), researchers adapted their questions to pursue emergent issues and test elements of the evolving programme theory based on their impressions from recent analysis. Throughout interviews, researchers sought to establish a safe, inclusive and confidential environment.

Analysis

Data were analysed using a realist logic of analysis (Pawson and Tilley 1997; The RAMESES II Project 2017). Data analysis progressed in parallel with data collection and informed subsequent interview conduct. Two researchers (PY and AM) read portions of the data for familiarisation and to discuss sensitising concepts (e.g. whole tasks, time, fragmentation, agency) in relation to the IPT. One researcher (AM) then coded all data, inductively and theoretically labelling concepts which appeared relevant to the IPT. Ancillary coding was performed by PY to further aid sensitisation. Codes were gathered into similar categories. Researchers scrutinised these categories looking for relevant contexts, mechanisms and outcomes, which were then organised into context-mechanism-outcome configurations (CMOCs), using the analytic processes of juxtaposition (comparing data from different sources to find patterns), reconciliation (resolving discrepancies between data sources), adjudication (weighing the evidence to decide on the most plausible explanations), and consolidation (integrating findings into a coherent whole) (Papoutsi et al. 2018). As CMOCs were produced, researchers considered their relevance to the evolving programme theory. GW provided additional, data-driven critique of the CMOCs as they were refined. Analysis progressed until the researchers judged that plausible CMOCs, aligned with demi-regularities (semi-recurrent outcome patterns) within the data, had been developed and a mature programme theory had been produced.

Reflexivity: Our research team included clinical (PY, RK, KC, AC, CC, RG, SE, RF, RMK, GW) and non-clinical (NC, VON, RV) OSCE experts, including people who developed the authentic OSCE format at Keele (RK, NC) and people from medical schools which use different OSCE formats (KC, AC, VON, CC, RG, RV, SE). It also included Realist evaluation experts (GW) and other researchers with little experience of OSCEs (AM). Consequently, whilst the team was interested in the potential benefits of authentic OSCEs, this diversity produced contrasting views of the benefits and challenges of elements of the intervention (for example, the balance between giving students instructions and enabling autonomy), which in turn challenged and refined the CMOC descriptions.

Ethics

All participation in the study was voluntary, and participants had the right to withdraw. Informed consent was documented through electronic consent forms. All data were pseudonymised and treated as confidential. Neither of the interviewers (PY or AM) had direct course responsibility for students or their progression. Ethical approval for the study was granted by Keele Research Ethics Committee (ref MH-210209).

Results

The relevant portions of 35 interviews and 3 focus groups were included, comprising a total of 15 candidates, 13 examiners and 7 simulated patients. Our analysis produced 13 CMOCs, each providing a data-driven theoretical claim about the mechanistic operation of a facet of the intervention in a given context. Individual CMOCs with supporting evidence can be seen in Supplementary Appendix 1 and are summarised in Table 1. Here we present a narrative of the programme theory which they support, referencing the relevant CMOCs within the text along with some abbreviated illustrative quotes.

Table 1. Titles of CMOCs.

In this study, many of the scenarios we created enabled students to demonstrate what felt, for them, like a very authentic representation of how they would typically behave in practice. For these students, the OSCE therefore felt like a fair means to assess their readiness to progress in practice, and an improvement on existing assessments. This didn’t, however, work for everyone or in every station.

The degree to which students and examiners experienced the scenarios as authentic was influenced by a number of factors which interacted with each other differently in each scenario. Several of these were characteristics of the station design and emerged from the principles of authentic OSCE stations which underpinned the intervention. Others were person-related or related to people’s norms and expectations, or more broadly the culture within the assessment.

When authenticity was enhanced

When scenarios effectively represented the typical work of F1 doctors, this gave the scenario credibility (CMOC_1a_TW) and situated students’ thinking in the clinical environment because it aligned with students’ experiences and expectations of their future roles (CMOC_1b_TW).

I also liked the presentations that the patients had were things that you would have to manage as an F1 [okay], it wasn’t like spotting signs of something you know, it was very much like acute presentation, managing the patient you know, things that you do to stabilise them before you would refer them onto the next team so, I thought that was quite useful. (24th AD equiv, Focus Group Site4, Students)

This in turn caused them to project themselves into that role, which enabled them to think and behave as they would in practice (CMOC_1c_TW). This helped them to act naturally. When clinical problems were rich and complex enough to be unlikely to have been rehearsed, and to require nuanced approaches, students found that standard answers weren’t appropriate and had to draw from their experience to propose solutions (CMOC_1b_TW and CMOC_2_WT). Conversely, less complex or more predictable scenarios were more likely to evoke rehearsed responses. Asking students to complete the whole clinical task gave them space to integrate and follow through on the information they chose to gather. This tapped into their evolving clinical acumen whilst avoiding fragmentation, so offered a better representation of what students would actually do in practice (CMOC_2_WT).

This [referring to the authentic OSCE] is the sort of task that I’m going to have to be able to do altogether in a few months’ time… it’s, it’s reflected in kind of like my ward placements as well. … I know how to take a, a fairly comprehensive history. I know how to examine a patient. It’s – but it’s the – it’s putting it altogether and it’s the, the decision-making that comes during that and afterwards as well; … it’s the putting it altogether and the, the, the synthesising all of the information. (23rd AD-Equiv, interview, site3 Stud6).

This was further aided both by the comparatively limited direction about what actions to perform and by the avoidance of prompting by examiners; students had to work out what to do, and their choices had consequences for the outcome of the station. This forced students to make judgements about what was appropriate (CMOC_3a_LP), which advantaged students who had engaged more in the clinical environment, as they could draw from their experience to integrate information and direct their actions (CMOC_3b_LP).

it helped me see the station more as something like real-life … I could go in and kind of make my own plan and, and go in my own direction. I suppose it maybe adds to a little bit of uncertainty because you don’t know if you’re going down the right track or not. … knowing that even if I was doing something wrong or, … no one would interrupt to tell me … so it was definitely a lot more like, like real-life (23rd AD-Equiv, interview, Site3 Stud6)

Providing sufficient time aided this as it enabled students to think and integrate without either panicking or defaulting to rehearsed routines (CMOC_4_Ti).

it made me kind of be a bit more realistic because it’s kind of like, ‘Okay. I don’t have to worry about, you know, seven-minute timer, eight-minute timer’. It’s like I can actually think and not just regurgitate (16th AD-Equiv, interview, site 3 stud3)

Supplying relevant resources (such as guidelines) avoided artificial reliance on memory and let students show their familiarity with accessing information. This further aligned with what they would do in practice (CMOC_5_Tr). Use of simple props (e.g. a telephone) made scenarios look and feel more realistic, which further aided students’ immersion and therefore helped them to act naturally (CMOC_6_Pr). When scenarios required students to demonstrate actions (for example, using a probe to measure oxygen saturations rather than simply stating an intention to measure them), this added further to the assessment’s authenticity because it tested relevant workplace skills rather than just knowledge (CMOC_7_Dem). This contributed to students’ projection into their future roles and their ability to demonstrate their preparedness.

When authenticity was challenged

This didn’t work for everyone. For example, a minority of students and examiners found it hard to move beyond the fact that the actor was not a real patient and that the students’ actions would have no impact on them. This (or any) focus on unavoidable differences from practice reduced their engagement in the scenarios’ fiction and therefore made the assessment feel like an artificial test rather than a useful representation of the students’ practice (CMOC_8_DFC).

You know there’s someone watching you, so you’re kind of doing things to look like you’re doing things, rather than to really be looking for what the problem is, whereas in a real patient, you don’t bother with any of the OSCE faff. You just get straight to what’s important (11th AD-Equiv, interview Site3 st1)

The way in which scenarios were enacted also influenced students’ immersion. When elements were incongruous with real practice (e.g. a simulated patient looked comfortable when they were supposed to be seriously unwell), this could lead students to make assumptions or feel confused, which could impact their performance (CMOC_9_Inc), but perhaps more importantly presented a dilemma about where the limits lay between doing what they should do in practice and demonstrating a skill in an artificial way. Where scenarios could legitimately be approached in different ways (for example, a focused versus a more systematic physical examination, or different reasonable management approaches), students and examiners were unsure what to expect or how to behave, because they struggled to reconcile differences between approaches which may be acceptable in practice and typical expectations of formal exams (CMOC_10_LDP). This tended to further reduce students’ inclination to show their authentic clinical performance.

I love the idea of talking to the patient naturally and naturally leading onto an examination without being artificially put into that scenario … [but] there is a set marking scheme, there is a set thing we need to do … I was trying to think, okay, what do the examiners want from me? [okay] if I don’t do the clubbing, would I get marked down? (24th AD Equiv focus group, site4 students).

Some students struggled with the lack of precise direction about what actions to perform; they were used to demonstrating specified skills, and the need to use judgement made them anxious and made it hard for them to behave authentically (CMOC_11a&b_Anx).

Students described OSCEs which impose ritualistic, inauthentic expectations as reducing their engagement with clinical learning environments, because their clinical learning isn’t rewarded in OSCEs (CMOC_12_Rit). Conversely, students perceived that the more OSCEs reward demonstration of authentic practice, the more they would prepare by spending time in clinical environments, resulting in greater preparedness for practice (CMOC_13a&b_IOL).

So maybe like by making the OSCE a bit more realistic and sort of something we can learn by being on the ward. You know, it might encourage us to prepare for practice a bit more. (8th AD-Equiv, focus group, site2 students)

Discussion

Summary of programme theory

Most students, when asked to respond to “authentic” scenarios, recognised the applicability of the scenarios to their future roles and were able to act naturally and use their evolving clinical acumen to demonstrate an authentic representation of their preparedness for practice. This recognition was promoted when scenarios were typical of the work of new doctors, were presented as complete tasks, and required students to use their judgement to gather and integrate information, reach decisions, and then communicate with and manage the patient. Students needed sufficient time to think and avoid rushing. Props and resources which were typical of the situation aided immersion. Students who had been more engaged in clinical practice recognised the applicability of scenarios to their future roles and were able to act more naturally.

Some circumstances challenged authenticity: when students or examiners focused on unavoidable differences with practice, or when incongruous elements of station implementation produced assumptions or confusion for students, or where students were preoccupied with the examiners’ expectations, this tended to inhibit immersion. These contexts caused students to question how naturally to behave, which produced anxiety and reduced the authenticity of the assessment for both students and examiners.

Theoretical interpretation

Our findings suggest that authenticity in OSCEs critically depends on how effectively candidates (and, to a lesser but still important degree, examiners and simulated patients) suspend their disbelief and immerse themselves in the “fiction contract” (Dieckmann et al. 2007) required by the scenario. Several of our findings resonate with literature on fidelity in simulation: the use of cases drawn from real practice (Maclean et al. 2019), the need to actually perform (rather than state) tasks (Engström et al. 2016), the use of environmental cues (e.g. props) (Nanji et al. 2013) and the chance for the scenario to evolve based on the student’s actions (Marei et al. 2018). What our work adds is the observation that these can combine within an OSCE scenario in a way which enables students to project into their future role and thereby act out (and so demonstrate) their readiness to progress to the next stage of clinical practice. Moreover, students’ desire to do well in an assessment context sometimes appeared to conflict with their ability to be immersed, suggesting that immersion is more complex in assessment scenarios than when simulation is focused on learning. This may partly be because people find it hard to behave naturally whilst being observed (i.e. the Hawthorne effect) (Adair 1984), a phenomenon which also occurs in workplace-based assessment (Watling et al. 2016). The assessment context may also make it harder for students to judge how far to suspend disbelief (can I trust my visual gestalt about a “patient”, or do I need to demonstrate the steps I would take with a sicker patient in order to obtain marks?). Whilst this relates to Dieckmann et al.’s (2007) idea of shared rules, it shows that in an assessment context a conflict may arise between allowing enough autonomy to behave authentically and fairly communicating expectations. Consequently, whilst many principles of simulation fidelity are well established, we have shown that their application in an assessment context is more complex.

Immersion in the scenarios appeared to work less well for some people than others. Our data aren’t able to explain this observation fully and further investigation is warranted. Possible explanations include individual differences in abstract thinking ability or imagination. In other literature, differences in people’s ability to be immersed in virtual reality have been attributed to variation in spatial awareness (Coxon et al. 2016), whilst differences in engagement with a story’s narrative have been attributed to differences in emotional sensitivity (Samur et al. 2021). Consequently, there may be some individuals for whom engagement with the fiction contract is simply more difficult. This suggests the need for further exploration of the validity implications of OSCEs for such people.

Practical implications

Based on our findings, whilst acknowledging the need for replication in summative contexts, we tentatively suggest a number of principles which may be used to enhance the authenticity of assessment in OSCEs (see Box 2). As with all complex interventions, implementing these principles will require careful consideration within the context in which they are used. The suggestion to allow sufficient time may result in fewer, longer stations. This may reduce sampling of the OSCE blueprint, which has itself been proposed as a major source of validity evidence (Wass et al. 2001). The comparative reliability of these stations is not yet well established and is hard to predict. On one hand, fewer independent observations may reduce reliability (Eva 2018); on the other, judgements which align more closely with clinicians’ experience of practice could support reliability (Crossley et al. 2010) whilst focusing the assessment on the complex problem-solving skills that are typical features of contemporary taxonomies of learning (Dubas and Toledo 2016). This debate is not new; Cook et al. (2015) note a tension between what Kane (2013) describes as extrapolation (using the score[s] as a reflection of real-world performance) and generalisation (using the score[s] as a reflection of expected performance across all possible test settings), or what has sometimes historically been referred to simply as validity versus reliability (van der Vleuten 1996). Authenticity of assessment tasks is key to extrapolating from the current observation to likely behaviour in practice. Consequently, we assert that by thoughtfully applying the principles we have proposed, educators may increase the ability to extrapolate from OSCEs to likely performance in real life. Further work will be required to balance this against the impact on generalisation.

Given the practice-focused nature of these tasks, we recommend that both station writing and examining should ideally be performed by people who are highly familiar with the current work and expectations of new doctors’ roles.

Strengths and limitations

Our findings are supported by data sampled from four different institutions and a large number of participants, and by the use of Realist evaluation to study the operation of this station format across different contexts. Despite this, the study has some limitations. Firstly, participation by students and examiners was voluntary, so their perspectives may not fully represent those of wider participants. Secondly, the assessment was conducted in a formative context, which could have produced different behaviour or reactions by participants compared with a summative setting; consequently, it would be useful to determine whether our findings hold true in routine summative settings. Our findings are also limited by the sample of stations we used. Whilst these encompassed a representative range of tasks, further samples of stations designed along the same principles could potentially produce different results. Finally, students could potentially have encountered the simulated patients in this study at previous times during their studies, when the simulated patients were playing other roles. As we did not collect data on this, we cannot comment on the degree to which this may have influenced authenticity. This would benefit from exploration in future work.

Future research

Future research should examine how these principles operate across different contexts, including summative and postgraduate assessments, and when applied to a wide range of other OSCE station formats. It should also seek contexts where the principles may achieve different outcomes, or provide further insight into some of the remaining challenges, such as how to manage legitimately different approaches to the same task, different formats of station instruction, or how to avoid incongruous elements in station design. Theoretical work should explore how candidates, examiners and simulated patients engage with the fiction contract within summative assessment and its implications for assessment validity and outcomes for individuals.

Conclusions

The validity with which OSCEs can inform decisions on students’ preparedness to enter practice is (at least partly) dependent on how authentically the OSCEs simulate the practice into which students will progress. Responding to challenges to OSCEs’ authenticity, we have shown how the combination of design features used within “authentic OSCE stations” operated to increase assessment authenticity for the majority of students across a range of scenarios. The principles which we have developed represent a further advancement in our shared understanding of good assessment design and we encourage their use to enhance assessment authenticity within OSCEs.

Author contributions

Peter Yeates: substantial contribution to study conception, design, data collection, analysis and drafting. Adriano Maluf: substantial contribution to study conception, design, data collection and analysis. Ruth Kinston: substantial contribution to study conception, design, data collection and analysis. Natalie Cope: substantial contribution to study conception, design, data collection and analysis. Kathy Cullen: substantial contribution to study conception and data collection. Aidan Cole: substantial contribution to study conception and data collection. Vikki O’Neill: substantial contribution to study conception and data collection. Ching-wa Chung: substantial contribution to study conception and data collection. Rhian Goodfellow: substantial contribution to study conception and data collection. Rebecca Vallender: substantial contribution to study conception and data collection. Sue Ensaff: substantial contribution to study conception and data collection. Rikki Goddard-Fuller: substantial contribution to study conception and data analysis. RK McKinley: substantial contribution to study conception and data analysis. Geoff Wong: substantial contribution to study conception and data analysis.


Acknowledgements

We would like to acknowledge and thank all students, simulated patients, examiners, assessment and administrative staff in each of the four participating medical schools who contributed to the study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data sharing

Data are not available in a repository.

Box 1.

Glossary of Realist Evaluation Terminology:

Context: Any condition that triggers and/or modifies the behaviour of a mechanism. Context refers to the important feature(s) of the circumstances in which an intervention ‘works’ (or a phenomenon happens) which ‘trigger’ the mechanisms that generate outcomes.

Mechanism: The underlying process by which outcomes are caused. Mechanisms are usually descriptions of the tendencies, reasoning and behaviour of agents involved in a process or participants in an intervention and their response to the important context(s) in which they exist.

CMOC: ‘Context–mechanism–outcome configuration’; a diagrammatic or narrative description offering an explanation of the relationship between some particular context(s), mechanism(s) and outcome(s). Multiple CMOCs may exist within a single programme theory.

Programme theory: A theory that describes what an intervention comprises and how it is expected to work, or the process by which the outcomes of interest are thought to come about (expressed as a narrative description or in a diagram). A realist programme theory is expressed in terms of the relationships between relevant context(s), mechanism(s) and outcome(s) (CMOCs)—and the relations between CMOCs.

See Duddy and Wong (2022) for reference.

Box 2. Summary recommendations to enhance authenticity in OSCEs

1. Select tasks typical of new doctors’ work.

2. Ask students to complete the whole clinical encounter as far as possible.

3. Encourage students’ judgement: limit prompting and intervention by examiners.

4. Provide sufficient time to avoid rushing. This enables clinical reasoning.

5. Provide access to realistic knowledge resources, such as guidelines or prescribing resources.

6. Use props to aid immersion in scenarios.

7. Require students to demonstrate skills rather than just verbalising them.

8. Avoid incongruent elements which can disrupt scenario immersion.

9. Prepare students and examiners for uncertainty, acknowledging multiple legitimate approaches to safe, effective outcomes.

Additional information

Funding

Peter Yeates was funded through a National Institute for Health Research (NIHR) Clinician Scientist award. The study constitutes independent research and does not represent the views of the NIHR, the NHS or the Department of Health and Social Care.

Notes on contributors

Peter Yeates

Peter Yeates, Keele University, School of Medicine.

Adriano Maluf

Adriano Maluf, Faculty of Health and Life Sciences, De Montfort University.

Ruth Kinston

Ruth Kinston, Keele University, School of Medicine.

Natalie Cope

Natalie Cope, Keele University, School of Medicine.

Kathy Cullen

Kathy Cullen, Queen’s University Belfast, School of Medicine, Dentistry and Biomedical Sciences.

Aidan Cole

Aidan Cole, Queen’s University Belfast, School of Medicine, Dentistry and Biomedical Sciences.

Vikki O’Neill

Vikki O’Neill, Queen’s University Belfast, School of Medicine, Dentistry and Biomedical Sciences.

Ching-wa Chung

Ching-wa Chung, University of Aberdeen, School of Medicine, Medical Sciences and Nutrition.

Rhian Goodfellow

Rhian Goodfellow, Cardiff University, School of Medicine.

Rebecca Vallender

Rebecca Vallender, Cardiff University, School of Medicine.

Sue Ensaff

Sue Ensaff, Cardiff University, School of Medicine.

Rikki Goddard-Fuller

Rikki Goddard-Fuller, Christie Education, The Christie NHS Foundation Trust.

Robert McKinley

Robert McKinley, Keele University, School of Medicine.

Geoff Wong

Geoff Wong, University of Oxford, Nuffield Department of Primary Care Health Sciences.

References

  • Adair JG. 1984. The Hawthorne effect: a reconsideration of the methodological artifact. J Appl Psychol. 69(2):334–345. doi:10.1037/0021-9010.69.2.334.
  • Adamo G. 2003. Simulated and standardized patients in OSCEs: achievements and challenges 1992-2003. Med Teach. 25(3):262–270. doi:10.1080/0142159031000100300.
  • Boursicot K, Kemp S, Wilkinson T, Findyartini A, Canning C, Cilliers F, Fuller R. 2021. Performance assessment: consensus statement and recommendations from the 2020 Ottawa Conference. Med Teach. 43(1):58–67. doi:10.1080/0142159X.2020.1830052.
  • Chan SCC, Rashid MA. 2023. The art of reinvention: the remarkable longevity of the OSCE. Med Educ. 58(2):177–179. doi:10.1111/medu.15266.
  • Cook DA, Brydges R, Ginsburg S, Hatala R. 2015. A contemporary approach to validity arguments: a practical guide to Kane’s framework. Med Educ. 49(6):560–575. doi:10.1111/medu.12678.
  • Coxon M, Kelly N, Page S. 2016. Individual differences in virtual reality: are spatial presence and spatial ability linked? Virtual Reality. 20(4):203–212. doi:10.1007/s10055-016-0292-x.
  • Crossley J, Johnson G, Booth J, Wade W. 2010. Good Question, Good Answer: construct alignment improves the performance of workplace based assessment scales. Med Educ. 45(6):560–569. doi:10.1111/j.1365-2923.2010.03913.x.
  • Dieckmann P, Gaba D, Rall M. 2007. Deepening the theoretical foundations of patient simulation as social practice. Simul Healthc. 2(3):183–193. doi:10.1097/SIH.0b013e3180f637f5.
  • Downing SM. 2003. Validity: on meaningful interpretation of assessment data. Med Educ. 37(9):830–837. doi:10.1046/j.1365-2923.2003.01594.x.
  • Dubas JM, Toledo SA. 2016. Taking higher order thinking seriously: using Marzano’s taxonomy in the economics classroom. Int Rev Econ Educ. 21:12–20. doi:10.1016/j.iree.2015.10.005.
  • Duddy C, Wong G. 2022. Grand rounds in methodology: when are realist reviews useful, and what does a ‘good’ realist review look like? BMJ Qual Saf. 32(3):173–180. doi:10.1136/bmjqs-2022-015236.
  • Engström H, Andersson Hagiwara M, Backlund P, Lebram M, Lundberg L, Johannesson M, Sterner A, Maurin Söderholm H. 2016. The impact of contextualization on immersion in healthcare simulation. Adv Simul. 1(8):1–11. doi:10.1186/s41077-016-0009-y.
  • Eva KW. 2018. Cognitive influences on complex performance assessment: lessons from the interplay between medicine and psychology. J Appl Res Mem Cogn. 7(2):177–188. doi:10.1016/j.jarmac.2018.03.008.
  • Gormley GJ, Hodges BD, McNaughton N, Johnston JL. 2016. The show must go on? Patients, props and pedagogy in the theatre of the OSCE. Med Educ. 50(12):1237–1240. doi:10.1111/medu.13016.
  • Greenhalgh T, Pawson R, Westhorp G, Greenhalgh J, Manzano A, Jagosh J. 2017. The Realist Interview: the RAMESES II Project. https://www.ramesesproject.org/media/RAMESES_II_Realist_interviewing.pdf.
  • Harrison C. 2018. Aiming for agency and authenticity in assessment. Perspect Med Educ. 7(6):348–349. doi:10.1007/s40037-018-0484-z.
  • Hodges B. 2003a. OSCE! Variations on a theme by Harden. Med Educ. 37(12):1134–1140. doi:10.1111/j.1365-2923.2003.01717.x.
  • Hodges B. 2003b. Validity and the OSCE. Med Teach. 25(3):250–254. doi:10.1080/01421590310001002836.
  • Hyde S, Fessey C, Boursicot K, MacKenzie R, McGrath D. 2022. OSCE rater cognition – an international multi-centre qualitative study. BMC Med Educ. 22(1):6. doi:10.1186/s12909-021-03077-w.
  • Kane MT. 2013. Validating the interpretations and uses of test scores. J Educational Measurement. 50(1):1–73. doi:10.1111/jedm.12001.
  • Kreptul D, Thomas RE. 2016. Family medicine resident OSCEs: a systematic review. Educ Prim Care. 27(6):471–477. doi:10.1080/14739879.2016.1205835.
  • LaDonna KA, Hatala R, Lingard L, Voyer S, Watling C. 2017. Staging a performance: learners’ perceptions about direct observation during residency. Med Educ. 51(5):498–510. doi:10.1111/medu.13232.
  • Lavoie P, Deschênes MF, Nolin R, Bélisle M, Blanchet Garneau A, Boyer L, Lapierre A, Fernandez N. 2020. Beyond technology: a scoping review of features that promote fidelity and authenticity in simulation-based health professional education. Clin Simul Nurs. 42:22–41. doi:10.1016/j.ecns.2020.02.001.
  • Maclean S, Geddes F, Kelly M, Della P. 2019. Realism and presence in simulation: nursing student perceptions and learning outcomes. J Nurs Educ. 58(6):330–338. doi:10.3928/01484834-20190521-03.
  • Manzano A. 2016. The craft of interviewing in realist evaluation. Evaluation. 22(3):342–360. doi:10.1177/1356389016638615.
  • Marei HF, Al-Eraky MM, Almasoud NN, Donkers J, Van Merrienboer JJG. 2018. The use of virtual patient scenarios as a vehicle for teaching professionalism. Eur J Dent Educ. 22(2):e253–e260. doi:10.1111/eje.12283.
  • Marwaha S. 2011. Objective Structured Clinical Examinations (OSCEs), psychiatry and the Clinical Assessment of Skills and Competencies (CASC): same evidence, different judgement. BMC Psychiatry. 11(1):85. doi:10.1186/1471-244X-11-85.
  • Melvin L, Rassos J, Panisko D, Driessen E, Kulasegaram KM, Kuper A. 2019. Overshadowed by assessment: understanding trainee and supervisor perspectives on the oral case presentation in internal medicine workplace-based assessment. Acad Med. 94(2):244–250. doi:10.1097/ACM.0000000000002451.
  • Nanji KC, Baca K, Raemer DB. 2013. The effect of an olfactory and visual cue on realism and engagement in a health care simulation experience. Simul Healthc. 8(3):143–147. doi:10.1097/SIH.0b013e31827d27f9.
  • Nasir AA, Yusuf AS, Abdur-Rahman LO, Babalola OM, Adeyeye AA, Popoola AA, Adeniran JO. 2014. Medical students’ perception of objective structured clinical examination: a feedback for process improvement. J Surg Educ. 71(5):701–706. doi:10.1016/j.jsurg.2014.02.010.
  • Papoutsi C, Mattick K, Pearson M, Brennan N, Briscoe S, Wong G. 2018. Interventions to improve antimicrobial prescribing of doctors in training (IMPACT): a realist review. Health Serv Deliv Res. 6(10):1–136. doi:10.3310/hsdr06100.
  • Pawson R, Tilley N. 1997. Realistic evaluation. 1st ed. Sage Publications Ltd.
  • Yeates P, Maluf A, Kinston R, Cope N, McCray G, Cullen K, O’Neill V, Cole A, Goodfellow R, Vallander B, et al. 2022. Enhancing authenticity, diagnosticity and equivalence (AD-Equiv) in multi-centre OSCE exams in health professionals education: protocol for a complex intervention study. BMJ Open. 12(12):e064387. doi:10.1136/bmjopen-2022-064387.
  • Phinney LB, Fluet A, O'Brien BC, Seligman L, Hauer KE. 2022. Beyond checking boxes: exploring tensions with use of a workplace-based assessment tool for formative assessment in clerkships. Acad Med. 97(10):1511–1520. doi:10.1097/ACM.0000000000004774.
  • Gupta P, Dewan P, Singh T. 2010. Objective Structured Clinical Examination (OSCE) revisited. Indian Pediatr. 47(11):911–920. doi:10.1007/s13312-010-0155-6.
  • Rudland J, Wilkinson T, Smith-Han K, Thompson-Fawcett M. 2008. “You can do it late at night or in the morning. You can do it at home, I did it with my flatmate” The educational impact of an OSCE. Med Teach. 30(2):206–211. doi:10.1080/01421590701851312.
  • Samur D, Tops M, Slapšinskaitė R, Koole SL. 2021. Getting lost in a story: how narrative engagement emerges from narrative perspective and individual differences in alexithymia. Cogn Emot. 35(3):576–588. doi:10.1080/02699931.2020.1732876.
  • Schumacher DJ, Michelson C, Winn AS, Turner DA, Elshoff E, Kinnear B. 2022. Making prospective entrustment decisions: knowing limits, seeking help and defaulting. Med Educ. 56(9):892–900. doi:10.1111/medu.14797.
  • Sebok-Syer SS, Chahine S, Watling CJ, Goldszmidt M, Cristancho S, Lingard L. 2018. Considering the interdependence of clinical performance: implications for assessment and entrustment. Med Educ. 52(9):970–980. doi:10.1111/medu.13588.
  • Swanson DB, van der Vleuten CPM. 2013. Assessment of clinical skills with standardized patients: state of the art revisited. Teach Learn Med. 25(Suppl 1):S17–S25. doi:10.1080/10401334.2013.842916.
  • The RAMESES II Project. 2017. Retroduction in realist evaluation. http://www.ramesesproject.org/media/RAMESES_II_Retroduction.pdf.
  • Greenhalgh T, Pawson R, Wong G, Westhorp G, Greenhalgh J, Manzano A, Jagosh J. 2017. Quality standards for realist evaluation for evaluators and peer-reviewers: the RAMESES II Project. www.ramesesproject.org.
  • van der Vleuten CPM. 1996. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ Theory Pract. 1(1):41–67. doi:10.1007/BF00596229.
  • van der Vleuten CPM, Schuwirth LWT, Scheele F, Driessen EW, Hodges B. 2010. The assessment of professional competence: building blocks for theory development. Best Pract Res Clin Obstet Gynaecol. 24(6):703–719. doi:10.1016/j.bpobgyn.2010.04.001.
  • Wass V, van der Vleuten C, Shatzer J, Jones R. 2001. Assessment of clinical competence. Lancet. 357(9260):945–949. doi:10.1016/S0140-6736(00)04221-5.
  • Watling C, LaDonna KA, Lingard L, Voyer S, Hatala R. 2016. ‘Sometimes the work just needs to be done’: socio-cultural influences on direct observation in medical training. Med Educ. 50(10):1054–1064. doi:10.1111/medu.13062.
  • Wong G, Westhorp G, Manzano A, Greenhalgh J, Jagosh J, Greenhalgh T. 2016. RAMESES II reporting standards for realist evaluations. BMC Med. 14(1):96. doi:10.1186/s12916-016-0643-1.
  • Yepes-Rios M, Dudek N, Duboyce R, Curtis J, Allard RJ, Varpio L. 2016. The failure to fail underperforming trainees in health professions education: a BEME systematic review: BEME Guide No. 42. Med Teach. 38(11):1092–1099. doi:10.1080/0142159X.2016.1215414.