Web Papers

Predictive validity of measurements of clinical competence using the Team Objective Structured Bedside Assessment (TOSBA): Assessing the clinical competence of final year medical students

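Frances M. Meagher, Marcus W. Butler, Stanley D.W. Miller, Richard W. Costello, Ronan M. Conroy & Noel G. McElvaney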
Pages e545-e550 | Published online: 12 Nov 2009

Abstract

Background: The importance of valid and reliable assessment of student competence and performance is gaining increased recognition. Provision of valid patient-based formative assessment is an increasing challenge for clinical teachers in a busy hospital setting. A formative assessment tool that reliably predicts performance in the summative setting would be of value to both students and teachers.

Aim: This study explores the utility of the team objective structured bedside assessment (TOSBA), a novel ward-based formative assessment tool, in predicting student performance in the final clinical examination.

Methods: The performance of a cohort of final year students (n = 191) in the TOSBA was compared with their subsequent performance in the final examination. Student performance in the existing formative assessment tool, the objective structured long examination record (OSLER), was also compared with performance in the final examination. We also examined the relationship between the TOSBA and the components of the final examination using clustering around latent variables analysis.

Results: There was a clear relationship between student performance in the TOSBA and performance in the final examination (r² = 0.35). By comparison, student performance in the OSLER showed a poor relationship with performance in the final examination (r² = 0.15). TOSBA results correlated particularly with the clinically based components of the final examination.

Conclusion: TOSBA performance is a strong predictor of subsequent performance in the final examination. The clustering of the TOSBA with other assessments of clinical skills underlines its utility. Further research is required to determine whether performance in the TOSBA is predictive of subsequent performance during internship.

Introduction

The importance of valid and reliable assessment of student competence and performance is gaining increased recognition. Society and other stakeholders rightly demand that the final medical examination delivers doctors who are competent and fit to practise as interns. The predictive value of measurements obtained in medical school for future performance in medical practice has been the subject of a recent Best Evidence Medical Education (BEME) review (Hamdy et al. 2006). Prediction of future student performance is of interest both to students and to their clinical teachers, and formative assessment is an important aspect of student learning and professional development. A formative and authentic assessment tool that reliably predicts future student performance in a summative setting would be of great value (Wilkinson & Frampton 2004).

Prior to 2005, the objective structured long examination record (OSLER) was the main instrument used at the Royal College of Surgeons in Ireland medical school for the formative assessment of final year students (Gleeson 1997). We have recently described the team objective structured bedside assessment (TOSBA), which adapts the group objective structured clinical examination (GOSCE) (Biran 1991) and team objective structured clinical examination (TOSCE) (Singleton et al. 1999) formats as a formative teaching and assessment tool using ward-based patients (Miller et al. 2007). The TOSBA is now used, in addition to the OSLER, for the formative assessment of the clinical competence of final-year students.

Aim

The aim of this study is to assess the utility of the TOSBA in predicting performance in the final clinical examination. Predictive validity is assessed by comparing the results of the TOSBA with the results of the final medical examination held later in the same academic year. In addition, we compare student performance in the TOSBA with that in the OSLER. Convergent and divergent validity are assessed by examining the relationship between student performance in the TOSBA and performance in the individual clinical components of the final examination.

Methods

The curriculum at our institution is a system-based, 5-year programme. The final examination is an assessment in medicine and surgery and comprises a number of components. These can be divided into those that assess knowledge (essay, short notes, multiple choice questions and a 10-station data OSCE) and those that assess clinical, patient-centred skills (an observed long case and a 10-station clinical OSCE). The clinical OSCE consists of eight patient-based stations (each of 7.5 min duration) and two communication stations (each of 10 min duration). Students who perform well in the final examination are awarded a distinction, or 'Honours', designated 'P+' (Appendix).

We have previously reported a detailed description of the TOSBA (Miller et al. 2007). Briefly, the TOSBA is a ward-based teaching and formative assessment session during which three groups of five students rotate through three bedside stations in the same medical ward. Each station consists of an in-patient and an examiner. Consecutive students in each group are each given 5 min to perform one of five standardised clinical tasks: (i) take a brief but focused history, (ii) perform a targeted physical examination, (iii) generate a patient-specific differential diagnosis, (iv) outline an investigation and management plan and (v) answer questions about the patient's drug prescription chart. The students are directly observed performing the tasks, are graded on their performance (Table 1) and receive educational feedback from the examiner. On completion of the TOSBA, the three examiners confer and agree a final grade. Students are scheduled to attend two TOSBAs during their 4-week medicine rotation and should therefore, ideally, see patients with problems in six different organ systems, depending on the available in-patient case-mix.
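To make the rotation concrete, the following Python sketch generates one possible TOSBA timetable. It assumes a simple round-robin in which group g occupies station (g + round) mod 3, no changeover time between stations, and a task order that shifts by one position each round so that a student performs a different task at each station. None of these details is specified by the protocol described above, and the names used (`tosba_schedule`, `Slot`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the five standardised tasks listed above.
TASKS = ("history", "examination", "differential diagnosis",
         "investigations and management", "drug chart questions")
N_GROUPS = N_STATIONS = 3
GROUP_SIZE = 5
SLOT_MIN = 5  # minutes per student per task


@dataclass(frozen=True)
class Slot:
    rnd: int        # rotation round (0..2)
    group: int      # student group (0..2)
    station: int    # bedside station (0..2)
    student: int    # position within the group (0..4)
    task: str
    start_min: int  # minutes from the start of the session


def tosba_schedule():
    """One possible TOSBA timetable under an assumed round-robin rotation."""
    slots = []
    for rnd in range(N_STATIONS):
        for group in range(N_GROUPS):
            # Round-robin: distinct groups never share a station in the same round.
            station = (group + rnd) % N_STATIONS
            for student in range(GROUP_SIZE):
                # Assumption: shift the task order each round so that a student
                # performs a different task at each station.
                task = TASKS[(student + rnd) % len(TASKS)]
                start = rnd * GROUP_SIZE * SLOT_MIN + student * SLOT_MIN
                slots.append(Slot(rnd, group, station, student, task, start))
    return slots


if __name__ == "__main__":
    for s in tosba_schedule():
        if s.group == 0:  # print the timetable for one group only
            print(f"{s.start_min:>3} min  station {s.station}  "
                  f"student {s.student}: {s.task}")
```

Under these assumptions a session lasts 75 min (three rounds of five 5-min slots); in practice, time for examiner feedback and for the examiners to confer on a final grade would lengthen it.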

Table 1.  TOSBA and OSLER grading scheme

The OSLER is a 10-item analytical record of the traditional long case. The 10-item scale includes four items on presentation of the history (pace/clarity, communication process, systematic presentation and correct facts established), three items on physical examination (systematic, technique and correct findings established) and one item each on the formulation of appropriate investigations in a logical sequence, management and clinical acumen (Gleeson 1997). Unlike the TOSBA, the OSLER does not involve direct observation of the student during history-taking. Educational feedback is provided, and students are awarded one of four grades on their performance; the P/P grade is not included (Table 1). The OSLER assessments were performed by members of the clinical team to which each student was attached. A mean of 3.927 (approximately four) OSLERs was completed per student.

From September 2005 to March 2006, a total of 204 final-year medical students took part in TOSBAs over the course of the academic year. One hundred and ninety-one students sat the final examination in medicine and surgery and formed the study cohort (11 students were not examined because of ineligibility and one student was absent). A core group of eight clinical faculty members, familiar with the curricular outcomes and the expected level of student clinical competence, carried out the TOSBA assessments. Although no formal examiner training took place, the examiners had considerable experience in undergraduate teaching and assessment. A mean of 1.65 (approximately two) TOSBAs was completed per student. Data were analysed with Stata/SE release 10. Clustering around latent variables analysis was used to examine the patterns of association between assessment modalities.

Results

The performance of 191 students was analysed, and complete assessment data were available for 172 (90%) of them. The relationship between the OSLER grade, the TOSBA grade and the final examination mark is illustrated in Figure 1. Student performance in the OSLER showed a poor relationship with performance in the total final examination (r² = 0.15). The OSLER showed a restricted grade distribution, with 56% of students achieving the same grade (P+) and a further 26% achieving a P/P+ (Table 2). Furthermore, the mean final examination mark differed by only two marks between students who received a P− grade and those who received a P/P+. Although students who scored P− had a higher average mark than those who scored P, this difference was not statistically significant (Scheffé post hoc test, p = 0.444). The same pattern was seen for the final clinical mark.

Figure 1. Association of OSLER and TOSBA grades to final total mark. OSLER r² = 0.15; TOSBA r² = 0.35.

In comparison, as shown in Figure 1, a moderate correlation was found between the TOSBA and final examination performance (r² = 0.35). The TOSBA had a similar concentration of grades to the OSLER, with 57% of students receiving the same grade (P) and only 12% receiving the extreme grades (P− and P+) (Table 2). However, TOSBA grades discriminated better between levels of final examination performance: there was a 12-mark difference in average final mark between students who received a P− grade and those who received a P/P+, and a graded association between TOSBA grade and average final mark. The two-mark difference between the final results of students who achieved P/P+ and P+ grades was not, however, statistically significant (Scheffé post hoc test, p = 0.990). A similar pattern was again seen for the final clinical mark.
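For readers more used to correlation coefficients, the r² values in Figure 1 can be translated back, assuming they are squared correlation coefficients with a positive direction (the text does not state how grades were coded for this analysis). On that reading, TOSBA grade accounts for 35% and OSLER grade for 15% of the variance in final mark, roughly 2.3 times as much for the TOSBA:

```latex
$$
r_{\text{TOSBA}} = \sqrt{0.35} \approx 0.59, \qquad
r_{\text{OSLER}} = \sqrt{0.15} \approx 0.39, \qquad
\frac{r^2_{\text{TOSBA}}}{r^2_{\text{OSLER}}} = \frac{0.35}{0.15} \approx 2.3
$$
```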

Table 2.  Comparative performance of the OSLER and TOSBA

The relationship between TOSBA grade and the failure and honours rates in the final examination is illustrated in Figure 2. There is a clear relationship in both cases. Students who performed poorly in the formative TOSBA (P−) did not achieve an honours grade in the final examination and were likely to fail (38%). The converse is true for students who performed well in the TOSBA (i.e. scored either a P/P+ or P+ grade): they had an 80% chance of achieving honours, and none of them failed the final examination. Of the students who scored a P in the TOSBA, 44% subsequently achieved an honours grade in the final examination.

Figure 2. Association gradients for failure rates and honour rates.

Of the 11 students who failed their final examination, nine had also failed the TOSBA (sensitivity 82%, 95% CI 48–98%). Seven of these nine (78%) had received a P− grade and two (22%) a P/P− grade. There were 47 TOSBA failures overall, giving a positive predictive value for failure of 19% (9/47; 95% CI 9–33%). Of the 17 students who failed the clinical component of their final examination, eight had failed the TOSBA, giving a sensitivity of 47% (95% CI 23–72%) and a positive predictive value of 17% (8/47; 95% CI 8–31%).
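The reported 95% confidence intervals appear consistent with exact (Clopper–Pearson) binomial intervals, although the Methods do not name the interval method used. The following standard-library Python sketch computes such intervals by bisection on the binomial tail probability, so that figures of this kind can be reproduced from the counts given above; the function names are our own.

```python
from math import comb


def binom_tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p); increasing in p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def _bisect(f, target, lo=0.0, hi=1.0, iters=60):
    """Find p in [lo, hi] with f(p) == target, assuming f is increasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Point estimate and exact (Clopper-Pearson) CI for a proportion x/n."""
    # Lower bound solves P(X >= x) = alpha/2.
    lower = 0.0 if x == 0 else _bisect(lambda p: binom_tail_ge(x, n, p), alpha / 2)
    # Upper bound solves P(X <= x) = alpha/2, i.e. P(X >= x + 1) = 1 - alpha/2.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_tail_ge(x + 1, n, p), 1 - alpha / 2)
    return x / n, lower, upper


if __name__ == "__main__":
    for label, x, n in [("sensitivity, final examination", 9, 11),
                        ("PPV, final examination", 9, 47),
                        ("sensitivity, clinical component", 8, 17),
                        ("PPV, clinical component", 8, 47)]:
        est, lo, hi = clopper_pearson(x, n)
        print(f"{label}: {est:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

Small differences from the published figures, for example in the last digit, could reflect rounding or a different interval method.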

Correlation between the TOSBA and final examination components

We examined the relationship between the TOSBA and the components of the final examination using clustering around latent variables analysis. This method sequentially groups variables into clusters, with the aim of minimising the variation within clusters and maximising the variation between them (Vigneau & Qannari 2003). Unlike factor analysis, it is a sequential process, which makes it useful for detecting complex data structures such as clusters within clusters; the result is a hierarchical cluster analysis of the variables. The output of the analysis is shown in Figure 3. The assessments formed two broad clusters. The cluster at the top of the figure contains all of the written assessments, and within it there is a strong link between assessments of the same type: the two multiple choice question (MCQ) examinations cluster together, as do the data interpretation examinations (OSCE) and the essay examinations. Taken together, this cluster is dominated by assessments that test the knowledge domain and entail recognition, recall and organisation of material.

Figure 3. Association gradients for failure rates and honour rates.

The lower cluster contains the patient-centred assessments. The TOSBA clusters with the medical in vivo (patient-based) component of the OSCE, the clinical long case and the communication skills assessment. It correlates less well with the medical in vitro (data-based) component of the OSCE, and poorly with the MCQ, essay and short notes assessments, which are fact-orientated and measure knowledge and memory.
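The hierarchical procedure described above can be sketched in Python with NumPy. This is a minimal illustration, assuming standardised scores and the "directional" criterion of Vigneau & Qannari (2003), in which each cluster's latent component is its first principal component and, at each step, the two clusters merged are those whose union reduces the summed first eigenvalues the least. It is not the Stata procedure used in this study, and the function names, assessment labels and synthetic data are hypothetical.

```python
import numpy as np


def first_eigenvalue(R, idx):
    """Largest eigenvalue of the correlation sub-matrix for the variables in idx.

    For standardised variables this is the variance captured by the cluster's
    latent component (its first principal component)."""
    sub = R[np.ix_(idx, idx)]
    return float(np.linalg.eigvalsh(sub)[-1])  # eigvalsh sorts ascending


def hierarchical_clv(X, names):
    """Agglomerative clustering of variables around latent components.

    X: (n_students, n_assessments) score matrix; names: one label per column.
    At each step, merge the two clusters whose union loses the least of the
    criterion T = sum of first eigenvalues ("directional" groups, so the sign
    of a correlation is ignored). Returns (cluster_a, cluster_b, loss) per merge.
    """
    R = np.corrcoef(X, rowvar=False)             # correlations between assessments
    clusters = [[j] for j in range(X.shape[1])]  # start with singleton clusters
    lam = [1.0] * len(clusters)                  # a singleton's eigenvalue is 1
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = clusters[a] + clusters[b]
                lam_ab = first_eigenvalue(R, merged)
                loss = lam[a] + lam[b] - lam_ab  # decrease in T caused by merging
                if best is None or loss < best[0]:
                    best = (loss, a, b, merged, lam_ab)
        loss, a, b, merged, lam_ab = best
        history.append(([names[j] for j in clusters[a]],
                        [names[j] for j in clusters[b]], loss))
        for k in (b, a):  # delete the higher index first so `a` stays valid
            del clusters[k]
            del lam[k]
        clusters.append(merged)
        lam.append(lam_ab)
    return history


if __name__ == "__main__":
    # Synthetic illustration only: two latent traits, three assessments each.
    rng = np.random.default_rng(0)
    n = 200
    knowledge, clinical = rng.normal(size=n), rng.normal(size=n)
    X = np.column_stack(
        [knowledge + 0.3 * rng.normal(size=n) for _ in range(3)]
        + [clinical + 0.3 * rng.normal(size=n) for _ in range(3)])
    names = ["mcq_a", "mcq_b", "essay", "long_case", "clinical_osce", "tosba"]
    for a, b, loss in hierarchical_clv(X, names):
        print(f"merge {a} + {b}  (loss {loss:.3f})")
```

With two simulated traits, assessments driven by the same trait merge first and the final, most costly merge joins the two three-assessment clusters, analogous to the two broad clusters described above.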

Discussion

The analysis presented in this article evaluates the utility of the TOSBA in predicting subsequent student performance and identifies aspects of the TOSBA that could be refined to increase its value as a formative assessment of final-year student clinical competence.

Validity refers to the extent to which a measurement actually measures what it is intended to measure (Van der Vleuten 2000). Validity is not so much a property of a test as a description of the usefulness of the test for a particular purpose, and multiple sources of evidence are required to evaluate the appropriateness of a test for a given purpose (Sireci 2007). Although in the present evaluation we were limited to examining the relationship between performance in the TOSBA and other measures of student performance, we were able to use these relationships to examine convergent, divergent and predictive validity, all characteristics of the assessment process that are often neglected because of the difficulty of determining them (Van der Vleuten 2000). Convergent validity was supported by the clustering of TOSBA scores with other measures of clinical competence, and divergent validity by their distinctness from knowledge-based assessments.

Student performance in the TOSBA was predictive of performance in the final examination. This is, in itself, unremarkable, since the TOSBA assesses skills that are also assessed in components of the final examination. Notably, however, 82% of the students who ultimately failed their final examination had failed the TOSBA. Given the number of TOSBA failures, this amounted to a positive predictive value of just under 20%; that is, only about one in five students who failed the TOSBA went on to fail the final examination. This may be explained, in part, by the remediation programme subsequently provided for students who failed the TOSBA, which may have improved their final examination pass rate. Timely intervention based on early identification of poor clinical performance has been shown to help weaker students improve their performance (Sayer et al. 2002). In addition, although poorly performing students have been shown not to seek out guidance and support (Malik 2000), we believe it is possible that students identified as potential failures by the TOSBA used this formative assessment as an incentive to improve their performance. Closer monitoring of the impact of remedial support on the subsequent performance of underperforming students will be an area of future research.
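The gap between a sensitivity of 82% and a predictive value below 20% follows directly from the counts reported in the Results. Because many more students failed the TOSBA (47) than failed the final examination (11), the positive predictive value equals the sensitivity scaled down by that ratio:

```latex
$$
\text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{9}{47}
= \underbrace{\frac{9}{11}}_{\text{sensitivity}} \times \frac{11}{47}
\approx 0.82 \times 0.23 \approx 0.19
$$
```

Any effect of remediation would act in the same direction: a student who failed the TOSBA and, after remediation, passed the final examination moves from a true positive to a false positive, lowering the observed predictive value.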

The TOSBA grades showed an ordered relationship with performance in the final medical examination and a better discriminant ability than the OSLER, with important differences in final examination performance between the highest and lowest TOSBA grades. No such ordered relationship was seen with the OSLER, which discriminated poorly: there was little difference in final examination performance between the highest and lowest OSLER grades. The poor performance of the OSLER may reflect the junior status and inadequate examiner training of those who carried out these assessments, who were predominantly interns or senior house officers. In addition, direct observation of history-taking in the TOSBA, which is unobserved in the OSLER, may improve the validity of the TOSBA grades. As a result of our data showing the clear superiority of the TOSBA over the OSLER, we no longer use the latter as a formative assessment tool.

The TOSBA was a useful predictor of an honours performance in the final examination. A student who achieved a P/P+ or P+ grade had an 80% chance of achieving honours. The converse is true for students who performed poorly in the formative TOSBA – they were very unlikely to achieve an honours grade in the final examination.

While the TOSBA is a good predictor of extremes in performance, it is less reliable for those students who received a P grade. Examination of the TOSBA grade breakdown suggests that the criteria for the P/P+ and P+ grades need to be reviewed. Forty-four percent of students who obtained a P grade in the TOSBA subsequently achieved honours (P+) in their final examination. Further analysis is required to evaluate this central cluster of grades.

The use of clustering around latent variables to examine the convergent and divergent validity of the TOSBA is a novel and potentially useful statistical approach (Vigneau & Qannari 2003). Clustering around latent variables shows the interrelationships between assessments as a tree diagram, allowing the reader to see clustering at several levels. In the case of our final year assessments, it is clear that there are two broad categories, corresponding to knowledge-based and clinical skills assessments, and that the TOSBA fits into the latter. Within these categories, however, a finer structure can be seen: assessments that test the same domain tend to cluster together. This is notably true of the MCQ examinations, whose close relationship indicates that an element of 'MCQ skill' underlies performance on these assessments.

We have not used the TOSBA in the final examination. This study evaluates a high-throughput formative assessment tool, to determine its utility in predicting student performance and its potential as a summative tool. In the context of increasing student numbers, the sustainability of the OSCE as an assessment tool is open to question (Harden & Gleeson 1979), and the challenges of running a psychometrically sound OSCE for large student groups are well documented (Van der Vleuten & Swanson 1990). This may be a circumstance in which the TOSBA could be introduced to good effect. The use of real patients in the TOSBA lends the assessment of clinical competence an authenticity that is not a feature of the OSCE. In addition, an earlier study of the TOSBA showed that students appreciated the educational opportunity of learning from their peers in a team setting during the formative TOSBA (Miller et al. 2007).

Evaluation of test validity is not a static, one-time event (Sireci 2007). We are currently using the results of this evaluation to improve the conduct of the TOSBA, concentrating on examiner training and clearer grading guidelines, with the aim of reducing the proportion of students clustered into the same grade.

Conclusion

The challenge of providing rigorous assessment of medical student cohorts of increasing size is one that faces all medical schools. The potential to reliably assess students in groups, in an authentic environment, represents a move away from traditional approaches to undergraduate assessment. If this approach can be implemented without jeopardising the quality of assessment, it is to be welcomed.

We believe the TOSBA has potential for development as an assessment tool. This potential is shown in the good discriminatory ability of its grades as predictors of final examination performance. The clustering of the TOSBA with other assessments of clinical skills underlines its utility. Future analyses will address the current failure of the grading scheme to identify many of the students who subsequently achieved honours in the final examination. Our findings have educational and research implications. We believe the TOSBA provides valid measurements of clinical competence that are useful in the formative assessment of medical students in an authentic setting, though it may benefit from further refinement. Further research is required to determine whether undergraduate performance as measured by the TOSBA is predictive of subsequent performance during internship.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

Additional information

Notes on contributors

Frances M. Meagher

F.M. MEAGHER is a Corrigan Lecturer in Medicine and Physician at the Royal College of Surgeons in Ireland (RCSI).

Marcus W. Butler

M.W. BUTLER is a Lecturer in Medicine and Specialist Registrar in Respiratory Medicine, RCSI.

Stanley D.W. Miller

S.D.W. MILLER is a Lecturer in Medicine and Specialist Registrar in Respiratory Medicine, RCSI.

Richard W. Costello

R.W. COSTELLO is a Senior Lecturer in Medicine, RCSI, and Respiratory Physician in Beaumont Hospital, Dublin.

Ronan M. Conroy

R.M. CONROY is a Senior Lecturer in Epidemiology and Public Health, RCSI.

Noel G. McElvaney

N.G. McELVANEY is a Professor of Medicine and Chairman of the Department of Medicine, RCSI, and Respiratory Physician, Beaumont Hospital, Dublin.

References

  • Biran LA. Self-assessment and learning through GOSCE (group objective structured clinical examination). Med Educ 1991; 25(6): 475–479
  • Gleeson F. The objective long examination record (OSLER). Med Teach 1997; 19: 7–14
  • Hamdy H, Prasad K, Anderson MB, Scherpbier A, Williams R, Zwierstra R, Cuddihy H. BEME systematic review: Predictive values of measurements obtained in medical schools and future performance in clinical practice. Med Teach 2006; 28(2): 103–116
  • Harden RM, Gleeson FA. Assessment of clinical competence using an objective structured. Med Educ 1979; 13(1): 41–54
  • Malik S. Students, tutors and relationships: The ingredients of a successful support scheme. Med Educ 2000; 34: 635–641
  • Miller SDW, Butler MW, Meagher F, Costello RW, McElvaney NG. Team objective structured bedside assessment (TOSBA): A novel and feasible way of providing formative teaching and assessment. Med Teach 2007; 29: 156–159
  • Sayer M, Chaput De Saintonge M, Evans D, Wood D. Support for students with academic difficulties. Med Educ 2002; 36(7): 643–650
  • Singleton A, Smith F, Harris T, Ross-Harper R, Hilton S. An evaluation of the team objective structured clinical examination (TOSCE). Med Educ 1999; 33(1): 34–41
  • Sireci SG. On validity theory and test validation. Educ Res 2007; 36(8): 477–481
  • Van der Vleuten CPM. Validity of final examinations in undergraduate medical training. Brit Med J 2000; 321: 1217–1219
  • Van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: State of the art. Teach Learn Med 1990; 2: 58–76
  • Vigneau E, Qannari EM. Clustering of variables around latent components. Commun Stat-Simul C 2003; 32(4): 1131–1150
  • Wilkinson TJ, Frampton CM. Comprehensive undergraduate assessments improve prediction of clinical performance. Med Educ 2004; 38: 1111–1116

Appendix: OSLER grading system
