
Assessing postgraduate trainees in Canada: Are we achieving diversity in methods?

Sophia Chou, Jocelyn Lockyer, Gary Cole & Kevin McLaughlin
Pages e58-e63 | Published online: 03 Jul 2009

Abstract

Background: Resident evaluation is a complex and challenging task, and little is known about which assessment methods predominate within or across specialties.

Aims: To determine the methods program directors in Canada use to assess residents and their perceptions of how evaluation could be improved.

Methods: We conducted a web-based survey of program directors from Royal College of Physicians and Surgeons of Canada (RCPSC)-accredited training programs to examine the use of the In-Training Evaluation Report (ITER), the use of non-ITER tools and program directors’ perceived needs for improvement in evaluation methods.

Results: One hundred forty-nine of the 280 eligible program directors participated in the survey. ITERs were used by all but one program. Of the non-ITER tools, multiple choice questions (71.8%) and oral examinations (85.9%) were most utilized, whereas essays (11.4%) and simulations (28.2%) were least used across all specialties. Surgical specialties had significantly higher multiple choice question and logbook utilization, whereas medical specialties were significantly more likely to include Objective Structured Clinical Examinations (OSCEs). Program directors expressed a strong need for national collaboration between programs within a specialty to improve resident evaluation processes.

Conclusions: Program directors use a variety of methods to assess trainees. They continue to rely heavily on the ITER, but are using other tools.

Introduction

Postgraduate medical education and assessment have undergone many changes. While it is recognized that the ultimate goal of a postgraduate training program is to produce competent physicians ready for independent medical practice (Epstein & Hundert Citation2002), perspectives on what constitutes a ‘competent’ physician and on what evidence is required to attest that someone is ‘competent’ have changed. Professional organizations, such as the Royal College of Physicians and Surgeons of Canada (RCPSC) (Frank Citation2005), the Accreditation Council for Graduate Medical Education (ACGME) of the United States (ACGME Citation2007), the Modernising Medical Careers Programme of the United Kingdom (UK Foundation Programme Office Citation2007) and the Confederation of Postgraduate Medical Education Councils of Australia (Commonwealth Department of Health and Ageing Citation2003), have each identified the competencies which physicians will attain through their postgraduate residency training. These competencies, while varying from one jurisdiction to another, include communication skills, collegiality and professionalism in addition to the medical expert role.

Traditionally in Canada, the assessment of the trainee has relied upon the In-training Evaluation Report (ITER). The ITER provides systematic, longitudinal observation and feedback on resident performance in real clinical settings. It can be completed part way through or at the end of a rotation to document how well the trainee is doing with physician preceptors, other residents, allied health professionals, and patients. ITER formats include checklists, global rating scales and behavioural assessment forms (Gray Citation1996b). At the end of training, a Final In-Training Evaluation Report (FITER) is completed enabling the candidate to proceed to the RCPSC certifying examination. However, along with the development of CanMEDS roles has come the recognition that a combination of assessment tools must be used to evaluate trainees on a broader range of competencies. Accordingly, the RCPSC has published The CanMEDS Assessment Tools Handbook: An Introductory Guide to Assessment Methods for the CanMEDS Competencies (Bandiera et al. Citation2006). The handbook is designed to help residency education program directors and committees select the optimal methods for assessment. It provides general information and suggests how methods such as the ITERs, written tests, structured oral examinations, direct observation, Objective Structured Clinical Examinations (OSCEs), standardized patients, multi-source feedback, portfolios and logbooks, simulation-based assessment and encounter cards can be informative. It also suggests which tools are most effective for each of the competencies.

Despite the availability of a handbook and training offered by the RCPSC and within medical schools, it is not known what assessment methods predominate within or across specialties. The general standards of accreditation of the RCPSC (RCPSC Citation2006) state that the assessment system must be based on the goals and objectives of the program and must clearly identify the methods by which the residents are to be evaluated. They further specify that residency programs must assess medical knowledge with appropriate written and performance-based assessment, clinical skills with direct observation, and other CanMEDS competencies with observation and interviews with peers, allied health professionals and patients. However, the actual ways that residents are assessed have been left to the discretion of program directors. This approach is very different from that adopted in the United Kingdom, where the Foundation Programme has mandated the use and administration frequency of four specific tools, namely, the mini clinical evaluation exercise, direct observation of procedural skills, case-based discussion and multi-source feedback (UK Foundation Programme Office Citation2007). In Canada, it is particularly important to identify the approaches being used in evaluation as this will provide a baseline on which the adoption of different tools can be monitored over time. Sharing ‘best practices’ across and within specialties can inform and improve assessment.

The purpose of this study was to examine evaluation practices in Canadian postgraduate training programs. Our first objective was to determine the assessment methods used currently, with a particular focus on the ITER due to its traditional role in evaluation. Our second objective was to identify the ways that program directors believe residency evaluation could be improved. We also examined whether the medical, surgical and investigative specialties differed in their approaches to resident evaluation.

Methods

Study sample

We conducted a web-based survey of program directors of the RCPSC-accredited specialty postgraduate training programs from the 13 English-speaking medical schools for the academic year 2006–2007. For reasons of feasibility and practicality, subspecialties and French-speaking programs were excluded. There were a total of 284 eligible training programs with 280 program directors surveyed; four program directors had responsibility for two programs each.

Data collection

The survey asked program directors to describe how often they administered ITERs. Firstly, they were asked whether the ITERs were program or rotation specific and who completed the ITERs (i.e. physician preceptors only versus nurses, other health care professionals, resident peers, medical students, patients and self). They were also asked about ITER training for physician raters. Secondly, program directors were queried about their use of non-ITER assessment tools, including Multiple Choice Questions (MCQs), Short Answer Questions (SAQs), essays, oral examinations, OSCEs, simulations and logbooks. The number of non-ITER assessment tools used by each program and the frequency with which each was administered were also recorded. Lastly, program directors were asked to use a 5-point scale to indicate their perceived need for the development of new assessment tools, systematic integration of currently available tools, national collaboration between programs within a specialty, local collaboration between residency programs within a university, and specific guidelines for tools from the RCPSC.

The survey was administered on-line using Survey Console (SurveyConsole Citation2007). The on-line survey, along with a cover letter that explained the nature of the study and its potential impact, was first distributed in early March 2007 by e-mail to the specialty program directors included in the study. When the rate of responses declined, approximately 2.5 weeks after the first e-mail, a reminder e-mail was sent to all program directors. A third e-mail invitation was sent 2 days after the study deadline to those who had not responded. Data collection was completed by April 2007.

Statistical analyses

Programs were grouped into three categories: medical, surgical and investigative specialties (Table 1). The mean number of trainees in each program, along with the range (minimum and maximum), was determined. Descriptive analyses were calculated for the items on the survey. Differences between specialty groups were analysed using one-way analysis of variance (ANOVA), and p < 0.001 was considered statistically significant.
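As an illustration of the analytic approach described above, the sketch below shows how descriptive statistics and a one-way ANOVA comparing the number of non-ITER tools across the three specialty categories could be computed in Python with pandas and scipy. The data frame and column names are hypothetical and do not reproduce the study data.

```python
# Illustrative sketch only: hypothetical data, not the study dataset.
import pandas as pd
from scipy import stats

# Each row represents one responding program: its specialty category
# and the number of non-ITER assessment tools it reported using.
df = pd.DataFrame({
    "category": ["medical"] * 5 + ["surgical"] * 5 + ["investigative"] * 5,
    "n_tools":  [4, 3, 5, 4, 3,   5, 4, 4, 5, 3,   2, 3, 3, 2, 4],
})

# Descriptive statistics by specialty category (mean, SD, range).
print(df.groupby("category")["n_tools"].agg(["mean", "std", "min", "max"]))

# One-way ANOVA across the three categories, mirroring the analysis
# described above; the paper used p < 0.001 as the significance threshold.
groups = [g["n_tools"].values for _, g in df.groupby("category")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, significant: {p_value < 0.001}")
```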

Table 1.  Response rate and mean number of non-ITER assessment tools used by programs

This study received ethics approval from the Conjoint Health Research Ethics Board at the University of Calgary.

Results

Of the 280 eligible program directors, 11 could not be contacted due to inaccurate e-mail addresses. A total of 167 (62.1%) questionnaires were returned and 149 (55.4%) were included in the final analysis. Sixteen duplicate questionnaires were excluded, and an additional two questionnaires were excluded because they lacked information on specialty and university. Because none of the program directors from general pathology and medical biochemistry returned completed questionnaires, these two specialties were not included in the analyses. Of the 131 non-respondent program directors, 52 were from the 126 eligible programs in medical specialties, 41 were from the 99 programs in surgical specialties and 38 were from the 55 programs in investigative specialties. Investigative specialties were significantly less likely to participate in the study; only 17 of the 55 (31%) eligible program directors responded, compared to response rates of 58.7% and 58.6% in the medical and surgical specialties, respectively. The mean numbers of residents were similar between respondent and non-respondent programs (21.8 ± 23.5 versus 17.9 ± 16.7, respectively, p = 0.406). Information about the response rates by program is provided in Table 1.
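The response rates reported above follow directly from the eligible and non-respondent counts given in the text; the short check below recomputes them (in Python, for illustration only).

```python
# Recompute the response rates reported above from the counts in the text.
eligible = {"medical": 126, "surgical": 99, "investigative": 55}
non_respondents = {"medical": 52, "surgical": 41, "investigative": 38}

for category, n_eligible in eligible.items():
    respondents = n_eligible - non_respondents[category]
    rate = 100 * respondents / n_eligible
    print(f"{category}: {respondents}/{n_eligible} = {rate:.1f}%")
# medical: 74/126 = 58.7%, surgical: 58/99 = 58.6%, investigative: 17/55 = 30.9%
```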

In-training evaluation reports

In-training evaluation reports were used by all but one respondent program. Of the respondent programs, 47 (32.4%) administered ITERs at least monthly, 39 (26.9%) administered ITERs bimonthly and 58 (40.0%) administered ITERs quarterly. Significant differences in ITER administration frequency were observed between the three specialty categories. The medical specialties administered ITERs most frequently (almost monthly), while surgical specialties were more likely to administer them quarterly. In keeping with the RCPSC evaluation requirement, 128 (85.9%) of the respondent programs had ITERs developed specifically for their programs. Furthermore, 85 programs (58.2%) had specific ITERs tailored for different clinical rotations. These results are presented in Table 2. A total of 37 programs (24.8%) provided ITER training for physician raters. Compared to physician raters, very few programs had non-physician assessors involved in ITER completion. These results are presented in Table 3.

Table 2.  ITER characteristics and administration frequency with physician raters by specialty category

Table 3.  ITER administration frequency by source

Non-ITER assessment tools used

When queried about assessment tools other than ITERs, all respondent programs reported using one or more tools for resident evaluation. Multiple choice questions and oral examinations were most frequently used, and were also more likely to have pre-determined pass/fail standards. Essays and simulations were least utilized. About half of the respondent programs reported the use of OSCEs. These results are summarized in Table 4. Among the non-ITER assessment tools, oral examinations were administered most frequently, with a mean of 2.20 (SD = 1.48) times per year, as shown in Table 5. When the annual administration frequencies of all non-ITER assessment tools were summed, residents were evaluated from twice per year in two respondent programs (1.5%) to 29 times per year in one program (0.7%), with a mean of 7.95 (SD = 4.18) times per year. Across the respondent programs, the mean number of non-ITER assessment tools utilized was 3.64 (SD = 1.31) (Table 1).

Table 4.  Non-ITER assessment tool utilization by specialty and with pre-determined pass/fail systems

Table 5.  Non-ITER assessment tools annual administration frequency

Ways of improving resident evaluation

When asked about potential areas for improvement in evaluation processes, respondent program directors indicated that national collaboration between programs within their own specialty, and leadership from the RCPSC regarding assessment tools were their greatest needs.

Discussion

This is one of the first studies of Canadian residency program directors designed to examine current national practices of resident evaluation. Respondents included program directors from the 28 RCPSC-accredited specialties across the 13 English-speaking medical schools that had RCPSC-accredited residency training programs at the time of the study. Investigative specialties were less likely to participate in the study.

In-training evaluation reports

ITERs completed by physician raters continue to be the most commonly used assessment method, with most programs administering them monthly or every 2 months. The combined advantages of feasibility and face validity likely explain this high usage (Daelmans et al. Citation2005). Few programs provided ITER training to physician raters, a finding consistent with the results of previous work by Epstein and Hundert (Citation2002) and Ruedy (Citation2006). Very few programs relied on non-physician raters to assess trainees. This last finding was also identified by Epstein and Hundert (Citation2002) and Ruedy (Citation2006), and is a concern because proponents of ITERs such as Gray (Citation1996a) and Turnbull et al. (Citation2000) have advocated for multi-disciplinary input to improve the validity and reliability of ITERs. The general standards of accreditation of the RCPSC (RCPSC Citation2006) strongly recommend evaluation from non-physician assessors. There were, however, differences in the use of ITERs by specialty grouping. Medical and investigative specialties administered ITERs more frequently than surgical specialties. Non-physician raters were more likely to be used by medical and surgical specialties than by investigative specialties. This finding likely relates to the nature of training in the investigative specialties and the more limited opportunity these trainees have to work with nurses, patients and medical students. It does not, however, account for the lower involvement of resident peers in ITER evaluation.

Non-ITER assessment tools used

Consistent with the recognition that many methods are required to ensure quality assessment across competencies, it was reassuring to find that all respondent programs used one or more assessment tools in resident evaluation, in addition to the ITER. Oral examinations, MCQs and SAQs were the three most popular non-ITER assessment methods and were used by most programs. About half of the programs used OSCEs and logbooks. Very few programs used simulations and even fewer relied on essays. There were differences by specialty group. Medical specialties were most likely to use OSCE assessments, possibly related to the nature of the work of medical specialists and the significant investment in human resources required for OSCEs (Barman Citation2005), which become cost-effective only when there are sufficient numbers of trainees. In our study, medical specialties reported a higher mean number of residents than the surgical and investigative specialties. Surgical specialties had higher MCQ and logbook utilization than medical and investigative specialties. All surgical trainees (except those in ophthalmology and obstetrics and gynaecology) must pass the Principles of Surgery Examination, which consists of a series of MCQs. Most surgical programs recommend that their residents maintain an updated procedural log throughout their residency in order to obtain future hospital privileges in Canada and the United States.

Ways of improving resident evaluation

Medical education researchers suggest that using a combination of assessment tools results in a more comprehensive evaluation of the learner and hence a higher quality evaluation of trainees (Miller Citation1990; Holmboe & Hawkins Citation1998; Turnbull et al. Citation1998; Farrell Citation2005; van der Vleuten & Schuwirth Citation2005; Cole Citation2006). To this end, the RCPSC requires residency training programs to provide ongoing evaluation of their residents using a variety of assessment methods. This study provides reassuring evidence that all programs were in compliance with this recommendation. Nonetheless, respondent program directors reported a strong need for improvement in resident in-training evaluation processes, with support for national collaboration between programs within a specialty. One can propose that specialty societies should take a leadership role in establishing lateral collaboration between programs in resident evaluation. Such organizations exist in the United States, e.g. the Association of Program Directors in Radiology Education Committee, which assists radiology program directors to meet ACGME evaluation requirements (Collins et al. Citation2004). A similar approach in Canada, through the formation of committees of program directors, would facilitate the sharing of best practices across programs.

Study limitations

The study has a number of limitations. It was carried out in one country and involved only English-speaking programs, and the findings may therefore not be applicable to the French-speaking programs in Canada. Although there were data from all 13 English-speaking institutions, program directors for the investigative specialties were significantly less likely to participate in the study compared to the medical and surgical specialties, reducing the applicability of the study results to that group. Furthermore, only program directors were surveyed and only their perspectives were presented in the study. Program directors are only one of many stakeholders involved in resident evaluation, and as Murphy et al. (Citation2008) have reported, awareness of the perceptions of all stakeholders is important in resident evaluation. However, a study surveying all stakeholders was beyond the scope of this work.

Nonetheless, these findings suggest a number of new opportunities to improve trainee assessment. ITER assessment could be enhanced by training raters to reduce bias and errors (Holmboe & Hawkins Citation1998). Involving peers, patients and other health care professionals in ITER assessment would likely increase the validity of the ratings. Canada is a small country with 17 medical schools. Collaboration with the RCPSC and within specialties could help with the testing of assessment instruments and approaches to training. The importance of assessment is recognized, but the task is too complicated to be left to the limited resources that many residency programs and specialties have.

Conclusion

In this study we found that while most Canadian residency programs use ITERs to evaluate resident performance, few programs provide rater training for ITERs. This increases the idiosyncrasy of observer ratings, which may reduce the validity of ITERs. Similarly, few programs use non-physician raters, which may further compromise the rating quality, particularly for competencies such as collaboration. In addition to ITERs, multiple choice questions and oral examinations are the most popular tools for assessing resident competencies, whereas essays and simulations are least used across all specialties. However, important differences were observed between specialties. Surgical specialties had higher MCQ and logbook use whereas medical specialties were more likely to incorporate the OSCE in resident evaluation. Finally, helping program directors to develop and share tools across institutions may be the optimal way of improving the diversity and quality of evaluation methods currently being used.

In this study, we found that the use of ITERs continues to predominate in resident assessment. Nonetheless, other methods are being adopted and used across and within disciplines to assess trainees. This work will remain important as the task of ensuring competence across a broad set of physician roles continues, and it will require ongoing collaboration across programs and universities, and within disciplines.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Additional information

Notes on contributors

Sophia Chou

SOPHIA CHOU, MD, MSc, is the principal investigator responsible for study design, data analysis and drafting of this article.

Jocelyn Lockyer

JOCELYN LOCKYER, PhD, made significant contributions to the study design, data analysis and writing of the article.

Gary Cole

GARY COLE, PhD, made significant contributions to the data acquisition and revision of the article.

Kevin McLaughlin

KEVIN MCLAUGHLIN, MD, PhD, made substantial contributions to the study design and revision of the article.

