Discriminating Features of Narrative Evaluations of Communication Skills During an OSCE

Abstract

Construct: The authors examined the use of narrative comments for evaluation of student communication skills in a standardized, summative assessment (an Objective Structured Clinical Examination [OSCE]). Background: The use of narrative evaluations in workplace settings is gaining credibility as an assessment tool, but it is unknown how assessors convey judgments using narratives in high-stakes standardized assessments. The aim of this study was to explore constructs (i.e., performance dimensions), as well as linguistic strategies, that assessors use to distinguish between poor and good students when writing narrative assessment comments on communication skills during an OSCE. Approach: Eighteen assessors from Qatar University were recruited to write narrative assessment comments on communication skills for 14 students completing a summative OSCE. Assessors scored overall communication performance on a 5-point scale. Narrative evaluations for the top and bottom two performing students for each station (based on communication scores) were analyzed for linguistic strategies and constructs that informed assessment decisions. Results: Seventy-two narrative evaluations with 662 comments were analyzed. Most comments (77%) were written without the use of politeness strategies. A further 22% of comments were hedged. Hedging was used more commonly for poor performers than for good performers (30% vs. 15%, respectively). The overarching constructs of confidence, adaptability, patient safety, and professionalism were key dimensions that characterized the narrative evaluations of students’ performance. Conclusions: Results contribute to our understanding of the utility of narrative comments for summative assessment of communication skills. Assessors’ comments could be characterized by the constructs of confidence, adaptability, patient safety, and professionalism when distinguishing between levels of student performance. Findings support the notion that judgments are arrived at by clustering sets of behaviors into overarching and meaningful constructs rather than by solely focusing on discrete behaviors. These results call for the development of better-anchored evaluation tools for communication assessment during OSCEs, constructively aligned with assessors’ map of the reality of professional practice.

Introduction

There are increasing calls for use of narrative evaluations for assessment within medical education.Citation1,Citation2 Narrative allows assessment to go beyond quantitative scales and rubrics and enables greater individualization and specificity of assessment information.Citation3–5 These data can be used to inform targeted feedback in formative assessment and to support decisions in summative assessments that must be defendable (such as graduation decisions or licensure and registration requirements).Citation3,Citation5,Citation6 Moreover, narrative data may allow decision makers to better conceptualize student performance, based on better understanding of assessors’ performance interpretations and judgments.

Although use of narrative comments for assessment purposes is theoretically established, questions remain before implementation as a summative assessment strategy.Citation6 Past research on narratives from workplace-based in-training evaluation reports (ITERs), for example, showed that assessors can reliably categorize and rank order narrative evaluations based on trainee overall performance level and clinical skills.Citation5,Citation6 Of interest, however, little research has tried to determine how assessors who write narratives use certain words, comments, or phrases to achieve this discriminatory ability, especially relating to communication competencies. For example, it is unknown which communication skills, behaviors, or attributes assessors document in support of high performance, and vice versa in justification of poor performance. It is known from rater cognition research that assessors typically vary in what they focus on when observing interactions and in how they interpret overall performance when making judgments as a whole.Citation7–9 There is growing evidence that these differences could represent meaningfully idiosyncratic interpretations of student performance.Citation8 Consequently, in narrative comments assessors may be inclined to document components that are relevant to their own performance theory, namely, the unique set of behaviors or constructs that they pay attention to when making an overall judgment.Citation4,Citation10,Citation11

Aside from what assessors document to support or justify high or poor performance, how they write the comments may offer additional insight into their overall performance judgment. Research findings in the context of workplace-based assessments indicate that assessors’ language in narratives is often vague and generic; assessors use “hidden codes” and other linguistic strategies that must be deciphered to understand the intended meaning of comments.Citation12–14 Findings demonstrate that assessors frequently use linguistic strategies of politeness to disguise or soften conveyed judgments, more commonly for poor performers.Citation14 Politeness theory attempts to explain how individuals, in the setting of a face-threatening act, use linguistic strategies to protect the receiver’s self-image as well as to evade potential conflicts and ensure smooth social interactions.Citation15 With respect to poor performers, for example, the assessor may modify the impact of their comments by using strategies to soften effects on the receiver, such as depersonalizing comments (“no eye contact” vs. “she gave no eye contact”) or choosing hedges (words/phrases) to not fully commit to statements (“a bit rude,” “not very self-confident”).Citation14,Citation15 In the case of high performers, on the other hand, assessors may choose to give compliments (“Excellent job”) or make the receiver feel accepted as part of a group (“A great pharmacist”) to intensify their approval.Citation14 Research findings suggest that assessors modify their language when writing assessment comments depending on performance level (increased hedging for poor performers), affecting overall interpretation or actually providing clues or codes to assessors’ intended judgment.Citation14

Although data regarding narratives in assessment are increasing, research on the use of narratives for summative assessment has so far focused on workplace-based settings. Given increasing calls for use of narrative assessments, on one hand, and the potential impact of assessors’ idiosyncrasies and linguistic strategies on the utility of narrative comments for decision making, on the other, there is an urgent need to explore these phenomena in different assessment contexts. Objective Structured Clinical Examinations (OSCEs) are typically task-focused assessments used as summative evaluations of a trainee’s ability to demonstrate clinical skills.Citation16 Recent research regarding narratives in OSCE contexts focused on the feedback potential of narratives but did not address assessors’ use of narratives for high-stakes decision making.Citation17,Citation18 If narrative evaluations are to be used as a summative assessment approach in these contexts, we need a better understanding of how assessors use specific comments to discriminate and convey messages about students’ performance levels. Therefore, the aim of this study was to explore constructs (i.e., performance dimensions), as well as linguistic strategies, that assessors use to distinguish between poor and good students when writing narrative assessment comments on communication skills during an OSCE.

Methods

Setting

This study was conducted in Qatar. The College of Pharmacy at Qatar University maintains full accreditation for the Bachelor of Science in Pharmacy and Doctor of Pharmacy programs from the Canadian Council of Accreditation of Pharmacy Programs.Citation19 As such, teaching and assessment methods, including use of OSCEs, align with Canadian policies, procedures, and standards.Citation20 Currently, students take a summative OSCE prior to graduation as an exit-from-degree exam. The OSCE is high stakes in nature, as it serves as the final examination for the final clinical course within the curriculum. For the OSCE, cases are blueprinted to the program’s competency framework.Citation21 Interactive (i.e., communication) stations include cases related to medication counseling, referral, self-care, device teaching, ethics in confidentiality, adverse effect management, and provision of health-related information. Students are unaware of case topics prior to completion of the OSCE and have 8 minutes to complete each station. Standardized textbook references are available, if required.

Research procedures and data collection

For this study, two OSCE cycles (each consisting of the same nine interactive OSCE stations) occurred on the same day, with seven participant-students in each cycle (a total of 14 students). Students were convenience sampled from a total population of 25 students (all female), as only 25 students are enrolled per academic year. Recruitment occurred via e-mail to all eligible students. The first 14 students to reply with interest for the study were selected. Participant-students were positioned between nonparticipant students within each cycle to allow for ample time for assessors to write a narrative evaluation. All students had experience completing formative OSCEs. All received extensive communication training both on campus during professional skills courses and off campus during 960 hours of patient care and professional practice-related activities.

Eighteen assessors (n = 2 per OSCE station) were recruited via e-mail to provide narrative evaluations of students’ communication skills for study purposes. Two other assessors were present within the station to evaluate students for grading purposes (required by university policies). Participants were selected using convenience sampling from the assessor pool at Qatar University. Assessors were eligible for participation if they were health professionals, had previous training and experience assessing pharmacy student communication skills, and provided consent. All assessors were pharmacists or faculty within the College of Pharmacy. Assessors were all experienced OSCE assessors and were trained according to the framework for teaching and assessing communication skills currently used at Qatar University. For this study, they received additional training during a group session in advance of the OSCE, in which study objectives were explained and samples of narrative assessment comments extracted from the literature were provided.Citation10,Citation22 These examples were not related to communication so as to prevent anchoring or seeding bias during the actual study.

Assessors remained constant for each station throughout both cycles and were asked to write narrative evaluations of students’ communication skills during the observed interaction. Evaluations were handwritten on a blank sheet of paper. Instructions to assessors were as follows: “Please use the space below (and on the reverse if needed) to write a detailed narrative evaluation of the students’ communication skills.” We purposefully gave no direction on the length or content of assessment comments, to minimize bias in terms of the skills, behaviors, and other attributes on which assessors focus. Assessors were given 17 minutes to write narrative assessment comments per station/student. Assessors were instructed to keep the narrative comments strictly confidential from their coassessor or any other person to avoid data contamination.

Assessors were also asked to assign an overall performance score to each student according to a 5-point communication assessment scale, with anchors at points 1 (communicates inappropriately and ineffectively to the task), 3 (communicates with some logic and comprehension but not applied consistently), and 5 (communicates precisely, logically and perceptively to the encounter, integrating all relevant components). Assessors were instructed that their scores would not count for grading purposes but students would receive their narrative comments upon completion of the summative OSCE.

As we specifically sought to explore constructs (i.e., performance dimensions) and linguistic strategies that assessors use to distinguish between poor and good students when writing narrative assessment comments, we purposively collected and analyzed the narrative evaluations for the top two and bottom two performing students for each station. For each station, poor performers (bottom two per station) and high-performing students (top two per station) were identified based on the composite quantitative communication scores given by the two assessors involved in the study.
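To make the selection step concrete, a minimal sketch follows (ours, for illustration only; the data layout and names are hypothetical). It assumes each student’s composite score per station is the sum of the two study assessors’ 5-point communication scores, as described above.

```python
# Hypothetical sketch: rank students within each station by composite
# communication score and select the top two and bottom two performers.
from collections import defaultdict

# (station, student) -> [score from assessor 1, score from assessor 2]
# Scores below are invented for illustration.
scores = {
    ("station_1", "student_01"): [4, 5],
    ("station_1", "student_02"): [2, 3],
    ("station_1", "student_03"): [3, 3],
    ("station_1", "student_04"): [5, 5],
    # ... remaining students and stations omitted
}

by_station = defaultdict(list)
for (station, student), pair in scores.items():
    by_station[station].append((sum(pair), student))

for station, ranked in by_station.items():
    ranked.sort(reverse=True)                 # highest composite first
    top_two = [s for _, s in ranked[:2]]      # "good performers"
    bottom_two = [s for _, s in ranked[-2:]]  # "poor performers"
    print(station, "top:", top_two, "bottom:", bottom_two)
```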

Data analysis

Use of linguistic strategies

Extracted narrative comments were coded line by line using a deductive approach according to the adapted politeness theory framework.Citation14,Citation15 If a politeness strategy was noted, it was coded as such. If no politeness strategy was noted, the comment was coded as “no politeness strategy.” Upon reading the narratives, it was clear that the majority of comments were depersonalized and that depersonalization was not being used to convey politeness in this context but rather to record short observations. We therefore coded each statement as either personalized or depersonalized, in addition to any other politeness strategy present. Comments that were depersonalized but did not include another politeness strategy were coded as “no politeness strategy.” Coding was performed by two independent coders, the PI (KW) and a research assistant. Coding and coding processes were regularly discussed within the research team; all investigators were given the raw coding data to review and to question or challenge coding results. Coding disagreements were resolved through discussion. The proportion of narrative comments coded for each politeness strategy was calculated. The chi-square test was used to compare proportions of linguistic strategies used across performance levels (IBM SPSS Statistics, Version 24.0, Armonk, NY).
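For readers less familiar with this test, the comparison reduces to a chi-square test on a 2 × 2 table of coded comment counts. The actual analysis was run in IBM SPSS; the Python sketch below is ours, and the counts in it are invented for illustration.

```python
# Illustrative 2x2 chi-square comparing how often a politeness strategy
# (e.g., hedging) was coded in comments for good vs. poor performers.
# Counts are invented; the study's analysis was performed in SPSS.
from scipy.stats import chi2_contingency

table = [
    [20, 80],  # good performers: [strategy coded, strategy not coded]
    [40, 60],  # poor performers: [strategy coded, strategy not coded]
]

# correction=False returns the plain Pearson chi-square statistic
# (no Yates continuity correction).
chi2, p, dof, _expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.1f}, p = {p:.4f} ({dof} df)")
```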

Use of performance dimensions and constructs

To determine which aspects of communication assessors focused on within narratives when discriminating between students, comments were recoded using a general inductive coding approach.Citation23 Specifically, each narrative was first segmented into phrases representing a single thought, idea, or statement by one of the researchers (KW). Segments were identified on the basis of semantic features (i.e., content features, as opposed to noncontent features such as syntax); segments could thus include several comments. Each segment was reviewed and coded by two independent coders using open and axial coding to identify communication behaviors as well as core meaning.Citation23 After coding was complete, coders identified segments that appeared to be decisive for the overall judgment, justifying the respective good or poor performance rating as determined by the quantitative scores. Identification of these segments occurred by comparing and contrasting the performance score with comments that were either positive or negative, depending on the score. For example, for a narrative with a score of 5, all segments within the narrative that represented a clearly positively perceived aspect of communication were extracted for further review. For each of these segments, narrative comments were reviewed multiple times to search for patterns in what assessors appeared to focus on when discriminating between levels of student performance. These patterns of interrelated comments were then grouped into key categories representing different constructs that appeared to form the central focus of assessors’ narrative comments. Narratives were then reread multiple times to check for disconfirming evidence and to ensure robustness of the analysis. Again, coding and coding processes were regularly discussed within the research team, and all investigators agreed on the final identified constructs. Finally, representative quotes from the narrative comments were extracted.

This study was approved by the Qatar University Institutional Review Board (QU-IRB 571-E/16).

Results

Assessor demographics are provided in Table 1. For eight stations, the range of communication scores was from 2 to 5, and for one station the range was 3 to 5. Seventy-two narrative evaluations were extracted for analysis (36 from good performers and 36 from poor performers). The final data set included 662 individual comments, averaging nine comments per narrative.

Table 1. Baseline characteristics of assessors.

Use of linguistic strategies

Table 2 provides results of the linguistic line-by-line coding analysis. The majority of narrative comments were written with no politeness strategy (77%) and were depersonalized (73%). A further 22% of comments were hedged. With respect to entire narratives, eight (11%) contained no politeness strategies, 17 (24%) were entirely depersonalized, and 51 of 72 (71%) contained at least one hedge. Assessors used politeness strategies differently between good and poor performers. Coding of “no politeness strategy” was more common in comments pertaining to good versus poor students (83% vs. 70%, respectively; chi-square test statistic = 15.3, p < .001). Table 2 furthermore shows that hedging was more commonly coded in comments for poor performers than for good performers (30% vs. 15%, respectively; chi-square test statistic = 21.4, p < .001).
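As a back-of-envelope check (ours, not the authors’): the split of the 662 comments between good and poor performers is not reported, but assuming, purely for illustration, an even split of 331 comments per group, the reported hedging percentages yield a Pearson chi-square near the value above.

```python
# Illustrative reconstruction only: an even split of comments between
# groups (331 good, 331 poor) is an assumption, not a reported figure.
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

good_hedged = round(0.15 * 331)  # 15% of good-performer comments -> 50
poor_hedged = round(0.30 * 331)  # 30% of poor-performer comments -> 99
chi2 = pearson_chi2_2x2(good_hedged, 331 - good_hedged,
                        poor_hedged, 331 - poor_hedged)
print(f"approximate chi-square = {chi2:.1f}")  # ~20.8, near the reported 21.4
```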

Table 2. Common linguistic strategies coded within narratives obtained from OSCE evaluations for good and poor performing students.

Use of performance dimensions and communication constructs

We identified four communication constructs that characterized the written assessment comments (i.e., confidence, adaptability, patient safety, and professionalism). Behaviors were documented either generally (usually for good performers) or specifically, using detailed descriptions of student behaviors within the interaction (usually for poor performers). The following sections provide a description of each of the constructs and how assessor comments characterized the construct, with specific examples for both good and poor performers. Explanations of how the comments were deemed to be coupled with assessors’ communication scores are also provided.

Construct 1: Confidence

Confidence was identified as a key characteristic of assessors’ narratives, often explicitly stated and exemplified using a cluster of different behavioral descriptions:

Projected well, was confident and assertive. (Assessor 7, Student 14, score =4)

Tone of voice is confident and reassuring. … No distracting hand/body gestures. (Assessor 14, Student 13, score =5)

For poor performers, comments often included more detailed descriptions of specific behaviors from which (a lack of) confidence could be inferred:

When making recommendation: very soft spoken and not confident, uses words such as “like something” not sure of self. (Assessor 3, Student 2, score =3)

Backed away when patient was upset. … Fiddled with pen during entire conversation … not confident with answer. (Assessor 8, Student 2, score =2)

For both good and poor performers, these comments about confidence appeared to set the tone of the overall narrative. The common attribute between most confidence-related comments was that of demonstrating command of the interaction. As shown in the preceding examples, assessors used specific communication behaviors (e.g., gestures or eye contact) to explain or justify how a student did or did not portray confidence. These could be explicitly stated (“poor eye contact – lack of confidence”) or implicitly interpreted based on the comments in context of the segment or complete narrative. Voice, with respect to tone and volume, was central across the entire data set as a measure of confidence, clearly discriminating between good and poor performers.

Construct 2: Adaptability

Similarly, the ability of students to adapt or tailor communication to the patient was identified as a central construct in assessor narratives. Good performers were explicitly rewarded for being patient centered in their approach, exemplified by verbal and nonverbal behaviors (voice tone, facial expressions, and gestures/body language) and the patient’s satisfaction with the interaction, as inferred from verbal or nonverbal patient communication:

Expresses compassion, responds perceptively to patient feelings. … Establishes good rapport with the patient. (Assessor 18, Student 10, score =5)

The student showed excellent empathy towards the patient, and she used good non-verbal gestures to make the patient more comfortable even if she was in a hurry. (Assessor 1, Student 10, score =5)

Stood to match patient, asked him to sit, good to establish rapport. (Assessor 7, Student 14, score =4)

Conversely, in poor performers, assessors documented inability to develop rapport and inappropriate responses to the patient’s reactions and emotions:

Smiled throughout interaction despite patient being upset. … Did not acknowledge patient’s emotions. … Did not try to explain answer in any other way. (Assessor 8, Student 2, score =2)

Slightly superior mannerisms. … Not listening to standardized patient’s intro—in too much of a hurry. … Not reassuring—no warmth in manner. This mother is worried—it’s your job to calm her down. (Assessor 14, Student 4, score =3)

She did not show sympathy to the patient and talked like robot! (Assessor 9, Student 5, score =3)

A common feature of these comments is the incorporation of multiple examples or behaviors into a segment that, when read as a whole, represents the larger construct of the student’s ability (or inability) to adapt to the situation at hand.

Construct 3: Patient safety

Narrative comments were characterized by single statements relating to a patient’s health and well-being. Safety was not always associated with a specific risk or harm done but also included communication to promote patient understanding of the therapeutic plan and medication-related instructions. Instances in which safety was mentioned positively were associated with good performance:

She … cared for the patient’s safety regardless of the fact that he was in a hurry. (Assessor 1, Student 5, score =5)

She was empathetic and noticed shortness of breath quickly and asked about it. … She insisted on referral (very good). (Assessor 5, Student 12, score =4)

Explicitly mentioning risks to patient safety was associated with poor performance:

Regardless of the fact that she provided the right recommendation, she had enough time to counsel the patient about what she missed in order to ensure safety. (Assessor 1, Student 3, score =2)

Referred to another pharmacist—dangerous! Did not act with responsibility and tried to pass off to another pharmacist or for patient to look it up [themselves]. (Assessor 8, Student 2, score =2)

Although the construct of safety was identified less frequently than confidence or adaptability, the association with quantitative scores was very strong. A single mention of the safety construct appeared to be decisive for the assessor’s overall judgment of the interaction.

Construct 4: Professionalism

Professionalism was identified as a fourth key construct in assessor narratives and comments that helped distinguish between student levels of performance. Assessors used examples relating to attitude, appearance, and professional identity to infer students’ professionalism. Comments for good performers frequently used strong compliments and punctuation to accentuate emphasis:

For me, she demonstrates a model healthcare provider! (Assessor 5, Student 6, score =5)

Good introduction, looks professionally dressed. … Overall, she discussed the situation well and had excellent verbal and nonverbal communication skills. (Assessor 7, Student 4, score =5)

Comments for poor performers consistently justified or explained why the assessor deemed the student’s actions not to be professional:

Really needs to work on professional appearance like hair style and clothing—part of public confidence. (Assessor 14, Student 2, score =3)

Not friendly—she introduced herself but not in the friendly manner. (Assessor 9, Student 8, score =2)

For this construct, assessors appeared to distinguish between good and poor performers by providing insight into how they related to students on a professional level. Assessors focused on whether the student behaved, dressed, or overall acted like an individual fit for practice who could be entrusted to start working as “one of them.”

Discussion

This study explored assessors’ use of narrative assessment comments on communication skills to distinguish between performance levels during a summative OSCE. We found that assessors’ narratives primarily included depersonalized comments without the use of politeness strategies, yet hedging was used more commonly for poor performers. Our findings also showed that assessors’ comments could be characterized by four constructs related to communication—confidence, adaptability, patient safety, and professionalism—when discriminating between students. Despite documenting judgments regarding many aspects of communication, including specific behaviors, these four constructs consistently appeared to inform overall performance judgments, as demonstrated by scoring as a “good” or “poor” performer.

A key finding from this study is that assessors seem to focus on fundamental constructs rather than discrete behaviors when judging task performance and conveying performance information during OSCEs. Assessors tend to cluster specific communication behaviors into meaningful patterns to explain or justify their judgments pertaining to overarching constructs. They are likely less worried about whether a student makes a behavioral mistake and more concerned with whether the student can adopt and demonstrate the core constructs, or fundamental patterns of behavior, that determine effective pharmacist–patient communication. This finding is in line with recent literature on rater cognition suggesting that assessors, although valuing different behaviors in assessment, tend to prioritize centralized themes when making judgments.Citation7,Citation9–11,Citation13,Citation24 Our results support this concept and provide evidence that assessors, when writing narratives, seem to take a more holistic approach to assessment of OSCE performance, focusing on broad constructs yet substantiating their judgments by providing descriptions of specific behaviors. Findings from our study thus advance our understanding of how assessors observe and judge students’ communication skills in standardized assessment tasks. Results from our study specifically point to key dimensions in assessors’ performance theories (i.e., the types of constructs that assessors focus on when interpreting observed communication behaviors), underpinning their decisions and performance feedback.

Although contexts were not directly compared in this study, our results with respect to linguistic strategies in the OSCE setting differ from previous findings in workplace-based assessment.Citation14 In workplace settings, research findings showed high-frequency use of positive politeness strategies (compliments, exaggeration, in-group identity markers, offers, and optimism), whereas these strategies were negligible in our study.Citation14 As well, hedging appeared to be more commonly used by assessors in the workplace context (hedging present in 94% vs. 71% of low vs. high performer comments, respectively).Citation14 The specific and depersonalized feedback observed in our study may indicate that assessors feel safe to write these comments in an OSCE context. Various factors may explain this contextual difference. An OSCE station is a short, task-focused interaction, compared to weeks or months of supervised, multifaceted workplace-based training. This may make it easier for assessors to simply write down what they see and how they see it rather than providing general comments, encouragement, or compliments. In addition, as assessors within the OSCE station do not interact with students, assessors and students do not develop working relationships or interdependence. Assessors may therefore not feel the need to disguise their comments with politeness.

Implications

The two major findings of our study contribute to a better understanding of the use of narrative comments in summative assessment and have further implications for assessment of communication skills during OSCEs. The overall lack of politeness strategies within comments facilitates fairly easy interpretation of assessors’ judgments and is likely to allow students and summative decision makers to understand the intended meaning of assessors’ comments. Our findings with respect to assessors’ use of broad constructs raise questions about common assessment practices using tools that break down communication evaluation into microcomponents rather than interpreting performance as a whole. Assessment tools focusing on what assessors consider to be key constructs in communication and using assessors’ language might be more constructively aligned, more meaningful, and thus potentially more valid in high-stakes assessments.Citation25 Inclusion of discrete, specific behaviors, however, may be needed to enhance the assessment’s formative function (specifying feedback) and/or to further justify summative judgments. Further research on the use of narrative in summative OSCE evaluation should therefore focus on exploring the utility of anchoring summative assessment tools using overarching constructs, illustrated with specific communication behaviors. In addition, research should focus on development of an “assessment thesaurus,” listing words and phrases representing relevant performance dimensions and constructs. Development of a useful and meaningful assessment language may support assessor tasks in judgment and decision making, as well as design of rating scales with constructively aligned anchors.

Limitations

Limitations of our study should be noted. First, our data are limited to one setting with a small sample size of students and assessors. Although this limits transferability of the results, the number of narrative assessment comments analyzed within our study was large (72 narratives with a total of 662 comments), supporting our conclusions for this specific data set. Second, English was not the first language for 11 of 18 assessors. Although all comments were written in English, grammar may have differed as a result of English proficiency, and language and cultural differences may have influenced the linguistic analysis. Our findings, however, showed consistent use of key constructs across all assessors. A third limitation of this study is that the data were obtained from assessors who had protected time to provide comments and were not providing summative pass–fail decisions, which might explain the relative absence of politeness strategies. It should be noted, however, that these assessors were highly trained, sampled from the same assessor pool as regular assessors, and instructed to provide assessment as if they were making summative decisions. A final limitation of this study is that written narratives do not necessarily reflect assessors’ information processing. We do not know how assessors made their judgments, that is, whether they first noticed certain behaviors and thereafter constructed their judgment or whether they used the overarching constructs as a starting point for selecting and interpreting behavioral observations. However, providing motivations for performance ratings is inherent in many assessment tasks, and our data clearly suggest that assessors’ motivations in our summative communication OSCE rest on broad and overarching constructs. Despite study limitations, we feel the results are novel, meaningful, and of significance for research pertaining to narrative in assessment.

Conclusions

This study contributes to an increasing understanding of the use of narrative evaluations and was the first to explore, through a qualitative lens, the distinguishing features of narrative comments across good and poor performers regarding communication skills within an OSCE context. Our results support the utility of narrative evaluations for summative decision making in standardized assessment settings. As evidenced by our study, narrative evaluations may enhance assessment of communication performance by facilitating development of better-anchored evaluation tools, constructively aligned with assessors’ map of the reality of professional practice. To optimize narrative assessment approaches, however, further research and field testing across different professional and cultural contexts are needed to deepen our understanding of how assessor narratives can be used to support summative decision making while fostering student learning and competence development.

Acknowledgments

We thank Eline Vanassche for her assistance with the qualitative analysis.

References

  • Hodges B. Assessment in the post-psychometric era: learning to love the subjective and collective. Med Teach. 2013;35(7):564–568.
  • Hanson JL, Rosenberg AA, Lane JL. Narrative descriptions should replace grades and numerical ratings for clinical performance in medical education in the United States. Front Psychol. 2013;4:668.
  • Cook DA, Kuper A, Hatala R, Ginsburg S. When assessment data are words: validity evidence for qualitative educational assessments. Acad Med. 2016;91(10):1359–1369.
  • Govaerts MJB, Van de Wiel MJW, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: raters' performance theories and constructs. Adv Health Sci Educ Theory Pract. 2013;18(3):375–396.
  • Ginsburg S, Eva K, Regehr G. Do in-training evaluation reports deserve their bad reputations? A study of the reliability and predictive ability of ITER scores and narrative comments. Acad Med. 2013;88(10):1539–1544.
  • Ginsburg S, van der Vleuten CPM, Eva KW. The hidden value of narrative comments for assessment: a quantitative reliability analysis of qualitative data. Acad Med. 2017;92(11):1617–1621.
  • Gauthier G, St-Onge C, Tavares W. Rater cognition: review and integration of research findings. Med Educ. 2016;50(5):511–522.
  • Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the 'black box' differently: assessor cognition from three research perspectives. Med Educ. 2014;48(11):1055–1068.
  • Wilby KJ, Govaerts MJB, Austin Z, Dolmans DHJM. Exploring the influence of cultural orientations on assessment of communication behaviours during patient-practitioner interactions. BMC Med Educ. 2017;17:61.
  • Ginsburg S, McIlroy J, Oulanova O, Eva K, Regehr G. Toward authentic clinical evaluation: pitfalls in the pursuit of competency. Acad Med. 2010;85(5):780–786.
  • Wilbur K, Hassaballa N, Mahmood OS, Black EK. Describing student performance: a comparison among clinical preceptors across cultural contexts. Med Educ. 2017;51(4):411–422.
  • Ginsburg S, Gold W, Cavalcanti RB, Kurabi B, McDonald-Blumer H. Competencies “plus”: the nature of written comments on internal medicine residents evaluation forms. Acad Med. 2011;86:S30–S34.
  • Ginsburg S, Regehr G, Lingard L, Eva KW. Reading between the lines: faculty interpretations of narrative evaluation comments. Med Educ. 2015;49(3):296–306.
  • Ginsburg S, van der Vleuten C, Eva KW, Lingard L. Hedging to save face: a linguistic analysis of written comments on in-training evaluation reports. Adv Health Sci Educ Theory Pract. 2016;21(1):175–188.
  • Brown P, Levinson SC. Politeness: Some Universals in Language Usage. New York, NY: Cambridge University Press; 1987.
  • Austin Z, O'Byrne C, Pugsley J, Munoz LQ. Development and validation processes for an objective structured clinical examination (OSCE) for entry-to-practice certification in pharmacy: the Canadian experience. Am J Pharm Educ. 2003;67(3):76.
  • Harrison CJ, Konings KD, Molyneux A, Schuwirth LWT, Wass V, van der Vleuten CPM. Web-based feedback after summative assessment: how do students engage? Med Educ. 2013;47(7):734–744.
  • Harrison CH, Molyneux A, Blackwell S, Wass VJ. How we give personalised audio feedback after summative OSCEs. Med Teach. 2015;37(4):323–326.
  • Canadian Council for Accreditation of Pharmacy Programs. Accredited Programs. Canadian Council for Accreditation of Pharmacy Programs Web site. http://www.ccapp-accredit.ca. Published 2017. Accessed September 23, 2018.
  • Wilby KJ, Diab M. Key challenges for implementing a Canadian-based objective structured clinical examination (OSCE) in a Middle Eastern context. Can Med Educ J. 2016;7(3):e4–e9.
  • Association of Faculties of Pharmacy of Canada. Educational Outcomes for first professional degree programs in pharmacy in Canada–June 4, 2017. Association of Faculties of Pharmacy of Canada Web site. http://afpc.info/node/39. Published June 20, 2017. Accessed September 23, 2018.
  • Regehr G, Ginsburg S, Herold J, Hatala R, Eva K, Oulanova O. Using “standardized narratives” to explore new ways to represent faculty opinions of resident performance. Acad Med. 2012;87(4):419–429.
  • Thomas DR. A general inductive approach for analyzing qualitative evaluation data. Am J Eval. 2006;27(2):237–246.
  • Govaerts MJ, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: effects of rater expertise. Adv Health Sci Educ Theory Pract. 2011;16(2):151–165.
  • Biggs J. Enhancing teaching through constructive alignment. High Educ. 1996;32(3):347–364.