Abstract
Purpose
Systematic differences among raters’ approaches to student assessment may result in leniency or stringency of assessment scores. This study examines the generalizability of medical student workplace-based competency assessments, including the impact of adjusting scores for rater leniency and stringency.
Methods
Data were collected from summative clerkship assessments completed for 204 students during the 2017–2018 clerkship year at a single institution. Generalizability theory was used to explore variance attributed to different facets (rater, learner, item, and competency domain) through three unbalanced random-effects models per clerkship, including models applying assessor stringency-leniency adjustments.
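The core of a stringency-leniency adjustment is re-centering each rater's scores by that rater's mean deviation from the grand mean, so systematic 'hawk' or 'dove' offsets are removed before variance components are estimated. The sketch below is illustrative only; the rater names and scores are hypothetical and do not come from the study data, and the full analysis would additionally fit unbalanced random-effects models.

```python
# Minimal sketch of a stringency-leniency ('hawk-dove') correction.
# All data here are hypothetical, not study data.
from statistics import mean

# scores[rater] = list of scores that rater assigned (hypothetical)
scores = {
    "rater_A": [3, 3, 4, 3],  # a stringent 'hawk' (scores run low)
    "rater_B": [5, 5, 4, 5],  # a lenient 'dove' (scores run high)
    "rater_C": [4, 4, 4, 4],
}

# Grand mean across all ratings
grand_mean = mean(s for v in scores.values() for s in v)

# Each rater's offset is their mean minus the grand mean;
# subtracting it re-centers every rater on the grand mean.
adjusted = {
    rater: [s - (mean(v) - grand_mean) for s in v]
    for rater, v in scores.items()
}
```

After adjustment, every rater's mean equals the grand mean, so remaining score differences reflect the students seen rather than who rated them.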
Results
In the original assessments, only 4–8% of the variance was attributed to the student with the remainder being rater variance and error. Aggregating items to create a composite score increased variability attributable to the student (5–13% of variance). Applying a stringency-leniency (‘hawk-dove’) correction substantially increased the variance attributed to the student (14.8–17.8%) and reliability. Controlling for assessor leniency/stringency reduced measurement error, decreasing the number of assessments required for generalizability from 16–50 to 11–14.
Conclusions
Similar to prior research, most of the variance in competency assessment scores was attributable to raters, with only a small proportion attributed to the student. Making stringency-leniency corrections using rater-adjusted scores improved the psychometric characteristics of assessment scores.
Disclosure statement
The authors of this manuscript have no declarations of interest to report.
Glossary
Hawk–Dove effect: Systematic differences among raters’ approaches to student assessment may result in leniency or stringency, traditionally known as the ‘hawk-dove’ effect. Hawks tend to give more stringent (lower) scores and doves tend to give more lenient (higher) scores.
McManus I, Thompson M, Mollon J. 2006. Assessment of examiner leniency and stringency (‘hawk-dove effect’) in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling. BMC Med Educ. 6(1):42.
Additional information
Notes on contributors
Sally A. Santen
S.A. Santen, MD, PhD, is senior associate dean for evaluation, assessment, and scholarship, and professor of emergency medicine, Virginia Commonwealth University School of Medicine.
Michael Ryan
M. Ryan, MD, MEHP, is assistant dean for clinical medical education and associate professor of pediatrics, Virginia Commonwealth University School of Medicine.
Marieka A. Helou
M.A. Helou, MD, is associate professor, and clerkship director, department of pediatrics, Virginia Commonwealth University School of Medicine.
Alicia Richards
A. Richards, MS, is a doctoral student in the department of biostatistics, Virginia Commonwealth University School of Medicine.
Robert A. Perera
R.A. Perera, PhD, is associate professor of biostatistics, Virginia Commonwealth University School of Medicine.
Kellen Haley
K. Haley, MS, was a fourth-year medical student at Virginia Commonwealth University School of Medicine and is now an intern in neurology at Rochester School of Medicine.
Melissa Bradner
M. Bradner, MD, MSHA, is associate professor and clerkship director of family medicine, Virginia Commonwealth University School of Medicine.
Fidelma B. Rigby
F.B. Rigby, MD, is associate professor and clerkship director of obstetrics and gynecology, Virginia Commonwealth University School of Medicine.
Yoon Soo Park
Y.S. Park, PhD, was associate professor and associate head, department of medical education, and director of research, office of educational affairs, University of Illinois at Chicago College of Medicine, Chicago, Illinois; and is now at the Massachusetts General Hospital, Harvard Medical School.