Educational Case Reports

A Multi-institutional Study of the Feasibility and Reliability of the Implementation of Constructed Response Exam Questions

Pages 609-622 | Received 14 Sep 2021, Accepted 27 Jul 2022, Published online: 20 Aug 2022
 

Abstract

Problem

Some medical schools have incorporated constructed response short answer questions (CR-SAQs) into their assessment toolkits. Although CR-SAQs carry benefits for medical students and educators, the faculty perception that creating and scoring CR-SAQs requires an infeasible amount of time, along with concerns about scoring reliability, may impede the use of this assessment type in medical education.

Intervention

Three US medical schools collaborated to write and score CR-SAQs based on a single vignette. Study participants included faculty question writers (N = 5) and three groups of scorers: faculty content experts (N = 7), faculty non-content experts (N = 6), and fourth-year medical students (N = 7). Structured interviews were performed with the question writers, and an online survey was administered to the scorers to gather information about their processes for creating and scoring CR-SAQs. A content analysis was performed on the qualitative data using Bowen’s model of feasibility as a framework. To examine inter-rater reliability between the content experts and the other scorers, a random selection of fifty student responses from each site was scored by that site’s faculty content experts, faculty non-content experts, and student scorers. A holistic rubric (6-point Likert scale) was used by two schools, and an analytic rubric (3- to 4-point checklist) was used by one school. Cohen’s weighted kappa (κw) was used to evaluate inter-rater reliability.
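The article does not specify the software used for the reliability analysis; the following is a minimal sketch of how weighted kappa between two scorers' ordinal rubric ratings might be computed, assuming Python with scikit-learn and entirely hypothetical scores.

```python
# Minimal sketch (assumption: Python + scikit-learn; the study does not state its tooling).
from sklearn.metrics import cohen_kappa_score

# Hypothetical example: 6-point holistic rubric scores assigned by a faculty
# content expert and a student scorer to the same set of responses
# (truncated here for illustration; the study used fifty responses per site).
expert_scores = [5, 4, 6, 3, 5, 2, 4, 6, 5, 3]
student_scores = [4, 4, 6, 3, 5, 3, 4, 5, 5, 2]

# Linearly weighted kappa penalizes disagreements in proportion to their
# distance on the ordinal scale (quadratic weighting is another common choice).
kappa_w = cohen_kappa_score(expert_scores, student_scores, weights="linear")
print(f"Weighted kappa: {kappa_w:.2f}")
```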

Context

This research study was implemented at three US medical schools that are nationally dispersed and have been administering CR-SAQ summative exams as part of their programs of assessment for at least five years. The study exam question was included in an end-of-course summative exam during the first year of medical school.

Impact

Five question writers (100%) participated in the interviews and twelve scorers (60% response rate) completed the survey. Qualitative comments revealed three aspects of feasibility: practicality (time, institutional culture, teamwork), implementation (steps in the question writing and scoring process), and adaptation (feedback, rubric adjustment, continuous quality improvement). The scorers described their experience in terms of the need for outside resources, concern about lack of expertise, and value gained through scoring. Inter-rater reliability between the faculty content expert and student scorers was fair/moderate (κw=.34-.53, holistic rubrics) or substantial (κw=.67-.76, analytic rubric), but much lower between faculty content and non-content experts (κw=.18-.29, holistic rubrics; κw=.59-.66, analytic rubric).

Lessons Learned

Our findings show that, from the faculty perspective, it is feasible to include CR-SAQs in summative exams, and we provide practical information for medical educators creating and scoring CR-SAQs. We also learned that CR-SAQs can be reliably scored using an analytic rubric by faculty without content expertise or by senior medical students, or using a holistic rubric by senior medical students, which provides options for alleviating the faculty burden associated with grading CR-SAQs.

Acknowledgements

The authors wish to thank the content experts who worked diligently together to create a common set of exam questions, and the non-content experts and students who scored the study questions, as well as Elisabeth Schlegel, PhD for performing the interviews with the question writers. The authors also thank Saori Wendy Herman, MLIS, AHIP and Krista Paxton for their assistance with manuscript preparation.

Disclosure statement

No potential conflict of interest was reported by the authors.

Ethical approval

This study was deemed exempt from review by the Hofstra University, University of California at San Francisco, and Case Western Reserve University Institutional Review Boards.

Notes

† There were originally three questions, but due to an administrative error one of the study questions was not included in the exam at School C; therefore, that question was removed from the final quantitative analysis at all three sites.

Additional information

Funding

This research was supported by a $10,000 grant through the Group on Educational Affairs (GEA) National Grant Award program from the Association of American Medical Colleges (AAMC).

