Original Research

Group versus modified individual standard-setting on multiple-choice questions with the Angoff method for fourth-year medical students in the internal medicine clerkship

Pages 195-200 | Published online: 27 Sep 2013

Abstract

Background

The Angoff method is one of the preferred methods for setting the passing score of an examination. It normally requires group meetings, which may be a problem for busy medical educators. Here, we compared a modified Angoff individual method with the conventional group method.

Methods

Six clinical instructors were divided into two groups matched by teaching experience: the modified Angoff individual method (three instructors) and the conventional group method (three instructors). Passing scores were set using the Angoff method. In the conventional group method, instructors rated the items individually and then met to determine the passing score. In the modified Angoff individual method, each instructor judged the items independently, and the final passing score was adjusted using concordance scores and a reliability index.

Results

A total of 94 fourth-year medical students took the test. The mean (standard deviation) test score was 65.35 (8.38), with a median of 64 (range 46–82). The three instructors using the individual method took 45, 60, and 60 minutes to finish the task, while the group spent 90 minutes in discussion. The final passing score in the modified Angoff individual method was 52.18 (56.75 minus 4.57), or 52, versus 51 for the standard group method. The numbers of failed students were similar for the two methods (four versus three).

Conclusion

The modified Angoff individual method may be a feasible way to set a standard passing score that takes less time and allows instructors to work independently rather than as a group.

Introduction

Medical educators generally serve in several roles, such as attending physician, teacher, and researcher. As a result, they have limited time for medical education tasks. Lack of time has been shown to be associated with low motivation for educational work.Citation1

Setting the passing standard is crucial for a licensing examination.Citation2 The Angoff method is one of the most preferred methods for doing so.Citation3 For papers based on short-answer and extended-matching questions, six judges are needed to discuss each test item to obtain reliable results.Citation4 In the conventional Angoff method, each judge first rates every item individually and may revise any rating during the group deliberation.

The Angoff method has been used to set standard passing levels since 1971. The working team is a panel of judges, each of whom evaluates all test items. The main concept is that judges estimate, for each test item, whether a borderline (minimally competent) student would answer it correctly. The Angoff method has been modified into a simplified methodCitation4 and a three-level method,Citation5 both of which are acceptable for setting the passing score. It was the most popular method for multiple-choice questions during the 1990s.Citation6 It can be used for both medium- and high-stakes examinations, such as licensing examinations,Citation6–Citation8 and is also appropriate for an Objective Structured Clinical Examination (OSCE)Citation9,Citation10 and even for computer-based testing.Citation11

Even though the Angoff method has been shown to be as effective as other methods for determining a standard passing level, such as the whole-test Ebel or Hofstee method,Citation8,Citation12,Citation13 it has some limitations. The passing score may depend on the characteristics and knowledge of the judges.Citation14–Citation17 The method is also time-consuming, because all judges must be available for the group deliberation, and, as mentioned earlier, medical educators tend to have limited time.

The group process and discussion generally establish more valid and reliable passing scores than an individual method. The discussion, however, may take a long time and depends on the availability of the group members. Therefore, if an individual method gives a passing score similar to that of the group method, the individual method may be preferable. Because individual judgments are not moderated by group discussion, the passing scores from an individual method need to be adjusted. This study compared the passing scores from the group method and from a modified Angoff individual method applied to multiple-choice questions. The aim was to shorten the time-consuming standard-setting process while retaining the advantages of the Angoff standard.

Methods

Six clinical instructors at the Department of Medicine, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand, were invited to participate in the study. All instructors accepted the invitation and were divided into two groups (modified Angoff individual method and group method), matched by teaching experience.

All instructors were asked to set the standard passing score for a multiple-choice test used for fourth-year medical students in the internal medicine clerkship. The test contained 100 questions, each with five choices. The Angoff methodCitation3 was explained to all instructors. For each test item, judges were asked to vote “yes” or “no” as to whether a borderline student would be able to answer the question correctly.
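As a worked formulation (our notation, not the authors'), let J judges rate the k items of the test, and let r_{ij} be judge j's rating of item i. In the Yes/No form used here, r_{ij} is 1 for “yes” and 0 for “no”, so each judge's passing score P_j is a count of “yes” items, and the unadjusted panel score is the mean of the P_j. In the probability form of the Angoff method, r_{ij} would instead be the judge's estimated probability that a borderline examinee answers item i correctly.

```latex
P_j = \sum_{i=1}^{k} r_{ij}, \qquad
\bar{P} = \frac{1}{J} \sum_{j=1}^{J} P_j, \qquad
r_{ij} \in \{0, 1\}
```

With k = 100 and J = 3, the individual scores of 65, 59, and 68 reported in the Results give an unadjusted mean of (65 + 59 + 68)/3 = 64.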

The modified Angoff individual method

The three instructors in the individual group worked on the test independently. Each judge rated every question “yes” or “no” according to whether a borderline examinee would be able to answer it correctly. As a substitute for the group discussion of the conventional Angoff method, two adjustments were used: the concordance technique, followed by a further adjustment with the reliability index. Using concordance ratings is thought to increase reliability. Five concordance scores were used for adjustment: the concordant “yes” score between instructors 1 and 2; between instructors 1 and 3; between instructors 2 and 3; between any two instructors; and among all three instructors. Together with the three individual passing scores, this gave eight raw numbers. The average of these eight raw numbers minus the standard error of measurement (SEM) gave the final passing score for the modified Angoff individual method (Table 1).
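The concordance bookkeeping can be sketched in Python. This is our reading of the procedure, not the authors' code: it assumes that a concordant “yes” between two instructors means an item both rated “yes”, and that “any two” means an item rated “yes” by at least two of the three. The function names and the five-item toy ratings are hypothetical.

```python
from itertools import combinations


def angoff_raw_numbers(ratings):
    """Return the eight raw numbers for a three-judge Yes/No Angoff panel.

    ratings: three equal-length lists of booleans, one per judge; True means
    "a borderline examinee would answer this item correctly".
    Order: judge 1, 2, 3; concordance 1&2, 1&3, 2&3; any two; all three.
    """
    if len(ratings) != 3:
        raise ValueError("this sketch assumes exactly three judges")
    n_items = len(ratings[0])
    # Individual passing scores: number of "yes" ratings per judge.
    individual = [sum(judge) for judge in ratings]
    # Pairwise concordant "yes": items that both judges of the pair rated "yes".
    pairwise = [
        sum(a and b for a, b in zip(ratings[x], ratings[y]))
        for x, y in combinations(range(3), 2)  # (0,1), (0,2), (1,2)
    ]
    # "Any two": items rated "yes" by at least two of the three judges.
    any_two = sum(sum(ratings[j][i] for j in range(3)) >= 2 for i in range(n_items))
    # "All three": items rated "yes" unanimously.
    all_three = sum(all(ratings[j][i] for j in range(3)) for i in range(n_items))
    return individual + pairwise + [any_two, all_three]


def modified_angoff_cut(raw_numbers, sem):
    """Final passing score: mean of the raw numbers minus the SEM."""
    return sum(raw_numbers) / len(raw_numbers) - sem


# Hypothetical five-item example.
toy = [
    [True, True, True, False, True],
    [True, False, True, True, False],
    [True, True, False, False, False],
]
print(angoff_raw_numbers(toy))  # [4, 3, 2, 2, 2, 1, 3, 1]
```

Under this reading the “any two” count always equals the sum of the three pairwise counts minus twice the unanimous count, which gives a quick consistency check on reported concordance scores.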

Table 1 Adjustment of final passing scores in a modified individual Angoff method by a reliability index

Passing scores were calculated from the average of the eight raw numbers and reported as four values: without adjustment, with adjustment by the concordance method, with adjustment by the SEM, and with adjustment by both concordance and the SEM. The time each judge took to rate all test items (minutes), the time a coordinator took to make the adjustments (minutes), and the time taken to finish the task after joining the study (weeks) were also recorded.

Group method

The group method followed the conventional Angoff method.Citation3 The judges were asked to follow its five steps: (1) discussion of what constitutes a borderline examinee; (2) consensus on the borderline examinee; (3) individual rating of the items by each judge; (4) recording of the scores; and (5) discussion to determine the passing score. All instructors had to finish their individual ratings before the group discussion. In the final step, the three instructors worked together, discussed each item, and made a final “yes” or “no” decision for each item; these decisions gave the passing score for the group method. Judges also discussed their own ratings in relation to the group passing score.

For the group method, the following were recorded: the time each judge took to rate all test items individually (minutes), the time for group discussion (minutes), the time a coordinator took to collect data for the group discussion of the final passing score (minutes), the time required for the group discussion meeting (minutes), and the time taken to finish the task after joining the study (weeks).

Results

The test was used for students in block 1 of the internal medicine rotation at the Faculty of Medicine, Khon Kaen University. A total of 94 medical students took the test. The mean (standard deviation) test score was 65.35 (8.38), with a median of 64 (range 46–82). The Kuder–Richardson Formula 20 (KR-20) coefficient was 0.77, with a standard error of measurement (SEM) of 4.06. Cronbach’s alpha was 0.77, with an SEM of 4.00, and the reliability coefficient of the whole test was 0.73, with an SEM of 4.57.
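For readers who want to compute these reliability figures from their own item-level data, the sketch below (hypothetical function names) computes KR-20 from a 0/1 response matrix, with one row per examinee and one column per item, and the classical standard error of measurement, SEM = SD × √(1 − reliability). It assumes the population variance of total scores; software that uses the sample variance will give slightly different values.

```python
import math


def kr20(responses):
    """KR-20 for a 0/1 matrix: one row per examinee, one column per item."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of examinees' total scores (an assumption; see lead-in).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p*q over items, where p = proportion correct and q = 1 - p.
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)


def sem(sd, reliability):
    """Classical standard error of measurement."""
    return sd * math.sqrt(1 - reliability)


# Reported SD 8.38 with a reliability of 0.77 gives roughly 4.02.
print(round(sem(8.38, 0.77), 2))
```

That value is close to the reported 4.00 and 4.06; the small differences may reflect rounding of the coefficients or the variance estimator used.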

The profiles of the instructors in the modified Angoff individual and group methods were comparable in terms of sex, specialty, teaching experience, and academic position (Table 2).

Table 2 Profiles of instructor participants in each group

Passing scores

The eight raw numbers from the modified Angoff individual method (the three individual judges’ scores followed by the five concordance scores) were 65, 59, 68, 44, 55, 53, 68, and 42 (Table 3). The mean passing score of the three judges without adjustment was 64/100, and it became lower after adjustment with concordance or the SEM (Table 4). The final passing score in the modified Angoff individual method was 52.18 (56.75 minus 4.57), or 52. The passing score from the group assessment was slightly lower, at 51 (Table 3). In total, 4/94 students (4.3%) failed under the modified Angoff individual method and 3/94 (3.2%) under the group method.
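The arithmetic behind these passing scores and failure rates can be retraced in a few lines of Python. The sketch assumes, as the Methods imply, that the eight raw numbers are the three individual scores followed by the five concordance scores in the order listed there, and that 4.57 is the SEM for the whole-test reliability coefficient; the variable names are ours.

```python
# Figures as reported in the Results; the ordering of the eight numbers is our reading.
individual = [65, 59, 68]           # three judges' Yes/No Angoff passing scores
concordance = [44, 55, 53, 68, 42]  # 1&2, 1&3, 2&3, any two, all three
sem_whole_test = 4.57               # SEM for the whole-test reliability (0.73)

raw = individual + concordance
unadjusted = sum(individual) / len(individual)  # mean of the three judges
mean_of_eight = sum(raw) / len(raw)             # concordance-adjusted mean
final = mean_of_eight - sem_whole_test          # adjusted by both methods

# If "any two" means "at least two judges said yes", it must equal the
# pairwise counts minus twice the unanimous count.
assert concordance[3] == sum(concordance[:3]) - 2 * concordance[4]

n_students = 94
fail_individual, fail_group = 4, 3
print(unadjusted, mean_of_eight, round(final, 2))
print(f"{fail_individual / n_students:.1%}", f"{fail_group / n_students:.1%}")
```

The reported values 64, 56.75, 52.18, 4.3%, and 3.2% all follow from these inputs.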

Table 3 Score for each item by individual instructors and the group

Table 4 Passing scores by the modified Angoff individual method

Time spent

The three instructors in the modified Angoff individual method spent 45, 60, and 60 minutes finishing the task individually, whereas the instructors in the group method spent 60, 60, and 90 minutes on their individual ratings before the group discussion. The modified Angoff individual method instructors finished the task within 2 weeks of assignment (1 day, 4 days, and 2 weeks), whereas in the group method it took 1 week to arrange an appointment for the group meeting, which was held 3 weeks later (Table 5). A coordinator spent 60 minutes making the adjustments for the modified Angoff individual method, compared with 15 minutes spent by the coordinator summarizing data for the group method, whose discussion lasted 90 minutes. The total time spent on all steps was longer in the group method (4 weeks, 5 hours, and 15 minutes).
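As a check on the reported total, the staff time can be summed in Python. This sketch assumes that the group's 90-minute discussion mentioned in the Abstract is counted once rather than per participant, and that the coordinator times are 60 minutes (individual method) and 15 minutes (group method); the dictionary layout and function names are hypothetical, and calendar time (weeks) is left out.

```python
# Staff minutes per method; the 90-minute group discussion is taken from the Abstract.
individual_method = {"ratings": [45, 60, 60], "coordinator": 60}
group_method = {"ratings": [60, 60, 90], "meeting": 90, "coordinator": 15}


def total_minutes(method):
    """Sum every time component, expanding per-judge lists."""
    return sum(sum(v) if isinstance(v, list) else v for v in method.values())


def as_hours(minutes):
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins} min"


print(as_hours(total_minutes(individual_method)))  # 3 h 45 min
print(as_hours(total_minutes(group_method)))       # 5 h 15 min
```

Under these assumptions the group total matches the reported 5 hours 15 minutes; the individual-method total of 3 hours 45 minutes is derived here and is not stated in the text.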

Table 5 Time spent in the modified Angoff individual and group methods

Discussion

The standard passing scores set with the modified Angoff individual method and the group Angoff method were comparable. After adjustments, the modified Angoff individual method gave a passing score of 52/100, while the conventional group method gave 51/100. Over the entire process, the modified Angoff individual method was less time-consuming (~2 weeks versus 4 weeks) for establishing the passing score (Table 5). The number of failed students was also comparable under the two passing scores.

Instructors who used the modified Angoff individual method were happy to do the task individually because they could manage their own schedules. Another benefit of the individual method was that it was less time-consuming. Its disadvantage is that the instructors had no chance to discuss items with other members, so the judgment on each item depended entirely on the individual. In addition, there was no opportunity to discuss the final passing score together and compare it with the real situation, as was possible in the group method.Citation3 Discussion among judges, however, tends to increase the passing score.Citation18 Nonetheless, after adjustment with concordance scores and the reliability index, the passing score was comparable with that of the standard group method (52 versus 51).Citation19,Citation20 The unadjusted average of the three instructors’ scores was relatively high (64/100) and decreased after adjustment by either concordance scores or the reliability index (Table 4).

For the group method, most of the administrative time was spent arranging an appointment for three busy instructors: 1 week to agree on a date and a further 3 weeks before all three could meet. Individual rating times were similar to those in the individual method, although slightly longer on average (60, 60, and 90 minutes versus 45, 60, and 60 minutes). Likewise, the time a coordinator spent summarizing the final passing score in the group method differed little, in absolute terms, from the time spent adjusting the scores in the individual method (15 versus 60 minutes). As mentioned earlier, the group method is more time-consuming because the busy schedules of the instructors must be coordinated for the group meeting. In this study, both groups comprised instructors with similar teaching experience and specialties. The final passing scores were somewhat lower than those previously reported in the literature, where passing scores are usually around 60/100.Citation18

This is a preliminary study comparing the modified Angoff individual method with the group Angoff method. The number of judges in each group was lower than the recommended six. However, that recommendation was derived from a study of short-answer and extended-matching question papers, not of a multiple-choice test as in this study.Citation4 In addition, with six judges the number of concordance adjustments should be at least 720 (6!) scores instead of eight. A concordance score is used to increase interrater agreement, whereas adjustment with the reliability index increases score consistency. Both adjustments were used in this study to substitute for the group discussion of the conventional Angoff method. Further studies should confirm whether the individual and group methods give comparable outcomes, comparing either three judges in the modified Angoff individual method with six judges in the group method, or six judges in the modified Angoff individual method (with computer adjustment) with six judges in the group method.

Conclusion

This study introduced a modified Angoff individual method for setting the standard passing level for multiple-choice questions. The method may be feasible but needs further study. It was less time-consuming than the conventional group method and better suited the busy working schedules of the instructors.

Disclosure

The authors report no conflicts of interest in this work.

References

  • Zibrowski EM, Weston WW, Goldszmidt MA. ‘I don’t have time’: issues of fragmentation, prioritisation and motivation for education scholarship among medical faculty. Med Educ. 2008;42:872–878.
  • Downing SM, Lieska NG, Raible MD. Establishing passing standards for classroom achievement tests in medical education: a comparative study of four methods. Acad Med. 2003;78:S85–S87.
  • Angoff WH. Scales, norms, and equivalent scores. In: Thorndike RL, editor. Educational Measurement. Washington, DC: American Council on Education; 1971:508–600.
  • Fowell SL, Fewtrell R, McLaughlin PJ. Estimating the minimum number of judges required for test-centred standard setting on written assessments. Do discussion and iteration have an influence? Adv Health Sci Educ Theory Pract. 2008;13:11–24.
  • Yudkowsky R, Downing SM, Popescu M. Setting standards for performance tests: a pilot study of a three-level Angoff method. Acad Med. 2008;83:S13–S16.
  • Ahn DS, Ahn S. Reconsidering the cut score of Korean National Medical Licensing Examination. J Educ Eval Health Prof. 2007;4:1.
  • Hess B, Subhiyah RG, Giordano C. Convergence between cluster analysis and the Angoff method for setting minimum passing scores on credentialing examinations. Eval Health Prof. 2007;30:362–375.
  • Yudkowsky R, Downing SM, Wirth S. Simpler standards for local performance examinations: the Yes/No Angoff and whole-test Ebel. Teach Learn Med. 2008;20:212–217.
  • Boursicot KA, Roberts TE, Pell G. Using borderline methods to compare passing standards for OSCEs at graduation across three medical schools. Med Educ. 2007;41:1024–1031.
  • Kaufman DM, Mann KV, Muijtjens AM, van der Vleuten CP. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Acad Med. 2000;75:267–271.
  • Siriwardena AN, Dixon H, Blow C, Irish B, Milne P. Performance and views of examiners in the Applied Knowledge Test for the nMRCGP licensing examination. Br J Gen Pract. 2009;59:e38–e43.
  • Wayne DB, Barsuk JH, Cohen E, McGaghie WC. Do baseline data influence standard setting for a clinical skills examination? Acad Med. 2007;82:S105–S108.
  • Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med. 2006;18:50–57.
  • Wayne DB, Cohen E, Makoul G, McGaghie WC. The impact of judge selection on standard setting for a patient survey of physician communication skills. Acad Med. 2008;83:S17–S20.
  • Verhoeven BH, Verwijnen GM, Muijtjens AM, Scherpbier AJ, van der Vleuten CP. Panel expertise for an Angoff standard setting procedure in progress testing: item writers compared to recently graduated students. Med Educ. 2002;36:860–867.
  • Boursicot KA, Roberts TE, Pell G. Standard setting for clinical competence at graduation from medical school: a comparison of passing scores across five medical schools. Adv Health Sci Educ Theory Pract. 2006;11:173–183.
  • Verheggen MM, Muijtjens AM, Van Os J, Schuwirth LW. Is an Angoff standard an indication of minimal competence of examinees or of judges? Adv Health Sci Educ Theory Pract. 2008;13:203–211.
  • Hurtz GM, Auerbach MA. A meta-analysis of the effects of modifications to the Angoff method on cutoff scores and judgment consensus. Educ Psychol Meas. 2003;63:584–601.
  • Culligan B. Item Response Theory, Reliability and Standard Error. Available from: http://www.wordengine.jp/research/pdf/IRT_reliability_and_standard_error.pdf. Accessed June 7, 2013.
  • Langenbucher J, Labouvie E, Morgenstern J. Measuring diagnostic agreement. J Consult Clin Psychol. 1996;64:1285–1289.