Original Research

Test blueprints for psychiatry residency in-training written examinations in Riyadh, Saudi Arabia

Pages 31-46 | Published online: 24 May 2012

Abstract

Background

The postgraduate training program in psychiatry in Saudi Arabia, which was established in 1997, is a 4-year residency program. Written exams comprising multiple choice questions (MCQs) are used as a summative assessment of residents in order to determine their eligibility for promotion from one year to the next. Test blueprints are not used in preparing examinations.

Objective

To develop test blueprints for the written examinations used in the psychiatry residency program.

Methods

Based on the guidelines of four professional bodies, documentary analysis was used to develop global and detailed test blueprints for each year of the residency program. An expert panel participated in piloting and final modification of the test blueprints. Their opinions about the content, the weightage for each content domain, and the proportion of test items to be sampled in each cognitive category, as defined by modified Bloom’s taxonomy, were elicited.

Results

Eight global and detailed test blueprints, two for each year of the psychiatry residency program, were developed. The global test blueprints were reviewed by experts and piloted. Six experts participated in the final modification of the test blueprints. Based on expert consensus, the content, the total weightage for each content domain, and the proportion of test items to be included in each cognitive category were determined for each global test blueprint. Experts also suggested progressively decreasing the weightage for recall test items and increasing that for problem-solving test items from year 1 to year 4 of the psychiatry residency program.

Conclusion

A systematic approach using documentary and content analysis techniques was used to develop the test blueprints, with additional input from an expert panel as appropriate. Test blueprinting is an important step in ensuring test validity in all residency programs.

A test blueprint defines the scope and focus of a test, ensuring congruence between the learning objectives and the course contents, thus supporting the content validity of the test.Citation1 Test blueprinting organizes the process of test development to best represent the teaching/learning process. It identifies the objectives at each level of the cognitive domain of Bloom’s taxonomy,Citation2 assigns weightage to the objectives, and determines how many items need to be selected for each sub-category of a test. The cognitive domain deals with a person’s ability to process and utilize information in a meaningful way. The other domains of Bloom’s taxonomy are the affective, which relates to the attitudes and feelings that result from the learning process, and the psychomotor, which involves manipulative or physical skills. Bloom identified six levels within the cognitive domain. From lowest to highest, the levels are: knowledge (remembering), comprehension (understanding), application (applying), analysis (analyzing), synthesis (creating), and evaluation (evaluating), where the terms in parentheses are those of the revised taxonomy, which places creating above evaluating. Currently, the latter three are often placed at a similar hierarchical level (Table 1).Citation3,Citation4 Test blueprinting is also used to avoid haphazard test development; unplanned test development results in a test with little content validity.Citation5 Without a test blueprint, a test will produce scores that are of limited use and interpretation.Citation2,Citation6,Citation7 Another rationale for developing test blueprints is that doing so improves the teaching and learning experiences of residents in training and facilitates curricular consonance.

Table 1 Bloom’s taxonomy - old and revised hierarchy

The postgraduate training program in psychiatry in Saudi Arabia was established in 1997 in an effort to develop local, professional, culturally sensitive manpower and to further expand and improve the specialized mental health services supported by qualified and well-trained psychiatrists. It has local training committees in Dammam and in Riyadh. Each local program has its own selection criteria, training centers, and in-training assessment. It is a four-year program consisting of two levels. Level 1, the junior residency in years 1 and 2, addresses general psychiatry, consultation-liaison psychiatry, addiction and drug abuse, neurology, basic principles of pharmacotherapy and psychotherapy, and neurosciences related to psychiatry. Level 2, the senior residency in years 3 and 4, focuses on the sub-specialties of clinical psychiatry.

The assessment methods used to determine whether residents are promoted from one year to the next include the summation of end-of-rotation evaluations for the year using rating scales (50% of the total score) and the annual in-training examination score (50% of the total score). Together, these constitute the overall annual evaluation. To be promoted to the next year, the candidate is required to pass the end-of-year examination, which is the annual in-training examination. In years 1 and 3, this examination comprises 100 multiple choice questions (MCQs), and 60% is required to pass. In years 2 and 4, the assessment includes a written exam of 100 MCQs and a clinical oral exam consisting of clinical vignettes and a long case. The passing score is 40% for the written exam and 60% for the clinical oral component; however, the former does not reflect the relative weight of the written component. Surprisingly, there are no specific learning objectives for each year, and test blueprints are not used for constructing test items or designing the written examinations. Therefore, the purpose of this methodological paper is to describe a systematic approach, coupled with documentary and content analysis, to develop test blueprints for psychiatry residency in-training written examinations in Riyadh, Saudi Arabia.
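The scoring scheme above can be sketched as a small calculation. This is an illustrative sketch only: the function names and the candidate scores are invented, and only the weights and thresholds stated in the text (50/50 split; 100 MCQs with a 60% pass mark in years 1 and 3) are taken from the program description.

```python
# Sketch of the overall annual evaluation described above: end-of-rotation
# ratings and the annual in-training examination each contribute 50%.
# Function names and example scores are hypothetical illustrations.

def overall_annual_score(rotation_score, exam_score):
    """Combine the two components of the annual evaluation, each weighted 50%."""
    return 0.5 * rotation_score + 0.5 * exam_score

def passes_written_exam(mcq_correct, total_mcqs=100, pass_mark=60):
    """Year 1/3 written exam: 100 MCQs, with 60% required to pass."""
    return 100 * mcq_correct / total_mcqs >= pass_mark

print(overall_annual_score(70, 64))  # 67.0
print(passes_written_exam(64))       # True: 64/100 clears the 60% pass mark
```

A candidate scoring 70% on rotations and 64% on the written exam would thus carry an overall annual score of 67%, while promotion additionally requires passing the examination itself.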

Method

A descriptive qualitative study was conducted in the following three stages:

Development of test blueprints

Documentary and content analysis were used to develop the test blueprints. The contents of these test blueprints were identified from different sources: (i) the booklet produced by the American Board of Psychiatry and Neurology, which describes the part I summative examination;Citation8 (ii) the instructional manual of the American College of Psychiatrists for the year 2006, which is used as a formative exam for the psychiatry resident in-training examination (PRITE);Citation9 (iii) the contents of the new MRCPsych examination of the Royal College of Psychiatrists, UK,Citation10 which are used for summative purposes; and (iv) documents from the Saudi Board Training Program in Psychiatry, including an information booklet entitled Saudi Board of Psychiatry. These contents were organized into four global test blueprints. Two investigators (EMG and RAN) independently abstracted the weightings from the aforesaid four existing certification exams, and any disagreement was discussed with the other authors (KSA and RPS) until all four agreed on any conflicting weighting.

According to modified Bloom’s taxonomy, each test blueprint comprised contents representing the following cognitive categories: recall, interpretation, and problem solving.Citation11 For each year, two formats of the test blueprint were developed: the global test blueprint, comprising the main themes derived from the contents, and the detailed test blueprint, comprising the main themes as well as their subcategories.
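The relationship between a global blueprint and an exam paper can be expressed as a simple two-way calculation: each theme’s weightage and each cognitive-category proportion jointly determine how many of the 100 MCQs fall into each cell of the table of specifications. The themes, weightages, and category proportions below are hypothetical placeholders, not the values agreed by the expert panel in this study.

```python
# Illustrative sketch of how a global test blueprint maps onto a 100-item
# MCQ paper. All themes, weightages, and cognitive-category proportions
# here are invented examples, not the panel's agreed figures.

TOTAL_ITEMS = 100

# Content themes with weightage (percent of the paper); sums to 100.
weightage = {
    "Adult psychiatry": 30,
    "Psychopharmacology": 20,
    "Consultation-liaison psychiatry": 15,
    "Addiction and drug abuse": 15,
    "Neurosciences": 10,
    "Psychotherapy": 10,
}

# Proportion of items per cognitive category (modified Bloom's taxonomy).
cognitive_split = {"recall": 0.40, "interpretation": 0.35, "problem_solving": 0.25}

def table_of_specifications(weightage, cognitive_split, total_items):
    """Return item counts for each (theme, cognitive category) cell."""
    table = {}
    for theme, weight in weightage.items():
        theme_items = round(total_items * weight / 100)
        table[theme] = {
            category: round(theme_items * proportion)
            for category, proportion in cognitive_split.items()
        }
    return table

spec = table_of_specifications(weightage, cognitive_split, TOTAL_ITEMS)
for theme, cells in spec.items():
    print(f"{theme}: {cells}")
```

In practice, rounding can leave a cell or two off by one item, so a real blueprint would adjust the final counts by hand so that the paper totals exactly 100 items.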

A cover letter, along with the four global blueprints, was sent to seven consultant psychiatrists outside Riyadh for piloting. They were contacted by one of the investigators (EMG) and their verbal consent to participate was obtained. They were asked to provide written feedback on the blueprints in terms of content areas, weightage for each area, and the percentage of test items to be included in each cognitive category. They were also asked to critique, add, delete, and/or reorganize content.

Gathering the opinions of experts about the test blueprints

The test blueprints were modified after piloting. Eight psychiatrists from Riyadh were invited to participate as panelists. An expert was defined as a person who has national or international board certification in psychiatry, at least 5 years of postgraduate teaching experience in a specialty, at least 8 years of clinical experience, professional status as a consultant, and involvement in the Saudi Board of Psychiatry as a member or as an examiner. The modified global test blueprints were hand-delivered to each expert with a cover letter explaining the task, along with a questionnaire for recording demographic data.

The expert panelists were asked to provide written feedback on the modified blueprints regarding content areas, weightage for each area, and percentage of test items to be included in each cognitive category – recall, interpretation and problem solving – according to modified Bloom’s taxonomy.Citation11 They were also asked to critique, add, delete, and/or reorganize the content. Using an approach described earlier for the neurology content of a new occupational therapy course,Citation12 the written opinions were analyzed to determine the extent of consensus among the experts. A “modified” nominal group technique was used: the expert panel was grouped together in two sessions to further explore the written responses and reach consensus on each of the four blueprints in terms of content, total weightage for each content area, and percentage of test items to be included under each of the cognitive categories, which are recall, interpretation and problem solving.

Nominal group discussion

Six experts who had provided written opinions participated in the group discussion. One of the investigators facilitated the group activity, assisted by a psychiatric senior registrar who documented the final responses of the panel. The task expectations were clarified, the collected written opinions were projected, and the group was asked whether they agreed or disagreed with the identified content areas, the total weightage for each content area, and the percentage of test items allocated. A final consensus on the four global test blueprints was reached.

Results

Pilot studies on test blueprints developed

Two out of the seven consultants (29%) provided feedback by email. After piloting, the four “global” test blueprints were modified accordingly.

The modified test blueprints based on expert opinions

Six experts (75%), whose profile is given in Table 2, provided printed written feedback. Based on their feedback, the weightage for each content area and the percentage of test items to be included in each cognitive category – recall, interpretation and problem solving – were analyzed and transformed into tables of specification.

Table 2 Profile of the panel experts

The “modified” nominal group discussion

As a group, the six experts discussed and reached a final consensus on each of the four test blueprints. In all four global test blueprints, the experts initially differed about the total weightage for each content area and the proportion of test items to be included in each cognitive category. The final version of the blueprints contained 14 main themes for the first and second years and 15 for the third and fourth years. A consensus (90%) was achieved among the experts regarding the total weightage for each content area and the proportion of test items to be included in each cognitive category based on the modified test blueprints. Accordingly, the test blueprints, both global (Tables 3–6) and detailed (Appendices I to IV), were further modified.

Table 3 Global test blueprint for the first year

Table 4 Global test blueprint for the second year

Table 5 Global test blueprint for the third year

Table 6 Global test blueprint for the fourth year

The expert panel suggested progressively decreasing the weightage for recall-type test items and increasing that for interpretation/problem-solving test items from the year 1 to the year 4 exams. This echoes the most pertinent feature of Bloom’s taxonomy, which emphasizes the transition from mastery of facts to capacity for analysis as residents advance through their years of training.
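The panel’s suggestion can be illustrated numerically. The percentages below are hypothetical, chosen only to show the intended pattern of a declining recall share and a rising problem-solving share from year 1 to year 4; they are not the panel’s actual figures.

```python
# Hypothetical year-by-year cognitive-category splits (percent of a
# 100-item paper) illustrating the panel's suggested progression:
# recall decreases and problem solving increases with seniority.
yearly_split = {
    1: {"recall": 50, "interpretation": 30, "problem_solving": 20},
    2: {"recall": 40, "interpretation": 35, "problem_solving": 25},
    3: {"recall": 30, "interpretation": 40, "problem_solving": 30},
    4: {"recall": 20, "interpretation": 40, "problem_solving": 40},
}

def is_progressive(splits):
    """Check that recall never increases and problem solving never
    decreases from one year of training to the next."""
    years = sorted(splits)
    recall = [splits[y]["recall"] for y in years]
    solving = [splits[y]["problem_solving"] for y in years]
    return (all(a >= b for a, b in zip(recall, recall[1:]))
            and all(a <= b for a, b in zip(solving, solving[1:])))

print(is_progressive(yearly_split))  # True for the illustrative values above
```

A check like this could serve as a sanity test when future blueprint revisions adjust the category weights for individual years.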

For every item of the test blueprints, 90% or more of the participants agreed. In addition, the four investigators themselves attempted to reach nearly 100% final consensus on all items, including contents, weightage for each area, and percentage of test items. Apart from a few exceptions, such as neurology and research, the weighting in the final version of the test blueprints did not vary significantly from the original weighting. The authors feel that all three methods contributed fairly to the construction of the test blueprints.

Discussion

In developing test blueprints, three steps need to be considered.Citation13 The first is preparation of a list of the instructional objectives; the second is preparation of course content outlines; and the third is designing a two-way chart that relates instructional objectives to instructional content. In this study, the first and third steps could not be achieved because the learning objectives for each year were not specified, and course content outlines (the second step) were available only for the first and second years. The contents of the test blueprint were therefore based on alternative documentary sources, including the Royal College of Psychiatrists, UK; the American Board of Psychiatry and Neurology; the American College of Psychiatrists;Citation8–Citation10 and the Saudi Board Training Program in Psychiatry.

Relying upon documents or records in research studies has both advantages and disadvantages. The main advantage is that the data provided are rich, detailed, and readily available for use. It is inexpensive, both economically and in terms of time, to collect data. The limitations of using documentary analysis are missing or incomplete data, inaccuracies in material, and inherent biases. Another difficulty encountered by the researchers in analyzing documents is that often documents are presented and organized in different ways.Citation14,Citation15

In this study, the documents used were prepared by professional organizations for different purposes – each with its own format, style, and content classification. The Royal College of Psychiatrists and the American Board of Psychiatry and NeurologyCitation8,Citation10 use their documents for a single summative exam, whereas the American College of PsychiatristsCitation9 uses its documents for a single formative exam. The researchers used a documentary analysis approach because there were no suitable checklists available at this stage of the study. Global test blueprints, one for each year, were used. Each table of specifications comprises three elements:Citation7 the contents; the processes, that is, the cognitive categories; and the importance of each, indicating the proportion of test items to be included under each category.

In this study, the researchers faced difficulty in categorizing divergent contents, which were primarily intended for a single exam, into four test blueprints and exams. As a result, the Saudi Board Training Program in Psychiatry information booklet and content outlines, the Saudi Board experience of one of the researchers (EMG), and the expert panel’s opinions formed the basis on which the test blueprints were designed.

Both a global and a detailed test blueprint were developed for each year. The detailed test blueprints, comprising the main themes and subcategories, were specified to avoid the confusion that could arise from divergent classifications. For example, the Royal College of Psychiatrists relies upon the WHO classification, in terms of the International Classification of Diseases (ICD-10), while the American Board of Psychiatry and Neurology and the American College of Psychiatrists rely upon the Diagnostic and Statistical Manual of Mental Disorders, Text Revision (DSM-IV-TR). The Saudi Board Training Program in Psychiatry, however, relies upon both classifications.

In the pilot study, the low response rate was perhaps due to the length of the tasks expected of the experts. The experts were asked to provide written feedback about the blueprints in terms of content, weightage, and the proportion of test items to be included in each cognitive category; they were also expected to comment, add, delete, and/or reorganize contents. An alternative explanation could be the respondents’ unfamiliarity with Bloom’s old and revised taxonomies or with test blueprinting.

The experts’ opinions varied as to the contents to be included on each exam, the weightage for topics, and the proportion of test items in different cognitive categories. Webb and associatesCitation16 reported similar findings when they sought training directors’ opinions about the psychiatry resident in-training examination. This finding can be explained by the diverse training backgrounds and clinical experience of the experts: two were British graduates, one was a current academic staff member, two were former academic staff members, and three were clinical in-service providers. There were also divergent views about the competencies that residents were expected to master. For example, one of the experts suggested that 15% of the test items be allocated to research methods in the second year, whereas another suggested only 2%, justifying this view by asserting that, compared with researchers, residents who are expected to provide services in hospitals need to master clinical topics such as adult psychiatry. Another explanation could be the experts’ awareness that only one assessment method, MCQs, is used in the first and third years of the residency program. OsterlindCitation7 emphasized that, in addition to being experts in the subject area they are assessing, judges need training in the expected task of test blueprinting.

There was unanimous consensus on all content areas suggested for inclusion in each test blueprint, with the exception of neurology. According to the head of the local committee of the Psychiatry Residency Training Program at Riyadh, no test items on neurology should be included in the exam paper, even though neurology is an obligatory three-month rotation during the second year of the residency program. This contradicts the recommendation that test weightage should reflect the relative instructional time in the curriculum.Citation2,Citation17

Several caveats of the study need to be considered. It may be useful to further refine the test blueprints in terms of specific learning objectives rather than content. Furthermore, these learning objectives need to approach test blueprinting from a broader perspective, to include the knowledge, skills, and attitude domains. The poor response rate (29% of participants provided feedback during the pilot study) reflects another weakness. These caveats have ramifications in terms of appropriate instructional methods for instructors and learning strategies for residents. According to some academicians, the role of learning objectives and rotation-specific objectives in residency training, which is important but not highlighted in this research, needs to be fully addressed in future studies on test blueprint development. The generalizability issue is also pertinent. The researchers are of the opinion that the methodology described is useful and appropriate for designing any test blueprint, especially for in-service assessment of postgraduate residents, in order to render the evaluation process educationally sound and rational. The specifics of content, instruments, and weightage can be determined by keeping in mind the program goals and the institutional mission and vision.

Another pertinent issue is the use of the modified nominal group technique (NGT), which shortens the process from the 2–3 hours of classic NGT and other types of group discussion to 90 minutes. This increases the practicality of the exercise, which can be used to elicit feedback from groups of six to 40 experts or learners. Participation in a modified NGT evaluation can be mandatory or voluntary. Every individual contributes equally to the exercise, and the nonjudgmental process encourages individuals to give honest observations and constructive criticism. The modified NGT has other advantages: it produces rank-ordered, weighted, semi-quantitative data on learners’ perceptions of the strengths and weaknesses of a course; generates both positive and negative feedback; and minimizes the influence that a “vocal minority” of learners with strong opinions can have in typical focus-group settings. The modified NGT is a valuable tool that may be used to evaluate both new curricula and established courses or programs. Most importantly, the modified NGT reflects more consensus and a greater understanding of the reasons for disagreement. The Delphi approach, on the other hand, has greater reliability and can be combined with the NGT to form a hybrid method.Citation18,Citation19

Conclusion

The systematic approach used to develop the test blueprints is appropriate and is advocated. Using test blueprints will guide the test developer on how many items are to be selected for the test, the types of content being tested, and the range of competencies addressed. Test blueprinting is an important step for ensuring test validity in all residency programs, including psychiatry.

Recommendations

  1. In order to enhance the response rate of experts, planners need to adopt suitable strategies when developing test blueprints for residents’ in-training exams in psychiatry or other specialties in the future. One option recommended by some academicians is to combine expert panelists and local experts in the group discussion.

  2. When the test blueprints for any specialty are designed, the teaching and learning objectives should be broad based, comprehensive, and precisely addressed. The objectives of postgraduate medical education may include competency-based residency training covering medical knowledge, medical sciences, research, clinical skills, advocacy, leadership, professionalism, communication, outcomes, and system-based practice. A tailored test blueprint should enable tutorial interaction between experts and residents and between peers. It should also enhance self-directed learning. Overall, clinical, training, and research competency development topics need to be integrated into test blueprints meant for postgraduate medical education.

  3. In a related development, it remains unascertained how MCQs are constructed from tailored blueprints for postgraduate medical training. From the training and examination perspectives, this remains a question for future research.

Special note

Criticisms of Bloom’s taxonomy of learning objectives

Some researchers criticized Bloom’s classification because it was not a properly constructed taxonomy, as it lacked a systematic rationale of construction.Citation20 This was subsequently acknowledged by Anderson and colleagues, who revised the taxonomy and re-established it on more systematic lines.Citation21 Some critics of the cognitive domain accepted the existence of the six categories but questioned the existence of a sequential, hierarchical link.Citation22 The revised edition of Bloom’s taxonomy also placed synthesis (creating) higher in the order than evaluation (evaluating). Some consider the three lowest levels to be hierarchically ordered, but the three higher levels to be parallel.Citation21

Acknowledgments

The authors wish to express their sincere thanks to all mental health experts for their important opinions and contributions to the production of this manuscript.

Disclosure

The authors disclosed no conflicts of interest in this paper.

References

  • Bridge PD, Musial J, Frank R, Roe T, Sawilowsky S. Measurement practices: methods for developing content-valid student examinations. Med Teacher. 2003;25(4):414–421.
  • Notar CE, Zuelke DC, Wilson JD, Yunker BD. The table of specification: insuring accountability in teacher made tests. J Instr Psychol. 2004;31(2):115–129.
  • Bloom BS, Engelhart MD, Furst EJ, Hill WH, Krathwohl DR. Taxonomy of Educational Objectives: The Classification of Educational Goals; Handbook I: Cognitive Domain. New York: Longmans, Green & Co; 1956.
  • Anderson LW, Krathwohl DR. Taxonomy for Learning, Teaching and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. New York: Longman; 2001.
  • The blueprint-test plan development [online]. 2001 [cited January 15, 2007]. Available from: http://www.quasar.ualberta.ca/AHE/edae458/section5/section5bluepage.htm
  • Fowell SL, Southgate LJ, Bligh JG. Evaluating assessment: the missing link? Med Educ. 1999;33(4):276–281.
  • Osterlind SJ. Constructing Test Items: Multiple-Choice, Constructed-Response, Performance, and Other Formats. 2nd ed. New York: Kluwer Academic; 2003.
  • American Board of Psychiatry and Neurology. Booklet produced by American Board of Psychiatry and Neurology describing part I for year 2007. Section IV: Psychiatry Certification Examination Procedures, Formats, and Content. Buffalo Grove, IL: American Board of Psychiatry and Neurology; 2011. Available from: http://www.abpn.com/downloads/ifas/2012_IFA_Cert_Psych_110111.pdf
  • American College of Psychiatrists. Instructional manual of the American College of Psychiatrists for year 2006 [online]. 2006 [cited February 15, 2007]. Available from: http://www.acpsych.org/prite
  • Royal College of Psychiatrists, UK. Content of new MRCPsych examination for year 2007 [online]. 2007 [cited February 17, 2007]. Available from: http://rcpsych.ac.uk/exams
  • Cox KR. How did you guess? Or, what do multiple-choice questions measure? Med J Aust. 1976;1(23):884–886.
  • McCluskey A. Collaborative curriculum development: clinicians’ views on the neurology content of a new occupational therapy course. Aust Occup Ther J. 2000;47(1):1–10.
  • Linn RL, Miller MD. Measurement and Assessment in Teaching. 9th ed. New Jersey: Prentice-Hall; 2006.
  • Abbott S, Shaw S, Elston J. Comparative analysis of health policy implementation. The use of documentary analysis. Policy Stud J. 2004;25(4):259–266.
  • Appleton JV, Cowley S. Analysing clinical practice guidelines. A method of documentary analysis. J Adv Nurs. 1997;25(5):1008–1017.
  • Webb LC, Sexson S, Scully J, Reynolds CF, Shore MF. Training directors’ opinions about the psychiatry resident in-training examination (PRITE). Am J Psychiatry. 1992;149:521–524.
  • Abdel-Hameed AA, Al-Faris EA, Alorainy IA, Al Rukban MO. The criteria and analysis of good multiple choice questions in a health professional setting. Saudi Med J. 2005;26(10):1505–1510.
  • Dobbie A, Rhodes M, Tysinger JW, Freeman J. Using a modified nominal group technique as a curriculum evaluation tool. Fam Med. 2004;36(6):402–406.
  • Hutchings A, Raine R, Sanderson C, Black N. A comparison of formal consensus methods used for developing clinical guidelines. J Health Serv Res Policy. 2006;11(4):218–224.
  • Morshead RW. Comment on: Krathwohl DR, Bloom BS, Masia BB. Taxonomy of Educational Objectives Handbook II: Affective Domain. New York: David McKay Co; 1964. Studies in Philosophy and Education. 1965;4(1):164–170. Available from: http://deepblue.lib.umich.edu/bitstream/2027.42/43808/1/11217_2004_Article_BF00373956.pdf
  • Anderson LW, Krathwohl DR, Airasian PW, et al; Anderson LW, Krathwohl DR, editors. Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. Boston: Allyn and Bacon; 2000.
  • Paul R. Critical Thinking: What Every Person Needs to Survive in a Rapidly Changing World. 3rd ed. Rohnert Park, CA: Sonoma State University Press; 1993.
  • Draper S. Taxonomies of learning aims and objectives: Bloom, neoBloom, and criticisms. Available from: http://www.psy.gla.ac.uk/~steve/best/bloom.html. Accessed April 2, 2012.

Appendices

Appendix I Detailed test blueprint for the first year

Appendix II Detailed test blueprint for the second year

Appendix III Detailed test blueprint for the third year

Appendix IV Detailed test blueprint for the fourth year