Research Article

Best practice versus actual practice: an audit of survey pretesting practices reported in a sample of medical education journals

Article: 1673596 | Received 21 Jun 2019, Accepted 19 Sep 2019, Published online: 31 Oct 2019

ABSTRACT

Background: Despite recommendations from survey scientists, surveys appear to be utilized in medical education without the critical step of pretesting prior to survey launch. Pretesting helps ensure respondents understand questions as survey developers intended and that items and response options are relevant to respondents and adequately address constructs, topics, issues or problems. While psychometric testing is important in assessing aspects of question quality and item performance, it cannot discern how respondents, based upon their lived experiences, interpret the questions we pose.

Aim: This audit study explored whether authors of medical education journal articles within audited journals reported pretesting survey instruments during survey development, as recommended by survey scientists and established guidelines/standards for survey instrument development.

Methods: Five national and international medical education journals publishing survey articles from Jan. 2014 – Dec. 2015 were audited to determine whether authors reported pretesting during survey development. All abstracts within all issues of these journals were initially reviewed. Two hundred fifty-one articles met inclusion criteria using a protocol piloted and revised prior to use.

Results: The number of survey articles published per journal ranged from 11 to 106. Of 251 audited articles, 181 (72.11%) described using a new instrument without pretesting, while 17 (6.77%) described using a new instrument where items were pretested. Fifty-three (21.12%) articles described using pre-existing instruments; of these, no articles (0%) reported pretesting existing survey instruments prior to use.

Conclusions: Findings from this audit study indicate that reported survey pretesting appears to be lower than that reported in healthcare journals. This is concerning, as results of survey studies and evaluation projects are used to inform educational practices, guide future research, and influence policy and program development. Findings apply to both survey developers and faculty across a range of fields, including evaluation and medical education research.

Introduction

Despite recommendations from survey scientists and medical educators to assess the quality of survey items prior to use [Citation1–Citation24], survey instruments are frequently utilized in the field of medical education without a key step in question development: pretesting. For some medical education researchers, program evaluators, and curriculum designers, the omission of pretesting methodology (see Table 1) may represent a gap in knowledge regarding question design [Citation10]. For others, the time required for pretesting (i.e., assessing survey items prior to use to enhance construct validity and minimize measurement error) may seem unnecessary [Citation5,Citation14,Citation25], especially when journals do not uniformly require descriptions of survey development [Citation3,Citation5,Citation15]. One analysis of healthcare-related journals found that descriptions of question development (including assessment of question quality) were included in less than a quarter of studies reviewed [Citation5]. In another analysis, only 36% of survey articles within a sample of critical care medicine journals reported pretesting methods or piloting of the instrument [Citation26]. In a recent study focusing on surveys in research articles, Artino et al. [Citation3] examined the overall quality of self-administered research survey instruments published in three high-impact factor health professions education journals in 2013 and found low rates of cognitive interviewing/testing, one type of pretesting, in their sample of 36 survey research articles.

Table 1. Definitions and descriptions.

In this article, we describe the use of audit methodology to examine pretesting practices reported in five medical education journals, with the goal of improving the quality of survey question design and survey reporting practices. Audits compare current practices with best practices or guidelines/standards, with a goal of practice improvement [Citation34–Citation38]. During our audit, we examined medical education journal articles reporting a broad range of survey usage (e.g., survey instruments used in medical education research, program evaluation, curriculum development, innovations, etc.), and then compared these descriptions of survey development with recommendations from methodologists and scientists within the field of survey science [Citation1,Citation2,Citation6,Citation8–Citation15,Citation18,Citation19,Citation21,Citation22,Citation24,Citation39]. Findings reported in this paper should be useful to survey developers across a range of fields, including program evaluation, and should help ensure the implementation of best practices.

Background

Survey methodology includes the use of in-person, phone, or online interviews and self-administered questionnaires (SAQs) (e.g., paper, emailed, online), where a sample of respondents drawn from a target population is queried systematically with standardized instruments to examine aspects of the larger population [Citation7,Citation12,Citation14,Citation22]. The goal is for respondents to ‘interpret the questions consistently and as the question developer intended.’ (p. 2) [Citation40]. Survey developers often assume that instruments will collect ‘true’ or accurate data [Citation8,Citation41]. Yet, this may not be the case, due to issues which arise during the question-response process [Citation9,Citation11,Citation22,Citation23].

Question response processes

While some survey developers may view questions as ‘objective’ if they query for quantitative answers and ‘subjective’ if they query for narrative responses, no question is truly objective due to the array of factors which can prevent a survey developer’s intended meaning from being accurately communicated to respondents [Citation2,Citation6,Citation9,Citation10,Citation12,Citation18,Citation19,Citation42,Citation43]. A question-response model, based in cognitive psychology, is often used to describe the question-response process [Citation13,Citation19,Citation21,Citation43]. According to this model, when respondents answer survey questions they engage in a complex series of iterative cognitive processes. These processes include question comprehension (from a literal and pragmatic standpoint); retrieval of relevant information; calculation, estimation or construction of required information; decisions of whether to tell the truth, how much to answer, and estimation of harm (i.e., judgment); and selection or formulation of a response, either by mapping answers to response options provided or answering open-ended questions [Citation9,Citation12,Citation18,Citation19,Citation43,Citation44]. As Collins [Citation9] and Dillman et al. [Citation10] have noted, respondents must engage in at least two additional steps with self-administered questionnaires: understanding that they are being asked to fill out a questionnaire and then understanding how to navigate through a survey tool on their own. Every juncture in this set of iterative processes provides potential sources of measurement error during data collection [Citation9,Citation13,Citation44].

Approaches to answering questions also mirror conversational behaviors and are influenced by socio-cultural expectations and norms [Citation13,Citation18,Citation19,Citation42,Citation45], which in turn affect how respondents approach the overall survey response process [Citation44]. Ambiguous items are problematic for respondents and survey developers, as respondents may create their own meanings for unclear questions or ignore ambiguous items entirely. Additionally, meanings can vary across individuals, social groups, and cultures [Citation9,Citation12,Citation44–Citation46]. All of these factors may contribute to increased measurement error, thereby impacting score reliability, when surveys are fielded [Citation6,Citation19,Citation45]. Even seemingly straightforward questions such as, ‘How much exercise do you get, on average?’ can generate responses rife with measurement error when a time period is omitted [Citation6] or respondents are forced to calculate in order to select a response. At times, respondents may find the number of response options to be inadequate or inappropriate, thereby inserting measurement error into the data collection process. In other cases, respondents may be concerned about confidentiality of the response process (e.g., surveys concerning the workplace) and may skip questions or produce superficial or socially acceptable answers [Citation9,Citation12,Citation42,Citation46]. For SAQs, where respondents lack verbal cues and interviewer clarification, answering even simple questions can be problematic [Citation9,Citation12,Citation18,Citation19,Citation44].

Rationale for assessing survey questions

Validity involves making judgments about the extent to which sources of evidence support the adequacy and appropriateness of score interpretations and actions based upon those scores [Citation47]. Examining response processes during pretesting provides an important source of validity evidence, as the validity of survey results is determined by multiple factors, including the comprehensibility of questions asked, adequacy of response options, overall survey design, and acceptability of the survey tool [Citation8–Citation10,Citation18,Citation25,Citation41,Citation48].

When respondents uniformly comprehend an item differently than survey developers intended, measurement error is introduced into survey data [Citation2,Citation9–Citation13,Citation21–Citation23,Citation44]. Even the most common terms (e.g., ‘child/children’ and ‘you’) used in survey questions have been interpreted in different ways by respondents [Citation12,Citation19]. Online, self-administered questionnaires can be especially challenging for some respondents due to visual design elements and navigational factors which can affect item completion and item comprehension [Citation2,Citation10,Citation22]. Location (spacing/alignment) of visual design elements affects respondents’ perceptions of item relatedness and may cause respondents to perceive items as being related when they are not [Citation10], which can affect the accuracy of responses.

While measurement error associated with survey use cannot be avoided [Citation19,Citation22,Citation41,Citation44], it can be minimized by following recommended steps during survey development [Citation1,Citation2,Citation12,Citation22], including assessing the quality of survey items prior to use [Citation1,Citation2,Citation6,Citation9,Citation13,Citation15,Citation20,Citation22,Citation24]. Item assessment includes both quantitative and qualitative methods. Quantitative methods can help us to determine how widespread a problem is and how individual survey items perform compared with other items. Yet, these methods cannot tell us why an item is problematic, especially across socio-cultural groups. The acceptance of pretesting as a fundamental step in establishing survey item quality has prompted numerous government agencies to use cognitive interviews, one method of pretesting, to gather feedback to determine whether survey instruments need to be modified prior to use [Citation44]. Pretesting not only helps to ensure respondents understand questions as survey developers intended, but also ensures that constructs, topics, issues, or problems are relevant and adequately represented from the point of view of survey participants. Miller noted that “Today there is little debate that question design – how questions are worded and the placement of those questions within the questionnaire – impacts responses.” (p. 2) [Citation13].

Pretesting methods

Pretesting refers to a variety of methods (e.g., cognitive interviews, focus groups, use of questionnaire appraisal tools, etc.) which are used to assess both questions and survey tool format prior to use [Citation2,Citation10,Citation15]. Pretesting occurs prior to pilot testing, and most pretesting methods are participatory (i.e., respondents are made aware of their roles in assessing survey items) [Citation15]. A hybrid approach, combining a variety of pretesting methods, can overcome limitations of any one method [Citation2,Citation9,Citation10,Citation22,Citation39]. See Table 1 for descriptions of pretesting methods. Of note, while expert feedback is often important during the question design process, it cannot provide information on whether questions will be understandable (literally and pragmatically) or acceptable to members of specific target populations (e.g., patients, specific ethnic or cultural groups, etc.) due to differences in education, language, health literacy, reading level, and sociocultural factors between experts and survey respondents [Citation2,Citation6,Citation9,Citation13,Citation19,Citation44]. Content experts are typically not knowledgeable about differences in item functioning across cultures and languages, which is especially critical when survey tools are translated [Citation44]. Determining how representative members of the target population will interact with and comprehend survey items is therefore considered to be a fundamental step in survey development [Citation9,Citation10,Citation13,Citation22–Citation24,Citation44]. Last, assessing the performance of a survey in the field and under realistic conditions (field testing) [Citation1,Citation49] is often combined with pretesting of online and self-administered surveys prior to pilot testing (Table 1).

Purpose

The purpose of this study was to determine whether authors of medical education journal articles featuring survey methodology reported pretesting survey instruments during survey development, a best practice recommended by survey scientists and one of the established guidelines/standards for survey development [see Citation10,Citation12,Citation15].

Methods

Review of the literature

Rather than rely solely on recommendations published within our own field, we adopted a transdisciplinary approach and reviewed literature from experts in the field of survey methodology/survey science [Citation6,Citation8,Citation10–Citation12,Citation15,Citation18,Citation19,Citation21–Citation25,Citation42,Citation43,Citation46,Citation48] to identify best practices regarding pretesting methods.

Review of national/international guidelines

In keeping with audit methodology, we reviewed guidelines, best practices, and standards (grey literature) from the American Statistical Association’s Section on Survey Methods [Citation2], Pew Research Center [Citation14], the Statistical and Science Policy Office’s Subcommittee on Questionnaire Evaluation Methods [Citation40], and the American Association of Public Opinion Research (AAPOR) [Citation1]. AAPOR is the largest survey and public opinion research organization in the USA, whose members include employees of national and international governmental bodies (e.g., U.S. Census Bureau), academic/scientific institutions (e.g., NORC at the University of Chicago), private enterprise survey practitioners, and survey scientists from all over the world.

In addition to offering guidelines concerning overall survey quality, AAPOR offers specific guidance concerning pretesting in its Best Practices for Survey Research:

Guideline 5. Take great care in matching question wording to the concepts being measured and the population studied:

“ … Ways should be devised to keep respondent mistakes and biases (e.g., memory of past events) to a minimum, and to measure those that cannot be eliminated. To accomplish these objectives, well-established cognitive research methods (e.g., paraphrasing and “think-aloud” interviews) and similar methods (e.g., behavioral coding of interviewer-respondent interactions) should be employed with persons similar to those to be surveyed to assess and improve all key questions along these various dimensions.” [Citation1]

Guideline 7. Pretest questionnaires and procedures to identify problems prior to the survey.

“ … Because it is rarely possible to foresee all the potential misunderstandings or biasing effects of different questions or procedures, it is vital for a well-designed survey operation to include provision for a pretest. All questions should be pretested to ensure that questions are understood by respondents, can be properly administered by interviewers or rendered by web survey software and do not adversely affect survey cooperation.” [Citation1]

Figure 1. Data extraction form sample.

Figure 2. Article selection process for the audit of survey reporting practices in five journals.

Journal selection criteria

National and international journals indexed in PubMed which focused on undergraduate and postgraduate medical education were included. Unlike previous studies examining only journals with the highest impact factors in their fields [Citation3,Citation5,Citation26,Citation50], we purposefully selected journals with a range of impact factors (low to high). We also included journals such as the Journal of Surgical Education, which were ‘high-output’ survey publishers (i.e., those which published a large volume of survey articles). Though the number of journals audited was small (five), our study was similar to three studies outside medical education which performed in-depth reviews of survey reporting practices (four to five journals) [Citation5,Citation26,Citation50] and an analysis of survey research in health professions education (three journals) [Citation3]. The following journals were included in the audit study: Academic Medicine, Journal of Graduate Medical Education, Journal of Surgical Education, Medical Education, and Perspectives on Medical Education.

Audit methodology

Audit methodology is common across fields as diverse as finance [Citation38], quality improvement [Citation34,Citation37], clinical care [Citation36,Citation51], and social science research [Citation35]. Audits involve an examination of practices, processes, materials, or outcomes against established criteria/standards, guidelines, best practices, or laws [Citation34–Citation38]. Most audits are performed with a goal of bringing about changes in practice [Citation34,Citation36,Citation38]. We chose this methodology as it would allow us to evaluate how pretesting practices are reported across a subset of medical education journals and then compare published descriptions of pretesting with accepted standards or best practices within the field of survey science [Citation1,Citation2,Citation6,Citation8,Citation10–Citation12,Citation14,Citation15,Citation18,Citation21,Citation22,Citation24,Citation44,Citation46].

Audit procedure

Once journals were selected, JCF and CYC identified the following article categories to be included in the audit: articles, original reports/articles, research reports/articles, innovation reports, brief reports, and perspectives. The following categories were excluded: letters to the editor, art and medicine, commentaries. All survey articles (rather than a sample) published within these five medical education journals for the time period January 2014 – December 2015 were eligible for inclusion. This time period was chosen to ensure articles were no longer under embargo as we began our audit in 2016.

CYC and JCF then systematically reviewed each journal’s table of contents. Next, all abstracts (except within excluded categories) within every journal issue for the time period Jan. 2014 through Dec. 2015 were reviewed for relevance. See Figure 2. Methods and results sections of all abstracts were analyzed. Examples of terms within abstracts which triggered full article access included, but were not limited to, the following: survey, surveys, surveyed, surveying, survey design/study, question(s), questionnaire(s), poll, polling, polled, probe(s), probing, verbal probing, query, queries, querying, response, respond, respondent(s), respondent debriefing, debriefing, pretest, pretesting, pre-piloting, testing items, assessing items/questions, cognitive interview(s), cognitive interviewing, cognitive testing, think aloud, beliefs, perspectives, opinions, thoughts. In addition, if article relevance was unclear, the article was pulled and examined. Thus, the process to identify articles went beyond a typical keyword search. All titles of articles pulled for review were then entered into a data extraction form, an Excel spreadsheet, by journal name.
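Although this screening was performed manually, the trigger-term logic lends itself to a simple automated pre-filter. The sketch below (Python) is illustrative only and not part of the study protocol; the abbreviated term list and the function name needs_full_review are assumptions introduced here for illustration.

# Illustrative sketch: flag abstracts containing survey-related trigger terms
# for full-text review. The audit itself relied on manual review of abstracts.
TRIGGER_TERMS = [
    "survey", "questionnaire", "poll", "probe", "respondent", "pretest",
    "pre-pilot", "cognitive interview", "cognitive testing", "think aloud",
    "debriefing",
]

def needs_full_review(abstract_text: str) -> bool:
    """Return True if any trigger term appears in the abstract (case-insensitive)."""
    text = abstract_text.lower()
    return any(term in text for term in TRIGGER_TERMS)

# Example: this abstract would be pulled for full-text review.
print(needs_full_review("Residents were surveyed using a 12-item questionnaire."))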

Based upon a review of survey science literature and published recommendations from survey methodologists, CYC and JCF developed a coding protocol (Figure 1) focusing on pretesting, as this is a critical but frequently overlooked step in survey development. The coding protocol was piloted and refined (e.g., a code for non-survey was added to the protocol) prior to use during the 2016 audit. The following codes were used in the data extraction form: 0 – Not a survey (e.g., when the tool described was actually not a survey); 1 – New instrument, no pretesting/pre-piloting reported; 2 – New instrument, pretesting/pre-piloting reported or steps described; 3 – Pre-existing instrument (or items from pre-existing instruments), no pretesting/pre-piloting reported with target population in new study/curriculum; 4 – Pre-existing instrument, pretesting/pre-piloting reported or steps described.

To be coded as a ‘2’ or ‘4’, authors merely had to mention methods recommended for pretesting survey items in the survey science literature (e.g., pretest, pretested, pretesting, pre-pilot, pre-piloted, pre-piloting, testing items/surveys, cognitive interviews/cognitive testing, survey evaluation, feedback on items, verbal probes, think-alouds, etc.). We specifically searched for language within the text of each article which indicated that authors had pretested survey tools with respondents who were similar to members of the target population, as this is essential for establishing construct validity and recommended by survey methodologists and scientists [Citation1,Citation2,Citation22,Citation24,Citation25]. For example, in cases where authors described utilizing a previously developed instrument, but did not report testing/pretesting items prior to use, the article was coded as a ‘3’ (Appendix 1). When disagreements about coding arose, CYC and JCF discussed those items and reached consensus concerning any changes to coding.
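A minimal sketch of these coding rules, written in Python purely for illustration (the audit itself used a piloted Excel data extraction form and two-coder consensus; the abbreviated term list and the helper assign_code below are assumptions, not the authors' instrument):

# Codes used in the data extraction form, as described above.
CODES = {
    0: "Not a survey",
    1: "New instrument, no pretesting/pre-piloting reported",
    2: "New instrument, pretesting/pre-piloting reported or steps described",
    3: "Pre-existing instrument, no pretesting reported with new target population",
    4: "Pre-existing instrument, pretesting/pre-piloting reported or steps described",
}

# Abbreviated, hypothetical list of pretesting terms; any mention was sufficient
# for a code of 2 or 4 in the audit.
PRETEST_TERMS = ("pretest", "pre-pilot", "cognitive interview", "cognitive testing",
                 "verbal probe", "think-aloud", "feedback on items")

def assign_code(is_survey: bool, new_instrument: bool, methods_text: str) -> int:
    """Apply the coding rules sketched above to one article's methods text."""
    if not is_survey:
        return 0
    pretested = any(term in methods_text.lower() for term in PRETEST_TERMS)
    if new_instrument:
        return 2 if pretested else 1
    return 4 if pretested else 3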

Results

Of 254 articles initially selected for review, three described the development of tests (rather than surveys) and were omitted from the analysis. The remaining 251 articles described the use of survey methodology. The number of survey articles published by audited journals during the study time period ranged from 11 to 106 (Median = 37), and the range of pretesting/pre-piloting reported was 0–13.51%. Additionally, 181 of 251 articles (72.11%) described the use of a new survey instrument, but failed to report survey or item pretesting prior to use (Table 2). Fifty-three of 251 articles (21.12%) described the use of a pre-existing survey instrument (or items from a pre-existing instrument), but no articles (0%) reported pretesting the survey tool prior to use. Several of these authors noted that survey instruments were validated previously or items were derived from ‘valid’ survey instruments, yet no description of pretesting with individuals similar to the new target population was provided (see Discussion). Lack of pretesting was also not mentioned as a limitation in the majority of articles.
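The reported proportions follow directly from the raw counts; a brief, purely illustrative computation (Python) reproduces them:

# Counts taken from the text above; percentages are of the 251 audited articles.
counts = {"new instrument, not pretested": 181,
          "new instrument, pretested": 17,
          "pre-existing instrument": 53}
total = sum(counts.values())  # 251
for label, n in counts.items():
    print(f"{label}: {n}/{total} = {100 * n / total:.2f}%")
# new instrument, not pretested: 181/251 = 72.11%
# new instrument, pretested: 17/251 = 6.77%
# pre-existing instrument: 53/251 = 21.12%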

Of 251 articles, only 17 (6.77%) described using a new instrument that was pretested prior to use (Table 2). What follows are examples of descriptions of pretesting from articles which were coded as ‘2’ during the audit (new instrument, pretesting/pre-piloting reported or steps described).

Table 2. Audit data for a sample of medical education journals publishing survey articles from 2014 to 2015.

Descriptions of pretesting were often brief and provided few to no details about specific pretesting methods:

“Content validation was accomplished via review by experts, including program directors. Cognitive pretesting with 4 interns who had recently selected residency programs informed minor revisions” (p. 22) [Citation52].

“Prior to administration, the survey underwent expert review by colleagues as well as cognitive pre-testing with a group of third year medical students from our institution” (p. 750) [Citation53].

“Before dissemination, the survey was tested amongst student representatives of the college, who completed the questionnaire and provided feedback to inform the final version”(p. 663) [Citation54].

In some cases, authors referred readers to other articles which described the development of an instrument:

“The survey instrument was developed and tested in 2008–2009 by experts in survey research, organizational science, and academic medicine. Literature reviews, faculty focus groups, and cognitive interviews were used to inform its development” (p.356) [Citation55].

Other articles included more detailed reports of survey development, including pretesting methodology:

“We also engaged in detailed cognitive pretesting to identify problems with the survey questions that could result in response error (e.g., complicated instructions, vague wording, inappropriate assumptions). We conducted cognitive interviews with a small number of faculty members similar to those in our target population, using a think-aloud approach and verbal probing techniques. We then modified the survey questions on the basis of our findings” (p. 302) [Citation56].

“The survey methodology has been detailed in a prior publication. Briefly, we developed an initial pool of questions using qualitative methods and detailed interviews, which we then subjected to Dillman’s 4 stages of pretesting, including review by knowledgeable colleagues, cognitive interviewing, pilot testing, and a final check” (p. 578) [Citation57].

“The survey was developed based on clinical reasonableness, with input from an associate programme director of the medicine residency and the medical director of the rapid response system. It underwent cognitive testing with two medical residents, and questions and response sets were revised based on their responses. Because the available population of residents is comparatively small, we did not conduct further pre-testing of the instrument in order to avoid contaminating the subject pool” (p. 1213) [Citation58].

Overall, the majority of descriptions of pretesting did not provide the level of detail recommended by AAPOR and other organizations, as there was often no mention of the size of pretest groups or the specific pretesting methods used (e.g., cognitive interviews, verbal probing methods, focus groups, think-alouds, respondent debriefings, etc.).

Discussion and implications

In this study, only 6.77% of survey articles (N = 251 articles) published within five medical education journals over a two-year period reported pretesting survey items prior to use. No audited survey articles (0%) reported pretesting an existing instrument, though surveys are frequently used within new contexts and with new target populations and thus require pretesting [Citation2,Citation3,Citation22]. This audit study goes beyond findings from previous studies reported in the medical education and health professions education literature as it provides data on survey reporting practices both within and beyond research contexts in audited journals. While Artino et al.’s [Citation3] work is important in highlighting problems in overall quality of published survey instruments, it focused exclusively on research surveys published within three high-impact health professions education journals in 2013 (37 self-administered surveys in 36 research articles). In this study, we audited 251 medical education articles published during 2014–15 in journals with a range of impact factors (low to high) which reported results of evaluation projects, curriculum development projects, and research studies where survey methodology had been used. We discovered that articles published within high impact factor journals did not report pretesting more frequently than articles in low impact factor journals.

Survey methodology is widely used within the field of medical education and is one of the most commonly used methodologies in health professions education research [Citation3]. It is also integral to numerous program evaluation models, including the Kirkpatrick model [Citation59], which has been embraced by medical educators as they evaluate single programs or units within curricula. Survey methodology is also used in large-scale evaluations of programs or organizations, and data derived from surveys can influence program adoption, continuance, funding, policies [Citation60] and politics.

While numerous factors affect the quality of data derived from survey instruments (e.g., adequacy of content coverage in the instrument, scale development and refinement, sampling, problems in implementation in the field), if different respondents do not understand questions in the same way and as researchers intended, then the other issues are moot [Citation2,Citation8,Citation9,Citation11,Citation12,Citation15,Citation18,Citation21,Citation22,Citation24,Citation40,Citation43,Citation47,Citation48,Citation61,Citation62]. Examining response processes also provides one source of evidence recommended to establish a validity argument which supports the appropriate use and interpretations of survey results [Citation47]. Therefore, methods which focus on the question-response process are viewed as key to determining whether questions are understandable and acceptable to members of target populations [Citation1,Citation2,Citation5,Citation22,Citation44].

Our results were somewhat surprising, given both the literature and guidelines describing the need for assessing the quality of survey items prior to use [Citation1,Citation2,Citation4,Citation18,Citation22,Citation23,Citation25,Citation46]. Yet, our results do support findings from other fields. Bennett et al. [Citation5], in a review of the top five journals from internal medicine, health sciences, informatics and public health, found that fewer than 20% of articles (N = 117 articles) included descriptions of survey development, including pretesting. Duffett et al. [Citation26] also found low reporting levels of survey pretesting across five journals in critical care medicine. In nephrology, Ho-Ting Li et al. [Citation50] found 27% of survey articles within a sample of four journals mentioned gathering validity evidence related to survey development. Finally, Artino et al. [Citation3], in an examination of 36 survey research articles published in 2013 in three high-impact journals, found that only four articles (10.8%) reported survey pretesting which involved cognitive processes.

Use of ‘validated’ tools

In this sample of 251 survey articles, 21.12% utilized pre-existing questionnaires or items from pre-existing questionnaires, many of which were described as ‘validated.’ While researchers often explore pre-existing questionnaires for sample questions [Citation6], problems arise when they fail to assess survey items with their own study populations [Citation3] prior to data collection. Just as no measurement tool contains validity, no question can contain validity either [Citation48,Citation61,Citation63]. Educators, researchers, and program evaluators cannot be assured that the questions they construct, adopt, or adapt will measure the constructs they are interested in measuring – or that score interpretations are actually valid – if they fail to assess these questions with representative members of new target populations prior to use [Citation1,Citation12,Citation15,Citation22]. Collecting data without pretesting with a local sample can thus lead to inaccuracies in survey results [Citation9].

Limitations

This paper reports the results of an audit of 251 survey articles from five medical education journals over a two-year period. Journals not indexed in PubMed at the time of our review were not included in this limited audit, nor were online-only journals. We also did not include clinical journals from nursing, pharmacy, or allied health education, or journals which publish medical education reports on an intermittent basis [Citation64]. There is also a possibility that survey articles coded as ‘1’ (new instrument, no pretesting) and ‘3’ (pre-existing instrument, no pretesting) had actually included survey instruments which were pretested, but the authors failed to report it (e.g., due to article space limitations) [Citation7]. While it is possible that journal space limitations or lack of explicit journal instructions caused authors to omit pretesting descriptions, we do not believe that this affected the results of this particular study, given earlier reports in the medical education literature on widespread problems with survey development [Citation3,Citation7,Citation23] and our own experiences in the field. In addition, in order to be coded as a ‘2’ (new instrument, pretested) or ‘4’ (existing instrument, pretested) in this audit study, an author simply had to note that their ‘survey was pretested’ prior to use. Finally, while we did not formally calculate interrater agreement, we did utilize a data extraction form with explicit categories and descriptors (Figure 1), and achieved consensus on coding whenever we experienced discrepancies.
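For future audits that do wish to report formal agreement, a chance-corrected statistic such as Cohen's kappa on the two coders' code assignments would be a standard option. The sketch below is illustrative only, uses invented data, and does not describe this study's procedure.

# Illustrative only: invented coder data; this study resolved discrepancies by
# consensus rather than computing an agreement statistic.
from sklearn.metrics import cohen_kappa_score

coder_a = [1, 1, 2, 3, 1, 0, 3, 1]
coder_b = [1, 1, 2, 3, 2, 0, 3, 1]
print(cohen_kappa_score(coder_a, coder_b))  # values near 1 indicate strong agreement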

Conclusion

Survey methodology is used in large-scale evaluations of programs and organizations, as well as in medical education research, projects, and evaluations. Data derived from surveys can influence program adoption, continuance, funding, and policies [Citation7,Citation10,Citation25,Citation60]. Despite the prevalence of surveys as a common data collection method, it appears that many of the surveys used in evaluation projects and medical education research reported in articles examined in this audit study may have contained questions which were never assessed for comprehensibility, acceptability (e.g., cultural), and answerability prior to use. Findings from this audit study indicate that reported pretesting of survey tools prior to use may be even lower than that reported in healthcare studies [Citation5,Citation26,Citation50].

Given the prevalence of survey methods (especially self-administered questionnaires) across health professions education [Citation3,Citation4] and the potential impact of findings, we believe it is imperative that evaluators, researchers, and clinician educators adhere to accepted standards of survey development by pretesting survey questions prior to use. All survey questions, even seemingly ‘objective’ or ‘simple’ items (e.g., demographic questions), should be pretested prior to use to ensure we are measuring what we think we are measuring (the construct of interest) [Citation2,Citation4,Citation6,Citation8,Citation15,Citation19,Citation22,Citation25,Citation40,Citation43]. While psychometric testing is important in assessing aspects of question quality and item performance, it cannot help us to understand how our respondents, based upon their own lived experiences, interpret questions we pose (i.e., verstehen) [Citation65]. Therefore, some researchers are turning to mixed method approaches (quantitative and qualitative) to ‘improve the validity of survey estimates’ (p. 152) [Citation66]. Last, we recommend that authors, reviewers, and editors involved in the publication of survey articles and reports adhere to best practices in survey methodology, which includes transparency in reporting survey methods and disseminating survey results [Citation1,Citation5,Citation67].

Acknowledgments

The authors would like to thank our former colleague Elaine Dannefer, PhD, who provided invaluable feedback during revisions of an earlier version of this manuscript. We would also like to thank reviewers from ZMEO who provided thoughtful, actionable feedback.

Disclosure statement

No potential conflict of interest was reported by the authors.

References