Web Paper

Identification of best evidence in medical education. Case study

Pages e72-e75 | Published online: 03 Jul 2009

Abstract

Aim: To compare how different researchers performed in screening for informative evidence about medical education.

Method: Six researchers with three different levels of involvement in a systematic literature review screened articles by title and (where available) abstract, and then by reading articles they had selected in full text. The reference standard was a consensus decision to include or exclude the article in the final analysis, whose results are published elsewhere.

Results: The single screener most involved in the literature search, who was also the most junior member of the topic review group, achieved a sensitivity approaching 100% and a specificity of 98–100% for informative articles. She far outperformed the other researchers, all of whom had as much or more topic knowledge and greater research experience.

Conclusion: It was not possible to improve on the performance of the single motivated and capable primary screener, and trying to do so increased the number of uninformative articles retrieved. One interpretation is that the primary searcher was more practised and focused on the task than her more senior colleagues, yet they tended to become worse rather than better with practice. The fact that a well-informed but relatively naïve person consistently outperformed her more “expert” colleagues might suggest an alternative explanation: given the patchy standard and qualitative nature of the evidence, perhaps experts found it harder than a novice to make reliable choices, in which case their unreliable performance reflects the nature of present-day education evidence. This case study illustrates the value of quality-assuring the article selection process. Given the amount of disagreement uncovered by the study, we suggest that consensus between reviewers is an important reference standard against which the performance of any single primary screener should be checked.

Introduction

The evidence movement came into existence because practitioners were too ready to base practice on hunch, consensus, and tradition, and academics were better at producing new evidence than at helping practitioners use what already existed. Evidence based medicine, which has been at the forefront of the evidence movement, has been criticised as ‘statistical rather than scientific’, elevating mega-trials and meta-analysis over important but unquantifiable factors like experience, judgement and expertise (Charlton & Miles Citation1998). Now that medical education is moving from opinion-based education to evidence-based education, educators have to decide what evidence should guide their practice (Albanese & Norcini Citation2002). Far from the methodological purity of placebo controlled trials, education researchers change practice within a system that is open, complex, non-linear, organic, and historical (Kelly Citation2003) and use qualitative as well as quantitative methods to evaluate the outcome (Murray Citation2002). We have examined how experience in clinical and community settings could contribute to early medical education, one of the early ‘Best Evidence Medical Education’ systematic reviews (Dornan et al. Citation2006). We assumed that final adjudication of evidence should be by consensus but were unsure whether more than one person needed to conduct the preliminary sifting of the results of literature searches.

Methods

This section is complementary to ‘Methods’ in our main report (Dornan et al. Citation2006).

Topic review group (TRG)

The group had six members from various parts of the world whom we describe here as one ‘level one’ (L1) researcher, one ‘level two’ (L2) researcher, and four ‘level three’ (L3) researchers, the ‘levels’ reflecting how involved they were in the conduct of the literature search. The L1 researcher was a medical student with no prior research experience, who devised and conducted the searches as her compulsory fourth year research project. Other than email contact with the BEME Information Scientist to construct the search syntax, she had no specific bibliographic training or supervision. The L2 researcher was chair of the TRG, an experienced clinical researcher, and supervisor of the L1 researcher's project. The L3 researchers were education researchers varying in experience from PhD student to professor, three of whom were medically qualified.

Validation stages

  1. Helped by an information scientist, the L1 researcher developed a search syntax and ran a ‘scoping search’ to test whether it would yield articles fulfilling the review's entry criteria. She and the L2 researcher independently examined the titles and, when available, abstracts of all articles identified by it. They obtained a full text copy of any article that either of them judged to be relevant and came to a consensus as to whether or not it was informative.

  2. It was decided that the L1 researcher should single-screen articles in the main search but, to quality-control her work, she chose a random 10% sample of papers (title and abstract), stratified by the first letter of the first author's surname. She and the L2 researcher repeated the steps in the preceding paragraph. Articles that either of them judged to be informative were obtained and entered into the main coding process, whereby both the L1 and L2 researchers and at least one L3 researcher decided whether they met the review's inclusion criteria, seeking the opinion of the whole TRG if there was disagreement. So, the performance of the L1 and L2 researchers was validated against a consensus reference standard of ‘finally included in the review’.

  3. The L1 researcher constructed a test set of 124 articles, including 14 that she and/or the L2 researcher had agreed in stage 2 to be relevant and a randomly selected sample of 110 articles selected by neither of them. Stage 3a: All L3 researchers reviewed the articles by title and abstract and decided which were relevant. Stage 3b: Any article chosen by any researcher was sent in full text to all six reviewers, who decided whether or not it should be included in the final set. Their individual judgements were validated against the same reference standard as in the previous stage.

Results are presented as sensitivity (the percentage of articles meeting the reference standard that each screener chose) and specificity (the percentage of articles not meeting the reference standard that each screener excluded).
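These two measures can be computed directly from each screener's selections and the consensus decisions. A minimal sketch of the calculation (the function name and article identifiers are illustrative, not from the paper):

```python
def screen_performance(selected: set, reference: set, all_articles: set):
    """Score one screener against the consensus reference standard.

    Returns (sensitivity, specificity):
      sensitivity - share of reference-standard (informative) articles
                    the screener selected
      specificity - share of uninformative articles the screener excluded
    """
    true_positives = len(selected & reference)
    negatives = all_articles - reference
    true_negatives = len(negatives - selected)
    sensitivity = true_positives / len(reference)
    specificity = true_negatives / len(negatives)
    return sensitivity, specificity

# Toy illustration with made-up article IDs:
articles = set(range(10))
informative = {0, 1}        # consensus 'finally included in the review'
screener_picks = {0, 1, 5}  # one screener's selections
sens, spec = screen_performance(screener_picks, informative, articles)
# sens == 1.0   (both informative articles found)
# spec == 0.875 (7 of the 8 uninformative articles excluded)
```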

Results

Stage 1

The scoping search identified 1003 articles, and the L1 and L2 researchers finally agreed that 21 (2.1%) of them were informative. Table 1 shows that the L1 researcher was considerably more sensitive than the L2 researcher, with little difference in specificity.

Table 1.  Performance of researchers at identifying informative evidence

Stage 2

The main search identified 6981 articles, of which 699 were included in the stratified 10% sample. Twelve (1.7%) met the reference standard of selection. Table 1 shows that the L1 researcher now achieved a sensitivity of 100% (the L2 researcher adding no articles) with a specificity of 98%.

Stage 3

Table 1 shows that the L3 researchers had sensitivities ranging from 40% to 100%, with specificities around 80% in stage 3a and 50% in stage 3b. Collectively, they chose to see full text copies of 68 of the 124 articles (55%), only 5 (7%) of which met the reference standard.

Discussion and conclusion

The junior researcher most involved in the literature search achieved a sensitivity of 98–100%, incorrectly excluding few articles that met the reference standard. She also made few false positive choices. Our ‘true positive’ rate of 2% is similar to that of other BEME reviews. If our findings were generalisable to other reviews, a search with 5000 ‘hits’ would yield about 100 articles that needed to be reviewed in full text. A single screener with our L1 researcher's balance of sensitivity and specificity would identify 98 of them and miss two, whilst retrieving about 20 other articles that proved uninformative. It is far from certain that second screening would identify the two articles missed by the L1 researcher, but it would result in about 180 extra uninformative articles being obtained. About half the evidence identified for our review was too methodologically weak to be useful (Dornan et al. Citation2006), so a huge amount of extra effort and expense would yield perhaps one methodologically strong article.

The study has several methodological limitations. If, as is likely, our level one screener had a disproportionate influence on the final consensus, our study design will have tended to overestimate her performance. The number of articles that the L3 researchers wanted to see in full text may have been increased by only having the title available to them. The method may have created non-independence between the gold standard and the rater. Given those considerations, we must recommend great caution in generalising the sensitivities and specificities we calculated. However, that caution applies to their absolute values more than their relative values, and the fact that a medical student outperformed all four professors calls for comment. A simple explanation would be that the student put more time and effort into making her choices, but experts do not tend to improve their performance by spending longer on tasks, and in any case their performance tended to get worse, not better, as the study progressed. An intriguing alternative explanation is that well-informed naivety was the reason for the L1 researcher's good performance. Perhaps the complexity of the research topic and the nature of the evidence they had to decide on made it hard for experts to exercise judgement. We suggest that any systematic review group would be well advised to quality control its performance, particularly if it intends to rely on a single first line screener. We offer our method of quality control and use of consensus as a reference standard as a way of helping them do so.

Funding

Tim Dornan's research funds

Ethical approval

Not sought, as the study did not directly involve human subjects.

Acknowledgements

We thank Alex Haig for advising us how to perform the literature searches. Rhona Dalton and Valerie Haigh helped us obtain the articles. Marilyn Hammick and Pat Lilley were an important source of support throughout the review.

Additional information

Notes on contributors

Tim Dornan

TIM DORNAN, an endocrinologist and medical educationalist, was the L2 researcher.

Sonia Littlewood

SONIA LITTLEWOOD, now a surgical resident, was the L1 researcher.

Stephen A Margolis

STEPHEN MARGOLIS, an academic primary care practitioner, was an L3 researcher.

Valmae Ypinazar

VALMAE YPINAZAR, an education researcher, was an L3 researcher.

Albert Scherpbier

ALBERT SCHERPBIER, a Professor of Medical Education, was an L3 researcher.

John Spencer

JOHN SPENCER, a Professor of Medical Education in Primary Care, co-convened the TRG and was an L3 researcher.

References

  • Albanese M, Norcini J. Systematic reviews: What are they and why should we care? Advances in Health Sciences Education: Theory and Practice 2002; 7: 147–151
  • Charlton BG, Miles A. The rise and fall of EBM. Quarterly Journal of Medicine 1998; 91: 371–374
  • Dornan T, Littlewood S, Margolis S, Scherpbier A, Spencer J, Ypinazar V. How can experience in clinical and community settings contribute to early medical education? Medical Teacher 2006; 28: 3–18
  • Kelly AE. Research as design. Educational Researcher 2003; 23: 3–4
  • Murray E. Challenges in educational research. Medical Education 2002; 36: 110–112
