Research Article

The effects of audience response systems on learning outcomes in health professions education. A BEME systematic review: BEME Guide No. 21

Pages e386-e405 | Published online: 11 May 2012

Abstract

Background: Audience response systems (ARS) represent one approach to make classroom learning more active. Although ARS may have pedagogical value, their impact is still unclear. This systematic review aims to examine the effect of ARS on learning outcomes in health professions education.

Methods: After a comprehensive literature search, two reviewers completed title screening, full-text review and quality assessment of comparative studies in health professions education. Qualitative synthesis and meta-analysis of immediate and longer term knowledge scores were conducted.

Results: Twenty-one of 1013 titles were included. Most studies evaluated ARS in lectures (20 studies) and in undergraduates (14 studies). Fourteen studies reported statistically significant improvement in knowledge scores with ARS. Meta-analysis showed greater differences with non-randomised study design. Qualitative synthesis showed greater differences with non-interactive teaching comparators and in postgraduates. Six of 21 studies reported student reaction; five favoured ARS while one had mixed results.

Conclusion: This review provides some evidence to suggest the effectiveness of ARS in improving learning outcomes. These findings are more striking when ARS teaching is compared to non-interactive sessions and when non-randomised study designs are used. This review highlights the importance of having high quality studies with balanced comparators available to those making curricular decisions.

Introduction

There has been a shift in health trainee education from traditional lectures to a more engaging and active style of teaching. This is in part because of the inadequacies of traditional lecturing in meeting the needs of growing class sizes, and because of increasing evidence that lectures are not effective for solidifying long-term knowledge acquisition or for promoting translation beyond the acquisition of knowledge to its application in both related and different settings (Alexander et al. 2009; Forsetlund et al. 2009). Audience response systems (ARS) represent a recent innovation that a growing number of educational institutions are using to facilitate student engagement and learning. An ARS consists of an input device controlled by the learner, a receiver and a display linked to the input that can be controlled by the instructor. ARS were first seen at Cornell and Stanford Universities in the 1960s but were not made available for commercial use until the 1990s. Since that time, the technology has been evolving to meet the needs of the modern classroom (Judson & Sawada 2002; Abrahamson 2006). A more affordable and convenient ARS was marketed in 1999, and by 2003 it was in widespread use in higher education classrooms (Banks & Bateman 2004; Abrahamson 2006; Kay & LeSage 2009). ARS are being used in a variety of ways: as a learning strategy to facilitate increased attention, interaction, instruction, student preparation and discussion; to motivate students to attend and participate; and to provide formative and summative knowledge assessments (Kay & LeSage 2009).

The literature concerning ARS in education has consistently purported that, when used properly, ARS can achieve positive results for participants (Caldwell 2007; Cain & Robinson 2008). However, many teachers and faculties have been reluctant to adopt ARS. Some have expressed concerns regarding the time and effort required to prepare new ARS-style lectures (Halloran 1995), the cost to faculty and students of implementing the new system and the decreased time available to cover lecture material (Miller et al. 2003; Cain & Robinson 2008).

Although ARS may have real pedagogical value, their impact on learning in health professions education is still unclear. Eight reviews have been published exploring the cost, use and effect of ARS in the broader education literature (Judson & Sawada 2002; Roschelle et al. 2004; Fies & Marshall 2006; Caldwell 2007; Simpson & Oliver 2007; Cain & Robinson 2008; MacArthur & Jones 2008; Kay & LeSage 2009). However, many of these reviews were not systematic, and several had inadequate rigour in their methods, as discussed below. Many address more general populations, examining health professions education only in part. Some were published nearly a decade ago and are limited by the number of studies they include.

The most recent systematic review, by Kay and LeSage, examines the different uses of ARS in higher education, includes 52 studies and represents the most thorough and rigorous review to date. The authors reported a number of promising strategies, including collecting formative assessment feedback and peer-based instruction. However, only seven of the 52 studies related to health professions education, and these focussed on teaching strategies to improve the use of ARS rather than on learning outcomes.

Cain and Robinson published a review in 2008 that gave an overview of the current applications of ARS within health trainee education. This was not a systematic review and reported data on only six studies.

Reviews that report learning outcomes have consistently found that learner reaction is positive (Judson & Sawada 2002; Roschelle et al. 2004; Fies & Marshall 2006; Caldwell 2007; Simpson & Oliver 2007; Cain & Robinson 2008; MacArthur & Jones 2008). However, the reviews that reported knowledge outcomes (Judson & Sawada 2002; Fies & Marshall 2006; Caldwell 2007; Cain & Robinson 2008; MacArthur & Jones 2008) reported mixed results, with some studies favouring ARS and others not.

Many reviews have highlighted limitations of the current literature. For example, in 2002, Judson and Sawada published a review concluding that the positive effects of ARS on knowledge scores and learner reaction point more to the teaching practices of the instructor than to the incorporation of the ARS technology. The review by Fies and Marshall examined the different uses of ARS in education and concluded that much of the current literature compares ARS and non-ARS teaching sessions that are unequal. They call for research that rigorously assesses ARS with more balanced comparators in a variety of educational settings.

Until recently, there was a shortage of literature that would allow a high quality methodological review focused on health professions education. However, in the past few years, a substantial number of new articles with this focus have been published. It is now possible to more rigorously assess the effect of ARS on learning in health professions trainees and provide a better understanding of their use in this distinct context.

Methods

Research question

The overall research question for this systematic review is: what are the effects of ARS on learning outcomes in health professions education? This review includes undergraduate and graduate students, clinical trainees and practicing professionals. The effectiveness of educational strategies was measured in terms of the classic Kirkpatrick model (Kirkpatrick & Kirkpatrick 2006) including change in patients’ health, change in learners’ behaviour, change in learners’ skills, change in learners’ knowledge, change in learners’ attitudes/perceptions and change in learners’ reactions. Although it is not explicit in Kirkpatrick's framework, we included learners’ self-confidence under the category of learners’ attitudes/perceptions.
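For readers who track extracted outcomes programmatically, this hierarchy can be represented as a simple ordered type. The Python sketch below is illustrative only: the level names and numbering are our own labels for the categories listed above, not Kirkpatrick's.

    from enum import IntEnum

    class KirkpatrickOutcome(IntEnum):
        # Ordered from lowest to highest level, as used in this review;
        # self-confidence is folded into attitudes/perceptions.
        REACTION = 1        # change in learners' reactions
        ATTITUDES = 2       # attitudes/perceptions (incl. self-confidence)
        KNOWLEDGE = 3       # change in learners' knowledge
        SKILLS = 4          # change in learners' skills
        BEHAVIOUR = 5       # change in learners' behaviour
        PATIENT_HEALTH = 6  # change in patients' health

    # Tagging an extracted outcome with its level:
    outcome = {"measure": "final examination score",
               "level": KirkpatrickOutcome.KNOWLEDGE}
    print(outcome["level"].name)  # KNOWLEDGE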

Search strategy

A comprehensive search strategy was developed by a health science librarian in consultation with the other co-authors. We identified relevant studies from the online databases listed in Table 1 and from other relevant sources as described below.

Table 1  Included online databases

Two search strategies were used depending on whether the database in question was health related or not. This was done to ensure the inclusion of all relevant studies. The specific terms and search strategies can be found in Table 2 for health-related databases and in Table 3 for general databases. In addition, the reference lists of all included studies were hand searched, as were those of relevant reviews identified during the title screening procedure described below. We also hand searched the conference proceedings of the Association of American Medical Colleges, the Association for Medical Education in Europe and the Canadian Conference on Medical Education from 2007 to 2009. A separate cited reference search was conducted using Web of Science and SCOPUS for each included study to identify papers in which it had been cited. The primary authors of all included studies were contacted by email to determine if they knew of any unpublished, recently published or ongoing studies relevant to the review. The contact information used was extracted from the included papers or from the university directories associated with the primary authors.
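To illustrate the two-strategy approach, the hypothetical sketch below shows how concept blocks might be combined into a Boolean query; the terms shown are placeholders, and the terms actually used are those reported in Tables 2 and 3.

    # Placeholder concept blocks -- not the review's actual search terms.
    ars_terms = ['"audience response system*"', '"student response system*"',
                 '"personal response system*"', 'clicker*',
                 '"electronic voting"']
    population_terms = ['"medical education"', '"health professions"',
                        'nurs*', 'pharmac*', 'dent*', 'veterinar*']

    # General databases need both concept blocks; health-related databases
    # can often omit the population block (hence the two strategies).
    general_query = "({}) AND ({})".format(" OR ".join(ars_terms),
                                           " OR ".join(population_terms))
    health_query = " OR ".join(ars_terms)
    print(general_query)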

Table 2  Search terms and strategy (health-related databases)

Table 3  Search terms and strategy (general databases)

Screening and selection of studies

The titles and abstracts generated from the electronic database searches were collated in a Refworks reference management database. They were then screened by two reviewers (AO and CN) to exclude those that obviously did not meet the inclusion criteria or address the question under study. The full texts of the remaining studies were retrieved and a pre-approved inclusion form was applied to each to identify relevant studies. This was done independently by two reviewers (AO and CN), and any disagreements that arose were resolved through discussion, or with the aid of a third reviewer (LH) as required.

The inclusion criteria are detailed in Table 4. These were applied to each potentially relevant study to evaluate whether it should be included in the review. This review focused on health professions trainees who experienced teaching interventions as evaluated by controlled studies.

Table 4  Inclusion and exclusion criteria applied to potentially relevant studies to determine suitability for systematic review purposes

Assessment of methodological quality

The methodological quality of included studies was evaluated independently by two reviewers (LH and CN) using well-recognised tools. The Cochrane Risk of Bias tool was used for controlled trials (Higgins & Green 2006). The Newcastle-Ottawa Scale was used for cohort studies (Wells et al. 2011). Discrepancies were resolved through consensus.

Data extraction

Data were extracted and entered into an electronic data extraction form. The form was developed and piloted in a previous systematic review by the authors (Hartling et al. 2010) and was further revised and tailored to the current review. One reviewer (CN) extracted data; to ensure the accuracy and consistency of the process, a random sample of 20% of the articles was selected for extraction by a second reviewer (AO). The data extracted by the two reviewers were then compared, and no significant discrepancies or errors were detected.
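A minimal sketch of how such a 20% verification sample might be drawn reproducibly is shown below; the helper function and fixed seed are our own illustrative choices, not the procedure actually used.

    import random

    def verification_sample(study_ids, fraction=0.2, seed=2010):
        """Randomly select a fraction of studies for duplicate extraction
        by a second reviewer (hypothetical helper)."""
        rng = random.Random(seed)  # fixed seed keeps the selection auditable
        k = max(1, round(fraction * len(study_ids)))
        return sorted(rng.sample(study_ids, k))

    included = ["study_%02d" % i for i in range(1, 22)]  # 21 included studies
    print(verification_sample(included))  # 4 of the 21 studies re-extracted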

Analysis

The evidence was qualitatively reviewed, with studies grouped by interventions and comparisons and summarised according to the outcomes assessed at each Kirkpatrick level. Evidence tables detailing study characteristics (including population, intervention, comparison, outcomes and design), results and authors’ conclusions are provided. We meta-analysed immediate and long-term knowledge scores. Data were combined using weighted mean differences (WMDs), inverse variance methods and random effects models. Studies were grouped by design, and meta-analysis was performed separately for randomised controlled trials (RCTs) and non-randomised studies. For the purpose of this analysis, long-term outcomes were defined as the latest examination scores reported, provided the examination was not given immediately after the teaching session. Scores from examinations given immediately following the teaching session were designated as immediate knowledge outcomes. Heterogeneity was quantified using the I2 statistic; an I2 value of greater than 50% was considered substantial heterogeneity (Higgins & Thompson 2002; Higgins et al. 2003). Knowledge scores were assessed using different scales (e.g. 0–100, 0–7); we conducted sensitivity analyses using standardised mean differences to account for this variability. Analyses were conducted using RevMan 5.0 (The Cochrane Collaboration, Copenhagen, Denmark). Results are reported with 95% confidence intervals (CIs), and statistical significance was set at p < 0.05.
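The pooling procedure can be made concrete with a short sketch. The Python code below is illustrative only (the review's analyses were run in RevMan, and the input numbers are fabricated); it implements inverse-variance weighting with a DerSimonian-Laird random-effects model and reports the WMD, its 95% CI and the I2 statistic.

    import math

    def random_effects_wmd(diffs, ses):
        """Pool per-study mean differences (ARS minus control) using
        inverse-variance weights and a DerSimonian-Laird random-effects
        model; returns the WMD, its 95% CI and I2 (as a percentage)."""
        w = [1.0 / se ** 2 for se in ses]               # fixed-effect weights
        mu = sum(wi * d for wi, d in zip(w, diffs)) / sum(w)
        q = sum(wi * (d - mu) ** 2 for wi, d in zip(w, diffs))  # Cochran's Q
        df = len(diffs) - 1
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)                   # between-study variance
        w_re = [1.0 / (se ** 2 + tau2) for se in ses]   # random-effects weights
        wmd = sum(wi * d for wi, d in zip(w_re, diffs)) / sum(w_re)
        se_wmd = math.sqrt(1.0 / sum(w_re))
        i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
        return wmd, (wmd - 1.96 * se_wmd, wmd + 1.96 * se_wmd), i2

    # Fabricated example: three studies' score differences and standard errors.
    print(random_effects_wmd([4.0, 8.5, 1.2], [1.5, 2.0, 1.1]))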

Results

Figure 1 presents a flow diagram of the study selection process. Eight hundred and fourteen studies were identified by electronic database searches, and 193 studies were identified by reference and hand searches. Of these 1007 studies, title and abstract screening identified 220 potentially relevant studies that warranted full-text review. Contacting the authors of included studies by email yielded six additional studies, giving a total of 1013 studies for review. Inclusion criteria were applied to the full texts of these 226 studies (the 220 from screening plus the six identified through authors). As a result, 21 studies met the inclusion criteria for this review.

Figure 1. Flow diagram of included studies.


Among the included studies, nine were RCTs (Miller et al. 2003; Palmer et al. 2005; Pradhan et al. 2005; Duggan et al. 2007; Plant 2007; Elashvili et al. 2008; Rubio et al. 2008; Liu et al. 2010; Moser et al. 2010), two were non-randomised controlled trials (NRCTs) (Schackow et al. 2004; Patterson et al. 2010), two were prospective cohort studies (O’Brien et al. 2006; Stein et al. 2006) and eight were non-concurrent cohort studies (Halloran 1995; Slain et al. 2004; Barbour 2008; Berry 2009; Cain et al. 2009; Doucet et al. 2009; Lymn & Mostyn 2009; Grimes et al. 2010).

Most of the studies were conducted in the United States (16 studies; Halloran 1995; Miller et al. 2003; Schackow et al. 2004; Slain et al. 2004; Pradhan et al. 2005; O’Brien et al. 2006; Stein et al. 2006; Plant 2007; Elashvili et al. 2008; Rubio et al. 2008; Berry 2009; Cain et al. 2009; Grimes et al. 2010; Liu et al. 2010; Moser et al. 2010; Patterson et al. 2010), with the remainder based in the United Kingdom (two studies; Barbour 2008; Lymn & Mostyn 2009), Australia (two studies; Palmer et al. 2005; Duggan et al. 2007) and Canada (one study; Doucet et al. 2009). Thirteen of the 21 studies concerned undergraduate health professions education, including four studies in nursing (Halloran 1995; Stein et al. 2006; Berry 2009; Patterson et al. 2010), three in medicine (Palmer et al. 2005; O’Brien et al. 2006; Duggan et al. 2007), two in dentistry (Barbour 2008; Elashvili et al. 2008), two in pharmacy (Cain et al. 2009; Liu et al. 2010) and two in veterinary medicine (Plant 2007; Doucet et al. 2009). Three studies involved medical residents (Schackow et al. 2004; Pradhan et al. 2005; Rubio et al. 2008). Three studies involved graduate trainees, two in pharmacy (Slain et al. 2004; Moser et al. 2010) and one in nursing (Grimes et al. 2010). Practicing professionals were the subjects in two studies, one involving physicians (Miller et al. 2003) and the other nurses (Lymn & Mostyn 2009). Several studies assessed more than one Kirkpatrick level of learning outcomes. All 21 studies assessed change in knowledge, and six studies assessed change in learner reactions (Miller et al. 2003; Slain et al. 2004; Duggan et al. 2007; Elashvili et al. 2008; Cain et al. 2009; Doucet et al. 2009). One study assessed change in self-confidence (Doucet et al. 2009). None of the studies evaluated skills or patient outcomes. In total, 2637 participants were involved in the included studies.

Methodological quality and risk of bias of included studies

The methodological quality of the studies varied; however, several weaknesses were common to particular designs. The 11 RCTs and NRCTs were assessed using the Cochrane Risk of Bias tool. The randomisation process and allocation concealment were unclear in all nine randomised controlled trials (Miller et al. 2003; Palmer et al. 2005; Pradhan et al. 2005; Duggan et al. 2007; Plant 2007; Elashvili et al. 2008; Rubio et al. 2008; Liu et al. 2010; Moser et al. 2010). Two trials were not randomised (Schackow et al. 2004; Patterson et al. 2010). In about half of the trials (Pradhan et al. 2005; Elashvili et al. 2008; Rubio et al. 2008; Liu et al. 2010; Moser et al. 2010; Patterson et al. 2010), outcome data were either incomplete or inadequately addressed. One trial (Moser et al. 2010) was found to be at risk of selective outcome reporting. Eight trials (Miller et al. 2003; Schackow et al. 2004; Pradhan et al. 2005; Duggan et al. 2007; Plant 2007; Elashvili et al. 2008; Rubio et al. 2008; Moser et al. 2010) did not present any baseline characteristics of the groups being compared, and one trial reported general baseline imbalance.

For the majority of prospective and non-concurrent cohorts (Halloran 1995; Slain et al. 2004; O’Brien et al. 2006; Stein et al. 2006; Barbour 2008; Berry 2009; Cain et al. 2009; Doucet et al. 2009; Grimes et al. 2010), the exposed and non-exposed groups were drawn from the same community, and the learners were truly representative of the average participant in the community. One non-concurrent cohort was not drawn from the same community (Lymn & Mostyn 2009). However, none of the studies took into account the comparability of cohorts or controlled for potential confounders in the association between intervention and outcomes (skills, knowledge and confidence). All of the studies had a clear definition of the outcome, and reported outcomes were based on record linkage. Three studies provided no statement regarding completeness of follow-up (Stein et al. 2006; Barbour 2008; Lymn & Mostyn 2009). One study lost less than 10% of its subjects, a small loss that is unlikely to introduce bias (Slain et al. 2004). One study did not have adequate follow-up of participants, as its loss to follow-up rate was greater than 10% and the description of those lost was incomplete (Doucet et al. 2009). Further detailed results of the assessments of methodological quality are available from the authors on request.

Characteristics of included studies

Table 5 provides a summary of the interventions, comparators, outcomes measured and main findings of all included studies. All studies reported knowledge as an outcome, one reported learner self-confidence (Doucet et al. 2009) and six reported learner reaction (Miller et al. 2003; Slain et al. 2004; Duggan et al. 2007; Elashvili et al. 2008; Cain et al. 2009; Doucet et al. 2009). Tables 6 and 7 detail the characteristics and results of all included studies. The following provides a narrative overview of the results grouped according to educational outcome.

Table 5  Summary of findings

Table 6  Study characteristics

Table 7  Main findings of the review

Knowledge

All 21 studies, involving 2637 participants, compared knowledge-based learning outcomes between ARS and traditional lectures (20 studies) or between ARS and traditional tutorials (one study). Fourteen studies reported a statistically significant difference in at least one knowledge assessment score in favour of ARS. In terms of the magnitude of difference, five of the studies with statistically significant differences reported a difference of at least 10% in knowledge assessment scores favouring the ARS group. Of these five studies, three were RCTs (n = 22, n = 77 and n = 17; Pradhan et al. 2005; Rubio et al. 2008; Elashvili et al. 2008), one was an NRCT (n = 24; Schackow et al. 2004) and one was a non-concurrent cohort (n = 131; Slain et al. 2004). The subjects of these studies were medical residents (three studies; Schackow et al. 2004; Pradhan et al. 2005; Rubio et al. 2008), undergraduate dental students (one study; Elashvili et al. 2008) and graduate pharmacy students (one study; Slain et al. 2004). Interestingly, there were only three studies (Schackow et al. 2004; Pradhan et al. 2005; Rubio et al. 2008) in the review with medical resident participants, and all three showed a greater than 10% increase in knowledge assessment scores using ARS. Six studies reported a statistically significant difference in knowledge assessment scores of at least 5% in favour of the ARS group. These comprised three RCTs (n = 179, n = 102 and n = 86; Palmer et al. 2005; Liu et al. 2010; Moser et al. 2010) and three non-concurrent cohort studies (n = 88, n = 66 and n = 254; Cain et al. 2009; Lymn & Mostyn 2009; Grimes et al. 2010). The participants varied, including undergraduate pharmacy students (two studies; Cain et al. 2009; Liu et al. 2010), undergraduate medical students (one study; Palmer et al. 2005), graduate nursing students (one study; Grimes et al. 2010), graduate pharmacy students (one study; Moser et al. 2010) and health professionals (one study; Lymn & Mostyn 2009).

Three studies reported a statistically significant difference in knowledge assessment scores of less than 5% favouring ARS. Two of these were non-concurrent cohort studies (n = 126 and n = 169; Berry 2009; Doucet et al. 2009) and one was a prospective cohort study (n = 148; O’Brien et al. 2006). These studies involved participants from undergraduate nursing (one study; Berry 2009), undergraduate medicine (one study; O’Brien et al. 2006) and undergraduate veterinary medicine (one study; Doucet et al. 2009) programs.

Seven studies reported no statistically significant difference in any knowledge assessment measure. Three of these studies were RCTs (n = 283, n = 55 and n = 20; Miller et al. 2003; Duggan et al. 2007; Plant 2007), one was an NRCT (n = 70; Patterson et al. 2010), two were non-concurrent cohort studies (n = 28 and n = 142; Halloran 1995; Barbour 2008) and one was a prospective cohort (n = 283; Stein et al. 2006). These seven studies involved participants from undergraduate nursing (three studies; Halloran 1995; Stein et al. 2006; Patterson et al. 2010), undergraduate dentistry (one study; Barbour 2008), undergraduate veterinary medicine (one study; Plant 2007), undergraduate medicine (one study; Duggan et al. 2007) and practicing professionals (one study; Miller et al. 2003).

The effect of ARS on short- and long-term knowledge assessment scores was examined. Nine studies examined scores from tests, quizzes or questionnaires that immediately followed exposure to ARS (Miller et al. 2003; Schackow et al. 2004; Palmer et al. 2005; Duggan et al. 2007; Plant 2007; Elashvili et al. 2008; Rubio et al. 2008; Liu et al. 2010; Moser et al. 2010). Each of these studies performed one or two immediate knowledge assessments. Four studies (Schackow et al. 2004; Elashvili et al. 2008; Rubio et al. 2008; Moser et al. 2010) reported a significant difference in at least one knowledge assessment score favouring ARS lectures, four (Miller et al. 2003; Palmer et al. 2005; Duggan et al. 2007; Plant 2007) reported no difference and one (Liu et al. 2010) reported immediate quiz scores favouring traditional lectures, although this difference did not extend to the study's long-term scores.

Eighteen studies reported long-term knowledge assessment scores (at least one month later) from quizzes, tests, unit exams, final exams, class averages or overall grade point averages. Each of these studies performed between one and three long-term knowledge assessments. Of these 18 studies, eight (Schackow et al. 2004; Slain et al. 2004; Palmer et al. 2005; Pradhan et al. 2005; Rubio et al. 2008; Cain et al. 2009; Grimes et al. 2010; Moser et al. 2010) reported a significant difference in at least one knowledge assessment score favouring ARS. The other 10 studies (Halloran 1995; O’Brien et al. 2006; Duggan et al. 2007; Plant 2007; Barbour 2008; Elashvili et al. 2008; Berry 2009; Doucet et al. 2009; Liu et al. 2010; Patterson et al. 2010) reported no difference in any score. No long-term knowledge assessment score significantly favoured traditional teaching.

Comparison group

A difference in knowledge assessment scores can have as much to do with the comparator group as with the intervention group. To better understand the impact of ARS on knowledge-based scores, the comparator groups were also analysed. As part of the data extraction, comparator groups were divided into interactive and non-interactive categories. An interactive comparator was defined as one where any similar questions were asked or any attempted interaction was observed. Six of the 21 studies compared ARS lectures with traditional lectures that were not interactive (Schackow et al. 2004; Pradhan et al. 2005; Duggan et al. 2007; Plant 2007; Elashvili et al. 2008; Rubio et al. 2008). Of these six studies, four reported a statistically significant difference in knowledge assessment scores favouring ARS, and the difference in all four studies was 10% or greater (Schackow et al. 2004; Pradhan et al. 2005; Elashvili et al. 2008; Rubio et al. 2008). Eleven of the 21 studies compared ARS lectures (10 studies; Halloran 1995; Miller et al. 2003; Slain et al. 2004; Barbour 2008; Berry 2009; Cain et al. 2009; Doucet et al. 2009; Liu et al. 2010; Moser et al. 2010; Patterson et al. 2010) and tutorials (one study; Doucet et al. 2009) with traditional lectures/tutorials that were interactive. Seven of the 11 studies (Slain et al. 2004; O’Brien et al. 2006; Elashvili et al. 2008; Berry 2009; Cain et al. 2009; Doucet et al. 2009; Liu et al. 2010) reported a statistically significant difference in knowledge assessment scores. Of these seven studies, only one (Slain et al. 2004) reported a statistically significant increase of 10% or greater. Three studies did not make clear the level of interaction of the comparator. Two of these studies (Lymn & Mostyn 2009; Grimes et al. 2010) favoured ARS, while one (Stein et al. 2006) reported no difference in knowledge assessment scores. Thus, while ARS can increase knowledge-based scores, the greatest effect is seen when they are compared to non-interactive lectures.

Meta-analysis

Meta-analyses were performed for immediate and long-term knowledge outcomes. The results are shown in Figures 2 and 3, respectively. The RCTs showed no significant difference between groups in either immediate (WMD 4.53, 95% CI −0.68, 9.74, n = 8) or long-term (WMD 1.36, 95% CI −3.77, 6.50, n = 6) knowledge scores. The non-randomised studies demonstrated a significant difference favouring ARS for both immediate (WMD 4.57, 95% CI 1.47, 7.67, n = 10) and long-term (WMD 35, 95% CI 26.4, 43.6, n = 1) knowledge scores; however, the latter analysis was based on only one study. Statistical heterogeneity was high in all groups, with I2 values ranging from 70% to 89%. There was substantial variation between studies that may contribute to the statistical heterogeneity observed; this includes differences in the characteristics of the participants (e.g. professional groups, undergraduate vs. other), content of the lectures, comparison groups (i.e. interactive vs. non-interactive comparators), individuals delivering the lectures, methods and time points for outcome assessment, as well as other study design features (e.g. concurrent vs. non-concurrent controls).

Figure 2. Meta-analysis of immediate knowledge-based test scores comparing ARS lectures (experimental) to traditional lectures (control).


Figure 3. Meta-analysis of long-term knowledge-based test scores comparing ARS lectures (experimental) to traditional lectures (control).


We conducted sensitivity analyses using standardised mean differences to account for the variation in total scores used across studies. The patterns were similar to results based on WMDs with the RCTs showing no significant differences and the non-randomised studies showing significant differences of similar magnitude for both immediate and long-term knowledge scores (data not shown; available from authors on request).
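For reference, the standardised mean difference used in these sensitivity analyses divides the raw mean difference by the pooled standard deviation, putting scores reported on different scales (0–100, 0–7, etc.) onto a common, unitless scale. A minimal sketch, assuming a Hedges'-g-style small-sample correction and using fabricated numbers:

    import math

    def smd_hedges_g(m1, sd1, n1, m2, sd2, n2):
        """Standardised mean difference with Hedges' small-sample
        correction: (m1 - m2) / pooled SD, scaled by the factor j."""
        pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                              / (n1 + n2 - 2))
        d = (m1 - m2) / pooled_sd
        j = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)  # Hedges' correction factor
        return j * d

    # A 5-point gap on a 100-point exam with SD 10 is about half an SD:
    print(smd_hedges_g(75, 10, 40, 70, 10, 40))  # ~0.49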

Student self-confidence and learner reaction

One non-concurrent cohort (n = 169; Doucet et al. 2009) involving undergraduate veterinary medicine students compared students’ self-confidence in skills relating to clinical pharmacology after ARS and traditional instruction. The study favoured ARS lectures, with self-confidence in three of six skills categories rated significantly higher by ARS participants. The other three skill categories showed no significant difference in self-confidence between the ARS and traditional lecture cohorts.

Six studies involving 1236 participants compared learner reactions to ARS-enhanced teaching sessions and traditional teaching sessions. Three of the six studies were non-concurrent cohort studies (Slain et al. 2004; Cain et al. 2009; Doucet et al. 2009), whereas the other three were RCTs (Miller et al. 2003; Duggan et al. 2007; Elashvili et al. 2008). One of these studies (a non-concurrent cohort; Slain et al. 2004) examined student reaction in three separate courses (n = 131, n = 141 and n = 131); all three course comparisons favoured the ARS group. In one RCT (n = 127; Duggan et al. 2007), the same class completed evaluations at different times. This study had mixed results in that it favoured an ARS lecture with one teacher and a traditional lecture with another teacher. Two other non-concurrent cohort studies (n = 254 and n = 169; Cain et al. 2009; Doucet et al. 2009) and two RCTs (n = 283 and n = 77; Miller et al. 2003; Elashvili et al. 2008) reported student reactions favouring ARS. Overall, five of the six studies reported favourable learner reaction to ARS, and one study reported mixed results.

Discussion

This systematic review examined the effect of ARS on learning outcomes in health professions education. The results show modest beneficial-to-neutral effects of ARS in terms of increased knowledge and self-confidence, as well as positive learner reactions. These results are reassuring for health professions educators concerned that ARS will negatively impact student achievement.

Twenty-one studies were included in the analysis, and 14 of these reported statistically significant differences in favour of ARS groups over comparators in terms of knowledge scores. Five studies (Schackow et al. 2004; Slain et al. 2004; Pradhan et al. 2005; Elashvili et al. 2008; Rubio et al. 2008) demonstrated an increase of at least 10% in knowledge assessment scores for the ARS group, an additional six studies (Palmer et al. 2005; Cain et al. 2009; Lymn & Mostyn 2009; Grimes et al. 2010; Liu et al. 2010; Moser et al. 2010) reported an increase of at least 5%, and three studies (O’Brien et al. 2006; Berry 2009; Doucet et al. 2009) reported increases of less than 5%. Only one study (Liu et al. 2010) favoured a traditional lecture format over ARS, with a statistically significant difference in scores on an immediate post-lecture quiz. However, this study reported results that favoured ARS lectures in the delayed quiz and in its analysis of knowledge retention. Thus, the effect of ARS on combined test scores was reported as favouring ARS. The authors of this study hypothesised that the findings in favour of the traditional lecture for the early quiz were due to the students’ initial unfamiliarity with ARS technology. Although a number of studies reported no statistically significant difference in scores, no study reported a negative impact on knowledge-based outcome scores.

The results of our meta-analysis provide additional insights into the impact of ARS on knowledge outcomes. While the results were heterogeneous, the pooled results provide an estimate of the potential impact that ARS can have on knowledge scores. The pooled estimate for immediate knowledge showed a difference of approximately 4.5% in test scores. The magnitude of effect may be larger or smaller depending on a number of factors, in particular the intervention against which the ARS is compared. Through our qualitative analysis, we found that studies where ARS was compared against interactive teaching modalities showed less impact on knowledge outcomes than those with a non-interactive comparison. Our meta-analysis also demonstrated that the magnitude of effect and statistical significance are tempered by study design: the pooled results were not significant for RCTs but were significant for the non-randomised studies. This was particularly apparent for the longer term outcomes, where there was no difference among the RCTs but a substantial difference for non-randomised studies, although only one study was included; hence, we cannot make firm conclusions regarding the impact of ARS on longer term knowledge retention.

Our findings suggest that the non-randomised studies may overestimate the benefits of ARS due to methodological limitations inherent in these designs. In particular, our quality assessment highlights that many of the non-randomised studies did not control for potential confounders or baseline imbalances between study groups. Future research should use randomised methods; by controlling for both known and unknown confounders between study groups, randomised studies yield less biased estimates of effect.

One non-concurrent cohort (Doucet et al. 2009) reported the self-confidence of undergraduate veterinary medicine students in clinical pharmacology. The study favoured ARS lectures; however, this single study makes it difficult to generalise these findings to other areas of education.

In terms of learner reaction, five of six studies favoured ARS lectures. As this systematic review included only comparative data, many studies that reported non-comparative student reaction were excluded. Three common themes were noted in the review of the learner reaction data: ARS lectures were of a higher quality, they led to increased interaction and they were more enjoyable. These findings are consistent with published studies describing the use of ARS in other teaching contexts (Roschelle et al. 2004; Fies & Marshall 2006; Caldwell 2007). It should be noted that for nearly all studies, ARS were novel learning tools for the students. As other authors have suggested (Caldwell 2007), some of the positive effects seen may be due to the novelty of the ARS, where ‘special treatment causes the improvement rather than the use of clickers’. However, this effect is difficult to assess as longer term studies have not been reported.

The current review highlights one of the caveats in interpreting this body of evidence, namely that different comparison groups were used across the relevant studies. To explore the possibility of different results depending on the comparison group used, we conducted sub-group analyses examining studies with interactive versus non-interactive comparators. The greatest effects on knowledge scores were seen when ARS was compared to non-interactive lectures; the differences between groups were less pronounced when non-interactive comparators were excluded. These results suggest that the positive effects of ARS on knowledge outcomes may also be produced by other interactive lecture styles or interactive modalities. These findings support previous studies that have hypothesised that increased interaction, rather than the technology itself, may be the mechanism by which ARS positively affect student achievement (Poulis et al. 1998; Caldwell 2007).

Overall, previous reviews of ARS neither examine the use and impact of ARS in health professions education thoroughly nor systematically report its impact on learning outcomes. The use of ARS among clinical trainees and health professionals represents a distinct work-based clinical context and has not previously been reported with similar rigour or in similar detail. For example, this is the first review to include studies of ARS in continuing professional learning. It is also the first review to explore the impact of interactive versus non-interactive comparators. Furthermore, it is the first to pool data in order to quantify the potential magnitude of effect of ARS.

In terms of limitations, inclusion bias was minimised by prospectively establishing the search strategy and by having two authors screen all potential studies, maximising the likelihood that this review includes all relevant studies. However, the review is limited by the methodological quality of the included studies. Most studies were at high risk of bias due to inadequate blinding of participants and/or outcome assessors. In addition, many included trials presented outcome data that were incomplete or not clearly described. Either of these flaws may introduce error when estimating an intervention's effects. Similarly, few cohort studies accounted for differences in learning style or level of education. Randomised trials provide a less biased comparison, as the randomisation process theoretically distributes both known and unknown confounders equally between groups. We found that the magnitude of effect was smaller in randomised trials than in non-randomised studies. Future research should aim to employ randomised methods or account for potential confounders in order to avoid overestimating intervention effects.

Another limitation of this body of evidence is that only one study (Duggan et al. 2007) provided power calculations. Without these calculations, it is not possible to determine whether observations of no difference between the interventions being compared represent actual equivalence or simply insufficient statistical power (i.e. type II errors). We recommend that researchers conduct sample size calculations in future studies to allow more meaningful conclusions to be drawn.
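As an illustration of the kind of calculation we recommend (the numbers below are hypothetical and not drawn from any included study), a normal-approximation estimate of the per-group sample size needed to detect a given difference in mean knowledge scores is:

    import math

    def n_per_group(delta, sd):
        """Approximate per-group sample size to detect a mean score
        difference `delta` with common SD `sd`, for a two-sided
        two-sample comparison at alpha = 0.05 and 80% power."""
        z_alpha, z_beta = 1.96, 0.84  # critical values for 0.05 and 0.80
        return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

    # Detecting a 5-point difference on a 100-point exam with SD 12:
    print(n_per_group(5, 12))  # 91 learners per group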

The review is also limited by weaknesses inherent to the field of investigation, many of which have been previously discussed. For example, Schmidt et al. (1987) outlined the difficulty of controlling for extraneous variables that may affect outcomes, particularly in studies that extend over a period of time. The authors have also detailed the struggle involved in identifying and isolating the relative contributions of different curricular components that may affect outcomes (Schmidt et al. 1987; Schmidt et al. 1996; Tamblyn et al. 2005). In addition, existing outcomes and measurement tools may ineffectively assess important areas of health professionals’ competence (Berkson 1993; Vernon & Blake 1993; Distlehorst et al. 2005). This is particularly relevant to the current review, as the majority of the data reported focused on the lower Kirkpatrick level outcomes of knowledge scores and learner reaction.

Finally, given the heterogeneity of populations, designs, interventions, comparators and outcomes measured, the findings cannot be easily generalised to health professions trainees of all levels or to differing education settings. However, this review is the most comprehensive evaluation of studies pertaining to health professions in the literature and allows findings on ARS to be extended to the postgraduate and continuing professional education realms.

Conclusions

This review provides a comprehensive synthesis of the evidence to guide health professions educators regarding the implementation and use of ARS in this distinctive setting. Although causal relationships cannot be determined from this review, there were a number of interesting and novel findings. ARS did not have a consistent negative impact on student achievement in any setting or against any comparator. However, only a few studies demonstrated large increases in knowledge scores, and these were primarily non-randomised studies that compared ARS to non-interactive teaching strategies. On further examination of the studies, comparisons of interactive teaching sessions to ARS lectures/tutorials revealed smaller differences favouring ARS. A number of studies reported no difference in student achievement. Short-term and long-term knowledge assessment scores were affected similarly. This review also revealed an interesting trend: all three studies examining medical residents reported a large increase in knowledge assessment scores compared to non-interactive lectures. One may hypothesise that in settings such as medical residencies, where sleep deprivation and subsequent difficulties with attention are common and well documented, the ability of ARS to enhance learner interactivity may be even more beneficial, although further study is required.

Many health professions educators feel that the expenditure of money and time is worthwhile only if a new teaching intervention substantially impacts measurable learning outcomes. The results of this review indicate that ARS may produce improved short-term and long-term knowledge outcomes. Although ARS are not the only solution for lecturers who struggle with student engagement and poor learning outcomes, they do provide a convenient way for educators to create an interactive teaching environment. However, education programmes that already consistently use an interactive style of lecturing may not see a significant increase in knowledge scores with the implementation of an ARS. The most telling result in this review is the finding that non-randomised study designs produced more strongly positive results in favour of ARS than the higher quality randomised studies, in which differences in learning outcomes with ARS were small, if present at all. This in itself is a very important result: it reinforces the need for curriculum planners to demand more rigorous studies prior to implementing new teaching strategies, and it reinforces the importance of systematic evaluations of the literature on common curricular interventions in medical education.

Acknowledgements

The authors thank Ben Vandermeer for his assistance in the statistical analysis.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

References

  • Abrahamson L. A brief history of networked classrooms: Effects, cases, pedagogy, and implications. In: Banks DA, editor. Audience response systems in higher education: Applications and cases. Information Science Publishing, Hershey, PA 2006; 25
  • Alexander CJ, Crescini WM, Juskewitch JE, Lachman N, Pawlina W. Assessing the integration of audience response system technology in teaching of anatomical sciences. Anat Sci Educ 2009; 2(4)160–166
  • Banks DA, Bateman S. Audience response systems in education: Supporting a ‘lost in the desert’ learning scenario. International Conference on Computers in Education (ICCE 2004, Acquiring and Constructing Knowledge Through Human-Computer Interaction: Creating New Visions for the Future of Learning), Melbourne, 2004
  • Barbour ME. Electronic voting in dental materials education: The impact on students’ attitudes and exam performance. J Dent Educ 2008; 72(9)1042–1047
  • Berkson L. Problem-based learning: Have the expectations been met? Acad Med 1993; 68(10 Suppl.)79–88
  • Berry J. Technology support in nursing education: Clickers in the classroom. Nurs Educ Perspect 2009; 30(5)295–298
  • Cain J, Black EP, Rohr J. An audience response system strategy to improve student motivation, attention, and feedback. Am J Pharm Educ 2009; 73(2)21
  • Cain J, Robinson E. A primer on audience response systems: Current applications and future considerations. Am J Pharm Educ 2008; 72(4)77
  • Caldwell JE. Clickers in the large classroom: Current research and best-practice tips. CBE Life Sci Educ 2007; 6(1)9–20
  • Distlehorst LH, Dawson E, Robbs RS, Barrows HS. Problem-based learning outcomes: The glass half-full. Acad Med 2005; 80(3)294–299
  • Doucet M, Vrins A, Harvey D. Effect of using an audience response system on learning environment, motivation and long-term retention, during case-discussions in a large group of undergraduate veterinary clinical pharmacology students. Med Teach 2009; 31(12)e570–e579
  • Duggan PM, Palmer E, Devitt P. Electronic voting to encourage interactive lectures: A randomised trial. BMC Med Educ 2007; 7: 25
  • Elashvili A, Denehy GE, Dawson DV, Cunningham MA. Evaluation of an audience response system in a preclinical operative dentistry course. J Dent Educ 2008; 72(11)1296–1303
  • Fies C, Marshall J. Classroom response systems: A review of the literature. J Sci Educ Technol 2006; 15(1)101–109
  • Forsetlund L, Bjorndal A, Rashidian A, Jamtvedt G, O’Brien MA, Wolf F, Davis D, Odgaard-Jensen J, Oxman AD. Continuing education meetings and workshops: Effects on professional practice and health care outcomes. Cochrane Database Syst Rev 2009, 2: CD003030
  • Grimes C, Rogers GJ, Volker D, Ramberg E. Classroom performance system use in an accelerated graduate nursing program. Comput Inform Nurs 2010; 28(2)79–85
  • Halloran L. A comparison of two methods of teaching: Computer managed instruction and keypad questions versus traditional classroom lecture. Comput Nurs 1995; 13(6)285–288
  • Hartling L, Spooner C, Tjosvold L, Oswald A. Problem-based learning in pre-clinical medical education: 22 years of outcome research. Med Teach 2010; 32(1)28–35
  • Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions. Wiley & Sons, Chichester, UK, Cochrane Collaboration 2006
  • Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002; 21(11)1539–1558
  • Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327(7414)557–560
  • Judson E, Sawada D. Learning from past and present: Electronic response systems in college lecture halls. J Comput Math Sci Teach 2002; 21(2)167–181
  • Kay RH, LeSage A. A strategic assessment of audience response systems used in higher education. Australas J Educ Technol 2009; 25(2)235–249
  • Kirkpatrick DL, Kirkpatrick JD. Evaluating training programs: The four levels. 3rd ed. Berrett-Koehler, San Francisco, CA 2006
  • Liu FC, Gettig JP, Fjortoft N. Impact of a student response system on short- and long-term learning in a drug literature evaluation course. Am J Pharm Educ 2010; 74(1)6
  • Lymn JS, Mostyn A. Use of audience response technology to engage non-medical prescribing students in pharmacology. ICERI2009 Proceedings, Madrid, Spain, 16–18 November 2009, pp. 4518–4524
  • MacArthur JR, Jones LL. A review of literature reports of clickers applicable to college chemistry classrooms. Chem Educ Res Pract 2008; 9(3)187–195
  • Miller RG, Ashar BH, Getz KJ. Evaluation of an audience response system for the continuing education of health professionals. J Contin Educ Health Prof 2003; 23(2)109–115
  • Morrison A, Moulton K, Clark M, Polisena J, Fiander M, Mierzwinski-Urban M, Mensinkai S, Clifford T, Hutton B. English-language restriction when conducting systematic review-based meta-analyses: Systematic review of published studies. Canadian Agency for Drugs and Technologies in Health, Ottawa 2009
  • Moser LR, Kalus JS, Brubaker C. 2010. Evaluation of an audience response system on pharmacy student test performance. Unpublished manuscript
  • O’Brien TE, Wang W, Medvedev I, Wile MZ, Nosek TM. Use of a computerized audience response system in medical student teaching: Its effect on exam performance. Med Teach 2006; 28(8)736–738
  • Palmer EJ, Devitt PG. Assessment of higher order cognitive skills in undergraduate education: Modified essay or multiple choice questions? BMC Med Educ 2007; 7:49. http://www.biomedcentral.com/1472-6920/7/49 [Accessed 26 April 2010]
  • Palmer EJ, Devitt PG, De Young NJ, Morris D. Assessment of an electronic voting system within the tutorial setting: A randomised controlled trial. BMC Med Educ 2005; 5(1)24
  • Patterson B, Kilpatrick J, Woebkenberg E. Evidence for teaching practice: The impact of clickers. Nurse Educ Today 2010; 30(7)603–607
  • Plant JD. Incorporating an audience response system into veterinary dermatology lectures: Effect on student knowledge retention and satisfaction. J Vet Med Educ 2007; 34(5)674–677
  • Poulis J, Massen C, Robens E, Gilbert M. Physics lecturing with audience paced feedback. Am J Phys 1998; 66(5)439–441
  • Pradhan A, Sparano D, Ananth CV. The influence of an audience response system on knowledge retention: An application to resident education. Am J Obstet Gynecol 2005; 193(5)1827–1830
  • Roschelle J, Penuel WR, Abrahamson L. The networked classroom. Educ Leadersh 2004; 61(5)50–54
  • Rubio EI, Bassignani MJ, White MA, Brant WE. Effect of an audience response system on resident learning and retention of lecture material. AJR. Am J Roentgenol 2008; 190(6)W319–W322
  • Schackow TE, Chavez M, Loya L, Friedman M. Audience response system: Effect on learning in family medicine residents. Fam Med 2004; 36(7)496–504
  • Schmidt HG, Dauphinee WD, Patel VL. Comparing the effects of problem-based and conventional curricula in an international sample. J Med Educ 1987; 62(4)305–315
  • Schmidt HG, Machiels-Bongaerts M, Hermans H, Ten Cate TJ, Venekamp R, Boshuizen HPA. The development of diagnostic competence: Comparison of a problem-based, an integrated, and a conventional medical curriculum. Acad Med 1996; 71(6)658–664
  • Simpson V, Oliver M. Electronic voting systems for lectures then and now: A comparison of research and practice. Australas J Educ Technol 2007; 23(2)187–208
  • Slain D, Abate M, Hodges BM, Stamatakis MK, Wolak S. An interactive response system to promote active learning in the doctor of pharmacy curriculum. Am J Pharm Educ 2004; 68(5)1–9
  • Stein PS, Challman SD, Brueckner JK. Using audience response technology for pretest reviews in an undergraduate nursing course. J Nurs Educ 2006; 45(11)469–473
  • Tamblyn R, Abrahamowicz M, Dauphinee D, Girard N, Bartlett G, Grand’Maison P. Effect of a community oriented problem based learning curriculum on quality of primary care delivered by graduates: Historical cohort comparison study. BMJ 2005; 331(7523)1002
  • Vernon DT, Blake RL. Does problem-based learning work? A meta-analysis of evaluative research. Acad Med 1993; 68(7)550–563
  • Wells GA, Shea B, O'Connell D, Peterson J, Welch V, Losos M, Tugwell P. 2011. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Ottawa Hospital Research Institute, Ottawa. http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp [Accessed 26 April 2010]
