Replication studies: an essay in praise of ground-up conceptual replications in the science of learning

ABSTRACT

This paper discusses adapting Churches’ approach to large-scale teacher/researcher conceptual replications of major “science of learning” findings, with the aims of increasing teachers’ engagement with empirical research and building research networks for gathering data on the science of learning. The project reported here demonstrated the feasibility of teacher-led randomised controlled trials for conceptually replicating the effects on learning identified by cognitive science, as specified by researchers. It also indicated high levels of interest among teachers in applying more of the science of learning in their practice. The approach gave teachers the freedom to design interventions, choose research methods, and measure outcomes, even though such freedom is in tension with scientific research that relies on constraining sources of variation. This paper discusses how a balance can be struck between the objectives of teachers and researchers engaged in replicating cognitive science findings, and how teacher engagement in conceptual replication research can be promoted.

Introduction

This paper aims to draw attention to a method that might be employed by schools to expand the use of the science of learning in shaping classroom practice. It reports a project in which replication studies were “owned” by the teachers, singly and in groups, working alongside researchers. The paper discusses a model that could be used to support teachers in carrying out conceptual replications, and it aims to support the view that teachers can conduct high-quality empirical research within their normal roles. The paper reports an example of this from the work of Churches et al. (2020) (henceforth “the Churches model”), under which teachers, en masse, demonstrated both the inclination and the requisite capacities to conduct randomised controlled trials in a conceptual replication. It looks at the issue from the point of view of efforts to expand the use of the science of learning in shaping education practice.

This paper is intentionally unlike a “standard” academic paper; rather, it is an essay – a deliberate attempt to provide a narrative that is purposely close to the world of teachers, as these are the key participants in the study reported here. In making a case for teachers as conceptual replication researchers, the paper’s style and organisation differ from those of the “normal” academic paper; it deliberately catches and respects the voice of the author, just as respecting the voices of the teachers involved is a key feature of the study reported here; respect for diversity and inclusion is essential in recognising the contribution of teachers to the world of research. The paper is a novel contribution to the teacher-as-researcher movement that has been alive and well for decades.

The case is made that teachers are well positioned to generate and conduct conceptual replications. Added to this is the author’s observation that involving teachers as conceptual replication researchers is not only faithful to key features of conceptual replications (for example, repeating original studies in different contexts and with different study features, such as sampling) but also, by involving networks and groups of teachers, broadens the generalisability of the original study more widely than might be possible in “standard” replications. Moreover, the involvement of teachers in such conceptual replications acts as a form of professional development and respect for the professionalism of teachers, and fuels the thirst that some teachers have for research in education to be of practical use in their classrooms.

This paper takes calls for inclusive education to extend to the involvement and engagement of teachers in the research enterprise. To accomplish this, the paper deliberately provides a personal, idiosyncratic narrative in setting out the features of the Churches model and in commenting on its contribution to conceptual replications which are “ground-up” rather than the top-down approach characteristic of many research studies.

The paper proceeds in several stages. First, it suggests why closer practitioner involvement in carrying out research may be necessary for the development of schools’ use of research. It then outlines previous and current arrangements for encouraging schools to participate in designing and implementing research. It contrasts the current capacity to generate studies through these routes, with the potential of the Churches model to harness teachers’ growing professional interest in research. It describes the Churches model and the challenges that such an approach poses for the purposes of gathering informative data. It also outlines methodological issues that teachers often seem to overlook in their early attempts at empirical research. It then offers an overview of the potential of the Churches model to generate useful research data as a form of conceptual replication. Finally, it proposes how the Churches model could be adapted in scaling up conceptual replication, retaining aspects that are likely to be important to teachers’ interest in participating, whilst providing sufficient controls to offer useful insights into effectiveness, that is, balancing the individual and group needs of teachers with the need for rigour in research.

Increasing schools’ use of the science of learning

If we want research to impact positively on classrooms, then the involvement of teachers in planning and conducting the research in their own ways, and on topics that are of immediate and practical concern to them, is important. To date, too few of the effects known to improve learning under laboratory conditions have been reproduced in and across school settings. Consequently, schools might have good grounds to be sceptical about claims that techniques will improve learning in their context, as there is often little evidence of them working in conditions similar to their own. Thus, for schools to make greater use of the science of learning, the argument here is that there is a need to increase schools’ capacity to collaborate in research, and the paper reports an instance of how this can occur – the Churches model. However, the quantity of research work that schools are willing to carry out might be limited by a natural tension between the scientific method and schooling, as both require control over the experiential conditions of students (Dawson et al., 2018; Plummer et al., 2014), and these requirements might not coincide. School priorities take precedence over those of research, as schools are responsible for the immediate concerns of maintaining safety and behaviour and administering a daily-changing programme of activities to a whole school (Dawson et al., 2018; Plummer et al., 2014). However, some of these challenges could be overcome if teachers were to take a leading role in planning how to carry out research in schools, as they are ideally placed to fit the conditions needed for research around the conditions operating in their schools.

A recent systematic review of teaching techniques informed by cognitive science identified large gaps in the literature concerning their effectiveness for different age groups, contexts, and taught subjects (Perry et al., 2021). We might regard it as unnecessary to reproduce trials of all permutations of age, subject, and context, because a clear generalisable pattern should emerge from several studies of the most common permutations, for example, retrieval practice in late primary humanities. However, even completing several studies in each permutation that practitioners would recognise as relevant to them would still require a large number of new studies to be carried out. Achieving acceptable coverage of ages, subjects, and contexts requires a number of studies that seems unlikely to be generated through traditional routes and current research capacity.

The need for collaborations in research

There are examples of UK schools successfully leading collaborations with researchers, but not many. For example, in 2013, three separate groups of schools coincidentally applied to the UK’s Education Endowment Foundation (EEF) to fund evaluations of the same literacy programme, “Fresh Start” (Gorard et al., 2016). The EEF felt that the sample sizes of the individual school groups were too small to be informative and suggested that the three proposals merge into one in order to gain higher statistical power. The project was unusual, as the schools took a leading role in designing and carrying out the research. Schools agreed to work with researchers to guide the process of randomised assignment to conditions, the design of the counterfactual condition (the control groups delayed receiving the intervention for one term), and the choice of a standardised pre- and post-test. The project produced informative results, as the intervention and evaluation were carried out thoroughly. Only one of the 10 schools appeared not to have been randomised as advised; no school dropped out, and student attrition was low (3%). It seems likely that the participants in the schools were well motivated, due to their leading role in selecting the intervention and designing the study, and perhaps because they experienced a sense of ownership and were encouraged to identify with the benefits of thorough implementation.

Whilst this is an example of schools taking a leading role in collaboration with researchers, it does not appear to be part of a major trend or to have influenced other schools to attempt similar projects: to the knowledge of this author, it is the sole instance, occurring 8 years before the writing of the present article, with no similar examples since. It seems possible that this collaboration occurred due to unique circumstances that were especially conducive to such work, that is, the unusual coincidence of three school clusters proposing the same evaluation at the same time, attracting funding that is usually withheld due to the challenges of getting large numbers of schools to coordinate.

The success of the collaborative project may also be related to the topic of the research, a phonics intervention, a subject especially positioned within a tradition of thorough evaluation (e.g., the discussion of synthetic phonics). The example presented above illustrates a dilemma facing education research: More collaborations would be possible if more schools volunteered, and they would volunteer where a tradition of thorough evaluation has become established; but, in order to establish such a tradition, the involvement of schools is essential, including the need for participants in the schools to acquire research skills, experience, and confidence. Waiting for circumstances to arise in which schools propose rigorous research designs before supporting collaborations has resulted in comparatively few studies. Consequently, this route seems unlikely to drive the increase in research capacity needed for extensive school-led applied research, including conceptual replication studies producing an evidence base across relevant subject areas and contexts (Perry et al., 2021).

There is an example of schools working together to replicate an intervention without the input of researchers. Ten primary schools in the Anglican Schools Partnership agreed to meet regularly for 1 year to develop higher quality feedback in English, based on their interpretation of John Hattie’s effective feedback model (Gorard et al., 2014; Hattie & Timperley, 2007). After a year, no differences were found in teacher-assessed grades between the 10 participating schools and five control schools. It was concluded that the intervention schools were unable to specify an operational definition of the intervention in sufficient detail to ensure that all schools applied it consistently. This example illustrates the value of collaboration between schools and researchers and the important role researchers may play in constraining sources of variation.

The Churches model

The conceptual replication discussed in this paper followed the Churches model: a model of research which involved teachers, groups of teachers, groups of schools, and researchers collaborating in replication research, and which placed very great emphasis on teacher involvement from the inception of the project onwards, including the focus, planning, implementation, data analysis, and reporting of the conceptual replication research. In turn, this required training teachers in research methods. In the example below, the focus was on science of learning findings and their applications and effects in classrooms, that is, a project that would provide useful and relevant practical advice for teachers. Here, science of learning topics included attention, working memory, retrieval, and distributed practice/interleaving, and these were the focus of the conceptual replications in many classrooms. The conceptual replication comprised the aggregation of results from multiple small/micro replication studies involving many teachers in many schools.

Richard Churches, from whom the Churches model derives its name, provided training in research methods, in person, to teachers in 50 schools that had engaged with a UK government programme to increase schools’ use of empirical research. The Churches model describes the training as equivalent to an A level in psychology research methods. The training requires a high level of specialist expertise on the part of the trainer to be effective; in this case, the trainer drew upon extensive knowledge and experience of working with teachers on school improvement as a consultant and school quality inspector. Trainees received the book Teacher-Led Research: Designing and Implementing Randomised Controlled Trials and Other Forms of Experimental Research, written for the project (Churches & Dommett, 2016). Teachers attended conferences on “neuroscience in education” provided by the Wellcome Trust (a health and medical research organisation), at which they received seminars on “learning science” based on Churches and colleagues’ recommendations in their book Neuroscience for Teachers: Applying Research Evidence From Brain Science, commissioned for the project (Churches et al., 2017). The conferences and books presented several well-established learning science effects on attention, working memory, retrieval, and distributed practice/interleaving (Churches & Dommett, 2016; Churches et al., 2017; Churches & McAleavy, 2016).

The teachers then worked with a panel of two learning scientists and one expert in education to develop research proposals to test these learning science effects in their schools. Teachers decided how they would apply the cognitive effects in their teaching, selected different forms of randomised controlled trial to suit their school setting, and designed the tests to measure the impact of these effects on learning. They worked with the panel to agree a written form of the research method that the project termed a “research protocol”. Over a period of months, the teachers applied their protocols in their schools and returned results to the project. The project generated a large number of studies: 15 experiments, some run with multiple age groups, yielding 34 studies overall on a large sample of 2,157 students. Phase 2 of the Wellcome Trust project yielded 75 studies, fewer than had been expected, due to the pandemic at the time.

An output of the project was conference posters concisely outlining the research objectives, procedure, design, results, and interpretation on one page in a standardised format. These were intended, and used, to describe and promote the work at conferences but were not designed to provide sufficient information for others to replicate the studies. Teachers most commonly chose to study retrieval practice, giving students several short tests to increase their experience of drawing learned items from memory, and interleaving, where a learning topic is repeatedly taught in short sessions spread over an extended time period interspersed with other learning. They most often tested the effects of these techniques on spelling, times tables, and vocabulary.

Research schools and the Education Endowment Foundation

The involvement of schools in research, through collaborations between teachers, researchers, and schools, might increase with the inception of “research schools” in the UK. At the time of writing, a substantial improvement in UK schools’ use of empirical research might develop from the establishment of funded “research schools” by the EEF. These 27 schools are intended to act as centres of expertise and regional hubs, working with the EEF to learn how best to teach their own teachers to convert research into practice and to disseminate this to the schools around them. They received substantial additional funding of £140,000 over 3 years. These schools were selected in a competitive process in which many more applied than were selected, further indicating schools’ high level of interest in enhancing research expertise. This approach seeks to provide a thorough grounding in interpreting and applying research among these schools and to establish long-lasting institutional expertise sustained across staff changes. Dissemination also seems likely to be effective in communicating with other schools, as research findings and outcomes are produced by practitioners speaking to their peers from a shared perspective.

In the present context of conceptual replications, this all implies that research schools are well placed to gain educational value from replication studies. However, currently, as the project does not encourage research schools to lead research or collaborate with researchers to carry out studies, it might not directly increase the capacity of schools to conduct or participate in research needed to fill the large explanatory gaps discussed above. On the other hand, the proactive stance of these schools towards empirical research apparently encouraged five of them to participate in the Churches model. The focus on interpreting and applying research within these schools seems to lead indirectly to an increased disposition to participate in and lead research.

Further, with regard to research schools, one purpose of replication research is to explore the generalisability of established findings by testing the parameters within which effects apply. Arguably, one domain of parameters is the effect of the practical conditions under which interventions have been implemented and applied. This suggests a distinct line of enquiry focused on the practical circumstances under which interventions work best. Another domain of parameters is the populations, sample groups, and contexts in which interventions work best. Research schools seem already well placed to offer views on the generalisability of implementation and application, as this is part of the intended scope of the extra resourcing. There seems to be a complementary overlap between the Churches model of replications and research schools, and this could yield substantial progress.

The EEF also offers other opportunities for UK schools and teachers to engage with researchers, encouraging practitioners to send them research proposals about how effectively schools are organised for learning, in its programme entitled “School Choices”. According to the EEF, popular research questions are given higher funding priority (Edovald & Nevill, 2021). The programme has so far produced two projects: one on the effect on learning of grouping students by prior attainment (Hodgen & Taylor, 2019) and one on the effect on learning of being taught by newly qualified teachers trained through different routes into the profession (Rutt, 2021). Here, schools can have some influence on which research topics are investigated, to the extent that they propose research popular with other schools.

The EEF also aims to reflect teachers’ interests in techniques that improve learning in everyday teaching through its programme “Teacher Choices”, which funds research projects on practical classroom practice (Styles, 2020). This programme currently comprises two projects: one on the effect of retrieval practice on test performance among 12-year-olds and another on the effect of interleaving on reading comprehension among 8- to 10-year-olds (Styles, 2020).

The EEF also provides guidance to schools, intended to draw them towards using more scientific methods to evaluate their policies as part of good managerial practice (Coe & Kime, 2013). The EEF’s DIY Evaluation Guide (Coe & Kime, 2013) is an online text written for schools; it sets out steps such as defining the research question in measurable terms, specifying the measurement instruments, defining counterfactual comparison groups, and recording pre-test and post-test scores, and it provides an Excel tool to carry out inferential statistical analysis. The guide offers a potentially useful resource, describing some key concepts in the scientific method, written appropriately for school audiences.
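The kind of analysis such guidance describes can be illustrated in a few lines. The following minimal sketch, which is not the guide’s Excel tool, compares gain scores (post-test minus pre-test) between an intervention class and a comparison class; all scores are simulated for illustration.

```python
# A sketch of the pre-/post-test comparison the DIY Evaluation Guide describes
# (not the guide's Excel tool); all scores below are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre_int, post_int = rng.normal(50, 10, 30), rng.normal(56, 10, 30)   # intervention class
pre_ctl, post_ctl = rng.normal(50, 10, 30), rng.normal(52, 10, 30)   # comparison class

gain_int = post_int - pre_int   # gain scores for the intervention class
gain_ctl = post_ctl - pre_ctl   # gain scores for the comparison class
t, p = stats.ttest_ind(gain_int, gain_ctl)
print(f"mean gain (intervention) = {gain_int.mean():.1f}, "
      f"mean gain (comparison) = {gain_ctl.mean():.1f}, t = {t:.2f}, p = {p:.3f}")
```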

However, its impact seems likely to be limited, as it is passive and receptive in nature, consisting only of text online with no other activities to encourage schools to participate. The impetus to apply the methodology must arise from the schools and be maintained by them throughout, as there is no facility to interact with the project, communicate with researchers, or receive input or feedback on their evaluations. There is also no facility for participating schools to share what they have found with others – an important part of scientific research. The project is promoted only through the guide’s prominent position on the organisation’s website. There are no available indications of the extent to which schools use the tool, such as testimonial examples or reports of outcomes; consequently, it is not possible to assess its uptake by schools. The provision of a guide alone is unlikely to induce schools to use research, and to be involved in it, more than they are currently inclined to do.

EEF projects intended to improve schools’ use of empirical research might help schools to obtain more from the existing literature than they presently do. However, they provide little proactive encouragement or opportunity for schools to collaborate with researchers and engage in research themselves, though research schools’ immersion in interpreting and applying research might indirectly make them more inclined to carry out research, as evidenced by their participation in the Churches model.

On the one hand, this EEF approach to raising schools’ research capacity has the potential to restrict research to more rigorous designs, for example, those with large samples, and to limit those who carry it out to people with established expertise in the field. This will uphold standards and act as a bulwark against low-quality research. On the other hand, this approach might obstruct those without experience from acquiring it, missing the opportunity to harness the energy and motivation that arise from the ownership of being closely involved in the design and implementation of studies. It might also restrict the capacity to carry out research to the current cadre of sufficiently skilled researchers (Connolly et al., 2018; Fitz-Gibbon, 1986; Gorard et al., 2017; Perry et al., 2021).

The Churches model provides an example of a project that successfully brings teachers into the process of carrying out research in schools. It is a recent manifestation of long-standing calls by Fitz-Gibbon (1986) and Connolly et al. (2018) for schools to carry out micro-randomised controlled trials (RCTs) that are analysed in aggregate across multiple schools in meta-analysis. There are several indications that willingness to participate in RCTs is now present among schools (Churches et al., 2020; Dawson et al., 2018; Styles & Torgerson, 2018). The Churches model involves a larger number of school-led RCTs than previous efforts, by creating a structure that supports teachers in working in partnership with researchers to conceptually reproduce science of learning findings themselves. The effectiveness of the Churches model provides indications of how similar programmes, on a larger scale, could substantially increase research engagement among teachers and increase the capacity of schools to carry out informative research, including conceptual replication studies that examine intervention effectiveness across contexts and in relation to factors such as subject area, pupil age, and implementation approach.

What we have, then, in the preceding discussion is the recognition that teachers can and should be involved in rigorous classroom-based conceptual replication research, and that collaboration should occur not only between researchers and teachers but between teachers in different schools who join together in conducting randomised controlled trials on common topics of concern and interest (in the example above, the science of learning and how it applies in classrooms); that is, the conceptual replication is undertaken through the aggregation of results from many teachers in many schools. Developing research capacity in teachers involves training and ongoing support. Further, the conduct of conceptual replication research in classrooms can extend generalisability considerably, though care has to be taken to ensure not only that sufficient rigour is exercised in the research, but that undue variation – deriving from different contextual situations, different interpretations of the RCT’s “concept” in the conceptual replication, and the fact that schools’ operations and organisational features are paramount – is avoided or balanced with the need for standardisation of key features of the conceptual replication. The intimate involvement of teachers in conceptual replication studies presents challenges; the discussion below turns to these.

Coordination across studies

If the goal of projects like the Churches model is to gather sufficient micro-studies so that their aggregation forms coherent conceptual replication datasets on specific issues, then coordination between the small studies is important. Ideally, teachers would agree on how many of them will investigate each issue, including which approach to use with which age groups and subjects, to optimise the coverage of the project as a collective endeavour. In the Churches model, teachers had the freedom to select their own topics and approaches; this was to maximise uptake, and there were no arrangements for individual teachers to coordinate their approaches with other teachers. The central panel steered methodological approaches, corralling them towards more robust techniques; however, the topic, samples, intervention, tests, and counterfactual design were not required by the project to be coordinated with the work of other teachers. This laissez-faire approach aimed to maximise uptake, in a context in which there were good grounds to doubt whether any teachers could be encouraged to attempt trials.

Teachers’ first attempts at trials research seem likely to be motivated from the individual perspective of their own professional development, or to answer specific hypotheses for themselves, rather than as a collective endeavour. It seems likely that the motivation to coordinate with others and become part of a wider group investigating the same issue arises later in the developing mind-set of becoming a researcher than the incentives for the individual to try research for the first time.

Common methodological oversights

Several of the studies appear to share similar methodological limitations, suggesting that aspects of research design that teachers often find challenging might require additional support in the development of research proposals. These are introduced below.

Dosage

A conceptual replication which involves several schools and teachers requires a consistent “dosage” of the intervention across the participating schools and teachers. Many studies, however, appear to have delivered interventions in doses that were small relative to the outcome they were attempting to influence. For example, Makarova (2018) compared scores in a 1-hr national examination unit in science among 60 fifteen-year-olds, half of whom received one 35-min retrieval practice test, similar to the final test, in the lesson prior to the examination. The study found no effect on performance, possibly due to the relatively small effect of a single retrieval practice session relative to the many factors influencing performance in large and complex national exams.

Similarly, Baker and Hindley (2018) compared the times table recall of 200 eight-year-olds, of whom half received a short multiple-choice test once a week for 3 weeks, whilst the other half (the control group) copied out the same 10 multiplications and solved them collaboratively with a peer, once a week for 3 weeks. They found no benefit, possibly because the control condition itself involved retrieval practice and collaborative learning, likely to be at least as effective as the intervention.

Other studies investigated the efficacy of computer software designed to improve recall of times tables by providing novel mnemonic visual and auditory presentations of multiplication facts. The teachers Dunford and Rhoades (2018) provided a relatively high dose of 10 min every day for 2 weeks, whereas Pemberton’s (2018) intervention was half the dose, delivered five times in 2 weeks. The larger dose showed a statistically significant and large effect size (Cohen’s d = 0.8), whereas the smaller intervention did not.
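For readers unfamiliar with the effect size quoted above, the following is a minimal sketch of how Cohen’s d is computed from intervention and control group scores using the pooled standard deviation; the scores are invented for illustration.

```python
# Cohen's d: standardised mean difference using the pooled standard deviation.
# The scores below are invented and purely illustrative.
import numpy as np

def cohens_d(treatment, control):
    t, c = np.asarray(treatment, float), np.asarray(control, float)
    nt, nc = len(t), len(c)
    pooled_sd = np.sqrt(((nt - 1) * t.var(ddof=1) + (nc - 1) * c.var(ddof=1)) / (nt + nc - 2))
    return (t.mean() - c.mean()) / pooled_sd

# Illustrative scores out of 20 on a times-tables test
print(round(cohens_d([15, 17, 14, 18, 16, 19], [13, 14, 12, 15, 13, 14]), 2))
```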

Sufficient detail to replicate

For a conceptual replication to work effectively, sufficient detail should be provided for the participants to be able to conduct the replication with fidelity to the proposed research. In the Churches model, several study posters omitted details that were important for gaining a clear sight of the mechanisms at work or for enabling other educators to repeat the techniques themselves. For example, the teacher Morris (2018) compared 9-year-olds’ recall of times tables after half had received repeated testing over 2 weeks, finding a large, statistically significant effect. However, the study provides no details of the tests, how often they were given, their length, or an indication of the cognitive activity required to complete them. Consequently, the reader has no sight of the type of testing, or the quantity of it, likely to yield similar effects. Similarly, Siddle (2018) compared the vocabulary of 160 five- to 9-year-olds after half were tested on their recall three times in a month, with a view to producing a testing effect. No details are provided of the tests, such as their format, number of items, or duration. Consequently, readers are unable to ascertain the form or scale of vocabulary testing responsible for producing effects.

Trade-off between complexity and statistical power

The advantage of conducting a conceptual replication across many schools and teachers is that, when the results are combined, the drawback of small samples is purported to be overcome. Many teacher-led studies employ relatively small sample sizes, the largest in the low hundreds, close to the minimum necessary to measure learning effects reliably among varied students over short periods of time. However, many choose to dilute this already low statistical power by investigating more complex questions with more than one level of intervention. For example, Morris (2018) sampled four classes and chose a design with four different conditions, assigning one class to each, making it likely that differences in outcomes were due to differences between the classes. Similarly, Siddle (2018) divided a sample of 320 pupils into three groups to compare the effect of distributing retrieval practice over different timeframes. Here the study spread its statistical power across three conditions in order to measure the best dispersion of practice sessions, at the expense of measuring the central issue of whether retrieval practice benefits vocabulary learning.
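The dilution can be quantified: splitting a fixed sample across more arms shrinks each pairwise comparison. The sketch below uses statsmodels’ power solver; the total sample of 320 echoes the example above, while the assumed effect size (d = 0.3) and significance level are illustrative assumptions.

```python
# Power of a pairwise comparison when a fixed sample is split across 2, 3, or 4 arms.
# The effect size (d = 0.3) and alpha are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
total_n = 320
for n_arms in (2, 3, 4):
    per_arm = total_n / n_arms
    achieved = solver.solve_power(effect_size=0.3, nobs1=per_arm, ratio=1.0, alpha=0.05)
    print(f"{n_arms} arms, {per_arm:.0f} pupils per arm: power = {achieved:.2f}")
```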

Overall, however, the research designs developed in the Churches model were robust and provide a clear sight of the effectiveness of techniques. Most controlled for unobserved variation between participants by allocating them to conditions at random. Experimental control was augmented by statistical control, where differences between the conditions were adjusted in the analysis, by entering pre-test scores as a covariate.
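The combination of random allocation and pre-test adjustment described above amounts to an analysis of covariance (ANCOVA). The following minimal sketch illustrates the idea with simulated data and illustrative variable names; it is not the project’s actual analysis code.

```python
# ANCOVA-style adjustment: post-test scores modelled on condition, with pre-test
# scores entered as a covariate. All data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 60  # pupils per condition (illustrative)
pre = rng.normal(50, 10, 2 * n)
condition = np.repeat(["control", "intervention"], n)
# Simulated post-test: depends on pre-test plus a small intervention effect
post = 5 + 0.9 * pre + (condition == "intervention") * 3 + rng.normal(0, 8, 2 * n)
df = pd.DataFrame({"condition": condition, "pre_test": pre, "post_test": post})

ancova = smf.ols("post_test ~ C(condition) + pre_test", data=df).fit()
print(ancova.summary())
```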

Analysis of multiple teacher-led studies

In conceptual replications that run across many teachers and schools, the rigorous and fair analysis of data can be challenging. Analysing and interpreting studies such as those in the Churches model presents a number of challenges, as the aggregated dataset might include variation in many of the parameters that experiments are normally designed to constrain. As each teacher develops their own intervention, there is variation between studies in what is being measured with regard to its effect on learning. Consequently, when analyses are combined across the studies, it might not be clear what is affecting learning. Similarly, as teachers develop their own instruments to measure outcomes, there is likely to be variation in what is being affected by the interventions. Consequently, analysing all the studies as if they were one experiment yields a lower resolution of information than might appear at first sight. The Churches model highlighted this issue and approached analysis by using random effects models to allow for unknown variability in what was delivered in the interventions and in what was represented by the outcomes. This variation was also partially compensated for by the statistical power of the large combined sample across all the studies. The analysis found that the teacher trials, across the topics and approaches, had an overall benefit to test scores with an effect size of Cohen’s d = 0.3. Thus, teachers were able to improve performance on teacher-created tests, using techniques informed by their study of effects well known to the learning sciences.
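To make the aggregation concrete, the sketch below implements one standard random-effects pooling procedure (DerSimonian-Laird). It is illustrative only: the Churches analysis may have used different software, and the study effect sizes and variances here are invented.

```python
# Random-effects (DerSimonian-Laird) pooling of effect sizes from many small trials.
import numpy as np

def random_effects_pool(d, var):
    """Pool study effect sizes d with their sampling variances var."""
    d, var = np.asarray(d, float), np.asarray(var, float)
    w = 1.0 / var                                   # fixed-effect (inverse-variance) weights
    d_fixed = np.sum(w * d) / np.sum(w)             # fixed-effect pooled estimate
    q = np.sum(w * (d - d_fixed) ** 2)              # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(d) - 1)) / c)         # between-study variance estimate
    w_star = 1.0 / (var + tau2)                     # random-effects weights
    pooled = np.sum(w_star * d) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se

# Invented effect sizes (Cohen's d) and sampling variances from five micro-studies
d = [0.45, 0.10, 0.32, 0.60, 0.25]
var = [0.04, 0.03, 0.05, 0.06, 0.02]
pooled_d, pooled_se = random_effects_pool(d, var)
print(f"pooled d = {pooled_d:.2f} (SE {pooled_se:.2f})")
```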

Implications of the Churches model for greater collaboration between teachers and researchers

The Churches model succeeded in encouraging a large number of teachers to collaborate with researchers to produce RCTs of science of learning effects in schools. This demonstrates greater engagement between teachers and researchers than is common. However, this does not seem to be the only finding with wide-reaching implications for greater collaboration between learning science and schooling.

The original objectives of the project were limited to providing training to teachers to develop their understanding of trials (Churches, 2016; Churches & Dommett, 2016). The organisers were surprised to find that, following training, the teachers began to act on their own initiative, running RCTs on the cognitive effects in their schools and returning the results to the project without instructions to do so (Churches, 2016; Churches & Dommett, 2016; Churches et al., 2020). In response, the organisers expanded the scope of the project from simply providing training to supporting and encouraging teachers’ disposition to develop their own RCTs of cognitive science effects. The fact that teachers responded to training by independently running their own controlled studies suggests a high level of interest among teachers in being more involved in learning science research. Indeed, as the original project did not encourage or even suggest that teachers carry out their own RCTs, it seems unlikely that the provisions of the project itself were solely or largely responsible for driving this response. Rather, it seems that the project serendipitously tapped into a strong underlying propensity among the teachers to be more involved in learning about learning and its related research. Other indications of this propensity are evident in teacher surveys; for example, Simmonds (2014) found that most teachers (80%) reported a willingness to collaborate with researchers whose projects concerned learning.

However, it is important to consider that the response of participants in the Churches model may represent inclinations present only among the teachers selected to attend on the basis of special knowledge and experience. Whilst it seems likely that the participants’ familiarity with the concepts played a role in enabling them to carry out RCTs on their own, the drive to do so appears to have arisen from the teachers themselves, as the project originally provided no direct impetus encouraging it. The attendees chose to use the research within their teaching, rather than stopping their teaching to do the research, suggesting that the motivation arises from a desire to become more effective teachers rather than to become researchers. This desire for professional development might extend beyond the participants in the project. This paper suggests that the high level of motivation demonstrated in the Churches model reflects a more general inclination among teachers to be more involved in learning about learning and its related research, and indeed in research more widely, for the purpose of becoming better educators, and that teachers would therefore participate if given the opportunity for similar projects. Whether teachers would be willing to engage in similar forms of micro research could be studied by repeating the Churches model with different areas of focus and, thereby, with different conceptual replications.

Replicating the Churches model of replications

This paper proposes that it is useful to repeat the Churches model in conducting conceptual replications, retaining its original processes and structures, as it provides an approach that met with much greater success than was expected. The Churches model demonstrates the circumstances under which scores of teachers were induced to conduct trial replications; testing the model more widely would require larger quantities of data and different “concepts” to be tested in conceptual replications (i.e., not only the science of learning). The question then emerges of whether the Churches model can be scaled up or extended more widely to provide such data. How could the Churches model be replicated to reach a larger number of teachers, and adapted to contribute evidence that is likely to inform improvements directly in the effectiveness of teaching?

The success of the Churches model might be largely due, for example, to the freedom that teachers had to create interventions, choose research designs to fit their teaching arrangements, and create tests to measure impact. The freedom to design the interventions and tests allowed, indeed enabled, teachers to incorporate experiments into their current teaching programme and practices, providing relevance to professional concerns and interests. This enables teachers to exercise their professional judgement, encouraging them to identify with research as part of this.

Further, the freedom to adapt research designs so that counterfactual comparisons fit around tightly managed school structures might be crucial to teachers’ participation, giving them the opportunity to arrange experimental conditions within often inflexible school arrangements. It allows and enables “research-minded” teachers to take advantage of circumstances in which natural experimental conditions might exist or can easily be created with students whom they have the authority to assign to groups. Consequently, a replication of the Churches model should retain this feature of the model: the same broad freedoms to design interventions, tests, and research designs, for the purpose of increasing teachers’ participation.

Whilst these freedoms might increase teachers’ willingness to engage in empirical research, they introduce variation into many of the parameters that experiments are normally designed to limit for the purpose of collecting informative data. This presents a significant challenge to expectations that teacher-led trials could provide data that isolate the effect of interventions from other influences and provide a clear sight of intervention effects. Teacher-designed interventions are based on individual interpretations of the mechanisms driving effects; these interpretations differ between teachers, making it unlikely that teacher-led studies reported to be studying the same effect are, in fact, investigating the same effect. To gain a clear sight of the effect of an intervention, a replication of the Churches model would involve asking the teachers in the project to apply a strictly defined intervention.

Teachers in the Churches model employed a variety of research designs. For example, some compared students before and after interventions, whereas others compared those receiving interventions with different individuals receiving “normal” conditions. Combining studies using within-subject designs with those using between-subjects designs risks confusing variation due to impact with variation due to “normal” differences between groups. As freedom to fit experiments into school arrangements seems likely to determine much participation, a replication of the Churches model might still encourage teachers to design their own arrangements for counterfactual controls. However, a replication would attempt to adjust in the analysis for distortions in the data due to different research designs, by carrying out separate meta-analyses for groups of studies using similar designs, an approach to meta-analysis proposed by S. Higgins (personal communication, July 11, 2021).

Studies reported in this paper also used a variety of sampling methods to assign students to conditions. Some paired each individual with one other student with similar scores, using criteria set by the teachers. The comparability of these pairs is likely to vary widely between studies, as teachers use tests that vary in how well they hold important background variables constant. Teachers also differ in the criteria they use to determine that two students are matched; some apply a much stricter interpretation than others. In the Churches model reported here, methods of randomisation also varied between teachers. Some used a criterion to define a pool of students who would be randomly assigned. Others created pools of students with different levels of attainment and randomised students within each stratum. Thus, there was some selection of those to be randomised, indicating that studies varied in how representative they were of real-world school populations. A replication of the Churches model would provide enhanced training on sampling methods, with clear explanations of the advantages and disadvantages of different sampling techniques and a ranking of the security of each approach. It would also require participants to provide a “write-up” of experiments in sufficient detail to allow fair replication to be conducted. A replication would attempt to adjust for variations in sampling and for differences in research design, for example, by analysing studies in groups that used a comparable sampling method.
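As an illustration of the stratified randomisation described above (pupils grouped by attainment and then assigned at random within each stratum), the following is a minimal sketch; the pupil names, prior scores, and the attainment threshold are invented, and a real trial would add safeguards such as concealed allocation.

```python
# Stratified random assignment: pupils are split into attainment strata and then
# allocated to conditions at random within each stratum. All data are invented.
import random

def stratified_assign(pupils, stratum_of, seed=42):
    """pupils: list of dicts; stratum_of: function mapping a pupil to a stratum label."""
    rng = random.Random(seed)
    strata = {}
    for p in pupils:
        strata.setdefault(stratum_of(p), []).append(p)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for p in members[:half]:
            assignment[p["name"]] = "intervention"
        for p in members[half:]:
            assignment[p["name"]] = "control"
    return assignment

pupils = [{"name": f"pupil_{i}", "prior_score": s}
          for i, s in enumerate([34, 55, 72, 41, 68, 90, 47, 63])]
print(stratified_assign(pupils, lambda p: "high" if p["prior_score"] >= 60 else "low"))
```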

Many teachers’ interest in participating might depend upon the relevance of the experiments to the learning involved in their current teaching. Consequently, the freedom to test an intervention’s effect by creating their own tests of current subject matter is important for uptake. However, this freedom risks obscuring clear sight of effects, as “homemade” tests of a variety of forms of learning might vary on factors important for understanding the mechanisms at work. A replication of the Churches model would include enhanced training on the creation of tests, encouraging participants to use parts of tests available to other schools and tests that have been standardised to age norms, such as past national examinations and statutory assessments. A conceptual replication would allow the same freedom to create tests as obtained in the Churches model, in order to encourage take-up of opportunities for the conceptual replication research. However, meta-analysis would be carried out on groups of studies using comparable tests. A separate part of the project would provide specified tests to be used by teachers wishing to contribute to informative research within constrained research protocols.

Data analysis in the Churches model used random effects to adjust for differences between studies in the interventions and outcomes, on the assumption that the interventions were evoking different cognitive mechanisms and that the outcomes measured different kinds of learning. This renders it unclear what is affecting what. To obtain a clearer picture of which interventions affect which learning outcomes, a repeat of the Churches model could apply two analytical strategies. One would control for differences between studies by grouping studies with comparable interventions, designs, and tests and analysing each group separately, a strategy proposed by S. Higgins (personal communication, July 11, 2021). The second strategy would attempt to meet the assumptions of fixed-effects meta-analysis, that is, that the interventions evoke the same cognitive mechanism and the outcomes measure the same kind of learning. This could be achieved by experimental control rather than statistical adjustment, by encouraging teachers to carry out experiments using the same methods and procedures as others. This would allow clear sight of the extent to which specific interventions affect specific learning outcomes, thereby providing greater clarity on which mechanisms are effective.
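The first of these strategies can be sketched as follows: studies are grouped by comparable intervention, design, and outcome measure, and a fixed-effect (inverse-variance) pool is computed within each group. The group labels, effect sizes, and variances below are invented for illustration.

```python
# Subgroup fixed-effect (inverse-variance) pooling within groups of comparable studies.
# Study records are invented for illustration.
import numpy as np

studies = [
    {"group": "retrieval practice / times tables", "d": 0.42, "var": 0.05},
    {"group": "retrieval practice / times tables", "d": 0.28, "var": 0.04},
    {"group": "interleaving / spelling",           "d": 0.15, "var": 0.06},
    {"group": "interleaving / spelling",           "d": 0.33, "var": 0.05},
]

for group in sorted({s["group"] for s in studies}):
    d = np.array([s["d"] for s in studies if s["group"] == group])
    v = np.array([s["var"] for s in studies if s["group"] == group])
    w = 1.0 / v                          # inverse-variance weights
    pooled = np.sum(w * d) / np.sum(w)   # fixed-effect pooled estimate
    se = np.sqrt(1.0 / np.sum(w))
    print(f"{group}: pooled d = {pooled:.2f} (SE {se:.2f})")
```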

A repeat of the Churches model could use the same training but provide it online, as proposed in the Churches model itself, to allow larger numbers of people to participate, including in different geographical contexts. Greater emphasis would be placed on training in sampling techniques and the choice of tests. It would allow the same freedoms to design interventions, methods, and tests, for the purpose of incentivising uptake. Training might steer participants towards simple, standardised experiments that teachers are more commonly positioned to implement, for the purpose of gathering more data under comparable conditions. It would attempt to achieve this by creating a system to guide the process of proposing experiments and their protocols. This system would provide feedback on design decisions, advise on the advantages and disadvantages of proposals, and suggest alternative designs more in line with standard requirements of experiments.

Greater emphasis would be placed on participants’ “write-up” of experiments, with a system requiring participants to submit sufficiently detailed specifications of experiments to enable others to replicate them. This information could be used to steer participants towards standardised trials and to increase the speed of analysis and the production of dissemination outputs, for example, through peer-reviewed journals. Project leaders would carry out much of this process to avoid overburdening teachers. Where successful, the possible or actual publication of RCTs could act as a substantial incentive, raising the professional status of the teacher authors and encouraging other teachers to attempt similarly careful studies.

Teacher-led trials as a source of replications

Repeating experiments is central to scientific inquiry, in order to avoid the charge that findings are merely artefacts of unique conditions. Consequently, the maturation of an area of study must involve repeating experiments. Teacher-led trials are initially unlikely to yield many direct replications, in which the precise details of other studies are reproduced. This might be partly because teachers new to research seem likely to see the benefits of doing research through the lens of their immediate professional concerns and interests. Their interest in carrying out research may depend upon the relevance of measured outcomes to their current teaching. This may lead many teachers to choose topics and outcomes that are specific to the subject matter that they are teaching, which might differ between teachers, making direct replications of common topics or outcomes less popular.

However, many teachers’ interests lie in the same subject matter within a common curriculum. In the Churches model, a large number of studies focused on the same outcomes (multiplication tables, spelling, and vocabulary), although the comparability of the spelling and vocabulary outcomes might be questionable, as details were limited and important differences between these tests were likely. Consequently, there would be opportunities to interest teachers in studying the same specified outcomes aligned with a common curriculum.

Teachers might also be less well positioned to carry out direct replications with tightly specified intervention and counterfactual conditions if they (the teachers) have limited control over the arrangement of student groups in schools. It also seems possible, even likely, that, just as with the situation of topics and outcomes, many teachers’ interest in participation will depend upon the perceived relevance of interventions to their current teaching. Consequently, for many teachers, engagement in the project might arise from the facility to adapt interventions into forms that tend to be unlike the interventions of other teachers.

Going beyond conceptual replications, the motivation to conduct direct replications relates to a possible desire to contribute to a wider dialogue across a profession about a shared definition of the mechanisms at play in promoting learning. Some teachers might be interested in aligning their work closely with others in order to make such contributions. Many teachers’ interest in trials might instead relate to the opportunity to innovate and distinguish themselves from others’ interpretations, and/or to investigate research questions out of their own, independent, professional interest. How many teachers would participate in direct replications remains an open question. Conceptual replications might offer more scope for teachers to vary conditions; this may be a more inviting prospect for teachers than direct replications, while also achieving the typical aim of conceptual replications of examining whether an intervention “works” in different contexts, conditions, measures, and intervention variants. Overall, this paper suggests a model which could shed light on the feasibility of harnessing the teacher population to fill gaps in our understanding of “what works” with regard to conceptual replications.

Conclusion

If empirical research in education is to have an impact on improving learning in classrooms, more of it is needed to build the body of evidence that practitioners understandably require before becoming convinced that it can help. Traditional top-down routes of commissioning research might not have the capacity to fill the gaps in evidence needed to further this. Teacher-led trials have the potential to help here, partly because of the number of teachers and partly because “insiders” might be better positioned than “outsiders” to fit research around schooling, to which is added the opportunity to bring teachers into much closer ownership of the field. The Churches model demonstrates that many teachers, given appropriate training and structure, are willing and able to create trials that provide evidence of a standard comparable to published research in education.

This paper has aimed to describe how a balance may be found between attracting teachers to participate by allowing a degree of freedom, and collecting informative data under controlled conditions by nudging teachers towards greater experimental standardisation and applying tighter statistical control. Large increases in scale and geographical reach can be supported by online training and structures. The Churches model originally underestimated teachers’ ability to carry out randomised controlled trials. Replicating the Churches model online might attract larger numbers of teachers into working more closely with learning science and research methods, increasing the research capacity of schools and potentially opening up opportunities to generate more replication study data that inform how learning can be improved.

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Notes on contributors

John F. Brown

John F. Brown is currently a statistician working for the School of Education at Durham University on the Sutton Trust-Education Endowment Foundation’s Teaching and Learning Toolkit. He has published on randomised controlled studies in education and teacher recruitment. John was previously a multi-academy trust manager and a statistician for the Department for Education.

References