Replicable quantitative psychological and educational research: Possibility or pipe dream?

Pages 111-121 | Received 29 Dec 2021, Accepted 20 Apr 2022, Published online: 25 Jun 2022

ABSTRACT

Since the advent of the twenty-first century, science has experienced a crisis pertaining to the replicability of quantitative research findings, which has become known as the ‘replication crisis’. The replication crisis has especially afflicted research in the behavioural sciences, and psychology in particular. Given the relevance of psychology to education, it is unsurprising that the replication crisis also presents an issue for quantitative educational research, thus potentially compromising its practical usefulness. This article outlines the replication crisis in psychology, and highlights its significance for quantitative educational research. Greater methodological rigour, aimed at addressing methodological deficiencies in the conduct of scientific research, has been suggested as a response to the replication crisis. Following a review of various calls for greater methodological rigour, the current article argues that, whilst methodological deficiencies may be a contributory factor, there is potentially a more fundamental reason for the replication crisis. The case is made that the non-separability of a measured attribute from the measurement instrument, and the irreducible uncertainty in unmeasured attributes, may be the principal reasons for the replication crisis in psychology and education. Implications of such an explanation for the crisis are articulated in relation to the evidence-based policy agenda in education.

Introduction

Since the beginning of the twenty-first century, scientific research has experienced a crisis of confidence in the form of what has become known as the ‘replication crisis’. This refers to the fact that, in many areas of scientific endeavour, difficulties have arisen when attempting to reproduce previous research findings (Earp & Trafimow, 2015; Fanelli, 2018; Hughes, 2018). Psychology has been particularly impacted by the replication crisis, with the Open Science Collaboration (2015) reporting that, of 100 studies published in three psychology journals, as few as 36% were successfully replicated. All 100 attempted replications were conducted using high-powered designs, and made use of the materials utilised in the original studies where possible. Similar difficulties were subsequently reported in other areas of scientific endeavour, thus extending the scope of the replication crisis (Fanelli, 2018). Concerns pertaining to the replication crisis, and the reliability of research findings, have spread to several social science disciplines, including education (Frias‐Navarro et al., 2020; Makel et al., 2019).

The replication crisis has potentially deleterious implications for educational research if it transpires that a large proportion of published quantitative research findings are unreliable. Such a situation would negatively impact upon the work of other researchers who utilise the findings in follow-on studies. It is also important to note that a crisis of confidence in the reproducibility of research findings would engender grave concerns amongst those who fund educational research with a view to identifying and supporting effective practice. Perhaps the most concerning implication of the replication crisis for education, however, is the potentially adverse effect it could have on the work of policymakers and practitioners, who rely on reliable and robust research findings to serve as the foundation for evidence-based policy and practice decisions (Makel et al., 2021). For example, an efficacy trial of Collaborative Strategic Reading (CSR), which is a reading intervention that utilises both strategy instruction and peer-led discussions about text, revealed positive impacts on reading comprehension (Vaughn et al., 2011). However, a subsequent effectiveness trial of CSR failed to replicate this finding (Kim, 2019), which is problematic for policymakers and practitioners since evidence concerning the efficacy of the intervention is inconsistent.

In the context of psychology, the term ‘replication crisis’ has proven to be somewhat contentious since difficulties associated with replication are often linked to a myriad of issues pertaining to dubious research and publication practices. Consequently, some researchers have argued that psychology is not experiencing a replication crisis, or that use of the term ‘replication crisis’ distorts the reality of the situation (Gilbert et al., 2016). Nevertheless, the term ‘replication crisis’ appears to have endured in relation to the difficulties associated with replicating research findings in psychology (Wiggins & Christopherson, 2019), and it has also been mooted in connection with the reliability of research findings in the closely allied discipline of education (Hedges, 2018; Makel et al., 2019; Makel & Plucker, 2014).

The current article gives a brief historical overview of the replication crisis in psychology, and its implications for educational research, before reviewing various calls for greater methodological rigour that have been made as a response to the crisis. Whilst most responses advocate a range of strategies for addressing methodological deficiencies in research and publication practices, this article fundamentally challenges, at a paradigmatic level, the premises upon which the notion of replication is predicated. I draw upon Wittgenstein’s later philosophy of mind to offer a novel philosophical critique of the quests for objectivity and universalism in psychology and education that are implicit in replication of empirical research findings. Finally, implications of this critique for the evidence-based policy agenda in education are articulated.

The following section provides a brief historical overview of the emergence of the replication crisis in psychological science, and it also outlines some relevant perspectives on the replication of quantitative educational research findings that have been highlighted in the literature.

The replication crisis in psychology and education

The replication crisis in psychology can be traced back to the latter part of the nineteenth century, when some psychologists failed to replicate the findings of other researchers (Wiggins & Christopherson, 2019). However, increasingly sophisticated methods for measuring psychological phenomena have been viewed as a means of ensuring that the findings of psychological research are replicable: because of this growing sophistication, measurements of psychological attributes are deemed to be reliable, and thus reproducible in different contexts, so that the verification of true hypotheses across contexts should not be problematic (Wiggins & Christopherson, 2019). Nevertheless, some notable events during the past decade have led to a renewed focus on the replication crisis in psychology, including the reluctance of the Journal of Personality and Social Psychology to publish studies that failed to replicate Bem’s (2011) highly contentious findings in support of precognition, i.e. awareness of a future event that could not reasonably be anticipated (Aldhous, 2011).

It is important to distinguish between the notions of ‘conceptual replication’ and ‘direct replication’. A direct replication entails using the same methodological approach as a given study in an attempt to replicate its findings. A conceptual replication, on the other hand, is a study that follows as a natural consequence of a particular study’s findings, but which does not utilise exactly the same methodology as the original study. During the latter part of the twentieth century, greater priority was given to conceptual replication as a vehicle for further developing psychological theories while simultaneously confirming the reliability of a particular result (Wiggins & Christopherson, 2019). However, a conceptual replication may not be an adequate substitute for a direct replication involving use of exactly the same methodology as the original study.

The latter part of the twentieth century also heralded a focus on the standardisation of research methodologies in psychology, and psychologists came to view their use of standardised research and statistical methods as legitimising their work (Danziger, 1990). Therefore, accepted methodological and statistical approaches were effectively adopted as algorithms that could substantiate research findings since they offered a systematic and rigorous approach to the analysis of data that could reliably quantify the magnitude of a particular effect. Accordingly, psychologists tended to place greater emphasis on ensuring that their research adhered to the accepted methodological approaches rather than conducting direct replication studies, assuming that their use of standardised approaches would be sufficient to disprove and eliminate false theories (Wiggins & Christopherson, 2019).

Despite this focus on methodological and statistical standardisation, it is worrying that some psychological researchers appear to labour under the illusion that, in statistical significance testing, the p value implies that the probability of a successful replication is 1 − p, so that a finding with p < 0.05, for example, would have at least a 95% chance of successfully replicating, thus leading to the mistaken assumption that replication studies are redundant (Gigerenzer, 2018; Wiggins & Christopherson, 2019). The potential impact of a number of other dubious research practices has been highlighted by Simmons et al. (2011). These include terminating data collection only after statistically significant findings emerge, and collecting many variables but selectively including only those that yield significant findings in statistical models, both of which could precipitate false-positive results in a research study and lead to findings that are not replicable.
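To see why this belief is fallacious, consider a minimal simulation, sketched below under purely hypothetical assumptions (a one-sample design with n = 30 and a true effect of d = 0.4; these parameters are illustrative, not drawn from any study discussed here). Among original studies that attain p < .05, the proportion of exact replications that also attain p < .05 approximates the design’s statistical power, around 56% for these settings, not the 95% the fallacy suggests.

```python
# A minimal sketch, assuming a hypothetical one-sample design (n = 30,
# true effect d = 0.4): the replication rate of 'significant' findings
# tracks statistical power, not 1 - p.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, true_effect, trials = 30, 0.4, 10_000  # illustrative parameters

def significant(sample):
    """Two-sided one-sample t-test against zero at alpha = .05."""
    return stats.ttest_1samp(sample, 0.0).pvalue < 0.05

attempts = replicated = 0
for _ in range(trials):
    original = rng.normal(true_effect, 1.0, n)
    if significant(original):                      # the original study 'worked'
        attempts += 1
        replica = rng.normal(true_effect, 1.0, n)  # exact replication
        replicated += significant(replica)

# Prints roughly 0.56 for these settings, far below the 95% that the
# replication fallacy would predict.
print(f"Replication rate among significant originals: {replicated / attempts:.2f}")
```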

Replication is an important facet of another area of scientific endeavour closely related to psychological research: quantitative educational research. Indeed, one of the founding fathers of modern educational statistics, Sir Ronald Fisher, viewed replication as an important component of experimental research design (Cai et al., 2018). Collins (1985, p. 19) described replication as ‘the Supreme Court of the scientific system’, where the validity of previous research findings can be tested, or the conditions under which the findings hold can be investigated. It is therefore problematic that scant attention has been paid to replication of previous findings in quantitative educational research (Makel et al., 2019).

In an analysis of the full publication history of the top 100 education journals according to 2011 5-year impact factors, Makel and Plucker (2014) reported that just 0.13% of the 164,589 articles that had been published (i.e. 221 articles) involved replication studies (which were direct replications, conceptual replications, or a mixture of both). Makel and Plucker (2014) found that 67.4% of the 221 replication studies successfully reproduced the findings of the original study, but the replication success rate was influenced by replication type and authorship characteristics of the replication study. Just 28.5% of the replication studies were direct replications, 69.2% were conceptual replications, and 2.3% entailed a mixture of direct and conceptual replications. Successful replications of the original studies were reported by 71.4% of direct replications, 66% of conceptual replications, and 60% of mixed replications. However, it is noteworthy that the overall replication success rate for all types of replications decreased from 67.4% to 54% when none of the authors of the replication study featured in the authorship of the original study. This is a worrying finding since it means that almost half of the independently authored replication studies failed to reproduce the previously reported findings.

It is thus apparent that few replication studies have been attempted in education, and that success rates are modest for those that have been conducted, particularly when the replication is carried out independently of the original authors. This is problematic since findings from single studies cannot provide a secure basis for informing policy and practice decisions in education. The following section summarises the various calls for greater methodological rigour that have been articulated in the literature as a consequence of the replication crisis.

Greater methodological rigour as a response to the replication crisis

A major focus of those seeking to address the replication crisis has been on identifying and challenging dubious research practices, i.e. decisions made by researchers and/or the use of methodological approaches that may undermine the reliability of research findings. These include, for example, the use of small samples, non-registration of research protocols before the research is conducted, absence of statistical power planning, and post-hoc formulation of hypotheses to ensure statistically significant results are obtained (Frias‐Navarro et al., 2020; Wiggins & Christopherson, 2019). Many researchers in both psychology and education have admitted to engaging in dubious research practices that yield positive, but potentially unreliable, research findings (John et al., 2012; Makel et al., 2019). Consequently, it has been suggested that, when publishing their findings, scholars should be required to explain their methodology more fully and to refrain from concealing vital aspects of the approach taken (Simmons et al., 2011). It has also been argued that dubious research practices could be addressed by requiring researchers to pre-register their research protocols (including hypotheses, methodological plans and analytical plans) before data collection commences, so that methodology-related decisions are divorced from their potential impact on the study’s findings (Frias‐Navarro et al., 2020; Nosek et al., 2018). In addition, there have been calls to mandate the sharing of complete data sets, and the code used to analyse the data, to permit other researchers to verify the findings of a study, and as a further antidote to dubious research practices (Frias‐Navarro et al., 2020). Although such data/code sharing is encouraged by some journals, not all researchers actually do so (Vanpaemel et al., 2015).

Under the umbrella of dubious research practices, some aspects of statistical analysis warrant particular consideration. Reformers have raised questions about the uses and, more importantly, abuses of statistical hypothesis significance testing. More specifically, they have identified so-called ‘p-hacking’ as a problematic practice that has the potential to undermine the replicability of research findings in psychology and education (Frias‐Navarro et al., 2020; Wiggins & Christopherson, 2019). This refers to a variety of strategies that researchers may employ in an attempt to ensure that statistical significance is attained, such as manipulating the data or analysis approach. Data manipulation may entail such practices as including or excluding outliers in an inconsistent manner to generate a p value that suits the researcher’s agenda, while questionable analysis approaches may, for example, entail comparing different subgroups until a significant result emerges and then retrospectively framing the hypothesis. Bayesian analysis is a suggested alternative to statistical hypothesis testing since Bayesian methods are concerned with the likelihood of future events based on knowledge of past events, which is particularly pertinent for those who are concerned about reproducibility of research findings (Krueger, 2001). Alternatively, it has been suggested that, if the current widely-used approach to statistical hypothesis testing is retained, effect sizes and associated confidence limits should be reported rather than simply relying on p values (Cumming, 2014).
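The inflationary effect of one such strategy, optional stopping, can be illustrated with a short simulation, sketched below under assumed design choices (batches of ten observations, a maximum sample of 100, and a true null effect). It is an illustration of the general mechanism rather than a reconstruction of any particular study.

```python
# A minimal sketch of p-hacking via optional stopping, under assumed
# design choices: test after every batch of 10 observations and stop as
# soon as p < .05, even though the true effect is zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, batch, max_n = 5_000, 10, 100  # illustrative settings

false_positives = 0
for _ in range(trials):
    data = rng.normal(0.0, 1.0, batch)  # the null hypothesis is true
    while True:
        if stats.ttest_1samp(data, 0.0).pvalue < 0.05:
            false_positives += 1        # a spurious 'discovery' is declared
            break
        if len(data) >= max_n:
            break                       # give up at the maximum sample size
        data = np.concatenate([data, rng.normal(0.0, 1.0, batch)])

# Honest fixed-n testing would give ~5%; repeated testing as the data
# accumulate typically inflates this to several times the nominal rate.
print(f"False-positive rate with optional stopping: {false_positives / trials:.2%}")
```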

The term meta-analysis was introduced by Glass (1976) to describe a quantitative approach for systematically reviewing a body of research. In meta-analysis, the findings of individual studies are transformed into effect sizes, and these effect sizes are then pooled across several studies to give an overall indication of the effect size associated with the phenomenon under consideration. Indeed, in what has been termed education’s ‘Holy Grail’, Hattie (2009) presents the results of a meta-meta-analysis of over 800 meta-analyses, with the aim of summarising and synthesising a very large volume of empirical research on the effects of various educational interventions and influences on student achievement. Although meta-analysis has proven to be an extremely popular method for summarising bodies of research, it has also attracted extensive critique (Sharpe & Poets, 2020). Whilst meta-analysis has been heralded as a potential solution to the replication crisis, Makel and Plucker (2014) make the point that meta-analyses combine the results from studies that were conducted for different purposes, whereas the main purpose of replications is to reproduce previous research findings.
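For readers unfamiliar with the pooling step just described, it can be sketched as follows. The fixed-effect, inverse-variance weighting shown is one standard scheme, and the effect sizes and standard errors are purely illustrative, not taken from any study cited here.

```python
# A minimal sketch of fixed-effect meta-analytic pooling with
# inverse-variance weights; the study results below are made up.
import math

studies = [        # (effect size d, standard error), illustrative only
    (0.30, 0.12),
    (0.45, 0.20),
    (0.10, 0.15),
]

weights = [1 / se**2 for _, se in studies]   # more precise studies weigh more
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval via the normal approximation
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect: {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```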

Prestigious academic journals usually prioritise the publication of original and statistically significant findings. Such outlets tend to be reluctant to publish non-significant results and replication studies, which consequently leads to lacunae in the literature and a distinct bias towards statistically significant findings (Frias‐Navarro et al., 2020; Wiggins & Christopherson, 2019). To address the reluctance of journals to publish null results, Nosek and Lakens (2014) suggested that protocols should be established to ensure that both significant and non-significant findings are published for studies conducted in a sufficiently rigorous manner. To deal with journals’ similar reluctance to publish replication studies, Srivastava (2012) suggested that the onus should be on journals to publish replication studies that do not yield the same conclusions as the research studies they initially published.

As illustrated in this section, the various responses to the replication crisis have focussed mainly on methodological issues to promote greater rigour in research practices. Rather less consideration has been given to the potential role of the fundamental theoretical and philosophical underpinnings of psychological and educational measurement in the replication crisis. Accordingly, the following section addresses this important omission from the extant literature on the replication crisis in psychology and education.

Paradigmatic explanation for the replication crisis: Non-separability and irreducible uncertainty in psychological and educational measurement

Reformers seeking to address the replication crisis in psychology and education have tended to leave largely unexamined the objectivist epistemological underpinnings of quantitative psychological and educational research, focusing their efforts instead on improving methodological practice in pursuit of objectivist ideals. Objectivist epistemology holds that psychological and educational phenomena exist independently of the researcher, and that their nature can be uncovered using appropriate research methods. The basic premise upon which the notion of replication is predicated presupposes that universal principles and laws underpin an objective social reality. In this section, however, I challenge the validity of such an objectivist epistemological stance, and the attendant commitment to universalism, in the context of psychological and educational research by drawing upon fundamental aspects of Wittgenstein’s later philosophy of mind to argue that:

  1. Measurements of intentional psychological predicates (such as thinking, learning, understanding, remembering, meaning, and intending) of the type that feature in quantitative psychological and educational research cannot be separated from the instruments used to measure them;

  2. There is irreducible uncertainty in unmeasured intentional psychological predicates.

The arguments below are developed using the measurement of cognitive abilities for illustrative purposes, but similar arguments could be propounded for the measurement of other intentional psychological predicates.

Wittgenstein contends that a psychological trait, such as ability, cannot just consist of a mental process: ‘In the sense in which there are processes (including mental processes) which are characteristic of understanding, understanding is not a mental process’ (Wittgenstein, 2009, §154). In particular, Wittgenstein’s discussion of rule-following leads to philosophical dilemmas if a finite mental object, such as an image or formula, is viewed as the source of a student’s ability. This then precipitates further conundrums if, for example, a standardised test of mathematical achievement is deemed to be an appropriate measuring instrument for checking up on students’ mathematical abilities, as summarised below.

Bruner (1996, p. 129) has defined learning as developing the capacity to follow rules so as to ‘go beyond the information given’. Wittgenstein (2009), however, demonstrates that a paradox arises if the origin of a student’s ability to act according to a given rule is construed as a mental object such as a formula or image. Wittgenstein posits that, if a finite mental object were the source of the student’s ability to correctly follow the rule, any behaviour on the part of the student could be viewed as either agreeing with, or violating, the requirements of the rule, provided a suitable interpretation of the object is adopted.

This was our paradox: no course of action could be determined by a rule, because every course of action can be brought into accord with the rule. The answer was: if every course of action can be brought into accord with the rule, then it can also be brought into conflict with it. And so there would be neither accord nor conflict here. (Wittgenstein, 2009, §201)

For example, suppose that a question, QM, on a standardised test of mathematical achievement requires students to calculate the value of a³ when a = 4. Clearly, a student could attach the usual interpretation to the formula, i.e. replace a by 4, then calculate 4 × 4 × 4, and respond with the answer ‘64’. Alternatively, the student could interpret the formula in various unconventional ways, and suggest answers such as 12 (obtained by multiplying 4 by 3), 7 (obtained by adding 4 and 3) or 81 (obtained by calculating the value of 3 × 3 × 3 × 3), or a multitude of other possibilities. In other words, the student could either interpret and respond to the question in line with the accepted mathematical conventions, or else depart from those conventions and give an answer that is inconsistent with them.
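The interpretive latitude at issue can be made concrete in a short sketch: each function below encodes one of the construals of QM just listed, and each produces an answer that is internally consistent with its own reading of the formula.

```python
# A toy sketch of the rule-following point: four 'interpretations' of the
# instruction "calculate a^3 when a = 4", mirroring the readings above.
interpretations = {
    "conventional (a * a * a)":     lambda a: a ** 3,  # 64
    "multiply a by the exponent":   lambda a: a * 3,   # 12
    "add a and the exponent":       lambda a: a + 3,   # 7
    "swap base and exponent (3^a)": lambda a: 3 ** a,  # 81
}

for reading, rule in interpretations.items():
    print(f"{reading}: {rule(4)}")

# Nothing internal to the formula privileges the first reading; on the
# article's Wittgensteinian argument, only shared mathematical practice
# fixes '64' as the correct answer.
```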

In a similar vein, consider a question, QL, on a standardised literacy test which requires students to explain the meaning of the underlined word in the sentence: ‘The book was in mint condition.’ A student could interpret the underlined word in the conventional manner, and proffer an answer such as ‘perfect’. However, the student could instead attach an unorthodox interpretation to the word, and respond with answers such as ‘smells like a mint sweet’, ‘it is green’, or a plethora of other options.

The inability of a finite mental object (e.g. mental image or formula) to guide the student in the correct use of a rule cannot be addressed by positing that the student must be able to interpret the object correctly. Alas, this does not rectify the philosophical conundrum since any interpretation of the corresponding mental object could itself be interpreted in multiple different ways, hence leading to an infinite regress:

If it requires interpretation, that could be done in lots of ways. So how do I tell which interpretation is correct? Does that, for instance, call for a further rule—a rule for determining the correct interpretation of the original—and if so, why does it not raise the same difficulty again, thereby generating a regress? (Wright, 2001, p. 163)

In addition, the difficulties surrounding interpretations are not dispensed with by postulating that a Platonic mechanism, which obviates the need for interpretation, exists in the student’s mind and enables the student to access all potential future applications of the rule. Wittgenstein dismissed mathematical Platonism as an explanatory mechanism for rule-following behaviour: ‘The mathematician is an inventor, not a discoverer’ (Wittgenstein, 1978, I, §168). Although this is an explicit rejection of Platonism in the context of mathematical rule-following, Wittgenstein (2009) generalises the argument to all forms of rule-governed behaviour.

Wittgenstein’s analysis of rule-following implies that, before a student proffers an answer to a test item, such as the sample mathematics question, QM, or the sample literacy question, QL, criteria for determining a correct application of the relevant rule do not exist. In other words, the student is in a superposition of two states before giving an answer to the question: the student is both correct and incorrect. Wittgenstein contends that a rule cannot be followed privately (i.e. in the mental realm) since criteria for the correct application of the rule do not exist in the mental realm. Instead, Wittgenstein argues that established practices determine whether a rule has been applied correctly or incorrectly:

‘Following a rule’ is a practice. And to think one is following a rule is not to follow a rule. And that’s why it’s not possible to follow a rule ‘privately’; otherwise, thinking one was following a rule would be the same thing as following it. (Wittgenstein, 2009, §202)

For example, in the sample mathematics test item referenced above, QM, the criteria associated with the mathematical practices of algebraic substitution and calculating the cube of a number are invoked when the student proffers an answer, whether ‘64’ or something else, to decide if the answer is correct or incorrect. Clearly, an answer of ‘64’ would be judged correct, but any alternative answer would be deemed incorrect as it would not conform with the relevant disciplinary practices. When the student responds to the test item, they transition from an indefinite ability with respect to the item (i.e. they are both correct and incorrect) to a definite ability with respect to the item (i.e. they are either correct or incorrect). A similar line of reasoning applies to the sample literacy test item, QL: prior to answering QL, a student has indeterminate ability with respect to the item, but the relevant ability transitions to a definite state after an answer is offered.

Therefore, the correctness, or otherwise, of a student’s response to a particular test item is judged by comparing the response with the answer that accords with the relevant disciplinary customs or practices, and the student’s ability relative to the test item is indefinite before the student responds to the item. Consequently, it can be concluded that the measurement process influences the measured value of the student’s ability since the relevant ability measurement is actually a joint property of both the student and the device used to measure the ability, i.e. the items on the test. Thus, it appears that the relevant student ability does not exist as a thing-in-itself, and that it is only meaningful to refer to it relative to the instrument used to measure it, i.e. the relevant test paper. The prospect of validity in psychological or educational measurement thus appears to be a pipe dream rather than a possibility, since it is problematic to abstract a measurement of a psychological predicate away from the instrument used to measure it (Cantley, 2015, 2017, 2019).

It is plausible that the suggested indeterminism in unmeasured psychological or educational predicates might be challenged by incorporating arguments pertaining to brain neural processes. Clearly, such processes are measurable using sophisticated technological approaches. However, the measurements would be first-person/third-person symmetrical, rather than adhering to the first-person/third-person asymmetry which Wittgenstein (2009) contended is a hallmark of intentional psychological predicates. In the context of psychological predicates, there are no criteria for first-person ascriptions, but third-person ascriptions are based on criteria. In the case of physical phenomena, such as brain neural processes, both first-person and third-person ascriptions are based on the invocation of criteria.

The original metaphorical use of the word ‘inner’ reflects the realization that you and I stand on a different logical level in regard to what I think and feel. But the view that thoughts and feelings are brain-processes abolishes this logical difference. If this view were true, you and I would stand on the same level in regard to what I think and feel. In order to ascertain my thoughts and feelings you and I would equally have to rely on advanced technology and scientific theory. (Malcolm, 1986, p. 191)

This implies that measurements of intentional psychological predicates made using technological approaches, such as brain scans, would be incongruous with current conceptions of psychological attributes. Entities that demonstrate first-person/third-person symmetry (i.e. measurements garnered using technological methods) are not logically equivalent to those that are asymmetrical with respect to first- and third-person ascriptions (i.e. psychological phenomena). However, even if one assumes that there is a systematic relationship between subjective mental states and physical brain states, a philosophical conundrum remains. The reasoning presented above regarding Wittgenstein’s analysis of rule-following means that, since subjective mental states cannot be identified using criteria, i.e. they are indeterminate, subjective mental states cannot be equated to brain states. Therefore, it would appear that, both prior to and after measurement, the uncertainty in intentional psychological predicates is irreducible, since it cannot be reduced by measuring physical properties of the brain because of the indeterminism in the corresponding subjective mental states (Cantley, 2019). This analysis aligns with Luntley’s (2017) perspectives on learning and how it occurs, and also connects with the work of Duncan and Sankey (2019) and Duncan et al. (2022) on the nature of learning.

The non-separability of psychological predicates from the instruments used to measure them, coupled with the irreducible uncertainty in unmeasured psychological predicates, casts considerable doubt on the objectivist ideals associated with positivistic psychological and educational research. Indeed, these issues may lie at the heart of the replication crisis in relation to quantitative psychological and educational research. Of course, any attempt to improve the robustness of the methodological approaches employed in empirical psychological and educational research is commendable, but solely relying on methodological improvements, without adopting a critical stance to objectivism, may be insufficient to adequately address the replication crisis.

The analysis presented above places objectivism under strain as an underpinning paradigm for psychological and educational research. Indeed, it may transpire that interpretivism, which has hitherto usually provided the theoretical grounding of qualitative research, is a more suitable paradigm for all psychological and educational research. In particular, it may be more appropriate for quantitative researchers in psychology and education to abandon their quest for objectivity and universalism, which are both implicit in the replication crisis, and to adopt a similar stance to those scholars who embrace qualitative methodologies. Such a move would not preclude the use of quantitative research methods, but it would place greater emphasis on drawing conclusions relative to a particular research context, and in relation to the measuring instrument(s) employed in a particular study. Whilst this would render the pursuit of objective, generalisable findings a pipe dream rather than a possibility, I contend that, in response to the replication crisis, it is imperative for psychologists and educationalists to reassess their unwavering commitment to objectivism and universalism, rather than focussing exclusively on improving methodological rigour.

The final section briefly summarises the arguments presented in the article before assessing their implications for the evidence-based policy agenda in education.

Conclusion

The current article has outlined the origins of the replication crisis in psychology and education, which has cast doubt on the reproducibility of quantitative research findings in both disciplines. Clearly, the reproducibility of research findings is important in all areas of scientific endeavour, but it is particularly pertinent in the case of educational research, which has the potential to have significant traction in relation to policy and practice decisions in the real world. Those seeking to address the replication crisis have focussed almost exclusively on addressing methodological deficiencies in contemporary research practices, and have devoted significantly less attention to critically assessing the suitability of objectivism as an underpinning paradigm for psychological and educational research (Frias‐Navarro et al., 2020; Wiggins & Christopherson, 2019). In this article, I have used aspects of Wittgenstein’s later philosophy of mind to fundamentally challenge the objectivist ideals and commitment to universalism that are implicit in the pursuit of replicable quantitative psychological and educational research. More specifically, I have problematised current approaches to psychological and educational measurement by arguing that:

  1. Measurements of intentional psychological predicates (such as thinking, learning, understanding, remembering, meaning, intending, etc.) of the type that feature in quantitative psychological and educational research cannot be separated from the instruments used to measure them;

  2. There is irreducible uncertainty in unmeasured intentional psychological predicates.

Accordingly, I have argued that reformers must be prepared to acknowledge, and deal with, potential shortcomings in the theoretical underpinnings of quantitative research rather than solely focussing on addressing perceived methodological deficiencies.

The twenty-first century has heralded a significant focus on evidence-based policymaking in education (Lingard, 2013; Pellegrini & Vivanet, 2021), with large amounts of funding being invested in educational research to identify and support the development of effective practice. Given the numerous concerns that have been expressed about the replicability of quantitative educational research, it is imperative that all plausible explanations of the replication crisis, including both theoretical and methodological issues, are thoroughly assessed through a critical lens. Taking such an approach, and dealing appropriately with the outcomes, would reduce the probability of education policy decision-making being based on dubious evidence that may potentially lead to the inappropriate deployment of significant public resources.

Additional information

Notes on contributors

Ian Cantley

Ian Cantley’s current research interests are in mathematics education, and the mathematical and philosophical foundations of educational measurement models.

References

  • Aldhous, P. (2011, May 5). Journal rejects studies contradicting precognition. New Scientist. https://www.newscientist.com/article/dn20447-journal-rejects-studies-contradicting-precognition/
  • Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425. https://doi.org/10.1037/a0021524
  • Bruner, J. (1996). The culture of education. Harvard University Press.
  • Cai, J., Morris, A., Hohensee, C., Hwang, S., Robison, V., & Hiebert, J. (2018). The role of replication studies in educational research. Journal for Research in Mathematics Education, 49(1), 2–8. https://doi.org/10.5951/jresematheduc.49.1.0002
  • Cantley, I. (2015). How secure is a Newtonian paradigm for psychological and educational measurement? Theory & Psychology, 25(1), 117–138. https://doi.org/10.1177/0959354314561141
  • Cantley, I. (2017). A quantum measurement paradigm for educational predicates: Implications for validity in educational measurement. Educational Philosophy and Theory, 49(4), 405–421. https://doi.org/10.1080/00131857.2015.1048668
  • Cantley, I. (2019). PISA and policy-borrowing: A philosophical perspective on their interplay in mathematics education. Educational Philosophy and Theory, 51(12), 1200–1215. https://doi.org/10.1080/00131857.2018.1523005
  • Collins, H. M. (1985). Changing order: Replication and induction in scientific practice. Sage.
  • Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
  • Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge University Press.
  • Duncan, C., & Sankey, D. (2019). Two conflicting visions of education and their consilience. Educational Philosophy and Theory, 51(14), 1454–1464. https://doi.org/10.1080/00131857.2018.1557044
  • Duncan, C., Kim, M., Baek, S., Wu, K. Y. Y., & Sankey, D. (2022). The limits of motivation theory in education and the dynamics of value-embedded learning (VEL). Educational Philosophy and Theory, 54(5), 618–629. https://doi.org/10.1080/00131857.2021.1897575
  • Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6, 621. https://doi.org/10.3389/fpsyg.2015.00621
  • Fanelli, D. (2018). Opinion: Is science really facing a reproducibility crisis, and do we need it to? Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2628–2631. https://doi.org/10.1073/pnas.1708272114
  • Frias‐Navarro, D., Pascual‐Llobell, J., Pascual‐Soler, M., Perezgonzalez, J., & Berrios‐Riquelme, J. (2020). Replication crisis or an opportunity to improve scientific production? European Journal of Education, 55(4), 618–631. https://doi.org/10.1111/ejed.12417
  • Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198–218. https://doi.org/10.1177/2515245918771329
  • Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on “Estimating the reproducibility of psychological science”. Science, 351(6277), 1037. https://doi.org/10.1126/science.aad7243
  • Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. https://doi.org/10.3102/0013189X005010003
  • Hattie, J. A. C. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
  • Hedges, L. V. (2018). Challenges in building usable knowledge in education. Journal of Research on Educational Effectiveness, 11(1), 1–21. https://doi.org/10.1080/19345747.2017.1375583
  • Hughes, B. M. (2018). Psychology in crisis. Palgrave.
  • John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
  • Kim, J. S. (2019). Making every study count: Learning from replication failure to improve intervention research. Educational Researcher, 48(9), 599–607. https://doi.org/10.3102/0013189X19891428
  • Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56(1), 16–26. https://doi.org/10.1037/0003-066x.56.1.16
  • Lingard, B. (2013). The impact of research on education policy in an era of evidence-based policy. Critical Studies in Education, 54(2), 113–131. https://doi.org/10.1080/17508487.2013.781515
  • Luntley, M. (2017). Forgetski Vygotsky: Or, a plea for bootstrapping accounts of learning. Educational Philosophy and Theory, 49(10), 957–970. https://doi.org/10.1080/00131857.2016.1248341
  • Makel, M. C., & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304–316. https://doi.org/10.3102/0013189X14545513
  • Makel, M. C., Hodges, J., Cook, B. G., & Plucker, J. A. (2021). Both questionable and open research practices are prevalent in education research. Educational Researcher, 50(8), 493–504. https://doi.org/10.3102/0013189X211001356
  • Makel, M. C., Smith, K. N., McBee, M. T., Peters, S. J., & Miller, E. M. (2019). A path to greater credibility: Large-scale collaborative education research. AERA Open, 5(4). https://doi.org/10.1177/2332858419891963
  • Malcolm, N. (1986). Wittgenstein: Nothing is hidden. Blackwell.
  • Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
  • Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 943. https://doi.org/10.1126/science.aac4716
  • Pellegrini, M., & Vivanet, G. (2021). Evidence-based policies in education: Initiatives and challenges in Europe. ECNU Review of Education, 4(1), 25–45. https://doi.org/10.1177/2096531120924670
  • Sharpe, D., & Poets, S. (2020). Meta-analysis as a response to the replication crisis. Canadian Psychology/Psychologie Canadienne, 61(4), 377–387. https://doi.org/10.1037/cap0000215
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
  • Srivastava, S. (2012, September 27). A Pottery Barn rule for scientific journals. The Hardest Science. https://thehardestscience.com/2012/09/27/a-pottery-barn-rule-for-scientific-journals/
  • Vanpaemel, W., Vermorgen, M., Deriemaecker, L., & Storms, G. (2015). Are we wasting a good crisis? The availability of psychological research data after the storm. Collabra, 1(1), 3. https://doi.org/10.1525/collabra.13
  • Vaughn, S., Klingner, J. K., Swanson, E. A., Boardman, A. G., Roberts, G., Mohammed, S. S., & Stillman-Spisak, S. J. (2011). Efficacy of collaborative strategic reading with middle school students. American Educational Research Journal, 48(4), 938–964. https://doi.org/10.3102/0002831211410305
  • Wiggins, B. J., & Christopherson, C. D. (2019). The replication crisis in psychology: An overview for theoretical and philosophical psychology. Journal of Theoretical and Philosophical Psychology, 39(4), 202–217. https://doi.org/10.1037/teo0000137
  • Wittgenstein, L. (1978). Remarks on the foundations of mathematics (3rd ed., G. H. von Wright, R. Rhees, & G. E. M. Anscombe, Eds.; G. E. M. Anscombe, Trans.). Blackwell.
  • Wittgenstein, L. (2009). Philosophical investigations: The German text with an English translation (4th ed., P. M. S. Hacker & J. Schulte, Eds.; G. E. M. Anscombe, P. M. S. Hacker, & J. Schulte, Trans.). Wiley-Blackwell.
  • Wright, C. (2001). Rails to infinity. Harvard University Press.