Original Articles

The cognitive reflection test revisited: exploring the ways individuals solve the test

Pages 207-234 | Received 06 Jun 2016, Accepted 27 Jan 2017, Published online: 01 Mar 2017
 

ABSTRACT

Individuals’ propensity not to override the first answer that comes to mind is thought to be a crucial cause of many failures in reasoning. In the present study, we aimed to explore the strategies used and the abilities employed when individuals solve the cognitive reflection test (CRT), the most widely used measure of this tendency. Alongside individual differences measures, protocol analysis was employed to unfold the steps of the reasoning process in solving the CRT. This exploration revealed that there are several ways in which people solve or fail the test. Importantly, in 77% of the cases in which reasoners gave the correct final answer in our protocol analysis, they started their response with the correct answer or with a line of thought that led to the correct answer. We also found that 39% of the incorrect responders reflected on their first response. The findings indicate that suppression of the first answer may not be the only crucial feature of reflectivity in the CRT and that a lack of relevant knowledge is a prominent cause of reasoning errors. Additionally, we confirmed that the CRT is a multifaceted construct: both numeracy and reflectivity account for performance. The results can help to better understand the “whys and whens” of decision errors in heuristics and biases tasks and to further refine existing explanatory models.

Acknowledgments

We would like to thank Árpád Völgyesi for running the verbal protocols, Bence Bago and Zoltan Kekecs for their helpful comments on the analysis, Melissa Wood for proofreading the manuscript and Melinda Szászi-Szrenka for her supporting patience throughout the study.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 The responses in the CRT are often grouped into three categories: “intuitive incorrect” (10 cents, 100 minutes, 24 days); “non-intuitive incorrect” (any other answer); and “non-intuitive correct” (5 cents, 5 minutes, 47 days).
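The three-way scoring in Note 1 can be expressed as a small lookup. This is a minimal sketch, not the authors' scoring script: the item keys (`bat_and_ball`, `widgets`, `lily_pads`) and the `categorize` helper are hypothetical names, answers are assumed to be already parsed into numbers, and the units are assumed to be cents, minutes and days respectively (the widget item's answers are times).

```python
from enum import Enum


class Category(Enum):
    INTUITIVE_INCORRECT = "intuitive incorrect"
    NON_INTUITIVE_INCORRECT = "non-intuitive incorrect"
    NON_INTUITIVE_CORRECT = "non-intuitive correct"


# Hypothetical key: (intuitive lure, correct answer) for each original CRT item.
# Assumed units: cents, minutes, days.
CRT_KEY = {
    "bat_and_ball": (10, 5),
    "widgets": (100, 5),
    "lily_pads": (24, 47),
}


def categorize(item: str, answer: float) -> Category:
    """Map a numeric answer to one of the three response categories."""
    lure, correct = CRT_KEY[item]
    if answer == correct:
        return Category.NON_INTUITIVE_CORRECT
    if answer == lure:
        return Category.INTUITIVE_INCORRECT
    # Any answer that is neither the lure nor the correct one.
    return Category.NON_INTUITIVE_INCORRECT


print(categorize("bat_and_ball", 10).value)  # intuitive incorrect
print(categorize("widgets", 5).value)        # non-intuitive correct
print(categorize("lily_pads", 12).value)     # non-intuitive incorrect
```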

2 Based on Google Scholar, January 2017.

3 No-lure CRT tasks are CRT-like arithmetic problems that supposedly do not trigger an “intuitive incorrect” response. For example, “If it takes 1 nurse 5 min to measure the blood pressure of 6 patients, how many minutes would it take 100 nurses to measure the blood pressure of 300 patients?” (Baron, Scott, Fincher, & Metz, 2014)
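For reference, the arithmetic of the nurse example, assuming every nurse works in parallel at the same constant rate as the single nurse in the problem:

```latex
\text{time per patient} = \frac{5\ \text{min}}{6}, \qquad
\text{patients per nurse} = \frac{300}{100} = 3, \qquad
t = 3 \times \frac{5}{6}\ \text{min} = 2.5\ \text{min}
```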

4 Numeracy is one's ability to store, represent and process mathematical operations (Peters, 2012).

5 The European version of the bat and ball problem was administered where the cost of the bat and the ball is given in €.

6 To compute B, one has to model the predictions of the tested hypotheses. Since all of the hypotheses in the current study made directional predictions, we followed Dienes's recommendations (2011, 2014) and modelled the alternative hypotheses with half-normal distributions that assign zero probability to negative values. We used two ways to determine the SD of the half-normal distribution. If we had information on the effect size of the alternative model, we used it as the SD of the half-normal distribution. Otherwise, we estimated the maximum possible effect size of the alternative hypothesis and used half of it as the SD.
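The computation described in Note 6 can be sketched numerically. This is an illustration under stated assumptions, not the authors' code. It assumes the data are summarised by an observed mean difference with a standard error and a normal likelihood. H1 is modelled as a half-normal distribution centred on 0 with SD `prior_sd`, and H0 as a zero effect. The function names and the midpoint-rule integration are our own choices.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prior_sd_from_max(max_effect: float) -> float:
    """Rule from Note 6 when no effect-size estimate exists: SD = half the maximum."""
    return max_effect / 2.0


def bayes_factor_half_normal(obs_mean: float, obs_se: float, prior_sd: float,
                             steps: int = 20_000) -> float:
    """Approximate B for H1 ~ half-normal(0, prior_sd) against H0: effect = 0.

    Assumes a normal likelihood: obs_mean ~ Normal(true effect, obs_se).
    """
    # The half-normal prior puts zero mass on negative effects, so integrate over [0, upper].
    upper = max(abs(obs_mean) + 10.0 * obs_se, 10.0 * prior_sd)
    h = upper / steps
    marginal_h1 = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h                          # midpoint of the i-th slice
        prior = 2.0 * normal_pdf(theta, 0.0, prior_sd)  # half-normal density
        marginal_h1 += normal_pdf(obs_mean, theta, obs_se) * prior * h
    # Likelihood of the data under H0 (true effect exactly 0).
    return marginal_h1 / normal_pdf(obs_mean, 0.0, obs_se)
```

With hypothetical numbers, `bayes_factor_half_normal(0.3, 0.1, prior_sd_from_max(0.8))` gives B for an observed difference of 0.3 (SE 0.1) when the largest plausible effect is taken to be 0.8. Values above 1 favour H1 and values below 1 favour H0.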

7 The assumptions of the multiple regression were not met. A bootstrap estimation with 10,000 resamples confirmed the results of the regression analysis.
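Note 7 does not specify the bootstrap scheme. The following minimal sketch therefore assumes three choices: case resampling (drawing participants with replacement), ordinary least-squares refits with an intercept, and percentile confidence intervals. `ols` and `bootstrap_ci` are hypothetical helper names, and the sketch uses only the Python standard library.

```python
import random


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]  # ZeroDivisionError if X'X is singular
                a[k] = [ak - f * ac for ak, ac in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def bootstrap_ci(X: list[list[float]], y: list[float], n_boot: int = 10_000,
                 alpha: float = 0.05, seed: int = 1) -> list[tuple[float, float]]:
    """Percentile CI for each coefficient from a case-resampling bootstrap."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            draws.append(ols([X[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip a degenerate resample with singular X'X
    cis = []
    for j in range(len(draws[0])):
        vals = sorted(d[j] for d in draws)
        lo = vals[int(alpha / 2 * len(vals))]
        hi = vals[int((1 - alpha / 2) * len(vals)) - 1]
        cis.append((lo, hi))
    return cis
```

Each resample is refitted from scratch, so the percentile intervals do not rely on normal or homoscedastic residuals. This is why a bootstrap is a common check when regression assumptions are violated.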

8 We used the glmer and lmer functions from the lme4 package in R for the mixed-effect analyses (Bates, Maechler, Bolker, & Walker, 2015). The reported t statistics are based on Wald t tests.

9 “H” indicates that a half-normal distribution was used to model the predictions of the alternative hypothesis. The first number in the brackets gives the centre of the distribution, and the second gives its SD.

10 We assumed that the effect size of H1 cannot exceed the average RT of the group with the longer RTs. Consequently, the average RT in the “Correct start” group was taken as an estimate of the maximum effect size of H1, and half of this value was used as the SD of the model.

11 As no previous study had examined the predictive power of the BIDR for CRT performance, we used the predictive power of the BNT as a rough estimate of the maximum effect size of H1. Half of this value was therefore used as the SD of the model.

12 Although we did not formulate specific hypotheses, Appendix B.2 reports the means and standard deviations of all the individual differences measures (BNT, AOT, REI, BBS, SI, BIDR) across the different categories created in the protocol analyses.

13 The predictive power of the BNT for giving the right answer on the CRT was taken as the maximum expected effect size for H1, and half of this value was used as the SD of the model.

14 For H1, we took the maximum expected effect size from a model in which the REI predicted the accuracy of the answer. Half of this value was used as the SD of the model.

15 Compared with our findings, the relatively low proportion of ‘Correct start’ cases in that study may have been caused by several differences between the two experimental designs. First, unlike us, the authors used the modified bat and ball problem. Second, the authors did not control for the time course of the answers. This is crucial for our theoretical question, because participants who reported awareness of the ‘intuitive’ response may have started with a correct strategy, with the incorrect solution coming to mind only later. Finally, their results are based on participants’ self-reports after solving the task rather than on verbal protocols.

16 Working memory (WM) differences can add further complexity: people with a higher working memory span are thought to be more numerate (Peters, Dieckmann, Dixon, Hibbard, & Mertz, 2007; Reyna, Nelson, Han, & Dieckmann, 2009), and they may also find the cost of additional thinking lower than their low-WM counterparts do (Stupple, Gale, & Richmond, 2013).

Additional information

Funding

This work was supported by the doctoral scholarship of Eötvös Loránd University, and by the “Pallas Athéné Domus Animae Alapítvány”. Aba Szollosi was supported by the “Nemzet Fiatal Tehetségeiért” Scholarship [NTP-NFTÖ-16-1184].
