Using stories to assess linear reasoning abolishes the age-related differences found in formal tests

Pages 623-633 | Received 23 Dec 2020, Accepted 29 Mar 2021, Published online: 19 Apr 2021

Abstract

Older adults are known to have difficulty with tests of formal reasoning. Inspired by previous work suggesting an influence of participants’ living ecology on reasoning ability, we examined in a group of 270 younger, middle-aged, and older adults whether presenting transitive reasoning problems (i.e., A > B, B > C, hence A > C) as informal narrative stories rather than formal problems might alleviate age-related declines. Formal materials resulted in the usual (strong) age-related differences favouring the young. In contrast, when informal spoken narratives were used and additionally all time pressure was removed, adult age differences were effectively abolished, possibly because the tasks now allow for easier encoding into and retrieval from episodic memory. This suggests that older adults’ real-life reasoning abilities are seriously underestimated when standard testing procedures are used.

As people age, performance on a plethora of cognitive tasks, ranging from simple and choice reaction time through working memory tasks and tests of episodic memory to tests of reasoning and general (fluid) intelligence, declines markedly (e.g., Salthouse, 1991; Verhaeghen, 2014). Of note are the large declines in reasoning ability – one meta-analysis estimates the correlation between adult age and reasoning at −.45, implying that adult age explains about 20% of the variance in reasoning ability (Verhaeghen, 2014). Equally of note, however, is the abstract nature of the materials used in these reasoning tasks – finding the missing pattern in a 3 by 3 matrix of abstract figures, solving formal syllogisms, or completing a number sequence.
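The step from a correlation of −.45 to "about 20% of the variance" is simply the square of the correlation coefficient. A minimal sketch of that arithmetic follows; the function name `variance_explained` is our own and is used for illustration only.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables with Pearson correlation r."""
    return r ** 2


# Meta-analytic age-reasoning correlation quoted in the text.
r_age_reasoning = -0.45
share = variance_explained(r_age_reasoning)
print(f"r = {r_age_reasoning}, r^2 = {share:.4f}")  # r^2 is roughly 0.20
```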

In the psychometric testing tradition, such abstractions are a deliberate effort to de-emphasize the role of pre-existing knowledge: Abstract materials level the playing field and make results strongly dependent on online processing. The most-often used reasoning test in aging studies is the Raven’s Progressive Matrices test (Raven, 1936), which involves picking the correct missing piece to complete a set of geometric designs. This test was originally designed as a culture-fair test, allowing for an unbiased comparison of the reasoning abilities of children with high and low SES (e.g., Guinagh, 1971). Human beings, however, often bring specific knowledge to bear in their day-to-day cognitive lives. One classic example concerns the mental arithmetic skills of child street vendors in Brazil (Carraher et al., 1985). When simple math problems were framed as prices and transactions, akin to the problems the children faced in daily life, they performed very well, but their performance suffered drastically when the exact same operations were presented as formal, school-type problems. On the other end of the lifespan, older university professors, who spend a great deal of their life reading and writing complex texts, appear to be impervious to the normal effects of aging on memory for prose, but not to those on basic abilities such as reaction time and working memory (Shimamura et al., 1995). Both studies suggest that performance on cognitive tasks might be influenced by how closely these tasks resemble the types of tasks or stimuli participants engage with on a day-to-day basis. Current models of cognitive optimization in aging likewise emphasize the importance of the living ecology, such as socially integrated and engaged lifestyles (Hertzog et al., 2008; Stine-Morrow et al., 2014).

In the present study, we investigated whether this ecological hypothesis also holds true for the reported age-related declines in reasoning abilities. That is, we examined whether adult age differences in reasoning might be attenuated when the problems are cast in more informal, everyday language rather than presented formally. To do so, we examined problems of transitive reasoning (i.e., A > B, B > C, therefore A > C) formulated either as formal problems or embedded in a meaningful story. Transitive reasoning in its formal guise has been shown to be age-sensitive (e.g., Sedek & Von Hecker, 2004); it is also related to working memory capacity (e.g., Vandierendonck & De Vooght, 1997), which is known to decline considerably over the adult life span (e.g., Verhaeghen, 2014).

An additional potential impediment to a correct assessment of older adults’ daily-life capabilities is the type of instruction given. Typical instructions give equal emphasis to speed and accuracy (often formulated as ‘be as fast and accurate as possible’), or omit mentions of speed or accuracy altogether. One problem with this approach is that under such conditions, older adults often prioritize accuracy over speed – going slow to avoid errors – whereas younger adults do the opposite – going fast but accruing errors (e.g., Ratcliff & McKoon, 2008). This strategic difference makes standard results hard to interpret. Another issue is that because the cognitive system of older adults is inherently slower than that of younger adults, any instruction that does not remove all time pressure likely leads to an overestimation of age-related differences in true cognitive ability (Salthouse, 1996). Consequently, we included a condition in which we explicitly emphasized accuracy (“Don’t hurry, think for as long as you need to be completely sure that your answer is correct”), in addition to a control condition (which mentioned neither speed nor accuracy), and a condition where speed was emphasized (“Please do not hesitate and make decisions quickly”).

Method

Participants

A total of 270 individuals participated, all native speakers of Polish: 90 younger adults (age range: 21–25, on average 23 years old, 61% women, with 15.8 years of formal education), 90 middle-aged adults (age range: 45–49, on average 46.7 years old, 54% women, with 16.1 years of formal education), and 90 older adults (age range: 65–84, on average 70.5 years old, 67% women, with 15.4 years of formal education); 30 of each group received one of the three sets of instructions (i.e., speed emphasis, control, or accuracy emphasis). Recruitment was based on local advertisements: university bulletin boards for younger adults, local bulletin boards for middle-aged and older adults. Participants were remunerated for their participation.

Materials

The following measures were used in the study: Mini-Mental State Examination (MMSE; Folstein et al., 1975), Wechsler Digit Symbol Substitution Test (DSST; Wechsler, 1981), the Operation Span task (OSPAN; Turner & Engle, 1989), and the computerized reasoning task created for this study.

Mini-Mental State Examination. MMSE is a paper and pencil screening test for cognitive impairments, typically dementia. The 30-point questionnaire includes questions and tasks regarding orientation in time and space, memory, recall, attention, language, comprehension, reading, writing, and copying. In this study, it was used for the oldest group of participants as a selection tool to exclude those showing signs of dementia; participants scoring 26 or below were excluded.

Wechsler Digit Symbol Substitution Test (DSST). DSST is a paper-and-pencil subtest of WAIS-R that requires participants to match symbols to numbers according to a key presented at the top of the page. The number of correct digits completed within 90 seconds is the indicator of processing speed.

Operation Span Task (OSPAN). OSPAN is a computerized measure of working memory capacity which captures simultaneous maintenance and processing. Participants are presented with a series of arithmetic problems such as “Does (6 × 3) + 2 = 24?”, accompanied by a word (e.g., “audience”). The task is to correctly answer each problem and retain the word for later recall. Sequence length increased from two to seven. Our version of the task includes six trials with three sequences each. Working memory capacity is calculated as the sum of correctly recalled words only in the sequences where all words were recalled correctly.
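The scoring rule described above (credit only for sequences in which every word was recalled) can be sketched as follows. This is an illustration under stated assumptions, not the authors' scoring script: the function name `ospan_score`, the `require_order` switch, and all words other than "audience" are hypothetical, and the text does not say whether recall order had to match.

```python
from typing import Sequence


def ospan_score(
    presented: Sequence[Sequence[str]],
    recalled: Sequence[Sequence[str]],
    require_order: bool = True,  # assumption: the text does not specify this
) -> int:
    """Sum the lengths of the sequences in which every word was recalled."""
    score = 0
    for target, response in zip(presented, recalled):
        if require_order:
            perfect = list(response) == list(target)
        else:
            perfect = sorted(response) == sorted(target)
        if perfect:
            score += len(target)  # all words in this sequence count
    return score


# Hypothetical data: three sequences of length two, three, and four.
presented = [
    ["audience", "river"],
    ["lamp", "stone", "cloud"],
    ["tree", "house", "pencil", "door"],
]
recalled = [
    ["audience", "river"],       # perfect recall
    ["lamp", "cloud", "stone"],  # all words, wrong order
    ["tree", "house", "pencil"], # one word missing
]

strict_score = ospan_score(presented, recalled)
lenient_score = ospan_score(presented, recalled, require_order=False)
print(strict_score, lenient_score)
```

Under the strict reading only the first sequence earns credit; under the lenient reading the second does as well. The third never counts, because one word is missing.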

Computerized Reasoning Task. The task consists of three parts with three trials each. Each trial requires remembering the presented information and recalling it. In the first and second parts of the task, participants are presented with short narrative stories (either written or spoken; order was counterbalanced between participants) representing premises, that is, spatial relations between three targets (closer – in the middle – further; taller – middle – shorter; to the left – in the middle – to the right). Relations are always presented for adjacent pairs (e.g., left end point – middle, middle – right end point); the relation between the two end points is never presented and thus needs to be inferred. An example of such a narrative story is “…Kate was a slim, long-haired blond girl. She wore a beautiful silk dress and pearl jewelry. Ann, dressed in a light-colored suit, was slimmer than her friend and hardly even reached her ear. Margaret was the third candidate; she wore her jet-black hair loose. Seeing her look down on Kate, one could assume that she would play basketball…” as the premises, and “It appeared that Margaret was taller than Ann” as the possible inference (see Appendix for a full example). Spoken text was delivered at 120 words/minute in a male voice through the text-to-speech computer program IVONA. In the third part of the task, relations between objects are presented in a formal, symbolic way (e.g., “C is lower than Z, L is higher than Z”) and participants are asked whether a particular inference (e.g., L is higher than C) is warranted. Once the relational information for a particular trial has been presented, participants answer a series of five true/false questions. Three types of questions are presented: (a) correct information (either a concise summary or a copy of one of the premises); (b) paraphrases of the premises (one true and one false); (c) inferences concerning the end points (one true and one false). Analyses were performed on corrected accuracy (hits minus false alarms).
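Corrected accuracy can be illustrated with a short sketch. We assume here that "hits minus false alarms" refers to the hit rate (proportion of true statements accepted) minus the false-alarm rate (proportion of false statements accepted), which would be consistent with group means between 0 and 1; the text does not spell this out, and the function name `corrected_accuracy` is ours.

```python
from typing import Sequence


def corrected_accuracy(truth: Sequence[bool], answers: Sequence[bool]) -> float:
    """Hit rate minus false-alarm rate for a set of true/false questions.

    truth[i]   -- whether statement i is actually true
    answers[i] -- whether the participant responded "true" to statement i
    """
    n_true = sum(truth)
    n_false = len(truth) - n_true
    if n_true == 0 or n_false == 0:
        raise ValueError("need at least one true and one false statement")
    hits = sum(1 for t, a in zip(truth, answers) if t and a)
    false_alarms = sum(1 for t, a in zip(truth, answers) if not t and a)
    return hits / n_true - false_alarms / n_false


# One trial with five questions: correct information, true paraphrase,
# false paraphrase, true inference, false inference.
truth = [True, True, False, True, False]
# Hypothetical participant who accepts everything except the false inference.
answers = [True, True, True, True, False]

score = corrected_accuracy(truth, answers)
print(score)  # hit rate 3/3 minus false-alarm rate 1/2
```

A participant who answers "true" to every question scores 0 on this measure, as does one who always answers "false", so response bias alone cannot inflate performance.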

A total of six short stories was used for the narrative portion, presenting three different spatial relations (two tasks for each type of relation). These six stories were randomly divided into two sets, each including three different spatial relations. Study participants were randomly assigned to a particular set of tasks either in the form of written text or recorded audio (in the Polish language). Each participant was subjected to all of the stories, but in random order and through randomized assignment of a particular story to the text/audio form. Questions were also presented in random order. The task was preceded by a training session with a written story.

Procedure

Participants were informed about the purpose of the study and filled out a consent form. Following that, the oldest group completed the MMSE. All participants performed the computerized reasoning tasks. Participants were randomly assigned to one of three conditions. In the control condition, neither accuracy nor speed was mentioned. In the accuracy-emphasis condition, participants were instructed to focus on accuracy (“It is very important to perform the tasks accurately. Please read the tasks insightfully and answer the questions very carefully. Don’t hurry, think for as long as you need to be completely sure that your answer is correct.”). In the speed-emphasis condition, participants were instructed to focus on the speed of task completion (“It is not just correctness that counts - the pace of completing tasks is extremely important. Please read the tasks quickly and answer the questions swiftly. Please do not hesitate and make decisions quickly.”). After completing the reasoning tasks, participants were administered the DSST and OSPAN.

Results

Results are presented in Figure 1. First, we obtained a number of anticipated effects, suggesting that our data generally replicate what can be expected from the literature. Age had a negative effect on reasoning performance, F (2,261) = 7.26, p < .001, ηp2 = .177. Younger adults had the highest level of performance (average corrected accuracy of 0.66); older adults the lowest (0.39); middle-aged adults were situated in between (0.44). Also as expected, the emphasis manipulation proved significant, F (2,261) = 3.97, p = .020, ηp2 = .029, with the speed-emphasis condition leading to lower corrected accuracy (0.42) than the control condition (0.52) and the accuracy-emphasis condition (0.54). The close proximity of the two latter numbers suggests that, when left to their own devices, participants generally favoured being correct over being fast. Also in line with the literature, we found strong and negative correlations between chronological age and speed of processing, r = −.79, and between chronological age and working memory, r = −.62.
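For readers who want to relate an F ratio to its effect size, partial eta squared can be recovered from F and its degrees of freedom with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the emphasis effect reported above; the function name is ours, and because the input F is rounded, the result should only be expected to agree approximately with the published value.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Emphasis manipulation as reported in the text: F(2, 261) = 3.97.
eta_emphasis = partial_eta_squared(3.97, 2, 261)
print(f"{eta_emphasis:.4f}")  # close to the reported .029
```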

Figure 1. Results of GLM analysis of reasoning performance including the emphasis manipulation (speed emphasis, control, accuracy emphasis) and age (younger, middle-aged, older) as the between-subject factors and the three types of materials (formal, written narrative, spoken narrative) as the within-subject factor.

Second, and crucially, using formal materials severely underestimated the ability of older adults compared to an informal presentation. Even more noteworthy, when informal spoken narratives were used and all time pressure was removed, adult age differences were effectively abolished. Statistical tests bear this out. Although no main effect of type of materials emerged (average corrected accuracy was 0.47 for formal problems, 0.52 for narrative written material, and 0.50 for narrative spoken material), the age by type of material interaction was significant, F (2,261) = 8.90, p < .001. This interaction indicates that the way the problems were presented affected performance differently for the different age groups. Follow-up repeated-measures ANOVAs were conducted within each age group with stimulus materials as the within-subject factor and type of emphasis as the between-subject factor. For younger adults, type of material had a significant effect, F (2,174) = 6.90, p = .001: Performance was highest when the tasks were presented as formal logical problems, and lowest when they were presented as spoken narratives. When the conditions are ordered as presented in the figure (i.e., formal, narrative-written, narrative-spoken), a linear effect with a negative slope emerged, F (1,89) = 12.24, p = .001. Older adults, too, showed a significant effect, F (2,174) = 3.06, p = .049, but in the opposite direction: Older adults’ performance was highest for spoken narratives and lowest for formal problems, and the linear effect had a positive slope, F (1,89) = 6.33, p = .014. Middle-aged adults were situated in between: For them, the effect of type of material was not significant, F (2,174) = 0.73, p = .48. When just the narrative-spoken data are analysed, performance of older adults was statistically indistinguishable from that of younger adults in the main condition of interest, that is, the accuracy-emphasis condition, t(df = 58) = 0.61, p = .55. The speed-emphasis condition yielded the same result, t(df = 58) = 1.16, p = .25; an age difference, however, remained for the control condition, t(df = 58) = 2.42, p = .018.

Table 1 shows the correlations of age, speed of processing, and working memory with performance on each type of material in the three conditions. Generally, speed of processing and working memory are correlated with reasoning performance, with the notable exception of the informal-spoken tasks in the speed-emphasis and accuracy-emphasis conditions, where age likewise did not correlate with reasoning performance.

Table 1. Pearson correlations between age, speed of processing, and working memory and performance on each of the types of materials in the three conditions.

Discussion

These results are remarkable. Of particular interest is the dissociation between age and type of materials. When we equated participants on emphasis, that is, when we encouraged them to prioritize either speed or accuracy, younger adults performed best when the material was presented as a problem of formal logic and worst when it was presented as an informal narrative. Older adults showed exactly the opposite pattern, to the point where age differences effectively disappeared when presentation was informal and spoken. There are several possible, not necessarily mutually exclusive, explanations for these findings.

The first concerns the living ecology, captured in aging theory as the disuse hypothesis (e.g., Baron & Cerella, 1993), which states that observed age-related differences in cognitive performance might be partially due to older adults being less likely to engage in the sort of cognitive operations and/or engage with the type of materials used in standard tests. Formal reasoning problems tend to be encountered more often (if not exclusively) in the classroom, and so younger adults, who are either still in school or have left school more recently, would be more familiar with such problems, and therefore outperform older adults.

Another possible explanation could be motivational: Older adults often engage cognitive resources selectively, investing less effort in tasks that are assumed to incur a cost that is perceived as too high (Hess, 2014). Under this assumption, formal reasoning (often perceived as hard) is simply not worth the effort for older adults, but tasks presented as social narratives are seen as worth engaging in.

A third possible explanation concerns the differential resource-dependence of the different types of materials. Formal transitive problems require the involvement of resources such as executive control and working memory, as has indeed been found (e.g., Vandierendonck & De Vooght, 1997). As mentioned earlier, this dependence on online processing resources is precisely the reason why psychometricians prefer to use abstract materials. In the present data set, working memory did indeed correlate with performance on the formal reasoning task, but less so with performance on the informal-spoken task, suggesting that the latter at least partially circumvents the use of resources. One potential reason could be the visual imagery allowed by our materials – reading that Margaret is as tall as a basketball player and literally looking down on Kate can create a striking visual image that is easily combined with the image of Ann hardly reaching Kate’s ear, allowing a resource-free read-out and thus likely making for easy memory encoding and retrieval.

Curiously, however, the informal-written task still shows resource-dependency. This suggests that it is not the informal or imagery-affording nature of the material per se that makes the informal-spoken condition less dependent on resources and (likely therefore) less age-dependent. We think a potential reason might be that older adults tend to emphasize the early, perceptual stages of processing, especially in the auditory modality (this has been labeled the ‘effortfulness hypothesis’; e.g., Schneider & Pichora-Fuller, 2000; Wingfield et al., 2005). If so, our older adults may have paid particular attention to the spoken (as opposed to written) narratives, and consequently formed a strong mental image that made performance on the reasoning task resource-independent. A similar amount of perceptual effort would be less likely to be invested in the written-narrative task, because of the possibility for re-reading and the easier correction for age-related visual decrements than auditory decrements. In this case, then, the spoken narrative guides older adults to an efficient reasoning strategy, in accordance with the oft-stated view that environmental support (e.g., Craik, 1994) or scaffolding (e.g., Reuter-Lorenz & Park, 2010) might serve to ameliorate or even (as in this case) abolish age-related differences in fluid aspects of cognition.

To summarize, we found that the use of formal materials and statements in linear reasoning resulted in the usual (strong) age-related differences favouring the young. In contrast, when informal spoken narratives were used and additionally all time pressure was removed, adult age differences were effectively abolished, possibly because the tasks now allow for easier encoding into and retrieval from episodic memory. This suggests that older adults’ real-life reasoning abilities are seriously underestimated when standard testing procedures are used.

Author contributions

G. Sedek and K. Lengsfeld developed the study concept and design. Testing and data collection were performed by K. Lengsfeld. All authors performed the data analysis and interpretation. P. Verhaeghen drafted the manuscript, K. Rydzewska and G. Sedek provided revisions. All authors approved the final version of the manuscript for submission.

Acknowledgments

The authors thank Agnieszka Grodecka for assistance with data collection.

Disclosure statement

The authors declare no competing interests.

Additional information

Funding

This work was supported by grant 2018/29/B/HS6/02604 from the National Science Centre of Poland.

References

Appendix

Samples from the reasoning tasks

NARRATIVE TASK:

1. Things became interesting when the three candidates for the main prize were presented. Kate was a slim, long-haired blond girl. She wore a beautiful silk dress and pearl jewelry. Ann, dressed in a light-colored suit, was slimmer than her friend and hardly even reached her ear. Margaret was the third candidate; she wore her jet-black hair loose. Seeing her look down on Kate, one could assume that she would play basketball, but only someone who didn’t notice her extremely high-heeled shoes would think so.

Questions [Is it TRUE or FALSE?] presented in random sequence:

  • Ann, dressed in a light-colored suit, was slimmer than Kate and hardly even reached her ear. [correct information]

  • Kate appeared to be shorter than Margaret, who wore high-heeled shoes. [true paraphrase]

  • Ann was taller than petite Kate, who reached up to her shoulder with the top of her head. [false paraphrase]

  • It appeared that Margaret was taller than Ann. [true inference]

  • One could see Ann looking down upon Margaret, who wore her jet-black hair loose. [false inference]

FORMAL TASK:

1. C is lower than Z

L is higher than Z

Questions [Is it TRUE or FALSE?] presented in a random sequence:

  • L is higher than Z [correct information]

  • Z is higher than C [true paraphrase]

  • Z is higher than L [false paraphrase]

  • L is higher than C [true inference]

  • C is higher than L [false inference]
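
The truth value of each inference question in the formal task follows mechanically from the two premises. The sketch below derives every "higher than" relation implied by a set of adjacent-pair premises through a transitive closure and then checks the two end-point statements. It illustrates only the logic of the item, not the software used in the study; the function name `higher_than_closure` is hypothetical.

```python
from itertools import product
from typing import Set, Tuple

Pair = Tuple[str, str]  # (x, y) means "x is higher than y"


def higher_than_closure(premises: Set[Pair]) -> Set[Pair]:
    """Return all 'x is higher than y' pairs implied by the premises."""
    closure = set(premises)
    changed = True
    while changed:
        changed = False
        # Chain x > y and y > z into x > z until nothing new appears.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Premises of the formal item: "C is lower than Z", "L is higher than Z".
premises = {("Z", "C"), ("L", "Z")}
order = higher_than_closure(premises)

l_above_c = ("L", "C") in order  # end-point inference that follows
c_above_l = ("C", "L") in order  # reversed statement, does not follow
print(l_above_c, c_above_l)
```

Only "L is higher than C" is derivable from the premises; the reversed statement is not, which is what makes it the false inference.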