Research Article

The Flynn effect in estimates of premorbid intellectual functioning in an Australian sample

Article: 2001297 | Received 09 Sep 2021, Accepted 28 Oct 2021, Published online: 16 Feb 2022

ABSTRACT

Objective

Although the Flynn effect is a well-recognised phenomenon affecting tests of cognitive ability, little research has examined its relevance to tests of premorbid ability. We therefore investigated whether estimated full scale IQ (FSIQ) scores from four commonly used word reading tasks (the NART, the NART2, the WTAR, and the TOPF) are influenced by the Flynn effect.

Method

We administered the NART, WTAR, and TOPF to 120 healthy community-dwelling adults. From the raw scores, we calculated estimated FSIQ scores using the predictive models published in the relevant manuals and compared these with scores obtained on the WASI-II.

Results

Estimated FSIQ increased linearly with the age of each reading task's norms: the oldest task, the NART, returned the highest scores and the most recent, the TOPF, the lowest. The NART, WTAR US, and TOPF US overestimated intellectual ability relative to current functioning measured by the WASI-II.

Conclusions

Our findings indicate that tests of premorbid functioning are subject to the Flynn effect, and clinicians should exercise caution in using older word reading tasks such as the NART. Our results support the need for Australian standardisations of these instruments.

KEY POINTS

What is already known about this topic:

  1. The Flynn effect is the well-known observation that measured population intelligence has been increasing by approximately 3 IQ points per decade.

  2. Word reading tasks reliably and validly estimate premorbid intellectual functioning in patients with neuropsychological impairment.

  3. There is some evidence indicating word reading tasks might be impacted by the Flynn effect.

What this topic adds:

  1. We replicated previous research and found results consistent with the Flynn effect in estimates of premorbid intellectual functioning across the NART, NART2, WTAR, and TOPF.

  2. Our results confirmed that older tests such as the NART are likely to significantly overestimate premorbid intellectual functioning and should be used with caution.

  3. Differences in predicted FSIQ scores based on UK and US norms point to a need for future Australian standardisations of these tests.

Neuropsychological assessment following brain injury or the onset of disease such as dementia is of growing importance given demographic change and the increased prevalence of these conditions (Australian Institute of Health and Welfare, 2020; Nguyen et al., 2016; Pozzato et al., 2019). Specialised assessment is required to identify cognitive impairment, inform diagnosis, and contribute to rehabilitation planning and injury compensation processes (Australian Commission on Safety and Quality in Health Care, 2017; Lezak et al., 2012). There is growing demand for high-quality assessment of cognitive functioning (Howieson, 2019), and current policy in Australia aims to raise awareness of, and support best-practice approaches to, cognitive impairment in health care settings (https://cognitivecare.gov.au/).

Using the deficit measurement paradigm, clinicians assessing individuals with cognitive impairment first estimate the person’s premorbid level of intellectual functioning (Lezak et al., 2012) and then compare current cognitive functioning with this estimate. A number of neuropsychological tests have been developed to help estimate premorbid intellectual functioning accurately, and in current research and practice word reading tasks are typically used for this purpose. These tasks rest on three observations: vocabulary knowledge is relatively resilient to brain impairment; vocabulary correlates robustly with scores on intelligence tests; and vocabulary can be measured using a list of irregularly spelled words (Bright & van der Linde, 2020; Crowe, 2010; Holdnack & Drozdick, 2009).

Word reading tasks

A range of word reading tasks is in use internationally and in Australian clinical practice and research (e.g., Bright & van der Linde, 2020; Skilbeck et al., 2013; Thomas et al., 2020). Those most frequently used in the Australian context include the National Adult Reading Test (NART; Nelson, 1982), restandardised as the NART2 in 1991 (Nelson & Willison, 1991; Strauss et al., 2006); the Wechsler Test of Adult Reading (WTAR; Wechsler, 2001); and the Test of Premorbid Functioning (TOPF; Wechsler, 2009, 2011a).

The NART was developed in the UK to predict Wechsler Adult Intelligence Scale (WAIS) full scale IQ (FSIQ), verbal IQ, and performance IQ. Participants read aloud a list of 50 irregularly spelled words, and the number of items pronounced incorrectly (the error score) is used to estimate premorbid IQ. The original NART standardisation study, published in 1982, tested 120 adults (age range 20–70 years) with a mean FSIQ of 109.2. The NART was restandardised against the WAIS-R in 1991 in a study of 182 adults (ages 18–70, mean FSIQ = 107.4); this version is referred to as the NART2 (or NART-R by some authors; Nelson & Willison, 1991). Although both instruments comprise the same 50 items, the NART models predict WAIS scores, whereas the NART2 models predict WAIS-R scores (Strauss et al., 2006). The standardisation studies showed the NART and NART2 to have strong psychometric properties, and the instrument has also been shown to be a good predictor of general intelligence (Crawford et al., 1989). The NART and NART2 should be distinguished from word reading tasks developed in North America (e.g., the North American Adult Reading Test, NAART; Blair & Spreen, 1989).

The WTAR (Wechsler, 2001) was developed in the US alongside the third editions of the WAIS and the Wechsler Memory Scale (WAIS-III and WMS-III). It consists of 50 irregularly spelled words, and the number of words pronounced correctly is used to estimate premorbid IQ. In the standardisation study published in 2001, WTAR scores together with demographic variables (education, ethnicity, sex) accounted for 51% of the variance in FSIQ scores (Wechsler, 2001).

The TOPF was released in the US in 2009 with the fourth edition of the Wechsler scales (Holdnack & Drozdick, 2009; Wechsler, 2009). The instrument includes 70 irregularly spelled words ordered by difficulty. The TOPF-UK was developed to predict WAIS-IV and WMS-IV scores in a UK sample (Wechsler, 2011a); it comprises the same 70 items as the original US version, but with eight words in a different order. The number of words pronounced correctly is used for scoring, and the test includes a discontinue rule whereby administration stops after five incorrect answers. A model combining the TOPF with demographics was found to account for 57% of the variance in Wechsler Abbreviated Scale of Intelligence – Second Edition scores (WASI-II; Wechsler, 2011b), and up to 65% of the variance in WAIS-IV scores (Holdnack & Drozdick, 2009).
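The two scoring conventions described above (an error count for the NART; a count of correct responses for the WTAR and TOPF, with the TOPF's discontinue rule) can be sketched as follows. This is a minimal illustration, not the manuals' scoring procedure: the function names are hypothetical, responses are assumed to be recorded in administration order as True (correct) or False (incorrect), and the discontinue rule is assumed to count consecutive errors, which should be checked against the test manual.

```python
from typing import Sequence


def nart_error_score(responses: Sequence[bool]) -> int:
    """NART-style score: the number of words pronounced incorrectly.

    `responses` holds True for a correct pronunciation, False otherwise.
    """
    return sum(1 for correct in responses if not correct)


def correct_score_with_discontinue(responses: Sequence[bool],
                                   discontinue_after: int = 5) -> int:
    """TOPF-style score: the number of words pronounced correctly.

    Hypothetical helper. Assumption: administration stops once
    `discontinue_after` consecutive errors occur, so later items are
    not scored. Responses must be in administration order.
    """
    score = 0
    consecutive_errors = 0
    for correct in responses:
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors >= discontinue_after:
                break  # discontinue rule reached
    return score


# Hypothetical response pattern for ten items (not study data).
responses = [True, True, False, True, False, False, False, False, False, True]
print(nart_error_score(responses))                # 6
print(correct_score_with_discontinue(responses))  # 3 (the final item is never reached)
```

The contrast matters for the predictive models: a higher NART error score implies lower estimated ability, whereas a higher WTAR or TOPF score implies higher estimated ability.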

In our recent research examining these three tests in Australia, we found that the NART2, WTAR, and TOPF, together with demographics, accounted for more modest proportions of variance in WASI-II scores (.26–.36; Thomas et al., 2020, 2021). We also observed significant departures from the word order used in both the UK and US versions, with 32 words out of order by more than five places relative to the US order and 30 relative to the UK order. Together, these findings suggest that the US and UK norms for these word reading tasks, developed between 12 and 40 years ago, may not be appropriate in the current Australian context.

The Flynn effect

The Flynn effect is the well-known observation that IQ scores increase by approximately three points per decade. Flynn (1984) initially observed gains ranging from .25 to .44 IQ points per year in data from 73 studies using the Stanford-Binet and Wechsler intelligence tests. These findings have since been replicated in a number of studies (Flynn, 1987; Trahan et al., 2014). Trahan et al.’s (2014) meta-analysis examined 285 studies comprising 378 relevant comparisons and observed an overall Flynn effect of 2.31 points per decade. When only Wechsler/Binet tests normed since 1972 were included, as in Flynn’s original study, the estimate rose to 2.93 points per decade, closer to the usual figure of 3. Greater gains have been observed on nonverbal measures associated with fluid intelligence, such as Raven’s Progressive Matrices, than on tests of crystallised intelligence (Flynn, 2012). Given that word reading tasks primarily tap crystallised intelligence, we might expect a modest Flynn effect in scores on these tasks.

Models predicting premorbid IQ take a measure of intellectual functioning as their dependent variable, predicted by word reading task performance and demographics, with most of the variance explained by word reading performance. These models were developed to predict scores on particular IQ tests (e.g., the NART predicted WAIS scores, and the TOPF predicted WAIS-IV scores). Although these are different tests, they all measure the underlying construct of intellectual functioning and have been shown to be very strongly correlated. The Flynn effect is said to act upon measured performance on tests of intellectual functioning, and it may also act on vocabulary knowledge and performance on word reading tasks. If the Flynn effect acts upon the individual components of the predictive models, the estimates these models produce may be expected to be similarly affected.
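The general form of these predictive models can be sketched as a linear equation. This is a minimal illustration under stated assumptions: the class name is hypothetical, and all coefficients below are placeholders chosen for readability, not the published NART, WTAR, or TOPF equations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PremorbidModel:
    """Linear premorbid-IQ model:
    predicted FSIQ = intercept + reading_coef * reading + education_coef * education.

    All coefficients used below are HYPOTHETICAL placeholders; they are not
    the published NART, WTAR, or TOPF equations.
    """
    intercept: float
    reading_coef: float
    education_coef: float = 0.0

    def predict(self, reading_score: float, years_education: float = 0.0) -> float:
        return (self.intercept
                + self.reading_coef * reading_score
                + self.education_coef * years_education)


# Error-score model (more errors -> lower estimate), as with NART-style scoring.
error_model = PremorbidModel(intercept=125.0, reading_coef=-0.8)
# Correct-score model with a demographic term, as with WTAR/TOPF-style scoring.
correct_model = PremorbidModel(intercept=70.0, reading_coef=0.6, education_coef=1.0)

print(error_model.predict(20))        # 109.0
print(correct_model.predict(40, 15))  # 109.0
```

Because the intercept and slopes are fitted against an IQ test normed in a particular year, any Flynn-related inflation in that test's norms would be carried into every estimate the model produces.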

There is some evidence of a plateau, and even a decline, in the Flynn effect in some countries (Dutton et al., 2016; Platt et al., 2019; Sundet et al., 2004). A review of Flynn effect studies found evidence of declining IQ scores in a number of Scandinavian countries (Dutton et al., 2016). For instance, data from 407,166 individuals completing military service in Finland showed an increase of four IQ points between 1988 and 1997, followed by a decline of around two points between 1997 and 2009 (Dutton & Lynn, 2013).

Evidence of the Flynn effect in estimates of premorbid cognitive performance was found in a US study of 189 veterans who received a neuropsychological evaluation between 2011 and 2016 (Kirton et al., 2020). Specifically, estimates derived from a demographic model developed in 1981 to predict the WAIS-R were significantly higher than those from the more recently developed TOPF. When an adjustment of .23 FSIQ points per year was applied to the demographic estimate to account for the Flynn effect, no difference was observed between the demographic model and the TOPF.

This question was also considered in an Australian study that investigated whether the Flynn effect was evident in the NART, NART2, WTAR, and TOPF (Norton et al., 2016). A sample of 95 adults recruited from the community completed these tests as well as the WAIS-IV, and the predicted WAIS-IV scores derived from each test of premorbid functioning were compared. The predicted WAIS-IV score was 110.08 (SD = 4.25) for the NART, 104.25 (SD = 6.47) for the NART2, 104.42 (SD = 6.37) for the WTAR, and 102.16 (SD = 10.45) for the TOPF. These findings were consistent with the Flynn effect: for instance, a significant 5.8-point differential was observed between NART and NART2 scores. Similarly, the significant 2.26-point difference between the WTAR and TOPF was consistent with the Flynn effect, albeit slightly smaller than the 3.3-point differential expected from the publication years of these word reading tasks. No significant difference was observed between the NART2 and WTAR.

The current study

The Flynn effect remains an important consideration in the assessment of premorbid intellectual functioning, especially where older norms are used. For instance, given that the NART was published in 1982 (Nelson, 1982) and the NART2 in 1991 (Nelson & Willison, 1991), predicted IQ scores from these two measures would be expected to differ by around 3 points if the Flynn effect holds for tests of premorbid functioning. Moreover, a clinician using the NART in 2021 might overestimate premorbid IQ by close to 12 points. While Norton et al.’s study was important as the first to examine the impact of the Flynn effect on estimates of premorbid IQ, it used a small, young, and relatively well-educated sample, and the authors therefore called for replication. Kirton et al.’s study provided this replication but examined only the TOPF. The current study replicates Norton et al.’s study with a different, larger, and more diverse sample.
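The expected differentials in the preceding paragraph follow from simple arithmetic. A minimal sketch, assuming a constant linear gain of 3 points per decade and treating publication year as the norming year; the function name is hypothetical, and given the evidence of plateaus and reversals discussed above, the result should be read as a rough heuristic only.

```python
def expected_flynn_gain(norm_year: float, use_year: float,
                        points_per_decade: float = 3.0) -> float:
    """Expected inflation (IQ points) of norms from `norm_year` when used in `use_year`.

    Hypothetical helper. Assumes a constant, linear Flynn effect and uses
    publication year as a proxy for norming year; both are simplifications.
    """
    return (use_year - norm_year) * points_per_decade / 10.0


print(expected_flynn_gain(1982, 1991))  # NART vs NART2: about 2.7 points
print(expected_flynn_gain(1982, 2021))  # NART used in 2021: about 11.7 points
```

Kirton et al.'s (2020) adjustment of .23 points per year corresponds to calling the same function with `points_per_decade=2.3`.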

The aims of the current study therefore were to:

  1. Test whether the Flynn effect was evident in predicted FSIQ scores derived from the NART, NART2, WTAR, and TOPF; and

  2. Determine whether the norms derived from the original standardisation studies were valid for use in current clinical practice.

Based on the publication years of the word reading tasks, we hypothesised that the highest predicted FSIQ scores would be derived from NART scores, followed by the NART2, WTAR, and TOPF. Further, given that participants in our study completed the WASI-II in 2016, we expected their WASI-II scores to be lower than the predicted scores derived from all the reading tests.

Method

Participants

Participants were a convenience sample of 145 healthy adults recruited from the general population via email and social media posts. All nominated English as their first language and reported no history of serious neurological injury or disease. Given concerns about the accuracy of reading tests for individuals with either high or low IQ (Bright et al., 2018; Mathias et al., 2007), we excluded individuals with an FSIQ below 79 or above 120, leading to a final sample of 120. Of these, 112 (93.3%) resided in NSW, with the remainder from Queensland (N = 6), Victoria (N = 1), and Western Australia (N = 1). The sample included 72 females (60%) and 48 males (40%). Mean age was 35.14 years (SD = 13.22, range 18–68). Most participants were born in Australia (95%), with the remainder born in the UK, Malaysia, Nepal, and Thailand. Participants had on average 15 years of education (SD = 2.44, range 7–22 years), equivalent to a Bachelor level degree.

Measures

Participants answered demographic questions relating to their age, gender, country of birth, level of education, and employment history. They also completed the Wechsler Abbreviated Scale of Intelligence – Second Edition (Wechsler, 2011b), the National Adult Reading Test (Nelson & Willison, 1991), the Wechsler Test of Adult Reading (Wechsler, 2001), and the Test of Premorbid Functioning (Wechsler, 2009).

Wechsler Abbreviated Scale of Intelligence – Second Edition (WASI-II, Wechsler, 2011b)

The WASI-II is an abbreviated version of the more extensive WAIS-IV, developed to provide quick and accurate measurement of intellectual functioning where administration length or client fatigue might prevent use of the longer test. Subtests include Similarities, Matrix Reasoning, Block Design, and Vocabulary. The WASI-II provides Full Scale IQ (FSIQ), Verbal Comprehension (VCI), and Perceptual Reasoning (PRI) scores. Only FSIQ-4 scores were used in the current study; this score has a population mean of 100 (SD = 15).

The WASI-II has been shown to have excellent internal consistency for each of the three index scores: VCI (r = .95), PRI (r = .94), and FSIQ (r = .97). Test-retest reliability of the scaled scores has been reported as very good, ranging from .90 to .95, and interscorer reliability was also very good, ranging from .94 to .99 across all subtests. Correlations with the WAIS-IV FSIQ were strong, with FSIQ-4 correlating at r = .92, FSIQ-2 at r = .86, VCI at r = .88, and PRI at r = .87.

National Adult Reading Test (NART, Nelson, 1982)

This test comprises 50 irregularly spelled words. Participants read each word aloud, and each pronunciation is scored as either correct or incorrect. Pronunciation on all word reading tasks in the current study was assessed according to the Macquarie Dictionary (Macquarie Dictionary, 2013). The score is the total number of words pronounced incorrectly. The NART showed excellent internal reliability in the original norming study of 120 individuals with extracerebral disorders (α = .93). The restandardised NART2 (Nelson & Willison, 1991) has excellent internal consistency (α = .93), interrater reliability (r = .96–.97), and test-retest reliability (r = .98; Nelson & Willison, 1991).

Wechsler Test of Adult Reading (WTAR, Wechsler, 2001)

The WTAR also comprises 50 irregularly spelled words, and the total score is the number of words pronounced correctly. The WTAR has excellent internal consistency (α = .87 to .95) and test-retest reliability (r = .90 to .94; Wechsler, 2001). It has been shown to be strongly correlated with measures of intellectual functioning (e.g., r = .73 with WAIS-III FSIQ and r = .74 with VCI; Wechsler, 2001).

Test of Premorbid Functioning (TOPF, Wechsler, 2009)

This test comprises 70 irregularly spelled words. As with the NART, participants read each word aloud and each pronunciation is scored as either correct or incorrect. The TOPF has excellent internal reliability (r = .92–.99) across different age ranges and test-retest reliability of r = .89–.95 (Holdnack & Drozdick, 2009). It has been found to account for as much as 65% of the variance in WAIS-IV index scores.

Procedure

Ethics approval was granted by the Charles Sturt University Human Research Ethics Committee, protocol H16055. Data for this study were collected by student researchers and psychologists, with training and supervision provided by an experienced Clinical Psychologist (MT). Participants provided demographic details and completed the WASI-II and each of the word reading tasks. Items from the three word reading tasks were combined into a single list of 127 words (after common items were removed), which was presented to participants in order of difficulty. Responses to the word reading tasks were recorded and later rescored by a Clinical Psychologist and a Clinical Neuropsychologist; these rescored data are used here. WASI-II forms were scored at the time of administration and later reviewed by the supervising Clinical Psychologist.

Analyses

The focus of this study was on estimates of FSIQ calculated using the predictive models published in the NART, NART2, WTAR, and TOPF manuals. A one-way repeated measures analysis of variance was performed to determine whether the mean predicted FSIQ scores derived from these tests differed. We also tested for differences between the predicted scores from each test and our observed WASI-II FSIQ scores. Cohen’s d effect sizes were calculated for all mean differences. In calculating these effect sizes, we used the population standard deviation of 15 rather than the pooled standard deviation, because 15 is the recognised population standard deviation on widely used measures of intellectual functioning and makes the magnitude of our effect sizes most easily interpretable. We also suspected that pooled standard deviations would inflate the effect sizes because of the restricted FSIQ range (and consequently smaller standard deviations) in this study. For instance, the difference between the WASI-II and NART yielded an effect size of .53 (moderate) using the population standard deviation, but .79 (large) using the pooled sample standard deviation.
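The reasoning about effect sizes can be made concrete with a short simulation. This is a minimal sketch, not a reanalysis of the study data: it assumes normally distributed IQ scores (mean 100, SD 15), the helper names are hypothetical, and the simulated SD only illustrates how excluding scores outside 79–120 shrinks the spread, which would inflate d if the sample SD were used as the denominator.

```python
import math
import random
import statistics

POPULATION_SD = 15.0  # conventional SD of Wechsler-type IQ scores


def cohens_d(mean_diff: float, sd: float = POPULATION_SD) -> float:
    """Standardised mean difference; uses the population SD of 15 by default."""
    return mean_diff / sd


def pooled_sd(sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Pooled SD of two sets of scores, for comparison with the population SD."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def restricted_range_sd(lo: float = 79.0, hi: float = 120.0,
                        n: int = 20000, seed: int = 1) -> float:
    """SD of simulated IQ scores (mean 100, SD 15) kept only if lo <= score <= hi."""
    rng = random.Random(seed)
    scores = (rng.gauss(100.0, POPULATION_SD) for _ in range(n))
    kept = [s for s in scores if lo <= s <= hi]
    return statistics.stdev(kept)


print(round(cohens_d(7.95), 2))  # 0.53, the WASI-II vs NART figure reported above
sd_restricted = restricted_range_sd()
print(round(sd_restricted, 1))   # well below 15 (roughly 10 to 11)
print(round(cohens_d(7.95, sd_restricted), 2))  # correspondingly larger d
```

Dividing the same mean difference by a smaller, range-restricted SD yields a larger d, which is the inflation the paragraph above anticipates.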

Some violations of normality were observed for the TOPF (UK norms) and WTAR (UK and US norms) predicted FSIQ scores and for the observed WASI-II scores, and boxplots showed some outliers. Closer inspection showed these violations to be relatively modest (skew range Z = −1.34 to 4.15), so no transformations of the data were undertaken. Mauchly’s test indicated that the assumption of sphericity was violated, so the Greenhouse-Geisser correction was applied (ε = .388).
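For readers less familiar with the correction, the sketch below estimates a Greenhouse-Geisser ε from an n × k table of repeated scores and uses it to scale the degrees of freedom. It is a pure-Python illustration on toy data, not the software used in this study, and the function names are hypothetical. ε equals 1 when sphericity holds and falls towards its lower bound of 1/(k − 1) as the violation becomes more severe.

```python
from typing import List, Sequence, Tuple


def covariance_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) between the k columns of an n x k table."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(k)]
            for i in range(k)]


def greenhouse_geisser_epsilon(data: Sequence[Sequence[float]]) -> float:
    """Greenhouse-Geisser epsilon for an n-subjects x k-conditions table.

    Double-centring the covariance matrix leaves the same non-zero eigenvalues
    as projecting it onto orthonormal contrasts, so
    epsilon = trace(D)^2 / ((k - 1) * trace(D @ D)).
    """
    s = covariance_matrix(data)
    k = len(s)
    row_means = [sum(row) / k for row in s]  # equal to column means (s is symmetric)
    grand_mean = sum(row_means) / k
    d = [[s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
         for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    trace_of_square = sum(v * v for row in d for v in row)  # valid because d is symmetric
    return trace ** 2 / ((k - 1) * trace_of_square)


def corrected_df(epsilon: float, k: int, n: int) -> Tuple[float, float]:
    """Greenhouse-Geisser-corrected (numerator, denominator) degrees of freedom."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Toy data: every row is a permutation of (1, 2, 3), so sphericity holds exactly.
spherical = [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
print(round(greenhouse_geisser_epsilon(spherical), 6))  # 1.0
# Toy data: all variation lies on one contrast, the maximum violation for k = 3.
one_contrast = [[x, -x, 0] for x in (1, 2, 3, 4)]
print(round(greenhouse_geisser_epsilon(one_contrast), 6))  # 0.5
```

Statistical packages compute the same quantity from the observed covariance matrix; the corrected F test then uses `corrected_df` in place of the uncorrected (k − 1, (k − 1)(n − 1)) degrees of freedom.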

Results

In the final sample of 120, the mean WASI-II FSIQ-4 score was 103.64 (SD = 9.11, range 79–120). Table 1 reports mean raw scores on the word reading tasks collected in 2016 and compares these with those published in the relevant manuals (where available). The only difference observed was between our mean error score on the NART and the mean score from the 1981 manual, with a small effect size.

Table 1. Comparison of raw scores on word reading tasks

Table 2 shows the means and standard deviations of the predicted FSIQ scores for the NART, NART2, WTAR, and TOPF, and of the WASI-II FSIQ. Predicted FSIQ scores differed significantly between reading tests (F(2.328, 227.048) = 75.10, p < .001, ηp² = .387). As expected, estimated FSIQ was highest for the NART, followed by the NART2, WTAR, and TOPF.

Table 2. Distributions of predicted FSIQ scores

Table 3 shows the results of the pairwise comparisons. Post hoc tests using a Bonferroni correction showed that predicted FSIQ from the NART, the oldest test, was significantly higher than from the other tests. The WTAR and NART2 did not differ, but overall the results were consistent with a modest Flynn effect. US norms returned significantly higher predicted FSIQ scores than UK norms. There was a nearly six-point differential between scores derived from the TOPF US and UK norms, with our observed WASI-II score falling almost directly between them. A smaller but still significant difference was observed between the WTAR US and UK norms. Given the similar publication dates of the US and UK versions of each test, these differences are unlikely to be caused by the Flynn effect. Effect sizes ranged from moderate (d = .72, NART/TOPF UK) to negligible (d = .03, WTAR US/TOPF US).
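The Bonferroni correction used in post hoc pairwise comparisons can be illustrated briefly. A minimal sketch, assuming the comparisons involve six sets of predicted scores (the NART, the NART2, and UK and US versions of the WTAR and TOPF); the p-values are invented for illustration and are not results from this study, and the helper name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

# Assumed set of predicted FSIQ scores being compared.
PREDICTORS = ["NART", "NART2", "WTAR UK", "WTAR US", "TOPF UK", "TOPF US"]


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


pairs = list(combinations(PREDICTORS, 2))
print(len(pairs))  # 15 pairwise comparisons
# Hypothetical unadjusted p-values, one per pair (not values from this study).
raw_p = [0.001] * 5 + [0.02] * 5 + [0.2] * 5
adjusted = bonferroni_adjust(raw_p)
print(adjusted[-1])  # 1.0, because 0.2 * 15 = 3.0 is capped at 1
```

With 15 comparisons, an unadjusted p-value must fall below .05 / 15 ≈ .0033 to remain significant after adjustment, which makes the correction conservative.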

Table 3. Pairwise comparisons between predicted FSIQ scores for the NART, NART2, WTAR, and TOPF

Table 4 shows pairwise comparisons between our observed WASI-II FSIQ scores and the predicted score from each test. Observed WASI-II scores were significantly lower than NART, WTAR US, and TOPF US predicted scores, and significantly higher than TOPF UK predicted scores. The largest difference was with NART scores, with a mean difference of 7.95 IQ points (d = .53, moderate).

Table 4. Pairwise comparisons between observed WASI-II FSIQ-4 scores and predicted FSIQ scores for the NART, NART2, WTAR, and TOPF

Figure 1 shows: 1) our results using UK norms; 2) our results using US norms; 3) Norton et al.’s (2016) results; and 4) the hypothesised Flynn effect based on an estimated increase of 3 IQ points per 10 years. Our data and Norton et al.’s were collected at approximately the same time, so any differences between the two sets of results should not be attributed to the Flynn effect. Overall, these findings suggest that the Flynn effect does affect tests of premorbid functioning, with higher predicted FSIQ scores for tests with older norms and lower scores for more recently normed tests.

Figure 1. The Flynn effect and premorbid cognitive functioning: the current results (using UK and US norms), Norton et al., and the hypothesised Flynn effect.

Discussion

The current study aimed to investigate whether the Flynn effect was evident in reading tests that estimate premorbid intellectual functioning, replicating the earlier study of Norton et al. (2016) with a larger and more diverse sample. Specifically, we tested whether the Flynn effect was evident in predicted FSIQ scores derived from the NART, NART2, WTAR, and TOPF. A considerable body of research shows reading tests to be a reliable and valid method of estimating premorbid intellectual ability, and consequently valuable tools for clinicians seeking to diagnose and treat acquired brain injury, dementia, and other neuropsychological disorders; nevertheless, the pattern of our results was consistent with the Flynn effect.

We observed an effect consistent with the rule-of-thumb increase of three IQ points per decade suggested since Flynn’s original study (Flynn, 1984). For instance, the largest difference we observed (d = .72) was just under 11 points, between the NART (norms published in 1982) and the TOPF UK (norms published in 2011); this was in fact larger than the nine-point difference expected given the nearly 30-year gap between publication of these tests. Estimated FSIQ scores derived from the NART were significantly higher than those from all the other tests used in the study. The NART2 was also significantly higher than the TOPF UK, but did not differ significantly from the other tests (with the exception of the NART). Although the predicted IQ score from the WTAR (UK) was lower than that from the NART2, this difference was marginally non-significant (p = .071). The TOPF, as the most recently normed test, produced the lowest estimates of FSIQ, although scores from the TOPF US were close to six points higher than those derived from the TOPF UK. This latter pattern (i.e., US norms producing higher estimated FSIQ scores than UK norms) was also observed for the WTAR, with a more modest differential of two points.

These findings are consistent with past research. For instance, in the UK standardisation of the WTAR, participants scored on average 4.8 FSIQ points higher based on US norms than on UK norms (Wechsler, 2001). Choosing which norms to use is straightforward for clinicians in the US and UK; however, the decision becomes more problematic for Australian psychologists in light of the current findings. We return to this point later in the discussion.

We also compared our sample’s raw scores on the word reading tasks with those published in the relevant manuals and observed a difference only between our NART error score and the 1981 manual. The manuals for the WTAR and TOPF US publish only standardised scores (mean 100, SD 15), so we were unable to make a direct comparison for these tests. These findings suggest that vocabulary ability, as a measure of crystallised intelligence, may be less influenced by the Flynn effect (Flynn, 2012). They could also reflect our relatively well-educated sample.

Our second aim was to examine differences between predicted FSIQ scores and the actual WASI-II FSIQ scores achieved by our participants, in order to determine whether tests with older norms remain valid for current use. Compared with WASI-II FSIQ scores, the predicted NART, WTAR (US), and TOPF US scores were significantly higher, while the TOPF UK was significantly lower. The difference between WASI-II scores and the NART was the largest, with a medium effect size. The other differences were either non-significant (NART2, WTAR UK) or had a small effect size. Although we concluded that the differences between predicted scores reported earlier were consistent with a modest Flynn effect, these latter findings could also be interpreted as consistent with a plateau or even reversal of the Flynn effect this century (Dutton et al., 2016; Platt et al., 2019).

Another way to view these results is in terms of whether each word reading test tended to under- or overestimate premorbid intellectual functioning. Based on the current results, the NART would overestimate premorbid cognitive ability by approximately eight points, and the WTAR (US) and TOPF US by around three points each, whereas the TOPF UK would underestimate it by nearly three points. Of these differences, it is likely that only the NART overestimate would concern clinicians, although as the interval between publication and use increases, the magnitude of these over- and underestimates is also likely to increase.

The current results support the need for Australian standardisation research to support local use of the TOPF. In the absence of Australian norms for the TOPF, our recently published findings on the use of the WTAR and NART in the Australian context provide contemporary predictive models for these tests (Thomas et al., 2021), which should give clinicians greater confidence in using these older word reading tasks. Another strategy could be to apply an adjustment such as the .23 points per year suggested by Kirton et al. (2020), although given variability in the magnitude and direction of the Flynn effect (Platt et al., 2019; Sundet et al., 2004), it is difficult to determine an accurate adjustment figure. In any event, our findings and those of others (Kirton et al., 2020; Norton et al., 2016) show that the Flynn effect is present in word reading tasks such as the NART, WTAR, and TOPF and should be accounted for when estimating premorbid intellectual ability.

The current study replicated Norton et al.’s (2016) methodology with a larger and more diverse sample. However, after individuals with low and high IQs were excluded, we had a relatively modest sample of 120 with high overall levels of education. The mean WASI-II FSIQ for our sample was 103.64, which, given the published SEM of 2.77 for the test (Wechsler, 2011b, p. 114), represents a score equivalent to the population mean. Our data were collected in 2016, and the Flynn effect may continue to influence estimated premorbid intellectual ability.

We used the shorter WASI-II rather than the WAIS-IV to measure FSIQ. The WASI-II is quicker to complete and has strong psychometric properties, and its index scores are strongly correlated with those of the WAIS-IV (r ≥ .86). Its use in the current study was appropriate because it measures the underlying construct of intelligence, and it is known to be very strongly correlated with the other tests of intellectual functioning used to develop the predictive models examined in this study.

In conclusion, this study replicated previous findings indicating that the Flynn effect influences estimates of premorbid intellectual functioning derived from four commonly used word reading tasks: the NART, the NART2, the WTAR, and the TOPF. Our results showed significant differences in premorbid estimates consistent with the Flynn effect and indicate that caution should be applied in using the original NART norms in particular. Further research could provide updated norms for these older word reading tasks (Bright et al., 2018; Thomas et al., 2021). Current functioning on the WASI-II FSIQ-4 was significantly lower than estimated FSIQ derived from the NART, WTAR (US), and TOPF US norms, and significantly higher than that derived from TOPF UK norms. Although the effect sizes for these differences were small, this finding, together with our earlier research highlighting problems with the TOPF (Thomas et al., 2020), points to the need for further Australian standardisation research for this measure.

Acknowledgments

We would like to express our gratitude and appreciation to the participants who volunteered for this study, and to Catherine Richardson, Michael Magee, Michelle Vongdara, Maddison Lloyd, Peter Rohr, and Chloe Weekes, who assisted with data collection and initial analyses as part of their psychology student research experience. Thanks to Katie Long at Marathon Health, who assisted with graphic design.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available on request from the corresponding author, [AM]. The data are not publicly available due to the ongoing nature of this research project. Once the study is completed, the data will be lodged in a public database.

Additional information

Funding

This work was supported by Charles Sturt University [Project Number 20770 Faculty of Arts Research Compact Grant].

References

  • Australian Commission on Safety and Quality in Health Care. (2017). National safety and quality health service standards.
  • Australian Institute of Health and Welfare. (2020). Australia’s health 2020.
  • Blair, J. R., & Spreen, O. (1989). Predicting premorbid IQ: A revision of the National Adult Reading Test. The Clinical Neuropsychologist, 3(2), 129–136. https://doi.org/10.1080/13854048908403285
  • Bright, P., Hale, E., Gooch, V. J., Myhill, T., & van der Linde, I. (2018). The National Adult Reading Test: Restandardisation against the Wechsler Adult Intelligence Scale-Fourth edition. Neuropsychological Rehabilitation, 28(6), 1019–1027. https://doi.org/10.1080/09602011.2016.1231121
  • Bright, P., & van der Linde, I. (2020). Comparison of methods for estimating premorbid intelligence. Neuropsychological Rehabilitation, 30(1), 1–14. https://doi.org/10.1080/09602011.2018.1445650
  • Crawford, J. R., Parker, D., Stewart, L., Besson, J., & De Lacey, G. (1989). Prediction of WAIS IQ with the National Adult Reading Test: Cross‐validation and extension. British Journal of Clinical Psychology, 28(3), 267–273. https://doi.org/10.1111/j.2044-8260.1989.tb01376.x
  • Crowe, S. F. (2010). Evidence of absence: A guide to cognitive assessment in Australia. Australian Academic Press.
  • Dutton, E., & Lynn, R. (2013). A negative Flynn effect in Finland, 1997–2009. Intelligence, 41(6), 817–820. https://doi.org/10.1016/j.intell.2013.05.008
  • Dutton, E., van der Linden, D., & Lynn, R. (2016). The negative Flynn effect: A systematic literature review. Intelligence, 59, 163–169. https://doi.org/10.1016/j.intell.2016.10.002
  • Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95(1), 29–51. https://doi.org/10.1037/0033-2909.95.1.29
  • Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101(2), 171–191. https://doi.org/10.1037/0033-2909.101.2.171
  • Flynn, J. R. (2012). Are we getting smarter?: Rising IQ in the twenty-first century. Cambridge University Press.
  • Holdnack, J., & Drozdick, L. (2009). Advanced clinical solutions for WAIS-IV and WMS-IV: Clinical and interpretive manual. Pearson.
  • Howieson, D. (2019). Current limitations of neuropsychological tests and assessment procedures. The Clinical Neuropsychologist, 33(2), 200–208. https://doi.org/10.1080/13854046.2018.1552762
  • Kirton, J. W., Soble, J. R., Marceaux, J. C., Messerly, J., Bain, K. M., Webber, T. A., Fullen, C., Alverson, W. A., & McCoy, K. J. (2020). Comparison of models of premorbid IQ estimation using the TOPF, OPIE-3, and Barona equation, with corrections for the Flynn effect. Neuropsychology, 34(1), 43–52. https://doi.org/10.1037/neu0000569
  • Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment (5th ed.). Oxford University Press.
  • Macquarie Dictionary. (2013). Macquarie dictionary (6th ed.). Macquarie Library Pty Ltd.
  • Mathias, J. L., Bowden, S. C., & Barrett-Woodbridge, M. (2007). Accuracy of the Wechsler Test of Adult Reading (WTAR) and National Adult Reading Test (NART) when estimating IQ in a healthy Australian sample. Australian Psychologist, 42(1), 49–56. https://doi.org/10.1080/00050060600827599
  • Nelson, H. E. (1982). National Adult Reading Test (NART): For the assessment of premorbid intelligence in patients with dementia: Test manual. Nfer-Nelson.
  • Nelson, H. E., & Willison, J. (1991). National adult reading test (NART) test manual (2nd ed.). Nfer-Nelson.
  • Nguyen, R., Fiest, K. M., McChesney, J., Kwon, C.-S., Jette, N., Frolkis, A. D., Atta, C., Mah, S., Dhaliwal, H., Reid, A., & Pringsheim, T. (2016). The international incidence of traumatic brain injury: A systematic review and meta-analysis. Canadian Journal of Neurological Sciences / Journal Canadien des Sciences Neurologiques, 43(6), 774–785. https://doi.org/10.1017/cjn.2016.290
  • Norton, K., Watt, S., Gow, B., & Crowe, S. F. (2016). Are tests of premorbid functioning subject to the Flynn effect?: Accuracy of premorbid functioning tests. Australian Psychologist, 51(5), 374–379. https://doi.org/10.1111/ap.12235
  • Platt, J. M., Keyes, K. M., McLaughlin, K. A., & Kaufman, A. S. (2019). The Flynn effect for fluid IQ may not generalize to all ages or ability levels: A population-based study of 10,000 US adolescents. Intelligence, 77, 101385. https://doi.org/10.1016/j.intell.2019.101385
  • Pozzato, I., Tate, R. L., Rosenkoetter, U., & Cameron, I. D. (2019). Epidemiology of hospitalised traumatic brain injury in the state of New South Wales, Australia: A population-based study. Australian and New Zealand Journal of Public Health, 43(4), 382–388. https://doi.org/10.1111/1753-6405.12878
  • Skilbeck, C., Dean, T., Thomas, M., & Slatyer, M. (2013). Impaired National Adult Reading Test (NART) performance in traumatic brain injury. Neuropsychological Rehabilitation, 23(2), 234–255. https://doi.org/10.1080/09602011.2012.747968
  • Strauss, E., Sherman, E., & Spreen, O. (2006). A compendium of neuropsychological tests. Oxford University Press.
  • Sundet, J. M., Barlaug, D. G., & Torjussen, T. M. (2004). The end of the Flynn effect?: A study of secular trends in mean intelligence test scores of Norwegian conscripts during half a century. Intelligence, 32(4), 349–362. https://doi.org/10.1016/j.intell.2004.06.004
  • Thomas, M. D., McGrath, A., Sugden, N., Weekes, C., & Skilbeck, C. E. (2021). Investigating the National Adult Reading Test (NART-2) and Wechsler Test of Adult Reading Test (WTAR) in predicting Wechsler Abbreviated Scale of Intelligence – Second edition (WASI-II) scores in an Australian sample. Australian Psychologist, 56(5), 372–381. https://doi.org/10.1080/00050067.2021.1937923
  • Thomas, M. D., Sugden, N., McGrath, A., Rohr, P., Weekes, C., & Skilbeck, C. E. (2020). Investigating the Test of Premorbid Functioning (TOPF) in predicting Wechsler Abbreviated Scale of Intelligence – Second edition (WASI-II) scores in an Australian sample. Neuropsychological Rehabilitation, 1–25. https://doi.org/10.1080/09602011.2020.1842213
  • Trahan, L. H., Stuebing, K. K., Fletcher, J. M., & Hiscock, M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332–1360. https://doi.org/10.1037/a0037173
  • Wechsler, D. (2001). Wechsler Test of Adult Reading. Psychological Corporation.
  • Wechsler, D. (2009). Advanced clinical solutions for WAIS-IV and WMS-IV. Pearson.
  • Wechsler, D. (2011a). Test of premorbid functioning: UK edition. Pearson.
  • Wechsler, D. (2011b). WASI-II: Wechsler abbreviated scale of intelligence. Pearson.