
Performance validity test failure predicts suppression of neuropsychological test results in developmentally disabled children


Abstract

There is increasing awareness of the need to use Performance Validity Tests (PVTs) when assessing cognitive abilities in children. Since 1996, there has been an ongoing clinical study of the use of three PVTs with a consecutive series of 1,285 developmentally disabled children. In this study, we report on the results of these PVTs in children of many diagnostic categories. Failure rates on all three tests were very low. The mean scores on the effort measures in those passing the PVTs were extremely high. Failure on each PVT was found to be associated with a significant and widespread suppression of scores across a neuropsychological battery. Failure on even one PVT significantly suppresses ability test scores.

Introduction

This study describes the results of three Performance Validity Tests (PVTs) administered over a period of more than twenty years to a consecutive series of developmentally disabled children of mixed diagnoses. To date, this is probably the largest consecutive series of developmentally disabled children given multiple PVTs. When preliminary testing began in 1996, most practitioners did not yet recognize that PVTs were required for adults, and almost nobody was using PVTs with children. The National Academy of Neuropsychology position paper supporting the need for validity testing in adults did not appear until nine years later, in 2005 (Bush et al., Citation2005). Today, the tide has turned and there is widespread acknowledgment of the need to use PVTs not only with adults but also with children, so that invalid results may be detected using objective tests.

Kirkwood (Citation2015) provided a comprehensive review of the use of effort measures with children and adolescents, including the Medical Symptom Validity Test (MSVT), and reported that:

In brief, available studies indicate that non-credible effort can have a dramatic effect across most cognitive domains, not only in adult but also in child evaluations. Given the size of the effects, interpreting data without accounting for invalid effort could lead to gross interpretive errors, inaccurate diagnosis and etiological conclusions, ineffective treatment recommendations and inappropriate health care, educational and government resource utilization. (p. 11)

Hence, there is a need for research on methods of identifying poor effort in children. Baron (Citation2019) also endorsed the value of PVTs in pediatric neuropsychological assessment. Larson et al. (Citation2015, p. 192) noted that

a growing body of research has examined the value of PVTs in pediatric populations. Recent studies indicate that high percentages of school-age children are capable of passing several stand-alone PVTs…. The Medical Symptom Validity Test (MSVT, Green, Citation2004) was developed for use with adults and pediatric populations….

Ideally, PVTs do not measure ability in the vast majority of people with clinical conditions who are likely to be tested; they should be insensitive to differences in ability and age, as well as to actual brain impairment. Even more importantly, although PVTs should lack sensitivity to true differences in ability between individuals and between groups, PVT scores should nevertheless explain a great deal of variance in the neuropsychological test battery as a whole. Traditionally, variance in neuropsychological test scores was assumed to reflect mainly differences in ability. We now know that much of that variance is due to differences in effort and not ability (Green, Rohling, Lees-Haley, & Allen, Citation2001; Kirkwood, Citation2015; Stevens, Friedel, Mehren, & Merten, Citation2008).

The Word Memory Test (WMT; Green & Astner, Citation1995; Green, Citation2003) was originally used with adults with brain trauma and other neurological diseases, but shortly afterward its use with children was investigated. The expectation when testing began was that these impaired children would often fail the WMT, but empirical data quickly showed that this was not the case. On the primary effort measures, developmentally disabled children in many different diagnostic categories scored just as highly as healthy adults (Green & Flaro, Citation2003). The adult cutoffs were found to be applicable to children without adjustment, provided that reading was at a grade three level. Ten percent of children in that study failed the WMT. Six children were given the same test again, but this time with an external incentive: if they passed the second time, they could choose any of the soft drinks or candies displayed on a shelf in view of the children (Green & Flaro, Citation2003). All but one of these children easily passed the WMT on their second attempt, showing that their failure the first time was due to poor motivation or lack of interest. They were not false positives for poor effort. The 15-year-old who failed the second time had worse-than-chance scores, showing that he knew the correct answers but was choosing the wrong ones much of the time. Not surprisingly, he had a diagnosis of oppositional defiant disorder. Once motivation was enhanced using an external incentive, the specificity of the WMT in the children of this study with at least grade three reading was 100%.

The WMT has been used widely with adults and children. One survey showed that the WMT was the third most widely used of all computerized tests in neuropsychology (Rabin et al., Citation2014). In another survey of professionals by Martin, Schroeder, and Odland (Citation2015), the WMT and the MSVT were found to be the second and third most widely used PVTs with adults, and the Nonverbal Medical Symptom Validity Test (NVMSVT; Green, Citation2008) ranked tenth. The cutoffs for the WMT were chosen to be three standard deviations below the mean of a group of adults with moderate to severe traumatic brain injury (TBI) and a group of disabled patients with neurological illnesses (Allen & Green, Citation1999; Green & Allen, Citation1999). One boy and one woman with surgical removal of the left temporal lobe for control of seizures passed the WMT (Carone, Green, & Drane, Citation2013). Three patients with bilateral hippocampal damage also passed the WMT (Goodrich-Hunsaker & Hopkins, Citation2009). Carone (Citation2014) reported on the remarkable case of a nine-year-old girl with minimal brain tissue from hydrocephalus, who had a full scale IQ of 58 and an uncontrolled seizure disorder. She was reluctant to take the tests but performed almost perfectly on both the WMT and MSVT recognition subtests.

The MSVT is essentially a short form of the WMT and it is based on presenting strongly linked word pairs, such as “tooth pick,” followed by certain tests of memory, including recognition memory subtests. In contrast, the NVMSVT employs colored images presented in pairs and the effort measures are also based on recognition testing. The MSVT and the NVMSVT were developed with the intention of testing both children and adults. The cutoffs for both tests were derived from samples of developmentally disabled children. The logic was that, if certain subtests were so easy that even children with severe disabilities could pass them, the same subtests should also be passed easily by almost any adult.

In the MSVT test manual (Green, Citation2004), healthy children in grades two to seven were found to score almost perfectly on the Immediate Recognition (IR), Delayed Recognition (DR), and consistency (CNS) subtests, with median scores typically at 100% correct even in the youngest children. Blaskewitz, Merten, and Kathmann (Citation2008) compared the MSVT with the Test of Memory Malingering (TOMM) in a sample of healthy children asked either to simulate impairment or to try their best to do well. Both tests had perfect specificity in the good-effort children. The MSVT had at least 90% sensitivity to simulation, whereas the TOMM had 68% sensitivity. Ten percent of the children admitted that they had not simulated impairment but had tried their best, consistent with their teachers’ instructions to try their best no matter what the circumstances. This is an example of how a volunteer’s motivation may not always be what we think it is.

The MSVT was applied to a group of children with severe disabilities, including Fetal Alcohol Spectrum Disorders (FASD), and the results were described in the MSVT test manual (Green, Citation2004) in the section on interpretation. For the recognition memory trials and the consistency score between these two trials (i.e., the effort measures), the cutoffs were chosen to be three standard deviations below the mean from these disabled children. The same cutoffs were then applied to many adult samples, including civilians with mainly soft tissue injuries (Richman et al., Citation2006), psychiatric patients (Gill, Green, Flaro, & Pucci, Citation2007), and soldiers with mild TBI (Armistead-Jehle, Citation2010). In the article by Richman et al. (Citation2006), it was reported that healthy children were tested using the MSVT in French. Even though they understood no French words, they scored means of 98% correct on the recognition memory subtests, just like healthy French speaking adult volunteers.

In a series of 380 developmentally disabled children (Green, Flaro, Brockhaus, & Montijo, Citation2012), failure on the MSVT was found in only 5% of those tested. Thus, even if all the failures were false positives for invalid data, which is unlikely, the MSVT displayed at the very least 95% specificity in these disabled children, showing just how easy the recognition memory subtests are. These children were part of the first 380 cases from the current series of 1,285 children.

Independent research by Kirkwood (Citation2015) supported the use of the MSVT in children with disabilities, finding that 38% of the variance in the whole test battery was explained by the effort scores on the MSVT. The same had previously been found with adults, in whom the WMT explained 50% of the variance in the whole test battery, whereas years of education explained only 12% and brain injury severity explained 4% (Green et al., Citation2001). Stevens et al. (Citation2008, p. 191), using the MSVT in German, wrote

Effort accounted for up to 35% of the variance of performance in the domains of cognitive speed, memory and intelligence. After controlling for effort, there was no significant effect that could be attributed to substantial brain injury. The findings confirm that there is a general and strong effect of effort on psychological test results, which dwarfs the impact of substantial brain injury. Effort testing should become a standard procedure in psychological testing.

Mentally handicapped children with a mean Full Scale Intelligence Quotient (FSIQ) of 61 all passed both the MSVT and the WMT as long as their reading skills were at or above a grade three level (Green & Flaro, Citation2016). Carone (Citation2008) reported that children with neurological diseases including severe TBI easily passed the MSVT and they rated it as very easy, whereas adults with mild TBI often failed it and rated it as difficult. Similarly, adults with acute severe TBI and still in hospital easily passed the MSVT once they emerged from post-traumatic amnesia (Macciocchi, Seel, Yi, & Small, Citation2017).

The NVMSVT employs artist-drawn colored images of simple pairs, such as a bird and a nest. It was reported in the test manual (Green, Citation2008) that children with FASD scored very highly on the IR subtest and on the three Delayed Recognition subtests (DR, Delayed Recognition Archetypes [DRA], and Delayed Recognition Variations [DRV]). The cutoffs for failure on these subtests were chosen to be at the fifth percentile relative to the first group of FASD children tested. In a large series of developmentally disabled children (Green et al., Citation2012), the cutoffs had at least 90% specificity, even if all cases failing the NVMSVT were false positives, which is very unlikely. The NVMSVT was easily passed by most developmentally disabled children (Green et al., Citation2012). One study showed that the NVMSVT was more sensitive than the TOMM (Green, Citation2011) and this finding was replicated independently by Armistead-Jehle and Gervais (Citation2011). This has not yet been replicated in children but as previously noted, Blaskewitz et al. (Citation2008) found the MSVT to be more sensitive than the TOMM in healthy children asked to simulate impairment.

The current study was designed to evaluate performance on the WMT, MSVT, and NVMSVT in a large consecutive series of developmentally disabled children. The primary goal was to determine whether failure on each PVT led to lowered scores on the ability test battery. This required a very large sample because such a small percentage of children fail these PVTs. It was hypothesized that those who failed any one of the PVTs would show suppressed scores on ability tests. It was hypothesized that those who failed two of these PVTs would score lower than those who failed only one of the PVTs and that those failing all three PVTs would score the lowest of all. If these hypotheses were supported, we would have further evidence that failure on these PVTs is an indicator that other neuropsychological test results in the same assessment underestimate true ability. If not, we would have to question the predictive utility of these PVTs because failure on a PVT should predict lowered scores on at least some ability tests, if not most of them. A PVT is not fully validated until we show that failure on that PVT actually indicates suppression of test scores across the neuropsychological battery.

Method

Participants

Between the years 2000 and 2019, 1,285 young people with many different developmental disabilities were referred to the second author for clinical neuropsychological assessment, mainly by the local Social Services department and by pediatricians. The number dropped to 1,174 when we excluded those aged between 18 and 21 years, as well as children under the age of 7 years, who were below the recommended minimum age for the PVTs used in the study. Data from these 1,174 children are analyzed in the following sections.

The diagnostic composition of the sample is shown in Table 1. The two largest groups were children with FASD (n = 285) and Attention Deficit Hyperactivity Disorder (ADHD, n = 157). The mean age of the sample was 13.2 years (SD 3.0), the children ranged in age from 7 to 17 years, and 59.3% were boys. Backgrounds were mixed, but it is worth noting that most of the FASD children were from First Nations. The most common referral question was what type of academic accommodations, special housing, or other social services would be needed to support and assist these handicapped children. Data from all cases were entered into the spreadsheet daily or weekly. Parents or guardians signed consent forms to allow data to be analyzed anonymously for purposes of this research. They all had fluent English and understood that file reviews might be done retrospectively at some time in the future for research purposes. There was no requirement to submit the project to an Independent Review Board in Alberta, the Canadian province where this work was performed.

Table 1. Primary diagnostic categories and numbers per group.

Tests used (PVTs)

The computerized version of the WMT (Green, Allen, & Astner, Citation1996; Green, Citation2003) involves presenting 20 strongly associated word pairs, such as “man” and “woman,” on a computer screen twice and then testing memory in various ways. On the IR subtest, the person is asked to choose one of the target words from pairs of words shown on the computer screen. For each stimulus, one of the words (the target) is from the original list and the other (the foil) is not. After 30 minutes, a similar task is repeated using the same target words but with new foil words (DR). The computer calculates the consistency (CNS) of responses from IR to DR. This is followed by the multiple choice (MC) subtest, in which the first word from each pair is presented and the person has to pick, from a list of eight options, the word that was paired with it. On the paired associate recall trial (PA), the examiner says the first word and asks what word went with it. Finally, there is a free recall (FR) trial, in which the examiner asks the person to recall all the words from the list, and an optional long delayed free recall task (LDFR). Scores on each measure are expressed as a percentage of the maximum possible score.

The MSVT (Green, Citation2004) is essentially a short version of the WMT in which only 10 word pairs are presented on the computer screen. Each word pair represents only one concept (e.g., jet plane). There are two recognition subtests, including IR, followed after 10 minutes by the DR subtest. The computer calculates consistency between responses on IR and DR. Then, there is a PA subtest and a FR subtest, as on the WMT.

The NVMSVT (Green, Citation2008) involves the presentation, twice in succession, of 10 lively, colored artist-drawn images, each showing two very strongly linked items, such as a mouse with a piece of cheese (target items for later recognition and recall). The IR trial involves presenting pairs of images, one seen before (target) and one not seen before (foil). The person uses a computer mouse to select the target item. On the delayed recognition section, after a 10-minute delay, three separate types of recognition task are used, but the items are intermingled so that it is not obvious that there are three different tasks. There is a simple target-foil recognition task (DR). There is a DRA task employing previously unseen “archetypal” images as the foils and previously seen foils from the IR trial as the targets. The third subtest, DRV, uses the original paired images and the same image with a slight variation (e.g., the mouse but with a slightly different shape of cheese). PA involves showing one half of each image pair (e.g., the mouse) and asking the person to say what went with it in the original list. Finally, FR involves asking the person to recall as many of the items in the paired images as possible.

The WMT, MSVT, and NVMSVT are all failed by some people with Alzheimer’s type dementia, despite assumed best effort (Green, Montijo, & Brockhaus, Citation2011; Howe, Anderson, Kaufman, Sachs, & Loring, Citation2007; Howe & Loring, Citation2008; Henry, Merten, Wolf, & Harth, Citation2010; Singhal, Green, Ashaye, Shankar, & Gill, Citation2009) but certain patterns of scores are found within each test, such that scores on easier subtests are much higher than those on harder subtests. These are known as dementia-like profiles or genuine memory impairment profiles (GMIP). If the GMIP is not present in a case of failure, the results are not consistent with actual impairment and poor effort is suggested. If the GMIP is present, clinical judgment is needed to decide whether the person has a condition known to cause a GMIP, such as dementia or severe dyslexia. In a simulator study, Armistead-Jehle and Denney (Citation2015) offered substantial rewards to any student who could perform like dementia patients and produce a GMIP on each of the WMT, the MSVT and the NVMSVT. No case was able to accomplish this on all three tests and few achieved a GMIP on two of these tests. It was especially hard for simulators to create a credible profile on the NVMSVT.

People with dementia who fail the easy NVMSVT measures nearly all score at least 20 points higher on the mean of the first three scores (IR, DR, CNS) versus the last two scores (PA and FR). They also show very small scatter among the scores across IR, DR, CNS, DRA, and DRV because these are all equally easy, and the standard deviation of these scores in dementia patients is rarely greater than 12 points. Finally, they score at least 11 points lower on PA compared with the mean of the previous four scores (DR, CNS, DRA, and DRV). When the PA score is not at least 11 points lower than the mean of DR, CNS, DRA, and DRV, the person has failed criterion B1. When the mean of IR, DR, and CNS is not at least 20 points above the mean of PA and FR, the person has failed criterion B2. When the standard deviation of IR, DR, CNS, DRA, and DRV is 12 or more, the person has failed criterion B3. If an individual fails the easy subtests of the NVMSVT and the profile fails on two or more of these criteria (B1, B2, and B3), the results are considered to reflect poor effort and not genuine impairment similar to that observed in people with dementia.
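The three profile criteria amount to a simple arithmetic check on the seven subtest scores. The sketch below is an illustrative reading of those rules, not the scoring software shipped with the test; the function name is ours, and the use of the sample standard deviation for B3 is an assumption.

```python
from statistics import mean, stdev

def nvmsvt_profile_flags(ir, dr, cns, dra, drv, pa, fr):
    """Check the NVMSVT profile criteria (B1-B3) described in the text.
    All inputs are percent-correct scores (0-100). Illustrative only."""
    # B1 fails when PA is NOT at least 11 points below the mean of
    # DR, CNS, DRA, and DRV.
    b1_failed = pa > mean([dr, cns, dra, drv]) - 11
    # B2 fails when the mean of the easy scores (IR, DR, CNS) is NOT at
    # least 20 points above the mean of the hard scores (PA, FR).
    b2_failed = mean([ir, dr, cns]) < mean([pa, fr]) + 20
    # B3 fails when scatter across the five easy subtests is large:
    # a standard deviation of 12 or more (sample SD assumed here).
    b3_failed = stdev([ir, dr, cns, dra, drv]) >= 12
    n_failed = sum([b1_failed, b2_failed, b3_failed])
    # Failing two or more criteria means the profile is not credible as
    # genuine impairment, so poor effort is suggested.
    return {"B1": b1_failed, "B2": b2_failed, "B3": b3_failed,
            "gmip_present": n_failed < 2}
```

For example, a profile with high, tightly clustered easy scores and much lower PA and FR (say, 95, 90, 90, 90, 88, 60, 50) satisfies all three criteria and would be treated as a possible genuine memory impairment profile, whereas a flat profile with little separation between easy and hard subtests fails B1 and B2.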

Tests used (neuropsychological tests)

Many different neuropsychological tests were given to the children. For purposes of assessing the impact of failure on PVTs, we selected the tests shown in Table 2. These 11 tests were chosen for several reasons, the first of which was that each test had been given to at least 300 children. The large sample sizes allowed us to examine the effects of failed PVTs on ability tests, even though only a small percentage of children failed each PVT. Second, we dropped certain tests that had different versions for different age ranges (e.g., the Trail Making Test) because reporting each version separately would crowd the tables. The 11 tests were chosen for analysis mainly because they tap abilities known to be important for both academic achievement and competent social functioning. We also wanted to include diverse tasks, representing a range of abilities tapping different domains of brain function. In theory, poor effort evident on memory-based PVTs might affect memory tasks in particular, while having little or no effect on non-memory tasks. The ability test results analyzed in this study included:

Table 2. Neuropsychological test scores in children passing or failing the WMT.

  1. Full Scale Intelligence: results are shown as standard scores (Wechsler, Citation2004: The Wechsler Intelligence Scale for Children).

  2. Reading abilities: scores shown as reading grade equivalents (Wilkinson & Robertson, Citation2006: Wide Range Achievement Test).

  3. Mathematical abilities: scores shown as mathematical grade equivalents (Wilkinson & Robertson, Citation2006: Wide Range Achievement Test).

  4. Word list learning and memory: scores shown as standard scores for the numbers of words recalled from an orally presented list after a short delay (Talley, Citation1988: Children’s Auditory Verbal Learning Test-2 [CAVLT-2]).

  5. Word list learning and memory: scores shown as standard scores for the numbers of words recalled from an orally presented list after a longer delay (Talley, Citation1988: Children’s Auditory Verbal Learning Test-2 [CAVLT-2]).

  6. The ability to comprehend and repeat short stories immediately after hearing them: raw scores shown as the number of items recalled from a total of 10 tape-recorded stories heard in both ears simultaneously, where the normal adult mean is 100 (SD 13) and on which the average child at age 12 or above scores the same as the average adult (Green & Kramar, Citation1983: The Story Recall Test).

  7. Executive functions on a planning task: moving scores shown as standard scores (Culbertson & Zillmer, Citation2001: Tower of London).

  8. Executive functions on a problem solving task: The tables show the raw scores for the number of categories achieved (Heaton & PAR Staff, Citation1993: Wisconsin Card Sorting Test: Computer Version 4.).

  9. Ability to judge emotion: raw scores are given for the number of errors (Green, Flaro, & Gervais, Citation1986: Green’s Emotional Perception Test, involving judgment of the emotions “happy, angry, sad, frightened or neutral” expressed in the tone of voice of 45 tape-recorded sentences).

  10. Olfactory identification: sum of left and right nostrils for the number of items correctly identified out of a maximum of 10 per nostril (Green, Citation1989: Alberta Smell Test, involving judgment of odors in scented markers).

  11. Manipulation of small pegs: raw scores shown as the number of seconds to complete the task using the right hand (Trites, Citation1977: Grooved Pegboard, which involves using each hand separately to manipulate small key-like pegs and to place them into slots in a pegboard).

Procedures

Pilot testing with children using the oral version of the Word Memory Test (Green & Astner, Citation1995) began in 1996. It was soon clear that children could easily pass the effort measures of the oral WMT. The more efficient, computerized version of the WMT (Green, Allen, & Astner, Citation1996; Green, Citation2003) was given to children starting in the year 2000, at which point records of the results were routinely saved by the WMT computer program and also entered manually into a computerized spreadsheet. We analyzed only data from the computerized WMT gathered between the years 2000 and 2019 (n = 1,174). When the MSVT (Green, Citation2004) and later the NVMSVT (Green, Citation2008) were added to the battery, time was limited and so not all children were given all three tests. Overall, 568 children were given all three tests. Some children were given two of the tests (124 cases had the WMT and MSVT; 23 children had the WMT and NVMSVT; 114 children had the MSVT and the NVMSVT). Some children were given only one of the tests (176 children had only the WMT; 60 children had only the MSVT; 78 children had only the NVMSVT). In total, 891 children took the WMT, 866 children took the MSVT, and 783 children were given the NVMSVT. Thirty-one children were not given any of the PVTs; one, for example, was a severely autistic child with no language, capable only of making grunting or screaming sounds, and hence untestable. These 31 children were not included in the study.

Results

Word Memory Test

A total of 100 out of 891 children (i.e., 11.2%) failed the WMT using the standard cutoffs. Even if we assumed that all the WMT failures were false positives for poor effort, the specificity of the WMT in this group of developmentally disabled children would be 89%. In fact, it is likely that most of those who failed were actually making insufficient effort to produce valid test results. It may be recalled that Green and Flaro (Citation2003) found that children who initially failed the WMT were able to pass once they were offered a small incentive. Those children were all part of the current series, and therefore we know that some of those who failed the WMT were able to pass but were not motivated to do so.
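The worst-case specificity figure is simple arithmetic: treat every failure as a false positive and count the remaining passes. A minimal sketch (the function name is ours):

```python
def worst_case_specificity(n_failed, n_total):
    """Lower bound on specificity: assume every failure is a false
    positive, i.e., that every examinee actually gave good effort."""
    return (n_total - n_failed) / n_total

# WMT in this sample: 100 failures out of 891 children tested.
print(round(worst_case_specificity(100, 891) * 100, 1))  # prints 88.8
```

The same calculation yields the 94.5% figure reported later for the MSVT (48 failures out of 866 children).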

A total of 47 of the children failing the WMT had a profile that invariably means poor effort because they did not display a GMIP. This means that there were internal inconsistencies between the subtests, which are rare in those who genuinely are unable to pass. Their data would be interpreted as invalid irrespective of diagnosis. Fifty-three of the WMT failures had a profile that would be credible in someone with dementia (a GMIP). This distribution of GMIP versus poor-effort profiles is typical of what is found in groups asked to simulate impairment on the WMT (e.g., Green, Lees-Haley, & Allen, Citation2003). This means that, in principle, 53 cases (5.9% of all cases given the WMT) might be considered potential failures despite best effort. However, in clinical cases, a false positive would only be concluded if the person showed extremely severe impairment akin to that seen in Alzheimer’s disease. False positives on the WMT are very rare, except in dementia.

A total of 11 of the children failing the WMT had less than grade three reading ability. Seven of these 11 children did have dementia-like profiles (GMIP), suggesting that their failure could have been explained by dyslexia. It is therefore possible that they failed the WMT despite best effort. Larochette and Harrison (Citation2012) have shown that severe dyslexia can lead to WMT failure and that, in such cases, there is a dementia-like profile (GMIP) on the WMT.

To study the impact of WMT failure, we may examine scores on selected neuropsychological tests in those who passed or failed the WMT. Table 2 shows that, on 9 out of the 11 ability tests, those who failed the WMT scored significantly lower than those who passed. Thus, failure on the WMT was linked with significant suppression of most scores across the test battery. The most likely explanation is that those who failed the WMT did so because of poor effort, and they were making a poor effort on tests in general; that is why they scored significantly lower across most neuropsychological tests than those who passed the WMT. It could be argued, to the contrary, that the cases who failed the WMT did so because they were actually more impaired, and that is why they also scored lower across the test battery. However, previous research summarized in the introduction has shown that the WMT effort subtests are not affected by age or by FSIQ (e.g., Green & Flaro, Citation2003) and are insensitive to most brain diseases (e.g., Green, Lees-Haley, & Allen, Citation2003). It is therefore unlikely that the lower ability test scores in those failing the WMT were due to lower ability.

In the current data, if the WMT recognition memory subtests and consistency really were sensitive to differences in ability, we would expect to find a clinically significant correlation between WMT recognition scores versus both age and intelligence in the 791 cases who passed the WMT and whose effort was, therefore, presumed to be good. However, that is not what the data show. Using nonparametric statistics to compare mean WMT effort scores, (IR + DR + CNS)/3, by age level, there was no significant difference. There was also no significant difference in the mean WMT effort scores by FSIQ level. The mean WMT effort score, (IR + DR + CNS)/3, did not correlate significantly with intelligence (Spearman’s r = 0.05) and the correlation with age was weak (r = 0.08) in the children passing the WMT, although statistically significant at .05. Bearing in mind that the children ranged in age from 7 to 17 years, we would expect a much bigger correlation for a variable measuring ability. For example, in the cases passing the WMT, age correlated with WMT Free Recall, which is an ability measure, at 0.28 (Spearman).

In the whole sample, irrespective of passing or failing the WMT, the mean of WMT IR, DR, and CNS in the nine children aged 7 years was 94% (SD 9), and in the 17-year-olds (n = 168) it was 95.5% (SD 6). This is a trivial difference. The children were broken down into three FSIQ ranges. There were 373 children in the lowest FSIQ range, with a mean FSIQ of 72.8 (SD 7.6); their mean score on (WMT IR + DR + CNS)/3 was 94.5% correct (SD 7.1). The children in the highest FSIQ range (n = 120) had a mean FSIQ of 110.3 (SD 7.4), an average of 37.5 points higher than the lowest FSIQ group. Yet the highest FSIQ group’s mean score on (WMT IR + DR + CNS)/3, at 96.5% (SD 4.9), was only two percentage points higher than that of the lowest FSIQ group. This is a trivial and nonsignificant difference.

In summary, age and intelligence were unrelated to the mean WMT effort scores in this sample of children. Even in the WMT failures, there was no correlation between the mean WMT effort scores and FSIQ. Age also did not correlate with WMT effort scores in the subgroup who failed the WMT (Spearman’s r = .12), and there was no significant effect of age on WMT effort scores (Kruskal-Wallis).

The absence of substantial correlations between age and intelligence versus the WMT effort measures is in keeping with the WMT recognition subtests measuring effort and not ability. Note that in Table 3, the mean scores on the IR and DR subtests were over 97% correct (SD = 3) in the whole sample of 791 developmentally disabled children who passed the WMT. This is the same as found in healthy adult volunteers (Green et al., Citation2003) and is evidence that these subtests are not sensitive to differences in ability. Hence, failure on the WMT cannot be explained as a function of low ability.

Table 3. Mean scores per subtest on the WMT by pass or fail WMT.

MSVT

A total of 866 children between the ages of 7 and 17 were given the MSVT. Of these, 818 cases (94.5%) passed the MSVT and only 48 cases (5.5%) failed. Even if all of these failing children were assumed to be false positives for poor effort and invalid test results, the MSVT would still have 94.5% specificity in this large group of children with developmental disabilities. This reinforces how easy it is to pass the MSVT. The mean scores per subtest for those who passed and failed the MSVT are shown in Table 4. Despite their significant disabilities clinically, the mean scores for IR and DR were over 99% correct in the children who passed the MSVT, and the standard deviations were very small (1.7 to 2.6). All of the subtest scores are higher than the corresponding WMT scores in Table 3, mainly because the WMT has 20 word pairs whereas the MSVT has only 10, making the MSVT slightly easier to memorize.

Table 4. Mean scores per MSVT subtest by pass or fail the MSVT.
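The worst-case specificity figure quoted above is a simple lower bound: if every failure were a false positive, specificity would equal the pass rate. A sketch of that arithmetic, with a helper name of our own choosing (the 682/783 NVMSVT figure comes from the NVMSVT section later in this article):

```python
# Worst-case (lower-bound) specificity: assume every child who failed is a
# false positive, so specificity = passes / total tested. Helper name is ours.
def worst_case_specificity(n_pass, n_total):
    """Specificity (%) if all failures were false positives for poor effort."""
    return 100 * n_pass / n_total

print(round(worst_case_specificity(818, 866), 1))  # MSVT → 94.5
print(round(worst_case_specificity(682, 783), 1))  # NVMSVT → 87.1
```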

If the MSVT measures effort, we would expect those who passed it to score significantly higher across the neuropsychological test battery than those who failed. In Table 5, we observe that scores on tests of ability were indeed significantly higher in the MSVT passes than in the failures in all comparisons but one; the exception was a test of olfactory identification.

Table 5. Scores on ability tests by passing or failing the MSVT.

In the 818 children passing the MSVT, there was no significant effect of intelligence on the mean of MSVT IR, DR, and CNS (Kruskal-Wallis). There was a statistically significant effect of age (Chi square 23, df 10, p < .003) in which the mean MSVT score in the seven year olds was 97.2% correct as opposed to a mean of 99% correct in the 17 year olds. This is not clinically significant. Mean MSVT effort scores correlated significantly but weakly with age (Spearman’s r = 0.1) and mean MSVT scores did not correlate with intelligence (r = 0.08).

These results are consistent with the MSVT being a PVT. The MSVT effort scores are insensitive to true differences in ability, which means that failure cannot be explained by low ability in this sample. Yet, despite being insensitive to ability, they predict whether or not the battery of tests as a whole underestimates true ability.

NVMSVT

Out of 783 children, 682 cases passed the NVMSVT and 101 cases (12.9%) failed the NVMSVT, using the standard cutoff from the manual (criterion A1), based on the mean of IR, DR, CNS, DRA, and DRV. Even if all these children were assumed to be false positives for poor effort and invalid test results, the NVMSVT would still have 87.1% specificity in this large group of children with developmental disabilities. In fact, most of those who failed were making a poor effort. One way to determine if a failure is a result of poor effort or genuine impairment similar to that seen in dementia is to examine the profile using criteria A, B1, B2, and B3. The dementia sample of Henry et al. (Citation2010) may be used as an example of what NVMSVT profiles look like in cases with very severe impairment. No case in that study had a poor effort profile, despite being severely cognitively impaired. This means that those who failed the easy subtests (Criterion A) did not fail two or more of the three B criteria. Instead, they all had a GMIP, which is the typical dementia profile reported in the test manual (Green, Citation2008). In contrast, in the 101 children who failed the NVMSVT in the current study, 58 cases failed all three B criteria, 36 cases failed two of the three B criteria, 5 cases failed only one and 2 failed no B criteria. Applying the standard interpretation, we would classify the data from the 94 cases who failed two or more B criteria as indicating poor effort. In a clinical assessment, we would consider the remaining 7 cases as showing a possible GMIP, meaning that we would have to determine whether the person was actually functioning in daily life as poorly as people with Alzheimer’s disease and had a diagnosis which is known to be associated with NVMSVT failure. If so, the person’s data might be valid. Otherwise their data would be invalid.

In summary, 93% of all children failing the NVMSVT had profiles indicating invalid data, because their results showed internal inconsistencies which are not typical of people with Alzheimer's type dementia but are typical of the simulators reported in the test manual (Green, Citation2008) and by Armistead-Jehle and Denney (Citation2015). This means that the false positive rate of the NVMSVT in this sample was at most 7 out of 783 cases, or 0.9% (less than 1%). Sixty-one of the children failing the NVMSVT were also given the WMT and/or the MSVT and, of these, 27 failed one or both of those PVTs. Thus, 34 cases (4.3% of the whole sample given the NVMSVT) failed the NVMSVT but not the WMT or MSVT.

Table 6 shows ability test scores in those who passed or failed the NVMSVT. On all the ability tests except the Alberta Smell Test, those who failed the NVMSVT scored significantly lower than those who passed. Thus, we observe a sweeping superiority of performance across almost all measures of ability in those who passed the NVMSVT, suggesting that failure on this test is a sign of generally poor effort throughout testing. Poor effort suppresses scores on the NVMSVT and on most neuropsychological tests.

Table 6. Scores on ability tests by pass versus fail the NVMSVT.

Those with a FSIQ below 70 (mean FSIQ = 61, SD 8, n = 103) who passed the NVMSVT scored a mean of 97.5% correct (SD 5.4) on (IR + DR + CNS + DRA + DRV)/5. In contrast, those with a FSIQ above 70 (mean FSIQ = 88, SD 11.4, n = 568) scored a mean of 98.5% (SD 16) on the same subtests. There was, therefore, only a difference of one percentage point between those of very low FSIQ versus those with much higher FSIQ. Overall, the difference in NVMSVT effort scores by FSIQ was not statistically significant. There was a significant difference in NVMSVT effort scores by age (Chi square = 22.2, df 10, p < .014) but the effect was trivial. For example the mean easy NVMSVT score in 7-year olds was 97% correct and in 17-year olds the mean score was 98.6%. The mean of the easy NVMSVT scores correlated weakly with age (Spearman’s r = 0.14) and FSIQ (r = 0.08). Because NVMSVT effort scores are not affected to a clinically significant degree by age or FSIQ, we concluded that they do not measure ability but are affected by effort.

In Table 7, we see that the scores on the recognition subtests in the passers are all above 97% correct, except for the DRA subtest at 91% correct. The results of those failing the NVMSVT are of interest. IR (95% correct) is a simple forced-choice test and is not sensitive to poor effort. DR is similar but follows a ten-minute delay, which makes it more sensitive (76%). DRA is a very different task: an item that was previously a foil on the IR trial, such as an airplane, is now presented as the target the client must identify, and it must be selected in preference to a novel foil with survival value (e.g., a bat with wings spread and teeth bared). The mean score on DRA in those who failed the test was 66%, the lowest of all the recognition subtests. DRA is considered the single most sensitive subtest to poor effort.

Table 7. Scores on the NVMSVT by pass or fail the NVMSVT.

Also notable is that the PA score is 96% correct even in those who fail the NVMSVT; the PA subtest is not sensitive to poor effort. The PA score is considerably higher than the mean of the preceding recognition subtests and the consistency score. This pattern was previously seen in adult simulators and is the opposite of that seen in people with dementia, for whom the PA subtest is quite difficult. The children showing poor effort score about the same as patients with dementia on the easier subtests (DR, CNS, DRA, DRV) but then score much higher than dementia patients on the much harder PA subtest (Green, Citation2008). The relative elevation of the PA score over easier subtests is paradoxical, but it is typical of people making a poor effort.

If we apply the standard rules to the mean profile of the children failing the NVMSVT, they fail criteria B1 and B2 but not B3. The standard interpretation of failure on criterion A (a low score across the easy subtests) plus failure on two of the three B criteria is that effort was not sufficient to produce valid test results. If the mean profile of the NVMSVT failures in Table 7 came from an individual, that person would be assumed to be producing invalid test data which underestimate true ability. This is consistent with the data in Table 6.

Profile of NVMSVT scores in those failing the WMT or the MSVT

The existence of a profile of NVMSVT scores which is incompatible with actual severe impairment but typical of simulators provides a method for further evaluating some of those who failed the WMT. Out of 591 cases given both the WMT and the NVMSVT, 72 failed the WMT and, of those, 15 also failed the NVMSVT. In those who failed both the NVMSVT and the WMT, the mean NVMSVT profile was as follows: IR 96.6%, DR 73.3%, CNS 72.6%, DRA 70%, DRV 75.3%, PA 97.3%, FR 55%. This profile fails on criteria A, B1, and B2, meaning that it contains significant internal inconsistencies (e.g., PA is higher than DR). More than 95% of dementia patients studied thus far have not shown profiles of this type. Thus, the overall profile from those who failed the WMT and the NVMSVT is not plausible from someone with severe impairment who is providing valid data. Notably, the PA score is a mean of 25 points higher than the mean of the previous four scores, whereas it should be at least 11 points lower in people with genuine severe impairment (fail B1); the easy-hard difference is only 6 points, whereas it should be at least 20 points (fail B2); and the standard deviation of scores from IR to DRV is 10.8, which does not quite meet the 12-point cutoff (B3 not failed).

Out of 682 cases given both the MSVT and the NVMSVT, 33 failed the MSVT, and 11 of those 33 also failed the NVMSVT. In those who failed the NVMSVT as well as the MSVT, the mean NVMSVT profile was as follows: IR 97.3%, DR 88.1%, CNS 86.9%, DRA 80.4%, DRV 93.6%, PA 96.3%, FR 55%. Criterion A is failed (mean of IR, DR, CNS, DRA, and DRV less than 90%); criterion B1 is failed (PA minus the mean of DR, CNS, DRA, and DRV is not −11 or lower); criterion B2 is failed (mean of IR, DR, and CNS not 20 or more points higher than the mean of PA and FR); criterion B3 is not failed. Two of the three B criteria are thus failed in those who failed both the MSVT and the NVMSVT, so at least these cases were probably not false positives for the MSVT.
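The A/B1/B2/B3 rules described in the preceding paragraphs can be sketched as a small checker. This is a minimal illustration based only on the cutoffs as narrated in this section (the authoritative definitions are in the test manual; Green, Citation2008); in particular, the direction of B3 (flagged when the spread of IR through DRV reaches 12 points) is inferred from the 10.8 "does not quite meet the cutoff" example above, and the function name is ours.

```python
# Hedged sketch of the NVMSVT profile criteria as described in the text; the
# manual (Green, 2008) is authoritative. Scores are percent correct.
from statistics import mean, stdev

def nvmsvt_criteria(ir, dr, cns, dra, drv, pa, fr):
    easy5 = [ir, dr, cns, dra, drv]
    a  = mean(easy5) < 90                           # A: low mean, easy subtests
    b1 = pa - mean([dr, cns, dra, drv]) > -11       # B1: PA not >= 11 pts lower
    b2 = mean([ir, dr, cns]) - mean([pa, fr]) < 20  # B2: easy-hard gap < 20 pts
    b3 = stdev(easy5) >= 12                         # B3: spread reaches 12 pts
    return {"A": a, "B1": b1, "B2": b2, "B3": b3}

# Mean profile of the children who failed both the WMT and the NVMSVT:
print(nvmsvt_criteria(96.6, 73.3, 72.6, 70.0, 75.3, 97.3, 55.0))
# → {'A': True, 'B1': True, 'B2': True, 'B3': False}
```

Run on the WMT-failure mean profile, the checker reproduces the pattern reported above (A, B1, and B2 failed; B3 not failed), and on the MSVT-failure profile it flags A, B1, and B2 but not B3, matching the text.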

A total of 22 cases failed the MSVT but passed the NVMSVT, probably because the MSVT is more sensitive than the NVMSVT. Similarly, 57 cases failed the WMT but passed the NVMSVT, which is thought to reflect the fact that the NVMSVT is not as sensitive as the WMT to invalid data. Another possibility is that the children tested were not systematically showing poor effort across all tests but inconsistent effort across PVTs that are all very easy (e.g., passing the NVMSVT and failing the MSVT). If so, they might also show inconsistent effort across the battery, which would explain the significant suppression of scores in those who failed only one PVT. Notably, in a large adult sample with compensation incentives, far more adults failed the WMT, MSVT, and NVMSVT than these children, and far more failed two or more of the three PVTs (Green et al., Citation2001).

Failure on one or more PVTs

A total of 568 cases were given the WMT, the MSVT, and the NVMSVT. Of these, 12.3% failed the WMT, 4% failed the MSVT, and 9.5% failed the NVMSVT. A total of 450 cases (79.2%) passed all three tests, 93 (16.4%) failed only one of the three PVTs, 21 (3.7%) failed two of the three, and only a tiny proportion, 4 cases (0.7%), failed all three. There is not enough space to show all the statistical results, but there were highly significant differences among the four groups (i.e., fail no PVT, fail one, fail two, or fail three PVTs) on 9 of the 11 ability tests (all except the smell test and reading level). Post hoc Bonferroni comparisons showed that those who failed only one PVT scored significantly lower than those who passed all three PVTs on nine of the eleven (9/11) ability tests (all except reading and the smell test). Thus, even when only one of the three PVTs is failed, the data suggest that test scores from that child underestimate true ability on more than two thirds of all tests administered. Moreover, on post hoc Bonferroni comparisons, there was no significant difference on any of the 11 ability test scores between those who failed only one PVT and those who failed two PVTs.
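The four-group breakdown above (fail none, one, two, or all three PVTs) is just a per-child tally of failures. A sketch of that grouping, on invented records rather than the study data:

```python
# Illustrative tally (hypothetical records, not the study data): count how
# many of the three PVTs each child failed, forming the four comparison groups.
from collections import Counter

def failure_counts(records):
    """records: list of dicts mapping PVT name -> failed? (True = fail)."""
    return Counter(sum(r.values()) for r in records)

sample = [
    {"WMT": False, "MSVT": False, "NVMSVT": False},  # passes all three
    {"WMT": True,  "MSVT": False, "NVMSVT": False},  # fails one
    {"WMT": True,  "MSVT": False, "NVMSVT": True},   # fails two
    {"WMT": False, "MSVT": False, "NVMSVT": False},  # passes all three
]
print(failure_counts(sample))  # → Counter({0: 2, 1: 1, 2: 1})
```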

To illustrate the effect, we may examine one ability test score, namely the number of categories achieved on the Wisconsin Card Sorting Test (WCST). In Table 8, we see that the more PVTs failed, the lower the performance. Post hoc analysis shows that the 93 children who failed only one PVT scored significantly lower than those who failed no PVT (p < .005). There was no significant difference in WCST score between those who failed only one PVT and those who failed two. There was, however, a significant difference between the WCST score of those who failed only one PVT (mean 4.4) and those who failed all three (mean 2.0, p < .02).

Table 8. WCST Categories Achieved by the number of PVTs failed.

A similar pattern emerges in the data from the CAVLT (IR), shown in Table 9. The more PVTs failed, the lower the scores across the test battery. Even when only one of the three PVTs is failed, scores across most of the battery are significantly lower than in the cases that passed all three PVTs. In such cases, effort across three equally easy PVTs may be regarded as inconsistent and unreliable, and it is probable that data from other tests are also unreliable.

Table 9. Mean scores on CAVLT Immediate Recall by number of PVTs failed.

Diagnosis

In an analysis of variance, there were significant differences between diagnostic groups on 10 of the 11 ability test variables (all except the smell test). For example, mean FSIQ ranged from 109 in a group with no diagnosis to 59 in the intellectual deficit group, and the overall group differences were highly significant (df 46, 1071, p < .0001). Despite the wide spread of abilities across diagnostic groups, there were no significant differences in either the mean effort scores or the failure rates on the three PVTs by diagnostic group, whether using parametric (analysis of variance [ANOVA]) or nonparametric (Kruskal-Wallis) statistics.

One of the most severely impaired groups was the children with FASD, which is the single major preventable cause of intellectual deficits. In this group with widespread cognitive impairment, the mean FSIQ was 78 (SD 12) but the mean MSVT effort score was 98% correct, the mean WMT effort score was 94% correct and the mean NVMSVT effort score was 96% correct. In the intellectual deficit group, with a mean FSIQ of 59 (SD 11), the mean MSVT effort score was 96% correct, the mean WMT effort score was 96% correct and the mean NVMSVT effort score was 96% correct. These are close to perfect mean scores despite the children having severe cognitive impairment and despite the fact that some of the children failed some of these tests.

Discussion

As we may deduce from their diagnoses and from their ability test scores shown in the tables, the children of this study mostly suffered from disabling conditions, including intellectual deficits, FASD, childhood schizophrenia, autism, language disorders, ADHD and various neurological conditions, such that most of them were under the care of Social Services. The reason for studying these children in the first place was that many of them would be expected to be severely impaired in numerous ways. If severely handicapped children could pass certain PVTs, then adults should also be able to pass them, assuming good effort. One purpose of the current study was to discover whether cognitive impairment in such children could explain failure on the WMT, the MSVT, or the NVMSVT.

The failure rates in the developmentally disabled children were low on each of the three PVTs. The mean scores on the effort measures within the WMT, the MSVT, and the NVMSVT were high and unrelated to intelligence. Whereas the diagnostic groups differed significantly in terms of intelligence, there were no clinically significant differences between groups on the effort measures of the three PVTs employed. The children ranged in age from 7 to 17 years but the PVT effort scores did not differ based on age. In those children who passed these PVTs, the mean recognition test scores were at the same level as seen in healthy adults.

The primary hypothesis was that failure on the PVTs would be linked with lowered scores on ability tests, and this hypothesis was supported. When any one of the PVTs was failed, there was a widespread and significant suppression of scores across almost all of the ability tests that the children were given. For example, the mean FSIQ of those failing the MSVT was about one standard deviation lower than that of those passing the MSVT (Table 5). The best explanation is that poor effort caused the low PVT scores and that poor effort in this minority of children permeated the whole assessment, leading to ability scores which significantly underestimated the children's true abilities, including FSIQ.

As Kirkwood (Citation2015) stated, the impact of poor effort on a test battery is extremely large and, if not recognized, would lead to children's abilities being grossly underestimated. The extent of the effect of failing a PVT in the current study is remarkable, as can be seen by scanning the tables. Yet young children with a FSIQ below 70 scored just as highly on the effort measures of all three tests as 17-year-old children of significantly higher intelligence. The group with the lowest FSIQ (mean 59, SD 11) carried the label "intellectual deficit," yet their mean effort scores on the MSVT, the WMT, and the NVMSVT were each 96% correct. Such high scores provide further evidence that these measures are insensitive to impairment in these developmentally disabled children.

There was a wide range of cognitive impairment by diagnostic group, as shown by the ability tests. However, there was no difference in failure rates on any of the three PVTs by diagnosis: the most severely impaired groups failed the PVTs no more often than the least impaired groups. This is another manifestation of the fact that the PVTs' main effort measures were unaffected by differences in ability.

Some research with embedded PVTs has led to the suggestion that invalid test results should be concluded only if at least two, or even three, PVTs are failed (e.g., Lichtenstein, Flaro, Baldwin, Rai, & Erdodi, Citation2019). The impetus for such a criterion came from the fact that embedded PVTs, such as Reliable Digit Span (RDS; Greiffenstein, Baker, & Gola, Citation1994), are based on tests originally designed to measure actual ability; RDS is a score derived from an intelligence subtest. This leads to relatively low specificity in children for the original adult cutoffs, because children do not score the same as adults on these tests (e.g., Blaskewitz et al., Citation2008). Adults and children of varied intelligence levels differ widely in their ability to repeat digits, and people with brain diseases may obtain low scores as a result of the disease (e.g., Schroeder, Twumasi-Ankrah, Baade, & Marshall, Citation2012). The problem of low specificity may be partially overcome by lowering cutoffs, but this reduces sensitivity to poor effort. Another strategy to enhance the specificity of embedded PVTs is to require failure not on just one embedded test but on two or more (Boone, Citation2013).

An important finding of the current study is that failing only one of the three PVTs was associated with ability test scores that significantly underestimated true ability on two thirds of all the ability tests administered (see Tables 8 and 9). The guidelines for interpreting embedded tests cannot be applied to the current PVTs because they do not share the same properties. Notably, the current PVTs are extremely easy even for developmentally disabled children, they are insensitive to most forms of brain disease, and they are unrelated to FSIQ and age, as shown in this study. Embedded measures have not been shown to share these features.

Failure on one of the current PVTs should not be ignored because of a "two or more" rule derived from very different PVTs. Instead, interpretation should be based on data from relevant clinical groups. For example, a child with a mild head injury, no radiological brain abnormalities, and no loss of consciousness might fail the MSVT. The examiner would have to ask whether it makes clinical sense for such a child to fail the MSVT based on the published literature. The answer here would be no, because we know that even children and adults with severe TBI do not fail the MSVT if they try to pass (Carone, Citation2008; Macciocchi et al., Citation2017). Failure of the MSVT alone would not be explainable by mild TBI, but it would predict lowered scores across a battery of ability tests, which is what we observe in Table 5. It would not make sense clinically to ignore such a finding and claim that invalid results can only be concluded if two PVTs are failed.

In addition, in this study there was no significant difference on the 11 ability test scores between those who failed only one PVT and those who failed two. Caution is needed because the absence of a difference might be a function of the low number of cases failing two or more PVTs among these children. In general, it is probably true that the more PVTs failed, the more other test scores are suppressed; failure on three PVTs in this study did lead to lower scores than failure on only one. In any event, failure of even one PVT in this study led to significantly lowered ability scores across the test battery (e.g., Tables 8 and 9).

Failure rates will vary from one sample to another for many reasons, including the availability of financial rewards, educational accommodations for being cognitively impaired, or the avoidance of punishment in criminal prosecutions. In the current sample, financial decisions, such as whether to fund a group home for a teenager, did exist, but such external variables were not systematically recorded. It is likely that many of the children with intellectual deficits did not appreciate that the test results could affect their future funding. This study did not include a subgroup that was highly motivated to do well to achieve an external goal, yet failure rates on the PVTs were very low overall. Ideally, motivational variables would be measured in future studies and controlled where possible. In the study by Green and Flaro (Citation2003), incentives were offered to children who failed the WMT, and all but one of them passed when taking it a second time; that study was based on children taken from early in the current series. Offering positive incentives to do well, rather than merely rewards for participating, might be considered in future studies of children and adults where there is uncertainty about the reasons for PVT failure or doubt about whether full effort is being made.

One limitation to the current study is that we did not have a large sample of healthy children to compare with developmentally disabled children. This is mostly a problem with respect to the more difficult subtests, such as Free Recall, which are ability measures. It is not a problem for the easy recognition measures, since we know that if disabled children easily pass these measures then healthy children will also pass, as seen already with the MSVT (Green, Citation2004). If healthy child norms were available at multiple age levels, it would enhance the use of the harder subtests as memory tests.

Another limitation is that we could not compare mild versus moderate-to-severe TBI cases because severity of injury was not recorded. In adults, failure on these PVTs has been found to be much more frequent after mild TBI than after moderate or severe TBI, a reverse dose-response effect (Hill, Citation1965). We did observe an absence of a dose-response effect in children, in that failure rates on the PVTs did not differ by diagnosis despite widely varying levels of cognitive impairment. This is another indication that it is not cognitive impairment that causes failure on these PVTs.

Within the design of this study, involving retrospective analysis of clinical cases, we were able to establish high levels of specificity for the PVTs and to show that PVT failure suppressed test scores, but the question of sensitivity to poor effort was less well addressed. Other studies are relevant here. For example, Carone (Citation2008) and Macciocchi et al. (Citation2017) showed that severe TBI in children and adults does not cause MSVT failure, so we may partially evaluate sensitivity by looking at failure rates in those with only mild TBI who fail the MSVT. All those who fail are either showing poor effort or displaying far more impairment than the disabled children in the current study. Carone (Citation2008) found that children with severe TBI did not fail the MSVT, yet in the same study there was a 24% failure rate on the MSVT in adults with mild TBI, who presumably failed because of poor effort. Similarly, Armistead-Jehle (Citation2010) found that about half of soldiers with mild TBI failed the MSVT. Mild TBI cannot explain why they failed, because even severe TBI does not cause failure on the MSVT (Macciocchi et al., Citation2017). When twice as many adults with mild TBI fail the WMT as those with moderate to severe TBI (Flaro, Green, & Robertson, Citation2007), we may assume poor effort in the mild TBI adults. Such studies show that the MSVT is sensitive to poor effort, but not exactly how sensitive; perhaps there were cases of poor effort in these samples that were not detected.

Establishing sensitivity is relatively easy in simulator studies and the WMT, MSVT, and NVMSVT have all been shown to be very highly sensitive in many simulator studies (e.g., Blaskewitz et al., Citation2008; Green et al., Citation2003; Henry et al., Citation2010; Merten, Green, Henry, Blaskewitz, & Brockhaus, Citation2005). Such studies are valuable but we also need clinical studies in which known malingerers are contrasted with presumed good effort cases. This is not as easy as it sounds, however, because PVTs differ markedly by sensitivity levels. If we define malingering using one PVT or two or three PVTs but they are of low sensitivity, we may draw false conclusions. People not defined as malingering by the low sensitivity PVTs will often fail another more sensitive PVT. The problem is the lack of a gold standard for poor effort or malingering (Mossman, Wygant, & Gervais, Citation2012).

Another limitation of the current study is that we did not record preexisting diagnoses and then document whether the current test results led to a change in diagnosis. We have only retrospective anecdotal evidence, based on the recollection of the second author, that there were examples in which diagnosis was affected, and retrospective recall over a 20-year period is not very reliable. In one case, an eight-year-old boy had previously been found to be of borderline intelligence and placed in a class for "slow learners," but no PVTs had been used. In the current assessment, he passed all PVTs and his FSIQ was found to be 120. If PVTs had been used in the first assessment, borderline intelligence might not have been concluded, and his class placement and treatment would probably have been different. On some occasions, children's failures on PVTs were felt to be consistent with oppositional defiant disorder (ODD) or conduct disorder. Children with these clinical disorders often present as antagonistic, resistant, and negativistic about the evaluation, although, strangely enough, there was no difference in PVT failure rates by diagnostic group in this study. In one case, a child previously found to be uncooperative and diagnosed with ODD passed all PVTs, and it was felt that his clinical presentation was more consistent with ADHD. One adolescent, who had earlier been diagnosed with ADHD, failed all the effort measures and was referred for investigation of possible Klinefelter syndrome; that diagnosis was later confirmed by a pediatric neurologist. In the relatively few cases of poor effort in this study, the PVT results did not always affect diagnostic classification, but we have evidence that the true abilities of these children were higher than their cognitive test results suggested. Future studies are needed to address empirically whether PVT failure affects diagnosis and, if so, in what way.

In summary, the data from this large cohort of children show that the WMT, MSVT, and NVMSVT are suitable for use with children who have a variety of developmental disorders. Even the group with a mean FSIQ of 59 scored as well as healthy adults on the effort measures of these tests. Failure on these PVTs was linked with lowered scores across the ability test battery. Failure on any one of these tests in isolation predicts that ability test scores will underestimate true abilities, although greater suppression of ability scores is probable if more than one is failed. A clinical judgment is needed whenever any child or adult fails these PVTs. We have to ask ourselves, "Does it make clinical sense that developmentally disabled children pass these PVTs whereas my client, with diagnosis X, at age Y, and with external incentives Z, fails them? Is my client really more impaired than those children?" More data on healthy children are needed to support the use of these PVTs as memory measures (e.g., Free Recall on the WMT). It is also recommended that, whenever possible, external incentives be carefully controlled in research studies involving PVTs, as this will help in measuring sensitivity and specificity.

Conflicts of interest

P. Green is the author of the WMT, MSVT, and NVMSVT and his test publishing company, Green’s Publishing, distributes these computerized tests. L. Flaro has no financial interest in these tests. Data gathering was done as part of clinical work and no grant funding was received.

References

  • Allen, L. M., & Green, P. (1999). Severe TBI sample performance on CARB and the WMT: Supplement to the CARB '97 and Word Memory Test manuals. North Carolina, USA: Cognisyst.
  • Armistead-Jehle, P. (2010). Symptom validity test performance in U.S. veterans referred for evaluation of mild TBI. Applied Neuropsychology, 17(1), 52–59. doi:10.1080/09084280903526182
  • Armistead-Jehle, P., & Denney, R. L. (2015). The detection of feigned impairment using the WMT, MSVT, and NV-MSVT. Applied Neuropsychology, Adult, 22(2), 147–155. doi:10.1080/23279095.2014.880842
  • Armistead-Jehle, P., & Gervais, R. (2011). Sensitivity of the test of memory malingering and the nonverbal medical symptom validity test: A replication study. Applied Neuropsychology, 18(4), 284–290. doi:10.1080/09084282.2011.595455
  • Baron, I. S. (2019). Neuropsychological evaluation of the child: Domains, methods and case studies. New York, USA: Oxford University Press, p. 938.
  • Blaskewitz, N., Merten, T., & Kathmann, N. (2008). Performance of children on symptom validity tests: TOMM, MSVT, and FIT. Archives of Clinical Neuropsychology, 23(4), 379–391. doi:10.1016/j.acn.2008.01.008
  • Boone, K. (2013). Clinical practice of forensic neuropsychology. New York, USA: Guilford.
  • Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., … Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity. NAN Policy and Planning Committee. Archives of Clinical Neuropsychology, 20(4), 419–426. doi:10.1016/j.acn.2005.02.002
  • Carone, D. A. (2008). Children with moderate/severe brain damage/dysfunction outperform adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain Injury, 22(12), 960–971. doi:10.1080/02699050802491297
  • Carone, D. A. (2014). Young child with severe brain volume loss easily passes the Word Memory Test and Medical Symptom Validity Test: Implication for mild TBI. The Clinical Neuropsychologist, 28(1), 146–162. doi:10.1080/13854046.2013.861019
  • Carone, D. A., Green, P., & Drane, D. L. (2013). Word memory test profiles in two cases with surgical removal of the left anterior hippocampus and parahippocampal gyrus. Applied Neuropsychology: Adult, 21(2), 155–160. doi:10.1080/09084282.2012.755533
  • Culbertson, W. C., & Zillmer, E. A. (2001). Tower of London – Drexel University (2nd ed.). Toronto, Canada: Multi-Health Systems Inc.
  • Flaro, L., Green, P., & Robertson, E. (2007). Word Memory Test failure 23 times higher in mild brain injury than in parents seeking custody: The power of external incentives. Brain Injury, 21(4), 373–383. doi:10.1080/02699050701311133
  • Gill, D., Green, P., Flaro, L., & Pucci, T. (2007). The role of effort testing in independent medical examinations. Medico-Legal Journal, 75(2), 64–71. doi:10.1258/spmlj.75.2.64
  • Goodrich-Hunsaker, N. J., & Hopkins, R. O. (2009). Word Memory Test performance in amnesic patients with hippocampal damage. Neuropsychology, 23(4), 529–534. doi:10.1037/a0015444
  • Green, P. (1989). Alberta Smell Test: Instructions, record forms and clinical data. Kelowna, Canada: Green’s Publishing.
  • Green, P. (2003). Green’s Word Memory Test for Windows user’s manual (Revised 2005). Kelowna, BC, Canada: Green’s Publishing.
  • Green, P. (2004). Green’s Medical Symptom Validity Test (MSVT) for Microsoft Windows user’s manual. Kelowna, BC, Canada: Green’s Publishing.
  • Green, P. (2008). Green’s Nonverbal Medical Symptom Validity Test (NV-MSVT) for Microsoft Windows user’s manual. Kelowna, BC, Canada: Green’s Publishing.
  • Green, P. (2011). Comparison between the Test of Memory Malingering (TOMM) and the Non-Verbal Medical Symptom Validity Test (NV-MSVT) in adults with disability claims. Applied Neuropsychology, 18(1), 18–26. doi:10.1080/09084282.2010.523365
  • Green, P., & Allen, L. (1999). Performance of neurological patients on the Word Memory Test (WMT) and Computerized Assessment of Response Bias (CARB): Supplement to the Word Memory Test and CARB ’97 manuals. North Carolina, USA: Cognisyst.
  • Green, P., Allen, L., & Astner, K. (1996). Manual for the computerized Word Memory Test. North Carolina, USA: Cognisyst.
  • Green, P., & Astner, K. (1995). Manual for the Oral Word Memory Test. North Carolina, USA: Cognisyst.
  • Green, P., & Flaro, L. (2015). Results from three Performance Validity Tests (PVTs) in adults with intellectual deficits. Applied Neuropsychology, Adult, 22(4), 293–303.
  • Green, P., & Flaro, L. (2003). Word Memory Test performance in children. Child Neuropsychology, 9(3), 189–207. doi:10.1076/chin.9.3.189.16460
  • Green, P., & Flaro, L. (2016). Results from three Performance Validity Tests in children with intellectual disability. Applied Neuropsychology, Child, 5(1), 25–34. doi:10.1080/21622965.2014.935378
  • Green, P., Flaro, L., Brockhaus, R., & Montijo, J. (2012). Performance on the WMT, MSVT, & NV-MSVT in children with developmental disabilities and in adults with mild traumatic brain injury. In C. R. Reynolds & A. Horton (Eds.), Detection of malingering during head injury litigation (2nd ed.). New York, NY: Plenum Press.
  • Green, P., Flaro, L., & Gervais, R. (1986, revised 2008). Green’s Emotional Perception Test (EPT). Kelowna, Canada: Green’s Publishing.
  • Green, P., & Kramar, K. (1983). Auditory comprehension test (renamed Story Recall Test). Kelowna, Canada: Green’s Publishing.
  • Green, P., Lees-Haley, P., & Allen, L. M. (2003). The Word Memory Test and the Validity of Neuropsychological Test Scores. Journal of Forensic Neuropsychology, 2(3–4), 97–124. doi:10.1300/J151v02n03_05
  • Green, P., Montijo, J., & Brockhaus, R. (2011). High specificity of the Word Memory Test and Medical Symptom Validity Test in groups with severe verbal memory impairment. Applied Neuropsychology, 18(2), 86–94. doi:10.1080/09084282.2010.523389
  • Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15(12), 1045–1060. doi:10.1080/02699050110088254
  • Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218–224. doi:10.1037//1040-3590.6.3.218
  • Heaton, R. K., & PAR Staff. (1993). WCST: Computer Version 4 research edition. Lutz, FL: Psychological Assessment Resources Inc.
  • Henry, M., Merten, T., Wolf, S. A., & Harth, S. (2010). Nonverbal Medical Symptom Validity Test performance of elderly healthy adults and clinical neurology patients. Journal of Clinical and Experimental Neuropsychology, 32(1), 19–27. doi:10.1080/13803390902791653
  • Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine, 58(5), 295–300.
  • Howe, L. L. S., Anderson, A. M., Kaufman, D. A. S., Sachs, B. C., & Loring, D. W. (2007). Characterization of the Medical Symptom Validity Test in evaluation of clinically referred memory disorders clinic patients. Archives of Clinical Neuropsychology, 22(6), 753–761. doi:10.1016/j.acn.2007.06.003
  • Howe, L. L. S., & Loring, D. W. (2008). Classification accuracy and predictive ability of the Medical Symptom Validity Test’s dementia profile and general memory impairment profile. The Clinical Neuropsychologist, 23(2), 329–342. doi:10.1080/13854040801945060
  • Kirkwood, M. W. (2015). Review of pediatric performance and symptom validity tests. In M. W. Kirkwood (Ed.), Validity testing in child and adolescent assessment: Evaluating exaggeration, feigning, and noncredible effort. New York, NY: Guilford.
  • Larochette, A. C., & Harrison, A. G. (2012). Word Memory Test performance in Canadian adolescents with learning disabilities: A preliminary study. Applied Neuropsychology Child, 1(1), 38–47. doi:10.1080/21622965.2012.665777
  • Larson, J. C., Flaro, L., Peterson, R. L., Connery, A. K., Baker, D. A., & Kirkwood, M. W. (2015). The Medical Symptom Validity Test measures effort not ability in children: A comparison between mild TBI and Fetal Alcohol Spectrum Disorder samples. Archives of Clinical Neuropsychology, 30, 192–199. doi:10.1093/arclin/acv012
  • Lichtenstein, J. D., Flaro, L., Baldwin, F. S., Rai, J., & Erdodi, L. A. (2019). Further evidence for embedded performance validity tests in children with the Conners’ Continuous Performance Test-Second Edition. Developmental Neuropsychology, 44(2), 159–171. doi:10.1080/87565641.2019.1565535
  • Macciocchi, S. N., Seel, R. T., Yi, A., & Small, S. (2017). Medical symptom validity test performance following moderate-severe traumatic brain injury: Expectations based on orientation log classification. Archives of Clinical Neuropsychology, 32(3), 339–348. doi:10.1093/arclin/acw112
  • Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists’ validity testing beliefs and practices: A survey of North American professionals. The Clinical Neuropsychologist, 29(6), 741–776. doi:10.1080/13854046.2015.1087597
  • Merten, T., Green, P., Henry, M., Blaskewitz, N., & Brockhaus, R. (2005). Analog validation of German-language symptom validity tests and the influence of coaching. Archives of Clinical Neuropsychology, 20(6), 719–726. doi:10.1016/j.acn.2005.04.004
  • Mossman, D., Wygant, D., & Gervais, R. (2012). Estimating the accuracy of neurocognitive effort measures in the absence of a “gold standard”. Psychological Assessment, 24(4), 815–822.
  • Rabin, L. A., Spadaccini, A. T., Brodale, D. L., Grant, K. S., Elbulok, M. M., & Barr, W. B. (2014). Utilization rates of tests and test batteries among clinical psychologists in the United States and Canada. Professional Psychology: Research and Practice, 45(5), 368–377. doi:10.1037/a0037987
  • Richman, J., Green, P., Gervais, R., Flaro, L., Merten, T., Brockhaus, R., & Ranks, D. (2006). Objective tests of symptom exaggeration in independent medical evaluations. Journal of Occupational and Environmental Medicine, 48(3), 303–311.
  • Schroeder, R. W., Twumasi-Ankrah, P., Baade, L. E., & Marshall, P. S. (2012). Reliable digit span: A systematic review and cross-validation study. Assessment, 19(1), 21–30.
  • Singhal, A., Green, P., Ashaye, K., Shankar, K., & Gill, D. (2009). High specificity of the Medical Symptom Validity Test in patients with very severe memory impairment. Archives of Clinical Neuropsychology, 24(8), 721–728. doi:10.1093/arclin/acp074
  • Stevens, A., Friedel, E., Mehren, G., & Merten, T. (2008). Malingering and uncooperativeness in psychiatric and psychological assessment: Prevalence and effects in a German sample of claimants. Psychiatry Research, 157(1–3), 191–200. doi:10.1016/j.psychres.2007.01.003
  • Talley, J. L. (1988). Children’s auditory verbal learning test-2. Odessa, FL: Psychological Assessment Resources, Inc.
  • Trites, R. N. (1977). Neuropsychological test manual (Grooved Pegboard, Lafayette Instruments). Ottawa, Ontario, Canada: Royal Ottawa Hospital.
  • Wechsler, D. (2004). The Wechsler intelligence scale for children (4th ed., Canadian norms). Toronto, Ontario: Harcourt Assessment.
  • Wilkinson, G. S., & Robertson, G. J. (2006). Wide range achievement test 4, professional manual. Lutz, FL: Psychological Assessment Resources.