
Over the past three decades, measurement of response validity in clinical neuropsychology has become a topic of keen importance. Prior to the 1990s, few neuropsychologists regularly assessed performance validity (cf. Lezak, Citation1976), and embellished or malingered cognitive deficits were thought unlikely (cf. Bigler, Citation1986). Although such testing was once deemed relevant only for forensic evaluations, ample evidence indicates that some individuals undergoing clinical evaluations also obtain abnormal scores on performance validity tests (cf. Martin & Schroeder, Citation2020). Practice guidelines have emerged, and measurement of performance validity is now the norm rather than the exception (Bush et al., Citation2005; Chafetz et al., Citation2015; Heilbronner et al., Citation2009; Sweet et al., Citation2021).

Coinciding with the growing use of performance validity testing in medico-legal and clinical evaluations, research involving performance validity measures has grown dramatically. In the mid-1990s, fewer than 10% of publications in key neuropsychological journals concerned measurement of performance validity, but this value more than doubled two decades later (Martin et al., Citation2015). This burgeoning topic has changed forensic and clinical practice in ways that are arguably transformative in scope and impact.

This issue of the Journal of Clinical and Experimental Neuropsychology is dedicated to innovations in the measurement of performance validity. As the literature concerning this topic has accumulated, challenges have emerged, and this issue is intended to focus attention upon them. For instance, many of the most often used performance validity measures have existed for several decades. Knowledge of such instruments has permeated the legal profession, and attorneys may coach clients to elude detection of embellished cognitive dysfunction (cf. Lippa, Citation2017; Victor & Abeles, Citation2004). The potential compromise of testing instruments has created a need for new measurement methods.

The current issue addresses this matter by presenting two new measures of performance validity that utilize novel methodologies. Patrick et al. (Citation2024) employed pupillometry to discriminate individuals simulating cognitive impairment from people who had sustained a traumatic brain injury. Because pupillometry indices capture involuntary responses, their findings suggest that non-credible test performance may be identified without reference to cognitive performance itself, circumventing examinees’ deliberate efforts to modulate test-taking strategies. Such a technique is unique and could hold transformative promise as a performance validity indicator.

Basso et al. (Citation2024) offer another innovative instrument. Noting that non-declarative/implicit memory is relatively robust to many forms of brain damage, they employed a measure of perceptual memory as a performance validity indicator. Relying upon a simulation design, they showed that two perceptual memory tests achieved classification accuracy that compared favorably to commonly used standalone and embedded performance validity tests. Although clinical data were absent from this investigation, the perceptual memory indices hold promise as novel and theoretically supported performance validity tests.

In addition to novel instruments, Whiteside et al. (Citation2024) extended the use of performance validity tests to a unique and understudied population. They examined the prevalence of non-credible performance in people with COVID-related cognitive complaints. Although the sample comprised consecutive referrals in a clinical setting, 25% of the examinees were seeking compensation (i.e., disability benefits). Approximately 9% of the sample failed two or more performance validity indicators, and over 50% of these cases possessed an external incentive to embellish their dysfunction. Among those who passed performance validity indicators, fewer than 25% sought compensation. Not only did those seeking compensation fail performance validity indicators more often, but their neurocognitive profiles were also markedly worse than those of examinees not seeking compensation. Remarkably, those not seeking compensation tended to perform normally on objective neurocognitive tests.

Further enhancing clinical practice, several papers refined the accuracy of commonly used performance validity measures. Peak et al. (Citation2024) compared embedded indices from the Trail Making Test in heterogeneous groups of veterans, optimizing cutoff values. Notably, Rohling et al. (Citation2024) evaluated the optimal cutoff scores for the Word Memory Test across a large sample of examinees. They determined that an elevated risk of false positives exists with standard criterion scores, and they offer more conservative thresholds to enhance specificity. Such efforts enhance classification accuracy of performance validity indices, making it more difficult to elude detection and reducing the risk of false positive errors.

Apart from refining testing instruments, normative reference groups likely require updating. In particular, population diversity in Western nations will continue to increase markedly over the coming decades. To ensure that test scores are interpreted accurately, a relevant normative reference must exist, thereby requiring increasingly diverse normative samples. Achieving this objective is arguably one of the most significant challenges facing clinical neuropsychology. Remarkably, few studies of performance validity measures have focused upon this topic. Denning and Horner (Citation2024) address this issue by examining performance validity test failure among African American and Caucasian veterans. Despite controlling for demographic and clinical factors, embedded measures were prone to false positive errors among the African American examinees. This raises concerns regarding ethnicity-mediated classification errors when assessing performance validity. Such findings should hasten efforts to enhance the diversity of normative reference groups.

Extending the issue of diversity, Tierney et al. (Citation2024) examined the impact of performance validity test failure upon neuropsychological performance in veterans evaluated for clinical rather than disability purposes. Most research concerning this population has focused upon veterans who are evaluated for compensation or forensic purposes. Surprisingly, those who failed one performance validity measure tended to achieve normal scores on more than half of neuropsychological tests, with some domains remaining normal as much as 88% of the time. These data raise questions about the meaning of performance validity tests. Indeed, performance validity tests may be criticized for what they seem to lack, namely a coherent operational definition of what they measure.

What do performance validity tests measure?

The field has struggled to offer operational definitions of key constructs, and intelligence is a salient example of this nebulous quality. Perhaps best stated by Hogan (Citation2006), “… the study of intelligence is one of the most anti-intellectual areas of all psychology – there are no theories of intelligence, there is just a measurement model.” More than 80 years earlier, Boring (Citation1923), a prominent figure in the study of intelligence, anticipated Hogan, stating “… intelligence as a measurable capacity must at the start be defined as the capacity to do well in an intelligence test … measurable intelligence is simply what the tests of intelligence test (emphasis added) …” Arguably, the study of intelligence has evolved, but a universally accepted definition eludes our discipline. Clinical measures of intelligence such as the Wechsler scales remain a patchwork of subtests that evolved over the past 100 years. They have been retained because they possess empirical validity. Basic theory, when invoked, tends to be retrofitted onto applied tools. Generally, theory has not inspired measurement in clinical settings.

When considering performance validity tests, a clear, consensually accepted, and validated operational definition also seems absent. Initially, such tests were deemed measures of malingering (e.g., Tombaugh, Citation1996). However, existing definitions of malingering require some form of external incentive (e.g., Sherman et al., Citation2020). Examinees may fail performance validity tests in clinical settings when an external incentive is absent or unknown, especially among patients diagnosed with functional neurological disorders or traumatic brain injury (McWhirter et al., Citation2020). Because performance validity tests are failed by individuals without a known external incentive (cf. Chafetz et al., Citation2015; Martin & Schroeder, Citation2020), such a descriptor seems at best insufficient and at worst harmful in clinical settings. Consequently, malingering offers an inadequate operational definition.

The American Academy of Clinical Neuropsychology published a consensus statement concerning the labeling of test scores (Guilmette et al., Citation2020) and referred to performance validity tests as “… measures primarily used to identify concerns regarding test engagement, symptom magnification, effort, and test validity.” Although clear, such terms may not describe what is measured by performance validity tests. Among experts in the field, effort and engagement are eschewed as descriptors (Schroeder et al., Citation2016). Presumably, this is because efforts to exaggerate cognitive dysfunction require engagement and effort on the part of examinees. Indeed, some people may exert considerable effort to misrepresent their level of function, even achieving below-chance scores on performance validity indicators.

Ultimately, the tacit consensus among experts is that performance validity tests measure performance validity (Schroeder et al., Citation2016). Such an operational definition appears tautological. Absent an empirically supported operational definition of what is measured by performance validity tests, such an approach is also unscientific. A straightforward question begs for answers. What do performance validity tests measure?

In addressing this issue, it might be reasonable to borrow from other measures of response bias. For example, personality instruments such as the Personality Assessment Inventory or versions of the Minnesota Multiphasic Personality Inventory identify biased responding in an empirical manner. Symptom validity scales include items that are endorsed infrequently, even among individuals with authentic mental illness. Base rates of endorsement are empirically established. High rates of improbable item endorsement on validity scales suggest that responses to clinically relevant scales may be biased or invalid.

Arguably, performance validity measures employ a similar logic. Criterion scores for performance validity tests are established upon base rates of performance. Too many improbable responses on performance validity measures suggest that performance on neurocognitive measures is biased or invalid. In such a circumstance, variance on neurocognitive measures is largely associated with response bias, and the construct of interest is inaccurately measured. Employing this perspective, performance validity tests may be operationally defined as measuring improbable performance. This empirical definition establishes a clear and discrete means of establishing the utility of performance validity measures.

Such an actuarial definition may seem to obviate inferences about intentions of examinees, but this is not necessarily true. When classifying pathology, positive and negative symptoms are sometimes referenced. Positive symptoms refer to the presence of signs or behaviors that are normally absent. Negative symptoms refer to the absence of signs or behaviors that are normally present. The most obvious positive indicator of improbable performance would be below chance performance. Such bias is likely intentional. Negative improbable performance refers to scores that do not fall below chance. However, such scores remain below an established base rate of performance among patients. Inferences regarding the intentionality of negative improbable performance cannot be made with certainty. With positive and negative responses, base rate performance on performance validity tests is used to determine improbable test scores.
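To make this distinction concrete, the following is a minimal illustrative sketch (in Python) of the binomial logic behind below-chance responding on a two-alternative forced-choice measure. The 50-item format and the examinee’s score are hypothetical assumptions, not values from any instrument discussed in this issue.

```python
# Hypothetical sketch: below-chance responding on a two-alternative forced-choice
# performance validity test. Item count and observed score are illustrative only.
from scipy.stats import binom

N_ITEMS = 50           # assumed number of forced-choice trials
CHANCE_P = 0.5         # probability of a correct answer by guessing alone
observed_correct = 15  # hypothetical examinee score

# Probability of scoring this low or lower if the examinee were purely guessing
p_below_chance = binom.cdf(observed_correct, N_ITEMS, CHANCE_P)

if p_below_chance < 0.05:
    print(f"{observed_correct}/{N_ITEMS} is significantly below chance "
          f"(p = {p_below_chance:.4f}); the pattern implies deliberate selection of wrong answers.")
else:
    print(f"{observed_correct}/{N_ITEMS} is not significantly below chance (p = {p_below_chance:.4f}).")
```

Scores that do not reach this threshold may still be improbable in the negative sense described above, but only relative to an empirically established patient base rate.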

Inasmuch as base rates are used to establish cutoffs for performance validity tests, a potential vulnerability emerges. In particular, many performance validity tests employ a single criterion value to establish improbable performance regardless of patient population. Tests such as the Test of Memory Malingering and Word Memory Test rely upon universal cutoff scores. Their respective criterion values were largely established among patients with mild traumatic brain injury. Patients with more severe cognitive impairment likely require more conservative cutoff scores. Research has been conducted with various populations using these tests, and classification accuracy has been demonstrated using the established cutoff values. Nonetheless, few studies of the Test of Memory Malingering or Word Memory Test have determined whether population specific cutoff scores are optimal or necessary.

In contrast, measures such as the Dot Counting Test and Word Choice Test reference population specific cutoff scores. Criterion values may vary considerably between populations on these instruments. It seems dubious that the “one size fits all” criteria of the Test of Memory Malingering and Word Memory Test are optimal. Indeed, in the current issue, the work of Rohling et al. (Citation2024) implies that population specific cutoffs should be employed when using the Word Memory Test. As the field moves forward, it would seem imperative to establish population specific criteria to determine improbable scores on performance validity tests. Determining the right cutoff for patients of a particular diagnosis and specific demographic profile will optimize classification accuracy.
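As a rough illustration of what establishing population specific criteria might involve, the sketch below derives, from simulated credible-patient scores, the most liberal cutoff that still holds specificity at or above 0.90 within each diagnostic group. All data, score ranges, and group labels here are hypothetical assumptions rather than values from any published instrument.

```python
# Illustrative sketch: deriving a population specific cutoff from simulated
# credible-patient scores. Data and score ranges are hypothetical.
import numpy as np

def derive_cutoff(credible_scores, target_specificity=0.90):
    """Return the highest 'fail if score <= cutoff' value whose specificity
    (proportion of credible patients scoring above the cutoff) meets the target."""
    scores = np.asarray(credible_scores)
    best = None
    for cutoff in np.unique(scores):                # candidate cutoffs, ascending
        specificity = np.mean(scores > cutoff)      # credible cases correctly classified
        if specificity >= target_specificity:
            best = (float(cutoff), float(specificity))
    return best

# Hypothetical credible-patient score distributions for two diagnostic groups
rng = np.random.default_rng(0)
mild_tbi = rng.normal(45, 3, 200).round()           # e.g., uncomplicated mild TBI
dementia = rng.normal(38, 5, 200).round()           # e.g., a neurodegenerative group

print("Mild TBI (cutoff, specificity):", derive_cutoff(mild_tbi))
print("Dementia (cutoff, specificity):", derive_cutoff(dementia))
```

The more impaired group requires a lower, more conservative cutoff to maintain the same specificity, which is the intuition behind population specific criteria.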

Recommendations

It seems essential that our discipline strive to establish empirically valid innovations involving performance validity testing. Sound application of performance validity tests should be a top priority. In this regard, we offer the following recommendations for research and practice:

  1. We should aggressively control the risk of false positives, thereby reducing the risk of harm to patients and forensic examinees. Toward this end, we echo existing recommendations to employ multiple performance validity indicators assessed through multiple methods, with criterion scores established at high specificity values (e.g., Sherman et al., Citation2020). Base rates of failure across multiple indicators should be taken into account (see the sketch following these recommendations), but failure of multiple performance validity tests should remain a criterion of improbable responding.

  2. To mitigate the risk of compromised test security, the field should develop new performance validity indicators. Efforts should offer innovation over existing instruments. Potential examples of innovative methodology include adapting symptom validity measures as performance validity measures, integrating biometric data with performance validity test data, and developing more embedded measures across various cognitive domains.

  3. As new instruments are developed and as existing measures are enhanced, researchers should empirically establish the validity of population specific cutoffs. To enhance applied utility, patient populations should include diagnostic groups that are commonly seen in clinic but not necessarily in forensic settings (e.g., neurodegenerative conditions affecting middle-aged adults). Norms should include diverse demographic groups with sufficient sample size to offer stable normative estimates. Much in the way that the Dot Counting Test or Word Choice Test are employed, clinicians might apply norms for a diagnostically relevant population to specific patients. For instance, if a patient is being referred with a presumptive or provisional diagnosis of a neuropsychiatric condition, relevant norms may be accessed to interpret performance validity test scores. Borrowing from precision medicine, we should establish what is improbable for which people with a particular diagnosis.
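Regarding the base rate caution in the first recommendation, the sketch below illustrates, under the simplifying assumptions of independent indicators and a uniform 90% specificity, how the probability that a fully credible examinee fails two or more performance validity tests grows with the number of indicators administered. The figures are illustrative arithmetic, not empirical base rates.

```python
# Sketch: chance that a fully credible examinee fails >= 2 of n PVTs, assuming
# independent indicators with identical specificity (both are simplifications).
from math import comb

def p_at_least_k_failures(n_tests, specificity, k=2):
    """Probability of failing at least k of n independent PVTs by false positives alone."""
    p_fail = 1.0 - specificity
    return sum(comb(n_tests, j) * p_fail**j * (1 - p_fail)**(n_tests - j)
               for j in range(k, n_tests + 1))

for n in (2, 5, 8):
    print(f"{n} PVTs at 90% specificity: P(>= 2 failures) = {p_at_least_k_failures(n, 0.90):.3f}")
```

Real indicators are intercorrelated, so these figures are only a heuristic, but they underscore why base rates across an entire battery of indicators deserve attention.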

The collection of papers in this special issue offers a contemporary depiction of performance validity research. They also offer tantalizing hints of where research involving PVTs may evolve. We hope that these papers stimulate a collegial exchange of ideas and that they offer a catalyst as the field moves forward.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Basso, M. R., Guzman, D., Hoffmeister, J., Mulligan, R., Whiteside, D. M., & Combs, D. (2024). Use of perceptual memory as a performance validity indicator: Initial validation with simulated mild traumatic brain injury. Journal of Clinical and Experimental Neuropsychology, 1–12. https://doi.org/10.1080/13803395.2024.2314991
  • Bigler, E. (1986). Forensic issues in neuropsychology. In D. Wedding, A. Horton Jr., & J. Webster (Eds.), The neuropsychology handbook: Behavioral and clinical perspectives (pp. 526–547). Springer Publishing.
  • Boring, E. G. (1923). Intelligence as the tests test it. New Republic, 36, 35–37. https://brocku.ca/MeadProject/sup/Boring_1923.html
  • Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., Reynolds, C. R., & Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity. Archives of Clinical Neuropsychology, 20, 419–426.
  • Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini, K. J., Boone, K. B., Kirkwood, M. W., Larrabee, G. J., & Ord, J. S. (2015). Official position of the American Academy of Clinical Neuropsychology Social Security administration policy on validity testing: Guidance and recommendations for change. The Clinical Neuropsychologist, 29(6), 723–740. https://doi.org/10.1080/13854046.2015.1099738
  • Denning, J. H., & Horner, M. D. (2024). The impact of race and other demographic factors on the false positive rates of five embedded performance validity tests (PVTs) in a Veteran sample. Journal of Clinical and Experimental Neuropsychology, 1–11. https://doi.org/10.1080/13803395.2024.2314737
  • Guilmette, T. J., Sweet, J. J., Hebben, N., Koltai, D., Mahone, E. M., Spiegler, B. J., Stucky, K., Westerveld, M., & Conference Participants. (2020). American Academy of Clinical Neuropsychology consensus conference statement on uniform labeling of performance test scores. The Clinical Neuropsychologist, 34(3), 437–453. https://doi.org/10.1080/13854046.2020.1722244
  • Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.
  • Hogan, R. (2006). Who wants to be a psychologist? Journal of Personality Assessment, 86(2), 119–130. https://doi.org/10.1207/s15327752jpa8602_01
  • Lezak, M. D. (1976). Neuropsychological assessment. Oxford University Press.
  • Lippa, S. M. (2017). Performance validity testing in neuropsychology: A clinical guide, critical review, and update on a rapidly evolving literature. The Clinical Neuropsychologist, 32, 391–421. https://doi.org/10.1080/13854046.2017.1406146
  • Martin, P. K., & Schroeder, R. W. (2020). Base rates of invalid test performance across clinical non-forensic contexts and settings. Archives of Clinical Neuropsychology, 35(6), 717–725. https://doi.org/10.1093/arclin/acaa017
  • Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists’ validity testing beliefs and practices: A survey of North American professionals. The Clinical Neuropsychologist, 29(6), 741–776. https://doi.org/10.1080/13854046.2015.1087597
  • McWhirter, L., Ritchie, C. W., Stone, J., & Carson, A. (2020). Performance validity test failure in clinical populations—a systematic review. Journal of Neurology, Neurosurgery & Psychiatry, 91, 945–952.
  • Patrick, S. D., Rapport, L. J., Hanks, R. A., & Kanser, R. J. (2024). Detecting feigned cognitive impairment using pupillometry on the Warrington Recognition Memory Test for Words. Journal of Clinical and Experimental Neuropsychology, 1–10. https://doi.org/10.1080/13803395.2024.2312624
  • Peak, A. M., Marceaux, J. C., Chicota-Carroll, C., & Soble, J. R. (2024). Cross-validation of the trail making test as a non-memory-based embedded performance validity test among veterans with and without cognitive impairment. Journal of Clinical and Experimental Neuropsychology, 1–9. https://doi.org/10.1080/13803395.2023.2287784
  • Rohling, M. L., Demakis, G. J., & Langhinrichsen-Rohling, J. (2024). Adjusting the cutoffs for detection of invalid performance on the word memory test. Journal of Clinical and Experimental Neuropsychology. https://doi.org/10.1080/13803395.2024.2314736
  • Schroeder, R. W., Martin, P. K., & Odland, A. P. (2016). Expert beliefs and practices regarding neuropsychological validity testing. The Clinical Neuropsychologist, 30(4), 515–535. https://doi.org/10.1080/13854046.2016.1177118
  • Sherman, E. M. S., Slick, D. J., & Iverson, G. L. (2020). Multidimensional malingering criteria for neuropsychological assessment: A 20-year update of the malingered neuropsychological dysfunction criteria. Archives of Clinical Neuropsychology, 35(6), 735–764. https://doi.org/10.1093/arclin/acaa019
  • Sweet, J. J., Heilbronner, R. L., Morgan, J. E., Larrabee, G. J., Rohling, M. L., Boone, K. B., Kirkwood, M. W., Schroeder, R. W., Suhr, J. A., & Conference Participants. (2021). American Academy of Clinical Neuropsychology (AACN) 2021 consensus statement on validity assessment: Update of the 2009 AACN consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 35(6), 1053–1106. https://doi.org/10.1080/13854046.2021.1896036
  • Tierney, S. M., Matchanova, A., Miller, B. I., Troyanskaya, M., Romesser, J., Sim, A., & Pastorek, N. J. (2024). Cognitive “success” in the setting of performance validity test failure. Journal of Clinical and Experimental Neuropsychology, 1–9. https://doi.org/10.1080/13803395.2023.2244161
  • Tombaugh, T. N. (1996). TOMM: Test of memory malingering. Multi-Health Systems.
  • Victor, T. L., & Abeles, N. (2004). Coaching clients to take psychological and neuropsychological tests: A clash of ethical obligations. Professional Psychology, Research and Practice, 35(4), 373–379. https://doi.org/10.1037/0735-7028.35.4.373
  • Whiteside, D. M., Basso, M. R., Shen, C., Fry, L., Naini, S., Waldron, E. J., Holker, E., Porter, J., Eskridge, C., Logemann, A., & Minor, G. N. (2024). The relationship between performance validity testing, external incentives, and cognitive functioning in long COVID. Journal of Clinical and Experimental Neuropsychology, 1–10. https://doi.org/10.1080/13803395.2024.2312625
