When CAT is not an option: complementary methods of test abbreviation for neurocognitive batteries

Pages 35-54 | Received 20 May 2020, Accepted 30 Nov 2020, Published online: 11 Dec 2020
 

ABSTRACT

Introduction

There is an obvious need for efficient measurement of neuropsychiatric phenomena. A proven method—computerized adaptive testing (CAT)—is not feasible for all tests, necessitating alternatives for increasing test efficiency.

Methods

We combined and compared two methods for abbreviating rapid tests, using two tests unamenable to CAT (a Continuous Performance Test [CPT] and an n-back test [NBACK]). The tests were administered to N = 9,498 participants (mean age 14.2 years; 52% female), and abbreviation was accomplished using methods that answer two questions: what happens to measurement error as items are removed, and what happens to correlations with validity criteria as items are removed? The first question was investigated using quasi-CAT simulation; the second was investigated using bootstrapped confidence intervals around full-form/short-form comparisons.
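The bootstrapped validity check can be illustrated with a minimal sketch. The arrays full_form, short_form, and criterion below are hypothetical, and the percentile bootstrap shown is only one way to build the confidence intervals described above; it is not the authors' exact procedure.

```python
# Minimal sketch (not the authors' exact code): percentile-bootstrap confidence
# interval for the difference between the full-form-criterion and
# short-form-criterion correlations, resampling examinees with replacement.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_corr_diff_ci(full_form, short_form, criterion, n_boot=2000, alpha=0.05):
    """CI for r(full, criterion) - r(short, criterion) via case resampling."""
    n = len(criterion)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # resample examinees
        r_full = np.corrcoef(full_form[idx], criterion[idx])[0, 1]
        r_short = np.corrcoef(short_form[idx], criterion[idx])[0, 1]
        diffs[b] = r_full - r_short
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical example: if the interval excludes a meaningful drop in the
# validity correlation, the short form is judged acceptable on this criterion.
theta = rng.standard_normal(500)
full_form = theta + 0.3 * rng.standard_normal(500)        # full-form score
short_form = theta + 0.5 * rng.standard_normal(500)       # noisier short-form score
criterion = theta + rng.standard_normal(500)              # external validity criterion
print(bootstrap_corr_diff_ci(full_form, short_form, criterion))
```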

Results

Results for the two methods overlapped, suggesting that the CPT could be abbreviated to 57% of its original length and the NBACK to 87% of its original length while remaining within the maximum acceptable loss of precision and retaining the minimum acceptable relationships with validity criteria.

Conclusions

This method combination shows promise for use with other test types, and the divergent results for the CPT and NBACK demonstrate the methods’ ability to detect when a test should not be shortened. The methods should be used in combination because they emphasize complementary measurement qualities: precision and validity.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Note that the benefits of CAT depend on how it is applied. One way (variable-length) is to allow tests to proceed until some precision criterion is satisfied. This ensures that each examinee has a comparable (and acceptable) level of precision (see Babcock & Weiss, Citation2009; Paap et al., Citation2019). An alternative (fixed-length) is to administer a specific number of items to all examinees, where examinees can finish the test with varying levels of precision, yet the length of the test is predictable. Importantly, a fixed-length adaptive form will still usually produce results superior to a fixed-length short-form because the items administered in the fixed-length CAT are selected to maximize information. An obvious problem with variable-length CATs for large-scale studies is that the varying length makes them difficult to fit into the tight participant schedules (measure after measure after measure) common in such studies.
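The two stopping rules contrasted in this note can be sketched with a toy CAT. The 2PL item bank, EAP scoring grid, maximum-information selection, and the run_cat helper below are assumptions made for illustration, not the engine used in the study.

```python
# Minimal, self-contained sketch (assumed setup: a 2PL item bank, EAP scoring on
# a grid, maximum-information item selection). Illustrates the fixed-length vs
# variable-length stopping rules contrasted in this note.
import numpy as np

rng = np.random.default_rng(1)
n_items = 100
a = rng.uniform(0.8, 2.0, n_items)        # discrimination parameters
b = rng.normal(0.0, 1.0, n_items)         # difficulty parameters
grid = np.linspace(-4, 4, 121)            # quadrature grid for EAP scoring
prior = np.exp(-0.5 * grid**2)            # standard-normal prior (unnormalized)

def p_correct(theta, j):
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

def run_cat(true_theta, max_items=40, se_target=None):
    """Administer items by maximum information; stop at max_items (fixed-length)
    or as soon as the posterior SE falls below se_target (variable-length)."""
    available = list(range(n_items))
    posterior = prior.copy()
    theta_hat, se = 0.0, np.inf
    administered = 0
    while administered < max_items and (se_target is None or se > se_target):
        # pick the available item with maximum Fisher information at theta_hat
        p = 1.0 / (1.0 + np.exp(-a[available] * (theta_hat - b[available])))
        info = a[available] ** 2 * p * (1 - p)
        j = available.pop(int(np.argmax(info)))
        # simulate a response and update the posterior over the grid
        resp = rng.random() < p_correct(true_theta, j)
        pj = 1.0 / (1.0 + np.exp(-a[j] * (grid - b[j])))
        posterior *= pj if resp else (1 - pj)
        posterior /= posterior.sum()
        theta_hat = float(np.sum(grid * posterior))
        se = float(np.sqrt(np.sum((grid - theta_hat) ** 2 * posterior)))
        administered += 1
    return theta_hat, se, administered

print(run_cat(0.5, max_items=20))                 # fixed-length: 20 items, SE varies
print(run_cat(0.5, max_items=60, se_target=0.3))  # variable-length: stop when SE <= 0.30
```

With the fixed-length rule the number of items is predictable but the final standard error varies across examinees; with the SE-based rule the reverse holds, which is the scheduling problem noted above.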

2 “Desirable” in this case depends on the goals of the test. For example, if the test were designed to detect cognitive impairment in an elderly population, the desirable range of difficulty parameters would be toward the low end of the continuum, because the goal is to distinguish those with versus without impairment (implying low difficulty parameters), not those with versus without high-level abilities. Likewise, if there is no specific target population, then the desirable range of difficulties would roughly match what one expects in the general population: a normal curve (most difficulty parameters near the mean, with fewer and fewer as one moves away from the mean).
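As a small, hypothetical illustration of this point, the sketch below compares a made-up set of difficulty parameters against a standard-normal trait distribution; the b_params values and the Kolmogorov-Smirnov check are illustrative assumptions, not part of the study.

```python
# Small illustration (hypothetical parameters): when no specific target population
# exists, the spread of difficulty parameters can be compared against the trait
# distribution expected in the general population (here, standard normal).
import numpy as np
from scipy import stats

b_params = np.array([-2.1, -1.4, -0.9, -0.3, 0.0, 0.2, 0.6, 1.1, 1.8])  # hypothetical difficulties

# Kolmogorov-Smirnov comparison of the difficulties against N(0, 1): a large
# statistic suggests the difficulties sit away from where most examinees fall.
ks_stat, p_value = stats.kstest(b_params, "norm")
print(f"KS statistic = {ks_stat:.2f}, p = {p_value:.2f}")

# For an impairment-screening test, one would instead want most difficulties
# toward the low end of the continuum.
print("share of items with b < -1:", np.mean(b_params < -1))
```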

3 In both Shanmugan et al. (Citation2016) and Moore et al. (Citation2019), the model described here was used as an intermediate step in the process of obtaining orthogonal bifactor (Reise et al., Citation2010) scores. While the goal in those earlier studies was to obtain orthogonal scores (or a “p” factor in the case of Moore et al.), the present study does not need orthogonal scores, and therefore the correlated scores from the “middle step” correlated-traits model were used here.

4 Note that the idea of judging the “quality” of items based on exposure (frequency of administration) in CAT simulation is not new, and it is certainly possible that the methods demonstrated in Reise and Henson (Citation2000) and Choi et al. (Citation2010) would produce results identical to those seen here. However, Moore et al. (Citation2015) differs from the above approaches in that the above emphasize administration rank (an item is good if it is administered early), whereas Moore et al. (Citation2015) emphasize total item administrations. The two approaches can produce different results if, for example, there is an item with very high administration frequency that tends to be administered late. This can happen if several CAT item-administration “chains” converge on the item later in the administration sequence.
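The distinction between total administrations and administration rank can be made concrete with a toy administration log. The admin_log dictionary below is hypothetical; it simply shows how the two tallies can order items differently.

```python
# Sketch of the distinction drawn in this note, using a hypothetical log of
# simulated CAT administrations: for each examinee, the ordered list of item IDs
# that the simulated CAT administered.
from collections import defaultdict

admin_log = {
    "e1": ["i3", "i7", "i2", "i9"],
    "e2": ["i3", "i1", "i9", "i2"],
    "e3": ["i5", "i7", "i9", "i2"],
}

counts = defaultdict(int)       # total administrations (emphasis of Moore et al., 2015)
positions = defaultdict(list)   # administration positions (rank-based emphasis)

for items in admin_log.values():
    for position, item in enumerate(items, start=1):
        counts[item] += 1
        positions[item].append(position)

mean_rank = {item: sum(pos) / len(pos) for item, pos in positions.items()}

# Item "i9" is administered to every examinee (high exposure) but always late
# (poor rank), so the two criteria can order items differently.
print(sorted(counts.items(), key=lambda kv: -kv[1]))
print(sorted(mean_rank.items(), key=lambda kv: kv[1]))
```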

5 Two assumptions here are that (1) the distribution of the trait is normal (Gaussian), such that an optimal distribution of difficulties would reflect the high proportion of examinees around the mean, and (2) each item’s characteristics can be assessed “in a vacuum”, independent of administration order or of which items it ends up combined with. Accordingly, an “optimal” difficulty is one close to the mean, and higher discrimination is always better. In reality, the “optimal” workings of CAT are more complex, such that the phrase “optimal combination of discrimination and difficulty” could not be asserted without more qualifications. For example, it is not necessarily true that higher discrimination parameters are always better, as this depends on when the items are administered during the test (Chang & Ying, Citation1999). See Chang (Citation2015) for a summary.
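Under a two-parameter logistic (2PL) model, for instance, item information is I_j(θ) = a_j² P_j(θ)[1 − P_j(θ)], which peaks where θ equals the item’s difficulty and scales with the squared discrimination. A brief sketch with hypothetical parameters:

```python
# Item information for the two-parameter logistic (2PL) model:
# I_j(theta) = a_j^2 * P_j(theta) * (1 - P_j(theta)). Information peaks at
# theta = b_j and grows with a_j^2, which is the sense in which "optimal" pairs
# high discrimination with difficulty near the trait level being measured.
# Parameters below are hypothetical.
import numpy as np

def item_information(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

theta = np.linspace(-3, 3, 7)
print(item_information(theta, a=1.8, b=0.0))   # most information near theta = 0
print(item_information(theta, a=1.8, b=2.0))   # most information near theta = 2
print(item_information(theta, a=0.8, b=0.0))   # lower discrimination, less information
```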

6 In a typical test/scale revision pipeline (see, e.g., Reise et al., Citation2000), an item/stimulus with such poor qualities would be removed from future versions. For this specific application, however, we are treating the test as a fixed program that can only be shortened, not altered at the source code (stimulus presentation) level. If the task were reprogrammed from the beginning, it might be advisable to remove the problematic stimuli.

7 Note that Cronbach’s alpha is given here only as additional information. It is not used in the simulations.

Additional information

Funding

This work was supported by the Lifespan Brain Institute (LiBI); National Institute of Mental Health (NIMH) grants MH089983, MH117014, and MH096891; and the Dowshen Neuroscience Fund Program for Neuroscience.
