Abstract
In an on-demand testing program, some items are repeatedly used across test administrations. This poses a risk to test security. In this study, we considered a scenario wherein a test was divided into two subsets: one consisting of secure items and the other consisting of possibly compromised items. In a simulation study of multistage adaptive testing, we used three methods to detect item preknowledge: a predictive checking method (PCM), a likelihood ratio test (LRT), and an adapted Kullback–Leibler divergence (KLD-A) test. We manipulated four factors: the proportion of compromised items, the stage of adaptive testing at which preknowledge was present, item-parameter estimation error, and the information contained in secure items. The type I error results indicated that the LRT and PCM methods are favored over the KLD-A method because the KLD-A can experience large inflated type I error in many conditions. In regard to power, the LRT and PCM methods displayed a wide range of results, generally from 0.2 to 0.8, depending on the amount of preknowledge and the stage of adaptive testing at which the preknowledge was present.
Notes
1 It is typically assumed that the two stages are equally likely to be compromised once they have both been repeatedly used. Under certain circumstances, a priori information of which items are most likely to be compromised is available, and thus T2 can only be formed by those items. But in this study, we do not make such an assumption.
2 The variance of normal distribution is the same as the variance of the c-parameters of T1 items with high-guessing parameters.