ABSTRACT
Testing programs are confronted with the decision of whether to report individual scores for examinees who have engaged in rapid guessing (RG). As noted by the Standards for Educational and Psychological Testing, this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic criteria (e.g., exclude all examinees with RG rates of 10% or more) have been adopted in the literature. Given that these criteria lack strong methodological support, the objective of this simulation study was to evaluate their appropriateness in terms of the accuracy of individual ability estimates and classifications when manipulating both assessment and RG characteristics. The findings provide evidence that employing a common criterion for all examinees may be an ineffective strategy, because a given RG percentage may bias scores to differing degrees depending on test difficulty, examinee ability, and RG pattern. These results suggest that practitioners may benefit from establishing context-specific exclusion criteria that consider test purpose, score use, and targeted examinee trait levels.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
1 RG has been documented in high-stakes testing contexts due to test-speededness (Schnipke & Scrams, 1997). However, the focus of this paper is on RG associated with low test-taking effort.
2 Although these rates could be associated with test-speededness, the papers cited did not note concerns about insufficient testing time.
3 An alternative solution for these response strings is to add or subtract a constant from the summed score, which is an implementation option in the irtplay package. However, this approach was not adopted, to avoid introducing artificial bias into the ability parameter estimates.
4 This finding reflects the fact that positive bias will occur when the RG accuracy rate exceeds the chance level. As such, the degree of bias present may be exacerbated by including items with c parameters that are less than the reciprocal of the number of response options. As can be seen in the supplemental file, many of the items used for data generation in this simulation study possessed c parameters below .25, which may have been one reason for the positive bias observed.
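The mechanism described in Note 4 can be illustrated with a small simulation. The sketch below is purely illustrative and is not the study's actual design: the item parameters, RG rate, replication count, and grid-search maximum-likelihood estimator are all assumptions chosen for demonstration. For a low-ability examinee, RG responses that are correct at the four-option chance rate (.25) tend to inflate the ability estimate when item c parameters sit below .25, but not when c equals the chance level.

```python
import numpy as np

rng = np.random.default_rng(42)

def p3pl(theta, a, b, c):
    # 3PL probability of a correct response
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def mle_theta(resp, a, b, c, grid=np.linspace(-4, 4, 801)):
    # Grid-search maximum-likelihood estimate of theta (illustrative estimator)
    P = p3pl(grid[:, None], a, b, c)                  # shape: (grid points, items)
    ll = (resp * np.log(P) + (1 - resp) * np.log(1 - P)).sum(axis=1)
    return grid[np.argmax(ll)]

n_items, theta_true = 60, -2.0                        # low-ability examinee
a = np.full(n_items, 1.2)                             # assumed discriminations
b = rng.normal(0, 1, n_items)                         # assumed difficulties

def mean_bias(c_val, rg_rate=0.3, chance=0.25, reps=200):
    # Average bias in theta-hat when a fraction of items is answered by RG
    c = np.full(n_items, c_val)
    biases = []
    for _ in range(reps):
        resp = (rng.random(n_items) < p3pl(theta_true, a, b, c)).astype(float)
        rg = rng.random(n_items) < rg_rate            # items answered by RG
        resp[rg] = (rng.random(rg.sum()) < chance)    # RG correct at chance level
        biases.append(mle_theta(resp, a, b, c) - theta_true)
    return float(np.mean(biases))

bias_low_c = mean_bias(0.10)  # c below the chance level: RG successes read as ability
bias_at_c = mean_bias(0.25)   # c equal to the chance level
```

Under these assumptions, `bias_low_c` exceeds `bias_at_c`: when c is below the reciprocal of the number of response options, RG successes at the chance rate surpass the model's guessing floor and are attributed to ability, consistent with the positive bias discussed in Note 4.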