
A General Approach to Measuring Test-Taking Effort on Computer-Based Tests

Pages 343-354 | Published online: 01 Sep 2017
 

ABSTRACT

There has been an increased interest in the impact of unmotivated test taking on test performance and score validity. This has led to the development of new ways of measuring test-taking effort based on item response time. In particular, Response Time Effort (RTE) has been shown to provide an assessment of effort down to the level of individual item responses. A limitation of RTE, however, is that it is intended for use with selected response items that must be answered before a test taker can move on to the next item. The current study outlines a general process for measuring item-level effort that can be applied to an expanded set of item types and test-taking behaviors (such as omitted or constructed responses). This process, which is illustrated with data from a large-scale assessment program, should improve our ability to detect non-effortful test taking and perform individual score validation.

Notes

1 Strictly speaking, RTE measures the amount of non-effortful behavior (exhibited as rapid guessing) detected during a test event. Rapid guesses are assumed to be non-effortful, while responses classified as solution behaviors may comprise both effortful and undetected instances of non-effortful responses. This is discussed further in the Limitations section.

2 Person fit indices have also been developed to identify aberrant test-taking behavior. As Wise (2015) noted, however, because person fit indices are sensitive to myriad sources of misfit, it can be difficult to unambiguously interpret a given instance of misfit as a lack of effort.

3 Within the United States, the PBTS is known as the OECD Test for Schools (Based on PISA).

4 We also investigated a variety of brief answer length thresholds ranging from 6% to 20%. Between 6% and 10%, the results were highly similar. Beyond 10%, no additional test takers were identified because there were none who rapidly entered longer answers. This suggests that the shortness of the time threshold materially constrains the answer length thresholds that could be considered.
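
To make the interplay between the time threshold and the answer length threshold in note 4 concrete, here is a minimal sketch of one way a constructed response might be classified as a rapid, brief answer. It assumes that the length threshold is a fraction of some reference length; the note does not say what the percentages are relative to, so `reference_length`, the function name, and all parameter values are hypothetical rather than the article's procedure.

```python
def is_rapid_brief_answer(
    response_time: float,
    answer_length: int,
    time_threshold: float,
    reference_length: int,
    length_fraction: float = 0.10,
) -> bool:
    """Flag a constructed response entered quickly AND kept very short.

    Assumption: "brief" means the answer is no longer than
    length_fraction * reference_length characters. The reference length
    is a placeholder; the source does not define the denominator.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    is_rapid = response_time < time_threshold
    is_brief = answer_length <= length_fraction * reference_length
    return is_rapid and is_brief
```

Under this sketch, raising `length_fraction` can only flag additional responses if some test takers entered longer answers within the time threshold, which is consistent with the note's observation that a short time threshold constrains the useful range of length thresholds.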

5 Research on RTE has found that once the percentage of non-effortful responses exceeds 10%, test performance begins to be materially distorted (Wise, 2015; Wise & Kingsbury, 2016). Thus, RTE equal to .90 can be considered a useful criterion for invalidating a score because it is too distorted to be trustworthy. A similar argument could be made for RBE scores.
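
The arithmetic behind the .90 criterion in note 5 can be sketched as follows. The sketch treats an RTE-style score as the proportion of item responses not classified as rapid guesses and flags a test event when that proportion falls below .90. The function names, the rule that a response is a rapid guess when its time is below a per-item threshold, and the sample values are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def response_time_effort(
    response_times: Sequence[float],
    time_thresholds: Sequence[float],
) -> float:
    """Return the proportion of responses classified as solution behavior.

    Assumption: a response is a rapid guess when its response time is
    below the (hypothetical) threshold for that item.
    """
    if len(response_times) != len(time_thresholds):
        raise ValueError("one threshold is required per response")
    if not response_times:
        raise ValueError("at least one response is required")
    solution_count = sum(
        1 for t, limit in zip(response_times, time_thresholds) if t >= limit
    )
    return solution_count / len(response_times)


def should_invalidate(rte: float, criterion: float = 0.90) -> bool:
    """Flag a score when more than 10% of responses were rapid guesses."""
    return rte < criterion


# Hypothetical test event: 10 items, each with a 3-second threshold.
times = [12.0, 1.5, 20.0, 30.0, 2.0, 15.0, 18.0, 25.0, 40.0, 11.0]
limits = [3.0] * 10
rte = response_time_effort(times, limits)  # two rapid guesses out of ten
flagged = should_invalidate(rte)
```

As note 1 cautions, responses counted as solution behavior here may still include undetected non-effortful responses, so a score that passes this check is not guaranteed to reflect full effort.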
