Auditing for Score Inflation Using Self-Monitoring Assessments: Findings From Three Pilot Studies

Pages 231-247 | Published online: 20 Sep 2016

Abstract

Test-based accountability often produces score inflation. Most studies have evaluated inflation by comparing trends on a high-stakes test and a lower stakes audit test. However, Koretz and Beguin (2010) noted weaknesses of audit tests and suggested self-monitoring assessments (SMAs), which incorporate audit items into high-stakes tests. This article reports the first three trials of SMAs, evaluating whether SMAs can detect inflation that had already been documented. The studies were conducted with mathematics tests in three grades. Despite severe conservative biases, the audit component functioned as expected in many of the trials. The difference in performance between nonaudit and audit items was associated with factors that earlier research showed to be related to test preparation and score inflation, such as scoring just below the Proficient cut in the previous year and school poverty. However, a number of null findings underscore the need for additional research into the design of audit items.
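Comparing trends on a high-stakes test and an audit test comes down to simple arithmetic on gains. The sketch below is a minimal illustration, not the procedure used in the studies cited here. It assumes each gain is expressed in units of the baseline standard deviation, and the function names `standardized_gain` and `inflation_estimate` and the scores are hypothetical.

```python
from statistics import mean, pstdev


def standardized_gain(baseline, followup):
    """Change in mean score, in units of the baseline standard deviation.

    Assumption: standardizing by the baseline SD is one common convention;
    other studies may standardize differently.
    """
    sd = pstdev(baseline)
    if sd == 0:
        raise ValueError("baseline scores must vary")
    return (mean(followup) - mean(baseline)) / sd


def inflation_estimate(high_base, high_follow, audit_base, audit_follow):
    """Divergence in trends: high-stakes gain minus audit-test gain.

    A positive value means high-stakes scores rose faster than audit scores,
    the pattern usually read as evidence of score inflation.
    """
    return (standardized_gain(high_base, high_follow)
            - standardized_gain(audit_base, audit_follow))


# Hypothetical scores for the same five students on each test.
high_base, high_follow = [10, 12, 14, 16, 18], [14, 16, 18, 20, 22]
audit_base, audit_follow = [10, 12, 14, 16, 18], [11, 13, 15, 17, 19]
print(round(inflation_estimate(high_base, high_follow, audit_base, audit_follow), 2))  # 1.06
```

In this toy example the high-stakes test gains about 1.41 SD while the audit test gains about 0.35 SD, so the divergence of about 1.06 SD would be attributed to inflation under this operationalization.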

Funding

The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A110420, and by the Spencer Foundation, through Grants 201100075 and 201200071, to the President and Fellows of Harvard College.

Acknowledgments

We thank the New York State Education Department for providing the data used in this study. The opinions expressed are those of the authors and do not represent views of the Institute of Education Sciences, the U.S. Department of Education, the Spencer Foundation, or the New York State Education Department or its staff.

Notes

1 Score inflation has typically been operationalized as the divergence in trends between scores on a high-stakes test and on a lower stakes audit test designed to support similar inferences, using either identical students or randomly equivalent groups. For a discussion of methods for validating scores under high-stakes conditions, see Koretz and Hamilton (2006).

2 We also dropped a very small number of students with mismatched form booklets. In addition, we dropped P.S. 184 Shuang Wen School, a New York City public school with a Mandarin Chinese immersion program. A large percentage of the school's students were Asian, and this value was so extreme relative to the rest of the schools in our sample that it inflated the coefficient for the school proportion–Asian variable.

3 In most cases, parents reported race. When parents did not report race, districts were responsible for assigning classifications.

4 For detailed information about the criteria for the low-income variable, see University of the State of New York (2011, p. 44).

5 Conceptually, reliability cannot be negative. In practice, when reliability is very low, sampling error can produce negative estimates. One characteristic of our data increases the probability of negative sample estimates. Under the classical model, the estimated reliability of a difference score will be negative whenever $\sigma_x^2\rho_{xx'} + \sigma_y^2\rho_{yy'} < 2\rho_{xy}\sigma_x\sigma_y$, where x and y are the two tests that are differenced, $\sigma_x^2$ and $\sigma_y^2$ are their variances, $\rho_{xx'}$ and $\rho_{yy'}$ are their reliabilities, and $\rho_{xy}$ is the correlation between them. This inequality is more likely to hold when the test with the larger variance has considerably lower reliability, which is precisely what our data produce. Following convention, we set all negative reliability estimates to zero.
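A minimal sketch of the classical difference-score reliability and the truncation convention described in Note 5. It assumes the standard classical-test-theory formula, with each test's variance and reliability and the correlation between the tests supplied as sample estimates; the function names and inputs are hypothetical, and the authors' exact computation may differ. The estimate is negative exactly when its numerator, the estimated true-score variance of the difference, is negative.

```python
import math


def difference_score_reliability(var_x, var_y, rel_x, rel_y, corr_xy):
    """Classical reliability of the difference score D = X - Y.

    Returns the raw sample estimate, which can be negative.
    """
    cov_xy = corr_xy * math.sqrt(var_x) * math.sqrt(var_y)
    # Numerator: estimated true-score variance of D.
    true_var_d = var_x * rel_x + var_y * rel_y - 2 * cov_xy
    # Denominator: observed variance of D.
    obs_var_d = var_x + var_y - 2 * cov_xy
    if obs_var_d <= 0:
        raise ValueError("observed variance of the difference must be positive")
    return true_var_d / obs_var_d


def truncate_at_zero(reliability):
    """Set a negative reliability estimate to zero."""
    return max(reliability, 0.0)


# Hypothetical inputs: X has four times the variance of Y but far lower reliability.
raw = difference_score_reliability(var_x=4.0, var_y=1.0, rel_x=0.1, rel_y=0.8, corr_xy=0.5)
print(round(raw, 2), truncate_at_zero(raw))  # -0.27 0.0
```

In this example the observed correlation of 0.5 exceeds $\sqrt{0.1 \times 0.8} \approx 0.28$, the ceiling the classical model places on the correlation between two tests with these reliabilities. Under that model such an excess reflects sampling error, which is how a negative estimate arises before it is set to zero.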
