
The Consequences of Using One Assessment System to Pursue Two Objectives

Pages 339-352 | Published online: 27 Sep 2013
 

Abstract

Education officials often use one assessment system both to create measures of student achievement and to create performance metrics for educators. However, modern standardized testing systems are not designed to produce performance metrics for teachers or principals; they are designed to produce reliable measures of individual student achievement in a low-stakes testing environment. The design features that promote reliable measurement provide opportunities for teachers to profitably coach students on test-taking skills, and educators typically exploit these opportunities whenever modern assessments are used in high-stakes settings as vehicles for gathering information about their performance. Because these coaching responses often contaminate measures of both student achievement and educator performance, it is likely possible to acquire more accurate measures of both by developing separate assessment systems designed specifically for each measurement task.


Acknowledgments

The author thanks the Searle Freedom Trust for research support and also thanks Lindy and Michael Keiser for research support through a gift to the University of Chicago's Committee on Education. He further thanks Michael Greenstone, Diane Whitmore Schanzenbach, and Robert S. Gibbons for useful comments, Robert D. Gibbons and David Thissen for their insights on psychometrics and assessment development, and Ian Fillmore, Sarah A. G. Komisarow, and Richard Olson for excellent research assistance.

Notes

1. The SMARTER Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC) are the two groups developing assessment systems for the Common Core State Standards using funds awarded as part of the Obama Administration's Race to the Top initiative.

2. The pattern described here is not definitive proof that student test score gains on high-stakes assessments fail to reflect real gains in subject mastery, because the parallel low-stakes assessments are never identical in subject content. See Koretz (2002) for more on this point. However, many studies in this literature present results that are difficult to explain credibly without some account of how high-stakes assessment scores can rise quickly without commensurate improvements in true levels of math or reading achievement.

3. Glewwe, Ilias, and Kremer (2010), Jacob (2005), Klein et al. (2000), Koretz and Barron (1998), Koretz (2002), and Vigdor (2009) all present results that show divergence between student results on two assessments of the same subject matter in settings where one assessment became a high-stakes assessment for educators and the other continued to involve relatively low stakes. Neal (2012) provides a detailed discussion of this literature.

4. See Campbell (1979), Kerr (1995), and Rothstein (2009) for many examples.

5. The effort distortions induced by assessment-based accountability are not one-time costs. Any system that induces teachers to adopt teaching methods that raise test scores but degrade the true quality of instruction imposes an ongoing cost on students, and students bear these costs throughout all grades and classes where their teachers are subject to assessment-based accountability.

6. See Hambleton, Swaminathan, and Rogers (1991, 135).

7. See Kolen and Brennan (2010, 19).

8. For more on IRT models, see Hambleton, Swaminathan, and Rogers (1991).

9. See Bay-Borelli et al. (2010, 25).

10. Kolen and Brennan (2010) asserted that proper ex post equating of the results from different exam forms is not possible without an ex ante commitment to systematic procedures that govern item and form development, and they gave several examples of cases where equating procedures did not work well ex post because different assessment forms in a series were not developed and administered in a consistent manner (see Chapter 8).
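Kolen and Brennan's argument concerns the conditions under which equating is valid at all. Purely as an illustration of what equating does, the sketch below implements one textbook method, mean-sigma linear equating under a random-groups design; the function name and the data are hypothetical and are not drawn from the cases they discuss.

```python
import numpy as np

def mean_sigma_equate(scores_x, scores_y, x):
    """Map a raw score x from Form X onto the Form Y scale using
    mean-sigma linear equating: y = (sd_y/sd_x) * (x - mean_x) + mean_y.
    Assumes the two forms were taken by randomly equivalent groups."""
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x, ddof=1)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y, ddof=1)
    return (sd_y / sd_x) * (x - mu_x) + mu_y

# Hypothetical example: Form X runs slightly harder, so raw scores sit lower.
rng = np.random.default_rng(0)
form_x = rng.normal(48, 10, size=2000)  # raw scores on Form X
form_y = rng.normal(52, 11, size=2000)  # raw scores on Form Y
print(mean_sigma_equate(form_x, form_y, 48.0))  # roughly 52 on the Form Y scale
```

Even this simple method presupposes a consistent development and administration design (here, randomly equivalent groups); if forms differ in content or conditions, no ex post adjustment of means and standard deviations restores comparability, which is Kolen and Brennan's point.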

12. The new exam was not given in the first quarter of 2004, and pass rates historically vary by quarter, with first-quarter pass rates falling below the corresponding year-wide averages. The pass rate over the final three quarters of 2004 was almost identical to the 2005 annual pass rate, and the 2004 annual rate might have been slightly lower than the 2005 rate had the exam been given in all four quarters of 2004.

13. The pass rates for the other three components of the exam follow a similar pattern, but those patterns are more difficult to interpret because both the format and the item content of the other three exams changed substantially to reflect new international accounting standards. The 2011 drop in annual pass rates is less than one percentage point for BEC and roughly five percentage points for AUD and FAR. For all three exams, the decline is more pronounced when one compares pass rates from the first two quarters of 2011 with those from the first two quarters of 2010. The changes in content specifications for all exams were announced more than a year before the 2011 exams were administered.

14. See Lazear and Rosen (1981) as well as Chapters 10 and 11 in Lazear and Gibbs (2008).

15. The performance metric we propose is called the Percentile Performance Index (PPI). It is similar in construction to the Student Growth Percentile (SGP) measures already used in some states as accountability measures (see Betebenner 2009). Free PPI software is available at http://sites.google.com/site/dereknealresearch/home/pay-for-percentile-software.
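The linked software implements the authors' actual method; the snippet below is only a stylized sketch of the underlying idea, with a hypothetical function name and made-up data. For each student, it finds the student's percentile within a comparison set of students who began the year at a similar achievement level, then averages those percentiles across a teacher's roster. Real SGP and PPI implementations condition on baseline achievement far more carefully.

```python
import numpy as np

def percentile_index(own_gains, comparison_gains):
    """Stylized percentile-style performance index: the average
    percentile of a teacher's students' score gains within comparison
    sets of students with similar baseline achievement.
    comparison_gains[i] holds the gains of student i's comparison set."""
    pct = [
        np.mean(np.asarray(comp) < gain)  # share of comparable peers outgained
        for gain, comp in zip(own_gains, comparison_gains)
    ]
    return 100 * np.mean(pct)  # scaled to 0-100

# Hypothetical data: three students, each matched to peers who started
# the year at a similar achievement level.
gains = [4.0, 1.5, 3.2]
peers = [[1.0, 2.5, 3.8, 5.1], [0.5, 1.0, 2.2, 3.0], [2.0, 2.9, 3.5, 4.4]]
print(percentile_index(gains, peers))  # average within-group percentile
```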

16. Standard results on optimal incentive contracts show that if educators are risk-neutral, a reduction in reliability does not hamper efficient incentive provision. On the other hand, if educators are risk-averse, they will demand to be compensated for assuming the extra risk created by any drop in reliability. However, as the number of students that any educator or group of educators teaches grows large, this effect may well become a second-order concern.
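To see why, consider a back-of-the-envelope calculation under assumptions the note leaves implicit: measurement errors that are independent across students and the standard CARA-normal approximation from the incentive-contract literature.

```latex
% Measured performance of an educator who teaches n students, where
% student i's score carries measurement error \varepsilon_i with
% variance \sigma_\varepsilon^2, independent across students:
\bar{m} = \mu + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i,
\qquad
\operatorname{Var}(\bar{m}\mid\mu) = \frac{\sigma_\varepsilon^2}{n}.
% With coefficient of absolute risk aversion r, the educator's
% risk premium is roughly
\text{risk premium} \approx \frac{r}{2}\cdot\frac{\sigma_\varepsilon^2}{n}
\xrightarrow{\,n\to\infty\,} 0.
```

A drop in reliability raises the error variance, but the resulting risk premium shrinks in proportion to the number of students, which is why the cost can become second order for educators with large rosters.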

17. See Prendergast (1999) and Neal (2012).
