Abstract
Accountability has become a primary function of large-scale testing in the United States. The pressure on educators to raise scores is vastly greater than it was several decades ago. Research has shown that high-stakes testing can generate behavioral responses that inflate scores, often severely. I argue that because of these responses, using tests for accountability necessitates major changes in the practices of educational measurement. The needed changes span the entire testing endeavor. This article addresses implications for design, linking, and validation. It offers suggestions about possible new approaches and calls for research evaluating them.
Notes
1 During the years in question, New York State released all of its test items after a single use. These items can be retrieved from http://www.nysedregents.org/intermediate.html
2 One reviewer asked for specific examples. I choose not to identify particular jurisdictions or individuals. However, I have personally experienced this in at least three states, and in every case I was told explicitly that the reason was the risk of unwanted findings. The same thing happened to one of my students in a fourth state just last year.