REFERENCES
- Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.
- Birenbaum, M. (1986). Effect of dissimulation motivation and anxiety on response pattern appropriateness measures. Applied Psychological Measurement, 10, 167–174.
- Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Education Policy Analysis Archives, 8(1). Retrieved March 15, 2000, from http://epaa.asu.edu/epaa/v8n1/
- Dorn, S. (1998). The political legacy of school accountability systems. Education Policy Analysis Archives, 6(1). Retrieved March 15, 2000, from http://epaa.asu.edu/epaa/v6n1/
- Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11, 59–79.
- Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1991). Appropriateness measurement for some multidimensional test batteries. Applied Psychological Measurement, 15, 171–191.
- Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86.
- Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64.
- Freund, D. S., & Rock, D. A. (1992). A preliminary investigation of pattern-marking in 1990 NAEP data. Paper presented at the annual meeting of the American Educational Research Association, San Francisco. (ERIC Document Reproduction Service No. ED 347 189)
- Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Mahwah, NJ: Erlbaum.
- Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.
- Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole.
- Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56, 213–228.
- Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269–290.
- Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
- Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the MMPI. Journal of Applied Psychology, 30, 525–561.
- Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107–135.
- Muraki, E., & Bock, R. D. (1996). PARSCALE: IRT based test scoring and item analysis for graded open-ended exercises and performance tasks. Chicago: Scientific Software International.
- Nering, M. L. (1995). The distribution of person fit using true and estimated person parameters. Applied Psychological Measurement, 19, 121–129.
- Nering, M. L. (1997). The distribution of indexes of person fit within the CAT environment. Applied Psychological Measurement, 21, 115–127.
- Nering, M. L., & Meijer, R. R. (1998). A comparison of the person response function and the lz person-fit statistic. Applied Psychological Measurement, 22, 53–69.
- Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20, 2–7.
- Schmitt, N., Chan, D., Sacco, J. M., McFarland, L. A., & Jennings, D. (1999). Correlates of person fit and effect of person fit on test validity. Applied Psychological Measurement, 23, 41–53.
- Sijtsma, K., & Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66, 191–207.
- Van Krimpen-Stoop, E. M. L. A., & Meijer, R. R. (2002). Detection of person misfit in computerized adaptive tests with polytomous items. Applied Psychological Measurement, 26, 164–180.
- Wolf, L. S., & Smith, J. K. (1995). Consequences of performance, test motivation, and mentally taxing items. Applied Measurement in Education, 8, 341–351.
- Zickar, M. J., & Drasgow, F. (1996). Detecting faking on a personality instrument using appropriateness measurement. Applied Psychological Measurement, 20, 71–87.