Original Articles

Item Calibration Samples and the Stability of Achievement Estimates and System Rankings: Another Look at the PISA Model

REFERENCES

  • Adams, R.J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23. doi:10.1177/0146621697211001.
  • Adams, R.J., Wu, M.L., & Carstensen, C.H. (2007). Multivariate and mixture distribution Rasch models: extensions and applications. In Application of multivariate Rasch models in international large scale educational assessment (pp. 271–280). New York: Springer.
  • Adams, R.J., Wu, M.L., & Wilson, M. (2012). ACER ConQuest 3.0.1. ACER.
  • Brown, G., Micklewright, J., Schnepf, S.V., & Waldmann, R. (2007). International surveys of educational achievement: How robust are the findings? Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(3), 623–646. doi:10.1111/j.1467-985X.2006.00439.x.
  • Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. New York, NY: Psychology Press.
  • Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3/4), 199–215.
  • Goldstein, H. (2004). International comparisons of student attainment: some issues arising from the PISA study. Assessment in Education: Principles, Policy & Practice, 11(3), 319–330. doi:10.1080/0969594042000304618.
  • Gonzalez, E. (2009). GenItmDat Macro for SAS (Version 1). Princeton, NJ: ETS.
  • Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69–86. doi:10.1016/j.stueduc.2007.01.006.
  • Hambleton, R.K., & Rogers, H.J. (1989). Detecting potentially biased test items: comparison of IRT area and Mantel-Haenszel methods. Applied Measurement in Education, 2(4), 313–334. doi:10.1207/s15324818ame0204_4.
  • Kreiner, S., & Christensen, K.B. (2014). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210–231. doi:10.1007/s11336-013-9347-z.
  • Lord, F.M. (1962). Estimating norms by item-sampling. Educational and Psychological Measurement, 22(2), 259–267.
  • Mazzeo, J., & von Davier, M. (2009). Review of the Programme for International Student Assessment (PISA) test design: recommendations for fostering stability in assessment results. Presented at the NCES Conference on the Program for International Student Assessment: What Can We Learn from PISA? Washington, DC: IES National Center for Education Statistics.
  • Mellenbergh, G.J. (1982). Contingency table models for assessing item bias. Journal of Educational and Behavioral Statistics, 7(2), 105–118. doi:10.3102/10769986007002105.
  • Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. doi:10.1007/BF02294825.
  • Millsap, R.E. (2011). Statistical approaches to measurement invariance. New York: Routledge.
  • Mislevy, R.J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359–381. doi:10.1007/BF02306026.
  • Mislevy, R.J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196.
  • Mislevy, R.J., Beaton, A.E., Kaplan, B., & Sheehan, K.M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161.
  • Mislevy, R.J., Johnson, E.G., & Muraki, E. (1992). Chapter 3: scaling procedures in NAEP. Journal of Educational and Behavioral Statistics, 17(2), 131–154. doi:10.3102/10769986017002131.
  • Organisation for Economic Co-Operation and Development. (n.d.). FAQ: OECD PISA. Retrieved from www.oecd.org/pisa/aboutpisa/pisafaq.htm
  • Organisation for Economic Co-Operation and Development. (2002). PISA 2000 technical report. Paris: OECD Publishing.
  • Organisation for Economic Co-Operation and Development. (2005). PISA 2003 technical report. Paris: OECD Publishing.
  • Organisation for Economic Co-Operation and Development. (2009). PISA 2006 technical report. Paris: OECD Publishing.
  • Organisation for Economic Co-Operation and Development. (2012). PISA 2009 technical report. Paris: OECD Publishing. Retrieved from http://www.oecd.org/edu/preschoolandschool/programmeforinternationalstudentassessmentpisa/pisa2009technicalreport.htm.
  • Oliveri, M.E., & Ercikan, K. (2011). Do different approaches to examining construct comparability in multilanguage assessments lead to similar conclusions? Applied Measurement in Education, 24(4), 349–366. doi:10.1080/08957347.2011.607063.
  • Oliveri, M.E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53(3), 315–333.
  • Oliveri, M.E., & von Davier, M. (2014). Toward increasing fairness in score scale calibrations employed in International Large-Scale Assessments. International Journal of Testing, 14(1), 1–21. doi:10.1080/15305058.2013.825265.
  • Rasch, G. (1980). Probabilistic models for intelligence and attainment tests (2nd ed.). Chicago: University of Chicago Press.
  • Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.
  • Rutkowski, L., von Davier, M., Gonzalez, E., & Zhou, Y. (2014). Assessment design for international large-scale assessment. In Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Boca Raton, FL: Chapman & Hall/CRC Press.
  • Sahlberg, P. (2013, December 8). The PISA 2012 scores show the failure of “market based” education reform. The Guardian. Retrieved from http://www.theguardian.com.
  • SAS. (2002–2010). (Version 9.3). Cary, NC, USA: SAS.
  • Shoemaker, D.M. (1973). Principles and procedures of multiple matrix sampling (Vol.). Oxford, UK: Ballinger.
  • Von Davier, M., Gonzalez, E., & Mislevy, R.J. (2009). What are plausible values and why are they useful? IERI Monograph Series, 2, 9–36.
  • Von Davier, M., Sinharay, S., Oranje, A., & Beaton, A. (2006). The statistical procedures used in National Assessment of Educational Progress: recent developments and future directions. In C.R. Rao and S. Sinharay (Eds.), Handbook of statistics (Vol., pp. 1039–1055). Amsterdam, Netherlands: Elsevier. Retrieved from http://www.sciencedirect.com/science/article/pii/S0169716106260322.
