
A Comparison of the Relative Performance of Four IRT Models on Equating Passage-Based Tests

Pages 248-269 | Received 13 Sep 2017, Accepted 26 Sep 2018, Published online: 13 Dec 2018

