International Journal of Testing Volume 19, 2019 - Issue 3

A Comparison of the Relative Performance of Four IRT Models on Equating Passage-Based Tests

Kyung Yong Kim, Educational Research Methodology, University of North Carolina at Greensboro, USA; Correspondence: [email protected]

Euijin Lim, TEPS Center, Language Education Institute, Seoul National University, Korea

http://orcid.org/0000-0003-3547-9843

Won-Chan Lee, CASMA, University of Iowa, USA

Pages 248-269 | Received 13 Sep 2017, Accepted 26 Sep 2018, Published online: 13 Dec 2018

https://doi.org/10.1080/15305058.2018.1530239

REFERENCES

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168.
Cai, L. (2015). Lord-Wingersky Algorithm Version 2.0 for hierarchical item factor models with applications in test scoring, scale alignment, and model fit testing. Psychometrika, 80(2), 535–559.
Cai, L. (2017). flexMIRT version 3.51: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16(3), 221–248.
Cao, Y., Lu, R., & Tao, W. (2014, December). Effect of item response theory (IRT) model selection on testlet-based test equating (ETS Research Report RR-14-19). Princeton, NJ: ETS.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145–168.
Dorans, N. J., & Feigenbaum, M. D. (1994). Equating issues engendered by changes to the SAT and PSAT/NMSQT. In I. M. Lawrence, N. J. Dorans, M. D. Feigenbaum, N. J. Feryok, A. P. Schmitt, & N. K. Wright (Eds.), Technical issues related to the introduction of the new SAT and PSAT/NMSQT (Research Memorandum No. RM-94-10). Princeton, NJ: Educational Testing Service.
Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57(3), 423–436.
Hanson, B. A. (1994). An extension of the Lord-Wingersky algorithm to polytomous items. (Unpublished research note).
Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18(1), 1–11.
Kolen, M. J. (1988). Traditional equating methodology. Educational Measurement: Issues and Practice, 7(4), 29–37.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking. New York, NY: Springer.
Lee, G., Kolen, M. J., Frisbie, D. A., & Ankenmann, R. D. (2001). Comparison of dichotomous and polytomous item response models in equating scores from tests composed of testlets. Applied Psychological Measurement, 25(4), 357–372.
Lee, G., Lee, W., Kolen, M. J., Park, I., Kim, D., & Yang, J. S. (2015). Bi-factor MIRT true-score equating for testlet-based tests. Journal of Educational Evaluation, 28(2), 681–700.
Lee, W., He, Y., Hagge, S., Wang, W., & Kolen, M. J. (2012). Equating mixed-format tests using dichotomous common items. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Vol. 2, CASMA Monograph No. 2.2). Iowa City, IA: Center for Advanced Studies in Measurement and Assessment, The University of Iowa.
Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3–21.
Lord, F. M., & Wingersky, M. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 452–461.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.
R Core Team. (2018). R: A language and environment for statistical computing [Computer software]. Vienna, Austria. Retrieved from https://www.R-project.org/
Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
Rosenbaum, P. R. (1988). Items bundles. Psychometrika, 53(3), 349–359.
Samejima, F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). New York, NY: Springer.
Tao, W., & Cao, Y. (2016). An extension of IRT-based equating to the dichotomous testlet response theory model. Applied Measurement in Education, 29(2), 108–121.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). Dordrecht, The Netherlands: Kluwer.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local independence. Journal of Educational Measurement, 30(3), 187–213.
Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing, 27(1), 119–140.
