References
- Akour, M., & Al-Omari, H. (2013). Empirical investigation of the stability of IRT item-parameters estimation. International Online Journal of Educational Sciences, 5(2), 291–301.
- Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item response model (Research Report 81–20). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.1981.tb01255.x
- Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
- Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431–444. https://doi.org/10.1177/014662168200600405
- Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R Environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
- Cheng, Y., & Liu, C. (2015). The effect of upper and lower asymptotes of IRT models on computerized adaptive testing. Applied Psychological Measurement, 39(7), 551–565. https://doi.org/10.1177/0146621615585850
- Culpepper, S. A. (2016). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163. https://doi.org/10.1007/s11336-015-9477-6
- de Ayala, R. J. (2009). The theory and practice of item response theory. The Guilford Press.
- DeMars, C. (2010). Item response theory: Understanding statistics measurement. Oxford University Press.
- Downing, S. M. (2003). Item response theory: Applications of modern test theory in medical education. Medical Education, 37(8), 739–745. https://doi.org/10.1046/j.1365-2923.2003.01587.x
- Eichenbaum, A. E., Marcus, D. K., & French, B. F. (2019). Item response theory analysis of the Psychopathic Personality Inventory–Revised. Assessment, 26(6), 1046–1058. https://doi.org/10.1177/1073191117715729
- Finch, H., & French, B. F. (2019). A comparison of estimation techniques for IRT models with small samples. Applied Measurement in Education, 32(2), 77–96. https://doi.org/10.1080/08957347.2019.1577243
- Hambleton, R. K., & Cook, L. L. (1983). Robustness of item response models and effects of test length and sample size on the precision of ability estimates. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 31–49). Academic Press.
- Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Kluwer-Nijhoff Publishing.
- Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.
- Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289–301. https://doi.org/10.1177/0146621616631317
- Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125. https://doi.org/10.1177/014662169602000201
- Harwell, M. R., & Janosky, J. E. (1991). An empirical study of the effects of small datasets and varying prior variances on item parameter estimation in BILOG. Applied Psychological Measurement, 15(3), 279–291. https://doi.org/10.1177/014662169101500308
- Hockemeyer, C. (2002). A comparison of non-deterministic procedures for the adaptive assessment of knowledge. Psychological Test and Assessment Modeling, 44(4), 495.
- Hohensinn, C., & Kubinger, K. D. (2011). Applying item response theory methods to examine the impact of different response formats. Educational and Psychological Measurement, 71(4), 732–746. https://doi.org/10.1177/0013164410390032
- Hulin, C. L., Lissak, R. I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6(3), 249–260. https://doi.org/10.1177/014662168200600301
- Kalkan, Ö. K., & Çuhadar, İ. (2020). An evaluation of 4PL IRT and DINA models for estimating pseudo-guessing and slipping parameters. Journal of Measurement and Evaluation in Education and Psychology, 11(2), 131–146. https://doi.org/10.21031/epod.660273
- Kean, J., & Reilly, J. (2014). Item response theory. In F. M. Hammond, J. F. Malec, T. G. Nick, & R. M. Buschbacher (Eds.), Handbook for clinical research: Design, statistics, and implementation (pp. 195–198). Demos Medical Publishing.
- Liao, W. W., Ho, R. G., Yen, Y. C., & Cheng, H. C. (2012). The four-parameter logistic item response theory model as a robust method of estimating ability despite aberrant responses. Social Behavior and Personality: An International Journal, 40(10), 1679–1694. https://doi.org/10.2224/sbp.2012.40.10.1679
- Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7(4), 328.
- Loken, E., & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63(3), 509–525. https://doi.org/10.1348/000711009X474502
- Lord, F. M. (1952). A theory of test scores (Psychometric Monographs No. 7). Psychometric Corporation.
- Lord, F. M. (1968). An analysis of the Verbal Scholastic Aptitude Test using Birnbaum’s three-parameter logistic model. Educational and Psychological Measurement, 28(4), 989–1020. https://doi.org/10.1177/001316446802800401
- Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed score “equatings.” Applied Psychological Measurement, 8(4), 453–461. https://doi.org/10.1177/014662168400800409
- Magis, D. (2013). A note on the item information function of the four-parameter logistic model. Applied Psychological Measurement, 37(4), 304–315. https://doi.org/10.1177/0146621613475471
- Meng, X., Xu, G., Zhang, J., & Tao, J. (2019). Marginalized maximum a posteriori estimation for the four-parameter logistic model under a mixture modelling framework. British Journal of Mathematical and Statistical Psychology, 73(S1), 51–82. https://doi.org/10.1111/bmsp.12185
- Mislevy, R. J., & Stocking, M. L. (1989). A consumer’s guide to LOGIST and BILOG. Applied Psychological Measurement, 13(1), 57–75. https://doi.org/10.1177/014662168901300106
- Nicewander, W. A. (2018). Conditional reliability coefficients for test scores. Psychological Methods, 23(2), 351–362. https://doi.org/10.1037/met0000132
- Nicewander, W. A., & Schulz, E. M. (2015). A comparison of two methods for computing IRT scores from the number-correct score. Applied Psychological Measurement, 39(8), 643–655. https://doi.org/10.1177/0146621615601081
- O’Neill, T. R., Gregg, J. L., & Peabody, M. R. (2020). Effect of sample size on common item equating using the dichotomous Rasch model. Applied Measurement in Education, 33(1), 10–23. https://doi.org/10.1080/08957347.2019.1674309
- Patsula, L. N., & Gessaroli, M. E. (1995, April). A comparison of item parameter estimates and ICCs produced with TESTGRAF and BILOG under different test lengths and sample sizes [Paper presentation]. Meeting of the National Council on Measurement in Education, San Francisco, CA.
- R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
- Reckase, M. D. (2009). Multidimensional item response theory. Springer.
- Ree, M. J., & Jensen, H. E. (1983). Effects of sample size on linear equating of item characteristic curve parameters. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 135–146). Academic Press.
- Rulison, K. L., & Loken, E. (2009). I’ve fallen and I can’t get up: Can high-ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33(2), 83–101. https://doi.org/10.1177/0146621608324023
- Rupp, A. A., & Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66(1), 63–84. https://doi.org/10.1177/0013164404273942
- Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17(1), 321–335. https://doi.org/10.12738/estp.2017.1.0270
- Sharp, C., Steinberg, L., Yaroslavsky, I., Hofmeyr, A., Dellis, A., Ross, D., & Kincaid, H. (2012). An item response theory analysis of the Problem Gambling Severity Index. Assessment, 19(2), 167–175. https://doi.org/10.1177/1073191111418296
- Skaggs, G., & Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495–529. https://doi.org/10.3102/00346543056004495
- Stone, C. A., & Zhang, B. (2003). Assessing goodness of fit of item response theory models: A comparison of traditional and alternative procedures. Journal of Educational Measurement, 40(4), 331–352. https://doi.org/10.1111/j.1745-3984.2003.tb01150.x
- Swaminathan, H., & Gifford, J. A. (1983). Estimation of parameters in the three-parameter latent trait model. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 13–30). Academic Press.
- Tang, K. L., Way, W. D., & Carey, P. A. (1993). The effect of small calibration sample sizes on TOEFL IRT-based equating. ETS Research Report Series, 1993(2), i–38. https://doi.org/10.1002/j.2333-8504.1993.tb01570.x
- Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4), 397–412. https://doi.org/10.1007/BF02293705
- Viswesvaran, C., & Ones, D. S. (1993). Integrating responses across surveys using item response theory. Perceptual and Motor Skills, 77(1), 147–153. https://doi.org/10.2466/pms.1993.77.1.147
- Waller, N. G., & Feuerstahler, L. (2017). Bayesian modal estimation of the four-parameter item response model in real, realistic, and idealized data sets. Multivariate Behavioral Research, 52(3), 350–370. https://doi.org/10.1080/00273171.2017.1292893
- Waller, N. G., & Reise, S. P. (2010). Measuring psychopathology with nonstandard item response theory models: Fitting the four-parameter model to the Minnesota Multiphasic Personality Inventory. In S. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 147–173). American Psychological Association.
- Wang, T., & Vispoel, W. P. (1998). Properties of ability estimation methods in computerized adaptive testing. Journal of Educational Measurement, 35(2), 109–135. https://doi.org/10.1111/j.1745-3984.1998.tb00530.x
- Wingersky, M. S., Patrick, R., & Lord, F. M. (1988). LOGIST user’s guide. Version 6.0. Educational Testing Service.
- Woods, C. M. (2008). Ramsay-curve item response theory for the three-parameter logistic item response model. Applied Psychological Measurement, 32(6), 447–465. https://doi.org/10.1177/0146621607308014
- Yen, W. M. (1987). A comparison of the efficiency and accuracy of BILOG and LOGIST. Psychometrika, 52(2), 275–291. https://doi.org/10.1007/BF02294241
- Yen, Y. C., Ho, R. G., Laio, W. W., Chen, L. J., & Kuo, C. C. (2012). An empirical evaluation of the slip correction in the four parameter logistic models with computerized adaptive testing. Applied Psychological Measurement, 36(2), 75–87. https://doi.org/10.1177/0146621611432862