Estimating Optimal Weights for Compound Scores: A Multidimensional IRT Approach

Pages 914-924 | Received 21 Sep 2017, Accepted 14 May 2018, Published online: 21 Nov 2018

References

  • Ackerman, T. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67–91. doi:10.1111/j.1745-3984.1992.tb00368.x
  • Ackerman, T. (1994). Creating a test information profile for a two-dimensional latent space. Applied Psychological Measurement, 18(3), 257–276. doi:10.1177/014662169401800306
  • Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20(4), 311–330. doi:10.1177/014662169602000402
  • Albers, C. J., Critchley, F., & Gower, J. C. (2011). Quadratic minimization problems in statistics. Journal of Multivariate Analysis, 102(3), 698–722. doi:10.1016/j.jmva.2009.12.018
  • Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269. doi:10.3102/10769986017003251
  • Bechger, T. M., Maris, G., Verstralen, H. H. F. M., & Béguin, A. A. (2003). Using classical test theory in combination with item response theory. Applied Psychological Measurement, 27(5), 319–334. doi:10.1177/0146621603257518
  • Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–562. doi:10.1007/BF02296195
  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
  • Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12(3), 261–280. doi:10.1177/014662168801200305
  • Bock, R. D., & Schilling, S. G. (1997). High dimensional full-information item factor analysis. In M. Berkane (Ed.), Latent variable modeling and applications of causality (pp. 163–176). New York: Springer. doi:10.1007/978-1-4612-1842-5_8
  • Bock, R., Gibbons, R., Schilling, S., Muraki, E., Wilson, D., & Wood, R. (2003). TESTFACT 4.0 computer software and manual. Lincolnwood, IL: Scientific Software International.
  • Culpepper, S. A. (2013). The reliability and precision of total scores and IRT estimates as a function of polytomous IRT parameters and latent trait distribution. Applied Psychological Measurement, 37(3), 201–225. doi:10.1177/0146621612470210
  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
  • Feldt, L., & Brennan, R. (1989). Reliability. In R. Linn (Ed.), Educational Measurement (3rd ed., pp. 105–146). New York, NY: The American Council on Education, MacMillan.
  • Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approach to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398–409. doi:10.1080/01621459.1990.10476213
  • Glas, C. A. W. (2014). Adaptive mastery testing using a multidimensional IRT model. In D. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 205–218). Boca Raton, FL: Chapman and Hall/CRC.
  • Glas, C. A. W., & Vos, H. J. (2010). Adaptive mastery testing using a multidimensional IRT model. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 409–431). New York, NY: Springer. doi:10.1007/978-0-387-85461-8_21
  • Haberman, S. J. (2008). When can subscores have value? Journal of Educational and Behavioral Statistics, 33(2), 204–229. doi:10.3102/1076998607302636
  • Haberman, S. J., von Davier, M., & Lee, Y. H. (2008). Comparison of multidimensional item response models: Multivariate normal ability distributions versus multivariate polytomous ability distributions. ETS Research Report Series, 2008(2), i. doi:10.1002/j.2333-8504.2008.tb02131.x
  • Haberman, S. J., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62(1), 79–95. doi:10.1348/000711007X248875
  • Haberman, S. J., & Sinharay, S. (2010). Reporting subscores using item response theory. Psychometrika, 75(2), 209–227. doi:10.1007/s11336-010-9158-4
  • Johnson, V., & Albert, J. H. (1999). Ordinal data modeling. New York, NY: Springer. doi:10.1007/b98832
  • Lunn, D., Spiegelhalter, D., Thomas, A., & Best, N. (2009). The BUGS project: Evolution, critique and future directions. Statistics in Medicine, 28(25), 3049–3067. doi:10.1002/sim.3680
  • Mosier, C. I. (1943). On the reliability of a weighted composite. Psychometrika, 8(3), 161–168. doi:10.1007/BF02288700
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 1992(1), i–176. doi:10.1002/j.2333-8504.1992.tb01436.x
  • OECD. (2015). PISA 2015 technical report. Chapter 16. Retrieved from http://www.oecd.org/pisa/data/2015-technical-report/
  • OECD. (2016). PISA 2018: Draft analytical frameworks, May, 2016. Retrieved from https://www.oecd.org/pisa/data/PISA-2018-draft-frameworks.pdf
  • Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing. Vienna, Austria.
  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danmarks Paedagogiske Institut.
  • Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9(4), 401–412. doi:10.1177/014662168500900409
  • Reckase, M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W. van der Linden & R. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York: Springer. doi:10.1007/978-1-4757-2691-6_16
  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer. doi:10.1007/978-0-387-89976-3
  • Reise, S. P., Bonifay, W. E., & Haviland, M. G. (2013). Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment, 95(2), 129–140. doi:10.1080/00223891.2012.725437
  • Rudner, L. (2001). Informed test component weighting. Educational Measurement: Issues and Practice, 20(1), 16–19. doi:10.1111/j.1745-3992.2001.tb00054.x
  • Rijmen, F., Jeon, M., von Davier, M., & Rabe-Hesketh, S. (2014). A third-order item response theory model for modeling the effects of domains and subdomains in large-scale educational assessment surveys. Journal of Educational and Behavioral Statistics, 39(4), 235–256.
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Richmond, VA: Psychometric Society. doi:10.1002/j.2333-8504.1968.tb00153.x
  • Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555. doi:10.1007/S11336-009-9136-X
  • Siemons, L., ten Klooster, P. M., Taal, E., Kuper, I. H., van Riel, P. L. C. M., van de Laar, M. A. F. J., & Glas, C. A. W. (2011). Validating the 28-tender joint count using item response theory. Journal of Rheumatology, 38(12), 2557–2564. doi:10.3899/jrheum.110436
  • Sinharay, S. (2010). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement, 47(2), 150–174. doi:10.1111/j.1745-3984.2010.00106.x
  • Sinharay, S., Puhan, G., & Haberman, S. J. (2010). Reporting diagnostic scores in educational testing: Temptations, pitfalls, and some solutions. Multivariate Behavioral Research, 45(3), 553–573. doi:10.1080/00273171.2010.483382
  • Sinharay, S., Puhan, G., & Haberman, S. J. (2011). An NCME instructional module on subscores. Educational Measurement: Issues and Practice, 30(3), 29–40. doi:10.1111/j.1745-3992.2011.00208.x
  • Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52(3), 393–408. doi:10.1007/BF02294363
  • Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398), 528–540. doi:10.1080/01621459.1987.10478458
  • Verhelst, N. D., Glas, C. A. W., & de Vries, H. H. (1997). A steps model to analyze partial credit. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 123–138). New York: Springer. doi:10.1007/978-1-4757-2691-6_7
  • Wainer, H., Vevea, J. L., Camacho, F., Reeve, B. B., Rosa, K., & Nelson, L. (2001). Augmented scores-“borrowing strength” to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343–387). Hillsdale: Lawrence Erlbaum.
  • Willms, J. D. (2006). Learning divides: Ten policy questions about the performance and equity of schools and schooling systems. Montreal: UNESCO Institute for Statistics.