Multilevel Reliability Measures of Latent Scores Within an Item Response Theory Framework
