
Multilevel Reliability Measures of Latent Scores Within an Item Response Theory Framework


Abstract

This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study investigated the behavior of the multilevel reliability measures, and the uncertainty associated with them, across multilevel designs varying in the number of clusters, cluster size, and intraclass correlation (ICC), and across test lengths, for two parameterizations of multilevel item response models with either separate item discriminations or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE) with multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering both the accuracy of the multilevel reliability measures and the empirical coverage rates of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage of the MC confidence intervals were found only in one condition (200 clusters, a cluster size of 30, an ICC of .2, and 40 items) under MMLE-multiple imputation. In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage of the MC confidence intervals were acceptable in all multilevel designs we considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.
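To make the roles of the ICC and cluster size concrete, here is a generic sketch (not the paper's exact estimator) of the reliability of a cluster mean score under a two-level random-intercept model, which follows the familiar Spearman-Brown form λ = n·ICC / (1 + (n − 1)·ICC):

```python
# Illustrative only: reliability of an aggregated cluster mean under a
# two-level random-intercept model, as a function of the intraclass
# correlation (ICC) and the cluster size n. The function name is ours.

def cluster_mean_reliability(icc: float, n: int) -> float:
    """Spearman-Brown-type reliability of a cluster mean: n*ICC / (1 + (n-1)*ICC)."""
    return (n * icc) / (1 + (n - 1) * icc)

# At the abstract's ICC = .2 and cluster size 30:
rel = cluster_mean_reliability(0.2, 30)
print(round(rel, 3))  # prints 0.882
```

The formula makes explicit why both a larger ICC and a larger cluster size improve between-level measurement precision, the pattern the simulation conditions in the abstract are designed to probe.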

Notes

1 The authors thank the anonymous reviewer of a previous version of this paper for pointing out different approaches for obtaining a single reliability measure in the IRT literature.

2 Either a probit or a logit link can be used for binary responses. In this study, the probit link was chosen because, in Mplus, categorical variables can be analyzed only with the probit link when ESTIMATOR = BAYES.

3 As shown in Cronbach and Gleser (Citation1964), the signal-to-noise ratio equals the correlation coefficient divided by 1 minus the correlation coefficient.
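As a quick check of the identity in this note, a minimal sketch (the function name is ours):

```python
# Signal-to-noise ratio implied by a correlation (reliability) coefficient rho,
# per the identity attributed to Cronbach and Gleser (1964): SNR = rho / (1 - rho).

def signal_to_noise(rho: float) -> float:
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    return rho / (1.0 - rho)

# For example, rho = .8 implies four parts signal to one part noise:
print(round(signal_to_noise(0.8), 3))  # prints 4.0
```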

4 In Mplus, the variance-covariance matrix of the estimator can be obtained using the "OUTPUT: TECH3;" option.

5 In addition to the model identification constraints, residual variances in the Mplus IRT model specification were set to 1 to identify the model at the between level (see Asparouhov & Muthén, Citation2016, for details).

6 Milanzi et al. (Citation2015) considered a 24-item test as the longest test in their simulation study.

7 The true within-level item discrimination, αi,W, was transformed for the normal-ogive model as αi,W/1.702.
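The rescaling in note 7 rests on the standard approximation linking the logistic and normal-ogive IRT metrics: the logistic function with scaling constant D = 1.702 closely tracks the standard normal CDF. A sketch verifying this (the variable names and the example discrimination value are ours):

```python
# Note 7's rescaling: a logistic-metric discrimination is divided by 1.702
# to place it on the normal-ogive (probit) metric, because
# logistic(1.702 * x) approximates the standard normal CDF Phi(x).
from math import erf, exp, sqrt

D = 1.702  # conventional logistic-to-probit scaling constant

def logistic(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

alpha_logistic = 1.2               # hypothetical within-level discrimination
alpha_probit = alpha_logistic / D  # the transformation used in note 7

# The approximation error |logistic(D*x) - Phi(x)| stays below about 0.01
# over the whole real line; check it on a grid from -4 to 4.
max_err = max(abs(logistic(D * x / 100) - phi(x / 100)) for x in range(-400, 401))
assert max_err < 0.01
```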

8 The criterion SE with 200 replications was similar to the criterion SE with a larger number of replications (1,000).

9 As a rule of thumb, values larger than 20% indicate an unacceptable degree of bias (e.g., Forero & Maydeu-Olivares, Citation2009).
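For concreteness, the relative-bias criterion in note 9 is typically computed as the percentage deviation of the mean estimate from the true value; a minimal sketch (the function name and example values are ours):

```python
# Relative bias as commonly used in simulation studies:
# 100 * (mean estimate - true value) / true value.
# By the rule of thumb cited in note 9, |relative bias| > 20% is unacceptable.

def relative_bias_pct(estimates: list[float], true_value: float) -> float:
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - true_value) / true_value

# Hypothetical estimates of a parameter whose true value is 1.0:
print(round(relative_bias_pct([0.9, 1.1, 1.3], 1.0), 3))  # prints 10.0
```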

10 In the gllamm command, the error message is as follows: "numerical derivatives are approximate; flat or discontinuous region encountered." In Mplus, the error message is as follows: "the standard errors of the model parameter estimates may not be trustworthy for some parameters due to a non-positive definite first-order derivative product matrix."

11 The computational time and burn-in iterations are reported based on the results of a single replication for each condition. The computational time ranged from 176 to 1,297 min on an Intel Core i7-4770 CPU. Based on the PSR criterion, the burn-in iterations were the same for Bayesian analysis with informative and non-informative priors. The burn-in iterations ranged from 250 to 650 across the simulation conditions. The posterior moments were calculated based on the post-burn-in iterations, which ranged from 500 to 1,300 across simulation conditions.
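The PSR criterion referenced in note 11 is the potential scale reduction (R-hat) diagnostic of Gelman and Rubin; values near 1 indicate the chains have mixed. A self-contained sketch (the chains below are illustrative, not from the study):

```python
# Sketch of the potential scale reduction (PSR / R-hat) convergence diagnostic,
# following Gelman and Rubin's variance-ratio formula.
from statistics import mean, variance

def psr(chains: list[list[float]]) -> float:
    """PSR over m chains of equal length n for one scalar parameter."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B and within-chain variance W.
    B = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    W = mean(variance(c) for c in chains)
    # Pooled posterior-variance estimate and its ratio to W.
    var_hat = (n - 1) / n * W + B / n
    return (var_hat / W) ** 0.5

# Two well-mixed (overlapping) chains give a PSR near 1; two chains stuck in
# different regions give a PSR well above 1, signaling non-convergence.
well_mixed = psr([[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2]])
stuck = psr([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```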

12 The exceptional case is for the between-level item discrimination parameters in the K = 20, nk = 15, and I = 40 conditions with ICC = .1 and ICC = .2 in Bayesian analysis with non-informative priors. In this condition (a small number of clusters and a small cluster size with a larger number of items), the bias was unexpectedly small. The small bias resulted in a small RMSE in the condition. However, the posterior standard deviation of the estimates was large (average posterior standard deviation [SE] = 1.578 with ICC = .1 and 1.486 with ICC = .2), as is typical with small sample sizes, compared to the other conditions (e.g., average posterior standard deviation [SE] = 0.503 with ICC = .1 and 0.421 with ICC = .2 in the K = 40, nk = 15, and I = 40 condition). Although there was no convergence problem in the condition, sampling variability can be large around the point estimate (the posterior median). Thus, the bias based on the posterior median should be interpreted with caution in the K = 20, nk = 15, and I = 40 conditions.

13 The results for the multilevel reliability measures based on the posterior mean did not differ from those based on the posterior median because the posterior distributions were close to symmetric; posterior distributions tend toward symmetry as the sample size increases (e.g., Gelman et al., Citation2013).

14 In the model with separate item discriminations over levels, the within-level SRMR ranged from 0.023 to 0.117 across all conditions, and the between-level SRMR ranged from 0.020 to 0.194 in the conditions for which we recommended using between-cluster reliability. In the model with the same item discrimination over levels, the within-level SRMR ranged from 0.049 to 0.191 and the between-level SRMR ranged from 0.093 to 0.106.
