Abstract
Multilevel Rasch models are increasingly used to estimate the relationships between test scores and student and school factors. Response data were generated to follow one-, two-, and three-parameter logistic (1PL, 2PL, 3PL) models, but the Rasch model was used to estimate the latent regression parameters. When the response functions followed 2PL or 3PL models, the proportion of variance in test scores explained by the simulated student or school predictors was estimated accurately with a Rasch model. The proportions of variance within and between schools were also estimated accurately. The regression coefficients were misestimated unless they were rescaled out of logit units. However, item-level parameters, such as DIF effects, were biased when the Rasch model was violated, as in single-level models.
Notes
1 Although all of the mathematical properties of the Rasch model hold for the 1PL, the Rasch model did not develop out of the same tradition as IRT models (see Wright & Stone, 1979, for an explication of Rasch’s philosophy). Additionally, the 1PL model is often identified by constraining the variance of the θs, which necessitates adding an a-parameter common to all items. With this parameterization, the units are no longer logits; for example, if the a-parameter = 0.8, each scale unit equals 0.8 logits.
2 These interpretations of the covariates presuppose no collinearity among the predictors and no omitted covariates. If, for example, Xj and Xk are correlated, β1 indicates the expected change in ηijk for each unit change in the examinee predictor holding the school predictor constant, and β3 indicates the expected change in ηijk for each unit change in the school predictor given a constant distribution of Xj within schools. Similarly, if there is a contextual (compositional) effect of the school mean of Xj, this effect is omitted from the model, and Xj is not school-mean centered, β1 would be a mixture of the expected change within schools and the contextual effect.
3 Similarly, even if the item difficulties are estimated in the studied sample, one could estimate the item difficulties by conditional maximum likelihood and then replace the εi with the fixed estimates (Christensen, Bjorner, Kreiner, & Petersen, 2004; Zwinderman, 1991, pp. 593, 598). With very short tests, estimating the item parameters separately from the regression parameters may lead to underestimates of the standard errors of the regression parameters, but this problem decreases with increasing test length (Christensen et al., 2004).
4 An additional complication is that one must control for overall group-mean differences, often labeled impact in the DIF literature. If one were confident that the reference item contains no DIF, then the coefficient for the DIF characteristic in the equation for β0j represents impact. Alternatively, if no item is designated as the reference item (γ00 is omitted and there are β0j . . . βIj instead of β(I−1)j), then the coefficient for the DIF characteristic in the equation for β0j represents impact if the DIF balances to zero across items. See Cheong and Kamata (2013) for further discussion.
5 Hedges and Hedberg summarized data from standardized tests of mathematics and reading, grades K–12. The average ICC, before adding covariates to the model, was about .22 for nationally representative samples but smaller when limited to low-socioeconomic or low-achievement schools. Grade 3 reading had the highest ICC without covariates, .27.
6 An a-parameter of 1 in the logistic metric is equivalent to an a-parameter of 0.588 in the normal (probit) metric.
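The conversion in Note 6 can be illustrated with a minimal sketch. It assumes the conventional scaling constant D = 1.7 between the logistic and normal-ogive metrics, which is consistent with the value 0.588 given above; the function names are hypothetical and not taken from the study.

```python
# Assumption: the conventional scaling constant D = 1.7 links the two metrics.
D = 1.7


def logistic_to_probit_a(a_logistic):
    """Convert a discrimination from the logistic metric to the normal (probit) metric."""
    return a_logistic / D


def probit_to_logistic_a(a_probit):
    """Convert a discrimination from the normal (probit) metric to the logistic metric."""
    return a_probit * D


# A logistic-metric a-parameter of 1 corresponds to about 0.588 in the probit metric.
print(round(logistic_to_probit_a(1.0), 3))  # 0.588
```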
7 The term explained variance is not intended to imply causality but is simply less awkward than variance accounted for.
8 For the 3PL items, these b-differences corresponded to average log-odds differences of 0.43, 0.35, and 0.26 for the easy, middle, and hard items, respectively. The averages were calculated using 50 quadrature points evenly spaced between −4 and 4, with the difference at each quadrature point weighted by the total (focal + reference) population density at that point.
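The weighted average described in Note 8 can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the item parameters, the size of the b-difference, the unscaled logistic form of the 3PL, and the normal group distributions are all hypothetical choices, so the result is not expected to reproduce the values reported above.

```python
import math


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (plain logistic form, an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_odds(p):
    """Log-odds (logit) of a probability."""
    return math.log(p / (1.0 - p))


def normal_pdf(x, mean=0.0, sd=1.0):
    """Normal density, used here as the assumed population distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def average_log_odds_difference(a, b_ref, b_focal, c, focal_mean=0.0,
                                n_points=50, lo=-4.0, hi=4.0):
    """Average reference-minus-focal log-odds difference over evenly spaced
    quadrature points, weighted by the total (focal + reference) density."""
    step = (hi - lo) / (n_points - 1)
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(n_points):
        theta = lo + k * step
        diff = (log_odds(p_3pl(theta, a, b_ref, c))
                - log_odds(p_3pl(theta, a, b_focal, c)))
        # Assumption: reference group is N(0, 1); focal group is N(focal_mean, 1).
        weight = normal_pdf(theta, 0.0) + normal_pdf(theta, focal_mean)
        weighted_sum += weight * diff
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical item: a = 1, c = 0.2, and the item is 0.5 harder for the focal group.
print(average_log_odds_difference(a=1.0, b_ref=0.0, b_focal=0.5, c=0.2))
```

With c = 0 the log-odds difference equals a times the b-difference at every quadrature point. With c > 0 the difference varies with θ and is smaller, which is why an average over the ability distribution is needed for 3PL items.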