Measurement, Statistics, and Research Design

Multilevel Rasch Modeling: Does Misfit to the Rasch Model Impact the Regression Model?

Pages 605-619 | Published online: 20 May 2019

Abstract

Multilevel Rasch models are increasingly used to estimate the relationships between test scores and student and school factors. Response data were generated to follow one-, two-, and three-parameter logistic (1PL, 2PL, 3PL) models, but the Rasch model was used to estimate the latent regression parameters. When the response functions followed 2PL or 3PL models, the proportion of variance explained in test scores by the simulated student or school predictors was estimated accurately with a Rasch model. Proportion of variance within and between schools was also estimated accurately. The regression coefficients were misestimated unless they were rescaled out of logit units. However, item-level parameters, such as DIF effects, were biased when the Rasch model was violated, similar to single-level models.

Notes

1 Although all of the mathematical properties of the Rasch model hold for the 1PL, the Rasch model did not develop out of the same tradition as IRT models (see Wright & Stone, Citation1979, for an explication of Rasch’s philosophy). Additionally, the 1PL model is often identified by constraining the variance of the θs, which necessitates adding an a-parameter common to all items. With this parameterization, the units are no longer logits; for example, if the a-parameter = 0.8, each scale unit equals 0.8 logits.
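A minimal numeric sketch of the rescaling described above, assuming a hypothetical common a-parameter of 0.8 (the item difficulty and ability values are likewise illustrative):

```python
import math

def p_1pl(theta, b, a=0.8):
    """1PL response probability with a common discrimination a
    (hypothetical value 0.8, as in the example above)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def logit(p):
    return math.log(p / (1.0 - p))

# With a common a-parameter of 0.8, a one-unit change on the theta
# scale shifts the log-odds by 0.8 logits, for any item difficulty b.
delta = logit(p_1pl(1.0, b=0.0)) - logit(p_1pl(0.0, b=0.0))
print(delta)  # 0.8
```

Under this parameterization the scale unit is defined by the constrained θ variance rather than the logit, which is why each unit corresponds to a (here 0.8-logit) multiple of the log-odds.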

2 These interpretations of the covariates presuppose no collinearity among the predictors and no omitted covariates. If, for example, Xj and Xk are correlated, β1 indicates the expected change in ηijk for each unit change in the examinee predictor holding the school predictor constant and β3 indicates the expected change in ηijk for each unit change in the school predictor given a constant distribution of Xj within schools. Similarly, if there is a contextual (compositional) effect of the school mean Xj and this effect were omitted from the model and Xj is not school-mean centered, β1 would be a mixture of the expected change within schools and the contextual effect.

3 Similarly, even if the item difficulties are estimated in the studied sample, one could estimate the item difficulties by conditional maximum likelihood and then replace the εi with the fixed estimates (Christensen, Bjorner, Kreiner, & Petersen, Citation2004; Zwinderman, Citation1991, pp. 593, 598). With very short tests, estimating the item parameters separately from the regression parameters may lead to underestimates of the standard errors of the regression parameters, but this problem decreases with increasing test length (Christensen et al., Citation2004).

4 An additional complication is that one must control for overall group-mean differences, often labeled impact in the DIF literature. If one were confident that the reference item contains no DIF, then the coefficient for the DIF characteristic in the equation for β0j represents impact. Alternatively, if no item is designated as the reference item (γ00 is omitted and there are β0j . . . βIj instead of β0j . . . β(I−1)j), then the coefficient for the DIF characteristic in the equation for β0j represents impact if the DIF balances to zero across items. See Cheong and Kamata (Citation2013) for further discussion.

5 Hedges and Hedberg summarized data from standardized tests of mathematics and reading, grades K–12. The average ICC, before adding covariates to the model, was about .22 for nationally representative samples but smaller when limited to low-socioeconomic or low-achievement schools. Grade 3 reading had the highest (without covariates) ICC of .27.
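The unconditional ICC referenced above is the between-school share of total variance; a minimal sketch, with hypothetical variance components chosen to reproduce an ICC of .22:

```python
# Unconditional (no-covariate) intraclass correlation:
# ICC = between-school variance / (between + within variance).
# The variance components below are hypothetical, scaled so ICC = .22.
tau2 = 0.22    # between-school variance
sigma2 = 0.78  # within-school (residual) variance
icc = tau2 / (tau2 + sigma2)
print(round(icc, 2))  # 0.22
```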

6 An a-parameter of 1 in the logistic metric is equivalent to an a-parameter of 0.588 in the normal (probit) metric.
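This equivalence follows from the conventional scaling constant D = 1.7 that links the logistic and normal-ogive metrics:

```python
# a_normal = a_logistic / D, where D = 1.7 is the usual scaling
# constant under which logistic(D * a * (theta - b)) closely
# approximates the normal ogive at the same a and b.
D = 1.7
a_logistic = 1.0
a_normal = a_logistic / D
print(round(a_normal, 3))  # 0.588
```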

7 The term explained variance is not intended to imply causality but is simply less awkward than variance accounted for.

8 For the 3PL items, these b-differences corresponded to average log-odds differences of 0.43, 0.35, and 0.26 for the easy, middle, and hard items, respectively. The averages were calculated using 50 quadrature points evenly spaced between −4 and 4, with the difference at each quadrature point weighted by the total (focal + reference) population density at that point.
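The quadrature averaging described above can be sketched as follows. The 3PL item parameters are hypothetical, and a standard-normal density stands in for the combined focal + reference population density (whose parameters are not given in this note):

```python
import math

def p_3pl(theta, a, b, c):
    """3PL response probability in the logistic metric."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def logit(p):
    return math.log(p / (1.0 - p))

def avg_logodds_diff(a, b_ref, b_foc, c, n_points=50, lo=-4.0, hi=4.0):
    """Average reference-minus-focal log-odds difference over 50 evenly
    spaced quadrature points in [-4, 4], weighted by a standard-normal
    population density at each point."""
    thetas = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    weights = [math.exp(-t * t / 2.0) for t in thetas]
    total = sum(weights)
    diffs = [logit(p_3pl(t, a, b_ref, c)) - logit(p_3pl(t, a, b_foc, c))
             for t in thetas]
    return sum(w * d for w, d in zip(weights, diffs)) / total

# Hypothetical item: b-difference of 0.5, common a = 1.0, guessing c = 0.2
avg_diff = avg_logodds_diff(a=1.0, b_ref=0.0, b_foc=0.5, c=0.2)
print(round(avg_diff, 3))
```

Because the guessing parameter flattens the lower asymptote, the log-odds difference shrinks wherever the population mass sits at low θ; one way to read the pattern above is that this attenuation is strongest for hard items, consistent with the smaller average (0.26) reported for them.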
