Abstract
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. This study focuses on the use of multiple-indicator multiple-cause (MIMIC) structural equation models for DIF testing, parameterized as item response models. The accuracy of these methods, and their sample size requirements, are not well established. This study examines the accuracy of MIMIC methods for DIF testing when the focal group is small and compares results with those obtained using 2-group item response theory (IRT). Results support the utility of the MIMIC approach. With small focal-group samples, tests of uniform DIF with binary or 5-category ordinal responses were more accurate with MIMIC models than with 2-group IRT. Recommendations are offered for the application of MIMIC methods for DIF testing.
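As background to the abstract, the MIMIC approach it describes can be sketched in its standard form (this sketch reflects the general MIMIC DIF framework, not necessarily the exact parameterization used in this study): the latent construct is regressed on a grouping covariate, and uniform DIF for an item is tested via a direct path from that covariate to the item.

```latex
% Standard MIMIC model for DIF testing (general form; notation is illustrative).
% z   = grouping covariate (e.g., 0 = reference group, 1 = focal group)
% eta = latent construct; y_i^* = latent response underlying item i
\begin{align}
  \eta   &= \gamma z + \zeta, \\
  y_i^{*} &= \lambda_i \eta + \beta_i z + \varepsilon_i .
\end{align}
% gamma captures the group difference on the construct itself;
% a nonzero beta_i indicates uniform DIF for item i, because the item
% responds to group membership beyond what the construct explains.
```

Under this formulation, testing uniform DIF amounts to testing $H_0\colon \beta_i = 0$ for each studied item while $\gamma$ absorbs the true group mean difference on the construct.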
ACKNOWLEDGMENT
I am grateful to Michael Strube for providing helpful suggestions on a draft of this article.
Notes
1 The data for the empirical example were originally collected by Carol M. Woods as part of a collaborative project with Jonathan S. Abramowitz and David F. Tolin. Participants constitute a subsample of the samples published previously in research by Deacon, Abramowitz, Woods, and Tolin (2003) and Woods (2006).