ABSTRACT
Gender fairness in testing can be impeded by the presence of differential item functioning (DIF), which potentially causes test bias. In this study, the presence and causes of gender-related DIF were investigated with real data from 800 items answered by 250,000 test takers. DIF was examined using the Mantel–Haenszel and logistic regression procedures. Little DIF was found in the quantitative items and a moderate amount was found in the verbal items. Vocabulary items sampled from traditionally female domains favored women, but items sampled from traditionally male domains generally did not favor men. The sentence completion item format in the English reading comprehension subtest favored men regardless of content. The findings, if supported in a cross-validation study, can potentially lead to changes in how vocabulary items are sampled and in the use of the sentence completion format in English reading comprehension, thereby increasing gender fairness in the examined test.
Acknowledgements
The author is grateful to Maria Johansson and Morgan Thorsell for valuable input concerning the qualitative properties of the verbal and quantitative DIF-items respectively. The author is also grateful to Marie Wiberg and Per-Erik Lyrén for their valuable comments on the manuscript.
Disclosure statement
No potential conflict of interest was reported by the author.
ORCID
Jonathan Wedman http://orcid.org/0000-0002-8479-9117
Notes
1 In this study, category B is labeled “Moderate”, instead of the original “Intermediate”, so that the terminology for the Mantel–Haenszel procedure will match that of logistic regression.