Abstract
Simulations were conducted to investigate factors that influence the performance of the Mantel, generalized Mantel–Haenszel (GMH), and logistic discriminant function analysis (LDFA) methods in assessing differential item functioning (DIF) for polytomous items. The results show that the magnitude of DIF contamination in the matching score, as measured by the average signed area between the 2 item characteristic curves of the reference and focal groups (Raju, 1988), was more important than the percentage of DIF items in the test in determining the Type I error rates (false positives) of the 3 methods. As long as the average signed area was close to zero, all 3 methods kept their Type I error rates under control, even when as many as 20% of the items in the test exhibited DIF. The Mantel and LDFA methods yielded higher power (true positives) than the GMH method under all DIF patterns except the balanced ones. Test purification procedures improved control over Type I error under the constant DIF pattern.
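The role of the average signed area can be illustrated with a small numerical sketch. It rests on several assumptions that are not taken from the study. First, items follow a partial credit model. Second, for polytomous items the "item characteristic curve" is taken to be the expected item score curve. Third, DIF is introduced by shifting all of an item's step difficulties by the same amount for the focal group. Fourth, the area is approximated with the trapezoidal rule on a theta grid over [-10, 10]. All function names are hypothetical.

```python
import math

def pcm_expected_score(theta, steps):
    """Expected item score (categories 0..m) under an assumed partial credit model.
    `steps` holds the step difficulties delta_1..delta_m."""
    # Unnormalized log-probability of category k is sum_{j<=k} (theta - delta_j).
    logits = [0.0]
    for d in steps:
        logits.append(logits[-1] + (theta - d))
    mx = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - mx) for z in logits]
    return sum(k * w for k, w in enumerate(weights)) / sum(weights)

def signed_area(steps_ref, steps_foc, lo=-10.0, hi=10.0, n=4001):
    """Signed area between reference and focal expected-score curves
    (positive when the item favors the reference group), via the trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    area = 0.0
    for i in range(n):
        theta = lo + i * h
        diff = pcm_expected_score(theta, steps_ref) - pcm_expected_score(theta, steps_foc)
        area += (0.5 if i in (0, n - 1) else 1.0) * diff
    return area * h

def average_signed_area(items):
    """Mean signed area over all items; `items` is a list of (steps_ref, steps_foc) pairs."""
    return sum(signed_area(r, f) for r, f in items) / len(items)
```

For a 3-step item whose focal steps are all shifted by +0.5, the signed area is about 3 × 0.5 = 1.5. In a 20-item test with 4 such DIF items all favoring the reference group (a constant pattern), the average signed area is about 0.3. If 2 of the 4 DIF items instead favor the focal group by the same amount (a balanced pattern), the areas cancel and the average is near zero, even though the percentage of DIF items is unchanged.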