Exploring the Stability of Differential Item Functioning Across Administrations and Critical Values Using the Rasch Separate Calibration t-test Method
