Exploring the Stability of Differential Item Functioning Across Administrations and Critical Values Using the Rasch Separate Calibration t-test Method
