References
- *Abbott, M. L. (2007). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24(1), 7–36. https://doi.org/https://doi.org/10.1177/0265532207071510
- *Acar, T. (2011). Sample size in differential item functioning: An application of hierarchical linear modeling. Kuram Ve Uygulamada Egitim Bilimleri, 11(1), 284–288. https://eric.ed.gov/?id=EJ919902
- *Ajeigbe, T. O., & Afolabi, E. R. I. (2014). Assessing unidimensionality and differential item functioning in qualifying examination for senior secondary school students, Osun state, Nigeria. World Journal of Education, 4(4), 30–37. https://eric.ed.gov/?id=EJ1158579 https://doi.org/10.5430/wje.v4n4p30
- *Akcan, R., & Kabasakal, K. A. (2019). An investigation of item bias of English test: The case of 2016 year undergraduate placement exam in Turkey. International Journal of Assessment Tools in Education, 6(1), 48–62. https://eric.ed.gov/?id=EJ1246297 https://doi.org/10.21449/ijate.508581
- *Alavi, S. M., & Bordbar, S. (2017). Differential item functioning analysis of high-stakes test in terms of gender: A Rasch model approach. Malaysian Online Journal of Educational Sciences, 5(1), 10–24. https://mojes.um.edu.my/article/view/12631
- *Alavi, S. M., Kaivanpanah, S., & Masjedlou, A. P. (2018). Validity of the listening module of international English language testing system: Multiple sources of evidence. Language Testing in Asia, 8(1), 8. https://doi.org/https://doi.org/10.1186/s40468-018-0057-4
- *Allalouf, A. (2003). Revising translated differential item functioning items as a tool for improving cross-lingual assessment. Applied Measurement in Education, 16(1), 55–73. https://doi.org/https://doi.org/10.1207/S15324818AME1601_3
- *Allalouf, A., & Abramzon, A. (2008). Constructing better second language assessments based on differential item functioning analysis. Language Assessment Quarterly, 5(2), 120–141. https://doi.org/https://doi.org/10.1080/15434300801934710
- *Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185–198. https://doi.org/https://doi.org/10.1111/j.1745-3984.1999.tb00553.x
- *Al-Owidha, A. A. (2018). Investigating the psychometric properties of the Qiyas for L1 Arabic language test using a Rasch measurement framework. Language Testing in Asia, 8(1), 12. https://doi.org/https://doi.org/10.1186/s40468-018-0064-5
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. 2014. Standards for educational and psychological testing. Eds.
- Angoff, W. H. (1972, July). A technique for the investigation of cultural differences. Paper presented at American Psychological Association Meeting, Honolulu, Hawaii, United States. https://files.eric.ed.gov/fulltext/ED069686.pdf
- *Aryadoust, V. (2012). Differential item functioning in while-listening performance tests: The case of the International English Language Testing System (IELTS) listening module. International Journal of Listening, 26(1), 40–60. https://doi.org/https://doi.org/10.1080/10904018.2012.639649
- *Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). An investigation of differential item functioning in the MELAB listening test. Language Assessment Quarterly, 8(4), 361–385. https://doi.org/https://doi.org/10.1080/15434303.2011.628632
- *Bae, J., & Bachman, L. F. (1998). A latent variable approach to listening and reading: Testing factorial invariance across two groups of children in the Korean/English two-way immersion program. Language Testing, 15(3), 380–414. https://doi.org/https://doi.org/10.1177/026553229801500304
- *Banerjee, J., & Papageorgiou, S. (2016). What’s in a topic? Exploring the interaction between test-taker age and item content in high-stakes testing. International Journal of Listening, 30(1–2), 8–24. https://doi.org/https://doi.org/10.1080/10904018.2015.1056876
- *Bao, H., Dayton, C. M., & Hendrickson, A. B. (2009). Differential item functioning amplification and cancellation in a reading test. Practical Assessment, Research & Evaluation, 14, 19. https://doi.org/https://doi.org/10.7275/6cmj-q724
- *Berendes, K., Wagner, W., Meurers, D., & Trautwein, U. (2019). When a silent reading fluency test measures more than reading fluency: Academic language features predict the test performance of students with a non-German home language. Reading and Writing: An Interdisciplinary Journal, 32(3), 561–583. https://doi.org/https://doi.org/10.1007/s11145-018-9878-x
- *Breland, H., & Lee, Y.-W. (2007). Investigating uniform and non-uniform gender DIF in computer-based ESL writing assessment. Applied Measurement in Education, 20(4), 377–403. https://doi.org/https://doi.org/10.1080/08957340701429652
- *Bruckner, C., Yoder, P., Stone, W., & Saylor, M. (2007). Construct validity of the MCDI-I Receptive Vocabulary Scale can be improved: Differential item functioning between toddlers with autism spectrum disorders and typically developing infants. Journal of Speech, Language, and Hearing Research, 50(6), 1631–1638. https://doi.org/https://doi.org/10.1044/1092-4388(2007/110)
- *Cadime, I., Viana, F. L., & Ribeiro, I. (2014). Invariance on a reading comprehension test in European Portuguese: A differential item functioning analysis between students from rural and urban areas. European Journal of Developmental Psychology, 11(6), 754–766. https://doi.org/https://doi.org/10.1080/17405629.2014.938629
- Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. SAGE Publications.
- *Charles, P., Belisle, M., Tonita, K., & Smith, J. (2015). Help me tell my story: Development of an oral language measurement scale. Journal of Applied Measurement, 16(3), 278–297.
- *Chen, P.-H., & Fu, J.-T. (2018). Examining measurement properties of the revised preschool language assessment for use with Mandarin-speaking children. Language Assessment Quarterly, 15(4), 348–367. https://doi.org/https://doi.org/10.1080/15434303.2018.1529176
- *Chen, Y.-F., & Jiao, H. (2014). Exploring the utility of background and cognitive variables in explaining latent differential item functioning: An example of the PISA 2009 reading assessment. Educational Assessment, 19(2), 77–96. https://doi.org/https://doi.org/10.1080/10627197.2014.903650
- *Chen, Z., & Henning, G. (1985). Linguistic and cultural bias in language proficiency tests. Language Testing, 2(2), 155–163. https://doi.org/https://doi.org/10.1177/026553228500200204
- Cho, Y., Jiao, H., & Macready, G. B. (2012, April). Assessing the effects of different item parameter profiles in mixture Rasch models. Annual Meeting of the American Educational Research Association.
- Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(2), 133–148. https://doi.org/https://doi.org/10.1111/j.1745-3984.2005.00007
- De Ayala, R. J., Kim, S. H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3–4), 243–276. https://doi.org/https://doi.org/10.1080/15305058.2002.9669495
- *Domingue, B. W., Lang, D., Cuevas, M., Castellanos, M., Lopera, C., Mariño, J. P., Molina, A., & Shavelson, R. J. (2017). Measuring student learning in technical programs: A case study from Colombia. AERA Open, 3(1), 1. https://eric.ed.gov/?id=EJ1194181 https://doi.org/10.1177/2332858417692997
- Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the scholastic aptitude test. Journal of Educational Measurement, 23(4), 355–368. https://doi.org/https://doi.org/10.1111/j.1745-3984.1986.tb00255.x
- Douglas, J. A., Roussos, L. A., & Stout, W. (1996). Item‐bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33(4), 465–484. https://doi.org/https://doi.org/10.1111/j.1745-3984.1996.tb00502.x
- *Elder, C. (1996). The effect of language background on “foreign” language test performance: The case of Chinese, Italian, and modern Greek. Language Learning, 46(2), 233–282. https://doi.org/https://doi.org/10.1111/j.1467-1770.1996.tb01236.x
- *Elder, C., McNamara, T., & Congdon, P. (2003). Rasch techniques for detecting bias in performance assessments: An example comparing the performance of native and non-native speakers on a test of academic English. Journal of Applied Measurement, 4(2), 181–197.
- *Elosua Oliden, P., & Mujika Lizaso, J. (2014). Impact of family language and testing language on reading performance in a bilingual educational context. Psicothema, 26(3), 328–335. https://doi.org/https://doi.org/10.7334/psicothema2013.344
- *Elosua, P., & Lopez-Jauregui, A. (2007). Potential sources of differential item functioning in the adaptation of tests. International Journal of Testing, 7(1), 39–52. https://doi.org/https://doi.org/10.1080/15305050709336857
- *Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada’s National achievement tests. Applied Measurement in Education, 17(3), 301–321. https://doi.org/https://doi.org/10.1207/s15324818ame1703_4
- *Ercikan, K., Roth, W.-M., Simon, M., Sandilands, D., & Lyons-Thomas, J. (2014). Inconsistencies in DIF detection for sub-groups in heterogeneous language groups. Applied Measurement in Education, 27(4), 273–285. https://doi.org/https://doi.org/10.1080/08957347.2014.944306
- *Farrington, A. L., & Lonigan, C. J. (2015). Examining the measurement precision and invariance of the “revised get ready to read!”. Journal of Learning Disabilities, 48(3), 227–238. https://doi.org/https://doi.org/10.1177/0022219413495568
- *Farrington, A. L., Lonigan, C. J., Phillips, B. M., Farver, J. M., & McDowell, K. D. (2015). Evaluation of the utility of the “Revised get ready to read!” For Spanish-speaking English-language learners through differential item functioning analysis. Assessment for Effective Intervention, 40(4), 216–227. https://doi.org/https://doi.org/10.1177/1534508415577468
- Ferne, T., & Rupp, A. A. (2007). A Synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4(2), 113–148. https://doi.org/https://doi.org/10.1080/15434300701375923
- *Fidalgo, A. M., Alavi, S. M., & Amirian, S. M. R. (2014). Strategies for testing statistical and practical significance in detecting DIF with logistic regression models. Language Testing, 31(4), 433–451. https://doi.org/https://doi.org/10.1177/0265532214526748
- *Filipi, A. (2012). Do questions written in the target language make foreign language listening comprehension tests more difficult? Language Testing, 29(4), 511–532. https://doi.org/https://doi.org/10.1177/0265532212441329
- *Finch, W. H., Hernández Finch, M. E., & French, B. F. (2016). Recursive partitioning to identify potential causes of differential item functioning in cross-national data. International Journal of Testing, 16(1), 21–53. https://doi.org/https://doi.org/10.1080/15305058.2015.1039644
- *Fox, M. C., Berry, J. M., & Freeman, S. P. (2014). Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests. Psychology and Aging, 29(4), 925–938. https://doi.org/https://doi.org/10.1037/a0038217
- *Freedle, R., & Kostin, I. (1990). Item difficulty of four verbal item types and an index of differential item functioning for black and white examinees. Journal of Educational Measurement, 27(4), 329–343. https://doi.org/https://doi.org/10.1111/j.1745-3984.1990.tb00752.x
- *Freedle, R., & Kostin, I. (1997). Predicting black and white differential item functioning in verbal analogy performance. Intelligence, 24(2), 417–444. https://doi.org/https://doi.org/10.1016/S0160-2896(97)90058-1
- *French, B. F., & Gotch, C. M. (2013). Sex differences in item functioning in the comprehensive inventory of basic skills-II vocabulary assessments. Journal of Psychoeducational Assessment, 31(4), 410–417. https://doi.org/https://doi.org/10.1177/0734282912460857
- *Geramipour, M., & Shahmirzadi, N. (2004). A gender-related differential item functioning study of an English test. The Journal of Asia TEFL, 16(2), 674–682. https://doi.org/http://dx.doi.org/10.18823/asiatefl.2019.16.2.15.674
- *Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the certificate in advanced English examination. Language Assessment Quarterly, 4(2), 190–222. https://doi.org/https://doi.org/10.1080/15434300701375758
- *Gierl, M. J. (2005). Using dimensionality-based DIF analyses to identify and interpret constructs that elicit group differences. Educational Measurement: Issues and Practice, 24(1), 3–14. https://doi.org/https://doi.org/10.1111/j.1745-3992.2005.00002.x
- Gierl, M. J., & Khalid, S. N. (2000, April). Identifying sources of differential item functioning on translated achievement tests: A confirmatory analysis. Presented at annual meeting of the National Council on Measurement in Education. New Orleans, LA.
- Gierl, M. J., Rogers, W. T., & Klinger, D. A. (1999). Using statistical and judgmental reviews to identify and interpret translation differential item functioning. Alberta Journal of Educational Research, 45(4), 353–376.
- *Goodrich, J. M., Lonigan, C. J., & Alfonso, S. V. (2019). Measurement of early literacy skills among monolingual English-speaking and Spanish-speaking language-minority children: A differential item functioning analysis. Early Childhood Research Quarterly, 47(2), 99–110. https://doi.org/https://doi.org/10.1016/j.ecresq.2018.10.007
- *Grover, R. K., & Ercikan, K. (2017). For which boys and which girls are reading assessment items biased against? Detection of differential item functioning in heterogeneous gender populations. Applied Measurement in Education, 30(3), 178–195. https://doi.org/https://doi.org/10.1080/08957347.2017.1316276
- Hambleton, R. K. (2006). Good practices for identifying differential item functioning. Medical Care, 44(11), S182–S188. https://doi.org/https://doi.org/10.1097/01.mlr.0000245443.86671.c4
- *Harding, L. (2012). Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective. Language Testing, 29(2), 163–180. https://doi.org/https://doi.org/10.1177/0265532211421161
- *Heppt, B., Haag, N., Böhme, K., & Stanat, P. (2015). The role of academic-language features for reading comprehension of language-minority students and students from low-SES families. Reading Research Quarterly, 50(1), 61–82. https://doi.org/https://doi.org/10.1002/rrq.83
- Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Lawrence Erlbaum Associates.
- *Jang, E. E., & Roussos, L. (2009). Integrative analytic approach to detecting and interpreting L2 vocabulary DIF. International Journal of Testing, 9(3), 238–259. https://doi.org/https://doi.org/10.1080/15305050903107022
- *Kankaraš, M., & Moors, G. (2014). Analysis of cross-cultural comparability of PISA 2009 scores. Journal of Cross-Cultural Psychology, 45(3), 381–399. https://doi.org/https://doi.org/10.1177/0022022113511297
- *Kato, K., Moen, R. E., & Thurlow, M. L. (2009). Differentials of a state reading assessment: Item functioning, distractor functioning, and omission frequency for disability categories. Educational Measurement: Issues and Practice, 28(2), 28–40. https://doi.org/https://doi.org/10.1111/j.1745-3992.2009.00145.x
- *Keuning, J., & Verhoeven, L. T. W. (2007). Screening for word reading and spelling problems in elementary school: An item response theory perspective. Educational and Child Psychology, 24(4), 44–58. http://hdl.handle.net/2066/56606
- *Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18(1), 89–114. https://doi.org/https://doi.org/10.1177/026553220101800104
- *Kim, S.-H., & Cohen, A. S. (1991). A comparison of two area measures for detecting differential item functioning. Applied Psychological Measurement, 15(3), 269–278. https://doi.org/https://doi.org/10.1177/014662169101500307
- *Kim, Y.-H., & Jang, E. E. (2009). Differential functioning of reading subskills on the OSSLT for L1 and ELL students: A multidimensionality model-based DBF/DIF approach. Language Learning, 59(4), 825–865. https://doi.org/https://doi.org/10.1111/j.1467-9922.2009.00527.x
- *Koo, J., Becker, B. J., & Kim, Y.-S. (2014). Examining differential item functioning trends for English language learners in a reading test: A meta-analytical approach. Language Testing, 31(1), 89–109. https://doi.org/https://doi.org/10.1177/0265532213496097
- *Korkmaz, H. T. E., Stark, S., Berument, S. K., & Guven, A. G. (2012). Detecting differential item functioning across different age groups on the Turkish Receptive Language Test for children. The International Journal of Educational and Psychological Assessment, 12(1), 81–94.
- *Kornilov, S. A., Lebedeva, T. V., Zhukova, M. A., Prikhoda, N. A., Korotaeva, I. V., . K., & Grigorenko, E. L. (2016). Language development in rural and urban Russian-speaking children with and without developmental language disorder. Learning and Individual Differences, 46, 45–53. https://doi.org/https://doi.org/10.1016/j.lindif.2015.07.001
- *Kunnan, A. J. (1990). DIF in native language and gender groups in an ESL placement test. TESOL Quarterly, 24(4), 741–746. https://doi.org/https://doi.org/10.2307/3587128
- Kunnan, A. J. (2007). Test Fairness, test bias, and DIF. Language Assessment Quarterly, 4(2), 109–112. https://doi.org/https://doi.org/10.1080/15434300701375865
- *Lee, H., & Geisinger, K. F. (2014). The effect of propensity scores on DIF analysis: Inference on the potential cause of DIF. International Journal of Testing, 14(4), 313–338. https://doi.org/https://doi.org/10.1080/15305058.2014.922567
- *Lee, Y.-W., Breland, H., & Muraki, E. (2005). Comparability of TOEFL CBT writing prompts for different native language groups. International Journal of Testing, 5(2), 131–158. https://doi.org/https://doi.org/10.1207/s15327574ijt0502_3
- *Lesniewska, J., Pichette, F., & Béland, S. (2018). First language test bias? Comparing French-speaking and polish-speaking participants’ performance on the peabody picture vocabulary test. Canadian Modern Language Review, 74(1), 27–52. https://doi.org/https://doi.org/10.3138/cmlr.3670
- Li, F., Cohen, A. S., Kim, S.-H., & Cho, S.-J. (2009). Model selection methods for mixture dichotomous IRT models. Applied Psychological Measurement, 33(5), 353–373. https://doi.org/https://doi.org/10.1177/0146621608326422
- Li, H., Hunter, C. V., & Oshima, T. C. (2013). Gender DIF in reading tests: A synthesis of research. In R. E. Millsap, L. Andries, D. M. B. Van Der Ark, & C. M. Woods (Eds.), Springer proceedings in mathematics & statistics: New developments in quantitative psychology (pp. 489–506). Springer.
- Li, H., Qin, Q., & Lei, P. W. (2017). An examination of the instructional sensitivity of the TIMSS math items: A hierarchical differential item functioning approach. Educational Assessment, 22(1), 1–17. https://doi.org/https://doi.org/10.1080/10627197.2016.1271702
- *Liu, O. L. (2011). Do major field of study and cultural familiarity affect TOEFL[R] iBT reading performance? A confirmatory approach to differential item functioning. Applied Measurement in Education, 24(3), 235–255. https://doi.org/https://doi.org/10.1080/08957347.2011.580645
- *Luppescu, S., & Day, R. R. (1993). Reading, dictionaries, and vocabulary learning. Language Learning, 43(2), 263–287. https://doi.org/https://doi.org/10.1111/j.1467-1770.1992.tb00717.x
- *Magis, D., Raiche, G., Beland, S., & Gerard, P. (2011). A Generalized logistic regression procedure to detect differential item functioning among multiple groups. International Journal of Testing, 11(4), 365–386. https://doi.org/https://doi.org/10.1080/15305058.2011.602810
- Mapuranga, R., Dorans, N. J., & Middleton, K. (2008). A review of recent developments in differential item functioning. (Research Report No. RR–08–43). ETS. https://doi.org/https://doi.org/10.1002/j.2333-8504.2008.tb02129.x
- Mazor, K. M., Kanjee, A., & Clauser, B. E. (1995). Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect differential item functioning. Journal of Educational Measurement, 32(2), 131–144. https://doi.org/https://doi.org/10.1111/j.1745-3984.1995.tb00459.x
- McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24(2), 99–114. https://doi.org/https://doi.org/10.1177/01466210022031552
- Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/https://doi.org/10.1037/0003-066X.50.9.741
- *Morales, A. M. F., van de Vijver, F. J. R., & Poortinga, Y. H. (2013). Differential item functioning and educational risk factors in Guatemalan reading assessment. Revista Interamericana De Psicología, 47(3), 422–432.
- O’Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255–276). Lawrence Erlbaum Associates, Inc.
- *Ögretmen, T. (2015). DIF analysis across genders for reading comprehension part of English language achievement exam as a foreign language. Educational Research and Reviews, 10(11), 1505–1513. https://doi.org/https://doi.org/10.5897/ERR2015.2284
- *Oliveri, M. E., Ercikan, K., Lyons-Thomas, J., & Holtzman, S. (2016). Analyzing fairness among linguistic minority populations using a latent class differential item functioning approach. Applied Measurement in Education, 29(1), 17–29. https://doi.org/https://doi.org/10.1080/08957347.2015.1102913
- *Oliveri, M. E., Ercikan, K., & Zumbo, B. (2013). Analysis of sources of latent class differential item functioning in international assessments. International Journal of Testing, 13(3), 272–293. https://doi.org/https://doi.org/10.1080/15305058.2012.738266
- *Oliveri, M. E., Lawless, R., Robin, F., & Bridgeman, B. (2018). An exploratory analysis of differential item functioning and its possible sources in a higher education admissions context. Applied Measurement in Education, 31(1), 1–16. https://doi.org/https://doi.org/10.1080/08957347.2017.1391258
- *Pae, H. K., Greenberg, D., & Morris, R. D. (2012). Construct validity and measurement invariance of the peabody picture vocabulary test–III form A. Language Assessment Quarterly, 9(2), 152–171. https://doi.org/https://doi.org/10.1080/15434303.2011.613504
- *Pae, T.-I. (2004a). DIF for examinees with different academic backgrounds. Language Testing, 21(1), 53–73. https://doi.org/https://doi.org/10.1191/0265532204lt274oa
- *Pae, T.-I. (2004b). Gender effect on reading comprehension with Korean EFL learners. System: An International Journal of Educational Technology and Applied Linguistics, 32(2), 265–281. https://doi.org/https://doi.org/10.1177/0265532211434027
- *Pae, T.-I. (2012). Causes of gender DIF on an EFL language test: A multiple-data analysis over nine years. Language Testing, 29(4), 533–554. https://doi.org/https://doi.org/10.1177/0265532211434027
- *Pae, T.-I., & Park, G.-P. (2006). Examining the relationship between differential item functioning and differential test functioning. Language Testing, 23(4), 475–496. https://doi.org/https://doi.org/10.1191/0265532206lt338oa
- *Park, H.-S., Pearson, P. D., & Reckase, M. D. (2005). Assessing the effect of cohort, gender, and race on differential item functioning (DIF) in an adaptive test designed for multi-age groups. Reading Psychology an International Quarterly, 26(1), 81–101. https://doi.org/https://doi.org/10.1080/02702710590923805
- Penfield, R. D., & Lam, T. C. M. (2005). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice,19, 19(3), 5–15. https://doi.org/https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
- Pepper, D., Hodgen, J., Lamesoo, K., Kõiv, P., & Tolboom, J. (2018). Think aloud: Using cognitive interviewing to validate the PISA assessment of student self-efficacy in mathematics. International Journal of Research & Method in Education, 41(1), 3–16. https://doi.org/https://doi.org/10.1080/1743727X.2016.1238891
- *Petscher, Y., Connor, C. M., & Al Otaiba, S. (2012). Psychometric analysis of the diagnostic evaluation of language variation assessment. Assessment for Effective Intervention, 37(4), 243–250. https://doi.org/https://doi.org/10.1177/1534508411413760
- *Prieto, G., & Nieto, E. (2014). Influence of DIF on differences in performance of Italian and Asian individuals on a reading comprehension test of Spanish as a foreign language. Journal of Applied Measurement, 15(2), 176–188.
- *Puhan, G., Boughton, K., & Kim, S. (2007). Examining differences in examinee performance in paper and pencil and computerized testing. Journal of Technology, Learning, and Assessment, 6(3), 1–20. https://eric.ed.gov/?id=EJ838613
- *Qi, C. H., & Marley, S. C. (2009). Differential item functioning analysis of the preschool language scale—4 between English-speaking Hispanic and European American children from low-income families. Topics in Early Childhood Special Education, 29(3), 171–180. https://doi.org/https://doi.org/10.1177/0271121409332674
- *Raju, N. S., Drasgow, F., & Slinde, J. A. (1993). An empirical comparison of the area methods, Lord’s Chi-Square Test, and the Mantel-Haenszel technique for assessing differential item functioning. Educational and Psychological Measurement, 53(2), 301–314. https://doi.org/https://doi.org/10.1177/0013164493053002001
- *Ravand, H. (2015). Item response theory using hierarchical generalized linear models. Practical Assessment, Research & Evaluation, 20(7). https://scholarworks.umass.edu/pare/vol20/iss1/7/
- *Reed, D. K., Vaughn, S., & Petscher, Y. (2012). The Validity of a holistically scored retell protocol for determining the reading comprehension of middle school students. Learning Disability Quarterly, 35(2), 76–89. https://doi.org/https://doi.org/10.1177/0731948711432509
- *Roever, C. (2007). DIF in the assessment of second language pragmatics. Language Assessment Quarterly, 4(2), 165–189. https://doi.org/https://doi.org/10.1080/15434300701375733
- *Ross, S. J., & Okabe, J. (2006). The subjective and objective interface of bias detection on language tests. International Journal of Testing, 6(3), 229–253. https://doi.org/https://doi.org/10.1207/s15327574ijt0603_2
- Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20(4), 355–371. https://doi.org/https://doi.org/10.1177/014662169602000404
- *Runnels, J. (2013). Measuring differential item and test functioning across academic disciplines. Language Testing in Asia, 3(1), 9. https://doi.org/https://doi.org/10.1186/2229-0443-3-9
- *Ryan, K. E., & Bachman, L. F. (1992). Differential item functioning on two tests of EFL proficiency. Language Testing, 9(1), 12–29. https://doi.org/https://doi.org/10.1177/026553229200900103
- *Sandilands, D., Oliveri, M. E., Zumbo, B. D., & Ercikan, K. (2013). Investigating sources of differential item functioning in international large-scale assessments using a confirmatory approach. International Journal of Testing, 13(2), 152–174. https://doi.org/https://doi.org/10.1080/15305058.2012.690140
- *Sandilos, L. E., Lewis, K., Komaroff, E., Hammer, C. S., Scarpino, S. E., Lopez, L., Rodriguez, B., & Goldstein, B. (2015). Analysis of bilingual children’s performance on the English and Spanish versions of the Woodcock-Muñoz Language Survey-R (WMLS-R). Language Assessment Quarterly, 12(4), 386–408. https://doi.org/https://doi.org/10.1080/15434303.2015.1100198
- *Santelices, M. V., & Wilson, M. (2010). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80(1), 106–134. https://doi.org/https://doi.org/10.17763/haer.80.1.j94675w001329270
- *Sasaki, M. (1991). A comparison of two methods for detecting differential item functioning in an ESL placement test. Language Testing, 8(2), 95–111. https://doi.org/https://doi.org/10.1177/026553229100800201
- *Scheffner-Hammer, C., Pennock-Roman, M., Rzasa, S., & Tomblin, J. B. (2002). An analysis of the test of language development—primary for item bias. American Journal of Speech-Language Pathology, 11(3), 274–284. https://doi.org/https://doi.org/10.1044/1058-0360(2002/032)
- *Scheuneman, J. D., & Gerritz, K. (1990). Using differential item functioning procedures to explore sources of item difficulty and group performance characteristics. Journal of Educational Measurement, 27(2), 109–131. https://doi.org/https://doi.org/10.1111/j.1745-3984.1990.tb00737.x
- *Schmitt, A. P. (1988). Language and cultural characteristics that explain differential item functioning for Hispanic examinees on the scholastic aptitude test. Journal of Educational Measurement, 25(1), 1–13. https://doi.org/https://doi.org/10.1111/j.1745-3984.1988.tb00287.x
- Schmitt, A. P., Holland, P. W., & Dorans, N. J. (1993). Evaluating hypotheses about differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 281–315). Lawrence Erlbaum Associates, Inc.
- Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., de Graeff, A., Groenvold, M., & Sprangers, M. A. G. (2010). Interpretation of differential item functioning (DIF) analyses using external review. Expert Reviews in Pharmacoeconomics and Outcomes Research, 10(3), 253–258. https://doi.org/https://doi.org/10.1586/erp.10.22
- *Sekercioglu, G., & Kogar, H. (2018). The examination of measurement invariance and differential item functioning of PISA 2015 cognitive tests in terms of the commonly used languages. Novitas-ROYAL (Research on Youth and Language), 12(2), 152–172. https://eric.ed.gov/?id=EJ1195282
- Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159–194. https://doi.org/https://doi.org/10.1007/BF02294572
- *Shimizu, Y., & Zumbo, B. D. (2005). A logistic regression for differential item functioning primer. Japan Language Testing Association Journal, 7, 110–124. https://doi.org/https://doi.org/10.20622/jltaj.7.0_110
- *Simos, P. G., Sideridis, G. D., Protopapas, A., & Mouzaki, A. (2011). Psychometric evaluation of a receptive vocabulary test for Greek elementary students. Assessment for Effective Intervention, 37(1), 34–49. https://doi.org/https://doi.org/10.1177/1534508411413254
- *Sireci, S. G., & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148–166. https://doi.org/https://doi.org/10.1191/0265532203lt249oa
- Sireci, S. G., & Rios, J. A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2–3), 170–187. https://doi.org/https://doi.org/10.1080/13803611.2013.767621
- *Snetzler, S., & Qualls, A. L. (2000). Examination of differential item functioning on a standardized achievement battery with limited English proficient students. Educational and Psychological Measurement, 60(4), 564–577. https://doi.org/https://doi.org/10.1177/00131640021970727
- Steinberg, L., Thissen, D., & Wainer, H. (1990). Validity. In H. Wainer, N. J. Ns, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, & D. Thissen (Eds.), Computerized adaptive testing: A primer (pp. 187–231). Lawrence Erlbaum.
- *Stubbe, T. C. (2011). How do different versions of a test instrument function in a single language? A DIF analysis of the PIRLS 2006 German assessments. Educational Research and Evaluation, 17(6), 465–481. https://doi.org/10.1080/13803611.2011.630560
- *Styles, I., Wildy, H., Pepper, V., Faulkner, J., & Berman, Y. (2014). Australian indigenous students’ performance on the PIPS-BLA reading and mathematics scales: 2011–2013. International Research in Early Childhood Education, 5(1), 103–123. https://eric.ed.gov/?id=EJ1150993
- Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
- *Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17(3), 323–340. https://doi.org/10.1177/026553220001700303
- *Taylor, C. S., & Lee, Y. (2011). Ethnic DIF in reading tests with mixed item formats. Educational Assessment, 16(1), 35–68. https://doi.org/10.1080/10627197.2011.552039
- *Taylor, C. S., & Lee, Y. (2012). Gender DIF in reading and mathematics tests with mixed item formats. Applied Measurement in Education, 25(3), 246–280. https://doi.org/10.1080/08957347.2012.687650
- *Teker, G. T., & Dogan, N. (2015). The effects of testlets on reliability and differential item functioning. Educational Sciences: Theory and Practice, 15(4), 969–980. https://doi.org/10.12738/estp.2015.4.2577
- *Uiterwijk, H., & Vallen, T. (2005). Linguistic sources of item bias for second generation immigrants in Dutch tests. Language Testing, 22(2), 211–234. https://doi.org/10.1191/0265532205lt301oa
- von Davier, M., & Yamamoto, K. (2007). Mixture-distribution and HYBRID Rasch models. In M. von Davier & C. H. Carstensen (Eds.), Statistics for social and behavioral sciences. Multivariate and mixture distribution Rasch models: Extensions and applications (pp. 99–115). Springer.
- *waKivilu, J. M. (2010). Determination of differential bundle functioning (DBF) of numeracy and literacy tests administered to grade 3 learners in South Africa. South African Journal of Psychology, 40(3), 308–317. https://doi.org/10.1177/008124631004000309
- *Walker, C. M. (2011). What’s the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29(4), 364–376. https://doi.org/10.1177/0734282911406666
- *Webb, M. L., Cohen, A. S., & Schwanenflugel, P. J. (2008). Latent class analysis of differential item functioning on the Peabody Picture Vocabulary Test-III. Educational and Psychological Measurement, 68(2), 335–351. https://doi.org/10.1177/0013164407308474
- *Wedman, J. (2018). Reasons for gender-related differential item functioning in a college admissions test. Scandinavian Journal of Educational Research, 62(6), 959–970. https://doi.org/10.1080/00313831.2017.1402365
- *Welch, C. J., & Miller, T. R. (1995). Assessing differential item functioning in direct writing assessments: Problems and an example. Journal of Educational Measurement, 32(2), 163–178. https://doi.org/10.1111/j.1745-3984.1995.tb00461.x
- Woitschach, P., Zumbo, B. D., & Fernández Alonso, R. (2019). An ecological view of measurement: Focus on multilevel model explanation of differential item functioning. Psicothema, 31(2), 194–203. https://doi.org/10.7334/psicothema2018.303
- Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
- Zumbo, B. D. (2007a). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233. https://doi.org/10.1080/15434300701375832
- Zumbo, B. D. (2007b). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Vol. 26. Psychometrics (pp. 45–79). Elsevier Science.
- Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). IAP - Information Age Publishing, Inc.
- Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research & Policy Studies, 5(1), 1–23. https://eric.ed.gov/?id=EJ846827
- *Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559
- Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: Flagging rules, minimum sample size requirements, and criterion refinement (RR-12-08). ETS. https://doi.org/10.1002/j.2333-8504.2012.tb02290.x