References
- Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185–198. https://doi.org/10.1111/j.1745-3984.1999.tb00553.x
- ALTE. (2001). Principles of good practice for ALTE examinations. ALTE Working Group. Retrieved October 10, 2022, from http://www.alte.org/
- Bronfenbrenner, U. (1979). The ecology of human development. Harvard University Press.
- Camilli, G. (2006). Test fairness. In R. Brennan (Ed.), Educational measurement (pp. 221–256). American Council on Education & Praeger series on higher education.
- Camilli, G., & Shepard, L. (1994). Methods for identifying biased test items. Sage.
- Chalmers, R. P., Counsell, A., & Flora, D. B. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76(1), 114–140. https://doi.org/10.1177/0013164415584576
- Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching and assessment.
- Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355–368. https://doi.org/10.1111/j.1745-3984.1986.tb00255.x
- Dorans, N. J., Schmitt, A. P., & Bleistein, C. A. (1992). The standardization approach to assessing comprehensive differential item functioning. Journal of Educational Measurement, 29(4), 309–319. https://doi.org/10.1111/j.1745-3984.1992.tb00379.x
- Drasgow, F., Nye, C. D., Stark, S., & Chernyshenko, O. S. (2018). Differential item and test functioning. In P. Irwing, T. Booth, & D. J. Hughes (Eds.), The Wiley handbook of psychometric testing (pp. 885–899). Wiley-Blackwell.
- Elosua, P. (2006). Funcionamiento diferencial del ítem en la evaluación internacional PISA. Detección y comprensión [Differential item functioning in the PISA international assessment: Detection and understanding]. RELIEVE, 12(2). https://ojs.uv.es/index.php/RELIEVE/article/view/4229
- Elosua, P., & Hambleton, R. K. (2018). Psychological and educational test score comparability across language and cultural groups in the presence of item bias. Journal of Psychology and Education, 13(1), 23–32. https://doi.org/10.23923/rpye2018.01.155
- Elosua, P., & López, A. (2007). Potential DIF sources in the adaptation of tests. International Journal of Testing, 7(1), 39–52. https://doi.org/10.1080/15305050709336857
- Elosua, P., & Peñalba, A. (2018). Language competence assessment in minoritized language revitalisation contexts. The case of Basque. Journal of Multilingual and Multicultural Development, 39(7), 629–640. https://doi.org/10.1080/01434632.2017.1417415
- Elosua, P., & Zumbo, B. D. (2008). Coeficientes de fiabilidad para escalas de respuesta categórica ordenada [Reliability coefficients for ordinal response scales]. Psicothema, 20(4), 896–901. https://www.psicothema.com/pdf/3572.pdf
- Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3–4), 199–215.
- Ferguson, C. A. (1996). Standardization as a form of language spread. In T. Huebner (Ed.), Sociolinguistic perspectives: Papers on language in society, 1959–1994 (pp. 189–199). Oxford University Press.
- Ferne, T., & Rupp, A. A. (2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4(2), 113–148. https://doi.org/10.1080/15434300701375923
- Garcia-Garzon, E., Abad, F. J., & Garrido, L. E. (2021). On omega hierarchical estimation: A comparison of exploratory bi-factor analysis algorithms. Multivariate Behavioral Research, 56(1), 101–119. https://doi.org/10.1080/00273171.2020.1736977
- Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38(2), 164–187. https://doi.org/10.1111/j.1745-3984.2001.tb01121.x
- Hambleton, R. K. (2006). Good practices for identifying differential item functioning. Medical Care, 44(11), S182–S188. https://doi.org/10.1097/01.mlr.0000245443.86671.c4
- Hickey, R. (2012). Standard English and standards of English. In R. Hickey (Ed.), Standards of English: Codified varieties around the world (pp. 1–33). Cambridge University Press.
- Hogan-Brun, G., & Wolff, S. (2004). Minority languages in Europe: Frameworks, status, prospects. Palgrave Macmillan.
- Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Lawrence Erlbaum Associates, Inc.
- Hualde, J. I., & Zuazo, K. (2007). The standardization of the Basque language. Language Problems and Language Planning, 31(2), 142–168. https://doi.org/10.1075/lplp.31.2.04hua
- Kline, R. B. (2015). Principles and practice of structural equation modeling. Guilford Publications.
- Lane, P., Costa, J., & De Korne, H. (Eds.). (2017). Standardizing minority languages: Competing ideologies of authority and authenticity in the global periphery. Routledge. https://doi.org/10.4324/9781315647722
- Li, H., Hunter, C. V., & Bialo, J. A. (2021). A revisit of Zumbo’s third generation DIF: How are we doing in language testing? Language Assessment Quarterly, 19(1), 27–53. https://doi.org/10.1080/15434303.2021.1963253
- Liu, Y., Zumbo, B., Gustafson, P., Huang, Y., Kroc, E., & Wu, A. (2016). Investigating causal DIF via propensity score methods. Practical Assessment, Research & Evaluation, 21, Article 13. https://doi.org/10.7275/ewqz-n963
- McLafferty, I. (2004). Focus group interviews as a data collecting strategy. Journal of Advanced Nursing, 48(2), 187–194. https://doi.org/10.1111/j.1365-2648.2004.03186.x
- Millsap, R. E. (2011). Statistical approaches to measurement invariance. Routledge.
- O’Neill, K., & McPeek, W. (1993). Item and test characteristics that are associated with differential item functioning. In P. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255–276). Lawrence Erlbaum.
- Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Sage.
- Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, vol 26: Psychometrics (pp. 125–167). Elsevier.
- Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-Based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19(4), 353–368. https://doi.org/10.1177/014662169501900405
- Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696. https://doi.org/10.1080/00273171.2012.715555
- Reise, S. P., Moore, T. M., & Haviland, M. G. (2013). Applying unidimensional item response theory models to psychological data. In K. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 1, pp. 101–119). American Psychological Association. http://dx.doi.org/10.1037/14047-0
- Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21(2), 137–150. https://doi.org/10.1037/met0000045
- Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. https://doi.org/10.18637/jss.v048.i02
- Solano-Flores, G., & Elosua, P. (2021). Measuring and operationalizing national assessment capacity. XII International Test Commission Conference, Virtual.
- Solano-Flores, G., & Milbourn, T. (2016). Assessment capacity, cultural validity and consequential validity in PISA. RELIEVE, 22(1), M12. https://doi.org/10.7203/relieve.22.1.8281
- Vogt, D. S., King, D. W., & King, L. A. (2004). Focus groups in psychological assessment: Enhancing content validity by consulting members of the target population. Psychological Assessment, 16(3), 231–243. https://doi.org/10.1037/1040-3590.16.3.231
- Walker, C. M. (2011). What’s the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29(4), 364–376. https://doi.org/10.1177/0734282911406666
- Wyse, A. E. (2013). DIF cancellation in the Rasch model. Journal of Applied Measurement, 14(2), 118–128.
- Zenisky, A. L., Hambleton, R. K., & Robin, F. (2003). DIF detection and interpretation in large-scale science assessments: Informing item writing practices. Educational Assessment, 9(1&2), 61–78.
- Zieky, M. J. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Lawrence Erlbaum Associates, Publishers.
- Zuazo, K. (2008). Euskalkiak. Euskararen dialektoak [Euskalkiak. Basque dialects]. Elkar.
- Zuazo, K., & Benton, G. (2019). Standard Basque and its dialects. Routledge.
- Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF). National Defense Headquarters.
- Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233. https://doi.org/10.1080/15434300701375832
- Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera-Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559
- Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26(1), 55–66. https://doi.org/10.1111/j.1745-3984.1989.tb00318.x