
ITC Guidelines for Translating and Adapting Tests (Second Edition)

REFERENCES

  • Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185–198. doi:10.1111/j.1745-3984.1999.tb00553.x.
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
  • Angoff, W. H., & Modu, C. C. (1973). Equating the scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Research Rep. No. 3). New York, NY: College Entrance Examination Board.
  • Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438. doi:10.1080/10705510903008204.
  • Brislin, R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural psychology (pp. 137–164). Newbury Park, CA: Sage Publications.
  • Byrne, B. (2001). Structural equation modeling with AMOS, EQS, and LISREL: Comparative approaches to testing for the factorial validity of a measuring instrument. International Journal of Testing, 1, 55–86. doi:10.1207/S15327574IJT0101_4.
  • Byrne, B. (2003). Measuring self-concept across culture: Issues, caveats, and application. In H. W. Marsh, R. Craven, & D. M. McInerney (Eds.), International advances in self research (pp. 30–41). Greenwich, CT: Information Age Publishing.
  • Byrne, B. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Byrne, B. M. (2008). Testing for multigroup equivalence of a measuring instrument: A walk through the process. Psicothema, 20, 872–882.
  • Byrne, B. M., & van de Vijver, F. J. R. (2010). Testing for measurement and structural equivalence in large-scale cross-cultural studies: Addressing the issue of nonequivalence. International Journal of Testing, 10, 107–132. doi:10.1080/15305051003637306.
  • Byrne, B. M., & van de Vijver, F. J. R. (2014). Factorial structure of the Family Values Scale from a multilevel-multicultural perspective. International Journal of Testing, 14, 168–192. doi:10.1080/15305058.2013.870903.
  • Clauser, B. E., Nungester, R. J., Mazor, K., & Ripley, D. (1996). A comparison of alternative matching strategies for DIF detection in tests that are multidimensional. Journal of Educational Measurement, 33(2), 202–214. doi:10.1111/j.1745-3984.1996.tb00489.x.
  • Cook, L. L., & Schmitt-Cascallar, A. P. (2005). Establishing score comparability for tests given in different languages. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 139–170). Mahwah, NJ: Lawrence Erlbaum.
  • Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning: Theory and practice (pp. 137–166). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Ellis, B. B. (1989). Differential item functioning: Implications for test translation. Journal of Applied Psychology, 74, 912–921. doi:10.1037/0021-9010.74.6.912.
  • Ellis, B. B., & Kimmel, H. D. (1992). Identification of unique cultural response patterns by means of item response theory. Journal of Applied Psychology, 77, 177–184. doi:10.1037/0021-9010.77.2.177.
  • Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29(6), 543–553. doi:10.1016/S0883-0355(98)00047-0.
  • Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3), 199–215. doi:10.1207/S15327574IJT023&4_2.
  • Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17(3), 301–321. doi:10.1207/s15324818ame1703_4.
  • Ercikan, K., Simon, M., & Oliveri, M. E. (2013). Score comparability of multiple language versions of assessments within jurisdictions. In M. Simon, K. Ercikan, & M. Rousseau (Eds.), An international handbook for large-scale assessments (pp. 110–124). New York, NY: Routledge.
  • Grégoire, J., & Hambleton, R. K. (Eds.). (2009). Advances in test adaptation research [Special Issue]. International Journal of Testing, 9(2), 73–166. doi:10.1080/15305050902880678.
  • Grisay, A. (2003). Translation procedures in OECD/PISA 2000 international assessment. Language Testing, 20(2), 225–240. doi:10.1191/0265532203lt254oa.
  • Hambleton, R. K. (2002). The next generation of the ITC test translation and adaptation guidelines. European Journal of Psychological Assessment, 17(3), 164–172. doi:10.1027//1015-5759.17.3.164.
  • Hambleton, R. K. (2005). Issues, designs, and technical guidelines for adapting tests into multiple languages and cultures. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 3–38). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1(1), 1–16.
  • Hambleton, R. K., Clauser, B. E., Mazor, K. M., & Jones, R. W. (1993). Advances in the detection of differentially functioning test items. European Journal of Psychological Assessment, 9(1), 1–18.
  • Hambleton, R. K., & Lee, M. (2013). Methods of translating and adapting tests to increase cross-language validity. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), The Oxford handbook of child psychological assessment (pp. 172–181). New York, NY: Oxford University Press.
  • Hambleton, R. K., Merenda, P. F., & Spielberger, C. (Eds.). (2005). Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
  • Hambleton, R. K., Yu, L., & Slater, S. C. (1999). Field-test of ITC guidelines for adapting psychological tests. European Journal of Psychological Assessment, 15(3), 270–276. doi:10.1027//1015-5759.15.3.270.
  • Hambleton, R. K., & Zenisky, A. (2010). Translating and adapting tests for cross-cultural assessment. In D. Matsumoto & F. van de Vijver (Eds.), Cross-cultural research methods (pp. 46–74). New York, NY: Cambridge University Press.
  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Hulin, C. L., Lissak, R. I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249–260. doi:10.1177/014662168200600301.
  • Javaras, K. N., & Ripley, B. D. (2007). An ‘unfolding’ latent variable model for Likert attitude data: Drawing inferences adjusted for response style. Journal of the American Statistical Association, 102, 454–463. doi:10.1198/016214506000000960.
  • Jeanrie, C., & Bertrand, R. (1999). Translating tests with the International Test Commission Guidelines: Keeping validity in mind. European Journal of Psychological Assessment, 15(3), 277–283. doi:10.1027//1015-5759.15.3.277.
  • Johnson, T. R. (2003). On the use of heterogeneous thresholds ordinal regression models to account for individual differences in response style. Psychometrika, 68, 563–583. doi:10.1007/BF02295612.
  • Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer.
  • Levin, K., Willis, G. B., Forsyth, B. H., Norberg, A., Stapleton Kudela, M., Stark, D., & Thompson, F. E. (2009). Using cognitive interviews to evaluate the Spanish-language translation of a dietary questionnaire. Survey Research Methods, 3(1), 13–25.
  • Li, Y., Cohen, A. S., & Ibarra, R. A. (2004). Characteristics of mathematics items associated with gender DIF. International Journal of Testing, 4(2), 115–135. doi:10.1207/s15327574ijt0402_2.
  • Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52(2), 443–451. doi:10.1177/0013164492052002020.
  • McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Erlbaum.
  • Muñiz, J., Elosua, P., & Hambleton, R. K. (2013). Directrices para la traducción y adaptación de los tests: Segunda edición [Guidelines for the translation and adaptation of tests: Second edition]. Psicothema, 25(2), 149–155.
  • Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115–135. doi:10.1207/S15327574IJT0102_2.
  • Oort, F. J., & Berberoğlu, G. (1992). Using restricted factor analysis with binary data for item bias detection and item analysis. In T. J. Plomp, J. M. Pieters, & A. Feteris (Eds.), European Conference on Educational Research: Book of Summaries (pp. 708–710). Twente, The Netherlands: University of Twente, Department of Education.
  • Park, H., Pearson, P. D., & Reckase, M. D. (2005). Assessing the effect of cohort, gender, and race on DIF in an adaptive test designed for multi-age groups. Reading Psychology, 26, 81–101. doi:10.1080/02702710590923805.
  • Rios, J., & Sireci, S. (2014). Guidelines versus practices in cross-lingual assessment: A disconcerting disconnect. International Journal of Testing, 14(4), 289–312. doi:10.1080/15305058.2014.924006.
  • Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17(2), 105–116. doi:10.1177/014662169301700201.
  • Rotter, J. B., & Rafferty, J. E. (1950). Manual: The Rotter Incomplete Sentences Blank: College Form. New York, NY: Psychological Corporation.
  • Scheuneman, J. D., & Grima, A. (1997). Characteristics of quantitative word items associated with differential performance for female and Black examinees. Applied Measurement in Education, 10(4), 299–319. doi:10.1207/s15324818ame1004_1.
  • Sireci, S. G. (1997). Problems and issues in linking tests across languages. Educational Measurement: Issues and Practice, 16, 12–19. doi:10.1111/j.1745-3992.1997.tb00581.x.
  • Sireci, S. G. (2005). Using bilinguals to evaluate the comparability of different language versions of a test. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 117–138). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Sireci, S. G., & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148–166. doi:10.1191/0265532203lt249oa.
  • Sireci, S. G., & Berberoğlu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248. doi:10.1207/S15324818AME1303_1.
  • Sireci, S. G., Patsula, L., & Hambleton, R. K. (2005). Statistical methods for identifying flaws in the test adaptation process. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 93–116). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Sireci, S. G., Harter, J., Yang, Y., & Bhola, D. (2003). Evaluating the equivalence of an employee attitude survey across languages, cultures, and administration formats. International Journal of Testing, 3(2), 129–150. doi:10.1207/S15327574IJT0302_3.
  • Sireci, S. G., & Wells, C. S. (2010). Evaluating the comparability of English and Spanish video accommodations for English language learners. In P. Winter (Ed.), Evaluating the comparability of scores from achievement test variations (pp. 33–68). Washington, DC: Council of Chief State School Officers.
  • Solano-Flores, G., Trumbull, E., & Nelson-Barber, S. (2002). Concurrent development of dual language assessments: An alternative to translating tests for linguistic minorities. International Journal of Testing, 2(2), 107–129. doi:10.1207/S15327574IJT0202_2.
  • Lee, S. (2017). Detecting differential item functioning using the logistic regression procedure in small samples. Applied Psychological Measurement, 41(1), 30–43. doi:10.1177/0146621616668015.
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370. doi:10.1111/j.1745-3984.1990.tb00754.x.
  • Tanzer, N. K., & Sim, C. O. E. (1999). Adapting instruments for use in multiple languages and cultures: A review of the ITC Guidelines for Test Adaptation. European Journal of Psychological Assessment, 15, 258–269. doi:10.1027//1015-5759.15.3.258.
  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–169). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning: Theory and practice (pp. 67–113). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • van de Vijver, F. J. R., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89–99. doi:10.1027/1016-9040.1.2.89.
  • van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage Publications.
  • van de Vijver, F. J. R., & Poortinga, Y. H. (1991). Testing across cultures. In R. K. Hambleton & J. Zaal (Eds.), Advances in educational and psychological testing (pp. 277–308). Dordrecht, The Netherlands: Kluwer Academic Publishers.
  • van de Vijver, F. J. R., & Poortinga, Y. H. (1992). Testing in culturally heterogeneous populations: When are cultural loadings undesirable? European Journal of Psychological Assessment, 8, 17–24.
  • van de Vijver, F. J. R., & Poortinga, Y. H. (2005). Conceptual and methodological issues in adapting tests. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 39–64). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • van de Vijver, F. J. R., & Tanzer, N. K. (1997). Bias and equivalence in cross-cultural assessment: An overview. European Review of Applied Psychology, 47(4), 263–279.
  • Wolf, E. J., Harrington, K. M., Clark, S. L., & Miller, M. W. (2013). Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety. Educational and Psychological Measurement, 73(6), 913–934. doi:10.1177/0013164413495237.
