
ITC Guidelines for Translating and Adapting Tests (Second Edition)

REFERENCES

  • Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185–198. doi:10.1111/j.1745-3984.1999.tb00553.x.
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
  • Angoff, W. H., & Modu, C. C. (1973). Equating the scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Research Rep. No. 3). New York, NY: College Entrance Examination Board.
  • Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438. doi:10.1080/10705510903008204.
  • Brislin, R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural psychology (pp. 137–164). Newbury Park, CA: Sage Publications.
  • Byrne, B. (2001). Structural equation modeling with AMOS, EQS, and LISREL: Comparative approaches to testing for the factorial validity of a measuring instrument. International Journal of Testing, 1, 55–86. doi:10.1207/S15327574IJT0101_4.
  • Byrne, B. (2003). Measuring self-concept across culture: Issues, caveats, and application. In H. W. Marsh, R. Craven, & D. M. McInerney (Eds.), International advances in self research (pp. 30–41). Greenwich, CT: Information Age Publishing.
  • Byrne, B. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Byrne, B. M. (2008). Testing for multigroup equivalence of a measuring instrument: A walk through the process. Psicothema, 20, 872–882.
  • Byrne, B. M., & van de Vijver, F. J. R. (2010). Testing for measurement and structural equivalence in large-scale cross-cultural studies: Addressing the issue of nonequivalence. International Journal of Testing, 10, 107–132. doi:10.1080/15305051003637306.
  • Byrne, B. M., & van de Vijver, F. J. R. (2014). Factorial structure of the Family Values Scale from a multilevel-multicultural perspective. International Journal of Testing, 14, 168–192. doi:10.1080/15305058.2013.870903.
  • Clauser, B. E., Nungester, R. J., Mazor, K., & Ripley, D. (1996). A comparison of alternative matching strategies for DIF detection in tests that are multidimensional. Journal of Educational Measurement, 33(2), 202–214. doi:10.1111/j.1745-3984.1996.tb00489.x.
  • Cook, L. L., & Schmitt-Cascallar, A. P. (2005). Establishing score comparability for tests given in different languages. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 139–170). Mahwah, NJ: Lawrence Erlbaum.
  • Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning: Theory and practice (pp. 137–166). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Ellis, B. B. (1989). Differential item functioning: Implications for test translation. Journal of Applied Psychology, 74, 912–921. doi:10.1037/0021-9010.74.6.912.
  • Ellis, B. B., & Kimmel, H. D. (1992). Identification of unique cultural response patterns by means of item response theory. Journal of Applied Psychology, 77, 177–184. doi:10.1037/0021-9010.77.2.177.
  • Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29(6), 543–553. doi:10.1016/S0883-0355(98)00047-0.
  • Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3), 199–215. doi:10.1207/S15327574IJT023&4_2.
  • Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17(3), 301–321. doi:10.1207/s15324818ame1703_4.
  • Ercikan, K., Simon, M., & Oliveri, M. E. (2013). Score comparability of multiple language versions of assessments within jurisdictions. In M. Simon, K. Ercikan, & M. Rousseau (Eds.), An international handbook for large-scale assessments (pp. 110–124). New York, NY: Routledge.
  • Grégoire, J., & Hambleton, R. K. (Eds.). (2009). Advances in test adaptation research [Special Issue]. International Journal of Testing, 9(2), 73–166. doi:10.1080/15305050902880678.
  • Grisay, A. (2003). Translation procedures in OECD/PISA 2000 international assessment. Language Testing, 20(2), 225–240. doi:10.1191/0265532203lt254oa.
  • Hambleton, R. K. (2002). The next generation of the ITC test translation and adaptation guidelines. European Journal of Psychological Assessment, 17(3), 164–172. doi:10.1027//1015-5759.17.3.164.
  • Hambleton, R. K. (2005). Issues, designs, and technical guidelines for adapting tests into multiple languages and cultures. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 3–38). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1(1), 1–16.
  • Hambleton, R. K., Clauser, B. E., Mazor, K. M., & Jones, R. W. (1993). Advances in the detection of differentially functioning test items. European Journal of Psychological Assessment, 9(1), 1–18.
  • Hambleton, R. K., & Lee, M. (2013). Methods of translating and adapting tests to increase cross-language validity. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), The Oxford handbook of child psychological assessment (pp. 172–181). New York, NY: Oxford University Press.
  • Hambleton, R. K., Merenda, P. F., & Spielberger, C. (Eds.). (2005). Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
  • Hambleton, R. K., Yu, L., & Slater, S. C. (1999). Field-test of ITC guidelines for adapting psychological tests. European Journal of Psychological Assessment, 15(3), 270–276. doi:10.1027//1015-5759.15.3.270.
  • Hambleton, R. K., & Zenisky, A. (2010). Translating and adapting tests for cross-cultural assessment. In D. Matsumoto & F. van de Vijver (Eds.), Cross-cultural research methods (pp. 46–74). New York, NY: Cambridge University Press.
  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Hulin, C. L., Lissak, R. I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249–260. doi:10.1177/014662168200600301.
  • Javaras, K. N., & Ripley, B. D. (2007). An ‘unfolding’ latent variable model for Likert attitude data: Drawing inferences adjusted for response style. Journal of the American Statistical Association, 102, 454–463. doi:10.1198/016214506000000960.
  • Jeanrie, C., & Bertrand, R. (1999). Translating tests with the International Test Commission Guidelines: Keeping validity in mind. European Journal of Psychological Assessment, 15(3), 277–283. doi:10.1027//1015-5759.15.3.277.
  • Johnson, T. R. (2003). On the use of heterogeneous thresholds ordinal regression models to account for individual differences in response style. Psychometrika, 68, 563–583. doi:10.1007/BF02295612.
  • Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer.
  • Levin, K., Willis, G. B., Forsyth, B. H., Norberg, A., Stapleton Kudela, M., Stark, D., & Thompson, F. E. (2009). Using cognitive interviews to evaluate the Spanish-language translation of a dietary questionnaire. Survey Research Methods, 3(1), 13–25.
  • Li, Y., Cohen, A. S., & Ibarra, R. A. (2004). Characteristics of mathematics items associated with gender DIF. International Journal of Testing, 4(2), 115–135. doi:10.1207/s15327574ijt0402_2.
  • Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52(2), 443–451. doi:10.1177/0013164492052002020.
  • McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Erlbaum.
  • Muñiz, J., Elosua, P., & Hambleton, R. K. (2013). Directrices para la traducción y adaptación de los tests: Segunda edición [Guidelines for the translation and adaptation of tests: Second edition]. Psicothema, 25(2), 149–155.
  • Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115–135. doi:10.1207/S15327574IJT0102_2.
  • Oort, F. J., & Berberoğlu, G. (1992). Using restricted factor analysis with binary data for item bias detection and item analysis. In T. J. Plomp, J. M. Pieters, & A. Feteris (Eds.), European Conference on Educational Research: Book of Summaries (pp. 708–710). Twente, The Netherlands: University of Twente, Department of Education.
  • Park, H., Pearson, P. D., & Reckase, M. D. (2005). Assessing the effect of cohort, gender, and race on DIF in an adaptive test designed for multi-age groups. Reading Psychology, 26, 81–101. doi:10.1080/02702710590923805.
  • Rios, J., & Sireci, S. (2014). Guidelines versus practices in cross-lingual assessment: A disconcerting disconnect. International Journal of Testing, 14(4), 289–312. doi:10.1080/15305058.2014.924006.
  • Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17(2), 105–116. doi:10.1177/014662169301700201.
  • Rotter, J. B., & Rafferty, J. E. (1950). Manual: The Rotter Incomplete Sentences Blank: College Form. New York, NY: Psychological Corporation.
  • Scheuneman, J. D., & Grima, A. (1997). Characteristics of quantitative word items associated with differential performance for female and Black examinees. Applied Measurement in Education, 10(4), 299–319. doi:10.1207/s15324818ame1004_1.
  • Sireci, S. G. (1997). Problems and issues in linking tests across languages. Educational Measurement: Issues and Practice, 16, 12–19. doi:10.1111/j.1745-3992.1997.tb00581.x.
  • Sireci, S. G. (2005). Using bilinguals to evaluate the comparability of different language versions of a test. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 117–138). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Sireci, S. G., & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148–166. doi:10.1191/0265532203lt249oa.
  • Sireci, S. G., & Berberoğlu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248. doi:10.1207/S15324818AME1303_1.
  • Sireci, S. G., Patsula, L., & Hambleton, R. K. (2005). Statistical methods for identifying flaws in the test adaptation process. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 93–116). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Sireci, S. G., Harter, J., Yang, Y., & Bhola, D. (2003). Evaluating the equivalence of an employee attitude survey across languages, cultures, and administration formats. International Journal of Testing, 3(2), 129–150. doi:10.1207/S15327574IJT0302_3.
  • Sireci, S. G., & Wells, C. S. (2010). Evaluating the comparability of English and Spanish video accommodations for English language learners. In P. Winter (Ed.), Evaluating the comparability of scores from achievement test variations (pp. 33–68). Washington, DC: Council of Chief State School Officers.
  • Solano-Flores, G., Trumbull, E., & Nelson-Barber, S. (2002). Concurrent development of dual language assessments: An alternative to translating tests for linguistic minorities. International Journal of Testing, 2(2), 107–129. doi:10.1207/S15327574IJT0202_2.
  • Lee, S. (2017). Detecting differential item functioning using the logistic regression procedure in small samples. Applied Psychological Measurement, 41(1), 30–43. doi:10.1177/0146621616668015.
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370. doi:10.1111/j.1745-3984.1990.tb00754.x.
  • Tanzer, N. K., & Sim, C. O. E. (1999). Adapting instruments for use in multiple languages and cultures: A review of the ITC Guidelines for Test Adaptation. European Journal of Psychological Assessment, 15, 258–269. doi:10.1027//1015-5759.15.3.258.
  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–169). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning: Theory and practice (pp. 67–113). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • van de Vijver, F. J. R., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89–99. doi:10.1027/1016-9040.1.2.89.
  • van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage Publications.
  • van de Vijver, F. J. R., & Poortinga, Y. H. (1991). Testing across cultures. In R. K. Hambleton & J. Zaal (Eds.), Advances in educational and psychological testing (pp. 277–308). Dordrecht, The Netherlands: Kluwer Academic Publishers.
  • van de Vijver, F. J. R., & Poortinga, Y. H. (1992). Testing in culturally heterogeneous populations: When are cultural loadings undesirable? European Journal of Psychological Assessment, 8, 17–24.
  • van de Vijver, F. J. R., & Poortinga, Y. H. (2005). Conceptual and methodological issues in adapting tests. In R. K. Hambleton, P. F. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 39–64). Mahwah, NJ: Lawrence Erlbaum Publishers.
  • van de Vijver, F. J. R., & Tanzer, N. K. (1997). Bias and equivalence in cross-cultural assessment: An overview. European Review of Applied Psychology, 47(4), 263–279.
  • Wolf, E. J., Harrington, K. M., Clark, S. L., & Miller, M. W. (2013). Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety. Educational and Psychological Measurement, 73(6), 913–934. doi:10.1177/0013164413495237.
