Abstract
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, whether or not the examiner intends it to be. This argument has considerable implications for methodologies used to address the equivalence of multiple language versions of the same assessment, including in the context of international assessment, where cross-cultural fairness is a concern. We also argue that none of the available statistical or qualitative techniques is capable of teasing out the language variable and neutralising its potential effects on item difficulty and demands. Exploring the use of automated text analysis tools at the quality control stage may help to address some of these challenges.
Acknowledgements
This research was carried out under the auspices of a Doctor of Philosophy (DPhil) thesis programme at the University of Oxford in the UK. The authors would like to thank Professor Pauline Rea-Dickens for her invaluable comments on an earlier version of this manuscript.
Notes
1. Software freely available online: http://www.readingmaturity.com/rmm-web/main#/passage/28156.
4. While Modern Standard Arabic is relatively invariant across Arabic-speaking countries, spoken Arabic consists of different dialects which vary, sometimes dramatically, from country to country. In the following, the analysis of the questions makes reference to the Levantine dialect, that is, the Arabic spoken in Syria, Lebanon, Palestine and Jordan.
6. El Masri (2015) provides several additional examples of how language idiosyncrasies, such as differential familiarity with technical acronyms and differential interference between everyday and scientific language, can introduce bias in adapted versions of tests.