Abstract
International surveys are increasingly used to understand nonacademic outcomes such as math and science motivation and to inform education policy within countries. Such instruments assume that the measure works consistently across countries, ethnicities, and languages; that is, they assume measurement invariance. Although prior studies have shown that some items in international survey measures are noninvariant using basic group comparisons, they do not investigate complex, intersectional sources of bias that go beyond single group memberships. In this study, we use an emergent method to examine the sensitivity of item parameters from the Programme for International Student Assessment (PISA) survey instruments to intersectional sources of bias. Results indicate that noninvariance exists for most of the items examined and that accounting for the moderators can change individual scores. Although country-level rankings did not change substantively after accounting for bias, policymaking is likely to be affected negatively if such sources of noninvariance are not addressed.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 According to Finney and DiStefano (Citation2006) and Bandalos (Citation2014), the robust maximum likelihood (MLR) estimator can produce unbiased parameter estimates and standard errors for categorical dependent variables with at least 4–5 response categories and a large sample size, conditions that the PISA data satisfy.
2 After the second step, for convergence purposes we excluded language groups spoken by less than 5% of the sample.
3 The test language variable is shown as a single variable here for parsimony. In the analyses, each language is a separate dummy-coded variable.
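The dummy coding described in this note can be sketched as follows. This is a minimal illustration with hypothetical data, not the authors' actual preprocessing; it assumes pandas is available, and the column name `test_language` and the language labels are invented for the example.

```python
import pandas as pd

# Hypothetical student-level data: one row per student, with the test language.
df = pd.DataFrame({"test_language": ["English", "French", "English", "Spanish"]})

# One 0/1 indicator column per language (dummy coding). drop_first=True uses
# the first category (alphabetically, English here) as the reference group,
# avoiding perfect collinearity among the indicators.
dummies = pd.get_dummies(df["test_language"], prefix="lang", drop_first=True)
df = pd.concat([df, dummies], axis=1)
```

Each resulting `lang_*` column can then enter the model as an individual predictor, as the note describes.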
4 The practical significance of our results was not sensitive to this dichotomization.
5 The descriptive statistics indicated that several languages were spoken by less than 5% of the sample, so we excluded those language groups from the later noninvariance estimation. The languages remaining in our model were Chinese, English, French, Russian, Portuguese, Turkish, Thai, and Spanish.
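The 5% exclusion rule in this note can be sketched as below. The data are hypothetical and the 5% threshold is the one stated in the note; this is an illustrative filter, not the authors' code.

```python
import pandas as pd

# Hypothetical home-language labels for a sample of 100 students.
langs = pd.Series(["English"] * 60 + ["French"] * 30 + ["Thai"] * 6 + ["Welsh"] * 4)

# Proportion of the sample speaking each language.
shares = langs.value_counts(normalize=True)

# Retain only language groups spoken by at least 5% of the sample.
kept = shares[shares >= 0.05].index.tolist()
filtered = langs[langs.isin(kept)]
```

Here Welsh (4% of the sample) would be dropped before the noninvariance estimation, while English, French, and Thai would remain.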
6 Given the use of a categorical maximum likelihood estimator, we focus on relative fit statistics like the AIC and BIC.
7 Because the standard errors are very small and provide little additional information, we omitted them from the figures for parsimony.