Abstract
The purpose of the current study was to examine the impact of item parameter drift (IPD) occurring in context questionnaires from an international large-scale assessment and to determine the most appropriate way to address it. Focusing on psychometric and educational research contexts in which scores from context questionnaires composed of polytomous items are used to classify examinees, the study investigated the impact of IPD on the estimation of questionnaire scores and on classification accuracy under five manipulated factors: the length of the questionnaire, the proportion of items exhibiting IPD, the direction of IPD, the magnitude of IPD, and the decision made about IPD (three options). The results indicated that IPD occurring in a short context questionnaire had a substantial impact on the accuracy of score estimation and of examinee classification. Classification accuracy decreased considerably, especially at the lowest and highest categories of the trait. Contrary to recommendations in the educational testing literature, the study demonstrated that keeping items exhibiting IPD, and removing them only for the transformation, was appropriate when IPD occurred in relatively short context questionnaires. An applied example using 2011 TIMSS data from Iran demonstrated how the guidance provided can be used to make appropriate decisions about IPD.
Notes
1 The step parameter values were obtained from the item location parameters (δ) and threshold parameters (τ) in Martin and Mullis (Citation2012), following the formula in De Ayala (Citation2009): δ_ij = δ_i + τ_ij.
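The derivation in note 1 can be sketched as follows. The parameter values below are made up for illustration; they are not the TIMSS values reported in Martin and Mullis (2012), and `step_parameters` is a hypothetical helper, not code from the study.

```python
# Hypothetical sketch: deriving step parameters for a polytomous item from its
# location parameter (delta) and threshold parameters (tau), following the
# partial credit parameterization step_k = delta + tau_k.
def step_parameters(delta, taus):
    """Return the step parameter delta + tau_k for each threshold tau_k."""
    return [delta + tau for tau in taus]

# Illustrative item: location -0.20 with three thresholds
steps = step_parameters(-0.20, [-0.95, 0.10, 0.85])
print([round(s, 2) for s in steps])  # [-1.15, -0.1, 0.65]
```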
2 Context questionnaire scores estimated through IRT were linearly converted to a more convenient reporting metric with a mean of 10 and a standard deviation of 2 (see Martin et al. [Citation2014] for the linear conversion and Martin and Mullis [Citation2012] for the cut scores).
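The linear conversion in note 2 can be sketched as below. This is a minimal illustration that standardizes against the sample's own mean and standard deviation; the theta values are invented, and in operational reporting the transformation constants would come from the calibration sample rather than being re-estimated from each data set.

```python
# Minimal sketch of a linear conversion of IRT trait estimates (theta) to a
# reporting metric with mean 10 and standard deviation 2.
import statistics

def to_reporting_metric(thetas, target_mean=10.0, target_sd=2.0):
    """Linearly rescale theta estimates to the target mean and SD."""
    mean = statistics.mean(thetas)
    sd = statistics.stdev(thetas)
    return [target_mean + target_sd * (t - mean) / sd for t in thetas]

# Illustrative theta estimates
scores = to_reporting_metric([-1.2, -0.3, 0.0, 0.4, 1.1])
```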
3 Item parameters at cycle 4, which mirrored the 2011 TIMSS results, were employed as the reference values for the manipulation of IPD because TIMSS began using IRT estimation in the 2011 cycle. Thus, IPD was manipulated backwards from cycle 4, and no IPD occurred at cycle 4 (see Tables 1 and 2).
4 Cycle 1 was used as the reference year to link the other cycles; thus, linking was not necessary for cycle 1.
5 Under-classification means that respondents in category 3 under the no-IPD conditions were assigned to category 2 under the IPD conditions, or that those in category 2 under the no-IPD conditions were assigned to category 1 under the IPD conditions.
6 Over-classification means that respondents in category 1 under the no-IPD conditions were assigned to category 2 under the IPD conditions, or that those in category 2 under the no-IPD conditions were assigned to category 3 under the IPD conditions.
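The counting implied by notes 5 and 6 can be sketched as follows. This hypothetical helper treats any downward category shift as under-classification and any upward shift as over-classification, a slight generalization of the adjacent-category shifts listed in the notes; the category vectors are invented for illustration.

```python
# Hypothetical sketch: counting under- and over-classification by comparing
# each respondent's category (1-3) under the no-IPD and IPD conditions.
def classification_shifts(no_ipd_cats, ipd_cats):
    """Return (under, over): counts of downward and upward category shifts."""
    under = sum(1 for a, b in zip(no_ipd_cats, ipd_cats) if b < a)
    over = sum(1 for a, b in zip(no_ipd_cats, ipd_cats) if b > a)
    return under, over

# Five illustrative respondents: one shifted 3 -> 2, one shifted 2 -> 3
under, over = classification_shifts([3, 2, 2, 1, 3], [2, 2, 3, 1, 3])
print(under, over)  # 1 1
```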
7 The different misclassification patterns found in the simulation study may be due to the psychometric characteristics of the item manipulated for positive IPD (relatively low step parameter values).