Abstract
The aim of this study was to improve the criterion-related interpretation of test scores from a text-based assessment of scientific reasoning competencies in higher education by evaluating factors that systematically affect item difficulty. To provide evidence about the specific demands that test items of varying difficulty place on pre-service teachers’ scientific reasoning competencies, we applied a generalised linear mixed model, which allows the impact of item features on the response observations to be estimated. The item features had been identified during a standard-setting process. Results indicate substantial predictive potential of one formal item feature (length of response options), two features based on cognitive demands (processing data from tables, processing abstract concepts) and one knowledge-based feature (specialist terms). The predictive potential of these item features accorded with the cognitive demands operationalised in our competence model. We therefore conclude that the findings support the validity of interpreting the test scores as measures of scientific reasoning competencies.
Acknowledgements
The authors wish to thank Ronny Scherer for participating in early discussions and recommending valuable literature, and Sven Liepertz for sharing his expertise with the lme4 package. Finally, we are grateful to the two anonymous reviewers, whose comments and suggestions significantly improved the quality of the article.