ABSTRACT
Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13 empirical item difficulty modeling studies of reading comprehension tests. We define reading comprehension item response demands as reading passage variables (e.g., length, complexity), passage-by-item variables (e.g., degree of correspondence between item and text, type of information requested), and item stem and response option variables. We report on response demand variables that are related to item difficulty and illustrate how they can be used to manage item difficulty in construct-relevant ways so that empirical item difficulties fall within a targeted range (e.g., within the Proficient or another intended achievement level range on a test’s IRT scale).
Acknowledgments
The authors thank Peter Afflerbach, Jill Fitzgerald, and Kristin Morrison, and the editor and three reviewers for their advice on this study.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
1 Achievement (or Performance) level descriptors (ALDs, PLDs) describe the content area knowledge and skills that students in each level have demonstrated on a test and can be expected to demonstrate outside of the test. Common ALD/PLD labels are Advanced, Proficient, Basic, and Below Basic. Achievement level labels are likely to change as a result of scrutiny during diversity, equity, and inclusion efforts around the country.
2 We have not been able to procure Tinkelman, S. (1947). Difficulty prediction of test items. Bureau of Publications, Teachers College, Columbia University, New York. We would welcome help in locating it.
3 We use “passage” to refer to all sorts of reading stimuli that appear in current educational and credentialing assessment programs, such as literary and informational texts, literary nonfiction, functional texts like flyers or memos, paired passages, and so on. The studies that meet our inclusion criteria (see directly below) and that we review here happen to include only traditional reading passages.
4 We distinguish reading comprehension tests used in reading/language arts assessment from reading comprehension tests used in English language proficiency assessment. Our rationale is based on the difference between the tests’ target constructs: English language proficiency tests define increasing levels of competence in acquiring the English language, while reading/language arts tests define what all students at a specified grade should know and be able to do in the academic content area of reading/language arts (after ESEA Section 1111(b)(1)(F); see https://www2.ed.gov/documents/essa-act-of-1965.pdf).
5 We have not reviewed for this paper the empirical literature on item writers’ cognitive processing or their writing and decision-making processes; the observations here come from our own work in educational testing.