ABSTRACT
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We conducted a simulation study to examine how the three approaches affect the psychometric properties of student achievement estimates, with an emphasis on person fit. We found that although TDS is not specifically designed to identify discrepant or misfitting score profiles, it can improve person fit for students who also fall within a critical range of achievement. Our findings also suggest that TDS alone is not sufficient to provide a comprehensive picture of measurement quality because it does not directly consider person fit. We therefore suggest that researchers and practitioners consider combining TDS with other approaches to score resolution, and we discuss the implications of our findings for research and practice.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/10627197.2024.2356745
Notes
1 Chance-corrected consistency statistics (weighted kappa; J. Cohen, 1968) for student classifications as “flagged” or “not flagged” between each pair of methods are presented in Table A7 in the Online Supplement. Recognizing that our classification analysis differs from the typical rater agreement analyses for which chance-corrected statistics were designed, as well as the limitations of chance-corrected agreement analyses (Brennan & Prediger, 1981; Uebersax, 2009), we do not discuss these results in detail here.
2 Chance-corrected versions of these results are presented in Figure A1 in the Online Supplement.