Abstract
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English-speaking and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in English and Spanish. We compared the mean scores given by raters of different language backgrounds. Using generalizability theory, we also examined the amount of score variation due to student (the object of measurement) and to four sources of measurement error: item, language of testing, rater language background, and rater nested in rater language background. We observed a small but statistically significant difference between the mean scores given by raters of different language backgrounds, and negligible score variation due to the main and interaction effects of rater. Provided that they are certified bilingual teachers, raters of different language backgrounds can score ELL responses to short-answer, open-ended items with comparable reliability, regardless of the language of testing.
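To make the role of these facets concrete, the sketch below shows how estimated variance components feed into relative and absolute error variance and into the generalizability (G) and dependability (Phi) coefficients. It assumes a fully random student × item × language × (rater:background) design; the effect labels, the functions `error_variances` and `g_and_phi`, and all numbers except the two backgrounds with four raters each are hypothetical illustrations, not the study's estimates or analysis code.

```python
from math import prod

# Hypothetical effect labels: "s" = student (object of measurement),
# "i" = item, "l" = language of testing, "b" = rater language background,
# "r" = rater. Raters are nested in background (r:b), so any effect that
# involves raters is keyed with both "r" and "b".


def error_variances(components, n):
    """Return (relative, absolute) error variance for a D study.

    components: dict mapping a frozenset of effect labels to an estimated
        variance component, e.g. frozenset({"s", "i"}) for s x i.
    n: dict of D-study sample sizes per facet; n["r"] is the number of
        raters *within each* language background.
    """
    relative = 0.0
    absolute = 0.0
    for effect, variance in components.items():
        facets = effect - {"s"}
        if not facets:
            continue  # universe-score variance is not error
        # Each component is averaged over the facet conditions it involves.
        share = variance / prod(n[f] for f in facets)
        absolute += share          # absolute error: every non-s component
        if "s" in effect:
            relative += share      # relative error: interactions with s only
    return relative, absolute


def g_and_phi(components, n):
    """Generalizability (relative) and dependability (absolute) coefficients."""
    universe = components.get(frozenset({"s"}), 0.0)
    relative, absolute = error_variances(components, n)
    return universe / (universe + relative), universe / (universe + absolute)


# Illustrative variance components only (NOT estimates from this study);
# components left out of the dict are treated as zero.
components = {
    frozenset({"s"}): 0.30,
    frozenset({"i"}): 0.05,
    frozenset({"s", "i"}): 0.20,
    frozenset({"s", "i", "l", "b", "r"}): 0.10,  # highest-order term + residual
}
n = {"i": 4, "l": 2, "b": 2, "r": 4}  # n["i"] is hypothetical; 2 backgrounds x 4 raters

g, phi = g_and_phi(components, n)
print(f"G = {g:.3f}, Phi = {phi:.3f}")
```

Because rater-related components are divided by the number of raters and backgrounds, small rater variance components, like those reported in the abstract, contribute little to either error term.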
Acknowledgments
This investigation was supported by the Leadership Education for Advancement and Promotion (LEAP) program of the University of Colorado at Boulder, Project Number 1544472. We are grateful to Carole Capsalis for her support and to Min Li for her expert advice. We also thank Chao Wang and Khanh Nguyen-Le for their support during data collection, and the teachers who enthusiastically participated in the study. The opinions expressed in this report are those of the authors and do not reflect the opinions of the funding program or our colleagues.
Notes
1 The National Assessment of Educational Progress glossary of terms defines English language learners as “students who are in the process of acquiring English language skills and knowledge” (National Center for Education Statistics, 2009).
2 We thank Derek Briggs for his comments on our work, which motivated us to examine the reasons for the small score variation due to rater observed in our previous studies.
3 “Bilingual teacher” should not be confused with “teacher of English as a foreign language.” These two types of professionals have very different profiles, as discussed later.
4 Seven of the teachers had completed certification before our project started. The eighth was within six months of completing certification when the project began.
5 We recognize that occasion is a hidden facet (see Cronbach, Linn, Brennan, & Haertel, 1997), in this case nested within language of testing; however, this condition was unavoidable.