A Qualitative Analysis of Rater Behavior on an L2 Speaking Assessment

 

Abstract

Human raters are normally involved in L2 performance assessment; as a result, rater behavior has been widely investigated to reduce rater effects on test scores and to provide validity arguments. Yet raters’ cognition and use of rubrics in their actual rating have rarely been explored qualitatively in L2 speaking assessments. In this study, three rater groups (novice, developing, and expert) were first operationalized on the basis of four background variables (rating experience, teaching experience, rater training, and educational background) to predict different levels of expertise in rating. The three groups of raters then evaluated 18 ESL learners’ oral responses using an analytic scoring rubric on three occasions, separated by one-month intervals. Recorded verbal report data were analyzed (a) to compare rater behavior across the three groups and (b) to examine the development of rating performance within each group over time. The analysis revealed that the three groups of raters from different backgrounds showed varying levels of rating ability and improved their rating performance at different paces. The findings of the study suggest that a comprehensive consideration of rater characteristics contributes to a better understanding of raters’ different needs for training and rating.

ACKNOWLEDGMENTS

The author thanks James Purpura, Hansun Waring, and Kirby Grabowski, who read the previous version of this manuscript. The author also thanks the three anonymous reviewers of Language Assessment Quarterly, who provided insightful comments.

Notes

1 Originally, Lumley (2005) used the term “educational background” (i.e., postgraduate qualifications in Applied Linguistics and/or ESL) as a criterion of rater selection. Instead, the current study used the term “coursework” to further differentiate the degree of educational background of the raters who were MA students or recent graduates of the TESOL and Applied Linguistics programs.

2 The original scoring rubric used in the language program to score the speaking placement test had not been developed on the basis of a solid theoretical grounding. Therefore, the rubric was revised, deriving the five components from Purpura’s (2004) definition of language knowledge, and was pilot-tested for the current study.
