Variability in ESL Essay Rating Processes: The Role of the Rating Scale and Rater Experience

Pages 54-74 | Published online: 19 Feb 2010
 

Abstract

Various factors contribute to variability in English as a second language (ESL) essay scores and rating processes. Most previous research, however, has focused on score variability in relation to task, rater, and essay characteristics. A few studies have examined variability in essay rating processes. The current study used think-aloud protocols to examine the roles of rating scales, rater experience, and interactions between them in variability in raters' decision-making processes and the aspects of writing they attend to when reading and rating ESL essays. The study included 11 novice and 14 experienced raters, who each rated 12 ESL essays, both holistically and analytically, while thinking aloud. The findings indicated that rating scale type had larger effects on the participants' rating processes than did rater experience. With holistic scoring, raters tended to refer more often to the essay (the focus of the assessment), whereas with analytic scoring they tended to refer to the rating scale (the source of evaluation criteria) more frequently; analytic scoring drew raters' attention to all evaluation criteria in the rating scale, and novices were influenced by variation in rating scales more than were the experienced raters. The article concludes with implications for essay rating practices and research.

ACKNOWLEDGMENTS

This research was partially supported by a grant from Educational Testing Service (TOEFL Small Grant for Doctoral Research in Second or Foreign Language Assessment, 2006). An earlier version of this paper was presented at the AAAL conference, March 2009, Denver, CO. I would like to thank the raters who participated in this study, as well as Alister Cumming, Merrill Swain, Liz Hamp-Lyons, and two anonymous Language Assessment Quarterly reviewers for their comments on earlier versions of this article.

Notes

1 Raters' strategies for controlling their own evaluation behavior (e.g., define, assess, and revise own rating criteria; summarize own rating judgment collectively).

2 Cumming et al. (2001, 2002) developed three frameworks based on data from different types of tasks and both ESL and English teachers.

3 The Wilcoxon signed-ranks test is a nonparametric equivalent of the dependent t test for comparing two related samples, whereas the Mann-Whitney test is a nonparametric equivalent of the independent t test for comparing two independent groups.

4 In other words, for each rater the proportions of decision-making behaviors and aspects of writing attended to were compared across rating scales. The unit of analysis was the think-aloud protocol and sample size was the number of protocols per rater per rating scale (i.e., n = 12 protocols per rater per rating scale, except for raters with missing data).
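
As a rough illustration of the two tests named in Note 3, the sketch below computes the Wilcoxon signed-ranks statistic for paired samples (such as one rater's 12 protocols under each rating scale, as in Note 4) and the Mann-Whitney U statistic for two independent groups (such as novice and experienced raters). It is written in plain Python. The function names and all numbers are hypothetical and are not taken from the study. The code returns test statistics only, omits p values, and is not presented as the analysis procedure actually used.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Paired samples: rank |x - y| (zero differences dropped) and
    return the smaller of the positive and negative rank sums."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


def mann_whitney_u(group_a, group_b):
    """Independent groups: rank the pooled values and return the smaller U."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


# Hypothetical data: percentage of think-aloud statements that refer to the
# rating scale, for one rater's 12 protocols under each scale (paired by essay).
holistic = [20, 25, 18, 30, 22, 27, 19, 24, 21, 26, 23, 28]
analytic = [35, 33, 40, 38, 30, 45, 29, 37, 36, 41, 34, 39]
print("Wilcoxon W:", wilcoxon_signed_rank(holistic, analytic))

# Hypothetical per-rater mean percentages for two independent groups.
novice = [31, 35, 29, 40, 33]
experienced = [27, 30, 26, 34, 28, 32]
print("Mann-Whitney U:", mann_whitney_u(novice, experienced))
```

The paired function uses the within-essay differences, which is why it suits comparisons across rating scales for the same rater, whereas the independent-groups function pools and ranks all values, which suits comparisons between rater groups.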
