Educational Psychology
An International Journal of Experimental Educational Psychology
Volume 37, 2017 - Issue 1: Second Language Writing

Automated writing evaluation for formative assessment of second language writing: investigating the accuracy and usefulness of feedback as part of argument-based validation

Pages 8-25 | Received 14 Jun 2015, Accepted 22 Dec 2015, Published online: 01 Feb 2016
 

Abstract

An increasing number of studies on the use of tools for automated writing evaluation (AWE) in writing classrooms suggest growing interest in their potential for formative assessment. As with all assessments, these applications should be validated in terms of their intended interpretations and uses. A recent argument-based validation framework outlined inferences that require backing to support integration of one AWE tool, Criterion, into a college-level English as a Second Language (ESL) writing course. The present research appraised evidence for the assumptions underlying two inferences in this argument. In the first of two studies, we assessed evidence for the evaluation inference, which includes the assumption that Criterion provides students with accurate feedback. The second study focused on the utilisation inference involving the assumption that Criterion feedback is useful for students to make decisions about revisions. Results showed accuracy varied considerably across error types, as did students’ abilities to use Criterion feedback to correct written errors. The findings can inform discussion of whether and how to integrate the use of AWE into writing classrooms while raising important questions regarding standards for validation of AWE as formative assessment, Criterion developers’ approach to accuracy, and instructors’ assumptions about the underlying purposes of AWE-based writing activities.

Notes

1. Criterion provides holistic scores for essays based on prompts in the system's built-in prompt library. Holistic scores can also be obtained using prompts developed by the instructor, but these prompts must share certain features with Criterion's built-in prompts and must be created using a special feature within the system.

2. In an early paper about Criterion, Burstein et al. (2003) refer to a 90% precision rate used to evaluate algorithms addressing bigram errors or confusable words. We adopt the standard mentioned in Quinlan et al. because, given recent findings regarding Criterion feedback accuracy in classroom-based studies (e.g. Lavolette et al., 2015), we believe an 80% threshold to be more realistic.

3. Compound words emerged as a common error category in the frequency analysis, but a review of the raw data showed the vast majority of occurrences consisted of flagging of the word cannot, which Criterion analysed as having been spelled as two words when this was not the case.

4. Krippendorff's alpha is a statistical measure of the agreement achieved among annotators when coding a set of units of analysis. The annotation tool used in this study employs Krippendorff's α because this metric was specifically designed for content analysis applications and can support any number of annotators and categories, various metrics of distance between categories (nominal, ordinal, interval, etc.), and incomplete coding data (i.e. when not all coders have coded the entire data-set). Krippendorff's α differs from Cronbach's α in that the latter is a correlation-based consistency index that standardises annotators' values and measures only covariation. Both indices are reported here on the assumption that Cronbach's α will be more familiar to readers.
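For readers less familiar with the index, the nominal-level computation can be sketched as one minus the ratio of observed to expected disagreement over a coincidence matrix of value pairs within units. The following Python sketch is purely illustrative (it is not the annotation tool used in the study) and assumes complete data with at least two coders per unit:

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of tuples, one tuple per unit of analysis,
    each containing the category values assigned by the coders.
    Assumes every unit was coded by at least two coders.
    """
    # Build the coincidence matrix from all ordered pairs of
    # values within each unit, weighted by 1/(m_u - 1).
    coincidences = Counter()
    n = 0  # total number of pairable values
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit with a single coder contributes no pairs
        n += m
        for i, v in enumerate(values):
            for j, w in enumerate(values):
                if i != j:
                    coincidences[(v, w)] += 1 / (m - 1)
    # Marginal totals n_c for each category.
    marginals = Counter()
    for (v, _), count in coincidences.items():
        marginals[v] += count
    # Observed disagreement: off-diagonal coincidences.
    d_o = sum(c for (v, w), c in coincidences.items() if v != w) / n
    # Expected disagreement under chance pairing of values.
    d_e = sum(marginals[v] * marginals[w]
              for v in marginals for w in marginals if v != w) / (n * (n - 1))
    return 1.0 - d_o / d_e
```

For example, two coders rating four units with one disagreement, `[("a", "a"), ("a", "b"), ("b", "b"), ("b", "b")]`, yields 8/15 ≈ 0.533, while perfect agreement yields 1.0.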

5. Unlike the other error types eliciting generic feedback, Preposition Error had lower mental effort ratings, more in line with the error types providing specific feedback. This may be attributable to the Part 1 error-correction item involving a preposition error, which, as the error-correction results show, proved easy for most participants.
