Assessing L2 English speaking using automated scoring technology: examining automarker reliability

Jing Xu, Edmund Jones, Victoria Laxton & Evelina Galaczi
Pages 411-436 | Received 31 Jul 2020, Accepted 31 Aug 2021, Published online: 28 Sep 2021
 

ABSTRACT

Recent advances in machine learning have made automated scoring of learner speech widespread, and yet validation research that provides support for applying automated scoring technology to assessment is still in its infancy. Both the educational measurement and language assessment communities have called for greater transparency in describing scoring algorithms and research evidence about the reliability of automated scoring. This paper reports on a study that investigated the reliability of an automarker using candidate responses produced in an online oral English test. Based on ‘limits of agreement’ and multi-faceted Rasch analyses of automarker scores and individual examiner scores, the study found that the automarker, while exhibiting excellent internal consistency, was slightly more lenient than examiner fair average scores, particularly for low-proficiency speakers. Additionally, it was found that an automarker uncertainty measure termed Language Quality, which indicates the confidence of speech recognition, was useful for predicting automarker reliability and flagging abnormal speech.
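For readers unfamiliar with the 'limits of agreement' approach mentioned above, the following is a minimal sketch of the standard Bland-Altman calculation (mean difference plus or minus 1.96 standard deviations of the paired differences). It is not the authors' implementation, and the study's exact procedure may differ. The function name `limits_of_agreement` and the score values are hypothetical and are used only for illustration.

```python
from statistics import mean, stdev


def limits_of_agreement(auto_scores, examiner_scores, z=1.96):
    """Return (bias, lower, upper) for paired scores.

    bias  : mean of (automarker - examiner) differences
    lower : bias - z * SD of the differences
    upper : bias + z * SD of the differences
    A positive bias means the automarker scored higher (more leniently).
    """
    if len(auto_scores) != len(examiner_scores):
        raise ValueError("score lists must have the same length")
    if len(auto_scores) < 2:
        raise ValueError("at least two paired scores are required")

    # Paired differences, one per candidate response
    diffs = [a - e for a, e in zip(auto_scores, examiner_scores)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, bias - z * sd, bias + z * sd


# Made-up scores for five candidates (illustrative only, not study data)
auto = [3.5, 4.0, 2.5, 5.0, 3.0]
exam = [3.0, 4.0, 2.0, 5.0, 3.0]
bias, lower, upper = limits_of_agreement(auto, exam)
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f})")
```

Under this convention, a bias above zero with narrow limits would indicate an automarker that is consistently a little more lenient than the examiners, while wide limits would indicate disagreement that varies considerably from one candidate to the next.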

Acknowledgments

We would like to thank Mark Gales, Kate Knill, Trevor Benjamin, Fumiyo Nakatsuhara, Vivien Berry, and the anonymous reviewers for their stimulating and helpful comments on the manuscript. Grateful acknowledgements are extended to Annabelle Pinnington, David Dursun, Martin Robinson, Bronagh Rolph, Kevin Cheung, Ardeshir Geranpayeh, Jamalddin Kasim, and John Savage, for their advice and assistance.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Jing Xu

Jing Xu is a Principal Research Manager at Cambridge Assessment English, University of Cambridge. His research focuses on technology-enhanced language learning and assessment and validity theory. He was a recipient of the Jacqueline A. Ross Dissertation Award in 2017 and a co-recipient of the ILTA Best Article Award in 2010. He has published in academic journals including Language Assessment Quarterly, Language Testing, CALICO and Assessment in Education. He is the co-editor of Language test validation in a digital age (Cambridge University Press) and Towards adaptive CALL: Natural language processing for diagnostic language assessment (Iowa State University). He holds a PhD in Applied Linguistics and Technology from Iowa State University, USA.

Edmund Jones

Edmund Jones is a Senior Research Manager at Cambridge Assessment English. He works on research related to computer-based tests, automated assessment, and detection of malpractice. He provides expertise in quantitative research methods for projects for governmental education ministries. He previously worked on statistical modelling of cardiovascular disease and holds a PhD in Computational Statistics from Bristol University, UK.

Victoria Laxton

Victoria Laxton is currently a postdoctoral researcher in the Department of Psychology of the University of Derby. She was a research analyst at Cambridge Assessment English in 2020, where she worked on a variety of English assessment projects. Her interests stem from the psychology of education, with a focus on cognitive functions such as visual perception. She holds a PhD in Psychology from Nottingham Trent University, UK.

Evelina Galaczi

Evelina Galaczi is Head of Research Strategy at Cambridge Assessment English, where she leads a research team of experts in language learning, teaching and assessment. She has worked in language education for over 25 years as a teacher, teacher trainer, materials writer, researcher and assessment specialist. Her expertise lies in second language assessment and learning, with a specific focus on speaking assessment, interactional competence, test development and the use of technologies in learning and assessment. She has presented worldwide and published in academic journals including Applied Linguistics, Language Assessment Quarterly, Language Testing and Assessment in Education. She is also the co-author/editor of Exploring language frameworks and Measured constructs (both Cambridge University Press) and a co-author of chapters in The Routledge Handbook of Second Language Acquisition and Language Testing and The Routledge Handbook of Language Testing, 2nd edition. She holds Master’s and Doctorate degrees in Applied Linguistics from Columbia University, USA.
