References
- Bridgeman, B. (2013). Human ratings and automated essay evaluation. In M. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 221–232). Routledge.
- Chen, L., Zechner, K., Yoon, S. Y., Evanini, K., Wang, X., Loukina, A., Tao, J., Davis, L., Lee, C. M., Ma, M., Mundkowsky, R., Lu, C., Leong, C. W., & Gyawali, B. (2018). Automated scoring of nonnative speech using the SpeechRaterSM v. 5.0 engine (Research Report No. RR-18-10). Educational Testing Service. https://doi.org/10.1002/ets2.12198
- Educational Testing Service. (2017). How the test is scored. https://www.ets.org/gre/revised_general/scores/how/
- Educational Testing Service. (2019). TOEFL iBT speaking section scoring guide. https://www.ets.org/s/toefl/pdf/toefl_speaking_rubrics.pdf
- Educational Testing Service. (2020a). TOEFL Research Insight Series Volume 2: TOEFL Research. https://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v2.pdf
- Educational Testing Service. (2020b). TOEFL Research Insight Series Volume 3: Reliability and comparability of TOEFL iBT scores. https://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
- Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27(3), 317–334. https://doi.org/10.1177/0265532210363144
- Harsch, C., & Martin, G. (2013). Comparing holistic and analytic scoring methods: Issues of validity and reliability. Assessment in Education: Principles, Policy & Practice, 20(3), 281–307. https://doi.org/10.1080/0969594X.2012.742422
- Higgins, D., Xi, X., Zechner, K., & Williamson, D. (2011). A three-stage approach to the automated scoring of spontaneous spoken responses. Computer Speech & Language, 25(2), 282–306. https://doi.org/10.1016/j.csl.2010.06.001
- Isaacs, T. (2018a). Shifting sands in second language pronunciation teaching and assessment research and practice. Language Assessment Quarterly, 15(3), 273–293. https://doi.org/10.1080/15434303.2018.1472264
- Isaacs, T. (2018b). Fully automated speaking assessment: Changes to proficiency testing and the role of pronunciation. In O. Kang, R. I. Thomson, & J. Murphy (Eds.), The Routledge Handbook of English Pronunciation (pp. 570–584). Routledge.
- LaFlair, G. T., & Settles, B. (2020). Duolingo English Test: Technical manual. Duolingo. https://duolingo-papers.s3.amazonaws.com/other/det-technical-manual-current.pdf
- Loukina, A., Lopez, M., Evanini, K., Suendermann-Oeft, D., & Zechner, K. (2015). Expert and crowdsourced annotation of pronunciation errors for automatic scoring systems. In Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association (pp. 2809–2813). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2015-591
- Loukina, A., & Yoon, S. Y. (2020). Scoring and filtering models for automated speech scoring. In K. Zechner & K. Evanini (Eds.), Automated speaking assessment: Using language technologies to score spontaneous speech (pp. 192–204). Routledge.
- Loukina, A., Zechner, K., Chen, L., & Heilman, M. (2015). Feature selection for automated speech scoring. In J. Tetreault, J. Burstein, & C. Leacock (Eds.), Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 12–19). Association for Computational Linguistics. https://doi.org/10.3115/v1/W15-06
- Luo, D., Gu, W., Luo, R., & Wang, L. (2016). Investigation of the effects of automatic scoring technology on human raters' performances in L2 speech proficiency assessment. In 10th International Symposium on Chinese Spoken Language Processing (pp. 1–5). https://doi.org/10.1109/ISCSLP.2016.7918378
- Madnani, N., Loukina, A., von Davier, A., Burstein, J., & Cahill, A. (2017). Building better open-source tools to support fairness in automated scoring. In D. Hovy, S. Spruit, M. Mitchell, E. Bender, M. Strube, & H. Wallach (Eds.), Proceedings of the First Workshop on Ethics in Natural Language Processing (pp. 41–52). Association for Computational Linguistics. https://aclanthology.org/W17-1605.pdf
- Papageorgiou, S., Tannenbaum, R. J., Bridgeman, B., & Cho, Y. (2015). The association between TOEFL iBT® test scores and the Common European Framework of Reference (CEFR) levels (Research Memorandum No. RM-15–06). Educational Testing Service. https://www.ets.org/Media/Research/pdf/RM-15-06.pdf
- Pearson. (2019). Pearson Test of English Academic: Automated scoring. https://assets.ctfassets.net/yqwtwibiobs4/018RxttvPWsMkkGIQJ5Gg3/6f410437ceb2c6f2762fbcdfa8a28e8c/2021_PTEA_White_Paper_Institutions_Automated_Scoring_White_Paper-May-2018.pdf
- Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). Harcourt Brace.
- Poonpon, K., & Jamieson, J. (2013). Developing analytic rating guides for TOEFL iBT's integrated speaking tasks (Research Report No. RR-13-13). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2013.tb02320.x
- Qian, Y., Lange, P., & Evanini, K. (2020). Summary and outlook on automated speech scoring. In K. Zechner & K. Evanini (Eds.), Automated speaking assessment: Using language technologies to score spontaneous speech (pp. 61–74). Routledge.
- Ramineni, C., Trapani, C. S., Williamson, D. M., Davey, T., & Bridgeman, B. (2012). Evaluation of the e-rater® scoring engine for the TOEFL® independent and integrated prompts (Research Report No. RR-12-06). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2012.tb02288.x
- Sawaki, Y., & Sinharay, S. (2013). Investigating the value of section scores for the TOEFL iBT test (Research Report No. RR-13-35). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2013.tb02342.x
- Sinharay, S., Puhan, G., & Haberman, S. J. (2011). An NCME instructional module on subscores. Educational Measurement: Issues and Practice, 30(3), 29–40. https://doi.org/10.1111/j.1745-3992.2011.00208.x
- Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Pearson.
- Williamson, D., Xi, X., & Breyer, J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13. https://doi.org/10.1111/j.1745-3992.2011.00223.x
- Xi, X. (2007). Evaluating analytic scoring for the TOEFL Academic Speaking Test (TAST) for operational use. Language Testing, 24(2), 251–286. https://doi.org/10.1177/0265532207076365
- Xi, X., Higgins, D., Zechner, K., & Williamson, D. (2008). Automated scoring of spontaneous speech using SpeechRaterSM v. 1.0 (Research Report No. RR-08-62). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2008.tb02148.x
- Xi, X., Higgins, D., Zechner, K., & Williamson, D. (2012). A comparison of two scoring methods for an automated speech scoring system. Language Testing, 29(3), 371–394. https://doi.org/10.1177/0265532211425673
- Xu, J., Brenchley, M., Jones, E., Pinnington, A., Benjamin, T., Knill, K., Seal-Coon, G., Robinson, M., & Geranpayeh, A. (2020). Linguaskill: Building a validity argument for the speaking test. Cambridge Assessment English. https://www.cambridgeenglish.org/Images/589637-linguaskill-building-a-validity-argument-for-the-speaking-test.pdf
- Yoon, S. Y., & Zechner, K. (2017). Combining human and automated scores for the improved assessment of non-native speech. Speech Communication, 93, 43–52. https://doi.org/10.1016/j.specom.2017.08.001
- Zechner, K. (2020). Summary and outlook on automated speech scoring. In K. Zechner & K. Evanini (Eds.), Automated speaking assessment: Using language technologies to score spontaneous speech (pp. 192–204). Routledge.
- Zechner, K., Chen, L., Davis, L., Evanini, K., Lee, C. M., Leong, C. W., Wang, X., & Yoon, S. Y. (2015). Automated scoring of speaking tasks in the Test of English-for-Teaching (TEFT™) (Research Report No. RR-15-31). Educational Testing Service. https://doi.org/10.1002/ets2.12080