
Speech Technologies and the Assessment of Second Language Speaking: Approaches, Challenges, and Opportunities


References

  • Ai, H., & Litman, D. J. (2008). Assessing dialog system user simulation evaluation measures using human judges. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, 622–629. Columbus, Ohio, USA: Association for Computational Linguistics.
  • Bernstein, J. (1999). PhonePass testing: Structure and construct. Menlo Park, CA: Ordinate.
  • Bernstein, J. (2012). Computer scoring of spoken responses. In C. A. Chapelle (Ed.), Encyclopedia of applied linguistics (pp. 857–863). New York, NY, USA: Wiley.
  • Bernstein, J., & Cheng, J. (2007). Logic, operation and validation of a spoken English test. In V. M. Holland & F. P. Fisher (Eds.), Speech technologies for language learning (pp. 174–194). New York, NY, USA: Routledge.
  • Butler, F. A., Eignor, D., Jones, S., McNamara, T., & Suomi, B. K. (2000). TOEFL 2000 speaking framework: A working paper. TOEFL Monograph Series MS-20. Princeton, NJ: Educational Testing Service.
  • Cabral, C., Campbell, N., Ganesh, S., Gilmartin, E., Haider, F., Kenny, E., … Orosko, O. R. (2014). MILLA – A multimodal interactive language agent. Edinburgh, United Kingdom: eNTERFACE, Software.
  • Chapelle, C. A., & Chung, Y. R. (2010). The promise of NLP and speech processing technologies in language assessment. Language Testing, 27(3), 301–315. doi:10.1177/0265532210364405
  • Cheng, J., Chen, X., & Metallinou, A. (2015). Deep neural network acoustic models for spoken assessment applications. Speech Communication, 73, 14–27. doi:10.1016/j.specom.2015.07.006
  • Chun, C. (2006). Commentary: An analysis of a language test for employment: The authenticity of the PhonePass test. Language Assessment Quarterly, 3(3), 295–306. doi:10.1207/s15434311laq0303_4
  • Coniam, D. (1999). Voice recognition software accuracy with second language speakers of English. System, 27(1), 49–64. doi:10.1016/S0346-251X(98)00049-9
  • Cook, K., McGhee, J., & Lonsdale, D. (2011). Elicited imitation for prediction of OPI test scores. Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications, 30–37. Portland, Oregon, USA: Association for Computational Linguistics.
  • Cuayáhuitl, H., Dethlefs, N., Hastie, H., & Lemon, O. (2013). Impact of ASR N-best information on Bayesian dialogue act recognition. Proceedings of SIGDIAL. Metz, France: Association for Computational Linguistics.
  • Cucchiarini, C., Strik, H., & Boves, L. (1998). Automatic pronunciation grading for Dutch. Proceedings of the ESCA Workshop on Speech Technology in Language Learning, 95–98. Marholmen, Sweden: ESCA.
  • Cucchiarini, C., Strik, H., & Boves, L. (2000a). Different aspects of expert pronunciation quality ratings and their relation to scores produced by speech recognition algorithms. Speech Communication, 30, 109–119. doi:10.1016/S0167-6393(99)00040-0
  • Cucchiarini, C., Strik, H., & Boves, L. (2000b). Quantitative assessment of second language learners’ fluency by means of automatic speech recognition technology. Journal of the Acoustical Society of America, 107(2), 989–999. doi:10.1121/1.428279
  • Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech. Journal of the Acoustical Society of America, 111(6), 2862–2873. doi:10.1121/1.1471894
  • Cucchiarini, C., Van Doremalen, J., & Strik, H. (2010). Fluency in non-native read and spontaneous speech. Proceedings of the DiSS-LPSS Joint Workshop 2010, 15–18. Tokyo, Japan: University of Tokyo.
  • De Jong, J. H. A. L., Lennig, M., Kerkhoff, A., & Poelmans, P. (2009). Development of a test of spoken Dutch for prospective immigrants. Language Assessment Quarterly, 6(1), 41–60. doi:10.1080/15434300802606564
  • De Jong, N. (this issue). Fluency in second language testing: Insights from different disciplines. Language Assessment Quarterly.
  • De Wet, F., Van Der Walt, C., & Niesler, T. R. (2009). Automatic assessment of oral language proficiency and listening comprehension. Speech Communication, 51, 864–874. doi:10.1016/j.specom.2009.03.002
  • Delcloque, P. (Ed.) (1999). Progressing interface transparency: Speech applications in computer assisted language learning. Proceedings of ‘Integrating Speech Technology in Learning’ (InSTIL), Besançon, France.
  • Delcloque, P. (Ed.) (2000). Speech technology in language learning and the assistive interface. Proceedings of ‘Integrating Speech Technology in Learning’ (InSTIL), Dundee, Scotland.
  • Delmonte, R. (2004). InSTIL/ICALL Symposium ‘NLP and Speech Technologies in Advanced Language Learning Systems’, Venice, Italy. http://project.cgm.unive.it/events/ICALL2004/index.htm
  • Derwing, T. M., Munro, M. J., & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34(3), 592–603. doi:10.2307/3587748
  • Downey, R., Farhady, H., Present-Thomas, R., Suzuki, M., & Van Moere, A. (2008). Evaluation of the usefulness of the Versant for English test: A response. Language Assessment Quarterly, 5(2), 160–167. doi:10.1080/15434300801934744
  • Dzikovska, M. O., Nielsen, R. D., & Brew, C. (2012). Towards effective tutorial feedback for explanation questions: A dataset and baselines. Proceedings Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT ‘12), 200–210. Montreal, Canada: Association for Computational Linguistics.
  • Eskenazi, M. (2009). An overview of spoken language technology for education. Speech Communication, 51(10). doi:10.1016/j.specom.2009.04.005
  • Evanini, K., So, Y., Tao, J., Zapata-Rivera, D., Luce, C., Battistini, L., & Wang, X. (2014). Performance of a trialogue-based prototype system for English language assessment for young learners. Proceedings of the Interspeech Workshop on Child Computer Interaction (WOCCI). Singapore, Singapore: ISCA.
  • Evanini, K., Xie, S., & Zechner, K. (2013). Prompt-based content scoring for automated spoken language assessment. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 157–162. Atlanta, Georgia, USA: Association for Computational Linguistics.
  • Forbes-Riley, K., & Litman, D. (2011). Benefits and challenges of real-time uncertainty detection and adaptation in a spoken dialogue computer tutor. Speech Communication, 53(9–10), 1115–1136. doi:10.1016/j.specom.2011.02.006
  • Galaczi, E., & Taylor, L. (this issue). Interactional competence: Conceptualisations, operationalisations, and outstanding questions. Language Assessment Quarterly.
  • Gandhe, S., & Traum, D. (2008). An evaluation understudy for dialogue coherence models. Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, 172–181. Columbus, Ohio, USA: Association for Computational Linguistics.
  • Gasic, M., Breslin, C., Henderson, M., Kim, D., Szummer, M., Thomson, B., … Young, S. (2013). POMDP-based dialogue manager adaptation to extended domains. Proceedings of SIGDIAL, 214–222. Metz, France: Association for Computational Linguistics.
  • Ginther, A., Dimova, S., & Yang, R. (2010). Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring. Language Testing, 27(3), 379–399. doi:10.1177/0265532210364407
  • Gorin, A. L., Riccardi, G., & Wright, J. H. (1997). How may I help you? Speech Communication, 23, 113–127. doi:10.1016/S0167-6393(97)00040-X
  • Graham, C. R., Lonsdale, D., Kennington, C., Johnson, A., & McGhee, J. (2008). Elicited imitation as an oral proficiency measure with ASR scoring. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 1604–1610. Marrakech, Morocco: LREC.
  • Gravano, A., Hirschberg, J., & Beňuš, Š. (2012). Affirmative cue words in task-oriented dialogue. Computational Linguistics, 38(1), 1–39. doi:10.1162/COLI_a_00083
  • Isaacs, T. (this issue). Shifting sands in second language pronunciation assessment research and practice. Language Assessment Quarterly.
  • Johnson, W. L., & Valente, A. (2008). Tactical language and culture training systems: Using artificial intelligence to teach foreign languages and cultures. Association for the Advancement of Artificial Intelligence, 30(2), 1632–1639.
  • Jurafsky, D., & Martin, J. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2nd ed.). Upper Saddle River, NJ, USA: Prentice-Hall.
  • Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Praeger.
  • Kanters, S., Cucchiarini, C., & Strik, H. (2009). The goodness of pronunciation algorithm: A detailed performance study. Proceedings of the 2009 ISCA Workshop on Speech and Language Technology in Education (SLaTE), 2–5. Austin, Texas, USA: ISCA.
  • Litman, D., Young, S., Gales, M., Knill, K., Ottewell, K., Van Dalen, R., & Vandyke, D. (2016). Towards using conversations with spoken dialogue systems in the automated assessment of non-native speakers of English. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 270–275. Los Angeles, California, USA: SIGdial.
  • Luo, D., Minematsu, N., Yamauchi, Y., & Hirose, K. (2009). Analysis and comparison of automatic language proficiency assessment between shadowed sentences and read sentences. Proceedings of SLaTE 2009. Warwick, England: ISCA.
  • Malinin, A., Van Dalen, R. C., Wang, Y., Knill, K. M., & Gales, M. J. F. (2016). Off-topic response detection for spontaneous spoken English assessment. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2016), 1075–1084. Berlin, Germany: Association for Computational Linguistics.
  • McGraw, I., & Seneff, S. (2007). Immersive second language acquisition in narrow domains: A prototype ISLAND dialogue system. Proceedings of SlaTE, 84–87. Farmington, Pennsylvania, USA: ISCA.
  • Müller, P., De Wet, F., Van Der Walt, C., & Niesler, T. (2009). Automatically assessing the oral proficiency of proficient L2 speakers. Proceedings of SLaTE, 29–32. Austin, Texas, USA: ISCA.
  • Neumeyer, L., Franco, H., Digalakis, V., & Weintraub, M. (2000). Automatic scoring of pronunciation quality. Speech Communication, 30, 83–94. doi:10.1016/S0167-6393(99)00046-1
  • Pearlman, M. (2008). Finalizing the test blueprint. In C. A. Chapelle, M. K. Enright, & J. M. Jamieson (Eds.), Building a validity argument for the test of English as a foreign language (pp. 227–258). New York, NY, USA: Routledge.
  • Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., & Hinton, G. (2016). Regularizing neural networks by penalizing confident output distributions. arXiv:1701.06548v1 [cs.NE]; https://arxiv.org/abs/1701.06548
  • Plough, I., Briggs, S., & Van Bonn, S. (2010). A multi-method analysis of evaluation criteria used to assess the speaking proficiency of graduate student instructors. Language Testing, 27(2), 235–260. doi:10.1177/0265532209349469
  • Raux, A., & Eskenazi, M. (2004). Non-native users in the Let’s Go!! spoken dialogue system: Dealing with linguistic mismatch. Proceedings of HLT-NAACL, 217–224. Boston, Massachusetts, USA: Association for Computational Linguistics.
  • Raux, A., & Eskenazi, M. (2009). A finite-state turn-taking model for spoken dialog systems. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 629–637. Boulder, Colorado, USA: Association for Computational Linguistics.
  • Saville, N. (2003). The process of test development and revision within UCLES EFL. In C. J. Weir & M. Milanovic (Eds.), Continuity and innovation: Revising the Cambridge Proficiency in English Examination 1913–2002 (pp. 57–120). Cambridge, United Kingdom: UCLES/Cambridge University Press.
  • Seedhouse, P., & Egbert, M. (2006). The interactional organization of the IELTS speaking test. IELTS Research Reports, 6, 161–204.
  • Seedhouse, P., & Harris, A. (2011). Topic development in the IELTS speaking test. IELTS Research Reports, 12, 69–124.
  • Seedhouse, P., Harris, A., Naeb, R., & Ustunel, E. (2014). The relationship between speaking features and band descriptors: A mixed methods study. IELTS Research Reports Online Series, 2, 1–30.
  • Seneff, S., Wang, C., & Chao, C. Y. (2007). Spoken dialogue systems for language learning. Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 13–14. Rochester, New York, USA: Association for Computational Linguistics.
  • Shashidhar, V., Pandey, N., & Aggarwal, V. (2015). Automatic spontaneous speech grading: A novel feature derivation technique using the crowd. Proceedings 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing, 1085–1094. Beijing, China: Association for Computational Linguistics.
  • Singh, S., Litman, D., Kearns, M., & Walker, M. (2002). Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research, 16, 105–133.
  • Stolarova, M., Wolf, C., Rinker, T., & Brielmann, A. (2014). How to assess and compare inter-rater reliability, agreement and correlation of ratings: An exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs. Frontiers in Psychology, 5, 509. doi:10.3389/fpsyg.2014.00509
  • Stoyanchev, S., Liu, A., & Hirschberg, J. (2013). Modelling human clarification strategies. Proceedings of SIGDIAL, 137–141. Metz, France: SIGdial.
  • Strik, H., Neri, A., & Cucchiarini, C. (2008). Speech technology for language tutoring. Proceedings of LangTech 2008, 73–76. Rome, Italy: LangTech.
  • Strik, H., Truong, K., De Wet, F., & Cucchiarini, C. (2009). Comparing different approaches for automatic pronunciation error detection. Speech Communication, 51, 845–852. doi:10.1016/j.specom.2009.05.007
  • Su, P. H., Wu, C. H., & Lee, L. S. (2015). Recursive dialogue game for personalized computer-aided pronunciation training. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(1), 127–141.
  • Tao, J., Ghaffarzadegan, S., Chen, L., & Zechner, K. (2016). Exploring deep learning architectures for automatically grading non-native spontaneous speech. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016). Shanghai, China: IEEE.
  • Townshend, B., Bernstein, J., Todic, O., & Warren, E. (1998). Estimation of spoken language proficiency. Proceedings of the ESCA Workshop STiLL: ‘Speech Technology in Language Learning’, 179–182. Marholmen, Sweden: ESCA.
  • Van Compernolle, D. (2001). Recognizing speech of goats, wolves, sheep and non-natives. Speech Communication, 35, 71–79. doi:10.1016/S0167-6393(00)00096-0
  • Van Dalen, R., Knill, K., & Gales, M. (2015). Automatically grading learners’ English using a Gaussian process. Proceedings Sixth Workshop on Speech and Language Technology in Education (SLaTE), 7–12. Leipzig, Germany: ISCA.
  • Van Doremalen, J., Cucchiarini, C., & Strik, H. (2010). Optimizing automatic speech recognition for low-proficient non-native speakers. EURASIP Journal on Audio, Speech, and Music Processing, 2010(2010), Article ID 973954, 13 pages. doi:10.1186/1687-4722-2010-973954
  • Van Doremalen, J., Cucchiarini, C., & Strik, H. (2011). Speech technology in CALL: The essential role of adaptation. Interdisciplinary Approaches to Adaptive Learning; Communications in Computer and Information Science Series, 26, 56–69.
  • Van Moere, A. (2012). A psycholinguistic approach to oral language assessment. Language Testing, 29(3), 325–344. doi:10.1177/0265532211424478
  • Visser, T., Traum, D., DeVault, D., & Op Den Akker, R. (2014). A model for incremental grounding in spoken dialogue systems. Journal on Multimodal User Interfaces, 8(1), 61–73.
  • Walker, M. A., Litman, D. J., Kamm, C. A., & Abella, A. (1997). PARADISE: A framework for evaluating spoken dialogue agents. Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, 271–280. Madrid, Spain: Association for Computational Linguistics.
  • Wang, X., Evanini, K., & Zechner, K. (2013). Coherence modeling for the automated assessment of spontaneous spoken responses. Proceedings Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 814–819. Atlanta, Georgia, USA: Association for Computational Linguistics.
  • Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. doi:10.1145/365153.365168
  • Williams, J., Raux, A., Ramachandran, D., & Black, A. (2013). The dialog state tracking challenge. Proceedings of the SIGDIAL 2013 Conference, 404–413. Metz, France: SIGdial.
  • Witt, S. M., & Young, S. (2000). Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication, 30, 95–108. doi:10.1016/S0167-6393(99)00044-8
  • Xi, X., Higgins, D., Zechner, K., & Williamson, D. M. (2008). Automated scoring of spontaneous speech using SpeechRater v1.0. Educational Testing Service Research Report No. RR-08-62. Princeton, NJ: ETS.
  • Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., … Zweig, G. (2016). Achieving human parity in conversational speech recognition. arXiv:1610.05256 [cs.CL]; https://arxiv.org/abs/1610.05256
  • Xiong, W., Evanini, K., Zechner, K., & Chen, L. (2013). Automated content scoring of spoken responses containing multiple parts with factual information. Proceedings SLaTE 2013, 137–142. Porto, Portugal: ISCA.
  • Yoon, S., & Xie, S. (2014). Similarity based non-scorable response detection for automated speech scoring. Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 116–123. Baltimore, Maryland, USA: Association for Computational Linguistics.
  • Young, S., Gasic, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., & Yu, K. (2010). The hidden information state model: A practical framework for POMDP-based spoken dialogue management. Computer Speech and Language, 24, 150–174. doi:10.1016/j.csl.2009.04.001
  • Zechner, K., & Bejar, I. (2006). Towards automatic scoring of non-native spontaneous speech. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 216–223. New York, NY, USA: Association for Computational Linguistics.
  • Zechner, K., Evanini, K., Yoon, S. Y., Davis, L., Wang, X., Chen, L., … Leong, C. W. (2014). Automated scoring of speaking items in an assessment for teachers of English as a Foreign Language. Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, 134–142. Baltimore, Maryland, USA: Association for Computational Linguistics.
  • Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895. doi:10.1016/j.specom.2009.04.009
