
Speech Technologies and the Assessment of Second Language Speaking: Approaches, Challenges, and Opportunities


References

  • Ai, H., & Litman, D. J. (2008). Assessing dialog system user simulation evaluation measures using human judges. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, 622–629. Columbus, Ohio, USA: Association for Computational Linguistics.
  • Bernstein, J. (1999). PhonePass testing: Structure and construct. Menlo Park, CA: Ordinate.
  • Bernstein, J. (2012). Computer scoring of spoken responses. In C. A. Chapelle (Ed.), Encyclopedia of applied linguistics (pp. 857–863). New York, NY, USA: Wiley.
  • Bernstein, J., & Cheng, J. (2007). Logic, operation and validation of a spoken English test. In V. M. Holland & F. P. Fisher (Eds.), Speech technologies for language learning (pp. 174–194). New York, NY, USA: Routledge.
  • Butler, F. A., Eignor, D., Jones, S., McNamara, T., & Suomi, B. K. (2000). TOEFL 2000 speaking framework: A working paper. TOEFL Monograph Series MS-20. Princeton, NJ: Educational Testing Service.
  • Cabral, C., Campbell, N., Ganesh, S., Gilmartin, E., Haider, F., Kenny, E., … Orosko, O. R. (2014). MILLA – A multimodal interactive language agent. Edinburgh, United Kingdom: eNTERFACE, Software.
  • Chapelle, C. A., & Chung, Y. R. (2010). The promise of NLP and speech processing technologies in language assessment. Language Testing, 27(3), 301–315. doi:10.1177/0265532210364405
  • Cheng, J., Chen, X., & Metallinou, A. (2015). Deep neural network acoustic models for spoken assessment applications. Speech Communication, 73, 14–27. doi:10.1016/j.specom.2015.07.006
  • Chun, C. (2006). Commentary: An analysis of a language test for employment: The authenticity of the PhonePass test. Language Assessment Quarterly, 3(3), 295–306. doi:10.1207/s15434311laq0303_4
  • Coniam, D. (1999). Voice recognition software accuracy with second language speakers of English. System, 27(1), 49–64. doi:10.1016/S0346-251X(98)00049-9
  • Cook, K., McGhee, J., & Lonsdale, D. (2011). Elicited imitation for prediction of OPI test scores. Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications, 30–37. Portland, Oregon, USA: Association for Computational Linguistics.
  • Cuayáhuitl, H., Dethlefs, N., Hastie, H., & Lemon, O. (2013). Impact of ASR N-best information on Bayesian dialogue act recognition. Proceedings of SIGDIAL. Metz, France: Association for Computational Linguistics.
  • Cucchiarini, C., Strik, H., & Boves, L. (1998). Automatic pronunciation grading for Dutch. Proceedings of the ESCA Workshop on Speech Technology in Language Learning, 95–98. Marholmen, Sweden: ESCA.
  • Cucchiarini, C., Strik, H., & Boves, L. (2000a). Different aspects of expert pronunciation quality ratings and their relation to scores produced by speech recognition algorithms. Speech Communication, 30, 109–119. doi:10.1016/S0167-6393(99)00040-0
  • Cucchiarini, C., Strik, H., & Boves, L. (2000b). Quantitative assessment of second language learners’ fluency by means of automatic speech recognition technology. Journal of the Acoustical Society of America, 107(2), 989–999. doi:10.1121/1.428279
  • Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech. Journal of the Acoustical Society of America, 111(6), 2862–2873. doi:10.1121/1.1471894
  • Cucchiarini, C., Van Doremalen, J., & Strik, H. (2010). Fluency in non-native read and spontaneous speech. Proceedings of the DiSS-LPSS Joint Workshop 2010, 15–18. Tokyo, Japan: University of Tokyo.
  • De Jong, J. H. A. L., Lennig, M., Kerkhoff, A., & Poelmans, P. (2009). Development of a test of spoken Dutch for prospective immigrants. Language Assessment Quarterly, 6(1), 41–60. doi:10.1080/15434300802606564
  • De Jong, N. (this issue). Fluency in second language testing: Insights from different disciplines. Language Assessment Quarterly.
  • De Wet, F., Van Der Walt, C., & Niesler, T. R. (2009). Automatic assessment of oral language proficiency and listening comprehension. Speech Communication, 51, 864–874. doi:10.1016/j.specom.2009.03.002
  • Delcloque, P. (Ed.) (1999). Progressing interface transparency: Speech applications in computer assisted language learning. Proceedings of ‘Integrating Speech Technology in Learning’ (InSTIL), Besançon, France.
  • Delcloque, P. (Ed.) (2000). Speech technology in language learning and the assistive interface. Proceedings of ‘Integrating Speech Technology in Learning’ (InSTIL), Dundee, Scotland.
  • Delmonte, R. (2004). InSTIL/ICALL Symposium ‘NLP and Speech Technologies in Advanced Language Learning Systems’, Venice, Italy. http://project.cgm.unive.it/events/ICALL2004/index.htm
  • Derwing, T. M., Munro, M. J., & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34(3), 592–603. doi:10.2307/3587748
  • Downey, R., Farhady, H., Present-Thomas, R., Suzuki, M., & Van Moere, A. (2008). Evaluation of the usefulness of the Versant for English test: A response. Language Assessment Quarterly, 5(2), 160–167. doi:10.1080/15434300801934744
  • Dzikovska, M. O., Nielsen, R. D., & Brew, C. (2012). Towards effective tutorial feedback for explanation questions: A dataset and baselines. Proceedings Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT ‘12), 200–210. Montreal, Canada: Association for Computational Linguistics.
  • Eskenazi, M. (2009). An overview of spoken language technology for education. Speech Communication, 51(10). doi:10.1016/j.specom.2009.04.005
  • Evanini, K., So, Y., Tao, J., Zapata-Rivera, D., Luce, C., Battistini, L., & Wang, X. (2014). Performance of a trialogue-based prototype system for English language assessment for young learners. Proceedings of the Interspeech Workshop on Child Computer Interaction (WOCCI). Singapore, Singapore: ISCA.
  • Evanini, K., Xie, S., & Zechner, K. (2013). Prompt-based content scoring for automated spoken language assessment. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 157–162. Atlanta, Georgia, USA: Association for Computational Linguistics.
  • Forbes-Riley, K., & Litman, D. (2011). Benefits and challenges of real-time uncertainty detection and adaptation in a spoken dialogue computer tutor. Speech Communication, 53(9–10), 1115–1136. doi:10.1016/j.specom.2011.02.006
  • Galaczi, E., & Taylor, L. (this issue). Interactional competence: Conceptualisations, operationalisations, and outstanding questions. Language Assessment Quarterly.
  • Gandhe, S., & Traum, D. (2008). An evaluation understudy for dialogue coherence models. Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, 172–181. Columbus, Ohio, USA: Association for Computational Linguistics.
  • Gasic, M., Breslin, C., Henderson, M., Kim, D., Szummer, M., Thomson, B., … Young, S. (2013). POMDP-based dialogue manager adaptation to extended domains. Proceedings of SIGDIAL, 214–222. Metz, France: Association for Computational Linguistics.
  • Ginther, A., Dimova, S., & Yang, R. (2010). Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring. Language Testing, 27(3), 379–399. doi:10.1177/0265532210364407
  • Gorin, A. L., Riccardi, G., & Wright, J. H. (1997). How may I help you? Speech Communication, 23, 113–127. doi:10.1016/S0167-6393(97)00040-X
  • Graham, C. R., Lonsdale, D., Kennington, C., Johnson, A., & McGhee, J. (2008). Elicited imitation as an oral proficiency measure with ASR scoring. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 1604–1610. Marrakech, Morocco: LREC.
  • Gravano, A., Hirschberg, J., & Beňuš, Š. (2012). Affirmative cue words in task-oriented dialogue. Computational Linguistics, 38(1), 1–39. doi:10.1162/COLI_a_00083
  • Isaacs, T. (this issue). Shifting sands in second language pronunciation assessment research and practice. Language Assessment Quarterly.
  • Johnson, W. L., & Valente, A. (2008). Tactical language and culture training systems: Using artificial intelligence to teach foreign languages and cultures. Association for the Advancement of Artificial Intelligence, 30(2), 1632–1639.
  • Jurafsky, D., & Martin, J. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2nd ed.). Upper Saddle River, NJ, USA: Prentice-Hall.
  • Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Praeger.
  • Kanters, S., Cucchiarini, C., & Strik, H. (2009). The goodness of pronunciation algorithm: A detailed performance study. Proceedings of the 2009 ISCA Workshop on Speech and Language Technology in Education (SLaTE), 2–5. Austin, Texas, USA: ISCA.
  • Litman, D., Young, S., Gales, M., Knill, K., Ottewell, K., Van Dalen, R., & Vandyke, D. (2016). Towards using conversations with spoken dialogue systems in the automated assessment of non-native speakers of English. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 270–275. Los Angeles, California, USA: SIGdial.
  • Luo, D., Minematsu, N., Yamauchi, Y., & Hirose, K. (2009). Analysis and comparison of automatic language proficiency assessment between shadowed sentences and read sentences. Proceedings of SLaTE 2009. Warwick, England: ISCA.
  • Malinin, A., Van Dalen, R. C., Wang, Y., Knill, K. M., & Gales, M. J. F. (2016). Off-topic response detection for spontaneous spoken English assessment. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2016), 1075–1084. Berlin, Germany: Association for Computational Linguistics.
  • McGraw, I., & Seneff, S. (2007). Immersive second language acquisition in narrow domains: A prototype ISLAND dialogue system. Proceedings of SlaTE, 84–87. Farmington, Pennsylvania, USA: ISCA.
  • Müller, P., De Wet, F., Van Der Walt, C., & Niesler, T. (2009). Automatically assessing the oral proficiency of proficient L2 speakers. Proceedings of SLaTE, 29–32. Austin, Texas, USA: ISCA.
  • Neumeyer, L., Franco, H., Digalakis, V., & Weintraub, M. (2000). Automatic scoring of pronunciation quality. Speech Communication, 30, 83–94. doi:10.1016/S0167-6393(99)00046-1
  • Pearlman, M. (2008). Finalizing the test blueprint. In C. A. Chapelle, M. K. Enright, & J. M. Jamieson (Eds.), Building a validity argument for the test of English as a foreign language (pp. 227–258). New York, NY, USA: Routledge.
  • Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., & Hinton, G. (2016). Regularizing neural networks by penalizing confident output distributions. arXiv:1701.06548v1 [cs.NE]; https://arxiv.org/abs/1701.06548
  • Plough, I., Briggs, S., & Van Bonn, S. (2010). A multi-method analysis of evaluation criteria used to assess the speaking proficiency of graduate student instructors. Language Testing, 27(2), 235–260. doi:10.1177/0265532209349469
  • Raux, A., & Eskenazi, M. (2004). Non-native users in the Let’s Go!! spoken dialogue system: Dealing with linguistic mismatch. Proceedings of HLT-NAACL, 217–224. Boston, Massachusetts, USA: Association for Computational Linguistics.
  • Raux, A., & Eskenazi, M. (2009). A finite-state turn-taking model for spoken dialog systems. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 629–637. Boulder, Colorado, USA: Association for Computational Linguistics.
  • Saville, N. (2003). The process of test development and revision within UCLES EFL. In C. J. Weir & M. Milanovic (Eds.), Continuity and innovation: Revising the Cambridge Proficiency in English Examination 1913–2002 (pp. 57–120). Cambridge, United Kingdom: UCLES/Cambridge University Press.
  • Seedhouse, P., & Egbert, M. (2006). The interactional organization of the IELTS speaking test. IELTS Research Reports, 6, 161–204.
  • Seedhouse, P., & Harris, A. (2011). Topic development in the IELTS speaking test. IELTS Research Reports, 12, 69–124.
  • Seedhouse, P., Harris, A., Naeb, R., & Ustunel, E. (2014). The relationship between speaking features and band descriptors: A mixed methods study. IELTS Research Reports Online Series, 2, 1–30.
  • Seneff, S., Wang, C., & Chao, C. Y. (2007). Spoken dialogue systems for language learning. Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 13–14. Rochester, New York, USA: Association for Computational Linguistics.
  • Shashidhar, V., Pandey, N., & Aggarwal, V. (2015). Automatic spontaneous speech grading: A novel feature derivation technique using the crowd. Proceedings 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing, 1085–1094. Beijing, China: Association for Computational Linguistics.
  • Singh, S., Litman, D., Kearns, M., & Walker, M. (2002). Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research, 16, 105–133.
  • Stolarova, M., Wolf, C., Rinker, T., & Brielmann, A. (2014). How to assess and compare inter-rater reliability, agreement and correlation of ratings: An exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs. Frontiers in Psychology, 5, 509. doi:10.3389/fpsyg.2014.00509
  • Stoyanchev, S., Liu, A., & Hirschberg, J. (2013). Modelling human clarification strategies. Proceedings of SIGDIAL, 137–141. Metz, France: SIGdial.
  • Strik, H., Neri, A., & Cucchiarini, C. (2008). Speech technology for language tutoring. Proceedings of LangTech 2008, 73–76. Rome, Italy: LangTech.
  • Strik, H., Truong, K., De Wet, F., & Cucchiarini, C. (2009). Comparing different approaches for automatic pronunciation error detection. Speech Communication, 51, 845–852. doi:10.1016/j.specom.2009.05.007
  • Su, P. H., Wu, C. H., & Lee, L. S. (2015). Recursive dialogue game for personalized computer-aided pronunciation training. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(1), 127–141.
  • Tao, J., Ghaffarzadegan, S., Chen, L., & Zechner, K. (2016). Exploring deep learning architectures for automatically grading non-native spontaneous speech. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016). Shanghai, China: IEEE.
  • Townshend, B., Bernstein, J., Todic, O., & Warren, E. (1998). Estimation of spoken language proficiency. Proceedings of the ESCA Workshop STiLL: ‘Speech Technology in Language Learning’, 179–182. Marholmen, Sweden: ESCA.
  • Van Compernolle, D. (2001). Recognizing speech of goats, wolves, sheep and non-natives. Speech Communication, 35, 71–79. doi:10.1016/S0167-6393(00)00096-0
  • Van Dalen, R., Knill, K., & Gales, M. (2015). Automatically grading learners’ English using a Gaussian process. Proceedings Sixth Workshop on Speech and Language Technology in Education (SLaTE), 7–12. Leipzig, Germany: ISCA.
  • Van Doremalen, J., Cucchiarini, C., & Strik, H. (2010). Optimizing automatic speech recognition for low-proficient non-native speakers. EURASIP Journal on Audio, Speech, and Music Processing, 2010(2010), Article ID 973954, 13 pages. doi:10.1186/1687-4722-2010-973954
  • Van Doremalen, J., Cucchiarini, C., & Strik, H. (2011). Speech technology in CALL: The essential role of adaptation. Interdisciplinary Approaches to Adaptive Learning; Communications in Computer and Information Science Series, 26, 56–69.
  • Van Moere, A. (2012). A psycholinguistic approach to oral language assessment. Language Testing, 29(3), 325–344. doi:10.1177/0265532211424478
  • Visser, T., Traum, D., DeVault, D., & Op Den Akker, R. (2014). A model for incremental grounding in spoken dialogue systems. Journal on Multimodal User Interfaces, 8(1), 61–73.
  • Walker, M. A., Litman, D. J., Kamm, C. A., & Abella, A. (1997). PARADISE: A framework for evaluating spoken dialogue agents. Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, 271–280. Madrid, Spain: Association for Computational Linguistics.
  • Wang, X., Evanini, K., & Zechner, K. (2013). Coherence modeling for the automated assessment of spontaneous spoken responses. Proceedings Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 814–819. Atlanta, Georgia, USA: Association for Computational Linguistics.
  • Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. doi:10.1145/365153.365168
  • Williams, J., Raux, A., Ramachandran, D., & Black, A. (2013). The dialog state tracking challenge. Proceedings of the SIGDIAL 2013 Conference, 404–413. Metz, France: SIGdial.
  • Witt, S. M., & Young, S. (2000). Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication, 30, 95–108. doi:10.1016/S0167-6393(99)00044-8
  • Xi, X., Higgins, D., Zechner, K., & Williamson, D. M. (2008). Automated scoring of spontaneous speech using SpeechRater v1.0. Educational Testing Service Research Report No. RR-08-62. Princeton, NJ: ETS.
  • Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., … Zweig, G. (2016). Achieving human parity in conversational speech recognition. arXiv:1610.05256 [cs.CL]; https://arxiv.org/abs/1610.05256
  • Xiong, W., Evanini, K., Zechner, K., & Chen, L. (2013). Automated content scoring of spoken responses containing multiple parts with factual information. Proceedings SLaTE 2013, 137–142. Porto, Portugal: ISCA.
  • Yoon, S., & Xie, S. (2014). Similarity based non-scorable response detection for automated speech scoring. Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 116–123. Baltimore, Maryland, USA: Association for Computational Linguistics.
  • Young, S., Gasic, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., & Yu, K. (2010). The hidden information state model: A practical framework for POMDP-based spoken dialogue management. Computer Speech and Language, 24, 150–174. doi:10.1016/j.csl.2009.04.001
  • Zechner, K., & Bejar, I. (2006). Towards automatic scoring of non-native spontaneous speech. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 216–223. New York, NY, USA: Association for Computational Linguistics.
  • Zechner, K., Evanini, K., Yoon, S. Y., Davis, L., Wang, X., Chen, L., … Leong, C. W. (2014). Automated scoring of speaking items in an assessment for teachers of English as a Foreign Language. Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, 134–142. Baltimore, Maryland, USA: Association for Computational Linguistics.
  • Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895. doi:10.1016/j.specom.2009.04.009
