Research Article

Assessing learners’ English public speaking anxiety with multimodal deep learning technologies

Received 18 Jul 2023, Accepted 27 Apr 2024, Published online: 11 May 2024

References

  • Ajjawi, R., & Boud, D. (2017). Researching feedback dialogue: An interactional analysis approach. Assessment & Evaluation in Higher Education, 42(2), 252–265. https://doi.org/10.1080/02602938.2015.1102863
  • Al-Inbari, F. A. Y., & Al-Wasy, B. Q. M. (2022). The impact of automated writing evaluation (AWE) on EFL learners’ peer and self-editing. Education and Information Technologies, 28, 6645–6665. https://doi.org/10.1007/s10639-022-11458-x
  • Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). Wav2vec 2.0: A framework for self-supervised learning of speech representations [Paper presentation]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, December 6-12.
  • Baltrušaitis, T., Ahuja, C., & Morency, L. P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443. https://doi.org/10.1109/TPAMI.2018.2798607
  • Bashori, M., van Hout, R., Strik, H., & Cucchiarini, C. (2022). Web-based language learning and speaking anxiety. Computer Assisted Language Learning, 35(5–6), 1058–1089. https://doi.org/10.1080/09588221.2020.1770293
  • Bodie, G. D. (2010). A racing heart, rattling knees, and ruminative thoughts: Defining, explaining, and treating public speaking anxiety. Communication Education, 59(1), 70–105. https://doi.org/10.1080/03634520903443849
  • Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Sage.
  • Butt, A. R., Arsalan, A., & Majid, M. (2020). Multimodal personality trait recognition using wearable sensors in response to public speaking. IEEE Sensors Journal, 20(12), 6532–6541. https://doi.org/10.1109/JSEN.2020.2976159
  • Chen, H., & Pan, J. (2022). Computer or human: A comparative study of automated evaluation scoring and instructors’ feedback on Chinese college students’ English writing. Asian-Pacific Journal of Second and Foreign Language Education, 7, 1–20. https://doi.org/10.1186/s40862-022-00171-4
  • Chen, L., Leong, C. W., Feng, G., Lee, C. M., & Somasundaran, S. (2015). Utilizing multimodal cues to automatically evaluate public speaking performance [Paper presentation]. Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, Xi’an, China, September 21-24. https://doi.org/10.1109/ACII.2015.7344601
  • Chen, Y. (2022). Effects of technology-enhanced language learning on reducing EFL learners’ public speaking anxiety. Computer Assisted Language Learning. Advance online publication. https://doi.org/10.1080/09588221.2022.2055083
  • Coskun, A. (2016). Causes of the “I can understand English but I can’t speak” syndrome in Turkey. Journal on English Language Teaching, 6(3), 1–12. https://doi.org/10.26634/jelt.6.3.8174
  • Davis, F. D. (1986). A technology acceptance model for empirically testing new end-user information systems: Theory and results [Doctoral dissertation]. Sloan School of Management, Massachusetts Institute of Technology.
  • Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340. https://doi.org/10.2307/249008
  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding [Paper presentation]. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, United States, June 2-7. https://doi.org/10.18653/v1/N19-1423
  • Divekar, R. R., Drozdal, J., Chabot, S., Zhou, Y., Su, H., Chen, Y., … Braasch, J. (2022). Foreign language acquisition via artificial intelligence and extended reality: Design and evaluation. Computer Assisted Language Learning, 35(9), 2332–2360. https://doi.org/10.1080/09588221.2021.1879162
  • Engelhard, G., Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 261–287). Lawrence Erlbaum.
  • Fuyuno, M., Komiya, R., & Saitoh, T. (2018). Multimodal analysis of public speaking performance by EFL learners: Applying deep learning to understanding how successful speakers use facial movement. Asian Journal of Applied Linguistics, 5(1), 117–129. https://caes.hku.hk/ajal/index.php/ajal/article/view/508
  • Gabory, E., & Chollet, M. (2020). Investigating the influence of sound design for inducing anxiety in virtual public speaking [Paper presentation]. Companion Publication of the 2020 International Conference on Multimodal Interaction, Virtual Event, Netherlands, October 25-29. https://doi.org/10.1145/3395035.3425227
  • Gan, T., Li, J., Wong, Y., & Kankanhalli, M. S. (2019). A multi-sensor framework for personal presentation analytics. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(2), 1–21. https://doi.org/10.1145/3300941
  • Giannakakis, G., Grigoriadis, D., Giannakaki, K., Simantiraki, O., Roniotis, A., & Tsiknakis, M. (2019). Review on psychological stress detection using biosignals. IEEE Transactions on Affective Computing, 13(1), 440–460. https://doi.org/10.1109/TAFFC.2019.2927337
  • Giannakakis, G., Pediaditis, M., Manousos, D., Kazantzaki, E., Chiarugi, F., Simos, P. G., … Tsiknakis, M. (2017). Stress and anxiety detection using facial cues from videos. Biomedical Signal Processing and Control, 31, 89–101. https://doi.org/10.1016/j.bspc.2016.06.020
  • González, A. A., Castillo, M. M. M., Guzmán, A. S., & Merino, A. D. P. (2022). Threshold-based anxiety detection algorithm through ECG and GSR signals [Paper presentation]. 2022 IEEE Sixth Ecuador Technical Chapters Meeting (ETCM), Quito, Ecuador, October 11-14. https://doi.org/10.1109/ETCM56276.2022.9935706
  • Gregersen, T. (2020). Dynamic properties of language anxiety. Studies in Second Language Learning and Teaching, 10(1), 67–87. https://doi.org/10.14746/ssllt.2020.10.1.4
  • Gregersen, T. S. (2005). Nonverbal cues: Clues to the detection of foreign language anxiety. Foreign Language Annals, 38(3), 388–400. https://doi.org/10.1111/j.1944-9720.2005.tb02225.x
  • Hasan, M. K., Rahman, W., Zadeh, A., Zhong, J., Tanveer, M. I., & Morency, L. P. (2019). UR-FUNNY: A multimodal language dataset for understanding humor [Paper presentation]. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, November 3-7. https://doi.org/10.18653/v1/D19-1211
  • Hazarika, D., Zimmermann, R., & Poria, S. (2020). MISA: Modality-invariant and -specific representations for multimodal sentiment analysis [Paper presentation]. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, United States, October 12-16. https://doi.org/10.1145/3394171.3413678
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition [Paper presentation]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, United States, June 27-30. https://doi.org/10.1109/CVPR.2016.90
  • Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-R., Jaitly, N., … Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
  • Horwitz, E. K., Horwitz, M. B., & Cope, J. (1986). Foreign language classroom anxiety. The Modern Language Journal, 70(2), 125–132. https://doi.org/10.2307/327317
  • Huang, F., Wen, W., & Liu, G. (2016). Facial expression recognition of public speaking anxiety [Paper presentation]. 2016 9th International Symposium on Computational Intelligence and Design, Hangzhou, China, December 10-11. https://doi.org/10.1109/ISCID.2016.1061
  • Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23(3), 187–200. https://doi.org/10.1007/BF02289233
  • Kimani, E., Bickmore, T., Picard, R., Goodwin, M., & Jimison, H. (2022). Real-time public speaking anxiety prediction model for oral presentations [Paper presentation]. Companion Publication of the 2022 International Conference on Multimodal Interaction, Bengaluru, India, November 7-11. https://doi.org/10.1145/3536220.3563686
  • Kusumawat, A. J., & Fauzia, F. S. (2019). Students’ anxiety in Indonesian EFL public speaking class: A quantitative research [Paper presentation]. Proceedings of the 2019 5th International Conference on Education and Training Technologies, Seoul, Republic of Korea, May 27-29. https://doi.org/10.1145/3337682.3337703
  • Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and R. Routledge.
  • Lee, H., & Kleinsmith, A. (2019). Public speaking anxiety in a real classroom: Towards developing a reflection system [Paper presentation]. Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK, May 4-9. https://doi.org/10.1145/3290607.3312875
  • Lee, S., Lee, T., Yang, T., Yoon, C., & Kim, S.-P. (2020). Detection of drivers’ anxiety invoked by driving situations using multimodal biosignals. Processes, 8(2), 155. https://doi.org/10.3390/pr8020155
  • Li, X., Liu, M., & Zhang, C. (2020). Technological impact on language anxiety dynamic. Computers & Education, 150, 103839. https://doi.org/10.1016/j.compedu.2020.103839
  • Liu, Z., Shen, Y., Lakshminarasimhan, V. B., Liang, P. P., Zadeh, A., & Morency, L. P., (2018). Efficient low-rank multimodal fusion with modality-specific factors [Paper presentation]. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, July 15-20. https://doi.org/10.18653/v1/P18-1209
  • Lucas, S. E., & Yin, S. (2011). The art of public speaking (teacher’s book) (10th ed.). Foreign Language Teaching and Research Press.
  • Ma, Q. (2020). Examining the role of inter-group peer online feedback on wiki writing in an EAP context. Computer Assisted Language Learning, 33(3), 197–216. https://doi.org/10.1080/09588221.2018.1556703
  • MacIntyre, P. D. (2017). An overview of language anxiety research and trends in its development. In C. Gkonou, M. Daubney, & J.-M. Dewaele (Eds.), New insights into language anxiety: Theory, research and educational implications (pp. 11–30). Multilingual Matters. https://doi.org/10.21832/9781783097722-003
  • McCroskey, J. C. (1970). Measures of communication-bound anxiety. Speech Monographs, 37(4), 269–277. https://doi.org/10.1080/03637757009375677
  • Mihoub, A., & Lefebvre, G. (2019). Wearables and social signal processing for smarter public presentations. ACM Transactions on Interactive Intelligent Systems (TiiS), 9(2–3), 1–24. https://doi.org/10.1145/3234507
  • Min, S., He, L., & Zhang, J. (2020). Review of recent empirical research (2011-2018) on language assessment in China. Language Teaching, 53(3), 316–340. https://doi.org/10.1017/S0261444820000051
  • Palmas, F., Reinelt, R., Cichor, J. E., Plecher, D. A., & Klinker, G. (2021). Virtual reality public speaking training: Experimental evaluation of direct feedback technology acceptance [Paper presentation]. 2021 IEEE Virtual Reality and 3D User Interfaces (VR), Lisboa, Portugal, March 27- April 1. https://doi.org/10.1109/VR50410.2021.00070
  • Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1985). A conceptual model of service quality and its implications for future research. Journal of Marketing, 49(4), 41–50. https://doi.org/10.1177/002224298504900403
  • Qian, C., Feng, F., Wen, L., Ma, C., & Xie, P. (2021). Counterfactual inference for text classification debiasing [Paper presentation]. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1), Bangkok, Thailand, August 1-6. https://doi.org/10.18653/v1/2021.acl-long.422
  • Raether, J., Nirjhar, E. H., & Chaspari, T. (2022). Evaluating just-in-time vibrotactile feedback for communication anxiety [Paper presentation]. Proceedings of the 2022 International Conference on Multimodal Interaction, Bengaluru, India, November 7-11. https://doi.org/10.1145/3536221.3556590
  • Roca, J. C., Chiu, C. M., & Martínez, F. J. (2006). Understanding e-learning continuance intention: An extension of the technology acceptance model. International Journal of Human-Computer Studies, 64(8), 683–696. https://doi.org/10.1016/j.ijhcs.2006.01.003
  • Schneider, J., Börner, D., Van Rosmalen, P., & Specht, M. (2015). Presentation trainer, your public speaking multimodal coach [Paper presentation]. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, Washington, United States, November 9-13. https://doi.org/10.1145/2818346.2830603
  • Schwerdtfeger, A. (2004). Predicting autonomic reactivity to public speaking: Don’t get fixed on self-report data! International Journal of Psychophysiology, 52(3), 217–224. https://doi.org/10.1016/j.ijpsycho.2003.10.008
  • Senaratne, H., Kuhlmann, L., Ellis, K., Melvin, G., & Oviatt, S. (2021). A multimodal dataset and evaluation for feature estimators of temporal phases of anxiety [Paper presentation]. Proceedings of the 2021 International Conference on Multimodal Interaction, Montréal, QC, Canada, October 18-22. https://doi.org/10.1145/3462244.3479900
  • Senaratne, H., Oviatt, S., Ellis, K., & Melvin, G. (2022). A critical review of multimodal-multisensor analytics for anxiety assessment. ACM Transactions on Computing for Healthcare, 3(4), 1–42. https://doi.org/10.1145/3556980
  • Shachak, A., Kuziemsky, C., & Petersen, C. (2019). Beyond TAM and UTAUT: Future directions for HIT implementation research. Journal of Biomedical Informatics, 100, 103315. https://doi.org/10.1016/j.jbi.2019.103315
  • Song, W., Wu, B., Zheng, C., & Zhang, H. (2023). Detection of public speaking anxiety: A new dataset and algorithm [Paper presentation]. 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, July 10–14. https://doi.org/10.1109/ICME55011.2023.00448
  • Spreng, R. A., MacKenzie, S. B., & Olshavsky, R. W. (1996). A reexamination of the determinants of consumer satisfaction. Journal of Marketing, 60(3), 15–32. https://doi.org/10.1177/002224299606000302
  • Sülter, R. E., Ketelaar, P. E., & Lange, W. G. (2022). SpeakApp-Kids! Virtual reality training to reduce fear of public speaking in children–A proof of concept. Computers & Education, 178, 104384. https://doi.org/10.1016/j.compedu.2021.104384
  • Sun, H., Wang, H., Liu, J., Chen, Y. W., & Lin, L. (2022a). CubeMLP: An MLP-based model for multimodal sentiment analysis and depression estimation [Paper presentation]. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10-14. https://doi.org/10.1145/3503161.3548025
  • Sun, T., Wang, W., Jing, L., Cui, Y., Song, X., & Nie, L. (2022b). Counterfactual reasoning for out-of-distribution multimodal sentiment analysis [Paper presentation]. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10-14. https://doi.org/10.1145/3503161.3548211
  • Torres, K. M., & Turner, J. E. (2016). Students’ foreign language anxiety and self-efficacy beliefs across different levels of university foreign language coursework. Journal of Spanish Language Teaching, 3(1), 57–73. https://doi.org/10.1080/23247797.2016.1163101
  • Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition [Paper presentation]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, United States, June 18-22. https://doi.org/10.1109/CVPR.2018.00675
  • Vallade, J. I., Kaufmann, R., Frisby, B. N., & Martin, J. C. (2021). Technology acceptance model: Investigating students’ intentions toward adoption of immersive 360 videos for public speaking rehearsals. Communication Education, 70(2), 127–145. https://doi.org/10.1080/03634523.2020.1791351
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need [Paper presentation]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, United States, December 4-9.
  • Wang, C., Liang, L., Liu, X., Lu, Y., Shen, J., Luo, H., & Xie, W. (2021). Multimodal fusion diagnosis of depression and anxiety based on face video [Paper presentation]. 2021 IEEE International Conference on Medical Imaging Physics and Engineering, Hefei, China, November 12-14. https://doi.org/10.1109/ICMIPE53131.2021.9698881
  • Wang, D., Guo, X., Tian, Y., Liu, J., He, L., & Luo, X. (2023). TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis. Pattern Recognition, 136, 109259. https://doi.org/10.1016/j.patcog.2022.109259
  • West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications (pp. 56–75). Sage.
  • Wu, G., Long, Y., Li, Y., Pan, L., Wang, E., & Dai, L. (2009). iFLY system for the NIST 2008 speaker recognition evaluation [Paper presentation]. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, April 19-24. https://doi.org/10.1109/ICASSP.2009.4960557
  • Yadav, M., Sakib, M. N., Nirjhar, E. H., Feng, K., Behzadan, A. H., & Chaspari, T. (2020). Exploring individual differences of public speaking anxiety in real-life and virtual presentations. IEEE Transactions on Affective Computing, 13(3), 1168–1182. https://doi.org/10.1109/TAFFC.2020.3048299
  • Yan, Q., Zhang, L. J., & Dixon, H. R. (2022). Exploring classroom-based assessment for young EFL learners in the Chinese context: Teachers’ beliefs and practices. Frontiers in Psychology, 13, 1051728. https://doi.org/10.3389/fpsyg.2022.1051728
  • Yoon, J., Kang, C., Kim, S., & Han, J. (2022). D-vlog: Multimodal vlog dataset for depression detection [Paper presentation]. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, February 22-March 1. https://doi.org/10.1609/aaai.v36i11.21483
  • Yuan, Z., Li, W., Xu, H., & Yu, W. (2021). Transformer-based feature reconstruction network for robust multimodal sentiment analysis [Paper presentation]. Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, October 20-24. https://doi.org/10.1145/3474085.3475585
  • Zadeh, A., Chen, M., Poria, S., Cambria, E., & Morency, L. P. (2017). Tensor fusion network for multimodal sentiment analysis [Paper presentation]. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, September 7-11. https://doi.org/10.18653/v1/d17-1115
  • Zhai, N., & Ma, X. (2021). Automated writing evaluation (AWE) feedback: A systematic investigation of college students’ acceptance. Computer Assisted Language Learning, 35(9), 2817–2842. https://doi.org/10.1080/09588221.2021.1897019
  • Zhang, D., Ju, X., Li, J., Li, S., Zhu, Q., & Zhou, G. (2020a). Multi-modal multi-label emotion detection with modality and label dependence [Paper presentation]. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, November 16-20. https://doi.org/10.18653/v1/2020.emnlp-main.291
  • Zhang, X., Pan, J., Shen, J., Din, Z., Li, J., Lu, D., … Hu, B. (2020b). Fusing of electroencephalogram and eye movement with group sparse canonical correlation analysis for anxiety detection. IEEE Transactions on Affective Computing, 13(2), 958–971. https://doi.org/10.1109/TAFFC.2020.2981440
  • Zhang, M., Bridgeman, B., & Davis, L. (2019a). Validity considerations for using automated scoring in speaking assessment. In K. Zechner & K. Evanini (Eds.), Automated speaking assessment (pp. 21–31). Routledge. https://doi.org/10.4324/9781315165103-2
  • Zhang, D., Li, S., Zhu, Q., & Zhou, G. (2019b). Effective sentiment-relevant word selection for multi-modal sentiment analysis in spoken language [Paper presentation]. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, October 21-25. https://doi.org/10.1145/3343031.3350987
  • Zheng, C., & Li, C. (2015). Application of formative assessment in a Chinese EFL course. In M. Li & Y. Zhao (Eds.), Exploring learning & teaching in higher education (pp. 395–410). Springer. https://doi.org/10.1007/978-3-642-55352-3_18
  • Zheng, C., Chen, X., Zhang, H., & Chai, C. (In press). Automated versus peer assessment: Effects on learners’ English public speaking. Language Learning & Technology, 28(1), 1–34.
  • Zheng, C., Wang, L., & Chai, C. S. (2023). Self-assessment first or peer-assessment first: Effects of video-based formative practice on learners’ English public speaking anxiety and performance. Computer Assisted Language Learning, 36(4), 806–839. https://doi.org/10.1080/09588221.2021.1946562
  • Zhou, H., Zhou, X., Zeng, Z., Zhang, L., & Shen, Z. (2023). A comprehensive survey on multimodal recommender systems: Taxonomy, evaluation, and future directions. arXiv. https://doi.org/10.48550/arXiv.2302.04473
