CNAMD Corpus: A Chinese Natural Audiovisual Multimodal Database of Conversations for Social Interactive Agents

Pages 2041-2053 | Received 28 Nov 2022, Accepted 19 Jun 2023, Published online: 04 Jul 2023

References

  • Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3–4), 273–287. https://doi.org/10.1007/s10579-007-9061-5
  • Avruch, K. (2004). Culture as context, culture as communication: Considerations for humanitarian negotiators. Harvard Negotiation Law Review, 9, 391. https://heinonline.org/HOL/P?h=hein.journals/haneg9&i=395
  • Bao, J., Duan, N., Yan, Z., Zhou, M., & Zhao, T. (2016). Constraint-based question answering with knowledge graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan (pp. 2503–2514). The COLING 2016 Organizing Committee. https://doi.org/10.1007/978-3-319-45510-5_4
  • Baseer, N., Mahboob, U., & Degnan, J. (2017). Micro-feedback training: Learning the art of effective feedback. Pakistan Journal of Medical Sciences, 33(6), 1525–1527. https://doi.org/10.12669/pjms.336.13721
  • Berant, J., Chou, A., Frostig, R., & Liang, P. (2013, October). Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1533–1544). https://aclanthology.org/D13-1160.pdf
  • Boudin, A. (2022). Interdisciplinary corpus-based approach for exploring multimodal conversational feedback. In International Conference on Multimodal Interaction (pp. 705–710). Association for Computing Machinery. https://doi.org/10.1145/3536221.3557029
  • Burmania, A., Parthasarathy, S., & Busso, C. (2016). Increasing the reliability of crowdsourcing evaluations using online quality assessment. IEEE Transactions on Affective Computing, 7(4), 374–388. https://doi.org/10.1109/TAFFC.2015.2493525
  • Busso, C., Parthasarathy, S., Burmania, A., AbdelWahab, M., Sadoughi, N., & Provost, E. M. (2017). MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 8(1), 67–80. https://doi.org/10.1109/TAFFC.2016.2515617
  • Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B., Becket, T., Douville, B., Prevost, S., & Stone, M. (1994). Animated conversation: Rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (pp. 413–420). Association for Computing Machinery. https://doi.org/10.1145/192161.192272
  • Cassell, J., & Stone, M. (1999). Living hand to mouth: Psychological theories about speech and gesture in interactive dialogue systems [Paper presentation]. Proceedings of AAAI 1999 Fall Symposium on Narrative Intelligence, North Falmouth, MA. https://www.aaai.org/Papers/Symposia/Fall/1999/FS-99-03/FS99-03-005.pdf
  • Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. ArXiv Preprint ArXiv:1911.02116. https://doi.org/10.48550/arXiv.1911.02116
  • Fan, X., Zhang, F., Wang, H., & Lu, X. (2012). The system of face detection based on OpenCV. In 2012 24th Chinese Control and Decision Conference (CCDC), Taiyuan (pp. 648–651). https://doi.org/10.1109/CCDC.2012.6242980
  • Gadbois, E. A., Jimenez, F., Brazier, J. F., Davoodi, N. M., Nunn, A. S., Mills, W. L., Dosa, D., & Thomas, K. S. (2022). Findings from talking tech: A technology training pilot intervention to reduce loneliness and social isolation among homebound older adults. Innovation in Aging, 6(5), 1–12. https://doi.org/10.1093/geroni/igac040
  • Gander, A. J., Lindström, N. B., & Gander, P. (2021). Expressing agreement in Swedish and Chinese: A case study of communicative feedback in first-time encounters. In P.-L. P. Rau (Ed.), International Conference on Human-Computer Interaction (pp. 390–407). Springer International Publishing. https://doi.org/10.1007/978-3-030-77074-7_30
  • Gudykunst, W. B. (1983). Uncertainty reduction and predictability of behavior in low- and high-context cultures: An exploratory study. Communication Quarterly, 31(1), 49–55. https://doi.org/10.1080/01463378309369485
  • Hall, E. T. (1976). Beyond culture. Anchor. https://books.google.com.hk/books?id=reByw3FWVWsC
  • Hennessy, C., & Forrester, G. (2014). Developing a framework for effective audio feedback: A case study. Assessment & Evaluation in Higher Education, 39(7), 777–789. https://doi.org/10.1080/02602938.2013.870530
  • Hough, J., Tian, Y., de Ruiter, L., Betz, S., Kousidis, S., Schlangen, D., & Ginzburg, J. (2016). DUEL: A multi-lingual multimodal dialogue corpus for disfluency, exclamations and laughter. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) (pp. 1784–1788). European Language Resources Association (ELRA). https://aclanthology.org/L16-1281
  • Huang, L., Morency, L.-P., & Gratch, J. (2011). Virtual rapport 2.0. In International Workshop on Intelligent Virtual Agents (pp. 68–79). https://doi.org/10.1007/978-3-642-23974-8_8
  • Huis in ‘t Veld, E. M. J., Van Boxtel, G. J. M., & de Gelder, B. (2014). The Body Action Coding System I: Muscle activations during the perception and expression of emotion. Social Neuroscience, 9(3), 249–264. https://doi.org/10.1080/17470919.2014.890668
  • Jack, R. E., & Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annual Review of Psychology, 68(1), 269–297. https://doi.org/10.1146/annurev-psych-010416-044242
  • Kaynak, E., Kara, A., & Maksüdünov, A. (2022). An empirical investigation of home buying behavior in a high-context culture: A strategic marketing-oriented approach. International Journal of Housing Markets and Analysis. https://doi.org/10.1108/IJHMA-07-2022-0095
  • Kim, D., Pan, Y., & Park, H. S. (1998). High- versus low-context culture: A comparison of Chinese, Korean, and American cultures. Psychology and Marketing, 15(6), 507–521. https://doi.org/10.1002/(SICI)1520-6793(199809)15:6<507::AID-MAR2>3.0.CO;2-A
  • King, D. E. (2009). Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research, 10, 1755–1758. https://doi.org/10.5555/1577069.1755843
  • Kotov, A., Zaidelman, L., Zinina, A., Arinkin, N., Filatov, A., & Kivva, K. (2021). Conceptual processing system for a companion robot. Cognitive Systems Research, 67(1), 28–32. https://doi.org/10.1016/j.cogsys.2020.12.007
  • Lee, G., Deng, Z., Ma, S., Shiratori, T., Srinivasa, S. S., & Sheikh, Y. (2019). Talking with hands 16.2 m: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea (pp. 763–772). https://doi.org/10.1109/iccv.2019.00085
  • Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One, 13(5), e0196391. https://doi.org/10.1371/journal.pone.0196391
  • Lotfian, R., & Busso, C. (2019). Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Transactions on Affective Computing, 10(4), 471–483. https://doi.org/10.1109/TAFFC.2017.2736999
  • Loveys, K., Hiko, C., Sagar, M., Zhang, X., & Broadbent, E. (2022). “I felt her company”: A qualitative study on factors affecting closeness and emotional support seeking with an embodied conversational agent. International Journal of Human-Computer Studies, 160(1), 102771. https://doi.org/10.1016/j.ijhcs.2021.102771
  • Lu, J., & Allwood, J. (2011). Unimodal and multimodal feedback in Chinese and Swedish mono-cultural and intercultural interactions (a pilot study). In Proceedings of the 3rd Nordic Symposium on Multimodal Communication, NEALT (Vol. 15, pp. 40–47).
  • Lugrin, B., Pelachaud, C., & Traum, D. (2022). The handbook on socially interactive agents: 20 Years of research on embodied conversational agents, intelligent virtual agents, and social robotics, volume 2: Interactivity, platforms, application. Association for Computing Machinery and Morgan & Claypool Publishers. https://books.google.com/books?id=iwiWEAAAQBAJ
  • Maat, M. T., Truong, K. P., & Heylen, D. (2010). How turn-taking strategies influence users’ impressions of an agent. In J. Allbeck, N. Badler, T. Bickmore, C. Pelachaud, & A. Safonova (Eds.), International Conference on Intelligent Virtual Agents (pp. 441–453). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-15892-6_48
  • Metallinou, A., Yang, Z., Lee, C., Busso, C., Carnicke, S., & Narayanan, S. (2016). The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations. Language Resources and Evaluation, 50(3), 497–521. https://doi.org/10.1007/s10579-015-9300-0
  • Moneglia, M., Brown, S. W., Frontini, F., Gagliardi, G., Khan, F., Monachini, M., & Panunzi, A. (2014). The IMAGACT visual ontology. An extendable multilingual infrastructure for the representation of lexical encoding of action. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 3425–3432). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2014/pdf/318_Paper.pdf
  • Munalim, L. O., Genuino, C. F., & Tuttle, B. E. (2022). Turn-taking model for Filipinos’ high-context communication style from no-answered and non-answered questions in faculty meetings. 3L: The Southeast Asian Journal of English Language Studies, 28(1), 44–59. https://doi.org/10.17576/3L-2022-2801-04
  • Navarretta, C., Ahlsén, E., Allwood, J., Jokinen, K., & Paggio, P. (2011). Creating comparable multimodal corpora for Nordic languages. In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011) (pp. 153–160). Northern European Association for Language Technology (NEALT). https://aclanthology.org/W11-4621.pdf
  • Paggio, P., & Navarretta, C. (2011). Feedback and gestural behaviour in a conversational corpus of Danish. In P. Paggio, E. Ahlsén, J. Allwood, K. Jokinen, & C. Navarretta (Eds.), NEALT (Northern European Association for Language Technology) Proceedings Series (pp. 33–39). Northern European Association for Language Technology.
  • Pasternak, K., Wu, Z., Visser, U., & Lisetti, C. (2021). Let’s be friends! A rapport-building 3D embodied conversational agent for the Human Support Robot. ArXiv Preprint ArXiv:2103.04498. https://doi.org/10.48550/arXiv.2103.04498
  • Popescu, V., Liu, L., Del Gratta, R., Choukri, K., & Calzolari, N. (2016). New developments in the LRE map. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) (pp. 4526–4530). European Language Resources Association (ELRA). https://aclanthology.org/L16-1716
  • Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China (pp. 1–8). https://doi.org/10.1109/fg.2013.6553805
  • Sadoughi, N., & Busso, C. (2021). Speech-driven expressive talking lips with conditional sequential generative adversarial networks. IEEE Transactions on Affective Computing, 12(4), 1031–1044. https://doi.org/10.1109/TAFFC.2019.2916031
  • Sak, H., Senior, A. W., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In INTERSPEECH (pp. 338–342). https://doi.org/10.21437/interspeech.2014-80
  • Sharma, S., Gangadhara, K. G., Xu, F., Slowe, A. S., Frank, M. G., & Nwogu, I. (2021). Coupled systems for modeling rapport between interlocutors. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India (pp. 1–8). IEEE Press. https://doi.org/10.1109/FG52635.2021.9667067
  • Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2012). A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3(1), 42–55. https://doi.org/10.1109/T-AFFC.2011.25
  • Sun, T., Hu, Q., Libby, J., & Atashzar, S. F. (2022). Deep heterogeneous dilation of LSTM for transient-phase gesture prediction through high-density electromyography: Towards application in neurorobotics. IEEE Robotics and Automation Letters, 7(2), 2851–2858. https://doi.org/10.1109/LRA.2022.3142721
  • Talmor, A., & Berant, J. (2018). The web as a knowledge-base for answering complex questions. ArXiv Preprint ArXiv:1803.06643. https://doi.org/10.48550/arXiv.1803.06643
  • Toivio, E., & Jokinen, K. (2012). Multimodal feedback signaling in Finnish (pp. 247–255). Baltic HLT. https://books.google.com/books?id=ve_AAQAAQBAJ
  • Trotta, D., & Guarasci, R. (2021). How are gestures used by politicians? A multimodal co-gesture analysis. Italian Journal of Computational Linguistics, 7(1), 45–66. https://doi.org/10.4000/ijcol.827
  • Vidal, A., Salman, A., Lin, W.-C., & Busso, C. (2020). MSP-face corpus: A natural audiovisual emotional database. In Proceedings of the 2020 International Conference on Multimodal Interaction (pp. 397–405). Association for Computing Machinery. https://doi.org/10.1145/3382507.3418872
  • Vila-Lopez, N., Boluda, I.-K., & Marin-Aguilar, J. T. (2022). Improving residents’ quality of life through sustainable experiential mega-events: High- versus low-context cultures. Journal of Hospitality & Tourism Research, 46(5), 979–1005. https://doi.org/10.1177/1096348020901775
  • Wang, I., & Ruiz, J. (2021). Examining the use of nonverbal communication in virtual agents. International Journal of Human–Computer Interaction, 37(17), 1648–1673. https://doi.org/10.1080/10447318.2021.1898851
  • Wu, B., Zhong, J., & Yang, C. (2022). A visual-based gesture prediction framework applied in social robots. IEEE/CAA Journal of Automatica Sinica, 9(3), 510–519. https://doi.org/10.1109/JAS.2021.1004243
  • Würtz, E. (2005). Intercultural communication on web sites: A cross-cultural analysis of web sites from high-context cultures and low-context cultures. Journal of Computer-Mediated Communication, 11(1), 274–299. https://doi.org/10.1111/j.1083-6101.2006.tb00313.x
  • Yang, T.-Y., Chen, Y.-T., Lin, Y.-Y., & Chuang, Y.-Y. (2019). FSA-net: Learning fine-grained structure aggregation for head pose estimation from a single image [Paper presentation]. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA (pp. 1087–1096). https://doi.org/10.1109/cvpr.2019.00118
  • Yoon, Y., Ko, W.-R., Jang, M., Lee, J., Kim, J., & Lee, G. (2019). Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots [Paper presentation]. In 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada (pp. 4303–4309). https://doi.org/10.1109/ICRA.2019.8793720
  • Zadeh, A., Zellers, R., Pincus, E., & Morency, L.-P. (2016). Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages. IEEE Intelligent Systems, 31(6), 82–88. https://doi.org/10.1109/MIS.2016.94
  • Zhang, B., Zhu, Y., Deng, J., Zheng, W., Liu, Y., Wang, C., & Zeng, R. (2023). “I am here to assist your tourism”: Predicting continuance intention to use AI-based chatbots for tourism. Does gender really matter? International Journal of Human–Computer Interaction, 39(9), 1887–1903. https://doi.org/10.1080/10447318.2022.2124345
  • Zhang, H., Yuan, T., Chen, J., Li, X., Zheng, R., Huang, Y., Chen, X., Gong, E., Chen, Z., Hu, X., Yu, D., Ma, Y., & Huang, L. (2022). PaddleSpeech: An easy-to-use all-in-one speech toolkit. ArXiv Preprint ArXiv:2205.12007. https://doi.org/10.18653/v1/2022.naacl-demo.12
  • Zhang, S., Liu, C., Jiang, H., Wei, S., Dai, L., & Hu, Y. (2015). Feedforward sequential memory networks: A new structure to learn long-term dependency. ArXiv Preprint ArXiv:1512.08301. https://doi.org/10.48550/arXiv.1512.08301
  • Zhang, Y., Che, H., Li, J., Li, C., Wang, X., & Wang, Z. (2021). One-shot voice conversion based on speaker aware module. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada (pp. 5959–5963). https://doi.org/10.1109/icassp39728.2021.9414081
