CNAMD Corpus: A Chinese Natural Audiovisual Multimodal Database of Conversations for Social Interactive Agents

Pages 2041-2053 | Received 28 Nov 2022, Accepted 19 Jun 2023, Published online: 04 Jul 2023

References

  • Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3–4), 273–287. https://doi.org/10.1007/s10579-007-9061-5
  • Avruch, K. (2004). Culture as context, culture as communication: Considerations for humanitarian negotiators. Harvard Negotiation Law Review, 9, 391. https://heinonline.org/HOL/P?h=hein.journals/haneg9&i=395
  • Bao, J., Duan, N., Yan, Z., Zhou, M., & Zhao, T. (2016). Constraint-based question answering with knowledge graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan (pp. 2503–2514). The COLING 2016 Organizing Committee. https://doi.org/10.1007/978-3-319-45510-5_4
  • Baseer, N., Mahboob, U., & Degnan, J. (2017). Micro-feedback training: Learning the art of effective feedback. Pakistan Journal of Medical Sciences, 33(6), 1525–1527. https://doi.org/10.12669/pjms.336.13721
  • Berant, J., Chou, A., Frostig, R., & Liang, P. (2013, October). Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1533–1544). https://aclanthology.org/D13-1160.pdf
  • Boudin, A. (2022). Interdisciplinary corpus-based approach for exploring multimodal conversational feedback. In International Conference on Multimodal Interaction (pp. 705–710). Association for Computing Machinery. https://doi.org/10.1145/3536221.3557029
  • Burmania, A., Parthasarathy, S., & Busso, C. (2016). Increasing the reliability of crowdsourcing evaluations using online quality assessment. IEEE Transactions on Affective Computing, 7(4), 374–388. https://doi.org/10.1109/TAFFC.2015.2493525
  • Busso, C., Parthasarathy, S., Burmania, A., AbdelWahab, M., Sadoughi, N., & Provost, E. M. (2017). MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 8(1), 67–80. https://doi.org/10.1109/TAFFC.2016.2515617
  • Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B., Becket, T., Douville, B., Prevost, S., & Stone, M. (1994). Animated conversation: Rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (pp. 413–420). Association for Computing Machinery. https://doi.org/10.1145/192161.192272
  • Cassell, J., & Stone, M. (1999). Living hand to mouth: Psychological theories about speech and gesture in interactive dialogue systems [Paper presentation]. Proceedings of AAAI 1999 Fall Symposium on Narrative Intelligence, North Falmouth, MA. https://www.aaai.org/Papers/Symposia/Fall/1999/FS-99-03/FS99-03-005.pdf
  • Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. ArXiv Preprint ArXiv:1911.02116. https://doi.org/10.48550/arXiv.1911.02116
  • Fan, X., Zhang, F., Wang, H., & Lu, X. (2012). The system of face detection based on OpenCV. In 2012 24th Chinese Control and Decision Conference (CCDC), Taiyuan (pp. 648–651). https://doi.org/10.1109/CCDC.2012.6242980
  • Gadbois, E. A., Jimenez, F., Brazier, J. F., Davoodi, N. M., Nunn, A. S., Mills, W. L., Dosa, D., & Thomas, K. S. (2022). Findings from talking tech: A technology training pilot intervention to reduce loneliness and social isolation among homebound older adults. Innovation in Aging, 6(5), 1–12. https://doi.org/10.1093/geroni/igac040
  • Gander, A. J., Lindström, N. B., & Gander, P. (2021). Expressing agreement in Swedish and Chinese: A case study of communicative feedback in first-time encounters. In P.-L. P. Rau (Ed.), International Conference on Human-Computer Interaction (pp. 390–407). Springer International Publishing. https://doi.org/10.1007/978-3-030-77074-7_30
  • Gudykunst, W. B. (1983). Uncertainty reduction and predictability of behavior in low- and high-context cultures: An exploratory study. Communication Quarterly, 31(1), 49–55. https://doi.org/10.1080/01463378309369485
  • Hall, E. T. (1976). Beyond culture. Anchor. https://books.google.com.hk/books?id=reByw3FWVWsC
  • Hennessy, C., & Forrester, G. (2014). Developing a framework for effective audio feedback: A case study. Assessment & Evaluation in Higher Education, 39(7), 777–789. https://doi.org/10.1080/02602938.2013.870530
  • Hough, J., Tian, Y., de Ruiter, L., Betz, S., Kousidis, S., Schlangen, D., & Ginzburg, J. (2016). DUEL: A multi-lingual multimodal dialogue corpus for disfluency, exclamations and laughter. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) (pp. 1784–1788). European Language Resources Association (ELRA). https://aclanthology.org/L16-1281
  • Huang, L., Morency, L.-P., & Gratch, J. (2011). Virtual rapport 2.0. In International Workshop on Intelligent Virtual Agents (pp. 68–79). https://doi.org/10.1007/978-3-642-23974-8_8
  • Huis in ‘t Veld, E. M. J., Van Boxtel, G. J. M., & de Gelder, B. (2014). The Body Action Coding System I: Muscle activations during the perception and expression of emotion. Social Neuroscience, 9(3), 249–264. https://doi.org/10.1080/17470919.2014.890668
  • Jack, R. E., & Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annual Review of Psychology, 68(1), 269–297. https://doi.org/10.1146/annurev-psych-010416-044242
  • Kaynak, E., Kara, A., & Maksüdünov, A. (2022). An empirical investigation of home buying behavior in a high-context culture: A strategic marketing-oriented approach. International Journal of Housing Markets and Analysis. https://doi.org/10.1108/IJHMA-07-2022-0095
  • Kim, D., Pan, Y., & Park, H. S. (1998). High- versus low-context culture: A comparison of Chinese, Korean, and American cultures. Psychology and Marketing, 15(6), 507–521. https://doi.org/10.1002/(SICI)1520-6793(199809)15:6<507::AID-MAR2>3.0.CO;2-A
  • King, D. E. (2009). Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research, 10, 1755–1758. https://doi.org/10.5555/1577069.1755843
  • Kotov, A., Zaidelman, L., Zinina, A., Arinkin, N., Filatov, A., & Kivva, K. (2021). Conceptual processing system for a companion robot. Cognitive Systems Research, 67(1), 28–32. https://doi.org/10.1016/j.cogsys.2020.12.007
  • Lee, G., Deng, Z., Ma, S., Shiratori, T., Srinivasa, S. S., & Sheikh, Y. (2019). Talking with hands 16.2 m: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea (pp. 763–772). https://doi.org/10.1109/iccv.2019.00085
  • Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One, 13(5), e0196391. https://doi.org/10.1371/journal.pone.0196391
  • Lotfian, R., & Busso, C. (2019). Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Transactions on Affective Computing, 10(4), 471–483. https://doi.org/10.1109/TAFFC.2017.2736999
  • Loveys, K., Hiko, C., Sagar, M., Zhang, X., & Broadbent, E. (2022). “I felt her company”: A qualitative study on factors affecting closeness and emotional support seeking with an embodied conversational agent. International Journal of Human-Computer Studies, 160(1), 102771. https://doi.org/10.1016/j.ijhcs.2021.102771
  • Lu, J., & Allwood, J. (2011). Unimodal and multimodal feedback in Chinese and Swedish mono-cultural and intercultural interactions (a pilot study). In Proceedings of the 3rd Nordic Symposium on Multimodal Communication, NEALT (Vol. 15, pp. 40–47).
  • Lugrin, B., Pelachaud, C., & Traum, D. (2022). The handbook on socially interactive agents: 20 Years of research on embodied conversational agents, intelligent virtual agents, and social robotics, volume 2: Interactivity, platforms, application. Association for Computing Machinery and Morgan & Claypool Publishers. https://books.google.com/books?id=iwiWEAAAQBAJ
  • Maat, M. T., Truong, K. P., & Heylen, D. (2010). How turn-taking strategies influence users’ impressions of an agent. In J. Allbeck, N. Badler, T. Bickmore, C. Pelachaud, & A. Safonova (Eds.), International Conference on Intelligent Virtual Agents (pp. 441–453). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-15892-6_48
  • Metallinou, A., Yang, Z., Lee, C., Busso, C., Carnicke, S., & Narayanan, S. (2016). The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations. Language Resources and Evaluation, 50(3), 497–521. https://doi.org/10.1007/s10579-015-9300-0
  • Moneglia, M., Brown, S. W., Frontini, F., Gagliardi, G., Khan, F., Monachini, M., & Panunzi, A. (2014). The IMAGACT visual ontology. An extendable multilingual infrastructure for the representation of lexical encoding of action. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 3425–3432). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2014/pdf/318_Paper.pdf
  • Munalim, L. O., Genuino, C. F., & Tuttle, B. E. (2022). Turn-taking model for Filipinos’ high-context communication style from no-answered and non-answered questions in faculty meetings. 3L: The Southeast Asian Journal of English Language Studies, 28(1), 44–59. https://doi.org/10.17576/3L-2022-2801-04
  • Navarretta, C., Ahlsén, E., Allwood, J., Jokinen, K., & Paggio, P. (2011). Creating comparable multimodal corpora for Nordic languages. In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011) (pp. 153–160). Northern European Association for Language Technology (NEALT). https://aclanthology.org/W11-4621.pdf
  • Paggio, P., & Navarretta, C. (2011). Feedback and gestural behaviour in a conversational corpus of Danish. In P. Paggio, E. Ahlsén, J. Allwood, K. Jokinen, & C. Navarretta (Eds.), NEALT (Northern European Association for Language Technology) Proceedings Series (pp. 33–39). Northern European Association for Language Technology.
  • Pasternak, K., Wu, Z., Visser, U., & Lisetti, C. (2021). Let’s be friends! A rapport-building 3D embodied conversational agent for the Human Support Robot. ArXiv Preprint ArXiv:2103.04498. https://doi.org/10.48550/arXiv.2103.04498
  • Popescu, V., Liu, L., Del Gratta, R., Choukri, K., & Calzolari, N. (2016). New developments in the LRE map. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) (pp. 4526–4530). European Language Resources Association (ELRA). https://aclanthology.org/L16-1716
  • Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China (pp. 1–8). https://doi.org/10.1109/fg.2013.6553805
  • Sadoughi, N., & Busso, C. (2021). Speech-driven expressive talking lips with conditional sequential generative adversarial networks. IEEE Transactions on Affective Computing, 12(4), 1031–1044. https://doi.org/10.1109/TAFFC.2019.2916031
  • Sak, H., Senior, A. W., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In INTERSPEECH (pp. 338–342). https://doi.org/10.21437/interspeech.2014-80
  • Sharma, S., Gangadhara, K. G., Xu, F., Slowe, A. S., Frank, M. G., & Nwogu, I. (2021). Coupled systems for modeling rapport between interlocutors. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India (pp. 1–8). IEEE Press. https://doi.org/10.1109/FG52635.2021.9667067
  • Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2012). A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3(1), 42–55. https://doi.org/10.1109/T-AFFC.2011.25
  • Sun, T., Hu, Q., Libby, J., & Atashzar, S. F. (2022). Deep heterogeneous dilation of LSTM for transient-phase gesture prediction through high-density electromyography: Towards application in neurorobotics. IEEE Robotics and Automation Letters, 7(2), 2851–2858. https://doi.org/10.1109/LRA.2022.3142721
  • Talmor, A., & Berant, J. (2018). The web as a knowledge-base for answering complex questions. ArXiv Preprint ArXiv:1803.06643. https://doi.org/10.48550/arXiv.1803.06643
  • Toivio, E., & Jokinen, K. (2012). Multimodal feedback signaling in Finnish (pp. 247–255). Baltic HLT. https://books.google.com/books?id=ve_AAQAAQBAJ
  • Trotta, D., & Guarasci, R. (2021). How are gestures used by politicians? A multimodal co-gesture analysis. Italian Journal of Computational Linguistics, 7(1), 45–66. https://doi.org/10.4000/ijcol.827
  • Vidal, A., Salman, A., Lin, W.-C., & Busso, C. (2020). MSP-face corpus: A natural audiovisual emotional database. In Proceedings of the 2020 International Conference on Multimodal Interaction (pp. 397–405). Association for Computing Machinery. https://doi.org/10.1145/3382507.3418872
  • Vila-Lopez, N., Boluda, I.-K., & Marin-Aguilar, J. T. (2022). Improving residents’ quality of life through sustainable experiential mega-events: High- versus low-context cultures. Journal of Hospitality & Tourism Research, 46(5), 979–1005. https://doi.org/10.1177/1096348020901775
  • Wang, I., & Ruiz, J. (2021). Examining the use of nonverbal communication in virtual agents. International Journal of Human–Computer Interaction, 37(17), 1648–1673. https://doi.org/10.1080/10447318.2021.1898851
  • Wu, B., Zhong, J., & Yang, C. (2022). A visual-based gesture prediction framework applied in social robots. IEEE/CAA Journal of Automatica Sinica, 9(3), 510–519. https://doi.org/10.1109/JAS.2021.1004243
  • Würtz, E. (2005). Intercultural communication on web sites: A cross-cultural analysis of web sites from high-context cultures and low-context cultures. Journal of Computer-Mediated Communication, 11(1), 274–299. https://doi.org/10.1111/j.1083-6101.2006.tb00313.x
  • Yang, T.-Y., Chen, Y.-T., Lin, Y.-Y., & Chuang, Y.-Y. (2019). FSA-net: Learning fine-grained structure aggregation for head pose estimation from a single image [Paper presentation]. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA (pp. 1087–1096). https://doi.org/10.1109/cvpr.2019.00118
  • Yoon, Y., Ko, W.-R., Jang, M., Lee, J., Kim, J., & Lee, G. (2019). Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots [Paper presentation]. In 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada (pp. 4303–4309). https://doi.org/10.1109/ICRA.2019.8793720
  • Zadeh, A., Zellers, R., Pincus, E., & Morency, L.-P. (2016). Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages. IEEE Intelligent Systems, 31(6), 82–88. https://doi.org/10.1109/MIS.2016.94
  • Zhang, B., Zhu, Y., Deng, J., Zheng, W., Liu, Y., Wang, C., & Zeng, R. (2023). “I am here to assist your tourism”: Predicting continuance intention to use AI-based chatbots for tourism. Does gender really matter? International Journal of Human–Computer Interaction, 39(9), 1887–1903. https://doi.org/10.1080/10447318.2022.2124345
  • Zhang, H., Yuan, T., Chen, J., Li, X., Zheng, R., Huang, Y., Chen, X., Gong, E., Chen, Z., Hu, X., Yu, D., Ma, Y., & Huang, L. (2022). PaddleSpeech: An easy-to-use all-in-one speech toolkit. ArXiv Preprint ArXiv:2205.12007. https://doi.org/10.18653/v1/2022.naacl-demo.12
  • Zhang, S., Liu, C., Jiang, H., Wei, S., Dai, L., & Hu, Y. (2015). Feedforward sequential memory networks: A new structure to learn long-term dependency. ArXiv Preprint ArXiv:1512.08301. https://doi.org/10.48550/arXiv.1512.08301
  • Zhang, Y., Che, H., Li, J., Li, C., Wang, X., & Wang, Z. (2021). One-shot voice conversion based on speaker aware module. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada (pp. 5959–5963). https://doi.org/10.1109/icassp39728.2021.9414081
