Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

Article: 2356992 | Received 02 Jan 2024, Accepted 13 May 2024, Published online: 21 May 2024

References

  • Atmaja, B. T., and M. Akagi. 2020. Multitask learning and multistage fusion for dimensional audiovisual emotion recognition. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4482–86. doi:10.1109/ICASSP40776.2020.9052916
  • Bendjoudi, I., F. Vanderhaegen, D. Hamad, and F. Dornaika. 2021. Multi-label, multi-task CNN approach for context-based emotion recognition. Information Fusion 76:422–28. doi:10.1016/j.inffus.2020.11.007
  • Cai, Y., X. Li, and J. Li. 2023. Emotion recognition using different sensors, emotion models, methods and datasets: A comprehensive review. Sensors 23 (5):2455. doi:10.3390/s23052455
  • Chang, X., and W. Skarbek. 2021. Multi-modal residual perceptron network for audio–video emotion recognition. Sensors 21 (16):5452. doi:10.3390/s21165452
  • Chen, W., X. Xing, P. Chen, and X. Xu. 2024. Vesper: A compact and effective pretrained model for speech emotion recognition. IEEE Transactions on Affective Computing. doi:10.1109/TAFFC.2024.3369726
  • Datta, S., and S. Chakrabarti. 2022. Integrated two variant deep learners for aspect-based sentiment analysis: An improved meta-heuristic-based model. Cybernetics and Systems 1–37. doi:10.1080/01969722.2022.2145657
  • Feng, J., S. Cai, K. Li, Y. Chen, Q. Cai, and H. Zhao. 2023. Fusing syntax and semantics-based graph convolutional network for aspect-based sentiment analysis. International Journal of Data Warehousing and Mining 19 (1):1–15. doi:10.4018/IJDWM.319803
  • Feng, H., S. Ueno, and T. Kawahara. 2020. End-to-end speech emotion recognition combined with acoustic-to-word ASR model. In Interspeech 2020, 501–05. doi:10.21437/Interspeech.2020-1180
  • Goshvarpour, A., and A. Goshvarpour. 2023. Novel high-dimensional phase space features for EEG emotion recognition. Signal, Image and Video Processing 17 (2):417–25. doi:10.1007/s11760-022-02248-6
  • Huang, W., S. Cai, H. Li, and Q. Cai. 2023. Structure graph refined information propagate network for aspect-based sentiment analysis. International Journal of Data Warehousing and Mining 19 (1):1–20. doi:10.4018/IJDWM.327363
  • Kakuba, S., A. Poulose, and D. S. Han. 2022a. Attention-based multi-learning approach for speech emotion recognition with dilated convolution. IEEE Access 10:122302–13. doi:10.1109/ACCESS.2022.3223705
  • Kakuba, S., A. Poulose, and D. S. Han. 2022b. Deep learning-based speech emotion recognition using multi-level fusion of concurrent features. IEEE Access 10:125538–51. doi:10.1109/ACCESS.2022.3225684
  • Kakuba, S., A. Poulose, and D. S. Han. 2023. Deep learning approaches for bimodal speech emotion recognition: Advancements, challenges, and a multi-learning model. IEEE Access 11:113769–89. doi:10.1109/ACCESS.2023.3325037
  • Kumar, Y., and M. Mahajan. 2019. Machine learning based speech emotions recognition system. International Journal of Scientific and Technology Research 8 (7):722–29.
  • Latif, S., R. K. Rana, S. Khalifa, R. Jurdak, and B. Schuller. 2022. Multitask learning from augmented auxiliary data for improving speech emotion recognition. IEEE Transactions on Affective Computing 14 (4). doi:10.1109/TAFFC.2022.3221749
  • Le, H., G. Lee, S. Kim, S. Kim, and H. Yang. 2023. Multi-label multimodal emotion recognition with transformer-based fusion and emotion-level representation learning. IEEE Access 11:14742–51. doi:10.1109/ACCESS.2023.3244390
  • Lian, Z., B. Liu, and J. Tao. 2021. CTNet: Conversational transformer network for emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29:985–1000. doi:10.1109/TASLP.2021.3049898
  • Lian, Z., B. Liu, and J. Tao. 2022. SMIN: Semi-supervised multi-modal interaction network for conversational emotion recognition. IEEE Transactions on Affective Computing 14:2415–29. doi:10.1109/TAFFC.2022.3141237
  • Li, H., W. Ding, Z. Wu, and Z. Liu. 2020. Learning fine-grained cross modality excitement for speech emotion recognition. arXiv preprint arXiv:2010.12733.
  • Lu, Z., L. Cao, Y. Zhang, C. Chiu, and J. Fan. 2020. Speech sentiment analysis via pre-trained features from end-to-end ASR models. In IEEE ICASSP, Barcelona, Spain, 7149–53. doi:10.1109/ICASSP40776.2020.9052937
  • Ma, H., J. Wang, H. Lin, B. Zhang, Y. Zhang, and B. Xu. 2023. A transformer-based model with self-distillation for multimodal emotion recognition in conversations. arXiv preprint arXiv:2310.20494.
  • Meng, W., and N. Yolwas. 2023. A study of speech recognition for Kazakh based on unsupervised pre-training. Sensors 23 (2):870. doi:10.3390/s23020870
  • Min, B., H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz, E. Agirre, I. Heintz, and D. Roth. 2023. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys 56 (2):1–40. doi:10.1145/3605943
  • Morais, E. D., R. Hoory, W. Zhu, I. Gat, M. Damasceno, and H. Aronowitz. 2022. Speech emotion recognition using self-supervised features. arXiv preprint arXiv:2202.03896.
  • Padi, S., S. O. Sadjadi, D. Manocha, and R. D. Sriram. 2022. Multimodal emotion recognition using transfer learning from speaker recognition and BERT-based models. arXiv preprint arXiv:2202.08974.
  • Paszke, A., S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32.
  • Ribeiro, A. H., and T. B. Schön. 2023. Overparameterized linear regression under adversarial attacks. IEEE Transactions on Signal Processing 71:601–14. doi:10.1109/TSP.2023.3246228
  • Sajjad, M., and S. Kwon. 2020. Clustering-based speech emotion recognition by incorporating learned features and deep BiLSTM. IEEE Access 8:79861–75. doi:10.1109/ACCESS.2020.2990405
  • Samant, S. S., V. Singh, A. Chauhan, and J. Dasarahalli Narasimaiah. 2022. An optimized crossover framework for social media sentiment analysis. Cybernetics and Systems 1–29. doi:10.1080/01969722.2022.2146849
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
  • Sarvakar, K., R. Senkamalavalli, S. Raghavendra, J. S. Kumar, R. Manjunath, and S. Jaiswal. 2023. Facial emotion recognition using convolutional neural networks. Materials Today: Proceedings 80:3560–64. doi:10.1016/j.matpr.2021.07.297
  • Schneider, S., A. Baevski, R. Collobert, and M. Auli. 2019. wav2vec: Unsupervised pre-training for speech recognition. arXiv preprint arXiv:1904.05862.
  • Sebastian, J., and P. Pierucci. 2019. Fusion techniques for utterance level emotion recognition combining speech and transcripts. In Interspeech 2019, 51–55. doi:10.21437/Interspeech.2019-3201
  • Sharafi, M., M. Yazdchi, R. Rasti, and F. Nasimi. 2022. A novel spatio-temporal convolutional neural framework for multimodal emotion recognition. Biomedical Signal Processing and Control 78:103970. doi:10.1016/j.bspc.2022.103970
  • Singh, P., R. Srivastava, K. P. Rana, and V. Kumar. 2021. A multimodal hierarchical approach to speech emotion recognition from audio and text. Knowledge-Based Systems 229:107316. doi:10.1016/j.knosys.2021.107316
  • Tsai, Y. H., S. Bai, P. P. Liang, J. Z. Kolter, L. Morency, and R. Salakhutdinov. 2019. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 6558–69. doi:10.18653/v1/p19-1656
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30.
  • Wang, H., X. Li, Z. Ren, M. Wang, and C. Ma. 2023. Multimodal sentiment analysis representations learning via contrastive learning with condense attention fusion. Sensors 23 (5):2679. doi:10.3390/s23052679
  • Wang, J., M. Xue, R. Culhane, E. Diao, J. Ding, and V. Tarokh 2020. Speech emotion recognition with dual-sequence LSTM architecture. In IEEE ICASSP, Barcelona, Spain 6474–78. doi:10.1109/ICASSP40776.2020.9054629
  • Wen, H., S. You, and Y. Fu. 2021. Cross-modal dynamic convolution for multi-modal emotion recognition. Journal of Visual Communication and Image Representation 78:103178. doi:10.1016/j.jvcir.2021.103178
  • Wu, X., S. Lv, L. Zang, J. Han, and S. Hu. 2019. Conditional BERT contextual augmentation. In International Conference on Computational Science. Cham: Springer. doi:10.1007/978-3-030-22747-0_7
  • Xie, B., M. Sidulova, and C. H. Park. 2021. Robust multimodal emotion recognition from conversation with transformer-based crossmodality fusion. Sensors 21 (14):4913. doi:10.3390/s21144913
  • Xu, M., F. Zhang, and W. Zhang. 2021. Head fusion: Improving the accuracy and robustness of speech emotion recognition on the IEMOCAP and RAVDESS dataset. IEEE Access 9:74539–49. doi:10.1109/ACCESS.2021.3067460
  • Yang, E., J. W. Pan, X. M. Wang, H. B. Yu, L. Shen, X. H. Chen, L. Xiao, J. Jiang, and G. B. Guo. 2023. AdaTask: A task-aware adaptive learning rate approach to multi-task learning. Proceedings of the AAAI Conference on Artificial Intelligence 37 (9):10745–53. doi:10.1609/aaai.v37i9.26275
  • Zeng, Y., H. Mao, D. Peng, and Z. Yi. 2019. Spectrogram based multi-task audio classification. Multimedia Tools and Applications 78 (3):3705–22. doi:10.1007/s11042-017-5539-3