Research Article

Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

Article: 2356992 | Received 02 Jan 2024, Accepted 13 May 2024, Published online: 21 May 2024

ABSTRACT

In the field of emotion recognition, analyzing emotions from speech alone (single-modal speech emotion recognition) has several limitations, including limited data volume and low accuracy. Additionally, single-task models generalize poorly and fail to fully exploit related information. To address these issues, this paper proposes a new bi-modal bi-task emotion recognition model that introduces multi-task learning into the Transformer architecture. On one hand, unsupervised contrastive predictive coding extracts denser features from the data while preserving self-information and context-related information. On the other hand, self-supervised contrastive learning enhances the model's robustness against interfering information. Furthermore, the proposed model combines textual and audio information through a modality fusion module, implicitly aligning the features of the two modalities. The model achieves weighted accuracy (WA) of 82.3% and 83.5% on the IEMOCAP and RAVDESS datasets, respectively, and unweighted accuracy (UA) of 83.0% and 82.4%. These results represent a further improvement over existing methods.
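To make the two ideas in the abstract concrete, the sketch below shows (a) an InfoNCE-style contrastive objective, the loss family underlying both contrastive predictive coding and self-supervised contrastive learning, and (b) a minimal audio-text fusion block in which a shared Transformer encoder attends across the concatenated token sequences of both modalities, implicitly aligning them. This is an illustrative sketch assuming PyTorch, not the paper's implementation; names such as AudioTextFusion, info_nce_loss, and the dimensions are hypothetical.

```python
# Hedged sketch (assumes PyTorch). Illustrates an InfoNCE contrastive loss
# and a simple bi-modal Transformer fusion; all names and sizes are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE: row i of `positives` is the positive for anchor i;
    all other rows in the batch act as negatives."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature          # (B, B) similarities
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)


class AudioTextFusion(nn.Module):
    """Implicit alignment: a shared Transformer encoder attends across the
    concatenated audio and text token sequences of each utterance."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.modality_embed = nn.Embedding(2, d_model)      # 0 = audio, 1 = text

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # audio: (B, T_audio, d_model), text: (B, T_text, d_model)
        audio = audio + self.modality_embed.weight[0]
        text = text + self.modality_embed.weight[1]
        fused = self.encoder(torch.cat([audio, text], dim=1))
        return fused.mean(dim=1)                            # pooled bi-modal vector


if __name__ == "__main__":
    fusion = AudioTextFusion()
    a = torch.randn(8, 50, 256)   # e.g. 50 audio frames per utterance
    t = torch.randn(8, 20, 256)   # e.g. 20 text tokens per utterance
    z = fusion(a, t)
    # Toy contrastive objective between two slightly perturbed views:
    loss = info_nce_loss(z, z + 0.01 * torch.randn_like(z))
    print(z.shape, loss.item())
```

In a bi-task setup of the kind the abstract describes, a contrastive loss of this form would typically be optimized jointly with the supervised emotion-classification loss, so the encoder is pushed toward representations that are both discriminative and robust to interference.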

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their contributions toward improving the quality of this paper.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The data used to support the findings of this study are included within the article.

Additional Information

Funding

This work was supported by the Guangdong Provincial Major Research Platform Ordinary University Characteristic Innovation Project Fund [No. 2022KTSCX204].