Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture: Applied Artificial Intelligence: Vol 38, No 1

ABSTRACT

In the field of emotion recognition, analyzing emotions from speech alone (single-modal speech emotion recognition) has several limitations, including limited data volume and low accuracy. In addition, single-task models generalize poorly and fail to fully utilize relevant information. To address these issues, this paper proposes a new bi-modal bi-task emotion recognition model that introduces multi-task learning on the Transformer architecture. On one hand, unsupervised contrastive predictive coding is used to extract denser features from the data while preserving self-information and context-related information. On the other hand, self-supervised contrastive learning is employed to make the model more robust against interfering information. Furthermore, a modality fusion module incorporates textual and audio information and implicitly aligns features from both modalities. The proposed model achieved a weighted accuracy (WA) of 82.3% and 83.5% on the IEMOCAP and RAVDESS datasets, respectively; the corresponding unweighted accuracy (UA) was 83.0% and 82.4%. This improves on the performance of existing methods.
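The abstract reports both WA and UA without defining them. The sketch below illustrates the convention commonly used in speech emotion recognition, which we assume (but cannot confirm from the abstract) is the one followed here: WA is the overall fraction of correctly classified samples, and UA is the mean of the per-class recalls, so every emotion class counts equally regardless of its size. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls; each class contributes equally."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Toy, imbalanced example (illustrative labels only, not paper data).
    y_true = ["neutral"] * 8 + ["angry"] * 2
    y_pred = ["neutral"] * 8 + ["neutral", "angry"]
    print(f"WA = {weighted_accuracy(y_true, y_pred):.2f}")
    print(f"UA = {unweighted_accuracy(y_true, y_pred):.2f}")
```

On imbalanced data the two measures can diverge: in the toy example one of the two minority-class samples is misclassified, which barely affects WA but halves that class's recall and therefore lowers UA noticeably.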

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their contributions toward improving the quality of this paper.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The data used to support the findings of this study are included within the article.

Additional information

Funding

This work was supported by the Guangdong Provincial Major Research Platform Ordinary University Characteristic Innovation Project Fund [No. 2022KTSCX204].
