Original Articles

Perceiving emotion from a talker: How face and voice work together

Pages 902-921 | Received 16 Dec 2011, Accepted 17 Jul 2012, Published online: 03 Sep 2012
 

Abstract

The experiment investigated how the addition of emotion information from the voice affects the identification of facial emotion. We presented whole-face, upper-face, and lower-face displays and examined correct recognition rates and patterns of response confusions for auditory-visual (AV), auditory-only (AO), and visual-only (VO) expressive speech. Emotion recognition accuracy was superior for AV compared to unimodal presentation. The pattern of response confusions differed across the unimodal conditions and across display types. For AV presentation, a response confusion occurred only when that confusion was present in each modality separately; thus, response confusions were reduced relative to unimodal presentations. Emotion space (calculated from the confusion data) differed across display types for the VO presentations but was more similar across the AV ones, indicating that the addition of the auditory information acted to harmonize the various VO response patterns. These results are discussed with respect to how bimodal emotion recognition combines auditory and visual information.
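To make the intersection pattern concrete, the following is a minimal sketch in Python. It is not the study's analysis code: the confusion rates, the three-emotion set, and the 0.10 threshold are illustrative assumptions. It shows how an AV confusion would be predicted only where a confusion exists in both the AO and VO confusion matrices.

```python
import numpy as np

# Illustrative confusion matrices (NOT the study's data).
# Rows = presented emotion, columns = response; off-diagonal
# cells are response confusions.
emotions = ["happy", "sad", "anger"]

ao = np.array([[0.80, 0.15, 0.05],   # auditory-only (illustrative)
               [0.20, 0.70, 0.10],
               [0.05, 0.25, 0.70]])
vo = np.array([[0.90, 0.02, 0.08],   # visual-only (illustrative)
               [0.15, 0.75, 0.10],
               [0.03, 0.22, 0.75]])

threshold = 0.10                     # assumed cut-off for counting a confusion
off_diag = ~np.eye(len(emotions), dtype=bool)

ao_conf = (ao > threshold) & off_diag
vo_conf = (vo > threshold) & off_diag

# Under the reported pattern, a confusion survives bimodal (AV)
# presentation only if it occurs in BOTH unimodal conditions:
# an element-wise intersection of the two confusion patterns.
av_conf = ao_conf & vo_conf
for i, j in zip(*np.nonzero(av_conf)):
    print(f"{emotions[i]} -> {emotions[j]} confusion predicted for AV")
```

On this toy input, only the confusions present in both unimodal matrices (sad responded as happy, anger responded as sad) survive into the predicted AV pattern.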

Acknowledgments

Portions of this study were presented at the Ninth International Conference on Auditory-Visual Speech Processing (AVSP 2010). The authors wish to thank N. Xu, C. Gasparini, J. Ramos, B. Kasisopa, and E. Cvejic for their assistance in conducting the experiment, and acknowledge support from the ARC (DP0666857 and TS0669874).

Notes

1. The upper/lower face manipulation was chosen because there is evidence that the importance of face regions varies across emotion type. Indeed, the idea that different face regions are more prominent in displaying emotion has a long history. For example, Wundt (1911) speculated that the lower face (mouth region) is fundamental to the expression of the emotions due to the correlation of pleasantness and unpleasantness with qualities of taste, e.g., sweet and bitter. The upper and lower face display conditions were also selected because such a division fits with an argument that facial expressions are functionally organized across the upper–lower facial axis (Ross, Prodan, & Monnot, 2007).

2. Our interest was not particularly with whether the obtained space had dimensions similar to those that have been derived from ratings of static faces. Nevertheless, the dimensions for the MDS solution of the VO whole face presentation condition (black line in the centre panel) were consistent with previous findings that had used static photographs (e.g., Russell & Bullock, 1985). That is, the vertical axis of this space was consistent with a sleepiness–arousal dimension (from sad at the bottom, through happy, anger, to surprise, and disgust) and the horizontal axis consistent with a pleasant–unpleasant dimension (from happy on the left, through surprise, sad, disgust, then anger).
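As a rough illustration of how an emotion space of this kind can be derived from confusion data, the sketch below applies multidimensional scaling to a symmetrised confusion matrix. The matrix values, the similarity-to-dissimilarity conversion, and the use of scikit-learn's MDS are assumptions for illustration, not the authors' procedure.

```python
import numpy as np
from sklearn.manifold import MDS

# Illustrative confusion matrix (NOT the study's data).
# Rows = presented emotion, columns = response.
emotions = ["happy", "surprise", "sad", "disgust", "anger"]
conf = np.array([[0.85, 0.08, 0.03, 0.02, 0.02],
                 [0.10, 0.75, 0.05, 0.05, 0.05],
                 [0.02, 0.05, 0.80, 0.08, 0.05],
                 [0.02, 0.06, 0.07, 0.70, 0.15],
                 [0.02, 0.05, 0.05, 0.18, 0.70]])

sim = (conf + conf.T) / 2        # symmetrise: mutual confusability as similarity
dissim = 1.0 - sim               # more confusable = closer together in the space
np.fill_diagonal(dissim, 0.0)

# Embed the emotions in two dimensions from the precomputed dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
for name, (x, y) in zip(emotions, coords):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

Emotions that are frequently confused land near one another in the resulting configuration, so dimensions such as pleasantness or arousal can then be read off the axes, as in the note above.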
