Abstract
This cross-cultural study of the recognition of emotional tone of voice tests the in-group advantage hypothesis (Elfenbein & Ambady, 2002) using a quasi-balanced design. Participants of Chinese and British background were asked to recognise pseudosentences produced by Chinese and British native speakers, each conveying one of seven emotional tones (anger, disgust, fear, happiness, neutral, sadness, and surprise). Findings reveal that emotional displays were recognised at rates higher than predicted by chance; however, members of each cultural group were more accurate in recognising displays produced by members of their own cultural group than those produced by members of the other cultural group. Moreover, the evaluation of error matrices indicates that both cultural groups relied on similar mechanisms when recognising emotional displays from the voice. Overall, the study reveals evidence for both universal and culture-specific principles in vocal emotion recognition.
Authors share first authorship. AKU is now at the School of Psychology, University of Kent, UK.
We would like to thank Sarah Harris, Laura King, Constance Lau, Yuchen Lao, and Hoi Lee for their help with stimulus preparation and data collection.
Notes
1 The use of acted speech samples has previously been challenged. However, there is clear empirical evidence that speech samples obtained from well-trained actors contain acoustic features very comparable to those of samples obtained in non-posed situations (see, e.g., Scherer, 2013, for a recent comparison of mood-induced and acted materials). Given that the current study set out to compare vocal emotion recognition across languages, it was crucial to use controlled, good-quality recordings, which could not have been achieved with spontaneous speech (see also, e.g., Banse & Scherer, 1996, for a discussion of the advantages and disadvantages of using acted speech samples).
2 It was not possible to meet this selection criterion for Chinese sentences produced in a tone conveying disgust. Thus, for this condition the best 28 sentences were included, disregarding the selection criterion.
3 All reported discriminant analyses were successfully cross-validated in SPSS with randomly selected subsamples.
4 When analyses were repeated using the acoustic parameter range dB (instead of mean dB), results showed a similar pattern. For Chinese materials, emotional category membership was predicted correctly for 61.2% of sentences: anger 46.4%; disgust 52.4%; fear 42.1%; happiness 44.4%; neutral 100%; sadness 66.7%; and pleasant surprise 80.0%. Results for English sentences were similar, with 70.4% of sentences classified correctly: anger 96.4%; disgust 64.3%; fear 25.0%; happiness 57.1%; neutral 78.6%; sadness 96.4%; and pleasant surprise 75.0%.
5 Again, similar results were found when range dB was used as the predictor (instead of mean dB): for Chinese sentences, the model correctly predicted 37.4% of the mistakes made by Chinese participants, while 41.3% of misclassified sentences were identified correctly by the model for English participants. Results for English sentences revealed that 43.2% of mistakes were predicted correctly for English participants, versus 46.9% for Chinese participants.
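The discriminant analyses described in notes 3–5 (predicting a sentence's emotional category from an acoustic parameter, validated on randomly selected subsamples) can be sketched as follows. This is a minimal illustration only: the data are synthetic, the 28-sentences-per-emotion set-up mirrors note 2, and names such as the dB levels are assumptions, not the study's actual measurements or its SPSS procedure.

```python
# Sketch of a discriminant analysis with split-sample cross-validation.
# All data here are synthetic; the single acoustic predictor stands in
# for a measure such as mean dB, and the class means are invented.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "neutral", "sadness", "surprise"]

# 28 sentences per emotion, each emotion given a different typical
# intensity level (hypothetical values).
X = np.concatenate([rng.normal(loc=60 + 3 * i, scale=2.0, size=(28, 1))
                    for i in range(len(EMOTIONS))])
y = np.repeat(EMOTIONS, 28)

# Randomly split the sentences into estimation and validation subsamples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Fit the discriminant model on one subsample ...
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# ... and report the percentage of held-out sentences whose emotional
# category membership is predicted correctly.
accuracy = lda.score(X_test, y_test)
print(f"cross-validated classification rate: {accuracy:.1%}")
```

With real materials, the reported per-emotion rates would come from the confusion matrix of the validation subsample rather than from a single overall accuracy figure.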
6 For ease of reading, we will from now on use the term “cultural group”.