Abstract
We examined whether facial expressions of performers influence the emotional connotations of sung materials, and whether attention is implicated in audio-visual integration of affective cues. In Experiment 1, participants judged the emotional valence of audio-visual presentations of sung intervals. Performances were edited such that auditory and visual information conveyed congruent or incongruent affective connotations. In the single-task condition, participants judged the emotional connotation of sung intervals. In the dual-task condition, participants judged the emotional connotation of intervals while performing a secondary task. Judgements were influenced by both melodic cues and facial expressions, and these effects were undiminished by the secondary task. Experiment 2 involved identical conditions, but participants were instructed to base judgements on auditory information alone. Again, facial expressions influenced judgements, and the effect was undiminished by the secondary task. The results suggest that visual aspects of music performance are automatically and preattentively registered and integrated with auditory cues.
Notes
1. Our manipulation of attention was adapted from the approach used by Vroomen et al. (2001). The latter study involved only three conditions (two single-task conditions and one dual-task condition), and numbers appeared at only one rate (five times per second). In one single-task condition, numbers appeared but participants ignored them; in the other single-task condition, no numbers appeared. Thus, the two studies manipulated attention in similar ways for the same purpose, but in the current study numbers always appeared and were presented at both slow and fast rates.
2. Two independent groups of participants with a similar distribution of music training provided emotion ratings for audio-only (n=21) and visual-only (n=20) presentations of the same stimuli. Sung major thirds presented as audio (M=3.96, SD=0.50) or video (M=4.21, SD=0.74) were assigned significantly higher ratings than minor thirds presented as audio (M=2.48, SD=0.84) or video (M=2.05, SD=0.86), F(1, 39) = 179.28, p<.0001. Mean ratings for incongruent audio-visual stimuli were intermediate between mean ratings for happy (major third) and sad (minor third) audio-alone and video-alone stimuli, confirming that visual and auditory cues both influenced emotion judgements. Ratings of happy audio-alone presentations were higher than ratings of happy-audio/sad-video presentations, t(57) = 8.35, p<.0001, and ratings of sad audio-alone presentations were lower than ratings of sad-audio/happy-video presentations, t(57) = −4.99, p<.0001. Similarly, ratings of happy video-alone presentations were higher than ratings of happy-video/sad-audio presentations, t(56) = 3.51, p<.001, and ratings of sad video-alone presentations were lower than ratings of sad-video/happy-audio presentations, t(57) = −3.70, p<.0001. Thus, when presented with incongruent cues from two sensory modalities, participants did not rely on one channel as the basis of their rating, but integrated audio and visual signals to generate a rating that reflected a balance between the two cues.
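The pairwise comparisons in Note 2 are standard two-sample t-tests on mean ratings. As a sketch of that computation (the rating values below are invented for illustration and are not the study's data; the function name is our own), the pooled-variance t statistic can be computed as:

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    Uses sample variances (n - 1 denominator), pooled across the two
    groups, as in the standard independent-samples t-test formula.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical 7-point valence ratings for two groups (illustrative only):
happy_audio_alone = [4.5, 4.0, 3.8, 4.2, 4.4]
happy_audio_sad_video = [3.2, 3.0, 3.5, 2.9, 3.3]
t, df = two_sample_t(happy_audio_alone, happy_audio_sad_video)
# A positive t here indicates the audio-alone group's mean rating is higher,
# matching the direction of the congruent-vs-incongruent contrasts in Note 2.
```

In practice one would use a statistics package (e.g., `scipy.stats.ttest_ind`) and obtain the p-value from the t distribution with the returned degrees of freedom; the function above only makes the arithmetic behind the reported t(df) values explicit.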