Abstract
When watching others describe events, does information from their speech and gestures affect our memory representations for the gist and surface form of the described events? Does our reliance on these memory representations change over time? Forty participants watched videos of stories narrated by an actor. Each story included three target events that differed in their speech-gesture congruency for particular actions (congruent speech/gesture, incongruent speech/gesture, or speech with no gesture). Participants were prompted to reproduce the target event sentences after delays of 2, 6, or 18 minutes. Seeing gestures, whether congruent or incongruent, led to better gist recall (more mentions of the target action, more gestures for the target action, and more complete target events) than not seeing gestures. However, seeing incongruent gestures sometimes led to reproductions of the incongruent gestures, particularly after short delays, as well as to inaccuracies in speech. Our results suggest that over time people increasingly rely on multimodal gist-based representations and rely less on representations that include surface and source information about speech and gesture.
Acknowledgements
This material is based upon work supported by NSF under grants IIS-0527585 and ITR-0325188, and by NIH grants R01-51663 and R01-059787. We would like to thank Richard Gerrig, Susan Brennan, and our colleagues from the Gesture Focus Group for helpful discussion. We are grateful to Drew Boudreau for serving as our actor, Matthew Belevich, Gennadiy Ryklin, and Marina Khan for help with coding, and Donna Kat for her enormous technical assistance. We also thank two anonymous reviewers for suggestions that significantly improved this paper.
Notes
1. Although a potential criticism is that incongruities between verbal and nonverbal information are unnatural or implausible in communication, there is evidence that they are not. Speech-gesture mismatches are prevalent at transitional points of cognitive development (e.g., Church & Goldin-Meadow, 1986; Goldin-Meadow, Nusbaum, Garber, & Church, 1993; Perry, Church, & Goldin-Meadow, 1988), and adults use such mismatching gestures to assess children's knowledge (Alibali, Flevares, & Goldin-Meadow, 1997; Goldin-Meadow & Singer, 2003; Goldin-Meadow, Wein, & Chang, 1992). Mismatches can also occur between people's speech and facial displays, and observers readily interpret them. For example, positively valued utterances paired with negatively valued facial expressions and vocal qualities were judged by respondents to be sarcastic, while negatively valued utterances paired with positively valued nonverbal displays were judged to involve joking (Bugental, Kaswan, & Love, 1970). As Bavelas and Chovil (2000) have pointed out, addressees interpret inconsistent pairings of verbal and nonverbal information felicitously, assuming that the speaker intends them to be part of a single unified message, consistent with Grice's (1967/1989) cooperative principle. Thus, it is not the case that our participants should be unable or unwilling to interpret speech-gesture incongruities felicitously.
2. From this assessment of a total of 209 representational gestures for the target action, we excluded five quoted incongruent gestures. In these cases, participants consciously, rather than unwittingly, reproduced incongruities performed by the actor in the stimulus. Their quotation was marked by a combination of facial displays, laughing, and exaggerated gesturing.