ABSTRACT
In everyday communication, natural spoken sentences are expressed multimodally, through auditory signals and speakers’ visible articulatory gestures. A key question is whether audiovisual speech plays a central role in the linguistic encoding of an utterance, up to and including access to meaning. To address this question, we conducted an event-related potential experiment in which participants listened passively to spoken sentences and then performed a lexical recognition task. The results revealed that the N200 and N400 waves had a greater amplitude after semantically incongruous words than after expected words. This semantic congruency effect was larger over the N200 in the audiovisual trials. Words presented audiovisually also elicited a reduced N400 amplitude and were retrieved more easily from memory. Our findings shed light on how audiovisual speech influences the comprehension of natural spoken sentences, acting on the early stages of word recognition to give access to the lexical-semantic network.
Acknowledgements
We are very grateful to Adèle Delalleau, Benjamin Lob, Amandine Lepachelet, Laurent Ott and Maeva Veber for their help in selecting the stimuli and running the experiment. We also thank Perrine Janssoone for recording the stimuli. ERP analyses were performed with the Cartool software (supported by the Center for Biomedical Imaging in Geneva and Lausanne). The manuscript was proofread by a native-speaking English copy-editor. We thank the anonymous reviewers for their helpful comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 Previous studies examining the N100 explored when multisensory integration takes place (e.g. Besle et al., Citation2004; Klucharev et al., Citation2003; Pilling, Citation2009; van Wassenhove et al., Citation2005). They relied on the rationale of the additive model (i.e. differences between the summed unimodal activities and the activity generated by the audiovisual condition). However, this approach can lead to biases (Besle, Fort, & Giard, Citation2004; Stekelenburg & Vroomen, Citation2007; Teder-Salejarvi, McDonald, Di Russo, & Hillyard, Citation2002). These biases stem from the assumption that unimodal auditory stimuli and unimodal visual stimuli are processed independently. In fact, common activity reflecting attentional modulation, working memory or other higher cognitive processes may be associated with the processing of both types of stimuli (auditory and visual). This issue is particularly problematic when investigating effects beyond 200 ms after stimulus onset, where such higher cognitive processes are likely to occur. Interestingly, Baart (Citation2016) demonstrated that the amplitude suppression and latency shortening of the N100/P200 by audiovisual speech were not modulated by whether or not the visual-only condition was subtracted from the audiovisual condition. For all these reasons, it was legitimate to compare audiovisual and auditory-only trials in the present study.
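The additive-model rationale described above can be sketched numerically. The following minimal illustration uses hypothetical single-channel ERP waveforms (the array names and amplitude values are invented for demonstration, not real data):

```python
import numpy as np

# Hypothetical single-channel ERP amplitudes (in microvolts) at five
# successive time points. These values are illustrative placeholders.
erp_auditory = np.array([0.0, -1.2, -2.5, -1.0, 0.5])     # auditory-only condition
erp_visual = np.array([0.0, -0.3, -0.8, -0.4, 0.1])       # visual-only condition
erp_audiovisual = np.array([0.0, -1.0, -2.6, -1.1, 0.4])  # audiovisual condition

# Additive model: compare the audiovisual response against the sum of the
# two unimodal responses. A nonzero difference wave, AV - (A + V), is taken
# as evidence of multisensory interaction.
difference_wave = erp_audiovisual - (erp_auditory + erp_visual)

# Caveat noted in the text: any activity common to both unimodal conditions
# (attention, working memory, other higher cognitive processes) is counted
# twice in the A + V sum, biasing the difference wave -- especially at
# latencies beyond 200 ms, where such processes are likely to occur.
print(difference_wave)
```

This makes the bias concrete: if a shared component of, say, 0.5 µV were present in both unimodal recordings, it would inflate the A + V sum by 0.5 µV and distort the difference wave even in the absence of any true interaction.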
2 Left anterior: D10, D11, D12, D13, D14, D18, D19; right anterior: B20, B21, B29, B30, B31, B32, B22; frontocentral: D2, C2, FCz, C24, C22, C11, Fz; centroparietal: Cz, A2, CPz, B1, B2, D15, D16; left parietal: A17, A16, A9, A8, A7, A6, D29; right parietal: A30, A29, B3, B4, B5, B6, B13; and occipito-parietal: A5, A18, A20, Pz, POz, A31, A32.
3 An ANOVA based exclusively on the mean amplitude of the centroparietal sites, where the N100 amplitude was found to be strongest, revealed no significant effect of Modality (F(1,31) = 1.86, MSE = 25.78, p = .18, η²p = .005). The same analysis based on the mean amplitude or the peak amplitude over CPz and Cz again showed no significant effect of Modality (CPz, mean amplitude: F(1,31) = 0.35, MSE = 2.07, p = .56, η²p = .011; CPz, peak amplitude: F(1,31) = 0.003, MSE = 2.15, p = .99, η²p = 10⁻⁵; Cz, mean amplitude: F(1,31) = 2, MSE = 9.1, p = .17, η²p = .06; Cz, peak amplitude: F(1,31) = 1.1, MSE = 10.98, p = .32, η²p = .03). Furthermore, an analysis based on the peak latency over CPz and Cz revealed no main effect of Modality (CPz: F(1,31) = 1, MSE = 407, p = .33, η²p = .03; Cz: F(1,31) = 3.1, MSE = 327, p = .10, η²p = .02).
4 The P200 is apparent during listening to natural speech when auditory evoked spread spectrum analysis (AESPA) is used (Power, Foxe, Forde, Reilly, & Lalor, Citation2012). The AESPA method is sensitive to electrophysiological brain responses related to the amplitude envelope of natural continuous speech. In contrast to this approach, we tracked ERP responses to expected and incongruous words in the context of semantically constraining sentences. In such experimental designs (e.g. Connolly & Phillips, Citation1994; Connolly et al., Citation1990; van den Brink & Hagoort, Citation2004; van den Brink et al., Citation2001), an N200 is elicited when the initial phonemes of the perceived word do not match those of the word expected from the sentence constraints.