ABSTRACT
In everyday communication, natural spoken sentences are expressed multimodally, through auditory signals and speakers’ visible articulatory gestures. A key question is whether audiovisual speech plays a central role in the linguistic encoding of an utterance, up to and including access to meaning. To address this question, we conducted an event-related potential experiment in which participants listened passively to spoken sentences and then performed a lexical recognition task. The results revealed that the N200 and N400 waves had a greater amplitude after semantically incongruous words than after expected words. This semantic congruency effect was larger over the N200 in the audiovisual trials. Words presented audiovisually also elicited a reduced N400 amplitude and were retrieved more easily from memory. Our findings shed light on how audiovisual speech influences the comprehension of natural spoken sentences, acting on the early stages of word recognition to give access to the lexical-semantic network.
Acknowledgements
We are very grateful to Adèle Delalleau, Benjamin Lob, Amandine Lepachelet, Laurent Ott and Maeva Veber for their help in selecting the stimuli and running the experiment. We also thank Perrine Janssoone for recording the stimuli. ERP analyses were performed with the Cartool software (supported by the Center for Biomedical Imaging in Geneva and Lausanne). The manuscript was proofread by a native-speaking English copy-editor. We thank the anonymous reviewers for their helpful comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 Previous studies examining the N100 explored when multisensory integration takes place (e.g. Besle et al., Citation2004; Klucharev et al., Citation2003; Pilling, Citation2009; van Wassenhove et al., Citation2005). They relied on the rationale of the additive model (i.e. differences between the summed unimodal activities and the activity generated by the audiovisual condition). However, this approach can lead to biases (Besle, Fort, & Giard, Citation2004; Stekelenburg & Vroomen, Citation2007; Teder-Salejarvi, McDonald, Di Russo, & Hillyard, Citation2002). These biases stem from the assumption that unimodal auditory stimuli and unimodal visual stimuli are processed independently. In fact, common activity reflecting attentional modulation, working memory or other higher cognitive processes may be associated with the processing of both types of stimuli (auditory and visual). This issue is particularly problematic when investigating effects beyond 200 ms after stimulus onset, where such higher cognitive processes are likely to occur. Interestingly, Baart (Citation2016) demonstrated that the amplitude suppression and latency shortening of the N100/P200 by audiovisual speech were not modulated by whether or not the visual-only condition was subtracted from the audiovisual condition. For all these reasons, it was legitimate to compare audiovisual and auditory-only trials in the present study.
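The additive-model rationale described above can be sketched numerically. The following minimal illustration uses hypothetical single-channel ERP waveforms (the array names and amplitude values are invented for demonstration, not real data):

```python
import numpy as np

# Hypothetical single-channel ERP amplitudes (in microvolts) at five
# successive time points. These values are illustrative placeholders.
erp_auditory = np.array([0.0, -1.2, -2.5, -1.0, 0.5])     # auditory-only condition
erp_visual = np.array([0.0, -0.3, -0.8, -0.4, 0.1])       # visual-only condition
erp_audiovisual = np.array([0.0, -1.0, -2.6, -1.1, 0.4])  # audiovisual condition

# Additive model: compare the audiovisual response against the sum of the
# two unimodal responses. A nonzero difference wave, AV - (A + V), is taken
# as evidence of multisensory interaction.
difference_wave = erp_audiovisual - (erp_auditory + erp_visual)

# Caveat noted in the text: any activity common to both unimodal conditions
# (attention, working memory, other higher cognitive processes) is counted
# twice in the A + V sum, biasing the difference wave -- especially at
# latencies beyond 200 ms, where such processes are likely to occur.
print(difference_wave)
```

This makes the bias concrete: if a shared component of, say, 0.5 µV were present in both unimodal recordings, it would inflate the A + V sum by 0.5 µV and distort the difference wave even in the absence of any true interaction.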
2 Left anterior: D10, D11, D12, D13, D14, D18, D19; right anterior: B20, B21, B29, B30, B31, B32, B22; frontocentral: D2, C2, FCz, C24, C22, C11, Fz; centroparietal: Cz, A2, CPz, B1, B2, D15, D16; left parietal: A17, A16, A9, A8, A7, A6, D29; right parietal: A30, A29, B3, B4, B5, B6, B13; and occipito-parietal: A5, A18, A20, Pz, POz, A31, A32.
3 An ANOVA based exclusively on the mean amplitude of the centroparietal sites, where the N100 amplitude was found to be strongest, revealed no significant effect of Modality (F(1,31) = 1.86, MSE = 25.78, p = .18, η²p = .005). The same analysis based on the mean amplitude or the peak amplitude over CPz and Cz again showed no significant effect of Modality (CPz, mean amplitude: F(1,31) = 0.35, MSE = 2.07, p = .56, η²p = .011; CPz, peak amplitude: F(1,31) = 0.003, MSE = 2.15, p = .99, η²p = 10⁻⁵; Cz, mean amplitude: F(1,31) = 2, MSE = 9.1, p = .17, η²p = .06; Cz, peak amplitude: F(1,31) = 1.1, MSE = 10.98, p = .32, η²p = .03). Furthermore, an analysis based on the peak latency over CPz and Cz revealed no main effect of Modality (CPz: F(1,31) = 1, MSE = 407, p = .33, η²p = .03; Cz: F(1,31) = 3.1, MSE = 327, p = .10, η²p = .02).
4 The P200 is apparent during listening to natural speech when auditory evoked spread spectrum analysis (AESPA) is used (Power, Foxe, Forde, Reilly, & Lalor, Citation2012). The AESPA method is sensitive to electrophysiological brain responses related to the amplitude envelope of natural continuous speech. In contrast to this approach, we tracked ERP responses to expected and incongruous words in the context of semantically constraining sentences. In such experimental designs (e.g. Connolly & Phillips, Citation1994; Connolly et al., Citation1990; van den Brink & Hagoort, Citation2004; van den Brink et al., Citation2001), an N200 is elicited when the initial phonemes of the perceived word do not match those of the word expected from the sentence constraints.