COMMENTARIES

Balancing exogenous and endogenous cortical rhythms for speech and language requires a lot of entraining: a commentary on Meyer, Sun & Martin (2020)

Pages 1133-1137 | Received 04 Feb 2020, Accepted 19 Feb 2020, Published online: 27 Feb 2020
This article refers to:
Synchronous, but not entrained: exogenous and endogenous cortical rhythms of speech and language processing

One of the most fruitful areas of research in recent years in the cognitive neuroscience of language has investigated the role of neural entrainment in speech segmentation and comprehension (e.g. Giraud & Poeppel, 2012; Kösem & Van Wassenhove, 2017; Meyer, 2018). Another line of research has probed the role of neural synchronisation in the processing of words, phrases, sentences, discourse and event structures, and even pragmatics (e.g. Bastiaansen & Hagoort, 2006; Lewis, Wang, & Bastiaansen, 2015; Prystauka & Lewis, 2019; Weiss & Mueller, 2012). Traditionally these two lines of enquiry have foraged independently, each gathering new observations and new insights, and independently mapping out the dense forests that constitute our knowledge and understanding of how the brain gives rise to what is arguably our most uniquely human characteristic. Much progress has been made, and the time appears ripe to compare maps and try to discover common paths, clearings that seemed obvious to one but perhaps not the other line of enquiry, and tangled thickets that might best be explored in tandem.

In a recent article, Meyer, Sun, and Martin (2019; henceforth referred to as MSM) make a case for precisely such a synthesis between these two lines of research. MSM make the important distinction between what they refer to as entrainment proper and internal synchronicity. The key distinguishing feature is that entrainment proper requires an external (quasi-)rhythmic stimulus that drives cortical synchronicity at rates corresponding to rhythmic characteristics of the driving stimulus. Internal synchronicity, on the other hand, is endogenously generated based on abstract linguistic representations that allow for inferences about linguistic information at various levels of abstraction (e.g. words, phrases, sentences, etc.). Crucially, such synchronicity does not require these linguistic units to be marked in the physical characteristics of the speech stimulus.

From my perspective, MSM offer two major insights that have the potential to drive the field forward and to open new avenues of research. First, they suggest that much of what has been called entrainment in the literature might turn out to be internal synchronicity in disguise. The crux of their argument is that it is difficult to tell the two apart because the rates at which entrainment is observed very often overlap with the rates expected for internal synchronicity. MSM discuss a number of studies in the entrainment literature for which this is at least a plausible alternative explanation. A second important insight is MSM’s suggestion that entrainment proper and internal synchronicity may provide a degree of useful redundancy when it comes to speech segmentation and/or comprehension. The key suggestion is that these mechanisms may complement one another to bring about sustained entrainment when either mechanism alone may not be sufficient to allow for decoding the relevant information from the speech signal. Speech, for example, is only quasi-rhythmic, and often does not provide sufficient information in the physical stimulus to allow for entrainment proper. In such cases, internal synchronicity can provide a helping hand by “filling in the blanks”, so to speak, generating entrainment based on inferences from abstract symbolic representations about the expected rhythmicity of different types of linguistic information. MSM propose that both entrainment proper and internal synchronicity might be consistently at work during speech segmentation and comprehension, and that the relative weighting of these two factors may be modified appropriately depending on the situation.

I find it difficult to quibble with MSM and consider their proposal(s) to be very much in the spirit of recent thinking about how best to reconcile the literatures on entrainment and network synchronisation (Lakatos, Gross, & Thut, 2019). In the remainder of this commentary, I outline a few considerations that I think will be important to bear in mind while navigating some of the new paths opened up by MSM, and I speculate about some intriguing consequences in case MSM’s proposal turns out to be on the right track.

Meaningful measurements make meaningful distinctions

An important technical point that should be considered is the method used to quantify entrainment and/or synchronicity. For many, specifics of such technical details can often be dry, uninteresting, and have little influence on interpretation of the overall findings. In the particular case at hand, however, some of the specifics are immensely important for the kinds of conclusions one can draw from the findings. Some studies, for instance, have employed explicit measures of coupling (using measures like coherence or mutual information) between physical characteristics of the speech signal (e.g. amplitude envelope, syllable rate, phoneme rate, etc.) and frequency-specific spectral information (e.g. power, phase) in concurrently recorded magnetoencephalography/electroencephalography (M/EEG) data (e.g. Ahissar et al., 2001; Gross et al., 2013; Keitel, Gross, & Kayser, 2018; Lam, Hultén, Hagoort, & Schoffelen, 2018). Others have instead measured M/EEG while people listen to speech with known spectral characteristics centred at specific rates of interest (e.g. syllable rate, phoneme rate, etc.), and looked for entrainment by searching for corresponding peaks in M/EEG power spectra, or in measures of phase consistency across trials (e.g. Ding, Melloni, Zhang, Tian, & Poeppel, 2016; Kösem et al., 2018; Meyer, Henry, Gaston, Schmuck, & Friederici, 2017). Importantly, in the latter cases entrainment is inferred without explicitly computing a measure of coupling between the speech signal and the neural activity. Both approaches have merits, and each is typically an appropriate choice for the question of interest in the studies in which it is employed.
When synthesising the wider literature, however, the details are frequently glossed over and reviews will often treat both approaches as indicative of entrainment proper, when in fact only the former explicitly establishes a relationship with the external driving speech stimulus. A crucial corollary is that if one is expecting to observe internal synchronicity, then it does not make much sense to use measures that look for coupling between the speech stimulus and frequency-specific neural activity. This is because in such cases, the rates at which synchronicity is expected are internally generated based on inferences from abstract linguistic information, and should thus not (necessarily) be expected to have physically observable signatures in the speech signal itself.
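The difference between the two analysis strategies can be made concrete with a toy simulation. The sketch below uses purely synthetic data; the sampling rate, the 4 Hz "syllable rate", and all signal parameters are illustrative assumptions of mine, not values taken from any of the studies cited. It computes an explicit coupling measure (spectral coherence) between a simulated speech envelope and a simulated neural signal, which is the kind of evidence that licenses a claim of entrainment proper.

```python
import numpy as np
from scipy.signal import coherence

# Illustrative sketch with synthetic data: quantifying entrainment as
# explicit stimulus-brain coupling, rather than inferring it from peaks
# in the neural power spectrum alone. All parameters are assumptions.
fs = 200                               # sampling rate (Hz)
t = np.arange(0, 120, 1 / fs)          # two minutes of "recording"
rng = np.random.default_rng(0)

# Simplified 4 Hz "speech envelope" (roughly syllable rate) plus noise
envelope = np.sin(2 * np.pi * 4 * t) + 0.2 * rng.standard_normal(t.size)

# "Neural" signal partially driven by the envelope (entrainment proper)
neural = 0.5 * envelope + rng.standard_normal(t.size)

# Coherence spectrum between stimulus envelope and neural signal;
# entrainment proper shows up as elevated coherence at the stimulus rate
f, cxy = coherence(envelope, neural, fs=fs, nperseg=4 * fs)
i4 = np.argmin(np.abs(f - 4.0))        # frequency bin nearest 4 Hz
i20 = np.argmin(np.abs(f - 20.0))      # a control frequency
print(f"coherence at 4 Hz: {cxy[i4]:.2f}; at 20 Hz: {cxy[i20]:.2f}")
```

Internal synchronicity, by contrast, would produce a spectral peak in the neural data without any corresponding coherence with the stimulus, which is exactly why a peak in the M/EEG power spectrum alone cannot adjudicate between the two mechanisms.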

What’s the going rate and where does it originate?

In characterising the anatomical implementation of the endogenous-exogenous balance (relative weighting of internal synchronicity and entrainment proper), MSM suggest that entrainment proper might be restricted to primary sensory (in this case auditory) cortices. As the information generating the synchronicity becomes more abstract and categorical in nature, the balance shifts towards internal synchronicity and away from entrainment proper. MSM propose that this shift may be reflected in internal synchronicity moving out from auditory association cortices and spreading to more widely distributed brain networks. This raises the important question of which brain regions will be incorporated into these more widespread networks related to the processing of different kinds of abstract information. Similarly, one might ask whether such networks are mediated by internal synchronicity at different rates depending on the type of information or level of abstraction of the associated inferences. Alternatively, the observed internal synchronicity may be related to network size (Buzsáki, Logothetis, & Singer, 2013; Lakatos et al., 2019) rather than the type of linguistic information with which it is associated. Speculatively, one might imagine that there is some correlation between the degree of abstraction of linguistic information, the size of the neural networks expressing associated internal synchronicity, and the rate at which such networks express said synchronicity. This offers an intriguing hypothesis to explore in future experimental work.

A related question one might ask is how internal synchronicity in widespread networks influences entrainment proper. Should we expect internal synchronicity to exert its influence by matching the delta and/or theta rate typically observed for entrainment proper in auditory cortices? Alternatively, might it be the case that internal synchronicity in widespread brain networks modulates entrainment proper in sensory cortices by resetting or aligning the phase of oscillations in those regions, perhaps through a mechanism like cross-frequency coupling (e.g. Jensen & Colgin, 2007)? Working out precisely how (or even whether) internal synchronicity interacts with entrainment proper in the service of better aligning information uptake from the speech input with top-down inferences about more abstract linguistic representations will be a fascinating direction to pursue in future experimental and computational modelling work.
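One way such a cross-frequency interaction could in principle be quantified is sketched below, again on synthetic data: a mean-vector-length modulation index (in the spirit of Canolty and colleagues' widely used measure, not a method proposed by MSM) that captures how strongly the amplitude of a fast rhythm depends on the phase of a slow one. The specific frequencies and the signal construction are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

# Synthetic signal in which a 40 Hz "gamma" amplitude is modulated by the
# phase of a 2 Hz "delta" rhythm -- coupled by construction, so the
# modulation index below should come out clearly above zero.
fs = 500
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(1)

delta = np.sin(2 * np.pi * 2 * t)                  # slow "network" rhythm
gamma = (1 + delta) * np.sin(2 * np.pi * 40 * t)   # amplitude-modulated fast rhythm
signal = delta + gamma + 0.5 * rng.standard_normal(t.size)

def bandpass(x, lo, hi, fs):
    # Zero-phase band-pass filtering in second-order-sections form
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

phase = np.angle(hilbert(bandpass(signal, 1, 4, fs)))    # delta phase
amp = np.abs(hilbert(bandpass(signal, 35, 45, fs)))      # gamma amplitude

# Mean-vector-length modulation index: 0 = no phase-amplitude coupling
mi = np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)
print(f"modulation index: {mi:.2f}")
```

In an experimental setting the slow phase would come from the widespread network and the fast amplitude from auditory cortex, but whether that particular configuration holds is exactly the open empirical question raised above.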

A role for control?

Perhaps the most appealing aspect of models that incorporate entrainment for the purposes of speech segmentation (Ghitza, 2011; Giraud & Poeppel, 2012) is that the processing is largely automatic. In such models, auditory cortices entrain to the relevant rates (e.g. syllabic or phonemic) for speech segmentation, simply by virtue of being confronted with (quasi-)rhythmic speech input. The entire process is dynamic and self-organising (Giraud & Poeppel, 2012; Obleser & Kayser, 2019), and the auditory system is thought to be anatomically and physiologically predisposed (likely due to a combination of evolutionary history and statistical learning across development) to entrain more to speech input than to other auditory inputs, and to do so at specific frequencies corresponding to the relevant rates at which speech segmentation typically unfolds within a language. This elegant proposal allows for an explanation of speech segmentation that does not require intervention by some top-down monitoring system to keep track of whether the input constitutes speech or not.

The quasi-rhythmic nature of relevant characteristics of the speech signal to which the auditory system is expected to entrain has posed a niggling question to this otherwise elegant theory of speech segmentation by the brain’s auditory system. MSM’s proposal that a fundamental part of the dynamics of the system is a tradeoff in the relative weighting of entrainment proper and internal synchronicity is itself a very elegant way to deal with this observation that the speech signal is not perfectly rhythmic. The possibility that, in cases where the speech signal is not perfectly isochronous, internal synchronicity is weighted more highly and essentially “fills in the blanks” to drive the entrainment of auditory cortices provides an appealing answer to such questions.

One important question that arises, however, is precisely how the weighting between intrinsic synchronicity and entrainment proper is determined. How does the system “know” when to focus more on the bottom-up input (entrainment proper), and when instead to rely more on top-down information (intrinsic synchronicity) to drive the oscillatory behaviour in auditory cortices? If the rate of intrinsic synchronicity is different from that of entrainment proper, would this be expected to result in competition between endogenously generated expectations and exogenous features of the speech input in determining how the speech signal is segmented? These are important questions, because over-reliance on top-down information may lead to inappropriate or incorrect speech segmentation when expectations only partially match the input. The degree to which this occurs (cf. Bosker, 2017; Kösem et al., 2018), and the severity of the consequences (e.g. for understanding an interlocutor’s intended meaning) in cases where it does, will be important to assess. Indeed, recent proposals suggest that an inappropriate weighting of top-down and bottom-up information may be an important factor contributing to psychosis (Sterzer et al., 2018) and that neural synchronisation may provide a useful “window” for monitoring such (im)balances (e.g. Lakatos, Schroeder, Leitman, & Javitt, 2013). Similarly, if too much weight is given to entrainment proper, one is right back to the starting point that speech is not perfectly rhythmic and so seems unlikely to be the sole driver of strong entrainment in auditory cortices.
Some relevant factors in determining this weighting might include the likelihood that the input signal is speech (or speech-like), how reliable the speech signal is (in a noisy environment, for instance, one might expect a higher weighting of intrinsic synchronicity), the amount of relevant top-down information that has been accumulated, and possibly the estimated degree of variability in the syllable or phoneme rates.
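As a purely hypothetical illustration of how such a weighting could be formalised (this is my sketch, not a mechanism proposed by MSM), one could treat the bottom-up and top-down rate estimates as two noisy sources and combine them by inverse-variance (precision) weighting, so that the less reliable the sensory evidence, the more the endogenous prior dominates:

```python
# Toy illustration (not a model from MSM): precision-weighted combination
# of a bottom-up rate estimate (from the acoustics) and a top-down prior
# (from accumulated linguistic knowledge). All numbers are made up.

def combine_rates(sensory_rate, sensory_var, prior_rate, prior_var):
    """Inverse-variance (precision) weighting of two rate estimates (Hz).

    Returns the combined rate and the weight given to the sensory estimate.
    """
    w_sensory = (1 / sensory_var) / (1 / sensory_var + 1 / prior_var)
    return w_sensory * sensory_rate + (1 - w_sensory) * prior_rate, w_sensory

# Clear speech: reliable acoustics dominate (entrainment proper)
rate, w = combine_rates(sensory_rate=4.5, sensory_var=0.1,
                        prior_rate=4.0, prior_var=1.0)
print(round(rate, 2), round(w, 2))   # combined rate stays near the sensory estimate

# Noisy speech: the top-down prior dominates (internal synchronicity)
rate, w = combine_rates(sensory_rate=4.5, sensory_var=2.0,
                        prior_rate=4.0, prior_var=0.2)
print(round(rate, 2), round(w, 2))   # combined rate is pulled towards the prior
```

On this kind of scheme, the factors listed above (speech-likeness, noise level, accumulated context, rate variability) would all enter through the two variance terms.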

Some speculative consequences

If MSM turn out to be correct that much of what we call entrainment is actually internal synchronicity in disguise, one conclusion we might draw is that the only speech-specific entrainment is entrainment proper. Put another way, internal synchronicity at rates corresponding to inferences about abstract linguistic information that does not (necessarily) express a discrete physical signature in the speech signal (e.g. morphemes, lexical representations, syntactic and semantic structures, discourse and event structures) might be expected to exhibit similar synchronicity for input modalities other than speech; as in reading, for example. This possibility that entrainment may be observed during reading, and not just while listening to speech, is intriguing and warrants further experimental investigation. In fact, there is already quite a bit of evidence for internal synchronicity during reading (for a recent review see Prystauka & Lewis, 2019). What has not been observed is entrainment proper related to visual sampling while reading, akin to the entrainment proper observed while listening to speech. One potential reason for this is that most studies of reading using M/EEG employ a rapid serial visual presentation (RSVP) approach for presenting words, as this typically makes subsequent data analyses less complex and more systematic. It is possible, however, that this approach obscures any entrainment proper that may be observable when reading naturally (i.e. executing saccadic eye movements to scan the page in the service of information uptake; Rayner, 1998). As one might expect, there are systematic differences between the eye movement behaviour of fast and slow readers (Hawelka, Schuster, Gagl, & Hutzler, 2015). Artificially imposing a particular reading rate, as is typically done with RSVP, may interfere with or reduce any entrainment proper that might arise when reading naturally.
Fixation-related approaches to the analysis of M/EEG data recorded simultaneously with people’s eye movements during naturalistic text reading (e.g. Dimigen, Sommer, Hohlfeld, Jacobs, & Kliegl, 2011; Himmelstoss, Schuster, Hutzler, Moran, & Hawelka, 2019; Metzner, von der Malsburg, Vasishth, & Rösler, 2017) offer the potential to further probe whether such entrainment proper can be observed during naturalistic reading.

One fascinating aspect of this question will be whether entrainment-like behaviour is observed in auditory cortices, or instead in the visual system (or perhaps both). This would speak directly to models of reading (e.g. Clifton et al., 2016; Reichle, Rayner, & Pollatsek, 2003; Seidenberg & Plaut, 1998) and questions about the degree to which grapheme-to-phoneme mapping may be mediated by neural synchronicity. There is already evidence that lip-reading and co-speech gestures (both involving input from the visual modality) accompanying speech input can lead to (increased) entrainment in auditory and visual regions (Brookshire, Lu, Nusbaum, Goldin-Meadow, & Casasanto, 2017; Crosse, Butler, & Lalor, 2015; O’Sullivan, Crosse, Di Liberto, & Lalor, 2017). On the other hand, visual sampling is known to be (quasi-)rhythmic in nature (e.g. VanRullen, Zoefel, & Ilhan, 2014), and is crucially mediated by attention. This suggests that we might (also) expect to observe entrainment proper in the visual system during reading. Just like for speech, it will be important to investigate the extent to which this entrainment proper (assuming that it is observed for reading) is mediated by more abstract linguistic information through internal synchronicity.

Acknowledgements

Many thanks to Julie Van Dyke for our endless discussions about potential relationships between eye movements during reading and neural entrainment.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

A.G.L. was supported by Gravitation Grant 024.001.006 of the Language in Interaction Consortium from the Netherlands Organisation for Scientific Research.

References

  • Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences of the United States of America, 98(23), 13367–13372. doi: 10.1073/pnas.201400998
  • Bastiaansen, M., & Hagoort, P. (2006). Oscillatory neuronal dynamics during language comprehension. In Progress in brain research: Event-related dynamics of brain oscillations (pp. 179–196). https://doi.org/10.1016/S0079-6123.
  • Bosker, H. R. (2017). Accounting for rate-dependent category boundary shifts in speech perception. Attention, Perception, & Psychophysics, 79, 333–343. doi: 10.3758/s13414-016-1206-4
  • Brookshire, G., Lu, J., Nusbaum, H. C., Goldin-Meadow, S., & Casasanto, D. (2017). Visual cortex entrains to sign language. Proceedings of the National Academy of Sciences of the United States of America, 114(24), 6352–6357. doi: 10.1073/pnas.1620350114
  • Buzsáki, G., Logothetis, N., & Singer, W. (2013). Scaling brain size, keeping timing: Evolutionary preservation of brain rhythms. Neuron, 80(3), 751–764. doi: 10.1016/j.neuron.2013.10.002
  • Clifton, C., Ferreira, F., Henderson, J. M., Inhoff, A. W., Liversedge, S. P., Reichle, E. D., & Schotter, E. R. (2016). Eye movements in reading and information processing: Keith Rayner’s 40-year legacy. Journal of Memory and Language, 86, 1–19. doi: 10.1016/j.jml.2015.07.004
  • Crosse, M. J., Butler, J. S., & Lalor, E. C. (2015). Congruent visual speech enhances cortical entrainment to continuous auditory speech in noise-free conditions. Journal of Neuroscience, 35(42), 14195–14204. doi: 10.1523/JNEUROSCI.1829-15.2015
  • Dimigen, O., Sommer, W., Hohlfeld, A., Jacobs, A. M., & Kliegl, R. (2011). Coregistration of eye movements and EEG in natural reading: Analyses and review. Journal of Experimental Psychology: General, 140(4), 552–572. doi: 10.1037/a0023885
  • Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158. doi: 10.1038/nn.4186
  • Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology, 2, 130. doi: 10.3389/fpsyg.2011.00130
  • Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15(4), 511–517. doi: 10.1038/nn.3063
  • Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., & Garrod, S. (2013). Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biology, 11(12), e1001752. doi: 10.1371/journal.pbio.1001752
  • Hawelka, S., Schuster, S., Gagl, B., & Hutzler, F. (2015). On forward inferences of fast and slow readers. An eye movement study. Scientific Reports, 5, 8432. doi: 10.1038/srep08432
  • Himmelstoss, N. A., Schuster, S., Hutzler, F., Moran, R., & Hawelka, S. (2019). Co-registration of eye movements and neuroimaging for studying contextual predictions in natural reading. Language, Cognition and Neuroscience, 1–18. doi: 10.1080/23273798.2019.1616102
  • Jensen, O., & Colgin, L. L. (2007). Cross-frequency coupling between neuronal oscillations. Trends in Cognitive Sciences, 11(7), 267–269. doi: 10.1016/j.tics.2007.05.003
  • Keitel, A., Gross, J., & Kayser, C. (2018). Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features. PLoS Biology, 16(3), e2004473. doi: 10.1371/journal.pbio.2004473
  • Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. (2018). Neural entrainment determines the words we hear. Current Biology, 28(18), 2867–2875. doi: 10.1016/j.cub.2018.07.023
  • Kösem, A., & Van Wassenhove, V. (2017). Distinct contributions of low-and high-frequency neural oscillations to speech comprehension. Language, Cognition and Neuroscience, 32(5), 536–544. doi: 10.1080/23273798.2016.1238495
  • Lakatos, P., Gross, J., & Thut, G. (2019). A new unifying account of the roles of neuronal entrainment. Current Biology, 29(18), R890–R905. doi: 10.1016/j.cub.2019.07.075
  • Lakatos, P., Schroeder, C. E., Leitman, D. I., & Javitt, D. C. (2013). Predictive suppression of cortical excitability and its deficit in schizophrenia. Journal of Neuroscience, 33(28), 11692–11702. doi: 10.1523/JNEUROSCI.0010-13.2013
  • Lam, N. H., Hultén, A., Hagoort, P., & Schoffelen, J. M. (2018). Robust neuronal oscillatory entrainment to speech displays individual variation in lateralisation. Language, Cognition and Neuroscience, 33(8), 943–954. doi: 10.1080/23273798.2018.1437456
  • Lewis, A. G., Wang, L., & Bastiaansen, M. (2015). Fast oscillatory dynamics during language comprehension: Unification versus maintenance and prediction? Brain and Language, 148, 51–63. doi: 10.1016/j.bandl.2015.01.003
  • Metzner, P., von der Malsburg, T., Vasishth, S., & Rösler, F. (2017). The importance of reading naturally: Evidence from combined recordings of eye movements and electric brain potentials. Cognitive Science, 41, 1232–1263. doi: 10.1111/cogs.12384
  • Meyer, L. (2018). The neural oscillations of speech processing and language comprehension: State of the art and emerging mechanisms. European Journal of Neuroscience, 48, 2609–2621. doi: 10.1111/ejn.13748
  • Meyer, L., Henry, M. J., Gaston, P., Schmuck, N., & Friederici, A. D. (2017). Linguistic bias modulates interpretation of speech via neural delta-band oscillations. Cerebral Cortex, 27(9), 4293–4302.
  • Meyer, L., Sun, Y., & Martin, A. E. (2019). Synchronous, but not entrained: Exogenous and endogenous cortical rhythms of speech and language processing. Language, Cognition and Neuroscience, 1–11. https://doi.org/10.1080/23273798.2019.1693050.
  • Obleser, J., & Kayser, C. (2019). Neural entrainment and attentional selection in the listening brain. Trends in Cognitive Sciences, 23(11), 913–926. doi: 10.1016/j.tics.2019.08.004
  • O’Sullivan, A. E., Crosse, M. J., Di Liberto, G. M., & Lalor, E. C. (2017). Visual cortical entrainment to motion and categorical speech features during silent lipreading. Frontiers in Human Neuroscience, 10, 679.
  • Prystauka, Y., & Lewis, A. G. (2019). The power of neural oscillations to inform sentence comprehension: A linguistic perspective. Language and Linguistics Compass, 13(9), e12347. doi: 10.1111/lnc3.12347
  • Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422. doi: 10.1037/0033-2909.124.3.372
  • Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E-Z reader model of eye-movement control in reading: Comparisons to other models. Behavioral and Brain Sciences, 26, 445–526. doi: 10.1017/S0140525X03000104
  • Seidenberg, M. S., & Plaut, D. C. (1998). Evaluating word-reading models at the item level: Matching the grain of theory and data. Psychological Science, 9(3), 234–237. doi: 10.1111/1467-9280.00046
  • Sterzer, P., Adams, R. A., Fletcher, P., Frith, C., Lawrie, S. M., Muckli, L., … Corlett, P. R. (2018). The predictive coding account of psychosis. Biological Psychiatry, 84(9), 634–643. doi: 10.1016/j.biopsych.2018.05.015
  • VanRullen, R., Zoefel, B., & Ilhan, B. (2014). On the cyclic nature of perception in vision versus audition. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1641), 20130214. doi: 10.1098/rstb.2013.0214
  • Weiss, S., & Mueller, H. M. (2012). “Too many betas do not spoil the broth”: The role of beta brain oscillations in language processing. Frontiers in Psychology, 3, 201. doi: 10.3389/fpsyg.2012.00201