AUTHORS' RESPONSE

“Entraining” to speech, generating language?

L. Meyer, Y. Sun & A. E. Martin
Pages 1138-1148 | Received 16 Sep 2020, Accepted 16 Sep 2020, Published online: 04 Oct 2020

ABSTRACT

Could meaning be read from acoustics, or from the refraction rate of pyramidal cells innervated by the cochlea, everyone would be an omniglot. Speech does not contain sufficient acoustic cues to identify linguistic units such as morphemes, words, and phrases without prior knowledge. Our target article (Meyer, L., Sun, Y., & Martin, A. E. (2019). Synchronous, but not entrained: Exogenous and endogenous cortical rhythms of speech and language processing. Language, Cognition and Neuroscience, 1–11. https://doi.org/10.1080/23273798.2019.1693050) thus questioned the concept of “entrainment” of neural oscillations to such units. We suggested that synchronicity with these units points to the existence of endogenous functional “oscillators” (or population rhythmic activity, in Giraud’s (2020) terms) that underlie the inference, generation, and prediction of linguistic units. Here, we address a series of inspiring commentaries by our colleagues. As these make apparent, some of the issues raised by our target article have already been raised in the literature. Psycho- and neurolinguists might still benefit from our reply, as “oscillations are an old concept in vision and motor functions, but a new one in linguistics” (Giraud, A.-L. 2020. Oscillations for all A commentary on Meyer, Sun & Martin (2020). Language, Cognition and Neuroscience, 1–8).

This article refers to:
Synchronous, but not entrained: exogenous and endogenous cortical rhythms of speech and language processing

Introduction

Speech does not mark the boundaries of every linguistic segment, be it morpheme, word, or phrase, in a way that maps one-to-one onto what we as comprehenders perceive. In our target article (Meyer, Sun, & Martin, 2019), we argued that the alleged entrainment of neural oscillations to such symbolic units is therefore implausible. We suggested instead that oscillations generated in sensory cortices by the physical nature of the sensory stimulus are combined with signals from cortical areas that encode symbolic units, such as morphemes, words, and phrases. In short, and though not fully formalised here (see Martin, 2020), such a claim implies a system architecture in which the resulting combined population-level rhythmic activity (Giraud, 2020) fluctuates in step with higher-level linguistic structure. Such intrinsic synchronicity can then be said to reflect the inference, generation, and prediction of symbolic units like morphemes, words, and phrases. We are honoured and thankful that our article yielded a series of commentaries spanning auditory neuroscience to neurolinguistics (Ghitza, 2020; Giraud, 2020; Gwilliams, 2020; Haegens, 2020; Kandylaki & Kotz, 2020; Klimovich-Gray & Molinaro, 2020; Lewis, 2020), ranging from the helpfully sceptical to further expansions of the premises put forth in our target article. Below, we discuss the following leitmotifs from these commentaries:

  1. A narrow definition should be met when invoking “entrainment”

  2. If it’s not entrainment, call it tracking?

  3. Is intrinsic synchronicity redundant with top-down modulation of entrainment?

  4. How do entrainment and intrinsic synchronicity interact?

  5. Periodic linguistic processing without entrainment?

  6. Synchronicity with symbolic units: periodic ERPs?

We did not intend to underemphasise the importance of recent opinion and review articles that focus on the role of endogenous neural oscillations in the top-down modulation of speech entrainment (Haegens & Zion-Golumbic, 2018; cited by Haegens, 2020; Lakatos et al., 2019; Obleser & Kayser, 2019; Rimmele et al., 2018; Zoefel et al., 2018; cited by Haegens, 2020). These articles were published while we were revising our target article, and we believe that they offer timely and interesting perspectives on speech processing that complement our account of language processing. In contrast to accounts of the “bottom-up” processing of the speech signal alone, we focused on the generation of linguistic inferences and predictions rather than on their possible function in the top-down physiological modulation of “entrainment” proper, separate from linguistic representation.

1. A narrow definition should be met when invoking “entrainment”

Haegens’ (2020) commentary underlines the need to reflect on the loose use of the term entrainment before coining new terms out of uncertainty or negation. Because our psycho- and neurolinguistic readership may not be familiar with the need for a narrower definition of entrainment, we compiled the Entrainment Checklist below, adopting and explaining criteria (a) to (c) from Obleser & Kayser (2019) and Haegens (2020). For psycho- and neurolinguists who want to study symbolic computations rather than acoustics, we added criteria (d) and (e).

a. Oscillatory activity in the absence of rhythmic stimulation

Oscillatory activity in the absence of rhythmic stimulation is a prerequisite for entrainment (Haegens, 2020; Obleser & Kayser, 2019). Resting-state data may provide a data-driven test of this criterion. For example, a magnetoencephalography study by Keitel and Gross (2016) showed oscillatory activity across the cerebral cortex in the absence of rhythmic stimulation. Their report of spectral differences between cortical areas suggests that such data can be used to sanity-check whether speech–brain synchronicity during rhythmic stimulation modulates the activity of a preexisting oscillator, or whether it spuriously occurs in an area that does not exhibit an oscillator at the stimulation frequency to start with. Researchers could restrict analyses of data recorded during stimulation to spectro-temporal regions of interest that display preexisting resting-state activity at or near the frequency of rhythmic acoustic stimulation. We note, however, that while endogenous eigenfrequencies should play a role (if only to reflect the fact that the brain is a living organ whose metabolic and computational infrastructure is composed of periods of excitation, refraction, and inhibition; Lakatos et al., 2019), their presence is not inconsistent with electrophysiological responses to speech and language being composed of evoked responses, even if the events evoking those responses do not exist in speech but are symbolic.
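The resting-state sanity check described above can be sketched in a few lines. This is a minimal illustration under assumptions of our own: we take “preexisting activity at the stimulation frequency” to mean that the power of a single discrete Fourier transform bin exceeds the mean power of flanking bins by a hypothetical factor (`threshold`, default 10). The helper names `dft_power`, `peak_to_background`, and `has_resting_peak` are illustrative and do not come from Keitel and Gross (2016) or from any toolbox; real analyses would use tapered spectral estimates and an explicit model of the aperiodic background.

```python
import cmath
import math


def dft_power(x, fs, freq):
    """Power of signal x (sampled at fs Hz) at a single frequency, via a direct DFT sum."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * k / fs) for k, v in enumerate(x))
    return abs(acc) ** 2 / len(x)


def peak_to_background(x, fs, freq, offsets=(-2.0, -1.0, 1.0, 2.0)):
    """Power at `freq` relative to the mean power at flanking frequencies freq + offset."""
    flank = [dft_power(x, fs, freq + d) for d in offsets]
    # Small constant avoids division by zero for noise-free toy signals.
    return dft_power(x, fs, freq) / (sum(flank) / len(flank) + 1e-12)


def has_resting_peak(x, fs, freq, threshold=10.0):
    """Hypothetical criterion: a resting-state peak at `freq` stands out from its neighbours."""
    return peak_to_background(x, fs, freq) > threshold


if __name__ == "__main__":
    fs, seconds = 100, 10
    # Toy "resting-state" signal containing only a 10 Hz rhythm.
    rest = [math.sin(2 * math.pi * 10 * k / fs) for k in range(fs * seconds)]
    print(has_resting_peak(rest, fs, 10.0))  # True: stimulation at 10 Hz meets an oscillator
    print(has_resting_peak(rest, fs, 4.0))   # False: no resting activity at 4 Hz
```

Under this criterion, synchronicity observed at 4 Hz during stimulation would not be attributed to a preexisting 4 Hz oscillator in this toy recording.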

b. Frequency-selective phase alignment with a rhythmic external physical stimulus

An oscillator with a genuine eigenfrequency should not entrain to arbitrary stimulation rhythms. Instead, the magnitude of entrainment should depend on the proximity between the stimulation frequency and the oscillator’s eigenfrequency (i.e. the so-called Arnold Tongue; Hahn et al., 2019; Hyafil et al., 2015; Lakatos et al., 2019; Obleser & Kayser, 2019; Pikovsky et al., 2002). Arnold Tongues are emerging in the domain of speech entrainment, where oscillatory activity at phoneme and syllable rates prevails at rest in auditory brain regions (Daube et al., 2019; Giraud et al., 2007; Peelle et al., 2013). Moreover, reduced frontal delta-band activity at rest has been linked to reduced delta-band entrainment (Arns et al., 2007; Hämäläinen et al., 2012; Molinaro et al., 2016; Pagnotta et al., 2015). Of potential interest to linguists, evidence for Arnold Tongues during speech processing could be used to study biological links between speech as an object of cortical information processing and language as a cognitive and cultural system.
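The Arnold Tongue logic can be illustrated with the textbook phase-oscillator (Adler) equation. We use it here only as a stand-in; the cited work does not commit to this particular model. Assume an oscillator with eigenfrequency f0 driven by a periodic stimulus at frequency f_stim, with a coupling strength K expressed in Hz. The phase difference ψ between stimulus and oscillator then obeys dψ/dt = 2π(f_stim − f0) − 2πK·sin ψ, and the oscillator phase-locks (entrains) only when |f_stim − f0| ≤ K. The sketch below integrates this equation numerically. The function names and all parameter values are illustrative assumptions.

```python
import math


def phase_locks(f0, f_stim, coupling_hz, duration=20.0, dt=1e-3):
    """Integrate the Adler equation and report whether the phase difference stops drifting.

    psi is the stimulus phase minus the oscillator phase (radians):
    d(psi)/dt = 2*pi*(f_stim - f0) - 2*pi*coupling_hz*sin(psi)
    """
    steps = int(duration / dt)
    psi = 0.0
    psi_mid = 0.0
    for step in range(steps):
        psi += dt * 2 * math.pi * ((f_stim - f0) - coupling_hz * math.sin(psi))
        if step == steps // 2:
            psi_mid = psi
    # Locked: psi settles at a fixed point. Unlocked: psi keeps slipping by whole cycles.
    return abs(psi - psi_mid) < math.pi


def arnold_tongue(f0, coupling_hz, stim_freqs):
    """Return the stimulation frequencies that entrain an oscillator with eigenfrequency f0."""
    return [f for f in stim_freqs if phase_locks(f0, f, coupling_hz)]


if __name__ == "__main__":
    freqs = [1.0, 1.5, 1.8, 2.0, 2.2, 2.5, 3.0]
    # Hypothetical 2 Hz (delta-band) oscillator: stronger coupling widens the tongue.
    print(arnold_tongue(2.0, 0.25, freqs))  # [1.8, 2.0, 2.2]
    print(arnold_tongue(2.0, 0.6, freqs))   # [1.5, 1.8, 2.0, 2.2, 2.5]
```

Sweeping the coupling as well as the stimulation frequency traces out the tongue-shaped region that gives the phenomenon its name.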

c. Rhythmic activity transiently continues at the stimulus rate after stimulus offset

The workhorse of entrainment research on speech and language processing is the recording of electrophysiological signals during the presentation of continuous speech. As most of our readers know, frequency-domain speech–brain synchronicity in such experiments could as easily be mimicked by sequences of evoked responses as it could be driven by “true” oscillations (e.g. Ding & Simon, 2014; Haegens, 2020; Klimesch et al., 2007; Obleser & Kayser, 2019). Critically for psycho- and neurolinguists, this holds not only for acoustic speech rhythms but also for internal computations (e.g. the N400, the P600, and the closure positive shift, CPS; Frank et al., 2015; Kuperberg et al., 2019; Steinhauer & Friederici, 2001). Alternative experimental designs employ two-phase trial structures involving the initial playback of an acoustic rhythm and the subsequent presentation of a target stimulus; the rationale is to affect target processing through prior entrainment (e.g. Bosker, 2017; Hickok et al., 2015; Kösem et al., 2018). While, as noted by Haegens (2020), spectral smearing can still jeopardise analysis in the target phase, behavioural effects in the target phase that depend on manipulations of the first-phase carrier can nevertheless be interpreted as indicating entrainment. Such designs may thus be preferable.
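Why frequency-domain synchronicity alone cannot distinguish evoked responses from oscillations can be shown with a toy signal: a train of identical, non-oscillatory transients presented at 2 Hz produces spectral power at 2 Hz and its harmonics, exactly where an entrained 2 Hz oscillator would. The Gaussian-shaped “evoked response”, its latency and width, the sampling rate, and the helper names are all illustrative assumptions.

```python
import cmath
import math


def dft_power(x, fs, freq):
    """Power of x (sampled at fs Hz) at one frequency, via a direct DFT sum."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * k / fs) for k, v in enumerate(x))
    return abs(acc) ** 2 / len(x)


def evoked_train(rate_hz, fs=100, seconds=20, latency=0.1, width=0.03):
    """Repeat one transient, non-oscillatory response (a Gaussian bump) at rate_hz."""
    period = int(round(fs / rate_hz))
    template = [math.exp(-0.5 * ((k / fs - latency) / width) ** 2) for k in range(period)]
    return template * int(seconds * rate_hz)  # exact repetition; no oscillator involved


if __name__ == "__main__":
    fs = 100
    eeg = evoked_train(2.0, fs=fs)
    for f in (1.0, 2.0, 3.0, 4.0):
        print(f, round(dft_power(eeg, fs, f), 3))
```

Running this shows clear power at 2 and 4 Hz and essentially none at 1 or 3 Hz, although the simulated signal contains no ongoing oscillation at all.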

d. The stimulus feature under study is physical, not symbolic

As stressed in our target article, entrainment is a phenomenon that is best understood in relation to physical stimuli, but it is unclear how symbols would drive entrainment proper¹ (e.g. Lewis, 2020). If the phase of an electrophysiological oscillator is supposed to inherit the phase of a stimulus, the stimulus needs to have a phase to start with. To support the assumption that their stimuli exhibit acoustic rhythms that could drive entrainment, experimenters should report a spectral analysis of these stimuli. Examples of symbols that cannot cause entrainment in ecologically valid settings are morphemes, words, phonetic features, parts of speech, and information-theoretic complexity metrics, mostly because their onsets and offsets need not be predictable in order to be perceived. Causally speaking, proficient listeners generate the associated linguistic symbols on encountering familiar patterns in the electrophysiological imprint served by their auditory periphery. We refer the interested reader to the commentary by Giraud (2020) for an inspiring discussion of the possible neurobiological underpinnings.
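The recommended spectral analysis of stimuli can be approximated with a simple envelope-modulation spectrum: rectify the waveform, average it in short windows to obtain an amplitude envelope, and compute the envelope’s power at candidate modulation frequencies. This is a minimal sketch, assuming a mono waveform given as a list of floats; the 10 ms smoothing window, the frequency grid, the toy amplitude-modulated stimulus, and the function names are illustrative choices. Published analyses typically derive envelopes with a Hilbert transform or an auditory filter bank instead.

```python
import cmath
import math


def envelope(wave, fs, win_s=0.01):
    """Amplitude envelope: full-wave rectification, then averaging in non-overlapping windows."""
    win = max(1, int(round(win_s * fs)))
    env = [sum(abs(v) for v in wave[i:i + win]) / win
           for i in range(0, len(wave) - win + 1, win)]
    return env, fs / win  # envelope samples and their sampling rate


def modulation_spectrum(wave, fs, freqs):
    """Power of the mean-removed envelope at each candidate modulation frequency."""
    env, env_fs = envelope(wave, fs)
    mean = sum(env) / len(env)
    env = [e - mean for e in env]
    out = {}
    for f in freqs:
        acc = sum(v * cmath.exp(-2j * math.pi * f * k / env_fs) for k, v in enumerate(env))
        out[f] = abs(acc) ** 2 / len(env)
    return out


def dominant_rate(wave, fs, freqs):
    """Modulation frequency with the highest envelope power."""
    spec = modulation_spectrum(wave, fs, freqs)
    return max(spec, key=spec.get)


if __name__ == "__main__":
    fs = 1000
    # Toy stimulus: a 200 Hz carrier amplitude-modulated at 4 Hz.
    wave = [(1 + 0.8 * math.sin(2 * math.pi * 4 * k / fs)) * math.sin(2 * math.pi * 200 * k / fs)
            for k in range(5 * fs)]
    print(dominant_rate(wave, fs, [0.5 * i for i in range(2, 21)]))  # 4.0
```

Reporting the full spectrum, not just its maximum, would also show how peaked or broad the stimulus rhythm is.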

e. Rhythmic stimulation is ecologically valid

Experimenters sometimes operationalise speech via the temporal straitjacket of isochrony (for discussion, see Cummins, 2012; Goswami & Leong, 2013; for examples of affected linguistic levels, see also Gwilliams, 2020). We note the trivial point that when acoustic rhythmicity is introduced into an experiment, the odds of observing electrophysiological rhythmicity are good. Before labelling such results entrainment and concluding that the associated processing mechanism is oscillatory, experimenters should motivate their choice of introducing rhythmicity via statistical assessments of natural speech (for a good example, see Ding et al., 2017). Critically, the use of naturalistic stimuli (cf. Kandylaki & Kotz, 2020) does not lift this analytical burden: methodologically speaking, speech–brain synchronicity with natural speech can be significant in spite of a lack of stimulus rhythmicity (see Kaufeld et al., 2020).² As an example, consider the assumption that speech prosody is rhythmic enough to entrain an electrophysiological oscillator (e.g. Ghitza, 2020), as well as reports of entrainment to natural prosody (Bourguignon et al., 2013; Kaufeld et al., 2020; Mai et al., 2016; Meyer & Gumbert, 2018; Meyer et al., 2016). While there are physiological, environmental, and possibly electrophysiological constraints that may lead to a non-uniform distribution of prosodic events in speech (cf. Kreiner & Eviatar, 2014; Rochet-Capellan & Fuchs, 2014), corpus analyses still suggest substantial variance in the duration of prosodic units (Vollrath et al., 1992). It would be very important to assess whether this variance fits the bandwidth of an electrophysiological oscillator assumed to be devoted to prosody processing.
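The suggested comparison between prosodic-unit durations and an oscillator’s bandwidth can start from simple descriptive statistics. The sketch below takes unit durations in seconds, converts each into an equivalent rate (1/duration), and reports the coefficient of variation of the durations and the proportion of rates inside an assumed frequency band. The function name, the band limits, and the durations in the usage example are hypothetical: they are made up for illustration and are not corpus data (in particular, not values from Vollrath et al., 1992).

```python
import statistics


def rhythm_summary(durations_s, band_hz):
    """Describe how well unit durations fit an oscillator's assumed frequency band.

    durations_s: durations of prosodic units in seconds (all > 0).
    band_hz: (low, high) limits of a hypothetical oscillator band, inclusive.
    """
    low, high = band_hz
    rates = [1.0 / d for d in durations_s]
    mean_d = statistics.fmean(durations_s)
    cv = statistics.pstdev(durations_s) / mean_d  # coefficient of variation
    in_band = sum(low <= r <= high for r in rates) / len(rates)
    return {"mean_duration_s": mean_d, "cv": cv, "fraction_in_band": in_band}


if __name__ == "__main__":
    toy_durations = [0.5, 1.0, 2.0, 4.0]  # hypothetical values, not measured
    print(rhythm_summary(toy_durations, band_hz=(0.5, 2.0)))
```

A low coefficient of variation together with a high in-band fraction would be a prerequisite, though not proof, for the claim that prosody is regular enough to entrain an oscillator in that band.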

2. If it’s not entrainment, call it tracking?

In line with Obleser & Kayser (2019), Haegens (2020) suggests that phenomena that do not meet a narrow definition of entrainment, including cases of synchronicity with symbolic units, should rather be labelled tracking or entrainment in the broad sense. While tracking and entrainment are sometimes used interchangeably to label synchronicity with both acoustic and symbolic units (Brennan & Martin, 2020; Cogan & Poeppel, 2011; Daube et al., 2019; Gross et al., 2013; Hämäläinen et al., 2012; Jochaut et al., 2015; Kaufeld et al., 2020; Kayser et al., 2015; Luo & Poeppel, 2007; Luo et al., 2010; Mai et al., 2016; Meyer & Gumbert, 2018; Molinaro et al., 2016; Park et al., 2018; Weissbart, Kandylaki, Reichenbach, et al., 2019; Zoefel & VanRullen, 2016), some authors use tracking to label synchronicity with symbolic units alone (Bourguignon, Molinaro, et al., 2020; Brennan & Martin, 2020; Ding et al., 2016; Kaufeld et al., 2020; Zhang & Ding, 2016).

We still worry that the notion of tracking symbolic units is not fully coherent. As noted by Lewis (2020), symbolic units cannot be tracked in the literal sense, because the physical speech signal does not map one-to-one onto symbolic units that an electrophysiological observer could track (Martin, 2016, 2020). As a concrete example, the seminal study by Ding et al. (2016) presented isochronous word sequences in a language that was either native to their participants or not. Each pair of words formed a syntactic phrase and each quadruplet of words formed a sentence, which non-native listeners could not recognise. Frequency components in the magnetoencephalogram mirrored the rates of phrases and sentences in native listeners only. Hence, while not physically present in the acoustics, these symbolic units were present in listeners’ electrophysiology, possibly in a cyclic fashion. Our worry with applying the tracking label here is the following: because these syntactic structures exist only in the listener’s brain, claiming that electrophysiology tracked them would logically entail that electrophysiology tracked itself.
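The logic of the Ding et al. (2016) design can be sketched with toy signals. We assume, for illustration only, a word rate of 4 Hz, so that two-word phrases recur at 2 Hz and four-word sentences at 1 Hz. If every word has the same acoustic envelope, the stimulus carries power only at 4 Hz and its harmonics; any response power at 1 and 2 Hz must therefore reflect structure contributed by the listener. The extra “phrase” and “sentence” responses of the simulated native listener, the sampling rate, and the helper names are hypothetical placeholders, not a model of the actual neural generators.

```python
import cmath
import math

FS = 100          # samples per second (assumed)
WORD = FS // 4    # 25 samples per word at an assumed 4 Hz word rate


def dft_power(x, fs, freq):
    """Power of x (sampled at fs Hz) at one frequency, via a direct DFT sum."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * k / fs) for k, v in enumerate(x))
    return abs(acc) ** 2 / len(x)


def bump(n, center, width=2.0, amp=1.0):
    """Gaussian-shaped transient of length n samples."""
    return [amp * math.exp(-0.5 * ((k - center) / width) ** 2) for k in range(n)]


def add(a, b):
    """Sample-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]


def sentence(native):
    """One four-word sentence (1 s): identical word responses, plus hypothetical
    phrase- and sentence-closure responses if the listener knows the language."""
    signal = bump(WORD, center=5) * 4  # acoustically driven word responses only
    if native:
        signal = add(signal, bump(4 * WORD, center=2 * WORD - 10, amp=0.5))  # phrase 1
        signal = add(signal, bump(4 * WORD, center=4 * WORD - 10, amp=0.5))  # phrase 2
        signal = add(signal, bump(4 * WORD, center=4 * WORD - 10, amp=0.5))  # sentence
    return signal


if __name__ == "__main__":
    for native in (False, True):
        response = sentence(native) * 10  # ten sentences, 10 s
        powers = {f: round(dft_power(response, FS, f), 3) for f in (1.0, 2.0, 4.0)}
        print("native" if native else "non-native", powers)
```

Only the simulated native listener shows power at the phrase and sentence rates, even though both listeners receive the same 4 Hz input.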

Giraud (2020) and Gwilliams (2020) come to our terminological rescue, arguing that synchronicity with symbolic units might reflect not their tracking, but their generation proper (see also Martin, 2016, 2020; Martin & Doumas, 2017). This suggestion draws a clear line between electrophysiological functions that perceive and process acoustic stimuli and functions that invoke or infer symbolic units. See below for the implications of this terminological dissociation for capturing the exogenous–endogenous interplay.

3. Is intrinsic synchronicity redundant with top-down modulation of entrainment?

Klimovich-Gray and Molinaro (2020) worry that separating endogenous oscillatory activity from the top-down modulation of entrainment may be overly analytic: without being in step with perceptual sampling, the inference, generation, and prediction of linguistic units would be of little use for comprehension. We certainly agree; our target article did not intend to question interactions between entrainment and intrinsic synchronicity. We acknowledge the role of oscillatory coupling in the functional connectivity between auditory regions and frontal cortices (e.g. Molinaro et al., 2016; Park et al., 2015) as a substrate of top-down amplification (e.g. Schroeder et al., 2008; for review, see Vanrullen et al., 2011) and temporal binding (e.g. Giraud & Poeppel, 2012; Morillon et al., 2012). Also, as discussed by Giraud (2020), exogenous and endogenous oscillators may not always be neuroanatomically distinct; instead, a single network might show entrainment while still acting as a pacemaker. Yet, Giraud (2020) also proposes that oscillators that serve abstract purposes might lean towards acting as pacemakers, whereas less abstract processes might be dominated by entrainment. Along these lines, both Giraud (2020) and Lewis (2020) suggest that the separation between entrainment and pacemaking might depend conceptually on the level of abstraction and might entail, neuroanatomically, an increasing network size.

Our proposal of intrinsic synchronicity aims to conceptualise these pacemakers as such. Here, we suggest a clinical approach to dissociating entrainment and pacemaking while acknowledging their mutual interactions. As a first example, Broca’s aphasia could be a case of intact speech entrainment in spite of abnormal periodic chunking, potentially allowing for the dissociation of exogenous entrainment proper from endogenous linguistic processing. Patients with Broca’s aphasia after left-hemispheric precentral lesions exhibit altered auditory chunking time windows in spite of structurally intact auditory regions (Szelag et al., 1997). While we are not aware of reports of intact prosody entrainment in Broca’s aphasia, we note that such patients are certainly able to shadow speech (Fridriksson et al., 2012; Fridriksson et al., 2015). In healthy subjects, prosody entrainment is associated with auditory, but not precentral, activity (Bourguignon et al., 2013). In line with this picture, we found that repetitive transcranial magnetic stimulation of the left inferior frontal cortex in healthy subjects affects linguistic chunking while leaving prosody perception intact (Meyer et al., 2018). In principle, lesion data could thus dissociate prosody entrainment from intrinsic synchronicity with endogenously generated linguistic chunks, helping to address the question of whether multi-word chunking relies on prosody entrainment, an internal oscillatory pacemaker, or both (Ghitza, 2020).

As a second example, linguistic dysfunction in schizophrenia (Kircher et al., 2018; Sterzer et al., 2018) has been argued to reflect an imbalance between speech perception and the internal generation of linguistic predictions (Brown & Kuperberg, 2015). In schizophrenia patients suffering from auditory hallucinations, overly strong predictions can trigger the hallucination of words that do not exist outside of the patient’s brain (Alderson-Day et al., 2017). Auditory stimulation in schizophrenia patients is associated with abnormal beta- and delta-band oscillations (Lakatos et al., 2013), which have previously been proposed to subserve the prediction of content and timing, respectively (Lewis & Bastiaansen, 2015; Schroeder & Lakatos, 2009; Stefanics et al., 2010). In the case of auditory hallucinations, there is no stimulus to entrain to, and thus no entrainment proper to be modulated. Could endogenous oscillatory activity underlie the hallucination as such? This hypothesis is supported by the observation of auditory activity in response to visual-only stimulation with lip movements (Bourguignon et al., 2020; cited by Klimovich-Gray & Molinaro, 2020).

4. How do entrainment and intrinsic synchronicity interact?

Lewis (2020) points out that our target article lacks a proposal as to how entrainment and intrinsic synchronicity might interact. We are thankful for Giraud’s (2020) insightful suggestion that the generation of symbolic units may involve matching quasi-periodic speech segments to internally stored or generated symbols, such that generative rhythms impose their preferred pace onto perceptual systems to enforce a corresponding segmentation of speech. This is entirely consistent with the analysis-by-synthesis type of proposal laid out in recent models of language processing (Martin, 2016, 2020), which have also been realised in an abstracted computational instantiation (Martin & Doumas, 2017). Such a view is based on claims about how sensory systems make contact with action systems writ large (Buzsáki, 2019; Ernst & Bülthoff, 2004; Olshausen, 2013), but also on classic and core ideas in psycholinguistics (Halle & Stevens, 1962; Marslen-Wilson & Welsh, 1978). Lewis’ (2020) concerns resonate with the proposal of a now-or-never bottleneck, according to which abstraction must occur before speech segments corresponding to phonemes, syllables, words, and syntactic structures are forgotten (e.g. Christiansen & Chater, 2015; but note that perceptual memory and memory in language processing may not only be a function of time; see Brown, 1958; McElree, 2006; Peterson & Peterson, 1959; Sperling, 1983). This may also partially answer Lewis’ (2020) question about the exact weighting between entrainment and intrinsic synchronicity: ultimately, the weighting should depend on the temporal compatibility between stimulus and inference, with balance when the stimulus pace matches the preferred linguistic pace and imbalance when the two mismatch.
Gain and inhibition, and, as a result, the phase of the signals that emerge when neural assemblies form dynamically across the language network, likely play crucial roles in titrating how sensory and abstract neural signals are combined (Martin, 2020).

Giraud (2020) notes that gamma-band oscillations have been associated both with acoustic processing at the sub-syllabic rate (Daube et al., 2019; Gross et al., 2013) and with the invocation of phonemic categories (Di Liberto et al., 2015; Lehongre et al., 2011; Mesgarani et al., 2014; Nourski et al., 2015). We thank Giraud (2020) for pointing us to literature on endogenous timing constraints on phoneme perception, which suggests that sounds can only be dissociated in time when paced within the period range of lower-gamma-band cycles (Joliot et al., 1994). This observation is consistent with lower-gamma-band cycles acting as pacemakers for sub-syllabic sampling, imposing a preferred pace of invocation of phonemic categories onto auditory sampling. In other words, the endogenous generation of phonemic units sets the pace for entrainment to the sub-syllabic acoustic rhythm. This possibility should be investigated further.

As a second example, from the slow end of the frequency axis, multi-word chunks cannot exceed a duration of about 2–3 s, after which auditory short-term memory fades (Baddeley et al., 1975); in parallel, chunking-related event-related potentials occur without being triggered by prosodic cues (Schremm et al., 2015). Functional neuroimaging results are consistent with endogenous operation time windows on the order of seconds that are devoted to the processing of multi-word sequences (Hasson et al., 2008; Lerner et al., 2011). Our readers know that it is debated whether delta-band oscillations are exogenously entrained by speech prosody alone to support linguistic chunking (Bourguignon et al., 2013; Gross et al., 2013; Mai et al., 2016) or whether they also underlie the endogenous generation of multi-word chunks (Boucher et al., 2018; Ding et al., 2016; Meyer et al., 2016) or temporal predictions on the time scale of seconds (Arnal et al., 2015; Breska & Deouell, 2017; cited by Haegens, 2020; Donhauser & Baillet, 2019; Lakatos et al., 2008; Lakatos et al., 2013; Meyer & Gumbert, 2018; Stefanics et al., 2010; Weissbart et al., 2019). We concur with Ghitza’s (2020) suggestion that all of these can be true, such that speech prosody could exogenously entrain delta-band oscillations whose cycles then act as endogenous temporal limiters of chunk duration. This could help explain why multi-word chunks can be generated in the absence of prosodic cues (i.e. “every prosodic unit is a syntactic unit” does not entail that “every syntactic unit is a prosodic unit”; e.g. Drury et al., 2016; Steinhauer & Friederici, 2001).
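The idea that prosodic entrainment and an endogenous limit on chunk duration can coexist may be expressed as a toy segmentation rule: close a chunk at a prosodic boundary when one is present, and otherwise close it once it would span a maximal duration. The function below is a hypothetical illustration, not a model from the cited work; the 2.5 s limit is an assumed value within the 2–3 s range mentioned above, and real chunking also depends on syntax and meaning.

```python
def chunk_words(onsets_s, boundary_after, max_duration_s=2.5):
    """Group word indices into chunks (hypothetical rule, for illustration).

    onsets_s: word onset times in seconds, ascending.
    boundary_after: booleans, True if a prosodic boundary follows word i.
    A chunk is closed at a prosodic boundary or, failing that, before a word
    whose onset lies max_duration_s or more after the chunk's first onset.
    """
    chunks, current, start = [], [], 0.0
    for i, t in enumerate(onsets_s):
        if current and t - start >= max_duration_s:
            chunks.append(current)  # endogenous limit reached without a prosodic cue
            current = []
        if not current:
            start = t
        current.append(i)
        if boundary_after[i]:
            chunks.append(current)  # exogenous (prosodic) boundary
            current = []
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    onsets = [0.4 * i for i in range(12)]     # hypothetical word onsets, 2.5 words/s
    boundaries = [i == 2 for i in range(12)]  # a single prosodic boundary after word 2
    print(chunk_words(onsets, boundaries))
    # [[0, 1, 2], [3, 4, 5, 6, 7, 8, 9], [10, 11]]
```

The first chunk is closed by prosody; the second is closed by the duration limit alone, mirroring chunk boundaries that occur without prosodic cues.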

5. Periodic linguistic processing without entrainment?

As mentioned in (d) of the Entrainment Checklist, rhythmic acoustic cues may be necessary for entrainment. Yet, as discussed in our target article, there could still be periodic linguistic processing without acoustic cues. In general, periodicity may be the rule rather than the exception in electrophysiology (Buzsáki, 2006, 2019; Lakatos et al., 2019; Palva et al., 2005; VanRullen, 2016; see also (a) of the Entrainment Checklist). As discussed above, there are endogenous timing constraints on phoneme perception and multi-word chunking (Joliot et al., 1994; Schremm et al., 2015). Auditory neuroscience may inspire psycholinguists to reconsider such evidence in terms of neural oscillations (e.g. Martin, 2020; Tilsen, 2018).

We thank Kandylaki and Kotz (2020) for sketching further directions for such research. In particular, they raise the fascinating possibility that the formation of a sentence’s verb–argument structure (i.e. the syntactic phrases denoting the who and whom involved in an action, plus the verb denoting the action; Chomsky, 1965) could be a periodic process, pointing to an underlying endogenous oscillatory generator. Indeed, several studies have reported correlations between the phase of delta-band oscillations and syntactic structures (Brennan & Martin, 2020; Ding et al., 2016; Kaufeld et al., 2020; Meyer et al., 2016) or information-theoretic metrics of sequential and syntactic complexity (Meyer & Gumbert, 2018; Weissbart, Kandylaki, Reichenbach, et al., 2019). In principle, it is still unknown whether these findings reflect verb–argument structure, implicit prosodic phrases (Drury et al., 2016; Frazier et al., 2006; Kreiner & Eviatar, 2014; Schremm et al., 2015), or even information structure (i.e. given–new, theme–rheme, or topic–comment alternations over time; for an introduction, see Krifka, 2008; Hagoort, personal communication). Referring to (e) of the Entrainment Checklist, we encourage corpus linguists to hunt for periodicity across these diverse levels of theoretical linguistic description.

6. Synchronicity with symbolic units: periodic ERPs?

We thank Gwilliams (2020) for highlighting one of the biggest challenges for testing whether linguistic processing involves endogenous oscillators: the ambiguity between evoked responses and oscillations (cf. (c) of the Entrainment Checklist). Linguistic processes with a deterministic time lag can phase-lock endogenous oscillations across trials, masquerading as an evoked response in the average (e.g. Klimesch et al., 2007). Conversely, linguistic processes that occur at every linguistic segment (e.g. syllable, word, or phrase) of a sentence or narrative will give rise to speech–brain synchronicity at the segment frequency, even if each process elicits only a transient evoked response. The plot thickens because, in the limit, systems of oscillators and series of evoked responses may approximate each other.³ Here, we discuss exemplary evoked responses for which this might be the case. Afterwards, we provide strategies for assessing whether these have endogenous oscillatory substrates.
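The ambiguity can be made concrete with inter-trial phase coherence (ITC), the length of the mean unit phase vector across trials at a given frequency. In the toy simulation below, every trial contains an ongoing 3 Hz oscillation of identical amplitude. If its phase is aligned across trials (for instance, because a process with a deterministic time lag resets it), ITC is 1 and the trial average looks like an evoked response; if phases are spread uniformly, ITC is near 0 and the average is flat, although no single trial differs in amplitude. The frequency, trial count, sampling rate, and function names are illustrative assumptions.

```python
import cmath
import math


def trial_phase(trial, fs, freq):
    """Phase (radians) of one trial at freq, from a single-bin DFT."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * k / fs) for k, v in enumerate(trial))
    return cmath.phase(acc)


def itc(trials, fs, freq):
    """Inter-trial phase coherence: length of the mean unit phase vector (0 to 1)."""
    vectors = [cmath.exp(1j * trial_phase(tr, fs, freq)) for tr in trials]
    return abs(sum(vectors) / len(vectors))


def simulate(phases, fs=100, freq=3.0, seconds=1.0):
    """One trial per phase offset: a fixed-amplitude sinusoid standing in for ongoing activity."""
    n = int(fs * seconds)
    return [[math.sin(2 * math.pi * freq * k / fs + p) for k in range(n)] for p in phases]


def average(trials):
    """Sample-wise trial average, as used to compute an event-related potential."""
    return [sum(samples) / len(samples) for samples in zip(*trials)]


if __name__ == "__main__":
    n_trials = 40
    aligned = simulate([0.0] * n_trials)  # phase reset at a fixed lag
    spread = simulate([2 * math.pi * i / n_trials for i in range(n_trials)])
    for label, trials in (("aligned", aligned), ("spread", spread)):
        erp_peak = max(abs(v) for v in average(trials))
        print(label, round(itc(trials, 100, 3.0), 3), round(erp_peak, 3))
```

Because single-trial amplitude is the same in both conditions, an “evoked” average here arises purely from phase alignment.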

In the time domain, linguistic prediction relates to the N400 (Bornkessel-Schlesewsky & Schlesewsky, 2019; Cowles et al., 2007; Fitz & Chang, 2018; Frank et al., 2015; Kuperberg et al., 2019; Rabovsky et al., 2018). In the frequency domain, delta-band oscillations have been linked to computational metrics of linguistic prediction (Meyer & Gumbert, 2018; Weissbart et al., 2019). Frequency decomposition of the N400 shows a dominant delta-band component (Roehm et al., 2009). This pattern leaves it unclear whether linguistic prediction has an evoked or oscillatory substrate.

A second example, the P600, is thought to reflect a revision of the current syntactic structure or overall interpretation (Bornkessel-Schlesewsky & Schlesewsky, 2008, 2019; Kaan & Swaab, 2003; Kuperberg et al., 2019). Notably, the P600 may be elicited by every single word in continuous narratives, depending on the amount of revision or reinterpretation that is required (Hale et al., 2018). While we are not aware of a published frequency decomposition of the P600, a sequence of single-word P600s would likely surface as oscillatory synchronicity between revision or integration demands and the EEG.

Our third example is chunking. The boundaries of multi-word chunks are accompanied by the CPS (Steinhauer et al., 1999). While the CPS can be triggered by prosody (Gilbert et al., 2015; Holzgrefe et al., 2013; Steinhauer, 2003), it can also be triggered by visual cues during reading (Drury et al., 2016; Steinhauer, 2003). Strikingly, the CPS appears with an endogenous period of 2–3 s even in the absence of prosody (Roll et al., 2012; Schremm et al., 2015). Frequency-domain analyses show that delta-band phase in the CPS window predicts chunking decisions (Meyer et al., 2016). In principle, the CPS could thus reflect a phase reset of endogenous delta-band oscillations that are devoted to chunking (Boucher et al., 2018; Ding et al., 2016).

To understand the relationship between these evoked responses and endogenous oscillatory activity, single-trial phase should be assessed alongside averaging across trials. Averaging is often advocated as a prerequisite for a sufficient signal-to-noise ratio (Luck, 2014), but multivariate approaches question this assumption (e.g. Sassenhagen & Fiebach, 2019; Sassenhagen et al., 2014). Averaging is also still thought to yield temporally invariant electrophysiological counterparts of discrete processing steps, the boxes and arrows of cognitive (neuro)science (e.g. Luck, 2014). Still, it is unclear whether evoked responses reflect singular stimulus-driven amplitude events rather than the average modulation of ongoing oscillatory activity (Klimesch et al., 2007). In contrast, phase is an indicator of neuronal excitability that directly predicts behavioural responses, thus providing a parsimonious substrate of cognitive processing (Henry & Obleser, 2012; Meyer & Gumbert, 2018; Schroeder et al., 2008; Stefanics et al., 2010). While phase-locking must surface as an evoked response in the average, an association between single-trial phase and performance on a downstream behavioural task can still point to an oscillatory substrate. In line with the related recommendation in (e) of the Entrainment Checklist, the experimental operationalisation should not induce artificial rhythmicity in endogenous linguistic processing.

To close, Giraud (2020) posits that symbolic representations are not extended in time. While we agree that one benefit of symbolising a representation is that it no longer needs to be tied to the vagaries or particulars of the stimulus, environment, or instance, there is no reason why the brain could not generate expectations or gather statistics about when and in what contexts symbolic representations are likely to occur or should be inferred. Thus, decoupling symbols from time is not a necessary condition. Martin and Doumas (2017) showed that a symbolic-connectionist model that uses time to encode functionally symbolic representations in the state dynamics of a neural network could approximate the pattern of neural oscillations found by Ding et al. (2016). The model relies on symbols being extended in time: the rhythmic activity in the neural network, and the ability to separate patterns in the network by when they occur, is what yields functionally symbolic representations.
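Returning to the recommendation to relate single-trial phase to behaviour: one common way to quantify such an association is a circular-linear correlation coefficient, i.e. the multiple correlation of a linear variable with the cosine and sine of an angle, as described in standard circular-statistics texts. We use it here as one option among several (phase-binned performance and phase-opposition measures are common alternatives), and we omit significance testing, which would typically rely on permutation. The function names are our own, and the phases and behavioural scores in the usage example are synthetic.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def circ_linear_corr(phases, values):
    """Circular-linear correlation between angles (radians) and a linear variable.

    Equals the multiple correlation of `values` with cos(phase) and sin(phase);
    ranges from 0 (no phase dependence) to 1.
    """
    cos_p = [math.cos(p) for p in phases]
    sin_p = [math.sin(p) for p in phases]
    r_xc = pearson(values, cos_p)
    r_xs = pearson(values, sin_p)
    r_cs = pearson(sin_p, cos_p)
    num = r_xc ** 2 + r_xs ** 2 - 2 * r_xc * r_xs * r_cs
    return math.sqrt(max(0.0, num / (1 - r_cs ** 2)))


if __name__ == "__main__":
    phases = [2 * math.pi * i / 12 for i in range(12)]  # synthetic single-trial phases
    # Hypothetical behavioural score that peaks at a preferred phase of 1 rad.
    scores = [math.cos(p - 1.0) for p in phases]
    print(round(circ_linear_corr(phases, scores), 6))  # 1.0
```

A score that depends on phase in a way the cosine and sine cannot capture (for example, one that peaks twice per cycle) yields a coefficient near 0, so a null result does not rule out every form of phase dependence.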

Conclusion

Entrainment is a useful concept for describing auditory speech processing. Neural oscillations might also play a role in the inference, generation, and prediction of linguistic units, but this role should not be termed entrainment. Assessing the electrophysiological periodicity of linguistic processing requires dedicated experimental paradigms. In addition, linguists should test for periodicity in speech and text corpora to assess whether such a hypothesis is ecologically valid. Once periodicity has been established, the electrophysiological basis and the corresponding limitations of exogenous–endogenous interactions can be pursued. Speech acoustics entrain neural oscillations, but neural oscillations (or population rhythmic activity) likely generate language.

Acknowledgments

L. M. was supported by the Max Planck Research Group Language Cycles. Y. S. was supported by the Max Planck Society. A. E. M. was supported by the Max Planck Research Group Language and Computation in Neural Systems and by the Netherlands Organization for Scientific Research (Grant 016.Vidi.188.029).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 One can imagine an experiment where a participant is asked to monitor a masked visual stream of images for an object, and that if that object were to appear periodically and predictably, perhaps entrainment over and above entrainment to the presentation rate could be observed.

2 To complicate things further, it is not well understood how much isochrony is required for entrainment to occur. In principle, oscillators can tolerate some temporal variability in their entraining stimuli, so anisochrony does not necessarily rule out entrainment (Lakatos et al., 2019).

3 Trains of evoked responses and oscillations might not even be mutually exclusive: consider the frequent observation that specific phase intervals raise the probability that evoked responses occur (e.g. Henry & Obleser, 2012; Lakatos et al., 2019; Stefanics et al., 2010).

References