Regular Articles

Decreased sensitivity to changing durational parameters of syllable sequences in people who stutter

Pages 179-187 | Received 03 Sep 2018, Accepted 03 Jul 2019, Published online: 15 Jul 2019

ABSTRACT

Stuttering is a disorder that affects the coordination of complex sequencing mechanisms that define the temporal layout of speech. However, classical motor areas of the brain, responsible for the sequencing of articulatory aspects in speech production, also process non-verbal and even non-motor temporal information. This configuration suggests that perceptual temporal processing capacities may factor into the symptom profile of various motor disorders. We investigated perceptual sensitivity for changing temporal parameters of sequentially presented consonant–vowel–consonant syllables in people who stutter (PWS) and matched controls. Changes were durational contrasts (short vs. long) of the whole syllable and/or the vocalic nucleus. Analyses focused on sensitivity indices (d’), response times, response time variability, and co-variation of these variables with offline measures of cognitive performance. Results indicate lower sensitivity for durational contrasts and longer and more variable response times for long vowels in the PWS group, pointing towards subtle perceptual verbal temporal processing differences.

1. Introduction

Speaking requires the fluent coordination of neurocognitive and muscular sequencing operations to produce the fleeting dynamic sound modulations that constitute the acoustic speech signal. Stuttering is a well-known but still poorly understood phenomenon that affects these sequencing operations. At the surface level, stuttering is marked by dysfluent speech, with symptoms comprising repetitions or prolongations of sounds, syllables, or articulatory postures next to avoidance and struggle behaviour (Van Riper & Erickson, 1996, p. 254). Based on these directly observable characteristics, the discussion centres on speech motor control. However, contributions of sensory processes, in particular auditory feedback, have always been considered. One line of research emphasises the role of timing for stuttering (Alm, 2004; Etchell, Johnson, & Sowman, 2014; Falk, Müller, & Dalla Bella, 2015; MacKay & MacDonald, 1984; Park & Logan, 2015; Wieland, McAuley, Dilley, & Chang, 2015), principally acknowledging that “speech is patterned in time, both motorically and acoustically” (Van Riper, 1982, p. 20). Timing differences in PWS can extend beyond the speech domain. There are some, albeit partly conflicting, indications for more variable motor timing in children who stutter when they are asked to uphold a steady clapping rate after external pacing is removed (Olander, Smith, & Zelaznik, 2010; but see Hilger, Zelaznik, & Smith, 2016). Recently, it has been shown that PWS demonstrate subtle timing differences in a task requiring manual motor synchronisation with variable metronome sequences (Sares, Deroche, Shiller, & Gracco, 2019). There is also evidence for co-variation of such basic motor timing abilities and speech timing abilities and stuttering severity (Cooper & Allen, 1977; Falk et al., 2015).

Earlier work on timing in stuttering was limited by several aspects: (i) the lack of a theory about how timing is achieved in fluent speech, (ii) an inability to specify the nature and the cause of the potential timing differences, and (iii) concerns that the conceptualisation of stuttering as a timing problem may not be compatible with data indicating an important role of auditory feedback processing in stuttering (MacKay & MacDonald, 1984; Van Riper, 1982). The effectiveness of auditory feedback manipulations, particularly of delayed auditory feedback, in alleviating stuttering symptoms is long established (Lotzmann, 1961; Soderberg, 1968). However, PWS also show more variable timing in their response to feedback (pitch) manipulations of their own voice delivered by headphones (Sares, Deroche, Shiller, & Gracco, 2018). Differences in speech and non-speech timing, correlations between non-speech timing abilities and stuttering severity, effects of temporal feedback manipulations on speech motor fluency, and variable timing in response to pitch feedback manipulations seem to point toward global temporal processing differences that affect production and perception of temporal structure in PWS.

Better knowledge about the structural and functional differentiation of brain structures associated with motor control may help to assess the relation of timing and stuttering. Macrostructures such as the cerebellum, the basal ganglia, and supplementary motor cortices comprise subregions engaged in sensorimotor and sensory processing that are supported by specific connectivity patterns (Akkal, Dum, & Strick, 2007; Bostan & Strick, 2018; Petacchi, Laird, Fox, & Bower, 2005; Picard & Strick, 2001; Strick, Dum, & Fiez, 2009). Stuttering is associated with activation changes in motor areas, e.g. it has been shown that stuttering severity correlates with activation of the head of the caudate nucleus of the basal ganglia (Giraud et al., 2008). Other studies indicate reduced activation during planning and perception of speech and non-speech gestures as opposed to activation increases and decreases during production in several brain areas (Chang et al., 2009). Crucially, several of these areas are also involved in motor and non-motor temporal processing. Temporal processing is typically ascribed to a dedicated distributed system that involves cerebellar, basal ganglia, supplementary motor, and dorsolateral prefrontal regions (Ivry & Schlerf, 2008; Merchant, Harrington, & Meck, 2013; Spencer & Ivry, 2013; Wiener, Turkeltaub, & Coslett, 2010). With any form of sequential behaviour, including speech processing, this configuration implies rapid, parallel, and repetitive passing of temporal information through subregions of the same macrostructure. 
This requires that any realistic model of sequential behaviour in general, and speech processing in particular, has to account for the differentiation of these “motor regions” and their temporal processing function across production and perception (Ackermann, Mathiak, & Riecker, 2007; Kotz & Schwartze, 2016; Mariën et al., 2014; Scott, McGettigan, & Eisner, 2009).

Functional similarity, spatial proximity, and temporal overlap in this distributed system may generate interference in case of pathological changes, effectively diffusing the differentiation of production and perception either in specific parts or across the whole system. Other mechanisms may compensate for inadequate differentiation. For example, delayed auditory feedback improves speech fluency in PWS, either because it decreases automaticity or because it prevents speakers from hearing their own errors, thereby reducing or prohibiting erroneous basal ganglia activity (Alm, 2004; Guenther & Hickok, 2016). The higher degree of temporal differentiation introduced by the delay may also improve differentiation of production-related and perceptual temporal processing. Irrespective of such speculations, the nature and cause of temporal processing differences in stuttering remain elusive. It has been shown that children who stutter have problems discriminating non-verbal auditory rhythms, suggesting basic perceptual temporal processing differences that may cascade into the internally paced control of movement (Wieland et al., 2015). This is consistent with earlier ideas that stuttering is induced by asynchronies between the predicted and the actual timing of intervals between the vowels of successive syllables as specified by the same “rhythmic component” (Harrington, 1988). Such a unitary “rhythmic component” may well reflect engagement of the proposed temporal processing system in production and perception.

With this perspective in mind, the current study investigated perceptual verbal temporal processing in people who stutter (PWS) and matched controls listening to continuously presented monosyllabic sequences. Unlike studies that focus on motor aspects of stuttering, the current study decidedly targeted the testing of perceptual timing capacities and their relation to verbal comprehension capacities, using a combination of neuropsychological indices of verbal comprehension, verbal stimulus material, and an implicit perceptual experimental task, i.e. motor aspects of stuttering symptomatology were not of primary interest. Participants were asked to indicate any perceived change in stimulus identity by button-press (1-back same-different judgment). These changes were implemented on the basis of four variants of the consonant–vowel–consonant (CVC) syllable /tak/ spoken by a female speaker. The specific phoneme combination was selected to maximise the principal discriminability between the sounds (two different voiceless stop consonants as opposed to one open vocal tract vowel). This choice was intended to increase the likelihood that subsequent findings could be ascribed to temporal characteristics rather than difficulties in discriminating the sounds per se. The four experimental variants were generated via manipulations (short vs. long) of the total duration and/or the duration of the vocalic nucleus (peak) of the syllable. The latter manipulation was motivated by the finding that this element, rather than the onset of a stimulus, is critical for tempo and regularity judgments (Morton, Marcus, & Frankish, 1976; Port, 2003). Our main hypothesis was that PWS would show decreased sensitivity to these changes due to differences in perceptual temporal processing. This decrease may be specific to or at least stronger for manipulations of either the nucleus or the total syllable duration. 
We further predicted that such a decrease would be associated with longer and more variable response times, although we did not instruct participants to respond as fast as possible as we did not want to include an element of time pressure. Next to these primary hypotheses, we explored the relation of experimental performance and performance in terms of verbal comprehension scores as operationalised in the Wechsler Adult Intelligence Scale (WAIS-III; Wechsler, 1997). This was done to probe potential implications of perceptual temporal processing acuity for verbal comprehension in PWS.

2. Material and methods

2.1. Participants

Participants were 12 right-handed (Edinburgh Handedness Inventory > +80 (Oldfield, 1971)) self-reported PWS (1 woman; mean age: 33.3, SD: 7.45 years) with a history of stuttering dating back to childhood (i.e. with a beginning between ages 3–11) and 12 controls (mean age: 32.6, SD: 7.25 years). Controls matched individual PWS in terms of age (+/− 2 years), handedness, and formal education (in years). Stuttering severity was assessed by means of the Stuttering Severity Instrument (SSI-3 (Riley, 1994)) on the basis of recordings of spontaneous speech samples (assessed in terms of percentage of stuttered syllables and the duration of longest instances of stuttering). This indicated very mild (N = 9), mild (1), severe (1), and very severe (1) forms (mean SSI score = 17, SD = 10, median 12.5) of stuttering in the group. All but three PWS (1 woman, all with very mild forms) had received therapy in the past, either one (N = 2 PWS), two (N = 4), three (N = 2), or four (N = 1) therapy units comprising multiple sessions each. Controls were recruited via databases at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig, while PWS were recruited via advertisements placed at the University of Leipzig and at logopaedic practices. All participants were native speakers of German. None reported any additional previous history of hearing problems or neuropsychological disorders. All participants provided their informed written consent and received a compensatory fee. The study was approved by the ethics committee of the University of Leipzig.

2.2. Verbal comprehension and working memory testing

Prior to the behavioural experiment, all participants were assessed using a selection of subtests from the Wechsler-Intelligenztest für Erwachsene, a German adaptation of the Wechsler Adult Intelligence Scale (WAIS-III; Wechsler, 1997). The selected tests covered the domains of verbal comprehension (Gemeinsamkeiten finden (similarities), Wortschatz-Test (vocabulary), Allgemeines Wissen (information), Allgemeines Verständnis (comprehension)) and working memory (Zahlen nachsprechen (digit span), Rechnerisches Denken (arithmetic), Buchstaben-Zahlen-Folge (letter-number sequencing)), which allowed calculating one index score per domain and a combined verbal IQ score. These specific measures rather than the full IQ score were selected in order to restrict the subsequent analyses to the presumably most relevant aspects (language and working memory). Three one-way ANOVAs were conducted to compare the two groups on these scores, and the scores were further used to explore patterns of co-variation with the experimental variables of interest in the PWS group.

2.3. Experimental setup

During the behavioural experiment, participants sat in a comfortable chair in a dimly-lit testing booth. They were asked to fixate an asterisk continuously displayed on a computer screen that was placed on a table in front of them. Stimulus delivery, online-randomization, and response recording were controlled by Presentation 16.0 (Neurobehavioral Systems). The stimulus material consisted of a pseudo-randomized sequence of 512 syllables with a total duration of approximately 15 min, comprising four variants of the CVC syllable /tak/ (128 of each variant). The syllable (no lexical meaning in German) was selected to provide high phonemic discriminability of consonant and vowel sounds. These syllable variants were derived from two original recordings (stereo, 44100 Hz) of this syllable spoken by a female speaker who was instructed to produce a short vowel in the first instance, and a long vowel in the second instance. The four variants of the original syllable (Figure 1; Table 1) were generated via manipulations of specific sound characteristics using the Audacity 2.0.0 software package (The Audacity Team). These manipulations consisted of normalisation of the original recordings (0.0 dB) followed by a shortening of the original long vowel variant by removing 85 ms of the voice onset time preceding /k/ to generate a first 500 ms template (midpoint of the intended 400 and 600 ms final durations) and then cross-splicing the vowel segment of the original short vowel variant into this template to generate a second 500 ms template (i.e. retaining the original consonant segments of the first template). These templates were then either shortened to 400 ms or lengthened to 600 ms by applying the “change tempo (change tempo without pitch)” effect provided by Audacity over the whole template in order to obtain the four experimental variants. 
This procedure generated one variant with long overall duration and short vowel duration (losv), one with short overall duration and short vowel duration (shsv), one with long overall duration and long vowel duration (lolv), and one with short overall duration and long vowel duration (shlv). The inter-stimulus-interval was 1200 ms. Pseudo-randomization ensured that no more than five instances of one syllable variant were presented consecutively. This guaranteed that all syllable variants were presented throughout the whole sequence, while it also served to reduce the potential number of false positives that we expected to observe with the ongoing repetition of just one syllable variant.
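The run-length constraint described above can be illustrated with a minimal sketch. It is not the Presentation script used in the study; the rejection-sampling strategy (reshuffle until the constraint holds) and the function names `longest_run` and `pseudo_randomize` are assumptions made purely for illustration.

```python
import random

# The four syllable variants named in the text.
VARIANTS = ["losv", "shsv", "lolv", "shlv"]


def longest_run(seq):
    """Return the length of the longest run of identical consecutive items."""
    best = 0
    run = 0
    prev = object()  # sentinel that never equals a real item
    for item in seq:
        run = run + 1 if item == prev else 1
        best = max(best, run)
        prev = item
    return best


def pseudo_randomize(n_per_variant=128, max_run=5, seed=None):
    """Shuffle 4 x n_per_variant syllables until no variant repeats
    more than max_run times in a row (hypothetical rejection sampling)."""
    rng = random.Random(seed)
    sequence = VARIANTS * n_per_variant
    while True:
        rng.shuffle(sequence)
        if longest_run(sequence) <= max_run:
            return list(sequence)
```

With 128 items per variant, a plain shuffle satisfies the constraint fairly often, so the loop typically ends after a few attempts.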

Figure 1. Stimulus material. Four variants of the syllable /tak/ were derived from two original recordings of this syllable spoken by a female speaker. Manipulations involved changing the duration of syllabic nucleus (vowel segment /a/) via cross-splicing and changes of the overall syllable duration, whereas syllable onset (consonant segment /t/) and coda (consonant segment /k/) characteristics were retained from one of the original recordings. Lines in the lower panels depict intensity (grey) and pitch (black) contours. Abbreviations: losv (long overall duration, short vowel duration), shsv (short overall duration, short vowel duration), lolv (long overall duration, long vowel duration), shlv (short overall duration, long vowel duration).

Table 1. Sound characteristics of the two original syllables and the four experimental variants as provided by Praat 6.0.28 (Boersma and Weenink).

Participants listened to the stimuli presented via headphones (Sennheiser HD 202) at a comfortable intensity. They were instructed to pay attention to the stimulus sequence and to indicate a perceived change of stimulus identity relative to the immediately preceding stimulus by pressing the spacebar on a keyboard connected to the stimulus computer (1-back same-different judgment).

The primary variable of interest was the perceptual sensitivity to the changes in the stimulus as indicated by the sensitivity index d’ (Z(hit rate) − Z(false alarm rate)) computed separately for each individual and each condition. Although the primary focus of the study was on group differences in the general ability to perceive the temporal manipulations embedded in the sequence, we accounted for potential differences between the two dimensions of the manipulations by conducting a 2 × 2 × 2 ANOVA with the between subjects factor group (PWS vs. controls) and the within-subject factors stimulus duration (short vs. long) and vowel duration (long vs. short) in IBM SPSS 22 (IBM Corp.). This analysis was performed after adjusting extreme values in the calculation of the d’ values, i.e. hit rates of 1.00 and false alarm rates of 0.00 were adjusted by replacing the former with (n−0.5)/n and the latter with 0.5/n where n is the number of signal or noise trials (Macmillan & Kaplan, 1985; Stanislaw & Todorov, 1999).
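As a minimal sketch of the d’ computation with the extreme-value adjustment described above, the following uses only the Python standard library. The function names and the trial counts in the usage note are hypothetical. The sketch also adjusts rates of 0.00 for hits and 1.00 for false alarms, which is a common convention but goes beyond what the text states.

```python
from statistics import NormalDist


def adjusted_rate(count, n):
    """Return count/n, replacing a rate of 0 by 0.5/n and a rate of 1 by (n - 0.5)/n."""
    if count == 0:
        return 0.5 / n
    if count == n:
        return (n - 0.5) / n
    return count / n


def d_prime(hits, n_signal, false_alarms, n_noise):
    """Sensitivity index d' = Z(hit rate) - Z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = adjusted_rate(hits, n_signal)
    fa_rate = adjusted_rate(false_alarms, n_noise)
    return z(hit_rate) - z(fa_rate)
```

For example, `d_prime(100, 100, 0, 100)` evaluates Z(0.995) − Z(0.005) instead of failing on an infinite Z value.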

Task instructions did not require participants to respond as quickly as possible in order to avoid explicit time-pressure, which may additionally draw on limited temporal processing and/or motor resources. However, the continuous nature of the stimulation required that responses were given during the time frame defined by the duration of the stimuli (400/600 ms) and the subsequent inter-stimulus-interval (1200 ms). Responses falling outside a range of 200 to 1600 ms post-stimulus onset were excluded from all further analyses. Mean response times were calculated for correct responses only and compared between the two groups separately for each syllable variant by means of one-way ANOVAs. These direct comparisons were performed to account for the fact that the absolute response times would vary as a function of the length of the syllable and the actual timing of the manipulation within each stimulus type. However, response time variability was assessed in terms of a directly comparable relative measure, i.e. the respective coefficient of variation (the ratio of the standard deviation to the mean response time calculated for each participant) and thus analysed analogously to the d’ values. Where required, the sequentially rejective Holm–Bonferroni method (Holm, 1979) was applied to address the problem of multiple comparisons. In order to assess any disproportional influence of individual PWS/control pairs on detection performance, response times, and response time variability we performed n−1 jackknife resampling analyses (Tukey, 1958) for each significant finding and additionally report the respective jackknife estimate (mean p value) that was obtained for these analyses.
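The response-window exclusion and the coefficient of variation can be sketched as follows. This is an illustration only: the function names are hypothetical, the window is treated as inclusive at both ends, and the sample standard deviation (n − 1 denominator) is assumed because the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def within_window(rts_ms, lower=200, upper=1600):
    """Keep response times (ms post-stimulus onset) inside the analysis window."""
    return [rt for rt in rts_ms if lower <= rt <= upper]


def coefficient_of_variation(rts_ms, lower=200, upper=1600):
    """Ratio of the standard deviation to the mean of the retained response times.

    Assumes at least two retained responses and the sample standard deviation.
    """
    valid = within_window(rts_ms, lower, upper)
    return stdev(valid) / mean(valid)
```

Because the measure is a ratio, it can be compared across syllable variants whose absolute response times differ.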

3. Results

3.1. Verbal comprehension and working memory

Three one-way ANOVAs were conducted to compare the WAIS-III scores for verbal comprehension, working memory, and verbal IQ between the two groups. The three simultaneous tests resulted in a Holm–Bonferroni adjusted alpha level of .017 for the most significant test. These analyses yielded no significant difference in terms of verbal comprehension, F (1,22) = 1.039, p = .319, ηp2=.045, next to a weak trend towards different verbal IQs, F (1,22) = 3.343, p = .081, ηp2=.132, and a significant difference for working memory, F (1,22) = 6.782, p = .016, ηp2=.236. However, the jackknife estimate was .025 (SD .013) for the latter finding and only four resampled datasets produced p-values smaller than the adjusted alpha level (.017), prompting us to reject this result for the full group. Taken together, these findings suggest comparable performance of the two groups in terms of WAIS-III scores, with some indications of lower working memory performance in the PWS group.
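The sequentially rejective step behind the adjusted alpha level can be sketched as below. The function name `holm_bonferroni` is hypothetical; the p-values in the usage note are the three reported in this section. The sketch covers only the Holm–Bonferroni decision, not the subsequent jackknife check that led to the working memory result being rejected.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject/retain decision per p-value (Holm's step-down procedure).

    The smallest p-value is tested against alpha/m, the next against
    alpha/(m - 1), and so on; testing stops at the first non-rejection.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # all larger p-values are retained as well
    return rejected
```

Calling `holm_bonferroni([0.319, 0.081, 0.016])` (verbal comprehension, verbal IQ, working memory) compares .016 against .05/3 ≈ .017 first and stops at .081, which exceeds .05/2 = .025.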

3.2. Perceptual sensitivity to change

Initial inspection of the data indicated generally lower hit rates in PWS (mean = 2.4, SD = 0.7) than in controls (mean = 2.9, SD = 0.5) (Table 2), with one person per group performing close to ceiling for the shlv variant (PWS .98; control .99) and one other control with similarly high performance for the losv variant only (.98). Levene’s test of equality of error variances performed separately for the d’ values obtained for each condition was non-significant in all cases.

Table 2. Means (M) and standard deviations (SD) for sensitivity indices (d’), hit rates, response times, and response time variability (coefficient of variation, CV) for people who stutter (PWS) and controls and each syllable variant: losv (long overall duration, short vowel duration), shsv (short overall duration, short vowel duration), lolv (long overall duration, long vowel duration), shlv (short overall duration, long vowel duration).

The subsequent ANOVA yielded a significant main effect of group, F (1,22) = 5.64, p = .027, ηp2=.204, while none of the other main effects (stimulus duration, F (1,22) = .241, p = .628, ηp2=.011; vowel duration, F (1,22) = 2.367, p = .138, ηp2=.097) or interactions involving the factor group yielded a significant result (group x stimulus duration, F (1,22) = 1.445, p = .242, ηp2=.062; group x vowel duration, F (1,22) = 2.934, p = .101, ηp2=.118; group x stimulus duration x vowel duration, F (1,22) = 2.343, p = .140, ηp2=.096). The jackknife estimate was .036 (SD .014) for the main effect of group. These findings confirm a global reduction in perceptual sensitivity towards the temporal manipulations in PWS. This finding was not differentially modulated by overall stimulus duration or vowel duration despite some numerical differences suggesting reduced sensitivity to the vowel manipulation (Figure 2).

Figure 2. Mean sensitivity indices (d’) for the detection of changes (top panel), reaction times (RT) for correct responses (middle panel), and coefficients of variation (CV) of response times for correct responses (bottom panel) across the four syllable variants for people who stutter (PWS) and controls. The PWS group included 1 woman (individual 11), 1 very severe case (individual 8), and 1 case of relatively late stuttering onset (individual 8); individuals 3 and 12 had very mild symptoms. (losv = long overall duration, short vowel duration, shsv = short overall duration, short vowel duration, lolv = long overall duration, long vowel duration, shlv = short overall duration, long vowel duration). Error bars indicate standard deviations.

3.3. Response times and response time variability

Initial inspection of the data indicated overall longer response times in PWS than in controls across all syllable variants. However, the planned comparisons confirmed such a difference only for the shlv variant, F (1,22) = 4.696, p = .041, ηp2=.176, for which controls responded faster (Mean 673 ms, SD 64 ms) than PWS (Mean 741 ms, SD 88 ms), with a jackknife estimate of .037 (.026). This was followed by the other long vowel (lolv) variant, F (1,22) = 2.850, p = .105, ηp2=.115 and clearly non-significant results for the two variants with short vowel durations (losv), F (1,22) = .941, p = .342, ηp2=.041; (shsv), F (1,22) = 1.129, p = .299, ηp2=.049.

Coefficients of variation were principally higher in PWS than in controls. As for d’ scores, Levene’s test of equality of error variances performed separately for the coefficient of variation obtained for each condition was non-significant in all cases. The subsequent ANOVA yielded significant interactions of group x vowel duration, F (1,22) = 6.216, p = .021, ηp2=.220, and stimulus duration x vowel duration, F (1,22) = 4.841, p = .039, ηp2=.180, with all other p > .109. Resolving the former by the factor vowel duration provided a significant difference between the groups in their response variability for long vowel variants, F (1,22) = 5.113, p = .034, ηp2=.189, with a jackknife estimate of .040 (SD .020), but not for short vowel variants, F (1,22) = .110, p = .743, ηp2=.005.
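The reported effect sizes can be checked against the F statistics with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is a reader-side consistency check, not part of the reported analysis, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

For the group x vowel duration interaction, `partial_eta_squared(6.216, 1, 22)` rounds to .220, matching the reported value.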

3.4. Co-variation of verbal abilities

Correlation analyses were performed to test for co-variation of experimental performance and verbal comprehension scores (Wechsler, 1997). Analyses were restricted to variables for which significant group differences were obtained in the behavioural results, i.e. they included (i) the mean d’ across all four syllable variants, (ii) the mean coefficient of variation across the two long vowel variants, and (iii) the mean response time for the shlv variant. The experimental variables were not correlated with each other, resulting in four simultaneous tests and a Holm–Bonferroni adjusted alpha level of .0125 for the most significant test. This procedure yielded a significant negative correlation between mean response time variability for long vowel variants and verbal comprehension scores in PWS, r = −.707, p = .010. Similar tendencies of co-variation between response times for the shlv variant and verbal comprehension scores, r = −.539, p = .070, and between verbal comprehension scores and mean d’ scores, r = .525, p = .079, were consistent with this finding but already failed to reach conventional alpha levels. No significant result was obtained for the control group, suggesting a link between higher response time variability for long vowel variants and lower verbal comprehension performance that is specific to PWS.

Finally, as the group difference for working memory was only rejected on the basis of the resampling procedure, we performed an additional correlation analysis to test for co-variance between this score and detection performance for all syllable variants. However, there was no significant relation of these measures (all p > .318), suggesting that any potential working memory issue in the PWS cannot explain the performance in the 1-back same-different judgement.

4. Discussion

The results of the current study confirm decreased global perceptual sensitivity to acoustic changes (as indexed by d’ scores) embedded in a sequence of aurally presented CVC syllables in a group of PWS relative to controls. Although performance in the PWS group was overall high, the calculated sensitivity indices were in all cases lower than for the control group. The specific phonemes used (two stop consonants and one vowel) were selected to promote high discriminability in order to facilitate task performance. All changes in the syllable sequence were based on manipulations of either the total duration or the duration of the vowel segment constituting the syllable nucleus. Like every manipulation of sound, this procedure inevitably entails changes in sound quality. However, the basic nature of the task, the intentional choice of high phonemic discriminability within the syllable, and the focus on generating syllables differing in durational characteristics seem to justify the conclusion that the main findings reflect verbal perceptual temporal processing differences in PWS. This was also expressed in longer response times for the short total duration/long vowel duration (shlv) syllable variant and more variable mean response times for the long vowel variants. The latter parameter was also negatively correlated with verbal comprehension in the PWS group. Although the response time data only establish an indirect link to the perceptual processing of the vowel manipulations, they suggest that problems with the encoding of the vowel over time may influence comprehension scores in PWS. Taken together, the current findings corroborate recent evidence informed by a renewed interest in the relation of timing capacities and stuttering in sensorimotor and sensory contexts (Etchell et al., 2014; Falk et al., 2015; Wieland et al., 2015).

The results call for a systematic assessment of the role of specific temporal stimulus parameters on the one hand and perceptual sensitivity to these parameters in PWS on the other hand. Parameters such as rate, rhythm, and duration can be assessed across different speech-inherent timescales (from brief sounds to conversational interactions) to specify the nature and the cause of timing differences in stuttering (MacKay & MacDonald, 1984). Temporal parameters of speech, their functional interpretation, and their neural signatures each offer starting points in this direction, e.g. in terms of a temporal framework for the description of acoustic structure (Rosen, 1992), the role of oscillatory activity across different frequency bands for comprehension (Peelle & Davis, 2012), or asymmetric time-windows in the neural representation of the signal (Poeppel, 2003). This speech-specific approach may be informed by domain-general characteristics of dedicated temporal processing. For example, the cerebellar role in the precise temporal coding of salient events in the sub-second range (Spencer & Ivry, 2013) may be conceived as a clock signal operating at the phonetic syllable frequency (about 4 Hz) that guides the allocation of attention in time, thereby partly determining the overall quality of speech processing (Schwartze & Kotz, 2013; Schwartze & Kotz, 2016). The speech signal conveys several likely candidates that may constitute respective “salient events” which trigger the cerebellar timing system, e.g. signal onsets, or the successive peaks of acoustic energy associated with the vocalic nucleus (Greenberg, Carvey, Hitchcock, & Chang, 2003; Schwartze & Kotz, 2016). 
A similar notion is expressed in the “theta syllable”, defined as the speech segment between successive vocalic nuclei that sets a temporal window structure for speech decoding indicated by entrainment of neural oscillatory activity in the theta (3–9 Hz) range (Ghitza, Citation2013). The precise timing signal provided by the cerebellum may partly drive such oscillatory activity and provide a “clock-signal” that plays a role in coordinating the perceptual integration of auditory information over time (Schwartze & Kotz, Citation2016).

Against this background, imprecise encoding and decoding of the temporal characteristics of salient events such as vocalic nuclei over time may not only induce stuttering (Harrington, 1988) but also factor into attentional, working memory, and long-term memory processes. This link may also explain co-variation of mean response time variability for the long vowel syllable variants and verbal comprehension scores in the PWS group in the current study. For longer vowels, it may essentially be more difficult to determine and subsequently signal the precise time-point of their occurrence as a “salient event”. However, in light of non-verbal sensorimotor temporal processing differences in PWS (Falk et al., 2015; Olander et al., 2010), it will ultimately be necessary to test whether some temporal processing aspects are speech-specific, and how they interact with other speech-relevant processes, e.g. working memory or the allocation of attention in time. Taken together, our conceptually motivated behavioural study of perceptual temporal processing capacities in PWS led to empirical findings that support the notion that the directly observable motor component of stuttering is a manifestation of a more complex sensorimotor dysfunction. This dysfunction seems to affect not only the ability to fluently encode the temporal structure of successive speech elements but also the abilities to decode the temporal structure of such a sequence and to use this information in a perceptual discrimination task.

Acknowledgements

The authors would like to thank Jana Kynast and Sebastian Wahnelt for support during the data acquisition. Part of this work was conducted while the first author was a member of staff at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig, Germany.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Ackermann, H., Mathiak, K., & Riecker, A. (2007). The contribution of the cerebellum to speech production and speech perception: Clinical and functional imaging data. The Cerebellum, 6, 202–213. doi: 10.1080/14734220701266742
  • Akkal, D., Dum, R. P., & Strick, P. L. (2007). Supplementary motor area and presupplementary motor area: Targets of basal ganglia and cerebellar output. Journal of Neuroscience, 27, 10659–10673. doi: 10.1523/JNEUROSCI.3134-07.2007
  • Alm, P. A. (2004). Stuttering and the basal ganglia circuits: A critical review of possible relations. Journal of Communication Disorders, 37, 325–369. doi: 10.1016/j.jcomdis.2004.03.001
  • Bostan, A. C., & Strick, P. L. (2018). The basal ganglia and the cerebellum: Nodes in an integrated network. Nature Reviews Neuroscience, 19, 338–350. doi: 10.1038/s41583-018-0002-7
  • Chang, S., Kenney, M. K., Loucks, T. M. J., & Ludlow, C. L. (2009). Brain activation abnormalities during speech and non-speech in stuttering speakers. NeuroImage, 46, 201–212. doi: 10.1016/j.neuroimage.2009.01.066
  • Cooper, M. H., & Allen, G. D. (1977). Timing control accuracy in normal speakers and stutterers. Journal of Speech and Hearing Research, 20, 55–71. doi: 10.1044/jshr.2001.55
  • Etchell, A. E., Johnson, B. W., & Sowman, P. F. (2014). Behavioral and multimodal neuroimaging evidence for a deficit in brain timing networks in stuttering: A hypothesis and theory. Frontiers in Human Neuroscience, 8, 467. doi: 10.3389/fnhum.2014.00467
  • Falk, S., Müller, T., & Dalla Bella, S. (2015). Non-verbal sensorimotor timing deficits in children and adolescents who stutter. Frontiers in Psychology, 6, 847. doi: 10.3389/fpsyg.2015.00847
  • Ghitza, O. (2013). The theta-syllable: A unit of speech information defined by cortical function. Frontiers in Psychology, 4, 138. doi: 10.3389/fpsyg.2013.00138
  • Giraud, A.-L., Neumann, K., Bachoud-Levi, A.-C., von Gudenberg, A. W., Euler, H. A., Lanfermann, H., & Preibisch, C. (2008). Severity of dysfluency correlates with basal ganglia activity in persistent developmental stuttering. Brain and Language, 104, 190–199. doi: 10.1016/j.bandl.2007.04.005
  • Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003). Temporal properties of spontaneous speech—a syllable-centric perspective. Journal of Phonetics, 31, 465–485. doi: 10.1016/j.wocn.2003.09.005
  • Guenther, F. H., & Hickok, G. (2016). Neural models of motor speech control. In G. Hickok & S. L. Small (Eds.), Neurobiology of language (pp. 725–740). Amsterdam: Academic Press.
  • Harrington, J. (1988). Stuttering, delayed acoustic feedback, and linguistic rhythm. Journal of Speech, Language, and Hearing Research, 31, 36–47. doi: 10.1044/jshr.3101.36
  • Hilger, A. I., Zelaznik, H., & Smith, A. (2016). Evidence that bimanual motor timing performance is not a significant factor in developmental stuttering. Journal of Speech, Language, and Hearing Research, 59, 674–685. doi: 10.1044/2016_JSLHR-S-15-0172
  • Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
  • Ivry, R. B., & Schlerf, J. E. (2008). Dedicated and intrinsic models of time perception. Trends in Cognitive Sciences, 12, 273–280. doi: 10.1016/j.tics.2008.04.002
  • Kotz, S. A., & Schwartze, M. (2016). Motor-timing and sequencing in speech production: A general-purpose framework. In G. Hickok & S. L. Small (Eds.), Neurobiology of language (pp. 717–724). Amsterdam: Academic Press.
  • Lotzmann, G. (1961). Zur Anwendung variierter Verzögerungszeiten bei Balbuties. Folia Phoniatrica et Logopaedica, 13, 276–312. doi: 10.1159/000262924
  • MacKay, D. G., & MacDonald, M. C. (1984). Stuttering as a sequencing and timing disorder. In R. F. Curlee & W. H. Perkins (Eds.), Nature and treatment of stuttering (pp. 261–282). San Diego, CA: New directions, College-Hill Press.
  • Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis of group data: Estimating sensitivity from average hit and false-alarm rates. Psychological Bulletin, 98, 185–199. doi: 10.1037/0033-2909.98.1.185
  • Mariën, P., Ackermann, H., Adamaszek, M., Barwood, C. H., Beaton, A., Desmond, J., & Ziegler, W. (2014). Consensus paper: Language and the cerebellum: An ongoing enigma. The Cerebellum, 13, 386–410.
  • Merchant, H., Harrington, D. L., & Meck, W. H. (2013). Neural basis of the perception and estimation of time. Annual Review of Neuroscience, 36, 313–336. doi: 10.1146/annurev-neuro-062012-170349
  • Morton, J., Marcus, S. M., & Frankish, C. R. (1976). Perceptual centers (P-centers). Psychological Review, 83, 405–408. doi: 10.1037/0033-295X.83.5.405
  • Olander, L., Smith, A., & Zelaznik, H. (2010). Evidence that a motor timing deficit is a factor in the development of stuttering. Journal of Speech, Language, and Hearing Research, 53, 876–886. doi: 10.1044/1092-4388(2009/09-0007)
  • Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113. doi: 10.1016/0028-3932(71)90067-4
  • Park, J., & Logan, K. J. (2015). The role of temporal speech cues in facilitating the fluency of adults who stutter. Journal of Fluency Disorders, 46, 41–55. doi: 10.1016/j.jfludis.2015.07.001
  • Peelle, J., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology, 3, 320. doi: 10.3389/fpsyg.2012.00320
  • Petacchi, A., Laird, A. R., Fox, P. T., & Bower, J. M. (2005). Cerebellum and auditory function: An ALE meta-analysis of functional neuroimaging studies. Human Brain Mapping, 25, 118–128. doi: 10.1002/hbm.20137
  • Picard, N., & Strick, P. L. (2001). Imaging the premotor areas. Current Opinion in Neurobiology, 11, 663–672. doi: 10.1016/S0959-4388(01)00266-5
  • Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as ‘asymmetric sampling in time’. Speech Communication, 41, 245–255. doi: 10.1016/S0167-6393(02)00107-3
  • Port, R. F. (2003). Meter and speech. Journal of Phonetics, 31, 599–611. doi: 10.1016/j.wocn.2003.08.001
  • Riley, G. (1994). The stuttering severity instrument for adults and children (SSI-3) (3rd ed.). Austin, Texas: PRO-ED.
  • Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and linguistic aspects. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 336, 367–373. doi: 10.1098/rstb.1992.0070
  • Sares, A. G., Deroche, M. L. D., Shiller, D. M., & Gracco, V. L. (2018). Timing variability of sensorimotor integration during vocalization in individuals who stutter. Scientific Reports, 8, 16340. doi: 10.1038/s41598-018-34517-1
  • Sares, A. G., Deroche, M. L. D., Shiller, D. M., & Gracco, V. L. (2019). Adults who stutter and metronome synchronization: Evidence for a nonspeech timing deficit. Annals of the New York Academy of Sciences.
  • Schwartze, M., & Kotz, S. A. (2013). A dual-pathway neural architecture for specific temporal prediction. Neuroscience and Biobehavioral Reviews, 37, 2587–2596. doi: 10.1016/j.neubiorev.2013.08.005
  • Schwartze, M., & Kotz, S. A. (2016). Contributions of cerebellar event-based temporal processing and preparatory function to speech perception. Brain and Language, 161, 28–32. doi: 10.1016/j.bandl.2015.08.005
  • Scott, S. K., McGettigan, C., & Eisner, F. (2009). A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nature Reviews Neuroscience, 10, 295–302. doi: 10.1038/nrn2603
  • Soderberg, G. A. (1968). Delayed auditory feedback and stuttering. Journal of Speech and Hearing Disorders, 33, 260–267. doi: 10.1044/jshd.3303.260
  • Spencer, R. M. C., & Ivry, R. B. (2013). Cerebellum and timing. In M. Manto, D. L. Gruol, J. D. Schmahmann, N. Koibuchi, & F. Rossi (Eds.), Handbook of the cerebellum and cerebellar disorders (pp. 1201–1219). Dordrecht: Springer.
  • Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection measures. Behavior Research Methods, Instruments, & Computers, 31, 137–149. doi: 10.3758/BF03207704
  • Strick, P. L., Dum, R. P., & Fiez, J. A. (2009). Cerebellum and nonmotor function. Annual Review of Neuroscience, 32, 413–434. doi: 10.1146/annurev.neuro.31.060407.125606
  • Tukey, J. W. (1958). Bias and confidence in not-quite large samples. The Annals of Mathematical Statistics, 29, 614–623. doi: 10.1214/aoms/1177706647
  • Van Riper, C. (1982). The nature of stuttering. Englewood Cliffs, NJ: Prentice-Hall.
  • Van Riper, C., & Erickson, R. L. (1996). Speech correction: An introduction to speech pathology and audiology. Boston: Allyn and Bacon.
  • Wechsler, D. (1997). Wechsler adult intelligence scale (3rd ed.). San Antonio, TX: The Psychological Corporation.
  • Wieland, E. A., McAuley, J. D., Dilley, L. C., & Chang, S. (2015). Evidence for a rhythm perception deficit in children who stutter. Brain and Language, 144, 26–34. doi: 10.1016/j.bandl.2015.03.008
  • Wiener, M., Turkeltaub, P., & Coslett, H. B. (2010). The image of time: A voxel-wise meta-analysis. NeuroImage, 49, 1728–1740. doi: 10.1016/j.neuroimage.2009.09.064