Special Section: Brain Oscillations in Language

Acoustic-driven delta rhythms as prosodic markers

Pages 545-561 | Received 16 Jan 2016, Accepted 19 Aug 2016, Published online: 11 Oct 2016
 

ABSTRACT

Oscillation-based models of speech perception postulate a cortical computation principle by which decoding is performed within a time-varying window structure, synchronised with the input on multiple time scales. The windows are generated by a segmentation process, implemented by a cascade of oscillators. This paper tests the hypothesis that prosodic segmentation is driven by a “flexible” (in contrast to autonomous, “rigid”) oscillator in the delta range (0.5–3 Hz) that tracks prosodic rhythms, such that intelligibility suffers when the ability of this oscillator to synchronise to those rhythms is disrupted. In setting phrasal boundaries, both bottom-up acoustic-driven and top-down context-invoked processes interact in a manner that is difficult to decompose. The present experiments therefore used context-free random-digit strings in order to focus exclusively on bottom-up processes. Two experiments are reported. Listeners performed a target-identification task, listening to stimuli with prescribed chunking patterns (Experiment I) or chunking rates (Experiment II), followed by a target. Irrespective of the chunking pattern, performance is high only for targets inside a chunk, pointing to the benefit of acoustic prosodic segmentation in digit retrieval. Importantly, performance remains high as long as the chunking rate is within the frequency range of neuronal delta, but deteriorates sharply for higher rates. These data provide psychophysical evidence for the role of acoustic-driven segmentation, with flexible delta oscillations at the core, in digit retrieval.

Acknowledgments

I would like to thank AT&T Labs and Interactions LLC for allowing me access to the Natural Voices Text-To-Speech system, and in particular Mark Beutnagel for instructions and advice; Yair Ghitza for conducting the hierarchical logistic regression analysis of the data; Nelson Cowan for bringing Ryan's work (1969) to my attention; Mark Liberman for sharing the preliminary data on sound/silence durations at the phrasal level; Peter Cariani and David Poeppel for commenting on an earlier version of the manuscript; and the two anonymous reviewers for their thorough comments.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1. The core is a vocabulary of “Lego” speech segments from which the stimuli presented to the listeners were synthesised by concatenation.

2. A phrase is meant to be a group of words roughly 1-s long – not necessarily a sentence.

3. When attending to time-compressed speech, listeners experience insensitivity to moderate time-scale variations; deterioration in intelligibility for compression factors beyond 3; and a recovery of intelligibility by repackaging (e.g. Ghitza, 2014; Ghitza & Greenberg, 2009), where “repackaging” is a process of dividing the time-compressed waveform into fragments, called packets, and delivering the packets at a prescribed rate.
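For concreteness, a minimal sketch of the repackaging operation described above (not the implementation used in the cited studies; the sampling rate, packet duration and inserted-gap duration are illustrative placeholders):

import numpy as np

def repackage(compressed_signal, fs, packet_ms=40.0, gap_ms=80.0):
    # Split the time-compressed waveform into fixed-length "packets"
    # and re-deliver them at a slower, prescribed rate by inserting
    # silent gaps between successive packets.
    packet_len = int(fs * packet_ms / 1000.0)   # samples per packet
    gap = np.zeros(int(fs * gap_ms / 1000.0))   # inserted silence
    out = []
    for start in range(0, len(compressed_signal), packet_len):
        out.append(compressed_signal[start:start + packet_len])
        out.append(gap)
    return np.concatenate(out)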

4. Which acoustic landmarks drive the flexible theta oscillator? Two options have been considered, yet to be vetted: (i) CV boundaries (“acoustic edges”), and (ii) vocalic nuclei (“mid vowels”). Here, the vocalic nuclei are preferred because of robustness considerations: in the presence of background noise the “islands” of reliable acoustics are the mid-vowel regions (Ghitza, 2013).

5. The theta-syllable (Ghitza, 2013) is a discrete speech-information unit defined by cortical function. Its acoustic correlate is a theta-cycle-long speech segment located between two successive vocalic nuclei. As such, a theta-syllable is aligned with a VCV cluster.

6. This match can be viewed as a synchronisation between the amount of information in the input stream and the necessary decoding time at the pre-lexical level, determined by the flexible theta oscillator (Ghitza, 2011).

7. The AT&T-TTS system (http://www.wizzardsoftware.com/text-to-voice.php) uses a form of concatenative synthesis based on a unit-selection process, where the units are cut from a large inventory of high-quality, pre-recorded natural-voice fragments. The system produces natural-sounding, highly intelligible spoken material with a realistic prosodic rhythm – with accentuation defined by the system's internal prosodic rules – and is considered to have some of the finest-quality synthesis of any commercial product.

8. The standard deviation here is the square root of the unbiased estimator of the variance.
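In symbols, for n measurements x_1, …, x_n with sample mean \bar{x}:

s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2} }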

9. Because these simulations are not simply standard error calculations, the credible intervals are not restricted to be symmetrical around the mean, as can be seen upon close inspection of the data presented later.
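For illustration only (the distribution and its parameters below are hypothetical stand-ins, not the fitted model), a percentile-based credible interval computed from simulation draws shows how such an interval can sit asymmetrically around the mean:

import numpy as np

# Hypothetical posterior draws of a proportion-correct parameter,
# standing in for the hierarchical-logistic-regression simulations.
draws = np.random.default_rng(0).beta(8, 2, size=10_000)

lower, upper = np.percentile(draws, [2.5, 97.5])   # 95% credible interval
mean = draws.mean()
print(mean - lower, upper - mean)                  # the two half-widths differ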

10. To illustrate our chunking-pattern notation: the root string 3762895069, for example, can be chunked into the regular chunking pattern 22222 [37 62 89 50 69], or into the irregular pattern 3322 [376 289 50 69], etc.
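The notation can be made explicit with a short sketch (the helper function is ours, for illustration only):

def chunk(digits, pattern):
    # Split a root digit string into chunks whose lengths follow `pattern`,
    # e.g. pattern [2, 2, 2, 2, 2] is the "22222" chunking pattern.
    assert sum(pattern) == len(digits)
    chunks, i = [], 0
    for n in pattern:
        chunks.append(digits[i:i + n])
        i += n
    return chunks

chunk("3762895069", [2, 2, 2, 2, 2])   # ['37', '62', '89', '50', '69']
chunk("3762895069", [3, 3, 2, 2])      # ['376', '289', '50', '69']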

11. If we were to be consistent with our notation, 1111111111 (a sequence of ten 1's) should have been used. Alas, we use the shorthand 11111 instead.

12. Time compression uses a pitch-synchronous overlap-and-add (PSOLA) procedure (Moulines & Charpentier, 1990) incorporated into PRAAT (http://www.fon.hum.uva.nl/praat/) – a speech analysis and modification toolbox. In the time-compressed signal, the formant patterns and other spectral properties are altered in duration; however, the fundamental frequency (pitch) contour remains the same (this is the motivation for using PSOLA methods).
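A minimal sketch of this compression step, assuming the praat-parselmouth Python bindings to PRAAT rather than the PRAAT GUI (the file name and compression factor are placeholders; “Lengthen (overlap-add)” is PRAAT's PSOLA-style duration manipulation, and a duration factor of 1/3 corresponds to a compression factor of 3):

import parselmouth
from parselmouth.praat import call

sound = parselmouth.Sound("digit_string.wav")   # placeholder file name
# Minimum pitch 75 Hz, maximum pitch 600 Hz, duration factor 1/3:
compressed = call(sound, "Lengthen (overlap-add)", 75, 600, 1 / 3)
compressed.save("digit_string_x3.wav", "WAV")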

13. This is so across languages.

14. Preliminary data suggest that this indeed is the case (Liberman, 2016a, 2016b; Ryant & Liberman, 2016).

Additional information

Funding

This study was funded by a research grant from the Air Force Office of Scientific Research.
