REGULAR ARTICLES

Prosodic prominence effects in the processing of spectral cues

Pages 586-611 | Received 20 Aug 2020, Accepted 30 Nov 2020, Published online: 11 Jan 2021
 

ABSTRACT

Two experiments test how phrasal prominence influences listeners' perception of vowel contrasts and how prominence information and vowel formant cues are integrated in processing. Experiment 1 finds that listeners incorporate phrasal prominence into their perception of vowels, in line with how prominence modulates spectral structure in speech. Experiment 2 explores how prominence information is integrated with formant cues in a visual world eyetracking task. Prominence shows an overall later influence in processing, in line with current models of prosodic and segmental integration. However, prominence also had a subtler, immediate effect on listeners' perception of formants, such that prominence information directly shapes how formant cues are perceived. Results are discussed in terms of their implications for models of prosodic effects in segmental perception and of possible differences between prosodic prominence and prosodic boundaries in this regard.

Acknowledgments

Many thanks are due to Sun-Ah Jun, Taehong Cho, Pat Keating and Megha Sundara for valuable suggestions and discussion, and to three anonymous reviewers for very helpful feedback. Further thanks to Danielle Bagnas, Juliana Casparian, Qingxia Guo and Jae Weller for their help with data collection and to Adam Royer for recording the speech materials used in this study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Here the terminology from the prosodic hierarchy described in Beckman and Pierrehumbert (1986) and Pierrehumbert (1980) is adopted, in keeping with much of the literature on domain-initial strengthening.

2 For example, AP-internal [p*uri], as in (porasεk p*uri), where parentheses indicate an AP boundary, may mean “beak” or “root”. However, an AP-initial target word disambiguates the meaning: (porasεk) (p*uri) can only mean “root”.

3 A P-score is the proportion of times naïve listeners perceive a given word as prominent, based on annotations they make in real time while listening to a speech sample; see e.g. Cole and Shattuck-Hufnagel (2016).
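
As a minimal illustration of the P-score arithmetic, the Python sketch below assumes each listener's annotation is stored as a 0/1 mark per word (a hypothetical data layout; the function name `p_scores` is likewise illustrative). Each word's P-score is then the number of listeners who marked it as prominent divided by the total number of listeners.

```python
from typing import Sequence


def p_scores(annotations: Sequence[Sequence[int]]) -> list[float]:
    """Return one P-score per word.

    annotations[i][j] is 1 if listener i marked word j as prominent and 0
    otherwise (a hypothetical data layout chosen for illustration).
    """
    n_listeners = len(annotations)
    n_words = len(annotations[0])
    # Proportion of listeners who marked each word as prominent.
    return [
        sum(listener[j] for listener in annotations) / n_listeners
        for j in range(n_words)
    ]


# Three hypothetical listeners annotating the same four-word utterance.
marks = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
print(p_scores(marks))
```

Here the second word, marked by all three listeners, receives a P-score of 1.0, while the third word, marked by none, receives 0.0.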

4 For example, the finding that pitch perception is relative to pitch range and context is well established in the psychoacoustic literature (Plantinga & Trainor, 2005; Repp, 1997; Schellenberg & Trehub, 2003), and the context-dependence of duration perception is also well established (Bosker et al., 2017; Diehl & Walsh, 1989; Jones & McAuley, 2005). Even in the case of Steffman and Jun (2019), where listeners heard a single isolated word on each trial, perception of this word would be relative to stimuli heard on other trials, i.e. to the global context of the experiment (cf. Bigand & Pineau, 1997; Jones & McAuley, 2005).

5 Effects of speech rate on lexical processing (Dilley & Pitt, 2010; Reinisch et al., 2011a, 2011b) provide evidence of another sort of expectational influence in this domain.

6 This pattern does not necessarily occur for high vowels, where sonority expansion might jeopardise attainment of the articulatory target for the vowel gesture: in these cases sonority expansion can be suppressed (Cho, 2005), or other prominence enhancement effects, e.g. hyper-articulation, are observed (de Jong, 1991, 1995).

7 The stimuli can be accessed via a repository hosted on the Open Science Framework at: https://osf.io/4cemb/.

8 An exploratory analysis found that a non-transformed preference measure, modelling looks only to “ebb” or “ab”, yielded essentially the same results, as expected (cf. Kingston et al., 2016).

9 These factor smooths were shown to provide a better model fit than trajectories that were only by-participant, as assessed by comparing fREML scores using the compareML() function in itsadug: including more complex factor smooths both increased fREML and decreased AIC.

10 This comparison was carried out by comparing model scores using the compareML() function in itsadug, as in Nixon et al. (2016). The original model was compared to one in which prominence condition was removed from the three-way interaction.

11 AR1 models assume that neighbouring observations in a time series are correlated, such that the error in one time bin (in this case) partly depends on the error in adjacent bins. Modelling this correlation during parameter estimation helps remove correlations among residuals; see e.g. Baayen et al. (2018) for more information.
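
The Python sketch below illustrates the AR1 idea under simplifying assumptions: it simulates errors in which each value is `rho` times the previous value plus white noise, estimates `rho` as the lag-1 autocorrelation of those errors, and then subtracts `rho` times the previous error from each value to remove the dependence. All function names and parameter values are hypothetical. In GAMM software the estimated `rho` is instead supplied to the model (e.g. the `rho` argument of mgcv's `bam()`); this sketch is not necessarily how the present models were specified.

```python
import random


def simulate_ar1(n: int, rho: float, sigma: float = 1.0, seed: int = 1) -> list[float]:
    """Simulate AR(1) errors: e[t] = rho * e[t-1] + Gaussian white noise."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma)]
    for _ in range(1, n):
        errors.append(rho * errors[-1] + rng.gauss(0.0, sigma))
    return errors


def lag1_autocorrelation(x: list[float]) -> float:
    """Sample correlation between each value and the value one step earlier."""
    mean = sum(x) / len(x)
    centred = [v - mean for v in x]
    num = sum(a * b for a, b in zip(centred[1:], centred[:-1]))
    den = sum(v * v for v in centred)
    return num / den


def whiten(x: list[float], rho: float) -> list[float]:
    """Remove the AR(1) dependence: u[t] = e[t] - rho * e[t-1]."""
    return [x[t] - rho * x[t - 1] for t in range(1, len(x))]


# Hypothetical residuals from a time-binned model that ignores autocorrelation.
residuals = simulate_ar1(n=20000, rho=0.8)
rho_hat = lag1_autocorrelation(residuals)
print(round(rho_hat, 2))
print(round(lag1_autocorrelation(whiten(residuals, rho_hat)), 2))
```

With these settings the first printed value should lie close to 0.8 and the second close to 0, i.e. once the AR1 dependence is accounted for, the transformed residuals are approximately uncorrelated.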

12 Following Maslowski et al. (2020), an alternative operationalisation of the effect would be to compare the two steps which are acoustically most different (i.e. steps 1 and 6). This comparison yielded a similar though slightly earlier effect, with divergence estimated at 258 ms after target onset.
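
One common way to turn a difference curve into a divergence estimate is to take the first time point at which the confidence interval around the estimated difference excludes zero. The Python sketch below assumes that the difference between two continuum steps and its standard error have already been estimated at a series of time points (e.g. from a fitted GAMM). The function name, the 1.96 multiplier and the optional `min_run` requirement are illustrative assumptions, not necessarily the procedure behind the estimates reported here.

```python
from typing import Optional, Sequence


def divergence_point(
    times_ms: Sequence[float],
    diff: Sequence[float],
    se: Sequence[float],
    z: float = 1.96,
    min_run: int = 1,
) -> Optional[float]:
    """Return the first time at which the interval diff ± z * se excludes
    zero for at least `min_run` consecutive points, or None if it never does."""
    run_start: Optional[float] = None
    run_len = 0
    for t, d, s in zip(times_ms, diff, se):
        if abs(d) > z * s:  # interval lies entirely above or below zero
            if run_len == 0:
                run_start = t
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0  # the run is broken; start counting again
    return None


# Toy difference curve (step 1 minus step 6) sampled every 100 ms.
times = [0, 100, 200, 300, 400, 500]
diff = [0.0, 0.25, 0.05, 0.3, 0.5, 0.6]
se = [0.1] * 6
print(divergence_point(times, diff, se))
print(divergence_point(times, diff, se, min_run=2))
```

Requiring a sustained run (`min_run=2`) skips the isolated exceedance at 100 ms in this toy series and returns 300 ms instead, which guards against treating a momentary blip as the onset of the effect.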

13 The divergence estimate shown below is given for the post-focus condition (as collapsing across conditions is not possible); the effect in the NPA condition showed a similar timecourse.

14 This timing asynchrony was also seen in a more traditional moving window analysis, not included here. In that analysis, time was binned into 100 ms windows and a linear mixed effects regression on logit-transformed preference measures was run in each window. Continuum step began to have a significant effect in the 300–400 ms window. Phrasal prominence began to have a significant effect in the 500–600 ms window, though notably the prominence effect approached significance earlier, similarly to Kim et al. (2018).
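
The binning step of such a moving window analysis can be sketched in Python as follows. The sketch assumes eye-tracking samples are available as (time, fixated object) pairs, with time in ms from target onset, and computes an empirical logit of looks to “ebb” over looks to “ab” in each 100 ms window, adding 0.5 to each count to avoid taking the log of zero. The data layout, the function name and the 0.5 correction are assumptions for illustration; the mixed effects regression fitted in each window is not reproduced here.

```python
import math
from collections import defaultdict
from typing import Iterable, Tuple


def window_preferences(
    samples: Iterable[Tuple[float, str]],
    window_ms: int = 100,
    correction: float = 0.5,
) -> dict[int, float]:
    """Return {window start (ms): empirical logit of 'ebb' vs 'ab' looks}.

    Looks to anything other than 'ebb' or 'ab' still mark the window as
    observed but do not count towards either target.
    """
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # [ebb, ab]
    for time_ms, fixated in samples:
        start = int(time_ms // window_ms) * window_ms
        tally = counts[start]  # creates the window even if neither target is fixated
        if fixated == "ebb":
            tally[0] += 1
        elif fixated == "ab":
            tally[1] += 1
    # Positive values indicate a preference for "ebb", negative for "ab".
    return {
        start: math.log((ebb + correction) / (ab + correction))
        for start, (ebb, ab) in sorted(counts.items())
    }


# Hypothetical samples from one trial: (ms after target onset, fixated object).
trial = [(10, "ebb"), (20, "ebb"), (30, "ab"), (150, "ab"), (160, "other"), (250, "other")]
for start, pref in window_preferences(trial).items():
    print(f"{start}-{start + 100} ms: {pref:.2f}")
```

Each window's value would then serve as the dependent variable in a separate regression, which is what makes the moving window approach coarser in time than the GAMM analysis.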

15 Notably, this asymmetry exists even though the vowel preceding the target is longer in the post-focus condition (262 ms, compared with 200 ms in the NPA condition), giving listeners more time to compute the prosodic structure of the phrase as it unfolds.

16 As noted in Section 2.1, not all vowels show clear sonority expansion effects in speech production (Cho, 2005; de Jong, 1995). For example, prominent high vowels (e.g. /i/ in American English; Cho, 2005) can show more extreme articulations under prominence (contra sonority expansion). Future work will accordingly benefit from testing how phrasal prominence impacts listeners' perception of other vowel contrasts, including those that do not undergo sonority expansion.

17 It is worth reiterating here, however, that Kim et al. (2018) found subtle, non-significant effects of AP phrasing on processing, which highlights the need to explore further how boundary processing might also exert early phonetic context effects.
