Original Articles

Acoustic correlates of information structure

Pages 1044-1098 | Published online: 06 Oct 2010
 

Abstract

This paper reports three studies addressing three questions about the acoustic correlates of information structure in English: (1) do speakers mark information structure prosodically; (2) to the extent that they do, what are the acoustic features associated with different aspects of information structure; and (3) how well can listeners retrieve this information from the signal? The information structure of subject–verb–object sentences was manipulated via the questions preceding those sentences: elements in the target sentences were either focused (i.e., the answer to a wh-question) or given (i.e., mentioned in prior discourse); furthermore, focused elements had either an implicit or an explicit contrast set in the discourse; finally, either only the object was focused (narrow object focus) or the entire event was focused (wide focus). The results across all three experiments demonstrated that people reliably mark (1) focus location (subject, verb, or object) using greater intensity, longer duration, and higher mean and maximum F0, and (2) focus breadth, such that narrow object focus is marked with greater intensity, longer duration, and higher mean and maximum F0 on the object than wide focus. Furthermore, when participants are made aware of prosodic ambiguity present across different information structures, they reliably mark focus type, so that contrastively focused elements are produced with greater intensity, longer duration, and lower mean and maximum F0 than noncontrastively focused elements. In addition to having important theoretical consequences for accounts of semantics and prosody, these experiments demonstrate that linear residualisation successfully removes individual differences in people's productions, thereby revealing cross-speaker generalisations. Finally, discriminant modelling allows us to determine objectively the acoustic features that underlie meaning differences.

Notes

1 Numerous terms are used in the literature to refer to the distinction between the information that is old for the listener and the information that the speaker is adding to the discourse: background and foreground; given and new; topic and comment; theme and rheme, etc. In this paper, we will use the term given to refer to the parts of the utterance which are old to the discourse, and focused to refer to the part of the utterance which is new to the discourse.

2 Topic, the third component of information structure, describes which discourse referent focused information should be associated with, as in the mention of Damon in “As for Damon, he fried an omelet”. The current studies do not address the prosodic realisation of topic.

3 The ToBI system was developed in the early 1990s as the standard system for annotation of prosodic features (Silverman et al., 1992).

4 In the absence of explicit instruction to produce complete sentences, with a lexicalised subject, verb, and object, speakers would likely resort to pronouns or would omit given elements altogether (e.g., What did Damon fry this morning? An omelet). A complete production account of information structure meaning distinctions should include not just the prosodic cues used by the speakers, but also syntactic and lexical production choices, as well as the interaction among these different production strategies. However, because we focus on prosody in the current investigation, we wanted to be able to compare acoustic features across identical words. Thus, we required that participants always produce a subject, verb, object, and temporal modifier on every trial.

5 LDA calculates a function, computed as a linear combination of all predictors entered, which results in the best separation of two or more groups. For two groups, only one function is computed. For three groups, the first function provides the best separation of Group 1 from Groups 2 and 3; a second, orthogonal, function provides the best separation of Groups 2 and 3, after partialling out variance accounted for by the first function. Stepwise LDA is an iterative procedure which adds predictors based on which of the candidate predictors provide the best discrimination.
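As an illustration of the two-group case described in this note, the following is a minimal sketch of a single linear discriminant function computed from two predictors. It is not the analysis code used in the studies: the function names (`fisher_lda_2d`, `classify`), the equal-prior midpoint threshold, and the duration and intensity values are all hypothetical, chosen only to show how a linear combination of predictors separates two groups.

```python
from statistics import mean


def fisher_lda_2d(group_a, group_b):
    """Return (weights, threshold) of a two-group linear discriminant.

    Each group is a list of (feature_1, feature_2) pairs. The weights are
    the inverse pooled within-group scatter matrix applied to the
    difference between the group means.
    """
    mean_a = [mean(p[0] for p in group_a), mean(p[1] for p in group_a)]
    mean_b = [mean(p[0] for p in group_b), mean(p[1] for p in group_b)]

    # Pooled within-group scatter matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((group_a, mean_a), (group_b, mean_b)):
        for x in group:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]

    # Closed-form inverse of the 2 x 2 scatter matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    diff = (mean_a[0] - mean_b[0], mean_a[1] - mean_b[1])
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    # Assumption: equal priors, so the cut-off is the midpoint between
    # the two projected group means.
    score_a = weights[0] * mean_a[0] + weights[1] * mean_a[1]
    score_b = weights[0] * mean_b[0] + weights[1] * mean_b[1]
    return weights, (score_a + score_b) / 2


def classify(x, weights, threshold, label_a="focused", label_b="given"):
    """Assign x to group A if its discriminant score exceeds the threshold."""
    score = weights[0] * x[0] + weights[1] * x[1]
    return label_a if score > threshold else label_b


# Hypothetical (duration in s, intensity in dB) values for one word.
focused = [(0.30, 70.0), (0.34, 72.0), (0.32, 69.0), (0.36, 73.0)]
given = [(0.22, 64.0), (0.25, 66.0), (0.21, 63.0), (0.26, 65.0)]

weights, threshold = fisher_lda_2d(focused, given)
print(classify((0.35, 72.0), weights, threshold))
print(classify((0.22, 64.0), weights, threshold))
```

With three groups, a second orthogonal function would be needed, as the note describes; that case is omitted here to keep the sketch short.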

6 Wilks's lambda is a measure of the distance between groups on means of the independent variables, and is computed for each function. It ranges in size from 0 to 1; lower values indicate a larger separation between groups. The extent to which the model can effectively discriminate a new set of data is simulated by a leave-one-out classification, in which the acoustic data from each production are iteratively removed from the dataset, the model is computed, and the left-out case is classified by the resultant functions.
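The two quantities in this note can be sketched for the simplest case of a single predictor. In that case Wilks's lambda reduces to the within-group sum of squares divided by the total sum of squares, and a linear discriminant with equal priors assigns each case to the group with the nearest mean. The function names and duration values below are hypothetical, and the one-predictor simplification is an assumption made for brevity, not the multi-predictor model reported in the paper.

```python
from statistics import mean


def wilks_lambda_1d(groups):
    """Within-group sum of squares over total sum of squares (one predictor).

    Values near 0 indicate well-separated groups; a value of 1 indicates
    identical group means.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((v - group_mean) ** 2 for v in g)
    return ss_within / ss_total


def leave_one_out_accuracy(values, labels):
    """Refit on all cases but one, classify the left-out case, and repeat."""
    correct = 0
    for i in range(len(values)):
        train = [(v, lab)
                 for j, (v, lab) in enumerate(zip(values, labels)) if j != i]
        # Group means are recomputed without the left-out case.
        centroids = {
            lab: mean(v for v, l in train if l == lab)
            for lab in {l for _, l in train}
        }
        predicted = min(centroids, key=lambda lab: abs(values[i] - centroids[lab]))
        correct += predicted == labels[i]
    return correct / len(values)


# Hypothetical word durations in seconds.
focused = [0.30, 0.34, 0.32, 0.36]
given = [0.22, 0.25, 0.21, 0.26]

lam = wilks_lambda_1d([focused, given])
accuracy = leave_one_out_accuracy(
    focused + given,
    ["focused"] * len(focused) + ["given"] * len(given),
)
print(round(lam, 3), accuracy)
```

Because the left-out case never contributes to the group means used to classify it, the resulting accuracy is a less optimistic estimate than classifying the same cases the model was fitted on.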

7 The coefficients in indicate which acoustic features best discriminate focus location, such that larger absolute values indicate a greater contribution of that feature to discrimination. For example, inspection of the plot in and the coefficients in the focus location columns of shows that the acoustic features of Damon score around zero, or lower, on the first function (−0.002, 0.001, −0.01, and −0.06) and around zero on the second function (−0.003, 0.021, −0.016, and −0.101). Fried shows a different pattern; specifically, the acoustic features of fried have coefficients around zero for the first function, and negative coefficients for the second function. Finally, omelet shows a third pattern: its acoustic correlates are centered around zero for the first function, but are high for the second function.

8 In early pilots in which there was no feedback for incorrect responses, we observed that listeners were at chance in choosing the correct question.

9 Importantly, the F0 results are not artefacts of the residualisation procedure employed to remove variance from the acoustic features due to speaker and item. The same numerical pattern of F0 values is observed whether residualisation is employed or not, though only the residualised acoustic features successfully discriminate focus type.
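The idea behind residualisation can be sketched as follows. The exact regression specification used in the studies is not given in this excerpt, so the sketch assumes a fully balanced design with an additive speaker and item model, for which the residual is the observed value minus the speaker mean minus the item mean plus the grand mean. The function name `residualise`, the speaker labels, the second item, and all F0 values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def residualise(rows):
    """Remove additive speaker and item effects from an acoustic measure.

    rows is a list of (speaker, item, condition, value) tuples.
    Assumption: a balanced design, so the fitted value of the additive
    model is speaker mean + item mean - grand mean. Condition is not part
    of the model, so its effect remains in the residuals.
    """
    grand_mean = mean(value for _, _, _, value in rows)
    by_speaker = defaultdict(list)
    by_item = defaultdict(list)
    for speaker, item, _, value in rows:
        by_speaker[speaker].append(value)
        by_item[item].append(value)
    speaker_mean = {s: mean(vals) for s, vals in by_speaker.items()}
    item_mean = {i: mean(vals) for i, vals in by_item.items()}
    return [
        value - speaker_mean[speaker] - item_mean[item] + grand_mean
        for speaker, item, _, value in rows
    ]


# Hypothetical mean F0 values in Hz: speaker S2 has a much higher baseline
# than S1, and both raise F0 by the same amount under focus.
rows = [
    ("S1", "omelet", "focused", 135.0), ("S1", "omelet", "given", 120.0),
    ("S1", "potato", "focused", 145.0), ("S1", "potato", "given", 130.0),
    ("S2", "omelet", "focused", 225.0), ("S2", "omelet", "given", 210.0),
    ("S2", "potato", "focused", 235.0), ("S2", "potato", "given", 220.0),
]

residuals = residualise(rows)
for (speaker, item, condition, _), r in zip(rows, residuals):
    print(speaker, item, condition, r)
```

In the raw values, the given productions of S2 have higher F0 than the focused productions of S1, so a single cut-off cannot separate the conditions. After residualisation the speaker baselines are gone and the focused and given productions differ in the same direction for both speakers, which is the kind of cross-speaker generalisation the note refers to.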
