
What do we mean by prediction in language comprehension?

Gina R. Kuperberg & T. Florian Jaeger
Pages 32-59 | Received 24 Mar 2015, Accepted 28 Sep 2015, Published online: 13 Nov 2015
 

ABSTRACT

We consider several key aspects of prediction in language comprehension: its computational nature, the representational level(s) at which we predict, whether we use higher-level representations to predictively pre-activate lower-level representations, and whether we “commit” in any way to our predictions, beyond pre-activation. We argue that the bulk of behavioural and neural evidence suggests that we predict probabilistically and at multiple levels and grains of representation. We also argue that we can, in principle, use higher-level inferences to predictively pre-activate information at multiple lower representational levels. We suggest that the degree and level of predictive pre-activation might be a function of its expected utility, which, in turn, may depend on comprehenders’ goals and their estimates of the relative reliability of their prior knowledge and the bottom-up input. Finally, we argue that all these properties of language understanding can be naturally explained and productively explored within a multi-representational hierarchical actively generative architecture whose goal is to infer the message intended by the producer, and in which predictions play a crucial role in explaining the bottom-up input.

Acknowledgements

We thank Meredith Brown, Ralf Haefner, David Kleinschmidt, Rajeev Raizada, Michael Tanenhaus, and Eddie Wlotko for extended and very helpful discussions reflected in this paper. We also thank Meredith Brown, Vera Demberg, JP de Ruiter, Kara Federmeier, Karl Friston, Ray Jackendoff, Tal Linzen, and our two anonymous reviewers for their excellent feedback on the manuscript. All errors remain the authors'. We are also very grateful to Arim Choi Perrachione for all her help with manuscript preparation.

In memory of Bern Milton Jacobson, 1919–2015.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. To derive cloze probabilities, a group of participants are presented with a series of sentence contexts and asked to produce the most likely next word for each context. The cloze probability of a given word in a given sentence context is estimated as the proportion of times that particular word is produced over all productions (Taylor, 1953). In addition, the constraint of a context can be calculated by identifying the most common completion produced by participants who saw this context, regardless of whether or not this completion matches the word that was actually presented, and computing the proportion of participants who provided this completion.
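As a concrete illustration, both measures reduce to simple proportions over norming responses. Here is a minimal Python sketch (the context and response counts are invented for illustration):

```python
from collections import Counter

def cloze_stats(completions):
    """Cloze probabilities and constraint for one sentence context,
    given the completions produced by a norming group."""
    counts = Counter(w.lower() for w in completions)
    total = len(completions)
    cloze = {word: n / total for word, n in counts.items()}
    # Constraint: the proportion of participants producing the modal
    # (most common) completion, whatever word was actually presented.
    modal_word, modal_count = counts.most_common(1)[0]
    return cloze, modal_word, modal_count / total

# Hypothetical norming responses for "She spread the warm bread with ..."
responses = ["butter"] * 85 + ["jam"] * 10 + ["honey"] * 5
cloze, modal, constraint = cloze_stats(responses)
print(cloze["butter"], modal, constraint)  # 0.85 butter 0.85
```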

2. For an alternative conceptualisation of the linking function between probabilistic belief updating and reading times, see Hale (2003, 2011). For empirical evaluation and further discussion, see Frank (2013), Linzen and Jaeger (2015), Roark, Bachrach, Cardenas, and Pallier (2009), and Wu, Bachrach, Cardenas, and Schuler (2010).
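For orientation (our gloss, not part of the note): the standard linking function treats processing effort at a word as its surprisal, whereas Hale's alternative uses the reduction in uncertainty over analyses that the word brings about:

```latex
% Surprisal of word w_t in context (Hale, 2001; Levy, 2008):
\mathrm{surprisal}(w_t) = -\log P(w_t \mid w_1, \dots, w_{t-1})

% Hale's entropy-reduction alternative: effort at w_t is the drop in
% uncertainty H over the analyses S consistent with the input so far:
\mathrm{ER}(w_t) = \max\!\bigl(0,\; H(S \mid w_1,\dots,w_{t-1}) - H(S \mid w_1,\dots,w_t)\bigr)
```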

3. There are, of course, other ways of formalising prediction error, dating back to Bush and Mosteller (1951) and Rescorla and Wagner (1972). One difference between these formalisations and a Bayesian formalisation (Bayesian surprise) is that the former do not take into account uncertainty during inference or prediction (see Kruschke, 2008 for an excellent discussion). Regardless of how it is formalised, however, prediction and prediction error play a central role in both learning and processing, providing a powerful way of bridging literatures and of potentially linking across computational and algorithmic levels of analysis (see Jaeger & Snider, 2013; Kuperberg, 2015).
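To make the contrast explicit (our notation, not the note's): the Rescorla-Wagner rule drives learning with a point-valued error term, whereas Bayesian surprise measures how much an observation shifts an entire distribution of beliefs:

```latex
% Rescorla-Wagner: the change in associative strength V_i for cue i is
% proportional to a point-valued prediction error (no uncertainty term):
\Delta V_i = \alpha_i \,\beta \,\Bigl(\lambda - \sum_{j} V_j\Bigr)

% Bayesian surprise: the divergence of the posterior over hypotheses h
% from the prior, after observing data d:
D_{\mathrm{KL}}\bigl(P(h \mid d) \,\|\, P(h)\bigr)
  = \sum_{h} P(h \mid d)\,\log\frac{P(h \mid d)}{P(h)}
```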

4. As we will discuss in section 4, however, very low probability incoming words that mismatch the most likely continuation in a highly constraining context can evoke a qualitatively distinct late anterior positivity ERP effect, in addition to the N400 effect.

5. In this sense, the meaning of the word generative has some similarities with Chomsky's original conception of a generative syntax, in which a grammar generated multiple possible structures (Chomsky, 1965). There is, however, an important difference: whereas generative grammars in the Chomskyan tradition served to test whether a sentence could be generated from a grammar (in which case it was accepted by that grammar), the generative computational models referred to here represent distributions over outputs (e.g., sentences). That is, rather than stopping at the question of whether a sentence can be generated, these models aim to capture how likely a sentence is to be generated (although it is worth noting that a generative syntax was formalised in probabilistic terms as early as Booth, 1969, and that probabilistic treatments of grammars have long been acknowledged in the field of sociolinguistics; see Cedergren & Sankoff, 1974; Labov, 1969 for early discussion).
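The difference is easy to see in miniature: a probabilistic grammar assigns each derivation a probability (the product of its rule probabilities) rather than a bare accept/reject verdict. A toy Python sketch, with a grammar and sentence invented purely for illustration:

```python
# Toy probabilistic context-free grammar: for each nonterminal, the
# probabilities of its expansions sum to 1.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("the", "dog"), 0.6), (("the", "cat"), 0.4)],
    "VP": [(("barks",), 0.7), (("meows",), 0.3)],
}

def derivation_probability(rules):
    """Probability of a derivation = product of its rule probabilities."""
    p = 1.0
    for lhs, rhs in rules:
        p *= dict(PCFG[lhs])[rhs]
    return p

# Derivation of "the dog barks": S -> NP VP, NP -> the dog, VP -> barks
deriv = [("S", ("NP", "VP")), ("NP", ("the", "dog")), ("VP", ("barks",))]
print(round(derivation_probability(deriv), 2))  # 0.42 = 1.0 * 0.6 * 0.7
```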

6. Here, we refer to knowledge, stored at multiple grains within memory, about the conceptual features that are necessary (Chomsky, 1965; Dowty, 1979; Katz & Fodor, 1963), as well as those that are most likely (McRae, Ferretti, & Amyote, 1997), to be associated with a particular semantic-thematic role of an individual event or state. This knowledge might also include the necessary and likely temporal, spatial, and causal relationships that link multiple events and states together to form sequences of events. The latter are sometimes referred to as scripts, frames, or narrative schemas (Fillmore, 2006; Schank & Abelson, 1977; Sitnikova, Holcomb, & Kuperberg, 2008; Wood & Grafman, 2003; Zwaan & Radvansky, 1998).

7. Note, however, that the term integration has been used in different ways in the literature. The usage described here contrasts integration with pre-activation (Federmeier, 2007; see also Van Petten & Luka, 2012, for discussion). Others, however, have used the term integration to refer more specifically to the process by which a word is combined or unified with its context to come up with a propositional meaning (e.g. Hagoort, Baggio, & Willems, 2009; Jackendoff, 2002; Lau, Phillips, & Poeppel, 2008).

8. The term priming is sometimes used simply to describe the phenomenon of facilitated processing of a target that is preceded by a prime with which it shares one or more representation(s), regardless of mechanism. Pre-activation is just one of these mechanisms. For example, multiple different mechanisms have been proposed to account for the phenomena of both semantic priming (see Neely, 1991 for a review) and syntactic priming (e.g. Chang, Dell, & Bock, 2006; Jaeger & Snider, 2013; Tooley & Traxler, 2010).

9. For example, memory-based models of text processing assumed that simple lexico-semantic relationships within the internal representation of context, approximating to a “bag of words” (quantified using measures like latent semantic analysis; Kintsch, 2001; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998), could interact with lexico-semantic relationships stored within long-term memory, and prime upcoming lexico-semantic information through spreading activation (Kintsch, 1988; McKoon & Ratcliff, 1992; Myers & O'Brien, 1998; Sanford, 1990; Sanford & Garrod, 1998). This was known as resonance, and it can be distinguished from the use of high-level representations of events or event structures (that include information about “who does what to whom”) to predictively pre-activate upcoming semantic features or categories (see Kuperberg et al., 2011; Lau et al., 2013; Otten & Van Berkum, 2007; Paczynski & Kuperberg, 2012 for discussion).
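To unpack the “bag of words” characterisation (our illustration; the three-dimensional vectors below are invented stand-ins for LSA word vectors, which in reality come from a dimensionality reduction of word-by-document co-occurrences): the context is treated as an unordered set of words, and the fit of an upcoming word reduces to vector similarity, blind to who did what to whom:

```python
import math

VECTORS = {
    "doctor":   (0.90, 0.10, 0.20),
    "hospital": (0.80, 0.20, 0.10),
    "nurse":    (0.85, 0.15, 0.20),
    "guitar":   (0.10, 0.90, 0.30),
}

def cosine(u, v):
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def bag_of_words_fit(context, target):
    """Fit of a target word to an unordered 'bag of words' context:
    mean similarity to the context words. Word order, and hence role
    information, plays no part in the computation."""
    return sum(cosine(VECTORS[w], VECTORS[target]) for w in context) / len(context)

print(bag_of_words_fit(["doctor", "hospital"], "nurse"))   # high: related
print(bag_of_words_fit(["doctor", "hospital"], "guitar"))  # low: unrelated
```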

10. There is, however, also evidence that top-down influences on the perception of lower-level information are not the exception, but rather the norm, at least at the lowest levels of speech perception. For example, the internal distributional structure of phonological categories is known to affect the perception of subphonemic acoustic similarity (known as the perceptual magnet effect; Feldman et al., 2009; Kuhl, 1991). This effect has been shown to be a rational consequence of the fact that there is always uncertainty about the perceptual input (due to noise in the neural systems underlying perception). In inferring the percept, comprehenders thus rely on what they know about the statistical structure underlying the speech signal (Feldman et al., 2009; see also Haefner, Berkes, & Fiser, 2014, for a discussion of how sampling-based top-down pre-activation can explain otherwise surprising correlations in firing rates in neural populations).
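In outline (our compression of the rational analysis in Feldman et al., 2009): the listener hears a noisy version S of an intended sound T drawn from a phonological category c, and the optimal percept is a posterior mean pulled toward the category mean, with the pull growing as perceptual noise grows:

```latex
% Generative assumptions: the target sound is drawn from category c,
% and the signal is the target corrupted by perceptual noise:
T \sim \mathcal{N}(\mu_c, \sigma_c^2), \qquad S \mid T \sim \mathcal{N}(T, \sigma_S^2)

% Optimal percept (posterior mean): a compromise between the signal and
% the category mean -- the shrinkage toward \mu_c is the magnet effect:
E[T \mid S, c] = \frac{\sigma_c^2\, S + \sigma_S^2\, \mu_c}{\sigma_c^2 + \sigma_S^2}
```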

11. Actively generative models also provide a link between language comprehension and language production (for discussion, see Jaeger & Ferreira, 2013; Pickering & Garrod, 2007, 2013; and for further discussion of the relationship between prediction in language comprehension and production, see Brown & Kuperberg, 2015; Dell & Chang, 2014; Federmeier, 2007; Garrod & Pickering, 2015; Jaeger & Snider, 2013; Kurumada & Jaeger, 2015; Magyari & de Ruiter, 2012).

12. Hierarchical predictive coding in the brain takes the principles of the hierarchical generative framework to an extreme by proposing that the flow of bottom-up information from primary sensory cortices to higher-level association cortices constitutes only the prediction error, that is, only information that has not already been “explained away” by predictions that have propagated down from higher-level cortices (see Clark, 2013; Friston, 2005, 2008; Wacongne et al., 2011).
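A minimal numerical sketch of this claim (ours, not a model from the predictive-coding literature): the higher level sends a prediction down, only the residual error travels up, and a fully predicted input sends nothing:

```python
def predictive_coding_step(estimate, weight, observation, learning_rate=0.1):
    """One update in a toy two-level predictive coding loop.

    The higher level predicts the input as weight * estimate; only the
    residual (prediction error) is passed back up and used to revise the
    higher-level estimate. A fully 'explained' input sends zero upward.
    """
    prediction = weight * estimate               # top-down prediction
    error = observation - prediction             # bottom-up traffic: error only
    estimate += learning_rate * weight * error   # revise the inferred cause
    return estimate, error

estimate, weight = 0.0, 2.0
for _ in range(50):
    estimate, error = predictive_coding_step(estimate, weight, observation=4.0)
print(round(estimate, 3), round(error, 4))  # estimate -> 2.0, error -> 0.0
```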

Additional information

Funding

This work was partially funded by NIMH [R01 MH071635] and NICHD [R01 HD082527] grants to G. R. K., as well as by NICHD [R01 HD075797] and an NSF CAREER grant [IIS 1150028] to T. F. J.
