ABSTRACT
We consider several key aspects of prediction in language comprehension: its computational nature, the representational level(s) at which we predict, whether we use higher-level representations to predictively pre-activate lower level representations, and whether we “commit” in any way to our predictions, beyond pre-activation. We argue that the bulk of behavioural and neural evidence suggests that we predict probabilistically and at multiple levels and grains of representation. We also argue that we can, in principle, use higher-level inferences to predictively pre-activate information at multiple lower representational levels. We suggest that the degree and level of predictive pre-activation might be a function of its expected utility, which, in turn, may depend on comprehenders’ goals and their estimates of the relative reliability of their prior knowledge and the bottom-up input. Finally, we argue that all these properties of language understanding can be naturally explained and productively explored within a multi-representational hierarchical actively generative architecture whose goal is to infer the message intended by the producer, and in which predictions play a crucial role in explaining the bottom-up input.
Acknowledgements
We thank Meredith Brown, Ralf Haefner, David Kleinschmidt, Rajeev Raizada, Michael Tanenhaus, and Eddie Wlotko for extended and very helpful discussions reflected in this paper. We also thank Meredith Brown, Vera Demberg, JP de Ruiter, Kara Federmeier, Karl Friston, Ray Jackendoff, Tal Linzen, and our two anonymous reviewers for their excellent feedback on the manuscript. All errors remain the authors'. We are also very grateful to Arim Choi Perrachione for all her help with manuscript preparation.
In memory of Bern Milton Jacobson, 1919–2015.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. To derive cloze probabilities, a group of participants are presented with a series of sentence contexts and asked to produce the most likely next word for each context. The cloze probability of a given word in a given sentence context is estimated as the proportion of times that particular word is produced over all productions (Taylor, 1953). In addition, the constraint of a context can be calculated by taking the most common completion produced by participants who saw this context, regardless of whether or not this completion matches the word that was actually presented, and tallying the number of participants who provided this completion.
2. For an alternative conceptualisation of the linking function between probabilistic belief updating and reading times, see Hale (2003, 2011). For empirical evaluation and further discussion, see Frank (2013), Linzen and Jaeger (2015), Roark, Bachrach, Cardenas, and Pallier (2009), and Wu, Bachrach, Cardenas, and Schuler (2010).
3. There are, of course, other ways of formalising prediction error, dating back to Bush and Mosteller (1951) and Rescorla and Wagner (1972). One difference between these formalisations and a Bayesian formalisation (Bayesian surprise) is that the former do not take into account uncertainty during inference or prediction (see Kruschke, 2008, for an excellent discussion). Regardless of how it is formalised, however, prediction and prediction error play a central role in both learning and processing, providing a powerful way of bridging literatures and of potentially linking across computational and algorithmic levels of analysis (see Jaeger & Snider, 2013; Kuperberg, 2015).
4. As we will discuss in section 4, however, very low probability incoming words that mismatch the most likely continuation in a highly constraining context can evoke a qualitatively distinct late anterior positivity ERP effect, in addition to the N400 effect.
5. In this sense, the meaning of the word generative has some similarities with Chomsky's original conception of a generative syntax, in which a grammar generated multiple possible structures (Chomsky, 1965). There is, however, an important difference: whereas generative grammars in the Chomskyan tradition served to test whether a sentence could be generated from a grammar (in which case it was accepted by that grammar), the generative computational models referred to here represent distributions of outputs (e.g., sentences). That is, rather than stopping at the question of whether a sentence can be generated, these models aim to capture how likely a sentence is to be generated (although it is worth noting that a generative syntax was formalised in probabilistic terms as early as Booth, 1969, and that probabilistic treatments of grammars have long been acknowledged in the field of sociolinguistics; see Cedergren & Sankoff, 1974; Labov, 1969, for early discussion).
6. Here, we refer to knowledge, stored at multiple grains within memory, about the conceptual features that are necessary (Chomsky, 1965; Dowty, 1979; Katz & Fodor, 1963), as well as those that are most likely (McRae, Ferretti, & Amyote, 1997), to be associated with a particular semantic-thematic role of an individual event or state. This knowledge might also include the necessary and likely temporal, spatial, and causal relationships that link multiple events and states together to form sequences of events. The latter are sometimes referred to as scripts, frames, or narrative schemas (Fillmore, 2006; Schank & Abelson, 1977; Sitnikova, Holcomb, & Kuperberg, 2008; Wood & Grafman, 2003; Zwaan & Radvansky, 1998).
7. Note, however, that the term integration has been used in different ways in the literature. The usage described here contrasts integration with pre-activation (Federmeier, 2007; see also Van Petten & Luka, 2012, for discussion). Others, however, have used the term integration to refer more specifically to the process by which a word is combined or unified with its context to come up with a propositional meaning (e.g. Hagoort, Baggio, & Willems, 2009; Jackendoff, 2002; Lau, Phillips, & Poeppel, 2008).
8. The term priming is sometimes used simply to describe the phenomenon of facilitated processing of a target that is preceded by a prime with which it shares one or more representation(s), regardless of mechanism. Pre-activation is just one of these mechanisms. For example, multiple different mechanisms have been proposed to account for the phenomena of both semantic priming (see Neely, 1991, for a review) and syntactic priming (e.g. Chang, Dell, & Bock, 2006; Jaeger & Snider, 2013; Tooley & Traxler, 2010).
9. For example, memory-based models of text processing assumed that simple lexico-semantic relationships within the internal representation of context, approximating to a “bag of words” (quantified using measures like latent semantic analysis, Kintsch, 2001; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998), could interact with lexico-semantic relationships stored within long-term memory, and prime upcoming lexico-semantic information through spreading activation (Kintsch, 1988; McKoon & Ratcliff, 1992; Myers & O'Brien, 1998; Sanford, 1990; Sanford & Garrod, 1998). This was known as resonance, and it can be distinguished from the use of high-level representations of events or event structures (that include information about “who does what to whom”) to predictively pre-activate upcoming semantic features or categories (see Kuperberg et al., 2011; Lau et al., 2013; Otten & Van Berkum, 2007; Paczynski & Kuperberg, 2012, for discussion).
10. There is, however, also evidence that top-down influences on the perception of lower-level information are not the exception, but rather the norm, at least at the lowest levels of speech perception. For example, the internal distributional structure of phonological categories is known to affect the perception of subphonemic acoustic similarity (known as the perceptual magnet effect, Feldman et al., 2009; Kuhl, 1991). This effect has been shown to be a rational consequence of the fact that there is always uncertainty about the perceptual input (due to noise in the neural systems underlying perception). In inferring the percept, comprehenders thus rely on what they know about the statistical structure underlying the speech signal (Feldman et al., 2009; see also Haefner, Berkes, & Fiser, 2014, for a discussion of how sampling-based top-down pre-activation can explain otherwise surprising correlations in firing rates in neural populations).
11. Actively generative models also provide a link between language comprehension and language production (for discussion, see Jaeger & Ferreira, 2013; Pickering & Garrod, 2007, 2013, and for further discussion of the relationship between prediction in language comprehension and production, see Brown & Kuperberg, 2015; Dell & Chang, 2014; Federmeier, 2007; Garrod & Pickering, 2015; Jaeger & Snider, 2013; Kurumada & Jaeger, 2015; Magyari & de Ruiter, 2012).
12. Hierarchical predictive coding in the brain takes the principles of the hierarchical generative framework to an extreme by proposing that the flow of bottom-up information from primary sensory cortices to higher level association cortices constitutes only the prediction error, that is, only information that has not already been “explained away” by predictions that have propagated down from higher level cortices (see Clark, 2013; Friston, 2005, 2008; Wacongne et al., 2011).
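As an illustration of the procedure described in Note 1, the following Python sketch estimates cloze probabilities and contextual constraint from a list of participant completions for a single sentence context. The function names, the lower-casing of responses, and the example completions are hypothetical choices made for illustration; they are not taken from any published norming study. Reporting constraint as a proportion in addition to the raw tally is likewise an assumption.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Map each produced word to the proportion of all productions it accounts for."""
    if not completions:
        raise ValueError("at least one completion is required")
    # Assumption: responses are normalised by trimming whitespace and lower-casing.
    normalised = [c.strip().lower() for c in completions]
    counts = Counter(normalised)
    total = len(normalised)
    return {word: n / total for word, n in counts.items()}


def contextual_constraint(completions):
    """Return the most common completion, its tally, and that tally as a proportion.

    The most common completion is used whether or not it matches the word
    that was actually presented in the experiment.
    """
    if not completions:
        raise ValueError("at least one completion is required")
    normalised = [c.strip().lower() for c in completions]
    # most_common(1) returns the first-encountered word in the case of a tie.
    word, tally = Counter(normalised).most_common(1)[0]
    return word, tally, tally / len(normalised)


# Hypothetical completions from ten participants for one sentence context.
responses = ["kite", "kite", "kite", "plane", "Kite",
             "balloon", "kite", "kite", "plane", "kite"]

cloze = cloze_probabilities(responses)
modal_word, modal_tally, modal_proportion = contextual_constraint(responses)
print(cloze)
print(modal_word, modal_tally, modal_proportion)
```

In this sketch, the cloze probability of a word that no participant produced is simply absent from the returned dictionary, so a caller would treat a missing key as an estimate of zero.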
References
Taylor, W. (1953). ‘Cloze’ procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415–433.
Hale, J. (2003). The information conveyed by words in sentences. Journal of Psycholinguistic Research, 32(2), 101–123. doi:10.1023/A:1022492123056
Hale, J. (2011). What a rational parser would do. Cognitive Science, 35(3), 399–443. doi:10.1111/J.1551-6709.2010.01145.X
Frank, S. L. (2013). Uncertainty reduction as a measure of cognitive load in sentence comprehension. Topics in Cognitive Science, 5(3), 475–494. doi:10.1111/tops.12025
Linzen, T., & Jaeger, T. F. (2015). Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive Science. Advance online publication. doi:10.1111/cogs.12274
Roark, B., Bachrach, A., Cardenas, C., & Pallier, C. (2009). Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing. Paper presented at the Proceedings of the 2009 conference on Empirical Methods in Natural Language Processing (EMNLP ‘09), Singapore.
Wu, S., Bachrach, A., Cardenas, C., & Schuler, C. (2010). Complexity metrics in an incremental right-corner parser. Paper presented at the Proceedings of the 48th annual meeting of the Association for Computational Linguistics (ACL ‘10), Uppsala, Sweden.
Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58(5), 313–323.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In W. E. Prokasy & A. H. Black (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York, NY: Appleton-Century-Crofts.
Kruschke, J. K. (2008). Bayesian approaches to associative learning: From passive to active learning. Learning and Behavior, 36(3), 210–226. doi:10.3758/lb.36.3.210
Jaeger, T. F., & Snider, N. E. (2013). Alignment as a consequence of expectation adaptation: Syntactic priming is affected by the prime's prediction error given both prior and recent experience. Cognition, 127(1), 57–83. doi:10.1016/j.cognition.2012.10.013
Kuperberg, G. R. (2015). What event-related potentials might tell us about the neural architecture of language comprehension. Manuscript submitted for publication.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge: MIT Press.
Booth, T. L. (1969). Probabilistic representation of formal languages. Paper presented at the IEEE conference record of 10th annual symposium on Switching and Automata Theory, Waterloo, ON, Canada.
Cedergren, H. J., & Sankoff, D. (1974). Variable rules: Performance as a statistical reflection of competence. Language, 50(2), 333–355. doi:10.2307/412441
Labov, W. (1969). Contraction, deletion, and inherent variability of the English copula. Language, 45(4), 715–762. doi:10.2307/412333
Dowty, D. R. (1979). Word meaning and Montague grammar: The semantics of verbs and times in generative semantics and in Montague's PTQ. Dordrecht: Reidel.
Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170–210.
McRae, K., Ferretti, T. R., & Amyote, L. (1997). Thematic roles as verb-specific concepts. Language and Cognitive Processes, 12(2–3), 137–176. doi:10.1080/016909697386835
Fillmore, C. J. (2006). Frame semantics. Cognitive Linguistics: Basic Readings, 34, 373–400. doi:10.1515/9783110199901.373
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates.
Sitnikova, T., Holcomb, P., & Kuperberg, G. R. (2008). Neurocognitive mechanisms of human comprehension. In T. F. Shipley & J. M. Zacks (Eds.), Understanding events: How humans see, represent, and act on events (pp. 639–683). New York, NY: Oxford University Press.
Wood, J. N., & Grafman, J. (2003). Human prefrontal cortex: Processing and representational perspectives. Nature Reviews Neuroscience, 4(2), 139–147. doi:10.1038/Nrn1033
Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123(2), 162–185. doi:10.1037/0033-2909.123.2.162
Federmeier, K. D. (2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44(4), 491–505. doi:10.1111/j.1469-8986.2007.00531.x
Van Petten, C., & Luka, B. J. (2012). Prediction during language comprehension: Benefits, costs, and ERP components. International Journal of Psychophysiology, 83(2), 176–190. doi:10.1016/j.ijpsycho.2011.09.015
Hagoort, P., Baggio, G., & Willems, R. M. (2009). Semantic unification. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (4th ed., pp. 819–836). Cambridge: MIT Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. New York, NY: Oxford University Press.
Lau, E. F., Phillips, C., & Poeppel, D. (2008). A cortical network for semantics: (De)constructing the N400. Nature Reviews Neuroscience, 9(12), 920–933. doi:10.1038/nrn2532
Neely, J. H. (1991). Semantic priming effects in visual word recognition: A selective review of current findings and theories. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading and visual word recognition (pp. 264–333). Hillsdale, NJ: Erlbaum.
Chang, F., Dell, G. S., & Bock, J. K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272. doi:10.1037/0033-295x.113.2.234
Tooley, K. M., & Traxler, M. J. (2010). Syntactic priming effects in comprehension: A critical review. Language and Linguistics Compass, 4(10), 925–937. doi:10.1111/j.1749-818X.2010.00249.x
Kintsch, W. (2001). Predication. Cognitive Science, 25, 173–202. doi:10.1207/s15516709cog2502_1
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240. doi:10.1037/0033-295x.104.2.211
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284. doi:10.1080/01638539809545028
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163–182. doi:10.1037/0033-295X.95.2.163
McKoon, G., & Ratcliff, R. (1992). Inference during reading. Psychological Review, 99(3), 440–466. doi:10.1037/0033-295X.99.3.440
Myers, J. L., & O'Brien, E. J. (1998). Accessing the discourse representation during reading. Discourse Processes, 26(2–3), 131–157. doi:10.1080/01638539809545042
Sanford, A. J. (1990). On the nature of text-driven inference. In D. A. Balota, F. d'Arcais, & K. Rayner (Eds.), Comprehension processes in reading (pp. 515–538). Hillsdale, NJ: Erlbaum.
Sanford, A. J., & Garrod, S. C. (1998). The role of scenario mapping in text comprehension. Discourse Processes, 26(2–3), 159–190. doi:10.1080/01638539809545043
Kuperberg, G. R., Paczynski, M., & Ditman, T. (2011). Establishing causal coherence across sentences: An ERP study. Journal of Cognitive Neuroscience, 23(5), 1230–1246. doi:10.1162/jocn.2010.21452
Lau, E. F., Holcomb, P. J., & Kuperberg, G. R. (2013). Dissociating N400 effects of prediction from association in single-word contexts. Journal of Cognitive Neuroscience, 25(3), 484–502. doi:10.1162/jocn_a_00328
Otten, M., & Van Berkum, J. J. A. (2007). What makes a discourse constraining? Comparing the effects of discourse message and scenario fit on the discourse-dependent N400 effect. Brain Research, 1146, 158–171. doi:10.1016/j.brainres.2007.03.058
Paczynski, M., & Kuperberg, G. R. (2012). Multiple influences of semantic memory on sentence processing: Distinct effects of semantic relatedness on violations of real-world event/state knowledge and animacy selection restrictions. Journal of Memory and Language, 67(4), 426–448. doi:10.1016/j.jml.2012.07.003
Feldman, N. H., Griffiths, T. L., & Morgan, J. L. (2009). The influence of categories on perception: Explaining the perceptual magnet effect as optimal statistical inference. Psychological Review, 116(4), 752–782. doi:10.1037/a0017196
Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50(2), 93–107. doi:10.3758/bf03212211
Haefner, R. M., Berkes, P., & Fiser, J. (2014). Perceptual decision-making as probabilistic inference by neural sampling. arXiv preprint arXiv:1409.0257.
Jaeger, T. F., & Ferreira, V. (2013). Seeking predictions from a predictive framework. Behavioral and Brain Sciences, 36(4), 359–360. doi:10.1017/S0140525X12002762
Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11(3), 105–110. doi:10.1016/j.tics.2006.12.002
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36(04), 329–347. doi:10.1017/S0140525X12001495
Brown, M., & Kuperberg, G. R. (in press). A hierarchical generative framework of language processing: Linking language perception, interpretation, and production abnormalities in schizophrenia. Frontiers in Human Neuroscience.
Dell, G. S., & Chang, F. (2014). The P-chain: Relating sentence production and its disorders to comprehension and acquisition. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1634), 20120394. doi:10.1098/rstb.2012.0394
Garrod, S., & Pickering, M. J. (2015). The use of content and timing to predict turn transitions. Frontiers in Psychology, 6, 751. doi:10.3389/fpsyg.2015.00751
Kurumada, C., & Jaeger, T. F. (2015). Communicative efficiency in language production: Optional case-marking in Japanese. Journal of Memory and Language, 83, 152–189. doi:10.1016/j.jml.2015.03.003
Magyari, L., & de Ruiter, J. P. (2012). Prediction of turn-ends based on anticipation of upcoming words. Frontiers in Psychology, 3, 376. doi:10.3389/fpsyg.2012.00376
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 233–253. doi:10.1017/S0140525X12000477
Friston, K. J. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836. doi:10.1098/Rstb.2005.1622
Friston, K. J. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11), e1000211. doi:10.1371/journal.pcbi.1000211
Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108(51), 20754–20759. doi:10.1073/Pnas.1117807108
Funding
This work was partially funded by NIMH [R01 MH071635] and NICHD [R01 HD082527] grants to G. R. K., as well as by NICHD [R01 HD075797] and an NSF CAREER grant [IIS 1150028] to T. F. J.