
Introduction to the special issue “Emergence of speech and language from prediction error: Error-driven language models”

Pages 411-418 | Received 21 Oct 2022, Accepted 11 Mar 2023, Published online: 18 Apr 2023

ABSTRACT

Last year, 2022, marked the 50th anniversary of the Rescorla-Wagner learning equations – a landmark in the development of learning theory. Originally based on animal learning, the equations and error-driven learning models as a whole were rapidly adopted and applied in several areas of human psychology. While language acquisition research initially took a different path, interest in the role of error-driven learning in language is growing. With the aim of strengthening this emerging research field, sparking discussion and cross-fertilisation of ideas across linguistics, psychology and cognate fields, this Special Issue presents nine papers that address the role of error-driven learning in language. The papers investigate a wide range of subfields of linguistics, and include two review articles, in addition to computational modelling and experimental research. The collection thus serves both as an introduction for those new to the subject and as an overview for those already working in the field.

1. Introduction

This is a critical moment in the field of language research. The last several decades of empirical research in linguistics have witnessed a dramatic shift in the fundamental assumptions researchers hold about language. The previously dominant idea that language was largely innate and relied on specialised biological systems (Chomsky & Halle, 1968; Eimas & Corbit, 1973; Eimas et al., 1971) is increasingly giving way to the idea that language emerges from general learning mechanisms (see Frost et al., 2019; Hasson, 2017; Perruchet & Pacton, 2006; Rebuschat & Monaghan, 2019, for reviews; and Nixon & Tomaschek, 2023, for discussion related to speech sounds). We are now faced with a new problem: if language is learned, what cognitive mechanisms are responsible? And what implications does this have for everyday language use? There has also been increasing interest in advancing the field of linguistics by developing robust models of language that draw not only on experimental data but also on mathematical and computational modelling (Frost et al., 2015; MacWhinney, 2010; McMurray et al., 2009; McMurray & Hollich, 2009; Monaghan & Rowland, 2017).

Meanwhile, in a separate area of psychology, robust theoretical, mathematical and computational models of learning, known as error-driven learning models, have been developed. These models were originally based on animal learning research that could be said to have started with Pavlov and kicked off in earnest at least as early as the 1960s (Kamin, 1968, 1969; Rescorla, 1968, 1988; Siegel & Allan, 1996; Widrow & Hoff, 1960), eventually leading to the highly influential Rescorla-Wagner learning equations (Rescorla & Wagner, 1972).

Due to an unfortunate lack of interdisciplinary communication, this rich knowledge about learning has, until recently, been little known in the language sciences. However, language researchers have begun to draw on the insights of learning theory and use the tools of error-driven learning to investigate human learning of linguistic phenomena (Baayen et al., 2016; Chang et al., 2006; Ellis & Sagarra, 2010; Hills et al., 2010; Nieder et al., 2022; Nixon, 2018, 2020; Nixon & Tomaschek, 2020, 2021; Ramscar, Dye, & McCauley, 2013; Ramscar & Yarlett, 2007; Ramscar et al., 2010; Tomaschek et al., 2019; Tomaschek & Ramscar, 2022). The fact that the models are mathematically and computationally implemented allows researchers to make strong predictions about linguistic behaviour and to find common mechanisms for seemingly unrelated phenomena. Error-driven learning models are proving to be remarkably powerful for predicting and explaining a wide range of phenomena in human learning, including language acquisition, comprehension and production. However, wide-scale use of these models has not yet reached the language sciences. This may be at least partly due to a lack of awareness of and familiarity with error-driven learning models in the linguistic, psycholinguistic and language science communities. Therefore, one of the key aims of this special issue is to raise awareness of error-driven learning models among researchers in these disciplines. That said, while the focus of this issue is language, the principles of error-driven learning are applicable across many other areas of cognitive science and psychology, such as category learning, memory, visual processing, music cognition, motor control and social cognition, to name but a few. Examination of these models in the present issue may help spark fruitful developments and connections with these disciplines.

But what is “error” and what is its function in learning? Below, we first briefly introduce error-driven learning and the related literature. We then introduce each of the contributions in this collection.

2. Error-driven learning

According to recent theories of learning such as “discriminative learning” (e.g. Ramscar, Dye, & McCauley, 2013; Ramscar et al., 2010), the “free energy” theory (e.g. Friston, 2011; Kaplan & Friston, 2018) and connectionist models (Chang, 2002; Chang et al., 2000, 2006; Elman, 1990), learning is based on prediction and feedback from prediction error. Below, we give a brief introduction to error-driven learning models. For a more in-depth introduction and comparison with some other leading models, the reader is referred to Nixon (2020) and Ramscar, Dye, and McCauley (2013).

At any given moment, incoming sensory cues are potentially used to make predictions about upcoming events (outcomes, in error-driven learning parlance;[1] Hoppe et al., 2020; Nixon, 2020; Nixon & Tomaschek, 2020, 2021; Ramscar et al., 2010). In error-driven learning models, these predictions – i.e. expectations – are represented as connections between cues and outcomes; the more expected an outcome is, based on a cue (or set of cues), the greater the connection strength (or connection weights). Learning is implemented as incremental adjustments to these connection weights.

Closely related to the concept of expectation, prediction error is essentially surprise. A common misinterpretation of error is to think of it as equivalent to a “mistake” – along the lines of “I expected X, but Y happened”. Rather, prediction error is a gradient measure of the difference between the degree of expectation of an outcome and the observed occurrence (or non-occurrence) of the outcome. As such, if I have a 90% expectation of X, and X occurs, a 10% prediction error remains (the difference between 90% and 100%); on the other hand, if I have a 90% expectation of X, and X does not occur, I have 90% prediction error (the difference between 90% and 0%).[2]
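To make this concrete, the worked percentages can be written in the notation commonly used for the Rescorla-Wagner model (the symbols below follow standard presentations of the model; they are added here for illustration):

\[ e = \lambda - \sum_{j \in \text{present}} V_j \]

where \(V_j\) is the connection weight from present cue \(j\) to the outcome, and \(\lambda\) codes the observed event (\(\lambda = 1\) if the outcome occurs, \(\lambda = 0\) if it does not). With a 90% expectation, \(\sum_j V_j = 0.9\): if X occurs, \(e = 1 - 0.9 = 0.1\); if X does not occur, \(e = 0 - 0.9 = -0.9\), the sign indicating whether connection weights will be strengthened or weakened.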

This numerical calculation of the degree of expectation and error is crucial to the model. These calculations are the basis for estimating learning. Learning can be estimated incrementally in what are called learning events: for example, each trial of an experiment. The degree of learning in each learning event is proportional to the prediction error in that learning event. Highly expected outcomes, if they occur, yield less prediction error – and thus less learning – than outcomes that are less expected. Conversely, when an expected outcome does not occur, this results in more error – and therefore more learning – than when an unexpected outcome does not occur.

In the Rescorla-Wagner equations (Rescorla & Wagner, 1972), outcomes are considered to have maximum and minimum limits on expectation. Expectation tends to start at or near zero early in learning, and when cues predict the occurrence of an outcome, expectation increases with experience. Crucially, as a result of adjustments being proportional to prediction error, learning is not linear, but rather takes the shape of the famous “learning curve”: learning is faster early on and slows over time as expectation increases.
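For reference, the update rule can be stated compactly. This is the standard textbook form of the Rescorla-Wagner equations, reproduced here for orientation rather than quoted from any paper in this issue:

\[ \Delta V_i = \alpha_i \beta \left( \lambda - \sum_{j \in \text{present}} V_j \right) \]

where \(\Delta V_i\) is the adjustment to the weight of present cue \(i\), and \(\alpha_i\) and \(\beta\) are learning-rate parameters reflecting the salience of the cue and the outcome. Because the summed expectation \(\sum_j V_j\) approaches \(\lambda\) over learning events, successive adjustments shrink, which is what produces the negatively accelerated learning curve.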

Another important aspect of the model is that any adjustments made to connection weights in a given learning event are divided and shared equally between all cues present in that learning event.[3] This means that cues compete to predict outcomes. Cue competition has important consequences. One well-known example of how cue competition affects learning arises when there are differences in the relative timing of learning different cues. If certain cues are encountered early on in learning (near the beginning of the learning curve), they may gain a bigger share of learning, due to weight adjustments being larger at that early stage. Later, when learning approaches asymptote – i.e. when an outcome is highly expected and uncertainty is low – any new cues introduced at this point are not learned well. This “blocking effect” was first demonstrated in Kamin's animal learning experiments (Kamin, 1969). In an artificial language learning study, Nixon (2020) showed that this blocking effect also occurs in speech acquisition and may be an important factor in understanding how second language acquisition differs from first language acquisition.
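The blocking effect follows directly from the update rule above. The following minimal simulation sketch (in Python; the parameter values and trial numbers are illustrative choices of ours, not taken from Kamin's or Nixon's designs) shows a pre-trained cue leaving almost no prediction error for a later-added cue:

```python
# Minimal Rescorla-Wagner simulation of the blocking effect.
# Parameter values are illustrative, not drawn from any study in this issue.

def rw_update(weights, cues, outcome_present, alpha_beta=0.1, lam_max=1.0):
    """One learning event: adjust the weights of all present cues in
    proportion to the prediction error (lambda minus summed expectation)."""
    lam = lam_max if outcome_present else 0.0
    expectation = sum(weights[c] for c in cues)
    error = lam - expectation
    for c in cues:
        weights[c] += alpha_beta * error

weights = {"A": 0.0, "B": 0.0}

# Phase 1: cue A alone reliably predicts the outcome; A approaches asymptote.
for _ in range(100):
    rw_update(weights, ["A"], outcome_present=True)

# Phase 2: the compound A+B predicts the same outcome. Because A already
# predicts it, prediction error is near zero, so B acquires almost no
# weight: cue B is "blocked".
for _ in range(100):
    rw_update(weights, ["A", "B"], outcome_present=True)

print(weights)  # approximately {'A': 1.0, 'B': 0.0}
```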

The principles of error-driven learning are most famously implemented computationally in the Rescorla-Wagner learning equations (Rescorla & Wagner, 1972), although a similar algorithm was also proposed by Widrow and Hoff (1960; see also Stone, 1986). The algorithm is formally equivalent to the delta-rule used in connectionist networks (Sutton & Barto, 1981) and to the learning algorithms in most neural networks (Ng & Jordan, 2002). The Rescorla-Wagner learning equations have been used to predict learning and human performance in the domains of morphological learning in children (Ramscar, Dye, & Klein, 2013; Ramscar, Dye, & McCauley, 2013; Ramscar et al., 2010, 2011), morphological processing in adult perception and production (Nieder et al., 2021, 2022; Tomaschek et al., 2019; Tomaschek & Ramscar, 2022), auditory comprehension and word recognition (Arnold et al., 2017; Baayen et al., 2016; Shafaei-Bajestan & Baayen, 2018), and speech acquisition and phonetic learning in adult second language learners (Nixon, 2018, 2020) and infants (Nixon & Tomaschek, 2020, 2021). The papers in this Special Issue add to the growing literature exploring error-driven learning as a mechanism contributing to language acquisition and processing, perception, production and comprehension. The papers are summarised below.

3. This issue

The nine papers in this issue cover a broad range of subfields of linguistics: from phonetic learning (McMurray, 2023), morphology (Ramscar, 2023), semantics (Filipović Ðurđević & Kostić, 2023; Luo et al., 2023) and word recognition (Shafaei-Bajestan et al., 2023) to structural priming (Khoe et al., 2023), letter sequences (Kapatsinski, 2023), grammar acquisition in sentence processing (Bröker & Ramscar, 2023) and memory formation during reading (Haeuser & Kray, 2023). The broad scope of topics illustrates the potential of error-driven learning as a unifying theory of language acquisition and processing. In saying this, we do not assume that error-driven learning is the only learning mechanism involved; however, it appears to be involved at all linguistic levels.

Underlying error-driven learning theory are computational implementations that allow researchers to make precise predictions about human behaviour, neural signals or any other measurable effects of learning. Several papers use R packages such as NDL (Arppe et al., 2018), EDL (van Rij & Hoppe, 2021) or LDL (Baayen et al., 2019), which are based on the Rescorla-Wagner equations (Rescorla & Wagner, 1972). Others use deep learning (Luo et al., 2023) or the Dual-path model proposed by Chang et al. (2006) (Khoe et al., 2023). Two papers in this Special Issue specifically address the computational implementation itself, discussing either aspects of the algorithm (Kapatsinski, 2023) or the input representations (Bröker & Ramscar, 2023).

3.1. Phonetics

McMurray (2023) provides an extensive review of the state of the art in speech acquisition research. He re-examines the question of how infants learn the phonetic space of their language. In what McMurray terms the “standard model”, it is often assumed that infants learn native language speech sounds very early on, within the first year, and that – since there appear to be no overt “teaching signals” – this must be achieved via unsupervised learning mechanisms, such as distributional learning. However, McMurray (2023) argues against this view for a number of reasons. In particular, he argues that the idea that no teaching signals were available was premised on the assumption that teaching signals require explicit instruction or correction. In an error-driven learning framework, by contrast, teaching signals are essentially always available, because they result from the process of prediction and prediction error – even in what McMurray calls an “unsupervised ecology”.

3.2. Morphology

Traditionally, it was assumed that regularly inflected word forms were derived through the application of rule-based processes, while irregularly inflected word forms were stored in the lexicon (see e.g. Pinker & Prince, 1994). However, Rumelhart and McClelland (1986) presented a computational model that could produce both regular and irregular word forms through analogy, by means of a single learning mechanism. Ramscar (2023) further examines Rumelhart and McClelland's (1986) proposal. He discusses how the knowledge required to understand and produce regularly and irregularly inflected words is obtained through prediction and prediction error. A review is presented of how this knowledge is applied to novel words from an error-driven learning perspective, taking into account developmental changes in children's morphological capabilities as well as context-dependent variation in inflection.

3.3. Semantics

Filipović Ðurđević and Kostić (2023) take an information-theoretic and discriminative learning approach to understanding polysemy. Polysemy is usually defined as the property of a word having multiple senses; as such, it is a form of ambiguity. Typically, the degree of ambiguity of polysemous words is considered to depend on the number of senses. However, Filipović Ðurđević and Kostić (2023) refine this measure of ambiguity: rather than discrete counts of the number of senses, they reinterpret polysemy as a continuous measure of sense diversity or sense uncertainty. They use these information-theoretic measures and discriminative learning to examine how polysemy is processed during visual lexical decision.

Luo et al. (2023) use a deep learning model to show how words shape semantic processing through a discriminative process – labels (i.e. words) that are shared by objects or events highlight commonalities between these objects or events, while contrasting labels highlight differences. Differences in the use of labels might arise, for example, from differences between languages or differences in levels of expertise. For example, do all dogs share the same label or are dogs labelled by their breed? Luo et al. (2023) show how these differences in label use lead to differences in the semantic representations underlying the labels; in other words, they demonstrate a means by which language can shape thought.

3.4. Word recognition

Shafaei-Bajestan et al. (2023) use a computational modelling experiment to address the role of discrete units in word recognition. Although most theoretical and computational work proposes that discrete phones function as an interface between the variable acoustic information listeners perceive and meaning, there has been strong interest in developing models that do not assume discrete units (see Nixon & Tomaschek, 2020, 2021, for error-driven learning models that do not assume discrete units, and Nixon & Tomaschek, 2023, for a review). To this end, Shafaei-Bajestan et al. (2023) modelled word recognition without any discrete units, either in the acoustic input or in the semantic output. Instead, they trained their model on gradient representations of meaning and of acoustic input obtained from the spontaneous speech of many speakers. On the basis of their results, they argue that word recognition does not need an interface in the form of discrete units; rather, apparent discrete units are an epiphenomenon of the phone-like clustering of acoustic cues that arises through learning.

3.5. Sentence processing

In three simulations of bilingual sentence production, Khoe et al. (2023) test whether cross-linguistic structural priming can be explained by error-driven implicit learning. Previous work has investigated whether participants' choice between two alternative constructions is affected by recent exposure. For example, the dative alternation has two alternative constructions: “[subject] showed [direct object] to [indirect object]” or “[subject] showed [indirect object] [direct object]”. Previous studies have shown that when two alternative structures are available, the likelihood of participants selecting a particular alternative is increased by recently reading (or hearing) that alternative – a phenomenon known as structural priming (Bock, 1986; Branigan & Pickering, 2017). This phenomenon has traditionally been understood as resulting from increased activation[4] of the recently encountered alternative. The Dual-path model of sentence production (Chang, 2002; Chang et al., 2006), on the other hand, proposes that structural priming can instead be explained as a learning effect, rather than as an effect of activation in this sense.

Khoe et al. (2023) ran three bilingual models of sentence production (Spanish-English, verb-final Dutch-English, and verb-medial Dutch-English). The simulations were able to capture key findings from behavioural studies of structural priming, supporting the proposal that cross-linguistic structural priming occurs as a result of error-driven implicit learning.

A related question is whether error-driven implicit learning also plays a role in how memory traces are formed during reading. Some studies report that in sentence reading, unpredictable words are better recalled than predictable words, arguing for a “boosting effect” of prediction error on memory for unpredictable words. Other studies demonstrate the opposite finding – an effect attributed to stronger entrenchment of predictable words in the lexicon. To resolve this contradiction, Haeuser and Kray (2023) further investigate the effect of prediction error in a self-paced reading task. They demonstrate that unpredictable words were indeed better recalled. Their result not only supports the idea that words with larger prediction error are better stored in memory, but also demonstrates the importance of prediction error for long-term learning.

3.6. Computational implementation

One prediction of the Rescorla-Wagner model is that under certain specific circumstances, due to cue competition, a cue that never co-occurs with an outcome will nonetheless develop positive connection weights to that outcome. This counter-intuitive prediction has been termed “spurious excitement”. Kapatsinski (2023) puts this prediction to the test in an experiment in which monosyllabic consonant-vowel words – presented either in isolation or in sequences of two or three – served as cues for pressing a left or right arrow key. The cues were carefully selected to produce the conditions under which the Rescorla-Wagner model predicts spurious excitement. The experimental results showed no evidence of spurious excitement, calling this prediction of the Rescorla-Wagner model into question. Based on these results, Kapatsinski (2023) proposes an adjustment to the Rescorla-Wagner learning equations: rather than a linear activation function, a logistic activation function is proposed, which avoids spurious excitement and yet still captures many of the known learning effects.
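As a rough illustration of the general shape of such a modification (a sketch of the idea of replacing the linear activation with a logistic one, not Kapatsinski's exact formulation or parameter settings), the summed weights would be squashed through a logistic function before the prediction error is computed:

```python
import math

def prediction_error(weights, cues, lam, activation="linear"):
    """Prediction error under a linear vs. logistic activation function.
    A sketch of the general idea only, not Kapatsinski's (2023) model."""
    net = sum(weights[c] for c in cues)
    if activation == "logistic":
        # Squash summed weights into (0, 1); with targets of 0 and 1,
        # the asymptote can never quite be reached.
        net = 1.0 / (1.0 + math.exp(-net))
    return lam - net
```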

Bröker and Ramscar (2023) take a broad look at what is involved in cognitive modelling and at the role of computational modelling within it. They argue that an extremely important – but frequently overlooked – aspect of computational cognitive modelling lies in determining the appropriate input representations to the model. That is, not only is it important to determine the algorithm that best captures (human language) learning, but it is also important to determine what constitutes the input to that learning. To address this, they revisit the question of the role that implicit “negative evidence” plays in language learning. They find that with one set of input representations to the model, the results can be modelled with a dual-mechanism approach, in which children are assumed to switch their assumptions about how the data are sampled depending on the context and to adjust their learning strategy accordingly (i.e. between a generative model and a discriminative model). With a different set of input representations, however, a single-mechanism (discriminative model) approach is also possible. They conclude that an increased focus on selecting model input representations that faithfully capture the task design could benefit the conceptualisation, evaluation and comparison of cognitive models.

4. Conclusions and future directions

The contributions in this special issue show a wide range of phenomena predicted by error-driven learning models, from behavioural tasks to neural processes, and from phonetics to sentence processing. The papers use a variety of implementations of error-driven learning – discriminative learning, deep learning and connectionist models – all united in the assumption that learning occurs through a process of prediction and feedback from prediction error.

In the transition from animal models to models of human learning, there have been a number of challenges, some of which have been discussed widely in the literature. For example, what is the role of attention in error-driven learning? Should error-driven learning be considered a mechanism for explicit processes or is it purely an implicit learning model? Are there differences between reinforcement learning – i.e. learning involving rewards – and sensory error-driven learning? If so, which best characterises language learning? These will be important questions to investigate in the further development of a learning theory of language.

Acknowledgments

We would like to express our thanks for the wonderful editorial support throughout the process of putting together this special issue. Thanks also to the authors for their valuable contributions and hard work, without which this special issue would not have been possible.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This publication was supported by a collaborative grant from the Deutsche Forschungsgemeinschaft (German Research Foundation; Research Unit FOR2373, Project BA 3080/3-2).

Notes

[1] In the following, we use the terminology of discriminative learning. Although there are differences in the implementation, the general principles described below apply to all the models.

[2] This calculation assumes equivalent calculation of the prediction error for the occurrence vs. non-occurrence of outcomes, i.e. “positive” vs. “negative” evidence. For further discussion of this issue, see Nixon (2020) and Ramscar, Dye, and McCauley (2013).

[3] Some researchers have argued that absent cues may also affect learning (e.g. Van Hamme & Wasserman, 1994), but see Nixon et al. (2022) for discussion and an alternative view.

[4] Note that the term “activation” is used here in a different sense from the way it is commonly used in relation to the Rescorla-Wagner equations and similar learning algorithms: in the Rescorla-Wagner equations, activation refers to the sum of connection weights; here it is used in the more general sense common in psycholinguistics, referring to transient neural or cognitive activity related to the item.

References

  • Arnold, D., Tomaschek, F., Sering, K., Lopez, F., & Baayen, R. H. (2017, April). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PloS One, 12(4), Article e0174623. https://doi.org/10.1371/journal.pone.0174623
  • Arppe, A., Hendrix, P., Milin, P., Baayen, R. H., Sering, T., & Shaoul, C. (2018, September). NDL: Naive discriminative learning. Retrieved November 20, 2020, from https://CRAN.R-project.org/package=ndl
  • Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, Article 4895891. https://doi.org/10.1155/2019/4895891
  • Baayen, R. H., Shaoul, C., Willits, J., & Ramscar, M. (2016). Comprehension without segmentation: A proof of concept with naive discriminative learning. Language, Cognition and Neuroscience, 31(1), 106–128. https://doi.org/10.1080/23273798.2015.1065336
  • Bock, J. K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18(3), 355–387. https://doi.org/10.1016/0010-0285(86)90004-6
  • Branigan, H. P., & Pickering, M. J. (2017). Structural priming and the representation of language. Behavioral and Brain Sciences, 40, e282.
  • Bröker, F., & Ramscar, M. (2023). Representing absence of evidence: Why algorithms and representations matter in models of language and cognition. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2020.1862257
  • Chang, F. (2002). Symbolically speaking: A connectionist model of sentence production. Cognitive Science, 26(5), 609–651. https://doi.org/10.1207/s15516709cog2605_3
  • Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272. https://doi.org/10.1037/0033-295X.113.2.234
  • Chang, F., Dell, G. S., Bock, K., & Griffin, Z. M. (2000). Structural priming as implicit learning: A comparison of models of sentence production. Journal of Psycholinguistic Research, 29(2), 217–230. https://doi.org/10.1023/A:1005101313330
  • Chomsky, N., & Halle, M. (1968). The sound pattern of English. Harper & Row, Publishers.
  • Eimas, P. D., & Corbit, J. D. (1973). Selective adaptation of linguistic feature detectors. Cognitive Psychology, 4(1), 99–109. https://doi.org/10.1016/0010-0285(73)90006-6
  • Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971). Speech perception in infants. Science, 171(3968), 303–306. https://doi.org/10.1126/science.171.3968.303
  • Ellis, N. C., & Sagarra, N. (2010). The bounds of adult language acquisition: Blocking and learned attention. Studies in Second Language Acquisition, 32(4), 553–580. https://doi.org/10.1017/S0272263110000264
  • Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211. https://doi.org/10.1207/s15516709cog1402_1.
  • Filipović Ðurđević, D., & Kostić, A. (2023). We probably sense sense probabilities. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.1909083
  • Friston, K. (2011). What is optimal about motor control? Neuron, 72(3), 488–498. https://doi.org/10.1016/j.neuron.2011.10.018
  • Frost, R., Armstrong, B. C., & Christiansen, M. H. (2019). Statistical learning research: A critical review and possible new directions. Psychological Bulletin, 145(12), 1128–1153. https://doi.org/10.1037/bul0000210
  • Frost, R., Armstrong, B. C., Siegelman, N., & Christiansen, M. H. (2015). Domain generality versus modality specificity: The paradox of statistical learning. Trends in Cognitive Sciences, 19(3), 117–125. https://doi.org/10.1016/j.tics.2014.12.010
  • Haeuser, K. I., & Kray, J. (2023). Effects of prediction error on episodic memory retrieval: Evidence from sentence reading and word recognition. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.1924387
  • Hasson, U. (2017). The neurobiology of uncertainty: Implications for statistical learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1711), Article 20160048. https://doi.org/10.1098/rstb.2016.0048
  • Hills, T. T., Maouene, J., Riordan, B., & Smith, L. B. (2010). The associative structure of language: Contextual diversity in early word learning. Journal of Memory and Language, 63(3), 259–273. https://doi.org/10.1016/j.jml.2010.06.002
  • Hoppe, D. B., van Rij, J., Hendriks, P., & Ramscar, M. (2020, November). Order matters! Influences of linear order on linguistic category learning. Cognitive Science, 44(11), Article e12910. https://doi.org/10.1111/cogs.12910
  • Kamin, L. J. (1968). “Attention-like” processes in classical conditioning. In Miami symposium on the prediction of behavior: Aversive stimulation (pp. 9–31). University of Miami Press.
  • Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In Punishment and aversive behavior. Appleton-Century-Crofts.
  • Kapatsinski, V. (2023). Learning fast while avoiding spurious excitement and overcoming cue competition requires setting unachievable goals: Reasons for using the logistic activation function in learning to predict categorical outcomes. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.1927120
  • Kaplan, R., & Friston, K. J. (2018, March). Planning and navigation as active inference. Biological Cybernetics, 112(4), 323–343. https://doi.org/10.1007/s00422-018-0753-2
  • Khoe, Y. H., Tsoukala, C., Kootstra, G. J., & Frank, S. L. (2023). Is structural priming between different languages a learning effect? Modelling priming as error-driven implicit learning. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.1998563
  • Luo, X., Sexton, N. J., & Love, B. C. (2023). A deep learning account of how language affects thought. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.2001023
  • MacWhinney, B. (2010). Computational models of child language learning: An introduction. Journal of Child Language, 37(3), 477–485. https://doi.org/10.1017/S0305000910000139
  • McMurray, B. (2023). The acquisition of speech categories: Beyond perceptual narrowing, beyond unsupervised learning and beyond infancy. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2022.2105367
  • McMurray, B., Aslin, R. N., & Toscano, J. C. (2009). Statistical learning of phonetic categories: Insights from a computational approach. Developmental Science, 12(3), 369–378. https://doi.org/10.1111/desc.2009.12.issue-3
  • McMurray, B., & Hollich, G. (2009). Core computational principles of language acquisition: Can statistical learning do the job? Introduction to special section. Developmental Science, 12(3), 365–368. https://doi.org/10.1111/desc.2009.12.issue-3
  • Monaghan, P., & Rowland, C. F. (2017). Combining language corpora with experimental and computational approaches for language acquisition research. Language Learning, 67(S1), 14–39. https://doi.org/10.1111/lang.12221
  • Ng, A. Y., & Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In M. I. Jordan, Y. LeCun, & S. A. Solla (Eds.), Advances in neural information processing systems. Neural information processing systems conferences from 1988 to 1999 (CDROM) (pp. 841–848).
  • Nieder, J., Tomaschek, F., Cohrs, E., & van de Vijver, R. (2021, September). Modelling Maltese noun plural classes without morphemes. Language, Cognition and Neuroscience, 37(2), 1–22. https://doi.org/10.1080/23273798.2021.1977835
  • Nieder, J., van de Vijver, R., & Tomaschek, F. (2022, October). “All mimsy were the borogoves” – a discriminative learning model of morphological knowledge in pseudo-word inflection. Language, Cognition and Neuroscience, 1–18. https://doi.org/10.1080/23273798.2022.2127805
  • Nixon, J. S. (2018). Effective acoustic cue learning is not just statistical, it is discriminative. In Interspeech 2018 – 19th annual conference of the international speech communication association (pp. 1447–1451). International Speech Communication Association.
  • Nixon, J. S. (2020). Of mice and men: Speech sound acquisition as discriminative learning from prediction error, not just statistical tracking. Cognition, 197, Article 104081. https://doi.org/10.1016/j.cognition.2019.104081
  • Nixon, J. S., Poelstra, S., & van Rij, J. (2022). Does error-driven learning occur in the absence of cues? Examination of the effects of updating connection weights to absent cues. In Proceedings of the 44th annual meeting of the cognitive science society (pp. 2590–2597). Cognitive Science Society.
  • Nixon, J. S., & Tomaschek, F. (2020). Learning from the acoustic signal: Error-driven learning of low-level acoustics discriminates vowel and consonant pairs. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd annual meeting of the cognitive science society (pp. 585–591). Cognitive Science Society.
  • Nixon, J. S., & Tomaschek, F. (2021). Prediction and error in early infant speech perception: A speech acquisition model. Cognition, 212, Article 104697. https://doi.org/10.1016/j.cognition.2021.104697
  • Nixon, J. S., & Tomaschek, F. (2023). Does speech comprehension require phonemes? In M. Diaz-Campos, & S. Balasch (Eds.), The handbook of usage-based linguistics. Forthcoming. Wiley Blackwell Publishing.
  • Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10(5), 233–238. https://doi.org/10.1016/j.tics.2006.03.006
  • Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S. Lima, R. Corrigan, & G. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). John Benjamins.
  • Ramscar, M. (2023). A discriminative account of the learning, representation and processing of inflection systems. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.2014062
  • Ramscar, M., Dye, M., & Klein, J. (2013). Children value informativity over logic in word learning. Psychological Science, 24(6), 1017–1023. https://doi.org/10.1177/0956797612460691
  • Ramscar, M., Dye, M., & McCauley, S. M. (2013). Error and expectation in language learning: The curious absence of mouses in adult speech. Language, 89(4), 760–793. https://doi.org/10.1353/lan.2013.0068
  • Ramscar, M., Dye, M., Popick, H. M., O'Donnell-McCarthy, F., & Bishop, D. (2011). The enigma of number: Why children find the meanings of even small number words hard to learn and how we can help them do better. PloS One, 6(7), Article e22501. https://doi.org/10.1371/journal.pone.0022501
  • Ramscar, M., & Yarlett, D. (2007). Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition. Cognitive Science, 31(6), 927–960. https://doi.org/10.1080/03640210701703576
  • Ramscar, M., Yarlett, D., Dye, M., Denny, K., & Thorpe, K. (2010). The effects of feature-label-order and their implications for symbolic learning. Cognitive Science, 34(6), 909–957. https://doi.org/10.1111/(ISSN)1551-6709
  • Rebuschat, P., & Monaghan, P. (2019). Editors' introduction: Aligning implicit learning and statistical learning: Two approaches, one phenomenon. Topics in Cognitive Science, 11(3), 459–467. https://doi.org/10.1111/tops.2019.11.issue-3
  • Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66(1), 1–5. https://doi.org/10.1037/h0025984
  • Rescorla, R. A. (1988). Pavlovian conditioning – it's not what you think it is. American Psychologist, 43(3), 151–160. https://doi.org/10.1037/0003-066X.43.3.151
  • Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). Appleton-Century-Crofts.
  • Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In G. T. M. Altmann (Ed.), Psycholinguistics. MIT Press.
  • Shafaei-Bajestan, E., & Baayen, R. H. (2018). Wide learning for auditory comprehension. In Interspeech (pp. 966–970). International Speech Communication Association.
  • Shafaei-Bajestan, E., Moradipour-Tari, M., Uhrig, P., & Baayen, R. H. (2023). LDL-AURIS: A computational model, grounded in error-driven learning, for the comprehension of single spoken words. Language, Cognition and Neuroscience. https://doi.org/10.1080/23273798.2021.1954207
  • Siegel, S., & Allan, L. G. (1996). The widespread influence of the Rescorla-Wagner model. Psychonomic Bulletin & Review, 3(3), 314–321. https://doi.org/10.3758/BF03210755
  • Stone, G. O. (1986). An analysis of the delta rule and the learning of statistical associations. In Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 444–459). MIT Press. https://ieeexplore.ieee.org/servlet/opac?bknumber=6276825
  • Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88(2), 135–170. https://doi.org/10.1037/0033-295X.88.2.135
  • Tomaschek, F., Plag, I., Ernestus, M., & Baayen, R. H. (2019). Phonetic effects of morphology and context: Modeling the duration of word-final S in English with naïve discriminative learning. Journal of Linguistics, 57(1), 123–161. https://doi.org/10.1017/S0022226719000203
  • Tomaschek, F., & Ramscar, M. (2022). Understanding the phonetic characteristics of speech under uncertainty: Implications of the representation of linguistic knowledge in learning and processing. Frontiers in Psychology, 13, Article 754395. https://doi.org/10.3389/fpsyg.2022.754395
  • Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25(2), 127–151. https://doi.org/10.1006/lmot.1994.1008
  • van Rij, J., & Hoppe, D. (2021). EDL: Toolbox for error-driven learning simulations with two-layer networks. Retrieved November 20, 2020, from https://jacolienvanrij.com/Rpackages/edl/
  • Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. In 1960 WESCON convention record part IV (pp. 96–104). https://www.bibsonomy.org/bibtex/24c3b6ae932deb6bb1d04ad76c9c94a69/schaul
