REGULAR ARTICLES

Learning fast while avoiding spurious excitement and overcoming cue competition requires setting unachievable goals: reasons for using the logistic activation function in learning to predict categorical outcomes

Pages 575-596 | Received 04 Jun 2020, Accepted 26 Apr 2021, Published online: 17 May 2021

References

  • Arnold, D., Tomaschek, F., Sering, K., Lopez, F., & Baayen, R. H. (2017). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PloS One, 12(4), e0174623. https://doi.org/10.1371/journal.pone.0174623
  • Arnon, I., & Ramscar, M. (2012). Granularity and the acquisition of grammatical gender: How order-of-acquisition affects what gets learned. Cognition, 122(3), 292–305. https://doi.org/10.1016/j.cognition.2011.10.009
  • Arppe, A., Hendrix, P., Milin, P., Baayen, R. H., Sering, T., & Shaoul, C. (2018). ndl: Naive Discriminative Learning. R package version 0.2.18. https://CRAN.R-project.org/package=ndl
  • Azorlosa, J. L., & Cicala, G. A. (1986). Blocking of conditioned suppression with 1 or 10 compound trials. Animal Learning & Behavior, 14(2), 163–167. https://doi.org/10.3758/BF03200051
  • Baayen, R. H. (2010). Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon, 5(3), 436–461. https://doi.org/10.1075/ml.5.3.10baa
  • Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019. https://doi.org/10.1155/2019/4895891
  • Baayen, R. H., Endresen, A., Janda, L. A., Makarova, A., & Nesset, T. (2013). Making choices in Russian: Pros and cons of statistical methods for rival forms. Russian Linguistics, 37(3), 253–291. https://doi.org/10.1007/s11185-013-9118-6
  • Baayen, R. H., Hendrix, P., & Ramscar, M. (2013). Sidestepping the combinatorial explosion: An explanation of n-gram frequency effects based on naive discriminative learning. Language and Speech, 56(3), 329–347. https://doi.org/10.1177/0023830913484896
  • Baayen, R. H., Milin, P., Đurđević, D. F., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438–481. https://doi.org/10.1037/a0023851
  • Baker, A. G. (1974). Conditioned inhibition is not the symmetrical opposite of conditioned excitation: A test of the Rescorla-Wagner model. Learning and Motivation, 5(3), 369–379. https://doi.org/10.1016/0023-9690(74)90018-6
  • Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
  • Bellingham, W. P., & Gillette, K. (1981). Attenuation of overshadowing as a function of nondifferential compound conditioning trials. Bulletin of the Psychonomic Society, 18(4), 218–220. https://doi.org/10.3758/BF03333608
  • Blaisdell, A. P., Denniston, J. C., & Miller, R. R. (2001). Recovery from the overexpectation effect: Contrasting performance-focused and acquisition-focused models of retrospective revaluation. Animal Learning & Behavior, 29(4), 367–380. https://doi.org/10.3758/BF03192902
  • Blevins, J. P., Ackerman, F., Malouf, R., & Ramscar, M. (2016). Morphology as an adaptive discriminative system. Morphological Metatheory, 271–302. https://doi.org/10.1075/la.229.10ble
  • Booij, G. (2010). Construction morphology. Language and Linguistics Compass, 4(7), 543–555. https://doi.org/10.1111/j.1749-818X.2010.00213.x
  • Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58(5), 313–323. https://doi.org/10.1037/h0054388
  • Bybee, J. (2001). Phonology and language use. Cambridge University Press.
  • Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22(2-4), 381–410. https://doi.org/10.1515/tlir.2005.22.2-4.381
  • Caballero, G., & Kapatsinski, V. (in press). How agglutinative? Searching for cues to meaning in Choguita Rarámuri (Tarahumara) using discriminative learning. In A. Sims, A. Ussishkin, J. Parker, & S. Wray (Eds.), Morphological typology and linguistic cognition. Cambridge University Press.
  • Daelemans, W., & Van den Bosch, A. (2005). Memory-based language processing. Cambridge University Press.
  • Dawson, M. R. (2008). Connectionism and classical conditioning. Comparative Cognition & Behavior Reviews, 3, 1–115. https://doi.org/10.3819/ccbr.2008.30008
  • Dawson, M. R., & Spetch, M. L. (2005). Traditional perceptrons do not produce the overexpectation effect. Neural Information Processing-Letters and Reviews, 7(1), 11–17.
  • Divjak, D. (2019). Frequency in language: Memory, attention and learning. Cambridge University Press.
  • Ellis, N. C. (2006). Selective attention and transfer phenomena in L2 acquisition: Contingency, cue competition, salience, interference, overshadowing, blocking, and perceptual learning. Applied Linguistics, 27(2), 164–194. https://doi.org/10.1093/applin/aml015
  • Ellis, N. C., & Ferreira-Junior, F. (2009). Constructions and their acquisition: Islands and the distinctiveness of their occupancy. Annual Review of Cognitive Linguistics, 7(1), 188–221. https://doi.org/10.1075/arcl.7.08ell
  • Ellis, N. C., & Sagarra, N. (2010). Learned attention effects in L2 temporal reference: The first hour and the next eight semesters. Language Learning, 60, 85–108. https://doi.org/10.1111/j.1467-9922.2010.00602.x
  • Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences, 101(36), 13124–13131. https://doi.org/10.1073/pnas.0404965101
  • Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251–279. https://doi.org/10.1037/0033-295X.105.2.251
  • Goldstone, R. L. (2000). Unitization during category learning. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 86–112. https://doi.org/10.1037/0096-1523.26.1.86
  • Griffiths, T. L., Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2011). Bayes and blickets: Effects of knowledge on causal induction in children and adults. Cognitive Science, 35(8), 1407–1455. https://doi.org/10.1111/j.1551-6709.2011.01203.x
  • Harmon, Z., Idemaru, K., & Kapatsinski, V. (2019). Learning mechanisms in cue reweighting. Cognition, 189, 76–88. https://doi.org/10.1016/j.cognition.2019.03.011
  • Harmon, Z., & Kapatsinski, V. (2020). The best-laid plans of mice and men: Competition between top-down and preceding-item cues in plan execution. Proceedings of the Annual Conference of the Cognitive Science Society, 42, 1674–1680.
  • Harris, A. C. (2017). Multiple exponence. Oxford University Press.
  • Hayes, B. (2020). Deriving the wug-shaped curve: A criterion for assessing formal theories of linguistic variation. (Unpublished manuscript, UCLA). https://linguistics.ucla.edu/people/hayes/papers/HayesWugShapedCurve.pdf
  • Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. Wiley.
  • Jamieson, R. K., Crump, M. J., & Hannah, S. D. (2012). An instance theory of associative learning. Learning & Behavior, 40(1), 61–82. https://doi.org/10.3758/s13420-011-0046-2
  • Jordan, M. I. (1995). Why the logistic function? A tutorial discussion on probabilities and neural networks (MIT Computational Cognitive Science Technical Report 9503).
  • Kamin, L. J. (1969). Selective association and conditioning. In N. J. Mackintosh, & W. K. Honig (Eds.), Fundamental issues in associative learning (pp. 42–64). Dalhousie University Press.
  • Kapatsinski, V. (2007). Implementing and testing theories of linguistic constituency I: English syllable structure. Research on Spoken Language Processing Progress Report, 28, 241–276. Indiana University Speech Research Laboratory.
  • Kapatsinski, V. (2009). Testing theories of linguistic constituency with configural learning: The case of the English syllable. Language, 85(2), 248–277. https://doi.org/10.1353/lan.0.0118
  • Kapatsinski, V. (2018). Changing minds changing tools: From learning theory to language acquisition to language change. MIT Press.
  • Kondaurova, M. V., & Francis, A. L. (2008). The relationship between native allophonic experience with vowel duration and perception of the English tense/lax vowel contrast by Spanish and Russian listeners. The Journal of the Acoustical Society of America, 124(6), 3959–3971. https://doi.org/10.1121/1.2999341
  • Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44. https://doi.org/10.1037/0033-295X.99.1.22
  • Lattal, K. M., & Nakajima, S. (1998). Overexpectation in appetitive Pavlovian and instrumental conditioning. Animal Learning & Behavior, 26(3), 351–360. https://doi.org/10.3758/BF03199227
  • Logue, A. W. (1979). Taste aversion and the generality of the laws of learning. Psychological Bulletin, 86(2), 276–296. https://doi.org/10.1037/0033-2909.86.2.276
  • Lubow, R. E., & Moore, A. U. (1959). Latent inhibition: The effect of nonreinforced pre-exposure to the conditional stimulus. Journal of Comparative and Physiological Psychology, 52(4), 415–419. https://doi.org/10.1037/h0046700
  • McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419–457. https://doi.org/10.1037/0033-295X.102.3.419
  • McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the interaction of online referent selection and slow associative learning. Psychological Review, 119(4), 831–877. https://doi.org/10.1037/a0029872
  • Melz, E. R., Cheng, P. W., Holyoak, K. J., & Waldmann, M. R. (1993). Cue competition in human categorization: Contingency or the Rescorla–Wagner learning rule? Comment on Shanks (1991). Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(6), 1398–1410. https://doi.org/10.1037/0278-7393.19.6.1398
  • Milin, P., Divjak, D., & Baayen, R. H. (2017). A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(11), 1730–1751. https://doi.org/10.1037/xlm0000410
  • Milin, P., Divjak, D., Dimitrijević, S., & Baayen, R. H. (2016). Towards cognitively plausible data science in language research. Cognitive Linguistics, 27(4), 507–526. https://doi.org/10.1515/cog-2016-0055
  • Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla–Wagner model. Psychological Bulletin, 117(3), 363–386. https://doi.org/10.1037/0033-2909.117.3.363
  • Navarro, J. I., Hallam, S. C., Matzel, L. D., & Miller, R. R. (1989). Superconditioning and overshadowing. Learning and Motivation, 20(2), 130–152. https://doi.org/10.1016/0023-9690(89)90014-3
  • Nixon, J. S. (2020). Of mice and men: Speech sound acquisition as discriminative learning from prediction error, not just statistical tracking. Cognition, 197, 104081. https://doi.org/10.1016/j.cognition.2019.104081
  • Olejarczuk, P., Kapatsinski, V., & Baayen, R. H. (2018). Distributional learning is error-driven: The role of surprise in the acquisition of phonetic categories. Linguistics Vanguard, 4(s2), s2. https://doi.org/10.1515/lingvan-2017-0020
  • Packheiser, J., Pusch, R., Stein, C. C., Güntürkün, O., Lachnit, H., & Uengoer, M. (2020). How competitive is cue competition? Quarterly Journal of Experimental Psychology, 73(1), 104–114. https://doi.org/10.1177/1747021819866967
  • Pavlov, I. P. (1927). Conditioned reflexes. Oxford University Press.
  • Pearce, J. M. (1987). A model for stimulus generalization in Pavlovian conditioning. Psychological Review, 94(1), 61–73. https://doi.org/10.1037/0033-295X.94.1.61
  • Pearce, J. M. (1994). Similarity and discrimination: A selective review and a connectionist model. Psychological Review, 101(4), 587–607. https://doi.org/10.1037/0033-295X.101.4.587
  • Pearce, J. M., & Redhead, E. S. (1995). Supernormal conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 21(2), 155–165. https://doi.org/10.1037/0097-7403.21.2.155
  • Plaut, D. C., & Booth, J. R. (2000). Individual and developmental differences in semantic priming: Empirical and computational support for a single-mechanism account of lexical processing. Psychological Review, 107(4), 786–823. https://doi.org/10.1037/0033-295X.107.4.786
  • Ramscar, M., Dye, M., & Klein, J. (2013). Children value informativity over logic in word learning. Psychological Science, 24(6), 1017–1023. https://doi.org/10.1177/0956797612460691
  • Ramscar, M., Dye, M., & McCauley, S. M. (2013). Error and expectation in language learning: The curious absence of “mouses” in adult speech. Language, 89(4), 760–793. https://doi.org/10.1353/lan.2013.0068
  • Ramscar, M., Hendrix, P., Shaoul, C., Milin, P., & Baayen, H. (2014). The myth of cognitive decline: Non-linear dynamics of lifelong learning. Topics in Cognitive Science, 6(1), 5–42. https://doi.org/10.1111/tops.12078
  • Ramscar, M., Yarlett, D., Dye, M., Denny, K., & Thorpe, K. (2010). The effects of feature-label-order and their implications for symbolic learning. Cognitive Science, 34(6), 909–957. https://doi.org/10.1111/j.1551-6709.2009.01092.x
  • R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
  • Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66(1), 1–5. https://doi.org/10.1037/h0025984
  • Rescorla, R. A. (1970). Reduction in the effectiveness of reinforcement after prior excitatory conditioning. Learning and Motivation, 1(4), 372–381. https://doi.org/10.1016/0023-9690(70)90101-3
  • Rescorla, R. A. (1971). Variation in the effectiveness of reinforcement and nonreinforcement following prior inhibitory conditioning. Learning and Motivation, 2(2), 113–123. https://doi.org/10.1016/0023-9690(71)90002-6
  • Rescorla, R. A. (1973). Evidence for “unique stimulus” account of configural conditioning. Journal of Comparative and Physiological Psychology, 85(2), 331–338. https://doi.org/10.1037/h0035046
  • Rescorla, R. A. (1989). Redundant treatment of neutral and excitatory stimuli in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 15(3), 212–223. https://doi.org/10.1037/0097-7403.15.3.212
  • Rescorla, R. A. (1999). Summation and overexpectation with qualitatively different outcomes. Animal Learning & Behavior, 27(1), 50–62. https://doi.org/10.3758/BF03199431
  • Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black, & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). Appleton-Century-Crofts.
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. https://doi.org/10.1038/323533a0
  • Schumacher, M., Roßner, R., & Vach, W. (1996). Neural networks and logistic regression: Part I. Computational Statistics & Data Analysis, 21(6), 661–682. https://doi.org/10.1016/0167-9473(95)00032-1
  • Seyfarth, S., & Myslin, M. (2014). Discriminative learning predicts human recognition of English blend sources. Proceedings of the Annual Conference of the Cognitive Science Society, 36, 1413–1417.
  • Stein, R. B. (1967). The frequency of nerve action potentials generated by applied currents. Proceedings of the Royal Society, B167(1006), 64–86. https://doi.org/10.1098/rspb.1967.0013
  • Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88(2), 135–170. https://doi.org/10.1037/0033-295X.88.2.135
  • Wang, X., Qin, Y., Wang, Y., Xiang, S., & Chen, H. (2019). ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing, 363, 88–98. https://doi.org/10.1016/j.neucom.2019.07.017
  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.
  • Williams, B. A., & McDevitt, M. A. (2002). Inhibition and superconditioning. Psychological Science, 13(5), 454–459. https://doi.org/10.1111/1467-9280.00480
  • Xu, F., & Tenenbaum, J. B. (2007). Word learning as Bayesian inference. Psychological Review, 114(2), 245–272. https://doi.org/10.1037/0033-295X.114.2.245
