Research Article

Poverty of the Stimulus Without Tears

Pages 415-454 | Published online: 04 Oct 2021
 

ABSTRACT

Poverty of the stimulus has been at the heart of ferocious and tear-filled debates at the nexus of psychology, linguistics, and philosophy for decades. This review is intended as a guide for readers without a formal linguistics or philosophy background, focusing on what poverty of the stimulus is and how it’s been interpreted, which is traditionally where the tears have come in. I discuss poverty of the stimulus from the perspective of language development, highlighting how poverty of the stimulus relates to expectations about learning and the data available to learn from. I describe common interpretations of what poverty of the stimulus means when it occurs, and approaches for determining when poverty of the stimulus is in fact occurring. I close with illustrative examples of poverty of the stimulus in the domains of syntax, lexical semantics, and phonology, and discuss the value of identifying instances of poverty of the stimulus when it comes to understanding language development.

Acknowledgments

I’m incredibly grateful to both Greg Scontras and Richard Futrell for comments, discussions, and putting up with me while I worked out how to write this. I’m additionally grateful to both Cindy Fisher and Greg Hickok, for their comments and their unflagging support of a review on this topic. I also want to thank Stephen Goldinger and several anonymous reviewers for their very sensible and illuminating comments on earlier drafts of this manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 I recognize that this review is rather long, as I attempted to draw together information on many different aspects of poverty of the stimulus. Given this, some readers may find it helpful to skip to sections targeting aspects of poverty of the stimulus that they’re particularly interested in, after reading the introductory section.

2 It’s important to underscore that “prior” isn’t the same as “innate”, something which will be discussed more thoroughly in So what does it mean if the data are insufficient?.

3 One modern solution to the Subset Problem involves a domain-general ability to leverage ambiguous C-and-F data by considering how likely each hypothesis is to generate that kind of data (here, C is more likely than F to generate C-and-F data). This has been called the Size Principle (Tenenbaum & Griffiths, 2001), and has recently been discussed in more detail with respect to the Subset Problem in Lasnik and Lidz (2017) and Pearl (2021). This ability to leverage ambiguous data would allow children to use ambiguous data as indirect evidence for C (and against F). Indirect evidence is an evidence type discussed later on in this section when considering the data available to children.
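To make the Size Principle concrete, here is a minimal sketch (in Python) of how ambiguous C-and-F data come to favor the subset hypothesis C: each hypothesis assigns the data it can generate a likelihood inversely proportional to the hypothesis's size, so the smaller hypothesis C is favored more and more as ambiguous data accumulate. The hypothesis sizes, prior, and data counts below are illustrative assumptions, not values from the article.

    # Size Principle sketch: the likelihood of a compatible data point under a
    # hypothesis is inversely proportional to that hypothesis's size.
    # Hypothesis sizes, prior, and data counts are illustrative assumptions.
    def prob_C(n_ambiguous, size_C=10, size_F=100, prior_C=0.5):
        like_C = (1.0 / size_C) ** n_ambiguous   # likelihood under subset hypothesis C
        like_F = (1.0 / size_F) ** n_ambiguous   # likelihood under superset hypothesis F
        score_C = like_C * prior_C
        score_F = like_F * (1.0 - prior_C)
        return score_C / (score_C + score_F)     # posterior probability of C

    for n in (1, 3, 5):
        print(n, round(prob_C(n), 4))
    # The posterior for C climbs toward 1.0, so ambiguous C-and-F data function
    # as indirect evidence for C (and against F).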

4 I want to note that this is a hypothetical example, rather than one motivated by a particular empirical debate. However, I think this example sketches out a potential hypothesis space for the child that makes the distinctions I discuss later on easier to follow.

5 For a concrete example of changing ideas, see the discussion in Syntactic: Structure dependence about structure dependence.

6 When indirect evidence is also negative evidence like this, it’s sometimes referred to as “implicit negative evidence” (Rohde & Plaut, 1999) rather than “indirect negative evidence”.

7 More on this terminology in the next section, especially since it’s been taken to mean different things by different people. That is, the use of the term “nativist” has itself caused some tears.

8 See Specifying how much data is enough and Lexical semantic: Exact cardinal number words for concrete examples of this kind of proposed innate knowledge.

9 Special thanks to Vic Ferreira for first pointing this out to me.

10 Interestingly, it’s not always true that constraints lead to biases that we can easily describe and therefore implement in an ideal learner; in Pearl and Phillips (2018), a constraint on memory led to better acquisition performance, but the improvement didn’t come from anything that was easy to describe explicitly.

11 I say “primarily” because the variational learner of Legate and Yang (2002) actually learns from ambiguous data as well, but is really driven by unambiguous data. This is because over time, ambiguous data cancel out, leaving the unambiguous data to drive the learner’s generalizations. See Pearl (in press) for more discussion about why this is.
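To illustrate why the ambiguous data cancel out, here is a small simulation sketch (in Python) of a two-grammar variational learner with linear reward-penalty updates; the learning rate, the proportion of unambiguous data, and the labels G1/G2 are illustrative assumptions, not values from Legate and Yang (2002).

    import random

    # G1 is the target grammar. A small fraction of the input is unambiguous
    # (only G1 parses it); the rest is ambiguous (both grammars parse it).
    def simulate(n_inputs=20000, gamma=0.01, prop_unambiguous=0.05, seed=1):
        random.seed(seed)
        p = 0.5  # current probability of selecting G1
        for _ in range(n_inputs):
            only_G1_parses = random.random() < prop_unambiguous
            if random.random() < p:
                p = p + gamma * (1 - p)      # G1 chosen and succeeds: reward G1
            elif only_G1_parses:
                p = gamma + (1 - gamma) * p  # G2 chosen but fails: weight shifts to G1
            else:
                p = (1 - gamma) * p          # G2 chosen and succeeds: reward G2
        return p

    print(round(simulate(), 3))
    # On ambiguous data, p moves up and down by equal amounts in expectation,
    # so even a small share of unambiguous data drives p toward 1 over time.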

12 It’s important to note that this “all else being equal” assumption does a lot of work. As just one example, we would need to assume that both knowledge pieces were equally complex from the perspective of the child. If not, it might be that the simpler one was learned more quickly than the more complex one, even with the same amount of unambiguous data.

13 I note that I’m using the term “knowledge” more broadly than it’s traditionally been used. More specifically, I’m referring to any kind of bias or preference as knowledge (such as the preference for more compact representations). This contrasts with only using “knowledge” to refer to the representations in a child’s hypothesis space.

14 Of course, this estimate of the input rate is also an idealization, in the sense that we only have samples of every component that goes into that estimated input rate; however, in contrast to formal learnability work, this view of the input derives from empirical estimates of children’s input.
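As a purely illustrative calculation (the numbers here are assumptions for exposition, not figures from the article): if samples suggested roughly 150 child-directed utterances per hour over roughly 10 waking hours per day, the estimated input rate would be about 150 × 10 × 365 ≈ 550,000 utterances per year – and each component of that product (utterances per hour, hours of interaction per day, days of exposure) is itself only estimated from limited samples.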

15 The particular knowledge was when to contract want to into wanna in English. That is, English speakers know they can contract Who do you want to rescue? into Who do you wanna rescue?, but not Who do you want to do the rescuing? into Who do you wanna do the rescuing?

16 Note that overhypotheses are very similar to the traditional notion of linguistic parameters often used by linguistic nativists, where a linguistic parameter is a piece of abstract structural knowledge that can be applied to many different linguistic knowledge pieces. See Pearl and Lidz (2013) and Pearl (in press) for more discussion on this point.

17 We might imagine this prior knowledge about structure-dependence could result from a Perfors, Tenenbaum, Regier et al. (2011) learner who figured out that structure-dependence was the right overhypothesis.

18 Note that this meant the modeled learner had to be able to reliably extract both the syntactic and meaning information from the input. This is of course an idealization – perhaps especially for the meaning information – though see Pearl (in press) for more discussion about why the particular meaning information that the learner had to extract isn’t implausible for young children to be able to extract.

19 This investigation was restricted to a small input set due to how long it takes to generate accurate annotations of available syntactic and logical information for child input utterances. We might reasonably infer that the modeled learner would show even more impressive performance if it were given a quantity of data more in line with what children truly learn from.

20 Note that a neural network’s non-linear learning process is another way to navigate a large hypothesis space, though it’s hard to interpret exactly how that space is being navigated, in contrast with symbolic approaches like Bayesian inference (Pearl, 2019). Still, neural network results do represent an “in principle” solution to learning problems, confirming what it’s possible to learn, given certain input and learning constraints (as encoded in the neural network’s architecture).

21 However, the caveat is that the innards of neural networks are hard to interpret, so it’s possible (though perhaps not highly probable) that an explicit preference for structure-dependent representations was in fact present in the specific numbers in the vectors inside the neural network. It’s just that we as humans can’t yet easily interpret vectorized representations, and so it’s difficult to tell for sure.
