Rage against the machine: Evaluation metrics in the 21st century

Pages 100-125 | Received 30 Nov 2016, Accepted 04 Dec 2016, Published online: 20 Mar 2017
 

ABSTRACT

I review the classic literature in generative grammar and Marr’s three-level program for cognitive science to defend the Evaluation Metric as a psychological theory of language learning. Focusing on well-established facts of language variation, change, and use, I argue that optimal statistical principles embodied in Bayesian inference models are ill-suited for language acquisition. Specific attention will be given to the Subset Problem: Indirect negative evidence, which can be attractively formulated in the Bayesian framework, is ineffective when the statistical properties of language are examined in detail. As an alternative, I suggest that the Tolerance Principle (Yang 2016) provides a unified solution for the problem of induction and generalization: It bridges the computational and algorithmic levels in Marr’s formulation, while retaining the commitment to the formal and empirical constraints in child language development.
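
For readers who want the Tolerance Principle in compact form, its threshold can be stated as follows (a paraphrase of Yang 2016; the symbols N, e, and θ_N are used here only for exposition and do not appear in the abstract above): a rule defined over N lexical items is productive only if the number of exceptions e it must absorb stays at or below the threshold

    \theta_N = \frac{N}{\ln N}, \qquad \text{productive only if } e \le \theta_N

Otherwise, the learner is predicted to store the items individually rather than generalize.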

Acknowledgment

I would like to thank Noam Chomsky, Alex Clark, Stephen Crain, Randy Gallistel, Steve Isard, Mark Johnson, Norbert Hornstein, and Lisa Pearl for helpful discussions of the materials presented here.

Notes

1 I am indebted to Constantine Lignos, who, in his dissertation (2013), initiated the discussion of Marr’s levels in the setting of language acquisition, with specific reference to the problem of infant word segmentation.

2 All page numbers in Marr’s Vision refer to the 2010 MIT Press reprinting of the original 1982 edition.

3 Tommy Poggio, Marr’s closest collaborator, similarly cautions against detaching the three levels from one another and strongly emphasizes the need for their reintegration (Marr 2010: 365).

4 Though how well it works must be compared against alternative formulations, some of which are domain specific (e.g., triggering; Gibson & Wexler 1994), while others are domain general (e.g., reinforcement learning, such as the Bush & Mosteller (1951) linear reward-penalty scheme used in Yang (2002a) for parameter setting).
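
Since the linear reward-penalty scheme comes up only in passing here, a minimal sketch may help readers unfamiliar with it. This is an illustration of the Bush & Mosteller (1951) update applied to a single binary parameter, in the spirit of Yang (2002a); the function names, the learning rate, and the can_parse grammar check are placeholders of mine, not part of either source.

    import random

    def update(p, gamma, rewarded):
        """Linear reward-penalty update for the probability p of one parameter
        value (Bush & Mosteller 1951): reward moves p toward 1, penalty moves
        it toward 0, shifting probability mass to the rival value."""
        if rewarded:
            return p + gamma * (1.0 - p)
        return (1.0 - gamma) * p

    def learn(sentences, can_parse, gamma=0.01, p=0.5, seed=0):
        """Toy learner for a single binary parameter.  can_parse(value, sentence)
        stands in for the learner's grammar: it returns True if the grammar with
        that parameter value can analyze the input sentence."""
        rng = random.Random(seed)
        for s in sentences:
            value = rng.random() < p          # sample a parameter value with probability p
            rewarded = can_parse(value, s)    # reward iff the sampled grammar succeeds
            if value:
                p = update(p, gamma, rewarded)
            else:
                # strengthening the rival value lowers p, and vice versa
                p = 1.0 - update(1.0 - p, gamma, rewarded)
        return p

The point of the sketch is only that the update is local and incremental: each input sentence nudges the parameter probabilities, which is what allows gradual, variation-preserving learning.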

5 Or immediately favored, for whatever reason, eliminating the old form instantly, which is also inconsistent with the record of variation in language change.

6 That is not to say that the learner will always recapitulate the statistical distribution of linguistic variants in the environment. In fact, the distribution may be gradually altered by the relative “fitness” of the competing variants, resulting in language change; see Yang (2000) for details.

7 Or, the measure of optimality, which is generally construed as expected payoff/penalty in behavioral studies, would have to be reconceptualized.

8 Due to the stochastic nature of approximation methods, there are some practical difficulties in assessing the performance of Bayesian models: How long should one allow the search to run? How close does the best solution (when the search terminates) approach the true global optimum? The complexity of Bayesian inference can easily overwhelm most research groups’ computing resources.
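
To make the assessment problem concrete, here is a deliberately toy illustration, not a reconstruction of any model discussed in this article: a generic stochastic search over an arbitrary scoring function, whose “best” result depends on the random seed and the step budget, which is exactly the difficulty raised above.

    import math, random

    def log_score(h):
        """Toy stand-in for a model's log posterior over hypotheses coded as
        integers; a real Bayesian model of grammar would replace this."""
        return -abs(h - 137) + 3.0 * math.sin(h / 5.0)   # rugged, many local optima

    def stochastic_search(steps, seed):
        """Generic stochastic hill climbing that occasionally accepts worse moves."""
        rng = random.Random(seed)
        h = rng.randint(0, 1000)
        best_h, best_lp = h, log_score(h)
        for _ in range(steps):
            proposal = h + rng.choice([-3, -2, -1, 1, 2, 3])
            if log_score(proposal) > log_score(h) or rng.random() < 0.1:
                h = proposal
            if log_score(h) > best_lp:
                best_h, best_lp = h, log_score(h)
        return best_h, best_lp

    # Different seeds (and different budgets) terminate at different "best"
    # hypotheses, with no internal signal of how far they are from the optimum.
    for seed in range(5):
        print(seed, stochastic_search(steps=2000, seed=seed))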

9 It is logically possible, though I believe unlikely, that the superset hypothesis of attributive usage is already ruled out before the child has uttered a word. If so, then the onus is on the advocate of indirect negative evidence to demonstrate the reality of this very brief stage.

10 Presumably, there are more than two such typical adjectives, but here a cursory evaluation is sufficient.

11 Can the “a-adjective as PP” hypothesis developed in Yang (2015) be cast in terms of Bayesian inference? Absolutely: The Bayesian framework is extremely flexible. But doing so entails the acknowledgment that the Bayesian formulation of indirect negative evidence, which is its central appeal in the study of linguistic generalization, is no longer at play, and the Bayesian framework is superfluous.

12 Unsurprisingly, the verbs that allow the construction are considerably more frequent than those that do not. This accounts for the justified productivity of (17) when evaluated on a corpus of child-directed speech that contains the relatively high-frequency words.
