
Rage against the machine: Evaluation metrics in the 21st century

Pages 100-125 | Received 30 Nov 2016, Accepted 04 Dec 2016, Published online: 20 Mar 2017

ABSTRACT

I review the classic literature in generative grammar and Marr’s three-level program for cognitive science to defend the Evaluation Metric as a psychological theory of language learning. Focusing on well-established facts of language variation, change, and use, I argue that optimal statistical principles embodied in Bayesian inference models are ill-suited for language acquisition. Specific attention will be given to the Subset Problem: Indirect negative evidence, which can be attractively formulated in the Bayesian framework, is ineffective when the statistical properties of language are examined in detail. As an alternative, I suggest that the Tolerance Principle (Yang 2016) provides a unified solution for the problem of induction and generalization: It bridges the computational and algorithm levels in Marr’s formulation, while retaining the commitment to the formal and empirical constraints in child language development.
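The Tolerance Principle invoked here can be stated compactly: a rule defined over N items is productive only if the number of exceptions e satisfies e ≤ N/ln N. A minimal sketch of the threshold computation follows; the function names are mine, invented for illustration, not from Yang (2016).

```python
import math

def tolerance_threshold(n):
    """Tolerance Principle (Yang 2016): a rule over n items tolerates
    at most n / ln n exceptions while remaining productive."""
    return n / math.log(n)

def is_productive(n_items, n_exceptions):
    """True if the exception count falls at or under the threshold."""
    return n_exceptions <= tolerance_threshold(n_items)

# For example, 120 verbs with 20 exceptions: threshold is about 25.1,
# so the rule is predicted to be productive.
print(is_productive(120, 20))
```

Note that the threshold grows sublinearly in N, so larger rule systems tolerate proportionally fewer exceptions.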

Acknowledgment

I would like to thank Noam Chomsky, Alex Clark, Stephen Crain, Randy Gallistel, Steve Isard, Mark Johnson, Norbert Hornstein, and Lisa Pearl for helpful discussions of the materials presented here.

Notes

1 I am indebted to Constantine Lignos who, in his dissertation (2013), initiated the discussion of Marr’s levels in the setting of language acquisition, with specific reference to the problem of infant word segmentation.

2 All page numbers in Marr’s Vision refer to the 2010 MIT Press reprinting of the original 1982 edition.

3 Tommy Poggio, Marr’s closest collaborator, similarly cautions against treating the three levels in detachment from one another and strongly emphasizes the need for their reintegration (Marr 2010: 365).

4 Though how well it works must be compared against alternative formulations, some of which are domain specific (e.g., triggering; Gibson & Wexler 1994), while others are domain general (e.g., reinforcement learning, such as the Bush & Mosteller (1951) linear reward-penalty scheme used in Yang (2002a) for parameter setting).
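The linear reward-penalty scheme mentioned in this note can be sketched in a few lines for the two-hypothesis case. This is an illustrative reconstruction of the Bush & Mosteller (1951) update, with my own parameter and function names; it is not code from Yang (2002a).

```python
def lrp_update(p, chosen, rewarded, gamma=0.02):
    """One step of a linear reward-penalty update for a learner
    choosing between two hypotheses (e.g., two parameter values).

    p        -- current probability of selecting hypothesis 1
    chosen   -- which hypothesis (1 or 2) was used on this trial
    rewarded -- whether the chosen hypothesis succeeded on the input
    gamma    -- learning rate (illustrative value)
    """
    if chosen == 1:
        if rewarded:
            return p + gamma * (1 - p)   # reward: nudge p toward 1
        return (1 - gamma) * p           # penalty: nudge p toward 0
    if rewarded:
        return (1 - gamma) * p           # hypothesis 2 succeeded: p falls
    return p + gamma * (1 - p)           # hypothesis 2 failed: p rises
```

Because each update moves p only incrementally, the learner exhibits gradual, variation-preserving change rather than an abrupt switch between grammars, which is the property at issue in this note and the next.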

5 Or immediately favored, for whatever reason, eliminating the old form instantly, which is also inconsistent with the record of variation in language change.

6 That is not to say that the learner will always recapitulate the statistical distribution of linguistic variants in the environment. In fact, the distribution may be gradually altered by the relative “fitness” of the competing variants, resulting in language change; see Yang (2000) for details.

7 Or the measure of optimality, which is generally construed as expected payoff/penalty in behavioral studies, would have to be reconceptualized.

8 Due to the stochastic nature of approximation methods, there are some practical difficulties in assessing the performance of Bayesian models: How long should one allow the search to run? How close does the best solution (when the search terminates) approach the true global optimum? The complexity of Bayesian inference can easily overwhelm most research groups’ computing resources.
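The assessment problem can be made concrete with a toy example, entirely invented here rather than drawn from any model discussed in the text: a stochastic search over a multimodal score can halt at a local mode, and inspecting the final state does not reveal how far it is from the true global optimum.

```python
import math
import random

def score(h):
    """Toy multimodal score over integer hypotheses 0..99:
    a weaker local mode near h = 20 and the global mode at h = 80."""
    return (2.0 * math.exp(-(h - 20) ** 2 / 50)
            + 3.0 * math.exp(-(h - 80) ** 2 / 50))

def stochastic_search(steps, seed):
    """A crude hill-climbing random walk (a stand-in for sampling-based
    search): propose a neighboring hypothesis and keep it if the score
    does not decrease. The mode reached depends on the random start."""
    rng = random.Random(seed)
    h = rng.randrange(100)
    for _ in range(steps):
        proposal = min(99, max(0, h + rng.choice([-1, 1])))
        if score(proposal) >= score(h):
            h = proposal
    return h
```

Depending on the seed, the search ends at 20 rather than the global optimum 80, and no internal diagnostic distinguishes the two outcomes; real Bayesian inference over grammars faces the same problem on a vastly larger hypothesis space.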

9 It is logically possible, though I believe unlikely, that the superset hypothesis of attributive usage is already ruled out before the child has uttered a word. If so, then the onus is on the advocate of indirect negative evidence to demonstrate the reality of this very brief stage.

10 Presumably, there are more than two such typical adjectives, but here a cursory evaluation is sufficient.

11 Can the “a-adjective as PP” hypothesis developed in Yang (2015) be cast in terms of Bayesian inference? Absolutely: The Bayesian framework is extremely flexible. But doing so entails the acknowledgment that the Bayesian formulation of indirect negative evidence, which is its central appeal in the study of linguistic generalization, is no longer at play, and the Bayesian framework is superfluous.

12 Unsurprisingly, the verbs that allow the construction are considerably more frequent than those that do not. This accounts for the justified productivity of (17) when it is evaluated on a corpus of child-directed speech, which contains these relatively high-frequency verbs.
