Formal vs. Processing Approaches to Syntactic Phenomena

The need for quantitative methods in syntax and semantics research

Pages 88-124 | Received 14 Dec 2009, Accepted 06 Aug 2010, Published online: 27 Oct 2010

Abstract

The prevalent method in syntax and semantics research involves obtaining a judgement of the acceptability of a sentence/meaning pair, typically by just the author of the paper, sometimes with feedback from colleagues. This methodology does not allow proper testing of scientific hypotheses because of (a) the small number of experimental participants (typically one); (b) the small number of experimental stimuli (typically one); (c) cognitive biases on the part of the researcher and participants; and (d) the effect of the preceding context (e.g., other constructions the researcher may have been recently considering). In the current paper we respond to some arguments that have been given in support of continuing to use the traditional nonquantitative method in syntax/semantics research. One recent defence of the traditional method comes from Phillips (2009), who argues that no harm has come from the nonquantitative approach in syntax research thus far. Phillips argues that there are no cases in the literature where an incorrect intuitive judgement has become the basis for a widely accepted generalisation or an important theoretical claim. He therefore concludes that there is no reason to adopt more rigorous data collection standards. We challenge Phillips' conclusion by presenting three cases from the literature where a faulty intuition has led to incorrect generalisations and mistaken theorising, plausibly due to cognitive biases on the part of the researchers. Furthermore, we present additional arguments for rigorous data collection standards. For example, allowing lax data collection standards has the undesirable effect that the results and claims will often be ignored by researchers with stronger methodological standards. Finally, we observe that behavioural experiments are easier to conduct in English than ever before, with the advent of Amazon.com's Mechanical Turk, a marketplace interface that can be used for collecting behavioural data over the internet.

Acknowledgments

The research reported here was supported by the National Science Foundation under Grant No. 0844472, “Bayesian Cue Integration in Probability-Sensitive Language Processing”. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

We would like to thank Diogo Almeida, Leon Bergen, Joan Bresnan, David Caplan, Nick Chater, Morten Christiansen, Mike Frank, Adele Goldberg, Helen Goodluck, Greg Hickok, Ray Jackendoff, Nancy Kanwisher, Roger Levy, Maryellen MacDonald, James Myers, Colin Phillips, Steve Piantadosi, Steve Pinker, David Poeppel, Omer Preminger, Ian Roberts, Greg Scontras, Jon Sprouse, Carson Schütze, Mike Tanenhaus, Vince Walsh, Duane Watson, Eytan Zweig, members of TedLab, and three anonymous reviewers for their comments on an earlier draft of this paper. We would also like to thank Kristina Fedorenko for her help in constructing the materials for the experiment in Case Study 3.

Notes

1In a nonquantitative (single-participant/single-item) version of the acceptability judgement task, a 2- or 3-point scale is typically used, usually consisting of “good”/“natural”/“grammatical”/“acceptable” vs. “bad”/“unnatural”/“ungrammatical”/“unacceptable” (usually annotated with an asterisk “*” in the papers reporting such judgements), and sometimes including a judgement of “in between”/“questionable” (usually annotated with a question mark “?”). In a quantitative version of the acceptability judgement task (with multiple participants and items), a fixed scale (a “Likert scale”) with five or seven points is typically used. Alternatively, a geometric scale is used where the acceptability of each target sentence is compared to a reference sentence. This latter method is known as magnitude estimation (Bard, Robertson, & Sorace, Citation1996). Although some researchers have hypothesised that magnitude estimation allows detecting more fine-grained distinctions than Likert scales (Bard et al., 1996; Featherston, Citation2005, 2007; Keller, Citation2000), controlled experiments using both Likert scales and magnitude estimation suggest that the two methods are equally sensitive (Fedorenko & Gibson, 2010a; Fukuda, Michel, Beecher, & Goodall, Citation2010; Weskott & Fanselow, Citation2008, Citation2009).
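For concreteness, the following is a minimal sketch (in Python; not part of the original paper) of how Likert and magnitude-estimation responses might each be normalised within participants before comparing the two scales' sensitivity to a condition manipulation. The column names ("participant", "condition", "rating") and the condition labels are hypothetical placeholders.

```python
# A hypothetical analysis sketch: normalise each participant's ratings and run
# a rough sensitivity check on the condition contrast. Assumed data layout:
# one row per trial with columns "participant", "condition", "rating".
import numpy as np
import pandas as pd
from scipy import stats

def normalise_ratings(df, magnitude=False):
    """Z-score each participant's ratings; log-transform magnitude estimates first,
    since they are ratio-scaled relative to a reference sentence."""
    df = df.copy()
    if magnitude:
        df["rating"] = np.log(df["rating"])
    df["z"] = df.groupby("participant")["rating"].transform(
        lambda r: (r - r.mean()) / r.std(ddof=0)
    )
    return df

def condition_effect(df):
    """Paired t-test on by-participant condition means (assumed condition labels)."""
    means = df.groupby(["participant", "condition"])["z"].mean().unstack()
    return stats.ttest_rel(means["acceptable"], means["unacceptable"])
```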

2It should be noted that some researchers have criticised the acceptability judgement method because it requires participants to be aware of language as an object of evaluation, rather than simply as a means of communication (Edelman & Christiansen, Citation2003). Whereas this concern is worth considering with respect to the research questions that are being evaluated, one should also consider the strengths of the acceptability judgement method: (1) it is an extremely simple and efficient task; and (2) results from acceptability judgement experiments are highly systematic across speakers and correlate with other dependent measures, presumably because the same factors affect participants’ responses across different measures (Schütze, Citation1996; Sorace & Keller, Citation2005).

3In analysing quantitative data, it is important to examine the distributions of individual responses in order to determine whether further analyses may be necessary, in cases where the population is not sufficiently homogeneous with respect to the phenomena in question. A wide range of analysis techniques are available for not only characterising the population as a whole, but also for detecting stable sub-populations within the larger population or for characterising stable individual differences (e.g., Gibson et al., Citation2009).
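As an illustration only, the sketch below shows one hypothetical way to look for stable sub-populations of the kind this note describes: cluster participants by their per-condition mean (normalised) ratings and inspect the resulting groups. The data layout and the choice of a Gaussian mixture model are assumptions for illustration, not the analyses used in the cited work.

```python
# Hypothetical sub-population check: group participants by their per-condition
# judgement profiles. Assumes a data frame with columns "participant",
# "condition", and a normalised rating "z" (as in the earlier sketch).
from sklearn.mixture import GaussianMixture

def find_subgroups(df, n_groups=2):
    # One row per participant, one column per condition (mean normalised rating).
    profiles = df.groupby(["participant", "condition"])["z"].mean().unstack()
    # Fit a mixture model and label each participant with a latent group.
    gm = GaussianMixture(n_components=n_groups, random_state=0).fit(profiles)
    profiles["group"] = gm.predict(profiles.drop(columns=[], errors="ignore"))
    return profiles  # inspect group sizes and per-group condition patterns
```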

4In fact, some methods in cognitive science and cognitive neuroscience were specifically developed to get at representational questions (e.g., lexical/syntactic priming methods, neural adaptation or multi-voxel pattern analyses in functional MRI).

5See http://www.talkingbrains.org/2010/06/weak-quantitative-standards-in.html for a recent presentation and discussion of some of these arguments.

6Colin Phillips and some of his former students/postdocs have commented to us that, in their experience, quantitative acceptability judgement studies almost always validate the claim(s) in the literature. This is not our experience, however. Most experiments that we have run to test some syntactic/semantic hypothesis in the literature end up providing us with a pattern of data that had not been known before the experiment (e.g., Breen, Fedorenko, Wagner, & Gibson, 2010; Fedorenko & Gibson, 2010a; Patel et al., 2009; Scontras & Gibson, 2010).

7Note that there is a tension between this claim and the Chomskyan idea that there is a universal language faculty possessed by all native speakers of a language, including naïve subjects. Moreover, as discussed above, the need to ignore irrelevant features of examples should be eliminated by good experimental design: the experimenter reduces the possibility of confounding influences by controlling theoretically irrelevant variables in the materials to be compared.

8These two sentences are of course not a minimal pair, because of several uncontrolled differences between the items, including (a) the wh-question in (11) is a matrix wh-question, while the wh-question in (12) is an embedded wh-question; and (b) the lexical items in the two sentences aren't the same (“I” vs. “you”). These differences were controlled in the experimental comparison reported below.

9An anonymous reviewer has noted that “the importance of vacuous movement has plummeted greatly since Barriers days, and thus that the judgements in question really don't matter all that much”. Although this may be true, this is certainly a case that, in the words of Phillips (2009), “adversely affected the progress of the field of syntax”. In particular, Chomsky's writings have a much greater impact than other syntacticians’ writings, so any errors in his work are exacerbated in the field for years to come. To give a specific example, the first author of this paper (Edward Gibson) began to work in the field of syntax in the late 1980s, but was so disenchanted by several of the judgements in Chomsky (1986) that he shifted his research focus to a different topic within the area of language research.

For summaries of issues relevant to basic experimental design for language research, see e.g., Ferreira (2005) and Myers (2009).
