Regular Articles

The relational processing limits of classic and contemporary neural network models of language processing

Pages 240-254 | Received 25 Oct 2019, Accepted 03 Sep 2020, Published online: 21 Sep 2020
 

ABSTRACT

Whether neural networks can capture relational knowledge is a matter of long-standing controversy. Recently, some researchers have argued that (1) classic connectionist models can handle relational structure and (2) the success of deep learning approaches to natural language processing suggests that structured representations are unnecessary to model human language. We tested the Story Gestalt model, a classic connectionist model of text comprehension, and a Sequence-to-Sequence with Attention model, a modern deep learning architecture for natural language processing. Both models were trained to answer questions about stories based on abstract thematic roles. Two simulations varied the statistical structure of new stories while keeping their relational structure intact. The performance of each model fell below chance under at least one manipulation. We argue that both models fail our tests because they cannot perform dynamic binding. These results cast doubt on the suitability of traditional neural networks for explaining relational reasoning and language processing phenomena.
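To make the second architecture concrete: the attention mechanism in a Sequence-to-Sequence model scores each encoder state against the current decoder state and returns a weighted sum (the context vector). The following is a minimal, generic sketch of dot-product attention in NumPy; it is an illustration of the mechanism in general, not the authors' implementation, and the toy basis-vector states are an assumption made for clarity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    current decoder state, normalise the scores into weights, and
    return the weighted sum of encoder states (the context vector)."""
    scores = encoder_states @ decoder_state   # one score per input token
    weights = softmax(scores)                 # normalised attention weights
    context = weights @ encoder_states        # context vector
    return context, weights

# Toy example: 5 input tokens, each encoded as a distinct basis vector
encoder_states = np.eye(5, 8)
decoder_state = encoder_states[2]             # decoder "asks about" token 2
context, weights = attention(decoder_state, encoder_states)
print(weights.argmax())                       # attends most to token 2
```

In a trained model the encoder states and decoder state are learned continuous vectors rather than basis vectors, but the scoring-and-weighting step is the same.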

Acknowledgments

The work of Guillermo Puebla was supported by the PhD Scholarship Program of CONICYT, Chile. Andrea E. Martin was supported by the Max Planck Research Group “Language and Computation in Neural Systems” and by the Netherlands Organization for Scientific Research (Grant 016.Vidi.188.029). We thank Hugh Rabagliati for his comments on earlier versions of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Although these concepts were never used in the context of each specific script, they were seen in the training dataset as a whole. By definition, the output of any traditional neural network to a completely new (unseen) concept depends on its initial weights. Given that these weights are initialised randomly, the behaviour of a neural network regarding an unseen input will be essentially random (Marcus, 1998).
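The claim in note 1 can be illustrated directly: for a one-hot input unit that never carries a training signal, the connected weights remain at their random initial values, so the network's response is set by the random seed rather than by the data. A minimal sketch (a generic one-hidden-layer network, assumed purely for illustration):

```python
import numpy as np

def forward(x, W1, W2):
    # One hidden layer with tanh, linear readout
    return np.tanh(x @ W1) @ W2

# One-hot input standing in for a concept never seen during training
x_novel = np.zeros(10)
x_novel[7] = 1.0

outputs = []
for seed in (0, 1, 2):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(10, 16))
    W2 = rng.normal(scale=0.5, size=(16, 1))
    outputs.append(float(forward(x_novel, W1, W2)[0]))

# The three outputs differ: the response to the unseen input is
# determined by the random initialisation, not by any training data
print(outputs)
```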

2 We actually ran that simulation and, unsurprisingly, obtained perfect “generalisation”.

Additional information

Funding

The work of Guillermo Puebla was supported by the PhD Scholarship Program of CONICYT, Chile. Andrea E. Martin was supported by the Max Planck Research Group “Language and Computation in Neural Systems” and by the Netherlands Organization for Scientific Research (Grant 016.Vidi.188.029).

