Regular articles

The role of corpus size and syntax in deriving lexico-semantic representations for a wide range of concepts

Pages 1643-1664 | Received 11 Feb 2014, Accepted 28 Oct 2014, Published online: 26 Feb 2015
 

Abstract

One of the most significant recent advances in the study of semantic processing is the advent of models based on text and other corpora. In this study, we address what impact both the quantitative and qualitative properties of corpora have on mental representations derived from them. More precisely, we evaluate models with different linguistic and mental constraints on their ability to predict semantic relatedness between items from a vast range of domains and categories. We find that a model based on syntactic dependency relations captures significantly less of the variability for all kinds of words, regardless of the semantic relation between them or their abstractness. The largest difference was found for concrete nouns, which are commonly used to assess semantic processing. For both models we find that limited amounts of data suffice in order to obtain reliable predictions. Together, these findings suggest new constraints for the construction of mental models from corpora, both in terms of the corpus size and in terms of the linguistic properties that contribute to mental representations.

Notes

1. The study is ongoing at http://www.smallworldofwords.com/nl and currently contains data for over 16,000 cue words.

2. In past studies we have applied -score weighting, as this consistently improved the estimates of the word-association measures over a range of tasks. In this study, PMI was nevertheless chosen to increase comparability with the text model, where PMI is applied as standard.

3. The 95% confidence intervals used here encompass a significance test but also provide an estimate of the magnitude of the effect. If zero is included in the confidence interval, the result will not reach the 5% significance level.
