The Quarterly Journal of Experimental Psychology Volume 68, 2015 - Issue 8: Megastudies, Crowdsourcing, and Large Datasets in Psycholinguistics

Regular articles

How useful are corpus-based methods for extrapolating psycholinguistic variables?

Paweł Mandera, Department of Experimental Psychology, Ghent University, Ghent, Belgium (corresponding author)

Emmanuel Keuleers, Department of Experimental Psychology, Ghent University, Ghent, Belgium

Marc Brysbaert, Department of Experimental Psychology, Ghent University, Ghent, Belgium

Pages 1623-1642 | Received 10 May 2014, Accepted 24 Oct 2014, Published online: 19 Feb 2015

https://doi.org/10.1080/17470218.2014.988735

REFERENCES

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459. doi: 10.3758/BF03193014
Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Vol. 1). Retrieved from http://clic.cimec.unitn.it/marco/publications/acl2014/baroni-etal-countpredict-acl2014.pdf
Bestgen, Y. (2002). Détermination de la valence affective de termes dans de grands corpus de textes [Determination of the emotional valence of terms in large corpora]. In Y. Toussaint & C. Nedellec (Eds.), Actes du Colloque International sur la Fouille de Texte CIFT ’02 (pp. 81–94). Nancy, France: INRIA.
Bestgen, Y., & Vincze, N. (2012). Checking and bootstrapping lexical norms by means of word similarity indexes. Behavior Research Methods, 44(4), 998–1006. doi:10.3758/s13428-012-0195-z
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Bradley, M., & Lang, P. (1999). Affective norms for English words (ANEW): Stimuli, instruction manual, and affective ratings (Technical Report No. C-1). Gainesville, FL: University of Florida, NIMH Center for Research in Psychophysiology.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi: 10.1023/A:1010933404324
Broder, A. Z. (1997). On the resemblance and containment of documents. Proceedings of Compression and Complexity of Sequences 1997 (pp. 21–29). IEEE. Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=666900
Brysbaert, M., & Ghyselinck, M. (2006). The effect of age of acquisition: Partly frequency related, partly frequency independent. Visual Cognition, 13(7–8), 992–1011. doi:10.1080/13506280544000165
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. doi:10.3758/BRM.41.4.977
Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3), 904–911. doi:10.3758/s13428-013-0403-5
Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497–505. doi:10.1080/14640748108400805
Feng, S., Cai, Z., Crossley, S., & McNamara, D. S. (2011). Simulating Human Ratings on Word Concreteness. Twenty-Fourth International FLAIRS Conference. Retrieved from http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS11/paper/viewPDFInterstitial/2644/3035
Fix, E., & Hodges, J. (1951). Discriminatory analysis, nonparametric discrimination: Consistency properties. US Air Force School of Aviation Medicine, Technical Report, 4(3).
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741. doi:10.1109/TPAMI.1984.4767596
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1944 words. Behavior Research Methods & Instrumentation, 12(4), 395–427. doi:10.3758/BF03201693
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2), 183–233. doi: 10.1023/A:1007665907178
Keuleers, E., Brysbaert, M., & New, B. (2010). SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods, 42(3), 643–650. doi:10.3758/BRM.42.3.643
Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287–304. doi:10.3758/s13428-011-0118-4
Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978–990. doi:10.3758/s13428-012-0210-4
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240. doi:10.1037/0033-295X.104.2.211
Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203–208. doi: 10.3758/BF03204766
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs]. Retrieved from http://arxiv.org/abs/1301.3781
Mikolov, T., Le, Q. V., & Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [cs]. Retrieved from http://arxiv.org/abs/1309.4168
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). WordNet: An on-line lexical database. International Journal of Lexicography, 3, 235–244. doi: 10.1093/ijl/3.4.235
Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods, 41(3), 647–656. doi:10.3758/BRM.41.3.647
Recchia, G., & Louwerse, M. M. (2014). Reproducing affective norms with lexical co-occurrence statistics: Predicting valence, arousal, and dominance. The Quarterly Journal of Experimental Psychology, 1–45. doi:10.1080/17470218.2014.941296
Rijsbergen, C. J. V. (1979). Information retrieval. London: Butterworths.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. doi:10.1038/323533a0
Sahlgren, M. (2006). The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces (Doctoral dissertation, Stockholm University). Retrieved from http://eprints.sics.se/437/1/TheWordSpaceModel.pdf
Shaoul, C., & Westbury, C. (2006). Word frequency effects in high-dimensional co-occurrence models: A new approach. Behavior Research Methods, 38(2), 190–195. doi:10.3758/BF03192768
Shaoul, C., & Westbury, C. (2010). Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods, 42(2), 393–413. doi:10.3758/BRM.42.2.393
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.
Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1 (pp. 173–180). Association for Computational Linguistics.
Toutanova, K., & Manning, C. D. (2000). Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics-Volume 13 (pp. 63–70). Association for Computational Linguistics.
Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191–1207. doi:10.3758/s13428-012-0314-x
Westbury, C. (2013). You Can't Drink a Word: Lexical and Individual Emotionality Affect Subjective Familiarity Judgments. Journal of Psycholinguistic Research, 43(5), 631–649. doi:10.1007/s10936-013-9266-2
Westbury, C. F., Shaoul, C., Hollis, G., Smithson, L., Briesemeister, B. B., Hofmann, M. J., & Jacobs, A. M. (2013). Now you see it, now you don't: on emotion, context, and the algorithmic prediction of human imageability judgments. Frontiers in Psychology, 4. doi:10.3389/fpsyg.2013.00991
