Journal of Quantitative Linguistics Volume 29, 2022 - Issue 1

Research Article

Lexical Richness and Text Length: An Entropy-based Perspective

Yaqian Shi, School of Foreign Languages, Huazhong University of Science and Technology, Wuhan, People’s Republic of China

ORCID: https://orcid.org/0000-0003-4958-2286

Lei Lei (corresponding author), School of Foreign Languages, Huazhong University of Science and Technology, Wuhan, People’s Republic of China

ORCID: https://orcid.org/0000-0002-3366-1855

Pages 62-79 | Published online: 10 Jun 2020

https://doi.org/10.1080/09296174.2020.1766346

References

Aminikhanghahi, S., & Cook, D. J. (2017). A survey of methods for time series change point detection. Knowledge and Information Systems, 51(2), 339–367. https://doi.org/10.1007/s10115-016-0987-z
Bentz, C., Alikaniotis, D., Cysouw, M., & Ferrer-i-Cancho, R. (2017). The entropy of words — Learnability and expressivity across more than 1000 languages. Entropy, 19(6), 275. https://doi.org/10.3390/e19060275
Čech, R. (2015). Text length and the lambda frequency structure of a text. In G. K. Mikros & J. Mačutek (Eds.), Sequences in language and text (pp. 73–87). Walter de Gruyter GmbH.
Chen, R., Liu, H., & Altmann, G. (2017). Entropy in different text types. Digital Scholarship in the Humanities, 32(3), 528–542. https://doi.org/10.1093/llc/fqw008
Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type–token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100. https://doi.org/10.1080/09296171003643098
Daller, H., Milton, J., & Treffers-Daller, J. (Eds.). (2007). Modelling and assessing vocabulary knowledge. Cambridge University Press.
Daller, H., Van Hout, R., & Treffers-Daller, J. (2003). Lexical richness in the spontaneous speech of bilinguals. Applied Linguistics, 24(2), 197–222. https://doi.org/10.1093/applin/24.2.197
Fan, F., Yang, Y., & Wang, Y. (2016). The probability distribution of textual vocabulary in the English language. Journal of Quantitative Linguistics, 23(1), 49–70. https://doi.org/10.1080/09296174.2015.1071149
Grabchak, M., Zhang, Z., & Zhang, D. T. (2013). Authorship attribution using entropy. Journal of Quantitative Linguistics, 20(4), 301–313. https://doi.org/10.1080/09296174.2013.830551
Guiraud, P. (1954). Les Caractères Statistiques du Vocabulaire. Presses Universitaires de France.
Hausser, J., & Strimmer, K. (2009). Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. Journal of Machine Learning Research, 10(50), 1469–1484. http://jmlr.csail.mit.edu/papers/volume10/hausser09a/hausser09a.pdf
Heaps, H. S. (1978). Information retrieval: Computational and theoretical aspects. Academic Press.
Herdan, G. (1960). Type-token mathematics: A textbook of mathematical linguistics. Mouton.
Herdan, G. (1966). The advanced theory of language as choice and chance. Springer.
Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: A developmental perspective. Lund Working Papers in Linguistics, 53, 61–79.
Johnson, W. (1944). Studies in language behaviour: A program of research. Psychological Monographs, 56(2), 1–15. https://doi.org/10.1037/h0093508
Juola, P. (2008). Assessing linguistic complexity. In M. Miestamo, K. Sinnemäki, & F. Karlsson (Eds.), Language complexity: Typology, contact, change (pp. 89–108). John Benjamins Press.
Juola, P. (2013). Using the Google N-Gram corpus to measure cultural complexity. Literary and Linguistic Computing, 28(4), 668–675. https://doi.org/10.1093/llc/fqt017
Koplenig, A., Wolfer, S., & Müller-Spitzer, C. (2019). Studying lexical dynamics and language change via generalized entropies: The problem of sample size. Entropy, 21(5), 464. https://doi.org/10.3390/e21050464
Kubát, M. (2014). Moving window type-token ratio and text length. In G. Altmann, R. Čech, J. Mačutek, & L. Uhlířová (Eds.), Empirical approaches to language and text analysis (pp. 105–113). RAM-Verlag.
Kubát, M., Mačutek, J., & Čech, R. (2020). Communists spoke differently: An analysis of Czechoslovak and Czech annual presidential speeches. Digital Scholarship in the Humanities. Retrieved March 4, 2020, from https://doi.org/10.1093/llc/fqz089
Kubát, M., & Milička, J. (2013). Vocabulary richness measure in genres. Journal of Quantitative Linguistics, 20(4), 339–349. https://doi.org/10.1080/09296174.2013.830552
Liu, Z. (2016). A diachronic study on British and Chinese cultural complexity with Google Books N-grams. Journal of Quantitative Linguistics, 23(4), 361–373. https://doi.org/10.1080/09296174.2016.1226431
Lozano, A., Casas, B., Bentz, C., & Ferrer-i-Cancho, R. (2016). Fast calculation of entropy with Zhang’s estimator. In E. Kelih, R. Knight, J. Mačutek, & A. Wilson (Eds.), Issues in quantitative linguistics 4 (pp. 273–285). RAM-Verlag. Dedicated to Reinhard Köhler on the occasion of his 65th birthday. No. 23 of the series “Studies in Quantitative Linguistics”. Retrieved July 26, 2019, from https://arxiv.org/abs/1707.08290
Lu, X. (2012). The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal, 96(2), 190–208. https://doi.org/10.1111/j.1540-4781.2011.01232_1.x
Malvern, D., & Richards, B. (1997). A new measure of lexical diversity. In A. Ryan & A. Wray (Eds.), Evolving models of language: Papers from the annual meeting of the British Association for Applied Linguistics held at the University of Wales, Swansea, September 1996. British studies in applied linguistics (pp. 58–71). Multilingual Matters.
Malvern, D., & Richards, B. (2013). Measures of lexical richness. In C. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 3968–3972). Blackwell Publishing Ltd.
Mason, O. (2000). Parameters of collocation: The word in the centre of gravity. In J. Kirk (Ed.), Corpora galore: Analyses and techniques in describing English (pp. 267–280). Papers from the Nineteenth International Conference on English Language Research on Computerised Corpora (ICAME 1998). Amsterdam–Atlanta: Rodopi.
McCarthy, P. M., & Jarvis, S. (2007). Vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488. https://doi.org/10.1177/0265532207080767
McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15(3), 323–338. https://doi.org/10.1093/llc/15.3.323
Miranda-García, A., & Calle-Martín, J. (2005). The validity of lemma-based lexical richness in authorship attribution: A proposal for the Old English gospels. ICAME Journal, 29, 115–129. http://korpus.uib.no/icame/ij29/ij29-page115-130.pdf
Nemenman, I., Shafee, F., & Bialek, W. (2002). Entropy and inference, revisited. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems 14 (pp. 472–478). MIT Press.
Popescu, I.-I., Čech, R., & Altmann, G. (2011). The lambda-structure of texts. RAM-Verlag.
Popescu, I.-I., Mačutek, J., & Altmann, G. (2008). Word frequency and arc length. Glottometrics, 17, 18–42. https://www.ram-verlag.eu/wp-content/uploads/2018/08/g17zeit.pdf
Popescu, I.-I., Mačutek, J., & Altmann, G. (2010). Word forms, style and typology. Glottotheory, 3(1), 89–96. https://doi.org/10.1515/glot-2010-0006
Popescu, I.-I., Mačutek, J., Kelih, E., Čech, R., Best, K.-H., & Altmann, G. (2010). Vectors and codes of text. RAM-Verlag.
Popescu, I.-I., Vidya, M. N., Uhlířová, L., Pustet, R., Mehler, A., Mačutek, J., Krupa, V., Köhler, R., Jayaram, B. D., Grzybek, P., & Altmann, G. (2009). Word frequency studies. Mouton de Gruyter.
Rajput, N. K., Ahuja, B., & Riyal, M. K. (2018). A novel approach towards deriving vocabulary quotient. Digital Scholarship in the Humanities, 33(4), 894–901. https://doi.org/10.1093/llc/fqy014
Read, J. (2000). Assessing vocabulary. Cambridge University Press.
Richards, B. (1987). Type/token ratios: What do they really tell us? Journal of Child Language, 14(2), 201–209. https://doi.org/10.1017/S0305000900012885
Sadeghi, K., & Dilmaghani, S. K. (2013). The relationship between lexical diversity and genre in Iranian EFL learners’ writings. Journal of Language Teaching and Research, 4(2), 328–334. https://doi.org/10.4304/jltr.4.2.328-334
Shah, S. K., Gill, A. A., Mahmood, R., & Bilal, M. (2013). Lexical richness, a reliable measure of intermediate L2 learners’ current status of acquisition of English language. Journal of Education and Practice, 4(6), 42–47. https://www.iiste.org/Journals/index.php/JEP/article/view/4811/4890
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30(1), 50–64. https://doi.org/10.1002/j.1538-7305.1951.tb01366.x
Smith, J. A., & Kelly, C. (2002). Stylistic constancy and change across literary corpora: Using measures of lexical richness to date works. Computers and the Humanities, 36(4), 411–430. https://doi.org/10.1023/A:1020201615753
Somers, H. H. (1966). Statistical methods in literary analysis. In J. Leeds (Ed.), The computer and literary style (pp. 128–140). Kent State University Press.
Štajner, S., & Zampieri, M. (2013). Stylistic changes for temporal text classification. In I. Habernal & V. Matousek (Eds.), Text, speech, and dialogue: 16th international conference, TSD 2013, Pilsen, Czech Republic, September 2013 proceedings. Berlin: Springer-Verlag.
Treffers-Daller, J., Parslow, P., & Williams, S. (2018). Back to basics: How measures of lexical diversity can help discriminate between CEFR levels. Applied Linguistics, 39(3), 302–327. https://doi.org/10.1093/applin/amw009
Wang, Y., & Liu, H. (2018). Is Trump always rambling like a fourth-grade student? An analysis of stylistic features of Donald Trump’s political discourse during the 2016 election. Discourse & Society, 29(3), 299–323. https://doi.org/10.1177/0957926517734659
Yule, G. U. (1944). The statistical study of literary vocabulary. Cambridge University Press.
Zhang, Y. (2014). A corpus-based analysis of lexical richness of Beijing Mandarin speakers: Variable identification and model construction. Language Sciences, 44, 60–69. https://doi.org/10.1016/j.langsci.2013.12.003
Zhang, Y. (2015). Entropic evolution of lexical richness of homogeneous texts over time: A dynamic complexity perspective. Journal of Language Modelling, 3(2), 569–599. https://doi.org/10.15398/jlm.v3i2.111
Zhang, Z. (2012). Entropy estimation in Turing’s perspective. Neural Computation, 24(5), 1368–1389. https://doi.org/10.1162/NECO_a_00266
Zhu, H., & Lei, L. (2018a). British cultural complexity: An entropy-based approach. Journal of Quantitative Linguistics, 25(2), 190–205. https://doi.org/10.1080/09296174.2017.1348014
Zhu, H., & Lei, L. (2018b). Is modern English becoming less inflectionally diversified? Evidence from entropy-based algorithm. Lingua, 216, 10–27. https://doi.org/10.1016/j.lingua.2018.10.006
