Research Article

A Zipfian Approach to Words in Contexts: The Cases of Modern English and Chinese

 

ABSTRACT

The system-level complexity of language has been thoroughly investigated in terms of Zipf’s law, whose quantitative features have proved to reflect text and language typology. This study extends the scope of Zipf’s law from the macroscopic scale of language to specific words in contexts, with the aim of examining its potential as an indicator of word typology. The focus is confined to high-frequency words in English and Chinese as found in the FLOB and LCMC corpora. For the words in question, the log–log rank-frequency distributions of their contextual words generally follow the linear function y = ax + b. Moreover, an adjusted version of the parameter a helps to distinguish the word classes of these words. The contextual information reflected by this Zipf-based index might be more important to the emergence of word classes in Chinese, which has no real inflection to serve as a word-class indicator. From a Zipfian perspective, the findings lend preliminary support to Saussure’s systems thinking regarding linguistic signs. They may also contribute to fields such as usage-based linguistics.
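To make the linear relation in the abstract concrete, the following is a minimal sketch of fitting y = ax + b to a log–log rank-frequency distribution, where x is the log of a contextual word's rank and y is the log of its frequency. It is an illustration under stated assumptions, not the study's procedure: the use of ordinary least squares, the base-10 logarithm, the function names `rank_frequency` and `fit_loglog`, and the toy counts are all hypothetical. The adjustment applied to parameter a is not described in this text and is therefore not reproduced.

```python
import math
from collections import Counter


def rank_frequency(counts):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_loglog(pairs):
    """Least-squares fit of log10(frequency) = a * log10(rank) + b.

    Assumption: ordinary least squares on base-10 logs; the text does not
    state which fitting method or log base was used.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two (rank, frequency) pairs")
    xs = [math.log10(rank) for rank, _ in pairs]
    ys = [math.log10(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


# Hypothetical contextual-word counts for one word; the values follow
# frequency = 1200 / rank, so the fitted slope should be very close to -1.
toy_counts = Counter({"w1": 1200, "w2": 600, "w3": 400, "w4": 300})
a, b = fit_loglog(rank_frequency(toy_counts))
print(f"a = {a:.3f}, b = {b:.3f}")
```

The choice of log base changes the intercept b but not the slope a, so a comparison of slopes across words would not depend on it.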

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1. Modal verbs were not lemmatized except for their contracted forms (i.e. ’ll -> will, ’d -> would). Personal pronouns were lemmatized only to remove the nominative-accusative distinction in word form (e.g. me -> I).

2. A word separated from the word in question by a punctuation mark does not count as a co-occurring word.

3. The fitting failed for the English lemma "terms", which is tagged as II32 and always occurs in the phrase "in terms of"; it was therefore excluded from the subsequent analysis.

5. * represents a string of any length.

7. The auxiliaries in Chinese are rather different from auxiliaries in English. See Section 3.5 for more details.
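Notes 1 and 2 together describe how contextual words are counted. The sketch below illustrates one way such counting could work; it is not the study's code. The window size, the function names, the tokenization, and the ASCII-only punctuation test are assumptions, and the lemma map contains only the three examples given in Note 1.

```python
import string
from collections import Counter

# Only the mappings given as examples in Note 1; the full table is not in the text.
LEMMA_MAP = {"'ll": "will", "'d": "would", "me": "I"}
PUNCTUATION = set(string.punctuation)  # assumption: ASCII punctuation only


def normalize(token):
    """Map a token to its lemma where Note 1 gives one, else leave it unchanged."""
    return LEMMA_MAP.get(token, token)


def is_punct(token):
    """True if the token consists solely of punctuation characters."""
    return bool(token) and all(ch in PUNCTUATION for ch in token)


def contextual_counts(tokens, target, window=2):
    """Count words co-occurring with `target` within `window` tokens on each side.

    Following Note 2, scanning in a direction stops at the first punctuation
    mark, so words beyond it are not counted. The window size is hypothetical.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if normalize(tok) != target:
            continue
        for step in (-1, 1):  # scan left, then right
            for dist in range(1, window + 1):
                j = i + step * dist
                if j < 0 or j >= len(tokens) or is_punct(tokens[j]):
                    break
                counts[normalize(tokens[j])] += 1
    return counts


tokens = ["she", "'ll", "see", "me", ",", "and", "I", "will", "wait", "."]
counts = contextual_counts(tokens, "will", window=2)
print(counts)
```

A real pipeline for the Chinese data would need a punctuation test that also covers full-width marks, which `string.punctuation` does not include.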

Additional information

Funding

This work was supported by the Youth Project of the National Social Science Fund of China under Grant 18CYY005.
