Original Articles

A cross-linguistic quantitative study of homophony*

Pages 129-159 | Published online: 16 Feb 2007
 

Abstract

Homophony is ubiquitous across languages. It is an important source of ambiguity, which is a distinctive feature of human language. There have, however, been few quantitative investigations of questions such as “Do languages have similar degrees of homophony?” and “Can the degree of homophony in a language be predicted?” We report a preliminary attempt to answer these questions. We measure the degree of homophony in two sets of languages, one comprising twenty Chinese dialects and the other three Germanic languages. We find a strong correlation between the degree of homophony and the number of occurring syllable types (which can be taken as an estimate of the size of a language's phonological resources), or the number of monosyllabic words in the lexicon. Furthermore, the distributional properties of homophony reflect some self-organizing characteristics of language as a system, as illustrated by two pieces of evidence. The first is the correlation between the degree of homophony and the degree of disyllabification in Chinese dialects. The second is the observation, from some languages, that homophone pairs tend to fall in different grammatical classes, suggesting that language self-organizes in a way that reduces the chance of ambiguity.

Acknowledgements

I would like to thank Professors William S.-Y. Wang and Chin-Chuan Cheng, and the members of the former Language Engineering Laboratory of City University of Hong Kong, for their helpful discussions. Also, I am thankful to Volker Dollun, Dinoj Surendran, Lolke Van-Der-Veen and Feng Wang for their help in this study. Special thanks are due to Dr Christophe Coupé and the support of Laboratoire Dynamique du Langage, Institut des Sciences de l'Homme in Lyon, France.

Notes

1. In this paper, the pronunciations of Chinese characters are given in pinyin spelling, with the tone following the syllable.

2. We note that there are still words which have different spellings but actually come from the same origin, such as “check” and “cheque” in English.

3. Among the 20 dialects, the data for 17 are from Han4yu3 Fang1yin1 Zi4hui4 (“A collection of Character Pronunciation in Chinese Dialects”, abbreviated as Zihui 1989).

4. Japanese and Korean have had extensive contact with Chinese, and both languages contain many Chinese loanwords. In Japanese there are two main layers of borrowings, called Kan-on and Go-on readings respectively.

5. We found some duplicate entries in the lexicons for the three languages, so we cleaned the lexicons to remove the repeated items.

6. A word-form lexicon includes words like “walk”, “walked” and “walking” as individual items, while a lemma lexicon excludes inflected word forms; in the above case, only “walk” is included, but not “walked” or “walking”. The ratios between the number of word forms and the number of lemmata give a rough idea of how the three languages differ in the complexity of their inflectional morphology. The average number of word forms per lemma is much higher in German (321081/51728 = 6.2) than in Dutch (313270/122400 = 2.6) or English (77031/41535 = 1.9); that is, German words have more inflectional forms on average.
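The ratios quoted in note 6 can be reproduced with a few lines of Python. This is only an arithmetic check on the counts given in the note; the names `LEXICON_SIZES` and `forms_per_lemma` are introduced here for illustration and do not come from the paper.

```python
# (word forms, lemmata) per language, as quoted in note 6.
LEXICON_SIZES = {
    "German": (321081, 51728),
    "Dutch": (313270, 122400),
    "English": (77031, 41535),
}


def forms_per_lemma(word_forms: int, lemmata: int) -> float:
    """Average number of word forms per lemma."""
    return word_forms / lemmata


# Round to one decimal place, matching the precision used in the note.
ratios = {
    language: round(forms_per_lemma(*sizes), 1)
    for language, sizes in LEXICON_SIZES.items()
}

for language, ratio in ratios.items():
    print(f"{language}: {ratio}")
```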

7. We note that this way of searching the whole word list for homophones still depends on the size of the lexicon, but restricting the first word of each pair to the 5000 most frequent words at least provides some comparability, as pairs in which both members are infrequent are excluded.
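One possible reading of the search described in note 7 is sketched below in Python. It is a minimal illustration under stated assumptions, not the procedure used in the paper: the lexicon is assumed to be a list of (spelling, pronunciation) tuples sorted from most to least frequent, and the function name `homophone_pairs`, the parameter `top_n`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def homophone_pairs(lexicon, top_n=5000):
    """Find homophone pairs with at least one member among the top_n words.

    lexicon: list of (spelling, pronunciation) tuples, most frequent first
    (an assumed input format, not one specified in the note).
    """
    # Index every word in the whole lexicon by its pronunciation.
    by_pronunciation = defaultdict(list)
    for rank, (spelling, pronunciation) in enumerate(lexicon):
        by_pronunciation[pronunciation].append((rank, spelling))

    pairs = set()
    # Only frequent words may serve as the first member of a pair.
    for rank, (spelling, pronunciation) in enumerate(lexicon[:top_n]):
        for other_rank, other_spelling in by_pronunciation[pronunciation]:
            if other_rank == rank or other_spelling == spelling:
                continue  # skip the word itself and duplicate entries
            # Order each pair by frequency rank so it is counted once.
            first, second = sorted([(rank, spelling), (other_rank, other_spelling)])
            pairs.add((first[1], second[1]))
    return pairs


# Toy lexicon with made-up pronunciations, most frequent word first.
toy_lexicon = [
    ("to", "tu"), ("two", "tu"), ("see", "si"), ("sea", "si"),
    ("too", "tu"), ("cite", "sait"), ("site", "sait"),
]
toy_pairs = homophone_pairs(toy_lexicon, top_n=3)
print(sorted(toy_pairs))
```

With `top_n=3`, the pair “cite”/“site” is dropped because neither word is among the three most frequent entries, which mirrors the exclusion of pairs whose members are both infrequent.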

8. English shows phenomena similar to disyllabification in Chinese. For example, in some areas of the United States, a sound change has merged [ɛ] and [ɪ], which results in pairs of homophones such as “pen” and “pin”. These two words are often disambiguated by adding a modifier, as in “ink pen” and “stick pin”, to eliminate possible confusion. Also, to differentiate the second person plural pronoun from the second person singular, the expression “you all” is often used to indicate the plural meaning. These examples show how ambiguity avoidance leads to fixed collocations of individual words. Though these collocations have not so far become lexical items, they may become lexicalized later.
