Abstract
A bilingual dictionary is a valuable linguistic resource that records, among other things, how differently the two languages segment semantic space, and hence how difficult it is to produce faithful translations between them. A statistical analysis of nearly a hundred dictionaries has allowed us to determine how best to measure the semantic distance between two languages from a bilingual dictionary.
We found that the distribution of the number of words in language A having n translations in language B, for n = 1, 2, 3, and so on, has a characteristic shape that depends on the semantic distance between the two languages. A sample of only a thousand words was sufficient to estimate this semantic distance.
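The abstract does not give the distance formula itself, so the following is only a minimal sketch of the first step it describes: tabulating how many headwords of language A have exactly n translations in language B, optionally from a random sample of headwords. The in-memory dictionary format (headword mapped to a list of translations), the function name `translation_count_distribution`, and the toy French-English entries are all hypothetical illustrations, not data or code from the study.

```python
import random
from collections import Counter


def translation_count_distribution(dictionary, sample_size=None, seed=0):
    """Return {n: number of headwords with exactly n distinct translations}.

    `dictionary` is a hypothetical format: it maps each headword of
    language A to a list of its translations in language B.
    If `sample_size` is given, only a random sample of headwords is used,
    e.g. about a thousand words as mentioned in the abstract.
    """
    headwords = sorted(dictionary)  # sort so sampling is reproducible for a given seed
    if sample_size is not None and sample_size < len(headwords):
        headwords = random.Random(seed).sample(headwords, sample_size)
    # Count distinct translations per headword, then count headwords per n.
    counts = Counter(len(set(dictionary[w])) for w in headwords)
    return dict(sorted(counts.items()))


# Toy, illustrative entries only (not from any dictionary in the study).
toy = {
    "chat": ["cat"],
    "temps": ["time", "weather", "tense"],
    "maison": ["house", "home"],
    "chien": ["dog"],
    "voler": ["fly", "steal"],
}
print(translation_count_distribution(toy))  # {1: 2, 2: 2, 3: 1}
```

Turning this distribution into a distance would require the shape model described in the paper, which this sketch does not attempt to reproduce.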
We give a theoretical justification for this distance based on models of the historical evolution of monolingual and bilingual dictionaries.
Among our linguistic findings, we discovered, for example, that French is semantically closer to Basque than to German. We envisage an application of our semantic distance measure in the choice of an intermediate language when performing indirect translation, i.e. translating from language A to language B via a third language C.