Abstract
Any sequence of written symbols is, for convenience, called a message. The information associated with any one symbol depends on its relative frequency of occurrence, and the average of this information over the symbols of a language is termed the entropy. A more accurate description must take into account a larger number of preceding symbols, or blocks of preceding symbols, and their effect on the succeeding symbol. The stronger the dependence of each symbol on those preceding it, the lower the entropy per symbol.
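As a minimal illustration of the frequency-based definition above, the zeroth-order entropy can be estimated from the relative frequencies of the symbols in a sample message. The function name and sample string here are illustrative, not from the paper:

```python
from collections import Counter
from math import log2

def entropy_bits(message):
    """Zeroth-order entropy in bits per symbol, estimated from the
    relative frequencies of the symbols in the sample message."""
    counts = Counter(message)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Four equally frequent symbols carry 2 bits per symbol.
print(entropy_bits("abcdabcd"))  # → 2.0
```

This single-symbol estimate ignores inter-symbol dependence; conditioning on preceding symbols, as the abstract notes, can only reduce it.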
Shannon has suggested a fairly reliable method for finding the reduction in entropy with increasing length of the message. In this method, the number of trials a subject takes to guess the correct letter is used as the score for that letter. These scores, for the various positions or orders N, are then used to estimate the entropy.
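The guess scores from such an experiment bound the entropy in the manner of Shannon's 1951 prediction method: the entropy of the guess-number distribution gives an upper bound, and a weighted sum over guess numbers gives a lower bound. The following sketch assumes hypothetical guess frequencies; the function name and data are illustrative only:

```python
from math import log2

def guess_entropy_bounds(q):
    """Shannon's bounds on entropy from a guessing experiment.
    q[i] is the fraction of positions at which the subject's
    (i+1)-th guess was the first correct one; the q[i] sum to 1.
    Upper bound: entropy of the guess-number distribution.
    Lower bound: sum over i of i*(q_i - q_{i+1})*log2(i)."""
    upper = -sum(p * log2(p) for p in q if p > 0)
    lower = sum((i + 1)
                * (q[i] - (q[i + 1] if i + 1 < len(q) else 0.0))
                * log2(i + 1)
                for i in range(len(q)))
    return lower, upper

# Hypothetical scores: 70% right on the 1st guess, 20% on the 2nd, 10% on the 3rd.
lo, hi = guess_entropy_bounds([0.7, 0.2, 0.1])
print(lo, hi)
```

As the number of preceding symbols shown to the subject grows, guesses improve, the distribution concentrates on the first trial, and both bounds fall, which is how the entropies for the various orders are obtained.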
Shannon's predictive entropy method is repeated for Indian languages on a phoneme basis, and the entropies for various orders are obtained for Hindi and Tamil. The entropies on a grapheme basis are also obtained. The entropy at the hundredth phoneme or grapheme may be taken to indicate the true value of the entropy in the natural language.
Indexing terms: