Abstract
The present paper analyses the pattern of occurrence of the letters of the Hindi alphabet, or varṇamālā, in Hindi text and corpora. One text has been selected, and nine corpora have been compiled from different texts or corpora drawn from diverse sources. The relative proportions of the vowels and of the members of different groups of Devanāgari symbols, classified according to the Phonological Inventory of Indic script, are assessed by the rank-frequency approach, and Zipf’s orders of the groups are discussed. The entropic measures of the various groups are also compared. Moreover, characteristic curves for Hindi text and corpora, showing the proportions of the various groups of symbols and consonants, are presented. Using linear regression with one and two independent variables, a suitable model for the frequencies of the different groups is determined.
Acknowledgement
The authors are grateful to the University Grants Commission (UGC), New Delhi, India, for providing financial assistance to the first author in the form of a postdoctoral fellowship [F.4-2/2006(BSR)/13-770/2012(BSR)]. The research has been sponsored by the UGC under the “UGC Dr. D. S. Kothari Post Doctoral fellowship scheme”.
Notes
1 From http://taj.chass.ncsu.edu/Hindi.Less.25/wrtingsys.htm (Retrieved on 15/03/2014).
2 http://www.isamaj.com/kidzcorner/hindi/Varnmala.html, http://www.indif.com/kids/learn_hindi/hindi_alphabets.aspx, http://iteachhindi.blogspot.in/2010/05/hindi-varnamala-hindi-alphabets.html (Retrieved on 15/03/2014).
3 http://www.learning-hindi.com/post/1573331422/lesson-72-ipa-and-hindi (Retrieved on 15/03/2014).
4 www.bodhgayanews.net/hindi/HIN11_Script_Intro.pdf (Retrieved on 15/03/2014).
5 <en.wikibooks.org/wiki/Hindi/Speaking_and_Writing> (Retrieved on 15/03/2014).
6 Articles for corpora T3, T4 and T5, selected from the website of ‘Navbharat Times’, cover the period from 30 Oct. 08 to 5 Dec. 08. ‘Katha’ is the Hindi word for ‘story’, and ‘Katha-Sagar’ refers to a collection of short stories. Similarly, editorials have been selected from the ‘Sampadakeey’ (Hindi for ‘editorial’) section, and perspective articles by different writers from the ‘Nazaria’ (viewpoint) section.
7 “ELRA catalogue (http://catalog.elra.info), The EMILLE/CIIL Corpus, catalogue reference: ELRA-W0037”.
8 In the ‘miscellaneous’ section, tagged as literature-novel, literature-essay and literature-myths, respectively.