Large Scale Quantitative Analysis of three Indo-Aryan Languages

Pages 109-132 | Published online: 23 Feb 2016
 

Abstract

In this paper, we present a thorough quantitative analysis of large-scale media text in three Indo-Aryan languages: Hindi, Gujarati and Bengali. Together, these languages have 600 million speakers. Understanding and processing media text is important from sociological, cultural and information science/theoretic standpoints. We carried out a detailed study to understand the statistical nature of these data. The study demonstrates the effect of the size and category of media text on term distributions. We establish that while higher-order n-grams tend to follow Zipf’s law, the same is not always true for unigrams. We attempt to model the change in term distribution in two separate parts: the effect on the steepness of the term distribution and the effect on its tail. To the best of our knowledge, this is the first exploratory study of these three languages on such a large scale.
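
To make the Zipf's law claim concrete, the following is a minimal sketch of one common way to check it: count n-grams, rank them by frequency, and fit a straight line to log-frequency against log-rank. This is an illustration only and is not taken from the paper. The function names `ngram_counts` and `zipf_slope` are hypothetical, and the input is assumed to be an already tokenized list of words.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Assumption: a plain fit over all ranks, which is only a rough
    diagnostic; the paper may use a different estimator.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct terms to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A slope close to -1 on the log-log scale corresponds to the classic Zipfian pattern. Comparing `zipf_slope(ngram_counts(tokens, 1))` with the same call for `n = 2` or `n = 3` gives a rough view of how unigrams and higher-order n-grams differ, although a fit over all ranks is sensitive to the long tail of rare terms.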

Acknowledgement

This work was supported by the Department of Electronics & Information Technology (DeitY) under the Cross Lingual Information Access project.

Disclosure statement

No potential conflict of interest was reported by the authors.
