Original Articles

Phone-based identification of language in code-mixed social network data

 

Abstract

We explore frequently used techniques, namely n-gram, support vector machine, and conditional random field approaches, for identifying language in code-mixed social network data. Examining the language identification problem and the content of social networks more closely, we found that, irrespective of language, users are more comfortable with a phoneme/phone-based writing system than with the standard writing convention. We also found that very few messages are written according to standard norms. This article presents a simple phone-based n-gram method for identifying language in social network text. The results obtained are encouraging and comparable to the state of the art for code-mixed data.
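To make the idea of a phone-based n-gram identifier concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a word is first reduced to a rough phone-like key and then scored against per-language character bigram counts with add-one smoothing. The rewrite rules in `PHONE_RULES`, the function names, and the toy training words are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical spelling-to-phone rewrite rules, invented for this sketch.
PHONE_RULES = [("ph", "f"), ("ck", "k"), ("ee", "i"), ("oo", "u"), ("z", "s")]


def to_phones(word):
    """Reduce a word to a rough phone-like key: lowercase, rewrite, squeeze repeats."""
    key = word.lower()
    for src, dst in PHONE_RULES:
        key = key.replace(src, dst)
    squeezed = []
    for ch in key:
        # Collapse runs of the same character ("hiii" becomes "hi").
        if not squeezed or squeezed[-1] != ch:
            squeezed.append(ch)
    return "".join(squeezed)


def ngrams(key, n=2):
    """Character n-grams of a key, padded with start (^) and end ($) markers."""
    padded = "^" + key + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=2):
    """Build one n-gram count table per language from {language: [words]}."""
    return {
        lang: Counter(g for w in words for g in ngrams(to_phones(w), n))
        for lang, words in samples.items()
    }


def identify(word, profiles, n=2):
    """Return the language whose add-one smoothed n-gram model scores the word highest."""
    vocab = len(set().union(*profiles.values()))
    grams = ngrams(to_phones(word), n)

    def score(counts):
        total = sum(counts.values())
        return sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams)

    return max(profiles, key=lambda lang: score(profiles[lang]))


# Toy data, invented for illustration; real profiles would need a labelled corpus.
profiles = train({"en": ["the", "this", "that"], "hi": ["kya", "kyu", "kaise"]})
print(identify("that", profiles), identify("kyu", profiles))
```

Working on phone-like keys rather than raw spellings means that variant spellings of the same word tend to share n-grams, which is the property the abstract attributes to phone-based writing in social network text.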

