Abstract
We explore frequently used techniques, such as n-gram, support vector machine, and conditional random field approaches, for identifying language in code-mixed social network data. Looking more closely at the language identification problem as well as the content of social networks, we found that, irrespective of language, users are more comfortable with a phoneme/phone-based writing system than with following the actual writing conventions. We also found that very few messages are written using standard norms. This article discusses a simple phone-based n-gram method for identifying language in social network text. The results obtained are encouraging and comparable to the state of the art for code-mixed data.
Mathematics Subject Classification 2010: