Research Article

Linguistic Properties of Emojis: A Quantitative Exploration of Emoji Frequency, Category, and Position on Twitter

Pages 183-209 | Published online: 06 Jun 2024
 

ABSTRACT

Emojis in digital communication have drawn increasing academic attention. Qualitative studies mainly rely on the presumption that emojis share similar properties with units of natural language. Whether emojis exhibit the same or similar behaviour as linguistic units (such as words and morphemes) remains to be explored with quantitative methods. This study investigates emoji features in relation to language properties based on Zipf’s law and linear regression models. Results show that, firstly, the rank-frequency distribution of emojis is well fitted by Zipf’s law, and the parameters of the emoji distribution are closer to those of written language. Secondly, most emoji categories tend to occur in the latter half of the tweet, although in some cases they also occur at the beginning or in the middle of a tweet. Thirdly, more frequently used emojis tend to occur further back in the tweet, and the positions of emoji categories vary more when the emojis’ frequencies are relatively greater. In general, our quantitative findings suggest that emojis display linguistic properties to some extent. Our exploratory study demonstrates the value of applying linguistic laws and quantitative methods to investigate emoji features, extending the application of quantitative linguistic methods into emoji studies.

Acknowledgments

The authors would like to thank the reviewers for their insightful and constructive feedback, which has played a crucial role in refining the arguments presented in this version. As a collaborative project, this study is funded by several grants, including the Humanities and Social Sciences Fund of the Ministry of Education of PRC (Grant No. 22YJA740037 and Grant No. 21YJC740060), the MOE Project of Key Research Institute of Humanities and Social Sciences at Universities in China (No. 22JJD740018), and Guangdong Planning Office of Philosophy and Social Science of PRC (Grant No. GD23YWY02 and Grant No. GD21CWY07).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. This term is defined by Uhlířová (2007) as word frequency as conditioned by the word position in a sentence.

2. The Unicode versions used in this study, namely Unicode 11.0 for the 2018 data and Unicode 13.0 for the categorization in 2021, do not affect the stability of existing emojis. Regardless of Unicode version updates, each emoji is associated with a unique and unchanging Unicode codepoint. Therefore, the categorization applied to these emojis in our analysis remains consistent and valid over time, irrespective of Unicode iterations.

3. For positional information, we investigated emojis’ relative positions in the whole tweet rather than in a clause or sentence, as qualitative analyses normally do. Some tweets include emojis occurring in the middle without punctuation (Zappavigna, 2015), making it impossible to parse clauses or sentences accurately in a large-scale corpus. We follow the norm in studies of emoji position and take the whole tweet as the unit for examining positional information (Zhao et al., 2018).
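As an illustration of taking the whole tweet as the unit, the sketch below computes a relative position for each emoji in a tweet. It is a minimal sketch rather than the study's procedure: the function name `relative_positions`, the character-index definition of position (0.0 at the start of the tweet, 1.0 at the end), and the small `EMOJI` set are all assumptions made for this example, and multi-codepoint emoji sequences are not handled.

```python
def relative_positions(tweet, emoji_chars):
    """Return (emoji, relative position) pairs for one tweet.

    Position is the character index divided by the last index, so 0.0 is the
    start of the tweet and 1.0 is the end. This definition is an assumption,
    not necessarily the one used in the study.
    """
    last = len(tweet) - 1
    result = []
    for i, ch in enumerate(tweet):
        if ch in emoji_chars:
            # A one-character tweet has no meaningful span; report 0.0.
            result.append((ch, i / last if last > 0 else 0.0))
    return result


# Hypothetical emoji inventory: SUN (U+2600) and FACE WITH TEARS OF JOY (U+1F602).
EMOJI = {"\u2600", "\U0001F602"}

end_case = relative_positions("good morning \u2600", EMOJI)      # emoji at the end
start_case = relative_positions("\U0001F602 so funny", EMOJI)    # emoji at the start
```

Because the whole tweet is the unit, no clause or sentence parsing is needed, which matches the motivation given in the note for unpunctuated tweets.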

4. To test the reliability of the Zipfian distribution against the possibility of random occurrence, a Monte Carlo simulation approach was used. A total of 1000 simulations were conducted, with the frequency data being randomly shuffled within each simulation. The following results were obtained for the Durbin-Watson (DW) statistic: mean 2.0018, standard deviation 0.0355, and range 1.8906 to 2.1143. A DW value around 2 suggests an absence of autocorrelation within the residuals of randomly structured datasets (Koplenig, 2018). The mean DW statistic from the simulations supports the presence of a Zipfian relationship.
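The shuffling procedure in this note can be sketched as follows. This is a hedged illustration and not the authors' code: it assumes the regression is an ordinary least-squares fit of log frequency on log rank, it uses synthetic frequencies in place of the emoji corpus, and it runs 200 simulations for speed where the note reports 1000. The names `durbin_watson`, `loglog_residuals`, and `simulate_dw` are hypothetical.

```python
import math
import random


def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def loglog_residuals(freqs):
    """Residuals of an OLS fit of log(frequency) on log(rank).

    Rank is taken as list position + 1 (an assumption of this sketch).
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate_dw(freqs, n_sims=1000, seed=0):
    """Shuffle the frequencies n_sims times and collect the DW statistic."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        shuffled = list(freqs)
        rng.shuffle(shuffled)
        stats.append(durbin_watson(loglog_residuals(shuffled)))
    return stats


# Synthetic, rank-ordered frequencies (hypothetical, not the study's data).
freqs = [10000 / (r + 2.7) ** 1.1 for r in range(1, 201)]

observed_dw = durbin_watson(loglog_residuals(freqs))  # rank order preserved
sim = simulate_dw(freqs, n_sims=200)                  # rank order destroyed
mean_dw = sum(sim) / len(sim)
sd_dw = math.sqrt(sum((s - mean_dw) ** 2 for s in sim) / (len(sim) - 1))
```

In this sketch, the shuffled series yield DW values clustered around 2, while the rank-ordered series yields a much smaller value because its residuals are strongly autocorrelated. Comparing the two is one way to read the simulation results reported in the note.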

5. It should be noted that in one previous study (Lin et al., 2014), parameter values of Zipf’s law in spoken texts are larger than those in written books. That research differs from ours in that its parameter values are based on less frequent words (with a rank range of [60, 1000], ibid., p. 4) instead of the whole distribution. To ensure consistency and improve comparability, we refrained from truncating each distribution and opted instead to compare the entire distributions of emojis, spoken language, and written language in the current research. That said, different regimes of word frequency distributions were indeed found by Ferrer i Cancho and Solé (2001). Thus, parts of the rank-frequency distribution, such as the highly frequent and the less frequent tokens in emoji distributions, merit further study.

6. The presence of Male (♂) and Female (♀) gender symbols amongst the top 20 emojis warrants specific attention. These symbols are conventionally used as gender markers attached to other person-depicting emojis for rendering purposes, as detailed by the Unicode Consortium (http://www.unicode.org/L2/L2016/16181-gender-zwj-sequences.pdf). While they may not represent standalone communicative symbols, their usage within our corpus reflects current digital communication practices where users combine these markers with other emojis to specify gender nuances. Thus, these two emojis were not excluded from the current corpus.

7. It should be noted that, unlike previous studies such as Neophytou et al. (2017), our study does not compare emoji type differences with parameter differences due to the unequal sample sizes.

8. According to Halliday (1994), the ‘theme’ in Systemic Functional Linguistics refers to the element that serves as the point of information departure or what the clause is about. An ‘unmarked’ theme is typically the subject of a sentence; a ‘marked’ theme, on the other hand, occurs when an element other than the subject is foregrounded at the beginning of the sentence.

9. https://emojipedia.org/objects, accessed Mar 8, 2024.
