
What Does Fake Look Like? A Review of the Literature on Intentional Deception in the News and on Social Media

ABSTRACT

This paper focuses on the content features of intentionally deceptive information in the news (i.e., fake news) and on social media. Based on an extensive review of relevant literature (i.e., political journalism and communication, computational linguistics), we take stock of existing knowledge and present an overview of the structural characteristics that are indicative of intentionally deceptive information. We discuss the strength of the underlying empirical evidence and identify underdeveloped areas of research. With this paper, we aim to contribute to the systematic study of intentional deception in the news and on social media and to help set up new lines of research in which intentionally deceptive news items can be operationalized in consistent ways.

The recent outbreak of a novel coronavirus disease (COVID-19) vividly demonstrated how the deliberate spread of disinformation may have very real consequences for people’s beliefs and behavior. In the early stages of the pandemic, for example, rumors swirled that the rise and spread of COVID-19 could be linked to 5G cellular networks (Ahmed et al. 2020). The panic on social media was said to travel faster than the virus itself (Depoux et al. 2020), and within weeks, dozens of 5G masts were attacked throughout the United Kingdom and elsewhere.[1] The fabrication and spread of intentionally deceptive information is not confined to the case of COVID-19, nor is it typical of British culture. The phenomenon is on the rise across the globe, as illustrated by a Freedom House report finding that 30% more countries were subject to online fake activity in 2017 than in 2016.[2] The rise of deliberate disinformation—nowadays often referred to as fake news—has sparked societal as well as academic debates. However, despite this heightened attention, important aspects of the phenomenon are still only marginally understood.

In the field of political communication and journalism, research on deceptive news can be roughly divided into three subareas. First, there is conceptual work focusing on theoretical clarification, offering fake news typologies and connecting the recent upsurge to the context of changing media landscapes (e.g., Egelhofer and Lecheler 2019; Søe 2019; Tandoc, Lim, and Ling 2018; Waisbord 2018). Second, a strand of empirical work studies the dynamics, spread, and effects of fake news (e.g., Grinberg et al. 2019; Guess, Nyhan, and Reifler 2018; Pierri et al. 2020; Silverman 2016a), identifying the scope of exposure as well as the most relevant disseminators, and, as far as possible, assessing its impact on citizens’ attitudes and beliefs (see also Tsfati et al. 2020). Finally, there is work focusing on ways in which the spread and impact of disinformation could be tackled (e.g., Hameleers 2020; Lazer et al. 2017; Walter et al. 2020). Suggested remedies include increasing media literacy at the audience level and debunking (i.e., fact-checking) by journalists (Hameleers and Van der Meer 2019; Nyhan et al. 2020). Social media platforms and advertising networks have also been urged to take measures to reduce the prevalence of fake news on their systems, as such content violates policies against misleading material (Wingfield, Isaac, and Benner 2016).

These works have offered important insights into the nature, spread, and scope of fake news, and into the varying degrees to which countermeasures such as fact-checking are successful. However, they offer less guidance when it comes to identifying the more general characteristics of fake news. In fact, we know little about structural features that may facilitate the identification of disinformation. It may well be that intentionally deceptive information differs from regular news reports in only one respect: The intentions of the source. In that case, identifying intentionally deceptive information based on content characteristics alone is virtually impossible. However, it is reasonable to assume that the content of deceptive information is structurally different from fully correct information. For example, work in the field of linguistics suggests that deceptive language contains more tentative, angry, and emotional words than truthful language does (Asubiaro and Rubin 2018).

With this paper, we aim to fill this void by mapping existing knowledge about the structural features of intentional deception in texts. To delimit the scope, we focus specifically on messages that are created and distributed with the aim of influencing political attitudes, behavior, or processes. Two types of communication are distinguished: Intentional deception masquerading as an entire news article (often referred to as fake news), and intentional deception in texts on social media (i.e., Twitter, Facebook). We adopt a systematic approach, reviewing different literatures to identify features that are common in both types of communication. To enable a structured discussion of the results, we arrange the features by distinguishing between content features on the one hand and linguistic features on the other. The former refers to structural characteristics that concern the substantive message that is communicated, such as ideological biases. The latter relates to structural characteristics of the language that is used, for example, the use of pronouns or the presence of swear words. In sum, our overview offers a set of empirically validated criteria indicating the likelihood of intentional deception in text. Based on our findings, we discuss ambiguities and underdeveloped areas in this research field and suggest possible avenues for future research.

Intentionally Deceptive Information

The recent upsurge of fake news may elicit the view that we are dealing with a modern phenomenon. That is, however, not the case. The complicated relationship between news, politics, and the truth did not start with the inauguration of President Trump, nor with the advent of the Internet (Schudson 2019). In fact, intentionally deceptive information in the news is anything but new; it has been around since the establishment of the press, although not equally present in all periods since (Ortoleva 2019). What makes it such a pressing concern today is that the digital media environment enables the enormous and rapid spread of information. For example, we know from several studies that Facebook was a key disseminator of fake news stories during the 2016 presidential election campaign (e.g., Guess, Nyhan, and Reifler 2018). Against this backdrop, some have argued that we have entered “a post-truth era” (Benkler, Faris, and Roberts 2018, 23), which has provoked renewed academic interest in deceptive information—nowadays often referred to as fake news.

Despite this heightened interest, the concept remains contested, as illustrated by the numerous operationalizations and typologies of fake news (e.g., Tandoc, Lim, and Ling 2018). From an audience perspective, too, no unanimity exists, as people hold rather diverging views of what the phenomenon entails (Nielsen and Graves 2017). With this paper, we hope to contribute to research in this area by identifying the content features that are indicative of intentionally deceptive information in news-like texts and on social media. Given the conceptual ambiguity and the frantic pace at which this research field has developed over the past few years, it is key to clearly limit this study’s scope and to adopt a systematic approach when reviewing the relevant literature. We do so in the following ways.

First, to demarcate the scope of our review, two dimensions are of particular interest: The specific format in which deceptive information is presented and the intentions underlying its creation and spread. Regarding the former, we are interested in the manifestation of intentional deception in written news-like reports and on social media. Studies that focus on intentional deception in other sorts of texts, ranging from emails (Zhou et al. 2004) to financial reports (Humpherys et al. 2011), are not included in the current study. Concerning deceptive information in the news, we include studies that focus on deceptive information that masquerades as an entire news article. Regular or truthful news reports that contain one or more false claims thus fall outside this study’s scope. Concerning the format of social media messages, we include studies that focus on textual communication on social media platforms. Thus, visual messages or hybrid communication in which text is subordinate to the image—such as memes—are not included, nor are videos and so-called “deepfakes”. Regarding intentions, Allcott and Gentzkow (2017) identify two motivations underlying the production of intentional deception in texts: financial and ideological. We are mainly interested in the latter, that is, in content that is created and distributed with the goal of promoting particular ideas or people, often by discrediting others. In news-like texts, this is often achieved by assuming some form of journalistic credibility, mimicking the look and feel of real news (Tandoc, Lim, and Ling 2018). This implies that satire falls outside this study’s scope, as does commercial (native) advertising or clickbait articles created out of commercial considerations alone. For social media, we include studies that focus on intentionally deceptive information that is mainly produced to influence public opinion—and not merely to entertain, for example.

It is important to note, however, that the intentions of news sources are typically not known. This is a major challenge to research in this area, as communication scholars generally distinguish disinformation from misinformation and regular news by referring to the intentions of the source—without measuring them. The most common way to work around this issue is to rely on datasets with validated examples of news that have been proven to be false, containing information with very low levels of facticity (Tandoc, Lim, and Ling 2018). Examples of such datasets are fact-checking websites such as PolitiFact or the Washington Post Fact Checker (for an overview, see Omezi and Jahankhani 2020). Theoretically, there is a possibility that these news stories identified as false are examples of misinformation (not being meant to deceive) instead of disinformation. The same applies to tweets in which the content is not compatible with reality (low facticity). All this is to say that in some cases, we are forced to make assumptions about the underlying intentions of the source, without being able to substantiate these assumptions with empirical assessments.

Second, we have collected studies in a conscious and systematic way. Key to the validity of our review is that the findings are representative of a fast-growing research area that covers diverse literatures. More specifically, we identified relevant research in two ways. First, we restricted ourselves to scholarly articles. To identify relevant studies, we used Google Scholar to search for articles published in peer-reviewed journals between 2010 and 2021. This time frame was selected because it covers the vast majority (>95%) of all scholarly work dealing with intentional deception in news-like texts and on social media. To select relevant articles, we used the search terms misinformation, disinformation, and fake news. Relevant studies had to have at least one of these search terms in their title, abstract, or keywords. Book reviews, editorials, and other types of content that usually are not peer-reviewed were filtered out, as were articles that only mentioned the search terms in passing or that were published in non-reputable journals.[3] Based on this corpus, we close-read the selected articles and identified the ones—regardless of discipline—that dealt with third-person detection of intentional deception in news-like texts and on social media. Papers dealing with intentional deception in other sorts of content were excluded from the analysis. In addition, we followed up on the references in the articles, to make sure we had not missed any relevant studies. If we came across papers that dated from earlier periods (before 2010), we read them and included them if they focused on intentionally deceptive information in news-like texts or on social media. This process revealed that most relevant research has been published in the fields of journalism, political communication, and (computational) linguistics.

Altogether, we identified more than 30 relevant studies. Having examined these, we list the characteristics that are found to be indicative of intentional deception in text. We subdivide these features into two broad categories: content features and linguistic features. These categories are not fully mutually exclusive, and they mainly serve to enhance a more structured discussion of the results. We start off with the content features that have been identified across literatures. For each feature, we discuss some of the most relevant studies, instead of providing an exhaustive overview.

Content Features

In the examined literature, we observe four structural characteristics that relate to the substantive message communicated by intentionally deceptive texts. These messages often have an ideological bias in favor of the right, are designed to provoke negative emotions such as anger and fear, contain little verifiable information, and make use of densely packed, sensationalist headlines. We discuss each feature in more detail below.

Ideological Bias

Research suggests that intentionally deceptive information tends to be ideologically biased in favor of the right. The U.S. 2016 presidential election has been a catalyst for academic research on disinformation—much of the influential work focuses on the U.S. context in the years preceding and following this election. For example, Faris et al. (2017) analyze mainstream and social media election coverage. Their data cover over two million stories, published by approximately 70,000 online media sources between May 1, 2015 and Election Day (November 8, 2016). While partisan bias exists on both sides of the political spectrum (Bradshaw et al. 2020), the content receives more amplification and legitimation on the right side of it, especially on Facebook (Faris et al. 2017). In fact, intentionally deceptive information in news-like texts was found to often have a pro-Trump signature (Silverman 2016b), and stories favoring Trump were shared more widely than those favoring Clinton (Allcott and Gentzkow 2017; Lazer et al. 2017). In a similar vein, Benkler, Faris, and Roberts (2018) collected and analyzed two million stories published during the 2016 presidential election campaign and 1.9 million stories about the Trump presidency during its first year. In both analyses, the spread and reach of disinformation is examined by tracing cross-linking patterns between media sources, including sharing activities on Twitter and Facebook. The insulated right-wing media ecosystem is found to be much more susceptible to disinformation than the rest of the spectrum, which spans from center-right to far-left publications. Similarly, Marwick and Lewis (2017) study media manipulation by far-right groups in the U.S. context in the run-up to the 2016 election. They conclude that most Clinton supporters got news from mainstream sources, while many Trump supporters were surrounded by a far-right network which “peddled heavily in misinformation, rumors, conspiracy theories, and attacks on the mainstream media” (Marwick and Lewis 2017, 21).

However, further research is needed, especially between-country and over-time analyses, to examine whether this partisan asymmetry is a persistent phenomenon or a transient artifact elicited by the Trump administration (Freelon and Wells 2020, 152). There is some work examining the partisan content of disinformation outside the U.S. context. For example, Pierri et al. (2020) examine Italian disinformation on Twitter and find that most topics concern polarizing arguments related to immigration, crime, and national safety—issues that mirror the agenda of the conservative and far-right political community. Similarly, German disinformation is found to contain mostly right-wing implications “such as skepticism toward the European Union (…) and above all the exclusion of migrants and refugees” (Zimmermann and Kohring 2020, 221).

Use and Presence of Emotions

Research from different fields indicates that the use and presence of emotions may be structurally different in intentionally deceptive information compared to regular news reports. While the latter are also pervaded by subjective language in the form of journalistic appraisals (see e.g., Wahl-Jorgensen 2013, 2019, 2020), the use of emotions in intentionally deceptive news items is considerably different: emotions are more visible, more prominent, and more negative.

Quite a few studies provide empirical support for this observation. Bradshaw et al. (2020), for example, developed a grounded typology distinguishing different types of news and information that were spread and shared on Twitter during the 2016 U.S. presidential election and the State of the Union address in January 2018. Based on 21.8 million tweets, they analyze the domains (URLs) that were most often shared. Most relevant to the purpose of the current paper is what they call junk news—sources that deliberately publish or aggregate misleading, deceptive, or incorrect information packaged as real news about politics, economics, or culture (Bradshaw et al. 2020, 188). A key feature of this type of deceptive information relates to emotionally driven language; propaganda techniques are used to persuade users at an emotional rather than cognitive level. A similar approach is adopted by Neudert, Kollanyi, and Howard (2017), who examine the spread and reach of intentionally deceptive information during campaign periods in Germany and, later, in Germany, France, and the UK (Neudert, Howard, and Kollanyi 2019). They also find that emotionally charged words are key to these types of messages. Scholars analyzing intentionally deceptive information on Breitbart’s Facebook timeline likewise conclude that the content is affective, purposively designed to provoke voter outrage and to elicit fear and disgust (Benkler, Faris, and Roberts 2018; Faris et al. 2017). It is thus important to note that emotive language mainly refers to negative emotions, as deceptive news items or tweets aim to elicit negative affect among those who consume them (see also Hameleers, van der Meer, and Vliegenthart 2021).

Emotive language and words are also identified as a typical feature in a quite different strand of studies. Deception detection is a thriving research field in computer science and linguistics. To develop methods of automated detection, scholars rely on benchmark labeled datasets (e.g., PolitiFact) that contain truthful and deceptive news content that has been fact-checked for its veracity. Based on such datasets, machine learning models are developed to automatically detect disinformation based on linguistic features, with varying degrees of success (e.g., Asubiaro and Rubin 2018; Wang 2017). Horne and Adali (2017), for example, use three separate datasets to study the linguistic features of real news, fake news, and satire. They conclude that intentionally deceptive news typically contains more negative emotive words than truthful news (see also Zhou and Zafarani 2020). Fabricated headlines also score high on emotiveness (Asubiaro and Rubin 2018; see for similar findings Volkova et al. 2017). In relation to social media, Van Der Zee et al. (2018) conduct a linguistic analysis of 447 tweets written by former U.S. President Trump that were fact-checked by The Washington Post. That study suggests that truthful tweets contain more positive sentiment, while deceptive tweets contain more negative emotions. Research examining Russian disinformation campaigns on social media also finds that these messages aim to provoke outrage and anger against oppositional outgroups. A common strategy to achieve this effect is vilifying political and social adversaries (Freelon and Lokot 2020; Howard et al. 2019).
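
To make this strand of work more concrete, the following is a minimal sketch of the kind of supervised text-classification pipeline described above, written in Python with scikit-learn. The file name and column labels are hypothetical stand-ins for a labeled, fact-checked corpus (e.g., one derived from PolitiFact); the sketch illustrates the general technique, not the pipeline of any specific study.

```python
# Minimal sketch: classify fact-checked statements as deceptive vs. truthful.
# Assumes a hypothetical CSV with columns "text" and "label" (1 = deceptive).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("fact_checked_claims.csv")  # hypothetical dataset
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# Word and bigram frequencies serve as simple linguistic features.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=5)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```

In practice, studies in this field enrich such bag-of-words baselines with the emotion, sentiment, and style features discussed throughout this section.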

Verifiability

Perhaps not surprisingly, intentionally deceptive information tends to be less verifiable. Journalism scholars have examined examples of journalistic fakery in order to identify salient features. For example, Lasorsa and Dai (2007) compare deceptive news stories with authentic ones and conclude that fake news stories often deal with issues that are conducive to secrecy, which may conceal the absence of verifiable information. The latter also relates to the use of sources, by which we mean individuals who provide information or quotes for news articles. Intentionally deceptive stories contain more references to sources; however, those sources are rarely presented in a way that makes them traceable. Identifiable names are not provided, as sources are typically anonymous and vague (Bonet-Jover et al. 2021). Others refer to the use of conspiratorial and dubious sources, without fact-checking (Bradshaw et al. 2020; Marchal et al. 2019; Neudert, Howard, and Kollanyi 2019). The use of non-verifiable sources is also observable in linguistic features, such as the greater presence of pronouns instead of specific source names (see below). Furthermore, sources are less often used for direct quotes, as the amount of quoted content is found to be lower in fake news compared to legitimate news (Reddy et al. 2020). A lack of verifiable information resonates with work suggesting that source cues are less and less important for how people find and process information about public affairs, which may ultimately enhance the consumption of deceptive messages (e.g., Kalogeropoulos, Fletcher, and Nielsen 2019).
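
As a rough illustration of how one such cue can be operationalized, the sketch below measures the share of an article’s text that appears inside double quotation marks. This is a simplified proxy introduced here for illustration only; it is not the measure used by Reddy et al. (2020).

```python
import re

def quoted_share(text: str) -> float:
    """Proportion of characters enclosed in double quotes, a crude
    proxy for the amount of directly quoted source material."""
    quoted = sum(len(m) for m in re.findall(r'"([^"]*)"', text))
    return quoted / len(text) if text else 0.0

# Lower values would be expected for fabricated articles.
print(quoted_share('Officials said the plan "will be revisited next year".'))
```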

Use of Headlines

Headlines are a strong differentiating factor between fake and real news (Horne and Adali 2017). In general, the headlines of intentionally deceptive news items tend to be eye-catching, with a propensity for exaggeration, sensationalism, and scaremongering (Chen, Conroy, and Rubin 2015; Potthast et al. 2016; Sahoo and Gupta 2021; Shu et al. 2017; Tucker et al. 2018). In addition, these headlines tend to be longer (Asubiaro and Rubin 2018; Liu et al. 2019), using more capitalized words, proper nouns, and verb phrases. By doing so, titles of fabricated news items try to get many points across, while titles of truthful texts most often opt for brief and general summary statements (Horne and Adali 2017, 764). For example, Liu et al. (2019) show in their study of health-related information on Chinese social media that the long headlines of fake news articles often displayed patterns of “click-baiting”, measured through the use of imperative idioms such as “(you) must” or “never (do this)”.
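
A minimal sketch of how these headline cues might be computed is given below; the feature names and the two imperative cue words are illustrative choices, not an operationalization taken from the cited studies.

```python
import re

def headline_features(headline: str) -> dict:
    """Surface features of a headline as discussed above."""
    tokens = headline.split()
    return {
        "n_words": len(tokens),
        # fully capitalized words such as "SHOCKING"
        "n_all_caps": sum(1 for t in tokens if t.isupper() and len(t) > 1),
        # crude proxy for proper nouns: capitalized words after the first
        "n_capitalized": sum(1 for t in tokens[1:] if t[:1].isupper()),
        # imperative click-bait cues of the kind reported by Liu et al. (2019)
        "has_imperative_cue": bool(re.search(r"\b(must|never)\b", headline.lower())),
    }

print(headline_features("You MUST See What Doctors Never Tell You"))
```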

Linguistic Features

In addition, based on the examined literature, a number of features can be identified that relate to the specific language used in intentionally deceptive news-like texts or social media posts. Summing up, intentionally deceptive texts tend to be characterized by a more frequent use of capitalization, pronouns, and informal language or swear words. In addition, there are three linguistic features that are often examined but for which no unambiguous evidence is found: for lexical diversity, text length, and the use of punctuation, the literature offers diverging clues.

Lexical Diversity

Higher degrees of lexical diversity imply a higher number of unique words used in a text; when this number is adjusted to be independent of the length of the text, it is referred to as the Type-Token Ratio. Lower degrees of lexical diversity in intentionally deceptive communication have been confirmed in a variety of settings (e.g., Fuller, Biros, and Delen 2011; Newman et al. 2003; Zhou et al. 2004). However, applied to news articles, results are mixed. Horne and Adali (2017), comparing the linguistic features of real and fake news content, find that intentionally deceptive news items are characterized by lower levels of lexical diversity than real news articles: “Fake articles need a slightly lower education level to read (…) They seem to be filled with less substantial information, which is demonstrated by a high amount of redundancy, more adverbs, fewer nouns, fewer analytic words, and fewer quotes.” (Horne and Adali 2017: 763). This observation is supported by Ahmed, Traore, and Saad (2018), who find higher usage of verbs and adverbs in fake news, whereas real news is characterized by a higher proportion of nouns and adjectives. In contrast, Abonizio et al. (2020), comparing fake and legitimate news across three corpora (in English, Portuguese, and Spanish), find higher Type-Token Ratio values for the fake news corpora. Similarly, there is some evidence pointing towards higher lexical diversity in deceptive news, specifically in terms of the number of unique verbs used (Zhou and Zafarani 2019, 2020).
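
For concreteness, the sketch below computes the raw Type-Token Ratio together with one simple length correction (averaging the ratio over fixed-size windows). The window size is an arbitrary choice for illustration; published studies use a variety of length corrections.

```python
def type_token_ratio(tokens: list) -> float:
    """Type-Token Ratio: unique words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def windowed_ttr(text: str, window: int = 100) -> float:
    """Average TTR over fixed-size windows, reducing the raw measure's
    dependence on text length (window size is an illustrative choice)."""
    tokens = text.lower().split()
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    chunks = [tokens[i:i + window]
              for i in range(0, len(tokens) - window + 1, window)]
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)
```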

Capitalization

There is some evidence that intentionally deceptive news-like texts tend to include excessive capitalization, in the headline as well as in the body of an article, in order to attract attention (e.g., Bradshaw et al. 2020; Marchal et al. 2019; Neudert, Kollanyi, and Howard 2017; Reddy et al. 2020). This may also apply to tweets, where the excessive use of uppercase text (>70 percent) has been found to be an indicator of false content (Srivastava, Rehm, and Schneider 2017).
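
Below is a minimal sketch of such an indicator, assuming the threshold is applied to the share of alphabetic characters that are uppercase; the exact operationalization in Srivastava, Rehm, and Schneider (2017) may differ.

```python
def uppercase_ratio(text: str) -> float:
    """Share of alphabetic characters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

# Flag tweets above the 70 percent threshold reported in the literature.
flagged = uppercase_ratio("BREAKING: THEY DON'T WANT YOU TO SEE THIS") > 0.7
print(flagged)
```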

Use of Pronouns

The use of pronouns might also be structurally different. Applied to news content, Rashkin et al. (2017) find that first-person and second-person pronouns are used more often in deceptive news texts. The authors explain the difference by referring to journalism practices: editors of trustworthy sources are presumably quite rigorous about removing language that seems too personal, while such processes play no role in the production of fabricated news items (see also the findings about source use). In a similar vein, Asubiaro and Rubin (2018) find that the use of (all types of) pronouns is typically higher in deceptive news. Applied to social media, the aforementioned study by Van Der Zee et al. (2018) finds similar patterns (for third-person pronouns) in tweets by former President Trump that were established to be deceptive by The Washington Post. Gupta et al. (2014), who also study communication on Twitter, develop a model that automatically assesses the credibility of tweets during crises. Different from the study by Van Der Zee et al., they base their model on a large-N dataset covering tweets from millions of users. A higher use of pronouns is one of the criteria on the basis of which tweets score lower on credibility.
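
As an illustration, pronoun use can be captured with a simple rate feature, as sketched below. The word list is a small illustrative subset; published studies typically rely on part-of-speech tagging or validated dictionaries rather than a hand-made list.

```python
PRONOUNS = {
    "i", "me", "my", "we", "us", "our",         # first person
    "you", "your", "yours",                     # second person
    "he", "she", "him", "her", "they", "them",  # third person
}

def pronoun_rate(text: str) -> float:
    """Pronouns per word in a text or tweet."""
    tokens = [t.strip(".,!?;:\"'") for t in text.lower().split()]
    return sum(t in PRONOUNS for t in tokens) / len(tokens) if tokens else 0.0

print(pronoun_rate("They will not tell you what we found!"))
```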

Length

Turning to the length of texts and the difference between deceptive and truthful communication, research has provided somewhat mixed results. This might be due to the type of text that is considered. Fake news reports tend to be shorter than truthful news stories: the number of words as well as the number of paragraphs is significantly lower (Asubiaro and Rubin 2018; Horne and Adali 2017). In tweets, however, the opposite pattern seems to apply, as tweets by former President Trump are found to be longer when they contain deceptive information (Van Der Zee et al. 2018).

Informal Language and Swear Words

In regular news reports, informal words and language are generally kept to a minimum. However, in intentionally deceptive news items, slang and swear words (“inflammatory language”) are found to be rather frequent (e.g., Asubiaro and Rubin 2018; Gupta et al. 2014; Neudert, Howard, and Kollanyi 2019; Rashkin et al. 2017; Zhou and Zafarani 2020). In a recent paper, Hameleers, van der Meer, and Vliegenthart (2021) conduct an extensive content analysis of deceptive statements that were fact-checked by Politifact.org and Snopes.com. They find that completely false information is most likely to contain hate speech and incivility. As a result, they conclude that hate speech and incivility may be considered indicators of disinformation.
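
A simple lexicon-based check of the kind these findings imply is sketched below. The word list is a tiny hypothetical example; actual analyses rely on validated dictionaries (e.g., LIWC categories) or trained hate-speech classifiers rather than a hand-made list.

```python
# Hypothetical mini-lexicon of informal/inflammatory terms.
INFORMAL_OR_SWEAR = {"damn", "hell", "stupid", "idiot", "crap"}

def contains_informal_language(text: str) -> bool:
    """True if any lexicon word occurs in the text."""
    tokens = {t.strip(".,!?;:\"'") for t in text.lower().split()}
    return bool(tokens & INFORMAL_OR_SWEAR)

print(contains_informal_language("These people have no damn clue."))
```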

Punctuation

Finally, several studies examine the use of punctuation as a possible feature of intentionally deceptive information in news-like texts or on social media. Punctuation includes periods, commas, colons, semi-colons, question marks, exclamation marks, and quotation marks. Results are mixed. There is some evidence that fake news articles use less punctuation (e.g., Horne and Adali 2017), but others find more use of punctuation, at least in the headlines (Asubiaro and Rubin 2018; Liu et al. 2019). Further, there may be differences depending on the specific punctuation marks that are used. For instance, Liu et al. (2019) conclude in their study of health-related information on Chinese social media that fake news articles use more exclamation marks in their headlines, whereas real news articles use more question marks.
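
For illustration, the punctuation counts discussed here can be collected as a simple profile and compared between fake and real headlines, as sketched below.

```python
from collections import Counter

def punctuation_profile(text: str) -> Counter:
    """Counts of the punctuation marks listed above."""
    return Counter(c for c in text if c in '.,:;?!"')

# Liu et al. (2019) contrast exclamation and question marks in headlines.
print(punctuation_profile("You Will Not Believe What Happened Next!!!"))
```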

Summing up, this review shows that there are a number of characteristics that are potentially indicative of intentionally deceptive content. Table 1 summarizes the features, together with references to relevant studies.

Table 1. Overview of identified features indicative of intentionally deceptive texts in the news and on social media.

Content features
- Ideological bias in favor of the right (e.g., Faris et al. 2017; Benkler, Faris, and Roberts 2018; Pierri et al. 2020; Zimmermann and Kohring 2020)
- Prominent use of negative emotions such as anger and fear (e.g., Bradshaw et al. 2020; Horne and Adali 2017; Van Der Zee et al. 2018; Hameleers, van der Meer, and Vliegenthart 2021)
- Low verifiability: vague, anonymous, or dubious sources and little quoted content (e.g., Lasorsa and Dai 2007; Bonet-Jover et al. 2021; Reddy et al. 2020)
- Long, sensationalist, information-packed headlines (e.g., Horne and Adali 2017; Asubiaro and Rubin 2018; Liu et al. 2019)

Linguistic features
- Excessive capitalization (e.g., Bradshaw et al. 2020; Srivastava, Rehm, and Schneider 2017)
- More frequent use of pronouns (e.g., Rashkin et al. 2017; Asubiaro and Rubin 2018; Gupta et al. 2014)
- Informal language and swear words (e.g., Gupta et al. 2014; Rashkin et al. 2017; Hameleers, van der Meer, and Vliegenthart 2021)
- Lexical diversity (mixed evidence; e.g., Horne and Adali 2017; Abonizio et al. 2020)
- Text length (mixed evidence; e.g., Horne and Adali 2017; Van Der Zee et al. 2018)
- Punctuation (mixed evidence; e.g., Horne and Adali 2017; Liu et al. 2019)

Evaluation of Features—Solidity Versus Ambiguity

The list of features that follows from the above review—and that is summarized in Table 1—varies in terms of the strength and unambiguity of the underlying empirical evidence. For some of the features there is considerable empirical support, such as the ideological bias and the use of emotions. Lower levels of verifiability, the use of eye-catching headlines, and informal words and language are also recurring observations across disciplines. It must be noted, however, that most of the work focuses on intentionally deceptive information in the news, and less can be said about the applicability of these findings to the content that is produced on social media. This may be related to higher individual variation in writing style on social media (Faustini and Covões 2020). For some of the features, research provides mixed results. Our review indicates that different automated detection techniques may lead to different observations, for example in the case of punctuation.

In addition, our systematic review also facilitates the identification of underdeveloped research areas. First of all, comparative research is scarce. A notable exception is provided by Humprecht (2019), who compares (fact-checked) online disinformation across English-speaking (US, UK) and German-speaking countries (Austria, Germany). She observes how the subject of fake news strongly mirrors national news agendas: topics such as health care (US) and macroeconomic issues (UK) prevailed in the English-speaking countries, whereas fake news produced in German-speaking countries often targeted and vilified immigrants, as the refugee situation was at the top of these national agendas. Such an approach is valuable as it sheds light on cross-national differences and similarities. The study indicates that—despite considerable cross-national variation—intentionally deceptive information thrives best on those topics that are most polarized in a given context. More comparative research is needed in this area. Relatedly, the bulk of research focusing on the automated detection of deceptive texts is based on the English language (but see e.g., Abonizio et al. 2020; Chu, Xie, and Wang 2021; Faustini and Covões 2020). Work in which these or similar techniques are developed for other world languages such as Mandarin and Spanish would be extremely useful, as there is no reason to expect that intentionally deceptive information stops at the borders of the Anglophone world. Finally, research that looks into deceptive information on social media is strongly shaped by considerations of data availability. As a result, most of what we know is based on data stemming from Twitter and public Facebook pages. Private Facebook pages and, more importantly, private messaging apps are rarely examined. For many reasons, including ethical ones, studying these sources is a challenging endeavor. However, as a substantial part of our daily communication runs through these channels, academic research into the prevalence of deceptive information there is needed.

As intentionally deceptive information is a phenomenon with potentially far-reaching consequences for democratic stability, the question arises how the identified features may facilitate countermeasures such as fact-checking. This question is all the more relevant because many scholars engaged in automated deception detection propose human-in-the-loop solutions to ultimately detect fake news (Bourgonje, Schneider, and Rehm 2017; Conroy, Rubin, and Chen 2015). Linguistic analyses provide valuable tools but cannot fully replace human judgment (Rehm 2017), as automated models may only detect certain types of fake content (Ruchansky, Seo, and Liu 2017) and may be vulnerable to false positives, for example when real news is under-written or deals with specific topics (Zhou et al. 2019). There is a rich literature on the effectiveness of various fact-checking tactics (e.g., Nyhan and Reifler 2015; Nyhan et al. 2020; Walter et al. 2020) that also highlights some ongoing controversies around this rather new and expanding practice, such as accusations of partisanship (Graves and Cherubini 2016; Graves 2017). Our review may be of use to this fast-growing industry, especially in relation to assessing the veracity of information in news-like texts. Identifying features that draw on substantial empirical support—e.g., the use of emotions—may facilitate a more effective selection of material to be scrutinized. The partisan bias identified in this paper is particularly relevant in light of ongoing discussions about the objectivity of fact-checking. As fact-checking is largely motivated by journalistic ideals (Graves, Nyhan, and Reifler 2016), objectivity is a first prerequisite for any fact-checking activity. To prevent criticism of taking sides, some fact-checking organizations have taken a methodological approach to maintaining political balance (Graves and Cherubini 2016). The observation that intentionally deceptive information is not equally spread across the political spectrum may justify a focus on the more right-leaning end of it. Finally, recent work by Clayton et al. (2020) suggests that social media platforms may combat fake news by providing deceptive headlines with a warning flag (“rated false”). The selection of these headlines may benefit from our review, which identifies long and sensationalist headlines as indicative of deceptive content.

Conclusions

This paper has reviewed research on the content features of intentionally deceptive information and sought to combine scattered knowledge into a single overview. Estimating the likelihood of news being fake—based on content alone and in the absence of additional information such as source cues or network characteristics—is never watertight, and in some cases, it might approach an educated guess. With this paper, we hope to contribute to the “educated” part of this process and to provide more solid ground for informed estimations. Based on a systematic review of the literature and a careful examination of the underlying scientific evidence, we identify several elements as agreed-upon indicators of intentionally deceptive news content: an ideological/partisan bias and the use of negative emotions provoking anger or fear. In addition, low levels of verifiability, the presence of a long, sensationalist headline in which a lot of information is packed, and the use of informal language or swear words further increase the probability of dealing with intentionally deceptive news items. Interestingly, the identified features are rather similar for social media. However, as most research thus far has focused on the spread of disinformation through social media, less attention has been devoted to deceptive user-generated content on the platforms themselves.

For several reasons, we believe that this review is important. First, it may contribute to the development of methods that enable a more reliable detection of deceptive information. This has proven to be a challenging endeavor. While the relevance of such methods clearly crosses the boundaries of a single discipline, there has not been much interdisciplinary collaboration. Second, the list of features may facilitate a more consistent operationalization of fake news, which can be applied in new research settings such as experiments and content analyses. Especially the latter may help scholars to examine the prevalence of the phenomenon, in general and across topics (see also Tucker et al. 2018, 56).

Of course, this paper is not without limitations. First of all, our review is inevitably incomplete. We selected studies by using the search terms misinformation, disinformation, and fake news. Research that engages with these and related phenomena, spanning several disciplines, is growing at a frantic pace, with new and insightful studies being published every week. To minimize selection bias, we adopted a systematic approach and followed up on all references, but we nevertheless realize that results might change as new studies are published. Second, a sole focus on content implies that we disregard other sorts of information that may be indicative of deception. Most notably, the specific source (i.e., website, platform) of an article and the visual material that accompanies it may have a bearing on its perceived credibility (e.g., Hameleers et al. 2020). Krafft and Donovan (2020), for example, show that visual “evidence” collages are a key strategic element in the formation and spread of disinformation. Related to social media, Zhang and Ghorbani (2020) argue that an analysis of content alone is not sufficient for (automated) deception detection, as it should be combined with an analysis of other sorts of data, such as information relating to author and user characteristics (see also Bondielli and Marcelloni 2019). Finally, the focus on textual characteristics also implies that we do not consider other sorts of online deceptive messages, such as deepfakes. Given their potential impact on trust in news (e.g., Vaccari and Chadwick 2020), it is important that future research takes up this challenge.

We are fully aware that any review or line of research enhancing the identification of intentionally deceptive information does not change the increasingly thorny relationship between news and truth. Nor do fact-checking activities or measures against misleading content on social media platforms. Graves and Wells (2019) convincingly argue that scholars tend to perceive the fake news phenomenon mainly as a matter of information availability, which has led to a focus on remedying falsehoods by containing the spread of bad information and replacing it with facts. In a sense, our paper fits in this tradition, as it also builds on the assumption that when falsehoods are detected, we have moved closer towards a cure. It needs to be noted, though, that such an informational approach does not have much leverage if or when truth is not embedded in and supported by institutions that enable and require citizens as well as institutions to respond in good faith to public truth claims (Graves and Wells 2019, 39–40). In other words: When intentionally deceptive information is successfully detected and reported, and yet it does not seem to matter, there is not much to be won. If this is the case, more fundamental system change might be necessary to restore the power of truth in modern democracy.

Disclosure Statement

No potential conflict of interest was reported by the author(s).


Funding

This work is part of the research project “Knowledge Resistance: Causes, Consequences, Cures,” funded by Riksbankens Jubileumsfond for the Advancement of the Humanities and Social Sciences [grant number M18-0310:1].

Notes

[3] More specifically, when articles were published in journals not published by established publishers, we checked the journals using www.predatoryjournals.com and www.journalguide.com.
