Research Article

Computational Social Science and the Study of Political Communication


ABSTRACT

The challenge of disentangling political communication processes and their effects has grown with the complexity of the new political information environment. But so have scientists’ toolsets and capacities to better study and understand them. We map the challenges and opportunities of developing, synthesizing, and applying data collection and analysis techniques relying primarily on computational methods and tools to answer substantive theory-driven questions in the field of political communication. We foreground the theoretical, empirical, and institutional opportunities and challenges of Computational Communication Science (CCS) that are relevant to the political communication community. We also assess understandings of CCS and highlight challenges associated with data and resource requirements, as well as those connected with the theory and semantics of digital signals. With an eye to existing practices, we elaborate on the key role of infrastructures, academic institutions, ethics, and training in computational methods. Finally, we present the six full articles and two forum contributions of this special issue illustrating methodological innovation, as well as the theoretical, practical, and institutional relevance and challenges for realizing the potential of computational methods in political communication.

Digital communication has opened up a host of new avenues for social and political interactions. These have radical effects on political information environments and the democratic attitudes and behaviors they shape (van Aelst et al., Citation2017). Not only are citizens able to produce content and have their voices heard in ways that were inconceivable two decades ago, but systemic changes within the broader media ecology, such as the expansion of choice in the media environment and the increasingly important role of social networking sites as sources of political information, are changing traditional political information production, distribution, and consumption dynamics (Jungherr et al., Citation2020b). Developments such as the mass supplanting of political information with entertainment, the diversification of media diets, the fragmentation of the media environment and the rising proliferation of misinformation, have not only impacted political communication processes, but have also added new challenges to the study of political communication. At the heart of these, there continue to lie questions political communication scholars have always asked: how can we reliably measure the reach of specific media outlets or political actors, identify the often-overlapping media and information diets of people, and estimate the effects of information – especially in new, noisy and deeply confusing information environments?

But while the challenge of disentangling political communication processes and their effects has grown with the complexity of the political information environments, so have our toolsets and capacities as social scientists to better study and understand them. The falling costs of computational power and broad access to data science toolkits, formerly confined to either highly specialized communities or to particular disciplines, have provided access to a wide variety of data containing political information, novel data collection practices, and a stream of new methods with which to make sense of them. These developments are often discussed under the term Computational Communication Science (CCS) (van Atteveldt & Peng, Citation2018). We treat as computational political communication the research developing, synthesizing, and applying data collection and analysis techniques relying primarily on computational methods and tools, with the objective of answering substantive theory-driven questions in the field of political communication.

For those wishing to study political communication processes using entry-level computational methods and tools, CCS is very accessible (for examples see Habel & Theocharis, Citation2020). But critical questions emerge. Do the data we can access really help us answer the substantive questions we are interested in? Is the mere adjustment of R code from a tutorial enough to establish reliable findings? Can a colorful visualization in Gephi alone provide sufficient evidence that Twitter conversations around a hashtag are polarized? What theoretical puzzle does a purely descriptive overview of how people connect in online discussions help us answer, and what theory-building capacity do such insights have? How certain can we be that all those bots identified by Botometer are really non-humans? How do these methods square with developments toward open science, data sharing, and replicability?

There is little doubt that new types of data and methodological approaches allow a new vista into both existing and new communication processes. But while the fact that we can observe and describe a political communication process from a viewpoint that has so far not been attainable can sometimes be instructive in itself, does it also mean that we can necessarily say something new, meaningful or theoretically interesting about it? And does the fact that new and alternative ways of measuring our concepts are now at our disposal mean that these are not ridden with the same problems our previous approaches suffered from?

Much of computational communication research currently strives (and often manages) not only to be conceptually clear and theory-driven, but also to deploy sophisticated analytical methods that are rigorously validated and made transparent through openly accessible replication repositories. But the apparent ‘magic’ of computational tools and methods, and the ease with which one can summarize certain types of information in seemingly insightful ways, might lure unsuspecting researchers into reading too much into their findings. In this, the apparent ease of use of computational methods threatens the achievement of meaningful and valid insights. Even worse, despite the efforts of experts to make their tools available, with extensive documentation and support, to everyone who has the will, resources, and capacity to invest time in building relevant skillsets, we feel that a rift runs along lines well known in other resource-intensive scientific fields: between those low and high in resources, those with access to high-quality data – including proprietary data from digital platforms – and those without, those with little and those with much institutional support, those in the United States and those elsewhere. The growing use of and demand for extensive computational methods and datasets thus risk exacerbating existing inequalities in the opportunities for contributions in the social sciences at large, and in political communication research in particular. This is an issue many journals – including this one – and interest groups in the field increasingly commit to addressing.

Against this backdrop, the goal of this contribution, and of this Special Issue, is to foreground the theoretical, empirical, and institutional opportunities and challenges of CCS that are relevant to the political communication community. We believe that, despite its vast potential, as of yet CCS has had only a marginal impact on core tenets in the field of political communication. New computational methods remain ill-connected with established approaches to social science research, and findings predominantly speak to isolated single-country cases. One reason for that is that computational methods are still pursued by a minority of political communication researchers (though this is changing rapidly, as the elevation of the ICA Computational Methods interest group – founded only in 2016 – to a division within a very short time testifies). While one might argue that, until relatively recently, this was also the case with quantitative methods more broadly, the same need not be the case with computational methods. Technical knowledge, formerly mostly found in expensive methods textbooks, is now available in the form of both online guides/tutorials and code repositories with detailed instructions (Stan, for example, a language for Bayesian inference and optimization, comes free with open-source code and a 500-page manual) that can proliferate at a much faster rate and are accessible to a larger audience for free.

Another important barrier is that much of CCS research appears to lack connections to relevant theories, deploys measures that can be questionable, is often misunderstood in its capacity to reveal novel aspects of political communication processes, and remains largely descriptive or, at the other end, sometimes showcases methodological rigor at the expense of well-defined theoretical mechanisms. These are all understandable symptoms of an interdisciplinary field that has not yet matured, and they can, as numerous textbooks demonstrate, also be encountered in other types of social science research to a greater or lesser extent (Kellstedt & Whitten, Citation2018). Yet, as this is not the first time social science researchers are confronted with many of these issues, we have the advantage of learning from these enduring controversies and shortening the curve of progress (the use of content analysis since the 1950s, and more than 60 years of public opinion/survey research, have taught us a lot about theory-testing and valid measurement – see, for example, Barberá, Citation2020). Finally, and this is something not based on quantitative indicators but rather comes from our experience and discussions with colleagues in relevant interest groups, CCS’s highly interdisciplinary nature makes it institutionally cumbersome, often creating imbalances in who can substantively contribute to this research, but also in how scholars with relevant interdisciplinary expertise can position themselves on the market.

With this special issue, we plan to map the potential of CCS for the political communication community and demonstrate its broad appeal beyond that of highly technically skilled researchers, focusing on approaches and perspectives that not only demonstrate its methodological innovation but, most importantly, illustrate its theoretical, practical, and institutional relevance and the challenges in realizing its potential.

Defining Computational Communication Science

We position Computational Political Communication within the subfield of Computational Communication Science (CCS), which itself is a variant of Computational Social Science (CSS) (Lazer et al., Citation2009). CSS is a developing interdisciplinary scientific subfield that is still lacking clear demarcations. We define computational social science as an interdisciplinary scientific field in which contributions develop and test theories or provide systematic descriptions of human, organizational, and institutional behavior through the use of computational methods and practices. On the most basic level, this can mean the use of standardized computational methods on well-structured datasets (e.g., applying an off-the-shelf dictionary to calculate how often specific words are used in hundreds of political speeches), or at more advanced levels the development or extensive modification of specific software solutions dedicated to solving analytically intensive problems (e.g., from developing dedicated software solutions for the automated collection and preparation of large unstructured datasets to writing code for performing simulations). Accordingly, CCS, and by extension Computational Political Communication, lie at the intersection of CSS and (political) communication, with a topical focus on theories and phenomena associated with communicative channels, objects, behavior, and effects.
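
At the entry level, such a dictionary application requires only a few lines of code. The sketch below (in Python) illustrates the idea; the dictionary terms and the folder of speech files are illustrative assumptions rather than references to any particular resource.

```python
# Minimal sketch: applying a small, hypothetical issue dictionary to a folder of
# plain-text political speeches. Dictionary terms and file layout are assumptions.
from collections import Counter
from pathlib import Path
import re

ECONOMY_TERMS = {"unemployment", "inflation", "wages", "taxes", "growth"}

def economy_share(text: str) -> float:
    """Return the share of tokens in a speech that match the dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hits = sum(counts[term] for term in ECONOMY_TERMS)
    return hits / max(len(tokens), 1)

results = {
    path.name: economy_share(path.read_text(encoding="utf-8"))
    for path in Path("speeches").glob("*.txt")  # hypothetical folder of speeches
}

for name, share in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.4f}")
```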

The definition points to an important point of tension in precisely differentiating CSS from other fields in the social sciences. Nearly all contemporary work in the social sciences relies on computational methods. This includes the storage and processing of digital data – such as digital text, image, or audio files; computationally assisted data analysis – such as regression analyses and simulations; or data collection through digital sensors – such as eye tracking or internet of things enabled devices. In this work, computation is often a necessary precondition. For example, while it is possible to run multiple regressions with pen and paper, the success of this method in the social sciences depends on the digital representation of the underlying datasets and computational resources available to process the data. As in the most general reading of our definition the use of any computational method in data handling and analysis would qualify as computational social science, one could argue that nearly any form of contemporary social science would constitute computational social science. Obviously, this is not helpful in identifying constituting elements of the field and subsequent potentials and challenges.

It might be helpful to focus more on studies and research projects in which computational methods and practices are not used as plug-and-play solutions but instead demand varying degrees of customization with regard to data collection, preparation, analysis, or presentation. Again, this is best thought of as a distinction in degree. On one end of the scale, we find projects that require some coding, such as the sequential calling of preexisting or slightly modified functions or basic data management. On the other end of the scale, we find research projects that demand the development of dedicated software solutions, for example, in automated and continuous data collection, the preparation and structuring of large unstructured raw data, or the development of dedicated non-standardized analysis procedures. Projects at different ends of this scale share issues arising from their focus on social behavior, systems, or phenomena, but they vary significantly with regard to their computational demands. Projects that use standardized computational methods might thus be basically indistinguishable from other areas in empirical social science research. On the other hand, projects at the other end of the scale are likely to face challenges indistinguishable from software development in computer science.

Often, CSS is discussed in the context of new datasets that have been made available through digital technology. This very famously includes data documenting user behavior in digital environments – so-called digital trace data (Freelon, Citation2014; Golder & Macy, Citation2014; Howison et al., Citation2011; Jungherr, Citation2015). Increasingly, however, other large datasets that are relevant for political communication research have become available digitally, such as large text corpora covering fields as diverse as newspaper coverage (Barberá et al., Citation2020), literature (Piper, Citation2018; Underwood, Citation2019), historical or contemporary parliamentary speeches (Rauh & Schwalbach, Citation2020), and images (Williams et al., Citation2020). All these datasets are legitimate objects of computational communication, and it is thus unnecessarily limiting to restrict one’s definition of CSS to specific types of datasets.

Correspondingly, we find it unproductive to limit one’s definition of CSS to one specific topical subfield. It is true that much early work in CSS focused on digital communication environments, but to us this is an artifact of the early availability of datasets documenting user behavior on social media – especially Facebook and Twitter – and not a constitutive feature of CSS. Our understanding of CSS is thus not tied to a specific set of methods, datasets, or research interests. Instead, to us, the constituting element of CSS differentiating it from other approaches in the social sciences, and in political communication in particular, is the degree to which research projects demand the inclusion and development of computational methods over the course of a project. At the same time, CSS is a specific subfield within computer science research in that it focuses on social systems and phenomena. Consequently, approaches and methods have to account for the specific conditions of this research area (Flyvbjerg, Citation2001). By focusing on these two constitutive characteristics of CSS – the examination of social systems, phenomena, and processes based on computational methods – we can identify and discuss the associated promises and challenges encountered in this field.

Promises, Promises

Accounts of CSS are usually accompanied by strong expectations with regard to the promises they hold for the study of societies and human behavior. This also holds for the study of political communication. Promises usually come in two forms: The first focuses on the increased coverage of social phenomena and human behavior through digital trace data and digital sensors, the second goes even further and expects a transformation of the nature of the social sciences.

On the most fundamental level, proponents of CSS agree that the digital transformation has led to a massive increase in available data sources and types for social scientists. This is true for data that had in principle been available before but are now available at a significantly larger scale, such as newspaper corpora. Beyond this, we also have new data sources. For one, the interactions of users with online services create data traces. In principle, these digital trace data provide a comprehensive account of user behavior with, and mediated by, digital services, and can additionally provide environmental details that were previously impossible to acquire. For this reason, they are highly promising to political communication researchers as they offer an entry point for investigating processes and behaviors within what are probably the most vibrant political information environments of our time (Golder & Macy, Citation2014; Howison et al., Citation2011; Jungherr, Citation2015; Salganik, Citation2018).

In practice, however, most political communication researchers only have access to highly limited snapshots of digital trace data and remain at the mercy of digital platforms regarding data access (Freelon, Citation2018). The upside of this is that this type of access is still better than not having even this narrow corridor into the data streams of platforms where much of today’s exciting communication – and politics – happens. At the same time, however, it has made the realization of the full promise of digital trace data more elusive than originally hoped for, by adding strict barriers to the inferences that can be made about what these snapshots exactly represent and how, in the end, we can extract from massive data corpora something that is actually useful (Grimmer & Stewart, Citation2013). Beyond this, we also encounter new data sources provided by digital sensors. This could be data emerging as a byproduct of another service, like satellite imagery (Weidmann & Schutte, Citation2017), or the output of sensors specifically designed by researchers (Pentland, Citation2008; Stopczynski et al., Citation2014). In principle, this data type is bound to increase with the availability and wide distribution of Internet of Things devices.

In combination, this increase in available data sources allows for an increasing coverage and surfacing of social phenomena and human behavior. It also enables the examination of well-known phenomena at much higher temporal, behavioral, and procedural resolution, especially when combined with other methods of social science inquiry. This might also allow for a more systems-level view of societies and human behavior (Golder & Macy, Citation2014; Lazer et al., Citation2009; Salganik, Citation2018). As an example, while the phenomenon of misinformation has in the past been studied using surveys and survey experiments (e.g., Kuklinski et al., Citation2000), organizing – and eventually matching – digital traces with individual-level data can now provide far greater insight into mechanisms of exposure to dis- or misinformation (Grinberg et al., Citation2019; Guess et al., Citation2019), providing insights that would not be possible if survey data were used alone. Here researchers are not only using more and different data than before, but to achieve their goals they also engage in a high degree of customization when it comes to both the data generation and the analytical process.
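
Schematically, such a linkage step can be as simple as joining consented survey responses to aggregated trace records on a shared respondent identifier. The sketch below is a minimal illustration under assumed file and column names; it is not drawn from the studies cited above.

```python
# Sketch: linking consented survey responses to digital trace records.
# File names, column names, and the domain list are illustrative assumptions.
import pandas as pd

survey = pd.read_csv("survey_responses.csv")  # e.g., respondent_id, age, misinfo_belief
traces = pd.read_csv("trace_records.csv")     # e.g., respondent_id, url, timestamp

# Flag visits to domains on a hypothetical list of low-quality news sources.
low_quality_domains = ("example-fakenews.com", "example-clickbait.net")
traces["low_quality_visit"] = traces["url"].astype(str).apply(
    lambda url: any(domain in url for domain in low_quality_domains)
)

# Aggregate exposure per respondent and attach it to the survey data.
exposure = traces.groupby("respondent_id").agg(
    total_visits=("url", "count"),
    low_quality_visits=("low_quality_visit", "sum"),
).reset_index()

linked = survey.merge(exposure, on="respondent_id", how="left").fillna(0)
print(linked.head())
```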

More ambitiously, the availability of vast datasets documenting human behavior in interaction with digital services – or covered by digital sensors – has also led to the expectation that the social sciences might transcend their status as a soft science and become an actual scientific discipline with models allowing for the confident prediction of the future. In this view, more data not only mean an increase in the coverage of social processes or human behavior but would actually allow for a “measurement revolution” (Watts, Citation2011) in the social sciences, allowing them to overcome their current state of after-the-fact explanation and develop into a science with true predictive power (Hofman et al., Citation2017). This hope rests on a view of society as being shaped by underlying context-independent laws that have mostly remained invisible to scientists due to the lack of opportunities to acquire the data that can now be accessed (González-Bailón, Citation2017).

While we increasingly see studies that illustrate the first promise of CSS based on the expansive coverage of social phenomena, the second promise of a transformation of social science into a more strictly predictive science remains unfulfilled, especially when one looks at political communication research. While one might treat this as an indicator that we simply need even more data, we feel it is more plausible that the nature of the social sciences is the examination of context-dependent phenomena (Elster, Citation2015; Flyvbjerg, Citation2001; Gerring, Citation2012), and as such prediction in the social sciences is more an instrument of theory-testing than an instrument of planning and design, as it is, for example, in engineering or physics.

Overall, while these promises have been well articulated and prominently advanced, the challenges of realizing them remain predominantly buried in the discussion sections of empirical papers. But looking back at what is now more than a decade of research allows us to identify at least three problem areas:

  • CSS research remains weak in connecting its research designs and findings to established theories, concepts, mechanisms, and discussions in the social sciences (Jungherr & Theocharis, Citation2017);

  • While problems with data – especially when it comes to social media data – have been the subject of considerable debate in CSS (Japec et al., Citation2015; Sen et al., Citation2019; Stier et al., Citation2019), data generating processes and their effects on the composition, coverage, and interpretative meaning of signals in available datasets (Jungherr, Citation2019) are often treated as issues of – at best – secondary importance;

  • CSS as an interdisciplinary research field struggles with establishing practices that connect it more strongly with the established social sciences, develop standards of transparency in data collection, preparation, harmonization and analysis, and surface and problematize conflicts of interest between researchers, industry, and the media (Jungherr et al., Citation2020a). Steps that have been taken to address especially the last of those issues (King & Persily, Citation2019) have been met with skepticism (Bruns, Citation2019).

For CSS, and therefore also computational political communication, to flourish and transcend its current niche existence among computational enthusiasts within the social sciences and socially curious computer scientists, these challenges have to be addressed.

Challenges

Data and Resource Requirements

Challenges in computational communication are not unlike those of CSS. The feature most often associated with CSS is probably the extraordinary size of datasets (Salganik, Citation2018). This development has given rise to the term “big data” in order to discuss associated research potentials (Lazer & Radford, Citation2017; Schroeder, Citation2016, Citation2019), though recently the term has lost some of its popularity given a growing awareness of its conceptual ambiguity.

On a very fundamental level, increasing sizes of datasets bring practical issues in their storage and processing. While it is true that the processing power of computers keeps increasing, this does not make up for the increasing demands put on them by new types of datasets. This is already true for textual data, and all the more true for datasets with high-resolution images or videos, which are becoming of increasing utility to political communication scholars, especially as research interest shifts to platforms such as Instagram, YouTube, and TikTok. Projects collecting and using such datasets increasingly face non-trivial data preparation and processing tasks that go well beyond the scope of typical projects in the social sciences.
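
As a simple illustration of how even routine preparation steps change at this scale, the sketch below processes a dataset in chunks rather than loading it into memory at once; the file and column names are assumptions.

```python
# Sketch: aggregating a dataset too large to load at once by streaming it in chunks.
# The file name and column names are illustrative assumptions.
import pandas as pd

counts_per_day = {}
for chunk in pd.read_csv("posts_large.csv", usecols=["created_at", "post_id"],
                         parse_dates=["created_at"], chunksize=500_000):
    daily = chunk.groupby(chunk["created_at"].dt.date)["post_id"].count()
    for day, n in daily.items():
        counts_per_day[day] = counts_per_day.get(day, 0) + int(n)

print(sorted(counts_per_day.items())[:5])
```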

Beyond these fundamental issues in the handling of large datasets with diverse types of data, further issues emerge from the mere fact that projects in political communication often use social media data collected through public interfaces provided by social media platforms. Until recently, this data access through so-called Application Programming Interfaces (APIs) was easy and provided researchers with comparatively rich data. But while some platforms gradually take steps to facilitate access and use of their data by academic researchers, others have come to restrict public data access through APIs, thereby limiting the opportunities for collecting datasets of high quality. While some researchers have advocated partnerships with social media companies as a possible way of gaining access to research-grade data, mitigating risks related to current constraints on reliability and reproducibility, and preserving user privacy (Puschmann, Citation2019), others have advocated the development of dedicated data collection solutions independent of the access provided by platforms (Freelon, Citation2018). These examples illustrate that such data hold not only potential but also significant challenges. This makes it an area that increasingly requires interdisciplinary teams of computer and social scientists (King, Citation2011).
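
For readers unfamiliar with API-based collection, the sketch below shows the general shape of such a collector – paginated requests with basic rate-limit handling – against a deliberately hypothetical endpoint; it does not correspond to any real platform’s API.

```python
# Sketch: paginated collection from a hypothetical public API, with basic
# rate-limit handling. The endpoint, parameters, and response fields are
# placeholders, not any real platform's API.
import time
import requests

BASE_URL = "https://api.example.org/v1/posts"  # hypothetical endpoint

def collect_posts(query: str, max_pages: int = 10) -> list:
    posts, cursor = [], None
    for _ in range(max_pages):
        params = {"q": query}
        if cursor:
            params["cursor"] = cursor
        response = requests.get(BASE_URL, params=params, timeout=30)
        if response.status_code == 429:   # rate limited: wait, then retry
            time.sleep(60)
            continue
        response.raise_for_status()
        payload = response.json()
        posts.extend(payload.get("data", []))
        cursor = payload.get("next_cursor")
        if not cursor:                     # no further pages
            break
    return posts
```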

Large datasets also bring considerable concerns over privacy and data ownership, and with them unresolved issues of research transparency and replicability. In practice, it is likely that most users of online services remain unaware that their public contributions and interactions – as well as associated metadata – might be visible to others and subject to research projects in which these data are often used to infer their preferences, traits, or characteristics. This is even more true for the data from digitally enabled sensors or Internet of Things devices – all data sources of growing interest. Additionally, datasets with many data points per individual make it significantly harder to guarantee that individuals cannot be identified. This makes it difficult for companies to provide researchers with access to data (King & Persily, Citation2019; Levi & Rajala, Citation2020) and also raises issues after the completion of projects. While there is an increasing awareness in the sciences of transparent research practices and of providing access to the data underlying research projects (Christensen et al., Citation2019), privacy concerns and the often-proprietary nature of the underlying data make it much harder to establish similar standards. This often leads to papers remaining opaque when compared to developing transparency standards in the social sciences (Jungherr et al., Citation2020a).

Links: Theory and the Semantics of Digital Signals

A more hidden challenge in CSS, and therefore by extension in computational communication, lies in linking data and results to phenomena of interest and to relevant theories (Jungherr et al., Citation2020a). Social science is a field of highly context-dependent findings, making grand-theorizing untenable (Flyvbjerg, Citation2001). Still, linking studies to existing theoretical mechanisms that have been shown to be at work when it comes to certain processes, phenomena, or behaviors allows researchers to build plausible research puzzles and establish a link to previous knowledge. This enables an assessment of what part of our previous understanding of reality is supported or contradicted by novel findings. Only this active linking between established theoretical ideas and novel research environments, phenomena, and methods will allow for the emergence of a cumulative body of evidence instead of a palimpsest of ill-connected and isolated findings (Schroeder, Citation2019).

The power of theory-driven research in computational communication can be illustrated by a recent study that also illustrates the aspect of customization with regard to research design and analysis that we discussed earlier. Much of the current debate about the downsides of social media use centers on the claim that it creates “echo chambers”, and that as a result people are not sufficiently exposed to information contradicting their prior beliefs. Scholars concerned about the adverse effects of echo chambers suggest that communication with people “from the other side” can enhance exposure to diverse viewpoints, thereby reducing polarization. In a recent study, Bail et al. (Citation2018) assess this hypothesis, but also go further and theorize the existence of a rival mechanism according to which such interactions have a backfire effect. To address this puzzle, they deployed an ingenious experimental research design that combined survey research, bot technology, and Twitter data. Their results revealed significant partisan differences in backfire effects, opening up new avenues for further exploring the mechanisms behind the backfire effect observed among Republicans in particular, which their study itself was not able to reveal. Despite its limitations (discussed extensively by the authors), this theory-driven study is instructive in its crafty deployment of computational methods to better understand specific communication processes.

Experimental studies, such as this one, are a highly promising arena in computational methods-powered political communication research, in which treatments can be rolled out in realistic environments to unprecedented numbers of participants (Bail et al., Citation2018; Leeper, Citation2020; Salganik & Watts, Citation2009; Siegel & Badaan, Citation2020). But while this allows for high experimental control and the identification of small effect sizes, once the number of available observations increases, researchers also have to adjust the criteria by which they interpret results (Japec et al., Citation2015).

One example of the necessity of this adjustment is offered by Bond et al. (Citation2012). The authors present a highly innovative experimental study in which they ran an experiment with 61 million Facebook users, a selection of whom were shown information about whether their friends had indicated that they had voted in a US election. The authors identified very small effects of a specific variation of this information treatment. While they are careful in mentioning this as a limitation in the text of their paper, in the abstract and conclusion they speak prominently about the success of influencing people through information on Facebook. This, and not the more careful reading accounting for effect sizes, has come to dominate references to this paper in both scientific and public discourse. Here, the cautioning by the authors themselves goes out of the window and the study is predominantly cited as evidence for the tremendous manipulative power of Facebook in political communication and elections. In popular reception the high number of participants might thus have worked as a cue of the importance of the findings, when in fact the large participant count waters down the relevance of the reported statistical significance. This prominent study illustrates the necessity for researchers to adjust their reporting practices to the new conditions of big datasets.
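
A back-of-the-envelope calculation makes the point: with tens of millions of participants, a difference of a fraction of a percentage point is overwhelmingly statistically significant, which by itself says little about substantive importance. The figures below are illustrative, not those reported by Bond et al.

```python
# Illustration: with very large samples, tiny differences are "significant".
# Group sizes and turnout rates below are illustrative, not Bond et al.'s figures.
import math
from scipy.stats import norm

n_treat, n_control = 30_000_000, 30_000_000
p_treat, p_control = 0.1839, 0.1800   # hypothetical turnout rates, 0.39 pp apart

# Two-proportion z-test computed by hand.
p_pool = (p_treat * n_treat + p_control * n_control) / (n_treat + n_control)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_control))
z = (p_treat - p_control) / se
p_value = 2 * norm.sf(abs(z))

print(f"difference: {p_treat - p_control:.4f}, z = {z:.1f}, p = {p_value:.2e}")
# A difference of 0.39 percentage points yields an enormous z statistic, so
# statistical significance alone is a poor cue for substantive importance.
```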

Beyond theory, there is another link currently predominantly neglected in CSS, the one between signals found in data and the phenomena of interest to a study: the semantics of digital signals (Jungherr, Citation2019). Consider the following example that should be familiar to political communication scholars. Anyone doing research on social media knows that any given data point is a symbolic representation. It could represent something direct and unambiguous, such as, for example, when a user clicks “like” below a friend’s Facebook post expressing solidarity with the Black Lives Matter movement. But the data point could also represent something more indirect. For example, the “like” on Facebook could be an expression of support for a statement voiced by another user, it could represent agreement with a factual or interpretative statement, it could be an expression of sympathy toward another user, or an act of social capital maintenance totally devoid of any direct connection to the content of the liked post. While one could argue that survey researchers are often faced with similar issues, linking signals with behaviors in social media environments is a much riskier endeavor. This is not only because of the diverse architectures of the platforms and the multiple social cues afforded by the specific activities offered by each platform’s embedded functions. It is also because of the hard-to-measure error associated with the meaning assigned to specific acts. The semantics of signal and represented object might vary not only over time but also between types of content (e.g., text, images, or video) and services, especially given what we know about variation in platform affordances (Jungherr & Jürgens, Citation2013). This makes it crucially important for researchers to explicitly state their interpretation of the relationships between signals and objects of representation in order to foreground their underlying assumptions. Currently, the phenomena directly represented by digital traces are often conveniently ignored. Instead, scholars tend to project their interests onto signals found in digital trace data without worrying too much about establishing the link between their chosen signal and the phenomenon of interest (Jungherr et al., Citation2017).

Labels: Can We Infer People’s Traits Based on Digital Traces?

The practice of labeling in CSS is tightly connected to the issues arising from establishing the semantic link between signals in data and the phenomena they are supposed to represent. One prominent feature in CSS is the attribution of labels to individuals or digital avatars based on behavior manifested in their digital traces. Examples abound. Activity on social media has been used to label users according to their political ideology (Barberá, Citation2015), psychological traits (Azucar et al., Citation2018), mental health (Chancellor & Choudhury, Citation2020), or the authenticity of their account (Rauchfleisch & Kaiser, Citation2020). Labels are powerful tools in CSS as they allow large-scale automated assignment of interventions based on perceived traits or preferences of users. In this, they resemble scoring procedures in other fields (Citron & Pasquale, Citation2014). Unsurprisingly, this raises a host of concerns and demands on researchers providing labeling propositions and solutions.

For academic researchers it is completely legitimate to identify correlations between signals in digital traces and other metrics documenting traits, preferences, or expected behavior by individuals. It is something else entirely if these labeling solutions become the basis of business models or policy interventions. For this, as the Cambridge Analytica case forcefully demonstrates, much greater scrutiny and public oversight is needed. For one, the discussion about labeling tends to focus on its surprising ease. Seemingly successful cases, once established in the public imagination, turn out to be hard to dislodge even if academic discussion moves beyond early enthusiasm to a more critical stance. Take the discussion about the supposed prevalence of (semi-)automated accounts – so-called bots – in public discourse (Schneier, Citation2020). Here public and political imagination seems obsessed with visions of public debates in online spaces being overrun by manipulative and inauthentic accounts pushing narratives challenging to the political status quo. While methods of labeling social media accounts as bots abound (Varol et al., Citation2017), findings have been mixed (Keller et al., Citation2020). Increasingly, early enthusiasm is replaced by skepticism. Careful examination shows that methods labeling accounts as bots have not proved to be reliable, with mislabeling of actual authentic and legitimate accounts as bots (false-positive) and strong temporal decay in precision (out of sample prediction) (Rauchfleisch & Kaiser, Citation2020).
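
A simple base-rate calculation shows why such false positives matter in practice; the prevalence and accuracy figures below are illustrative assumptions, not estimates taken from the cited studies.

```python
# Illustration: even a seemingly accurate bot classifier mislabels many humans
# when bots are rare. Prevalence and accuracy figures are illustrative assumptions.
prevalence = 0.05     # assume 5% of accounts in a sample are actually bots
sensitivity = 0.90    # share of true bots the classifier flags
specificity = 0.95    # share of humans the classifier correctly leaves alone
n_accounts = 100_000

true_bots = prevalence * n_accounts
humans = n_accounts - true_bots

flagged_bots = sensitivity * true_bots          # true positives
flagged_humans = (1 - specificity) * humans     # false positives

precision = flagged_bots / (flagged_bots + flagged_humans)
print(f"Accounts flagged as bots: {flagged_bots + flagged_humans:,.0f}")
print(f"Share of flagged accounts that are actually bots: {precision:.1%}")
# Under these assumptions, roughly half of all flagged "bots" are in fact humans,
# which is why downstream interventions based on such labels are risky.
```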

Labeling undoubtedly matters (Pasquale, Citation2015), and automatically labeling social media users as bots or as supporters of a political candidate brings particular risks. While these risks remain manageable when labeling efforts stay within the confines of academic papers, they grow exponentially once unvetted and unsupervised labeling solutions are deployed widely on online platforms and become the basis for the roll-out of automated interventions. In the current heightened political climate, and given the opaque governance practices of platforms, the scholarly community should be doubly careful with regard to how labels are assigned, how they are audited, and which interventions they serve as the basis of. Silencing automated accounts might be legitimate in specific circumstances. Silencing users who for some reason or other have been labeled as “bots” is decidedly less so.

As it stands, CSS has not yet sufficiently reflected this ethical responsibility in its practices. Overall, the case of bot detection illustrates the need for CSS to demand much more vigilant reliability, validity, and robustness checks of proposed labeling procedures, as overly enthusiastic prototypes might produce societal effects that become difficult to curtail once the collective imagination takes possession of them.

The Shock of the New: Theory-driven Work as Stabilizer

Digital technology has expanded the reach of everyone and changed the composition and processes in social systems. While some political communication processes and phenomena can be parsed surprisingly well with existing scientific theories – see for example, the rich theoretical literature on digital media and polarization – others are inherently new or at least sufficiently different with regard to dynamics, reach, or effects as to demand new conceptualization and potentially even new theories (Neuman, Citation2016; Schroeder, Citation2018). Examples include the discussions on disinformation (Lazer et al., Citation2018) or the nature and effects of uncivil discourse in online communication spaces (Munger, Citation2017; Theocharis et al., Citation2020, Citation2016). In a fluid, high-choice, and complex political information environment, political communication researchers are therefore especially well-placed to employ theory-driven designs, as discussed above, to allow for the linking of findings to established discourses, but also to drive theorizing in order for the field to account for the changes to human behavior, institutions, and social structures we are living through.

At the time of writing, scholars are actively working on the impact of the COVID-19 pandemic. An event of momentous significance, it has elevated scientific and public communication to new heights. It is also triggering concerns about a number of issues, including how people change their information diets in response to the crisis, how racism and xenophobia in online conversations are impacting the health of online debates, and how much misinformation is out there and what its consequences for public health might be. In times of social isolation, media consumption of all forms is bound to increase and the pandemic will, no doubt, offer itself for exciting research in the years to come, not least because never before has a singular and life-threatening event dominated media and communication for so long. That computational methods are going to play an important role in research associated with the event is not in doubt.

But, momentous though the COVID-19 pandemic may be, our existing understanding of political communication processes during major events (especially if it only pertains to public behavior on Twitter) does not exactly present us with momentous theoretical problems. We already have, for example, expectations and solid knowledge about conversation dynamics and information diffusion on Twitter in particular. We are also well informed as to why social media could prove pivotal for enabling groups and individuals to organize solidarity and collective action in their neighborhoods, why certain types of misinformation will potentially polarize certain age groups but not others, and why social media affordances might contribute to certain individuals garnering support using falsehoods instead of facing complete demolition in the polls. Our aim here is not to pre-judge, or even give an indication of, what is good and bad research. Nor is it to say do this research and not the other, or use this set of theories and tools and not the others. Rather, our goal is to point out possible pitfalls in certain approaches that have become increasingly common, and to explain the characteristics of research that we believe could, by now, be better represented and which can help the field address important questions in a more interesting manner.

The flip-side of theorizing, especially in new and unfamiliar environments or contexts, is the need for extensive and diligent descriptive work in order to map new phenomena widely and systematically (Swedberg, Citation2014). An inherently welcome maturation process in the social sciences over the last decades has, in some areas, given rise to the neglect and active discouragement of descriptive work. While this might be justified for areas in which others are doing the descriptive heavy lifting – such as journalists, lawyers, or historians – in CSS it would be a mistake to adopt this attitude. The task of mapping the effects of the digital transformation in various fields and across cultural, temporal, or national contexts is far from trivial and is crucial to public understanding and the further development of the field.

Practices: The Key Role of Infrastructures, Institutions, Ethics, and Training

CSS, an interdisciplinary field at the borders of various social sciences, computer science, and even some natural sciences, is having a significant impact on research practices in the social sciences and political communication in particular. While some communication is happening at the borders between researchers coming from these fields, in practice everyone brings the practices and standards of their original field to new endeavors in CSS. It thus comes as little surprise that CSS is not dominated by a coherent theoretical tradition, specific methods, or datasets. Instead we find a myriad of approaches and standards at work. This makes it difficult to find a coherent language and to develop a framework under which empirical findings from different traditions, following different standards, can contribute to a cumulative account of research (Schroeder, Citation2019).

While the dominant overviews of CSS clearly reflect the interdisciplinary nature of the field, in practice this is difficult to realize in research teams (Gilardi et al., Citation2020). While there are a number of high-profile CSS groups – predominantly in the USA – that are able to assemble dedicated interdisciplinary teams, in most academic contexts field-specific hiring practices make this difficult. In practice, we therefore either find loose interdisciplinary assemblages of research groups that within themselves remain more or less homogenous, or research groups situated in one field that try to pick up necessary skills from different fields on the fly. Of these three options, the first – the establishment of dedicated interdisciplinary research groups – appears the most promising with regard to developing CSS as a coherent field and allowing work on tough challenges. At the same time, employment in such a team might be risky for PhD students and post-docs, as at present it is unclear whether comparable teams will spring up in sufficient numbers to provide further employment opportunities and whether more traditional job searches will recognize their experience in interdisciplinary teams as valuable. The second option – loose interdisciplinary alliances between in themselves homogenous research groups – is somewhat risky with regard to the fragility of these efforts but still contributes to an interdisciplinary dialogue and potential standardization of CSS as a field. It also brings somewhat smaller risks for PhD students or post-docs, as they might be working on interdisciplinary projects while maintaining their affiliation with a clearly identifiable unit in their respective field. The third option – a homogenous group picking up skills from other areas on a need-to-know basis – is probably the most fragile option and at the same time contributes little if anything to a standardization of CSS as an interdisciplinary field.

We stress here that true interdisciplinary research and teaching is not only difficult to attain practically, but requires institutional open-mindedness and resources which, rhetoric aside, few institutions are willing or able to provide. It is no secret that interdisciplinary research requires not only bold decisions with an eye to the future, but also the funding of possibly high-risk, experimental initiatives – and these are both aspects that some academic systems privilege (and are able to financially sustain) much more than others. It is unsurprising that much of the development in computational communication begins its ambitious trajectory in the USA, where public but also private funding is responsible for the establishment and evolution of a series of excellent labs and centers that produce cutting-edge work. This trajectory rarely continues to Europe and elsewhere, where not only is funding for such initiatives harder to come by, but where the few existing centers or labs have mostly been established in a handful of highly prestigious and heavily funded institutions. As an example, of the “39 women doing amazing research in computational social science”, according to a Sage Ocean piece on diversity published in October 2018, 28 are based in the USA, and of the rest, six are based at Oxford, Cambridge, the London School of Economics, and the newly founded Alan Turing Institute – the UK’s prestigious national institute for data science and artificial intelligence.

Troublingly, the most publicly visible work in CSS is consistently based on proprietary data that researchers gained access to through privileged partnerships with digital platforms. This is disconcerting for the future of the field for various reasons. For one, the need for proprietary data to do research reinforces existing power imbalances. While researchers at Berkeley, Stanford, or MIT can rely on a strong alumni network to gain access and trust within companies providing digital platforms (Minsky, Citation2016), researchers from Europe and elsewhere cannot rely on this sort of access and are therefore consistently worse off than their well-resourced colleagues. This contrasts strongly with the cultural background and identity of the vast majority of international platform users. The underrepresentation of researchers from, e.g., Asia and India inevitably leads to a bias in research attention toward the uses of platforms in Western democracies, especially the USA. This is of increasing importance as the USA continues to chart a very specific course, and as a result CSS risks speaking predominantly to very specific momentary concerns of this particular country.

Even more worrying, studies based on proprietary data cannot be externally replicated. This is deeply problematic. For one, CSS is an emerging field in which standards might shift over time. So even the most well-intentioned and most carefully designed study might need revisiting a few years after publication, following a shift in standards or an increased sensitivity toward potential biases in data collection and analysis. Without a transparent replication regime this is not feasible. The reliance on proprietary data might thus actually endanger methodological progress in the field, and the ongoing conversation about solutions on that front is to be greatly encouraged. At the same time, opaqueness with regard to the underlying data and its selection process makes trusting the findings essentially a faith-based decision. As companies providing access to said data are self-interested entities, one might be forgiven for thinking this a weak criterion. More generally, in its reliance on access provided by companies whose conduct, governance processes, and business models are themselves the subject of researchers’ findings, CSS is a field deeply mired in conflicts of interest between researchers, companies, and governments. As of now, the field has neglected to account for these conflicts and to develop standards to make them transparent or avoid them (Jungherr et al., Citation2020a). This is a fundamental challenge for the subsequent maturation of CSS.

The interdisciplinary nature of the field also raises challenges for the review process. A researcher coming from a communication science background will review a paper written by a computer scientist based on the standards and practices of her field. She is therefore likely to find that paper falling short of those standards, while a reviewer coming from computer science might have found it ready for publication. Our own experience with the review process in this special issue, in which reviewers came not from computer science but almost exclusively from the fields of political science and communication, already demonstrates that it is unlikely for CSS to develop a coherent and uniform core of theories, methods, and practices soon. Our sense is that it is necessary for editors and reviewers to reflect on these challenges and accordingly review papers with a somewhat broader mind than if they were reviewing an article at the core of their own field.

Finally, this challenge also arises with regard to education in CSS. How can one avoid re-training social scientists into mediocre coders and computer scientists into mediocre social scientists? What is the right balance in providing a common core of CSS, so as to allow practitioners to use a common language and have a shared understanding of underlying challenges, while also allowing them to diverge in order to develop necessary specializations in theory, research design, and method? These are questions to which the field has no coherent answer yet, but they are of fundamental importance to its maturation process.

Asking Better Questions: The Potential of CSS in Political Communication

The availability of new types of data and the development of computational tools and methods to make sense of them allow a multitude of political communication processes to be investigated from perspectives that were previously impossible. More detailed and more accurate information about people’s news consumption and media diets can be acquired by matching individual-level data with data harvested via web-browser or social media trackers. This invites new insights as to how exposure to different types of content might be affecting different types of behavior, such as political participation, and attitudes, such as media trust, and can help inspect a number of classic media effects theories such as framing, priming, or agenda-setting in a new and more detailed light (Jungherr et al., Citation2019). This can allow scholars to better understand changes in the traditional gatekeeping role of legacy media and professional journalists in an increasingly rich and competitive media ecology.

Combining different types of data also provides far more refined insight into how political information and content impact individuals differently (Popa et al., Citation2020; Scharkow et al., Citation2020; Wells & Thorson, Citation2017), possibly exacerbating already existing inequalities in access to high-quality political information. Designs employing digital trace and individual-level data are, similarly, able to answer a number of new questions related to political communication during electoral campaigns. These questions range from how people deploy communication strategies and language strategically to mobilize others, to how people are affected by such communication as they watch political debates and rallies.

Research into the communication strategies of social movements is also enriched by being able to look into patterns of diffusion of information across networks (Mercea & Bastos, Citation2016). New ways of sharing humorous political content originating from talk shows or social media memes can now be captured more precisely using digital trace data, and the impact of political humor can be better understood. Importantly, due to the more easily traceable textual and visual character of political discussion online, political communication scholars can today, assisted by a host of (automated) text analysis tools and methods, study in great detail political disagreement and the effects this might have not only on polarization, but also on uncivil behavior. Manifestations of intolerance in human communication, such as racism, misogyny, and homophobia – all extremely difficult to measure using surveys – can now be observed on social media and analyzed using what are by now common mining methods, as well as a multitude of sophisticated text and network analysis methods, in order to understand their effects (Benoit, Citation2020).
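
As a schematic illustration of what such an analysis pipeline can look like, the sketch below trains a minimal supervised classifier on a handful of invented, hand-labeled comments; real applications require large, carefully validated training sets.

```python
# Sketch: a minimal supervised classifier for uncivil comments.
# The toy examples and labels are invented for illustration; real work requires
# a large, carefully validated, human-coded training set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "I disagree with this policy, here is why ...",
    "You are an idiot and should shut up",
    "Interesting point, thanks for sharing the source",
    "People like you are ruining this country, get lost",
]
labels = [0, 1, 0, 1]  # 1 = uncivil, 0 = civil (hand-coded)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

new_comments = ["Thanks, that was a thoughtful reply", "Shut up, you clown"]
print(model.predict(new_comments))
```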

Building on this rich portfolio, our goal in this special issue was to attract contributions that help illustrate the kinds of problems computational methods help political communication scholars solve, and what theories they help us advance. We were also interested in contributions illustrating applications of computational methods for addressing major questions pertaining to a number of different topics. We were, finally, interested in garnering insights from scholars working in the field of computational methods who could discuss their experiences pertaining to the interdisciplinary challenges in the field. We received a large number of high-quality submissions and are proud to present six full-scale research articles and two forum contributions. We are especially glad that the collection of articles presented here manages to avoid some of the cultural and resource-driven biases in CSS reported above.

In their contribution to this Special Issue, Lu and Pan theorize that the expansion of social media use among the Chinese public has made it pertinent for the government to expand its propaganda proliferation strategies in ways that are different in kind to classic propaganda dissemination. Their study, which is one of the first demonstrating the benefits of combining ethnographic and computational methods, reveals the central role of metrics among Chinese propagandists and offers a number of novel insights on the different types of content used for achieving reach.

One of the most exciting advances in political communication research is the gradual entry of images as objects of analysis. The potentials – but also the caveats and the extra caution needed when it comes to validation – of image analysis are powerfully demonstrated by two studies in this Special Issue. In the first one, Haim and Jungblut look into candidate imagery during the 2019 European Parliamentary Election using a comparative dataset with candidates across all 28 European Union member states. They provide a first, large-scale descriptive analysis and exploration of variation in the visual communication of candidates across different platforms, demonstrate the value of descriptive analysis in CSS that we discussed earlier, and illustrate a number of different aspects of visual communication pertaining to non-verbal behavior. While the study’s methodological approach involves third-party tools that are not only mired in a number of concerns (van Atteveldt & Peng, Citation2018) but also diverge from our understanding of CSS as discussed in this contribution, it nevertheless demonstrates how off-the-shelf providers for visual analysis can be applied in the study of visual communication and makes a strong case for using such tools with strong validation.

Boussalis and Coan’s study asks to what extent nonverbal signals from candidates during televised debates might influence how voters form their support. Drawing on literatures on the morphological features of candidates and the effects of politicians’ facial signals, they theorize that specific emotional displays and facial expressions of candidates in televised debates might influence voter support. Combining frame-level facial display data of political candidates in the US with second-by-second continuous response measures of viewer reactions to debate participants, they find that facial signals of emotion by participants in televised debates may influence how viewers evaluate candidate performance.

Yarchi, Baden, and Kligler-Vilenchik address the important question of political polarization on social media platforms. The authors examine patterns of political talk on three online platforms around a political controversy in Israel. They find strong differences between the patterns identified on the three platforms, with Twitter showing the strongest evidence of polarization across the three measures, interactions on WhatsApp becoming depolarized over time, and Facebook showing the weakest evidence of polarization. This article provides a stark warning against concluding that digital media drive political polarization on the basis of single-platform studies, especially those based on Twitter.

The contributions by Dun, Soroka, and Wlezien and by Nicholls and Culpepper are better understood as speaking to the applications section of this special issue. The study by Dun, Soroka, and Wlezien is situated within classic political communication approaches involving content analysis of media coverage, in this case of US defense spending. Relying on a large longitudinal corpus of roughly 2 million articles published between 1980 and 2018, they apply what they label a dictionary-plus-supervised-learning approach. The results raise new questions as to whether machine learning brings sufficient additional benefits, but the authors propose that the two approaches need not compete and offer ways of combining them.

One of the most prominent approaches to the study of political text is the analysis of frames. Here, the contribution by Nicholls and Culpepper offers interesting new perspectives. The automated discovery of frames is a thorny issue in the analysis of text. Often, researchers choose one approach without necessarily justifying their choice or providing comparisons with other methods. Nicholls and Culpepper illustrate the drawbacks of this practice by testing the performance of three different procedures for the automated identification of frames. They show that the quality of the approaches varies with the nature of the corpus and with different conceptual aspects of frames, and offer a powerful reminder that computational methods are not plug-and-play devices that can be deployed without adjustments.

We were happy to have two forum contributions addressing different challenging aspects of interdisciplinary research. Windsor’s forum account is important in highlighting the complexities of setting up an interdisciplinary lab and developing a common language with scholars from computer science, and illustrates this in practice through a discussion of how different disciplines interpret and operationalize “cohesion.” Van Atteveldt, Althaus, and Wessler discuss the many issues emerging in collaborative endeavors that necessitate data sharing. As data sharing in CSS is often governed by copyright law and terms-of-service contracts, sustainable and ethical solutions that foster comparative work are very challenging, and their experience with short-term approaches is a useful guide for anyone beginning such endeavors with an interest in mitigating these problems.

By opening so many new avenues into political communication research, computational tools and methods quite clearly hold broad appeal for political communication scholars. Yet, despite this broad appeal, CSS is not only still limited but is also characterized by serious inequalities, which only seem to grow over time. Why? We believe there are two reasons. For one, despite deceptively easy entry points, CSS has a steep learning curve when it comes to acquiring competencies in computational methods that go beyond the use of out-of-the-box solutions. The second reason is the lack of comprehensive training in computational methods in the social sciences, which is itself partly an outcome of too few people with this cross-disciplinary expertise being hired.

We hope that this Special Issue takes a first step toward demonstrating not only the challenges but also the broad and multifaceted application, and thereby appeal, of computational methods for the political communication community. Computational methods allow researchers to approach, from different angles and in diverse ways, a multitude of existing problems and research puzzles particular to an era so strongly shaped by digital media. As we have shown, computational political communication, CCS, and CSS may differ in the breadth of topics they cover, but they do not differ in the underlying challenges of creating an interdisciplinary field at the borders of the social sciences, computer science, and the natural sciences. While the temptation to create subfields of subfields is strong, the field should carefully contemplate this development. There are still too few people working at this intersection to begin with. Splitting these few further into silos risks slowing the development of interdisciplinary standards in favor of the emergence of various subfield-specific practices in the use of computational methods. While such a development might increase the speed with which computational methods are accepted in specific social science subfields, a byproduct could be that work on the hard questions of establishing interdisciplinary practices between social, computer, and natural scientists is pushed to the sidelines. We suspect that this could have consequences for the development of fresh methods and approaches to problems, and could strengthen reliance on ready-made computational solutions in the social sciences.

We see this Special Issue as a conversation-starter on why computational methods hold broad appeal for communication scholars rather than being the limited domain of highly technically skilled researchers. We believe that this can only be achieved by recognizing the necessity of interdisciplinary work, by keeping in perspective what is new and what is not, and by acknowledging that the pitfalls and boundaries of communication research relying on computational methods are not unlike those faced by the broader field of CSS.

Acknowledgments

The authors would like to thank Valeska Gerstung, Zoltan Fazekas, Sebastian Adrian Popa, Oliver Posegga, Spyros Kosmidis, Ralph Schroeder, Cristian Vaccari, Stefanie Walter, Chris Wlezien and the Editor for reading draft versions of this manuscript and providing immensely helpful comments. Special thanks go to the Editor, Claes de Vreese, for his guidance and support throughout the entire process of editing the special issue “Computational Political Communication: Theory, Applications and Interdisciplinary Challenges”.

Disclosure Statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Yannis Theocharis

Yannis Theocharis is Professor of Media and Communication with focus on Innovative Methods at the Centre for Media, Communication and Information Research (ZeMKI), University of Bremen.

Andreas Jungherr

Andreas Jungherr is Professor of Communication Science with special focus on Digital Transformation and Publics at the Friedrich-Schiller-University Jena.

Notes

1. https://ocean.sagepub.com/blog/2018/9/28/39-women-doing-amazing-research-in-computational-social-science

References