International Interactions
Empirical and Theoretical Research in International Relations
Volume 46, 2020 - Issue 6
Research Note

Known unknowns: media bias in the reporting of political violence

Pages 1043-1060 | Published online: 26 Oct 2020
 

ABSTRACT

How does sourcing affect which events are included in international relations datasets? The increasing number of machine-coded datasets offers the promise of coding a larger corpus of documents more quickly, but existing automated processes rely exclusively on databases of news reports for coverage. We exploit source variation in the UCDP GED dataset, which includes events from media reports and non-media sources, to explore the bias introduced by including only media reports in international relations datasets. Unlike previous studies, our approach allows us to compare subnational and cross-national determinants of bias. We find that media sources severely underreport events in African countries, and coverage is also associated with country-level factors like international trade and subnational factors like access to communication technology. Non-media sources cover a significant number of events not included in media sources; their inclusion can expand coverage and reduce bias in datasets.

Acknowledgments

We would like to thank Mihai Croicu, Anita Gohdes, Nils Metternich, Nils Weidmann, and Yuri Zhukov for excellent advice. Replication materials are available on Dataverse at http://dvn.iq.harvard.edu/dvn/dv/internationalinteractions. Please direct questions about replication to Nick Dietrich at [email protected].

Supplementary Material

Supplemental data for this article can be accessed on the publisher’s website.

Notes

1 Multiple systems estimation offers an alternative to media-based reporting (Lum, Price, and Banks 2013) but is only feasible for limited empirical domains (i.e., country or subnational levels). It is therefore not an option for researchers interested in large cross-national analyses.

2 This is true with two exceptions: India and Syria. UCDP has determined that the non-media sources used in coding those countries (the South Asia Terrorism Portal and the Syrian Observatory for Human Rights, respectively) surpass media coverage in their detail and scope, and it therefore codes these non-media sources first. These two countries are dropped from our analysis.

3 We address the issue of source language in the online appendix.

4 We are unable to draw the opposite conclusion; events with only media sources may or may not also be reported in non-media sources. If an event has already been coded with a media source, no further action is taken when the same event is found in a non-media source.
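
The asymmetry in this note can be illustrated with a minimal sketch, assuming a simplified media-first coding rule. The `Report` class, its field names, and both functions are hypothetical and are not part of the UCDP pipeline or the replication materials.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Report:
    """One source report about an event (hypothetical structure)."""
    event_id: str
    source_type: str  # "media" or "non-media"


def code_events(reports: Iterable[Report]) -> Dict[str, str]:
    """Record one source type per event, consulting media sources first.

    An event already coded from a media report is never updated when a
    non-media report of the same event turns up later.
    """
    reports = list(reports)  # iterated twice below
    recorded: Dict[str, str] = {}
    for r in reports:  # first pass: media sources
        if r.source_type == "media":
            recorded.setdefault(r.event_id, "media")
    for r in reports:  # second pass: non-media sources
        if r.source_type == "non-media":
            recorded.setdefault(r.event_id, "non-media")
    return recorded


def known_missed_by_media(recorded: Dict[str, str]) -> Set[str]:
    """Events coded from a non-media source were not found in media sources."""
    return {eid for eid, src in recorded.items() if src == "non-media"}


reports = [
    Report("A", "media"),
    Report("A", "non-media"),  # ignored: A is already coded from media
    Report("B", "non-media"),
    Report("C", "media"),
]
recorded = code_events(reports)
missed = known_missed_by_media(recorded)
```

In this toy input, events A and C are both recorded as media events even though only A also appears in a non-media source. The recorded data cannot distinguish the two cases, which is why the inference runs in one direction only.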

5 This problem is not unique to our data; previous research has also struggled with using non-media sources as a benchmark because they do not constitute true population data. Davenport and Ball (2002) is one of the few sources to acknowledge that coverage of conflict events by non-media sources is nonrandom. SIGACTS, for instance, includes events declassified by the U.S. Department of Defense, but not all events are equally likely to be declassified; it would be reasonable to expect that covert actions and special operations are likely to be classified at a higher level.

6 This is a sample of 25,180 events.

7 Some readers may prefer a less complicated method. Although we believe that random forests are appropriate, we validate the findings of the random forests with a logistic regression in the online appendix. The results show significant differences at the 95% level for 32 of the 40 variables we tested and affirm the primary findings of the random forests.

8 We use this measure instead of the Strobl et al. (2008) conditional variable importance score for two reasons. First, patterns of missing data are a known problem in international relations (Lemke 2003, 120), and many of our covariates have a significant number of missing values. These values are unlikely to be missing completely at random, making complete case analysis inappropriate. Even if values were missing at random conditional on the other covariates, the Hapfelmeier et al. scores would be more accurate than scores obtained from imputation because they account for the prevalence of missing data (Hapfelmeier et al. 2014, 21). Second, we prefer the Hapfelmeier et al. scores because our analysis is intended to be descriptive. Our goal is to inform practitioners about events most associated with media bias. Conditioning on other variables would change the interpretation of these scores and likely result in misspecification because we do not have a theoretically derived model of the causal process.

9 For information on these categories, see http://pcr.uu.se/research/ucdp/definitions/

10 Our intuition, however, is that the risk set for this uncertainty is circumscribed to conflict zones: we have high confidence that most countries truly have no conflict events (e.g. Sweden, Norway), but in countries with ongoing armed conflict there is uncertainty regarding the scope of reporting accuracy.
