International Interactions
Empirical and Theoretical Research in International Relations
Volume 48, 2022 - Issue 5
International Conflict

When the levee breaks: A forecasting model of violent and nonviolent dissent

Pages 997-1026 | Received 05 Jan 2021, Accepted 23 May 2022, Published online: 08 Aug 2022
 

Abstract

Forecasting major political conflicts is a long-standing interest in conflict research. However, the literature thus far has focused almost exclusively on armed conflicts such as civil wars. Attempts to forecast primarily unarmed conflicts have yet to identify a model able to forecast such uprisings with a high degree of accuracy. This thorny forecasting problem may in part be due to the literature's heavy focus on parametric forecasting methods and the relatively rare testing and comparison of a wide range of forecasting algorithms. This paper addresses these gaps in the literature by developing the first unified forecasting model of both major armed and unarmed conflicts at the country-year level, based on extensive training, cross-validation, and comparison of eight machine learning algorithms and five forecasting ensembles. To inform our forecasting models, we draw on two types of data: slow-moving structural factors, such as geography and level of economic development, and short-term political dynamics captured by trends in events data. This approach significantly improves predictive power for both armed and unarmed conflict compared with methods commonly used in the literature, and it suggests that there is substantial room for improving forecasts of major political conflicts. However, our algorithms still forecast armed conflict significantly better than unarmed conflict, suggesting the need for continued theory development to inform future forecasting efforts in this area.



Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Acknowledgment

The authors are grateful for feedback from Matthew DiGiuseppe, John Gledhill, Joop van Holsteyn, Corinna Jentzsch, Daniel C. Thomas, and participants in our panels at the Conflict Research Society annual conference, the Tinbergen European Peace Science Conference (The Hague, The Netherlands), and the Political Studies Association Political Methodology Group Annual Conference. We would also like to thank three anonymous reviewers and the editors for their invaluable comments.

Notes

1 We follow the distinction in Hegre et al. (2017, 113) between forecasting as “prediction about unrealized outcomes given model estimates from realized data” and prediction as “the assignment of probability distributions to realized or unrealized outcomes.”

2 See the articles in the special issue of the Journal of Peace Research edited by Hegre et al. (2017).

3 Models that include lagged outcome variables perform even better. See the empirical section for further details.
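A lagged outcome variable in a country-year panel is simply the previous year's value of the outcome for the same country. The sketch below is a minimal illustration, not the authors' code; it assumes rows are dictionaries with hypothetical keys `country`, `year`, and `onset` (a 0/1 conflict indicator), and it leaves the lag empty rather than bridging gaps when a country's prior year is missing.

```python
def add_lagged_outcome(rows):
    """Attach the previous year's outcome to each country-year row.

    rows: list of dicts with hypothetical keys "country", "year", "onset".
    The new key "onset_lag1" is None when the same country has no
    observation for year - 1, so gaps in the panel are not silently bridged.
    """
    # Index outcomes by (country, year) for constant-time lookups.
    by_key = {(r["country"], r["year"]): r["onset"] for r in rows}
    return [
        {**r, "onset_lag1": by_key.get((r["country"], r["year"] - 1))}
        for r in rows
    ]


panel = [
    {"country": "A", "year": 2000, "onset": 0},
    {"country": "A", "year": 2001, "onset": 1},
    {"country": "A", "year": 2003, "onset": 0},  # 2002 missing: lag stays None
    {"country": "B", "year": 2001, "onset": 0},
]
print([r["onset_lag1"] for r in add_lagged_outcome(panel)])  # [None, 0, None, None]
```

In a forecasting setup the lag is available at prediction time, which is why including it is legitimate as long as the outcome being forecast is for the following year.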

4 A focus on statistical significance as measured through low p-values also puts conflict research in danger of p-hacking, in which the researcher’s biases lead them to manipulate findings to achieve their preferred result. For seminal articles on p-hacking, see Bruns and Ioannidis (2016) and Ioannidis (2005).

5 See the complete definition, as well as the definitions of all its specific terms, at https://www.pcr.uu.se/research/ucdp/definitions/.

6 This timeframe was determined based on limitations in data availability.

7 Running the reported trained algorithms took about 12 hours on an Amazon Web Services virtual machine with 36 CPU cores and 72 gigabytes of RAM.

8 The number of possible structural variables to employ in conflict forecasting is rapidly proliferating. We selected variables that are well-supported predictors, commonly used across a wide variety of influential sources, and that have relatively little missing data. For studies that employ similar structural predictors, see Chenoweth and Ulfelder (2017) and Hegre et al. (2019).

9 $\mathrm{ELF} = 1 - \sum_i \pi_i^2$, where $\pi_i$ is group $i$'s population share.
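The index in note 9 can be read as the probability that two individuals drawn at random (with replacement) belong to different groups. A minimal sketch, assuming the input is a list of non-negative group shares summing to 1; the function name is hypothetical.

```python
def ethnic_fractionalization(shares):
    """Return ELF = 1 - sum_i pi_i**2 for a list of group population shares.

    `shares` is assumed to hold each group's share of the national
    population: non-negative values summing to 1.
    """
    if any(s < 0 for s in shares):
        raise ValueError("population shares must be non-negative")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    # Subtract the probability that two random draws share a group.
    return 1.0 - sum(s * s for s in shares)


# A homogeneous country scores 0; two equal-sized groups score 0.5.
print(ethnic_fractionalization([1.0]))       # 0.0
print(ethnic_fractionalization([0.5, 0.5]))  # 0.5
```

The index rises toward 1 as a population is split into many small groups of similar size.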

10 Similarly aggregated event counts of various types (both from ICEWS and from other sources) have been used in conflict forecasting by Blair and Sambanis (2020), Chenoweth and Ulfelder (2017), Chiba and Gleditsch (2017), and Ward et al. (2013), among others.

11 For a detailed discussion of these methods, see Hastie, Tibshirani, and Friedman (2013).

12 Specifically, precision-recall curves plot recall, the number of true positives divided by the sum of true positives and false negatives, along the x-axis, and precision, the number of true positives divided by the sum of true positives and false positives, along the y-axis.
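The curve described in note 12 is traced by sweeping a classification threshold from the highest predicted score downward and recording recall and precision at each step. The sketch below is illustrative only: the function names are hypothetical, and it summarizes the curve with step-wise average precision, $\sum_k (R_k - R_{k-1}) P_k$, which is one common estimator of the area under the precision-recall curve (the same definition documented for scikit-learn's `average_precision_score`). We do not assume this is the exact estimator behind the reported AUC-PR values.

```python
def precision_recall_curve(y_true, scores):
    """Return (recall, precision) lists, one point per distinct threshold.

    An observation is predicted positive when its score is >= the threshold;
    thresholds are swept from the highest score downward.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    recall, precision = [], []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all observations tied at this threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall.append(tp / n_pos)         # TP / (TP + FN)
        precision.append(tp / (tp + fp))  # TP / (TP + FP)
    return recall, precision


def average_precision(y_true, scores):
    """Step-wise area under the PR curve: sum of (R_k - R_{k-1}) * P_k."""
    recall, precision = precision_recall_curve(y_true, scores)
    area, prev_recall = 0.0, 0.0
    for r, p in zip(recall, precision):
        area += (r - prev_recall) * p
        prev_recall = r
    return area


y = [1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.2, 0.1]
print(round(average_precision(y, s), 4))  # 0.8333
```

Because a no-skill forecaster's AUC-PR is roughly the share of positive cases (0.4 in this toy example, far lower for rare conflict onsets), the measure is more sensitive than AUC-ROC to how well a model ranks the few true positives.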

13 For a detailed discussion of the advantages of the AUC-PR over the AUC-ROC, see Beger (2016). Alternative measures of model performance include Brier and F1 scores. We chose the AUC-PR over these measures for ease of comparability to the commonly employed and well-understood AUC-ROC. For a discussion of the advantages and disadvantages of various performance measures in conflict forecasting, see D’Orazio (2020).
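For comparison with the AUC-PR, the two alternative measures named in note 13 can be sketched as follows. The Brier score is the mean squared difference between predicted probabilities and 0/1 outcomes (lower is better); the F1 score is the harmonic mean of precision and recall, which, unlike the AUC-PR, requires choosing a classification threshold. The function names and the 0.5 default threshold are illustrative assumptions.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def f1_score(y_true, probs, threshold=0.5):
    """Harmonic mean of precision and recall after thresholding probabilities.

    The default threshold of 0.5 is an arbitrary assumption; rare-event
    forecasts often warrant a much lower cutoff.
    """
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yhat in zip(y_true, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(y_true, preds) if y == 1 and yhat == 0)
    if tp == 0:
        return 0.0  # avoids 0/0 when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y = [1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.2, 0.1]
print(round(brier_score(y, p), 3))  # 0.158
print(round(f1_score(y, p), 3))     # 0.8
```

The threshold dependence of F1 and the calibration sensitivity of the Brier score are part of why threshold-free ranking measures such as AUC-PR and AUC-ROC are convenient for comparing many algorithms.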

14 One exception is Muchlinski et al. (2016).

15 Countries in the Middle East and North Africa (MENA) are those included in the MENA politico-geographic category of the Varieties of Democracy project.

16 NAVCO 2.1 codes the revolution in Syria as primarily nonviolent in 2011 and as having shifted to a violent civil war in 2012.

17 Friedman’s (2001) relative importance of input variables sums marginal improvements over all boosted trees. These values are then normalized to give us a ranking of features. We also compute Breiman’s (2001) permutation feature importance, an alternative method of measuring the importance of individual predictors, which we discuss and present results from in the online appendix (Figures A17 and A18). The findings of the two methods are not substantially different.
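Unlike Friedman's measure, which relies on the internals of boosted trees, permutation importance is model-agnostic: shuffle one predictor's values, re-score the fitted model, and record how much performance drops. The sketch below is a generic illustration rather than the code behind the appendix figures; `predict` and `score` are hypothetical callables, and in practice the scoring would be done on held-out data. scikit-learn provides a maintained version as `sklearn.inspection.permutation_importance`.

```python
import random


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Breiman-style permutation importance for any fitted model.

    predict: function mapping a list of rows (lists) to a list of predictions
    score:   function (y_true, y_pred) -> float, where higher is better
    Returns, for each column, the mean drop in score when it is shuffled.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example: the "model" uses only feature 0, so feature 1 should score 0.
X = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3], [0.2, 0.7],
     [0.7, 0.5], [0.3, 0.4], [0.6, 0.2], [0.4, 0.8]]
y = [1, 0, 1, 0, 1, 0, 1, 0]


def predict(rows):
    return [1 if r[0] > 0.5 else 0 for r in rows]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


imp = permutation_importance(predict, X, y, accuracy)
print(imp[1])  # 0.0
```

Because the drop is measured on the model's own predictive performance, the two importance rankings can diverge when predictors are strongly correlated, which is one reason to report both.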