ABSTRACT
Many scholars believe that the proliferation of large-scale datasets will spur scientific advancement and help us to predict the future using sophisticated statistical techniques. Indeed, a team of researchers achieved astonishing success using the world’s largest event dataset, produced by the ICEWS project, to predict complex social outcomes such as civil wars and irregular government turnovers. However, the secret of their success lay in transforming epistemically difficult questions into easy ones. Forecasting the onset of civil wars becomes an easy task if one relies on explanatory variables that measure how often newspapers report on tensions, fights, or killings. But news reports on prewar conflicts are just variations of the variable that researchers want to predict; the finding that more conflicts are likely to occur when journalists report on conflicts carries little scientific value. A similar success rate in “predicting” interstate wars can also be achieved by a simple Google News search for country names and conflict-related news shortly before a conflict is coded as a war. Big data can help researchers to make predictions in simple situations, but there is no evidence that predictions will also succeed in uncertain environments with complex outcomes, such as those characteristic of politics.
Notes
1. Facebook Entry, Nassim Nicholas Taleb, 2 February, 2015. https://www.facebook.com/permalink.php?story_fbid=10152794640733375&id=13012333374
2. The ICEWS dataset was recently made available to the public (Boschee et al. 2015).
3. Official estimates count 85 deaths and 1,813 injured, while unofficial sources claim an even higher toll (Nidhi 2012, 14).
4. Nate Silver's team also failed in their prediction for the UK election. None of their prediction intervals included the true number of seats for the four biggest parties (Lauderdale 2015).
5. These studies are ignored by Metternich et al. (2013) in their discussion of Thailand's conflicts.
6. The UCDP definitions can be found at http://www.pcr.uu.se/research/ucdp/definitions.
7. Moreover, the dataset of Ward et al. (2013b) is based on monthly data for forecasts over a six-month period. This reduces the difficulty of making predictions even further, as the escalation phase of a conflict usually lasts longer than a month before it is coded as a civil war. It is thus imaginable that the UCDP conflict count is already close to the 25-death threshold for a civil war in a given month, allowing for a relatively easy prediction, based on current trends, as to whether the threshold will be surpassed in the next six months.