International Interactions
Empirical and Theoretical Research in International Relations
Volume 37, 2011 - Issue 3
Special Data Feature

Events Data as Bismarck's Sausages? Intercoder Reliability, Coders' Selection, and Data Quality

Pages 340-361 | Published online: 01 Sep 2011
 

Abstract

Precise measurement is difficult but essential for generating high-quality data, and it is therefore remarkable that so little attention is often paid to intercoder reliability. It is commonly recognized that poor validity leads to systematic errors and biased inference. In contrast, low reliability is generally assumed to be a lesser concern, leading only to random errors and inefficiency. We evaluate the intercoder reliability of our recently collected data on governance events in UN peacekeeping and show how poor coding and low intercoder reliability can produce systematic errors and even biased inference. We also show how intercoder reliability checks can be used to improve data quality. Continuous ex post testing for intercoder reliability enables researchers to create better data and ultimately improves the quality of their analyses.

Acknowledgments

A previous version of this paper was presented at the ISA annual conference in New York, March 2009, and at the Folke Bernadotte UN PKO working group at Columbia University, New York, October 30–31, 2009. We acknowledge financial support from the Folke Bernadotte Akademin and the ESF/ESRC (RES-062-23-0259). We thank the participants at the Folke Bernadotte UN PKO working group for their comments. We also thank Kristian Skrede Gleditsch, María Belén González, Birger Heldt, Spyros Kosmidis, Nikolay Marinov, Will H. Moore, Vera Troeger, Steffen Weiss, and three anonymous reviewers of International Interactions for their comments. Replication data are available on the International Interactions dataverse page at http://dvn.iq.harvard.edu/dvn/dv/internationalinteractions.

Notes

1The findings of the survey are discussed in more detail in the concluding section. See also .

2In this article we pay special attention to events data; yet the issue of intercoder reliability applies more broadly across research methods. Even qualitative data, such as interviews, archival research, or participatory observation, require "coding," that is, the selection and interpretation of information (Kalyvas 2006:393–422). Although often infeasible for practical reasons, in principle researchers could independently code the "raw" information in order to gauge the reliability of qualitative research.

3Once measurement bias has been identified, it is straightforward to evaluate the impact of low validity on the statistical analysis and even to correct for any bias. However, it is far from straightforward to determine whether unreliable data have a significant impact on the statistical analysis, or how to use information on reliability to improve data quality. Baugh (2003) has developed a method that allows the adjustment of coefficients using the estimated data reliability.
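
The general logic behind such adjustments can be illustrated with the classic attenuation result: random measurement error in a predictor biases its estimated slope toward zero by a factor equal to the measurement's reliability, so dividing the observed coefficient by an estimated reliability yields an approximate corrected estimate. Below is a minimal Python sketch of this bivariate case; the numbers are hypothetical, and this is the textbook disattenuation logic rather than Baugh's exact procedure.

def disattenuate(beta_observed, reliability):
    """Correct a bivariate slope for attenuation caused by random
    measurement error in the predictor; reliability must lie in (0, 1]."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return beta_observed / reliability

# Hypothetical numbers: an observed slope of 0.30 with an estimated
# coding reliability of 0.75 implies a corrected slope of 0.40.
print(disattenuate(0.30, 0.75))  # 0.4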

4Conflict and cooperation do not refer to the general background of the governance event but must be in direct response to an event. A deteriorating security situation is likely to influence governance, but conflict is coded only if the governance actors or public goods are explicitly targeted. The baseline categories are those in which no conflict (cooperation) is recorded.

5For more information on the other variables please refer to the codebook downloadable from http://privatewww.essex.ac.uk/~hdorus/.

6In the second stage, not all reports were double coded: 93% of the reports were double coded for Angola, 28% for Burundi, 100% for the Central African Republic, and 24% for the Democratic Republic of Congo.

7As one of the reviewers pointed out, event selection may well be the major difference between high-frequency data, such as TABARI, BCOW, or the UN Peacekeeping events data, and country-year data, such as the UCDP-PRIO data.

8Hruschka et al. (2004) suggest fairly stringent cutoffs of Kappa ≥ 0.80 or 0.90, while Landis and Koch (1977:165) suggest a range of intercoder reliability based on Kappa, distinguishing between poor (< 0.00), slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and almost perfect (0.81–1.00).
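
As a concrete illustration, Cohen's Kappa compares the observed agreement between two coders with the agreement expected by chance given their marginal coding frequencies. The following minimal Python sketch computes it; the event categories and codings are hypothetical.

from collections import Counter

def cohen_kappa(coder1, coder2):
    """Cohen's Kappa for two coders' categorical judgments."""
    assert len(coder1) == len(coder2)
    n = len(coder1)
    # Observed agreement: share of units coded identically.
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement under independence, from marginal frequencies.
    freq1, freq2 = Counter(coder1), Counter(coder2)
    p_e = sum(freq1[c] * freq2[c] for c in freq1) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two coders classifying ten events.
a = ["conflict", "cooperation", "none", "conflict", "none",
     "cooperation", "conflict", "none", "none", "conflict"]
b = ["conflict", "cooperation", "none", "none", "none",
     "cooperation", "conflict", "conflict", "none", "conflict"]
print(round(cohen_kappa(a, b), 2))  # 0.69: "substantial" on Landis and Koch's scale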

9The probit models largely replicate the findings in Dorussen and Gizelis (2008), at least for the "good" coders. This is noteworthy since Dorussen and Gizelis (2008) rely exclusively on data collected in stage 2, use a more extensive model, disaggregate conflict/cooperation levels, and estimate the effects on conflict and cooperation simultaneously.

10Though we provide only graphical tests for heteroskedasticity, we also obtain consistent findings using heteroskedastic probit models (Alvarez and Brehm 1995) in which we use the "good versus bad" variable to model the variance of the residuals. Indeed, we find that using "good" coders shrinks the residuals' variance.
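
To make the specification concrete: a heteroskedastic probit scales the latent-error standard deviation as exp(zγ), so a negative γ on a "good coder" indicator implies smaller residual variance for good coders. The following Python sketch fits such a model by maximum likelihood on simulated data; the variable names and numbers are illustrative, not taken from our replication files.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, y, X, Z):
    k = X.shape[1]
    beta, gamma = params[:k], params[k:]
    mu = (X @ beta) / np.exp(Z @ gamma)  # latent index scaled by exp(Z*gamma)
    ll = y * norm.logcdf(mu) + (1 - y) * norm.logcdf(-mu)
    return -ll.sum()

rng = np.random.default_rng(0)
n = 2000
good = rng.integers(0, 2, n)               # 1 = "good" coder (hypothetical)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = good.reshape(-1, 1).astype(float)
sigma = np.exp(-0.7 * good)                # residual scale shrinks when good = 1
y = ((0.5 + 1.0 * x + sigma * rng.normal(size=n)) > 0).astype(float)

res = minimize(neg_loglik, np.zeros(3), args=(y, X, Z), method="BFGS")
print(res.x)  # [beta0, beta1, gamma]; gamma < 0 mirrors the shrinking variance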

11We classify an article as quantitative if it includes at least one regression table. Articles introducing new datasets were excluded from the "quantitative" category if they did not satisfy this criterion, and formal theory articles without any empirics were excluded as well.
