International Interactions
Empirical and Theoretical Research in International Relations
Volume 42, 2016 - Issue 3
Original Articles

Consensus Decisions and Similarity Measures in International Organizations

Pages 503-529 | Published online: 08 Apr 2016
 

ABSTRACT

Voting behavior in international organizations, most notably in the United Nations General Assembly (UNGA), is often used to infer the similarity of foreign policy preferences of member states. Most of these measures ignore, however, that particular covoting patterns may appear simply by chance (Häge 2011) and that these patterns of agreement (or the absence thereof) are only observable if decisions are reached through roll-call votes. As the relative frequency of roll-call votes changes considerably over time in most international organizations, currently used similarity and affinity measures offer a misleading picture. Based on a complete data set of UNGA resolution decisions, we demonstrate how taking different forms of chance agreement and the relative prevalence of consensus decisions into account affects conclusions about the effect of the similarity of member states’ foreign policy positions on foreign aid allocation.

Supplemental Material

Supplemental data for this article can be accessed on the publisher’s website: www.tandfonline.com/gini.

Notes

1 In this article, we will treat adoptions without a vote as synonymous with a consensus decision, as does much of the literature; see Blake and Lockwood Payton (Citation2015).

2 In the conclusion, based on some preliminary work, we offer some thoughts about how this problem might be addressed in the context of IRT models.

3 See Stokman (Citation1977) and Mokken and Stokman (Citation1985) for similar suggestions in the context of UNGA voting.

4 For a recent study using this measure, see Mattes, Leeds, and Carroll (Citation2015).

5 In selecting one or the other approach to measuring foreign policy similarity, researchers should consider to what extent they find this assumption justified.

6 For discussions on voting rules in international organizations in general and consensus decision making in particular, see Blake and Lockwood Payton (Citation2015). Presumably, the rationale for not taking consensus votes into account is that they do not provide for variation in voting behavior, but existing work does not explicitly justify or even discuss their exclusion (for example, Gartzke Citation1998; Alesina and Dollar Citation2000). We contend that consensus votes provide information about states’ agreement and, as outlined in further detail below, that disregarding them leads to biased measures.

7 Hug (Citation2012) shows that there is considerable variation in the share of decisions adopted without a formal vote even in UNGA decisions not related to resolutions.

8 For some contested votes up to 1988, the UNGA’s minutes only report the marginal vote distribution rather than a full roll call. We refer to those votes as “nonrecorded,” as does the United Nations, to distinguish them from consensus decisions and roll-call votes. For the replication analyses reported in the main text, we omitted resolutions adopted through nonrecorded votes. However, in the Web appendix, we report the results of replication analyses based on the averages of five imputed data sets (as suggested by King, Honaker, Joseph, and Scheve Citation2001). More specifically, based on the reported marginal vote distributions, we randomly assigned yes and no votes as well as abstentions to the participating countries for all resolutions adopted through nonrecorded votes. The similarity measures were then calculated from the imputed data sets. The results show that these imputations barely affect our substantive conclusions, largely because the number of such nonrecorded votes declined dramatically during the time period we cover. The number of nonrecorded votes is as follows (years not listed after 1970 had no such votes): 64 (1970), 49 (1971), 40 (1972), 33 (1973), 36 (1974), 31 (1975), 7 (1976), 9 (1977), 1 (1978), 2 (1979), 17 (1980), 13 (1981), 2 (1982), 2 (1984), and 1 (1988).
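The imputation step described in this note can be sketched as follows. This is our own illustration, not the authors’ actual code; the function name and interface are hypothetical:

```python
import random

def impute_votes(countries, n_yes, n_no, n_abstain, seed=None):
    """Randomly assign 'yes', 'no', and 'abstain' votes to the
    participating countries so that the imputed roll call reproduces
    the reported marginal vote distribution of a nonrecorded vote."""
    if len(countries) != n_yes + n_no + n_abstain:
        raise ValueError("marginals must sum to the number of countries")
    votes = ["yes"] * n_yes + ["no"] * n_no + ["abstain"] * n_abstain
    random.Random(seed).shuffle(votes)  # seed allows reproducible imputations
    return dict(zip(countries, votes))
```

Repeating the draw with different seeds yields the multiple imputed data sets over which the similarity measures are averaged.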

9 For a list, see the appendix in Thacker (Citation1999).

10 The affinity measures are not affected by the way consensus votes are coded, as long as they are coded in the same way for all member states. Assuming that a consensus vote indicates either abstentions by all states or “no” votes by all states would lead to the same affinity score as assuming that it indicates “yes” votes by all states. However, the assumption that it signifies “yes” votes makes more substantive sense.
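The invariance claim can be checked with a toy calculation. We use a simple proportion-of-agreement score here for illustration; the same logic carries over to the affinity measures discussed in the text:

```python
def proportion_agreement(votes_a, votes_b):
    """Share of decisions on which two states cast the same vote."""
    return sum(a == b for a, b in zip(votes_a, votes_b)) / len(votes_a)

# Three roll-call votes (1 = yes, 0 = no, 2 = abstain) ...
rollcall_a, rollcall_b = [1, 0, 1], [1, 1, 1]

# ... plus one consensus decision, coded identically for both states.
# The resulting score is the same whichever code is assigned.
scores = {
    code: proportion_agreement(rollcall_a + [code], rollcall_b + [code])
    for code in (1, 0, 2)
}
```

Because both dyad members receive the same code, the consensus decision always counts as agreement, so the score is invariant to the coding choice.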

11 Indeed, in some international organizations where consensus voting is common, the respective decisions are explicitly recorded as having been adopted “by unanimity”. A prime example of this practice is the Council of the European Union (Häge Citation2013). In fact, an important reason for consensus decisions not being adopted through a roll call might be the actual absence of opposition to a motion. If it is clear from the outset that all states agree to a motion, taking a roll call is redundant.

12 We acknowledge that, empirically, the incidence of extraneous factors being responsible for a yea vote might be higher in the case of consensus than recorded votes. However, the distinction between consensus and recorded votes in this respect is a matter of degree, not a qualitative one. An important rationale for applying a chance correction is that it adjusts similarity scores for the possibility that covoting is not purely a result of similar policy positions. But again, although this correction might be somewhat more important when including consensus votes in the analysis, the same considerations apply equally when only roll-call votes are considered. Indeed, earlier proposals for applying chance corrections to similarity indices were made in the context of analyses of roll-call votes only; see Mokken and Stokman (Citation1985).

13 One of the anonymous reviewers suggested that our proposal replaces the empirically untestable assumption that roll-call votes are representative of consensus votes with the equally untestable assumption that all states voted in favor when a resolution was adopted through a consensus vote. In our view, we are merely extending an already existing assumption made in analyses of recorded votes to consensus votes. In any case, our approach provides at least an alternative way of measuring preference similarity that broadens the methodological choice set for researchers. Where no state objected to the adoption of a resolution and is on public record for not doing so, it seems more plausible to us to assume that everybody was in favor of the resolution than to assume that 20%, 30%, or maybe even 40% of the states privately opposed the resolution but preferred not to voice their dissent publicly (which is implied by the assumption that roll-call votes are representative of consensus votes). In general, consensus decision making seems to follow a similar logic in all international organizations. Thus, the choice between the two assumptions needs to be made on conceptual rather than empirical grounds.

14 Agreement measures can be formulated either in terms of the proportion of agreement p_A or the proportion of disagreement p_D, where p_A = 1 − p_D. The choice of formulation is arbitrary. We focus on the proportion of disagreement, as it is equivalent to the “sum of distances” measures used to assess agreement in the case of interval-level variables.
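As a minimal sketch (our own helper, not the authors’ code), the proportion of disagreement for a dyad is simply the share of decisions on which the two states voted differently:

```python
def proportion_disagreement(votes_a, votes_b):
    """p_D: share of decisions on which the two states' votes differ.
    The proportion of agreement follows as p_A = 1 - p_D."""
    return sum(a != b for a, b in zip(votes_a, votes_b)) / len(votes_a)

# Four decisions (1 = yes, 0 = no, 2 = abstain); the states differ on two.
p_d = proportion_disagreement([1, 1, 0, 2], [1, 0, 0, 1])
p_a = 1 - p_d
```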

15 For example, another prominent weighting function for ordered categorical data assigns weights to cells according to the squared distance between the row and column index number, that is, w_ij = (i − j)^2. Applying this weighting function is equivalent to calculating the squared distance between dyad members’ variable values on interval-level scales. However, as no compelling reason exists to weight the difference between the two extreme categories four times as heavily as the difference between the middle category and one of the extreme categories, we do not consider this weighting function in our analyses.
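The contrast between the two weighting schemes can be made concrete. The helper below is our own illustration, assuming ordered vote categories coded yes = 1, abstain = 2, no = 3:

```python
def weighted_disagreement(votes_a, votes_b, weight, w_max):
    """Weighted proportion of disagreement for ordered vote categories,
    normalised so that a yes/no pair (the two extremes) contributes 1."""
    n = len(votes_a)
    return sum(weight(i, j) for i, j in zip(votes_a, votes_b)) / (w_max * n)

linear = lambda i, j: abs(i - j)       # yes vs. no counts twice yes vs. abstain
quadratic = lambda i, j: (i - j) ** 2  # yes vs. no counts four times yes vs. abstain

# The quadratic scheme weights the extreme-to-extreme difference four
# times as heavily as the extreme-to-middle difference:
assert quadratic(1, 3) == 4 * quadratic(1, 2)
assert linear(1, 3) == 2 * linear(1, 2)
```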

16 UNGA sessions and years do not completely overlap. As the temporal scope of the units of analysis usually used in international relations research is the year or a multiple thereof, we calculate agreement scores for individual years rather than UNGA sessions. In the calculation of dyadic similarity scores, a particular resolution is only included if both states were present during the meeting in which the resolution was adopted.

17 The extent to which the artificial data in do indeed reflect the actual voting behavior of those states during the Cold War is incidental to the argument we make here.

18 See the notes to for a detailed example of how the proportion of disagreement is calculated from the information in the contingency tables.

19 Note that the size of the bias is not constant across dyads within a year. For example, consider a proportion of disagreement of 0.5 resulting from contrary voting on four out of eight roll-call votes. Adding two consensus votes increases the denominator from 8 to 10, resulting in a proportion of disagreement of 0.4. Now consider a proportion of disagreement of 0.25 resulting from contrary voting on two out of eight roll-call votes. Adding two consensus votes in this situation results in a proportion of disagreement of 0.2. Thus, whereas the bias in the first situation is 0.1, it is 0.05 in the latter. The situation becomes even more complicated when chance-corrections are applied, as the resulting similarity values are generally nonlinear functions of the proportion of disagreement. The differential impact on dyads within the same year implies that the bias resulting from ignoring consensus votes cannot be avoided by including control variables, such as time dummies or a continuous variable for the number of consensus votes, in statistical analyses. If the number of consensus votes was the same for each dyad in each year, including the number of consensus votes in that year plus its interaction with the proportion of disagreement could in principle be a technical substitute for including consensus votes in the measure itself. However, in practice, the number of consensus votes is not constant for all dyad members in a certain year. A dyadic similarity score can only be calculated if both dyad members participated in the adoption of a particular resolution. Due to some states only being members during part of a year or simply not attending the General Assembly meeting in which a resolution has been adopted, this is not always the case. As a result, the number of consensus votes varies from dyad to dyad. 
In fact, absenteeism is quite common in the General Assembly; roughly 89% of all dyad similarity scores are based on a number of resolutions that is lower than the total number of resolutions adopted during a particular year because one or both dyad members did not attend the meeting in which a particular resolution was adopted. Furthermore, this percentage varies widely over time, from 21% in 1955 to 100% in 1985.
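The arithmetic in this note’s example can be verified directly. This is a sketch under the note’s assumption that consensus decisions count as agreement and so enter only the denominator:

```python
def p_disagree(n_contrary, n_rollcall, n_consensus=0):
    """Proportion of disagreement when n_consensus consensus decisions
    (all counted as agreement) are added to the denominator."""
    return n_contrary / (n_rollcall + n_consensus)

# Dyad 1: contrary votes on 4 of 8 roll calls.
bias_1 = p_disagree(4, 8) - p_disagree(4, 8, n_consensus=2)  # 0.5 - 0.4
# Dyad 2: contrary votes on 2 of 8 roll calls.
bias_2 = p_disagree(2, 8) - p_disagree(2, 8, n_consensus=2)  # 0.25 - 0.2
```

The bias from omitting the two consensus votes is 0.1 for the first dyad but only 0.05 for the second, which is why it cannot be absorbed by year-level control variables.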

20 Mokken and Stokman (Citation1985:187–188) argue that this chance correction is useful for measuring the cohesion of a decision-making body as a whole.

21 The lack of plausible assumptions about the marginal distributions used in the calculation of chance disagreement in S is understandable, given that the correction for chance disagreement was not an explicit goal in the development of this measure.

22 It is easy to construct an example of a contingency table with asymmetric marginal distributions that yields an expected proportion of disagreement greater than 0.5.
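One such example, assuming the chance baseline is computed from the product of the two states’ own marginal distributions (the Cohen’s-kappa-style baseline; the helper is our illustration):

```python
def expected_disagreement(marginals_a, marginals_b):
    """Chance-expected proportion of disagreement when the two states'
    votes are independent draws from their own marginal distributions."""
    return 1 - sum(p * q for p, q in zip(marginals_a, marginals_b))

# State A almost always votes yes, state B almost always votes no
# (category order: yes, abstain, no):
d_e = expected_disagreement([0.90, 0.05, 0.05], [0.05, 0.05, 0.90])
# d_e = 1 - (0.045 + 0.0025 + 0.045) = 0.9075 > 0.5
```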

23 Häge (Citation2011:293) makes the case that the assumptions of Scott’s π are more appropriate for measuring foreign policy similarity based on UNGA voting data. In terms of the relatively low costs of creating a UNGA voting tie compared to an alliance tie, this makes sense. However, in the case of UNGA data, the main reason why individual states may systematically differ in their propensity to vote in a certain way has less to do with differential costs, given that voting is relatively “cheap” regardless of what type of vote is being cast (see Hovet Citation1960), than with the content of the agenda they are asked to vote upon.

24 In the Web appendix we also report replication results based on the other four similarity measures discussed earlier.

25 We obtained the replication data from the AidData website (http://aiddata.org/content/index/Research/replication-datasets), and David Dollar provided greatly appreciated help in using it.

26 Unfortunately, the authors offer almost no explanation of how this measure was constructed. For instance, we do not know whether abstentions were counted. We also do not know whether proportions were calculated over all resolutions adopted throughout the 5-year period used as the temporal unit of analysis in this study or first for individual years separately and then aggregated over the 5-year period.

27 Again, it is important to note that we make the assumption that adoptions without a vote signal unanimous support for the resolution in question. As noted in footnote 8, we omit nonrecorded votes for which only the marginal distribution is recorded. Results based on imputed data sets taking those nonrecorded votes into account are reported in the Web appendix. All the data will be made available on dataverse upon publication.

28 In the Web appendix we also report replications of a model by Alesina and Dollar (Citation2000) focusing on total bilateral aid. For the model reported in , we list in the Web appendix (Table A11) the countries covered and the number of cases.

29 Given the robustness of the negative effect of this variable in the remaining models in , we can only suspect a typo in Alesina and Dollar’s (Citation2000) article. Regarding Alesina and Dollar’s (Citation2000) model, one might also suspect that changes in the dependent variable over time are not only affected by their independent variables but also by past aid allocations (we thank an anonymous reviewer for alerting us to this point). As the goal of our replication analysis is to show the sensitivity of Alesina and Dollar’s (Citation2000) result to changes in the measures of similarity, it seems inappropriate to change the underlying empirical model.

30 When replicating these analyses and also taking into account nonrecorded votes through imputed data sets, we find the same pattern of coefficients (see Table A6 in the Web appendix).

31 The latter two results derive from analyses reported in the Web appendix.
