
When dichotomisation becomes a problem for the analysis of middle‐sized datasets

Pages 33-50 | Received 15 Sep 2005, Accepted 15 May 2007, Published online: 27 Jan 2009
 

Abstract

This article aims to illustrate the circumstances in which Qualitative Comparative Analysis (QCA) and its variants, fs/QCA and MVQCA, become particularly useful analytical tools. To this end, we discuss the most pertinent problem researchers encounter when using QCA: contradictory observations. In QCA, contradictions arise from the sheer number of cases and from dichotomisation. To handle contradictions, the method for analysing middle‐sized‐N situations should therefore be chosen according to two parameters: the size of the dataset and the need to preserve raw‐data information. QCA is an apt tool for analysing comparatively small middle‐sized datasets with a correspondingly reduced need to preserve cluster information, whereas the opposite holds true for fs/QCA. MVQCA strikes a balance between these two methods, as it is most suitable for analysing genuinely middle‐sized case sets for which some cluster information needs to be preserved.

Acknowledgements

We wish to thank Dirk Berg‐Schlosser, Jaap Dronkers, Charles Ragin, Benoît Rihoux, Simon Toubeau and Sakura Yamasaki for stimulating discussions and their comments on earlier versions of this article.

Notes

1. Following the suggestion of Charles Ragin (see Ragin, 2003, p. 13), we use the term ‘small‐N’ for samples of one to four cases. Examples of small‐N methods are hermeneutics, in‐depth interviews and long‐term observation (process‐tracing).

2. Following the suggestion of Charles Ragin (see Ragin, 2003, p. 13), we use the term ‘large‐N’ for samples of more than 50 cases. Examples of large‐N methods are regression analysis and its various extensions.

3. It follows from Notes 1 and 2 that the term ‘middle‐sized‐N’ refers to samples of 5 to 50 cases (see Ragin, 2003, p. 13).

4. We wish to stress that we do not intend to define small, genuinely middle‐sized and large middle‐sized datasets by precise numbers, because these definitions depend on the individual research design, i.e. on the number and conceptual richness of the causal and outcome variables.

5. The urban population indicator refers to the percentage of the population living in cities or towns.

6. The measure of non‐agricultural population reports the percentage of the economically active population working in sectors outside agriculture.

7. The literacy rate refers to the share of the population aged over 15 that is able to read and write.

8. The variable on university education describes the number of students enrolled in institutions of higher education per 100,000 inhabitants. Setting the level for 100% university education at 5,000 students per 100,000 inhabitants, the indicator is calculated as follows:

$$\text{University education (\%)} = \frac{\text{students per 100,000 inhabitants}}{5{,}000} \times 100$$
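The calculation in Note 8 can be sketched as follows. This is a minimal illustration only: it assumes a plain proportional scale with no cap at 100% (the note does not say whether values above 5,000 students are capped), and the function and constant names are hypothetical.

```python
# Hypothetical helper illustrating Note 8: 5,000 students per 100,000
# inhabitants is treated as 100% university education.
BENCHMARK_STUDENTS = 5_000  # students per 100,000 inhabitants = 100%


def university_education_indicator(students_per_100k: float) -> float:
    """Return the university-education score as a percentage.

    Assumes a plain proportional scale; values above the benchmark are
    not capped, since the note does not say whether they should be.
    """
    if students_per_100k < 0:
        raise ValueError("student numbers cannot be negative")
    # Multiply before dividing so round inputs give exact results.
    return students_per_100k * 100 / BENCHMARK_STUDENTS


if __name__ == "__main__":
    # Illustrative inputs only, not values from the dataset.
    for students in (250, 1_000, 5_000):
        print(students, university_education_indicator(students))
```

For instance, 250 students per 100,000 inhabitants corresponds to a score of 5%, and the 5,000‐student benchmark itself to 100%.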

9. Accordingly, our analysis includes Austria (AUS), Belgium (BEL), Czechoslovakia (CZE), Finland (FIN), France (FRA), Germany (GER), Greece (GRE), Hungary (HUN), Italy (ITA), the Netherlands (NET), Poland (POL), Portugal (POR), Romania (ROM), Spain (SPA), Sweden (SWE) and the United Kingdom (UK).

10. Logical remainders are all those combinations of causal conditions that are not, or cannot be, observed, so that their outcome is unknown to the researcher (see Ragin, 1987, pp. 104–113).
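With dichotomised (crisp‐set) conditions, the logical remainders can be enumerated mechanically by listing every possible configuration and discarding those that occur among the observed cases. The sketch below assumes conditions coded 0/1; the function name and the three example configurations are hypothetical. Because the number of possible configurations doubles with each added condition, remainders multiply quickly as conditions are added.

```python
from itertools import product


def logical_remainders(observed, n_conditions):
    """Return every crisp-set configuration of n_conditions binary
    conditions that does not occur among the observed configurations."""
    seen = {tuple(config) for config in observed}
    # product((0, 1), repeat=k) enumerates all 2**k configurations.
    return [
        config
        for config in product((0, 1), repeat=n_conditions)
        if config not in seen
    ]


if __name__ == "__main__":
    # Hypothetical observed configurations of three conditions (A, B, C).
    observed = [(1, 1, 1), (1, 0, 1), (0, 0, 0)]
    for config in logical_remainders(observed, 3):
        print(config)
```

Here three observed configurations out of eight possible ones leave five remainders whose outcome is unknown.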

11. See Note 4.

12. Ragin points out further ways of dealing with contradictions: a researcher can also decide to assign an outcome score of either ‘0’ or ‘1’ to all contradictory cases (Ragin, 1987, pp. 116–117). These procedures are, however, problematic in that they ‘violate the spirit of case‐oriented qualitative research. [Accordingly they] should be used only when it is impossible to return to the original cases and construct a better truth table’ (Ragin, 1987, p. 118).
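A contradiction arises when cases sharing the same configuration of conditions show different outcomes. The sketch below detects such configurations and implements the mechanical fallback described in Note 12, namely assigning one outcome score to every contradictory case. It is an illustration under stated assumptions (crisp 0/1 coding, cases keyed by a label); the function names and example cases are hypothetical, and, as the quoted passage stresses, the recoding is a last resort rather than a substitute for revisiting the cases.

```python
from collections import defaultdict


def find_contradictions(cases):
    """Group cases by configuration and return those configurations whose
    cases show different outcomes.

    `cases` maps a case label to a (configuration, outcome) pair, where the
    configuration is a tuple of 0/1 condition scores and outcome is 0 or 1.
    """
    outcomes = defaultdict(set)
    members = defaultdict(list)
    for label, (config, outcome) in cases.items():
        outcomes[config].add(outcome)
        members[config].append(label)
    return {
        config: sorted(members[config])
        for config, seen in outcomes.items()
        if len(seen) > 1
    }


def recode_contradictions(cases, score):
    """Assign one outcome score (0 or 1) to every case in a contradictory
    configuration, leaving all other cases unchanged."""
    if score not in (0, 1):
        raise ValueError("score must be 0 or 1")
    contradictory = find_contradictions(cases)
    return {
        label: (config, score if config in contradictory else outcome)
        for label, (config, outcome) in cases.items()
    }


if __name__ == "__main__":
    # Hypothetical cases, not drawn from the article's dataset.
    cases = {
        "case1": ((1, 1, 0), 1),
        "case2": ((1, 1, 0), 0),
        "case3": ((0, 1, 1), 1),
    }
    print(find_contradictions(cases))
    print(recode_contradictions(cases, 0))
```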

13. See Note 4.

14. We wish to emphasise that the results obtained from the data transformation described in Table are stable. In this respect, it is important to note that the original (Vanhanen) dataset can be transformed into membership scores in different ways. Another, statistically neutral, method assigns zero membership (0.00) to the lowest observed value and full membership (1.00) to the highest observed value of each variable. All intermediate values are then converted proportionately: the lowest case value is subtracted from each individual case value, and the resulting figure is divided by the difference between the highest and the lowest value. In other words, the following equation is applied to each variable:

$$\mu_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i$ is the raw value of case $i$, $\mu_i$ its membership score, and $x_{\min}$ and $x_{\max}$ the lowest and highest observed values of the variable.

This way of determining membership scores is, however, sensitive to outliers: if the case sample includes cases with extreme maximum or minimum values, the resulting membership scores will give a distorted picture. For this reason, we preferred to determine membership scores on the basis of a cluster analysis using the simple average linkage method. We did, however, cross‐check our results and found that those reported in the remainder of this section are stable: they do not change if membership scores are determined according to the aforementioned standardisation formula.
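The min–max standardisation of Note 14, and its sensitivity to outliers, can be illustrated with a short sketch. The function name and raw values are hypothetical and not taken from the Vanhanen dataset; the cluster‐analysis procedure the authors actually preferred is not reproduced here.

```python
def min_max_membership(values):
    """Rescale raw values so the lowest observed value scores 0.00 and the
    highest scores 1.00, converting all others proportionately."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal; scores are undefined")
    return [(x - low) / (high - low) for x in values]


if __name__ == "__main__":
    # Hypothetical raw values, not taken from the Vanhanen dataset.
    regular = [10, 20, 30, 40]
    with_outlier = [10, 20, 30, 40, 400]
    print([round(m, 3) for m in min_max_membership(regular)])
    print([round(m, 3) for m in min_max_membership(with_outlier)])
```

In the second run, a single extreme value pushes the other four cases below 0.08, although they span the same raw range that yields scores from 0.00 to 1.00 in the first run. This is the distortion the note describes.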

15. It should be noted that such probabilistic criteria are fairly lax. Usually, a researcher would choose more conventional criteria, such as a 0.05 significance level and a benchmark proportion of 0.65. Under these criteria, a case set must contain at least seven consistent cases for a cause to qualify as a necessary or sufficient condition.
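The seven‐case figure can be checked with an exact binomial test. The sketch assumes a one‐tailed test of the observed proportion of consistent cases against the benchmark proportion, and that all cases are consistent; the note does not spell out the test, so this is an illustrative reconstruction, and the function names are hypothetical.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the probability of observing at
    least k consistent cases out of n if the true proportion were p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_cases_all_consistent(benchmark=0.65, alpha=0.05, max_n=100):
    """Smallest n for which n consistent cases out of n give a one-tailed
    tail probability below alpha at the given benchmark proportion."""
    for n in range(1, max_n + 1):
        if upper_tail(n, n, benchmark) < alpha:
            return n
    return None


if __name__ == "__main__":
    for n in (6, 7):
        print(n, round(upper_tail(n, n, 0.65), 4))
    print(min_cases_all_consistent())
```

With a benchmark of 0.65, six fully consistent cases give a tail probability of about 0.075 (that is, 0.65 to the sixth power) and seven give about 0.049, so seven is the smallest case set that clears the 0.05 level, in line with the note.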

16. See Note 4.

17. Hence, we assign a score of ‘0’ to those cases with a raw‐data value between 0 and 32, and a score of ‘1’ to cases with a value from 32.1 to 43. Finally, we assign a score of ‘2’ to all cases with an original value above 43.
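The three‐valued coding of Note 17 amounts to a simple threshold function. In this sketch, values strictly between 32 and 32.1 are scored ‘1’, on the assumption that the note's 32.1 bound reflects data recorded to one decimal place, and negative inputs are rejected because the lowest band starts at 0. The function name is hypothetical.

```python
def mvqca_score(value):
    """Map a raw value to a three-valued MVQCA score using the Note 17
    thresholds: up to 32 -> 0, above 32 up to 43 -> 1, above 43 -> 2.

    Assumption: the note's 32.1 lower bound implies one-decimal data, so
    any value strictly between 32 and 32.1 is also scored 1 here.
    """
    if value < 0:
        raise ValueError("raw values are expected to be non-negative")
    if value <= 32:
        return 0
    if value <= 43:
        return 1
    return 2


if __name__ == "__main__":
    # Illustrative raw values only.
    for raw in (0, 32, 32.1, 43, 43.5):
        print(raw, mvqca_score(raw))
```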
