When dichotomisation becomes a problem for the analysis of middle‐sized datasets: International Journal of Social Research Methodology: Vol 12, No 1

Abstract

This article aims at illustrating the circumstances in which Qualitative Comparative Analysis (QCA) and its ramifications, fs/QCA and MVQCA, become particularly useful tools of analysis. To this end, we discuss the most pertinent problem which researchers encounter when using QCA: the problem of contradicting observations. In QCA analysis, contradictions arise from the sheer number of cases and the problem of dichotomisation. In order to handle contradictions, the method for analysing middle‐sized‐N situations should therefore be chosen according to two parameters: the size of a dataset, and the need to preserve raw‐data information. While QCA is an apt tool for analysing comparatively small middle‐sized datasets with a correspondingly reduced necessity to preserve cluster information, the opposite holds true for fs/QCA. MVQCA strikes a balance between these two methods as it is most suitable for analysing genuinely middle‐sized case sets for which some cluster information needs to be preserved.

Acknowledgements

We wish to thank Dirk Berg‐Schlosser, Jaap Dronkers, Charles Ragin, Benoît Rihoux, Simon Toubeau and Sakura Yamasaki for stimulating discussions and their comments on earlier versions of this article.

Notes

1. Following the suggestion of Charles Ragin (see Ragin, 2003, p. 13), we use the notion of ‘small‐N’ for samples which include one to four cases. Examples of small‐N methods are hermeneutics, in‐depth interviews and long‐term observations (process‐tracing).

2. Following the suggestion of Charles Ragin (see Ragin, 2003, p. 13), we use the notion of ‘large‐N’ for samples which include more than 50 cases. Examples of large‐N methods are regression analysis and its various ramifications.

3. Following from Notes 1 and 2, the notion of ‘middle‐sized‐N’ refers to samples which include 5 to 50 cases (see Ragin, 2003, p. 13).

4. We wish to stress that we do not want to define small‐size, genuinely middle‐sized and large‐size middle‐sized datasets by suggesting precise numbers. The reason is that these definitions depend on the individual research design, i.e. the number and the conceptual richness of causal and outcome variables.

5. The urban population indicator refers to the percentage of the population living in cities or towns.

6. The measure of non‐agricultural population reports the percentage of an economically active population which works in sectors outside agriculture.

7. The literacy rate refers to the share of the population aged over 15 that is able to read and write.

8. The variable on university education describes the number of students per 100,000 inhabitants who are enrolled in institutes of higher education. Setting the level for 100% university education at 5,000 students per 100,000 inhabitants, the indicator is calculated as follows:
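
Given the stated 100% level, the calculation presumably reduces to a simple ratio. In the notation assumed here (not necessarily the authors' own), $s$ is the number of enrolled students per 100,000 inhabitants and $U$ is the university education indicator in per cent:

```latex
U = \frac{s}{5000} \times 100
```

Under this reading, 1,250 students per 100,000 inhabitants would give $U = 25$.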

9. Accordingly, our analysis includes Austria (AUS), Belgium (BEL), Czechoslovakia (CZE), Finland (FIN), France (FRA), Germany (GER), Greece (GRE), Hungary (HUN), Italy (ITA), the Netherlands (NET), Poland (POL), Portugal (POR), Romania (ROM), Spain (SPA), Sweden (SWE) and the United Kingdom (UK).

10. Logical remainders constitute all those combinations of causal conditions which are not, or cannot be, observed so that their outcome is unknown to the researcher (see Ragin, 1987, pp. 104–113).

11. See Note 4.

12. Ragin points out that further possibilities exist to deal with contradictions. That is, a researcher can also decide to assign an outcome score of ‘0’ or ‘1’, respectively, to all contradicting cases (Ragin, 1987, pp. 116–117). These procedures are, however, problematic in that they ‘violate the spirit of case‐oriented qualitative research. [Accordingly they] should be used only when it is impossible to return to the original cases and construct a better truth table’ (Ragin, 1987, p. 118).

13. See Note 4.

14. We wish to emphasise that the results obtained from data transformation as described in Table are stable. In this respect, it is important to note that the original (Vanhanen) dataset can be transformed into membership scores in different ways. Another, statistically neutral way consists in assigning zero membership (0.00) to the lowest observed value and full membership (1.00) to the highest observed value of each variable. All intermediary values are then converted proportionately: the lowest case value is subtracted from each individual case value, and the resulting figure is divided by the difference between the highest and the lowest value. In other words, the following equation is applied to each variable:
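
Written out from the verbal description above, the standardisation takes the following form. The symbols are notation assumed here: $x_i$ is the raw value of case $i$ on a given variable, $x_{\min}$ and $x_{\max}$ are the lowest and highest observed values of that variable, and $m_i$ is the resulting membership score.

```latex
m_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}
```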

This way of determining membership scores is, however, susceptible to outliers, because the obtained membership scores will depict a distorted image if the case sample includes outliers which provide extreme maximum or minimum values. For this reason, we preferred determining membership scores on the basis of a cluster analysis using the simple average linkage method. Yet, we cross‐checked our results. In so doing, we found that the results reported in the remainder of this section are stable in that they do not change if membership scores are determined according to the aforementioned standardisation formula.

15. It should be noted that such probabilistic criteria are fairly lax. Usually, a researcher would choose more conventional criteria, such as a 0.05 significance level and a benchmark proportion of 0.65. In this situation, a case set needs to contain at least seven consistent cases to make a cause qualify as a necessary/sufficient condition.
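
The figure of seven cases can be checked with a one-sided binomial test, assuming (as a simplification) that every case displaying the cause also displays the outcome. The sketch below is illustrative only; the function names are hypothetical and are not taken from any QCA software.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k consistent cases out of n,
    if the true proportion of consistent cases equalled the benchmark p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def min_consistent_cases(benchmark: float, alpha: float, max_n: int = 1000) -> int:
    """Smallest n for which n consistent cases out of n is significant at alpha."""
    for n in range(1, max_n + 1):
        if binomial_tail(n, n, benchmark) < alpha:
            return n
    raise ValueError("no case number up to max_n passes the test")


print(min_consistent_cases(benchmark=0.65, alpha=0.05))  # prints 7
```

With six fully consistent cases the tail probability is about 0.075, which fails the 0.05 level; with seven it falls to about 0.049.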

16. See Note 4.

17. Hence, we assign a score of ‘0’ to those cases with a raw‐data value between 0 and 32, and a score of ‘1’ to cases with a value from 32.1 to 43. Finally, we assign a score of ‘2’ to all cases with an original value above 43.
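
The thresholds in this note amount to a simple recoding rule, sketched below. The function name is hypothetical, and treating 32 and 43 as inclusive upper bounds assumes that raw values are reported to one decimal place.

```python
def multi_value_score(raw: float) -> int:
    """Recode a raw value into the three-level score described in Note 17."""
    if raw < 0:
        raise ValueError("raw values are expected to be non-negative")
    if raw <= 32:
        return 0  # raw value between 0 and 32
    if raw <= 43:
        return 1  # raw value from 32.1 to 43
    return 2  # raw value above 43
```

For instance, `multi_value_score(32.1)` returns 1 and `multi_value_score(43.5)` returns 2.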
