Abstract
A sample of size N is characterized by two attributes, A and B. Assume that the corresponding 2 × 2 table of counts is subdivided into n 2 × 2 subtables according to the levels of an arbitrary and unknown factor C. The first case of Simpson's Paradox (SP) occurs when A and B are negatively associated in the sample (i.e., in the original 2 × 2 table), but positively associated or independent within each level of C. The second case of SP can be defined similarly by interchanging the words “negatively” and “positively” in the previous sentence. We consider the proportion of subdivisions of the original 2 × 2 table into n 2 × 2 subtables such that SP occurs. In a recent paper by Hadjicostas [1], the case n = 2 is examined, and it is proven that, as N increases without bound, the aforementioned proportion of Simpson subdivisions approaches a function of the odds ratio of a limiting form of the original 2 × 2 table. In this paper, the results are partially generalized to the cases n ≥ 3. If n = 3, the asymptotic least upper bound for the proportion of Simpson subdivisions is calculated exactly. For n = 4, simulation results show that the corresponding upper bound is of the order 10⁻⁴.
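As an illustration of the definition above, the following minimal Python sketch checks whether a given subdivision exhibits the first case of SP. It is not taken from the paper: the cell layout (a, b, c, d), the use of the sign of ad − bc as the measure of association (equivalent to comparing the odds ratio with 1 when all cells are positive), and the function names are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed cell layout: (a, b, c, d) = counts for
# (A, B), (A, not B), (not A, B), (not A, not B).
Table = Tuple[int, int, int, int]


def association(t: Table) -> int:
    """Return the sign of a*d - b*c: +1 positive, 0 independent, -1 negative."""
    a, b, c, d = t
    det = a * d - b * c
    return (det > 0) - (det < 0)


def is_simpson_first_case(subtables: List[Table]) -> bool:
    """First case of SP: negative association in the pooled table,
    but positive association or independence in every subtable."""
    # Pool the subtables cell by cell to recover the original 2 x 2 table.
    pooled = tuple(sum(cells) for cells in zip(*subtables))
    return association(pooled) < 0 and all(
        association(s) >= 0 for s in subtables
    )


# Hypothetical subdivision with n = 2: each subtable is positively
# associated, yet the pooled table (9, 10, 10, 9) is negatively associated.
example = [(5, 2, 8, 4), (4, 8, 2, 5)]
print(is_simpson_first_case(example))
```

The second case of SP could be checked in the same way by reversing the two inequalities.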
ACKNOWLEDGMENTS
The author would like to thank J. B. Kadane for suggesting the topic of this paper, and D. Banks for making some helpful comments for improving the presentation of the paper.