
On the distribution of Gini’s rank association index

Pages 7788-7796 | Received 21 Sep 2021, Accepted 25 Apr 2022, Published online: 11 May 2022

Abstract

Gini’s rank association index is a nonparametric measure of association, described in detail by Genest, Nešlehová and Ben Ghorbal (2010). In this article, we compute the null distribution of this index up to n = 28, where n is the sample size. Our methods are based on permanents and extend the results of Betro (1993). We also discuss approximations to the null distribution for large n. We believe that Gini’s rank association index should be more widely used; in particular, it may be preferable to Spearman’s rank correlation coefficient when the bivariate distribution is such that outliers are quite likely to occur.

1. Introduction

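Gini’s rank association index, also known as Gini’s gamma, is a rank correlation coefficient introduced by the Italian statistician, demographer and sociologist Corrado Gini (1914). Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a random sample from a continuous bivariate random vector $(X, Y)$, and let $(R_1, S_1), \dots, (R_n, S_n)$ be the corresponding pairs of ranks. Here $R_i$ is the rank of $X_i$ among $X_1, \dots, X_n$ and $S_i$ is the rank of $Y_i$ among $Y_1, \dots, Y_n$. Gini’s gamma is defined as

$$\gamma_n = \frac{1}{D}\sum_{i=1}^{n}\Big\{\big|(n+1-R_i)-S_i\big| - \big|R_i-S_i\big|\Big\}, \tag{1.1}$$

where

$$D = \left\lfloor \frac{n^2}{2} \right\rfloor = \begin{cases} \dfrac{n^2}{2}, & \text{if } n \text{ is even},\\[4pt] \dfrac{n^2-1}{2}, & \text{if } n \text{ is odd}. \end{cases}$$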
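Definition (1.1) translates directly into code. The following Python sketch is illustrative only. The helper names `ranks` and `gini_gamma` are hypothetical, and continuous data without ties are assumed, as in the text. It returns $\gamma_n$ as an exact fraction.

```python
from fractions import Fraction


# Hypothetical helpers for illustration; continuous data (no ties) assumed.
def ranks(values):
    """Return the ranks 1..n of a sequence without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def gini_gamma(x, y):
    """Gini's gamma, formula (1.1), as an exact Fraction."""
    n = len(x)
    R, S = ranks(x), ranks(y)
    D = n * n // 2  # floor(n^2 / 2): n^2/2 for even n, (n^2 - 1)/2 for odd n
    total = sum(abs((n + 1 - r) - s) - abs(r - s) for r, s in zip(R, S))
    return Fraction(total, D)


x = [2.1, 0.4, 3.3, 1.7, 5.0]
y = [1.0, 0.2, 2.5, 2.9, 4.1]
print(gini_gamma(x, y))  # 1/2
```

Identical rankings give $\gamma_n = 1$ and exactly reversed rankings give $\gamma_n = -1$.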

The sum in (1.1) takes even integer values between $-D$ and $D$, so $\gamma_n$ has a discrete distribution on the interval $[-1, 1]$. A value of $\gamma_n$ greater than zero indicates a positive monotonic relationship between $X$ and $Y$, while a value less than zero indicates a negative one. The closer $\gamma_n$ lies to $+1$ (respectively $-1$), the stronger the positive (respectively negative) relationship. A value near zero indicates an absence of monotonic association.

Under the assumption that $X$ and $Y$ are independent,

$$E(\gamma_n) = 0, \tag{1.2}$$

$$\operatorname{var}(\gamma_n) = \begin{cases} \dfrac{2(n^2+2)}{3(n-1)n^2}, & \text{if } n \text{ is even},\\[4pt] \dfrac{2(n^2+3)}{3(n-1)(n^2-1)}, & \text{if } n \text{ is odd}, \end{cases} \tag{1.3}$$

and, as shown by Cifarelli and Regazzini (1977), the limiting distribution of $\sqrt{n}\,\gamma_n$ is normal with mean 0 and variance 2/3.
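As a quick numerical check of (1.3), the sketch below evaluates the null variance exactly and shows $n\,\operatorname{var}(\gamma_n)$ approaching the asymptotic value 2/3. The helper name `var_gamma` is hypothetical.

```python
from fractions import Fraction


def var_gamma(n):
    """Hypothetical helper: null variance of Gini's gamma, formula (1.3)."""
    if n % 2 == 0:
        return Fraction(2 * (n * n + 2), 3 * (n - 1) * n * n)
    return Fraction(2 * (n * n + 3), 3 * (n - 1) * (n * n - 1))


for n in (5, 10, 100, 1000):
    # n * var(gamma_n) should tend to 2/3 as n grows
    print(n, float(n * var_gamma(n)))
```

Exact fractions are used so that small-$n$ values can be compared directly with an enumerated distribution.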

As emphasized by Genest, Nešlehová and Ben Ghorbal (2010), Gini’s gamma belongs to the same family of correlation coefficients as the better-known Kendall’s tau and Spearman’s rho. This family also includes Spearman’s footrule, a coefficient closely related to Gini’s gamma and defined by

$$\phi_n = 1 - \frac{3}{n^2-1}\sum_{i=1}^{n}\left|R_i - S_i\right|. \tag{1.4}$$

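The Manhattan (or city block) distance used in Spearman’s footrule and Gini’s gamma is less variable than the Euclidean metric used in Spearman’s rho, as stated by Diaconis and Graham (1977). However, unlike Gini’s gamma, Spearman’s footrule does not behave symmetrically under inversion of one ranking. Replacing each $S_i$ by $n+1-S_i$ maps $\gamma_n$ to $-\gamma_n$, whereas $\phi_n$ has no such property. For example, $\phi_n$ equals 1 for identical rankings but equals $-1/2$, not $-1$, for exactly reversed rankings when $n$ is odd.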
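The inversion property can be checked by brute force. The sketch below uses hypothetical helper names, and rankings are passed directly as permutations of $1, \dots, n$. For $n = 5$ it confirms that reversing the second ranking negates $\gamma_n$ for every permutation. It also shows that the footrule (1.4) does not reach $-1$ for reversed rankings.

```python
from fractions import Fraction
from itertools import permutations


def gamma_from_ranks(R, S):
    """Hypothetical helper: Gini's gamma (1.1) from rank vectors R and S."""
    n = len(R)
    total = sum(abs((n + 1 - r) - s) - abs(r - s) for r, s in zip(R, S))
    return Fraction(total, n * n // 2)


def footrule_from_ranks(R, S):
    """Hypothetical helper: Spearman's footrule (1.4) from rank vectors."""
    n = len(R)
    return 1 - Fraction(3, n * n - 1) * sum(abs(r - s) for r, s in zip(R, S))


n = 5
R = list(range(1, n + 1))
for S in permutations(R):
    S_rev = [n + 1 - s for s in S]  # invert the second ranking
    assert gamma_from_ranks(R, S_rev) == -gamma_from_ranks(R, S)

print(footrule_from_ranks(R, R))        # identical rankings: prints 1
print(footrule_from_ranks(R, R[::-1]))  # reversed rankings: prints -1/2
```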

The distribution of $\gamma_n$ under independence of $X$ and $Y$ has been studied by several authors. Savorgnan (1915) tabulated the exact null distribution for $n \le 5$, Salvemini (1951) extended these results to $n = 6$ and 7, and Betro (1993) tabulated the null distribution for $n \in \{8, \dots, 15\}$.

In Section 2, we outline the main methods previously used for computing the distribution of Gini’s gamma. In Section 3 we compute the exact distribution of Gini’s rank association index up to n = 28 by the Permanent and BBFG Method. In Section 4 we discuss some approximations to the distribution of Gini’s gamma for large n.

We note that Pearson’s correlation coefficient may, of course, be used to test for independence if $(X, Y)$ follows a bivariate normal distribution, at least approximately. Table 1 lists the asymptotic relative efficiencies of various rank statistics under bivariate normality, as given by Genest, Nešlehová and Ben Ghorbal (2010).

Table 1. Asymptotic relative efficiency in the bivariate normal case.

These rank correlation coefficients are suitable when any relationship between $X$ and $Y$ is expected to be monotonic. If the relationship may be non-monotonic, one may use Hoeffding’s D test or tests of a similar form, as described by Hoeffding (1948), Blum, Kiefer, and Rosenblatt (1961), Even-Zohar (2020) and Genest and Rémillard (2004). These test statistics do not, however, indicate the direction of a relationship between $X$ and $Y$, even when that relationship is monotonic.

2. Previous results

The simplest way to calculate the exact distribution of Gini’s rank association index under independence is to enumerate all $n!$ permutations; we refer to this as the Basic Permutation Method. Generating and storing the permutations is, however, expensive in both time and memory. Otten (1973) improved the Basic Permutation Method by splitting each permutation into two sub-permutations, on the sets $\{1, \dots, k\}$ and $\{k+1, \dots, n\}$ for some integer $k$ between 1 and $n$. The sub-permutations are processed separately and their contributions are then added. Using this algorithm, Otten (1973) extended the exact null distribution of Spearman’s rho to $n = 13$. The exact distribution of Spearman’s rho was later computed up to $n = 26$ by Gustafson (2009), using a method based on permanents.
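A minimal Python sketch of the Basic Permutation Method follows. It is illustrative only, and the name `null_counts_bruteforce` is hypothetical. Under independence the $X$-ranks can be fixed at $1, \dots, n$ without loss of generality. The sketch enumerates all $n!$ permutations of the $Y$-ranks and tallies the integer value of the sum in (1.1), that is, $D\gamma_n$. It is practical only for small $n$.

```python
from collections import Counter
from itertools import permutations


def null_counts_bruteforce(n):
    """Hypothetical helper: map each value of the sum in (1.1) to the number
    of permutations of the Y-ranks that produce it (X-ranks fixed at 1..n)."""
    counts = Counter()
    for S in permutations(range(1, n + 1)):
        total = sum(abs((n + 1 - i) - s) - abs(i - s)
                    for i, s in enumerate(S, start=1))
        counts[total] += 1
    return dict(sorted(counts.items()))


print(null_counts_bruteforce(3))  # {-4: 1, -2: 2, 2: 2, 4: 1}
```

Dividing each count by $n!$ gives the exact null probabilities.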

Betro (1993) derived the probability generating function (PGF) of Gini’s gamma through a procedure similar to Kendall’s (1962) derivation of the PGF of Spearman’s rho. The PGF was then rewritten as an expression involving a permanent, so that computing the PGF reduces to computing the permanent of a matrix $A$. In this way Betro computed the exact distribution of Gini’s gamma up to $n = 15$. It had previously been calculated for smaller $n$ by Savorgnan (1915), Salvemini (1951), and Cifarelli and Regazzini (1977).

3. Computing the distribution of Gini’s gamma up to n = 28

To extend the exact distribution of Gini’s gamma further, we used Betro’s method (Betro 1993) to construct a modified probability generating function (PGF). We then reduced the computational cost to $O(2^n n)$ by using the Balasubramanian-Bax-Franklin-Glynn formula rather than Ryser’s formula.

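Balasubramanian (1980), Bax (1998), Bax and Franklin (1996), and Glynn (2010) found the following formula for computing permanents, known as the Balasubramanian-Bax-Franklin-Glynn (BBFG) formula:

$$\operatorname{perm}(A) = \frac{1}{2^{n-1}} \sum_{\delta} \left[ \left( \prod_{k=1}^{n} \delta_k \right) \prod_{j=1}^{n} \sum_{i=1}^{n} \delta_i a_{ij} \right],$$

where the outer sum runs over all $\delta = (\delta_1, \delta_2, \dots, \delta_n) \in \{\pm 1\}^n$ with $\delta_1 = 1$, that is, over $2^{n-1}$ sign vectors.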
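A direct transcription of the BBFG formula for a numeric matrix is sketched below, together with the definition of the permanent as a sum over permutations for comparison. Both helper names are hypothetical. The sketch loops over the $2^{n-1}$ sign vectors naively, without the Gray-code ordering often used to update column sums between consecutive sign vectors, so each term costs $O(n^2)$ operations.

```python
from itertools import permutations, product
from math import prod


def perm_bbfg(A):
    """Hypothetical helper: permanent via the BBFG (Glynn) formula."""
    n = len(A)
    total = 0
    # delta_1 is fixed to +1; the remaining n-1 signs range over {+1, -1}
    for tail in product((1, -1), repeat=n - 1):
        delta = (1,) + tail
        col_sums = [sum(delta[i] * A[i][j] for i in range(n)) for j in range(n)]
        total += prod(delta) * prod(col_sums)
    return total // 2 ** (n - 1)  # division is exact for integer matrices


def perm_naive(A):
    """Hypothetical helper: permanent from its definition (sum over permutations)."""
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


print(perm_bbfg([[1, 2], [3, 4]]))  # 10 = 1*4 + 2*3
```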

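We define the modified PGF of a random variable $\xi$ that takes finitely many integer values (positive, negative or zero) as

$$F_\xi^{*}(z) = \sum_{k=m_1}^{m_2} P_\xi(k)\, z^{k},$$

where $m_1, m_2 \in \mathbb{Z}$, $m_1 < m_2$, and $P_\xi(k)$ is the probability that $\xi = k$. Because $m_1$ may be negative, $F_\xi^{*}$ is in general a Laurent polynomial rather than an ordinary polynomial.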

Now define $M = D\gamma_n$, which takes even integer values from $-D$ to $D$. We use methods similar to those used by Cifarelli and Regazzini (1977) to find the PGF of $M + D$. This gives the modified PGF of $M$ as

$$F_M^{*}(z) = \frac{1}{n!}\operatorname{perm}(A),$$

where $A$ is the $n \times n$ matrix with entries $a_{ij} = z^{q_{i,j}}$, with $q_{i,j} = \left|(n+1-i) - j\right| - \left|i - j\right|$ for $i, j \in \{1, \dots, n\}$.
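As a worked illustration (our own small example, which is easy to check by hand), take $n = 3$, so that $D = 4$. The exponents $q_{i,j}$, the matrix $A$ and its permanent, expanded along the first row, are:

```latex
\[
(q_{i,j}) = \begin{pmatrix} 2 & 0 & -2\\ 0 & 0 & 0\\ -2 & 0 & 2 \end{pmatrix},
\qquad
A = \begin{pmatrix} z^{2} & 1 & z^{-2}\\ 1 & 1 & 1\\ z^{-2} & 1 & z^{2} \end{pmatrix},
\]
\[
\operatorname{perm}(A) = z^{2}(z^{2} + 1) + (z^{2} + z^{-2}) + z^{-2}(1 + z^{-2})
  = z^{4} + 2z^{2} + 2z^{-2} + z^{-4},
\]
\[
F_M^{*}(z) = \frac{z^{-4} + 2z^{-2} + 2z^{2} + z^{4}}{6}.
\]
```

Hence $P(M = \pm 4) = 1/6$ and $P(M = \pm 2) = 1/3$. So $\gamma_3$ takes the values $\pm 1$ with probability 1/6 each and $\pm 1/2$ with probability 1/3 each, and the resulting variance, 1/2, agrees with (1.3).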

Using the above PGF, we implemented the BBFG formula in C++ and calculated the distribution of $\gamma_n$ up to $n = 28$. We call this method for computing the distribution of Gini’s rank association coefficient the Permanent and BBFG Method.
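The Python sketch below combines the two ingredients. It builds the exponents $q_{i,j}$ of $A$ and stores Laurent polynomials as dictionaries mapping exponents to integer coefficients. It then applies the BBFG formula to obtain $n!\,F_M^{*}(z)$, that is, the number of permutations giving each value of $M$. This is a hypothetical illustration, not the authors' C++ implementation. It relies on Python's arbitrary-precision integers and a naive loop over sign vectors, so it is only practical for moderate $n$.

```python
from collections import defaultdict
from itertools import product


def poly_mul(p, q):
    """Multiply two Laurent polynomials stored as {exponent: coefficient}."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] += c1 * c2
    return out


def null_counts_bbfg(n):
    """Hypothetical helper: counts of M = D * gamma_n over all n! permutations,
    read off from perm(A) computed with the BBFG formula."""
    # Exponents q_{i,j} of the entries z^{q_{i,j}} of A (1-based i, j).
    q = [[abs((n + 1 - i) - j) - abs(i - j) for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    acc = defaultdict(int)
    for tail in product((1, -1), repeat=n - 1):
        delta = (1,) + tail
        sign = 1
        for d in delta:
            sign *= d
        term = {0: sign}  # constant polynomial: the product of the deltas
        for j in range(n):
            col = defaultdict(int)  # sum_i delta_i * z^{q_{i,j}}
            for i in range(n):
                col[q[i][j]] += delta[i]
            term = poly_mul(term, col)
        for e, c in term.items():
            acc[e] += c
    scale = 2 ** (n - 1)
    # Non-permutation terms cancel; remaining coefficients are counts * 2^(n-1).
    return {e: c // scale for e, c in sorted(acc.items()) if c != 0}


print(null_counts_bbfg(3))  # {-4: 1, -2: 2, 2: 2, 4: 1}
```

Dividing the counts by $n!$ gives the coefficients of $F_M^{*}(z)$.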

Tables A1–A3 list the cumulative distribution function (CDF) of $M$ for $n$ from 5 to 28, rounded to five decimal places. The values in Table A1 for $n \in \{8, \dots, 15\}$ are consistent with those of Betro (1993).
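Given the number of permutations producing each value of $M$, a tabulated CDF is a running sum divided by $n!$. A minimal sketch follows. The helper name `cdf_table` is hypothetical, and the counts could come from either enumeration method.

```python
from fractions import Fraction
from math import factorial


def cdf_table(counts, n):
    """Hypothetical helper: {m: P(M <= m)} rounded to five decimal places."""
    total = factorial(n)
    if sum(counts.values()) != total:
        raise ValueError("counts must cover all n! permutations")
    table, running = {}, 0
    for m in sorted(counts):
        running += counts[m]
        table[m] = round(float(Fraction(running, total)), 5)
    return table


# Counts of M for n = 3, obtained by enumerating the 3! permutations.
print(cdf_table({-4: 1, -2: 2, 2: 2, 4: 1}, 3))
# {-4: 0.16667, -2: 0.5, 2: 0.83333, 4: 1.0}
```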

4. Approximations to Gini’s gamma for large n

This section considers two possible approximations for the null distribution of Gini’s gamma: the normal approximation and the Edgeworth approximation.

The mean and variance of $\gamma_n$ are given by (1.2) and (1.3), respectively. Since $M = D\gamma_n$, it follows that $E(M) = 0$ and

$$\sigma_M^2 = \operatorname{var}(M) = \begin{cases} \dfrac{n^2(n^2+2)}{6(n-1)}, & \text{if } n \text{ is even},\\[4pt] \dfrac{(n^2+3)(n^2-1)}{6(n-1)}, & \text{if } n \text{ is odd}. \end{cases}$$

Since the values $m$ attainable by $M$ are even integers, we apply a continuity correction of 1. We approximate the CDF of $M$ at $m$ by $\Phi(x)$, where $x = (m+1)/\sigma_M$ and $\Phi$ is the cumulative distribution function of a standard normal random variable. This is the normal approximation.
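The normal approximation needs only $\sigma_M$. A minimal sketch follows, in which the helper names are hypothetical and $\Phi$ is evaluated with `math.erf`. $M$ is symmetric about 0 and takes only even values, so the exact CDF satisfies $P(M \le m) + P(M \le -m-2) = 1$. The continuity-corrected approximation preserves this identity.

```python
import math


def sigma_M2(n):
    """Hypothetical helper: null variance of M = D * gamma_n."""
    if n % 2 == 0:
        return n * n * (n * n + 2) / (6 * (n - 1))
    return (n * n + 3) * (n * n - 1) / (6 * (n - 1))


def std_normal_cdf(x):
    """Standard normal CDF Phi(x) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_approx_cdf(m, n):
    """Hypothetical helper: approximate P(M <= m), continuity correction 1."""
    x = (m + 1) / math.sqrt(sigma_M2(n))
    return std_normal_cdf(x)


for m in (-100, -40, 0, 40, 100):
    print(m, round(normal_approx_cdf(m, 28), 5))
```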

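With the same continuity correction, the Edgeworth approximation to the CDF of $M$ is

$$\Phi(x) - \frac{\beta}{24\sqrt{2\pi}}\, e^{-x^2/2}\left(x^3 - 3x\right),$$

where $\beta = E(\gamma_n^4)/[\operatorname{var}(\gamma_n)]^2 - 3$ is the (excess) kurtosis of $\gamma_n$. No skewness term appears because the null distribution of $\gamma_n$ is symmetric about 0.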

Formulae for the fourth moment are given by Landenna, Scagni and Boldrini (1989). We have corrected a suspected typographical error (by changing 107 to 105).
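The sketch below illustrates the Edgeworth approximation. It computes $\beta$ from an exact distribution of $M$ rather than from the fourth-moment formulae of Landenna, Scagni and Boldrini (1989), which are not reproduced here. The kurtosis of $\gamma_n$ equals that of $M$ because $M = D\gamma_n$. Helper names are hypothetical. For $n = 3$, $E(M^4) = 96$ and $\sigma_M^2 = 8$ give $\beta = 96/64 - 3 = -3/2$. The approximation is of course crude for such a small $n$; the example only shows the computation.

```python
import math
from fractions import Fraction


def excess_kurtosis(counts):
    """Hypothetical helper: beta = E(M^4) / var(M)^2 - 3 for a distribution
    with mean 0, given as {value: count}."""
    total = sum(counts.values())
    m2 = Fraction(sum(c * m**2 for m, c in counts.items()), total)
    m4 = Fraction(sum(c * m**4 for m, c in counts.items()), total)
    return m4 / m2**2 - 3


def edgeworth_cdf(m, sigma_M, beta):
    """Hypothetical helper: Edgeworth approximation to P(M <= m),
    with continuity correction 1 and the kurtosis term only."""
    x = (m + 1) / sigma_M
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    density = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return phi - beta / 24 * density * (x**3 - 3 * x)


counts3 = {-4: 1, -2: 2, 2: 2, 4: 1}  # exact counts of M for n = 3
beta3 = excess_kurtosis(counts3)      # Fraction(-3, 2)
print(edgeworth_cdf(0, math.sqrt(8), float(beta3)))
```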

Table 2 compares the normal and Edgeworth approximations with the calculated CDF of $M$ for selected values of $m$ when $n = 28$. It shows that the normal distribution gives a good approximation to the null distribution of Gini’s gamma for $n = 28$. It is therefore likely to provide a good approximation for $n > 28$ as well. The Edgeworth approximation is very accurate, more so than the normal approximation, although its value may slightly exceed 1 in certain cases.

Table 2. The calculated CDF of Gini’s gamma for n = 28 with the normal and Edgeworth approximations.

5. Conclusions

This article reviews methods for calculating and approximating the null distribution of Gini’s gamma and extends its tabulation to $n = 28$. One reason for calculating further exact values is to verify that the normal and Edgeworth approximations are generally reliable. Gini’s gamma is symmetric under inversion of a ranking, and its use of the Manhattan distance implies that it is less variable than statistics based on the Euclidean metric. The tables in the Appendix may be used to construct tests of bivariate independence based on Gini’s gamma for $n$ up to 28. The approximations described in Section 4 may be used, with reasonable confidence, to test independence for $n$ larger than 28.

Acknowledgments

We thank Dr. Andreas Grothey (University of Edinburgh) for his valuable advice on code optimization and Prof. Christian Genest (McGill University) for explaining some points. We also thank Liam Bryan and Xueying Liu (Indiana University-Purdue University) for their unpublished project report: A Study of Spearman’s Rank Correlation. Finally, we would like to thank the University of Edinburgh and the University of Bath for the use of their high-powered computer facilities.

Additional information

Funding

Yiwei Zong gratefully acknowledges financial support in the form of a vacation grant from the School of Mathematics and College of Science and Engineering at the University of Edinburgh. Ffion Loring gratefully acknowledges financial support in the form of a vacation grant from MAC-MIGS at the Maxwell Institute Graduate School.

References

  • Balasubramanian, K. 1980. Combinatorics and diagonals of matrices. Thesis, Indian Statistical Institute, Calcutta.
  • Bax, E. T. 1998. Finite-difference algorithms for counting problems. Thesis, California Institute of Technology, Pasadena, CA.
  • Bax, E. T., and J. Franklin. 1996. A finite-difference sieve to count paths and cycles by length. Information Processing Letters 60 (4):171–6. doi:10.1016/S0020-0190(96)00159-7.
  • Betro, B. 1993. On the distribution of Gini’s rank association coefficient. Communications in Statistics - Simulation and Computation 22 (2):497–505. doi:10.1080/03610919308813105.
  • Blum, J. R., J. Kiefer, and M. Rosenblatt. 1961. Distribution free tests of independence based on the sample distribution function. The Annals of Mathematical Statistics 32 (2):485–98. doi:10.1214/aoms/1177705055.
  • Cifarelli, D. M., and E. Regazzini. 1977. On a distribution-free test of independence based on Gini’s rank association coefficient. Recent Developments in Statistics (Proceedings of the European Meeting of Statisticians, Grenoble 1976), Amsterdam, North Holland, 375–85.
  • Diaconis, P., and R. L. Graham. 1977. Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society: Series B (Methodological) 39 (2):262–8. doi:10.1111/j.2517-6161.1977.tb01624.x.
  • Even-Zohar, C. 2020. Independence: Fast rank tests. arXiv preprint arXiv:2010.09712. doi:10.48550/arXiv.2010.09712.
  • Genest, C., and B. Rémillard. 2004. Tests of independence and randomness based on the empirical copula process. Test 13 (2):335–69. doi:10.1007/BF02595777.
  • Genest, C., J. Nešlehová, and N. Ben Ghorbal. 2010. Spearman’s footrule and Gini’s gamma: A review with complements. Journal of Nonparametric Statistics 22 (8):937–54. doi:10.1080/10485250903499667.
  • Gini, C. 1914. L’Ammontare e la Composizione della Ricchezza delle Nazioni. Torino: Bocca.
  • Glynn, D. G. 2010. The permanent of a square matrix. European Journal of Combinatorics 31 (7):1887–91. doi:10.1016/j.ejc.2010.01.010.
  • Gustafson, L. 2009. Spearman’s rho null distribution. Last modified September 6. https://luke-g.com/math/spearman/index.html (accessed August 6, 2021).
  • Hoeffding, W. 1948. A non-parametric test of independence. The Annals of Mathematical Statistics 19 (4):546–57. doi:10.1214/aoms/1177730150.
  • Kendall, M. G. 1962. Rank correlation methods. 3rd ed. London: Charles Griffin.
  • Landenna, G., A. Scagni, and M. Boldrini. 1989. An approximated distribution of the Gini’s rank association coefficient. Communications in Statistics - Theory and Methods 18 (6):2017–26. doi:10.1080/03610928908830019.
  • Otten, A. 1973. The null distribution of Spearman’s S when n=13(1)16. Statistica Neerlandica 27 (1):19–20.
  • Salvemini, T. 1951. Sui vari indici di cograduazione. Statistica 11:133–54.
  • Savorgnan, F. 1915. Sulla Formazione dei Valori dell’Indice di Cograduazione. Studi Economico-Giuridici dell’Università di Cagliari.

Appendix A

Table A1. The CDF of $M$ for $5 \le n \le 18$.

Table A2. The CDF of $M$ for $19 \le n \le 23$.

Table A3. The CDF of $M$ for $24 \le n \le 28$.