Abstract
In this paper, we discuss some important aspects of the bivariate alternative zero-inflated logarithmic series distribution (BAZILSD) of which the marginals are the alternative zero-inflated logarithmic series distributions of Kumar and Riyaz (2015. An alternative version of zero-inflated logarithmic series distribution and some of its applications. Journal of Statistical Computation and Simulation, 85(6), 1117–1127). We study some important properties of the distribution by deriving expressions for its probability mass function, factorial moments, conditional probability generating functions, and recursion formulae for its probabilities, raw moments and factorial moments. The parameters of the BAZILSD are estimated by the method of maximum likelihood and certain test procedures are also considered. Further certain real-life data applications are cited for illustrating the usefulness of the model. A simulation study is conducted for assessing the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
1. Introduction
Bivariate discrete distributions have received much attention in the literature; see, for example, Ghosh and Balakrishnan (2015), Hassan and El-Bassiouni (2013), Kemp (2013), Kumar (2008), Kocherlakota and Kocherlakota (1992) and references therein. Owing to the extensive applications of the logarithmic series distribution in various areas of scientific research, especially biology, ecology and meteorology, the bivariate logarithmic series distribution (BLSD) is of particular interest. Chapter 7 of Kocherlakota and Kocherlakota (1992) is fully devoted to the BLSD. Subrahmaniam (1966) defined the BLSD through the following probability generating function (pgf) (1.1) in which , and such that . An important drawback of the BLSD from a practical point of view is that it excludes the (0, 0)-th observation from its support. To overcome this difficulty, Kumar and Riyaz (2014) considered a class of bivariate distributions, namely the ‘bivariate zero-inflated logarithmic series distribution (BZILSD)’, through the following probability mass function (pmf), for any non-negative integers and , , and such that . (1.2) where and in which is the Gauss hypergeometric function (cf. Mathai & Haubold, 2008).
Kumar and Riyaz (2013) considered the zero-inflated logarithmic series distribution (ZILSD) through the following pgf, in which with . (1.3) or equivalently, (1.4) Kumar and Riyaz (2015) considered another zero-inflated logarithmic series distribution, which they termed ‘the alternative zero-inflated logarithmic series distribution (AZILSD)’, through the following pmf, for , (1.5) in which , , and such that . The pgf of the AZILSD with pmf (1.5) is (1.6) or equivalently, (1.7) Kumar and Riyaz (2017) studied an extended version of the AZILSD and its important properties. Kumar and Riyaz (2016) considered an order k version of the AZILSD and studied its important applications.
In this paper, we consider a bivariate version of the AZILSD under the name ‘the bivariate alternative zero-inflated logarithmic series distribution’ or, in short, ‘the BAZILSD’, and discuss some of its important aspects. In Section 2, we derive the BAZILSD as a bivariate random sum distribution of independent and identically distributed bivariate Bernoulli random variables and show that the marginal distributions of the BAZILSD are AZILSD. Section 2 also contains expressions for its pmf, mean, covariance, factorial moments and conditional pgfs. In Section 3, we derive certain recursion formulae for the probabilities, raw moments and factorial moments of the BAZILSD. In Section 4, we describe the estimation of the parameters of the BAZILSD by the method of maximum likelihood and suggest certain test procedures. In Section 5, we illustrate the usefulness of the BAZILSD by fitting the distribution to certain real-life data sets. In Section 6, a brief simulation study is conducted to examine the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
It is important to note that the BAZILSD possesses a bivariate random sum structure, as shown in Section 2. Certain bivariate random sum distributions have been studied in the literature; see, for example, Kumar (2007, 2013). The random sum structure arises in several areas of scientific research, particularly in actuarial science, agricultural science, biological science and physical science. Chapter 9 of Johnson et al. (2005) is fully devoted to univariate random sum distributions.
To simplify the notation, we adopt the following throughout the manuscript. (1.8) (1.9) (1.10) (1.11) (1.12) (1.13)
2. A genesis and some properties of the BAZILSD
First, we derive the BAZILSD in the following and discuss some of its properties.
Consider the sequence of independent and identically distributed bivariate Bernoulli random vectors, each with pgf in which , with such that , and . Let be a non-negative integer-valued random variable having the AZILSD with pgf (1.6), in which . Assume that and ’s are independent. Define , for each in which and , for and . Set where denotes the indicator function of an event . Then the pgf of is (2.1) where is defined in (1.9).
We call a distribution with pgf (2.1) ‘the bivariate alternative zero-inflated logarithmic series distribution’ or, in short, ‘the BAZILSD’. Clearly, when , the pgf given in (2.1) reduces to the following pgf of the BZILSD with pmf (1.2): (2.2) which shows that the proposed bivariate version of the AZILSD can be considered a more flexible model than the BZILSD from a practical point of view. Further, the marginals of the BAZILSD are AZILSD, whereas the marginals of the BZILSD are not ZILSD.
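The random sum construction described at the start of this section can be illustrated with a short simulation sketch. It is only a minimal illustration under stated assumptions: the sampler `sample_count` for the count variable and the cell probabilities in `cell_probs` are hypothetical placeholders, since the AZILSD pmf and the bivariate Bernoulli pgf are not reproduced here, and the sketch implements a plain bivariate random sum without the indicator-function step of the construction.

```python
import random


def sample_bivariate_random_sum(sample_count, cell_probs, rng):
    """Draw one pair (X1, X2) as a sum of N i.i.d. bivariate Bernoulli vectors.

    sample_count : hypothetical callable rng -> non-negative int (one draw of N)
    cell_probs   : dict mapping cells such as (1, 0), (0, 1), (1, 1) to
                   probabilities (an assumed parameterisation)
    rng          : random.Random instance
    """
    cells = list(cell_probs)
    weights = [cell_probs[cell] for cell in cells]
    n = sample_count(rng)
    x1 = x2 = 0
    for _ in range(n):
        # One bivariate Bernoulli vector (Y1, Y2), drawn by cell probability.
        y1, y2 = rng.choices(cells, weights=weights)[0]
        x1 += y1
        x2 += y2
    return x1, x2
```

Any sampler for the count distribution can be passed as `sample_count`; the dependence between the two components comes from the shared count and from the (1, 1) cell.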
Proposition 2.1.
If follows the BAZILSD, then the marginal distribution of for is AZILSD with pgf given below. and
The proof follows from the fact that and .
Proposition 2.2.
The pgf of the conditional distribution of given is the following: for any non-negative integer , (2.3)
Proof:
For any non-negative integer assume that . Now, we have the following partial derivatives of order of with respect to evaluated at . (2.4) where for (2.5) and is defined in (1.8).
Now, applying the formula for the conditional pgf in terms of partial derivatives of the joint pgf developed by Subrahmaniam (1966), we obtain the conditional pgf of given as which implies (2.3) in the light of (1.8).
Remark 2.1.
The conditional distribution of given as given in (2.3) can be written as where is the pgf of a binomial random variable with parameters and and is the pgf of a random variable following the AZILSD with parameters , and . Thus clearly, the conditional distribution given is the distribution of the sum of two independent random variables and .
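Since the conditional pgf factorises into two pgfs, the conditional pmf is the convolution of the two component pmfs. The following sketch shows that step numerically. It is an illustration only: `binomial_pmf` and `convolve` are names introduced here, and the second argument of `convolve` stands in for a (truncated) AZILSD pmf vector, which is not reproduced in this sketch.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the binomial(n, p) pmf as a list indexed by 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(pmf_a, pmf_b):
    """pmf of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out
```

If the second pmf is truncated at a finite support point, the result is exact only up to the truncated tail mass.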
By using Remark 2.1, we obtain the following proposition.
Proposition 2.3:
Let follow the BAZILSD with pgf (2.1). Then (2.6) (2.7)
Remark 2.2:
By a similar approach, for any non-negative integer with we can obtain the conditional pgf of given by interchanging and in (2.3). Therefore, comments similar to those in Remark 2.1 are valid regarding the conditional distribution of given , and the explicit expressions for and can be obtained by interchanging and in the right-hand side expressions of (2.6) and (2.7), respectively.
Proposition 2.4:
Let follow the BAZILSD with pgf (2.1) and let be any non-negative integers. The pmf and the -th factorial moment of the BAZILSD are (2.8) (2.9) where is defined in (1.2), for , ’s are defined in (1.11) and (1.12) and .
Proof:
In order to obtain the probability mass function of the BAZILSD, we need the following derivatives of , in which is a non-negative integer. (2.10) where (2.11) The following derivatives are needed in the sequel, in which and . (2.12) (2.13) Differentiating both sides of (2.10) times with respect to and applying (2.12) and (2.13), we get the following. (2.14) By putting in (2.14) and dividing by , we get (2.8). By putting in (2.14), we get (2.9).
Proposition 2.5:
Let follow the BAZILSD with pgf (2.1). Then we have the following, in which , (2.15) (2.16) and (2.17) where and are given in (2.5).
The proof follows from (2.9) in the light of the relations:
Proposition 2.6.
Let follow the BAZILSD with pgf (2.1). Then follows the modified AZILSD studied by Kumar and Riyaz (2013).
The proof follows from the fact that the pgf of is
3. Recursion formulae
In this section, we develop certain recursion formulae for probabilities, raw moments and factorial moments. Let be a random vector with pgf (2.1). For the sake of computational simplicity, we define , for . Now we have the following from (2.1) in which , for (3.1) Now we obtain the following propositions.
Proposition 3.1
The probability mass function of the BAZILSD satisfies the following recurrence formulae, in which is defined in Proposition 2.5. (3.2) (3.3) (3.4) (3.5)
Proof:
From (2.10) with , we have the following. (3.6) On differentiating both sides of (3.1) with respect to , we have (3.7) From (3.1), we also have the following. (3.8) Now by using (3.7) and (3.8) in (3.6) we get (3.9) On equating the coefficient of on both sides of (3.9), we get (3.2). By equating the coefficient of on both sides of (3.9), we get the relation (3.3). We omit the proof of relations (3.4) and (3.5) as it is similar to that of relations (3.2) and (3.3).
Proposition 3.2:
Two recurrence formulae for the -th raw moment of the BAZILSD are the following, for . (3.10) (3.11)
Proof:
The characteristic function of the BAZILSD with pgf (2.1) is the following. For in and , (3.12) where .
On differentiating (3.12) with respect to we get (3.13) In the light of (3.12), we have the following from (3.13). Now, on expanding the exponential functions, rearranging the terms and using standard properties of double sums, we obtain the following. (3.14) On equating coefficients of on both sides of (3.14), we get the relation (3.10). A similar procedure will give (3.11).
Proposition 3.3:
The -th order factorial moment of the BAZILSD satisfies the following recurrence formulae, for , in which . (3.15) (3.16)
Proof:
Let be a random vector having the BAZILSD with pgf as given in (3.1). Then the factorial moment generating function of the BAZILSD is (3.17) where .
On differentiating (3.17) with respect to , we get In the light of (3.17), we can write this as (3.18) Equating the coefficient of on both sides of (3.18), we get (3.15). Similar procedures will lead to (3.16).
4. Estimation and testing
In this section, we discuss the estimation of the parameters , , and of the BAZILSD by the method of maximum likelihood and construct certain test procedures for testing the significance of the additional parameter of the BAZILSD.
4.1. Maximum likelihood estimation
Let be the frequency of the -th cell of a bivariate data set. Let be the highest value of observed and be the highest value of observed. Then the likelihood function of the sample is (4.1) where is the pmf of the BAZILSD as given in (2.8). Taking logarithms on both sides of (4.1), we get (4.2) where is given in (1.9), and is defined in Proposition 2.4.
Let , , and denote the maximum likelihood estimators of the parameters , , and of the BAZILSD. On differentiating (4.2) partially with respect to the parameters , , and , respectively, and equating to zero, we get the following likelihood equations, in which in the light of , where and are defined in (2.5) and (1.8), respectively. (4.3) (4.4) (4.5) and (4.6) On solving the likelihood equations (4.3)–(4.6) with mathematical software such as MATLAB, Mathcad or Mathematica, one can obtain the maximum likelihood estimators of the parameters , , and .
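As an alternative to solving the likelihood equations symbolically, the log-likelihood of a bivariate frequency table can be evaluated directly and passed to a numerical optimiser. The sketch below shows only the evaluation step. It is a minimal illustration: `pmf` is a hypothetical callable standing in for the BAZILSD pmf (2.8), which is not reproduced here, and `params` is whatever parameter container that callable expects.

```python
from math import log


def log_likelihood(freq, pmf, params):
    """Log-likelihood of a bivariate frequency table.

    freq   : dict {(x1, x2): observed frequency of that cell}
    pmf    : hypothetical callable pmf(x1, x2, params) -> probability
    params : parameter container understood by pmf
    """
    total = 0.0
    for (x1, x2), count in freq.items():
        if count > 0:  # empty cells contribute nothing to the sum
            total += count * log(pmf(x1, x2, params))
    return total
```

Maximising this function over the admissible parameter region gives the same estimates as solving the likelihood equations, provided the maximum lies in the interior of that region.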
4.2. Testing of the hypothesis
For testing the hypothesis against the alternative hypothesis , we construct the generalized likelihood ratio test (GLRT) and Rao’s efficient score test (REST) as follows.
In the case of the GLRT, the test statistic is (4.7) where is the maximum likelihood estimator of with no restrictions, and is the maximum likelihood estimator of when . The test statistic given in (4.7) is asymptotically distributed as chi-square with one degree of freedom. For details, see Rao (1973).
In the case of the REST, the following test statistic can be used. (4.8) where and are the Fisher information matrices in which and for and are as given in the Appendix. The test statistic given in (4.8) follows the chi-square distribution with one degree of freedom (see Rao, 1973).
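The GLRT decision can be sketched numerically as follows. This is an illustration that assumes (4.7) has the usual form of twice the difference between the maximised log-likelihoods of the unrestricted and restricted models; the function name `glrt` is introduced here. The p-value uses the identity that, for one degree of freedom, the chi-square upper tail probability at x equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def glrt(loglik_full, loglik_null):
    """Return (statistic, p_value) for a one-degree-of-freedom GLRT.

    loglik_full : maximised log-likelihood with no restriction
    loglik_null : maximised log-likelihood under the null hypothesis
    """
    stat = 2.0 * (loglik_full - loglik_null)
    # Chi-square(1) survival function; negative rounding noise is clipped to 0.
    p_value = erfc(sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value
```

A statistic of 3.84 corresponds to a p-value of about 0.05, which matches the 5% critical value for one degree of freedom.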
5. Applications
For numerical applications, we consider two real-life data sets. The first, from Mitchell and Paulson (1981), consists of the number of aborts by 109 aircraft in two consecutive six-month periods of a one-year period. The second, taken from Partrat (1993), is the yearly frequency of hurricanes affecting tropical cyclones in two zones belonging to the North Atlantic coastal states in the USA. We have fitted the BZILSD, the BAZILSD and the bivariate Poisson distribution (BPD) to these data sets by the method of maximum likelihood. For the first data set, the maximum likelihood estimates (MLEs) of the parameters in case of the BZILSD are , and , those in case of the BAZILSD are , , and , and those in case of the BPD are and . For the second data set, the MLEs of the parameters in case of the BZILSD are , and = 0.02, those in case of the BAZILSD are , , and , and those in case of the BPD are and . The computed values of the expected frequencies of the BZILSD, the BAZILSD and the BPD are presented in Tables and .
(In each cell, the first row represents the observed frequency, the second row represents theoretical frequency of the BZILSD, the third row represents theoretical frequency of BAZILSD and the last row represents theoretical frequency of BPD).
The goodness-of-fit test is applied to the first data set in case of the BAZILSD in nine categories [(0,0), (0,1), (0,2), (0, 3 and above); (1,0), (1, 1 and above); (2,0), (2, 1 and above) and (3, 0 and above)], in case of the BZILSD in eight categories [(0,0), (0,1), (0,2), (0, 3 and above); (1,0), (1, 1 and above); (2, 0 and above) and (3, 0 and above)] and in case of the BPD in seven categories [(0,0), (0, 1 and above); (1,0), (1, 1 and above); (2, 0), (2, 1 and above); (3, 0 and above)]. For the second data set, the goodness-of-fit test is applied in case of the BAZILSD in seven categories [(0,0), (0,1), (0, 2 and above); (1,0), (1, 1 and above); (2, 0 and above) and (3, 0 and above)], in case of the BZILSD in seven categories [(0,0), (0,1), (0, 2 and above); (1,0), (1, 1 and above); (2, 0) and (2, 1 and above)] and in case of the BPD in seven categories [(0,0), (0,1), (0, 2 and above); (1,0), (1, 1 and above); (2, 0), (2, 1 and above)]. The computed values of the chi-square statistic and for the three models (BZILSD, BAZILSD and BPD) for data set 1 and data set 2 are presented in Table . Based on the values of the chi-square statistic and , it can be observed that the BAZILSD gives a better fit to both data sets than the existing models, the BZILSD and the BPD.
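The pooling of cells into categories and the resulting chi-square statistic can be sketched as follows. This is a minimal illustration: the helper names `pool` and `chi_square_statistic` and the `category_of` mapping are introduced here, and the sketch uses the standard Pearson form of the statistic, which is assumed rather than taken from the tables of this paper.

```python
def pool(freq, category_of):
    """Pool cell frequencies into categories.

    freq        : dict {(x1, x2): frequency (observed or expected)}
    category_of : hypothetical callable mapping a cell to its category label,
                  e.g. sending (0, 3), (0, 4), ... to (0, '3 and above')
    """
    pooled = {}
    for cell, value in freq.items():
        key = category_of(cell)
        pooled[key] = pooled.get(key, 0.0) + value
    return pooled


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic over matching category frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The same `category_of` mapping should be applied to the observed and the expected frequencies so that the two pooled tables share their categories.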
Table contains the computed values of and the GLRT statistic for the BAZILSD in case of for both the data sets. We have also computed the values of based on (4.8) for the BAZILSD in the case of first data set as and for the BAZILSD in the case of second data set as given below. Since the critical value for the test at 5% level of significance and one degree of freedom is 3.84, the null hypothesis that is rejected in both the above cases in respect of GLRT and REST.
6. Simulation
It is quite difficult to examine the theoretical performance of the estimators of the parameters of the BAZILSD obtained by the method of maximum likelihood. We have therefore carried out a simulation study to assess the performance of the estimators. We have simulated data sets of sample sizes 150, 300 and 600 in both the positively correlated and negatively correlated situations of the BAZILSD by using a Markov chain Monte Carlo (MCMC) procedure, with 200 replications in each case. We have considered the following two sets of parameters: (i) , , , (positively correlated) and (ii) , , , (negatively correlated) as initial values of the parameters while simulating the data sets. The computed values of the bias and standard errors of each of the estimators are given in Table . From Table , it can be observed that both the bias and the standard errors of the estimators decrease as the sample size increases.
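The summary step of such a simulation study can be sketched as follows. This is an illustration under an assumption: the standard error is taken to be the sample standard deviation of the estimates across replications, which is a common convention but is not stated explicitly above. The function name `bias_and_se` is introduced here.

```python
from statistics import mean, stdev


def bias_and_se(estimates, true_value):
    """Summarise replicated estimates of one parameter.

    estimates  : list of estimates, one per replication (at least two)
    true_value : parameter value used to simulate the data
    Returns (bias, standard_error).
    """
    bias = mean(estimates) - true_value
    standard_error = stdev(estimates)  # sample standard deviation (n - 1)
    return bias, standard_error
```

Applying this to each parameter and each sample size gives one row of a bias and standard error table per setting.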
Acknowledgements
The authors are grateful to the Editor-in-Chief and the anonymous Referees for their valuable comments on an earlier version which helped to improve the quality of this article.
References
- Ghosh, I., & Balakrishnan, N. (2015). Study of incompatibility or near compatibility of bivariate discrete conditional probability distributions through divergence measures. Journal of Statistical Computation and Simulation, 85(1), 117–130. https://doi.org/10.1080/00949655.2013.806509
- Hassan, M. Y., & El-Bassiouni, M. Y. (2013). Modelling Poisson marked point processes using bivariate mixture transition distributions. Journal of Statistical Computation and Simulation, 83(8), 1440–1452. https://doi.org/10.1080/00949655.2012.662683
- Johnson, N. L., Kemp, A. W., & Kotz, S. (2005). Univariate discrete distributions. 3rd ed. Wiley.
- Kemp, A. W. (2013). New discrete Appell and Humbert distributions with relevance to bivariate accident data. Journal of Multivariate Analysis, 113, 2–6. https://doi.org/10.1016/j.jmva.2011.08.011
- Kocherlakota, S., & Kocherlakota, K. (1992). Bivariate discrete distributions. Marcel Dekker.
- Kumar, C. S. (2007). Some properties of bivariate generalized hypergeometric probability distribution. Journal of the Korean Statistical Society, 36, 349–355. http://koreascience.or.kr/article/JAKO200734515966569.page?&lang=ko.
- Kumar, C. S. (2008). A unified approach to bivariate discrete distributions. Metrika, 67(1), 113–121. https://doi.org/10.1007/s00184-007-0125-8
- Kumar, C. S. (2013). The bivariate confluent hypergeometric series distribution. Economic Quality Control, 28(2), 23–30. https://doi.org/10.1515/eqc-2013-0009
- Kumar, C. S., & Riyaz, A. (2013). On the zero-inflated logarithmic series distribution and its modification. Statistica, 73(4), 477–492. https://doi.org/10.6092/issn.1973-2201/4498
- Kumar, C. S., & Riyaz, A. (2014). On a bivariate version of zero-inflated logarithmic series distribution and its applications. Journal of Combinatorics, Information and System Science, 39(4), 249–262.
- Kumar, C. S., & Riyaz, A. (2015). An alternative version of zero-inflated logarithmic series distribution and some of its applications. Journal of Statistical Computation and Simulation, 85(6), 1117–1127. https://doi.org/10.1080/00949655.2013.867347
- Kumar, C. S., & Riyaz, A. (2016). An order k version of the alternative zero-inflated logarithmic series distribution and its applications. Journal of Applied Statistics, 43(14), 2681–2695. https://doi.org/10.1080/02664763.2016.1142949
- Kumar, C. S., & Riyaz, A. (2017). On some aspects of a generalized alternative zero-inflated logarithmic series distribution. Communications in Statistics – Simulation and Computation, 46(4), 2689–2700. https://doi.org/10.1080/03610918.2015.1057287
- Mathai, A. M., & Haubold, H. J. (2008). Special functions for applied scientists. Springer.
- Mitchell, C. R., & Paulson, A. S. (1981). A new bivariate negative binomial distribution. Naval Research Logistics Quarterly, 28(3), 359–374. https://doi.org/10.1002/nav.3800280302
- Partrat, C. (1993). Compound model for two dependent kinds of claim. XXIVe ASTIN Colloquium.
- Rao, C. R. (1973). Linear statistical inference and its applications. John Wiley.
- Subrahmaniam, K. (1966). A test for intrinsic correlation in the theory of accident proneness. Journal of the Royal Statistical Society B, 35(1), 131–146. https://doi.org/10.1111/j.2517-6161.1966.tb00631.
Appendix
The entries of for the computations of the test statistic in case of REST are as given below. and in which and are defined in Equations (4.2) and (4.3).
The entries of for the computations of the test statistic in case of REST are as given below. For and 4, ’s are given below in which and