Abstract
In this paper, we discuss some important aspects of the bivariate alternative zero-inflated logarithmic series distribution (BAZILSD), whose marginals are the alternative zero-inflated logarithmic series distributions of Kumar and Riyaz (2015). We study some important properties of the distribution by deriving expressions for its probability mass function, factorial moments, conditional probability generating functions, and recursion formulae for its probabilities, raw moments and factorial moments. The parameters of the BAZILSD are estimated by the method of maximum likelihood, and certain test procedures are also considered. Further, certain real-life data applications are cited to illustrate the usefulness of the model. A simulation study is conducted to assess the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
1. Introduction
Bivariate discrete distributions have received much attention in the literature. For example, see Ghosh and Balakrishnan (2015), Hassan and El-Bassiouni (2013), Kemp (2013), Kumar (2008), Kocherlakota and Kocherlakota (1992) and references therein. Due to the extensive applications of the logarithmic series distribution in various areas of scientific research, especially biology, ecology and meteorology, the bivariate logarithmic series distribution (BLSD) is of particular interest. Chapter 7 of Kocherlakota and Kocherlakota (1992) is fully devoted to the BLSD. Subrahmaniam (1966) defined the BLSD through the following probability generating function (pgf)
(1.1)
in which
,
and
such that
. A practical drawback of the BLSD is that its support excludes the point (0, 0). To overcome this difficulty, Kumar and Riyaz (2014) considered a class of bivariate distributions, namely the ‘bivariate zero-inflated logarithmic series distribution (BZILSD)’, through the following probability mass function (pmf), for any non-negative integers
and
,
,
and
such that
.
(1.2)
where
and
in which
is the Gauss hypergeometric function (cf. Mathai & Haubold, 2008).
Kumar and Riyaz (2013) considered the zero-inflated logarithmic series distribution (ZILSD) through the following pgf, in which with
.
(1.3)
or equivalently,
(1.4)
Kumar and Riyaz (2015) considered another zero-inflated logarithmic series distribution, which they termed ‘the alternative zero-inflated logarithmic series distribution (AZILSD)’, through the following pmf, for
,
(1.5)
in which
,
,
and
such that
. The pgf of the AZILSD with pmf (1.5) is
(1.6)
or equivalently,
(1.7)
Kumar and Riyaz (2017) studied an extended version of the AZILSD and its important properties. Kumar and Riyaz (2016) considered an order
version of AZILSD and studied its important applications.
In this paper, we consider a bivariate version of the AZILSD, which we call ‘the bivariate alternative zero-inflated logarithmic series distribution’ or, in short, ‘the BAZILSD’, and discuss some of its important aspects. In Section 2, we derive the BAZILSD as a bivariate random sum distribution of independent and identically distributed bivariate Bernoulli random variables, show that its marginal distributions are AZILSD, and obtain expressions for its pmf, mean, covariance, factorial moments and conditional pgfs. In Section 3, we derive certain recursion formulae for the probabilities, raw moments and factorial moments of the BAZILSD. In Section 4, we describe the estimation of the parameters of the BAZILSD by the method of maximum likelihood and suggest certain test procedures. In Section 5, we illustrate the usefulness of the BAZILSD by fitting the distribution to certain real-life data sets. In Section 6, a brief simulation study is conducted to examine the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
It is important to note that the BAZILSD possesses a bivariate random sum structure, as shown in Section 2. Certain bivariate random sum distributions have been studied in the literature. For example, see Kumar (2007, 2013). The random sum structure arises in several areas of scientific research, particularly in actuarial science, agricultural science, biological science and physical science. Chapter 9 of Johnson et al. (2005) is fully devoted to univariate random sum distributions.
For simplicity, we adopt the following notation throughout the manuscript.
(1.8)
(1.9)
(1.10)
(1.11)
(1.12)
(1.13)
2. A genesis and some properties of the BAZILSD
We first derive the BAZILSD and then discuss some of its properties.
Consider the sequence of independent and identically distributed bivariate Bernoulli random vectors, each with pgf
in which
,
with
such that
,
and
Let
be a non-negative integer valued random variable having AZILSD with pgf (1.6), in which
. Assume that
and
’s are independent. Define
, for each
in which
and
, for
and
. Set
where
denotes the indicator function of an event
. Then the pgf of
is
(2.1)
where
is defined in (1.9).
We call a distribution with pgf (2.1) ‘the bivariate alternative zero-inflated logarithmic series distribution’ or, in short ‘the BAZILSD’. Clearly when , the pgf given in (2.1) reduces to the following pgf of the BZILSD with pmf (1.2).
(2.2)
which shows that the proposed bivariate model of the AZILSD can be considered a more flexible model than the BZILSD from a practical point of view. Further, it can be noted that the marginals of the BAZILSD are AZILSD, whereas the marginals of the BZILSD are not ZILSD.
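The random sum structure described above can be sketched in code. The following is only an illustration under stated assumptions, not the authors' algorithm: the count N is drawn from a stand-in zero-inflated logarithmic series sampler (the names `draw_count`, `omega` and `theta` are hypothetical and do not reproduce the AZILSD pmf (1.5)), and the indicator-function adjustment used in the definition is ignored.

```python
import math
import random


def draw_log_series(theta, rng):
    """Draw from the logarithmic series distribution on {1, 2, ...} by inversion."""
    u = rng.random()
    k = 1
    p = -theta / math.log(1.0 - theta)  # P(N = 1)
    cdf = p
    # Stop when the cdf passes u, or when p underflows to zero.
    while u > cdf and p > 0.0:
        p *= theta * k / (k + 1)  # P(N = k + 1) from P(N = k)
        k += 1
        cdf += p
    return k


def draw_count(omega, theta, rng):
    """Hypothetical stand-in for N: 0 with probability omega, else log-series."""
    return 0 if rng.random() < omega else draw_log_series(theta, rng)


def draw_random_sum(draw_n, cell_probs, rng):
    """Return (S1, S2), the sum of N iid bivariate Bernoulli vectors.

    cell_probs gives the probabilities of (0,0), (1,0), (0,1), (1,1).
    """
    cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
    n = draw_n(rng)
    s1 = s2 = 0
    for x1, x2 in rng.choices(cells, weights=cell_probs, k=n):
        s1 += x1
        s2 += x2
    return s1, s2


rng = random.Random(2024)
sample = [
    draw_random_sum(lambda r: draw_count(0.3, 0.5, r), [0.4, 0.2, 0.2, 0.2], rng)
    for _ in range(5)
]
```

Passing the sampler for N as a callable keeps the summation step independent of the particular count distribution, so a sampler for the actual AZILSD could be substituted without changing `draw_random_sum`.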
Proposition 2.1.
If follows the BAZILSD, then the marginal distribution of
for
is AZILSD with pgf given below.
and
Proposition 2.2.
The pgf of the conditional distribution of given
is the following: for any non-negative integer
,
(2.3)
Proof:
For any non-negative integer assume that
. Now, we have the following partial derivatives of order
of
with respect to
evaluated at
.
(2.4)
where for
(2.5)
and
is defined in (1.8).
Now, applying the formula for the conditional pgf in terms of partial derivatives of the joint pgf developed by Subrahmaniam (1966), we obtain the conditional pgf of given
as
which implies (2.3) in the light of (1.8).
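For reference, the general partial-derivative formula invoked in this proof can be written as follows. The symbols here are generic and are an assumption of this sketch rather than the paper's own notation: $G(t_1,t_2)$ denotes the joint pgf of a pair $(X_1, X_2)$ with joint pmf $f$, and $G^{(0,y)}$ denotes its partial derivative of order $y$ with respect to $t_2$.

```latex
\[
G^{(0,y)}(t,0)
  = \left.\frac{\partial^{y}}{\partial t_2^{\,y}}\, G(t,t_2)\right|_{t_2=0}
  = y!\sum_{x=0}^{\infty} f(x,y)\,t^{x},
\qquad
G_{X_1\mid X_2=y}(t)
  = \frac{G^{(0,y)}(t,0)}{G^{(0,y)}(1,0)} .
\]
```

The denominator equals $y!\,P(X_2=y)$, so the ratio is the power series in $t$ whose coefficients are the conditional probabilities $f(x,y)/P(X_2=y)$.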
Remark 2.1.
The conditional distribution of given
as given in (2.3) can be written as
where
is the pgf of a binomial random variable with parameters
and
and
is the pgf of a random variable following the AZILSD with parameters
,
and
. Thus clearly, the conditional distribution
given
is the distribution of the sum of two independent random variables
and
.
Proposition 2.3:
Let follow the BAZILSD with pgf (2.1). Then
(2.6)
(2.7)
Remark 2.2:
By a similar approach, for any non-negative integer with
we can obtain the conditional pgf of
given
by interchanging
and
in (2.3). Therefore, it is evident that comments similar to those in Remark 2.1 are valid regarding conditional distribution of
given
and the explicit expression for
and
can be obtained by interchanging
and
in the right hand side expressions of (2.6) and (2.7) respectively.
Proposition 2.4 :
Let follow the BAZILSD with pgf (2.1) and let
be any non-negative integers. The pmf
and the
-th factorial moment
of the BAZILSD are
(2.8)
(2.9)
where
is defined in (1.2), for
,
’s are defined in (1.11) and (1.12) and
.
Proof :
In order to obtain the probability mass function of the BAZILSD, we need the following derivatives of , in which
is a non-negative integer.
(2.10)
where
(2.11)
The following derivatives are needed in the sequel, in which
and
.
(2.12)
(2.13)
Differentiating both sides of (2.10)
times with respect to
and applying (2.12) and (2.13), we get the following.
(2.14)
By putting
in (2.14) and dividing by
, we get (2.8). By putting
in (2.14), we get (2.9).
Proposition 2.5 :
Let follow the BAZILSD with pgf (2.1). Then we have the following, in which
,
(2.15)
(2.16)
and
(2.17)
where
and
are given in (2.5).
The proof follows from (2.9) in the light of the relations:
Proposition 2.6.
Let follow the BAZILSD with pgf (2.1). Then
follows the modified AZILSD studied by Kumar and Riyaz (2013).
The proof follows from the fact that the pgf of is
3. Recursion formulae
In this section, we develop certain recursion formulae for probabilities, raw moments and factorial moments. Let be a random vector with pgf (2.1). For the sake of computational simplicity, we define
, for
. Now we have the following from (2.1) in which
, for
(3.1)
Now we obtain the following propositions.
Proposition 3.1
The probability mass function of the BAZILSD satisfies the following recurrence formulae, in which
is defined in Proposition 2.5.
(3.2)
(3.3)
(3.4)
(3.5)
Proof :
From (2.10) with , we have the following.
(3.6)
On differentiating both sides of (3.1) with respect to
, we have
(3.7)
From (3.1), we also have the following.
(3.8)
Now by using (3.7) and (3.8) in (3.6) we get
(3.9)
On equating the coefficient of
on both sides of (3.9), we get (3.2). By equating the coefficient of
on both sides of (3.9), we get the relation (3.3). We omit the proof of relations (3.4) and (3.5) as it is similar to that of relations (3.2) and (3.3).
Proposition 3.2 :
Two recurrence formulae for the -th raw moment
of the BAZILSD are the following, for
.
(3.10)
(3.11)
Proof :
The characteristic function of the BAZILSD with pgf (2.1) is the following. For
in
and
,
(3.12)
where
.
On differentiating (3.12) with respect to we get,
(3.13)
In the light of (3.12), we have the following from (3.13).
Now, on expanding the exponential functions, rearranging the terms and using standard properties of double sums, we obtain the following.
(3.14)
On equating coefficients of
on both sides of (3.14), we get the relation (3.10). A similar procedure will give (3.11).
Proposition 3.3
: The -th order factorial moment
of the BAZILSD satisfies the following recurrence formulae, for
, in which
.
(3.15)
(3.16)
Proof:
Let be a random vector having the BAZILSD with pgf
as given in (3.1). Then the factorial moment generating function
of the BAZILSD is
(3.17)
where
.
On differentiating (3.17) with respect to , we get
In the light of (3.17), we can write this as
(3.18)
Equating the coefficient of
on both sides of (3.18), we get (3.15). Similar procedures will lead to (3.16).
4. Estimation and testing
In this section, we discuss the estimation of the parameters ,
,
and
of the BAZILSD by the method of maximum likelihood and construct certain test procedures for testing the significance of the additional parameter
of the BAZILSD.
4.1. Maximum likelihood estimation
Let be the frequency of the
-th cell of a bivariate data. Let
be the highest value of
observed and
be the highest value of
observed. Then the likelihood function of the sample is
(4.1)
where
is the pmf of the BAZILSD as given in (2.8). Taking logarithm on both sides of (4.1), we get
(4.2)
where
is given in (1.9),
and
is defined in Proposition 2.4.
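The log-likelihood of a bivariate frequency table is the sum, over the observed cells, of the cell frequency times the logarithm of the cell probability. A minimal sketch follows, assuming the pmf is supplied as a callable; the names `log_likelihood`, `freq` and `pmf` are hypothetical, and the uniform pmf in the usage lines is a placeholder rather than the BAZILSD pmf (2.8).

```python
import math


def log_likelihood(freq, pmf):
    """Log-likelihood of a frequency table.

    freq maps a cell (x1, x2) to its observed frequency;
    pmf(x1, x2) returns the model probability of that cell.
    """
    total = 0.0
    for (x1, x2), count in freq.items():
        if count == 0:
            continue  # empty cells contribute nothing
        p = pmf(x1, x2)
        if p <= 0.0:
            return float("-inf")  # an observed cell has zero model probability
        total += count * math.log(p)
    return total


# Illustrative numbers only, not taken from the data sets in this paper.
example_freq = {(0, 0): 2, (1, 0): 1, (0, 1): 0}
example_value = log_likelihood(example_freq, lambda x1, x2: 0.25)
```

A numerical optimiser can then maximise this function over the parameters of the chosen pmf instead of solving the likelihood equations directly.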
Let ,
,
and
denote the maximum likelihood estimators of the parameters
,
,
and
of the BAZILSD. On differentiating (4.2), partially with respect to the parameters
,
,
and
, respectively, and equating to zero, we get the following likelihood equations, in which
in the light of
, where
and
are defined in (2.5) and (1.8), respectively.
(4.3)
(4.4)
(4.5)
and
(4.6)
On solving the likelihood equations (4.3)–(4.6) with mathematical software such as MATLAB, Mathcad or Mathematica, one can obtain the maximum likelihood estimators of the parameters
,
,
and
.
4.2. Testing of the hypothesis
For testing the hypothesis against the alternative hypothesis
, we construct the generalized likelihood ratio test (GLRT) and Rao’s efficient score test (REST) as follows.
In the case of the GLRT, the test statistic is
(4.7)
where
is the maximum likelihood estimator of
with no restrictions, and
is the maximum likelihood estimator of
when
. The test statistic
given in (4.7) is asymptotically distributed as Chi-square with one degree of freedom. For details, see Rao (1973).
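As an illustration of how such a test is carried out numerically, the sketch below uses the standard form of the likelihood ratio statistic, twice the difference between the unrestricted and restricted maximised log-likelihoods; this form is assumed here, since (4.7) is not reproduced. The function name `glrt` and its arguments are hypothetical. For one degree of freedom the Chi-square tail probability can be written with the complementary error function, so only the standard library is needed.

```python
import math


def glrt(loglik_full, loglik_null):
    """Return the likelihood ratio statistic and its Chi-square(1) p-value.

    loglik_full: maximised log-likelihood with no restrictions.
    loglik_null: maximised log-likelihood under the null hypothesis.
    """
    stat = 2.0 * (loglik_full - loglik_null)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Illustrative log-likelihood values only, not results from this paper.
stat, p_value = glrt(-100.0, -103.0)
reject_at_5_percent = stat > 3.84
```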
In the case of the REST, the following test statistic can be used.
(4.8)
where
and
are the Fisher information matrices in which
and
for
and
are as given in the Appendix. The test statistic given in (4.8) asymptotically follows a Chi-square distribution with one degree of freedom (see Rao, 1973).
5. Applications
For numerical applications, we consider two real-life data sets. The first data set, from Mitchell and Paulson (1981), consists of the number of aborts by 109 aircraft in two consecutive six-month periods of one year. The second data set, taken from Partrat (1993), is the yearly frequency of hurricanes affecting tropical cyclones in two zones belonging to the North Atlantic coastal states in the USA. We have fitted the BZILSD, the BAZILSD and the bivariate Poisson distribution (BPD) to these data sets by the method of maximum likelihood. For the first data set, the maximum likelihood estimates (MLEs) of the parameters in case of the BZILSD are ,
and
, those in case of the BAZILSD are
,
,
and
, and those in case of the BPD are
and
For the second data set, the MLEs of the parameters in case of the BZILSD are
,
and
= 0.02, those in case of the BAZILSD are
,
,
and
, and those in case of the BPD are
and
The computed values of the expected frequencies of the BZILSD, the BAZILSD and the BPD are presented in Tables 1 and 2.
Table 1. Observed frequencies and computed values of expected frequencies of the BZILSD, the BAZILSD and the BPD by method of maximum likelihood for the first data set.
Table 2. Observed frequencies and computed values of expected frequencies of the BZILSD, the BAZILSD and the BPD by method of maximum likelihood for the second data set.
(In each cell, the first row represents the observed frequency, the second row represents theoretical frequency of the BZILSD, the third row represents theoretical frequency of BAZILSD and the last row represents theoretical frequency of BPD).
The goodness of fit is applied to the first data set in case of the BAZILSD in nine categories [such as (0,0), (0,1), (0,2), (0, 3 and above); (1,0), (1, 1 and above); (2,0), (2, 1 and above) and (3,0 and above)], that in case of the BZILSD in eight categories [such as (0,0), (0,1), (0,2), (0, 3 and above); (1,0), (1, 1 and above); (2, 0 and above) and (3,0 and above)] and that in case of the BPD in seven categories [such as (0,0), (0,1 and above); (1,0), (1, 1 and above); (2, 0), (2, 1 and above); (3,0 and above)]. In the second data set, in case of the BAZILSD the goodness of fit is applied in seven categories [such as (0,0), (0,1), (0, 2 and above); (1,0), (1, 1 and above); (2, 0 and above) and (3,0 and above)], that in case of the BZILSD in seven categories [such as (0,0), (0,1), (0, 2 and above); (1,0), (1, 1 and above) and (2, 0), (2,1 and above)] and that in case of the BPD in seven categories [such as (0,0), (0,1), (0, 2 and above); (1,0), (1, 1 and above); (2, 0), (2, 1 and above)]. The computed values of the Chi-square statistic and in case of the three models – BZILSD, BAZILSD and BPD for data set 1 and data set 2 are all presented in Table 3. Based on the values of Chi-square statistic and
, it can be observed that BAZILSD gives a better fit to both data sets compared to the existing models – the BZILSD and the BPD.
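The goodness-of-fit comparison over pooled categories can be sketched as follows. This is a generic Pearson Chi-square computation rather than the authors' exact procedure; the function name `chi_square_gof` is hypothetical, the degrees-of-freedom rule (categories minus estimated parameters minus one) is the usual convention and is assumed here, and the numbers in the usage lines are illustrative only.

```python
def chi_square_gof(observed, expected, n_params):
    """Pearson Chi-square statistic and degrees of freedom.

    observed, expected: frequencies per pooled category, in the same order.
    n_params: number of parameters estimated from the data.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - n_params - 1
    return stat, df


# Illustrative frequencies only, not taken from Tables 1 and 2.
gof_stat, gof_df = chi_square_gof([10, 20, 30], [15, 15, 30], n_params=1)
```

Pooling sparse cells into categories such as "(0, 3 and above)" before calling the function keeps the expected frequencies away from zero, which is what the category lists above are doing.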
Table 3. The computed Chi-square value and value while fitting the models – BZILSD, BAZILSD and BPD for the Data set 1 and Data set 2.
Table 4 contains the computed values of and the GLRT statistic for the BAZILSD in case of for both the data sets. We have also computed the values of
based on (4.8) for the BAZILSD in the case of first data set as
and for the BAZILSD in the case of second data set
as given below.
Since the critical value for the test at 5% level of significance and one degree of freedom is 3.84, the null hypothesis that
is rejected in both the above cases in respect of GLRT and REST.
Table 4. The computed values of and the generalized likelihood ratio test statistic under
.
6. Simulation
It is quite difficult to examine theoretically the performance of the maximum likelihood estimators of the parameters of the BAZILSD, so we have carried out a simulation study to assess it. We have simulated data sets of sample sizes 150, 300 and 600 in both the positively correlated and negatively correlated situations of the BAZILSD by using a Markov chain Monte Carlo (MCMC) procedure, and considered 200 replications in each case. We have considered the following two sets of parameters: (i) ,
,
,
(positively correlated) and (ii)
,
,
,
(negatively correlated) as initial values of the parameters while simulating the data sets. The computed values of the bias and standard errors of each of the estimators are given in Table 5. From Table 5, it can be observed that both the bias and the standard errors of the estimators decrease as the sample size increases.
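The summary step of such a simulation study can be sketched as follows. The sketch assumes that the bias is the mean of the replicated estimates minus the true parameter value and that the standard error is the sample standard deviation of the replicated estimates; the paper does not spell out these definitions, and the function name `bias_and_se` is hypothetical.

```python
import statistics


def bias_and_se(estimates, true_value):
    """Bias and standard error of an estimator over simulation replications.

    estimates: one estimate per replication (at least two).
    true_value: the parameter value used to simulate the data.
    """
    bias = statistics.mean(estimates) - true_value
    se = statistics.stdev(estimates)  # sample standard deviation
    return bias, se


# Illustrative estimates only, not output from the study reported here.
sim_bias, sim_se = bias_and_se([0.9, 1.1, 1.0], true_value=1.0)
```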
Table 5. Bias and standard errors (within parenthesis) of the estimators of the parameters ,
,
and
of the BAZILSD for the simulated data sets.
Acknowledgements
The authors are grateful to the Editor-in-Chief and the anonymous Referees for their valuable comments on an earlier version which helped to improve the quality of this article.
References
- Ghosh, I., & Balakrishnan, N. (2015). Study of incompatibility or near compatibility of bivariate discrete conditional probability distributions through divergence measures. Journal of Statistical Computation and Simulation, 85(1), 117–130. https://doi.org/10.1080/00949655.2013.806509
- Hassan, M. Y., & El-Bassiouni, M. Y. (2013). Modelling Poisson marked point processes using bivariate mixture transition distributions. Journal of Statistical Computation and Simulation, 83(8), 1440–1452. https://doi.org/10.1080/00949655.2012.662683
- Johnson, N. L., Kemp, A. W., & Kotz, S. (2005). Univariate discrete distributions. 3rd ed. Wiley.
- Kemp, A. W. (2013). New discrete Appell and Humbert distributions with relevance to bivariate accident data. Journal of Multivariate Analysis, 113, 2–6. https://doi.org/10.1016/j.jmva.2011.08.011
- Kocherlakota, S., & Kocherlakota, K. (1992). Bivariate discrete distributions. Marcel Dekker.
- Kumar, C. S. (2007). Some properties of bivariate generalized hypergeometric probability distribution. Journal of the Korean Statistical Society, 36, 349–355. http://koreascience.or.kr/article/JAKO200734515966569.page?&lang=ko.
- Kumar, C. S. (2008). A unified approach to bivariate discrete distributions. Metrika, 67(1), 113–121. https://doi.org/10.1007/s00184-007-0125-8
- Kumar, C. S. (2013). The bivariate confluent hypergeometric series distribution. Economic Quality Control, 28(2), 23–30. https://doi.org/10.1515/eqc-2013-0009
- Kumar, C. S., & Riyaz, A. (2013). On the zero-inflated logarithmic series distribution and its modification. Statistica, 73(4), 477–492. https://doi.org/10.6092/issn.1973-2201/4498
- Kumar, C. S., & Riyaz, A. (2014). On a bivariate version of zero-inflated logarithmic series distribution and its applications. Journal of Combinatorics, Information and System Science, 39(4), 249–262.
- Kumar, C. S., & Riyaz, A. (2015). An alternative version of zero-inflated logarithmic series distribution and some of its applications. Journal of Statistical Computation and Simulation, 85(6), 1117–1127. https://doi.org/10.1080/00949655.2013.867347
- Kumar, C. S., & Riyaz, A. (2016). An order k version of the alternative zero-inflated logarithmic series distribution and its applications. Journal of Applied Statistics, 43(14), 2681–2695. https://doi.org/10.1080/02664763.2016.1142949
- Kumar, C. S., & Riyaz, A. (2017). On some aspects of a generalized alternative zero-inflated logarithmic series distribution. Communications in Statistics – Simulation and Computations, 46(4), 2689–2700. https://doi.org/10.1080/03610918.2015.1057287
- Mathai, A. M., & Haubold, H. J. (2008). Special functions for applied scientists. Springer.
- Mitchell, C. R., & Paulson, A. S. (1981). A new bivariate negative binomial distribution. Naval Research Logistics Quarterly, 28(3), 359–374. https://doi.org/10.1002/nav.3800280302
- Partrat, C. (1993). Compound model for two dependent kinds of claim. XXIVe ASTIN Colloquium.
- Rao, C. R. (1973). Linear statistical inference and its applications. John Wiley.
- Subrahmaniam, K. (1966). A test for intrinsic correlation in the theory of accident proneness. Journal of the Royal Statistical Society B, 35(1), 131–146. https://doi.org/10.1111/j.2517-6161.1966.tb00631.
Appendix
The entries of for the computations of the test statistic in case of REST are as given below.
and
in which
and
are defined in Equations (4.2) and (4.3).
The entries of for the computations of the test statistic in case of REST are as given below. For
and 4,
’s are given below in which
and