Comparison of estimation methods for one-inflated positive Poisson distribution

Pages 869-881 | Received 13 Aug 2021, Accepted 10 Nov 2021, Published online: 06 Dec 2021

Abstract

This paper aims to propose estimation methods for the one-inflated positive Poisson (OIPP) distribution and to compare their properties in terms of unbiasedness, consistency, efficiency and deficiency. All estimators considered in the study are asymptotically unbiased and consistent. The maximum likelihood estimator (MLE) for the OIPP distribution is asymptotically normal. Relative to the MLE, the ordinary least squares estimator (OLSE) is the most efficient, followed by the method of moments estimator (MME) and the ratio of probability estimator (RPE). A novel one-inflation index was also proposed to assess the presence of excess ones in a dataset relative to the positive Poisson distribution, and hence to determine whether a one-inflated distribution is required for model fitting. A real dataset with a large number of ones, as identified by the proposed one-inflation index, was used for model fitting. The OLSE and MLE are found to be the best estimators for the OIPP distribution.

1. Introduction

In discrete count data modelling, one way of modelling positive data is to truncate a distribution at zero, resulting in a zero-truncated distribution. Positive count data modelling can be traced back to the mid-twentieth century, when the first zero-truncated distribution, known as the zero-truncated Poisson distribution [Citation1], was proposed. The probability mass function of the zero-truncated Poisson distribution is

$$\Pr(X=x\mid\lambda)=\frac{\lambda^{x}}{x!\,[\exp(\lambda)-1]},\qquad x=1,2,3,\ldots,$$

where $\lambda>0$, and the maximum likelihood estimator (MLE) $\hat{\lambda}_1$ of $\lambda$ can be obtained numerically by solving

$$\bar{x}=\frac{\hat{\lambda}_1}{1-\exp(-\hat{\lambda}_1)}. \tag{1}$$

Subsequently, other estimators for the zero-truncated Poisson distribution were developed and analysed [Citation2–5]. An asymptotically unbiased estimator of $\lambda$ was proposed [Citation2] as

$$\hat{\lambda}_2=\frac{\sum_{x=0}^{r}x\,n_x}{\sum_{x=0}^{r-1}n_x}, \tag{2}$$

where $n_x$ is the frequency of $x$-valued data and $r$ is the maximum value taken by $x$. Similarly, an efficient estimator of $\lambda$ was proposed [Citation3] as

$$\hat{\lambda}_3=\frac{1}{n}\sum_{x\ge 2}x\,n_x. \tag{3}$$
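As a small illustration of (1) and (3), the minimal sketch below computes $\hat\lambda_1$ by bisection and $\hat\lambda_3$ directly from a frequency table. The function names and the toy frequencies are hypothetical, and bisection is simply one convenient way to solve (1), not necessarily the method used in [Citation1].

```python
import math


def ztp_mle(freq):
    """lambda_hat_1: solve mean = lam / (1 - exp(-lam)) from eq. (1) by bisection.

    freq maps each observed count x >= 1 to its frequency n_x.
    """
    n = sum(freq.values())
    mean = sum(x * c for x, c in freq.items()) / n
    if mean <= 1:
        raise ValueError("eq. (1) has no positive root when the sample mean is <= 1")
    # lam / (1 - exp(-lam)) increases from 1 (as lam -> 0) and exceeds lam,
    # so the root lies in (0, mean).
    lo, hi = 1e-12, mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def plackett_estimator(freq):
    """lambda_hat_3 = (1/n) * sum_{x >= 2} x * n_x from eq. (3)."""
    n = sum(freq.values())
    return sum(x * c for x, c in freq.items() if x >= 2) / n


toy = {1: 40, 2: 30, 3: 20, 4: 10}  # hypothetical zero-truncated counts
print(ztp_mle(toy), plackett_estimator(toy))
```

For this toy table the two estimates are close to each other, in line with the observation that (1) and (3) tend to give similar values.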

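Both (1) and (3) were used to estimate the mortality rate from the number of gall-cells per flower-head [Citation4], and the two estimators were found to give similar estimates. In addition to the estimators in (1)–(3), a minimum variance unbiased estimator of $\lambda$ was proposed [Citation5] as

$$\hat{\lambda}_4=\begin{cases}\dfrac{t}{n}\,C(n,t), & 1\le n\le t,\ 1\le t\le 50,\ \text{or } n=t\ge 51,\ \text{or } n=1,\ t\ge 51,\\[2mm] \dfrac{t}{n}\left[1-\left(\dfrac{n-1}{n}\right)^{t-1}\right], & 2\le n\le 15,\ t\ge 51,\ \text{or } n\ge 16,\ t\gg n,\\[2mm] \hat{\lambda}_1\ \text{or}\ \hat{\lambda}_3, & \text{otherwise},\end{cases} \tag{4}$$

where $t=n\bar{x}$, $C(n,t)=1-S_{t-1}^{\,n-1}/S_{t}^{\,n}$ and $S_t^n$ is the Stirling number of the second kind (the number of ways to partition $t$ objects into $n$ non-empty subsets). The form of the estimator therefore depends on the region of $(n,t)$, as shown in (4).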
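The first two branches of (4) can be evaluated directly. The minimal sketch below computes Stirling numbers of the second kind exactly with Python integers and compares the exact branch with the large-$t$ approximation; the function names are hypothetical, and the choice between $\hat\lambda_1$ and $\hat\lambda_3$ in the last branch is left to the user.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(t, n):
    """Stirling number of the second kind S_t^n (partitions of t items into n blocks)."""
    if t == n:
        return 1
    if n == 0 or n > t:
        return 0
    return n * stirling2(t - 1, n) + stirling2(t - 1, n - 1)


def mvue_exact(n, t):
    """First branch of (4): (t/n) * C(n, t), with C(n, t) = 1 - S_{t-1}^{n-1} / S_t^n."""
    return Fraction(t, n) * (1 - Fraction(stirling2(t - 1, n - 1), stirling2(t, n)))


def mvue_approx(n, t):
    """Second branch of (4): (t/n) * [1 - ((n-1)/n)^(t-1)], intended for t large relative to n."""
    return t / n * (1 - ((n - 1) / n) ** (t - 1))


# n observations with total t = n * xbar
print(float(mvue_exact(3, 60)), mvue_approx(3, 60))
```

Exact rational arithmetic avoids overflow in the Stirling ratio; for large $t$ the approximation agrees with the exact branch to many digits, which is why (4) switches to it there.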

The negative binomial distribution was modified to the zero-truncated negative binomial distribution as an alternative to the zero-truncated Poisson distribution. A moment estimator for the zero-truncated negative binomial distribution was proposed [Citation6] and later modified, resulting in a simpler but more efficient estimator [Citation7]. A zero-truncated Poisson–Lindley distribution was also proposed, together with estimation methods based on the method of moments and maximum likelihood [Citation8].

Inflated models based on zero-truncated distributions have been studied to accommodate a large number of ones in a dataset. These include the one-inflated positive Poisson [Citation9], one-inflated zero-truncated negative binomial [Citation10], one-inflated positive Poisson mixture [Citation11] and one-inflated positive Poisson–Lindley [Citation12] models, all of which are commonly used to estimate population size in capture–recapture experiments. The inflation parameter in a one-inflated model reflects the desire and ability of a captured subject to avoid recapture [Citation9]. Building on [Citation9], the goal of our study is to propose multiple estimation methods for the OIPP distribution and to identify the best estimators with respect to several estimator properties. An inflation index that assesses the presence of excess ones in a dataset was also developed and analysed.

This paper is structured as follows. Section 2 gives a brief overview of the OIPP distribution and its statistical properties. Section 3 proposes estimation methods for the OIPP distribution. Section 4 describes two simulation studies conducted to investigate the performance of the proposed estimation methods in terms of unbiasedness, consistency, efficiency and deficiency. Section 5 proposes a novel one-inflation index to assess the presence of excess ones in a dataset relative to the positive Poisson distribution, and thereby to determine whether a one-inflated distribution is required for model fitting; the performance of the index is also examined in that section. In Section 6, the OIPP distribution is fitted to a real dataset using the proposed estimation methods. Section 7 concludes the study.

2. One-inflated positive Poisson (OIPP) distribution and its statistical properties

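Let $Y\sim\mathrm{OIPP}(\omega,\lambda)$. Then the probability mass function of $Y$ is given in (5), where $\lambda>0$ is the rate parameter and $\omega$ is the one-modification parameter of the OIPP distribution.

$$\Pr(Y=y\mid\omega,\lambda)=\begin{cases}\omega+(1-\omega)\dfrac{\lambda}{\exp(\lambda)-1}, & y=1,\\[2mm] (1-\omega)\dfrac{\lambda^{y}}{y!\,[\exp(\lambda)-1]}, & y\ge 2.\end{cases} \tag{5}$$

If $\omega>0$, the distribution is known as the OIPP distribution [Citation9]. If $\omega=0$, the OIPP distribution reduces to the zero-truncated Poisson distribution [Citation1]. It is also possible for $\omega$ to be negative, namely $\omega\in\left(-\lambda/[\exp(\lambda)-1-\lambda],\,0\right)$, in which case the distribution is known as the one-deflated positive Poisson distribution. In this study, we restrict $\omega>0$ to ensure the one-inflation property. The first two moments about the origin, the dispersion index and the moment generating function are, respectively,

$$\begin{aligned}\mu&=\omega+(1-\omega)\frac{\lambda\exp(\lambda)}{\exp(\lambda)-1},\\ \mu_2&=\omega+(1-\omega)\frac{\lambda\exp(\lambda)(\lambda+1)}{\exp(\lambda)-1}=\mu+(\mu-\omega)\lambda,\\ d&=\frac{\sigma^{2}}{\mu}=1+\lambda-\mu-\frac{\omega\lambda}{\mu},\\ M_Y(t)&=\omega\exp(t)+(1-\omega)\frac{\exp[\lambda\exp(t)]-1}{\exp(\lambda)-1}.\end{aligned} \tag{6}$$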
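To make (5) and (6) concrete, the following minimal Python sketch (function names hypothetical) evaluates the pmf and the closed-form moments, then cross-checks them by direct summation over $y=1,\ldots,100$, assuming the tail beyond 100 is negligible for moderate $\lambda$.

```python
import math


def oipp_pmf(y, omega, lam):
    """Pr(Y = y | omega, lambda) from eq. (5), for integer y >= 1."""
    u = math.expm1(lam)  # exp(lambda) - 1
    if y == 1:
        return omega + (1 - omega) * lam / u
    # lambda^y / y! computed on the log scale to avoid overflow for large y
    return (1 - omega) * math.exp(y * math.log(lam) - math.lgamma(y + 1)) / u


def oipp_moments(omega, lam):
    """Mean, second raw moment and dispersion index from eq. (6)."""
    mu = omega + (1 - omega) * lam * math.exp(lam) / math.expm1(lam)
    mu2 = mu + (mu - omega) * lam
    d = 1 + lam - mu - omega * lam / mu
    return mu, mu2, d


# Cross-check (6) against the pmf by brute-force summation over y = 1..100
omega, lam = 0.3, 2.0
probs = {y: oipp_pmf(y, omega, lam) for y in range(1, 101)}
mean = sum(y * p for y, p in probs.items())
second = sum(y * y * p for y, p in probs.items())
print(sum(probs.values()), mean, second, oipp_moments(omega, lam))
```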

Figure 1 illustrates the dispersion index for various pairs of parameters $\lambda$ and $\omega$. The heatmap in Figure 1 shows that the OIPP distribution can be either underdispersed ($d<1$) or overdispersed ($d>1$).

Figure 1. Heatmap of dispersion index for various pairs of parameters λ and ω.

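For $y\ge 2$, the OIPP distribution is unimodal, which follows from the fact that the ratio of successive probabilities,

$$\frac{\Pr(Y=y+1\mid\omega,\lambda)}{\Pr(Y=y\mid\omega,\lambda)}=\frac{(1-\omega)\lambda^{y+1}/\{(y+1)!\,[\exp(\lambda)-1]\}}{(1-\omega)\lambda^{y}/\{y!\,[\exp(\lambda)-1]\}}=\frac{\lambda}{y+1},$$

is a decreasing function of $y$.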

3. Estimation of parameters for OIPP distribution

Several estimation methods for an OIPP distribution were developed based on the method of moments, maximum likelihood, one-proportion, ratio of probability and ordinary least squares. Fitting a model to a dataset based on its method of moments, maximum likelihood and/or ordinary least squares estimators is a common practice in statistical modelling. On the other hand, the one-proportion and ratio of probability estimators are rarely used.

3.1. Method of moments estimator (MME)

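The method of moments estimator (MME) is obtained by equating sample moments with theoretical moments. Equating the first two sample moments with the first two expressions in (6), the MMEs of $\lambda$ and $\omega$ are obtained by solving

$$(1-m_1)\tilde{\lambda}^{2}\exp(\tilde{\lambda})+(m_2-m_1)\tilde{\lambda}\exp(\tilde{\lambda})+(m_1-m_2)[\exp(\tilde{\lambda})-1]=0$$

and

$$\tilde{\omega}=\frac{(m_1-\tilde{\lambda})\exp(\tilde{\lambda})-m_1}{\exp(\tilde{\lambda})-1-\tilde{\lambda}\exp(\tilde{\lambda})},$$

where $m_k=\sum_{i=1}^{n}y_i^{k}/n$ for a sample of size $n$, and $\tilde{\omega}$ and $\tilde{\lambda}$ are the MMEs of $\omega$ and $\lambda$, respectively.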
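A minimal sketch of the MME, assuming the moment equation in $\tilde\lambda$ has a single positive root. The equation always has the trivial root $\tilde\lambda=0$, so the search starts slightly above zero; it also assumes $m_2-3m_1+2>0$ (at least one observation is 3 or larger), which makes the left-hand side positive just above zero. Bisection is an illustrative choice of root finder, and the name `mme` is hypothetical.

```python
import math


def mme(m1, m2, tol=1e-12):
    """Method-of-moments estimates (omega_tilde, lambda_tilde) from raw moments m1, m2.

    Assumes m2 - 3*m1 + 2 > 0, so the moment equation is positive just above
    lambda = 0 and negative for large lambda.
    """
    def g(lam):
        # (1 - m1) lam^2 e^lam + (m2 - m1) lam e^lam + (m1 - m2) (e^lam - 1)
        e = math.exp(lam)
        return (1 - m1) * lam ** 2 * e + (m2 - m1) * lam * e + (m1 - m2) * math.expm1(lam)

    lo, hi = 1e-4, 1.0
    while g(hi) > 0:  # widen the bracket until the sign changes
        hi *= 2
        if hi > 700:
            raise ValueError("no sign change found; check m1 and m2")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    e = math.exp(lam)
    omega = ((m1 - lam) * e - m1) / (e - 1 - lam * e)
    return omega, lam


sample = [1, 1, 2, 3, 1, 4, 2, 1, 5, 2]  # hypothetical data
m1 = sum(sample) / len(sample)
m2 = sum(y * y for y in sample) / len(sample)
print(mme(m1, m2))
```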

3.2. Maximum likelihood estimation (MLE)

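The log-likelihood function $l$ for a random sample from the OIPP distribution is

$$l=n_1\ln\left[\omega+(1-\omega)\frac{\lambda}{\exp(\lambda)-1}\right]+(n-n_1)\{\ln(1-\omega)-\ln[\exp(\lambda)-1]\}+\sum_{y=2}^{\infty}n_y\ln\left(\frac{\lambda^{y}}{y!}\right),$$

where $n_y$ is the number of $y$-valued observations and $n=\sum_{y=1}^{\infty}n_y$. Differentiating $l$ with respect to $\omega$ and $\lambda$ and setting the derivatives to zero, the MLEs of $\omega$ and $\lambda$ are obtained by solving

$$\hat{\omega}=\frac{n_1[\exp(\hat{\lambda})-1]-n\hat{\lambda}}{n[\exp(\hat{\lambda})-1-\hat{\lambda}]}$$

and

$$(\hat{\lambda}-A)\exp(\hat{\lambda})+(A-1)\hat{\lambda}+A=0,$$

where $A=(nm_1-n_1)/(n-n_1)$ is the mean of the observations greater than one, and $\hat{\omega}$ and $\hat{\lambda}$ are the MLEs of $\omega$ and $\lambda$, respectively.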
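The likelihood equations reduce to one-dimensional root finding, because $\hat\lambda$ depends on the data only through $A$. The minimal sketch below (hypothetical names, bisection as an illustrative solver) uses the equivalent form $\hat\lambda[\exp(\hat\lambda)-1]/[\exp(\hat\lambda)-1-\hat\lambda]=A$, whose left-hand side increases from 2 as $\hat\lambda\to0$; a positive root therefore exists only when $A>2$.

```python
import math


def oipp_loglik(omega, lam, freq):
    """Log-likelihood l(omega, lambda) for a frequency table {y: n_y}."""
    u = math.expm1(lam)
    n = sum(freq.values())
    n1 = freq.get(1, 0)
    ll = n1 * math.log(omega + (1 - omega) * lam / u)
    ll += (n - n1) * (math.log1p(-omega) - math.log(u))
    ll += sum(c * (y * math.log(lam) - math.lgamma(y + 1))
              for y, c in freq.items() if y >= 2)
    return ll


def oipp_mle(freq):
    """MLEs (omega_hat, lambda_hat): bisection for lambda, closed form for omega."""
    n = sum(freq.values())
    n1 = freq.get(1, 0)
    a = sum(y * c for y, c in freq.items() if y >= 2) / (n - n1)  # A
    if a <= 2:
        raise ValueError("A must exceed 2 (need at least one observation >= 3)")
    lo, hi = 1e-6, a  # the left-hand side below exceeds lam, so the root is < A
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        em1 = math.expm1(mid)
        if mid * em1 / (em1 - mid) < a:  # E[Y | Y >= 2] at mid is below A
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    u = math.expm1(lam)
    omega = (n1 * u - n * lam) / (n * (u - lam))
    return omega, lam


toy = {1: 50, 2: 20, 3: 15, 4: 10, 5: 5}  # hypothetical frequency table
print(oipp_mle(toy))
```

At the returned estimates, the fitted $\Pr(Y=1)$ equals $n_1/n$, and the likelihood is larger than at nearby parameter values.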

The one-proportion estimator (OPE) is an extension of an estimator for the generalized negative binomial distribution, which is obtained by equating the first two sample moments and the sample proportion of zeros with their population counterparts [Citation13]. Similarly, a parameter estimator for the zero-inflated Poisson distribution is obtained by equating the empirical and theoretical probabilities of zero-valued observations [Citation14]. For the OIPP distribution, the OPE of $\omega$ is obtained by equating the theoretical proportion of ones with the sample proportion of ones, while the OPE of $\lambda$ is obtained by equating the population mean with the sample mean. Notably, the OPEs of $\omega$ and $\lambda$ are identical to their respective MLEs. This is because the likelihood equations above are equivalent to setting $\Pr(Y=1\mid\hat{\omega},\hat{\lambda})=n_1/n$ and matching the conditional mean of $Y$ given $Y\ge 2$, namely $\hat{\lambda}[\exp(\hat{\lambda})-1]/[\exp(\hat{\lambda})-1-\hat{\lambda}]$, to $A$; together, these two conditions are equivalent to the OPE equations.

Theorem 1:

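The MLE $\hat{\lambda}$ of $\lambda$ is consistent and asymptotically normal, such that

$$\sqrt{n}(\hat{\lambda}-\lambda)\xrightarrow{d}N\left(0,\,I^{-1}(\lambda)\right),$$

where

$$I(\lambda)=\left[\omega+\frac{(1-\omega)\lambda}{u}\right]\left[\frac{(\omega u+1)^{2}}{v^{2}}-\frac{\omega\exp(\lambda)}{v}-\frac{\exp(\lambda)}{u^{2}}\right]+\frac{1-\omega}{\lambda}-\frac{(1-\omega)\exp(\lambda)(u-\lambda)}{u^{3}}$$

is the Fisher information of $\lambda$, with $u=\exp(\lambda)-1$ and $v=\omega(u-\lambda)+\lambda$.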

Proof:

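The regularity conditions under which the MLE $\hat{\lambda}$ is consistent and asymptotically normal are satisfied by the OIPP distribution (see [Citation15, Chapter 6]). Writing $f(y)=\Pr(Y=y)$ and $\mathbb{1}(\cdot)$ for the indicator function,

$$\begin{aligned}I(\lambda)&=-E\left[\frac{\partial^{2}\ln f(Y)}{\partial\lambda^{2}}\right]\\&=-E\left\{\frac{\partial^{2}}{\partial\lambda^{2}}\left[\mathbb{1}(Y=1)\ln\left(\omega+\frac{(1-\omega)\lambda}{\exp(\lambda)-1}\right)+\mathbb{1}(Y>1)\Big(\ln(1-\omega)+Y\ln\lambda-\ln Y!-\ln[\exp(\lambda)-1]\Big)\right]\right\}\\&=E\left[\mathbb{1}(Y=1)P+\mathbb{1}(Y>1)Q\right]=P\,f(1)+\sum_{y=2}^{\infty}Q\,f(y),\end{aligned}$$

where $P=-\partial^{2}\ln f(1)/\partial\lambda^{2}$ and $Q=-\partial^{2}\ln f(y)/\partial\lambda^{2}$ for $y\ge 2$ are

$$P=\frac{(\omega u+1)^{2}}{v^{2}}-\frac{\omega\exp(\lambda)}{v}-\frac{\exp(\lambda)}{u^{2}}\qquad\text{and}\qquad Q=\frac{y}{\lambda^{2}}-\frac{\exp(\lambda)}{u^{2}}.$$

Evaluating the sum with $\sum_{y\ge 2}y\,f(y)=(1-\omega)\lambda$ and $\sum_{y\ge 2}f(y)=(1-\omega)(u-\lambda)/u$ yields

$$I(\lambda)=\left[\omega+\frac{(1-\omega)\lambda}{u}\right]\left[\frac{(\omega u+1)^{2}}{v^{2}}-\frac{\omega\exp(\lambda)}{v}-\frac{\exp(\lambda)}{u^{2}}\right]+\frac{1-\omega}{\lambda}-\frac{(1-\omega)\exp(\lambda)(u-\lambda)}{u^{3}}.$$

From Theorem 1, an asymptotic $100(1-\alpha)\%$ confidence interval for $\lambda$ is

$$\hat{\lambda}\pm z_{\alpha/2}\frac{I^{-1/2}(\lambda)}{\sqrt{n}},$$

where, in practice, $I(\lambda)$ is evaluated at the MLEs.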
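Because the Fisher information of $\lambda$ is easy to mistype, a numerical check is useful. The minimal sketch below evaluates a closed form derived here by differentiating (5) directly, $f(1)P+(1-\omega)/\lambda-(1-\omega)\exp(\lambda)(u-\lambda)/u^3$ with $P=(\omega u+1)^2/v^2-\omega\exp(\lambda)/v-\exp(\lambda)/u^2$, and compares it with a central finite-difference estimate of $-E[\partial^2\ln f(Y)/\partial\lambda^2]$. Function names, the step size and the truncation point are illustrative assumptions.

```python
import math


def oipp_logpmf(y, omega, lam):
    """log Pr(Y = y | omega, lambda) from eq. (5)."""
    u = math.expm1(lam)
    if y == 1:
        return math.log(omega + (1 - omega) * lam / u)
    return math.log1p(-omega) + y * math.log(lam) - math.lgamma(y + 1) - math.log(u)


def fisher_info_lambda(omega, lam):
    """Closed form f(1) P + (1 - omega)/lam - (1 - omega) e^lam (u - lam) / u^3."""
    u = math.expm1(lam)
    v = omega * (u - lam) + lam
    e = math.exp(lam)
    p = (omega * u + 1) ** 2 / v ** 2 - omega * e / v - e / u ** 2
    f1 = omega + (1 - omega) * lam / u  # equals v / u
    return f1 * p + (1 - omega) / lam - (1 - omega) * e * (u - lam) / u ** 3


def fisher_info_lambda_numeric(omega, lam, h=1e-4, ymax=200):
    """-E[d^2 log f / d lambda^2] by central second differences, truncated at ymax."""
    total = 0.0
    for y in range(1, ymax + 1):
        lf = oipp_logpmf(y, omega, lam)
        d2 = (oipp_logpmf(y, omega, lam + h) - 2 * lf
              + oipp_logpmf(y, omega, lam - h)) / h ** 2
        total -= math.exp(lf) * d2
    return total


for omega, lam in ((0.1, 1.2), (0.3, 2.0)):
    print(fisher_info_lambda(omega, lam), fisher_info_lambda_numeric(omega, lam))
```

With $\omega=0$ the closed form reduces to the familiar zero-truncated Poisson information $\exp(\lambda)[\exp(\lambda)-1-\lambda]/\{\lambda[\exp(\lambda)-1]^2\}$.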

Theorem 2:

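The MLE $\hat{\omega}$ of $\omega$ is consistent and asymptotically normal, such that

$$\sqrt{n}(\hat{\omega}-\omega)\xrightarrow{d}N\left(0,\,I^{-1}(\omega)\right),$$

where

$$I(\omega)=\frac{u-\lambda}{u}\left\{\frac{(u-\lambda)[\omega(u-\lambda)+\lambda]}{v^{2}}+\frac{1}{1-\omega}\right\}$$

is the Fisher information of $\omega$, with $u=\exp(\lambda)-1$ and $v=\omega(u-\lambda)+\lambda$.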

Proof:

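The regularity conditions under which the MLE $\hat{\omega}$ is consistent and asymptotically normal are satisfied by the OIPP distribution (see [Citation15, Chapter 6]). Hence

$$\begin{aligned}I(\omega)&=-E\left[\frac{\partial^{2}\ln f(Y)}{\partial\omega^{2}}\right]\\&=-E\left\{\frac{\partial^{2}}{\partial\omega^{2}}\left[\mathbb{1}(Y=1)\ln\left(\omega+\frac{(1-\omega)\lambda}{\exp(\lambda)-1}\right)+\mathbb{1}(Y>1)\Big(\ln(1-\omega)+Y\ln\lambda-\ln Y!-\ln[\exp(\lambda)-1]\Big)\right]\right\}\\&=E\left[\mathbb{1}(Y=1)R+\mathbb{1}(Y>1)\frac{1}{(1-\omega)^{2}}\right]=R\,f(1)+\frac{1-f(1)}{(1-\omega)^{2}},\end{aligned}$$

where $f(y)=\Pr(Y=y)$, $\mathbb{1}(\cdot)$ is the indicator function and $R=(u-\lambda)^{2}/v^{2}$.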

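Substituting $R$, $f(1)=v/u$ and $1-f(1)=(1-\omega)(u-\lambda)/u$ into the expression above yields

$$I(\omega)=\frac{u-\lambda}{u}\left\{\frac{(u-\lambda)[\omega(u-\lambda)+\lambda]}{v^{2}}+\frac{1}{1-\omega}\right\}.$$

Theorem 2 implies that an asymptotic $100(1-\alpha)\%$ confidence interval for $\omega$ is $\hat{\omega}\pm z_{\alpha/2}I^{-1/2}(\omega)/\sqrt{n}$.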
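A minimal sketch of the Wald interval implied by Theorem 2, assuming $I(\omega)$ is evaluated at the MLEs $(\hat\omega,\hat\lambda)$ as a plug-in; like the theorem, it uses only the $\omega$ diagonal entry of the Fisher information. `NormalDist` comes from Python's standard `statistics` module; the other names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_info_omega(omega, lam):
    """I(omega) of Theorem 2 with u = exp(lam) - 1 and v = omega (u - lam) + lam."""
    u = math.expm1(lam)
    v = omega * (u - lam) + lam
    return (u - lam) / u * ((u - lam) * v / v ** 2 + 1 / (1 - omega))


def wald_ci_omega(omega_hat, lam_hat, n, alpha=0.05):
    """omega_hat +/- z_{alpha/2} / sqrt(n I(omega)), with I evaluated at the MLEs."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z / math.sqrt(n * fisher_info_omega(omega_hat, lam_hat))
    return omega_hat - half_width, omega_hat + half_width


print(wald_ci_omega(0.3, 2.0, n=1000))  # hypothetical MLEs and sample size
```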

3.3. Ratio of probability estimator (RPE)

The ratio of probability estimator (RPE) is an extension of an estimator for the generalized negative binomial distribution, which is obtained either by equating the ratio of one-valued to two-valued observations with its theoretical counterpart or by equating the first two theoretical moments with the sample moments [Citation13]. The same approach has been used to obtain parameter estimators for the zero-inflated Poisson–Lindley distribution [Citation16]. Following [Citation13,Citation16], the estimators of $\lambda$ and $\omega$ are obtained, respectively, by considering the ratio of the probabilities of 3-valued to 2-valued observations and by equating the population mean with the sample mean. The parameter $\omega$ cancels in the ratio of the probabilities of 3-valued to 2-valued observations, so $\lambda$ is easily obtained from

$$\frac{\Pr(Y=3\mid\omega,\lambda)}{\Pr(Y=2\mid\omega,\lambda)}=\frac{(1-\omega)\lambda^{3}/\{3!\,[\exp(\lambda)-1]\}}{(1-\omega)\lambda^{2}/\{2!\,[\exp(\lambda)-1]\}}=\frac{\lambda}{3}.$$

Equating $\lambda/3$ with the empirical ratio $n_3/n_2$ gives the RPE of $\lambda$,

$$\breve{\lambda}=\frac{3n_3}{n_2},$$

which can only be obtained when both $n_2$ and $n_3$ are greater than zero. Because $\breve{\lambda}$ has a simple closed form, it can be computed by hand.

The RPE of $\omega$, $\breve{\omega}$, is calculated in the same way as the MME of $\omega$, that is, from the mean equation with $\breve{\lambda}$ in place of $\tilde{\lambda}$. Notice that the RPE of $\lambda$ is a special case of the general form of the Zelterman estimator studied by Böhning [Citation17], $(i+1)n_{i+1}/n_i$, with $i=2$. Since the OIPP distribution is an extended zero-truncated Poisson distribution, only the first approach given by Böhning [Citation17] is appropriate. This entails truncating the Poisson model at all counts except 2 and 3, which gives the log-likelihood

$$\ell=\ln L=n_2\ln\left(\frac{1}{1+\lambda/3}\right)+n_3\ln\left(\frac{\lambda/3}{1+\lambda/3}\right),$$

whose maximum likelihood estimate is $3n_3/n_2=\breve{\lambda}$ (see [Citation17] for a detailed explanation). Based on the findings in [Citation17], the variance of the RPE of $\lambda$, as a special case of the Zelterman estimator, can be written as

$$\operatorname{Var}(\breve{\lambda})=\lambda^{2}\left(\frac{1}{n_2}+\frac{1}{n_3}\right).$$

The variance can be estimated by substituting $\breve{\lambda}$ for $\lambda$, which gives

$$\widehat{\operatorname{Var}}(\breve{\lambda})=\frac{9n_3}{n_2^{3}}(n_2+n_3).$$

Identical results are obtained using the second approach given by Böhning [Citation17], which is based on a nonparametric multinomial model. Because $n_2$ and $n_3$ grow with $n$, both the variance and its estimate tend to zero as $n$ increases, so the RPE of $\lambda$ is consistent. Also, since $\Pr(Y=3)/\Pr(Y=2)=\lambda/3$, the unimodality property implies that $n_2>n_3$ is expected whenever $\lambda<3$, in which case $\widehat{\operatorname{Var}}(\breve{\lambda})<18/n_2$, a bound that tends to zero as $n$ increases.
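The RPE needs only $n_2$, $n_3$ and the sample mean. The minimal sketch below (hypothetical names) returns $\breve\lambda$, the RPE of $\omega$ from the mean equation (the same formula as $\tilde\omega$ in Section 3.1 with $\breve\lambda$ plugged in), and the estimated variance $9n_3(n_2+n_3)/n_2^3$.

```python
import math


def rpe(freq):
    """RPE (lambda_breve, omega_breve) and the estimated variance of lambda_breve.

    freq maps each observed count y >= 1 to its frequency n_y.
    """
    n2, n3 = freq.get(2, 0), freq.get(3, 0)
    if n2 == 0 or n3 == 0:
        raise ValueError("the RPE requires n2 > 0 and n3 > 0")
    lam = 3 * n3 / n2
    n = sum(freq.values())
    m1 = sum(y * c for y, c in freq.items()) / n
    e = math.exp(lam)
    # mean equation solved for omega, i.e. the MME formula with lambda_breve plugged in
    omega = ((m1 - lam) * e - m1) / (e - 1 - lam * e)
    var_hat = 9 * n3 * (n2 + n3) / n2 ** 3
    return lam, omega, var_hat


toy = {1: 50, 2: 20, 3: 15, 4: 10, 5: 5}  # hypothetical frequency table
print(rpe(toy))
```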

3.4. Ordinary least squares estimator (OLSE)

The ordinary least squares estimator (OLSE) minimizes the sum of squared differences between the theoretical and empirical cumulative distribution functions. Suppose $y_{(1)}\le y_{(2)}\le\cdots\le y_{(n)}$ are the order statistics of a sample from the OIPP distribution. It is known that $E[F(Y_{(i)})]=i/(n+1)$ for $i=1,2,\ldots,n$. For count data, however, this expected value is best expressed in terms of the data frequencies, such that $E[F(Y_{(N_y)})]=N_y/(n+1)$ with $N_y=\sum_{j=1}^{y}n_j$. The OLSEs of $\omega$ and $\lambda$ are obtained by minimizing

$$S_y(\omega,\lambda)=\sum_{y=1}\left[F(y\mid\omega,\lambda)-\sum_{j=1}^{y}\frac{n_j}{n+1}\right]^{2}=\sum_{y=1}\left[\sum_{j=1}^{y}\left\{\left(\omega+\frac{(1-\omega)\lambda}{\exp(\lambda)-1}\right)\mathbb{1}(j=1)+\frac{(1-\omega)\lambda^{j}}{j!\,[\exp(\lambda)-1]}\mathbb{1}(j>1)\right\}-\sum_{j=1}^{y}\frac{n_j}{n+1}\right]^{2},$$

where $\mathbb{1}(\cdot)$ is the indicator function and $n_y$ is the frequency of $y$-valued data. The values of $\omega$ and $\lambda$ that minimize $S_y(\omega,\lambda)$ are the OLSEs of $\omega$ and $\lambda$, respectively. The resulting estimators have the invariance property, since the OLSE is a special case of minimum-distance estimation [Citation18].
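A minimal sketch of the OLSE, assuming the outer sum in $S_y(\omega,\lambda)$ runs from $y=1$ to the largest observed value (the upper limit is not stated above, and summing to infinity would not converge). The optimizer is a coarse grid search followed by a compass search. This is an illustrative choice, not necessarily the method used for the results reported here, and all names are hypothetical. The example frequencies are the expected counts of an OIPP(0.3, 2) sample of size 1000, rounded to integers.

```python
import math


def oipp_cdf(omega, lam, ymax):
    """[F(1), ..., F(ymax)] for the OIPP distribution of eq. (5)."""
    u = math.expm1(lam)
    cdf, total = [], 0.0
    for y in range(1, ymax + 1):
        if y == 1:
            p = omega + (1 - omega) * lam / u
        else:
            p = (1 - omega) * math.exp(y * math.log(lam) - math.lgamma(y + 1)) / u
        total += p
        cdf.append(total)
    return cdf


def ols_objective(omega, lam, freq):
    """S(omega, lambda): squared gaps between F(y) and N_y / (n + 1).

    Assumption: the outer sum runs over y = 1, ..., largest observed value.
    """
    ymax = max(freq)
    n = sum(freq.values())
    cdf = oipp_cdf(omega, lam, ymax)
    s, cum = 0.0, 0
    for y in range(1, ymax + 1):
        cum += freq.get(y, 0)
        s += (cdf[y - 1] - cum / (n + 1)) ** 2
    return s


def olse(freq, tol=1e-9):
    """Coarse grid search, then a compass (pattern) search refinement."""
    s, w, l = min(
        (ols_objective(i / 50, j / 20, freq), i / 50, j / 20)
        for i in range(50)
        for j in range(1, 161)
    )
    step = 0.05
    while step > tol:
        for dw, dl in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            nw, nl = w + dw, l + dl
            if 0.0 <= nw < 1.0 and nl > 0.0:
                ns = ols_objective(nw, nl, freq)
                if ns < s:
                    s, w, l = ns, nw, nl
                    break
        else:
            step /= 2  # no move improved S, so shrink the step
    return w, l


freq = {1: 519, 2: 219, 3: 146, 4: 73, 5: 29, 6: 10, 7: 3, 8: 1}
print(olse(freq))
```

Any general-purpose bounded minimizer could replace the compass search; the grid start simply guards against poor starting values.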

4. Simulation study

Two simulation studies were conducted to evaluate the performance of the proposed estimators in terms of unbiasedness, consistency and efficiency. Both simulation studies used $\lambda>1$ so that $n_2>0$ and $n_3>0$ are very likely, allowing the RPEs of $\omega$ and $\lambda$ to be computed.

4.1. Unbiasedness and consistency properties of the estimators

The first simulation study was conducted to evaluate the unbiasedness and consistency of the proposed estimators. The setting for the first simulation study is as follows:

Simulation setting:

Step 1. Generate a random sample of size $n=100,200,\ldots,1000$ from the positive Poisson distribution with parameter $\lambda$, where $\lambda=1.2, 2.0$ and the one-inflation parameter used in Step 2 is $\omega=0.1, 0.3$.

Step 2. Replace $\omega n$ randomly selected observations with 1 to ensure the one-inflation property.

Step 3. Estimate the parameters using MME, MLE, RPE and OLSE.

Step 4. Repeat Steps 1–3 a total of 2000 times, and calculate the bias $\mathrm{Bias}(\delta)$ and mean squared error $\mathrm{MSE}(\delta)$ for parameter $\delta$ using

$$\mathrm{Bias}(\delta)=\frac{1}{2000}\sum_{j=1}^{2000}\left|\hat{\delta}_j-\delta\right|\qquad\text{and}\qquad\mathrm{MSE}(\delta)=\frac{1}{2000}\sum_{j=1}^{2000}\left(\hat{\delta}_j-\delta\right)^{2},$$

where $\hat{\delta}_j$ is the estimate of $\delta$ in the $j$th replication and $\delta=\omega,\lambda$. The absolute deviation from the true parameter was used when computing the bias to obtain a smooth trend as $n$ increases and to avoid fluctuations in the bias values.
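Steps 1–4 can be sketched as follows for the MLE alone; the other estimators would be added in Step 3 in the same way. Assumptions: zero-truncated Poisson draws come from Knuth's multiplicative Poisson sampler with zeros rejected, exactly $\mathrm{round}(\omega n)$ observations are set to 1, and the seed and function names are hypothetical.

```python
import math
import random


def rzt_poisson(lam, rng):
    """One zero-truncated Poisson draw: Knuth's sampler, rejecting zeros."""
    limit = math.exp(-lam)
    while True:
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        if k > 0:
            return k


def mle(sample):
    """Closed-form omega_hat and bisection for lambda_hat (Section 3.2)."""
    n = len(sample)
    n1 = sample.count(1)
    rest = [y for y in sample if y > 1]
    a = sum(rest) / len(rest)  # A: mean of the observations greater than one
    lo, hi = 1e-6, a
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        em1 = math.expm1(mid)
        if mid * em1 / (em1 - mid) < a:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    u = math.expm1(lam)
    return (n1 * u - n * lam) / (n * (u - lam)), lam


def simulate(omega, lam, n, reps=2000, seed=2021):
    """Steps 1-4 for the MLE only: mean absolute deviation ('bias') and MSE."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rzt_poisson(lam, rng) for _ in range(n)]   # Step 1
        for i in rng.sample(range(n), round(omega * n)):      # Step 2
            sample[i] = 1
        estimates.append(mle(sample))                         # Step 3
    out = {}
    for k, (name, true) in enumerate((("omega", omega), ("lambda", lam))):
        errs = [est[k] - true for est in estimates]           # Step 4
        out[name] = (sum(abs(e) for e in errs) / reps, sum(e * e for e in errs) / reps)
    return out


print(simulate(0.3, 2.0, n=500, reps=100))
```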

Figures 2 and 3 illustrate the bias of the proposed estimators of $\lambda$ and $\omega$, respectively. Based on Figures 2 and 3, the following conclusions can be drawn:

  1. The estimators are asymptotically unbiased: as $n$ increases, the biases of the estimators of both $\lambda$ and $\omega$ decrease and approach zero.

  2. The biases of the MMEs, MLEs and OLSEs are similar and substantially smaller than those of the RPEs for the same sample size and parameters.

  3. Ordered from lowest to highest bias for any given sample size, the estimators are MLE, OLSE, MME and RPE.

Figure 2. The absolute relative bias of the estimator of λ when estimated using the MLE, MME, OLSE and RPE for λ=1.2,2.0 and ω=0.1,0.3.

Figure 3. The absolute relative bias of the estimator of ω when estimated using the MLE, MME, OLSE and RPE for λ=1.2,2.0 and ω=0.1,0.3.

Figures 4 and 5 illustrate the mean squared errors of the estimators of $\lambda$ and $\omega$, respectively, for the proposed estimation methods. Based on Figures 4 and 5, the following conclusions can be drawn:

  1. The estimators are consistent for all parameter settings considered: as $n$ increases, their mean squared errors decrease and approach zero.

  2. The mean squared errors of the MMEs, MLEs and OLSEs are similar and substantially smaller than those of the RPEs for the same sample size and parameters.

  3. Ordered from lowest to highest mean squared error for any given sample size, the estimators are MLE, OLSE, MME and RPE.

Figure 4. The mean squared error of the estimator of λ when estimated using the MLE, MME, OLSE and RPE for λ=1.2,2.0 and ω=0.1,0.3.

Figure 5. The mean squared error of the estimator of ω when estimated using the MLE, MME, OLSE and RPE for λ=1.2,2.0 and ω=0.1,0.3.

In short, the proposed estimators are asymptotically unbiased and consistent for all parameter values considered. The RPE has a substantially larger bias and mean squared error than the other estimators, although it may still provide a good fit for very large samples ($n\ge 1000$). Ranked from best to worst in terms of unbiasedness and consistency, the estimators are MLE, OLSE, MME and RPE.

4.2. Efficiency property of the estimators

The second simulation study was conducted to evaluate the efficiency of the proposed estimators compared to the efficiency of the MLE. The setting for the second simulation study is as follows:

Simulation setting:

Step 1. Generate a random sample of size $n=1000$ from the positive Poisson distribution with parameter $\lambda$.

Step 2. Replace $\omega n$ randomly selected observations with 1 to ensure the one-inflation property.

Step 3. Estimate the parameters using MME, MLE, RPE and OLSE.

Step 4. Repeat Steps 1–3 a total of 2000 times, and calculate the efficiency of each estimator relative to the MLE using

$$\mathrm{eff}(\tilde{\delta},\hat{\delta})=\frac{\operatorname{Var}(\hat{\delta})}{\operatorname{Var}(\tilde{\delta})},$$

where $\tilde{\delta}$ is the MME, RPE or OLSE of $\delta$, and $\hat{\delta}$ is the MLE of $\delta$. Since the mean squared error equals the variance plus the squared bias, and all estimators are asymptotically unbiased, the efficiency can be approximated by

$$\mathrm{eff}(\tilde{\delta},\hat{\delta})=\frac{\mathrm{MSE}(\hat{\delta})}{\mathrm{MSE}(\tilde{\delta})}.$$

Figures 6 and 7 illustrate the efficiency (in percent) of the proposed estimators of $\lambda$ and $\omega$, respectively, relative to the MLE. Based on Figures 6 and 7, the following conclusions can be drawn:

  1. The MME of λ is approximately 70–80% as efficient as the MLE of λ.

  2. The MME of ω is approximately 40–70% as efficient as the MLE of ω.

  3. The RPE of λ is approximately 5–30% as efficient as the MLE of λ.

  4. The RPE of ω is approximately 2–25% as efficient as the MLE of ω.

  5. The OLSEs of λ and ω are approximately 90–99% as efficient as the MLEs of λ and ω.

Figure 6. The efficiency of the estimators of λ when estimated using the MME, RPE and OLSE with respect to the MLE.

Figure 7. The efficiency of the estimators of ω when estimated using the MME, RPE and OLSE with respect to the MLE.

It can be concluded that the OLSE is almost as efficient as the MLE and more efficient than the MME and RPE. Also, the MME is considerably more efficient than the RPE.

Another way to evaluate estimator performance is to measure the joint efficiency of the estimators of the two parameters using the deficiency criterion [Citation19], defined as

$$\mathrm{Def}(\tilde{\lambda},\tilde{\omega})=\mathrm{MSE}(\tilde{\lambda})+\mathrm{MSE}(\tilde{\omega}),$$

where $\tilde{\lambda}$ and $\tilde{\omega}$ are estimators of $\lambda$ and $\omega$, respectively; a smaller deficiency indicates a more efficient pair of estimators. Figure 8 shows that the deficiency decreases as the sample size grows for given $\lambda$ and $\omega$. The MLE, MME and OLSE have similar deficiencies, whereas the RPE has a substantially larger deficiency, even when the sample size is as large as 1000. According to the deficiency criterion, the MLE is the most efficient estimator, and the estimators ranked from best to worst are MLE, OLSE, MME and RPE. These findings corroborate those in Figures 6 and 7.
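The efficiency ratio and the deficiency criterion are simple functions of replicate estimates. A minimal sketch, with hypothetical function names and made-up replicate values used purely for illustration; the efficiency uses the MSE approximation given in Step 4.

```python
def mse(estimates, true_value):
    """Mean squared error of replicate estimates about the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def efficiency(alt_estimates, mle_estimates, true_value):
    """eff = MSE(MLE) / MSE(alternative); a value of 0.9 reads as '90% as efficient'."""
    return mse(mle_estimates, true_value) / mse(alt_estimates, true_value)


def deficiency(lam_estimates, omega_estimates, lam, omega):
    """Def = MSE(lambda estimator) + MSE(omega estimator); smaller is better."""
    return mse(lam_estimates, lam) + mse(omega_estimates, omega)


# Hypothetical replicate estimates, for illustration only
mle_lam, ols_lam = [1.9, 2.1, 2.0], [1.8, 2.2, 2.0]
mle_omega = [0.28, 0.32, 0.30]
print(efficiency(ols_lam, mle_lam, 2.0), deficiency(mle_lam, mle_omega, 2.0, 0.3))
```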

Figure 8. The joint efficiency of the estimators of λ and ω when estimated using the MLE, MME, OLSE and RPE for λ=1.2,2.0 and ω=0.1,0.3.

5. One-inflation index under positive Poisson distribution

A novel index, called the one-inflation index, was developed to determine whether a one-inflated distribution is required to model a dataset. By definition, a $k$-inflation index assesses the presence of excess $k$-valued data relative to a given distribution. For instance, the zero-inflation index $zi_{P}$ assesses the presence of excess zeros in a dataset relative to the Poisson distribution [Citation20,Citation21]. Another zero-inflation index, $zi_{NB}$, was introduced to assess the presence of excess zeros relative to the negative binomial distribution [Citation22]. The two indices are, respectively,

$$zi_{P}=1+\frac{\ln(p_0)}{\mu}\qquad\text{and}\qquad zi_{NB}=1+\frac{(\sigma^{2}-\mu)\ln(p_0)}{\mu^{2}\ln(\sigma^{2}/\mu)},$$

where $p_0$ is the proportion of zeros, $\mu$ is the mean and $\sigma^{2}$ is the variance. A sample zero-inflation index greater than zero indicates excess zeros in the dataset. Following [Citation20–22], a new one-inflation index is proposed to assess the presence of excess ones in a dataset relative to the positive Poisson distribution:

$$oi_{PP}=1+\frac{\ln(p_1)}{\ln[\exp(d+\mu-1)-1]-\ln(d+\mu-1)},$$

where $p_1$ is the proportion of ones and $d$ is the dispersion index. If a random variable follows the positive Poisson distribution, then $oi_{PP}=0$, because in that case $d+\mu-1=\lambda$ and $p_1=\lambda/[\exp(\lambda)-1]$; $oi_{PP}>0$ indicates that the dataset contains more ones than the positive Poisson distribution can explain. The presence of excess ones in sample data can therefore be assessed by computing the sample one-inflation index $\widehat{oi}_{PP}$. As an example, a dataset of size 5000 was simulated from the OIPP distribution, and the resulting values of $\widehat{oi}_{PP}$ are shown in Table 1.
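A minimal sketch of the one-inflation index. The sample version plugs in the observed proportion of ones, the sample mean and the sample dispersion index; the text does not say whether the variance uses divisor $n$ or $n-1$, so divisor $n$ is assumed here. Names are hypothetical, and the example counts are the rounded expected frequencies of an OIPP(0.3, 2) sample of size 1000.

```python
import math


def one_inflation_index(p1, mean, dispersion):
    """oi_PP = 1 + ln(p1) / (ln[exp(d + mu - 1) - 1] - ln(d + mu - 1))."""
    lam_star = dispersion + mean - 1  # equals lambda under a positive Poisson model
    return 1 + math.log(p1) / (math.log(math.expm1(lam_star)) - math.log(lam_star))


def sample_one_inflation_index(freq):
    """Sample index from {y: n_y}; the variance uses divisor n (an assumption)."""
    n = sum(freq.values())
    mean = sum(y * c for y, c in freq.items()) / n
    var = sum(c * (y - mean) ** 2 for y, c in freq.items()) / n
    return one_inflation_index(freq.get(1, 0) / n, mean, var / mean)


# Rounded expected counts of an OIPP(0.3, 2) sample of size 1000 (constructed example)
print(sample_one_inflation_index({1: 519, 2: 219, 3: 146, 4: 73, 5: 29, 6: 10, 7: 3, 8: 1}))
```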

Table 1. The sample proportion of one-valued data, sample mean, sample variance, sample dispersion index and sample one-inflation index for the one-inflated positive Poisson distribution for various pairs of parameters λ and ω, based on a dataset of size 5000.

According to Table 1, the index correctly detects the presence of excess ones in the sample data. When $\omega=0$, the index is very close to zero, indicating no excess ones, since the OIPP distribution then reduces to the positive Poisson distribution. A smaller $\lambda$ yields a higher proportion of ones under the positive Poisson distribution, so the ones contributed by $\omega$ in the OIPP distribution make up a smaller share of all ones. The resulting index is therefore comparatively low (see Table 1 for $\lambda=0.5$ and all values of $\omega$). When $\lambda$ is large, however, the proportion of ones under the positive Poisson distribution is small, while the share of ones contributed by $\omega$ in the OIPP distribution is large, which results in a high index value (see Table 1 for $\lambda=2.5$ and positive $\omega$). Hence, the one-inflation index is useful for assessing the presence of excess ones in a dataset relative to the positive Poisson distribution.

This one-inflation index can serve as a viable alternative to the score test proposed by Godwin and Böhning [Citation9], whose null hypothesis is that there is no inflation in the data. The score statistic [Citation9] is

$$S=\frac{(n_1T-\hat{\lambda}n)^{2}}{\hat{\lambda}}\cdot\frac{T+1}{nT(T-\hat{\lambda})},$$

where $T=\exp(\hat{\lambda})-1$, $\hat{\lambda}$ is the estimate of $\lambda$ under the positive Poisson model, and $S\sim\chi^{2}(1)$ asymptotically under the null hypothesis. Whether a dataset contains excess ones should be taken into account in statistical modelling: if it does, one-inflated positive count data distributions should be considered.
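A minimal sketch of the score test, reading the statistic as $S=(n_1T-\hat\lambda n)^2(T+1)/[\hat\lambda nT(T-\hat\lambda)]$ with $\hat\lambda$ the positive Poisson MLE from (1). The $\chi^2(1)$ upper-tail probability is computed as $\operatorname{erfc}(\sqrt{S/2})$, which is exact for one degree of freedom. Names and the toy frequency table are hypothetical.

```python
import math


def ztp_mle(mean):
    """Positive Poisson MLE: solve mean = lam / (1 - exp(-lam)), eq. (1), by bisection."""
    if mean <= 1:
        raise ValueError("the sample mean must exceed 1")
    lo, hi = 1e-12, mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def one_inflation_score_test(freq):
    """Score statistic S and its chi-square(1) upper-tail p-value."""
    n = sum(freq.values())
    n1 = freq.get(1, 0)
    lam = ztp_mle(sum(y * c for y, c in freq.items()) / n)
    t = math.expm1(lam)  # T = exp(lambda_hat) - 1
    s = (n1 * t - lam * n) ** 2 / lam * (t + 1) / (n * t * (t - lam))
    p_value = math.erfc(math.sqrt(s / 2))  # P(chi2_1 > s)
    return s, p_value


toy = {1: 50, 2: 20, 3: 15, 4: 10, 5: 5}  # hypothetical frequency table
print(one_inflation_score_test(toy))
```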

6. Applications

A dataset on the number of times a person was arrested for drunk driving [Citation23] was used to demonstrate and analyse the performance of the proposed estimators on real data. The sample one-inflation index of 0.1545 implies that the dataset contains a substantial excess of ones that cannot be explained by a positive Poisson distribution. Fitting a positive Poisson distribution to the data yields $\hat{\lambda}=0.1275$, and the corresponding score statistic is $S=30.6686$, so the null hypothesis of no inflation is rejected. Therefore, the OIPP distribution is appropriate for modelling this dataset.

The chi-square goodness-of-fit test and the root mean squared error (RMSE) of the fitted frequencies were used to evaluate the proposed estimators, where

$$\mathrm{RMSE}=\sqrt{\frac{1}{h}\sum_{x=1}^{h}(n_x-\hat{n}_x)^{2}},$$

$h$ is the number of data groups and $\hat{n}_x$ is the estimated frequency corresponding to $n_x$. In general, the best estimator provides an adequate goodness of fit and has the lowest RMSE.
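A minimal sketch of the fit summaries, assuming one group per value $y=1,\ldots,y_{\max}$ with the last group absorbing the upper tail so that the expected frequencies sum to $n$ (the grouping used for Table 2 is not stated here). It returns the expected frequencies, the RMSE and the Pearson chi-square statistic; a $p$-value would additionally need a chi-square survival function such as `scipy.stats.chi2.sf`, with degrees of freedom reduced for the estimated parameters. Names are hypothetical.

```python
import math


def oipp_pmf(y, omega, lam):
    """Pr(Y = y | omega, lambda) from eq. (5)."""
    u = math.expm1(lam)
    if y == 1:
        return omega + (1 - omega) * lam / u
    return (1 - omega) * math.exp(y * math.log(lam) - math.lgamma(y + 1)) / u


def fit_summary(freq, omega, lam):
    """Expected frequencies, RMSE and Pearson chi-square statistic.

    Assumption: one group per observed value y = 1..max(y), with the last group
    absorbing the upper tail Pr(Y >= max(y)) so that expected counts sum to n.
    """
    n = sum(freq.values())
    ymax = max(freq)
    probs = [oipp_pmf(y, omega, lam) for y in range(1, ymax)]
    probs.append(1.0 - sum(probs))  # tail group y >= ymax
    expected = [n * p for p in probs]
    observed = [freq.get(y, 0) for y in range(1, ymax + 1)]
    h = len(observed)
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / h)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, rmse, chi2


freq = {1: 519, 2: 219, 3: 146, 4: 73, 5: 29, 6: 10, 7: 3, 8: 1}  # constructed example
print(fit_summary(freq, 0.3, 2.0))
```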

The results of fitting the OIPP distribution to the frequency of a person being arrested for drunk driving with each of the proposed estimators are shown in Table 2. Based on the goodness-of-fit tests and their p-values, all estimators are adequate except the RPE. The RMSE of the OLSE is substantially smaller than those of the other estimators. Therefore, the OLSE provides the best fit for describing the frequency of a person being arrested for drunk driving.

Table 2. The results of model fitting of the frequency of a person being arrested for drunk driving based on the MME, MLE, RPE and OLSE for the OIPP distribution.

7. Conclusions

Several estimators for the OIPP distribution were proposed in this study, namely the MME, MLE, RPE, OPE and OLSE, of which the MLE and OPE yield identical results. Two comprehensive simulation studies were conducted to evaluate the unbiasedness, consistency and efficiency of the proposed estimators. The results show that the proposed estimators are asymptotically unbiased and consistent. Ranked from best to worst by bias, mean squared error, efficiency and deficiency, the estimators are MLE, OLSE, MME and RPE. A one-inflation index to assess the presence of excess ones in a dataset relative to the positive Poisson distribution was also proposed. Based on the chi-square goodness-of-fit test and RMSE, the OLSE provides the best fit to the dataset used in this study. Therefore, the MLE and OLSE are the best estimators for the OIPP distribution.

Acknowledgements

The authors gratefully acknowledge the financial support received in the form of research grants from the Ministry of Education, Malaysia [FRGS/1/2019/STG06/UKM/01/5]; and Universiti Kebangsaan Malaysia [GUP-2019-031].

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by Ministry of Education, Malaysia: [Grant Number FRGS/1/2019/STG06/UKM/01/5]; Universiti Kebangsaan Malaysia: [Grant Number GUP-2019-031].

References

  • David FN, Johnson NL. The truncated Poisson. Biometrics. 1952;8:275–285.
  • Moore PG. The estimation of the Poisson parameter from a truncated distribution. Biometrika. 1952;39:247–251.
  • Plackett RL. The truncated Poisson distribution. Biometrics. 1953;9:485–488.
  • Finney DJ, Varley GC. The truncated Poisson distribution. Biometrics. 1955;11:387–394.
  • Tate RF, Goen RL. Minimum variance unbiased estimation for the truncated Poisson distribution. Ann Math Stat. 1958;29:755–765.
  • Sampford MR. The truncated negative binomial distribution. Biometrika. 1955;42:58–69.
  • Brass W. Simplified methods of fitting the truncated negative binomial distribution. Biometrika. 1958;45:59–68.
  • Ghitany ME, Al-Mutairi DK, Nadarajah S. Zero-truncated Poisson-Lindley distribution and its application. Math Comput Simul. 2008;79:279–287.
  • Godwin RT, Böhning D. Estimation of the population size by using the one-inflated positive Poisson model. J R Stat Soc: Ser C. 2017;66:425–448.
  • Godwin RT. One-inflation and unobserved heterogeneity in population size estimation. Biometrical J. 2017;59:79–93.
  • Godwin R. The one-inflated positive Poisson mixture model for use in population size estimation. Biometrical J. 2019;61:1541–1556.
  • Tajuddin RRM, Ismail N, Kamarulzaman I. Estimating population size of criminals: a new Horvitz-Thompson estimator under one-inflated positive Poisson-Lindley model. Crime Delinq. 2021. doi:10.1177/00111287211014158
  • Famoye F. Parameter estimation for generalized negative binomial distribution. Commun Stat – Simu Comput. 1997;26:269–279.
  • Wagh YS, Kamalja KK. Zero-inflated models and estimation in zero-inflated Poisson distribution. Commun Stat – Simu Comput. 2018;47:2248–2265.
  • Hogg RV, McKean JW, Craig AT. Maximum likelihood estimation. In: Introduction to mathematical statistics. 6th ed. New Jersey: Pearson Prentice Hall; 2005. p. 313.
  • Borah M, Nath AD. A study of the inflated Poisson Lindley distribution. J Indian Soc Agr Stat. 2001;54:317–323.
  • Böhning D. A simple variance formula for population size estimators by conditioning. Stat Methodol. 2008;5:410–423.
  • Drossos CA, Philippou AN. A note on minimum distance estimates. Ann I Stat Math. 1980;32:121–123.
  • Akgül FG, Şenoğlu B, Arslan T. An alternative distribution to Weibull for modeling the wind speed data: Inverse Weibull distribution. Energ Convers Manage. 2016;114:234–240.
  • Puig P. Characterizing additively closed discrete models by a property of their maximum likelihood estimators, with an application to generalized Hermite distributions. J Am Stat Assoc. 2003;98:687–692.
  • Puig P, Valero J. Count data distributions. J Am Stat Assoc. 2006;101:332–340.
  • Blasco-Moreno A, Pérez-Casany M, Puig P, et al. What does a zero mean? Understanding false, random and structural zeros in ecology. Methods Ecol Evol. 2019;10:949–959.
  • Van der Heijden PGM, Cruyff M, Van Houwelingen HC. Estimating the size of a criminal population from police records using the truncated Poisson regression model. Statistica Neerlandica. 2003;57:289–304.