On the MLE of the Waring distribution


Abstract

The two-parameter Waring distribution is an important heavy-tailed discrete distribution that extends the well-known Yule-Simon distribution and provides more flexibility in modelling data. The commonly used EFF (Expectation-First Frequency) method of parameter estimation can only be applied when the first moment exists, and it uses only the information in the expectation and the first frequency, so it is less efficient than the maximum likelihood estimator (MLE). However, the MLE may not exist for some samples. We apply the profile method to the log-likelihood function and derive necessary and sufficient conditions for the existence of the MLE of the Waring parameters. Extensive simulation studies compare the MLE and EFF methods, and compare the goodness of fit of the Waring distribution with that of the Yule-Simon distribution. We also apply the Waring distribution to an insurance dataset.


1. Introduction

Power-law distributions are a class of heavy-tailed univariate distributions describing a quantity whose probability decreases as a power of its magnitude; they are widely used in social science, network science and other fields. Two commonly used discrete examples are the Zipf distribution and the Yule-Simon distribution (or Yule distribution). Zipf's law was found by the linguist Zipf when studying the words in a linguistic corpus: the frequency of a word is proportional to $r^{-d}$, where r is its rank and d is some positive value. The Yule-Simon distribution is a highly skewed discrete probability distribution with a very long upper tail, named after Udny Yule and Herbert Simon (winner of the 1978 Nobel Prize in economics), with probability mass function $P(X = k) = \frac{α Γ(k) Γ(α + 1)}{Γ(α + k + 1)}, α > 0, k = 1, 2, 3, \dots,$ where $Γ(\cdot)$ is the Gamma function and α is the parameter. Yule (1925) first proposed the distribution, applying it to model the number of species in biological genera. Simon (1955) later rediscovered the ‘Yule’ distribution and used it to examine city populations, income distributions, and word frequencies in publications (Mills, 2017). Price (1965, 1976), a prominent American scientist, linked published papers to the papers they cite to form a directed network of the scientific literature, and found that the number of citations follows the Yule distribution. It is a cumulative advantage distribution based on the mechanism of ‘success breeds success’.

The two-parameter Waring distribution is a generalization of the Yule-Simon distribution and provides more flexibility than the commonly used one-parameter Zipf and Yule-Simon distributions, the negative binomial distribution, etc. The Waring distribution can describe a wide variety of phenomena in actuarial science, network science, and library and information science, such as the number of shares purchased by each customer, the number of traffic accidents, the number of nodes in internet connections, and the frequency of authors who publish a certain number of papers (Huete-Morales & Marmolejo-Martín, 2020; Panaretos & Xekalaki, 1986; Seal, 1952; Xekalaki, 1983). The probability mass function of $X \sim W(α, β)$ is
$$P(X = k) = α \cdot \frac{Γ(β + k - 1)}{Γ(β)} \cdot \frac{Γ(α + β)}{Γ(α + β + k)}, \quad α > 0, β > 0, k = 1, 2, 3, \dots, \tag{1}$$
where α and β are the parameters of the Waring distribution. It can be shown that the Waring distribution is heavy-tailed, with a polynomial tail of order $α + 1$, i.e., $P(X = k)$ decays like $k^{-(α + 1)}$. We can also derive that $E(X) = 1 + \frac{β}{α - 1}$ if $α > 1$, and $\mathrm{var}(X) = \frac{α β (α + β - 1)}{(α - 1)^{2} (α - 2)}$ if $α > 2$. The Yule-Simon distribution is the special case of the Waring distribution with $β = 1$.
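As an illustration only, the following minimal Python sketch (standard library only; the function names are ours, not the authors') evaluates the probability mass function (1) on the log scale with `math.lgamma`, so that large k does not overflow the Gamma function. The usage lines check that $P(X = 1) = α/(α + β)$ and that $β = 1$ recovers the Yule-Simon probabilities.

```python
import math


def waring_pmf(k: int, alpha: float, beta: float) -> float:
    """P(X = k) for X ~ W(alpha, beta) as in Equation (1); support k = 1, 2, ..."""
    if k < 1:
        return 0.0
    # Work on the log scale so that large k does not overflow the Gamma function.
    log_p = (math.log(alpha)
             + math.lgamma(beta + k - 1) - math.lgamma(beta)
             + math.lgamma(alpha + beta) - math.lgamma(alpha + beta + k))
    return math.exp(log_p)


def yule_simon_pmf(k: int, alpha: float) -> float:
    """Yule-Simon probability alpha * Gamma(k) * Gamma(alpha + 1) / Gamma(alpha + k + 1)."""
    return math.exp(math.log(alpha) + math.lgamma(k) + math.lgamma(alpha + 1)
                    - math.lgamma(alpha + k + 1))


print(waring_pmf(1, 2.0, 3.0))                          # approximately 0.4 = 2 / (2 + 3)
print(waring_pmf(5, 1.5, 1.0), yule_simon_pmf(5, 1.5))  # equal up to rounding
```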

Parameter estimation is essential for statistical inference. Garcia (2011) provided a fixed-point algorithm to estimate the parameter of the Yule-Simon distribution. For the Waring distribution, a commonly used method is EFF (Expectation-First Frequency), which is essentially a method of moments. More specifically, the EFF method uses the sample mean $\bar{X}$ to estimate $E(X) = 1 + \frac{β}{α - 1}$ and the empirical first frequency $\hat{P}(X = 1)$ to estimate $P(X = 1) = \frac{α}{α + β}$, leading to
$$\hat{α} = \frac{\hat{P}(X = 1) \cdot (\bar{X} - 1)}{\hat{P}(X = 1) \cdot \bar{X} - 1}, \qquad \hat{β} = \frac{\{1 - \hat{P}(X = 1)\} \cdot (\bar{X} - 1)}{\hat{P}(X = 1) \cdot \bar{X} - 1}.$$
The EFF method has two drawbacks: first, it requires $α > 1$, so it cannot be used when the first moment does not exist; second, it uses only the information in $P(X = 1)$ and $E(X)$, and thus discards information in the data. Xekalaki (1985) proposed a factorial-moment estimator for the bivariate generalized Waring distribution, which suffers from the same drawbacks.
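The EFF formulas translate directly into code. The sketch below is ours (Python, standard library only; `eff_estimate` is a hypothetical name): it returns `None` when the denominator $\hat{P}(X = 1) \cdot \bar{X} - 1$ is zero and leaves the handling of negative estimates to the caller. The example uses the counts $n_{1} = 33, n_{2} = 7, n_{3} = 4, n_{4} = 1$ reported for the age-67.5 group in Section 4.

```python
from typing import Optional, Sequence, Tuple


def eff_estimate(sample: Sequence[int]) -> Optional[Tuple[float, float]]:
    """EFF estimates (alpha_hat, beta_hat) from a sample of values in {1, 2, ...}."""
    n = len(sample)
    xbar = sum(sample) / n                      # sample mean, estimates E(X)
    p1 = sum(1 for x in sample if x == 1) / n   # empirical first frequency, estimates P(X = 1)
    denom = p1 * xbar - 1.0
    if denom == 0.0:
        return None                             # estimator undefined
    alpha_hat = p1 * (xbar - 1.0) / denom
    beta_hat = (1.0 - p1) * (xbar - 1.0) / denom
    return alpha_hat, beta_hat


sample = [1] * 33 + [2] * 7 + [3] * 4 + [4] * 1
print(eff_estimate(sample))   # approximately (11.0, 4.0)
```

For this group the sketch reproduces, up to rounding, the values $\hat{α} = 11$, $\hat{β} = 4$ quoted in Section 4.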

In the current literature, researchers have also considered the maximum likelihood estimator (MLE) of the Waring parameters. However, they usually applied an optimization algorithm directly to the log-likelihood function without verifying that the MLE exists (Rivas & Campos, 2021). The MLE does not always exist: for some samples the MLE of the Waring parameters exists, while for others it does not. For example, in the insurance share data analysed in Section 4, the MLE of the Waring parameters does not exist for the groups with central ages 17.5, 22.5 and 67.5, where each group covers an age span of 5 years. If the MLE is computed without knowing whether it exists, the credibility of the resulting estimate is questionable. We therefore investigate the existence of the MLE in this paper. More specifically, we apply the profile method to the log-likelihood function and derive necessary and sufficient conditions for the existence of the MLE of the Waring parameters. When the largest observed value is small, we also verify our theory by solving the estimating equations exactly. Furthermore, we obtain two byproducts in the proof of the main result. The first is Lemma 2.3, which provides an alternative way to prove the existence of the MLE for two parameters, whereas the conventional proof involves a complicated calculation of the Hessian matrix. The second is Lemma 2.4, which provides a method for comparing two increasing and concave functions. These results may be useful in other applications.

Through extensive simulation studies, we find that when the sample size is as small as n = 100, both MLE and EFF yield relatively poor estimates. When $n \geq 200$, the MLE always has much smaller bias than EFF: the relative bias of the MLE decreases from 6%–7% when n = 200 to around 1% when n = 1000, while that of EFF is still around 10% even when n = 1000 for $α \leq 1.2$. The relative standard errors of the MLE are comparable with those of EFF for medium-sized samples (n = 200 and 400), but smaller for n = 1000. Overall, the MLE performs better than EFF when $α / β$ is not large or the sample size is large enough. EFF performs relatively better when $α / β$ is large, say $α / β \geq 2$. Our explanation is that, since $P(X = 1) = \frac{α}{α + β} = \frac{α / β}{α / β + 1}$, a large $α / β$ makes $P(X = 1)$ close to 1, so EFF captures relatively more of the information in the data than when $α / β$ is small. We also compare the goodness of fit of the Waring and Yule-Simon distributions, and find that the Waring distribution fits the data about as well as the Yule-Simon distribution when $β = 1$, and much better when β departs from 1.

The rest of the paper is organized as follows. Section 2 presents the main result, based on the profile method. Section 3 presents numerical studies showing the advantage of the MLE over the EFF method and of the Waring distribution over the Yule-Simon distribution. The real insurance data analysis is presented in Section 4. All technical details are deferred to the Appendix.

2. Maximum likelihood estimator of the Waring parameters

For the two-parameter Waring distribution, we have
$$\begin{aligned} P(X = 1) &= α \cdot \frac{Γ(β)}{Γ(β)} \cdot \frac{Γ(α + β)}{Γ(α + β + 1)} = \frac{α}{α + β}, \\ P(X = k) &= α \cdot \frac{Γ(β + k - 1)}{Γ(β)} \cdot \frac{Γ(α + β)}{Γ(α + β + k)} = α \cdot \frac{β (β + 1) (β + 2) \cdots (β + k - 2)}{(α + β) (α + β + 1) \cdots (α + β + k - 1)}, \quad k = 2, 3, \dots. \end{aligned}$$
Suppose that $x_{1}, \dots, x_{n}$ is a random sample from the Waring distribution $W(α, β)$; let $m = \max\{x_{1}, \dots, x_{n}\}$ be the largest observed value and $n_{k}$ the number of observations equal to k, $k = 1, \dots, m$, so that $\sum_{k = 1}^{m} n_{k} = n$. Based on the data $\{x_{1}, \dots, x_{n}\}$, the likelihood function is
$$\begin{aligned} L_{n}(α, β) &= \left(\frac{α}{α + β}\right)^{n_{1}} \prod_{k = 2}^{m} \left\{\frac{α β (β + 1) \cdots (β + k - 2)}{(α + β) (α + β + 1) \cdots (α + β + k - 1)}\right\}^{n_{k}} \\ &= \left(\frac{α}{α + β}\right)^{n} \left(\frac{β}{α + β + 1}\right)^{\sum_{s = 2}^{m} n_{s}} \left(\frac{β + 1}{α + β + 2}\right)^{\sum_{s = 3}^{m} n_{s}} \cdots \left(\frac{β + m - 2}{α + β + m - 1}\right)^{n_{m}}. \end{aligned}$$
Then the log-likelihood is
$$\begin{aligned} ℓ_{n}(α, β) = \log L_{n}(α, β) &= n \{\log α - \log(α + β)\} + \sum_{s = 2}^{m} n_{s} \{\log β - \log(α + β + 1)\} \\ &\quad + \sum_{s = 3}^{m} n_{s} \{\log(β + 1) - \log(α + β + 2)\} + \cdots \\ &\quad + n_{m} \{\log(β + m - 2) - \log(α + β + m - 1)\}. \end{aligned} \tag{2}$$
Taking partial derivatives with respect to α and β leads to the maximum likelihood equations
$$\frac{1}{n} \cdot \frac{\partial ℓ_{n}(α, β)}{\partial α} = \frac{1}{α} - \left(\frac{1}{α + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α + β + 2} + \cdots + \frac{p_{m}}{α + β + m - 1}\right) = 0, \tag{3}$$
$$\begin{aligned} \frac{1}{n} \cdot \frac{\partial ℓ_{n}(α, β)}{\partial β} &= -\left(\frac{1}{α + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α + β + 2} + \cdots + \frac{p_{m}}{α + β + m - 1}\right) \\ &\quad + \left(\frac{\sum_{s = 2}^{m} p_{s}}{β} + \frac{\sum_{s = 3}^{m} p_{s}}{β + 1} + \cdots + \frac{p_{m}}{β + m - 2}\right) = 0, \end{aligned} \tag{4}$$
where $p_{k} = n_{k} / n$, with $p_{m} > 0$.
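To make (2)–(4) concrete, here is a hedged Python sketch (standard library only; the names `loglik` and `score` are ours) that evaluates the log-likelihood from a frequency table $\{n_{k}\}$ and the scaled score functions $(1/n)\,\partial ℓ_{n}/\partial α$ and $(1/n)\,\partial ℓ_{n}/\partial β$. The frequency table in the usage lines is purely illustrative and not taken from the paper; comparing the analytic score with a central finite difference of (2) is a quick check of the algebra.

```python
import math
from typing import Dict, Tuple


def loglik(alpha: float, beta: float, counts: Dict[int, int]) -> float:
    """Log-likelihood (2); counts maps each observed value k >= 1 to n_k."""
    n = sum(counts.values())
    m = max(counts)
    ll = n * (math.log(alpha) - math.log(alpha + beta))
    for j in range(1, m):
        # factor (beta + j - 1) / (alpha + beta + j), raised to n_{j+1} + ... + n_m
        n_tail = sum(c for k, c in counts.items() if k > j)
        ll += n_tail * (math.log(beta + j - 1) - math.log(alpha + beta + j))
    return ll


def score(alpha: float, beta: float, counts: Dict[int, int]) -> Tuple[float, float]:
    """(1/n) times the partial derivatives of (2), as in Equations (3) and (4)."""
    n = sum(counts.values())
    m = max(counts)
    # q[t] = p_t + ... + p_m
    q = [sum(c for k, c in counts.items() if k >= t) / n for t in range(m + 1)]
    common = 1.0 / (alpha + beta) + sum(q[t] / (alpha + beta + t - 1) for t in range(2, m + 1))
    d_alpha = 1.0 / alpha - common
    d_beta = -common + sum(q[t] / (beta + t - 2) for t in range(2, m + 1))
    return d_alpha, d_beta


counts = {1: 5, 2: 2, 3: 1, 10: 2}   # illustrative frequency table, not from the paper
a, b, eps = 1.3, 0.7, 1e-6
n = sum(counts.values())
fd_alpha = (loglik(a + eps, b, counts) - loglik(a - eps, b, counts)) / (2 * eps * n)
print(score(a, b, counts)[0], fd_alpha)   # the two values should agree closely
```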

We first consider Equation (3), which can be treated as the conditional maximum likelihood equation of α given a positive β. When m = 1, that is, all observed values equal 1, we have $\frac{1}{n} \cdot \frac{\partial ℓ_{n}(α, β)}{\partial α} = \frac{1}{α} - \frac{1}{α + β} = \frac{β}{α (α + β)} > 0$ for all $α > 0$, so the likelihood equation has no solution. We therefore focus on the case $m \geq 2$.

In the following, we study the conditional maximum likelihood Equation (3) given any positive β. It generalizes the likelihood equation of the Yule-Simon distribution, and we prove that the results for the Yule-Simon distribution ($β = 1$) also hold for any $β > 0$. More specifically, given a positive β, we denote the conditional MLE of α by $α(β)$. According to (3), $α(β)$ satisfies
$$α(β) = \frac{1}{\frac{1}{α(β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α(β) + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α(β) + β + 2} + \cdots + \frac{p_{m}}{α(β) + β + m - 1}}. \tag{5}$$
For notational ease, we define
$$\begin{aligned} η_{1} &= \sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s} = \sum_{t = 2}^{m} (t - 1) p_{t}, \qquad η_{2} = \sum_{t = 2}^{m} t \sum_{s = t}^{m} p_{s} = \sum_{t = 2}^{m} \frac{(t - 1)(t + 2)}{2} p_{t}, \\ η_{3} &= \sum_{t = 2}^{m} t^{2} \sum_{s = t}^{m} p_{s} = \sum_{t = 2}^{m} \frac{(t - 1)(2 t^{2} + 5 t + 6)}{6} p_{t}, \end{aligned} \tag{6}$$
where the closed form for $η_{3}$ follows from $\sum_{t = 2}^{s} t^{2} = \frac{s (s + 1)(2 s + 1)}{6} - 1 = \frac{(s - 1)(2 s^{2} + 5 s + 6)}{6}$. We present the properties of $α(β)$ in the following Proposition 2.1.
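The quantities in (6) are easy to compute exactly. The following sketch (Python with `fractions.Fraction`; the helper name `etas` is ours) evaluates each $η_{j}$ both from its double-sum definition and from its closed form in $p_{t}$, so the two can be compared; for $η_{3}$ the closed form uses the denominator 6 derived from $\sum_{t = 2}^{s} t^{2}$. The example counts are those of the age-67.5 group in Section 4.

```python
from fractions import Fraction
from typing import Dict, Tuple


def etas(counts: Dict[int, int]) -> Tuple[Tuple[Fraction, ...], Tuple[Fraction, ...]]:
    """Return (eta1, eta2, eta3) from the double sums in (6) and from the closed forms."""
    n = sum(counts.values())
    m = max(counts)
    p = {k: Fraction(c, n) for k, c in counts.items()}
    # q[t] = sum_{s >= t} p_s for t = 2, ..., m
    q = {t: sum((pk for k, pk in p.items() if k >= t), Fraction(0)) for t in range(2, m + 1)}
    from_sums = (
        sum(q.values(), Fraction(0)),
        sum((t * qt for t, qt in q.items()), Fraction(0)),
        sum((t * t * qt for t, qt in q.items()), Fraction(0)),
    )
    # closed forms; the t = 1 terms vanish because of the factor (t - 1)
    closed = (
        sum((Fraction(t - 1) * pt for t, pt in p.items()), Fraction(0)),
        sum((Fraction((t - 1) * (t + 2), 2) * pt for t, pt in p.items()), Fraction(0)),
        sum((Fraction((t - 1) * (2 * t * t + 5 * t + 6), 6) * pt for t, pt in p.items()), Fraction(0)),
    )
    return from_sums, closed


sums, closed = etas({1: 33, 2: 7, 3: 4, 4: 1})
print(sums == closed, sums)   # True, with eta1 = 2/5, eta2 = 43/45, eta3 = 109/45
```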

Proposition 2.1

Let $α(β)$ be defined as in (5). Then $α(β)$ has the following properties.

Property 1. If $β \to 0$ , we have $α (β) \to 0$ .
Property 2. If $β \to \infty$ , we have $α (β) = \frac{1}{η_{1}} \cdot β + \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})} + \frac{η_{2}^{2} - η_{1} η_{3} - η_{1} + 2 η_{2} - η_{3}}{(1 + η_{1})^{3}} \cdot \frac{1}{β} + O (\frac{1}{β^{2}}) .$
Property 3. $α (β)$ is an increasing and concave function of β.
Property 4. The first derivative $α^{'} (β) \to \infty$ if $β \to 0$ , and $α^{'} (β) \to \frac{1}{η_{1}}$ if $β \to \infty$ .
Property 5. When $β > 0$ , the number of solutions to $α (β) = Z (β)$ is finite, where $Z (β)$ is any polynomial or fractional function of β.
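Proposition 2.1 can be explored numerically. In the sketch below (Python, standard library; names are ours), $α(β)$ is computed as the root of (3) by bisection. This is valid because $α \cdot T(α + β)$, where $T(u)$ denotes the bracketed sum in (3), increases strictly from 0 to $1 + η_{1}$ as α runs from 0 to ∞, so for $m \geq 2$ the root is unique. With the age-67.5 counts of Section 4 (for which $η_{1} = 2/5$ and $η_{2} = 43/45$), the usage lines illustrate Property 2: $α(β) - β/η_{1}$ approaches the intercept $(η_{2} - η_{1})/\{η_{1}(1 + η_{1})\} \approx 0.992$ as β grows.

```python
from typing import Dict, List, Tuple


def tail_props(counts: Dict[int, int]) -> Tuple[List[float], int]:
    """q[t] = p_t + ... + p_m for t = 0, ..., m, together with m = max observed value."""
    n = sum(counts.values())
    m = max(counts)
    return [sum(c for k, c in counts.items() if k >= t) / n for t in range(m + 1)], m


def alpha_of_beta(beta: float, counts: Dict[int, int], iters: int = 200) -> float:
    """Conditional MLE alpha(beta): the unique root in alpha of Equation (3); needs m >= 2."""
    q, m = tail_props(counts)

    def g(a: float) -> float:   # left-hand side of (3): positive left of the root, negative right
        s = a + beta
        return 1.0 / a - 1.0 / s - sum(q[t] / (s + t - 1) for t in range(2, m + 1))

    lo, hi = 0.0, 1.0
    while g(hi) > 0:             # expand the bracket until the sign changes
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):       # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


counts = {1: 33, 2: 7, 3: 4, 4: 1}     # age-67.5 group, Section 4
eta1, eta2 = 0.4, 43 / 45
intercept = (eta2 - eta1) / (eta1 * (1 + eta1))
for beta in (1.0, 10.0, 100.0, 1e4):
    print(beta, alpha_of_beta(beta, counts) - beta / eta1, intercept)
```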

Next we discuss the existence of the MLE of $(α, β)$. By (3), we have
$$\frac{1}{α(β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α(β) + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α(β) + β + 2} + \cdots + \frac{p_{m}}{α(β) + β + m - 1} = \frac{1}{α(β)}. \tag{7}$$
By (4), at $α = α(β)$ we have
$$\frac{1}{α(β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α(β) + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α(β) + β + 2} + \cdots + \frac{p_{m}}{α(β) + β + m - 1} = \frac{\sum_{s = 2}^{m} p_{s}}{β} + \frac{\sum_{s = 3}^{m} p_{s}}{β + 1} + \cdots + \frac{p_{m}}{β + m - 2}.$$
Let
$$h(β) = \frac{1}{\frac{\sum_{s = 2}^{m} p_{s}}{β} + \frac{\sum_{s = 3}^{m} p_{s}}{β + 1} + \cdots + \frac{p_{m}}{β + m - 2}}. \tag{8}$$
Combining the last display with (7) shows that the equation system (3)–(4) holds at $(α(β), β)$ if and only if $α(β) = h(β)$. Hence, if the curves $y = h(β)$ and $y = α(β)$ intersect at some $β > 0$, the equation system (3)–(4) has a solution. We prove later that the intersection is unique and is the MLE of the Waring distribution.

To discuss whether $y = h (β)$ and $y = α (β)$ intersect at some $β > 0$ , we first present the properties of $h (β)$ in the following proposition.

Proposition 2.2

Let $h(β)$ be defined as in (8). Then $h(β)$ has the following properties.

Property 1*. If $β \to 0$ , we have $h (β) \to 0$ .
Property 2*. If $β \to \infty$ , we have $h (β) = \frac{1}{η_{1}} \cdot β + \frac{η_{2} - 2 η_{1}}{η_{1}^{2}} + \frac{η_{2}^{2} - η_{1} η_{3}}{η_{1}^{3}} \cdot \frac{1}{β} + O (\frac{1}{β^{2}}) .$
Property 3*. $h (β)$ is an increasing and concave function of β.
Property 4*. The first derivative $h^{'} (β) \to \frac{1}{\sum_{s = 2}^{m} p_{s}}$ if $β \to 0$ , and $h^{'} (β) \to \frac{1}{η_{1}}$ if $β \to \infty$ .
Property 5*. When $β > 0$ , the number of solutions to $h (β) = Z (β)$ is finite, where $Z (β)$ is any polynomial or fractional function of β.
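A matching sketch for $h(β)$ in (8) (Python; the helper name is ours, and the data are again the age-67.5 counts of Section 4) illustrates Property 2*, namely that $h(β) - β/η_{1}$ approaches $(η_{2} - 2η_{1})/η_{1}^{2} \approx 0.972$, and Property 4*, namely that the slope near 0 is close to $1/\sum_{s = 2}^{m} p_{s} = 45/12 = 3.75$.

```python
from typing import Dict


def h_of_beta(beta: float, counts: Dict[int, int]) -> float:
    """h(beta) from Equation (8); requires m = max observed value >= 2."""
    n = sum(counts.values())
    m = max(counts)
    # q[t] = p_t + ... + p_m
    q = [sum(c for k, c in counts.items() if k >= t) / n for t in range(m + 1)]
    return 1.0 / sum(q[t] / (beta + t - 2) for t in range(2, m + 1))


counts = {1: 33, 2: 7, 3: 4, 4: 1}     # age-67.5 group, Section 4
eta1, eta2 = 0.4, 43 / 45
print(h_of_beta(1e4, counts) - 1e4 / eta1, (eta2 - 2 * eta1) / eta1 ** 2)
print((h_of_beta(2e-6, counts) - h_of_beta(1e-6, counts)) / 1e-6)   # close to 3.75
```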

Based on Properties 1 and 4 of Proposition 2.1 and Properties 1* and 4* of Proposition 2.2 (both functions tend to 0 as $β \to 0$, while $α'(β) \to \infty$ and $h'(β)$ stays bounded), it is easy to derive that $α(β) > h(β)$ when β is small. Therefore, if we can prove that $α(β) < h(β)$ for some large β, then by the continuity of the two functions the equation system (3)–(4) must have a solution. This is the key idea for checking the existence of the MLE.

Before presenting the main result, we first give two important lemmas.

Lemma 2.3

For the log-likelihood function $ℓ_{n}(α, β)$, assume that for any β, $ℓ_{n}(α(β), β) = \max_{α} ℓ_{n}(α, β)$, and that there exists $β_{1}$ such that $\partial ℓ_{n}(α, β) / \partial β |_{α = α(β_{1}), β = β_{1}} = 0$, $\partial ℓ_{n}(α, β) / \partial β |_{α = α(β)} > 0$ for $β < β_{1}$, and $\partial ℓ_{n}(α, β) / \partial β |_{α = α(β)} < 0$ for $β > β_{1}$. Then $ℓ_{n}(α(β_{1}), β_{1}) = \max_{α, β} ℓ_{n}(α, β)$.

Lemma 2.3 provides, via the profile method, an alternative way to prove that a stationary point is the MLE; it is simpler than the conventional proof, which involves a complicated calculation of the Hessian matrix.

Lemma 2.4

Assume that $t_{1}(x)$ and $t_{2}(x)$ are increasing and concave functions for $x > 0$, that the curves $y = t_{1}(x)$ and $y = t_{2}(x)$ intersect only finitely many times, and that the number of solutions to $t_{i}(x) = Z(x)$ is finite for both $i = 1, 2$, where $Z(x)$ is any polynomial or fractional function of x. Further assume that

$t_{1} (a) = t_{2} (a)$ for some a;
there exists some $δ^{*} > 0$ such that $t_{1} (x) > t_{2} (x)$ for $x \in (a, a + δ^{*})$ ;
$\lim_{x \to \infty} \frac{t_{1}(x)}{x} = \lim_{x \to \infty} \frac{t_{2}(x)}{x} = c^{*} > 0$;
there exists $δ_{4}^{*}$ such that $t_{1} (x) > t_{2} (x)$ for $x \in (δ_{4}^{*}, \infty)$ .

Then, we have $t_{1} (x) \geq t_{2} (x)$ for all $x \in (a, \infty)$ .

Lemma 2.4 provides a general method for comparing two increasing and concave functions without requiring their explicit forms; it not only simplifies the comparison of $α(β)$ and $h(β)$, but may also be useful in other applications.

Based on Propositions 2.1–2.2 and Lemmas 2.3–2.4, we summarize the existence of the MLE in the following Theorem 2.5.

Theorem 2.5

Suppose that $\{x_{1}, \dots, x_{n}\}$ is a random sample from the Waring distribution $W(α, β)$, and $m = \max\{x_{1}, \dots, x_{n}\}$. Let $p_{k} = n_{k} / n$ be the proportion of observations with $x_{i} = k$, with $p_{m} > 0$. Let
$$α_{\text{intercept}} = \frac{η_{2} - η_{1}}{η_{1}(1 + η_{1})}, \qquad h_{\text{intercept}} = \frac{η_{2} - 2 η_{1}}{η_{1}^{2}}.$$

If $α_{\text{intercept}} < h_{\text{intercept}}$, then the MLE of $(α, β)$ exists.
If $α_{\text{intercept}} > h_{\text{intercept}}$, then the MLE of $(α, β)$ does not exist.
If $α_{\text{intercept}} = h_{\text{intercept}}$, or equivalently $η_{1}^{2} + 2 η_{1} - η_{2} = 0$, we denote $d_{α} = \frac{η_{2}^{2} - η_{1} η_{3} - η_{1} + 2 η_{2} - η_{3}}{(1 + η_{1})^{3}}$ and $d_{h} = \frac{η_{2}^{2} - η_{1} η_{3}}{η_{1}^{3}}$. The MLE exists if $d_{α} < d_{h}$ and does not exist if $d_{α} > d_{h}$.
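Theorem 2.5 gives a check that can be run before any optimization. The following is a minimal sketch, assuming integer frequency counts and using exact rational arithmetic so that the tie case $α_{\text{intercept}} = h_{\text{intercept}}$ is detected exactly (Python; the function name, and the convention of returning `None` when $d_{α} = d_{h}$ as well, are ours). The $m = 1$ case returns `False`, following the observation above that (3) then has no solution.

```python
from fractions import Fraction
from typing import Dict, Optional


def mle_exists(counts: Dict[int, int]) -> Optional[bool]:
    """Existence of the Waring MLE by Theorem 2.5.

    True: the MLE exists; False: it does not; None: the intercepts and
    d_alpha, d_h both tie, a case the theorem does not cover.
    """
    n = sum(counts.values())
    if max(counts) < 2:
        return False   # all observations equal 1: Equation (3) has no solution
    p = {k: Fraction(c, n) for k, c in counts.items()}
    eta1 = sum((Fraction(k - 1) * pk for k, pk in p.items()), Fraction(0))
    eta2 = sum((Fraction((k - 1) * (k + 2), 2) * pk for k, pk in p.items()), Fraction(0))
    eta3 = sum((Fraction((k - 1) * (2 * k * k + 5 * k + 6), 6) * pk for k, pk in p.items()), Fraction(0))
    alpha_int = (eta2 - eta1) / (eta1 * (1 + eta1))
    h_int = (eta2 - 2 * eta1) / eta1 ** 2
    if alpha_int != h_int:
        return alpha_int < h_int
    d_alpha = (eta2 ** 2 - eta1 * eta3 - eta1 + 2 * eta2 - eta3) / (1 + eta1) ** 3
    d_h = (eta2 ** 2 - eta1 * eta3) / eta1 ** 3
    if d_alpha != d_h:
        return d_alpha < d_h
    return None


print(mle_exists({1: 33, 2: 7, 3: 4, 4: 1}))   # False (age-67.5 group, Section 4)
print(mle_exists({1: 5, 2: 2, 3: 1, 10: 2}))   # True (illustrative table)
```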

To derive the necessary and sufficient conditions for the existence of the MLE, we start from the conditional MLE of α for a given β, because it is then easier to discuss the possible solutions through the intersection of two curves determined by the estimating equations. Numerically, since there are only two parameters to estimate, the MLE can be computed efficiently with the ‘optim’ function in R.

Remark 2.1

Unlike the existing literature, which applied an optimization algorithm directly to the log-likelihood function without verifying the existence of the MLE (Huete-Morales & Marmolejo-Martín, 2020; Rivas & Campos, 2021), we present necessary and sufficient conditions for the existence of the MLE of the Waring parameters; to our knowledge, this is the first such result. Since $α_{\text{intercept}} - h_{\text{intercept}} = \frac{η_{1}^{2} + 2 η_{1} - η_{2}}{η_{1}^{2}(1 + η_{1})}$ and $η_{1} > 0$, the sign of $α_{\text{intercept}} - h_{\text{intercept}}$ equals the sign of
$$A(p_{2}, \dots, p_{m}) = η_{1}^{2} + 2 η_{1} - η_{2}.$$
For m = 2, we have $A(p_{2}, \dots, p_{m}) = p_{2}^{2} = (n_{2} / n)^{2} > 0$, and thus the MLE of $(α, β)$ does not exist. For $m \geq 3$ the sign depends on the data, and for a general m it can be checked from
$$A(p_{2}, \dots, p_{m}) = \{p_{2} + 2 p_{3} + \cdots + (m - 1) p_{m}\}^{2} - \left\{p_{3} + 3 p_{4} + \cdots + \frac{(m - 1)(m - 2)}{2} p_{m}\right\}.$$
For m = 2 and 3, we also carefully checked the existence of real-valued solutions to the equation system (3)–(4), and found that the sign of $α_{\text{intercept}} - h_{\text{intercept}}$ indeed determines the existence of the MLE. Details of these checks are available from the authors.
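The algebra behind Remark 2.1 can be verified exactly. This sketch (Python with `fractions.Fraction`; the helper name is ours, and the random frequency tables are synthetic test inputs, not data from the paper) computes $A = η_{1}^{2} + 2η_{1} - η_{2}$, the explicit form in terms of the $p_{k}$ (whose first bracket is exactly $η_{1}$), and $(α_{\text{intercept}} - h_{\text{intercept}}) \cdot η_{1}^{2}(1 + η_{1})$, and confirms that all three coincide.

```python
import random
from fractions import Fraction
from typing import Dict, Tuple


def remark_quantities(counts: Dict[int, int]) -> Tuple[Fraction, Fraction, Fraction]:
    """A = eta1^2 + 2 eta1 - eta2, its explicit form, and the scaled intercept gap."""
    n = sum(counts.values())
    p = {k: Fraction(c, n) for k, c in counts.items()}
    eta1 = sum((Fraction(k - 1) * pk for k, pk in p.items()), Fraction(0))
    eta2 = sum((Fraction((k - 1) * (k + 2), 2) * pk for k, pk in p.items()), Fraction(0))
    a = eta1 ** 2 + 2 * eta1 - eta2
    # {p2 + 2 p3 + ... + (m-1) pm}^2 - {p3 + 3 p4 + ... + (m-1)(m-2)/2 pm}
    explicit = eta1 ** 2 - sum((Fraction((k - 1) * (k - 2), 2) * pk for k, pk in p.items()), Fraction(0))
    gap = (eta2 - eta1) / (eta1 * (1 + eta1)) - (eta2 - 2 * eta1) / eta1 ** 2
    return a, explicit, gap * eta1 ** 2 * (1 + eta1)


rng = random.Random(2024)
for _ in range(5):
    m = rng.randint(2, 12)
    counts = {k: rng.randint(1, 30) for k in range(1, m + 1)}
    a, explicit, scaled_gap = remark_quantities(counts)
    print(m, a == explicit == scaled_gap, a > 0)
```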

One more comment on Theorem 2.5 is as follows. If $α_{\text{intercept}} < h_{\text{intercept}}$, or $α_{\text{intercept}} = h_{\text{intercept}}$ with $d_{α} < d_{h}$, the MLE of the Waring parameters is a finite vector. The Waring distribution then fits the data better than the Yule-Simon distribution if the estimated β departs from 1, and about as well if the estimated β is close to 1. If $α_{\text{intercept}} > h_{\text{intercept}}$, or $α_{\text{intercept}} = h_{\text{intercept}}$ with $d_{α} > d_{h}$, the likelihood is maximized on the boundary of the parameter space, i.e., at infinity. Therefore, if we directly apply an optimization algorithm to the likelihood function, the resulting estimate may be far from the true parameters; for example, in the real data application we obtain $\hat{α} = 1{,}687{,}133.2$ and $\hat{β} = 675{,}078.4$ for the group with central age 67.5 (ages 65 to 70), for which the MLE in fact does not exist. In such cases, we can use the EFF method if the EFF estimates are on reasonable scales, and the Waring distribution will still fit the data better than the Yule-Simon distribution.
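The boundary behaviour can be seen directly from the profile log-likelihood $β \mapsto ℓ_{n}(α(β), β)$. In this sketch (Python, standard library; helper names are ours), $α(β)$ is obtained by bisection on (3) and $ℓ_{n}$ is evaluated from (2). For the age-67.5 group of Section 4, where $α_{\text{intercept}} \approx 0.992 > h_{\text{intercept}} \approx 0.972$, the printed profile log-likelihood keeps increasing along the grid of β values, consistent with a supremum that is approached only as $β \to \infty$ rather than attained at an interior point.

```python
import math
from typing import Dict


def loglik(alpha: float, beta: float, counts: Dict[int, int]) -> float:
    """Log-likelihood (2) from the frequency table counts = {k: n_k}."""
    n, m = sum(counts.values()), max(counts)
    ll = n * math.log(alpha / (alpha + beta))
    for j in range(1, m):
        n_tail = sum(c for k, c in counts.items() if k > j)
        ll += n_tail * math.log((beta + j - 1) / (alpha + beta + j))
    return ll


def alpha_of_beta(beta: float, counts: Dict[int, int]) -> float:
    """Root in alpha of Equation (3) by bisection (requires max(counts) >= 2)."""
    n, m = sum(counts.values()), max(counts)
    q = [sum(c for k, c in counts.items() if k >= t) / n for t in range(m + 1)]

    def g(a: float) -> float:
        s = a + beta
        return 1.0 / a - 1.0 / s - sum(q[t] / (s + t - 1) for t in range(2, m + 1))

    lo, hi = 0.0, 1.0
    while g(hi) > 0:
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


def profile_loglik(beta: float, counts: Dict[int, int]) -> float:
    """Profile log-likelihood l_n(alpha(beta), beta)."""
    return loglik(alpha_of_beta(beta, counts), beta, counts)


counts_675 = {1: 33, 2: 7, 3: 4, 4: 1}   # age-67.5 group, Section 4
for beta in (1.0, 10.0, 100.0, 1e3, 1e4, 1e5):
    print(f"beta = {beta:>8g}: profile log-likelihood = {profile_loglik(beta, counts_675):.8f}")
```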

3. Simulation studies

3.1. Comparison of MLE and EFF

In this section, we give some numerical studies to compare the MLE and the EFF method in the Waring parameter estimation.

The Waring-distributed observations are generated by the function rWARING in the R package gamlss.dist. Note that rWARING uses the parameters $(μ, σ)$, with probability mass function
$$P(X = k) = \frac{(1 + σ) Γ(k + \frac{μ}{σ}) Γ(\frac{μ + σ + 1}{σ})}{σ Γ(k + \frac{μ + 1}{σ} + 2) Γ(\frac{μ}{σ})}, \quad k = 0, 1, 2, \dots, \; μ > 0, σ > 0.$$
Comparing this probability mass function with (1), we need to add 1 to the values generated by rWARING, and the parameters are related by $α = 1 + 1/σ$ and $β = μ/σ$. Thus rWARING automatically restricts $α > 1$, so the EFF estimator is applicable. We consider 20 combinations of $(α, β)$, with $α = 2, 1.5, 1.2, 1.1, 1.05$ and $β = 0.5, 1, 1.5, 2$, and sample sizes n = 100, 200, 400 and 1000. We generate 500 replicates for each case.
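The parameter correspondence can be checked numerically without R. The sketch below (Python, standard library; function names are ours) evaluates the $(μ, σ)$ probability mass function exactly as quoted above at $y = k - 1$ and compares it with (1) under $α = 1 + 1/σ$, $β = μ/σ$. It does not call gamlss.dist itself, so it verifies only the algebra of the stated formula.

```python
import math
from typing import Tuple


def gamlss_waring_pmf(y: int, mu: float, sigma: float) -> float:
    """The (mu, sigma) pmf quoted above, support y = 0, 1, 2, ..."""
    return math.exp(math.log1p(sigma) + math.lgamma(y + mu / sigma)
                    + math.lgamma((mu + sigma + 1) / sigma)
                    - math.log(sigma) - math.lgamma(y + (mu + 1) / sigma + 2)
                    - math.lgamma(mu / sigma))


def waring_pmf(k: int, alpha: float, beta: float) -> float:
    """Equation (1), support k = 1, 2, ..."""
    return math.exp(math.log(alpha) + math.lgamma(beta + k - 1) - math.lgamma(beta)
                    + math.lgamma(alpha + beta) - math.lgamma(alpha + beta + k))


def to_waring(mu: float, sigma: float) -> Tuple[float, float]:
    """(mu, sigma) -> (alpha, beta) = (1 + 1/sigma, mu/sigma)."""
    return 1.0 + 1.0 / sigma, mu / sigma


def to_mu_sigma(alpha: float, beta: float) -> Tuple[float, float]:
    """(alpha, beta) -> (mu, sigma); only possible for alpha > 1."""
    sigma = 1.0 / (alpha - 1.0)
    return beta * sigma, sigma


mu, sigma = to_mu_sigma(1.5, 2.0)   # alpha = 1.5, beta = 2 is one of the simulation settings
print(mu, sigma, to_waring(mu, sigma))
print(max(abs(gamlss_waring_pmf(y, mu, sigma) - waring_pmf(y + 1, 1.5, 2.0)) for y in range(20)))
```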

Probably due to the parameter specification and the restricted data-generating process of rWARING, we find that $α_{\text{intercept}} < h_{\text{intercept}}$ is satisfied in all cases except two replicates in the case $α = 2, β = 0.5$ with the small sample size n = 100. By Remark 2.1, $α_{\text{intercept}} < h_{\text{intercept}}$ is equivalent to
$$\{p_{2} + 2 p_{3} + \cdots + (m - 1) p_{m}\}^{2} < p_{3} + 3 p_{4} + \cdots + \frac{(m - 1)(m - 2)}{2} p_{m}. \tag{9}$$
It is easy to see that
$$\begin{aligned} \{p_{2} + 2 p_{3} + \cdots + (m - 1) p_{m}\}^{2} &= \left\{\sum_{k = 1}^{m} (k - 1) p_{k}\right\}^{2} = \{E_{n}(X) - 1\}^{2}, \\ p_{3} + 3 p_{4} + \cdots + \frac{(m - 1)(m - 2)}{2} p_{m} &= \sum_{k = 1}^{m} \frac{(k - 1)(k - 2)}{2} p_{k} = \frac{1}{2} E_{n}(X^{2}) - \frac{3}{2} E_{n}(X) + 1, \end{aligned}$$
where $E_{n}$ denotes expectation under the empirical distribution. When $1 < α \leq 2$, $E(X)$ exists while $E(X^{2})$ diverges. Thus (9) is very likely to hold, and the MLE exists. However, in real applications it is possible that $α_{\text{intercept}} > h_{\text{intercept}}$ (see Section 4).
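Condition (9) can therefore be evaluated from the first two sample moments. A minimal sketch (Python; helper names are ours) computes both sides from a raw sample and, as a cross-check, from the proportions $p_{k}$. For the age-67.5 group of Section 4 the left side is $0.16$ and the right side is $7/45 \approx 0.156$, so (9) fails there.

```python
from typing import Sequence, Tuple


def condition9_sides(sample: Sequence[int]) -> Tuple[float, float]:
    """Left and right sides of (9) via E_n(X) and E_n(X^2)."""
    n = len(sample)
    m1 = sum(sample) / n
    m2 = sum(x * x for x in sample) / n
    return (m1 - 1.0) ** 2, 0.5 * m2 - 1.5 * m1 + 1.0


def condition9_sides_freq(sample: Sequence[int]) -> Tuple[float, float]:
    """The same two sides written with the proportions p_k, as in (9)."""
    n = len(sample)
    p = {k: sample.count(k) / n for k in set(sample)}
    lhs = sum((k - 1) * pk for k, pk in p.items()) ** 2
    rhs = sum((k - 1) * (k - 2) / 2 * pk for k, pk in p.items())
    return lhs, rhs


sample = [1] * 33 + [2] * 7 + [3] * 4 + [4] * 1
lhs, rhs = condition9_sides(sample)
print(lhs, rhs, lhs < rhs)   # (9) fails for this group
```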

As mentioned after Theorem 2.5, we use the ‘optim’ function to compute the MLE after verifying its existence. We tried four methods to initialize the parameters: (i) small values, $(α^{(0)}, β^{(0)}) = (1.1, 0.1)$; (ii) large values, $(α^{(0)}, β^{(0)}) = (2.5, 3)$; (iii) the true parameter values plus a random perturbation from $N(0, 0.2^{2})$, restricted to $α^{(0)} \geq 1.1$ and $β^{(0)} \geq 0.1$; (iv) the EFF estimates. Extensive numerical studies show that these four initialization methods yield almost the same results, which indicates that the optimization is not sensitive to the initial values. We therefore initialize with the EFF estimator when EFF produces positive estimates, and otherwise set $(α^{(0)}, β^{(0)}) = (1.1, 0.1)$.
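The initialization rule can be sketched as follows (Python; `initial_values` is our hypothetical name, and it reuses the EFF formulas from Section 1): EFF estimates are returned when both are positive, and the fallback $(1.1, 0.1)$ otherwise, including the case of a zero denominator.

```python
from typing import Sequence, Tuple


def initial_values(sample: Sequence[int],
                   fallback: Tuple[float, float] = (1.1, 0.1)) -> Tuple[float, float]:
    """EFF estimates if both are positive, otherwise the fallback (1.1, 0.1)."""
    n = len(sample)
    xbar = sum(sample) / n
    p1 = sum(1 for x in sample if x == 1) / n
    denom = p1 * xbar - 1.0
    if denom != 0.0:
        a = p1 * (xbar - 1.0) / denom
        b = (1.0 - p1) * (xbar - 1.0) / denom
        if a > 0 and b > 0:
            return a, b
    return fallback


print(initial_values([1] * 33 + [2] * 7 + [3] * 4 + [4]))   # EFF values, approximately (11, 4)
print(initial_values([1] * 8 + [2] * 2))                    # EFF is negative here: (1.1, 0.1)
```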

Among all the cases, the EFF method produces negative estimates in only one replicate, in the case $α = 2, β = 0.5$ with the small sample size n = 100; in another replicate, the denominator $\hat{P}(X = 1) \cdot \bar{X} - 1$ is exactly 0, so the estimator does not exist. These two replicates are deleted for a fair comparison. Since the parameters are on different scales (for β, the largest value is four times the smallest), we report the rBias (relative bias, defined as the bias divided by the true parameter value) and rStd (relative standard error, defined as the standard error divided by the true parameter value) in Tables 1 and 2. We find that, when the sample size is as small as n = 100, both MLE and EFF yield relatively poor estimates, with standard errors larger than or close to 50% of the true parameter value, which indicates that it is challenging to estimate the parameters accurately from small samples. We therefore focus on the comparison of MLE and EFF for $n \geq 200$. First, the MLE always has much smaller bias than EFF. Although the rBias of EFF decreases as the sample size increases, it increases as the true α decreases, and it is still around 10% even when n = 1000 for $α \leq 1.2$; the rBias of the MLE decreases from 6%–7% to around 1% as n increases from 200 to 1000, regardless of the true α. Second, the MLE has rStd comparable with that of EFF for medium-sized samples (n = 200 and 400), but smaller rStd for n = 1000. Overall, the MLE performs better than EFF when $α / β$ is not large or the sample size is large enough. EFF performs relatively better when $α / β$ is large, e.g., $α / β \geq 2$. Our explanation is that, since $P(X = 1) = \frac{α}{α + β} = \frac{α / β}{α / β + 1}$, a large $α / β$ makes $P(X = 1)$ close to 1, so EFF captures relatively more information than when $α / β$ is small.
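The two summary measures are simple to compute. A sketch in Python (the function name is ours); we assume that the standard error is the sample standard deviation of the estimates across replicates, with divisor $R - 1$ for R replicates, which the text does not specify.

```python
import statistics
from typing import Sequence, Tuple


def relative_bias_std(estimates: Sequence[float], true_value: float) -> Tuple[float, float]:
    """rBias = (mean - true) / true and rStd = (sample standard deviation) / true."""
    r_bias = (statistics.mean(estimates) - true_value) / true_value
    r_std = statistics.stdev(estimates) / true_value   # divisor R - 1 (an assumption)
    return r_bias, r_std


print(relative_bias_std([1.8, 2.2, 2.0, 2.4], 2.0))   # approximately (0.05, 0.129)
```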

Table 1. Relative biases and relative standard errors of estimated parameters, for $β = 0.5$ and 1.


Table 2. Relative biases and relative standard errors of estimated parameters, for $β = 1.5$ and 2.


3.2. Goodness-of-fit comparison with Yule-Simon distribution

In this section, we compare the Waring distribution and the Yule-Simon distribution, in terms of goodness-of-fit to the data.

We fix $α = 1.5$ and generate data from the Waring distribution with $β = 1, 1.5, 2$, using rWARING as in Section 3.1. When $β = 1$, this is exactly the Yule-Simon distribution, and when β departs from 1, the Yule-Simon assumption is violated. We consider 500 replicates with sample sizes n = 100, 200, 400 and 1000. To initialize the optimization for the MLE of the Yule-Simon parameter α, we use the first frequency $P(X = 1) = \frac{α}{α + 1}$, that is, $\tilde{α} = \frac{\hat{P}(X = 1)}{1 - \hat{P}(X = 1)}$. Figure 1 presents box-plots of the likelihood ratio statistics
$$T_{n} = 2 \{ℓ_{n}(\hat{α}, \hat{β}) - ℓ_{n}^{*}(\hat{α}^{*})\},$$
where $ℓ_{n}(\hat{α}, \hat{β})$ is the log-likelihood of the Waring fit evaluated at the MLE $(\hat{α}, \hat{β})$, and $ℓ_{n}^{*}(\hat{α}^{*})$ is the log-likelihood of the Yule-Simon fit evaluated at its MLE $\hat{α}^{*}$. If the true β equals 1, the Yule-Simon distribution is correct and $T_{n}$ is asymptotically distributed as $χ_{1}^{2}$; if the true β departs from 1, the Yule-Simon distribution is misspecified, so $T_{n}$ tends to be large. The box-plots in Figure 1 confirm that the Waring distribution fits the data about as well as the Yule-Simon distribution when $β = 1$, and much better when β departs from 1. Table 3 further reports the proportion of replicates in which the Yule-Simon distribution is rejected at nominal level 0.05.
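For illustration, a hedged Python sketch (standard library only; names are ours) of the ingredients of $T_{n}$: the Yule-Simon MLE, obtained here as the root of (3) with β fixed at 1 by bisection rather than by an optimizer started at $\tilde{α}$; the Waring log-likelihood (2), which at $β = 1$ is the Yule-Simon log-likelihood; and the $χ_{1}^{2}$ upper-tail probability via the complementary error function. The Waring MLE $(\hat{α}, \hat{β})$ is taken as an input and must be computed separately.

```python
import math
from typing import Dict


def loglik(alpha: float, beta: float, counts: Dict[int, int]) -> float:
    """Waring log-likelihood (2); beta = 1 gives the Yule-Simon log-likelihood."""
    n, m = sum(counts.values()), max(counts)
    ll = n * math.log(alpha / (alpha + beta))
    for j in range(1, m):
        n_tail = sum(c for k, c in counts.items() if k > j)
        ll += n_tail * math.log((beta + j - 1) / (alpha + beta + j))
    return ll


def yule_simon_mle(counts: Dict[int, int]) -> float:
    """Root of (3) with beta fixed at 1, found by bisection (requires max(counts) >= 2)."""
    n, m = sum(counts.values()), max(counts)
    q = [sum(c for k, c in counts.items() if k >= t) / n for t in range(m + 1)]

    def g(a: float) -> float:
        return 1.0 / a - 1.0 / (a + 1.0) - sum(q[t] / (a + t) for t in range(2, m + 1))

    lo, hi = 0.0, 1.0
    while g(hi) > 0:
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


def lr_statistic(counts: Dict[int, int], alpha_hat: float, beta_hat: float) -> float:
    """T_n = 2 * {l_n(alpha_hat, beta_hat) - l_n^*(alpha_hat^*)}, given the Waring MLE."""
    return 2.0 * (loglik(alpha_hat, beta_hat, counts) - loglik(yule_simon_mle(counts), 1.0, counts))


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability P(chi^2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


print(chi2_1_sf(3.84))   # about 0.05, matching the critical value 3.84 used above
```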

Figure 1. Box-plots of $T_{n}$ corresponding to $β = 1$ (first row), 1.5 (second row), and 2 (third row), where the dashed line indicates the critical value 3.84 and the number at the right side of each panel is the proportion of replicates with $T_{n} > 3.84$. In the last panel, all values of $T_{n}$ are much larger than 3.84, so the dashed line and the rejection proportion are not shown.

Table 3. Proportion of replicates that the Yule-Simon distribution is rejected at nominal level 0.05.


4. Real data application

Seal (1947, 1952) provided data on insurance shares for 12 different age periods. The original data concern male lives assured in a British life office and were maintained for administrative purposes. The analysed data are a random subset: every tenth name in this list was included until a total of 2000 was reached. The sampled lives are classified by year of birth and number of policies in force, and each group is represented by its central age.

Seal (1952) fitted the data using the discrete Pareto law, with probability mass function $P (X = k) = k^{- d} / ζ (d), k = 1, 2, 3, \dots, d > 1,$ where the Riemann zeta function $ζ (d) = \sum_{k = 1}^{\infty} k^{- d}$ is the normalizing constant, and the parameter d is estimated by maximum likelihood. Here we apply the Waring distribution to fit the data. For the age periods centred at 17.5 and 22.5, the maximal number of shares is 2; the EFF method then leads to negative parameter estimates and, as shown in Remark 2.1, the MLE does not exist. We therefore focus on the remaining 10 groups, with central ages from 27.5 to 72.5. Among these 10 groups, the group with central age 67.5 has n = 45 and $n_{1} = 33, n_{2} = 7, n_{3} = 4, n_{4} = 1$, and it is easy to verify that condition (9), $\{p_{2} + 2 p_{3} + \dots + (m - 1) p_{m}\}^{2} < p_{3} + 3 p_{4} + \dots + \frac{(m - 1) (m - 2)}{2} p_{m}$, does not hold: the left-hand side equals $(18/45)^{2} = 0.16$, whereas the right-hand side equals $7/45 \approx 0.156$. Thus the MLE does not exist. If we apply the optimization algorithm directly, we obtain $\hat{α} = 1{,}687{,}133.2$ and $\hat{β} = 675{,}078.4$, which are meaningless. By contrast, the EFF method gives $\hat{α} = 11$ and $\hat{β} = 4$, and the resulting fit is reasonably good. Hence the MLE must be used with care. Table 4 compares the actual distribution with the discrete Pareto fit and the Waring fits by EFF and MLE; the Waring distribution fits the data slightly better than the discrete Pareto law.
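The existence check and the EFF estimates for the group with central age 67.5 can be reproduced with a short Python sketch. Condition (9) is coded exactly as displayed above. For EFF, we assume the moment relations $P (X = 1) = \frac{α}{α + β}$ and $E (X) = 1 + \frac{β}{α - 1}$ (valid for α > 1), which follow from the Waring probability mass function $P (X = k) = \frac{α Γ (β + k - 1) Γ (α + β)}{Γ (β) Γ (α + β + k)}$ assumed here; solving them gives $\hat{α} = \frac{(\bar{x} - 1) \hat{p}_{1}}{\bar{x} \hat{p}_{1} - 1}$ and $\hat{β} = \frac{(\bar{x} - 1) (1 - \hat{p}_{1})}{\bar{x} \hat{p}_{1} - 1}$, which reproduce the values $\hat{α} = 11$ and $\hat{β} = 4$ quoted above. The function names are hypothetical.

```python
def condition_9(counts):
    """Condition (9): {sum_k (k-1) p_k}^2 < sum_k (k-1)(k-2)/2 * p_k."""
    n = sum(counts.values())
    lhs = sum((k - 1) * nk for k, nk in counts.items()) / n
    rhs = sum((k - 1) * (k - 2) / 2 * nk for k, nk in counts.items()) / n
    return lhs ** 2 < rhs

def eff_estimates(counts):
    """Solve P(X=1) = a/(a+b) and E(X) = 1 + b/(a-1) for (a, b)."""
    n = sum(counts.values())
    mean = sum(k * nk for k, nk in counts.items()) / n
    p1 = counts.get(1, 0) / n
    denom = mean * p1 - 1.0  # negative whenever max(X) = 2, giving negative estimates
    return (mean - 1.0) * p1 / denom, (mean - 1.0) * (1.0 - p1) / denom

group_67_5 = {1: 33, 2: 7, 3: 4, 4: 1}  # n = 45, central age 67.5
print(condition_9(group_67_5))          # False: 0.16 is not < 7/45
print(eff_estimates(group_67_5))        # approximately (11.0, 4.0)
```

With only values 1 and 2 one has $\bar{x} = 2 - \hat{p}_{1}$, so $\bar{x} \hat{p}_{1} - 1 = -(1 - \hat{p}_{1})^{2} < 0$, which is why both EFF estimates turn negative for the two youngest groups.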

Table 4. Comparison of actual distribution (A) with discrete Pareto law fitting (P), Waring fitting with EFF (E) and MLE (M).


5. Discussion

To fit a given data set by the Waring distribution, we need to verify the existence condition for the MLE of the Waring parameters before using the MLE. If the existence condition is not satisfied, the likelihood has no maximizer in the interior of the parameter space: it keeps increasing as the parameters diverge to infinity. Therefore, if we directly apply an optimization algorithm to the likelihood function, the resulting estimates may be far from the true parameters; for example, for the group with central age 67.5 we obtained $\hat{α} = 1{,}687{,}133.2$ and $\hat{β} = 675{,}078.4$, although in fact the MLE does not exist. In such cases, we can use the EFF method if the EFF estimates are on a reasonable scale. Based on the simulation studies and the real data analysis, we find that when the sample size or the maximum observed value is small, the MLE is less likely to exist, and when both the sample size and the maximum observed value are large, the MLE is more likely to exist. Nevertheless, the existence condition should always be verified before the MLE is used.

Acknowledgements

The authors would like to thank two anonymous reviewers, an associate editor and the editor for constructive comments and helpful suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work is partially supported by National Natural Science Foundation of China [Grant Numbers 11671096, 11690013, 11731011, 11871376] and Natural Science Foundation of Shanghai [Grant Number 21ZR1420700].

References

Garcia, J. M. (2011). A fixed-point algorithm to estimate the Yule–Simon distribution parameter. Applied Mathematics and Computation, 217(21), 8560–8566. https://doi.org/10.1016/j.amc.2011.03.092
Huete-Morales, M. D., & Marmolejo-Martín, J. A. (2020). The Waring distribution as a low-frequency prediction model: A study of organic livestock farms in Andalusia. Mathematics, 8(11), 2025. https://doi.org/10.3390/math8112025
Mills, T. (2017). A statistical biography of George Udny Yule: A loafer of the world. Cambridge Scholars Press.
Panaretos, J., & Xekalaki, E. (1986). The stuttering generalized Waring distribution. Statistics and Probability Letters, 4(6), 313–318. https://doi.org/10.1016/0167-7152(86)90051-9
Price, D. (1965). Network of scientific papers. Science, 149(3683), 510–515. https://doi.org/10.1126/science.149.3683.510
Price, D. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306. https://doi.org/10.1002/(ISSN)1097-4571
Rivas, L., & Campos, F. (2021). Zero inflated Waring distribution. Communications in Statistics – Simulation and Computation, to appear. https://doi.org/10.1080/03610918.2021.1944638
Seal, H. L. (1947). A probability distribution of deaths at age x when policies are counted instead of lives. Scandinavian Actuarial Journal, 1947, 118–143. https://doi.org/10.1080/03461238.1947.10419647
Seal, H. L. (1952). The maximum likelihood fitting of the discrete Pareto law. Journal of the Institute of Actuaries, 78(1), 115–121. https://doi.org/10.1017/S0020268100052501
Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42(3–4), 425–440. https://doi.org/10.1093/biomet/42.3-4.425
Xekalaki, E. (1983). The univariate generalized Waring distribution in relation to accident theory: Proneness, spells or contagion? Biometrics, 39(4), 887–895. https://doi.org/10.2307/2531324
Xekalaki, E. (1985). Factorial moment estimation for the bivariate generalized Waring distribution. Statistical Papers, 26(1), 115–129. https://doi.org/10.1007/BF02932525
Yule, G. U. (1925). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society B, 213, 21–87.

Appendices

The appendix contains some useful lemmas and technical proofs.

Appendix 1. Some useful lemmas

Lemma A.1 Define $g (x) = \frac{1}{\frac{1}{a_{1} x + b_{1}} + \dots + \frac{1}{a_{k} x + b_{k}}}, x > 0,$ where $a_{1}, \dots, a_{k}$ are positive and $b_{1}, \dots, b_{k}$ are nonnegative. Then $g (x)$ is an increasing and concave function.

Proof.

It is easy to derive that $\begin{aligned} g^{'} (x) & = g^{2} (x) \Big\{\frac{a_{1}}{(a_{1} x + b_{1})^{2}} + \dots + \frac{a_{k}}{(a_{k} x + b_{k})^{2}}\Big\} > 0, \\ g^{''} (x) & = 2 g^{3} (x) \Big[\Big\{\frac{a_{1}}{(a_{1} x + b_{1})^{2}} + \dots + \frac{a_{k}}{(a_{k} x + b_{k})^{2}}\Big\}^{2} \\ & \quad - \Big(\frac{1}{a_{1} x + b_{1}} + \dots + \frac{1}{a_{k} x + b_{k}}\Big) \Big\{\frac{a_{1}^{2}}{(a_{1} x + b_{1})^{3}} + \dots + \frac{a_{k}^{2}}{(a_{k} x + b_{k})^{3}}\Big\}\Big] \\ & = - 2 g^{3} (x) \sum_{1 \leq i < j \leq k} \frac{1}{(a_{i} x + b_{i}) (a_{j} x + b_{j})} \Big(\frac{a_{i}}{a_{i} x + b_{i}} - \frac{a_{j}}{a_{j} x + b_{j}}\Big)^{2} \leq 0. ■ \end{aligned}$
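The closed forms of $g^{'} (x)$ and $g^{''} (x)$ can be checked numerically against central differences. The Python sketch below does this at one point for one arbitrary choice of positive $a_{i}$ and nonnegative $b_{i}$; these values and the function names are illustrative assumptions, and agreement at a single point is a sanity check rather than a proof.

```python
def g(x, a, b):
    """g(x) = 1 / sum_i 1/(a_i x + b_i)."""
    return 1.0 / sum(1.0 / (ai * x + bi) for ai, bi in zip(a, b))

def g_prime(x, a, b):
    """g'(x) = g(x)^2 * sum_i a_i / (a_i x + b_i)^2."""
    return g(x, a, b) ** 2 * sum(ai / (ai * x + bi) ** 2 for ai, bi in zip(a, b))

def g_second(x, a, b):
    """g''(x) = -2 g^3 sum_{i<j} u_i u_j (a_i u_i - a_j u_j)^2, u_i = 1/(a_i x + b_i)."""
    u = [1.0 / (ai * x + bi) for ai, bi in zip(a, b)]
    s = sum(u[i] * u[j] * (a[i] * u[i] - a[j] * u[j]) ** 2
            for i in range(len(u)) for j in range(i + 1, len(u)))
    return -2.0 * g(x, a, b) ** 3 * s

a, b, x, h = [1.0, 2.0, 0.5], [0.0, 1.0, 3.0], 1.5, 1e-4
fd1 = (g(x + h, a, b) - g(x - h, a, b)) / (2 * h)
fd2 = (g(x + h, a, b) - 2 * g(x, a, b) + g(x - h, a, b)) / h ** 2
print(g_prime(x, a, b), fd1)   # both positive and nearly equal
print(g_second(x, a, b), fd2)  # both negative and nearly equal
```

Dropping the factors $a_{i}$ inside the squared differences would break this agreement whenever the $a_{i}$ are not all equal to 1.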

Lemma A.2

When $x \to \infty$ , we have $\begin{aligned} \frac{x^{m} + a_{1} x^{m - 1} + a_{2} x^{m - 2} + \dots}{b_{1} x^{m - 1} + b_{2} x^{m - 2} + b_{3} x^{m - 3} + \dots} \\ = \frac{1}{b_{1}} x + \frac{a_{1} b_{1} - b_{2}}{b_{1}^{2}} + \frac{a_{2} b_{1}^{2} - b_{1} b_{3} - a_{1} b_{1} b_{2} + b_{2}^{2}}{b_{1}^{3}} \cdot \frac{1}{x} + O (\frac{1}{x^{2}}) . \end{aligned}$

Proof.

Assume that $\frac{x^{m} + a_{1} x^{m - 1} + a_{2} x^{m - 2} + \dots}{b_{1} x^{m - 1} + b_{2} x^{m - 2} + b_{3} x^{m - 3} + \dots} = \frac{1}{b_{1}} x + c + d \cdot \frac{1}{x} + O (\frac{1}{x^{2}}) .$ Then $\begin{aligned} x^{m} + a_{1} x^{m - 1} + a_{2} x^{m - 2} + \dots \\ = (b_{1} x^{m - 1} + b_{2} x^{m - 2} + b_{3} x^{m - 3} + \dots) \cdot (\frac{1}{b_{1}} x + c + d \cdot \frac{1}{x} + \dots) \\ = x^{m} + (b_{1} c + \frac{b_{2}}{b_{1}}) x^{m - 1} + (b_{1} d + b_{2} c + \frac{b_{3}}{b_{1}}) x^{m - 2} + \dots, \end{aligned}$ which indicates that: (i) $b_{1} c + b_{2} / b_{1} = a_{1}$ , and then $c = \frac{a_{1} b_{1} - b_{2}}{b_{1}^{2}}$ ; (ii) $b_{1} d + b_{2} c + \frac{b_{3}}{b_{1}} = a_{2}$ , and then $d = \frac{a_{2} b_{1}^{2} - b_{1} b_{3} - a_{1} b_{1} b_{2} + b_{2}^{2}}{b_{1}^{3}}$ . The proof is completed.
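The coefficients c and d in Lemma A.2 can be sanity-checked numerically. The Python sketch below uses one illustrative ratio with m = 3, namely $(x^{3} + 2 x^{2} + 3 x) / (2 x^{2} + x + 5)$, so that $a_{1} = 2, a_{2} = 3, b_{1} = 2, b_{2} = 1, b_{3} = 5$; these values are arbitrary choices, not taken from the article. If the expansion is right, $x^{2}$ times the remainder stays bounded as x grows.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given its coefficients from the highest degree down."""
    acc = 0.0
    for co in coeffs:
        acc = acc * x + co
    return acc

def expansion_coeffs(a1, a2, b1, b2, b3):
    """The constants c and d of Lemma A.2."""
    c = (a1 * b1 - b2) / b1 ** 2
    d = (a2 * b1 ** 2 - b1 * b3 - a1 * b1 * b2 + b2 ** 2) / b1 ** 3
    return c, d

num, den = [1.0, 2.0, 3.0, 0.0], [2.0, 1.0, 5.0]
c, d = expansion_coeffs(2.0, 3.0, 2.0, 1.0, 5.0)
print(c, d)  # 0.75 -0.125
for x in (1e2, 1e3, 1e4):
    remainder = horner(num, x) / horner(den, x) - (x / den[0] + c + d / x)
    print(x, x ** 2 * remainder)  # roughly -1.80, -1.81, -1.81: bounded
```

For this example, long division gives the remainder $(-3.625 x + 0.625) / \{x (2 x^{2} + x + 5)\}$ exactly, so $x^{2}$ times the remainder tends to $-1.8125$.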

Appendix 2. Technical Proofs

Appendix 2.1. Proof of Lemmas 2.3–2.4


Proof of Lemma 2.3

For any β, we have $ℓ_{n} (α (β), β) = \max_{α} ℓ_{n} (α, β)$. Thus, to prove $ℓ_{n} (α (β_{1}), β_{1}) = \max_{α, β} ℓ_{n} (α, β)$, we only need to prove that $β_{1}$ maximizes $ℓ_{n} (α (β), β)$. It therefore suffices to prove that $\partial ℓ_{n} (α (β), β) / \partial β |_{β = β_{1}} = 0$, $\partial ℓ_{n} (α (β), β) / \partial β > 0$ for $β < β_{1}$, and $\partial ℓ_{n} (α (β), β) / \partial β < 0$ for $β > β_{1}$.

Consider the following decomposition: $\begin{aligned} & \frac{ℓ_{n} (α (β + Δ β), β + Δ β) - ℓ_{n} (α (β), β)}{Δ β} \\ & = \frac{ℓ_{n} (α (β + Δ β), β + Δ β) - ℓ_{n} (α (β), β + Δ β)}{Δ β} + \frac{ℓ_{n} (α (β), β + Δ β) - ℓ_{n} (α (β), β)}{Δ β} \\ & \to \left.\Big\{\frac{\partial ℓ_{n} (α, β)}{\partial α} \cdot \frac{\partial α (β)}{\partial β} + \frac{\partial ℓ_{n} (α, β)}{\partial β}\Big\}\right|_{α = α (β)} \quad \text{as } Δ β \to 0, \end{aligned}$ where $\left.\frac{\partial ℓ_{n} (α, β)}{\partial α}\right|_{α = α (β)} = 0$. Thus $\left.\frac{\partial ℓ_{n} (α, β)}{\partial β}\right|_{α = α (β)}$ alone determines the sign of $\frac{\partial ℓ_{n} (α (β), β)}{\partial β}$. The proof is completed.
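Lemma 2.3 is an envelope-type argument: along the curve α = α(β), the derivative of the profile log-likelihood equals the partial derivative in (4). The Python sketch below confirms this numerically by comparing a central difference of $ℓ_{n} (α (β), β)$ with n times the right-hand side of (4) evaluated at α(β). It assumes the Waring probability mass function $P (X = k) = \frac{α Γ (β + k - 1) Γ (α + β)}{Γ (β) Γ (α + β + k)}$ (consistent with (3)–(4)), computes α(β) by bisection, and uses a hypothetical frequency table and hypothetical function names.

```python
import math

def tail_probs(counts):
    """c[j-1] = P(X >= j) for j = 1..m, from a {value: frequency} table."""
    n = sum(counts.values())
    c, remaining = [], n
    for j in range(1, max(counts) + 1):
        c.append(remaining / n)
        remaining -= counts.get(j, 0)
    return c

def alpha_of_beta(beta, c, iters=100):
    """Conditional MLE alpha(beta) by bisection on sum_j c_j*a/(a+beta+j-1) = 1."""
    if sum(c) <= 1.0:
        raise ValueError("sample mean must exceed 1")
    f = lambda a: sum(cj * a / (a + beta + j) for j, cj in enumerate(c))
    lo, hi = 0.0, 1.0
    while f(hi) < 1.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)

def loglik(a, b, counts):
    """Waring log-likelihood under the assumed pmf."""
    return sum(nk * (math.log(a) + math.lgamma(b + k - 1) - math.lgamma(b)
                     + math.lgamma(a + b) - math.lgamma(a + b + k))
               for k, nk in counts.items())

def partial_beta(a, b, counts):
    """n times the right-hand side of (4): the partial derivative in beta."""
    c = tail_probs(counts)
    n = sum(counts.values())
    first = sum(cj / (a + b + j) for j, cj in enumerate(c))
    second = sum(cj / (b + i) for i, cj in enumerate(c[1:]))
    return n * (second - first)

def profile_slope(b, counts, step=1e-5):
    """Central difference of the profile log-likelihood l_n(alpha(beta), beta)."""
    c = tail_probs(counts)
    prof = lambda t: loglik(alpha_of_beta(t, c), t, counts)
    return (prof(b + step) - prof(b - step)) / (2.0 * step)

counts = {1: 60, 2: 15, 3: 8, 4: 5, 5: 4, 6: 3, 8: 2, 10: 1, 20: 2}  # hypothetical
c = tail_probs(counts)
for b in (0.3, 0.7, 2.0):
    print(b, profile_slope(b, counts), partial_beta(alpha_of_beta(b, c), b, counts))
```

The two printed slopes should agree up to finite-difference error, because the term involving $\partial α (β) / \partial β$ is multiplied by a score that vanishes on the curve.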


Proof of Lemma 2.4

We argue by contradiction. If the conclusion is not correct, then there exists $x_{1} > a$ such that $t_{1} (x_{1}) = t_{2} (x_{1})$, $t_{1} (x) > t_{2} (x)$ for $x < x_{1}$, and $t_{1} (x) < t_{2} (x)$ for $x \in (x_{1}, x_{1} + δ_{0})$ for some $δ_{0} > 0$. By assumption (D), the curves $y = t_{1} (x)$ and $y = t_{2} (x)$ intersect again after $(x_{1}, t_{1} (x_{1}))$, i.e., there exists $x_{2} > x_{1}$ such that $t_{1} (x_{2}) = t_{2} (x_{2})$, $t_{1} (x) < t_{2} (x)$ for $x \in (x_{1}, x_{2})$, and $t_{1} (x) > t_{2} (x)$ for $x > x_{2}$ (we suppose that there exists only one such $x_{2}$; otherwise, we consider the largest intersection). According to assumption (D), take a point $x_{*} \in (δ_{4}^{*}, \infty)$ (which is, of course, greater than $x_{2}$), and draw the ray that starts at $(x_{2}, t_{1} (x_{2}))$ and passes through $(x_{*}, t_{1} (x_{*}))$. Let $x_{*}$ diverge to infinity, so that the point $(x_{*}, t_{1} (x_{*}))$ moves along the curve $y = t_{1} (x)$. Since $t_{1} (x)$ is increasing and concave, the ray through $(x_{*}, t_{1} (x_{*}))$ rotates downwards around the starting point $(x_{2}, t_{1} (x_{2}))$. By assumption (C), as $x_{*} \to \infty$, the slope of the ray satisfies $\frac{t_{1} (x_{*}) - t_{1} (x_{2})}{x_{*} - x_{2}} \to c^{*} .$ Thus the limit of the ray is the ray with starting point $(x_{2}, t_{1} (x_{2}))$ and slope $c^{*}$, denoted by L, and the curve $y = t_{1} (x)$ lies above L.

Note that the starting point of the ray L, $(x_{2}, t_{2} (x_{2}))$, is on the curve $y = t_{2} (x)$. By assumption (D), there exists an $x^{*}$ such that the curve $y = t_{2} (x)$ intersects L at $(x^{*}, t_{2} (x^{*}))$ and lies below L for $x \in (x^{*}, x^{*} + δ_{5}^{*})$ with some positive $δ_{5}^{*}$. Without loss of generality, we assume that $x_{2}$ is such a point, that is, $y = t_{2} (x)$ lies below L for $x \in (x_{2}, x_{2} + δ_{5}^{*})$.

Through the intersection $(x_{2}, t_{1} (x_{2}))$, we draw the tangent line of the curve $y = t_{2} (x)$. If this tangent line coincides with the ray L, we take another point $x^{* *} \in (x_{2}, x_{2} + δ_{5}^{*})$ and draw the tangent line of $y = t_{2} (x)$ through $(x^{* *}, t_{2} (x^{* *}))$. Since $y = t_{2} (x)$ is increasing and concave, if the tangent line through $(x_{2}, t_{1} (x_{2}))$ coincides with L, then the tangent line through $(x^{* *}, t_{2} (x^{* *}))$ does not. Note that the curve $y = t_{1} (x)$ lies above L, while $y = t_{2} (x)$ lies below its tangent line (a concave curve always lies below its tangent lines), which in turn lies below L (a tangent line that does not coincide with L must lie below L, by the above discussion). Therefore, $\lim_{x \to \infty} \frac{t_{1} (x)}{x} \neq \lim_{x \to \infty} \frac{t_{2} (x)}{x}$, which contradicts assumption (C).

To summarize, there is no $x_{1} > a$ such that $t_{1} (x_{1}) = t_{2} (x_{1})$, $t_{1} (x) > t_{2} (x)$ for $x < x_{1}$, and $t_{1} (x) < t_{2} (x)$ for $x \in (x_{1}, x_{1} + δ_{0})$ for some $δ_{0} > 0$. We conclude that $t_{1} (x) \geq t_{2} (x)$ for $x \in (a, \infty)$. The proof of Lemma 2.4 is completed.

Appendix 2.2. Proof of Propositions 2.1–2.2


Proof of Proposition 2.1

Proof of Property 1. If $β \to 0$, we have $g_{1} (α, β) \to \frac{1}{\frac{1}{α} + \frac{\sum_{s = 2}^{m} p_{s}}{α + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α + 2} + \dots + \frac{p_{m}}{α + m - 1}},$ which tends to 0 as $α \to 0$. Therefore, as $β \to 0$, the intersection of $y = g_{1} (α, β)$ and $y = α$ converges to the origin.

Proof of Property 2. If $β \to \infty$, then $g_{1} (α, β) \to \infty$ for any $α > 0$. Since $(α (β), β)$ is the intersection, it follows that $α (β) \to \infty$ as $β \to \infty$. We have $\begin{aligned} α (β) & = \frac{1}{\frac{1}{α (β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α (β) + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α (β) + β + 2} + \dots + \frac{p_{m}}{α (β) + β + m - 1}} \\ & = \frac{\{α (β) + β\}^{m} + a_{1} \{α (β) + β\}^{m - 1} + a_{2} \{α (β) + β\}^{m - 2} + \dots}{b_{1} \{α (β) + β\}^{m - 1} + b_{2} \{α (β) + β\}^{m - 2} + b_{3} \{α (β) + β\}^{m - 3} + \dots}, \end{aligned}$ where $\begin{aligned} a_{1} & = \frac{m (m - 1)}{2}, \quad a_{2} = \frac{1}{24} m (m - 1) (m - 2) (3 m - 1), \quad b_{1} = 1 + η_{1}, \\ b_{2} & = \frac{m (m - 1)}{2} (1 + η_{1}) + η_{1} - η_{2}, \\ b_{3} & = \frac{m (m - 1) (m - 2) (3 m - 1)}{24} + \frac{m (m - 1) (3 m^{2} - 7 m + 14) + 24}{24} η_{1} - \Big\{\frac{m (m - 1)}{2} + 2\Big\} η_{2} + η_{3} . \end{aligned}$ Based on Lemma A.2, tedious calculation yields $α (β) = \frac{1}{1 + η_{1}} \{α (β) + β\} + c_{α}^{'} + \frac{d_{α}}{α (β) + β} + \dots,$ where $c_{α}^{'} = \frac{η_{2} - η_{1}}{(1 + η_{1})^{2}}, d_{α} = \frac{η_{2}^{2} - η_{1} η_{3} - η_{1} + 2 η_{2} - η_{3}}{(1 + η_{1})^{3}} .$ Simple algebra yields $α (β) = \frac{1}{η_{1}} β + \frac{c_{α}^{'} (1 + η_{1})}{η_{1}} + \frac{1 + η_{1}}{η_{1}} \frac{d_{α}}{α (β) + β} + \dots = \frac{1}{η_{1}} β + c_{α} + \frac{d_{α}}{β} + O (1 / β^{2}),$ where $c_{α} = \frac{c_{α}^{'} (1 + η_{1})}{η_{1}} = \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})}$.

Proof of Property 3. Since $\frac{1}{α (β)} = \frac{1}{α (β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α (β) + β + 1} + \dots + \frac{p_{m}}{α (β) + β + m - 1},$ (A1) taking the derivative with respect to β on both sides of (A1), we have $α^{'} (β) = α^{2} (β) \Big[\frac{α^{'} (β) + 1}{\{α (β) + β\}^{2}} + \frac{\{α^{'} (β) + 1\} \sum_{s = 2}^{m} p_{s}}{\{α (β) + β + 1\}^{2}} + \dots + \frac{\{α^{'} (β) + 1\} p_{m}}{\{α (β) + β + m - 1\}^{2}}\Big] .$ Simple algebra leads to $α^{'} (β) = \frac{u (β)}{1 - u (β)},$ where $u (β) = \frac{\frac{1}{\{α (β) + β\}^{2}} + \frac{\sum_{s = 2}^{m} p_{s}}{\{α (β) + β + 1\}^{2}} + \dots + \frac{p_{m}}{\{α (β) + β + m - 1\}^{2}}}{\Big\{\frac{1}{α (β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α (β) + β + 1} + \dots + \frac{p_{m}}{α (β) + β + m - 1}\Big\}^{2}} > 0.$ (A2) Furthermore, since $\begin{aligned} & \Big\{\frac{1}{α (β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α (β) + β + 1} + \dots + \frac{p_{m}}{α (β) + β + m - 1}\Big\}^{2} \\ & = \sum_{i = 1}^{m} \Big[\frac{\sum_{s = i}^{m} p_{s}}{α (β) + β + i - 1} \Big\{\sum_{j = 1}^{m} \frac{\sum_{s = j}^{m} p_{s}}{α (β) + β + j - 1}\Big\}\Big] \\ & > \sum_{i = 1}^{m} \Big\{\frac{\sum_{s = i}^{m} p_{s}}{α (β) + β + i - 1} \cdot \frac{1}{α (β) + β}\Big\} \geq \sum_{i = 1}^{m} \frac{\sum_{s = i}^{m} p_{s}}{\{α (β) + β + i - 1\}^{2}}, \end{aligned}$ we have $u (β) < 1$. Therefore, $α^{'} (β) > 0$.

Taking the derivative with respect to β twice on both sides of (A1) and collecting the terms involving $α^{''} (β)$, we have $\begin{aligned} & \{1 - u (β)\} α^{''} (β) \\ & = 2 α^{3} (β) \Big(\Big[\frac{α^{'} (β) + 1}{\{α (β) + β\}^{2}} + \frac{\{α^{'} (β) + 1\} \sum_{s = 2}^{m} p_{s}}{\{α (β) + β + 1\}^{2}} + \dots + \frac{\{α^{'} (β) + 1\} p_{m}}{\{α (β) + β + m - 1\}^{2}}\Big]^{2} \\ & \quad - \Big\{\frac{1}{α (β) + β} + \frac{\sum_{s = 2}^{m} p_{s}}{α (β) + β + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α (β) + β + 2} + \dots + \frac{p_{m}}{α (β) + β + m - 1}\Big\} \\ & \quad \times \Big[\frac{\{α^{'} (β) + 1\}^{2}}{\{α (β) + β\}^{3}} + \frac{\{α^{'} (β) + 1\}^{2} \sum_{s = 2}^{m} p_{s}}{\{α (β) + β + 1\}^{3}} + \dots + \frac{\{α^{'} (β) + 1\}^{2} p_{m}}{\{α (β) + β + m - 1\}^{3}}\Big]\Big) \\ & = - 2 α^{3} (β) \{α^{'} (β) + 1\}^{2} \sum_{0 \leq i < j \leq m - 1} \frac{\sum_{s = i + 1}^{m} p_{s} \sum_{s = j + 1}^{m} p_{s}}{\{α (β) + β + i\} \{α (β) + β + j\}} \Big\{\frac{1}{α (β) + β + i} - \frac{1}{α (β) + β + j}\Big\}^{2} \\ & < 0. \end{aligned}$ Since $0 < u (β) < 1$, it follows that $α^{''} (β) < 0$.

Proof of Property 4. If $β \to 0$, then $α (β) \to 0$, and thus (A2) indicates that $u (β) \to 1$; therefore $α^{'} (β) \to \infty$. If $β \to \infty$, then $α (β) \to \infty$, and thus (A2) indicates that $u (β) \to \frac{1}{1 + \sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s}}$; therefore $α^{'} (β) \to \frac{1}{\sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s}}$.

Proof of Property 5. According to (3), the conditional maximum likelihood equation of α can be rewritten as $\frac{α}{α + β} + \frac{α \sum_{s = 2}^{m} p_{s}}{α + β + 1} + \frac{α \sum_{s = 3}^{m} p_{s}}{α + β + 2} + \dots + \frac{α p_{m}}{α + β + m - 1} = 1.$ Let $f (α) = \frac{α}{α + β} + \frac{α \sum_{s = 2}^{m} p_{s}}{α + β + 1} + \frac{α \sum_{s = 3}^{m} p_{s}}{α + β + 2} + \dots + \frac{α p_{m}}{α + β + m - 1};$ then $f (α)$ is an increasing function of α, because each term is increasing in α. Since $f (α (β)) = 1$, a value x is less than, equal to or greater than $α (β)$ if and only if $f (x)$ is less than, equal to or greater than 1. Therefore, $α (β) = Z (β)$ is equivalent to $f (Z (β)) = 1$. Since $Z (β)$ is a polynomial or rational function of β, clearing the denominators in $f (Z (β)) = \frac{Z (β)}{Z (β) + β} + \frac{Z (β) \sum_{s = 2}^{m} p_{s}}{Z (β) + β + 1} + \frac{Z (β) \sum_{s = 3}^{m} p_{s}}{Z (β) + β + 2} + \dots + \frac{Z (β) p_{m}}{Z (β) + β + m - 1} = 1$ yields a polynomial equation in β, which has finitely many solutions.
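Property 5 suggests a simple way to compute the conditional MLE α(β): because f is increasing, f(α) = 1 can be solved by bisection. The Python sketch below does this and then checks Properties 3 and 4 numerically, comparing a finite difference of α(β) with $u (β) / \{1 - u (β)\}$ from (A2) and looking at both limits of $α^{'} (β)$, with $η_{1} = \sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s}$ as in the proof of Property 4*. The frequency table and function names are hypothetical; this is an illustrative sketch, not the authors' implementation.

```python
def tail_probs(counts):
    """c[j-1] = P(X >= j) for j = 1..m, from a {value: frequency} table."""
    n = sum(counts.values())
    c, remaining = [], n
    for j in range(1, max(counts) + 1):
        c.append(remaining / n)
        remaining -= counts.get(j, 0)
    return c

def alpha_of_beta(beta, c, iters=100):
    """Solve f(alpha) = 1 by bisection; f is increasing (Property 5)."""
    if sum(c) <= 1.0:  # f tends to the sample mean, so it must exceed 1
        raise ValueError("sample mean must exceed 1")
    f = lambda a: sum(cj * a / (a + beta + j) for j, cj in enumerate(c))
    lo, hi = 0.0, 1.0
    while f(hi) < 1.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)

def u_of_beta(beta, c):
    """u(beta) from (A2), evaluated at x = alpha(beta) + beta."""
    x = alpha_of_beta(beta, c) + beta
    s1 = sum(cj / (x + j) for j, cj in enumerate(c))
    s2 = sum(cj / (x + j) ** 2 for j, cj in enumerate(c))
    return s2 / s1 ** 2

def alpha_prime(beta, c):
    """alpha'(beta) = u / (1 - u) (Property 3)."""
    u = u_of_beta(beta, c)
    return u / (1.0 - u)

counts = {1: 60, 2: 15, 3: 8, 4: 5, 5: 4, 6: 3, 8: 2, 10: 1, 20: 2}  # hypothetical
c = tail_probs(counts)
eta1 = sum(c[1:])
for beta in (0.3, 1.0, 5.0):
    fd = (alpha_of_beta(beta * (1 + 1e-6), c)
          - alpha_of_beta(beta * (1 - 1e-6), c)) / (2e-6 * beta)
    print(beta, alpha_of_beta(beta, c), fd, alpha_prime(beta, c))
print(alpha_prime(1e-6, c))             # large: alpha'(beta) grows as beta -> 0
print(alpha_prime(1e4, c), 1.0 / eta1)  # nearly equal: alpha'(beta) -> 1/eta_1
```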


Proof of Proposition 2.2

The proofs of Properties 1* and 5* are similar to the proofs of Properties 1 and 5 in Proposition 2.1, respectively, and Property 3* follows from Lemma A.1. In the following, we present the proofs of Properties 2* and 4*.

Proof of Property 2*. Writing $h (β)$ as a ratio of polynomials, we have $h (β) = \frac{β^{m - 1} + a_{1} β^{m - 2} + a_{2} β^{m - 3} + \dots}{b_{1} β^{m - 2} + b_{2} β^{m - 3} + b_{3} β^{m - 4} + \dots},$ where $\begin{aligned} a_{1} & = \frac{(m - 1) (m - 2)}{2}, \quad a_{2} = \frac{(m - 1) (m - 2) (m - 3) (3 m - 4)}{24}, \quad b_{1} = η_{1}, \\ b_{2} & = \frac{(m - 2) (m - 1) + 4}{2} η_{1} - η_{2}, \\ b_{3} & = \frac{(m - 2) (m - 1) (3 m^{2} - 13 m + 36) + 96}{24} η_{1} - \frac{m^{2} - 3 m + 10}{2} η_{2} + η_{3} . \end{aligned}$ Therefore, by Lemma A.2, we have $h (β) = \frac{1}{η_{1}} β + c_{h} + d_{h} \frac{1}{β} + O (1 / β^{2}),$ where $\begin{aligned} c_{h} & = \frac{a_{1} b_{1} - b_{2}}{b_{1}^{2}} = \frac{η_{2} - 2 η_{1}}{η_{1}^{2}}, \\ d_{h} & = \frac{a_{2} b_{1}^{2} - b_{1} b_{3} - a_{1} b_{1} b_{2} + b_{2}^{2}}{b_{1}^{3}} = \frac{η_{2}^{2} - η_{1} η_{3}}{η_{1}^{3}} . \end{aligned}$ Proof of Property 4*. It is easy to derive that $\begin{aligned} h^{'} (β) & = \frac{\frac{\sum_{s = 2}^{m} p_{s}}{β^{2}} + \frac{\sum_{s = 3}^{m} p_{s}}{(β + 1)^{2}} + \dots + \frac{p_{m}}{(β + m - 2)^{2}}}{\Big(\frac{\sum_{s = 2}^{m} p_{s}}{β} + \frac{\sum_{s = 3}^{m} p_{s}}{β + 1} + \dots + \frac{p_{m}}{β + m - 2}\Big)^{2}} \\ & = \frac{(\sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s}) β^{2 (m - 2)} + \dots + (\sum_{s = 2}^{m} p_{s}) \{(m - 2)!\}^{2}}{(\sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s})^{2} β^{2 (m - 2)} + \dots + (\sum_{s = 2}^{m} p_{s})^{2} \{(m - 2)!\}^{2}}, \end{aligned}$ and hence $\begin{aligned} h^{'} (β) & \to \frac{(\sum_{s = 2}^{m} p_{s}) \{(m - 2)!\}^{2}}{(\sum_{s = 2}^{m} p_{s})^{2} \{(m - 2)!\}^{2}} = \frac{1}{\sum_{s = 2}^{m} p_{s}}, \quad \text{when } β \to 0, \\ h^{'} (β) & \to \frac{\sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s}}{(\sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s})^{2}} = \frac{1}{\sum_{t = 2}^{m} \sum_{s = t}^{m} p_{s}} = \frac{1}{η_{1}}, \quad \text{when } β \to \infty . \end{aligned}$
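The limits in Property 4* can be checked numerically. The Python sketch below evaluates $h (β) = 1 / \sum_{j = 2}^{m} \frac{\sum_{s = j}^{m} p_{s}}{β + j - 2}$, which is the reciprocal of the second bracket in (4), as used in (A6)–(A7), together with the expression for $h^{'} (β)$ given above. The frequency table and function names are hypothetical.

```python
def tail_probs(counts):
    """c[j-1] = P(X >= j) for j = 1..m, from a {value: frequency} table."""
    n = sum(counts.values())
    c, remaining = [], n
    for j in range(1, max(counts) + 1):
        c.append(remaining / n)
        remaining -= counts.get(j, 0)
    return c

def h(beta, c):
    """h(beta) = 1 / sum_{j>=2} c_j / (beta + j - 2), with c_j = P(X >= j)."""
    return 1.0 / sum(cj / (beta + i) for i, cj in enumerate(c[1:]))

def h_prime(beta, c):
    """sum_j c_j/(beta+j-2)^2 divided by {sum_j c_j/(beta+j-2)}^2, over j >= 2."""
    s1 = sum(cj / (beta + i) for i, cj in enumerate(c[1:]))
    s2 = sum(cj / (beta + i) ** 2 for i, cj in enumerate(c[1:]))
    return s2 / s1 ** 2

counts = {1: 60, 2: 15, 3: 8, 4: 5, 5: 4, 6: 3, 8: 2, 10: 1, 20: 2}  # hypothetical
c = tail_probs(counts)
eta1 = sum(c[1:])                    # eta_1 = sum_{t>=2} sum_{s>=t} p_s
print(h_prime(1e-8, c), 1.0 / c[1])  # beta -> 0: limit 1 / sum_{s>=2} p_s
print(h_prime(1e6, c), 1.0 / eta1)   # beta -> infinity: limit 1 / eta_1
fd = (h(0.7 + 1e-6, c) - h(0.7 - 1e-6, c)) / 2e-6
print(fd, h_prime(0.7, c))           # finite difference agrees with the formula
```

The decrease of $h^{'} (β)$ between the two limits is consistent with Property 3*, which follows from Lemma A.1.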

Appendix 2.3. Proof of Theorem 2.5

By Properties 1, 4 of $α (β)$ and 1*, 4* of $h (β)$ , when $β \to 0$ , $h (β) \to 0$ and $α (β) \to 0$ ; however, $h^{'} (β) \to \frac{1}{\sum_{s = 2}^{m} p_{s}}$ while $α^{'} (β) \to \infty$ . Thus, there exists $δ_{1} > 0$ , such that $α (β) > h (β)$ for $β \in (0, δ_{1})$ .

By Property 2 of $α (β)$ and Property 2* of $h (β)$, as $β \to \infty$, $\begin{aligned} h (β) & = \frac{1}{η_{1}} \cdot β + \frac{η_{2} - 2 η_{1}}{η_{1}^{2}} + \frac{η_{2}^{2} - η_{1} η_{3}}{η_{1}^{3}} \cdot \frac{1}{β} + O (\frac{1}{β^{2}}), \\ α (β) & = \frac{1}{η_{1}} \cdot β + \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})} + \frac{η_{2}^{2} - η_{1} η_{3} - η_{1} + 2 η_{2} - η_{3}}{(1 + η_{1})^{3}} \cdot \frac{1}{β} + O (\frac{1}{β^{2}}) . \end{aligned}$ We first discuss the case $\frac{η_{2} - 2 η_{1}}{η_{1}^{2}} \neq \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})}$. In this case, there exists $δ_{2} > 0$ such that for $β \in (δ_{2}, \infty)$, $\begin{cases} α (β) < h (β), & \text{if } \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})} < \frac{η_{2} - 2 η_{1}}{η_{1}^{2}}, \\ α (β) > h (β), & \text{if } \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})} > \frac{η_{2} - 2 η_{1}}{η_{1}^{2}} . \end{cases}$ (A3) In the case $\frac{η_{2} - 2 η_{1}}{η_{1}^{2}} = \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})}$, that is, $η_{2} = η_{1}^{2} + 2 η_{1}$, we need to compare $d_{α} = \frac{η_{2}^{2} - η_{1} η_{3} - η_{1} + 2 η_{2} - η_{3}}{(1 + η_{1})^{3}}$ and $d_{h} = \frac{η_{2}^{2} - η_{1} η_{3}}{η_{1}^{3}}$: for all sufficiently large β, $α (β) > h (β)$ if $d_{α} > d_{h}$, and $α (β) < h (β)$ if $d_{α} < d_{h}$.

Therefore, if $\frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})} < \frac{η_{2} - 2 η_{1}}{η_{1}^{2}}, \text{ or } \frac{η_{2} - η_{1}}{η_{1} (1 + η_{1})} = \frac{η_{2} - 2 η_{1}}{η_{1}^{2}} \text{ and } d_{α} < d_{h},$ (A4) then the curves $y = h (β)$ and $y = α (β)$ must intersect, because $α (β) > h (β)$ for $β \in (0, δ_{1})$ while $α (β) < h (β)$ for all sufficiently large β. Part (II) of Theorem 2.5 follows directly from Lemma 2.4, so we only need to prove part (I). In the following, we assume that (A4) holds, so that $y = h (β)$ and $y = α (β)$ intersect at least once at some positive β.

Suppose that $y = h (β)$ and $y = α (β)$ first intersect at $(β_{1}, α_{1})$, where $α_{1} = h (β_{1}) = α (β_{1})$; then $α (β) > h (β)$ for $β \in (0, β_{1})$. By Property 5 of $α (β)$ in Proposition 2.1, the curves $y = h (β)$ and $y = α (β)$ intersect only finitely many times. Therefore, there exists $δ_{3} > β_{1}$ such that $y = α (β)$ and $y = h (β)$ do not intersect for $β \in (β_{1}, δ_{3})$. If the curve $y = α (β)$ lies above $y = h (β)$ for $β \in (β_{1}, δ_{3})$, then, due to (A4), it must eventually fall below $y = h (β)$, so the two curves intersect again. Because the number of intersections is finite, it cannot always be the case that $y = α (β)$ lies above $y = h (β)$ after an intersection; i.e., there exists an intersection after which $y = α (β)$ lies below $y = h (β)$. Without loss of generality, we assume that $y = α (β)$ lies below $y = h (β)$ after the first intersection $(β_{1}, α_{1})$. (A5) Next, we prove that $(α (β_{1}), β_{1})$ is the maximizer of the log-likelihood function $ℓ_{n} (α, β)$. Since $α (β)$ is the conditional maximum likelihood estimator of α, i.e., $\max_{α, β > 0} ℓ_{n} (α, β) = \max_{β > 0} ℓ_{n} (α (β), β),$ we only need to prove that $β = β_{1}$ is a maximizer of $ℓ_{n} (α (β), β)$.

Since $(α (β_{1}), β_{1})$ is a solution to the equation system (3)–(4), we have $\begin{aligned} & \frac{1}{n} \cdot \left.\frac{\partial ℓ_{n} (α, β)}{\partial α}\right|_{α = α (β_{1}), β = β_{1}} \\ & = \frac{1}{α (β_{1})} - \Big(\frac{1}{α (β_{1}) + β_{1}} + \frac{\sum_{s = 2}^{m} p_{s}}{α (β_{1}) + β_{1} + 1} + \frac{\sum_{s = 3}^{m} p_{s}}{α (β_{1}) + β_{1} + 2} + \dots + \frac{p_{m}}{α (β_{1}) + β_{1} + m - 1}\Big) = 0, \\ & \frac{1}{n} \cdot \left.\frac{\partial ℓ_{n} (α, β)}{\partial β}\right|_{α = α (β_{1}), β = β_{1}} \\ & = - \frac{1}{α (β_{1})} + \Big(\frac{\sum_{s = 2}^{m} p_{s}}{β_{1}} + \frac{\sum_{s = 3}^{m} p_{s}}{β_{1} + 1} + \dots + \frac{p_{m}}{β_{1} + m - 2}\Big) = 0, \end{aligned}$ where the second equation uses the first to replace the bracket in (4) by $1 / α (β_{1})$. To prove that $β = β_{1}$ maximizes $ℓ_{n} (α (β), β)$, by Lemma 2.3, we only need to prove that $\left.\frac{\partial ℓ_{n} (α, β)}{\partial β}\right|_{α = α (β)}$ is greater than zero for $β \in (0, β_{1})$ and smaller than zero for $β \in (β_{1}, \infty)$.

We first consider $β \in (0, β_{1})$, for which $α (β) > h (β)$. Since the bracket below equals $1 / h (β)$, we have $\frac{1}{n} \cdot \left.\frac{\partial ℓ_{n} (α, β)}{\partial β}\right|_{α = α (β)} = - \frac{1}{α (β)} + \Big(\frac{\sum_{s = 2}^{m} p_{s}}{β} + \frac{\sum_{s = 3}^{m} p_{s}}{β + 1} + \dots + \frac{p_{m}}{β + m - 2}\Big) > 0.$ (A6) We next consider $β \in (β_{1}, \infty)$. By (A5), $α (β) < h (β)$ for $β \in (β_{1}, δ_{3})$. Then, by Lemma 2.4, $y = α (β)$ cannot lie above $y = h (β)$ at any $β > β_{1}$, i.e., $α (β) \leq h (β)$ for all $β > β_{1}$. Therefore, $\frac{1}{n} \cdot \left.\frac{\partial ℓ_{n} (α, β)}{\partial β}\right|_{α = α (β)} = - \frac{1}{α (β)} + \Big(\frac{\sum_{s = 2}^{m} p_{s}}{β} + \frac{\sum_{s = 3}^{m} p_{s}}{β + 1} + \dots + \frac{p_{m}}{β + m - 2}\Big) \leq 0,$ (A7) with equality at most at the finitely many intersections (Property 5), so $ℓ_{n} (α (β), β)$ is strictly decreasing on $(β_{1}, \infty)$. The proof is completed. We see that the overall proof depends on the fact that $\begin{aligned} & \text{if } α (β) > h (β), \text{ then } \frac{\partial ℓ_{n} (α (β), β)}{\partial β} > 0, \text{ and thus } ℓ_{n} (α (β), β) \text{ increases with } β; \\ & \text{if } α (β) < h (β), \text{ then } \frac{\partial ℓ_{n} (α (β), β)}{\partial β} < 0, \text{ and thus } ℓ_{n} (α (β), β) \text{ decreases with } β . \end{aligned}$
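The proof suggests computing the MLE without a general-purpose optimizer: the profile log-likelihood increases while α(β) > h(β) and decreases once α(β) < h(β), so $\hat{β}$ is where α(β) − h(β) changes sign from positive to negative. The Python sketch below brackets that sign change on a doubling grid and refines it by bisection on log β; if no sign change is found up to a cap, it returns None, consistent with the MLE not existing. The doubling grid, the cap of $10^{8}$ and the function names are assumptions of this sketch, and a sign change between grid points could be missed if the curves crossed several times in quick succession.

```python
import math

def tail_probs(counts):
    """c[j-1] = P(X >= j) for j = 1..m, from a {value: frequency} table."""
    n = sum(counts.values())
    c, remaining = [], n
    for j in range(1, max(counts) + 1):
        c.append(remaining / n)
        remaining -= counts.get(j, 0)
    return c

def alpha_of_beta(beta, c, iters=100):
    """Conditional MLE alpha(beta) by bisection on sum_j c_j*a/(a+beta+j-1) = 1."""
    if sum(c) <= 1.0:
        raise ValueError("sample mean must exceed 1")
    f = lambda a: sum(cj * a / (a + beta + j) for j, cj in enumerate(c))
    lo, hi = 0.0, 1.0
    while f(hi) < 1.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)

def h(beta, c):
    """h(beta) = 1 / sum_{j>=2} c_j / (beta + j - 2)."""
    return 1.0 / sum(cj / (beta + i) for i, cj in enumerate(c[1:]))

def waring_mle_by_crossing(counts, iters=100, cap=1e8):
    """Return (alpha_hat, beta_hat), or None if alpha(beta) - h(beta) never turns negative."""
    c = tail_probs(counts)
    d = lambda b: alpha_of_beta(b, c) - h(b, c)
    lo = 1.0
    while d(lo) <= 0.0:   # move left until alpha(beta) > h(beta), as near beta = 0
        lo /= 2.0
        if lo < 1.0 / cap:
            return None
    hi = lo
    while d(hi) > 0.0:    # move right until the first sign change on the grid
        hi *= 2.0
        if hi > cap:
            return None   # profile still increasing: no interior maximizer found
    for _ in range(iters):  # bisection on log(beta), keeping d(lo) > 0 >= d(hi)
        mid = math.sqrt(lo * hi)
        lo, hi = (mid, hi) if d(mid) > 0.0 else (lo, mid)
    beta_hat = math.sqrt(lo * hi)
    return alpha_of_beta(beta_hat, c), beta_hat

print(waring_mle_by_crossing({1: 33, 2: 7, 3: 4, 4: 1}))  # None: condition (9) fails
print(waring_mle_by_crossing({1: 60, 2: 15, 3: 8, 4: 5, 5: 4, 6: 3, 8: 2, 10: 1, 20: 2}))
```

At the returned point both score equations (3) and (4) should vanish up to rounding error, which is a convenient check on the result.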

On the MLE of the Waring distribution

Abstract

1. Introduction

2. Maximum likelihood estimator of the Waring parameters

3. Simulation studies

3.1. Comparison of MLE and EFF

Table 1. Relative biases and relative standard errors of estimated parameters, for $β = 0.5$ and 1.

Table 2. Relative biases and relative standard errors of estimated parameters, for $β = 1.5$ and 2.

3.2. Goodness-of-fit comparison with Yule-Simon distribution

Table 3. Proportion of replicates that the Yule-Simon distribution is rejected at nominal level 0.05.

4. Real data application

Table 4. Comparison of actual distribution (A) with discrete Pareto law fitting (P), Waring fitting with EFF (E) and MLE (M).

5. Discussion

Acknowledgements

Disclosure statement

References

Appendices

Appendix 1. Some useful lemmas

Appendix 2. Technical Proofs

Appendix 2.1. Proof of Lemmas 2.3–2.4

Proof of Lemma 2.3

Proof of Lemma 2.4

Appendix 2.2. Proof of Propositions 2.1–2.2

Proof of Proposition 2.1

Proof of Proposition 2.2

Appendix 2.3. Proof of Theorem 2.5
