Chen-G class of distributions


Abstract

The quest to generate distributions with more desirable and flexible properties for modeling data has led researchers to focus intensely on developing new families that generalize existing distributions. In this study, a new family of distributions called the Chen generated (CG) family is developed. Its statistical properties, such as the quantile function, moments, incomplete moments, stochastic ordering and order statistics, are derived, and estimators for the parameters of the new family are developed using the method of maximum likelihood. Three special distributions, Chen Burr III, Chen Kumaraswamy and Chen Weibull, are proposed from the new family, although it can generalize other distributions as well. The usefulness of the new family is demonstrated using real datasets.


PUBLIC INTEREST STATEMENT

Modeling of natural phenomena such as earthquakes, rainfall and tsunamis mostly involves the use of statistical distributions. Since the accuracy of the results largely depends on how well the distribution fits the dataset, this study develops a new family of distributions aimed at improving the flexibility of existing distributions.

Conflict of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

1. Introduction

The accuracy of parametric statistical inference and modeling of datasets largely depends on how well the probability distribution fits the given dataset once all distributional assumptions have been met. Several studies have been carried out on statistical distributions in the quest to generate distributions with more desirable and flexible properties that can model real-life datasets with varying shapes of density and failure rate functions. Currently, most studies focus on developing new families that generalize existing distributions to provide a better fit to data. These families of distributions are constructed either by compounding two or more distributions or by adding one or more parameters to a baseline model. Many authors have extensively reviewed the various families of distributions (Hamedani, Yousof, Rasekhi, Alizadeh, & Najibi, 2018; Lee, Famoye, & Alzaatreh, 2013; Nasiru, 2018; Nasiru, Mwita, & Ngesa, 2018; Zubair, 2018).

In this study, a new class of distributions is developed using the T-X approach (Alzaatreh, Lee, & Famoye, 2013). The Chen generated (CG) family of distributions is obtained by compounding the two-parameter Chen distribution (Chen, 2000) with an arbitrary baseline cumulative distribution function (cdf) of a continuous random variable. The main motivation for developing this family is to improve the flexibility of existing classical distributions, thus enabling them to provide a better fit to real datasets than other candidate distributions with the same number of parameters, and to model different kinds of failure rate (monotonic and non-monotonic).

The remainder of the paper is organized as follows. The Chen generated (CG) family of distributions is defined in Section 2. The mixture representation of the probability density function (pdf) is presented in Section 3. Some statistical properties of the family are derived in Section 4. Estimators for the parameters of the family are developed in Section 5. Some special distributions from the CG family are proposed and discussed in Section 6. Simulations to examine the properties of the estimators of the parameters of the special distributions are carried out in Section 7. Real-life datasets are used to demonstrate the application of the special distributions in Section 8. Concluding remarks are given in Section 9.

2. Chen generated family of distributions

Let $T$ be a Chen distributed continuous random variable. Its cdf is $F_{T}(t) = 1 - e^{λ(1 - e^{t^{β}})}, t > 0$ (Chen, 2000), with pdf $f_{T}(t) = λβ t^{β - 1} e^{t^{β}} e^{λ(1 - e^{t^{β}})}$. Also, let $G(x)$ and $g(x)$ be the respective cdf and pdf of an arbitrary continuous random variable $X$. Since $G(x) \in [0, 1]$, the Chen distribution is truncated to the interval $(0, 1)$ and renormalized, so that the cdf of the CG family is defined as

F(x) = A \int_{0}^{G(x)} f_{T}(t)\, dt = A\left[1 - e^{λ(1 - e^{G(x)^{β}})}\right], \quad x > 0, λ > 0, β > 0, \tag{1}

where $A = 1/\left[1 - e^{λ(1 - e)}\right] = 1/F_{T}(1)$ is a normalizing constant, and $λ$ and $β$ are scale and shape parameters, respectively. The pdf $f(x)$ of the family is given by

f(x) = A λ β\, g(x) G(x)^{β - 1} e^{G(x)^{β}} e^{λ(1 - e^{G(x)^{β}})}, \quad x > 0, λ > 0, β > 0. \tag{2}

The survival function $S(x)$ of the CG family is

S(x) = 1 - A\left[1 - e^{λ(1 - e^{G(x)^{β}})}\right], \quad x > 0, λ > 0, β > 0. \tag{3}

Since $h(x) = f(x)/S(x)$, the failure rate or hazard function of the family is

h(x) = \frac{A λ β\, g(x) G(x)^{β - 1} e^{G(x)^{β}} e^{λ(1 - e^{G(x)^{β}})}}{1 - A\left[1 - e^{λ(1 - e^{G(x)^{β}})}\right]}, \quad x > 0, λ > 0, β > 0. \tag{4}
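The definitions in Equations (1) to (4) translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, the baseline is passed in as a pair of callables `G` and `g`, and the Weibull baseline with $α = 1$, $γ = 2$ is an arbitrary choice for demonstration rather than a setting used in this study.

```python
import math
from typing import Callable

Baseline = Callable[[float], float]


def cg_norm_const(lam: float) -> float:
    """Normalizing constant A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_cdf(x: float, lam: float, beta: float, G: Baseline) -> float:
    """Eq. (1): F(x) = A * [1 - exp(lam * (1 - exp(G(x)**beta)))]."""
    A = cg_norm_const(lam)
    return A * (1.0 - math.exp(lam * (1.0 - math.exp(G(x) ** beta))))


def cg_pdf(x: float, lam: float, beta: float, G: Baseline, g: Baseline) -> float:
    """Eq. (2); assumes G(x) > 0 when beta < 1, so that G(x)**(beta - 1) is finite."""
    A = cg_norm_const(lam)
    Gx = G(x)
    Gb = Gx ** beta
    return (A * lam * beta * g(x) * Gx ** (beta - 1.0)
            * math.exp(Gb) * math.exp(lam * (1.0 - math.exp(Gb))))


def cg_sf(x: float, lam: float, beta: float, G: Baseline) -> float:
    """Eq. (3): S(x) = 1 - F(x)."""
    return 1.0 - cg_cdf(x, lam, beta, G)


def cg_hazard(x: float, lam: float, beta: float, G: Baseline, g: Baseline) -> float:
    """Eq. (4): h(x) = f(x) / S(x)."""
    return cg_pdf(x, lam, beta, G, g) / cg_sf(x, lam, beta, G)


# Hypothetical baseline for illustration: Weibull with alpha = 1, gamma = 2.
def weib_G(x: float, alpha: float = 1.0, gamma: float = 2.0) -> float:
    return 1.0 - math.exp(-((x / alpha) ** gamma))


def weib_g(x: float, alpha: float = 1.0, gamma: float = 2.0) -> float:
    z = (x / alpha) ** gamma
    return (gamma / alpha) * (x / alpha) ** (gamma - 1.0) * math.exp(-z)


if __name__ == "__main__":
    lam, beta = 1.5, 0.8
    for x in (0.25, 0.5, 1.0, 1.5):
        print(x, cg_cdf(x, lam, beta, weib_G), cg_pdf(x, lam, beta, weib_G, weib_g),
              cg_hazard(x, lam, beta, weib_G, weib_g))
```

Checking that a numerical derivative of `cg_cdf` reproduces `cg_pdf`, and that `cg_hazard` times `cg_sf` returns `cg_pdf`, is a quick way to catch transcription errors in Equations (1) to (4).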

3. Mixture representation of distribution

Mixture representation plays a useful role in the derivation of the statistical properties of the new family of distributions. Hence, the mixture representation of the pdf of the CG family of distributions is derived in this section.

By applying Taylor series expansions to the exponential terms $e^{-λ e^{G(x)^{β}}}$ and $e^{(i + 1) G(x)^{β}}$, the pdf of the CG family in Equation (2) is expressed as

f(x) = A λ β e^{λ} g(x) \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \frac{(-1)^{i} λ^{i}}{i!} \frac{(i + 1)^{j}}{j!} G(x)^{β(j + 1) - 1}. \tag{5}

Writing $G(x) = 1 - (1 - G(x))$, Equation (5) can be rewritten as

f(x) = A λ β e^{λ} g(x) \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \frac{(-1)^{i} λ^{i}}{i!} \frac{(i + 1)^{j}}{j!} \left[1 - (1 - G(x))\right]^{β(j + 1) - 1}.

$f(x)$ is further expanded using the binomial series expansion $(1 - z)^{a - 1} = \sum_{k = 0}^{\infty} (-1)^{k} \binom{a - 1}{k} z^{k}$, $|z| < 1$, for any real non-integer $a > 0$, with $z = 1 - G(x)$ and $a = β(j + 1)$, as follows:

f(x) = A λ β e^{λ} g(x) \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \frac{(-1)^{i} (i + 1)^{j} λ^{i}}{i!\, j!} \sum_{k = 0}^{\infty} (-1)^{k} \binom{β(j + 1) - 1}{k} (1 - G(x))^{k}.

Since $k$ is a non-negative integer, $(1 - G(x))^{k}$ can be expanded with the finite binomial sum $(1 - G(x))^{k} = \sum_{l = 0}^{k} (-1)^{l} \binom{k}{l} G(x)^{l}$, which gives

f(x) = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl}\, g(x) G(x)^{l}, \tag{6}

where

ω_{ijkl} = \frac{(-1)^{i + k + l} (i + 1)^{j} λ^{i} e^{λ}}{i!\, j!} \binom{β(j + 1) - 1}{k} \binom{k}{l}.

From Equation (6), the CG density is expressed as the constant $Aλβ$ times an infinite weighted sum of the terms $g(x)G(x)^{l}$, that is, of the baseline pdf multiplied by powers of the baseline cdf $G(x)$.

Alternatively, Equation (6) can be written in terms of the exponentiated-G (expo-G) density function as

f(x) = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω^{*}_{ijkl}\, π_{l + 1}(x), \tag{7}

where $ω^{*}_{ijkl} = ω_{ijkl}/(l + 1)$ and $π_{l + 1}(x) = (l + 1) g(x) G(x)^{l}$ is the expo-G density function with power parameter $l + 1$.
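Because the mixture representation is built from series expansions, it can be checked numerically. The Python sketch below uses hypothetical function names and arbitrary values $λ = 1.5$, $β = 0.8$ and $G(x) = 0.6$. It compares $f(x)/g(x)$ from Equation (2) with the truncated double series obtained by expanding $e^{-λ e^{G(x)^{β}}}$ and $e^{(i + 1) G(x)^{β}}$, in which $g(x)$ multiplies $G(x)^{β(j + 1) - 1}$, and it checks the generalized binomial step $G^{a} = \sum_{k} (-1)^{k} \binom{a}{k} (1 - G)^{k}$. The final expansion of $(1 - G(x))^{k}$ into powers $G(x)^{l}$ is exact but involves large alternating binomial terms, so it is not evaluated in floating point here.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def ratio_closed_form(G, lam, beta):
    """f(x)/g(x) from Eq. (2), written as a function of G = G(x)."""
    A = cg_norm_const(lam)
    Gb = G ** beta
    return A * lam * beta * G ** (beta - 1.0) * math.exp(Gb) * math.exp(lam * (1.0 - math.exp(Gb)))


def ratio_series(G, lam, beta, I=60, J=60):
    """Truncated double series over i and j, divided by g(x)."""
    A = cg_norm_const(lam)
    total = 0.0
    for i in range(I):
        c_i = (-lam) ** i / math.factorial(i)
        for j in range(J):
            total += c_i * (i + 1) ** j / math.factorial(j) * G ** (beta * (j + 1) - 1.0)
    return A * lam * beta * math.exp(lam) * total


def gen_binom(a, k):
    """Generalized binomial coefficient C(a, k) for real a and integer k >= 0."""
    c = 1.0
    for m in range(k):
        c *= (a - m) / (m + 1)
    return c


def power_via_binomial_series(G, a, K=200):
    """G**a = sum_k (-1)^k C(a, k) (1 - G)^k; converges for 0 < G <= 1."""
    z = 1.0 - G
    return sum((-1) ** k * gen_binom(a, k) * z ** k for k in range(K))


if __name__ == "__main__":
    lam, beta, G = 1.5, 0.8, 0.6
    print(ratio_closed_form(G, lam, beta), ratio_series(G, lam, beta))
    a = beta * (2 + 1) - 1.0  # exponent beta*(j + 1) - 1 for j = 2
    print(G ** a, power_via_binomial_series(G, a))
```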

4. Statistical properties

This section discusses some of the statistical properties of the CG family of distributions. These include the quantile function, non-central and incomplete moments, the moment generating function, inequality measures, entropy, mean residual life, stochastic ordering and order statistics.

4.1. Quantile function

Proposition 1. The quantile function of the CG family of distributions is given by

Q(u) = x_{u} = G^{-1}\left(\left\{\ln\left[1 - \frac{\ln(1 - u/A)}{λ}\right]\right\}^{1/β}\right), \quad 0 < u < 1, \tag{8}

where $G^{-1}(\cdot) = Q_{G}(\cdot)$ denotes the quantile function of the baseline distribution.

Proof. The quantile function $Q(u)$ of a random variable $X$ is defined by $F(x_{u}) = P(X \leq x_{u}) = u$, $u \in (0, 1)$. Replacing $x$ with $x_{u}$ in Equation (1) and equating $F(x_{u})$ to $u$ gives $1 - u/A = e^{λ(1 - e^{G(x_{u})^{β}})}$, hence $G(x_{u})^{β} = \ln[1 - \ln(1 - u/A)/λ]$; solving for $x_{u}$ yields Equation (8). The median of the family is obtained by substituting $u = 0.5$ in Equation (8).
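Equation (8) makes inverse-transform sampling straightforward: if $U$ is uniform on $(0, 1)$, then $Q(U)$ follows the CG distribution. A minimal Python sketch, assuming a Weibull baseline with $α = 1$ and $γ = 2$ (an illustrative choice) and hypothetical helper names:

```python
import math
import random


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_quantile(u, lam, beta, G_inv):
    """Eq. (8): x_u = G^{-1}({ln[1 - ln(1 - u/A)/lam]}^{1/beta}) for 0 <= u < 1."""
    A = cg_norm_const(lam)
    v = math.log(1.0 - math.log1p(-u / A) / lam) ** (1.0 / beta)
    return G_inv(v)


def cg_cdf(x, lam, beta, G):
    """Eq. (1), used here only to verify the quantile function."""
    A = cg_norm_const(lam)
    return A * (1.0 - math.exp(lam * (1.0 - math.exp(G(x) ** beta))))


# Hypothetical Weibull baseline (alpha = 1, gamma = 2) and its quantile function.
def weib_G(x, alpha=1.0, gamma=2.0):
    return 1.0 - math.exp(-((x / alpha) ** gamma))


def weib_G_inv(v, alpha=1.0, gamma=2.0):
    return alpha * (-math.log1p(-v)) ** (1.0 / gamma)


def cg_sample(n, lam, beta, G_inv, rng):
    """Inverse-transform sampling: X = Q(U) with U ~ Uniform(0, 1)."""
    return [cg_quantile(rng.random(), lam, beta, G_inv) for _ in range(n)]


if __name__ == "__main__":
    lam, beta = 1.5, 0.8
    median = cg_quantile(0.5, lam, beta, weib_G_inv)
    draws = cg_sample(20000, lam, beta, weib_G_inv, random.Random(1))
    print("median:", median)
    print("share of draws below the median:", sum(d <= median for d in draws) / len(draws))
```

`math.log1p` is used for $\ln(1 - u/A)$ and $\ln(1 - v)$ to retain accuracy when the arguments are small.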

4.2. Moments, moment generating functions and incomplete moments

Moments are essential in statistical analysis because they can be used to study important features of a distribution, such as central tendency, dispersion, skewness and kurtosis.

4.2.1. Non-central moments

Proposition 2. The $r^{th}$ non-central moment of the CG family is given by

μ'_{r} = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl}\, τ_{(r, l)}, \quad r = 1, 2, \ldots, \tag{9}

where $τ_{(r, l)} = \int_{-\infty}^{\infty} x^{r} g(x) G(x)^{l}\, dx$ is the probability weighted moment of the baseline distribution $G(x)$.

Proof. The $r^{th}$ non-central moment is defined as $E(X^{r}) = μ'_{r} = \int_{-\infty}^{\infty} x^{r} f(x)\, dx$. Using the mixture form of the density in Equation (6), the $r^{th}$ non-central moment of the CG family is $μ'_{r} = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{-\infty}^{\infty} x^{r} g(x) G(x)^{l}\, dx$.

Alternatively, the $r^{th}$ non-central moment of the CG family can be expressed in terms of the baseline quantile function as follows. Let $G(x) = u$, so that $x = G^{-1}(u) = Q_{G}(u)$ and $g(x)\, dx = du$. Then, from Equation (9),

μ'_{r} = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{0}^{1} Q_{G}(u)^{r} u^{l}\, du. \tag{10}
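Equation (10) can be evaluated numerically without truncating the infinite sums, because the weighted sum $Aλβ \sum ω_{ijkl} u^{l}$ is simply $f(x)/g(x)$ written in terms of $u = G(x)$, namely $k(u) = Aλβ u^{β - 1} e^{u^{β}} e^{λ(1 - e^{u^{β}})}$. The Python sketch below integrates $Q_{G}(u)^{r} k(u)$ with a midpoint rule and cross-checks the result against $\int_{0}^{1} Q(u)^{r}\, du$ computed from the CG quantile function of Equation (8). The Weibull baseline, the values $λ = 1.2$, $β = 1.5$, the grid size and the function names are assumptions made for illustration.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_kernel(u, lam, beta):
    """Closed form of A*lam*beta*sum(omega_ijkl * u**l), i.e. f(x)/g(x) at G(x) = u."""
    A = cg_norm_const(lam)
    ub = u ** beta
    return A * lam * beta * u ** (beta - 1.0) * math.exp(ub) * math.exp(lam * (1.0 - math.exp(ub)))


def cg_quantile(u, lam, beta, G_inv):
    """CG quantile function, Eq. (8)."""
    A = cg_norm_const(lam)
    return G_inv(math.log(1.0 - math.log1p(-u / A) / lam) ** (1.0 / beta))


def weib_G_inv(v, alpha=1.0, gamma=2.0):
    """Hypothetical Weibull baseline quantile Q_G(v)."""
    return alpha * (-math.log1p(-v)) ** (1.0 / gamma)


def moment_eq10(r, lam, beta, G_inv, n=100_000):
    """mu'_r = int_0^1 Q_G(u)**r * k(u) du, by the midpoint rule."""
    h = 1.0 / n
    return h * sum(G_inv((k + 0.5) * h) ** r * cg_kernel((k + 0.5) * h, lam, beta)
                   for k in range(n))


def moment_via_cg_quantile(r, lam, beta, G_inv, n=100_000):
    """Independent check: mu'_r = int_0^1 Q(u)**r du, with Q the CG quantile."""
    h = 1.0 / n
    return h * sum(cg_quantile((k + 0.5) * h, lam, beta, G_inv) ** r for k in range(n))
```

For example, `moment_eq10(1, 1.2, 1.5, weib_G_inv)` and `moment_via_cg_quantile(1, 1.2, 1.5, weib_G_inv)` should agree to about four significant digits, and `moment_eq10(0, ...)` should be close to one because $k(u)$ integrates to one over $(0, 1)$.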

4.2.2. Moment generating functions

Proposition 3. The moment generating function of the CG family is given by

M_{X}(t) = A λ β \sum_{r = 0}^{\infty} \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} \frac{t^{r}}{r!} ω_{ijkl}\, τ_{(r, l)}. \tag{11}

Proof. By definition, the moment generating function is $M_{X}(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$. Expanding $e^{tx}$ in a Taylor series gives $M_{X}(t) = \sum_{r = 0}^{\infty} \frac{t^{r}}{r!} \int_{-\infty}^{\infty} x^{r} f(x)\, dx$.

But from Equation (9), $\int_{-\infty}^{\infty} x^{r} f(x)\, dx = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl}\, τ_{(r, l)}$, which completes the proof.

Alternatively, letting $G(x) = u$, the moment generating function can be expressed in terms of the baseline quantile function as

M_{X}(t) = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{0}^{1} e^{t Q_{G}(u)} u^{l}\, du. \tag{12}
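The same closed-form kernel $k(u) = Aλβ u^{β - 1} e^{u^{β}} e^{λ(1 - e^{u^{β}})}$, which replaces the weighted sum over $ω_{ijkl} u^{l}$, gives the moment generating function in Equation (12). In this Python sketch the baseline is an illustrative Weibull with $γ = 2$, for which $M_{X}(t)$ is finite for every real $t$, and the function names are hypothetical. $M_{X}(t)$ is computed by the midpoint rule and compared with $\int_{0}^{1} e^{t Q(u)}\, du$ based on the CG quantile function.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_kernel(u, lam, beta):
    """Closed form of the weighted sum in Eq. (12) as a function of u = G(x)."""
    A = cg_norm_const(lam)
    ub = u ** beta
    return A * lam * beta * u ** (beta - 1.0) * math.exp(ub) * math.exp(lam * (1.0 - math.exp(ub)))


def cg_quantile(u, lam, beta, G_inv):
    """CG quantile function, Eq. (8)."""
    A = cg_norm_const(lam)
    return G_inv(math.log(1.0 - math.log1p(-u / A) / lam) ** (1.0 / beta))


def weib_G_inv(v, alpha=1.0, gamma=2.0):
    """Hypothetical Weibull baseline quantile Q_G(v)."""
    return alpha * (-math.log1p(-v)) ** (1.0 / gamma)


def mgf_eq12(t, lam, beta, G_inv, n=100_000):
    """M_X(t) = int_0^1 exp(t * Q_G(u)) * k(u) du, by the midpoint rule."""
    h = 1.0 / n
    return h * sum(math.exp(t * G_inv((k + 0.5) * h)) * cg_kernel((k + 0.5) * h, lam, beta)
                   for k in range(n))


def mgf_via_cg_quantile(t, lam, beta, G_inv, n=100_000):
    """Independent check: M_X(t) = int_0^1 exp(t * Q(u)) du."""
    h = 1.0 / n
    return h * sum(math.exp(t * cg_quantile((k + 0.5) * h, lam, beta, G_inv)) for k in range(n))
```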

4.2.3. Incomplete moments

Proposition 4. The $r^{th}$ incomplete moment of the CG family of distributions is given by

M_{r}(y) = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{-\infty}^{y} x^{r} g(x) G(x)^{l}\, dx, \quad r = 1, 2, \ldots. \tag{13}

Proof. The $r^{th}$ incomplete moment is defined as $M_{r}(y) = \int_{-\infty}^{y} x^{r} f(x)\, dx$. Substituting the mixture representation of the density in Equation (6) into this definition completes the proof.

Alternatively, letting $G(x) = u$, the incomplete moments can be expressed in terms of the baseline quantile function as

M_{r}(y) = A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{0}^{G(y)} Q_{G}(u)^{r} u^{l}\, du. \tag{14}

4.3. Inequality measures

Lorenz and Bonferroni curves are applied in many fields, such as econometrics, demography, reliability, medicine and insurance. They are generally used to study inequality, for example in income and poverty.

4.3.1. Lorenz curve

The Lorenz curve is defined through the first incomplete moment as $L_{F}(y) = \frac{1}{μ} \int_{-\infty}^{y} x f(x)\, dx$, where $μ = μ'_{1}$ is the mean. For the CG family, it is given by

L_{F}(y) = \frac{A λ β}{μ} \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{-\infty}^{y} x g(x) G(x)^{l}\, dx. \tag{15}

Alternatively, letting $G(x) = u$, $L_{F}(y)$ can be expressed in terms of the baseline quantile function as

L_{F}(y) = \frac{A λ β}{μ} \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{0}^{G(y)} Q_{G}(u) u^{l}\, du. \tag{16}

4.3.2. Bonferroni curve

The Bonferroni curve is defined as $B_{F}(y) = L_{F}(y)/F(y)$; hence, for the CG family, it is given by

B_{F}(y) = \frac{A λ β}{μ F(y)} \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{-\infty}^{y} x g(x) G(x)^{l}\, dx. \tag{17}
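Using the quantile form in Equation (16), the Lorenz and Bonferroni curves can be evaluated numerically. The Python sketch below assumes a non-negative baseline (Weibull with $α = 1$, $γ = 2$), replaces the infinite weighted sum by its closed form $k(u) = Aλβ u^{β - 1} e^{u^{β}} e^{λ(1 - e^{u^{β}})}$, and uses hypothetical function names. For a non-negative random variable $L_{F}(y) \leq F(y)$, so $B_{F}(y)$ lies in $(0, 1)$, which gives a simple sanity check.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_kernel(u, lam, beta):
    """Closed form of A*lam*beta*sum(omega_ijkl * u**l), i.e. f(x)/g(x) at G(x) = u."""
    A = cg_norm_const(lam)
    ub = u ** beta
    return A * lam * beta * u ** (beta - 1.0) * math.exp(ub) * math.exp(lam * (1.0 - math.exp(ub)))


# Hypothetical Weibull baseline (alpha = 1, gamma = 2).
def weib_G(x):
    return 1.0 - math.exp(-x * x)


def weib_G_inv(v):
    return math.sqrt(-math.log1p(-v))


def cg_cdf_from_u(u, lam, beta):
    """Eq. (1) written in terms of u = G(x)."""
    A = cg_norm_const(lam)
    return A * (1.0 - math.exp(lam * (1.0 - math.exp(u ** beta))))


def partial_first_moment(c, lam, beta, G_inv, n=50_000):
    """int_0^c Q_G(u) k(u) du (midpoint rule); c = G(y) gives the first incomplete moment."""
    h = c / n
    return h * sum(G_inv((k + 0.5) * h) * cg_kernel((k + 0.5) * h, lam, beta) for k in range(n))


def lorenz(y, lam, beta, G, G_inv, n=50_000):
    """Eq. (16), with mu obtained from the same integral over (0, 1)."""
    mu = partial_first_moment(1.0, lam, beta, G_inv, n)
    return partial_first_moment(G(y), lam, beta, G_inv, n) / mu


def bonferroni(y, lam, beta, G, G_inv, n=50_000):
    """B_F(y) = L_F(y) / F(y)."""
    return lorenz(y, lam, beta, G, G_inv, n) / cg_cdf_from_u(G(y), lam, beta)
```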

4.4. Mean residual life

The mean residual life of a component, that is, the expected remaining lifetime of the component given that it has survived beyond time $y$, is defined as $E(X - y \mid X > y)$.

Proposition 5. The mean residual life of a CG random variable $X$ is given by

\overline{M}(y) = \frac{1}{1 - F(y)} \left[μ - A λ β \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ω_{ijkl} \int_{-\infty}^{y} x g(x) G(x)^{l}\, dx\right] - y. \tag{18}

Proof. The mean residual life can be written as $\overline{M}(y) = \frac{1}{1 - F(y)} \left[μ - \int_{-\infty}^{y} x f(x)\, dx\right] - y$. Substituting $f(x)$ from Equation (6) into $\overline{M}(y)$ gives the mean residual life.
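Equation (18) can be checked against the standard identity $\overline{M}(y) = \int_{y}^{\infty} S(x)\, dx / S(y)$. In the Python sketch below (illustrative Weibull baseline with $α = 1$, $γ = 2$; hypothetical function names), the bracketed term of Equation (18) is computed as $\int_{G(y)}^{1} Q_{G}(u) k(u)\, du$, where $k(u) = Aλβ u^{β - 1} e^{u^{β}} e^{λ(1 - e^{u^{β}})}$ is the closed form of the weighted sum. This equals $μ - \int_{-\infty}^{y} x f(x)\, dx$ and avoids subtracting two nearly equal numbers. The survival integral is truncated at $y + 12$, which is ample for this light-tailed baseline.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_kernel(u, lam, beta):
    """f(x)/g(x) at G(x) = u, the closed form of the weighted sum."""
    A = cg_norm_const(lam)
    ub = u ** beta
    return A * lam * beta * u ** (beta - 1.0) * math.exp(ub) * math.exp(lam * (1.0 - math.exp(ub)))


# Hypothetical Weibull baseline (alpha = 1, gamma = 2).
def weib_G(x):
    return 1.0 - math.exp(-x * x)


def weib_G_inv(v):
    return math.sqrt(-math.log1p(-v))


def cg_sf(x, lam, beta, G):
    """Eq. (3): S(x) = 1 - F(x)."""
    A = cg_norm_const(lam)
    return 1.0 - A * (1.0 - math.exp(lam * (1.0 - math.exp(G(x) ** beta))))


def mrl_eq18(y, lam, beta, G, G_inv, n=100_000):
    """Eq. (18): the bracket equals int_{G(y)}^1 Q_G(u) k(u) du (midpoint rule)."""
    c = G(y)
    h = (1.0 - c) / n
    tail = h * sum(G_inv(c + (k + 0.5) * h) * cg_kernel(c + (k + 0.5) * h, lam, beta)
                   for k in range(n))
    return tail / cg_sf(y, lam, beta, G) - y


def mrl_survival(y, lam, beta, G, upper=12.0, n=20_000):
    """Cross-check: int_y^inf S(x) dx / S(y), Simpson's rule on [y, y + upper] (n even)."""
    h = upper / n
    s = cg_sf(y, lam, beta, G) + cg_sf(y + upper, lam, beta, G)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * cg_sf(y + k * h, lam, beta, G)
    return (s * h / 3.0) / cg_sf(y, lam, beta, G)
```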

4.5. Entropy

Entropy is a measure of the variation or uncertainty of a random variable. Its applications span probability theory, engineering and science in general.

4.5.1. Rényi’s entropy

The Rényi entropy (Rényi, 1961) of a random variable with pdf $f(x)$ is defined as

I_{R}(δ) = \frac{1}{1 - δ} \log\left[\int_{-\infty}^{\infty} f^{δ}(x)\, dx\right], \quad δ \neq 1, δ > 0.

Proposition 6. The Rényi entropy of the CG random variable is given by

I_{R}(δ) = \frac{1}{1 - δ} \log\left[(A λ β)^{δ} \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ϖ_{ijkl} \int_{-\infty}^{\infty} g(x)^{δ} G(x)^{l}\, dx\right], \quad δ \neq 1, δ > 0, \tag{19}

where

ϖ_{ijkl} = \frac{(-1)^{i + k + l} (λδ)^{i}}{i!} \frac{(i + δ)^{j}}{j!} e^{λδ} \binom{β(j + δ) - δ}{k} \binom{k}{l}.

Proof. From Equation (2), $f^{δ}(x) = (A λ β)^{δ} g(x)^{δ} G(x)^{δ(β - 1)} e^{δ G(x)^{β}} e^{λδ} e^{-λδ e^{G(x)^{β}}}$.

Following the same steps used to expand the density (Taylor expansions of $e^{-λδ e^{G(x)^{β}}}$ and $e^{(i + δ) G(x)^{β}}$, which produce powers $G(x)^{β(j + δ) - δ}$, followed by the two binomial expansions), $f^{δ}(x)$ becomes

f^{δ}(x) = (A λ β)^{δ} \sum_{i = 0}^{\infty} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{k} ϖ_{ijkl}\, g(x)^{δ} G(x)^{l},

where $ϖ_{ijkl}$ is as defined above. Substituting $f^{δ}(x)$ into $I_{R}(δ)$ completes the proof.
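Because $f^{δ}(x)$ contains the factor $G(x)^{δ(β - 1)}$, the double series obtained by expanding $e^{-λδ e^{G(x)^{β}}}$ and $e^{(i + δ) G(x)^{β}}$ carries powers $G(x)^{β(j + δ) - δ}$. The Python sketch below checks this numerically for a fixed value of $G(x)$ and also evaluates $I_{R}(δ)$ by direct quadrature of $f^{δ}$. The Weibull baseline, the parameter values and the function names are illustrative assumptions; the quadrature assumes $β \geq 1$ so that the integrand is finite at $x = 0$.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def ratio_pow_closed(G, lam, beta, delta):
    """[f(x)/g(x)]**delta from Eq. (2) as a function of G = G(x); contains G**(delta*(beta - 1))."""
    A = cg_norm_const(lam)
    Gb = G ** beta
    return ((A * lam * beta) ** delta * G ** (delta * (beta - 1.0))
            * math.exp(delta * Gb) * math.exp(lam * delta * (1.0 - math.exp(Gb))))


def ratio_pow_series(G, lam, beta, delta, I=60, J=60):
    """Truncated double series; the power of G in term (i, j) is beta*(j + delta) - delta."""
    A = cg_norm_const(lam)
    total = 0.0
    for i in range(I):
        c_i = (-lam * delta) ** i / math.factorial(i)
        for j in range(J):
            total += c_i * (i + delta) ** j / math.factorial(j) * G ** (beta * (j + delta) - delta)
    return (A * lam * beta) ** delta * math.exp(lam * delta) * total


# Hypothetical Weibull baseline (alpha = 1, gamma = 2).
def weib_G(x):
    return 1.0 - math.exp(-x * x)


def weib_g(x):
    return 2.0 * x * math.exp(-x * x)


def integral_f_pow(delta, lam, beta, upper=8.0, n=4000):
    """int_0^upper f(x)**delta dx by Simpson's rule (n even); assumes beta >= 1."""
    def f_pow(x):
        return weib_g(x) ** delta * ratio_pow_closed(weib_G(x), lam, beta, delta)

    h = upper / n
    s = f_pow(0.0) + f_pow(upper)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f_pow(k * h)
    return s * h / 3.0


def renyi_entropy(delta, lam, beta):
    """I_R(delta) = log(int f**delta dx) / (1 - delta), for delta != 1."""
    return math.log(integral_f_pow(delta, lam, beta)) / (1.0 - delta)
```

Setting `delta = 1.0` in `integral_f_pow` should return a value close to one, since $f$ is a density; Rényi entropy is also non-increasing in $δ$.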

4.6. Stochastic ordering

Stochastic ordering is a natural way to compare random variables. Let $X$ and $Y$ be random variables with cdfs $F_{X}(x)$ and $F_{Y}(x)$ and pdfs $f_{X}(x)$ and $f_{Y}(x)$, respectively. $X$ is said to be smaller than $Y$ in the likelihood ratio order ($X \leq_{lr} Y$) if $f_{X}(x)/f_{Y}(x)$ is decreasing in $x$.

Proposition 7. Let $X \sim CG(λ_{1}, β, ψ)$ and $Y \sim CG(λ_{2}, β, ψ)$, where $ψ$ is a $(p \times 1)$ vector of parameters associated with the baseline distribution. Then $X \leq_{lr} Y$ if $λ_{2} < λ_{1}$.

Proof. The ratio of their pdfs is $\frac{f_{X}(x)}{f_{Y}(x)} = \frac{A_{1} λ_{1}}{A_{2} λ_{2}} e^{(λ_{1} - λ_{2})(1 - e^{G(x)^{β}})}$, where $A_{m} = 1/[1 - e^{λ_{m}(1 - e)}]$, $m = 1, 2$. Since $G(x)$ is non-decreasing in $x$, the term $1 - e^{G(x)^{β}}$ is non-increasing, so the ratio is decreasing whenever $λ_{1} - λ_{2} > 0$, that is, when $λ_{2} < λ_{1}$.
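A quick numerical illustration of the likelihood ratio ordering result: note that the normalizing constants differ between $X$ and $Y$ because $A$ depends on $λ$. The Python sketch below (illustrative Weibull baseline with $α = 1$, $γ = 2$; hypothetical function names; arbitrary values $λ_{1} = 2$, $λ_{2} = 0.5$, $β = 0.8$) computes $\log[f_{X}(x)/f_{Y}(x)]$ from the closed form, compares it with the ratio of the two densities, and shows that it decreases along a grid of $x$ values.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def weib_G(x):
    """Hypothetical Weibull baseline cdf (alpha = 1, gamma = 2)."""
    return 1.0 - math.exp(-x * x)


def weib_g(x):
    return 2.0 * x * math.exp(-x * x)


def cg_pdf(x, lam, beta):
    """Eq. (2) with the Weibull baseline above."""
    A = cg_norm_const(lam)
    Gx = weib_G(x)
    Gb = Gx ** beta
    return (A * lam * beta * weib_g(x) * Gx ** (beta - 1.0)
            * math.exp(Gb) * math.exp(lam * (1.0 - math.exp(Gb))))


def log_pdf_ratio(x, lam1, lam2, beta):
    """log[f_X(x)/f_Y(x)] = log(A1*lam1/(A2*lam2)) + (lam1 - lam2)*(1 - exp(G(x)**beta))."""
    A1, A2 = cg_norm_const(lam1), cg_norm_const(lam2)
    return math.log(A1 * lam1 / (A2 * lam2)) + (lam1 - lam2) * (1.0 - math.exp(weib_G(x) ** beta))


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5, 2.0):
        print(x, log_pdf_ratio(x, 2.0, 0.5, 0.8))
```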

4.7. Order statistics

The pdf $f_{X_{p:n}}(x)$ of the $p^{th}$ order statistic $X_{p:n}$ of an independent and identically distributed random sample $X_{1}, X_{2}, \ldots, X_{n}$ of size $n$ is given by

f_{X_{p:n}}(x) = \frac{n!}{(p - 1)!(n - p)!} [F(x)]^{p - 1} [1 - F(x)]^{n - p} f(x), \quad p = 1, 2, \ldots, n.

Writing $F(x) = 1 - S(x)$ and applying the binomial expansion, $[F(x)]^{p - 1} = \sum_{i = 0}^{p - 1} (-1)^{i} \binom{p - 1}{i} [S(x)]^{i}$. Substituting this into the density of the $p^{th}$ order statistic yields

f_{X_{p:n}}(x) = \frac{n!}{(p - 1)!(n - p)!} \sum_{i = 0}^{p - 1} (-1)^{i} \binom{p - 1}{i} [S(x)]^{n - p + i} f(x),

where $[S(x)]^{n - p + i} = [1 - F(x)]^{n - p + i}$.

Since $S(x) = 1 - A + A e^{λ(1 - e^{G(x)^{β}})}$ from Equation (3), a second binomial expansion gives $[S(x)]^{n - p + i} = \sum_{q = 0}^{n - p + i} \binom{n - p + i}{q} (1 - A)^{n - p + i - q} A^{q} e^{qλ(1 - e^{G(x)^{β}})}$. Hence, the pdf of the $p^{th}$ order statistic is given by

f_{X_{p:n}}(x) = \frac{n!\, A λ β}{(p - 1)!(n - p)!} \sum_{i = 0}^{p - 1} \sum_{q = 0}^{n - p + i} (-1)^{i} \binom{p - 1}{i} \binom{n - p + i}{q} (1 - A)^{n - p + i - q} A^{q}\, g(x) G(x)^{β - 1} e^{G(x)^{β}} e^{λ(q + 1)(1 - e^{G(x)^{β}})}. \tag{20}

Employing the same expansions used for the density of the CG family, a mixture representation of the pdf of the $p^{th}$ order statistic is

f_{X_{p:n}}(x) = \frac{n!\, A λ β}{(p - 1)!(n - p)!} \sum_{i = 0}^{p - 1} \sum_{q = 0}^{n - p + i} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{\infty} \sum_{m = 0}^{l} D_{iqjklm}\, g(x) G(x)^{m}, \tag{21}

where

D_{iqjklm} = \frac{(-1)^{i + j + l + m} [λ(q + 1)]^{j}}{j!} \frac{(j + 1)^{k}}{k!} \binom{p - 1}{i} \binom{n - p + i}{q} (1 - A)^{n - p + i - q} A^{q} \binom{β(k + 1) - 1}{l} \binom{l}{m} e^{λ(q + 1)}.

4.7.1. Moments of order statistics

The $r^{th}$ non-central moment of the $p^{th}$ order statistic is given by $E(X_{p:n}^{r}) = μ_{r}^{'(p:n)} = \int_{-\infty}^{\infty} x^{r} f_{X_{p:n}}(x)\, dx$. Substituting Equation (21) into $E(X_{p:n}^{r})$, the $r^{th}$ non-central moment of the $p^{th}$ order statistic of the CG random variable is given by

E(X_{p:n}^{r}) = \frac{n!\, A λ β}{(p - 1)!(n - p)!} \sum_{i = 0}^{p - 1} \sum_{q = 0}^{n - p + i} \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{l = 0}^{\infty} \sum_{m = 0}^{l} D_{iqjklm}\, τ_{(r, m)}, \tag{22}

where $τ_{(r, m)} = \int_{-\infty}^{\infty} x^{r} g(x) G(x)^{m}\, dx$ is the probability weighted moment of the baseline distribution.
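Because $S(x) = 1 - A + A e^{λ(1 - e^{G(x)^{β}})}$ by Equation (3), and $A > 1$, the power $[S(x)]^{n - p + i}$ is not a single exponential and has to be expanded binomially. The Python sketch below evaluates the density of $X_{p:n}$ both from its definition and from the double sum over $i$ and $q$ that results from expanding $[F(x)]^{p - 1}$ in powers of $S(x)$ and then $[S(x)]^{n - p + i}$, and it checks that the density integrates to one. The Weibull baseline ($α = 1$, $γ = 2$), the values $λ = 1.2$, $β = 1.5$, $n = 5$ and the function names are illustrative assumptions.

```python
import math
from math import comb, factorial


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def weib_G(x):
    """Hypothetical Weibull baseline cdf (alpha = 1, gamma = 2)."""
    return 1.0 - math.exp(-x * x)


def weib_g(x):
    return 2.0 * x * math.exp(-x * x)


def os_pdf_definition(x, p, n, lam, beta):
    """n!/((p-1)!(n-p)!) * F^(p-1) * S^(n-p) * f, using Eqs. (1)-(3) directly."""
    A = cg_norm_const(lam)
    Gx = weib_G(x)
    Gb = Gx ** beta
    F = A * (1.0 - math.exp(lam * (1.0 - math.exp(Gb))))
    f = A * lam * beta * weib_g(x) * Gx ** (beta - 1.0) * math.exp(Gb) * math.exp(lam * (1.0 - math.exp(Gb)))
    const = factorial(n) / (factorial(p - 1) * factorial(n - p))
    return const * F ** (p - 1) * (1.0 - F) ** (n - p) * f


def os_pdf_double_sum(x, p, n, lam, beta):
    """Same density after expanding S^(n-p+i) with S = 1 - A + A*exp(lam*(1 - exp(G^beta)))."""
    A = cg_norm_const(lam)
    Gx = weib_G(x)
    Gb = Gx ** beta
    base = A * lam * beta * weib_g(x) * Gx ** (beta - 1.0) * math.exp(Gb)
    total = 0.0
    for i in range(p):
        m = n - p + i
        for q in range(m + 1):
            total += ((-1) ** i * comb(p - 1, i) * comb(m, q) * (1.0 - A) ** (m - q) * A ** q
                      * math.exp(lam * (q + 1) * (1.0 - math.exp(Gb))))
    const = factorial(n) / (factorial(p - 1) * factorial(n - p))
    return const * base * total


def simpson(fun, a, b, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = fun(a) + fun(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * fun(a + k * h)
    return s * h / 3.0
```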

5. Parameter estimation

The maximum likelihood estimation method is used to estimate the parameters of the family, for reasons similar to those stated in Nasiru et al. (2018). Given a random sample $x_{1}, x_{2}, \ldots, x_{n}$ of size $n$ from the CG family of distributions, the total log-likelihood function is given by

ℓ = n \log(A λ β) + \sum_{i = 1}^{n} \log g(x_{i}; ψ) + (β - 1) \sum_{i = 1}^{n} \log G(x_{i}; ψ) + \sum_{i = 1}^{n} G(x_{i}; ψ)^{β} + λ \sum_{i = 1}^{n} \left(1 - e^{G(x_{i}; ψ)^{β}}\right), \tag{23}

where $ψ$ is a $(p \times 1)$ vector of parameters associated with the baseline distribution.

The score functions are obtained by differentiating the total log-likelihood function partially with respect to the parameters of the CG family:

\frac{\partial ℓ}{\partial λ} = \frac{n}{λ} + \frac{n(1 - e) e^{λ(1 - e)}}{1 - e^{λ(1 - e)}} + \sum_{i = 1}^{n} \left(1 - e^{G(x_{i}; ψ)^{β}}\right), \tag{24}

\frac{\partial ℓ}{\partial β} = \frac{n}{β} + \sum_{i = 1}^{n} \log G(x_{i}; ψ) + \sum_{i = 1}^{n} G(x_{i}; ψ)^{β} \log G(x_{i}; ψ) - λ \sum_{i = 1}^{n} G(x_{i}; ψ)^{β} e^{G(x_{i}; ψ)^{β}} \log G(x_{i}; ψ) \tag{25}

and

\frac{\partial ℓ}{\partial ψ_{k}} = \sum_{i = 1}^{n} \frac{g'_{k}(x_{i}; ψ)}{g(x_{i}; ψ)} + (β - 1) \sum_{i = 1}^{n} \frac{G'_{k}(x_{i}; ψ)}{G(x_{i}; ψ)} + β \sum_{i = 1}^{n} G'_{k}(x_{i}; ψ) G(x_{i}; ψ)^{β - 1} - λ β \sum_{i = 1}^{n} G'_{k}(x_{i}; ψ) G(x_{i}; ψ)^{β - 1} e^{G(x_{i}; ψ)^{β}}, \tag{26}

where $g'_{k}(x_{i}; ψ) = \partial g(x_{i}; ψ)/\partial ψ_{k}$ and $G'_{k}(x_{i}; ψ) = \partial G(x_{i}; ψ)/\partial ψ_{k}$, for $k = 1, \ldots, p$.

Equating the score functions to zero and solving the resulting system numerically, for example with a quasi-Newton method, gives the maximum likelihood estimates of the parameters. Interval estimates are obtained from the observed information matrix $J(ϑ) = -\frac{\partial^{2} ℓ}{\partial q\, \partial r}$ (for $q, r = λ, β, ψ$ and $ϑ = (λ, β, ψ^{T})^{T}$), a $(p + 2) \times (p + 2)$ matrix whose elements can be computed numerically. Under the usual regularity conditions, as $n \to \infty$, $\hat{ϑ} - ϑ$ is approximately distributed as $N_{p + 2}(0, J(\hat{ϑ})^{-1})$, where $J(\hat{ϑ})$ is the observed information matrix evaluated at $\hat{ϑ}$. Approximate $100(1 - ρ)\%$ confidence intervals, where $ρ$ is the significance level, can then be constructed from this asymptotic normal distribution.
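Analytic score functions are easy to get wrong, so a finite-difference check is worthwhile before passing them to an optimizer. The Python sketch below treats the baseline parameters $ψ$ as known (a Weibull baseline with $α = 1$, $γ = 2$, an illustrative assumption), simulates a sample through the quantile function, and compares Equation (24) and the $β$-score $n/β + \sum_{i}[\log G + G^{β} \log G - λ G^{β} e^{G^{β}} \log G]$ with central differences of the log-likelihood in Equation (23). The function names, sample size and parameter values are hypothetical.

```python
import math
import random


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


# Hypothetical Weibull baseline with psi = (alpha, gamma) fixed at (1, 2).
def weib_G(x):
    return 1.0 - math.exp(-x * x)


def weib_g(x):
    return 2.0 * x * math.exp(-x * x)


def loglik(lam, beta, xs):
    """Eq. (23) with the baseline parameters treated as known."""
    ll = len(xs) * math.log(cg_norm_const(lam) * lam * beta)
    for x in xs:
        Gx = weib_G(x)
        Gb = Gx ** beta
        ll += math.log(weib_g(x)) + (beta - 1.0) * math.log(Gx) + Gb + lam * (1.0 - math.exp(Gb))
    return ll


def score_lam(lam, beta, xs):
    """Eq. (24)."""
    n, c = len(xs), 1.0 - math.e
    s = n / lam + n * c * math.exp(lam * c) / (1.0 - math.exp(lam * c))
    return s + sum(1.0 - math.exp(weib_G(x) ** beta) for x in xs)


def score_beta(lam, beta, xs):
    """d(loglik)/d(beta) = n/beta + sum[log G + G^b log G - lam G^b e^{G^b} log G]."""
    s = len(xs) / beta
    for x in xs:
        Gx = weib_G(x)
        lg, Gb = math.log(Gx), Gx ** beta
        s += lg + Gb * lg - lam * Gb * math.exp(Gb) * lg
    return s


def sample_cg_weibull(n, lam, beta, rng):
    """Inverse-transform draws using the CG quantile function with this baseline."""
    A = cg_norm_const(lam)
    out = []
    for _ in range(n):
        v = math.log(1.0 - math.log1p(-rng.random() / A) / lam) ** (1.0 / beta)
        out.append(math.sqrt(-math.log1p(-v)))
    return out
```

The check is made away from the true parameter values, where the scores are clearly non-zero and a sign or factor error would be visible.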

6. Some special distributions

The CG family of distributions can be used to extend many distributions and make them more flexible in applications. In this section, some special distributions are developed.

6.1. Chen Burr III distribution

Suppose that the baseline distribution is Burr III (Burr, 1942), with cdf $G(x) = (1 + x^{-θ})^{-γ}$ and pdf $g(x) = γθ x^{-θ - 1} (1 + x^{-θ})^{-γ - 1}$, $x > 0, θ > 0, γ > 0$. The cdf of the Chen Burr III (CB) distribution is given by

F(x) = A\left[1 - \exp\left(λ\left(1 - e^{(1 + x^{-θ})^{-γβ}}\right)\right)\right], \quad x > 0, θ > 0, β > 0, γ > 0, λ > 0. \tag{27}

Its corresponding density and hazard functions are, respectively,

f(x) = A λ β γ θ x^{-θ - 1} (1 + x^{-θ})^{-γβ - 1} \exp\left[(1 + x^{-θ})^{-γβ} + λ\left(1 - e^{(1 + x^{-θ})^{-γβ}}\right)\right], \quad x > 0 \tag{28}

and

h(x) = \frac{A λ β γ θ x^{-θ - 1} (1 + x^{-θ})^{-γβ - 1} \exp\left[(1 + x^{-θ})^{-γβ} + λ\left(1 - e^{(1 + x^{-θ})^{-γβ}}\right)\right]}{1 - A\left[1 - \exp\left(λ\left(1 - e^{(1 + x^{-θ})^{-γβ}}\right)\right)\right]}, \quad x > 0. \tag{29}

Plots of the density and hazard rate functions of the CB distribution are displayed in Figure 1. The density exhibits varying shapes, including unimodal shapes with different degrees of kurtosis, right-skewed and reversed-J shapes. For the selected parameter values, the hazard rate function exhibits upside-down bathtub, decreasing and increasing shapes.

Figure 1. Plots of density and hazard rate functions of CB distribution

The quantile function $Q(u)$ of the CB distribution is given by

Q(u) = x_{u} = \left[\left\{\log\left(1 - \frac{\log(1 - u/A)}{λ}\right)\right\}^{-\frac{1}{γβ}} - 1\right]^{-\frac{1}{θ}}.

6.2. Chen Kumaraswamy distribution

The Chen Kumaraswamy (CK) distribution uses as baseline the Kumaraswamy distribution (Kumaraswamy, 1980), with cdf and pdf given, respectively, by $G(x) = 1 - (1 - x^{a})^{b}$ and $g(x) = ab x^{a - 1} (1 - x^{a})^{b - 1}$, $0 < x < 1, a > 0, b > 0$. The cdf of the CK distribution is given by

F(x) = A\left[1 - \exp\left\{λ\left[1 - e^{[1 - (1 - x^{a})^{b}]^{β}}\right]\right\}\right], \quad 0 < x < 1, a > 0, b > 0, β > 0, λ > 0, \tag{30}

with its corresponding density and hazard rate functions given, respectively, by

f(x) = A λ β ab x^{a - 1} (1 - x^{a})^{b - 1} [1 - (1 - x^{a})^{b}]^{β - 1} \exp\left[[1 - (1 - x^{a})^{b}]^{β} + λ\left(1 - e^{[1 - (1 - x^{a})^{b}]^{β}}\right)\right], \quad 0 < x < 1 \tag{31}

and

h(x) = \frac{A λ β ab x^{a - 1} (1 - x^{a})^{b - 1} [1 - (1 - x^{a})^{b}]^{β - 1} \exp\left[[1 - (1 - x^{a})^{b}]^{β} + λ\left(1 - e^{[1 - (1 - x^{a})^{b}]^{β}}\right)\right]}{1 - A\left[1 - \exp\left\{λ\left[1 - e^{[1 - (1 - x^{a})^{b}]^{β}}\right]\right\}\right]}, \quad 0 < x < 1. \tag{32}

Plots of the density and hazard rate functions of the CK distribution are displayed in Figure 2. The density shows shapes such as reversed-J, left-skewed, right-skewed and unimodal shapes, among others. For the selected parameter values, the hazard rate function exhibits increasing, decreasing, unimodal and bathtub shapes.

Figure 2. Plots of the density and hazard rate function of CK distribution

The quantile function $Q(u)$ of the CK distribution is obtained as

Q(u) = x_{u} = \left[1 - \left(1 - \left\{\log\left(1 - \frac{\log(1 - u/A)}{λ}\right)\right\}^{\frac{1}{β}}\right)^{\frac{1}{b}}\right]^{\frac{1}{a}}.

6.3. Chen Weibull distribution

The Chen Weibull (CW) distribution is obtained using as baseline the Weibull distribution (Weibull, 1951), with cdf and pdf given, respectively, by $G(x) = 1 - e^{-(x/α)^{γ}}$ and $g(x) = (γ/α)(x/α)^{γ - 1} e^{-(x/α)^{γ}}$. The cdf and pdf of the CW distribution are, respectively, given by

F(x) = A\left[1 - \exp\left\{λ\left[1 - e^{(1 - e^{-(x/α)^{γ}})^{β}}\right]\right\}\right], \quad x > 0, α > 0, β > 0, λ > 0, γ > 0 \tag{33}

and

f(x) = A λ β \left(\frac{γ}{α}\right)\left(\frac{x}{α}\right)^{γ - 1} \left(1 - e^{-(x/α)^{γ}}\right)^{β - 1} \exp\left[\left(1 - e^{-(x/α)^{γ}}\right)^{β} - \left(\frac{x}{α}\right)^{γ} + λ\left(1 - e^{(1 - e^{-(x/α)^{γ}})^{β}}\right)\right], \quad x > 0. \tag{34}

The hazard rate function is given by

h(x) = \frac{A λ β \left(\frac{γ}{α}\right)\left(\frac{x}{α}\right)^{γ - 1} \left(1 - e^{-(x/α)^{γ}}\right)^{β - 1} \exp\left[\left(1 - e^{-(x/α)^{γ}}\right)^{β} - \left(\frac{x}{α}\right)^{γ} + λ\left(1 - e^{(1 - e^{-(x/α)^{γ}})^{β}}\right)\right]}{1 - A\left[1 - \exp\left\{λ\left[1 - e^{(1 - e^{-(x/α)^{γ}})^{β}}\right]\right\}\right]}, \quad x > 0. \tag{35}

As shown in Figure 3, the density of the CW distribution exhibits right-skewed, left-skewed, unimodal and reversed-J shapes, among others. For the selected parameter values, the hazard rate function of the CW distribution exhibits varying shapes, such as increasing and decreasing failure rates, right- and left-skewed unimodal shapes and an upside-down bathtub shape.

Figure 3. Plots of density and hazard rate function of CW distribution

The quantile function $Q(u)$ of the CW distribution is given by

Q(u) = x_{u} = α\left\{-\log\left[1 - \left(\log\left[1 - \frac{\log(1 - u/A)}{λ}\right]\right)^{\frac{1}{β}}\right]\right\}^{\frac{1}{γ}}.
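The quantile functions of the three special distributions can be verified by checking that $F(Q(u)) = u$. The Python sketch below does this for the CB, CK and CW distributions. The function names are hypothetical; the CW values are those of setting I in Section 7, while the CB and CK values are arbitrary illustrative choices.

```python
import math


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def _inner(u, lam):
    """log(1 - log(1 - u/A)/lam), the term shared by the three quantile functions; equals G(x)**beta."""
    A = cg_norm_const(lam)
    return math.log(1.0 - math.log1p(-u / A) / lam)


def cb_cdf(x, lam, beta, theta, gamma):
    A = cg_norm_const(lam)
    return A * (1.0 - math.exp(lam * (1.0 - math.exp((1.0 + x ** -theta) ** (-gamma * beta)))))


def cb_quantile(u, lam, beta, theta, gamma):
    return (_inner(u, lam) ** (-1.0 / (gamma * beta)) - 1.0) ** (-1.0 / theta)


def ck_cdf(x, lam, beta, a, b):
    A = cg_norm_const(lam)
    return A * (1.0 - math.exp(lam * (1.0 - math.exp((1.0 - (1.0 - x ** a) ** b) ** beta))))


def ck_quantile(u, lam, beta, a, b):
    return (1.0 - (1.0 - _inner(u, lam) ** (1.0 / beta)) ** (1.0 / b)) ** (1.0 / a)


def cw_cdf(x, lam, beta, alpha, gamma):
    A = cg_norm_const(lam)
    G = 1.0 - math.exp(-((x / alpha) ** gamma))
    return A * (1.0 - math.exp(lam * (1.0 - math.exp(G ** beta))))


def cw_quantile(u, lam, beta, alpha, gamma):
    return alpha * (-math.log1p(-_inner(u, lam) ** (1.0 / beta))) ** (1.0 / gamma)


if __name__ == "__main__":
    print("CW median (setting I):", cw_quantile(0.5, 1.9, 0.9, 0.8, 4.8))
```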

7. Simulations

Monte Carlo simulations were performed to investigate the behavior of the maximum likelihood estimators of the parameters. For illustration purposes, the simulation experiments were carried out using the Chen Weibull distribution. The experiments were replicated $N = 1500$ times using sample sizes $n = 50, 150, 300, 600, 1000$ and parameter values I: $λ = 1.9, β = 0.9, α = 0.8, γ = 4.8$ and II: $λ = 0.5, β = 0.5, α = 0.5, γ = 0.5$. The average bias (AB), root-mean-square error (RMSE) and coverage probability (CP) of the 95% confidence intervals for the estimators of the parameters were computed. From Table 1, the ABs and RMSEs of the estimators generally decrease towards zero as the sample size increases, indicating that the maximum likelihood estimators become more accurate and are consistent. Also, the CPs for most of the estimators are quite close to the nominal value of 0.95. Thus, the maximum likelihood method performs well in estimating the parameters of the Chen Weibull distribution.
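The full four-parameter experiment requires a numerical optimizer. As a much-reduced, hypothetical illustration of the same idea, the Python sketch below estimates only $λ$ for the CW distribution, holding $β$, $α$ and $γ$ at their true values, by solving the score equation (24) with bisection. Bisection is valid here because, with $c = e - 1$, the derivative of the $λ$-score per observation is $-1/λ^{2} + c^{2} e^{cλ}/(e^{cλ} - 1)^{2} < 0$ (this follows from $\sinh z > z$), so the score is strictly decreasing in $λ$. The parameter values ($λ = 1.5$, $β = 0.8$, $α = 1$, $γ = 2$), replication counts and function names are illustrative and are not those behind Table 1.

```python
import math
import random
import statistics


def cg_norm_const(lam):
    """A = 1 / [1 - exp(lam * (1 - e))]."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cw_sample(n, lam, beta, alpha, gamma, rng):
    """Draw n CW variates by inverse transform using the CW quantile function."""
    A = cg_norm_const(lam)
    out = []
    for _ in range(n):
        v = math.log(1.0 - math.log1p(-rng.random() / A) / lam) ** (1.0 / beta)
        out.append(alpha * (-math.log1p(-v)) ** (1.0 / gamma))
    return out


def mle_lambda(xs, beta, alpha, gamma, lo=1e-3, hi=50.0, iters=80):
    """Solve the lambda-score (24) = 0 by bisection, with beta, alpha, gamma held fixed."""
    n = len(xs)
    c = math.e - 1.0
    # The sum of (1 - exp(G(x_i)^beta)) does not depend on lambda, so compute it once.
    s = sum(1.0 - math.exp((1.0 - math.exp(-((x / alpha) ** gamma))) ** beta) for x in xs)

    def score(lam):
        # n/lam + n(1-e)e^{lam(1-e)}/(1-e^{lam(1-e)}) rewritten as n/lam - n*c/expm1(c*lam)
        return n / lam - n * c / math.expm1(c * lam) + s

    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n, reps, lam=1.5, beta=0.8, alpha=1.0, gamma=2.0, seed=2024):
    """Average bias and RMSE of the lambda estimator over reps replications."""
    rng = random.Random(seed)
    est = [mle_lambda(cw_sample(n, lam, beta, alpha, gamma, rng), beta, alpha, gamma)
           for _ in range(reps)]
    bias = statistics.fmean(e - lam for e in est)
    rmse = math.sqrt(statistics.fmean((e - lam) ** 2 for e in est))
    return bias, rmse
```

Calling `simulate(n, 200)` for increasing `n` should show the RMSE shrinking roughly like $1/\sqrt{n}$, mirroring the pattern described for Table 1 in this simplified one-parameter setting.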

Table 1. Monte Carlo simulation results


8. Applications

In this section, the performance of the CW distribution in providing good parametric fits to real-life datasets is demonstrated. Its goodness-of-fit measures are compared with those of competing models, namely the exponentiated Chen (EC) (Chaubey & Zhang, 2015), extended Weibull (EW) (Xie, Tang, & Goh, 2002) and Kumaraswamy exponentiated Chen (KEC) (Khan, King, & Hudson, 2018) distributions. The information criteria and goodness-of-fit measures used are the Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), Hannan–Quinn information criterion (HQIC), Kolmogorov–Smirnov statistic (KS), Cramér–von Mises statistic (CM) and Anderson–Darling statistic (AD). To obtain the maximum likelihood estimates of the parameters, the log-likelihood functions of the models were maximized using the mle2 function of the bbmle package in R (Bolker, 2014). After trying a wide range of initial values, the estimates giving the largest maximized log-likelihood were chosen.

For illustration, the first dataset (data1) consists of the fatigue times of 6061-T6 aluminum coupons cut parallel to the direction of rolling and oscillated at 18 cycles per second (Birnbaum & Saunders, 1969), whilst the second dataset (data2) represents the survival times of guinea pigs injected with different amounts of tubercle bacilli, studied by Bjerkedal (1960). These datasets are given in Tables 2 and 3.

Table 2. Fatigue time of 101 6061-T6 aluminum coupons


Table 3. Survival times of guinea pigs injected with different amounts of tubercle bacilli


A preliminary exploration of the shapes of the hazard rate functions of the datasets showed that data1 has an increasing hazard rate function, whilst data2 has a unimodal hazard rate function, as shown by the TTT-transform plots in Figure 4.

Figure 4. TTT-transform plots for the datasets

The maximum likelihood estimates of the parameters of the fitted distributions, with their standard errors, and the goodness-of-fit measures for both datasets are displayed in Tables 4 and 5, respectively. All the parameters of the distributions were significant at the 5% significance level, except for one parameter each of the CW and KEC distributions ($λ$ and $b$, respectively), which were significant only at the 15% significance level.

Table 4. Maximum likelihood estimates of the parameters, with standard errors in brackets


Table 5. Goodness-of-fit statistics and information criteria


Compared with the competing models, the CW distribution, with its four parameters, provides a better fit to the datasets, as it has the smallest values of all the goodness-of-fit measures used, as shown in Table 5.

This is further confirmed by the plots of the densities and cdfs of the empirical and fitted distributions shown in Figures 5 and 6. From the fitted plots, it is observed that the CW distribution provides a reasonable fit to the empirical density.

Figure 5. Empirical and fitted density and cdf plots of data1

Figure 6. Empirical and fitted density and cdf plots of data2

The P-P plots in Figures 7 and 8 also indicate that the CW distribution provides a better fit to both datasets than the KEC, EC and EW distributions.

Figure 7. P-P plots of fitted distributions for data1

Figure 8. P-P plots of fitted distributions for data2

The profile log-likelihoods of the estimated parameters of the CW distribution for the two datasets are shown in Figures 9 and 10. The plots show that each estimated parameter value lies at the maximum of its profile log-likelihood.
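
The profile log-likelihood of one parameter is the log-likelihood maximized over the remaining parameters with that one held fixed. The CW log-likelihood is not reproduced in this section, so the sketch below illustrates the idea on the two-parameter Chen (2000) baseline instead, with cdf $F(x) = 1 - \exp\{\lambda(1 - e^{x^{\beta}})\}$, $x > 0$. For fixed $\beta$, setting $\partial \ell / \partial \lambda = 0$ gives $\hat{\lambda}(\beta) = n / \sum_{i=1}^{n} (e^{x_i^{\beta}} - 1)$, so the profile log-likelihood is $\ell_p(\beta) = n \log \hat{\lambda}(\beta) + n \log \beta + (\beta - 1)\sum_{i} \log x_i + \sum_{i} x_i^{\beta} - n$. Evaluating $\ell_p$ on a grid and checking where it peaks mirrors the plots in Figure 9; the function names, the grid search and the sample are illustrative assumptions.

```python
import math
from typing import List, Tuple


def chen_profile_loglik(beta: float, data: List[float]) -> Tuple[float, float]:
    """Return (lambda_hat(beta), profile log-likelihood) for the Chen model.

    lambda is maximized out in closed form for the fixed beta. Large values of
    x ** beta make math.expm1 raise OverflowError, so rescale large data first.
    """
    n = len(data)
    powers = [x ** beta for x in data]               # x_i ** beta
    denom = sum(math.expm1(p) for p in powers)       # sum of exp(x_i ** beta) - 1
    lam_hat = n / denom                              # root of d(loglik)/d(lambda) = 0
    profile = (n * math.log(lam_hat) + n * math.log(beta)
               + (beta - 1.0) * sum(math.log(x) for x in data)
               + sum(powers)
               - n)                                  # lam_hat * sum(1 - exp(x_i ** beta)) == -n
    return lam_hat, profile


def profile_grid(data: List[float], betas: List[float]) -> Tuple[float, List[float]]:
    """Evaluate the profile on a grid of beta values; return (argmax, values)."""
    values = [chen_profile_loglik(b, data)[1] for b in betas]
    best = max(range(len(betas)), key=values.__getitem__)
    return betas[best], values


# Hypothetical, already-rescaled lifetimes (not the article's data).
sample = [0.31, 0.52, 0.74, 0.88, 1.05, 1.17, 1.32, 1.46]
grid = [0.2 + 0.05 * j for j in range(37)]           # beta from 0.2 to about 2.0
beta_best, curve = profile_grid(sample, grid)
lam_best, _ = chen_profile_loglik(beta_best, sample)
print(f"grid maximum: beta = {beta_best:.2f}, lambda = {lam_best:.3f}")
```

When no closed form is available for the other parameters, as is likely for a four-parameter model such as CW, the inner maximization would instead be done numerically at each grid point.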

Figure 9. Profile log-likelihood plot of CW parameters for data1

9. Conclusion

Much recent research focuses on developing new families of distributions that generalize existing distributions and so provide a better fit for the modeling of life data. A new family of distributions, called the CG family, is developed and studied. Its statistical properties, such as the quantile function, moments, incomplete moments, generating function, entropies, stochastic ordering and order statistics, are derived. Estimators for the parameters of the new family were developed using the method of maximum likelihood. The application of a special distribution developed from the family was demonstrated using two real datasets. A comparison of the results with those of other existing distributions showed that the special distribution developed from the CG family provides a better parametric fit to these datasets.

Additional information

Funding

The authors received no direct funding for this research.

Notes on contributors

Lea Anzagra

Lea Anzagra is a doctoral candidate with the Department of Statistics, University for Development Studies, Ghana. Her research interest is in distribution theory, probability theory and survival analysis.

Solomon Sarpong

Solomon Sarpong is a Senior Lecturer with the Department of Statistics, University for Development Studies, Ghana. His research interest is in distribution theory and time series analysis.

Suleman Nasiru

Suleman Nasiru is a Senior Lecturer with the Department of Statistics, University for Development Studies, Ghana. His research interest is in distribution theory, quality control and time series analysis.

References

Alzaatreh, A., Lee, C., & Famoye, F. (2013). A new method for generating families of continuous distributions. Metron, 71(1), 63–79. doi:10.1007/s40300-013-0007-y
Birnbaum, Z. W., & Saunders, S. C. (1969). Estimation for a family of life distributions with applications to fatigue. Journal of Applied Probability, 6, 328–347. doi:10.2307/3212004
Bjerkedal, T. (1960). Acquisition of resistance in guinea pigs infected with different doses of virulent tubercle bacilli. American Journal of Hygiene, 72(1), 130–148.
Bolker, B. (2014). Tools for general maximum likelihood estimation. R development core team.
Burr, I. W. (1942). Cumulative frequency functions. Annals of Mathematical Statistics, 13, 215–232. doi:10.1214/aoms/1177731607
Chaubey, Y. P., & Zhang, R. (2015). An extension of Chen’s family of survival distributions with bathtub shape or increasing hazard rate function. Communications in Statistics - Theory and Methods, 44(19), 4049–4064. doi:10.1080/03610926.2014.997357
Chen, Z. (2000). A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function. Statistics & Probability Letters, 49, 155–161. doi:10.1016/S0167-7152(00)00044-4
Cordeiro, G. M., & de Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883–898. doi:10.1080/00949650903530745
Hamedani, G. G., Yousof, H. M., Rasekhi, M., Alizadeh, M., & Najibi, S. M. (2018). Type I general exponential class of distributions. Pakistan Journal of Statistics and Operation Research, 14(1), 39–55. doi:10.18187/pjsor.v14i1.2193
Khan, M. S., King, R., & Hudson, I. L. (2018). Kumaraswamy exponentiated Chen distribution for modelling lifetime data. Applied Mathematics and Information Sciences, 12(3), 617–623. doi:10.18576/amis/120317
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1–2), 79–88. doi:10.1016/0022-1694(80)90036-0
Lee, C., Famoye, F., & Alzaatreh, A. Y. (2013). Methods for generating families of univariate continuous distributions in the recent decades. WIREs Computational Statistics, 5(3), 219–238. doi:10.1002/wics.1255
Nasiru, S. (2018). Extended Odd Fréchet-G family of distributions. Journal of Probability and Statistics, 2018, 1–12. doi:10.1155/2018/2931326
Nasiru, S., Mwita, P. N., & Ngesa, O. (2018). Exponentiated generalized power series family of distributions. Annals of Data Science, 6(3), 463–489. doi:10.1007/s40745-018-0170-3
Rényi, A. (1961). On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Contributions to the Theory of Statistics, 1, 547–561.
Weibull, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18, 293–296.
Xie, M., Tang, Y., & Goh, T. N. (2002). A modified Weibull extension with bathtub-shaped failure rate function. Reliability Engineering and System Safety, 76(3), 279–285. doi:10.1016/S0951-8320(02)00022-4
Zubair, A. (2018). The Zubair-G family of distributions: Properties and applications. Annals of Data Science. doi:10.1007/s40745-018-0169-9