Research Article

Parameter estimations for mixed generalized exponential distribution based on progressive type-I interval censoring

Article: 1280913 | Received 04 Oct 2016, Accepted 20 Dec 2016, Published online: 02 Mar 2017

Abstract

This paper considers parameter estimation for the mixed generalized exponential distribution based on progressively type-I interval-censored data. Maximum likelihood estimation is used, but the estimators have no closed form, so the EM algorithm is applied to compute the maximum likelihood estimates. The performance of the estimates is assessed by a simulation study, and a real data set is analyzed to illustrate the proposed estimation method.

Public Interest Statement

Progressively type-I interval-censored data are a common and widely studied data type in survival analysis and reliability analysis. This work models such data with the more flexible mixed generalized exponential distribution, including the exponential distribution and Weibull distribution. The results of this paper can be applied to lifetime analysis in clinical trials, engineering and related fields.

1. Introduction

In life testing, reliability studies and survival analysis, items are very often lost or removed from an experiment before failure. The most popular censoring schemes are type-I and type-II censoring. Under type-I censoring, the test continues until a pre-specified time; under type-II censoring, it continues until a pre-specified number of failures has occurred. Progressive type-II censoring extends the traditional type-II scheme and has been studied by Balakrishnan and Aggarwala (2000), Balakrishnan, Kannan, Lin, and Wu (2004) and others. Sometimes time and cost constraints make it impossible to observe the whole life test continuously. Only the number of failures in an interval is then observed, rather than the exact failure times; this is called interval censoring. Combining interval censoring with progressive censoring, Aggarwala (2001) introduced the progressive type-I interval censoring scheme and developed statistical inference for the exponential distribution based on progressively type-I interval-censored data. Ng and Wang (2009) discussed statistical inference for the Weibull distribution under progressive type-I interval censoring and compared different estimation methods for its parameters via simulation. Lin, Wu, and Balakrishnan (2009) considered the determination of optimal life testing plans with progressively type-I interval-censored data from the log-normal distribution. Chen and Lio (2010) considered parameter estimation for the generalized exponential distribution under progressive type-I interval censoring, obtaining estimates of the unknown parameters by the EM algorithm, the midpoint approximation method and the method of moments. Lio, Chen, and Tsai (2011) presented parameter estimation for the generalized Rayleigh distribution under the same censoring scheme. Peng and Yan (2011) considered the MLEs and moment estimators of the parameters of the gamma distribution based on progressive type-I interval censoring and compared their biases and mean square errors through simulation. Pradhan and Gijo (2013) considered inference for the unknown parameters of the log-normal distribution based on progressively type-I interval-censored data through both frequentist and Bayesian approaches.

In most simple life tests, the experimental data come from one population and have only one type of failure. In practice, a mixed distribution provides a flexible candidate for describing the time to failure. In particular, in the analysis of biased life data, the failure hazard is quite high at the beginning, and the failure rate then decreases or stays constant as age increases. In such cases the population is not homogeneous but is made up of sub-populations. Mixed distributions have been studied by, for example, McClean (1986), Soliman (2006), Tian, He, and Chen (2008), Tian, Tian, and Chen (2012), Tian, Tian, and Zhu (2014) and Tian, Zhu, and Tian (2013). Nevertheless, to the best of our knowledge, no published paper addresses maximum likelihood estimation for the mixed generalized exponential distribution (MGED) under progressive type-I interval censoring. In this work, we consider estimation of the parameters of the MGED based on progressively type-I interval-censored data. We give the likelihood equations for the unknown parameters and observe that the MLEs have no closed form. We therefore use the EM algorithm to compute the MLEs and assess the performance of the estimates by simulation.

The rest of the paper is organized as follows. The model and data description are provided in Section 2. The maximum likelihood estimators of the unknown parameters are obtained in Section 3. The EM algorithm procedure is given in Section 4. The performance of the estimators is investigated via simulation in Section 5. A data set is analyzed in Section 6. Section 7 contains some concluding remarks.

2. The description of the model and data

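Consider the MGED with $m$ components, whose probability density function, cumulative distribution function and hazard function are, respectively,

$$f(t;P,\alpha,\lambda)=\sum_{k=1}^{m}p_k\alpha_k\lambda_k\left(1-e^{-\lambda_k t}\right)^{\alpha_k-1}e^{-\lambda_k t},\quad t>0 \tag{2.1}$$

$$F(t;P,\alpha,\lambda)=\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t}\right)^{\alpha_k},\quad t\ge 0 \tag{2.2}$$

$$h(t;P,\alpha,\lambda)=\frac{\sum_{k=1}^{m}p_k\alpha_k\lambda_k\left(1-e^{-\lambda_k t}\right)^{\alpha_k-1}e^{-\lambda_k t}}{1-\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t}\right)^{\alpha_k}},\quad t\ge 0 \tag{2.3}$$

where $P=(p_1,p_2,\ldots,p_{m-1})$, $\alpha=(\alpha_1,\ldots,\alpha_m)$, $\lambda=(\lambda_1,\ldots,\lambda_m)$, $0<p_k<1$ for $k=1,\ldots,m-1$, $p_m=1-\sum_{k=1}^{m-1}p_k$, $\alpha_k>0$ are the shape parameters and $\lambda_k>0$ are the scale parameters. The number of parameters is $3m-1$.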
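As an illustration of (2.1) to (2.3), the three functions can be sketched in a few lines of Python (chosen here for convenience; it is not the authors' code). The function names are hypothetical, and the weight argument `p` is assumed to hold all $m$ weights $(p_1,\ldots,p_m)$, including $p_m$, summing to one. The example values are the true parameter values used later in the simulation study.

```python
import math

def mged_pdf(t, p, alpha, lam):
    """Density (2.1); defined for t > 0 (with alpha_k < 1 it is unbounded as t -> 0)."""
    return sum(pk * ak * lk * (1.0 - math.exp(-lk * t)) ** (ak - 1.0) * math.exp(-lk * t)
               for pk, ak, lk in zip(p, alpha, lam))

def mged_cdf(t, p, alpha, lam):
    """Distribution function (2.2)."""
    return sum(pk * (1.0 - math.exp(-lk * t)) ** ak
               for pk, ak, lk in zip(p, alpha, lam))

def mged_hazard(t, p, alpha, lam):
    """Hazard function (2.3) = density / survival."""
    return mged_pdf(t, p, alpha, lam) / (1.0 - mged_cdf(t, p, alpha, lam))

# Two-component example (full weight vector, so p_2 = 1 - p_1 = 0.4).
p, alpha, lam = [0.6, 0.4], [0.9, 1.0], [0.13, 1.3]
for t in (0.5, 1.0, 5.0):
    print(t, mged_pdf(t, p, alpha, lam), mged_cdf(t, p, alpha, lam), mged_hazard(t, p, alpha, lam))
```

With a single component and $\alpha_1 = 1$ the model reduces to the exponential distribution, so the hazard is constant and equal to $\lambda_1$, which gives a quick sanity check.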

The progressive type-I interval censoring scheme is as follows. Let $n$ identical items be placed on test at time $t_0 = 0$, with $N$ pre-specified inspection times $t_1 < t_2 < \cdots < t_N$, where $t_N$ is the scheduled termination time of the experiment. Let $X_i$ be the number of failures within the interval $(t_{i-1}, t_i]$, $Y_i$ the number of items surviving at time $t_i$, and $R_i$ the number of items randomly removed from the survivors at time $t_i$. The removals $R_1, R_2, \ldots, R_N$ can be specified as percentages of the remaining survivors: for pre-specified percentages $p_i$, $R_i = \lfloor Y_i p_i \rfloor$, $i = 1, 2, \ldots, N$, with $p_N = 1$. Alternatively, $R_1, R_2, \ldots, R_N$ can be pre-specified positive integers, in which case at least $R_i$ units must be available for removal at $t_i$. The proportions of surviving units removed at the inspection times are given in the simulation study. The progressively type-I interval-censored data are $\{X_i, R_i, t_i\}$, $i = 1, 2, \ldots, N$, where the sample size is $n = \sum_{i=1}^{N}(X_i + R_i)$. If $R_i = 0$ for $i = 1, 2, \ldots, N-1$, progressive type-I interval censoring reduces to traditional interval censoring with data $X_1, X_2, \ldots, X_N, X_{N+1} = R_N$.

3. Maximum likelihood estimation

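Consider progressively type-I interval-censored data $\{X_i, R_i, t_i\}$, $i = 1, 2, \ldots, N$, of sample size $n$, from a lifetime distribution with distribution function $F(t, \theta)$, where $\theta$ is the parameter vector. The likelihood function is (Aggarwala, 2001)

$$
\begin{aligned}
L(\theta) &\propto F(t_1,\theta)^{X_1}[1-F(t_1,\theta)]^{R_1}\times[F(t_2,\theta)-F(t_1,\theta)]^{X_2}[1-F(t_2,\theta)]^{R_2}\times\cdots\times[F(t_N,\theta)-F(t_{N-1},\theta)]^{X_N}[1-F(t_N,\theta)]^{R_N}\\
&=\prod_{i=1}^{N}[F(t_i,\theta)-F(t_{i-1},\theta)]^{X_i}[1-F(t_i,\theta)]^{R_i}
\end{aligned}
\tag{3.1}
$$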

where t0 = 0. We can obtain the MLEs of the parameters by maximizing the likelihood function in (3.1).

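For the MGED with distribution function (2.2), the likelihood function above becomes

$$L(P,\alpha,\lambda)\propto\prod_{i=1}^{N}\left[\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}-\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t_{i-1}}\right)^{\alpha_k}\right]^{X_i}\left[1-\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}\right]^{R_i}$$

and the log-likelihood function is

$$l(P,\alpha,\lambda)=\ln L(P,\alpha,\lambda)=\text{constant}+\sum_{i=1}^{N}\left\{X_i\ln\left[\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}-\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t_{i-1}}\right)^{\alpha_k}\right]+R_i\ln\left[1-\sum_{k=1}^{m}p_k\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}\right]\right\}$$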
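The log-likelihood above can be evaluated directly from the counts $\{X_i, R_i, t_i\}$. The following Python sketch (not the authors' code) drops the additive constant; the function names are hypothetical, `p` is assumed to contain all $m$ mixing weights, and the small data set at the bottom is made up purely for illustration.

```python
import math

def mged_cdf(t, p, alpha, lam):
    """Distribution function (2.2) of the MGED."""
    return sum(pk * (1.0 - math.exp(-lk * t)) ** ak
               for pk, ak, lk in zip(p, alpha, lam))

def log_likelihood(times, X, R, p, alpha, lam):
    """Log-likelihood of {X_i, R_i, t_i} up to an additive constant."""
    total = 0.0
    F_prev = 0.0  # F(t_0) with t_0 = 0
    for t, x, r in zip(times, X, R):
        F_cur = mged_cdf(t, p, alpha, lam)
        if x > 0:  # failures in (t_{i-1}, t_i]
            total += x * math.log(F_cur - F_prev)
        if r > 0:  # items withdrawn at t_i
            total += r * math.log(1.0 - F_cur)
        F_prev = F_cur
    return total

# Hypothetical data: two inspection times, n = 3 + 1 + 2 + 4 = 10 items.
times, X, R = [1.0, 2.0], [3, 2], [1, 4]
print(log_likelihood(times, X, R, [0.6, 0.4], [0.9, 1.0], [0.13, 1.3]))
```

A function of this kind could be handed to a general-purpose optimizer as an alternative to the EM iterations, or used to monitor that the likelihood does not decrease between EM steps.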

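Setting the derivatives of the log-likelihood function with respect to $p_k$, $\alpha_k$ and $\lambda_k$ to zero gives the following score equations, in which $l$ is the summation index over components and the term involving $\ln(1-e^{-\lambda_k t_{i-1}})$ is taken to be zero when $t_{i-1}=t_0=0$:

$$\sum_{i=1}^{N}X_i\frac{\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}-\left(1-e^{-\lambda_k t_{i-1}}\right)^{\alpha_k}}{\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_i}\right)^{\alpha_l}-\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_{i-1}}\right)^{\alpha_l}}-\sum_{i=1}^{N}R_i\frac{\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}}{1-\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_i}\right)^{\alpha_l}}=0$$

$$\sum_{i=1}^{N}X_i\frac{p_k\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}\ln\left(1-e^{-\lambda_k t_i}\right)-p_k\left(1-e^{-\lambda_k t_{i-1}}\right)^{\alpha_k}\ln\left(1-e^{-\lambda_k t_{i-1}}\right)}{\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_i}\right)^{\alpha_l}-\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_{i-1}}\right)^{\alpha_l}}-\sum_{i=1}^{N}R_i\frac{p_k\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k}\ln\left(1-e^{-\lambda_k t_i}\right)}{1-\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_i}\right)^{\alpha_l}}=0$$

$$\sum_{i=1}^{N}X_i\frac{p_k\alpha_k t_i\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k-1}e^{-\lambda_k t_i}-p_k\alpha_k t_{i-1}\left(1-e^{-\lambda_k t_{i-1}}\right)^{\alpha_k-1}e^{-\lambda_k t_{i-1}}}{\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_i}\right)^{\alpha_l}-\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_{i-1}}\right)^{\alpha_l}}-\sum_{i=1}^{N}R_i\frac{p_k\alpha_k t_i\left(1-e^{-\lambda_k t_i}\right)^{\alpha_k-1}e^{-\lambda_k t_i}}{1-\sum_{l=1}^{m}p_l\left(1-e^{-\lambda_l t_i}\right)^{\alpha_l}}=0$$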

These equations are complicated and have no closed-form solution, so the EM algorithm described below is used to find the MLEs of $p_k$, $\alpha_k$, $\lambda_k$, $k = 1, 2, \ldots, m$.

4. EM algorithm

The Expectation-Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) is a useful tool for estimating the parameters of a distribution from incomplete data. Some of the notation here is similar to that of Tian et al. (2014). To fit the MGED model under progressive type-I interval censoring, the EM algorithm is applied to estimate the unknown parameters. Suppose $\tau_1, \tau_2, \ldots, \tau_n$ are $n$ independent and identically distributed (iid) observations from the MGED model, and denote

$$f_{ki}=\alpha_k\lambda_k\left(1-e^{-\lambda_k\tau_i}\right)^{\alpha_k-1}e^{-\lambda_k\tau_i},\qquad f_i=\sum_{k=1}^{m}p_k f_{ki}$$

$$S_{ki}=1-\left(1-e^{-\lambda_k\tau_i}\right)^{\alpha_k},\qquad S_i=\sum_{k=1}^{m}p_k S_{ki}$$

where $k = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, n$.

Here the sample $\tau_i$, $i = 1, 2, \ldots, n$, is divided into two parts, $\tau_{ij}$ and $\tau^{*}_{ij}$. Let $\tau_{ij}$, $j = 1, 2, \ldots, X_i$, be the lifetimes that fall within the interval $(t_{i-1}, t_i]$, and let $\tau^{*}_{ij}$, $j = 1, 2, \ldots, R_i$, be the lifetimes of the items withdrawn at $t_i$, for $i = 1, 2, \ldots, N$; there are $\sum_{i=1}^{N}X_i$ of the former and $\sum_{i=1}^{N}R_i$ of the latter. Introduce an indicator vector $I_{ij} = (I_{ij1}, I_{ij2}, \ldots, I_{ijm})$ for $\tau_{ij}$ and $I^{*}_{ij} = (I^{*}_{ij1}, I^{*}_{ij2}, \ldots, I^{*}_{ijm})$ for $\tau^{*}_{ij}$, where $I_{ijk}$ ($I^{*}_{ijk}$) is a binary random variable equal to 1 if $\tau_{ij}$ ($\tau^{*}_{ij}$) comes from the $k$-th component and 0 otherwise, $k = 1, 2, \ldots, m$. Also let $I = (I_{ij}, I^{*}_{ij})$, $i = 1, 2, \ldots, N$, denote the indicator matrix composed of the $n$ indicator vectors of all lifetimes $\tau_{ij}$ and $\tau^{*}_{ij}$. The random vectors $I_{ij}$ and $I^{*}_{ij}$ each follow a multinomial distribution. However, the component from which each lifetime comes is unknown; that is, $I$ cannot be observed and is treated as missing data in the EM algorithm. In what follows, $I^{(1)}_{ij} = (I^{(1)}_{ij1}, I^{(1)}_{ij2}, \ldots, I^{(1)}_{ijm})$ and $I^{(2)}_{ij} = (I^{(2)}_{ij1}, I^{(2)}_{ij2}, \ldots, I^{(2)}_{ijm})$ denote the indicator vectors of $\tau_{ij}$ and $\tau^{*}_{ij}$, respectively.

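For the lifetimes $\tau_{ij}$ within the interval $(t_{i-1}, t_i]$, the joint density of $\tau_{ij}$ and $I^{(1)}_{ij}$ is

$$g(\tau_{ij},I_{ij}^{(1)}\mid P,\alpha,\lambda)=\prod_{k=1}^{m}\left[p_k\left(S_{k,i-1}-S_{ki}\right)\right]^{I_{ijk}^{(1)}}.$$

Given $\tau_{ij}$, the conditional density of $I^{(1)}_{ij}$ is

$$p(I_{ijk}^{(1)}=1\mid\tau_{ij},P,\alpha,\lambda)=\frac{p_k\left(S_{k,i-1}-S_{ki}\right)}{S_{i-1}-S_i},\quad k=1,2,\ldots,m,$$

where $S_{ki}=1-(1-e^{-\lambda_k t_i})^{\alpha_k}$, $S_{k,i-1}=1-(1-e^{-\lambda_k t_{i-1}})^{\alpha_k}$, $S_i=\sum_{k=1}^{m}p_k S_{ki}$ and $S_{i-1}=\sum_{k=1}^{m}p_k S_{k,i-1}$.

For the right-censored data $\tau^{*}_{ij}$, the joint density of $\tau^{*}_{ij}$ and $I^{(2)}_{ij}$ is

$$g(\tau^{*}_{ij},I_{ij}^{(2)}\mid P,\alpha,\lambda)=\prod_{k=1}^{m}\left[p_k S_{ki}\right]^{I_{ijk}^{(2)}}.$$

Given $\tau^{*}_{ij}$, the conditional density of $I^{(2)}_{ij}$ is

$$p(I_{ijk}^{(2)}=1\mid\tau^{*}_{ij},P,\alpha,\lambda)=\frac{p_k S_{ki}}{S_i},\quad k=1,2,\ldots,m,$$

where $S_{ki}=1-(1-e^{-\lambda_k t_i})^{\alpha_k}$ and $S_i=\sum_{k=1}^{m}p_k S_{ki}$.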
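The two conditional probabilities above are the component-membership weights that drive the E-step. A minimal Python sketch (not the authors' code) is given below; the function names are hypothetical, `p` is assumed to contain all $m$ weights, and the example parameter values are the true values from the simulation study.

```python
import math

def component_survival(t, alpha_k, lam_k):
    """S_k(t) = 1 - (1 - e^{-lam_k t})^{alpha_k} for one generalized exponential component."""
    return 1.0 - (1.0 - math.exp(-lam_k * t)) ** alpha_k

def membership_probabilities(t_prev, t_cur, p, alpha, lam):
    """Return (a, b): a[k] is the probability that a failure in (t_prev, t_cur]
    came from component k; b[k] is the same for an item withdrawn at t_cur."""
    s_prev = [component_survival(t_prev, a, l) for a, l in zip(alpha, lam)]
    s_cur = [component_survival(t_cur, a, l) for a, l in zip(alpha, lam)]
    fail = [pk * (sp - sc) for pk, sp, sc in zip(p, s_prev, s_cur)]
    cens = [pk * sc for pk, sc in zip(p, s_cur)]
    a = [v / sum(fail) for v in fail]
    b = [v / sum(cens) for v in cens]
    return a, b

a, b = membership_probabilities(1.0, 2.0, [0.6, 0.4], [0.9, 1.0], [0.13, 1.3])
print(a, b)
```

Each weight vector sums to one. In this example the items still alive at $t = 2$ are attributed mostly to the first component, because its smaller rate $\lambda_1$ gives longer lifetimes.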

In Section 2, the progressively type-I interval-censored sample was denoted by $\{X_i, R_i, t_i\}$, $i = 1, 2, \ldots, N$. In a progressive type-I interval censoring experiment, we observe only the numbers of failures $X_i$ within the intervals $(t_{i-1}, t_i]$ and the numbers $R_i$ of censored items withdrawn at the censoring times $t_i$, $i = 1, 2, \ldots, N$. The observed data can therefore be written as $Y = (t_1, t_2, \ldots, t_N, X_1, X_2, \ldots, X_N, R_1, R_2, \ldots, R_N)$. However, the true failure times within the interval $(t_{i-1}, t_i]$, denoted $\tau_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, X_i$, and the true failure times of the units censored at $t_i$, denoted $\tau^{*}_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, R_i$, cannot be observed in life experiments. Consequently, $\tau = (\tau_{ij}, \tau^{*}_{ij})$ can be regarded as missing data. All the missing data are denoted by $(\tau, I)$ and the complete data by $W = (\tau, I, Y)$. The following procedure gives the MLEs of all unknown parameters via the EM algorithm, which consists of two steps: the E-step and the M-step.

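First, the likelihood function of the MGED model under the complete data $W$ is

$$l(P,\alpha,\lambda\mid W)=\prod_{i=1}^{N}\left\{\prod_{j=1}^{X_i}\prod_{k=1}^{m}\left[p_k\alpha_k\lambda_k\left(1-e^{-\lambda_k\tau_{ij}}\right)^{\alpha_k-1}e^{-\lambda_k\tau_{ij}}\right]^{I_{ijk}^{(1)}}\times\prod_{j=1}^{R_i}\prod_{k=1}^{m}\left[p_k\alpha_k\lambda_k\left(1-e^{-\lambda_k\tau^{*}_{ij}}\right)^{\alpha_k-1}e^{-\lambda_k\tau^{*}_{ij}}\right]^{I_{ijk}^{(2)}}\right\}$$

The log-likelihood function of the complete data is

$$\ln l(P,\alpha,\lambda\mid W)=\sum_{i=1}^{N}\sum_{k=1}^{m}\left\{\sum_{j=1}^{X_i}I_{ijk}^{(1)}\left[\ln(p_k\alpha_k\lambda_k)+(\alpha_k-1)\ln\left(1-e^{-\lambda_k\tau_{ij}}\right)-\lambda_k\tau_{ij}\right]+\sum_{j=1}^{R_i}I_{ijk}^{(2)}\left[\ln(p_k\alpha_k\lambda_k)+(\alpha_k-1)\ln\left(1-e^{-\lambda_k\tau^{*}_{ij}}\right)-\lambda_k\tau^{*}_{ij}\right]\right\}$$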

Given initial values $P^{(0)}, \alpha^{(0)}, \lambda^{(0)}$ of the unknown parameter vectors, the parameter estimates of model (2.1) are obtained by the EM algorithm via the following two steps. The quality of the estimates depends strongly on the choice of initial values, and different initial values may lead to different convergence rates. In the simulation, we suggest trying several sets of initial values and comparing the resulting estimates.

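E-step: suppose the values at the $(h-1)$-th iteration are $P^{(h-1)},\alpha^{(h-1)},\lambda^{(h-1)}$, abbreviated as $\theta^{(h-1)}$ inside the expectations below. The $Q$ function of the $h$-th iteration is then

$$
\begin{aligned}
&Q(P,\alpha,\lambda\mid P^{(h-1)},\alpha^{(h-1)},\lambda^{(h-1)},Y)=E\left[\ln l(P,\alpha,\lambda\mid W)\mid\theta^{(h-1)},Y\right]\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}E\Bigg\{\sum_{j=1}^{X_i}I_{ijk}^{(1)}\Big[\ln(p_k\alpha_k\lambda_k)+(\alpha_k-1)\ln\big(1-e^{-\lambda_k\tau_{ij}}\big)-\lambda_k\tau_{ij}\Big]\\
&\qquad+\sum_{j=1}^{R_i}I_{ijk}^{(2)}\Big[\ln(p_k\alpha_k\lambda_k)+(\alpha_k-1)\ln\big(1-e^{-\lambda_k\tau^{*}_{ij}}\big)-\lambda_k\tau^{*}_{ij}\Big]\Bigg\}\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Bigg\{\sum_{j=1}^{X_i}\ln(p_k\alpha_k\lambda_k)E\big(I_{ijk}^{(1)}\big)+(\alpha_k-1)\sum_{j=1}^{X_i}E\Big[\ln\big(1-e^{-\lambda_k\tau_{ij}}\big)I_{ijk}^{(1)}\Big]-\lambda_k\sum_{j=1}^{X_i}E\big(\tau_{ij}I_{ijk}^{(1)}\big)\\
&\qquad+\sum_{j=1}^{R_i}\ln(p_k\alpha_k\lambda_k)E\big(I_{ijk}^{(2)}\big)+(\alpha_k-1)\sum_{j=1}^{R_i}E\Big[\ln\big(1-e^{-\lambda_k\tau^{*}_{ij}}\big)I_{ijk}^{(2)}\Big]-\lambda_k\sum_{j=1}^{R_i}E\big(\tau^{*}_{ij}I_{ijk}^{(2)}\big)\Bigg\}\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Bigg\{\sum_{j=1}^{X_i}\ln(p_k\alpha_k\lambda_k)E\Big[E\big(I_{ijk}^{(1)}\mid\theta^{(h-1)},\tau,Y\big)\Big]+(\alpha_k-1)\sum_{j=1}^{X_i}E\Big[E\big(\ln(1-e^{-\lambda_k\tau_{ij}})I_{ijk}^{(1)}\mid\theta^{(h-1)},\tau,Y\big)\Big]\\
&\qquad-\lambda_k\sum_{j=1}^{X_i}E\Big[E\big(\tau_{ij}I_{ijk}^{(1)}\mid\theta^{(h-1)},\tau,Y\big)\Big]+\sum_{j=1}^{R_i}\ln(p_k\alpha_k\lambda_k)E\Big[E\big(I_{ijk}^{(2)}\mid\theta^{(h-1)},\tau,Y\big)\Big]\\
&\qquad+(\alpha_k-1)\sum_{j=1}^{R_i}E\Big[E\big(\ln(1-e^{-\lambda_k\tau^{*}_{ij}})I_{ijk}^{(2)}\mid\theta^{(h-1)},\tau,Y\big)\Big]-\lambda_k\sum_{j=1}^{R_i}E\Big[E\big(\tau^{*}_{ij}I_{ijk}^{(2)}\mid\theta^{(h-1)},\tau,Y\big)\Big]\Bigg\}\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Bigg\{\sum_{j=1}^{X_i}\ln(p_k\alpha_k\lambda_k)E\Big[a_{kij}^{(h-1)}(\tau_{ij})\Big]+(\alpha_k-1)\sum_{j=1}^{X_i}E\Big[\ln\big(1-e^{-\lambda_k\tau_{ij}}\big)a_{kij}^{(h-1)}(\tau_{ij})\Big]-\lambda_k\sum_{j=1}^{X_i}E\Big[\tau_{ij}\,a_{kij}^{(h-1)}(\tau_{ij})\Big]\\
&\qquad+\sum_{j=1}^{R_i}\ln(p_k\alpha_k\lambda_k)E\Big[b_{kij}^{(h-1)}(\tau^{*}_{ij})\Big]+(\alpha_k-1)\sum_{j=1}^{R_i}E\Big[\ln\big(1-e^{-\lambda_k\tau^{*}_{ij}}\big)b_{kij}^{(h-1)}(\tau^{*}_{ij})\Big]-\lambda_k\sum_{j=1}^{R_i}E\Big[\tau^{*}_{ij}\,b_{kij}^{(h-1)}(\tau^{*}_{ij})\Big]\Bigg\}\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Bigg\{\sum_{j=1}^{X_i}\ln(p_k\alpha_k\lambda_k)\,a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})+(\alpha_k-1)\sum_{j=1}^{X_i}E\Big[\ln\big(1-e^{-\lambda_k\tau_{ij}}\big)a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\Big]-\lambda_k\sum_{j=1}^{X_i}E\Big[\tau_{ij}\,a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\Big]\\
&\qquad+R_i\ln(p_k\alpha_k\lambda_k)\,b_{ki}^{(h-1)}(t_i)+(\alpha_k-1)\sum_{j=1}^{R_i}E\Big[\ln\big(1-e^{-\lambda_k\tau^{*}_{ij}}\big)b_{ki}^{(h-1)}(t_i)\Big]-\lambda_k\sum_{j=1}^{R_i}E\Big[\tau^{*}_{ij}\,b_{ki}^{(h-1)}(t_i)\Big]\Bigg\}
\end{aligned}
$$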

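where

$$a_{kij}^{(h-1)}(\tau_{ij})=\frac{p_k^{(h-1)}\left(S_{k,i-1}^{(h-1)}(t_{i-1})-S_{ki}^{(h-1)}(t_i)\right)}{S_{i-1}^{(h-1)}(t_{i-1})-S_i^{(h-1)}(t_i)}\;\hat{=}\;a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})$$

$$b_{kij}^{(h-1)}(\tau^{*}_{ij})=\frac{p_k^{(h-1)}S_{ki}^{(h-1)}(t_i)}{S_i^{(h-1)}(t_i)}\;\hat{=}\;b_{ki}^{(h-1)}(t_i)$$

$$S_{k,i-1}^{(h-1)}(t_{i-1})=1-\left(1-e^{-\lambda_k^{(h-1)}t_{i-1}}\right)^{\alpha_k^{(h-1)}},\qquad S_{ki}^{(h-1)}(t_i)=1-\left(1-e^{-\lambda_k^{(h-1)}t_i}\right)^{\alpha_k^{(h-1)}}$$

$$S_{i-1}^{(h-1)}(t_{i-1})=\sum_{k=1}^{m}p_k^{(h-1)}S_{k,i-1}^{(h-1)}(t_{i-1}),\qquad S_i^{(h-1)}(t_i)=\sum_{k=1}^{m}p_k^{(h-1)}S_{ki}^{(h-1)}(t_i)$$

for $k=1,2,\ldots,m$ and $i=1,2,\ldots,N$, with $j=1,2,\ldots,X_i$ for $\tau_{ij}$ and $j=1,2,\ldots,R_i$ for $\tau^{*}_{ij}$.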

Then,

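$$
\begin{aligned}
&Q(P,\alpha,\lambda\mid P^{(h-1)},\alpha^{(h-1)},\lambda^{(h-1)},Y)\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Bigg\{\sum_{j=1}^{X_i}\ln(p_k\alpha_k\lambda_k)\,a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})+(\alpha_k-1)\sum_{j=1}^{X_i}\int_{t_{i-1}}^{t_i}\ln\big(1-e^{-\lambda_k x}\big)a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\,p_{ij}(x)\,dx\\
&\qquad-\lambda_k\sum_{j=1}^{X_i}\int_{t_{i-1}}^{t_i}x\,a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\,p_{ij}(x)\,dx+R_i\ln(p_k\alpha_k\lambda_k)\,b_{ki}^{(h-1)}(t_i)\\
&\qquad+(\alpha_k-1)\sum_{j=1}^{R_i}\int_{t_i}^{\infty}\ln\big(1-e^{-\lambda_k x}\big)b_{ki}^{(h-1)}(t_i)\,p^{*}_{ij}(x)\,dx-\lambda_k\sum_{j=1}^{R_i}\int_{t_i}^{\infty}x\,b_{ki}^{(h-1)}(t_i)\,p^{*}_{ij}(x)\,dx\Bigg\}\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Big\{X_i\ln(p_k\alpha_k\lambda_k)\Delta_{1k,i,i-1}^{(h-1)}+(\alpha_k-1)X_i\Delta_{2kij}^{(h-1)}-\lambda_kX_i\Delta_{3kij}^{(h-1)}+R_i\ln(p_k\alpha_k\lambda_k)\Delta_{4ki}^{(h-1)}+(\alpha_k-1)R_i\Delta_{5kij}^{(h-1)}-\lambda_kR_i\Delta_{6kij}^{(h-1)}\Big\}\\
&=\sum_{i=1}^{N}\sum_{k=1}^{m}\Big\{\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)\ln(p_k\alpha_k\lambda_k)+(\alpha_k-1)\big(X_i\Delta_{2kij}^{(h-1)}+R_i\Delta_{5kij}^{(h-1)}\big)-\lambda_k\big(X_i\Delta_{3kij}^{(h-1)}+R_i\Delta_{6kij}^{(h-1)}\big)\Big\}
\end{aligned}
$$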

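where

$$\Delta_{1k,i,i-1}^{(h-1)}=a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1}),\qquad\Delta_{2kij}^{(h-1)}=\int_{t_{i-1}}^{t_i}\ln\big(1-e^{-\lambda_k x}\big)a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\,p_{ij}(x)\,dx$$

$$\Delta_{3kij}^{(h-1)}=\int_{t_{i-1}}^{t_i}x\,a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\,p_{ij}(x)\,dx,\qquad\Delta_{4ki}^{(h-1)}=b_{ki}^{(h-1)}(t_i)$$

$$\Delta_{5kij}^{(h-1)}=\int_{t_i}^{\infty}\ln\big(1-e^{-\lambda_k x}\big)b_{ki}^{(h-1)}(t_i)\,p^{*}_{ij}(x)\,dx,\qquad\Delta_{6kij}^{(h-1)}=\int_{t_i}^{\infty}x\,b_{ki}^{(h-1)}(t_i)\,p^{*}_{ij}(x)\,dx$$

$$p_{ij}(x\mid P^{(h-1)},\alpha^{(h-1)},\lambda^{(h-1)},Y)=\frac{\sum_{k=1}^{m}p_k^{(h-1)}\alpha_k^{(h-1)}\lambda_k^{(h-1)}\big(1-e^{-\lambda_k^{(h-1)}x}\big)^{\alpha_k^{(h-1)}-1}e^{-\lambda_k^{(h-1)}x}}{\sum_{k=1}^{m}p_k^{(h-1)}\big(1-e^{-\lambda_k^{(h-1)}t_i}\big)^{\alpha_k^{(h-1)}}-\sum_{k=1}^{m}p_k^{(h-1)}\big(1-e^{-\lambda_k^{(h-1)}t_{i-1}}\big)^{\alpha_k^{(h-1)}}},\quad x\in(t_{i-1},t_i),$$

and

$$p^{*}_{ij}(x\mid P^{(h-1)},\alpha^{(h-1)},\lambda^{(h-1)},Y)=\frac{\sum_{k=1}^{m}p_k^{(h-1)}\alpha_k^{(h-1)}\lambda_k^{(h-1)}\big(1-e^{-\lambda_k^{(h-1)}x}\big)^{\alpha_k^{(h-1)}-1}e^{-\lambda_k^{(h-1)}x}}{1-\sum_{k=1}^{m}p_k^{(h-1)}\big(1-e^{-\lambda_k^{(h-1)}t_i}\big)^{\alpha_k^{(h-1)}}},\quad x\in(t_i,+\infty).$$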
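The integrals $\Delta_3$ and $\Delta_6$ defined above have no simple closed form for a general mixture, so in practice they would be computed numerically. The Python sketch below (not the authors' code) uses a plain midpoint rule. The function names, the number of steps and the truncation point for the infinite upper limit are all assumptions made for illustration, and `p` is assumed to hold all $m$ weights. The midpoint rule never evaluates the integrand at the end points, which matters when $t_{i-1}=0$ and some $\alpha_k<1$.

```python
import math

def mged_pdf(x, p, alpha, lam):
    """Mixture density (2.1) evaluated at the current parameter values."""
    return sum(pk * ak * lk * (1.0 - math.exp(-lk * x)) ** (ak - 1.0) * math.exp(-lk * x)
               for pk, ak, lk in zip(p, alpha, lam))

def mged_cdf(x, p, alpha, lam):
    """Mixture distribution function (2.2)."""
    return sum(pk * (1.0 - math.exp(-lk * x)) ** ak
               for pk, ak, lk in zip(p, alpha, lam))

def midpoint_integral(g, lo, hi, steps=20000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (j + 0.5) * width) for j in range(steps))

def delta3(t_prev, t_cur, a_k, p, alpha, lam):
    """a_k times the conditional mean of a lifetime known to lie in (t_prev, t_cur]."""
    mass = mged_cdf(t_cur, p, alpha, lam) - mged_cdf(t_prev, p, alpha, lam)
    integral = midpoint_integral(lambda x: x * mged_pdf(x, p, alpha, lam), t_prev, t_cur)
    return a_k * integral / mass

def delta6(t_cur, b_k, p, alpha, lam, tail=60.0):
    """b_k times the conditional mean of a lifetime known to exceed t_cur.
    The upper limit is truncated at t_cur + tail / min(lam) (an assumption)."""
    upper = t_cur + tail / min(lam)
    mass = 1.0 - mged_cdf(t_cur, p, alpha, lam)
    integral = midpoint_integral(lambda x: x * mged_pdf(x, p, alpha, lam), t_cur, upper)
    return b_k * integral / mass

print(delta3(1.0, 2.0, 0.7, [0.6, 0.4], [0.9, 1.0], [0.13, 1.3]))
print(delta6(2.0, 0.9, [0.6, 0.4], [0.9, 1.0], [0.13, 1.3]))
```

The integrals $\Delta_2$ and $\Delta_5$ can be handled the same way by replacing the factor $x$ with $\ln(1-e^{-\lambda_k x})$. A library quadrature routine would normally be preferable to this hand-written rule.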

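M-step: maximize the $Q$ function from the E-step with respect to the unknown parameters $P,\alpha,\lambda$ to obtain the updated estimates $P^{(h)},\alpha^{(h)},\lambda^{(h)}$.

To make the M-step more convenient, $\Delta_{2kij}^{(h-1)}$ and $\Delta_{5kij}^{(h-1)}$ are replaced by $\tilde\Delta_{2kij}^{(h-1)}$ and $\tilde\Delta_{5kij}^{(h-1)}$, respectively, in which $\lambda_k$ inside the logarithm is fixed at its current value $\lambda_k^{(h-1)}$:

$$\tilde\Delta_{2kij}^{(h-1)}=\int_{t_{i-1}}^{t_i}\ln\big(1-e^{-\lambda_k^{(h-1)}x}\big)a_{k,i,i-1}^{(h-1)}(t_i,t_{i-1})\,p_{ij}(x)\,dx$$

$$\tilde\Delta_{5kij}^{(h-1)}=\int_{t_i}^{\infty}\ln\big(1-e^{-\lambda_k^{(h-1)}x}\big)b_{ki}^{(h-1)}(t_i)\,p^{*}_{ij}(x)\,dx$$

The $Q$ function is then approximated by

$$Q(P,\alpha,\lambda\mid P^{(h-1)},\alpha^{(h-1)},\lambda^{(h-1)},Y)\approx\sum_{i=1}^{N}\sum_{k=1}^{m}\Big\{\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)\ln(p_k\alpha_k\lambda_k)+(\alpha_k-1)\big(X_i\tilde\Delta_{2kij}^{(h-1)}+R_i\tilde\Delta_{5kij}^{(h-1)}\big)-\lambda_k\big(X_i\Delta_{3kij}^{(h-1)}+R_i\Delta_{6kij}^{(h-1)}\big)\Big\}$$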

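Setting the partial derivatives with respect to the unknown parameters to zero gives

$$\frac{\partial Q}{\partial\alpha_k}\approx\sum_{i=1}^{N}\left[\frac{1}{\alpha_k}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)+X_i\tilde\Delta_{2kij}^{(h-1)}+R_i\tilde\Delta_{5kij}^{(h-1)}\right]=0,\quad k=1,\ldots,m \tag{4.1}$$

$$\frac{\partial Q}{\partial\lambda_k}\approx\sum_{i=1}^{N}\left[\frac{1}{\lambda_k}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)-\big(X_i\Delta_{3kij}^{(h-1)}+R_i\Delta_{6kij}^{(h-1)}\big)\right]=0,\quad k=1,\ldots,m \tag{4.2}$$

$$\frac{\partial Q}{\partial p_k}=\frac{1}{p_k}\sum_{i=1}^{N}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)-\frac{1}{1-\sum_{j=1}^{m-1}p_j}\sum_{i=1}^{N}\big(X_i\Delta_{1m,i,i-1}^{(h-1)}+R_i\Delta_{4mi}^{(h-1)}\big)=0,\quad k=1,\ldots,m-1. \tag{4.3}$$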

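Solving Equations (4.1) and (4.2) gives

$$\hat\alpha_k^{(h)}\approx-\frac{\sum_{i=1}^{N}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)}{\sum_{i=1}^{N}\big(X_i\tilde\Delta_{2kij}^{(h-1)}+R_i\tilde\Delta_{5kij}^{(h-1)}\big)},\quad k=1,\ldots,m \tag{4.4}$$

$$\hat\lambda_k^{(h)}\approx\frac{\sum_{i=1}^{N}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)}{\sum_{i=1}^{N}\big(X_i\Delta_{3kij}^{(h-1)}+R_i\Delta_{6kij}^{(h-1)}\big)},\quad k=1,\ldots,m \tag{4.5}$$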
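As a reading aid (not part of the original derivation), the following short check shows why the update (4.4) is positive despite its leading minus sign, and why (4.5) is positive as well. It assumes only that the weights $a_{k,i,i-1}^{(h-1)}$, $b_{ki}^{(h-1)}$ and the densities inside the integrals are non-negative.

```latex
% For x > 0 and \lambda_k^{(h-1)} > 0 we have 0 < 1 - e^{-\lambda_k^{(h-1)} x} < 1, hence
\ln\!\left(1-e^{-\lambda_k^{(h-1)}x}\right) < 0
\;\Longrightarrow\;
\tilde\Delta_{2kij}^{(h-1)} \le 0, \qquad \tilde\Delta_{5kij}^{(h-1)} \le 0 .
% The numerator of (4.4) is a sum of non-negative weights times counts, so
\sum_{i=1}^{N}\left(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\right) \ge 0,
\qquad
\sum_{i=1}^{N}\left(X_i\tilde\Delta_{2kij}^{(h-1)}+R_i\tilde\Delta_{5kij}^{(h-1)}\right) \le 0
\;\Longrightarrow\;
\hat\alpha_k^{(h)} \ge 0 .
% In (4.5) the integrands x\,a\,p_{ij}(x) and x\,b\,p^{*}_{ij}(x) are non-negative on (0,\infty), so
\Delta_{3kij}^{(h-1)} \ge 0, \qquad \Delta_{6kij}^{(h-1)} \ge 0
\;\Longrightarrow\;
\hat\lambda_k^{(h)} \ge 0 .
```

The inequalities are strict whenever at least one item fails or is withdrawn and component $k$ has positive weight, so both updates stay inside the parameter space.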

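Solving Equation (4.3) gives

$$p_k\sum_{i=1}^{N}\big(X_i\Delta_{1m,i,i-1}^{(h-1)}+R_i\Delta_{4mi}^{(h-1)}\big)+\sum_{j=1}^{m-1}p_j\cdot\sum_{i=1}^{N}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big)=\sum_{i=1}^{N}\big(X_i\Delta_{1k,i,i-1}^{(h-1)}+R_i\Delta_{4ki}^{(h-1)}\big),\quad k=1,2,\ldots,m-1 \tag{4.6}$$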

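From Equation (4.6), the $h$-th iteration values of the parameters $p_1,\ldots,p_{m-1}$ are the solution of the linear system $AP=b$, where $P$, $A$ and $b$ are given by

$$P=(p_1,p_2,\ldots,p_{m-1})^{T},\qquad A_{m-1}=(a_{ls}),$$

$$a_{ls}=\begin{cases}\displaystyle\sum_{i=1}^{N}\big(X_i\Delta_{1l,i,i-1}^{(h-1)}+R_i\Delta_{4li}^{(h-1)}\big)+\sum_{i=1}^{N}\big(X_i\Delta_{1m,i,i-1}^{(h-1)}+R_i\Delta_{4mi}^{(h-1)}\big), & l=s\\[2ex]\displaystyle\sum_{i=1}^{N}\big(X_i\Delta_{1l,i,i-1}^{(h-1)}+R_i\Delta_{4li}^{(h-1)}\big), & l\neq s\end{cases}$$

$$b=\left(\sum_{i=1}^{N}\big(X_i\Delta_{11,i,i-1}^{(h-1)}+R_i\Delta_{41i}^{(h-1)}\big),\ldots,\sum_{i=1}^{N}\big(X_i\Delta_{1\,m-1,i,i-1}^{(h-1)}+R_i\Delta_{4\,m-1\,i}^{(h-1)}\big)\right)^{T}$$

Since $\sum_{i=1}^{N}\big(X_i\Delta_{1l,i,i-1}^{(h-1)}+R_i\Delta_{4li}^{(h-1)}\big)>0$ for $l=1,2,\ldots,m-1$, it is easy to verify that the rank of the matrix $A$ is $m-1$, i.e. $A$ is an invertible matrix. Therefore the unique solution for the parameter vector $P$ at the $h$-th iteration of the M-step is

$$\hat p^{(h)}=(p_1^{(h)},p_2^{(h)},\ldots,p_{m-1}^{(h)})^{T}=A^{-1}b \tag{4.7}$$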
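The linear system behind (4.7) is small and easy to solve directly. The Python sketch below (not the authors' code) builds $A$ and $b$ from the per-component sums $\sum_{i}(X_i\Delta_{1k,i,i-1}+R_i\Delta_{4ki})$, here passed in as a hypothetical list `component_sums` of length $m$ with component $m$ last, and solves the system by Gaussian elimination. The input numbers are made up for illustration.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        partial = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - partial) / aug[r][r]
    return x

def update_mixing_weights(component_sums):
    """Return (p_1, ..., p_{m-1}) solving A P = b as in (4.6) and (4.7)."""
    m = len(component_sums)
    last = component_sums[-1]
    # a_ls = sum_l + sum_m on the diagonal, sum_l off the diagonal
    A = [[component_sums[l] + (last if l == s else 0.0) for s in range(m - 1)]
         for l in range(m - 1)]
    b = list(component_sums[:-1])
    return solve_linear(A, b)

sums = [30.0, 50.0, 20.0]           # hypothetical values for m = 3 components
weights = update_mixing_weights(sums)
print(weights, [v / sum(sums) for v in sums[:-1]])
```

In this example the solution coincides with each component's share of the total, $p_k=\text{sum}_k/\sum_{l=1}^{m}\text{sum}_l$, which follows from rearranging (4.3) and offers a cheap cross-check of the linear solve.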

Using Equations (4.4), (4.5) and (4.7), we update $\hat P^{(h)},\hat\alpha^{(h)},\hat\lambda^{(h)}$ by repeating the E-step and M-step until the total change in all estimated parameters falls below a prescribed tolerance.

5. Simulation study

The purpose of the simulation study is to investigate the performance of the estimates of the MGED parameters in modeling progressively type-I interval-censored lifetime data. We use algorithm steps similar to those proposed in Aggarwala (2001). The simulation is conducted in the R language. To keep the paper self-contained, the algorithm is reproduced as follows.

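First, generate the numbers $X_i$ of failed items in each subinterval $(t_{i-1}, t_i]$, $i = 1, 2, \ldots, N$, from a sample of size $n$ put on life test at time $t_0 = 0$. Progressively type-I interval-censored data $(X_i, R_i, t_i)$, $i = 1, 2, \ldots, N$, from the MGED with distribution function (2.2) can be generated using the fact that

$$X_1\sim\mathrm{Binom}[n,F(t_1)] \tag{5.1}$$

and, with $X_0 = 0$ and $R_0 = 0$, for $i = 1, 2, \ldots, N$,

$$
\begin{aligned}
X_i\mid X_{i-1},\ldots,X_1,R_{i-1},\ldots,R_1&\sim\mathrm{Binom}\left[n-\sum_{j=1}^{i-1}(X_j+R_j),\;\frac{F(t_i)-F(t_{i-1})}{1-\sum_{j=1}^{i-1}[F(t_j)-F(t_{j-1})]}\right]\\
&=\mathrm{Binom}\left[n-\sum_{j=1}^{i-1}(X_j+R_j),\;\frac{F(t_i)-F(t_{i-1})}{1-F(t_{i-1})}\right]
\end{aligned}
\tag{5.2}
$$

$$R_i=\mathrm{floor}\left(p_i\times\left[n-\sum_{j=1}^{i-1}(X_j+R_j)-X_i\right]\right) \tag{5.3}$$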

where floor() returns the largest integer not greater than the argument in R language and 0 = t0 < t1 < ⋯ < tN < ∞ are pre-scheduled times.
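The generation steps (5.1) to (5.3) can be sketched as follows. This is a Python illustration rather than the authors' R code; the function names are hypothetical, `p` is assumed to hold all $m$ mixing weights, and the binomial draw is implemented as a count of Bernoulli trials so that only the standard library is needed.

```python
import math
import random

def mged_cdf(t, p, alpha, lam):
    """Distribution function (2.2) of the MGED."""
    return sum(pk * (1.0 - math.exp(-lk * t)) ** ak
               for pk, ak, lk in zip(p, alpha, lam))

def simulate_censored_sample(n, times, removal_props, p, alpha, lam, rng):
    """Generate (X_i, R_i) for inspection times t_1 < ... < t_N.
    removal_props[i] is the removal proportion at t_i; the last one should be 1."""
    X, R = [], []
    at_risk = n            # n minus all earlier failures and removals
    F_prev = 0.0           # F(t_0) with t_0 = 0
    for t, prop in zip(times, removal_props):
        F_cur = mged_cdf(t, p, alpha, lam)
        q = (F_cur - F_prev) / (1.0 - F_prev)                     # success probability in (5.2)
        x = sum(1 for _ in range(at_risk) if rng.random() < q)    # Binomial(at_risk, q)
        r = math.floor(prop * (at_risk - x))                      # removals, (5.3)
        X.append(x)
        R.append(r)
        at_risk -= x + r
        F_prev = F_cur
    return X, R

rng = random.Random(2017)  # fixed seed so the run is reproducible
times = [float(i) for i in range(1, 11)]
scheme_1 = [0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.1, 1.0]
X, R = simulate_censored_sample(60, times, scheme_1, [0.6, 0.4], [0.9, 1.0], [0.13, 1.3], rng)
print(X, R, sum(X) + sum(R))
```

Because the last removal proportion is 1, every surviving item is withdrawn at $t_N$, so the failures and removals always add up to $n$.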

In this paper, we consider only the MGED model with two components under the progressive type-I interval censoring scheme. The true values of the model parameters are $p_1=0.6$, $\alpha_1=0.9$, $\lambda_1=0.13$, $\alpha_2=1$, $\lambda_2=1.3$, and the initial values are taken as $p_1^{(0)}=0.4$, $\alpha_1^{(0)}=0.8$, $\lambda_1^{(0)}=0.1$, $\alpha_2^{(0)}=0.9$, $\lambda_2^{(0)}=1$. Each replication of the simulation generates progressively type-I interval-censored data of size $n = 60, 120, 180, 300$ or $500$ with $N = 10$ pre-specified inspection times $t_1 = 1$, $t_2 = 2$, $t_3 = 3$, $t_4 = 4$, $t_5 = 5$, $t_6 = 6$, $t_7 = 7$, $t_8 = 8$, $t_9 = 9$ and $t_{10} = 10$, where $t_{10} = 10$ is also the time at which the experiment is terminated.

In this paper, we consider the following progressive interval censoring schemes:

Scheme 1: $p^{(1)}=(0.05,0.05,0.05,0.05,0.1,0.1,0.1,0.1,0.1,1)$

Scheme 2: $p^{(2)}=(0.05,0.05,0.05,0.05,0.1,0.1,0,0,0,1)$

Scheme 3: $p^{(3)}=(0,0,0,0.1,0.05,0.1,0.15,0.05,0.05,1)$

Scheme 4: $p^{(4)}=(0,0,0,0,0,0,0,0,0,1)$

Here censoring in $p^{(1)}$ is lighter for the first four intervals and heavier for the next five. As listed, $p^{(2)}$ matches $p^{(1)}$ over the first six intervals but removes no items at $t_7$, $t_8$ and $t_9$. In $p^{(3)}$ there is no censoring in the first three intervals and some censoring afterwards. $p^{(4)}$ is conventional interval censoring, with no removals before the termination of the experiment.

The simulation is repeated 1,000 times for each of the sample sizes $n = 60, 120, 180, 300, 500$. Denote the estimates from the $h$-th replication by $\Theta^{(h)}=(p_1^{(h)},\alpha_1^{(h)},\lambda_1^{(h)},\alpha_2^{(h)},\lambda_2^{(h)})$, $h = 1, \ldots, s$, with $s = 1{,}000$. The average bias and mean square error (MSE) of the estimates are $\mathrm{Bias}(\theta_j)=\frac{1}{s}\sum_{h=1}^{s}\big(\hat\theta_j^{(h)}-\theta_j\big)$ and $\mathrm{MSE}(\theta_j)=\frac{1}{s}\sum_{h=1}^{s}\big(\hat\theta_j^{(h)}-\theta_j\big)^2$, where $\hat\theta_j^{(h)}$ is the $j$-th coordinate of $\Theta^{(h)}$ and $\theta_j$ is the corresponding true value. The results for the four censoring schemes are given in Tables 1 to 4, respectively.
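The bias and MSE summaries are simple averages over the replications. A minimal Python sketch is shown below; the function name is hypothetical and the three estimates are made-up numbers, not simulation output from the paper.

```python
def bias_and_mse(estimates, true_value):
    """Average bias and mean square error of one parameter over s replications."""
    s = len(estimates)
    errors = [estimate - true_value for estimate in estimates]
    bias = sum(errors) / s
    mse = sum(error ** 2 for error in errors) / s
    return bias, mse

# Hypothetical estimates of p_1 from three replications, with true value 0.6.
bias, mse = bias_and_mse([0.5, 0.7, 0.6], 0.6)
print(bias, mse)
```

The MSE combines the squared bias and the variance of the estimates, so an estimator can have a bias near zero and still a large MSE.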

Table 1. Average biases, MSE(s) of estimators for scheme 1 when the sample sizes are n = 60, 120, 180, 300, 500

Table 2. Average biases, MSE(s) of estimators for scheme 2 when the sample sizes are n = 60, 120, 180, 300, 500

Table 3. Average biases, MSE(s) of estimators for scheme 3 when the sample sizes are n = 60, 120, 180, 300, 500

Table 4. Average biases, MSE(s) of estimators for scheme 4 when the sample sizes are n = 60, 120, 180, 300, 500

From Tables 1 to 4, we can see that the EM algorithm is effective for estimating the parameters of the MGED under all four progressive type-I interval censoring schemes. The number of inspection times is fixed. For a given scheme, the simulation results change as $n$ increases, but only within a small range. As expected, the estimates become slightly more biased as the number of items withdrawn from the experiment increases. The simulation results are determined mainly by the sample size $n$ and the censoring proportion. Across censoring schemes, the smaller the censoring proportion in the life test, the better the estimation results; scheme 4 therefore gives the most precise estimates. Further, for fixed $N$, the biases and MSEs decrease overall as $n$ increases.

6. Data analysis

To illustrate the effectiveness of the model and algorithm, we analyze a real data set below. The data describe the survival times after surgery of a group of 374 patients who underwent operations in connection with a type of malignant disease (Berkson & Gage, 1950; Lawless, 2003). The data are given in Table 5.

Table 5. Survival time in interval form for the patients who underwent operations in connection with a type of malignant disease

According to the data above, the $d_j$ values in the first five intervals differ markedly from those in the later intervals, which suggests that the data may not come from a homogeneous population. We therefore use the two-component MGED (2.1) to analyze this data set. With the method discussed above, the parameter estimates are $\hat p_1=0.4213$, $\hat\alpha_1=0.9216$, $\hat\lambda_1=0.0886$, $\hat\alpha_2=0.9118$, $\hat\lambda_2=0.4017$. The fitted survival function and hazard rate function of model (2.1) are as follows:

$$\hat S(x)=1-\left[0.4213\times\left(1-e^{-0.0886x}\right)^{0.9216}+0.5787\times\left(1-e^{-0.4017x}\right)^{0.9118}\right],\quad x\ge 0 \tag{6.1}$$

$$\hat h(x)=\frac{0.0344\times\left(1-e^{-0.0886x}\right)^{-0.0784}e^{-0.0886x}+0.2119\times\left(1-e^{-0.4017x}\right)^{-0.0882}e^{-0.4017x}}{1-\left[0.4213\times\left(1-e^{-0.0886x}\right)^{0.9216}+0.5787\times\left(1-e^{-0.4017x}\right)^{0.9118}\right]},\quad x\ge 0 \tag{6.2}$$
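The fitted functions (6.1) and (6.2) are easy to evaluate numerically. The Python sketch below plugs in the reported estimates; the function and constant names are hypothetical. The hazard is evaluated only for $x>0$, because both fitted shape parameters are below one and the hazard is unbounded as $x\to 0$.

```python
import math

# Reported estimates; the second weight is 1 - 0.4213 = 0.5787.
P_HAT = [0.4213, 0.5787]
ALPHA_HAT = [0.9216, 0.9118]
LAMBDA_HAT = [0.0886, 0.4017]

def fitted_survival(x):
    """Fitted survival function (6.1)."""
    return 1.0 - sum(p * (1.0 - math.exp(-l * x)) ** a
                     for p, a, l in zip(P_HAT, ALPHA_HAT, LAMBDA_HAT))

def fitted_hazard(x):
    """Fitted hazard rate (6.2) = fitted density / fitted survival, for x > 0."""
    density = sum(p * a * l * (1.0 - math.exp(-l * x)) ** (a - 1.0) * math.exp(-l * x)
                  for p, a, l in zip(P_HAT, ALPHA_HAT, LAMBDA_HAT))
    return density / fitted_survival(x)

for years in (1.0, 2.0, 5.0, 8.0, 10.0):
    print(years, round(fitted_survival(years), 4), round(fitted_hazard(years), 4))
```

Tabulating the hazard at a few time points in this way makes it straightforward to compare the numbers with the plotted curve.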

From (6.1) and (6.2), we obtain the fitted survival function and hazard rate function of the real data, shown in Figure 1. In Figure 1, the hazard rate function is monotonically decreasing: it changes greatly in the first few years and then levels off after about eight years. For this disease, therefore, the risk to patients is higher shortly after the operation. To avoid recurrence during the recovery stage after the operation, we suggest regular check-ups.

Figure 1. The fitted survival function and hazard rate function of the real data.


To check the validity of the model, we adopt the Kolmogorov-Smirnov goodness-of-fit test statistic for the fitted distribution $\hat F(x)$. Define the maximum distance in the Kolmogorov-Smirnov goodness-of-fit test,

$$D_n(F)=\sup_{0\le x<\infty}\left|\hat F_n(x)-F(x\mid\hat\theta)\right|, \tag{6.3}$$

as the distance between the empirical distribution $\hat F_n(x)$ of the given complete data set and the fitted distribution function $F(x\mid\hat\theta)$, with $\hat\theta$ the MLE of the unknown parameter vector $\theta$. When progressively type-I interval-censored data are given, the empirical distribution in $D_n(F)$ is replaced by

$$\hat F(x_i)=1-\prod_{j=1}^{i}(1-\hat p_j),\quad i=1,\ldots,N \tag{6.4}$$

where

$$\hat p_j=\frac{d_j}{n-\sum_{k=0}^{j-1}d_k-\sum_{k=0}^{j-1}R_k},\quad j=1,\ldots,N.$$
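The product-limit form (6.4) and the distance (6.3) can be sketched as below. This Python illustration assumes $d_0=R_0=0$, evaluates the distance only at the inspection points, and uses made-up counts rather than the data of Table 5; the function names are hypothetical.

```python
def empirical_cdf(d, R, n):
    """Empirical distribution (6.4) at the inspection points.
    d[j] and R[j] are the failures in and removals at the end of interval j+1."""
    at_risk = n            # n minus all earlier failures and removals
    survival = 1.0
    F_hat = []
    for d_j, R_j in zip(d, R):
        p_hat = d_j / at_risk
        survival *= 1.0 - p_hat
        F_hat.append(1.0 - survival)
        at_risk -= d_j + R_j
    return F_hat

def ks_distance(F_hat, F_fit):
    """Largest absolute gap between empirical and fitted values at the inspection points."""
    return max(abs(e - f) for e, f in zip(F_hat, F_fit))

# Hypothetical example: n = 4 items, two intervals, one removal at the end.
F_hat = empirical_cdf([2, 1], [0, 1], 4)
print(F_hat, ks_distance(F_hat, [0.4, 0.8]))
```

Without removals the estimator reduces to the cumulative fraction of failures, which is a convenient check on the implementation.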

Fitting the data set in Table 5, we obtain the K-S distance 0.2406. So it is reasonable to say that the MGED provides a good fit for the data set in Table 5. Figure 2 compares the empirical distribution function with the fitted distribution function.

Figure 2. The empirical and fitted distribution function.

Note: The polyline represents the empirical distribution function; the dotted line represents the fitted distribution function.

7. Conclusion and remarks

This article considers parameter estimation for the MGED based on progressively type-I interval-censored data. Because the data are incomplete, the EM algorithm is used to obtain the maximum likelihood estimates. The performance of the estimates is then assessed by a simulation study under different setups, with effective results. Finally, survival data from a group of 374 patients who underwent operations in connection with a type of malignant disease are analyzed to illustrate the proposed estimation method.

Additional information

Funding

This research was partially supported by National Science Foundation of China (NSFC) [grant numbers 11301037, 11671054 and 11571051] and The Education Department of Jilin Province, "13th Five-Year" project planning 2016317.

Notes on contributors

Chunjie Wang

Chunjie Wang obtained her PhD degree in Probability and Mathematical Statistics from the School of Mathematics, Jilin University, Changchun, China. She is currently an associate professor at Changchun University of Technology (CCUT), Changchun, China. Her research interests include survival analysis and mathematical statistics.

Shuying Wang

Shuying Wang is currently pursuing her PhD at Jilin University, Changchun, China. Her research interests include mathematical statistics and survival analysis.

Dehui Wang

Dehui Wang is currently working as professor in the School of Mathematics in Jilin University, Changchun, China. His research interests include Mathematical Statistics, Time Series, and Survival Analysis.

Chunjing Li

Chunjing Li is currently pursuing her PhD at Jilin University, Changchun, China. Her research interests include mathematical statistics and survival analysis.

Xiaogang Dong

Xiaogang Dong obtained his PhD degree in Statistics from Jilin University, Changchun, China. He is currently working in Changchun University of Technology (CCUT), Changchun, China, as a professor. His research interests include High frequency data analysis, Survival analysis, Mathematical Statistics, and Applied Statistics.

References

  • Aggarwala, R. (2001). Progressive interval censoring: Some mathematical results with applications to inference. Communications in Statistics-Theory and Methods, 30, 1921–1935. doi:10.1081/STA-100105705
  • Balakrishnan, N., & Aggarwala, R. (2000). Progressive Censoring. Boston, MA: Birkhauser. doi:10.1007/978-1-4612-1334-5
  • Balakrishnan, N., Kannan, N., Lin, C. T., & Wu, S. J. (2004). Inference for the extreme value distribution under progressive Type-II censoring. Journal of Statistical Computation and Simulation, 74, 25–45. doi:10.1080/0094965031000105881
  • Berkson, J., & Gage, R. P. (1950). Calculation of survival rates for cancer. Proceedings of the Staff Meetings Mayo Clinic, 25, 270–286.
  • Chen, D. G., & Lio, Y. L. (2010). Parameter estimations for generalized exponential distribution under progressive type-I interval censoring. Computational Statistics and Data Analysis, 54, 1581–1591. doi:10.1016/j.csda.2010.01.007
  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39, 1–38.
  • Lawless, J. F. (2003). Statistical models and methods for lifetime data (2nd ed.). Hoboken, NJ: John Wiley & Sons.
  • Lin, C. T., Wu, S. J. S., & Balakrishnan, N. (2009). Planning life tests with progressively type-I interval censored data from the lognormal distribution. Journal of Statistical Planning and Inference, 139, 54–61. doi:10.1016/j.jspi.2008.05.016
  • Lio, Y. Z., Chen, D. G., & Tsai, T. R. (2011). Parameter estimations for generalized Rayleigh distribution under progressively type-I interval censored data. Open Journal of Statistics, 1, 46–57. doi:10.4236/ojs.2011.12006
  • McClean, S. (1986). Estimation for the mixed exponential distribution using grouped follow-up data. Journal of the Royal Statistical Society. Series C: Applied Statistics, 35, 31–37.
  • Ng, H., & Wang, Z. (2009). Statistical estimation for the parameters of Weibull distribution based on progressively type-I interval censored sample. Journal of Statistical Computation and Simulation, 79, 145–159. doi:10.1080/00949650701648822
  • Peng, X., & Yan, Z. (2011). Parameter estimations with gamma distribution based on progressive type-I interval censoring. IEEE International Conference on Computer Science and Automation Engineering, 1, 449–453.
  • Pradhan, B., & Gijo, E. V. (2013). Parameter estimation of lognormal distribution under progressive type-I interval censoring (Technical Report No. SQCOR-2013-02).
  • Soliman, A. A. (2006). Estimators for the finite mixture of Rayleigh model based on progressively censored data. Communications in Statistics-Theory and Methods, 35, 803–820. doi:10.1080/03610920500501379
  • Tian, Y., He, J., & Chen, P. (2008). Mixed generalized exponential with censored data. Journal of Chongqing Institute of Technology (Natural Science), 22, 79–81.
  • Tian, Y. Z., Tian, M. Z. I., & Chen, P. (2012). Parameter estimation for a mixture of generalized exponential distribution under grouped and right-censored samples. Chinese Journal of Applied Probability and Statistics, 28, 561–571.
  • Tian, Y. Z., Tian, M. Z., & Zhu, Q. Q. (2014). Estimating a finite mixed exponential distribution under progressively type-II censored data. Communications in Statistics-Theory and Methods, 43, 3762–3776. doi:10.1080/03610926.2012.752843
  • Tian, Y. Z., Zhu, Q. Q., & Tian, M. Z. (2013). Inference for mixed generalized exponential distribution under progressively type-II censored samples. Journal of Applied Statistics, 41, 660–676.