
A restricted gamma ridge regression estimator combining the gamma ridge regression and the restricted maximum likelihood methods of estimation

Pages 1696-1713 | Received 03 Aug 2021, Accepted 08 Nov 2021, Published online: 27 Nov 2021

Abstract

In this article, we propose a restricted gamma ridge regression estimator (RGRRE) that combines the gamma ridge regression (GRR) and restricted maximum likelihood (RMLE) estimators to combat the multicollinearity problem when estimating the parameter vector $\beta$ in the gamma regression model. The properties of the new estimator are discussed, and its superiority over the GRR, the RMLE and the traditional maximum likelihood estimator is analysed theoretically under different conditions. We also suggest some methods for estimating the optimal value of the shrinkage parameter. A Monte Carlo simulation study is conducted to judge the performance of the proposed estimator. Finally, an empirical application is analysed to show the benefit of the RGRRE over the existing estimators.

1. Introduction

The gamma regression model (GRM) is frequently used in medical sciences, health care economics and automobile insurance claims [Citation1]. The GRM is appropriate when the dependent variable is positively skewed and is assumed to follow a gamma distribution [Citation2]. The usual maximum likelihood estimator (MLE) of the parameter vector $\beta$ of the GRM is obtained by means of the iteratively reweighted least squares (IRWLS) algorithm; the MLE minimizes the weighted sum of squared errors. However, the MLE may not be suitable if it is suspected that $\beta$ belongs to a linear subspace defined by $F\beta=f$, where $f$ is an $m\times1$ vector of known elements and $F$ is a known $m\times p$ matrix of full rank with $m<p$. In this situation, a restricted maximum likelihood estimator (RMLE) should be used [Citation3,Citation4]. The problem of multicollinearity and its consequences for a GLM are well known; one of the most important consequences for both the restricted and unrestricted estimators in the GRM is that the regression coefficients have large variances. Further consequences are wider confidence intervals and decreased statistical power, which increases the probability of a type II error when testing hypotheses about the parameters [Citation5].

Unrestricted ridge estimators under the classical linear regression model and the GRM were considered by Hoerl and Kennard [Citation6] and Amin et al. [Citation7], respectively, to remedy the problems caused by multicollinearity. However, when prior information about the regression coefficients is available, it can be expressed as the linear restriction $F\beta=f$ in the GLM, and the coefficients are then estimated with a restricted estimator. For instance, linear restrictions arise in the sum-to-zero parameterization when the regressors are qualitative, and in quantal response problems with constant relative potency between drugs (see [Citation3] for more details). Therefore, the model of interest in this paper is the GRM subject to a system of linear restrictions. Nyquist [Citation3] proposed the RMLE for the GLM under linear restrictions on the parameters. However, in the presence of multicollinearity, the weighted matrix of cross products is ill-conditioned, which leads to instability and high variances of both the (unrestricted) MLE and the RMLE. To overcome the problem of multicollinearity, different kinds of shrinkage estimators have been proposed. For the linear regression model, Kaçiranlar et al. [Citation8] developed the restricted Liu estimator by combining the Liu estimator and the restricted least squares estimator (RLSE), and Sarkar [Citation9] proposed a restricted ridge regression approach by combining ordinary ridge regression and the RLSE. However, the literature on restricted ridge regression under the GRM is limited.

The aim of this paper is to propose a new restricted gamma ridge regression estimator (RGRRE) by combining the GRRE and the RMLE, following the work of Sarkar [Citation9]. The motivation behind the new estimator is as follows. When the linear restrictions on the regression coefficients are true, a reparameterization in the form of linear combinations of the regression coefficients is available. If multicollinearity is present under the reparametrized model, the RMLE must be modified along the lines of the GRRE to obtain efficient estimates. The mean square error (MSE) properties of the RGRRE are studied, and the superiority of the RGRRE over the RMLE is shown. We also suggest some methods for estimating the value of the ridge parameter.

The rest of this study is organized as follows: The GRM and existing estimators are discussed in Section 2. In Section 3, we derive the MSE properties of the proposed estimator and the theoretical comparison with existing estimators is considered. In Section 4, we suggest some estimators for the selection of the ridge parameter of the RGRRE. The design of the Monte Carlo simulation study and its results are provided in Section 5. A real-life dataset is analysed in Section 6. Some concluding remarks are presented in Section 7.

2. The gamma regression model and estimators

In the GLM framework, the response variable $y$ is assumed to follow an exponential family distribution with mean $\mu$. Let $y_1,y_2,\dots,y_n$ be $n$ independent observations, and suppose that the probability density function of $y_i$ is
$$f(y_i;\theta_i,\varphi)=\exp\left[\frac{y_i\theta_i-b(\theta_i)}{a(\varphi)}+c(y_i,\varphi)\right],$$
where $\theta_i$ is the location parameter, $\varphi$ is the dispersion parameter and $b(\theta_i)$ is the cumulant function. The GLM employs the relation
$$g(\mu_i)=\eta_i=x_i'\beta,\quad i=1,\dots,n,\tag{1}$$
where $E(y_i)=\mu_i=\partial b(\theta_i)/\partial\theta_i$, $g(\cdot)$ is a monotonic differentiable link function (here the log link), $\eta_i$ is the linear predictor, $\beta=(\beta_1,\beta_2,\dots,\beta_p)'$ is a $p\times1$ vector of regression coefficients and $x_i'$ is the $i$th row of the $n\times p$ matrix $X$ of $p$ non-stochastic explanatory variables. The log-likelihood function is
$$\ell(\beta)=\sum_{i=1}^n\left[\frac{y_i\theta_i-b(\theta_i)}{a(\varphi)}+c(y_i,\varphi)\right],$$
and the MLE of $\beta_j$ is obtained by differentiating it, which gives the score
$$\frac{\partial\ell(\beta)}{\partial\beta_j}=\frac{1}{a(\varphi)}\sum_{i=1}^n\frac{(y_i-\mu_i)}{\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_{ij},\quad j=1,2,\dots,p.$$
Suppose now that the $n$ independent observations of the response variable come from the gamma distribution, $y_i\sim\text{Gamma}(\mu,\phi)$, under the reparameterization with shape parameter $\phi^{-1}$ and scale parameter $\mu\phi$, where $\mu=E(y_i)$ is the mean function of the response variable and $\phi>0$ is the dispersion parameter (see [Citation7,Citation10] for more details of GRM estimation). With the log link $g(\mu_i)=\eta_i=x_i'\beta$ of Equation (1), the mean function of the GRM is $E(y_i)=\partial b(\theta_i)/\partial\theta_i=\mu_i=\exp(x_i'\beta)$. The most common estimation method for the GRM is the MLE, obtained by solving
$$\frac{\partial\ell}{\partial\beta}=\frac{d\ell}{d\theta_i}\frac{d\theta_i}{d\mu_i}\frac{d\mu_i}{d\eta_i}\frac{\partial\eta_i}{\partial\beta}=\sum_{i=1}^n\frac{(y_i-\mu_i)}{a(\varphi)\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_i=0.$$
Since this function is non-linear in $\beta$, the expression is solved iteratively through Fisher's scoring method. Let $\beta^{(t)}$ be the estimate of $\beta$ at iteration $t\ (t\ge1)$; then
$$\beta^{(t+1)}=\beta^{(t)}+\{I(\beta^{(t)})\}^{-1}S(\beta^{(t)}),$$
where $I(\beta^{(t)})$ is the Fisher information matrix and $S(\beta)$ is the $p\times1$ vector of scores, both computed at the current iterate $\beta^{(t)}$, and the iterations continue until convergence. After the final iteration, the MLE can be obtained by the IRWLS method as
$$\hat\beta_{MLE}=A^{-1}X'\hat Wz,\tag{2}$$
where $A=X'\hat WX$, $\hat W=\operatorname{diag}\{(1/\hat\mu_i^2)(d\hat\mu_i/d\hat\eta_i)^2\}$ and $z$ is the $n\times1$ vector with elements $z_i=x_i'\hat\beta+(y_i-\hat\mu_i)/\hat\mu_i$, all evaluated at the final iteration, with $\hat\mu_i=\exp(x_i'\hat\beta)$ the mean function of the response variable under the log link. Let $X_n=\hat W^{1/2}X$ and $Z=\hat W^{1/2}z$; then $\check\beta_{MLE}=(X_n'X_n)^{-1}X_n'Z$. Following Kibria and Saleh [Citation4], we assume two conditions: (i) $\frac1n(X_n'X_n)\to A$ as $n\to\infty$, and (ii) $\max_{1\le i\le n}x_{ni}'(X_n'X_n)^{-1}x_{ni}\to0$ as $n\to\infty$, where $A=X'\hat WX$ is a finite positive definite matrix and $x_{ni}'$ is the $i$th row of the matrix $X_n$. The asymptotic distribution of $\check\beta_{MLE}$ satisfies $n^{1/2}F(\check\beta_{MLE}-\beta)\to N_m[0,FA^{-1}F'].$
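To make the fitting procedure concrete, the following is a minimal Python sketch of the IRWLS fit in Equation (2), assuming a log link and that the first column of $X$ is a constant intercept column; the function name, starting value and convergence rule are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def gamma_mle_irwls(X, y, tol=1e-8, max_iter=100):
    """Sketch of Fisher scoring / IRWLS for a gamma GLM with log link, Equation (2)."""
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())          # crude starting value (assumes column 0 is the intercept)
    A = None
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)                # log link: mu_i = exp(x_i' beta)
        w = mu**2 / mu**2               # (dmu/deta)^2 / V(mu); reduces to 1 for gamma + log link
        z = eta + (y - mu) / mu         # adjusted response z_i = eta_i + (y_i - mu_i)/mu_i
        A = X.T @ (w[:, None] * X)      # A = X' W X
        beta_new = np.linalg.solve(A, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, A
        beta = beta_new
    return beta, A
```

The returned matrix $A=X'\hat WX$ is reused by the restricted and ridge estimators sketched below.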

2.1. The RMLE

One way to improve the efficiency of estimators is to use extraneous or prior information. In practice, such prior information may be available about the regression coefficients. For instance, in applied economics, constant returns to scale imply that the exponents in a Cobb–Douglas production function should sum to unity. As a second example, the absence of money illusion on the part of consumers implies that the sum of the money income and price elasticities in a demand function should be zero. This type of prior information may come from an extraneous source, user experience or theoretical considerations, among others. To exploit such information when estimating the regression coefficients, it can be stated in the form of linear restrictions.

Our primary aim is to estimate $\beta=(\beta_1,\beta_2,\dots,\beta_p)'$ when it is suspected that $\beta$ belongs to the linear subspace defined by
$$F_r'\beta=f_r,\quad r=1,2,\dots,m,\tag{3}$$
where the $f_r$ are known scalars and the $F_r$ are known $p\times1$ vectors defining $m$ linearly independent restrictions on the parameter vector $\beta$. In such a situation, we may use a restricted estimator of $\beta$ [Citation3,Citation4,Citation11,Citation12]. The RMLE [Citation3] maximizes the log-likelihood function of the GRM over $\beta$ under the restrictions $F_r'\beta=f_r$. One method for solving restricted optimization problems is the quadratic penalty function, which for the RMLE is
$$\Theta(\beta,\lambda)=\ell(\beta)-\frac12\sum_{r=1}^m\lambda_r(f_r-F_r'\beta)^2,$$
and $\max_\beta\Theta(\beta,\lambda)$ is sought for fixed, positive values of $\lambda_r$. Differentiating $\Theta(\beta,\lambda)$ with respect to $\beta_j$ gives
$$P_j(\beta,\lambda)=\sum_{i=1}^n\frac{(y_i-\mu_i)}{a(\varphi)\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_{ij}+\sum_{r=1}^m\lambda_rF_{rj}(f_r-F_r'\beta).$$
We compute the RMLE by an iterative method similar to that used for the unrestricted estimator. The $(t+1)$th approximation of the RMLE finally yields
$$\hat\beta_{RMLE}=\hat\beta_{MLE}+A^{-1}F'(FA^{-1}F')^{-1}(f-F\hat\beta_{MLE}),\tag{4}$$
where $F$ is the $m\times p$ matrix with rows $F_r'$ and $f=(f_1,\dots,f_m)'$. For testing the hypothesis $H_0:F\beta=f$, the Wald-type test statistic is
$$L_n=n(F\check\beta_{MLE}-f)'[F(n^{-1}A_n)^{-1}F']^{-1}(F\check\beta_{MLE}-f).$$
Under $H_0:F\beta=f$, $L_n\xrightarrow{D}\chi_m^2$ as $n\to\infty$, i.e. $L_n$ follows a central $\chi^2$ distribution with $m$ degrees of freedom.

Clearly, $E(\hat\beta_{RMLE})\ne\beta$ unless (3) holds:
$$\operatorname{Bias}(\hat\beta_{RMLE})=E(\hat\beta_{RMLE})-\beta=A^{-1}F'(FA^{-1}F')^{-1}(f-F\beta).$$
The variance–covariance matrix of $\hat\beta_{RMLE}$ is
$$\operatorname{Cov}(\hat\beta_{RMLE})=\varphi[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}],$$
where $\varphi$ is the dispersion parameter, estimated as $\hat\varphi=\frac{1}{n-p}\sum_{i=1}^n\frac{(y_i-\hat\mu_i)^2}{\hat\mu_i^2}$.

The RMLE is superior to the MLE since it has a smaller sampling variance. The asymptotic variance–covariance matrix of $\hat\beta_{MLE}$ is $\operatorname{Cov}(\hat\beta_{MLE})=\varphi A^{-1}$. Therefore,
$$\operatorname{Cov}(\hat\beta_{MLE})-\operatorname{Cov}(\hat\beta_{RMLE})=\varphi A^{-1}-\varphi[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]=\varphi A^{-1}F'(FA^{-1}F')^{-1}FA^{-1},$$
which is a positive semidefinite (psd) matrix, so the RMLE has a smaller sampling variance than the MLE. Amin et al. [Citation7] proposed the GRRE for the GRM:
$$\hat\beta(k)=A_k\hat\beta_{MLE},\quad k\ge0,\tag{5}$$
where $A_k=(I_p+kA^{-1})^{-1}$, $I_p$ is the identity matrix of order $p\times p$ and $k\ge0$ is the ridge parameter. The MSE properties of $\hat\beta(k)$ are given by Amin et al. [Citation7].
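As an illustration, here is a minimal Python sketch of Equations (4) and (5), assuming `beta_mle` and `A` come from an IRWLS fit such as the one sketched above; `rmle` and `grre` are hypothetical helper names, not from the paper.

```python
import numpy as np

def rmle(beta_mle, A, F, f):
    """Restricted MLE, Equation (4): correct the MLE towards the subspace F beta = f."""
    Ainv = np.linalg.inv(A)
    M = F @ Ainv @ F.T                  # F A^{-1} F'
    return beta_mle + Ainv @ F.T @ np.linalg.solve(M, f - F @ beta_mle)

def grre(beta_mle, A, k):
    """Gamma ridge estimator, Equation (5): A_k beta_mle with A_k = (I_p + k A^{-1})^{-1}."""
    p = A.shape[0]
    return np.linalg.solve(np.eye(p) + k * np.linalg.inv(A), beta_mle)
```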

3. Proposed estimator

The RMLE of $\beta$ is obtained by maximizing the log-likelihood function of the GRM subject to the restrictions given in Equation (3). Under multicollinearity, however, the reparametrized model is ill-conditioned and the RMLE may produce poor estimates and misleading inference, just as the MLE does. Therefore, following the work of Sarkar [Citation9], the RMLE must be modified along the lines of the GRRE to obtain efficient estimates under the set of linear restrictions. The quadratic function for the RGRRE is defined as
$$\Theta(\beta,k,\lambda)=\beta'\beta-\frac2k\ell(\beta)+\sum_{r=1}^m\lambda_r(f_r-F_r'\beta)^2,$$
where $1/k$ plays the role of a Lagrange multiplier. The quadratic function $\Theta(\beta,k,\lambda)$ combines the objective functions of the GRRE [Citation13] and the RMLE [Citation3]; the same idea was carried out by Kurtoğlu and Özkale [Citation14] for count regression models, where the dispersion parameter equals 1. Differentiating $\Theta(\beta,k,\lambda)$ with respect to $\beta_j$ gives
$$P_j(\beta,k,\lambda)=2\left[\beta_j-\frac1k\sum_{i=1}^n\frac{(y_i-\mu_i)}{a(\varphi)\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_{ij}-\sum_{r=1}^m\lambda_rF_{rj}(f_r-F_r'\beta)\right].$$
Now let $H(\beta,k,\lambda)$ be the $p\times p$ matrix with elements $h_{jq}(\beta,k,\lambda)$ obtained from the second-order derivatives:
$$h_{jq}(\beta,k,\lambda)=\frac{\partial^2\Theta(\beta,k,\lambda)}{\partial\beta_j\partial\beta_q}=2\left[\delta_{jq}-\frac1k\frac{\partial^2\ell(\beta)}{\partial\beta_j\partial\beta_q}+\sum_{r=1}^m\lambda_rF_{rj}F_{rq}\right].$$
Taking expectations on both sides, we have
$$E\left[\frac{\partial^2\Theta(\beta,k,\lambda)}{\partial\beta_j\partial\beta_q}\right]=2\left[\delta_{jq}-\frac1kE\left\{\frac{\partial^2\ell(\beta)}{\partial\beta_j\partial\beta_q}\right\}+\sum_{r=1}^m\lambda_rF_{rj}F_{rq}\right]=2\left[\delta_{jq}+\frac1k\sum_{i=1}^n\frac{x_{ij}x_{iq}}{a(\varphi)}\frac{1}{\mu_i^2}\left(\frac{d\mu_i}{d\eta_i}\right)^2+\sum_{r=1}^m\lambda_rF_{rj}F_{rq}\right],$$
where $\delta_{jq}=1$ if $j=q$ and $\delta_{jq}=0$ otherwise.

Let $\beta_F(k)$ denote the RGRRE of $\beta$ at iteration $t\ (t\ge1)$. The Fisher scoring method then yields
$$\beta_F(k)^{(t+1)}=\beta_F(k)^{(t)}-H(\beta_F^{(t)},k,\lambda)^{-1}P(\beta_F^{(t)},k,\lambda),$$
where $P$ is the gradient vector with elements $P_j$. Based on the objective functions $\Theta(\beta,\lambda)$ of the RMLE and $\Theta(\beta,k,\lambda)$ of the GRRE, the RGRRE is defined as
$$\hat\beta_F(k)=\hat\beta(k)+(A+kI_p)^{-1}F'[F(A+kI_p)^{-1}F']^{-1}(f-F\hat\beta_{MLE}),\quad k>0.\tag{6}$$
The final form of the restricted ridge estimator in the GLM was proposed by Kurtoğlu and Özkale [Citation14] and is the same for the GRM. However, the MSE properties for the GRM differ from those for count regression models, because the dispersion parameter enters the estimation of the optimal shrinkage parameter $k$ in the RGRRE. The performance of shrinkage estimators also differs across the different forms of the GLM in restricted estimation (e.g. [Citation4,Citation10–12]).

The RGRRE can be simplified to
$$\hat\beta_F(k)=A_k\hat\beta_{RMLE},\tag{6a}$$
where $A_k=(I_p+kA^{-1})^{-1}$ and $\hat\beta_{RMLE}=\hat\beta_{MLE}+A^{-1}F'(FA^{-1}F')^{-1}(f-F\hat\beta_{MLE})$. It is easily seen that $\hat\beta_F(k)=\hat\beta_{RMLE}$ when $k=0$.
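In code, Equation (6a) is a short composition of the two sketches above (again a sketch under the same assumptions, reusing the hypothetical `rmle` helper):

```python
import numpy as np

def rgrre(beta_mle, A, F, f, k):
    """RGRRE, Equation (6a): A_k @ beta_rmle; reduces to the RMLE at k = 0."""
    p = A.shape[0]
    beta_r = rmle(beta_mle, A, F, f)    # restricted MLE from the sketch above
    return np.linalg.solve(np.eye(p) + k * np.linalg.inv(A), beta_r)
```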

We compute the bias of $\hat\beta_F(k)$ by using (6a) as
$$\operatorname{Bias}(\hat\beta_F(k))=E[\hat\beta_F(k)]-\beta=[A_k\beta+A_kA^{-1}F'(FA^{-1}F')^{-1}(f-F\beta)]-\beta=(A_k-I_p)\beta+A_kA^{-1}F'(FA^{-1}F')^{-1}(f-F\beta)=-k(A+kI_p)^{-1}\beta+A_kA^{-1}F'(FA^{-1}F')^{-1}(f-F\beta).$$
So the RGRRE is a biased estimator of the parameter vector $\beta$ unless $k=0$ and $f-F\beta=0$, where $0$ is the $m\times1$ null vector.

The variance–covariance matrix of $\hat\beta_F(k)$ is computed as
$$\operatorname{Cov}(\hat\beta_F(k))=\varphi A_k[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]A_k'.$$

3.1. MSE properties and superiority of $\hat\beta_F(k)$

In this section, we derive the MSE properties of $\hat\beta_F(k)$ and gauge its performance. The matrix mean squared error (MMSE) of an estimator $\hat\upsilon$ of $\upsilon$ is defined as
$$\operatorname{MMSE}(\hat\upsilon)=E(\hat\upsilon-\upsilon)(\hat\upsilon-\upsilon)'=\operatorname{Cov}(\hat\upsilon)+\operatorname{Bias}(\hat\upsilon)\operatorname{Bias}(\hat\upsilon)',\tag{7}$$
where $\operatorname{Cov}(\hat\upsilon)$ denotes the variance–covariance matrix of $\hat\upsilon$ and $\operatorname{Bias}(\hat\upsilon)=E(\hat\upsilon)-\upsilon$ is the bias vector. The scalar MSE (SMSE) of $\hat\upsilon$ is found by applying the trace operator:
$$\operatorname{SMSE}(\hat\upsilon)=\operatorname{tr}[\operatorname{MMSE}(\hat\upsilon)]=\operatorname{tr}[\operatorname{Cov}(\hat\upsilon)]+\operatorname{Bias}(\hat\upsilon)'\operatorname{Bias}(\hat\upsilon).\tag{8}$$
Let $\hat\upsilon_j\ (j=1,2)$ be two competing estimators of the parameter $\upsilon$. The estimator $\hat\upsilon_2$ is said to be superior to $\hat\upsilon_1$ under the MMSE criterion if $\operatorname{MMSE}(\hat\upsilon_1)-\operatorname{MMSE}(\hat\upsilon_2)\ge0$. Moreover, if $\hat\upsilon_2$ dominates $\hat\upsilon_1$ under the MMSE criterion, then $\operatorname{SMSE}(\hat\upsilon_2)\le\operatorname{SMSE}(\hat\upsilon_1)$. We list the following lemmas, which are used in the comparisons below.
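For reference, Equations (7) and (8) translate directly into a few lines of Python (a sketch; `mmse` and `smse` are illustrative names):

```python
import numpy as np

def mmse(cov, bias):
    """Matrix MSE, Equation (7): Cov + bias bias'."""
    return cov + np.outer(bias, bias)

def smse(cov, bias):
    """Scalar MSE, Equation (8): tr(Cov) + bias' bias."""
    return float(np.trace(cov) + bias @ bias)
```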

Lemma 1.

Let $A$ be a real symmetric matrix. Then $A\ge0$ if and only if $P'AP\ge0$ for every matrix $P$ of proper order, which holds if and only if each eigenvalue of $A$ is nonnegative.

Proof:

See Wang et al. [Citation15].

Lemma 2.

Suppose that the matrices $A$ and $B$ are nonsingular, and that $C$ and $D$ are matrices of proper orders. Then
$$(A+CBD)^{-1}=A^{-1}-A^{-1}C(B^{-1}+DA^{-1}C)^{-1}DA^{-1}.$$

Proof:

See Rao et al. [Citation16].

3.1.1. Comparison between $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$ when the prior restrictions on the parameters are true, i.e. $\tau=(f-F\beta)=0$

It is evident that $E(\hat\beta_{RMLE})\ne\beta$ unless (3) holds. The $\hat\beta_{RMLE}$ is an unbiased estimator when $\tau=0$, while $\hat\beta_F(k)$ is always a biased estimator.

Theorem 3.1.

Under the GRM, $\hat\beta_F(k)$ is a biased estimator and $\hat\beta_{RMLE}$ is an unbiased estimator when $\tau=0$. Nevertheless, $\operatorname{Cov}(\hat\beta_F(k))\le\operatorname{Cov}(\hat\beta_{RMLE})$ for $k\ge0$.

Proof:

To compare the variance–covariance matrices of $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$, we compute their difference:
$$\operatorname{Cov}(\hat\beta_{RMLE})-\operatorname{Cov}(\hat\beta_F(k))=\varphi[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]-\varphi A_k[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]A_k'=\varphi A_k\{k^2A^{-1}GA^{-1}+kGA^{-1}+kA^{-1}G\}A_k',\tag{9}$$
where $G=A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}$ ($G$ is psd), $A=X'\hat WX$ ($A$ is positive definite) and $A_k=(I_p+kA^{-1})^{-1}$. From (9), we conclude that $A^{-1}G$ is a nonnegative definite (nnd) matrix and $A^{-1}GA^{-1}$ is a psd matrix. Therefore $[\operatorname{Cov}(\hat\beta_{RMLE})-\operatorname{Cov}(\hat\beta_F(k))]$ is psd for $k\ge0$. Hence the sampling variance of $\hat\beta_F(k)$ is smaller than that of $\hat\beta_{RMLE}$.

Now we discuss the SMSE properties of $\hat\beta_F(k)$ and compare $\hat\beta_F(k)$ with $\hat\beta_{RMLE}$. Following [Citation11] and [Citation3], the SMSE of $\hat\beta_{RMLE}$ for the GRM is
$$\operatorname{SMSE}(\hat\beta_{RMLE})=\varphi\operatorname{tr}(G)=\varphi\sum_{j=1}^pm_{jj},\tag{10}$$
where $m_{jj}\ge0$ is the $j$th diagonal element of the matrix $M=Q'GQ$ and $Q$ is the orthogonal matrix such that $Q'AQ=\Lambda=\operatorname{diag}(\lambda_1,\lambda_2,\dots,\lambda_p)$. The SMSE of the RGRRE is computed as
$$\operatorname{SMSE}(\hat\beta_F(k))=\operatorname{tr}[\operatorname{Cov}(\hat\beta_F(k))]+[\operatorname{Bias}(\hat\beta_F(k))]'[\operatorname{Bias}(\hat\beta_F(k))].\tag{11}$$

If we assume that the prior restrictions hold, i.e. $\tau=0$, then (11) may be written as
$$\operatorname{SMSE}(\hat\beta_F(k))=\varphi\operatorname{tr}(A_kGA_k')+k^2\beta'(A+kI_p)^{-2}\beta.$$
After simplification, the SMSE of the RGRRE becomes
$$\operatorname{SMSE}(\hat\beta_F(k))=\varphi\sum_{j=1}^p\frac{\lambda_j^2}{(\lambda_j+k)^2}m_{jj}+k^2\sum_{j=1}^p\frac{\alpha_j^2}{(\lambda_j+k)^2}=\gamma_1(F_k)+\gamma_2(F_k),\tag{12}$$
where $\alpha_j$ is the $j$th element of $Q'\beta$, $\lambda_j$ is the $j$th eigenvalue of the matrix $A$, and $\gamma_1(F_k)$ and $\gamma_2(F_k)$ denote the total variance and the squared bias of $\hat\beta_F(k)$, respectively. From (12), note that the bias of the RGRRE is the same as the bias of the GRRE when the prior restrictions on the parameters are true, i.e. $\tau=(f-F\beta)=0$. Following [Citation6], we establish the following SMSE properties of $\hat\beta_F(k)$ for the GRM.
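Equation (12) is easy to evaluate numerically from the spectral quantities just defined, which is convenient for tracing the variance/bias trade-off in $k$; the sketch below assumes `lam` (eigenvalues of $A$), `m` (diagonal of $M=Q'GQ$), `alpha` ($=Q'\beta$) and `phi` are available.

```python
import numpy as np

def smse_rgrre(k, lam, m, alpha, phi):
    """Equation (12): SMSE of the RGRRE when tau = 0, as variance plus squared bias."""
    gamma1 = phi * np.sum(lam**2 * m / (lam + k) ** 2)   # total variance gamma_1(F_k)
    gamma2 = k**2 * np.sum(alpha**2 / (lam + k) ** 2)    # squared bias gamma_2(F_k)
    return gamma1 + gamma2
```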

Theorem 3.2.

The total variance $\gamma_1(F_k)$ is a continuous and monotonically decreasing function of $k$.

Proof:

Differentiating $\gamma_1(F_k)$ with respect to $k$ gives
$$\frac{\partial\gamma_1(F_k)}{\partial k}=-2\varphi\sum_{j=1}^p\frac{\lambda_j^2}{(\lambda_j+k)^3}m_{jj}.\tag{13}$$
Based on (13), it is evident that $\gamma_1(F_k)$ is a continuous and monotonically decreasing function of $k$, since $\partial\gamma_1(F_k)/\partial k<0$ for $k\ge0$ and $\lambda_p>0$.

Theorem 3.3.

The squared bias $\gamma_2(F_k)$ is a continuous and monotonically increasing function of $k$.

Proof:

Differentiating $\gamma_2(F_k)$ with respect to $k$ gives
$$\frac{\partial\gamma_2(F_k)}{\partial k}=2k\sum_{j=1}^p\frac{\lambda_j\alpha_j^2}{(\lambda_j+k)^3}.\tag{14}$$
Equation (14) indicates that $\gamma_2(F_k)$ is a continuous and monotonically increasing function of $k$ for $k>0$ and $\lambda_j>0$.

Theorem 3.4.

Under the GRM, there always exists a $k>0$ in the range $0<k<\varphi/[\max_j(\alpha_j^2/(\lambda_jm_{jj}))]$ such that $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta_{RMLE})$ when $\tau=0$.

Proof:

The first derivative of (12) with respect to $k$ is
$$\frac{\partial\operatorname{SMSE}(\hat\beta_F(k))}{\partial k}=-2\varphi\sum_{j=1}^p\frac{\lambda_j^2}{(\lambda_j+k)^3}m_{jj}+2k\sum_{j=1}^p\frac{\lambda_j\alpha_j^2}{(\lambda_j+k)^3}=2\sum_{j=1}^p\frac{k\lambda_j\alpha_j^2-\varphi\lambda_j^2m_{jj}}{(\lambda_j+k)^3}.\tag{15}$$
As discussed earlier, $m_{jj}\ge0$ and $\lambda_j>0$, $j=1,2,\dots,p$, and the total variance and squared bias are monotonically decreasing and increasing functions of $k$, so $\partial\gamma_1(F_k)/\partial k$ and $\partial\gamma_2(F_k)/\partial k$ are always non-positive and non-negative, respectively. To prove the theorem, it therefore suffices to require $0<k<\varphi/[\max_j(\alpha_j^2/(\lambda_jm_{jj}))]$, for which every term of (15), and hence $\partial\operatorname{SMSE}(\hat\beta_F(k))/\partial k$, is negative. This establishes the existence of a value of $k$ for which the RGRRE is superior to the RMLE.

Remark.

From Theorem 3.4, note that $M=Q'GQ=\Lambda^{-1}-B$, where $B=Q'A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}Q$ and $M$ is a psd matrix. Consequently, $m_{jj}=\frac{1}{\lambda_j}-b_{jj}\le\frac{1}{\lambda_j}$, where $b_{jj}$ is the $j$th diagonal element of $B$. It can also be noted that $\varphi/[\max_j(\alpha_j^2/(\lambda_jm_{jj}))]\le\varphi/\alpha_{\max}^2$. Therefore, one beneficial consequence of involving exact prior information is that the range of values of $k$ for which the RGRRE dominates the RMLE is narrower than the corresponding range for which the GRRE dominates the traditional MLE in the SMSE sense.

3.1.2. Comparison between $\hat\beta_F(k)$ and $\hat\beta(k)$ when the prior restrictions on the parameters are true, i.e. $\tau=(f-F\beta)=0$

This section compares the performance of $\hat\beta_F(k)$ and $\hat\beta(k)$.

Theorem 3.5.

Under the GRM, $\operatorname{Cov}(\hat\beta_F(k))\le\operatorname{Cov}(\hat\beta(k))$ for $k\ge0$ if $\tau=0$.

Proof:

Both $\hat\beta_F(k)$ and $\hat\beta(k)$ have the same bias iff $\tau=0$, namely
$$\operatorname{Bias}(\hat\beta_F(k))=\operatorname{Bias}(\hat\beta(k))=-k(A+kI_p)^{-1}\beta.$$
Thus we only need to compare the dispersion matrices of $\hat\beta_F(k)$ and $\hat\beta(k)$:
$$\operatorname{Cov}(\hat\beta(k))-\operatorname{Cov}(\hat\beta_F(k))=\varphi[W(k)^{-1}AW(k)^{-1}]-\varphi A_k[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]A_k'=\varphi[W(k)^{-1}F'(FA^{-1}F')^{-1}FW(k)^{-1}],\tag{16}$$
where $W(k)=(A+kI_p)$, $A_k=W(k)^{-1}A$ and $W(k)^{-1}=A_kA^{-1}$. The difference $[\operatorname{Cov}(\hat\beta(k))-\operatorname{Cov}(\hat\beta_F(k))]$ is a psd matrix for $k\ge0$, which suffices to show that the RGRRE is superior to the GRRE.

3.1.3. Comparison between $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$ when $\tau\ne0$

When the prior constraints on the parameters are not true, the performance of $\hat\beta_F(k)$ ultimately depends on the assumed restrictions and on its bias. To compare $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$ in the SMSE sense, we give Theorem 3.6.

Theorem 3.6.

Under the GRM with collinear regressors, let $\check k>0$, where $\check k=\dfrac{\min_j(\varphi\lambda_j^2m_{jj}+\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j)}{\max_j(\lambda_j\alpha_j^2+\alpha_j\breve\tau_j)}$. Then $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta_{RMLE})$ for $0<k<\check k<\infty$ when $\tau\ne0$.

Proof:

Rewrite the SMSEs of $\hat\beta_{RMLE}$ and $\hat\beta_F(k)$ when the restrictions do not hold. First,
$$\operatorname{SMSE}(\hat\beta_{RMLE})=\varphi\operatorname{tr}(G)+\tau^{*\prime}A^{-2}\tau^{*},$$
where $\tau^{*}=F'(FA^{-1}F')^{-1}\tau$. The final form of $\operatorname{SMSE}(\hat\beta_{RMLE})$ is
$$\operatorname{SMSE}(\hat\beta_{RMLE})=\sum_{j=1}^p\frac{\varphi\lambda_j^2m_{jj}+\breve\tau_j^2}{\lambda_j^2},\tag{17}$$
where $\breve\tau:=Q'\tau^{*}$. The SMSE of $\hat\beta_F(k)$ may be rewritten as
$$\delta(k):=\operatorname{SMSE}(\hat\beta_F(k))=\varphi\operatorname{tr}(A_kGA_k')+[W(k)^{-1}\tau^{*}-kW(k)^{-1}\beta]'[W(k)^{-1}\tau^{*}-kW(k)^{-1}\beta]=\varphi\sum_{j=1}^p\frac{\lambda_j^2m_{jj}}{(\lambda_j+k)^2}+\sum_{j=1}^p\frac{k^2\alpha_j^2+\breve\tau_j^2-2k\alpha_j\breve\tau_j}{(\lambda_j+k)^2}.\tag{18}$$
Taking the first derivative of (18) with respect to $k$ completes the theorem:
$$\delta'(k)=-2\varphi\sum_{j=1}^p\frac{\lambda_j^2m_{jj}}{(\lambda_j+k)^3}+2\sum_{j=1}^p\frac{k\lambda_j\alpha_j^2+k\alpha_j\breve\tau_j-\breve\tau_j^2-\alpha_j\lambda_j\breve\tau_j}{(\lambda_j+k)^3}=2\sum_{j=1}^p\frac{k(\lambda_j\alpha_j^2+\alpha_j\breve\tau_j)-(\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j+\varphi\lambda_j^2m_{jj})}{(\lambda_j+k)^3},$$
so that
$$\delta'(0)=-2\sum_{j=1}^p\frac{\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j+\varphi\lambda_j^2m_{jj}}{\lambda_j^3}.$$
Additionally,
$$\delta(0)=\sum_{j=1}^p\frac{\varphi\lambda_j^2m_{jj}+\breve\tau_j^2}{\lambda_j^2}=\operatorname{SMSE}(\hat\beta_{RMLE}).$$
Thus, to prove the theorem, it is enough to show that $\delta'(k)$ is negative. If $\delta'(0)<0$, then for $0<k<\check k<\infty$, where $\check k=\min_j(\varphi\lambda_j^2m_{jj}+\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j)/\max_j(\lambda_j\alpha_j^2+\alpha_j\breve\tau_j)$, we have $\delta(k)<\delta(0)$, i.e. $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta_{RMLE})$.

3.1.4. Comparison between $\hat\beta_F(k)$ and $\hat\beta(k)$ when $\tau\ne0$

Having examined the performance of $\hat\beta_F(k)$ in the previous section, it is now straightforward to show the superiority of $\hat\beta_F(k)$ over $\hat\beta(k)$.

Theorem 3.7.

Under the GRM with collinear regressors and $\tau\ne0$, $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta(k))$ for $0<\dfrac{\max_j(\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj}))}{\min_j(2\alpha_j\breve\tau_j)}<k<\infty$.

Proof:

The SMSE of $\hat\beta(k)$ is
$$\operatorname{SMSE}(\hat\beta(k))=\varphi\sum_{j=1}^p\frac{\lambda_j}{(\lambda_j+k)^2}+k^2\sum_{j=1}^p\frac{\alpha_j^2}{(\lambda_j+k)^2}.\tag{19}$$
Subtracting (18) from (19), the SMSE difference is
$$\operatorname{SMSE}(\hat\beta(k))-\operatorname{SMSE}(\hat\beta_F(k))=\sum_{j=1}^p\frac{2k\alpha_j\breve\tau_j-[\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj})]}{(\lambda_j+k)^2}.$$
The expression $2k\alpha_j\breve\tau_j-[\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj})]>0$ if $k>\max_j(\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj}))/\min_j(2\alpha_j\breve\tau_j)>0$. This completes the theorem.

4. Estimation of the shrinkage parameter k

In practice, the shrinkage parameter $k$ must be estimated so that the RMLE and RGRRE attain smaller MSE than the MLE and GRRE. For this purpose, we find the optimal value of $k$ by differentiating (12) with respect to $k$ and equating to zero:
$$\frac{\partial\operatorname{SMSE}(\hat\beta_F(k))}{\partial k}=2\sum_{j=1}^p\frac{k\lambda_j\alpha_j^2-\varphi\lambda_j^2m_{jj}}{(\lambda_j+k)^3}=0.\tag{20}$$
Setting each term of (20) to zero simplifies the condition to
$$\varphi\lambda_j^2m_{jj}-k\lambda_j\alpha_j^2=0.\tag{21}$$
Solving (21) for $k$ gives
$$k_j=\frac{\varphi\lambda_jm_{jj}}{\alpha_j^2},\tag{22}$$
where $\alpha_j^2$ is the square of $\alpha_j$, the $j$th element of the vector $\alpha=Q'\beta$; the details of these matrices were discussed in the previous section. Based on the optimal value (22), and motivated by the work in [Citation17,Citation18], where different methods were proposed for estimating the shrinkage parameter, we propose the following ridge estimators for the RGRRE:
$$k_1=\hat k_{mean}=\frac1p\sum_{j=1}^p\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2};\qquad k_2=\hat k_{median}=\operatorname{median}_j\left(\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right);$$
$$k_3=\hat k_{max}=\max_j\left(\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right);\qquad k_4=\hat k_{min}=\min_j\left(\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right);$$
$$k_5=\hat k_{HM}=\frac{p}{\sum_{j=1}^p\hat\alpha_j^2/(\hat\varphi\lambda_jm_{jj})};\qquad k_6=\hat k_{GM}=\left(\prod_{j=1}^p\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right)^{1/p}.$$
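A minimal sketch of how the six candidates can be computed from the fitted quantities (assuming `A`, `G`, `beta_mle` and `phi_hat` as in the earlier sketches; `shrinkage_candidates` is an illustrative name):

```python
import numpy as np

def shrinkage_candidates(A, G, beta_mle, phi_hat):
    """Candidate ridge parameters k_1..k_6 built from k_j = phi lambda_j m_jj / alpha_j^2."""
    lam, Q = np.linalg.eigh(A)           # A = Q diag(lam) Q'
    m = np.diag(Q.T @ G @ Q)             # m_jj, diagonal of M = Q' G Q
    alpha = Q.T @ beta_mle               # alpha = Q' beta
    kj = phi_hat * lam * m / alpha**2    # Equation (22), element-wise
    return {
        "k1_mean": kj.mean(),
        "k2_median": np.median(kj),
        "k3_max": kj.max(),
        "k4_min": kj.min(),
        "k5_harmonic": len(kj) / np.sum(1.0 / kj),
        "k6_geometric": np.exp(np.mean(np.log(kj))),  # valid when all k_j > 0
    }
```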

5. The Monte Carlo simulation

In this section, we present a Monte Carlo simulation study to assess the performance of our proposed estimators under a range of conditions, with the MSE as the assessment criterion.

5.1. The design of the simulation

The response variable of the GRM is generated using pseudo-random numbers from the $G(\mu_i,\varphi)$ distribution with the log link function, where
$$\mu_i=\exp(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\dots+\beta_px_{ip}),\quad i=1,\dots,n.\tag{23}$$
The parametric values in (23) are selected so that $\sum_{j=1}^p\beta_j^2=1$, which is a common constraint in simulation studies [Citation7].

Following the work of McDonald and Galarneau [Citation19], the explanatory variables are generated by
$$x_{ij}=(1-\rho^2)^{1/2}u_{ij}+\rho u_{i,p+1},\quad i=1,2,\dots,n,\ j=1,2,\dots,p,$$
where the $u_{ij}$ are pseudo-random numbers generated from the standard normal distribution and $\rho$ is specified so that the correlation between any two explanatory variables is $\rho^2$. To see the effect of correlation in the simulation, we consider $\rho^2$ = 0.90, 0.95, 0.99 and 0.999. Additionally, we consider seven sample sizes, n = 50, 100, 150, 200, 300, 400 and 500, to examine the effect of multicollinearity clearly. The values of $\beta$ affect the value of $F\beta-f$, which measures the correctness of the restrictions, and the performance of the estimators depends on its magnitude. Therefore, following the work of Månsson et al. [Citation11], the restrictions imposed for p = 4 are
$$F=\begin{bmatrix}1&0&1&1\\2&1&1&1\end{bmatrix}\quad\text{and}\quad f=\begin{bmatrix}0\\0\end{bmatrix},$$
and the restrictions for p = 8 are set to
$$F=\begin{bmatrix}1&0&1&1&2&1&1&1\\3&1&3&1&1&1&2&1\end{bmatrix}\quad\text{and}\quad f=\begin{bmatrix}0\\0\end{bmatrix}.$$
In the simulation procedure, the dispersion parameter is estimated by the Pearson method, i.e. $\hat\varphi=(n-p)^{-1}\sum_{i=1}^n(y_i-\hat\mu_i)^2/\hat\mu_i^2$, for each sample size and number of explanatory variables. For each combination of the various values of $n$, $p$ and $\rho$, the experiment is replicated R = 2000 times, and the simulated MSE is computed as [Citation20,Citation21]
$$\operatorname{MSE}(\hat\beta)=\frac1R\sum_{i=1}^R(\hat\beta_{(i)}-\beta)'(\hat\beta_{(i)}-\beta),$$
where $R$ is the number of replications, the subscript $(i)$ refers to the $i$th replication and $\hat\beta_{(i)}$ is the estimate of $\beta$ in the $i$th replication of the experiment.
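The data-generating step can be sketched as follows (our own illustrative code with an arbitrary seed; it assumes the intercept $\beta_0$ is passed separately):

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed for reproducibility

def simulate_grm(n, p, rho2, beta, phi, beta0=0.0):
    """One replicate of the McDonald-Galarneau design with a gamma response, Equation (23)."""
    rho = np.sqrt(rho2)
    u = rng.standard_normal((n, p + 1))
    X = np.sqrt(1.0 - rho2) * u[:, :p] + rho * u[:, [p]]  # pairwise correlation about rho^2
    mu = np.exp(beta0 + X @ beta)                         # log link
    y = rng.gamma(shape=1.0 / phi, scale=mu * phi)        # mean mu, variance phi * mu^2
    return X, y
```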

5.2. Results and discussion

Tables 1 and 2 present the estimated MSEs of the GRM estimators. To assess the proposed estimators, we consider multiple factors, i.e. the degree of multicollinearity, the sample size and the number of explanatory variables, and we compare the performance of the proposed estimator with some existing estimators. We observe from the tables that the new proposed estimators perform better than the RMLE as well as the MLE and GRRE. In particular, the shrinkage parameter $k_3$ performs best in the sense of minimum MSE compared with the other proposed estimators.

Table 1. Estimated MSE when p = 4.

It can also be observed from Tables 1 and 2 that as the level of multicollinearity increases, the estimated MSEs of the considered estimators generally increase. Moreover, the findings demonstrate that the unrestricted estimators, i.e. the MLE and GRRE, are severely affected by correlated explanatory variables. When severe multicollinearity exists among the explanatory variables, i.e. $\rho^2=0.999$ and n = 500, the MSEs of the MLE, RMLE, GRRE and the six proposed estimators are, respectively, 13.2751, 8.5290, 10.2307, 1.8389, 2.9298, 1.5251, 11.7882, 6.9484 and 2.5806. Among these, $k_3$ attains the minimum MSE, so our proposed $k_3$ is the most resistant to severe multicollinearity. The number of explanatory variables also plays a critical role in the performance of all the listed estimators: Tables 1 and 2 show that as $p$ increases from 4 to 8, the MSEs of all the estimators increase rapidly. For both values of $p$, our proposed restricted ridge parameters outperform the RMLE and the other estimation methods in all of the evaluated situations; thus the proposed RGRRE significantly decreases the MSE. The sample size is also an important factor in judging the performance of any estimator. The findings show that as the sample size increases, the estimated MSE gradually decreases for all the estimators under study, which is an important property of any estimator. Overall, for the estimation of the unknown parameters, $k_3$ is the most robust option compared with the other estimation methods in the presence of severe multicollinearity.

Table 2. Estimated MSE when p = 8.

6. An empirical application: hydrocarbon escape data

In this section, we evaluate the performance of the proposed estimator using a real application. For this purpose, we consider the hydrocarbon escape dataset taken from Weisberg [Citation22]. When petrol is pumped into tanks, hydrocarbons escape into the atmosphere, and different devices are installed to absorb the vapours and reduce atmospheric pollution. To evaluate their effectiveness, 32 laboratory experiments were conducted without using the devices. Four explanatory variables were involved in these experiments, described as follows: the quantity $y$ (in grams) of hydrocarbon escaping was measured as a function of the tank temperature $x_1$ (in °F), the temperature $x_2$ (in °F) of the petrol pumped in, the initial pressure $x_3$ in the tank and the pressure $x_4$ of the petrol pumped in (both in pounds per square inch). Before proceeding further, it is crucial to identify the probability distribution of the response variable $y$ so that the appropriate regression model can be chosen. For this purpose, we use three important tests, namely the Anderson–Darling, Cramér–von Mises and Pearson $\chi^2$ tests (for a detailed description, we recommend [Citation23,Citation24]). The results in Table 3 show that the hydrocarbon escape dataset is well fitted by the gamma distribution, which has the largest p-value among the competing distributions. More specifically, the test statistic (p-value) of the Cramér–von Mises test is 0.0459 (0.5811), clearly signifying that the considered dataset is well fitted by the gamma distribution. The bivariate correlations among the four explanatory variables are displayed in Table 4. It can be clearly seen from Table 4 that all four explanatory variables are highly correlated, which indicates that the dataset is highly multicollinear.

Table 3. Goodness-of-fit tests for the hydrocarbon escape data.

Table 4. Correlation matrix.

Based on the analysis of the dataset, the estimated coefficients and relative efficiencies are summarized in Table 5 to judge the performance of the MLE, RMLE and RGRREs. The restriction matrix for the considered dataset is $F=[0,1,1,1,1]$ with $f=[0]$. Relative efficiency is used for assessment in this example, defined as
$$e(\hat\beta_{MLE},\hat\beta_i)=\frac{\operatorname{SMSE}(\hat\beta_i)}{\operatorname{SMSE}(\hat\beta_{MLE})},$$
where $\hat\beta_i\in\{\hat\beta_{RMLE},\hat\beta(k),\hat\beta_F(k_1),\dots,\hat\beta_F(k_6)\}$ is any biased estimator. Thus $e(\hat\beta_{MLE},\hat\beta_i)<1$ indicates improved precision relative to the MLE, $e(\hat\beta_{MLE},\hat\beta_i)>1$ indicates worse performance, and the minimum value of $e(\hat\beta_{MLE},\hat\beta_i)$ identifies the best estimator. From Table 5, the RMLE ($\hat\beta_{RMLE}$), the GRRE ($\hat\beta(k)$) and the proposed RGRREs ($\hat\beta_F(k_1)$–$\hat\beta_F(k_6)$) all have smaller SMSE than the MLE ($\hat\beta_{MLE}$). The application results show that the proposed RGRRE outperforms the RMLE and GRRE in the presence of high, but imperfect, multicollinearity. Among the estimators in Table 5, the MSE is largest for $\hat\beta_{MLE}$ and $\hat\beta(k)$, indicating poor performance under severe multicollinearity with many explanatory variables, as in the simulation study; the second worst estimator is $\hat\beta_{RMLE}$, and the estimator that minimizes the MSE is $\hat\beta_F(k_6)$.
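The relative-efficiency criterion used in Table 5 is a one-line computation (illustrative sketch):

```python
def relative_efficiency(smse_i, smse_mle):
    """e(beta_MLE, beta_i) = SMSE(beta_i) / SMSE(beta_MLE); values below 1 favour beta_i."""
    return smse_i / smse_mle
```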

Table 5. Estimated parameter and relative efficiency for different estimators.

7. Conclusion

In this paper, we introduced a new RGRRE for estimating the unknown parameters of the GRM in the presence of mild to severe multicollinearity. We also proposed some methods for choosing the ridge parameter $k$ of the RGRRE. A Monte Carlo simulation study was designed to evaluate the performance of the proposed RGRREs against other estimators under different conditions, with the MSE as the assessment criterion. Moreover, to illustrate the benefits of the proposed estimator, we also considered an empirical application. From the simulation and real application results, we conclude that the RGRRE performs better than the MLE, RMLE and GRRE under the different ridge parameters. Specifically, the RGRRE with parameter $k_3$ is the most robust and recommended estimator in the sense of minimum MSE, since the Monte Carlo simulation results showed $k_3$ to be the most resistant estimator in the presence of severe multicollinearity. Hence, we suggest that researchers use the RGRRE with ridge parameters $k_3$ and $k_6$ for the estimation of the GRM in the presence of severe multicollinearity.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Algamal ZY. Developing a ridge estimator for the gamma regression model. J Chemometr. 2018;32(10):e3054.
  • Qasim M, Amin M, Amanullah M. On the performance of some new Liu parameters for the gamma regression model. J Stat Comput Simul. 2018;88(16):3065–3080.
  • Nyquist H. Restricted estimation of generalized linear models. J Royal Statist Soc C Appl Statist. 1991;40(1):133–141.
  • Kibria BG, Saleh AME. Improving the estimators of the parameters of a probit regression model: a ridge regression approach. J Stat Plan Inference. 2012;142(6):1421–1435.
  • Qasim M, Kibria BMG, Månsson K, et al. A new Poisson Liu regression estimator: method and application. J Appl Stat. 2020;47(12):2258–2271.
  • Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67.
  • Amin M, Qasim M, Amanullah M, et al. Performance of some ridge estimators for the gamma regression model. Statist Papers. 2020;61(3):997–1026.
  • Kaçiranlar S, Sakallioğlu S, Akdeniz F, et al. A new biased estimator in linear regression and a detailed analysis of the widely-analyzed dataset on portland cement. Sankhyā Ind J Stat B. 1999: 443–459.
  • Sarkar N. A new estimator combining the ridge regression and the restricted least squares methods of estimation. Commun Stat Theory Methods. 1992;21(7):1987–2000.
  • Mahmoudi A, Arabi Belaghi R, Mandal S. A comparison of preliminary test, stein-type and penalty estimators in gamma regression model. J Stat Comput Simul. 2020;90(17):3051–3079.
  • Månsson K, Kibria BM, Shukur G. A restricted Liu estimator for binary regression models and its application to an applied demand system. J Appl Stat. 2016;43(6):1119–1127.
  • Asar Y, Arashi M, Wu J. Restricted ridge estimator in the logistic regression model. Commun Stat Simul Comput. 2017;46(8):6538–6544.
  • Segerstedt B. On ordinary ridge regression in generalized linear models. Commun Statist Theory Methods. 1992;21(8):2227–2246.
  • Kurtoğlu F, Özkale MR. Restricted ridge estimator in generalized linear models: Monte Carlo simulation studies on Poisson and binomial distributed responses. Commun Stat Simul Comput. 2017;48(4):1191–1218.
  • Wang SG, Wu MX, Jia ZZ. Matrix inequalities. 2nd ed. Beijing: Chinese Science Press; 2006.
  • Rao CR, Toutenburg H, Shalabh, et al. Linear models and generalizations—least squares and alternatives. Berlin: Springer; 2008.
  • Kibria BG. Performance of some new ridge regression estimators. Commun Stat Simul Comput. 2003;32(2):419–435.
  • Qasim M, Månsson K, Kibria BMG. On some beta ridge regression estimators: method, simulation and application. J Stat Comput Simul. 2021;91(9):1699–1712.
  • McDonald GC, Galarneau DI. A Monte Carlo evaluation of some ridge-type estimators. J Am Stat Assoc. 1975;70(350):407–416.
  • Månsson K, Shukur G. A Poisson ridge regression estimator. Econ Model. 2011;28:1475–1481.
  • Varathan N, Wijekoon P. Optimal generalized logistic estimator. Commun Stat Theory Methods. 2018;47:463–474.
  • Weisberg S. Applied linear regression. John Wiley & Sons; 1980.
  • Zhang J. Powerful goodness-of-fit and multi-sample tests [PhD thesis]. Toronto: York University; 2011.
  • Evans DL, Drew JH, Leemis LM. The distribution of the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling test statistics for exponential populations with estimated parameters. Commun Stat Theory Methods. 2008;37:1396–1421.