
A restricted gamma ridge regression estimator combining the gamma ridge regression and the restricted maximum likelihood methods of estimation

Pages 1696-1713 | Received 03 Aug 2021, Accepted 08 Nov 2021, Published online: 27 Nov 2021

Abstract

In this article, we propose a restricted gamma ridge regression estimator (RGRRE) that combines the gamma ridge regression (GRR) and restricted maximum likelihood (RMLE) estimators to combat the multicollinearity problem when estimating the parameter vector $\beta$ in the gamma regression model. The properties of the new estimator are discussed, and its superiority over the GRR, the RMLE and the traditional maximum likelihood estimator is analysed theoretically under different conditions. We also suggest some methods for estimating the optimal value of the shrinkage parameter. A Monte Carlo simulation study is conducted to judge the performance of the proposed estimator. Finally, an empirical application is analysed to show the benefit of the RGRRE over the existing estimators.

1. Introduction

The gamma regression model (GRM) is frequently used in medical sciences, health care economics and automobile insurance claims [Citation1]. The GRM is appropriate when the dependent variable is positively skewed and is assumed to follow a gamma distribution [Citation2]. The usual maximum likelihood estimator (MLE) of the parameter vector $\beta$ of the GRM is obtained by means of the iteratively reweighted least squares (IRWLS) algorithm; the MLE minimizes the weighted sum of squared errors. However, the MLE may not be suitable if it is suspected that $\beta$ belongs to a linear subspace defined by $F\beta=f$, where $f$ is an $m\times1$ vector of known elements and $F$ is a known $m\times p$ matrix of full rank with $m<p$. In this situation, a restricted maximum likelihood estimator (RMLE) should be used [Citation3,Citation4]. The problem of multicollinearity and its consequences for a GLM are well known; one of the most important consequences for both the restricted and unrestricted estimators in the GRM is that the regression coefficients have large variances. Further consequences are wider confidence intervals and decreased statistical power, which increases the probability of a type II error when testing hypotheses about the parameters [Citation5].

Unrestricted ridge estimators under the classical linear regression model and the GRM were considered by Hoerl and Kennard [Citation6] and Amin et al. [Citation7], respectively, to remedy the problems caused by multicollinearity. However, when prior information about the regression coefficients is available, it can be expressed as the linear restriction $F\beta=f$ in the GLM, and the coefficients are then estimated with a restricted estimator. For instance, linear restrictions arise in the sum-to-zero parameterization when the regressors are qualitative, and in quantal response problems with constant relative potency between drugs (see [Citation3] for more details). Therefore, the model of interest in this paper is the GRM subject to a system of linear restrictions. Nyquist [Citation3] proposed the RMLE for the GLM under linear restrictions on the parameters. However, in the presence of multicollinearity, the weighted matrix of cross products is ill-conditioned, which leads to instability and high variances of both the (unrestricted) MLE and the RMLE. To overcome the problem of multicollinearity, different kinds of shrinkage estimators have been proposed. For the linear regression model, Kaçiranlar et al. [Citation8] developed the restricted Liu estimator by combining the Liu estimator and the restricted least squares estimator (RLSE), and Sarkar [Citation9] proposed a restricted ridge regression approach by combining ordinary ridge regression and the RLSE. However, the literature on restricted ridge regression under the GRM is limited.

The aim of this paper is to propose a new restricted gamma ridge regression estimator (RGRRE) by combining the GRRE and the RMLE, following the work of Sarkar [Citation9]. The motivation behind the new estimator is as follows. When the linear restrictions on the regression coefficients are true, a reparameterization in the form of linear combinations of the regression coefficients is available. If multicollinearity is present under the reparametrized model, the RMLE must be modified along the lines of the GRRE to obtain efficient estimates. The mean square error (MSE) properties of the RGRRE are studied, and the superiority of the RGRRE over the RMLE is shown. We also suggest some methods for estimating the value of the ridge parameter.

The rest of this study is organized as follows: The GRM and existing estimators are discussed in Section 2. In Section 3, we derive the MSE properties of the proposed estimator and the theoretical comparison with existing estimators is considered. In Section 4, we suggest some estimators for the selection of the ridge parameter of the RGRRE. The design of the Monte Carlo simulation study and its results are provided in Section 5. A real-life dataset is analysed in Section 6. Some concluding remarks are presented in Section 7.

2. The gamma regression model and estimators

In the GLM framework, the response variable $y$ is assumed to follow an exponential family distribution with mean $\mu$. Let $y_1,y_2,\dots,y_n$ be $n$ independent observations, and suppose that the probability density function of $y_i$ is
$$f(y_i;\theta_i,\varphi)=\exp\left[\frac{y_i\theta_i-b(\theta_i)}{a(\varphi)}+c(y_i,\varphi)\right],$$
where $\theta_i$ is the location parameter, $\varphi$ is the dispersion parameter and $b(\theta_i)$ is the cumulant function. The GLM employs the relation
$$g(\mu_i)=\eta_i=x_i'\beta,\quad i=1,\dots,n,\tag{1}$$
where $E(y_i)=\mu_i=\partial b(\theta_i)/\partial\theta_i$, $g(\cdot)$ is a monotonic differentiable link function (here the log link), $\eta_i$ is the linear predictor, $\beta=(\beta_1,\beta_2,\dots,\beta_p)'$ is a $p\times1$ vector of regression coefficients and $x_i'$ is the $i$th row of the $n\times p$ matrix $X$ of $p$ non-stochastic explanatory variables. The log-likelihood function is
$$\ell(\beta)=\sum_{i=1}^n\left[\frac{y_i\theta_i-b(\theta_i)}{a(\varphi)}+c(y_i,\varphi)\right],$$
and the MLE of $\beta_j$ is obtained by differentiating it, which gives the score
$$\frac{\partial\ell(\beta)}{\partial\beta_j}=\frac{1}{a(\varphi)}\sum_{i=1}^n\frac{(y_i-\mu_i)}{\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_{ij},\quad j=1,2,\dots,p.$$
Suppose now that the $n$ independent observations of the response variable come from the gamma distribution, $y_i\sim\text{Gamma}(\mu,\phi)$, under the reparameterization with shape parameter $\phi^{-1}$ and scale parameter $\mu\phi$, where $\mu=E(y_i)$ is the mean function of the response variable and $\phi>0$ is the dispersion parameter (see [Citation7,Citation10] for more details of GRM estimation). With the log link $g(\mu_i)=\eta_i=x_i'\beta$ of Equation (1), the mean function of the GRM is $E(y_i)=\partial b(\theta_i)/\partial\theta_i=\mu_i=\exp(x_i'\beta)$. The most common estimation method for the GRM is the MLE, obtained by solving
$$\frac{\partial\ell}{\partial\beta}=\frac{d\ell}{d\theta_i}\frac{d\theta_i}{d\mu_i}\frac{d\mu_i}{d\eta_i}\frac{\partial\eta_i}{\partial\beta}=\sum_{i=1}^n\frac{(y_i-\mu_i)}{a(\varphi)\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_i=0.$$
Since this function is non-linear in $\beta$, the expression is solved iteratively through Fisher's scoring method. Let $\beta^{(t)}$ be the estimate of $\beta$ at iteration $t\ (t\ge1)$; then
$$\beta^{(t+1)}=\beta^{(t)}+\{I(\beta^{(t)})\}^{-1}S(\beta^{(t)}),$$
where $I(\beta^{(t)})$ is the Fisher information matrix and $S(\beta)$ is the $p\times1$ vector of scores, both computed at the current iterate $\beta^{(t)}$, and the iterations continue until convergence. After the final iteration, the MLE can be obtained by the IRWLS method as
$$\hat\beta_{MLE}=A^{-1}X'\hat Wz,\tag{2}$$
where $A=X'\hat WX$, $\hat W=\operatorname{diag}\{(1/\hat\mu_i^2)(d\hat\mu_i/d\hat\eta_i)^2\}$ and $z$ is the $n\times1$ vector with elements $z_i=x_i'\hat\beta+(y_i-\hat\mu_i)/\hat\mu_i$, all evaluated at the final iteration, with $\hat\mu_i=\exp(x_i'\hat\beta)$ the mean function of the response variable under the log link. Let $X_n=\hat W^{1/2}X$ and $Z=\hat W^{1/2}z$; then $\check\beta_{MLE}=(X_n'X_n)^{-1}X_n'Z$. Following Kibria and Saleh [Citation4], we assume two conditions: (i) $\frac1n(X_n'X_n)\to A$ as $n\to\infty$, and (ii) $\max_{1\le i\le n}x_{ni}'(X_n'X_n)^{-1}x_{ni}\to0$ as $n\to\infty$, where $A=X'\hat WX$ is a finite positive definite matrix and $x_{ni}'$ is the $i$th row of the matrix $X_n$. The asymptotic distribution of $\check\beta_{MLE}$ satisfies $n^{1/2}F(\check\beta_{MLE}-\beta)\to N_m[0,FA^{-1}F'].$
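To make the fitting procedure concrete, the following is a minimal Python sketch of the IRWLS fit in Equation (2), assuming a log link and that the first column of $X$ is a constant intercept column; the function name, starting value and convergence rule are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def gamma_mle_irwls(X, y, tol=1e-8, max_iter=100):
    """Sketch of Fisher scoring / IRWLS for a gamma GLM with log link, Equation (2)."""
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())          # crude starting value (assumes column 0 is the intercept)
    A = None
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)                # log link: mu_i = exp(x_i' beta)
        w = mu**2 / mu**2               # (dmu/deta)^2 / V(mu); reduces to 1 for gamma + log link
        z = eta + (y - mu) / mu         # adjusted response z_i = eta_i + (y_i - mu_i)/mu_i
        A = X.T @ (w[:, None] * X)      # A = X' W X
        beta_new = np.linalg.solve(A, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, A
        beta = beta_new
    return beta, A
```

The returned matrix $A=X'\hat WX$ is reused by the restricted and ridge estimators sketched below.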

2.1. The RMLE

One way to improve the efficiency of estimators is to use extraneous or prior information. In practice, such prior information may be available about the regression coefficients. For instance, in applied economics, constant returns to scale imply that the exponents in a Cobb–Douglas production function should sum to unity. As a second example, the absence of money illusion on the part of consumers implies that the sum of the money income and price elasticities in a demand function should be zero. This type of prior information may come from an extraneous source, user experience or theoretical considerations, among others. To exploit such information when estimating the regression coefficients, it can be stated in the form of linear restrictions.

Our primary aim is to estimate $\beta=(\beta_1,\beta_2,\dots,\beta_p)'$ when it is suspected that $\beta$ belongs to the linear subspace defined by
$$F_r'\beta=f_r,\quad r=1,2,\dots,m,\tag{3}$$
where the $f_r$ are known scalars and the $F_r$ are known $p\times1$ vectors defining $m$ linearly independent restrictions on the parameter vector $\beta$. In such a situation, we may use a restricted estimator of $\beta$ [Citation3,Citation4,Citation11,Citation12]. The RMLE [Citation3] maximizes the log-likelihood function of the GRM over $\beta$ under the restrictions $F_r'\beta=f_r$. One method for solving restricted optimization problems is the quadratic penalty function, which for the RMLE is
$$\Theta(\beta,\lambda)=\ell(\beta)-\frac12\sum_{r=1}^m\lambda_r(f_r-F_r'\beta)^2,$$
and $\max_\beta\Theta(\beta,\lambda)$ is sought for fixed, positive values of $\lambda_r$. Differentiating $\Theta(\beta,\lambda)$ with respect to $\beta_j$ gives
$$P_j(\beta,\lambda)=\sum_{i=1}^n\frac{(y_i-\mu_i)}{a(\varphi)\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_{ij}+\sum_{r=1}^m\lambda_rF_{rj}(f_r-F_r'\beta).$$
We compute the RMLE by an iterative method similar to that used for the unrestricted estimator. The $(t+1)$th approximation of the RMLE finally yields
$$\hat\beta_{RMLE}=\hat\beta_{MLE}+A^{-1}F'(FA^{-1}F')^{-1}(f-F\hat\beta_{MLE}),\tag{4}$$
where $F$ is the $m\times p$ matrix with rows $F_r'$ and $f=(f_1,\dots,f_m)'$. For testing the hypothesis $H_0:F\beta=f$, the Wald-type test statistic is
$$L_n=n(F\check\beta_{MLE}-f)'[F(n^{-1}A_n)^{-1}F']^{-1}(F\check\beta_{MLE}-f).$$
Under $H_0:F\beta=f$, $L_n\xrightarrow{D}\chi_m^2$ as $n\to\infty$, i.e. $L_n$ follows a central $\chi^2$ distribution with $m$ degrees of freedom.

Clearly, $E(\hat\beta_{RMLE})\ne\beta$ unless (3) holds:
$$\operatorname{Bias}(\hat\beta_{RMLE})=E(\hat\beta_{RMLE})-\beta=A^{-1}F'(FA^{-1}F')^{-1}(f-F\beta).$$
The variance–covariance matrix of $\hat\beta_{RMLE}$ is
$$\operatorname{Cov}(\hat\beta_{RMLE})=\varphi[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}],$$
where $\varphi$ is the dispersion parameter, estimated as $\hat\varphi=\frac{1}{n-p}\sum_{i=1}^n\frac{(y_i-\hat\mu_i)^2}{\hat\mu_i^2}$.

The RMLE is superior to the MLE since it has a smaller sampling variance. The asymptotic variance–covariance matrix of $\hat\beta_{MLE}$ is $\operatorname{Cov}(\hat\beta_{MLE})=\varphi A^{-1}$. Therefore,
$$\operatorname{Cov}(\hat\beta_{MLE})-\operatorname{Cov}(\hat\beta_{RMLE})=\varphi A^{-1}-\varphi[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]=\varphi A^{-1}F'(FA^{-1}F')^{-1}FA^{-1},$$
which is a positive semidefinite (psd) matrix, so the RMLE has a smaller sampling variance than the MLE. Amin et al. [Citation7] proposed the GRRE for the GRM:
$$\hat\beta(k)=A_k\hat\beta_{MLE},\quad k\ge0,\tag{5}$$
where $A_k=(I_p+kA^{-1})^{-1}$, $I_p$ is the identity matrix of order $p\times p$ and $k\ge0$ is the ridge parameter. The MSE properties of $\hat\beta(k)$ are given by Amin et al. [Citation7].
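As an illustration, here is a minimal Python sketch of Equations (4) and (5), assuming `beta_mle` and `A` come from an IRWLS fit such as the one sketched above; `rmle` and `grre` are hypothetical helper names, not from the paper.

```python
import numpy as np

def rmle(beta_mle, A, F, f):
    """Restricted MLE, Equation (4): correct the MLE towards the subspace F beta = f."""
    Ainv = np.linalg.inv(A)
    M = F @ Ainv @ F.T                  # F A^{-1} F'
    return beta_mle + Ainv @ F.T @ np.linalg.solve(M, f - F @ beta_mle)

def grre(beta_mle, A, k):
    """Gamma ridge estimator, Equation (5): A_k beta_mle with A_k = (I_p + k A^{-1})^{-1}."""
    p = A.shape[0]
    return np.linalg.solve(np.eye(p) + k * np.linalg.inv(A), beta_mle)
```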

3. Proposed estimator

The RMLE of $\beta$ is obtained by maximizing the log-likelihood function of the GRM subject to the restrictions given in Equation (3). Under multicollinearity, however, the reparametrized model is ill-conditioned and the RMLE may produce poor estimates and misleading inference, just as the MLE does. Therefore, following the work of Sarkar [Citation9], the RMLE must be modified along the lines of the GRRE to obtain efficient estimates under the set of linear restrictions. The quadratic function for the RGRRE is defined as
$$\Theta(\beta,k,\lambda)=\beta'\beta-\frac2k\ell(\beta)+\sum_{r=1}^m\lambda_r(f_r-F_r'\beta)^2,$$
where $1/k$ plays the role of a Lagrange multiplier. The quadratic function $\Theta(\beta,k,\lambda)$ combines the objective functions of the GRRE [Citation13] and the RMLE [Citation3]; the same idea was carried out by Kurtoğlu and Özkale [Citation14] for count regression models, where the dispersion parameter equals 1. Differentiating $\Theta(\beta,k,\lambda)$ with respect to $\beta_j$ gives
$$P_j(\beta,k,\lambda)=2\left[\beta_j-\frac1k\sum_{i=1}^n\frac{(y_i-\mu_i)}{a(\varphi)\operatorname{Var}(\mu_i)}\frac{d\mu_i}{d\eta_i}x_{ij}-\sum_{r=1}^m\lambda_rF_{rj}(f_r-F_r'\beta)\right].$$
Now let $H(\beta,k,\lambda)$ be the $p\times p$ matrix with elements $h_{jq}(\beta,k,\lambda)$ obtained from the second-order derivatives:
$$h_{jq}(\beta,k,\lambda)=\frac{\partial^2\Theta(\beta,k,\lambda)}{\partial\beta_j\partial\beta_q}=2\left[\delta_{jq}-\frac1k\frac{\partial^2\ell(\beta)}{\partial\beta_j\partial\beta_q}+\sum_{r=1}^m\lambda_rF_{rj}F_{rq}\right].$$
Taking expectations on both sides, we have
$$E\left[\frac{\partial^2\Theta(\beta,k,\lambda)}{\partial\beta_j\partial\beta_q}\right]=2\left[\delta_{jq}-\frac1kE\left\{\frac{\partial^2\ell(\beta)}{\partial\beta_j\partial\beta_q}\right\}+\sum_{r=1}^m\lambda_rF_{rj}F_{rq}\right]=2\left[\delta_{jq}+\frac1k\sum_{i=1}^n\frac{x_{ij}x_{iq}}{a(\varphi)}\frac{1}{\mu_i^2}\left(\frac{d\mu_i}{d\eta_i}\right)^2+\sum_{r=1}^m\lambda_rF_{rj}F_{rq}\right],$$
where $\delta_{jq}=1$ if $j=q$ and $\delta_{jq}=0$ otherwise.

Let $\beta_F(k)$ denote the RGRRE of $\beta$ at iteration $t\ (t\ge1)$. The Fisher scoring method then yields
$$\beta_F(k)^{(t+1)}=\beta_F(k)^{(t)}-H(\beta_F^{(t)},k,\lambda)^{-1}P(\beta_F^{(t)},k,\lambda),$$
where $P$ is the gradient vector with elements $P_j$. Based on the objective functions $\Theta(\beta,\lambda)$ of the RMLE and $\Theta(\beta,k,\lambda)$ of the GRRE, the RGRRE is defined as
$$\hat\beta_F(k)=\hat\beta(k)+(A+kI_p)^{-1}F'[F(A+kI_p)^{-1}F']^{-1}(f-F\hat\beta_{MLE}),\quad k>0.\tag{6}$$
The final form of the restricted ridge estimator in the GLM was proposed by Kurtoğlu and Özkale [Citation14] and is the same for the GRM. However, the MSE properties for the GRM differ from those for count regression models, because the dispersion parameter enters the estimation of the optimal shrinkage parameter $k$ in the RGRRE. The performance of shrinkage estimators also differs across the different forms of the GLM in restricted estimation (e.g. [Citation4,Citation10–12]).

The RGRRE can be simplified to
$$\hat\beta_F(k)=A_k\hat\beta_{RMLE},\tag{6a}$$
where $A_k=(I_p+kA^{-1})^{-1}$ and $\hat\beta_{RMLE}=\hat\beta_{MLE}+A^{-1}F'(FA^{-1}F')^{-1}(f-F\hat\beta_{MLE})$. It is easily seen that $\hat\beta_F(k)=\hat\beta_{RMLE}$ when $k=0$.
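In code, Equation (6a) is a short composition of the two sketches above (again a sketch under the same assumptions, reusing the hypothetical `rmle` helper):

```python
import numpy as np

def rgrre(beta_mle, A, F, f, k):
    """RGRRE, Equation (6a): A_k @ beta_rmle; reduces to the RMLE at k = 0."""
    p = A.shape[0]
    beta_r = rmle(beta_mle, A, F, f)    # restricted MLE from the sketch above
    return np.linalg.solve(np.eye(p) + k * np.linalg.inv(A), beta_r)
```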

We compute the bias of $\hat\beta_F(k)$ by using (6a) as
$$\operatorname{Bias}(\hat\beta_F(k))=E[\hat\beta_F(k)]-\beta=[A_k\beta+A_kA^{-1}F'(FA^{-1}F')^{-1}(f-F\beta)]-\beta=(A_k-I_p)\beta+A_kA^{-1}F'(FA^{-1}F')^{-1}(f-F\beta)=-k(A+kI_p)^{-1}\beta+A_kA^{-1}F'(FA^{-1}F')^{-1}(f-F\beta).$$
So the RGRRE is a biased estimator of the parameter vector $\beta$ unless $k=0$ and $f-F\beta=0$, where $0$ is the $m\times1$ null vector.

The variance–covariance matrix of $\hat\beta_F(k)$ is computed as
$$\operatorname{Cov}(\hat\beta_F(k))=\varphi A_k[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]A_k'.$$

3.1. MSE properties and superiority of $\hat\beta_F(k)$

In this section, we derive the MSE properties of $\hat\beta_F(k)$ and gauge its performance. The matrix mean squared error (MMSE) of an estimator $\hat\upsilon$ of $\upsilon$ is defined as
$$\operatorname{MMSE}(\hat\upsilon)=E(\hat\upsilon-\upsilon)(\hat\upsilon-\upsilon)'=\operatorname{Cov}(\hat\upsilon)+\operatorname{Bias}(\hat\upsilon)\operatorname{Bias}(\hat\upsilon)',\tag{7}$$
where $\operatorname{Cov}(\hat\upsilon)$ denotes the variance–covariance matrix of $\hat\upsilon$ and $\operatorname{Bias}(\hat\upsilon)=E(\hat\upsilon)-\upsilon$ is the bias vector. The scalar MSE (SMSE) of $\hat\upsilon$ is found by applying the trace operator:
$$\operatorname{SMSE}(\hat\upsilon)=\operatorname{tr}[\operatorname{MMSE}(\hat\upsilon)]=\operatorname{tr}[\operatorname{Cov}(\hat\upsilon)]+\operatorname{Bias}(\hat\upsilon)'\operatorname{Bias}(\hat\upsilon).\tag{8}$$
Let $\hat\upsilon_j\ (j=1,2)$ be two competing estimators of the parameter $\upsilon$. The estimator $\hat\upsilon_2$ is said to be superior to $\hat\upsilon_1$ under the MMSE criterion if $\operatorname{MMSE}(\hat\upsilon_1)-\operatorname{MMSE}(\hat\upsilon_2)\ge0$. Moreover, if $\hat\upsilon_2$ dominates $\hat\upsilon_1$ under the MMSE criterion, then $\operatorname{SMSE}(\hat\upsilon_2)\le\operatorname{SMSE}(\hat\upsilon_1)$. We list the following lemmas, which are used in the comparisons below.
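For reference, Equations (7) and (8) translate directly into a few lines of Python (a sketch; `mmse` and `smse` are illustrative names):

```python
import numpy as np

def mmse(cov, bias):
    """Matrix MSE, Equation (7): Cov + bias bias'."""
    return cov + np.outer(bias, bias)

def smse(cov, bias):
    """Scalar MSE, Equation (8): tr(Cov) + bias' bias."""
    return float(np.trace(cov) + bias @ bias)
```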

Lemma 1.

Let $A$ be a real symmetric matrix. Then $A\ge0$ if and only if $P'AP\ge0$ for every matrix $P$ of proper order, which holds if and only if each eigenvalue of $A$ is nonnegative.

Proof:

See Wang et al. [Citation15].

Lemma 2.

Suppose that the matrices $A$ and $B$ are nonsingular, and that $C$ and $D$ are matrices of proper orders. Then
$$(A+CBD)^{-1}=A^{-1}-A^{-1}C(B^{-1}+DA^{-1}C)^{-1}DA^{-1}.$$

Proof:

See Rao et al. [Citation16].

3.1.1. Comparison between $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$ when the prior restrictions on the parameters are true, i.e. $\tau=(f-F\beta)=0$

It is evident that $E(\hat\beta_{RMLE})\ne\beta$ unless (3) holds. The $\hat\beta_{RMLE}$ is an unbiased estimator when $\tau=0$, while $\hat\beta_F(k)$ is always a biased estimator.

Theorem 3.1.

Under the GRM, $\hat\beta_F(k)$ is a biased estimator and $\hat\beta_{RMLE}$ is an unbiased estimator when $\tau=0$. Nevertheless, $\operatorname{Cov}(\hat\beta_F(k))\le\operatorname{Cov}(\hat\beta_{RMLE})$ for $k\ge0$.

Proof:

To compare the variance–covariance matrices of $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$, we compute their difference:
$$\operatorname{Cov}(\hat\beta_{RMLE})-\operatorname{Cov}(\hat\beta_F(k))=\varphi[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]-\varphi A_k[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]A_k'=\varphi A_k\{k^2A^{-1}GA^{-1}+kGA^{-1}+kA^{-1}G\}A_k',\tag{9}$$
where $G=A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}$ ($G$ is psd), $A=X'\hat WX$ ($A$ is positive definite) and $A_k=(I_p+kA^{-1})^{-1}$. From (9), we conclude that $A^{-1}G$ is a nonnegative definite (nnd) matrix and $A^{-1}GA^{-1}$ is a psd matrix. Therefore $[\operatorname{Cov}(\hat\beta_{RMLE})-\operatorname{Cov}(\hat\beta_F(k))]$ is psd for $k\ge0$. Hence the sampling variance of $\hat\beta_F(k)$ is smaller than that of $\hat\beta_{RMLE}$.

Now we discuss the SMSE properties of $\hat\beta_F(k)$ and compare $\hat\beta_F(k)$ with $\hat\beta_{RMLE}$. Following [Citation11] and [Citation3], the SMSE of $\hat\beta_{RMLE}$ for the GRM is
$$\operatorname{SMSE}(\hat\beta_{RMLE})=\varphi\operatorname{tr}(G)=\varphi\sum_{j=1}^pm_{jj},\tag{10}$$
where $m_{jj}\ge0$ is the $j$th diagonal element of the matrix $M=Q'GQ$ and $Q$ is the orthogonal matrix such that $Q'AQ=\Lambda=\operatorname{diag}(\lambda_1,\lambda_2,\dots,\lambda_p)$. The SMSE of the RGRRE is computed as
$$\operatorname{SMSE}(\hat\beta_F(k))=\operatorname{tr}[\operatorname{Cov}(\hat\beta_F(k))]+[\operatorname{Bias}(\hat\beta_F(k))]'[\operatorname{Bias}(\hat\beta_F(k))].\tag{11}$$

If we assume that the prior restrictions hold, i.e. $\tau=0$, then (11) may be written as
$$\operatorname{SMSE}(\hat\beta_F(k))=\varphi\operatorname{tr}(A_kGA_k')+k^2\beta'(A+kI_p)^{-2}\beta.$$
After simplification, the SMSE of the RGRRE becomes
$$\operatorname{SMSE}(\hat\beta_F(k))=\varphi\sum_{j=1}^p\frac{\lambda_j^2}{(\lambda_j+k)^2}m_{jj}+k^2\sum_{j=1}^p\frac{\alpha_j^2}{(\lambda_j+k)^2}=\gamma_1(F_k)+\gamma_2(F_k),\tag{12}$$
where $\alpha_j$ is the $j$th element of $Q'\beta$, $\lambda_j$ is the $j$th eigenvalue of the matrix $A$, and $\gamma_1(F_k)$ and $\gamma_2(F_k)$ denote the total variance and the squared bias of $\hat\beta_F(k)$, respectively. From (12), note that the bias of the RGRRE is the same as the bias of the GRRE when the prior restrictions on the parameters are true, i.e. $\tau=(f-F\beta)=0$. Following [Citation6], we establish the following SMSE properties of $\hat\beta_F(k)$ for the GRM.
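Equation (12) is easy to evaluate numerically from the spectral quantities just defined, which is convenient for tracing the variance/bias trade-off in $k$; the sketch below assumes `lam` (eigenvalues of $A$), `m` (diagonal of $M=Q'GQ$), `alpha` ($=Q'\beta$) and `phi` are available.

```python
import numpy as np

def smse_rgrre(k, lam, m, alpha, phi):
    """Equation (12): SMSE of the RGRRE when tau = 0, as variance plus squared bias."""
    gamma1 = phi * np.sum(lam**2 * m / (lam + k) ** 2)   # total variance gamma_1(F_k)
    gamma2 = k**2 * np.sum(alpha**2 / (lam + k) ** 2)    # squared bias gamma_2(F_k)
    return gamma1 + gamma2
```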

Theorem 3.2.

The total variance $\gamma_1(F_k)$ is a continuous and monotonically decreasing function of $k$.

Proof:

Differentiating $\gamma_1(F_k)$ with respect to $k$ gives
$$\frac{\partial\gamma_1(F_k)}{\partial k}=-2\varphi\sum_{j=1}^p\frac{\lambda_j^2}{(\lambda_j+k)^3}m_{jj}.\tag{13}$$
Based on (13), it is evident that $\gamma_1(F_k)$ is a continuous and monotonically decreasing function of $k$, since $\partial\gamma_1(F_k)/\partial k<0$ for $k\ge0$ and $\lambda_p>0$.

Theorem 3.3.

The squared bias $\gamma_2(F_k)$ is a continuous and monotonically increasing function of $k$.

Proof:

Differentiating $\gamma_2(F_k)$ with respect to $k$ gives
$$\frac{\partial\gamma_2(F_k)}{\partial k}=2k\sum_{j=1}^p\frac{\lambda_j\alpha_j^2}{(\lambda_j+k)^3}.\tag{14}$$
Equation (14) indicates that $\gamma_2(F_k)$ is a continuous and monotonically increasing function of $k$ for $k>0$ and $\lambda_j>0$.

Theorem 3.4.

Under the GRM, there always exists a $k>0$ in the range $0<k<\varphi/[\max_j(\alpha_j^2/(\lambda_jm_{jj}))]$ such that $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta_{RMLE})$ when $\tau=0$.

Proof:

The first derivative of (12) with respect to $k$ is
$$\frac{\partial\operatorname{SMSE}(\hat\beta_F(k))}{\partial k}=-2\varphi\sum_{j=1}^p\frac{\lambda_j^2}{(\lambda_j+k)^3}m_{jj}+2k\sum_{j=1}^p\frac{\lambda_j\alpha_j^2}{(\lambda_j+k)^3}=2\sum_{j=1}^p\frac{k\lambda_j\alpha_j^2-\varphi\lambda_j^2m_{jj}}{(\lambda_j+k)^3}.\tag{15}$$
As discussed earlier, $m_{jj}\ge0$ and $\lambda_j>0$, $j=1,2,\dots,p$, and the total variance and squared bias are monotonically decreasing and increasing functions of $k$, so $\partial\gamma_1(F_k)/\partial k$ and $\partial\gamma_2(F_k)/\partial k$ are always non-positive and non-negative, respectively. To prove the theorem, it therefore suffices to require $0<k<\varphi/[\max_j(\alpha_j^2/(\lambda_jm_{jj}))]$, for which every term of (15), and hence $\partial\operatorname{SMSE}(\hat\beta_F(k))/\partial k$, is negative. This establishes the existence of a value of $k$ for which the RGRRE is superior to the RMLE.

Remark.

From Theorem 3.4, note that $M=Q'GQ=\Lambda^{-1}-B$, where $B=Q'A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}Q$ and $M$ is a psd matrix. Consequently, $m_{jj}=\frac{1}{\lambda_j}-b_{jj}\le\frac{1}{\lambda_j}$, where $b_{jj}$ is the $j$th diagonal element of $B$. It can also be noted that $\varphi/[\max_j(\alpha_j^2/(\lambda_jm_{jj}))]\le\varphi/\alpha_{\max}^2$. Therefore, one beneficial consequence of involving exact prior information is that the range of values of $k$ for which the RGRRE dominates the RMLE is narrower than the corresponding range for which the GRRE dominates the traditional MLE in the SMSE sense.

3.1.2. Comparison between $\hat\beta_F(k)$ and $\hat\beta(k)$ when the prior restrictions on the parameters are true, i.e. $\tau=(f-F\beta)=0$

This section compares the performance of $\hat\beta_F(k)$ and $\hat\beta(k)$.

Theorem 3.5.

Under the GRM, $\operatorname{Cov}(\hat\beta_F(k))\le\operatorname{Cov}(\hat\beta(k))$ for $k\ge0$ if $\tau=0$.

Proof:

Both $\hat\beta_F(k)$ and $\hat\beta(k)$ have the same bias iff $\tau=0$, namely
$$\operatorname{Bias}(\hat\beta_F(k))=\operatorname{Bias}(\hat\beta(k))=-k(A+kI_p)^{-1}\beta.$$
Thus we only need to compare the dispersion matrices of $\hat\beta_F(k)$ and $\hat\beta(k)$:
$$\operatorname{Cov}(\hat\beta(k))-\operatorname{Cov}(\hat\beta_F(k))=\varphi[W(k)^{-1}AW(k)^{-1}]-\varphi A_k[A^{-1}-A^{-1}F'(FA^{-1}F')^{-1}FA^{-1}]A_k'=\varphi[W(k)^{-1}F'(FA^{-1}F')^{-1}FW(k)^{-1}],\tag{16}$$
where $W(k)=(A+kI_p)$, $A_k=W(k)^{-1}A$ and $W(k)^{-1}=A_kA^{-1}$. The difference $[\operatorname{Cov}(\hat\beta(k))-\operatorname{Cov}(\hat\beta_F(k))]$ is a psd matrix for $k\ge0$, which suffices to show that the RGRRE is superior to the GRRE.

3.1.3. Comparison between $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$ when $\tau\ne0$

When the prior constraints on the parameters are not true, the performance of $\hat\beta_F(k)$ ultimately depends on the assumed restrictions and on its bias. To compare $\hat\beta_F(k)$ and $\hat\beta_{RMLE}$ in the SMSE sense, we give Theorem 3.6.

Theorem 3.6.

Under the GRM with collinear regressors, let $\check k>0$, where $\check k=\dfrac{\min_j(\varphi\lambda_j^2m_{jj}+\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j)}{\max_j(\lambda_j\alpha_j^2+\alpha_j\breve\tau_j)}$. Then $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta_{RMLE})$ for $0<k<\check k<\infty$ when $\tau\ne0$.

Proof:

Rewrite the SMSEs of $\hat\beta_{RMLE}$ and $\hat\beta_F(k)$ when the restrictions do not hold. First,
$$\operatorname{SMSE}(\hat\beta_{RMLE})=\varphi\operatorname{tr}(G)+\tau^{*\prime}A^{-2}\tau^{*},$$
where $\tau^{*}=F'(FA^{-1}F')^{-1}\tau$. The final form of $\operatorname{SMSE}(\hat\beta_{RMLE})$ is
$$\operatorname{SMSE}(\hat\beta_{RMLE})=\sum_{j=1}^p\frac{\varphi\lambda_j^2m_{jj}+\breve\tau_j^2}{\lambda_j^2},\tag{17}$$
where $\breve\tau:=Q'\tau^{*}$. The SMSE of $\hat\beta_F(k)$ may be rewritten as
$$\delta(k):=\operatorname{SMSE}(\hat\beta_F(k))=\varphi\operatorname{tr}(A_kGA_k')+[W(k)^{-1}\tau^{*}-kW(k)^{-1}\beta]'[W(k)^{-1}\tau^{*}-kW(k)^{-1}\beta]=\varphi\sum_{j=1}^p\frac{\lambda_j^2m_{jj}}{(\lambda_j+k)^2}+\sum_{j=1}^p\frac{k^2\alpha_j^2+\breve\tau_j^2-2k\alpha_j\breve\tau_j}{(\lambda_j+k)^2}.\tag{18}$$
Taking the first derivative of (18) with respect to $k$ completes the theorem:
$$\delta'(k)=-2\varphi\sum_{j=1}^p\frac{\lambda_j^2m_{jj}}{(\lambda_j+k)^3}+2\sum_{j=1}^p\frac{k\lambda_j\alpha_j^2+k\alpha_j\breve\tau_j-\breve\tau_j^2-\alpha_j\lambda_j\breve\tau_j}{(\lambda_j+k)^3}=2\sum_{j=1}^p\frac{k(\lambda_j\alpha_j^2+\alpha_j\breve\tau_j)-(\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j+\varphi\lambda_j^2m_{jj})}{(\lambda_j+k)^3},$$
so that
$$\delta'(0)=-2\sum_{j=1}^p\frac{\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j+\varphi\lambda_j^2m_{jj}}{\lambda_j^3}.$$
Additionally,
$$\delta(0)=\sum_{j=1}^p\frac{\varphi\lambda_j^2m_{jj}+\breve\tau_j^2}{\lambda_j^2}=\operatorname{SMSE}(\hat\beta_{RMLE}).$$
Thus, to prove the theorem, it is enough to show that $\delta'(k)$ is negative. If $\delta'(0)<0$, then for $0<k<\check k<\infty$, where $\check k=\min_j(\varphi\lambda_j^2m_{jj}+\breve\tau_j^2+\alpha_j\lambda_j\breve\tau_j)/\max_j(\lambda_j\alpha_j^2+\alpha_j\breve\tau_j)$, we have $\delta(k)<\delta(0)$, i.e. $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta_{RMLE})$.

3.1.4. Comparison between $\hat\beta_F(k)$ and $\hat\beta(k)$ when $\tau\ne0$

Having examined the performance of $\hat\beta_F(k)$ in the previous section, it is now straightforward to show the superiority of $\hat\beta_F(k)$ over $\hat\beta(k)$.

Theorem 3.7.

Under the GRM with collinear regressors and $\tau\ne0$, $\operatorname{SMSE}(\hat\beta_F(k))<\operatorname{SMSE}(\hat\beta(k))$ for $0<\dfrac{\max_j(\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj}))}{\min_j(2\alpha_j\breve\tau_j)}<k<\infty$.

Proof:

The SMSE of $\hat\beta(k)$ is
$$\operatorname{SMSE}(\hat\beta(k))=\varphi\sum_{j=1}^p\frac{\lambda_j}{(\lambda_j+k)^2}+k^2\sum_{j=1}^p\frac{\alpha_j^2}{(\lambda_j+k)^2}.\tag{19}$$
Subtracting (18) from (19), the SMSE difference is
$$\operatorname{SMSE}(\hat\beta(k))-\operatorname{SMSE}(\hat\beta_F(k))=\sum_{j=1}^p\frac{2k\alpha_j\breve\tau_j-[\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj})]}{(\lambda_j+k)^2}.$$
The expression $2k\alpha_j\breve\tau_j-[\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj})]>0$ if $k>\max_j(\breve\tau_j^2-\varphi\lambda_j(1-\lambda_jm_{jj}))/\min_j(2\alpha_j\breve\tau_j)>0$. This completes the theorem.

4. Estimation of the shrinkage parameter k

In practice, the shrinkage parameter $k$ must be estimated so that the RMLE and RGRRE attain smaller MSE than the MLE and GRRE. For this purpose, we find the optimal value of $k$ by differentiating (12) with respect to $k$ and equating to zero:
$$\frac{\partial\operatorname{SMSE}(\hat\beta_F(k))}{\partial k}=2\sum_{j=1}^p\frac{k\lambda_j\alpha_j^2-\varphi\lambda_j^2m_{jj}}{(\lambda_j+k)^3}=0.\tag{20}$$
Setting each term of (20) to zero simplifies the condition to
$$\varphi\lambda_j^2m_{jj}-k\lambda_j\alpha_j^2=0.\tag{21}$$
Solving (21) for $k$ gives
$$k_j=\frac{\varphi\lambda_jm_{jj}}{\alpha_j^2},\tag{22}$$
where $\alpha_j^2$ is the square of $\alpha_j$, the $j$th element of the vector $\alpha=Q'\beta$; the details of these matrices were discussed in the previous section. Based on the optimal value (22), and motivated by the work in [Citation17,Citation18], where different methods were proposed for estimating the shrinkage parameter, we propose the following ridge estimators for the RGRRE:
$$k_1=\hat k_{mean}=\frac1p\sum_{j=1}^p\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2};\qquad k_2=\hat k_{median}=\operatorname{median}_j\left(\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right);$$
$$k_3=\hat k_{max}=\max_j\left(\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right);\qquad k_4=\hat k_{min}=\min_j\left(\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right);$$
$$k_5=\hat k_{HM}=\frac{p}{\sum_{j=1}^p\hat\alpha_j^2/(\hat\varphi\lambda_jm_{jj})};\qquad k_6=\hat k_{GM}=\left(\prod_{j=1}^p\frac{\hat\varphi\lambda_jm_{jj}}{\hat\alpha_j^2}\right)^{1/p}.$$
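A minimal sketch of how the six candidates can be computed from the fitted quantities (assuming `A`, `G`, `beta_mle` and `phi_hat` as in the earlier sketches; `shrinkage_candidates` is an illustrative name):

```python
import numpy as np

def shrinkage_candidates(A, G, beta_mle, phi_hat):
    """Candidate ridge parameters k_1..k_6 built from k_j = phi lambda_j m_jj / alpha_j^2."""
    lam, Q = np.linalg.eigh(A)           # A = Q diag(lam) Q'
    m = np.diag(Q.T @ G @ Q)             # m_jj, diagonal of M = Q' G Q
    alpha = Q.T @ beta_mle               # alpha = Q' beta
    kj = phi_hat * lam * m / alpha**2    # Equation (22), element-wise
    return {
        "k1_mean": kj.mean(),
        "k2_median": np.median(kj),
        "k3_max": kj.max(),
        "k4_min": kj.min(),
        "k5_harmonic": len(kj) / np.sum(1.0 / kj),
        "k6_geometric": np.exp(np.mean(np.log(kj))),  # valid when all k_j > 0
    }
```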

5. The Monte Carlo simulation

In this section, we present a Monte Carlo simulation study to assess the performance of our proposed estimators under a range of conditions, with the MSE as the assessment criterion.

5.1. The design of the simulation

The response variable of the GRM is generated using pseudo-random numbers from the $G(\mu_i,\varphi)$ distribution with the log link function, where
$$\mu_i=\exp(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\dots+\beta_px_{ip}),\quad i=1,\dots,n.\tag{23}$$
The parametric values in (23) are selected so that $\sum_{j=1}^p\beta_j^2=1$, which is a common constraint in simulation studies [Citation7].

Following the work of McDonald and Galarneau [Citation19], the explanatory variables are generated by
$$x_{ij}=(1-\rho^2)^{1/2}u_{ij}+\rho u_{i,p+1},\quad i=1,2,\dots,n,\ j=1,2,\dots,p,$$
where the $u_{ij}$ are pseudo-random numbers generated from the standard normal distribution and $\rho$ is specified so that the correlation between any two explanatory variables is $\rho^2$. To see the effect of correlation in the simulation, we consider $\rho^2$ = 0.90, 0.95, 0.99 and 0.999. Additionally, we consider seven sample sizes, n = 50, 100, 150, 200, 300, 400 and 500, to examine the effect of multicollinearity clearly. The values of $\beta$ affect the value of $F\beta-f$, which measures the correctness of the restrictions, and the performance of the estimators depends on its magnitude. Therefore, following the work of Månsson et al. [Citation11], the restrictions imposed for p = 4 are
$$F=\begin{bmatrix}1&0&1&1\\2&1&1&1\end{bmatrix}\quad\text{and}\quad f=\begin{bmatrix}0\\0\end{bmatrix},$$
and the restrictions for p = 8 are set to
$$F=\begin{bmatrix}1&0&1&1&2&1&1&1\\3&1&3&1&1&1&2&1\end{bmatrix}\quad\text{and}\quad f=\begin{bmatrix}0\\0\end{bmatrix}.$$
In the simulation procedure, the dispersion parameter is estimated by the Pearson method, i.e. $\hat\varphi=(n-p)^{-1}\sum_{i=1}^n(y_i-\hat\mu_i)^2/\hat\mu_i^2$, for each sample size and number of explanatory variables. For each combination of the various values of $n$, $p$ and $\rho$, the experiment is replicated R = 2000 times, and the simulated MSE is computed as [Citation20,Citation21]
$$\operatorname{MSE}(\hat\beta)=\frac1R\sum_{i=1}^R(\hat\beta_{(i)}-\beta)'(\hat\beta_{(i)}-\beta),$$
where $R$ is the number of replications, the subscript $(i)$ refers to the $i$th replication and $\hat\beta_{(i)}$ is the estimate of $\beta$ in the $i$th replication of the experiment.
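The data-generating step can be sketched as follows (our own illustrative code with an arbitrary seed; it assumes the intercept $\beta_0$ is passed separately):

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed for reproducibility

def simulate_grm(n, p, rho2, beta, phi, beta0=0.0):
    """One replicate of the McDonald-Galarneau design with a gamma response, Equation (23)."""
    rho = np.sqrt(rho2)
    u = rng.standard_normal((n, p + 1))
    X = np.sqrt(1.0 - rho2) * u[:, :p] + rho * u[:, [p]]  # pairwise correlation about rho^2
    mu = np.exp(beta0 + X @ beta)                         # log link
    y = rng.gamma(shape=1.0 / phi, scale=mu * phi)        # mean mu, variance phi * mu^2
    return X, y
```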

5.2. Results and discussion

Tables 1 and 2 present the estimated MSEs of the GRM estimators. To assess the proposed estimators, we consider multiple factors, i.e. the degree of multicollinearity, the sample size and the number of explanatory variables, and we compare the performance of the proposed estimator with some existing estimators. We observe from the tables that the new proposed estimators perform better than the RMLE as well as the MLE and GRRE. In particular, the shrinkage parameter $k_3$ performs best in the sense of minimum MSE compared with the other proposed estimators.

Table 1. Estimated MSE when p = 4.

It can also be observed from Tables 1 and 2 that as the level of multicollinearity increases, the estimated MSEs of the considered estimators generally increase. Moreover, the findings demonstrate that the unrestricted estimators, i.e. the MLE and GRRE, are severely affected by correlated explanatory variables. When severe multicollinearity exists among the explanatory variables, i.e. $\rho^2=0.999$ and n = 500, the MSEs of the MLE, RMLE, GRRE and the six proposed estimators are, respectively, 13.2751, 8.5290, 10.2307, 1.8389, 2.9298, 1.5251, 11.7882, 6.9484 and 2.5806. Among these, $k_3$ attains the minimum MSE, so our proposed $k_3$ is the most resistant to severe multicollinearity. The number of explanatory variables also plays a critical role in the performance of all the listed estimators: Tables 1 and 2 show that as $p$ increases from 4 to 8, the MSEs of all the estimators increase rapidly. For both values of $p$, our proposed restricted ridge parameters outperform the RMLE and the other estimation methods in all of the evaluated situations; thus the proposed RGRRE significantly decreases the MSE. The sample size is also an important factor in judging the performance of any estimator. The findings show that as the sample size increases, the estimated MSE gradually decreases for all the estimators under study, which is an important property of any estimator. Overall, for the estimation of the unknown parameters, $k_3$ is the most robust option compared with the other estimation methods in the presence of severe multicollinearity.

Table 2. Estimated MSE when p = 8.

6. An empirical application: hydrocarbon escape data

In this section, we evaluate the performance of the proposed estimator using a real application. For this purpose, we consider the hydrocarbon escape dataset taken from Weisberg [Citation22]. When petrol is pumped into tanks, hydrocarbons escape into the atmosphere, and different devices are installed to absorb the vapours and reduce atmospheric pollution. To evaluate their effectiveness, 32 laboratory experiments were conducted without using the devices. Four explanatory variables were involved in these experiments, described as follows: the quantity $y$ (in grams) of hydrocarbon escaping was measured as a function of the tank temperature $x_1$ (in °F), the temperature $x_2$ (in °F) of the petrol pumped in, the initial pressure $x_3$ in the tank and the pressure $x_4$ of the petrol pumped in (both in pounds per square inch). Before proceeding further, it is crucial to identify the probability distribution of the response variable $y$ so that the appropriate regression model can be chosen. For this purpose, we use three important tests, namely the Anderson–Darling, Cramér–von Mises and Pearson $\chi^2$ tests (for a detailed description, we recommend [Citation23,Citation24]). The results in Table 3 show that the hydrocarbon escape dataset is well fitted by the gamma distribution, which has the largest p-value among the competing distributions. More specifically, the test statistic (p-value) of the Cramér–von Mises test is 0.0459 (0.5811), clearly signifying that the considered dataset is well fitted by the gamma distribution. The bivariate correlations among the four explanatory variables are displayed in Table 4. It can be clearly seen from Table 4 that all four explanatory variables are highly correlated, which indicates that the dataset is highly multicollinear.

Table 3. Goodness-of-fit tests for the hydrocarbon escape data.

Table 4. Correlation matrix.

Based on the analysis of the dataset, the estimated coefficients and relative efficiencies are summarized in Table 5 to judge the performance of the MLE, RMLE and RGRREs. The restriction matrix for the considered dataset is $F=[0,1,1,1,1]$ with $f=[0]$. Relative efficiency is used for assessment in this example, defined as
$$e(\hat\beta_{MLE},\hat\beta_i)=\frac{\operatorname{SMSE}(\hat\beta_i)}{\operatorname{SMSE}(\hat\beta_{MLE})},$$
where $\hat\beta_i\in\{\hat\beta_{RMLE},\hat\beta(k),\hat\beta_F(k_1),\dots,\hat\beta_F(k_6)\}$ is any biased estimator. Thus $e(\hat\beta_{MLE},\hat\beta_i)<1$ indicates improved precision relative to the MLE, $e(\hat\beta_{MLE},\hat\beta_i)>1$ indicates worse performance, and the minimum value of $e(\hat\beta_{MLE},\hat\beta_i)$ identifies the best estimator. From Table 5, the RMLE ($\hat\beta_{RMLE}$), the GRRE ($\hat\beta(k)$) and the proposed RGRREs ($\hat\beta_F(k_1)$–$\hat\beta_F(k_6)$) all have smaller SMSE than the MLE ($\hat\beta_{MLE}$). The application results show that the proposed RGRRE outperforms the RMLE and GRRE in the presence of high, but imperfect, multicollinearity. Among the estimators in Table 5, the MSE is largest for $\hat\beta_{MLE}$ and $\hat\beta(k)$, indicating poor performance under severe multicollinearity with many explanatory variables, as in the simulation study; the second worst estimator is $\hat\beta_{RMLE}$, and the estimator that minimizes the MSE is $\hat\beta_F(k_6)$.
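The relative-efficiency criterion used in Table 5 is a one-line computation (illustrative sketch):

```python
def relative_efficiency(smse_i, smse_mle):
    """e(beta_MLE, beta_i) = SMSE(beta_i) / SMSE(beta_MLE); values below 1 favour beta_i."""
    return smse_i / smse_mle
```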

Table 5. Estimated parameter and relative efficiency for different estimators.

7. Conclusion

In this paper, we introduced a new RGRRE for estimating the unknown parameters of the GRM in the presence of mild to severe multicollinearity. We also proposed some methods for choosing the ridge parameter $k$ of the RGRRE. A Monte Carlo simulation study was designed to evaluate the performance of the proposed RGRREs against other estimators under different conditions, with the MSE as the assessment criterion. Moreover, to illustrate the benefits of the proposed estimator, we also considered an empirical application. From the simulation and real application results, we conclude that the RGRRE performs better than the MLE, RMLE and GRRE under the different ridge parameters. Specifically, the RGRRE with parameter $k_3$ is the most robust and recommended estimator in the sense of minimum MSE, since the Monte Carlo simulation results showed $k_3$ to be the most resistant estimator in the presence of severe multicollinearity. Hence, we suggest that researchers use the RGRRE with ridge parameters $k_3$ and $k_6$ for the estimation of the GRM in the presence of severe multicollinearity.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Algamal ZY. Developing a ridge estimator for the gamma regression model. J Chemometr. 2018;32(10):e3054.
  • Qasim M, Amin M, Amanullah M. On the performance of some new Liu parameters for the gamma regression model. J Stat Comput Simul. 2018;88(16):3065–3080.
  • Nyquist H. Restricted estimation of generalized linear models. J Royal Statist Soc C Appl Statist. 1991;40(1):133–141.
  • Kibria BG, Saleh AME. Improving the estimators of the parameters of a probit regression model: a ridge regression approach. J Stat Plan Inference. 2012;142(6):1421–1435.
  • Qasim M, Kibria BMG, Månsson K, et al. A new Poisson Liu regression estimator: method and application. J Appl Stat. 2020;47(12):2258–2271.
  • Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67.
  • Amin M, Qasim M, Amanullah M, et al. Performance of some ridge estimators for the gamma regression model. Statist Papers. 2020;61(3):997–1026.
  • Kaçiranlar S, Sakallioğlu S, Akdeniz F, et al. A new biased estimator in linear regression and a detailed analysis of the widely-analyzed dataset on portland cement. Sankhyā Ind J Stat B. 1999: 443–459.
  • Sarkar N. A new estimator combining the ridge regression and the restricted least squares methods of estimation. Commun Stat Theory Methods. 1992;21(7):1987–2000.
  • Mahmoudi A, Arabi Belaghi R, Mandal S. A comparison of preliminary test, stein-type and penalty estimators in gamma regression model. J Stat Comput Simul. 2020;90(17):3051–3079.
  • Månsson K, Kibria BM, Shukur G. A restricted Liu estimator for binary regression models and its application to an applied demand system. J Appl Stat. 2016;43(6):1119–1127.
  • Asar Y, Arashi M, Wu J. Restricted ridge estimator in the logistic regression model. Commun Stat Simul Comput. 2017;46(8):6538–6544.
  • Segerstedt B. On ordinary ridge regression in generalized linear models. Commun Statist Theory Methods. 1992;21(8):2227–2246.
  • Kurtoğlu F, Özkale MR. Restricted ridge estimator in generalized linear models: Monte Carlo simulation studies on Poisson and binomial distributed responses. Commun Stat Simul Comput. 2017;48(4):1191–1218.
  • Wang SG, Wu MX, Jia ZZ. Matrix inequalities. 2nd ed. Beijing: Chinese Science Press; 2006.
  • Rao CR, Toutenburg H, Shalabh, et al. Linear models and generalizations—least squares and alternatives. Berlin: Springer; 2008.
  • Kibria BG. Performance of some new ridge regression estimators. Commun Stat Simul Comput. 2003;32(2):419–435.
  • Qasim M, Månsson K, Kibria BMG. On some beta ridge regression estimators: method, simulation and application. J Stat Comput Simul. 2021;91(9):1699–1712.
  • McDonald GC, Galarneau DI. A Monte Carlo evaluation of some ridge-type estimators. J Am Stat Assoc. 1975;70(350):407–416.
  • Månsson K, Shukur G. A Poisson ridge regression estimator. Econ Model. 2011;28:1475–1481.
  • Varathan N, Wijekoon P. Optimal generalized logistic estimator. Commun Stat Theory Methods. 2018;47:463–474.
  • Weisberg S. Applied linear regression. John Wiley & Sons; 1980.
  • Zhang J. Powerful goodness-of-fit and multi-sample tests [PhD thesis]. Toronto: York University; 2011.
  • Evans DL, Drew JH, Leemis LM. The distribution of the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling test statistics for exponential populations with estimated parameters. Commun Stat Theory Methods. 2008;37:1396–1421.