Abstract
In this article, we propose a restricted gamma ridge regression estimator (RGRRE) that combines the gamma ridge regression (GRR) and restricted maximum likelihood estimator (RMLE) to combat the multicollinearity problem when estimating the parameters of the gamma regression model. The properties of the new estimator are discussed, and its superiority over the GRR, RMLE and traditional maximum likelihood estimator is analysed theoretically under different conditions. We also suggest some estimation methods for finding the optimal value of the shrinkage parameter. A Monte Carlo simulation study is conducted to judge the performance of the proposed estimator. Finally, an empirical application is analysed to show the benefit of the RGRRE over the existing estimators.
1. Introduction
The gamma regression model (GRM) is frequently used in medical sciences, health-care economics and automobile insurance claims [Citation1]. The GRM is appropriate when the dependent variable is positively skewed, and it assumes that the dependent variable follows a gamma distribution [Citation2]. The parameter vector of the GRM is usually estimated by the maximum likelihood estimator (MLE) via the iteratively reweighted least squares (IRWLS) algorithm, which minimizes the weighted sum of squared errors. However, the MLE may not be suitable when the parameter vector is suspected to belong to a linear subspace of restrictions, where f is a vector of known elements and F is a known matrix of full rank. In this situation, a restricted maximum likelihood estimator (RMLE) should be used [Citation3,Citation4]. The problem of multicollinearity and its consequences for a generalized linear model (GLM) are well known; one of the most important consequences for both the restricted and unrestricted estimators in the GRM is that they yield large variances of the regression coefficients. Further consequences are wider confidence intervals and decreased statistical power, which increase the probability of a type II error in hypothesis tests on the parameters [Citation5].
Unrestricted ridge estimators for the classical linear regression model and the GRM were proposed by Hoerl and Kennard [Citation6] and Amin et al. [Citation7], respectively, to remedy the problems caused by multicollinearity. However, when prior information about the regression coefficients is available, the coefficients are estimated with a restricted estimator, since such information can be formulated as linear restrictions in the GLM. For instance, linear restrictions arise in the sum-to-zero parameterization when the regressors are qualitative, and in quantal response problems with constant relative potency between drugs (see [Citation3] for more details). Therefore, in this paper, the model of interest is the GRM subject to a system of linear restrictions. Nyquist [Citation3] proposed the RMLE for the GLM under linear restrictions on the parameters. However, in the presence of multicollinearity, the weighted matrix of cross products is ill-conditioned, which leads to instability and high variances of both the (unrestricted) MLE and the RMLE. To overcome the problem of multicollinearity, different kinds of shrinkage estimators have been proposed. For the linear regression model, Kaçiranlar et al. [Citation8] developed the restricted Liu estimator by combining the Liu estimator and the restricted least squares estimator (RLSE), and Sarkar [Citation9] proposed a restricted ridge regression approach by combining ordinary ridge regression and the RLSE. However, the literature on restricted ridge regression under the GRM is limited.
The aim of this paper is to propose a new restricted gamma ridge regression estimator (RGRRE) by combining the GRRE and the RMLE, following the work of Sarkar [Citation9]. The motivation behind the new estimator is as follows. Under certain circumstances, a reparameterization in the form of linear combinations of the regression coefficients arises when the linear restrictions on the coefficients are true. If multicollinearity is present under the reparametrized model, the RMLE must be modified along the lines of the GRRE to obtain efficient estimates. The mean square error (MSE) properties of the RGRRE are studied, and we show the superiority of the RGRRE over the RMLE. We also suggest some methods for estimating the value of the ridge parameter.
The rest of this study is organized as follows: The GRM and existing estimators are discussed in Section 2. In Section 3, we derive the MSE properties of the proposed estimator and the theoretical comparison with existing estimators is considered. In Section 4, we suggest some estimators for the selection of the ridge parameter of the RGRRE. The design of the Monte Carlo simulation study and its results are provided in Section 5. A real-life dataset is analysed in Section 6. Some concluding remarks are presented in Section 7.
2. The gamma regression model and estimators
In the GLM framework, the response variable is assumed to follow an exponential family distribution with mean . The random variables contain n independent observations, i.e. . Suppose that the probability density function of is as below, where is the location parameter, is the dispersion parameter and is the cumulant function. The GLM employs the relation (1), where is the monotonic differentiable log link function, is the linear predictor, is a vector of regression coefficients and is the ith row of , which is an matrix of non-stochastic explanatory variables. The log-likelihood function is then defined as . The MLE of is obtained by differentiating this function, which gives . Now consider a response variable with n independent observations drawn from the gamma distribution. Using the reparameterization and , where is the mean function of the response variable and is the dispersion parameter (see [Citation7,Citation10] for more details on GRM estimation), the mean function of the GRM is defined as under the log link function given in Equation (1). The most common estimation method for the GRM is the MLE, which is obtained by solving . Since this function is non-linear in , the expression is solved iteratively through Fisher's scoring method. Let denote the estimated value of the MLE at iteration , which may be written as , where represents the Fisher information matrix and is the vector of scores; both are computed at the last iteration , where convergence is achieved. After the final iteration, the MLE can be obtained by the IRWLS method as (2), where , and is the vector with elements evaluated at the last iteration , whereas represents the mean function of the response variable under the log link. Let and ; then . Following Kibria and Saleh [Citation4], we assume two conditions: (i) as and (ii) as , where is a finite positive definite (p.d.) matrix and is the ith row of the matrix .
The asymptotic distribution of is stated as
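The IRWLS/Fisher-scoring scheme described above can be sketched as follows. For the gamma model with a log link, the GLM working weights are constant, so each scoring step reduces to an ordinary least-squares fit to the adjusted (working) response. This is a minimal sketch under those standard GLM facts; the function name, starting value and tolerances are our own illustrative choices, not the paper's notation.

```python
import numpy as np

def gamma_mle_irwls(X, y, tol=1e-10, max_iter=100):
    """MLE of a log-link gamma regression via IRWLS (Fisher scoring).

    With the log link and variance function proportional to mu^2, the
    GLM working weights are constant, so each step is an OLS fit to
    the adjusted response z = eta + (y - mu) / mu.
    """
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())          # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu         # adjusted (working) response
        beta_new = np.linalg.solve(X.T @ X, X.T @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta
```

At a fixed point, the update equation implies that the score contribution X'((y - mu)/mu) vanishes, which is exactly the MLE condition for this model.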
2.1. The RMLE
One method of improving the efficiency of estimators is the use of extraneous or prior information. In practice, such prior information may be available about the regression coefficients. For instance, in applied economics, constant returns to scale imply that the exponents of a Cobb–Douglas production function should sum to unity. As a second example, the absence of money illusion on the part of consumers implies that the sum of the money income and price elasticities in a demand function should be zero. Such prior information may come from an extraneous source, user experience or theoretical considerations, among others. To use this information in enhancing the estimation of the regression coefficients, it can be expressed in the form of linear restrictions.
Our primary aim is to estimate when it is suspected that belongs to the linear subspace defined by (3), where are scalars and are known vectors imposing linearly independent restrictions on the parameter vector . In such a situation, we may use a restricted estimator of [Citation3,Citation4,Citation11,Citation12]. The RMLE [Citation3] is obtained by maximizing the log-likelihood function of the GRM over under the restrictions . One method for solving restricted optimization problems is the quadratic penalty function. The quadratic function for the RMLE is given as for a fixed, positive value of . Differentiating with respect to gives . We compute the RMLE by a method similar to that for the unrestricted estimator. The approximation of the RMLE is finally obtained as (4). For testing the hypothesis , a Wald-type test is defined as . Under , as , the statistic follows a central chi-square distribution with m degrees of freedom.
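The Wald-type test just described can be sketched in a few lines, assuming the standard quadratic form built from the restriction discrepancy and an estimate of the MLE's covariance matrix; the helper name is ours.

```python
import numpy as np
from scipy import stats

def wald_test(beta_hat, cov_hat, F, f):
    """Wald-type test of H0: F beta = f.

    beta_hat : unrestricted MLE; cov_hat : its estimated covariance;
    F (m x p) and f (m,) define the linear restrictions.
    Returns the statistic and its asymptotic chi-square p-value (m df).
    """
    d = F @ beta_hat - f
    W = d @ np.linalg.solve(F @ cov_hat @ F.T, d)
    m = F.shape[0]
    return W, stats.chi2.sf(W, df=m)
```

When the restrictions hold exactly, the discrepancy vector is zero and the statistic is zero; large statistics lead to rejection against the chi-square reference distribution.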
Clearly, the RMLE is biased unless (3) holds. The variance–covariance matrix of is defined as , where is the estimated dispersion parameter computed as
The RMLE is superior to the MLE in that it has a smaller sampling variance. The asymptotic variance–covariance matrix of is defined as . It can be shown that the difference is a positive semidefinite (psd) matrix, so the RMLE has a smaller sampling variance than the MLE. Amin et al. [Citation7] proposed the GRRE for the GRM as (5), where , is an identity matrix of order and is the ridge parameter. The MSE properties of the GRRE are given by Amin et al. [Citation7].
3. Proposed estimator
The RMLE of is obtained by maximizing the log-likelihood function of the GRM subject to the restrictions given in Equation (3). Under multicollinearity in the reparametrized model, however, the RMLE may produce poor estimates and misleading inferences, just as the MLE does. It is therefore necessary to modify the RMLE along the lines of the GRRE to obtain efficient estimates under a set of linear restrictions, following the work of Sarkar [Citation9]. The quadratic function for the RGRRE is defined as , where is the vector of Lagrange multipliers. This quadratic function combines the objective functions of the GRRE [Citation13] and the RMLE [Citation3]; the same idea was carried out by Kurtoğlu and Özkale [Citation14] for the count regression model, where the dispersion parameter equals 1. Differentiating with respect to gives . Now define as the matrix with elements , and take the second-order derivatives of . Taking the expectation of both sides, we have , where .
Let denote the RGRRE of at iteration . The Fisher scoring method then yields . Based on the objective functions of the RMLE and GRRE, the RGRRE is defined as (6). The final form of the restricted ridge estimator in the GLM was proposed by Kurtoğlu and Özkale [Citation14] and is the same for the GRM. However, the MSE properties under the GRM differ from those under the count regression model, because the dispersion parameter enters the estimation of the optimal shrinkage parameter in the RGRRE. Moreover, the performance of shrinkage estimators differs across forms of the GLM in the restricted setting (e.g. [Citation4,Citation10–12]).
The RGRRE can be simplified as (6a), where and . It is easily seen that when .
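The construction in (6) and (6a) can be sketched numerically. We assume the standard restricted-estimator forms used in this line of work (Sarkar [Citation9]; Kurtoğlu and Özkale [Citation14]): the RMLE corrects the MLE through the weighted cross-product matrix so that the restrictions hold exactly, and the restricted ridge estimator then passes the RMLE through the ridge filter. Symbol names (`S` for the weighted cross-product matrix) are our own.

```python
import numpy as np

def rmle(beta_mle, S, F, f):
    """Restricted MLE: adjust the MLE so that F beta = f holds exactly,
    using the weighted cross-product matrix S = X' W X (a sketch of the
    standard restricted-estimator form)."""
    Sinv_Ft = np.linalg.solve(S, F.T)
    adj = Sinv_Ft @ np.linalg.solve(F @ Sinv_Ft, F @ beta_mle - f)
    return beta_mle - adj

def rgrre(beta_mle, S, F, f, k):
    """Restricted gamma ridge estimator: shrink the RMLE through the
    ridge filter (S + k I)^{-1} S; k = 0 recovers the RMLE, as in (6a)."""
    beta_r = rmle(beta_mle, S, F, f)
    p = S.shape[0]
    return np.linalg.solve(S + k * np.eye(p), S @ beta_r)
```

Because the ridge filter is symmetric with eigenvalues strictly below one for k > 0, the RGRRE always has a smaller Euclidean norm than the RMLE it shrinks.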
We compute the bias of by using (6) as . Thus the RGRRE is a biased estimator of the parameter vector unless and , where is the null vector.
The variance–covariance matrix of is computed as
3.1. MSE properties and superiority of the
In this section, we derive the MSE properties and gauge the performance of . The matrix mean squared error (MMSE) of an estimator of can be defined as (7), where denotes the variance–covariance matrix of the estimator and represents the bias vector. The scalar MSE (SMSE) is obtained by applying the trace operator, which gives (8). Let and be two competing estimators of the parameter ; the estimator is said to be superior to in the MMSE sense if . Moreover, if dominates in the MMSE sense, then . For the comparisons that follow, we state two lemmas.
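Definitions (7) and (8) translate directly into code: the MMSE adds the outer product of the bias vector to the covariance matrix, and the SMSE is its trace. A minimal sketch:

```python
import numpy as np

def mmse(cov, bias):
    """Matrix MSE of an estimator: MMSE = Cov + bias bias' (Eq. (7))."""
    bias = np.asarray(bias).reshape(-1, 1)
    return cov + bias @ bias.T

def smse(cov, bias):
    """Scalar MSE: trace of the MMSE (Eq. (8)) = total variance + squared bias."""
    return np.trace(mmse(cov, bias))
```

An estimator dominates another in the MMSE sense when the difference of their MMSE matrices is positive semidefinite, which implies (but is not implied by) a smaller SMSE.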
Lemma 1.
Let be a real symmetric matrix and a matrix. Then each eigenvalue of the matrix is nonnegative.
Proof:
See Wang et al. [Citation15].
Lemma 2.
Suppose that the matrices and are nonsingular, and that and are matrices of appropriate orders. Then
Proof:
See Rao et al. [Citation16].
3.1.1. Comparison between and when the prior restrictions on the parameters are true, i.e.
It is evident that unless (3) holds. The is an unbiased estimator when , while is always a biased estimator.
Theorem 3.1.
Under the GRM, is a biased estimator and is an unbiased estimator when . Nevertheless, has a smaller sampling variance for .
Proof:
To compare performance through the variance–covariance matrices of and , we compute the difference of the two matrices as (9), where (psd), (positive definite) and . From (9), we conclude that is a non-negative definite (nnd) matrix and is a psd matrix. Therefore, is psd for . Hence, the sampling variance of is smaller than that of .
We now discuss the SMSE properties of and compare it with . Following [Citation11] and [Citation3], we define the SMSE of for the GRM as (10), where represents the jth diagonal element of the matrix and Q is the orthogonal matrix such that . The SMSE of the RGRRE is computed as (11). If we assume that the prior restrictions hold, i.e. , then (11) may be written as . After simplification, the SMSE of the RGRRE is (12), where is the jth element of , is the jth eigenvalue of the matrix A, and and represent the total variance and squared bias of , respectively. From (12), note that the bias of the RGRRE equals the bias of the GRRE when the prior restrictions on the parameters are true, i.e. . Following [Citation6], we establish the following SMSE properties of for the GRM.
Theorem 3.2.
The total variance is a continuous and monotonically decreasing function of k.
Proof:
Differentiating the expression with respect to k gives (13). From (13), it is evident that is a continuous and monotonically decreasing function of k, since when and .
Theorem 3.3.
The squared bias is a continuous and monotonically increasing function of k.
Proof:
Differentiating the expression with respect to k gives (14). Equation (14) indicates that is a continuous and monotonically increasing function of k for and .
Theorem 3.4.
Under the GRM, there always exists a value of k in the range such that when .
Proof:
The first derivative of (12) with respect to k is (15). As discussed earlier, and , . It is well known that the total variance and total bias are monotonically decreasing and increasing functions of k, respectively; hence and are always non-positive and non-negative, respectively. Thus, to prove the theorem, it suffices to give a condition under which (15) is negative for . Hence, we have shown the existence of a value of k for which , which establishes the superiority of the RGRRE over the RMLE.
Remark.
From Theorem 3.4, note that , where and is a psd matrix. Consequently, and . It can also be noted that . Therefore, one beneficial consequence of incorporating exact prior information is that the range of values of k for which the RGRRE dominates the RMLE is reduced compared with the range for which the GRRE dominates the traditional MLE in the SMSE sense.
3.1.2. Comparison between and when the prior restrictions on the parameters are true, i.e.
This section compares the performance of and .
Theorem 3.5.
Under the GRM, the SMSE of is less than that of for .
Proof:
Both and have the same bias if and only if , which can be written as . Thus we need only compare the dispersion matrices of and : (16), where , and . is a psd matrix for . This suffices to show that the RGRRE is superior to the GRRE.
3.1.3. Comparison between and when
The performance of ultimately depends on the assumed restrictions and on its bias when the prior constraints on the parameters are not true. To compare and in the SMSE sense, we state Theorem 3.6.
Theorem 3.6.
Under the GRM with collinear regressors, if , where , then for when .
Proof:
We rewrite the SMSE of and when the restriction does not hold, respectively: , where . The final form of the SMSE is (17), where . The SMSE of may be rewritten as (18). We take the first derivative of (18) with respect to k to complete the proof: , and so . Additionally, . Thus, to prove the theorem, it suffices to show that is negative. If , then for , where , we have or .
3.1.4. Comparison between and when
Since the performance of was discussed in the previous section, it is straightforward to show the superiority of over .
Theorem 3.7.
Under the GRM with collinear regressors, when , if , then for .
Proof:
The SMSE of is defined as (19). The SMSE difference is computed from (18) and (19): . The expression is negative if , which completes the proof.
4. Estimation of the shrinkage parameter k
In practice, the shrinkage parameter k must be estimated so that the RMLE and RGRRE attain a smaller MSE than the MLE and GRRE. For this purpose, we find the optimal value of k by differentiating (12) with respect to k and equating it to zero, which gives (20). This equation simplifies to (21), and solving (21) for k yields (22), where is the square of the jth element of the vector . The details of these matrices were discussed in the previous section. Having derived the optimal value of the shrinkage parameter, we propose several estimators of it and assess their efficiency. Motivated by [Citation17,Citation18], where different methods for estimating the shrinkage parameter were proposed, we suggest the following ridge estimators for the RGRRE:
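In practice, rules of this kind are built by applying summary functions (maximum, minimum, median, means) to elementwise quantities of the form given by (22). The sketch below is illustrative only: it assumes the elementwise ratio of the estimated dispersion to the squared transformed coefficient, in the Hoerl–Kennard / Kibria [Citation17] style, and the six summary rules are our own choices, not the paper's exact six estimators.

```python
import numpy as np

def k_candidates(alpha_hat, phi_hat):
    """Illustrative shrinkage-parameter rules built from the elementwise
    ratios phi_hat / alpha_j^2 (Hoerl-Kennard / Kibria style summaries).
    The specific combinations are assumptions for illustration."""
    r = phi_hat / np.asarray(alpha_hat) ** 2
    return {
        "k_max": np.max(r),
        "k_min": np.min(r),
        "k_median": np.median(r),
        "k_amean": np.mean(r),
        "k_gmean": np.exp(np.mean(np.log(r))),  # geometric mean
        "k_hm": len(r) / np.sum(1.0 / r),       # harmonic mean
    }
```

Each candidate k would then be plugged into the RGRRE and the resulting MSEs compared, as done in the simulation study that follows.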
5. The Monte Carlo simulation
In this section, we present a Monte Carlo simulation study to assess the performance of our proposed estimators under different conditions, with the MSE as the assessment criterion.
5.1. The design of the simulation
The response variable of the GRM is generated using pseudo-random numbers from the gamma distribution with the log link function, where (23). The parameter values in (23) are chosen so that , which is a common constraint in simulation studies [Citation7].
Following McDonald and Galarneau [Citation19], the explanatory variables are generated by , where are pseudo-random numbers generated from the standard normal distribution and is specified so that the correlation between any two explanatory variables is . To examine the effect of correlation in the simulation, we consider = 0.90, 0.95, 0.99 and 0.999. We also consider seven sample sizes, n = 50, 100, 150, 200, 300, 400 and 500, to examine the effect of multicollinearity clearly. The values of affect the value of , which measures the correctness of the restrictions, and the performance of the estimators depends on the magnitude of . Therefore, following Mansson et al. [Citation11], restrictions are imposed for p = 4 and p = 8, given respectively as . The restrictions for p = 8 are set to be . In the simulation procedure, the dispersion parameter is estimated by the Pearson method, i.e. , for each sample size and number of explanatory variables. For each combination of the values of n, p and , the experiment is replicated 2000 times, and the simulated MSE is computed as follows [Citation20,Citation21]: , where R is the number of replications, set to 2000, the subscript (i) refers to the ith replication and is the estimate of in the ith replication of the experiment.
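The data-generating steps above can be sketched as follows. The McDonald–Galarneau construction mixes independent standard normal draws with a shared component so that any two regressors have correlation equal to the squared mixing coefficient; the gamma response uses the log link with shape equal to the reciprocal of the dispersion. Function names and the shape/scale parameterization of NumPy's gamma generator are our own conventions.

```python
import numpy as np

def make_collinear_X(n, p, rho, rng):
    """McDonald-Galarneau generator:
    x_ij = sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1},
    giving pairwise correlation rho^2 between any two regressors."""
    Z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * Z[:, :p] + rho * Z[:, [p]]

def simulate_gamma_response(X, beta, phi, rng):
    """Gamma response under the log link with dispersion phi:
    shape = 1/phi, scale = mu * phi, so E[y] = mu = exp(X beta)."""
    mu = np.exp(X @ beta)
    shape = 1.0 / phi
    return rng.gamma(shape, mu / shape)
```

Looping this generator over the grid of n, p and correlation levels, and averaging squared estimation errors over replications, reproduces the simulated-MSE criterion described above.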
5.2. Results and discussion
The tables present the estimated MSEs of the GRM estimators. To assess the performance of the proposed estimators, we consider several factors, namely the degree of multicollinearity, the sample size and the number of explanatory variables, and compare the proposed estimator with existing estimators. The tables show that the newly proposed estimators perform better than the RMLE as well as the MLE and GRRE. In particular, the shrinkage parameter achieves the smallest MSE among the proposed estimators.
It can also be observed from the tables that increasing the level of multicollinearity generally increases the estimated MSEs of all the estimators considered. The findings also demonstrate that the unrestricted estimators, i.e. the MLE and GRRE, are severely affected by correlated explanatory variables. The tables show that when severe multicollinearity exists among the explanatory variables, i.e. and n = 500, the MSEs of the MLE, RMLE, GRRE and the six proposed estimators are, respectively, 13.2751, 8.5290, 10.2307, 1.8389, 2.9298, 1.5251, 11.7882, 6.9484 and 2.5806. Among these, attains the minimum MSE. Hence, our proposed estimators are more resistant to severe multicollinearity than the others. The number of explanatory variables also plays a critical role in the performance of all the listed estimators: as p increases from 4 to 8, the MSEs of all the estimators increase rapidly. For both values of p, the proposed restricted ridge parameters outperform the RMLE and the other estimation methods in all the evaluated situations; thus the proposed RGRRE significantly decreases the MSE. The sample size is also an important factor in judging the performance of any estimator. The findings show that, as the sample size increases, the estimated MSE gradually decreases for all the estimators under study, which is an important property of any estimator. Overall, for estimating the unknown parameters, is the most robust choice in the presence of severe multicollinearity.
6. An empirical application: hydrocarbon escape data
In this section, we evaluate the performance of the proposed estimator in a real application. For this purpose, we consider a hydrocarbon dataset taken from Weisberg [Citation22]. When petrol is pumped into tanks, hydrocarbons escape into the atmosphere, and devices are installed to absorb the vapours and reduce atmospheric pollution. To evaluate their effectiveness, 32 laboratory experiments were conducted without the devices. Four explanatory variables are involved in the experiment, described as follows: the quantity y (in grams) of hydrocarbon escaping was measured as a function of the tank temperature (in ), the temperature (in ) of the petrol pumped in, the initial pressure in the tank and the pressure of the petrol pumped in (both in pounds per square inch). Before proceeding, it is crucial to identify the probability distribution of the response variable y so that the appropriate regression model can be chosen. For this purpose, we use three tests, namely the Anderson–Darling, Cramér–von Mises and Pearson tests (for a detailed description, see [Citation23,Citation24]). The results in the table show that the hydrocarbon escape dataset is well fitted by the gamma distribution, which has the maximum p-value among the competing distributions. More specifically, the Cramér–von Mises test statistic (p-value) of 0.0459 (0.5811) clearly signifies that the dataset is well fitted by the gamma distribution. The bivariate correlations among the four explanatory variables are displayed in the table; all four explanatory variables are highly correlated, which clearly indicates that the dataset is highly multicollinear.
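The distributional check described above can be sketched with SciPy, which provides the Cramér–von Mises statistic directly. This is only an approximation of the paper's procedure: the gamma parameters are estimated from the data before testing (with the location fixed at zero), which makes the reference p-value approximate, and the function name is ours.

```python
import numpy as np
from scipy import stats

def gof_gamma(y):
    """Sketch: check the gamma fit of a positive response with the
    Cramer-von Mises test, plugging in parameters estimated from the
    data (so the p-value is only approximate)."""
    a, loc, scale = stats.gamma.fit(y, floc=0)   # fix location at 0
    res = stats.cramervonmises(y, "gamma", args=(a, loc, scale))
    return res.statistic, res.pvalue
```

A large p-value (as with the 0.5811 reported for the hydrocarbon data) indicates no evidence against the gamma assumption; the same pattern would be repeated for competing candidate distributions.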
Based on the analysis of the dataset, the estimated coefficients and relative efficiencies are summarized in the table to judge the performance of the MLE, RMLE and RGRREs. The restriction matrix for this dataset is specified as with . Relative efficiency is used for assessment in this example, defined as , where is any biased estimator; smaller values indicate improved precision relative to the MLE, and larger values indicate worse performance. From the table, the RMLE (), GRRE () and proposed RGRREs () have smaller SMSE than the MLE (). Moreover, the minimum value of identifies the best estimator. The application results show that the proposed RGRRE outperforms the RMLE and GRRE in the presence of high, but imperfect, multicollinearity. The table reports the estimated parameters and relative efficiencies of the different estimators for the GRM. The MSE is largest for and , indicating poor performance under severe multicollinearity and many explanatory variables, as also seen in the simulation study. The second-worst estimator is . The estimator that minimizes the MSE is .
7. Conclusion
In this paper, we introduced a new RGRRE for estimating the unknown parameters of the GRM in the presence of mild to severe multicollinearity. We also proposed some methods for choosing the ridge parameter k of the RGRRE. A Monte Carlo simulation study was designed to evaluate the performance of the proposed RGRREs and to compare them with other estimators under different conditions, with the MSE as the assessment criterion. To illustrate the benefits of the proposed estimator, we also considered an empirical application. From the simulation and real-application results, we conclude that the RGRRE, under its different ridge parameters, performs better than the MLE, RMLE and GRRE. In particular, the RGRRE with parameter is the most robust and is the recommended estimator in the minimum-MSE sense, since the Monte Carlo simulation showed it to be the most resistant estimator under severe multicollinearity. Hence, we suggest that researchers use the RGRRE with ridge parameters and for estimating the GRM in the presence of severe multicollinearity.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- Algamal ZY. Developing a ridge estimator for the gamma regression model. J Chemometr. 2018;32(10):e3054.
- Qasim M, Amin M, Amanullah M. On the performance of some new Liu parameters for the gamma regression model. J Stat Comput Simul. 2018;88(16):3065–3080.
- Nyquist H. Restricted estimation of generalized linear models. J Royal Statist Soc C Appl Statist. 1991;40(1):133–141.
- Kibria BG, Saleh AME. Improving the estimators of the parameters of a probit regression model: a ridge regression approach. J Stat Plan Inference. 2012;142(6):1421–1435.
- Qasim M, Kibria BMG, Månsson K, et al. A new Poisson Liu regression estimator: method and application. J Appl Stat. 2020;47(12):2258–2271.
- Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67.
- Amin M, Qasim M, Amanullah M, et al. Performance of some ridge estimators for the gamma regression model. Statist Papers. 2020;61(3):997–1026.
- Kaçiranlar S, Sakallioğlu S, Akdeniz F, et al. A new biased estimator in linear regression and a detailed analysis of the widely-analyzed dataset on portland cement. Sankhyā Ind J Stat B. 1999: 443–459.
- Sarkar N. A new estimator combining the ridge regression and the restricted least squares methods of estimation. Commun Stat Theory Methods. 1992;21(7):1987–2000.
- Mahmoudi A, Arabi Belaghi R, Mandal S. A comparison of preliminary test, Stein-type and penalty estimators in gamma regression model. J Stat Comput Simul. 2020;90(17):3051–3079.
- Månsson K, Kibria BM, Shukur G. A restricted Liu estimator for binary regression models and its application to an applied demand system. J Appl Stat. 2016;43(6):1119–1127.
- Asar Y, Arashi M, Wu J. Restricted ridge estimator in the logistic regression model. Commun Stat Simul Comput. 2017;46(8):6538–6544.
- Segerstedt B. On ordinary ridge regression in generalized linear models. Commun Statist Theory Methods. 1992;21(8):2227–2246.
- Kurtoğlu F, Özkale MR. Restricted ridge estimator in generalized linear models: Monte Carlo simulation studies on Poisson and binomial distributed responses. Commun Stat Simul Comput. 2017;48(4):1191–1218.
- Wang SG, Wu MX, Jia ZZ. Matrix inequalities. 2nd ed. Beijing: Chinese Science Press; 2006.
- Rao CR, Toutenburg H, Shalabh, et al. Linear models and generalizations—least squares and alternatives. Berlin: Springer; 2008.
- Kibria BG. Performance of some new ridge regression estimators. Commun Stat Simul Comput. 2003;32(2):419–435.
- Qasim M, Månsson K, Kibria BMG. On some beta ridge regression estimators: method, simulation and application. J Stat Comput Simul. 2021;91(9):1699–1712.
- McDonald GC, Galarneau DI. A Monte Carlo evaluation of some ridge-type estimators. J Am Stat Assoc. 1975;70(350):407–416.
- Månsson K, Shukur G. A Poisson ridge regression estimator. Econ Model. 2011;28:1475–1481.
- Varathan N, Wijekoon P. Optimal generalized logistic estimator. Commun Stat Theory Methods. 2018;47:463–474.
- Weisberg S. Applied linear regression. John Wiley & Sons; 1980.
- Zhang J. Powerful goodness-of-fit and multi-sample tests [PhD thesis]. Toronto: York University; 2011.
- Evans DL, Drew JH, Leemis LM. The distribution of the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling test statistics for exponential populations with estimated parameters. Commun Stat Theory Methods. 2008;37:1396–1421.