
An unbiased estimator with prior information

Pages 45-55 | Received 29 Mar 2019, Accepted 07 Dec 2019, Published online: 06 Jan 2020

Abstract

The ordinary least squares (OLS) estimator suffers a breakdown in the presence of multicollinearity: the estimator is still unbiased, but its variance is inflated. In this study, we propose an unbiased modified ridge-type estimator as an alternative to the OLS estimator and to the biased estimators for handling multicollinearity in linear regression models. The properties of this new estimator are derived; the estimator is unbiased with minimum variance. A real-life application to the higher heating value of poultry waste from proximate analysis and a simulation study support the theoretical findings.

1. Introduction

Consider the linear regression model
(1) $y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$
where $y$ is an $n \times 1$ vector of the dependent variable, $X$ is a known $n \times p$ full-rank matrix of explanatory variables, $\beta$ is a $p \times 1$ vector of regression coefficients and $I$ is an $n \times n$ identity matrix. The ordinary least squares (OLS) estimator of $\beta$ in model (1) is defined as:
(2) $\hat{\beta}_{OLS} = S^{-1}X'y$
where $S = X'X$.
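As a concrete illustration of Eqs. (1)-(2), the following minimal numpy sketch fits OLS via the normal equations; the data and coefficient values are hypothetical, not from the paper:

```python
import numpy as np

# Hypothetical data for illustration of Eqs. (1)-(2).
rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.normal(size=(n, p))              # design matrix
beta_true = np.array([0.6, 0.6, 0.52])   # illustrative coefficients
y = X @ beta_true + rng.normal(size=n)   # y = X beta + eps, eps ~ N(0, I)

S = X.T @ X                              # S = X'X
beta_ols = np.linalg.solve(S, X.T @ y)   # OLS: S^{-1} X'y
```

Using `np.linalg.solve` rather than explicitly inverting $S$ is the numerically preferred route, which matters precisely in the ill-conditioned settings this paper studies.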

This estimator is the most widely used method for estimating the parameters of a linear regression model. It performs best when certain assumptions are satisfied, one of which is that the independent variables are not linearly related. In practice, however, strong or perfect linear relationships often exist among the independent variables; this situation is called multicollinearity. The OLS estimator suffers a breakdown in the presence of multicollinearity: the estimator is still unbiased but its variance is inflated (Ayinde, Lukman, Samuel, & Attah, Citation2018). Different approaches are available in the literature to handle this problem. These include Hoerl and Kennard (Citation1970), Swindel (Citation1976), Farebrother (Citation1976), Liu (Citation1993), Sakallioglu and Akdeniz (Citation2003), Ozkale and Kaciranlar (Citation2007), Yang and Chang (Citation2010), Li and Yang (Citation2012), Wu and Yang (Citation2013), Wu (Citation2014) and, more recently, Arumairajan and Wijekoon (Citation2017), Ayinde et al. (Citation2018) and Lukman, Ayinde, Binuomote, and Onate (Citation2019). The estimators proposed by these authors are biased. Crouse, Jin, and Hanumara (Citation1995) and Sakallioglu and Akdeniz (Citation2003) proposed unbiased versions of the ridge estimator and the Liu estimator, respectively, by incorporating prior information. These methods effectively handle the problem of multicollinearity and eliminate bias.

In this article, we propose an unbiased modified ridge-type estimator (UMRT) with prior information and derive its properties. Furthermore, we discuss the performance of the proposed estimator relative to the OLS estimator, the ridge estimator (RE) and the modified ridge-type estimator (MRT) using the mean square error matrix (MSEM) criterion.

The remainder of this article is organized as follows. In Section 2, we propose the unbiased modified ridge-type estimator; in Section 3, we compare its performance with some existing estimators using the MSEM criterion. We estimate the biasing parameters k and d in Section 4. Section 5 presents a simulation study and a real-life data application. Finally, we provide some concluding remarks in Section 6.

2. Unbiased modified ridge-type estimator with prior information

Hoerl and Kennard (Citation1970) defined the ridge estimator of $\beta$ as:
(3) $\hat{\beta}_{RE}(k) = (S + kI)^{-1}X'y, \quad k > 0$
where $k$ is the biasing parameter.

Swindel (Citation1976) defined the ridge estimator with prior information $b$ as:
(4) $\hat{\beta}_{MRE}(k, b) = (S + kI)^{-1}(X'y + kb)$

Crouse et al. (Citation1995) introduced the unbiased ridge estimator based on the ridge estimator and prior information $J$. This is defined as
(5) $\hat{\beta}_{UMRE} = (S + kI)^{-1}(X'y + kJ)$
where $J$ and $\hat{\beta}_{OLS}$ are uncorrelated and $J \sim N(\beta, V)$ such that $V = \frac{\sigma^2}{k} I_p$, with $I_p$ the $p \times p$ identity matrix. $J$ is estimated by $J = \frac{1}{p}\sum_{i=1}^{p}\hat{\beta}_i$.

Lukman et al. (Citation2019) proposed the modified ridge-type estimator, which is defined as follows:
(6) $\hat{\beta}_{MRT}(k, d) = [S + k(1+d)I]^{-1}S\hat{\beta}_{OLS} = F_{kd}\hat{\beta}_{OLS}$
where $F_{kd} = [S + k(1+d)I]^{-1}S$.
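A minimal numpy sketch of the four estimators in Eqs. (3)-(6) follows; the function names are ours, and each function takes the cross-product matrix $S = X'X$ and, where needed, $X'y$ computed as in the earlier OLS sketch:

```python
import numpy as np

def ridge(S, Xty, k):
    """Ridge estimator, Eq. (3): (S + kI)^{-1} X'y."""
    return np.linalg.solve(S + k * np.eye(S.shape[0]), Xty)

def ridge_prior(S, Xty, k, b):
    """Swindel's ridge estimator with prior b, Eq. (4)."""
    return np.linalg.solve(S + k * np.eye(S.shape[0]), Xty + k * b)

def unbiased_ridge(S, Xty, k, beta_ols):
    """Crouse et al. (1995) estimator, Eq. (5), with each component
    of J set to the mean of the OLS coefficients."""
    J = np.full(S.shape[0], beta_ols.mean())
    return np.linalg.solve(S + k * np.eye(S.shape[0]), Xty + k * J)

def mrt(S, beta_ols, k, d):
    """Modified ridge-type estimator, Eq. (6)."""
    return np.linalg.solve(S + k * (1 + d) * np.eye(S.shape[0]), S @ beta_ols)
```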

Consider the following convex estimator
(7) $\hat{\beta}(C, J) = C\hat{\beta}_{OLS} + (I - C)J$
where $C$ is a $p \times p$ matrix and $I$ is the $p \times p$ identity matrix. Consequently, the mean square error of $\hat{\beta}(C, J)$ is
(8) $MSE(\hat{\beta}(C, J)) = \sigma^2 C S^{-1} C' + (I - C)V(I - C)'$

Then,
(9) $\frac{\partial\, MSE(\hat{\beta}(C, J))}{\partial C} = 2C(\sigma^2 S^{-1} + V) - 2V = 0$

From (9), $C$ is obtained as $C = V(\sigma^2 S^{-1} + V)^{-1}$. Accordingly, $V = \sigma^2(I - C)^{-1}CS^{-1}$. The convex estimator $\hat{\beta}(C, J)$ has minimum MSE at this optimal value of $C$ and is an unbiased estimator of $\beta$. Therefore, the new estimator in this study is defined as
(10) $\hat{\beta}_{UMRT}(F_{kd}, J) = F_{kd}\hat{\beta}_{OLS} + (I - F_{kd})J = \hat{\beta}_{MRT}(k, d) + (I - F_{kd})J$
where $F_{kd} = [S + k(1+d)I]^{-1}S$; then $V = \frac{\sigma^2}{k(1+d)}I$. Consequently, $J \sim N\left(\beta, \frac{\sigma^2}{k(1+d)}I\right)$ for $k > 0$, $0 < d < 1$.
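A sketch of Eq. (10) in numpy, again with $J$ built from the mean of the OLS coefficients as in Crouse et al. (Citation1995); the function name is ours:

```python
import numpy as np

def umrt(S, beta_ols, k, d):
    """UMRT, Eq. (10): F_kd beta_ols + (I - F_kd) J, with J built from
    the mean of the OLS coefficients as in Crouse et al. (1995)."""
    p = S.shape[0]
    F_kd = np.linalg.solve(S + k * (1 + d) * np.eye(p), S)  # [S+k(1+d)I]^{-1} S
    J = np.full(p, beta_ols.mean())                         # prior information
    return F_kd @ beta_ols + (np.eye(p) - F_kd) @ J
```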

It is easy to show that $\hat{\beta}_{UMRT}(F_{kd}, J)$ is an unbiased estimator of $\beta$. The expectation vector, bias vector, dispersion matrix and mean square error matrix of the proposed estimator are:
(11) $E(\hat{\beta}_{UMRT}(F_{kd}, J)) = E(\hat{\beta}_{MRT} + (I - F_{kd})J) = [S + k(1+d)I]^{-1}S\beta + [S + k(1+d)I]^{-1}k(1+d)\beta = [S + k(1+d)I]^{-1}[S\beta + k(1+d)\beta] = [S + k(1+d)I]^{-1}[S + k(1+d)I]\beta = \beta$
(12) $Bias(\hat{\beta}_{UMRT}(F_{kd}, J)) = E(\hat{\beta}_{UMRT}(F_{kd}, J)) - \beta = \beta - \beta = 0$
(13) $D(\hat{\beta}_{UMRT}(F_{kd}, J)) = D(\hat{\beta}_{MRT} + (I - F_{kd})J) = \sigma^2[S + k(1+d)I]^{-1}$

Since the bias is zero,
(14) $MSEM(\hat{\beta}_{UMRT}(F_{kd}, J)) = D(\hat{\beta}_{UMRT}(F_{kd}, J))$

Consequently, the estimator $\hat{\beta}_{UMRT}(F_{kd}, J)$ is an unbiased estimator of $\beta$.

Suppose there exists an orthogonal matrix $Q$ such that $Q'X'XQ = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, where $\lambda_i$ is the $i$th eigenvalue of $X'X$; $\Lambda$ and $Q$ are the matrices of eigenvalues and eigenvectors of $X'X$, respectively. Model (1) can be written in canonical form as:
(15) $y = Z\alpha + \varepsilon$
where $Z = XQ$, $\alpha = Q'\beta$ and $Z'Z = \Lambda$. For model (15), we get the following representations:
(16) $\hat{\alpha}_{OLS} = \Lambda^{-1}Z'y$
(17) $\hat{\alpha}_{RE}(k) = (\Lambda + kI)^{-1}Z'y$
(18) $\hat{\alpha}_{MRT}(k, d) = [\Lambda + k(1+d)I]^{-1}\Lambda\hat{\alpha}_{OLS}$
(19) $\hat{\alpha}_{UMRT}(F_{kd}, J) = \hat{\alpha}_{MRT}(k, d) + (I - F_{kd})J$
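The canonical quantities of Eqs. (15)-(16) can be computed with a symmetric eigendecomposition; a minimal sketch, assuming numpy and a hypothetical function name:

```python
import numpy as np

def canonical_form(X, y):
    """Eqs. (15)-(16): spectral decomposition X'X = Q Lam Q',
    Z = XQ (so Z'Z = Lam) and alpha_ols = Lam^{-1} Z'y."""
    lam, Q = np.linalg.eigh(X.T @ X)  # eigenvalues/eigenvectors of X'X
    Z = X @ Q
    alpha_ols = (Z.T @ y) / lam       # Eq. (16)
    return lam, Q, Z, alpha_ols
```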

Lemma 2.1.

Let $M$ be an $n \times n$ positive definite matrix, that is $M > 0$, and $\alpha$ be some vector; then $M - \alpha\alpha' \geq 0$ if and only if $\alpha'M^{-1}\alpha \leq 1$ (Farebrother, Citation1976).

Lemma 2.2.

Let $\hat{\beta}_i = A_i y$, $i = 1, 2$, be two linear estimators of $\beta$. Suppose that $D = Cov(\hat{\beta}_1) - Cov(\hat{\beta}_2) > 0$, where $Cov(\hat{\beta}_i)$, $i = 1, 2$, denotes the covariance matrix of $\hat{\beta}_i$ and $b_i = Bias(\hat{\beta}_i) = (A_iX - I)\beta$, $i = 1, 2$. Consequently,
(20) $\Delta(\hat{\beta}_1 - \hat{\beta}_2) = MSEM(\hat{\beta}_1) - MSEM(\hat{\beta}_2) = \sigma^2 D + b_1b_1' - b_2b_2' > 0$
if and only if $b_2'[\sigma^2 D + b_1b_1']^{-1}b_2 < 1$, where $MSEM(\hat{\beta}_i) = Cov(\hat{\beta}_i) + b_ib_i'$ (Trenkler & Toutenburg, Citation1990).

3. Theoretical comparisons

3.1. Comparison of the OLS estimator and the unbiased modified ridge-type estimator

Theorem 3.1.

The unbiased modified ridge-type estimator $\hat{\beta}_{UMRT}(F_{kd}, J)$ is superior to the OLS estimator in the mean square error sense for $k > 0$ and $0 < d < 1$.

Proof.

By definition,
(21) $MSEM(\hat{\beta}_{OLS}) = \sigma^2\Lambda^{-1}$

The MSEM difference between Eqs. (14) and (21) is
(22) $MSEM(\hat{\beta}_{OLS}) - MSEM(\hat{\beta}_{UMRT}(F_{kd}, J)) = \sigma^2\Lambda^{-1} - \sigma^2[\Lambda + k(1+d)I]^{-1} = \sigma^2\left(\Lambda^{-1} - [\Lambda + k(1+d)I]^{-1}\right) = \sigma^2\,\mathrm{diag}\left[\frac{1}{\lambda_i} - \frac{1}{\lambda_i + k(1+d)}\right]_{i=1}^{p}$

The difference $\Lambda^{-1} - [\Lambda + k(1+d)I]^{-1}$ is positive definite if and only if $\lambda_i + k(1+d) - \lambda_i > 0$ for each $i$, and this holds whenever $k > 0$ and $0 < d < 1$. By Lemma 2.2, the proof is completed.
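Because Eq. (22) is diagonal, Theorem 3.1 is easy to confirm numerically. A minimal check, with illustrative eigenvalues of our own choosing:

```python
import numpy as np

# Illustrative eigenvalues of X'X; values are hypothetical.
lam = np.array([12.5, 0.9, 0.004])
k, d, sigma2 = 0.5, 0.3, 1.0

# Diagonal of Eq. (22): 1/lam_i - 1/(lam_i + k(1+d)).
diff = sigma2 * (1.0 / lam - 1.0 / (lam + k * (1 + d)))
assert np.all(diff > 0)  # positive for every lam_i when k > 0, 0 < d < 1
```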

3.2. Comparison of ridge estimator and the unbiased modified ridge-type estimator

From the representation $\hat{\beta}_{RE}(k) = (\Lambda + kI)^{-1}Z'y$, the dispersion matrix and the mean square error matrix are
(23) $D(\hat{\beta}_{RE}(k)) = \sigma^2 B_k \Lambda B_k$
(24) $MSEM(\hat{\beta}_{RE}(k)) = \sigma^2 B_k \Lambda B_k + k^2 B_k \alpha\alpha' B_k$
where $B_k = (\Lambda + kI)^{-1}$.

The difference between $\hat{\beta}_{RE}(k)$ and $\hat{\beta}_{UMRT}(F_{kd}, J)$ in terms of the MSEM is
(25) $MSEM(\hat{\beta}_{RE}(k)) - MSEM(\hat{\beta}_{UMRT}(F_{kd}, J)) = \sigma^2 B_k\Lambda B_k + k^2 B_k\alpha\alpha'B_k - \sigma^2[\Lambda + k(1+d)I]^{-1} = \sigma^2\left(B_k\Lambda B_k - [\Lambda + k(1+d)I]^{-1}\right) + k^2 B_k\alpha\alpha'B_k$

For $k > 0$ and $0 < d < 1$, we have the following theorem.

Theorem 3.2.

Let us consider the two estimators $\hat{\beta}_{RE}(k)$ and $\hat{\beta}_{UMRT}(F_{kd}, J)$. If $k > 0$ and $0 < d < 1$, the estimator $\hat{\beta}_{UMRT}(F_{kd}, J)$ is superior to the estimator $\hat{\beta}_{RE}(k)$ in the MSEM sense if and only if $B_k\Lambda B_k - [\Lambda + k(1+d)I]^{-1} \geq 0$.

Proof:

The difference between Eqs. (14) and (23) is
(26) $D(\hat{\beta}_{RE}(k)) - D(\hat{\beta}_{UMRT}(F_{kd}, J)) = \sigma^2(\Lambda + kI)^{-1}\Lambda(\Lambda + kI)^{-1} - \sigma^2[\Lambda + k(1+d)I]^{-1} = \sigma^2\,\mathrm{diag}\left[\frac{\lambda_i}{(\lambda_i + k)^2} - \frac{1}{\lambda_i + k(1+d)}\right]_{i=1}^{p}$

We observe that $(\Lambda + kI)^{-1}\Lambda(\Lambda + kI)^{-1} - [\Lambda + k(1+d)I]^{-1}$ is positive definite if and only if $\lambda_i(\lambda_i + k(1+d)) - (\lambda_i + k)^2 > 0$, that is, $\lambda_i(d - 1) > k$, where $k > 0$ and $0 < d < 1$.
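Since the condition of Theorem 3.2 depends on the eigenvalues, it can be explored numerically. The sketch below (numpy assumed, function name ours) evaluates the diagonal of Eq. (26) for given $\lambda_i$, $k$ and $d$:

```python
import numpy as np

def re_minus_umrt_var(lam, k, d, sigma2=1.0):
    """Diagonal of D(RE) - D(UMRT) in Eq. (26): lam_i/(lam_i + k)^2
    minus 1/(lam_i + k(1+d)). Positive entries mark directions in
    which UMRT has the smaller variance."""
    lam = np.asarray(lam, dtype=float)
    return sigma2 * (lam / (lam + k) ** 2 - 1.0 / (lam + k * (1 + d)))

# Illustrative eigenvalues spanning several orders of magnitude:
print(re_minus_umrt_var([25.0, 1.2, 0.003], k=0.5, d=0.4))
```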

3.3. Comparison of modified ridge-type estimator and unbiased modified ridge-type estimator

From the representation $\hat{\alpha}_{MRT}(k, d) = [\Lambda + k(1+d)I]^{-1}Z'y$, the dispersion matrix and the MSEM are defined as follows:
(27) $D(\hat{\beta}_{MRT}(k, d)) = \sigma^2 R_k \Lambda^{-1} R_k$
where $R_k = \Lambda[\Lambda + k(1+d)I]^{-1}$, and
(28) $MSEM(\hat{\beta}_{MRT}(k, d)) = \sigma^2 R_k \Lambda^{-1} R_k + (R_k - I)\alpha\alpha'(R_k - I)'$

Theorem 3.3.

The unbiased modified ridge-type estimator always dominates the modified ridge-type estimator in the MSEM sense for $k > 0$ and $0 < d < 1$.

Proof.

The difference between Eqs. (14) and (28) is
(29) $MSEM(\hat{\beta}_{MRT}(k, d)) - MSEM(\hat{\beta}_{UMRT}(F_{kd}, J)) = \sigma^2 k(1+d)[\Lambda + k(1+d)I]^{-1}\left[I + k(1+d)\alpha\alpha'\right][\Lambda + k(1+d)I]^{-1}$

Therefore, $MSEM(\hat{\beta}_{MRT}(k, d)) - MSEM(\hat{\beta}_{UMRT}(F_{kd}, J))$ is a non-negative definite matrix for $k > 0$ and $0 < d < 1$. The proof of Theorem 3.3 is completed.

4. Estimation of the biasing parameters k and d

In this section, we discuss the estimation of the biasing parameters k and d.

4.1. The estimation of parameter d

In the definition of the new estimator, $J$ and $\hat{\alpha}_{OLS}$ are uncorrelated. Therefore, $(\hat{\alpha}_{OLS} - J) \sim N\left(0, \frac{\sigma^2}{k(1+d)}\left[k(1+d)\Lambda^{-1} + I\right]\right)$ and
(30) $E\left[(\hat{\alpha}_{OLS} - J)'(\hat{\alpha}_{OLS} - J)\right] = \frac{\sigma^2}{k(1+d)}\left[p + k(1+d)\,\mathrm{tr}(\Lambda^{-1})\right]$

From (30), if $\sigma^2$ is known, then for a fixed $k$ we can get an unbiased estimator of $d$ as follows:
(31) $\hat{d} = \frac{p\sigma^2}{k}\left[(\hat{\beta}_{OLS} - J)'(\hat{\beta}_{OLS} - J) - \sigma^2\,\mathrm{tr}(\Lambda^{-1})\right]^{-1} - 1$

When $\sigma^2$ is unknown, $s^2$ is used as an estimate of $\sigma^2$:
(32) $s^2 = \frac{(y - X\hat{\beta}_{OLS})'(y - X\hat{\beta}_{OLS})}{n - p}$

Consequently,
(33) $\hat{d} = \frac{ps^2}{k}\left[(\hat{\beta}_{OLS} - J)'(\hat{\beta}_{OLS} - J) - s^2\,\mathrm{tr}(\Lambda^{-1})\right]^{-1} - 1$
where $\mathrm{tr}(\Lambda^{-1}) = \sum_{i=1}^{p}\frac{1}{\lambda_i}$ and $\lambda_i$ is the $i$th eigenvalue of $X'X$. The estimator of $d$ in (33) can return a negative value. To eliminate the negative value, Wu (Citation2014) suggests replacing $\hat{d}$ with one (1) when its estimate is negative. In this study, when the estimate of $d$ in Eq. (33) is negative, we instead adopt the estimator of $d$ suggested by Ozkale and Kaciranlar (Citation2007):
(34) $\hat{d}^* = \min_i\left[\frac{\alpha_i^2}{\frac{\sigma^2}{\lambda_i} + \alpha_i^2}\right]$
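A sketch of this two-step rule for $d$ in numpy, with $s^2$ standing in for $\sigma^2$; the function name and argument layout are ours:

```python
import numpy as np

def estimate_d(beta_ols, J, lam, s2, k, alpha=None):
    """Eq. (33) estimator of d, with the Eq. (34) fallback of
    Ozkale and Kaciranlar (2007) when the estimate is negative.
    `alpha` are the canonical coefficients needed for the fallback."""
    p = len(beta_ols)
    resid = beta_ols - J
    denom = resid @ resid - s2 * np.sum(1.0 / lam)
    d_hat = (p * s2 / k) / denom - 1.0                    # Eq. (33)
    if d_hat < 0 and alpha is not None:
        d_hat = np.min(alpha**2 / (s2 / lam + alpha**2))  # Eq. (34)
    return d_hat
```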

4.2. Estimating the biasing parameter k

From Eq. (30), if $\sigma^2$ is known and $d$ is assumed to be fixed, an unbiased estimate of $k$ is defined as follows:
(35) $\hat{k} = \frac{p\sigma^2}{(1+d)\left[(\hat{\beta}_{OLS} - J)'(\hat{\beta}_{OLS} - J) - \sigma^2\,\mathrm{tr}(\Lambda^{-1})\right]}$

When $\hat{k}$ is negative, $k$ is estimated instead as follows:
(36) $\hat{k} = \frac{p\sigma^2}{\sum_{i=1}^{p}\alpha_i^2}$
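The companion rule for $k$, again as a hedged numpy sketch with $s^2$ replacing $\sigma^2$ and our own function name:

```python
import numpy as np

def estimate_k(beta_ols, J, lam, s2, d, alpha=None):
    """Eq. (35) estimator of k, replaced by the Eq. (36) form
    p*s2/sum(alpha_i^2) whenever it comes out negative."""
    p = len(beta_ols)
    resid = beta_ols - J
    k_hat = p * s2 / ((1 + d) * (resid @ resid - s2 * np.sum(1.0 / lam)))
    if k_hat < 0 and alpha is not None:
        k_hat = p * s2 / np.sum(alpha**2)   # Eq. (36)
    return k_hat
```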

5. Numerical example and Monte Carlo simulation

5.1. Application to poultry waste data

The theoretical results are illustrated with real-life data analyzed in the study of Qian, Lee, Soto, and Chen (Citation2018). A total of 48 samples of poultry waste were collected from the published open literature to form a database for the derivation, evaluation and validation of proximate-based higher heating value (HHV) models. Six samples (#43, 44, 45, 46, 47 and 48) were deleted due to incomplete information. The linear regression model is:
(37) $HHV = \beta_0 + \beta_1 FC + \beta_2 VM + \beta_3 A + \varepsilon$
where HHV denotes the higher heating value, FC denotes fixed carbon, VM denotes volatile matter, A denotes ash and $\varepsilon$ is the random error term, which is expected to be normally distributed. The relationships between the variables were obtained from the correlation matrix shown in Table 1.

From Table 1, there is a strong positive relationship between the higher heating value and fixed carbon, while negative relationships exist between HHV and VM and between HHV and ash. To identify the distribution of the error term, we used the Jarque-Bera (JB) test. The test statistic and the corresponding p-value are JB = 0.6409 and p-value = .7258, respectively. Since this p-value is larger than any conventional significance level, we conclude that the error term follows the normal distribution. We diagnosed the model for a possible presence of multicollinearity. The variance inflation factor (VIF) values are VIF_FC = 997.819, VIF_VM = 2163.504 and VIF_ASH = 1533.782. The literature shows that a model suffers from multicollinearity when VIF_i > 10. Since the VIF values in the above model are all higher than 10, we conclude that the model suffers from severe multicollinearity. Alternatively, we can use the condition number (CN) to examine whether the explanatory variables are related, where $CN = \frac{\lambda_{max}}{\lambda_{min}}$, the ratio of the maximum to the minimum eigenvalue. If CN is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity (Arumairajan & Wijekoon, Citation2017; Gujarati, Citation1995). The condition number is 581291.39, which indicates the presence of severe multicollinearity. Therefore, it is appropriate to predict the higher heating value with an alternative unbiased estimator possessing minimum variance. We adopt K-fold cross-validation to validate the performances of the estimators. The data are partitioned into K equal-sized folds (K = 10 in this study). One fold is treated as the test set and the remaining K - 1 = 9 folds are used as the training set; the MSE is computed on the observations in the held-out fold. The process is repeated ten times, holding out a different fold each time. The validation (test) error is obtained by averaging the K estimates of the test error, which gives an estimated test error rate for new observations. The estimator with the lowest validation MSE is the best. The average MSE of the validation error in this study is defined as:
(38) $AMSE_{CV} = \frac{1}{10}\sum_{k=1}^{10}\frac{1}{n_k}\sum_{i=1}^{n_k}(y_i - \tilde{y}_i)^2$
where $n_k$ is the number of observations in fold $k$ and $\tilde{y}_i$ is the fitted value for observation $i$, obtained from the model fitted with fold $k$ removed. The result is presented in Table 2; a sketch of this validation scheme is given after this paragraph.
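A minimal implementation of the Eq. (38) scheme in numpy; the function name is ours, `fit` is any of the estimators sketched above, and the design matrix is assumed to already carry an intercept column where the model needs one:

```python
import numpy as np

def amse_cv(X, y, fit, K=10, seed=0):
    """Eq. (38): average held-out MSE over K folds. `fit(X, y)` must
    return a coefficient vector (e.g. OLS or UMRT); X is assumed to
    already contain an intercept column if the model needs one."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, K)
    fold_mse = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        beta_hat = fit(X[train], y[train])       # fit on K-1 folds
        resid = y[test] - X[test] @ beta_hat     # predict held-out fold
        fold_mse.append(np.mean(resid**2))
    return np.mean(fold_mse)                     # AMSE_CV
```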

Table 2. Regression coefficients and MSE.

The results in Table 2 show that the unbiased modified ridge-type estimator (UMRT) produced the same estimates as the OLS estimator. The technique was nevertheless able to circumvent the problem of large variance that is peculiar to the OLS estimator: the proposed estimator has the smallest mean square error and the smallest prediction error.

5.2. Monte Carlo simulation

We carried out a Monte Carlo simulation to investigate the performances of these estimators. The explanatory variables were generated in line with the studies of McDonald and Galarneau (Citation1975), Liu (Citation1993) and Lukman and Ayinde (Citation2017):
(39) $x_{ij} = (1 - \gamma^2)^{1/2} z_{ij} + \gamma z_{ip}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, p$
where the $z_{ij}$ are independent standard normal pseudo-random numbers, $\gamma^2$ is the correlation between any two explanatory variables and $p$ is the number of explanatory variables. The values of $\gamma$ were taken as 0.85, 0.95 and 0.99. The number of explanatory variables $p$ was taken to be three and six.

The response variable is defined as:
(40) $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$
where $\varepsilon_i \sim N(0, \sigma^2)$. The values of $\beta$ were chosen such that $\beta'\beta = 1$ (Newhouse & Oman, Citation1971). The sample sizes used are 30, 50, 100 and 200, and two values of $\sigma$ were considered: 1 and 5. The experiment is repeated 1,000 times, and the estimated MSE is calculated as
(41) $MSE(\hat{\beta}) = \frac{1}{1000}\sum_{j=1}^{1000}(\hat{\beta}_{ij} - \beta_i)'(\hat{\beta}_{ij} - \beta_i)$
where $\hat{\beta}_{ij}$ denotes the estimate of the $i$th parameter in the $j$th replication and $\beta_i$ is the true parameter value. The estimated MSEs of the estimators for different values of n, k, d, σ and γ are shown in Tables 3–18; a sketch of this simulation design is given below.
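The following numpy sketch reproduces the design of Eqs. (39)-(41) under one common reading of Eq. (39), in which a shared extra standard-normal column induces the correlation; the function names, defaults and the equal-coefficient choice for $\beta'\beta = 1$ are ours:

```python
import numpy as np

def simulate_X(n, p, gamma, rng):
    """Eq. (39): x_ij = (1 - gamma^2)^{1/2} z_ij + gamma * z_i,
    with a shared extra standard-normal column z_i inducing
    correlation gamma^2 between any two explanatory variables."""
    z = rng.normal(size=(n, p + 1))
    return np.sqrt(1 - gamma**2) * z[:, :p] + gamma * z[:, [p]]

def mc_mse(estimator, n=30, p=3, gamma=0.95, sigma=1.0, reps=1000, seed=0):
    """Eq. (41): MSE of `estimator(X, y)` averaged over `reps` runs."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)   # beta'beta = 1 (Newhouse & Oman)
    total = 0.0
    for _ in range(reps):
        X = simulate_X(n, p, gamma, rng)
        y = X @ beta + sigma * rng.normal(size=n)
        err = estimator(X, y) - beta
        total += err @ err
    return total / reps
```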

Table 1. Correlation matrix of the variables.

Table 3. Estimated MSE for OLS, RE, MRT and UMRT when n = 30, σ = 1 and p = 3.

Table 4. Estimated MSE for OLS, RE, MRT and UMRT when n = 30, σ = 5 and p = 3.

Table 5. Estimated MSE for OLS, RE, MRT and UMRT when n = 50, σ = 1 and p = 3.

Table 6. Estimated MSE for OLS, RE, MRT and UMRT when n = 50, σ = 5 and p = 3.

Table 7. Estimated MSE for OLS, RE, MRT and UMRT when n = 100, σ = 1 and p = 3.

Table 8. Estimated MSE for OLS, RE, MRT and UMRT when n = 100, σ = 5 and p = 3.

Table 9. Estimated MSE for OLS, RE, MRT and UMRT when n = 200, σ = 1 and p = 3.

Table 10. Estimated MSE for OLS, RE, MRT and UMRT when n = 200, σ = 5 and p = 3.

Table 11. Estimated MSE for OLS, RE, MRT and UMRT when n = 30, σ = 1 and p = 6.

Table 12. Estimated MSE for OLS, RE, MRT and UMRT when n = 30, σ = 5 and p = 6.

Table 13. Estimated MSE for OLS, RE, MRT and UMRT when n = 50, σ = 1 and p = 6.

Table 14. Estimated MSE for OLS, RE, MRT and UMRT when n = 50, σ = 5 and p = 6.

Table 15. Estimated MSE for OLS, RE, MRT and UMRT when n = 100, σ = 1 and p = 6.

Table 16. Estimated MSE for OLS, RE, MRT and UMRT when n = 100, σ = 5 and p = 6.

Table 17. Estimated MSE for OLS, RE, MRT and UMRT when n = 200, σ = 1 and p = 6.

Table 18. Estimated MSE for OLS, RE, MRT and UMRT when n = 200, σ = 5 and p = 6.

The following observations were made:

  1. The unbiased estimator is superior to the OLS estimator in all cases; the OLS estimator performs worst in the presence of multicollinearity.

  2. The unbiased estimator also consistently outperforms the ridge and modified ridge-type estimators, although both of these dominate the OLS estimator in all cases.

  3. As the sample size increases, the MSE decreases, even when the correlation between the explanatory variables increases.

  4. For a fixed sample size, increasing the value of σ increases the mean square error of each of the estimators.

  5. As the number of explanatory variables increases, the mean squared error of every estimator increases for a given level of multicollinearity and σ.

Generally, we confirm the superiority of the unbiased estimator over the other estimators at the different levels of multicollinearity and error variance considered. The modified ridge-type estimator, in turn, dominates the ridge estimator and the OLS estimator.

6. Conclusion

The OLS estimator suffers a breakdown in the presence of multicollinearity: it remains unbiased but its variance is inflated. As an alternative, an unbiased modified ridge-type estimator with prior information was proposed in this study. This estimator was proved theoretically to be unbiased and to possess minimum variance. A simulation study and a real-life application were also conducted to establish the superiority of this estimator over the existing estimators in terms of the MSEM criterion and the cross-validation prediction error. The new estimator performs better than the OLS and ridge estimators for all degrees of multicollinearity and circumvents the problem of inflated variance that faces the OLS estimator. Finally, this estimator should be adopted as a replacement for the OLS estimator and the biased estimators when there is multicollinearity in a linear model.

Acknowledgements

The authors are grateful to the anonymous reviewers for their valuable comments and suggestions, which certainly improved the quality and presentation of this article.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Arumairajan, S., & Wijekoon, P. (2017). Modified almost unbiased Liu estimator in linear regression model. Communications in Mathematics and Statistics, 5, 261–276. doi:10.1007/s40304-017-0111-z
  • Ayinde, K., Lukman, A. F., Samuel, O. O., & Attah, O. M. (2018). Some new adjusted ridge estimators of linear regression model. International Journal of Civil Engineering and Technology, 11, 2838–2852.
  • Crouse, R. H., Jin, C., & Hanumara, R. C. (1995). Unbiased ridge estimation with prior information and ridge trace. Communications in Statistics - Theory and Methods, 24, 2341–2354. doi:10.1080/03610929508831620
  • Farebrother, R. W. (1976). Further results on the mean square error of ridge regression. Journal of the Royal Statistical Society: Series B (Methodological), 38, 248–250. doi:10.1111/j.2517-6161.1976.tb01588.x
  • Gujarati, D. N. (1995). Basic econometrics. New York, NY: McGraw-Hill.
  • Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67. doi:10.1080/00401706.1970.10488634
  • Li, Y., & Yang, H. (2012). A new Liu-type estimator in linear regression model. Statistical Papers, 53, 427–437. doi:10.1007/s00362-010-0349-y
  • Liu, K. (1993). A new class of biased estimate in linear regression. Communications in Statistics - Theory and Methods, 22, 393–402.
  • Lukman, A. F., & Ayinde, K. (2017). Review and classifications of the ridge parameter estimation techniques. Hacettepe Journal of Mathematics and Statistics, 46, 953–967. doi:10.15672/HJMS.201815671
  • Lukman, A. F., Ayinde, K., Binuomote, S., & Onate, A. C. (2019). Modified ridge-type estimator to combat multicollinearity: Application to chemical data. Journal of Chemometrics, e3125. doi:10.1002/cem.3125
  • McDonald, M. C., & Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-type estimators. Journal of the American Statistical Association, 70, 407–416. doi:10.2307/2285832
  • Newhouse, J. P., & Oman, S. D. (1971). An evaluation of ridge estimators. Rand Report, 1–28. R-716-PR.
  • Ozkale, M. R., & Kaciranlar, S. (2007). The restricted and unrestricted two-parameter estimators. Communications in Statistics - Theory and Methods, 36, 2707–2725.
  • Qian, X., Lee, S., Soto, A., & Chen, G. (2018). Regression model to predict the higher heating value of poultry waste from proximate analysis. Resources, 7, 39. doi:10.3390/resources7030039
  • Sakallioglu, S., & Akdeniz, F. (2003). Unbiased Liu estimation with prior information. International Journal of Mathematical Sciences, 2(1), 205–217.
  • Swindel, F. F. (1976). Good ridge estimators based on prior information. Communications in Statistics - Theory and Methods, 11, 1065–1075. doi:10.1080/03610927608827423
  • Trenkler, G., & Toutenburg, H. (1990). Mean squared error matrix comparisons between biased estimators: An overview of recent results. Statistical Papers, 31(1), 165–179. doi:10.1007/BF02924687
  • Wu, J. (2014). An unbiased two-parameter estimation with prior information in linear regression model. The Scientific World Journal, 2014, 1–8. doi:10.1155/2014/206943
  • Wu, J., & Yang, H. (2013). Efficiency of an almost unbiased two-parameter estimator in linear regression model. Statistics, 47, 535–545. doi:10.1080/02331888.2011.605891
  • Yang, H., & Chang, X. (2010). A new two-parameter estimator in linear regression. Communications in Statistics - Theory and Methods, 39, 923–934. doi:10.1080/03610920902807911