Short Communication

Dealing with Multicollinearity in Predicting Egg Components from Egg Weight and Egg Dimension

Article: 3408 | Received 11 Apr 2014, Accepted 11 Aug 2014, Published online: 17 Feb 2016

Abstract

Measurements of 174 eggs from a meat-type breeder flock (Ross) at 36 weeks of age were used to study the problem of multicollinearity (MC) instability in the estimation of the egg components yolk weight (YKWT), albumen weight (ALBWT) and eggshell weight (SHWT). Egg weight (EGWT), egg shape index (ESI = egg width (EGWD) × 100 / egg length (EGL)) and their interaction (EGWTESI) were used in the context of un-centred vs centred data and principal components regression (PCR) models. The pairwise phenotypic correlations, variance inflation factor (VIF), eigenvalues, condition index (CI), and variance proportions were examined. Egg weight had positive correlations with EGWD and EGL (r=0.56 and 0.50, respectively; P<0.0001) and EGL had a negative correlation with ESI (r=-0.79; P<0.0001). The highest correlation was observed between EGWT and ALBWT (r=0.94; P<0.0001), while the lowest was between EGWD and SHWT (r=0.33; P<0.0001). Multicollinearity problems were found in EGWT, ESI and their interaction, as shown by VIF (>10), eigenvalues (near zero), CI (>30) and high corresponding proportions of variance of EGWT, ESI and EGWTESI with respect to EGWTESI. Results from this study suggest that mean centring and PCR were appropriate to overcome the MC instability in the estimation of egg components from EGWT and ESI. These methods improved the meaning of intercept values and produced much lower standard error values for regression coefficients than those from un-centred data.

Introduction

The chicken egg is an excellent nutritious food item with a well-balanced source of essential nutrients (Stadelman, 1977). It contains high-quality protein along with a balanced source of nutrients in relation to its low energy content (Shrimpton, 1987; Hu et al., 2001). The components of eggs are of particular interest to producers and affect the acceptability of eggs to consumers (Stadelman, 1977). An average egg weighs approximately 58 g. Of this weight, the yolk constitutes 16.8 g (29%), the albumen 35.7 g (61.5%) and the eggshell 5.5 g (9.5%). The weight and dimensions of the chicken egg are very important characteristics that influence its components and grading and consequently the economic outcome of production (Pandey et al., 1986; Farooq et al., 2001, 2003; Wilson and Suarez, 1993). These egg characteristics are highly correlated and are used to predict egg components (Choprakarn et al., 1998; Pandey et al., 1986; Wilson and Suarez, 1993; Farooq et al., 2003; Abanikannda et al., 2007). However, entering many highly correlated egg measurements into a regression model can lead to multicollinearity (MC) problems. Multicollinearity refers to a situation in which there is an exact (or nearly exact) linear relation among two or more of the input variables (Hawking and Pendleton, 1983). This has a potentially serious effect on the standard errors of the coefficients, which may mislead the interpretation of the results. Multicollinearity inflates the standard errors. Thus, it can make some variables statistically non-significant when they would otherwise be significant. This makes it difficult to understand how the different external egg measurements affect egg components. It also makes it difficult to determine the contribution of each explanatory variable, because the effects of these variables are mixed.
This relationship among variables induces numerical instability into the estimates and alters the size of the coefficient of determination (Aziz and Sharaby, 1993). There are many statistical solutions to the MC problem, such as mean centring or using the principal component regression (PCR) model (Yu, 2008). Much of the published research seems only to predict egg components from a set of egg measurements, in which case it is acceptable to disregard MC. The predictions will still be accurate and the overall R² quantifies how well the model predicts egg components. However, the model in this case does not help to understand how the various egg measurements affect the values of egg components. There is no available information on the problem of MC among egg weight and dimensions, the predictors of egg components. In addition, most studies examine main effects or linear effects, and interactions are considered in only a few publications, because such interaction effects are difficult to interpret.

This study aimed at dealing with the problem of MC in the prediction of egg components of yolk weight (YKWT), albumen weight (ALBWT) and eggshell weight (SHWT) from egg weight (EGWT) and egg shape index (ESI). The effects of EGWT and ESI and their interaction (EGWTESI) in the regression models of un-centred data, centred data and principal components (PCs) on the variance inflation factor (VIF), eigenvalue and condition index (CI), variance proportions and prediction of egg components were examined.

Materials and methods

A total of 174 freshly laid eggs produced by a meat-type breeder flock (Ross-Alwadi, Riyadh, Saudi Arabia) at 36 weeks of age were collected. Eggs were candled using light to detect cracks and other defects. Eggs were numbered and weighed individually on an electronic scale. Measurements of length and width of the eggs (EGL and EGWD, respectively) were taken with a steel vernier caliper graduated to 0.1 mm. Eggs were broken and the albumen and yolk separated. The yolk was then carefully rolled on a paper towel to remove extra white and chalaza. When the chalaza was not removed by this process, a razor was used to remove it from the yolk and the clean yolk was weighed. The eggshell was washed, air dried overnight and weighed on an electronic scale. The eggshell membrane was not separated from the eggshell, thus eggshell weight is inclusive of the membrane weight. All weights were recorded to the nearest 0.1 g. Albumen weight was calculated by subtracting the yolk and shell weights from the total egg weight. Egg shape index was calculated using the following formula given by Panda (1996):

$$\mathrm{ESI}\ (\%) = \frac{\mathrm{EGWD}}{\mathrm{EGL}} \times 100$$

Measurements were made of independent variables of EGWT, EGWD, EGL, and ESI, and dependent variables of YKWT, SHWT and ALBWT.
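The two derived quantities described above (ESI from width and length, and albumen weight by difference) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names are hypothetical, and the inputs are the flock means reported in the Results section, used only as example values.

```python
def egg_shape_index(width_mm: float, length_mm: float) -> float:
    """ESI (%) = egg width * 100 / egg length, as defined in the abstract."""
    if length_mm <= 0:
        raise ValueError("egg length must be positive")
    return width_mm * 100.0 / length_mm


def albumen_weight(egg_g: float, yolk_g: float, shell_g: float) -> float:
    """Albumen weight by difference: whole egg minus yolk and shell."""
    return egg_g - yolk_g - shell_g


# Example values only: the flock means reported in the Results section.
esi_at_means = egg_shape_index(43.54, 55.28)
albumen_at_means = albumen_weight(58.83, 17.85, 5.22)
print(round(esi_at_means, 2), round(albumen_at_means, 2))
```

The results are close to, but not identical with, the reported mean ESI and ALBWT, because a ratio of means differs from a mean of per-egg ratios and the reported means are rounded.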

Statistical analysis

Data were analysed for descriptive statistics (mean±SD, coefficient of variation, and minimum and maximum values). As a first indication of collinearity, correlation coefficients among the independent egg variables were estimated. Because pairwise correlation alone is inadequate for detecting collinearity, the VIF method of Rook et al. (1990) was employed as follows:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ is the coefficient of determination obtained by regressing the i-th independent variable on the remaining independent variables.
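A VIF calculation of this kind can be sketched in Python with NumPy. This is an illustrative sketch, not the SAS procedure used in the study: the `vif` helper is hypothetical, and the data are simulated eggs whose means loosely follow the reported flock means (the standard deviations are made up).

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing
    column i on the remaining columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        y = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out


# Hypothetical data: 174 simulated eggs, NOT the study's measurements.
rng = np.random.default_rng(0)
egwt = rng.normal(58.8, 3.5, 174)   # egg weight, g (assumed SD)
esi = rng.normal(78.8, 3.0, 174)    # egg shape index, % (assumed SD)

# Un-centred predictors: EGWT, ESI and their product.
raw = np.column_stack([egwt, esi, egwt * esi])

# Centred predictors: subtract each mean before forming the product.
egwt_c, esi_c = egwt - egwt.mean(), esi - esi.mean()
centred = np.column_stack([egwt_c, esi_c, egwt_c * esi_c])

vif_raw = vif(raw)
vif_centred = vif(centred)
print("un-centred VIFs:", vif_raw.round(1))
print("centred VIFs:   ", vif_centred.round(2))
```

On simulated data of this shape, the un-centred interaction term is almost a linear combination of the two main effects, so all three VIFs are far above 10, whereas the centred VIFs stay near 1.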

Eigenvalues of the correlation matrix (X’X), CIs and variance proportions were also computed to confirm whether collinearity existed, following the procedures adopted by Malau-Aduli et al. (2004) and Pimentel et al. (2007). The variance proportions are the proportions of the variance of each parameter estimate accounted for by the principal component associated with each eigenvalue. A high proportion of variance of an independent variable coefficient reveals a strong association with the eigenvalue. Multicollinearity is a problem when a component associated with a high CI contributes substantially to the variance of two or more variables. An attempt was made to reduce the MC by centring the independent variables. Centring a predictor merely entails subtracting the mean of the variable values in the data set from each variable value (Robinson and Schumacker, 2009). The data were also submitted to PCR analysis (Yu, 2011).
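The eigenvalue, condition index and variance-proportion diagnostics can be sketched as below. This sketch assumes the Belsley-style convention in which every column of the design matrix, including the intercept, is scaled to unit length before X’X is decomposed; whether this matches the exact SAS option used in the study is an assumption. The helper name and the simulated data are hypothetical.

```python
import numpy as np


def collinearity_diagnostics(X: np.ndarray):
    """Eigenvalues, condition indices and variance proportions of X'X
    after scaling every column of X to unit length."""
    Xs = X / np.linalg.norm(X, axis=0)
    # Singular values d and right singular vectors of the scaled matrix;
    # the eigenvalues of Xs'Xs are d**2.
    _, d, vt = np.linalg.svd(Xs, full_matrices=False)
    eigenvalues = d ** 2
    condition_index = d.max() / d          # sqrt(lambda_max / lambda_k)
    # phi[k, j] = v_jk^2 / lambda_k: part of var(b_j) tied to component k.
    phi = (vt ** 2) / eigenvalues[:, None]
    proportions = phi / phi.sum(axis=0)    # each column sums to 1
    return eigenvalues, condition_index, proportions


# Hypothetical data: 174 simulated eggs, NOT the study's measurements.
rng = np.random.default_rng(1)
n = 174
egwt = rng.normal(58.8, 3.5, n)
esi = rng.normal(78.8, 3.0, n)
ones = np.ones(n)

raw = np.column_stack([ones, egwt, esi, egwt * esi])
e_c, s_c = egwt - egwt.mean(), esi - esi.mean()
centred = np.column_stack([ones, e_c, s_c, e_c * s_c])

eig_raw, ci_raw, prop_raw = collinearity_diagnostics(raw)
eig_cen, ci_cen, prop_cen = collinearity_diagnostics(centred)
print("un-centred condition indices:", ci_raw.round(1))
print("centred condition indices:   ", ci_cen.round(2))
print("un-centred variance proportions (rows = components):")
print(prop_raw.round(3))
```

In the returned table, a row with a high condition index and large proportions in two or more columns marks the variables involved in the near-dependency, which is the criterion described in the paragraph above.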

The full regression model was defined as:

$$Y = a + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

where: Y=dependent variable (YKWT, ALBWT, SHWT); a=intercept; βn=regression coefficients; Xn=independent variables (X1, X2 and X3 for EGWT, ESI and EGWTESI, respectively); ε=random error.
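The effect of centring on this model can be illustrated by fitting it twice, once with un-centred and once with centred predictors. The sketch below uses simulated data with made-up coefficients and a hypothetical `ols` helper; it is not the study's data or SAS code, and the numbers it prints are not the paper's estimates.

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with an intercept.
    Returns coefficients, their standard errors and R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])
    # (A'A)^-1 = R^-1 R^-T from the QR factorisation, which is
    # numerically safer than inverting A'A directly.
    _, R = np.linalg.qr(A)
    R_inv = np.linalg.inv(R)
    se = np.sqrt(sigma2 * np.sum(R_inv ** 2, axis=1))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, r2


# Hypothetical data and coefficients, NOT the study's measurements.
rng = np.random.default_rng(2)
n = 174
egwt = rng.normal(58.8, 3.5, n)
esi = rng.normal(78.8, 3.0, n)
ykwt = 17.85 + 0.25 * (egwt - 58.8) - 0.02 * (esi - 78.8) + rng.normal(0, 0.9, n)

# X1 = EGWT, X2 = ESI, X3 = EGWT x ESI, un-centred and centred.
X_raw = np.column_stack([egwt, esi, egwt * esi])
e_c, s_c = egwt - egwt.mean(), esi - esi.mean()
X_cen = np.column_stack([e_c, s_c, e_c * s_c])

b_raw, se_raw, r2_raw = ols(X_raw, ykwt)
b_cen, se_cen, r2_cen = ols(X_cen, ykwt)

print("un-centred: b =", b_raw.round(4), "SE =", se_raw.round(4))
print("centred:    b =", b_cen.round(4), "SE =", se_cen.round(4))
print("R^2:", round(r2_raw, 4), round(r2_cen, 4))
```

Both fits give the same fitted values, R² and interaction coefficient. What changes is the meaning of the intercept and main-effect slopes, which after centring refer to an egg at the mean of the data, and the size of their standard errors.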

SAS (2006) was used in the analysis.

Results and discussion

The mean and coefficient of variation of egg characteristics and the correlations among egg characteristics are shown in Tables 1 and 2, respectively. Egg weight, EGWD, EGL, ESI, YKWT, ALBWT and SHWT averaged 58.83 g, 43.54 mm, 55.28 mm, 78.84%, 17.85 g, 35.74 g and 5.22 g, respectively. There were significant positive correlations (P<0.0001) between EGWT and egg dimensions (EGWD and EGL, r=0.56 and 0.50, respectively), and between ESI and EGWD (r=0.48). There was a significant negative correlation (P<0.0001) between EGL and ESI (r=-0.79). Mean centring did not alter the correlations between independent and dependent variables. Similar findings on the relationship of ESI with either EGWD or EGL were reported in chicken and Japanese quail eggs (Ozcelik, 2002; Kul and Seker, 2004; Abanikannda et al., 2007). In contrast, Olawumi and Ogunlade (2008) reported a non-significant negative correlation (-0.09) between EGWT and ESI. The relationships between ESI and EGWD and between ESI and EGL (positive and negative, respectively) in chicken and partridge eggs are most likely due to the way ESI is calculated, with EGWD in the numerator and EGL in the denominator (Panda, 1996; Gunlu et al., 2003). The weight and dimensions of the egg (EGWT, EGWD and EGL) had significant positive (P<0.0001) correlations with YKWT (0.34 to 0.74), ALBWT (0.45 to 0.94) and SHWT (0.33 to 0.82). Thus, EGWT, EGWD and EGL have direct relations with the weight of egg components. A similar finding was reported by Olawumi and Ogunlade (2008), who found a significant positive correlation between EGWT and SHWT. Egg shape index was negatively (P<0.05) correlated with ALBWT and SHWT. The strong relationship between EGWT or egg dimensions (EGWD and EGL) and egg components (YKWT, ALBWT and SHWT) suggests that the combination of these characteristics could be used to estimate egg components.

The VIFs, eigenvalues, CIs and variance proportions for the relationships among EGWT, ESI and EGWTESI are shown in Table 3. The regression analysis indicated that there were collinearity problems in the two variables (EGWT and ESI) and EGWTESI, as shown by the VIFs. The VIFs were higher than 10.00 (VIF=835, 519 and 1228.5 for EGWT, ESI and EGWTESI, respectively). According to Gill (1986), no absolute standard exists for judging the magnitude of the VIF. However, a crude rule of thumb is to be suspicious of collinearity if the VIF is greater than 10.00. This is consistent with the reports of Belsley et al. (1980), Rook et al. (1990) and Belsley (1991). Collinearity problems were further confirmed by the eigenvalues of the correlation matrix, CIs and variance proportions (Table 3). The eigenvalues of EGWT, ESI and EGWTESI were near zero (0.003, 0.002 and 8.1 E-7, respectively), indicating that the correlation matrix approached singularity. Condition indexes were higher than 30 (CIs=38, 49 and 2220 for EGWT, ESI and EGWTESI, respectively). Belsley (1991) suggested that moderate to strong relations are associated with CI numbers of 30 to 100. Results in Table 3 from the un-centred regression model indicated that EGWT, ESI and EGWTESI have high proportions of variance (0.999) with respect to EGWTESI. Clearly, EGWT and ESI are highly correlated with EGWTESI (r=0.76 and 0.57 for EGWT and ESI with EGWTESI, respectively), and the associated eigenvalue of 8.11 E-7 and CI of 2220 indicate MC in the regression model. This inflated the variances of the estimates and consequently led to overfitting of the regression model. A similar finding was reported by Greene (2000). This finding suggests that using variable selection methods in the presence of MC may be an inappropriate way to find the correct relationships between variables.
Results from the centred means regression model indicated that EGWT and ESI had high variance proportions in relation to EGWT and EGWTESI (0.44 and 0.45, 0.32 and 0.32, respectively) with eigenvalue and CI values of 1.09 and 0.79, and 1.03 and 1.21, respectively (Table 3).
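To give a sense of scale for the un-centred VIFs reported above, the definition $\mathrm{VIF}_i = 1/(1 - R_i^2)$ can be inverted to obtain the $R_i^2$ that each VIF implies. The following is simple arithmetic on the reported values, not an additional result of the study:

$$
\begin{aligned}
R_i^2 &= 1 - \frac{1}{\mathrm{VIF}_i} \\
\text{EGWT:}\quad R^2 &= 1 - \frac{1}{835} \approx 0.9988 \\
\text{ESI:}\quad R^2 &= 1 - \frac{1}{519} \approx 0.9981 \\
\text{EGWTESI:}\quad R^2 &= 1 - \frac{1}{1228.5} \approx 0.9992
\end{aligned}
$$

In other words, each un-centred predictor is more than 99.8% explained by the other two, which is why the rule-of-thumb threshold of 10 (corresponding to $R_i^2 = 0.90$) is exceeded by such a wide margin.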

The un-centred, centred means and PCR models of egg measurements predicting egg components are shown in Table 4. The un-centred model predicted YKWT, ALBWT and SHWT from egg weight between 50.71 and 67.11 g and ESI between 60.9 and 94.1 (Table 1). The fitted regression models to these data were:

The centred model provides a more meaningful interpretation than the un-centred model: the intercept values of 17.85, 35.74 and 5.22 g represent the values of YKWT, ALBWT and SHWT at the mean values of EGWT and ESI (58.82 g and 78.84%, respectively). The slopes of the centred means model represent the linear relationships between egg measurements (EGWT and ESI) and egg components (YKWT, ALBWT and SHWT) when characterising the egg at the mean of the collected data. This is advantageous because the interpretations of the intercept and slope are now in the range of the observed data. Mean centring does not alter the correlations between independent and dependent variables. It seems that mean centring does not affect the interpretation of regression results, but it improves the meaning of the intercept values.

The PCR model was used to reduce MC. The reduction is accomplished by using fewer than the full set of PCs to explain the variation in the response variable. When all three PCs are used, the ordinary least squares solution is reproduced. The model can be written as follows:

The PC scores were calculated with eigenvectors as weights as follows:

The Z1, Z2 and Z3 had eigenvalues of 1.90948046, 1.09013195 and 0.00038759, that is, variances of λ1=1.909, λ2=1.090 and λ3=0.0004, respectively. The regression of each egg component was first considered against Z1, Z2 and Z3 (egg component = $\alpha_1 Z_1 + \alpha_2 Z_2 + \alpha_3 Z_3 + \varepsilon$; Table 4). The proportions of variance explained by Z1 and Z2 were 63.6 and 36.3%, respectively. The cumulative variance explained by Z1 and Z2 was almost 99.9%, so these two PCs were retained. Since Z3 had a variance of only 0.0004, the linear function defining Z3 was approximately equal to zero and was the source of MC in the data. The reduced model regressed each egg component against Z1 and Z2 (egg component = $\alpha_1 Z_1 + \alpha_2 Z_2 + \varepsilon$). Consequently, the R² and adjusted R² of the reduced PCR model were slightly reduced when compared with those of the PCR model with the full set of PCs (56.16 vs 56.70 and 55.64 vs 55.93; 89.24 vs 89.00 and 89.05 vs 88.87; and 68.00 vs 67.13 and 67.43 vs 66.74 for the R² and adjusted R² of YKWT, ALBWT and SHWT, respectively).
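The variance shares quoted above follow directly from the three reported eigenvalues, and the PCR steps themselves can be sketched in a few lines. The sketch below is illustrative only: the `pcr` helper is hypothetical, the data are simulated rather than the study's measurements, and an intercept is included in the regression on the PC scores, which is an assumption because the text does not show one.

```python
import numpy as np

# Eigenvalues reported in the text for Z1, Z2 and Z3.
reported = np.array([1.90948046, 1.09013195, 0.00038759])
share = 100 * reported / reported.sum()   # percent of total variance
print("variance share (%):", share.round(2))


def pcr(X: np.ndarray, y: np.ndarray, n_components: int):
    """Principal components regression on standardised predictors.
    Returns the eigenvalues of the correlation matrix and the R^2 of
    regressing y on the first n_components PC scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigval)[::-1]          # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec[:, :n_components]     # eigenvectors as weights
    A = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return eigval, r2


# Hypothetical data and coefficients, NOT the study's measurements.
rng = np.random.default_rng(3)
n = 174
egwt = rng.normal(58.8, 3.5, n)
esi = rng.normal(78.8, 3.0, n)
ykwt = 17.85 + 0.25 * (egwt - 58.8) - 0.02 * (esi - 78.8) + rng.normal(0, 0.9, n)
X = np.column_stack([egwt, esi, egwt * esi])

eigval, r2_full = pcr(X, ykwt, n_components=3)   # equivalent to OLS
_, r2_two = pcr(X, ykwt, n_components=2)         # drops the near-zero PC
print("eigenvalues:", eigval.round(5))
print("R^2 full vs reduced:", round(r2_full, 4), round(r2_two, 4))
```

Dropping a component can never raise R², so the reduced model is expected to fit no better than the full one. The trade-off is a small loss of fit for coefficients that are no longer driven by the near-singular direction.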

Results from this study suggest that mean centring and PCR are appropriate and adequate techniques to alleviate the MC instability in the estimation of egg components (YKWT, ALBWT and SHWT) from EGWT and ESI. Both techniques produce much lower standard errors for regression coefficients than those produced by the un-centred data model.

Table 1. Characteristics of meat-type breeder eggs (n=174).

Table 2. Correlation coefficients of weight, dimensions and components of meat-type breeder eggs.

Table 3. The regression models of un-centred and mean centred data and principal component of egg measurements predicting egg components from egg weight, egg shape index and their interaction.

Table 4. Regression models of un-centred and mean centred data and principal component regression of measurements predicting egg yolk, albumen and eggshell weights from egg weight, egg shape index and their interaction.

Conclusions

It is concluded that mean centring and PCR techniques may be used to overcome the MC problem and develop a better model for the estimation of egg components (YKWT, ALBWT and SHWT) from EGWT and ESI.

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding the work through the research group project No RGP-VPP-185.

References

  • Abanikannda O.T.F. Olutogun O. Leigh A.O. Ajayi L.A., 2007. Statistical modeling of egg weight and egg dimensions in commercial layers. Int. J. Poult. Sci. 6:59-63.
  • Aziz M.A. Sharaby M.A., 1993. Collinearity as a problem in predicting body weight from body dimensions of Najdi sheep in Saudi Arabia. Small Ruminant Res. 12:117-124.
  • Belsley D.A., 1991. Conditioning diagnostics, collinearity and weak data in regression. John Wiley and Sons, New York, NY, USA.
  • Belsley D.A. Kuh E. Welsch R.E., 1980. Regression diagnostics: identifying influential data and sources of collinearity. John Wiley and Sons, New York, NY, USA.
  • Choprakarn K. Salangam I. Janaka K., 1998. Laying performance, egg characteristics and egg composition in Thai indigenous hens. J. Nat. Res. Council Thailand 30:1-17.
  • Farooq K.A.M. Durrani F.R. Sarbiland K. Chaud N., 2003. Predicting egg weight, shell weight, shell thickness and hatching chick weight of Japanese quails using various egg traits as regressors. Int. J. Poult. Sci. 2:164-167.
  • Farooq M. Mian M.A. Ali M. Durrani F.R. Asquar A. Muqarrab A.K., 2001. Egg traits of Fayomi bird under subtropical conditions. Sarad J. Agri. 17:141-145.
  • Gill J.L., 1986. Outliers and influence in multiple regression. J. Anim. Breed. Genet. 103:161-175.
  • Greene W.H., 2000. Econometric analysis. Prentice-Hall Publ., Upper Saddle River, NJ, USA.
  • Gunlu A. Kiriki K. Cetin O. Carip M., 2003. Some external and internal quality characteristics of partridge (A. graeca) eggs. Food Agri. Environ. 1:197-199.
  • Hawking R.R. Pendleton O.J., 1983. The regression dilemma. Commun. Stat. A-Theor. 12:497-527.
  • Hu F.B. Manson J.E. Willett W.C., 2001. Types of dietary fat and risk of coronary heart disease: a critical review. J. Am. Coll. Nutr. 20:5-19.
  • Kul S. Seker I., 2004. Phenotypic correlation between some external and internal egg quality traits in the Japanese quail (Coturnix japonica). Int. J. Poult. Sci. 3:400-405.
  • Malau-Aduli A.E.O. Aziz M.A. Kojina T. Niibayashi T. Oshima K. Komatsu M., 2004. Fixing collinearity instability using principal component and ridge regression analyses in the relationship between body measurements and body weight in Japanese Black cattle. J. Anim. Vet. Adv. 3:856-863.
  • Olawumi S.O. Ogunlade J.T., 2008. Phenotypic correlations between some external and internal egg quality traits in the exotic Isa Brown layer breeders. Int. J. Poult. Sci. 2:30-35.
  • Ozcelik M., 2002. The phenotypic correlation among some external and internal quality characteristics in Japanese quail eggs. Vet. J. Ankara Univ. 49:67-72.
  • Panda P.C., 1996. Shape and Texture. In: Panda P.C. (ed.) Egg and poultry technology. Vikas Publ., New Delhi, India, p 57.
  • Pandey N.K. Mahapatra C.M. Verma S.S. Johari D.C., 1986. Effect of strain on physical egg quality characteristics in white Leghorn chickens. J. Poult. Sci. 21:304-307.
  • Pimentel E.C.G. Queiroz S.A. Carvalheiro R. Fries L.A., 2007. Use of ridge regression for prediction of early growth performance in crossbred calves. Genet. Mol. Biol. 30:536-544.
  • Robinson C. Schumacker R.E., 2009. Interaction effects: centering, variance inflation factor, and interpretation issues. Multiple Linear Regression Viewpoints 35:6-11.
  • Rook A.J. Dhanoa M.S. Gill M., 1990. Prediction of the voluntary intake of grass silages by beef cattle. 2. Principal component and ridge regression analyses. Anim. Prod. 50:439-454.
  • SAS, 2006. Guide for personal computers, version 9.1.3. SAS Inst. Inc., Cary, NC, USA.
  • Shrimpton D.H., 1987. The nutritive value of eggs and their dietary significance. In: Wells R.G. Beljavin C.G. (eds.) Egg quality. Current problems and recent advances. Butterworth Publ., London, UK, pp 11-25.
  • Stadelman W.J., 1977. Quality identification of shell eggs. In: Stadelman W.J. Cotterill D.J. (eds.) Egg science and technology. AVI Publ., Westport, CT, USA, p 33.
  • Wilson H.R. Suarez M.E., 1993. The use of egg weight and chick weight coefficient of variation as quality indicators in hatchery management. J. Appl. Poultry Res. 2:227-231.
  • Yu C.H., 2008. Multi-collinearity, variance inflation, and orthogonalization in regression. Available from: http://www.creative-wisdom.com/computer/sas/collinear.html
  • Yu C.H., 2011. Principal component regression as a counter measure against colinearity. Arizona State University ed., Tempe, AZ, USA. Available from: http://www.lexjansen.com/wuss/2011/analy/Papers_Yu_C_73333.pdf