Research Article

Response surface designs using the generalized variance inflation factors

Diarmuid O'Driscoll & Donald E. Ramirez | (Reviewing Editor)
Article: 1053728 | Received 22 Dec 2014, Accepted 15 May 2015, Published online: 14 Jun 2015

Abstract

We study response surface designs using the generalized variance inflation factors for subsets as an extension of the variance inflation factors.

Public Interest Statement

Response surface designs are a mainstay in applied statistics. The variance inflation factors VIF are a measure of collinearity for a single variable in a linear regression model. The generalization to subsets of variables is the generalized variance inflation factor GVIF. This research introduces GVIF as a penalty measure for extending a linear response model to a response surface with the included quadratic terms. The methodology is demonstrated with case studies, and, in particular, it is shown that using GVIF, the H310 design can be improved for the standard global optimality criteria of A, D, and E.

1. Introduction

We consider a linear regression Y = Xβ + ε with X a full rank n×p matrix and L(ε) = N(0, σ²I_n). The variance inflation factor VIF, Belsley (1986), measures the penalty for adding one non-orthogonal additional explanatory variable to a linear regression model, and it can be computed as a ratio of determinants. The extension of the VIF to a measure of the penalty for adding a subset of variables to a model is the generalized variance inflation factor GVIF of Fox and Monette (1992), which will be used to study response surface designs, in particular, as the penalty for adding the quadratic terms to the model.

2. Variance inflation factors

For our linear model Y = Xβ + ε, let D_X be the diagonal matrix with diagonal entries D_X[i,i] = (X'X)_{i,i}^{-1/2}. When the design has been standardized, the VIFs are the diagonal entries of the inverse of S_X = D_X(X'X)D_X. That is, the VIFs are the ratios of the actual variances for the explanatory variables to the "ideal" variances had the columns of X been orthogonal. Note that we follow Stewart (1987) and do not necessarily center the explanatory variables.

For our linear model Y = Xβ + ε, view X = [X_[p], x_p] with x_p the pth column of X and X_[p] the matrix formed by the remaining columns. The variance inflation factor VIF_p measures the effect of adding column x_p to X_[p]. For notational convenience, we demonstrate VIF_p with the last column p. An ideal column would be orthogonal to the previous columns, with the entries in the off-diagonal elements of the pth row and pth column of X'X all zeros. Denote by M_p the idealized moment matrix

M_p = [ X_[p]'X_[p]  0_{p-1} ; 0_{p-1}'  x_p'x_p ].

The VIFs are the diagonal entries of S_X^{-1} = D_X^{-1}(X'X)^{-1}D_X^{-1}. It remains to note that the inverse S_X^{-1} can be computed using cofactors C_{i,j}. In particular,

(1)  VIF_p = [S_X^{-1}]_{p,p} = [D_X^{-1}(X'X)^{-1}D_X^{-1}]_{p,p} = (x_p'x_p)^{1/2} [det(C_{p,p})/det(X'X)] (x_p'x_p)^{1/2} = det(M_p)/det(X'X),

the ratio of the determinant of the idealized moment matrix M_p to the determinant of the moment matrix X'X. This definition extends naturally to subsets and is discussed in the next section.
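The two characterizations in Equation 1 can be checked numerically. A minimal sketch (numpy, with a small hypothetical design matrix invented for illustration) computes the VIFs both as the diagonal of S_X^{-1} and as the determinant ratios det(M_p)/det(X'X):

```python
import numpy as np

# a small hypothetical 6x3 design with correlated columns
X = np.array([[1., -2., 4.1],
              [1., -1., 1.2],
              [1.,  0., 0.1],
              [1.,  1., 0.9],
              [1.,  2., 3.8],
              [1.,  3., 9.2]])
M = X.T @ X                                    # moment matrix X'X

# VIFs as the diagonal of S_X^{-1} = D_X^{-1} (X'X)^{-1} D_X^{-1}
D = np.diag(1.0 / np.sqrt(np.diag(M)))
vifs_inverse = np.diag(np.linalg.inv(D @ M @ D))

def vif_det(M, p):
    """VIF_p = det(M_p)/det(X'X), with M_p the idealized moment matrix."""
    Mp = M.copy()
    Mp[p, :] = 0.0                             # zero row/column p off the diagonal
    Mp[:, p] = 0.0
    Mp[p, p] = M[p, p]
    return np.linalg.det(Mp) / np.linalg.det(M)

vifs_det = np.array([vif_det(M, p) for p in range(M.shape[0])])
assert np.allclose(vifs_inverse, vifs_det)
```

Zeroing row and column p off the diagonal is exactly the block-diagonal M_p of Equation 1 after a symmetric permutation, which leaves the determinant unchanged.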

For an alternate view of how collinearities in the explanatory variables inflate the model variances of the regression coefficients when compared to a fictitious orthogonal reference design, consider the formula for the model variance

Var_M(β̂_j) = [σ² / Σ_{i=1}^n (x_{ij} − x̄_j)²] · [1/(1 − R_j²)]

where R_j² is the square of the multiple correlation from the regression of the jth column of X = [x_{ij}] on the remaining columns, as in Liao and Valliant (2012). The first term σ²/Σ(x_{ij} − x̄_j)² is the model variance for β̂_j had the jth explanatory variable been orthogonal to the remaining variables. The second term 1/(1 − R_j²) is a standard definition of the jth VIF, as in Theil (1971).
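The equivalence of the 1/(1 − R_j²) definition with the inverse-correlation-matrix form can be illustrated on synthetic data (invented for this sketch); the code compares the diagonal of the inverse correlation matrix with 1/(1 − R_j²) from explicit regressions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
z1 = rng.normal(size=n)
z2 = z1 + 0.5 * rng.normal(size=n)    # strongly correlated with z1
z3 = rng.normal(size=n)
Z = np.column_stack([z1, z2, z3])

def vif_r2(Z, j):
    """VIF_j = 1/(1 - R_j^2), R_j^2 from regressing column j on the others."""
    y = Z[:, j]
    X = np.column_stack([np.ones(len(Z)), np.delete(Z, j, axis=1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

vifs_reg = np.array([vif_r2(Z, j) for j in range(Z.shape[1])])
# equivalently, the diagonal of the inverse correlation matrix
vifs_corr = np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))
assert np.allclose(vifs_corr, vifs_reg)
```

This is the centered version of the VIF; as noted above, the paper itself follows Stewart (1987) and does not necessarily center.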

3. Generalized variance inflation factors

In this section, we introduce the GVIFs as an extension of the classical variance inflation factors VIF from Equation 1. For the linear model Y = Xβ + ε, view X = [X₁, X₂] partitioned with X₁ of dimension n×r, usually consisting of the lower order terms, and X₂ of dimension n×s, usually consisting of the higher order terms. The idealized moment matrix for the (r,s) partitioning of X is

M(r,s) = [ X₁'X₁  0_{r×s} ; 0_{s×r}  X₂'X₂ ].

Following Equation 1, to measure the effect of adding X₂ to the design X₁, that is for X₂|X₁, we define the generalized variance inflation factor as

(2)  GVIF(X₂|X₁) = det(M(r,s))/det(X'X) = det(X₁'X₁)det(X₂'X₂)/det(X'X)

as in Equation 10 of Fox and Monette (1992), who compared the sizes of the joint confidence regions for β for partitioned designs and noted, when X = [X_[p], x_p], that GVIF(x_p|X_[p]) = VIF_p. Equation 2 is in the spirit of the efficiency comparisons in linear inference introduced in Theorems 4 and 5 of Jensen and Ramirez (1993). A similar measure of collinearity is mentioned in Note 2 of Wichers (1975), Theorem 1 of Berk (1977), and Garcia, Garcia, and Soto (2011). For the simple linear regression model with p = 2, Equation 2 gives VIF = 1/(1 − ρ²) with ρ the correlation coefficient, as required. Fox and Monette (1992) suggested that X₁ contain the variables which are of "simultaneous interest," while X₂ contains additional variables selected by the investigator. We will set X₁ to the constant and main effects and X₂ to the (optional) quadratic terms with values from X₁.
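Equation 2 is straightforward to compute. A sketch (numpy, with a hypothetical one-factor quadratic design invented for illustration) evaluates GVIF(X₂|X₁) and confirms that, when X₂ is a single column, it reduces to VIF_p:

```python
import numpy as np

def gvif(X1, X2):
    """GVIF(X2|X1) = det(X1'X1) det(X2'X2) / det(X'X), Equation 2."""
    X = np.hstack([X1, X2])
    return (np.linalg.det(X1.T @ X1) * np.linalg.det(X2.T @ X2)
            / np.linalg.det(X.T @ X))

# hypothetical one-factor quadratic design: X1 = [1, x], X2 = [x^2]
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
X1 = np.column_stack([np.ones_like(x), x])
X2 = (x ** 2).reshape(-1, 1)

g = gvif(X1, X2)
# with s = 1 this is exactly VIF_p for the added column
X = np.hstack([X1, X2])
vif_last = (X2.T @ X2)[0, 0] * np.linalg.inv(X.T @ X)[-1, -1]
assert np.isclose(g, vif_last)
```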

Willan and Watts (1978) measured the effect of collinearity using the ratio of the volume of the actual joint confidence region for β̂ to the volume of the joint confidence region in the fictitious orthogonal reference design. Their ratio is in the spirit of GVIF, as det(X'X) is inversely proportional to the square of the volume of the joint confidence region for β̂. They also introduced a measure of relative predictability, and they note: "The existence of near linear relations in the independent variables of the actual data reduces the overall predictive efficiency by this factor." For a simple case study, consider the simple linear regression model with n = 4, x₁ = [−2, −1, 1, 2], and y = [4, 1, 1, 4]. The 95% prediction interval for x₁ = 0 is 2.5 ± 10.20. If the model also includes x₂ = [−2.001, −1.001, 1.001, 2.001], then the 95% prediction interval for (x₁, x₂) = (0, 0) is 2.5 ± 46.02, demonstrating the loss of predictive efficiency due to the collinearity introduced by x₂.
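The first interval of this case study can be reproduced directly. The sketch below (numpy, with the t quantiles t(0.975, 2) = 4.3027 and t(0.975, 1) = 12.7062 hard-coded to avoid a scipy dependency) recovers the ±10.20 margin and shows that adding the nearly collinear x₂ widens the prediction interval:

```python
import numpy as np

x1 = np.array([-2.0, -1.0, 1.0, 2.0])
y  = np.array([ 4.0,  1.0, 1.0, 4.0])

def prediction_margin(X, y, x0, t_quantile):
    """Half-width of the prediction interval at x0 for the model y = Xb + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])          # residual variance
    se = np.sqrt(s2 * (1.0 + x0 @ np.linalg.inv(X.T @ X) @ x0))
    return t_quantile * se

# simple linear regression; t(0.975, 2) = 4.3027
X = np.column_stack([np.ones(4), x1])
m1 = prediction_margin(X, y, np.array([1.0, 0.0]), 4.3027)

# adding the nearly collinear x2; t(0.975, 1) = 12.7062
x2 = np.array([-2.001, -1.001, 1.001, 2.001])
X12 = np.column_stack([np.ones(4), x1, x2])
m2 = prediction_margin(X12, y, np.array([1.0, 0.0, 0.0]), 12.7062)

assert abs(m1 - 10.20) < 0.02          # matches the quoted +-10.20
assert m2 > m1                         # collinearity widens the interval
```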

For the (r,s) partition of X = [X₁, X₂] with X₁ of dimension n×r and X₂ of dimension n×s, set

D(r,s) = [ (X₁'X₁)^{-1/2}  0 ; 0  (X₂'X₂)^{-1/2} ],

and denote the canonical moment matrix as

(3)  R = D(r,s)(X'X)D(r,s) = [ I_{r×r}  (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2} ; (X₂'X₂)^{-1/2}(X₂'X₁)(X₁'X₁)^{-1/2}  I_{s×s} ];

with determinant

det(R) = det(X'X)/[det(X₁'X₁) det(X₂'X₂)] = 1/GVIF(X₂|X₁);

equivalently,

det(R) = det(I_{r×r} − B_{r×s}B_{s×r}) = det(I_{s×s} − B_{s×r}B_{r×s})

where B_{r×s} = (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2}.

In the case {r = p − 1, s = 1}, X₂ = x_p is an n×1 vector and the partitioned design X = [X₁, x_p] has det(R) = 1 − [x_p'X₁(X₁'X₁)^{-1}X₁'x_p]/x_p'x_p. From standard facts for the inverse of a partitioned matrix, for example, Myers (1990, p. 459), VIF_p = [R^{-1}]_{p,p} = [D(p−1,1)^{-1}(X'X)^{-1}D(p−1,1)^{-1}]_{p,p} can be computed directly as

(x_p'x_p)^{1/2}[(X'X)^{-1}]_{p,p}(x_p'x_p)^{1/2} = x_p'x_p / [x_p'x_p − x_p'X₁(X₁'X₁)^{-1}X₁'x_p] = 1/(1 − [x_p'X₁(X₁'X₁)^{-1}X₁'x_p]/x_p'x_p) = 1/det(R) = GVIF(X₂|X₁).

Table 1. CCD with parameter a, canonical index γX2, and GVIF

We study the eigenvalue structure of M(r,s) in Appendix 1. Let {λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_{min(r,s)} ≥ 0} be the non-negative singular values of (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2}. It is shown in Appendix 1 that an alternative formulation for the GVIF is

(4)  GVIF(X₂|X₁) = ∏_{i=1}^{min(r,s)} (1 − λᵢ²)^{-1}.
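Equation 4 can be verified against the determinant form of Equation 2. The sketch below (numpy, with a random correlated design invented for illustration) builds B, takes its singular values, and compares the two formulas:

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
X1 = np.column_stack([np.ones(12), rng.normal(size=(12, 2))])   # r = 3
X2 = rng.normal(size=(12, 2)) + 0.5 * X1[:, 1:2]                # s = 2, correlated with X1
X = np.hstack([X1, X2])

# B = (X1'X1)^{-1/2} (X1'X2) (X2'X2)^{-1/2}; its singular values are the lambda_i
B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
lam = np.linalg.svd(B, compute_uv=False)

gvif = (np.linalg.det(X1.T @ X1) * np.linalg.det(X2.T @ X2)
        / np.linalg.det(X.T @ X))
assert np.isclose(gvif, np.prod(1.0 / (1.0 - lam ** 2)))   # Equation 4
```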

4. Quadratic model with p=3

For the partitioning X = [X_r|X_s], the canonical moment matrix, Equation 3, has the identity matrices I_r, I_s down the diagonal and off-diagonal array (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2}. For the quadratic model y = β₀ + β₁x + β₂x² and partitioning X = [1, x | x²], we have

R = [ 1  0  ρ₁ ; 0  1  ρ₂ ; ρ₁  ρ₂  1 ].

From Equation 4, GVIF(x²|1, x) = (1 − λ²)^{-1}, where λ = √(ρ₁² + ρ₂²) is the unique positive singular value of [ρ₁, ρ₂]'. Denote

γ_{X₂} = ρ₁² + ρ₂²

as the canonical index, with GVIF(x²|1, x) = 1/(1 − γ_{X₂}) = 1/det(R). Surprisingly, many higher order designs also have an off-diagonal array of the canonical moment matrix with a unique positive singular value, so that GVIF(X₂|X₁) = 1/(1 − γ_{X₂}), with the collinearity between the lower order terms and the higher order terms a function of the canonical index γ_{X₂}.
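For the p = 3 quadratic model, the canonical index can be computed directly; a sketch (numpy, with hypothetical design points chosen for illustration) confirms GVIF(x²|1, x) = 1/(1 − γ_{X₂}):

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# hypothetical one-factor quadratic design X = [1, x | x^2]
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
X1 = np.column_stack([np.ones_like(x), x])
X2 = (x ** 2).reshape(-1, 1)

# off-diagonal of the canonical moment matrix: the 2x1 array [rho1, rho2]'
B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
gamma = float((B ** 2).sum())          # canonical index rho1^2 + rho2^2

X = np.hstack([X1, X2])
gvif = (np.linalg.det(X1.T @ X1) * np.linalg.det(X2.T @ X2)
        / np.linalg.det(X.T @ X))
assert np.isclose(gvif, 1.0 / (1.0 - gamma))
```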

5. Central composite and factorial designs for quadratic models (p=6)

In this section, we compare the central composite design (CCD) X of Box and Wilson (1951) and the factorial design Z. The design points are shown in Table A1 of Appendix 2. Both designs are 9×6 and use the quadratic response model

y = β₀ + β₁x₁ + β₂x₂ + β₁₁x₁² + β₂₂x₂² + β₁₂x₁x₂ + ε.

The CCD traditionally uses the value a = √2 in four entries, while the factorial design uses the value a = 1. To study the difference between the designs with these different values, we computed the GVIF to compare the "orthogonality" between the lower order terms X₁ of dimension 9×3 and the higher order quadratic terms X₂ of dimension 9×3. The off-diagonal entry B_{3×3} of R from Equation 3 in Section 3 has the form

B_{3×3} = [ ρ₁  ρ₂  0 ; 0  0  0 ; 0  0  0 ]

with ρ₁ = ρ₂ = 2(2 + a²)/(3√(8 + 2a⁴)), canonical index γ_{X₂} = ρ₁² + ρ₂², and GVIF(X₂|X₁) = 1/(1 − γ_{X₂}), as in the quadratic model case with p = 3 shown in Section 4. For Table 1, if a = 1, then ρ₁ = ρ₂ = 2/√10, γ_{X₂} = 8/10, and GVIF(X₂|X₁) = 5. Surprisingly, the classical choice of a = √2 gives the largest value for GVIF(X₂|X₁), that is, the worst value, indicating the greatest collinearity between the lower and higher order terms, as noted in O'Driscoll and Ramirez (in press).
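The closed form for ρ₁ = ρ₂ and the GVIF values above can be verified by constructing the nine-run CCD explicitly. A sketch (numpy; the design points follow the standard two-factor CCD layout of four factorial, four axial, and one center run) checks a = 1 (GVIF = 5) and a = √2 (GVIF = 9):

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def ccd_gvif(a):
    """GVIF(X2|X1) and B for the nine-run two-factor CCD with axial value a."""
    pts = [(-1, -1), (-1, 1), (1, -1), (1, 1),
           (-a, 0), (a, 0), (0, -a), (0, a), (0, 0)]
    x1, x2 = np.array(pts, dtype=float).T
    X1 = np.column_stack([np.ones(9), x1, x2])
    X2 = np.column_stack([x1**2, x2**2, x1*x2])
    B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
    lam = np.linalg.svd(B, compute_uv=False)
    return np.prod(1.0 / (1.0 - lam**2)), B

gvif1, B1 = ccd_gvif(1.0)
rho = 2 * (2 + 1**2) / (3 * np.sqrt(8 + 2 * 1**4))   # closed form at a = 1: 2/sqrt(10)
assert np.isclose(B1[0, 0], rho)
assert np.isclose(gvif1, 5.0)          # factorial design, a = 1

gvif2, _ = ccd_gvif(np.sqrt(2.0))
assert np.isclose(gvif2, 9.0)          # a = sqrt(2): largest GVIF
```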

6. Larger designs (p=10)

We consider the quadratic response surface designs for

(5)  y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₁₁x₁² + β₂₂x₂² + β₃₃x₃² + β₁₂x₁x₂ + β₁₃x₁x₃ + β₂₃x₂x₃ + ε

with n responses and with X partitioned into [X₁|X₂], with X₁ the four lower order terms (r = 4) and X₂ the six quadratic terms (s = 6). Four popular designs are given in Appendix 2. They are the hybrid designs (H310 and H311B) of Roquemore (1976), Tables A2 and A3; the Box and Behnken (1960) design (BBD), Table A4; and the CCD of Box and Wilson (1951), Table A5.

For each design, we compute the 10×10 canonical moment matrix. It is striking that, for all these designs, the off-diagonal 4×6 array in R has only one non-zero singular value, with its square the canonical index γ_{X₂}. It follows that GVIF(X₂|X₁) = 1/(1 − γ_{X₂}).
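The rank-one structure can be checked for the CCD of Table A5, reconstructed here from its standard definition (2³ factorial points, six axial points at ±√3, and a center run; this reconstruction is an assumption, since the table itself is not reproduced above):

```python
import numpy as np
from itertools import product

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

a = np.sqrt(3.0)
axial = []
for j in range(3):
    for s in (-a, a):
        p = [0.0, 0.0, 0.0]
        p[j] = s
        axial.append(p)
pts = [list(t) for t in product([-1.0, 1.0], repeat=3)] + axial + [[0.0, 0.0, 0.0]]
D = np.array(pts)                       # 15 runs, 3 factors
x1, x2, x3 = D.T

X1 = np.column_stack([np.ones(len(D)), x1, x2, x3])              # r = 4
X2 = np.column_stack([x1**2, x2**2, x3**2, x1*x2, x1*x3, x2*x3]) # s = 6
B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
lam = np.linalg.svd(B, compute_uv=False)

assert lam[1] < 1e-10                   # a single non-zero singular value
gamma = lam[0] ** 2                     # canonical index
gvif = np.prod(1.0 / (1.0 - lam ** 2))
assert np.isclose(gvif, 1.0 / (1.0 - gamma))
```

For this reconstructed CCD, the computation gives GVIF = 15.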

Table 2. Hybrid designs H310, H311B, Box and Behnken BBD, and CCD

Table 3. Singular values for off-diagonal array of R for BDD and SCD with GVIF

Table 2 reports that the design H310 is the best conditioned with respect to the GVIF, with the least amount of collinearity between the lower and higher order terms.

7. More complicated designs with ordered singular values

Let X be the minimal design of Box and Draper (1974) (BDD) with n = 11 from Table A6, and let Z be the small composite design of Hartley (1959) (SCD) with n = 11 from Table A7, for the quadratic response surface model (r = 4 and s = 6) as in Equation 5. Let α = {α₁ ≥ ⋯ ≥ α_r ≥ 0} and β = {β₁ ≥ ⋯ ≥ β_r ≥ 0} be the non-negative singular values of the off-diagonal arrays of R_X and R_Z, respectively. As αᵢ ≤ βᵢ (1 ≤ i ≤ r) in Table 3, it follows that GVIF(X₂|X₁) ≤ GVIF(Z₂|Z₁), showing less collinearity between the lower and higher order terms for the BDD design.

8. An improved H310 design

When the diagonal matrix Λ_{r×s} in Equation A1 in Appendix 1 has only one non-zero entry, we have denoted the square of this value the canonical index. We extend this definition to the case when (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2} has multiple positive singular values. The Frobenius norm of a rectangular matrix A_{r×s} is defined by ||A||_F² = Σ_{i=1}^r Σ_{j=1}^s a_{ij}² = trace(A'A). For a design matrix X, we extend the definition of the canonical index with γ_{X₂} = ||Λ_{r×s}||_F². Equivalently, γ_{X₂} = trace((X₂'X₂)^{-1}(X₂'X₁)(X₁'X₁)^{-1}(X₁'X₂)), as in Equation A2.
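The Frobenius-norm and trace expressions for the canonical index agree, as a quick numerical sketch (numpy, with a random correlated design invented for illustration) confirms:

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(2)
X1 = np.column_stack([np.ones(10), rng.normal(size=(10, 3))])   # r = 4
X2 = rng.normal(size=(10, 2)) + 0.4 * X1[:, 1:3]                # s = 2, correlated
B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
lam = np.linalg.svd(B, compute_uv=False)

gamma_frob = np.sum(B ** 2)   # ||Lambda||_F^2 = sum of lambda_i^2
gamma_trace = np.trace(np.linalg.inv(X2.T @ X2) @ (X2.T @ X1)
                       @ np.linalg.inv(X1.T @ X1) @ (X1.T @ X2))
assert np.isclose(gamma_frob, np.sum(lam ** 2))
assert np.isclose(gamma_frob, gamma_trace)
```

The Frobenius norm is invariant under the orthogonal factors P and Q, so ||B||_F² = ||Λ||_F², and the trace form follows from the cyclic property of the trace.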

We examine, in detail, the H310 design matrix X_{11×10}, Table A2 in Appendix 2, with our attention on the value of −0.1360 in row 2 for x₃. In succession, we will replace the values {1.1736, 0.6386, −0.9273, 1.0000, 1.2906, −0.1360} by a free parameter and use γ_{X₂} to determine an optimal value. For example, replacing the four entries which are 1.1736 with c₁, we calculate the minimum value of γ_{X₂} = 0.8199 at c₁ = 1.1768, denoted c_min in Table 4. These values are within the four digit accuracy of the data. We performed a similar calculation with c₂ for the four entries which are 0.6386; with c₃ for the four entries which are −0.9273; with c₄ for the eight entries which are 1; and with c₅ for the single entry 1.2906. The original design has γ_{X₂} = 0.8199. The entries in the H310 design are given to four significant digits. With this precision, the original design is nearly optimal with respect to the canonical index γ_{X₂} for the first five entries in Table 4. The sixth entry, c₆ = −0.1360, was not optimal: γ_{X₂} = 0.8181 at c_min = −0.01264, an order of magnitude smaller.

Table 4. Optimal values cmin for γX2

Denote the "improved" H310 design as the H310 design with the value c₆ = −0.01264. The "improved" H310 also has a unique positive singular value for the off-diagonal array of R, with its square the canonical index γ_{X₂}. All of the standard design criteria favor the "improved" H310 design over the H310 design, which was originally constructed from the rotatability criterion to maintain equal variances for predicted responses at points with the same distance from the design center. As usual, A(X) = tr((X'X)^{-1}), D(X) = det((X'X)^{-1}), and E(X) = max{eigenvalues of (X'X)^{-1}}. The small relative changes Δ in the design criteria are shown in Column 4 of Table 5.

Table 5. Design criteria for the “improved” H310 with Δ the relative change

The abnormality of the second row of H310 has been noted in Jensen (1998), who showed that the design is least sensitive to the second row of X, the row containing the value c₆ = −0.1360.

9. Conclusions

The VIF measures the penalty for adding a non-orthogonal variable to a linear regression. The VIF can be computed as a ratio of determinants as in Equation 1. A similar ratio criterion was studied by Fox and Monette (1992) to measure the effect of adding a subset of new variables to a design; they dubbed it the generalized variance inflation factor GVIF, Equation 2. We have noted the relationship between GVIF and the singular values of the off-diagonal array in the canonical moment matrix and have used GVIF to study standard quadratic response designs. The H310 design of Roquemore (1976) was shown not to be optimal with respect to GVIF, and an "improved" H310 design was introduced which was favored over H310 using the standard design criteria A, D, and E.

Additional information

Funding

The authors received no direct funding for this research.

Notes on contributors

Diarmuid O’Driscoll

Diarmuid O’Driscoll is the head of the Mathematics and Computer Studies Department at Mary Immaculate College, Limerick. He was awarded a Travelling Studentship for his MSc at University College Cork in 1977. He has taught at University College Cork, Cork Institute of Technology, University of Virginia, and Frostburg State University. His research interests are in mathematical education, errors in variables regression, and design criteria. In 2014, he was awarded a Teaching Heroes Award by the National Forum for the Enhancement of Teaching and Learning (Ireland).

Donald E. Ramirez

Donald E Ramirez is a full professor in the Department of Mathematics at the University of Virginia in Charlottesville, Virginia. He received his PhD in Mathematics from Tulane University in New Orleans, Louisiana. His research is in harmonic analysis and mathematical statistics. His current research interests are in statistical outliers and ridge regression.

References

  • Belsley, D. A. (1986). Centering, the constant, first-differencing, and assessing conditioning. In E. Kuh & D. A. Belsley (Eds.),Model reliability (pp. 117–153). Cambridge: MIT Press.
  • Berk, K. (1977). Tolerance and condition in regression computations. Journal of the American Statistical Association, 72, 863–866.
  • Box, G. E. P., & Behnken, D. W. (1960). Some new three-level designs for the study of quantitative variables. Technometrics, 2, 455–475.
  • Box, M. J., & Draper, N. R. (1974). On minimum-point second-order designs. Technometrics, 16, 613–616.
  • Box, G. E. P., & Wilson, K. B. (1951). On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society, Series B, 13, 1–45.
  • Eaton, M. L. (1983). Multivariate statistics. New York, NY: Wiley.
  • Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87, 178–183.
  • Garcia, C. B., Garcia, J., & Soto, J. (2011). The raise method: An alternative procedure to estimate the parameters in presence of collinearity. Quality and Quantity, 45, 403–423.
  • Hartley, H. O. (1959). Smallest composite design for quadratic response surfaces. Biometrics, 15, 611–624.
  • Jensen, D. R. (1998). Principal predictors and efficiency in small second-order designs. Biometrical Journal, 40, 183–203.
  • Jensen, D. R., & Ramirez, D. E. (1993). Efficiency comparisons in linear inference. Journal of Statistical Planning and Inference, 37, 51–68.
  • Liao, D., & Valliant, R. (2012). Variance inflation in the analysis of complex survey data. Survey Methodology, 38, 53–62.
  • Myers, R. (1990). Classical and modern regression with applications (2nd ed.). Boston, MA: PWS-Kent.
  • O’Driscoll, D., & Ramirez, D. E. (in press). Revisiting some design criteria (under review).
  • Roquemore, K. G. (1976). Hybrid designs for quadratic response surfaces. Technometrics, 18, 419–423.
  • Stewart, G. W. (1987). Collinearity and least squares regression. Statistical Science, 2, 68–84.
  • Theil, H. (1971). Principles of econometrics. New York, NY: Wiley.
  • Wichers, R. (1975). The detection of multicollinearity: A comment. Review of Economics and Statistics, 57, 366–368.
  • Willan, A. R., & Watts, D. G. (1978). Meaningful multicollinearity measures. Technometrics, 20, 407–412.

Appendix 1

We study the eigenvalue structure of M(r,s). Let {λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_{min(r,s)} ≥ 0} be the non-negative singular values of (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2}.

As with the canonical correlation coefficients, Eaton (1983), write the off-diagonal rectangular array B_{r×s} of R as PΛQ' with P and Q orthogonal matrices and Λ_{r×s} the rectangular diagonal matrix with the non-negative singular values down the diagonal. Set

L = [ P_{r×r}  0_{r×s} ; 0_{s×r}  Q_{s×s} ].

For notational convenience, we assume r ≤ s. The matrix L is orthogonal and transforms R into

(A1)  L'RL = [ I_r  Λ_{r×s} ; Λ_{s×r}  I_s ] = [ I_r  [SV_{r×r} | 0_{r×(s−r)}] ; [SV_{r×r} | 0_{r×(s−r)}]'  I_s ]

with Λ_{r×s} = [SV_{r×r} | 0_{r×(s−r)}], where SV_{r×r} is the diagonal matrix of the non-negative singular values. Since L is orthogonal, this transformation has not changed the eigenvalues. To compute the determinant of R, convert the matrix in Equation A1 into an upper triangular matrix by Gauss elimination on Λ_{s×r}. This changes r of the 1s on the diagonal, in rows r + 1 to 2r, into 1 − λᵢ², and thus det(R) = ∏_{i=1}^{min(r,s)}(1 − λᵢ²), with

GVIF(X₂|X₁) = ∏_{i=1}^{min(r,s)} 1/(1 − λᵢ²).

The singular values of R₁₂ = (X₁'X₁)^{-1/2}(X₁'X₂)(X₂'X₂)^{-1/2} are the non-negative square roots of the eigenvalues of Λ'Λ, denoted by

(A2)  eigvals(Λ'Λ) = eigvals((Q'R₁₂'P)(P'R₁₂Q)) = eigvals((X₁'X₁)^{-1}(X₁'X₂)(X₂'X₂)^{-1}(X₂'X₁)).

If the trace of the inverse of the matrix in Equation A1 is required, then we note that

[ I_r  Λ_{r×s} ; Λ_{s×r}  I_s ]^{-1} = [ (I_r − Λ_{r×s}Λ_{s×r})^{-1}   −Λ_{r×s}(I_s − Λ_{s×r}Λ_{r×s})^{-1} ; −Λ_{s×r}(I_r − Λ_{r×s}Λ_{s×r})^{-1}   (I_s − Λ_{s×r}Λ_{r×s})^{-1} ]

with trace given by tr((L'RL)^{-1}) = |r − s| + 2 Σ_{i=1}^{min(r,s)} 1/(1 − λᵢ²).
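The trace formula can be verified numerically; since L is orthogonal, tr((L'RL)^{-1}) = tr(R^{-1}), so the sketch below (numpy, random correlated design invented for illustration) builds R directly:

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(3)
X1 = rng.normal(size=(15, 3))                                  # r = 3
X2 = rng.normal(size=(15, 5)) + 0.6 * X1[:, [0, 1, 2, 0, 1]]   # s = 5, correlated
B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
lam = np.linalg.svd(B, compute_uv=False)

r, s = B.shape
R = np.block([[np.eye(r), B], [B.T, np.eye(s)]])   # canonical moment matrix
lhs = np.trace(np.linalg.inv(R))                   # tr(R^{-1}) = tr((L'RL)^{-1})
rhs = abs(r - s) + 2 * np.sum(1.0 / (1.0 - lam ** 2))
assert np.isclose(lhs, rhs)
```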

Appendix 2

Table A1. The lower order matrix for the CCD with center run with a = √2, n = 9, and the lower order matrix for the factorial design with center run, n = 9

Table A2. The lower order matrix for the hybrid (H310) design of Roquemore (1976) with center run, n = 11

Table A3. The lower order matrix for the hybrid (H311B) design of Roquemore (1976) with center run, n = 11

Table A4. The lower order matrix for the Box and Behnken (1960) design (BBD) with center run, n = 13

Table A5. The lower order matrix for the Box and Wilson (1951) CCD for α = 1.732 with center run, n = 15

Table A6. The lower order matrix for the Box and Draper (1974) minimal design (BDD) with center run, n = 11

Table A7. The lower order matrix for the small composite design of Hartley (1959) (SCD) for α = 1.732 with center run, n = 11