Original Articles

Comparison of optimal linear, affine and convex combinations of metamodels

Pages 702-718 | Received 08 Jul 2019, Accepted 20 Mar 2020, Published online: 01 May 2020

Abstract

In this article, five different formulations for establishing optimal ensembles of metamodels are presented and compared. The comparison is done by minimizing different norms of the residual vector of the leave-one-out cross-validation errors for linear, affine and convex combinations of 10 metamodels. The norms are taken to be the taxicab, the Euclidean and the infinity norm, respectively. The ensemble of metamodels consists of quadratic regression, Kriging with linear or quadratic bias, radial basis function networks with a-priori linear or quadratic bias, radial basis function networks with a-posteriori linear or quadratic bias, polynomial chaos expansion, support vector regression and least squares support vector regression. Eight benchmark functions are studied as ‘black-boxes’ using Halton and Hammersley samplings. The optimal ensembles are established for one of the samplings and the corresponding root mean square errors are then evaluated using the other sampling, and vice versa. In total, 80 different test cases (5 formulations, 8 benchmarks and 2 samplings) are studied and presented. In addition, an established design optimization problem is solved using affine and convex combinations. It is concluded that minimization of the taxicab or Euclidean norm of the residual vector of the leave-one-out cross-validation errors for convex combinations of metamodels produces the best ensemble of metamodels.

1. Introduction

Metamodels, such as Kriging in Kleijnen (2009), radial basis function networks in Amouzgar and Strömberg (2017), polynomial chaos expansion in Sudret (2008), support vector machines in Strömberg (2018a) and support vector regression in Yun, Yoon, and Nakayama (2009), are today an established means of approximating computationally expensive models, such as nonlinear finite element models, in order to perform advanced studies in different disciplines of design optimization; see, for example, Strömberg and Tapankov (2012), Amouzgar, Rashid, and Strömberg (2013) and the review by Wang and Shan (2006). The best choice of metamodel depends strongly on the character of the response, the choice of design of experiments and the definition of 'best'. Searching for a single metamodel that is best in all situations is therefore like searching for the holy grail. A more fruitful search is for the best ensemble of metamodels for a particular problem, which is the topic of the current article. To be more precise, in this article the best linear, affine and convex combinations, respectively, of 10 metamodels are established, where 'best' is defined by the taxicab norm, the Euclidean norm or the infinity norm of the residual vector of the leave-one-out cross-validation errors.

During recent years, initiated by the work of Viana, Haftka, and Steffen (2009) and Acar and Rais-Rohani (2009), adopting affine combinations of metamodels has emerged as a standard approach for setting up optimal ensembles of metamodels for design optimization. However, the present author's experience with affine combinations of metamodels has often been poor. Overfitting and poor generalization of the optimal affine ensemble are frequently observed. Typically, a large positive weight is cancelled out by an almost equally large negative weight. In fact, this issue was already discussed in Viana, Haftka, and Steffen (2009), but was explained as probably depending on the approximation used in the objective function. It was also discussed in that early article whether to introduce a convexity constraint in order to overcome this pitfall. Therefore, one might be surprised that so many contributions promote the approach of using affine combinations of metamodels, especially since the pitfalls of affine combinations are well known in machine learning; see, for example, the textbook by Zhou (2012). In fact, this was already pointed out by Breiman (1996) when he investigated the ideas of stacked generalization proposed by Wolpert (1992). Breiman clearly states in his article that the proper constraints on the weights \(w_i\) of a combination of metamodels are \(\sum_i w_i = 1\) and \(w_i \geq 0\). In conclusion, the ensemble of metamodels should be a convex combination, not an affine combination. Indeed, the investigations done by Strömberg (2018b, 2019) support this statement. This is also demonstrated in the present article, where optimal linear, affine and convex combinations are established by minimizing the norm of the residual vector of the leave-one-out cross-validation errors, and the optimal ensembles of metamodels (OEMs) are validated by calculating the root mean square errors (RMSE) for eight benchmark functions.

Early work on ensembles of metamodels can be found in Goel et al. (2007), wherein a weighted average of the metamodels was adopted. Acar (2010) studied various approaches for constructing affine combinations of metamodels using local measures. Xiao, Yi, and Xu (2011) established affine combinations of metamodels with a recursive arithmetic average. Lee and Choi (2014) proposed a new pointwise affine combination of metamodels by using a nearest points cross-validation approach. Shi et al. (2016) proposed efficient affine combinations of radial basis function networks. Most recently, Song et al. (2018) suggested an advanced and robust affine combination of metamodels by using extended adaptive hybrid functions. Two years earlier, Ferreira and Serpa (2016) discussed different variants of least squares approaches, where a formulation including the constraints \(w_i \geq 0\) was mentioned briefly.

The contents of this article are as follows: in the next section, optimal linear, affine and convex combinations are formulated by using the norm of the residual vector of the leave-one-out cross-validation errors. For linear and affine combinations, only the Euclidean norm is considered, but for convex combinations the taxicab and infinity norms are also utilized in order to obtain two linear programming (LP) problems in addition to the quadratic programming (QP) one. In addition, in Appendix 1, a symbolic proof is given showing that the established optimal affine formulation presented in Viana, Haftka, and Steffen (2009) is equivalent to minimizing the predicted residual sum of squares (PRESS) for affine combinations. In Section 3, the ensemble of metamodels, containing different settings of quadratic regression, Kriging, radial basis function networks (RBFN), polynomial chaos expansion (PCE), support vector regression (SVR) and least squares SVR, is presented. In Section 4, the optimal linear, affine and convex combinations of metamodels are compared for eight benchmark functions, by training the metamodels for one particular sampling and then validating the RMSE for another sampling. In addition, a well-known design optimization problem is solved using affine and convex combinations of metamodels. Finally, some concluding remarks are given.

2. Optimal combinations of metamodels

In this work, a new metamodel \(y_{\mathrm{en}} = y_{\mathrm{en}}(\mathbf{x})\) is defined as a linear combination of an ensemble of metamodels, i.e.

(1) \( y_{\mathrm{en}} = y_{\mathrm{en}}(\mathbf{x}) = \sum_{i=1}^{M} w_i\, y_i(\mathbf{x}), \)

where M is the total number of metamodels in the ensemble, \(w_i\) are weights and \(y_i = y_i(\mathbf{x})\) represents a particular metamodel. In the next section, the basic equations of the 10 metamodels \(y_i\) as implemented in this article are presented.

The leave-one-out cross-validation error at a point \(\hat{\mathbf{x}}_k\) of \(y_{\mathrm{en}}\) is given by

(2) \( e_k = e(\hat{\mathbf{x}}_k) = \hat{f}_k - y_{\mathrm{en}}^{(k)}(\hat{\mathbf{x}}_k) = \hat{f}_k - \sum_{i=1}^{M} w_i\, y_i^{(k)}(\hat{\mathbf{x}}_k), \)

where \(y_i^{(k)}(\mathbf{x})\) represents the metamodel with the kth data point excluded from the sampling set \(\{\hat{\mathbf{x}}_i, \hat{f}_i\}\). If the leave-one-out cross-validation in (2) is now executed for every data point, then the vector of PRESS residuals can be established as

(3) \( \mathbf{e} = \{e_i\} = \hat{\mathbf{f}} - \mathbf{Y}\mathbf{w}, \)

where \(\hat{\mathbf{f}}\) contains the \(\hat{f}_i\), \(\mathbf{w}\) is a vector of the weights \(w_i\) and

(4) \( [\mathbf{Y}]_{ij} = y_j^{(i)}(\hat{\mathbf{x}}_i). \)

In order to find the best linear combination of metamodels \(y_{\mathrm{en}}\), the PRESS can be minimized, i.e.

(5) (LS) \( \min_{\mathbf{w}}\; \|\mathbf{e}\|_2^2 = \mathbf{e}^T\mathbf{e}. \)

The solution to this problem is of course given by the well-known normal equation. Thus, the optimal weights are given by

(6) \( \mathbf{w} = (\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\hat{\mathbf{f}}. \)

The affine constraint \(\mathbf{w}^T\mathbf{1} = 1\) can also be included in (5), i.e.

(7) (LS-A) \( \min_{\mathbf{w}}\; \|\mathbf{e}\|_2^2 \quad \text{s.t. } \mathbf{w}^T\mathbf{1} = 1. \)

The solution to this problem is given by

(8) \( \mathbf{w} = (\mathbf{Y}^T\mathbf{Y})^{-1}\bigl(\mathbf{Y}^T\hat{\mathbf{f}} - \lambda\mathbf{1}\bigr), \)

where

(9) \( \lambda = \frac{\mathbf{1}^T(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\hat{\mathbf{f}} - 1}{\mathbf{1}^T(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{1}}. \)

Here, and in the following, \(\mathbf{1}\) represents a column vector of ones of proper size.
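For illustration, a minimal MATLAB sketch of the LS and LS-A weights in (6) and (8)-(9) is given below. It is not the author's toolbox code; the names Y (the N-by-M matrix of leave-one-out predictions), fhat (the N-by-1 vector of sampled responses) and ls_and_lsa_weights are illustrative assumptions.

    % Minimal sketch: optimal weights for the linear (LS) and affine (LS-A)
    % combinations from Equations (6) and (8)-(9).
    % Y    - N-by-M matrix of leave-one-out predictions, [Y]_ij = y_j^(i)(xhat_i)
    % fhat - N-by-1 vector of sampled responses
    function [wLS, wLSA] = ls_and_lsa_weights(Y, fhat)
        M   = size(Y, 2);
        one = ones(M, 1);
        G   = Y' * Y;                                     % Gram matrix Y'Y
        wLS = G \ (Y' * fhat);                            % Equation (6)
        lambda = (one' * wLS - 1) / (one' * (G \ one));   % Equation (9)
        wLSA   = G \ (Y' * fhat - lambda * one);          % Equation (8)
    end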

It is possible to prove that (8) and (9) taken together are equivalent to

(10) \( \mathbf{w} = \frac{\mathbf{C}^{-1}\mathbf{1}}{\mathbf{1}^T\mathbf{C}^{-1}\mathbf{1}}, \)

where

(11) \( [\mathbf{C}]_{ij} = \mathbf{e}_i^T\mathbf{e}_j \)

and

(12) \( \{\mathbf{e}_i\}_k = \hat{f}_k - y_i^{(k)}(\hat{\mathbf{x}}_k) \)

is the leave-one-out cross-validation error for a particular metamodel \(y_i(\mathbf{x})\). A symbolic proof of this statement is given in Appendix 1. Equation (10) is obtained from the normal equation of

(13) \( \min_{\mathbf{w}}\; \mathbf{w}^T\mathbf{C}\mathbf{w} \quad \text{s.t. } \mathbf{w}^T\mathbf{1} = 1, \)

which was suggested by Viana, Haftka, and Steffen (2009). Notice, of course, that the statement of equivalence does not hold for a linear combination: when \(\mathbf{C}\) is positive definite, the optimal solution to

(14) \( \min_{\mathbf{w}}\; \mathbf{w}^T\mathbf{C}\mathbf{w} \)

is the zero vector, which in general differs from (6).
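The equivalence can also be checked numerically. The sketch below forms the matrix C in (11) from the per-metamodel PRESS residuals and evaluates (10); under the same assumptions on Y and fhat as in the previous sketch, the result should agree with the LS-A weights from (8)-(9) up to round-off.

    % Minimal numerical check of the equivalence between (8)-(9) and (10).
    M = size(Y, 2);
    E = repmat(fhat, 1, M) - Y;               % column i holds the PRESS residuals e_i
    C = E' * E;                               % Equation (11)
    Cinv1  = C \ ones(M, 1);
    wViana = Cinv1 / (ones(1, M) * Cinv1);    % Equation (10)
    [~, wLSA] = ls_and_lsa_weights(Y, fhat);  % helper from the previous sketch
    disp(norm(wViana - wLSA))                 % should be close to zero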

In addition, as suggested already by Breiman (1996) and discussed by Viana, Haftka, and Steffen (2009), a natural constraint to include in (7) is \(w_i \geq 0\), i.e.

(15) (LS-C) \( \min_{\mathbf{w}}\; \|\mathbf{e}\|_2^2 \quad \text{s.t. } \mathbf{w}^T\mathbf{1} = 1,\; w_i \geq 0,\; i = 1, \ldots, M, \)

which, of course, is a QP-problem. Furthermore, by adding this constraint, (1) becomes a convex combination.
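A minimal sketch of LS-C using quadprog is given below; the Optimization Toolbox and the illustrative variables Y and fhat from the earlier sketches are assumed, and the problem is written in the standard form (1/2) w'Hw + f'w with H = Y'Y and f = -Y'fhat.

    % Minimal sketch: LS-C in Equation (15) as a QP in standard form.
    M   = size(Y, 2);
    H   = Y' * Y;                      % quadratic term (scaling does not change the minimizer)
    f   = -Y' * fhat;                  % linear term
    Aeq = ones(1, M);  beq = 1;        % affine constraint w'1 = 1
    lb  = zeros(M, 1);                 % convexity constraints w_i >= 0
    wLSC = quadprog(H, f, [], [], Aeq, beq, lb, []);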

Instead of using the Euclidean norm \(\|\mathbf{e}\|_2\) in (15), the taxicab norm \(\|\mathbf{e}\|_1\) or the infinity norm \(\|\mathbf{e}\|_\infty\) can be adopted according to

(16) \( \|\mathbf{e}\|_1 = \sum_{i=1}^{N} |e_i|, \qquad \|\mathbf{e}\|_\infty = \max\bigl(|e_1|, \ldots, |e_N|\bigr), \)

where N is the number of sampling points.

The problem in (15) with the Euclidean norm switched to the taxicab norm corresponds to the following LP-problem:

(17) (LST-C) \( \min_{(\mathbf{w},\mathbf{p},\mathbf{q})}\; \sum_{i=1}^{N} (p_i + q_i) \quad \text{s.t. } \mathbf{Y}\mathbf{w} - \hat{\mathbf{f}} = \mathbf{p} - \mathbf{q},\; \mathbf{w}^T\mathbf{1} = 1,\; w_i, p_j, q_j \geq 0,\; i = 1, \ldots, M,\; j = 1, \ldots, N. \)

Finally, using the infinity norm, (15) can be rewritten as the following LP-problem:

(18) (LSINF-C) \( \min_{(\mathbf{w},t)}\; t \quad \text{s.t. } \mathbf{Y}\mathbf{w} - \hat{\mathbf{f}} \leq t\mathbf{1},\; -\mathbf{Y}\mathbf{w} + \hat{\mathbf{f}} \leq t\mathbf{1},\; \mathbf{w}^T\mathbf{1} = 1,\; w_i, t \geq 0,\; i = 1, \ldots, M. \)

In summary, in order to establish optimal ensembles of metamodels, the following optimization problems are considered: LS in (5), LS-A in (7), LS-C in (15), LST-C in (17) and, finally, LSINF-C in (18). In Section 4, these five formulations are used to establish optimal linear, affine and convex combinations of metamodels for eight benchmark functions, which then are validated by calculating the RMSE.
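Both LP formulations can be passed more or less directly to linprog. The sketch below again assumes the Optimization Toolbox and the illustrative variables Y and fhat; the decision variables are stacked as [w; p; q] for (17) and as [w; t] for (18).

    % Minimal sketch: LST-C in Equation (17) with variables z = [w; p; q].
    [N, M] = size(Y);
    c   = [zeros(M, 1); ones(2 * N, 1)];                 % minimize sum(p) + sum(q)
    Aeq = [Y, -eye(N), eye(N);                           % Y*w - p + q = fhat
           ones(1, M), zeros(1, 2 * N)];                 % w'1 = 1
    beq = [fhat; 1];
    lb  = zeros(M + 2 * N, 1);                           % w, p, q >= 0
    z   = linprog(c, [], [], Aeq, beq, lb, []);
    wLSTC = z(1:M);

    % Minimal sketch: LSINF-C in Equation (18) with variables z = [w; t].
    c   = [zeros(M, 1); 1];                              % minimize t
    A   = [ Y, -ones(N, 1);                              %  Y*w - t*1 <= fhat
           -Y, -ones(N, 1)];                             % -Y*w - t*1 <= -fhat
    b   = [fhat; -fhat];
    Aeq = [ones(1, M), 0];  beq = 1;                     % w'1 = 1
    lb  = zeros(M + 1, 1);                               % w, t >= 0
    z   = linprog(c, A, b, Aeq, beq, lb, []);
    wLSINFC = z(1:M);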

3. Metamodels

A set of sampling data \(\{\hat{\mathbf{x}}_i, \hat{f}_i\}\), obtained from design of experiments as mentioned in the previous section, can be represented with an appropriate function, which is called a response surface, a surrogate model or a metamodel. One choice of such a function is the regression model given by

(19) \( f = f(\mathbf{x}) = \boldsymbol{\xi}(\mathbf{x})^T\boldsymbol{\beta}, \)

where \(\boldsymbol{\xi} = \boldsymbol{\xi}(\mathbf{x})\) is a vector of polynomials of \(\mathbf{x}\), and \(\boldsymbol{\beta}\) contains regression coefficients. By minimizing the sum of squared errors, i.e.

(20) \( \min_{\boldsymbol{\beta}}\; \sum_{i=1}^{N} \bigl( X_{ij}\beta_j - \hat{f}_i \bigr)^2, \)

where \(X_{ij} = \xi_j(\hat{\mathbf{x}}_i)\) and N is the number of sampling points, the optimal regression coefficients are obtained from the normal equation reading

(21) \( \boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\hat{\mathbf{f}}. \)

A quadratic regression model, denoted model Q in the next section, is used as one of the 10 metamodels in the ensemble.
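As a small illustration, a sketch of model Q for two design variables is given below; xhat (the N-by-2 matrix of sample points) and fhat are assumed as before, and the basis ordering in quadBasis is an arbitrary choice for this sketch.

    % Minimal sketch: quadratic regression metamodel (model Q) via Equation (21).
    quadBasis = @(x) [1, x(1), x(2), x(1)^2, x(1)*x(2), x(2)^2];   % xi(x)'
    N = size(xhat, 1);
    X = zeros(N, 6);
    for i = 1:N
        X(i, :) = quadBasis(xhat(i, :));        % X_ij = xi_j(xhat_i)
    end
    beta = (X' * X) \ (X' * fhat);              % Equation (21)
    yQ   = @(x) quadBasis(x) * beta;            % prediction at a row vector x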

Examples of other useful metamodels, which are included in the ensemble, are Kriging, RBFN, PCE, SVR and least squares SVR. The basic equations of these models, as implemented in the present author's in-house toolbox (see Note 1), are presented below.

3.1. Kriging

The Kriging models are given by

(22) \( f(\mathbf{x}) = \boldsymbol{\xi}(\mathbf{x})^T\boldsymbol{\beta} + \mathbf{r}(\mathbf{x})^T\mathbf{R}(\boldsymbol{\theta})^{-1}\bigl(\hat{\mathbf{f}} - \mathbf{X}\boldsymbol{\beta}\bigr), \)

where the first term represents the global behaviour by a linear (model Kr-L) or quadratic (model Kr-Q) regression model, the second term ensures that the sample data are fitted exactly, and \(\mathbf{r}(\mathbf{x})\) is the vector of correlations between \(\mathbf{x}\) and the sampling points. \(\mathbf{R} = \mathbf{R}(\boldsymbol{\theta}) = [R_{ij}]\), where

(23) \( R_{ij} = R_{ij}(\boldsymbol{\theta}, \hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j) = \exp\Bigl( -\sum_{k=1}^{N_{\mathrm{VAR}}} \theta_k \bigl( \hat{x}_k^i - \hat{x}_k^j \bigr)^2 \Bigr), \)

with \(N_{\mathrm{VAR}}\) being the number of design variables. Furthermore, \(\boldsymbol{\theta}\) is obtained by maximizing the likelihood function

(24) \( \frac{1}{\sigma^N \sqrt{\det(\mathbf{R})(2\pi)^N}} \exp\Bigl( -\frac{(\mathbf{X}\boldsymbol{\beta} - \hat{\mathbf{f}})^T\mathbf{R}^{-1}(\mathbf{X}\boldsymbol{\beta} - \hat{\mathbf{f}})}{2\sigma^2} \Bigr) \)

using a genetic algorithm and establishing

(25) \( \boldsymbol{\beta} = \bigl( \mathbf{X}^T\mathbf{R}(\boldsymbol{\theta})^{-1}\mathbf{X} \bigr)^{-1}\mathbf{X}^T\mathbf{R}(\boldsymbol{\theta})^{-1}\hat{\mathbf{f}} \)

from the optimality conditions.
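A minimal sketch of the Gaussian correlation matrix (23) and the generalized least-squares estimate (25) for given hyperparameters theta is shown below; the function name kriging_parts is an illustrative assumption, and maximizing the likelihood (24) over theta, e.g. with a genetic algorithm, is omitted for brevity.

    % Minimal sketch: Kriging correlation matrix (23) and beta from (25)
    % for given theta. xhat is N-by-n, X is the N-by-p regression matrix.
    function [R, beta] = kriging_parts(xhat, X, fhat, theta)
        N = size(xhat, 1);
        R = zeros(N, N);
        for i = 1:N
            for j = 1:N
                d = xhat(i, :) - xhat(j, :);
                R(i, j) = exp(-sum(theta(:)' .* d.^2));   % Equation (23)
            end
        end
        beta = (X' * (R \ X)) \ (X' * (R \ fhat));        % Equation (25)
    end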

3.2. Radial basis function networks

For a particular input \(\hat{\mathbf{x}}_k\), the outcome of the radial basis function network can be written as

(26) \( f_k = f(\hat{\mathbf{x}}_k) = \sum_{i=1}^{N_\Phi} A_{ki}\alpha_i + \sum_{i=1}^{N_\beta} B_{ki}\beta_i, \)

where \(\alpha_i\) and \(\beta_i\) are constants defined by (29) and (30), or (31) presented below, and

(27) \( A_{ki} = \Phi_i(\hat{\mathbf{x}}_k) \quad \text{and} \quad B_{ki} = \xi_i(\hat{\mathbf{x}}_k). \)

Here, \(\Phi_i = \Phi_i(\hat{\mathbf{x}}_k)\) represents the radial basis function and the second term in (26) is a linear or quadratic bias. Furthermore, for a set of signals, the corresponding outgoing responses \(\hat{\mathbf{f}} = \{\hat{f}_i\}\) of the network can be formulated compactly as

(28) \( \hat{\mathbf{f}} = \mathbf{A}\boldsymbol{\alpha} + \mathbf{B}\boldsymbol{\beta}, \)

where \(\boldsymbol{\alpha} = \{\alpha_i\}\), \(\boldsymbol{\beta} = \{\beta_i\}\), \(\mathbf{A} = [A_{ij}]\) and \(\mathbf{B} = [B_{ij}]\). If \(\boldsymbol{\beta}\) is given a priori by the normal equation in (21) as

(29) \( \boldsymbol{\beta} = (\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\hat{\mathbf{f}}, \)

then

(30) \( \boldsymbol{\alpha} = \mathbf{A}^{-1}\bigl(\hat{\mathbf{f}} - \mathbf{B}\boldsymbol{\beta}\bigr). \)

Two settings of this model are utilized in the ensemble depending on the choice of bias, called model Rpri-L and model Rpri-Q, respectively.

Otherwise, when the bias is unknown, \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) are established by solving

(31) \( \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{Bmatrix} = \begin{Bmatrix} \hat{\mathbf{f}} \\ \mathbf{0} \end{Bmatrix}. \)

Two settings of this model are also utilized in the ensemble, which are called model Rpost-L and model Rpost-Q.
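A minimal sketch of model Rpost-L, i.e. an RBFN with a posteriori linear bias, is given below. The Gaussian basis, its shape parameter 'shape' and the function name rbfn_post_linear are illustrative assumptions and need not match the toolbox settings.

    % Minimal sketch: RBFN with a posteriori linear bias, Equations (27)-(31).
    % xhat is N-by-n, fhat is N-by-1; shape is an assumed Gaussian shape parameter.
    function yfun = rbfn_post_linear(xhat, fhat, shape)
        N   = size(xhat, 1);
        phi = @(r) exp(-(shape * r).^2);                  % assumed Gaussian basis
        D   = zeros(N, N);
        for i = 1:N
            for j = 1:N
                D(i, j) = norm(xhat(i, :) - xhat(j, :));  % pairwise distances
            end
        end
        A  = phi(D);                                      % A_ki = Phi_i(xhat_k)
        B  = [ones(N, 1), xhat];                          % linear bias columns
        nb = size(B, 2);
        sol   = [A, B; B', zeros(nb)] \ [fhat; zeros(nb, 1)];   % Equation (31)
        alpha = sol(1:N);
        beta  = sol(N+1:end);
        yfun  = @(x) phi(sqrt(sum((xhat - repmat(x, N, 1)).^2, 2)))' * alpha ...
                   + [1, x] * beta;                       % prediction at a row vector x
    end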

3.3. Polynomial chaos expansion

Polynomial chaos expansion (model PCE) using Hermite polynomials \(\phi_n = \phi_n(x)\) can be written as

(32) \( f(\mathbf{x}) = \sum_{i=0}^{M_t} c_i \prod_{j=1}^{N_{\mathrm{VAR}}} \phi_i(x_j), \)

where \(M_t + 1\) is the number of terms and constant coefficients \(c_i\), and \(N_{\mathrm{VAR}}\) is the number of variables \(x_i\). The Hermite polynomials are defined by

(33) \( \phi_n = \phi_n(x) = (-1)^n \exp\Bigl(\frac{x^2}{2}\Bigr) \frac{d^n}{dx^n} \exp\Bigl(-\frac{x^2}{2}\Bigr). \)

For instance, one has

(34) \( \phi_0 = 1,\; \phi_1 = x,\; \phi_2 = x^2 - 1,\; \phi_3 = x^3 - 3x,\; \phi_4 = x^4 - 6x^2 + 3,\; \phi_5 = x^5 - 10x^3 + 15x,\; \phi_6 = x^6 - 15x^4 + 45x^2 - 15,\; \phi_7 = x^7 - 21x^5 + 105x^3 - 105x. \)

The unknown constants \(c_i\) are then established by using the normal equation in (21). A nice feature of the polynomial chaos expansion is that the mean of \(f(\mathbf{X})\) in (32) for uncorrelated standard normal distributed variables \(X_i\) is simply given by

(35) \( \mathrm{E}[f(\mathbf{X})] = c_0. \)
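The probabilists' Hermite polynomials in (34) satisfy the recurrence \( \phi_{n+1}(x) = x\,\phi_n(x) - n\,\phi_{n-1}(x) \), which gives a convenient way of evaluating them; a minimal sketch (with the illustrative name hermite_eval) is given below.

    % Minimal sketch: evaluate the probabilists' Hermite polynomials
    % phi_0, ..., phi_nmax of Equation (34) at a point x via the recurrence
    % phi_{n+1}(x) = x*phi_n(x) - n*phi_{n-1}(x).
    function vals = hermite_eval(x, nmax)
        vals    = zeros(1, nmax + 1);
        vals(1) = 1;                         % phi_0
        if nmax >= 1
            vals(2) = x;                     % phi_1
        end
        for n = 1:nmax-1
            vals(n + 2) = x * vals(n + 1) - n * vals(n);   % phi_{n+1}
        end
    end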

3.4. Support vector regression

The soft nonlinear support vector regression model (model SVR) reads

(36) \( f(\mathbf{x}) = \sum_{i=1}^{N} \lambda_i\, k(\hat{\mathbf{x}}_i, \mathbf{x}) - \sum_{i=1}^{N} \hat{\lambda}_i\, k(\hat{\mathbf{x}}_i, \mathbf{x}) + b, \)

where \(k(\hat{\mathbf{x}}_i, \mathbf{x})\) is the kernel function, and \(\lambda_i\), \(\hat{\lambda}_i\) and b are established by solving

(37) \( \min_{(\boldsymbol{\lambda},\hat{\boldsymbol{\lambda}})}\; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} (\lambda_i - \hat{\lambda}_i)(\lambda_j - \hat{\lambda}_j)\, k(\hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j) + \sum_{i=1}^{N} (\hat{\lambda}_i - \lambda_i)\hat{f}_i + \delta \sum_{i=1}^{N} (\lambda_i + \hat{\lambda}_i) \quad \text{s.t. } \sum_{i=1}^{N} (\lambda_i - \hat{\lambda}_i) = 0,\; 0 \leq \lambda_i, \hat{\lambda}_i \leq C,\; i = 1, \ldots, N. \)

Finally, the corresponding least squares support vector regression model (model L-SVR) is established by solving

(38) \( \begin{bmatrix} 0 & \mathbf{1}^T & -\mathbf{1}^T \\ -\mathbf{1} & \mathbf{B} + \gamma\mathbf{I} & -\mathbf{B} \\ \mathbf{1} & -\mathbf{B} & \mathbf{B} + \gamma\mathbf{I} \end{bmatrix} \begin{Bmatrix} b \\ \boldsymbol{\lambda} \\ \hat{\boldsymbol{\lambda}} \end{Bmatrix} = \begin{Bmatrix} 0 \\ \hat{\mathbf{f}} - \delta\mathbf{1} \\ -\hat{\mathbf{f}} - \delta\mathbf{1} \end{Bmatrix}, \)

where \(\gamma = 1/C\) and

(39) \( \mathbf{B} = [B_{ij}], \quad B_{ij} = k(\hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j). \)

In summary, the ensemble consists of the following metamodels: Q, Kr-L, Kr-Q, Rpri-L, Rpri-Q, Rpost-L, Rpost-Q, PCE, SVR and L-SVR. In the next section, optimal linear, affine and convex combinations of this ensemble for eight benchmark functions are established.
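A minimal sketch of the SVR dual (37) with quadprog is shown below; the Gaussian kernel, its shape parameter and the recovery of the bias b from a free support vector are assumptions for illustration and need not match the toolbox implementation. The kernel matrix K below plays the role of B in (39).

    % Minimal sketch: SVR dual of Equation (37) with an assumed Gaussian kernel.
    % xhat is N-by-n, fhat is N-by-1; delta and C are the SVR parameters.
    function [lam, lamhat, b] = svr_dual(xhat, fhat, delta, C, shape)
        N = size(xhat, 1);
        K = zeros(N, N);
        for i = 1:N
            for j = 1:N
                K(i, j) = exp(-shape^2 * norm(xhat(i, :) - xhat(j, :))^2);  % kernel
            end
        end
        H   = [K, -K; -K, K];                              % quadratic term of (37)
        f   = [delta - fhat; delta + fhat];                % linear term of (37)
        Aeq = [ones(1, N), -ones(1, N)];  beq = 0;         % sum(lam - lamhat) = 0
        lb  = zeros(2 * N, 1);  ub = C * ones(2 * N, 1);   % box constraints
        z   = quadprog(H, f, [], [], Aeq, beq, lb, ub);
        lam    = z(1:N);
        lamhat = z(N+1:2*N);
        i = find(lam > 1e-6 & lam < C - 1e-6, 1);          % a free support vector
        if isempty(i)
            b = mean(fhat - K * (lam - lamhat));           % crude fallback (assumption)
        else
            b = fhat(i) - delta - K(i, :) * (lam - lamhat);
        end
    end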

4. Examples

In this section, optimal ensembles of the 10 metamodels presented in the previous section are established by using the five formulations presented in Section 2 (LS, LS-A, LS-C, LST-C and LSINF-C). This is done for the eight benchmark functions given in (40):

(40)
\( f_1 = \sin\bigl(4(x_1 - 1) - 2\bigr)\cos\bigl(4(x_2 - 1) - 2\bigr), \quad 1 \leq x_i \leq 2, \)
\( f_2 = 100(x_2 - x_1^2)^2 + (x_1 - 1)^2, \quad -2 \leq x_i \leq 2, \)
\( f_3 = 1000(4/x_1 - 2)^4 + 1000(4/x_2 - 2)^4, \quad 1 \leq x_i \leq 4, \)
\( f_4 = \mathrm{peaks}(x_1, x_2), \quad -2 \leq x_i \leq 2, \)
\( f_5 = 0.6 + \sin\bigl(\tfrac{16}{15}x_1 - 1\bigr) + \sin^2\bigl(\tfrac{16}{15}x_1 - 1\bigr) + \sin\bigl(\tfrac{16}{15}x_2 - 1\bigr) + \sin^2\bigl(\tfrac{16}{15}x_2 - 1\bigr), \quad -1 \leq x_i \leq 1, \)
\( f_6 = \cos(x_1)\sin(x_2) - x_1/(x_2^2 + 1), \quad -5 \leq x_i \leq 5, \)
\( f_7 = (x_1 + 10)^2 + (x_2 + 10)^2 - 190\exp(-0.1x_1^2 - 0.1x_2^2), \quad -10 \leq x_i \leq 10, \)
\( f_8 = \bigl(1 - 8x_1 + 7x_1^2 - \tfrac{7}{3}x_1^3 + \tfrac{1}{4}x_1^4\bigr)x_2^2\exp(-x_2), \quad 0 \leq x_i \leq 5. \)

Seven of the eight functions in (40) are well-known test examples, while f3 is a test function developed by Strömberg (2016, 2018a, 2018b, 2019) in order to evaluate algorithms for reliability-based design optimization. The fourth function f4 is the well-known peaks function implemented in MATLAB®; its complete analytical expression can be found in the MATLAB® documentation.

Now, the eight test functions in (40) are considered to be ‘black-boxes’, which are treated by setting up two different designs of experiments with 30 sampling points each: Halton and Hammersley sampling according to Figure 1 are adopted. Thus, in total, 80 optimal ensembles of metamodels are established (see Note 2).
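As an illustration, a minimal sketch for generating an N-point Halton design of experiments in two dimensions from the radical inverse in bases 2 and 3 is given below; the function names and the scaling to the variable bounds are assumptions. A Hammersley set can be obtained analogously by replacing the first coordinate with i/N.

    % Minimal sketch: N-point Halton sampling in 2D on [lb(1),ub(1)] x [lb(2),ub(2)].
    function xhat = halton2d(N, lb, ub)
        xhat  = zeros(N, 2);
        bases = [2, 3];
        for i = 1:N
            for d = 1:2
                xhat(i, d) = lb(d) + (ub(d) - lb(d)) * radical_inverse(i, bases(d));
            end
        end
    end

    function r = radical_inverse(i, b)
        r = 0;  f = 1 / b;
        while i > 0
            r = r + f * mod(i, b);   % next digit of i in base b
            i = floor(i / b);
            f = f / b;
        end
    end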

Figure 1. Left: Halton sampling. Right: Hammersley sampling.


The performance of each OEM is evaluated by calculating the RMSE for the other sampling, i.e. for an OEM established for the Halton sampling the RMSE is calculated for the Hammersley sampling and vice versa. The RMSE for the OEMs established for the Halton sampling are presented in Table 1, and the RMSE for the Hammersley sampling are given in Table 2. In general, the RMSE are lower for the convex combinations than for LS and LS-A. Typically, LS and LS-A have similar values. LS-C and LST-C also have similar values, which, in general, are slightly lower than those obtained using LSINF-C.
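A minimal sketch of this cross-sampling validation is given below; yOEM is assumed to be a function handle returning the ensemble prediction at a design point, and xval and fval are the validation sampling (e.g. the Hammersley points when the OEM was fitted on the Halton points) and the corresponding true responses.

    % Minimal sketch: RMSE of an OEM evaluated on the other sampling.
    Nval = size(xval, 1);
    pred = zeros(Nval, 1);
    for i = 1:Nval
        pred(i) = yOEM(xval(i, :));             % ensemble prediction
    end
    rmse = sqrt(mean((pred - fval).^2));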

Table 1. RMSE for the five OEMs and all test functions with Halton sampling.

Table 2. RMSE for the five OEMs and all test functions with Hammersley sampling.

By ranking the OEMs for each test case from 1 (worst) to 5 (best), one obtains the ranking presented in Figure 2. A clear trend is that LS-C and LST-C perform better than the other OEMs.

Figure 2. Ranking of the five formulations for establishing the OEMs.


Not only is a better performance obtained using LS-C or LST-C, but the behaviour of the OEM is also smoother than that obtained using a linear or affine combination. This difference appears clearly in Figures 3, 4 and 5, and can be explained by overfitting and by the number of metamodels included in the OEM. In general, all 10 metamodels appear in the OEMs of LS and LS-A. For the convex OEMs, mostly three to four metamodels are included. This result is summarized in Figure 6. Thus, automatic pruning of the metamodels is produced by using convex combinations.

Figure 3. The modified Brent test function f7 compared to the OEMs by using LS, LS-A and LS-C. Notice the overfitting for LS and LS-A. (a) f7 in (40): modified Brent, analytic. (b) f7 in (40): modified Brent, LS, Halton. (c) f7 in (40): modified Brent, LS-A, Halton. (d) f7 in (40): modified Brent, LS-C, Halton.


Figure 4. The test function f3 compared to the OEMs by using LS, LS-A and LS-C. Notice the strong overfitting in one of the corners for LS and LS-A. (a) f3 in (40): analytic. (b) f3 in (40): LS, Hammersley. (c) f3 in (40): LS-A, Hammersley. (d) f3 in (40): LS-C, Hammersley.


Figure 5. The Hosaki test function f8 compared to the OEMs by using LS, LS-A and LS-C. Notice the stronger overfitting along the boundaries for LS and LS-A compared to LS-C. (a) f8 in (40): Hosaki, analytic. (b) f8 in (40): Hosaki, LS, Halton. (c) f8 in (40): Hosaki, LS-A, Halton. (d) f8 in (40): Hosaki, LS-C, Halton.


Figure 6. The average number of metamodels in the OEMs for the five different formulations.


LS and LS-A also suffer from overfitting. Frequently, a large positive weight is cancelled out by a large negative value on another weight; see, for example, the optimal weights for the modified Brent test function in Table 3 and the plots in Figure 3. In this particular test example, the performance of the convex combinations is superior to that of the linear and affine combinations, as can be concluded by studying the plots in Figure 3.

Table 3. The optimal weights for the modified Brent test function f7. Notice the pruning obtained using LS-C, LST-C and LSINF-C, as well as the large positive and negative weights cancelling each other out for LS and LS-A.

Although automatic pruning is obtained using convex combinations, all metamodels appear occasionally in the convex OEMs; see Figure 7. From this figure one can also see that Kr-L and PCE are the real workhorses for the examples studied in this article, but Rpost-Q and SVR also appear more frequently than the others.

Figure 7. The frequency of metamodels appearing in the ensemble for LS-C, LST-C and LSINF-C.


Finally, a well-established design optimization benchmark is solved using optimal affine (LS-A) and convex (LST-C) combinations of metamodels. The welded beam problem considered by Garg (2014) and Amouzgar and Strömberg (2017) is studied. The governing equations of the problem read

(41)
\( \min_{x_i}\; f = 1.10471\, x_1^2 x_2 + 0.04811\, x_3 x_4 (14 + x_2) \)
s.t.
\( g_1 = \tau(x_i) - 13{,}600 \leq 0, \)
\( g_2 = \sigma(x_i) - 30{,}000 \leq 0, \)
\( g_3 = x_1 - x_4 \leq 0, \)
\( g_4 = 0.125 - x_1 \leq 0, \)
\( g_5 = \delta(x_i) - 0.25 \leq 0, \)
\( g_6 = 6{,}000 - P_c(x_i) \leq 0, \)

where the definitions of the shear stress \(\tau(x_i)\), normal stress \(\sigma(x_i)\), displacement \(\delta(x_i)\) and critical force \(P_c(x_i)\) can be found in Amouzgar and Strömberg (2017). In addition, the variables are bounded by \(0.1 \leq x_1, x_4 \leq 2\) and \(0.1 \leq x_2, x_3 \leq 10\). The optimal solution obtained by GA and SLP is (0.24437, 6.2175, 8.2915, 0.24437). An almost identical solution was presented by Garg (2014). In Amouzgar and Strömberg (2017), this benchmark was solved using screening with a total of 60 sampling points and radial basis function networks. The solution obtained in their work was (0.4147, 3.926, 6.6201, 0.4147), which violated the first constraint.

In this work, the problem is solved by setting up a Halton sampling of 60 points without any screening and establishing optimal affine (LS-A) and convex (LST-C) combinations of the following metamodels: Q, Kr-L, Rpri-Q, Rpost-Q, SVR and L-SVR. The optimal weights for LS-A, obtained by solving (10), are presented in Table 4, and the optimal weights for LST-C, obtained by solving (17), are presented in Table 5. The pruning effect of LST-C appears clearly when compared to the LS-A solution, where no pruning is obtained. The drawback of large positive weights being cancelled out by large negative weights using LS-A appears for f and g5 in Table 4.

Table 4. The optimal weights for the affine combination (LS-A) defined by (10). No pruning is observed.

Table 5. The optimal weights for the convex combination (LST-C) defined by (17). The pruning effect appears clearly.

In Table 6, the optimal solutions for all metamodels as well as for the affine and convex combinations of metamodels are presented. Judging by the objective value f alone, one obtains the following ranking: Rpri-Q, L-SVR, Rpost-Q, Kr-L, LS-A, Q, LST-C and SVR. It might therefore seem that the performance of LST-C and SVR is poor. However, by studying the corresponding constraint values in Table 7, one realizes that only the optimal solutions obtained using LST-C and SVR are feasible. All the other metamodels fail, and the affine combination LS-A also fails. Thus, the best performance is obtained using LST-C, which in turn is much better than that obtained using SVR.

Table 6. The optimal solution using LS-A and LST-C as well as the metamodels of the ensemble taken separately.

Table 7. The corresponding constraint values for the optimal solutions presented in Table 6. Negative values are feasible.

5. Concluding remarks

In this article, a general framework for generating optimal linear, affine and convex combinations of metamodels by minimizing the PRESS using the taxicab, Euclidean or infinity norm is presented and studied. Thus, the formulation proposed by Viana, Haftka, and Steffen (2009) is not the starting point of this article, but is instead derived symbolically as a special case for affine combinations using the Euclidean norm. It is concluded from the study that convex combinations are preferable over linear as well as the established affine combinations: the risk of overfitting is lower and automatic pruning of metamodels is obtained. In conclusion, it is suggested that the optimal weights of convex combinations of metamodels be established by minimizing the Euclidean or taxicab norm of the residual vector of the leave-one-out cross-validation errors, where the former formulation implies a QP-problem and the latter an LP-problem. For future work, it is suggested to study the performance of these two formulations for large problems, in order to establish whether the QP-problem is preferable over the LP-problem or vice versa.

Disclosure statement

No potential conflict of interest was reported by the author(s).

ORCID

Niclas Strömberg  http://orcid.org/0000-0001-6821-5727

Notes

2. The optimal weights for the OEMs for all test functions in (40) can be obtained from the author upon request by sending an e-mail to [email protected]

References

  • Acar, E. 2010. “Various Approaches for Constructing An Ensemble of Metamodels Using Local Measures.” Structural and Multidisciplinary Optimization 42: 879–896. doi:10.1007/s00158-010-0520-z
  • Acar, E., and M. Rais-Rohani. 2009. “Ensemble of Metamodels with Optimized Weight Factors.” Structural and Multidisciplinary Optimization 37: 279–294. doi:10.1007/s00158-008-0230-y
  • Amouzgar, K., A. Rashid, and N. Strömberg. 2013. “Multi-Objective Optimization of a Disc Brake System by Using SPEA2 and RBFN.” In Proceedings of the 39th Design Automation Conference. New York: ASME.
  • Amouzgar, K., and N. Strömberg. 2017. “Radial Basis Functions As Surrogate Models with a Priori Bias in Comparison with a Posteriori Bias.” Structural and Multidisciplinary Optimization 55: 1453–1469. doi:10.1007/s00158-016-1569-0
  • Breiman, L. 1996. “Stacked Regressions.” Machine Learning 24: 49–64. doi:10.1007/BF00117832
  • Ferreira, W. G., and A. L. Serpa. 2016. “Ensemble of Metamodels: The Augmented Least Squares Approach.” Structural and Multidisciplinary Optimization 53: 1019–1046. doi:10.1007/s00158-015-1366-1
  • Garg, H. 2014. “Solving Structural Engineering Design Optimization Problems Using An Artificial Bee Colony Algorithm.” Journal of Industrial and Management Optimization 10 (3): 777–794. doi:10.3934/jimo.2014.10.777
  • Goel, T., R. T. Haftka, W. Shyy, and N. V. Queipo. 2007. “Ensemble of Surrogates.” Structural and Multidisciplinary Optimization 33: 199–216. doi:10.1007/s00158-006-0051-9
  • Kleijnen, J. P. C. 2009. “Kriging Metamodeling in Simulation: A Review.” European Journal of Operational Research 192 (3): 707–716. doi:10.1016/j.ejor.2007.10.013
  • Lee, Y., and D.-H. Choi. 2014. “Pointwise Ensemble of Meta-Models Using ν Nearest Points Cross-Validation.” Structural and Multidisciplinary Optimization 50: 383–394. doi:10.1007/s00158-014-1067-1
  • Shi, R., L. Liu, T. Long, and J. Liu. 2016. “An Efficient Ensemble of Radial Basis Functions Method Based on Quadratic Programming.” Engineering Optimization 48: 1202–1225. doi:10.1080/0305215X.2015.1100470
  • Song, X., L. Lv, J. Li, W. Sun, and J. Zhang. 2018. “An Advanced and Robust Ensemble Surrogate Model: Extended Adaptive Hybrid Functions.” Journal of Mechanical Design 140 (4): Article ID 041402. doi:10.1115/1.4039128
  • Strömberg, N. 2016. “Reliability-Based Design Optimization by Using a SLP Approach and Radial Basis Function Networks.” In Proceedings of the 2016 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference (IDETC/CIE). New York: ASME.
  • Strömberg, N. 2018a. “Reliability-Based Design Optimization by Using Support Vector Machines.” In Proceedings of ESREL—the European Safety and Reliability Conference. London: CRC Press. doi:10.1201/9781351174664.
  • Strömberg, N. 2018b. “Reliability-Based Design Optimization by Using Metamodels.” In Proceedings of the 6th International Conference on Engineering Optimization (EngOpt 2018). Cham, Switzerland: Springer International Publishing. doi:10.1007/978-3-319-97773-7.
  • Strömberg, N. 2019. “Reliability-Based Design Optimization by Using Ensemble of Metamodels.” In Proceedings of the 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering (UNCECOMP 2019). Athens: UNCECOMP.
  • Strömberg, N., and M. Tapankov. 2012. “Sampling- and SORM-based RBDO of a Knuckle Component by Using Optimal Regression Models.” In Proceedings of the 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference. Reston, VA: AIAA. doi:10.2514/6.2012-5439.
  • Sudret, B. 2008. “Global Sensitivity Analysis Using Polynomial Chaos Expansions.” Reliability Engineering & System Safety 93 (7): 964–979. doi:10.1016/j.ress.2007.04.002
  • Viana, F. A. C., R. T. Haftka, and V. Steffen Jr. 2009. “Multiple Surrogates: How Cross-Validation Errors Can Help Us to Obtain the Best Predictor.” Structural and Multidisciplinary Optimization 39: 439–457. doi:10.1007/s00158-008-0338-0
  • Wang, G. G., and S. Shan. 2006. “Review of Metamodeling Techniques in Support of Engineering Design Optimization.” Journal of Mechanical Design 129 (4): 370–380. doi:10.1115/1.2429697
  • Wolpert, D. H. 1992. “Stacked Generalization.” Neural Networks 5: 241–259. doi:10.1016/S0893-6080(05)80023-1
  • Xiao, Jian Zhou, Zhong Ma Yi, and Fang Li Xu. 2011. “Ensembles of Surrogates with Recursive Arithmetic Average.” Structural and Multidisciplinary Optimization 44: 651–671. doi:10.1007/s00158-011-0655-6
  • Yun, Y., M. Yoon, and H. Nakayama. 2009. “Multi-Objective Optimization Based on Meta-modeling by Using Support Vector Regression.” Optimization and Engineering 10: 167–181. doi:10.1007/s11081-008-9063-1
  • Zhou, Z.-H. 2012. Ensemble Methods, Foundations and Algorithms. New York: Chapman and Hall/CRC. doi:10.1201/b12207

Appendix 1. Symbolic proof

In this appendix, MATLAB® code is given for a symbolic proof showing that the optimal weights for an affine combination obtained by minimizing the PRESS in (8) and (9) are identical to the optimal weights in (10) obtained by using the formulation of Viana, Haftka, and Steffen (2009).
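The original code is not reproduced here, but a minimal sketch of such a symbolic check for a small fixed size (N = 3 sampling points and M = 2 metamodels), using the Symbolic Math Toolbox, could read as follows; it verifies that the LS-A weights from (8)-(9) coincide with (10).

    % Minimal sketch (not the article's original code): symbolic check of the
    % equivalence between (8)-(9) and (10) for N = 3 and M = 2.
    syms y11 y12 y21 y22 y31 y32 fh1 fh2 fh3 real
    Y    = [y11 y12; y21 y22; y31 y32];
    fhat = [fh1; fh2; fh3];
    one  = ones(2, 1);

    G      = Y.' * Y;
    lambda = (one.' * (G \ (Y.' * fhat)) - 1) / (one.' * (G \ one));   % Equation (9)
    wLSA   = G \ (Y.' * fhat - lambda * one);                          % Equation (8)

    E      = [fhat - Y(:, 1), fhat - Y(:, 2)];   % per-metamodel PRESS residuals
    C      = E.' * E;                            % Equation (11)
    wViana = (C \ one) / (one.' * (C \ one));    % Equation (10)

    simplify(wLSA - wViana)                      % should simplify to [0; 0]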