Original Articles

Comparison of optimal linear, affine and convex combinations of metamodels

Pages 702-718 | Received 08 Jul 2019, Accepted 20 Mar 2020, Published online: 01 May 2020

Abstract

In this article, five different formulations for establishing optimal ensembles of metamodels are presented and compared. The comparison is done by minimizing different norms of the residual vector of the leave-one-out cross-validation errors for linear, affine and convex combinations of 10 metamodels. The norms are taken to be the taxicab, the Euclidean and the infinity norm, respectively. The ensemble of metamodels consists of quadratic regression, Kriging with linear or quadratic bias, radial basis function networks with a-priori linear or quadratic bias, radial basis function networks with a-posteriori linear or quadratic bias, polynomial chaos expansion, support vector regression and least squares support vector regression. Eight benchmark functions are studied as ‘black-boxes’ using Halton and Hammersley samplings. The optimal ensembles are established for one of the samplings and the corresponding root mean square errors are then evaluated using the other sampling, and vice versa. In total, 80 different test cases (5 formulations, 8 benchmarks and 2 samplings) are studied and presented. In addition, an established design optimization problem is solved using affine and convex combinations. It is concluded that minimization of the taxicab or Euclidean norm of the residual vector of the leave-one-out cross-validation errors for convex combinations of metamodels produces the best ensemble of metamodels.

1. Introduction

Metamodels, such as Kriging in Kleijnen (2009), radial basis function networks in Amouzgar and Strömberg (2017), polynomial chaos expansion in Sudret (2008), support vector machines in Strömberg (2018a) and support vector regression in Yun, Yoon, and Nakayama (2009), are today an established means of approximating computationally expensive models, such as nonlinear finite element models, in order to perform advanced studies in different disciplines of design optimization; see, for example, Strömberg and Tapankov (2012), Amouzgar, Rashid, and Strömberg (2013) and the review by Wang and Shan (2006). The best choice of metamodel depends strongly on the character of the response, the choice of design of experiments and the definition of 'best'. Searching for a single metamodel that is best in all situations is therefore like searching for the holy grail. A more fruitful search is for the best ensemble of metamodels for a particular problem, which is the topic of the current article. To be more precise, in this article the best linear, affine and convex combinations, respectively, of 10 metamodels are established, where 'best' is defined by the taxicab norm, the Euclidean norm or the infinity norm of the residual vector of the leave-one-out cross-validation errors.

During recent years, initiated by the work of Viana, Haftka, and Steffen (2009) and Acar and Rais-Rohani (2009), adopting affine combinations of metamodels has emerged as a standard approach for setting up optimal ensembles of metamodels for design optimization. However, the present author's experience with affine combinations of metamodels has often been poor. Overfitting and poor generalization of the optimal affine ensemble are frequently observed. Typically, a large positive weight is cancelled out by an almost equally large negative weight. In fact, this issue was already discussed in Viana, Haftka, and Steffen (2009), but was explained as probably depending on the approximation used in the objective function. It was also discussed in that early article whether to introduce a convexity constraint in order to overcome this pitfall. Therefore, one might be surprised that so many contributions promote the approach of using affine combinations of metamodels, especially since the pitfalls of affine combinations are well known in machine learning; see, for example, the textbook by Zhou (2012). In fact, this was already pointed out by Breiman (1996) when he investigated the ideas of stacked generalization proposed by Wolpert (1992). Breiman clearly states in his article that the proper constraints on the weights \(w_i\) of a combination of metamodels are \(\sum_i w_i = 1\) and \(w_i \geq 0\). In conclusion, the ensemble of metamodels should be a convex combination, not an affine combination. Indeed, the investigations done by Strömberg (2018b, 2019) support this statement. This is also demonstrated in the present article, where optimal linear, affine and convex combinations are established by minimizing the norm of the residual vector of the leave-one-out cross-validation errors, and the optimal ensembles of metamodels (OEMs) are validated by calculating the root mean square errors (RMSE) for eight benchmark functions.

Early work on ensembles of metamodels can be found in Goel et al. (2007), wherein a weighted average of the metamodels was adopted. Acar (2010) studied various approaches for constructing affine combinations of metamodels using local measures. Xiao, Yi, and Xu (2011) established affine combinations of metamodels with a recursive arithmetic average. Lee and Choi (2014) proposed a new pointwise affine combination of metamodels by using a nearest points cross-validation approach. Shi et al. (2016) proposed efficient affine combinations of radial basis function networks. Most recently, Song et al. (2018) suggested an advanced and robust affine combination of metamodels by using extended adaptive hybrid functions. Two years earlier, Ferreira and Serpa (2016) discussed different variants of least squares approaches, where a formulation including the constraints \(w_i \geq 0\) was mentioned briefly.

The contents of this article are as follows: in the next section, optimal linear, affine and convex combinations are formulated by using the norm of the residual vector of the leave-one-out cross-validation errors. For linear and affine combinations, only the Euclidean norm is considered, but for convex combinations the taxicab and infinity norms are also utilized in order to obtain two linear programming (LP) problems in addition to the quadratic programming (QP) one. In addition, in Appendix 1, a symbolic proof is given showing that the established optimal affine formulation presented in Viana, Haftka, and Steffen (2009) is equivalent to minimizing the predicted residual sum of squares (PRESS) for affine combinations. In Section 3, the ensemble of metamodels, containing different settings of quadratic regression, Kriging, radial basis function networks (RBFN), polynomial chaos expansion (PCE), support vector regression (SVR) and least squares SVR, is presented. In Section 4, the optimal linear, affine and convex combinations of metamodels are compared for eight benchmark functions, by training the metamodels for one particular sampling and then validating the RMSE for another sampling. In addition, a well-known design optimization problem is solved using affine and convex combinations of metamodels. Finally, some concluding remarks are given.

2. Optimal combinations of metamodels

In this work, a new metamodel \(y_{\mathrm{en}} = y_{\mathrm{en}}(\mathbf{x})\) is defined as a linear combination of an ensemble of metamodels, i.e.

(1) \( y_{\mathrm{en}} = y_{\mathrm{en}}(\mathbf{x}) = \sum_{i=1}^{M} w_i\, y_i(\mathbf{x}), \)

where M is the total number of metamodels in the ensemble, \(w_i\) are weights and \(y_i = y_i(\mathbf{x})\) represents a particular metamodel. In the next section, the basic equations of the 10 metamodels \(y_i\) as implemented in this article are presented.

The leave-one-out cross-validation error at a point \(\hat{\mathbf{x}}_k\) of \(y_{\mathrm{en}}\) is given by

(2) \( e_k = e(\hat{\mathbf{x}}_k) = \hat{f}_k - y_{\mathrm{en}}^{(k)}(\hat{\mathbf{x}}_k) = \hat{f}_k - \sum_{i=1}^{M} w_i\, y_i^{(k)}(\hat{\mathbf{x}}_k), \)

where \(y_i^{(k)}(\mathbf{x})\) represents the metamodel with the kth data point excluded from the sampling set \(\{\hat{\mathbf{x}}_i, \hat{f}_i\}\). If the leave-one-out cross-validation in (2) is now executed for every data point, then the vector of PRESS residuals can be established as

(3) \( \mathbf{e} = \{e_i\} = \hat{\mathbf{f}} - \mathbf{Y}\mathbf{w}, \)

where \(\hat{\mathbf{f}}\) contains the \(\hat{f}_i\), \(\mathbf{w}\) is a vector of the weights \(w_i\) and

(4) \( [\mathbf{Y}]_{ij} = y_j^{(i)}(\hat{\mathbf{x}}_i). \)

In order to find the best linear combination of metamodels \(y_{\mathrm{en}}\), the PRESS can be minimized, i.e.

(5) (LS) \( \min_{\mathbf{w}}\; \|\mathbf{e}\|_2^2 = \mathbf{e}^T\mathbf{e}. \)

The solution to this problem is of course given by the well-known normal equation. Thus, the optimal weights are given by

(6) \( \mathbf{w} = (\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\hat{\mathbf{f}}. \)

The affine constraint \(\mathbf{w}^T\mathbf{1} = 1\) can also be included in (5), i.e.

(7) (LS-A) \( \min_{\mathbf{w}}\; \|\mathbf{e}\|_2^2 \quad \text{s.t. } \mathbf{w}^T\mathbf{1} = 1. \)

The solution to this problem is given by

(8) \( \mathbf{w} = (\mathbf{Y}^T\mathbf{Y})^{-1}\bigl(\mathbf{Y}^T\hat{\mathbf{f}} - \lambda\mathbf{1}\bigr), \)

where

(9) \( \lambda = \frac{\mathbf{1}^T(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\hat{\mathbf{f}} - 1}{\mathbf{1}^T(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{1}}. \)

Here, and in the following, \(\mathbf{1}\) represents a column vector of ones of proper size.
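For illustration, a minimal MATLAB sketch of the LS and LS-A weights in (6) and (8)-(9) is given below. It is not the author's toolbox code; the names Y (the N-by-M matrix of leave-one-out predictions), fhat (the N-by-1 vector of sampled responses) and ls_and_lsa_weights are illustrative assumptions.

    % Minimal sketch: optimal weights for the linear (LS) and affine (LS-A)
    % combinations from Equations (6) and (8)-(9).
    % Y    - N-by-M matrix of leave-one-out predictions, [Y]_ij = y_j^(i)(xhat_i)
    % fhat - N-by-1 vector of sampled responses
    function [wLS, wLSA] = ls_and_lsa_weights(Y, fhat)
        M   = size(Y, 2);
        one = ones(M, 1);
        G   = Y' * Y;                                     % Gram matrix Y'Y
        wLS = G \ (Y' * fhat);                            % Equation (6)
        lambda = (one' * wLS - 1) / (one' * (G \ one));   % Equation (9)
        wLSA   = G \ (Y' * fhat - lambda * one);          % Equation (8)
    end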

It is possible to prove that (8) and (9) taken together are equivalent to

(10) \( \mathbf{w} = \frac{\mathbf{C}^{-1}\mathbf{1}}{\mathbf{1}^T\mathbf{C}^{-1}\mathbf{1}}, \)

where

(11) \( [\mathbf{C}]_{ij} = \mathbf{e}_i^T\mathbf{e}_j \)

and

(12) \( \{\mathbf{e}_i\}_k = \hat{f}_k - y_i^{(k)}(\hat{\mathbf{x}}_k) \)

is the leave-one-out cross-validation error for a particular metamodel \(y_i(\mathbf{x})\). A symbolic proof of this statement is given in Appendix 1. Equation (10) is obtained from the normal equation of

(13) \( \min_{\mathbf{w}}\; \mathbf{w}^T\mathbf{C}\mathbf{w} \quad \text{s.t. } \mathbf{w}^T\mathbf{1} = 1, \)

which was suggested by Viana, Haftka, and Steffen (2009). Notice, of course, that the statement of equivalence does not hold for a linear combination: when \(\mathbf{C}\) is positive definite, the optimal solution to

(14) \( \min_{\mathbf{w}}\; \mathbf{w}^T\mathbf{C}\mathbf{w} \)

is the zero vector, which in general differs from (6).
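The equivalence can also be checked numerically. The sketch below forms the matrix C in (11) from the per-metamodel PRESS residuals and evaluates (10); under the same assumptions on Y and fhat as in the previous sketch, the result should agree with the LS-A weights from (8)-(9) up to round-off.

    % Minimal numerical check of the equivalence between (8)-(9) and (10).
    M = size(Y, 2);
    E = repmat(fhat, 1, M) - Y;               % column i holds the PRESS residuals e_i
    C = E' * E;                               % Equation (11)
    Cinv1  = C \ ones(M, 1);
    wViana = Cinv1 / (ones(1, M) * Cinv1);    % Equation (10)
    [~, wLSA] = ls_and_lsa_weights(Y, fhat);  % helper from the previous sketch
    disp(norm(wViana - wLSA))                 % should be close to zero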

In addition, as suggested already by Breiman (1996) and discussed by Viana, Haftka, and Steffen (2009), a natural constraint to include in (7) is \(w_i \geq 0\), i.e.

(15) (LS-C) \( \min_{\mathbf{w}}\; \|\mathbf{e}\|_2^2 \quad \text{s.t. } \mathbf{w}^T\mathbf{1} = 1,\; w_i \geq 0,\; i = 1, \ldots, M, \)

which, of course, is a QP-problem. Furthermore, by adding this constraint, (1) becomes a convex combination.
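A minimal sketch of LS-C using quadprog is given below; the Optimization Toolbox and the illustrative variables Y and fhat from the earlier sketches are assumed, and the problem is written in the standard form (1/2) w'Hw + f'w with H = Y'Y and f = -Y'fhat.

    % Minimal sketch: LS-C in Equation (15) as a QP in standard form.
    M   = size(Y, 2);
    H   = Y' * Y;                      % quadratic term (scaling does not change the minimizer)
    f   = -Y' * fhat;                  % linear term
    Aeq = ones(1, M);  beq = 1;        % affine constraint w'1 = 1
    lb  = zeros(M, 1);                 % convexity constraints w_i >= 0
    wLSC = quadprog(H, f, [], [], Aeq, beq, lb, []);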

Instead of using the Euclidean norm \(\|\mathbf{e}\|_2\) in (15), the taxicab norm \(\|\mathbf{e}\|_1\) or the infinity norm \(\|\mathbf{e}\|_\infty\) can be adopted according to

(16) \( \|\mathbf{e}\|_1 = \sum_{i=1}^{N} |e_i|, \qquad \|\mathbf{e}\|_\infty = \max\bigl(|e_1|, \ldots, |e_N|\bigr), \)

where N is the number of sampling points.

The problem in (15) with the Euclidean norm switched to the taxicab norm corresponds to the following LP-problem:

(17) (LST-C) \( \min_{(\mathbf{w},\mathbf{p},\mathbf{q})}\; \sum_{i=1}^{N} (p_i + q_i) \quad \text{s.t. } \mathbf{Y}\mathbf{w} - \hat{\mathbf{f}} = \mathbf{p} - \mathbf{q},\; \mathbf{w}^T\mathbf{1} = 1,\; w_i, p_j, q_j \geq 0,\; i = 1, \ldots, M,\; j = 1, \ldots, N. \)

Finally, using the infinity norm, (15) can be rewritten as the following LP-problem:

(18) (LSINF-C) \( \min_{(\mathbf{w},t)}\; t \quad \text{s.t. } \mathbf{Y}\mathbf{w} - \hat{\mathbf{f}} \leq t\mathbf{1},\; -\mathbf{Y}\mathbf{w} + \hat{\mathbf{f}} \leq t\mathbf{1},\; \mathbf{w}^T\mathbf{1} = 1,\; w_i, t \geq 0,\; i = 1, \ldots, M. \)

In summary, in order to establish optimal ensembles of metamodels, the following optimization problems are considered: LS in (5), LS-A in (7), LS-C in (15), LST-C in (17) and, finally, LSINF-C in (18). In Section 4, these five formulations are used to establish optimal linear, affine and convex combinations of metamodels for eight benchmark functions, which then are validated by calculating the RMSE.
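Both LP formulations can be passed more or less directly to linprog. The sketch below again assumes the Optimization Toolbox and the illustrative variables Y and fhat; the decision variables are stacked as [w; p; q] for (17) and as [w; t] for (18).

    % Minimal sketch: LST-C in Equation (17) with variables z = [w; p; q].
    [N, M] = size(Y);
    c   = [zeros(M, 1); ones(2 * N, 1)];                 % minimize sum(p) + sum(q)
    Aeq = [Y, -eye(N), eye(N);                           % Y*w - p + q = fhat
           ones(1, M), zeros(1, 2 * N)];                 % w'1 = 1
    beq = [fhat; 1];
    lb  = zeros(M + 2 * N, 1);                           % w, p, q >= 0
    z   = linprog(c, [], [], Aeq, beq, lb, []);
    wLSTC = z(1:M);

    % Minimal sketch: LSINF-C in Equation (18) with variables z = [w; t].
    c   = [zeros(M, 1); 1];                              % minimize t
    A   = [ Y, -ones(N, 1);                              %  Y*w - t*1 <= fhat
           -Y, -ones(N, 1)];                             % -Y*w - t*1 <= -fhat
    b   = [fhat; -fhat];
    Aeq = [ones(1, M), 0];  beq = 1;                     % w'1 = 1
    lb  = zeros(M + 1, 1);                               % w, t >= 0
    z   = linprog(c, A, b, Aeq, beq, lb, []);
    wLSINFC = z(1:M);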

3. Metamodels

A set of sampling data \(\{\hat{\mathbf{x}}_i, \hat{f}_i\}\), obtained from design of experiments as mentioned in the previous section, can be represented with an appropriate function, which is called a response surface, a surrogate model or a metamodel. One choice of such a function is the regression model given by

(19) \( f = f(\mathbf{x}) = \boldsymbol{\xi}(\mathbf{x})^T\boldsymbol{\beta}, \)

where \(\boldsymbol{\xi} = \boldsymbol{\xi}(\mathbf{x})\) is a vector of polynomials of \(\mathbf{x}\), and \(\boldsymbol{\beta}\) contains regression coefficients. By minimizing the sum of squared errors, i.e.

(20) \( \min_{\boldsymbol{\beta}}\; \sum_{i=1}^{N} \bigl( X_{ij}\beta_j - \hat{f}_i \bigr)^2, \)

where \(X_{ij} = \xi_j(\hat{\mathbf{x}}_i)\) and N is the number of sampling points, the optimal regression coefficients are obtained from the normal equation reading

(21) \( \boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\hat{\mathbf{f}}. \)

A quadratic regression model, denoted model Q in the next section, is used as one of the 10 metamodels in the ensemble.
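As a small illustration, a sketch of model Q for two design variables is given below; xhat (the N-by-2 matrix of sample points) and fhat are assumed as before, and the basis ordering in quadBasis is an arbitrary choice for this sketch.

    % Minimal sketch: quadratic regression metamodel (model Q) via Equation (21).
    quadBasis = @(x) [1, x(1), x(2), x(1)^2, x(1)*x(2), x(2)^2];   % xi(x)'
    N = size(xhat, 1);
    X = zeros(N, 6);
    for i = 1:N
        X(i, :) = quadBasis(xhat(i, :));        % X_ij = xi_j(xhat_i)
    end
    beta = (X' * X) \ (X' * fhat);              % Equation (21)
    yQ   = @(x) quadBasis(x) * beta;            % prediction at a row vector x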

Examples of other useful metamodels, which are included in the ensemble, are Kriging, RBFN, PCE, SVR and least squares SVR. The basic equations of these models, as implemented in the present author's in-house toolbox (see Note 1), are presented below.

3.1. Kriging

The Kriging models are given by

(22) \( f(\mathbf{x}) = \boldsymbol{\xi}(\mathbf{x})^T\boldsymbol{\beta} + \mathbf{r}(\mathbf{x})^T\mathbf{R}(\boldsymbol{\theta})^{-1}\bigl(\hat{\mathbf{f}} - \mathbf{X}\boldsymbol{\beta}\bigr), \)

where the first term represents the global behaviour by a linear (model Kr-L) or quadratic (model Kr-Q) regression model, the second term ensures that the sample data are fitted exactly, and \(\mathbf{r}(\mathbf{x})\) is the vector of correlations between \(\mathbf{x}\) and the sampling points. \(\mathbf{R} = \mathbf{R}(\boldsymbol{\theta}) = [R_{ij}]\), where

(23) \( R_{ij} = R_{ij}(\boldsymbol{\theta}, \hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j) = \exp\Bigl( -\sum_{k=1}^{N_{\mathrm{VAR}}} \theta_k \bigl( \hat{x}_k^i - \hat{x}_k^j \bigr)^2 \Bigr), \)

with \(N_{\mathrm{VAR}}\) being the number of design variables. Furthermore, \(\boldsymbol{\theta}\) is obtained by maximizing the likelihood function

(24) \( \frac{1}{\sigma^N \sqrt{\det(\mathbf{R})(2\pi)^N}} \exp\Bigl( -\frac{(\mathbf{X}\boldsymbol{\beta} - \hat{\mathbf{f}})^T\mathbf{R}^{-1}(\mathbf{X}\boldsymbol{\beta} - \hat{\mathbf{f}})}{2\sigma^2} \Bigr) \)

using a genetic algorithm and establishing

(25) \( \boldsymbol{\beta} = \bigl( \mathbf{X}^T\mathbf{R}(\boldsymbol{\theta})^{-1}\mathbf{X} \bigr)^{-1}\mathbf{X}^T\mathbf{R}(\boldsymbol{\theta})^{-1}\hat{\mathbf{f}} \)

from the optimality conditions.
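A minimal sketch of the Gaussian correlation matrix (23) and the generalized least-squares estimate (25) for given hyperparameters theta is shown below; the function name kriging_parts is an illustrative assumption, and maximizing the likelihood (24) over theta, e.g. with a genetic algorithm, is omitted for brevity.

    % Minimal sketch: Kriging correlation matrix (23) and beta from (25)
    % for given theta. xhat is N-by-n, X is the N-by-p regression matrix.
    function [R, beta] = kriging_parts(xhat, X, fhat, theta)
        N = size(xhat, 1);
        R = zeros(N, N);
        for i = 1:N
            for j = 1:N
                d = xhat(i, :) - xhat(j, :);
                R(i, j) = exp(-sum(theta(:)' .* d.^2));   % Equation (23)
            end
        end
        beta = (X' * (R \ X)) \ (X' * (R \ fhat));        % Equation (25)
    end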

3.2. Radial basis function networks

For a particular input \(\hat{\mathbf{x}}_k\), the outcome of the radial basis function network can be written as

(26) \( f_k = f(\hat{\mathbf{x}}_k) = \sum_{i=1}^{N_\Phi} A_{ki}\alpha_i + \sum_{i=1}^{N_\beta} B_{ki}\beta_i, \)

where \(\alpha_i\) and \(\beta_i\) are constants defined by (29) and (30), or (31) presented below, and

(27) \( A_{ki} = \Phi_i(\hat{\mathbf{x}}_k) \quad \text{and} \quad B_{ki} = \xi_i(\hat{\mathbf{x}}_k). \)

Here, \(\Phi_i = \Phi_i(\hat{\mathbf{x}}_k)\) represents the radial basis function and the second term in (26) is a linear or quadratic bias. Furthermore, for a set of signals, the corresponding outgoing responses \(\hat{\mathbf{f}} = \{\hat{f}_i\}\) of the network can be formulated compactly as

(28) \( \hat{\mathbf{f}} = \mathbf{A}\boldsymbol{\alpha} + \mathbf{B}\boldsymbol{\beta}, \)

where \(\boldsymbol{\alpha} = \{\alpha_i\}\), \(\boldsymbol{\beta} = \{\beta_i\}\), \(\mathbf{A} = [A_{ij}]\) and \(\mathbf{B} = [B_{ij}]\). If \(\boldsymbol{\beta}\) is given a priori by the normal equation in (21) as

(29) \( \boldsymbol{\beta} = (\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\hat{\mathbf{f}}, \)

then

(30) \( \boldsymbol{\alpha} = \mathbf{A}^{-1}\bigl(\hat{\mathbf{f}} - \mathbf{B}\boldsymbol{\beta}\bigr). \)

Two settings of this model are utilized in the ensemble depending on the choice of bias, called model Rpri-L and model Rpri-Q, respectively.

Otherwise, when the bias is unknown, \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) are established by solving

(31) \( \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{Bmatrix} = \begin{Bmatrix} \hat{\mathbf{f}} \\ \mathbf{0} \end{Bmatrix}. \)

Two settings of this model are also utilized in the ensemble, which are called model Rpost-L and model Rpost-Q.
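A minimal sketch of model Rpost-L, i.e. an RBFN with a posteriori linear bias, is given below. The Gaussian basis, its shape parameter 'shape' and the function name rbfn_post_linear are illustrative assumptions and need not match the toolbox settings.

    % Minimal sketch: RBFN with a posteriori linear bias, Equations (27)-(31).
    % xhat is N-by-n, fhat is N-by-1; shape is an assumed Gaussian shape parameter.
    function yfun = rbfn_post_linear(xhat, fhat, shape)
        N   = size(xhat, 1);
        phi = @(r) exp(-(shape * r).^2);                  % assumed Gaussian basis
        D   = zeros(N, N);
        for i = 1:N
            for j = 1:N
                D(i, j) = norm(xhat(i, :) - xhat(j, :));  % pairwise distances
            end
        end
        A  = phi(D);                                      % A_ki = Phi_i(xhat_k)
        B  = [ones(N, 1), xhat];                          % linear bias columns
        nb = size(B, 2);
        sol   = [A, B; B', zeros(nb)] \ [fhat; zeros(nb, 1)];   % Equation (31)
        alpha = sol(1:N);
        beta  = sol(N+1:end);
        yfun  = @(x) phi(sqrt(sum((xhat - repmat(x, N, 1)).^2, 2)))' * alpha ...
                   + [1, x] * beta;                       % prediction at a row vector x
    end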

3.3. Polynomial chaos expansion

Polynomial chaos expansion (model PCE) using Hermite polynomials \(\phi_n = \phi_n(x)\) can be written as

(32) \( f(\mathbf{x}) = \sum_{i=0}^{M_t} c_i \prod_{j=1}^{N_{\mathrm{VAR}}} \phi_i(x_j), \)

where \(M_t + 1\) is the number of terms and constant coefficients \(c_i\), and \(N_{\mathrm{VAR}}\) is the number of variables \(x_i\). The Hermite polynomials are defined by

(33) \( \phi_n = \phi_n(x) = (-1)^n \exp\Bigl(\frac{x^2}{2}\Bigr) \frac{d^n}{dx^n} \exp\Bigl(-\frac{x^2}{2}\Bigr). \)

For instance, one has

(34) \( \phi_0 = 1,\; \phi_1 = x,\; \phi_2 = x^2 - 1,\; \phi_3 = x^3 - 3x,\; \phi_4 = x^4 - 6x^2 + 3,\; \phi_5 = x^5 - 10x^3 + 15x,\; \phi_6 = x^6 - 15x^4 + 45x^2 - 15,\; \phi_7 = x^7 - 21x^5 + 105x^3 - 105x. \)

The unknown constants \(c_i\) are then established by using the normal equation in (21). A nice feature of the polynomial chaos expansion is that the mean of \(f(\mathbf{X})\) in (32) for uncorrelated standard normal distributed variables \(X_i\) is simply given by

(35) \( \mathrm{E}[f(\mathbf{X})] = c_0. \)
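The probabilists' Hermite polynomials in (34) satisfy the recurrence \( \phi_{n+1}(x) = x\,\phi_n(x) - n\,\phi_{n-1}(x) \), which gives a convenient way of evaluating them; a minimal sketch (with the illustrative name hermite_eval) is given below.

    % Minimal sketch: evaluate the probabilists' Hermite polynomials
    % phi_0, ..., phi_nmax of Equation (34) at a point x via the recurrence
    % phi_{n+1}(x) = x*phi_n(x) - n*phi_{n-1}(x).
    function vals = hermite_eval(x, nmax)
        vals    = zeros(1, nmax + 1);
        vals(1) = 1;                         % phi_0
        if nmax >= 1
            vals(2) = x;                     % phi_1
        end
        for n = 1:nmax-1
            vals(n + 2) = x * vals(n + 1) - n * vals(n);   % phi_{n+1}
        end
    end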

3.4. Support vector regression

The soft nonlinear support vector regression model (model SVR) reads

(36) \( f(\mathbf{x}) = \sum_{i=1}^{N} \lambda_i\, k(\hat{\mathbf{x}}_i, \mathbf{x}) - \sum_{i=1}^{N} \hat{\lambda}_i\, k(\hat{\mathbf{x}}_i, \mathbf{x}) + b, \)

where \(k(\hat{\mathbf{x}}_i, \mathbf{x})\) is the kernel function, and \(\lambda_i\), \(\hat{\lambda}_i\) and b are established by solving

(37) \( \min_{(\boldsymbol{\lambda},\hat{\boldsymbol{\lambda}})}\; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} (\lambda_i - \hat{\lambda}_i)(\lambda_j - \hat{\lambda}_j)\, k(\hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j) + \sum_{i=1}^{N} (\hat{\lambda}_i - \lambda_i)\hat{f}_i + \delta \sum_{i=1}^{N} (\lambda_i + \hat{\lambda}_i) \quad \text{s.t. } \sum_{i=1}^{N} (\lambda_i - \hat{\lambda}_i) = 0,\; 0 \leq \lambda_i, \hat{\lambda}_i \leq C,\; i = 1, \ldots, N. \)

Finally, the corresponding least squares support vector regression model (model L-SVR) is established by solving

(38) \( \begin{bmatrix} 0 & \mathbf{1}^T & -\mathbf{1}^T \\ -\mathbf{1} & \mathbf{B} + \gamma\mathbf{I} & -\mathbf{B} \\ \mathbf{1} & -\mathbf{B} & \mathbf{B} + \gamma\mathbf{I} \end{bmatrix} \begin{Bmatrix} b \\ \boldsymbol{\lambda} \\ \hat{\boldsymbol{\lambda}} \end{Bmatrix} = \begin{Bmatrix} 0 \\ \hat{\mathbf{f}} - \delta\mathbf{1} \\ -\hat{\mathbf{f}} - \delta\mathbf{1} \end{Bmatrix}, \)

where \(\gamma = 1/C\) and

(39) \( \mathbf{B} = [B_{ij}], \quad B_{ij} = k(\hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j). \)

In summary, the ensemble consists of the following metamodels: Q, Kr-L, Kr-Q, Rpri-L, Rpri-Q, Rpost-L, Rpost-Q, PCE, SVR and L-SVR. In the next section, optimal linear, affine and convex combinations of this ensemble for eight benchmark functions are established.
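A minimal sketch of the SVR dual (37) with quadprog is shown below; the Gaussian kernel, its shape parameter and the recovery of the bias b from a free support vector are assumptions for illustration and need not match the toolbox implementation. The kernel matrix K below plays the role of B in (39).

    % Minimal sketch: SVR dual of Equation (37) with an assumed Gaussian kernel.
    % xhat is N-by-n, fhat is N-by-1; delta and C are the SVR parameters.
    function [lam, lamhat, b] = svr_dual(xhat, fhat, delta, C, shape)
        N = size(xhat, 1);
        K = zeros(N, N);
        for i = 1:N
            for j = 1:N
                K(i, j) = exp(-shape^2 * norm(xhat(i, :) - xhat(j, :))^2);  % kernel
            end
        end
        H   = [K, -K; -K, K];                              % quadratic term of (37)
        f   = [delta - fhat; delta + fhat];                % linear term of (37)
        Aeq = [ones(1, N), -ones(1, N)];  beq = 0;         % sum(lam - lamhat) = 0
        lb  = zeros(2 * N, 1);  ub = C * ones(2 * N, 1);   % box constraints
        z   = quadprog(H, f, [], [], Aeq, beq, lb, ub);
        lam    = z(1:N);
        lamhat = z(N+1:2*N);
        i = find(lam > 1e-6 & lam < C - 1e-6, 1);          % a free support vector
        if isempty(i)
            b = mean(fhat - K * (lam - lamhat));           % crude fallback (assumption)
        else
            b = fhat(i) - delta - K(i, :) * (lam - lamhat);
        end
    end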

4. Examples

In this section, optimal ensembles of the 10 metamodels presented in the previous section are established by using the five formulations presented in Section 2 (LS, LS-A, LS-C, LST-C and LSINF-C). This is done for the eight benchmark functions given in (40):

(40)
\( f_1 = \sin\bigl(4(x_1 - 1) - 2\bigr)\cos\bigl(4(x_2 - 1) - 2\bigr), \quad 1 \leq x_i \leq 2, \)
\( f_2 = 100(x_2 - x_1^2)^2 + (x_1 - 1)^2, \quad -2 \leq x_i \leq 2, \)
\( f_3 = 1000(4/x_1 - 2)^4 + 1000(4/x_2 - 2)^4, \quad 1 \leq x_i \leq 4, \)
\( f_4 = \mathrm{peaks}(x_1, x_2), \quad -2 \leq x_i \leq 2, \)
\( f_5 = 0.6 + \sin\bigl(\tfrac{16}{15}x_1 - 1\bigr) + \sin^2\bigl(\tfrac{16}{15}x_1 - 1\bigr) + \sin\bigl(\tfrac{16}{15}x_2 - 1\bigr) + \sin^2\bigl(\tfrac{16}{15}x_2 - 1\bigr), \quad -1 \leq x_i \leq 1, \)
\( f_6 = \cos(x_1)\sin(x_2) - x_1/(x_2^2 + 1), \quad -5 \leq x_i \leq 5, \)
\( f_7 = (x_1 + 10)^2 + (x_2 + 10)^2 - 190\exp(-0.1x_1^2 - 0.1x_2^2), \quad -10 \leq x_i \leq 10, \)
\( f_8 = \bigl(1 - 8x_1 + 7x_1^2 - \tfrac{7}{3}x_1^3 + \tfrac{1}{4}x_1^4\bigr)x_2^2\exp(-x_2), \quad 0 \leq x_i \leq 5. \)

Seven of the eight functions in (40) are well-known test examples, while f3 is a test function developed by Strömberg (2016, 2018a, 2018b, 2019) in order to evaluate algorithms for reliability-based design optimization. The fourth function f4 is the well-known peaks function implemented in MATLAB®; its complete analytical expression can be found in the MATLAB® documentation.

Now, the eight test functions in (40) are considered to be ‘black-boxes’, which are treated by setting up two different designs of experiments with 30 sampling points each: Halton and Hammersley sampling according to Figure 1 are adopted. Thus, in total, 80 optimal ensembles of metamodels are established (see Note 2).
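As an illustration, a minimal sketch for generating an N-point Halton design of experiments in two dimensions from the radical inverse in bases 2 and 3 is given below; the function names and the scaling to the variable bounds are assumptions. A Hammersley set can be obtained analogously by replacing the first coordinate with i/N.

    % Minimal sketch: N-point Halton sampling in 2D on [lb(1),ub(1)] x [lb(2),ub(2)].
    function xhat = halton2d(N, lb, ub)
        xhat  = zeros(N, 2);
        bases = [2, 3];
        for i = 1:N
            for d = 1:2
                xhat(i, d) = lb(d) + (ub(d) - lb(d)) * radical_inverse(i, bases(d));
            end
        end
    end

    function r = radical_inverse(i, b)
        r = 0;  f = 1 / b;
        while i > 0
            r = r + f * mod(i, b);   % next digit of i in base b
            i = floor(i / b);
            f = f / b;
        end
    end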

Figure 1. Left: Halton sampling. Right: Hammersley sampling.


The performance of each OEM is evaluated by calculating the RMSE for the other sampling, i.e. for an OEM established for the Halton sampling the RMSE is calculated for the Hammersley sampling and vice versa. The RMSE for the OEMs established for the Halton sampling are presented in Table 1, and the RMSE for the Hammersley sampling are given in Table 2. In general, the RMSE are lower for the convex combinations than for LS and LS-A. Typically, LS and LS-A have similar values. LS-C and LST-C also have similar values, which, in general, are slightly lower than those obtained using LSINF-C.
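A minimal sketch of this cross-sampling validation is given below; yOEM is assumed to be a function handle returning the ensemble prediction at a design point, and xval and fval are the validation sampling (e.g. the Hammersley points when the OEM was fitted on the Halton points) and the corresponding true responses.

    % Minimal sketch: RMSE of an OEM evaluated on the other sampling.
    Nval = size(xval, 1);
    pred = zeros(Nval, 1);
    for i = 1:Nval
        pred(i) = yOEM(xval(i, :));             % ensemble prediction
    end
    rmse = sqrt(mean((pred - fval).^2));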

Table 1. RMSE for the five OEMs and all test functions with Halton sampling.

Table 2. RMSE for the five OEMs and all test functions with Hammersley sampling.

By ranking the OEMs for each test case from 1 (worst) to 5 (best), one obtains the ranking presented in Figure 2. A clear trend is that LS-C and LST-C perform better than the other OEMs.

Figure 2. Ranking of the five formulations for establishing the OEMs.


Not only is a better performance obtained using LS-C or LST-C, but the behaviour of the OEM is also smoother than that obtained using a linear or affine combination. This difference appears clearly in Figures 3, 4 and 5, and can be explained by overfitting and by the number of metamodels included in the OEM. In general, all 10 metamodels appear in the OEMs of LS and LS-A. For the convex OEMs, mostly three to four metamodels are included. This result is summarized in Figure 6. Thus, automatic pruning of the metamodels is produced by using convex combinations.

Figure 3. The modified Brent test function f7 compared to the OEMs by using LS, LS-A and LS-C. Notice the overfitting for LS and LS-A. (a) f7 in (40): modified Brent, analytic. (b) f7 in (40): modified Brent, LS, Halton. (c) f7 in (40): modified Brent, LS-A, Halton. (d) f7 in (40): modified Brent, LS-C, Halton.


Figure 4. The test function f3 compared to the OEMs by using LS, LS-A and LS-C. Notice the strong overfitting in one of the corners for LS and LS-A. (a) f3 in (40): analytic. (b) f3 in (40): LS, Hammersley. (c) f3 in (40): LS-A, Hammersley. (d) f3 in (40): LS-C, Hammersley.


Figure 5. The Hosaki test function f8 compared to the OEMs by using LS, LS-A and LS-C. Notice the stronger overfitting along the boundaries for LS and LS-A compared to LS-C. (a) f8 in (40): Hosaki, analytic. (b) f8 in (40): Hosaki, LS, Halton. (c) f8 in (40): Hosaki, LS-A, Halton. (d) f8 in (40): Hosaki, LS-C, Halton.


Figure 6. The average number of metamodels in the OEMs for the five different formulations.


LS and LS-A also suffer from overfitting. Frequently, a large positive weight is cancelled out by a large negative value on another weight; see, for example, the optimal weights for the modified Brent test function in Table 3 and the plots in Figure 3. In this particular test example, the performance of the convex combinations is superior to that of the linear and affine combinations, as can be concluded by studying the plots in Figure 3.

Table 3. The optimal weights for the modified Brent test function f7. Notice the pruning obtained using LS-C, LST-C and LSINF-C, as well as the large positive and negative weights cancelling each other out for LS and LS-A.

Although automatic pruning is obtained using convex combinations, all metamodels appear occasionally in the convex OEMs; see Figure 7. From this figure one can also see that Kr-L and PCE are the real workhorses for the examples studied in this article, but Rpost-Q and SVR also appear more frequently than the others.

Figure 7. The frequency of metamodels appearing in the ensemble for LS-C, LST-C and LSINF-C.


Finally, a well-established design optimization benchmark is solved using optimal affine (LS-A) and convex (LST-C) combinations of metamodels. The welded beam problem considered by Garg (2014) and Amouzgar and Strömberg (2017) is studied. The governing equations of the problem read

(41)
\( \min_{x_i}\; f = 1.10471\, x_1^2 x_2 + 0.04811\, x_3 x_4 (14 + x_2) \)
s.t.
\( g_1 = \tau(x_i) - 13{,}600 \leq 0, \)
\( g_2 = \sigma(x_i) - 30{,}000 \leq 0, \)
\( g_3 = x_1 - x_4 \leq 0, \)
\( g_4 = 0.125 - x_1 \leq 0, \)
\( g_5 = \delta(x_i) - 0.25 \leq 0, \)
\( g_6 = 6{,}000 - P_c(x_i) \leq 0, \)

where the definitions of the shear stress \(\tau(x_i)\), normal stress \(\sigma(x_i)\), displacement \(\delta(x_i)\) and critical force \(P_c(x_i)\) can be found in Amouzgar and Strömberg (2017). In addition, the variables are bounded by \(0.1 \leq x_1, x_4 \leq 2\) and \(0.1 \leq x_2, x_3 \leq 10\). The optimal solution obtained by GA and SLP is (0.24437, 6.2175, 8.2915, 0.24437). An almost identical solution was presented by Garg (2014). In Amouzgar and Strömberg (2017), this benchmark was solved using screening with a total of 60 sampling points and radial basis function networks. The solution obtained in their work was (0.4147, 3.926, 6.6201, 0.4147), which violated the first constraint.

In this work, the problem is solved by setting up a Halton sampling of 60 points without any screening and establishing optimal affine (LS-A) and convex (LST-C) combinations of the following metamodels: Q, Kr-L, Rpri-Q, Rpost-Q, SVR and L-SVR. The optimal weights for LS-A, obtained by solving (10), are presented in Table 4, and the optimal weights for LST-C, obtained by solving (17), are presented in Table 5. The pruning effect of LST-C appears clearly when compared to the LS-A solution, where no pruning is obtained. The drawback of large positive weights being cancelled out by large negative weights using LS-A appears for f and g5 in Table 4.

Table 4. The optimal weights for the affine combination (LS-A) defined by (10). No pruning is observed.

Table 5. The optimal weights for the convex combination (LST-C) defined by (17). The pruning effect appears clearly.

In Table 6, the optimal solutions for all metamodels as well as for the affine and convex combinations of metamodels are presented. Judging by the objective value f alone, one obtains the following ranking: Rpri-Q, L-SVR, Rpost-Q, Kr-L, LS-A, Q, LST-C and SVR. It might therefore seem that the performance of LST-C and SVR is poor. However, by studying the corresponding constraint values in Table 7, one realizes that only the optimal solutions obtained using LST-C and SVR are feasible. All the other metamodels fail, and the affine combination LS-A also fails. Thus, the best performance is obtained using LST-C, which in turn is much better than that obtained using SVR.

Table 6. The optimal solution using LS-A and LST-C as well as the metamodels of the ensemble taken separately.

Table 7. The corresponding constraint values for the optimal solutions presented in Table 6. Negative values are feasible.

5. Concluding remarks

In this article, a general framework for generating optimal linear, affine and convex combinations of metamodels by minimizing the PRESS using the taxicab, Euclidean or infinity norm is presented and studied. Thus, the formulation proposed by Viana, Haftka, and Steffen (2009) is not the starting point of this article, but is instead derived symbolically as a special case for affine combinations using the Euclidean norm. It is concluded from the study that convex combinations are preferable over linear as well as the established affine combinations: the risk of overfitting is lower and automatic pruning of metamodels is obtained. In conclusion, it is suggested that the optimal weights of convex combinations of metamodels be established by minimizing the Euclidean or taxicab norm of the residual vector of the leave-one-out cross-validation errors, where the former formulation implies a QP-problem and the latter an LP-problem. For future work, it is suggested to study the performance of these two formulations for large problems, in order to establish whether the QP-problem is preferable over the LP-problem or vice versa.

Disclosure statement

No potential conflict of interest was reported by the author(s).

ORCID

Niclas Strömberg  http://orcid.org/0000-0001-6821-5727

Notes

2. The optimal weights for the OEMs for all test functions in (40) can be obtained from the author upon request by sending an e-mail to [email protected]

References

  • Acar, E. 2010. “Various Approaches for Constructing An Ensemble of Metamodels Using Local Measures.” Structural and Multidisciplinary Optimization 42: 879–896. doi:10.1007/s00158-010-0520-z
  • Acar, E., and M. Rais-Rohani. 2009. “Ensemble of Metamodels with Optimized Weight Factors.” Structural and Multidisciplinary Optimization 37: 279–294. doi:10.1007/s00158-008-0230-y
  • Amouzgar, K., A. Rashid, and N. Strömberg. 2013. “Multi-Objective Optimization of a Disc Brake System by Using SPEA2 and RBFN.” In Proceedings of the 39th Design Automation Conference. New York: ASME.
  • Amouzgar, K., and N. Strömberg. 2017. “Radial Basis Functions As Surrogate Models with a Priori Bias in Comparison with a Posteriori Bias.” Structural and Multidisciplinary Optimization 55: 1453–1469. doi:10.1007/s00158-016-1569-0
  • Breiman, L. 1996. “Stacked Regressions.” Machine Learning 24: 49–64. doi:10.1007/BF00117832
  • Ferreira, W. G., and A. L. Serpa. 2016. “Ensemble of Metamodels: The Augmented Least Squares Approach.” Structural and Multidisciplinary Optimization 53: 1019–1046. doi:10.1007/s00158-015-1366-1
  • Garg, H. 2014. “Solving Structural Engineering Design Optimization Problems Using An Artificial Bee Colony Algorithm.” Journal of Industrial and Management Optimization 10 (3): 777–794. doi:10.3934/jimo.2014.10.777
  • Goel, T., R. T. Haftka, W. Shyy, and N. V. Queipo. 2007. “Ensemble of Surrogates.” Structural and Multidisciplinary Optimization 33: 199–216. doi:10.1007/s00158-006-0051-9
  • Kleijnen, J. P. C. 2009. “Kriging Metamodeling in Simulation: A Review.” European Journal of Operational Research 192 (3): 707–716. doi:10.1016/j.ejor.2007.10.013
  • Lee, Y., and D.-H. Choi. 2014. “Pointwise Ensemble of Meta-Models Using ν Nearest Points Cross-Validation.” Structural and Multidisciplinary Optimization 50: 383–394. doi:10.1007/s00158-014-1067-1
  • Shi, R., L. Liu, T. Long, and J. Liu. 2016. “An Efficient Ensemble of Radial Basis Functions Method Based on Quadratic Programming.” Engineering Optimization 48: 1202–1225. doi:10.1080/0305215X.2015.1100470
  • Song, X., L. Lv, J. Li, W. Sun, and J. Zhang. 2018. “An Advanced and Robust Ensemble Surrogate Model: Extended Adaptive Hybrid Functions.” Journal of Mechanical Design 140 (4): Article ID 041402. doi:10.1115/1.4039128
  • Strömberg, N. 2016. “Reliability-Based Design Optimization by Using a SLP Approach and Radial Basis Function Networks.” In Proceedings of the 2016 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference (IDETC/CIE). New York: ASME.
  • Strömberg, N. 2018a. “Reliability-Based Design Optimization by Using Support Vector Machines.” In Proceedings of ESREL—the European Safety and Reliability Conference. London: CRC Press. doi:10.1201/9781351174664.
  • Strömberg, N. 2018b. “Reliability-Based Design Optimization by Using Metamodels.” In Proceedings of the 6th International Conference on Engineering Optimization (EngOpt 2018). Cham, Switzerland: Springer International Publishing. doi:10.1007/978-3-319-97773-7.
  • Strömberg, N. 2019. “Reliability-Based Design Optimization by Using Ensemble of Metamodels.” In Proceedings of the 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering (UNCECOMP 2019). Athens: UNCECOMP.
  • Strömberg, N., and M. Tapankov. 2012. “Sampling- and SORM-based RBDO of a Knuckle Component by Using Optimal Regression Models.” In Proceedings of the 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference. Reston, VA: AIAA. doi:10.2514/6.2012-5439.
  • Sudret, B. 2008. “Global Sensitivity Analysis Using Polynomial Chaos Expansions.” Reliability Engineering & System Safety 93 (7): 964–979. doi:10.1016/j.ress.2007.04.002
  • Viana, F. A. C., R. T. Haftka, and V. Steffen Jr. 2009. “Multiple Surrogates: How Cross-Validation Errors Can Help Us to Obtain the Best Predictor.” Structural and Multidisciplinary Optimization 39: 439–457. doi:10.1007/s00158-008-0338-0
  • Wang, G. G., and S. Shan. 2006. “Review of Metamodeling Techniques in Support of Engineering Design Optimization.” Journal of Mechanical Design 129 (4): 370–380. doi:10.1115/1.2429697
  • Wolpert, D. H. 1992. “Stacked Generalization.” Neural Networks 5: 241–259. doi:10.1016/S0893-6080(05)80023-1
  • Xiao, Jian Zhou, Zhong Ma Yi, and Fang Li Xu. 2011. “Ensembles of Surrogates with Recursive Arithmetic Average.” Structural and Multidisciplinary Optimization 44: 651–671. doi:10.1007/s00158-011-0655-6
  • Yun, Y., M. Yoon, and H. Nakayama. 2009. “Multi-Objective Optimization Based on Meta-modeling by Using Support Vector Regression.” Optimization and Engineering 10: 167–181. doi:10.1007/s11081-008-9063-1
  • Zhou, Z.-H. 2012. Ensemble Methods, Foundations and Algorithms. New York: Chapman and Hall/CRC. doi:10.1201/b12207

Appendix 1. Symbolic proof

In this appendix, MATLAB® code is given for a symbolic proof showing that the optimal weights for an affine combination obtained by minimizing the PRESS in (8) and (9) are identical to the optimal weights in (10) obtained by using the formulation of Viana, Haftka, and Steffen (2009).
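The original code is not reproduced here, but a minimal sketch of such a symbolic check for a small fixed size (N = 3 sampling points and M = 2 metamodels), using the Symbolic Math Toolbox, could read as follows; it verifies that the LS-A weights from (8)-(9) coincide with (10).

    % Minimal sketch (not the article's original code): symbolic check of the
    % equivalence between (8)-(9) and (10) for N = 3 and M = 2.
    syms y11 y12 y21 y22 y31 y32 fh1 fh2 fh3 real
    Y    = [y11 y12; y21 y22; y31 y32];
    fhat = [fh1; fh2; fh3];
    one  = ones(2, 1);

    G      = Y.' * Y;
    lambda = (one.' * (G \ (Y.' * fhat)) - 1) / (one.' * (G \ one));   % Equation (9)
    wLSA   = G \ (Y.' * fhat - lambda * one);                          % Equation (8)

    E      = [fhat - Y(:, 1), fhat - Y(:, 2)];   % per-metamodel PRESS residuals
    C      = E.' * E;                            % Equation (11)
    wViana = (C \ one) / (one.' * (C \ one));    % Equation (10)

    simplify(wLSA - wViana)                      % should simplify to [0; 0]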