Research Article

Performance comparison of multiple and single surrogate models for pumping optimization of coastal aquifers

Pages 336-349 | Received 22 May 2018, Accepted 18 Dec 2018, Published online: 11 Mar 2019

ABSTRACT

Single and multiple surrogate models were compared for single-objective pumping optimization problems of a hypothetical and a real-world coastal aquifer. Different instances of radial basis functions and kriging surrogates were utilized to reduce the computational cost of direct optimization with variable density and salt transport models. An adaptive surrogate update scheme was embedded in the operations of an evolutionary algorithm to efficiently control the feasibility of optimal solutions in pumping optimization problems with multiple constraints. For a set of independent optimization runs, results showed that multiple surrogates, either by selecting the best or by using ensembles, did not necessarily outperform the single surrogate approach. Nevertheless, the ensemble with optimal weights produced slightly better results than selecting only the best surrogates or applying a simple averaging approach. For all cases, using single or multiple surrogate models reduced the computational cost by up to 90% relative to direct optimization.

Editor R. Woods; Associate editor L. Bouchaou

Introduction

Variable density and salt transport (VDST) numerical models provide high-fidelity simulations of seawater intrusion (SWI) at the expense of increased computational cost (Werner et al. 2013). As a result, the integration of VDST models within simulation-based frameworks, such as uncertainty analysis or optimization, is often computationally impractical. Several SWI studies have proposed the use of surrogate models to reduce the computational cost without compromising the quality of the analysis (Ketabchi and Ataie-Ashtiani 2015a, Sreekanth and Datta 2015).

Typically, surrogate models are trained on a set of input–output data produced from the computationally expensive physics-based model and then serve as efficient empirical approximators of the original model's response (Razavi et al. 2012a, Asher et al. 2015). The selection of the surrogate model is often dictated by the type of analysis required of the expensive computer model (Broad et al. 2015). For example, Rajabi et al. (2015) employed polynomial chaos expansions as appropriate surrogate models to cope with uncertainty in SWI simulations. Artificial neural networks, genetic programming, radial basis functions, fuzzy inference systems and multivariate adaptive regression splines have all been used as surrogates for VDST models in deterministic single-objective or multi-objective optimization problems of coastal aquifers (e.g. Rao et al. 2004, Rao and Manju 2007, Dhar and Datta 2009, Sreekanth and Datta 2010, Kourakos and Mantoglou 2011, Grundmann et al. 2012, Ataie-Ashtiani et al. 2013, Christelis and Mantoglou 2016, Roy and Datta 2017a, 2017b, 2017c). Some studies have also compared the prediction skills and the efficiency of different surrogates for approximating the response of VDST models (e.g. Sreekanth and Datta 2011a, Lal and Datta 2018, Yadav et al. 2018).

Instead of employing a single surrogate to approximate the response of an expensive computer model, various researchers in engineering optimization have investigated the use of multiple surrogates aiming to improve the exploration of the optimal search space (e.g. Viana et al. 2009, 2013, Acar 2010, Müller and Piché 2011, Nikolos 2013, Müller and Shoemaker 2014, Jiang et al. 2015, Shi et al. 2016, Bhosekar and Ierapetritou 2017, Hou et al. 2017). The generic approach in a multiple surrogate framework is to identify a suite of reliable surrogates for the problem at hand, usually through cross-validation errors, and either use the best or construct an ensemble (e.g. weighted-average surrogate) to replace the original expensive model (Viana et al. 2010). The superiority of employing multiple against single surrogates is debatable, since the nature of each optimization problem may favour one approach or the other (e.g. Viana et al. 2009, Babaei and Pan 2016).

In addition, the use of multiple surrogates involves several considerations that are related to the effort and computation time required to implement a surrogate-based optimization (SBO). A primary issue is whether the best surrogate models are identified through a cross-validation strategy, or whether it is known, based on previous experience with the optimization problem, which surrogates should work best (Viana et al. 2009). In the first case, it must be decided how many surrogates should be explored and what computational budget is available to apply an efficient and informative cross-validation framework. Furthermore, the training time of surrogate models varies with their complexity, and thus employing more complex yet possibly more accurate models may add considerable computational time.

The multiple surrogate approach has seen limited use in coastal aquifer management and has been mostly explored using ensembles constructed from different instances of the same surrogate model (Roy and Datta 2018b). Sreekanth and Datta (2011b) first utilized an ensemble of genetic programming surrogate models in lieu of a VDST model to solve a multi-objective pumping optimization problem. Their study showed that the ensemble performed better than using a single genetic programming surrogate model. Recently, Roy and Datta (2017a, 2017c) solved multi-objective pumping optimization problems by utilizing ensembles of fuzzy inference systems and of multivariate adaptive regression splines, respectively. Their approach reduced the uncertainty in the prediction of the surrogate models while the multivariate adaptive regression splines provided a more efficient ensemble surrogate selection. Roy and Datta (2018a) also proposed the use of ensembles of Gaussian process regression models to increase the prediction accuracy of surrogate models in coastal aquifer management under parameter uncertainty. To handle the computational burden, they also used parallel computing facilities.

The combination of different surrogate modelling techniques may potentially increase the accuracy in approximating the original computer model response (Viana et al. 2009, Müller and Shoemaker 2014). The present work compares the performance of single and multiple surrogates for single-objective pumping optimization problems of coastal aquifers. An adaptive SBO framework is adopted where the surrogate models are used to predict the computationally expensive constraint functions associated with multiple pumping wells. By utilizing a cross-validation error strategy, the multiple surrogate approach is applied either by identifying the best model or by forming weighted-average surrogates (ensembles) for each constraint function. Different instances of radial basis functions and kriging surrogate models are employed due to their beneficial interpolating capabilities in approximating deterministic computer simulations (Razavi et al. 2012b). The SBO frameworks are also compared with the direct optimization using the VDST model, which provided the benchmark solutions for a hypothetical and a real-world coastal aquifer management problem. To enable a more comprehensive comparison due to the probabilistic nature of the optimization algorithm, multiple optimization runs were performed for each framework.

SWI modelling and pumping optimization

Overview of the study area

An unconfined, elongated coastal aquifer along the Vathi valley, located in the central part of the Greek island of Kalymnos, is considered as the real-world application of this study (Figure 1).

Figure 1. Hydro-lithological map of Kalymnos island. Source: Hellenic Ministry of Development (2005)

The examined aquifer consists mainly of highly permeable limestone, which outcrops at the valley margins. Borehole lithological data indicate that the aquifer is bounded by an impermeable schist formation, which underlies the limestones. The valley plain consists of scree, alluvium deposits and volcanic formations (tuff) of high, medium and very low permeability, respectively. Although the carbonate rocks are characterized by secondary porosity, for simplification purposes a uniform hydraulic conductivity is considered for the limestones. The aquifer is divided into four zones of uniform hydraulic conductivity, as depicted in Figure 2, and it is replenished by a uniform surface recharge of approximately 7849 m³/d.

Figure 2. Hydraulic conductivity zones of the Vathi aquifer

The aquifer was simulated assuming a horizontal bottom at a depth of −25 m. The northern, southern and western boundaries are considered impermeable. At the eastern sea boundary, first-type (Dirichlet) boundary conditions are applied. That is, a specified hydrostatic equivalent freshwater head and a constant relative concentration corresponding to the maximum fluid density of seawater are applied to represent sea boundaries. Due to the lack of hydrogeological data and hydraulic head measurements, it was not possible to perform a reliable and accurate calibration of the aquifer model. Therefore, the parameter values are rough estimates, based on a trial-and-error procedure (Mantoglou et al. 2004).

Application on a hypothetical coastal aquifer model

The optimization frameworks were all first tested on the numerical SWI simulations of an illustrative coastal aquifer model (Figure 3). The aquifer is assumed rectangular, with dimensions x = 7000 m and y = 3000 m and a thickness of z = 25 m. Unconfined, steady-state and saturated flow conditions are assumed. Table 1 summarizes the basic input parameters for the VDST numerical model.

Table 1. Parameters for the numerical SWI simulations. Kx, Ky, Kz: hydraulic conductivities (m/d); Rgw: total aquifer recharge (m3/d); αL, αT: longitudinal and transverse dispersivity (m); Δx, Δy, Δz: grid discretization settings (m)

Figure 3. Illustrative coastal aquifer model with boundary conditions applied

A CPU time of 30 s is required for a single run of the VDST model. All the simulation–optimization methods were conducted on the same desktop, namely, a 2.7-GHz Intel i5 processor with 8 GB of RAM in a 64-bit Windows 10 system. An initial simulation run of the VDST model was performed with no pumping present, until the head and salinity concentration fields reached steady state. These outputs provided the initial conditions for the subsequent VDST simulations related to the optimization runs.

VDST models

The phenomenon of SWI is more realistically described by VDST models which consider mixing between freshwater and salt water due to hydrodynamic dispersion. The VDST modelling is based on numerical codes which solve a coupled system of partial differential equations of flow and transport:

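$$\frac{\partial}{\partial x_i}\left[K_{ij}\left(\frac{\partial h_f}{\partial x_j} + \rho_r n_j\right)\right] + Q_\rho = S_s \frac{\partial h_f}{\partial t} \tag{1}$$

$$\frac{\partial}{\partial x_i}\left(D_{ij}\frac{\partial c}{\partial x_j} - q_i c\right) + Q_c = \phi \frac{\partial c}{\partial t} \tag{2}$$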

In the flow Equation (1), $h_f$ [L] represents the equivalent freshwater head given by $h_f = p/(\rho_f g) + z$; $K_{ij}$ [L T−1] are the coefficients of the freshwater hydraulic conductivity tensor; $S_s$ [L−1] is the specific storage; $t$ [T] is time; and $Q_\rho$ [L3 L−3 T−1] is a volumetric fluid source/sink term per unit aquifer volume. The gravity acceleration constant is denoted by $g$ [L T−2] and $z$ [L] is the elevation above the horizontal datum. The fluid pressure is denoted by $p$ [M L−1 T−2], while $\rho_f$ [M L−3] is the reference fluid density. A linear relation is assumed between fluid density $\rho$ [M L−3] and concentration $c$ according to:

$$\frac{\rho - \rho_f}{\rho_f} = \frac{\rho_{\max} - \rho_f}{\rho_f}\, c \tag{3}$$

where $c$ [-] is the dimensionless relative concentration, which varies between 0 and 1. Thus, $\rho = \rho_{\max}$ corresponds to $c_{\max} = 1$. In the transport Equation (2), $D_{ij}$ [L2 T−1] are the coefficients of the dispersion tensor; $\phi$ [-] is porosity and $Q_c$ [M L−3 T−1] is a solute mass source/sink term. Note that the indices $i, j$ are the unit vectors in the x and y directions and $n_j$ is the direction of flow, which is equal to 1 in the vertical direction and 0 in the horizontal directions. The Darcy flux term $q_i$ is expressed as:

$$q_i = -K_{ij}\left(\frac{\partial h_f}{\partial x_j} + \rho_r n_j\right) \tag{4}$$

where $\rho_r$ is the dimensionless relative density, $\rho_r = (\rho - \rho_f)/\rho_f$. In this study, the HydroGeoSphere (HGS) code (Graf and Therrien 2005, Therrien et al. 2006) was used to simulate SWI for the case of a hypothetical and a real-world coastal aquifer model. The numerical solution is based on the control volume finite element method and adaptive time-stepping (Therrien et al. 2006). A Picard scheme is utilized by HGS to iterate between the flow and transport equations.

Pumping optimization based on SWI models

Let $f(\mathbf{Q}) = \sum_{i=1}^{M} Q_i$ represent the objective function of the problem, which is linear with respect to the decision variables $Q_i$, with $i = 1, 2, \ldots, M$; that is, the individual pumping rates of each pumping well, where $\mathbf{Q}$ is the decision vector of pumping rates ($\mathbf{Q} = (Q_1, Q_2, \ldots, Q_M)$). The objective is to find the maximum of $f$ (total groundwater abstraction) subject to inequality constraints of the form $g_i(\mathbf{Q}) \le 0$, which protect the coastal aquifer from SWI. Here, it is assumed that the salinity concentration $c_i$ in each pumping well is not allowed to exceed a specified salinity concentration threshold $c_T$ = 35 mg/L. This is formulated as a nonlinear constrained optimization problem as follows:

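$$\begin{aligned} \min\ & -\sum_{i=1}^{M} Q_i \\ \text{s.t.}\ & c_i(Q_1, Q_2, \ldots, Q_M) \le c_T, \quad i = 1, 2, \ldots, M \\ & Q_{\min} \le Q_i \le Q_{\max}, \quad i = 1, 2, \ldots, M \end{aligned} \tag{5}$$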

Note that the number of constraint functions is equal to the number of pumping wells. In Equation (5), $Q_{\min}$ and $Q_{\max}$ define the lower and upper limits that pumping rates can take. The negative sign in the formulation of the objective function denotes that the objective function is maximized, since the maximum of $f(\mathbf{Q})$ occurs at the same point as the minimum of $-f(\mathbf{Q})$.

It has been demonstrated in several coastal aquifer management studies that evolutionary algorithms can successfully handle the multimodality of pumping optimization problems in coastal aquifers (e.g. Karpouzos and Katsifarakis 2013, Ketabchi and Ataie-Ashtiani 2015b). Here, a heuristic optimization method, namely, the evolutionary annealing–simplex (EAS) algorithm (Efstratiadis and Koutsoyiannis 2002, Rozos et al. 2004), is utilized to solve problem (5). The latter was translated to a bound-constrained optimization problem using penalty terms in the objective function so that EAS can be applied, as given by Equation (6):

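$$\min f(\mathbf{Q}) = \begin{cases} -\sum_{i=1}^{M} Q_i & \text{if } \forall\, i = 1, 2, \ldots, M:\ c_i(Q_1, Q_2, \ldots, Q_M) \le c_T \\ M_v \sum_{i=1}^{M} \left[\max(c_i - c_T, 0)\right]^2 & \text{if } \exists\, i = 1, 2, \ldots, M:\ c_i(Q_1, Q_2, \ldots, Q_M) > c_T \end{cases} \tag{6}$$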

where $M_v$ represents the number of pumping wells for which the constraint is violated. Equation (6) attributes a separate score for each violated constraint. Furthermore, the penalized objective function is multiplied by $M_v$ to include the number of constraint violations for the case of a non-feasible vector $\mathbf{Q}$. The pumping optimization problem defined in Equation (6) can be directly solved by combining the VDST model with the EAS algorithm. Guidelines for the EAS parameters can be found in Efstratiadis and Koutsoyiannis (2002) and Tsoukalas et al. (2016). Accordingly, the initial population size was set to $n_{pop} = 10M$, the two annealing schedule control parameters to $\lambda_p = 0.95$ and $\psi = 2$, the mutation probability to $m_p = 0.1$, and the convergence criterion to $\varepsilon = 10^{-4}$. For all optimization frameworks, the termination criteria are met if the convergence criterion $\varepsilon$ equals its pre-set value, or the number of objective function evaluations reaches the maximum of $n_{\max} = 100\,n_{pop}$.
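As a minimal sketch of how the penalized objective of Equation (6) can be evaluated, the Python function below takes a vector of pumping rates, a salinity function and a threshold. The names `penalized_objective` and `toy_salinity`, and the linear toy salinity response, are hypothetical and only stand in for a VDST run.

```python
def penalized_objective(q, salinity, c_threshold):
    """Penalized objective in the spirit of Equation (6).

    q           : list of pumping rates, one per well
    salinity    : callable returning one salinity value per well for q
    c_threshold : salinity threshold c_T
    """
    c = salinity(q)
    # Excess salinity per well; zero where the constraint is satisfied.
    violations = [max(ci - c_threshold, 0.0) for ci in c]
    n_violated = sum(1 for v in violations if v > 0.0)  # M_v
    if n_violated == 0:
        return -sum(q)  # feasible: minimize the negative total pumping
    return n_violated * sum(v ** 2 for v in violations)


def toy_salinity(q):
    """Hypothetical stand-in for a VDST run (NOT a physical model)."""
    return [qi / 100.0 for qi in q]


feasible_value = penalized_objective([10.0, 20.0], toy_salinity, 0.5)
one_violation = penalized_objective([100.0, 20.0], toy_salinity, 0.5)
two_violations = penalized_objective([100.0, 100.0], toy_salinity, 0.5)
```

Assuming non-negative pumping rates, every feasible vector scores at or below zero and every infeasible vector scores above zero, so a minimizer always prefers a feasible solution over an infeasible one.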

Surrogate-based pumping optimization

The surrogate models

The nonlinear constraints described in Equation (5) are computationally expensive to evaluate, since the VDST model must first be run to calculate the salinity concentration field at the end of the simulated management period. Given the computational cost of VDST simulations and the thousands of runs required by an evolutionary algorithm for a moderate dimensionality optimization problem, the direct approach may result in excessive computational burden. To reduce this computational burden, surrogate models are employed to replace the VDST simulation and to enable an efficient simulation–optimization routine. Furthermore, it is assumed that there is no a priori information for selecting a specific type of surrogate model, but instead it is computationally feasible to explore different surrogate model formulations for implementing an SBO framework. After an appropriate training procedure, the surrogate models can approximate the original salinity output at the corresponding observation points of the VDST model.

A suite of radial basis functions and kriging models is considered here to investigate the efficacy of multiple surrogates for the coastal aquifer management problem defined in the previous section. Both radial basis functions and kriging surrogate models have been successfully applied in water resources optimization problems (e.g. Baú and Mayer 2006, Shoemaker et al. 2007, Razavi et al. 2012b, Pan et al. 2014, Tsoukalas and Makropoulos 2015, Christelis and Mantoglou 2016, Tsoukalas et al. 2016, Christelis et al. 2016, 2018). As interpolating surrogate models, they pass through all the points previously evaluated with the original model and thus can become more accurate as new input–output data become available from deterministic computer simulations (Forrester et al. 2008, Wang et al. 2014). A Latin hypercube sampling (LHS) design (McKay et al. 1979) was utilized to uniformly sample the decision vector space, evaluate the VDST model and create an initial set of m training patterns for each surrogate model.
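To illustrate the sampling step, the sketch below builds a basic Latin hypercube design using only the Python standard library. It is not the design code used in the study; the function name `latin_hypercube`, the bounds and the sample size are hypothetical.

```python
import random


def latin_hypercube(m, bounds, seed=None):
    """Return m points; each dimension has exactly one point per stratum.

    bounds : list of (low, high) pairs, one per decision variable
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the m equal-width strata ...
        strata = [(k + rng.random()) / m for k in range(m)]
        # ... then shuffle so strata are paired at random across dimensions.
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    return [[col[j] for col in columns] for j in range(m)]


# Hypothetical example: 20 training vectors for 3 wells, rates in [0, 1500].
design = latin_hypercube(20, [(0.0, 1500.0)] * 3, seed=42)
```

Each row of `design` would then be passed to the expensive simulator once to produce the initial training patterns.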

Radial basis functions (RBF)

For the construction of the RBF surrogate models we used the codes of the MATSuMoTo MATLAB toolbox developed by Müller (2014). A unique RBF model is associated with each of the M pumping wells. The set of decision vectors $\mathbf{Q}_1, \mathbf{Q}_2, \ldots, \mathbf{Q}_m \in \mathbb{R}^M$ obtained from the LHS design and the values $c_i(\mathbf{Q}_1), c_i(\mathbf{Q}_2), \ldots, c_i(\mathbf{Q}_m)$ ($i = 1, \ldots, M$) obtained from the VDST model define an RBF model, with a linear polynomial tail, of the following form (Powell 1992):

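$$S_m(\mathbf{Q}) = \sum_{k=1}^{m} \lambda_k\, \phi\left(\lVert \mathbf{Q} - \mathbf{Q}_k \rVert\right) + p(\mathbf{Q}) \tag{7}$$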

where $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ are coefficients to be determined, and $p(\mathbf{Q})$ is a linear polynomial whose coefficients also need to be determined such that $S_m$ passes through all the design points. Here, three types of basis functions are considered, that is: the linear form where $\phi(r) = r$, the cubic form where $\phi(r) = r^3$ and a thin plate spline (TPS) where $\phi(r) = r^2 \ln r$. The construction time for the RBF models is negligible in comparison to the VDST simulation time even for a large number of training patterns.
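The interpolation conditions of Equation (7) lead to a small linear system in the coefficients λ and the polynomial tail. The sketch below is a plain-Python illustration of that system, not the MATSuMoTo implementation; `solve`, `fit_rbf` and `BASIS` are hypothetical names, and the usual side condition that the λ coefficients are orthogonal to the linear tail is assumed.

```python
import math

BASIS = {
    "linear": lambda r: r,
    "cubic": lambda r: r ** 3,
    "tps": lambda r: r * r * math.log(r) if r > 0.0 else 0.0,
}


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_rbf(points, values, phi=BASIS["cubic"]):
    """Fit an interpolating RBF with a linear tail; return a predictor."""
    m, dim = len(points), len(points[0])
    n = m + dim + 1
    a = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for k in range(m):
            a[i][k] = phi(math.dist(points[i], points[k]))
        for t, v in enumerate([1.0] + list(points[i])):
            a[i][m + t] = v  # polynomial block P
            a[m + t][i] = v  # side conditions P^T lambda = 0
    coef = solve(a, list(values) + [0.0] * (dim + 1))
    lam, poly = coef[:m], coef[m:]

    def predict(q):
        radial = sum(l * phi(math.dist(q, p)) for l, p in zip(lam, points))
        return radial + poly[0] + sum(c * x for c, x in zip(poly[1:], q))

    return predict


# Hypothetical 2-D training data (stand-in for salinity at one well).
train_q = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.3]]
train_c = [x * x + y for x, y in train_q]
rbf = fit_rbf(train_q, train_c)
```

One such model would be fitted per pumping well, each with the same training inputs but its own salinity outputs.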

Kriging (KRG)

The MATLAB toolbox DACE (Design and Analysis of Computer Experiments) (Lophaven et al. 2002) was used to construct KRG surrogate models. KRG was first introduced as a surrogate model of deterministic computer simulations in Sacks et al. (1989). Details about the mathematical background of KRG can be found elsewhere in the literature (e.g. Jones et al. 1998, Lophaven et al. 2002, Forrester et al. 2008). Here, we only briefly present the general concept for a KRG surrogate model as applied in our problem.

Kriging treats the deterministic outputs $c_i(\mathbf{Q}_1), c_i(\mathbf{Q}_2), \ldots, c_i(\mathbf{Q}_m)$ ($i = 1, \ldots, M$) obtained from the VDST model as if they were generated from a stochastic process defined by a known regression function $f(\mathbf{Q})$ and a zero mean Gaussian process $Z(\mathbf{Q})$, with variance $\sigma^2$ and a correlation model $R$ (Jones et al. 1998):

$$Y(\mathbf{Q}) = f(\mathbf{Q}) + Z(\mathbf{Q}) \tag{8}$$

The concept utilized by the KRG models is that, if the difference between two vectors $\mathbf{Q}$ and $\mathbf{Q}'$ is small, then the resulting scalar predictions $c_i(\mathbf{Q})$ and $c_i(\mathbf{Q}')$ of the “true”, black-box function (the VDST numerical model in this case) should be closely correlated (Forrester et al. 2008). The DACE toolbox allows for both regression models of different order polynomials and correlation models of different structures. The correlation model involves a set of parameters $\theta$, which are identified using the maximum likelihood estimation method and numerical optimization techniques (Forrester et al. 2008). Here, the initial guess for the $\theta$ values was set according to Viana (2011). In total, six KRG surrogate models were utilized by combining exponential and Gaussian correlation models with zero-order, first-order and second-order polynomial models.
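For reference, the two correlation families and the resulting predictor can be written in the standard form found in the kriging literature cited above. The matrix notation ($\mathbf{F}$, $\mathbf{R}$, $\mathbf{r}$, $\boldsymbol{\beta}^*$, $\mathbf{Y}$) is introduced here for illustration and is an assumption about notation, not a transcription of the study's implementation.

```latex
% Product correlation between two decision vectors Q and Q' (M components)
R_{\text{exp}}(\theta, \mathbf{Q}, \mathbf{Q}')
  = \prod_{l=1}^{M} \exp\!\left(-\theta_l \,\lvert Q_l - Q'_l \rvert\right),
\qquad
R_{\text{Gauss}}(\theta, \mathbf{Q}, \mathbf{Q}')
  = \prod_{l=1}^{M} \exp\!\left(-\theta_l \,(Q_l - Q'_l)^2\right)

% Generalized least-squares estimate of the regression coefficients,
% with F the m x p matrix of regression terms at the training points,
% R the m x m correlation matrix and Y the m training outputs
\boldsymbol{\beta}^{*}
  = \left(\mathbf{F}^{T}\mathbf{R}^{-1}\mathbf{F}\right)^{-1}
    \mathbf{F}^{T}\mathbf{R}^{-1}\mathbf{Y}

% Predictor at a new point Q, with r(Q) the vector of correlations
% between Q and the m training points
\hat{y}(\mathbf{Q})
  = \mathbf{f}(\mathbf{Q})^{T}\boldsymbol{\beta}^{*}
  + \mathbf{r}(\mathbf{Q})^{T}\mathbf{R}^{-1}
    \left(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\right)
```

With a zero-order polynomial, $\mathbf{f}(\mathbf{Q})$ reduces to the constant 1; the first-order and second-order options add linear and quadratic terms in the pumping rates.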

Adaptive SBO using single surrogates

As previously mentioned, a total of nine surrogate models (three RBF and six KRG) were considered in this study. Each was used separately to develop an adaptive SBO framework similar to that presented in Christelis and Mantoglou (2016). The steps of the SBO approach can be summarized as follows:

  1. Use LHS design to provide the initial m training points $\mathbf{Q}_1, \mathbf{Q}_2, \ldots, \mathbf{Q}_m$ and obtain $c_i(\mathbf{Q}_1), c_i(\mathbf{Q}_2), \ldots, c_i(\mathbf{Q}_m)$ ($i = 1, \ldots, M$) through m VDST simulations.

  2. Construct M surrogate models (e.g. M cubic RBFs) corresponding to the M pumping wells, and create an external archive of training patterns.

  3. Run the EAS algorithm based on the surrogate models and, if during optimization a better optimum is found, do the following:

     (a) Re-evaluate the current decision vector with the VDST model;

     (b) Replace the objective function value with that obtained from the VDST model;

     (c) Store the new input–output data to the archive and re-train the surrogate models.

  4. Are the stopping criteria for the EAS algorithm met? If yes, return the final solution; otherwise go back to Step 3.
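The steps above can be sketched as a short loop. Everything in this example is a stand-in chosen to keep it self-contained: `expensive_salinity` replaces a VDST run, an inverse-distance interpolator replaces the RBF/KRG surrogates, plain random search replaces EAS, random sampling replaces LHS, and a fixed iteration budget replaces the EAS stopping criteria.

```python
import math
import random


def expensive_salinity(q):
    """Hypothetical stand-in for one VDST run: salinity at each well."""
    total = sum(q)
    return [qi / 100.0 + total / 1000.0 for qi in q]


def penalized(q, c, c_t):
    """Penalized objective in the spirit of Equation (6)."""
    viol = [max(ci - c_t, 0.0) for ci in c]
    n_viol = sum(1 for v in viol if v > 0.0)
    return -sum(q) if n_viol == 0 else n_viol * sum(v * v for v in viol)


def fit_idw(archive, well):
    """Inverse-distance interpolator for one well (surrogate stand-in)."""
    data = [(q, c[well]) for q, c in archive]

    def predict(q):
        num = den = 0.0
        for qa, ca in data:
            d = math.dist(q, qa)
            if d == 0.0:
                return ca
            num += ca / d ** 2
            den += 1.0 / d ** 2
        return num / den

    return predict


def adaptive_sbo(n_wells=2, c_t=0.5, q_max=50.0, m_init=8, n_iter=300, seed=0):
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(0.0, q_max) for _ in range(n_wells)]

    # Initial training points and archive of training patterns.
    archive = [(q, expensive_salinity(q)) for q in (sample() for _ in range(m_init))]
    n_expensive = m_init
    # One surrogate per pumping well.
    surrogates = [fit_idw(archive, i) for i in range(n_wells)]
    best_q, best_f = None, float("inf")
    for _ in range(n_iter):  # search on the surrogates only
        q = sample()
        f_hat = penalized(q, [s(q) for s in surrogates], c_t)
        if f_hat < best_f:  # surrogate suggests a better optimum
            c_true = expensive_salinity(q)  # re-evaluate with the simulator
            n_expensive += 1
            f_true = penalized(q, c_true, c_t)  # replace the objective value
            archive.append((q, c_true))  # store and re-train
            surrogates = [fit_idw(archive, i) for i in range(n_wells)]
            if f_true < best_f:
                best_q, best_f = q, f_true
    return best_q, best_f, n_expensive


best_q, best_f, n_expensive = adaptive_sbo()
```

Because the stored best value always comes from the expensive model, a negative `best_f` is feasible by construction, and `n_expensive` counts how few simulator calls the search actually needed.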

Whereas the above adaptive strategy favours the fast improvement of the surrogate model in the region of the current optimum (local exploitation), it is not considered a global SBO approach. However, it quickly locates good optimal solutions and has been successfully applied in other single-objective pumping optimization studies of coastal aquifers (Kourakos and Mantoglou 2009, Papadopoulou et al. 2010, Christelis and Mantoglou 2016). Another very active field of research found in the adaptive SBO literature is the development of sampling strategies that effectively utilize the expensive original model simulations to update the surrogate and increase its global accuracy (e.g. Regis 2011, 2014, Zhou et al. 2017).

Adaptive SBO using multiple surrogates

A cross-validation strategy is typically employed to assess the performance of the surrogate models and then either the best one is selected, or a set of the best available surrogates is used to construct an ensemble. Cross-validation is a practical approach to obtain information about the accuracy of the surrogates based on an initial training set. However, whether the combination provides better results over the selection of a single best surrogate model is inconclusive in the existing literature (Viana et al. 2010).

Three approaches are followed to make use of the multiple surrogate concept. First, given a relatively small number of m input–output patterns from the VDST model, a leave-one-out strategy is employed to find the best surrogate model. Here, the available surrogates emulate the scalar response of the computer model and thus a total of M surrogate models need to be constructed for each of the constraint functions corresponding to the pumping wells (Equation (5)). That is, a cross-validation error is calculated by fitting the available surrogate models to m − 1 training points; then the response of the surrogate models to the point left out from the fitting process is predicted. This process is repeated m times until all points have been evaluated separately. The root mean squared error metric is used as a cross-validation score, which is defined as follows for the ith pumping well:

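$$\mathrm{RMSE}_i = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(c_i^{\,j}(\mathbf{Q}) - \hat{c}_i^{\,j}(\mathbf{Q})\right)^2}, \quad i = 1, \ldots, M \tag{9}$$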

The surrogate models with the lowest RMSE values for each pumping well are identified and then used in the adaptive SBO scheme discussed above. Within the adaptive SBO framework, the surrogate models are reconstructed as new training points become available through the optimization operations.
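A minimal sketch of the leave-one-out scoring of Equation (9) follows. The two candidate models (`fit_nearest`, `fit_mean`) are deliberately trivial, hypothetical stand-ins for the RBF and KRG variants; any fitting function that returns a predictor could be scored the same way.

```python
import math


def loo_rmse(fit, points, values):
    """Leave-one-out RMSE of one surrogate type for one constraint function."""
    m = len(points)
    squared = 0.0
    for j in range(m):
        # Fit on m - 1 points, then predict the point that was left out.
        predict = fit(points[:j] + points[j + 1:], values[:j] + values[j + 1:])
        squared += (values[j] - predict(points[j])) ** 2
    return math.sqrt(squared / m)


def fit_nearest(points, values):
    """Stand-in surrogate: value of the nearest training point."""
    def predict(q):
        k = min(range(len(points)), key=lambda i: math.dist(q, points[i]))
        return values[k]
    return predict


def fit_mean(points, values):
    """Stand-in surrogate: constant equal to the training mean."""
    mean = sum(values) / len(values)
    return lambda q: mean


# Hypothetical training patterns for a single well.
train_q = [[0.0], [1.0], [2.0], [3.0]]
train_c = [0.0, 1.0, 2.0, 3.0]
candidates = {"nearest": fit_nearest, "mean": fit_mean}
scores = {name: loo_rmse(fit, train_q, train_c) for name, fit in candidates.items()}
ranked = sorted(scores, key=scores.get)  # lowest RMSE first
```

Repeating this per pumping well gives the best and second-best surrogate for each constraint function, which is the input needed by both the selection approach and the ensembles.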

Apart from selecting the best surrogate model, two ensemble formulations are also implemented. The two best surrogate models identified for each constraint function from the cross-validation strategy are used to form an ensemble of surrogates (ESM). Here, the multiple surrogates are formed as ensembles by constructing a weighted-average surrogate. Thus, the prediction of the ESM is formulated as:

$$\hat{y}_{ESM} = \sum_{k=1}^{N_{SM}} w_k\, \hat{y}_k \tag{10}$$

where $\hat{y}_k$ is the prediction produced by the kth surrogate model, $N_{SM}$ is the number of surrogate models that form the ensemble and $w_k$ is the weight attributed to the kth surrogate model; the weights must satisfy $\sum_{k=1}^{N_{SM}} w_k = 1$. The first ESM approach implemented in this work is based on assigning uniform weights to the surrogates, which means that they contribute equally to the ensemble response. This is known as a simple averaging technique and has been adopted for coastal aquifer management by Sreekanth and Datta (2011b) and Roy and Datta (2017c). The second ensemble approach involves the construction of a weighted-average surrogate model where optimal weights are obtained through optimization. It is noted that the optimization task for the optimal weights must be performed for each one of the constraint functions to form M ensembles of surrogates. The relevant optimization problem is defined as (Zhou et al. 2013, Jiang et al. 2015):

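$$\begin{aligned} \min\ & f(\mathbf{w}) = \frac{1}{m}\sum_{j=1}^{m}\left(w_1 y_1 + w_2 y_2 - y_{VDST}\right)^2 \\ \text{s.t.}\ & \sum_{k=1}^{N_{SM}} w_k = 1 \end{aligned} \tag{11}$$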

where $m$ is the size of the initial training sample; $y_1$ and $y_2$ are the individual predictions of the two best surrogate models; and $y_{VDST}$ is the corresponding response of the VDST model. Thus, the weights for the surrogate models are the decision variables of the above optimization problem, which was solved using the Sequential Quadratic Programming (SQP) method in MATLAB. Since the number of constraint functions may be large, and the construction of the weighted-average surrogate with optimal weights may add considerable computation time, another approach could be to use surrogates that are known to be accurate for the problem at hand. The reasoning is that experience with the available surrogate models is taken into account, and the additional computational time is spent on optimizing the weights instead of searching for the best surrogates using a cross-validation strategy. The above multiple surrogate frameworks are sample dependent, which means that using a different training sample may result in a different selection/combination of the best surrogate models. If an ensemble is to be constructed from different instances of the same surrogate model, resampling techniques may be beneficial to increase the diversity of the surrogate model response (e.g. Sreekanth and Datta 2011b, Roy and Datta 2017c). The adaptive multiple SBO framework implemented here is summarized in Figure 4.
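For a two-member ensemble, the sum-to-one constraint of Equation (11) leaves a single free weight, so the least-squares minimum has a closed form: substituting $w_2 = 1 - w_1$ and setting the derivative to zero gives $w_1 = \sum (y_1 - y_2)(y_{VDST} - y_2) / \sum (y_1 - y_2)^2$. The sketch below uses that closed form instead of the SQP solver mentioned above; the function names and data are hypothetical, and the weights are not restricted to [0, 1].

```python
def optimal_two_weights(y1, y2, y_ref):
    """Least-squares weights (w1, w2) with w1 + w2 = 1 for two surrogates.

    y1, y2 : predictions of the two surrogates at the training points
    y_ref  : corresponding responses of the expensive model
    """
    num = sum((a - b) * (r - b) for a, b, r in zip(y1, y2, y_ref))
    den = sum((a - b) ** 2 for a, b in zip(y1, y2))
    if den == 0.0:
        return 0.5, 0.5  # identical predictions: any split gives the same fit
    w1 = num / den
    return w1, 1.0 - w1


def ensemble_predict(weights, predictions):
    """Weighted-average surrogate of Equation (10) at a single point."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical predictions at three training points.
y1 = [1.0, 2.0, 4.0]
y2 = [2.0, 2.5, 3.0]
y_ref = [0.25 * a + 0.75 * b for a, b in zip(y1, y2)]
w_opt = optimal_two_weights(y1, y2, y_ref)
w_avg = (0.5, 0.5)  # simple averaging, for comparison
```

With more than two members, or with additional bounds on the weights, a general constrained solver would be needed and this shortcut no longer applies.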

Figure 4. Workflow of the multiple SBO frameworks

The adaptive SBO frameworks are compared with the optimal results from the VDST-based optimization, which provided the benchmark solution. Due to the probabilistic nature of the EAS algorithm, 30 independent runs were performed for each of the SBO methods for a more thorough comparison. The VDST-based optimization was run only once to provide the benchmark solution. Table 2 summarizes the different surrogate modelling techniques compared in this study.

Table 2. Summary of the surrogate models (SM) and the abbreviated names given in this work

Optimization results

Hypothetical application

The leave-one-out strategy was used to identify the best surrogate model corresponding to each constraint function. Based on the cross-validation scores, the SMs identified as best and second best for each of the M = 10 pumping wells are presented in Table 3.

Table 3. The surrogate models with the lowest RMSE values for each pumping well (PW) – hypothetical application

The optimal solutions obtained from all the optimization frameworks are presented as box plots in Figure 5, whereas some output statistics are presented in Table 4.

Table 4. Statistics of the best feasible objective function values for the 30 independent runs (best results are in bold)

Figure 5. Performance comparison between direct optimization and SBO frameworks

It is obvious from the results in the previous section that the performance of the surrogate models may vary in several aspects. For example, surrogate models that were identified as best for some constraint functions failed to provide good optimal solutions and a robust performance over the 30 independent runs when they were used alone in a single SBO framework. The box plots in Figure 5 demonstrate that the SBO using multiple surrogates produced good optimal solutions, which compare satisfactorily with the benchmark solution from the EAS-VD model. In addition, the solutions from the ensemble with optimal weights (EAS-SM12) exhibit a low dispersion, and they also have the best median and mean values among all the SBO frameworks (Table 4).

The notches in the box plots of EAS-SM12 and EAS-SM10 appear to overlap more clearly than with the EAS-SM11, indicating that the simple averaging technique may have a different and lower true median from the other two multiple SBO frameworks (Figure 5). Nevertheless, EAS-SM12 exhibits the most symmetrical distribution and appears as a more robust approach than the other two multiple SBO approaches. Of the nine single surrogates, three (EAS-SM3, EAS-SM4 and EAS-SM5) also produced good optimal solutions and demonstrate a similar performance to the multiple SBO approaches. Interestingly, among all frameworks, the highest maximum and minimum optimal solutions were obtained from SBO with a KRG model of Gaussian correlation and first-order polynomial. Based on the initial training sample, the latter (SM5) was identified as among the best surrogate models only two times (Table 3) compared to other surrogate modelling techniques. This could mean that the size of the initial training sample used to perform cross-validation was not adequate to identify this KRG model as best for the other constraint functions. It is not always feasible, though, to design large training datasets to find the best surrogates due to the prohibitive computational cost of the VDST simulations. In such cases, the only choice may be to select a surrogate model solely on previous experience with the optimization problem.

In terms of computational savings, the single SBO frameworks required less than 2 hours to converge compared to the approximate time of 2 days for the VDST-based optimization. However, in the case of multiple SBO, additional time was required to apply the cross-validation approach, which took approx. 20 min to calculate RMSE scores for all constraint functions and for all surrogate models. Furthermore, the calculation of optimal weights for the ensembles constructed for each constraint function required an additional total time of 1.3 h. Therefore, whereas EAS-SM12 has probably the most promising performance among the multiple SBO frameworks, it involves a higher computational cost than the single surrogates. Nevertheless, the additional computational time is rather inexpensive in comparison to the direct optimization approach. Overall, of the 12 approaches tested, the results above favour the six SBO approaches: three multiple SBO approaches (EAS-SM10, EAS-SM11 and EAS-SM12) and the single ones based on a cubic RBF (EAS-SM3), a Gaussian KRG with zero-order polynomial (EAS-SM4) and a Gaussian KRG with first-order polynomial (EAS-SM5).

Real-world application

The VDST-based optimization provided the benchmark optimal solution and computational time. Figure 6 presents the salinity distribution simulated by applying the optimal pumping rates obtained from the VDST-based optimization. A single run of the VDST simulation requires approximately 1.8 min. There are 11 main pumping boreholes installed in the aquifer and the simulation output indicates a relatively narrow seaward front.

Figure 6. VDST simulation output, with the locations of the 11 pumping wells. Salinity is expressed in relative concentrations (-)

The SBO framework was also run only once, to represent the more realistic case where the computational cost of the VDST simulations does not allow multiple optimization runs. The same optimization problem defined in Equation (6) was solved. Table 5 shows the best surrogate models identified from the cross-validation approach.

Table 5. The surrogate models with the lowest RMSE values for each of the 11 pumping wells (PW) – cross-validation approach
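The selection step behind such a table can be sketched as follows: for one constraint function, each candidate surrogate is scored by the RMSE of its cross-validation predictions against the VDST outputs, and the candidates are ranked by that score. This is a minimal sketch under stated assumptions; the surrogate labels, the data values and the function names are hypothetical and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rank_surrogates(observed, cv_predictions):
    """Return (names sorted by increasing RMSE, dict of RMSE scores)."""
    scores = {name: rmse(observed, pred) for name, pred in cv_predictions.items()}
    return sorted(scores, key=scores.get), scores

# Hypothetical VDST outputs at one pumping well and the corresponding
# cross-validation predictions of three candidate surrogates
observed = [0.10, 0.20, 0.30, 0.40]
cv_predictions = {
    "surrogate_A": [0.12, 0.19, 0.33, 0.38],
    "surrogate_B": [0.15, 0.25, 0.35, 0.45],
    "surrogate_C": [0.10, 0.21, 0.30, 0.41],
}

ranking, scores = rank_surrogates(observed, cv_predictions)
best, second_best = ranking[0], ranking[1]
print(best, second_best)
```

Repeating this ranking for every constraint function gives one best surrogate per pumping well. With a small training sample the ranking can be unstable, which matches the caution expressed for the hypothetical case.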

Figure 7 presents the distribution of the optimal pumping rates obtained from all optimization frameworks that were tested for the real coastal aquifer problem. The solution closest to the benchmark solution from the VDST model was produced by the SBO that utilizes a single KRG model with Gaussian correlation and first-order polynomial (EAS-SM5). The second-best was the solution obtained by the cubic RBF model, and the ensemble with optimal weights (EAS-SM12) provided the third-best solution.

Figure 7. Distribution of the optimal pumping rates for all the optimization frameworks. The numbers in parentheses represent the total optimal pumping

In terms of computation times, the direct optimization using the VDST model required approximately 11 days. In contrast, all the SBO approaches required less than 10 h. It is noted that, with the adaptive SBO used in this study, the number of calls of the VDST model may vary for each case and for each optimization run. Therefore, conclusions about the individual computation times of each surrogate model cannot be generalized from a single run. Clearly, though, the multiple surrogate approaches add computation time, and constructing KRG models is more expensive than constructing RBF models. Nevertheless, because the adaptive SBO framework utilized in this study significantly reduces the required training patterns, the construction of the surrogate models did not contribute much to the overall computation time.
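The reported times can be turned into approximate reductions with simple arithmetic. The sketch below uses only the rounded figures quoted in the text (about 2 days versus less than 2 h for the hypothetical aquifer, plus 20 min and 1.3 h for the multiple-surrogate steps, and about 11 days versus less than 10 h for the real aquifer). Because the SBO times are upper bounds, the computed reductions are lower bounds. The function and variable names are hypothetical.

```python
def reduction_percent(direct_hours, surrogate_hours):
    """Percentage reduction in run time relative to direct optimization."""
    return 100.0 * (1.0 - surrogate_hours / direct_hours)

# Hypothetical aquifer (times in hours)
hyp_direct = 2 * 24                    # about 2 days of VDST-based optimization
hyp_single = 2.0                       # single SBO: less than 2 h
hyp_ensemble = 2.0 + 20 / 60 + 1.3     # plus cross-validation and weight calculation

# Real-world aquifer (times in hours)
real_direct = 11 * 24                  # about 11 days
real_sbo = 10.0                        # all SBO approaches: less than 10 h

results = {
    "hypothetical, single SBO": reduction_percent(hyp_direct, hyp_single),
    "hypothetical, ensemble SBO": reduction_percent(hyp_direct, hyp_ensemble),
    "real-world, SBO": reduction_percent(real_direct, real_sbo),
}
for label, value in results.items():
    print(f"{label}: at least {value:.1f}% reduction")

# Upper bound on VDST runs that fit in the real-world SBO time,
# at roughly 1.8 min per simulation
max_vdst_runs = real_sbo * 60 / 1.8
print(f"at most about {max_vdst_runs:.0f} VDST runs")
```

With these rounded inputs the reductions come out at roughly 96%, 92% and 96%, so even the ensemble approach, with its extra cross-validation and weight-calculation time, stays above 90%.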

Conclusions

The performance of single and multiple surrogate models was assessed for a hypothetical and a real-world coastal aquifer management problem. The multiple surrogate approach was applied by using either the best or ensembles of surrogates. The application was focused on a single-objective pumping optimization problem where multiple constraint functions must be treated separately.

In both optimization problems, the multiple surrogate approach did not consistently outperform the use of single surrogates. However, the weighted-average surrogate (ensemble), based on optimal weights, appeared to be the most promising multiple surrogate approach. Notwithstanding the good results from the surrogate ensemble, the single cubic RBF model and the KRG model with Gaussian correlation and a first-order polynomial trend performed equally well. The computational gains over the direct optimization were significant, with a reduction of more than 90% in computation time for all SBO approaches. However, the adaptive SBO frameworks using multiple surrogates were associated with additional computational cost.

In this work, only the best and the second-best surrogate models were utilized to construct a weighted-average heterogeneous ensemble based on optimal weights. Although using a larger number of surrogate models to form the ensemble may increase the computational cost, it is worthwhile to investigate this approach. Since the efficiency and robustness of surrogate models may vary depending on the optimization problem at hand, further studies should explore other heterogeneous ensembles by using promising surrogate models already used in coastal aquifer management.

Acknowledgments

The authors would like to thank the anonymous reviewers for their suggestions to improve the quality of the paper.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Acar, E., 2010. Various approaches for constructing an ensemble of metamodels using local measures. Structural and Multidisciplinary Optimization, 42 (6), 879–896. doi:10.1007/s00158-010-0520-z
  • Asher, M.J., et al., 2015. A review of surrogate models and their application to groundwater modeling. Water Resources Research, 51 (8), 5957–5973. doi:10.1002/2015WR016967
  • Ataie-Ashtiani, B., Ketabchi, H., and Rajabi, M.M., 2013. Optimal management of a freshwater lens in a small island using surrogate models and evolutionary algorithms. Journal of Hydrologic Engineering, 19 (2), 339–354. doi:10.1061/(ASCE)HE.1943-5584.0000809
  • Baú, D.A. and Mayer, A.S., 2006. Stochastic management of pump-and-treat strategies using surrogate functions. Advances in Water Resources, 29 (12), 1901–1917. doi:10.1016/j.advwatres.2006.01.008
  • Babaei, M. and Pan, I., 2016. Performance comparison of several response surface surrogate models and ensemble methods for water injection optimization under uncertainty. Computers & Geosciences, 91, 19–32. doi:10.1016/j.cageo.2016.02.022
  • Bhosekar, A. and Ierapetritou, M., 2017. Advances in surrogate based modeling, feasibility analysis and optimization: A review. Computers & Chemical Engineering, 108, 250–267. doi:10.1016/j.compchemeng.2017.09.017
  • Broad, D.R., Dandy, G.C., and Maier, H.R., 2015. A systematic approach to determining metamodel scope for risk-based optimization and its application to water distribution system design. Environmental Modelling & Software, 69, 382–395. doi:10.1016/j.envsoft.2014.11.015
  • Christelis, V., Bellos, V., and Tsakiris, G., 2016. Employing surrogate modelling for the calibration of a 2D flood simulation model. In: S. Erpicum, B. Dewals, P. Archambeau, M. Pirotton, eds. Sustainable Hydraulics in the Era of Global Change, Proceedings of 4th IAHR Congress, 27–29 July 2016 Liege, Belgium. Boca Raton, FL: CRC Press, 727–732.
  • Christelis, V. and Mantoglou, A., 2016. Pumping optimization of coastal aquifers assisted by adaptive metamodelling methods and radial basis functions. Water Resources Management, 30 (15), 5845–5859. doi:10.1007/s11269-016-1337-3
  • Christelis, V., Regis, R.G., and Mantoglou, A., 2018. Surrogate-based pumping optimization of coastal aquifers under limited computational budgets. Journal of Hydroinformatics, 20 (1), 164–176. doi:10.2166/hydro.2017.063
  • Dhar, A. and Datta, B., 2009. Saltwater intrusion management of coastal aquifers. I: linked simulation-optimization. Journal of Hydrologic Engineering, 14 (12), 1263–1272. doi:10.1061/(ASCE)HE.1943-5584.0000097
  • Efstratiadis, A. and Koutsoyiannis, D., 2002. An evolutionary annealing-simplex algorithm for global optimisation of water resource systems. In: Proceedings of the Fifth International Conference on Hydroinformatics, Cardiff, UK, 1423–1428.
  • Viana, F.A.C., 2011. SURROGATES Toolbox User's Guide, Version 3.0. Available from: https://sites.google.com/site/felipeacviana/surrogatestoolbox
  • Forrester, A.I.J., Sóbester, A., and Keane, A.J., 2008. Engineering design via surrogate modelling: a practical guide. John Wiley & Sons.
  • Graf, T. and Therrien, R., 2005. Variable-density groundwater flow and solute transport in porous media containing nonuniform discrete fractures. Advances in Water Resources, 28 (12), 1351–1367. doi:10.1016/j.advwatres.2005.04.011
  • Grundmann, J., et al., 2012. Towards an integrated arid zone water management using simulation-based optimisation. Environmental Earth Sciences, 65 (5), 1381–1394. doi:10.1007/s12665-011-1253-z
  • Hellenic Ministry of Development, Department of Water Potential and Natural Resources, 2005. Water resources management system and tool analysis of the Aegean islands district. Phase B’, issue 2-1: Kalymnos island, Aegean Water System consortium [in Greek].
  • Hou, Z., et al., 2017. A comparative research of different ensemble surrogate models based on set pair analysis for the DNAPL-contaminated aquifer remediation strategy optimization. Journal of Contaminant Hydrology, 203, 28–37. doi:10.1016/j.jconhyd.2017.06.003
  • Jiang, X., et al., 2015. Ensemble of surrogates-based optimization for identifying an optimal surfactant-enhanced aquifer remediation strategy at heterogeneous DNAPL-contaminated sites. Computers & Geosciences, 84, 37–45. doi:10.1016/j.cageo.2015.08.003
  • Jones, D.R., Schonlau, M., and Welch, W.J., 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13 (4), 455–492. doi:10.1023/A:1008306431147
  • Karpouzos, D.K. and Katsifarakis, K.L., 2013. A set of new benchmark optimization problems for water resources management. Water Resources Management, 27 (9), 3333–3348. doi:10.1007/s11269-013-0350-z
  • Ketabchi, H. and Ataie-Ashtiani, B., 2015a. Review: coastal groundwater optimization—advances, challenges, and practical solutions. Hydrogeology Journal, 23, 1129–1154. doi:10.1007/s10040-015-1254-1
  • Ketabchi, H. and Ataie-Ashtiani, B., 2015b. Evolutionary algorithms for the optimal management of coastal groundwater: a comparative study toward future challenges. Journal of Hydrology, 520, 193–213. doi:10.1016/j.jhydrol.2014.11.043
  • Kourakos, G. and Mantoglou, A., 2009. Pumping optimization of coastal aquifers based on evolutionary algorithms and surrogate modular neural network models. Advances in Water Resources, 32 (4), 507–521. doi:10.1016/j.advwatres.2009.01.001
  • Kourakos, G. and Mantoglou, A., 2011. Simulation and multi-objective management of coastal aquifers in semi-arid regions. Water Resources Management, 25 (4), 1063–1074. doi:10.1007/s11269-010-9677-x
  • Lal, A. and Datta, B., 2018. Development and implementation of support vector machine regression surrogate models for predicting groundwater pumping-induced saltwater intrusion into coastal aquifers. Water Resources Management, 32, 2405–2419. doi:10.1007/s11269-018-1936-2
  • Lophaven, S.N., Nielsen, H.B., and Søndergaard, J., 2002. DACE: a Matlab kriging toolbox (Vol. 2). Lyngby, Denmark: IMM, Informatics and Mathematical Modelling, The Technical University of Denmark.
  • Mantoglou, A., Papantoniou, M., and Giannoulopoulos, P., 2004. Management of coastal aquifers based on nonlinear optimization and evolutionary algorithms. Journal of Hydrology, 297 (1–4), 209–228. doi:10.1016/j.jhydrol.2004.04.011
  • McKay, M.D., Beckman, R.J., and Conover, W.J., 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21 (2), 239–245.
  • Müller, J., 2014. MATSuMoTo: the MATLAB surrogate model toolbox for computationally expensive black-box global optimization problems. arXiv preprint arXiv:1404.4261. Available from: https://ccse.lbl.gov/people/julianem/index.html
  • Müller, J. and Piché, R., 2011. Mixture surrogate models based on Dempster-Shafer theory for global optimization problems. Journal of Global Optimization, 51 (1), 79–104. doi:10.1007/s10898-010-9620-y
  • Müller, J. and Shoemaker, C.A., 2014. Influence of ensemble surrogate models and sampling strategy on the solution quality of algorithms for computationally expensive black-box global optimization problems. Journal of Global Optimization, 60 (2), 123–144. doi:10.1007/s10898-014-0184-0
  • Nikolos, I.K., 2013. On the use of multiple surrogates within a differential evolution procedure for high-lift airfoil design. International Journal of Advanced Intelligence Paradigms, 5 (4), 319–341. doi:10.1504/IJAIP.2013.058302
  • Pan, I., et al., 2014. A multi-period injection strategy based optimisation approach using kriging meta-models for CO2 storage technologies. Energy Procedia, 63, 3492–3499. doi:10.1016/j.egypro.2014.11.378
  • Papadopoulou, M.P., Nikolos, I.K., and Karatzas, G.P., 2010. Computational benefits using artificial intelligent methodologies for the solution of an environmental design problem: saltwater intrusion. Water Science and Technology, 62 (7), 1479–1490. doi:10.2166/wst.2010.442
  • Powell, M.J.D., 1992. The theory of radial basis function approximation in 1990. In: W. Light, ed. Advances in numerical analysis, volume 2: wavelets subdivision algorithms and radial basis functions. Oxford, UK: Oxford University Press, 105–210.
  • Rajabi, M.M., Ataie-Ashtiani, B., and Simmons, C.T., 2015. Polynomial chaos expansions for uncertainty propagation and moment independent sensitivity analysis of seawater intrusion simulations. Journal of Hydrology, 520, 101–122. doi:10.1016/j.jhydrol.2014.11.020
  • Rao, S.V.N., et al., 2004. Planning groundwater development in coastal aquifers. Hydrological Sciences Journal, 49 (1), 155–170. doi:10.1623/hysj.49.1.155.53999
  • Rao, S.V.N. and Manju, S., 2007. Optimal pumping locations of skimming wells. Hydrological Sciences Journal, 52 (2), 352–361. doi:10.1623/hysj.52.2.352
  • Razavi, S., Tolson, B.A., and Burn, D.H., 2012a. Review of surrogate modeling in water resources. Water Resources Research, 48 (7). doi:10.1029/2011WR011527
  • Razavi, S., Tolson, B.A., and Burn, D.H., 2012b. Numerical assessment of metamodelling strategies in computationally intensive optimization. Environmental Modelling & Software, 34, 67–86. doi:10.1016/j.envsoft.2011.09.010
  • Regis, R.G., 2011. Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Computers & Operations Research, 38 (5), 837–853. doi:10.1016/j.cor.2010.09.013
  • Regis, R.G., 2014. Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Transactions on Evolutionary Computation, 18 (3), 326–347. doi:10.1109/TEVC.2013.2262111
  • Roy, D.K. and Datta, B., 2017a. Fuzzy c-mean clustering based inference system for saltwater intrusion processes prediction in coastal aquifers. Water Resources Management, 31 (1), 355–376. doi:10.1007/s11269-016-1531-3
  • Roy, D.K. and Datta, B., 2017b. Genetic algorithm tuned fuzzy inference system to evolve optimal groundwater extraction strategies to control saltwater intrusion in multi-layered coastal aquifers under parameter uncertainty. Modeling Earth Systems and Environment, 3, 1–19.
  • Roy, D.K. and Datta, B., 2017c. Multivariate adaptive regression spline ensembles for management of multilayered coastal aquifers. Journal of Hydrologic Engineering, 22 (9), 04017031. doi:10.1061/(ASCE)HE.1943-5584.0001550
  • Roy, D.K. and Datta, B., 2018a. Trained meta-models and evolutionary algorithm based multi-objective management of coastal aquifers under parameter uncertainty. Journal of Hydroinformatics, 20, 1247–1267. doi:10.2166/hydro.2018.087
  • Roy, D.K. and Datta, B., 2018b. A review of surrogate models and their ensembles to develop saltwater intrusion management strategies in coastal aquifers. Earth Systems and Environment, 2, 193–211. doi:10.1007/s41748-018-0069-3
  • Rozos, E., et al., 2004. Calibration of a semi-distributed model for conjunctive simulation of surface and groundwater flows. Hydrological Sciences Journal, 49 (5), 842. doi:10.1623/hysj.49.5.819.55130
  • Sacks, J., et al., 1989. Design and analysis of computer experiments. Statistical Science, 4, 409–423. doi:10.1214/ss/1177012413
  • Shi, R., et al., 2016. An efficient ensemble of radial basis functions method based on quadratic programming. Engineering Optimization, 48 (7), 1202–1225. doi:10.1080/0305215X.2015.1100470
  • Shoemaker, C.A., Regis, R.G., and Fleming, R.C., 2007. Watershed calibration using multistart local optimization and evolutionary optimization with radial basis function approximation. Hydrological Sciences Journal, 52 (3), 450–465. doi:10.1623/hysj.52.3.450
  • Sreekanth, J. and Datta, B., 2010. Multi-objective management of saltwater intrusion in coastal aquifers using genetic programming and modular neural network based surrogate models. Journal of Hydrology, 393 (3–4), 245–256. doi:10.1016/j.jhydrol.2010.08.023
  • Sreekanth, J. and Datta, B., 2011a. Comparative evaluation of genetic programming and neural network as potential surrogate models for coastal aquifer management. Water Resources Management, 25 (13), 3201–3218. doi:10.1007/s11269-011-9852-8
  • Sreekanth, J. and Datta, B., 2011b. Coupled simulation–optimization model for coastal aquifer management using genetic programming-based ensemble surrogate models and multiple-realization optimization. Water Resources Research, 47 (4). doi:10.1029/2010WR009683
  • Sreekanth, J. and Datta, B., 2015. Simulation-optimization models for the management and monitoring of coastal aquifers. Hydrogeology Journal, 23 (6), 1155–1166. doi:10.1007/s10040-015-1272-z
  • Therrien, R., et al., 2006. HydroGeoSphere. Waterloo, Canada: Groundwater Simulations Group, University of Waterloo.
  • Tsoukalas, I., et al., 2016. Surrogate-enhanced evolutionary annealing simplex algorithm for effective and efficient optimization of water resources problems on a budget. Environmental Modelling & Software, 77, 122–142. doi:10.1016/j.envsoft.2015.12.008
  • Tsoukalas, I. and Makropoulos, C., 2015. Multiobjective optimisation on a budget: exploring surrogate modelling for robust multi-reservoir rules generation under hydrological uncertainty. Environmental Modelling & Software, 69, 396–413. doi:10.1016/j.envsoft.2014.09.023
  • Viana, F.A., Gogu, C., and Haftka, R.T., 2010, January. Making the most out of surrogate models: tricks of the trade. In: ASME 2010 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Montreal, Quebec, Canada, 587–598. American Society of Mechanical Engineers.
  • Viana, F.A., Haftka, R.T., and Steffen, V., 2009. Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Structural and Multidisciplinary Optimization, 39 (4), 439–457. doi:10.1007/s00158-008-0338-0
  • Viana, F.A., Haftka, R.T., and Watson, L.T., 2013. Efficient global optimization algorithm assisted by multiple surrogate techniques. Journal of Global Optimization, 56 (2), 669–689. doi:10.1007/s10898-012-9892-5
  • Wang, C., et al., 2014. An evaluation of adaptive surrogate modeling based optimization with two benchmark problems. Environmental Modelling & Software, 60, 167–179. doi:10.1016/j.envsoft.2014.05.026
  • Werner, A.D., et al., 2013. Seawater intrusion processes, investigation and management: recent advances and future challenges. Advances in Water Resources, 51, 3–26. doi:10.1016/j.advwatres.2012.03.004
  • Yadav, B., Mathur, S., and Yadav, B.K., 2018. Data-based modelling approach for variable density flow and solute transport simulation in a coastal aquifer. Hydrological Sciences Journal, 63 (2), 210–226. doi:10.1080/02626667.2017.1413491
  • Zhou, Q., et al., 2017. An active learning radial basis function modeling method based on self-organization maps for simulation-based design problems. Knowledge-Based Systems, 131, 10–27. doi:10.1016/j.knosys.2017.05.025
  • Zhou, X., et al., 2013. Ensemble of surrogates for dual response surface modeling in robust parameter design. Quality and Reliability Engineering International, 29 (2), 173–197. doi:10.1002/qre.1298
