Assessment of the effectiveness of a multi-site stochastic weather generator on hydrological modelling in the Red Deer River watershed, Canada

Pages 1616-1628 | Received 03 Dec 2018, Accepted 16 Jul 2019, Published online: 02 Oct 2019

ABSTRACT

To improve the convergence of multiple-site weather generators (SWGs) based on the brute force algorithm (MBFA), a genetic algorithm (GA) is proposed to search the overall optimal correlation matrix. Precipitation series from weather generators are used as input to the hydrological model, the soil and water assessment tool (SWAT), to generate runoff over the Red Deer watershed, Canada for further runoff analysis. The results indicate that the SWAT model using SWG-generated data accurately represents the mean monthly streamflow for most of the months. The multi-site generators were capable of better representing the monthly streamflow variability, which was notably underestimated by the single-site version. In terms of extreme flows, the proposed method reproduced the observed extreme flow with smaller bias than MBFA, while the single-site generator significantly underestimated the annual maximum flows due to its poor capability in addressing partial precipitation correlations.

Editor A. Castellarin Associate editor A. Requena

1 Introduction

Stochastic weather generators (SWGs) are numerical tools that aim to synthesize long time series of various climate variables, such as precipitation and temperature, which are statistically identical to historical observations (Wilks and Wilby 1999). The Weather Generator (WGEN), developed by Richardson and Wright (1984), and the Long Ashton Research Station-Weather Generator (LARS-WG) (Semenov and Barrow 1997) are two commonly used single-site SWGs. The former is a parametric model, which relies on a first-order two-state Markov chain to generate the occurrence of wet and dry days and a two-parameter gamma distribution to simulate the precipitation amounts on wet days. The latter is a semi-parametric model, relying on semi-empirical distributions to model precipitation occurrence and amounts. Single-site SWGs output spatially independent climate variables, which makes it difficult to match actual climate data that exhibit strong spatial correlations. In the past, various types of multi-site SWGs have been proposed (Baigorria and Jones 2010), including parametric (Wilks 1998, Qian et al. 2002, Brissette et al. 2007, Chen et al. 2014, Evin et al. 2018), non-parametric (Burton et al. 2008, Leander and Buishand 2009) and hybrid (Palutikof et al. 2002, Breinl et al. 2015, 2017) models.

In parametric modelling, the general idea of developing a multi-site SWG is to extend an existing single-site version (e.g. WGEN) by driving it with temporally independent but spatially correlated random numbers. Based on the empirically monotonic relationship between the correlations of the random numbers and the observed correlations of precipitation occurrence and amounts, Wilks (1998) constructed the corresponding empirically derived curves to identify the needed correlation of random numbers. Wilks (1999) and Qian et al. (2002) enhanced the framework of the Wilks (1998) approach by adopting a high-dimensional autoregression technique to produce the non-precipitation variables, including maximum and minimum temperatures as well as daily solar radiation. Even though this approach can take into account the spatial dependence of climatic variables, it has two main drawbacks: (i) the huge computational burden due to the inefficient Monte Carlo sampling needed to construct empirically derived curves for all possible pairs of stations, and (ii) the ill-defined correlation matrix problem that arises when correlations are estimated on a pairwise basis (Mehrotra et al. 2006, Khalili et al. 2007, Chen et al. 2014). Brissette et al. (2007) proposed a brute force algorithm (BFA) to improve the computational efficiency, and used a spectral decomposition method (Rebonato and Jäckel 1999) to solve the problem of ill-defined correlation matrices in the Wilks (1998) approach. The BFA is a recursion formula, capable of continually updating the needed random number correlation matrix with a varying step size and search direction according to the residual errors of the correlation between observed and generated precipitation occurrences or amounts. Chen et al. (2014) coded the methodology of Brissette et al. (2007) in MATLAB and published a MATLAB software package (MulGETS), adding functions for simulating maximum and minimum temperatures. From the perspective of optimization algorithms, the BFA may easily encounter convergence problems, as it is generally challenging to obtain an overall minimum solution. Moreover, the iteration may more often be stopped by the predefined maximum iteration number than by the error tolerance, which increases the risk of obtaining a poor result (Brissette et al. 2007, Chen et al. 2014). In fact, finding temporally independent but spatially correlated random numbers is an unconstrained nonlinear optimization problem. Many heuristic solvers, such as the genetic algorithm (GA), have been reported to be advantageous in dealing with such problems in the hydrological field (Muleta and Nicklow 2005, Arsenault et al. 2014, Chlumecký et al. 2017, Manfreda et al. 2018). Thus, it is desirable to use a similar approach to replace the BFA in the framework of Brissette et al. (2007) for achieving global solutions.

From a practical point of view, SWGs are widely used to generate long climate series, which are used as the input to hydrological models for runoff analysis. Theoretically, the climate data generated by single-site SWGs lose the information on spatial dependence, leading to potential modelling errors, especially when simulating extreme hydrological events (Caron et al. 2008). Khalili et al. (2011) used a spatial autocorrelation-based multi-site SWG and a single-site WGEN, respectively, to drive a physically based distributed hydrological model (HydroTel; Fortin et al. 2001) for streamflow simulation in the Chute du Diable River Watershed (approx. 9700 km²), Quebec, Canada. The study suggested that it may be preferable to use multi-site SWGs, coupled with distributed hydrological models, for hydrological simulation in basins where precipitation varies spatially in a significant manner. More recently, Li et al. (2017) evaluated the multi-site SWG (i.e. MulGETS) and the single-site SWG (i.e. WGEN) in hydrological modelling in the Jing River watershed (approx. 45 421 km²), China. Alodah and Seidou (2019) examined the effects of a single-site SWG (WeaGETS, inspired by the WGEN approach), the multi-site MulGETS and a non-parametric k-nearest-neighbour weather generator on the hydrological output of the soil and water assessment tool (SWAT) for the South Nation watershed in eastern Ontario, Canada (approx. 4000 km²). The above-mentioned three studies concluded that distributed hydrological models driven by multi-site SWGs performed better in simulating extreme streamflow, which was consistent with the findings from Khalili et al. (2011). Even when using a lumped hydrological model, a multi-site SWG can be advantageous in helping evaluate the impact of rainfall–runoff modelling on hydrological output, as reported by Breinl (2016), in two Alpine study areas, the Salzach basin on the border of Germany and Austria (4637 km²) and the Ubaye basin in France (548 km²).

However, some studies have reached different conclusions when evaluating the effect of multi-site SWGs on hydrological modelling. Watson et al. (2005) assessed the response of a distributed hydrological model (SWAT) to the weather data generated by a single-site SWG (Srikanthan 2005) and a multi-site one (Wilks 1998) over the Woady Yaloak River catchment (306 km²) in Australia. Little difference was found between the two models, probably due to the small size of the watershed and the flat topography, which limited the spatial variability of weather variables. Chen et al. (2016) compared two methods for hydrological modelling over the large Lac-Saint-Jean watershed (45 432 km²), Québec, Canada. One method was based on the combination of a single-site SWG (i.e. WGEN) using a lumped approach and a conceptually lumped hydrological model (HSAMI); the other was based on the combination of a multi-site SWG (i.e. MulGETS) and a physically based distributed hydrological model (CEQUEAU; Ayadi and Bargaoui 1998). Both methods performed well in simulating the mean and standard deviation of monthly average and extreme flows, but the advantage of the distributed hydrological model using the multi-site SWG was not obvious, possibly due to the strongly snowmelt-dominated hydrographs in the Lac-Saint-Jean watershed. These studies demonstrated that the performance of multi-site SWGs may be affected by many factors, such as watershed size and topography, climate variability and hydrological type of watershed. More quantitative assessment of the impact of using multi-site SWGs on hydrological modelling is still needed.

Therefore, the objective of this research is twofold. Firstly, a GA is used to search for a globally optimal correlation matrix for generating spatially correlated multi-site precipitation series, replacing the BFA in the framework of Brissette et al. (2007). Its search performance under different error functions and its effectiveness in simulating the spatial correlation of precipitation occurrence and amounts are carefully evaluated. Secondly, the flow predictions from a hydrological model (i.e. SWAT) driven by both single-site and multi-site weather generators are evaluated for the Red Deer River watershed, Alberta, Canada. The statistics of precipitation from the single-site and multi-site weather generators are analysed to give information on their possible effects on hydrological modelling. The hydrological responses of SWAT to the single-site and multi-site weather generators are compared to give suggestions for the choice of a weather generator in this basin.

2 Data and methodology

2.1 Study area

The Red Deer River watershed (50°29′–52°59′N, 110°01′–116°06′W) is located in southern Alberta, Canada (Fig. 1) and covers approximately 47 954 km² (Mishra and Coulibaly 2010). The Red Deer River originates from the eastern slopes of the Rocky Mountains in the Banff National Park near Lake Louise, with a total length of 724 km (Tanzeeba and Gan 2012). The region that this river traverses is mountainous in the west, prairie in the east and cropland in the middle. With a continental climate, the region has an average annual temperature of 4°C and a median annual precipitation of approximately 393 mm (Kerr and Cooke 2017). We used the precipitation data from four climate stations along the mainstream of the Red Deer River. Table 1 lists the background information of the four climate stations. The observed period of all the climate stations covers 24 years from 1979 to 2005 (excluding 1981, 1982 and 1990 due to lack of data). Further information regarding the average monthly precipitation and inter-station statistics can be found in the Supplementary material (Fig. S1). The majority of annual precipitation (about 70%) occurs from May to September and the annual maximum value of the inter-station correlation coefficient is 0.54. According to the Red Deer River State of the Watershed Report (https://www.rdrwa.ca), the multi-year average flow rate of the Red Deer River is about 70 m³/s at the watershed outlet. There have been several high streamflow and flood watch/warning advisories in the upstream of the Red Deer River since 2005, which resulted from snowmelt early in the year and high precipitation events throughout the summer months.

Table 1. Information on the climate stations in this study.

Figure 1. Red Deer River watershed and the used hydro-meteorological stations.


2.2 Hydrological modelling

The well-known SWAT model (Tripathi et al. 2004) is used in this study to simulate hydrological processes. The purpose, characteristics and sources of the primary data used for the SWAT modelling are presented in the Supplementary material (Table S1). The related digital elevation model (DEM), land-use and soil data were derived from the NASA (US National Aeronautics and Space Administration) Shuttle Radar Topographic Mission (SRTM) project (Butt and Bilal 2011), the US Geological Survey Global Land Cover Characterization (GLCC) database (Brown et al. 1999), and the Food and Agriculture Organization of the United Nations (FAO) (Nachtergaele et al. 2009), respectively. The precipitation data at the four stations (Table 1) and other meteorological data (i.e. air temperature, relative humidity, wind speed and solar radiation) were collected from the Government of Canada (http://climate.weather.gc.ca) and the National Centers for Environmental Prediction Climate Forecast System Reanalysis (CFSR) (Dile and Srinivasan 2014). The daily streamflow data from 1979 to 2005 at station 05CK004 (shown in Fig. 1) were obtained from Environment Canada (http://www.ec.gc.ca/rhc-wsc).

Based on the DEM and defined outlets, the Red Deer River watershed was divided into 33 sub-basins (Fig. 1), with areas ranging from 6 to 4535 km² (average: 1398 km²). Each sub-basin was further divided into multiple hydrologic response units (HRUs) based on slope, land use and soil type, resulting in 97 HRUs. The SWAT model was calibrated using the sequential uncertainty fitting algorithm version 2 (SUFI-2) tool (Mousavi et al. 2012, Abbaspour et al. 2015) based on the observed daily streamflow data from the hydrometric station (05CK004). Table 2 presents the description of the main hydrological parameters for calibration and their calibrated values. The Nash-Sutcliffe efficiency (ENS) is used as the index for evaluating model performance (Confesor and Whittaker 2007):

$$E_{NS} = 1 - \frac{\sum_{i} \left(O_i - S_i\right)^2}{\sum_{i} \left(O_i - \bar{O}\right)^2} \tag{1}$$

Table 2. SWAT parameters and the best calibrated values.

where $O_i$ and $S_i$ are, respectively, the observed and simulated streamflow on the $i$th day and $\bar{O}$ is the average value of $O_i$.
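As a brief illustration of Equation (1), the efficiency can be computed from two equal-length series as sketched below. The function name `nash_sutcliffe` and the use of NumPy are assumptions made for this example and are not part of the study's tool chain.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of Equation (1); hypothetical helper name."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    # Ratio of the squared simulation error to the variance of the observations.
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

A perfect simulation gives a value of 1, while a simulation that always returns the observed mean gives 0.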

2.3 Stochastic weather generator

2.3.1 Multi-site precipitation generation based on brute force algorithm (MBFA)

The WGEN uses a first-order two-state Markov chain to determine the daily precipitation occurrence and a two-parameter gamma distribution to generate the daily precipitation amount after a wet day is confirmed (Richardson 1981, Richardson and Wright 1984, Wilks 1998). It is meant for generating precipitation series at a single location and is incapable of addressing spatial correlation for multiple sites. The BFA (Brissette et al. 2007) is used to search for a suitable correlation matrix to produce spatially correlated random numbers, which then drive the precipitation generator. The procedure (Fig. 2) can be briefly described as follows.

Figure 2. Methodology framework of the overall processes in this study. Note: the dotted, dashed, dash-dot and solid arrow lines denote the operation procedures relevant to WGEN, MBFA, MGA and all, respectively.


Firstly, we assume $C_k$ is a candidate correlation matrix of size $n \times n$ (where $n$ is the number of stations) at the $k$th iteration ($k = 0, 1, 2, \ldots$), initialized with the observed correlation matrix of precipitation occurrence or amounts ($C^{obs}$), i.e. $C_0 = C^{obs}$. If $C_k$ is a non-positive-definite matrix, it needs to be converted into a positive-definite one by using the spectral decomposition method (Rebonato and Jäckel 1999). Next, we can produce the matrix of spatially correlated, uniformly distributed random numbers, $R_c^U$, by applying the Cholesky factorization method (Chen et al. 2014) to the positive-definite matrix $C_k$. Then, $R_c^U$ is imported into the WGEN to generate spatially correlated precipitation occurrences or amounts. Meanwhile, we can calculate the generated correlation matrix of precipitation occurrence or amount ($C^{gen}$). In order to evaluate the overall differences between the observed and generated correlation matrices, an error function ($z$) can be defined as (Brissette et al. 2007):

$$z = f\left(c^{\Delta}_{i,j} \,\middle|\, c^{\Delta}_{i,j} \in \Delta C\right), \qquad \Delta C = C^{obs} - C^{gen} \tag{2}$$

where $i, j = 1, 2, \ldots, n$, and the error function, $f(c^{\Delta}_{i,j})$, is defined in Section 2.4. If $z$ is lower than a predefined error tolerance (i.e. $\rho$) or the number of iterations ($k$) exceeds the predefined maximum iteration number (i.e. $\tau$), the iteration process stops. Otherwise, $C_k$ is replaced by $C_{k+1}$ and the iteration process restarts. The relationship between $C_{k+1}$ and $C_k$ is (Brissette et al. 2007):

$$C_{k+1} = C_k + \eta\left(C^{obs} - C^{syn}\right) \tag{3}$$

where $\eta$ is a convergence criterion that affects the convergence speed. Moreover, Brissette et al. (2007) linked the mean precipitation amounts to an occurrence index to generate the daily precipitation amounts, thereby solving the spatial intermittence problem mentioned by Wilks (1998). Brissette et al. (2007) also described the physical meaning and calculation method for the occurrence index, and divided the occurrence index into nine approximately uniform classes for fitting mean precipitation amounts. To avoid the data scattering problem, this study determines the number of classes of the occurrence index based on the number of precipitation events within each class. The maximum number of classes is set to 10; if the number of precipitation events in any class is lower than 30, this class is combined with neighbouring classes until all classes contain at least 30 events. Moreover, this study also used the continuity ratio method, as proposed by Wilks (1998), to judge whether the approach deals appropriately with the spatial intermittence problem.
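The iteration described by Equations (2) and (3) can be sketched for precipitation occurrence as follows. This is a simplified illustration and not the MulGETS code: the function names, the transition probabilities `p01` and `p11`, the series length and the use of the average absolute error as $z$ are all assumptions made for the example, and the eigenvalue-clipping repair only follows the spirit of the spectral decomposition method.

```python
import math

import numpy as np


def repair_correlation(c, eps=1e-8):
    """Clip non-positive eigenvalues, then rescale rows to restore a unit diagonal."""
    vals, vecs = np.linalg.eigh((c + c.T) / 2.0)
    b = vecs * np.sqrt(np.clip(vals, eps, None))
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return b @ b.T


# Standard normal CDF, used to map correlated normals to uniform numbers.
_norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))


def correlated_uniforms(c, n_days, rng):
    """Spatially correlated, temporally independent uniforms via Cholesky."""
    chol = np.linalg.cholesky(repair_correlation(c))
    normals = rng.standard_normal((n_days, c.shape[0])) @ chol.T
    return _norm_cdf(normals)


def markov_occurrence(u, p01, p11):
    """First-order two-state Markov chain: wet if u is below the transition probability."""
    wet = np.zeros(u.shape, dtype=bool)
    wet[0] = u[0] < p01
    for t in range(1, u.shape[0]):
        wet[t] = u[t] < np.where(wet[t - 1], p11, p01)
    return wet


def brute_force_search(c_obs, p01, p11, eta=0.1, rho=1e-3, tau=100,
                       n_days=3000, seed=0):
    """Iterate C_{k+1} = C_k + eta * (C_obs - C_gen) until z < rho or tau is reached."""
    rng = np.random.default_rng(seed)
    c_obs = np.asarray(c_obs, dtype=float)
    c_k = c_obs.copy()                      # C_0 = C_obs
    history = []
    for _ in range(tau):
        wet = markov_occurrence(correlated_uniforms(c_k, n_days, rng), p01, p11)
        c_gen = np.corrcoef(wet.astype(float), rowvar=False)
        delta = c_obs - c_gen
        z = float(np.mean(np.abs(delta)))   # assumed error function (AAE)
        history.append(z)
        if z < rho:
            break
        c_k = c_k + eta * delta
    return c_k, history
```

Because each evaluation of the generated correlation matrix is itself a random simulation, the error series returned in `history` fluctuates from one iteration to the next, which is one reason why the loop is often ended by the iteration limit rather than by the tolerance.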

2.3.2 Multi-site precipitation generation based on genetic algorithm (MGA)

The BFA generally has difficulty converging, and a poor solution may be obtained if the iteration process is forcibly terminated at a predefined iteration number (Chen et al. 2014). In practice, searching for an optimized correlation matrix for generating the spatially correlated multi-site precipitation series is an unconstrained nonlinear optimization problem. The objective is to minimize the error function derived from the difference between the observed and generated correlation matrices of the precipitation series. The corresponding optimization model can be formulated as:

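$$\min \; f\left(c^{\Delta}_{i,j}\right) \tag{4a}$$

subject to

$$C_x = U\left(x_1, x_2, \ldots, x_{n(n-1)/2}\right) + L\left(x_1, x_2, \ldots, x_{n(n-1)/2}\right) + I_n \tag{4b}$$

$$C^{gen}_x = B\left(C_x\right) \tag{4c}$$

$$\Delta C = C^{obs} - C^{gen}_x \tag{4d}$$

$$c^{obs}_{i,j} \le c^{x}_{i,j} \le 1, \quad \forall\, i, j \tag{4e}$$

$$c^{\Delta}_{i,j} \in \Delta C \tag{4f}$$

$$c^{obs}_{i,j} \in C^{obs} \tag{4g}$$

$$c^{x}_{i,j} \in C_x \tag{4h}$$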

where $x_1, x_2, \ldots, x_{n(n-1)/2}$ are the decision variables; $U$ and $L$ are the strictly upper and lower triangular matrices established from the decision variables, respectively; $I_n$ is the identity matrix of order $n$; $C_x$ is the $n \times n$ candidate correlation matrix; $C^{gen}_x$ is the generated correlation matrix of precipitation occurrence or amount based on the candidate correlation matrix $C_x$; and $B$ denotes the method of Brissette et al. (2007) for estimating $C^{gen}_x$, which consists of three main steps: (i) producing spatially correlated random numbers with uniform distribution based on $C_x$, (ii) importing them into a single-site weather generator to generate spatially correlated precipitation occurrence and amounts, and (iii) calculating the correlation matrix of synthetic precipitation occurrence or amounts (i.e. $C^{gen}_x$). Constraint (4c) implies that $C^{gen}_x$ can be generated with $C_x$ as input. Constraint (4d) indicates that $\Delta C$ is the error matrix between $C^{obs}$ and $C^{gen}_x$. Constraint (4e) provides the boundary conditions for the decision variables: the elements of $C_x$ should lie between the corresponding elements of $C^{obs}$ and 1. If $n = 4$, $C_x$ can be written as:

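$$C_x = \begin{bmatrix} 1 & x_1 & x_2 & x_3 \\ x_1 & 1 & x_4 & x_5 \\ x_2 & x_4 & 1 & x_6 \\ x_3 & x_5 & x_6 & 1 \end{bmatrix}, \quad \text{if } n = 4 \tag{11}$$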
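The mapping of constraint (4b), from the $n(n-1)/2$ decision variables to a symmetric matrix with a unit diagonal, can be sketched as below. The helper name `decision_vector_to_corr` is hypothetical; the row-wise filling order of the upper triangle is chosen so that it matches the $n = 4$ layout shown above.

```python
import numpy as np


def decision_vector_to_corr(x, n):
    """Build C_x = U(x) + L(x) + I_n from a vector of n(n-1)/2 decision variables."""
    x = np.asarray(x, dtype=float)
    if x.size != n * (n - 1) // 2:
        raise ValueError("expected n(n-1)/2 decision variables")
    c = np.eye(n)                          # I_n: unit diagonal
    rows, cols = np.triu_indices(n, k=1)   # strictly upper triangle, row by row
    c[rows, cols] = x                      # U(x)
    c[cols, rows] = x                      # L(x): mirror image of U(x)
    return c
```

For $n = 4$, the vector $(x_1, \ldots, x_6)$ fills the positions (1,2), (1,3), (1,4), (2,3), (2,4) and (3,4), together with their mirror images.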

This unconstrained nonlinear optimization problem can be solved by a GA. In a GA, the population of candidate solutions to an optimization problem evolves toward better solutions by mimicking the natural selection of biological evolution (Whitley 1994). The GA is an adaptive heuristic search algorithm and requires adjustment of a number of parameters, such as population size, mutation rate and crossover rate (Karahan et al. 2007). Figure 2 also presents the search process of the GA. Firstly, a set of individuals is generated randomly within the bounds of the decision variables (i.e. constraint (4e)) to initialize the population. Each individual is then transformed into a matrix via constraint (4b). Afterwards, each population contains various candidate correlation matrices ($C_x$), which are fed into constraints (4c) and (4d) in sequence to calculate the value of the objective function and evaluate the difference between the observed and synthetic correlation matrices of the precipitation series. As a result, the fitness score (the same as the value of the objective function) can be obtained. This process repeats through three main genetic operations, i.e. selection, crossover and mutation, until the termination criterion is satisfied. The relationship between the selection, crossover and mutation operations, as well as the mutation and crossover rates, is described in Rajesh et al. (2010). When the GA stops, a close-to-optimal spatially correlated multi-site precipitation series is identified.
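A minimal real-coded GA for model (4a)–(4h) is sketched below, under several loudly stated assumptions. The study ran its GA in MATLAB with a much larger population; here the operators (tournament selection, blend crossover, Gaussian mutation) and all parameter values are illustrative choices. Most importantly, the mapping $B$ is replaced by a hypothetical surrogate, the $(2/\pi)\arcsin$ attenuation that holds for median-dichotomized Gaussian variables, so that the example runs without a weather generator.

```python
import numpy as np


def vector_to_corr(x, n):
    """Constraint (4b): symmetric matrix with unit diagonal from decision variables."""
    c = np.eye(n)
    rows, cols = np.triu_indices(n, k=1)
    c[rows, cols] = x
    c[cols, rows] = x
    return c


def surrogate_generated_corr(c_x):
    """Hypothetical stand-in for B(C_x) in constraint (4c); NOT the WGEN-based mapping."""
    return (2.0 / np.pi) * np.arcsin(np.clip(c_x, -1.0, 1.0))


def aae_fitness(x, c_obs):
    """Objective (4a) with the average absolute error of Delta C (constraint (4d))."""
    c_gen = surrogate_generated_corr(vector_to_corr(x, c_obs.shape[0]))
    return float(np.mean(np.abs(c_obs - c_gen)))


def genetic_search(c_obs, pop_size=80, elite=8, crossover_rate=0.85,
                   mutation_rate=0.1, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    n = c_obs.shape[0]
    lower = c_obs[np.triu_indices(n, k=1)]   # constraint (4e): c_obs <= x <= 1
    upper = np.ones_like(lower)
    pop = rng.uniform(lower, upper, size=(pop_size, lower.size))
    for _ in range(generations):
        fit = np.array([aae_fitness(ind, c_obs) for ind in pop])
        pop = pop[np.argsort(fit)]           # best individuals first
        children = [pop[i].copy() for i in range(elite)]   # elitism
        while len(children) < pop_size:
            # Binary tournament: in a sorted population the smaller index wins.
            p1 = pop[rng.integers(pop_size, size=2).min()]
            p2 = pop[rng.integers(pop_size, size=2).min()]
            child = p1.copy()
            if rng.random() < crossover_rate:
                w = rng.random(lower.size)
                child = w * p1 + (1.0 - w) * p2            # blend crossover
            mutate = rng.random(lower.size) < mutation_rate
            child = child + mutate * rng.normal(0.0, 0.05, lower.size)
            children.append(np.clip(child, lower, upper))  # stay within bounds
        pop = np.array(children)
    fit = np.array([aae_fitness(ind, c_obs) for ind in pop])
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

Because the elite individuals are copied unchanged into the next generation, the best fitness in the population can never get worse from one generation to the next, in contrast to the single-candidate update of the BFA.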

2.4 Approach design

In this study, we adopt five types of error function: (i) maximum absolute error (MAE), (ii) sum of squared error (SSE), (iii) average absolute error (AAE), (iv) root mean square error (RMSE), and (v) sum of mean and standard-deviation of absolute errors (SMSAE). The specific equations are listed as follows:

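$$\text{MAE:} \quad f\left(c^{\Delta}_{i,j}\right) = \max\left|c^{\Delta}_{i,j}\right| \tag{5a}$$

$$\text{SSE:} \quad f\left(c^{\Delta}_{i,j}\right) = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(c^{\Delta}_{i,j}\right)^2 \tag{5b}$$

$$\text{AAE:} \quad f\left(c^{\Delta}_{i,j}\right) = E\left(\left|c^{\Delta}_{i,j}\right|\right) \tag{5c}$$

$$\text{RMSE:} \quad f\left(c^{\Delta}_{i,j}\right) = \sqrt{\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(c^{\Delta}_{i,j}\right)^2} \tag{5d}$$

$$\text{SMSAE:} \quad f\left(c^{\Delta}_{i,j}\right) = E\left(\left|c^{\Delta}_{i,j}\right|\right) + \mathrm{Std}\left(\left|c^{\Delta}_{i,j}\right|\right) \tag{5e}$$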

where $E$ and $\mathrm{Std}$ denote the statistical mean and standard deviation, respectively. In this study, three types of stochastic weather generator are compared: (i) WGEN, (ii) MBFA, which drives the WGEN with spatially correlated random numbers identified by the BFA, and (iii) MGA, which drives the WGEN with spatially correlated random numbers identified by the GA. Corresponding to Equations (5a)–(5e), MGA is further divided into five types, namely MGA1 (based on MAE), MGA2 (based on SSE), MGA3 (based on AAE), MGA4 (based on RMSE) and MGA5 (based on SMSAE). Table 3 shows the differences among the various stochastic weather generators. The outputs from these stochastic weather generators are fed into the SWAT model to evaluate the effects of synthetic precipitation on the simulated streamflow (Fig. 2).
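For reference, the five error functions can be written compactly as below, taking the error matrix $\Delta C$ as input. The function names are hypothetical, and the population form of the standard deviation (NumPy's default) is an assumption, since the text does not state which form is used.

```python
import numpy as np


def mae(delta):
    """(5a) maximum absolute error."""
    return float(np.max(np.abs(delta)))


def sse(delta):
    """(5b) sum of squared errors over all n x n elements."""
    return float(np.sum(np.square(delta)))


def aae(delta):
    """(5c) average absolute error."""
    return float(np.mean(np.abs(delta)))


def rmse(delta):
    """(5d) root mean square error; the mean runs over the n^2 elements."""
    return float(np.sqrt(np.mean(np.square(delta))))


def smsae(delta):
    """(5e) mean plus standard deviation of the absolute errors (ddof=0 assumed)."""
    a = np.abs(delta)
    return float(a.mean() + a.std())
```

Each function returns a single scalar, so any of them can serve directly as the fitness score of a candidate correlation matrix.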

Table 3. Features of various stochastic weather generators.

3 Results and discussion

3.1 SWAT model calibration and validation

Table 2 shows the calibrated parameters of the SWAT model. We set the 11-year dataset from 1980 to 1993 (excluding 1981, 1982 and 1990, for which some meteorological data are missing) as the calibration period, and the 11-year dataset from 1994 to 2004 as the validation period. The observed and simulated daily streamflow for the calibration and validation periods are shown in Fig. 3(a) and (b), respectively. The SWAT model somewhat underestimates the runoff of some peak-flood events, although most simulated runoff closely matches the observed runoff for both calibration and validation periods. Two possible reasons are that (i) the SWAT model did not consider the effect of the Gleniffer Reservoir in the upstream due to a lack of reservoir operation records, and (ii) the calibration placed more emphasis on the overall accuracy than on the peak flows. Overall, the values of ENS in the calibration and validation periods reached 0.52 and 0.57, respectively. This performance is considered satisfactory for this study, as our focus is on evaluating the effect of the spatial correlation of rainfall on hydrological flows.

Figure 3. Observed vs simulated daily streamflow for (a) the calibration period, and (b) the validation period.


3.2 Identification of the best error functions as the objective of MGAs

Figure 4 shows the Taylor diagrams for the monthly inter-station correlations calculated from the observed record and from the six multi-site SWGs. Figure 4(a) and (b) show the precipitation occurrence and amounts, respectively. The Taylor diagram plots three statistics to quantify how realistically each SWG reproduces the observed inter-station correlations: (i) the correlation coefficient (Taylor-CC), related to the azimuthal angle; (ii) the centred root mean square difference (Taylor-CRD), proportional to the distance between each multi-site SWG (represented by different marks on the diagram) and the point labelled “observed”; and (iii) the standard deviation (Taylor-SD), proportional to the radial distance from the origin.
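The three statistics can be computed from a reference series and a generated series as sketched below; the function name and return order are assumptions made for the example. The statistics are linked by the law-of-cosines identity $\mathrm{CRD}^2 = \sigma_g^2 + \sigma_o^2 - 2\sigma_g\sigma_o\,\mathrm{CC}$, where $\sigma_g$ and $\sigma_o$ are the standard deviations of the generated and observed values, which is what allows all three to be shown on a single diagram.

```python
import numpy as np


def taylor_statistics(observed, generated):
    """Return (CC, CRD, SD of generated, SD of observed) for a Taylor diagram."""
    obs = np.asarray(observed, dtype=float).ravel()
    gen = np.asarray(generated, dtype=float).ravel()
    cc = float(np.corrcoef(obs, gen)[0, 1])
    # Centred RMS difference: the means are removed before differencing.
    crd = float(np.sqrt(np.mean(((gen - gen.mean()) - (obs - obs.mean())) ** 2)))
    return cc, crd, float(gen.std()), float(obs.std())
```

A generator that reproduces the observed correlations well therefore plots close to the "observed" point: high CC, low CRD and a standard deviation near the observed one.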

Figure 4. Taylor diagrams displaying a statistical comparison of the monthly inter-station correlations calculated by the observed and generated precipitation: (a) occurrence and (b) amounts.


It can be seen from Fig. 4(a) that the values of Taylor-SD of all the multi-site SWGs are higher than that of the observed record. For MBFA, the values of Taylor-CC, Taylor-CRD and Taylor-SD are approx. 0.68, 0.059 and 0.078, respectively. All MGAs have a higher Taylor-CC and a lower Taylor-CRD than the MBFA, implying that the MGAs perform better than MBFA in generating precipitation occurrence. The MGA2, MGA3 and MGA5 models are found to have slightly higher Taylor-CC values, and MGA1 and MGA4 have low Taylor-SD values. In Fig. 4(b), the values of Taylor-CC, Taylor-CRD and Taylor-SD for MBFA are approx. 0.52, 0.106 and 0.125, respectively. The MGA3 and MGA5 models have the highest Taylor-CC values, but MGA3 shows higher variation (i.e. Taylor-SD) and a greater distance from the observed point (i.e. Taylor-CRD) than MGA5. It is also noted that MGA3 has Taylor-SD and Taylor-CRD levels similar to those of MBFA, but a slightly higher Taylor-CC value.

Figure 5 shows boxplots of errors (both residual and absolute errors) for the monthly inter-station correlations of the generated precipitation occurrence and amounts. Figure 5(a) and (c) are based on precipitation occurrence, and Fig. 5(b) and (d) are based on precipitation amounts. In the boxplots, the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively; the difference between the 75th and 25th percentiles is defined as the interquartile range (IQR); and the central black line within the box indicates the median. The whiskers above and below the box extend to the most extreme data not considered outliers, and the outliers are marked as black points above and/or below the whiskers. MGA2 and MGA4 in Fig. 5(a) and MGA1 and MGA2 in Fig. 5(b) may have an overestimation problem, as the majority of their boxes are higher than zero. Moreover, MBFA, MGA1, MGA3 and MGA5 in Fig. 5(a) have similar ranges and locations of boxes and whiskers. From Fig. 5(b), MGA3 is identified to be superior to the other three SWGs since it has the smallest IQR with a close-to-zero median. In terms of absolute error for rainfall occurrence (Fig. 5(c)), the five MGAs perform at relatively similar levels but they are generally superior to MBFA due to their lower median and narrower IQR and whiskers. From Fig. 5(d), MGA3 is also seen to be the best alternative, judging from its lowest median, percentile and whisker levels. The selection of the fitness function is crucial for the search performance of the GA (Chlumecký et al. 2017). The reason that MGA3 shows the best performance might be that the average absolute error is the most suitable GA fitness function for searching the optimized correlation matrix used to generate the spatially correlated multi-site precipitation series in this basin.

Figure 5. Boxplots of residual errors for the monthly inter-station correlations of the generated precipitation: (a) occurrence and (b) amounts, and absolute errors for the correlation coefficient of the generated precipitation: (c) occurrence and (d) amounts.


3.3 Performance of MGA3, MBFA and WGEN

The results shown in the Taylor diagrams and boxplots demonstrate that the AAE error function, used in MGA3, is the best option among all MGAs to serve as the objective function for producing spatially correlated precipitation series. In the following sections, we compare the differences among MGA3, MBFA and WGEN. Figure 6 presents scatterplots of the observed vs generated inter-station correlations of precipitation occurrence and amounts for all station pairs over the Red Deer River watershed. The correlation of precipitation occurrence is computed at a monthly scale, whereas the correlation of precipitation amounts is calculated at a seasonal scale. This is because (i) MGA3, MBFA and WGEN work on a monthly basis to estimate the precipitation transition probabilities of the Markov chains used to generate the occurrence, and (ii) for MGA3 and MBFA, a seasonal link between the occurrence index and the precipitation mean value can be used to avoid the spatial intermittence problem in the precipitation amount generation process. In Fig. 6, most inter-station correlations of occurrence and amounts generated by WGEN are close to zero, indicating that WGEN is incapable of reproducing inter-station correlations. Both MBFA and MGA3 can reasonably reproduce the observed inter-station correlations of both precipitation occurrence and amounts, but the MGA3 points are scattered more closely along the 1:1 benchmark line than those of MBFA, which is consistent with the results in Section 3.2.

Figure 6. Observed vs generated inter-station correlations for the precipitation: (a) occurrence and (b) amounts.


Figure 7. Converging processes of MBFA and MGA3 in generating precipitation: (a) occurrence and (b) amounts.

The basic statistics of the observed against generated precipitation series for all seasons and station pairs are shown in the Supplementary material (Fig. S2(a)–(c)). The three SWGs perform similarly in terms of mean, standard deviation and maximum daily precipitation, mainly because all generators use similar techniques for both precipitation occurrence and amount generation. The larger values of mean precipitation are overestimated. In this study, we used the default gamma distribution to simulate the precipitation amounts. For a comparison of the fitting performance between the gamma distribution and three other distributions (i.e. the exponential, Weibull and generalized Pareto distributions), the reader is referred to Figs. S3–S6 in the Supplementary material, while Fig. S7 shows the basic statistics of the observed against generated precipitation series using these four distributions. All four distributions fit the observed precipitation data acceptably, but bias remains in the mean and maximum values of the generated precipitation data. The performance of the exponential, Weibull and generalized Pareto distributions is not notably superior to that of the gamma distribution. The overestimation of the larger values of mean precipitation may be caused by the stochastic nature of precipitation generation for this basin. The continuity ratios of the observed against generated precipitation series for all seasons and all combinations of station pairs are presented in Fig. S2(d). The results indicate that both MBFA and MGA3 reproduce the observed continuity ratios well, implying that they are capable of addressing spatial intermittence, although the MGA3 scatter points seem to lie closer to the 1:1 line than those of MBFA. WGEN performs the poorest in reproducing the observed continuity ratios, as it does not take spatial intermittence into consideration.
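As an illustration of the continuity ratio discussed above, the following minimal sketch assumes the commonly used definition for a station pair: the mean wet-day amount at station k on days when station l is dry, divided by the mean wet-day amount at k on days when l is wet. The function name, the wet-day threshold of 0.0 mm and the five-day series are hypothetical and are not taken from this study.

```python
from statistics import mean

def continuity_ratio(precip_k, precip_l, wet_threshold=0.0):
    """Mean wet-day amount at station k on days when station l is dry,
    divided by the mean wet-day amount at k on days when l is wet.
    The threshold separating wet from dry days is an assumption."""
    k_wet_l_dry = [pk for pk, pl in zip(precip_k, precip_l)
                   if pk > wet_threshold and pl <= wet_threshold]
    k_wet_l_wet = [pk for pk, pl in zip(precip_k, precip_l)
                   if pk > wet_threshold and pl > wet_threshold]
    if not k_wet_l_dry or not k_wet_l_wet:
        raise ValueError("both conditional samples must be non-empty")
    return mean(k_wet_l_dry) / mean(k_wet_l_wet)

# Hypothetical daily precipitation (mm) at two stations.
station_k = [1.0, 6.0, 0.0, 3.0, 4.0]
station_l = [0.0, 2.0, 1.0, 0.0, 5.0]
ratio = continuity_ratio(station_k, station_l)
```

A ratio below 1 indicates that station k tends to receive less precipitation when the neighbouring station is dry, which is the spatial intermittence that a single-site generator cannot represent.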

For the MBFA method, we set the convergence criterion, η, to 0.1. Two types of termination criteria are adopted, either when the maximum iteration number (i.e. τ=1000) is reached, or when a predefined error tolerance (i.e. ρ=103) is exceeded. For MGA3, the population size, elite count, crossover rate and mutation rate for GA operations are set to 2000, 200, 0.85 and 0.01, respectively; the corresponding termination criterion is the maximum generation number (set to 1000). Figure 7 shows the search processes of MBFA and MGA3 in generating the precipitation occurrence and amounts. For further comparison of MBFA with the other MGAs (i.e. MGA1, MGA2, MGA4 and MGA5), the reader is referred to the Supplementary material (Fig. S8). For both MBFA and MGA3, the AAE shows an overall decreasing and converging trend, with rapid fluctuations throughout the process. The magnitude of these fluctuations is larger for MBFA, especially for the precipitation amounts. This implies that the searching process of MBFA exhibits a higher level of uncertainty and easily leads to a poorer solution when terminated by the maximum number of iterations. MGA3, on the other hand, notably alleviates this problem. Generally, Fig. 7 demonstrates that MGA3 may converge more slowly, but can reach a solution with a better objective value. This is consistent with the findings from and , where the inter-station correlations could be better regenerated through MGA3. In terms of hardware and software settings, MBFA and MGA3 (on the MATLAB platform) were run on a workstation with 96 GB RAM and an AMD Opteron 2.6 GHz processor (16 cores, 32 logical processors). We used the Parallel Computing Toolbox of MATLAB and executed MGA3 on a parallel pool of 30 workers. Table 4 shows the computational costs of MBFA and MGA3. It took a total of 24.76 h for MGA3, including 2.43 h generating the precipitation occurrence and 22.33 h generating the precipitation amount. Compared with MBFA, MGA3 is clearly more computationally intensive. Although the 24.76 h calculation time is acceptable in this study, and could be further shortened if more CPU cores were available, the large computational burden is a conspicuous drawback of MGAs.
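To make the roles of the GA settings above more concrete, the following minimal sketch runs an elitist GA on a toy problem. It is not the MATLAB implementation used in this study: the objective is a simple sum of absolute differences standing in for the error function, the six target correlations (one per pair of four stations) are hypothetical, the population size and elite count are scaled down by a factor of 10, and the constraint that a correlation matrix be positive semi-definite is ignored. Only the crossover rate (0.85) and mutation rate (0.01) follow the values quoted in the text.

```python
import random

def abs_error(candidate, target):
    """Toy objective: sum of absolute differences (a stand-in error function)."""
    return sum(abs(c - t) for c, t in zip(candidate, target))

def run_ga(target, pop_size=200, elite_count=20, crossover_rate=0.85,
           mutation_rate=0.01, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Each candidate is a vector of pairwise correlations in [-1, 1].
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best objective value at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda c: abs_error(c, target))
        history.append(abs_error(pop[0], target))
        # Elitism: the best candidates survive unchanged.
        next_pop = [c[:] for c in pop[:elite_count]]
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=lambda c: abs_error(c, target))
            p2 = min(rng.sample(pop, 2), key=lambda c: abs_error(c, target))
            if rng.random() < crossover_rate:
                child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            # Gaussian mutation applied gene by gene, clipped to [-1, 1].
            child = [min(1.0, max(-1.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=lambda c: abs_error(c, target))
    history.append(abs_error(best, target))
    return best, history

# Hypothetical target correlations for the six pairs of four stations.
target = [0.62, 0.48, 0.35, 0.55, 0.41, 0.58]
best, history = run_ga(target)
```

Because the elite candidates are copied unchanged, the best objective value in `history` never increases from one generation to the next. This gives one intuition for a smoother search trajectory, although it does not by itself explain the behaviour reported for MGA3.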

Table 4. Computational costs of MBFA and MGA3.

3.4 Hydrological responses to different SWGs

To evaluate the effect of SWGs on hydrological processes, the three SWGs (i.e. WGEN, MBFA and MGA3) are used to generate 200-year precipitation series, which serve as weather input to drive the SWAT model for simulating runoff. Rather than using the observed streamflow directly from the hydrometric station (05CK004), we use the runoff simulated by the SWAT model with the 24-year observed precipitation data as the benchmark. This reduces the uncertainty/error from hydrological modelling and allows a fairer comparison among generators. Figure 8 shows the mean and standard deviation of monthly streamflow simulated by SWAT using the observed and generated precipitation series. The mean monthly streamflow is accurately represented by the data from the three SWGs during January to June and September to December, but overestimated during July and August (Fig. 8(a)). This is probably because of the overestimation of the larger precipitation values during the summer period (see Fig. S2). In terms of standard deviation, MGA3 performs well in most months except for May, while MBFA seems to slightly overestimate from April to December. WGEN considerably underestimates the monthly flow variability, especially from May to July; the main reason is that WGEN generates precipitation series without considering any spatial dependence, which may attenuate the peak precipitation values.
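The attenuation effect mentioned above can be illustrated with a standard result for the variance of an average. The sketch below assumes an idealized case that is not taken from this study: equally weighted stations, equal variance at every station and a single common pairwise correlation. The correlation value of 0.6 is hypothetical.

```python
def areal_mean_variance(n_stations, station_variance, correlation):
    """Variance of the equally weighted mean of n series that share one
    variance and one common pairwise correlation (idealized assumption)."""
    return station_variance * (1 + (n_stations - 1) * correlation) / n_stations

# Four stations with unit variance at each station.
independent = areal_mean_variance(4, 1.0, 0.0)  # no spatial dependence
correlated = areal_mean_variance(4, 1.0, 0.6)   # hypothetical correlation
```

Under these assumptions the areal mean of four independent series retains a quarter of the single-station variance, whereas a pairwise correlation of 0.6 retains about 70% of it. This is in line with the lower flow variability obtained when precipitation is generated independently at each site.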

Figure 8. Statistics of (a) mean and (b) standard deviation of monthly streamflow simulated by SWAT using observed and generated precipitation series. OBS: observed precipitation series.

Figure 9 shows the frequency curves of annual maximum flows simulated by SWAT using the observed 24-year and generated 200-year precipitation series. The frequency curves are constructed using the Pearson type III distribution (Koutrouvelis and Canavos Citation1999), as it has the most satisfactory fitting performance compared to the extreme value distribution, Weibull distribution and exponential distribution. For detailed fitting performance, the reader is referred to the Supplementary material (Figs. S9 and S10). WGEN significantly underestimates the extreme flows, while MBFA and MGA3 show slight underestimation. This is somewhat consistent with where high flows in one location of the watershed may be offset by low flows in other locations due to independent precipitation series generated from WGEN. As the inter-station correlations are reasonably reproduced for the observed rainfall time series, the underestimation of rainfall extremes using the gamma distribution most likely leads to the underestimation of the extreme flows by both MBFA and MGA3. In addition, compared with MBFA, the frequency curve of MGA3 is closer to the observed one, indicating that MGA3 could reproduce the observed extreme flows with a smaller bias. Overall, the multiple-site SWGs are clearly more advantageous than single-site ones in simulating extreme flows.
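A frequency curve of this kind can be sketched as follows. The fitting procedure used in this study is not described here, so the example substitutes a simple alternative: method-of-moments estimates combined with the Wilson-Hilferty approximation of the Pearson type III frequency factor. The 24 annual maximum flows are hypothetical values, and all function names are illustrative.

```python
from statistics import NormalDist, mean, stdev

def sample_skew(values):
    """Bias-adjusted sample skewness coefficient."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

def frequency_factor(skew, return_period):
    """Wilson-Hilferty approximation of the Pearson type III frequency factor."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the normal quantile
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)

def return_level(values, return_period):
    """Flow with the given return period: mean + K_T * standard deviation."""
    k_t = frequency_factor(sample_skew(values), return_period)
    return mean(values) + k_t * stdev(values)

# Hypothetical annual maximum flows (m^3/s) for a 24-year record.
annual_maxima = [310, 420, 280, 650, 390, 510, 300, 720, 450, 360, 980, 410,
                 330, 560, 470, 290, 610, 380, 840, 430, 350, 520, 400, 690]
levels = {T: return_level(annual_maxima, T) for T in (2, 10, 50, 100)}
```

Applying the same function to the annual maxima simulated from each generator would give curves that can be compared at common return periods, which is the type of comparison summarized in the frequency curves.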

Figure 9. Annual maximum flows over different return periods using observed and generated precipitation series. OBS: observed precipitation series.

4 Conclusion

In this research work, we compared the performance of GA with the BFA in the framework of Brissette et al. (Citation2007) in regard to finding suitable correlation matrices and random numbers for multi-site SWG. We designed five types of error function, MAE, SSE, AAE, RMSE and SMSAE, as the alternative objectives of GA, and found that the MGA with the AAE (i.e. MGA3) could reproduce the observed precipitation series with the best spatial correlations. It should be noted that the selection of error function affects the fitness score, and a suitable error function should be tested for specific watersheds. We compared the performance of MGA3, MBFA and WGEN in the simulation of precipitation series at four stations distributed over the Red Deer watershed, Alberta, Canada. Because the three SWGs adopted similar techniques, namely a two-state Markov chain coupled with a gamma distribution in generating precipitation series, the results exhibit similar basic statistics, i.e. mean, standard deviation and maximum value. However, all three generators somewhat overestimated the larger values of mean precipitation, due to statistical fitting and stochastic sampling. Furthermore, by linking average precipitation to the occurrence index as suggested by Brissette et al. (Citation2007), both MBFA and MGA3 reproduced the observed continuity ratio well. As multiple-site SWGs, MBFA and MGA3 both reproduced the observed monthly inter-station correlations of precipitation occurrence and amounts well. Through scatterplots and Taylor diagram analysis, the MGA3-generated inter-station correlations of precipitation occurrence and amounts turned out to be closer to the observed records than the MBFA-generated ones. This is due to the better searching capability of GAs for the correlation matrix. It was also found that the larger magnitude of the fluctuations of MBFA compared with MGA3 would lead to a higher level of uncertainty in searching for suitable solutions.

After validation of the SWAT model for the Red Deer watershed, we assessed the hydrological responses to the precipitation series generated from MGA3, MBFA and WGEN. Using the data from the three SWGs, the SWAT model accurately represented the monthly mean streamflow for most months, except for some overestimation in July and August. For the standard deviation, MGA3 performed well in almost all months except for May, and MBFA slightly overestimated the monthly streamflow variability from April to December. However, WGEN considerably underestimated the variability of monthly flows, especially from May to July. Compared with MBFA, MGA3 reproduced the observed extreme flow with smaller bias. WGEN significantly underestimated the annual maximum flows due to its poor capability in addressing precipitation correlations. In general, spatial correlation was found to be essential in generating precipitation series for multiple sites in the studied Red Deer watershed, for a better representation of flow characteristics and flood peaks.

The main contributions of this research work are: (i) comparison of the performance of GA with BFA in the framework of the Brissette et al. (Citation2007) multiple-site SWG to search for a global optimal solution for generating spatially correlated precipitation series; and (ii) the assessment of the effects of the multiple-site SWG on hydrological responses in the Red Deer River watershed, which is characterized by large size and a humid continental climate. The proposed method has a more stable convergence than the framework proposed by Brissette et al. (Citation2007) and could be used as a viable alternative in regenerating the spatial correlation, basic statistics and extreme data of weather variables. However, the method was compromised by a higher computational need due to GA iterations. Based on our test, GA could work well with fewer than 10 stations within a manageable amount of time; but when the number of stations increases, the computational burden may unavoidably increase. Setting up a MATLAB distributed computing server to run MGA3 on computer clusters, clouds and grids might be a potential solution.

Acknowledgements

The authors are grateful to the associate editor and the reviewers for their insightful comments and suggestions, which have greatly helped to improve the paper.

Disclosure statement

No potential conflict of interest was reported by the authors.

Supplementary Material

Supplemental data for this article can be accessed here.

Additional information

Funding

This project was supported by Start-Up Grant [M4081327.030] from School of Civil and Environmental Engineering, Nanyang Technological University, Singapore.

References

  • Abbaspour, K., et al., 2015. A continental-scale hydrology and water quality model for Europe: calibration and uncertainty of a high-resolution large-scale SWAT model. Journal of Hydrology, 524, 733–752. doi:10.1016/j.jhydrol.2015.03.027
  • Alodah, A. and Seidou, O., 2019. The adequacy of stochastically generated climate time series for water resources systems risk and performance assessment. Stochastic Environmental Research and Risk Assessment, 33 (1), 253–269. doi:10.1007/s00477-018-1613-2
  • Arsenault, R., et al., 2014. Comparison of stochastic optimization algorithms in hydrological model calibration. Journal of Hydrologic Engineering, 19 (7), 1374–1384. doi:10.1061/(ASCE)HE.1943-5584.0000938
  • Ayadi, M. and Bargaoui, Z., 1998. Modelling of flow of the Miliane River using the CEQUEAU model. Hydrological Sciences Journal/Journal Des Sciences Hydrologiques, 43 (5), 741–758. doi:10.1080/02626669809492170
  • Baigorria, G.A. and Jones, J.W., 2010. GiST: a stochastic model for generating spatially and temporally correlated daily rainfall data. Journal of Climate, 23 (22), 5990–6008. doi:10.1175/2010JCLI3537.1
  • Breinl, K., 2016. Driving a lumped hydrological model with precipitation output from weather generators of different complexity. Hydrological Sciences Journal, 61 (8), 1395–1414. doi:10.1080/02626667.2015.1036755
  • Breinl, K., et al., 2017. Can weather generation capture precipitation patterns across different climates, spatial scales and under data scarcity? Scientific Reports, 7 (1), 5449. doi:10.1038/s41598-017-05822-y
  • Breinl, K., Turkington, T., and Stowasser, M., 2015. Simulating daily precipitation and temperature: a weather generation framework for assessing hydrometeorological hazards. Meteorological Applications, 22 (3), 334–347. doi:10.1002/met.1459
  • Brissette, F., Khalili, M., and Leconte, R., 2007. Efficient stochastic generation of multi-site synthetic precipitation data. Journal of Hydrology, 345 (3–4), 121–133. doi:10.1016/j.jhydrol.2007.06.035
  • Brown, J.F., et al., 1999. The global land-cover characteristics database: the users’ perspective. Photogrammetric Engineering and Remote Sensing, 65 (9), 1069–1074.
  • Burton, A., et al., 2008. RainSim: a spatial-temporal stochastic rainfall modelling system. Environmental Modelling & Software, 23 (12), 1356–1369. doi:10.1016/j.envsoft.2008.04.003
  • Butt, M.J. and Bilal, M., 2011. Application of snowmelt runoff model for water resource management. Hydrological Processes, 25 (24), 3735–3747. doi:10.1002/hyp.v25.24
  • Caron, A., Leconte, R., and Brissette, F., 2008. An improved stochastic weather generator for hydrological impact studies. Canadian Water Resources Journal, 33 (3), 233–256. doi:10.4296/cwrj3303233
  • Chen, J., Brissette, F.P., and Zhang, X.J., 2014. A multi-site stochastic weather generator for daily precipitation and temperature. Transactions of the ASABE, 57 (5), 1375–1391.
  • Chen, J., Brissette, F.P., and Zhang, X.J., 2016. Hydrological modeling using a multisite stochastic weather generator. Journal of Hydrologic Engineering, 21 (2), 04015060. doi:10.1061/(ASCE)HE.1943-5584.0001288
  • Chlumecký, M., Buchtele, J., and Richta, K., 2017. Application of random number generators in genetic algorithms to improve rainfall-runoff modelling. Journal of Hydrology, 553, 350–355. doi:10.1016/j.jhydrol.2017.08.025
  • Confesor, R.B. and Whittaker, G.W., 2007. Automatic calibration of hydrologic models with multi-objective evolutionary algorithm and Pareto optimization. Journal of the American Water Resources Association, 43 (4), 981–989. doi:10.1111/jawr.2007.43.issue-4
  • Dile, Y.T. and Srinivasan, R., 2014. Evaluation of CFSR climate data for hydrologic prediction in data-scarce watersheds: an application in the Blue Nile River Basin. Journal of the American Water Resources Association, 50 (5), 1226–1241. doi:10.1111/jawr.12182
  • Evin, G., Favre, A.-C., and Hingray, B., 2018. Stochastic generation of multi-site daily precipitation focusing on extreme events. Hydrology and Earth System Sciences, 22 (1), 655–672. doi:10.5194/hess-22-655-2018
  • Faramarzi, M., et al., 2015. Setting up a hydrological model of Alberta: data discrimination analyses prior to calibration. Environmental Modelling & Software, 74, 48–65. doi:10.1016/j.envsoft.2015.09.006
  • Fortin, J.-P., et al., 2001. Distributed watershed model compatible with remote sensing and GIS data. I: description of model. Journal of Hydrologic Engineering, 6 (2), 91–99. doi:10.1061/(ASCE)1084-0699(2001)6:2(91)
  • Her, Y., Cibin, R., and Chaubey, I., 2015. Application of parallel computing methods for improving efficiency of optimization in hydrologic and water quality modeling. Applied Engineering in Agriculture, 31 (3), 455–468.
  • Karahan, H., Ceylan, H., and Tamer Ayvaz, M., 2007. Predicting rainfall intensity using a genetic algorithm approach. Hydrological Processes, 21 (4), 470–475. doi:10.1002/(ISSN)1099-1085
  • Kerr, J.G. and Cooke, C.A., 2017. Erosion of the Alberta badlands produces highly variable and elevated heavy metal concentrations in the Red Deer River, Alberta. Science of the Total Environment, 596, 427–436.
  • Khalili, M., Brissette, F., and Leconte, R., 2011. Effectiveness of multi-site weather generator for hydrological modeling. Journal of the American Water Resources Association, 47 (2), 303–314.
  • Khalili, M., Leconte, R., and Brissette, F., 2007. Stochastic multisite generation of daily precipitation data using spatial autocorrelation. Journal of Hydrometeorology, 8 (3), 396–412.
  • Koutrouvelis, I.A. and Canavos, G.C., 1999. Estimation in the Pearson type 3 distribution. Water Resources Research, 35 (9), 2693–2704.
  • Leander, R. and Buishand, T.A., 2009. A daily weather generator based on a two-stage resampling algorithm. Journal of Hydrology, 374 (3–4), 185–195.
  • Li, Z., et al., 2017. Links between the spatial structure of weather generator and hydrological modeling. Theoretical and Applied Climatology, 128 (1–2), 103–111.
  • Manfreda, S., et al., 2018. Exploiting the use of physical information for the calibration of a lumped hydrological model. Hydrological Processes, 32 (10), 1420–1433.
  • Mehrotra, R., Srikanthan, R., and Sharma, A., 2006. A comparison of three stochastic multi-site precipitation occurrence generators. Journal of Hydrology, 331 (1–2), 280–292.
  • Mishra, A.K. and Coulibaly, P., 2010. Hydrometric network evaluation for Canadian watersheds. Journal of Hydrology, 380 (3–4), 420–437.
  • Mousavi, S.J., et al., 2012. Uncertainty-based automatic calibration of HEC-HMS model using sequential uncertainty fitting approach. Journal of Hydroinformatics, 14 (2), 286–309.
  • Muleta, M.K. and Nicklow, J.W., 2005. Sensitivity and uncertainty analysis coupled with automatic calibration for a distributed watershed model. Journal of Hydrology, 306 (1), 127–145.
  • Nachtergaele, F., et al., 2009. Harmonized world soil database. Wageningen: ISRIC. Available from: http://www.fao.org/land-water/en/.
  • Palutikof, J., et al., 2002. Generating rainfall and temperature scenarios at multiple sites: examples from the Mediterranean. Journal of Climate, 15 (24), 3529–3548.
  • Qian, B., Corte-Real, J., and Xu, H., 2002. Multisite stochastic weather models for impact studies. International Journal of Climatology, 22 (11), 1377–1397.
  • Rajesh, M., Kashyap, D., and Hari Prasad, K., 2010. Estimation of unconfined aquifer parameters by genetic algorithms. Hydrological Sciences Journal–Journal Des Sciences Hydrologiques, 55 (3), 403–413.
  • Rebonato, R. and Jäckel, P., 1999. The most general methodology to create a valid correlation matrix for risk management and option pricing purposes. Journal of Risk, 2, 17–27.
  • Richardson, C.W., 1981. Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resources Research, 17 (1), 182–190.
  • Richardson, C.W. and Wright, D.A., 1984. WGEN: a model for generating daily weather variables. United States Department of Agriculture, Agriculture Research Service, ARS-8, August 1984.
  • Rouhani, H., et al., 2007. Parameter estimation in semi-distributed hydrological catchment modelling using a multi-criteria objective function. Hydrological Processes, 21 (22), 2998–3008.
  • Semenov, M.A. and Barrow, E.M., 1997. Use of a stochastic weather generator in the development of climate change scenarios. Climatic Change, 35 (4), 397–414.
  • Srikanthan, R., 2005. Stochastic generation of daily rainfall data using a nested transition probability matrix model. In: 29th Hydrology and Water Resources Symposium: Water Capital, Rydges Lakeside, Canberra. Engineers Australia, 26.
  • Tanzeeba, S. and Gan, T.Y., 2012. Potential impact of climate change on the water availability of South Saskatchewan River Basin. Climatic Change, 112 (2), 355–386.
  • Tripathi, M., et al., 2004. Hydrological modelling of a small watershed using generated rainfall in the soil and water assessment tool model. Hydrological Processes, 18 (10), 1811–1821.
  • Wang, X. and Melesse, A., 2005. Evaluation of the SWAT model’s snowmelt hydrology in a northwestern Minnesota watershed. Transactions of the ASAE, 48 (4), 1359–1376.
  • Watson, B., et al., 2005. Hydrologic response of SWAT to single site and multi-site daily rainfall generation models. In: Proceedings of MODSIM05 international congress on modelling and simulation, Melbourne, Australia, 2981–2987.
  • Whitley, D., 1994. A genetic algorithm tutorial. Statistics and Computing, 4 (2), 65–85.
  • Wilks, D., 1998. Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology, 210 (1), 178–191.
  • Wilks, D.S., 1999. Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agricultural and Forest Meteorology, 96 (1), 85–101.
  • Wilks, D.S. and Wilby, R.L., 1999. The weather generation game: a review of stochastic weather models. Progress in Physical Geography, 23 (3), 329–357.
