Research Article

Assessing different roles of baseflow and surface runoff for long-term streamflow forecasting in southeastern China

Pages 2312-2329 | Received 15 Mar 2021, Accepted 08 Sep 2021, Published online: 11 Nov 2021

ABSTRACT

Accurate long-term streamflow forecasting is essential to alleviate and solve the water security problems related to flood and drought disaster warnings. In this study, a new strategy for forecasting monthly streamflow is proposed, and four scenarios are designed to evaluate the different roles of baseflow and surface runoff in the performance of long-term streamflow forecasting. The developed models are evaluated at multiple streamflow sites located in Zhejiang Province, China. The results show that artificial intelligence (AI)-based models with two predictor variables (i.e. baseflow and surface runoff) performed better than those with a single predictor (streamflow) for all months of the year, and that the prediction accuracy of annual peak and monthly streamflow values is improved. Based on comprehensive evaluations of all the models, baseflow and surface runoff values are recommended as inputs to AI-based models for improved prediction accuracy of streamflows.

Editor A. Fiori Associate editor M. Ionita

1 Introduction

The development of streamflow forecasting models with high prediction accuracy has always been an essential task for water resources and disaster management. Typically, hydrological forecasting can be divided into short-term and long-term predictions; the lead time of the former is within 72 hours, while that of the latter is greater than or equal to one month. Streamflow forecasts are critical for flood and drought disaster management (Sattari et al. 2012, Turner et al. 2017, Kao et al. 2020), safe and economic operation of reservoirs (Block 2011, Humphrey et al. 2016, Yang et al. 2017, Zhang et al. 2018), and optimal allocation of water resources (Mendoza et al. 2017, Kratzert et al. 2018, Tan et al. 2018). Several studies have reported the use of long-term streamflow forecasting models in major basins around the world in recent years, such as the Yangtze River (China) (Zhou et al. 2018), Yellow River (China) (Wang et al. 2018), Mahanadi River (India) (Sahoo et al. 2019), South Korean basins (Ajmal et al. 2015), eastern Australia basins (Loveridge et al. 2017), Sicilian River (Italy) (Pumo et al. 2016), and Gediz Basin (Turkey) (Turan and Yurdusev 2014).

Long-term streamflow forecasting models can be classified as physics-based and data-driven models. The latter do not consider the physical mechanisms and processes responsible for streamflow generation. Data-driven models include time series analysis (Cooper et al. 2018, Papacharalampous and Tyralis 2020, Kim et al. 2021), regression-based models (Fashae et al. 2019, Zuo et al. 2020), artificial neural networks (ANNs) (Cheng et al. 2020a, Hassan and Hassan 2020, Lv et al. 2020), and support vector machines (SVMs) (Bhandari 2019, Yu et al. 2020). Time series models, such as the autoregressive moving average (ARMA) model (Box and Jenkins 2010) and the autoregressive integrated moving average (ARIMA) model (Carlson et al. 1970), use historical observations of hydrological elements to explore their evolution and forecast hydrological processes. Regression analysis is a method that considers the changes in the predicted objects due to the influencing factors. The ANN is a nonlinear system composed of many neurons, which has excellent self-learning and self-adaptive performance and is widely used in streamflow and precipitation forecasting (Li et al. 2019); ANN variants include the long short-term memory (LSTM) model, the gated recurrent unit (GRU) model, and the back-propagation (BP) model. An SVM is a generalized linear classifier that classifies data according to supervised learning, and its decision boundary is the maximum margin hyperplane (Zhu et al. 2016).

Several long-term streamflow forecasting models are commonly used for flood and drought management, but their forecasting accuracy still has much room for improvement. Furthermore, due to the influences of climate change, human activities, and the complexity of the hydrological process itself, these models cannot fully capture all the characteristics of hydrological processes. Data-driven forecasting models can be improved by incorporating variables that are linked to streamflow generating mechanisms and can help explain their variability in time. Ensuring the inclusion of causal variables (i.e. variables that influence another variable) in data-driven models can help capture the physical mechanisms responsible for the behaviour of the predictand (i.e. streamflow).

To characterize the complete set of hydrological processes contributing to streamflow generation and improve the prediction accuracy of the models in long-term streamflow forecasts, baseflow and surface runoff are used and evaluated as predictor variables in multiple data-driven models in this study. Surface runoff is the water that contributes to river flow immediately after rainfall events, as quick flow. Baseflow is the less undulating part of the hydrograph when the river is in the dry season and is the primary source of water supply that helps maintain reasonable river flow conditions. Baseflow separation approaches that partition the streamflow into different components, such as the surface flow and subsurface flow (Ahiablame et al. 2013), can help estimate baseflow for use in data-driven models. Due to differences in basin attributes, the estimates from different baseflow separation methods may differ. Only a few studies (Corzo and Solomatine 2007, Tongal and Booij 2018, Huang et al. 2020) related to short-term streamflow forecasts have used both baseflow and surface runoff characteristics. Very few studies in long-term streamflow forecasting have used both baseflow and surface runoff as predictor variables, even though baseflow can be a more important causal variable explaining the variations in streamflows.

Therefore, the main focus of this study is to evaluate the efficacy of data-driven forecasting models that use baseflow, surface runoff, and streamflow variables as inputs; the models are evaluated for forecast skill and for their utility in water resources management. The three main objectives of this study are: (1) to use an appropriate baseflow separation method to separate the streamflow into baseflow and surface runoff; (2) to evaluate the performance of different models in long-term streamflow forecasting; and (3) to test the forecast results of baseflow, surface runoff, and streamflow as different predictor variables in artificial intelligence (AI)-based models.

2 Methodology

As an initial step of the forecasting model development, the existence or non-existence of trends in the available time series of streamflow needs to be evaluated using a non-parametric statistical hypothesis test (i.e. the Mann-Kendall (MK) test). Descriptions of the MK test and the different AI-based models developed in this study are provided next. Figure 1 illustrates the methodology adopted in this study for forecasting monthly streamflow and evaluating the models using four scenarios for different temporal and spatial scales.

Figure 1. Methodology using long-term streamflow forecasting models with four scenarios


2.1 Mann-Kendall test

The MK test is a rank-based nonparametric test that detects linear and nonlinear trends, where the null and alternative hypotheses correspond to the non-existence and existence of a trend in a time series, respectively (Mann 1945). The MK test has been used in several studies for trend and stationarity analysis of meteorological and hydrological time series. When the test statistic (Z) takes a value of Z > 0, it can be concluded that the time series shows an upward trend. When Z < 0, the time series shows a decreasing trend. The test can be conducted at a significance level of 5% to draw inferences about statistically significant trends. Climate change and human activities have a great impact on the regional hydrological process, and streamflow is becoming non-stationary. Trend analysis is important since the stationarity assessment is the premise of this study. Moreover, it can further illustrate the applicability of forecasting models when the streamflow series is stationary or non-stationary.
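As an illustration of how the Z statistic described above can be obtained, the following minimal sketch computes the standard MK statistic S and its normal approximation. It assumes a series without tied values (no tie correction is applied), and the function names `mann_kendall_z` and `is_significant` are hypothetical helpers, not part of the study's code. The critical value 1.96 corresponds to a two-sided test at the 5% significance level.

```python
import math


def mann_kendall_z(series):
    """Return the Mann-Kendall Z statistic (sketch without tie correction)."""
    n = len(series)
    # S sums the signs of all pairwise differences x_j - x_i with j > i.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (no ties assumed).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def is_significant(z, z_crit=1.96):
    """Two-sided decision at the 5% level (z_crit = 1.96)."""
    return abs(z) > z_crit
```

A positive Z with `is_significant(z)` true would be read as a significant upward trend, and a negative Z as a downward trend.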

2.2 Forecasting models

2.2.1 Back-propagation neural network

The BP neural network is divided into input, hidden, and output layers, and the network layers are linked by weights and biases. The BP prediction model is based on an analysis of the error between the training output and the expected results; the weights and thresholds are then modified, step by step, to obtain a model whose output is consistent with the desired results (Riad et al. 2004, Vivekanandan 2011). The algorithm of the BP long-term streamflow forecasting model consists of forward and backward propagation. In the forward propagation, the output $z_j$ of the $j$-th neuron in the hidden layer is as shown below:

$$z_j = f\left(\sum_{i=1}^{m} w_{i,j} x_i + b_j\right) \tag{1}$$

where $f(\cdot)$ is the excitation function of the neurons and $b_j$ represents the bias. After the hidden layer mapping, $z_j$ is directly used as the input of the output layer, and the output layer is used for nonlinear fitting. At this point, the output $\hat{Y}_k$ of the $k$-th neuron in the output layer is:

$$\hat{Y}_k = \sum_{j=1}^{m} w_{j,k} z_j + b_k \tag{2}$$

where $b_k$ denotes the bias from the hidden layer to the output layer. The error $e_k$ between measured and predicted streamflow is expressed as follows:

$$e_k = Y_k - \hat{Y}_k \tag{3}$$

where $Y_k$ indicates the expected output. The objective function of back-propagation is the sample mean square error $E$ of model training:

$$E = \frac{1}{2}\sum_{k=1}^{m}\left(Y_k - \hat{Y}_k\right)^2 = \frac{1}{2}\sum_{k=1}^{m} e_k^2 \tag{4}$$

The BP neural network is trained with an iterative learning algorithm that uses a gradient descent strategy, adjusting the parameters along the negative gradient direction of the objective function in each iteration of back-propagation. The input layer to hidden layer weight $w_{i,j}$ and bias $b_j$ are updated to:

$$w_{i,j} = w_{i,j} + \eta\, z_j (1 - z_j)\, x_i \sum_{k=1}^{m} w_{j,k} e_k \tag{5}$$

$$b_j = b_j + \eta\, z_j (1 - z_j) \sum_{k=1}^{m} w_{j,k} e_k \tag{6}$$

The weight of the hidden layer to the output layer $w_{j,k}$ and bias $b_k$ are renewed to:

$$w_{j,k} = w_{j,k} + \eta\, z_j e_k \tag{7}$$

$$b_k = b_k + \eta\, e_k \tag{8}$$

where $\eta$ is the learning rate.
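The forward pass and update rules in Equations (1) to (8) can be sketched for a single training sample as follows. This is a minimal pure-Python illustration, assuming a sigmoid excitation function in the hidden layer (which is what the factor z(1 - z) in the updates implies) and a linear output layer. The names `forward` and `train_step`, the weight layout `w_ih[i][j]` and `w_ho[j][k]`, and the learning rate value are assumptions made for this example only.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_ih, b_h, w_ho, b_o):
    # Eq. (1): hidden-layer outputs z_j.
    z = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(len(x))) + b_h[j])
         for j in range(len(b_h))]
    # Eq. (2): linear output layer.
    y_hat = [sum(w_ho[j][k] * z[j] for j in range(len(z))) + b_o[k]
             for k in range(len(b_o))]
    return z, y_hat


def train_step(x, y, w_ih, b_h, w_ho, b_o, eta=0.1):
    """One gradient-descent update in place; returns the loss before the update."""
    z, y_hat = forward(x, w_ih, b_h, w_ho, b_o)
    e = [y[k] - y_hat[k] for k in range(len(y))]            # Eq. (3)
    # Error signal sent back to each hidden neuron, using the old w_ho.
    back = [sum(w_ho[j][k] * e[k] for k in range(len(e)))
            for j in range(len(z))]
    for j in range(len(z)):
        grad = z[j] * (1.0 - z[j]) * back[j]
        for i in range(len(x)):
            w_ih[i][j] += eta * grad * x[i]                  # Eq. (5)
        b_h[j] += eta * grad                                 # Eq. (6)
        for k in range(len(e)):
            w_ho[j][k] += eta * z[j] * e[k]                  # Eq. (7)
    for k in range(len(e)):
        b_o[k] += eta * e[k]                                 # Eq. (8)
    return 0.5 * sum(v * v for v in e)                       # Eq. (4)
```

Calling `train_step` repeatedly on the same sample should drive the returned loss towards zero, which is a quick sanity check that the signs of the updates are consistent with gradient descent.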

2.2.2 LSTM network

LSTM is a special type of recurrent neural network (RNN), which is divided into three layers with a repeated chain structure (Werbos 1990, Hochreiter and Schmidhuber 1997). Compared with a standard RNN, the LSTM has a more complex hidden layer, which adds a cell state and gate structure to control the flow of memory information and realize the long-term transmission and memory of knowledge. Hence, the LSTM model is more suitable than a standard RNN for processing sequences with longer time dependencies and can carry information across many time steps. The structure of the LSTM model mainly includes the following five parts:

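  (1) The forgetting gate expression is as follows:

$$f_t = \sigma\left(W_f h_{t-1} + U_f x_t + b_f\right) \tag{9}$$

  (2) The input gate function is shown below:

$$i_t = \sigma\left(W_i h_{t-1} + U_i x_t + b_i\right) \tag{10}$$

  (3) The candidate gate expression is calculated as:

$$\tilde{C}_t = \tanh\left(W_c h_{t-1} + U_c x_t + b_c\right) \tag{11}$$

  (4) The cell state renewal equation is expressed as:

$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \tag{12}$$

  (5) The output gate function is calculated using the following equations:

$$O_t = \sigma\left(W_o h_{t-1} + U_o x_t + b_o\right) \tag{13}$$

$$h_t = O_t \times \tanh\left(C_t\right) \tag{14}$$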

where $W_f$, $W_i$, $W_c$, and $W_o$ are the weight matrices between the forgetting, input, candidate, and output gates and the hidden state $h_{t-1}$ of the previous time step; $U_f$, $U_i$, $U_c$, and $U_o$ represent the weight matrices between the input $x_t$ and the forgetting, input, candidate, and output gates; $b_f$, $b_i$, $b_c$, and $b_o$ denote the bias vectors of the forgetting, input, candidate, and output gates, respectively; $i_t$ indicates the output value of the input gate at the current time step; $\sigma$ is the sigmoid function; tanh is the excitation function; $O_t$ is the output gate, which controls the output state of the cell; and $h_t$ is the resulting hidden state.
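A single LSTM time step following Equations (9) to (14) can be sketched as below. For readability the sketch uses scalar states and weights rather than matrices, and it assumes the convention stated in the definitions above: the W parameters multiply the previous hidden state and the U parameters multiply the current input. The function name `lstm_step` and the parameter dictionary keys are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p maps names such as 'Wf' or 'bo' to floats."""
    f_t = sigmoid(p["Wf"] * h_prev + p["Uf"] * x_t + p["bf"])        # Eq. (9)
    i_t = sigmoid(p["Wi"] * h_prev + p["Ui"] * x_t + p["bi"])        # Eq. (10)
    c_tilde = math.tanh(p["Wc"] * h_prev + p["Uc"] * x_t + p["bc"])  # Eq. (11)
    c_t = f_t * c_prev + i_t * c_tilde                               # Eq. (12)
    o_t = sigmoid(p["Wo"] * h_prev + p["Uo"] * x_t + p["bo"])        # Eq. (13)
    h_t = o_t * math.tanh(c_t)                                       # Eq. (14)
    return h_t, c_t
```

Feeding a monthly series through `lstm_step` one value at a time, while passing `h_t` and `c_t` forward, is what lets the cell state carry information over many time steps.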

2.2.3 GRU network

The GRU network simplifies the LSTM model in that it only retains the update gate and reset gate (Cho et al. 2014). GRU directly takes the long-term memory of a particular moment as the output and modifies the long-term memory while outputting. Therefore, GRU has fewer inputs than the LSTM network (LSTM has three inputs, whereas GRU has only two) and a more straightforward structure, which reduces computation time. The GRU network combines the input gate and forgetting gate into the update gate $z$, and the cell state and output gate into the reset gate $r$. The reset and update gates are calculated as follows, where $\odot$ is the Hadamard product:

$$r_t = \sigma\left(W_r x_t + U_r h_{t-1}\right) \tag{15}$$

$$z_t = \sigma\left(W_z x_t + U_z h_{t-1}\right) \tag{16}$$

where $x_t$ is the input of the current time step and $h_{t-1}$ represents information from the previous time step; the sigmoid function $\sigma$ maps the gate values to [0, 1]. The state information $h_t$ at the current time step is shown below:

$$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t \tag{17}$$

The reset gate $r$ controls how much of the information in $h_{t-1}$ that is useful to $x_t$ is passed on, and then resets it. The equation for calculating the candidate hidden layer, which the tanh function scales to [−1, 1], is as follows:

$$\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right)\right) \tag{18}$$

where $r_t$ determines the amount of information retained from the previous time step.
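The GRU update in Equations (15) to (18) can be sketched in the same scalar form, where the Hadamard product reduces to ordinary multiplication. The function name `gru_step` and the parameter keys are assumptions made for this illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x_t, h_prev, p):
    """One scalar GRU step; p maps 'Wr', 'Ur', 'Wz', 'Uz', 'Wh', 'Uh' to floats."""
    r_t = sigmoid(p["Wr"] * x_t + p["Ur"] * h_prev)              # Eq. (15)
    z_t = sigmoid(p["Wz"] * x_t + p["Uz"] * h_prev)              # Eq. (16)
    h_tilde = math.tanh(p["Wh"] * x_t + p["Uh"] * (r_t * h_prev))  # Eq. (18)
    return z_t * h_prev + (1.0 - z_t) * h_tilde                  # Eq. (17)
```

With this form of Equation (17), an update gate close to 1 keeps the previous state almost unchanged, while a value close to 0 replaces it with the candidate state.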

2.2.4 SVM model

The SVM approach uses a learning algorithm based on statistical learning theory, the structural risk minimization principle, and the idea of duality (Vapnik 1998, Chapelle et al. 2002). The SVM model has great generalizability and unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems. The goal of the SVM model is to obtain the optimal solution under limited sample information rather than the optimum as the sample size tends to infinity. The algorithm is then transformed into a quadratic optimization problem to seek the global optimal solution and avoid local extrema. The algorithm maps the practical problem to a high-dimensional space through a nonlinear transformation, turning the nonlinear discriminant function of the original space into a linear function. In hydrological forecasting, the regression form of the SVM forecasting model is mainly used to fit the hydrological series.

In the case of linear separability, the separation of binary decision classes is expressed below:

$$y = \omega_0 + \omega_1 x_1 + \omega_2 x_2 \tag{19}$$

where $y$ is the output vector, $x_i$ is defined as the characteristic value, and $\omega_i$ indicates the weight value of the plane. In the case of nonlinear separability, the high-dimensional maximum hyperplane boundary equation can be shown below:

$$y = b + \sum_{i=0} a_i y_i K\left(x_i, x\right) \tag{20}$$

where $K(x_i, x)$ refers to the kernel function. There are many kinds of kernel functions, such as linear, polynomial, perceptron, and radial basis kernel functions.
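To make the kernel form of Equation (20) concrete, the sketch below evaluates it with a radial basis kernel. The support vectors, coefficients, labels, bias, and the kernel width `gamma` would normally come from solving the quadratic optimization problem; here they are simply passed in, and all names (`rbf_kernel`, `decision`) are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Radial basis kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support_vectors, coeffs, labels, b, kernel=rbf_kernel):
    """Evaluate y = b + sum_i a_i * y_i * K(x_i, x) as in Eq. (20)."""
    return b + sum(a * y * kernel(sv, x)
                   for sv, a, y in zip(support_vectors, coeffs, labels))
```

Swapping `kernel` for a linear or polynomial function changes the shape of the fitted boundary without changing the rest of the expression.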

2.2.5 Holt-Winters model

The Holt-Winters (HW) model is a triple (third-order) exponential smoothing algorithm, which divides the time series data into three parts: residual data $a_t$, trend data $b_t$, and seasonal data $s_t$ (Kays et al. 2018). The principle of the exponential smoothing method is that the smoothed value of any period is the weighted average of the actual observation of the current period and the smoothed value of the previous period. Triple exponential smoothing considers the seasonality of the time series, and the iterative formula of each part is shown below:

$$a_t = \alpha\left(Y_t - s_{t-k}\right) + \left(1 - \alpha\right)\left(a_{t-1} + b_{t-1}\right) \tag{21}$$

$$b_t = \beta\left(a_t - a_{t-1}\right) + \left(1 - \beta\right) b_{t-1} \tag{22}$$

$$s_t = \gamma\left(Y_t - a_t\right) + \left(1 - \gamma\right) s_{t-k} \tag{23}$$

where $\alpha$, $\beta$, and $\gamma$ are model parameters, which range between 0 and 1; $k$ is the length of the seasonal cycle; and $Y_t$ is the value of the series at time step $t$. If there are periodic abrupt increases and drops in the sequence, the seasonal component of the HW model retains these steep increases and drops so that they can be accurately predicted without raising a false alarm.
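One iteration of the HW recursions in Equations (21) to (23) can be sketched as follows. The one-step-ahead forecast shown in `one_step_forecast` uses the standard additive form (level plus trend plus the matching seasonal term), which is an assumption here because the forecast equation itself is not given in the text. Function names are hypothetical.

```python
def holt_winters_update(y_t, a_prev, b_prev, s_season, alpha, beta, gamma):
    """Apply Eqs. (21)-(23); s_season is the seasonal term s_{t-k}."""
    a_t = alpha * (y_t - s_season) + (1.0 - alpha) * (a_prev + b_prev)  # Eq. (21)
    b_t = beta * (a_t - a_prev) + (1.0 - beta) * b_prev                 # Eq. (22)
    s_t = gamma * (y_t - a_t) + (1.0 - gamma) * s_season                # Eq. (23)
    return a_t, b_t, s_t


def one_step_forecast(a_t, b_t, s_next_season):
    """Assumed additive forecast: level + trend + seasonal term for t + 1."""
    return a_t + b_t + s_next_season
```

For example, with an observation of 12, a previous level of 10, a trend of 1, a seasonal term of 1, and all three parameters set to 0.5, the update returns a level of 11 with the trend and seasonal terms unchanged at 1.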

2.2.6 SARIMA model

The ARIMA model is one of the most widely used time series forecasting methods, and the seasonal autoregressive integrated moving average (SARIMA) model is based on it (Phan and Nguyen 2020). Usually, the ARIMA model is represented by ARIMA(p,d,q), where AR refers to the autoregressive process of the model, p is the number of autoregressive terms, MA represents the moving average process, q is the number of moving average terms, and d is the number of non-seasonal differences needed to achieve stationarity. Based on this, the SARIMA model considers seasonal factors and is expressed as SARIMA(p,d,q)(P,D,Q). The calculation process of the SARIMA model is shown below:

$$A_p(L)\, B_P(L^s)\, \Delta^d \Delta_s^D\, y_t = C_q(L)\, D_Q(L^s)\, \mu_t \tag{24}$$

where

$$A_p(L) = 1 - a_1 L - a_2 L^2 - \cdots - a_p L^p \tag{25}$$

$$B_P(L^s) = 1 - b_1 L^s - b_2 L^{2s} - \cdots - b_P L^{Ps} \tag{26}$$

$$C_q(L) = 1 + c_1 L + c_2 L^2 + \cdots + c_q L^q \tag{27}$$

$$D_Q(L^s) = 1 + d_1 L^s + d_2 L^{2s} + \cdots + d_Q L^{Qs} \tag{28}$$

$$\Delta = 1 - L, \quad \Delta_s = 1 - L^s \tag{29}$$

$$\mu_t \sim \mathrm{IID}\left(0, \sigma^2\right) \tag{30}$$

$y_t$ is the original data sequence, $L$ denotes the lag operator, $s$ represents the seasonal period of the sequence, $\Delta$ indicates the non-seasonal difference, $\Delta_s$ refers to the seasonal difference with period $s$, $d$ and $D$ are the orders of non-seasonal and seasonal differencing, respectively, and $\mu_t$ is defined as the white noise in the time series.
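The differencing operators on the left-hand side of Equation (24) can be illustrated with a short sketch that applies the seasonal difference D times and the non-seasonal difference d times. The helper names `difference` and `sarima_difference` are hypothetical, and the default period of 12 is an assumption suited to monthly data.

```python
def difference(series, lag=1):
    """Apply (1 - L^lag) once: y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=1, D=1, s=12):
    """Apply the seasonal difference D times, then the ordinary difference d times."""
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=s)   # seasonal difference with period s
    for _ in range(d):
        out = difference(out, lag=1)   # non-seasonal difference
    return out
```

A series made of a linear trend plus a fixed 12-month pattern is reduced to zeros by one seasonal and one ordinary difference, which is the sense in which differencing removes trend and seasonality before the ARMA terms are fitted.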

2.3 Baseflow separation methods

Baseflow separation methods can be roughly divided into five categories: graphic, hydrological model, analytical, isotope, and numerical simulation methods (Cheng et al. 2016). The graphic method is the basic method of baseflow separation; it is conceptually simple and mainly relies on manual visual judgement, which is subjective and makes it difficult to process long hydrological time series. Hydrological modelling and analytical methods often require more parameters, which are sometimes difficult to estimate from available data or field measurements. The isotope method requires field-measured observations and laboratory analysis and is an expensive way to estimate baseflow. The numerical simulation method is the most appropriate considering its ease of use and its utility in processing large streamflow datasets for baseflow estimation. It is fast, efficient, and useful for batch processing of long series of hydrological data, and it has been widely used in multiple studies (Cartwright et al. 2014, Lott and Stewart 2016, Chen and Teegavarapu 2020, Xie et al. 2020). The numerical method used in the current study is the Lyne-Hollick (LH) method (Lyne and Hollick 1979).

LH was first used in baseflow separation by Nathan and McMahon (1990). The LH method is a digital filtering method based on signal analysis, and its main function is to divide the streamflow data into the fast response part (direct runoff) and the slow response part (baseflow) (Cheng et al. 2020b). The LH method is one of the most widely used baseflow separation methods, and it has been shown to have good applicability, objectivity, and repeatability in many river basins (Cheng et al. 2012, Ahiablame et al. 2013, Lucas et al. 2021). Therefore, this relatively reliable LH method is selected for baseflow separation in this study. The following equation can be used to calculate the surface runoff:

$$Q_{s,i} = \alpha\, Q_{s,i-1} + \frac{1 + \alpha}{2}\left(Q_i - Q_{i-1}\right) \tag{31}$$

where $Q_i$ is streamflow; $Q_{s,i}$ represents surface runoff; $i$ denotes the time step; and $\alpha$ is a recession constant in the range of 0.9 to 0.95; 0.925 is used in this study as suggested by Nathan and McMahon (1990). Many studies (Longobardi et al. 2018, Cheng et al. 2020b, Lucas et al. 2021) have shown that the filtered results obtained with this value are close to the actual process. According to the above calculation, the baseflow ($Q_{b,i}$) can be calculated as:

$$Q_{b,i} = Q_i - Q_{s,i} \tag{32}$$

The LH method mainly uses Equations (31) and (32) to calculate the baseflow, and the number of filter passes has a significant impact on the smoothness of the resulting baseflow series. The purpose of recalculating the baseflow from back to front is to eliminate distorted data from the first calculation. Following the usual practice of alternating forward and backward filtering, this study used three rounds of filtering.
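A three-pass version of the LH filter (forward, backward, forward) can be sketched as follows. Several details are assumptions that the text does not specify: the quick flow is initialized at zero, it is constrained to lie between zero and the flow being filtered (a common practical safeguard), and each later pass filters the baseflow produced by the previous pass. The function names `lh_pass` and `lyne_hollick` are hypothetical.

```python
def lh_pass(q, alpha=0.925):
    """One filter pass over q using Eqs. (31) and (32); returns the baseflow."""
    quick = [0.0] * len(q)  # assumed initial quick flow of zero
    for i in range(1, len(q)):
        val = alpha * quick[i - 1] + 0.5 * (1.0 + alpha) * (q[i] - q[i - 1])
        # Assumed constraint: quick flow stays within [0, q_i].
        quick[i] = min(max(val, 0.0), q[i])
    return [qi - qs for qi, qs in zip(q, quick)]


def lyne_hollick(streamflow, alpha=0.925, passes=3):
    """Alternate forward and backward passes; returns (baseflow, surface runoff)."""
    base = list(streamflow)
    for n in range(passes):
        if n % 2 == 1:
            base = lh_pass(base[::-1], alpha)[::-1]  # backward pass
        else:
            base = lh_pass(base, alpha)              # forward pass
    surface = [q - b for q, b in zip(streamflow, base)]
    return base, surface
```

Under these assumptions the baseflow never exceeds the streamflow, and the two components always sum to the original series, so they can be used directly as the two predictor series.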

2.4 Scenario setting

Four scenarios (referred to as S1, S2, S3, and S4) for long-term streamflow forecasting are listed in Table 1. Lagged values of monthly streamflow, baseflow, and surface runoff are used as predictors, and the predictand (i.e. output) in all the scenarios is the monthly streamflow. Scenario S1 produces a monthly streamflow forecast using monthly streamflow series data as input, and this scenario setting has been adopted in many studies (Robertson and Wang 2012, Zhao et al. 2016, Tongal and Booij 2018). For scenario S1, six forecasting models (LSTM, GRU, BP, SVM, HW, and SARIMA) are used to forecast long-term streamflow and compare the prediction accuracy of each model. Scenario S2 performs long-term streamflow prediction using monthly surface runoff as the predictor, and scenario S3 predicts monthly streamflow using monthly baseflow series data. These two scenarios make it possible to attribute changes in forecasting performance to either baseflow or surface runoff. In scenario S4, forecasting models use both monthly baseflow and surface runoff as inputs to predict the output (monthly streamflow). This setting can improve the prediction accuracy when the flow in the previous month is particularly large, because the different hydrological processes of baseflow and surface runoff are both used in the monthly streamflow forecasting. Scenarios S2, S3, and S4 use the ANN models (LSTM, GRU, and BP) to evaluate the influence of different forecast factors on the prediction accuracy of each model.
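The input sets of the four scenarios can be sketched as lagged predictor matrices. The number of lags (`n_lags`) is an assumption made for illustration, as are the helper names `scenario_inputs` and `lagged_samples`; the text does not state how many antecedent months are used.

```python
def scenario_inputs(streamflow, baseflow, surface, scenario):
    """Return the predictor series used in scenario S1, S2, S3, or S4."""
    options = {
        "S1": [streamflow],
        "S2": [surface],
        "S3": [baseflow],
        "S4": [baseflow, surface],
    }
    return options[scenario]


def lagged_samples(predictors, target, n_lags=3):
    """Build (X, y): each row holds the previous n_lags values of every predictor."""
    X, y = [], []
    for t in range(n_lags, len(target)):
        row = []
        for series in predictors:
            row.extend(series[t - n_lags:t])
        X.append(row)
        y.append(target[t])  # predictand: monthly streamflow at time t
    return X, y
```

In every scenario the target passed to `lagged_samples` is the monthly streamflow; only the predictor series change, which is what allows the scenarios to be compared on equal terms.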

Table 1. Inputs and outputs in the four scenarios devised in this study

2.5 Forecasting model evaluation criteria

Error and performance measures are generally used to evaluate the model prediction skill. In this study, the percent bias (Bias%) and Nash-Sutcliffe efficiency coefficient (NSEC) are used for the evaluation of forecasting models. The Bias% is used to assess the difference between the observed and predicted values of streamflows from different models, and the calculation of the measure is based on Equation (33):

$$\mathrm{Bias\%} = 100 \times \frac{\sum_{i=1}^{n} Q_i^{\mathrm{est}} - \sum_{i=1}^{n} Q_i^{\mathrm{obs}}}{\sum_{i=1}^{n} Q_i^{\mathrm{obs}}} \tag{33}$$

where $Q_i^{\mathrm{est}}$ is the forecast streamflow, $Q_i^{\mathrm{obs}}$ is the observed streamflow, and $n$ is the total number of streamflow time series values. The differences between simulated and observed values can be evaluated using this deviation percentage index. If the Bias% is close to 0, the performance of the forecasting model can be considered good.

NSEC is a normalized statistic and a classical statistical index used to evaluate the performance of the model. The NSEC is calculated as:

$$\mathrm{NSEC} = 1 - \frac{\sum_{i=1}^{n}\left(Q_i^{\mathrm{obs}} - Q_i^{\mathrm{est}}\right)^2}{\sum_{i=1}^{n}\left(Q_i^{\mathrm{obs}} - \bar{Q}^{\mathrm{obs}}\right)^2} \tag{34}$$

where $\bar{Q}^{\mathrm{obs}}$ is defined as the mean of all observations. The range of NSEC is from negative infinity to 1; a value close to 1 means that the quality of the model is credible.
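Both evaluation measures follow directly from their definitions and can be sketched in a few lines. The function names `percent_bias` and `nsec` are hypothetical.

```python
def percent_bias(obs, est):
    """Bias% = 100 * (sum of forecasts - sum of observations) / sum of observations."""
    return 100.0 * (sum(est) - sum(obs)) / sum(obs)


def nsec(obs, est):
    """Nash-Sutcliffe efficiency coefficient of forecasts est against obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - e) ** 2 for o, e in zip(obs, est))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den
```

A perfect forecast gives a Bias% of 0 and an NSEC of 1, whereas a forecast that always returns the observed mean gives an NSEC of 0, which is a useful reference point when reading the reported values.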

3 Case study and data preparation

3.1 Study area

The study area used for the evaluation of the forecasting models is located in Zhejiang Province, China, as shown in Figure 2. Zhejiang Province is located in the southeastern part of the Chinese mainland, bordering the East China Sea. The terrain varies from west to east, and it has a humid monsoon climate with an average annual precipitation of more than 1600 mm. The mountainous landform in the region forms many rivers, including Tiaoxi, Jinghang Canal (Zhejiang section), Qiantang River, Yong River, Ling River, Ou River, Feiyu River, and Ao River as the eight major water systems. There are four rivers with a basin area of more than 10 000 km², and 21 basins have areas between 1000 and 10 000 km². Most rivers in Zhejiang Province have the following characteristics in common: (1) massive floods in the flood or wet season, rapid flood concentration, and high rise of the flood; (2) small flow in the dry season, with some of the small rivers being cut off in dry years; (3) significant tidal influence, sizeable tidal range, and long distances between tidal regions, which cause unfavourable effects on tide prevention.

Figure 2. Location map of the study area and the streamflow gauge stations


Shaduan and Bozhiao hydrological stations are located in the middle and upper regions, respectively, of Jiao River Basin; the Jiao River is in the central coastal area of Zhejiang Province between 28°22ʹ and 29°19ʹN and between 120°14ʹ and 121°55ʹE. The total area of the basin is around 6603 km2, and the total length of the river is close to 206 km. The upper regions of the Jiao River Basin are affected by the landforms of cut and broken hills, and the streams are distributed vertically and horizontally. The trunk and all tributaries are mountainous rivers with a steep slope and rapid flow, and the flood rises and falls suddenly. The main river channel in the lower section has a gentle slope, which is a tidal reach.

Jinhua gauge station is a control station in Qiantang River Basin. The station is located in the upper reaches of the main stream, with a basin area of 5953 km2, a length of 160 km, an average width of 38 km, and an average height of 27.4 m. This river is a mountain stream with a short source, rapid flow, and sudden rise (fall) characteristics. The Changchunling hydrological station is located in Zhoushan Islands, and the control area of the basin is about 3.9 km2. The northern island has low-lying and sparsely distributed features, which belong to the subtropical monsoon climate.

3.2 Observational data

Daily discharge data from four hydrological stations [Shaduan (SD), Bozhiao (BZA), Jinhua (JH), and Changchunling (CCL)] are used to obtain the monthly streamflow data series. Seventy percent of the available data is used for training and validation, and the rest of the data is used for testing to evaluate the models developed in this study, as shown in Table 2. This study focuses on the evaluation of model performance during the validation and testing periods.
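The data preparation described above can be sketched as a monthly aggregation followed by a chronological split. Two points are assumptions: the daily values are aggregated by their monthly mean (the text does not say whether means or totals are used), and the 70% portion is taken from the start of the record. The helper names `monthly_means` and `chronological_split` are hypothetical.

```python
def monthly_means(records):
    """records: iterable of (year, month, day, discharge); returns sorted monthly means."""
    totals = {}
    for year, month, _day, q in records:
        key = (year, month)
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + q, count + 1)
    return [(key, total / count) for key, (total, count) in sorted(totals.items())]


def chronological_split(series, train_fraction=0.7):
    """Keep the time order: the first part for training/validation, the rest for testing."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]
```

Splitting in time order rather than at random avoids using information from later months when fitting a model that is then tested on earlier ones.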

Table 2. Data information for all gauge stations

4 Results and discussion

4.1 Forecasts without baseflow and surface runoff

The results of the MK test are shown in Figure 3. It is evident that the monthly streamflow at the BZA station showed no significant downward trend. It is worth mentioning that the streamflow variation is a partially random process, and the future trend has a long-term correlation with its historical change trend. In contrast, the monthly streamflow time series of the SD station shows no significant increasing trend between 1971 and 2018, but the MK test results also display a slightly downward trend during 1966–1970. The monthly streamflow of the CCL station has no definite direction of variation. According to the MK test results for the JH station, a decreasing trend appears in the streamflow time series from 1961 to 1974. However, the data show a significant increasing trend after 1974. Additionally, the MK test results for the BZA, CCL, and SD stations indicate that all of their streamflow time series can be considered stationary, while that of the JH station is considered non-stationary. Table 3 lists trend analysis results for each of the 12 months at the four hydrological stations. The April streamflow at BZA and the May streamflow at JH show significant decreasing trends, and the August streamflow at SD and JH shows upward trends. Moreover, SD and JH display an increasing trend of monthly streamflow in December and March, respectively, whereas the monthly streamflow of the remaining months shows no significant trend.

Table 3. Trend analysis results using the MK test for each of the 12 months at four hydrological stations

Figure 3. Trend analysis results using the MK test at four hydrological stations


In this section, results from six long-term streamflow forecasting models applied to predict monthly streamflow values at four hydrological stations are discussed. Each model was trained using the antecedent monthly streamflow as a predictor. Results, including NSEC and Bias% performance metrics for the testing and validation periods, are provided in Table 4. Overall, the performance metrics indicate that the ANN models performed better than the statistical models. Comparing the forecasting results of the LSTM, GRU, BP, SVM, HW, and SARIMA models at the BZA station, it is found that the BP model had the highest values of NSEC (0.85 and 0.82) and the lowest Bias% values (−0.01 and 0.02) in the testing and validation periods, respectively. The performances in the validation period were slightly poorer than those of the testing period. This is expected because of the internal mechanism of the neural network model. Conversely, the NSEC value of the LSTM model was higher than those of the other models for the SD gauge station. The lowest Bias% (0.28) at the CCL station was obtained using the HW model, which demonstrates that the statistical models with long time series produce greater accuracy compared with those using shorter time series. The Bias% values (0.12) of the GRU model at the JH station for the testing and validation datasets were less than those for the other five models.

Table 4. Performance metrics of long-term streamflow forecasting models during the testing and validation periods

Figure 4 shows the plots of the observed and simulated monthly streamflow values based on the six models at the four hydrological stations. Based on the plots, it can be concluded that the simulated values from the AI-based models are closer to the observed streamflows than those of the SVM, HW, and SARIMA models. It is noteworthy that the statistical models underestimate high monthly streamflow values. However, the LSTM, GRU, and BP models provide excellent performance when high streamflow values are considered, and they show great adaptability in predicting hydrological processes.

Figure 4. Observed and simulated values of monthly streamflows at (a) BZA station, (b) CCL station, (c) JH station, and (d) SD station based on the (e) LSTM, (f) GRU, (g) BP, (h) SVM, (i) HW, and (j) SARIMA models


The prediction capabilities of the six long-term streamflow forecasting models for the annual maximum and minimum monthly values are also compared and evaluated in this study. Figure 5 presents the relationship between annual peak values of observed and forecast monthly streamflow by the six models. It shows that the forecasting skills of LSTM, GRU, and BP are good, with reasonable predictions of low and high streamflow values. However, it can be noted that the forecast performances of the SVM, HW, and SARIMA models for peak values are significantly affected by underestimations. The peak value simulation skill of the HW model is slightly higher than that of SVM and SARIMA. The results from AI-based models suggest weaker performance in estimating high flows than in estimating total streamflow values, which has been a problem in most AI-based models in hydrological forecasting (e.g. low flow, land surface temperature, and soil moisture downscaling) (Sahoo et al. 2019, Long et al. 2020, Aboward 2021). The prediction accuracy of AI-based models is greatly affected by the selection of prediction factors, and streamflow as a single predictor may not be able to capture the change process of high flows. Post-processing of the estimates from AI-based models can be used to improve the forecast in future studies (Dehghani et al. 2019, Zuo et al. 2020).

Figure 5. Comparison of the observed and simulated annual peak values of monthly streamflows at (a) BZA station, (b) CCL station, (c) JH station, and (d) SD station based on the (e) LSTM, (f) GRU, (g) BP, (h) SVM, (i) HW, and (j) SARIMA models

The plots of observed and forecast annual minimum monthly streamflow are shown in Figure 6. The LSTM, GRU, and BP models not only produce more reasonable minimum values with the long dataset but also provide more accurate predictions using the short time series. Compared with their peak value forecasts, however, the ANN models predict the minimum values poorly, and the use of a single predictor may have caused this. The scatter plots show an overestimation for the ANN models, and it can be speculated that this is because baseflow is not considered in the forecasting process. The surface runoff is greater in the wet season than in the dry season, and baseflow maintains the main streamflow during the no-rain period. The prediction of high flows is of great significance for flood control in the wet season, and the forecasting of low flows plays a significant role in water demand regulation and ecology in the dry season. The most important factor in AI-based models for hydrological forecasting is the selection of effective predictors, which can improve the forecasting accuracy of high flows and low flows. Therefore, baseflow and surface runoff are suggested for use in forecasting models in the next section.

Figure 6. Comparisons of the observed and simulated annual minimum values of monthly streamflow at (a) BZA station, (b) CCL station, (c) JH station, and (d) SD station by the (e) LSTM, (f) GRU, (g) BP, (h) SVM, (i) HW, and (j) SARIMA models
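
The annual peak and annual minimum series compared in Figures 5 and 6 can be derived from a monthly series by grouping the values by year. The following is a minimal sketch of that step only; the function name annual_extremes and the use of calendar years are assumptions made for illustration and are not taken from the study.

```python
def annual_extremes(monthly, years):
    """Return (peaks, lows): per-year maximum and minimum monthly values.

    monthly: monthly streamflow values
    years:   year label for each value (calendar year is an assumption;
             a hydrological year could be used instead)
    """
    peaks, lows = {}, {}
    for value, year in zip(monthly, years):
        peaks[year] = max(value, peaks.get(year, value))
        lows[year] = min(value, lows.get(year, value))
    return peaks, lows
```

Applying the same function to the observed and the simulated series gives the paired annual values that a scatter plot or box plot would compare.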

4.2 Forecasts with baseflow and surface runoff

In this section, daily baseflow and surface runoff were separated by the LH method, and monthly baseflow and surface runoff values were then obtained from the daily data. Table 5 shows the statistical summary of monthly baseflow at the CCL and JH hydrological stations, including the mean, standard deviation, median, BFI (baseflow index, calculated as the sum of baseflow divided by the sum of streamflow), and stationarity condition of the time series. The baseflow dataset of the CCL station is stationary, while that of JH shows a non-stationary (upward) trend. It is apparent from the data in Table 5 that the BFI value of the CCL gauge station is almost double that of the JH station. Here, BFI could be related to the geographical characteristics of the basins. The CCL station is located in an island basin, which means that the proportion of surface runoff is much larger than that of baseflow.

Table 5. Statistical summary of monthly baseflows at stations CCL and JH
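
The separation and BFI calculation described above can be sketched as follows, assuming that "LH" denotes the Lyne-Hollick recursive digital filter. The filter parameter alpha = 0.925, the single forward pass, the initial condition, and the aggregation of daily values to monthly totals by summation are all assumptions chosen for illustration; the settings actually used in the study are not stated in this section.

```python
def lyne_hollick(q, alpha=0.925):
    """Single forward pass of a Lyne-Hollick-type recursive digital filter.

    q: daily streamflow values (non-negative numbers).
    Returns (baseflow, quickflow) lists with the same length as q.
    alpha = 0.925 is a commonly quoted value, used here as an assumption.
    """
    if not q:
        return [], []
    quick = [0.0]  # assumption: the first day's flow is treated as baseflow
    for t in range(1, len(q)):
        f = alpha * quick[-1] + 0.5 * (1.0 + alpha) * (q[t] - q[t - 1])
        # Keep quickflow within the physically possible range [0, q_t].
        quick.append(min(max(f, 0.0), q[t]))
    base = [qt - ft for qt, ft in zip(q, quick)]
    return base, quick


def monthly_totals(values, month_keys):
    """Sum daily values into monthly totals keyed by, e.g., (year, month)."""
    totals = {}
    for v, key in zip(values, month_keys):
        totals[key] = totals.get(key, 0.0) + v
    return totals


def baseflow_index(base, q):
    """BFI: sum of baseflow divided by the sum of total streamflow."""
    return sum(base) / sum(q)
```

For a perfectly constant flow the filter assigns everything to baseflow, so the BFI equals 1; flashier hydrographs give lower values.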

The performance of four scenarios was assessed using the LSTM, GRU, and BP models to forecast the monthly streamflow of the CCL and JH gauge stations. AI-based models are well suited to this comparison because they accept different numbers of inputs, which yields the different scenarios. Only the monthly streamflow dataset is available for the BZA and SD gauge stations, whereas the daily streamflow series of the CCL and JH hydrological stations are also available and are used in the scenario evaluations. Figure 7 shows a map of the forecasting model evaluation criterion (NSEC). As a whole, S4 (baseflow and surface runoff) performs better than the other scenarios in both the testing and validation periods. Higher accuracies are observed when using baseflow and surface runoff as predictors than when only streamflow is used in the prediction of monthly streamflow.

Figure 7. Forecasting model evaluation criteria (NSEC) of three long-term streamflow forecasting models (LSTM, GRU, and BP) for four scenarios at stations CCL and JH: (a–c) validation period; (d–f) testing period
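
To make the four scenarios concrete, the sketch below assembles lagged input samples for each one. The predictor sets for S1, S3, and S4 follow the text; S2 (surface runoff only) is inferred by elimination. The names SCENARIOS and make_samples, the dictionary keys, and the 12-month lag window are hypothetical choices for illustration, not the configuration used in the study.

```python
# Predictor sets per scenario; S2 is inferred by elimination from the text.
SCENARIOS = {
    "S1": ("streamflow",),
    "S2": ("surface_runoff",),
    "S3": ("baseflow",),
    "S4": ("baseflow", "surface_runoff"),
}


def make_samples(series, scenario, n_lags=12):
    """Build (X, y) for one scenario.

    series:   dict of equally long monthly lists keyed by "streamflow",
              "baseflow", and "surface_runoff"
    scenario: one of "S1" to "S4"
    n_lags:   number of past months fed to the model (assumed value)
    Each row of X holds the previous n_lags values of every predictor;
    y holds the streamflow of the month to be forecast.
    """
    names = SCENARIOS[scenario]
    target = series["streamflow"]
    X, y = [], []
    for t in range(n_lags, len(target)):
        row = []
        for name in names:
            row.extend(series[name][t - n_lags:t])
        X.append(row)
        y.append(target[t])
    return X, y
```

In this layout S4 simply doubles the input width relative to the single-predictor scenarios, while the forecast target remains total monthly streamflow in every case.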

In addition, Figures 8 and 9 present the results of simulations with three models in the four scenarios. It can be seen that the hydrograph forecast by S4 is the best fitted to the observed streamflow dataset at the CCL and JH gauge stations. Notably, the highest forecast accuracy appears when using baseflow and surface runoff as two model predictors, regardless of whether the streamflow series is stationary or non-stationary. Table 6 reports the accuracy of the simulations for the different predictor scenarios in terms of NSEC and Bias%. According to Table 6, the average NSEC increased to more than 0.85, and the mean Bias% value is below 1% when using two predictors for forecasting the monthly streamflow of the CCL and JH stations. Therefore, streamflow decomposition into baseflow and surface runoff can improve the accuracy of the long-term hydrological forecast. It is apparent from the table and graphs that the BP model simulations are closer to the observed stationary streamflow series than those of the other two ANN models. In addition, the LSTM model performed better than the BP and GRU models for the non-stationary streamflow series. It is worth noting that the NSEC values of the three models were all greater than 0.7, and the Bias% was less than 10%, which demonstrates that all three models can be applied to long-term hydrological forecasting. At the JH station, the NSEC values for S1, S2, and S3 using the LSTM model are 0.72, 0.82, and 0.85, respectively. It can be concluded that the accuracy obtained using baseflow as a predictor is higher than that obtained using streamflow or surface runoff.

Table 6. LSTM, GRU, and BP model performance metrics for four scenarios
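
The two metrics reported in Table 6 can be computed as in the sketch below. It assumes that NSEC is the standard Nash-Sutcliffe efficiency coefficient and that Bias% is the total simulated minus observed volume expressed as a percentage of the observed volume; the sign convention and the function names are assumptions, since the metric definitions are not given in this section.

```python
def nsec(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def bias_percent(obs, sim):
    """Percent bias; positive values mean overestimation (assumed convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)
```

Under these definitions, a simulation that always returns the observed mean scores an NSEC of 0, which helps put the reported thresholds of 0.7 and 0.85 in context.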

Figure 8. Observed and simulated monthly streamflow by the LSTM, GRU, and BP models for four scenarios (S1, S2, S3, and S4) at the CCL hydrological station

Figure 9. Observed and simulated monthly streamflow by the LSTM, GRU, and BP models for four scenarios (S1, S2, S3, and S4) at the JH hydrological station

As mentioned in the previous section, in Scenario S1, the simulation of the annual peak monthly discharge by the neural network models is better than that by the statistical models. The observed and simulated annual peak values of monthly streamflow for the four scenarios using three models at the two hydrological stations are shown in Figure 10. The peak streamflow simulation of S4 is better than that of the other scenarios, and its median value for the annual peak streamflow series is the closest to that of the observed streamflow. The upper and lower boundary values of the S4 box are similar to those of the original data box. These results suggest that S4 forecasts the annual peak value better and gives a comparably reasonable prediction range.

Figure 10. Comparison of the observed and simulated annual peak values of monthly streamflow at the two hydrological stations for four scenarios (S1, S2, S3, and S4) using the LSTM, GRU, and BP models

The best models thus include LSTM and BP networks at the CCL and JH stations, respectively. Therefore, these models are used to compare the simulation performance for annual minimum streamflow in the different scenarios. Figure 11 shows the box plots of observed and simulated annual minimum values of monthly discharge for the four scenarios. According to the figure, simulations using either a single forecasting factor or two forecasting factors show overestimation. The performance of the AI-based model for S3 (baseflow as a predictor) is slightly better than that of the other three scenarios at the CCL hydrological station. In contrast, S4 produces reasonable forecast results using the AI-based model at the JH station. Improving the accuracy of high flows and low flows in long-term streamflow forecasting methods has always been a major area of emphasis for water resource management. When the streamflow series alone is selected as the forecast factor of AI-based models for extrapolation, the forecasting results of the model will be overestimated or underestimated. Based on the above results, it can be concluded that the prediction accuracy of annual maximum and minimum streamflow can be effectively improved by adding the baseflow module to the long-term hydrological forecast.

Figure 11. Comparison of the observed and simulated annual minimum values of monthly streamflow at the two hydrological stations for four scenarios (S1, S2, S3, and S4) using the LSTM, GRU, and BP models

4.3 Comparison between S1 and S4

In this section, S1 and S4 represent the two scenarios using a single predictor (streamflow) and two predictors (baseflow and surface runoff), respectively, in the AI-based models for monthly streamflow forecasting. The forecasting performance for each month and the forecasting skill at longer lead times have been evaluated. The box plots of simulated and observed monthly streamflow (January to December) from S1 and S4 at the CCL and JH hydrological stations are shown in Figure 12. The results show that S4 gives better prediction performance than S1 and provides reliable and accurate forecasting ranges for all 12 months (January to December) at the CCL and JH stations. The AI-based model using baseflow and surface runoff as predictors can forecast more accurate values for the outliers from January to December. There is not much difference in prediction skill between the two scenarios in the wet season. In the dry season, however, the performance of S4 is much better than that of S1 using the LSTM and BP models at both stations. The inclusion of baseflow in the models could yield better prediction results for low flow, and the inclusion of surface runoff can provide better predictions of high flow.

Figure 12. Observed and simulated monthly streamflow from January to December at hydrological stations CCL and JH by the LSTM, GRU, and BP models for S1 and S4

To further verify the accuracy of the model prediction from different scenarios, we selected the streamflow series from the last year to validate the AI-based model performance and used the rest of the data for training and testing. Table 7 shows the long-term streamflow forecast results based on S1 and S4 for lead times of up to six months at the JH and CCL gauge stations. The overall forecasting performance of S1 is similar to that of S4 in the first two months for both stations, and the Bias% values are less than 5%. However, at longer lead times, S1 declines in performance when using the LSTM and BP models at the CCL and JH stations, respectively. At both stations, the monthly streamflow forecasting result of S4 exhibits better agreement with the observed values than does that of S1. Moreover, adding the baseflow separation process in the AI-based models to forecast long-term streamflow would extend the lead time, which can address the short lead time of traditional forecasting models. The AI-based models with baseflow and surface runoff added as predictors can thus provide good simulation results.

Table 7. Long-term streamflow forecast results of Scenarios 1 and 4 during a six-month forecasting period at the JH and CCL hydrological stations
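
The hold-out and multi-month evaluation described above can be sketched as follows. The section does not state how forecasts beyond one month are generated, so the recursive strategy shown here (feeding each forecast back in as an input) is an assumption, as are the function names, the 12-month input window, and the one_step callable that stands in for a trained LSTM, GRU, or BP model.

```python
def split_last_year(monthly, months_per_year=12):
    """Hold out the final year; the rest is kept for training and testing."""
    return monthly[:-months_per_year], monthly[-months_per_year:]


def recursive_forecast(history, one_step, n_lags=12, horizon=6):
    """Forecast `horizon` months ahead by iterating a one-step model.

    history:  past monthly values, oldest first
    one_step: callable mapping a list of n_lags values to the next value
              (placeholder for a trained model; hypothetical interface)
    """
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        nxt = one_step(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new forecast
    return forecasts
```

With a recursive scheme, errors made in the first months propagate into later ones, which is one plausible reason why skill can degrade as the lead time grows.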

5 Conclusions

The main focus of this work is the performance of long-term streamflow forecasting using conventional and AI-based models. This study proposed and evaluated six monthly streamflow forecasting models (LSTM, GRU, BP, SVM, HW, and SARIMA), which were applied to four sites in Zhejiang Province, China. The prediction accuracies of the different models under the same forecast scenario were compared and analysed. The stationarity of the streamflow time series was also considered in long-term prediction, which can help in assessing the suitability of a forecasting model for stationary and non-stationary streamflow series. Four scenarios were then designed for evaluating the different roles of baseflow and surface runoff in long-term streamflow forecasting. The main findings of this study are summarized as follows:

  1. The historical streamflow data are found to be stationary (no trend) at the BZA, SD, and CCL gauge stations, while the data for the JH station show a statistically significant upward trend. In the model prediction based on a single prediction factor (monthly streamflow), the forecasting accuracy of the ANN models is much higher than that of the statistical models for both the stationary and non-stationary streamflow series. The NSEC value for the LSTM, GRU, and BP models is more than 0.80, and the Bias% of model simulation is less than 15%, which can meet the requirements of long-term hydrological forecasts. However, for the annual minimum monthly streamflow simulation, the SVM, HW, and SARIMA models produced overestimation, and the simulation accuracy is not as high as that of the ANN models.

  2. Streamflow, baseflow, and surface runoff play essential roles in long-term hydrological forecasting. The forecasting results of the ANN models with two predictor variables (baseflow and surface runoff) are found to be more reasonable than those using a single predictor, no matter whether the streamflow series is stationary or non-stationary. Moreover, the simulation accuracy of annual peak and minimum monthly streamflow was also improved.

  3. Scenario S4 gives better predictions than S1 and provides reliable and accurate forecasting ranges for all 12 months (January to December) at the CCL and JH stations. In addition, adding the baseflow separation process in the AI-based models to forecast long-term streamflow would extend the prediction period.

Based on the above results, it is proposed to add baseflow and surface runoff as model predictors in long-term streamflow forecasting; with these predictors, the ANN models also improved forecasting skill for both stationary and non-stationary streamflow series. These models can be applied to other basins in the future to realize the optimal allocation and efficient utilization of water resources in the basin.

Acknowledgements

The authors thank the Major Project of Natural Science Foundation of Zhejiang [LZ20E090001], the Fundamental Research Funds for the Zhejiang Provincial Universities [2021XZZX015], and the Zhejiang Key Research and Development Plan [2021C03017] for financial support. The Zhejiang Bureau of Hydrology is also gratefully acknowledged for providing the hydrologic data used in this study.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the Fundamental Research Funds for the Zhejiang Provincial Universities [grant number 2021XZZX015]; Zhejiang Key Research and Development Plan [grant number 2021C03017]; and the Major Project of Natural Science Foundation of Zhejiang [grant number LZ20E090001].
