
Engineering Applications of Computational Fluid Mechanics Volume 14, 2020 - Issue 1


Open access


Predicting Standardized Streamflow index for hydrological drought using machine learning models

Shahabbodin Shamshirband, Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh, Vietnam; Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh, Vietnam

https://orcid.org/0000-0002-6605-498X

Sajjad Hashemi, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Hana Salimi, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Saeed Samadianfard, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran (corresponding author)

Esmaeil Asadi, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Sadra Shadkani, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Katayoun Kargar, Department of Civil Engineering, Faculty of Engineering, Urmia University, Urmia, Iran

Amir Mosavi, Kalman Kando Faculty of Electrical Engineering, Obuda University, Budapest, Hungary; School of the Built Environment, Oxford Brookes University, Oxford, UK (corresponding author)

https://orcid.org/0000-0003-4842-0613

Narjes Nabipour, Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam (corresponding author)

Kwok-Wing Chau, Department of Civil and Environmental Engineering, Hong Kong Polytechnic University, Hung Hom, Hong Kong, People’s Republic of China


Pages 339-350 | Received 14 Oct 2019, Accepted 05 Jan 2020, Published online: 29 Jan 2020

https://doi.org/10.1080/19942060.2020.1715844


Abstract

Hydrological droughts are characterized by their duration, severity, and magnitude. Precipitation, evapotranspiration, and runoff are among the most critical factors in modeling droughts. In this study, three drought indices, i.e., the Standardized Precipitation Index (SPI), the Standardized Streamflow Index (SSI), and the Standardized Precipitation Evapotranspiration Index (SPEI), are modeled using Support Vector Regression (SVR), Gene Expression Programming (GEP), and M5 model trees (MT). The results indicate that predictions based on SPI delivered higher accuracy. Moreover, the MT model performed best in predicting SSI, with a CC of 0.8195 and an RMSE of 0.8186.

Abbreviations: ANFIS: adaptive neuro-fuzzy inference system; ANN: artificial neural network; BS-SVR: boosted support vector regression; CC: correlation coefficient; ELM: extreme learning machine; GEP: gene expression programming; GP: genetic programming; GPR: Gaussian process regression; KNN: k-nearest neighbor; LSSVM: least squares support vector machine; LSSVR: least squares support vector regression; MAE: mean absolute error; MARS: multivariate adaptive regression splines; MLP: multilayer perceptron; MLR: multiple linear regression; MT: M5 model tree; P: precipitation; PDSI: Palmer drought severity index; PET: potential evapotranspiration; RAE: relative absolute error; RMSE: root mean square error; RVM: relevance vector machine; SAR: sodium adsorption ratio; SDR: standard deviation reduction; SPEI: standardized precipitation evapotranspiration index; SPI: standardized precipitation index; SSI: standardized streamflow index; SVM: support vector machine; SVR: support vector regression; WAANN: Wavelet-ARIMA-ANN; WANFIS: Wavelet-Adaptive Neuro-Fuzzy Inference System; WN: wavelet network

KEYWORDS:

Gene expression programming
hydrological drought
M5 model tree
machine learning models
Standardized streamflow index
support vector regression

Nomenclature

ET₀	=	reference evapotranspiration (mm day⁻¹)
Δ	=	slope of the saturation vapor pressure curve (kPa °C⁻¹)
γ	=	psychrometric constant (kPa °C⁻¹)
R_n	=	net radiation (MJ m⁻² d⁻¹)
G	=	soil heat flux density (MJ m⁻² d⁻¹)
T	=	mean air temperature (°C)
U₂	=	average 24 h wind speed at 2 m height above ground level (m s⁻¹)
e_s	=	saturation vapor pressure (kPa)
e_a	=	actual vapor pressure (kPa)
λ	=	latent heat of evaporation (MJ kg⁻¹)
P	=	precipitation (mm)
PET_0i	=	reference evapotranspiration for the ith month
D_i	=	difference between precipitation (P) and evapotranspiration (PET_0i) calculated for the ith month
L(y,f(x,a))	=	ϵ-insensitive error function
SDR	=	standard deviation reduction
sd	=	standard deviation
T_i	=	subset of samples that have the ith outcome of the potential test
x_i	=	predicted values of SSI
y_i	=	observed values of SSI
n	=	number of observations

Introduction

Drought is a periodic atmospheric event associated with a lack of sufficient available water resources in a geographical area over a significant period of time. It is considered an inseparable part of climatic fluctuations and a recurring event in different climatic regions all over the world (Wilhite, Citation2000). Drought can affect natural habitats and ecosystems, as well as many economic and social sectors (Heim, Citation2002). In recent years, droughts have occurred frequently in many parts of the world, and their effects have become more harmful owing to increased water demand and climate change. Therefore, drought has attracted the attention of many researchers, and studying its characteristics from different angles has become a high research priority (Mishra & Singh, Citation2011). Among the random and probabilistic factors involved, precipitation plays a vital role in arid and semi-arid regions, and in most parts of the world drought depends mainly on precipitation. Consequently, a lack of precipitation combined with increased temperature strongly reduces surface runoff, and a lack of precipitation also leads to shortages of surface and groundwater resources.

Drought indices are an important part of drought monitoring systems. Researchers have developed different indices for monitoring drought conditions and studying their quantitative effects. Most drought indices were developed for particular geographical regions and purposes; as a result, applying them to obtain accurate and comprehensive analyses in other regions has raised considerable doubts, owing to the intrinsic complexity of drought and differences in climatic conditions. Drought prediction can provide useful information that helps to reduce the consequences of drought. Owing to its simplicity and effectiveness, the Standardized Streamflow Index (SSI) is regarded as the most frequently employed method for quantifying streamflow-based hydrological droughts; in other words, SSI allows precise comparisons of the hydrological conditions of a stream. Machine learning, as a data-driven approach, is a particularly suitable alternative for predicting this phenomenon because it requires relatively little information. Using an M5 model tree and an efficient artificial neural network (ANN)-based model, Bhattacharya and Solomatine (Citation2005) predicted the stage-discharge rating curve of the Bhagirathi River in India. A comparison with conventional statistical stage-discharge models indicated that both the ANN and the M5 model tree were capable of predicting the rating curve. To predict daily and monthly precipitation in Japan, Sumi, Zaman, and Hirose (Citation2012) implemented ANN, multivariate adaptive regression splines (MARS), k-nearest neighbor (KNN), and Support Vector Regression (SVR) models. 
To model monthly and daily rainfall, these methods were combined with moving average and principal component analysis as pre-processing techniques (Choubin, Malekian, & Golshan, Citation2016; Faroughi et al., Citation2020; Hemmati-Sarapardeh et al., Citation2020; Shabani et al., Citation2020). The results showed that for daily precipitation forecasting the hybrid methods performed better than the standalone models, whereas for monthly precipitation the SVR method provided more accurate predictions than the hybrid method. Using SVR and wavelet networks (WN), Adamowski and Prasher (Citation2012) predicted daily runoff in the mountainous Sianji drainage basin in the Himalayan region of India. Both methods were suitably accurate for daily flow, although the wavelet networks were slightly more precise. Choubin et al. (Citation2016) used an adaptive neuro-fuzzy inference system (ANFIS), the M5P model tree, and a multilayer perceptron (MLP) to predict SPI 1–12 months in advance and found that the MLP outperformed the other models. Deo, Kisi, and Singh (Citation2017) predicted SPI in eastern Australia using least squares Support Vector Machine (LSSVM), M5 model tree, and MARS models. Their findings highlighted the importance of predicting periodic drought and of precisely determining how model accuracy varies with geographic and seasonal factors. Oloruntade, Mohammad, Ghazali, and Wayayok (Citation2020) analyzed the SPI and SPEI meteorological drought indices and the SRI hydrological drought index to study drought occurrence in the southern region of the Niger basin over 1970–2008. 
Their results indicated that, in this basin, hydrological drought is more affected by temperature (heating) than by precipitation (drying). Tirivarombo, Osupile, and Eliasson (Citation2018) compared the SPI and SPEI drought indices in the Kafue River basin in northern Zambia and showed that changes in temperature play an important role in determining drought. In another study, Adarsh and Janga Reddy (Citation2019) analyzed drought occurrence using the 3-month Standardized Precipitation Index (SPI-3). Mann-Kendall analysis of the SPI-3 series for 1871–2012 showed a significant increasing trend in Kerala and Telangana and a decreasing trend in Ariasa. A comparison of hybrid models with conventional prediction models, based on various statistical criteria, revealed the superiority of the hybrid methods for predicting SPI-3 in these regions. Sattari, Farkhondeh, and Abraham (Citation2018) estimated the sodium adsorption ratio (SAR) with Support Vector Machine (SVM), M5 model tree, ANN, and Gaussian process regression (GPR) models. After investigating several hydrochemical parameters, they selected the most effective ones: a combination of sodium (Na) and electrical conductivity (EC) was used as input, and the M5 model performed best among the data mining methods. This study indicated that the M5 model can be successfully used to estimate SAR values, and that the simple, optimized linear equations produced by the M5 model tree reduce time and expense. Sachindra, Ahmed, Rashid, Shahid, and Perera (Citation2018) developed statistical models for downscaling monthly precipitation. 
Using four techniques, ANNs, Genetic Programming (GP), Relevance Vector Machine (RVM), and SVM, they calibrated downscaling models over 1950–1991 and validated them over 1992–2014 for every station. Based on their results, RVM and ANN were recommended for downscaling when predicting floods, which involves high precipitation extremes. They also found that, regardless of the climate regime, the model with the polynomial kernel performed accurately and acceptably. Yin et al. (Citation2018) predicted river flow in semi-arid areas using MARS, SVR, and M5 models. The M5 model performed better than the SVR and MARS models, showing that it can be successfully implemented for predicting short-term river flows in semi-arid mountainous areas. Soh, Koo, Huang, and Fung (Citation2018) proposed Wavelet-ARIMA-ANN (WAANN) and the more recent Wavelet-Adaptive Neuro-Fuzzy Inference System (WANFIS) models for predicting SPEI in the Langat River Basin and found that the WAANN model gives improved precision for both short-term and mid-term drought forecasting. Mouatadid, Raj, Deo, and Adamowski (Citation2018) implemented an extreme learning machine (ELM) model for predicting SPEI in a region of eastern Australia and compared its performance with multiple linear regression (MLR), artificial neural network (ANN), and least squares support vector regression (LSSVR) models. The close agreement between observed and predicted SPEI indicated the potential of the developed models to improve understanding of future drought risks in eastern Australia. Fung, Huang, and Koo (Citation2018) studied the improvement of SVR-based drought forecasting models using wavelet pre-processing techniques in the Langat River basin. 
Comparing SPEI predictions from hybrid wavelet models with those of SVR and Boosted-Support Vector Regression (BS-SVR) models showed that the hybrid wavelet BS-SVR model was more accurate for predicting drought in the Langat River basin. To model evapotranspiration, Parasuraman, Elshorbagy, and Carey (Citation2007) used the GP technique and compared it with the traditional Penman-Monteith (PM) method and an ANN; both the GP and ANN models performed better than the PM method. Sahoo, Jha, Singh, and Kumar (Citation2019) studied the suitability of SVR for modeling monthly low-flow time series at three stations in India and found the SVR model accurate for low-flow prediction.

Northern Iran usually receives a high amount of rainfall and has abundant water resources. However, mild to somewhat severe droughts have hit this area more frequently in recent years, causing financial damage to its residents. Moreover, although SSI has been widely used in Europe and eastern Asia, no application in Iran has been reported. Therefore, to investigate the possibility of predicting SSI in northern Iran, the Navrood basin in the west of Gilan province was selected as the study area. The purpose of the present study is thus to predict the SSI hydrological drought index for the Navrood drainage basin from SPI and SPEI using the data-driven SVR, GEP, and M5 models. Although hybrid artificial intelligence models are effective in hydrological modeling, these simple data-driven models were selected because they are easy to apply in predictive modeling and well suited to decision-making processes. In addition, the most accurate of these models for predicting hydrological drought in the study region was identified. Finally, the findings of this paper could provide valuable information for developing drought monitoring and forecasting systems for northern Iran and the country as a whole.

Materials and method

Study area

The Navrood basin, which has a humid climate, is a vital drainage basin in the west of Gilan province, located within the city of Talesh between 48°35′ and 48°54′ E longitude and 37°36′ and 37°45′ N latitude. The longest tributary of the Navrood River originates from the eastern slopes of the Talesh Mountains, part of the Alborz mountain range; after other sub-tributaries join it, the river flows from west to east through the city of Asalem and finally into the Caspian Sea. The Navrood basin covers 274 km². The average elevation of the basin is 1573 m, and its average annual rainfall is 852.7 mm. The annual average temperature, relative humidity, wind speed, and evaporation are 13.5 °C, 73.36%, 3.32 km/h, and 750 mm, respectively. Precipitation is highest in fall, followed by spring, summer, and winter. The climate of the region is humid in the plain and cold in the highlands. The basin is one of the forested basins with steep slopes: more than 50% of its area has a slope between 30% and 60%, and the average basin slope is 31.16%. Figure 1 shows the geographical location of the Navrood drainage basin.

Figure 1. The geographical location of Navrood drainage basin.

Standardized precipitation index (SPI)

SPI is an index based on the probability of precipitation and can be calculated for different time scales. It was devised by McKee, Doesken, and Kleist (Citation1993) to study the effects of precipitation shortages on groundwater, surface water reserves, soil moisture, and streamflow. The use of SPI has grown worldwide owing to its simplicity, its small data requirements, and its independence from mean precipitation. Moreover, it can be used to compare a wide range of climates. The SPI calculation begins by fitting a probability density function, usually a two-parameter gamma and sometimes a log-Pearson type III distribution, to total precipitation over periods of 3–24 months. The cumulative probability of each observed total is then transformed into a standard normal variate, which is the SPI value.
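The gamma-based standardization described above can be sketched in Python for the accumulated totals of one calendar month at one time scale. This is a minimal illustration, not the authors' implementation: it assumes the two-parameter gamma distribution, estimates its shape and scale with Thom's maximum-likelihood approximation, and handles zero totals through a mixed distribution with zero-probability q, a common SPI convention. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x/scale), via its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    for n in range(1, 10_000):  # iteration cap as a safety guard
        term *= z / (shape + n)
        total += term
        if term < 1e-15 * total:
            break
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def standardized_index(totals):
    """Map accumulated totals (one calendar month, one time scale) to SPI-like values."""
    positive = [v for v in totals if v > 0]
    q = 1 - len(positive) / len(totals)  # probability of a zero total
    mean = fmean(positive)
    a = math.log(mean) - fmean(math.log(v) for v in positive)
    shape = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)  # Thom's approximation
    scale = mean / shape
    std_normal = NormalDist()
    values = []
    for v in totals:
        h = q + (1 - q) * gamma_cdf(v, shape, scale)  # mixed cumulative probability
        h = min(max(h, 1e-6), 1 - 1e-6)  # keep the inverse normal finite
        values.append(std_normal.inv_cdf(h))
    return values
```

Because the gamma CDF is monotonic, larger totals always map to larger index values; negative values indicate drier-than-usual conditions for that month.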

Standardized precipitation evapotranspiration index (SPEI)

The Penman-Monteith (PM) method has been recommended as the standard method for calculating reference crop evapotranspiration (Allen, Pereira, Raes, & Smith, Citation1998):

$ET_{0} = \frac{0.408\,\Delta\,(R_{n} - G) + \gamma \left[\frac{900}{T + 273}\right] U_{2}\,(e_{s} - e_{a})}{\Delta + \gamma\,(1 + 0.34\,U_{2})}$ (1)

The SPEI was proposed by Vicente-Serrano, Beguería, and López-Moreno (Citation2010) and has been applied in many studies. It incorporates the climatic water balance, so the role of temperature is taken into account when evaluating drought. The SPEI is based on changes in the difference between precipitation and potential evapotranspiration (P-PET). The Palmer Drought Severity Index (PDSI) (Palmer, Citation1965) accounts for changes in numerous supply and demand variables of the hydrological cycle; however, PDSI lacks multi-scalar characteristics and is not a standardized index. Different methods have been recommended for calculating PET. Sheffield, Wood, and Roderick (Citation2012) compared several of them and found that the Penman-Monteith method gives more accurate results because it is more physically based on atmospheric evaporative demand (Allen et al., Citation1998). Thus, SPEI in this study is calculated with the Penman-Monteith equation as described in FAO-56 (Allen et al., Citation1998). The World Meteorological Organization (WMO) has selected the PM method as the standard method for calculating PET, and its accuracy has been demonstrated. The monthly values of reference crop evapotranspiration are calculated from climatic data using Equation (1). Next, the difference between precipitation (P) and evapotranspiration ($PET_{0i}$) is calculated for the ith month:

$D_{i} = P_{i} - PET_{0i}, \quad i = 1, 2, \dots, N$ (2)

The D_i series is then accumulated over the chosen time scale and standardized in the same way as SPI; Vicente-Serrano et al. (Citation2010) fit a three-parameter log-logistic distribution at this step.
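Equations (1) and (2) translate directly into code. The sketch below is a hypothetical helper set that follows the FAO-56 form of the Penman-Monteith equation (constant 900 in the aerodynamic term, vapor pressure deficit e_s − e_a); it also includes the FAO-56 expressions for saturation vapor pressure and for Δ as a function of mean temperature. It returns daily ET₀ in mm day⁻¹, so, as an assumption about the workflow, daily values would be multiplied by the number of days in each month before forming D_i from monthly precipitation.

```python
import math


def saturation_vapour_pressure(t):
    """FAO-56: saturation vapor pressure (kPa) at air temperature t (°C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def slope_vapour_pressure_curve(t):
    """FAO-56: slope Δ of the saturation vapor pressure curve (kPa °C⁻¹) at t (°C)."""
    return 4098 * saturation_vapour_pressure(t) / (t + 237.3) ** 2


def fao56_et0(delta, gamma, rn, g, t, u2, es, ea):
    """Eq. (1): daily reference evapotranspiration ET0 (mm day⁻¹)."""
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900 / (t + 273)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1 + 0.34 * u2))


def water_balance(p, pet):
    """Eq. (2): D_i = P_i - PET_0i for each month i (same units for both series)."""
    return [pi - ei for pi, ei in zip(p, pet)]
```

With illustrative inputs (Δ = 0.122, γ = 0.0666, R_n = 13.28, G = 0, T = 16.9, U₂ = 2.078, e_s − e_a = 0.589), `fao56_et0` returns about 3.88 mm day⁻¹.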

Standardized streamflow index (SSI)

Most researchers have applied standardized indices to hydrological drought in the same way as meteorological drought indices. Streamflow drought indices and standardized runoff indices (Shukla & Wood, Citation2008), which share a similar theoretical background, are two important standardized hydrological drought indices. The calculation of SSI follows the same procedure as SPI; the only difference is that surface runoff (streamflow) data are used instead of precipitation data. As with SPI, the accumulated streamflow is fitted with a gamma distribution. Therefore, cumulative flow values are computed separately for each month, and the SSI is then calculated.
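The first two steps of this procedure, accumulating monthly streamflow over a k-month window (48 months in this study) and grouping the totals by calendar month, can be sketched as follows; each group would then be standardized with the gamma-based procedure used for SPI. The helper names and the assumption that the record starts in January are hypothetical.

```python
def accumulate(series, scale):
    """k-month running totals; the first scale - 1 months have no complete window."""
    out = [None] * len(series)
    for t in range(scale - 1, len(series)):
        out[t] = sum(series[t - scale + 1 : t + 1])
    return out


def group_by_calendar_month(values, start_month=1):
    """Collect accumulated values per calendar month (1..12), skipping incomplete windows."""
    groups = {m: [] for m in range(1, 13)}
    for t, v in enumerate(values):
        if v is not None:
            month = (start_month - 1 + t) % 12 + 1
            groups[month].append(v)
    return groups
```

Fitting each calendar month separately keeps the seasonal cycle of streamflow from being mistaken for drought.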

Gene expression programming (GEP)

Gene expression programming (GEP) is an algorithm that maintains a population of individuals, selects them according to their fitness, and introduces genetic variation through genetic operators (Ferreira, Citation2001a, Citation2001b). The first stage of the GEP algorithm creates an initial population of solutions, either randomly or by using some information about the problem. Next, each chromosome is expressed as an expression tree, which is evaluated with a fitness function. In general, fitness is evaluated on a set of training samples, also referred to as fitness cases. If a solution of satisfactory quality is found or a specified number of generations is reached, the process stops and the best solution is reported. Otherwise, the best individual of the current generation is kept, and the remaining solutions are subjected to selection, in which the fittest individuals have a greater chance of producing offspring. The whole process is repeated over many generations, and as new generations appear, the quality of the population is expected to improve on average.
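In Ferreira's formulation, each gene is a fixed-length string (a K-expression) that is read in level order to build the expression tree, and any symbols left over form a non-coding region. The sketch below illustrates that translation and an RMSE-based evaluation over fitness cases. The function set, the protected division that returns 1.0, and all names are assumptions for illustration only.

```python
import math
from collections import deque

# Hypothetical function set with arities; any other symbol is a terminal (input name).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []


def decode(gene):
    """Translate a K-expression (level-order gene string) into an expression tree."""
    root = Node(gene[0])
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node.symbol, 0)):
            child = Node(gene[pos])
            pos += 1
            node.children.append(child)
            queue.append(child)
    return root  # symbols from gene[pos:] onward are the non-coding region


def evaluate(node, env):
    """Evaluate an expression tree for one set of input values."""
    if node.symbol not in ARITY:
        return env[node.symbol]
    left, right = (evaluate(c, env) for c in node.children)
    if node.symbol == "+":
        return left + right
    if node.symbol == "-":
        return left - right
    if node.symbol == "*":
        return left * right
    return left / right if abs(right) > 1e-12 else 1.0  # protected division


def rmse_fitness(gene, cases):
    """RMSE of one gene over fitness cases given as (inputs, target) pairs."""
    tree = decode(gene)
    errors = [(evaluate(tree, env) - target) ** 2 for env, target in cases]
    return math.sqrt(sum(errors) / len(errors))
```

For example, the gene `+*aab` decodes to (a × b) + a, and appending extra symbols such as `ba` changes nothing because they fall in the non-coding region.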

To select individuals, GEP uses the well-known roulette-wheel method. Unlike genetic algorithms and genetic programming, GEP has several genetic operators for reproducing individuals with modifications. Replication preserves some of the best individuals of the current generation for the next one. The mutation operator introduces random modifications into a chromosome; it is constrained so that it never creates individuals that violate the structural rules of the chromosome. Like a genetic algorithm, GEP uses one-point and two-point recombination. Two-point recombination is of particular interest because it can, to a great extent, switch the non-coding regions of the chromosomes on and off. GEP also performs gene recombination, in which entire genes are exchanged: this operator randomly selects genes at the same position in two parent chromosomes to form two new offspring.
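The operators above can be sketched on single-character genes. This is an illustrative sketch under assumed function and terminal sets: roulette-wheel selection draws individuals in proportion to a non-negative fitness (an error such as RMSE would first have to be converted, for example to 1/(1 + RMSE), which is an assumption here), mutation lets only terminals into the gene tail, and one- and two-point recombination swap segments between two parents of equal length. In GEP the tail length is usually fixed at t = h(n − 1) + 1 for a head of length h and maximum arity n, which is what keeps every mutated gene decodable.

```python
import random

FUNCTIONS = "+-*/"  # hypothetical function set
TERMINALS = "ab"    # hypothetical terminal set (input variables)


def roulette_wheel(population, fitness, rng):
    """Pick one individual with probability proportional to its non-negative fitness."""
    r = rng.random() * sum(fitness)
    cumulative = 0.0
    for individual, f in zip(population, fitness):
        cumulative += f
        if r < cumulative:
            return individual
    return population[-1]


def mutate(gene, head_len, rng, rate=0.1):
    """Point mutation that keeps the gene valid: only terminals may appear in the tail."""
    symbols = list(gene)
    for i in range(len(symbols)):
        if rng.random() < rate:
            pool = FUNCTIONS + TERMINALS if i < head_len else TERMINALS
            symbols[i] = rng.choice(pool)
    return "".join(symbols)


def one_point_recombination(p1, p2, point):
    """Swap everything from `point` onward between two equal-length parents."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point_recombination(p1, p2, i, j):
    """Swap the segment [i, j) between two equal-length parents."""
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]
```

With a head of length 2 and binary functions, a gene has length 2 + 3 = 5 (for example `+*aba`), and any tail position mutated by `mutate` stays a terminal.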

Support vector regression

The Support Vector Machine (SVM) was introduced by Boser, Guyon, and Vapnik (Citation1992), building on optimal hyperplane theory for linear classifiers. The method is based on statistical learning theory. Furthermore, they introduced nonlinear classifiers through kernel functions. Later, others built on the results of Boser et al. (Citation1992) and developed the fundamental principles of SVM, and Vapnik (Citation1995) finally extended SVM to regression. In general, SVM models fall into (a) classification models and (b) regression models. The classification model is used for problems in which the data belong to different classes, and the regression model is employed in prediction problems. In regression, a hyperplane is fitted to the data, and the distance of a point from the hyperplane gives the error at that point. The least-squares method is the most commonly recommended method for linear regression. However, in the presence of outliers the least-squares estimator may not be wholly reasonable, and as a result the regression performs poorly. Therefore, a robust estimator that is not sensitive to small changes in the model is needed. As stated, SVM is based on structural risk minimization, derived from statistical learning theory (Vapnik, Citation1998). To apply SVM to regression problems, Vapnik (Citation1998) used an ε-insensitive error function, which ignores errors within a specified distance ε of the true values (Basak, Pal, & Patranabis, Citation2007). This function is defined as follows:
$L(y, f(x, a)) = |y - f(x, a)|_{\varepsilon} = \begin{cases} 0 & \text{if } |y - f(x, a)| \le \varepsilon \\ |y - f(x, a)| - \varepsilon & \text{if } |y - f(x, a)| > \varepsilon \end{cases}$ (3)

This error function ignores errors whose magnitude does not exceed ε.
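To make the role of ε concrete, the sketch below implements Eq. (3) and fits a one-dimensional linear SVR by subgradient descent on the regularized primal objective, keeping the best iterate. This is a teaching sketch, not the solver used in the study: practical SVR implementations (for example scikit-learn's `SVR`, with its `C`, `epsilon`, and `kernel` parameters) solve the dual quadratic program and support nonlinear kernels. The regularization weight `lam`, the step schedule, and the function names are arbitrary choices for illustration.

```python
import math


def eps_insensitive(y, f, eps):
    """Eq. (3): zero inside the ε-tube, linear in the excess error outside it."""
    r = abs(y - f)
    return 0.0 if r <= eps else r - eps


def objective(w, b, xs, ys, eps, lam):
    """L2-regularized mean ε-insensitive loss of the linear model f(x) = w*x + b."""
    risk = sum(eps_insensitive(y, w * x + b, eps) for x, y in zip(xs, ys)) / len(xs)
    return 0.5 * lam * w * w + risk


def fit_linear_svr(xs, ys, eps=0.05, lam=1e-3, lr0=0.5, iters=3000):
    """Subgradient descent on the primal objective; returns the best (objective, w, b)."""
    n = len(xs)
    w = b = 0.0
    best = (objective(w, b, xs, ys, eps, lam), w, b)
    for t in range(iters):
        gw, gb = lam * w, 0.0
        for x, y in zip(xs, ys):
            r = (w * x + b) - y
            if abs(r) > eps:  # only points outside the tube contribute
                s = 1.0 if r > 0 else -1.0
                gw += s * x / n
                gb += s / n
        step = lr0 / math.sqrt(t + 1)  # diminishing step size
        w -= step * gw
        b -= step * gb
        current = objective(w, b, xs, ys, eps, lam)
        if current < best[0]:
            best = (current, w, b)
    return best
```

Points that already lie inside the ε-tube contribute nothing to the subgradient, which is the source of the robustness to small deviations described above.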

M5 model tree

The M5 model tree was developed by Quinlan (Citation1992). The method belongs to the family of data mining and machine learning methods. Data mining methods discover suitable models from a set of known values and then use them to estimate new values. Data mining methods, developed for large datasets with many variables, differ from classical statistical methods, which were designed for smaller datasets with fewer variables. Decision tree-based methods, among the best-known data mining techniques, use input data to predict or classify a target attribute with a tree-structured model. The M5 model tree is a decision tree for predicting continuous numerical attributes, with linear regression functions at its leaves (Sattari, Pal, Apaydin, & Ozturk, Citation2013); such trees have recently contributed significantly to classification and prediction. Tree models are easy to understand and reproduce, and their outputs are often more accurate than those of other models. A decision tree consists of four parts: the root, branches, nodes, and leaves. Branches represent the connections between nodes, which are drawn as circles. For ease of drawing, a decision tree is usually drawn from left to right or from top to bottom, with the root (first node) at the top. The end of a chain is called a leaf. Each split is made on one of the predictor variables so as to minimize the sum of squared deviations from the mean within each node. The first step in developing a tree model is to choose a splitting criterion. The M5 splitting criterion is based on the standard deviation of the target values that reach each node. 
This standard deviation measures the error at the node; the expected reduction in error from testing each attribute at that node is calculated, and the attribute that maximizes this reduction is chosen. The model error is generally calculated by measuring the accuracy of predicting the target values of unseen cases. The standard deviation reduction (SDR) is defined as

$SDR = sd(T) - \sum_{i} \frac{|T_{i}|}{|T|}\, sd(T_{i})$ (4)

where sd is the standard deviation, T is the set of samples entering the node, and T_i is the subset of samples that have the ith outcome of the potential test.
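Eq. (4) and the search for the best binary split on one numeric input can be sketched as follows. The sketch assumes candidate thresholds at midpoints between sorted distinct input values and uses the population standard deviation; both are illustrative choices, and the function names are hypothetical.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Eq. (4): standard deviation reduction obtained by splitting `parent` into `subsets`."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(xs, ys):
    """Return (SDR, threshold) of the binary split x <= threshold that maximizes SDR."""
    values = sorted(set(xs))
    best = (float("-inf"), None)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = sdr(ys, [left, right])
        if score > best[0]:
            best = (score, threshold)
    return best
```

For inputs 1, 2, 3, 4 with targets 1, 1, 5, 5, the best threshold is 2.5, giving SDR = 2.0 because both child nodes become perfectly homogeneous.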

Splitting is repeated recursively until terminal nodes (leaves) are reached, where the sum of squared deviations from the mean is almost zero. The result is a large tree with many branches and nodes, which is difficult to work with; therefore, to obtain an optimal and efficient tree, unnecessary branches should be pruned. There are two pruning approaches: (1) pruning before the maximum tree is formed and (2) pruning after the maximum tree is formed. In the first, pruning prevents the tree from growing extra branches; in the second, the maximum tree is formed first and then pruned. The optimal tree is selected by minimizing prediction error (Bhattacharya & Solomatine, Citation2006).
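Post-pruning can be sketched as a bottom-up pass that replaces a subtree with a leaf whenever the leaf's estimated error is no worse than the subtree's. This sketch is a simplification under stated assumptions: leaves predict a constant rather than M5's linear models, error is the mean absolute deviation, and the (n + v)/(n − v) inflation factor (v model parameters, n samples) mirrors the kind of correction Quinlan's M5 applies to training error. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class TreeNode:
    ys: list  # target values of the training samples reaching this node
    children: list = field(default_factory=list)


def leaf_error(ys, v=1):
    """Mean absolute deviation of a constant leaf, inflated by (n + v) / (n - v)."""
    n = len(ys)
    mean = fmean(ys)
    mad = fmean(abs(y - mean) for y in ys)
    if n <= v:  # too few samples for the correction factor
        return 0.0 if mad == 0 else float("inf")
    return mad * (n + v) / (n - v)


def prune(node):
    """Bottom-up post-pruning; returns the estimated error of the (possibly pruned) subtree."""
    if not node.children:
        return leaf_error(node.ys)
    n = len(node.ys)
    subtree = sum(len(c.ys) / n * prune(c) for c in node.children)
    if leaf_error(node.ys) <= subtree:
        node.children = []  # replace the whole subtree with a leaf
        return leaf_error(node.ys)
    return subtree
```

A split that leaves both children as mixed as the parent is pruned away, while a split that separates two clearly different groups of targets survives.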

Evaluation parameters

The errors between predicted and observed data were evaluated using the root mean square error (RMSE), relative absolute error (RAE), mean absolute error (MAE), and correlation coefficient (CC) as follows (Choubin et al., Citation2020; Hauduc et al., Citation2015):

$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (x_{i} - y_{i})^{2}}$ (5)

$MAE = \frac{1}{n} \sum_{i = 1}^{n} |x_{i} - y_{i}|$ (6)

$CC = \frac{\sum_{i = 1}^{n} x_{i} y_{i} - \frac{1}{n} \sum_{i = 1}^{n} x_{i} \sum_{i = 1}^{n} y_{i}}{\sqrt{\left(\sum_{i = 1}^{n} x_{i}^{2} - \frac{1}{n} \left(\sum_{i = 1}^{n} x_{i}\right)^{2}\right) \left(\sum_{i = 1}^{n} y_{i}^{2} - \frac{1}{n} \left(\sum_{i = 1}^{n} y_{i}\right)^{2}\right)}}$ (7)

$RAE = \frac{\sum_{i = 1}^{n} |y_{i} - x_{i}|}{\sum_{i = 1}^{n} |y_{i} - \bar{y}|}$ (8)

where $x_{i}$ and $y_{i}$ are the predicted and observed values of SSI, $\bar{y}$ is the mean of the observed values, and $n$ is the number of observations. A CC of 1 indicates perfect correlation between observed and predicted values; positive values signify a direct correlation and negative values an inverse correlation. RMSE and MAE measure errors, with smaller values indicating lower modeling errors, and an RAE below 1 means that the total absolute error is smaller than that of always predicting the observed mean.
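Eqs. (5) to (8) can be computed together as below. This is a minimal sketch; the function name is hypothetical, `pred` holds the predicted values x_i and `obs` the observed values y_i as in the text, and the RAE denominator uses the mean of the observed values, which is the usual definition.

```python
import math


def evaluate_predictions(pred, obs):
    """RMSE, MAE, CC, and RAE of Eqs. (5)-(8); pred = x_i (predicted), obs = y_i (observed)."""
    n = len(pred)
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(pred, obs)) / n)
    mae = sum(abs(x - y) for x, y in zip(pred, obs)) / n
    sx, sy = sum(pred), sum(obs)
    sxy = sum(x * y for x, y in zip(pred, obs))
    sxx = sum(x * x for x in pred)
    syy = sum(y * y for y in obs)
    cc = (sxy - sx * sy / n) / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    obs_mean = sy / n
    rae = sum(abs(y - x) for x, y in zip(pred, obs)) / sum(abs(y - obs_mean) for y in obs)
    return {"RMSE": rmse, "MAE": mae, "CC": cc, "RAE": rae}
```

For predictions 1, 2, 3 against observations 1, 2, 4, this gives MAE = 1/3 and RAE = 0.3.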

Results and discussion

Predicting runoff from precipitation and evapotranspiration parameters is very useful for water resources management. The present paper investigates the performance of three models, SVR, GEP, and M5, in forecasting the Standardized Streamflow Index from the SPI and SPEI indices at Navrood station with six time lags (one to six months). Among the time scales considered for SSI, the 48-month scale was selected because it showed a higher correlation, and it was predicted with the introduced models. Table 1 presents statistical parameters of the drought indices used in the study area, and Figure 2 shows the SSI, SPI, and SPEI indices on a 48-month scale. Furthermore, the cross-correlations between the SPI and SPEI meteorological indices and SSI were studied at zero lag and at lags of up to six months. The cross-correlation results indicated that hydrological drought (SSI) in the Navrood basin follows meteorological drought (SPI) with 3- and 4-month lags, with the highest similarity at the 48-month scale. Likewise, hydrological drought (SSI) follows meteorological drought as measured by SPEI with 1- and 2-month lags, again with the highest similarity at the 48-month scale. As Figure 2 shows, the SSI, SPEI, and SPI indices have been inconsistent since 2013–2014: rainfall did not lead to surface flow, which may be due to changes in the type and amount of precipitation and to increased temperature and evapotranspiration. Higher temperature increases evaporation and surface dryness, which in turn increases the intensity and duration of drought. It can also be concluded that surface flow responds with a delay of a few months. In addition, because SPEI is affected by temperature and requires evapotranspiration to be calculated, it is more sensitive to climatic conditions than SPI. 
The Pearson correlation matrix and cross-correlation show that the SPI drought index has worked better than the SPEI drought index in the Navrood drainage basin, since the Navrood basin is wet and has lower evapotranspiration values. However, the SSI hydrological drought index had a faster response to the SPEI drought index (1-, 2-month delay) compared to the SPI index (3-, 4-month delay).
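The lagged cross-correlation analysis described above can be sketched as follows. This is a hypothetical illustration rather than the authors' procedure: `driver` stands for a monthly meteorological index (SPI or SPEI), `response` for the monthly SSI series of the same length, and each lag k from 0 to 6 months pairs `response[t]` with `driver[t - k]` and computes their Pearson correlation. The lag with the highest correlation is read as the delay with which hydrological drought follows meteorological drought. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(
    driver: Sequence[float], response: Sequence[float], max_lag: int = 6
) -> List[Tuple[int, float]]:
    """Correlate response[t] with driver[t - k] for k = 0..max_lag."""
    if len(driver) != len(response):
        raise ValueError("series must have the same length")
    n = len(driver)
    results = []
    for k in range(max_lag + 1):
        # Drop the last k driver values and the first k response values,
        # so every pair is (driver[t - k], response[t]).
        results.append((k, pearson(driver[: n - k], response[k:])))
    return results


def best_lag(
    driver: Sequence[float], response: Sequence[float], max_lag: int = 6
) -> Tuple[int, float]:
    """Return the (lag, correlation) pair with the highest correlation."""
    return max(lagged_correlations(driver, response, max_lag), key=lambda pair: pair[1])
```

With monthly SPI as `driver` and SSI as `response`, `best_lag(spi, ssi)` would return the delay in months at which the two series agree most closely.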

Figure 2. Trend of the SSI, SPI, and SPEI indices on a 48-month scale during the study period.
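To make the notion of a 48-month scale concrete, the sketch below accumulates a monthly series over a rolling window and standardizes the accumulated values. It is a simplified, nonparametric stand-in and not necessarily the procedure used in this study: operational SPI, SPEI, and SSI computations usually fit a parametric distribution (for example, gamma for SPI) separately for each calendar month. Here, all windows are pooled and converted to z-scores through Gringorten plotting positions, $p = (r - 0.44)/(n + 0.12)$, followed by the inverse standard normal CDF. The function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Optional, Sequence


def accumulate(series: Sequence[float], scale: int) -> List[Optional[float]]:
    """Rolling sum over `scale` months; None until a full window exists."""
    return [
        None if t < scale - 1 else sum(series[t - scale + 1 : t + 1])
        for t in range(len(series))
    ]


def gringorten_z(values: Sequence[float]) -> List[float]:
    """Map values to standard-normal z-scores via Gringorten plotting positions."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Group tied values and give each the average of their 1-based ranks.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    normal = NormalDist()  # standard normal: mean 0, sigma 1
    return [normal.inv_cdf((r - 0.44) / (n + 0.12)) for r in ranks]


def standardized_index(series: Sequence[float], scale: int = 48) -> List[Optional[float]]:
    """Simplified standardized index (e.g. an SSI-like series) at the given scale."""
    if len(series) < scale:
        raise ValueError("series is shorter than the accumulation scale")
    accumulated = accumulate(series, scale)
    valid = [a for a in accumulated if a is not None]
    return [None] * (scale - 1) + gringorten_z(valid)
```

Negative values mark windows that are drier than usual for the record and positive values wetter ones; the first `scale - 1` months have no value because their window is incomplete.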

Table 1. Statistical characteristics of the utilized data.

There is no standard method for splitting data into training and testing sets. For instance, Kurup and Dudani (2014) used 63% of their data for model development, Qasem et al. (2019) used 67%, Samadianfard, Ghorbani, and Mohammadi (2018), Deo et al. (2018), Samadianfard, Jarhan, et al. (2019) and Samadianfard, Majnooni-Heris, et al. (2019) used 70%, and Zounemat-Kermani et al. (2019) used 80% of their data to develop their models. Accordingly, in this study approximately 70% of the data were used for training and the remaining 30% for testing. To compare the methods, the corresponding statistical parameters were then computed; they are shown in Tables 2 and 3.
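The 70/30 division can be sketched together with a lagged input matrix. Two points are assumptions, since the text does not state them: that a model with delay k receives the current value and the k previous monthly values of the index (lags 0 to k), and that the split is chronological, so the testing period follows the training period and no future values leak into training. `build_lagged_inputs` and `chronological_split` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_lagged_inputs(
    index: Sequence[float], target: Sequence[float], lag: int
) -> Tuple[Matrix, List[float]]:
    """Hypothetical input design: row t holds index[t], index[t-1], ..., index[t-lag]."""
    if len(index) != len(target):
        raise ValueError("index and target must have the same length")
    X: Matrix = []
    y: List[float] = []
    # Start at t = lag so that every row has a full set of lagged values.
    for t in range(lag, len(target)):
        X.append([index[t - k] for k in range(lag + 1)])
        y.append(target[t])
    return X, y


def chronological_split(
    X: Matrix, y: List[float], train_frac: float = 0.7
) -> Tuple[Tuple[Matrix, List[float]], Tuple[Matrix, List[float]]]:
    """Split rows in time order: the first ~train_frac for training, the rest for testing."""
    cut = round(len(y) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Under these assumptions, `build_lagged_inputs(spi, ssi, lag=3)` followed by `chronological_split` would prepare the training and testing sets for one delay; a random split would instead mix later months into the training set.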

Table 2. Assessment criteria for predicting SSI using SPEI.

Table 3. Assessment criteria for predicting SSI using SPI.

When SPEI was used as the input, GEP-4 performed best among the GEP models, with CC, MAE, RMSE, and RAE of 0.650, 0.849, 0.961, and 1.08, respectively. GEP-3 ranked second with acceptable performance and a slightly higher RMSE; its CC, MAE, RMSE, and RAE were 0.642, 0.830, 0.967, and 1.056, respectively. Among the M5 tree models, M5-3 performed best, with CC, MAE, RMSE, and RAE of 0.659, 0.830, 0.964, and 0.780, respectively. For the SVR method, SVR-1 outperformed the other time delays, with CC, MAE, RMSE, and RAE of 0.663, 0.879, 1.017, and 0.835, respectively. Overall, SVR-6, M5-6, and GEP-6 performed worst, with CC, MAE, RMSE, and RAE of 0.804, 0.924, 1.088, and 0.844 (SVR-6); 0.520, 0.897, 1.057, and 0.821 (M5-6); and 0.540, 0.903, 1.053, and 1.159 (GEP-6). These models are therefore not recommended for SSI prediction.

When SPI was used as the input, GEP-1 performed acceptably among the GEP models, with CC, MAE, RMSE, and RAE of 0.644, 0.711, 0.837, and 0.904, respectively. GEP-2 also performed acceptably, with CC, MAE, RMSE, and RAE of 0.691, 0.718, 0.857, and 0.913, respectively. M5-5 and M5-4 performed best among the tree models, with CC, MAE, RMSE, and RAE of 0.819, 0.691, 0.819, and 0.635 (M5-5) and 0.790, 0.703, 0.819, and 0.658 (M5-4). Among the SVR models, SVR-4 and SVR-3 performed well, with CC, MAE, RMSE, and RAE of 0.774, 0.720, 0.852, and 0.673 (SVR-4) and 0.768, 0.720, 0.852, and 0.677 (SVR-3).

Comparing the statistical parameters in Tables 2 and 3 shows that SPI predicted SSI more accurately than SPEI. The Pearson correlation and cross-correlation analyses likewise indicated that, although SPEI is more sensitive to climatic conditions, SPI was the better predictor and yielded SSI predictions with lower error. Figures 3 and 4 show scatter plots of the predicted and observed SSI values.

Figure 3. Scatter plots of predicted and observed SSI using SPEI values for the best models.

Figure 4. Scatter plots of predicted and observed SSI using SPI values for the best models.

The improved accuracy of the best models noted above is evident for all three methods in Figures 3 and 4. Using SPEI, the predictions of GEP-3, GEP-4, M5-3, and SVR-1 were more consistent with the observed SSI values, and the points scatter little around the 1:1 (bisector) line. Using SPI, the predictions of GEP-1, M5-4, M5-5, and SVR-3 were more accurate. In other words, SSI can be predicted with acceptable accuracy at Navrood station using these models. Figure 5 shows the assessment criteria of all three methods for the six time delays using SPEI and SPI data. As Figure 5 shows, GEP-4 and M5-4, with lower RMSE values than the other studied models, proved to be powerful models for predicting SSI from SPEI and SPI, respectively. In other words, SSI values may be predicted accurately using the GEP-4 and M5-4 models.

Figure 5. Assessment criteria of all considered models using (a) SPEI and (b) SPI data.

Conclusion

Drought has adverse effects on many communities and occurs in a wide range of climates. Given the importance of studying drought and the critical role of drought indices in forecasting it, this study investigated the prediction of the SSI hydrological drought index from the meteorological indices SPI and SPEI using three models: support vector regression, gene expression programming, and the M5 model tree. Overall, SPI predicted SSI more accurately than SPEI. Furthermore, the best SSI predictions were obtained with the GEP model when using SPEI and with the M5 model when using SPI. In other words, SSI values at Navrood can be predicted with low error and acceptable accuracy using gene expression programming with SPEI data and M5 model trees with SPI data. Because only current and lagged monthly values of SPI and SPEI were used as inputs to the machine learning models, the models have limited capability for predicting SSI under different climatic conditions. Further studies could examine other drought indices and their influence on the precision of machine learning models.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Adamowski, J., & Prasher, S. O. (2012). Comparison of machine learning methods for runoff forecasting in mountainous watersheds with limited data. Journal of Water and Land Development, 17(1), 89–97.
Adarsh, S., & Janga Reddy, M. (2019). Evaluation of trends and predictability of short-term droughts in three meteorological subdivisions of India using multivariate EMD-based hybrid modelling. Hydrological Processes, 33(1), 130–143.
Allen, R. G., Pereira, L. S., Raes, D., & Smith, M. (1998). Crop evapotranspiration: Guidelines for computing crop water requirements (FAO Irrigation and Drainage Paper 56). Rome: FAO.
Basak, D., Pal, S., & Patranabis, D. C. (2007). Support vector regression. Neural Information Processing, 11, 203–225.
Bhattacharya, B., & Solomatine, D. P. (2005). Neural networks and M5 model trees in modeling water level-discharge relationship for an Indian river. Neurocomputing, 63, 381–396.
Bhattacharya, B., & Solomatine, D. P. (2006). Machine learning in sedimentation modeling. Neural Networks, 19(2), 208–214.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM.
Choubin, B., Abdolshahnejad, M., Moradi, E., Querol, X., Mosavi, A., Shamshirband, S., & Ghamisi, P. (2020). Spatial hazard assessment of the PM10 using machine learning models in Barcelona, Spain. Science of The Total Environment, 701, 134474.
Choubin, B., Malekian, A., & Golshan, M. (2016). Application of several data-driven techniques to predict a standardized precipitation index. Atmósfera, 29(2), 121–128.
Deo, R. C., Ghorbani, M. A., Samadianfard, S., Maraseni, T., Bilgili, M., & Biazar, M. (2018). Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renewable Energy, 116, 309–323.
Deo, R. C., Kisi, O., & Singh, V. P. (2017). Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model. Atmospheric Research, 184, 149–175.
Faroughi, M., Karimimoshaver, M., Aram, F., Solgi, E., Mosavi, A., Nabipour, N., & Chau, K. W. (2020). Computational modeling of land surface temperature using remote sensing data to investigate the spatial arrangement of buildings and energy consumption relationship. Engineering Applications of Computational Fluid Mechanics, 14(1), 254–270.
Ferreira, C. (2001a). Gene expression programming in problem solving. 6th Online World Conference on Soft Computing in Industrial Applications (invited tutorial).
Ferreira, C. (2001b). Gene expression programming: A new adaptive algorithm for solving problems. Complex Systems, 13(2), 87.
Fung, K. F., Huang, Y. F., & Koo, C. H. (2018). Improvement of SVR-based drought forecasting models using wavelet pre-processing technique. In E3S Web of Conferences (Vol. 65, p. 07007). EDP Sciences.
Hauduc, H., Neumann, M. B., Muschalla, D., Gamerith, V., Gillot, S., & Vanrolleghem, P. A. (2015). Efficiency criteria for environmental model quality assessment: A review and its application to wastewater treatment. Environmental Modelling & Software, 68, 196–204.
Heim Jr., R. R. (2002). A review of twentieth-century drought indices used in the United States. Bulletin of the American Meteorological Society, 83(8), 1149–1166.
Hemmati-Sarapardeh, A., Hajirezaie, S., Soltanian, M. R., Mosavi, A., Nabipour, N., Shamshirband, S., & Chau, K. W. (2020). Modeling natural gas compressibility factor using a hybrid group method of data handling. Engineering Applications of Computational Fluid Mechanics, 14(1), 27–37.
Kurup, P. U., & Dudani, N. K. (2014). Neural networks for profiling stress history of clays from PCPT data. Journal of Geotechnical and Geoenvironmental Engineering, 128, 569–579.
McKee, T. B., Doesken, N. J., & Kleist, J. (1993). The relationship of drought frequency and duration to time scales. Preprints 8th Conference on Applied Climatology, 17, 179–184.
Mishra, A. K., & Singh, V. P. (2011). Drought modeling: A review. Journal of Hydrology, 403(1-2), 157–175.
Mouatadid, S., Raj, N., Deo, R. C., & Adamowski, F. (2018). Input selection and data-driven model performance optimization to predict the standardized precipitation and evaporation index in a drought-prone region. Atmospheric Research, 212, 130–149.
Nabipour, N. (2020). Short-term hydrological drought forecasting based on different nature-inspired optimization algorithms hybridized with artificial neural networks. IEEE Access, 155, 225–233. doi:10.1109/ACCESS.2020.2964584.
Palmer, W. C. (1965). Meteorological drought. US Department of Commerce. Washington, DC, USA: Weather Bureau.
Parasuraman, K., Elshorbagy, A., & Carey, S. K. (2007). Modelling the dynamics of the evapotranspiration process using genetic programming. Hydrological Sciences Journal, 52(3), 563–578.
Qasem, S. N., Samadianfard, S., Kheshtgar, S., Jarhan, S., Kisi, O., Shamshirband, S., & Chau, K. W. (2019). Modeling monthly pan evaporation using wavelet support vector regression and wavelet artificial neural networks in arid and humid climates. Engineering Applications of Computational Fluid Mechanics, 13(1), 177–187.
Quinlan, J. R. (1992). Learning with continuous classes. In Proceedings of AI'92, Fifth Australian Joint Conference on Artificial Intelligence (pp. 343–348). Singapore: World Scientific.
Sachindra, D. A., Ahmed, K., Rashid, M. M., Shahid, S., & Perera, B. J. C. (2018). Statistical downscaling of precipitation using machine learning techniques. Atmospheric Research, 212, 240–258.
Sahoo, B. B., Jha, R., Singh, A., & Kumar, D. (2019). Application of support vector regression for modeling low flow time series. KSCE Journal of Civil Engineering, 14, 1–12.
Samadianfard, S., Ghorbani, M. A., & Mohammadi, B. (2018). Forecasting soil temperature at multiple-depth with a hybrid artificial neural network model coupled hybrid firefly optimizer algorithm. Information Processing in Agriculture, 5, 465–476.
Samadianfard, S., Jarhan, S., Salwana, E., Mosavi, A., Shamshirband, S., & Akib, S. (2019). Support vector regression integrated with fruit fly optimization algorithm for river flow forecasting in Lake Urmia Basin. Water, 11, 1934.
Samadianfard, S., Majnooni-Heris, A., Qasem, S. N., Kisi, O., Shamshirband, S., & Chau, K. W. (2019). Daily global solar radiation modeling using data-driven techniques and empirical equations in a semi-arid climate. Engineering Applications of Computational Fluid Mechanics, 13(1), 142–157.
Sattari, M. T., Farkhondeh, A., & Abraham, J. P. (2018). Estimation of sodium adsorption ratio indicator using data mining methods: A case study in Urmia Lake basin, Iran. Environmental Science and Pollution Research, 25(5), 4776–4786.
Sattari, M. T., Pal, M., Apaydin, H., & Ozturk, F. (2013). M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey. Water Resources, 40(3), 233–242.
Shabani, S., Samadianfard, S., Sattari, M. T., Mosavi, A., Shamshirband, S., Kmet, T., & Várkonyi-Kóczy, A. R. (2020). Modeling pan evaporation using Gaussian process regression K-nearest neighbors random forest and support vector machines; comparative analysis. Atmosphere, 11(1), 66.
Sheffield, J., Wood, E. F., & Roderick, M. L. (2012). Little change in global drought over the past 60 years. Nature, 491(7424), 435–438.
Shukla, S. H., & Wood, A. W. (2008). Use of a standardized runoff index for characterizing hydrologic drought. Geophysical Research Letters, 35(2), 41–46.
Soh, Y. W., Koo, C. H., Huang, Y. F., & Fung, K. F. (2018). Application of artificial intelligence models for the prediction of standardized precipitation evapotranspiration index (SPEI) at Langat River Basin, Malaysia. Computers and Electronics in Agriculture, 144, 164–173.
Sumi, S. M., Zaman, M., & Hirose, H. (2012). A rainfall forecasting method using machine learning models and its application to the Fukuoka city case. International Journal of Applied Mathematics and Computer Science, 22(4), 841–854.
Tirivarombo, S., Osupile, D., & Eliasson, P. (2018). Drought monitoring and analysis: Standardised precipitation evapotranspiration index (SPEI) and standardised precipitation index (SPI). Physics and Chemistry of the Earth, Parts A/B/C, 106, 1–10.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Vicente-Serrano, S. M., Beguería, S., & López-Moreno, J. I. (2010). A multiscalar drought index sensitive to global warming: The standardized precipitation evapotranspiration index. Journal of Climate, 23(7), 1696–1718.
Wilhite, D. A. (2000). Drought: A global assessment. Volume I. London and New York: Routledge.
Yin, Z., Feng, Q., Wen, X., Deo, R. C., Yang, L., Si, J., & He, Z. (2018). Design and evaluation of SVR, MARS and M5Tree models for 1, 2 and 3-day lead time forecasting of river flow data in a semiarid mountainous catchment. Stochastic Environmental Research and Risk Assessment, 32(9), 2457–2476.
Zounemat-Kermani, M., Seo, Y., Kim, S., Ghorbani, M. A., Samadianfard, S., Naghshara, S., … Singh, V. P. (2019). Can decomposition approaches always enhance soft computing models? Predicting the dissolved oxygen concentration in the St. Johns River, Florida. Applied Sciences, 9, 2534.
