Hydrological Sciences Journal Volume 65, 2020 - Issue 3

Water quality monitoring at a virtual watershed monitoring station using a modified deep extreme learning machine

Jian Jin, College of Automation, Hangzhou Dianzi University, Hangzhou, China

Peng Jiang (corresponding author), College of Automation, Hangzhou Dianzi University, Hangzhou, China

Lei Li, College of Automation, Hangzhou Dianzi University, Hangzhou, China

Huan Xu, College of Automation, Hangzhou Dianzi University, Hangzhou, China

Guang Lin, Zhejiang Provincial Environmental Monitoring Center, Hangzhou, China

Pages 415-426 | Received 15 Feb 2019, Accepted 02 Oct 2019, Published online: 17 Dec 2019

https://doi.org/10.1080/02626667.2019.1699245


ABSTRACT

A new deep extreme learning machine (ELM) model is developed to predict water temperature and conductivity at a virtual monitoring station. Building on previous research, a modified ELM auto-encoder is developed to extract more robust invariant features from the water quality data. A weighted ELM that uses seasonal variation as the basis of weighting is used to predict the actual values of water quality parameters at sites that have only historical data and no longer generate new data. The proposed model is validated against monthly data from eight monitoring stations on the Zengwen River, Taiwan (2002–2017). In terms of root mean square error, mean absolute error, mean absolute percentage error and correlation coefficient, the experimental results show that the new model outperforms the other classical spatial interpolation methods.

KEYWORDS:

weighted extreme learning machine
contractive denoising auto-encoder
virtual monitoring station
water quality prediction

Editor S. Archfield Associate editor Xun Sun

1 Introduction

Water bodies should be monitored at a spatial scale that provides information on their current state and highlights where new management actions may be needed, or whether current management practices are sufficient (Reyjol et al. Citation2014). Hence, the more monitoring sites there are throughout a water body, the more likely it is that monitoring will accurately represent its current state. However, according to market research conducted by the authors, the construction cost in China of a small automatic monitoring station measuring five water quality parameters (permanganate index, ammonia nitrogen, total phosphorus, water temperature, pH) is up to 4 million RMB, excluding subsequent equipment maintenance and human resources. A large number of monitoring sites therefore has substantial resource implications, and a balance is needed between resource requirements and scientific rigor (Earle and Blacklocke Citation2008).

In the early 20th century, Prandtl (Citation1925) proposed the mixing length theory, which describes the mixing length as the distance over which a fluid parcel maintains its properties before it mixes with the surrounding fluid. For a river, the mixing length is the distance over which an upstream water parcel keeps its original properties before dispersing them into the surrounding downstream water (Do et al. Citation2012). To reduce maintenance costs, an existing monitoring network can therefore be optimized on the basis of this theory (Chapman et al. Citation2016). For this reason, many researchers have studied optimization methods, mostly based on statistical theory (Guigues et al. Citation2013, Mavukkandy et al. Citation2014, Wang et al. Citation2014, Tanos et al. Citation2015, Villas-Boas et al. Citation2017), the kriging method (Karamouz et al. Citation2009a, Chunping et al. Citation2012, Sabzipour et al. Citation2017) or information entropy (Karamouz et al. Citation2009b, Lee et al. Citation2014). For instance, Maymandi et al. (Citation2018) proposed a hybrid VOI-entropy-based methodology to optimize the water quality monitoring network of the largest manmade reservoir in Iran. To optimize the monitoring network of the Taizihe River, Northeast China, Wang et al. (Citation2015) applied matter element analysis and gravity distance, which proved to be a useful approach, and found that the number of monitoring stations could be reduced from 17 to 13. Antanasijević et al. (Citation2017) proposed a self-organizing network based on a similarity index to optimize sampling locations in the existing water quality monitoring network of the River Danube along its stretch through Serbia. Chapman et al. (Citation2016) used combined cluster and discriminant analysis (Kovács et al. Citation2014) to evaluate the efficiency of a monitoring network on the Danube River.

After optimization by the methods described above, some sites in a monitoring network can be closed to reduce subsequent maintenance costs. Such monitoring stations, which have only historical data and no longer generate new data, are referred to as ‘virtual stations’ in this paper. Although the trend at a virtual station can be inferred from highly correlated neighbouring sites, its specific values remain unknown. For example, when a new pollution source appears near a closed monitoring station, the specific pollutant concentrations at that point are needed to judge the severity of the pollution (Mitrović et al. Citation2019). Methods such as ordinary kriging (OK) and inverse distance weighting (IDW) (Mehrjardi et al. Citation2008, Yang and Jin Citation2010) have been developed to interpolate values at inactive sites and infer water quality at unmonitored sites. Recently, Rizo-Decelis et al. (Citation2017) adapted an improved topological kriging (TK) method to estimate water quality along the main stem of a large basin. Based on cross-validation experiments on 28 water quality variables measured along the Santiago River in western Mexico, they showed that the TK method predicts water quality more accurately than other methods (including OK) for many of the measured parameters. Laaha et al. (Citation2013) employed the TK method and regional regression to estimate streamflow-related variables along streams in Austria and found that the TK method is well suited to streamflow and many streamflow-related variables. Jiang et al. (Citation2013) used the OK method to predict the spatial distribution of several water quality parameters (salinity, pH, total hardness) in the Huaihe River basin, China.

With the development of artificial intelligence technology, the artificial neural network (ANN) has been shown to outperform both the kriging method and partial thin-plate splines owing to its strong ability to fit non-linear systems (Juan Citation2003), and it has been successfully applied to various spatial interpolation tasks (Li et al. Citation2004, Philippopoulos and Deligiorgi Citation2012, Shahabi et al. Citation2016). For the spatial interpolation of water quality, Yingchun et al. (Citation2013) proposed an ANN model to spatially predict dissolved organic carbon (DOC) and demonstrated that the ANN is a promising tool for predicting DOC in river networks from watershed characteristics. Li et al. (Citation2013) applied genetic algorithm-optimized back-propagation neural networks (GA-BPNN) to forecast groundwater quality parameters using spatial data as inputs. The GA-BPNN outperformed the OK method and was found to be a reasonable and feasible method for mapping the spatial distribution of groundwater quality parameters in Langfang city (China). Khashei-Siuki and Sarbazi (Citation2015) compared IDW, kriging, cokriging, ANN and adaptive network-based fuzzy inference system (ANFIS) models for predicting the spatial distribution of groundwater electrical conductivity (EC) and found that the ANN model had the best accuracy. Mitrović et al. (Citation2019) used a Monte Carlo-optimized ANN to predict 18 water quality parameters at monitoring stations on the Danube River that had not operated since 2012. Most of the studied parameters (13 of 18) were estimated with relative errors below 10%, indicating that it was reasonable to close these monitoring stations.

The traditional ANN is often trained with the back-propagation (BP) algorithm. Research shows that the BP algorithm learns slowly, easily falls into local optima and is difficult to adapt to some real-time scenarios (Sotiropoulos et al. Citation2002). To solve these problems, Huang et al. (Citation2006) proposed a new neural network model, the extreme learning machine (ELM). As systems become more complex, shallow models with simple structures show limitations in problem analysis. Many researchers have therefore extended the ELM into deep models, such as the multi-layer ELM (Kasun et al. Citation2013), the denoising multi-layer ELM (Zhang et al. Citation2016) and the contractive multi-layer ELM (Jia and Du Citation2016), to achieve automatic feature extraction. Building on these models, this paper proposes a new model that combines the ELM with a contractive denoising auto-encoder (CDAE) and applies it to predict water quality at virtual stations in the Zengwen River Basin, Taiwan.

Compared with the existing related methods of predicting the spatial distribution of water quality, the contributions of this paper are summarized as follows:

local denoising criteria and a contractive regularization term are introduced simultaneously into the auto-encoder based on the ELM, and a deep contractive denoising ELM model is developed; and
the proposed model is used to extract the abstract feature expression of the spatial water quality distribution relationship in the monitoring network, and the prediction of water quality at virtual stations is realized by the weighted extreme learning machine (WELM), which takes seasonal variation as the basis of weighting in the model.

The rest of the paper is organized as follows. Section 2 introduces the WELM and the auto-encoder and describes the proposed method. In Section 3, the Pearson correlation coefficient is used to analyse the spatial correlation of water temperature and conductivity collected since 22 January 2002 at eight water quality monitoring stations near the Zengwen River in Taiwan. Then, five algorithms, namely the classical shallow models (the back-propagation neural network (BPNN), ELM and WELM) and the deep ELM models (the denoising multi-layer ELM and the proposed method), are used to predict the actual values at the virtual monitoring sites from the observed values at related stations. The prediction results are compared using four indicators (RMSE, MAE, MAPE and R) to test the performance of the proposed method. Finally, conclusions and future work are summarized.

2 Method

2.1 Weighted extreme learning machine (WELM)

The ELM is a learning method for training single-hidden-layer feedforward networks (SLFNs) and has demonstrated excellent learning accuracy and speed in a variety of applications (Huang et al. Citation2012). Suppose there are N training samples $(x_{i}, t_{i}), i = 1, 2, \dots, N$, where each sample is $x_{i} = [x_{i1}, x_{i2}, \dots, x_{id_{x}}]^{T} \in \mathbb{R}^{d_{x}}$ and its target vector is $t_{i} = [t_{i1}, t_{i2}, \dots, t_{id_{t}}]^{T} \in \mathbb{R}^{d_{t}}$, with $d_{x}$ and $d_{t}$ the input and output dimensions, respectively. A standard SLFN with $l$ hidden nodes and activation function $g(x)$ is modelled as:

$$\sum_{j=1}^{l} \beta_{j} g_{j}(x_{i}) = \sum_{j=1}^{l} \beta_{j} g(a_{j} \cdot x_{i} + b_{j}) = t_{i}, \quad i = 1, 2, \dots, N \qquad (1)$$

where $a_{j} = (a_{j1}, a_{j2}, \dots, a_{jd_{x}})$ is the weight vector connecting the jth hidden node and the input nodes; $\beta_{j} = (\beta_{j1}, \beta_{j2}, \dots, \beta_{jd_{t}})$ is the weight vector connecting the jth hidden node and the output nodes; and $b_{j}$ is the threshold (bias) of the jth hidden node.

The N equations above can be written compactly as:

$$H\beta = T \qquad (2)$$

and the output weight $\beta$ can be calculated as:

$$\beta = H^{\dagger} T \qquad (3)$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of the hidden layer output matrix $H$, given by Equation (4); $\beta = [\beta_{1}^{T}, \beta_{2}^{T}, \dots, \beta_{l}^{T}]^{T}_{l \times d_{t}}$ is the output weight matrix; and $T = [t_{1}, t_{2}, \dots, t_{N}]^{T}_{N \times d_{t}}$ is the target output matrix.

$$H(a_{1}, \dots, a_{l}, b_{1}, \dots, b_{l}, x_{1}, \dots, x_{N}) = \begin{bmatrix} g(a_{1} \cdot x_{1} + b_{1}) & \cdots & g(a_{l} \cdot x_{1} + b_{l}) \\ \vdots & \ddots & \vdots \\ g(a_{1} \cdot x_{N} + b_{1}) & \cdots & g(a_{l} \cdot x_{N} + b_{l}) \end{bmatrix}_{N \times l} \qquad (4)$$

To improve the robustness of the ELM, Deng et al. (Citation2010) proposed a regularized ELM (RELM), which combines empirical risk with structural risk; the cost function becomes:

$$\min_{\beta} L_{1} = \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\|T - H\beta\|^{2} \qquad (5)$$

where C is a regularization parameter that balances empirical risk and structural risk. Setting the gradient of $L_{1}$ with respect to $\beta$ to zero gives the output weight:

$$\beta = \left(\frac{I}{C} + H^{T}H\right)^{-1} H^{T} T = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T \qquad (6)$$

where the second form requires inverting only an $N \times N$ matrix and is cheaper when $N < l$. As $C \to \infty$, Equation (6) tends to the Moore–Penrose solution of Equation (3).
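The closed-form training of Equation (4) and the ridge solution $\beta = (I/C + H^{T}H)^{-1}H^{T}T$ of Equation (6) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the uniform initialization of $a_{j}$ and $b_{j}$ in [−1, 1], the value C = 1000, the function names and the toy data are our assumptions. The output weights are obtained with a linear solve rather than an explicit inverse, and the equivalent $N \times N$ form is computed for comparison.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, T, n_hidden, C=1e3, seed=0):
    """Regularized ELM: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # column j is a_j (assumed init)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden thresholds b_j
    H = sigmoid(X @ A + b)                                   # N x l hidden output matrix, Eq. (4)
    # (I/C + H^T H) beta = H^T T, solved without forming the inverse
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return A, b, beta


def elm_hidden(model, X):
    A, b, _ = model
    return sigmoid(X @ A + b)


def elm_predict(model, X):
    return elm_hidden(model, X) @ model[2]


# Toy usage: 200 samples with 3 inputs and one smooth target
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
T = np.sin(X.sum(axis=1, keepdims=True))
C = 1e3
model = elm_fit(X, T, n_hidden=30, C=C)
rmse = np.sqrt(np.mean((elm_predict(model, X) - T) ** 2))

# N x N form: beta = H^T (I/C + H H^T)^{-1} T gives the same weights
H = elm_hidden(model, X)
beta_alt = H.T @ np.linalg.solve(np.eye(len(X)) / C + H @ H.T, T)
```

Solving the $l \times l$ system is the cheaper choice here because l = 30 is much smaller than N = 200.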

Data in real applications such as water quality forecasting often have an imbalanced distribution, which means that some samples in the time series are more important than others. To handle regression or classification tasks with imbalanced distributions, Zong et al. (Citation2013) proposed the weighted ELM (WELM).

The objective function of the WELM can be written as:

$$\text{Minimize: } L_{P_{ELM}} = \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N} w_{ii}\|\varepsilon_{i}\|^{2} \qquad (7a)$$

subject to:

$$h(x_{i})\beta = t_{i}^{T} - \varepsilon_{i}, \quad i = 1, 2, \dots, N \qquad (7b)$$

where $\varepsilon_{i} = (\varepsilon_{i1}, \varepsilon_{i2}, \dots, \varepsilon_{id_{t}})$ is the training error of the $d_{t}$ outputs for input $x_{i}$; $h(x_{i})$ is the hidden layer output (row) vector for input $x_{i}$; $\beta$ is the output weight matrix; and W is an $N \times N$ diagonal matrix with diagonal elements $w_{ii}, i = 1, 2, \dots, N$. In this paper, the weight of each sample is the number of samples in the same quarter (season) as that sample, expressed as a proportion of the total number of samples. Finally, according to the Karush–Kuhn–Tucker (KKT) theorem, the output weight matrix $\beta$ is given by:

$$\beta = H^{T}\left(\frac{I}{C} + WHH^{T}\right)^{-1} WT \qquad (8)$$
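The quarterly weighting and Equation (8) can be sketched as follows. We read the weighting rule literally as $w_{ii} = n_{q(i)}/N$, where $n_{q(i)}$ is the number of samples in the same calendar quarter as sample i; this reading, the helper names and the toy data are assumptions. In the model, H would come from an ELM hidden layer; here a random matrix stands in for it.

```python
import numpy as np


def seasonal_weights(months):
    """Diagonal of W: a sample in quarter q gets weight n_q / N (one reading of the rule)."""
    months = np.asarray(months)
    quarters = (months - 1) // 3                  # 0: Jan-Mar, 1: Apr-Jun, 2: Jul-Sep, 3: Oct-Dec
    counts = np.bincount(quarters, minlength=4)   # n_q for each quarter
    return counts[quarters] / len(months)


def welm_output_weights(H, T, w, C=1e3):
    """Eq. (8): beta = H^T (I/C + W H H^T)^{-1} W T with W = diag(w)."""
    N = H.shape[0]
    W = np.diag(w)
    return H.T @ np.linalg.solve(np.eye(N) / C + W @ H @ H.T, W @ T)


# Toy usage: eight monthly samples and a random stand-in for the hidden output matrix
months = [1, 2, 4, 7, 8, 9, 10, 11]
w = seasonal_weights(months)   # quarter counts are 2, 1, 3, 2
rng = np.random.default_rng(0)
H = rng.uniform(size=(8, 5))
T = rng.uniform(size=(8, 1))
beta = welm_output_weights(H, T, w)
```

With all $w_{ii}$ equal to 1, Equation (8) reduces to the RELM solution $H^{T}(I/C + HH^{T})^{-1}T$.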

2.2 Auto-encoder

The auto-encoder is a special neural network that learns features from its input in an unsupervised way, with the target values set equal to the input (Vincent et al. Citation2010). It is composed of an input layer, a hidden layer and an output layer. The structure of a basic auto-encoder is shown in Figure 1.

Figure 1. Structure of the basic auto-encoder.

The auto-encoder consists of two processes, encoding and decoding. In the encoding phase, an encoding mapping $f$ transforms the input vector $x \in \mathbb{R}^{d_{x}}$ into a hidden representation $h \in \mathbb{R}^{d_{h}}$:

$$h = f(x) = \mathrm{func}_{f}(W_{1}x + b_{1}) \qquad (9)$$

where $\mathrm{func}_{f}$ is the encoder activation function, usually a nonlinear function such as the sigmoid or the hyperbolic tangent. The matrix $W_{1} \in \mathbb{R}^{d_{h} \times d_{x}}$ is the encoder weight, and $b_{1} \in \mathbb{R}^{d_{h}}$ is the encoder bias. The parameters $d_{x}$ and $d_{h}$ are the numbers of units in the input layer and hidden layer, respectively.

In the decoding phase, the hidden representation $h$ is mapped back to a reconstruction $\tilde{x} \in \mathbb{R}^{d_{x}}$ by the reconstruction function $\mathrm{func}_{g}$:

$$\tilde{x} = g(h) = \mathrm{func}_{g}(W_{2}h + b_{2}) \qquad (10)$$

where $\mathrm{func}_{g}$ is the decoder activation function, similar to $\mathrm{func}_{f}$. The decoder weight matrix is $W_{2} \in \mathbb{R}^{d_{x} \times d_{h}}$ and the bias is $b_{2} \in \mathbb{R}^{d_{x}}$. In practice the weights are often tied, and in this paper $W_{2} = W_{1}^{T}$. All parameters $\theta = \{W_{1}, W_{2}, b_{1}, b_{2}\}$ are learned simultaneously so that the reconstruction is as close as possible to the original water quality data, which means minimizing the loss function:

$$L(\theta) = \sum_{x \in X} D(x, \tilde{x}) + \frac{1}{2}\lambda\|W\|_{2}^{2} \qquad (11)$$

where $X = \{x_{1}, x_{2}, \dots, x_{N}\}$ is the training set with N samples; and $D$ is the reconstruction error, for example the squared error $D(x, \tilde{x}) = \frac{1}{2}\|\tilde{x} - x\|^{2}$. The second term is a regularization term added to avoid overfitting, and the weight-decay coefficient $\lambda$ controls its importance.

To learn robust representations, the denoising auto-encoder (DAE) (Vincent et al. Citation2010) and the contractive auto-encoder (CAE) (Rifai et al. Citation2011) were proposed. The DAE is trained to reconstruct a clean “repaired” input from a corrupted version of it, whereas the CAE adds a different regularization term:

$$L_{CAE}(\theta) = \sum_{x \in X} D(x, \tilde{x}) + \frac{1}{2}\lambda\|J(x)\|_{F}^{2} \qquad (12)$$

where $J(x) = \frac{\partial f}{\partial x}$ is the Jacobian matrix of the encoder $f$ at $x$. For a sigmoid encoder, the squared Frobenius norm $\|J(x)\|_{F}^{2}$ can be calculated as (Liu et al. Citation2015):

$$\|J(x)\|_{F}^{2} = \sum_{j=1}^{d_{h}} \left(f(x)_{j}\left(1 - f(x)_{j}\right)\right)^{2}\|W_{j}\|^{2} \qquad (13)$$

where $W_{j}$ denotes the jth row of the encoder weight matrix $W_{1}$.
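The tied-weight auto-encoder of Equations (9) and (10) and the contractive penalty of Equation (13) can be checked numerically. The sketch below assumes a sigmoid encoder and decoder, the squared-error reconstruction $D(x, \tilde{x}) = \frac{1}{2}\|\tilde{x} - x\|^{2}$ and random toy parameters; the function names are ours. It compares the closed form of Equation (13) with a central finite-difference Jacobian and evaluates (but does not minimize) the loss of Equation (12).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(x, W1, b1):
    return sigmoid(W1 @ x + b1)                       # Eq. (9)


def decode(h, W1, b2):
    return sigmoid(W1.T @ h + b2)                     # Eq. (10) with tied weights W2 = W1^T


def jacobian_frobenius_sq(x, W1, b1):
    """Eq. (13): sum_j (h_j (1 - h_j))^2 ||W_j||^2, W_j = jth row of W1."""
    h = encode(x, W1, b1)
    return np.sum((h * (1.0 - h)) ** 2 * np.sum(W1 ** 2, axis=1))


def cae_loss(X, W1, b1, b2, lam):
    """Eq. (12) with the squared-error reconstruction term."""
    total = 0.0
    for x in X:
        x_rec = decode(encode(x, W1, b1), W1, b2)
        total += 0.5 * np.sum((x_rec - x) ** 2) + 0.5 * lam * jacobian_frobenius_sq(x, W1, b1)
    return total


rng = np.random.default_rng(0)
d_x, d_h = 4, 3
W1 = rng.normal(size=(d_h, d_x))
b1 = rng.normal(size=d_h)
b2 = rng.normal(size=d_x)
x = rng.uniform(size=d_x)

# Central finite differences: column k approximates d f(x) / d x_k
eps = 1e-6
J_num = np.column_stack([
    (encode(x + eps * e, W1, b1) - encode(x - eps * e, W1, b1)) / (2 * eps)
    for e in np.eye(d_x)
])

X = rng.uniform(size=(10, d_x))
loss_plain = cae_loss(X, W1, b1, b2, lam=0.0)
loss_cae = cae_loss(X, W1, b1, b2, lam=0.5)
```

The closed form avoids building the full $d_{h} \times d_{x}$ Jacobian, which matters when the penalty is evaluated for every training sample.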

2.3 Deep extreme learning machine based on ELM-CDAE

First, a new ELM model based on a contractive denoising auto-encoder (ELM-CDAE) is presented; its structure is shown in Figure 2. The ELM-CDAE combines the denoising criterion with the regularization term of a contractive auto-encoder so that the learned features are robust and useful.

Figure 2. Structure of the ELM-CDAE.

The original input $x$ is corrupted into $\hat{x}$ by a stochastic mapping $\hat{x} \sim q(\hat{x} \mid x)$, and the cost function with the regularization term added can be expressed as:

$$\min_{\beta} L = \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\|T - H\beta\|^{2} + \frac{\lambda}{2}\|J(x)\|_{F}^{2} \qquad (14)$$

Using the same method as for Equation (5), the output weights $\beta$ of the ELM-CDAE can be computed as (Jia and Du Citation2016):

$$\beta = \left(\frac{I}{C} + H^{T}H + \lambda\frac{\partial H}{\partial x}\right)^{-1} H^{T}X \qquad (15)$$

The ELM-CDAE training algorithm is summarized as follows:

Step 1. Corrupt the initial samples $x$ with corrupting noise to obtain the corrupted samples $\hat{x}$. Typical choices are Gaussian noise, masking noise and salt-and-pepper noise.

Step 2. Initialize the ELM-CDAE network with a given number of hidden nodes, randomly generate the input weight matrix and biases, and then use the corrupted samples $\hat{x}$ and Equation (4) to calculate the hidden layer output matrix $H$.

Step 3. Calculate the output weight matrix $β$ by Equation (15).
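Steps 1–3 can be sketched for a single layer as follows. The contractive term of Equation (15) is deliberately omitted, because its exact discretization follows Jia and Du (Citation2016) and is not reproduced here; without it, the sketch is a denoising ELM auto-encoder (the ELM-DAE baseline) that uses the RELM ridge solution with the clean inputs as targets. The noise levels, the function names and the [0, 1] data range assumed for salt-and-pepper noise are our own choices.

```python
import numpy as np


def corrupt(X, kind="masking", level=0.1, rng=None):
    """Step 1: return a corrupted copy of X (the input array is not modified)."""
    rng = np.random.default_rng() if rng is None else rng
    X_hat = np.array(X, dtype=float)                  # copy
    if kind == "gaussian":
        return X_hat + rng.normal(scale=level, size=X_hat.shape)
    mask = rng.random(X_hat.shape) < level            # entries to corrupt
    if kind == "masking":
        X_hat[mask] = 0.0
    elif kind == "salt_and_pepper":
        # assumes data min-max normalized to [0, 1]: corrupted entries become 0 or 1
        X_hat[mask] = rng.integers(0, 2, size=int(mask.sum()))
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return X_hat


def elm_dae_layer(X, n_hidden, C=1e3, kind="masking", level=0.1, seed=0):
    """Steps 2-3 without the contractive term: output weights beta (n_hidden x d_x)."""
    rng = np.random.default_rng(seed)
    X_hat = corrupt(X, kind, level, rng)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X_hat @ A + b)))               # Eq. (4) on corrupted inputs
    # Reconstruct the clean inputs: (I/C + H^T H) beta = H^T X
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)


rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 5))
beta = elm_dae_layer(X, n_hidden=8)
X_masked = corrupt(np.ones((50, 4)), "masking", 0.5, rng)
```

The hidden matrix is built from the corrupted inputs while the regression target stays clean, which is what makes the learned weights denoising.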

Then, a deep model based on the ELM-CDAE and the WELM is proposed; its framework is shown in Figure 3. Similar to the multi-layer ELM, which stacks ELM auto-encoders (ELM-AE) (Kasun et al. Citation2013), the ELM-CDAE is stacked in this paper to create a new deep ELM model. The weights of the connections between the last hidden layer and the output node are then calculated analytically using the WELM approach.

Figure 3. Structure of the proposed model (ELM-CDAE).

As in the multi-layer ELM, each hidden layer of the deep ELM is trained in a bottom-up, greedy layer-wise manner. The raw input vectors are corrupted and fed to the bottom auto-encoder. After the bottom auto-encoder is trained, its hidden representations are passed to the next layer, and the same procedure is repeated until all the auto-encoders are trained. After this pre-training stage, the output is fed into a WELM, whose output weight is computed by Equation (8). This completes the training of the whole network.

The overall training procedure of the proposed model can be summarized as follows:

Step 1. Initialize the model parameters, including the number of hidden layers m, the number of units in each hidden layer, and the input weights and biases. Select the activation function $g(x)$ and the corrupting noise.

Step 2. Pre-training stage:

For each hidden layer i, starting from the bottom:

(a) Corrupt the inputs of the current layer (the initial inputs $X$ for the first layer) with the corrupting noise to obtain the corrupted inputs $\hat{X}$;

(b) Use all samples in $\hat{X}$ to train the ELM-CDAE and calculate the output weight matrix $\beta_{i}$ of the current layer by Equation (15).

Step 3. Calculate the output of the stacked ELM-CDAE:

$$f_{SELM\text{-}CDAE}(x) = \beta_{m}\, g\left(\beta_{m-1} \cdots g\left(\beta_{0}x\right)\right) \qquad (16)$$

Step 4. Use the output of the stacked ELM-CDAE to calculate the weight matrix $W$ in the WELM by the Laplacian regularized least squares method.

Step 5. Calculate the output weight matrix $\beta$ by Equation (8).
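The overall procedure can be sketched end to end under several simplifying assumptions: the contractive term is again omitted (each layer is a denoising ELM auto-encoder with masking noise), the representation passed to the next layer is $g(X\beta_{i}^{T})$ as in the multi-layer ELM of Kasun et al. (Citation2013), and the WELM weights are supplied by the caller (for example, the quarterly weights of Section 2.1) rather than computed by Laplacian regularized least squares. Layer sizes, C, the function names and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ae_layer(X, n_hidden, C, rng, mask_level=0.1):
    """One denoising ELM auto-encoder (contractive term omitted); returns beta_i (n_hidden x d_in)."""
    X_hat = np.where(rng.random(X.shape) < mask_level, 0.0, X)   # masking noise
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X_hat @ A + b)
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)


def fit_deep_elm(X, T, w, layer_sizes=(10, 8), C=1e3, seed=0):
    """Greedy layer-wise pre-training, then WELM output weights (Eq. 8) on the top features."""
    rng = np.random.default_rng(seed)
    betas, F = [], X
    for n_hidden in layer_sizes:
        beta_i = ae_layer(F, n_hidden, C, rng)
        betas.append(beta_i)
        F = sigmoid(F @ beta_i.T)           # features passed to the next layer
    W = np.diag(w)
    N = F.shape[0]
    beta_out = F.T @ np.linalg.solve(np.eye(N) / C + W @ F @ F.T, W @ T)
    return betas, beta_out


def predict_deep_elm(model, X):
    betas, beta_out = model
    F = X
    for beta_i in betas:
        F = sigmoid(F @ beta_i.T)
    return F @ beta_out


# Toy usage: 120 hypothetical monthly, normalized samples at three reference stations
rng = np.random.default_rng(1)
X = rng.uniform(size=(120, 3))
T = X.mean(axis=1, keepdims=True)   # stand-in target at the virtual station
w = np.full(120, 0.25)              # equal quarters, so every w_ii = 30 / 120
model = fit_deep_elm(X, T, w)
pred = predict_deep_elm(model, X)
rmse = np.sqrt(np.mean((pred - T) ** 2))
```

Because each layer is solved in closed form, no back-propagation through the stack is needed; fine-tuning the whole network, if desired, would be a separate step not described in the text.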

3 Experiments

3.1 Data description

Monthly water quality data from eight water quality monitoring stations near the Zengwen River, Taiwan, were collected between 22 January 2002 and 5 December 2017. The data are from the Taiwan Environmental Protection Agency¹. Figure 4 shows the distribution of the monitoring stations and the symmetrical river network distance between adjacent stations, i.e. the shortest distance between two stations along the stream network.

Figure 4. Distribution of the water quality monitoring stations in the study area.

3.2 Performance metrics and settings

To evaluate the validity of the model, some sites with historical data are treated as virtual sites, and the sites upstream and downstream of them are used as reference sites to predict the actual values at the virtual sites. The input of the proposed model is the water quality observed at the reference sites at the current time, and the output is the predicted value at the virtual site at the same time. If the predictions accurately reflect the real water quality at a virtual site, that site can be removed from the monitoring network.

In the experiments, 70% of the data collected at each site are used as the training set for the models, and the remaining 30% are used as the test set. The datasets are normalized before use. Normalization scales the data to a small, specific interval to remove their units and convert them into dimensionless values, so that variables with different units or orders of magnitude can be compared and weighted. The normalization used in this paper is:

$$\hat{x}_{i} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \qquad (17)$$

where $\hat{x}_{i}$ is the normalized value; and $x_{i}$, $x_{max}$ and $x_{min}$ are the sample value and the maximum and minimum values of the sample, respectively.
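The 70/30 split and Equation (17) can be sketched as below. Two details are our assumptions rather than statements of the text: the split keeps the chronological order, and $x_{min}$ and $x_{max}$ are taken from the training portion only, so that no test information leaks into the scaling. With that choice, test values outside the training range fall outside [0, 1], as the hypothetical data illustrate.

```python
import numpy as np


def chronological_split(data, train_frac=0.7):
    """First 70% of the time-ordered samples for training, the remaining 30% for testing."""
    n_train = int(round(train_frac * len(data)))
    return data[:n_train], data[n_train:]


def minmax_fit(train):
    return train.min(axis=0), train.max(axis=0)        # per-column x_min and x_max


def minmax_apply(data, x_min, x_max):
    return (data - x_min) / (x_max - x_min)             # Eq. (17); assumes x_max > x_min


def minmax_invert(data_norm, x_min, x_max):
    return data_norm * (x_max - x_min) + x_min          # back to physical units


# Hypothetical columns: water temperature (deg C) and conductivity (uS/cm)
data = np.column_stack([np.linspace(15.0, 30.0, 10), np.linspace(300.0, 600.0, 10)])
train, test = chronological_split(data)
x_min, x_max = minmax_fit(train)
train_n = minmax_apply(train, x_min, x_max)
test_n = minmax_apply(test, x_min, x_max)
```

Predictions made in normalized units are mapped back with `minmax_invert` before computing error indices in physical units.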

To evaluate the models, three error indices (RMSE, MAE and MAPE) and the correlation coefficient R are used:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_{i} - y_{i})^{2}} \qquad (18)$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}|\hat{y}_{i} - y_{i}| \qquad (19)$$

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{|\hat{y}_{i} - y_{i}|}{y_{i}} \qquad (20)$$

$$R = \frac{\sum_{i=1}^{N}(y_{i} - \overline{y})(\hat{y}_{i} - \overline{\hat{y}})}{\sqrt{\sum_{i=1}^{N}(y_{i} - \overline{y})^{2}}\sqrt{\sum_{i=1}^{N}(\hat{y}_{i} - \overline{\hat{y}})^{2}}} \qquad (21)$$

where $y_{i}, {\hat{y}}_{i}$ and N are, respectively, the observed water quality parameter value, the predicted water quality parameter value and the number of test samples; and $\overline{y}$ and $\overline{\hat{y}}$ denote the observed and estimated means, respectively.
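The four indices of Equations (18)–(21) can be computed directly in plain Python. In this sketch MAPE is returned as a fraction (multiply by 100 for a percentage) and divides by $|y_{i}|$, which coincides with Equation (20) for the positive water temperatures and conductivities considered here; it is undefined if any observation is zero. The function names and sample values are hypothetical.

```python
import math


def rmse(y, y_hat):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y, y_hat)) / len(y))      # Eq. (18)


def mae(y, y_hat):
    return sum(abs(p - o) for o, p in zip(y, y_hat)) / len(y)                   # Eq. (19)


def mape(y, y_hat):
    """Eq. (20) as a fraction; identical to Eq. (20) when every y_i > 0."""
    return sum(abs(p - o) / abs(o) for o, p in zip(y, y_hat)) / len(y)


def pearson_r(y, y_hat):
    """Eq. (21)."""
    my, mp = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((o - my) * (p - mp) for o, p in zip(y, y_hat))
    sy = math.sqrt(sum((o - my) ** 2 for o in y))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_hat))
    return cov / (sy * sp)


y = [20.0, 22.0, 25.0, 27.0]        # hypothetical observed water temperatures (deg C)
y_hat = [21.0, 22.0, 24.0, 28.0]    # hypothetical predictions
```

For these values the errors are 1, 0, −1 and 1 °C, so RMSE = √0.75 ≈ 0.87 °C and MAE = 0.75 °C.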

In addition, three classical shallow neural networks (BPNN, ELM and WELM) and a deep ELM model based on ELM-DAE are used for comparison to verify the effectiveness of the proposed model, together with the classical geostatistical method OK. The model settings are as follows. The maximum number of iterations for the BPNN is 100, and the BPNN is trained with the Levenberg-Marquardt algorithm. The activation function of the BPNN and ELM is the sigmoid function. The number of hidden nodes is increased in steps of 3, and the near-optimal number of nodes for the BPNN and ELM is then selected by threefold cross-validation. For each dataset, the regularization parameter C of the ELM and WELM models is selected from $\{10^{-20}, 10^{-19}, \dots, 10^{19}, 10^{20}\}$ by cross-validation. In the deep ELM model based on ELM-DAE, every hidden layer has the same number of neurons, and the optimal numbers of hidden layers and of neurons per layer are obtained by cross-validation. For the regularization parameter $\lambda$ of the proposed model, the best value for each dataset was searched over 100 values in the range 0–1 with an increment of 0.01. The parameters are described in detail in a later section. The semi-variogram function of the OK method is:

$$\gamma(h) = c_{0} + c\left[1 - \exp\left(-\frac{h}{a}\right)\cos\left(\frac{2\pi h}{b}\right)\right] \qquad (22)$$

where $c_{0}, c, a$ and b are the parameters to be fitted and h is the distance between monitoring stations.
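To show how the OK baseline turns Equation (22) into a prediction, the sketch below solves the standard ordinary kriging system, in which a Lagrange multiplier enforces that the weights sum to 1. The parameter values, station positions, observations and function names are hypothetical (in practice $c_{0}$, c, a and b are fitted to the empirical semi-variogram), and the diagonal uses $\gamma(0) = c_{0}$ exactly as Equation (22) gives it, which keeps the estimate exact at the reference stations.

```python
import numpy as np


def gamma(h, c0, c, a, b):
    """Semi-variogram of Eq. (22)."""
    h = np.asarray(h, dtype=float)
    return c0 + c * (1.0 - np.exp(-h / a) * np.cos(2.0 * np.pi * h / b))


def ordinary_kriging(d_ref, d_target, z_ref, params):
    """OK estimate at one target location.

    d_ref: (n, n) river distances between reference stations
    d_target: (n,) distances from each reference station to the target
    z_ref: (n,) observations at the reference stations
    """
    n = len(z_ref)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d_ref, *params)
    A[n, n] = 0.0
    rhs = np.append(gamma(d_target, *params), 1.0)
    sol = np.linalg.solve(A, rhs)
    weights = sol[:n]                    # sol[n] is the Lagrange multiplier
    return float(weights @ z_ref), weights


# Hypothetical stations at 0, 10 and 25 km along the river; target at 15 km
pos = np.array([0.0, 10.0, 25.0])
d_ref = np.abs(pos[:, None] - pos[None, :])
d_target = np.abs(pos - 15.0)
z_ref = np.array([410.0, 455.0, 520.0])  # e.g. conductivity in uS/cm
params = (0.1, 1.0, 8.0, 60.0)           # c0, c, a, b
estimate, weights = ordinary_kriging(d_ref, d_target, z_ref, params)
```

Because the cosine term of Equation (22) can make the variogram non-monotonic, some kriging weights may be negative.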

3.3 Spatial correlation analysis

The Pearson correlation coefficient measures the linear correlation between two variables X and Y:

$$\rho_{X,Y} = \frac{\sum_{i=1}^{N}(X_{i} - \overline{X})(Y_{i} - \overline{Y})}{\sqrt{\sum_{i=1}^{N}(X_{i} - \overline{X})^{2}}\sqrt{\sum_{i=1}^{N}(Y_{i} - \overline{Y})^{2}}} \qquad (23)$$

To analyse the spatial correlation of the water quality parameters, Pearson correlation coefficients are calculated between selected monitoring stations; the results for water temperature and electrical conductivity are shown in Table 1, where stations #1–#8 are numbered from upstream to downstream according to their actual locations in the basin. For example, #1 is the First Zengwen Bridge station, #2 is the Yujing Bridge station, and so on.

Table 1. Correlation coefficient of water temperature and electrical conductivity between different stations. #1 is the First Zengwen Bridge station and #8 is Xigang Bridge station (see Figure 4).


Table 1 shows that the correlation coefficients of water temperature between all stations are above 0.88, indicating a strong spatial correlation of water temperature. This is expected, because water temperature generally follows air temperature, which is itself highly correlated in space. Although electrical conductivity is not strongly correlated across all stations, adjacent stations still show a certain spatial correlation within the watershed: in most cases, the correlation coefficients of adjacent stations are greater than 0.6, indicating a strong correlation between the two sites.

Based on these results, to verify the validity and accuracy of the proposed model, some sites are treated as virtual monitoring sites to be predicted, and highly correlated sites are used as their reference sites. The reference sites of each virtual site for water temperature and electrical conductivity are listed in Table 2.
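Equation (23) and the selection of reference sites can be sketched as below. The threshold of 0.6 is borrowed from the discussion of adjacent stations, but the rule itself (keep every station whose correlation with the virtual station exceeds the threshold) is a hypothetical simplification of how the reference sites were chosen, and the synthetic series (a shared seasonal cycle plus small noise at three stations, and one unrelated station) are illustrative only.

```python
import numpy as np


def pearson(x, y):
    """Eq. (23)."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))


def correlation_matrix(series):
    """series: (n_samples, n_stations) array; returns the station-by-station Pearson matrix."""
    n = series.shape[1]
    corr = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            corr[i, j] = corr[j, i] = pearson(series[:, i], series[:, j])
    return corr


def reference_stations(corr, virtual, threshold=0.6):
    """Hypothetical rule: stations correlated with the virtual one above `threshold`."""
    return [j for j in range(corr.shape[0]) if j != virtual and corr[virtual, j] > threshold]


rng = np.random.default_rng(0)
months = np.arange(48)
season = np.sin(2.0 * np.pi * months / 12.0)
series = np.column_stack(
    [season + 0.1 * rng.normal(size=48) for _ in range(3)] + [rng.normal(size=48)]
)
corr = correlation_matrix(series)
refs = reference_stations(corr, virtual=1)
```

In practice the station-by-station matrix can also be obtained with `np.corrcoef(series, rowvar=False)`; the explicit loop is kept here to mirror Equation (23).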

Table 2. Reference stations for different virtual stations in predicting water temperature and electrical conductivity.


3.4 Influence of different parameters on model prediction results

The number of hidden layers and the number of neurons per hidden layer affect the prediction results of the models. To test the influence of these parameters on the proposed method, cross-validation is carried out on the two datasets. In the deep model, all hidden layers have the same number of neurons, chosen from 3, 5, 8, 10, 15, 20 and 30. Figures 5 and 6 show the results for water temperature and electrical conductivity, respectively, for different numbers of hidden layers and different numbers of neurons per hidden layer.

Figure 5. Predictive effect of the model on water temperature at different depths: (a) station #1, (b) station #3, (c) station #5 and (d) station #7.

Figure 6. Predictive effect of the model on electrical conductivity at different depths: (a) station #1, (b) station #3, (c) station #5 and (d) station #7.

Figures 5 and 6 show that the prediction accuracy (measured by RMSE) first improves and then deteriorates as the number of hidden layer neurons increases. This indicates that the model over-fits as the number of hidden neurons grows, so more hidden neurons are not always better. Furthermore, models with two or three hidden layers achieved good results for both water temperature and electrical conductivity.

3.5 Comparison of different model prediction results

To verify the effectiveness of the improved model in predicting the water quality of virtual sites, three shallow neural network models (BPNN, ELM and WELM) and two deep ELM models (ELM-DAE and ELM-CDAE) were run with the parameters described in the previous section. The prediction results under the same conditions are shown in Figures 7 and 8.

Figure 7. Prediction of water temperature for four virtual monitoring stations by different models: (a) station #1, (b) station #3, (c) station #5 and (d) station #7.

Figure 8. Predictions of electrical conductivity for four virtual monitoring stations by different models: (a) station #1, (b) station #3, (c) station #5 and (d) station #7.

From Figures 7 and 8, the following observations are made:

The RMSE of the water temperature predicted by the ELM-based models is smaller than that of the BPNN model, and the ELM-based models are also more stable. This means that the ELM-based models are more suitable for water temperature prediction than the BPNN model, especially at Station #1. In addition, the deep ELM models perform better than the shallow ELM models (ELM and WELM), because the deep models can extract richer information from the water quality data than shallow networks, especially at Station #5.
As can be seen from Figure 8, the RMSE of the ELM-based models is lower than that of the BPNN model in most of the 10 trials at stations #1 and #3, so the ELM-based models are more suitable than the BPNN for forecasting electrical conductivity at these stations. At stations #5 and #7, the BPNN model predicts better than the traditional shallow ELM models, but the difference between the BPNN and the deep ELM models is not particularly large. However, the BPNN model has poor stability and converges with difficulty, which was exposed in the 10 repeated trials. For example, the RMSE of Trial 6 at Station #3 is much larger than those of the other trials, which means that the BPNN fell into a local optimum too early. Trials 8 and 9 at Station #5 and trials 1 and 2 at Station #7 show the same flaw.

Furthermore, the averages over the 10 trials were evaluated with the four performance metrics (RMSE, MAE, MAPE and R) and compared with those of the classical ordinary kriging (OK) method. The results are shown in Tables 3 and 4, from which the following observations can be made:
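The averaging step described above can be sketched as follows, assuming the standard definitions of the four metrics (the study's own definitions are given in its method section and may differ in detail). MAPE is assumed here to be a fraction rather than a percentage, consistent with the later remark that a MAPE much greater than 1 signals a large error. Function names are hypothetical.

```python
import numpy as np

def trial_metrics(obs, pred):
    """RMSE, MAE, MAPE (as a fraction, not a percentage) and Pearson R for one trial."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # Undefined if any observation is zero; such samples must be excluded first.
        "MAPE": float(np.mean(np.abs(err / obs))),
        "R": float(np.corrcoef(obs, pred)[0, 1]),
    }

def average_metrics(obs, trial_preds):
    """Average each metric over repeated trials of the same model at one station."""
    per_trial = [trial_metrics(obs, p) for p in trial_preds]
    return {name: float(np.mean([m[name] for m in per_trial])) for name in per_trial[0]}
```

Applied to the 10 trial predictions of one model at one station, `average_metrics` yields one row of the kind reported in Tables 3 and 4.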

Table 3. Prediction results of different models for water temperature at different stations.


Table 4. Prediction results of different models for electrical conductivity at different stations.


For water temperature, the R values of all the neural network models exceed 0.9 (Table 3), which means that the predicted values at the virtual monitoring sites vary consistently with the observed values. The conductivity predictions at stations #5 and #7 also perform well (Table 4). These results suggest that monitoring of some parameters at virtual monitoring sites could be discontinued in the future, because their values can be estimated from predictions based on related sites. However, the conductivity interpolation at stations #1 and #3 yields low R values of only about 0.4, which indicates that the predictions are not satisfactory and that conductivity monitoring at these sites should not be removed from the optimized network.
The conductivity results at stations #5 and #7 deserve particular attention. At Station #5, the averages over the 10 trials show that the BPNN performs better than the shallow ELM models but worse than the deep ELM models. At Station #7, the MAPE of the BPNN is much greater than 1, which indicates a large gap between the BPNN predictions and the observed values, larger even than that of the OK method. Therefore, the BPNN is not recommended for predicting conductivity at Station #7.
Compared with classical spatial interpolation algorithms such as the OK method, artificial neural networks (ANNs), including the BPNN and the ELM-based models, perform better when interpolating water temperature at virtual sites. For electrical conductivity, the ANNs also outperform the OK method in most cases, the exception being the BPNN at Station #7. Moreover, all four performance indicators show that the improved model proposed in this paper achieves higher prediction accuracy than the other models, which we attribute to its stronger feature extraction ability.
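For readers unfamiliar with the OK baseline, its textbook form can be sketched briefly. This is not the configuration used in this study: the exponential variogram, its sill and length parameters, the use of straight-line (rather than along-river) distances, and the station coordinates and temperatures in the usage lines are all assumptions for illustration.

```python
import numpy as np

def exp_variogram(h, sill=1.0, length=10.0):
    """Exponential semivariogram with zero nugget: gamma(h) = sill * (1 - exp(-h / length))."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / length))

def ordinary_kriging(xy, z, xy0, sill=1.0, length=10.0):
    """Ordinary kriging estimate at xy0 from stations at xy with values z.

    Solves [[Gamma, 1], [1^T, 0]] [w; mu] = [gamma0; 1]; the last row forces the
    weights to sum to 1 (unbiasedness), and mu is the Lagrange multiplier.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy0 = np.asarray(xy0, dtype=float)
    n = z.size
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # station-station distances
    d0 = np.linalg.norm(xy - xy0, axis=1)                           # station-target distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, sill, length)
    A[n, n] = 0.0
    rhs = np.append(exp_variogram(d0, sill, length), 1.0)
    w = np.linalg.solve(A, rhs)[:n]
    return float(w @ z), w

# Hypothetical station coordinates (km) and water temperatures (deg C).
stations = [[0, 0], [5, 0], [0, 5], [5, 5]]
temps = [20.0, 22.0, 21.0, 23.0]
estimate, weights = ordinary_kriging(stations, temps, [2.5, 2.5])
print(round(estimate, 6))  # 21.5 (target is equidistant, so the weights are equal)
```

Because OK estimates are linear combinations of neighbouring observations fixed by a single variogram, they cannot adapt to nonlinear relations between sites, which is one plausible reason the ANN models do better here.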

4 Conclusion

A new deep ELM model is presented for obtaining water quality data at a virtual monitoring station of an optimized monitoring network. The model consists of two parts. The first part stacks ELM-CDAE layers, which enables the model to extract more robust features from the water quality data of different monitoring sites. The second part, a WELM, enhances the generalization capability of the model for virtual water quality prediction by using the ratio of the sample size of the current quarter to the total sample size as the weight of each sample. Monthly water temperature and conductivity data collected from 22 January 2002 to 5 December 2017 at eight monitoring stations on the Zengwen River, Taiwan, were used to verify the effectiveness of the model, and its parameters were selected by cross-validation. The water temperature results show that the ANN models outperform the OK method and that the proposed method is superior to the other ELM-based models. For the prediction of conductivity, the relative performance of the three shallow ANN models (BPNN, ELM, WELM) varied from site to site. At Station #7, the OK method was more accurate than the BPNN but less accurate than the ELM-based models. On both water quality datasets, the deep ELM models performed well and the modified model performed best, providing a practical way to obtain water quality data at a virtual site of an optimized monitoring network.
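The two-part structure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoder layers are plain ELM auto-encoders in the multilayer-ELM style (the contractive penalty of ELM-CDAE is omitted), tanh activations, Gaussian random weights and the layer sizes are assumed, and all function names are hypothetical. The output layer uses the standard weighted ELM solution of Zong et al. (2013), with the quarter-based sample weights described in the text.

```python
import numpy as np

def quarter_weights(months):
    """Weight of each sample = (number of samples in its quarter) / (total samples).

    Follows the weighting rule described in the text; months are integers 1..12.
    """
    q = (np.asarray(months, dtype=int) - 1) // 3   # quarter index 0..3
    counts = np.bincount(q, minlength=4)
    return counts[q] / q.size

def elm_ae_beta(X, n_hidden, C, rng):
    """Output weights of one plain ELM auto-encoder (no contractive term).

    Random hidden weights map X to H, and beta is the ridge solution that
    reconstructs X from H; beta.T is then reused as the encoder weights.
    """
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)  # (n_hidden, d)

def welm_fit(H, T, w, C):
    """Weighted ELM solution beta = (I/C + H^T W H)^(-1) H^T W T with W = diag(w)."""
    HtW = H.T * w   # same as H.T @ np.diag(w), without building an N x N matrix
    return np.linalg.solve(np.eye(H.shape[1]) / C + HtW @ H, HtW @ T)

def fit_deep_welm(X, y, months, layers=(20, 20), n_hidden=40, C=1.0, seed=0):
    """Stack ELM-AE encoders, then fit a seasonally weighted ELM regressor."""
    rng = np.random.default_rng(seed)
    encoders = []
    Z = X
    for size in layers:
        beta = elm_ae_beta(Z, size, C, rng)
        encoders.append(beta)
        Z = np.tanh(Z @ beta.T)            # encoded features for the next layer
    W = rng.standard_normal((Z.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    beta_out = welm_fit(np.tanh(Z @ W + b), y, quarter_weights(months), C)

    def predict(X_new):
        Z_new = X_new
        for beta in encoders:
            Z_new = np.tanh(Z_new @ beta.T)
        return np.tanh(Z_new @ W + b) @ beta_out

    return predict
```

In this sketch `X` would hold the measurements at the reference stations and `y` the historical record of the virtual station, both assumed to be standardized beforehand. Taken literally, the quarter-based rule gives weights that do not sum to 1; they simply favour samples from well-represented quarters.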

In future work, we will examine how the length of the historical record at each reference station affects the prediction of water quality at a virtual monitoring station.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This study was supported by the International Science and Technology Cooperation Program of Zhejiang Province for Joint Research in High-tech Industry [No. 2016C54007], the National Key R&D Program of China [No. 2016YFC0201400], the Leading Talents of Science and Technology Innovation in Zhejiang Provincial Ten Thousands Plan [No. 2019R52040], the Provincial Key R&D Program of Zhejiang Province [No. 2017C03019] and the National Natural Science Foundation of China and Zhejiang Joint Fund for Integrating of Informatization and Industrialization [No. U1509217].

Notes

¹ https://wq.epa.gov.tw/Code/Report/DownloadList.aspx.

References

Antanasijević, D., et al., 2017. A novel SON2-based similarity index and its application for the rationalization of river water quality monitoring network. River Research & Applications, 34 (1), 144–152. doi:10.1002/rra.3231
Chapman, D.V., et al., 2016. Developments in water quality monitoring and management in large river catchments using the Danube River as an example. Environmental Science & Policy, 64 (5), 141–154. doi:10.1016/j.envsci.2016.06.015
Chunping, O., et al., 2012. Coupling geostatistical approaches with PCA and fuzzy optimal model (FOM) for the integrated assessment of sampling locations of water quality monitoring networks (WQMNs). Journal of Environmental Monitoring, 14 (12), 3118–3128. doi:10.1039/c2em30372h
Deng, W.Y., et al., 2010. Research on extreme learning of neural networks. Chinese Journal of Computers, 33 (2), 279–287. doi:10.3724/SP.J.1016.2010.00279
Do, H.T., et al., 2012. Design of sampling locations for mountainous river monitoring. Environmental Modelling & Software, 27 (2), 62–70. doi:10.1016/j.envsoft.2011.09.007
Earle, R. and Blacklocke, S., 2008. Master plan for water framework directive activities in Ireland leading to River basin management plans. Desalination, 226 (1), 134–142. doi:10.1016/j.desal.2007.02.103
Guigues, N., Desenfant, M., and Hance, E., 2013. Combining multivariate statistics and analysis of variance to redesign a water quality monitoring network. Environmental Science: Processes & Impacts, 15 (9), 1692–1705. doi:10.1039/c3em00168g
Huang, G.B., et al., 2012. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42 (2), 513–529. doi:10.1109/TSMCB.2011.2168604
Huang, G.B., Zhu, Q.Y., and Siew, C.K., 2006. Extreme learning machine: theory and applications. Neurocomputing, 70 (1), 489–501. doi:10.1016/j.neucom.2005.12.126
Jia, X. and Du, H., 2016. Contractive ML-ELM for invariance robust feature extraction. In: Proceedings of ELM-2015 Volume 2. Cham: Springer, 203–208.
Jiang, Y., et al., 2013. Analysis of spatial distribution of groundwater quality in Huaihe river basin. In: 2013 21st International Conference on Geoinformatics. Kaifeng, China: IEEE, 1–7.
Juan, P.R., 2003. Neural networks for spatial interpolation of meteorological data. In: 3rd Conference on Artificial Intelligence Applications to the Environmental Science. Boulder, Colorado: American Meteorological Society, 60–68.
Karamouz, M., et al., 2009a. Design of River water quality monitoring networks: a case study. Environmental Modeling & Assessment, 14 (6), 705–714. doi:10.1007/s10666-008-9172-4
Karamouz, M., et al., 2009b. Design of on-line river water quality monitoring systems using the entropy theory: a case study. Environmental Monitoring & Assessment, 155 (1–4), 63–81. doi:10.1007/s10661-008-0418-z
Kasun, K.L.C., et al., 2013. Representational learning with ELMs for big data. IEEE Intelligent Systems, 28 (6), 31–42.
Khashei-Siuki, A. and Sarbazi, M., 2015. Evaluation of ANFIS, ANN, and geostatistical models to spatial distribution of groundwater quality (case study: Mashhad plain in Iran). Arabian Journal of Geosciences, 2 (8), 903–912. doi:10.1007/s12517-013-1179-8
Kovács, J., et al., 2014. Classification into homogeneous groups using combined cluster and discriminant analysis. Environmental Modelling & Software, 57 (5), 52–59. doi:10.1016/j.envsoft.2014.01.010
Laaha, G., Skøien, J.O., and Blöschl, G., 2012a. Comparing geostatistical models for river networks. Geostatistics Oslo 2012 (Springer Netherlands), 45 (2), 543–553.
Laaha, G., et al., 2013. Spatial prediction of stream temperatures using top-kriging with an external drift. Environmental Modeling & Assessment, 18 (6), 671–683. doi:10.1007/s10666-013-9373-3
Laaha, G., Skøien, O.J., and Blöschl, G., 2012b. Spatial prediction on river networks: comparison of top‐kriging with regional regression. Hydrological Processes, 28 (2), 315–324. doi:10.1002/hyp.9578
Lee, C., et al., 2014. Efficient method for optimal placing of water quality monitoring stations for an ungauged basin. Journal of Environmental Management, 132 (1), 24–31. doi:10.1016/j.jenvman.2013.10.012
Li, B., McClendon, R.W., and Hoogenboom, G., 2004. Spatial interpolation of weather variables for single locations using artificial neural networks. Transactions of the ASAE, 47 (2), 629–637. doi:10.13031/2013.16026
Li, J., et al., 2013. Use of genetic-algorithm-optimized back propagation neural network and ordinary kriging for predicting the spatial distribution of groundwater quality parameter. International Conference on Graphic and Image Processing, 8768 (5), 87684V.
Liu, Y., Feng, X., and Zhou, Z., 2015. Multimodal video classification with stacked contractive autoencoders. Signal Processing, 120 (4), 761–766. doi:10.1016/j.sigpro.2015.01.001
Mavukkandy, M.O., Karmakar, S., and Harikumar, P.S., 2014. Assessment and rationalization of water quality monitoring network: a multivariate statistical approach to the Kabbini River (India). Environmental Science & Pollution Research, 21 (17), 10045–10066. doi:10.1007/s11356-014-3000-y
Maymandi, N., Kerachian, R., and Nikoo, M.R., 2018. Optimal spatio-temporal design of water quality monitoring networks for reservoirs: application of the concept of value of information. Journal of Hydrology, 558 (10), 328–340. doi:10.1016/j.jhydrol.2018.01.011
Mehrjardi, R.T., Jahromi, M.Z., and Heidari, A., 2008. Spatial distribution of groundwater quality with geostatistics (case study: Yazd-Ardakan plain). World Applied Sciences Journal, 4 (1), 455–462.
Mitrović, T., et al., 2019. Virtual water quality monitoring at inactive monitoring sites using Monte Carlo optimized artificial neural networks: a case study of Danube River (Serbia). Science of the Total Environment, 654 (8), 1000–1009. doi:10.1016/j.scitotenv.2018.11.189
Philippopoulos, K. and Deligiorgi, D., 2012. Application of artificial neural networks for the spatial estimation of wind speed in a coastal region with complex topography. Renewable Energy, 38 (1), 75–82. doi:10.1016/j.renene.2011.07.007
Prandtl, L., 1925. Z. angew. Math. Mech, 55 (5), 136–139.
Reyjol, Y., et al., 2014. Assessing the ecological status in the context of the European water framework directive: where do we go now? Science of the Total Environment, 497 (1), 332–344. doi:10.1016/j.scitotenv.2014.07.119
Rifai, S., et al., 2011. Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th International Conference on International Conference on Machine Learning. Bellevue, Washington, USA: International Machine Learning Society, 833–840.
Rizo-Decelis, L.D., Pardo-Igúzquiza, E., and Andreo, B., 2017. Spatial prediction of water quality variables along a main river channel, in presence of pollution hotspots. Science of the Total Environment, 55 (4), 276–290. doi:10.1016/j.scitotenv.2017.06.145
Sabzipour, B., Asghari, O., and Sarang, A., 2017. Evaluation and optimal redesigning of river water-quality monitoring networks (RWQMN) using geostatistics approach (case study: Karun, Iran). Sustainable Water Resources Management, 45 (2), 1–17.
Shahabi, M., et al., 2016. Spatial modeling of soil salinity using multiple linear regression, ordinary kriging and artificial neural network methods. Archives of Agronomy & Soil Science, 63 (2), 151–160. doi:10.1080/03650340.2016.1193162
Sotiropoulos, D.G., Kostopoulos, A.E., and Grapsa, T.N., 2002. A spectral version of Perry’s conjugate gradient method for neural network training. Proceedings of 4th GRACM Congress on Computational Mechanics, 1 (5), 291–298.
Tanos, P., et al., 2015. Optimization of the monitoring network on the River Tisza (Central Europe, Hungary) using combined cluster and discriminant analysis, taking seasonality into account. Environmental Monitoring and Assessment, 187 (9), 575. doi:10.1007/s10661-015-4777-y
Villas-Boas, M.D., Olivera, F., and Azevedo, J.P.S.D., 2017. Assessment of the water quality monitoring network of the Piabanha River experimental watersheds in Rio de Janeiro, Brazil, using autoassociative neural networks. Environmental Monitoring & Assessment, 189 (9), 439. doi:10.1007/s10661-017-6134-9
Vincent, P., et al., 2010. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11 (12), 3371–3408.
Wang, H., et al., 2015. Optimal Design of River Monitoring Network in Taizihe River by Matter Element Analysis. PLoS ONE, 10 (5), 1455–1465.
Wang, Y.B., et al., 2014. Spatial pattern assessment of river water quality: implications of reducing the number of monitoring stations and chemical parameters. Environmental Monitoring & Assessment, 186 (3), 1781–1792. doi:10.1007/s10661-013-3492-9
Yang, X. and Jin, W., 2010. GIS-based spatial regression and prediction of water quality in river networks: A case study in Iowa. Journal of Environmental Management, 91 (10), 1943–1951. doi:10.1016/j.jenvman.2010.04.011
Yingchun, F., et al., 2013. GIS and ANN-based spatial prediction of DOC in river networks: a case study in Dongjiang, Southern China. Environmental Earth Sciences, 68 (5), 1495–1505. doi:10.1007/s12665-012-2177-y
Zhang, N., Ding, S., and Shi, Z., 2016. Denoising Laplacian multi-layer extreme learning machine. Neurocomputing, 171 (3), 1066–1074. doi:10.1016/j.neucom.2015.07.058
Zong, W., Huang, G., and Chen, Y., 2013. Weighted extreme learning machine for imbalance learning. Neurocomputing, 101 (2), 229–242. doi:10.1016/j.neucom.2012.08.010