Forecasting watermain failure using artificial neural network modelling

Pages 24-33 | Published online: 28 Mar 2013

Abstract

After rapid urban expansion in Ontario, post-World War II, there followed a lengthy period of time where only minimal infrastructure maintenance occurred. Now, however, most of that infrastructure is approaching the end of its predicted life expectancy, and has started failing at an unprecedented rate. The combination of low maintenance and the increasing age of water distribution infrastructure has resulted in increasing rates of pipe failures. To assign priorities for repair/replacement, artificial neural network modelling is employed. Eight independent variables are employed, namely pipe length, diameter, age, break category, soil type, pipe material, the year of Cement Mortar Lining (if implemented), and the year of Cathodic Protection (if implemented), to determine the importance of different factors influencing the pipe failure rate. The results in application to the distribution system in Etobicoke, Ontario demonstrate that ANN models have very strong predictive capabilities (R2=0.94) when compared with the multiple linear regression method (R2=0.75) to assist rehabilitation planning.

After the rapid urban expansion that followed World War II, Ontario went through a long period in which only minimal infrastructure maintenance was carried out. Most of that infrastructure is now approaching the end of its life and has begun to deteriorate at an unprecedented rate. The combination of low maintenance and aging water distribution infrastructure has led to rising pipe break rates. To provide a decision-support tool, essential for choosing which parts of the network to rehabilitate first, failures of the water distribution network are forecast and mapped using artificial neural network (ANN) modelling. The approach was applied to the network of Etobicoke, Ontario. The model has eight independent variables: pipe length, diameter, age, material, break category, soil type, and two rehabilitation factors, namely the year of cement mortar lining (if applied) and the year of cathodic protection (if applied), in order to determine the importance of the different factors that influence pipe failures. The results for the Etobicoke water network show that ANN models have very strong predictive capability (R2 = 0.94) for supporting rehabilitation strategies, compared with the multiple linear regression method (R2 = 0.75).

Introduction

After rapid urban expansion in Ontario, Canada, following World War II, there was a lengthy period in which only minimal infrastructure maintenance occurred. Now, however, much of that infrastructure is approaching the end of its predicted life expectancy, and has started failing at an unprecedented rate (Pelletier and Villeneuve Citation2003; Schuster and McBean Citation2008). For instance, in the City of Toronto, many pipes are beyond their theoretical service life of about 80 years (City of Toronto 2009). This is leading to a crisis in many municipalities, where failing infrastructure is affecting people’s lives through, for example, broken water distribution pipes and flooded basements.

There are numerous factors that may lead to the failure of a water distribution pipe. Underlying reasons for failures include deterioration due to aging of pipes, proximity to busy traffic routes, and extreme weather patterns, including high precipitation and extreme temperatures (both high and low). When a pipe breaks, there are two major consequences: first, the flow of water is cut off, leaving people without ready access to drinking water; and second, the loss of pressure in the pipes may allow contaminants to enter, which compromises the cleanliness of the water distribution system. In response, two rehabilitation strategies have proven beneficial, namely Cement Mortar Lining (CML) and Cathodic Protection (CP), which have been used to control the effects of external corrosion on the water distribution system and to improve a pipe’s condition. The CML process removes rust build-up on the inside of the watermain and then lines the internal surface with a thin layer of cement. This typically increases the effective life of the watermain by 30 to 50 years and improves its hydraulic carrying capacity. The CP process involves attaching magnesium anodes to the watermain. The anode corrodes instead of the watermain to which it is connected, thus preserving the pipe. The anode’s performance is monitored every 5 years.

The U.S. EPA has predicted that in the United States alone, $138 billion will be required to replace and maintain the existing drinking water systems during the next 20 years (Selvakumar et al. Citation2002). Some municipalities have fallen so far behind in maintenance of piping infrastructure that they constantly have to deal with crisis situations. These crises consume such a large part of their budget that they are unable to take control of the situation and implement a proactive approach to pipe maintenance. To put this in context, the City of Toronto’s water distribution system consists of 5,850 km of watermains and experiences approximately 1,300 watermain breaks each year, which indicates the magnitude of the budgeting problem. In Kingston, the repair cost for each break has been estimated at between $5,000 and $10,000. The Ontario Sewer and Watermain Construction Association estimated that, province-wide, leaks add about $160 million in operating costs to taxpayers (Press 2009). In 2007, Toronto City Council approved an $87.7 million program directed at renewing its aging watermain infrastructure (City of Toronto 2009).

In summary, the water industries in cities need an intelligent system that can combine all recorded data, analyze the complex relationships in the data and assist decision-making for asset management. In particular, ongoing watermain rehabilitation activities need to be assessed in terms of efficiency. This paper describes results from the application of Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) to characterize pipe failure rates in the City of Etobicoke (Ontario, Canada).

Precedent models for watermain failure

Watermain failure modelling is widely discussed in the technical literature. One of the first attempts to model pipe failure using a regression model was performed by Shamir and Howard (Citation1979). Several studies over the past decade have looked at the effect of dynamic variables (temperature, soil moisture and conductivity) on the watermain failure rate, but these have proven to be difficult due to the cost of data acquisition. Therefore, the trend has been to create models using static variables (pipe material, diameter, wall thickness, etc.). Kleiner and Rajani (Citation2001) were successful in creating a model for pipe failure predictions based upon pipe material, number of years being observed, the number of failures, the aging rate, the freezing index, etc. The model then applied the observed number of failures to predict when failures will occur in the future. The effect of temperature on watermain failure has been observed by Lackington and Large (Citation1980), Needham and Howe (Citation1981), Newport (Citation1981), Walski and Pelliccia (Citation1982), Ciottoni (Citation1985), Goulter and Kazemi (Citation1988), Lochbaum (Citation1993), Chambers (Citation1994) and Habibian (Citation1994), where most found that the typical annual patterns of breakage rates peaked during the winter season when temperatures are at their lowest. Schuster and McBean (Citation2008) and O’Day (Citation1982) derived failure probability models based on the time of previous failures, soil types, pipe diameters, environmental conditions and age. Ahn et al. (Citation2005) observed that most types of pipes (ductile iron, steel, and asbestos cement) did not have an increased number of failures during the colder winter months but that cast iron did show a significant increase in the number of failures during the winter months.

The challenges in the modelling of the average rate of pipe failures arise because external factors create “noise” in the failure data (Kleiner and Rajani 2002). External factors may include temperature, traffic, and precipitation as well as operational aspects such as the replacement rate of old pipes. Another common barrier preventing even basic analyses of piping infrastructure and failure rates is the lack of data on pipe breakage history within the pipe network. Most municipalities maintain records of the pipes installed, including data on pipe length, diameter, pipe material and date of installation; however, only a few municipalities have records of pipe breakage longer than a decade (Pelletier and Villeneuve Citation2003). This short length of time creates difficulties in creating reliable models of pipe breakage.

While the above models are based on regression analysis, Artificial Neural Networks (ANNs) are a relatively new soft computing tool with an architecture inspired by that of the brain. Their use in water resources and environmental systems, in particular, has grown considerably over the last decade. ANN models are highly interconnected networks of simple processing elements (also known as neurons), which are capable of performing massive, parallel computations for data processing and knowledge representation (Najjar and Basheer Citation1996). In fact, ANN models attempt to replicate the functioning of a biological nervous system. ANN modelling consists of three tasks: (i) entry of data from the input-layer side of the network, (ii) processing of information within the network body, and (iii) production of output(s) at the output layer (Najjar and Basheer Citation1996). A typical feedforward multilayer neural network consists of an input layer, one (or more) hidden layer(s) and an output layer. The signal travels in one direction from the input to the output. The dataset must be large enough, and contain enough variability, for the network to form diverse connections and improve its predictive ability.

The literature reveals that ANNs have been used successfully in failure prediction and classification in water and sewer pipelines. However, only a few reported case studies deal explicitly with the effects of rehabilitation works on watermain failure patterns. For instance, ANN models have been applied to water distribution networks of a subdivision in Edmonton, Canada (Kleiner and Rajani Citation2001). The models were trained with historical input data including temperature, rainfall, operating pressure, and number of breaks. Ahn et al. (Citation2005) used an ANN model for predicting water pipe breaks in service pipes and mains in Seoul (Korea); they observed good performance for a prediction model based on pipe characteristics and water and soil temperatures. Moselhi and Shehab-Eldeen (Citation2000) employed an ANN in the analyses and classification of defects in sewer pipelines. The ANN models were trained to classify four different types of defects including cracks, spalling, joint displacements, and reduction of cross-sectional area. However, these models are very limited because such data are generally not collected. Al-Barqawi and Zayed (Citation2008) created an ANN model to predict the current conditions of watermains to determine which pipes need the most urgent rehabilitation. They created a scale from 0 to 10, 0 being the worst condition and 10 being the best, to rate each pipe. Each value has specific instructions on how the pipe should be dealt with, ranging from requiring immediate action to reassessing the pipe in 15 years. Using their ANN model, they were able to establish which factors contributed the most in determining the condition of the pipe and found the variables, from most important to least important, as follows: pipe age, pipe material, pipe wall thickness, thrust restraint, type of joints, pipe diameter, pipe lining and coating, dissimilar metals and pipe installation practices. Recently, Ho et al. (Citation2009) created a GIS-based hybrid ANN in order to prioritize the order in which water distribution pipes are replaced. The model consisted of three different sections. The first was a fuzzy logic model, which was used to determine the significance of each of the variables. The second was a seismic-based GIS model, which was used to model the frequency and intensity of earthquakes in the region. The final part was an ANN model, which combined the variables from the fuzzy logic model with the seismic activity and produced an output that prioritized the replacement of the water distribution pipes. The literature also provides a variety of genetic algorithm models that can assist decision makers in scheduling watermain replacement (e.g. Dandy and Engelhardt Citation2001, 2006; Tabesh et al. 2009).

Comparisons of ANN models with MLR are widely discussed in the literature. Almost all previous research has shown that ANN models simulate phenomena better than MLR models. To cite a recent example, Bowden et al. (Citation2006) forecast chlorine residuals in a water distribution system using a neural network and compared the results with those of a multiple linear regression model. The identified ANN model provided a significant improvement over the MLR model. Jafar et al. (Citation2010) modelled failures in water infrastructure networks using ANNs and MLR in Waterloo, France. They reported better performance by the ANN, in agreement with the findings of Asnaashari et al. (Citation2009).

Despite the large body of research available, a comprehensive fundamental theoretical model is not currently available due to both the complications surrounding the failure process and the fact that each case is unique. Hence, a reliable, empirical method for predicting failure rate based on historical data remains the next most desirable approach. Therefore, as previous research has shown, ANN models provide such capacity to model the watermain failure process. The outcomes of the models will then be used for prioritizing repairs and making decisions on when and how to carry out maintenance and rehabilitation in the city. For this case study, two special features during watermain life, CML and CP, were employed as predictor variables in the ANN and MLR models.

Watermain failure modelling in the study area

The Greater Toronto Area (GTA) contains an extensive network of piping infrastructure. According to the City of Toronto Public Works department, there is $8.7 billion of watermain infrastructure, 5,850 km of distribution watermains, four water treatment plants and eighteen pumping stations within the GTA. On average, the age of Toronto’s watermains is 55 years, with 17% of the watermains being more than 80 years of age and 6.5% more than 100 years old. Toronto experiences, on average, 1,300 watermain breaks per year. Between the 1900s and 1960s there were significant reductions in the manufactured wall thickness of watermains, as cost-saving measures. This has resulted in pipes placed in the 1950s and 1960s having shorter life-spans than pipes placed previously. In terms of watermain replacement/rehabilitation, the City of Toronto has increased its infrastructure investments by 9% from $217 million in 2004 to $240 million in 2005. This extra money is helping to expedite the rehabilitation of the GTA’s aging infrastructure. The study area used herein, Etobicoke, is a subset of the GTA, with 784 kilometres of pipe length recorded. An aggressive program of CML and CP application was carried out between 1984 and 2003, by which time most pipes had been treated.

The dataset includes available information about all watermains in the distribution system. For every pipe, there are data for the length, diameter, pipe material, placement date, year of cement-mortar lining (if done), year of cathodic protection (if done), and the soil type. Construction details for pipes in the water distribution system have been recorded since 1921. A log was kept each time a pipe broke, including date of failure, type of failure and age at failure, but only since 1960. The failure rate (FR), defined as the number of breaks per 100 kilometres of watermain per year (brks/yr/100 km), was computed from the total number of breaks, the length of the pipe and the age of the pipe. The data originally had 5,422 records and included information about Cast Iron (CI), Ductile Iron (DI), Asbestos Cement (AC), Concrete (Conc.) and Polyvinyl Chloride (PVC), but only CI and DI are used in the analyses herein due to minimal sample sizes for the other pipe types. Another cut was made to remove any incomplete data points; this included any records missing material type or soil type. Since pipe breaks were only recorded starting in 1960, it would be unreliable to use every pipe that had been built since 1921, because a failure may have occurred that was not recorded. However, since the average elapsed time to first failure was 18 years (post 1960), pipes placed up to 15 years before 1960 were used in the analyses to follow, to provide a larger dataset. In total, the cuts described above reduced the original data from 5,422 records of breakage to 3,836, with 1,314 pipes in Sand/Gravel (S/G) and 2,522 pipes in Silt/Clay (S/C). The pipes being observed varied greatly in diameter and length. The pipe diameters were 25, 37, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500 and 600 millimetres, and the pipe lengths varied from 0.3 to 1033.5 metres.
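The failure-rate definition above can be made concrete with a short sketch. It assumes that the pipe's age is used as the number of years of exposure and that length is stored in metres; the function name and the example pipe are hypothetical, not taken from the Etobicoke dataset.

```python
def failure_rate(n_breaks: int, length_m: float, age_years: float) -> float:
    """Breaks per year per 100 km of pipe (brks/yr/100 km).

    Assumption: the exposure period is the pipe's age in years.
    """
    if length_m <= 0 or age_years <= 0:
        raise ValueError("length and age must be positive")
    length_in_100km = length_m / 1000.0 / 100.0  # metres -> units of 100 km
    return n_breaks / (length_in_100km * age_years)


# Hypothetical pipe: 3 breaks on a 500 m segment that is 40 years old.
example_fr = failure_rate(3, 500.0, 40.0)
```

Because the rate is normalized by length, very short segments (the dataset goes down to 0.3 m) can produce very large values from a single break, which may be one reason the models work with the logarithm of pipe length.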

To create predictive analyses of pipe failures, both multiple linear regression (MLR) modelling and ANN modelling were used for developing failure rates of the water distribution infrastructure for the study area. Descriptive investigation of failure data, as an initial step, was performed to enhance understanding of watermain failure behavior. While this analysis also gives insight into the most relevant variables in failures, it cannot be used in prioritizing individual pipe renewal (Rogers and Grigg 2009). MLR and ANN models were selected because they provide explicit guidance that municipalities can use to decide which pipelines have the most urgent and long-term needs for rehabilitation. Eight independent variables are employed, namely pipe length, diameter, age, break history, soil type, pipe material, the years of cement mortar lining implemented (YCML) and the years of cathodic protection implemented (YCP), to determine the effect that different factors have on the pipe failure rate. Break history was represented as a break category (BC) for each watermain: for 0 breaks, BC = 0; for 1 break, BC = 1; for 2 to 4 breaks, BC = 2; for 5 to 9 breaks, BC = 3; and for 10 breaks or more, BC = 4.
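The break-category coding described above amounts to a small binning rule. A minimal sketch follows; the function name is hypothetical.

```python
def break_category(n_breaks: int) -> int:
    """Map a pipe's recorded number of breaks to the break category (BC)."""
    if n_breaks < 0:
        raise ValueError("number of breaks cannot be negative")
    if n_breaks == 0:
        return 0
    if n_breaks == 1:
        return 1
    if n_breaks <= 4:   # 2 to 4 breaks
        return 2
    if n_breaks <= 9:   # 5 to 9 breaks
        return 3
    return 4            # 10 breaks or more


categories = [break_category(n) for n in (0, 1, 2, 4, 5, 9, 10, 25)]
```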

Statistical analysis

Statistical analyses of available failure data express how the watermain system has performed historically in the Etobicoke area. The past is not always the best predictor of the future, but it can demonstrate trends that are useful for engineering and planning projects. CML and CP are recent innovations for extending the lifespan of aging pipes; in this study area, CML was first implemented in 1984 and CP in 1998. Results show that pipe failures have declined significantly in recent years (Figure 1).

Figure 1 Number of failures and length of CML and CP per year.

The trendline in Figure 1 shows that by 1986, the number of failures occurring each year had started to decrease, which correlates with the implementation of CML and, later, of CP. This is very promising, indicating that CML and CP have been effective measures in decreasing pipe failure rates in the Etobicoke area. Decreases in breakage rate have also been observed in the City of Ottawa following the initiation of its CP program (Kleiner et al. 2003). In 1983, the Region of Durham also completed a pilot study on the installation of CP on existing ductile iron watermains in Ajax, Ontario. The successful results of the study spurred the development of the Region’s annual CP Program. Etobicoke watermains show a seasonal failure pattern, with more failures occurring in winter. Figure 2 displays the monthly total number of failures during 1984-2003.

Figure 2 Monthly number of failures from 1984 to 2003.

The highest monthly totals were recorded in January (216), followed by December (168), February (164) and November (91), indicating that breaks have historically been most frequent during winter. This finding is consistent with the work of others who reported that most failures occurred during the winter. In total, 55% of all breaks occurred during the three winter months of December, January and February. This is intuitive because more pipes burst as the ground freezes in the GTA.

Prediction Model 1: Multiple linear regression

An MLR model was created to simulate the failure rates of the pipes in the system. For this investigation, eight available parameters were considered. Watermain diameter (D), logarithm of pipe length (LL), age, break category (BC), pipe material type, soil type, YCML and YCP were included in the MLR model to determine failure rate.

In the prediction analysis with the MLR model, the failure dataset with 3834 records was split into two samples via random selection procedures, in order to perform cross-validation. Firstly, the prediction equation was created using the first sample and the equation then used to create predicted scores for the members of the second sample. In the first group, analyses revealed that the following variables were statistically significant (p < 0.05) predictors of failure rate: LL, Age and BC, producing the following equation:

The break category coefficient in the model demonstrates that it is the most significant variable in determining the failure rate. Figure 3(a) plots the failure rate calculated using the MLR model against observed values. In the first group, this analysis produced an R² of 0.75. To assess the validity of the MLR model through the aforementioned cross-validation approach, the equation was used in the second group to create predicted scores. The cross-validation correlation coefficient was 0.63, which is considered a good outcome (Figure 3(b)). The spacing between curves in Figure 3 results from the inclusion of the categorical variable break category.
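The split-sample procedure can be illustrated with a small sketch: fit a linear equation in LL, Age and BC on one random half, then score the other half with that equation. The data below are synthetic, the generating coefficients are arbitrary placeholders (not the paper's fitted equation), and the helper names are hypothetical. Only the Python standard library is used, so the least-squares fit is done by solving the normal equations directly.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0, *x] for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        lead = a[col][col]
        a[col] = [v / lead for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] for i in range(p)]


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r_squared(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - q) ** 2 for o, q in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Synthetic records: (LL, Age, BC) -> FR, with arbitrary placeholder coefficients.
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    ll, age, bc = rng.uniform(0.0, 3.0), rng.uniform(5.0, 60.0), float(rng.randint(0, 4))
    X.append((ll, age, bc))
    y.append(2.0 - 1.5 * ll + 0.05 * age + 3.0 * bc + rng.gauss(0.0, 0.5))

# Random split into two samples: fit on the first, score the second.
idx = list(range(len(y)))
rng.shuffle(idx)
first, second = idx[: len(idx) // 2], idx[len(idx) // 2:]
coef = fit_ols([X[i] for i in first], [y[i] for i in first])
r2_first = r_squared([y[i] for i in first], [predict(coef, X[i]) for i in first])
r_second = pearson_r([y[i] for i in second], [predict(coef, X[i]) for i in second])
```

A noticeable drop from the first-sample fit to the second-sample correlation, as reported for the real data, suggests that part of the fitted relationship is specific to the sample used for estimation.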

Figure 3 Scatter plot of MLR calculated failure rate vs. observed for a) MLR prediction and b) MLR cross-validation.

Prediction Model 2: Artificial neural network

The ANN methodology is based on the attempt to model the way a biological brain processes data. It is thus quite different from standard regression analysis for prediction. Numerous studies have shown that ANN models are a promising modelling technique, especially for data sets with the non-linear relationships that are frequently encountered in engineering processes. Unlike an MLR model, ANN models are not constrained by simplifying mathematical assumptions (e.g. linear system, normal distribution, etc.). To cite a recent example, Asnaashari et al. (Citation2009) explained the limitations of multiple and Poisson regression for watermain failure modelling.

The feed-forward method used for ANN modelling means that the signal travels only from inputs to outputs, although there are more complex models, known as feedback ANNs, in which the signal can travel back and forth. Another variable in creating an ANN model is the number of hidden nodes, which defines the complexity of the developed model. If the neurons in a hidden layer are too few, the ANN will not be able to model the data accurately. A large number of neurons in a hidden layer can sometimes be beneficial, but may lead to over-training (Despagne and Massart Citation1998). This research used three hidden nodes, as estimated below. The ANN modelling process proceeds through the following steps:

1. Create a database from historical failure data and separate it randomly into three distinct groups: training, testing and validation;

2. Create the basic architecture of the ANN model using inputs, outputs and the number of hidden layers;

3. Train and test the ANN model to create the optimal structure, which includes determining the optimal number of hidden nodes and iterations;

4. Compare the statistical accuracy of the calculations from the training, testing and validation phases;

5. Is the statistical accuracy from training, testing, and validation sets comparable? If no, go back to step 3; if yes, go to step 6;

6. An ANN model with an acceptable structure for the desired model is created.

To develop a reliable predictive model using ANN, an appropriate model architecture must be created, using the input variables and the expected output. This method of learning, known as supervised learning, occurs because the program knows the outputs it is trying to calculate, as opposed to unsupervised learning where the outputs are unknown.

ANN structure set-up

Determining the network architecture is a fundamental task in ANN model development (Maier and Dandy Citation2000). It requires the selection of the optimum number of layers and the number of nodes in each of the layers. There is no unified theory for the determination of an optimal ANN architecture, but it is generally achieved by fixing the number of layers and choosing the number of nodes in each layer. There are always two layers representing the input and output variables in any neural network. Choosing the number of middle (hidden) layers is the most crucial item in the ANN structure. As previous research has shown (Cybenko Citation1989; Hornik et al. Citation1989; Najjar et al. Citation1997 and Shahin et al. Citation2002), one hidden layer is sufficient to approximate any continuous function, provided that sufficient connection weights are given.

Several studies in North America and Europe over the last decade have documented the influence of pipe and soil parameters on watermains failure. The significance of the diameter, length, age, material, and failure history along with soil type for predicting failure rate, particularly in Canada, has been widely established. In the current case study, in addition to the foregoing variables, two more parameters were added to modelling. The years of cement mortar lining implemented (YCML) and the years of cathodic protection implemented (YCP) were inputs in the modelling. Therefore, the eight input variables (8 nodes) and one output node are listed as:

(1)

In Equation 3, the 8-NH-1 label denotes that there are 8 inputs and 1 output, where NH indicates the number of hidden nodes which must be determined.

A given artificial neuron receives n inputs with signals X1 through Xn and weights W1 through Wn. The following relation was used to transform all of the input and output variables to values between 0 and 1:

(2)

where X denotes the input and output data in the network. Najjar et al. (Citation1997) found that this adjustment accelerates training by preventing larger numbers from overriding smaller ones. Among the many continuous transfer functions, the widely used sigmoid function was applied to map each node's signal to a value between 0 and 1:

where Yj is the output of node j in this layer. The output Yj passes a signal to the output node (k). The net entering signal of an output node is:

(3)

The incoming signal at the output node (Sk) is transformed using the sigmoid function to give the scaled output (Yk):

(4)

Finally, the scaled output is de-scaled, to produce the target output according to the following formula:

(5)
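The scaling, forward-pass and de-scaling steps described above can be sketched as follows. The sketch assumes min-max scaling for the transform to the 0 to 1 range and a bias term on each node; neither choice is confirmed by the text. All function names, weights and the failure-rate range are hypothetical placeholders, not the trained network.

```python
import math


def sigmoid(s: float) -> float:
    """Logistic transfer function, mapping any real signal into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def scale(x: float, x_min: float, x_max: float) -> float:
    """Assumed min-max scaling of a raw value into [0, 1]."""
    return (x - x_min) / (x_max - x_min)


def descale(v: float, x_min: float, x_max: float) -> float:
    """Inverse of scale(): recover the value in original units."""
    return v * (x_max - x_min) + x_min


def forward(x_scaled, w_hidden, b_hidden, w_out, b_out):
    """One pass through an I-NH-1 feed-forward network with sigmoid nodes."""
    # Hidden layer: Y_j = sigmoid(sum_i W_ij * X_i + b_j)
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, x_scaled)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Output node: S_k = sum_j W_jk * Y_j + b_k, then Y_k = sigmoid(S_k)
    s_k = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(s_k)


# Placeholder 8-3-1 network: 8 scaled inputs, 3 hidden nodes, 1 output.
x_scaled = [0.5] * 8
w_hidden = [[0.1] * 8 for _ in range(3)]
b_hidden = [0.0] * 3
w_out = [0.5] * 3
b_out = 0.0

y_k = forward(x_scaled, w_hidden, b_hidden, w_out, b_out)
# De-scale with a hypothetical failure-rate range of 0 to 200 brks/yr/100 km.
fr_pred = descale(y_k, 0.0, 200.0)
```

Training would adjust the weights and biases to minimize the output error; this sketch shows only how a prediction is produced once those values are known.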

Determining the optimal number of hidden nodes has always been a question that is raised in neural network applications and there is no direct or precise way to determine the optimal number of nodes in each hidden layer. Several guidelines have been developed by researchers for approximately determining the required numbers of hidden nodes in a hidden layer from knowledge of the number of nodes in both the input and output layers (Najjar et al. Citation1997). For single hidden layer networks, there are a number of rules-of-thumb to obtain the best number of hidden layer nodes. One approach is to assume the number of hidden nodes to be 75% of the number of input units (Salchenberger et al. Citation1992). According to Hajela and Berke (1991), the number of nodes on the hidden layer should be somewhere between the average and the sum of the input and output nodes. A third approach is to fix an upper bound and work back from this bound. Hecht-Nielsen (Citation1989) and Caudill (Citation1988) suggested that the upper limit of the number of hidden nodes in a single layer network may be taken as (2I+1), where I is the number of variables in the input layer.

In this research, 75% of the number of input units was used as an initial guess for the number of hidden nodes in a hidden layer. We calculated 6 hidden nodes for the first net configuration. The iterative method was then utilized, starting from the initial guess, for determining the required number of hidden nodes in a hidden layer and monitoring the accuracy measures on the testing datasets (smallest value for the root mean square error for the test set). This was accomplished by varying the number of initial hidden nodes until the network was able to best learn the patterns involved in the testing datasets. The number of the input and output neurons was already fixed. For each set of hidden neurons, the network was trained in batch mode to minimize the average-square error (ASE) at the output layer. In order to check any over-fitting during training, a threefold Cross-Validation (CV) was performed by keeping track of the efficiency of the fitted model. The training was stopped when there was no significant improvement in the model’s efficiency, and the model was then tested for its generalization properties. As indicated in the process described above, three hidden nodes were determined for the optimal network in order to predict failure rate.
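For reference, the rules of thumb cited above give the following starting points for a network with 8 inputs and 1 output. This is a sketch of the arithmetic only; the function and key names are hypothetical.

```python
def hidden_node_guidelines(n_inputs: int, n_outputs: int) -> dict:
    """Starting points for the hidden-layer size from the cited rules of thumb."""
    return {
        # 75% of the number of input units (Salchenberger et al.)
        "initial_guess_75pct": round(0.75 * n_inputs),
        # Between the average and the sum of input and output nodes (Hajela and Berke)
        "hajela_berke_range": ((n_inputs + n_outputs) / 2, n_inputs + n_outputs),
        # Upper limit of 2I + 1 (Hecht-Nielsen; Caudill)
        "upper_bound": 2 * n_inputs + 1,
    }


guidelines = hidden_node_guidelines(8, 1)
# initial guess 6, range 4.5 to 9, upper bound 17
```

The three hidden nodes finally selected lie below the initial guess and below the Hajela and Berke range, which illustrates that these rules only seed the iterative search rather than determine its result.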

The ANN model was developed using randomized data, with 50% used for training, 25% for testing, and the remaining 25% for validation. Once training and testing were complete, the remaining 25% of the data were run through the validation, which used the ANN architecture created from the training and testing. The plots in Figure 4 show the R2 values for the training, testing, and validation phases of the ANN modelling.
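A random 50/25/25 partition of this kind can be sketched as below. The function name and the seed are hypothetical, and integer indices stand in for the 3,836 pipe records reported earlier.

```python
import random


def split_50_25_25(records, seed=0):
    """Shuffle the records and split them into training, testing and validation sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_test = len(shuffled) // 4
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]   # remainder, roughly 25%
    return train, test, valid


train, test, valid = split_50_25_25(range(3836))
```

Keeping the validation set untouched until the architecture is fixed is what allows its R2 to be read as an estimate of performance on unseen pipes.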

Figure 4 R2 values for the training, testing and validation of the ANN model. a) training, b) testing, c) validation, and d) all data.

Comparison of the scatter plots in Figures 3 and 4 demonstrates that the MLR model was affected by the categorical variable, which separates its predictions into distinct regression lines. While each break category (BC) in Equation (2) produces a separate correlation line, the ANN model combines the data to yield a single estimate of the population correlation. The ANN model for all data points had an R2 of 0.94 (Figure 4(d)).

Figure 5 compares the current and future failure rates of pipelines rehabilitated with CML or CP. As shown in the figure, the failure rate is predicted to remain low over the next ten years. On average, the ANN model projected a 14.3% decrease in failure rate for cement-mortar-lined watermains after ten years of operation.

Figure 5. Comparison of the current failure rate and the failure rate after ten years for pipelines where CML or CP has been implemented.

Sensitivity analysis

After an ANN model has been successfully trained, the relative strength of each input's effect on the output can be derived from the weights stored in the network. This research adopted an importance index to test the sensitivity and significance of the input variables in producing the output; the index indicates the relative importance of each variable used in the neural network. To rank the variables, the sum of squared residuals of the model was computed with the respective predictor eliminated from the network. Table 1 shows that break category is the most significant variable and material the least significant in failure rate prediction. The ANN model also identifies break category, LogLength, and age as significant, as did the MLR model.
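The elimination-based ranking can be sketched as follows. The text states only that the sum of squared residuals is computed with each predictor removed; expressing the index as the ratio of that value to the full-model value is an assumption, as are the function names. The predictor names and numbers in the usage lines are placeholders and not results from the study.

```python
def sum_squared_residuals(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def rank_by_elimination(baseline_sse, sse_without):
    """Rank predictors by how much the error grows when each is removed.

    baseline_sse: sum of squared residuals of the model with all predictors.
    sse_without:  dict mapping predictor name -> sum of squared residuals of
                  the model refitted without that predictor.
    Returns (name, ratio) pairs sorted from most to least important.
    """
    if baseline_sse <= 0:
        raise ValueError("baseline_sse must be positive")
    ratios = {name: sse / baseline_sse for name, sse in sse_without.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for three generic predictors (not data from the study).
ranking = rank_by_elimination(10.0, {"x1": 40.0, "x2": 10.5, "x3": 22.0})
```

A ratio close to 1 means that removing the predictor barely changes the fit, while larger ratios mark the variables the model depends on most.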

Table 1. The influence ranking of input variables on the output ANN model.

Conclusions

This paper assessed the prediction of failure rate for watermains in the City of Etobicoke, Ontario, Canada. Two models, MLR and ANN, were developed on the basis that the rate of failure is affected by many contributing factors, such as the surrounding soil, rehabilitation works (CP and CML), and pipe variables. Findings specific to the City of Etobicoke case study are summarized as follows:

Based on historical analysis, a reduction in breakage rate has been observed following the initiation of the CP and CML programs in the City. This finding indicates that CP and CML must be incorporated in the prediction models.

A linear regression model was developed to predict failure rate and has a coefficient of determination (R2) of 0.75. This relatively lower prediction accuracy (in comparison with the ANN) may be attributed to common drawbacks of traditional parametric methods such as MLR, including the assumption of a linear relationship between the predictors and the dependent variable and the assumption of a Gaussian distribution of data and errors. The MLR model is simple and its predictions were good enough for an initial assessment, but future decisions and planning need better predictive accuracy.

To capture any nonlinear relationship that may be present in the dataset, an ANN model was applied to the case study dataset for prediction. Through the calibration and validation processes, the ANN model with an 8-3-1 architecture (eight input nodes corresponding to the watermain variables, three hidden nodes, and one output node corresponding to the outcome, here failure rate) was by far the best fit to the Etobicoke dataset. The coefficient of determination (R2=0.94) indicates that the ANN model is successful in predicting watermain failure rates. This promising result confirms the hypothesis that the ANN model is more robust for nonlinear problems that classical mathematical modelling procedures such as MLR are unable to solve. Moreover, unlike MLR, the ANN model has the advantage that a significance level does not need to be considered and all variables can be incorporated in the modelling process.

In terms of engineering implications, CML and CP will decrease future degradation rates of pipelines and extend the life of watermains. In terms of management implications, an ANN model with acceptable precision would be a reliable decision tool for future rehabilitation planning and budgeting.

Acknowledgements

The authors would like to thank the Etobicoke municipality in the GTA for their contributions in the data collection phase. This research was funded by the Canada Research Chairs and NSERC Discovery program.

References

  • Ahn , J. C. , Lee , S. W. , Lee , G. S. and Koo , J. Y. 2005 . Predicting water pipe breaks using neural network . Water Science and Technology: Water Supply , 5 ( 3–4 ) : 159 – 172 .
  • Al-Barqawi , H. and Zayed , T. 2008 . Infrastructure management: integrated AHP/ANN model to evaluate municipal watermains’ performance . Journal of Infrastructure Systems , 14 ( 4 ) : 305 – 318 .
  • Asnaashari , A. , McBean , E. A. , Shahrour , I. and Gharabaghi , B. 2009 . Prediction of watermain failure frequencies using multiple and Poisson regression . Water Science and Technology: Water Supply , 9 ( 1 ) : 9 – 19 .
  • Bowden , G. J. , Nixon , J. B. , Dandy , G. C. , Maier , H. R. and Holmes , M. 2006 . Forecasting chlorine residuals in a water distribution system using a general regression neural network . Mathematical and Computer Modelling , 44 ( 5–6 ) : 469 – 484 .
  • Caudill , M. 1988 . Neural networks primer, Part III . AI Expert , 3 ( 6 ) : 53 – 59 .
  • Chambers, G. M. 1994. “Reducing water utility costs in Winnipeg.” In Proceedings of the Western Canada Water and Wastewater Association Conference, 1–12. Winnipeg, Canada: Western Canada Water and Wastewater Association.
  • Ciottoni , A. S. 1985 . “ Updating the New York City water system ” . In Proceedings of the Specialty Conference on Infrastructure for Urban Growth , Edited by: Guyer , J. P. 69 – 77 . New York : American Society of Civil Engineers .
  • City of Toronto. 2009. Watermains and watermain breaks. Water supply: fact sheets. Accessed January 2011. http://www.toronto.ca/water/supply/system/watermains.htm.
  • Cybenko , G. 1989 . Approximation by Superpositions of a Sigmoidal Function . Mathematics of Control, Signals, and Systems , 2 : 303 – 314 .
  • Dandy , G. C. and Engelhardt , M. 2001 . The optimal scheduling of water main replacement using genetic algorithms . Journal of Water Resources Planning and Management ASCE , 127 ( 4 ) : 214 – 223 .
  • Despagne , F. and Massart , D. L. 1998 . Tutorial review: neural networks in multivariate calibration . The Analyst , 123 : 157R – 178R .
  • Goulter , I. C. and Kazemi , A. 1988 . Spatial and temporal groupings of watermain pipe breakage in Winnipeg . Canadian Journal of Civil Engineering , 15 ( 1 ) : 91 – 97 .
  • Habibian , A. 1994 . Effect of temperature changes on water-main break . Journal of Transportation Engineering , 120 ( 2 ) : 312 – 321 .
  • Hajela, P., and L. Berke. 1991. “Neurobiological computational modes in structural analysis and design.” Computers & Structures 41 (4): 657–667.
  • Hecht-Nielsen, R. 1989. “Theory of the back-propagation neural network.” Vol. 1 of Proceedings of the International Joint Conference on Neural Networks, 593–606. Washington, DC: IEEE TAB Neural Network Committee.
  • Ho , C. I. , Lin , M. D. and Lo , S. L. 2009 . Use of GIS-based hybrid artificial neural network to prioritize the order of pipe replacement in a water distribution network . Journal of Environmental Monitoring and Assessment , 166 ( 1–4 ) : 177 – 189 . doi: 10.1007/s10661-009-0994-6
  • Hornik , K. , Stinchcombe , M. and White , H. 1989 . Multilayer Feedforward Networks are Universal Approximators . Neural Networks , 2 ( 5 ) : 359 – 366 .
  • Jafar , R. , Shahrour , I. and Juran , I. 2010 . Application of Artificial Neural Networks (ANN) to model the failure of urban water mains . Mathematical and Computer Modelling , 51 ( 9–10 ) : 1170 – 1180 .
  • Kleiner Y., S. McDonald, and B. Rajani. 2003. “Cathodic protection of water mains in Ottawa: analysis and planning.” Corrosion Control for Enhanced Reliability and Safety: 1–14.
  • Kleiner , Y. and Rajani , B. 2001 . Comprehensive review of structural deterioration of watermains: statistical models . Urban Water , 3 ( 3 ) : 131 – 150 .
  • Kleiner, Y., and B. Rajani. 2002. “Forecasting variations and trends in water-main breaks.” Journal of Infrastructure Systems 8 (4): 122–131.
  • Lackington , D. W. and Large , J. M. 1980 . The integrity of existing distribution systems . Journal of the Institution of Water Engineers and Scientists , 34 : 15 – 32 .
  • Lochbaum, B. S. 1993. “PSE & G develops models to predict main breaks.” Pipeline & Gas Journal 20 (9): 20–27.
  • Maier , H. R. and Dandy , G. C. 2000 . Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications . Environmental Modelling and Software , 15 : 101 – 124 .
  • Moselhi , O. and Shehab-Eldeen , T. 2000 . Classification of defects in sewer pipes using neural networks . Journal of Infrastructure Systems , 6 ( 3 ) : 97 – 105 .
  • Najjar , Y. M. and Basheer , I. A. 1996 . Neural network approach for site characterization and uncertainty prediction . ASCE Geotechnical Special Publication , 58 ( 1 ) : 134 – 148 .
  • Najjar , Y. M. , Basheer , I. A. and Hajmeer , M. N. 1997 . Computational neural networks for predictive microbiology: I. methodology . International Journal of Food Microbiology , 34 ( 1 ) : 27 – 49 .
  • Needham , D. and Howe , M. 1981 . Why gas mains fail, Part 1 . Pipe Line Industry , 55 : 47 – 50 .
  • Newport , R. 1981 . Factors influencing the occurrence of bursts in iron watermains . Water Supply and Management , 3 : 274 – 278 .
  • O’Day , D. K. 1982 . Organizing and analyzing leak and break data for making main replacement decisions . Journal of the American Water Works Association , 74 ( 11 ) : 589 – 594 .
  • Pelletier , G. and Villeneuve , J. -P. 2003 . Modelling water pipe breaks – three case studies . Journal of Water Resources Planning and Management , 129 ( 2 ) : 115 – 123 .
  • Press, J. 2009. Water main breaks shifting.The Whig. Accessed May 2012. http://www.thewhig.com/ArticleDisplay.aspx?e=1474179&archive=true.
  • Rogers P. D. and N. S. Grigg. 2009. “Failure assessment modelling to prioritize water pipe renewal: Two case studies.” Journal of Infrastructure Systems 15 (3): 162–171.
  • Salchenberger , L. M. , Cinar , E. M. and Lash , N. A. 1992 . Neural networks: A new tool for predicting thrift failures . Journal of Decision Sciences , 23 : 899 – 916 .
  • Schuster , C. and McBean , E. 2008 . Impacts of cathodic protection on pipe break probabilities: A Toronto case study . Canadian Journal of Civil Engineering , 35 : 210 – 216 .
  • Selvakumar , A. , Clark , R. M. and Sivaganesan , M. 2002 . Costs for water supply distribution system rehabilitation . Journal of Water Resources Planning and Management , 128 ( 4 ) : 303 – 306 .
  • Shahin , M. A. , Jaksa , M. B. and Maier , H. R. 2002 . Artificial neural network based settlement prediction formula for shallow foundations on granular soils . Australian Geomechanics , 37 ( 4 ) : 45 – 52 .
  • Shamir , U. and Howard , C. 1979 . Analytic approach to scheduling pipe replacement . Journal of the American Water Works Association , 71 ( 5 ) : 248 – 258 .
  • Tabesh, M., J. Soltani, R. Farmani and D. Savic. 2009, Assessing pipe failure rate and mechanical reliability of water distribution networks using data-driven modelling. Journal of Hydroinformatics (11)1: 1–17.
  • Walski , T. M. and Pelliccia , A. 1982 . Economic analyses of watermain breaks . Journal of the American Water Works Association , 74 ( 3 ) : 140 – 147 .
