ABSTRACT
Pipe failures in water distribution infrastructure (WDI) have significant economic, environmental and public health impacts. To alleviate these impacts, repair and replacement decisions need to be prioritized to effectively reduce failure rates. In this study, a computational framework is proposed for WDI asset management that couples spatial clustering analysis with predictive modeling of pipe failures. First, hotspot/coldspot clusters of statistically significant high/low failure rates are identified using local indicators of spatial association. Second, the predictive abilities of eight statistical learning techniques are systematically tested, and the best-performing method is implemented to forecast failure rates (breaks/(km·year)) within different sectors of the WDI. Third, the framework is implemented to compare the impact of adopting proactive instead of reactive pipe replacement strategies. Applying the framework to a real-life, large-scale WDI revealed that spatial clustering of pipe failures improves the accuracy of the prediction models.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability
Some or all data, models, or code generated or used during the study are proprietary or confidential in nature and may only be provided with restrictions (e.g. anonymized data). Pipe failure data and network GIS records were provided by the water utility under a confidentiality agreement between the water utility and the second author. The data are available by request from the authors and may only be provided after obtaining the utility’s approval and undergoing potential anonymization. The computational framework developed in this study was implemented in the Python programming language. The spatial autocorrelation analysis was conducted using the Python Spatial Analysis Library (PySAL) (Rey and Anselin 2010). Predictive modeling was conducted using scikit-learn (Pedregosa et al. 2011). Other Python libraries used for data analysis and visualization include numpy, pandas, geopandas and matplotlib (Hunter 2007; McKinney 2010; Van Der Walt, Colbert, and Varoquaux 2011).
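For readers unfamiliar with local indicators of spatial association, the following is a minimal, illustrative sketch of the local Moran's I statistic written in plain Python. It is not the study's code and does not use the PySAL API. The function name `local_morans_i`, the six-sector chain layout and the failure rates are hypothetical values chosen only for illustration, and the permutation-based significance test that would normally separate statistically significant hotspots and coldspots from noise is omitted.

```python
def local_morans_i(values, neighbors):
    """Return (statistic, quadrant) per sector using row-standardized weights.

    values    : failure rates per sector, e.g. breaks/(km*year) (hypothetical data)
    neighbors : dict mapping each sector index to a list of adjacent sector indices
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment (variance with 1/n)
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")

    results = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation of the neighbors (row-standardized weights).
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        stat = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            quadrant = "high-high"          # hotspot candidate
        elif z[i] < 0 and lag < 0:
            quadrant = "low-low"            # coldspot candidate
        elif z[i] > 0:
            quadrant = "high-low"           # high value among low neighbors
        else:
            quadrant = "low-high"           # low value among high neighbors
        results.append((stat, quadrant))
    return results


# Hypothetical example: six sectors arranged in a chain, 0-1-2-3-4-5.
rates = [0.9, 0.8, 0.85, 0.1, 0.15, 0.05]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
lisa = local_morans_i(rates, adjacency)
for sector, (stat, quadrant) in enumerate(lisa):
    print(sector, round(stat, 3), quadrant)
```

A positive statistic indicates that a sector resembles its neighbors (a cluster candidate), while a negative one flags a spatial outlier. In practice, a library implementation with a permutation test would be used to assess significance before labeling a sector as a hotspot or coldspot.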
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/1573062X.2023.2180393