Prediction of failures in sewer networks using various machine learning classifiers

Pages 877-893 | Received 11 Dec 2023, Accepted 09 May 2024, Published online: 30 May 2024
 

ABSTRACT

In sewer networks, failure prediction plays a significant role in the operation and maintenance plans of wastewater utilities. This study aims to identify the variables that influence failures by using feature selection (FS) algorithms, and to achieve maximum model accuracy with a minimum number of variables. Four scenarios based on the suggested FS algorithms were developed. In these scenarios, the best prediction models were investigated using machine learning (ML) classifiers such as a neural network classifier (NNC), gradient boosting machine (GBM), random forest (RF), and a hybrid model (HM). The classification performance of the ML models was evaluated using accuracy, precision, F1 score, and the receiver operating characteristic (ROC) curve. The ML algorithms achieved scores ranging from 0.99 for accuracy to 1 for the ROC curve. In conclusion, the ML algorithms suggested in this study may serve as a decision support tool for wastewater utilities in prioritizing the replacement, maintenance, and inspection of sewer pipes.
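
As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, F1 score, and the area under the ROC curve can be computed for a binary failure/no-failure classifier. The function names, labels, and scores are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = pipe failure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 8 pipe segments, 1 = failed, 0 = did not fail.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard class predictions
scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]    # predicted failure probabilities

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise AUC formulation is used here only because it needs no external library; for larger datasets a sorted or library-based implementation would be the more practical choice.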

List of abbreviations

The following abbreviations are used in this paper:

Acronym = Definition
ANN = Artificial neural network
AUC = Area under the curve
Bagging = Bootstrap aggregating
C = Concrete
CCTV = Closed-circuit television
CNL = Capacity per unit network length
CNNs = Convolutional neural networks
CP = Corrugated pipe with muff
CV = Cross validation
DI = Ductile iron
DT = Decision trees
FS = Feature selection
FPC = False positive class
FPR = False positive rate
GBM = Gradient boosting machine
GIS = Geographic information system
GM = Geometric mean
GP = Genetic programming
HDPE = High-density polyethylene
HM = Hybrid model
ISU = Kocaeli Water and Sewerage Administration
LASSO = Least absolute shrinkage and selection operator
LBFGS = Limited-memory Broyden-Fletcher-Goldfarb-Shanno
LR = Logistic regression
LSTM = Long short-term memory
MCC = Matthews correlation coefficient
ML = Machine learning
MRMR = Minimum redundancy maximum relevance
NNC = Neural network classifier
NL = Network length
OBB = Out-of-bag
PC = Pearson correlation
PVC = Polyvinyl chloride
ReLU = Rectified linear unit
RC = Reinforced concrete
RF = Random forest
ROC = Receiver operating characteristics
ST = Steel
SUEN = Turkish Water Institute
SVM = Support vector machine
TNC = True negative class
TPC = True positive class
XGB = Extreme gradient boosting

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed at https://doi.org/10.1080/1573062X.2024.2360184.
