ABSTRACT
In sewer networks, failure prediction plays a significant role in the operation and maintenance plans of wastewater utilities. This study aims to identify the variables that most affect failures by using feature selection (FS) algorithms, and to achieve maximum model accuracy with a minimum number of variables. Four scenarios based on the suggested FS algorithms were developed. In each scenario, the best prediction models were investigated using machine learning (ML) classifiers: a neural network classifier (NNC), a gradient boosting machine (GBM), a random forest (RF), and a hybrid model (HM). The classification performance of the ML models was evaluated using accuracy, precision, F1 score, and the receiver operating characteristics (ROC) curve. The ML algorithms achieved performance values ranging from 0.99 for accuracy to 1 for the ROC curve. In conclusion, the ML algorithms suggested in this study may serve as a decision support tool for wastewater utilities in prioritizing the replacement, maintenance, and inspection of sewer pipes.
List of abbreviations
The following abbreviations are used in this paper:
Acronym | = | Definition |
ANN | = | Artificial neural network |
AUC | = | Area under the curve |
Bagging | = | Bootstrap aggregating |
C | = | Concrete |
CCTV | = | Closed-circuit television |
CNL | = | Capacity per unit network length |
CNNs | = | Convolutional neural networks |
CP | = | Corrugated pipe with muff |
CV | = | Cross validation |
DI | = | Ductile iron |
DT | = | Decision trees |
FS | = | Feature selection |
FPC | = | False positive class |
FPR | = | False positive rate |
GBM | = | Gradient boosting machine |
GIS | = | Geographic information system |
GM | = | Geometric mean |
GP | = | Genetic programming |
HDPE | = | High-density polyethylene |
HM | = | Hybrid model |
ISU | = | Kocaeli water and sewerage administration |
LASSO | = | Least absolute shrinkage and selection operator |
LBFGS | = | Limited-memory Broyden-Fletcher-Goldfarb-Shanno |
LR | = | Logistic regression |
LSTM | = | Long short-term memory |
MCC | = | Matthews correlation coefficient |
ML | = | Machine learning |
MRMR | = | Minimum redundancy maximum relevance |
NNC | = | Neural network classifier |
NL | = | Network length |
OOB | = | Out-of-bag |
PC | = | Pearson correlation |
PVC | = | Polyvinyl chloride |
ReLU | = | Rectified linear unit |
RC | = | Reinforced concrete |
RF | = | Random forest |
ROC | = | Receiver operating characteristics |
ST | = | Steel |
SUEN | = | Turkish water institute |
SVM | = | Support vector machine |
TNC | = | True negative class |
TPC | = | True positive class |
XGB | = | Extreme gradient boosting |
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/1573062X.2024.2360184.