Case Report

Random forest for the detection of unauthorized consumption in water supply systems: a case study in Southern Brazil

Pages 394-404 | Received 22 Aug 2022, Accepted 01 Dec 2022, Published online: 12 Dec 2022

