ABSTRACT
This article demonstrates the application of classification trees (decision trees), logistic regression (LR), and linear discriminant analysis (LDA) to classify water quality data (i.e., whether the water is fit or not fit for drinking). The data on water quality were obtained from the Pakistan Council of Research in Water Resources (PCRWR) for two cities of Pakistan: one representing an industrial environment (Sialkot) and the other representing a non-industrial environment (Narowal). To classify the data, three statistical tools were employed, namely the decision tree methodology using the Gini index, LR, and LDA, all using R software libraries. The results obtained by the three techniques were compared using misclassification rates (the model with the lowest misclassification rate is preferred). LR performed better than the other two techniques, while decision trees and LDA performed equally well. For illustration purposes, however, decision trees are comparatively easy to draw and interpret.
MATHEMATICS SUBJECT CLASSIFICATION:
Acknowledgments
The authors are deeply thankful to the editor and reviewers for their valuable suggestions to improve the manuscript.
Funding
This article was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah. The author, Muhammad Aslam, therefore acknowledges with thanks the DSR's technical and financial support.