ABSTRACT
This study explores the potential of the GPBoost approach for groundwater quality assessment in comparison with three other gradient boosting-based algorithms. Three methods (random search, grid search, and Bayesian optimization) were used to find the optimal hyperparameter values for all four gradient boosting-based algorithms. One hundred and two samples of the entropy-weighted water quality index (EWQI), with 14 input parameters, were used for assessing groundwater quality. The calculated EWQI values for drinking water range from 80.4 to 394.96 in the pre-monsoon period and from 39.6 to 338.79 in the post-monsoon period. Moreover, spatial distribution maps show that the central portions of the study area fall under medium water quality. The performance of the models was compared using multiple statistical criteria: correlation coefficient (CC), root mean square error (RMSE), and mean absolute error (MAE). The results reveal that the CC value exceeds 0.93 for all modeling approaches, suggesting comparable performance by all methods. In terms of RMSE in predicting the EWQI values, the GPBoost model tuned by random search performed better than the other three models, suggesting that GPBoost is competitive with other gradient boosting-based approaches. Relative importance analysis based on the random and grid search methods highlights NO3−, Mg2+, TDS, EC, and TH as important input parameters for predicting EWQI.
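The three comparison criteria named above can be illustrated with a minimal sketch. It assumes that CC denotes the Pearson correlation coefficient, which the abstract does not state explicitly. The function name `evaluate` and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def evaluate(observed, predicted):
    """Return CC (assumed Pearson), RMSE, and MAE for paired values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Sums of products and squares about the means, used for Pearson's r.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cc = cov / math.sqrt(ss_obs * ss_pred)

    # Error-based criteria computed on the prediction residuals.
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r ** 2 for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n

    return {"CC": cc, "RMSE": rmse, "MAE": mae}


# Hypothetical EWQI values, for illustration only.
scores = evaluate([80.0, 120.0, 200.0, 310.0], [85.0, 115.0, 210.0, 300.0])
print(scores)
```

A high CC alone does not separate models that share a systematic offset, which is one reason error-based criteria such as RMSE and MAE are reported alongside it.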
Highlights
Four gradient boosting-based machine-learning algorithms applied for EWQI prediction.
GPBoost tuned by random search performed better than the other three models.
NO3−, Mg2+, TDS, EC, and TH were the most effective input parameters for predicting EWQI.
Establishing a novel approach for estimating groundwater quality for drinking.
Acknowledgements
Hemant Raheja, the first author (Grant No. 2K19/NITK/PHD/61900011-Hemant Raheja), is grateful to the Ministry of Education, Government of India, for funding the scholarship for this study. The authors acknowledge the National Institute of Technology, Kurukshetra, for providing various research facilities.
Disclosure statement
The authors confirm that they have no known financial or interpersonal conflicts that could have appeared to influence the research presented in this study.
Author contributions
Hemant Raheja was responsible for sample collection and assessment, data processing, writing – original draft preparation, methodology, conceptualization, analysis, and editing.
Arun Goel was responsible for visualization, conceptualization, writing – reviewing, and supervision of the whole research work.
Mahesh Pal was responsible for visualization, conceptualization, data processing, writing – reviewing, and editing of the whole research work.
Data availability statement
The datasets produced and/or analyzed during the current work are available from the corresponding author upon reasonable request.