ABSTRACT
This work aims to improve feature selection for data-driven rainfall–runoff models by assessing the significance of each input variable in the learning process and analysing it from a physical point of view. For this purpose, a set of 14 experiments was carried out in two watersheds of the Santa Lucía Chico basin, Uruguay. A random forest model was trained and tested for daily discharge prediction in each of them using different input variables. A feature importance analysis was carried out for each model, using a non-model-biased method (Shapley additive explanations, SHAP). Results showed that the most relevant variables were lagged discharges of one and two days, along with seven-day accumulated rainfall, which is interpreted as a proxy of the soil moisture condition of the watershed. Temperature was also relevant and was proven to represent the effect of the whole set of climatic variables (relative humidity, solar radiation, wind speed).
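As an illustration of the input variables highlighted above, the following minimal sketch builds one- and two-day lagged discharges and a seven-day accumulated rainfall from daily series. It is not the authors' code: the function and column names (`lagged`, `accumulated`, `build_features`, `q_lag1`, `q_lag2`, `p_acc7`) are hypothetical, and the choice to accumulate rainfall over the seven days preceding the prediction day (excluding that day) is an assumption, since the abstract does not specify the window convention. The random forest training and the SHAP analysis are deliberately left out.

```python
from typing import Dict, List, Optional, Sequence


def lagged(series: Sequence[float], lag: int) -> List[Optional[float]]:
    """Return the series shifted by `lag` days; early days without a value are None."""
    if lag < 1:
        raise ValueError("lag must be a positive number of days")
    n = len(series)
    return [None] * min(lag, n) + list(series[: max(n - lag, 0)])


def accumulated(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Sum of the `window` days before day t (day t excluded; an assumed convention)."""
    out: List[Optional[float]] = []
    for t in range(len(series)):
        if t < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[t - window:t]))
    return out


def build_features(discharge: Sequence[float],
                   rainfall: Sequence[float]) -> List[Dict[str, float]]:
    """Assemble one row per day that has complete history for every feature."""
    if len(discharge) != len(rainfall):
        raise ValueError("discharge and rainfall must cover the same days")
    q1 = lagged(discharge, 1)
    q2 = lagged(discharge, 2)
    p7 = accumulated(rainfall, 7)
    rows = []
    for t in range(len(discharge)):
        if q1[t] is None or q2[t] is None or p7[t] is None:
            continue  # skip the warm-up period
        rows.append({
            "day": t,
            "q_lag1": q1[t],         # discharge one day earlier
            "q_lag2": q2[t],         # discharge two days earlier
            "p_acc7": p7[t],         # rainfall accumulated over the previous 7 days
            "target": discharge[t],  # daily discharge to be predicted
        })
    return rows
```

The resulting rows could then be passed to any tabular learner; the first seven days are dropped because the accumulated rainfall is undefined there.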
Editor A. Fiori; Associate Editor D. Rivera
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/02626667.2023.2232356