Abstract
Banking institutions need accurate and timely information about customers’ incomes to manage their financial product offerings effectively. To meet this need, they build predictive models from numerous features, of which only a subset contributes to capturing income variability. In this study, we propose a methodology for predicting monthly income using an XGBoost model with a reduced number of features. Feature reduction is performed with Boruta and BorutaSHAP while preserving predictive power throughout the process. To make the model’s predictions more transparent, we use the Shapley Additive Explanations (SHAP) method. The dataset, provided by an anonymous bank in Chile, consists of 10,000 records and 426 features and has a substantial proportion of missing values. The results show that combining feature selection methods with the XGBoost algorithm yields a more concise model that maintains predictive performance. Financial institutions can adopt this methodology and, using SHAP, consistently identify and track the most influential features, reducing model complexity and training time without compromising predictive power.
Acknowledgments
We would like to thank the anonymous Chilean banking institution that made this study possible by providing the dataset of its customers.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Data science-based analysis techniques come from an interdisciplinary field that employs scientific methods, processes, and systems to extract knowledge and insights from data in various forms. These techniques involve the use of statistical, mathematical, and computational approaches to examine, interpret, and understand datasets.