An interpretable predictive model for bank customers’ income using the eXtreme Gradient Boosting algorithm and the SHAP method: a case study of an Anonymous Chilean Bank

Patricio Salas, Department of Statistics, Universidad de Concepción, Concepción, Chile (corresponding author)

https://orcid.org/0000-0002-2201-4038

Patricio Sáez, Department of Statistics, Universidad de Concepción, Concepción, Chile

https://orcid.org/0000-0002-0113-3644

Vicente Marchant, Department of Statistics, Universidad de Concepción, Concepción, Chile

Abstract

For banking institutions, accurate and timely information about customers’ incomes is crucial for managing financial product offerings effectively. To meet this need, banks build predictive models with numerous features, although only a subset of those features captures income variability. In this study, we propose a methodology for predicting customers’ monthly income with an XGBoost model trained on a reduced set of features. Feature reduction is performed with Boruta and BorutaSHAP without loss of predictive power. To make the model’s predictions more transparent, we use the Shapley Additive Explanations (SHAP) method. The dataset, provided by an anonymous Chilean bank, consists of 10,000 records and 426 features, with a substantial proportion of missing values. The results show that combining these feature selection methods with the XGBoost algorithm yields a more concise model that maintains predictive performance. Financial institutions can adopt this methodology and, using SHAP, consistently identify and track the most influential features, reducing model complexity and training time without compromising predictive power.
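To illustrate what a SHAP value represents, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions for a small income function. Everything in it is an assumption for illustration: `toy_income_model` and the feature names `age`, `tenure`, and `card_limit` are hypothetical, and features outside a coalition are filled in from a single baseline customer. This is not the authors' XGBoost pipeline. The SHAP library uses the much faster TreeSHAP algorithm for tree ensembles such as XGBoost. The example only demonstrates the property SHAP relies on: the baseline prediction plus the sum of the per-feature contributions equals the model's prediction for the customer.

```python
from itertools import combinations
from math import factorial


# Hypothetical toy income model (NOT the paper's XGBoost model): monthly
# income from three made-up features, including an age x tenure interaction.
def toy_income_model(x):
    age, tenure, card_limit = x["age"], x["tenure"], x["card_limit"]
    return 200.0 + 10.0 * age + 5.0 * tenure + 0.1 * card_limit + 0.5 * age * tenure


def shapley_values(model, x, baseline):
    """Exact Shapley values obtained by enumerating every feature coalition.

    Features outside a coalition S take their baseline value (a
    single-reference value function, assumed here for simplicity).
    The cost grows as 2^M, so this only works for a handful of features.
    """
    features = list(x)
    m = len(features)

    def value(coalition):
        # Features in `coalition` come from x; the rest come from the baseline.
        point = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(point)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical reference customer and customer to explain.
baseline = {"age": 40.0, "tenure": 5.0, "card_limit": 1000.0}
customer = {"age": 50.0, "tenure": 10.0, "card_limit": 3000.0}

phi = shapley_values(toy_income_model, customer, baseline)
print({k: round(v, 6) for k, v in phi.items()})
# → {'age': 137.5, 'tenure': 137.5, 'card_limit': 200.0}
```

In this toy case the interaction term is split between `age` and `tenure`. The contributions sum to 475, which is the difference between the customer's prediction (1300) and the baseline prediction (825).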


Acknowledgments

We thank the anonymous Chilean banking institution that made this study possible by providing a dataset of its customers.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Data science-based analysis techniques come from an interdisciplinary field that employs scientific methods, processes, and systems to extract knowledge and insights from data in various forms. These techniques involve the use of statistical, mathematical, and computational approaches to examine, interpret, and understand datasets.
