Abstract
A major requirement for credit scoring models is to provide a maximally accurate risk prediction. In addition, regulators demand that these models be transparent and auditable. As a result, very simple predictive models such as logistic regression or decision trees are still widely used in credit scoring, and the superior predictive power of modern machine learning algorithms cannot be fully leveraged. Significant potential is therefore missed, leading to higher reserves or more credit defaults. This article identifies the different dimensions that must be considered to make credit scoring models understandable and presents a framework for making “black box” machine learning models transparent, auditable, and explainable. Following this framework, we present an overview of techniques, demonstrate how they can be applied in credit scoring, and show how the results compare to the interpretability of scorecards. A real-world case study shows that a comparable degree of interpretability can be achieved while machine learning techniques retain their ability to improve predictive power.
Disclosure statement
No potential conflict of interest was reported by the authors.
Figure 1. The process description consists of four components. The first defines the stakeholders that are affected by the model. The second defines the life cycle of the model and specifies which stakeholders are active in which part of the model life cycle. The third specifies which needs each stakeholder has with respect to the model at each point in time. The last component sets out the machine learning techniques that can be used to meet the identified needs.
![Figure 1. The process description consists of four components. The first defines the stakeholders that are affected by the model. The second defines the life cycle of the model and specifies which stakeholders are active in which part of the model life cycle. The third specifies which needs each stakeholder has with respect to the model at each point in time. The last component sets out the machine learning techniques that can be used to meet the identified needs.](/cms/asset/78dd7361-87a5-49a9-8172-9f93f12635cc/tjor_a_1922098_f0001_c.jpg)
Figure 2. A detailed view of the pyramid setting out the machine learning techniques that can be used for explainability.
![Figure 2. A detailed view of the pyramid setting out the machine learning techniques that can be used for explainability.](/cms/asset/9d635a75-9b17-4c1d-9f9d-d47f268f598e/tjor_a_1922098_f0002_c.jpg)
Figure 3. Example: automatic vs. manual binning for the variable months since the most recent delinquency.
![Figure 3. Example: automatic vs. manual binning for the variable months since the most recent delinquency.](/cms/asset/8cad7d0d-00c5-44ea-8558-25bb43195ca1/tjor_a_1922098_f0003_b.jpg)
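To make the contrast in Figure 3 concrete, the sketch below compares automatic equal-frequency (quantile) binning with a set of manually chosen bin edges and computes the weight of evidence (WoE) per bin, as is common when building scorecards. It is a minimal illustration under stated assumptions, not the authors' procedure: the data, the manual edges `[6, 24]`, the helper names, and the +0.5 smoothing against empty bins are all hypothetical choices. WoE is taken here as ln(share of non-defaulters / share of defaulters), so higher values indicate lower risk.

```python
import math
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Automatic binning: cut points at empirical quantiles (equal-frequency bins)."""
    s = sorted(values)
    return sorted({s[int(len(s) * k / n_bins)] for k in range(1, n_bins)})


def assign_bin(x, edges):
    """Bin index of x; bins are (-inf, e0), [e0, e1), ..., [e_last, inf)."""
    return bisect_right(edges, x)


def woe_table(values, defaults, edges):
    """Weight of evidence per bin: ln(share of goods / share of bads)."""
    n_bins = len(edges) + 1
    goods, bads = [0] * n_bins, [0] * n_bins
    for x, y in zip(values, defaults):
        b = assign_bin(x, edges)
        if y:
            bads[b] += 1
        else:
            goods[b] += 1
    total_g, total_b = sum(goods), sum(bads)
    # Hypothetical smoothing: add 0.5 to each count so empty cells avoid log(0).
    return [math.log(((g + 0.5) / total_g) / ((b + 0.5) / total_b))
            for g, b in zip(goods, bads)]


# Hypothetical sample: months since most recent delinquency and default flag (1 = default).
months = [1, 2, 3, 5, 8, 12, 18, 24, 36, 48, 60, 72]
defaults = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

auto_edges = quantile_edges(months, 3)   # data-driven cut points
manual_edges = [6, 24]                   # hypothetical analyst-chosen cut points

auto_woe = woe_table(months, defaults, auto_edges)
manual_woe = woe_table(months, defaults, manual_edges)
print("automatic", auto_edges, [round(w, 2) for w in auto_woe])
print("manual   ", manual_edges, [round(w, 2) for w in manual_woe])
```

In practice, manual edges are typically chosen so that WoE is monotonic and the bins make business sense (for example, round month boundaries), which is part of what keeps a scorecard easy to audit.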
Figure 5. Selection of models for comparison based on AUC on training and test data. The black line marks points where the AUC is equal on the training and test sets. The included models are gradient boosting machines (“gbm”) with different numbers of trees, logistic regression (“glm”), elastic net (“glmnet”), logistic regressions with spline-based transformations (“rms”), two implementations of random forest (“randomForest,” “ranger”), support vector machines (“svm”), and extreme gradient boosting (“xgboost”), as well as H2O’s AutoML (“h2o”) and MLJAR AutoML (“mljar”) trained for varying amounts of time.
![Figure 5. Selection of models for comparison based on AUC on training and test data. The black line marks points where the AUC is equal on the training and test sets. The included models are gradient boosting machines (“gbm”) with different numbers of trees, logistic regression (“glm”), elastic net (“glmnet”), logistic regressions with spline-based transformations (“rms”), two implementations of random forest (“randomForest,” “ranger”), support vector machines (“svm”), and extreme gradient boosting (“xgboost”), as well as H2O’s AutoML (“h2o”) and MLJAR AutoML (“mljar”) trained for varying amounts of time.](/cms/asset/1007a22f-2ffd-4b13-9e0a-90cee5db39a3/tjor_a_1922098_f0005_b.jpg)
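Figure 5 places each model by its training AUC and test AUC; a model far below the diagonal scores much better on training than on test data, which suggests overfitting. The sketch below shows one way such a comparison could be computed. It is a minimal illustration, not the authors' evaluation code: AUC is computed directly as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter (ties count one half), and the scores, labels, and the helper `overfit_gap` are hypothetical.

```python
def auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)); labels: 1 = default."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:            # O(n_pos * n_neg): fine for a sketch, slow for large data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def overfit_gap(train_scores, train_labels, test_scores, test_labels):
    """Hypothetical helper: training AUC minus test AUC (distance below the diagonal)."""
    return auc(train_scores, train_labels) - auc(test_scores, test_labels)


# Hypothetical predicted default probabilities from one model.
train_labels = [1, 1, 0, 0, 0, 1]
train_scores = [0.9, 0.8, 0.2, 0.1, 0.3, 0.7]
test_labels = [1, 0, 1, 0]
test_scores = [0.6, 0.7, 0.8, 0.2]

print("train AUC:", auc(train_scores, train_labels))
print("test AUC: ", auc(test_scores, test_labels))
print("gap:      ", overfit_gap(train_scores, train_labels, test_scores, test_labels))
```

For real data sets, a rank-based library routine such as scikit-learn's `roc_auc_score` computes the same quantity far more efficiently than the quadratic loop used here for transparency.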
Acknowledgements
The authors sincerely thank two anonymous reviewers whose comments and suggestions helped improve and clarify this manuscript substantially.