References
- Alemzadeh, H., Raman, J., Leveson, N., Kalbarczyk, Z., & Iyer, R. K. (2016). Adverse events in robotic surgery: A retrospective study of 14 years of FDA data. PLoS ONE, 11(4), e0151470. https://doi.org/10.1371/journal.pone.0151470
- Ariza-Garzón, M. J., Arroyo, J., Caparrini, A., & Segovia-Vargas, M. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access, 8, 64873–64890. https://doi.org/10.1109/ACCESS.2020.2984412
- Arya, V., Bellamy, R. K. E., Chen, P.-Y., Dhurandhar, A., Hind, M., Hoffman, S. C., Houde, S., Liao, Q. V., Luss, R., Mojsilović, A., Mourad, S., Pedemonte, P., Raghavendra, R., Richards, J., Sattigeri, P., Shanmugam, K., Singh, M., Varshney, K. R., Wei, D., & Zhang, Y. (2019). One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques. arXiv, 1909.03012.
- Azevedo, A., & Santos, M. F. (2008). KDD, SEMMA and CRISP-DM: A parallel overview. In IADIS European Conference on Data Mining (pp. 182–185). https://www.semanticscholar.org/paper/KDD%2C-SEMMA-and-CRISP-DM%3A-a-parallel-overview-Azevedo-Santos/6bc30ac3f23d43ffc2254b0be24ec4217cf8c845
- Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2002). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6), 627–635.
- Banasik, J., & Crook, J. (2007). Reject inference, augmentation and sample selection. European Journal of Operational Research, 183(3), 1582–1594. https://doi.org/10.1016/j.ejor.2006.06.072
- Bellotti, T., & Crook, J. (2009). Support vector machines for credit scoring and discovery of significant features. Expert Systems with Applications, 36(2), 3302–3308. https://doi.org/10.1016/j.eswa.2008.01.005
- Biecek, P. (2018). DALEX: Explainers for complex predictive models. Journal of Machine Learning Research, 19(84), 1–5.
- Biecek, P., & Burzykowski, T. (2019). Explanatory Model Analysis. New York: Chapman and Hall/CRC. https://pbiecek.github.io/ema/
- Bischl, B., Kühn, T., & Szepannek, G. (2014). On class imbalance correction for classification algorithms in credit scoring. In M. Lübbecke, A. Koster, P. Letmathe, R. Madlener, B. Peis, & G. Walther (Eds.), Operations Research Proceedings (pp. 37–43). Springer. https://www.springerprofessional.de/on-class-imbalance-correction-for-classification-algorithms-in-c/7494446
- Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications, 39(3), 3446–3453. https://doi.org/10.1016/j.eswa.2011.09.033
- Bücker, M., van Kampen, M., & Krämer, W. (2013). Reject inference in consumer credit scoring with nonignorable missing data. Journal of Banking & Finance, 37(3), 1040–1045. https://doi.org/10.1016/j.jbankfin.2012.11.002
- Chen, C., Lin, K., Rudin, C., Shaposhnik, Y., Wang, S., & Wang, T. (2018). An interpretable model with globally consistent explanations for credit risk. arXiv, 1811.12615.
- Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system [Paper presentation]. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (pp. 785–794). ACM.
- Cook, D. (2016). Practical machine learning with H2O: Powerful, scalable techniques for deep learning and AI. O’Reilly Media.
- Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
- Crook, J., Edelman, D., & Thomas, L. C. (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research, 183(3), 1447–1465. https://doi.org/10.1016/j.ejor.2006.09.100
- Dash, S., Günlük, O., & Wei, D. (2018). Boolean decision rules via column generation. Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18 (pp. 4660–4670). Curran Associates Inc.
- EU Expert Group on AI. (2019). Ethics guidelines for trustworthy AI. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
- European Banking Authority. (2017). Guidelines on PD estimation, LGD estimation and the treatment of defaulted exposures. https://eba.europa.eu/regulation-and-policy/model-validation/guidelines-on-pd-lgd-estimation-and-treatment-of-defaulted-assets
- European Commission. (2020). On artificial intelligence - A European approach to excellence and trust. https://ec.europa.eu/info/sites/info/files/commission-white-paper-artificial-intelligence-feb2020_en.pdf
- European Union. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN#d1e2838-1-1
- FICO. (2019). Explainable machine learning challenge. https://community.fico.com/s/explainable-machine-learning-challenge
- Financial Stability Board. (2017). Artificial intelligence and machine learning in financial services – market developments and financial stability implications. https://www.fsb.org/wp-content/uploads/P011117.pdf
- Finlay, S. (2012). Credit scoring, response modelling and insurance rating. Palgrave Macmillan.
- Fisher, A., Rudin, C., & Dominici, F. (2018). Model class reliance: Variable importance measures for any machine learning model class, from the ‘Rashomon’ perspective. arXiv, 1801.01489. http://arxiv.org/abs/1801.01489
- Fitzpatrick, T., & Mues, C. (2016). An empirical comparison of classification algorithms for mortgage default prediction: Evidence from a distressed mortgage market. European Journal of Operational Research, 249(2), 427–439. https://doi.org/10.1016/j.ejor.2015.09.014
- Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
- Friedman, J. H. (2000). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189–1232.
- Garzcarek, U., & Steuer, D. (2019). Approaching ethical guidelines for data scientists (pp. 151–169). Springer International Publishing.
- Gill, N., & Hall, P. (2018). An introduction to machine learning interpretability. O’Reilly Media, Inc.
- Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65. https://doi.org/10.1080/10618600.2014.907095
- Gomez, O., Holter, S., Yuan, J., & Bertini, E. (2020). ViCE: Visual counterfactual explanations for machine learning models [Paper presentation]. Proceedings of the 25th International Conference on Intelligent User Interfaces, IUI ’20 (pp. 531–535). ACM.
- Goodman, B., & Flaxman, S. (2017). European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
- Gosiewska, A. (2020). Code - Explainable machine learning in credit scoring. Zenodo. https://doi.org/10.5281/zenodo.4277225
- Gosiewska, A., & Biecek, P. (2019). Do not trust additive explanations. arXiv, 1903.11420.
- Greenwell, B., Boehmke, B., Cunningham, J., & GBM Developers. (2019). gbm: Generalized boosted regression models. R package version 2.1.5.
- Greenwell, B. M. (2017). pdp: An R package for constructing partial dependence plots. The R Journal, 9(1), 421–436. https://doi.org/10.32614/RJ-2017-016
- Hand, D. (2009). Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123. https://doi.org/10.1007/s10994-009-5119-5
- Harrell, F. (2015). Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis. Springer series in statistics. Springer International Publishing.
- Holter, S., Gomez, O., & Bertini, E. (2018). FICO explainable machine learning challenge: Creating visual explanations to black-box machine learning models. Explainable Machine Learning Challenge Documentation.
- Jenkins, S., Nori, H., Koch, P., & Caruana, R. (2019). InterpretML. https://github.com/microsoft/interpret
- Kusner, M., & Loftus, J. (2020). The long road to fairer algorithms. Nature, 578, 34–36.
- Lessmann, S., Baesens, B., Seow, H.-V., & Thomas, L. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136. https://doi.org/10.1016/j.ejor.2015.05.030
- Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
- Louzada, F., Ara, A., & Fernandes, G. (2016). Classification methods applied to credit scoring: A systematic review and overall comparison. Surveys in Operations Research and Management Science, 21(2), 117–134. https://doi.org/10.1016/j.sorms.2016.10.001
- Lübke, K., Gehrke, M., Horst, J., & Szepannek, G. (2020). Why we should teach causal inference: Examples in linear regression with simulated data. Journal of Statistics Education, 28(2), 133–139. https://doi.org/10.1080/10691898.2020.1752859
- Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 30 (pp. 4765–4774). Curran Associates, Inc.
- McGough, M. (2018). How bad is Sacramento’s air, exactly? Google results appear at odds with reality, some say. https://www.sacbee.com/news/california/fires/article216227775.html
- Molnar, C. (2019). Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/
- Molnar, C., Bischl, B., & Casalicchio, G. (2018). iml: An R package for interpretable machine learning. Journal of Open Source Software, 3(26), 786. https://doi.org/10.21105/joss.00786
- Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15. https://doi.org/10.1016/j.dsp.2017.10.011
- O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Publishing Group.
- Płoński, P. (2019). mljar-supervised: Automated machine learning for humans. https://github.com/mljar/mljar-supervised
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August 13–17). “Why should I trust you?”: Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144, San Francisco, CA.
- Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77. https://doi.org/10.1186/1471-2105-12-77
- Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
- Scallan, G. (2011). Class(ic) scorecards – selecting attributes in logistic regression. In Credit scoring and credit control XIII. https://www.scoreplus.com/assets/files/Auto_Classing.pdf
- Schölkopf, B. (2019). Causality for machine learning. arXiv, 1911.10500. https://arxiv.org/abs/1911.10500
- Sokol, K., & Flach, P. (2020). One explanation does not fit all. KI - Künstliche Intelligenz, 34(2), 235–250. https://doi.org/10.1007/s13218-020-00637-y
- Szepannek, G. (2017a). A framework for scorecard modelling using R. In Credit scoring and credit control XV. https://www.crc.business-school.ed.ac.uk/sites/crc/files/2020-11/20-Gero-Szepannek.pdf
- Szepannek, G. (2017b). On the practical relevance of modern machine learning algorithms for credit scoring applications. WIAS Report Series, 29, 88–96.
- Szepannek, G. (2019). How much can we see? A note on quantifying explainability of machine learning models. arXiv, 1910.13376. http://arxiv.org/abs/1910.13376
- Szepannek, G., & Aschenbruck, R. (2019). Predicting ebay prices: Selecting and interpreting machine learning models – Results of the AG DANK 2018 data science competition. Archives of Data Science, Series A, 7(1), 1–17. https://doi.org/10.5445/IR/1000125928
- Thomas, L. C., Crook, J. N., & Edelman, D. B. (2019). Credit scoring and its applications (2nd ed.). SIAM.
- Tobback, E., & Martens, D. (2019). Retail credit scoring using fine-grained payment data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(4), 1227–1246. https://doi.org/10.1111/rssa.12469
- Verbraken, T., Bravo, C., Weber, R., & Baesens, B. (2014). Development and application of consumer credit scoring models using profit-based classification measures. European Journal of Operational Research, 238(2), 505–513. https://doi.org/10.1016/j.ejor.2014.04.001
- Wexler, R. (2017). When a computer program keeps you in jail. https://www.nytimes.com/2017/06/13/opinion/how-computers-are-harming-criminal-justice.html
- Wright, M. N., & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
- Zhao, Q., & Hastie, T. (2021). Causal interpretations of black-box models. Journal of Business & Economic Statistics, 39(1), 272–281. https://doi.org/10.1080/07350015.2019.1624293