References
- Alemzadeh, H., Raman, J., Leveson, N., Kalbarczyk, Z., & Iyer, R. K. (2016). Adverse events in robotic surgery: A retrospective study of 14 years of FDA data. PLoS ONE, 11(4), e0151470. https://doi.org/10.1371/journal.pone.0151470
- Ariza-Garzón, M. J., Arroyo, J., Caparrini, A., & Segovia-Vargas, M. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access, 8, 64873–64890. https://doi.org/10.1109/ACCESS.2020.2984412
- Arya, V., Bellamy, R. K. E., Chen, P.-Y., Dhurandhar, A., Hind, M., Hoffman, S. C., Houde, S., Liao, Q. V., Luss, R., Mojsilović, A., Mourad, S., Pedemonte, P., Raghavendra, R., Richards, J., Sattigeri, P., Shanmugam, K., Singh, M., Varshney, K. R., Wei, D., & Zhang, Y. (2019). One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques. arXiv, 1909.03012.
- Azevedo, A., & Santos, M. F. (2008). KDD, SEMMA and CRISP-DM: A parallel overview. In IADIS European Conference on Data Mining (pp. 182–185). https://www.semanticscholar.org/paper/KDD%2C-SEMMA-and-CRISP-DM%3A-a-parallel-overview-Azevedo-Santos/6bc30ac3f23d43ffc2254b0be24ec4217cf8c845
- Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2002). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6), 627–635.
- Banasik, J., & Crook, J. (2007). Reject inference, augmentation and sample selection. European Journal of Operational Research, 183(3), 1582–1594. https://doi.org/10.1016/j.ejor.2006.06.072
- Bellotti, T., & Crook, J. (2009). Support vector machines for credit scoring and discovery of significant features. Expert Systems with Applications, 36(2), 3302–3308. https://doi.org/10.1016/j.eswa.2008.01.005
- Biecek, P. (2018). DALEX: Explainers for complex predictive models. Journal of Machine Learning Research, 19(84), 1–5.
- Biecek, P., & Burzykowski, T. (2019). Explanatory Model Analysis. New York: Chapman and Hall/CRC. https://pbiecek.github.io/ema/
- Bischl, B., Kühn, T., & Szepannek, G. (2014). On class imbalance correction for classification algorithms in credit scoring. In M. Lübbecke, A. Koster, P. Letmathe, R. Madlener, B. Peis, & G. Walther (Eds.), Operations Research Proceedings (pp. 37–43). Springer. https://www.springerprofessional.de/on-class-imbalance-correction-for-classification-algorithms-in-c/7494446
- Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications, 39(3), 3446–3453. https://doi.org/10.1016/j.eswa.2011.09.033
- Bücker, M., van Kampen, M., & Krämer, W. (2013). Reject inference in consumer credit scoring with nonignorable missing data. Journal of Banking & Finance, 37(3), 1040–1045. https://doi.org/10.1016/j.jbankfin.2012.11.002
- Chen, C., Lin, K., Rudin, C., Shaposhnik, Y., Wang, S., & Wang, T. (2018). An interpretable model with globally consistent explanations for credit risk. arXiv, 1811.12615.
- Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system [Paper presentation]. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (pp. 785–794). ACM.
- Cook, D. (2016). Practical machine learning with H2O: Powerful, scalable techniques for deep learning and AI. O’Reilly Media.
- Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
- Crook, J., Edelman, D., & Thomas, L. C. (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research, 183(3), 1447–1465. https://doi.org/10.1016/j.ejor.2006.09.100
- Dash, S., Günlük, O., & Wei, D. (2018). Boolean decision rules via column generation. Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18 (pp. 4660–4670). Curran Associates Inc.
- EU Expert Group on AI. (2019). Ethics guidelines for trustworthy AI. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
- European Banking Authority. (2017). Guidelines on PD estimation, LGD estimation and the treatment of defaulted exposures. https://eba.europa.eu/regulation-and-policy/model-validation/guidelines-on-pd-lgd-estimation-and-treatment-of-defaulted-assets
- European Commission. (2020). On artificial intelligence - A European approach to excellence and trust. https://ec.europa.eu/info/sites/info/files/commission-white-paper-artificial-intelligence-feb2020_en.pdf
- European Union. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN#d1e2838-1-1
- FICO. (2019). Explainable machine learning challenge. https://community.fico.com/s/explainable-machine-learning-challenge
- Financial Stability Board. (2017). Artificial intelligence and machine learning in financial services – market developments and financial stability implications. https://www.fsb.org/wp-content/uploads/P011117.pdf
- Finlay, S. (2012). Credit scoring, response modelling and insurance rating. Palgrave Macmillan.
- Fisher, A., Rudin, C., & Dominici, F. (2018). Model class reliance: Variable importance measures for any machine learning model class, from the ‘Rashomon’ perspective. arXiv, 1801.01489. http://arxiv.org/abs/1801.01489
- Fitzpatrick, T., & Mues, C. (2016). An empirical comparison of classification algorithms for mortgage default prediction: Evidence from a distressed mortgage market. European Journal of Operational Research, 249(2), 427–439. https://doi.org/10.1016/j.ejor.2015.09.014
- Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
- Friedman, J. H. (2000). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189–1232.
- Garzcarek, U., & Steuer, D. (2019). Approaching ethical guidelines for data scientists (pp. 151–169). Springer International Publishing.
- Gill, N., & Hall, P. (2018). An introduction to machine learning interpretability. O’Reilly Media, Inc.
- Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65. https://doi.org/10.1080/10618600.2014.907095
- Gomez, O., Holter, S., Yuan, J., & Bertini, E. (2020). ViCE: Visual counterfactual explanations for machine learning models [Paper presentation]. Proceedings of the 25th International Conference on Intelligent User Interfaces, IUI ’20 (pp. 531–535). ACM.
- Goodman, B., & Flaxman, S. (2017). European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
- Gosiewska, A. (2020). Code - Explainable machine learning in credit scoring. Zenodo. https://doi.org/10.5281/zenodo.4277225
- Gosiewska, A., & Biecek, P. (2019). Do not trust additive explanations. arXiv, 1903.11420.
- Greenwell, B., Boehmke, B., Cunningham, J., & GBM Developers. (2019). gbm: Generalized boosted regression models. R package version 2.1.5.
- Greenwell, B. M. (2017). pdp: An R package for constructing partial dependence plots. The R Journal, 9(1), 421–436. https://doi.org/10.32614/RJ-2017-016
- Hand, D. (2009). Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123. https://doi.org/10.1007/s10994-009-5119-5
- Harrell, F. (2015). Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis. Springer series in statistics. Springer International Publishing.
- Holter, S., Gomez, O., & Bertini, E. (2018). FICO explainable machine learning challenge: Creating visual explanations to black-box machine learning models. Explainable Machine Learning Challenge Documentation.
- Jenkins, S., Nori, H., Koch, P., & Caruana, R. (2019). InterpretML. https://github.com/microsoft/interpret
- Kusner, M., & Loftus, J. (2020). The long road to fairer algorithms. Nature, 578, 34–36.
- Lessmann, S., Baesens, B., Seow, H.-V., & Thomas, L. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136. https://doi.org/10.1016/j.ejor.2015.05.030
- Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
- Louzada, F., Ara, A., & Fernandes, G. (2016). Classification methods applied to credit scoring: A systematic review and overall comparison. Surveys in Operations Research and Management Science, 21(2), 117–134. https://doi.org/10.1016/j.sorms.2016.10.001
- Lübke, K., Gehrke, M., Horst, J., & Szepannek, G. (2020). Why we should teach causal inference: Examples in linear regression with simulated data. Journal of Statistics Education, 28(2), 133–139. https://doi.org/10.1080/10691898.2020.1752859
- Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 30 (pp. 4765–4774). Curran Associates, Inc.
- McGough, M. (2018). How bad is Sacramento’s air, exactly? Google results appear at odds with reality, some say. https://www.sacbee.com/news/california/fires/article216227775.html
- Molnar, C. (2019). Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/
- Molnar, C., Bischl, B., & Casalicchio, G. (2018). iml: An R package for interpretable machine learning. Journal of Open Source Software, 3(26), 786. https://doi.org/10.21105/joss.00786
- Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15. https://doi.org/10.1016/j.dsp.2017.10.011
- O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Publishing Group.
- Płoński, P. (2019). mljar-supervised: Automated machine learning for humans. https://github.com/mljar/mljar-supervised
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August 13–17). “Why should I trust you?”: Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144, San Francisco, CA.
- Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77. https://doi.org/10.1186/1471-2105-12-77
- Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
- Scallan, G. (2011). Class(ic) scorecards – selecting attributes in logistic regression. In Credit scoring and credit control XIII. https://www.scoreplus.com/assets/files/Auto_Classing.pdf
- Schölkopf, B. (2019). Causality for machine learning. arXiv, 1911.10500. https://arxiv.org/abs/1911.10500
- Sokol, K., & Flach, P. (2020). One explanation does not fit all. KI - Künstliche Intelligenz, 34(2), 235–250. https://doi.org/10.1007/s13218-020-00637-y
- Szepannek, G. (2017a). A framework for scorecard modelling using R. In Credit scoring and credit control XV. https://www.crc.business-school.ed.ac.uk/sites/crc/files/2020-11/20-Gero-Szepannek.pdf
- Szepannek, G. (2017b). On the practical relevance of modern machine learning algorithms for credit scoring applications. WIAS Report Series, 29, 88–96.
- Szepannek, G. (2019). How much can we see? A note on quantifying explainability of machine learning models. arXiv, 1910.13376. http://arxiv.org/abs/1910.13376
- Szepannek, G., & Aschenbruck, R. (2019). Predicting ebay prices: Selecting and interpreting machine learning models – Results of the AG DANK 2018 data science competition. Archives of Data Science, Series A, 7(1), 1–17. https://doi.org/10.5445/IR/1000125928
- Thomas, L. C., Crook, J. N., & Edelman, D. B. (2019). Credit scoring and its applications (2nd ed.). SIAM.
- Tobback, E., & Martens, D. (2019). Retail credit scoring using fine-grained payment data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(4), 1227–1246. https://doi.org/10.1111/rssa.12469
- Verbraken, T., Bravo, C., Weber, R., & Baesens, B. (2014). Development and application of consumer credit scoring models using profit-based classification measures. European Journal of Operational Research, 238(2), 505–513. https://doi.org/10.1016/j.ejor.2014.04.001
- Wexler, R. (2017). When a computer program keeps you in jail. https://www.nytimes.com/2017/06/13/opinion/how-computers-are-harming-criminal-justice.html
- Wright, M. N., & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
- Zhao, Q., & Hastie, T. (2021). Causal interpretations of black-box models. Journal of Business & Economic Statistics, 39(1), 272–281. https://doi.org/10.1080/07350015.2019.1624293