Part 1: Consumer Credit Risk Modelling

Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data

Pages 486-501 | Received 01 Dec 2007, Accepted 01 Feb 2009, Published online: 21 Dec 2017
 

Abstract

The 2004 Basel II Accord pointed out the benefits of credit risk management through internal models that use internal data to estimate the risk components: probability of default (PD), loss given default, exposure at default and maturity. Internal data are the primary data source for PD estimates; banks are permitted to use statistical default prediction models to estimate borrowers’ PD, subject to requirements concerning the accuracy, completeness and appropriateness of the data. In practice, however, internal records are usually incomplete or do not contain adequate history to estimate the PD. Missing data are especially critical for low default portfolios, which are characterised by inadequate default records, making it difficult to design statistically significant prediction models. Several methods can be used to deal with missing data, such as list-wise deletion, application-specific list-wise deletion, substitution techniques or imputation models (simple and multiple variants). List-wise deletion is an easy-to-use method widely applied by social scientists, but it loses substantial data and reduces the diversity of information, resulting in bias in the model's parameters, results and inferences. The choice of the best method to solve the missing data problem largely depends on the nature of the missing values: missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). There is, however, a lack of empirical analysis of their effect on credit risk, which limits the validity of the resulting models. In this paper, we analyse the nature and effects of missing data in credit risk modelling (MCAR, MAR and MNAR processes) using a scarce data set on consumer borrowers that includes different percentages and distributions of missing data. The findings are used to analyse the performance of several methods for dealing with missing data, such as list-wise deletion, simple imputation methods, MLE models and advanced multiple imputation (MI) alternatives based on Markov Chain Monte Carlo and re-sampling methods. Results are compared across models in terms of robustness, accuracy and complexity. In particular, MI models are found to provide very valuable solutions for missing data in credit risk.
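
To make the contrast between list-wise deletion and simple imputation concrete, the following is a minimal sketch on a hypothetical toy data set; the records, field layout and function names are illustrative assumptions and are not taken from the paper. It shows how list-wise deletion can discard exactly the scarce default records that a low default portfolio depends on, whereas mean imputation keeps every record at the cost of replacing missing values with a single column average.

```python
from statistics import mean

# Hypothetical toy records: (income, age, default_flag); None marks a missing value.
# These values are illustrative only and do not come from the paper's data set.
records = [
    (52.0, 34, 0),
    (None, 45, 0),
    (38.5, None, 1),
    (61.0, 29, 0),
    (None, None, 1),
    (47.5, 52, 0),
]


def listwise_delete(rows):
    """Keep only rows with no missing value in any field."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_cols))
        for r in rows
    ]


complete_cases = listwise_delete(records)
imputed = mean_impute(records)

# Count how many default records survive each treatment.
defaults_before = sum(r[2] for r in records)
defaults_after_deletion = sum(r[2] for r in complete_cases)

print("records:", len(records), "complete cases:", len(complete_cases))
print("defaults before:", defaults_before, "after deletion:", defaults_after_deletion)
print("first imputed row with a gap:", imputed[1])
```

In this toy example both default records contain a missing value, so list-wise deletion leaves no defaults at all. Mean imputation is only the simplest substitution technique; the multiple imputation models discussed in the abstract are more involved and are not reproduced here.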

Acknowledgements

The author gratefully acknowledges the helpful comments and questions of two anonymous reviewers.

Notes

1 A complete analysis of the causes, prevention and treatment of item non-response can be found in De Leeuw et al (2003).

2 Variables are dichotomized through the substitution of observed values by 1 and missing values by 0.

3 The degree on mean square error will often be more than one standard error and its direction will depend on the application, pattern of missing data and model estimated (Sherman, 2000).

4 Obtained by assuming that the imputed data set is the complete data set and calculating the usual variance estimate.

5 Where the vec(·) operator stacks the unique elements.

6 Information on this data set is available at http://mlearn.ics.uci.edu/databases/credit-screening/. Original data used in this paper can be obtained at http://mlearn.ics.uci.edu/databases/credit-screening/crx.data (information on missing values is included in the ‘crx.data’ file, see ‘?’ symbols).
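
As an illustration of notes 2 and 6, the sketch below reads comma-separated rows in which ‘?’ marks a missing value and builds the 1/0 indicators described in note 2 (1 for observed, 0 for missing). The sample rows, the number of columns and the function names are hypothetical assumptions made for the example; they are not copied from the ‘crx.data’ file.

```python
import csv
import io

# Hypothetical rows in a comma-separated layout; values are illustrative only.
# '?' marks a missing value, as note 6 describes for the original file.
SAMPLE = """b,30.83,0,u,+
a,?,4.46,u,+
?,24.50,0.5,?,-
b,27.83,?,y,-
"""


def parse_rows(text, missing_token="?"):
    """Read comma-separated rows, mapping the missing token to None."""
    reader = csv.reader(io.StringIO(text))
    return [
        [None if cell == missing_token else cell for cell in row]
        for row in reader
        if row
    ]


def response_indicators(rows):
    """Dichotomise as in note 2: 1 for an observed value, 0 for a missing one."""
    return [[0 if cell is None else 1 for cell in row] for row in rows]


def missing_share_by_column(indicators):
    """Fraction of missing entries in each column."""
    n = len(indicators)
    return [1 - sum(col) / n for col in zip(*indicators)]


rows = parse_rows(SAMPLE)
indicators = response_indicators(rows)
shares = missing_share_by_column(indicators)

for row in indicators:
    print(row)
print("missing share per column:", shares)
```

Indicators of this kind are a common first step for inspecting the pattern of missing data before choosing a treatment; how the paper uses them beyond the dichotomisation in note 2 is not stated in this excerpt.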
