Abstract
Reject inference refers to methods for inferring how a rejected credit applicant would have behaved had credit been granted. Credit-quality data on rejected applicants are usually missing not at random (MNAR). To infer credit-quality data that are MNAR, we propose a flexible method that generates the probability of missingness within a model-based bound-and-collapse Bayesian technique. We tested the method's performance against traditional reject-inference methods using real data. The results show that our method improves the classification power of credit-scoring models under MNAR conditions.
Correction
In an earlier version of this article the title was incorrect. The correct title is shown in this final version of the article.
Acknowledgements
Åstebro acknowledges financial support from the Natural Sciences and Engineering Research Council of Canada operating grant # OGP 0183683 and from the HEC Foundation.
Notes
1 Selection models do not impute missing data but instead jointly model the probability of selection and credit quality. See Section 3 for details.
2 For example, under a 1998 EU directive, organizations in countries that do not match the Union's standards are in most cases prohibited from receiving almost all identification and behaviour data about EU constituents (http://www.strategy-business.com/press/16635507/69627). In the United States of America, the Gramm-Leach-Bliley Act for the banking industry contains very important and far-reaching privacy provisions. Legislation being proposed with increasing frequency at both the federal and state levels would require that firms obtain explicit consent from individuals before collecting, using or exchanging information about them. Such ‘opt-in’ rules have already been incorporated into many European data protection laws, and have been adopted in (US) federal regulations and local ordinances (Staten and Cate, 2003, p 749). For instance, California was the first state to limit the use of retail customer matching information, and several other states have followed.
3 An account that is ‘bad’ is delinquent.
4 No theory exists on which model or data to use to estimate missingness probabilities. Candidates for the regression model include the linear, logarithmic, polynomial, power and exponential models. R-squared is used to select the best-fitting model while keeping the model as parsimonious as possible. Alternative models with more parameters tended not to improve results. A linear regression model also performed well. Robustness analyses are available on request. In Section 3.7 we present what happens when missingness is instead estimated using information on the probability of being rejected.
5 Semiparametric single-index estimation for bivariate models is still under development (Marra and Radice, 2011) and hence is not included within the scope of this paper. The maximum score estimation method is not applied because it does not produce an estimate of the probability (Greene, 2006).
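As an illustration of the model-selection step described in note 4, the following is a minimal sketch of how the five candidate functional forms might be compared by R-squared with a preference for fewer parameters. It is not the authors' implementation: the predictor `x`, the response `y`, the quadratic choice for the polynomial, the tolerance `tol`, and the function names are all assumptions made for illustration, and the log-based forms assume strictly positive `x` and `y`.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination on the original scale of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_candidates(x, y):
    """Fit each candidate form by least squares.

    Returns {name: (number_of_parameters, R-squared)}.
    Assumes x > 0 and y > 0 so that the log transforms are defined.
    """
    results = {}

    # Linear: y = a + b * x
    b, a = np.polyfit(x, y, 1)
    results["linear"] = (2, r_squared(y, a + b * x))

    # Logarithmic: y = a + b * ln(x)
    b, a = np.polyfit(np.log(x), y, 1)
    results["logarithmic"] = (2, r_squared(y, a + b * np.log(x)))

    # Polynomial (quadratic is an assumed choice): y = c2*x^2 + c1*x + c0
    coefs = np.polyfit(x, y, 2)
    results["polynomial"] = (3, r_squared(y, np.polyval(coefs, x)))

    # Power: y = a * x^b, fitted as ln(y) = ln(a) + b * ln(x)
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)
    results["power"] = (2, r_squared(y, np.exp(ln_a) * x ** b))

    # Exponential: y = a * exp(b * x), fitted as ln(y) = ln(a) + b * x
    b, ln_a = np.polyfit(x, np.log(y), 1)
    results["exponential"] = (2, r_squared(y, np.exp(ln_a + b * x)))

    return results


def select_model(results, tol=0.01):
    """Pick the most parsimonious model whose R-squared is within tol of the best.

    Ties on parameter count are broken by the higher R-squared.
    """
    best_r2 = max(r2 for _, r2 in results.values())
    eligible = {k: v for k, v in results.items() if best_r2 - v[1] <= tol}
    return min(eligible, key=lambda k: (eligible[k][0], -eligible[k][1]))


if __name__ == "__main__":
    # Hypothetical data: a missingness probability that rises linearly with x.
    x = np.linspace(1.0, 10.0, 50)
    y = 0.10 + 0.05 * x
    fitted = fit_candidates(x, y)
    print(select_model(fitted))
```

The tolerance rule is one simple way to encode "best fit while staying parsimonious": a three-parameter polynomial is chosen only if it beats every two-parameter form by more than `tol` in R-squared.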