Abstract
The loan granting process is based on the probability that an applicant will repay the loan given the applicant's characteristics. This probability, also called the score, is learnt from a dataset from which rejected applicants are excluded. Thus, the population on which the score is used differs from the learning population. Many “reject inference” methods try to exploit the data available from rejected applicants in the learning process. However, most of these methods are empirical and lack a formalization of their assumptions and of their expected theoretical properties. We formalize these hidden assumptions in a general missing data setting for some of the most common reject inference methods. This reveals that the hidden modelling is mostly incomplete, which prevents existing methods from being compared within the general model selection mechanism (except by financing “non-fundable” applicants). We therefore assess the performance of the methods on both simulated data and real data (from CACF, a major European loan issuer). Unsurprisingly, no method seems uniformly dominant. Together, these theoretical and empirical results reinforce the need to use classical reject inference methods with care, and they call for future research on designing model-based reject inference methods (without financing “non-fundable” applicants).
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Using the proposed implementation at: https://stat.ethz.ch/pipermail/r-help/2008-February/153708