Abstract
We propose a new procedure for estimating the loss given default (LGD) distribution. Owing to the complicated shape of the LGD distribution, using a smooth density function as a driver to estimate it may result in a decline in model fit. To overcome this problem, we first apply logistic regression to estimate the LGD cumulative distribution function, and then convert the result into an estimate of the LGD distribution. To implement the newly proposed estimation procedure, we collect a sample of 5269 defaulted debts from Moody’s Default and Recovery Database. A performance study is conducted using 2000 pairs of in-sample and out-of-sample datasets of different sizes, randomly selected from the entire sample. Our results show that the newly proposed procedure performs better and more robustly than its alternatives, in the sense of yielding more accurate in-sample and out-of-sample LGD distribution estimates. It is thus useful for studying the LGD distribution.
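The two-step idea described in the abstract can be sketched as follows. This is a minimal illustration with simulated data, not the authors' implementation: the covariate, the threshold grid, and the gradient-descent logistic fit are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylised defaulted-debt sample: one covariate x and LGD values in [0, 1],
# with point masses at the boundaries after clipping.
n = 1000
x = rng.normal(size=n)
lgd = np.clip(rng.beta(0.6, 0.6, size=n) + 0.1 * x, 0.0, 1.0)

def fit_logistic(X, y, steps=2000, lr=0.1):
    """Plain gradient-ascent logistic regression (illustrative only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)
    return beta

# Step 1: estimate the LGD cumulative distribution function F(w) = P(LGD <= w)
# at a grid of thresholds w, by regressing the indicator 1{LGD <= w} on x.
X = np.column_stack([np.ones(n), x])
grid = np.linspace(0.0, 1.0, 11)
cdf_hat = []
for w in grid:
    beta = fit_logistic(X, (lgd <= w).astype(float))
    # CDF estimate at w for a debt with covariate x = 0
    cdf_hat.append(1.0 / (1.0 + np.exp(-beta[0])))
cdf_hat = np.maximum.accumulate(cdf_hat)  # enforce monotonicity
cdf_hat[-1] = 1.0                         # F(1) = 1 by construction

# Step 2: convert the CDF estimate into the LGD distribution estimate:
# probability masses are the first differences of the CDF over the grid,
# with the mass at 0 given by F(0).
pmf_hat = np.diff(np.concatenate([[0.0], cdf_hat]))

print(pmf_hat.sum())  # the estimated masses sum to 1
```

The conversion in Step 2 is why the procedure can reproduce the point masses at 0 and 1 that a smooth density driver tends to miss.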
Acknowledgements
The authors thank the reviewers for their valuable comments and suggestions, which have greatly improved the presentation of this paper. The brief description of our proposed procedure in the introduction was suggested by a reviewer. This research is supported by the Ministry of Science and Technology, Taiwan, Republic of China.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 There are many other approaches for studying LGD. However, they suffer from various problems and are thus not suitable for estimating the LGD distribution. For example, the LGD distribution estimate produced by the inverse Gaussian regression (Qi and Zhao Citation2011) or the beta regression (Yashkir and Yashkir Citation2013) has zero probability masses at the boundaries 0 and 1. The Gaussian mixture model (Altman and Kalotay Citation2014) concentrates a cluster of transformed LGD values at a large negative or positive value, so it is difficult to model the LGD distribution with a Gaussian mixture without facing distributional degeneracy. The ordered logistic regression (Li et al. Citation2014) and the ordered probit model (Hwang et al. Citation2016) suffer when some partition cells have small or zero sizes, which makes the resulting parameter estimates less precise. Finally, the fractional response regression, regression tree, neural network, support vector machine and ensemble model impose no distributional assumption on the LGD data at all (Bastos Citation2010, Citation2014, Loterman et al. Citation2012, Hartmann-Wendels et al. Citation2014).
2 Other link functions, such as the probit and the complementary log–log link functions, can be used in the same way to model the probability distribution of Z(w), for each w ∊ [0, 1]. The results based on these link functions are likely to lead to similar insights.
3 There are two other approaches that can be used to produce the LGD density estimate for y ∊ (0, 1). First, we replace the polygon with the histogram. Using the performance metrics in Section 3.2, the histogram has performance similar to that of the polygon. Second, we apply the kernel estimation method of Wei and Chu (Citation1994) to the grid-point data, for j = 1, … , q − 1, to produce a smooth version of the density estimate. However, this approach carries a heavy computational burden. For these reasons, we use the polygon to present the LGD density estimate for y ∊ (0, 1).
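The presentational choice in this note (histogram bar heights versus a frequency polygon through the bin midpoints) can be sketched as follows; the sample, the number of categories and the variable names are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
lgd = rng.beta(0.5, 0.5, size=5000)   # stylised LGD values in (0, 1)

q = 10                                 # number of categories (cf. note 5)
edges = np.linspace(0.0, 1.0, q + 1)
counts, _ = np.histogram(lgd, bins=edges)

# Histogram density estimate: constant height on each bin,
# normalised so the area over [0, 1] is one.
width = 1.0 / q
hist_density = counts / (counts.sum() * width)

# Frequency polygon: the same heights, evaluated at the bin midpoints
# and joined linearly, giving a piecewise-linear density presentation.
midpoints = (edges[:-1] + edges[1:]) / 2
polygon = np.column_stack([midpoints, hist_density])

print(np.sum(hist_density * width))   # histogram density integrates to 1
```

Both presentations use the same binned information; the polygon merely replaces the step function with a piecewise-linear curve, which is consistent with the two having similar performance under the metrics of Section 3.2.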
4 Through a straightforward calculation, RMSE²_RWMSD,out = R̄² + s², where R̄ and s² are the average and variance of the given quantities RWMSDout,k, for k = 1, … , m. The same remark also applies to RMSEWMAD,out, RMSERWMSD,in and RMSEWMAD,in. By this result, the metric RMSE combines the average and variance of the given performance measures. Thus, it is useful for measuring the performance of an estimation method over multiple samples.
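The decomposition in this note follows from the identity mean(x²) = mean(x)² + var(x) (with the population variance), which a short numerical check illustrates; the values below are arbitrary, not from the paper's experiments.

```python
import numpy as np

# Arbitrary per-sample performance measures RWMSD_out,k, for k = 1, ..., m
rwmsd = np.array([0.12, 0.08, 0.15, 0.10, 0.09])

rmse = np.sqrt(np.mean(rwmsd ** 2))   # RMSE over the m samples
avg = rwmsd.mean()                    # average of the measures
var = rwmsd.var()                     # population variance (ddof = 0)

# RMSE^2 = average^2 + variance: the metric penalises both a large mean
# error and a large spread of errors across samples.
print(np.isclose(rmse ** 2, avg ** 2 + var))  # True
```

This is why a method can be judged "better and more robust" with a single number: a lower RMSE requires both a small average error and a small variance across the 2000 sampled datasets.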
5 For presenting the LGD frequency distribution, the number of categories q = 10 has been used in Bastos (Citation2010, Citation2014), Qi and Zhao (Citation2011) and Altman and Kalotay (Citation2014), q = 20 in Sigrist and Stahel (Citation2011), Yashkir and Yashkir (Citation2013) and Calabrese (Citation2014) and q = 50 in Oliveira et al. (Citation2015).
6 This recovery rate truncation approach has been used in Chava et al. (Citation2011), Qi and Zhao (Citation2011), Yashkir and Yashkir (Citation2013) and Altman and Kalotay (Citation2014).