ABSTRACT
We examine how Large Language Models (LLMs) can contribute to loan default prediction by extracting information from narrative data. Using a dataset from a Chinese FinTech lending platform, we employ four LLMs to predict the probability of default (PD-LLM) from the narrative data and use the PD-LLM as an additional feature to predict loan defaults. The empirical results show that the narrative data contain additional credit information and can hardly be regarded as ‘cheap talk’. The information extracted via LLMs possesses some capability to predict default loan applications in both in- and out-of-sample analyses. The out-of-sample results indicate that including PD-LLM can significantly improve forecasting performance. By contrast, the rule-based linguistic characteristics and Word-Frequency-based Models hinder out-of-sample forecasting.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/13504851.2023.2275647
Notes
1 We provide a comprehensive literature review to discuss this in the online supplementary materials.
2 The definitions of these variables can be found in the online supplementary material.
3 https://huggingface.co/models. The technical details of all these models can be found in the online supplementary file.
5 To rule out the possibility that the results are driven by the choice of model specification, we also estimate a Probit model; the results are reported in the online supplementary materials.