Abstract
This article presents several actuarial applications of categorical embedding in the context of non-life insurance risk classification. In non-life insurance, many rating factors are naturally categorical, and these categorical variables often have a large number of levels. The high cardinality of categorical rating variables complicates the implementation of traditional actuarial methods. Categorical embedding, proposed in the machine learning literature for handling categorical variables, has recently received attention in actuarial studies. The method is inspired by neural network language models for learning from text data and maps a categorical variable to a real-valued representation in Euclidean space. Using property insurance claims data, we demonstrate the use of categorical embedding in three applications. The first shows how embeddings are used to construct rating classes and calculate rating relativities for a single insurance risk. The second concerns predictive modeling of multivariate insurance risks and emphasizes the effects of dependence on tail risks. The third focuses on pricing new products, where transfer learning is used to leverage knowledge from existing products.
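As a minimal sketch of the mapping described above (an illustration under stated assumptions, not the authors' implementation), a categorical embedding can be viewed as a lookup table: each level is integer-encoded and then mapped to one row of a matrix with a row of d real numbers per level. The level labels, the dimension `d`, and the random initialization below are hypothetical placeholders; in practice the rows are learned by fitting a neural network to claims data.

```python
import math
import random

# Hypothetical categorical rating variable with a few illustrative levels.
levels = ["A", "B", "C", "D"]

# Step 1: integer-encode each level; the integer is the row index.
index = {level: i for i, level in enumerate(levels)}

# Step 2: an embedding matrix with one d-dimensional row per level.
# Assumption: rows are random placeholders here; in a fitted model they
# would be learned jointly with the rest of the network.
d = 2
rng = random.Random(0)
embedding = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in levels]


def embed(level):
    """Map a categorical level to its real-valued vector in R^d."""
    return embedding[index[level]]


def distance(a, b):
    """Euclidean distance between the embeddings of two levels."""
    return math.dist(embed(a), embed(b))


if __name__ == "__main__":
    for level in levels:
        print(level, embed(level))
    print("distance(A, B) =", distance("A", "B"))
```

Once levels live in Euclidean space, distances between their vectors are well defined, which is one intuition for grouping levels with nearby embeddings into rating classes; how the article actually forms those classes is not shown here.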
Discussions on this article can be submitted until April 1, 2024. The authors reserve the right to reply to any discussion. Please see the Instructions for Authors found online at http://www.tandfonline.com/uaaj for submission instructions.