Abstract
Categorical variables cannot be handled directly by logistic regression. Rather, they must first be encoded, that is, converted into numerical variables. Numerous category encodings have been proposed and used in logistic regression. However, these encodings have not been studied analytically. In this paper, we study the analytical properties of eight commonly used category encodings in logistic regression, namely one-hot encoding, Weight of Evidence encoding, flag encoding, label encoding, ordinal encoding, count encoding, frequency encoding, and target encoding. Numerical examples are provided to demonstrate our analysis.