Teacher's Corner

Linear Logistic Scoring Equations for Latent Class and Latent Profile Models: A Simple Method for Classifying New Cases

Received 05 Mar 2024, Accepted 30 Jun 2024, Published online: 23 Jul 2024

Abstract

Researchers are often interested in using latent class or latent profile parameter estimates to obtain posterior class membership probabilities for observations other than those of the original sample. In this paper, we demonstrate that these probabilities typically take on the form of linear logistic equations with coefficients which are functions of the original model parameters. In other words, the posterior class membership probabilities can be computed with a prediction formula similar to that of a multinomial logistic regression model. We derive the scoring equations for nominal, ordinal, count, and continuous indicators, as well as investigate models with missing values on class indicators, local dependencies, covariates, or multiple latent variables. In addition to the mathematical derivations of the scoring equations, we describe how either exact or approximate scoring equations can be obtained by estimating a multinomial regression model using a weighted data set.

In applications of factor analysis, after selecting and estimating the factor model of interest, one will typically obtain (linear) factor-score equations which can be used to estimate the subjects’ factor scores as a function of the original items included in the model (Bartholomew et al., Citation2011). An important feature of factor-score equations is that these can be used not only for the subjects in the estimation sample, but also for new subjects, that is, for out-of-sample prediction.

When performing a latent class (LC) analysis, after selecting the final model, one may assign the individuals in the estimation sample to LCs using their posterior class membership probabilities. However, it is not well known that these posterior probabilities can be expressed exactly by a set of linear logistic equations, with “regression” weights which are functions of the original LC model parameters. More specifically, a closed-form expression for the posteriors exists, as a function of the LC model parameters, if the responses are modeled using distributions from the exponential family with canonical link functions. Availability of a set of scoring equations makes it straightforward to compute the class membership probabilities for subjects who do not belong to the original sample used to estimate the LC model. In this way, one can realize an important goal of many LC analysis applications, namely obtaining out-of-sample class membership predictions. The main advantage of this approach is that it allows one to predict the class memberships of new subjects without the need to use LC analysis software or to program the formula of the estimated LC model. Note that this is similar to what is done in factor analysis, where factor scores are obtained using a linear factor-score formula, without the need to return to the estimates of the factor covariances, factor loadings, and residual covariances.

As far as we know, LatentGOLD (Vermunt & Magidson, Citation2016; Citation2021) is currently the only software for LC analysis that allows one to obtain these logistic scoring equations, both in tabular form and in the form of SPSS or R syntax. The aim of this paper is to show how these equations are derived. As will be shown, the slopes of the linear logistic scoring equations are obtained easily, but the expression for the intercept terms (constants) may be somewhat more complex. In many situations, the equations for the posteriors will contain only main effects of the response variables. However, as in quadratic discriminant analysis, when a LC model for continuous responses assumes variances to be class specific, quadratic terms also need to be included, and when the LC model contains covariances/associations which are class specific, interactions are also required. The approach can be extended easily to LC models with covariates and multiple latent variables. More complicated are situations where responses contain missing values (in which case the constants need to be adapted to the missing data pattern), where the model contains direct effects of covariates on the responses (in which case the exact logistic form may no longer hold), and where non-canonical link functions are used (in which case there is no longer any direct relation between the LC model parameters and the scoring equation).

Rather than computing the scoring equations from the LC model parameters, one can also obtain these equations by estimating a multinomial logistic regression model using the posteriors as weights, as done in the LatentGOLD Step3-Scoring option (Vermunt & Magidson, Citation2016; Citation2021). This approach has the advantage of increased flexibility in that it is also possible to obtain approximate equations when exact closed-form solutions are not available, or when one prefers a simpler approximate set of scoring equations over more complex exact equations.

Below, we present the scoring equations for models for categorical responses (nominal, ordinal, and counts), models for continuous responses, models with local dependencies, models with covariates, models with missing values on responses, and models with multiple latent variables. We also discuss how the scoring equations can be obtained using a weighted multinomial logistic regression analysis.

1. Latent Class Models for Categorical Responses

Let $Y_j$ denote one of $J$ response variables (or indicators), with $1 \le j \le J$. A particular response on $Y_j$ and the number of categories of the $j$th response variable are referred to as $y_j$ and $R_j$, respectively, with $1 \le y_j \le R_j$. The probability of having a particular set of responses $\mathbf{y}$ is denoted by $P(\mathbf{Y}=\mathbf{y})$. The discrete latent variable is denoted by $X$, a particular latent class by $k$, and the number of classes by $K$.

1.1. Nominal Responses

The standard LC model for nominal responses has the following form (Collins & Lanza, Citation2010; Goodman, Citation1974a; Citation1974b; Hagenaars, Citation1990; McCutcheon, Citation1987): $P(\mathbf{Y}=\mathbf{y})=\sum_{k=1}^{K}P(X=k)\prod_{j=1}^{J}P(Y_j=y_j \mid X=k)$, where $P(X=k)$ is the probability of belonging to class $k$ and $P(Y_j=y_j \mid X=k)$ the conditional probability of giving response $y_j$ on variable $Y_j$ conditional on belonging to class $k$. These probabilities are often parameterized using logistic equations (Formann, Citation1992; Heinen, Citation1996; Magidson & Vermunt, Citation2004); that is, $P(X=k)=\frac{\exp(\gamma_k)}{D}$ and $P(Y_j=y_j \mid X=k)=\frac{\exp(\alpha_{y_j}+\beta_{y_j k})}{E_{jk}}$,

with $D=\sum_{k=1}^{K}\exp(\gamma_k)$ and $E_{jk}=\sum_{y_j=1}^{R_j}\exp(\alpha_{y_j}+\beta_{y_j k})$.

Here, $\gamma_k$ are intercept or constant terms in the regression model for $P(X=k)$, and $\alpha_{y_j}$ and $\beta_{y_j k}$ are intercept and slope parameters in the regression model for $P(Y_j=y_j \mid X=k)$. As always, identifying constraints need to be imposed on the logistic parameters. Typically, they are either restricted to sum to 0 over classes and response categories (referred to as effect coding), or set to 0 for one class and one response category (called dummy coding). The terms $D$ and $E_{jk}$ are normalizing constants.

The posterior probability of belonging to class $k$ conditional on response vector $\mathbf{y}$, denoted by $P(X=k \mid \mathbf{Y}=\mathbf{y})$, can be obtained as follows (Dias & Vermunt, Citation2008; Goodman, Citation1974a; Citation1974b): $P(X=k \mid \mathbf{Y}=\mathbf{y})=\frac{P(X=k)\prod_{j=1}^{J}P(Y_j=y_j \mid X=k)}{\sum_{k'=1}^{K}P(X=k')\prod_{j=1}^{J}P(Y_j=y_j \mid X=k')}$.

Replacing the model probabilities by their logit equations yields: $P(X=k \mid \mathbf{Y}=\mathbf{y})=\frac{\frac{\exp(\gamma_k)}{D}\prod_{j=1}^{J}\frac{\exp(\alpha_{y_j}+\beta_{y_j k})}{E_{jk}}}{\sum_{k'=1}^{K}\frac{\exp(\gamma_{k'})}{D}\prod_{j=1}^{J}\frac{\exp(\alpha_{y_j}+\beta_{y_j k'})}{E_{jk'}}}$.

This equation can be simplified by removing $D$ and $\alpha_{y_j}$, which are redundant because they do not depend on $k$. Moreover, the product over the $J$ responses can be replaced by a sum over the logs of the terms concerned. This yields $P(X=k \mid \mathbf{Y}=\mathbf{y})=\frac{\exp\big(\gamma_k-\sum_{j=1}^{J}\log(E_{jk})+\sum_{j=1}^{J}\beta_{y_j k}\big)}{\sum_{k'=1}^{K}\exp\big(\gamma_{k'}-\sum_{j=1}^{J}\log(E_{jk'})+\sum_{j=1}^{J}\beta_{y_j k'}\big)}=\frac{\exp\big(\gamma_k^{*}+\sum_{j=1}^{J}\beta_{y_j k}\big)}{\sum_{k'=1}^{K}\exp\big(\gamma_{k'}^{*}+\sum_{j=1}^{J}\beta_{y_j k'}\big)}$, where $\gamma_k^{*}=\gamma_k-\sum_{j=1}^{J}\log(E_{jk})$. Though not really necessary, the $\gamma_k^{*}$ parameters may be subjected to the same identifying (effect or dummy coding) constraints as the other parameters.

The above derivation shows that the posterior class membership probabilities can be written as a logistic equation with slopes equal to the LC model logistic regression slopes and with constants equal to the logistic class constants minus the sum of the logs of the normalizing constants. The more difficult part in the computation of these scoring equations is the computation of the constants $\gamma_k^{*}$. But once we have the scoring equations, we can easily compute the class membership probabilities for any response pattern, including response patterns which were not available in the original data used to estimate the LC model of interest.
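To make these computations concrete, the following R sketch builds the scoring equation for a small hypothetical three-class model with two dichotomous indicators and dummy coding (all parameter values below are made up for illustration; they are not estimates from any data set):

## Illustrative R sketch (hypothetical parameters): from LC model parameters
## to the logistic scoring equation, dummy coding (class 1 and category 1 = 0)
K <- 3; J <- 2; Rj <- 2
gamma <- c(0, 0.8, -0.5)                        # class intercepts gamma_k
alpha <- matrix(0, nrow = J, ncol = Rj)         # alpha_{y_j}, category 1 fixed to 0
alpha[, 2] <- c(0.3, -0.2)
beta  <- array(0, dim = c(J, Rj, K))            # beta_{y_j k}, class 1 and category 1 fixed to 0
beta[1, 2, ] <- c(0, -1.5, 2.0)
beta[2, 2, ] <- c(0,  1.2, -0.7)

# Normalizing constants E_jk and their logs (J x K matrix)
logE <- sapply(1:K, function(k)
  sapply(1:J, function(j) log(sum(exp(alpha[j, ] + beta[j, , k])))))

# Constants of the scoring equation: gamma_k^* = gamma_k - sum_j log(E_jk)
gamma_star <- gamma - colSums(logE)

# Posterior class membership probabilities for an arbitrary response pattern y
posterior <- function(y) {
  lin <- gamma_star + sapply(1:K, function(k)
    sum(sapply(1:J, function(j) beta[j, y[j], k])))
  exp(lin - max(lin)) / sum(exp(lin - max(lin)))   # numerically stable softmax
}
posterior(c(2, 1))   # P(X = k | Y1 = 2, Y2 = 1)

The same recipe applies with more indicators or response categories; only the dimensions of the parameter arrays change.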

It should be noted that the $\gamma_k^{*}$ terms are identical to the one-variable parameters for the latent classes in the log-linear formulation of the LC model proposed by Haberman (Citation1979). In this formulation, the joint distribution of $X$ and $\mathbf{Y}$, $P(X=k,\mathbf{Y}=\mathbf{y})$, is modelled as follows: $P(X=k,\mathbf{Y}=\mathbf{y})=\frac{\exp\big(\gamma_k^{*}+\sum_{j=1}^{J}\alpha_{y_j}+\sum_{j=1}^{J}\beta_{y_j k}\big)}{F}$.

The posterior class membership probability is obtained as $P(X=k \mid \mathbf{Y}=\mathbf{y})=P(X=k,\mathbf{Y}=\mathbf{y})/\sum_{k'=1}^{K}P(X=k',\mathbf{Y}=\mathbf{y})$. As can be seen, since $\sum_{j=1}^{J}\alpha_{y_j}$ and $F$ cancel, $P(X=k \mid \mathbf{Y}=\mathbf{y})$ has exactly the form we derived above. Thus, when using Haberman’s log-linear formulation, the constants of the scoring equations are also model parameters. However, an important disadvantage of this formulation is that it is computationally less efficient, since parameter estimation involves processing the cell entries in the joint cross-tabulation of $X$ and all $Y_j$ variables. Therefore, this log-linear approach can be used only when the number of response variables is small.

1.2. Ordinal Responses and Counts

In the LatentGOLD program for LC analysis (Vermunt & Magidson, Citation2016; Citation2021), ordinal response variables can be modeled using an adjacent-category logit model, that is, using a canonical link function (Agresti, Citation2002). More specifically, these are multinomial logit models in which the class-indicator association parameters are restricted as follows: $\beta_{y_j k}=\beta_{jk}\,y_j$; that is, to be nominal-by-linear (Goodman, Citation1979; Heinen, Citation1996). This implies that for ordinal variables, $P(Y_j=y_j \mid X=k)=\frac{\exp(\alpha_{y_j}+\beta_{jk}\,y_j)}{E_{jk}}$.

The same restrictions imposed on the $\beta_{y_j k}$ parameters also apply to the scoring equations; that is, for ordinal variables, we replace the $\beta_{y_j k}$ terms in the scoring equations by $\beta_{jk}\,y_j$. This shows that in the ordinal case, the class-membership logits are linear functions of the item responses. It should be noted that while we assumed that the category scores range from 1 to $R_j$, the adjacent-category logit model allows for any type of scoring. In its more general form, $\beta_{y_j k}=\beta_{jk}\,\nu_{y_j}$, where $\nu_{y_j}$ is the score for category $y_j$.
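Written out relative to a reference class (say class 1), the posterior log-odds are thus linear in the item scores:

$\log\frac{P(X=k \mid \mathbf{Y}=\mathbf{y})}{P(X=1 \mid \mathbf{Y}=\mathbf{y})}=(\gamma_k^{*}-\gamma_1^{*})+\sum_{j=1}^{J}(\beta_{jk}-\beta_{j1})\,y_j.$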

When modeling ordinal variables using other (non-canonical) link functions, such as cumulative logit or cumulative probit link functions, exact expressions for the scoring equations no longer exist. As will be shown below, a possible way out is to estimate the scoring equations treating the response variables as either numeric or nominal predictors of class membership.

Also for Poisson and binomial count variables, the scoring equations contain the term $\beta_{jk}\,y_j$. This can be seen from the fact that the class-specific density of a count variable $Y_j$ takes on the following form: $P(Y_j=y_j \mid X=k)\propto\frac{\exp(\alpha_j y_j+\beta_{jk} y_j)}{E_{jk}}$, where it should be noted that $\alpha_j y_j$ cancels from the scoring equation. The expression for $\log E_{jk}$ changes compared to the nominal and ordinal case. For Poisson counts, $\log E_{jk}=\exp(\alpha_j+\beta_{jk})\,e_j$, where $e_j$ is the exposure; and for binomial counts, $\log E_{jk}=e_j\log\big(1+\exp(\alpha_j+\beta_{jk})\big)$, where $e_j$ represents the number of trials. This shows that the scoring equations should also include terms for the exposure (or number of trials) when this number varies across individuals. When the $e_j$ are fixed, as for nominal and ordinal variables, the $\log E_{jk}$ terms can be included in the constants $\gamma_k^{*}$.

1.3. Local Dependencies

Thus far, we assumed that responses are independent within classes. Now we will look at the scoring equations for LC models with local dependencies (Hagenaars, Citation1988; Magidson & Vermunt, Citation2004; Oberski et al., Citation2013). In the most general case, including a local dependency between (nominal) response variables $Y_j$ and $Y_m$ implies that $P(Y_j=y_j, Y_m=y_m \mid X=k)=\frac{\exp(\alpha_{y_j}+\alpha_{y_m}+\beta_{y_j k}+\beta_{y_m k}+\delta_{y_j y_m}+\lambda_{y_j y_m k})}{E_{jmk}}$, where $E_{jmk}=\sum_{y_j=1}^{R_j}\sum_{y_m=1}^{R_m}\exp(\alpha_{y_j}+\alpha_{y_m}+\beta_{y_j k}+\beta_{y_m k}+\delta_{y_j y_m}+\lambda_{y_j y_m k})$.

This is a model with an association between $Y_j$ and $Y_m$, $\delta_{y_j y_m}$, and an interaction with the latent classes, $\lambda_{y_j y_m k}$. In other words, it represents a model in which the strength of the local dependency is allowed to vary across classes.

As can be seen, one difference with the local independence model is that the normalizing constants entering in the $\gamma_k^{*}$ coefficients of the scoring equations should be computed per set of locally dependent variables. The $\delta_{y_j y_m}$ term cancels from the scoring equations because it does not depend on the classes. In contrast, the term $\lambda_{y_j y_m k}$ becomes part of the scoring equations, which in the case of class-specific local dependencies thus contain not only main effects but also interaction terms. Note that when local dependencies are not class-specific, that is, when $\lambda_{y_j y_m k}=0$, the only remaining difference between local independence and local dependence models concerns the computation of the constants $\gamma_k^{*}$.

The scoring equations in local-dependence LC models for ordinal variables are very similar to those for nominal variables. When the ordinal variables are modelled using an adjacent-category logit specification, $\delta_{y_j y_m}=\delta_{jm}\,y_j y_m$ and $\lambda_{y_j y_m k}=\lambda_{jmk}\,y_j y_m$. The scoring equations will contain the term $\lambda_{jmk}\,y_j y_m$ when the interaction parameters $\lambda_{jmk}$ are not fixed to 0.

1.4. Missing Values

When some indicators have missing values, the LC model for the observed values $\mathbf{Y}_{\mathrm{obs}}$ can be defined as follows: $P(\mathbf{Y}_{\mathrm{obs}}=\mathbf{y}_{\mathrm{obs}})=\sum_{k=1}^{K}P(X=k)\prod_{j=1}^{J}P(Y_j=y_j \mid X=k)^{r_j}$, where $r_j=1$ if the response variable concerned is observed and $r_j=0$ when it has a missing value (Vermunt et al., Citation2008; Vermunt & Magidson, Citation2016). Note that this formulation implies that the product is taken over the observed responses only. Therefore, similar to subjects with complete data, the computation of the posteriors for subjects with missing values involves using only their observed responses. This means that the sum $\sum_{j=1}^{J}\beta_{y_j k}$ should be taken over the observed variables only or, equivalently, that $\beta_{y_j k}$ should be set to 0 for the missing value category. However, the sum $\sum_{j=1}^{J}\log(E_{jk})$ which is subtracted from the constants should also be taken over the observed variables only, implying that each pattern of missing data has its own set of constants $\gamma_k^{*}$. A way to account for this is by using the same $\gamma_k^{*}$ for all observations but adding a term $\log(E_{jk})$ to the scoring equation when variable $j$ has a missing value. In other words, in order to deal with missing data, the scoring equation should be expanded to include the term $\sum_{j=1}^{J}\log(E_{jk})(1-r_j)$. Note that this approach can be used with any missing data pattern occurring among the new subjects for which one wishes to obtain the posteriors.
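Putting these elements together, the scoring equation for an arbitrary missing data pattern can be written compactly as

$P(X=k \mid \mathbf{Y}_{\mathrm{obs}}=\mathbf{y}_{\mathrm{obs}})=\frac{\exp\big(\gamma_k^{*}+\sum_{j=1}^{J}r_j\,\beta_{y_j k}+\sum_{j=1}^{J}(1-r_j)\log(E_{jk})\big)}{\sum_{k'=1}^{K}\exp\big(\gamma_{k'}^{*}+\sum_{j=1}^{J}r_j\,\beta_{y_j k'}+\sum_{j=1}^{J}(1-r_j)\log(E_{jk'})\big)},$

which reduces to the complete-data scoring equation when all $r_j=1$.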

A special type of missing data occurs when the LC model is estimated using $J$ variables, but only the first $J_1$ of these are to be used for classification purposes (where $J=J_1+J_2$); for example, this situation may occur if one wishes to ignore the last $J_2$ variables when calculating the classification probabilities because this information will not be available when performing out-of-sample predictions. In this case, the posteriors are obtained as follows: $P(X=k \mid \mathbf{Y}_1=\mathbf{y}_1)=\frac{P(X=k)\prod_{j=1}^{J_1}P(Y_j=y_j \mid X=k)}{\sum_{k'=1}^{K}P(X=k')\prod_{j=1}^{J_1}P(Y_j=y_j \mid X=k')}$.

As can be seen, only the slope parameters and the normalizing constants of the first $J_1$ response variables will enter into the scoring equations.

1.5. An Example with Five Dichotomous Indicators

Table 1 provides an example illustrating the computation of the scoring equations for an application with five dichotomous response variables. It concerns the model with three latent classes estimated for the LatentGOLD “political.sav” demo data set. The upper part of Table 1 gives the estimates of the model parameters $\gamma_k$, $\alpha_{y_j}$, and $\beta_{y_j k}$ using dummy coding, with the parameters for the first class and the first item category fixed to 0. The lower part gives the values of $\gamma_k^{*}$ and $\log(E_{jk})$, where for consistency we also use dummy coding for the $\log(E_{jk})$ terms. To obtain the $\log(E_{1k})$ values for the first item, we first compute $\exp(\alpha_{y_1}+\beta_{y_1 k})$, which for all three classes equals 1.0000 for $y_1=1$, and 2.6533, 0.4452, and 1.4312 for $y_1=2$. Next, we sum the obtained values across the two item categories and take the log, yielding 1.2956, 0.3682, and 0.8884 for the three classes. Because of the dummy coding, we subtract the value of the first class, yielding the reported $\log(E_{1k})$ values 0.0000, −0.9275, and −0.4073. The $\log(E_{jk})$ values of all items are subtracted from the $\gamma_k$ values to get the intercepts $\gamma_k^{*}$ of the scoring equations, and the slopes $\beta_{y_j k}$ can be used without any modification in the scoring equations. In the case of a missing value, the slope parameters for the item concerned equal $\log(E_{jk})$. Appendix A shows the R code generated for this application by LatentGOLD, which can be used for classifying new observations.

Table 1. Latent class model parameters and scoring equation parameters for the political.sav data example.
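The $\log(E_{1k})$ computation described above can be checked with a few lines of R, using the quoted $\exp(\alpha_{y_1}+\beta_{y_1 k})$ values (because these inputs are rounded to four decimals, the last digit may differ slightly from the table):

## Check of the log(E_1k) values for the first item (political.sav example)
expab <- rbind(c(1.0000, 1.0000, 1.0000),   # exp(alpha_{y1} + beta_{y1 k}) for y1 = 1
               c(2.6533, 0.4452, 1.4312))   # exp(alpha_{y1} + beta_{y1 k}) for y1 = 2
logE1 <- log(colSums(expab))                # approx. 1.2956 0.3682 0.8884
logE1 - logE1[1]                            # dummy coded: approx. 0 -0.927 -0.407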

2. Other Types of Latent Class and Mixture Models

2.1. Continuous Responses

Now, let us turn to LC or mixture models for continuous response variables (McLachlan & Peel, Citation2000), also referred to as latent profile models. In a local independence model with normal within-class distributions with possibly unequal variances, the response distributions have the following form: $P(Y_j=y_j \mid X=k)\propto\exp\Big(-\tfrac{1}{2}\log\sigma_{jk}^{2}-\tfrac{1}{2}\frac{\mu_{jk}^{2}}{\sigma_{jk}^{2}}+\frac{\mu_{jk}}{\sigma_{jk}^{2}}y_j-\tfrac{1}{2}\frac{y_j^{2}}{\sigma_{jk}^{2}}\Big)$, where $\mu_{jk}$ and $\sigma_{jk}^{2}$ denote the mean and variance of $Y_j$ in latent class $k$. It can be seen that in the construction of the logistic scoring equations, the terms $-\tfrac{1}{2}\log\sigma_{jk}^{2}$ and $-\tfrac{1}{2}\mu_{jk}^{2}/\sigma_{jk}^{2}$, which do not contain the response, become part of the constants. Moreover, the equations will contain the linear and quadratic terms $\frac{\mu_{jk}}{\sigma_{jk}^{2}}y_j$ and $-\tfrac{1}{2\sigma_{jk}^{2}}y_j^{2}$.

When variances are assumed to be equal across classes, the first and last term of the above univariate normal distribution become $-\tfrac{1}{2}\log\sigma_{j}^{2}$ and $-\tfrac{1}{2}y_j^{2}/\sigma_{j}^{2}$, respectively, implying that these cancel from the scoring equations because they do not depend on the class. This yields a set of scoring equations similar to those obtained in linear discriminant analysis (Hastie et al., Citation2008).

In the more general case of multivariate normal responses with unrestricted covariance matrices $\Sigma_k$, the LC model becomes $P(\mathbf{Y}=\mathbf{y})=\sum_{k=1}^{K}P(X=k)P(\mathbf{Y}=\mathbf{y} \mid X=k)$, with $P(\mathbf{Y}=\mathbf{y} \mid X=k)\propto\exp\Big(-\tfrac{1}{2}\log|\Sigma_k|-\tfrac{1}{2}\boldsymbol{\mu}_k'\Sigma_k^{-1}\boldsymbol{\mu}_k+\boldsymbol{\mu}_k'\Sigma_k^{-1}\mathbf{y}-\tfrac{1}{2}\mathbf{y}'\Sigma_k^{-1}\mathbf{y}\Big)$.

As can be seen, the scoring equations now contain not only linear and quadratic terms, but also interaction terms. More specifically, denoting an entry of $\Sigma_k^{-1}$ by $a_{jmk}$, the weights for $y_j$, $y_j^{2}$, and $y_j y_m$ are $\sum_{m=1}^{J}\mu_{mk}a_{jmk}$, $-\tfrac{1}{2}a_{jjk}$, and $-a_{jmk}$, respectively. The first two terms of the multivariate normal density, which do not depend on the responses, become part of the constants. When variances and covariances are equal across classes, we again have equations with main effects only, as in linear discriminant analysis.
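As an illustration, the following R sketch computes these weights from hypothetical class-specific means and covariance matrices for two indicators and two classes (all numbers are made up for illustration):

## Illustrative R sketch (hypothetical means and covariances): weights of the
## scoring equation for a latent profile model with class-specific covariances
mu    <- list(c(1.0, 2.0), c(3.0, 1.0))                   # mu_k per class
Sigma <- list(matrix(c(1.0, 0.3, 0.3, 1.5), 2, 2),
              matrix(c(2.0, -0.4, -0.4, 1.0), 2, 2))      # Sigma_k per class

weights_class <- function(k) {
  A <- solve(Sigma[[k]])                                  # Sigma_k^{-1}, entries a_{jmk}
  list(constant    = as.numeric(-0.5 * log(det(Sigma[[k]])) -
                                 0.5 * t(mu[[k]]) %*% A %*% mu[[k]]),  # absorbed into gamma_k^*
       linear      = as.vector(A %*% mu[[k]]),            # weights of y_j
       quadratic   = -0.5 * diag(A),                      # weights of y_j^2
       interaction = -A[1, 2])                            # weight of y_1 * y_2
}
weights_class(1); weights_class(2)

Note that the class intercept $\gamma_k$ still has to be added to the constant part when forming $\gamma_k^{*}$.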

Various kinds of restricted mixtures of multivariate normal distributions have been proposed in which constraints are imposed on the class-specific means and/or covariances. Examples include mixture factor models (McLachlan & Peel, Citation2000; Yung, Citation1997), mixture structural equation models (Dolan & Van der Maas, Citation1997), mixture models with constrained eigenvalue decompositions of Σk (Banfield & Raftery, Citation1993), and mixture growth models (Muthén, Citation2004). For these models, the same scoring equations can be used as when means and covariances are unrestricted.

2.2. An Example with Three Continuous Indicators

Table 2 reports the model parameters and the scoring equations for a LC model with three continuous indicators (Glucose, Insulin, and SSPG) from the LatentGOLD “diabetes.dat” demo data set. It is a three-class model with a free residual covariance between the first two class indicators and with class-specific residual (co)variances. The scoring equation for this model contains not only linear terms, but also quadratic terms as well as the interaction terms between $y_1$ and $y_2$.

Table 2. Latent class model parameters and scoring equation parameters for the diabetes.dat data example.

2.3. Covariates

When covariates are included in the model, the latent class probabilities are typically modeled as a logistic function of the covariates (Bandeen-Roche et al., Citation1997; Dayton & Macready, Citation1988; Yamaguchi, Citation2000). That is, $P(X=k \mid \mathbf{z})=\frac{\exp\big(\gamma_{0k}+\sum_{p=1}^{P}\gamma_{pk}z_p\big)}{\sum_{k'=1}^{K}\exp\big(\gamma_{0k'}+\sum_{p=1}^{P}\gamma_{pk'}z_p\big)}$.

Here, $\mathbf{z}$ denotes the vector of covariates, and $\gamma_{0k}$ and $\gamma_{pk}$ represent the constants and the regression parameters for covariate $z_p$.

Since the denominator does not depend on the class, it cancels from the formula for the posterior class membership probability and thus also from the scoring equations. Assuming the response variables are nominal, the posterior probability of class membership given the responses and covariates becomes: $P(X=k \mid \mathbf{Y}=\mathbf{y},\mathbf{z})=\frac{\exp\big(\gamma_{0k}^{*}+\sum_{j=1}^{J}\beta_{y_j k}+\sum_{p=1}^{P}\gamma_{pk}z_p\big)}{\sum_{k'=1}^{K}\exp\big(\gamma_{0k'}^{*}+\sum_{j=1}^{J}\beta_{y_j k'}+\sum_{p=1}^{P}\gamma_{pk'}z_p\big)}$.

That is, the covariate terms can simply be added to the scoring equations.

Covariates may also have direct effects on the indicators. Let us assume we have a single covariate $z$ which has a direct effect on the categorical response variable $Y_j$; that is, $P(Y_j=y_j \mid X=k,z)=\frac{\exp(\alpha_{y_j}+\beta_{y_j k}+\delta_{y_j}z)}{E_{jk|z}}$, where $E_{jk|z}=\sum_{y_j=1}^{R_j}\exp(\alpha_{y_j}+\beta_{y_j k}+\delta_{y_j}z)$.

As can be seen, in this model, the normalizing constants depend on the covariate value, meaning that we no longer have a single $\log E_{jk}$ which can be subtracted from $\gamma_k$. Because the $\log E_{jk|z}$ are not linear functions of the covariate values, they cannot be absorbed into the linear term for the covariate concerned. In other words, the exact linear logistic representation of the posterior probabilities breaks down in this situation, though, as discussed below and in Appendix B, it may still be used as an approximation. An exception is the situation in which the covariate is a nominal or dichotomous variable, in which case exact scoring equations can still be obtained by subtracting $\log E_{jk|z}$ from the $\gamma_{pk}$ terms of the covariate concerned.

Note that when the direct effect of a covariate is allowed to be class specific, or equivalently, when an interaction term is included, the indicator-covariate interaction should also be added to the scoring equation. Again, the scoring equations will be exact only when the covariate concerned is nominal or dichotomous. For example, this is the specification used in multiple-group LC models in which response probabilities may be allowed to differ across subgroups for one or more indicators (Clogg & Goodman, Citation1984; Eid et al., Citation2003; Kankaras et al., Citation2010).

2.4. Multiple Latent Variables

Suppose the LC model contains two latent variables $X_1$ and $X_2$ instead of one, so that we have a LC Factor or Discrete Factor model. Such a model has the following form (Goodman, Citation1974b; Hagenaars, Citation1990; Magidson & Vermunt, Citation2001; Vermunt & Magidson, Citation2005): $P(\mathbf{Y}=\mathbf{y})=\sum_{k_1=1}^{K_1}\sum_{k_2=1}^{K_2}P(X_1=k_1,X_2=k_2)\prod_{j=1}^{J}P(Y_j=y_j \mid X_1=k_1,X_2=k_2)$.

As in the single latent variable case, $P(X_1=k_1,X_2=k_2)$ and $P(Y_j=y_j \mid X_1=k_1,X_2=k_2)$ can be modelled using logistic regression models (Magidson & Vermunt, Citation2001). For example, $P(X_1=k_1,X_2=k_2)=\frac{\exp\big(\gamma_{k_1}^{1}+\gamma_{k_2}^{2}+\gamma_{k_1 k_2}^{12}\big)}{D}$ and $P(Y_j=y_j \mid X_1=k_1,X_2=k_2)=\frac{\exp\big(\alpha_{y_j}+\beta_{y_j k_1}^{1}+\beta_{y_j k_2}^{2}\big)}{E_{j k_1 k_2}}$.

Also in this case, the posterior probabilities can be written as functions of the LC model parameters; that is, $P(X_1=k_1,X_2=k_2 \mid \mathbf{Y}=\mathbf{y})=\frac{\exp\big(\gamma_{k_1 k_2}^{*}+\sum_{j=1}^{J}\beta_{y_j k_1}^{1}+\sum_{j=1}^{J}\beta_{y_j k_2}^{2}\big)}{\sum_{k_1'=1}^{K_1}\sum_{k_2'=1}^{K_2}\exp\big(\gamma_{k_1' k_2'}^{*}+\sum_{j=1}^{J}\beta_{y_j k_1'}^{1}+\sum_{j=1}^{J}\beta_{y_j k_2'}^{2}\big)}$, where the $\gamma_{k_1 k_2}^{*}$ contain the $\gamma$ terms and the normalizing constants $E_{j k_1 k_2}$.

Note that the above equation for two latent variables can easily be generalized to an arbitrary number of $S$ latent variables. The logistic scoring equation then becomes: $P(X_1=k_1,\ldots,X_S=k_S \mid \mathbf{Y}=\mathbf{y})=\frac{\exp\big(\gamma_{k_1\cdots k_S}^{*}+\sum_{s=1}^{S}\sum_{j=1}^{J}\beta_{y_j k_s}^{s}\big)}{\sum_{k_1'=1}^{K_1}\cdots\sum_{k_S'=1}^{K_S}\exp\big(\gamma_{k_1'\cdots k_S'}^{*}+\sum_{s=1}^{S}\sum_{j=1}^{J}\beta_{y_j k_s'}^{s}\big)}$.

It should be noted that the marginal posterior probabilities $P(X_s=k_s \mid \mathbf{Y}=\mathbf{y})$, which are obtained by collapsing over the other latent variables, cannot be written as logistic functions. However, logistic approximations of the marginal posteriors may be precise enough in most applications. Below, we discuss how such approximations can be obtained.

3. Estimating the Scoring Equations Using Logistic Regression Analysis

Rather than computing the scoring equations from the parameters of the LC model, it is also possible to obtain these equations posthoc using a standard routine for multinomial logistic regression analysis. This involves the following three steps:

  1. After selection of the final LC model, save the posterior class membership probabilities to an output file. This is a feature available in all software packages for LC analysis.

  2. Create an expanded data set with K records per subject, which contains a column with the class number taking on values from 1 to K, a column with the posterior probability for the person and class concerned, and columns for the response variables and covariates used in the LC model, the latter columns containing the same values repeated in each of the K records for each subject.

  3. Estimate a logistic regression model in which the posteriors are used as weights. The class number is the dependent variable, and the responses and covariates are the predictors.

Depending on the situation, in the third step the responses and covariates are modeled as either nominal or numeric predictors, quadratic and/or interaction effects are added, and missing value dummies are included. For count variables, one should include the exposures (or total number of trials) as additional numeric predictors when these differ across individuals. Steps 2 and 3 are automated in the LatentGOLD program (Vermunt & Magidson, Citation2016; Citation2021), and are called Step3-Scoring.
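A minimal sketch of Steps 2 and 3 in R, assuming a data frame post with one row per subject, indicator columns y1 to y3, and columns post1 to post3 holding the saved posteriors (the object and column names are hypothetical, and nnet::multinom is just one of several routines that accept case weights):

## Posthoc estimation of the scoring equations via weighted multinomial regression
library(nnet)   # multinom() accepts case weights

K <- 3
# Step 2: expand to K records per subject, with the posterior as a weight
long <- do.call(rbind, lapply(1:K, function(k)
  data.frame(class = k, w = post[[paste0("post", k)]], post[, c("y1", "y2", "y3")])))
long$class <- factor(long$class)

# Step 3: weighted multinomial logistic regression of class on the indicators
fit <- multinom(class ~ y1 + y2 + y3, data = long, weights = w, trace = FALSE)
coef(fit)   # intercepts and slopes of the (exact or approximate) scoring equations

With the first class as the reference level, the rows of coef(fit) correspond to classes 2 to $K$, in line with the dummy coding used above.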

This approach can be used not only for the posthoc computation of the exact scoring equations, but also for obtaining approximate scoring equations. This is useful when an exact form does not exist, such as when direct effects of numeric covariates on indicators were included in the LC model or when non-canonical link functions were used for ordinal variables, as well as in the situation where one prefers a set of simplified equations, say without quadratic or interaction terms, that are almost as good as the exact ones. An example of the latter can be seen in Table 3, which reports the approximate scoring equations for the diabetes.dat example presented above, but leaving out the quadratic terms of $y_1$ and $y_2$ and their interaction terms. The approximate equations predict the class memberships almost as well as the exact equations; that is, the entropy $R^2$ equals 0.817, while its original value equals 0.833.

Table 3. Approximate scoring equation parameters for the diabetes.dat data example.

When ordinal variables are modeled using non-canonical link functions, such as cumulative logit or probit models, we have two options. Option 1 is to compute the exact scoring equations by treating the response variables as nominal predictors in the posthoc logistic regression analysis; that is, by making use of the fact that the estimated $P(Y_j=y_j \mid X=k)$ based on an ordinal model can be reproduced perfectly by an unrestricted multinomial model. Option 2 is to estimate the scoring equations using the response variables as numeric predictors, which in fact implies that the estimated $P(Y_j=y_j \mid X=k)$ from the original LC model are approximated by an adjacent-category logit model.

As shown above, in LC models with multiple latent variables, an exact set of logistic scoring equations exists for the joint class membership probabilities, but not for the marginal class membership probabilities. The posthoc estimation method can also be used to obtain approximate scoring equations based on the marginal posteriors. Applying these equations will be simpler than first computing the joint and subsequently collapsing over the other latent variables, especially with models containing more than two latent variables. The quality of the resulting approximation can be assessed by a goodness-of-fit measure.

4. Discussion

As in continuous latent variables models, in LC models it is important to have a simple scoring rule for predicting a person’s value on the latent variable. In this paper, we showed that for LC models this scoring rule has the form of a linear logistic equation, with weights which are simple functions of the original LC model parameters. We derived the exact scoring equations for nominal, ordinal, count, and continuous response variables, for local independence and local dependence models, for models with covariates, for models with multiple latent variables, and for models with missing values on some of the indicators. Moreover, we discussed several situations in which exact scoring equations may not exist, such as LC models with direct effects of covariates on the indicators and LC models in which the conditional response distributions are restricted using regression models based on non-canonical link functions.

We also explained how to compute exact or approximate scoring equations with the saved posterior probabilities from any LC analysis program. This can be achieved with standard routines for logistic regression analysis. In practice, this may be much easier than computing the scoring equations from the LC model parameters, where the constants from the scoring equations may be somewhat more tedious to obtain.

While not discussed explicitly, the computation of the scoring equations proceeds in exactly the same manner in LC models for mixed responses; that is, in LC models for combinations of nominal, ordinal, count, and continuous indicators (Hennig & Liao, Citation2013; Hunt & Jorgensen, Citation1999; Vermunt & Magidson, Citation2002). The only thing that needs to be done in the computation of the scoring equations is to collect the terms for the different indicators, irrespective of their scale types. When using the posthoc method based on a logistic regression analysis, things are even easier. Nominal indicators are used as nominal predictors, and ordinal, count and continuous indicators as numeric predictors. Depending on the situation, quadratic and/or interaction terms may also need to be included.

The scoring equations discussed in this article can be used to obtain point estimates of the posterior probabilities, not only for subjects in the original sample, but also for new subjects. However, an issue not dealt with in this paper is the uncertainty about these estimates. Since the “regression” weights of the scoring equations are sample estimates, it would be better to take this sampling variability into account when deriving a prediction. Note that the weights are functions of the original model parameters, for which we have the estimated asymptotic variance-covariance matrix. A possible approach to obtain the covariance matrix of the weights involves sampling, say, 100 parameter sets from their estimated multivariate normal distribution and computing the corresponding 100 sets of weights. Other options to explore are the delta method and bootstrapping (Dias & Vermunt, Citation2008). Our future research will focus on this important topic.

Disclosure Statement

Jeroen K. Vermunt and Jay Magidson are co-developers of the LatentGOLD software.

References

  • Agresti, A. (2002). Categorical data analysis (2nd ed.). Wiley.
  • Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., & Rathouz, P. J. (1997). Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association, 92, 1375–1386. https://doi.org/10.1080/01621459.1997.10473658
  • Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821. https://doi.org/10.2307/2532201
  • Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis. Arnold.
  • Clogg, C. C., & Goodman, L. A. (1984). Latent structure analysis of a set of multidimensional contingency tables. Journal of the American Statistical Association, 79, 762–771. https://doi.org/10.1080/01621459.1984.10477093
  • Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. Wiley.
  • Dayton, C. M., & Macready, G. B. (1988). Concomitant-variable latent-class models. Journal of the American Statistical Association, 83, 173–178. https://doi.org/10.1080/01621459.1988.10478584
  • Dias, J. G., & Vermunt, J. K. (2008). A bootstrap-based aggregate classifier for model-based clustering. Computational Statistics, 23, 643–659. https://doi.org/10.1007/s00180-007-0103-7
  • Dolan, C. V., & Van der Maas, H. L. J. (1997). Fitting multivariate normal finite mixtures subject to structural equation modeling. Psychometrika, 63, 227–253. https://doi.org/10.1007/BF02294853
  • Eid, M., Langeheine, R., & Diener, E. (2003). Comparing typological structures across cultures by multigroup latent class analysis. A primer. Journal of Cross-Cultural Psychology, 34, 195–210. https://doi.org/10.1177/0022022102250427
  • Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476–486. https://doi.org/10.1080/01621459.1992.10475229
  • Goodman, L. A. (1974a). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231. https://doi.org/10.2307/2334349
  • Goodman, L. A. (1974b). The analysis of systems of qualitative variables when some of the variables are unobservable: Part I - A modified latent structure approach. American Journal of Sociology, 79, 1179–1259. https://doi.org/10.1086/225676
  • Goodman, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association, 74, 537–552. https://doi.org/10.1080/01621459.1979.10481650
  • Haberman, S. J. (1979). Analysis of qualitative data, Vol 2, New developments. Academic Press.
  • Hagenaars, J. A. (1990). Categorical longitudinal data - loglinear analysis of panel, trend and cohort data. Sage.
  • Hagenaars, J. A. P. (1988). Latent structure models with direct effects between indicators local dependence models. Sociological Methods & Research, 16, 379–405. https://doi.org/10.1177/0049124188016003002
  • Hastie, T., Tibshirani, R., & Friedman, J. (2008). The elements of statistical learning. Springer.
  • Heinen, T. (1996). Latent class and discrete latent trait models: Similarities and differences. Sage.
  • Hennig, C., & Liao, T. F. (2013). How to find an appropriate clustering for mixed type variables with application to socioeconomic stratification (with discussion). Journal of the Royal Statistical Society Series C: Applied Statistics, 62, 309–369. https://doi.org/10.1111/j.1467-9876.2012.01066.x
  • Hunt, L., & Jorgensen, M. (1999). Mixture model clustering using the MULTIMIX program. Australian and New Zealand Journal of Statistics, 41, 153–172.
  • Kankaras, M., Moors, G., & Vermunt, J. K. (2010). Testing for measurement invariance with latent class analysis. In E. Davidov, P. Schmidt, and J. Billiet (Eds.), Cross-cultural analysis: Methods and applications (pp. 359–384). Routledge.
  • Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models, bi-plots and related graphical displays. Sociological Methodology, 31, 223–264. https://doi.org/10.1111/0081-1750.00096
  • Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 175–198). Sage Publications.
  • McCutcheon, A. L. (1987). Latent class analysis. Sage.
  • McLachlan, G. J., & Peel, D. (2000). Finite mixture models. John Wiley & Sons.
  • Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 345–368). Sage.
  • Oberski, D. L., van Kollenburg, G. H., & Vermunt, J. K. (2013). A Monte Carlo evaluation of three methods to detect local dependence in binary data latent class models. Advances in Data Analysis and Classification, 7, 267–279. https://doi.org/10.1007/s11634-013-0146-2
  • Vermunt, J. K., & Magidson, J. (2002). Latent class cluster analysis. In J. Hagenaars & A. McCutcheon (Eds.), Applied latent class analysis (pp. 89–106). Cambridge University Press.
  • Vermunt, J. K., & Magidson, J. (2005). Factor analysis with categorical indicators: A comparison between traditional and latent class approaches. In A. Van der Ark, M.A. Croon & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences (pp. 41–62). Erlbaum.
  • Vermunt, J. K., & Magidson, J. (2016). Technical Guide for LatentGOLD 5.1: Basic, Advanced, and Syntax. Statistical Innovations Inc.
  • Vermunt, J. K., & Magidson, J. (2021). Upgrade Manual for LatentGOLD Basic, Advanced/Syntax, and Choice Version 6.0. Statistical Innovations Inc.
  • Vermunt, J. K., Van Ginkel, J. R., Van der Ark, L. A., & Sijtsma, K. (2008). Multiple imputation of categorical data using latent class analysis. Sociological Methodology, 38, 369–397. https://doi.org/10.1111/j.1467-9531.2008.00202.x
  • Yamaguchi, K. (2000). Multinomial logit latent-class regression models: An analysis of the predictors of gender-role attitudes among Japanese women. American Journal of Sociology, 105, 1702–1740. https://doi.org/10.1086/210470
  • Yung, Y. F. (1997). Finite mixtures in confirmatory factor-analysis models. Psychometrika, 62, 297–330. https://doi.org/10.1007/BF02294554

Appendix A:

R Code Generated by LatentGOLD for the First Example Application

With the LatentGOLD output options “ScoringEquations” and “WriteRsyntax=<filename>”, one can request an R syntax file that can be used to classify new observations. The variable names in the “political.sav” data file are sys_resp, ideo_lev, rep_pot, prot_app, and conv_par. The lg_scoring function consists of three parts:

  1. First, it creates the variables to be used as “predictors” in the scoring equations. For ordinal and continuous indicators, these are copies of the variables in the data set, and for categorical variables, these are dummies for the response categories. In addition, dummies are created for missing values.

  2. Then, it computes the class-specific linear terms using the variables created in part 1 and the scoring equations’ parameters.

  3. Subsequently, the linear terms are exponentiated and transformed to posterior probabilities.

The function returns the posterior probabilities and the modal class. Below, you find the R code, which ends with example code calling the lg_scoring function to add classification information to a data set.

## Scoring function to be called per record

lg_scoring<-function(dat) {

# Part 1: Create variables to be used as predictors in scoring

# equations

if(is.na(dat$sys_resp)) {

 sys_resp_lg_1<-0;sys_resp_lg_2<-0;sys_resp_lg_m<-1

}

else {

 if(dat$sys_resp==1) {

  sys_resp_lg_1<-1;sys_resp_lg_2<-0;sys_resp_lg_m<-0

 }

 else if(dat$sys_resp==2) {

  sys_resp_lg_1<-0;sys_resp_lg_2<-1;sys_resp_lg_m<-0

 }

 else {

  sys_resp_lg_1<-0;sys_resp_lg_2<-0;sys_resp_lg_m<-1

 }

}

# The same is done for the other 4 indicators 

# Part 2: Compute the class-specific linear terms 

Cluster_lg_1<-(0)+

  (0)*sys_resp_lg_1+(0)*sys_resp_lg_2+

  (0)*ideo_lev_lg_1+(0)*ideo_lev_lg_2+

  (0)*rep_pot_lg_1+(0)*rep_pot_lg_2+

  (0)*prot_app_lg_1+(0)*prot_app_lg_2+

  (0)*conv_par_lg_1+(0)*conv_par_lg_2+

  (0)*sys_resp_lg_m+(0)*ideo_lev_lg_m+

  (0)*rep_pot_lg_m+(0)*prot_app_lg_m+

  (0)*conv_par_lg_m

Cluster_lg_2<-(3.4185551)+

  (0)*sys_resp_lg_1+(-1.7853117)*sys_resp_lg_2+

  (0)*ideo_lev_lg_1+(-3.0502076)*ideo_lev_lg_2+

  (0)*rep_pot_lg_1+(0.56595846)*rep_pot_lg_2+

  (0)*prot_app_lg_1+ (-0.74630356)*prot_app_lg_2+

  (0)*conv_par_lg_1+(-3.039846)*conv_par_lg_2+

  (-0.9274766)*sys_resp_lg_m+(-0.49927555)*ideo_lev_lg_m+

  (0.11060958)*rep_pot_lg_m+ (-0.34180273)*prot_app_lg_m+

  (-1.8328984)*conv_par_lg_m

Cluster_lg_3<-(-3.6424675)+

  (0)*sys_resp_lg_1+ (-0.61732246)*sys_resp_lg_2+

  (0)*ideo_lev_lg_1+ (-0.23276128)*ideo_lev_lg_2+

  (0)*rep_pot_lg_1+ (3.6818864)*rep_pot_lg_2+

  (0)*prot_app_lg_1+ (3.0608866)*prot_app_lg_2+

  (0)*conv_par_lg_1+ (-1.0034281)*conv_par_lg_2+

  (-0.40726727)*sys_resp_lg_m+(-0.089565904)*ideo_lev_lg_m+

  (1.9387485)*rep_pot_lg_m+(2.5015355)*prot_app_lg_m+

  (-0.81826931)*conv_par_lg_m

# Part 3: Compute odds from logits, as well as modal class and

# probabilities from odds

max_lg<-Cluster_lg_1

if(Cluster_lg_2 > max_lg) {

 max_lg<-Cluster_lg_2

}

if(Cluster_lg_3 > max_lg) {

 max_lg<-Cluster_lg_3

}

Cluster_lg_1<-exp(Cluster_lg_1-max_lg)

Cluster_lg_2<-exp(Cluster_lg_2-max_lg)

Cluster_lg_3<-exp(Cluster_lg_3-max_lg)

max_lg<-Cluster_lg_1

Cluster_lg_modal<-1

if(Cluster_lg_2 > max_lg) {

 max_lg<-Cluster_lg_2; Cluster_lg_modal<-2

}

if(Cluster_lg_3 > max_lg) {

 max_lg<-Cluster_lg_3; Cluster_lg_modal<-3

}

sum_lg<-Cluster_lg_1 + Cluster_lg_2 + Cluster_lg_3

Cluster_lg_1<-Cluster_lg_1/sum_lg

Cluster_lg_2<-Cluster_lg_2/sum_lg

Cluster_lg_3<-Cluster_lg_3/sum_lg

return(list(

 "Cluster_modal"=Cluster_lg_modal,

 "Cluster_1"=Cluster_lg_1,

 "Cluster_2"=Cluster_lg_2,

 "Cluster_3"=Cluster_lg_3

))

}

## Example of call of scoring function in a loop over records

outdata<-inpdata

for(i in 1:nrow(outdata)){

scoring<-lg_scoring(outdata[i,])

outdata[i,"Cluster_modal"]<-scoring$Cluster_modal

outdata[i,"Cluster_1"]<-scoring$Cluster_1

outdata[i,"Cluster_2"]<-scoring$Cluster_2

outdata[i,"Cluster_3"]<-scoring$Cluster_3

}

As an example, let us take a subject with sys_resp = NA, ideo_lev = 1, rep_pot = 2, prot_app = 2, and conv_par = NA. For this data pattern, in part 1, the dummies sys_resp_lg_m, ideo_lev_lg_1, rep_pot_lg_2, prot_app_lg_2, and conv_par_lg_m are set to 1, and the remaining ones to 0. By summing the weights for which the dummies equal 1, part 2 yields the values 0, 0.4778, and 1.8748 for the three linear terms. Transforming these to probabilities in part 3 gives the posteriors .1095, .1766, and .7139 for the three classes.
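This arithmetic can be checked by summing the relevant weights from the lg_scoring function directly:

## Check of the worked example: sum the weights whose dummies equal 1
lin <- c(0,
         3.4185551 + 0.56595846 - 0.74630356 - 0.9274766 - 1.8328984,
         -3.6424675 + 3.6818864 + 3.0608866 - 0.40726727 - 0.81826931)
round(lin, 4)                       # 0.0000 0.4778 1.8748
round(exp(lin) / sum(exp(lin)), 4)  # 0.1095 0.1766 0.7139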

Appendix B: Taylor Approximation of the Normalizing Constants with a Covariate Having a Direct Effect on a Categorical Indicator

As shown in the main text, when a numeric covariate $z$ has a direct effect on a categorical indicator, the normalizing constant of the indicator concerned becomes $E_{jk|z}=\sum_{y_j=1}^{R_j}\exp(\alpha_{y_j}+\beta_{y_j k}+\delta_{y_j}z)$, and will thus depend on the value of $z$. As a result, the scoring equations will no longer be linear logistic. However, a possible way out is to approximate this term using a Taylor expansion.

For simplicity, assume covariate $z$ is centered, and thus has a mean of 0. The second-order Taylor approximation of $\log E_{jk|z}$ at $z=0$ equals $\log E_{jk|z}\approx\log E_{jk|z=0}+\frac{d\log E_{jk|z}}{dz}\Big|_{z=0}\,z+\frac{1}{2}\frac{d^{2}\log E_{jk|z}}{dz^{2}}\Big|_{z=0}\,z^{2}$, with $\frac{d\log E_{jk|z}}{dz}\Big|_{z=0}=\sum_{y_j=1}^{R_j}P(Y_j=y_j \mid X=k, z=0)\,\delta_{y_j}$

and $\frac{d^{2}\log E_{jk|z}}{dz^{2}}\Big|_{z=0}=\sum_{y_j=1}^{R_j}P(Y_j=y_j \mid X=k, z=0)\Big[\delta_{y_j}-\frac{d\log E_{jk|z}}{dz}\Big|_{z=0}\Big]\delta_{y_j}$.

The term $\log E_{jk|z=0}$ is subtracted from the intercept of class $k$, the first derivatives $\frac{d\log E_{jk|z}}{dz}\big|_{z=0}$ are subtracted from the linear terms for $z$, and the terms $\frac{1}{2}\frac{d^{2}\log E_{jk|z}}{dz^{2}}\big|_{z=0}$ are likewise subtracted, now as coefficients of the quadratic term $z^{2}$ in the scoring equations.
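The following R sketch illustrates the approximation for a single indicator with $R_j=3$ categories and hypothetical parameter values (all numbers are made up), comparing the exact $\log E_{jk|z}$ with its second-order expansion:

## Illustrative R sketch (hypothetical parameters): Taylor approximation of log E_{jk|z}
alpha <- c(0, 0.5, -0.3)   # alpha_{y_j}, category 1 fixed to 0
beta  <- c(0, 1.2, -0.8)   # beta_{y_j k} for the class considered
delta <- c(0, 0.6,  0.4)   # direct effect delta_{y_j} of covariate z

logE <- function(z) log(sum(exp(alpha + beta + delta * z)))

p0 <- exp(alpha + beta) / sum(exp(alpha + beta))   # P(Y_j = y_j | X = k, z = 0)
d1 <- sum(p0 * delta)                              # first derivative at z = 0
d2 <- sum(p0 * (delta - d1) * delta)               # second derivative at z = 0

z <- 0.7
c(exact = logE(z), taylor = logE(0) + d1 * z + 0.5 * d2 * z^2)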