Abstract
The methodological literature on mixture modeling has rapidly expanded in the past 15 years, and mixture models are increasingly applied in practice. Nonetheless, this literature has historically been diffuse, with different notations, motivations, and parameterizations making mixture models appear disconnected. This pedagogical review facilitates an integrative understanding of mixture models. First, 5 prototypic mixture models are presented in a unified format with incremental complexity while highlighting their mutual reliance on familiar probability laws, common assumptions, and shared aspects of interpretation. Second, 2 recent extensions—hybrid mixtures and parallel-process mixtures—are discussed. Both relax a key assumption of classic mixture models but do so in different ways. Similarities in construction and interpretation among hybrid mixtures and among parallel-process mixtures are emphasized. Third, the combination of both extensions is motivated and illustrated by means of an example on oppositional defiant and depressive symptoms. By clarifying how existing mixture models relate and can be combined, this article bridges past and current developments and provides a foundation for understanding new developments.
Notes
1For mixture models with discrete outcomes, either a probability parameterization (most commonly) or a loglinear parameterization has historically been used (e.g., Biemer, 2011; Collins & Flaherty, 2002; Heinen, 1996; McCutcheon, 2002). For mixture models with continuous outcomes, a logistic parameterization is typically used for the between-class model, as employed here also. For consistency, we also employ a logistic parameterization with discrete outcomes (following, e.g., Humphreys & Janson, 2000; B. O. Muthén, 2001, 2004; Reboussin, Reboussin, Liang, & Anthony, 1998). This logistic parameterization can be used to compute probabilities, implicitly achieves the same constraints as the popular probability parameterization, and readily expands to accommodate covariates (unlike the probability parameterization; Magidson & Vermunt, 2004).
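To make the constraint property concrete, the following sketch (not from the article; function and parameter names are illustrative) shows a multinomial-logit parameterization of class membership with one covariate. Because the last class serves as the reference class with a logit of 0, the resulting class probabilities are guaranteed to lie in [0, 1] and sum to 1, which is the constraint the probability parameterization must impose explicitly.

```python
import math

def class_probabilities(alphas, betas, x):
    """Illustrative multinomial-logit (logistic) parameterization of
    latent class membership with a single covariate x.

    alphas: intercepts for classes 1..K-1 (reference class K has logit 0)
    betas:  covariate slopes for classes 1..K-1
    Returns K probabilities that sum to 1 by construction.
    """
    logits = [a + b * x for a, b in zip(alphas, betas)] + [0.0]  # reference class
    exps = [math.exp(logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical K = 2 example: intercept -0.5 and slope 0.8 for class 1,
# with class 2 as the reference class.
probs = class_probabilities(alphas=[-0.5], betas=[0.8], x=1.0)
print(probs)       # class probabilities for classes 1 and 2
print(sum(probs))  # 1.0 -- the constraint is satisfied implicitly
```

Covariates enter simply as additional terms in each class's logit, which is why this parameterization "readily expands" relative to the probability parameterization.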
2Reasons are that (a) there is no formal set of rules (akin to Wright's tracing rules in SEM) allowing mixture model equations to be directly reproduced from their diagrams and (b) diagrams do not fully represent all aspects of the mixture model. For instance, current path diagrams do not communicate how many classes there are, whether a particular parameter differs across some but not all classes, whether a parameter is fixed to 0 in some classes but estimated in others, or which class is the reference class.
3Here we use a shorthand: P(A) represents P(A = a), where a is a realization of random variable A.
4This is also sometimes called the chain rule.
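For reference, the chain (multiplication) rule factors a joint probability into a product of sequential conditionals; for example, for repeated measures $Y_1, \ldots, Y_T$:

\[
P(Y_1, Y_2, \ldots, Y_T) \;=\; P(Y_1)\, P(Y_2 \mid Y_1) \cdots P(Y_T \mid Y_1, \ldots, Y_{T-1})
\]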
5Online appendix is available at http://www.vanderbilt.edu/peabody/sterba/appxs.htm
6For each outcome in the LPA, the figure depicts class-specific densities weighted by their class probabilities; however, this weighting is used here only for ease of visualization. In LPA estimation, weighting is applied to the joint outcome density, not to the individual outcome densities (see Equation (14)).
7Even when discrete subpopulations exist, many more response patterns than classes can arise due to measurement error.
8When evaluated for a particular response pattern, it is a probability.
9Having K = M corresponds with configural invariance of the categorical latent variable across time.
10The testability of measurement invariance in the conventional LTA contrasts with its untestability in the conventional GBT (unless the GBT is extended to a second-order GBT, as in Grimm & Ram, 2009).
11If T > 2, latent states at time t could additionally be regressed on prior states at both t – 1 and t – 2.
a Further constraints are common for parsimony and/or to prevent empirical underidentification (see text). Further constraints are also used to, for instance, impose threshold invariance within state across time in LTA.
12Analogously, in continuous latent variable models, the posterior density can be “shrunk” toward the mean of the prior density (Skrondal & Rabe-Hesketh, 2004).
13A third approach, not discussed here, is unavailable for LCA and LTA but available for LPA and GBT. It involves estimating residual covariances within class; estimating many such covariances and/or allowing them to differ across class can incur estimation problems (Lubke & Neale, 2006).
14For instance, in a K = M = 2 LTA, testing forward change (below-diagonal elements of the transition probability matrix = 0) requires fixing α1 to a large negative number. Testing no-change (off-diagonal elements of the transition probability matrix = 0) additionally requires fixing β11 to a very large positive number.
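The numeric effect of these fixed values can be seen with a minimal sketch (an assumption-laden illustration, not the article's exact equations): under a logistic parameterization, a transition probability is an inverse-logit of an intercept plus a prior-state effect, so a very large negative intercept drives it to essentially 0 and a very large positive slope can drive it back to essentially 1.

```python
import math

def transition_probability(alpha, beta, prior_state):
    """Illustrative 2-state logistic transition probability:
    P(transition logit outcome | prior state) = inverse-logit(alpha + beta * prior_state).
    alpha and beta play the roles of the alpha_1 and beta_11 discussed in the note;
    prior_state is a 0/1 indicator for the prior latent state (a hypothetical coding).
    """
    return 1.0 / (1.0 + math.exp(-(alpha + beta * prior_state)))

# Fixing alpha to a large negative number zeroes out the corresponding
# transition-matrix element (probability ~ 0):
print(transition_probability(alpha=-20.0, beta=0.0, prior_state=0))
# Additionally fixing beta to a very large positive number pins the
# complementary element near 1 for the other prior state:
print(transition_probability(alpha=-20.0, beta=40.0, prior_state=1))
```

In practice "large" means large enough that the resulting probability is 0 (or 1) to within numerical precision, which is why fixed values rather than boundary estimates are used.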
15Within class, the number of growth coefficients is (1 + b) × r (where b is the polynomial curve degree and r is the number of regimes). At timepoints where a particular regime goes off-line, its growth coefficients are fixed to 0 (for details see Dolan, Schmittmann, Lubke, & Neale, 2005).
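The coefficient count is simple arithmetic; a one-line sketch (illustrative names, not from the article) makes the formula explicit:

```python
def n_growth_coefficients(b, r):
    """Within-class growth coefficient count: (1 + b) * r,
    where b is the polynomial curve degree and r is the number of regimes."""
    return (1 + b) * r

# A quadratic curve (b = 2) with r = 2 regimes yields 3 coefficients
# (intercept, linear, quadratic) per regime, i.e. 6 in total.
print(n_growth_coefficients(b=2, r=2))  # 6
```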