Abstract
Finite mixture models, whether latent class models, growth mixture models, latent profile models, or factor mixture models, have become an important statistical tool in social science research. One of the biggest and most debated challenges in mixture modeling is the evaluation of model fit and model comparison. In applying mixture models, researchers often fit a collection of models and then decide on a single optimal model based on a variety of model fit information. We propose a k-fold cross-validation procedure for model selection, whereby the model is repeatedly fit to different partitions of the data set, the resulting model is then applied to the kth (held-out) partition of the sample, and the distribution of fit indexes is examined. This method is illustrated with growth mixture models fit to longitudinal data on reading ability collected as part of the Early Childhood Longitudinal Study–Kindergarten Cohort.
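The procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration only, using scikit-learn's `GaussianMixture` as a stand-in for the growth mixture models analyzed in the article; the simulated data, fold count, and candidate class numbers are all assumptions for demonstration, not the authors' specification.

```python
# Illustrative sketch of k-fold cross-validation for mixture model selection:
# fit the model to the training folds, evaluate the held-out fold's
# log-likelihood under the fitted model, and compare across class counts.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Simulated two-class data standing in for repeated ability measurements.
X = np.vstack([rng.normal(0.0, 1.0, size=(250, 4)),
               rng.normal(2.5, 1.0, size=(250, 4))])

def cv_loglik(X, n_classes, k=10):
    """Mean held-out log-likelihood per observation across k folds."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(X):
        gm = GaussianMixture(n_components=n_classes, random_state=0)
        gm.fit(X[train_idx])                  # estimate on training folds only
        scores.append(gm.score(X[test_idx]))  # evaluate on the held-out fold
    return float(np.mean(scores))

for g in (1, 2, 3):
    print(f"{g} classes: mean held-out log-likelihood = {cv_loglik(X, g):.3f}")
```

In this sketch the model with the highest mean held-out log-likelihood is preferred; the article's procedure examines the full distribution of fit indexes across folds rather than a single summary.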
FUNDING
This work was supported by National Science Foundation Grant REAL-1252463 awarded to the University of Virginia, David Grissmer (Principal Investigator), and Christopher Hulleman (Co-Principal Investigator).
Notes
1 Despite some debate regarding the results presented by Lo et al. (2001; see Jeffries, 2003), we report the VLMR LRT and aLMR LRT because they remain widely used in the mixture modeling literature.
2 Although a test sample size of 5 (1% of N = 496) might seem too small when performing 100-fold cross-validation, no parameters are estimated when the model is applied to the test sample. The test sample need only contain a single participant (essentially inserting that participant's data into Equation 1 and calculating the likelihood using the model-implied mean and covariance structure obtained when the model was estimated on the training sample), as occurs in leave-one-out cross-validation (i.e., k-fold cross-validation with k = N).
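The point of Note 2 can be made concrete: once the class weights and the model-implied means and covariances have been estimated from the training sample, a held-out participant's likelihood requires no further estimation. The sketch below assumes a normal finite mixture; all parameter values are hypothetical and are not taken from the article.

```python
# Hedged sketch: evaluating one held-out participant's log-likelihood under
# a mixture already fitted to the training sample. No parameters are
# estimated here -- the density is simply evaluated at the new data point.
import numpy as np
from scipy.stats import multivariate_normal

def holdout_loglik(y, weights, means, covs):
    """Log-likelihood of one observation under a fitted finite mixture."""
    density = sum(w * multivariate_normal.pdf(y, mean=m, cov=c)
                  for w, m, c in zip(weights, means, covs))
    return float(np.log(density))

# Hypothetical training-sample estimates for a two-class, four-occasion model.
weights = [0.6, 0.4]
means = [np.zeros(4), np.full(4, 2.5)]
covs = [np.eye(4), np.eye(4)]

print(holdout_loglik(np.array([0.1, -0.2, 0.0, 0.3]), weights, means, covs))
```

Because evaluation involves no optimization, the computation is well defined even when the test partition contains a single case, which is exactly the leave-one-out situation the note describes.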