
Model Selection in Finite Mixture Models: A k-Fold Cross-Validation Approach

 

Abstract

Finite mixture models, whether latent class models, growth mixture models, latent profile models, or factor mixture models, have become an important statistical tool in social science research. One of the biggest and most debated challenges in mixture modeling is the evaluation of model fit and model comparison. In the application of mixture models, researchers often fit a collection of models and then decide on a single optimal model based on a variety of model fit information. We propose a k-fold cross-validation procedure for model selection whereby the model is repeatedly fit to k − 1 of the k partitions of the data set, the resulting model is then applied to the remaining (kth) partition of the sample, and the distribution of fit indexes is examined. This method is illustrated with growth mixture models fit to longitudinal data on reading ability collected as part of the Early Childhood Longitudinal Study–Kindergarten Cohort.
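The cross-validation loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the procedure used in the article: it swaps in a univariate Gaussian mixture fit by a basic EM routine in place of a growth mixture model, uses simulated two-class data, and takes the held-out −2 log-likelihood as the single fit index. All function names (`component_logs`, `log_density`, `fit_mixture`, `kfold_neg2ll`) and all numeric settings are hypothetical.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def component_logs(x, weights, means, sds):
    """Log of (class weight times normal density) at x, one entry per class."""
    return [
        math.log(w) - math.log(s) - LOG_SQRT_2PI - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, sds)
    ]

def log_density(x, params):
    """Log mixture density at x, computed with log-sum-exp for stability."""
    logs = component_logs(x, *params)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def fit_mixture(data, n_classes, n_iter=100):
    """Fit a 1-D Gaussian mixture by EM (illustrative stand-in model)."""
    n = len(data)
    ordered = sorted(data)
    # Start class means at evenly spaced quantiles with a shared pooled SD.
    means = [ordered[int((c + 0.5) * n / n_classes)] for c in range(n_classes)]
    grand_mean = sum(data) / n
    pooled_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    sds = [pooled_sd] * n_classes
    weights = [1.0 / n_classes] * n_classes
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every observation.
        post = []
        for x in data:
            logs = component_logs(x, weights, means, sds)
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            post.append([u / total for u in unnorm])
        # M-step: weighted updates; small constants guard against empty classes.
        for c in range(n_classes):
            n_c = sum(p[c] for p in post) + 1e-12
            weights[c] = n_c / n
            means[c] = sum(p[c] * x for p, x in zip(post, data)) / n_c
            var_c = sum(p[c] * (x - means[c]) ** 2 for p, x in zip(post, data)) / n_c
            sds[c] = max(math.sqrt(var_c), 1e-3)
    return weights, means, sds

def kfold_neg2ll(data, n_classes, k, seed=0):
    """Held-out -2LL for each of k folds; parameters come from the other k - 1."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k roughly equal partitions
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in test_idx]
        params = fit_mixture(train, n_classes)  # estimate on k - 1 partitions
        # Apply the fitted model to the held-out partition; nothing is re-estimated.
        scores.append(-2.0 * sum(log_density(x, params) for x in test))
    return scores

# Simulated data with two well-separated classes (hypothetical values).
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(6.0, 1.0) for _ in range(100)]

mean_neg2ll = {}
for n_classes in (1, 2, 3):
    scores = kfold_neg2ll(sample, n_classes, k=5)
    mean_neg2ll[n_classes] = sum(scores) / len(scores)
    print(n_classes, round(mean_neg2ll[n_classes], 1))
```

Each call returns one held-out −2LL value per fold, so the distribution of the fit index across folds can be compared between candidate models rather than relying on a single full-sample value.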

FUNDING

This work was supported by National Science Foundation Grant REAL-1252463 awarded to the University of Virginia, David Grissmer (Principal Investigator), and Christopher Hulleman (Co-Principal Investigator).

Notes

1 Despite some debate regarding the results presented by Lo et al. (2001; see Jeffries, 2003), we report the VLMR LRT and aLMR LRT because they remain widely used in the mixture modeling literature.

2 Although a test sample size of 5 (about 1% of N = 496) might seem too small when performing 100-fold cross-validation, no parameters are estimated when the model is applied to the test sample. The test sample only needs to consist of at least one participant (essentially inserting the participant’s data into x_i in Equation 1 and calculating the −2LL using the model-implied mean and covariance structure obtained when the model was estimated using the training sample), which occurs when performing leave-one-out cross-validation (i.e., k-fold cross-validation with k = N).
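As a rough illustration of the point in Note 2, the sketch below computes the −2LL for a single held-out participant from fixed, already-estimated parameters. Equation 1 is not reproduced here, so the code assumes the likelihood is a mixture of multivariate normal densities with class-specific model-implied mean vectors and covariance matrices. The function names and every parameter value are hypothetical, and only the Python standard library is used to keep the example self-contained.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal at x."""
    p = len(x)
    low = cholesky(cov)
    # Forward substitution solves low * z = (x - mean); z'z is the Mahalanobis term.
    z = []
    for i in range(p):
        resid = (x[i] - mean[i]) - sum(low[i][m] * z[m] for m in range(i))
        z.append(resid / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    return -0.5 * (p * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def neg2ll(x, weights, means, covs):
    """-2 log-likelihood of one participant under a fixed normal mixture."""
    logs = [math.log(w) + mvn_logpdf(x, m, c) for w, m, c in zip(weights, means, covs)]
    top = max(logs)
    return -2.0 * (top + math.log(sum(math.exp(v - top) for v in logs)))

# Hypothetical training-sample estimates for a two-class model, three occasions.
weights = [0.6, 0.4]
means = [[1.0, 2.0, 3.0], [1.5, 3.5, 5.5]]
covs = [
    [[1.0, 0.5, 0.4], [0.5, 1.2, 0.6], [0.4, 0.6, 1.5]],
    [[1.1, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.3]],
]

# One held-out participant: the data are inserted and the likelihood evaluated;
# no parameter is re-estimated.
x_i = [1.2, 2.4, 3.1]
print(neg2ll(x_i, weights, means, covs))
```

Because the parameters are held fixed, the calculation is well defined for a test sample of any size, including a single participant as in leave-one-out cross-validation.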
