TEACHER'S CORNER

Addressing the Problem of Switched Class Labels in Latent Variable Mixture Model Simulation Studies

Pages 110-131 | Published online: 07 Jan 2011
 

Abstract

The discrimination between alternative models and the detection of latent classes in the context of latent variable mixture modeling depend on sample size, class separation, and other aspects that are related to power. Prior to a mixture analysis it is useful to investigate model performance in a simulation study that reflects the research settings. Multiple data sets are generated under one or more models, and alternative models are fitted to the data. The aggregation of results over multiple data sets is complicated by the fact that mixture models are only identified up to a permutation of the class labels. Estimated class labels are arbitrary, with the effect that the estimated parameters for Class 1 could be incorrectly labeled as Class 2, Class 3, and so forth, relative to their data generating labels. In a simulation study, the detection of switched labels needs to be automated. Switched class labels are not necessarily simple to detect. This article describes different possible scenarios of switched class labels, and develops an algorithm implemented in R that (a) detects switched labels, and (b) provides information that can be used to either correct class labels or to discard a particular data set from a simulation if class labels are ambiguous. The algorithm is useful in Monte Carlo simulations involving latent variable mixture models.

Notes

1. Note that the class sizes in are equal. This is generally not the case in practice, but is used here for illustrative purposes.

2. Note here that correctly labeling $K - 1$ classes will, by default, correctly label the remaining $K$th class. Thus, the switched label detection algorithm requires that the proportion of correct assignments be greater than $CA_{crit}$ in $K - 1$ rather than in $K$ classes.
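The criterion in this note can be illustrated with a minimal Python sketch. It is not the LBLcorrect R script described in the article; it assumes that each case has a known generating class and a single estimated class assignment, and the function name `detect_label_switch`, the parameter name `ca_crit`, and its default of 0.7 are all hypothetical choices made for illustration.

```python
def detect_label_switch(true_labels, est_labels, n_classes, ca_crit=0.7):
    """Return a dict {estimated label: generating label}, or None if ambiguous.

    Illustrative sketch only; ca_crit=0.7 is an assumed default, not a value
    taken from the article.
    """
    # counts[t][e] = number of cases generated in class t and assigned to class e
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, e in zip(true_labels, est_labels):
        counts[t][e] += 1

    mapping = {}  # estimated label -> generating label
    for t in range(n_classes):
        row_total = sum(counts[t])
        if row_total == 0:
            continue  # no cases generated in this class
        best_e = max(range(n_classes), key=lambda e: counts[t][e])
        if counts[t][best_e] / row_total > ca_crit:
            if best_e in mapping:
                return None  # two generating classes claim the same estimated label
            mapping[best_e] = t

    # The criterion must hold in at least K - 1 classes.
    if len(mapping) < n_classes - 1:
        return None
    # With K - 1 classes matched, the remaining class is labeled by elimination.
    if len(mapping) == n_classes - 1:
        (left_e,) = set(range(n_classes)) - set(mapping)
        (left_t,) = set(range(n_classes)) - set(mapping.values())
        mapping[left_e] = left_t
    return mapping


# Three classes of 10 cases each; estimated labels are rotated, and the
# third class is only weakly recovered (6 of 10 cases).
true = [0] * 10 + [1] * 10 + [2] * 10
est = [1] * 10 + [2] * 9 + [0] + [0] * 6 + [1] * 4
relabel = detect_label_switch(true, est, n_classes=3)
```

A return value of `None` corresponds to a data set whose labels cannot be resolved under this criterion and that might therefore be discarded from the simulation summary.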

3. Note that if a model does not converge or has other fatal problems with estimation, *.pro and *.par files will not be produced by Mplus. Also note that Mplus will still produce *.pro and *.par files when latent variable matrices are not positive definite, when residual variance matrices are not positive definite, and when the first-order derivative matrix is not positive definite. These results are not trustworthy and should be excluded when taking summaries in a simulation study. This information is printed in the primary Mplus output file *.out.

4. Fixing the parameter in a class other than the last class also works, but note that fixing it in the last class is the default in Mplus and a requirement of the LBLcorrect script developed herein.

5. One of the class proportions $(\pi_1, \ldots, \pi_K)$ is redundant because $\pi_K = 1 - \sum_{k=1}^{K-1} \pi_k$. In Mplus, the logit class proportions $\log(\pi_k / \pi_K)$ are estimated with the last class ($K$) as the reference class. These are called ALPHA(C) in TECHNICAL 1 in Mplus.
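The relationship between the class proportions and the logits with the last class as reference can be sketched in Python as follows. This is only an illustration of the arithmetic in the note; the function names are hypothetical and nothing here reads or writes Mplus files.

```python
import math


def logits_to_proportions(alphas):
    """Convert K - 1 logits log(pi_k / pi_K) into K class proportions.

    The reference (last) class has an implicit logit of 0.
    """
    full = list(alphas) + [0.0]
    m = max(full)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in full]
    total = sum(exps)
    return [x / total for x in exps]


def proportions_to_logits(props):
    """Convert K class proportions into K - 1 logits relative to the last class."""
    ref = props[-1]
    return [math.log(p / ref) for p in props[:-1]]


# Two classes with a logit of log(2): class 1 is twice as large as class 2.
props = logits_to_proportions([math.log(2.0)])
```

Because the logits are defined relative to the last class, a switch that involves the last class changes every logit, which is one reason comparisons across data sets are easier on the proportion scale.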

a. The LBLcorrect.r script will correctly handle cases where the last class is correctly labeled.
