Abstract
Mixtures of linear mixed models (MLMMs) are useful for clustering grouped data and can be estimated by likelihood maximization through the Expectation–Maximization algorithm. A suitable number of components is then conventionally determined by comparing different mixture models using penalized log-likelihood criteria such as the Bayesian information criterion. We propose fitting MLMMs with variational methods, which can perform parameter estimation and model selection simultaneously. We describe a variational approximation for MLMMs where the variational lower bound is in closed form, allowing for fast evaluation, and develop a novel variational greedy algorithm for model selection and learning of the mixture components. This approach handles algorithm initialization and returns a plausible number of mixture components automatically. In cases of weak identifiability of certain model parameters, we use hierarchical centering to reparameterize the model and show empirically that there is a gain in efficiency in variational algorithms similar to that in Markov chain Monte Carlo (MCMC) algorithms. Related to this, we prove that the approximate rate of convergence of variational algorithms by Gaussian approximation is equal to that of the corresponding Gibbs sampler, which suggests that reparameterizations can lead to improved convergence in variational algorithms just as in MCMC algorithms. Supplementary materials for the article are available online.
SUPPLEMENTARY MATERIALS
Appendix: The derivation of the variational lower bound in (3), and the expressions for the variational lower bounds and parameter updates for Algorithms 2 and 3, can be found in the Appendix. An example applying Algorithm 2 to the yeast galactose data of Ideker et al. (2001) is also included. (VA_MLMM.appendix.pdf)
R code and data: R code for implementing the variational greedy algorithm (VGA) using Algorithms 1, 2, and 3, together with the water temperature dataset, is available as supplemental material. Please read the “README” file contained in the zip file for more details. (VGA.zip)
ACKNOWLEDGMENTS
Siew Li Tan was partially supported as part of the Singapore-Delft Water Alliance (SDWA)’s tropical reservoir research program. We thank SDWA for supplying the water temperature dataset and Dr. David Burger and Dr. Hans Los for their valuable comments and suggestions.