
Using Bayesian Statistics to Model Uncertainty in Mixture Models: A Sensitivity Analysis of Priors


Abstract

The Bayesian estimation framework has specific benefits that can aid in the estimation of mixture models. Previous research has shown that using priors to capture (un)certainty in latent class sizes can greatly improve the estimation accuracy of a mixture model. Such priors can be beneficial in mixture modeling, but proper specification is key. A sensitivity analysis of priors, whether diffuse or informed priors are implemented, is essential for understanding the impact of the prior on the latent classes. We illustrate a full sensitivity analysis of Dirichlet priors for the class proportions of a latent growth mixture model, and we show that substantive results can shift (sometimes drastically) when the prior setting is modified, even slightly. Math assessment data from the Early Childhood Longitudinal Study–Kindergarten Class were used. We conclude with a discussion of final model interpretation when estimates are highly influenced by prior settings.
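To give a sense of what different Dirichlet settings imply before any data are seen, the short Python sketch below (added for illustration; the hyperparameter values are arbitrary and are not those examined in the article) shows how the implied prior mean and spread of two class proportions change with the hyperparameters.

import numpy as np
from scipy.stats import dirichlet

# Illustrative Dirichlet hyperparameter settings for a two-class model
# (hypothetical values, not taken from the article).
settings = {
    "diffuse,  alpha = [1, 1]": [1.0, 1.0],
    "weak,     alpha = [5, 5]": [5.0, 5.0],
    "informed, alpha = [45, 5]": [45.0, 5.0],
}

for label, alpha in settings.items():
    d = dirichlet(alpha)
    mean = d.mean()          # implied prior mean of each class proportion
    sd = np.sqrt(d.var())    # implied prior standard deviation
    print(f"{label}: mean = {np.round(mean, 3)}, SD = {np.round(sd, 3)}")

With alpha = [1, 1], each class proportion is centered at .50 with a wide spread (SD of roughly .29), whereas alpha = [45, 5] concentrates the prior near a .90/.10 split (SD of roughly .04). The sensitivity analysis described in the abstract asks how much the substantive latent class solution changes across settings of this kind.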

Notes

1 Uncertainty in Bayesian mixture modeling does not solely refer to the selection of priors. It can also refer to model uncertainty, which would fall under the issue of class enumeration with respect to mixture modeling. The issue of class enumeration is not directly addressed in this study, largely because the focus here is on the sensitivity analysis of prior settings and space does not allow for an assessment of both forms of uncertainty (i.e., model and prior uncertainty). However, the interested reader can see the following for more information on Bayesian class selection: Celeux, Forbes, Robert, and Titterington (2006) and Zhang, Lai, Lu, and Tong (2013). Bayesian model selection of mixture models is still a growing area that is in need of further research to develop tools that can properly select among competing Bayesian mixture models.

2 The term noninformative prior refers to the case where researchers supply vague information about the population parameter value; the prior is typically defined with a very wide variance (Gill, 2008). Although noninformative is one term commonly used in the Bayesian literature to describe this type of prior (see, e.g., Gelman et al., 2013), other phrases such as diffuse (see, e.g., Gill, 2008) or flat (Jeffreys, 1961) are also used. We use noninformative and diffuse interchangeably in this article.

3 Precision is a technical term in Bayesian statistics that refers directly to the informativeness of the prior. In the case of a normal prior, for example, the precision is the inverse of the variance hyperparameter.
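As a concrete illustration (notation added here for clarity, not taken from the article): for a normal prior on a parameter \( \theta \),

\[
\theta \sim N(\mu_0, \sigma_0^2), \qquad \tau_0 = \frac{1}{\sigma_0^2},
\]

so a diffuse prior with \( \sigma_0^2 = 10^{10} \) has precision \( \tau_0 = 10^{-10} \), whereas an informative prior with \( \sigma_0^2 = 0.1 \) has precision \( \tau_0 = 10 \); larger precision corresponds to a more informative prior.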

4 Specifically, if the covariance matrix was allowed to vary across latent classes, then the IW priors placed on the class-specific matrices would need to be manipulated and studied through another sensitivity analysis. In fact, with a heterogeneous covariance matrix specified across classes, we found the IW prior too vague for the smaller class (C2). This resulted in the model not being properly estimated with the default IW prior for C2 using these data. In this case, the IW prior for C2 would require a sensitivity analysis because it would need to be altered for the model to properly estimate. Given that the focus in this article was on the prior for the latent class proportions (i.e., the Dirichlet prior), we opted to hold the priors for the covariance matrix constant and estimate the model with the covariance matrix restricted across classes. This restriction allowed us to fully investigate the impact of the Dirichlet prior with all other priors held constant. In an actual empirical investigation, one would have to conduct a full sensitivity analysis on all priors to fully understand how the combination of priors affects final model results. In this case, the covariance matrix restriction could be relaxed and the sensitivity analysis could also incorporate the IW prior (as well as all of the other priors specified).

5 There are many additional methods that can be used for deriving priors. For example, one could conduct a meta-analysis (Ibrahim, Chen, & Sinha, 2001; Rietbergen, Klugkist, Janssen, Moons, & Hoijtink, 2011), consult with experts (Bijak & Wisniowski, 2010; Fransman et al., 2011; Howard, Maxwell, & Fleming, 2000; Martin et al., 2012; Morris, Oakley, & Crowe, 2014), use data-driven priors (Berger, 2006; Brown, 2008; Candel & Winkens, 2003; van der Linden, 2008), or use a data-splitting technique as we implemented here (Gelman, Bois, & Jiang, 1996; Moore, Reise, Depaoli, & Haviland, 2015).

6 The purpose of this example is to illustrate how to incorporate (un)certainty into the mixture model and subsequently examine the impact of subjective priors. We decided to use a data-splitting technique, where Data Set 1 was used to estimate an LGMM with Bayesian diffuse (default) prior settings. Estimates from this first analysis were then converted into subjective priors implemented on select model parameters for Data Set 2. There are many different ways in which priors could have been obtained, and applied researchers should take great care when selecting the method(s) used to derive priors. Even in the context of this data-splitting technique, we could have analyzed the first data set in a variety of ways (e.g., with frequentist estimation or with subjective priors). The main point is to be purposeful when selecting priors, and then to always report the exact prior settings so that researchers can interpret results in the context of the prior.
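To make the data-splitting logic concrete, the following is a minimal Python sketch. It uses a simple univariate Gaussian mixture fit with scikit-learn rather than the latent growth mixture model and software used in the article, and every numeric value (sample sizes, class means, the prior "sample size" of 50) is purely illustrative.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(11)

# Toy stand-in for the outcome data: a two-class univariate mixture.
# All values here are hypothetical, not taken from the ECLS-K data.
y = np.concatenate([rng.normal(0.0, 1.0, 880), rng.normal(4.0, 1.0, 120)])
rng.shuffle(y)
y = y.reshape(-1, 1)

# Step 1: split the sample into two halves.
half = len(y) // 2
data1, data2 = y[:half], y[half:]

# Step 2: analyze Data Set 1 under a diffuse, symmetric Dirichlet prior
# on the class proportions (concentration of 1 for each class).
m1 = BayesianGaussianMixture(
    n_components=2,
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1.0,
    random_state=0,
).fit(data1)

# Step 3: convert the estimated class proportions into subjective
# Dirichlet hyperparameters for Data Set 2. prior_n controls how much
# certainty the prior carries; 50 is an arbitrary choice here.
prior_n = 50
alpha_informed = m1.weights_ * prior_n
print("Proportions estimated from Data Set 1:", np.round(m1.weights_, 3))
print("Dirichlet hyperparameters for Data Set 2:", np.round(alpha_informed, 1))

The final fit to data2 with the class-specific hyperparameters in alpha_informed would be carried out in software that accepts asymmetric Dirichlet priors on the class proportions; those hyperparameters then serve as the starting point for the kind of sensitivity analysis described in the article.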

7 We reran the cells with increased chain lengths (a minimum of 300,000 iterations) to check the stability of the results when more iterations were requested. It is good practice to rerun analyses (especially mixture models, which can encounter increased problems with convergence) to ensure that convergence is not merely local (Depaoli & van de Schoot, 2015). Results were stable across the runs, so convergence was established.
