Abstract
As Bayesian methods continue to grow in accessibility and popularity, more empirical studies are turning to them to model small sample data. Unlike frequentist methods, Bayesian methods do not rely on asymptotics, and reliance on asymptotics can be a hindrance in small sample contexts. Although Bayesian methods are better equipped to model data with small sample sizes, their estimates are highly sensitive to the specification of the prior distribution. If this sensitivity is not heeded, Bayesian estimates can actually be worse than frequentist estimates, especially if frequentist small sample corrections are utilized. We show with illustrative simulations and applied examples that relying on software defaults or diffuse priors with small samples can yield more biased estimates than frequentist methods. We discuss the conditions that need to be met if researchers want to responsibly harness the advantages that Bayesian methods offer for small sample problems, and we also discuss leading small sample frequentist methods.
Keywords:
Notes
1 Marginal prior refers to a prior distribution placed on an individual element of a matrix, as opposed to a multivariate prior (e.g., the inverse Wishart prior) that places one prior on the entire matrix.
In general, the marginal approach can be problematic because a matrix formed from draws from multiple marginal distributions is not guaranteed to be positive definite, whereas this property is guaranteed when drawing from an inverse Wishart distribution. In the context of growth models, where the random effect covariance matrix is typically of very low dimension, this issue is less likely to be a concern (Liu et al., 2016).
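The positive definiteness concern in Note 1 can be illustrated with a minimal sketch. It assumes a 2 × 2 random effect covariance matrix (intercept variance, slope variance, and their covariance), and the three marginal priors below are hypothetical choices made only for illustration; they are not the priors used in this article.

```python
import random


def is_positive_definite_2x2(var_int, var_slope, cov):
    """Sylvester's criterion for the matrix [[var_int, cov], [cov, var_slope]]."""
    return var_int > 0 and var_int * var_slope - cov ** 2 > 0


def share_not_positive_definite(n_draws=10_000, seed=1):
    """Fraction of independently drawn matrices that fail positive definiteness."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_draws):
        # Hypothetical marginal priors, one per matrix element (illustration only).
        var_int = rng.uniform(0.0, 1.0)
        var_slope = rng.uniform(0.0, 0.2)
        cov = rng.gauss(0.0, 0.2)
        if not is_positive_definite_2x2(var_int, var_slope, cov):
            failures += 1
    return failures / n_draws


print(share_not_positive_definite())
```

Because each element is drawn without reference to the others, nothing prevents the squared covariance from exceeding the product of the variances, which is exactly the case the check flags.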
2 The inverse Wishart distribution is rather complex, so we do not delve into full details given the intended focus of this article (see Muthén & Asparouhov, 2012, for a detailed explanation). As a quick introduction, the first argument is a scale matrix (Ψ) and the second argument is the degrees of freedom (v). The larger the degrees of freedom, the more informative the prior will be. The mean of the inverse Wishart distribution is $\Psi / (v - p - 1)$ and the variance of a diagonal element is
$$\frac{2\psi^{2}}{(v - p - 1)^{2}(v - p - 3)},$$
where p is the dimension of the scale matrix and ψ is the corresponding diagonal element of the scale matrix Ψ.
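As a rough illustration of how the degrees of freedom control informativeness, the sketch below evaluates the standard closed-form moments for a diagonal element of an inverse Wishart matrix: mean ψ/(v − p − 1), defined for v > p + 1, and variance 2ψ²/((v − p − 1)²(v − p − 3)), defined for v > p + 3. The function name and the example values are assumptions made for illustration.

```python
def inverse_wishart_diag_moments(psi, v, p):
    """Return (mean, variance) for a diagonal element under IW(Psi, v).

    psi is the matching diagonal element of the scale matrix, v the degrees
    of freedom, and p the dimension. None marks a moment that does not exist.
    """
    mean = psi / (v - p - 1) if v > p + 1 else None
    variance = None
    if v > p + 3:
        variance = 2 * psi ** 2 / ((v - p - 1) ** 2 * (v - p - 3))
    return mean, variance


# Hypothetical settings for a 2 x 2 matrix with psi = 1 (illustration only).
for v in (3, 6, 10, 30):
    print(v, inverse_wishart_diag_moments(psi=1.0, v=v, p=2))
```

With p = 2, the moments are undefined at v = 3, and the variance shrinks as v grows, which matches the statement that larger degrees of freedom give a more informative prior.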
3 We also note that REML is not immune to convergence issues with smaller sample sizes. SAS will fix variance component estimates to 0 if an estimate is negative or if there is a convergence issue. The percentages of replications in which this occurred for the 20-, 30-, and 50-cluster conditions were 25%, 21%, and 11%, respectively. A preponderance of the issues occurred for the slope variance, whose population value (0.10) was fairly close to 0. These replications are excluded from the results, which might make the REML results appear slightly better than they actually are because the replications that would conceivably perform the worst might be excluded. Also note that the Kenward–Roger correction cannot operate when the variance components are inadmissible or fixed to zero.
4 Note that these methods are not available in Mplus and require that the models be fit in SAS (used in this study) or Stata. This requires that the model be considered a linear mixed model rather than a latent growth model. Curran (2003) noted that these two models are interchangeable mathematically.
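5 The inverse gamma distribution has a mean equal to $\beta / (\alpha - 1)$ for $\alpha > 1$ and a variance equal to
$$\frac{\beta^{2}}{(\alpha - 1)^{2}(\alpha - 2)} \quad \text{for } \alpha > 2,$$
for α the shape parameter (the first hyperparameter) and β the scale parameter (the second hyperparameter).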
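The inverse gamma moments can be checked numerically with a short sketch. It uses the standard closed forms, mean β/(α − 1) for α > 1 and variance β²/((α − 1)²(α − 2)) for α > 2; the function name and the two example priors are assumptions chosen for illustration and are not taken from the simulations in this article.

```python
def inverse_gamma_moments(alpha, beta):
    """Return (mean, variance) of an inverse gamma(alpha, beta) distribution.

    alpha is the shape and beta is the scale. None marks a moment that
    does not exist for the given shape.
    """
    mean = beta / (alpha - 1) if alpha > 1 else None
    variance = None
    if alpha > 2:
        variance = beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))
    return mean, variance


# Hypothetical diffuse prior: shape below 1, so neither moment exists.
print(inverse_gamma_moments(0.001, 0.001))

# Hypothetical informative prior with its mean near 0.10.
print(inverse_gamma_moments(3.0, 0.2))
```

A shape at or below 1 leaves the prior without a finite mean, which is one way to see how little information such a diffuse specification carries.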
6 The percentages of replications in which this occurred for the 8-, 10-, and 14-cluster conditions were 9%, 3%, and 1%, respectively. These replications are excluded from the results, which might make the REML results appear slightly better than they actually are because the replications that would conceivably perform the worst might be excluded.