Abstract
This article examines Bayesian model averaging as a means of improving predictive performance in Bayesian structural equation models. Model uncertainty is commonly addressed through Bayesian model averaging. We extend the work of Madigan and colleagues by treating a structural equation model as a special case of a directed acyclic graph. We then provide an algorithm that searches the model space for submodels and obtains a weighted average of the submodels, using posterior model probabilities as weights. Our simulation study provides a frequentist evaluation of our Bayesian model averaging approach and indicates that when the true model is known, Bayesian model averaging does not necessarily yield better predictive performance than nonaveraged models. However, our case study using data from an international large-scale assessment reveals that the model-averaged submodels provide better posterior predictive performance than the initially specified model.
ACKNOWLEDGMENTS
We would like to thank Fabrizzio Sanchez for early contributions to the R code used in this project.
FUNDING
The research reported in this article was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D110001 to the University of Wisconsin–Madison. The opinions expressed are those of the authors and do not necessarily represent the views of the Institute or the U.S. Department of Education.
Notes
1 Note that in the case where there is only one element in the block, the prior distribution is assumed to be inverse-gamma.
2 This is rarely seen in practice; software packages that produce the Bayes factor typically use equal prior odds as the default.
3 The notion of best subset regression is controversial in the frequentist framework because of concern over capitalization on chance. However, in the Bayesian framework with its focus on predictive accuracy, finding the best subset of predictors is less of a problem.
4 Our use of two chains and this large number of draws was to ensure wide sampling of the posterior distribution to obtain valid comparisons across conditions.
5 As with the simulation study, use of such a large number of post-burn-in draws was to ensure coverage and convergence of the MCMC sampling.