BIC and Alternative Bayesian Information Criteria in the Selection of Structural Equation Models

Abstract

Selecting between competing structural equation models is a common problem. Often selection is based on the chi-square test statistic or other fit indices. In other areas of statistical research Bayesian information criteria are commonly used, but they are less frequently used with structural equation models compared to other fit indices. This article examines several new and old information criteria (IC) that approximate Bayes factors. We compare these IC measures to common fit indices in a simulation that includes the true and false models. In moderate to large samples, the IC measures outperform the fit indices. In a second simulation we only consider the IC measures and do not include the true model. In moderate to large samples the IC measures favor approximate models that only differ from the true model by having extra parameters. Overall, SPBIC, a new IC measure, performs well relative to the other IC measures.

Notes

1 The online appendix and complete replication materials for the analyses presented here are available at http://dvn.iq.harvard.edu/dvn/dv/jjharden

2 A dedicated Bayesian statistician would prefer a formal Bayesian analysis that specifies explicit prior probabilities to develop actual Bayes factors, which could then be compared to the various IC approximations. Although we recognize the value of this approach as an ideal, social and behavioral scientists in practice rarely work with prior distributions and instead use IC-based measures, which do not require prior specifications, for model selection. Our goal is to assess the accuracy of such IC measures, some of which are new, for selecting the true model or the best approximation to the true model in structural equation models, an area that has received little attention.

3 The latent variable model is also called the structural equation. Because the latent variable and measurement models both have structural parameters, we prefer the term latent variable model to avoid confusion.

4 We can modify the model to permit such correlations among the errors, but do not do so here.

5 Of course, advances in computational statistics over the last few decades have eased the burden of this complexity.

6 Note that this is very similar to Kashyap's (1982) approximation (KBIC), which uses log(N) in place of the corresponding term in IBIC. We compared KBIC to IBIC in our simulation study and found that IBIC consistently outperformed KBIC, particularly at small sample sizes. Thus, the slightly greater complexity in IBIC appears to be worthwhile. See the online appendix for more details on this comparison (http://dvn.iq.harvard.edu/dvn/dv/jjharden).

7 Bollen et al. (2012) also discussed an alternative computation of SPBIC when As , the prior variance goes to 0, so the prior distribution is a point mass at the mean, θ*. This case never occurs in the simulations or the empirical example we discuss later, so we do not discuss it further here. Because such a case is less studied, we caution readers that we have too little experience with the SPBIC to know its behavior under these less common circumstances.

8 This was most problematic at small sample sizes for SIM1: at N = 100, 168 samples had to be discarded for nonconvergence. The respective SIM1 numbers at other sample sizes are N = 250: 32, N = 500: 2, N = 1,000: 1, and N = 5,000: 0. The numbers for SIM2 are N = 100: 27, N = 250: 3, N = 500: 1, N = 1,000: 0, and N = 5,000: 0.

9 The sem package in R uses the observed information matrix to calculate the asymptotic covariance matrix of the parameter estimates, which provides the output needed for the SPBIC. The IBIC uses the expected information matrix, which is not available in the sem package, so we used the observed information matrix in its place. We did not anticipate much of a difference, but checked this issue in the following way. The expected information matrix is the expected value, or mean, of the observed information matrices. We computed the mean of the 1,000 simulated observed information matrices for each model, substituted those means as the expected information matrices for their respective models, and reran SIM1 at N = 100. If using the observed rather than the expected information matrix made any difference, we would expect to see it at this smallest sample size. The resulting IBIC performance (percentage of true model selections) was within a percentage point or two of what we obtained using the observed information. Although we would not anticipate large differences in other cases, we still recommend that the expected information matrix be used for IBIC when it is available.
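The averaging step in this check can be sketched in a few lines. This is only an illustration in Python, not the R code used for the study; the function name `mean_information` and the toy 2 x 2 matrices are hypothetical and stand in for the observed information matrices saved from each replication.

```python
from typing import List

Matrix = List[List[float]]


def mean_information(observed: List[Matrix]) -> Matrix:
    """Element-wise mean of a list of same-sized observed information matrices.

    Hypothetical helper: the mean across replications serves as a
    stand-in for the expected information matrix.
    """
    if not observed:
        raise ValueError("need at least one matrix")
    rows, cols = len(observed[0]), len(observed[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for mat in observed:
        if len(mat) != rows or any(len(r) != cols for r in mat):
            raise ValueError("all matrices must share one shape")
        for i in range(rows):
            for j in range(cols):
                total[i][j] += mat[i][j]
    n = len(observed)
    return [[value / n for value in row] for row in total]


# Toy input: two made-up "replications" of a 2 x 2 observed information matrix.
replications = [
    [[4.0, 1.0], [1.0, 3.0]],
    [[6.0, 3.0], [3.0, 5.0]],
]
approx_expected = mean_information(replications)
```

In the check described above, the list would hold one matrix per retained replication for a given model, and the averaged matrix would replace the observed one when the IBIC is recomputed.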

10 Selecting the lowest value for the IC is an exact test that does not take into account the magnitude of the difference in the fit statistic from the next best-fitting model. Alternatively, following the guidelines of Raftery (1995), we could consider models with differences in fit of less than 2 as essentially tied. Relative performance of the various criteria was similar when we did so; therefore, to simplify presentation, we assess correct selection in terms of exact selection based on the lowest value.
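The two selection rules in this note can be illustrated with a short sketch. The function name `select_model`, the `tie_margin` argument, and the IC values are hypothetical and chosen only for illustration; they are not results from the simulations.

```python
from typing import Dict, List


def select_model(ic_values: Dict[str, float], tie_margin: float = 0.0) -> List[str]:
    """Return the model(s) favored by an information criterion.

    With tie_margin = 0.0 only the model(s) with the lowest IC value are
    returned (exact selection). With tie_margin = 2.0, every model whose
    IC is less than 2 above the minimum is treated as essentially tied.
    """
    if not ic_values:
        raise ValueError("need at least one model")
    best = min(ic_values.values())
    return sorted(
        name
        for name, value in ic_values.items()
        if value == best or value - best < tie_margin
    )


# Made-up IC values for three competing models.
ic = {"M1": 1012.4, "M2": 1011.1, "M3": 1019.8}

exact_choice = select_model(ic)                 # lowest value only
tied_choice = select_model(ic, tie_margin=2.0)  # differences under 2 count as ties
```

Here exact selection picks M2 alone, whereas the tie rule keeps both M1 and M2 because their IC values differ by less than 2.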

11 Of course, we remind readers of the limitations of any simulation design and the need to replicate results under diverse conditions.

12 Nearly all SEM software packages provide all but the information matrices as part of standard output. Many packages permit output of the asymptotic covariance matrix of the parameter estimates. The inverse of this covariance matrix provides an estimate of the information matrix. Whether it is the expected or observed information matrix depends on the SEM software and the option used. The online supplementary materials include an R function that takes output from any software package and computes all of the IC measures.
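As a rough illustration of how software output feeds these measures, the sketch below recovers the log-determinant of the information matrix from an asymptotic covariance matrix, using the fact that the determinant of an inverse is the reciprocal of the determinant. It also computes the standard Schwarz BIC, -2 log L + d log(N). This is not the R function from the supplementary materials: the function names and the toy inputs are hypothetical, the other IC measures are not reproduced, and any scaling of the covariance matrix by sample size depends on the software and is ignored here.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def log_det_information(acov: Matrix) -> float:
    """log|I| for I = acov^{-1}, computed as -log|acov|."""
    lower = cholesky(acov)
    # log|acov| = 2 * sum(log of the Cholesky diagonal)
    return -2.0 * sum(math.log(lower[i][i]) for i in range(len(lower)))


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Standard Schwarz BIC: -2 log L + d log(N)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Made-up inputs: a 2 x 2 covariance matrix of the estimates and a fitted model.
acov = [[0.5, 0.0], [0.0, 0.25]]
log_det_info = log_det_information(acov)
example_bic = bic(log_lik=-100.0, n_params=3, n_obs=100)
```

The Cholesky route avoids forming the inverse explicitly, which is usually the more stable choice when only the determinant of the information matrix is needed.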
