Abstract
Normal-theory full-information maximum likelihood (FIML) is a common estimation technique for incomplete data in structural equation modeling (SEM). However, it is not widely known that approximate fit indices (AFIs) can be distorted, relative to their complete-data counterparts, when FIML is used to handle missing data. In this article, we show that the two most popular AFIs, the root-mean-square error of approximation (RMSEA) and the comparative fit index (CFI), often converge to population values that differ from their complete-data values when FIML is used with missing data. By deriving the FIML fit function for incomplete data and showing that it differs from the usual maximum likelihood (ML) fit function for complete data, we provide a mathematical explanation for this phenomenon. We also present several analytic examples, as well as the results of two large-sample simulation studies, to illustrate how AFIs change with missing data.
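The following Python sketch illustrates, under stated assumptions, why the FIML fit function differs from the complete-data ML fit function. It assumes MCAR data, so that the population moments within each missing data pattern are sub-blocks of the full population moments, and it evaluates both discrepancies at fixed model-implied moments rather than at their minimizers. The function names ml_discrepancy and fiml_discrepancy, the pattern weights, and the numerical example are hypothetical and are not taken from the article's equations.

```python
import numpy as np


def ml_discrepancy(sigma_pop, mu_pop, sigma_model, mu_model):
    """Normal-theory ML discrepancy between population and model-implied moments."""
    p = sigma_pop.shape[0]
    sigma_model_inv = np.linalg.inv(sigma_model)
    mean_diff = mu_pop - mu_model
    _, logdet_model = np.linalg.slogdet(sigma_model)
    _, logdet_pop = np.linalg.slogdet(sigma_pop)
    return (logdet_model - logdet_pop
            + np.trace(sigma_model_inv @ sigma_pop) - p
            + mean_diff @ sigma_model_inv @ mean_diff)


def fiml_discrepancy(sigma_pop, mu_pop, sigma_model, mu_model, patterns, weights):
    """Weighted sum of ML discrepancies over the observed variables of each pattern.

    Assumes MCAR, so each pattern's population moments are the corresponding
    sub-blocks of the full population moments; weights are pattern proportions.
    """
    total = 0.0
    for observed, weight in zip(patterns, weights):
        idx = np.asarray(observed)
        block = np.ix_(idx, idx)
        total += weight * ml_discrepancy(sigma_pop[block], mu_pop[idx],
                                         sigma_model[block], mu_model[idx])
    return total


# Hypothetical example: three variables; the model wrongly implies zero correlations.
sigma_pop = np.array([[1.0, 0.5, 0.3],
                      [0.5, 1.0, 0.4],
                      [0.3, 0.4, 1.0]])
mu = np.zeros(3)
sigma_model = np.eye(3)

f_ml = ml_discrepancy(sigma_pop, mu, sigma_model, mu)
# Half the cases observe all three variables, half observe only the first two.
f_fiml = fiml_discrepancy(sigma_pop, mu, sigma_model, mu,
                          patterns=[[0, 1, 2], [0, 1]], weights=[0.5, 0.5])
print(f"complete-data ML: {f_ml:.4f}, FIML: {f_fiml:.4f}")
```

With a single complete-data pattern the two functions coincide. With missing data, the FIML version becomes a weighted average of pattern-specific discrepancies, each of which ignores misfit in the parts of the covariance matrix that the pattern does not observe, so the same misspecified model yields a smaller discrepancy in this example.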
Notes
1 Technically, ignorability requires an additional condition that the parameters of the missingness mechanism are independent of the model parameters (Little & Rubin, 2002).
2 All computer code and results from this article are available on the Open Science Framework (OSF) at https://osf.io/uvpab/?view_only=e739f447ff1045bc8ff773838a2a0faf.
3 To use the MG fit function for handling missing data, pseudo-values must be inserted into the covariance matrix of each missing data pattern in place of the entries for the unobserved variables, and the degrees of freedom must then be adjusted for these pseudo-values after fitting the model. See Chapter 8 of Bollen (1989) for a detailed explanation.
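4 Equation (12) is not defined if the quantity under the square root is negative (i.e., if the model test statistic is smaller than its degrees of freedom); in that case, the sample RMSEA is set to zero. Similarly, the sample CFI in Equation (13) is truncated to the interval [0, 1]: values below 0 are set to 0, and values above 1 are set to 1.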
5 We cannot compute the traditionally defined CFI because the traditional independence model is not nested within the highly restrictive hypothesized model used in this example.
6 In most SF conditions, the variables with missing data had correlated residuals in the population model. However, in the SF conditions where four variables had missing data but only two variables had a correlated residual, two of the variables with missing data did not have a correlated residual.
7 Another advantage of the MI approach is its ability to handle mixtures of incomplete categorical and continuous variables (Enders & Mansolf, 2018).
8 Using the rational root theorem, we can show that the function in (A10) has no rational root. We solved for by graphing the function.
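Note 3 states that the degrees of freedom must be adjusted for the pseudo-values used in the multiple-group (MG) approach. The Python sketch below counts those pseudo-values, assuming that every variance, covariance, and (with a mean structure) mean involving an unobserved variable in a pattern is replaced by a pseudo-value. The helper names pseudo_moment_count and adjusted_df and the example numbers are hypothetical.

```python
def pseudo_moment_count(n_vars, n_missing, include_means=True):
    """Number of pseudo-values inserted for one missing data pattern (hypothetical helper).

    Every variance/covariance that involves at least one unobserved variable is
    a pseudo-value; with a mean structure, each unobserved mean is one as well.
    """
    if not 0 <= n_missing <= n_vars:
        raise ValueError("n_missing must be between 0 and n_vars")
    n_observed = n_vars - n_missing
    n_cov = n_vars * (n_vars + 1) // 2 - n_observed * (n_observed + 1) // 2
    n_mean = n_missing if include_means else 0
    return n_cov + n_mean


def adjusted_df(df_reported, n_vars, missing_per_pattern, include_means=True):
    """Subtract all pseudo-values from the df reported by the MG fit."""
    n_pseudo = sum(pseudo_moment_count(n_vars, m, include_means)
                   for m in missing_per_pattern)
    return df_reported - n_pseudo


# Hypothetical example: 4 variables, three patterns with 0, 1, and 2 missing variables.
print(adjusted_df(df_reported=30, n_vars=4, missing_per_pattern=[0, 1, 2]))  # prints 16
```

Here the pattern with one missing variable contributes 4 pseudo-covariances and 1 pseudo-mean, and the pattern with two missing variables contributes 7 and 2, so 14 pseudo-values are removed from the reported degrees of freedom.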
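Note 4 describes how out-of-range sample values of RMSEA and CFI are handled. The following Python sketch applies those rules, assuming the common forms RMSEA = sqrt((T - df) / (df (N - 1))) and CFI = 1 - (T_M - df_M) / (T_B - df_B), where T is a chi-square test statistic, M denotes the hypothesized model, and B the baseline model. The article's Equations (12) and (13) may differ in details such as N versus N - 1, and the function names are hypothetical.

```python
import math


def sample_rmsea(t_stat, df, n):
    """Sample RMSEA, set to zero when T < df (assumed form with N - 1)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    if t_stat < df:
        # The quantity under the square root would be negative.
        return 0.0
    return math.sqrt((t_stat - df) / (df * (n - 1)))


def sample_cfi(t_model, df_model, t_base, df_base):
    """Sample CFI, truncated to the interval [0, 1]."""
    denom = t_base - df_base
    if denom <= 0:
        raise ValueError("baseline model must have T_B > df_B")
    cfi = 1.0 - (t_model - df_model) / denom
    return min(max(cfi, 0.0), 1.0)


# Hypothetical statistics for a model with 40 df fitted to N = 201 cases.
print(sample_rmsea(t_stat=50.0, df=40, n=201))
print(sample_rmsea(t_stat=30.0, df=40, n=201))  # T < df, so RMSEA is set to 0.0
print(sample_cfi(t_model=30.0, df_model=40, t_base=1000.0, df_base=45))  # above 1, truncated to 1.0
```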
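Note 8 combines the rational root theorem with a graphical search for the root. Because (A10) is not reproduced here, the Python sketch below uses a hypothetical integer-coefficient polynomial, x^3 - 2x - 5, and substitutes bisection for graphing to locate the real root numerically. The helper names are likewise hypothetical.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from the highest degree down."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def rational_roots(coeffs):
    """Test every candidate p/q allowed by the rational root theorem."""
    lead, const = coeffs[0], coeffs[-1]
    if lead == 0 or const == 0:
        raise ValueError("leading and constant coefficients must be nonzero")
    candidates = {sign * Fraction(p, q)
                  for p in divisors(const) for q in divisors(lead) for sign in (1, -1)}
    return sorted(r for r in candidates if evaluate(coeffs, r) == 0)


def bisect(coeffs, lo, hi, tol=1e-12):
    """Locate a root in [lo, hi], assuming the polynomial changes sign there."""
    f_lo = evaluate(coeffs, lo)
    if f_lo * evaluate(coeffs, hi) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = evaluate(coeffs, mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


coeffs = [1, 0, -2, -5]          # hypothetical: x^3 - 2x - 5
print(rational_roots(coeffs))    # candidates are +/-1 and +/-5; none is a root, so []
print(bisect(coeffs, 2.0, 3.0))  # real root near 2.0946
```

If a candidate evaluated to zero, the polynomial could be factored exactly; when none does, a numerical or graphical search is needed to locate the root.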