Examining the effect of missing data on RMSEA and CFI under normal theory full-information maximum likelihood

 

Abstract

Normal theory full-information maximum likelihood (FIML) is a common estimation technique for incomplete data in structural equation modeling (SEM). However, it is not commonly known that approximate fit indices (AFIs) can be distorted, relative to their complete data counterparts, when FIML is used to handle missing data. In this article, we show that the two most popular AFIs, the root-mean-square error of approximation (RMSEA) and the comparative fit index (CFI), often approach different population values under FIML estimation when missing data are present. By deriving the FIML fit function for incomplete data and showing that it differs from the usual maximum likelihood (ML) fit function for complete data, we provide a mathematical explanation for this phenomenon. We also present several analytic examples, as well as the results of two large-sample simulation studies, to illustrate how AFIs change with missing data.

Notes

1 Technically, ignorability requires an additional condition that the parameters of the missingness mechanism are independent of the model parameters (Little & Rubin, 2002).

2 All computer code and results from this article are available on Open Science Framework (OSF) at https://osf.io/uvpab/?view_only=e739f447ff1045bc8ff773838a2a0faf.

3 To use the MG fit function for handling missing data, pseudo-values corresponding to cases with missing data have to be inserted in the covariance matrices of the missing data patterns, and the degrees of freedom need to be adjusted for these pseudo-values after fitting the model. See Chapter 8 of Bollen (1989) for a detailed explanation.

4 Equation (12) is not defined if $T_{LR} - df < 0$; in that case, the sample RMSEA is set to zero. Similarly, the sample CFI in Equation (13) is set to 0 if it falls below 0 and to 1 if it exceeds 1.

5 We cannot compute the traditionally defined CFI because the traditional independence model is not nested within the highly restrictive hypothesized model used in this example.

6 In most SF conditions, the variables with missing data had correlated residuals in the population model. However, in the SF conditions where four variables have missing data but only two variables have a correlated residual, two of the variables with missing data will not include a correlated residual.

7 Another advantage of the MI approach is its ability to handle mixtures of incomplete categorical and continuous variables (Enders & Mansolf, 2018).

8 Using the rational root theorem, we can show that the function in (A10) has no rational root. We solved for ψ by graphing the function.
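
The truncation rules described in Note 4 can be illustrated with a minimal Python sketch. Equations (12) and (13) are not reproduced in this section, so the sketch assumes the commonly used sample definitions: RMSEA as the square root of (T_LR − df) / (df · N), and CFI as one minus the ratio of the hypothesized model's estimated noncentrality to the baseline model's. The function names, argument names, and the choice of N rather than N − 1 as the scaling constant are illustrative assumptions, not the article's own code (which is available at the OSF link in Note 2).

```python
import math


def sample_rmsea(t_lr: float, df: int, n: int) -> float:
    """Sample RMSEA, set to zero when T_LR - df < 0 (see Note 4).

    Assumes the common form sqrt((T_LR - df) / (df * n)); some software
    scales by n - 1 instead of n.
    """
    if df <= 0 or n <= 0:
        raise ValueError("df and n must be positive")
    noncentrality = t_lr - df
    if noncentrality < 0:
        # The square root is undefined here, so the index is set to zero.
        return 0.0
    return math.sqrt(noncentrality / (df * n))


def sample_cfi(t_h: float, df_h: int, t_b: float, df_b: int) -> float:
    """Sample CFI, truncated to the [0, 1] interval (see Note 4).

    t_h, df_h: test statistic and degrees of freedom of the hypothesized model.
    t_b, df_b: the same quantities for the baseline model.
    Assumes the baseline noncentrality estimate t_b - df_b is positive.
    """
    baseline_nc = t_b - df_b
    if baseline_nc <= 0:
        raise ValueError("baseline noncentrality estimate must be positive")
    raw = 1.0 - (t_h - df_h) / baseline_nc
    # Values outside the bounds are set to the nearest bound.
    return min(max(raw, 0.0), 1.0)
```

For instance, under these assumed definitions, a model with T_LR = 8 on 10 degrees of freedom yields a sample RMSEA of zero and a sample CFI of one, because the estimated noncentrality is negative in both cases.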

Additional information

Funding

This research was supported by grant RGPIN-2015-05251 from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Victoria Savalei.
