Abstract
Incomplete nonnormal data are common in applied research. Although methodologists usually treat these 2 problems separately, they often co-occur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to incomplete data and evaluates their performance via a Monte Carlo study. The focus is on statistics that also perform well in small samples. The following statistics are defined and studied: corrected residual-based statistic, residual-based F statistic, scaled chi-square, adjusted chi-square, Bartlett-corrected scaled chi-square, and Swain-corrected scaled chi-square. Both Type I error rates and power are studied with nonnormally distributed data that are missing completely at random, under varying degrees of nonnormality. Sample size, model size, and number of variables containing missingness are also varied. For power comparisons, both minor and major model misspecifications are considered. Two statistics had the best Type I error control and power: the adjusted chi-square and the Bartlett-corrected scaled chi-square. These statistics are recommended to practitioners. It is concluded that model fit can be assessed reliably and with sufficient power even at the intersection of all 3 problems: incomplete data, nonnormality, and small sample size.
Notes
1. In the context of missing data, the ML estimator is often relabeled full information ML (FIML), but there is no need for this additional terminology.
2. For completeness, we note that a completely different approach to dealing with nonnormality is to abandon ML estimation and either switch to an estimator appropriate for the type of data (e.g., elliptical theory) or use the asymptotically distribution-free (ADF; Browne, 1984) estimation approach. We do not study these approaches here.
3. With incomplete data, the extension of the Satorra–Bentler chi-square is sometimes called the Yuan–Bentler chi-square, as Yuan and Bentler (2000) developed the extension. From this point on, both are simply referred to as the scaled chi-square.
4. With complete data, if there is no mean structure, means can be omitted. We include them here because with incomplete data they need to be estimated.
5. For example, Nevitt and Hancock (2004) found that at most 2% of models failed to converge under ML across a variety of studied conditions, whereas 1% to 15% of models failed to converge under ADF for larger samples, and as many as 20% to 40% in smaller samples.
6. This study included other sample sizes. Here, we summarized only the results pertaining to residual-based statistics.
7. The large-sample version (parallel to Equation 1) of the residual-based statistic under constraints is given in Browne (1982; Equation 1.7.18). The small-sample version (parallel to Equation 2) is being worked out and will be reported elsewhere.
8. The original residual-based statistic in Equation 2 was computed but not reported, because its performance falls in line with previous research: It rejected nearly all models in all conditions, and probably requires thousands of cases to do well.