Abstract
RMSEA estimation given nonnormal continuous data is usually based on the mean-adjusted or mean-variance-adjusted chi-square statistic, but a plain application of these statistics performs poorly. Savalei and colleagues gave a better way (the BSL method) to infer RMSEA using either adjusted statistic. However, the BSL method is applicable to continuous data only. For categorical data, RMSEA inference is currently still based on a plain application of the mean-adjusted or mean-variance-adjusted statistic, even though this practice is already problematic for continuous data. In this paper, we first show that it is more meaningful to define RMSEA under unweighted least squares (ULS) than under weighted least squares (WLS) or diagonally weighted least squares (DWLS). Then, we propose a correct point estimator and confidence interval for RMSEA given categorical data and ULS. Simulation results show that our methods perform well, whereas all the traditional methods break down.
Notes
1 Combining Equations (6) and (9), when the model is correct (i.e., all […]), the real distribution of […] is characterized by […].
2 Note that WLSM and WLSMV refer to the M-adjusted and MV-adjusted procedures for DWLS, not for WLS.
3 It is unclear to us how Savalei (2018) arrived at the recommendation against applying the BSL methods to categorical data. The argument in Savalei (2018) was that a correct fit function requires a specific weight matrix, which is not the weight used in ULS or DWLS (p. 425). However, for a scalar-valued function of the sample moments and the model-implied moments to be a legitimate fit function, only three properties are required: (a) it is nonnegative for any values of its two arguments; (b) it equals zero if and only if the two arguments are equal; (c) it is twice differentiable with respect to both arguments (e.g., Browne, 1984, p. 64). Both ULS and DWLS satisfy these requirements, and therefore the argument in Savalei (2018) is unconvincing. Nevertheless, our simulation results in a later section indicate that the BSL-M and BSL-MV CIs both have poor empirical coverage given categorical data, whereas the new CI we propose performs well. Now that a new CI with satisfactory coverage is available, we do not pursue the question of why the BSL CIs are inapplicable to categorical data.
4 A saturated model simply means the number of model parameters is the same as the number of data elements. Suppose the data have […] thresholds and the model has […] as the parameters for thresholds; then […] is saturated. However, this does not necessarily cause […], […], and so forth, because […] to […] can still freely take values in the model fitting process.
5 See https://bit.ly/2xEzeEB for the R code and simulation files.
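The three fit-function requirements listed in Note 3 can be checked numerically for quadratic-form discrepancies of the ULS and DWLS type. The sketch below is illustrative only and is not taken from the authors' R code: the function names `quad_fit` and `second_derivative` are hypothetical, the moment vectors are random numbers, and the DWLS weights are arbitrary positive values standing in for whatever diagonal weights an actual analysis would use.

```python
import random


def quad_fit(s, sigma, weights=None):
    """Quadratic-form discrepancy: sum_i w_i * (s_i - sigma_i)^2.

    weights=None uses unit weights (a ULS-type function); a list of
    positive weights gives a DWLS-type function. Both the name and
    the weights are illustrative assumptions.
    """
    if weights is None:
        weights = [1.0] * len(s)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, s, sigma))


def second_derivative(f, x, i, h=1e-4):
    """Central finite-difference second derivative of f along coordinate i."""
    up = list(x)
    up[i] += h
    down = list(x)
    down[i] -= h
    return (f(up) - 2.0 * f(x) + f(down)) / h ** 2


random.seed(0)
s = [random.uniform(-1, 1) for _ in range(6)]        # stand-in sample moments
sigma = [random.uniform(-1, 1) for _ in range(6)]    # stand-in model-implied moments
dwls_w = [random.uniform(0.5, 2.0) for _ in range(6)]  # hypothetical positive weights

for label, w in (("ULS", None), ("DWLS", dwls_w)):
    value = quad_fit(s, sigma, w)
    at_equal = quad_fit(s, s, w)
    # Curvature with respect to the first model-implied moment;
    # for a quadratic form this is 2 * w_0 everywhere.
    curvature = second_derivative(lambda v: quad_fit(s, v, w), sigma, 0)
    print(label, "F > 0 when arguments differ:", value > 0.0)
    print(label, "F == 0 when arguments are equal:", at_equal == 0.0)
    print(label, "finite second derivative:", round(curvature, 4))
```

A numerical check at one pair of points does not prove property (b). For a sum of squares with strictly positive weights, the "if and only if" follows directly, because the sum can vanish only when every difference is zero.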