Abstract
A 2-stage robust procedure as well as an R package, rsem, were recently developed for structural equation modeling with nonnormal missing data by Yuan and Zhang (2012). Several test statistics that have been used for complete data analysis are employed to evaluate model fit in the 2-stage robust method. However, properties of these statistics under robust procedures for incomplete nonnormal data analysis have never been studied. This study aims to systematically evaluate and compare 5 test statistics, including a test statistic derived from normal-distribution-based maximum likelihood, a rescaled chi-square statistic, an adjusted chi-square statistic, a corrected residual-based asymptotically distribution-free chi-square statistic, and a residual-based F statistic. These statistics are evaluated under a linear growth curve model by varying 8 factors: population distribution, missing data mechanism, missing data rate, sample size, number of measurement occasions, covariance between the latent intercept and slope, variance of measurement errors, and downweighting rate of the 2-stage robust method. The performance of the test statistics varies, and the one derived from the 2-stage normal-distribution-based maximum likelihood performs much worse than the other four. Application of the 2-stage robust method and of the test statistics is illustrated through growth curve analysis of mathematical ability development, using data on the Peabody Individual Achievement Test mathematics assessment from the National Longitudinal Survey of Youth 1997 Cohort.
ACKNOWLEDGMENT
We would like to thank Dr. Scott Maxwell for his suggestions and comments that have significantly improved this study.
Notes
1. The missing data rate as a whole is calculated by dividing the number of missing data points by the total number of data points in the data set. For example, when T = 4, y1 is completely observed, and y2, y3, and y4 each have a 40% probability of being missing, the missing data rate as a whole is (3 × 40% × N)/(4N) = 30%.
2. Other information, such as the empirical mean and standard error of each test statistic, has also been obtained but is not reported here for the sake of space. Such information is available on request.
3. Because the results for different alpha levels (.01, .05, and .1) are similar, we only present the results for alpha level .05 to save space. Complete simulation results are available on request and can be viewed at http://nd.psychstat.org/research/sem2013a
4. C3 shows results similar to those of C2.
5. Auxiliary variables are variables that are not in the model of interest, but whose inclusion in the analysis can be beneficial (Enders, 2010). They are related either to the variables containing missing values, to the cause of missingness itself, or to both.
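The arithmetic in Note 1 can be checked with a short sketch. It assumes every measurement occasion contributes the same number of data points (N), so the expected overall rate reduces to the mean of the per-occasion missingness probabilities. The function name `overall_missing_rate` is hypothetical and is not part of the rsem package.

```python
def overall_missing_rate(per_occasion_rates):
    """Expected overall missing data rate.

    Assumes each occasion has the same number of data points (N),
    so N cancels and the rate is the mean of the per-occasion
    missingness probabilities.
    """
    return sum(per_occasion_rates) / len(per_occasion_rates)


# Example from Note 1: T = 4, y1 completely observed,
# y2, y3, and y4 each missing with probability 0.40.
rates = [0.0, 0.4, 0.4, 0.4]
rate = overall_missing_rate(rates)
print(f"Overall missing data rate: {rate:.0%}")
```

Because N cancels from the numerator and denominator, the result does not depend on sample size, only on the per-occasion probabilities and the number of occasions T.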