Abstract
This article provides the theory and application of the 2-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis makes the missing data mechanism more likely to be MAR. It is much easier to include auxiliary variables in the 2-stage ML than in the direct ML. Based on the most recent developments for missing data with an unknown population distribution, the article first gives a minimally technical account of why the normal distribution-based ML generates consistent parameter estimates when the missing data mechanism is MAR. The article also provides sufficient conditions for the 2-stage ML to be a valid statistical procedure in the general case. For the application of the 2-stage ML, a SAS/IML program is given to perform the first-stage analysis, and EQS code is provided to perform the second-stage analysis. An example with open- and closed-book examination data is used to illustrate the application of the provided programs. One aim is for quantitative graduate students and applied psychometricians to understand the technical details of missing data analysis. Another aim is for applied researchers to use the method properly.
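The first stage of the 2-stage ML produces normal-theory ML estimates of the saturated means and covariances from incomplete data, and the second stage fits the structural model to those estimates. The sketch below shows one standard way to obtain the first-stage estimates, the EM algorithm. It is a minimal illustration under stated assumptions, not the authors' SAS/IML program: missing values are assumed to be coded as `NaN` (rather than a numeric code such as −99), every variable is assumed to have some observed values, and the function name `em_saturated` is hypothetical.

```python
import numpy as np


def em_saturated(X, max_iter=1000, tol=1e-10):
    """Normal-theory ML estimates of the saturated mean vector and covariance
    matrix from incomplete data via the EM algorithm (illustrative sketch).

    X: (n, p) array; missing entries are assumed to be coded as np.nan.
    Returns (mu, sigma); sigma uses divisor n (ML), not n - 1.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Start values: available-case means and variances, zero covariances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(max_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            o = ~m
            x = X[i].copy()
            c = np.zeros((p, p))  # conditional covariance of the missing part
            if m.any():
                if o.any():
                    s_oo = sigma[np.ix_(o, o)]
                    s_mo = sigma[np.ix_(m, o)]
                    # E-step: regression of missing on observed variables,
                    # coef = Sigma_mo * Sigma_oo^{-1}.
                    coef = np.linalg.solve(s_oo, s_mo.T).T
                    x[m] = mu[m] + coef @ (X[i, o] - mu[o])
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ s_mo.T
                else:
                    # Nothing observed in this case: use the current moments.
                    x = mu.copy()
                    c = sigma.copy()
            sum_x += x
            sum_xx += np.outer(x, x) + c
        # M-step: update the moments from the expected sufficient statistics.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        change = max(np.max(np.abs(new_mu - mu)),
                     np.max(np.abs(new_sigma - sigma)))
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma
```

In this sketch, auxiliary variables enter simply as extra columns of `X`; a second-stage SEM would then be fitted to the rows and columns of `mu` and `sigma` that belong to the substantive variables. The second-stage model fit and its standard errors are not shown.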
Notes
1 The theorem states that if $x_n \xrightarrow{\mathcal{L}} x$, $a_n$ converges in probability to $a$, and $b_n$ converges in probability to $b$, then $a_n x_n + b_n \xrightarrow{\mathcal{L}} ax + b$.
2 The variable Algebra is chosen because $T_{\mathrm{ML}}$ is highly significant when it is included in model (18).
3 The value can easily be changed in the program if −99 is a possible value in the real data.
4 The main program is at the end of the file twosML.sas on the Web.
5 The file only contains the $(p_s + p_s^* + 1) \times (p_s + p_s^* + 1)$ numbers that belong to “$\mathrm{SWN}_s$” = EQS from the output of twosML.sas. The $p_s^* \times p_s^*$ numbers corresponding to $\mathrm{SWN}\Sigma_s$ should be saved separately for covariance structure analysis only, as in Appendix E.