Original Articles

SEM with Missing Data and Unknown Population Distributions Using Two-Stage ML: Theory and Its Application

Pages 621-652 | Published online: 18 Dec 2008
 

Abstract

This article provides the theory and application of the two-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis makes the missing data mechanism more likely to be MAR, and auxiliary variables are much easier to include in two-stage ML than in direct ML. Drawing on the most recent developments for missing data with an unknown population distribution, the article first gives a minimally technical explanation of why normal-distribution-based ML yields consistent parameter estimates when the missing data mechanism is MAR. It also provides sufficient conditions for two-stage ML to be a valid statistical procedure in the general case. For applying two-stage ML, a SAS/IML program is given to perform the first-stage analysis, and EQS code is provided to perform the second-stage analysis. An example with open- and closed-book examination data illustrates the use of these programs. One aim is to help quantitative graduate students and applied psychometricians understand the technical details of missing data analysis; another is to help applied researchers use the method properly.
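As a minimal sketch of what a first-stage analysis computes, assuming a multivariate normal working model and MAR missingness, the Python function below runs the textbook EM algorithm for the saturated mean vector and covariance matrix. It is not the article's SAS/IML program: the function name `em_saturated`, the simulated data, and the convergence settings are hypothetical. The usage lines also show why auxiliary variables are convenient in two-stage ML: they enter only the first stage and are then dropped by keeping the sub-vector and sub-matrix of the substantive variables, which would be passed to the second-stage structural model.

```python
import numpy as np


def em_saturated(X, max_iter=500, tol=1e-8):
    """EM estimates of the normal-theory ML mean vector and covariance
    matrix (saturated model) when missing entries of X are NaN."""
    X = np.asarray(X, dtype=float)
    # Rows with no observed values add nothing to the observed-data likelihood.
    X = X[~np.all(np.isnan(X), axis=1)]
    n, p = X.shape
    miss = np.isnan(X)
    # Start values: available-case means and a diagonal covariance matrix.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(max_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            xi = X[i].copy()
            ci = np.zeros((p, p))  # conditional covariance of the missing part
            if m.any():
                s_oo = sigma[np.ix_(o, o)]
                s_mo = sigma[np.ix_(m, o)]
                # Regression of missing on observed variables: S_mo S_oo^{-1}.
                beta = np.linalg.solve(s_oo, s_mo.T).T
                # E-step: conditional mean and covariance of the missing values.
                xi[m] = mu[m] + beta @ (xi[o] - mu[o])
                ci[np.ix_(m, m)] = sigma[np.ix_(m, m)] - beta @ s_mo.T
            sum_x += xi
            sum_xx += np.outer(xi, xi) + ci
        # M-step: update the mean and covariance from expected sufficient statistics.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        new_sigma = (new_sigma + new_sigma.T) / 2
        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


# Hypothetical example: y1 and y2 are substantive variables, z is auxiliary.
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
y1 = 0.8 * z + rng.normal(scale=0.6, size=n)
y2 = 0.5 * y1 + 0.4 * z + rng.normal(scale=0.7, size=n)
X = np.column_stack([y1, y2, z])
drop = (z > 0.5) & (rng.random(n) < 0.6)  # missingness of y2 depends on z
X[drop, 1] = np.nan

mu_all, sigma_all = em_saturated(X)  # stage 1 includes the auxiliary variable
keep = [0, 1]                        # substantive variables only
mu_s = mu_all[keep]
sigma_s = sigma_all[np.ix_(keep, keep)]
```

With complete data the function simply returns the sample mean and the covariance matrix with divisor n, which is the normal-theory ML estimate.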

Notes

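1. The theorem states that if $x_n \xrightarrow{\mathcal{L}} x$ (convergence in distribution), $a_n$ converges in probability to $a$, and $b_n$ converges in probability to $b$, then $a_n x_n + b_n \xrightarrow{\mathcal{L}} ax + b$.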
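The theorem in Note 1 is Slutsky's theorem. A standard illustration, not taken from the article: let $\bar{x}_n$ and $s_n$ be the mean and standard deviation of $n$ i.i.d. observations with mean $\mu$ and finite variance $\sigma^2 > 0$. By the central limit theorem $x_n = \sqrt{n}(\bar{x}_n - \mu)/\sigma \xrightarrow{\mathcal{L}} N(0,1)$, and since $s_n$ converges in probability to $\sigma$, $a_n = \sigma/s_n$ converges in probability to $1$. Taking $b_n = 0$ gives the asymptotic normality of the studentized mean:

$$a_n x_n + b_n = \frac{\sqrt{n}\,(\bar{x}_n - \mu)}{s_n} \xrightarrow{\mathcal{L}} 1 \cdot N(0,1) + 0 = N(0,1).$$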

2. The variable Algebra was chosen because $T_{ML}$ is highly significant when it is included in model (18).
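Note 2 refers to $T_{ML}$, the normal-theory ML test statistic. As a hedged sketch of how such a statistic is formed from second-stage input, the functions below evaluate the normal-theory ML discrepancy function between first-stage moments and model-implied moments and multiply it by a sample size. The names `f_ml` and `t_ml`, the use of n rather than n − 1 as the multiplier, and the toy independence model are assumptions for illustration; this is not the article's EQS setup. In a real analysis the model-implied matrix would come from minimizing `f_ml` over the structural parameters; here it is supplied directly. The chi-square reference used in the usage lines is likewise an assumption, not a result established for two-stage estimates.

```python
import math

import numpy as np


def f_ml(S, Sigma, xbar=None, mu=None):
    """Normal-theory ML discrepancy between sample moments (S, xbar) and
    model-implied moments (Sigma, mu); it is zero when they coincide."""
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    sigma_inv = np.linalg.inv(Sigma)
    f = logdet_sigma + np.trace(S @ sigma_inv) - logdet_s - p
    if xbar is not None and mu is not None:
        # Mean-structure term (xbar - mu)' Sigma^{-1} (xbar - mu).
        d = np.asarray(xbar, dtype=float) - np.asarray(mu, dtype=float)
        f += d @ sigma_inv @ d
    return float(f)


def t_ml(n, S, Sigma, xbar=None, mu=None):
    """Hypothetical scaling T = n * F_ML; some programs use n - 1 instead."""
    return n * f_ml(S, Sigma, xbar, mu)


S = np.array([[1.0, 0.5], [0.5, 1.0]])  # e.g., a first-stage covariance estimate
Sigma_indep = np.diag(np.diag(S))       # ML fit of an independence model (df = 1)
T = t_ml(100, S, Sigma_indep)           # 100 * (-log 0.75), about 28.77
p_value = math.erfc(math.sqrt(T / 2))   # chi-square upper tail, valid for df = 1 only
```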

3. The value can easily be changed in the program if −99 is a possible value in the real data.
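Note 3 concerns the code −99 used for missing values. A minimal sketch of the corresponding preprocessing, assuming the data arrive as a numeric array: the missing-value code is a parameter (so it can be changed as the note suggests), it is recoded to NaN, and rows are grouped by missing-data pattern, a common step before a first-stage EM analysis. The function names and the default code are hypothetical and do not reproduce the logic of twosML.sas.

```python
import numpy as np

MISSING_CODE = -99.0  # hypothetical default; change it if -99 can be a real value


def recode_missing(raw, code=MISSING_CODE):
    """Return a float copy of raw with the missing-value code replaced by NaN."""
    X = np.array(raw, dtype=float)
    X[X == code] = np.nan
    return X


def missing_patterns(X):
    """Map each observed (True) / missing (False) pattern to its row indices."""
    patterns = {}
    for i, row in enumerate(~np.isnan(X)):
        patterns.setdefault(tuple(bool(v) for v in row), []).append(i)
    return patterns


raw = [[1, 2, -99], [3, -99, -99], [4, 5, 6], [7, 8, -99]]
X = recode_missing(raw)
pats = missing_patterns(X)  # three distinct patterns among the four rows
```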

4. The main program is at the end of the file twosML.sas on the Web.

5. The file contains only the $(p_s + p_s^* + 1) \times (p_s + p_s^* + 1)$ numbers that belong to “$\mathrm{SWN}_s$” = EQS from the output of twosML.sas. The $p_s\,p_s^*$ numbers corresponding to $\mathrm{SWN}\Sigma_s$ should be saved separately for covariance structure analysis only, as in Appendix E.
