Original Articles

SEM with Missing Data and Unknown Population Distributions Using Two-Stage ML: Theory and Its Application

Pages 621-652 | Published online: 18 Dec 2008
 

Abstract

This article provides the theory and application of the 2-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis will make the missing data mechanism more likely to be MAR. It is much easier to include auxiliary variables in the 2-stage ML than in the direct ML. Based on the most recent developments for missing data with an unknown population distribution, the article first provides the least technical material on why the normal distribution-based ML generates consistent parameter estimates when the missing data mechanism is MAR. The article also provides sufficient conditions for the 2-stage ML to be a valid statistical procedure in the general case. For the application of the 2-stage ML, an SAS IML program is given to perform the first-stage analysis and EQS code is provided to perform the second-stage analysis. An example with open- and closed-book examination data is used to illustrate the application of the provided programs. One aim is for quantitative graduate students and applied psychometricians to understand the technical details of missing data analysis. Another aim is for applied researchers to use the method properly.
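To make the first-stage idea concrete, the following is a minimal Python sketch, not the article's SAS IML program. It assumes, as is usual for two-stage procedures, that the first stage estimates the saturated mean vector and covariance matrix by normal distribution-based ML, here computed with a plain EM algorithm. The function name `em_saturated_moments`, its parameters, and the use of −99 as the default missing-data code (mirroring note 3) are illustrative choices. The second stage, which fits the structural model to these estimates and adjusts the standard errors and test statistics, is not shown.

```python
import numpy as np

def em_saturated_moments(X, missing_code=-99.0, n_iter=500, tol=1e-8):
    """EM estimates of the saturated mean and covariance (normal-theory ML).

    Hypothetical helper for illustration; entries equal to `missing_code`
    are treated as missing.
    """
    X = np.asarray(X, dtype=float).copy()
    X[X == missing_code] = np.nan
    n, p = X.shape
    obs = ~np.isnan(X)

    # Starting values: available-case means and variances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))

    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            o, m = obs[i], ~obs[i]
            x_hat = X[i].copy()
            c = np.zeros((p, p))  # conditional covariance of the missing part
            if m.any():
                if o.any():
                    # E-step: regress missing variables on observed ones.
                    s_om = sigma[np.ix_(o, m)]
                    b = np.linalg.solve(sigma[np.ix_(o, o)], s_om).T
                    x_hat[m] = mu[m] + b @ (X[i, o] - mu[o])
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - b @ s_om
                else:
                    # Completely missing case: use the current moments.
                    x_hat[m] = mu[m]
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + c

        # M-step: update the moments from the expected sufficient statistics.
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        delta = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break
    return mu, sigma
```

With complete data the routine reduces to the sample mean and the covariance matrix with divisor n. Auxiliary variables can be included simply as extra columns of `X`, and the rows and columns for the model variables can then be extracted from the returned estimates before the second stage, which is one way to see why auxiliary variables are easy to accommodate in a two-stage approach.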

Notes

1. The theorem states that if $x_n$ converges in distribution to $x$, $a_n$ converges in probability to $a$, and $b_n$ converges in probability to $b$, then $a_n x_n + b_n$ converges in distribution to $ax + b$.

2. The variable Algebra is chosen because $T_{ML}$ is highly significant when it is included in model (18).

3. The value can be easily changed in the program if −99 is a possible value for real data.

4. The main program is at the end of the file twosML.sas on the Web.

5. The file only contains the $(p_s + p_s^* + 1) \times (p_s + p_s^* + 1)$ numbers that belong to “ SWN s ″ = EQS from the output of twosML.sas. The p s p s * numbers corresponding to SWNΣ s should be saved separately for only covariance structure analysis, as in Appendix E.
