Abstract
In the presence of omitted variables or similar validity threats, regression estimates are biased. Unbiased estimates (the causal effects) can be obtained in large samples by fitting the instrumental variables regression (IVR) model instead. The IVR model can be estimated using structural equation modeling (SEM) software or using econometric estimators such as two-stage least squares (2SLS). We describe 2SLS using SEM terminology and report a simulation study in which we generated data according to a regression model in the presence of omitted variables and fitted (a) a regression model using ordinary least squares, (b) an IVR model using maximum likelihood (ML) as implemented in SEM software, and (c) an IVR model using 2SLS. Coverage rates for the causal effect using regression methods are always unacceptably low (often 0). When using the IVR model, accurate coverage is obtained across all conditions when N = 500. Even when the IVR model is misspecified, it generally yields better coverage than regression. Differences between 2SLS and ML are small and favor 2SLS in small samples (N ≤ 100).
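The contrast between ordinary least squares and 2SLS described above can be illustrated with a minimal sketch. It is not the simulation design used in the study: the function names (`simulate`, `ols_slope`, `tsls_slope`), the single instrument, the path coefficients (0.5), and the causal effect (0.3) are all assumed values chosen only for illustration.

```python
import numpy as np


def simulate(n, beta=0.3, seed=0):
    """Draw (y, x, z) where u is an omitted common cause of x and y.

    All coefficients are illustrative assumptions, not the study's values.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument: affects y only through x
    u = rng.normal(size=n)                       # omitted variable, unavailable to the analyst
    x = 0.5 * z + 0.5 * u + rng.normal(size=n)   # predictor
    y = beta * x + 0.5 * u + rng.normal(size=n)  # outcome; beta is the causal effect
    return y, x, z


def ols_slope(y, x):
    """Slope from the simple regression of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def tsls_slope(y, x, z):
    """2SLS with one instrument: regress x on z, then y on the fitted x."""
    zc, xc = z - z.mean(), x - x.mean()
    x_hat = x.mean() + zc * (zc @ xc) / (zc @ zc)  # stage 1 fitted values
    return ols_slope(y, x_hat)                     # stage 2 slope


y, x, z = simulate(n=100_000)
print("OLS :", round(ols_slope(y, x), 3))
print("2SLS:", round(tsls_slope(y, x, z), 3))
```

Under these assumed values the population OLS slope is 0.7/1.5 ≈ 0.467 rather than 0.3, because the omitted variable contributes to the covariance between x and y, whereas the 2SLS slope converges to 0.3. The sketch reports point estimates only; it does not compute standard errors or coverage rates.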
Notes
1 In this article we consider the joint modeling of y, x, and z as commonly described in the SEM literature. We could have considered instead estimation of y and x conditional on z, in which case z may include binary variables, which are commonly used in applied research.
2 This is (N–1)/N times the sample covariance matrix, where N denotes sample size.
3 It would be correctly specified if […] or […] or, equivalently, […].
4 Since […] and we use […], these values lead to population values in the equivalent IVR model of […] = .1 and .2 (i.e., these are the population covariance values between the disturbances of the predictor and outcome).
5 There is a single degree of freedom available for testing. The usual recommended cutoffs for the RMSEA (Browne & Cudeck, 1993) should not be used to gauge the magnitude of model misfit when there are so few degrees of freedom (Kenny, Kaniskan, & McCoach, 2015). For instance, population RMSEAs ranged from .125 to .532 with an average of .24, suggesting extraordinarily poor fit. In contrast, SRMR values suggest that the models fit closely.
6 When there is a single outcome y, as in the simulation studies reported here, the ML estimator of the IVR model has a closed-form solution (Anderson & Rubin, 1949, 1950) and is referred to in the econometrics literature as the limited information ML (LIML) estimator; see Davidson and MacKinnon (2004) for technical details. However, in this article, we obtained the ML solution iteratively, as implemented in SEM software. This is referred to in the econometrics literature as full information ML.
7 Provided it yields consistent estimates.
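
The scaling in note 2 can be checked numerically. The sketch below assumes that the matrix in question is the covariance matrix computed with divisor N, and compares it with the usual sample covariance matrix (divisor N – 1); the function name `ml_covariance` and the random data are hypothetical and serve only as an illustration.

```python
import numpy as np


def ml_covariance(data):
    """Covariance matrix with divisor N (rows are observations)."""
    return np.cov(data, rowvar=False, bias=True)


rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))          # illustrative data: N = 50, 3 variables
n = data.shape[0]

s_sample = np.cov(data, rowvar=False)    # usual sample covariance, divisor N - 1
s_ml = ml_covariance(data)               # divisor N

# The two matrices differ only by the factor (N - 1) / N.
print(np.allclose(s_ml, (n - 1) / n * s_sample))
```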