Abstract
Statistical methods to identify mis-specifications of linear regression models with respect to the direction of dependence (i.e., whether x → y or y → x better approximates the data-generating mechanism) have received considerable attention. Direction dependence analysis (DDA) constitutes such a statistical tool and makes use of higher-moment information of variables to derive statements concerning directional model mis-specifications in observational data. Previous studies on direction of dependence mainly focused on statistical inference and guidelines for the selection from the two directionally competing candidate models (x → y versus y → x) while assuming the absence of unobserved common causes. The present study describes properties of DDA when confounders are present and extends existing DDA methodology by incorporating the confounder model as a possible explanation. We show that all three explanatory models can be uniquely identified under standard DDA assumptions. Further, we discuss the proposed approach in the context of testing competing mediation models and evaluate an organizational model proposing a mediational relation between school leadership and student achievement via school safety using observational data from an urban school district. Overall, DDA provides strong empirical support that school safety indeed has a causal effect on student achievement but suggests that important confounders are present in the school leadership–safety relation.
Notes
1 The relation between residuals and errors can be established as follows (see, e.g., Cook & Weisberg, 1982): In the general case, we have the linear model $y = X\beta + e$, where $y$ is a vector of observed responses, $X$ is a matrix of observed predictors, $\beta$ is a vector of unknown parameters, and $e$ is a vector of unknown errors. The vector of OLS residuals is given by $r = y - \hat{y} = (I - V)y$, with $\hat{y}$ being the vector of predicted outcome scores, $I$ the identity matrix, and $V = X(X^{\top}X)^{-1}X^{\top}$ the hat matrix. Because $VX = X$, and hence $(I - V)X = 0$, one obtains $r = (I - V)y = (I - V)(X\beta + e) = (I - V)e$, from which it follows that the relationship between raw residuals and errors depends only on the hat values. Thus, when hat values are small, raw residuals constitute a reasonable approximation of the errors. Alternatively, standardized or studentized residuals may be used, which have the advantage of additionally taking hat values into account.
2 The equilibrium assumption is also implicitly made in the analysis of cross-sectional data.
3 Example R code to perform DDA is provided in the online supplementary appendix. Software implementations of DDA and introductory material are available from www.ddaproject.com.
4 Collider biases can also occur when conditioning on variables that have been affected by a common exogenous variable or when conditioning on variables that temporally precede the exogenous variable (Greenland, 2003).
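
The identity in Note 1, that OLS residuals equal (I − V) applied to the errors, can be checked numerically. The following is a minimal Python sketch and not the authors' supplementary R code. It assumes a simple regression with an intercept and a single predictor, for which the hat matrix has the closed form h_ij = 1/n + (x_i − x̄)(x_j − x̄)/S_xx. The function names, the simulated data, and the parameter values are illustrative assumptions.

```python
import random


def hat_matrix(x):
    """Hat matrix V for a regression on an intercept and one predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [[1.0 / n + (xi - mean_x) * (xj - mean_x) / sxx for xj in x]
            for xi in x]


def ols_residuals(x, y):
    """Raw OLS residuals r = y - y_hat from fitting y on an intercept and x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def apply_i_minus_v(v, e):
    """Compute (I - V) e without forming I explicitly."""
    n = len(e)
    return [e[i] - sum(v[i][j] * e[j] for j in range(n)) for i in range(n)]


# Simulated data (assumed values): skewed, mean-zero errors.
rng = random.Random(1)
n = 200
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
e = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
y = [0.5 + 2.0 * xi + ei for xi, ei in zip(x, e)]

v = hat_matrix(x)
r = ols_residuals(x, y)          # residuals computed from the observed data
r_from_e = apply_i_minus_v(v, e)  # residuals computed from the true errors

# Largest discrepancy between the two routes (floating-point noise only).
max_gap = max(abs(a - b) for a, b in zip(r, r_from_e))
# Hat values are the diagonal of V; their sum equals the number of parameters.
hat_values = [v[i][i] for i in range(n)]

print("max |r - (I - V)e| =", max_gap)
print("largest hat value  =", max(hat_values))
```

In this sketch the two residual vectors agree up to rounding error, and the hat values shrink as n grows, which is the situation in which raw residuals approximate the unobserved errors well.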