Original Articles

Addressing unobserved endogeneity bias in accounting studies: control and sensitivity methods by variable type

Pages 545-571 | Published online: 02 Jul 2014
 

Abstract

This paper describes the control and sensitivity methods, together with their associated statistical routines, that accounting researchers can employ to address the important issue of unobserved (omitted) variable bias in regression and matching models, organised according to the types of variables employed. As in other social science disciplines, omitted variable bias (endogeneity) is an important and pervasive issue in observational (non-experimental) accounting research: causal inferences for endogenous explanatory variables are biased. The bias arises in regression models where an unobserved (confounding) variable is correlated with both the dependent (outcome) variable and the causal explanatory variable of interest (often a selection variable). The Heckman treatment effect model has been widely employed to control for hidden bias where the outcome is continuous and the endogenous selection variable is binary. However, limited (categorical) dependent variables are a common feature of accounting studies, and endogenous explanatory variables may be other than binary in nature. The purpose of this paper is to provide an overview of contemporary control methods, together with the statistical routines to implement them, which extend the Heckman approach to binary, multinomial, ordinal, count and percentile outcomes and to endogenous variables of various forms. These contemporary methods aim to improve causal estimates by controlling for hidden bias, though at the price of increased complexity. A simpler approach is to conduct sensitivity analysis. This paper also presents a synopsis of a number of sensitivity techniques, and their associated statistical routines, which accounting researchers can employ routinely to appraise the vulnerability of causal effects estimated with conventional regression and propensity score matching estimators to potential (simulated) unobserved bias.

Acknowledgements

I am grateful to two anonymous reviewers and to Elisabeth Dedman, Associate Editor, for helpful comments and suggestions. I am also grateful to Mark Clatworthy for helpful discussions and suggestions. Any errors are solely those of the author.

Notes

1. Note this principle underpins the use of multivariate regression models. Specifically, if we compare the mean audit fees of big 4 and non-big 4 auditees (univariate analysis), we find that big 4 clients incur substantially higher fees. This is factually correct, but it is uninformative regarding whether big 4 auditors charge an incremental premium – e.g. for conducting a higher quality audit – known as the treatment effect (below). Other relevant factors (such as client size and complexity) which determine both big 4 selection and fees must be controlled for in the regression model.
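To make the point in note 1 concrete, the following Python sketch simulates hypothetical audit-fee data in which client size drives both big 4 selection and fees. All parameter values (a 0.10 log-fee premium, the size effects, the sample size) are assumptions chosen for illustration, not estimates from any study. The univariate gap in mean log fees overstates the premium, whereas a regression that controls for size recovers it, because size is observed here. If size were omitted from the regression, the big 4 coefficient would inherit the same bias, which is the omitted variable problem the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
true_premium = 0.10  # assumed big 4 log-fee premium (hypothetical)

# Standardised client size (e.g. log assets) drives BOTH auditor choice and fees.
size = rng.normal(0.0, 1.0, n)
big4 = (0.8 * size + rng.normal(0.0, 1.0, n) > 0).astype(float)
log_fee = 2.0 + 0.6 * size + true_premium * big4 + rng.normal(0.0, 0.3, n)

# Univariate comparison: mean log fee of big 4 vs non-big 4 clients.
naive_gap = log_fee[big4 == 1].mean() - log_fee[big4 == 0].mean()

# Multivariate regression controlling for size: log_fee ~ 1 + size + big4.
X = np.column_stack([np.ones(n), size, big4])
coef, *_ = np.linalg.lstsq(X, log_fee, rcond=None)
adjusted_premium = coef[2]

print(f"naive gap: {naive_gap:.3f}, regression premium: {adjusted_premium:.3f}")
```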

2. Other things equal, the premium would be underestimated by the equivalent of the overestimate.

3. The mechanical implementation of any statistical estimator without sufficient thought to theoretical considerations and to correctly specifying the proposed model is clearly ill advised, not least with regard to the methods discussed in this paper.

4. The treatreg and ivprobit commands () are built into Stata and are documented in the Stata manuals. As well as being user-friendly, Stata has the major advantage that experts in their fields produce dedicated user-written modules for it ().

5. Where accounting studies are unavailable to illustrate the methods, applications in social science research are referenced and briefly described. These supplement the more technical statistical/econometric papers which are also referenced and described. It is hoped that they will be informative for researchers interested in implementing the techniques. Experience suggests that studying examples of the methods applied in extant empirical studies is fruitful.

6. On a computer with internet access, the user-written Stata modules (commands) described in this paper, including their help documentation, can be easily located and installed from within Stata by simply typing findit followed by the command names listed in and .

7. IV regression methods are extensively employed in economic research, where simultaneous causality bias frequently features. The origins of the method can be traced as far back as 1928, in an exposition of the estimation of demand and supply elasticities (Stock and Trebbi 2003).
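A minimal sketch of two-stage least squares (2SLS), the standard IV estimator, in Python with NumPy. The data are simulated under assumed values: an unobserved confounder c raises both x and y, while the instrument z shifts x but is unrelated to c. OLS is therefore biased upward, whereas 2SLS recovers the assumed slope of 0.5. Only point estimates are computed; standard errors taken from the naive second-stage OLS are incorrect and would need the usual 2SLS correction.

```python
import numpy as np

def two_stage_least_squares(y, X_exog, x_endog, Z_excl):
    """2SLS point estimates; returns coefficients on [X_exog..., x_endog]."""
    Z = np.column_stack([X_exog, Z_excl])  # full instrument set
    # First stage: project the endogenous regressor onto all instruments.
    pi, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    x_hat = Z @ pi
    # Second stage: regress y on the exogenous regressors and fitted values.
    W = np.column_stack([X_exog, x_hat])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)
    return beta

rng = np.random.default_rng(11)
n = 10_000
c = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument: moves x, unrelated to c
x = z + c + rng.normal(size=n)
y = 1.0 + 0.5 * x + c + rng.normal(size=n)
const = np.ones((n, 1))

ols_slope = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)[0][1]
iv_slope = two_stage_least_squares(y, const, x, z)[1]
print(f"OLS: {ols_slope:.3f}  2SLS: {iv_slope:.3f}  (assumed true slope 0.5)")
```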

8. Although ML methods are more efficient, they are more difficult to implement computationally than their two-step counterparts because they jointly estimate all the parameters, including those governing the joint distribution of ϵ1i and ϵ2i. Prior to the huge increase in computer power, two-step methods were sometimes preferred on this basis, particularly for large samples (Cong and Drukker 2000). The two estimators may produce similar results. For instance, Cong and Drukker (2000) report treatment variable coefficients of 1.26 (two-step) and 1.27 (ML) after controlling for selection bias in an empirical example which illustrates the application of the Stata treatreg command.
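The two-step logic in note 8 can be sketched as follows for a linear outcome and a binary endogenous treatment, as in the Heckman treatment effect model: (1) fit a probit selection model; (2) add the selection-correction (hazard) term built from the probit index to the outcome regression. This is an illustrative NumPy re-implementation on simulated data with assumed parameters (error correlation ρ = 0.6, true treatment effect 2.0), not the treatreg code. The variable w is assumed to affect selection but not the outcome. Only point estimates are returned, since valid two-step standard errors require a correction for the estimated first stage.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(x):
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def probit(Z, d, max_iter=100, tol=1e-10):
    """Probit MLE by Fisher scoring (assumes no perfect separation)."""
    g = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        zg = Z @ g
        p = np.clip(norm_cdf(zg), 1e-12, 1 - 1e-12)
        f = norm_pdf(zg)
        score = Z.T @ (f * (d - p) / (p * (1 - p)))
        info = Z.T @ (Z * (f**2 / (p * (1 - p)))[:, None])
        step = np.linalg.solve(info, score)
        g += step
        if np.max(np.abs(step)) < tol:
            break
    return g

def treatment_effect_twostep(y, X, d, Z):
    """Two-step estimates of the treatment coefficient and of rho*sigma."""
    zg = Z @ probit(Z, d)
    f = norm_pdf(zg)
    p = np.clip(norm_cdf(zg), 1e-12, 1 - 1e-12)
    # Selection-correction term (generalised residual of the probit).
    hazard = np.where(d == 1, f / p, -f / (1 - p))
    W = np.column_stack([X, d, hazard])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef[X.shape[1]], coef[-1]

# Simulated example: correlated errors (rho = 0.6), true effect = 2.0.
rng = np.random.default_rng(0)
n = 20_000
x1, w = rng.normal(size=n), rng.normal(size=n)  # w: excluded from outcome
u = rng.normal(size=n)
eps = 0.6 * u + math.sqrt(1 - 0.6**2) * rng.normal(size=n)
d = (0.2 + 0.5 * x1 + 1.0 * w + u > 0).astype(float)
y = 1.0 + 0.5 * x1 + 2.0 * d + eps

X = np.column_stack([np.ones(n), x1])
Z = np.column_stack([np.ones(n), x1, w])
ols = np.linalg.lstsq(np.column_stack([X, d]), y, rcond=None)[0][-1]
delta, rho_sigma = treatment_effect_twostep(y, X, d, Z)
print(f"OLS: {ols:.3f}  two-step: {delta:.3f}  rho*sigma: {rho_sigma:.3f}")
```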

9. As noted by a reviewer, the ML method sometimes suffers from non-convergence problems whereas the two-step method always results in convergence. This is more likely to be an issue where more complex multinomial specifications are employed, as shown in .

10. The SSRN records 360 citations of Leuz and Verrecchia's (2000) paper.

11. Tucker (2010, p. 44) also notes that it is ‘not advisable’ to use probit or ordered probit outcome models with probit selection models to correct for bias.

12. Rather than ML, simulated ML is used where, amongst other cases, multinomial variables (either as outcomes or as explanatory variables) are employed in models with endogenous variables. In such cases, estimation may involve high-dimensional integrals with no closed-form solution, such that simulated ML is the only viable estimator (see Arias and Cox 1999, for an informative discussion of the methodology).
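To see what "simulated" means in note 12, consider a frequency simulator for a bivariate normal probability of the kind that enters such likelihoods. The sketch below is illustrative only: practical simulated-ML routines typically use smooth simulators (such as the GHK simulator) so that the simulated likelihood can be maximised. The case a = b = 0 is chosen because it has a closed form, 1/4 + arcsin(ρ)/(2π), against which the simulation can be checked; the value ρ = 0.5 is an assumption.

```python
import math
import numpy as np

def mc_bivariate_orthant(a, b, rho, n_draws=200_000, seed=1):
    """Frequency-simulator estimate of P(u1 < a, u2 < b) for standard
    bivariate normal errors with correlation rho."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_draws)
    z2 = rng.standard_normal(n_draws)
    u1 = z1
    u2 = rho * z1 + math.sqrt(1.0 - rho**2) * z2  # Cholesky construction
    return np.mean((u1 < a) & (u2 < b))

est = mc_bivariate_orthant(0.0, 0.0, 0.5)
exact = 0.25 + math.asin(0.5) / (2 * math.pi)  # closed form when a = b = 0
print(f"simulated: {est:.4f}, exact: {exact:.4f}")
```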

13. The methodology is also appropriate (below) where Y is dichotomous or count in type (Deb and Trivedi 2006b).

14. In a similar manner to the error terms in the Heckman treatment effect model, note that the standard Pearson correlation coefficient for assessing the degree of linear association between two variables also assumes that the variables are jointly (bivariate) normally distributed.
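For reference, the sample Pearson coefficient mentioned in note 14 is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it directly and applies it to draws from an assumed bivariate normal distribution with ρ = 0.6, mimicking the joint normality assumed for the error terms in the Heckman model.

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation: covariance over product of std devs."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(7)
cov = [[1.0, 0.6], [0.6, 1.0]]  # assumed rho = 0.6
e1, e2 = rng.multivariate_normal([0.0, 0.0], cov, size=10_000).T
print(round(pearson_r(e1, e2), 3))
```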

15. Wilde's (2008, p. 121) code is written for implementation with the LIMDEP statistical package.

16. Lennox et al. (2012) show that a number of studies either do not use IVs or employ unsuitable ones, leading to a lack of robustness in reported empirical findings.

17. The documentation describing how mtreatreg is implemented (Deb 2009) provides clear guidance on how Y is specified for logit, count and OLS models.

18. The R package and R manuals can be downloaded from http://www.r-project.org/. The endogMNP module (and help files) can be downloaded from http://cran.r-project.org/web/packages/endogMNP/index.html.

19. The Stata poisgof command can be employed to test whether nbreg is preferable to poisson.
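Stata's poisgof reports a goodness-of-fit test after poisson; the sketch below is not that command but a simple analogue in the same spirit. It computes the Pearson χ² statistic divided by its degrees of freedom for an intercept-only Poisson fit (an assumption made for brevity; a real application would use the fitted means from the full regression). Ratios well above 1 point to overdispersion, in which case a negative binomial model such as nbreg may be preferable. The two simulated samples, one Poisson and one negative binomial with the same mean of 4, are hypothetical.

```python
import numpy as np

def pearson_dispersion(y):
    """Pearson chi-square / df for an intercept-only Poisson fit.
    Values well above 1 suggest overdispersion relative to Poisson."""
    mu = y.mean()  # Poisson MLE of the mean
    chi2 = np.sum((y - mu) ** 2 / mu)
    return chi2 / (len(y) - 1)

rng = np.random.default_rng(3)
y_pois = rng.poisson(lam=4.0, size=5_000)
# Negative binomial with mean 4: n=2, p=n/(n+mean), variance 4 + 4**2/2 = 12.
y_nb = rng.negative_binomial(n=2, p=2 / (2 + 4.0), size=5_000)
print(round(pearson_dispersion(y_pois), 2), round(pearson_dispersion(y_nb), 2))
```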

20. The quantile estimator is more robust to outliers because OLS minimises the sum of squared residuals, whereas median (least absolute deviation) regression minimises the sum of absolute residuals, and regression at other quantiles minimises an asymmetrically weighted sum of absolute residuals, thus giving less weight to outliers (Wooldridge 2010, p. 450).
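The contrast in note 20 can be checked in the simplest case, a constant-only model. The sketch below defines the check loss used in quantile regression (at the median, τ = 0.5, it equals half the absolute residual) and compares the constant that minimises it with the mean, which minimises squared residuals, on a small hypothetical sample containing one outlier. Searching only over observed values is valid because the check-loss objective is piecewise linear, so a minimum is attained at a data point.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile-regression check loss: tau*u if u >= 0, (tau - 1)*u if u < 0."""
    return u * (tau - (u < 0))

def best_constant(y, loss):
    """Minimise a loss over constant fits, searching the observed values."""
    return min(y, key=lambda c: loss(y - c).sum())

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
ols_fit = y.mean()                          # minimises sum of squared residuals
lad_fit = best_constant(y, lambda r: check_loss(r, 0.5))
print(ols_fit, lad_fit)  # 22.0 3.0
```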

21. Abadie et al. (2002, p. 426) note that if only a non-binary instrumental variable is available, it can be transformed into a binary one for identification purposes (see also note 26).

22. Altonji et al. (2005) specify a similar sensitivity method when examining the impact of the type of school attended on subsequent educational attainment. Employing probit selection models and probit and OLS outcome models, they examine the sensitivity of treatment estimates to simulated unobserved bias by reference to the correlation (ρ) between the errors of the selection and outcome models, as in the Heckman treatment effect equations of Section 2.2.1 above.

23. Frank's Excel formatted file (including instructions) is available from www.msu.edu/~kenfrank/research.htm.

24. As with Frank's (2000) method, for multinomial treatment variables, sensitivity analysis is conducted on each of the N-1 binary treatment variables included in the outcome model.
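Note 24 implies that a multinomial treatment with N categories enters the outcome model as N-1 binary indicators, with one category omitted as the reference. A minimal sketch of that coding step, using a hypothetical three-way auditor classification; the function name and category labels are illustrative assumptions.

```python
import numpy as np

def treatment_dummies(t, reference):
    """Return the N-1 binary indicators for a multinomial treatment,
    omitting the reference category."""
    levels = [lvl for lvl in sorted(set(t)) if lvl != reference]
    return {lvl: (np.asarray(t) == lvl).astype(int) for lvl in levels}

auditor = ["big4", "midtier", "local", "big4", "local"]  # hypothetical data
dummies = treatment_dummies(auditor, reference="local")
for level, col in dummies.items():
    print(level, col)  # each indicator would get its own sensitivity analysis
```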

25. The authors note (p. 2424) that their proposed model is preliminary in that further research is required regarding its statistical properties. It can be implemented in Stata employing treatreg (saving IMRs) and psmatch2.
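The matching half of the procedure in note 25 can be illustrated with one-to-one nearest-neighbour matching on the propensity score, with replacement, a matching scheme psmatch2 supports. The sketch below is a simplified stand-in, not psmatch2 itself: it takes the propensity scores as given (estimated beforehand, e.g. by probit), applies no caliper or common-support restriction, omits the IMR step, and computes only the point estimate of the average treatment effect on the treated (ATT). The six observations are made up.

```python
import numpy as np

def att_nearest_neighbour(pscore, treated, y):
    """ATT from 1-to-1 nearest-neighbour matching on the propensity score,
    with replacement (no caliper, no common-support trimming)."""
    pscore, treated, y = map(np.asarray, (pscore, treated, y))
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    # For each treated unit, find the control with the closest score.
    dist = np.abs(pscore[t_idx][:, None] - pscore[c_idx][None, :])
    matches = c_idx[dist.argmin(axis=1)]
    return float(np.mean(y[t_idx] - y[matches]))

ps = [0.20, 0.40, 0.60, 0.25, 0.45, 0.70]  # hypothetical propensity scores
d = [1, 1, 1, 0, 0, 0]                     # treatment indicator
y = [5.0, 6.0, 8.0, 4.0, 5.0, 6.5]         # hypothetical outcomes
print(att_nearest_neighbour(ps, d, y))
```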

26. As explained by Nannicini (2007, p. 6), non-binary Y variables can be transformed into binary ones, either in advance or using sensatt, for example by splitting above and below the mean or median. This may be helpful to accounting researchers when Y is ordinal and can be readily partitioned into (say) high versus lower ratings.
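The transformation described in note 26 amounts to a simple recoding. A sketch, splitting a hypothetical ordinal rating at its sample median; observations tied with the median are coded 0 here, which is an arbitrary choice the researcher should make explicitly, since with coarse ordinal scales the split can be unbalanced.

```python
import numpy as np

def dichotomise_at_median(y):
    """1 if y is above the sample median, else 0 (ties at the median -> 0)."""
    y = np.asarray(y, dtype=float)
    return (y > np.median(y)).astype(int)

ratings = [1, 2, 2, 3, 4, 5, 5]  # hypothetical ordinal ratings
print(dichotomise_at_median(ratings).tolist())  # [0, 0, 0, 0, 1, 1, 1]
```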

27. Among a number of interesting observations, Lennox et al. (2012, p. 589) note ‘the frequent comments by editors and reviewers of the need to control for endogeneity’ – and that (p. 610) ‘Although OLS is typically more robust, it can still yield incorrect inferences when selection bias is a significant concern. Nevertheless, robustness is an important criterion that researchers should take into account when evaluating their findings’.

28. If the researcher knows (e.g. with reference to prior research) or suspects a variable of potential import is omitted (e.g. because it is unavailable in an archival database), and has expectations regarding its likely impact, then the plausibility of control method causal estimates or sensitivity technique evaluations may be easier to assess.

29. As highlighted in Section 1, the exploitation of natural experiments offers a powerful methodology for addressing endogeneity concerns in accounting studies (Gassen 2013). As also noted in the Introduction, additional methods for dealing with endogeneity are available where studies employ panel data (Wooldridge 2010).
