
Let’s take the bias out of econometrics

Pages 81-98 | Received 20 Apr 2018, Accepted 16 Jul 2018, Published online: 19 Nov 2018
 

ABSTRACT

This study exposes the cognitive flaws of ‘endogeneity bias’. It examines how conceptualisation of the bias has evolved to embrace all major econometric problems, despite extensive lack of hard evidence. It reveals the crux of the bias – a priori rejection of causal variables as conditionally valid ones, and of the bias correction by consistent estimators – modification of those variables by non-uniquely and non-causally generated regressors. It traces the flaws to misconceptions about error terms and estimation consistency. It highlights the need to shake off the bias to let statistical learning play an active and formal role in econometrics.

JEL Classification:

Acknowledgement

The first draft of this paper came out under the title ‘Time to demystify endogeneity bias’ as SOAS working paper no. 192 in 2015. The current version is a substantial revision of the 2016 version of the working paper. During this lengthy process, I have received invaluable support, comments and suggestions from Ruben Lee, Sophie van Hüllen, Ugur Ergun, Simon Appleton, Shi Li, Lina Song, Xuheng Zang, participants at various seminars and conferences where the paper was presented, the editors of the Journal of Economic Methodology, and anonymous referees. I am grateful to them all.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes on contributor

Duo Qin, Professor of Economics, SOAS, University of London; expertise and research interests: the history and methodology of econometrics, especially from the applied modelling angle; development economics and international economics.

Notes

1 A book edited by Mayo and Spanos (2010) is a rare exception. However, a search with Google Scholar yields no citations of this book by econometricians or economists once self-citations are discounted.

2 Causal graphs, also known as directed acyclic graphs, are widely used in statistics and computing, e.g. see Pearl (2009), Wermuth and Cox (2011); see also Spirtes (2005) and Elwert (2013) for their potential in econometric and social research respectively.

3 This interpretation was implied in Wermuth’s (1992) in-depth analysis of how over-parameterisation in multivariate linear structural equations results in non-decomposable independence hypotheses, and identification conditions help to remove the over-parameterisation so as to achieve decomposable independence.

4 See Cox and Wermuth (2004) for more discussion of these cases.

5 Notice that maintaining model (2) in case (c) leads to nonsense regression.

6 For example, the treatment is perceived as a safeguard of the ceteris paribus condition; see Angrist and Pischke (2015, Introduction).

7 $r_i = \phi(\gamma_{x.V} V_i) / \Phi(\gamma_{x.V} V_i)$, where $\phi(\cdot)$ and $\Phi(\cdot)$ stand respectively for the density and the cumulative distribution function of the standard normal distribution.

8 For a detailed discussion on the conceptualisation of collinearity versus causal relationship, see Qin (2014).

9 The description of the two types of errors in association with model selection and assessment is given a prominent place in statistical learning textbooks, e.g. see Abu-Mostafa, Magdon-Ismail, and Lin (2012), James, Witten, Hastie, and Tibshirani (2013), Shalev-Shwartz and Ben-David (2014).

10 Historically, the unknown nature of the error term has long been recognised by various leading econometricians. For example, Frisch classified statistical variations into three types – systematic variations, accidental variations and disturbances – and assigned the latter two to the error term; see Bjerkholt and Qin (2010, Chapter 3). In the Cowles Commission works, the error term was described as ‘the joint effect of numerous separately insignificant variables that we … presume to be independent of observable exogenous variables’ (Marschak, 1953, p. 12). Subsequently, the error term was generally described as ‘the effect of all those factors which we cannot identify for one reason or another’ (Malinvaud, 1966, p. 74). However, none of these descriptions has been formally linked to the error term of bivariate regression models where endogeneity bias is defined in textbooks. See also Qin (2013, Chapter 8) for a history of the error term in time-series econometrics.

11 Cross-validation, i.e. the practice of splitting available data into training and testing subsets and utilising the testing errors as proxies for out-of-sample errors, is reviewed in a very sceptical tone by Leamer in the chapter ‘Model choice and specification analysis’ in the Handbook of Econometrics (1983a); it is briefly described as part of the kernel estimation procedure in Cameron and Trivedi (2005, Chapter 9).

12 See also Swamy, Tavlas, and Hall (2015) for a recent revisit and extension of their arguments.

13 Methodological implications of that research have also engaged the attention of philosophers, e.g. see Glymour (2010) and Russo (2014). For recent studies on the faithfulness condition, see Spirtes (2009), Zhang and Spirtes (2011). For discussions by researchers from other social science disciplines, see Morgan (2013), and Kalisch and Bühlmann (2014).

14 Insightful discussions among statisticians on the strategic importance of formulating statistical questions scientifically can also be found in Hand (1994), Senn (1998) and Breiman (2001).

15 Methodologically, the issue of the positions of model closure is also closely related to the recurring debate over the realism of econometrics and economics, e.g. see Sims (1980), Hoover (2000), Mäki (2002), Romer (2016).

16 See the disputes on measurement without discovery versus measurement without theory between T. C. Koopmans and R. Vining published in the Review of Economics and Statistics in the late 1940s, e.g. see Qin (1993, Chapter 6).

17 The latest resurgence can be found in Romer (2016). Another careful and recent examination of the conceptual links between identification, IVs, exogeneity and omitted variables from the angle of the experimental approach versus the structuralist approach can be found in Biørn (2017).

18 In the Econometric Theory (ET) interview of David Hendry, he recalled how the audience at the 1977 European Econometric Society conference was bewildered by J.-F. Richard’s presentation, which used conditional-expectation based sequencing to formalise the concept of exogeneity (Ericsson & Hendry, 2004). Another telling example of related communication failure can be found in the discussion of Wermuth (1992) between A.S. Goldberger and statisticians.

19 A brief historical account of this research and also the subsequent developments in programme evaluation methods is given in Qin (2015, p. 2.2). The following description is written to complement rather than repeat that account.

20 An extensive methodological discussion on randomisation and related model specification issues can be found in Leamer (1983b), from which the title of the present paper stems.

21 Such behaviour is referred to as ‘selection on unobservable’ in textbooks as opposed to ‘selection on observable’, which covers both omitted variable bias and sampling selection concerning comparability of the two groups.
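
As a numerical illustration of the ratio defined in note 7, the following minimal sketch computes the standard normal density divided by the standard normal cumulative distribution function at a given index value. It assumes that the expression in note 7 is this ratio (commonly called the inverse Mills ratio); the function names and the index values are hypothetical and chosen only for illustration, with `index` standing in for the product of the coefficient and $V_i$.

```python
import math


def standard_normal_pdf(z: float) -> float:
    """Density of the standard normal distribution, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def standard_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(index: float) -> float:
    """Ratio phi(index) / Phi(index).

    Assumption: `index` plays the role of the coefficient times V_i
    in note 7; it is not taken from any data in the paper.
    """
    return standard_normal_pdf(index) / standard_normal_cdf(index)


# Hypothetical index values, used only to show how the ratio behaves.
for index in (-1.0, 0.0, 1.0):
    print(f"index={index:+.1f}  ratio={inverse_mills_ratio(index):.4f}")
```

The ratio falls as the index rises, so observations with a low selection index receive a larger value of $r_i$ than those with a high one.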
