Correcting for Endogeneity in Models with Bunching: Journal of Business & Economic Statistics: Vol 42, No 3

Abstract

We develop a novel control function approach in models where the treatment variable has bunching at one corner of its support. This situation typically arises when the treatment variable is a constrained choice and some observations choose the corner solution. The method exploits distributional shape restrictions but makes no exclusion restrictions. We provide estimators and establish their asymptotic behavior, prove the convergence of the bootstrap, and develop tests of the identification assumptions. An application reveals that watching television has no effect on cognitive skills and a negative effect on noncognitive skills in children.

Supplementary Materials

The appendix provides generalizations of the approach to more flexible functional forms, details referenced in the paper as well as further results on identification, estimation and testing, a Monte Carlo study, and proofs.

Acknowledgments

We thank Marinho Bertanha, Stéphane Bonhomme, Juan Carlos Escanciano, Bruno Ferman, Simon Freyaldenhoven, Dalia Ghanem, Leonard Goff, Guido Imbens, Ingrid van Keilegom, Salvador Navarro, Rodrigo Pinto, Vitor Possebom, Pedro Sant’Anna, Azeem Shaikh, David Slichter, Christopher Taber, Aad van der Vaart, and seminar participants at various conferences and institutions for helpful discussions and comments. The analysis and conclusions set forth here are those of the authors and do not indicate concurrence by other members of the research staff, the Board of Governors, or the Federal Reserve System.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1 Meta-analyses of studies estimating the effects of these variables on health outcomes include Fawzi et al. (1993), Hernán et al. (2002), Reynolds et al. (2003), Noordzij et al. (2005), Bischoff-Ferrari et al. (2005), Oken, Levitan, and Gillman (2008), and Richardson, Elliott, and Roberts (2013).

2 See, for example, Peek, Rosengren, and Tootell (2003), Ekici and Dunn (2010), Bertrand et al. (2010), Brown, Coile, and Weisbenner (2010), Melzer (2011), Boserup, Kopczuk, and Kreiner (2016), and Erixson (2017).

3 See, for example, Luoh and Herzog (2002), James-Burdumy (2005), Eren and Henderson (2011), Bhutani et al. (2013), Holt et al. (2013), and Boulianne (2015).

4 See, for example, McDuffie et al. (1996), Black, Devereux, and Salvanes (2005), and Cohen (2008).

5 See, for example, Rozenas, Schutte, and Zhukov (2017), Erhardt (2017), Pang (2017), Bleemer (2018a), Bleemer (2018b), Ferreira, Ferreira, and Mariano (2018), Lavetti and Schmutte (2018), Caetano and Maheshri (2018), De Vito, Jacob, and Müller (2019), Caetano, Kinsler, and Teng (2019), Caetano, Caetano, and Nielsen (2022), and Caetano et al. (2022).

6 It may be difficult to conceptualize the idea that $X^{*}$ can be negative, as it would mean that someone would want to choose negative amounts of TV watching. It may be easier to think of $X^{*}$ at $X = 0$ as a measure of the distance from exact indifference between watching some TV and alternative activities. For instance, those at $X^{*} = - 0.1$ have characteristics, preferences and constraints that led them to be nearly indifferent between watching TV and another activity at $X = 0$, while those at $X^{*} = - 3$ have characteristics, preferences and constraints that led them to be farther from indifference at $X = 0$ (e.g., this family could be equal in every way to a family of type $X^{*} = 0$, except for having a higher relative preference for playing sports versus watching TV).

7 All equations and results involving random variables should be read as holding almost surely. $P$ denotes the probability, and details about the implied probability spaces and conditional sigma-algebras are self-evident and thus omitted. $\mathcal{Z}$ denotes the support of the distribution of Z, and $\mathcal{Z} | A$ denotes the support of the conditional distribution of Z given A. For brevity, we often mention the support of the variable V when we mean the support of the distribution of V. Finally, the expectation $E$ is assumed to exist wherever written.

8 This is not satisfied if elements of the parameter vector affect $F_{X^{*} | Z = z} (x)$ only when $x \leq 0.$ One example is the piecewise-linear distribution family with a kink at zero, which has cdf $F_{X^{*} | Z = z} (x) = (ω_{2} + ω_{1} x) \cdot 1 (- \frac{ω_{2}}{ω_{1}} \leq x \leq 0) + (ω_{2} + ω_{2} x) \cdot 1 (0 < x \leq \frac{1}{ω_{2}} - 1) + 1 (x > \frac{1}{ω_{2}} - 1) .$ Conditional on $X^{*} \leq 0,$ $X^{*}$ is uniform on $[- \frac{ω_{2}}{ω_{1}}, 0],$ so the expectation satisfies $E [X^{*} | X^{*} \leq 0, Z = z] = - \frac{1}{2} \frac{ω_{2}}{ω_{1}},$ and therefore varies with $ω_{1} .$ However, $F_{X^{*} | Z = z} (x)$ varies with $ω_{1}$ only when $x \leq 0,$ so $F_{X | Z = z} (x) = (ω_{2} + ω_{2} x) \cdot 1 (0 \leq x \leq \frac{1}{ω_{2}} - 1) + 1 (x > \frac{1}{ω_{2}} - 1)$ cannot identify $ω_{1} .$ The expectation is not identifiable in this family.
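
As a rough numerical illustration of the point in note 8 (a sketch, not the authors' code), the functions below encode the piecewise-linear cdf of $X^{*}$ and the implied cdf of the observed variable. Two assumptions are made here: X equals $X^{*}$ when $X^{*}$ is positive and zero otherwise, and the cdf is read as clipped to the interval [0, 1] outside its linear segments. The names `latent_cdf` and `observed_cdf` are hypothetical.

```python
import numpy as np

def latent_cdf(x, w1, w2):
    """Piecewise-linear cdf of X* with a kink at zero (w1 > 0, 0 < w2 < 1)."""
    x = np.asarray(x, dtype=float)
    lower = w2 + w1 * x   # linear segment on [-w2/w1, 0], slope w1
    upper = w2 + w2 * x   # linear segment on [0, 1/w2 - 1], slope w2
    # Clipping handles x below -w2/w1 (cdf 0) and above 1/w2 - 1 (cdf 1).
    return np.clip(np.where(x <= 0, lower, upper), 0.0, 1.0)

def observed_cdf(x, w1, w2):
    """cdf of X under the assumption X = max(X*, 0): zero below 0, latent cdf from 0 on."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, latent_cdf(x, w1, w2))

grid = np.linspace(-5.0, 5.0, 201)
# Changing w1 moves the latent cdf below zero but leaves the observed cdf unchanged.
same_observed = np.allclose(observed_cdf(grid, 0.5, 0.4), observed_cdf(grid, 2.0, 0.4))
same_latent = np.allclose(latent_cdf(grid, 0.5, 0.4), latent_cdf(grid, 2.0, 0.4))
```

In this sketch `same_observed` is true and `same_latent` is false, which mirrors the argument that the observed distribution carries no information about $ω_{1}$.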

9 The symmetric family is location-scale with function $H (u) = 1 - F_{X | Z = z} (- u)$ if $u \leq 0,$ and $H (u) = F_{X | Z = z} (u + 2 med (X | Z = z))$ if $u > 0,$ where $med (X | Z = z)$ is the median of $X | Z = z$. In this case, $μ_{z} = 2 med (X | Z = z)$ and $σ_{z} = 1$.

10 The use of $E$ in this expression gives the impression that the left-hand side is constant. However, the expectation here is taken with respect to $X_{i}$ and $Z_{i}$, but conditional on the data that generated $\hat{ψ}$. Thus, the left-hand side term is a random variable. This notation can be confusing, but it is standard in the empirical process literature (e.g., Chen, Linton, and Van Keilegom 2003). To clarify the notation further, we give an example for two variables V and Q, where V is discrete, assuming values $\{v_{1}, \dots, v_{M}\}$ with probabilities $p_{1}, \dots, p_{M}$. Suppose that $\hat{ξ} (v_{m}) = \frac{1}{n} \sum_{j = 1}^{n} Q_{j} 1 (V_{j} = v_{m})$ and $ξ (v_{m}) = E [Q 1 (V = v_{m})]$. Then, the notation $\sqrt{n} E [\hat{ξ} (V_{i}) - ξ (V_{i})]$ is equivalent to $\sqrt{n} \frac{1}{n} \sum_{j = 1}^{n} (Q_{j} P (V = V_{j}) - E [E [Q 1 (V = V_{j})]]) = \sqrt{n} \frac{1}{n} \sum_{j = 1}^{n} \sum_{m = 1}^{M} 1 (V_{j} = v_{m}) (Q_{j} p_{m} - E [Q | V = v_{m}] p_{m}) = \sum_{m = 1}^{M} p_{m} \sqrt{n} \frac{1}{n} \sum_{j = 1}^{n} (Q_{j} - E [Q | V = v_{m}]) 1 (V_{j} = v_{m})$. Supposing that the data are iid and $var (Q | V = v_{m}) = ς_{m}^{2},$ then $Ω = \sum_{m = 1}^{M} p_{m} (ς_{m}^{2} p_{m}^{2})$.
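
The variance expression at the end of note 10 can be checked by simulation. The sketch below is only an illustration under stated assumptions: Q is drawn as normal given V (the note does not specify a distribution), the values of `p`, `mu` and `sigma` are made up, and the function names are hypothetical. It simulates the last expression in the chain, $\sum_{m} p_{m} \sqrt{n} \frac{1}{n} \sum_{j} (Q_{j} - E [Q | V = v_{m}]) 1 (V_{j} = v_{m})$, and compares its sampling variance with $Ω$.

```python
import numpy as np

def omega_formula(p, sigma):
    """Omega = sum_m p_m * (sigma_m^2 * p_m^2), as written in the discrete example."""
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.sum(p * (sigma**2 * p**2)))

def simulate_statistic(p, mu, sigma, n, reps, seed=0):
    """Draw `reps` independent copies of the centered, scaled sum from the example."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    V = rng.choice(len(p), size=(reps, n), p=p)   # index m such that V_j = v_m
    # Assumption: Q | V = v_m is normal with mean mu_m and standard deviation sigma_m.
    Q = mu[V] + sigma[V] * rng.standard_normal((reps, n))
    # Only the term with v_m = V_j survives the indicator, so the double sum
    # collapses to (1 / sqrt(n)) * sum_j p_{V_j} * (Q_j - mu_{V_j}).
    return (p[V] * (Q - mu[V])).sum(axis=1) / np.sqrt(n)

p = [0.2, 0.3, 0.5]
mu = [0.0, 1.0, -1.0]
sigma = [1.0, 2.0, 0.5]
draws = simulate_statistic(p, mu, sigma, n=200, reps=4000)
ratio = draws.var() / omega_formula(p, sigma)   # should be close to one
```

With iid draws the variance of the simulated statistic equals $\sum_{m} p_{m}^{3} ς_{m}^{2}$ for every n, so `ratio` differs from one only through Monte Carlo noise.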

11 Precisely, $Σ = E [W W']^{- 1} \lim_{n \to \infty} \sum_{i = 1}^{n} \frac{1}{n} E [var (Y_{i} | X_{i}, Z_{i}) W_{i} W_{i}^{'}] E [W W']^{- 1}$. The limit is well defined by Assumption 3(i) and Assumption 4(i), and is established in White (1980).

12 Specifically, Theorem B requires that Assumption 2.5’ in Chen, Linton, and Van Keilegom (2003) holds with “in probability” replaced by “almost surely.” The direct translation of this condition to our context is expressed in Footnote 25 in Appendix C.1.

13 Pötscher and Prucha (1994) and Jenish and Prucha (2009) discuss almost sure stochastic equicontinuity for establishing Uniform Laws of Large Numbers. The primitives explored in those papers are not sufficient to establish almost sure stochastic equicontinuity with a $\sqrt{n}$ denominator as required by Chen, Linton, and Van Keilegom (2003). We did not find other references on primitive conditions for almost sure stochastic equicontinuity, and a general treatment of this condition along the lines of Pakes and Pollard (1989) is an open question.

14 Let $K (u)$ be a kernel function ($\int_{- \infty}^{\infty} K (u) du = 1,$ and suppose if convenient that $K (u) \geq 0$ and $K (u) 1 (| u | > 1) = 0$), and let $k_{n} (Z_{i} - z) = K ((Z_{i} - z) / h_{n}) / \sum_{j = 1}^{n} K ((Z_{j} - z) / h_{n})$ for some sequence $h_{n} \to 0,$ $n h_{n} \to \infty .$ Then ${\hat{F}}_{X | Z = z} (\cdot) = \sum_{i = 1}^{n} 1 (X_{i} \leq \cdot) k_{n} (Z_{i} - z) .$ Conditions for uniform convergence of such estimators can be verified in the existing literature. See, for example, Andrews (1995) and Hansen (2008).
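
The estimator in note 14 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the Epanechnikov kernel is one convenient choice that satisfies the stated conditions (nonnegative, zero outside $[- 1, 1]$, integrating to one), Z is taken to be scalar, and the function names and the bandwidth argument `h` are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Nonnegative kernel supported on [-1, 1] that integrates to one."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_weights(Z, z, h):
    """Weights k_n(Z_i - z): kernel values at (Z_i - z) / h, normalized to sum to one."""
    k = epanechnikov((np.asarray(Z, dtype=float) - z) / h)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observations within one bandwidth of z")
    return k / total

def conditional_cdf(x_grid, X, Z, z, h):
    """Estimate F_{X|Z=z}(x) = sum_i 1(X_i <= x) k_n(Z_i - z) at each x in x_grid."""
    w = kernel_weights(Z, z, h)
    X = np.asarray(X, dtype=float)
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    # Row r holds the indicators 1(X_i <= x_grid[r]) for all observations i.
    indicators = (X[None, :] <= x_grid[:, None]).astype(float)
    return indicators @ w
```

Because the weights are nonnegative and sum to one, the estimate is a proper step-function cdf in x: nondecreasing, zero below the smallest nearby $X_{i}$ and one above the largest. With bunching at zero, its value at $x = 0$ estimates the conditional probability of the corner.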

15 This test is designed to have better power to detect deviations from the null at the extremities of distributions. Such deviations may be a particular concern in our setting, where including the upper tail in the comparison of the distributions is often necessary.

16 Precisely, $X^{*} | Z = z$ is symmetric around its mean if $F_{X^{*} | Z = z} (x) = 1 - F_{X^{*} | Z = z} (2 E [X^{*} | Z = z] - x)$ for all x.
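
The symmetry condition in note 16 is easy to check numerically for a candidate cdf. The sketch below is illustrative only: the helper names are hypothetical, and the normal and exponential distributions are examples chosen here, not taken from the paper.

```python
import math

def normal_cdf(x, mean, sd):
    """cdf of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def exponential_cdf(x):
    """cdf of a unit exponential distribution, whose mean is 1."""
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0

def symmetry_gap(cdf, center, grid):
    """Largest violation of F(x) = 1 - F(2 * center - x) over the points in grid."""
    return max(abs(cdf(x) - (1.0 - cdf(2.0 * center - x))) for x in grid)

grid = [i / 10.0 for i in range(-50, 51)]
# A normal latent variable with a negative mean satisfies the condition.
gap_normal = symmetry_gap(lambda x: normal_cdf(x, -0.7, 1.3), -0.7, grid)
# A unit exponential is not symmetric around its mean of 1.
gap_exponential = symmetry_gap(exponential_cdf, 1.0, grid)
```

Here `gap_normal` is zero up to floating-point error, while `gap_exponential` is clearly positive.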

17 See, for example, Chen and Tripathi (2017), which is designed specifically for truncated variables as in our case, Niu et al. (2018) and the large list of citations therein, and Quessy (2021) for a more general test of copula homogeneity which can be used to test symmetry.

18 Note that, in models without controls, this test has nontrivial power only against violations of (3), not against the misidentification of the expectation. This is because, if (2) and (3) hold without controls, $E [Y | X] = L (Y | X, X + \tilde{e} 1 (X = 0))$ holds for all $\tilde{e}$ .

19 When X has N bunching points, one can use X = 0 for the correction, add indicators of each of the remaining $N - 1$ bunching points, and do a joint test that all the coefficients of the dummies are equal to zero, as discussed in the multivariate version of the dummy test in Caetano et al. (2021).

20 See Zavodny (2006), Gentzkow and Shapiro (2008), and Munasib and Bhattacharya (2010) for recent empirical papers in this literature. These papers are well aware that watching television may be endogenous. Zavodny (2006) tackles endogeneity using fixed effects, Munasib and Bhattacharya (2010) use IV, while Gentzkow and Shapiro (2008) use the timing of the roll-out of children’s programming to different local markets to obtain causal estimates.

21 The variables included as controls are the child’s age and squared age (in months), and indicators for: CDS wave (1997, 2002, and 2007), grade (thirteen variables, from kindergarten through grade 12), gender, ethnicity (black, Hispanic and other nonwhite ethnicity), whether the child has siblings, family income tercile, whether the mother is alive, and whether the father is alive.
