Minimum Contrast Empirical Likelihood Inference of Discontinuity in Density*

Pages 934-950 | Published online: 05 Jul 2019

Abstract

This article investigates the asymptotic properties of a simple empirical-likelihood-based inference method for discontinuity in density. The parameter of interest is a function of two one-sided limits of the probability density function at (possibly) two cut-off points. Our approach is based on the first-order conditions from a minimum contrast problem. We investigate both first-order and second-order properties of the proposed method. We characterize the leading coverage error of our inference method and propose a coverage-error-optimal (CE-optimal, hereafter) bandwidth selector. We show that the empirical likelihood ratio statistic is Bartlett correctable. An important special case is the manipulation testing problem in a regression discontinuity design (RDD), where the parameter of interest is the density difference at a known threshold. In RDD, the continuity of the density of the assignment variable at the threshold is considered as a “no-manipulation” behavioral assumption, which is a testable implication of an identifying condition for the local average treatment effect. When specialized to the manipulation testing problem, the CE-optimal bandwidth selector has an explicit form. We propose a data-driven CE-optimal bandwidth selector for use in practice. Results from Monte Carlo simulations are presented. Usefulness of our method is illustrated by an empirical example.

SUPPLEMENTARY MATERIALS

Online supplement with details of implementation and proofs and Matlab code for the Monte Carlo simulations.

ACKNOWLEDGMENTS

We thank the joint editor, the associate editor, and two anonymous referees, whose comments greatly improved the article. We thank Hiroyuki Kasahara and Pedro Sant’Anna for helpful discussion.

Notes

1 In particular, the local independence assumption of Hahn, Todd, and Van der Klaauw (2001) is not required in Lee’s (2008) framework. Recently, Dong (2018) pointed out that the plausibility of this local independence assumption is often in doubt in empirical applications.

2 We should emphasize that the continuity of the density function of the assignment variable is neither sufficient nor necessary for identification of treatment effects (see McCrary 2008 for discussion).

3 Recently, Cattaneo, Jansson, and Ma (2018) proposed a novel local regression method with an application to manipulation testing that does not require binning. We adopt this idea to obtain “pilot bandwidths” when calculating estimators of the CE-optimal bandwidths.

4 The asymptotic variance estimation can be complicated in Wald-type manipulation testing (see Cattaneo, Jansson, and Ma 2018).

5 These methods are fundamentally different from selection rules based on the criterion of minimizing the AMSE for point estimation (see, e.g., Arai and Ichimura 2018).

6 We also note that since manipulation testing is viewed as a specification/falsification test in the context of RDD, practitioners are more concerned with the Type I error than with the Type II error. Compared with criteria that incorporate the Type II error (power), the CE-optimal bandwidth selection rule is therefore of more practical interest to practitioners.

7 The parameter of interest in Doyle (2007) and Jales (2018) is of the form ρ(z1, z2) = z2/z1. Gerard, Rokkanen, and Rothe (2018) considered estimating a parameter of the form ρ(z1, z2) = 1 − z2/z1. Bajari et al. (2011) considered ρ(z1, z2) = 1/z2 − 1/z1. The parameter of interest could be an implicit function of φ and φ+, whose partial derivatives can be computed using the implicit function theorem (see, e.g., Saez 2010, eq. (5)).
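
As a minimal sketch of the explicit functional forms listed in this note, the following Python snippet writes out each ρ(z1, z2) together with its partial derivatives, which is the ingredient a delta-method calculation would need. The labels, the function names, and the values of z1 and z2 are hypothetical and are not taken from the article.

```python
from typing import Callable, Dict, Tuple

# Each entry maps an illustrative label to (rho, gradient), where the
# gradient returns (d rho / d z1, d rho / d z2).
Param = Tuple[Callable[[float, float], float],
              Callable[[float, float], Tuple[float, float]]]

PARAMETERS: Dict[str, Param] = {
    "ratio": (lambda z1, z2: z2 / z1,
              lambda z1, z2: (-z2 / z1**2, 1.0 / z1)),
    "one_minus_ratio": (lambda z1, z2: 1.0 - z2 / z1,
                        lambda z1, z2: (z2 / z1**2, -1.0 / z1)),
    "reciprocal_difference": (lambda z1, z2: 1.0 / z2 - 1.0 / z1,
                              lambda z1, z2: (1.0 / z1**2, -1.0 / z2**2)),
}


def numerical_gradient(rho, z1, z2, eps=1e-6):
    """Central finite differences, used only to cross-check the formulas."""
    d1 = (rho(z1 + eps, z2) - rho(z1 - eps, z2)) / (2 * eps)
    d2 = (rho(z1, z2 + eps) - rho(z1, z2 - eps)) / (2 * eps)
    return d1, d2


z1, z2 = 0.5, 0.4  # hypothetical one-sided density limits
for name, (rho, grad) in PARAMETERS.items():
    print(name, rho(z1, z2), grad(z1, z2), numerical_gradient(rho, z1, z2))
```

The finite-difference check is there only to guard against sign errors in the analytic derivatives; an implicitly defined parameter would need the implicit function theorem instead, as the note says.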

8 Such a local-linear-type estimator dates back to Lejeune and Sarda (1992), where the MCE was interpreted as a local linear approximation to the empirical density function. Cheng, Fan, and Marron (1997) showed that both the binning estimator and the MCE enjoy certain optimal theoretical properties.

9 A similar problem in the context of an over-identified moment restriction model was considered by Ma (2017).

10 Applying the “delta method” (see, e.g., Hall 1992, sec. 2.7), we show that the difference between the cumulative distribution function of LR* and that of LR* is of approximately the same order as the stochastic order of magnitude of the error term in (9). See the supplement for more detail.

11 We note that the restriction of K*,+ to [0, 1] coincides with that of the “equivalent kernel” of local linear regression (see, e.g., Armstrong and Kolesár 2018b, sec. S2.1). By simple calculation, we find that if K is the triangular kernel, Assumption 5 is satisfied with J = 3 and u2 = 3/4. It is straightforward to verify by direct calculation that this condition is also satisfied by other commonly used “polynomial” kernels such as the Epanechnikov, biweight, and triweight kernels.
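
To illustrate the “equivalent kernel” mentioned in this note, the sketch below builds the boundary equivalent kernel of local linear regression from a triangular kernel restricted to [0, 1] and checks numerically that it integrates to one and has a zero first moment. The formula K*(u) = (μ2 − μ1 u) K(u) / (μ0 μ2 − μ1²), with μj the j-th moment of K on [0, 1], is the textbook form and is assumed here; it is not quoted from the article, the function names are hypothetical, and the snippet does not attempt to verify Assumption 5.

```python
import numpy as np


def triangular(u):
    """Triangular kernel K(u) = 1 - |u| on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.clip(1.0 - np.abs(u), 0.0, None)


def integrate(values, grid):
    """Trapezoidal rule on a one-dimensional grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)


def boundary_equivalent_kernel(kernel, grid):
    """Textbook local linear equivalent kernel at a boundary (assumed form)."""
    k = kernel(grid)
    mu0 = integrate(k, grid)
    mu1 = integrate(grid * k, grid)
    mu2 = integrate(grid**2 * k, grid)
    return (mu2 - mu1 * grid) * k / (mu0 * mu2 - mu1**2)


grid = np.linspace(0.0, 1.0, 20001)
k_star = boundary_equivalent_kernel(triangular, grid)
print(integrate(k_star, grid))         # close to 1
print(integrate(grid * k_star, grid))  # close to 0
```

For the triangular kernel the moments on [0, 1] are μ0 = 1/2, μ1 = 1/6, and μ2 = 1/12, so under the assumed formula the equivalent kernel has the closed form (6 − 12u)(1 − u) on [0, 1].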

12 Following Calonico, Cattaneo, and Farrell (2018), we recommend squaring the objective function when numerically solving the minimization problem.

13 It is also noteworthy that the CE-optimal bandwidth is “adaptive” in the sense that the leading coverage error is minimized no matter what the true value of the parameter of interest ϑ* is.

14 Note that for most regular parameters that can be estimated at the standard parametric rate, the coverage error decay rate of standard two-sided confidence intervals is $O(n^{-1})$. An interesting observation here is that Bartlett-corrected MC-EL inference achieves the same fast coverage error decay rate even though the parameter of interest is estimated at a slower nonparametric rate.

15 A similar observation in the context of local polynomial regression was made in Calonico, Cattaneo, and Farrell (2018, sec. 3). See Calonico, Cattaneo, and Farrell (2018, Corollary 5) and the discussion that follows.

16 See Kitamura (2006, sec. 8.1) for a discussion of the inner loop optimization and the outer loop optimization in the context of EL.

17 Note that this statement is exactly the same as that of Theorem 2.1 of OXM. This suggests that the MCE of ϑ* is first-order equivalent to its local likelihood estimator.

18 Rigorously, the following theorem holds for a stochastic approximation of t*. See (9), the discussion that follows and the supplement for details.

19 Note that for many other Wald-type or EL-type inference methods for nonparametric curves, the coverage errors are often of order $O(nh^{5} + h^{2} + (nh)^{-1})$. See, for example, Calonico, Cattaneo, and Farrell (2018, Theorem 1) for Wald-type confidence intervals based on standard or bias-corrected kernel density estimators at interior points and Otsu, Xu, and Matsushita (2015, Theorem 4.1) for EL-type confidence sets for the regression discontinuity parameter.
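
A short worked calculation, offered as an aside rather than as a result from the article, shows which bandwidth rate balances the three terms $nh^{5}$, $h^{2}$, and $(nh)^{-1}$ in this generic coverage error order, treating all constants as fixed and nonzero:

```latex
\[
nh^{5} \asymp (nh)^{-1}
\;\Longleftrightarrow\; h^{6} \asymp n^{-2}
\;\Longleftrightarrow\; h \asymp n^{-1/3},
\]
\[
\text{and at } h \asymp n^{-1/3}:\qquad
nh^{5} \asymp n^{-2/3},\qquad
h^{2} \asymp n^{-2/3},\qquad
(nh)^{-1} \asymp n^{-2/3}.
\]
```

A faster-shrinking bandwidth inflates the $(nh)^{-1}$ term and a slower one inflates the $h^{2}$ term, so under these assumptions $n^{-2/3}$ is the best rate this generic bound can deliver.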

20 π=q2/(rq1+q2) , where qj=ϕ(1σj)/[σϕ(1σj)(1Φ(1σj))] for j = 1, 2 and σ1=2,σ2=8.

21 Note that in Table 1, when the null-hypothesized value is r0 = 2, the reported size of ELCEtr is identical to that of its Bartlett-corrected version ELBCtr. This is because when r0 = 2, the CE-optimal bandwidth makes the leading distortion Bc (see Theorem 2) equal to zero, and thus the rescaling factor is equal to one. For the other null-hypothesized values r0 = 0.8, 1.2, and 1.6, by contrast, the minimized leading distortion Bc remains positive. In these cases, applying Bartlett correction to the MC-EL test statistic with the CE-optimal bandwidth slightly improves its size performance.

22 See Otsu, Xu, and Matsushita (2015, p. 98) for discussion. Here the grid for the null-hypothesized values is [0.3, 4.0] for r = 1.2 and [0.5, 5.0] for r = 2.0, both with a step length of 0.01.
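
The grid search described in this note can be sketched as a test inversion: evaluate a test statistic at each null-hypothesized value on the grid and keep the values that are not rejected. In the sketch below, only the grid [0.3, 4.0] with step 0.01 comes from the text. The statistic is a hypothetical quadratic stand-in rather than the MC-EL statistic, `invert_test` and `toy_statistic` are illustrative names, and the critical value assumes a chi-squared limiting distribution with one degree of freedom.

```python
import numpy as np

CHI2_1_Q95 = 3.841  # 0.95 quantile of chi-squared(1), rounded (assumed limit law)


def invert_test(statistic, lower, upper, step=0.01, critical=CHI2_1_Q95):
    """Return the grid points r0 in [lower, upper] at which the test does not reject."""
    n_points = int(round((upper - lower) / step)) + 1
    grid = lower + step * np.arange(n_points)
    values = np.array([statistic(r0) for r0 in grid])
    return grid[values <= critical]


def toy_statistic(r0, estimate=1.2, std_error=0.2):
    """Hypothetical Wald-like stand-in for a likelihood ratio statistic."""
    return ((estimate - r0) / std_error) ** 2


accepted = invert_test(toy_statistic, 0.3, 4.0)
print(accepted.min(), accepted.max())  # endpoints of the confidence set on the grid
```

A set obtained this way need not be an interval in general, which is one reason to keep the full vector of accepted grid points instead of only its smallest and largest elements.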

23 The endpoints of Ave. CI are the averages of the endpoints of confidence sets over simulation replications.

24 π=(q2d)/(q1+q2) , where qj=ϕ(1σj)/[σϕ(1σj)(1Φ(1σj))],j=1,2 and σ1=2,σ2=8.

25 We use the R function rddensity to generate the last column. See Cattaneo, Jansson, and Ma (2017) for details about the implementation of the CJM test.

26 The plug-in estimates of the bandwidth are on average smaller than the true CE-optimal bandwidth. In the case of d = 0.075 and n = 1000, the median estimate of HEL* (the constant term in the CE-optimal bandwidth) is 6.73 while the true value is 14.10. The mean absolute deviation is 7.16. Similarly, the estimates of the (normalized) Bartlett correction term $n^{2/3}B_c$ are on average larger than the true value. In the case d = 0.075, the median estimate of $n^{2/3}B_c$ is 1.04 and its true value is 0.51. The mean absolute deviation is 0.59.

Additional information

Funding

Jun Ma’s research is supported by the fund for building world-class universities (disciplines) of Renmin University of China. Yu gratefully acknowledges the support of JSPS KAKENHI grant number JP17K13713.
