Abstract
The maximum likelihood estimator (MLE) of the average treatment effect (ATE) in the logit model for binary outcomes may have a significant second-order bias when the event has a low probability, which is the case we focus on in this article. We derive the second-order bias of the logit ATE estimator and propose a bias-corrected estimator of the ATE. We also propose a variation on the logit model whose parameters are elasticities. Finally, we propose a computational trick that avoids numerical instability when estimating with rare events.
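To make the object of study concrete, the following is a minimal sketch of a plug-in logit ATE estimate in a simulated rare-event setting. It is not the bias-corrected estimator or the computational trick proposed in the article. The specification (a constant, a treatment dummy `d`, and one covariate `x`), the simulated coefficients, and the helper names `log_sigmoid`, `fit_logit`, and `plug_in_ate` are all assumptions made for illustration. The overflow-safe `np.logaddexp` evaluation of the logistic function is a standard device used here as a stand-in.

```python
import numpy as np

def log_sigmoid(t):
    # log Lambda(t) = -log(1 + exp(-t)), evaluated without overflow
    return -np.logaddexp(0.0, -t)

def sigmoid(t):
    return np.exp(log_sigmoid(t))

def fit_logit(Z, y, iters=50, tol=1e-10):
    """Newton-Raphson for the logit MLE; Z must contain a constant column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = sigmoid(Z @ beta)
        grad = Z.T @ (y - p)                        # score
        hess = (Z * (p * (1 - p))[:, None]).T @ Z   # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def plug_in_ate(beta, x):
    """Average of Lambda(index with d=1) - Lambda(index with d=0) over the sample."""
    n = len(x)
    Z1 = np.column_stack([np.ones(n), np.ones(n), x])
    Z0 = np.column_stack([np.ones(n), np.zeros(n), x])
    return np.mean(sigmoid(Z1 @ beta) - sigmoid(Z0 @ beta))

# Simulated rare-event data (hypothetical design, random assignment of d)
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n).astype(float)
true_beta = np.array([-4.0, 0.5, 0.3])  # assumed values: event probability of a few percent
Z = np.column_stack([np.ones(n), d, x])
y = (rng.uniform(size=n) < sigmoid(Z @ true_beta)).astype(float)

beta_hat = fit_logit(Z, y)
ate_hat = plug_in_ate(beta_hat, x)
print("coefficients:", beta_hat)
print("plug-in ATE:", ate_hat)
```

Repeating the simulation over many draws with a smaller `n` would give a rough picture of the finite-sample bias of this plug-in estimate, which is the quantity the article analyzes.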
Notes
1. See Appendix A.3 for details.
2. Our bias formula looks somewhat different from the one presented in King and Zeng (2001), which in turn is based on McCullagh and Nelder (1989). It can be shown that the two bias formulae are in fact identical. See Appendix B.
3. We assume that the treatment assignment is unconfounded given X.
4. Appendix E also notes that δ = 1 (so that ) is not compatible with asymptotic normality and that this rate is not appropriate for logit models.
5. His Equation (2) implies that Λ(θn)→0, nΛ(θn)→∞. Note that Λ(θn)→0 means that exp(θn)/(1+exp(θn))→0, which in turn means that exp(θn)→0, or θn→−∞. Also note that nΛ(θn)→∞ rules out the Poisson approximation, because if , we cannot have nΛ(θn)→∞. On the other hand, nΛ(θn)→∞ is satisfied as long as with .
6. In , for the and n = 500 combination, the bias of is 0.0168, where the true value of β is 1. So, the MLE overestimates the elasticity by 1.68%, which is reduced to 0.41% by the bias-corrected estimator.
7. When and the support of x is bounded, we can guarantee that . If the support of x is not bounded, but if we are sure that for most values of x, we may want to adopt a parameterization where if t < 0, and if t > 0.
8. It is in the sense that
9. p denotes the probability of y = 1 for each parameter combination.
10. As was discussed in Remark 2, the second-order bias of the ATE is zero when the propensity score is constant. In order to verify this result, we will first consider the case that the distributions of x are identical over the D = 1 and D = 0 subsamples. In , , and in Appendix G, we evaluate the performance of various estimators of the ATE under random assignment.
11. Our simulation results are for the case of a single covariate. Adding covariates did not change the conclusions.
12. p1 and p0 denote the probabilities of y = 1 for Di = 1 and Di = 0, i.e., in the treated and control subsamples.
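The rate conditions in Note 5, Λ(θn)→0 together with nΛ(θn)→∞, can be illustrated numerically. The sketch below assumes a hypothetical sequence θn = −delta·log(n), chosen only for illustration; it is not necessarily the parameterization used in the article, and the function name `rate_table` is likewise an assumption. Under this sequence, Λ(θn) = 1/(1 + n^delta), so the expected number of events nΛ(θn) grows without bound when delta < 1 and stays bounded when delta = 1.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def rate_table(delta, ns):
    """Event probability and expected event count under theta_n = -delta * log(n)."""
    theta = -delta * np.log(ns)   # hypothetical drifting intercept
    prob = logistic(theta)        # Lambda(theta_n)
    return prob, ns * prob        # (Lambda(theta_n), n * Lambda(theta_n))

ns = np.array([1e2, 1e4, 1e6, 1e8])
for delta in (0.5, 1.0):
    prob, expected_events = rate_table(delta, ns)
    print(f"delta = {delta}")
    for n, p, e in zip(ns, prob, expected_events):
        print(f"  n = {n:.0e}  Lambda = {p:.3e}  n*Lambda = {e:.3e}")
```

With delta = 0.5 the event probability shrinks while the expected event count keeps growing; with delta = 1.0 the expected count stays below one, which is the bounded-count regime associated with the Poisson approximation.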