Double Machine Learning for Sample Selection Models

Abstract

This article considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high-dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning (DML) framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which make treatment effect estimation robust to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models, and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are root-n consistent and asymptotically normal, and we investigate their finite-sample properties in a simulation study. We also apply the proposed methodology to the Job Corps data. The estimator is available in the causalweight package for the statistical software R.
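The two ingredients named in the abstract, a doubly robust score and cross-fitting, can be sketched for the simplest case only: outcome attrition that is missing at random given treatment D and covariates X. The sketch assumes the score takes the standard doubly robust form ψ_d = 1{D=d}·S·[Y − μ(d,1,X)] / [p_d(X)·π(d,X)] + μ(d,1,X), where π(d,X) = Pr(S=1 | D=d, X) is our own label. The data-generating process, the trimming constant, and every function name are hypothetical, and this is not the causalweight implementation. Because X is discrete, cell means stand in for the machine learning estimators one would use in practice.

```python
import math
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical DGP: discrete X, treatment D, selection S; Y observed only if S == 1.
    Attrition depends on (D, X) only, so outcomes are missing at random given D and X."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 3)
        d = int(rng.random() < 0.3 + 0.1 * x)            # Pr(D=1 | X)
        s = int(rng.random() < 0.4 + 0.1 * x + 0.2 * d)  # Pr(S=1 | D, X)
        y = 1.0 * d + 0.5 * x + rng.gauss(0.0, 1.0)      # true ATE = 1
        data.append((x, d, s, y if s == 1 else None))
    return data


def cell_mean(pairs):
    """Mean of value within each key cell; a stand-in for an ML regression."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def fit_nuisances(train):
    p1 = cell_mean((x, d) for x, d, s, y in train)                 # Pr(D=1 | X)
    pi = cell_mean(((d, x), s) for x, d, s, y in train)            # Pr(S=1 | D, X)
    mu = cell_mean(((d, x), y) for x, d, s, y in train if s == 1)  # E[Y | D, S=1, X]
    return p1, pi, mu


def score(obs, d, p1, pi, mu, trim=0.01):
    """Doubly robust score for E[Y(d)] under MAR attrition, evaluated at one observation."""
    x, d_obs, s, y = obs
    p_d = max(p1[x] if d == 1 else 1.0 - p1[x], trim)
    pi_d = max(pi[(d, x)], trim)
    m = mu[(d, x)]
    # The residual term is active only for units with D == d whose outcome is observed.
    residual = (y - m) if (d_obs == d and s == 1) else 0.0
    return residual / (p_d * pi_d) + m


def dml_ate(data, k=2, seed=1):
    """Cross-fitted ATE: nuisances fit on k-1 folds, score evaluated on the held-out fold."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    psi = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        p1, pi, mu = fit_nuisances(train)
        for i in fold:
            psi.append(score(data[i], 1, p1, pi, mu) - score(data[i], 0, p1, pi, mu))
    n = len(psi)
    ate = sum(psi) / n
    # Standard error from the sample variance of the estimated score.
    se = math.sqrt(sum((v - ate) ** 2 for v in psi) / n) / math.sqrt(n)
    return ate, se


ate, se = dml_ate(simulate(20_000))
print(f"ATE estimate: {ate:.3f} (s.e. {se:.3f})")
```

With a discrete covariate the cell means are fully nonparametric. The sketch therefore shows the mechanics of the score and of cross-fitting, not the regularization issues that motivate DML in high-dimensional settings.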

Supplementary Materials

The appendices include the following: proofs of propositions (A), proofs of theorems (B), derivation of influence functions (C), a discussion of the convergence rates of ML estimators (D), a simulation study (E), and descriptive statistics related to the empirical example (F). We also provide a replication package for the empirical example in the main paper.

Acknowledgments

We have benefited from comments by Alyssa Carlson, David Kaplan, Jannis Kueck, Peter Mueser, and seminar participants at the University of Missouri.

Disclosure Statement

No potential conflict of interest was reported by the authors.

Notes

1 Relatedly, Barnwell and Chaudhuri (2020) consider several outcome periods under a monotone MAR assumption (i.e., outcome attrition is an absorbing state, so that attrition weakly increases over time) and also discuss the evaluation of randomly assigned treatments in this context based on the efficient score function. In contrast, our framework considers a single outcome period and permits selection into treatment to depend on observed confounders.

2 As an alternative set of IV restrictions, d’Haultfoeuille (2010) permits the instrument to be associated with the outcome, but assumes conditional independence of the instrument and selection given the outcome.

3 See, for example, Ahn and Powell (1993), Das, Newey, and Vella (2003), Newey (2007), Newey, Powell, and Vella (1999), Blundell and Powell (2004), and Imbens and Newey (2009) for further semi- and nonparametric control function approaches in sample selection or instrumental variable models.

4 While the efficient score function associated with (2) is, technically speaking, doubly robust, that is, consistent if either μ(d,1,X,Π) or p_d(X,Π) is correctly specified, this property can generally only hold if Π is correctly specified, because Π enters both μ(d,1,X,Π) and p_d(X,Π) as a first-step estimate. However, our approach does not rely on (global) double robustness but on Neyman orthogonality, which implies that DML is robust to local perturbations in Π under certain regularity conditions.
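The double robustness property mentioned in Note 4 can be checked exactly at the population level. The sketch below treats Π as known (so it drops out, as under selection on observables) and assumes a score of the form 1{D=1}·S·[Y − μ] / (p·π) + μ for E[Y(1)]. All numbers are hypothetical. Plugging in a wrong outcome model with correct treatment and selection probabilities, or the reverse, still yields the true mean. Getting both wrong does not. The check says nothing about Neyman orthogonality, which is the property the DML approach actually relies on.

```python
from fractions import Fraction as F

# Hypothetical discrete population: covariate X in {0, 1}, each with probability 1/2.
P_X = {0: F(1, 2), 1: F(1, 2)}
P1 = {0: F(1, 4), 1: F(3, 4)}    # true Pr(D=1 | X)
PI1 = {0: F(1, 2), 1: F(4, 5)}   # true Pr(S=1 | D=1, X)
MU1 = {0: F(1), 1: F(3)}         # true E[Y | D=1, S=1, X]


def expected_score(mu, p, pi):
    """Population mean of the d=1 score with candidate nuisances (mu, p, pi) plugged in.
    Under the true law, E[1{D=1} S (Y - mu) | X] = P1 * PI1 * (MU1 - mu)."""
    return sum(
        P_X[x] * (P1[x] * PI1[x] * (MU1[x] - mu[x]) / (p[x] * pi[x]) + mu[x])
        for x in P_X
    )


truth = sum(P_X[x] * MU1[x] for x in P_X)  # E[Y(1)] under MAR attrition
wrong_mu = {0: F(0), 1: F(0)}
wrong_p = {0: F(1, 2), 1: F(1, 2)}
wrong_pi = {0: F(1), 1: F(1)}

print(truth)
print(expected_score(MU1, wrong_p, wrong_pi))       # outcome model correct
print(expected_score(wrong_mu, P1, PI1))            # probability models correct
print(expected_score(wrong_mu, wrong_p, wrong_pi))  # both misspecified
```

The first three lines print 2, and the last prints 77/40. Exact fractions keep the comparison free of simulation noise.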

5 Under separability, potential outcomes in the selected and nonselected populations might differ due to distinct levels of U, while conditional average treatment effects given X, V are the same, because the influence of U cancels out when taking differences in conditional outcomes across treatment states. See Huber (2014b) for further discussion.
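The cancellation argument in Note 5 can be shown with a toy example. Assume a hypothetical additively separable model Y(d) = g(d, X) + U, with selection S = 1{U > 0} and the observables (X and the control function V) held fixed. The form of g and the values of U are illustrative assumptions, not the paper's model. Mean potential outcomes then differ between selected and nonselected units, but U drops out of Y(1) − Y(0).

```python
from fractions import Fraction


def g(d, x):
    # Hypothetical systematic part of the potential outcome Y(d) = g(d, X) + U.
    return 2 * d + x + d * x


# Hypothetical, equally likely values of the unobservable U; selection is S = 1{U > 0}.
U_VALUES = [-2, -1, 1, 2]


def group_summary(x, selected):
    """Mean Y(1), mean Y(0), and mean effect Y(1) - Y(0) among units with S == selected."""
    us = [u for u in U_VALUES if (u > 0) == selected]
    n = len(us)
    mean_y1 = Fraction(sum(g(1, x) + u for u in us), n)
    mean_y0 = Fraction(sum(g(0, x) + u for u in us), n)
    return mean_y1, mean_y0, mean_y1 - mean_y0


for selected in (True, False):
    print("selected" if selected else "nonselected", group_summary(1, selected))
```

For X = 1, the mean of Y(1) is 11/2 among selected units and 5/2 among nonselected units, while the mean effect is 3 in both groups.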

6 Related assumptions on the inclusion of post-treatment covariates for tackling confounding have been imposed in evaluations of dynamic (i.e., sequentially assigned) treatments, see Robins (1986, 1998) and Lechner (2009), or of causal mediation, see for example Imai and Yamamoto (2013) and Huber (2014a). Indeed, M is a mediator in the sense that part of the effect of D on Y operates via M. At the same time, M also affects selection S, which is why it needs to be included as a control variable, as stated in Assumption 8. One difference from the dynamic treatment effects literature, however, is that the selection indicator S neither affects nor is affected by the outcome. This also distinguishes our setting from studies on surrogate outcomes (i.e., short-term outcomes through which the treatment effect on longer-term outcomes operates), see for instance Athey et al. (2019).
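Note 6 explains why the post-treatment covariate M must be controlled for. A population-level toy example shows what goes wrong otherwise. It assumes, for illustration only, that D is randomized, that X is suppressed, and that attrition is ignorable given D and M. All probabilities and the outcome equation are hypothetical, and this is not the paper's estimator. Because M raises both the selection probability and the outcome, the complete-case mean is biased. Averaging E[Y | D=1, S=1, M] over the distribution of M among all treated units recovers E[Y(1)].

```python
from fractions import Fraction as F

# Hypothetical structural model for the treated arm (X suppressed, D randomized):
#   M(1) ~ Bernoulli(3/5),  Y(1) = 1 + 2 * M(1) + eps,  E[eps] = 0,
#   Pr(S = 1 | M = 1) = 9/10, Pr(S = 1 | M = 0) = 3/10, eps independent of S given M.
P_M = {1: F(3, 5), 0: F(2, 5)}
P_S = {1: F(9, 10), 0: F(3, 10)}


def mu(m):
    # Observable regression E[Y | D = 1, S = 1, M = m]; equals 1 + 2m because
    # selection is ignorable once M is controlled for.
    return 1 + 2 * m


truth = 1 + 2 * P_M[1]  # E[Y(1)]

# Complete-case mean E[Y | D = 1, S = 1]: selection over-represents units with M = 1.
p_sel = sum(P_M[m] * P_S[m] for m in P_M)
complete_case = sum(P_M[m] * P_S[m] * mu(m) for m in P_M) / p_sel

# Controlling for M: average mu(M) over the distribution of M among all treated units.
adjusted = sum(P_M[m] * mu(m) for m in P_M)

print(truth, complete_case, adjusted)
```

The printed values are 11/5, 29/11, and 11/5. The complete-case mean overstates E[Y(1)], while the M-adjusted average matches it.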

Additional information

Funding

Lafférs acknowledges support provided by the Slovak Research and Development Agency under contract no. APVV-21-0360 and VEGA-1/0398/23. Michela Bia acknowledges financial support from the Inter Mobility IN Program: “Causal Mediation Analysis and Machine Learning based estimators (CAME)”, funded by the Luxembourg National Research Fund.