Abstract
This study investigates the optimal use of covariates in reducing variance when analyzing experimental data. We show that finding the variance-minimizing strategy for making use of pre-treatment observables is equivalent to estimating the conditional expectation function of the outcome given all available pre-randomization observables. This is a pure prediction problem, which recent advances in machine learning (ML) are well-suited to tackling. Through a number of empirical examples, we show how ML-based regression adjustments can feasibly be implemented in practical settings. We compare our proposed estimator to other standard variance reduction techniques in the literature. Two important advantages of our ML-based regression adjustment estimators are that (i) they improve asymptotic efficiency relative to other alternatives and (ii) they can be implemented automatically, with relatively little tuning from the researcher, which limits the scope for data snooping.
Acknowledgments
We thank Lyft Inc. for providing a large portion of the data used in this project. We additionally thank Adeline Sutton for her help in accessing and interpreting the CHECC data, as well as Brent Hickman, Michael Cuna, Atom Vayalinkal, and participants at the Advances in Field Experiments conference for helpful comments that have improved the article. Documentation of our procedures and our Stata and R code can be found here: https://github.com/gsun593/FlexibleRA
Declaration of Interest
John List was Chief Economist at Lyft when this research was carried out. He is now Chief Economist at Walmart. Ian Muir and Gregory Sun were also employed at Lyft at the time that the research was carried out. They are no longer affiliated with Lyft.
Author Contribution Statement
The authors confirm contribution to the paper as follows: study conception and design: Gregory Sun; data collection: Ian Muir, Gregory Sun; analysis and interpretation of results: John List, Ian Muir, Gregory Sun; draft manuscript preparation: John List, Gregory Sun. All authors reviewed the results and approved the final version of the manuscript.
Notes
1. Specifically, our estimators attain the asymptotic efficiency bound subject to the constraint that treatment assignment probabilities equal the fixed proportions 𝝆 for all x. If randomization probabilities can be made conditional on x, then for a fixed target parameter, variance can be further decreased by exploiting heteroskedasticity in Yi(g) conditional on Xi. For instance, if the researcher is interested in estimating the average treatment effect, then the researcher could further reduce variance by over-sampling treatments for which the variance of the outcome is higher. This information is often difficult to obtain in practice, and moreover, the optimal sampling design for one target parameter may not be optimal for another.
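The variance-based over-sampling described in this note corresponds to the classic Neyman allocation: arm-level sample shares proportional to each arm's outcome standard deviation. A minimal Python sketch, for illustration only (`neyman_allocation` is our name for the helper; the paper's own code is in R and Stata):

```python
def neyman_allocation(std_devs):
    """Allocate sampling proportions across treatment arms in proportion
    to each arm's outcome standard deviation (Neyman allocation)."""
    total = sum(std_devs)
    return [s / total for s in std_devs]

# Arms with higher outcome variance receive larger shares.
props = neyman_allocation([1.0, 2.0, 1.0])
print(props)  # [0.25, 0.5, 0.25]
```

In practice the arm-level standard deviations are unknown before the experiment, which is exactly the difficulty the note points out.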
2. We provide code for doing so at https://github.com/gsun593/FlexibleRA
3. Such a choice makes 𝐂𝐡 deterministically 0.
4. Note a subtle difference in the justification for this fact. In this case, Ag is uncorrelated only with linear functions of X, but because we restrict ourselves to the class of linear-in-X regression adjustments, the summands of Bg are all restricted to be linear as well.
5. The two examples NW explicitly have in mind are logistic regression and Poisson regression. In the former case, the conditional mean function is exp(x′β)/(1 + exp(x′β)), while in the latter case it is exp(x′β).
6. Where here, A and B are as in the previous section.
7. As we will see in our simulations, we still prefer the fitted prediction function to be high quality, as its ability to fit the data affects the sampling variability of the resulting estimator.
8. However, this point should not be overstated. Nonparametric estimators typically suffer from slower rates of convergence than parametric estimators, so in a finite sample, one may still prefer linear regression adjustment. Our empirical results suggest that, in general, one should pick the method that produces the highest quality out-of-sample predictions of the outcome as measured by mean squared error.
9. Note that if the sample size is not sufficiently large, some care should be taken to ensure that each fold gets observations from each of the treatment groups g.
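One simple way to guarantee that each fold receives observations from every treatment group is to shuffle within each group and deal its members round-robin across folds. A Python sketch (our own helper, not taken from the paper's repository):

```python
import random
from collections import defaultdict

def stratified_folds(groups, n_folds, seed=0):
    """Assign each observation to a cross-fitting fold, shuffling within
    each treatment group so every fold receives observations from every
    group (when group sizes permit)."""
    rng = random.Random(seed)
    fold_of = [None] * len(groups)
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for idxs in by_group.values():
        rng.shuffle(idxs)
        for k, i in enumerate(idxs):
            fold_of[i] = k % n_folds  # deal group members round-robin across folds
    return fold_of

groups = ['T'] * 6 + ['C'] * 6
folds = stratified_folds(groups, 3)
# Each of the 3 folds contains both treatment and control observations.
for k in range(3):
    assert {groups[i] for i in range(12) if folds[i] == k} == {'T', 'C'}
```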
10. R code implementing our flexible regression adjustment, along with the analyses of the three non-Lyft settings, can be found at the following link: https://github.com/gsun593/FlexibleRA. We have also included a copy of the code in Appendix B.
11. See Friedman et al. (2004) for an interpretation of this strategy as approximating the solution to a LASSO-like estimation procedure.
12. For confidentiality reasons, we cannot report the exact size of this sample. However, at the time of writing, Lyft's passenger count was in the tens of millions.
13. Specifically, the x-axis in these Q-Q plots is defined by the theoretical quantiles of a standard normal distribution, while the y-axis corresponds to the empirical quantiles. If the asymptotic theory is correct, the points in these plots should lie close to the 45-degree line, and deviations from this prediction allow us to more precisely visualize departures from asymptotic normality.
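The plot coordinates described in this note are straightforward to compute. A Python sketch using plotting positions (i − 0.5)/n, which is one common convention among several; `qq_points` is our name for the helper:

```python
from statistics import NormalDist

def qq_points(sample):
    """(theoretical, empirical) quantile pairs for a normal Q-Q plot:
    x = standard-normal quantiles at plotting positions (i - 0.5)/n,
    y = the sorted sample values."""
    n = len(sample)
    z = NormalDist()  # standard normal
    xs = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    ys = sorted(sample)
    return list(zip(xs, ys))

# For a sample consisting exactly of standard-normal quantiles,
# the points lie on the 45-degree line.
z = NormalDist()
sample = [z.inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
pts = qq_points(sample)
assert all(abs(x - y) < 1e-9 for x, y in pts)
```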
14. This reduction is not just due to noise: the difference would be statistically significant if subjected to formal hypothesis testing.
15. If the nonparametric method being used has algorithmic complexity growing faster than linearly in dataset size (which is common), two-fold cross-fitting would be even faster than not using a split sample for sufficiently large datasets.
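The arithmetic behind this claim: if training cost scales as n^a, then two fits on n/2 observations cost 2(n/2)^a = 2^(1−a) · n^a, which falls below n^a exactly when a > 1. A one-function Python check (ours, purely illustrative):

```python
def relative_cost_two_fold(n, a):
    """Cost of fitting twice on half the data, relative to one fit on
    all n observations, when training cost scales as n ** a."""
    return 2 * (n / 2) ** a / n ** a  # simplifies to 2 ** (1 - a)

# Superlinear training cost (a > 1) makes two-fold cross-fitting cheaper.
print(relative_cost_two_fold(10_000, 2.0))  # 0.5: quadratic cost, half as expensive
print(relative_cost_two_fold(10_000, 1.0))  # 1.0: linear cost, break-even
```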
16. Specifically, we implemented our point estimates according to Equation (11) and our standard errors according to Equation (12), but using an OLS fit in place of a fitted machine learning model.
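For readers without the equations at hand, a standard regression-adjusted mean takes the following shape: average the fitted values over the full sample, then add the arm-specific average residual. This is a generic augmented estimator sketched in Python and may not match the paper's Equation (11) term for term:

```python
from statistics import fmean

def adjusted_mean(y, x, g, target, fit):
    """Regression-adjusted mean outcome for treatment arm `target`:
    full-sample average of fitted values plus the arm's average residual.
    A generic augmented estimator, not necessarily the paper's exact form."""
    fitted_all = fmean(fit(xi) for xi in x)
    resid_arm = fmean(yi - fit(xi) for yi, xi, gi in zip(y, x, g) if gi == target)
    return fitted_all + resid_arm

# With a perfect fit, the adjustment removes all covariate noise:
y = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0, 4.0]
g = ['T', 'C', 'T', 'C']
print(adjusted_mean(y, x, g, 'T', fit=lambda xi: xi))  # 2.5
```

With a null fit (predicting 0 for everyone), the estimator reduces to the simple arm mean, which is the sense in which an OLS fit or an ML fit can be slotted into the same formula.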