Methodological Studies

An Applied Researcher’s Guide to Estimating Effects from Multisite Individually Randomized Trials: Estimands, Estimators, and Estimates

Pages 270-308 | Received 26 Jun 2019, Accepted 18 Sep 2020, Published online: 13 Jan 2021
 

Abstract

Researchers face many choices when conducting large-scale multisite individually randomized controlled trials. One of the most common quantities of interest in multisite RCTs is the overall average effect. Even this quantity is non-trivial to define and estimate. The researcher can target the average effect across individuals or across sites. Furthermore, the researcher can target the effect for the experimental sample or for a larger population. If treatment effects vary across sites, these estimands can differ. Once an estimand is selected, an estimator must be chosen. Standard estimators, such as fixed-effects regression, can be biased. We describe 15 estimators, consider which estimands they are appropriate for, and discuss their properties in the face of cross-site effect heterogeneity. Using data from 12 large multisite RCTs, we estimate the effect (and its standard error) using each estimator and compare the results. We assess the extent to which these decisions matter in practice and provide guidance for applied researchers.

Acknowledgments

We would like to thank Natalya Verbitsky-Savitz, Steven Raudenbush, Daniel Schwartz, and Howard Bloom, all of whom contributed to early discussions about this work. Thanks to Nicole Pashley for her comments on the manuscript and technical appendices and for detecting several prior errors. Thanks to James Pustejovsky for assistance with cluster robust standard errors, pushing us to clarify several points, and good conversation on the ideas in this article. And finally thanks to our three (very rigorous) reviewers who demanded clarity, provided many useful edits, and who spent considerable time thinking deeply about the themes we discussed. A final thanks to the Miratrix CARES Lab for their feedback and comments on this work.

Declaration of interest statement

This manuscript was first submitted to JREE on June 26, 2019. At that time Dr. Sean Reardon was the editor-in-chief and Dr. Elizabeth Stuart served as the corresponding editor of this manuscript through its first submission to its acceptance. Per JREE policy, the current editorial team, of which Dr. Luke Miratrix and Dr. Michael Weiss are a part, was not involved in the peer review and decision process.

Notes

1 There are actually multiple possible super population models; to streamline our discussion, we refer to them as a general class and note salient differences when they arise. Furthermore, as we will see, there can be additional variation within each of the four classes due to concerns such as nonresponse or nonrandom sampling.

2 In this article we focus on the intention-to-treat effect. For ease of exposition we refer to units as “treated” or “not treated” rather than “offered a treatment” or “not offered a treatment.”

3 For this model to be sensible, we must assume no spillover and a well-defined treatment; i.e., the outcome of person i cannot change if person k receives a different treatment. This is often called the Stable Unit Treatment Value Assumption, or SUTVA. See Rosenbaum (Citation2010) for further discussion and a good overview.

4 This is necessary to prevent, e.g., a case where sites with generally low impacts have more people sampled, leading to a higher proportion of low-impact units in the overall sample than in the population. Site population sizes could instead be included as weights, rather than site sample sizes, to correct for this. Considering such situations is beyond the scope of this article.

5 From Parsons et al. (Citation2016): “A social impact bond is an innovative form of pay-for-success contracting that leverages private funding to finance public services. In a social impact bond, private investors fund an intervention through an intermediary organization—and the government repays the funder only if the program achieves certain goals, which are specified at the outset of the initiative and assessed by an independent evaluator.”

6 Recall we assume each site’s evaluation sample constitutes the entire site.

7 In our technical appendix we elaborate on these estimators more fully, providing references to the core ideas and concepts tied to each estimator.

8 Interestingly, the person-weighted superpopulation design-based estimator, while consistent, has a small degree of bias due to N, the total number of units in the sample, being random. This creates a random denominator for our weighting of sites, and the expected value of a ratio is not quite the ratio of the expected values of its numerator and denominator. This bias depends on the correlation of site size and site impact, and will generally be quite small. We ignore it in this article, treating the DB estimator as our closest-to-unbiased available baseline. For true unbiasedness, one would have to divide by E[N], assumed known, rather than N in the weighting. The site-weighted version, discussed below, is in fact completely unbiased because it does not rely on site size. See Pashley & Miratrix (Citation2020).
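The random-denominator issue can be seen in miniature: the estimator is a ratio with random numerator A and random denominator N, and in general E[A/N] differs from E[A]/E[N] when the two are correlated. A deliberately tiny toy example (all numbers invented, not from the article):

```python
# (A, N) takes the value (1, 1) or (9, 3), each with probability 1/2.
# A and N are correlated, so the expectation of the ratio differs
# from the ratio of expectations.
outcomes = [(1, 1), (9, 3)]

E_ratio = sum(a / n for a, n in outcomes) / 2  # E[A/N] = (1 + 3)/2 = 2.0
E_A = sum(a for a, _ in outcomes) / 2          # E[A]   = 5.0
E_N = sum(n for _, n in outcomes) / 2          # E[N]   = 2.0

print(E_ratio, E_A / E_N)  # 2.0 vs. 2.5
```

Dividing by the known E[N] rather than the realized N removes the gap, which is exactly the fix described in the note.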

9 The “in principle” refers to the assumption of homoskedasticity that gives these precision weights. If different sites have different degrees of outcome variation, then the precision weights may not correspond to the actual precisions of the sites, and this estimand may not in fact be the easiest to estimate. The fixed effect estimator would no longer be truly optimal in this case.
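Concretely (our gloss of the standard result, not a formula quoted from the article): under homoskedasticity the fixed-effects estimator is a precision-weighted average of the site-level impact estimates,

```latex
\hat{\beta}_{FE} \;=\; \frac{\sum_j w_j \,\hat{\tau}_j}{\sum_j w_j},
\qquad w_j \;=\; n_j \,\bar{p}_j \left(1 - \bar{p}_j\right),
```

where $n_j$ is the sample size of site $j$ and $\bar{p}_j$ its proportion treated, so larger and more balanced sites receive more weight. If residual variances differ across sites, these $w_j$ no longer track the actual precisions.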

10 Interestingly, the weights make the site fixed effects not strictly necessary for estimation. The regression of outcome onto treatment, with these weights and without further covariates such as site fixed effect indicators, will give an unbiased estimate of the individual average impact. The site fixed effects, however, can absorb outcome variation across sites to increase precision.

11 There are several variants of Huber-White, which differ in how they adjust for concerns such as degrees of freedom; we use “HC1.” HC1 is the default of STATA’s “robust” option and is commonly used in the literature. All these variants are generally quite similar for large experiments.

12 As with Huber-White’s HC0, HC1, and so forth, there are several proposed degrees-of-freedom corrections to account for this.

13 This classification, coupled with the biased impact estimators, gives even more estimation approaches. For example, one could use cluster robust standard errors on top of multilevel modeling (this is what the HLM7 software and the “robust” option in STATA do) to keep a sampling framework of sites from a superpopulation while also allowing for complex, unknown correlation structures of individuals’ residuals within site.

14 Due to these concerns, we excluded β̂MLRIRC. The other estimators have reasonable adjustments; with β̂MLRIRC the alternatives did not seem sensible.

15 We flip the sign of the outcome as needed to make the site-weighted estimate larger, just for this visualization.

Additional information

Funding

This work was supported by the Institute of Education Sciences [R305D140012].
