Methodological Studies

Assessing the Precision of Multisite Trials for Estimating the Parameters of a Cross-Site Population Distribution of Program Effects

Pages 877-902 | Received 09 May 2016, Accepted 28 Feb 2017, Published online: 08 May 2017
 

ABSTRACT

Multisite trials, which are being used with increasing frequency in education and evaluation research, provide an exciting opportunity for learning about how the effects of interventions or programs are distributed across sites. In particular, these studies can produce rigorous estimates of a cross-site mean effect of program assignment (intent-to-treat), a cross-site standard deviation of the effects of program assignment, and a difference between the cross-site mean effects of program assignment for two subpopulations of sites. However, capitalizing on this opportunity will require adequately powering future trials to estimate these parameters. To help researchers do so, we present a simple approach for computing the minimum detectable values of these parameters for different sample designs. The article then uses this approach to illustrate, for each parameter, the precision trade-off between increasing the number of study sites and increasing site sample size. Findings are presented for multisite trials that randomize individual sample members and for multisite trials that randomize intact groups or clusters of sample members.
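As a concrete illustration of the kind of precision calculation the abstract describes, the sketch below computes a minimum detectable effect size for the cross-site mean effect of program assignment. It uses the standard random-effects variance formula for a multisite trial (e.g., Raudenbush & Liu, 2000) with a large-sample normal approximation to the multiplier; the function name and all parameter values are hypothetical, and the article's exact expressions may differ.

```python
import math
from statistics import NormalDist


def mdes_cross_site_mean(J, n, T, tau2, sigma2, alpha=0.05, power=0.80):
    """Minimum detectable effect size (in outcome SD units) for the
    cross-site mean effect of program assignment in a multisite trial.

    J      : number of sites
    n      : sample size per site
    T      : proportion of each site's sample randomized to treatment
    tau2   : cross-site variance of program effects (effect-size units)
    sigma2 : individual-level outcome variance (1.0 if standardized)
    """
    z = NormalDist().inv_cdf
    multiplier = z(1 - alpha / 2) + z(power)  # approximately 2.80
    # Variance of the estimated cross-site mean effect: between-site
    # impact variation plus within-site estimation error.
    variance = tau2 / J + sigma2 / (J * n * T * (1 - T))
    return multiplier * math.sqrt(variance)


# Hypothetical design: 40 sites, 60 sample members each, half treated.
print(round(mdes_cross_site_mean(J=40, n=60, T=0.5, tau2=0.01, sigma2=1.0), 3))
```

Varying `J` versus `n` in such a sketch exhibits the trade-off the article examines: adding sites shrinks both variance terms, whereas adding sample members per site shrinks only the second.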

Funding

This article was funded by Grant #201500035 from the Spencer Foundation and Grant #183631 from the William T. Grant Foundation.

Notes

1 These effect sizes are stated as a proportion of the total individual-level standard deviation of reading or math test scores for control group members in the study sample.

2 Even though the study was based on a convenience sample of charter middle schools and it is thus not possible to rigorously define the population of charter middle schools that is represented, this population can exist in principle and thus is a valid target of inference (Bloom, Raudenbush, Weiss, & Porter, 2017; Raudenbush & Bloom, 2015).

3 According to the Common Guidelines for Education Research and Development published by the Institute of Education Sciences and the National Science Foundation (2013) (http://ies.ed.gov/pdf/CommonGuidelines.pdf): “. . . Scale-Up studies should be conducted in settings and with population groups that are sufficiently diverse to broadly generalize findings” (p. 14).

4 For example, Schochet (2013) states that “[u]nder this fixed population scenario, researchers are to be agnostic about whether the study results have external validity. Policy makers and other users of the study results can decide whether the impact evidence is sufficient to adopt the intervention on a broader scale . . .” (pp. 221–222). However, he also states: “Nonetheless this approach [referring to the super-population framework] can be justified on the grounds that policymakers may generalize the findings anyway, especially if the study provides a primary basis for deciding whether to implement the tested treatment more broadly . . .” (pp. 223–224).

5 For detailed discussions about different inference populations, see Crits-Christoph, Tu, and Gallop (2003); Hedges and Rhoads (2009); Schochet (2015); Senn (2007); Serlin, Wampold, and Levin (2003); and Siemer and Joorman (2003).

6 In particular, Bloom et al. (2017) indicate that using fixed site-specific intercepts eliminates the bias that can result from a model with randomly varying intercepts when the proportion of sample members randomized to treatment varies across sites, making it possible for unmeasured site characteristics to be correlated with that proportion.

7 If W predicts cross-site variation in program effects, the cross-site standard deviation of effects conditional on W is smaller than the unconditional cross-site standard deviation.

8 Pages 158 and 159 of Bloom (2005) explain why the multiplier for a minimum detectable effect (MDE) equals t(α/2) + t(1 − β), where t(α/2) is the critical t value for a two-tailed hypothesis test at significance level α and t(1 − β) is the corresponding t value for power equal to 1 − β.
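A minimal sketch of this multiplier, using a large-sample normal approximation (the note's formula uses t values, which converge to these z values as degrees of freedom grow); the function name is hypothetical:

```python
from statistics import NormalDist


def mde_multiplier(alpha=0.05, power=0.80):
    """Large-sample approximation to the minimum detectable effect
    multiplier: the z value for a two-tailed test at level alpha plus
    the z value corresponding to the desired power."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


# For alpha = .05 and power = .80 this is approximately 2.80,
# the familiar rule-of-thumb multiplier.
print(round(mde_multiplier(), 2))
```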

9 Assuming that the individual-level outcome variance is constant across sites does not produce major estimation or inference problems unless these variances and site sample sizes vary substantially and are highly correlated (Bloom et al., 2017). Assuming that this variance is the same for treatment and control group members does not produce major estimation or inference problems unless the variances and the sample sizes differ substantially across the two experimental groups (Bloom et al., 2017). These simplifying assumptions are made by most, if not all, prior discussions of the statistical power or precision of MSTs (e.g., Raudenbush & Liu, 2000).

10 In practice, n and the proportion randomized to treatment may vary across sites. In that case, we recommend using the harmonic mean values of these parameters for the calculations here and elsewhere in the present article where they appear. Strictly speaking, one should use the harmonic mean value of the full expression in which the parameters appear jointly. However, the information needed to do so is rarely available when designing a study, and this approach is not likely to produce a result that differs markedly from one based on the separate harmonic mean values.
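For reference, the harmonic mean the note recommends is available directly in Python's standard library; the per-site sample sizes and treatment proportions below are hypothetical:

```python
from statistics import harmonic_mean

site_n = [20, 30, 60]      # hypothetical per-site sample sizes
site_T = [0.4, 0.5, 0.6]   # hypothetical proportions randomized to treatment

# The harmonic mean down-weights large sites: here it is 30.0,
# below the arithmetic mean of about 36.7.
n_bar = harmonic_mean(site_n)
T_bar = harmonic_mean(site_T)
print(n_bar, round(T_bar, 3))
```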

11 Weiss et al. (2017) provide estimates of the cross-site standard deviation of program effects from data for 16 MSTs.

12 This parameter also represents the proportion of the total variance of the outcome that is explained by the site intercepts.

13 In practice, researchers often use the sample control-group mean and standard deviation of outcomes to define a standardized z score. Weiss et al. (2017) explore issues that arise when z scores are defined in terms of the mean and standard deviation for other reference populations, such as a state or a nation.

14 Because these two studies examine different populations (urban districts in one case and the nation in the other), their estimates of these parameters are not fully comparable. Nonetheless, together they provide a strong basis for extrapolating values for these parameters to other settings.

15 Page 69 of Hedges and Hedberg (2007) reports a national estimate for third-grade reading test scores equal to 0.27 and a corresponding estimate equal to 0.48.

16 This comparison is only an approximation because it does not account for the difference between the numbers of degrees of freedom for the two inference frameworks. Nonetheless, it provides a good indication of their relative precision for a super-population with substantial cross-site impact variation.

17 To convert an MDESSD, which is expressed in standard deviation units, to a minimum detectable effect standard deviation in the natural units of the outcome involved, multiply the MDESSD by the total individual-level standard deviation of that outcome.

18 Of course, there can be an MDESSD for a lower value of statistical power and a less stringent level of statistical significance.

19 The three studies pooled were: (a) an evaluation of the Greater Avenues for Independence program conducted in 22 local welfare offices (sites) in California (Riccio & Friedlander, 1992), (b) an evaluation of Project Independence conducted in 10 local welfare offices (sites) in Florida (Kemple & Haimson, 1994), and (c) the National Evaluation of Welfare-to-Work Strategies conducted in 27 local welfare offices (sites) from six states (Hamilton, 2002).

20 In the future, it might be possible to study cross-site impact variation using MSCRTs that randomize intact classrooms (clusters) within schools (sites). However, to date, very few such studies have been conducted. And even these studies would need to be quite large because of the typically small numbers of classrooms per school—especially for elementary or middle schools.
