Methodological Studies

Asymdystopia: The Threat of Small Biases in Evaluations of Education Interventions That Need to Be Powered to Detect Small Impacts

Pages 207-240 | Received 04 Dec 2019, Accepted 13 Sep 2020, Published online: 16 Apr 2021

Abstract

Evaluators of education interventions are increasingly designing studies to detect impacts much smaller than the 0.20 standard deviations that Cohen characterized as “small.” While the need to detect smaller impacts is based on compelling arguments that such impacts are substantively meaningful, the drive to detect smaller impacts may create a new challenge for researchers: the need to guard against smaller biases. The purpose of this article is twofold. First, we examine the potential for small biases to increase the risk of making false inferences as studies are powered to detect smaller impacts, a phenomenon we refer to as asymdystopia. We examine this potential for two of the most rigorous designs commonly used in education research—randomized controlled trials and regression discontinuity designs. Second, we recommend strategies researchers can use to avoid or mitigate these biases.

Acknowledgments

We thank Jessie Mazeika for excellent research assistance. We thank Luke Miratrix and two anonymous referees for helpful comments.

Open Scholarship

This article has earned the Center for Open Science badges for Open Data and Open Materials through Open Practices Disclosure. The data and materials are openly accessible at https://ies.ed.gov/ncee/wwc/StudyFindings, https://ies.ed.gov/ncee/projects/evaluation/data_files.asp, https://osf.io/3x2zu/?view_only=92275575ad2c4ce5ab5cac4bdc38787a and https://ies.ed.gov/pubsearch/pubsinfo.asp?pubid=NCEE20184002.

Notes

1 We acknowledge the recent critiques of the NHST framework (Amrhein et al., Citation2019; Wasserstein & Lazar, Citation2016) and do not intend for this article to implicitly endorse its continued use. The issues raised in this article are equally applicable to any inferential method (for example, Bayesian posterior probabilities) that ignores small biases when assessing and reporting uncertainty in studies powered to detect small impacts. We therefore stick with the NHST framework in this article for simplicity and because it is likely to be familiar to the widest range of readers.

2 Some studies—particularly retrospective nonexperimental studies using administrative data—have the statistical power to detect effects that are too small to be substantively important. This article does not focus on “overpowered” studies. Instead, we focus on studies that are designed to have just enough statistical power to detect the smallest impact that is substantively important.

3 The term “assignment variable” is often used interchangeably with “forcing variable,” “running variable,” and “score.” Truly continuous assignment variables are atypical in practice, although methods do exist to account for discreteness (for example, Armstrong & Kolesár, Citation2018; Barreca et al., Citation2016; Kolesár & Rothe, Citation2018; Lee & Card, Citation2008).

4 See Bloom (Citation2005); Bloom et al. (Citation2007); Deke and Dragoset (Citation2012); Hedges and Hedberg (Citation2007); Murray (Citation1998); and Schochet (Citation2008a, Citation2008b) for more information about calculating statistical power in both RCTs and RDDs.
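As a point of reference, the sketch below computes the minimum detectable effect (MDE) for a simple two-group, individually randomized trial using the textbook formula. It assumes equal outcome variance, an individual-level design, and a two-sided test, and it does not reproduce the cluster-adjusted formulas developed in the sources cited above; the function name and example numbers are ours.

```python
from scipy import stats

def mde_simple_rct(n_treatment, n_control, alpha=0.05, power=0.80):
    """Minimum detectable effect (in standard deviation units) for a
    two-group, individually randomized trial with no covariates.
    A textbook approximation, not the cluster-adjusted formulas in
    the works cited in this note."""
    df = n_treatment + n_control - 2
    # Standard error of the treatment-control difference in effect size
    # units, assuming a common outcome variance of 1.
    se = (1.0 / n_treatment + 1.0 / n_control) ** 0.5
    # Two-sided critical value plus the quantile needed for the desired
    # power (roughly 2.8 in total when alpha = 0.05 and power = 0.80).
    multiplier = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
    return multiplier * se

# Example: about 0.28 SD with 200 students per arm.
print(round(mde_simple_rct(200, 200), 2))
```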

5 The What Works Clearinghouse, managed by the U.S. Department of Education’s Institute of Education Sciences, systematically reviews and synthesizes education research studies with the goal of providing a reliable source of scientific evidence for what works in education to improve student outcomes. For more information, see http://ies.ed.gov/ncee/wwc/.

6 Asymdystopia can also arise in other causal impact designs, such as quasi-experiments (QEDs). For example, omitted variable bias can lead to upwardly biased QED impact estimates, and that bias likely does not diminish with sample size. A formal consideration of asymdystopia in QEDs is beyond the scope of this article, however.

7 This summary draws heavily from the WWC’s technical methods paper entitled Assessing Attrition Bias (https://ies.ed.gov/ncee/wwc/Document/243), which includes complete details of the attrition model.

8 The U.S. Department of Health and Human Services has also used this model. See, for example, the Home Visiting Evidence of Effectiveness Review (http://homvee.acf.hhs.gov) and the Teen Pregnancy Prevention Evidence Review (http://tppevidencereview.aspe.hhs.gov).

9 For any observed overall and differential attrition rates, there are many values of α_t and α_c that would yield a given level of bias (see Equation (3)). To calculate a unique pair of model parameters for each given level of bias, we assume that α_t = r·α_c, where r is a constant equal to the ratio of α_t to α_c implicit in the WWC parameters (0.27/0.22). This approach allows us to uniquely characterize how optimistic the study parameters would need to be to contain bias.
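To make the calculation concrete, the sketch below fixes r = 0.27/0.22 and numerically solves for the α_c (and hence α_t = r·α_c) that produces a target bias at given attrition rates. Because Equation (3) is not reproduced in these notes, the bias function is passed in as a placeholder that the reader would supply from the article; the function names and search interval are illustrative assumptions.

```python
from scipy.optimize import brentq

R = 0.27 / 0.22  # ratio of alpha_t to alpha_c implicit in the WWC parameters

def solve_alphas(target_bias, bias_fn, overall_rate, differential_rate, upper=1.0):
    """Find the (alpha_t, alpha_c) pair with alpha_t = R * alpha_c that
    yields `target_bias` at the given attrition rates.

    `bias_fn(overall_rate, differential_rate, alpha_t, alpha_c)` stands in
    for Equation (3), which is not reproduced in these notes; the search
    interval (0, `upper`) is likewise an assumption and must bracket a root.
    """
    def gap(alpha_c):
        return bias_fn(overall_rate, differential_rate, R * alpha_c, alpha_c) - target_bias

    alpha_c = brentq(gap, 1e-9, upper)  # root-find on the single free parameter
    return R * alpha_c, alpha_c
```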

10 Note that it does not matter which attrition rates and values of α correspond to the treatment or control groups—switching all treatment and control labels would yield the same conclusions.

11 We exclude quick reviews because the review protocol differs from other types of reviews. The database is available at https://ies.ed.gov/ncee/wwc/StudyFindings.

12 Typically the MDE is expressed as a function of the standard error. However, the WWC does not record standard errors, so we infer the standard error from the combination of the impact estimate, p-value, and analytical sample size. For each study, we calculated the MDE using the following formula: MDE = [T^{-1}(N-1, 1-α/2) + T^{-1}(N-1, β)] × |ES / T^{-1}(N-1, p/2)|, where T^{-1} is the inverse t-distribution, α is the significance level (assumed to be 0.05), β is the power (assumed to be 0.80), ES is the effect size, p is the p-value, N is the analytical sample size for the unit of randomization, and the vertical bars indicate the absolute value.
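The calculation translates directly into code. The sketch below is a minimal transcription of the formula above; the function and variable names, and the example numbers, are ours.

```python
from scipy import stats

def mde_from_reported(effect_size, p_value, n, alpha=0.05, power=0.80):
    """Infer the MDE from a reported effect size, p-value, and analytical
    sample size, following the formula in note 12."""
    df = n - 1
    # Standard error implied by the reported effect size and p-value.
    se = abs(effect_size / stats.t.ppf(p_value / 2, df))
    # Two-sided critical value at `alpha` plus the quantile corresponding
    # to the desired power.
    return (stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)) * se

# Example: a reported effect size of 0.15 with p = 0.04 and n = 500
# implies an MDE of roughly 0.20.
print(round(mde_from_reported(0.15, 0.04, 500), 2))
```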

13 A non-exhaustive list of relevant papers includes: Armstrong and Kolesár (Citation2020); Bartalotti (Citation2019); Bartalotti et al. (Citation2017); Branson et al. (Citation2019); Calonico et al. (Citation2020); Cattaneo et al. (Citation2015, Citation2017); He and Bartalotti (Citation2020); Imbens and Wager (Citation2019); Noack and Rothe (Citation2020); Sales and Hansen (Citation2019).

14 In some cases, these algorithms select bandwidths so narrow that there are not enough data to calculate an impact and/or a standard error. In those cases, we automatically expand the bandwidth until an impact and standard error can be calculated. Of the seven DGPs, three had cases where the bandwidth had to be expanded. For the DGPs represented in panes (b) and (c), the bandwidth had to be expanded in 78% of Monte Carlo replications when the CCT bandwidth selection algorithm was used with a sample size of 100,000. For the DGP represented in pane (d), the bandwidth had to be expanded in up to 1% of replications when the IK bandwidth was used (regardless of sample size). When the CCT algorithm was used, the bandwidth had to be expanded in 70% of replications with a sample size of 1,000 and in over 99% of replications with a sample size of 10,000 or 100,000.
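The expansion rule can be illustrated with a short sketch. The local linear specification, the estimability checks, and the 10% widening step below are our own illustrative choices, not the article's exact implementation.

```python
import numpy as np
import statsmodels.api as sm

def rdd_impact_with_expansion(score, outcome, cutoff, bandwidth, step=1.10):
    """Estimate a sharp RDD impact with a local linear regression inside
    `bandwidth`, widening the bandwidth by `step` until both the impact
    and its standard error can be computed. The 10% step and the checks
    below are assumptions for illustration."""
    while True:
        in_window = np.abs(score - cutoff) <= bandwidth
        x, y = score[in_window] - cutoff, outcome[in_window]
        treated = (x >= 0).astype(float)
        # Require observations on both sides of the cutoff and enough
        # degrees of freedom for the four regression parameters.
        if treated.sum() >= 3 and (1 - treated).sum() >= 3:
            design = sm.add_constant(np.column_stack([treated, x, treated * x]))
            fit = sm.OLS(y, design).fit()
            impact, se = fit.params[1], fit.bse[1]
            if np.isfinite(impact) and np.isfinite(se) and se > 0:
                return impact, se, bandwidth
        if bandwidth >= np.abs(score - cutoff).max():
            # The window already covers all the data; give up.
            raise ValueError("impact and standard error could not be computed")
        bandwidth *= step  # widen the window and try again
```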

15 The WWC attrition model does not directly incorporate covariates. However, the benefits of adjusting for covariates can be reflected in the model by making more optimistic assumptions regarding the negative consequences of attrition.

16 We recognize that in the case of retrospective studies, researchers have much less control over the available data and the analyses those data can support.

Additional information

Funding

This article is based on the following report released by the Institute of Education Sciences at the U.S. Department of Education: Deke, J., Wei, T., & Kautz, T. (2017). Asymdystopia: The threat of small biases in evaluations of education interventions that need to be powered to detect small impacts [NCEE 2018-4002]. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. This article has been funded in part by federal funds from the U.S. Department of Education under contract [number ED-IES-12-C-0083]. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Education nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This article has been funded in part by Mathematica, Inc. (©2019 Mathematica, Inc.).
