Abstract
Randomized evaluations of educational technology produce log data as a byproduct: highly granular data on student and teacher usage. These datasets could shed light on causal mechanisms, effect heterogeneity, or optimal use. However, there are methodological challenges: implementation is not randomized and is only defined for the treatment group, and log datasets have a complex structure. This article discusses three approaches to help surmount these issues. One approach uses data from the treatment group to estimate the effect of usage on outcomes in an observational study. Another, causal mediation analysis, estimates the role of usage in driving the overall effect. Finally, principal stratification estimates overall effects for groups of students with the same “potential” usage. We analyze hint data from an evaluation of the Cognitive Tutor Algebra I curriculum using these three approaches, with possibly conflicting results: the observational study and mediation analysis suggest that hints reduce posttest scores, while principal stratification finds that treatment effects may be correlated with higher rates of hint requests. We discuss these mixed conclusions and give broader methodological recommendations.
Notes
1 This subsection draws heavily on comments from an anonymous reviewer.
2 Technically, if h is the event that a student requests a hint on a problem and e is the event that the student makes an error, then
3 The principal stratification model was re-run without dropping sections, and after dropping sections worked by fewer than 500 students, with similar results.
4 Including these students in a principal stratification model is straightforward (Sales & Pane, 2019a). Including subjects with missing log data in a mediation or observational study design can be more problematic (see, e.g., Li & Zhou, 2017).
5 This is equivalent to the “controlled direct effect,” CDE(0); see, e.g., VanderWeele (2015, p. 57).