Abstract
Mediation analysis in Single Case Experimental Designs (SCEDs) evaluates intervention mechanisms for individuals. Despite recent methodological developments, no clear guidelines exist for maximizing power to detect the indirect effect in SCEDs. This study compares frequentist and Bayesian methods to determine (1) the minimum sample size required to detect indirect effects in AB designs, (2) the relative power for proximal versus distal mediators, and (3) the optimal allocation of observations between the baseline and treatment phases. Simulation results suggest that power is highest for proximal mediators with at least 60 observations allocated evenly between the A and B phases. These findings have important implications for the design and statistical analysis of SCEDs in the numerous fields that routinely employ them, ranging from education to psychology and nursing.
Notes
1 We use the term “group data” to refer to studies that collect and analyze data for samples of more than one participant, as opposed to SCEDs, in which data are analyzed for one participant at a time, regardless of whether the study collected data from multiple participants.
2 However, setting the b path to 0.59 in this setting does not result in the proportion of explained variance in the outcome equaling 26%, the usual definition of a large effect simulated in group-level mediation analysis.
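The discrepancy noted above can be seen in a minimal sketch, assuming a standardized simple regression y = b·m + e (a simplification for illustration, not the full SCED mediation model used in the study): under that assumption R² = b², so b = 0.59 explains roughly 35% of the outcome variance, while 26% explained variance would instead correspond to b ≈ 0.51.

```python
# Minimal sketch (assumption: standardized simple regression y = b*m + e,
# where the proportion of explained variance is R^2 = b^2; this is a
# simplification, not the full SCED mediation model).
b = 0.59
print(round(b ** 2, 3))   # variance explained by b = 0.59 -> 0.348, not 0.26

b_for_26 = 0.26 ** 0.5    # b that would explain 26% of the variance
print(round(b_for_26, 2)) # -> 0.51
```

This is only meant to show why a coefficient of 0.59 and 26% explained variance are not interchangeable benchmarks; the exact explained variance in the authors' simulation depends on their full model.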
3 The RMSE for the Bayesian point summaries of the individual paths and the indirect effect was computed only for the posterior mean, not for the posterior median and mode, because the posterior mean had the lowest bias of the three point summaries. Thus, the RMSE values reported for the Bayesian results represent an upper limit on the statistical performance of Bayesian point summaries in this context.
4 Technically, the concepts of power and Type I error rate stem from the frequentist framework and do not apply to Bayesian posterior summaries, and we do not advocate using cut-off values when interpreting the results of Bayesian hypothesis testing. We do so in this study only to define evaluation criteria that inform applied researchers about the expected accuracy of these methods in practice.