Research Article

Bootstrap Confidence Intervals for Multilevel Standardized Effect Size

 

Abstract

Although many methodologists and professional organizations have urged applied researchers to compute and report effect size measures accompanying tests of statistical significance, discussions on obtaining confidence intervals (CIs) for effect size with clustered/multilevel data have been scarce. In this paper, I explore the bootstrap as a viable and accessible alternative for obtaining CIs for multilevel standardized mean difference effect size for cluster-randomized trials. A simulation was carried out to compare 17 analytic and bootstrap procedures for constructing CIs for multilevel effect size, in terms of empirical coverage rate and width, for both normal and nonnormal data. Results showed that, overall, the residual bootstrap with studentized CI had the best coverage rates (94.75% on average), whereas the residual bootstrap with basic CI had better coverage in small samples. These two procedures for constructing CIs showed better coverage than using analytic methods for both normal and nonnormal data. In addition, I provide an illustrative example showing how bootstrap CIs for multilevel effect size can be easily obtained using the statistical software R and the R package bootmlm. I strongly encourage applied researchers to report CIs to adequately convey the uncertainty of their effect size estimates.

Notes

1 Note that, whereas the definition of SD causes no confusion for single-level studies, it is ambiguous in multilevel studies, because several different SDs can be used. These include the within-cluster SD ($\sigma_W$), the between-cluster SD ($\sigma_B$), and the total SD. Hedges (2007) viewed the issue in a meta-analysis framework and suggested that the choice should depend on the nature of the other studies in the synthesis. For example, if most other studies collect data from a single site, the within-cluster SD may be a better choice. It is not a purpose of this study to argue which SD should be used. Indeed, any effect size can be bootstrapped as long as its estimator can be computed from the original sample. I choose the total SD in this study because it uses more information in the data and can theoretically be converted to a variance-accounted-for effect size (Snijders & Bosker, 1994).
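To make the three candidate standardizers concrete, the block below writes out the corresponding effect sizes. It is a sketch in the notation commonly associated with Hedges (2007), not a reproduction of this paper's equations; the symbols $\mu_T$, $\mu_C$ (treatment and control population means), $\sigma_T$, and $\rho$ (intraclass correlation) are assumptions introduced here for illustration.

```latex
% Assumed notation: \mu_T, \mu_C are the treatment and control means;
% \sigma_W^2 and \sigma_B^2 are the within- and between-cluster variances.
\begin{align*}
  \sigma_T^2 &= \sigma_B^2 + \sigma_W^2, \qquad
  \rho = \frac{\sigma_B^2}{\sigma_T^2}, \\
  \delta_W &= \frac{\mu_T - \mu_C}{\sigma_W}, \qquad
  \delta_B = \frac{\mu_T - \mu_C}{\sigma_B}, \qquad
  \delta_T = \frac{\mu_T - \mu_C}{\sigma_T}, \\
  \delta_W &= \frac{\delta_T}{\sqrt{1 - \rho}}, \qquad
  \delta_B = \frac{\delta_T}{\sqrt{\rho}}.
\end{align*}
```

Under these definitions the three effect sizes differ only by factors of the intraclass correlation, so a CI for one can be rescaled only if $\rho$ is treated as known.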

2 In Hedges (2009), $V(\hat{\sigma}_W^2)$ in the numerator of the second term of Equation (8) was dropped on the assumption that it was negligible, but in this paper it is kept for more accurate estimation.

3 Technically speaking, the sampling variance of each residual depends on its hat value, which measures the influence of the corresponding case (or cluster) on the model estimates. Therefore, rescaling all residuals by the same value is not mathematically correct, and one should instead rescale the level-2 and level-1 residuals according to their hat values, as recommended in Davison and Hinkley (1997). In single-level regression, the variance of a residual is $V(\hat{e}_i) = \sigma^2(1 - h_{ii})$, which depends on the leverage (i.e., hat value, $h_{ii}$) of the $i$th observation. Therefore, a theoretically better residual bootstrap procedure is to transform the residuals differentially as $\hat{e}_i^* = \hat{e}_i / \sqrt{1 - h_{ii}}$ so that $V(\hat{e}_i^*) = \sigma^2$ for all $i$. The results based on this improved residual bootstrap procedure are not presented in this paper, however, as they were essentially the same as those for the procedure by Carpenter et al. (2003). In the bootmlm package, Carpenter et al.'s procedure can be called using the argument type = 'residual_cgr', and the hat-value-reflated procedure can be called using the argument type = 'residual'.
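The single-level case described in this note can be sketched in a few lines. The example below is only an illustration for simple linear regression using the Python standard library; it is not the multilevel procedure implemented in bootmlm, and the function name `leverage_adjusted_residuals` and the toy data are hypothetical.

```python
import math


def leverage_adjusted_residuals(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares and return
    (raw residuals, hat values, leverage-adjusted residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # OLS slope and intercept for simple linear regression.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Closed-form hat values for simple regression:
    # h_ii = 1/n + (x_i - x_bar)^2 / Sxx.
    hat = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Divide each residual by sqrt(1 - h_ii) so that the adjusted
    # residuals share a common variance sigma^2.
    adjusted = [e / math.sqrt(1.0 - h) for e, h in zip(resid, hat)]
    return resid, hat, adjusted


# Toy data (made up): the last point has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 10.1]
resid, hat, adjusted = leverage_adjusted_residuals(x, y)
```

High-leverage cases have residuals that are shrunk toward zero by the fit, so dividing by $\sqrt{1 - h_{ii}}$ inflates them the most. A residual bootstrap would then resample from the adjusted residuals (usually after centering them) rather than from the raw ones.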

4 In a conventional jackknife estimator, $l_i = T(x) - T_{(-i)}(x)$ represents the change in $T(x)$ when the $i$th observation is deleted. As noted in Van der Leeden et al. (2008), because the level-1 observations are not independent in multilevel data, one can perform the jackknife only on the highest level, so $l_j$ is the change in $\hat{\delta}_T$ when the $j$th cluster is deleted. In this paper the BCa CIs were obtained using the grouped jackknife as described in Van der Leeden et al.
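The delete-one-cluster idea can be illustrated as follows. This is a minimal sketch under stated assumptions: the statistic is a stand-in (the grand mean rather than the multilevel effect size estimate), the acceleration constant uses the usual centered jackknife formula for BCa intervals, and the names `grouped_jackknife` and `grand_mean` are hypothetical.

```python
def grouped_jackknife(clusters, statistic):
    """Delete one whole cluster at a time.

    clusters  : list of lists, one inner list of observations per cluster
    statistic : function mapping a list of clusters to a number
    Returns (influence values l_j, BCa acceleration constant).
    """
    full = statistic(clusters)
    # Statistic recomputed with the j-th cluster removed.
    loo = [statistic(clusters[:j] + clusters[j + 1:])
           for j in range(len(clusters))]
    # l_j: change in the statistic when cluster j is deleted.
    influence = [full - t for t in loo]

    # Usual jackknife estimate of the acceleration constant, based on
    # leave-one-out values centered at their own mean.
    loo_mean = sum(loo) / len(loo)
    centered = [loo_mean - t for t in loo]
    denom = 6.0 * sum(c ** 2 for c in centered) ** 1.5
    accel = sum(c ** 3 for c in centered) / denom if denom > 0 else 0.0
    return influence, accel


def grand_mean(clusters):
    """Stand-in statistic: mean of all observations across clusters."""
    values = [v for cluster in clusters for v in cluster]
    return sum(values) / len(values)


# Toy data (made up): four clusters of unequal size.
clusters = [[1.0, 2.0, 3.0], [2.0, 4.0], [10.0, 12.0, 11.0, 9.0], [3.0, 3.0]]
influence, accel = grouped_jackknife(clusters, grand_mean)
```

Deleting whole clusters keeps the dependence among level-1 units within each remaining cluster intact, which is the reason for jackknifing only at the highest level.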

5 In this paper I adopt the parameterization of the skew-t distribution by Azzalini (2013, chapter 4), in which a variable $Z = Z_0 / \sqrt{V}$ has a skew-t distribution with slant parameter $\alpha$ and degrees of freedom $\nu$ if $Z_0$ follows a skew-normal distribution with slant $\alpha$ and $V$ independently follows a $\chi^2_\nu / \nu$ distribution. The skew-normal density function is $2\phi(x)\Phi(\alpha x)$, where $\phi(x)$ is the standard normal density function and $\Phi(x)$ is the standard normal cumulative distribution function.
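The construction in this note translates directly into a random number generator. The sketch below assumes the standard stochastic representation of the skew-normal, $Z_0 = \delta |U_0| + \sqrt{1 - \delta^2}\, U_1$ with $\delta = \alpha / \sqrt{1 + \alpha^2}$ and independent standard normal $U_0$, $U_1$; it is not the code used in the simulation, and the function names and parameter values are hypothetical.

```python
import math
import random


def rskew_normal(alpha, rng):
    """Draw from a skew-normal with slant alpha (location 0, scale 1)."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    u0 = abs(rng.gauss(0.0, 1.0))  # half-normal component
    u1 = rng.gauss(0.0, 1.0)       # independent normal component
    return delta * u0 + math.sqrt(1.0 - delta ** 2) * u1


def rskew_t(alpha, nu, rng):
    """Draw Z = Z0 / sqrt(V), with Z0 skew-normal and V ~ chi^2_nu / nu."""
    z0 = rskew_normal(alpha, rng)
    # A chi-square with nu degrees of freedom is a gamma variate with
    # shape nu / 2 and scale 2.
    v = rng.gammavariate(nu / 2.0, 2.0) / nu
    return z0 / math.sqrt(v)


rng = random.Random(2024)  # fixed seed so the draws are reproducible
draws = [rskew_t(3.0, 10.0, rng) for _ in range(5000)]
sample_mean = sum(draws) / len(draws)
```

With a positive slant the distribution is right-skewed, so the sample mean of the draws is positive; setting `alpha` to 0 recovers an ordinary Student t variate.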

6 For example, the parametric and the residual bootstrap with percentile and bias-corrected CIs in MLwiN (Rasbash et al., 2019), the bootstrap procedures in the R packages rms (Harrell, 2019) and ClusterBootstrap (Deen & de Rooij, 2018), the case bootstrap routines in SPSS and in Stata, and the SAS macros by Roberts and Fan (2004) and Wang et al. (2006).
