Research Article

Bootstrap Confidence Intervals for Multilevel Standardized Effect Size

Pages 558-578 | Published online: 11 Apr 2020
 

Abstract

Although many methodologists and professional organizations have urged applied researchers to compute and report effect size measures accompanying tests of statistical significance, discussions on obtaining confidence intervals (CIs) for effect size with clustered/multilevel data have been scarce. In this paper, I explore the bootstrap as a viable and accessible alternative for obtaining CIs for multilevel standardized mean difference effect size for cluster-randomized trials. A simulation was carried out to compare 17 analytic and bootstrap procedures for constructing CIs for multilevel effect size, in terms of empirical coverage rate and width, for both normal and nonnormal data. Results showed that, overall, the residual bootstrap with studentized CI had the best coverage rates (94.75% on average), whereas the residual bootstrap with basic CI had better coverage in small samples. These two procedures for constructing CIs showed better coverage than using analytic methods for both normal and nonnormal data. In addition, I provide an illustrative example showing how bootstrap CIs for multilevel effect size can be easily obtained using the statistical software R and the R package bootmlm. I strongly encourage applied researchers to report CIs to adequately convey the uncertainty of their effect size estimates.

Notes

1 Note that, whereas the definition of SD causes no confusion for single-level studies, it is ambiguous in multilevel studies, because several different SDs can be used. These include the within-cluster SD ($\sigma_W$), the between-cluster SD ($\sigma_B$), and the total SD. Hedges (2007) viewed the issue in a meta-analysis framework and suggested that the choice should depend on the nature of the other studies in the synthesis. For example, if most other studies collected data from a single site, the within-cluster SD may be a better choice. It is not the purpose of this study to argue which SD should be used. Indeed, any effect size can be estimated with the bootstrap as long as its estimator can be obtained from the original sample. I choose the total SD in this study because it uses more information in the data and can, in theory, be converted to a variance-accounted-for effect size (Snijders & Bosker, 1994).

2 In Hedges (2009), $V(\hat{\sigma}_W^2)$ in the numerator of the second term of Equation (8) was dropped under the assumption that it was negligible, but in this paper it is kept for more accurate estimation.

3 Technically speaking, the sampling variance of each individual residual depends on its hat value, which measures the influence of that individual case (or cluster) on the model estimates. Rescaling all residuals by the same value is therefore not mathematically correct, and one should instead rescale the level-2 and level-1 residuals according to their hat values, as recommended in Davison and Hinkley (1997). In single-level regression, the variance of a residual is $V(\hat{e}_i) = \sigma^2 (1 - h_{ii})$, which depends on the leverage (i.e., hat value, $h_{ii}$) of the $i$th observation. A theoretically better residual bootstrap procedure therefore transforms the residuals differentially as $\hat{e}_i^* = \hat{e}_i / \sqrt{1 - h_{ii}}$, so that $V(\hat{e}_i^*) = \sigma^2$ for all $i$. The results based on this improved residual bootstrap procedure are not presented in this paper, however, as they were essentially the same as those for the procedure by Carpenter et al. (2003). In the bootmlm package, Carpenter et al.'s procedure can be called using the argument type = "residual_cgr", and the hat-value-reflated procedure can be called using the argument type = "residual".
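The single-level case described in Note 3 can be illustrated with a minimal sketch. It assumes a simple linear regression with an intercept, for which the hat values have the closed form $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_k (x_k - \bar{x})^2$. The function names and the toy data are hypothetical; this is not the bootmlm implementation, which handles the level-1 and level-2 residuals of a multilevel model.

```python
import math

def hat_values(x):
    """Leverages h_ii for simple linear regression with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

def ols_residuals(x, y):
    """Raw residuals from an ordinary least squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def reflate(residuals, h):
    """Divide each residual by sqrt(1 - h_ii) so all have variance sigma^2."""
    return [e / math.sqrt(1.0 - hi) for e, hi in zip(residuals, h)]

# Toy data (hypothetical); the last point has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 10.5]

h = hat_values(x)
e = ols_residuals(x, y)
e_star = reflate(e, h)
```

Because every $h_{ii}$ lies strictly between 0 and 1 here, each reflated residual is at least as large in magnitude as the raw one, and the high-leverage observation is inflated the most.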

4 In a conventional jackknife estimator, $l_i = T(x) - T_{(-i)}(x)$ represents the change in $T(x)$ when the $i$th observation is deleted. As noted in Van der Leeden et al. (2008), because the level-1 observations are not independent in multilevel data, one can perform the jackknife only at the highest level, so $l_j$ is the change in $\hat{\delta}_T$ when the $j$th cluster is deleted. In this paper the BCa CIs were obtained using the grouped jackknife as described in Van der Leeden et al.
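The delete-one-cluster idea in Note 4 can be sketched as follows. The statistic used here is a hypothetical stand-in (a raw mean difference divided by the overall SD), not the model-based estimator of the total-SD effect size used in the paper. The acceleration constant uses the common formula $a = \sum_j l_j^3 / \{6 (\sum_j l_j^2)^{3/2}\}$; whether the paper centers or scales the $l_j$ differently is not stated in this note, so treat that step as an assumption.

```python
from statistics import mean, stdev

def toy_effect_size(clusters):
    """Hypothetical stand-in statistic: mean difference over overall SD."""
    treated = [y for arm, ys in clusters if arm == 1 for y in ys]
    control = [y for arm, ys in clusters if arm == 0 for y in ys]
    return (mean(treated) - mean(control)) / stdev(treated + control)

def grouped_jackknife(clusters, statistic):
    """l_j = T(x) - T_(-j)(x): change in T when the j-th cluster is deleted."""
    full = statistic(clusters)
    return [
        full - statistic(clusters[:j] + clusters[j + 1:])
        for j in range(len(clusters))
    ]

def acceleration(l):
    """Acceleration constant for a BCa interval (assumed common formula)."""
    num = sum(v ** 3 for v in l)
    den = 6.0 * sum(v ** 2 for v in l) ** 1.5
    return num / den

# Toy cluster-randomized data (hypothetical): (treatment arm, outcomes).
clusters = [
    (1, [5.1, 6.0, 5.5]),
    (1, [6.2, 6.8, 7.1]),
    (1, [4.9, 5.3, 5.8]),
    (0, [4.0, 4.4, 3.9]),
    (0, [5.0, 4.7, 5.2]),
    (0, [3.5, 4.1, 3.8]),
]

l = grouped_jackknife(clusters, toy_effect_size)
a = acceleration(l)
```

Whole clusters are deleted rather than individual observations, so the number of jackknife values equals the number of clusters, not the number of level-1 units.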

5 In this paper I adopt the parameterization of the skew-t by Azzalini (2013, chapter 4), in which a variable $Z = Z_0 / \sqrt{V}$ has a skew-t distribution with slant parameter $\alpha$ and degrees of freedom $\nu$ if $Z_0$ follows a skew-normal distribution with slant $\alpha$ and $V$ follows a $\chi^2_\nu / \nu$ distribution with $\nu$ degrees of freedom. The skew-normal density function is $2\phi(x)\Phi(\alpha x)$, where $\phi(x)$ is the standard normal density function and $\Phi(x)$ is the standard normal cumulative distribution function.
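The construction in Note 5 translates directly into a sampler. The sketch below assumes the standard stochastic representation of a skew-normal variate, $Z_0 = \delta |U_0| + \sqrt{1 - \delta^2}\, U_1$ with $\delta = \alpha / \sqrt{1 + \alpha^2}$ and independent standard normal $U_0$, $U_1$, and draws the chi-square variate as a gamma variate with shape $\nu/2$ and scale 2. The function names, seed, and parameter values are illustrative and are not taken from the paper's simulation code.

```python
import math
import random

def skew_normal_pdf(x, alpha):
    """Skew-normal density 2 * phi(x) * Phi(alpha * x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * x / math.sqrt(2.0)))
    return 2.0 * phi * big_phi

def draw_skew_normal(alpha, rng):
    """One skew-normal draw via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = abs(rng.gauss(0.0, 1.0))
    u1 = rng.gauss(0.0, 1.0)
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1

def draw_skew_t(alpha, nu, rng):
    """One skew-t draw: Z = Z0 / sqrt(V), with V ~ chi-square(nu) / nu."""
    z0 = draw_skew_normal(alpha, rng)
    # chi-square(nu) is a gamma distribution with shape nu/2 and scale 2
    v = rng.gammavariate(nu / 2.0, 2.0) / nu
    return z0 / math.sqrt(v)

rng = random.Random(2020)  # fixed seed so the draws are reproducible
sample = [draw_skew_t(alpha=5.0, nu=5.0, rng=rng) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
```

With $\alpha = 0$ the construction reduces to an ordinary Student t variate, and a positive $\alpha$ shifts mass to the right, so the sample mean above is positive.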

6 For example, the parametric and the residual bootstrap with percentile and bias-corrected CIs in MLwiN (Rasbash et al., 2019), the bootstrap procedures in the R packages rms (Harrell, 2019) and ClusterBootstrap (Deen & de Rooij, 2018), the case bootstrap routines in SPSS and in Stata, and the SAS macros by Roberts and Fan (2004) and Wang et al. (2006).
