Bootstrap Confidence Intervals for Multilevel Standardized Effect Size. Multivariate Behavioral Research, Vol. 56, No. 4

Abstract

Although many methodologists and professional organizations have urged applied researchers to compute and report effect size measures accompanying tests of statistical significance, discussions on obtaining confidence intervals (CIs) for effect size with clustered/multilevel data have been scarce. In this paper, I explore the bootstrap as a viable and accessible alternative for obtaining CIs for multilevel standardized mean difference effect size for cluster-randomized trials. A simulation was carried out to compare 17 analytic and bootstrap procedures for constructing CIs for multilevel effect size, in terms of empirical coverage rate and width, for both normal and nonnormal data. Results showed that, overall, the residual bootstrap with studentized CI had the best coverage rates (94.75% on average), whereas the residual bootstrap with basic CI had better coverage in small samples. These two procedures for constructing CIs showed better coverage than using analytic methods for both normal and nonnormal data. In addition, I provide an illustrative example showing how bootstrap CIs for multilevel effect size can be easily obtained using the statistical software R and the R package bootmlm. I strongly encourage applied researchers to report CIs to adequately convey the uncertainty of their effect size estimates.

Keywords:

Effect size
multilevel
cluster-randomized trial
bootstrap
standardized mean difference
robustness
nonnormal data

Notes

1 Whereas the definition of the SD is unambiguous in single-level studies, it is ambiguous in multilevel studies because several different SDs can be used: the within-cluster SD ($σ_{W}$), the between-cluster SD ($σ_{B}$), and the total SD. Hedges (2007) viewed the issue in a meta-analysis framework and suggested that the choice should depend on the nature of the other studies in the synthesis. For example, if most other studies collected data from a single site, the within-cluster SD may be a better choice. It is not the purpose of this study to argue which SD should be used. Indeed, any effect size can be estimated with the bootstrap as long as its estimator can be computed from the original sample. I choose the total SD in this study because it uses more information in the data and can theoretically be converted to a variance-accounted-for effect size (Snijders & Bosker, 1994).
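
As a rough numerical illustration of how the choice of standardizer changes the reported effect size, the Python sketch here divides one mean difference by each of the three SDs. It assumes a two-level random-intercept model in which the total variance is the sum of the within-cluster and between-cluster variances; the function name and the input values are hypothetical and are not taken from the article.

```python
import math


def standardized_effects(mean_diff, var_within, var_between):
    """Divide one mean difference by each candidate SD.

    Assumes total variance = within-cluster + between-cluster variance
    (two-level random-intercept model); inputs are illustrative only.
    """
    var_total = var_within + var_between
    return {
        "within": mean_diff / math.sqrt(var_within),
        "between": mean_diff / math.sqrt(var_between),
        "total": mean_diff / math.sqrt(var_total),
    }


# Hypothetical values: mean difference 0.5, intraclass correlation 0.2.
effects = standardized_effects(mean_diff=0.5, var_within=0.8, var_between=0.2)
icc = 0.2 / (0.8 + 0.2)
for name, value in effects.items():
    print(f"delta_{name} = {value:.3f}")
```

Because the total SD is the largest of the three, the total-SD effect size is the smallest in absolute value. Under the stated assumption, the within-cluster version equals the total version divided by the square root of one minus the intraclass correlation.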

2 In Hedges (2009), $V(\hat{σ}_{W}^{2})$ in the numerator of the second term of Equation (8) was dropped with the assumption that it was negligible, but in this paper it is kept for more accurate estimation.

3 Technically, the sampling variance of each residual depends on its hat value, a measure of the influence of each individual case (or cluster) on the model estimates. Rescaling all residuals by the same value is therefore not mathematically correct; one should instead rescale the level-2 and level-1 residuals according to their hat values, as recommended in Davison and Hinkley (1997). In single-level regression, the variance of a residual is $V(\hat{e}_{i}) = σ^{2}(1 - h_{ii})$, which depends on the leverage (i.e., hat value, $h_{ii}$) of the $i$th observation. A theoretically better residual bootstrap procedure therefore transforms the residuals differentially as $\hat{e}_{i}^{*} = \hat{e}_{i} / \sqrt{1 - h_{ii}}$ so that $V(\hat{e}_{i}^{*}) = σ^{2}$ for all $i$. The results based on this improved residual bootstrap procedure are not presented in this paper, however, as they were essentially the same as those for the procedure by Carpenter et al. (2003). In the bootmlm package, Carpenter et al.'s procedure can be called using the argument type = "residual_cgr", and the hat-value-reflated procedure can be called using the argument type = "residual".
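
The single-level version of this rescaling can be sketched in a few lines of Python. This is a minimal illustration for simple linear regression only, not the multilevel procedure implemented in bootmlm; the data, the helper names, and the centering step before resampling are assumptions made for the example.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


def hat_values(x):
    """Leverages h_ii for a simple regression with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def reflate(residuals, leverages):
    """Rescale each residual by 1/sqrt(1 - h_ii) so variances are equal."""
    return [e / math.sqrt(1.0 - h) for e, h in zip(residuals, leverages)]


# Hypothetical data; the last point has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.2, 1.9, 3.4, 3.8, 10.5]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
raw = [yi - fi for yi, fi in zip(y, fitted)]
h = hat_values(x)
adjusted = reflate(raw, h)

# One residual-bootstrap draw: center the adjusted residuals, resample them
# with replacement, and add them back to the fitted values.
rng = random.Random(1)
mean_adj = sum(adjusted) / len(adjusted)
centered = [e - mean_adj for e in adjusted]
y_star = [fi + rng.choice(centered) for fi in fitted]
```

High-leverage points have the smallest raw residual variance, so they receive the largest inflation factor; a single common scaling factor would leave them under-dispersed.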

4 In a conventional jackknife estimator, $l_{i} = T(x) - T_{(i)}(x)$ represents the change in $T(x)$ when the $i$th observation is deleted. As noted in Van der Leeden et al. (2008), because the level-1 observations are not independent in multilevel data, the jackknife can be performed only at the highest level, so $l_{j}$ is the change in $\hat{δ}_{T}$ when the $j$th cluster is deleted. In this paper, the BCa CIs were obtained using the grouped jackknife as described in Van der Leeden et al. (2008).
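
A delete-one-cluster jackknife of this kind might look like the following Python sketch. The statistic is a deliberately simple stand-in (a mean difference over a pooled SD), not the article's multilevel estimator $\hat{δ}_{T}$, and the acceleration formula is one commonly used form for BCa intervals rather than a quotation from the article; the data and function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def effect_size(clusters):
    """Toy statistic (not the article's estimator): treatment-minus-control
    mean difference divided by the pooled SD of all observations.

    `clusters` is a list of (is_treated, observations) pairs.
    """
    treat = [v for is_treated, obs in clusters if is_treated for v in obs]
    ctrl = [v for is_treated, obs in clusters if not is_treated for v in obs]
    m_t, m_c = mean(treat), mean(ctrl)
    ss = sum((v - m_t) ** 2 for v in treat) + sum((v - m_c) ** 2 for v in ctrl)
    return (m_t - m_c) / math.sqrt(ss / (len(treat) + len(ctrl) - 2))


def grouped_jackknife(clusters, stat):
    """l_j = stat(full data) - stat(data with cluster j deleted)."""
    full = stat(clusters)
    return [full - stat(clusters[:j] + clusters[j + 1:])
            for j in range(len(clusters))]


def acceleration(l_values):
    """Acceleration constant, assumed form: sum(l^3) / (6 * sum(l^2)^1.5)."""
    num = sum(l ** 3 for l in l_values)
    den = 6.0 * sum(l ** 2 for l in l_values) ** 1.5
    return num / den


# Hypothetical cluster-randomized data: three clusters per arm.
clusters = [
    (True, [5.1, 6.0, 5.5]),
    (True, [6.2, 6.8, 7.1]),
    (True, [4.9, 5.3, 5.8]),
    (False, [4.0, 4.6, 4.2]),
    (False, [5.0, 5.4, 4.7]),
    (False, [3.8, 4.1, 4.5]),
]
l_values = grouped_jackknife(clusters, effect_size)
a_hat = acceleration(l_values)
```

Whole clusters are deleted, never single observations, so the number of jackknife values equals the number of clusters. Each arm must keep at least one cluster after a deletion for the statistic to remain defined.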

5 In this paper I adopt the parameterization of the skew-t by Azzalini (2013, chapter 4), in which a variable $Z = Z_{0} / \sqrt{V}$ has a skew-t distribution with slant parameter $α$ and degrees of freedom $ν$ if $Z_{0}$ follows a skew-normal distribution with slant $α$ and $V$ follows a $χ^{2} / ν$ distribution with $ν$ degrees of freedom. The skew-normal density function is $2 ϕ(x) Φ(α x)$, where $ϕ(x)$ is the standard normal density function and $Φ(x)$ is the standard normal cumulative distribution function.
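
To make the construction concrete, the Python sketch here draws skew-t variates as $Z_{0} / \sqrt{V}$. It assumes that $Z_{0}$ and $V$ are independent and generates the skew-normal part through the standard representation $δ|U_{0}| + \sqrt{1 - δ^{2}}\,U_{1}$ with $δ = α / \sqrt{1 + α^{2}}$ and independent standard normal $U_{0}$, $U_{1}$. The function names, seed, and parameter values are illustrative and do not reproduce the article's simulation code.

```python
import math
import random


def skew_normal_pdf(x, alpha):
    """Skew-normal density 2 * phi(x) * Phi(alpha * x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * x / math.sqrt(2.0)))
    return 2.0 * phi * big_phi


def rskew_normal(alpha, rng):
    """One skew-normal draw via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = abs(rng.gauss(0.0, 1.0))
    u1 = rng.gauss(0.0, 1.0)
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1


def rskew_t(alpha, nu, rng):
    """One skew-t draw as Z0 / sqrt(V), with V ~ chi-square(nu) / nu."""
    z0 = rskew_normal(alpha, rng)
    # chi-square(nu) is a gamma distribution with shape nu/2 and scale 2.
    v = rng.gammavariate(nu / 2.0, 2.0) / nu
    return z0 / math.sqrt(v)


rng = random.Random(2020)
sn_draws = [rskew_normal(5.0, rng) for _ in range(20000)]
st_draws = [rskew_t(5.0, 5.0, rng) for _ in range(20000)]
print(sum(sn_draws) / len(sn_draws), sum(st_draws) / len(st_draws))
```

With a slant of zero the density reduces to the standard normal density, which gives a quick sanity check on the implementation; a positive slant shifts the mean of the draws above zero.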

6 For example, the parametric and the residual bootstrap with percentile and bias-corrected CIs in MLwiN (Rasbash et al., 2019), the bootstrap procedures in the R packages rms (Harrell, 2019) and ClusterBootstrap (Deen & de Rooij, 2018), the case bootstrap routines in SPSS and in Stata, and the SAS macros by Roberts and Fan (2004) and Wang et al. (2006).

Hedges, L. V. (2007). Effect sizes in cluster-randomized designs. Journal of Educational and Behavioral Statistics, 32(4), 341–370. doi:10.3102/1076998606298043

Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods & Research, 22(3), 342–363. doi:10.1177/0049124194022003004

Hedges, L. V. (2009). Effect sizes in nested designs. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 337–355). Russell Sage Foundation.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press.

Carpenter, J. R., Goldstein, H., & Rasbash, J. (2003). A novel bootstrap procedure for assessing the relationship between class size and achievement. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(4), 431–443. doi:10.1111/1467-9876.00415

Van der Leeden, R., Meijer, E., & Busing, F. M. T. A. (2008). Resampling multilevel models. In J. de Leeuw & E. Meijer (Eds.), Handbook of multilevel analysis (pp. 401–433). Springer.

Azzalini, A. (2013). The skew-normal and related families (with the collaboration of Antonella Capitanio). Cambridge University Press. doi:10.1017/CBO9781139248891

Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2019). A user’s guide to MLwiN, v3.03. http://www.bristol.ac.uk/cmm/media/software/mlwin/downloads/manuals/3-03/manual-web.pdf

Harrell, F. E. Jr. (2019). rms: Regression modeling strategies (R package version 5.1-3.1) [Computer software manual]. https://CRAN.R-project.org/package=rms

Deen, M., & de Rooij, M. (2018). ClusterBootstrap: Analyze clustered data with generalized linear models using the cluster bootstrap (R package version 1.0.0) [Computer software manual]. https://CRAN.R-project.org/package=ClusterBootstrap

Roberts, J. K., & Fan, X. (2004). Bootstrapping within the multilevel/hierarchical linear modeling framework: A primer for use with SAS and SPLUS. Multiple Linear Regression Viewpoints, 30, 23–34.

Wang, J., Carpenter, J. R., & Kepler, M. A. (2006). Using SAS to conduct nonparametric residual bootstrap multilevel modeling with a small number of groups. Computer Methods and Programs in Biomedicine, 82(2), 130–143. doi:10.1016/j.cmpb.2006.02.006
