Evidence Base Update

Effect Size Measures for Multilevel Models in Clinical Child and Adolescent Research: New R-Squared Methods and Recommendations


Abstract

Clinical psychologists studying child and adolescent populations commonly analyze hierarchically structured data via multilevel modeling (MLM). In clinical child and adolescent psychology, and in psychology more broadly, increasing emphasis is being placed on the reporting of effect sizes, such as R-squared (R²) measures of explained variance. In MLM, however, the literature on R² had, until recently, suffered from several shortcomings: (a) the relations among existing measures were unknown, (b) methods for quantifying some types of explained variance were unavailable, (c) which (if any) measures should be used for model comparison was unclear, (d) most measures did not generalize to models with more than two levels, and (e) software to compute measures was unavailable. The purpose of this article is to summarize recent methodological developments that resolved these issues and to encourage the use of MLM R² in practice. We provide a nontechnical discussion of how the issues have been resolved and demonstrate how the new measures and methods can be implemented, highlighting their utility with an empirical example. We first consider a two-level MLM for a single hypothesized model in which we examine emotional response to social situations as a predictor of maladaptive self-cognitions, demonstrating the various ways we can quantify explained variance. We then discuss and demonstrate the use of R² for model comparison and discuss the extension to models with more than two levels. Last, we discuss new free software that researchers can use to compute measures and produce associated graphics.

Acknowledgments

We thank Sonya Sterba for helpful comments.

Notes

1 In the past 10 years, approximately 17% of articles published in Journal of Clinical Child & Adolescent Psychology have included analyses or discussion of MLM, also commonly called hierarchical linear modeling or mixed-effects modeling (estimated by searching Google Scholar and computing the percentage of articles containing the exact phrase “multilevel modeling,” its synonyms, or variants such as “multilevel models”).

2 Note that some measures include only one source of explained variance, whereas others combine multiple sources. For simplicity, we focus on the single-source measures, as the combined-source measures are simple combinations of these, as noted in .

3 Any of the combined-source measures can be visualized as the cumulative height of the stacked bars; for instance, in the hypothetical example, f1, f2, v, and m together explain about 80% of the total outcome variance ().
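As a minimal sketch of the arithmetic behind the stacked bars, the Python below takes hypothetical variance components for the sources f1, f2, v, and m plus the level-1 residual, converts each to a proportion of total outcome variance, and accumulates them the way the bars are stacked. The numbers are invented for illustration (chosen so the four sources reach 80%, echoing the note) and are not estimates from the article; the code also assumes the components have already been estimated and together with the residual make up the total model-implied outcome variance. The comments give the usual interpretation of each source label.

```python
from itertools import accumulate

# Hypothetical variance components for one fitted two-level model
# (illustrative values only, not estimates from the article's example).
COMPONENTS = {
    "f1": 3.0,  # level-1 predictors via fixed slopes
    "f2": 2.0,  # level-2 predictors via fixed slopes
    "v": 1.0,   # level-1 predictors via random slope variation/covariation
    "m": 2.0,   # cluster-specific outcome means via random intercept variation
}
RESIDUAL = 2.0  # level-1 residual variance (unexplained)


def single_source_r2(components, residual):
    """Return each source's share of the total outcome variance."""
    total = sum(components.values()) + residual
    return {name: var / total for name, var in components.items()}


def stacked_heights(r2):
    """Cumulative bar heights; each entry is a combined-source share."""
    return dict(zip(r2, accumulate(r2.values())))


r2 = single_source_r2(COMPONENTS, RESIDUAL)
heights = stacked_heights(r2)
print({k: round(v, 2) for k, v in r2.items()})
print({k: round(v, 2) for k, v in heights.items()})
```

The last cumulative height (0.8 with these made-up inputs) is the combined share explained by all four sources, which is how the top of the stacked bar is read.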

4 Certain measures are additionally mathematically equivalent in the sample, as explained in Rights and Sterba (2018a).

5 Researchers may wish to employ Cohen’s (1992) rules of thumb to determine whether an effect size is small (R² = .02), medium (R² = .13), or large (R² = .26). We caution researchers, however, that such guidelines are to some degree subjective, arbitrary, and dependent on the research context.
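The R² cutoffs above line up with Cohen's (1992) regression effect-size index f² = R²/(1 − R²), whose small, medium, and large benchmarks are .02, .15, and .35; solving for R² gives R² = f²/(1 + f²). The short Python sketch below shows that conversion and a labeling helper. The helper name `label_r2` and the "negligible" label for values below .02 are our own illustrative choices, and the caveat in this note about context-dependent guidelines applies equally to the code.

```python
# Cohen's (1992) benchmarks for f^2 in multiple regression.
F2_BENCHMARKS = {"small": 0.02, "medium": 0.15, "large": 0.35}


def f2_to_r2(f2):
    """Invert f^2 = R^2 / (1 - R^2) to obtain R^2 = f^2 / (1 + f^2)."""
    return f2 / (1.0 + f2)


# Rounded R^2 cutoffs implied by the f^2 benchmarks.
R2_CUTOFFS = {label: round(f2_to_r2(f2), 2) for label, f2 in F2_BENCHMARKS.items()}


def label_r2(r2):
    """Hypothetical helper: return the largest benchmark that r2 reaches."""
    label = "negligible"  # illustrative label for values below the small cutoff
    for name in ("small", "medium", "large"):
        if r2 >= R2_CUTOFFS[name]:
            label = name
    return label


print(R2_CUTOFFS)
print(label_r2(0.10))
```

The printed cutoffs reproduce the .02, .13, and .26 values given in this note.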

6 In this table, we assume that the random intercept will be added only to a Model A that contains a fixed intercept and no predictors. This can be used as an initial comparison to assess the extent of overall between-cluster variability. However, this first step can equivalently be accomplished by computing the intraclass correlation coefficient (i.e., the ratio of between-cluster variance to the sum of between- and within-cluster variances) from a random-intercept-only model.
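The intraclass correlation described in this note is a single ratio of variance components. The Python sketch below computes it from a random-intercept-only model's estimates; the function name and the input values are hypothetical, and the variances are assumed to have already been estimated with whatever MLM software the researcher uses.

```python
def icc(tau00, sigma2):
    """Intraclass correlation: between-cluster share of total variance.

    tau00:  random-intercept (between-cluster) variance
    sigma2: level-1 residual (within-cluster) variance
    Both are assumed to come from a random-intercept-only model.
    """
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance must be positive")
    return tau00 / total


# Hypothetical estimates (illustrative only, not from the article's example).
print(icc(tau00=0.5, sigma2=1.5))  # 0.25
```

Here one quarter of the outcome variance lies between clusters, which is the quantity this note treats as equivalent to the share gained by adding the random intercept to a Model A with no predictors.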

7 As an optional Step 5, one can compute a combined-source measure by simply adding the relevant single-source measures; we recommend, however, that researchers focus on the single-source measures for more complete information (see Rights & Sterba, 2018c, for details).

8 We caution researchers, however, against using this information alone to determine whether random slopes need to be included. In general, a small change in explained variance simply reflects a small effect size and does not necessarily imply that the added terms are superfluous (see Rights & Sterba, 2018a).
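Model comparison with single-source measures comes down to differencing each source's share between two fitted models. The Python sketch below compares a hypothetical Model A without random slopes to a hypothetical Model B that adds one; all proportions are invented for illustration, and the function name `delta_r2` is our own. A small change in the v share is the kind of result this note cautions against reading, on its own, as evidence that the random slope is unnecessary.

```python
# Hypothetical single-source R-squared values (proportions of total outcome
# variance) for two nested models; illustrative numbers only.
MODEL_A = {"f1": 0.30, "f2": 0.20, "v": 0.00, "m": 0.20}  # no random slopes
MODEL_B = {"f1": 0.29, "f2": 0.20, "v": 0.03, "m": 0.19}  # adds a random slope


def delta_r2(model_a, model_b):
    """Change in each source's share of variance from Model A to Model B."""
    sources = sorted(model_a.keys() | model_b.keys())
    return {s: model_b.get(s, 0.0) - model_a.get(s, 0.0) for s in sources}


changes = delta_r2(MODEL_A, MODEL_B)
print({s: round(d, 3) for s, d in changes.items()})
```

Reporting the change for every source, rather than a single total, shows where the added random slope shifts explained variance, consistent with the recommendation to focus on single-source measures.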

9 Although the code currently does not accommodate more than three levels, researchers working with four or more levels (a rare situation) can compute measures using the formulae provided by Rights and Sterba (2018b).
