Sports Performance

Meta-analysis of variation in sport and exercise science: Examples of application within resistance training research

Pages 1617-1634 | Received 29 Oct 2022, Accepted 15 Nov 2023, Published online: 01 Dec 2023
 

ABSTRACT

Meta-analysis has become commonplace within sport and exercise science for synthesising and summarising empirical studies. However, most research in the field focuses upon mean effects, particularly the effects of interventions to improve outcomes such as fitness or performance. It is thought that individual responses to interventions vary considerably. Hence, interest has increased in exploring precision or personalised exercise approaches. Not only is the mean often affected by interventions, but variation may also be impacted. Exploration of variation in studies such as randomised controlled trials (RCTs) can yield insight into interindividual heterogeneity in response to interventions and help determine generalisability of effects. Yet, larger sample sizes than those used for typical mean effects are required when probing variation. Thus, in a field with small samples such as sport and exercise science, exploration of variation through a meta-analytic framework is appealing. Despite the value of embracing and exploring variation alongside mean effects in sport and exercise science, it is rarely applied to research synthesis through meta-analysis. We introduce and evaluate different effect size calculations along with models for meta-analysis of variation using relatable examples from resistance training RCTs.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 To clarify language here for those unfamiliar, the term and concept "model" is used commonly in statistics. A statistical model is essentially a specification of what we think the data-generating process might be for a given situation. In the context of meta-analyses, the data are usually the individual effects extracted from studies (i.e., the results of each study). The model, expressed in mathematical formulae, is intended to approximate the processes that we assume generated the data.

2 Effect size is an agnostic term for a family of statistics that communicate the strength of a given "effect" resulting from research. This includes descriptive statistics ranging from raw mean values to correlation coefficients and everything in between (Caldwell & Vigotsky, Citation2020), including, as we shall see, statistics describing variation.

3 This estimation can be done using a variety of methods, and how the different methods perform is an area of ongoing investigation that is beyond the scope of this paper to discuss. We note, however, that the models we present all utilise Restricted Maximum Likelihood estimation.

4 Hence current efforts to conduct direct replications (see https://ssreplicationcentre.com/).

5 For example, strength might be examined in different studies using different operationalisations, including one repetition maximum testing or maximum voluntary contractions; or the same operationalisation may be employed but with different exercises, such as the squat or bench press.

6 Notably, though, not all meta-analyses use magnitude-based effect sizes; indeed, some explicitly use what Caldwell and Vigotsky (Citation2020) term signal-to-noise effect sizes (e.g., Heidel et al., Citation2022).

7 For those unfamiliar with the terminology, an estimator for a statistic is unbiased if it produces parameter estimates that are, on average, correct. Thus, a bias-corrected statistic is one that would be biased without the correction applied but has been shown to be unbiased with it.

8 We will refer to both merely as the SMD throughout the manuscript for simplicity, and note that whenever we report a "SMD" we are reporting the bias-corrected version. We also note that another magnitude-based effect size, Glass' Δ, is commonly recommended as the simplest form of SMD, though it assumes that the intervention has no effect on the denominator (i.e., the variance; Caldwell & Vigotsky, Citation2020).
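To make the bias correction concrete, a minimal sketch of a Hedges-style corrected SMD is shown below; the group summaries are invented purely for illustration and are not data from the meta-analyses discussed here:

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Bias-corrected standardised mean difference (Hedges-style)."""
    # Pooled standard deviation across intervention (t) and control (c) groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled  # uncorrected SMD (Cohen's d)
    # Small-sample bias correction factor; approaches 1 as n grows
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    return j * d

# Hypothetical small-sample example, where the correction matters most
g = hedges_g(mean_t=105, mean_c=100, sd_t=10, sd_c=10, n_t=10, n_c=10)
# Uncorrected d = 0.50; corrected g ≈ 0.479
```

With only 10 participants per group the correction shrinks the estimate by roughly 4%, illustrating why uncorrected SMDs overestimate effects in the small samples typical of the field.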

9 The impact of methodological approaches on heterogeneity has also been explored in preclinical research (Usui et al., Citation2021).

10 Notably, though, in the case of health behaviour studies, volunteering for a study could conceivably motivate participants to alter various habits even when they are assigned to a control group, thus influencing change scores.

11 For one clear example, see Vigotsky et al. (Citation2020), who show that the mean and standard deviation of baseline strength values typically scale with one another across most studies.

12 The authors of the meta-analysis did not make their extracted data openly available, nor did they respond to our request for the extracted data. Further, their original analysis included 119 studies; however, we were unable to extract data for our analyses from 8 of these for a variety of reasons (e.g., only percentage change data were reported, or no standard deviations were reported for control groups).

13 Regression analyses are likely familiar to most readers: in their simplest form, they try to predict the value of some dependent variable from some independent variable(s). This can be extended to meta-analytic synthesis, where the independent variables reflect characteristics associated with the included effects. For example, they may reflect characteristics of the sample from which the effect was extracted, such as age or sex, or characteristics of the intervention received, such as the dose or frequency of exposure.

14 It is worth noting that in the sport and exercise sciences, similarly to other fields that examine the effects of experimental interventions, the most common study design for testing or estimating intervention effects is the randomised pretest-posttest-control design (i.e., an intervention and control, or other intervention, group randomly allocated and measured pre- and post-exposure). We presented the SMD and lnRR effect sizes in Equations 5 and 9 merely for simplicity in the introduction, but note that extensions of these for such 2 × 2 (i.e., condition × time) study designs have been presented in detail elsewhere (see: Gurevitch et al. (Citation2000); Morris et al. (Citation2007); Morris (Citation2008); Lajeunesse (Citation2011, Citation2015)), and these are the effect sizes used in the meta-analyses referred to here.
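As a hedged sketch of how such an extension looks, one widely used pretest-posttest-control form (in the style of Morris, 2008) standardises the difference in group mean change scores by the pooled pretest SD; the numbers below are hypothetical, not values from the cited meta-analyses:

```python
import math

def smd_pre_post_control(pre_t, post_t, pre_c, post_c,
                         sd_pre_t, sd_pre_c, n_t, n_c):
    """Pre-post-control SMD (Morris-2008 style): the difference in mean
    change scores standardised by the pooled pretest SD, bias-corrected."""
    sd_pre = math.sqrt(((n_t - 1) * sd_pre_t**2 + (n_c - 1) * sd_pre_c**2)
                       / (n_t + n_c - 2))
    cp = 1 - 3 / (4 * (n_t + n_c - 2) - 1)  # small-sample bias correction
    return cp * ((post_t - pre_t) - (post_c - pre_c)) / sd_pre

# Hypothetical RT arm vs control: strength in kg
es = smd_pre_post_control(pre_t=100, post_t=112, pre_c=100, post_c=102,
                          sd_pre_t=15, sd_pre_c=15, n_t=12, n_c=12)
# (12 - 2) / 15 ≈ 0.667 before correction; ≈ 0.644 after
```

Standardising by the pretest SD (rather than a post-test or change-score SD) keeps the denominator free of any intervention effect on variance, which matters when variation itself is the quantity of interest.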

15 We also explored for signs of small-study bias, including publication bias favouring the finding of intervention effects, for the SMDs, given that the relative lack of awareness of variation-based effect sizes in the field suggests such biases would be more likely to manifest in the mean-based effect sizes. There did not appear to be any obvious small-study bias in the dataset (see https://osf.io/stqr3).

16 We use the term arm to refer to an intervention group-control group contrast, to accommodate studies including multiple intervention groups and so as not to confuse the reader with the use of group to designate either the RT intervention group(s) or the control group separately. Thus, in the models using effect sizes relating to comparisons between an intervention group and a control group (i.e., SMD, lnRR, SDir, lnVR, and lnCVR), we calculate comparisons between each intervention group (i.e., arm) and the control group. Where a study had, for example, two RT interventions and a control, two separate arms would be coded (RT intervention 1 compared to control, and RT intervention 2 compared to control). Data were coded such that study and arm had implicit nesting.

17 Technically then the random effects model presented earlier is also a mixed effects model. It is traditionally referred to as the random-effects model though.

18 In contrast to the models examining effect sizes relating to comparisons between an intervention group and control group, in the models examining lnσˆijk with lnxˉijk as a predictor the term arm refers to both the intervention group(s) and the control group. Thus, where a study had, for example, two RT interventions and a control, three separate arms would be coded (RT intervention 1, RT intervention 2, and control). Data were again coded such that study and arm had implicit nesting.

19 We do not have to limit ourselves to only fixed effect predictor terms as we have here. Indeed, for mixed-effects models generally, some argue that models should use a maximal random effects structure including both random intercepts and slopes (i.e., that the effect of the predictor term can vary within different levels of the model and is also assumed to come from an overarching distribution of slopes), and their correlations, to enhance the generalisability of inferences (Barr et al., Citation2013). We could model a categorical variable for the outcome type and, using random effects, include (β2 + φ1i)Outcome or (β2 + φ1i + φ2j)Outcome in the model, with Outcome as a dummy-coded variable for the outcome type (i.e., hypertrophy = 0 and strength = 1), where β2 is the overall average slope or regression coefficient for Outcome, φ1i is the deviation (random slope) from β2 for the ith study, and φ2j is the deviation for the jth arm. These model specifications do not assume that the difference between outcomes is fixed; rather, it can vary between studies and arms. We could also include random slopes for β1 on lnxˉ, thus allowing the strength of the relationship between lnσˆ and lnxˉ to vary between studies and arms. Indeed, we fit a range of models using lnσˆ with lnxˉ and Outcome as predictors, with (1) random intercepts only for study and arm, (2) correlated random slopes for lnxˉ by study, (3) correlated random slopes for lnxˉ by study and arm, (4) correlated random slopes for Outcome by study, (5) correlated random slopes for Outcome by study and arm, (6) correlated random slopes for both lnxˉ and Outcome by study, and (7) correlated random slopes for both lnxˉ and Outcome by study and arm.
The comparison of these models using 2×logBF (Kass & Raftery, Citation1995), approximated from the Bayesian information criterion (Wagenmakers, Citation2007), to determine under which model the observed data are most likely is included in the supplementary materials (https://osf.io/3tv6x). There was very strong evidence supporting the model with random intercepts plus correlated random slopes for both lnxˉ and Outcome by study compared to all others, and so this is presented here.
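The BIC-based approximation referred to here (Wagenmakers, 2007) turns a difference in BICs into an approximate Bayes factor; the sketch below uses invented BIC values purely for illustration:

```python
import math

def two_log_bf(bic_a, bic_b):
    """Approximate 2*ln(BF) favouring model A over model B.
    BF_AB ≈ exp((BIC_B - BIC_A) / 2), so 2*ln(BF_AB) ≈ BIC_B - BIC_A."""
    return bic_b - bic_a

def approx_bf(bic_a, bic_b):
    """The corresponding approximate Bayes factor itself."""
    return math.exp((bic_b - bic_a) / 2)

# Illustrative BICs for two candidate random-effects structures
stat = two_log_bf(bic_a=152.3, bic_b=165.1)
# On Kass & Raftery's (1995) scale, 2*lnBF > 10 counts as "very strong"
# evidence that the data are more likely under model A
```

Because the BIC already penalises parameter count, this comparison guards against simply favouring the model with the richest random-effects structure.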

20 Note, as with the models examining Outcome upon baseline scores, we similarly explored change-score lnσˆ with lnxˉ and Group as predictors, with (1) random intercepts only for study and arm, (2) correlated random slopes for lnxˉ by study, (3) correlated random slopes for lnxˉ by study and arm, (4) correlated random slopes for Group by study, (5) correlated random slopes for both lnxˉ and Group by study, and (6) correlated random slopes for both lnxˉ and Group by study and lnxˉ by arm (we do not include models with random slopes for Group by arm because, in these models, each arm refers to a particular group, RT or CON, and so no arm provides data for both). The comparison of these models using 2×logBF (Kass & Raftery, Citation1995), approximated from the Bayesian information criterion (Wagenmakers, Citation2007), to determine under which model the observed data are most likely is included in the supplementary materials (see https://osf.io/b5deh for strength and https://osf.io/5ektd for hypertrophy). Similar to the models including Outcome, there was very strong evidence supporting the model with random intercepts plus correlated random slopes for both lnxˉ and Group by study and lnxˉ by arm for strength, and the model with random intercepts plus correlated random slopes for both lnxˉ and Group by study for hypertrophy, compared to all other models, and so these are presented here. All estimates for the difference between RT and CON (where positive values indicate that RT increased variation in change scores and negative values indicate that it decreased variation) can be seen in the supplementary materials (https://osf.io/5g7ce), all of which revealed similar conclusions.

21 It is perhaps worth explaining the assumptions that the different models explored make regarding the mean-variation relationship. The lnVR and lnCVR models can be thought of as similar in that both make fixed assumptions about the relationship between mean and variance: in the lnVR it is assumed to be zero, and in the lnCVR it is assumed to be proportional (i.e., one). In both, however, this is a strong assumption. The multilevel meta-regressions, on the other hand, actually estimate this relationship (i.e., the value of β1, the slope or regression coefficient for lnxˉ), and in models where random slopes are included it is also allowed to vary between studies and/or arms (i.e., the relationship may be stronger in some studies than in others). Mean-variation relationships are important to consider when exploring variation effects, as is whether this relationship should be assumed to be some fixed proportional value (as the lnCVR does) or instead be estimated from the data, and whether it might also vary across studies and arms (as the multilevel meta-regression models allow). It should also be noted that these models all assume that xˉ is estimated without error, which is clearly not the case. Given that for most effects that might be included in such models we can determine the sampling variance for xˉ, one approach to address this might be to employ models that incorporate the variance on this predictor (i.e., measurement error, or errors-in-variables, models), though this is beyond the scope of this paper to discuss. It is not necessarily clear which model should be preferred here, and fortunately substantive conclusions are impacted little by model specification, but thought should be given to the assumptions each makes and the fit of each model to the data.
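To make the contrast concrete, both ratio effect sizes can be computed from the same group summaries; this is a minimal sketch using hypothetical values and the small-sample corrections common in this literature (e.g., Nakagawa et al., 2015):

```python
import math

def ln_vr(sd_t, n_t, sd_c, n_c):
    """Log variability ratio: assumes no mean-variance relationship."""
    return (math.log(sd_t / sd_c)
            + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1)))

def ln_cvr(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Log coefficient-of-variation ratio: assumes variation scales
    proportionally with the mean (CV = SD / mean)."""
    return (math.log((sd_t / mean_t) / (sd_c / mean_c))
            + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1)))

# Hypothetical change-score summaries: identical SDs, different means
vr = ln_vr(sd_t=8, n_t=20, sd_c=8, n_c=20)
cvr = ln_cvr(mean_t=10, sd_t=8, n_t=20, mean_c=4, sd_c=8, n_c=20)
# vr = 0 (SDs equal), but cvr < 0 (RT shows lower *relative* variation)
```

The two statistics can point in different directions for the same data, which is precisely the assumption about the mean-variation relationship that the multilevel meta-regressions estimate rather than fix.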

22 See the supplementary materials (https://osf.io/e6vpr) for examples of model estimates for both the SMD and lnCVR (used for simplicity of presenting moderator analysis results) across a range of categorical and continuous predictors for both strength and hypertrophy outcomes. There were no obvious moderators of the lnCVR in particular.

23 Indeed, it can be seen from the figures that many of the individual study effect estimates have very large sampling errors.

Additional information

Funding

This work was not supported by any funding.
