Review Articles

A general theory of effect size, and its consequences for defining the benchmark response (BMR) for continuous endpoints

Pages 342-351 | Received 01 Jun 2016, Accepted 23 Sep 2016, Published online: 02 Nov 2016
 

Abstract

A general theory of effect size for continuous data predicts a relationship between the maximum response and the within-group variation of biological parameters, which is empirically confirmed by results from dose–response analyses of 27 different biological parameters. The theory shows how effect sizes observed in distinct biological parameters can be compared and provides a basis for a generic definition of small, intermediate and large effects. While the theory is useful for experimental science in general, it has specific consequences for risk assessment: it resolves the current debate on the appropriate metric for the benchmark response (BMR) in continuous data. The theory shows that scaling the BMR, expressed as a percent change in means, to the maximum response (in the way specified) automatically takes “natural variability” into account. Thus, the theory supports the underlying rationale of the BMR of 1 SD. For various reasons, it is, however, recommended to use a BMR in terms of a percent change that is scaled to the maximum response and/or the within-group variation (averaged over studies), as a single harmonized approach.

Acknowledgements

The author is highly grateful to Cajo ter Braak, Peter Teunis, George Johnson, Jose Fereira, Hilko van der Voet, and Jose Cortinas Abrahantes for critically reading the manuscript and for providing valuable comments. Further, the author gratefully acknowledges the extensive comments provided by five reviewers selected by the Editor and anonymous to the author. These comments were helpful in revising the manuscript.

Declaration of interest

The author’s affiliation is as shown on the cover page. The paper was prepared during the course of the author's normal employment, without external funding. The author has sole responsibility for the writing and content of the paper.

Supplemental material

Supplemental material for this article is available online.

Annex

This Annex shows how the ES theory predicts the relationship between M and s.

Assumption 1

The first assumption is a standard assumption underlying statistical methodology. It says that within-group variation is, apart from direct measurement errors, the result of (many) experimental factors that were not entirely constant among experimental units (during, before, or at the end of the study). These experimental factors are in fact unintended concomitant experimental treatments, resulting in disturbing effects on the observations. Usually, experimenters aim to mitigate these disturbing effects by keeping the experimental circumstances as similar as possible among the experimental units, in an attempt to minimize the within-group variation (i.e. to maximize the signal-to-noise ratio, one might say). However, zero within-group variation will never be achieved. For instance, in animal experiments there will always be small differences in the locations in the experimental room or facility, and in treatments such as feeding, weighing, section, and cage cleaning. Further, the concomitant treatments cannot occur at exactly the same time, so that there may be time effects. Assumption 1 provides an explanation of the “scatter” in the data, and is a generally accepted principle in statistical theory, forming the basis for the theory of study designs (e.g. Kempthorne 1952; Cox 1958).

Assumption 2

The second assumption says that effects from treatments are multiplicative rather than additive for biological parameters observed as continuous data. In other words, in the hypothetical situation that there are no disturbing factors in an experiment, a given treatment is assumed to cause the same percent change in different experimental units (e.g. subjects) that differ in response value before the treatment (or rather: had they not been treated).
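As a minimal numerical sketch of this assumption (hypothetical baseline values, not data from the paper), a treatment acting multiplicatively changes every experimental unit by the same fold change rather than by the same absolute amount:

```python
# Hypothetical untreated response values of three experimental units
baselines = [80.0, 100.0, 125.0]

# A treatment acting multiplicatively, here a 20% decrease (fold change 0.8)
fold_change = 0.8

treated = [b * fold_change for b in baselines]
print(treated)  # [64.0, 80.0, 100.0]: same percent change, different absolute change
```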

Together with assumption 1, the central limit theorem predicts that data are lognormally distributed when the sources of variation act multiplicatively (and normally distributed when the sources of variation act additively).
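As a minimal simulation sketch of this prediction (hypothetical numbers, using NumPy and SciPy; not code from the paper), multiplying many small independent fold-change errors yields data whose logarithm, but not the data themselves, is close to normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical: 30 unintended "treatments", each a small fold change
eps = rng.uniform(0.8, 1.25, size=(10_000, 30))

y = eps.prod(axis=1)  # assumption 2: the small disturbances act multiplicatively

# Y is right-skewed (lognormal-like), while log(Y) is nearly symmetric (normal-like)
print("skewness of Y      :", round(stats.skew(y), 2))
print("skewness of log(Y) :", round(stats.skew(np.log(y)), 2))
```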

The notion that effects tend to work multiplicatively rather than additively has been reiterated frequently in the literature for around 135 years (Galton 1879; Kapteijn 1916; Wicksell 1917; Cochran 1938; Gebelein and Heite 1950; Wachholder 1952; Bagenal 1955; Mitzenmacher 2004; Furusawa et al. 2005; Limpert and Stahel 2011; Kobayashi et al. 2011). Further, all sorts of (continuous) data have been reported to be close to lognormal in numerous published papers. In the Supplementary Material, it is shown that for all datasets that were used for estimating M and s, both (i) lognormality and (ii) proportionality of the SDs to their means were confirmed. Such a systematic evaluation of a large range of datasets has, to the knowledge of the author, no precedent. It provides very strong evidence for the generality of the multiplicativity of effects in biological parameters.

Nonetheless, statistical analyses often start from assuming additivity of effects, that is, by assuming normal distributions and homogeneous variances as the default, and apply the log (or other) transformation only “if needed”. This is a poor strategy. Since effects are more likely to be multiplicative than additive, any statistical analysis (of continuous data) should start with taking the logarithm of the observations as the default approach, and check the assumptions of normal distributions and homogeneous variances on that scale. If so, the data on the original scale are lognormal and show homogeneous coefficients of variation. While these assumptions appear to hold in most datasets (as illustrated in the Supplementary Material), there may always be specific cases in which the assumptions appear not to be fulfilled. However, it is important to realize that the assumptions may be compromised for specific reasons. For example, there may be some clustering in the data that was not accounted for in the statistical analysis, or the data may be derived values, such as concentrations in a bulk sample (which effectively means that the underlying concentrations in the sub-samples were added, thereby distorting the original lognormal distribution). In the former example, the statistical analysis should take the clustering into account rather than assume another distribution. In the latter example, the lognormal distribution indeed fails, but without invalidating the multiplicativity assumption.
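A minimal sketch of this default workflow (hypothetical dose-group data; SciPy's Shapiro–Wilk and Levene tests are just one possible choice of checks) could look like:

```python
import numpy as np
from scipy import stats

# Hypothetical continuous endpoint measured in three dose groups
groups = {
    "control": np.array([1.8, 2.1, 2.4, 2.0, 2.3]),
    "low":     np.array([1.5, 1.9, 1.6, 2.0, 1.7]),
    "high":    np.array([1.1, 0.9, 1.3, 1.0, 1.2]),
}

# Default approach: take logs first, then check the assumptions on the log scale
log_groups = {name: np.log(y) for name, y in groups.items()}

# Approximate normality within each group on the log scale
for name, logy in log_groups.items():
    print(name, "Shapiro p =", round(stats.shapiro(logy).pvalue, 3))

# Homogeneity of variances on the log scale
# (equivalently: constant CV on the original scale)
print("Levene p =", round(stats.levene(*log_groups.values()).pvalue, 3))
```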

Quantitative specification of the basic theory: how do effect sizes scale?

With the two assumptions just discussed, the basic theory presented in the main text can now be further specified quantitatively. As assumption 2 is equivalent to the assumption that effects work additively on the log scale, the derivation below will be based on the (natural) log scale. Consider an animal experiment where the treatment results in the maximum response of a particular biological parameter (P1). This biological parameter is observed by measurements Y. Any observation Y in the treatment group may be described by:

\[
\log(Y) = \mu_{Y,\text{untreated}} + \log(M) + \sum_i \log(\varepsilon_i) \tag{A1}
\]

where M reflects the maximum fold change relative to the geometric mean of Yuntreated, which is equal to exp(μY,untreated), and where μY,untreated denotes the expected value of log(Y) in the untreated animal. The last term in expression (A1) is a random error term, consisting of the sum of a large number of individual small error terms, indexed by i. Here, each term εi is the contribution from some experimental condition that was not exactly identical among the experimental units. These experimental conditions can be considered as unintended treatments (see assumption 1). Each εi is expressed as a fold change as well. Strictly speaking, the error term should also include the measurement error in the observation Y, but here it will be assumed that this can be ignored.

Now consider a second biological parameter (P2) in the same study, and assume that the applied treatment results in maximum response in parameter P2 as well. Parameter P2 is observed by measurements Z. The “impact” (see basic theory) of the applied treatment is similar in Y and Z, as it results in maximum response in both parameters. Since P1 and P2 are observed in the same experimental units (e.g. animals), each observation Y (related to parameter P1) is accompanied by an observation Z (related to parameter P2) observed in the same animal. So, next to the impact of the intended treatment, all the impacts from all unintended treatments with respect to P1 and P2 may, in the same animal, be assumed to be similar as well (this assumption is further discussed below). With the basic theory saying that translating an (invisible) impact into the associated effect size only depends on the “expressiveness” of the parameter (here, P1 or P2), we may now try to answer the question: How would expression (A1) look for observation Z? To that end, we need a quantitative hypothesis telling how biological parameters translate any given (invisible) impact into an effect size in the quantitative sense.

At first thought, one might consider the hypothesis that a biological parameter translates any given (invisible) impact into an effect size that is proportional to its maximum effect size. However, this hypothesis can be ruled out on a priori grounds: it may result in effect sizes smaller than one, whereas one is the null effect size for a fold change, so values below one would imply a change in the opposite direction. For example, when M is equal to 6 and 2.4 in P1 and P2, respectively, then the effect size 2 in P1 would be equivalent to an effect size of (2.4/6) × 2 = 0.8 in P2. Therefore, this hypothesis needs no further consideration.

A more realistic hypothesis would be that a biological parameter translates any given impact into an effect size that is proportional to its maximum effect size, but with both effect sizes (fold changes) on the log scale. If so, P2 will translate any given impact into an effect size whose logarithm is proportional to the logarithm of the effect size expressed by P1 as a result of the same impact. With this assumption, an observation Z in parameter P2 is predicted to obey:

\[
\log(Z) = \mu_{Z,\text{untreated}} + c\,\log(M) + c \sum_i \log(\varepsilon_i) \tag{A2}
\]

where c is the proportionality constant between the log effect sizes in P2 and P1, and μZ,untreated denotes the expected value of log(Z) in the untreated animal.

When we calculate the within-group variance of log(Y) and log(Z), based on expressions (A1) and (A2), it follows that

\[
\operatorname{var}\bigl(\log(Z)\bigr) = c^{2}\operatorname{var}\bigl(\log(Y)\bigr), \quad \text{i.e.} \quad s_Z^{2} = c^{2} s_Y^{2}, \tag{A3}
\]

or, sZ = c·sY. Together with log(MZ) = c·log(MY), this second hypothesis predicts that log(M) is proportional to s when comparing different biological parameters. This is exactly the fitted curve in the empirical relationship shown in the main text.
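A small simulation sketch of expressions (A1)–(A3) (hypothetical values of M, μ, c, and the error terms; not taken from the paper) illustrates the prediction that sZ = c·sY, so that log(M)/s is the same for both parameters:

```python
import numpy as np

rng = np.random.default_rng(7)

n_animals, n_factors = 50, 40
mu_y, mu_z = np.log(100.0), np.log(5.0)  # hypothetical log-scale baselines
M_y = 6.0                                # hypothetical maximum fold change in P1
c = 0.5                                  # hypothetical proportionality constant (log M_Z = c log M_Y)

# Unintended treatments: small log fold changes shared by P1 and P2 within each animal
log_eps = rng.normal(0.0, 0.03, size=(n_animals, n_factors)).sum(axis=1)

log_y = mu_y + np.log(M_y) + log_eps          # expression (A1)
log_z = mu_z + c * np.log(M_y) + c * log_eps  # expression (A2)

s_y, s_z = log_y.std(ddof=1), log_z.std(ddof=1)
print("s_Z / s_Y    :", round(s_z / s_y, 3))              # close to c, as in (A3)
print("log(M_Y)/s_Y :", round(np.log(M_y) / s_y, 2))
print("log(M_Z)/s_Z :", round(c * np.log(M_y) / s_z, 2))  # same ratio: log(M) proportional to s
```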

One additional assumption was implicitly used here: the unintended experimental treatments, whose effects are captured in the term Σ log(εi), all have the same impact on P1 and P2 in the same animal. This is a strong assumption, and probably not true in general. However, the assumption is not needed in a strict sense, because the number of coincidental factors in experimental studies is large. Thus, coincidental factors that may affect one parameter but not another might be compensated by other coincidental factors for which the opposite holds. Further, it should be noted that the empirical relationship shown in the main text is based on different studies, so that the unintended experimental factors no longer relate to the same animals (experimental units). This will cause additional scatter in the correlation plot. However, when the estimate of s is based on a large range of studies, this effect will disappear as well.

Notes

1 A fold change is equivalent to a percent change; e.g. a 2-fold increase is a 100% increase. However, a 2-fold decrease is a 50% decrease. While expressing changes in terms of percent change is very common, it is better to think (and report) in terms of fold change. For example, a 5-fold increase and a 5-fold decrease reflect the same change, and thus both increasing and decreasing dose–responses could be included in creating the empirical relationship. Also see the discussion of plotting SDs against means in increasing and decreasing dose–responses in the Supplementary Material.
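As a small numerical illustration of this note (a hypothetical helper function, not terminology from the paper), fold changes are symmetric on the log scale while the corresponding percent changes are not:

```python
import math

def percent_change(fold: float, decrease: bool = False) -> float:
    """Percent change implied by a fold change (e.g. a 2-fold increase is +100%)."""
    return (1 / fold - 1) * 100 if decrease else (fold - 1) * 100

print(percent_change(2))                  # 2-fold increase  -> +100%
print(percent_change(2, decrease=True))   # 2-fold decrease  ->  -50%

# A 5-fold increase and a 5-fold decrease are the same change in absolute log units
print(math.log(5), abs(math.log(1 / 5)))
```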

2 Here, it is assumed that s is constant over dose groups, or, equivalently, that on the original scale the SD is proportional to the mean (CV is constant). The latter is in line with the basic assumption that effects work multiplicatively, and is confirmed by the data shown in the Supplementary Material.
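A short worked derivation (standard lognormal identities, not spelled out in the note) makes the equivalence explicit: if the data are lognormal with log-scale standard deviation s, then

\[
\mathrm{CV} = \frac{\mathrm{SD}(Y)}{\mathrm{E}(Y)} = \sqrt{e^{s^{2}} - 1} \approx s \quad \text{for small } s,
\]

so a constant s across dose groups is the same as a constant CV, i.e. an SD proportional to the mean, on the original scale.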
