Full article: Incorporating the sample correlation into the testing of two endpoints in clinical trials

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

We introduce an improved Bonferroni method for testing two primary endpoints in clinical trial settings using a new data-adaptive critical value that explicitly incorporates the sample correlation coefficient. Our methodology is developed for the usual Student’s t-test statistics for testing the means under normal distributional setting with unknown population correlation and variances. Specifically, we construct a confidence interval for the unknown population correlation and show that the estimated type-1 error rate of the Bonferroni method with the population correlation being estimated by its lower confidence limit can be bounded from above less conservatively than using the traditional Bonferroni upper bound. We also compare the new procedure with other procedures commonly used for the multiple testing problem addressed in this paper.

KEYWORDS:

1. Introduction

Pivotal clinical trials for new treatments that are designed to evaluate two primary efficacy endpoints face the so-called ‘multiplicity problem’, which, if not addressed, may cause inflation of type-1 error. Accordingly, regulatory agencies require that analysis plans contain a statistical methodology for type-1 error control. Moreover, since controlling type-1 error may also impact type-2 error (i.e., decrease power), regulators stress that one should examine the trade-off between the two types of error and carefully choose type-1 error controlling methodology. The multiplicity problem is further exacerbated by the inherent dependencies among various endpoints. While these dependencies can be qualitatively characterized in the sense that outcomes associated with the endpoints exhibit similar tendencies, albeit, with different magnitudes, there are situations where they can be quantitatively assessed from sample correlations among the examined variables.

Several statistical methodologies have been put forward to deal with the need to control type-1 error, with the aim of ultimately identifying at least one endpoint, and preferably both, for which the new treatment is better than the control. Among them, the most commonly used are the Bonferroni method for global testing and its step-down extension, Holm’s (Citation1979) method, for multiple testing. Because these methods utilize the Bonferroni inequality that relies only on the marginal p-values, they are dependency-free, and hence can be quite conservative when the p-values or the corresponding test statistics are highly dependent. Šidák (Citation1967) and Simes (Citation1986) have introduced improvements of the Bonferroni method for global testing. They control the type-1 error rate under independence and under a type of positive dependency that arises in some practical applications (Hochberg and Rom (Citation1995), Samuel-Cahn (Citation1996), Sarkar and Chang (Citation1997), and Sarkar (Citation1998)). Šidák’s (Citation1967) global test has been used by Holland and Copenhaver (Citation1987) to develop a step-down method, whereas Simes (Citation1986) has been used by Hochberg (Citation1988) to develop a step-up multiple testing method and by Hommel (Citation1988) to develop a closed testing method based on the ‘Closure Principle’ of Marcus et al. (Citation1976). Gou et al. (Citation2014) proposed a class of hybrid Hochberg-Hommel procedures which tend to be more powerful than either the Hochberg or Hommel procedure.

Šidák’s (Citation1967) and Simes (Citation1986) improved versions of the Bonferroni global test and their multiple testing extensions only qualitatively capture the underlying positive dependency, as they are still based on marginal p-values while continuing to maintain the type-1 error rate control even under such positive dependency. Unfortunately, they can be quite conservative, and hence can lose power, when such dependency is moderately high. Moreover, they can fail to control the type-1 error under negative dependency. While these two tests are widely used, theoretical results regarding the validity of their application have only been done in the case of normal statistics with certain correlation structure [(Hochberg and Rom (Citation1995), and Samuel-Cahn (Citation1996)], or t-statistics with same denominator representing an estimate of the common population standard deviations [Sarkar and Chang (Citation1997), and Sarkar (Citation1998)]. These assumptions do not hold in the two-endpoint problem addressed here because the endpoints almost always have different population variabilities.

Under normal distributional settings, which are most commonly used for global testing in practical applications and where the dependency among test statistics is parametrically represented through correlation coefficients, it is possible to capture the dependency quantitatively, and hence more fully than the Šidák’s (Citation1967) and Simes (Citation1986) tests, while improving the Bonferroni method. However, this idea of improving the Bonferroni method has so far been limited to the case where the population correlations are assumed known (see, e.g., Xie (Citation2012) and the references therein). Of course, one can consider replacing the known correlations in these methods with their suitable estimates to make them fully data-adaptive, but there is no theoretical justification that these would ultimately control the type I error rate. With correlations being rarely known in practice, tightening the Bonferroni type-1 error rate control through explicit use of sample correlations and providing a theoretical justification of such control would be an important objective.

In this paper, we consider achieving the above-mentioned objective by considering the two-mean testing problem under a normal distributional setting with unknown population correlation and variances. Our goal is to test the two hypotheses, with the aim of rejecting at least one, and preferably both. This testing scenario commonly arises in pharmaceutical studies. We propose a new procedure in this setting that utilizes the Bonferroni test based on the usual (marginal) Student’s-t test statistics but uses a data-adaptive critical value that explicitly incorporates the sample correlation coefficient. The confidence interval approach of Berger and Boos (Citation1994) is employed to make use of the sample correlation. More specifically, we first theoretically prove that the type-1 error rate of the Bonferroni method based on Student’s-t statistics (or their absolute values) with any fixed critical value is strictly decreasing in the unknown correlation coefficient (or its absolute value). These decreasing properties allow us to estimate the type-I error rates for both one- and two-sided testing problems, without relying on computations generally required in the application of the Berger and Boos (Citation1994) approach. We simply are substituting the unknown correlation coefficient (or its absolute value) with its lower confidence limit, given a fixed confidence coefficient, into the error rate formulas. Bounding these estimated error rates from above by the nominal level α allows us to produce correlation-adaptive critical values that are smaller than the traditional Bonferroni critical values but still control the type-1 error rate. The fact that such adaptive Bonferroni methods can provide much tighter control of the type-1 error rate than their regular, non-adaptive versions over a wide range of choices for the confidence coefficient and level of significance is demonstrated numerically.

It is important to note that the Berger and Boos (Citation1994) approach to estimating population correlation using its interval estimate in multiple testing scenarios was taken before in Tamhane et al. (Citation2012). However, it was for a different problem, namely, the development of a two-stage group sequential design for testing primary and secondary endpoints controlling familywise error rate (FWER). Moreover, unlike here, they considered large-sample settings, which allowed them to assume the $t$ -test statistics to be normally distributed and use large-sample confidence interval for the unknown correlation. Additionally, these authors only showed a directional relationship between the FWER and the correlation via numerical analysis as they were unable to show the relationship analytically.

The paper is organized as follows. Section 2 introduces our proposed ‘correlation-adaptive Bonferroni’ methodologies for both one- and two-sided testing problems. The process of computing the correlation-adaptive critical values in these methods is described in Section 3. In Section 4, we present these critical values for a wide range of sample sizes, before numerically showing in Section 4 how our methods compare with the corresponding traditional, non-adaptive Bonferroni methods in terms of type-1 error rate control and power. Concluding remarks are made in Section 5. These remarks include comments on (i) the novelty of theoretical results we obtain in this article towards application of the Berger and Boos (Citation1994) approach, and (ii) possible extension of the proposed correlation-adaptive Bonferroni to its Holm-type stepdown analog for simultaneous testing. Detailed proofs of the technical results needed to develop our proposed method are provided in the Appendix 1.

2. Proposed methodologies

In our setting, a test treatment is compared to a control treatment on two outcome measures $X_{1}$ and $X_{2}$ that are jointly distributed as a bivariate normal with a covariance matrix

$Σ = (\begin{matrix} σ_{1}^{2} & ρ σ_{1} σ_{2} \\ ρ σ_{1} σ_{2} & σ_{2}^{2} \end{matrix}),$ .

and with the following pair of means: $(X_{1}, X_{2}) = \{\begin{matrix} (μ_{1}^{(1)}, μ_{2}^{(1)}) f o r t e s t \\ (μ_{1}^{(2)}, μ_{2}^{(2)}) f o r c o n t r o l \end{matrix}$

Given $n_{1}$ pairs of observations $(X_{1 j}^{(1)}, X_{2 j}^{(1)})$ , $j = 1, \dots, n_{1},$ for the test group, and $n_{2}$ pairs of observations $(X_{1 j}^{(2)}, X_{2 j}^{(2)})$ , $j = 1, \dots, n_{2},$ for the control group, our problem is to test the intersection $H_{0}$ of the following two one-sided null hypotheses:

H_{0} = \{H_{01} : μ_{1}^{(1)} \leq μ_{1}^{(2)}\} \cap \{H_{02} : μ_{2}^{(1)} \leq μ_{2}^{(2)}\},

against the union $H_{1}$ of one-sided alternative hypotheses:

H_{1} = \{H_{11} : μ_{1}^{(1)} μ_{1}^{(2)}\} \cup \{H_{12} : μ_{2}^{(1)} μ_{2}^{(2)}\},

or the intersection $H_{0}$ of the following two null hypotheses:

H_{0} = \{H_{01} : μ_{1}^{(1)} = μ_{1}^{(2)}\} \cap \{H_{02} : μ_{2}^{(1)} = μ_{2}^{(2)}\},

against the union $H_{1}$ of two-sided alternative hypotheses:

H_{1} = \{H_{11} : μ_{1}^{(1)} \neq μ_{1}^{(2)}\} \cup \{H_{12} : μ_{2}^{(1)} \neq μ_{2}^{(2)}\},

subject to a control of the type-1 error rate at $α$ .

Note that for the one-sided testing problem, the least favorable configurartion, i.e., the point in the parameter space of $H_{0}$ for which type-1 error is maximized is $\{μ_{1}^{(1)} = μ_{1}^{(2)}\} \cap \{μ_{2}^{(1)} = μ_{2}^{(2)}\}$ . Therefore in the one-sided testing problem, we can control type-1 error if we define and test the null hypotheses exactly as in the two-sided testing problem, i.e.,

H_{0} = \{H_{01} : μ_{1}^{(1)} = μ_{1}^{(2)}\} \cap \{H_{02} : μ_{2}^{(1)} = μ_{2}^{(2)}\}

Let

T_{1} = \sqrt{\frac{n_{1} n_{2}}{n_{1} + n_{2}}} \frac{({\overset{ˉ}{X}}_{1}^{(1)} - {\overset{ˉ}{X}}_{1}^{(2)})}{S_{1}} a n d T_{2} = \sqrt{\frac{n_{1} n_{2}}{n_{1} + n_{2}}} \frac{({\overset{ˉ}{X}}_{2}^{(1)} - {\overset{ˉ}{X}}_{2}^{(2)})}{S_{2}},

where ${\dot{X}}_{i}^{(k)} = \frac{1}{n_{k}} \sum_{j - 1}^{n_{k}} X_{i j}^{(k)}$ is the sample mean corresponding to $μ_{i}^{(k)}$ , for $i = 1, 2; k = 1, 2$ , and

S_{1}^{2} = \frac{1}{n - 2} \sum_{k = 1}^{2} \sum_{j = 1}^{n_{k}} {(X_{i j}^{(k)} - {\dot{X}}_{i}^{(k)})}^{2},

with

n = n_{1} + n_{2}

, is the pooled sample variance corresponding to

X_{i}

, for

i = 1, 2

. These are the standard Student’s

t

statistics that are used to marginally test the corresponding null hypotheses and form the basic ingredients in the development of traditional intersection or global tests, like Bonferroni, Simes (Citation1986), and others, that ignore an explicit use of the correlation between

X_{1}

and

X_{2}

or its estimate in their constructions.

We seek to improve the Bonferroni test by adapting it to the correlation between $X_{1}$ and $X_{2}$ through $r = S_{12} / S_{1} S_{2}$ , with

$S_{12} = \frac{1}{n - 2} \sum_{k = 1}^{2} \sum_{j = 1}^{n_{k}} (X_{1 j}^{(k)} - {\overset{ˉ}{X}}_{1}^{(k)}) (X_{2 j}^{(k)} - {\overset{ˉ}{X}}_{2}^{(k)})$ .

the pooled sample correlation between $X_{1}$ and $X_{2}$ . More specifically, we attempt to find a critical value $c_{1 α} (r)$ , depending on $r$ , such that

(2.1)

P r_{H_{0}} \{m a x (T_{1}, T_{2}) \leq c_{1 α} (r)\} \geq 1 - α,

(2.1)

or $c_{2 α} (r)$ such that

(2.2)

P r_{H_{0}} \{m a x (|T_{1}|, |T_{2}|) \leq c_{2 α} (r)\} \geq 1 - α,

(2.2)

depending on whether $H_{0}$ is tested against a one-sided alternative $H_{1} : \{μ_{1}^{(1)} > μ_{1}^{(2)}\} \cup \{μ_{2}^{(1)} > μ_{2}^{(2)}\}$ or against a two-sided alternative

$H_{1} : \{μ_{1}^{(1)} \neq μ_{1}^{(2)}\} \cup \{μ_{2}^{(1)} \neq μ_{2}^{(2)}\}$ .

Towards finding $c_{1 α} (r)$ and $c_{2 α} (r)$ , we first note the following distributional results:

\sqrt{\frac{n_{1} n_{2}}{n_{1} + n_{2}}} (\begin{matrix} {\overset{ˉ}{X}}_{1}^{(1)} - {\overset{ˉ}{X}}_{1}^{(2)} \\ {\overset{ˉ}{X}}_{2}^{(1)} - {\overset{ˉ}{X}}_{2}^{(2)} \end{matrix}) a n d (n - 2) (\begin{matrix} S_{1}^{2} & S_{12} \\ S_{12} & S_{2}^{2} \end{matrix})

independently distributed as $N_{2} (μ, Σ)$ and $W_{2} (n - 2, Σ),$ respectively, with

which equals $(\begin{matrix} 0 \\ 0 \end{matrix})$ under $H_{0} .$ From these results, we obtain the theorem below:

$μ = \sqrt{\frac{n_{1} n_{2}}{n_{1} + n_{2}}} (\begin{matrix} μ_{1}^{(1)} - μ_{1}^{(2)} \\ μ_{2}^{(1)} - μ_{2}^{(2)} \end{matrix}),$

Theorem 1. The following results hold:

The probability $P r_{H_{0}} (m a x (T_{1}, T_{2}) \leq c)$ depends on the nuisance parameters, $ρ, σ_{1}, a n d σ_{2}$ only through $ρ$ and is strictly increasing in $ρ$ , for any fixed $- \infty < c < \infty$ .,

The probability $P r_{H_{0}} (m a x (|T_{1}|, |T_{2}| \leq c)$ depends on the nuisance parameters, $ρ, σ_{1}, a n d σ_{2}$ only through $|ρ|$ and is strictly increasing in $|ρ|$ , for any fixed $0 < c_{1}, c_{2} < \infty$ .

This theorem, a proof of which is presented in Appendix 1, facilitates the calculation of $c_{1 α} (r)$ and $c_{2 α} (r)$ using a slight modification of the confidence interval approach of Berger and Boos (Citation1994). Specifically, let $Δ_{1} (c, ρ) = P r_{H_{0}} (m a x (T_{1}, T_{2}) \leq c)$ , and ${\hat{ρ}}_{1, β} (r)$ be a lower confidence limit for $ρ$ based on $r$ with confidence coefficient $1 - β$ . Then, since

Δ_{1} (c, ρ) \geq Δ_{1} (c, - 1) = 2 P r_{H_{0}} (T_{1} \leq c) - 1,

and $Δ_{1} (c, ρ)$ is strictly increasing in $ρ \in (- 1, 1)$ , we have

$Δ_{1} (c, ρ)$ = $E \{Δ_{1} (c, ρ) I (ρ \geq {\hat{ρ}}_{1, β} (r))\} + E \{Δ_{1} (c, ρ) I (ρ < {\hat{ρ}}_{1, β} (r))\}$

$\geq$ $E \{Δ_{1} (c, {\hat{ρ}}_{1, β} (r)) I (ρ \geq {\hat{ρ}}_{1, β} (r))\} + [2 P r_{H_{0}} (T_{1} \leq c) - 1] P r_{H_{0}} (ρ < {\hat{ρ}}_{1, β} (r))$

E \{Δ_{1} (c, {\hat{ρ}}_{1, β} (r)) I (ρ \geq {\hat{ρ}}_{1, β} (r))\} + β [2 P r_{H_{0}} (T_{1} \leq c) - 1] .

The desired $c \equiv c_{1 α} (r)$ guaranteeing (2.1) then can be obtained by equating $Δ_{1} (c, {\hat{ρ}}_{1, β} (r))$ to $\{1 - α - β [2 P r_{H_{0}} (T_{1} \leq c) - 1]\} / (1 - β)$ , that is, by solving the equation below for $c$ , for any fixed $(α, β, r)$ :

(2.3)

G_{1, β} (c, r) = (1 - β) Δ_{1} (c, {\hat{ρ}}_{1, β} (r)) + β [2 P r_{H_{0}} (T_{1} \leq c) - 1] = 1 - α .

(2.3)

It is worth noting that $G_{1, β} (c, r) \geq 2 P r_{H_{0}} (T_{1} \leq c) - 1$ , and so $c_{1 α} (r)$ is less than or equal to the Bonferroni critical value $c$ satisfying $2 P r_{H_{0}} (T_{1} \leq c) = 2 - α$ . In other words, the resulting modification of the Bonferroni test for testing $H_{0}$ against $H_{1} : \{μ_{1}^{(1)} > μ_{1}^{(2)}\} \cup \{μ_{2}^{(1)} > μ_{2}^{(2)}\}$ will have a larger rejection region.

The $c_{2 α}$ satisfying (2.2) can be obtained in the same manner by using the fact that $Δ_{2} (c, |ρ|) = P r (m a x (|T_{1}|, |T_{2}|) \leq c) \geq Δ_{2} (c, |ρ| = 0) = P r^{2} (|T_{1}| < c),$ and $Δ_{2} (c, |ρ|)$ is strictly increasing in $|ρ|$ , and utilizing a lower confidence limit ${|\hat{ρ}|}_{2, β} (|r|)$ of $|ρ|$ based on $|r|$ with confidence coefficient $1 - β$ . More specifically, $c \equiv c_{2 α}$ can be obtained by solving the equation below for $c$ , for any fixed $(α, β, r)$ :

(2.4)

G_{2, β} (c, r) = (1 - β) Δ_{2} (c, {|\hat{ρ}|}_{2, β} (|r|)) + β {P r}_{H_{0}}^{2} (|T_{1}| < c) = 1 - α .

(2.4)

Since $c_{2 α}$ is smaller than the Bonferonni critical value $c$ satisfying ${P r}_{H_{0}}^{2} (|T_{1}| < c) = 1 - α$ for testing $H_{0}$ against $H_{1} : \{μ_{1}^{(1)} \neq μ_{1}^{(2)}\} \cup \{μ_{2}^{(1)} \neq μ_{2}^{(2)}\}$ , our modification of the Bonferroni test will have a larger rejection region, and hence more power, than the usual Bonferroni test.

3. Data-adaptive critical values

This section describes the process of calculating $G_{1, β} (c, r)$ and $G_{2, β} (c, r)$ , given $β,$ from the pooled sample covariance matrix with $n - 2$ degrees of freedom (d.f.). A pseudocode of these calculations appears in Appendix 2. Subsequently, we derive the critical values $c_{1 α} (r)$ and $c_{2 α} (r)$ by solving the corresponding Equationequations (2.3(2.3) $G_{1, β} (c, r) = (1 - β) Δ_{1} (c, {\hat{ρ}}_{1, β} (r)) + β [2 P r_{H_{0}} (T_{1} \leq c) - 1] = 1 - α .$ (2.3) ) and (Equation2.4(2.4) $G_{2, β} (c, r) = (1 - β) Δ_{2} (c, {|\hat{ρ}|}_{2, β} (|r|)) + β {P r}_{H_{0}}^{2} (|T_{1}| < c) = 1 - α .$ (2.4) ) for $c$ . The calculation of $G_{1, β} (c, r)$ and $G_{2, β} (c, r)$ involves expressing the probabilities $Δ_{1} (c, ρ)$ and $Δ_{2} (c, |ρ|)$ and estimating them by substituting ρ and |ρ| with their respective $1 - β$ lower confidence limit ${\hat{ρ}}_{1, β} (r)$ and ${\hat{|ρ|}}_{2, β} (r) .$

3.1. Expressions of $Δ_{1} (c, ρ)$ and $Δ_{2} (c, |ρ|)$

Let $Φ_{ρ}$ be the cumulative distribution function of $(Z_{1}, Z_{2}$ ) having standard bivariate normal distribution with correlation $ρ$ . Then, from the above-mentioned joint distribution of $({\overset{ˉ}{X}}_{1}^{(1)} - {\overset{ˉ}{X}}_{1}^{(2)}, {\overset{ˉ}{X}}_{2}^{(1)} - {\overset{ˉ}{X}}_{2}^{(2)})$ and $(S_{1}^{2}, S_{2}^{2}, S_{12})$ under $H_{0}$ , we see that

Δ_{1} (c, ρ) = P r_{H_{0}} (Z_{1} \leq c \frac{S_{1}}{σ_{1}}, Z_{2} \leq c \frac{s_{2}}{σ_{2}})

(3.1)

= \int_{0}^{\infty} \int_{0}^{\infty} Φ_{ρ} (c \sqrt{w_{1} / (n - 2)}, c \sqrt{w_{2} / (n - 2)}) g (w_{1}, w_{2}) d w_{1} d w_{2},

(3.1)

and

Δ_{2} (c, |ρ|) = \int_{0}^{\infty} \int_{0}^{\infty} g (w_{1}, w_{2}) [Φ_{|ρ|} (c \sqrt{\frac{w_{1}}{n - 2}}, c \sqrt{w_{2} (n - 2)}) -

2 Φ_{|ρ|} (- c \sqrt{w_{1} / (n - 2)}, c \sqrt{w_{2} / (n - 2)}) +

Φ_{|ρ|} (- c \sqrt{w_{1} / (n - 2)}, - c \sqrt{w_{2} / (n - 2)})] d w_{1} d w_{2}

(3.2)

where $g (w_{1}, w_{2})$ is the density of $(W_{1}, W_{2})$ , the diagonal elements of a $2 \times 2$ Wishart matrix with $n - 2$ d.f. and covariance matrix $(\begin{matrix} 1 & ρ \\ ρ & 1 \end{matrix}) .$ Since

(W_{1}, W_{2}) \overset{d}{=} (W_{1}, ((1 - ρ^{2}) W_{3} + {(\sqrt{(1 - ρ^{2})} Z + ρ \sqrt{W_{1}})}^{2})),

with $W_{1}$ , $W_{3}$ , and $Z$ being distributed independently as $χ_{n - 2}^{2}$ , $χ_{n - 3}^{2}$ and $N (0, 1)$ , respectively (see, e.g., Odell and Feiveson (Citation1966)), we see that $g (w_{1}, w_{2})$ can be expressed as follows:

g (w_{1}, w_{2}) = g_{W_{1}} (w_{1}) \int \int_{A (W_{1}, W_{2})} g_{W_{3}} (w_{3}) φ (z) d w_{3} d z,

where $g_{W_{1}}$ , $g_{W_{3}}$ and $φ (z)$ are the densities of $χ_{n - 2}^{2}$ , $χ_{n - 3}^{2}$ and $N (0, 1)$ , respectively, and

A (w_{1}, w_{2}) = \{(w_{3}, z) : (1 - ρ^{2}) w_{3} + {(\sqrt{(1 - ρ^{2}}) z + ρ \sqrt{w_{1}})}^{2} = w_{2}\}

3.2. Lower confidence limits ${\hat{ρ}}_{1, β} (r)$ and $| {\hat{ρ |}}_{2, β} (r) .$

Although these confidence limits can be approximated by using Fisher’s transformation of $r$ , we consider calculating them exactly using the following distribution of $r$ (from sample covariance matrix with $n - 2$ d.f.), obtained from Hotelling (Citation1953):

$f_{ρ} (r) = \frac{(n - 3) Γ (n - 2) {(1 - ρ^{2})}^{\frac{n - 2}{2}} {(1 - r^{2})}^{\frac{n - 5}{2}}}{\sqrt{2 π} Γ (n - \frac{3}{2}) {(1 - ρ r)}^{n - \frac{5}{2}}}$ ₂F₁ $(\frac{1}{2}, \frac{1}{2}; \frac{2 n - 3}{2}; \frac{ρ r + 1}{2})$ ,

where $Γ$ is the gamma function and ₂ $F_{1}$ is the Gaussian hypergeometric function:

₂F₁ $(a, b; c; z)$ = $\sum_{n = 0}^{\infty} \frac{{(a)}_{n} {(b)}_{n}}{{(c)}_{n}} \frac{z^{n}}{n!}$ , with ${(q)}_{n} = \{\begin{matrix} 1, n = 0 \\ q (q + 1) \dots (q + n - 1), n > 0 . \end{matrix}$

A $1 - β$ lower confidence limit ${\hat{ρ}}_{1, β} (r)$ for $ρ$ is calculated by solving the following equation for $\hat{ρ}$ :

$F_{\hat{ρ}} (r) = \int_{- 1}^{r} f_{\hat{ρ}} (x) d x$ = $1 - β$ . (3.3)

Similarly, a $1 - β$ lower confidence limit ${|\hat{ρ}|}_{2, β} (|r|)$ for $|ρ|$ can be calculated by solving the following equation for $\hat{|ρ|}$ :

$F_{\hat{ρ}} (|r|) = \int_{0}^{|r|} f_{|\hat{ρ}|} (x) d x$ = $1 - β$ , (3.4)

where $f_{|ρ|} (x) = f_{ρ} (x) + f_{ρ} (- x); 0 \leq x \leq 1$

3.3. Calculation of $c_{1 α}$ and $c_{2 α}$

We estimate $Δ_{1} (c, ρ)$ , given $(c$ , $β)$ by replacing $ρ$ with its lower confidence limit ${\hat{ρ}}_{1, β} (r)$ to obtain

$G_{1, β} (c, r)$ = $(1 - β) Δ_{1} (c, {\hat{ρ}}_{1, β} (r)) + β [2 P r_{H_{0}} (T_{1} \leq c) - 1]$ ,

where $P r_{H_{0}} (T_{1} \leq c)$ is calculated using the cumulative distribution function of central Student’s t with $n - 2$ d.f. The $c_{1 α}$ is then obtained by solving the equation $G_{1, β} (c, r) = 1 - α$ for $c$ .

Similarly, $c_{2 α}$ is calculated by estimating $Δ_{2} (c, |ρ|)$ , given $(c$ , $β)$ by replacing $|ρ|$ with its lower confidence limit ${\hat{|ρ|}}_{2, β} (r)$ to obtain

$G_{2, β} (c, r)$ = $(1 - β) Δ_{2} (c, {\hat{|ρ|}}_{2, β} (r)) + β {P r}_{H_{0}}^{2} (|T_{1}| \leq c)$ ,

where $P r_{H_{0}} (| T_{1} | \leq c)$ is calculated using the cumulative distribution function of central Student’s t with $n - 2$ d.f. and solving the equation $G_{2, β} (c, r) = 1 - α$ for $c$ .

4. Critical values

present the critical values of our proposed correlation-adaptive Bonferroni procedures, respectively, for one- and two-sided testing problems. For each configuration of sample size and observed sample correlation coefficient $r$ , the table entries are the solutions of the process described in Section 3 for $G_{1, β} (c, r) = 1 - α$ (one-sided tests) and $G_{2, β} (c, r) = 1 - α$ (two-sided tests). These solutions were obtained by iteratively changing the critical values and numerically integrating the left-hand side of each equation until a solution was found so that the right-hand side of each equation was within 0.000001 of $1 - α$ .

Table 1. One-sided critical values ( $α$ = 0.025; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Display Table

Table 2. Two-sided critical values ( $α$ = 0.05; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Display Table

We are providing values for a wide range of observed sample correlation coefficient $r$ or its absolute value $|r|$ , depending on whether the testing problem is one- or two-sided, and for some choices of total sample size $n$ . For sample sizes below 1000, we have used $β =$ 0.05, while for sample sizes of 1000 and above, $β =$ 0.01 was used. We elaborate on these choices in Section 5.

Note that for the two endpoints problem, with a two-sided $α = 0.05.$ or a one-sided $α = 0.025$ , the Bonferroni critical values are simply half of their respective $α,$ namely 0.025 (= 0.05/2) and 0.0125 (= 0.025/2) for the two- or one-sided testing problems, no matter what the sample correlation coefficient is. As expected, the newly derived critical values increase as the sample size or the sample correlation coefficient increases. This is due to the tighter range of the confidence interval with increased sample size, and the decreasing property of the type-1 error rate with increasing population correlation (approximately equaling the sample correlation for large sample size). Of note is that the new critical values remain close to the corresponding usual Bonferroni critical values when the sample correlation is in the range of −1 to 0.

displays power comparisons between the correlation adaptive Bonferroni, the standard, non-adaptive Bonferroni, Simes, and Šidák procedures. Estimated power calculation was done via 1,000,000 random samples. Comparisons were made for a few configurations of effect sizes for the two endpoints, ranging from equal to substantially different. For meaningful comparisons, configurations of effect sizes were designed to facilitate power comparisons in the range of 80-90%. As expected, the Adaptive Bonferroni has a power advantage over the non-adaptive Bonferroni test that increases with sample size and population correlation. The differences are noticeable, being in the range from 1% to 4%. Šidák’s test gives only a minor improvement over Bonferroni. Simes’s test has its best advantage for effect sizes that are equal. In those cases, its advantage over the adaptive Bonferroni can be in the range of 0.5–1%. On the other hand, when the effect sizes are different, the adaptive Bonferroni has an advantage that can be in the range of 2–2.5%. As we stated before, and elaborated in the next section, the Simes’ test has not been shown to control type-1 error for the testing problem addressed here, namely when the t-statistics are constructed with separate estimates of the population standard deviations, and therefore its validity for this problem is not known.

Table 3. Power (%) comparison. One-sided $α$ = 0.025

Display Table

5. Discussion and concluding remarks

The multiplicity problem addressed in this paper is quite common in clinical trial settings where two treatments are compared on two primary endpoints and evidence of superiority on one of these endpoints is sufficient to obtain regulatory marketing approval. Current solutions to this problem in terms of controlling the type-1 error rate are typically based on dependency-free methodologies (such as Bonferroni test and its various extensions) or on those that only qualitatively utilize positive dependencies (such as Šidák’s (Citation1967) and Simes (Citation1986) tests and their extensions). However, it is generally understood that test procedures that utilize more data-embedded information, such as dependencies among variables, tend to be more powerful. Our proposed data-adaptive version of the Bonferroni method utilizing information through the sample correlation is such a procedure. It is indeed more powerful than its non-adaptive counterpart, as numerically verified.

It is important to note that Simes’ and Šidák’s inequalities were not proven to hold in the testing problem described here and therefore the validity of multiple testing procedures based on these two tests is questionable. Hochberg and Rom (Citation1995) and Samuel-Cahn (Citation1996) have shown that Simes’ test controls type-1 error when the test statistics are jointly bivariate normal for two-sided testing, and with non-negative correlation for one-sided testing. Sarkar and Chang (Citation1997), and Sarkar (Citation1998) have obtained similar results when the test statistics are jointly bivariate t whose marginal t-statistics have been constructed with the same estimate of the standard deviation (sometimes referred to as ‘the standard bivariate t of the Dunnett type’). For the problem at hand, the marginal t-statistics do not share the same estimate of the standard deviation, and therefore, the resulting bivariate t-distribution is not of the Dunnett type. It is unknown whether the results proven in Sarkar and Chang (Citation1997), and Sarkar (Citation1998) hold for this problem. Moreover, it has been shown in Hochberg and Rom (Citation1995), and Samuel-Cahn (Citation1996) that Simes’ test has an inflated type-1 error for negatively correlated normal statistics with one-sided testing; and since the value of the population correlation is rarely known and can be negative, the validity of the Sime’s test in the testing problem described here is questionable.

The arguments above regarding one-sided testing also apply to Šidák’s inequality. Nevertheless, the results obtained in this paper allow us to state the following: 1. The Adaptive Bonferroni method is never less powerful than Šidák’s method for two-sided testing since our method allows us to replace the unknown correlation with a less conservative correlation resulting from the use of the confidence interval for the unknown population correlation. If the confidence interval does not cover zero, then our critical values will be less conservative than Šidák’s critical values, otherwise, they will be the same. By implication, we have proven that 1: Šidák’s inequality holds for the absolute values of two t statistics whose joint distribution is of the form described here (the standard deviations have separate estimates), an important result on its own; and 2: For the one-sided testing problem, it is generally (but not always) true that for positively correlated statistics, our method will result in less conservative critical values than those obtained by assuming that the correlation is zero (independence in the normal case) as is done by Šidák. However, Šidák’s method can inflate type-1 error if the population correlation is negative, while our method is valid for that case.

One might consider using Hotelling’s T² to test the global null hypothesis for our setting. However, the resulting test does not possess the “Consonance” property of Gabriel (Citation1969); that is, following the rejection of the global null hypothesis, the rejection of any of the individual hypotheses is not guaranteed, and they must each be tested and rejected by their own a $α$ -level test. This may lead to loss of power for the rejection of any of the individual null hypotheses. On the other hand, the Bonferroni as well as our adaptive version of it, being in the class of Union Intersection (UI) tests, are consonant, and therefore do allow for the rejection of at least one individual null hypothesis whenever the global null hypothesis is rejected. A UI test allows for the allocation of different portions of type-1 the error to the marginal Student’s t-test statistics, thereby adapting the test to the possible difference in effect sizes between the two endpoints. Also, it is amenable to its applications as a stepwise procedure, starting with the global test and, depending on the rejection of the global null hypothesis (and so at least one individual hypothesis), allocating the full nominal type-1 error to the other hypotheses, thereby increasing the power to reject the second hypothesis.

The monotonicity of the type I error rate for Bonferroni global testing involving one-sided (or two-sided) tests with respect to the population correlation (or the absolute value of the population correlation) is an important theoretical result in the process of carrying out the main maximization step in the Berger and Boos (Citation1994) approach without computations. While this property is known in the literature for multivariate (or absolute-valued multivariate) normal random variables, they are not available for the joint distribution of the marginal $t$ ’s (or absolute-valued marginal $t$ ’s) in Hotelling’s T², and so these results proven in the bivariate case in this paper are important in their own right. Tamhane et al. (Citation2012) have made use of a similar monotonicity property for normally distributed test statistics, although for a different problem, in the aforementioned step of the Berger and Boos (Citation1994) approach without computations. However, they verified this property numerically.

The proposed correlation-adaptive Bonferroni method for global testing can be used to develop a Holm-type stepdown method for simultaneous testing of the individual null hypotheses in the present context. For instance, let us consider the one-sided testing problem. With $H_{(01)}$ and $H_{(02)}$ denoting the null hypotheses corresponding to $m i n (T_{1}, T_{2})$ and $m a x (T_{1}, T_{2})$ , respectively, we can describe this so-called correlation-adaptive Holm method controlling the (familywise) type-1 error rate at $α$ as follows:

Do not reject $H_{(01)}$ or $H_{(02)}$ if $max (T_{1}, T_{2}) \leq c_{1 α} (r)$

Do not reject $H_{(01)}$ but reject $H_{(02)}$ if $min (T_{1}, T_{2}) \leq t_{α, n - 2}, m a x (T_{1}, T_{2}) > c_{1 α} (r)$

Reject both $H_{(01)}$ and $H_{(02)}$ if $min (T_{1}, T_{2}) > t_{α, n - 2}, m a x (T_{1}, T_{2}) > c_{1 α} (r)$

A correlation-adaptive Holm method for the two-sided testing problem can be similarly proposed in terms of $m i n (|T_{1}|, |T_{2}|),$ $m a x (|T_{1}|, |T_{2}|)$ and $c_{2 α} (r)$ .

The correlation-adaptive Bonferroni methodology can be further extended to more than two endpoints, although a difficulty arises due to the increased dimensionality. One may need to resort to some efficient Monte-Carlo numerical integration methods to address the testing of more than two endpoints. This extension will also require some additional theoretical results. A more pragmatic approach to reduce the dimensionality problem is to use the bivariate results obtained here and to devise an upper bound for the case of more than two endpoints. The first method can readily be described for the case of three endpoints as follows (one-sided bounds are described here with obvious changes to two-sided testing):

P r (\cup_{i = 1}^{3} {T_{i} \geq c}) = \sum_{i = 1}^{3} P r (T_{i} \geq c) - \sum_{i, j (i j) = 1}^{3} P r (\{T_{i} \geq c\} \cap^{\{T_{j} \geq c\}}) + P r (\cap_{i = 1}^{3} (T_{j} \geq c))

and since

P r (\cap_{i = 1}^{3} (T_{j} \geq c)) \leq min_{i \neq j} P r (\{T_{i} \geq c\} \cap \{T_{j} \geq c\}) .

(2.5)

\to P r (\cup_{i = 1}^{3} {T_{i} \geq c}) \leq \sum_{i = 1}^{3} P r (T_{i} \geq c) - max_{i \in \{1, 2, 3\}} \sum_{j \neq i} P r (\{T_{i} \geq c\} \cap \{T_{j} \geq c\})

(2.5)

This bound relies on the univariate and bivariate probabilities only. We can then replace each of the bivariate probabilities on the righthand side of (2.5) using the lower confidence limit of the correlation between the respective statistics and apply the Berger and Boos (Citation1994) method as was done for the two-endpoint problem. Two types of extensions of (2.5) can be made for more than three endpoints: The first is based on extending (2.5) to $k$ endpoints using Kounias (Citation1968) inequality:

(2.6)

P r (\cup_{i = 1}^{k} {T_{i} \geq c}) \leq \sum_{i = 1}^{k} P r (T_{i} \geq c) - max_{i \in \{1, \dots, k\}} \sum_{j \neq i} P r (\{T_{i} \geq c\} \cap \{T_{j} \geq c\}),

(2.6)

and using a lower confidence bound for each (bivariate) correlation and the Berger and Boos (Citation1994) method in (2.5).

A second approach is to utilize the closure principle of Marcus et al. (Citation1976) to test all intersection hypotheses of cardinality $j (j \in \{1, 2, \dots k\})$ at level $j α / k$ . In this approach, any intersection hypothesis $H$ of cardinality $i$ , can be rejected at level $i α / k$ by testing and rejecting all intersection hypotheses of cardinality $j (> i)$ implying $H$ at level $j α / k$ . Applying this idea recursively to testing $k$ endpoints, the following procedure will control type-1 error rate:

Reject any hypothesis $H_{j}$ $(j = 1, \dots, k)$ corresponding to endpoint $j,$ provided all intersection hypotheses of cardinality $3$ implying $H_{j}$ have been tested and rejected at level $3 α / k .$ We use (2.5) to test all hypotheses of cardinality $3$ .

The tightness (i.e., how far the above bounds are from the exact type-1 error) of the above approaches depends on the correlation matrix among the $k$ endpoints which in turn determines whether higher dimensional probabilities are diminishingly small compared to the two-dimensional probabilities. Our preliminary evaluation suggests that for small to moderate correlations, the univariate and bivariate probabilities do provide a tight upper bound on type-1 error. Further work is currently undertaken to examine the above bounds. As an example of this point, consider a setting with three endpoints, and sample sizes of 500 in each of two groups, with all observed sample correlations being 0.5. With a one-sided type-1 error of 0.025, the Bonferroni test will use a critical value of 0.025/3 =0.0083 for testing each of the three hypotheses. Applying (2.5) with a lower $1 - β$ $(β$ =0.0001) confidence limit and the Berger and Boos (Citation1994) method, we get a critical value of 0.00867 which is a slight improvement over the Bonferroni test. If we were to consider the asymptotic critical value ( $n \to \infty)$ using a three-dimensional normal with all correlations equal to 0.5 to approximate the joint distribution of the test statistics, we would use a critical value of 0.0095 (estimated using a Monte Carlo simulation) making our critical value 0.00867 slightly conservative. Note that the use of the asymptotic critical value may cause some type-1 error inflation due to the use of the normal distribution instead of the t-distribution, and the use of the observed correlations to replace the unknown correlations. Thus, the conservatism of our critical value is no more than the difference derived from the asymptotic distribution, and practically can be much lower.

A similar problem with observed sample correlations of 0.9 gives a critical value of 0.01205 from our method while the Bonferroni test remains unchanged with a critical value of 0.0083. Again, considering the asymptotic distribution as a three-dimensional normal with all correlations being 0.9, the critical value is 0.0145 (estimated from a Monte Carlo simulation), making our method with a critical value of 0.01205 slightly conservative but much less conservative than the Bonferroni test.

The method described in this paper can be extended more easily to situations where, following the rejection of either of the primary endpoints, it is desired to test secondary endpoints. The dependencies between the primary and secondary endpoints can then be readily incorporated using the methodology described in this article to devise an improved sequential testing.

Supplemental material

Supplemental Material

Download MS Word (68.2 KB)

Acknowledgments

We would like to thank Ajit Tamhane for his helpful suggestions on an earlier version of this manuscript. We also thank the anonymous referee and the editor for their insightful comments which helped strengthen this manuscript. We thank Michael Pol for meticulously programming all tables. An R code for generating critical values is available from the authors upon request

Supplemental data

Supplemental data for this article can be accessed on the publisher’s website.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

References

Berger, R. L., and D. D. Boos. 1994. P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association 89 (427):1012–1016.
Web of Science ®Google Scholar
Gabriel, K. R. 1969. Simultaneous test procedures–some theory of multiple comparisons. Annals of Mathematical Statistics 41 (1):224–250. doi:https://doi.org/10.1214/aoms/1177697819.
Google Scholar
Gou, J., A. C. Tamhane, D. Xi, and D. Rom. 2014. A class of improved hybrid hochberg-hommel type step-up multiple test procedures. Biometrika 101 (4):899–911. doi:https://doi.org/10.1093/biomet/asu032.
Web of Science ®Google Scholar
Hochberg, Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75 (4):800–802. doi:https://doi.org/10.1093/biomet/75.4.800.
Web of Science ®Google Scholar
Hochberg, Y., and D. M. Rom. 1995. Extensions of multiple testing procedures based on Simes’ test. Journal of Statistical Planning and Inference 48 (2):141–152. doi:https://doi.org/10.1016/0378-3758(95)00005-T.
Web of Science ®Google Scholar
Holland, B. S., and M. D. Copenhaver. 1987. An improved sequentially rejective Bonferroni test procedure. Biometrics 43 (2):417–423. doi:https://doi.org/10.2307/2531823.
Web of Science ®Google Scholar
Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6 (2):65–70.
Web of Science ®Google Scholar
Hommel, G. A. 1988. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75 (2):383–386. doi:https://doi.org/10.1093/biomet/75.2.383.
Web of Science ®Google Scholar
Hotelling, H. 1953. New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society 15 (2):193–232.
Google Scholar
Kounias, E. J. 1968. Bounds for the probability of a union, with applications. The Annals of Mathematical Statistics 39 (6):2154–2158. doi:https://doi.org/10.1214/aoms/1177698049.
Google Scholar
Marcus, R., E. Peritz, and K. R. Gabriel. 1976. On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63 (3):655–660. doi:https://doi.org/10.1093/biomet/63.3.655.
Web of Science ®Google Scholar
Odell, P. L., and A. H. Feiveson. 1966. A numerical procedure to generate a sample covariance matrix. Journal of the American Statistical Association 61 (313):199–203. doi:https://doi.org/10.1080/01621459.1966.10502018.
Web of Science ®Google Scholar
Samuel-Cahn, E. 1996. Is the Simes improved Bonferroni procedure conservative? Biometrika 83 (4):928–933. doi:https://doi.org/10.1093/biomet/83.4.928.
Web of Science ®Google Scholar
Sarkar, S. K. 1998. Some probability inequalities for ordered MTP2 random variables: A proof of the Simes conjecture. Annals of Statistics 26 (2):494–504. doi:https://doi.org/10.1214/aos/1028144846.
Web of Science ®Google Scholar
Sarkar, S. K., and C.-K. Chang. 1997. The simes method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92 (440):1601–1608. doi:https://doi.org/10.1080/01621459.1997.10473682.
Web of Science ®Google Scholar
Šidák, Z. K. 1967. Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association 62 (318):626–633.
Web of Science ®Google Scholar
Simes, R. J. 1986. An improved Bonferroni procedure for multiple tests of significance. Biometrika 73 (3):751–754. doi:https://doi.org/10.1093/biomet/73.3.751.
Web of Science ®Google Scholar
Tamhane, A. C., Y. Wu, and C. R. Mehta. 2012. Adaptive extensions of a two-stage group sequential procedure for testing primary and secondary endpoints (II): sample size re-estimation. Statistics in Medicine 31 (19):2041–2054. doi:https://doi.org/10.1002/sim.5377.
PubMed Web of Science ®Google Scholar
Xie, C. 2012. Weighted multiple testing correction for correlated tests. Statistics in Medicine 31 (4):341–352. doi:https://doi.org/10.1002/sim.4434.
PubMed Web of Science ®Google Scholar

Incorporating the sample correlation into the testing of two endpoints in clinical trials

ABSTRACT

1. Introduction

2. Proposed methodologies

3. Data-adaptive critical values

3.1. Expressions of $Δ_{1} (c, ρ)$ and $Δ_{2} (c, |ρ|)$

3.2. Lower confidence limits ${\hat{ρ}}_{1, β} (r)$ and $| {\hat{ρ |}}_{2, β} (r) .$

3.3. Calculation of $c_{1 α}$ and $c_{2 α}$

4. Critical values

Table 1. One-sided critical values ( $α$ = 0.025; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Table 2. Two-sided critical values ( $α$ = 0.05; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Table 3. Power (%) comparison. One-sided $α$ = 0.025

5. Discussion and concluding remarks

Supplemental Material

Acknowledgments

Supplemental data

References

Information for

Open access

Opportunities

Help and information

Incorporating the sample correlation into the testing of two endpoints in clinical trials

ABSTRACT

1. Introduction

2. Proposed methodologies

3. Data-adaptive critical values

3.1. Expressions of Δ1c,ρ and Δ2c,ρ

3.2. Lower confidence limits ρˆ1,βr and|ρ|ˆ2,βr.

3.3. Calculation of c1α and c2α

4. Critical values

Table 1. One-sided critical values (α= 0.025; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Table 2. Two-sided critical values (α= 0.05; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Table 3. Power (%) comparison. One-sided α= 0.025

5. Discussion and concluding remarks

Supplemental Material

Acknowledgments

Supplemental data

Correction Statement

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date

3.1. Expressions of $Δ_{1} (c, ρ)$ and $Δ_{2} (c, |ρ|)$

3.2. Lower confidence limits ${\hat{ρ}}_{1, β} (r)$ and $| {\hat{ρ |}}_{2, β} (r) .$

3.3. Calculation of $c_{1 α}$ and $c_{2 α}$

Table 1. One-sided critical values ( $α$ = 0.025; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Table 2. Two-sided critical values ( $α$ = 0.05; β = 0.05 for n < 1,000, β = 0.01 for n ≥ 1,000)

Table 3. Power (%) comparison. One-sided $α$ = 0.025