
Statistical Theory and Related Fields Volume 5, 2021 - Issue 2: Special Issue on Experimental Design. Guest Editors: Xinwei Deng, Dept. of Statistics, College of Science, Virginia Tech, USA; Devon Lin, Dept. of Mathematics and Statistics, Queen's University, Canada.


Sample size and power analysis for stepped wedge cluster randomised trials with binary outcomes

Jijia Wang, Department of Applied Clinical Research, UT Southwestern Medical Center, Dallas, TX, USA

Jing Cao, Department of Statistical Science, Southern Methodist University, Dallas, TX, USA

Song Zhang (corresponding author), Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA

Chul Ahn, Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA

Pages 162-169 | Received 09 Jul 2020, Accepted 12 Mar 2021, Published online: 06 Apr 2021

https://doi.org/10.1080/24754269.2021.1904094


Abstract

In stepped wedge cluster randomised trials (SW-CRTs), clusters of subjects are randomly assigned to sequences, where they receive a specific order of treatments. Compared to conventional cluster randomised studies, one unique feature of SW-CRTs is that all clusters start from control and gradually transition to intervention according to the randomly assigned sequences. This feature mitigates the ethical concern of withholding an effective treatment and reduces the logistic burden of implementing the intervention at multiple clusters simultaneously. It also presents challenges for experimental design and data analysis, namely missing data due to prolonged follow-up and complicated correlation structures that involve between-subject and longitudinal correlations. In this study, based on the generalised estimating equation (GEE) approach, we present a closed-form sample size formula for SW-CRTs with a binary outcome, which offers great flexibility to account for unbalanced randomisation, missing data, and arbitrary correlation structures. We also present a correction approach to address the underestimation of variance by the GEE estimator when the sample size is small. Simulation studies and an application to a real clinical trial are presented.

Keywords:

Stepped wedge
GEE
clinical trials
power analysis
sample size

This article is part of the following collections:

Special Issue on Experimental Design

1. Introduction

Recently, stepped wedge cluster randomised trials (SW-CRTs) have been gaining popularity in large-scale biomedical and healthcare studies (Bacchieri et al., 2010; Bailet et al., 2009; Lenguerrand et al., 2020; Scalia et al., 2019; van Holland et al., 2012). Clusters of subjects are randomly assigned to different treatment sequences. Within each sequence, all clusters receive the control initially, but switch to the intervention at a particular step, as illustrated in Figure 1. There are two main types of SW-CRTs. One is the closed-cohort SW-CRT, which follows the same cohort of subjects through the treatment sequences, i.e., each subject contributes a set of longitudinal measurements. The other is the cross-sectional SW-CRT, which enrols a new panel of subjects at each step, i.e., each subject only contributes one measurement (Beard et al., 2015; Copas et al., 2015; Martin et al., 2016). SW-CRTs are considered advantageous in that (1) all clusters eventually receive the intervention, mitigating the ethical concern of withholding the effective intervention; (2) clusters switch from control to intervention in one direction only, which is more convenient in terms of washout compared to crossover studies with multiple periods; (3) they reduce the logistic burden of implementing the intervention simultaneously at many centres or facilities (Edwards, 2013; Zhou et al., 2020).

Figure 1. A diagram of an SW-CRT with 4 time points and 3 sequences (shaded and blank cells represent intervention and control, respectively.)

At the design stage, it is important to determine the number of clusters to ensure that clinical trials are adequately powered to detect effective interventions. Hussey and Hughes (2007) proposed a sample size estimation approach based on mixed-effect models for cross-sectional SW-CRTs with continuous outcomes, which also extends to binary outcomes. This approach assumes the correlation between any pair of measurements from the same cluster to be identical, regardless of whether they are observed during the same period or not. This assumption might over-simplify reality because the correlation between concurrent observations is likely stronger than that between non-concurrent ones. Furthermore, among non-concurrent observations, the correlation might decay as observations become temporally further apart. Hooper et al. (2016) derived a sample size formula based on multilevel models for closed-cohort and cross-sectional SW-CRTs with continuous outcomes. Within clusters, a separate exchangeable correlation is assumed for concurrent and non-concurrent observations, with the former stronger than the latter. Kasza et al. (2019) proposed a sample size method that allows the correlation between non-concurrent observations to decay exponentially. Li et al. (2018) proposed sample size procedures for closed-cohort SW-CRTs with continuous and binary responses under the framework of generalised estimating equations (GEE). Their procedure employs a block exchangeable within-cluster correlation structure and can be extended to cross-sectional SW-CRTs. Zhou et al. (2020) developed a numerical power analysis method for SW-CRTs with binary outcomes based on the maximum-likelihood approach. Other developments in sample size calculation for SW-CRTs include, but are not limited to, Hemming et al. (2015), Woertman et al. (2013), Moulton et al. (2007), and Baio et al. (2015).

Most of the existing sample size methods assume relatively simple correlation structures and no missing data, which might not hold in real SW-CRTs. Especially in closed-cohort SW-CRTs with prolonged follow-up, the correlation structures that simultaneously involve between-subject and within-subject (longitudinal) correlations can be complicated, and the problem of missing data cannot be ignored. In this study, based on the GEE approach (Liang & Zeger, 1986), we present a closed-form sample size formula for SW-CRTs with a binary outcome. It is generally applicable to both cross-sectional and closed-cohort SW-CRTs. It also provides great flexibility to account for design issues frequently encountered by practitioners, including unbalanced randomisation, different severity and patterns of missing data, and complicated correlation structures.

This article is organised as follows. In Section 2, we describe the model and derive a closed-form formula to calculate the required number of clusters in SW-CRTs with binary outcomes. In Section 3, we conduct extensive simulations to evaluate the performance of the proposed method and to explore the impact of different design parameters on sample size requirement. In Section 4, we apply this method to the design of a postoperative delirium study. In Section 5, we conclude with a discussion.

2. Method

Suppose that in a closed-cohort SW-CRT with T time points, n clusters are randomly assigned to S sequences (S = T−1). Each cluster is assigned to the sth sequence with probability $p_{s}$ ( $s = 1, \dots, S$ ), where $\sum_{s = 1}^{S} p_{s} = 1$ . The resulting number of clusters assigned to the sth sequence is denoted by $n_{s}$ , with $\sum_{s = 1}^{S} n_{s} = n$ . The cluster size (number of subjects per cluster) is denoted by J. Let $Y_{s i j t}$ denote the binary measurement obtained from the jth subject ( $j = 1, \dots, J$ ) within the ith cluster ( $i = 1, \dots, n_{s}$ ) under the sth sequence ( $s = 1, \dots, S$ ) at time t ( $t = 1, \dots, T$ ). We define $E (Y_{s i j t}) = μ_{s t}$ , and $μ_{s t}$ is modelled by $\log (\frac{μ_{s t}}{1 - μ_{s t}}) = λ_{t} + v_{s t} ζ .$ Here $λ_{t}$ is the time-specific intercept, $v_{s t}$ is the treatment indicator with 0/1 indicating control/intervention, and ζ represents the intervention effect, which is assumed to be constant over time. The specification of $λ_{t}$ ( $t = 1, \dots, T$ ) allows us to account for temporal trends of arbitrary shapes. As for the second moment, first we have $Var (Y_{s i j t}) = μ_{s t} (1 - μ_{s t})$ . For the vector of longitudinal observations from each individual, $Y_{s i j} = (Y_{s i j 1}, \dots, Y_{s i j T})^{'}$ , we define $Ω = Corr (Y_{s i j})$ to be the within-subject (longitudinal) correlation matrix with diagonal elements $ω_{t t} = 1$ ( $t = 1, \dots, T$ ). Furthermore, we use $Φ = Corr (Y_{s i j}, Y_{s i j^{'}})$ ( $j \neq j^{'}$ ) to denote the correlation between subjects from the same cluster. It can be considered as the matrix version of the ICC (intracluster correlation coefficient). Define $Y_{s i} = (Y_{s i 1}^{'}, \dots, Y_{s i J}^{'})^{'}$ to be the collection of measurements from the $(s, i)$ th cluster. The correlation matrix of $Y_{s i}$ is $R = I_{J} \otimes (Ω - Φ) + (1_{J} 1_{J}^{'}) \otimes Φ,$ where ⊗ is the Kronecker product operator, $I_{J}$ is a $J \times J$ identity matrix, and $1_{J}$ is a vector of length J with all elements being 1. Finally, the observations are assumed to be independent across clusters. This completes the model specification for the first two moments of $Y_{s i}$ , as is required by the GEE approach (Liang & Zeger, 1986).
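As a minimal sketch of the Kronecker construction of $R$ described above, the following NumPy snippet assembles the cluster-level correlation matrix from $Ω$ and $Φ$. The function name `cluster_correlation` and the numerical values are illustrative assumptions, not taken from the article.

```python
import numpy as np

def cluster_correlation(omega, phi, J):
    """R = I_J kron (Omega - Phi) + (1_J 1_J') kron Phi for one cluster of J subjects."""
    omega = np.asarray(omega, dtype=float)
    phi = np.asarray(phi, dtype=float)
    return np.kron(np.eye(J), omega - phi) + np.kron(np.ones((J, J)), phi)

# Illustrative values only (assumed, not from the article): T = 3 periods, J = 2 subjects.
T, J = 3, 2
omega = np.full((T, T), 0.2) + 0.8 * np.eye(T)    # within-subject matrix, unit diagonal
phi = np.full((T, T), 0.01) + 0.04 * np.eye(T)    # between-subject matrix, ICC 0.05 on the diagonal
R = cluster_correlation(omega, phi, J)

print(R.shape)  # (6, 6)
```

The diagonal $T \times T$ blocks of `R` equal $Ω$ (same subject) and every off-diagonal block equals $Φ$ (two different subjects in the same cluster).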

Define $β = (λ_{1}, \dots, λ_{T}, ζ)^{'}$ to be the vector of parameters. Based on the GEE approach with an independent working correlation structure, the estimate $\hat{β}$ can be solved from the score equation $U (β) = 0$ using the Newton–Raphson method, where $U (β) = \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} \sum_{j = 1}^{J} X_{s}^{'} [Y_{s i j} - μ_{s} (β)]$ with $μ_{s} = (μ_{s 1}, \dots, μ_{s T})^{'}$ , and $X_{s} = (I_{T}, v_{s})$ is the design matrix with $v_{s} = (v_{s 1}, \dots, v_{s T})^{'}$ . Liang and Zeger (1986) proved that as $n \to \infty$ , $\sqrt{n} (\hat{β} - β)$ asymptotically follows a multivariate normal distribution with zero mean and covariance matrix $Σ = A^{- 1} E A^{- 1}$ , where $A = J \sum_{s = 1}^{S} p_{s} {(X_{s}^{'} G_{s})}^{\otimes 2}$ and $E = J \sum_{s = 1}^{S} p_{s} X_{s}^{'} G_{s} [Ω + (J - 1) Φ] G_{s} X_{s} .$ Here $G_{s}$ is a $T \times T$ diagonal matrix with the $(t, t)$ th element being $\sqrt{μ_{s t} (1 - μ_{s t})}$ for $t = 1, \dots, T$ , and $C^{\otimes 2} = C C^{'}$ for a matrix $C$ . In practice, $A$ and $E$ can be estimated by $\hat{A} = \frac{J}{n} \sum_{s = 1}^{S} n_{s} {(X_{s}^{'} {\hat{G}}_{s})}^{\otimes 2}$ and $\hat{E} = n^{- 1} \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} {(\sum_{j = 1}^{J} X_{s}^{'} {\hat{e}}_{s i j})}^{\otimes 2},$ where ${\hat{e}}_{s i j} = Y_{s i j} - {\hat{μ}}_{s}$ is the residual vector with ${\hat{μ}}_{s} = ({\hat{μ}}_{s 1}, \dots, {\hat{μ}}_{s T})^{'}$ , and ${\hat{G}}_{s}$ is a $T \times T$ diagonal matrix with elements $\sqrt{{\hat{μ}}_{s t} (1 - {\hat{μ}}_{s t})}$ .
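To make the matrices $A$, $E$ and $Σ = A^{-1} E A^{-1}$ concrete, here is a hedged sketch that evaluates them at design-stage (true) parameter values. The function name `sandwich_sigma`, the staircase layout of the treatment indicators (sequence s switches to intervention after period s), and the numerical inputs are assumptions made for illustration.

```python
import numpy as np

def sandwich_sigma(lam, zeta, p, V, omega, phi, J):
    """Sigma = A^{-1} E A^{-1} under an independent working correlation.

    lam: length-T intercepts; zeta: intervention effect; p: length-S allocation
    probabilities; V: S x T matrix of 0/1 treatment indicators v_st.
    """
    lam, p, V = (np.asarray(x, dtype=float) for x in (lam, p, V))
    S, T = V.shape
    M = np.asarray(omega, dtype=float) + (J - 1) * np.asarray(phi, dtype=float)
    A = np.zeros((T + 1, T + 1))
    E = np.zeros((T + 1, T + 1))
    for s in range(S):
        X = np.hstack([np.eye(T), V[s][:, None]])        # X_s = (I_T, v_s)
        mu = 1.0 / (1.0 + np.exp(-(lam + V[s] * zeta)))  # logistic mean model
        G = np.diag(np.sqrt(mu * (1.0 - mu)))
        XG = X.T @ G                                     # X_s' G_s
        A += J * p[s] * (XG @ XG.T)
        E += J * p[s] * (XG @ M @ XG.T)
    A_inv = np.linalg.inv(A)
    return A_inv @ E @ A_inv

# Illustrative design (assumed values): T = 4, S = 3, balanced allocation, J = 15.
T, J = 4, 15
V = np.triu(np.ones((T - 1, T)), k=1)                    # rows: (0,1,1,1), (0,0,1,1), (0,0,0,1)
omega = np.full((T, T), 0.1) + 0.9 * np.eye(T)
phi = np.full((T, T), 0.005) + 0.025 * np.eye(T)
Sigma = sandwich_sigma(0.01 * np.arange(T), np.log(1.5), np.full(T - 1, 1 / 3), V, omega, phi, J)

# The last diagonal element is the asymptotic variance of sqrt(n) * (zeta_hat - zeta).
print(Sigma[T, T])
```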

Let $\hat{ζ}$ be the estimator of ζ and ${\hat{σ}}_{ζ}^{2}$ be the $(T + 1, T + 1)$ th element of $\hat{Σ} = {\hat{A}}^{- 1} \hat{E} {\hat{A}}^{- 1}$ . Based on the test statistic $\sqrt{n} | \hat{ζ} | / {\hat{σ}}_{ζ}$ , to reject the null hypothesis $H_{0} : ζ = 0$ with a power of $1 - γ$ at a two-sided significance level of α, the required number of clusters can be computed by (1) $n = \frac{{(z_{1 - α / 2} + z_{1 - γ})}^{2} \sum_{s = 1}^{S} p_{s} {(v_{s} - \bar{a})}^{'} G_{s} [Ω + (J - 1) Φ] G_{s} (v_{s} - \bar{a})}{ζ_{0}^{2} J {[\sum_{t = 1}^{T} (\sum_{s = 1}^{S} w_{s t}) {\bar{a}}_{t} (1 - {\bar{a}}_{t})]}^{2}},$ where $ζ_{0}$ is the true intervention effect, $w_{s t} = p_{s} μ_{s t} (1 - μ_{s t})$ , ${\bar{a}}_{t} = \frac{\sum_{s = 1}^{S} w_{s t} v_{s t}}{\sum_{s = 1}^{S} w_{s t}}$ is the weighted proportion of subjects receiving intervention at time t, $\bar{a} = ({\bar{a}}_{1}, \dots, {\bar{a}}_{T})^{'}$ , and $z_{c}$ is the $100 c$ th percentile of the standard normal distribution with 0<c<1. Details of the derivation are presented in the Appendix.
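Formula (1) can be evaluated directly. The sketch below is one possible implementation for complete data; the function name `n_clusters`, the staircase treatment layout and the input values are illustrative assumptions, and the result should be rounded up to an integer in practice.

```python
import numpy as np
from statistics import NormalDist

def n_clusters(lam, zeta0, p, V, omega, phi, J, alpha=0.05, power=0.8):
    """Closed-form number of clusters from formula (1), complete data."""
    lam, p, V = (np.asarray(x, dtype=float) for x in (lam, p, V))
    mu = 1.0 / (1.0 + np.exp(-(lam[None, :] + V * zeta0)))  # S x T means mu_st
    g2 = mu * (1.0 - mu)                                    # squared diagonal of G_s
    w = p[:, None] * g2                                     # w_st
    w_sum = w.sum(axis=0)
    a_bar = (w * V).sum(axis=0) / w_sum                     # weighted proportion treated
    M = np.asarray(omega, dtype=float) + (J - 1) * np.asarray(phi, dtype=float)
    num = 0.0
    for s in range(len(p)):
        d = np.sqrt(g2[s]) * (V[s] - a_bar)                 # G_s (v_s - a_bar)
        num += p[s] * (d @ M @ d)
    denom = zeta0 ** 2 * J * np.sum(w_sum * a_bar * (1.0 - a_bar)) ** 2
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 * num / denom

# Illustrative inputs (assumed): T = 4, S = 3, J = 15, odds ratio 1.5.
T, J = 4, 15
V = np.triu(np.ones((T - 1, T)), k=1)                       # sequence s switches after period s
omega = np.full((T, T), 0.1) + 0.9 * np.eye(T)              # exchangeable, rho_1 = 0.1
phi = np.full((T, T), 0.005) + 0.025 * np.eye(T)            # ICC 0.03 on the diagonal
n = n_clusters(0.01 * np.arange(T), np.log(1.5), np.full(T - 1, 1 / 3), V, omega, phi, J)
print(n, int(np.ceil(n)))
```

Because $G_{s}$ is diagonal, $G_{s} (v_{s} - \bar{a})$ reduces to an elementwise product, which is why no explicit diagonal matrix is formed.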

In closed-cohort SW-CRTs, longitudinal measurements are planned on each subject at pre-specified time points. However, in real clinical trials with prolonged follow-up, the occurrence of missing data is usually inevitable. Ignoring missing data in sample size calculation will lead to under-powered studies. To address this problem, we introduce the indicator $Δ_{s i j t} = 1 / 0$ if $Y_{s i j t}$ is observed/missing. We assume that the occurrence of missing data only depends on time and define the marginal observational probability $Prob (Δ_{s i j t} = 1) = δ_{t}$ . To accommodate different missing data patterns, we also introduce the joint observational probability $Prob (Δ_{s i j t} Δ_{s i j t^{'}} = 1) = δ_{t t^{'}}$ , which is the probability that a subject contributes observations at both time t and $t^{'}$ ( $t \neq t^{'}$ ). For example, under the independent missing (IM) pattern, the occurrences of missing data are independent between t and $t^{'}$ , hence $δ_{t t^{'}} = δ_{t} δ_{t^{'}}$ . On the other hand, under the monotone missing (MM) pattern, a subject having missing data at t would miss all subsequent observations, hence $δ_{t t^{'}} = δ_{t^{'}}$ for $t^{'} > t$ . Under the assumption of missing completely at random, $A$ and $E$ can be rewritten as $A^{*} = J \sum_{s = 1}^{S} p_{s} X_{s}^{'} diag (δ) G_{s} G_{s} X_{s}$ and $E^{*} = J \sum_{s = 1}^{S} p_{s} X_{s}^{'} G_{s} [\tilde{δ} \circ Ω + (J - 1) diag (δ) Φ diag (δ)] G_{s} X_{s},$ respectively. Here $\circ$ denotes the Hadamard product, $diag (δ)$ is a $T \times T$ diagonal matrix with diagonal elements being $δ = (δ_{1}, \dots, δ_{T})^{'}$ , and $\tilde{δ}$ is a $T \times T$ matrix with the diagonal $(t, t)$ th element being $δ_{t}$ and the off-diagonal $(t, t^{'})$ th element being $δ_{t t^{'}}$ . Then the generalised formula for the number of clusters accounting for missing data is (2) $n^{*} = \frac{{(z_{1 - α / 2} + z_{1 - γ})}^{2} \sum_{s = 1}^{S} p_{s} {(v_{s} - \bar{a})}^{'} G_{s} [\tilde{δ} \circ Ω + (J - 1) diag (δ) Φ diag (δ)] G_{s} (v_{s} - \bar{a})}{ζ_{0}^{2} J {[\sum_{t = 1}^{T} (\sum_{s = 1}^{S} w_{s t}) δ_{t} {\bar{a}}_{t} (1 - {\bar{a}}_{t})]}^{2}} .$ Formula (2) offers great flexibility to accommodate various missing data patterns (through $\tilde{δ}, δ$ ), complicated correlation structures (through $Ω, Φ$ ), and unbalanced randomisation (through $p_{s}$ ). Conversely, given the number of clusters $n^{*}$ and the true treatment effect $ζ_{0}$ , the anticipated power is obtained by solving Formula (2) for $z_{1 - γ}$ : $P (Z < \sqrt{\frac{n^{*} J ζ_{0}^{2} {[\sum_{t = 1}^{T} (\sum_{s = 1}^{S} w_{s t}) δ_{t} {\bar{a}}_{t} (1 - {\bar{a}}_{t})]}^{2}}{\sum_{s = 1}^{S} p_{s} {(v_{s} - \bar{a})}^{'} G_{s} [\tilde{δ} \circ Ω + (J - 1) diag (δ) Φ diag (δ)] G_{s} (v_{s} - \bar{a})}} - z_{1 - α / 2}),$ where Z is a standard normal variable.
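The only new ingredients in Formula (2) are the matrix $\tilde{δ}$ and the bracketed middle term. A minimal sketch of both is given below; the helper names `joint_obs_matrix` and `middle_matrix` are hypothetical, the MM branch assumes non-increasing marginal probabilities, and the correlation values are illustrative.

```python
import numpy as np

def joint_obs_matrix(delta, pattern):
    """delta-tilde: delta_t on the diagonal, joint probabilities delta_tt' elsewhere."""
    delta = np.asarray(delta, dtype=float)
    if pattern == "IM":                      # independent missingness: delta_t * delta_t'
        tilde = np.outer(delta, delta)
    elif pattern == "MM":                    # monotone missingness: probability at the later time
        if np.any(np.diff(delta) > 0):
            raise ValueError("MM assumes non-increasing observation probabilities")
        later = np.maximum.outer(np.arange(len(delta)), np.arange(len(delta)))
        tilde = delta[later]
    else:
        raise ValueError("pattern must be 'IM' or 'MM'")
    np.fill_diagonal(tilde, delta)
    return tilde

def middle_matrix(delta, tilde, omega, phi, J):
    """delta-tilde o Omega + (J - 1) diag(delta) Phi diag(delta)."""
    D = np.diag(np.asarray(delta, dtype=float))
    return tilde * np.asarray(omega, dtype=float) + (J - 1) * (D @ np.asarray(phi, dtype=float) @ D)

# Illustrative inputs (assumed): T = 4, J = 15, 30% attrition by the last period.
T, J = 4, 15
delta = np.array([1.00, 0.80, 0.75, 0.70])
omega = np.full((T, T), 0.1) + 0.9 * np.eye(T)
phi = np.full((T, T), 0.005) + 0.025 * np.eye(T)
tilde_im = joint_obs_matrix(delta, "IM")
tilde_mm = joint_obs_matrix(delta, "MM")
print(middle_matrix(delta, tilde_mm, omega, phi, J).round(4))
```

Replacing $Ω + (J - 1) Φ$ by this middle matrix, and weighting each period in the denominator by $δ_{t}$, turns a complete-data calculation into Formula (2). With all $δ_{t} = 1$ the middle matrix reduces to $Ω + (J - 1) Φ$.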

We have described the sample size calculation method for closed-cohort SW-CRTs with binary outcomes. In practice, many SW-CRTs are cross-sectional, where new panels of subjects are measured at each time point. Using the same notation framework, the proposed method easily accommodates cross-sectional SW-CRTs. Specifically, we consider the cluster size under a cross-sectional SW-CRT to be JT. At each time point, J subjects are selected from each cluster for measurements, and these subjects will not be selected again in the future. Because measurements at different time points then come from different subjects, the between-period correlation $ω_{t t^{'}}$ ( $t \neq t^{'}$ ) in $Ω$ equals the corresponding correlation $ϕ_{t t^{'}}$ in $Φ$ . The required number of clusters can be similarly calculated using Equation (2).

3. Simulation studies

We conducted simulation studies to evaluate the performance of the proposed sample size method. Suppose we are planning a closed-cohort SW-CRT with T = 4 time points and cluster size J = 15. We assume balanced randomisation to the S = 3 sequences, i.e., $p_{1} = \dots = p_{S} = 1 / 3$ . We set the time-specific intercepts $λ_{t} = 0.01 (t - 1)$ for $t = 1, \dots, T$ . We explore two values for the intervention effect $ζ_{0}$ : 0.41 and 0.59, which correspond to odds ratios of 1.5 and 1.8, respectively. Different correlation structures are explored: for the longitudinal correlation matrix ( $Ω$ ), we investigate the compound symmetry (CS) and AR(1) structures, with off-diagonal elements being $ω_{t t^{'}} = ρ_{1}$ and $ω_{t t^{'}} = ρ_{1}^{| t - t^{'} | / (T - 1)}$ ( $t \neq t^{'}$ ), respectively; for the between-subject correlation matrix, we specify $Φ = 1 1^{'} ρ_{3} + (ρ_{2} - ρ_{3}) I$ with diagonal ICC being $ρ_{2}$ and off-diagonal between-subject between-period correlation $ρ_{3}$ being 0.005. We also explore different correlation values $(ρ_{1}, ρ_{2}) \in \{(0.1, 0.03), (0.2, 0.03), (0.1, 0.05), (0.2, 0.05)\}$ . For missing data, we consider four sets of marginal observational probabilities as follows: $\begin{aligned} δ_{1} & = (1.00, 1.00, 1.00, 1.00), \\ δ_{2} & = (1.00, 0.80, 0.75, 0.70), \\ δ_{3} & = (1.00, 0.90, 0.80, 0.70), \\ δ_{4} & = (1.00, 1.00, 0.85, 0.70) . \end{aligned}$ Here $δ_{1}$ represents the scenario where all subjects contribute complete observations, while $δ_{2}$ – $δ_{4}$ represent scenarios of various trends in missing data, but with the same attrition rate (0.3) at the end of the study. Both the IM and MM missing data patterns are explored, which lead to different joint observational probabilities (see Section 2). The null hypothesis is $H_{0} : ζ = 0$ . We set the power $1 - γ = 0.8$ and two-sided type I error rate $α = 0.05$ .
For each combination of design parameters, we calculated the required number of clusters (n) and conducted simulations to evaluate the empirical power and type I error. The simulation algorithm is outlined as follows:

1. Calculate the required number of clusters (n) using Equation (2).
2. Generate the numbers of clusters randomised to the three sequences ( $n_{1}, n_{2}, n_{3}$ ) from a multinomial distribution ( $n, p_{1}, p_{2}, p_{3}$ ).
3. For each cluster, generate a vector of correlated binary measurements based on the true effect $ζ = ζ_{0}$ and the other design parameters ( $Ω$ and $Φ$ ) using the method of Emrich and Piedmonte (1991).
4. Generate missing indicators under different missing patterns and marginal observational probabilities $δ$ .
5. Calculate $\hat{ζ}$ and ${\hat{σ}}_{ζ}$ . The estimation bias can be corrected using the combination of Morel et al. (2003) and Donner and Klar's (2000) methods. If $\sqrt{n} | \hat{ζ} | / {\hat{σ}}_{ζ} > z_{1 - α / 2}$ , then reject the null hypothesis.
6. Repeat Steps 2–5 5000 times. The empirical power is calculated as the proportion of iterations that reject the null hypothesis. The empirical type I error is evaluated similarly except for setting $ζ = 0$ in Step 3.
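Two ingredients of the algorithm above are easy to sketch: the correlation matrices used as design inputs and the generation of missing indicators in Step 4. The helper names below are hypothetical. The monotone generator uses a single uniform draw per subject, which is one simple way (an assumption here, not necessarily the authors' implementation) to obtain permanent dropout with the required marginals when $δ$ is non-increasing. The correlated binary generator of Step 3 (Emrich and Piedmonte, 1991) is not reproduced.

```python
import numpy as np

def omega_cs(T, rho1):
    """Compound symmetry: all off-diagonal elements equal rho1."""
    return np.full((T, T), rho1) + (1.0 - rho1) * np.eye(T)

def omega_ar1(T, rho1):
    """AR(1) as parameterised in the text: rho1 ** (|t - t'| / (T - 1))."""
    lag = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return rho1 ** (lag / (T - 1))

def phi_block(T, rho2, rho3=0.005):
    """Between-subject matrix: rho2 on the diagonal, rho3 elsewhere."""
    return np.full((T, T), rho3) + (rho2 - rho3) * np.eye(T)

def observation_indicators(delta, n_subjects, pattern, rng):
    """Return an (n_subjects, T) array of 0/1 flags, where 1 means observed."""
    delta = np.asarray(delta, dtype=float)
    if pattern == "IM":
        u = rng.random((n_subjects, len(delta)))   # independent draw for every visit
    elif pattern == "MM":
        if np.any(np.diff(delta) > 0):
            raise ValueError("MM assumes non-increasing observation probabilities")
        u = rng.random((n_subjects, 1))            # one draw per subject, so dropout is permanent
    else:
        raise ValueError("pattern must be 'IM' or 'MM'")
    return (u < delta).astype(int)

rng = np.random.default_rng(2021)
delta2 = np.array([1.00, 0.80, 0.75, 0.70])
obs_mm = observation_indicators(delta2, 20000, "MM", rng)
obs_im = observation_indicators(delta2, 20000, "IM", rng)
print(obs_mm.mean(axis=0).round(2), obs_im.mean(axis=0).round(2))
```

Under MM the empirical probability of being observed at both of two time points approaches the marginal probability at the later time, and under IM it approaches the product of the two marginals, matching the definitions of $δ_{t t^{'}}$ in Section 2.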

In Tables 1 and 2, the columns under ‘GEE’ present the simulation results. Each cell presents the required number of clusters as well as the empirical power and type I error. We have several observations. First, more clusters are required when the longitudinal correlation ( $ρ_{1}$ ) and the between-subject correlation ( $ρ_{2}$ ) get larger. For example, in the first row of Table 1, the required number of clusters changes from 45 to 46 when the longitudinal correlation ( $ρ_{1}$ ) increases from 0.1 to 0.2. On the other hand, in the first cell of Tables 1 and 2, the required number of clusters increases from 45 to 53 when the between-subject correlation ( $ρ_{2}$ ) increases from 0.03 to 0.05. Second, the longitudinal correlation structures affect the required number of clusters, which can be shown by comparing the CS and AR(1) panels in each table. Third, different missing patterns and observational probabilities affect the required number of clusters. Given the same attrition rate at the end of the study, scenarios with greater dropout initially lead to more missing data and larger sample size requirements. For example, sample sizes under $δ_{2}$ are always the largest among $δ_{1}$ – $δ_{4}$ . Furthermore, under the MM missing pattern, missing data tend to concentrate on a few subjects, which leads to greater information loss and a larger sample size requirement. Finally, compared with the nominal type I error of 0.05, the empirical type I errors are generally inflated (up to 0.0868). The reason is that when the number of clusters is relatively small, the conventional GEE approach tends to underestimate the variance of the treatment effect (Morel et al., 2003).

Table 1. Required number of clusters (empirical power, empirical type I error) for closed-cohort studies with $ρ_{2} = 0.03$ .


Table 2. Required number of clusters (empirical power, empirical type I error) for closed-cohort studies with $ρ_{2} = 0.05$ .


To address the issue of underestimated variance, we have explored different correction methods, including Mancl and DeRouen (2001), Kauermann and Carroll (2001), Ziegler and Vens (2010), Morel et al. (2003), Fay and Graubard (2001) and Pan and Wall (2002). We find that the combination of Morel et al. (2003) (MBN) and Donner and Klar's (2000) methods achieves a good balance between satisfactory performance and easy implementation in practice. Specifically, the MBN method modifies the GEE covariance estimator with an additional term $Σ_{M B N} = A^{- 1} E A^{- 1} + \min \{0.5, \frac{T + 1}{n - T - 1}\} \times \max \{1, \frac{1}{T + 1} trace (A^{- 1} E)\} A^{- 1} .$ Donner and Klar's (2000) method suggests adding one cluster to each treatment arm. The results under this combination approach are presented in Tables 1 and 2 under the columns of ‘Adjusted GEE’. The empirical powers and type I errors are very close to their nominal values of 0.8 and 0.05, respectively. For example, in Table when the number of clusters is less than 30, the type I errors without adjustment are all severely inflated (larger than 0.07). After adjustment, all the type I errors are close to the nominal level 0.05.
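The MBN adjustment as written above is a one-line matrix expression. The sketch below implements exactly that displayed form; the function name `mbn_covariance` is hypothetical, `A` and `E` stand for whichever estimates of the two matrices are available, and the toy inputs are purely illustrative.

```python
import numpy as np

def mbn_covariance(A, E, n, T):
    """Sigma_MBN = A^{-1} E A^{-1} + min{0.5, k/(n-k)} * max{1, trace(A^{-1} E)/k} * A^{-1}, k = T + 1."""
    A = np.asarray(A, dtype=float)
    E = np.asarray(E, dtype=float)
    k = T + 1                                  # number of regression parameters
    if n <= k:
        raise ValueError("the correction needs more clusters than parameters")
    A_inv = np.linalg.inv(A)
    shrink = min(0.5, k / (n - k))             # vanishes as the number of clusters grows
    scale = max(1.0, np.trace(A_inv @ E) / k)
    return A_inv @ E @ A_inv + shrink * scale * A_inv

# Toy check with identity matrices (assumed values): T = 4, n = 20 clusters.
Sigma_mbn = mbn_covariance(np.eye(5), np.eye(5), n=20, T=4)
print(Sigma_mbn[4, 4])
```

The added term is positive definite whenever $A$ is, so the adjusted variance of $\hat{ζ}$ is never smaller than the unadjusted one, and the adjustment fades as n increases.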

We also conducted simulations to investigate the performance of the proposed method in cross-sectional SW-CRTs. Because each subject only contributes one measurement, the issue of missing data does not apply. We set $Ω = 1 1^{'} ρ + (1 - ρ) I$ and $Φ = 1 1^{'} ρ$ . Two values are explored for ρ: 0.03 and 0.05. Table 3 presents the required number of clusters with empirical power and type I error for cross-sectional SW-CRTs. Similar to the observations from the closed-cohort SW-CRTs, a smaller correlation (ρ) is associated with a smaller sample size requirement. Furthermore, the proposed correction approach performs well in maintaining the empirical powers and type I errors at their nominal levels.

Table 3. Required number of clusters (empirical power, empirical type I error) for cross-sectional studies.


We performed additional simulations to evaluate the relationship between the number of clusters and power. We used the same parameter settings as described above for closed-cohort studies with the CS correlation structure. Under different combinations of design parameters, as shown in Figure 2 (solid lines), testing power increases as the number of clusters increases. Furthermore, we compared the proposed method with an existing method (Li et al., 2018). Since Li's method does not account for missing data, we only consider the scenario of complete observations. To maximise the usability of the proposed sample size method in pragmatic settings, we assume that when analysing trial data researchers do not know the true correlation structure and make inference using GEE with an independent working correlation. This practical solution is slightly less efficient than Li's method, which uses the true correlation (see Figure 2). We believe the proposed method nonetheless provides a useful sample size solution for the design of pragmatic SW-CRTs because it compensates for a slight loss in efficiency with three advantages: (1) a closed-form sample size formula; (2) accommodation of missing data; and (3) no requirement that the true correlation be known during inference.

Figure 2. Relationship between the number of clusters and power (P and L denote the proposed method and Li's method, respectively).

4. Example

We apply the proposed method to a cross-sectional SW-CRT study (Mouchoux et al., 2011), which was designed to evaluate whether a multifaceted programme (including consulting and training, etc.) could decrease postoperative delirium in patients aged 75 and older. The outcome of interest is the occurrence of delirium within seven days after surgery. Suppose this study is conducted over a six-month period with T = 4 pre-specified time points and surgical wards are assigned to S = 3 sequences with balanced randomisation. At each time point, 15 patients per surgical ward will receive the assigned intervention and the delirium outcome will be recorded. It is hypothesised that the multifaceted programme can reduce the occurrence of delirium from 60% to 40%, which corresponds to an odds ratio of 0.44 and a constant time effect of $λ_{t} = \log (0.6 / 0.4) \approx 0.41$ . By assuming $ρ = 0.05$ in $Ω = 1 1^{'} ρ + (1 - ρ) I$ and $Φ = 1 1^{'} ρ$ , we will need 16 wards to achieve 80% power at a two-sided significance level of 0.05. If 30 patients are selected per surgical ward for measurements, 12 wards will be needed.
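The calculation for this example can be retraced with a compact script. This is a sketch under stated assumptions: the function name `wards_needed` is hypothetical, sequence s is assumed to switch to the intervention after period s, and the delirium probabilities are taken as exactly 0.6 under control and 0.4 under intervention at every time point.

```python
import math
import numpy as np
from statistics import NormalDist

def wards_needed(J, rho=0.05, p_control=0.6, p_interv=0.4, T=4, alpha=0.05, power=0.8):
    """Formula (1) for the cross-sectional design of the example, rounded up."""
    S = T - 1
    V = np.triu(np.ones((S, T)), k=1)               # rows: (0,1,1,1), (0,0,1,1), (0,0,0,1)
    p = np.full(S, 1.0 / S)                         # balanced randomisation
    logit = lambda q: math.log(q / (1.0 - q))
    zeta0 = logit(p_interv) - logit(p_control)      # log odds ratio, about -0.81
    mu = np.where(V == 1, p_interv, p_control)
    g2 = mu * (1.0 - mu)
    w = p[:, None] * g2
    w_sum = w.sum(axis=0)
    a_bar = (w * V).sum(axis=0) / w_sum
    # Omega + (J - 1) Phi with Omega = rho 11' + (1 - rho) I and Phi = rho 11'
    M = (1.0 - rho) * np.eye(T) + J * rho * np.ones((T, T))
    d = np.sqrt(g2) * (V - a_bar)                   # row s holds G_s (v_s - a_bar)
    num = np.sum(p * np.einsum("st,tu,su->s", d, M, d))
    denom = zeta0 ** 2 * J * np.sum(w_sum * a_bar * (1.0 - a_bar)) ** 2
    z = NormalDist().inv_cdf
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 * num / denom)

print(wards_needed(15))  # 16
print(wards_needed(30))  # 12
```

Before rounding, the formula gives about 15.5 wards for 15 patients per ward per period and about 11.9 for 30, in line with the 16 and 12 wards reported above.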

5. Discussion

In this study, we propose a sample size and power calculation method that is generally applicable to both closed-cohort and cross-sectional SW-CRTs with binary outcomes. We directly incorporate several design issues encountered in pragmatic trials into power analysis and are able to provide a closed-form sample size solution. Through different specifications of the correlation matrices $Ω$ and $Φ$ , the proposed method offers great flexibility to account for different types of SW-CRTs and correlation structures. The inclusion of the parameters $p_{s}$ allows researchers to employ unbalanced randomisation. Furthermore, our method maintains the desired power in the presence of missing data through the specification of marginal observational probabilities at the population level ( $δ$ ), and the missing pattern at the subject level ( $\tilde{δ}$ ). In simulation studies, we have investigated the independent (IM) and monotone (MM) missing patterns. In practice, a clinical trial might encounter different types of missing patterns. For example, it is possible that some subjects miss a few appointments due to accidents (IM), while some subjects drop out in the middle of the study (MM). The proposed sample size method can accommodate such scenarios by specifying a mixture of IM and MM, where $δ_{t}^{(M I X)} = w δ_{t}^{(I M)} + (1 - w) δ_{t}^{(M M)}$ and $δ_{t t^{'}}^{(M I X)} = w δ_{t t^{'}}^{(I M)} + (1 - w) δ_{t t^{'}}^{(M M)},$ with w and 1−w being the weights for IM and MM, respectively. Finally, we have presented a correction approach to address the issue of underestimated variance by the GEE method when the number of clusters is limited in SW-CRTs.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Patient-Centered Outcomes Research Institute [ME-1609-36761].

References

Bacchieri, G., Barros, A. J., Gonçalves, H., & Gigante, D. P. (2010). A community intervention to prevent traffic accidents among bicycle commuters. Revista De Saude Publica, 44(5), 867–875. https://doi.org/10.1590/S0034-89102010000500012
Bailet, L. L., Repper, K. K., Piasta, S. B., & Murphy, S. P. (2009). Emergent literacy intervention for prekindergarteners at risk for reading failure. Journal of Learning Disabilities, 42(4), 336–355. https://doi.org/10.1177/0022219409335218
Baio, G., Copas, A., Ambler, G., Hargreaves, J., Beard, E., & Omar, R. Z. (2015). Sample size calculation for a stepped wedge trial. Trials, 16(1), 354. https://doi.org/10.1186/s13063-015-0840-9
Beard, E., Lewis, J. J., Copas, A., Davey, C., Osrin, D., Baio, G., Thompson, J. A., Fielding, K. L., Omar, R. Z., Ononge, S., Hargreaves, J., & Prost, A. (2015). Stepped wedge randomised controlled trials: systematic review of studies published between 2010 and 2014. Trials, 16(1), 353. https://doi.org/10.1186/s13063-015-0839-2
Copas, A. J., Lewis, J. J., Thompson, J. A., Davey, C., Baio, G., & Hargreaves, J. R. (2015). Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials, 16(1), 352. https://doi.org/10.1186/s13063-015-0842-7
Donner, A., & Klar, N. (2000). Design and analysis of cluster randomization trials in health research. Arnold.
Edwards, S. J. (2013). Ethics of clinical science in a public health emergency: Drug discovery at the bedside. The American Journal of Bioethics, 13(9), 3–14. https://doi.org/10.1080/15265161.2013.813597
Emrich, L. J., & Piedmonte, M. R. (1991). A method for generating high-dimensional multivariate binary variates. The American Statistician, 45(4), 302–304. https://doi.org/10.2307/2684460
Fay, M. P., & Graubard, B. I. (2001). Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics, 57(4), 1198–1206. https://doi.org/10.1111/j.0006-341X.2001.01198.x
Hemming, K., Haines, T. P., Chilton, P. J., Girling, A. J., & Lilford, R. J. (2015). The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting. BMJ (Clinical Research Ed.), 350, h391. https://doi.org/10.1136/bmj.h391
Hooper, R., Teerenstra, S., de Hoop, E., & Eldridge, S. (2016). Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Statistics in Medicine, 35(26), 4718–4728. https://doi.org/10.1002/sim.v35.26
Hussey, M. A., & Hughes, J. P. (2007). Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials, 28(2), 182–191. https://doi.org/10.1016/j.cct.2006.05.007
Kasza, J., Hemming, K., Hooper, R., Matthews, J., & Forbes, A. (2019). Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Statistical Methods in Medical Research, 28(3), 703–716. https://doi.org/10.1177/0962280217734981
Kauermann, G., & Carroll, R. J. (2001). A note on the efficiency of sandwich covariance matrix estimation. Journal of the American Statistical Association, 96(456), 1387–1396. https://doi.org/10.1198/016214501753382309
Lenguerrand, E., Winter, C., Siassakos, D., MacLennan, G., Innes, K., Lynch, P., Cameron, A., Crofts, J., McDonald, A., McCormack, K., Forrest, M., Norrie, J., Bhattacharya, S., & Draycott, T. (2020). Effect of hands-on interprofessional simulation training for local emergencies in Scotland: The thistle stepped-wedge design randomised controlled trial. BMJ Quality & Safety, 29(2), 122–134. https://doi.org/10.1136/bmjqs-2018-008625
Li, F., Turner, E. L., & Preisser, J. S. (2018). Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics, 74(4), 1450–1458. https://doi.org/10.1111/biom.v74.4
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis for discrete and continuous outcomes using generalized linear models. Biometrika, 84, 3–32. https://doi.org/10.2307/2531248
Mancl, L. A., & DeRouen, T. A. (2001). A covariance estimator for GEE with improved small-sample properties. Biometrics, 57(1), 126–134. https://doi.org/10.1111/biom.2001.57.issue-1
Martin, J., Taljaard, M., Girling, A., & Hemming, K. (2016). Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials. BMJ Open, 6(2), e010166. https://doi.org/10.1136/bmjopen-2015-010166
Morel, J. G., Bokossa, M., & Neerchal, N. K. (2003). Small sample correction for the variance of GEE estimators. Biometrical Journal, 45(4), 395–409. https://doi.org/10.1002/bimj.200390021
Mouchoux, C., Rippert, P., Duclos, A., Fassier, T., Bonnefoy, M., Comte, B., Heitz, D., Colin, C., & Krolak-Salmon, P. (2011). Impact of a multifaceted program to prevent postoperative delirium in the elderly: The CONFUCIUS stepped wedge protocol. BMC Geriatrics, 11(1), 1157. https://doi.org/10.1186/1471-2318-11-25
Moulton, L. H., Golub, J. E., Durovni, B., Cavalcante, S. C., Pacheco, A. G., Saraceni, V., King, B., & Chaisson, R. E. (2007). Statistical design of THRio: A phased implementation clinic-randomized study of a tuberculosis preventive therapy intervention. Clinical Trials, 4(2), 190–199. https://doi.org/10.1177/1740774507076937
Pan, W., & Wall, M. M. (2002). Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Statistics in Medicine, 21(10), 1429–1441. https://doi.org/10.1002/(ISSN)1097-0258
Scalia, P., Durand, M.-A., Forcino, R. C., Schubbe, D., Barr, P. J., O'Brien, N., O'Malley, A. J., Foster, T., Politi, M. C., Laughlin-Tommaso, S., Banks, E., Madden, T., Anchan, R. M., Aarts, J. W. M., Velentgas, P., Balls-Berry, J., Bacon, C., Adams-Foster, M., Mulligan, C. C., …, Elwyn, G. (2019). Implementation of the uterine fibroids option grid patient decision aids across five organizational settings: A randomized stepped-wedge study protocol. Implementation Science, 14(1), 100. https://doi.org/10.1186/s13012-019-0933-z
van Holland, B. J., de Boer, M. R., Brouwer, S., Soer, R., & Reneman, M. F. (2012). Sustained employability of workers in a production environment: Design of a stepped wedge trial to evaluate effectiveness and cost-benefit of the POSE program. BMC Public Health, 12(1), 1003. https://doi.org/10.1186/1471-2458-12-1003
Woertman, W., de Hoop, E., Moerbeek, M., Zuidema, S. U., Gerritsen, D. L., & Teerenstra, S. (2013). Stepped wedge designs could reduce the required sample size in cluster randomized trials. Journal of Clinical Epidemiology, 66(7), 752–758. https://doi.org/10.1016/j.jclinepi.2013.01.009
Zhou, X., Liao, X., Kunz, L. M., Normand, S.-L. T., Wang, M., & Spiegelman, D. (2020). A maximum likelihood approach to power calculations for stepped wedge designs of binary outcomes. Biostatistics (Oxford, England), 21(1), 102–121. https://doi.org/10.1093/biostatistics/kxy031
Ziegler, A., & Vens, M. (2010). Generalized estimating equations. Methods of Information in Medicine, 49(05), 421–425. https://doi.org/10.3414/ME10-01-0026

Appendix. Derivation of Equation (1)

First we have

$$\hat{A} = \frac{J}{n} \sum_{s = 1}^{S} n_{s} (X_{s}^{'} \hat{G}_{s})^{\otimes 2} .$$

As $n \to \infty$, $\hat{A}$ approaches

$$A = J \sum_{s = 1}^{S} p_{s} (X_{s}^{'} G_{s})^{\otimes 2} .$$

On the other hand, we have

$$\begin{aligned}
\hat{E} & = n^{-1} \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} \Big( \sum_{j = 1}^{J} X_{s}^{'} \hat{e}_{sij} \Big)^{\otimes 2} \\
& = n^{-1} \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} \Big( \sum_{j = 1}^{J} X_{s}^{'} \hat{e}_{sij} \Big) \Big( \sum_{j = 1}^{J} \hat{e}_{sij}^{'} X_{s} \Big) \\
& = n^{-1} \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} \Big( \sum_{j = 1}^{J} \sum_{j^{'} = 1}^{J} X_{s}^{'} \hat{e}_{sij} \hat{e}_{sij^{'}}^{'} X_{s} \Big) \\
& = n^{-1} \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} \Big( \sum_{j = 1}^{J} X_{s}^{'} \hat{e}_{sij} \hat{e}_{sij}^{'} X_{s} + 2 \sum_{j = 1}^{J - 1} \sum_{j^{'} = j + 1}^{J} X_{s}^{'} \hat{e}_{sij} \hat{e}_{sij^{'}}^{'} X_{s} \Big) .
\end{aligned}$$

As $n \to \infty$, $\hat{E}$ approaches

$$E = J \sum_{s = 1}^{S} p_{s} X_{s}^{'} G_{s} [\Omega + (J - 1) \Phi] G_{s} X_{s} .$$
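The factor $\Omega + (J - 1) \Phi$ comes from counting terms in the double sum: $J$ same-subject terms and $J(J - 1)$ cross-subject terms. The following is a minimal numerical sketch of that counting for a single sequence, not the authors' code. It assumes that $X_{s} = [I_{T}, v_{s}]$, that $G_{s}$ is the diagonal matrix with entries $\sqrt{\mu_{st}(1 - \mu_{st})}$, and that $\Omega$ and $\Phi$ are the within-subject and between-subject correlation matrices; all numerical values are hypothetical.

```python
import numpy as np

T, J = 3, 4                                 # periods and subjects per cluster (toy values)
v = np.array([0.0, 1.0, 1.0])               # hypothetical sequence: control, then intervention
X = np.hstack([np.eye(T), v[:, None]])      # assumed design matrix X_s = [I_T, v_s]
mu = np.array([0.30, 0.40, 0.45])           # hypothetical marginal means mu_st
G = np.diag(np.sqrt(mu * (1.0 - mu)))       # assumed G_s = diag(sqrt(mu_st (1 - mu_st)))

def exchangeable(diag_value, off_value, size):
    """Matrix with diag_value on the diagonal and off_value elsewhere."""
    return off_value * np.ones((size, size)) + (diag_value - off_value) * np.eye(size)

Omega = exchangeable(1.0, 0.20, T)          # toy within-subject correlation
Phi = exchangeable(0.05, 0.03, T)           # toy between-subject correlation

# Covariance of the stacked residuals (e_1', ..., e_J')' of one cluster:
# G Omega G on the J diagonal blocks, G Phi G on the J(J-1) off-diagonal blocks.
within = G @ Omega @ G
between = G @ Phi @ G
V = np.kron(np.eye(J), within - between) + np.kron(np.ones((J, J)), between)

# sum_j X' e_j = (1_J' kron X') e, so its covariance is this quadratic form.
stacked_X = np.kron(np.ones((J, 1)), X)
direct = stacked_X.T @ V @ stacked_X

# Per-sequence term of E, without the weight p_s.
closed = J * X.T @ G @ (Omega + (J - 1) * Phi) @ G @ X

print(np.allclose(direct, closed))
```

The direct quadratic form and the closed expression agree for any choice of `Omega` and `Phi`, since the check only relies on the block structure of `V`.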

We are only interested in $\sigma_{\zeta}^{2}$, which is the $(T + 1, T + 1)$-component of $\Sigma = A^{-1} E A^{-1}$. The last row of $A^{-1}$ can be simplified as

$$\Big[ J \sum_{t = 1}^{T} \sum_{s = 1}^{S} w_{st} \bar{a}_{t} (1 - \bar{a}_{t}) \Big]^{-1} [\begin{array}{cc} - \bar{a}^{'} & 1 \end{array}],$$

where $w_{st} = p_{s} \mu_{st} (1 - \mu_{st})$, $\bar{a}_{t} = \frac{\sum_{s = 1}^{S} w_{st} v_{st}}{\sum_{s = 1}^{S} w_{st}}$ is the weighted proportion of subjects receiving intervention at time $t$, and $\bar{a} = (\bar{a}_{1}, \dots, \bar{a}_{T})^{'}$. Then, we have

$$\begin{aligned}
\sigma_{\zeta}^{2} & = \Big[ J \sum_{t = 1}^{T} \sum_{s = 1}^{S} w_{st} \bar{a}_{t} (1 - \bar{a}_{t}) \Big]^{-2} [\begin{array}{cc} - \bar{a}^{'} & 1 \end{array}] \, E \, [\begin{array}{cc} - \bar{a}^{'} & 1 \end{array}]^{'} \\
& = \Big[ J \sum_{t = 1}^{T} \sum_{s = 1}^{S} w_{st} \bar{a}_{t} (1 - \bar{a}_{t}) \Big]^{-2} [\begin{array}{cc} - \bar{a}^{'} & 1 \end{array}] \, J \sum_{s = 1}^{S} p_{s} X_{s}^{'} G_{s} [\Omega + (J - 1) \Phi] G_{s} X_{s} \, [\begin{array}{cc} - \bar{a}^{'} & 1 \end{array}]^{'} \\
& = \frac{\sum_{s = 1}^{S} p_{s} (v_{s} - \bar{a})^{'} G_{s} [\Omega + (J - 1) \Phi] G_{s} (v_{s} - \bar{a})}{J \Big[ \sum_{t = 1}^{T} \big( \sum_{s = 1}^{S} w_{st} \big) \bar{a}_{t} (1 - \bar{a}_{t}) \Big]^{2}} .
\end{aligned}$$
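As a sanity check on the algebra, the closed-form $\sigma_{\zeta}^{2}$ can be compared numerically with the $(T + 1, T + 1)$ entry of $A^{-1} E A^{-1}$. The sketch below is illustrative only and rests on assumptions not spelled out in this appendix: $X_{s} = [I_{T}, v_{s}]$ with binary $v_{st}$, $G_{s}$ diagonal with entries $\sqrt{\mu_{st}(1 - \mu_{st})}$, and marginal means generated from a logistic model with period effects `beta` and intervention effect `zeta`. The function name and every numerical input are hypothetical.

```python
import numpy as np

def sigma2_closed_and_sandwich(p, V, mu, Omega, Phi, J):
    """Return (closed-form sigma_zeta^2, (T+1, T+1) entry of A^{-1} E A^{-1})."""
    S, T = V.shape
    w = p[:, None] * mu * (1.0 - mu)               # w_st, shape (S, T)
    a_bar = (w * V).sum(axis=0) / w.sum(axis=0)    # weighted proportion on intervention
    M = Omega + (J - 1) * Phi

    A = np.zeros((T + 1, T + 1))
    E = np.zeros((T + 1, T + 1))
    numerator = 0.0
    for s in range(S):
        X = np.hstack([np.eye(T), V[s][:, None]])  # assumed X_s = [I_T, v_s]
        G = np.diag(np.sqrt(mu[s] * (1.0 - mu[s])))
        A += J * p[s] * X.T @ G @ G @ X
        E += J * p[s] * X.T @ G @ M @ G @ X
        d = V[s] - a_bar                           # v_s - a_bar
        numerator += p[s] * d @ G @ M @ G @ d
    denominator = J * (w.sum(axis=0) * a_bar * (1.0 - a_bar)).sum() ** 2

    A_inv = np.linalg.inv(A)
    return numerator / denominator, (A_inv @ E @ A_inv)[T, T]

# Toy stepped wedge design: 3 sequences, 4 periods, equal allocation (hypothetical).
V = np.array([[0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
p = np.full(3, 1.0 / 3.0)
beta = np.array([-1.0, -0.9, -0.8, -0.7])          # hypothetical period effects
zeta = 0.5                                         # hypothetical intervention effect
mu = 1.0 / (1.0 + np.exp(-(beta[None, :] + zeta * V)))
n_periods = V.shape[1]
Omega = 0.2 * np.ones((n_periods, n_periods)) + 0.8 * np.eye(n_periods)
Phi = 0.03 * np.ones((n_periods, n_periods)) + 0.02 * np.eye(n_periods)

closed, sandwich = sigma2_closed_and_sandwich(p, V, mu, Omega, Phi, J=10)
print(closed, sandwich)
```

The two values should agree up to floating-point error. The agreement depends on $v_{st}$ being binary, so that $v_{st}^{2} = v_{st}$ when the last row of $A^{-1}$ is simplified.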

The required number of clusters is

$$\begin{aligned}
n & = \frac{(z_{1 - \alpha / 2} + z_{1 - \gamma})^{2} \sigma_{\zeta}^{2}}{\zeta_{0}^{2}} \\
& = \frac{(z_{1 - \alpha / 2} + z_{1 - \gamma})^{2} \sum_{s = 1}^{S} p_{s} (v_{s} - \bar{a})^{'} G_{s} [\Omega + (J - 1) \Phi] G_{s} (v_{s} - \bar{a})}{\zeta_{0}^{2} J \Big[ \sum_{t = 1}^{T} \big( \sum_{s = 1}^{S} w_{st} \big) \bar{a}_{t} (1 - \bar{a}_{t}) \Big]^{2}} .
\end{aligned}$$
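Once $\sigma_{\zeta}^{2}$ is available, the first line of the formula is a one-line computation. The sketch below is a hedged illustration using only the Python standard library. It assumes $\alpha$ is the two-sided type I error and $1 - \gamma$ is the target power; the function names are hypothetical, and the power function is simply the same normal approximation rearranged, not a result stated in this appendix.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for quantiles and the CDF

def required_clusters(sigma2, zeta0, alpha=0.05, gamma=0.20):
    """Number of clusters n = (z_{1-alpha/2} + z_{1-gamma})^2 sigma2 / zeta0^2, rounded up."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_gamma = _STD_NORMAL.inv_cdf(1.0 - gamma)
    return math.ceil((z_alpha + z_gamma) ** 2 * sigma2 / zeta0 ** 2)

def approximate_power(n, sigma2, zeta0, alpha=0.05):
    """Power implied by the same normal approximation for a given number of clusters."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(math.sqrt(n) * abs(zeta0) / math.sqrt(sigma2) - z_alpha)

# Hypothetical inputs: sigma_zeta^2 = 4.0 and a log odds ratio of 0.5 under the alternative.
n_clusters = required_clusters(sigma2=4.0, zeta0=0.5)
print(n_clusters, approximate_power(n_clusters, sigma2=4.0, zeta0=0.5))
```

Rounding up guarantees that the approximate power at the returned `n_clusters` is at least $1 - \gamma$. The approximation does not include any small-sample variance correction.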

