Statistical Issues and Challenges in Clinical Trials for COVID-19 Treatments, Vaccines, Medical Devices and Diagnostics

Mitigating Study Power Loss Caused by Clinical Trial Disruptions Due to the COVID-19 Pandemic: Leveraging External Data via Propensity Score-Integrated Approaches

Pages 87-91 | Received 23 Sep 2020, Accepted 02 Dec 2020, Published online: 26 Jan 2021

Abstract

The spread of COVID-19 has created tremendous challenges to ongoing clinical studies essential to finding effective treatments and cures for a myriad of diseases, with some studies having suspended enrollment altogether. This perspective article focuses on the loss of power in clinical studies disrupted by the pandemic. It introduces an innovative use of the recently developed propensity score-integrated approaches: salvaging those stopped studies for which restarting enrollment is not feasible, by integrating external patients with data already collected to recover the loss of study power due to the premature stopping. A hypothetical example is provided to illustrate how to implement these methods while preserving study integrity.

1 Introduction

During the current COVID-19 public health emergency, tremendous challenges have been encountered in the design, conduct, and data analysis of ongoing clinical trials. This perspective article focuses on the recovery of lost power while preserving study integrity for clinical trials that are prematurely stopped as a result of the COVID-19 pandemic (van Dorn 2020). Clinical studies in varying stages, including single-arm studies, have suspended enrollment under the current extraordinary circumstances. In some cases, enrollment into the clinical study will restart and the study can complete its course; however, this is not always feasible or practical. With that in mind, the question is: What can be done with the study data that have already been collected? Depending on how close the study is to the enrollment target, testing the prespecified study hypotheses using only the available data may not be meaningful due to insufficient power. Since discarding the data already collected would not only waste invested resources but could also result in the unavailability of a potentially safe and effective medical product, we considered the possibility of salvaging the stopped study (hereafter referred to as the current study) by mitigating the loss of power. Strategies for mitigating power loss generally fall into two categories: (1) using patients in the current study only and including additional follow-up time (e.g., for time-to-event endpoints) or repeated measurements (e.g., for longitudinal endpoints); (2) leveraging external patients such as those extracted from a historical clinical study or a real-world data (RWD) source (Akacha et al. 2020; Meyer et al. 2020). In the current article, we adapt recently developed methodological tools for the implementation of the second type of strategy.
Before putting these statistical methods into practice, however, it is important to consider the following: (1) whether leveraging external data is fit for purpose given the specific objectives of the current study, (2) whether the external data are relevant to the clinical question being asked (relevance), and (3) whether the external data have the quality and integrity essential for regulatory decision making (reliability) (US Food and Drug Administration 2017). It is also essential that protocol amendments for unplanned mid-study modifications in response to the pandemic be made with integrity and transparency in an outcome-free manner (Rubin 2008; Yue, Lu, and Xu 2014). From a regulatory perspective, whether the strategies referred to in Meyer et al. (2020) and Akacha et al. (2020) are viable is decided on a case-by-case basis, and ideally, the sponsor will consult with the relevant FDA review divisions in advance (US Food and Drug Administration 2020a, 2020b). Statistically, these strategies can be viewed as borrowing patients from an external data source to augment the current study data. By “borrowing patients,” we mean borrowing data collected on these patients. Innovative statistical methods are needed to implement this idea. This perspective article aims to draw attention to some such recent statistical innovations, namely the propensity score-integrated power prior approach and the propensity score-integrated composite likelihood approach (collectively known as the propensity score-integrated approaches) (Chen et al. 2020; Wang et al. 2019, 2020). These statistical procedures were developed for the planned leveraging of RWD in a prospectively designed clinical study.
Here, we introduce a new and unique application of these procedures as a mitigating strategy to salvage studies that are underpowered as a result of inability to complete enrollment due to the current pandemic.

2 Propensity Score-Integrated Approaches

The propensity score-integrated approaches are essentially a statistical procedure that makes borrowing external patients more justified, by using propensity scores to form strata in such a way that within each stratum external patients are similar to those in the current study, and by carrying out the borrowing within the propensity score strata (in a sense that will become clear in Section 3). The idea is that borrowing is more reasonable if the external patients being borrowed are similar to their “borrowers” in the current study. Here, “similar” means alike in terms of observed covariates. Thus, two groups of patients are similar if the distribution of observed covariates in one group is close to that in the other, in which case we say that the two groups are balanced in these covariates. Therefore, what propensity score-integrated approaches do can also be summarized as “borrowing after balancing.” To show how this works, let us first review the original definition of the propensity score for comparative studies (Rosenbaum and Rubin 1983, 1984), and then adapt this definition to suit our goal. In a comparative study, the propensity score e(X) for a patient with a vector X of observed baseline covariates is the conditional probability of being in the treated group (T = 1) rather than the control group (T = 0) given the vector of baseline covariates X:

$$e(X) = \Pr(T = 1 \mid X) \tag{1}$$

The propensity score (PS) is a balancing score in the sense that, conditional on the PS, the distribution of observed baseline covariates is the same between the treated and control patients. This implies that when the PSs are balanced across the two treatment groups, the distributions of all the observed covariates are balanced in expectation across the two groups. In practice, patients’ PSs are estimated by modeling the probability of treatment group membership T as a function of the observed covariates, typically via logistic regression. Estimated PSs can then be used for matching, weighting, or stratification, to balance the treated group and the control group and thereby reduce bias in the statistical inference for treatment effects.

Since our objective is to create strata within which the observed covariates are balanced between the external patients and the current study patients, the PS is defined accordingly as the conditional probability of a patient coming from the current study rather than the external data source given the value of the covariates. More formally, labeling patients from the current study Z = 1 and patients from the external data source Z = 0, we define the PS as

$$e(X) = \Pr(Z = 1 \mid X) \tag{2}$$

where X is the vector of observed covariates. The PS so defined is still a balancing score, which means that, as in the case of comparative studies, when the PSs are balanced between the external patients and the current study patients, all the observed covariates are balanced in expectation across the two groups. To take advantage of this balancing property, one can form strata in which estimated PSs are relatively homogeneous, so that within each PS stratum the distribution of observed covariates among the external patients is close to that among patients in the current study, and balance for all covariates within each stratum is therefore expected. In practice, balance is assessed for each covariate, and if it is not satisfactory for some covariates, the PSs may be re-estimated. This makes the entire process of PS estimation, PS stratification, and balance assessment, which is called PS design, an iterative one.
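The stratification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the propensity scores are simulated rather than estimated from a fitted model, the function name `assign_strata` is hypothetical, and forming strata from the quantiles of the current-study scores and trimming external patients outside the current-study range are assumed conventions that the text above does not prescribe.

```python
import random
import statistics

def assign_strata(ps_current, ps_external, n_strata=5):
    """Place patients into PS strata (hypothetical helper).

    Assumption: strata boundaries are quantiles of the current-study scores.
    """
    # Interior cut points, e.g. 4 cut points for 5 strata.
    cuts = statistics.quantiles(ps_current, n=n_strata)
    lo, hi = min(ps_current), max(ps_current)

    def stratum(p):
        # Stratum index = number of cut points that p exceeds (0-based).
        return sum(p > c for c in cuts)

    current = [stratum(p) for p in ps_current]
    # Assumption: external patients outside the current-study PS range
    # are trimmed (marked None) rather than assigned to a stratum.
    external = [stratum(p) if lo <= p <= hi else None for p in ps_external]
    return current, external

# Simulated scores standing in for estimated PSs (illustrative only).
random.seed(0)
ps_cur = [random.betavariate(4, 2) for _ in range(290)]
ps_ext = [random.betavariate(2, 3) for _ in range(941)]

cur, ext = assign_strata(ps_cur, ps_ext)
cur_counts = [cur.count(s) for s in range(5)]
ext_counts = [ext.count(s) for s in range(5)]
print("current study per stratum:", cur_counts)
print("external per stratum:", ext_counts, "trimmed:", ext.count(None))
```

With quantile-based boundaries the current-study patients are spread nearly evenly across strata, while the number of external patients per stratum depends on how the two score distributions overlap. Covariate balance would still need to be checked within each stratum before the design is accepted.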

Having explained how balance can be achieved, let us now introduce two ways to borrow external patients while down-weighting them, one Bayesian, using power prior, and the other frequentist, using composite likelihood. Down-weighting is needed when the number of patients available from the external source is larger than the number of patients that need to be borrowed by the current study (which is usually the case), and we want to limit the influence these external patients have on the study results.

The power prior (Chen and Ibrahim 2000) was originally intended to be an informative prior constructed from historical data (Ibrahim et al. 2015). If we substitute external data for historical data, the method fits our purpose well. A power prior π for a parameter θ based on external data D0 is constructed as follows:

$$\pi(\theta) \propto [L(\theta \mid D_0)]^{\alpha} \, \pi_0(\theta), \tag{3}$$

where L(θ|D0) is the likelihood function of θ given the external data, π0(θ) is the initial prior distribution for θ, and α (0 ≤ α ≤ 1) is called the power parameter. This prior is multiplied by the likelihood function of θ given the current study data D1, L(θ|D1), to obtain the posterior distribution of θ,

$$\pi(\theta \mid D_1) \propto L(\theta \mid D_1) \, \pi(\theta), \tag{4}$$

completing the statistical inference for θ. From this construction, α can be interpreted as the fraction of information external patients contribute to the inference for θ. For example, if α = 0.1, each external patient contributes 10% of their information, and the total amount of information the external patients bring to the statistical inference is equivalent to the information contributed by 0.1 times the total number of external patients, which can be interpreted as the (nominal) number of patients being borrowed for some common distributions such as the normal and binomial. If α = 1, the number of patients borrowed is equal to the number of all the patients constituting D0. At the other extreme, if α = 0, no patients are borrowed.
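For a binary endpoint the power prior has a closed form, which may make the role of α more concrete. The sketch below assumes a binomial likelihood and a Beta(a, b) initial prior, in which case the posterior is again a Beta distribution; the function name and all counts are hypothetical and chosen only for illustration.

```python
def power_prior_posterior(y1, n1, y0, n0, alpha, a=1.0, b=1.0):
    """Beta posterior parameters under a binomial power prior.

    y1, n1: events and patients in the current study (D1)
    y0, n0: events and patients in the external data (D0)
    alpha:  power parameter in [0, 1]
    a, b:   parameters of the initial Beta prior (assumed, default uniform)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # External events and non-events are discounted by alpha;
    # current-study data enter at full weight.
    post_a = a + y1 + alpha * y0
    post_b = b + (n1 - y1) + alpha * (n0 - y0)
    return post_a, post_b

# Hypothetical counts: 58 current patients, 200 external patients.
post_a, post_b = power_prior_posterior(y1=18, n1=58, y0=60, n0=200, alpha=0.09)
nominal_borrowed = 0.09 * 200  # alpha times the number of external patients
print(post_a, post_b, nominal_borrowed)
```

In this illustration the 200 external patients contribute the information of a nominal 18 patients, consistent with the interpretation of α given above.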

The composite likelihood (Varin, Reid, and Firth 2011) for the parameter of interest θ is a weighted product of probability density functions:

$$L(\theta \mid Y) = \prod_{i} f(y_i \mid \theta)^{\lambda_i}, \tag{5}$$

where each i represents a patient and λi is a nonnegative weight. Clearly, when all the λi’s are equal to 1, the composite likelihood reduces to the ordinary likelihood. To use the composite likelihood for our purpose, we let λi = 1 for patients from the current study and 0 < λi ≤ 1 for patients from the external data source. If statistical inference for θ is conducted based on the composite likelihood after giving the λi’s numerical values in this way, then we are essentially down-weighting the external patients relative to the current study patients. For example, if λi = 0.1 for all external patients, then each external patient contributes roughly 10% of their information, and the (nominal) number of patients borrowed is 0.1 times the total number of external patients.
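As a worked illustration, consider a binary outcome with $y_i \in \{0, 1\}$ and $f(y_i \mid \theta) = \theta^{y_i}(1-\theta)^{1-y_i}$. The notation $r_1, n_1$ (events and patients in the current study) and $r_0, n_0$ (events and patients in the external data), and the use of a single common weight $\lambda$ for all external patients, are assumptions made here for simplicity. The log composite likelihood and its maximizer are then

```latex
\begin{aligned}
\ell(\theta) &= \sum_{i} \lambda_i \left[ y_i \log\theta + (1 - y_i)\log(1 - \theta) \right], \\
\frac{d\ell}{d\theta} = 0 \;\Longrightarrow\;
\hat{\theta} &= \frac{\sum_i \lambda_i y_i}{\sum_i \lambda_i}
             = \frac{r_1 + \lambda\, r_0}{n_1 + \lambda\, n_0}.
\end{aligned}
```

The estimate is a weighted event rate in which the external data count as $\lambda n_0$ nominal patients; setting $\lambda = 0$ recovers the current-study rate $r_1 / n_1$.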

The propensity score-integrated power prior approach (PSPP) and the propensity score-integrated composite likelihood approach (PSCL) were originally developed for leveraging RWD to reduce the sample size required for a traditional prospective clinical study. While the power prior or composite likelihood in these approaches is used for statistical inference, PS design serves two objectives: (1) selecting external patients who are comparable to those enrolled in the current study in terms of baseline covariates, and (2) determining the weights used to down-weight information from external patients in statistical inference, as will be illustrated in the next section.

To ensure the integrity of study design and interpretability of statistical inference results, PS design needs to be outcome-free, that is, performed with no outcome data in sight. This is because, as has been mentioned, the goal of PS design is to adequately balance covariates, and, to improve balance, the PS often needs to be re-estimated multiple times. Such flexibility creates a problem that must be addressed: how to maintain study objectivity when outcome data already exist, which presents an opportunity for data dredging. Outcome-free design is essentially a matter of blinding or masking, which can also be referred to as building a firewall in the biopharmaceutical arena. Various schemes have been devised for this purpose. The scheme that we propose is for the investigator of the study to identify an independent statistician, to whom no outcome data are provided, to perform the PS design. The independent statistician shares with the investigator the responsibility of upholding the outcome-free principle (Yue, Lu, and Xu 2014; Li et al. 2016; Yue et al. 2016; Lu, Xu, and Yue 2019, 2020; Xu et al. 2020). Given their important role, the independent statistician needs to be identified in the study proposal.

Having provided an outline of PS-integrated approaches, we next present a hypothetical example to illustrate how they can be implemented to salvage a clinical study disrupted by the COVID-19 pandemic.

3 A Hypothetical Example

In this section, we use a hypothetical example to illustrate the use of PSPP and PSCL to recover power for a clinical study stopped prematurely due to the COVID-19 pandemic. For simplicity (and, as shown in the next section, without loss of generality), consider a single-arm study for a medical device with a planned sample size of 380 and the following study hypotheses

$$H_0: \theta \ge 36\% \quad \text{vs.} \quad H_a: \theta < 36\%,$$

where θ is the unknown true one-year adverse event rate. The sample size of 380 was obtained as follows: assuming θ = 30%, 380 patients would provide approximately 80% power at the significance level of 0.05 (one-sided). Suppose that the enrollment was stopped at 290 patients due to the current pandemic, and that it is not practical to reopen the enrollment at a later time. To salvage the study, it was proposed to borrow 90 (= 380 − 290) patients from a high-quality registry for this device in Europe (the device had been approved in the EU), using PS-integrated approaches. An independent statistician was also identified in the proposal.
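The power figures in this example can be checked approximately. The sketch below uses a normal approximation to the one-sample binomial test, which is an assumption: the text does not state which method produced the sample size of 380, and the function name is hypothetical.

```python
from statistics import NormalDist

def power_one_sided(n, theta0=0.36, theta1=0.30, alpha=0.05):
    """Approximate power for H0: theta >= theta0 vs. Ha: theta < theta0.

    Uses a normal approximation to the observed event rate (assumed method).
    """
    z = NormalDist().inv_cdf(1 - alpha)
    se0 = (theta0 * (1 - theta0) / n) ** 0.5  # standard error under H0
    se1 = (theta1 * (1 - theta1) / n) ** 0.5  # standard error under theta1
    # H0 is rejected when the observed rate falls below this critical value.
    critical = theta0 - z * se0
    # Power = probability of falling below the critical value when theta = theta1.
    return NormalDist(theta1, se1).cdf(critical)

power_planned = power_one_sided(380)
power_stopped = power_one_sided(290)
print(round(power_planned, 3), round(power_stopped, 3))
```

Under this approximation the planned design has power of roughly 0.80, while stopping at 290 patients leaves power of roughly 0.69, which is the loss the borrowing is meant to recover.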

Based on the patient inclusion/exclusion criteria specified in the current study, 941 patients were selected from the registry. Based on the covariate data of 1231 (290 + 941) patients from the current study and the registry, a propensity score model was created by the independent statistician, who was blinded to the outcome data, using logistic regression. Five PS strata were formed, and balance for each covariate was checked using numerical and graphical methods. The process of PS estimation, PS stratification, and covariate balance assessment was iterated several times, with the logistic regression model adjusted each time by adding higher order and cross product terms to improve balance. The iterative process stopped when balance was found to be satisfactory. The numbers of patients in the PS strata are displayed in Table 1.

Table 1 Sample size in each PS stratum.

Recall that to recover power loss it was proposed to borrow 90 external patients, which means the total amount of information being borrowed is equivalent to 90 external patients. Since borrowing takes place within each stratum, the 90 patients need to be allocated to the 5 PS strata. There are many possible ways to do so. One may allocate an equal number of patients (i.e., 90/5 = 18) to each stratum. The strategy employed in this example is to make the nominal number of patients to be borrowed in each stratum proportional to the similarity of external patients and the current study patients in terms of baseline covariates in that stratum. This similarity is measured by an overlapping coefficient (Inman and Bradley 1989), that is, the overlapping area of the propensity score distributions of the two groups of patients (other reasonable measures could also be used). The overlapping coefficients are then standardized so that they add up to 1. The standardized overlapping coefficients times the total nominal number of patients being borrowed (90) determine the nominal number of patients being borrowed in each stratum. In this example, the number of patients allocated to each stratum using our strategy (as shown in Table 2) is close to that using equal allocation.

Table 2 Overlapping coefficient, standardized overlapping coefficient, nominal number of patients to be borrowed, and power parameter (or weight) in each stratum.

The power parameter α in the Bayesian approach, or the exponent λ of the composite likelihood in the frequentist approach, can then be obtained in each PS stratum by dividing the nominal number of external patients to be borrowed by the total number of external patients in that stratum. Once α (or λ) had been determined in each PS stratum, the fraction of information each external patient contributes was known, and the study design was complete. The overlapping coefficient, the standardized overlapping coefficient, the nominal number of patients to be borrowed, and α (or λ) in each stratum in this hypothetical example are presented in Table 2. Here, again, all the above design activities were performed by the independent statistician, who was blinded to the outcome data.
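The allocation rule just described can be written out as a short calculation. In the sketch below, the overlapping coefficients and external stratum sizes are made-up illustrative numbers (they are not the values of Tables 1 and 2), and the function name is hypothetical; only the arithmetic follows the description above.

```python
def allocate_borrowing(overlap, n_external, total_borrow):
    """Split a nominal number of borrowed patients across PS strata.

    overlap:      overlapping coefficient per stratum
    n_external:   number of external patients per stratum
    total_borrow: total nominal number of patients to borrow
    """
    total = sum(overlap)
    # Standardize the overlapping coefficients so they add up to 1.
    standardized = [v / total for v in overlap]
    # Nominal number borrowed per stratum is proportional to similarity.
    nominal = [w * total_borrow for w in standardized]
    for m, n in zip(nominal, n_external):
        if m > n:
            raise ValueError("cannot borrow more patients than a stratum contains")
    # Power parameter (or weight) = nominal borrowed / external available.
    weights = [m / n for m, n in zip(nominal, n_external)]
    return standardized, nominal, weights

# Illustrative inputs only: 5 strata, 941 external patients in total.
overlap = [0.80, 0.85, 0.90, 0.85, 0.60]
n_external = [300, 250, 200, 130, 61]
standardized, nominal, weights = allocate_borrowing(overlap, n_external, 90)
for s, (m, w) in enumerate(zip(nominal, weights), start=1):
    print(f"stratum {s}: borrow {m:.2f} nominal patients, weight {w:.4f}")
```

Strata with fewer external patients receive larger weights for the same nominal number borrowed, because each of their external patients must contribute a larger fraction of information.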

After clinical outcomes were observed on and unblinded for all the patients, the statistical inference was conducted. For the Bayesian approach, the power prior is applied within each stratum to obtain stratum-specific posterior distributions, which are then combined to complete the inference for the parameter of interest θ. In this example, the posterior probability of θ < 36% is 96.9%, which meets the study success criterion. For the frequentist approach, the composite likelihood is constructed within each stratum to obtain stratum-specific maximum likelihood estimates, which are then combined to complete the inference for the parameter of interest θ. The combined maximum likelihood estimate of θ is 31%, with a one-sided p-value of 0.01, which also meets the study success criterion. This concludes our straw man example, and hopefully the meaning of “borrowing within PS strata” is now clear. For more details of this procedure see Wang et al. (2019, 2020).
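One way to picture the Bayesian combination step is by Monte Carlo. The sketch below makes two assumptions that are not stated in the text: each stratum-specific posterior is a Beta distribution (as arises from a binomial likelihood with a Beta initial prior), and the overall θ is a weighted average of the stratum-specific parameters with weights proportional to the current-study stratum sizes. The function name and all numbers are hypothetical and do not reproduce the 96.9% reported above.

```python
import random

def prob_theta_below(strata, threshold=0.36, draws=20000, seed=1):
    """Monte Carlo estimate of Pr(theta < threshold | data).

    strata: list of (post_a, post_b, weight) tuples, one per PS stratum,
            with Beta posterior parameters and weights summing to 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Draw each stratum parameter from its posterior, then combine
        # them by weighted average (an assumed combination rule).
        theta = sum(w * rng.betavariate(a, b) for a, b, w in strata)
        hits += theta < threshold
    return hits / draws

# Hypothetical stratum posteriors with equal weights.
strata = [
    (20.0, 46.0, 0.2),
    (19.0, 47.0, 0.2),
    (21.0, 45.0, 0.2),
    (18.0, 48.0, 0.2),
    (20.0, 46.0, 0.2),
]
p_success = prob_theta_below(strata)
print(p_success)
```

The resulting probability would be compared with a prespecified success threshold; Wang et al. (2019) give the actual combination rule used in PSPP.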

4 From Single-Arm Studies to Randomized Controlled Trials

In the previous section, a hypothetical example is used to describe how PSPP and PSCL can be applied to salvage a single-arm medical device study whose enrollment is curtailed due to the COVID-19 pandemic, by incorporating registry data from Europe where the device has been on the market. While single-arm studies are common for medical devices, randomized controlled trials (RCTs) are certainly a mainstay. Therefore, it is important to highlight the straightforward applicability of these approaches to the augmentation of RCTs with external data. Consider an RCT for a medical device cut short by the COVID-19 pandemic. Just as in the previous section, suppose registry data are available in Europe for the device, and, additionally, registry data are available in the United States for the control therapy (surgery). Both registries are of high quality and it is deemed appropriate to borrow their data to recover power for the statistical inference of the treatment effect μ = θ_T − θ_C, where T stands for the treated group (device) and C stands for the control group (surgery). To implement this idea, the statistical procedures illustrated in the previous section can be used to borrow from the European registry to augment the treated group and from the US registry to augment the control group, as if each were a single-arm study. For PSPP, posterior distributions for θ_T and θ_C are obtained separately and then, based on the independence of θ_T and θ_C, the posterior distribution of θ_T − θ_C can be found. For PSCL, independent point estimates and their standard errors are obtained separately for θ_T and θ_C and then appropriately combined for the point and interval estimation of θ_T − θ_C. The same steps can be followed for other estimands of treatment effect such as θ_T/θ_C. If only one of the two arms is augmented with external data, then PSPP or PSCL only needs to be applied once to that arm because the other arm does not borrow any external data.
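For the frequentist case, the combination of the two arms can be written explicitly. The display below is a sketch that assumes the two arm-level estimates are independent and that a Wald-type (normal approximation) interval is acceptable; the article does not specify the interval construction.

```latex
\begin{aligned}
\hat{\mu} &= \hat{\theta}_T - \hat{\theta}_C, \\
\widehat{\mathrm{SE}}(\hat{\mu}) &= \sqrt{\widehat{\mathrm{SE}}(\hat{\theta}_T)^2 + \widehat{\mathrm{SE}}(\hat{\theta}_C)^2}, \\
\text{CI}_{1-\alpha} &= \hat{\mu} \pm z_{1-\alpha/2}\, \widehat{\mathrm{SE}}(\hat{\mu}).
\end{aligned}
```

The variances add because the arms are independent, so borrowing in either arm narrows the interval for $\mu$ through that arm's standard error. In this display $\alpha$ denotes the significance level, not the power parameter.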

5 Concluding Remarks

The public health emergency due to COVID-19 has caused major disruptions to ongoing clinical trials, which pose significant statistical challenges. The PS-integrated methodological contributions presented in this article may be useful in mitigating the impact of these disruptions. These PS-integrated approaches were recently developed as a methodology for planned leveraging of RWD to augment a prospectively designed clinical study, thereby reducing the length and cost of the study. The current article demonstrates how this methodology can be used to salvage a single-arm study or an RCT stopped due to the pandemic by incorporating external data to make up for the lost power as suggested by Meyer et al. (2020) and Akacha et al. (2020).

It is important to note that the incorporation of external data does not change the original objective of the study. In particular, the estimand of the study stays the same. While the external patients are used to replace the patients not enrolled due to the pandemic, they will not be used to replace the patients already enrolled but with missing outcomes. Missing data that are missing completely at random (MCAR) or missing at random (MAR) will be handled in the usual manner. We agree with the observation of Meyer et al. (2020) that if relevant site-specific and participant-specific information related to missingness is collected during the study, most of the pandemic-related missingness can be considered MCAR or MAR. For missing data that are missing not at random (MNAR), sensitivity analyses would be needed to assess the effect of deviations from MAR.

The proposed methodology relies on the technique of propensity score, which has limitations in its application. There is no guarantee that satisfactory balance can be obtained for each covariate through propensity score design. In fact, if the distribution of PS among the current study patients and that among the external patients have little overlap, covariate balance may not be achievable. That means for the PS-integrated approaches to work, patients in the current study and external patients cannot be too dissimilar in terms of covariate distribution.

As can be seen from Table 2, in each stratum, the weight α (or λ) assigned to external patients is the nominal number of borrowed patients divided by the number of external patients available to be borrowed. The within-stratum nominal numbers of borrowed patients are proportional to a similarity measure of external patients and the current study patients and add up to the nominal number of borrowed external patients for the whole study. In this application this number can be set as the difference between the planned study sample size and the sample size at the premature stopping due to COVID-19. Wang et al. (2019, 2020) contain simulations for operating characteristics, which show that bias increases as the total nominal number of borrowed external patients increases, when covariate distributions are different between the current study patients and external patients prior to stratification.

Finally, for the proposed methodology to be applicable, both the relevance and reliability of the external data source are essential. It is also critical that the principle of outcome-free design (Rubin 2008; Yue, Lu, and Xu 2014) be strictly followed in the execution of the PS-integrated approaches to maintain the study integrity so that the study results are interpretable. Early consultation with relevant FDA review divisions is important.

Acknowledgments

The authors would like to thank the associate editor and two anonymous reviewers for their insightful comments, which have greatly improved this article.
