Sequential Analysis
Design Methods and Applications
Volume 27, 2008 - Issue 1
Original Articles

Discussion on “Second-Guessing Clinical Trial Designs” by Jonathan J. Shuster and Myron N. Chang

Pages 26-29 | Received 11 Sep 2006, Accepted 14 Mar 2007, Published online: 04 Feb 2008

Abstract

In discussing the paper by Shuster and Chang, we focus on the distinction between criteria for stopping a clinical trial early versus criteria for achieving statistical significance during the conduct of the trial. It is important to recognize that statistical significance is one of many factors that a data and safety monitoring board (DSMB) must consider in making the determination whether or not to stop a trial early (and thus the reference to interim testing boundaries as stopping rules is a misnomer). Some greatly simplified examples are provided. We conclude that one should tread cautiously in attempting to second-guess decisions made by a DSMB, and one should give careful consideration to the nonstatistical issues involved.

1. INTRODUCTION

The authors consider clinical trials that were conducted without interim testing and provide a provocative discussion of the possibilities for early stopping that might have arisen had sequential methods been incorporated in the study design. As indicated in their abstract, the focus is to assist biostatistical reviewers of such trials and to provide them with methodology for imposing a sequential analysis on the completed study results. While the focus is thus on the somewhat technical, statistical issues of early termination, I believe that any discussion of early termination must consider the larger context in which it occurs. Specifically, the question that a Data and Safety Monitoring Board (DSMB) must address is not just “Can the study be terminated early?” but “Should the study be terminated early?” I will therefore discuss the paper by Shuster and Chang somewhat indirectly, by addressing this larger question.

2. WHEN SHOULD A CLINICAL TRIAL BE STOPPED EARLY?

A succinct answer to this question is “When it is in the best interests of patients.” The patient population referred to here includes both patients who are included in the trial as well as patients who are not, and patients who currently have the disease as well as those who will develop the disease in the future. While the overriding concern is the welfare of patients, the interest of the sponsor is, of course, an additional important consideration. Another consideration is the advancement of scientific knowledge, but one should keep in mind that the ultimate goal in all medical research is inevitably the welfare of patients.

Any listing of factors which impact the welfare of patients should include the following, most of which are primarily non-statistical in nature:

What is the safety profile?

Has efficacy or lack of efficacy been demonstrated? A satisfactory answer to this question must consider secondary as well as primary measures of efficacy, and estimation of magnitude of effect size as well as statistical significance.

Are there important questions about the nature of treatment effects which remain to be answered? Such questions may include an understanding about mechanisms of action and differential effects among subgroups.

Are alternate treatments available and, if so, how do their efficacy and safety compare to the effects observed for the drug under study?

If evidence of efficacy has been observed, might the effect be transient? If efficacy has not been observed, might the drug have a delayed benefit?

The answers to some of these questions may change markedly while the study is in progress. Although one would hope that sufficient expertise might be available within the DSMB to ensure that it is aware of the current state of knowledge, it might also be advisable to have study investigators (perhaps members of the Steering Committee) provide updates on these issues in open sessions of each meeting of the DSMB.

I illustrate some of the practical issues which a DSMB might encounter with some greatly oversimplified examples. Each of these examples supposes that 2/3 of the data has been collected and is available at the time of the interim analysis.

Example 2.1

A prespecified boundary for testing efficacy based on the primary endpoint has been crossed, thus implying that efficacy has been established and the study “can be” terminated. However, the evidence for efficacy observed among the secondary endpoints is disappointing.

Not infrequently in this situation, a DSMB will conclude that the study should not be terminated because the evidence observed in secondary endpoints is not sufficiently supportive. In particular, some secondary endpoints may be highly correlated with the primary endpoint, so that if these secondary endpoints are not supportive, doubt is cast on the credibility of the evidence observed for the primary endpoint.

Example 2.2

Suppose in Example 2.1 that the secondary endpoints were strongly supportive of efficacy and that the magnitude of the effect size was clinically important.

In this situation, one might still conclude that it is in the best interests of patients to continue the trial. For example, the interim analysis might suggest that the treatment effects vary considerably among subgroups of patients so that, in order to provide appropriate patient care, it is necessary to have a complete understanding of these subgroup differences.

Example 2.3

The conditional power calculations clearly indicate that achieving statistical significance for the primary endpoint is virtually impossible and corresponding confidence intervals indicate that the largest plausible effect size is too small to be clinically relevant.

One might nonetheless wish to continue the study. An example would be if important effects are observed among secondary endpoints. The observed safety profile of the drug would also weigh importantly in the decision whether or not to continue the trial.

Example 2.4

The conditional power is 70%, suggesting a real possibility for achieving statistical significance for the primary endpoint, and thus suggesting that the trial should be continued.

Again, it would be important to consider additional factors. Do confidence intervals indicate that a clinically meaningful effect is plausible? Do negative findings among the secondary endpoints adversely impact the plausibility of the results observed for the primary endpoint? Is the safety profile acceptable? Have alternate treatments become available which obviate the usefulness of the drug under study?
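The conditional power figures quoted in Examples 2.3 and 2.4 can be illustrated with a minimal sketch. It assumes a one-sided test on a normally distributed z-statistic with a single final critical value and the standard Brownian-motion approximation; these assumptions, the function name `conditional_power`, and the interim z-values are illustrative and are not taken from the paper under discussion.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def conditional_power(z_interim, info_frac, z_crit=1.96, drift=None):
    """Probability that the final z-statistic exceeds z_crit, given the interim z.

    info_frac is the fraction of total information observed so far (0 < t < 1).
    drift is the assumed expected value of the final z-statistic. If it is None,
    the current-trend estimate z_interim / sqrt(info_frac) is used (an assumption).
    """
    if not 0.0 < info_frac < 1.0:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    b_t = z_interim * sqrt(info_frac)      # B-value at the interim look
    if drift is None:
        drift = b_t / info_frac            # extrapolate the observed trend
    remaining = 1.0 - info_frac
    # The final z equals b_t plus an increment ~ N(drift * remaining, remaining).
    return 1.0 - STD_NORMAL.cdf((z_crit - b_t - drift * remaining) / sqrt(remaining))


# Interim look with 2/3 of the data, as in the examples; z-values are hypothetical.
t = 2.0 / 3.0
for z in (0.5, 1.6, 2.2):
    print(f"z = {z:.1f}: current trend {conditional_power(z, t):.3f}, "
          f"null {conditional_power(z, t, drift=0.0):.3f}")
```

The same interim result can give very different conditional power depending on the assumed drift, which is one reason such a number is best read alongside confidence intervals, secondary endpoints, and safety data rather than on its own.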

3. SOME TECHNICAL COMMENTS

The authors make a desirable contribution to clinical trials methodology by considering new interim testing designs with a goal to achieving optimal properties. Existing procedures for group sequential designs might also be considered, and comparisons to the newly proposed procedures would be of interest.

One alternate approach would be to fix the maximal sample size to be the same as originally specified, so that the effect of the interim testing is to lower power. An advantage is that the study design is the same, and only the data analysis changes. This also simplifies the process of imposing an alternate analysis plan. Credibility would be enhanced by choosing a commonly used group sequential procedure that the investigators or DSMB might likely have chosen and that would be familiar to other reviewers. Such a procedure, which has approximately the same power and maximal sample size as the single sample size design, would accomplish both objectives.
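The size of the power loss from keeping the maximal sample size fixed can be gauged numerically. The sketch below rests on assumptions that are not taken from the paper: a one-sided test at level 0.025, a single interim look at 2/3 of the information, an O'Brien-Fleming-shaped boundary (interim critical value c/sqrt(t), final critical value c), and a fixed-sample design powered at 90%. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_prob(c1, c2, t, drift, n_grid=2000):
    """P(reject H0) for a one-sided design with one interim look.

    Z1 is the z-statistic at information fraction t, Z2 the final z-statistic,
    and drift = E[Z2]. H0 is rejected if Z1 >= c1, or if Z1 < c1 and Z2 >= c2.
    n_grid must be even (Simpson's rule).
    """
    mean1 = drift * sqrt(t)                      # E[Z1]
    p_early = 1.0 - STD_NORMAL.cdf(c1 - mean1)   # rejection at the interim look
    lo = mean1 - 8.0                             # negligible mass below this point
    h = (c1 - lo) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        z1 = lo + i * h
        # Z2 given Z1 = z1 is N(sqrt(t) * z1 + drift * (1 - t), 1 - t).
        cond_mean = sqrt(t) * z1 + drift * (1.0 - t)
        p_final = 1.0 - STD_NORMAL.cdf((c2 - cond_mean) / sqrt(1.0 - t))
        weight = 1 if i in (0, n_grid) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.pdf(z1 - mean1) * p_final
    return p_early + total * h / 3.0


def obf_shaped_constant(t, alpha=0.025):
    """Bisect for c so that boundaries (c / sqrt(t), c) have overall level alpha."""
    lo = STD_NORMAL.inv_cdf(1.0 - alpha)
    hi = lo + 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if reject_prob(mid / sqrt(t), mid, t, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t, alpha = 2.0 / 3.0, 0.025
z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
drift = z_alpha + STD_NORMAL.inv_cdf(0.90)       # fixed design with 90% power
c = obf_shaped_constant(t, alpha)
power_fixed = 1.0 - STD_NORMAL.cdf(z_alpha - drift)
power_seq = reject_prob(c / sqrt(t), c, t, drift)
print(f"final critical value {c:.4f}, fixed power {power_fixed:.4f}, "
      f"sequential power {power_seq:.4f}")
```

Under these assumptions the sequential analysis gives up only a small amount of power relative to the single-sample design, which is consistent with the suggestion that a familiar group sequential procedure can be imposed without changing the study design.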

Since the magnitude of effect sizes is important, the use of repeated confidence intervals would be helpful. Conditional power calculations would be particularly useful in assessing futility.
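As a rough illustration of a repeated confidence interval, the sketch below widens the usual interval by replacing the fixed-sample critical value with the two-sided critical value of the monitoring boundary at the current look. The interim estimate, its standard error, and the boundary value 2.45 are hypothetical inputs chosen only for illustration, as is the function name.

```python
from statistics import NormalDist


def repeated_confidence_interval(estimate, std_error, boundary_z):
    """Return estimate +/- boundary_z * std_error.

    boundary_z is the two-sided group sequential critical value at the
    current look (an input assumed to come from the monitoring plan).
    """
    if std_error <= 0 or boundary_z <= 0:
        raise ValueError("std_error and boundary_z must be positive")
    half_width = boundary_z * std_error
    return (estimate - half_width, estimate + half_width)


# Hypothetical interim results, purely illustrative.
estimate, std_error = 0.12, 0.05        # treatment difference and its standard error
boundary_z = 2.45                        # assumed interim critical value
naive_z = NormalDist().inv_cdf(0.975)    # fixed-sample 95% critical value

rci = repeated_confidence_interval(estimate, std_error, boundary_z)
naive = repeated_confidence_interval(estimate, std_error, naive_z)
print(f"repeated CI: ({rci[0]:.3f}, {rci[1]:.3f})")
print(f"naive CI:    ({naive[0]:.3f}, {naive[1]:.3f})")
```

The repeated interval is wider than the naive one at the same look, so a reviewer can ask whether clinically relevant effect sizes remain plausible without overstating the precision of interim data.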

While the new procedures appear to have optimal operating characteristics, the authors point out some properties that are counterintuitive. Some further work in understanding the rationale for these properties would be desirable. Also, the procedures are described as Bayesian with an apparent need to specify a prior distribution as well as cost and loss functions, although this does not seem to have factored into the application of the procedures in the examples. Some clarification here would be helpful.

The authors assume that recruitment is halted during the process of creating and reviewing interim analyses. This practice appears undesirable and also uncommon in my experience. On the other hand, the authors correctly identify the difficulties that may arise when a trial is stopped early, and subsequent data raise questions about this decision.

4. CONCLUSIONS

  1. As reflected in the above discussion, there is an enormous difference between the questions “When can a study be terminated early?” and “When should a study be terminated early?” While the former question is largely statistical, the latter question involves many other factors, most of which are not statistical in nature and require subjective judgment and knowledge of externalities that can change during the course of the trial. Thus, second-guessing DSMBs is not a desirable strategy for improving their performance.

  2. Although I am not directly acquainted with the examples presented by Shuster and Chang, the authors make an important contribution by drawing attention to the need for careful interim monitoring of clinical trials and the need for mechanisms to ensure this happens. Ultimately, the issue is one of ethical conduct of research, which is the responsibility of Institutional Review Boards (IRBs). Thus, efforts might be directed towards ensuring that IRBs are effective in requiring that trials are appropriately monitored.

  3. Since the overriding responsibility of the DSMB is the welfare of patients, and this is the ultimate issue in deciding whether a trial should be terminated early, statistical probability statements should be viewed only as guidelines to assist the decision making process.

  4. The foregoing considerations suggest that DSMBs are best chaired by individuals with clinical as well as methodological expertise rather than by statisticians.

Notes

Recommended by N. Mukhopadhyay
