Journal of Medicine and Philosophy
A Forum for Bioethics and Philosophy of Medicine
Volume 31, 2006 - Issue 1: Clinical Ethics
Original Articles

The Ethics and Science of Placebo-Controlled Trials: Assay Sensitivity and the Duhem–Quine Thesis

Pages 65-81 | Published online: 20 Aug 2006

Abstract

The principle of clinical equipoise requires that, aside from certain exceptional cases, second-generation treatments ought to be tested against standard therapy. In violation of this principle, placebo-controlled trials (PCTs) continue to be used extensively in the development and licensure of second-generation treatments. This practice is typically justified by appeal to methodological arguments that purport to demonstrate that active-controlled trials (ACTs) are methodologically flawed. Foremost among these arguments is the so-called assay sensitivity argument. In this paper, I take a closer look at this argument. Following Duhem, I argue that all trials, placebo-controlled or not, rely on external information for their meaningful interpretation. Pending non-circular empirical evidence that we can trust the findings of PCTs to a greater degree than the findings of ACTs, I conclude that the assay sensitivity argument fails to demonstrate that placebo-controlled trials are preferable, methodologically or otherwise, to active-controlled trials. Contrary to the intentions of its authors, the fundamental lesson taught by the assay sensitivity argument is Duhemian: the validity of all clinical trials depends on external information.

I. INTRODUCTION

Ethicists and researchers alike have long worried about the apparent ethical dilemma posed by clinical research. On the one hand, the duty of care governing the physician-patient relationship requires that a physician-researcher act in the patient's best interests, providing treatment consistent with the standard of care. On the other hand, the ethics of research requires the use of sound experimental design: scientific validity is a necessary condition for ethical acceptability. In the context of clinical research, the randomized placebo-controlled trial (PCT) is the gold standard. Given this design, however, the actual therapy received by a patient enrolled in a study is determined by the randomizing scheme. If effective treatment already exists, then randomization to placebo is inconsistent with the standard of care. In these circumstances, it seems, we are faced with a conflict between the duty of care and the requirements of good science: either the interests of the individual are sacrificed in favor of the interests of future patients who may benefit from the results of valid research, or the potential benefits of research to future patients are sacrificed in favor of the interests of the individual (Fried, 1974, p. 51).

Freedman's principle of clinical equipoise suggests a solution to this dilemma (Freedman, 1987, pp. 1–6). The principle of clinical equipoise requires that at the start of a randomized controlled trial (RCT) comparing two treatments there must exist honest, professional disagreement in the community of expert practitioners as to the preferred treatment. Only given such disagreement, Freedman argues, is the random assignment of patients to different treatment arms consistent with the duty of care. According to clinical equipoise, therefore, a placebo control is used properly only when evaluating first-generation treatments for a medical condition. Where no treatment exists, “nothing” is currently the standard of care such that randomization to placebo is consistent with the duty of care. Once effective treatment exists, however, randomization to placebo is no longer consistent with the duty of care because “nothing” is no longer the standard of care. Accordingly, clinical equipoise requires that, aside from certain exceptional cases, second-generation treatments ought to be tested against an active control, or standard therapy, in an active-controlled trial (ACT) (Freedman, 1987, pp. 1–6; Weijer, 1999, p. 213).

The principle of clinical equipoise, and its requirements, are now recognized by many researchers and regulators. That being said, the principle continues to generate controversy. Indeed, quite recently the principle has been attacked at its foundations. In a series of three articles Franklin Miller and Howard Brody have argued that the purported ethical dilemma posed by clinical research, for which the principle of clinical equipoise is proposed as a solution, is false (e.g., Miller & Brody, 2003, pp. 19–28). The norms of research and therapy, they argue, are distinct: researchers do not have therapeutic obligations to research subjects. Since both the principle of clinical equipoise, and the dilemma it is supposed to solve, are based on the assumption that researchers do have therapeutic obligations to research subjects, if Miller and Brody are right, the principle of clinical equipoise should be rejected as a guiding principle in clinical research.

I do not believe that Miller and Brody are right. However, I will not address their arguments here. For the purposes of this article, I will assume that the dilemma posed by clinical research is real, and that the principle of clinical equipoise should not be rejected for the reasons they adduce. Instead, I will focus on a more subtle, and long-standing, controversy over the principle's implications concerning the appropriate choice of controls. The requirement that ACTs be used in the evaluation of second-generation treatments has repeatedly been criticized on methodological grounds. ACTs, it is argued, are methodologically flawed such that their results are frequently uninterpretable. Since an invalid study is an unethical study, whether or not it employs active controls, the requirements of validity, it is argued, trump the requirements of clinical equipoise. Thus, despite the ethical problems they pose, placebo controls must be used because PCTs perform an irreplaceable service in clinical research.

If it is true that, for the sake of validity, PCTs must be used in trials of second-generation treatments, the principle of clinical equipoise fails to resolve the ethical dilemma posed by clinical research because patients enrolled in such trials will not receive therapy consistent with the standard of care. But the truth of this claim depends on whether the methodological arguments arrayed against ACTs actually demonstrate what their authors purport: that PCTs are methodologically preferable to ACTs. There are a number of such arguments in the literature (Weijer, 1999, pp. 211–218). In my opinion, all but one of these arguments have been successfully rebutted; that one is the so-called assay sensitivity argument. This argument continues to convert and convince despite the substantive, if not conclusive, criticism it has received (Weijer, 1999, pp. 211–218). Indeed, the concept of assay sensitivity has become institutionalized, as extensive discussion of the concept in the ICH E10 Guidelines, adopted by the regulatory bodies of the European Union, Japan, and the USA in 2000, attests.

In this article, I take a closer look at the assay sensitivity argument. Following a review of the argument as it is presented most recently by Temple and Ellenberg (2000), and in the ICH E10 Guidelines, I offer a critique, arguing that the assay sensitivity argument fails to demonstrate that PCTs are methodologically preferable to ACTs. My argument proceeds as follows. First I consider the assay sensitivity argument as it reads. Prima facie, the argument suggests an absolute contrast between PCTs and ACTs: PCTs, unlike ACTs, do not rely on any information external to the trial for their meaningful interpretation. I argue that this claim is patently false. Since Duhem, philosophers of science have recognized that all empirical tests rely on a wide range of background information and assumptions concerning the test conditions. I then consider a weaker interpretation of the assay sensitivity argument: perhaps the difference between PCTs and ACTs with respect to self-containment is a matter of degree. This is an empirical question, one that has yet to be resolved. Until such time as it is, I conclude that the assay sensitivity argument fails. Indeed, contrary to the intentions of its authors, the fundamental lesson taught by the assay sensitivity argument is Duhemian: the validity of any clinical trial and the truth of its conclusions depend on external information.

Note: my aim in this article is not to show that ACTs are methodologically superior to PCTs. Rather, I am concerned to show that ACTs and PCTs are, from a methodological point of view, on all fours. Furthermore, there are other reasons for preferring PCTs over ACTs, e.g., pragmatic reasons. I will not discuss these reasons here.

II. THE ASSAY SENSITIVITY ARGUMENT

There are two general strategies for showing that a new therapy is effective. One can show that the new therapy is superior to a control treatment, or one can show that the new therapy is non-inferior, by a defined margin, to a known effective treatment. PCTs typically adopt the former strategy, while ACTs typically adopt the latter. Of course, it is possible to design a superiority trial using an active control, or a non-inferiority trial with a placebo arm; these pairings are merely typical. For the sake of terminological simplicity, however, I will ignore these complications. In what follows, I will use “PCT” and “superiority trial” interchangeably. Likewise, I will use “ACT” and “non-inferiority trial” interchangeably.

While both PCTs and ACTs may be valid, there is a critical difference between their inferential structures. A well-designed study that finds a difference between treatment and control (i.e., superiority), it is argued, provides strong evidence of the effectiveness of a new drug without reference to any information external to the trial. On the other hand, it is argued, a positive finding in a well-designed non-inferiority study does not, in itself, demonstrate that the new treatment is effective; rather, it demonstrates either that both drugs were effective in the study, or that neither was (Temple & Ellenberg, 2000, p. 456). Only on the assumption that the active control is in fact effective can we conclude from non-inferiority that the new drug is effective as well. Crucially, this assumption cannot be verified from data internal to the study. It must be justified, to the extent that it can be, by appeal to external information derived from past experience with the drug. Whereas a finding of superiority (in a well-designed study) stands alone, a finding of non-inferiority demonstrates effectiveness only when supported by external information concerning the efficacy of the active control. This argument is sometimes called the historical-control argument.
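To make the two inferential strategies concrete, the decision rules can be sketched as confidence-interval checks. This is a minimal sketch under a normal approximation; the effect sizes, standard errors, and the non-inferiority margin `delta` are hypothetical illustrations, not figures from the trials literature.

```python
from statistics import NormalDist

def ci95(diff, se):
    """Two-sided 95% confidence interval for an observed treatment difference."""
    z = NormalDist().inv_cdf(0.975)  # ~1.96
    return diff - z * se, diff + z * se

def superior(diff, se):
    """Superiority: the whole CI for (new - control) lies above zero."""
    lo, _ = ci95(diff, se)
    return lo > 0

def non_inferior(diff, se, delta):
    """Non-inferiority: the CI for (new - active control) lies above -delta."""
    lo, _ = ci95(diff, se)
    return lo > -delta

# Hypothetical PCT: new drug beats placebo by 0.20 (se 0.05) -> superior.
print(superior(0.20, 0.05))             # True
# Hypothetical ACT: new drug trails the active control by 0.02 (se 0.03),
# with margin delta = 0.10 -> non-inferior, but only relative to the control.
print(non_inferior(-0.02, 0.03, 0.10))  # True
```

Note that the non-inferiority check, unlike the superiority check, licenses a conclusion of effectiveness only if the active control is assumed effective, which is precisely the external, historical assumption at issue in the text.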

Concerning non-inferiority trials, this argument is undoubtedly correct. Conclusions of effectiveness in trials of this type do rely on external information to justify the assumption that the active control is in fact an effective treatment — in this case, positive results from previously conducted PCTs of the treatment in question. In this sense, ACTs are similar to historically controlled trials. Clearly, a finding of non-inferiority between a new treatment and an active control demonstrates that the new treatment is effective, other things being equal, if and only if it is being compared with an effective treatment. Thus, in cases where the efficacy of standard treatment is inadequately supported by good, reliable evidence, ACTs may not justifiably be conducted. Surely, however, insofar as the requisite external information is available such that the historically based assumption of efficacy can be justified, the conduct of ACTs is also justified. Not so, says the argument from assay sensitivity.

In the ICH E10 Guidelines, assay sensitivity is identified as a property of a clinical trial defined as “the ability to distinguish an effective treatment from a less effective or ineffective treatment” (2000, p. 7). The assay sensitivity argument is perhaps most easily understood as a special case of the historical control argument. The historical control argument shows that non-inferiority trials rely on something like a historical control assumption. That is, conclusions of effectiveness in non-inferiority trials are valid if and only if the historical assumption that the active control is an effective drug is justified appropriately (i.e., by appeal to external information concerning the past performance of the drug in question). The assay sensitivity argument pushes this problem a step further, arguing that non-inferiority trials rely on a historically based assumption of assay sensitivity. That is, conclusions of effectiveness in non-inferiority trials are valid if and only if we can justifiably assume that the active control was effective in this particular assay, or study. If this assumption cannot be justified, a finding of non-inferiority cannot demonstrate effectiveness because there are two possible interpretations of the results between which the trial cannot distinguish: either both drugs were effective in the study, or neither was. The trial cannot distinguish an effective treatment from a less effective or ineffective treatment. In other words, the trial lacks assay sensitivity. Given that trials frequently lack assay sensitivity, a finding of non-inferiority will frequently fail to demonstrate that a new treatment is effective. For this reason, the argument concludes, non-inferiority trials are frequently uninterpretable, and, therefore, morally unjustified.

How frequently, we might ask. For many types of effective drugs, it is argued, the assumption of assay sensitivity cannot be justified: “Although it might appear reasonable to expect a known active agent to be superior to placebo in any given appropriately designed trial,” argue Temple and Ellenberg, “experience has shown that this is not the case for many types of drugs” (2000, pp. 456–457). In this vein, Temple and Ellenberg provide a list of “many classes of drugs with assay sensitivity problems”: antidepressants, analgesics, anxiolytics, antiemetics, antihypertensives, hypnotics, antianginal agents, angiotensin-converting enzyme inhibitors for heart failure, postinfarction β-blockers, antihistamines, nonsteroidal asthma prophylaxis, motility-modifying drugs for gastroesophageal reflux disease, and “many other effective agents” (2000, p. 458).

It is worth noting that Temple and Ellenberg provide empirical support for no more than four of the classes cited as drugs with assay sensitivity problems. Other commentators have focused on this shortcoming, arguing that there is in fact little evidence that assay insensitivity is a widespread problem, and that what little evidence there is comes from underpowered studies. In response to this, Temple and Ellenberg argue that assay insensitivity is not merely a matter of study size, effect size, or variability, noting that assay insensitivity is consistent with adequately powered studies, and with effect sizes that vary greatly and unpredictably from study to study (2000, p. 458). Pending good empirical evidence, the status and frequency of assay insensitivity remain controversial. For the sake of argument, however, let us assume that assay insensitivity is a real, relatively frequent problem.

We've seen the implications of assay insensitivity for non-inferiority trials, but what are the implications of the assay sensitivity problem for conclusions of effectiveness in superiority trials? The implications are significantly different. Since, as we've seen, the assay sensitivity assumption is a special case of the historical control assumption, this is not surprising. Conclusions of effectiveness in superiority trials, or so it is alleged, do not rely on anything like a historical control assumption. Unlike non-inferiority trials, it is argued, superiority trials do not rely on external information for the justification of this assumption since they don't make it. Similarly, conclusions of effectiveness in superiority trials do not rely on external information to justify the assumption of assay sensitivity because, it is argued, a finding of significant difference itself justifies this assumption. As the ICH E10 Guidelines assert:

When two treatments within a trial are shown to have different efficacy (i.e., when one treatment is superior), that finding itself demonstrates that the trial had assay sensitivity. In contrast, a successful non-inferiority trial (i.e., one that has shown non-inferiority), or an unsuccessful superiority trial, generally does not contain such direct evidence of assay sensitivity (2000, p. 8, emphasis mine).

The alleged implications of the assay sensitivity problem for the two types of trials are, therefore, inverse (summarized in Table 1). Negative results in superiority trials and positive results in non-inferiority trials, both of which involve a failure to find a difference between two treatments, suffer from the inferential dangers of assay insensitivity. In both cases, the assumption of assay sensitivity must be justified with reference to information external to the trial. When this is impossible, conclusions from a finding of no-difference (or non-inferiority) are unjustified. Inversely, it is argued, positive results in superiority trials and negative results in non-inferiority trials, both of which involve the demonstration of different efficacy (in the latter case, the trial demonstrates that standard treatment is superior to the new treatment), never suffer from the inferential dangers of assay insensitivity; this is because a finding of different efficacy in itself demonstrates that the trial had assay sensitivity. No information external to the trial is needed to support a conclusion of effectiveness in the former case, nor of ineffectiveness in the latter. Therefore, at least in the alleged cases where the assay sensitivity assumption cannot be externally justified, superiority trials are methodologically and morally preferable because, unlike non-inferiority trials, superiority trials allow for “a clear distinction … between a drug that does not work … and a study that does not work …” (Temple & Ellenberg, 2000, p. 457).

TABLE 1 The Inverse Relationship between Trial Type and Results with Respect to Assay Sensitivity

III. CRITIQUE OF THE ASSAY SENSITIVITY ARGUMENT

The inverse relationship between trial type and results with respect to assay sensitivity is a function of the alleged contrast between PCTs and ACTs with respect to what I will call “inferential self-containment.” Inferential self-containment, a close relative of the concept of internal validity, is a property of a clinical trial I define as the ability to infer that a relationship between two variables is causal or that the absence of a relationship implies the absence of cause without relying on information that is not testable within the study (i.e., external information). PCTs, in contrast with ACTs, are inferentially self-contained because they purportedly do not rely on any external information for the meaningful interpretation of their results. As Temple and Ellenberg state in no uncertain terms:

A well-designed study that shows superiority of a treatment to a control (placebo or active therapy) provides strong evidence of the effectiveness of the new treatment, limited only by the statistical uncertainty of the result. No information external to the trial is needed to support the conclusion of effectiveness. In contrast, a study that successfully shows “equivalence” … does not by itself demonstrate that the new treatment is effective (2000, p. 456, emphasis mine).

Notice the absolute character of this contrast. ACTs rely on external information to justify (1) the historical control assumption; and (2) the assay sensitivity assumption. PCTs do not rely on a historical control assumption, so the issue is moot here. However, they do rely on the assumption of assay sensitivity. But, as we saw above, in PCTs this assumption is allegedly justified internally: a finding of difference, it is argued, in and of itself demonstrates assay sensitivity. PCTs, it is alleged, are inferentially self-contained, ACTs are not.

If it can be shown that PCTs and ACTs do not differ with respect to self-containment in this sense, then the implications of the assay sensitivity argument for PCTs and ACTs will not be inverse. Rather, the implications of the argument for PCTs and ACTs will be the same. In the following I argue that PCTs do, in fact, rely on external information for the meaningful interpretation of their results, both generally, and with respect to the assay sensitivity assumption in particular. The difference between PCTs and ACTs with respect to inferential self-containment, I conclude, is not absolute. I then consider a weaker interpretation of the assay sensitivity argument: perhaps the difference between PCTs and ACTs with respect to self-containment is a matter of degree. Perhaps PCTs rely on fewer, or qualitatively different, assumptions than ACTs. This is an empirical question, not a matter that can be solved a priori, by appeal to methodological considerations alone, as the assay sensitivity argument suggests. I briefly examine the oft-cited Hypericum study, an exemplar of the sort of evidence typically cited against ACTs. I argue that those who treat the results of this study, and studies like it, as evidence that PCTs are methodologically preferable (to some degree) to ACTs beg the question. I conclude that the assay sensitivity argument fails to show that PCTs are methodologically preferable to ACTs because the necessary contrast with respect to self-containment fails to obtain.

A The Duhem–Quine Thesis

The idea that a trial, or kind of trial, might possess the property of self-containment assumes a particular conception of the relationship between hypotheses and evidence. In particular, it assumes that there is something like a one-to-one correspondence between hypotheses and empirical results. If and only if the results of a clinical trial correspond unambiguously with the hypothesis under study does it make sense to suppose that a trial might possess a property like inferential self-containment, because if there is not such a determinate connection, auxiliary hypotheses are necessarily involved in the interpretation of evidence. Philosophers of science have long noted that there is not a discrete connection between hypothesis and evidence. Rather, the meaningful interpretation of the results of any empirical test depends on a host of background information and assumptions concerning the test conditions. Only given this (external) information, is it possible to determine whether or not, or within what limits, empirical observations confirm or disconfirm a given hypothesis. This insight is commonly referred to as the Duhem–Quine thesis in deference to its originator, Pierre Duhem, and most famous proponent, W.V.O. Quine.

The Duhem–Quine thesis asserts that empirical tests are holistic; that is, empirical observations confirm or disconfirm a given statement or hypothesis only against the background of a network of theoretical assumptions. For Quine there is not a one-to-one correspondence between statements and empirical observation. Rather, empirical observation confronts our beliefs only as a whole (Quine, 1951, p. 38). To repeat Quine's famous simile, the totality of our knowledge or beliefs is like “a field of force whose boundary conditions are experience.” Within this field, all our beliefs are interrelated, connected, as it were, by lines of force. A conflict with experience at the edges of this field requires adjustments in the interior. But an adjustment in one place may require adjustments in many other places, because of the logical interconnections between our various beliefs. And because the field is underdetermined by experience, just what adjustments to make is unclear. The Duhem–Quine thesis, therefore, denies precisely what the idea of self-containment presupposes: that there is a discrete connection between the results, for example, of a PCT, and the hypothesis under study. Since there is never a discrete connection between evidence and single statements, no trial, placebo-controlled or not, possesses the property of inferential self-containment (Quine, 1951, pp. 39–40).

In fact, the meaningful interpretation of RCT results depends on a host of background assumptions that cannot be tested within the study. Consider, for example, assumptions about randomness and randomization: how to randomize subjects, whether randomization achieves what it is supposed to, and even what randomness is are controversial questions. Or assumptions about blinding: whether blinding works or not is disputed. For instance, Philip's paradox says that placebo-controlled trials are useful only when the experimental treatment fails to prove significantly superior to placebo, because the more effective an experimental intervention is, the more likely it is to become unblinded during the course of the study (Ney, Collins, & Spencer, 1986, pp. 119–126). And assumptions about the appropriate control: whether, for example, placebo controls provide the baseline they are supposed to is under dispute, since there is substantial evidence that placebo controls, like active controls, possess their own pharmacological profiles, including measurable peak times, carry-over effects, cumulative effects, and toxicities (Lasagna, Laties, & Dohan, 1958, pp. 533–537). Furthermore, it has repeatedly been demonstrated that different-colored placebos have differing effectiveness for various conditions (Shapira et al., 1970, pp. 446–449; Huskisson, 1974, pp. 196–200; Lucchelli, Cattaneo, & Zattoni, 1978, pp. 153–155; Buckalew & Coffield, 1982, pp. 245–248; Schindel, 1978, pp. 231–235). Finally, there is good reason to believe that the placebo response rate itself is highly variable. In a meta-analysis of 30 PCTs of cimetidine, Moerman found that about half of the trials declared that cimetidine was superior to placebo for ulcer healing at one month; the other half declared that cimetidine was no better than placebo. The different conclusions were not accounted for by the response rate to cimetidine, which was strikingly constant across all trials (70–75%). Rather, it was the placebo response rate, which varied widely, from a low of 10% to a high of 80%, that accounted for the difference. In high-placebo-response trials, cimetidine appeared to be no more effective than placebo; in the low-placebo-response trials cimetidine was significantly superior (Moerman, 1983). Given this, we must ask whether a finding of difference in a PCT was due to actual superiority or to the fact that the placebo used was not effective in that particular assay. This question cannot be answered internally. Indeed, none of these assumptions can be tested within the study, and these are just a few of the many examples that illustrate the Duhemian character of all RCTs.
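Moerman's pattern can be illustrated with back-of-the-envelope arithmetic using a pooled two-proportion z test. The per-arm sample size of 100 is a hypothetical assumption; the response rates mirror the ranges reported above.

```python
from math import sqrt
from statistics import NormalDist

def z_two_prop(p1, p2, n1, n2):
    """Two-proportion z statistic for H0: p1 == p2 (pooled variance)."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def superior_at_05(p_drug, p_placebo, n=100):
    """One-sided test at alpha = 0.05: is the drug response significantly higher?"""
    crit = NormalDist().inv_cdf(0.95)  # ~1.645
    return z_two_prop(p_drug, p_placebo, n, n) > crit

# Cimetidine's response rate is roughly constant (~72%) across trials;
# only the placebo arm varies.
print(superior_at_05(0.72, 0.10))  # True  -- low-placebo-response trial
print(superior_at_05(0.72, 0.80))  # False -- high-placebo-response trial
```

Identical drug performance, different verdicts: whether the trial "worked" turns on the behavior of the placebo arm, which cannot be read off the trial's internal data alone.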

On the face of it, then, Temple and Ellenberg are simply wrong when they state that “[n]o information external to the trial is needed to support the conclusion of effectiveness [in a well-designed PCT that demonstrates a difference between treatment and control].” RCTs, like all other empirical tests, rely on a host of background assumptions for their meaningful interpretation.

Admittedly, the Duhem–Quine thesis is now so well accepted that it is practically self-evident. It is difficult to believe that Temple and Ellenberg would be ignorant of it. That being said, their strident formulation of the assay sensitivity argument suggests that they are. But maybe I am interpreting their remarks uncharitably. Perhaps what they meant was that no information external to the trial is needed to support the assumption of assay sensitivity in particular. Now we must ask: is there a contrast between PCTs and ACTs with respect to self-containment when it comes to this assumption?

B Assay Sensitivity Re-Examined

The implications of the assay sensitivity argument for PCTs and ACTs are purportedly inverse. Assay sensitivity cannot be assumed when the results are negative in a superiority trial and positive in a non-inferiority trial. Inversely, it is argued, assay sensitivity can be assumed when the results are positive in a superiority trial and negative in a non-inferiority trial because a finding of difference is, in itself, evidence of assay sensitivity. Let us call the latter pair of trials “A” trials (for “can be assumed”) and the former pair of trials “not-A” trials (for “cannot be assumed”). Notice that not-A trials may possess assay sensitivity. The problem in these cases is that the results are consistent with the trials' possessing or lacking this property. From an epistemological point of view, the ontological question concerning whether or not a not-A trial possesses the property “assay sensitivity” is underdetermined by the internal evidence. Only an appeal to external information (when it is available) can resolve this ambiguity. The problem does not arise in A trials because, again, a finding of different efficacy in itself demonstrates that the trial had assay sensitivity; that is, from an epistemological point of view, the answer to the ontological question is sufficiently determined by the internal evidence. In general, then, the assay sensitivity argument is an epistemological argument that turns on a claim of differential access: A trials provide us with sufficient evidence to answer the ontological question of whether or not a trial possesses the property of “assay sensitivity,” while not-A trials leave this question underdetermined such that an appeal to external information (when it is available) is necessary. This, if it is true, would constitute a contrast between PCTs (specifically, positive PCTs) and ACTs (specifically, positive ACTs) with respect to self-containment.

Assay sensitivity is defined as “the ability to distinguish an effective treatment from a less effective or ineffective treatment.” If assay sensitivity is a property that all trials may possess or lack, as the ICH E10 Guidelines suggest, this definition must be expressed subjunctively because not-A trials do not distinguish an effective treatment from a less effective or ineffective treatment; rather, not-A trials “fail” to so distinguish two treatments. But not-A trials may possess assay sensitivity in virtue of the fact that they would distinguish the two active treatments under comparison from a less effective or ineffective treatment (e.g., placebo) if such a comparison were made. In this sense, not-A trials possess the property of assay sensitivity subjunctively. With this in mind, we are in a position to specify three necessary and sufficient conditions for assay sensitivity that, I take it, are implicit in the definition:

1. D

2. T indicates D

3. D → T indicates D

The first condition tells us that, ontologically speaking, there is a difference between the treatments being compared in the trial. The second condition tells us that the trial, in fact, indicated a difference. The third condition tells us that the trial would indicate a difference if there were one. Condition (3) says more than (1) and (2). Not only is there a difference between the two treatments, and the trial indicated a difference, but if there were a difference, the trial would indicate it. To use possible worlds talk, in all those worlds closest to the actual world in which there is a difference between the treatments being compared, the trial would detect the difference. Condition (3), to paraphrase Robert Nozick, tells us that the trial “tracks” difference (Nozick, 1981, p. 178).

Notice, however, that (3) says nothing about possible trials in which there is no difference. Condition (3) tells us how T is sensitive to D, but not how T is sensitive to not-D. It tells us that T would indicate a difference if there were one, but not what T would indicate if there weren't a difference. Perfect sensitivity would involve indication and difference varying together. With (3) we have one portion of that variation: if D, then T indicates D. But the sensitivity specified by this subjunctive does not have T's indicating varying with all the ontological possibilities (i.e., D and not-D), merely the cases in which there is a difference between the treatments being compared. To paraphrase Nozick again, (3) tells us only half the story about the sensitivity of T's indication ability. Given this, notice that

not-D & T indicates D

(i.e., a finding of difference when there isn't one, or a false positive) is perfectly consistent with (3). Since (3) tells us only that T will indicate D when D obtains, it tells us nothing about what T will indicate when not-D obtains. Given this, T might indicate D even if not-D. Nothing in (3) rules out this possibility.

To be at all plausible, therefore, the definition of assay sensitivity must be amended so as to include a fourth condition:

4. not-D → not-(T indicates D)

This condition tells us that the trial would not indicate a difference if there were not one. To use possible worlds talk again, (4) says that in all those worlds closest to the actual world in which there is not a difference between the treatments being compared, the trial would not indicate a difference. With these four necessary and sufficient conditions for assay sensitivity in hand, we are in a position to answer our question: is there a contrast between PCTs and ACTs with respect to self-containment when it comes to the assay sensitivity assumption?

First of all, I should make it clear that the modality of the subjunctive conditionals (3) and (4) is not necessity. Clinical trials, to point out the obvious, deal in probabilities. For this reason, no trial is ever perfectly sensitive. However, we can assess the probability of these conditions being met. Take condition (4), for example. Condition (4), as we saw above, is supposed to rule out the possibility of "not-D & T indicates D," or a false positive (Type I error). In a trial, the probability of Type I error (the α level) can be controlled; indeed, we can specify it. Conventionally we tolerate an α level of 0.05, sometimes 0.01. But, by decreasing α we can effectively reduce the probability of Type I error to next to zero if we so desire. With this in mind, it might be argued that if we specify an α level of 0.001, or 0.0001, surely a finding of difference provides direct (internal) evidence of assay sensitivity limited only by the vanishing statistical uncertainty of the result. In other words, given a sufficiently low α level, aren't positive PCTs inferentially self-contained (with respect to the assay sensitivity assumption)?

No. Though we can control for statistical uncertainty, our control is limited to minimizing the probability of decision error, and that probability is never zero. Furthermore, α and, for that matter, β respectively specify the probability of Type I and Type II error only in the long run. An α level of 0.05 says that, in the long run (e.g., in a sequence of 100 trials), the probability of a positive result when H0 is true is 0.05, or 5%. What an α level of 0.05 does not allow us to say is that there is a 95% chance that this particular result is a true positive, or that there is no more than a 5% chance that this particular positive result is a false positive. Indeed, we could specify as low an α level as we like, and it would still say nothing about whether a positive result in a particular trial was a false positive (CitationPeirce, 1957, p. 64). Given this, it is simply a mistake to hold that a finding of difference in a particular trial is, in itself, sufficient evidence for assay sensitivity because, in a particular trial, we have no way of knowing whether the result is a true or a false positive.
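The long-run character of α can be made concrete with a small simulation. The sketch below is illustrative only and is not drawn from the article: it assumes a simple two-arm comparison with unit-variance normal outcomes, a two-sided z-test at α = 0.05, and (for the second part) the hypothetical stipulation that only 1 in 10 candidate treatments truly works, with a standardized effect size of 0.5.

```python
import math
import random

def trial_indicates_difference(rng, n_per_arm, true_effect, z_crit=1.96):
    """One simulated two-arm trial: unit-variance normal outcomes,
    two-sided z-test. Returns True if the trial 'indicates D'."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
    se = math.sqrt(2.0 / n_per_arm)  # sd is known to be 1, so the z-test is exact
    return abs(diff / se) > z_crit

rng = random.Random(7)
n_trials, n_per_arm = 2000, 100

# 1. The long-run reading of alpha: among trials in which H0 is true
#    (no real difference), roughly 5% nonetheless indicate a difference.
false_alarms = sum(trial_indicates_difference(rng, n_per_arm, 0.0)
                   for _ in range(n_trials))
print("long-run false positive rate:", false_alarms / n_trials)

# 2. But alpha says nothing about a *particular* positive result. If only
#    1 in 10 tested treatments truly works (effect = 0.5), a sizeable share
#    of all positive findings are false positives despite alpha = 0.05.
positives, false_positives = 0, 0
for _ in range(n_trials):
    truly_effective = rng.random() < 0.10
    effect = 0.5 if truly_effective else 0.0
    if trial_indicates_difference(rng, n_per_arm, effect):
        positives += 1
        if not truly_effective:
            false_positives += 1
print("share of positives that are false:", false_positives / positives)
```

The first number hovers near 0.05 only across the whole sequence of trials; the second illustrates that whether a given positive result is true depends on facts external to the trial itself.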

Furthermore, even if we were to grant that, given a sufficiently low α level, a finding of difference is in itself sufficient evidence for assay sensitivity despite the long-run nature of α, the assay sensitivity argument would still fail to demonstrate a contrast between PCTs and ACTs with respect to self-containment, because the same argument can be made with respect to Type II error (i.e., false negatives). False positive results in non-inferiority trials are, after all, conventional false negatives: the trial fails to distinguish an effective treatment from a less effective or ineffective treatment such that the experimental treatment, which is in fact ineffective, is found non-inferior to standard treatment. But the probability of false negatives, or β, like α, can be controlled. Since β is inversely related to power, we can control for Type II errors internally simply by increasing the power of the test. Since both false positives and false negatives can be controlled for internally (albeit, only in the long run), the alleged contrast between PCTs and ACTs with respect to self-containment again dissolves.
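The claim that β can be driven down internally by raising power can be illustrated with a back-of-the-envelope power calculation. The sketch below is not from the article; it assumes a two-sided, two-arm z-test with unit-variance outcomes and a hypothetical standardized effect size of 0.5, and uses the standard normal-approximation power formula.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power(n_per_arm, effect, sd=1.0, z_crit=1.96):
    """Approximate power of a two-sided two-arm z-test.
    beta = 1 - power, so the Type II error rate falls as n grows."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    shift = effect / se
    # probability that |Z| exceeds z_crit when the true standardized shift applies
    return (1.0 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)

# beta shrinks as the sample size grows (effect size held fixed at 0.5)...
for n in (25, 50, 100, 200):
    print(f"n = {n:>3} per arm: power = {power(n, 0.5):.3f}")

# ...but note that the calculation requires an assumed effect size, which is
# exactly the externally supplied information the objection below points to.
```

The design choice here mirrors the dialectic: power can indeed be pushed as high as we like by raising n, but only relative to a posited effect size, which is itself external information.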

Of course, it might be objected that the determination of β depends on effect size and other variables that are not known in advance. To this extent, the objection might continue, the determination of β, unlike α, depends on external information, and to this extent, A trials and not-A trials differ with respect to self-containment. Since, however, α is meaningful only with reference to a sequence of trials, α, like β, ultimately depends on external information (i.e., information from other trials in the sequence). Pointing out that the determination of β requires reference to other trials in a sequence, therefore, loses force as an objection. There is not, therefore, a contrast between PCTs and ACTs with respect to self-containment when it comes to the assay sensitivity assumption: both PCTs and ACTs rely on external information in order to justify the assumption of assay sensitivity.

C. Different by Degree?

Again, it looks like Temple and Ellenberg are just wrong when they state that “[n]o information external to the trial is needed to support the conclusion of effectiveness [in a well designed PCT that demonstrates a difference between treatment and control]” (CitationTemple & Ellenberg, 2000, p. 456). All RCTs, placebo-controlled or not, rely on a host of background assumptions for their meaningful interpretation, both generally and with respect to the assay sensitivity assumption in particular. Again, this may seem trivial, but it does show that the assay sensitivity argument fails. If ACTs are inferior to PCTs, it is not because ACTs rely on external information for their meaningful interpretation and PCTs do not. Contrary to the assay sensitivity argument, there is not an absolute difference between PCTs and ACTs with respect to inferential self-containment either generally or in relation to the assay sensitivity assumption in particular.

Given that the strident formulation of the assay sensitivity argument by Temple and Ellenberg demands that we interpret the alleged contrast between PCTs and ACTs with respect to self-containment as absolute in character, it is tempting to stop here. I have shown that, on this interpretation, the argument is patently false. Charity, however, demands that we entertain a weaker interpretation of the assay sensitivity argument: perhaps the difference between PCTs and ACTs with respect to self-containment is a matter of degree. Perhaps PCTs rely on fewer external assumptions than ACTs, or on different assumptions that are more easily justified. In other words, even if there is not an absolute difference between PCTs and ACTs with respect to self-containment, could it not be the case that, for one reason or another, the findings of PCTs can be trusted to a greater degree than the findings of ACTs? Perhaps. But this is, at best, an empirical question, not a question that can be answered a priori, by appeal to methodology alone, as the assay sensitivity argument suggests.

Is there any evidence that PCTs are so preferable? Temple and Ellenberg, and others, frequently point to the results of various three-arm studies as evidence in favor of this hypothesis. The oft-cited Hypericum study (CitationHypericum Depression Trial Study Group, 2002, p. 1807) is a case in point. In this randomized controlled trial comparing Hypericum perforatum, sertraline, and placebo, neither hypericum nor sertraline performed significantly better than placebo even though sertraline has repeatedly been shown to be effective in other trials. The authors of this study note that

“[f]rom a methodological point of view, this study can be considered an example of the importance of including inactive and active comparators in trials testing the possible antidepressant effects of medications. In fact, without a placebo, hypericum could easily have been considered as effective as sertraline” (CitationHypericum Depression Trial Study Group, 2002, p. 1813).

Notice, however, that treating these results as evidence that we can trust the findings of PCTs to a greater degree than the findings of ACTs begs the question. It is simply assumed that performance relative to a placebo control in a well-designed trial in and of itself demonstrates that a trial had assay sensitivity. But we now know that this is not the case. We cannot assess the results of this trial independently of the assumption, one of many, that the placebo control performed as expected. But, as we saw above, this assumption cannot be justified internally. We only know what to expect from past experience. Furthermore, even given good (external) information about past experience, we can't be certain that the placebo control will perform like it did in the past. We are, in short, in much the same situation as we are with respect to an active control: we must do the best we can to justify our assumptions and suspend judgment until our results have been replicated. After all, no one concluded that sertraline was an ineffective treatment on the basis of the results of this trial. We know that sertraline has repeatedly been shown to be effective in other trials, so we suspend judgment. Why do we treat the hypothesis that we can trust the findings of PCTs to a greater degree than the findings of ACTs differently? We can conclude that this hypothesis is confirmed on the basis of these results if and only if we are already assuming that a finding of difference in a comparison against placebo in and of itself demonstrates that a trial had assay sensitivity. But this assumption is false. Those who do draw this conclusion on the basis of evidence of this kind are simply begging the question, assuming without justification that we can trust the findings of PCTs to a greater degree than the findings of ACTs when this is the very question under study. 
Until there is good (i.e., non-circular) empirical evidence that the findings of PCTs are so preferable, this assumption remains unjustified.

IV. CONCLUSION

Pending non-circular empirical evidence that we can trust the findings of PCTs to a greater degree than the findings of ACTs, there is no reason to believe that they are so preferable. Nor is there an absolute contrast between ACTs and PCTs with respect to self-containment. In both ACTs and PCTs the ontological question concerning whether or not the trial possesses the property "assay sensitivity" is underdetermined by the (internal) evidence. Both ACTs and PCTs depend on external information for their meaningful interpretation. Therefore, the assay sensitivity argument, on any plausible interpretation, fails to show that PCTs are preferable, methodologically or otherwise, to ACTs. Indeed, contrary to its authors' intentions, the fundamental lesson taught by the assay sensitivity argument is Duhemian: the validity of any clinical trial and the truth of its conclusions depend on external information. The assay sensitivity argument, therefore, presents an insuperable barrier to the conduct of ACTs only insofar as it presents such a barrier to the conduct of PCTs as well. From a methodological point of view, ACTs and PCTs are on all fours. The problem of assay insensitivity, therefore, does not militate against the use of ACTs in trials of second-generation treatments. Barring universal invalidity in the face of assay insensitivity, the principle of clinical equipoise continues to promise a genuine resolution of the ethical dilemma posed by clinical research.

ACKNOWLEDGMENTS

I would like to thank the various people who offered insightful comments on earlier versions of this article at: the Department of Philosophy Colloquium, Dalhousie University, September 19, 2003; the Canadian Institutes of Health Research (CIHR) training retreat, Pictou NS, October, 2003; the Joint meeting of the Canadian Bioethics Society (CBS) and American Society for Bioethics and Humanities (ASBH), Montreal PQ, October 2004; and my reviewers at The Journal of Medicine and Philosophy. Also, special thanks to Dr. Richmond Campbell for his guidance on earlier versions of this project, and to Dr. Charles Weijer, whose enthusiasm for, and knowledge of, the ethics of research has contributed to this paper in innumerable ways.

James Anderson's research is funded by the Social Sciences and Humanities Research Council (SSHRC), the CIHR, and The Killam Trust.

Notes

Temple, R. (1983). Difficulties in evaluating positive control trials. Reprinted from The 1983 Biopharmaceutical Section Proceedings of the American Statistical Association.
