Abstract
A procedure is developed for power analysis and sample size calculation for a class of complex testing problems regarding the largest binomial probability under a combination of treatments. It is shown that the asymptotic null distribution of the likelihood-ratio statistic is not parameter-free; however, a parameter-free conservative asymptotic null distribution is available. A nonlinear Gauss-Seidel algorithm is proposed to uniquely determine the alternative for the power and sample size calculation given the baseline binomial probability. An example from an animal clinical trial is discussed.
1. Introduction
In biological and medical research, it is often necessary to perform power analysis, or to determine the sample size required, for binomial trials involving multiple treatments. For example, such an analysis/calculation is required by the National Institutes of Health (NIH) for research grant applications. One particular type of question that needs to be answered regards the largest binomial probability under these treatments. Suppose that there are r treatments, denoted by 1, …, r. Here the ‘treatment’ can be a real treatment (e.g., a drug), or a factor that has two levels (e.g., male/female). Let x_q be the indicator for the qth treatment (0 or 1), q = 1, …, r. Suppose that n independent trials are run under each combination of the treatments, x = (x_1, …, x_r), resulting in a binary outcome for each trial (1 – success, 0 – failure). Let y_x be the total number of successes under x. It is assumed that y_x has a Binomial(n, p_x) distribution, where p_x is the probability of success in a single trial under x. It is believed that all the treatments have at least nonnegative effects, so p_{(0,…,0)} is the smallest probability and p_{(1,…,1)} is the largest. The question is to determine the sample size, n, so that one has at least the designated power to prove the case that p_{(1,…,1)} is higher than the rest of the probabilities, if it is indeed higher by at least as much as δ. In some cases, the researcher has a target sample size (e.g., the maximum under the budget constraint). The question then is to perform the power analysis for the given sample size. We illustrate with an example.
Example 1.1
Researchers at the Oregon Health & Science University (OHSU) were preparing to meet the deadline of a grant submission. One of the research aims of the grant proposal had to do with comparative transplantations via animal clinical trials (i.e., mice) to determine the best overall protocol, which can then be further optimised. Successful liver repopulation is defined as greater than 70% cell replacement, and successful blood reconstitution is defined as greater than 50% human cells in the bone marrow. From previous studies, it was known that these levels of liver repopulation could be achieved in 20–80% of transplanted mice with a good hepatocyte donor, the average being 50–60%. For cord blood transplants into neonates, about 75% of the mice reach the desired human repopulation levels, again with a range of 50–90% between experiments. Using these numbers from the single cell type transplants, we made the assumption that a successful protocol would yield about 30% success in double-repopulation. Here the protocol involved a treatment factor with high and low dose levels, and a control factor of young and adult mice. In other words, there are two treatments, with x_1 = 1 for high dose and x_1 = 0 for low dose; x_2 = 1 for young mice and x_2 = 0 for adult ones. The trials were to be carried out independently on n different mice, resulting in a binomial proportion under each protocol. It was determined that n should be no more than 30 due to the budget constraint. An initial request was made to perform a power analysis in order to detect a 10% difference that separates the success rate of the ‘optimal’ protocol (i.e., with high dose and young mice) from the rest.
A naive approach to the power analysis/sample size calculation would be to compare p_{(1,…,1)} with each of the other probabilities (actually, only those with exactly one of the treatment indicators equal to zero), and perform a two-sample t-test for the difference in proportions. However, this approach has low power, and often results in a sample size that the researcher cannot afford. The inefficiency of the naive approach is not surprising, because only a (small) portion of the data is involved in the two-sample t-test (for each comparison). One can do better by utilising the entire data in the analysis. To do so we need to assume a model for the binomial probability. Suppose that
p_x = h(β_0 + β_1 x_1 + ⋯ + β_r x_r),   (1)
where h satisfies 0 < h(u) < 1 for all u and is strictly increasing. A well-known example of h is the logistic function, h(u) = e^u/(1 + e^u). Under the assumed model, the problem of interest can be expressed more precisely as testing the hypothesis
H_0: β_q ≤ 0 for some 1 ≤ q ≤ r   (2)
versus H_1: β_q > 0 for all 1 ≤ q ≤ r. Naturally, the likelihood-ratio test is considered. The latter is based on the test statistic
λ = 2(l̂ − l̂_0),   (3)
where l̂ is the maximised log-likelihood and l̂_0 the maximised log-likelihood under H_0. A first step for the likelihood-ratio test (LRT) would be to determine the critical value, c_α, corresponding to the given level of significance α, such that
sup_{β ∈ B} P_β(λ > c_α) ≤ α,   (4)
where B = {β: β_q ≤ 0 for some 1 ≤ q ≤ r} and P_β is the probability distribution given that β is the true vector of parameters. If the log-likelihood function is log-concave, which is the case, for example, for the logistic regression, the event inside the probability of (4) implies that the maximum of the log-likelihood over B must take place on the boundary of B, provided that c_α > 0. Therefore, the critical value is computed by considering the boundary of B, which is a subset of
∪_{q=1}^r {β: β_q = 0}.   (5)
Unfortunately, even with this simplification, the asymptotic null distribution of the LRT is not parameter-free. This is shown in the next section. On the other hand, the arguments also show that a conservative asymptotic null distribution (CAND) for the LRT is χ²_1, which is parameter-free.
Here the CAND is in the sense that
sup_{β ∈ B} lim_{n→∞} P_β(λ > c) = P(χ²_1 > c) for every c > 0.   (6)
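To make the model (1) concrete, the following Python sketch (our own illustration, not from the paper; the parameter values are hypothetical) computes the cell probabilities under the logistic choice of h for r = 2:

```python
import math
from itertools import product

def h(u):
    """Logistic link: strictly increasing, with 0 < h(u) < 1 for all u."""
    return 1.0 / (1.0 + math.exp(-u))

def cell_probs(beta0, betas):
    """p_x = h(beta0 + sum_q beta_q * x_q) for every combination x in {0,1}^r."""
    return {x: h(beta0 + sum(b * v for b, v in zip(betas, x)))
            for x in product((0, 1), repeat=len(betas))}

# Hypothetical parameters: baseline probability 0.3 (so beta0 = logit(0.3))
# and positive treatment effects, making p_(1,1) the largest probability.
beta0 = math.log(0.3 / 0.7)
probs = cell_probs(beta0, [0.5, 0.5])
```

When all β_q > 0, the probabilities are ordered as the text describes: p is smallest at x = (0, …, 0) and largest at x = (1, …, 1).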
The main objective of the current paper is to determine the sample size, n, for the LRT so that the test will have the designated power, or to obtain the power of the LRT under a given sample size. Although standard power and sample size problems in logistic regression are well studied (e.g., Alam, Rao, & Cheng, 2010; Borenstein, Rothstein, & Cohen, 2001; Demidenko, 2007; Hsieh, Bloch, & Larsen, 1998; Novikov, Fund, & Freedman, 2010; Whittemore, 1981), to the best of our knowledge, the kind of problems that we are dealing with have not been addressed. Clearly, the power of the test depends on the alternative, and there are infinitely many possible alternatives to (2), that is, infinitely many β in (0, ∞)^r. On the other hand, a practitioner would prefer a ‘short answer’, as opposed to something that is case-by-case. This issue is addressed in Section 3, where a unique alternative, based on a reasonable argument, is determined. A simple Gauss-Seidel type algorithm is proposed to compute the alternative. A Monte Carlo procedure is then proposed to compute the power or sample size. A real-life application is considered in Section 4. Technical details are deferred to the Appendix.
2. Asymptotic null distribution
In the standard situation, the LRT is known to have an asymptotic χ² distribution, under the null hypothesis, with a certain number of degrees of freedom that does not depend on the parameter under the null. For example, if, instead of (2), one were to test
H_0: β_q = 0   (7)
versus H_1: β_q ≠ 0 for a fixed 1 ≤ q ≤ r, then the asymptotic distribution of the LRT is χ²_1, regardless of the value of the true β, as long as the true β_q is zero. In other words, the asymptotic null distribution is parameter-free. However, this is not the case for testing (2). We show this for the case of logistic regression with r = 2.
Write the log-likelihood function as l(β). Let β̂, β̂_{(1)}, and β̂_{(2)} be the maximiser of l without constraint, over {β: β_1 = 0}, and over {β: β_2 = 0}, respectively. Suppose that the true parameter vector is β = (β_0, 0, β_2)′, where β_2 > 0. By the standard asymptotic theory, we have β̂_{(1)} → β in probability, as n → ∞. On the other hand, by White (1982), we have β̂_{(2)} → β* in probability, as n → ∞, for some β* = (β*_0, β*_1, 0)′ with β* ≠ β. Thus, by the Taylor expansion, we have
l(β̂_{(1)}) = l(β) + [∂l(β)/∂β′](β̂_{(1)} − β) + (1/2)(β̂_{(1)} − β)′[∂²l(β̃)/∂β∂β′](β̂_{(1)} − β),   (8)
l(β̂_{(2)}) = l(β*) + [∂l(β*)/∂β′](β̂_{(2)} − β*) + (1/2)(β̂_{(2)} − β*)′[∂²l(β̃*)/∂β∂β′](β̂_{(2)} − β*),   (9)
for some β̃ lying between β̂_{(1)} and β, and β̃* between β̂_{(2)} and β*. It is easy to show that the second partial derivatives are uniformly bounded when divided by n, so the second-order terms on the right sides of (8) and (9) are o_P(n). As for the first-order terms, it is easy to show that, for any β, we have ‖∂l(β)/∂β‖ ≤ cn, where c does not depend on the parameter. Thus, we have n^{-1}{l(β̂_{(1)}) − l(β̂_{(2)})} = n^{-1}{l(β) − l(β*)} + o_P(1). By the weak law of large numbers, we have n^{-1}{l(β) − l(β*)} → Σ_x [p_x log(p_x/q_x) + (1 − p_x) log{(1 − p_x)/(1 − q_x)}] in probability, with p_x = h(β_0 + β_1 x_1 + β_2 x_2), q_x = h(β*_0 + β*_1 x_1 + β*_2 x_2). Thus, we have
n^{-1}{l(β̂_{(1)}) − l(β̂_{(2)})} → Σ_x [p_x log(p_x/q_x) + (1 − p_x) log{(1 − p_x)/(1 − q_x)}] in probability.   (10)
For fixed x, write p = p_x and q = q_x. Then, by the inequality in Appendix A.1, we have p log(q/p) + (1 − p) log{(1 − q)/(1 − p)} ≤ 0, with the equality holding if and only if q = p. It follows that the right side of (10) is positive unless q_x = p_x for all x. Because the latter implies β* = β, a contradiction, the right side of (10) must be positive. Therefore, in conclusion, we have with probability tending to one that l(β̂_{(1)}) > l(β̂_{(2)}), hence λ = 2{l(β̂) − l(β̂_{(1)})} → χ²_1 in distribution, by the standard asymptotic result [see the note below (7)].
Now suppose that the true parameter vector is β = (β_0, 0, 0)′. Then, it can be shown (see Appendix A.2) that λ → η in distribution, as n → ∞, where η = η_1 ∨ η_2 and η_1, η_2 are certain functions of a normal random vector ξ, whose forms are derived in Appendix A.2. Note that η is not distributed as χ²_1. To see this, note that η = 0 if and only if ξ falls in a certain fixed region. Because the pdf of ξ is positive everywhere, there is a positive probability that ξ falls in that region, hence P(η = 0) > 0. On the other hand, one always has P(χ²_1 = 0) = 0. It follows that the distribution of η differs from χ²_1 at zero, hence η is not distributed as χ²_1.
The results so far in this section have shown that the asymptotic null distribution of λ depends on the values of the true parameters, namely, if β_1 = 0 and β_2 > 0 [or β_1 > 0 and β_2 = 0, by a similar argument], the asymptotic null distribution is χ²_1; if β_1 = β_2 = 0, the asymptotic null distribution is not χ²_1. Thus, the asymptotic null distribution is not parameter-free.
Nevertheless, χ²_1 is, in general (not restricted to the logistic model and r = 2), a CAND in the sense of (6), provided that the log-likelihood is log-concave. This can be shown with a simple argument. Interior points of B need not be considered because, for such a true β, the unconstrained maximiser eventually falls inside B, so that λ = 0 with probability tending to one. For any β on the boundary of B, there is a 1 ≤ q ≤ r such that β_q = 0. Due to the log-concavity, the event {λ > c}, c > 0, is the same as the event {2(l̂ − l̂_∂) > c}, where l̂_∂ is the maximised log-likelihood over the boundary of B [see the note above (5)]. On the other hand, we have l̂_0 ≥ l̂_{(q)}, where l̂_{(q)} is the maximised log-likelihood over {β: β_q = 0}, a subset of B. Therefore, we have λ ≤ 2(l̂ − l̂_{(q)}) → χ²_1 in distribution, by the standard asymptotic result. Therefore, we have lim sup_{n→∞} P_β(λ > c) ≤ P(χ²_1 > c).
On the other hand, if β is such that β_j = 0 for some j while β_k > 0 for all k ≠ j, by a similar argument as the above for the special case of logistic regression with r = 2, it can be shown that, with probability tending to one, we have l̂_0 = l̂_{(j)}, hence λ = 2(l̂ − l̂_{(j)}) → χ²_1 in distribution. Because the limit of P_β(λ > c) achieves its supremum over B at such a β, while the limit there equals P(χ²_1 > c), (6) must hold. Note that, by the definition of B, which imposes no restriction on β_0, all the indexes j, k mentioned in this paragraph are assumed to be between 1 and r; in other words, β_0 is not involved.
3. Power and sample size calculation
By the results of the previous section, we can use χ²_{1,1−α}, the (1 − α)-quantile of the χ²_1 distribution, as the critical value of the LRT. As for the power calculation, although there are infinitely many alternatives that influence the power, it is often reasonable to assume, in practice, that the baseline probability is known. In other words, p_{(0,…,0)} = h(β_0) is known according to (1); therefore, β_0 is known.
In addition, the minimum probabilistic increase of the largest probability over the other probabilities, δ, is given. In other words, we consider all the alternatives such that
p_{(1,…,1)} ≥ p_x + δ   (11)
for all x ≠ (1, …, 1). It follows, under (1), that all of the β_q, 1 ≤ q ≤ r, must be positive, and that (11) is equivalent to
h(β_0 + Σ_{k=1}^r β_k) ≥ h(β_0 + Σ_{k≠q} β_k) + δ, 1 ≤ q ≤ r.   (12)
The minimum amount of increase of the left sides of (12) over the right sides takes place when the equalities hold in all of the inequalities, that is, when
h(β_0 + Σ_{k=1}^r β_k) = h(β_0 + Σ_{k≠q} β_k) + δ, 1 ≤ q ≤ r.   (13)
This results in r equations, from which we can uniquely determine the alternative. Note that (13) is a nonlinear equation system; however, it can be solved conveniently by utilising a Gauss-Seidel type algorithm (e.g., Jiang, 2000). Namely, from (13) we have
β_q = h^{-1}{h(β_0 + Σ_{k≠q} β_k) + δ} − β_0 − Σ_{k≠q} β_k, 1 ≤ q ≤ r,   (14)
where h^{-1} is the inverse function of h (the inverse of h exists because h is assumed to be strictly increasing). Thus, given the initial values β_q^{(0)}, 1 ≤ q ≤ r (e.g., all zero), we have
β_q^{(m+1)} = h^{-1}{h(β_0 + Σ_{k<q} β_k^{(m+1)} + Σ_{k>q} β_k^{(m)}) + δ} − β_0 − Σ_{k<q} β_k^{(m+1)} − Σ_{k>q} β_k^{(m)}, 1 ≤ q ≤ r,   (15)
for m = 0, 1, 2, …. The convergence is guaranteed, and fast. We illustrate with an example.
Example 1.1
continued
In the animal clinical trial example, the researchers suggested a baseline probability of 0.3. Using the logistic regression, we have β_0 = h^{-1}(0.3) = log(0.3/0.7) ≈ −0.847. Furthermore, the minimum probability increase was set (again by the researchers) as δ = 0.1. Recall that r = 2 in this case. The Gauss-Seidel algorithm converged within three iterations. Table 1 shows the R outputs of the first five iterations.
Table 1. Convergence of Gauss-Seidel algorithm.
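The iteration can be sketched in Python (a transcription under our reading of the equal-increase equations (13); the function names are our own, and the paper's own computations were done in R):

```python
import math

def h(u):
    return 1.0 / (1.0 + math.exp(-u))        # logistic link

def h_inv(p):
    return math.log(p / (1.0 - p))           # logit, the inverse of h

def gauss_seidel(beta0, delta, r, tol=1e-10, max_iter=100):
    """Solve h(b0 + S) = h(b0 + S - b_q) + delta, q = 1,...,r, with
    S = b_1 + ... + b_r, by a nonlinear Gauss-Seidel sweep: each equation
    is solved for b_q with the other components at their latest values."""
    beta = [0.0] * r                          # initial values: all zero
    for _ in range(max_iter):
        biggest = 0.0
        for q in range(r):
            rest = beta0 + sum(beta) - beta[q]       # linear predictor without b_q
            new = h_inv(h(rest) + delta) - rest      # isolate b_q from equation q
            biggest = max(biggest, abs(new - beta[q]))
            beta[q] = new
        if biggest < tol:
            break
    return beta

# Example 1.1: baseline probability 0.3 and delta = 0.1, so beta0 = logit(0.3).
beta0 = math.log(0.3 / 0.7)
beta = gauss_seidel(beta0, 0.1, r=2)
p_best = h(beta0 + sum(beta))                 # p_(1,1) under the alternative
```

With these inputs the sweep settles within a few iterations at a symmetric solution (β_1 = β_2), and the fitted p_{(1,1)} exceeds each probability with one treatment switched off by exactly δ.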
Given the target sample size, the power of the LRT at the alternative can be computed by a Monte Carlo method. Consider, for example, the case of logistic regression. Under the alternative β determined above, one can simulate data, y_x, x ∈ {0, 1}^r, under the logistic regression. For each simulated data set, r + 1 logistic regressions are fit. The first one is under the full model, the next one under the model without x_1, …, and the last one under the model without x_r. Let l̂, l̂_{(1)}, …, l̂_{(r)} denote the maximised log-likelihoods as results of these logistic regressions. We compute λ = 2{l̂ − max(l̂_{(1)}, …, l̂_{(r)})} for the simulated data set. This is repeated B times, resulting in λ_1, …, λ_B. The power of the LRT is then approximated by B^{-1} Σ_{b=1}^B 1{λ_b > χ²_{1,1−α}}.
If, instead, the goal is to determine the sample size so that the LRT has a designated power, say, γ, we can use the following bisection procedure to speed up the search for the minimum sample size. First pick a couple of initial sample sizes, n_1 < n_2, and compute the power under n_1 and n_2 using the above procedure. Suppose that the power under n_1 is less than γ, and the power under n_2 is greater than γ. We then let n_3 = (n_1 + n_2)/2 (take the integer part, if necessary), and compute the power under n_3 using the above procedure. If the power under n_3 is greater than γ, let n_4 = (n_1 + n_3)/2; otherwise, let n_4 = (n_3 + n_2)/2, and so on. The procedure should converge quickly to a single integer, n_*, so that either n_* or n_* + 1 is the minimum sample size to have the power greater than or equal to γ.
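The bisection search can be sketched as follows. To keep the sketch self-contained, the power curve used in the demonstration is a hypothetical smooth stand-in (not the Monte Carlo power), just to show the mechanics:

```python
import math

def min_sample_size(power_fn, gamma, n_lo, n_hi):
    """Smallest n with power_fn(n) >= gamma, assuming power_fn is increasing
    in n, power_fn(n_lo) < gamma, and power_fn(n_hi) >= gamma."""
    while n_hi - n_lo > 1:
        mid = (n_lo + n_hi) // 2       # integer part of the midpoint
        if power_fn(mid) >= gamma:
            n_hi = mid                  # enough power: search the lower half
        else:
            n_lo = mid                  # short of power: search the upper half
    return n_hi

# Hypothetical stand-in power curve: increasing in n and approaching 1.
def demo_power(n):
    return 1.0 - math.exp(-n / 12.0)

n_star = min_sample_size(demo_power, 0.80, 1, 100)
```

In practice power_fn(n) would be the Monte Carlo power of Section 3 at the chosen alternative; since that estimate is itself noisy, a larger B is advisable for sample sizes near the boundary.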
4. Animal clinical trial revisited
Let us go back to Example 1.1 of Section 1. Recall that the initial request was to make a power analysis based on the sample size n = 30 for detecting a 10% difference between p_{(1,1)} and the rest of the binomial probabilities, and the baseline probability was set as 30%. We considered a logistic regression with r = 2 for this case. Here β_0 = log(0.3/0.7) ≈ −0.847 and δ = 0.1. The alternative was computed by the Gauss-Seidel algorithm [see Example 1.1 (continued) in Section 3] as β_1 = β_2 ≈ 0.407. The corresponding probabilities are p_{(1,1)} ≈ 0.492, p_{(1,0)} = p_{(0,1)} ≈ 0.392, and p_{(0,0)} = 0.3. As we can see, the minimum difference between p_{(1,1)} and the rest of the p's is 0.1. For this alternative, the power of the LRT at 5% level of significance was computed, using the Monte Carlo method described in Section 3 with B = 1000, as approximately 88%.
As the 80% power was considered satisfactory by the researchers, it appeared that the sample size might be reduced a little. However, when the same procedure was applied to n = 20, the power was computed as approximately 78%. In subsequent communication, the lead researcher suggested that he would rather sacrifice a little regarding the minimum probabilistic difference in exchange for a reduced sample size (i.e., n = 20). Thus, the new δ was set as 0.15. The new alternative was computed by the Gauss-Seidel algorithm (which, again, converged in three iterations) as β_1 = β_2 ≈ 0.605. The corresponding probabilities are p_{(1,1)} ≈ 0.590, p_{(1,0)} = p_{(0,1)} ≈ 0.440, and p_{(0,0)} = 0.3. As we can see, the minimum probabilistic increase of p_{(1,1)} over the rest of the p's is 0.15. For the new alternative, the power of the LRT at 5% level of significance was computed, again using the Monte Carlo method with B = 1000, as approximately 86%. The researcher was satisfied with the result.
Acknowledgments
The authors are grateful to Dr. Markus Grompe of the Doernbecher Children's Hospital of the Oregon Health & Science University for presenting the problem from their research, and for information and helpful discussions. The authors also wish to thank a reviewer for helpful comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes on contributors
Thuan Nguyen
Thuan Nguyen is Associate Professor, Department of Public Health and Preventive Medicine, Oregon Health and Science University, USA.
Jiming Jiang
Jiming Jiang is Professor, Department of Statistics, University of California, Davis, USA.
References
- Alam, M. K., Rao, M. B., & Cheng, F.-C. (2010). Sample size determination in logistic regression. Sankhyā B, 72, 58–75. doi: 10.1007/s13571-010-0004-6
- Borenstein, M., Rothstein, H., & Cohen, J. (2001). Power and precision. Englewood, NJ: Biostat, Inc.
- Demidenko, E. (2007). Sample size determination for logistic regression revisited. Statistics in Medicine, 26, 3385–3397. doi: 10.1002/sim.2771
- Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17, 1623–1634. doi: 10.1002/(SICI)1097-0258(19980730)17:14<1623::AID-SIM871>3.0.CO;2-S
- Jiang, J. (2000). A nonlinear Gauss-Seidel algorithm for inference about GLMM. Computational Statistics, 15, 229–241. doi: 10.1007/s001800000030
- Jiang, J. (2010). Large sample techniques for statistics. New York: Springer.
- Novikov, I., Fund, N., & Freedman, L. S. (2010). A modified approach to estimating sample size for simple logistic regression with one continuous covariate. Statistics in Medicine, 29, 97–105.
- White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25. doi: 10.2307/1912526
- Whittemore, A. (1981). Sample size for logistic regression with small response probability. Journal of the American Statistical Association, 76, 27–32. doi: 10.1080/01621459.1981.10477597
Appendix
A.1. An inequality
Consider the function f(x) = log x − x, x > 0. We have f′(x) = 1/x − 1. Thus, f′(x) > 0, = 0, or < 0 depending on whether x < 1, x = 1, or x > 1. It follows that f is strictly increasing on (0, 1) and strictly decreasing on (1, ∞), hence f has a unique maximum at x = 1, and f(x) < f(1) = −1 for any x ≠ 1. Thus, for any p, q ∈ (0, 1), we have
p log(q/p) + (1 − p) log{(1 − q)/(1 − p)} = p f(q/p) + (1 − p) f{(1 − q)/(1 − p)} + 1 ≤ p f(1) + (1 − p) f(1) + 1 = 0,
with equality holding if and only if q/p = (1 − q)/(1 − p) = 1, that is, q = p.
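The inequality is the nonpositivity of minus a binary Kullback-Leibler divergence; a quick numerical check over a grid (our own illustration) confirms it:

```python
import math

def kl_term(p, q):
    """p log(q/p) + (1-p) log((1-q)/(1-p)): minus the binary Kullback-Leibler
    divergence, so it is <= 0, with equality exactly when q = p."""
    return p * math.log(q / p) + (1 - p) * math.log((1 - q) / (1 - p))

grid = [i / 100 for i in range(1, 100)]
worst_off = max(kl_term(p, q) for p in grid for q in grid if q != p)  # off diagonal
on_diag = max(abs(kl_term(p, p)) for p in grid)                       # q = p
```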
A.2. Some derivation in Section 2
We show that λ → η in distribution as n → ∞, where the η's are defined in Section 2, if β = (β_0, 0, 0)′ is the true parameter vector. With the notation introduced in Section 2, by the Taylor expansion and the standard asymptotic expansion (e.g., Jiang, 2010, Ch. 4), one obtains quadratic expansions of l(β̂), l(β̂_{(1)}) and l(β̂_{(2)}) around the true β. Furthermore, we have (A1) and, by the standard asymptotic result, (A2). Let W denote the left side of (A2) divided by √n, and A denote the matrix in (A1). Let W_j and A_j denote the corresponding subvector and submatrix, respectively, j = 1, 2. Then, by the above expressions, it is easy to show that λ is asymptotically equivalent to a continuous function of W. Because W converges in distribution as n → ∞, by the continuous mapping theorem (e.g., Jiang, 2010, p. 30), we have λ → η in distribution, where η = η_1 ∨ η_2.