974
Views
1
CrossRef citations to date
0
Altmetric
Research Articles

Optimal Spiking Experiment for Noninferiority of Qualitative Microbiological Methods on Accuracy With Multiple Microorganisms

ORCID Icon, , & ORCID Icon
Pages 198-213 | Received 01 Oct 2020, Accepted 10 Nov 2021, Published online: 02 Feb 2022

Abstract

The European and United States Pharmacopoeia demand a noninferiority study on the detection of microorganisms when an alternate qualitative microbiological method is intended to replace the compendial microbiological method. However, without imposing any modeling assumptions or constraints, noninferiority studies require large numbers of test samples for a proposed noninferiority criterion of 0.7 or higher for each microorganism. When we can assume that the accuracy of the alternate method with respect to the compendial method is homogeneous across microorganisms, a joint statistical analysis of the data from all microorganisms can be used to help reduce the sample size dramatically. For this situation, we provide a test statistic for noninferiority, an optimal spiking experiment, and a sample size calculation approach under only mild modeling assumptions of the microorganism-specific detection proportions. We illustrate our approach on a real dataset and demonstrate good performance of our method using simulation studies.

1 Introduction

According to international guidelines USP < 1223 > and EP 5.1.6, validation of a qualitative rapid microbiological method (RMM) requires an illustration of noninferiority compared to the compendial method (CM) on two validation parameters: specificity and limit of detection. The European (EP) and United States Pharmacopoeia (USP) define specificity as the ability to detect the required range of microorganisms that may be presented in the test sample. They emphasize that extraneous matter in the test system (e.g., growth medium) should not interfere with the test. Thus, specificity is strongly related to false positives. The limit of detection (LOD) is defined in the EP and USP as the lowest number of microorganisms in a test sample that can be detected under the stated experimental conditions with at least 95% confidence. In other areas of science, the LOD is associated with the sensitivity of the test method (e.g., for chemical analyses). The EP also indicates that accuracy can replace the LOD under certain circumstances and they define accuracy as the closeness of test results between the alternative and compendial method.

These three validation parameters are all related to the probability of detecting microorganisms, but they do not provide direct information on the probability of detecting exactly one microorganism. This detection proportion (IJzerman-Boon and Van denHeuvel 2015) will naturally depend on the type of microorganism, the microbiological method, and possibly other types of factors. If this detection proportion is the main parameter that is affecting positive and negative test result, we could just study this one parameter in the validation study. In this setting, we could define accuracy as the ratio of the two detection proportions between the RMM and the CM for each microorganism and ignore the LOD and specificity. The RMM is then considered noninferior with respect to the CM if this accuracy parameter exceeds some predefined amount (say 70%), the so-called noninferiority margin, for each microorganism.

Estimation of this accuracy parameter is not trivial without making modeling assumptions on the probability of detecting microorganisms, since we can not spike test samples with exactly one microorganism. Using a Binomial form for the detection probability, IJzerman-Boon and Van denHeuvel (2015) used maximum likelihood estimation and constructed an approximate 95% confidence interval with good coverage, opening the way for noninferiority studies on this accuracy parameter. However, when only one type of microorganism is being tested, the authors demonstrated that they need approximately 200 test samples per microbiological method to reject the null hypothesis of 100% accuracy in favor of 70% accuracy (α=0.05 and β=0.20). Clearly, designing an experiment with 200 test samples per microbiological method is probably doable when a very small set of microorganisms is being tested in a validation study, but it is not feasible for a large set of microorganisms.

Alternatively, it may be assumed that the accuracy across microorganisms is homogeneous or consistent, while allowing for unique detection proportions for the different microorganisms. When the RMM uses similar principles of detecting microorganisms, this assumption seems warranted. For instance, certain RMM’s use a combination of the detection principle of the compendial method with modern technology (e.g., cameras and specialized software) to detect growth of microorganisms that can not yet be seen with the naked eye), making the method faster in detecting organisms than the compendial method that does not use this modern technology. Under this assumption we may pool all the data from all microorganisms to determine noninferiority of the RMM with respect to the CM across a large set of microorganisms and potentially reduce the total sample size dramatically. This article will study the optimal design for such a noninferiority study and provide a minimum sample size calculation formula that would guarantee noninferiority with a particular power and a specific Type I error rate. Simulation studies will demonstrate the performance of our approach.

Section 2 contains the statistical details on the probability detection model, the formulation of accuracy and noninferiority, our optimal design, and the sample size calculation. In Section 3, we present a simulation study where we studied the performance of our theory for different sets or distributions of detection proportions. Section 4 demonstrates a real case study where we implemented our theory in practice. Then, the final section contains a summary and a discussion.

2 Statistical Methodology

Let Yijk{0,1} be a binary random variable representing the absence and presence of microorganisms in test sample k{1,2,,n} obtained with microbiological method j{1,2} (j = 1: CM; j = 2: RMM) for type of microorganism i{1,2,,m}. We further assume that Xijk is the true latent or hidden number of microorganisms for type of microorganism i in test sample k that is allocated to method j. We assume that the conditional probability of a positive test result in test sample k, given the true number of microorganism Xijk=x, is equal to(1) P(Yijk=1|Xijk=x)=1(1πij)x,(1) where πij is the so-called detection proportion for type of microorganism i and method j, since it represents the probability of detecting one microorganism (Van den Heuvel and IJzerman-Boon Citation2013; IJzerman-Boon and Van denHeuvel 2015; Manju et al. Citation2019), that is, πij=P(Yijk=1|Xijk=1). A detection proportion of πij=1 implies that microbiological method j is able to detect microorganism i perfectly (i.e., the limit of detection is equal to one), but when πij is not perfect (πij<1), the microbiological method will detect on average less microorganisms than actually present in a test sample. Model (1) is called the binomial detection mechanism, as proposed by others (Abbott Citation1925; CochranCitation1950; Ridout, Fenlon and Hughes Citation1993; IJzerman-Boon and Van denHeuvel 2015).

We now assume that the detection proportion of the RMM (j = 2) satisfies the following relation with respect to the CM (j=1):(2) πi2=θπi1,i{1,2,,m},(2) with θmin{π111,π211,,πm11} to maintain πi21 for all i. This restriction would be automatically guaranteed when θ1. The parameter θ can be viewed as the accuracy or recovery of the RMM with respect to the CM on detecting one microorganism in a test sample. It is considered homogeneous or constant over all m different types of microorganisms, while maintaining potentially different detection proportions π11,π21,,πm1 for the CM on the set of microorganisms. Under assumption (2) it is more convenient to use the notation πi=πi1.

The RMM is considered to be noninferior when the accuracy parameter θ is above some predefined noninferiority margin δ, that is, leading to noninferiority hypothesis(3) H0:θδversusH1:θ>δ.(3)

For microbiological methods it is not uncommon to use a noninferiority margin equal to δ=0.7 (see Sutton, Knapp and Dabbah) or δ=0.8 (which can be derived from an absolute margin of 0.2 on differences in proportions in USP < 1223 >).

To test this noninferiority hypothesis, m different types of microorganisms are being tested. We will assume that for each type of microorganism a single spike solution is being created from which 2n test samples of equal volume are being collected. These 2n test samples are randomly divided into two groups each with n test samples. One group of n test samples are tested with the RMM and the other group with the CM. We will assume that the total volume of all 2n test samples is relatively small compared to the volume of the spiked solution, so that we can assume that test samples are approximately independent (Cochran Citation1950; Wyshak and Detre Citation1972; Van den Heuvel Citation2011). We further assume that the microorganisms are randomly distributed in the spiked solution, implying that the true number of microorganisms Xijk in test sample k follows approximately a Poisson distribution with mean spike λ (Cochran Citation1950; Van den Heuvel Citation2011; IJzerman-Boon and Van denHeuvel 2015; Manju et al. Citation2019). Combining these assumptions with binomial detection probability (1), the probability that a test sample is tested positively, also called the positive rate, is then equal to(4) P(Yi1k=1)=1exp(λπi)andP(Yi2k=1)=1exp(λθπi).(4)

Our research question is to optimally test hypothesis (3) using data {Yijk} having probability distribution (4) and with π1,π2,,πm being treated as nuisance parameters. More specifically, we would like to know how to choose λ and n such that the null hypothesis for a given noninferiority margin δ can be rejected using a Type I error rate of α and a power of 1β.

2.1 Maximum Likelihood Estimation

Our test statistic for noninferiority hypothesis (3) will be based on a lower confidence limit (Hahn and Meeker Citation1991; Wellek Citation2010) for the maximum likelihood estimator of θ (or alternatively for a transformation of this estimator). Thus, we need to study the maximum likelihood estimators for θ and πi first. We do not have to estimate the spike parameter λ since the spike level is normally set by the experimenter (although it is difficult to set this spike level precisely). In the rare case that no information on λ would be available we would estimate the parameter π˜i=λπi for microorganism i. The interpretation of π˜i is then lost, but we could still estimate the common accuracy θ.

The log-likelihood function for the set of observations {Yijk} under the earlier stated model assumptions is equal to(5) L=i=1mj=12(Yij.log(1exp(λθjπi))λθjπi(nYij.)),(5) where accuracy parameter θj is given by θj=θ when j = 2 and θj=1 when j = 1, and where Yij.=k=1nYijk is the number of positively tested samples for type of microorganism i and method j. The set of likelihood equations for estimation of θ and πi is then given by(6) L.θ=Lθ=i=1m(λπiYi2.1exp(λθπi)nλπi)=0,L.πi=Lπi=λYi1.1exp(λπi)+λθYi2.1exp(λθπi)nλ(θ+1)=0.(6)

In case m = 1, the solution to the set of equations in (6) is given by(7) θ̂=log(1Y¯12)log(1Y¯11)andπ̂1=log(1Y¯11)λ,(7) assuming that λ is (approximately) known and under restriction of 0<Y¯ij<1 for j{1,2}, with Y¯ij=Yij./n the average number of positively tested samples or the observed positive rate for microorganism i and method j. In case Y¯ij is equal to a boundary value {0,1}, the estimators θ̂ and/or π̂1 may either not exist or otherwise be unrealistic. For instance, when Y¯11=1 both estimators θ̂ and π̂1 do not exist, while for Y¯11=0 only the estimator θ̂ does not exist. When Y¯12=0 or Y¯12=1, it may lead to the unrealistic estimates θ̂=0 or θ̂=, respectively. Thus, when we have only one type of microorganism, we can only allow experiments with data satisfying 0<Y¯1j<1,j{1,2}, that is, for our approach both positive and negative samples should occur for each method. It implies that the average spike λ should be selected appropriately to avoid these boundary issues as much as possible.

The probability that the observed positive rates for one of the methods is equal to one of the boundaries is given by(8) PB(η)=exp(nη)+(1exp(η))n(8) with η=λπi for measuring microorganism i with the CM and η=λθπi for the RMM. In this probability is visualized as a function of the sample size for three levels of η. It shows that large sample sizes may be needed to have a low probability that the positive rates are away from the boundary, in particular when η is larger.

Fig. 1 Visualization of the probability of observing only positive or only negative test results.

Fig. 1 Visualization of the probability of observing only positive or only negative test results.

In case m > 1, we may be less stringent on boundary issues when at least one type of microorganism satisfies 0<Y¯ij<1, with j{1,2}. Indeed, when the observed positive rates for one type of microorganism are away from the boundary, this microorganism already provides a proper estimate of θ (and its standard error) and therefore, we may allow the observed positive rates of other types of microorganisms to be equal to the boundary. However, we demonstrate in Appendix A that we still must ignore microorganisms when both observed positive rates are equal to the same boundary (Y¯i1=Y¯i2=1 or Y¯i1=Y¯i2=0) if we wish to obtain proper estimates of θ and πi (assuming that at least one type of microorganism has data away from the boundaries), but we do not have to exclude data when only one method has an observed positive rate at the boundary.

The (m+1)×(m+1) Fisher information matrix I is determined by the following moments of the score functions (see Appendix B)(9) var(Lθ.)=i=1mnλ2πi2exp(λθπi)1exp(λθπi)var(L.πi)=nλ2(exp(λπi)1exp(λπi)+θ2exp(λθπi)1exp(λθπi))cov(Lθ.,L.πi1)=nλ2θπiexp(λθπi)1exp(λθπi)cov(L.πs,L.πr)=0,rs(9)

Using the inverse of the Fisher information matrix I1, the asymptotic variance of θ̂ can be determined in a closed-form expression (see Appendix B) and it is given by(10) τA2var(θ̂)=[i=1mnλ2πi2(exp(λθπi)1)+θ2(exp(λπi)1)]1.(10)

An estimate τ̂A2 of this asymptotic variance can be obtained by substituting the ML estimators θ̂ and π̂i in (10). Using this estimated asymptotic variance, an asymptotic one-sided confidence limit can be constructed by(11) LCLθ=θ̂z1ατ̂A,(11) with α being the significance level and zq the qth quantile of the standard normal distribution. Then the noninferiority hypothesis (3) is rejected when the lower confidence limit is above the noninferiority margin δ, that is, LCLθ>δ.

Alternatively, we will use a lower confidence limit on the ML estimator of the logarithmically transformed accuracy log(θ), with log the natural logarithm, since we expect that the distribution of a logarithmic transformation of an estimator of a ratio (see (2)) may converge faster to a normal distribution than the distribution of the ratio estimator itself. The lower confidence limit on the logarithmic transformed estimator can be obtained with the delta method (Cramér Citation1946), leading to(12) LCLlog(θ)=log(θ̂)z1αθ̂1τ̂A.(12)

In this case, the noninferiority hypothesis (3) is rejected when this transformed lower confidence limit is above the logarithmically transformed noninferiority margin, that is, LCLlog(θ)>log(δ). Note that this transformed approach leads to a multiplicative noninferiority test in the original scale of the accuracy and could therefore have a better power than the test based on (11).

2.2 Optimal Spiking Design

For our optimal spiking design, we will focus on an experiment with a single spiked solution for each type of microorganism that is then used to collect 2n samples. Since the detection proportion πi for microorganism i is typically unknown at the design stage, we would like to determine one optimal level for the mean spike per test sample λ that is used for all microorganisms. In other words, to maximize the power of our noninferiority test in either (11) or in (12), we should minimize the variance τA2 in (10) over the mean spike λ, while the detection proportions π1, π2, …, πm are being unknown and potentially arbitrary.

The variance τA2 in (10) can be bounded from below and from above using minima and maxima values of the elements in the sum in (10). Indeed, the sum in (10) is smaller than or equal to m times the maximum element and larger than or equal to m times the minimum element. These boundaries only include a single detection proportion, making the form of the variance in (10) somewhat easier to study. The boundaries for the variance τA2 are given bymini{(exp(λθπi)1)+θ2(exp(λπi)1)mnλ2πi2}τA2maxi{(exp(λθπi)1)+θ2(exp(λπi)1)mnλ2πi2}.

Clearly, the maximum and minimum variance are reached at different values of the detection proportion and the two boundaries will become equal to each other when all detection proportions are equal (π1=π2==πm=π). The variance of the ML estimator θ̂ then reduces to(13) VE=(exp(λθπ)1)+θ2(exp(λπ)1)mnλ2π2.(13)

It can be shown that VE in (13) is a convex function with respect to π, λ, and λπ (Appendix C), which implies that it has a unique minimum for λ. However, the solution for λ would depend on the detection proportion π and the accuracy θ and it does not come in a closed-form expression. On the other hand, we know that the optimal solution for λ is of the form λ=λ0π1, since the variance in (13) is a function of λπ. The solution of the optimal λ0 for the minimum variance in (13) can be found by setting the derivative of (13) with respect to λπ equal to zero, and thus satisfies the following equation(14) (θλ02)exp(θλ0)+θ2(λ02)exp(λ0)+2(1+θ2)=0.(14)

shows the optimal values λ0 for different values of accuracy θ using the Newton–Rhapson procedure (Süli and Mayers Citation2003).

Table 1 Optimal values of λ0 for different levels of accuracy θ and optimal sample sizes mn for 80% power.

The optimal spike λ can be selected approximately when there would be knowledge on the average detection proportion π¯=i=1mπi/m for the CM, because the first-order approximation of i=1m([nλ2πi2]/[exp(λθπi)1+θ2(exp(λπi)1)]) in (10) is equal to [mnλ2π¯2]/[exp(λθπ¯)1+θ2(exp(λπ¯)1)]. Thus, the first-order approximation of the variance in (10) is given by (13) with the detection proportion π replaced by π¯. If however information on the (average) detection proportion is lacking, then it should be noted that what is actually needed, is some knowledge on the product λπi. This can be obtained for a particular organism by performing a pilot experiment in which a laboratory uses a compendial enumeration method to estimate λî of the spike λ for microorganism i, assuming that the detection proportion πi=1 for the compendial enumeration method. If the latter assumption is not fulfilled, then λi actually estimates the product λπi, which can be used to design the study.

A minimum sample size mn that provides power 1β for testing hypothesis (3) using the proposed confidence limit in (11), can be calculated with the well-known sample size formula for a simple hypothesis test based on normality (Donner, Birkett and Buck Citation1981; Hsieh Citation1988; Julious Citation2004)(15) mn=[z1α+z1β]2[exp(θλ0)1+θ2(exp(λ0)1)]λ02(θδ)2.(15)

When we would use the logarithmically transformed test statistic in (12), the corresponding sample size formula would be equal to (15) with (θδ)2 replaced by θ2(log(θ)log(δ))2. Since log(1+x)>x[1+x]1 for all positive x, we obtain that log(θ)log(δ)>1δ/θ for all θ>δ. Hence, the sample size will be smaller for the logarithmically transformed test statistic compared to the nontransformed test statistic when the accuracy is beyond the noninferiority margin. Or in other words, the logarithmically transformed test will have more power than the nontransformed test when applied to the same dataset.

Several sample sizes mn are presented in for different values of accuracy θ for a selected noninferiority margin of δ=0.7. It demonstrates that the total sample size can stay below 400 test samples when the true accuracy is at least 0.9 and an optimal spike can be selected. However, the sample sizes are calculated under ideal settings (an optimal spike and equal detection proportions) and it requires an evaluation when the settings would be different in practice. shows that the variance (13) and the sample size (15) may substantially change with the value λπ, especially for values below 1 or above 3, which may be an issue when exact information on the average detection proportion is lacking. In the following section we will conduct simulations to better understand the influence of unequal detection proportions on our testing procedure.

Fig. 2 The variance VE (with mn = 1) and the sample size mn (α=0.05, β=0.20,δ=0.7) as a function of λπ for different values of θ.

Fig. 2 The variance VE (with mn = 1) and the sample size mn (α=0.05, β=0.20, δ=0.7) as a function of λπ for different values of θ.

3 Simulation

To simulate data for our (optimal) spiking experiment, we need to select different detection proportions for the CM. We will follow two different approaches. First of all, we will assume that the detection proportions are drawn randomly from a probability distribution. This means that for each simulation run we draw a new set of detection proportions. We used the logistic-normal LN(μ,σ) and the beta B(a, b) distribution. For the logistic-normal distribution, the detection proportion for the CM is set to πi=exp(Zi)/(1+exp(Zi)), with Zi drawn from a normal distribution, ZiN(μ,σ2), and for the beta distribution the detection proportion is set to πi=Ui, with UiB(a,b). For both distributions, we used two different settings: (μ,σ)=(1,0.25) and (0.5,0.5), and (a,b)=(5,1) and (1, 1), respectively. The density functions for these choices of distributions are given in . The mean detection proportions π¯ for these four distributions are equal to 0.73, 0.62, 0.83, and 0.5, respectively. Second, we will consider different sets of fixed detection proportions. For these settings, the proportions will remain the same for each simulation run. One set of detection proportions consists of the quantiles πi=F1(i/(m+1)), with F one of the logistic-normal or beta distributions and m the number of microorganisms. The second set of detection proportions consists of a small detection proportion between 0.1 and 0.2 and the other detection proportions above 0.5 such that the average of the detection proportions is equal to the mean detection of one of the four distributions F. For this settings we only consider m = 5 different types of microorganisms. For the distribution LN(1,0.25), the detection proportions are chosen equal to 0.15, 0.75, 0.83, 0.91, and 0.99, with an average of π¯=0.73. For the distribution LN(0.5,0.5), the detection proportions are chosen equal to 0.15, 0.7, 0.725, 0.75, and 0.775, with an average of π¯=0.62. For the beta distribution B(5, 1), the detection proportions are selected equal to 0.2, 0.975, 0.983, 0.991, and 0.999 with an average of π¯=0.83 and, finally, for the distribution B(1, 1) the detection proportions are selected equal to 0.15, 0.5, 0.56, 0.62, and 0.68, with an average of π¯=0.5.

Fig. 3 Density functions for the simulation of detection proportions (left: logistic-normal; right: beta).

Fig. 3 Density functions for the simulation of detection proportions (left: logistic-normal; right: beta).

3.1 Simulation of Spiking Experiment

We select different numbers for the types of microorganism (m{5,10,15}) and use three levels for the homogeneous accuracy (θ{0.8,0.9,1.0}). We also choose θ=0.7 equal to the noninferiority margin δ=0.7 to investigate the Type I error. After selecting m and θ, we draw the detection proportions π11, π21, …, πm1 for the CM using one of the four distributions for the random case. For the fixed setting of detection proportions, we just use the set of values we indicated above. Moreover we determine πi2=θπi1 for the RMM. Then, the sample size n is taken such that mn is equal to the optimal sample size in using test statistic LCLθ. In case n would not be an integer we rounded n to the nearest integer value larger than n. Then, for each microorganism i and method j, we simulated the numbers of microorganisms Xij1,Xij2, …,Xijn in the n test samples using a Poisson(λ) distribution, with λ being the average number of microorganisms in the test samples (λ{1.5,2.0,2.5,3.0,3.5}). These numbers of microorganisms are assumed to be independent. Then, the observed binary outcomes Yij1,Yij2, …, Yijn , are independently generated with a Bernoulli distribution using the conditional distribution P(Yijk=1|Xijk)=1(1θjπi1)Xijk, with θj=1 if j = 1 and θj=θ if j = 2. This procedure is repeated 1000 times, to obtain multiple simulated spiking experiments under the same settings and investigate the power of our test statistics. The simulation of this spiking experiment was conducted with SAS software, version 9.4.

3.2 Analysis of Simulated Spiking Experiment

For each simulated dataset, we calculated the observed positive rate Y¯ij for microorganism i and method j. To be able to estimate the parameters θ and πi and to conduct the noninferiority tests for each simulated spiking experiment, we first had to clean the data and eliminate types of microorganisms with observed positive rates at the boundary. Any type of microorganism with either only negative (Y¯i1=Y¯i2=0) or otherwise only positive samples (Y¯i1=Y¯i2=1) were removed from the data when we had at least one type of microorganism with data 0<Y¯ij<1 (Appendix A). Information on the number of microorganisms that were available for the analysis will be reported.

It is expected that the detection proportions that are randomly drawn for each simulation run will typically result in less extreme results than for the fixed settings. The reason is that in the fixed settings several proportions are really close to one (especially for the distributions B(5, 1) and LN(1.0,0.25)), which may result in positive rates for microorganisms equal to the boundary one for the corresponding microorganisms. Thus, the consequence is that more datasets may be eliminated in the fixed settings compared to the random settings.

After eliminating microorganisms, the model parameters are estimated and the noninferiority tests are performed. In practice, laboratories are often able to provide an estimate λ̂i of the spike λ for microorganism i using a compendial enumeration method. This estimate can be used in the analysis by substituting λ by this estimate. In the simulation study we used λ̂i=[X¯i1+X¯i2]/2 as proxy, with X¯ij=k=1nXijk/n the simulated average in the test sample. The analysis of the data was conducted with SAS software, version 9.4, using procedure NLMIXED. Both the Type I error and the power of the noninferiority test will be reported.

3.3 Simulation Results

In this section, we only report the results of the detection proportions that were drawn randomly from the four distributions for each simulation run. The results of the fixed detection proportions are reported in Appendix D. The conclusions for these different ways of selecting detection proportions are not fundamentally different. The Type I error rates between these two approaches are very comparable, but the power values are somewhat lower in the fixed settings for higher values of the spike level λ. This can be fully explained by the higher numbers of datasets that must be eliminated (as we described in Appendix A). Our sample sizes are calculated under the assumption that none of the data is being eliminated. Thus, if then substantial amounts of data must be removed because they cannot contribute to the estimation of the parameters, the sample sizes have been underestimated, leading to lower powers that may drop below 80%.

shows the average, the minimum, the maximum, and the 5% quantile of the numbers of microorganisms that were not removed from the simulated spiking experiment (m = 15, λ=3.5) for the four distributional choices and for two levels of accuracy. For the other chosen settings, all with lower m, λ, and θ, less data had to be removed since these settings had a lower chance of obtaining boundary values (see (8)).

Table 2 Summary statistics of the included number of microorganisms for random detection proportions.

It shows that it is unlikely to remove more than 6 types of microorganisms in a spiking experiment with 15 microorganisms (i.e., a quantile of 9 for distribution B(5, 1)), but in most settings it is expected to remove substantially less data. Clearly, the relative small sample size of n = 15, calculated for an accuracy of θ = 1, is more likely to provide data at the boundaries frequently when the average spike would be larger (see (8) and ), but this is less of an issue when the sample size is larger and the accuracy is smaller, for example, when n = 26 and θ=0.9.

The performance of the two noninferiority tests is provided in . provides the simulated Type I error rate, while the other three figures provide the power of the tests for different levels of accuracy (at significance α=0.05). In each figure we show the results of the four choices of distribution functions for the detection proportions and the three settings for the number of microorganisms (m{5,10,15}) with the corresponding sample sizes n.

Fig. 4 Type I error rate of the two noninferiority test (θ=0.7) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 4 Type I error rate of the two noninferiority test (θ=0.7) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 5 Power of the two noninferiority tests (θ=0.8) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 5 Power of the two noninferiority tests (θ=0.8) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 6 Power of the two noninferiority tests (θ=0.9) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 6 Power of the two noninferiority tests (θ=0.9) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 7 Power of the two noninferiority tests (θ = 1) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

Fig. 7 Power of the two noninferiority tests (θ = 1) as a function of λ for different distributions and number of microorganisms when detection proportions are selected randomly.

shows that the significance level of 5% is nicely achieved. This may not be a surprise, since we used a sample size mn that was calculated under assumption that the accuracy is equal to θ=0.7. The fluctuations of the Type I error rates seem to be random and unrelated to the choices of distributions.

The results on power demonstrate that the noninferiority test LCLlog(θ) is more powerful than LCLθ (as expected). For the first three distributions, a power of 80% is achieved for LCLlog(θ) in most settings when the average spike λ can be maintained within 2 to 3 colony forming units per test sample (CFU/sample). This does not always occur for test LCLθ, even though we used the sample size calculation for this test. Spiking levels at λ=1.5 and/or λ=3.5 CFU/sample may diminish the power, depending on the mean detection proportion of the distribution in combination with the accuracy. Indeed, the distributions with lower mean detection proportions (LN(0.5,0.5) and B(1, 1)) seem to benefit from a somewhat higher spike, in particular when the accuracy is lower. This is not surprising since the first-order approximation of variance τA2 in (10) is equal to VE in (13) with π replaced by π¯ (see Section 2.2). The two distributions with a higher mean detection proportion seem to do better for lower spike levels, but a level of 1.5 CFU/sample still seems to be too low. This is not surprising either, since a spike lower than 1.593 would be less optimal even when the mean detection proportion would be equal to one (see ).

A power of 80% does not seem to be attained for any spike level when the accuracy is θ=0.8 and the detection proportions follow a uniform distribution (B(1, 1)). We think this may be caused by detection proportions close to zero. The variance VE for the accuracy estimator for detection proportions close to zero would have extreme standard errors (see ). Thus, just one small detection proportion may affect the power of the test negatively. For lower values of θ this effect is enhanced compared to accuracies closer to one.

The form of the density for the detection proportions also seems to affect the power of the noninferiority test on accuracy, which is probably caused by the likelihood of a small detection proportion. The distributions LN(0.5,0.5) and B(1, 1) do not just have a lower mean, they also show more variability, with a higher probability of generating small detection proportions. Based on our past experiences with the CM, we expect that the LN(1.0,0.25) and B(5.0,1.0) are more realistic distributions on detection proportions in practice than the other two. Having detection proportions below 0.3 for microorganisms with the CM seems less likely, unless the microorganism is highly stressed and unwilling to grow.

4 Case Study

Our approach has been implemented in practice, where a specific RMM was compared to the CM. The company who performed the validation study decided to test m = 16 different types of microorganism and used noninferiority margin δ=0.7. Although there is no reason to assume that the RMM is inferior to the CM, the accuracy will be considered less than 1 to avoid being too optimistic. It was assumed that the accuracy was in the range θ[0.85,0.95]. This leads to a sample size in the range of n[18,39], using test statistic LCLθ and . It was therefore decided to take n = 30 test samples per microorganism and method, allowing for a true accuracy of just below 0.9. The optimal λ0 should be chosen between 1.634 and 1.721 CFU/sample. Assuming that the mean detection proportion may be close to 0.5, since a few difficult to test microorganisms are being tested, a spike λ in the range of 3 CFU/sample was recommended.

The collected number of positive test samples for the noninferiority study are given in . The company also provided an estimated or predicted spike level based on the compendial enumeration method.

Table 3 Numbers of positive test samples out of 30 test samples from a real noninferiority study.

The model and methodology as explained in Section 2 was used to estimate the accuracy and the detection proportions for all microorganisms with the CM. We used the estimated spike levels provided by the company in the analysis, but we obtained the same conclusion on the accuracy when we ignored this information. If the estimated spike would be somewhat too low (or underestimated), we would obtain estimates for the detection proportions that may be larger than one. This is not an issue when interest is in the estimation and testing of the accuracy, it merely shows that the predicted spike level was not accurately determined. The SAS code is provided in Appendix E.

The estimates of the individual detection proportions and their 95% confidence intervals are given in . The estimate of the logarithmically transformed accuracy with its 90% confidence interval is given by –0.156 [0.319,0.007]. Since the noninferiority margin in the log scale is equal to 0.357log(0.7), we reject the null hypothesis on inferiority and conclude that the RMM is most likely noninferior to the CM. The estimate of the accuracy with its 90% confidence interval is θ̂=0.856 [0.727,1.007].

Table 4 Estimated detection proportions with their 95% confidence intervals.

5 Discussion and Conclusion

We have provided an optimal framework for a noninferiority study on detection of microorganisms between two qualitative microbiological methods where we have assumed that the accuracy or recovery parameter is homogeneous across all types of microorganisms. We provided the optimal average spike level for the test samples taken from a single solution experiment and presented a sample size formula. Our simulation study showed that the proposed noninferiority tests have a power that is close to or better than the requested power. We recommend to use the logarithmically transformed accuracy test in the analysis, but a sample size calculation that is conducted for the accuracy test in the original scale. The reason for this discrepancy is that our calculations are somewhat optimistic since they are calculated under ideal circumstances.

The main benefit of our approach is the potential of a substantial reduction in sample sizes. IJzerman-Boon and Van denHeuvel (2015) showed that a sample size of 200 test samples per microbiological method for one microorganism is necessary to obtain 80% power for a traditional hypothesis test on accuracy when the true accuracy is 70%. For a homogeneous accuracy setting, this number could theoretically be reduced to 213 test samples for all microorganisms together, when the number of microorganisms is limited to 7. The reason for this limit is that with less than 30 test samples per microorganism, there is a risk that the collected data on a microorganism is not suitable for noninferiority testing, because the positive rates likely become equal to zero or one, excluding this microorganism from participating in the noninferiority study. Our case study demonstrated that a sample size of 30 test samples per method for each microorganism is attainable and worked nicely and we recommend not to go lower due to this risk of observing boundary values. This is still a reduction in sample sizes of about 85% with current status quo.

Our analysis also demonstrated that an optimal average spike is somewhere in between 2 and 3 CFU/sample when the detection proportions are larger than 0.5 and not all equal to one. This is in line with the official guidelines (USP < 1223 > 2015), who propose to choose a spike level such that the positive rate for the compendial is between 50 to 75%, even though higher spike levels have been recommended in the past. However, our noninferiority test is different from these guidelines, which proposes to use the most probable number (MPN) that is calculated from multiple dilutions or a noninferiority test on the positive rates. The MPN is less powerful than our test statistic and the second test statistic is highly sensitive to the spike and may falsely result in noninferiority when spike levels are selected somewhat higher (IJzerman-Boon, Manju, and Van denHeuvel 2020).

The disadvantage of our approach is that it is not known whether we may assume a homogeneous accuracy in practice. When the rapid and the compendial method have the same detection principles or mechanisms, but the rapid method is just faster than the compendial method in establishing a decision, it may not be unrealistic that there exists a consistent ratio between the two methods on detecting a single organism across different types of microorganisms. However, when detection mechanisms between the rapid and compendial method are truly different it is unknown if a common accuracy is realistic. Note that different detection proportions for detecting different organisms can be assumed for a common or homogeneous accuracy as well as for heterogeneous accuracies. In case the assumption of a common accuracy is false or there exists doubt on the assumption, different microorganisms may possibly be grouped based on their expectation of equal accuracy, but with different accuracies across groups. If certain groups can be formed, then our noninferiority framework can be applied to these groups separately and still result in lower sample sizes. In the near future, we are planning to study statistical tests that could evaluate the assumption of a homogeneous accuracy or otherwise determine the influence of heterogeneous accuracies on the noninferiority of the rapid method compared to the compendial method for a particular organism. This may complement the work we provided here. It should be noted that in the presented case study the likelihood ratio test for testing the null hypothesis of one common accuracy was not rejected (p = 0.794), suggesting that homogeneity of accuracy was reasonable, but the power (and Type I error) of the likelihood ratio test still need evaluation.

Supplemental material

Supplemental Material

Download Zip (4.2 KB)

Additional information

Funding

This work is part of the research program Rapid Micro Statistics with project number 15990, which is (partly) financed by the Dutch Research Council (NWO).

Notes

1 This equation actually has a solution whenever the observed positive rates Y¯21 and Y¯22 are not equal to the same boundary value as we will see for the second equation in (A.1) in Scenarios 2 to 5. Thus, one of the positive rates is allowed to be equal to the boundary, as long as the other is not. However, characteristics of the solution depend on whether or not the positive rates are at a boundary zero or not.

References

  • Abbott, W. S. (1925), “A Method of Computing the Effectiveness of an Insecticide,” Journal of Economic Entomology, 18, 265–267. DOI: 10.1093/jee/18.2.265a.
  • Cochran, W. G. (1950), “Estimation of Bacterial Densities by Means of the “Most Probable Number”,” Biometrics, 6, 105–116. DOI: 10.2307/3001491.
  • Cramér, H. (1946), Mathematical Methods of Statistics, Princeton, NJ: Princeton University Press.
  • Donner, A., Birkett, N., and Buck, C. (1981), “Randomization by Cluster: Sample Size Requirements and Analysis,” American Journal of Epidemiology, 114, 906–914. DOI: 10.1093/oxfordjournals.aje.a113261.
  • EP 5.1.6. (2017), European Pharmacopoeia 9.2. Alternative Methods for Control of Microbiological Quality, EDQM: Strasbourg, pp. 4339–4348.
  • Hahn, G. J., and Meeker, W. Q. (1991), Statistical Intervals: A Guide for Practitioners, New York: Wiley.
  • Hsieh, F. Y. (1988), ‘Sample Size Formulae for Intervention Studies With The Cluster as Unit of Randomization,” Statistics in Medicine, 7, 1195–1201. DOI: 10.1002/sim.4780071113.
  • IJzerman-Boon, P. C., Manju, M. A., and Van den Heuvel, E. R. (2020), “Non-Inferiority Testing for Qualitative Microbiological Methods: Assessing and Improving the Approach in USP <1223> ,” Manuscript in Preparation.
  • IJzerman-Boon, P. C., and Van den Heuvel, E. R. (2015), “Validation of Qualitative Microbiological Test Methods,” Pharmaceutical Statistics, 14, 120–128. DOI: 10.1002/pst.1663.
  • Julious, S. A. (2004), ‘Sample Sizes for Clinical Trials With Normal Data,” Statistics in Medicine, 23, 1921–1986. DOI: 10.1002/sim.1783.
  • Lu, T. T., and Shiou, S. H. (2002), “Inverses of 2 × 2 Block Matrices,” Computers & Mathematics with Applications, 43, 119–129.
  • Manju, M. A., Van den Heuvel, E. R., and IJzerman-Boon, P. C. (2019), “A Comparison of Spiking Experiments to Estimate the Detection Proportion of Qualitative Microbiological Methods,” Journal of Biopharmaceutical Statistics, 29, 30–55. DOI: 10.1080/10543406.2018.1452027.
  • Ridout, M. S., Fenlon, J. S., and Hughes, P. R. (1993), “A Generalized One-Hit Model for Bioassays of Insect Viruses,” Biometrics, 49, 1136–1141. DOI: 10.2307/2532255.
  • Süli, E. and Mayers, D. F. (2003), An Introduction to Numerical Analysis, Cambridge: Cambridge University Press.
  • Sutton, S. V. W., Knapp, J. E., and Dabbah, R. (2001), “Activities of the USP Microbiology Subcommittee of Revision During the 1995–2000 Revision Cycle,” PDA Journal of Pharmaceutical Science and Technology, 55, 33–48.
  • USP <1223> (2015), United States Pharmacopoeia 40. Validation of Alternative Microbiological Methods, Rockville, MD: U.S. Pharmacopoeial Convention, pp. 1756–1770.
  • Van den Heuvel, E. R. (2011), “Estimation of the Limit of Detection for Quantal Response Bioassays,” Pharmaceutical Statistics, 10, 203–212. DOI: 10.1002/pst.435.
  • Van den Heuvel, E. R. and IJzerman-Boon, P. C. (2013), “A Comparison of Test Statistics for the Recovery of Rapid Growth-Based Enumeration Tests,” Pharmaceutical Statistics, 12, 291–299. DOI: 10.1002/pst.1581.
  • Wellek, S. (2010), Testing Statistical Hypotheses of Equivalence and Noninferiority, Boca Raton, FL: CRC Press.
  • Wyshak, G., and Detre, K. (1972), ‘Estimating the Number of Organisms in Quantal Assays,” Applied Microbiology, 23, 784–790. DOI: 10.1128/am.23.4.784-790.1972.

Appendix A

Here, we discuss maximum likelihood estimation of the model parameters θ and π1, π2, …, πm for model (4) based on data {Yijk}, where part of the data may be at the boundary value (Y¯ij{0,1}), but at least one type of microorganism has data that is not at the boundary (0<Y¯ij<1, j{1,2}). We illustrate this for just two microorganisms, where one microorganism may have observed boundary values (i = 1) and the other microorganism has not (i = 2). We may assume λ = 1, because we could just replace λπi by a new parameter ηi that should be estimated instead. The likelihood equations from (6) are then reduced to the following three equations:(A.1) π1Y¯12[1exp{θπ1}]1+π2Y¯22[1exp{θπ2}]1=π1+π2,Y¯11[1exp{π1}]1+θY¯12[1exp{θπ1}]1=1+θ,Y¯21[1exp{π2}]1+θY¯22[1exp{θπ2}]1=1+θ.(A.1)

The proof is built up in the following way. First, we demonstrate that the MLE’s do not exist when Y¯11 and Y¯12 are at the same boundary. For all remaining settings the MLE’s do exist. To demonstrate that claim, we first discuss the solution π̂2(θ) for π2 using the third equation in (A.1). We show its existence for each θ and study π̂2(θ) as function of θ, to determine certain characterizations we need later in the proof. Then we study π̂1(θ) in the second equation of (A.1) for different data scenarios of Y¯11 and Y¯12. In these scenarios we also study the solution θ̂ for the first equation in (A.1) when the solutions π̂1(θ) and π̂2(θ) are substituted.

The same boundary values for the first microorganism

In case Y¯11=0 and Y¯12=0, the second equation in (A.1) reduces to 0=θ+1, which results in the unrealistic estimator of θ̂=1. When Y¯11=1 and Y¯12=1, the second equation in (A.1) becomes equal to [exp{π1}1]1+θ[exp{θπ1}1]1=0, which has no finite solution for the detection proportion π1. Thus, it is obvious that a microorganism can not have two observed positive rates at the same boundary. Therefore, microorganisms with either Y¯11=0 and Y¯12=0 or Y¯11=1 and Y¯12=1 should be removed from the spiking experiment. For all other settings we will demonstrate that the set of equations in (A.1) has solutions for θ, π1, and π2.

Solution detection proportion for the second microorganism

The third equation in (A.1) has a unique solution for π2 when the observed positive rates satisfy 0<Y¯21<1 and 0<Y¯22<1 and when θ>0 is fixedFootnote1. The function Y¯21[1exp{π2}]1+θY¯22[1exp{θπ2}]1 is a continuous decreasing function in π2>0, since both parts are decreasing functions. It converges to when π2 converges to 0 and when π2 converges to , the function converges to Y¯21+θY¯22<1+θ. Thus, there is just one unique solution π̂2(θ)>0 that solves the third equation in (A.1). It should be noted that π̂2(1)=log(1Y¯2), with Y¯2=[Y¯21+Y¯22]/2. Hence, if both methods are equal, that is, θ = 1, then the detection proportion π2 for the second microorganism is estimated based on the average positive rates Y¯2 of the two methods.

Due to the continuity of the function Y¯21[1exp{π2}]1+θY¯22[1exp{θπ2}]1 in π2 and θ, the function π̂2(θ) is continuous in θ. Furthermore, when θ converges to zero, θY¯22[1exp{θπ2}]1 would converge to infinity when π2 would be fixed. Thus, π2 needs to increase to compensate for this increase and to become or maintain a solution for the third equation in (A.1). This leads to limθ0π̂2(θ)= and limθ0Y¯21[1exp{π2}]1=Y¯21. Furthermore, we obtain that limθ0θπ̂2(θ)=0, from the observation thatlimθ0θY¯22[1exp{θπ̂2(θ)}]1=1Y¯21>0,after rewriting the third inequality in (A.1).

We will now demonstrate that θ0π̂2(θ0)<θ1π̂2(θ1) when 0<θ0<θ1. We only need to address the situation where π̂2(θ0)π̂2(θ1), since otherwise θ0π̂2(θ0)<θ1π̂2(θ1) is trivially obtained. When π̂2(θ0)π̂2(θ1) is true, we obtain the following inequality(A.2) 1Y¯21[1exp{π̂2(θ0)}]11Y¯21[1exp{π̂2(θ1)}]1,(A.2) since [1exp{π2}]1 is a decreasing function in π2. Now assume that also the alternative inequality θ0π̂2(θ0)θ1π̂2(θ1) is true, then we would obtain the inequality(A.3) Y¯22[1exp{θ0π̂2(θ0)}]11Y¯22[1exp{θ1π̂2(θ1)}]11,(A.3) again using that [1exp{π2}]1 is a decreasing function in π2. Rewriting the third equation in (A.1) asθ=(1Y¯21[1exp{π̂2(θ)}]1)/(Y¯22[1exp{θπ̂2(θ)}]11), and using inequalities (A.2) and (A.3), we obtain the following contradiction:θ0=1Y¯21[1exp{π2(θ0)}]1Y¯22[1exp{θ0π2(θ0)}]111Y¯21[1exp{π2(θ1)}]1Y¯22[1exp{θ1π2(θ1)}]11=θ1.

Thus, θπ̂2(θ) must be an increasing function in θ.

Finally, rewriting the third equation and letting θ converge to infinity, we obtain(A.4) limθY¯221exp{θπ̂2(θ)}1=limθ1θ(1Y¯211exp{π̂2(θ)}).(A.4)

Since θπ̂2(θ) is an increasing function in θ, it either converges to infinity or to some finite value. In both cases the left-hand side in (A.4) is a finite value. Thus, the right-hand side in (A.4) must be a finite value too. If we define ξ2=limθ(1Y¯21[1exp{π̂2(θ)}]1)/θ as this finite limit, we obtain that limθθπ̂2(θ)=log(1Y¯22/[1+ξ2]), with ξ2[Y¯221,0). Note that ξ2 can not be equal to zero. To demonstrate this, lets assume ξ2=0. In this case, we would have that the expression on the left-hand side in (A.4) would be positive for any value of θ, since θπ̂2(θ) is an increasing function, [1exp{θπ̂2(θ)}]1 is a decreasing function, and its limit is equal to zero. Thus, the expression in the left-hand side of (A.4) approaches ξ2=0 from above. We would also have that limθπ̂2(θ)=limθθ1log(1Y¯22)=0, since Y¯22<1. In combination with the fact that limθ0π̂2(θ)=, this means that we can find a value θ such that the expression on the right-hand side in (A.4) becomes negative, since Y¯21>0. Since the equality in (A.4) should hold for any value of θ and not just for the limits, we have created a contradiction. Thus, the value ξ2 must be negative.

Solution detection proportion for the first microorganism

We will now study the solution π1 from the second equation in (A.1) as a function of θ, assuming that Y¯11 and Y¯12 are not both equal to the same boundary. We will make use of the same reasoning as for π2 where possible. We will study the different settings for Y¯11 and Y¯12 and show that the first equation in (A.1) also has a solution in θ̂ after we substituted the solutions π̂1(θ) and π̂2(θ) in the first equation of (A.1).

Scenario 1 {0<Y¯11<1,0<Y¯12<1}

This is the exact same situation as for π2 and we thus obtain the same characteristics for π̂1(θ) as for π̂2(θ). Since [1exp{θπ̂i(θ)}]1 converges to infinity when θ converges to zero (since limθ0θπ̂i(θ)=0), we can find a value θ0 such that Y¯i2[1exp{θ0π̂i(θ0)}]1>1 for i{1,2}. For this value, it follows for the left-hand side of the first equation in (A.1) that(A.5) π̂1(θ0)Y¯121exp{θ0π̂1(θ0)}+π̂2(θ0)Y¯221exp{θ0π̂2(θ0)}>π̂1(θ0)+π̂2(θ0).(A.5)

Additionally, similar to (A.4) for the second microorganism, Y¯i2[1exp{θπ̂i(θ)}]1 converges to 1+ξi<1 when θ converges to infinity. Thus, we can find a θ1 such that we have the opposite inequality in (A.5). This implies that there exists a θ(θ0,θ1) that would give equality in (A.5), since π̂i(θ) is a continuous function in θ.

Scenario 2 {Y¯11=0,Y¯12>0}

The second equation in (A.1) becomes equal to θY¯12[1exp{θπ1}]1=1+θ and leads immediately to the solution of π̂1(θ)=θ1log(1θY¯12/(θ+1)). This solution satisfies the same characteristics as the solution from Scenario 1. Indeed, θπ̂1(θ)=log(1θY¯12/(θ+1)) is an increasing function in θ with limθ0θπ̂1(θ)=0 and limθθπ̂1(θ)=log(1Y¯12), which could be infinite when Y¯12=1. Furthermore, the solution π̂1(θ) converges to infinity when θ converges to zero. To demonstrate the existence of θ, we can use the same reasoning as in Scenario 1. There is again a θ0 such that inequality in (A.5) holds true, making use of limθ0θπ̂i(θ)=0,Y¯i2>0, and 1exp{θπ̂i(θ)} converges to zero when θ converges to zero. There also exists a θ1>θ0 such that the inequality in (A.5) is reversed when θ0 is replaced by θ1. Indeed, Y¯22[1exp{θπ̂2(θ)}]1 converges to 1+ξ2<1 and Y¯12[1exp{θπ̂1(θ)}]1 converges to 1.

Scenario 3 {Y¯11>0,Y¯12=0}

The second equation in (A.1) becomes equal to Y¯11[1exp{θπ1}]1=1+θ and leads immediately to the solution of π̂1(θ)=log(1Y¯11/(θ+1)). This solution is somewhat different from the solution in Scenario 1, since limθ0π̂1(θ)=log(1Y¯11)< due to Y¯11>0. Furthermore, we have that θπ̂1(θ)=θlog(1Y¯11/(θ+1)) is an increasing function in θ with limθ0θπ̂1(θ)=0 but with limθθπ̂1(θ)=. We now need to show that there exists a solution θ such that(A.6) Y¯22[1exp{θπ̂2(θ)}]1=1+π̂1(θ)/π̂2(θ).(A.6)

Since π̂1(0) is finite, π̂2(θ) converges to infinity when θ converges to zero, and 1exp{θπ̂2(θ)} converges to zero when θ converges to zero, we can select a θ0 such that the left side in (A.6) with θ replaced by θ0 is larger than the right-hand side in (A.6) with θ replaced by θ0. Furthermore, we may find a θ1 such that the left side in (A.6) is lower than one, since 1exp{θπ̂2(θ)} converges to one when θ increases and Y¯22<1, while the right side is always larger than one. Thus, there exist a θ(θ0,θ1) that would give equality in (A.6).

Scenario 4 {Y¯11=1,Y¯12<1}

Note that we may assume here that Y¯12>0, since the combination Y¯12=0 and Y¯11=1 has been addressed in Scenario 3. Since the arguments for finding solution π2 is unaffected by Y¯21=1 when Y¯22<1, the solution π̂1(θ) also exists for situation Y¯11=1 and Y¯12<1. We need to investigate if the same characteristics holds for π̂1(θ) as for π̂2(θ). The continuity of π̂1(θ), limθ0π̂1(θ)=, and being a nondecreasing function is not affected by Y¯11=1 when Y¯12<1. Furthermore, for the asymptotics of θπ̂2(θ) we did not need that Y¯21<1, thus we also have limθθπ̂1(θ)=log(1Y¯12/[1+ξ1]), with ξ1=limθ(1[1exp{π̂1(θ)}]1)/θ. The complication is to demonstrate that limθ0θπ̂1(θ)=0 when Y¯11=1. Rewriting the second equation in (A.1) and making use of limθ0π̂1(θ)=, we obtainlimθ0Y¯12[1exp{θπ̂1(θ)}]1=limθ0(θ1+1[1exp{θπ̂1(θ)}]1)=.

This equality can only hold when limθ0θπ̂1(θ)=0. Thus, the proof now follows Scenario 1.

Scenario 5 {Y¯11<1,Y¯12=1}

Note that we may assume here that Y¯11>0, since the combination Y¯12=1 and Y¯11=0 has been addressed in Scenario 2. For the solution of π1, we may follow the proof of finding solution π2 completely. The existence of π̂1(θ), the continuity of π̂1(θ), limθ0π̂1(θ)=, limθ0θπ̂1(θ)=0 and θπ̂1(θ) being a nondecreasing function in θ are all unaffected by Y¯12=1. With Y¯12=1, it follows that limθθπ̂1(θ)=log(1Y¯12/[1+ξ1])=log(11/[1+ξ1]). Now using the proof of Scenario 1 for finding the solution θ, we also demonstrate the existence of θ̂ in this scenario.

Appendix B

We will demonstrate that the variance of the maximum likelihood estimator θ̂ is given by (10). Note that the log-likelihood function is given in (5) and that the score functions are given in (6). They can be obtained by just taking simple derivatives with respect to the parameters. Here, we will calculate the variances and covariances of the score functions and then calculate the inverse Fisher information to determine the variance of θ̂.

First note that the expected value of the score functions are zero, since E(Yij.)=n[1exp{λθjπi}], with θj=1 if j = 1 and θj=θ if j = 2. Thus, the variance of L.θ is now determined byvar(L.θ)=i=1mvar(λπiYi2.1exp(λθπi)nλπi),=i=1m(λπi1exp(λθπi))2var(Yi2.),=i=1mnλ2πi2exp(λθπi)1exp(λθπi),since the variance of Yi2. is equal to n[1exp(λθπi)]exp(λθπi). The variance of L.πi is given by var(L.πi)=E(λYi1.1exp(λπi)+λθYi2.1exp(λθπi)nλ(θ+1))2,=[λ1exp(λπi)]2var(Yi1.)+[λθ1exp(λθπi)]2var(Yi2.), =nλ2exp(λπi)1exp(λπi)+nλ2θ2exp(λθπi)1exp(λθπi), since the variance of Yi1. is equal to n[1exp(λπi)]exp(λπi). Since Yi1. and Yi2. are independent and data from different microorganisms are independent, the covariance between L.θ and L.πi is now obtained ascov(L.θ,L.πi)=λ2θπi[1exp(λθπi)]2var(Yi2.),=nλ2θπiexp(λθπi)1exp(λθπi), and the covariance between L.πi and L.πi with ii, is obviously zero.

The Fisher information matrix can now be written as(B.1) I=[I11I12I21I22],(B.1) withI11=var(L.θ),I12=(cov(L.θ,L.π1),,cov(L.θ,L.πm)),andI22=[var(L.πi)000var(L.π2)000var(L.πm)].

Clearly, I11 and I22 are nonsingular and invertible. Since I11I12I221I21 is also invertible, the inverse of I can be written as (Lu and Shiou Citation2002)I1=[(I11I12I221I21)1(I11I12I221I21)1I12I221(I22I21I111I12)1I21I111(I22I21I111I12)1].

The variance of the MLE of θ is therefore given by var(θ̂)=(I11I12I221I21)1, the first (top-left) element of the inverse Fisher information matrix. We now obtain thatI12I221I21=i=1m[cov(L.θ,L.πi)]2/var(L.πi),=i=1m(nλ2θπiexp(λθπi)1exp(λθπi))2(nλ2exp(λπi)1exp(λπi)+nλ2θ2exp(λθπi)1exp(λθπi))1.

If we now introduce γij=1exp(λθjπi), we obtain that exp(λθjπi)/[1exp(λθjπi)]=[1γij]/γij, exp(λθjπi)1=γij/[1γij], and the term within the sum of I12I221I21 can be rewritten intonλ2θ2πi2(1γi2γi2)2(1γi1γi1+θ21γi2γi2)1.

Then we obtain thatI11I12I221I21=nλ2i=1mπi2(1γi2)γi2[1θ2(1γi2)γi1γi2(1γi1)+θ2γi1(1γi2)],=nλ2i=1mπi2[(1γi2)(1γi1)γi2(1γi1)+θ2γi1(1γi2)],=nλ2i=1mπi2[(exp(λθπi)1)+θ2(exp(λπi)1)]1,

This leads to the variance in (10).

Appendix C

We will study the convexity of variance VE in (13) as a function of λπ. This is the same as studying this variance as a function of π>0 when λ = 1. Since mn is a constant, we could also assume that it is equal to one. Thus, we will study the function(C.1) VE(π)=exp(θπ)1+θ2(exp(π)1)π2,π>0.(C.1)

To demonstrate that this function is convex, we will study the first and second derivative. The first derivative is equal to(C.2) exp(θπ)(θπ2+2exp(θπ))+θ2exp(π)(π2+2exp(π))π3(C.2) and its second derivative is equal to(C.3) g(x1)+θ2g(x2)π4,(C.3) where g(x)=exp(x)(x24x+66exp(x)),x1=θπ and x2=π. The function g(x) is an increasing function, since its derivative is g(x)=exp(x)(x22x+2) and this function is always positive for any x. Since g(0)=0, we also obtain that g(x) > 0 when x > 0. This implies that the second derivative in (C.3) is positive for π>0. Hence, the variance VE is a convex function in λπ, λ, and π.

Appendix D

Here we present the results of the simulation study for the case where detection proportions were selected as fixed parameters for simulation runs under the same simulation settings.

When the detection proportions are fixed at the quantiles of one of the four distribution functions, shows the summary statistics for the available number of microorganisms in an experiment with m = 15, λ=3.5 and two accuracies θ=0.9 and θ = 1. The results show that for distribution B(5, 1), 5% of the simulation runs have less than 9 organisms, while we started with m = 15 microorganisms and for a few simulations the complete experiment was invalidated based on the criteria in Appendix A. As we mentioned before, the relatively large detection proportions for this distribution function provide a high probability that the number of positive samples for both methods are at the same upper boundary, making the data useless for our estimation procedure.

Table 5 Summary statistics of the included number of microorganisms for fixed detection proportions based on quantile of the four distributions.

shows that the Type I error rate is close to the nominal value of α=0.05. The issues of eliminating data (also for settings with large numbers of organisms removed) do not seem to affect the Type I error a lot. show similar powers to the setting with the detection proportions randomly drawn from the distribution (cf. ). The random settings remove almost the same amount of data from an experiment as the fixed settings, which explains the similarity in power. The power values are at some settings below 80%, since the sample size calculation did not take into account the loss of data.

Fig. 8 Type I error rate of the two noninferiority tests (θ=0.7) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 8 Type I error rate of the two noninferiority tests (θ=0.7) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 9 Power of the two noninferiority tests (θ=0.8) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 9 Power of the two noninferiority tests (θ=0.8) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 10 Power of the two noninferiority tests (θ=0.9) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 10 Power of the two noninferiority tests (θ=0.9) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 11 Power of the two noninferiority tests (θ = 1) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

Fig. 11 Power of the two noninferiority tests (θ = 1) as a function of λ for the different number of microorganisms when detection proportions are fixed at the quantiles of a distribution.

For the case of one small detection proportion and four larger detection proportions, we see similar results as for the detection proportions selected at the quantiles. The summary statistics can be found in , the Type I error can be found in , while the power can be found in .

Fig. 12 Type I error rate of the two noninferiority tests (θ=0.7) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 12 Type I error rate of the two noninferiority tests (θ=0.7) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 13 Power of the two noninferiority tests (θ=0.8) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 13 Power of the two noninferiority tests (θ=0.8) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 14 Power of the two noninferiority tests (θ=0.9) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 14 Power of the two noninferiority tests (θ=0.9) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 15 Power of the two noninferiority tests (θ = 1) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Fig. 15 Power of the two noninferiority tests (θ = 1) as a function of λ for the number of microorganisms m = 5 when detection proportions are fixed with one small value and an average equal to the distribution mean.

Table 6 Summary statistics of the included number of microorganisms for fixed detection proportions with one small value and an average equal to the distribution mean.

Appendix E

PROC NLMIXED DATA = VALIDATION QPOINTS = 20 DF= 10000; PARMS %DO I = 1%TO 16; DET&I = 0.8%END; LOGTHETA=-0.1; %DO I = 1%TO 16; IF MICROORGANISM =&I THEN DO; PC = 1-EXP(-DET&I.*LAMBDA_HAT); PR = 1-EXP(-EXP(LOGTHETA)*DET&I.* LAMBDA_HAT); END; %END; P = (METHOD=’C’)*PC + (METHOD=’R’)*PR; MODEL Y ∼ BINARY(P); ESTIMATE ’LOG ACCURACY’ LOGTHETA ALPHA = 0.1; RUN;

CODE 1. The SAS code for estimating the detection proportions and the common accuracy.