Abstract
The paper proposes new computational methods for obtaining confidence bounds for the least-squares estimates (LSEs) of rate constants in mass action biochemical reaction networks and stochastic epidemic models. Such LSEs are obtained by fitting the set of deterministic ordinary differential equations (ODEs), corresponding to the large-volume limit of a reaction network, to the network's partially observed trajectory, treated as a continuous-time, pure jump Markov process. In the large-volume limit the LSEs are asymptotically Gaussian, but their limiting covariance structure is complicated, since it is described by a set of nonlinear ODEs which are often ill-conditioned and numerically unstable. The current paper considers two bootstrap Monte-Carlo procedures, based on the diffusion and linear noise approximations for pure jump processes, which allow one to avoid solving the limiting covariance ODEs. The results are illustrated with both in-silico and real data examples from the LINE 1 gene retrotranscription model and compared with those obtained using other methods.
1. Introduction
The rapid advances in modern molecular quantification technology, such as fluorescence microscopy, cellular flow cytometry and high-throughput RNA-seq (see, e.g. [Citation15,Citation26,Citation27]), provide us with an unprecedented opportunity to discover the basic operating principles of molecular systems biology. However, progress seems to largely depend on our ability to bring to bear proper statistical methods capable of separating the biological signal of interest from the vast amount of experimental noise. In order to be effective, such methods must be tailored to the specific experimental tools and data collection mechanisms which, despite continuing improvements, still often provide incomplete and highly noisy information.
In this paper, we are concerned with statistical methods for estimating biochemical reaction rates from data consisting of one or more time course trajectories of (partial) network species concentrations measured on a pre-defined time grid via, for example, RT-PCR or flow cytometry assays [Citation4,Citation21]. Although this particular problem has been discussed extensively in the computational biology and statistical literature ([Citation9,Citation12] and references therein), its asymptotic analysis was often dismissed due to the small number of longitudinal data points available from typical laboratory experiments. However, as pointed out recently in [Citation11], in many practical situations it is reasonable to assume that the size of the biological system (e.g. cell culture) under consideration is sufficiently large to allow for its large-volume approximation, a technique often used in modern population dynamics models, for example, in epidemic spread studies [Citation1]. The volume limit idea should be particularly appealing to molecular biology experimentalists, as it circumvents the problem of small longitudinal sample sizes due to instrumental limitations and cost considerations. The same approach is also applicable (although this is not the focus of the current paper) to stochastic epidemic models [Citation1].
Using the large-volume argument, Rempala [Citation16] has shown that the popular method of parameter estimation based on the least-squares criterion yields consistent and asymptotically efficient results when fitting the deterministic system of ordinary differential equations (ODEs) to stochastic data arriving from large collections of molecules (e.g. in RT-PCR experiments) under the law of mass action kinetics. However, as also pointed out in [Citation16], the practical obstacle to the large-volume approximation for the least-squares estimates (LSEs) is that their joint asymptotic covariance structure is given, very non-explicitly, by a solution to a set of nonlinear differential equations. Consequently, due to the often numerically ill-posed inverse problem, it is quite difficult to obtain asymptotic confidence bounds for the LSEs, even in relatively simple models.
In the current paper, we aim to improve the practical applicability of the results discussed in [Citation16]. First, we consider a slightly more general version of the LSE, the so-called penalized LSE or PLSE. Whereas in many modern statistical analysis problems penalized methods are used to improve identifiability and decrease variability in the model predictions, in our case the biggest potential gain in their usage is the stabilization of the variance matrix (for the fixed volume), in a way analogous to the ridge model in classical regression analysis [Citation10]. Furthermore, based on the consistency of PLSEs, we derive, and illustrate with examples, two large-volume parametric bootstrap (Monte-Carlo) procedures which allow for consistent estimation of the covariance structure, possibly under limited and only partially observable trajectory data. In order to improve efficiency in re-computing model trajectories, we propose the usage of diffusion and closely related linear noise approximations (LNAs) [Citation12] within the bootstrap algorithm. We show that the consistency of our procedures follows from the classical large-volume limit theorem for density-dependent jump processes (see, e.g. [Citation7, Chapter 10]). Finally, in order to apply our methods to practical experimental data, we also propose a procedure based on the profile likelihood bootstrap algorithm which allows for a more realistic selection of the effective volume size of the modelled system. In essence, we treat the volume size as an estimable parameter of the model and fit it via the maximum likelihood method.
Beyond this introductory section, the paper is organized as follows. In Section 2, we introduce the concept of a stochastic biochemical network under mass action dynamics, along with the necessary notation. In Section 3, we discuss the least-squares estimation problem for mass action biochemical networks and prove basic consistency and normality results for penalized least-squares estimates (PLSEs). These PLSE results are analogous to those in [Citation16] but form the foundation for our later analysis and are provided for completeness. As they are consequences of a general law of large numbers and a central limit theorem (CLT) for density-dependent stochastic processes, we provide a brief review of the relevant facts about such processes in Appendix 1. In Section 4, our main results on the bootstrap algorithms for estimating the LSE distribution are discussed, along with the formulation of the required large-volume consistency and diffusion approximation results. The proofs are deferred to Appendix 2. Finally, in Section 5, we illustrate some of the concepts discussed earlier with the analysis of the LINE 1 retrotransposition model for both synthetic and real data, comparing the latter analysis to the one presented in [Citation16] relying on simple asymptotic LSE bounds. Our conclusions and a brief summary are given in Section 6.
2. Jump processes and mass action kinetics
A reaction system or reaction network is typically defined as a set of d constituent chemical species reacting according to a finite set of equations, each describing the system state change after the occurrence of a particular reaction. For instance, the chemical reaction A + B → C (at rate λ) is interpreted as ‘a molecule of species A combines with a molecule of species B to create a molecule of species C at rate λ’. In this paper, the reaction system is treated as a continuous-time Markov chain, with its state described by a d-dimensional vector keeping track of each species' molecular counts at time t. The kth reaction is determined by the reaction rate function along with two vectors which give, respectively, the number of molecules of each species consumed and produced in the reaction. Both vectors are elements of , and the corresponding reaction rate is a non-negative scalar. Under the assumptions that the system is well-stirred and thermally equilibrated, the reaction rates take on the classical mass action form and the waiting times between reactions are exponentially distributed (see, e.g. [Citation8]). Intuitively, the mass action form of the kth reaction rate function represents the number of different ways we can choose the molecular substrates needed for reaction k to occur [Citation7, Chapter 10]. Defining , the kth reaction rate function takes the mass action form. Here, the parameter n denotes the effective volume of the system. Whereas in many classical chemistry applications n is simply the physical volume of a container (e.g. cell) multiplied by Avogadro's number, in biological systems, where often the physical units of the parameters are not readily available,Footnote1 it is convenient to consider n simply as a scaling parameter proportionate to the overall system volume [Citation2,Citation3].
The exponential waiting time distribution, along with the probability of a particular reaction [Citation8], justifies the system time-evolution equation (1). Denoting by the species concentration per unit volume and observing the behaviour of the mass action rate functions in the large-volume limit, one obtains the approximation (2). When is replaced with in Equation (1), the process becomes a member of a general family of the so-called density-dependent Markov jump processes (DDMJP) described in the appendix. As shown in the appendix (cf. Theorem A.1), the approximation (2) ties together the stochastic jump process with its large-volume limit, which turns out to be given by the solution of an integral version of the classical law of mass action ODE (the so-called reaction rate equation, cf., e.g. [Citation24]), given in Equation (3). The above ODE is parametrized by a vector of parameters, say β, involving the rate constants as well as possibly the initial conditions . In what follows, we write for any such solution to either Equation (3) or its integral version (see also Appendix 1).
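The jump dynamics described above can be simulated exactly with the classical Gillespie algorithm: draw an exponential waiting time with rate equal to the total reaction intensity, then select which reaction fires with probability proportional to its rate. The following minimal Python sketch illustrates this; the three-species network A + B → C, the rate constant, and all numerical values are hypothetical stand-ins chosen for illustration, not part of the paper's model.

```python
import numpy as np

def gillespie(x0, stoich, rates, t_max, rng):
    """Simulate one trajectory of a pure jump (Gillespie) process.

    x0     : initial molecular counts (length-d array)
    stoich : (K, d) array of net state-change vectors, one row per reaction
    rates  : function x -> length-K array of reaction intensities
    """
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_max:
        a = rates(x)
        a0 = a.sum()
        if a0 <= 0:                            # no reaction can fire any more
            break
        t += rng.exponential(1.0 / a0)         # exponential waiting time
        k = rng.choice(len(a), p=a / a0)       # reaction k fires w.p. a_k / a0
        x += stoich[k]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Hypothetical network A + B -> C with mass action intensity lam * A * B / n
n, lam = 100.0, 1.0
stoich = np.array([[-1, -1, 1]])
rates = lambda x: np.array([lam * x[0] * x[1] / n])
rng = np.random.default_rng(0)
t, x = gillespie([100, 80, 0], stoich, rates, t_max=10.0, rng=rng)
```

Note that each jump preserves the conservation relations implied by the stoichiometry (here A + C and B + C stay constant), which is a useful sanity check on any implementation.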
3. Penalized least squares
Let the counts of d species observed at discrete time points be denoted by , where for now we assume that the observed data arrived from a single trajectory of a DDMJP for which the effective system volume n was known. The extension to multiple trajectories is straightforward, but not considered here except for the very simple case of independent observations (see Corollary 3.1 of Theorem 3.2). The case when n is unknown is discussed in Section 5. Following Rempala [Citation16], we denote the scaled molecular counts by and assume that is the true parameter value we intend to estimate. Throughout the paper, the PLSE is defined as any solution (not necessarily unique) to the optimization problem (4), where , h is a penalty function and Θ is an open set in containing . Common forms of include for ridge regression, for the lasso [Citation23], and for the elastic net [Citation29]. Note that in our setting we will typically have component-wise, in which case the lasso penalty becomes (right) differentiable as it reduces to
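To make the optimization problem (4) concrete, the sketch below fits a single rate parameter by penalized least squares with a ridge penalty. The one-parameter decay model x′ = −bx, whose explicit solution x₀e^(−bt) stands in for the mass action ODE solution, together with the noise level and the penalty rate, are illustrative assumptions rather than the paper's network; a grid search replaces a numerical optimizer to keep the example self-contained.

```python
import numpy as np

# Synthetic "scaled concentration" data on a fixed time grid; the noise level
# 1/sqrt(n) mimics the stochastic fluctuations of a volume-n system.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0, 9)
b_true, x0, n = 1.5, 1.0, 400
obs = x0 * np.exp(-b_true * t) + rng.normal(0.0, 1.0 / np.sqrt(n), t.size)

def plse_objective(b, lam):
    """Sum of squared residuals plus ridge penalty, as in problem (4)."""
    resid = obs - x0 * np.exp(-b * t)
    return np.sum(resid ** 2) + lam * b ** 2

lam = 1.0 / n                      # a vanishing penalty rate (assumed here)
grid = np.linspace(0.1, 3.0, 2901)
b_hat = grid[int(np.argmin([plse_objective(b, lam) for b in grid]))]
```

As the penalty rate vanishes with n, the PLSE and the ordinary LSE coincide in the limit, while for finite n the ridge term slightly shrinks the estimate toward zero.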
For any matrix A, define . The following two results extending those in [Citation16] will be useful in the following sections. The first one establishes the strong law of large numbers (SLLN) and the consistency of as .
Theorem 3.1
SLLN
Let denote the derivative of with respect to the kth coordinate of and set . Assume that , that the SLLN for DDMJP holds (cf. Theorem A.1 in the appendix), that the subderivatives of the penalty function h are bounded, and that as . If the identifiability condition (5) holds and, for some , the non-degeneracy condition (6) holds, then the penalized LSEs given by Equation (4) are strongly consistent, that is,
Proof.
Let denote the vector of subderivatives (the sub-gradient) of . By the so-called KKT necessary optimality conditions (see, e.g. [Citation20, Chapter 3]), any solution of the optimization problem (4) has to satisfy , with and denoting, respectively, the observations from d chemical species concentrations measured at and the corresponding deterministic solution of Equation (3) evaluated at and . This yields . Adding and subtracting the deterministic limit gives , or (7). Under the SLLN for DDMJP cited in the appendix (cf. Theorem A.1), the condition and the boundedness of imply that the RHS above converges a.s. to 0, and hence so does the LHS. Consequently, . Under the assumptions of trajectory identifiability (5) and non-degeneracy (6), the asymptotic inversion gives a.s.
Remark 3.1.
Note that the conditions (5) and (6) are necessary to guarantee the LSE consistency. The selection of points is therefore important to ensure good statistical properties of LSEs, as also seen in our next theorem below. The required boundedness of the subderivatives of h is, on the other hand, a relatively mild condition, satisfied, e.g. for and . Indeed, the latter is simply differentiable, and for the former, recall that the subderivative of is , or any number in , respectively, according to , or .
The second result of this section is the CLT for the scaled PLSEs . Similarly to the SLLN above, this one is also a consequence of the appropriate result for DDMJP.
Theorem 3.2
CLT
Suppose that the assumptions of the previous theorem hold and, additionally, that and h is twice differentiable in . Furthermore, assume that the hypothesis of the CLT for DDMJP holds (cf. Theorem A.2 in the appendix). Then for any , where is a multivariate zero-centred Gaussian distribution with the covariance matrix given in Equation (8). Here, H is the Hessian matrix of the penalty function h and is given by Equation (A5) of Theorem A.2 in the appendix.
Proof.
In the notation of the previous proof, under the differentiability assumptions on the penalty function h, in Equation (7) we may take , the gradient vector of the penalty. Expanding the LHS of Equation (7) about and noting that, in view of the theorem assumptions, the higher order terms scaled by tend to 0, we obtain (9), since the condition implies that the term in the second line above converges to 0. From Theorem A.2 in the appendix, it follows that the d-vectors converge weakly to d-variate normal variables whose covariance structure is given by , described in Equation (A5). Therefore, the relation (9), along with the continuous mapping and Slutsky's theorem arguments (see, e.g. [Citation22, p. 133]), implies that , where ⇒ denotes the usual weak convergence of random vectors. Since by Slutsky's theorem, the hypothesis follows via the Cramér–Wold device (see, e.g. [Citation22, p. 56]).
Remark 3.2.
Note that the relation (9) indicates that for finite n the penalty term may decrease the variability of the LSE by stabilizing the inverse of the matrix . Indeed, for instance for , this effect is similar to that observed in the ridge regression model. This is further illustrated in the numerical example in Section 5 (cf. Table 1).
Table 1. Penalized vs. ordinary LSEs for L1.
In Theorems 3.1 and 3.2, we have assumed that the data come from a single trajectory of the jump process. However, the results are immediately extendable to the case of multiple independent trajectories. This special case of independent observations is of interest, for instance, in RT-PCR experiments with multiple cell-culture dishes and therefore deserves special attention.
Corollary 3.1.
Assume that the trajectory data are mutually independent. Then, for and assuming that the remaining conditions of Theorem 3.2 hold true, its hypothesis also holds with
As already indicated in Section 1, the complicated, non-explicit covariance structure of the collection of random variables limits the usefulness of the relation (9) in obtaining asymptotic confidence bounds for the penalized LSE. Although for some simple reaction networks an explicit analytic expression for Equation (8) may sometimes exist, in most cases of practical interest it may only be evaluated numerically (cf. Section 5). As such evaluations become more unstable with increasing system complexity (see, e.g. [Citation25]), methods of approximating the asymptotic distribution of the LSEs without the need to explicitly evaluate the formula (8) are naturally of great interest. This issue is discussed next.
4. Bootstrap algorithms
We present here two resampling algorithms for estimating the distribution of the PLSEs discussed in the previous section. As seen below, this may be done either directly, by considering the appropriate resampling distributions themselves, or indirectly, by considering the resampling estimates of the asymptotic covariance in Theorem 3.2.
4.1. Diffusion approximation
For the purpose of bootstrapping trajectories, it is useful to have an easy-to-simulate approximation to the concentration process , where is given by Equation (1) (cf. also Equation (A2) in the appendix). To this end, consider the Itô diffusion process (10) (see, e.g. [Citation2]), where are independent standard Brownian motions and the drift function is given by Equation (11). Note that under the law of mass action approximation (2), is asymptotically equal to the RHS of the reaction rate equation (3). As shown in Lemma A.3 in the appendix, the process defined by Equation (10) approximates in the sense that both share the same deterministic large-volume limit given by the solution of the reaction rate equation (3). Note that there is an obvious computational benefit in considering the process instead of the pure jump process , which requires an exact (hence expensive) simulation of the trajectory (1) [Citation8]. For large systems, the simulation of is typically substantially less computationally intense and may be performed via one of the many standard numerical methods available in the literature [Citation28, Chapter 5].
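One such standard numerical method is the Euler–Maruyama scheme, which discretizes the diffusion by adding, at each step, the drift times the step size plus the diffusion coefficient times a Gaussian increment. The sketch below applies it to the chemical Langevin equation of the same hypothetical one-reaction network A + B → C used earlier (drift along the stoichiometric vector, noise scaled by n^(−1/2)); the network and all constants are illustrative assumptions, not the paper's model.

```python
import numpy as np

def langevin_path(z0, lam, n, t_grid, rng):
    """Euler-Maruyama path of the diffusion approximation (10) for the
    hypothetical reaction A + B -> C with mass action rate lam * zA * zB."""
    nu = np.array([-1.0, -1.0, 1.0])          # net stoichiometry of the reaction
    z = np.empty((len(t_grid), 3))
    z[0] = z0
    for i in range(1, len(t_grid)):
        dt = t_grid[i] - t_grid[i - 1]
        a = max(lam * z[i - 1, 0] * z[i - 1, 1], 0.0)   # propensity at current state
        dw = rng.normal(0.0, np.sqrt(dt))               # Brownian increment
        # drift term + n^{-1/2}-scaled noise term, both along nu
        z[i] = z[i - 1] + nu * a * dt + nu * np.sqrt(a / n) * dw
        z[i] = np.maximum(z[i], 0.0)          # keep concentrations non-negative
    return z

rng = np.random.default_rng(2)
t_grid = np.linspace(0.0, 5.0, 501)
path = langevin_path([1.0, 0.8, 0.0], lam=1.0, n=1000, t_grid=t_grid, rng=rng)
```

For large n the noise term is small and the path hugs the deterministic mass action solution, which is exactly the shared large-volume limit mentioned above.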
The considerations above lead to the diffusion bootstrap algorithm (Algorithm 1) for approximating either directly the distribution of the penalized LSE or the covariance matrix . The formal justification of Algorithm 1 is given in the following consistency result (Theorem 4.1) stating that as the system volume increases, the bootstrap samples have the asymptotically correct empirical distribution and, consequently, may be used to estimate the covariance structure of the least squares and penalized least-squares estimates with vanishing penalty. The proof is given in Appendix 2.
Theorem 4.1.
Assume is a scaled trajectory from the pure jump process (1) with . Let be associated with the probability space . Let denote the diffusion process parametrized by the LSE and let be the penalized least-squares estimate associated with the conditional probability space . Then, under the assumptions of Theorem 3.2,
Although in the above theorem we have assumed that the penalty function is differentiable, it should be noted that this is not necessary. Provided that the subderivatives of the penalty term are bounded, the term vanishes asymptotically and the theorem hypothesis still holds.
Remark 4.1.
From the proof of the theorem, it is evident that the hypothesis of Theorem 4.1 extends to the case of independent observations of Corollary 3.1.
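Putting the pieces together, the diffusion bootstrap can be sketched as: fit the LSE to the observed scaled trajectory, simulate many diffusion paths at the fitted parameter, refit each resample, and take the empirical spread of the refits as the covariance estimate. The sketch below does this for a hypothetical one-reaction degradation network X → ∅ with rate bX, whose chemical Langevin equation is dZ = −bZ dt + n^(−1/2)√(bZ) dW; the model, constants, and grid-search fitter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
t_obs = np.linspace(0.0, 2.0, 9)      # observation grid
n, b_true = 500, 1.5

def simulate(b, rng, dt=0.01):
    """Euler-Maruyama path of dZ = -bZ dt + sqrt(bZ/n) dW, read on t_obs."""
    t_grid = np.arange(0.0, 2.0 + dt, dt)
    z = np.empty(t_grid.size)
    z[0] = 1.0
    for i in range(1, t_grid.size):
        a = max(b * z[i - 1], 0.0)
        z[i] = z[i - 1] - a * dt + np.sqrt(a / n) * rng.normal(0.0, np.sqrt(dt))
        z[i] = max(z[i], 0.0)
    return np.interp(t_obs, t_grid, z)

def lse(obs):
    """Grid-search least squares against the ODE solution exp(-b t)."""
    grid = np.linspace(0.1, 3.0, 581)
    sse = [np.sum((obs - np.exp(-b * t_obs)) ** 2) for b in grid]
    return grid[int(np.argmin(sse))]

b_hat = lse(simulate(b_true, rng))                       # step 1: fit the data
boot = [lse(simulate(b_hat, rng)) for _ in range(200)]   # steps 2-3: resample, refit
se_boot = float(np.std(boot))                            # bootstrap standard error
```

The empirical distribution of the refitted values then stands in for the asymptotic LSE distribution, with no covariance ODEs solved along the way.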
4.2. Linear noise approximation
An alternative approximation of Equation (1), which is often numerically simpler than , may be obtained by directly appealing to the limit theorem for DDMJP, that is, Theorem A.2 in the appendix. Since, under the assumptions of Theorem A.2, as , where is the Gaussian process satisfying the distributionally equivalent diffusion equations (A3) and (A4), the natural approximation of the process by its limit is therefore given by Equation (12), where we write β for for notational convenience. The process is commonly referred to as the LNA of [Citation12]. In our present context, it may also be heuristically derived as follows (for another heuristic derivation cf., e.g. [Citation28, Chapter 8]). Since is given by Equation (10) and a.s. (cf. Theorem A.1), then, substituting defined in Equation (12) for and applying Taylor's expansion about to the drift function , one obtains for large n , where . Assuming that , where both and are non-random, and collecting terms of different order in n, the above yields Equations (13) and (14), showing that the quantities and in Equation (12) are indeed (distributionally) the same as those defined in Theorem A.2.
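The decoupling in (12)–(14) is easy to exploit numerically: the deterministic part is solved once, the linear fluctuation SDE is simulated around it, and the two are combined with the n^(−1/2) scaling only at the end. The sketch below does this for the same hypothetical degradation model X → ∅ used above (drift F(z) = −bz, so the ODE (13) has the explicit solution e^(−bt) and the linear SDE (14) has drift F′(φ)V = −bV); all constants are illustrative assumptions.

```python
import numpy as np

def lna_path(b, n, t_grid, rng):
    """LNA resample Z = phi + V / sqrt(n) for the hypothetical model
    F(z) = -b z: phi solves the ODE (13), V the linear SDE (14)."""
    dt = np.diff(t_grid)
    phi = np.exp(-b * t_grid)                 # phi' = -b phi, phi(0) = 1
    v = np.zeros(t_grid.size)                 # fluctuation process, V(0) = 0
    for i in range(1, t_grid.size):
        dw = rng.normal(0.0, np.sqrt(dt[i - 1]))
        # dV = F'(phi) V dt + sqrt(propensity at phi) dW
        v[i] = v[i - 1] - b * v[i - 1] * dt[i - 1] + np.sqrt(b * phi[i - 1]) * dw
    return phi + v / np.sqrt(n)               # n enters only through this scaling

rng = np.random.default_rng(4)
z = lna_path(b=1.5, n=500, t_grid=np.linspace(0.0, 2.0, 201), rng=rng)
```

Because φ is shared by all resamples and n appears only in the final rescaling, the same set of simulated fluctuation paths can be reused across bootstrap replicates and across candidate values of n.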
The usage of in place of leads to another resampling algorithm similar to the diffusion bootstrap, but potentially exhibiting better numerical stability due to the natural decoupling of the deterministic and stochastic components of the resamples following from Equation (Equation12(12) (12) ). For the record, we present the second resampling algorithm below as Algorithm 2.
Formally, the validity of Algorithm 2 is derived from the following analogue of Theorem 4.1. The proof is again deferred to the appendix.
Theorem 4.2.
Let and be as in Theorem 4.1. Let denote the LNA process conditional on and let be the penalized least-squares estimate associated with the conditional probability space . Then, under the assumptions of Theorem 3.2,
Note that the remark following Theorem 4.1 also applies to Theorem 4.2.
4.3. Unobserved species and the EM algorithm
In most practical circumstances, some of the species in the chemical model will be unobservable due to technical or other experimental limitations, like, for instance, the inability to simultaneously record genomic and proteomic data [Citation16]. Let us briefly outline how the inference problem may be handled in such cases. Following the set-up of Section 3, denote by the limiting scaled concentration of chemical species j at time point (assumed observed) and let and . Define in a similar fashion the vector corresponding to the unobserved species, so that the full species vector is then . Under either the LNA or the diffusion approximation assumption, the joint limiting likelihood of the full data is multivariate normal, with its mean being the vector of the ODE solutions to Equation (3) for all d species and its precision matrix partitioned so that the partitioning pattern reflects the covariance structure between the observed and unobserved species vectors. In this notation, one can write down the limiting log-likelihood function of the full data. With its help, the asymptotic inference on β may be performed via the usual EM algorithm [Citation5]. In particular, it follows that the algorithm's E-step has a closed form expression (see, e.g. [Citation14, Chapter 2]) and the likelihood may be maximized numerically as soon as the Gaussian covariance structure is estimated (e.g. via Algorithm 1 or 2).
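The closed form of the E-step rests on the standard Gaussian conditioning identity: if (Y_o, Y_u) is jointly normal, then Y_u given Y_o = y_o is again normal with mean m_u + S_uo S_oo⁻¹(y_o − m_o) and covariance S_uu − S_uo S_oo⁻¹ S_ou. The sketch below states this in terms of the covariance blocks rather than the precision-matrix partition used in the text (the two are equivalent); the 2-dimensional toy numbers are purely illustrative.

```python
import numpy as np

def e_step(y_o, m_o, m_u, S_oo, S_ou, S_uu):
    """Conditional mean and covariance of the unobserved Gaussian block
    given the observed one (the closed-form E-step)."""
    K = S_ou.T @ np.linalg.inv(S_oo)          # "regression" of Y_u on Y_o
    cond_mean = m_u + K @ (y_o - m_o)
    cond_cov = S_uu - K @ S_ou
    return cond_mean, cond_cov

# Toy 2-D check: a correlated pair, with the first coordinate observed.
m = np.zeros(2)
c = np.array([[1.0, 0.8],
              [0.8, 1.0]])
mean_u, cov_u = e_step(np.array([1.0]), m[:1], m[1:],
                       c[:1, :1], c[:1, 1:], c[1:, 1:])
# Here mean_u = [0.8] and cov_u = [[0.36]].
```

In the network setting, Y_o and Y_u collect the observed and unobserved species trajectories, and the covariance blocks are estimated via Algorithm 1 or 2 before each E-step.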
5. Example: L1 system
As an example, consider a retrotranscription model of the LINE1 or L1 gene, introduced and studied earlier in [Citation18]. The system equations given below describe the first stage of the retrotranscription process, in which the L1 transcript () production is activated by aggregated amounts of transcription facilitating proteins () that interact with the regulatory region of the gene. The set of chemical equations in the system with effective size n is given in (15). In the above system, the so-called equal dispersal condition (EDC) is often assumed, where . Under EDC, the re-parameterized mass action ODE solution in Equation (13) has the closed form (16), where and gives the fold change from the initial concentrations [Citation16]. Note that in this parametrization the regularity conditions (5) and (6) are satisfied for the L1 system, and that the diffusion approximation (10) specializes to Equation (17), where are independent standard Brownian motions. The LNA for L1 may be derived analogously.
5.1. Asymptotic LSE distribution for L1
In order to illustrate the bootstrap methods from Section 4, we compare the bootstrap estimates of the LSE distributions to their true asymptotic ones in the L1 model under EDC. The mass action ODE for L1 is given by Equation (16). Upon normalizing to the initial concentrations,Footnote2 and , the drift function and its gradient are as given below. The matrices Φ and G in the corresponding CLT covariance calculations (Theorem A.2) are, respectively, and . Recall that the asymptotic covariance formula for the components of the scaled limiting process in Theorem 3.2 is , which may be numerically evaluated along with the remaining terms in the asymptotic variance expression (8). For comparison, we generate the L1 system trajectories with , and , and assume the process to be observed at the time points from to in alternating increments of and . For example, the first 5 time points are . The resulting asymptotic covariance matrix for the LSE (with no penalty, ), obtained with the help of numerical integration, is shown in Figure along with its bootstrap estimates for varying n. For the bootstrap, the diffusion approximation with the corresponding was applied using the Euler–Maruyama method with step size . The number of bootstrap samples was taken as for each of the system sizes . To additionally illustrate graphically the recovery of the asymptotic covariance structure by the bootstrap algorithms, we show in Figure the theoretical (upper panel) and bootstrap approximated (lower panel) bivariate marginal distributions of the LSE of , denoted .
Figures and indicate that the empirical distribution of the bootstrapped LSEs closely approximates the theoretical distribution, illustrating in practice the application of Algorithms 1 and 2. Note that obtaining the bootstrap samples only requires the ability to simulate from the diffusion process.
5.2. Penalized LSE for L1
In most practical cases, the data-generating process cannot be measured with arbitrary precision, due to various data collection limitations or errors arising during data processing. In general, such processing errors (see, e.g. [Citation19]) add variation to the LSEs and may compound the problems with formula (8), which in practice needs to be evaluated using estimated values of the model parameters in a fixed volume and hence is sensitive to the LSEs' lack of precision. The penalty term in Equation (4) is intended to improve the precision of the LSE estimates by giving up a small amount of accuracy (recovering it in the limit, due to ). Since, as already indicated in Section 3, the penalty term also contributes to stabilizing the inverse matrix in Equation (8), the use of the penalized formulation (4) for LSEs with a fixed sample size may bring considerable benefits.
To illustrate this point, we now compare the bootstrap estimates for the lasso-penalized and ordinary LSE in the L1 model with various system sizes. The values of the parameters are as above. Since the lasso penalty is right-differentiable in our setting, as coordinate-wise, all the forms of penalty discussed in Section 3 are smooth and hence roughly equivalent (see [Citation10]). To account for observational errors, we also introduce independent mean 0 Gaussian noise at each observation grid time point. The variance of this noise is set to 5% of the maximum concentration of mRNA L1 transcripts (), which is equivalent to approximately 25% of the maximum concentration of the transcription facilitating proteins (). The penalty rate is chosen as for and , so as to satisfy the LSE consistency requirements, although in a real data analysis one would typically choose it by some form of cross-validation procedure (cf. [Citation6]). The results of the analysis for resamples are summarized in Table 1. As seen from the table, for the system sizes of and , the penalized LSE enjoys, respectively, a 10-fold and 2-fold decrease in variance as compared with the ordinary LSE of , with only a modest increase in bias. For the penalized estimate of , there is nearly a 5-fold decrease in variance for but only a moderate decrease for , while the variability of the estimates remains roughly the same for the LSE and penalized LSE for . From inspection of the table entries it should be clear that with noisy data the penalized LSE does better in terms of variance for small system sizes, whereas for larger n the performance of the ordinary and penalized LSE is very similar.
5.3. RT-PCR Data for L1
In order to examine a more realistic biological example, in addition to the above synthetic data, we also consider real-time PCR data from one of the time course experiments analysing the L1 system in HeLa cell cultures [Citation13] under external chemical stress, as described in [Citation17]. The data consist of the L1 transcript values () only, as the protein () is unobservable. The L1 transcript measurements from the experiment are presented in Table 2, taken from Rempala [Citation16], where the LSE-based asymptotic confidence bound for this particular data set was obtained by approximating the overall system size and using the asymptotic distribution of the LSE. The system size approximation in [Citation16] was based on a relatively crude fluorescence conversion method introduced in [Citation17]. We shall now illustrate how the bootstrap approach may be used to help identify the most likely system size, without using a (possibly unreliable) method based on fluorescent signal conversion. Note that, unlike in the previous examples, the time-course L1 transcript measurements reported in Table 2 were obtained from genetically identical but spatially separated L1 cell cultures (different tissue culture plates) and hence represent not a single one, but rather multiple trajectories of the stochastic system (15), each observed at a different time point . Nonetheless, due to the genetic homogeneity of the tissue cultures, the initial conditions for the observed trajectories and the respective experimental system sizes n may be assumed to be approximately equal, and Theorems 4.1 and 4.2 (and Algorithms 1 and 2) are still applicable in view of Remark 4.1.
Table 2. Time-course RT-PCR experiment results for L1 gene.
As in our earlier analysis, we took and , reducing Equation (16) to the one-dimensional model (18). Since, with the concentration unobserved, this model is no longer identifiable, we also fix and hence reduce the initial L1 model to only two parameters . Application of the LNA diffusion approximation and the EM algorithm to the limiting LSE problem, as discussed in Section 4.3 (with no penalty, i.e. ),Footnote3 gave the following values of the estimates, which are very similar to those obtained in [Citation16]:
Based on these numerical values, the bootstrap algorithm (Algorithm 2) was applied to calculate the confidence bounds for the observed longitudinal trajectory of the L1 transcripts. The bounds depend on the unknown system size n, which was estimated via the method of likelihood profiling as follows. First, for a grid of n values (from to 9000, in steps of 500), the bootstrap approximation based on resamples was obtained using Algorithm 2 (note that with Algorithm 2 the same set of bootstrap resamples may be used with different n). Next, the actual value of the observed error was compared with the modes of the error distribution for varying n. The best mode-to-data alignment (over the gridded values of n) was obtained at , and hence this value was taken as the estimated effective system size. The process is illustrated in Figure . In Figure , the 95% confidence envelopes of the original analysis in [Citation16] (where the effective system size was taken as ) are compared with the ones obtained from the likelihood profiling (). As may be seen from the plots, the new larger estimate of the system size has the effect of shrinking the uncertainty around the observed trajectory, due to generally tighter confidence bounds about the observed data points, illustrating the advantage of the bootstrap-based analysis for the data in Table 2.
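The profiling step above exploits the LNA structure Z = φ + V/√n: one set of simulated fluctuation paths V can be rescaled for every candidate n, and the candidate whose typical resampled error best matches the error observed in the data is selected. The sketch below illustrates the idea with a hypothetical decay model and synthetic "data"; the lower grid bound, the use of the median as a proxy for the mode, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0.0, 2.0, 9)
phi = np.exp(-1.5 * t)                 # fitted deterministic solution (assumed)
n_true = 2000
# Stand-in for the observed data: fluctuations of order n^{-1/2} around phi.
y = phi + rng.normal(0.0, 1.0, t.size) / np.sqrt(n_true)
err_obs = np.sum((y - phi) ** 2)       # observed squared error about phi

# One shared set of fluctuation resamples, rescaled for every candidate n.
V = rng.normal(0.0, 1.0, size=(400, t.size))
best_n, best_gap = None, float("inf")
for n in range(500, 9001, 500):
    errs = np.sum((V / np.sqrt(n)) ** 2, axis=1)        # resampled errors at this n
    gap = abs(np.median(errs) - err_obs)                # median as a mode proxy
    if gap < best_gap:
        best_n, best_gap = n, gap
```

Because the resamples are generated only once, sweeping the n-grid costs little more than a single bootstrap run, which is the computational advantage noted in the text.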
6. Summary and discussion
We proposed here, and justified with appropriate large-volume limit results, several new ways of practically estimating the covariance structure and, more generally, of improving the performance of LSEs in biochemical network models. We discussed three possible approaches based on (a) the penalized least-squares criterion, (b) fast bootstrap algorithms for covariance estimation and (c) a bootstrap-based profiling method for the determination of the effective system size.
With regard to (a), we have shown that under a vanishing penalty on the parameter size, the LSEs remain consistent in the limit, but may additionally exhibit much better precision in small-volume settings. Concerning (b), we have argued that the use of bootstrap Monte-Carlo algorithms may allow one to avoid the often difficult calculation of the LSEs' asymptotic covariance structure. To improve efficiency, we proposed the use of diffusion-based bootstrap methods and showed them to be consistent in the large-volume limit. Two types of diffusion approximation were considered: (i) the classical chemical Langevin equation approximation and (ii) the LNA method derived directly from the CLT for jump processes. The latter, although slightly more biased in our simulation studies, seems particularly attractive as it allows one to generate resamples which are independent of the system size (volume) n. With regard to (c), this aspect of the LNA-based bootstrap may be used to develop an efficient algorithm for estimating the effective volume of the biochemical model, in essence by analysing the size of the model-fit fluctuations for different volume sizes. We have illustrated the practical applicability of this approach to reducing model uncertainty for partially observed, longitudinal RT-PCR data from the LINE 1 retrotranscription network.
Overall, our current work supports the conclusion that bootstrap methods, although not without limitations (for example, the requirement that one be able to simulate from the entire reaction network), offer a viable alternative to a direct asymptotic approximation based on the estimated covariance structure. One hopes that the results presented here will encourage further study of the general application of resampling to biochemical network inference problems, in both its theoretical and applied aspects, in particular in the setting, specifically not considered here, when both .
Acknowledgments
Research supported in part by US NIH grant R01CA-152158 (DFL, GAR) and US NSF grant DMS-1106485 (GAR).
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. See Section 5.3.
2. This is done out of convenience to eliminate superfluous parameters. Note that any such renormalization possibly affects the units of the system volume n.
3. In this particular case, given the large system size n, the penalty would have only a negligible effect.
References
- H. Andersson and T. Britton, Stochastic Epidemic Models and Their Statistical Analysis, Vol. 4, Springer, New York, NY, 2000.
- D.F. Anderson and T.G. Kurtz, Continuous Time Markov Chain Models for Chemical Reaction Networks, in Design and Analysis of Biomolecular Circuits, H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore, eds., Springer, New York, NY, 2011, 3–42.
- K. Ball, T.G. Kurtz, L. Popovic, and G. Rempala, Asymptotic analysis of multiscale approximations to reaction networks, Ann. Appl. Probab. 16(4) (2006), pp. 1925–1961. doi: 10.1214/105051606000000420
- M.W. Chevalier and H. El-Samad, A data-integrated method for analyzing stochastic biochemical networks, J. Chem. Phys. 135(21) (2011), 214110. doi: 10.1063/1.3664126
- A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. Ser. B (Methodol.) 39(1) (1977), pp. 1–38.
- B. Efron and G. Gong, A leisurely look at the bootstrap, the jackknife, and cross-validation, Amer. Statist. 37(1) (1983), pp. 36–48.
- S.N. Ethier and T.G. Kurtz, Markov Processes: Characterization and Convergence, Vol. 282, Wiley, New York, NY, 2009.
- D.T. Gillespie, A rigorous derivation of the chemical master equation, Phys. A: Statist. Mech. Appl. 188(1) (1992), pp. 404–425. doi: 10.1016/0378-4371(92)90283-V
- A. Golightly and D.J. Wilkinson, Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo, Interface Focus 1(6) (2011), pp. 807–820. doi: 10.1098/rsfs.2011.0047
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Vol. 2, Springer, New York, NY, 2009.
- M. Komorowski, M.J. Costa, D.A. Rand, and M. P.H. Stumpf, Sensitivity, robustness, and identifiability in stochastic chemical kinetics models, Proc. Natl. Acad. Sci. USA 108(21) (2011), pp. 8645–8650. doi: 10.1073/pnas.1015814108
- M. Komorowski, B. Finkenstädt, C.V. Harper, and D.A. Rand, Bayesian inference of biochemical kinetic parameters using the linear noise approximation, BMC Bioinformatics 10(1) (2009), p. 343. doi: 10.1186/1471-2105-10-343
- J.R. Masters, HeLa cells 50 years on: The good, the bad and the ugly, Nat. Rev. Cancer 2(4) (2002), pp. 315–319. doi: 10.1038/nrc775
- G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Vol. 382, Wiley-Interscience, New York, NY, 2007.
- O.D. Perez, P.O. Krutzik, and G.P. Nolan, Flow cytometric analysis of kinase signaling cascades, Methods Mol. Biol. 263 (2004), pp. 67–94.
- G.A. Rempala, Least squares estimation in stochastic biochemical networks, Bull. Math. Biol. 74(8) (2012), pp. 1938–1955. doi: 10.1007/s11538-012-9744-y
- G.A. Rempala, K.S. Ramos, and T. Kalbfleisch, A stochastic model of gene transcription: an application to L1 retrotransposition events, J. Theoret. Biol. 242(1) (2006), pp. 101–116. doi: 10.1016/j.jtbi.2006.02.010
- G.A. Rempala, K.S. Ramos, T. Kalbfleisch, and I. Teneng, Validation of a mathematical model of gene transcription in aggregated cellular systems: application to L1 retrotransposition, J. Comput. Biol. 14(3) (2007), pp. 339–349. doi: 10.1089/cmb.2006.0125
- A. Roberts, C. Trapnell, J. Donaghey, J.L. Rinn, and L. Pachter, Improving RNA-Seq expression estimates by correcting for fragment bias, Genome Biol. 12(3) (2011), p. R22. doi: 10.1186/gb-2011-12-3-r22
- A. Ruszczynski, Nonlinear Optimization, Vol. 13, Princeton University Press, Princeton, NJ, 2011.
- L. Salwinski and D. Eisenberg, In silico simulation of biological network dynamics, Nat. Biotechnol. 22(8) (2004), pp. 1017–1019. doi: 10.1038/nbt991
- J. Shao, Mathematical Statistics, 2nd ed., Springer, New York, NY, 2003.
- R. Tibshirani, Regression shrinkage and selection via the lasso: a retrospective, J. R. Statist. Soc: Ser. B (Statist. Methodol.) 73(3) (2011), pp. 273–282. doi: 10.1111/j.1467-9868.2011.00771.x
- N.G. Van Kampen, Stochastic Processes in Physics and Chemistry, Vol. 1, Elsevier North-Holland, Amsterdam, 1992.
- M.D. Wang, M.J. Schnitzer, H. Yin, R. Landick, J. Gelles, and S.M. Block, Force and velocity measured for single molecules of RNA polymerase, Science 282(5390) (1998), pp. 902–907. doi: 10.1126/science.282.5390.902
- D.A. Wheeler, M. Srinivasan, M. Egholm, Y. Shen, L. Chen, A. McGuire, W. He, Y.-J. Chen, V. Makhijani, G.T. Roth, X. Gomes, K. Tartaro, F. Niazi, C.L. Turcotte, G.P. Irzyk, J.R. Lupski, C. Chinault, X.-z. Song, Y. Liu, Y. Yuan, L. Nazareth, X. Qin, D.M. Muzny, M. Margulies, G.M. Weinstock, R.A. Gibbs, and J.M. Rothberg, The complete genome of an individual by massively parallel DNA sequencing, Nature 452(7189) (2008), pp. 872–876. doi: 10.1038/nature06884
- D.J. Wilkinson, Stochastic modelling for quantitative description of heterogeneous biological systems, Nat. Rev. Genet. 10(2) (2009), pp. 122–133. doi: 10.1038/nrg2509
- D.J. Wilkinson, Stochastic Modelling for Systems Biology, Vol. 44, CRC Press, Boca Raton, FL, 2011.
- H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Statist. Soc: Ser. B (Statist. Methodol.) 67(2) (2005), pp. 301–320. doi: 10.1111/j.1467-9868.2005.00503.x
Appendix 1. Density-dependent Markov jump process
In the following, suppose the state is a continuous-time Markov process on the positive orthant of the d-dimensional lattice, with density-dependent jump intensities and the corresponding transition probabilities. It is assumed that there are a finite number of jump vectors and that each jump rate is a continuous function of x for each k. Such processes are known as DDMJPs, since their transition rates depend on the density of the process (see, e.g. [Citation16]). Note that under the approximation (2) the process given in Equation (1) is a special case of a DDMJP. If we are given a collection of independent unit Poisson processes, then the process admits the random time-change representation (A1). Define the centred Poisson process accordingly. The corresponding drift function is given by Equation (11), where for notational simplicity we no longer emphasize the dependence of the trajectory or the rates on any parameters. Then, Equation (A1) becomes (A2). The following result, due to Kurtz [Citation1, Chapter 5], establishes the SLLN for the scaled process.
Theorem A.1
(SLLN)
Suppose that the scaled initial conditions converge and that for any compact set there exists a constant for which the stated growth bound on the jump rates holds. Then the scaled process converges almost surely to the deterministic solution of the limiting integral equation.
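A minimal numerical illustration of Theorem A.1, under our own toy model: for the single decay reaction X → ∅ with rate constant k, the DDMJP has jump intensity kX, and the scaled process X(t)/n approaches the ODE solution x(t) = x0·e^(−kt) as n grows. The simulator below uses the standard Gillespie (exponential waiting time) construction; the function and parameter names are ours.

```python
import math, random

def gillespie_decay(n, x0=1.0, k=1.0, t_end=1.0, seed=0):
    """Simulate the pure-death DDMJP for X -> 0 (jump intensity k*X)
    and return the density X(t_end)/n of the scaled process."""
    rng = random.Random(seed)
    x = int(round(n * x0))           # initial molecule count
    t = 0.0
    while x > 0:
        t += rng.expovariate(k * x)  # exponential waiting time to next jump
        if t > t_end:
            break
        x -= 1
    return x / n

# for large n the scaled state is close to the ODE limit x(1) = exp(-1)
err = abs(gillespie_decay(20000) - math.exp(-1.0))
```

For this pure-death example the fluctuation size can even be checked exactly, since X(t) is binomial with success probability e^(−kt), so the deviation of X(t)/n from the ODE is of order 1/√n.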
For inferential purposes, it is important to know that the scaled fluctuation of the DDMJP around the deterministic ODE solution is asymptotically Gaussian. This fact follows from the CLT below, due to Kurtz [Citation1, Chapter 5], stated under the assumption that the scaled initial fluctuations converge.
Theorem A.2
(CLT)
Suppose that the drift is continuously differentiable and that the limit of the scaled initial fluctuations is non-random. Then the scaled fluctuations converge in distribution to the process V given by (A3) or, in a distributionally equivalent Itô diffusion form, by (A4). The process V is a Gaussian vector process with covariance (A5), where the fundamental matrix solution satisfies the associated linear matrix ODE with identity initial condition.
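For the same toy decay model used above, the covariance equation of Theorem A.2 reduces to a scalar ODE that can be checked against a closed form: with drift F(x) = −kx and noise intensity G(x) = kx, the scalar analogue of (A5) is s'(t) = −2ks(t) + kx(t), whose solution for x0 = 1 is s(t) = e^(−kt) − e^(−2kt). The forward-Euler integrator below is our own illustration, not code from the paper.

```python
import math

def lna_variance(k=1.0, x0=1.0, t_end=1.0, steps=20000):
    """Integrate the scalar version of the covariance ODE (A5) for the
    decay reaction: s'(t) = 2*F'(x)*s + G(x) with F'(x) = -k, G(x) = k*x."""
    dt = t_end / steps
    s, t = 0.0, 0.0
    for _ in range(steps):
        x = x0 * math.exp(-k * t)          # deterministic limit x(t)
        s += dt * (-2.0 * k * s + k * x)   # forward-Euler step
        t += dt
    return s

# agreement with the closed-form limiting variance at t = 1
gap = abs(lna_variance() - (math.exp(-1.0) - math.exp(-2.0)))
```

The closed-form value also matches the exact binomial variance of the scaled jump process at t = 1, which is one way to see why the covariance ODEs, when well conditioned, reproduce the true fluctuation structure.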
Appendix 2. Diffusion approximation and the bootstrap
Recall the Itô diffusion process defined by Equation (10). The following result indicates that this diffusion has the same consistency property as the scaled pure jump process (A2) and may therefore be used as its approximation.
Lemma A.1
Suppose that the scaled initial condition converges to a constant, and that for each compact set we have a Lipschitz constant such that the drift satisfies the corresponding Lipschitz bound. Then the diffusion converges almost surely to the solution of the limiting ODE with the same drift and initial condition.
Proof.
By the definition of the diffusion given in Equation (10), we obtain the corresponding integral inequality. For almost all ω the driving noise process is continuous, and thus achieves its maximum and minimum on every compact set, implying that it is bounded except on a set of measure 0. (We suppress the sample-space argument ω except when needed, as is standard, for brevity.) Then, by Gronwall's inequality (cf. [Citation1], Chapter 5), we obtain the stated bound. As n grows, the first term on the right-hand side converges almost surely to 0 by the lemma's hypothesis. The second term involves a finite weighted sum of expressions which are bounded for almost all ω, implying that the difference between the diffusion and the ODE solution converges to 0 a.s.
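Lemma A.1 can be illustrated numerically, again on our toy decay model: the chemical-Langevin-type diffusion dY = −kY dt + √(kY/n) dW collapses onto the ODE solution as n grows. The Euler–Maruyama sketch below (all names and parameter values are our own choices) clamps the state at 0 to keep the square-root diffusion coefficient well defined.

```python
import math, random

def langevin_decay(n, k=1.0, y0=1.0, t_end=1.0, steps=2000, seed=1):
    """Euler-Maruyama path of dY = -k*Y dt + sqrt(k*Y/n) dW; for large n
    the path tracks the deterministic solution y(t) = y0*exp(-k*t)."""
    rng = random.Random(seed)
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y = max(y, 0.0)  # stay in the positive orthant
        y += -k * y * dt + math.sqrt(k * y * dt / n) * rng.gauss(0.0, 1.0)
    return y

# with n = 100000 the diffusion ends within O(1/sqrt(n)) of exp(-1)
dev = abs(langevin_decay(100000) - math.exp(-1.0))
```

Rerunning with a small n (say, n = 100) makes the paths visibly noisy, which is exactly the volume-dependent fluctuation size that the bootstrap procedures resample.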
The following result shows that the properly centred and scaled diffusion process converges in distribution to the same limit as the centred and scaled pure jump process.
Lemma A.2.
Under the assumption that the scaled initial condition converges to a constant, and the assumptions of Theorem A.2, the centred and scaled diffusion process converges in distribution to a limit which is distributionally equivalent to the process V defined in Theorem A.2.
Proof.
By definition,
Define, as before, Since the process is non-anticipating and converges in probability pointwise to by Lemma A.3, the dominated convergence theorem (DCT) for stochastic integrals implies that can be computed by interchanging the limit with the first integral, and that the classical DCT can be applied to the second. This gives Again, defining Φ to be the matrix function solution to we have Thus, and implies again by the continuous mapping theorem. This is clearly a Gaussian vector process by definition, with its covariance function given by Since the Brownian motions and are independent, as well as the increments are independent of , we have and therefore and V have the same distribution.
The following observation regarding the LNA process will be useful at this point.
Remark A.1.
In view of the definition of the LNA process given by Equation (12), it is easy to see that the results of Lemmas A.1 and A.2 hold true when the diffusion process is replaced by the LNA process.
With the above lemmas, we are finally in a position to prove the bootstrap consistency result of Theorem 4.1.
Proof of Theorem 4.1
Let the bootstrap convergence arrow indicate that the sequence of probability distributions associated with the conditional (bootstrap) space converges to the distribution of a multivariate normal random vector with the indicated covariance; the qualifiers a.e. ω and a.e. indicate that a property holds for almost every ω and almost every realization, respectively. In view of the assumed differentiability of the penalty function h, any minimizing solution must also satisfy the corresponding estimating equation. Expansion about the minimizer yields an expression involving the Hessian of the penalty term evaluated at β. Lemma A.1 then implies that the higher-order terms converge to 0, which, in view of Lemma A.2, validates their removal in the limit. By the consistency of the diffusion approximation (Lemma A.1), the bootstrap estimate is consistent. Since the scaled estimating function converges to the same distribution as the Gaussian vector process with mean 0 and the covariance function given above, we obtain (A6), where the limit is the zero-centred multivariate normal measure with the indicated covariance. By the consistency of the penalized LSE for the pure jump process, a.e. ω, an appeal to Slutsky's theorem gives (A7). Asymptotic normality of the penalized LSE for the pure jump process implies (A8). Application of the Cramér–Wold device to Equations (A7) and (A8), along with Equation (A6), implies the assertion of the theorem.
Proof of Theorem 4.3
In view of Remark A.1, the proof of Theorem 4.1 may be repeated virtually verbatim with the LNA process in place of the diffusion, yielding the conclusion of the theorem.