
Bootstrapping least-squares estimates in biochemical reaction networks

Pages 125-146 | Received 27 Jun 2014, Accepted 18 Mar 2015, Published online: 22 Apr 2015

Abstract

The paper proposes new computational methods for computing confidence bounds for the least-squares estimates (LSEs) of rate constants in mass action biochemical reaction networks and stochastic epidemic models. Such LSEs are obtained by fitting the set of deterministic ordinary differential equations (ODEs) corresponding to the large-volume limit of a reaction network to the network's partially observed trajectory, treated as a continuous-time, pure jump Markov process. In the large-volume limit the LSEs are asymptotically Gaussian, but their limiting covariance structure is complicated, since it is described by a set of nonlinear ODEs which are often ill-conditioned and numerically unstable. The current paper considers two bootstrap Monte-Carlo procedures, based on the diffusion and linear noise approximations for pure jump processes, which allow one to avoid solving the limiting covariance ODEs. The results are illustrated with both in-silico and real data examples from the LINE 1 gene retrotranscription model and compared with those obtained using other methods.


1. Introduction

The rapid advances in modern molecular quantification technology, such as fluorescence microscopy, cellular flow cytometry and high-throughput RNA-seq (see, e.g. [Citation15,Citation26,Citation27]), provide us with an unprecedented opportunity to discover the basic operating principles of molecular systems biology. However, progress seems to largely depend on our ability to bring to bear proper statistical methods capable of separating the biological signal of interest from the vast amount of experimental noise. In order to be effective, such methods must be tailored to the specific experimental tools and data collection mechanisms which, despite continuing improvements, still often provide incomplete and highly noisy information.

In this paper, we are concerned with statistical methods for estimating biochemical reaction rates from data consisting of one or more time-course trajectories of (partial) network species concentrations measured on a pre-defined time grid via, for example, RT-PCR or flow cytometry assays [Citation4,Citation21]. Although this particular problem has been discussed extensively in the computational biology and statistical literature ([Citation9,Citation12] and references therein), its asymptotic analysis was often dismissed because of the small number of longitudinal data points available from typical laboratory experiments. However, as pointed out recently in [Citation11], in many practical situations it is reasonable to assume that the size of the biological system (e.g. a cell culture) under consideration is sufficiently large to allow for its large-volume approximation, a technique often used in modern population dynamics models, for example, in epidemic spread studies [Citation1]. The volume-limit idea should be particularly appealing to molecular biology experimentalists, as it circumvents the problem of small longitudinal sample sizes due to instrumental limitations and cost considerations. The same approach is also applicable (although this is not the focus of the current paper) to stochastic epidemic models [Citation1].

Using the large-volume argument, Rempala [Citation16] has shown that the popular method of parameter estimation based on the least-squares criterion yields consistent and asymptotically efficient results when fitting the deterministic system of ordinary differential equations (ODEs) to stochastic data arising from large collections of molecules (e.g. in RT-PCR experiments) under the law of mass action kinetics. However, as also pointed out in [Citation16], the practical obstacle to the large-volume approximation for the least-squares estimates (LSEs) is the fact that their joint asymptotic covariance structure is given only implicitly, by the solution to a set of nonlinear differential equations. Consequently, because the associated inverse problems are often numerically ill-posed, it is quite difficult to obtain asymptotic confidence bounds for the LSEs, even in relatively simple models.

In the current paper, we aim to improve the practical applicability of the results discussed in [Citation16]. First, we consider a slightly more general version of the LSE, the so-called penalized LSE or PLSE. Whereas in many modern statistical analysis problems penalized methods are used to improve identifiability and decrease variability in model predictions, in our case the biggest potential gain from their usage is the stabilization of the variance matrix (for fixed volume), in a way analogous to the ridge model in classical regression analysis [Citation10]. Furthermore, based on the consistency of PLSEs, we derive and illustrate with examples two large-volume parametric bootstrap (Monte-Carlo) procedures which allow for consistent estimation of the covariance structure, possibly under limited and only partially observable trajectory data. In order to improve efficiency in re-computing model trajectories, we propose using the diffusion and closely related linear noise approximations (LNAs) [Citation12] within the bootstrap algorithm. We show that the consistency of our procedures follows from the classical large-volume limit theorem for density-dependent jump processes (see, e.g. [Citation7, Chapter 10]). Finally, in order to apply our methods to practical experimental data, we also propose a procedure based on a profile likelihood bootstrap algorithm which allows for a more realistic selection of the effective volume of the modelled system. In essence, we treat the volume as an estimable parameter of the model and fit it via the maximum likelihood method.

Beyond this introductory section, the paper is organized as follows. In Section 2, we introduce the concept of a stochastic biochemical network under mass action dynamics, along with the necessary notation. In Section 3, we discuss the least-squares estimation problem for mass action biochemical networks and prove basic consistency and normality results for penalized least-squares estimates (PLSEs). These PLSE results are analogous to those in [Citation16] but form the foundation for our later analysis and are provided for completeness. As they are consequences of a general law of large numbers and a central limit theorem (CLT) for density-dependent stochastic processes, we provide a brief review of the relevant facts about such processes in Appendix 1. In Section 4, our main results on the bootstrap algorithms for estimating the LSE distribution are discussed, along with the formulation of the required large-volume consistency and diffusion approximation results. The proofs are deferred to Appendix 2. Finally, in Section 5, we illustrate some of the concepts discussed earlier with the analysis of the LINE 1 retrotransposition model for both synthetic and real data, comparing the latter analysis to the one presented in [Citation16], which relied on simple asymptotic LSE bounds. Our conclusions and a brief summary are given in Section 6.

2. Jump processes and mass action kinetics

A reaction system or reaction network is typically defined as a set of $d$ constituent chemical species reacting according to a finite set of equations, each describing the system state change after the occurrence of a particular reaction. For instance, the chemical reaction $A + B \xrightarrow{\lambda} C$ is interpreted as 'a molecule of species A combines with a molecule of species B to create a molecule of species C at rate $\lambda$'. In this paper, the reaction system is treated as a continuous-time Markov chain, with its state described by a $d$-dimensional vector $X(t)$ keeping track of each species' molecular count at time $t$. The $k$th reaction is determined by the reaction rate function $\lambda_k(x)$ along with two vectors, $\nu_k$ and $\nu_k'$, which give, respectively, the numbers of molecules of each species consumed and produced in the reaction. Both $\nu_k$ and $\nu_k'$ are elements of $\mathbb{Z}_{\ge 0}^d$ and the corresponding reaction rate is a non-negative scalar. Under the assumptions that the system is well stirred and thermally equilibrated, the reaction rates take on the classical mass action form and the waiting times between reactions are exponentially distributed (see, e.g. [Citation8]). Intuitively, the mass action form of the $k$th reaction rate function represents the number of different ways one can choose the molecular substrates needed for reaction $k$ to occur [Citation7, Chapter 10]. Defining $|\nu_k| = \sum_i \nu_{ik}$, the $k$th reaction rate function has the form

$$\lambda_k^{(n)}(x) = \frac{n\kappa_k}{n^{|\nu_k|}} \prod_i \nu_{ik}!\binom{x_i}{\nu_{ik}}.$$

Here, the parameter $n$ denotes the effective volume of the system. Whereas in many classical chemistry applications $n$ is simply the physical volume of a container (e.g. a cell) multiplied by Avogadro's number, in biological systems, where the physical units of the parameters are often not readily available,Footnote1 it is convenient to consider $n$ simply as a scaling parameter proportional to the overall system volume [Citation2,Citation3].
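To make the rate formula concrete, here is a minimal Python sketch computing $\lambda_k^{(n)}(x)$ for a single reaction channel; the helper name `propensity` and its argument layout are our own illustrative choices rather than notation from the paper.

```python
import math
import numpy as np

def propensity(x, kappa, nu, n):
    """Mass action propensity: n * kappa_k / n^{|nu_k|} * prod_i nu_ik! * binom(x_i, nu_ik)."""
    rate = n * kappa / n ** int(np.sum(nu))
    for xi, vik in zip(x, nu):
        # nu_ik! * binom(x_i, nu_ik) equals the falling factorial x_i (x_i - 1) ... (x_i - nu_ik + 1)
        rate *= math.factorial(int(vik)) * math.comb(int(xi), int(vik))
    return rate
```

For example, for the reaction $A + B \to C$ with counts $x = (10, 5, 0)$, substrate vector $\nu_k = (1, 1, 0)$, $\kappa_k = 1$ and $n = 100$, the call `propensity(np.array([10, 5, 0]), 1.0, np.array([1, 1, 0]), 100)` returns $100 \cdot 10 \cdot 5 / 100^2 = 0.5$.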

The exponential waiting time distribution, along with the probability of a particular reaction [Citation8], justifies the system time-evolution equation

(1) $X(t) = X(0) + \sum_k Y_k\left(\int_0^t \lambda_k^{(n)}(X(s))\,ds\right)(\nu_k' - \nu_k),$

where the $Y_k$ are independent unit Poisson processes. Denoting by $c = n^{-1}x$ the species concentration per unit volume and observing the behaviour of the mass action rate functions in the large-volume limit ($n \to \infty$), one obtains the approximation

(2) $\lambda_k^{(n)}(x) \approx n\kappa_k \prod_i c_i^{\nu_{ik}} =: n\lambda_k(c).$

When $\lambda_k^{(n)}(x)$ is replaced with $n\lambda_k(x/n)$ in Equation (1), the process $X(t)$ becomes a member of a general family of the so-called density-dependent Markov jump processes (DDMJP) described in the appendix. As shown in the appendix (cf. Theorem A.1), the approximation (2) ties together the stochastic jump process and its large-volume limit, which turns out to be given by the solution of an integral version of the classical law of mass action ODE (the so-called reaction rate equation, cf., e.g. [Citation24]),

(3) $\dot{c}(t) = \sum_k \kappa_k \prod_i c_i^{\nu_{ik}}(t)\,(\nu_k' - \nu_k).$

The above ODE is parametrized by a vector of parameters, say $\beta$, involving the rate constants $\kappa_k$ as well as possibly the initial conditions $c(0)$. In what follows, we write $c^\beta(t)$ for any such solution to either Equation (3) or its integral version (see also Appendix 1).
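Equation (1) can be simulated exactly with Gillespie's direct method: draw an exponential waiting time at the total jump rate, pick the reaction that fires proportionally to its propensity, and update the state by $\nu_k' - \nu_k$. The sketch below reuses the hypothetical `propensity` helper above; the function name and interface are again ours.

```python
import numpy as np

def gillespie(x0, kappas, nus, nu_primes, n, t_max, rng=np.random.default_rng(1)):
    """Exact simulation of the pure jump process (1) via Gillespie's direct method."""
    t, x = 0.0, np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while t < t_max:
        rates = np.array([propensity(x, k, nu, n) for k, nu in zip(kappas, nus)])
        total = rates.sum()
        if total == 0.0:                  # absorbing state: no reaction can fire
            break
        t += rng.exponential(1.0 / total)             # exponential waiting time
        k = rng.choice(len(rates), p=rates / total)   # reaction k fires w.p. rate_k / total
        x = x + (nu_primes[k] - nus[k])               # state update by nu_k' - nu_k
        times.append(t); states.append(x.copy())
    return np.array(times), np.array(states)
```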

3. Penalized least squares

Let the counts of $d$ species observed at discrete time points $t_i$ ($i=1,\dots,T<\infty$) be denoted by $X^{(n,\beta)}_i \in \mathbb{Z}_{\ge 0}^d$, where for now we assume that the observed data arrive from a single trajectory of a DDMJP for which the effective system volume $n$ is known. The extension to multiple trajectories is straightforward but not considered here, except for the very simple case of independent observations (see Corollary 3.1 of Theorem 3.2). The case when $n$ is unknown is discussed in Section 5. Following Rempala [Citation16], we denote the scaled molecular counts $C_n(t_i) = C_n^\beta(t_i) = X^{(n,\beta)}(t_i)/n$ and assume that $\beta_0 \in \Theta \subset \mathbb{R}^p$ is the true parameter value we intend to estimate. Throughout the paper, the PLSE is defined as any solution (not necessarily unique) to the optimization problem

(4) $\hat\beta_n = \operatorname{argmin}_{\beta\in\Theta} \sum_i |C_n(t_i) - c^\beta(t_i)|^2 + 2\gamma_n h(\beta),$

where $\gamma_n \ge 0$, $h$ is a penalty function and $\Theta$ is an open set in $\mathbb{R}^p$ containing $\beta_0$. Common forms of $h(x)$ include $|x|_2^2 = \sum_i x_i^2$ for ridge regression, $|x|_1 = \sum_i |x_i|$ for the lasso [Citation23], and $|x|_2^2 + |x|_1$ for the elastic net [Citation29]. Note that in our setting we will typically have $\beta \ge 0$ component-wise, in which case the lasso penalty becomes (right) differentiable, as it reduces to $h(\beta) = \sum_i \beta_i$.
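A minimal sketch of fitting the criterion (4) numerically: solve the reaction rate ODE (3) at a candidate $\beta$, accumulate the squared distance to the scaled counts, add the penalty, and minimize. The helper names (`plse`, `rhs`) and the derivative-free Nelder-Mead optimizer are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def plse(t_obs, C_obs, rhs, beta_init, c0, gamma_n,
         penalty=lambda b: np.sum(np.abs(b))):
    """Penalized least squares (4); `rhs(t, c, beta)` is the RHS of the ODE (3).

    The default penalty is the lasso, which for beta >= 0 reduces to sum(beta).
    """
    def objective(beta):
        sol = solve_ivp(rhs, (0.0, t_obs[-1]), c0, t_eval=t_obs,
                        args=(beta,), rtol=1e-8)
        resid = C_obs - sol.y.T        # rows: time points, columns: species
        return np.sum(resid ** 2) + 2.0 * gamma_n * penalty(beta)

    # Nelder-Mead avoids differentiating the (possibly non-smooth) penalty
    return minimize(objective, beta_init, method="Nelder-Mead").x
```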

For any matrix $A$, define $\|A\| = \sqrt{\operatorname{Tr}(AA^\top)}$. The following two results, extending those in [Citation16], will be useful in the following sections. The first one establishes the strong law of large numbers (SLLN) and the consistency of $\hat\beta_n$ as $n \to \infty$.

Theorem 3.1

SLLN

Let $\partial_k c^\beta$ denote the derivative of $c^\beta \in \mathbb{R}_{\ge 0}^d$ with respect to the $k$th coordinate of $\beta \in \mathbb{R}^p$ and set $\nabla c^\beta = [\partial_k c_j^\beta] \in \mathbb{R}^{p\times d}$. Assume that $\|\nabla c^\beta\| < \infty$, that the SLLN for DDMJP holds (cf. Theorem A.1 in the appendix), that the subderivatives of the penalty function $h$ are bounded, and that $\gamma_n \to 0$ as $n \to \infty$. If

(5) for $\beta_1, \beta_2 \in \Theta$, $c^{\beta_1}(t_i) = c^{\beta_2}(t_i)$ for $i = 1,\dots,T$ implies $\beta_1 = \beta_2$,

and for some $\varepsilon > 0$

(6) $\sum_i \partial_k c^\beta(t_i)^\top\big(c^\beta(t_i) - c^{\beta_0}(t_i)\big) \ne 0$ for some $k$, whenever $\beta \ne \beta_0$ and $|\beta - \beta_0| < \varepsilon$,

then the penalized LSEs $\hat\beta_n$ given by Equation (4) are strongly consistent, that is, $\hat\beta_n \to \beta_0$ a.s. as $n \to \infty$.

Proof.

Let $\theta \in \mathbb{R}^p$ denote the vector of subderivatives (the subgradient) of $h(\beta)$. By the so-called KKT necessary optimality conditions (see, e.g. [Citation20, Chapter 3]), any solution $\hat\beta_n$ of the optimization problem (4) has to satisfy

$\sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\hat\beta}(t_i)\big) = \gamma_n \theta,$

with $C_n(t_i)$ and $c^{\hat\beta}(t_i) \in \mathbb{R}^d$ denoting, respectively, the observations of the $d$ chemical species concentrations measured at $t_i$ and the corresponding deterministic solution of Equation (3) evaluated at $t_i$ and $\hat\beta$. Adding and subtracting the deterministic limit gives

$\sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\beta_0}(t_i) + c^{\beta_0}(t_i) - c^{\hat\beta}(t_i)\big) = \gamma_n \theta$

or

(7) $\sum_i \nabla c^{\hat\beta}(t_i)\big(c^{\hat\beta}(t_i) - c^{\beta_0}(t_i)\big) + \gamma_n \theta = \sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\beta_0}(t_i)\big).$

Under the SLLN for DDMJP cited in the appendix (cf. Theorem A.1), the condition $\gamma_n \to 0$ and the boundedness of $\nabla c^\beta(t_i)$ imply that the RHS above converges a.s. to 0 and hence so does the LHS. Consequently, $\lim_n \sum_i \nabla c^{\hat\beta}(t_i)(c^{\hat\beta}(t_i) - c^{\beta_0}(t_i)) = 0$. Under the assumptions of trajectory identifiability (5) and non-degeneracy (6), asymptotic inversion gives $\lim_n \hat\beta_n = \beta_0$ a.s.

Remark 3.1.

Note that the conditions (5) and (6) are necessary to guarantee LSE consistency. The selection of the points $\{t_i\}$ is therefore important to ensure good statistical properties of the LSEs, as also seen in our next theorem below. The required boundedness of the subderivatives of $h$ is, on the other hand, a relatively mild condition, satisfied, e.g. for $h(x) = |x|_1$ and $h(x) = |x|_2^2$. Indeed, the latter is simply differentiable, and for the former, recall that the subderivative of $|x|$ is $-1$, $1$, or any number in $[-1,1]$, respectively, according to whether $x<0$, $x>0$ or $x=0$.

The second result of this section is the CLT for the scaled PLSEs $\sqrt{n}(\hat\beta - \beta_0)$. Similarly to the SLLN above, this one is also a consequence of the appropriate result for DDMJP.

Theorem 3.2

CLT

Suppose that the assumptions of the previous theorem hold and, additionally, $\gamma_n = o(n^{-1/2})$ and $h$ is twice differentiable in $\Theta \subset \mathbb{R}^p$. Furthermore, assume that the hypothesis of the CLT for DDMJP holds (cf. Theorem A.2 in the appendix). Then for any $s \in \mathbb{R}^p$

$\sup_x \big|P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big) - P(s^\top Z_n \le x)\big| \to 0$ as $n \to \infty$,

where $Z_n \sim \mathrm{MVN}(0, \Sigma_{\hat\beta})$ is a multivariate zero-centred Gaussian vector with covariance matrix

(8) $\Sigma_{\hat\beta} = \mathrm{Var}\Big[\sum_i (B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)\,\sqrt{n}\big(C_n(t_i) - c^{\beta_0}(t_i)\big)\Big] = \sum_i A_i\,\mathrm{Var}(V_i)\,A_i^\top + \sum_{i<j}\big[A_i\,\mathrm{Cov}(V_i,V_j)\,A_j^\top + A_j\,\mathrm{Cov}(V_j,V_i)\,A_i^\top\big],$

where $A_i = (B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)$. Here, $B_{\hat\beta} = \sum_i \nabla c^{\hat\beta}(t_i)[\nabla c^{\beta_0}(t_i)]^\top$, $H$ is the Hessian matrix of the penalty function $h$ and $\mathrm{Cov}(V_i,V_j) = \mathrm{Cov}(V(t_i),V(t_j))$ is given by Equation (A5) of Theorem A.2 in the appendix.

Proof.

In the notation of the previous proof, under the differentiability assumptions on the penalty function $h$, in Equation (7) we may take $\theta = \nabla h$, the gradient vector of the penalty. Expanding the LHS of Equation (7) about $\beta_0$ gives

$\sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\beta_0}(t_i)\big) + \gamma_n\nabla h(\hat\beta) = \sum_i \nabla c^{\hat\beta}(t_i)\big(c^{\hat\beta}(t_i) - c^{\beta_0}(t_i)\big) + \gamma_n H(\beta_0)(\hat\beta - \beta_0) + \gamma_n\nabla h(\beta_0) + O_p(|\hat\beta - \beta_0|)(\hat\beta - \beta_0)$
$= \sum_i \nabla c^{\hat\beta}(t_i)[\nabla c^{\beta_0}(t_i)]^\top(\hat\beta - \beta_0) + \gamma_n H(\beta_0)(\hat\beta - \beta_0) + O_p(|\hat\beta - \beta_0|)(\hat\beta - \beta_0) + \gamma_n\nabla h(\beta_0)$
$= (B_{\hat\beta} + \gamma_n H(\beta_0))(\hat\beta - \beta_0) + O_p(|\hat\beta - \beta_0|)(\hat\beta - \beta_0) + \gamma_n\nabla h(\beta_0).$

In view of the theorem assumptions, the higher order terms scaled by $\sqrt{n}$ tend to 0, and therefore

(9) $\sqrt{n}(\hat\beta - \beta_0) = (B_{\hat\beta} + \gamma_n H(\beta_0))^{-1}\sum_i \nabla c^{\hat\beta}(t_i)\sqrt{n}\big(C_n(t_i) - c^{\beta_0}(t_i)\big) - \gamma_n\sqrt{n}(B_{\hat\beta} + \gamma_n H(\beta_0))^{-1}\nabla h(\beta_0) + o_p(1) = (B_{\hat\beta} + \gamma_n H(\beta_0))^{-1}\sum_i \nabla c^{\hat\beta}(t_i)\sqrt{n}\big(C_n(t_i) - c^{\beta_0}(t_i)\big) + o_p(1),$

since the condition $\gamma_n = o(n^{-1/2})$ implies that the middle term converges to 0. From Theorem A.2 in the appendix, it follows that the $d$-vectors $\sqrt{n}(C_n(t_i) - c^{\beta_0}(t_i))$ ($i=1,\dots,T$) converge weakly to $d$-variate normal variables whose covariance structure is given by $\mathrm{Cov}(V_i,V_j)$ described in Equation (A5). Therefore, relation (9), along with continuous mapping and Slutsky's theorem arguments (see, e.g. [Citation22], p. 133), implies that $\sqrt{n}(\hat\beta - \beta_0) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0})$, where $\Rightarrow$ denotes the usual weak convergence of random vectors. Since $\mathrm{MVN}(0, \Sigma_{n,\hat\beta}) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0})$ by Slutsky's theorem, the hypothesis follows via the Cramér-Wold device (see, e.g. [Citation22], p. 56).

Remark 3.2.

Note that relation (9) indicates that for finite $n$ the penalty term may decrease the variability of the LSE by stabilizing the inverse of the matrix $B_{\hat\beta}$. Indeed, for instance for $h(x) = |x|_2^2$ this effect is similar to that observed in the ridge regression model. This is further illustrated in the numerical example in Section 5 (cf. Table 1).

Table 1. Penalized vs. ordinary LSEs for L1.

In Theorems 3.1 and 3.2, we have assumed that the data come from a single trajectory of the jump process. However, the results immediately extend to the case of multiple independent trajectories. This special case of independent observations is of interest, for instance, in RT-PCR experiments with multiple cell-culture dishes and therefore deserves special attention.

Corollary 3.1.

Assume that the trajectory data $X^{(n,\beta)}_i \in \mathbb{Z}_{\ge 0}^d$ ($i=1,\dots,T<\infty$) are mutually independent. Then $\mathrm{Cov}(V_i,V_j) = 0$ for $i \ne j$ and, assuming that the remaining conditions of Theorem 3.2 hold true, its hypothesis also holds with

$\Sigma_{\hat\beta} = \sum_i (B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)\,\mathrm{Var}(V_i)\,\big[(B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)\big]^\top.$

As already indicated in Section 1, the complicated, non-explicit covariance structure of the collection of random variables $\sqrt{n}\,\nabla c^{\hat\beta}(t_i)(C_n(t_i) - c^{\beta_0}(t_i))$ ($i=1,\dots,T$) limits the usefulness of relation (9) for obtaining asymptotic confidence bounds for the penalized LSE. Although for some simple reaction networks an explicit analytic expression for Equation (8) may exist, in most cases of practical interest it can only be evaluated numerically (cf. Section 5). As such evaluations become more unstable with increasing system complexity (see, e.g. [Citation25]), methods of approximating the asymptotic distribution of the LSEs without explicitly evaluating formula (8) are naturally of great interest. This issue is discussed next.

4. Bootstrap algorithms

We present here two resampling algorithms for estimating the distribution of the PLSEs discussed in the previous section. As seen below, this may be done either directly, by considering the appropriate resampling distributions themselves, or indirectly, by considering the resampling estimates of the asymptotic covariance in Theorem 3.2.

4.1. Diffusion approximation

For the purpose of bootstrapping trajectories, it is useful to have an easy-to-simulate approximation to the concentration process $C_n(t) = X(t)/n$, where $X(t)$ is given by Equation (1) (cf. also Equation (A2) in the appendix). To this end, consider the following Itô diffusion process $D_n(t)$ (see, e.g. [Citation2]):

(10) $D_n(t) = D_n(0) + \sum_k \frac{1}{\sqrt{n}}\int_0^t \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F(D_n(s))\,ds,$

where the $B_k(t)$ are independent standard Brownian motions and

(11) $F(x) = \sum_k \lambda_k(x)(\nu_k' - \nu_k)$

is the drift function. Note that under the law of mass action approximation (2), $F(x)$ is asymptotically ($n \to \infty$) equal to the RHS of the reaction rate equation (3). As shown in Lemma A.1 in the appendix, the process defined by Equation (10) approximates $C_n(t)$ in the sense that both share the same deterministic large-volume limit, given by the solution $c(t)$ of the reaction rate equation (3). Note that there is an obvious computational benefit in considering the process $D_n$ instead of the pure jump process $X(t)$, which requires an exact (hence expensive) simulation of the trajectory (1) [Citation8]. For large systems, the simulation of $D_n$ is typically substantially less computationally intensive and may be performed via one of the many standard numerical methods available in the literature [Citation28, Chapter 5].
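A minimal Euler-Maruyama sketch for simulating the diffusion (10); the helper name `euler_maruyama` and the clipping of rates at zero before taking square roots are our illustrative choices.

```python
import numpy as np

def euler_maruyama(d0, lambdas, stoich, n, t_grid, rng):
    """Euler-Maruyama discretization of the diffusion (10).

    lambdas : list of concentration-scale rate functions lambda_k(c)
    stoich  : array of net change vectors nu_k' - nu_k, shape (K, d)
    """
    d = np.array(d0, dtype=float)
    path = [d.copy()]
    for dt in np.diff(t_grid):
        lam = np.array([max(lk(d), 0.0) for lk in lambdas])   # keep sqrt real
        drift = stoich.T @ lam              # F(D) = sum_k lambda_k(D) (nu_k' - nu_k)
        noise = stoich.T @ (np.sqrt(lam * dt) * rng.standard_normal(len(lam)))
        d = d + drift * dt + noise / np.sqrt(n)
        path.append(d.copy())
    return np.array(path)
```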

The considerations above lead to the diffusion bootstrap algorithm (Algorithm 1) for approximating either the distribution of the penalized LSE $\hat\beta$ directly or the covariance matrix $\Sigma_{\hat\beta}$. The formal justification of Algorithm 1 is given in the following consistency result (Theorem 4.1), stating that as the system volume increases, the bootstrap samples have the asymptotically correct empirical distribution and, consequently, may be used to estimate the covariance structure of the least-squares and penalized least-squares estimates with vanishing penalty. The proof is given in Appendix 2.
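The boxed statement of Algorithm 1 did not survive the extraction, so the following Python sketch reflects only the description in the text: fit $\hat\beta$ to the observed trajectory, repeatedly simulate the diffusion (10) at $\hat\beta$, refit each resample, and summarize the replicates. The names `fit` and `simulate` are placeholders for, e.g., the hypothetical `plse` and `euler_maruyama` sketches above.

```python
import numpy as np

def diffusion_bootstrap(t_obs, C_obs, fit, simulate, M, rng):
    """Algorithm 1 (sketch): parametric bootstrap via the diffusion approximation.

    fit(t_obs, traj)           -> PLSE for a trajectory observed on t_obs
    simulate(beta, t_obs, rng) -> one diffusion path D*_n(t_i) at beta
    Returns the bootstrap replicates and their empirical covariance,
    an estimate of the covariance matrix in (8).
    """
    beta_hat = fit(t_obs, C_obs)               # step 1: fit the observed data
    reps = np.empty((M, len(beta_hat)))
    for m in range(M):                         # steps 2-3: resample and refit
        D_star = simulate(beta_hat, t_obs, rng)
        reps[m] = fit(t_obs, D_star)
    return reps, np.cov(reps, rowvar=False)
```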

Theorem 4.1.

Assume $C_n(t) = X(t)/n$ is a scaled trajectory from the pure jump process (1) with $\beta = \beta_0$. Let $\hat\beta = \operatorname{argmin}_{\beta\in\Theta} \sum_i |C_n(t_i) - c^\beta(t_i)|^2 + \gamma_n h(\beta)$ be associated with the probability space $(\Omega, \mathcal{F}, P)$. Let $D^*_n(t)$ denote the diffusion process parametrized by the LSE $\beta = \hat\beta$, and let $\hat\beta^* = \operatorname{argmin}_{\beta\in\Theta} \sum_i |D^*_n(t_i) - c^\beta(t_i)|^2 + \gamma_n h(\beta)$ be the penalized least-squares estimate associated with the conditional probability space $(\Omega^*, \mathcal{F}^*, P^*_n)$. Then, under the assumptions of Theorem 3.2, for all $s \in \mathbb{R}^p$,

$\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big)\big| \to 0$ a.s. as $n \to \infty$.

Although in the above theorem we have assumed that the penalty function is differentiable, it should be noted that this is not necessary. Provided that the subderivatives of the penalty term are bounded, the term vanishes asymptotically and the theorem hypothesis still holds.

Remark 4.1.

From the proof of the theorem, it is evident that the hypothesis of Theorem 4.1 extends to the case of independent observations of Corollary 3.1.

4.2. Linear noise approximation

An alternative approximation of Equation (1), often numerically simpler than $D_n$, may be obtained by directly appealing to the limit theorem for DDMJP, that is, Theorem A.2 in the appendix. Since under the assumptions of Theorem A.2 we have $\sqrt{n}(C_n(t) - c^\beta(t)) \Rightarrow V(t)$ as $n \to \infty$, where $V(t)$ is the Gaussian process satisfying the distributionally equivalent diffusion equations (A3) and (A4), a natural approximation of the process $C_n(t)$ is

(12) $L_n(t) = c^\beta(t) + \frac{V(t)}{\sqrt{n}},$

where we write $\beta$ for $\beta_0$ for notational convenience. The process $L_n(t)$ is commonly referred to as the LNA of $C_n(t)$ [Citation12]. In our present context, it may also be heuristically derived as follows (for another heuristic derivation cf., e.g. [Citation28, Chapter 8]). Since $C_n(t) \approx D_n(t)$ given by Equation (10) and $C_n \to c^\beta$ a.s. (cf. Theorem A.1),

$C_n(t) \approx C_n(0) + \frac{1}{\sqrt{n}}\sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F(C_n(s))\,ds.$

Substituting $L_n(t)$ defined in Equation (12) for $C_n(t)$, and applying Taylor's expansion about $c^\beta(t)$ to the drift function $F(x)$, one obtains for large $n$

$c^\beta(t) + \frac{V(t)}{\sqrt{n}} \approx C_n(0) + \frac{1}{\sqrt{n}}\sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F\Big(c^\beta(s) + \frac{V(s)}{\sqrt{n}}\Big)\,ds$
$\approx C_n(0) + \frac{1}{\sqrt{n}}\sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F(c^\beta(s))\,ds + \frac{1}{\sqrt{n}}\int_0^t \nabla F(c^\beta(s))V(s)\,ds,$

where $[\nabla F]_{ij} = [\partial_j F_i]$. Assuming that $C_n(0) = c_0 + v_0/\sqrt{n}$, where both $c_0$ and $v_0$ are non-random, and collecting terms of different order in $n$, the above yields

(13) $c^\beta(t) = c_0 + \int_0^t F(c^\beta(s))\,ds$

and

(14) $V(t) = v_0 + \sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \nabla F(c^\beta(s))V(s)\,ds,$

showing that the quantities $c^\beta(t)$ and $V(t)$ in Equation (12) are indeed (distributionally) the same as those defined in Theorem A.2.
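A sketch of simulating the LNA (12): integrate the deterministic path once, then generate the Gaussian fluctuation $V(t)$ with a forward-Euler step of the linear SDE (14). The helper names and the discretization are our assumptions. Note that a single set of $V$ draws can be rescaled by $1/\sqrt{n}$ for any candidate $n$, the property exploited in Section 5.3.

```python
import numpy as np

def simulate_lna(c_path, lambdas, grad_F, stoich, n, t_grid, rng, v0=None):
    """LNA resample (12): L_n(t) = c(t) + V(t)/sqrt(n), with V from the linear SDE (14).

    c_path : deterministic solution c(t_i) on t_grid, shape (len(t_grid), d)
    grad_F : function c -> Jacobian of the drift F, shape (d, d)
    """
    d = c_path.shape[1]
    v = np.zeros(d) if v0 is None else np.array(v0, dtype=float)
    V = [v.copy()]
    for i, dt in enumerate(np.diff(t_grid)):
        c = c_path[i]
        lam = np.array([max(lk(c), 0.0) for lk in lambdas])
        noise = stoich.T @ (np.sqrt(lam * dt) * rng.standard_normal(len(lam)))
        v = v + grad_F(c) @ v * dt + noise       # Euler step for (14)
        V.append(v.copy())
    return c_path + np.array(V) / np.sqrt(n)     # the same V draw can be reused for any n
```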

The usage of $L_n(t)$ in place of $D_n(t)$ leads to another resampling algorithm, similar to the diffusion bootstrap but potentially exhibiting better numerical stability due to the natural decoupling of the deterministic and stochastic components of the resamples following from Equation (12). For the record, we present this second resampling algorithm below as Algorithm 2.

Formally, the validity of Algorithm 2 is derived from the following analogue of Theorem 4.1. The proof is again deferred to the appendix.

Theorem 4.2.

Let $C_n(t)$ and $\hat\beta$ be as in Theorem 4.1. Let $L^*_n(t)$ denote the LNA process conditional on $\hat\beta$, and let $\hat\beta^* = \operatorname{argmin}_{\beta\in\Theta} \sum_i |L^*_n(t_i) - c^\beta(t_i)|^2 + \gamma_n h(\beta)$ be the penalized least-squares estimate associated with the conditional probability space $(\Omega^*, \mathcal{F}^*, P^*_n)$. Then, under the assumptions of Theorem 3.2, for all $s \in \mathbb{R}^p$,

$\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big)\big| \to 0$ a.s. as $n \to \infty$.

Note that Remark 4.1 following Theorem 4.1 also applies to Theorem 4.2.

4.3. Unobserved species and the EM algorithm

In most practical circumstances, some of the species in the chemical model will be unobservable due to technical or other experimental limitations, like, for instance, the inability to simultaneously record genomic and proteomic data [Citation16]. Let us briefly outline how the inference problem may be handled in such cases. Following the set-up of Section 3, denote by $x_{ij}$ the limiting ($n \to \infty$) scaled concentration of chemical species $j$ at time point $t_i$ (assumed observed) and let $x_j = (x_{1j},\dots,x_{Tj})$ and $x = (x_1,\dots,x_r) \in \mathbb{R}^{rT}$. Define in a similar fashion the vector $y \in \mathbb{R}^{(d-r)T}$ corresponding to the unobserved species, so that the full species vector is $[x, y] = (x, y)$. Under either the LNA or the diffusion approximation assumption, the joint limiting likelihood of the full data is multivariate normal with mean $c^\beta = [c_x^\beta, c_y^\beta] \in \mathbb{R}^{dT}$, the vector of the ODE solutions to Equation (3) for all $d$ species, and precision matrix

$\Sigma^{-1}(\beta) = \begin{pmatrix} \tilde\Sigma_{11}(\beta) & \tilde\Sigma_{12}(\beta) \\ \tilde\Sigma_{21}(\beta) & \tilde\Sigma_{22}(\beta) \end{pmatrix},$

where the partitioning pattern reflects the covariance structure between the species vectors $x$ and $y$. In this notation, the limiting log-likelihood function $\ell(x, y \,|\, \beta)$ of the full data satisfies

$\ell(x,y\,|\,\beta) \propto -\log|\Sigma(\beta)| - ([x,y] - c^\beta)^\top \Sigma^{-1}(\beta)([x,y] - c^\beta)$
$= -\log|\Sigma(\beta)| - (x - c_x^\beta)^\top\tilde\Sigma_{11}(\beta)(x - c_x^\beta) - (y - c_y^\beta)^\top\tilde\Sigma_{21}(\beta)(x - c_x^\beta) - (x - c_x^\beta)^\top\tilde\Sigma_{12}(\beta)(y - c_y^\beta) - (y - c_y^\beta)^\top\tilde\Sigma_{22}(\beta)(y - c_y^\beta).$

With the help of the above, the asymptotic inference on $\beta$ may be performed via the usual EM algorithm [Citation5]. In particular, it follows that the algorithm's E-step has a closed-form expression (see, e.g. [Citation14, Chapter 2]), and the resulting objective may be maximized numerically as soon as the Gaussian covariance structure is estimated (e.g. via Algorithm 1 or 2).
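The closed-form E-step amounts to the usual conditional-Gaussian computation for the unobserved block given the observed one. The sketch below is written in terms of the covariance partition rather than the precision partition used above (the two are equivalent); the function name `e_step` and the block ordering are our assumptions.

```python
import numpy as np

def e_step(x, c_x, c_y, Sigma):
    """Conditional mean and covariance of the unobserved block y given observed x.

    Sigma is the full covariance partitioned as [[S_xx, S_xy], [S_yx, S_yy]]
    with blocks ordered (observed, unobserved); c_x, c_y are the ODE means.
    """
    r = len(c_x)
    S_xx, S_xy = Sigma[:r, :r], Sigma[:r, r:]
    S_yx, S_yy = Sigma[r:, :r], Sigma[r:, r:]
    K = S_yx @ np.linalg.inv(S_xx)       # regression matrix of y on x
    y_mean = c_y + K @ (x - c_x)         # E[y | x]
    y_cov = S_yy - K @ S_xy              # Cov[y | x]
    return y_mean, y_cov
```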

5. Example: L1 system

As an example, consider the retrotranscription model of the LINE1 (or L1) gene introduced and studied earlier in [Citation18]. The system equations given below describe the first stage of the retrotranscription process, in which production of the L1 transcript ($X_1$) is activated by aggregated amounts of transcription facilitating proteins ($X_2$) that interact with the regulatory region of the gene. The set of chemical equations in the system with effective size $n$ is

(15) $X_2 \xrightarrow{\kappa_1} X_1 + X_2, \quad X_1 \xrightarrow{\kappa_2} \emptyset, \quad \emptyset \xrightarrow{n\kappa_3} X_2, \quad X_2 \xrightarrow{\kappa_4} \emptyset.$

In the above system, the so-called equal dispersal condition (EDC) is often assumed, where $\kappa_2 = \kappa_4$. Under EDC the re-parametrized mass action ODE solution $c^\beta$ in Equation (13) has the closed form

(16) $c_1^\beta(t) = \beta_1\beta_3\big[1 + \beta_5(1 + t\beta_2)e^{-\beta_2 t}\big], \quad c_2^\beta(t) = \beta_3\big(1 + \beta_5 e^{-\beta_2 t}\big),$

where $\beta_1 = \kappa_1/\kappa_2$, $\beta_2 = \kappa_2$, $\beta_3 = \kappa_3/\kappa_4$, $\beta_4 = \beta_2$, and $\beta_5$ gives the fold change from the initial concentrations [Citation16]. Note that in this parametrization the regularity conditions (5) and (6) are satisfied for the L1 system and that the diffusion approximation (10) specializes to

(17) $D_{n,1}(t) = D_{n,1}(0) + \frac{1}{\sqrt{n}}\Big[\int_0^t \sqrt{\beta_1\beta_2 D_{n,2}(s)}\,dB_1(s) - \int_0^t \sqrt{\beta_2 D_{n,1}(s)}\,dB_2(s)\Big] + \beta_1\beta_2\int_0^t D_{n,2}(s)\,ds - \beta_2\int_0^t D_{n,1}(s)\,ds,$
$D_{n,2}(t) = D_{n,2}(0) + \frac{1}{\sqrt{n}}\Big[\sqrt{\beta_3\beta_4}\,B_3(t) - \int_0^t \sqrt{\beta_2 D_{n,2}(s)}\,dB_4(s)\Big] + \beta_3\beta_2 t - \beta_2\int_0^t D_{n,2}(s)\,ds,$

where $B_1, B_2, B_3, B_4$ are independent standard Brownian motions. The LNA for L1 may be derived analogously.

5.1. Asymptotic LSE distribution for L1

In order to illustrate the bootstrap methods from Section 4, we compare the bootstrap estimates of the LSE distributions to their true asymptotic counterparts in the L1 model under EDC. The mass action ODE solution for L1 is given by Equation (16). Upon normalizing to the initial concentrations,Footnote2 $\beta_5 = 1$, and the drift function and its gradient are

$F(x_1,x_2) = (\beta_1\beta_2 x_2 - \beta_2 x_1,\; \beta_2\beta_3 - \beta_2 x_2), \quad \nabla F(x_1,x_2) = \begin{pmatrix} -\beta_2 & \beta_1\beta_2 \\ 0 & -\beta_2 \end{pmatrix}.$

The matrices $\Phi$ and $G$ in the corresponding CLT covariance calculations (Theorem A.2) are, respectively,

$\Phi(t,s) = \begin{pmatrix} e^{-\beta_2(t-s)} & \beta_1\beta_2(t-s)e^{-\beta_2(t-s)} \\ 0 & e^{-\beta_2(t-s)} \end{pmatrix}$

and

$G(x_1(s),x_2(s)) = \begin{pmatrix} \beta_1\beta_2 x_2(s) + \beta_2 x_1(s) & 0 \\ 0 & \beta_2\beta_3 + \beta_2 x_2(s) \end{pmatrix}.$

Recall that the asymptotic covariance of the components $V_i = V(t_i)$ of the scaled limiting process in Theorem 3.2 is

$\mathrm{Cov}(V_i,V_j) = \int_0^{t_i \wedge t_j} \Phi(t_i,s)\,G(c^\beta(s))\,\Phi(t_j,s)^\top\,ds,$

which may be numerically evaluated along with the remaining terms in the asymptotic variance expression (8) for $\Sigma_{\hat\beta}$. For comparison, we generated the L1 system trajectories with $\beta_0 = (\beta_1,\beta_2,\beta_3) = (5, 1, 0.1)$, $\beta_4 = \beta_2$ and $\beta_5 = 1$, and assumed the process to be observed at time points from 0.03 to 1.00 in alternating increments of 0.03 and 0.04. For example, the first five time points are 0.03, 0.07, 0.10, 0.13, 0.17. The resulting asymptotic covariance matrix for the LSE (with no penalty, $\gamma_n = 0$), obtained with the help of numerical integration, is shown in Figure 1 along with its bootstrap estimates for varying $n$. For the bootstrap, the diffusion approximation with the corresponding $\beta_0$ was applied using the Euler-Maruyama method with step size $dt = 0.01$. The number of bootstrap samples was taken as $M = 10{,}000$ for each of the system sizes $n = 10^2, 10^3, 10^4$. To additionally illustrate graphically the recovery of the asymptotic covariance structure by the bootstrap algorithms, we show in Figure 2 the theoretical (upper panel) and bootstrap-approximated (lower panel) bivariate marginal distributions of the LSE of $\beta_0$, denoted $\hat\beta$.
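A sketch of the numerical evaluation of $\mathrm{Cov}(V_i,V_j)$ for the L1 model, using the closed forms of $\Phi$, $G$ and the EDC path (16) stated above; the entrywise quadrature via `scipy.integrate.quad` is our choice of method.

```python
import numpy as np
from scipy.integrate import quad

b1, b2, b3 = 5.0, 1.0, 0.1            # beta_0 from the simulation study, beta_5 = 1

def c_beta(s):
    """Closed-form EDC solution (16) with beta_5 = 1."""
    return (b1 * b3 * (1.0 + (1.0 + s * b2) * np.exp(-b2 * s)),
            b3 * (1.0 + np.exp(-b2 * s)))

def Phi(t, s):
    """Fundamental matrix; the Jacobian of F is constant for the L1 model."""
    e = np.exp(-b2 * (t - s))
    return np.array([[e, b1 * b2 * (t - s) * e], [0.0, e]])

def G(s):
    """Diffusion matrix G(c(s)) along the deterministic path."""
    c1, c2 = c_beta(s)
    return np.diag([b1 * b2 * c2 + b2 * c1, b2 * b3 + b2 * c2])

def cov_V(ti, tj):
    """Cov(V_i, V_j) = int_0^{min(ti,tj)} Phi(ti,s) G(c(s)) Phi(tj,s)^T ds."""
    out = np.empty((2, 2))
    for a in range(2):
        for c in range(2):
            integrand = lambda s, a=a, c=c: (Phi(ti, s) @ G(s) @ Phi(tj, s).T)[a, c]
            out[a, c] = quad(integrand, 0.0, min(ti, tj))[0]
    return out
```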

Figure 1. Bootstrap estimate of the LSE covariance for the L1 system. True asymptotic LSE covariance matrix vs. diffusion bootstrap covariance matrices for L1 systems with varying n (via Algorithm 1). To facilitate comparisons, the penalty was removed, i.e. γn=0 for all cases considered.


Figure 2. LSE covariance recovery via bootstrap for the L1 system. The contour plots for the true asymptotic LSE bivariate marginal distributions (top panel) and their LNA bootstrap estimates (bottom panel) for the LSE in the L1 system with n=1000. The recovery of the multivariate Gaussian distribution covariance structure by the bootstrap procedure is clearly visible.


Figures 1 and 2 indicate that the empirical distribution of the bootstrapped LSEs closely approximates the theoretical distribution, illustrating in practice the application of our Algorithms 1 and 2. Note that obtaining the bootstrap samples only requires the ability to simulate from the diffusion process.

5.2. Penalized LSE for L1

In most practical cases, the data-generating process cannot be measured with arbitrary precision, due to various data collection limitations or errors arising during data processing. In general, such processing errors (see, e.g. [Citation19]) add variation to the LSEs and may compound the problems with formula (8), which in practice needs to be evaluated using estimated values of the model parameters at a fixed volume and hence is sensitive to the LSEs' lack of precision. The penalty term $\gamma_n h(\beta)$ in Equation (4) is intended to improve the precision of the LSEs by giving up a small amount of accuracy (recovering it in the limit, since $\gamma_n \to 0$). Since, as already indicated in Section 3, the penalty term also contributes to stabilizing the inverse matrix in Equation (8), the use of the penalized formulation (4) for LSEs with a fixed sample size may bring considerable benefits.

To illustrate this point, we now compare the bootstrap estimates for the lasso-penalized and ordinary ($\gamma_n = 0$) LSE in the L1 model for various system sizes. The values of the parameters $\beta_0$ are as above. Since the lasso penalty is right-differentiable in our setting, as $\beta \ge 0$ coordinate-wise, all the forms of penalty discussed in Section 3 are smooth and hence roughly equivalent (see [Citation10]). To account for observational errors, we also introduce an independent mean-zero Gaussian noise at each observation grid time point. The variance of this noise is set to 5% of the maximum concentration of mRNA L1 transcripts ($X_1$), which is equivalent to approximately 25% of the maximum concentration of the transcription facilitating proteins ($X_2$). The penalty rate is chosen as $\gamma_n = n^{-3/4}$ for $n = 10, 10^2, 10^3$, and $10^4$, so as to satisfy the LSE consistency requirements, although in a real data analysis one would typically choose it by some form of cross-validation (cf. [Citation6]). The results of the analysis for $M = 10{,}000$ resamples are summarized in Table 1. As seen from the table, for the system sizes $n = 10$ and $n = 100$ the penalized LSE enjoys, respectively, a 10-fold and a 2-fold decrease in variance compared with the ordinary LSE of $\beta_1$, with only a modest increase in bias. For the penalized estimate of $\beta_2$, there is nearly a 5-fold decrease in variance for $n = 10$ but only a moderate decrease for $n \ge 100$, while the variability of the $\beta_3$ estimates remains roughly the same for the LSE and the penalized LSE for $n \ge 100$. From the inspection of the table entries it should be clear that with noisy data the penalized LSE does better in terms of variance for small system sizes, whereas for larger $n$ the performance of the ordinary and penalized LSE is very similar.

5.3. RT-PCR Data for L1

In order to consider a more realistic biological example, in addition to the above synthetic data we also consider real-time PCR data from one of the time-course experiments analysing the L1 system in HeLa cell cultures [Citation13] under external chemical stress, as described in [Citation17]. The data consist of the L1 transcript values ($X_1$) only, as the protein ($X_2$) is unobservable. The L1 transcript measurements from the experiment are presented in Table 2, taken from Rempala [Citation16], where an LSE-based asymptotic confidence bound for this particular data set was obtained by approximating the overall system size and using the asymptotic distribution of the LSE. The system size approximation in [Citation16] was based on a relatively crude fluorescence conversion method introduced in [Citation17]. We now illustrate how the bootstrap approach may be used to help identify the most likely system size, without using a (possibly unreliable) method based on fluorescent signal conversion. Note that, unlike in the previous examples, the time-course L1 transcript measurements reported in Table 2 were obtained from genetically identical but spatially separated L1 cell cultures (different tissue culture plates) and hence represent not a single trajectory but rather multiple trajectories of the stochastic system (15), each observed at a different time point. Nonetheless, due to the genetic homogeneity of the tissue cultures, the initial conditions for the observed trajectories and the respective experimental system sizes $n$ may be assumed to be approximately equal, and Theorems 4.1 and 4.2 (and Algorithms 1 and 2) are still applicable in view of Remark 4.1.

Table 2. Time-course RT-PCR experiment results for L1 gene.

As in our earlier analysis, we took $\beta_4 = \beta_2$ and $\beta_5 = 1$, reducing Equation (16) to the following one-dimensional model:

(18) $c_1^\beta(t) = \beta_1\beta_3\big[1 + (1 + t\beta_2)e^{-\beta_2 t}\big].$

Since with the concentration $c_2^\beta$ unobserved this model is no longer identifiable, we also fix $\beta_3 = 1$ and hence reduce the initial L1 model to only two parameters $(\beta_1, \beta_2)$. Application of the LNA diffusion approximation and the EM algorithm to the limiting LSE problem, as discussed in Section 4.3 (with no penalty, i.e. $\gamma_n = 0$),Footnote3 gave the following values of the estimates, which are very similar to those obtained in [Citation16]: $\hat\beta_1 = 6.18$, $\hat\beta_2 = 0.027$.

Based on these numerical values, the bootstrap algorithm (Algorithm 2) was applied to calculate the confidence bounds for the observed longitudinal trajectory of the L1 transcripts. The bounds depend on the unknown system size $n$, which was estimated via the method of likelihood profiling as follows. First, for a grid of $n$ values (from $n = 500$ to 9000 in steps of 500), the bootstrap approximation based on $M = 5000$ resamples was obtained using Algorithm 2 (note that with Algorithm 2 the same set of bootstrap resamples may be used for different $n$). Next, the actual value of the observed error was compared with the modes of the error distributions for varying $n$. The best mode-to-data alignment (over the gridded values of $n$) was obtained at $n = 5000$, and hence this value was taken as the estimated effective system size. The process is illustrated in Figure 3. In Figure 4, the 95% confidence envelopes of the original analysis in [Citation16] (where the effective system size was taken as $n = 1000$) are compared with the ones obtained from the likelihood profiling ($n = 5000$). As may be seen from the plots, the new, larger estimate of the system size has the effect of shrinking the uncertainty around the observed trajectory due to generally tighter confidence bounds about the observed data points, illustrating the advantage of the bootstrap-based analysis for the data in Table 2.
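A sketch of the profiling step described above: because the LNA resamples decouple as $c(t) + V(t)/\sqrt{n}$, one set of fluctuation draws can be rescaled for every candidate $n$ on the grid. The helper name `profile_volume` and the histogram-mode alignment criterion are our illustrative reading of the procedure.

```python
import numpy as np

def profile_volume(resid_obs, V_draws, n_grid, bins=50):
    """Pick the n whose bootstrap residual-distribution mode is closest to the observed residual.

    resid_obs : observed residual, e.g. sum((data - c_fit)**2) over the time grid
    V_draws   : LNA fluctuation draws V(t_i) for the observed species, shape (M, T)
    """
    best_n, best_gap = None, np.inf
    for n in n_grid:
        resid_star = np.sum((V_draws / np.sqrt(n)) ** 2, axis=1)  # residuals of L_n about the fit
        hist, edges = np.histogram(resid_star, bins=bins)
        mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
        if abs(mode - resid_obs) < best_gap:
            best_n, best_gap = n, abs(mode - resid_obs)
    return best_n
```

Used with, e.g., `n_grid = range(500, 9001, 500)`, this mirrors the grid search reported above.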

Figure 3. Bootstrap likelihood profiling. The LNA bootstrap ($M = 5000$) plots approximating, for several different $n$ values, the residual distributions for the model (18) fitted to the data in Table 2, and comparing them with the actual observed residuals (grey vertical lines). As seen from the plots, around the value $n = 5000$ the mode of the residual distribution best aligns with the observed value. Note that the alignment is very poor for $n = 1000$, which is the estimated volume used in [Citation16].


Figure 4. Confidence envelopes for different $n$ values. The ODE model (18) fitted to the trajectory data in Table 2, plotted along with the fits' 95% confidence envelopes for two different estimated system sizes. The left panel shows the bounds obtained in [Citation16] for the system size $n = 1000$, whereas the right panel shows the bounds obtained by taking $n = 5000$.


6. Summary and discussion

We have proposed here, and justified with appropriate large-volume limit results, some new practical ways of estimating the covariance structure and, more generally, improving the performance of LSEs in biochemical network models. We discussed three possible approaches, based on (a) the penalized least-squares criterion, (b) fast bootstrap algorithms for covariance estimation and (c) a bootstrap-based profiling method for the determination of the effective system size.

With regard to (a), we have shown that under a vanishing penalty on the parameter size the LSEs continue to be consistent in the limit, but may additionally exhibit much better precision in small-volume settings. Concerning (b), we have argued that the usage of bootstrap Monte-Carlo algorithms may allow one to avoid the often difficult calculation of the LSEs' asymptotic covariance structure. To improve efficiency, we proposed the usage of diffusion-based bootstrap methods and showed them to be consistent in the large-volume limit. Two types of diffusion approximation were considered: (i) the classical chemical Langevin equation approximation and (ii) the LNA method directly derived from the CLT for jump processes. The latter, although slightly more biased in our simulation studies, seems particularly attractive as it allows one to generate resamples which are independent of the system size (volume) $n$. With regard to (c), this aspect of the LNA-based bootstrap may be used to develop an efficient algorithm for estimating the effective volume of the biochemical model, in essence by analysing the size of the model fit fluctuations for different volume sizes. We have illustrated the practical applicability of this approach to reducing model uncertainty for partially observed, longitudinal RT-PCR data from the LINE 1 retrotranscription network.

Overall, our current work seems to support the conclusion that bootstrap methods, although not without limitations, such as the requirement to be able to simulate from the entire reaction network, offer a viable alternative to a direct asymptotic approximation based on the estimated covariance structure. One hopes that the results presented here will encourage further study of the general application of resampling to biochemical network inference problems, both in their theoretical and applied aspects, in particular in the setting, not considered here, where both $T, n \to \infty$.

Acknowledgments

Research supported in part by US NIH grant R01CA-152158 (DFL, GAR) and US NSF grant DMS-1106485 (GAR).

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. See Section 5.3.

2. This is done out of convenience, to eliminate superfluous parameters. Note that any such renormalization possibly affects the units of the system volume $n$.

3. In this particular case, given the large system size $n$, the penalty would have only a negligible effect.

References

  • H. Andersson and T. Britton, Stochastic Epidemic Models and Their Statistical Analysis, Vol. 4, Springer, New York, NY, 2000.
  • D.F. Anderson and T.G. Kurtz, Continuous Time Markov Chain Models for Chemical Reaction Networks, in Design and Analysis of Biomolecular Circuits, H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore, eds., Springer, New York, NY, 2011, 3–42.
  • K. Ball, T.G. Kurtz, L. Popovic, and G. Rempala, Asymptotic analysis of multiscale approximations to reaction networks, Ann. Appl. Probab. 16(4) (2006), pp. 1925–1961. doi: 10.1214/105051606000000420
  • M.W. Chevalier and H. El-Samad, A data-integrated method for analyzing stochastic biochemical networks, J. Chem. Phys. 135(21) (2011), 214110. doi: 10.1063/1.3664126
  • A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. Ser. B (Methodol.) 39(1) (1977), pp. 1–38.
  • B. Efron and G. Gong, A leisurely look at the bootstrap, the jackknife, and cross-validation, Amer. Statist. 37(1) (1983), pp. 36–48.
  • S.N. Ethier and T.G. Kurtz, Markov Processes: Characterization and Convergence, Vol. 282, Wiley, New York, NY, 2009.
  • D.T. Gillespie, A rigorous derivation of the chemical master equation, Phys. A: Statist. Mech. Appl. 188(1) (1992), pp. 404–425. doi: 10.1016/0378-4371(92)90283-V
  • A. Golightly and D.J. Wilkinson, Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo, Interface Focus 1(6) (2011), pp. 807–820. doi: 10.1098/rsfs.2011.0047
  • T. Hastie, R. Tibshirani, J. Friedman, T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning, Vol. 2, Springer, New York, NY, 2009.
  • M. Komorowski, M.J. Costa, D.A. Rand, and M. P.H. Stumpf, Sensitivity, robustness, and identifiability in stochastic chemical kinetics models, Proc. Natl. Acad. Sci. USA 108(21) (2011), pp. 8645–8650. doi: 10.1073/pnas.1015814108
  • M. Komorowski, B. Finkenstädt, C.V. Harper, and D.A. Rand, Bayesian inference of biochemical kinetic parameters using the linear noise approximation, BMC Bioinformatics 10(1) (2009), p. 343. doi: 10.1186/1471-2105-10-343
  • J.R. Masters, HeLa cells 50 years on: The good, the bad and the ugly, Nat. Rev. Cancer 2(4) (2002), pp. 315–319. doi: 10.1038/nrc775
  • G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Vol. 382, Wiley-Interscience, New York, NY, 2007.
  • O.D. Perez, P.O. Krutzik, and G.P. Nolan, Flow cytometric analysis of kinase signaling cascades, Methods Mol. Biol. 263(2004), pp. 67–94.
  • G.A. Rempala, Least squares estimation in stochastic biochemical networks, Bull. Math. Biol. 74(8) (2012), pp. 1938–1955. doi: 10.1007/s11538-012-9744-y
  • G.A. Rempala, K.S. Ramos, and T. Kalbfleisch, A stochastic model of gene transcription: an application to l1 retrotransposition events, J. Theoret. Biol. 242(1) (2006), pp. 101–116. doi: 10.1016/j.jtbi.2006.02.010
  • G.A. Rempala, K.S. Ramos, T. Kalbfleisch, and I. Teneng, Validation of a mathematical model of gene transcription in aggregated cellular systems: application to l1 retrotransposition, J. Comput. Biol. 14(3) (2007), pp. 339–349. doi: 10.1089/cmb.2006.0125
  • A. Roberts, C. Trapnell, J. Donaghey, J.L. Rinn, and L. Pachter, Improving RNA-Seq expression estimates by correcting for fragment bias, Genome Biol. 12(3) (2011), p. R22. doi: 10.1186/gb-2011-12-3-r22
  • A. Ruszczynski, Nonlinear Optimization, Vol. 13, Princeton University Press, Princeton, NJ, 2011.
  • L. Salwinski and D. Eisenberg, In silico simulation of biological network dynamics, Nat. Biotechnol. 22(8) (2004), pp. 1017–1019. doi: 10.1038/nbt991
  • J. Shao, Mathematical Statistics, 2nd ed., Springer, New York, NY, 2003.
  • R. Tibshirani, Regression shrinkage and selection via the lasso: a retrospective, J. R. Statist. Soc: Ser. B (Statist. Methodol.) 73(3) (2011), pp. 273–282. doi: 10.1111/j.1467-9868.2011.00771.x
  • N.G. Van Kampen, Stochastic Processes in Physics and Chemistry, Vol. 1, Elsevier North-Holland, Amsterdam, 1992.
  • M.D. Wang, M.J. Schnitzer, H. Yin, R. Landick, J. Gelles, and S.M. Block, Force and velocity measured for single molecules of RNA polymerase, Science 282(5390) (1998), pp. 902–907. doi: 10.1126/science.282.5390.902
  • D.A. Wheeler, M. Srinivasan, M. Egholm, Y. Shen, L. Chen, A. McGuire, W. He, Y.-J. Chen, V. Makhijani, G.T. Roth, X. Gomes, K. Tartaro, F. Niazi, C.L. Turcotte, G.P. Irzyk, J.R. Lupski, C. Chinault, X.-z. Song, Y. Liu, Y. Yuan, L. Nazareth, X. Qin, D.M. Muzny, M. Margulies, G.M. Weinstock, R.A. Gibbs, and J.M. Rothberg, The complete genome of an individual by massively parallel DNA sequencing, Nature 452(7189) (2008), pp. 872–876. doi: 10.1038/nature06884
  • D.J. Wilkinson, Stochastic modelling for quantitative description of heterogeneous biological systems, Nat. Rev. Genet. 10(2) (2009), pp. 122–133. doi: 10.1038/nrg2509
  • D.J. Wilkinson, Stochastic Modelling for Systems Biology, Vol. 44, CRC Press, Boca Raton, FL, 2011.
  • H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Statist. Soc: Ser. B (Statist. Methodol.) 67(2) (2005), pp. 301–320. doi: 10.1111/j.1467-9868.2005.00503.x

Appendix 1. Density-dependent Markov jump process

In the following, suppose $\{X_n(t) : t \ge 0\}$ is a continuous-time Markov process on the positive orthant of the $d$-dimensional lattice $\mathbb{Z}_{\ge 0}^d$ with jump intensities

$q^{(n)}_{x,\,x+\nu_k'-\nu_k} = n\lambda_k(n^{-1}x)$ for $x \in \mathbb{Z}_{\ge 0}^d$, $\nu_k' - \nu_k \in \mathbb{Z}^d$,

and transition probabilities

$P\big(X_n(t+h) = x + \nu_k' - \nu_k \,|\, X_n(t) = x\big) = hn\lambda_k(n^{-1}x) + o(h)$

and

$P\big(X_n(t+h) = x \,|\, X_n(t) = x\big) = 1 - hn\sum_k \lambda_k(n^{-1}x) + o(h).$

It is assumed that there is a finite number of vectors $\nu_k' - \nu_k \in \mathbb{Z}^d$ such that $\sup_x \lambda_k(x) > 0$ and that $\lambda_k(x)$ is a continuous function of $x$ for each $k$. Such processes are known as DDMJPs, since their transition rates depend on the density of the process (see, e.g. [Citation16]). Note that under the approximation (2) the process given in Equation (1) is a special case of a DDMJP. If $\{Y_k(t) : t \ge 0\}$ is a collection of independent unit Poisson processes, then

(A1) $X_n(t) = X_n(0) + \sum_k Y_k\left(n\int_0^t \lambda_k(n^{-1}X_n(s))\,ds\right)(\nu_k' - \nu_k).$

Define the centred Poisson process $\hat Y_k(nt) = Y_k(nt) - nt$ and $C_n(t) = X_n(t)/n$. The corresponding drift function is given by Equation (11), that is, $F(x) = \sum_k \lambda_k(x)(\nu_k' - \nu_k)$, where for notational simplicity we no longer emphasize the dependence of the trajectory or the rates on any parameters. Then, Equation (A1) becomes

(A2) $C_n(t) = C_n(0) + n^{-1}\sum_k \hat Y_k\left(n\int_0^t \lambda_k(C_n(s))\,ds\right)(\nu_k' - \nu_k) + \int_0^t F(C_n(s))\,ds.$

The following result, due to Kurtz [Citation1, Chapter 5], establishes the SLLN for $C_n(t)$.

Theorem A.1

(SLLN)

Let $\lim_n C_n(0) = c_0$ and suppose that for any compact $K \subset \mathbb{R}^d$ there exists a constant $M_K$ such that $|F(x) - F(y)| \le M_K|x - y|$ for $x, y \in K$. Then $\lim_n \sup_{s \le t} |C_n(s) - c(s)| = 0$ a.s., where $c(t)$ is the deterministic solution to the integral equation $c(t) = c_0 + \int_0^t F(c(s))\,ds$.

For inferential purposes, it is important to know that the scaled fluctuation of the DDMJP around the deterministic ODE solution, $V_n(t) = \sqrt{n}(C_n(t) - c(t))$, is asymptotically Gaussian. This fact follows from the following CLT due to Kurtz [Citation1, Chapter 5]. Let $\nabla F \in \mathbb{R}^{d\times d}$ be such that $[\nabla F]_{ij} = [\partial_j F_i]$.

Theorem A.2

(CLT)

Suppose $\nabla F$ is continuous and $\lim_n V_n(0) = v_0$ is non-random. Then $V_n = \sqrt{n}(C_n(t) - c(t)) \Rightarrow V$, where the process $V$ is given by

(A3) $V(t) = v_0 + \sum_k B_k\Big(\int_0^t \lambda_k(c(s))\,ds\Big)(\nu_k' - \nu_k) + \int_0^t \nabla F(c(s))V(s)\,ds,$

or, in a distributionally equivalent Itô diffusion form, by

(A4) $V(t) = v_0 + \sum_k \int_0^t \sqrt{\lambda_k(c(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \nabla F(c(s))V(s)\,ds.$

The process $V$ is a Gaussian vector process with covariance

(A5) $\mathrm{Cov}(V(t),V(r)) = \int_0^{r\wedge t} \Phi(t,s)\,G(c(s))\,\Phi(r,s)^\top\,ds,$

where $\Phi \in \mathbb{R}^{d\times d}$ satisfies $\partial_s\Phi(t,s) = -\Phi(t,s)\nabla F(c(s))$, $\Phi(s,s) = I$, and $G(x) = \sum_k (\nu_k' - \nu_k)(\nu_k' - \nu_k)^\top\lambda_k(x)$.

Appendix 2. Diffusion approximation and the bootstrap

Recall the Itô diffusion process $D_n$ defined by Equation (10). The following result indicates that $D_n$ has the same consistency property as the scaled pure jump process (A2) and may therefore be used as its approximation.

Lemma A.1

Suppose that $\lim_n D_n(0) = c(0)$, a constant, and that for each compact set $K \subset \mathbb{R}^d$ there is a Lipschitz constant $M_K$ such that $|F(x) - F(y)| \le M_K|x - y|$ for $x, y \in K$. Then $\lim_n \sup_{s \le t} |D_n(s) - c(s)| = 0$ almost surely, where $c(t)$ is the solution to $c(t) = c(0) + \int_0^t F(c(s))\,ds$ and $F(x) = \sum_k (\nu_k' - \nu_k)\lambda_k(x)$, $x \in \mathbb{R}^d$.

Proof.

By the definition of $D_n(t)$ given in Equation (10),

$|D_n(s) - c(s)| = \Big|D_n(0) - c(0) + \sum_k \frac{1}{\sqrt{n}}\int_0^s \sqrt{\lambda_k(D_n(r))}\,dB_k(r)\,(\nu_k' - \nu_k) + \int_0^s \big(F(D_n(r)) - F(c(r))\big)\,dr\Big|$
$\le |D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\Big|\int_0^s \sqrt{\lambda_k(D_n(z))}\,dB_k(z)\Big| + \int_0^s M_K|D_n(r) - c(r)|\,dr,$

implying

$|D_n(s) - c(s)| \le |D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\bar\lambda_k \sup_{0\le r_1\le r_2\le s}|B_k(r_1) - B_k(r_2)| + \int_0^s M_K|D_n(r) - c(r)|\,dr,$

where $\bar\lambda_k = \sup_{x\in K}\lambda_k(x)$. For almost all $\omega$, $B_k(\omega, t)$ is a continuous process and thus achieves its maximum and minimum on every compact set, implying that $\sup_{0\le r_1\le r_2\le s}|B_k(\omega,r_1) - B_k(\omega,r_2)|$ is bounded, except on a set of measure 0. (We suppress the inherent sample-space notation $\omega$, except when needed, as is standard, for brevity.) Then, by Gronwall's inequality (cf. [Citation1, Chapter 5]),

$|D_n(s) - c(s)| \le \Big(|D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\bar\lambda_k \sup_{0\le r_1\le r_2\le s}|B_k(r_1) - B_k(r_2)|\Big)e^{M_K s},$

yielding

$\sup_{s\le t}|D_n(s) - c(s)| \le \Big(|D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\bar\lambda_k \sup_{0\le r_1\le r_2\le t}|B_k(r_1) - B_k(r_2)|\Big)e^{M_K t}.$

As $n \to \infty$, the first term on the RHS converges almost surely to 0 by the lemma's hypothesis. The second term involves a finite weighted sum of expressions of the type $\sup_{0\le r_1\le r_2\le t}|B_k(\omega,r_1) - B_k(\omega,r_2)|$, which are bounded for almost all $\omega$, implying that $(1/\sqrt{n})\sup_{0\le r_1\le r_2\le t}|B_k(\omega,r_1) - B_k(\omega,r_2)|$ converges to 0 a.s.

The following result shows that the properly centred and scaled diffusion process converges in distribution to the same limit as the centred and scaled pure jump process.

Lemma A.2.

With the assumption that $c_n(0) := \sqrt{n}(D_n(0) - c(0)) \to c_0$, a constant, and those of Theorem A.2, the process $V_n(t) = \sqrt{n}(D_n(t) - c(t))$ converges in distribution to a process $V'(t)$ which is distributionally equivalent to $V(t)$ defined in Theorem A.2.

Proof.

By definition,

$V_n(t) = \sqrt{n}(D_n(t) - c(t)) = c_n(0) + \sum_k \int_0^t \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \nabla F(c(s))V_n(s)\,ds + \int_0^t O(|D_n(s) - c(s)|)V_n(s)\,ds.$

Define, as before,

$U_n(t) := \sum_k \int_0^t \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t O(|D_n(s) - c(s)|)V_n(s)\,ds = -c_n(0) + V_n(t) - \int_0^t \nabla F(c(s))V_n(s)\,ds.$

Since the process $D_n(s)$ is non-anticipating and converges in probability pointwise to $c(s)$ by Lemma A.1, the dominated convergence theorem (DCT) for stochastic integrals implies that $\lim_n U_n(t)$ can be computed by interchanging the limit with the first integral, and the classical DCT can be applied to the second. This gives

$U_n(t) \Rightarrow \sum_k \int_0^t \lim_n \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \lim_n O(|D_n(s) - c(s)|)V_n(s)\,ds = \sum_k \int_0^t \sqrt{\lambda_k(c(s))}\,dB_k(s)\,(\nu_k' - \nu_k) =: U(t).$

Again, defining $\Phi$ to be the matrix-function solution to $\partial_s\Phi(t,s) = -\Phi(t,s)\nabla F(c(s))$, $\Phi(s,s) = I$, we have

$\int_0^t \Phi(t,s)\,dU_n(s) = \int_0^t \Phi(t,s)\,dV_n(s) - \int_0^t \Phi(t,s)\nabla F(c(s))V_n(s)\,ds = \Phi(t,t)V_n(t) - \Phi(t,0)c_n(0) - \int_0^t \partial_s\Phi(t,s)V_n(s)\,ds - \int_0^t \Phi(t,s)\nabla F(c(s))V_n(s)\,ds = V_n(t) - \Phi(t,0)c_n(0).$

Thus $V_n(t) = \Phi(t,0)c_n(0) + \int_0^t \Phi(t,s)\,dU_n(s)$, and $U_n \Rightarrow U$ implies

$V_n = h(U_n) \Rightarrow h(U) = \Phi(t,0)c_0 + \int_0^t \Phi(t,s)\,dU(s) =: V',$

again by the continuous mapping theorem. This is clearly a Gaussian vector process by definition, with covariance function

$E\big(V'(t)V'(r)^\top\big) = E\Big[\int_0^t \Phi(t,s)\,dU(s)\Big(\int_0^r \Phi(r,s)\,dU(s)\Big)^\top\Big] = \lim_{|s_{i+1}-s_i|\to 0} E\Big[\sum_i \Phi(t,s_i)\big(U(s_{i+1}) - U(s_i)\big)\sum_j \big(U(s_{j+1}) - U(s_j)\big)^\top\Phi(r,s_j)^\top\Big].$

Writing the increments of $U$ in terms of the Brownian motions and using the independence of $B_k$ and $B_l$ for $k \ne l$, as well as the independence of the increments $B_k(s_{i+1}) - B_k(s_i)$ and $B_k(s_{j+1}) - B_k(s_j)$ for $i \ne j$, only the diagonal terms survive:

$E\big(V'(t)V'(r)^\top\big) = \lim_{|s_{i+1}-s_i|\to 0} \sum_i \Phi(t,s_i)\sum_k \lambda_k(c(s_i))(\nu_k' - \nu_k)(\nu_k' - \nu_k)^\top E\big(B_k(s_{i+1}) - B_k(s_i)\big)^2\,\Phi(r,s_i)^\top = \lim_{|s_{i+1}-s_i|\to 0} \sum_i \Phi(t,s_i)G(c(s_i))\Phi(r,s_i)^\top(s_{i+1} - s_i) = \int_0^{t\wedge r} \Phi(t,s)G(c(s))\Phi(r,s)^\top\,ds,$

and therefore $V'$ and $V$ have the same distribution.

The following observation regarding the LNA process will be useful at this point.

Remark A.1.

In view of the definition of the LNA process $L_n(t)$ given by Equation (12), it is easy to see that the results of Lemmas A.1 and A.2 hold true when $D_n(t)$ is replaced by $L_n(t)$.

With the above lemmas, we are finally in a position to prove the bootstrap consistency result of Theorem 4.1.

Proof of Theorem 4.1

Let $\Rightarrow^*$ indicate that the sequence of probability distributions associated with the conditional (bootstrap) space converges to the distribution of a multivariate normal random vector with covariance $\Sigma_{\beta_0}$. The notation a.s.($\omega$) and a.s.($\omega^*$) indicates that the property holds for almost every $\omega$ and almost every $\omega^*$, respectively. In view of the assumed differentiability of the penalty function $h$, any minimizing solution $\hat\beta^*$ satisfying

$\hat\beta^* = \operatorname{argmin}_{\beta\in\Theta} \sum_i |D^*_n(t_i) - c^\beta(t_i)|^2 + 2\gamma_n h(\beta)$

must also necessarily satisfy

$\sum_i \nabla c^{\hat\beta^*}(t_i)\big(D^*_n(t_i) - c^{\hat\beta^*}(t_i)\big) + \gamma_n\nabla h(\hat\beta^*) = 0.$

Expansion about $\hat\beta$ gives

$\sum_i \nabla c^{\hat\beta^*}(t_i)\sqrt{n}\big(D^*_n(t_i) - c^{\hat\beta}(t_i)\big) = \sum_i \nabla c^{\hat\beta^*}(t_i)\sqrt{n}\big(c^{\hat\beta^*}(t_i) - c^{\hat\beta}(t_i)\big) + \sqrt{n}\gamma_n\nabla h(\hat\beta^*)$
$= \sum_i \nabla c^{\hat\beta^*}(t_i)[\nabla c^{\hat\beta}(t_i)]^\top\sqrt{n}(\hat\beta^* - \hat\beta) + \sqrt{n}\gamma_n H(\hat\beta)(\hat\beta^* - \hat\beta) + O(|\hat\beta^* - \hat\beta|)\sqrt{n}(\hat\beta^* - \hat\beta),$

where $H(\beta)$ is the Hessian of the penalty term evaluated at $\beta$. Lemma A.1 gives $O(|\hat\beta^* - \hat\beta|) \to 0$ a.s.($\omega^*$), implying that $O(|\hat\beta^* - \hat\beta|)\sqrt{n}(\hat\beta^* - \hat\beta)$ converges to 0 in view of Lemma A.2, validating the removal in the limit of the higher order terms. Setting $B^*_{\hat\beta} = \sum_i \nabla c^{\hat\beta^*}(t_i)[\nabla c^{\hat\beta}(t_i)]^\top + (\gamma_n/2)H(\hat\beta)$, we arrive at

$\sqrt{n}(\hat\beta^* - \hat\beta) = \sum_i (B^*_{\hat\beta})^{-1}\nabla c^{\hat\beta^*}(t_i)\sqrt{n}\big(D^*_n(t_i) - c^{\hat\beta}(t_i)\big).$

By the consistency of the diffusion approximation (Lemma A.1), $\hat\beta^* \to \hat\beta$ a.s.($\omega^*$), implying that $\nabla c^{\hat\beta^*}(t_i) \to \nabla c^{\hat\beta}(t_i)$ and $(B^*_{\hat\beta})^{-1} \to B_{\hat\beta}^{-1}$ a.s.($\omega^*$). Since $\sqrt{n}(D^*_n(t) - c^{\hat\beta}(t))$ converges to the same distribution as the Gaussian vector process with mean 0 and covariance function

$\Sigma_{\hat\beta}(t,r) = \int_0^{t\wedge r} \Phi_{\hat\beta}(t,s)G(c^{\hat\beta}(s))\Phi_{\hat\beta}(r,s)^\top\,ds$, where $\partial_s\Phi_{\hat\beta}(t,s) = -\Phi_{\hat\beta}(t,s)\nabla F(c^{\hat\beta}(s))$, $\Phi_{\hat\beta}(s,s) = I$,

we have

(A6) $\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P(s^\top Z_n \le x)\big| \to 0$ a.s. for all $s \in \mathbb{R}^p$,

where $Z_n \sim \mathrm{MVN}(0, \Sigma_{\hat\beta})$ is the zero-centred multivariate normal with covariance

$\Sigma_{\hat\beta} = \mathrm{Var}\Big[\sum_i B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\sqrt{n}\big(D^*_n(t_i) - c^{\hat\beta}(t_i)\big)\Big] = \sum_i B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\Sigma_{\hat\beta}(t_i,t_i)\big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\big]^\top + \sum_{i<j}\Big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\Sigma_{\hat\beta}(t_i,t_j)\big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_j)\big]^\top + B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_j)\Sigma_{\hat\beta}(t_j,t_i)\big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\big]^\top\Big].$

By the consistency of the penalized LSE for the pure jump process $C_n(t)$, $\hat\beta \to \beta_0$ a.s.($\omega$), implying that $\nabla c^{\hat\beta}(t_i) \to \nabla c^{\beta_0}(t_i)$ and $B_{\hat\beta}^{-1} \to B_{\beta_0}^{-1}$ a.s.($\omega$). An appeal to Slutsky's theorem gives

(A7) $\mathrm{MVN}(0, \Sigma_{\hat\beta}) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0}).$

Asymptotic normality of the penalized LSE for the pure jump process implies

(A8) $\sqrt{n}(\hat\beta - \beta_0) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0}).$

Application of the Cramér-Wold device to Equations (A7) and (A8), along with Equation (A6), implies

$\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big)\big| \to 0$ a.s.,

which in turn implies $\sqrt{n}(\hat\beta^* - \hat\beta) \Rightarrow^* \mathrm{MVN}(0, \Sigma_{\beta_0})$.

Proof of Theorem 4.2

In view of Remark A.1, the proof of Theorem 4.1 may be virtually repeated with $L^*_n(t)$ in place of $D^*_n(t)$ to obtain the hypothesis.