
Bootstrapping least-squares estimates in biochemical reaction networks

Pages 125-146 | Received 27 Jun 2014, Accepted 18 Mar 2015, Published online: 22 Apr 2015

Abstract

The paper proposes new computational methods for computing confidence bounds for the least-squares estimates (LSEs) of rate constants in mass action biochemical reaction networks and stochastic epidemic models. Such LSEs are obtained by fitting the set of deterministic ordinary differential equations (ODEs) corresponding to the large-volume limit of a reaction network to the network's partially observed trajectory, treated as a continuous-time, pure jump Markov process. In the large-volume limit the LSEs are asymptotically Gaussian, but their limiting covariance structure is complicated, since it is described by a set of nonlinear ODEs which are often ill-conditioned and numerically unstable. The current paper considers two bootstrap Monte-Carlo procedures, based on the diffusion and linear noise approximations for pure jump processes, which allow one to avoid solving the limiting covariance ODEs. The results are illustrated with both in-silico and real data examples from the LINE 1 gene retrotranscription model and compared with those obtained using other methods.


1. Introduction

The rapid advances in modern molecular quantification technology, such as fluorescence microscopy, cellular flow cytometry and high-throughput RNA-seq (see, e.g. [Citation15,Citation26,Citation27]), provide us with an unprecedented opportunity to discover the basic operating principles of molecular systems biology. However, progress seems to largely depend on our ability to bring to bear proper statistical methods capable of separating the biological signal of interest from the vast amount of experimental noise. In order to be effective, such methods must be tailored to the specific experimental tools and data collection mechanisms which, despite continuing improvements, still often provide incomplete and highly noisy information.

In this paper, we are concerned with statistical methods for estimating biochemical reaction rates from data consisting of one or more time-course trajectories of (partial) network species concentrations measured on a pre-defined time grid via, for example, RT-PCR or flow cytometry assays [Citation4,Citation21]. Although this particular problem has been discussed extensively in the computational biology and statistical literature ([Citation9,Citation12] and references therein), its asymptotic analysis was often dismissed because of the small number of longitudinal data points available from typical laboratory experiments. However, as pointed out recently in [Citation11], in many practical situations it is reasonable to assume that the size of the biological system (e.g. a cell culture) under consideration is sufficiently large to allow for its large-volume approximation, a technique often used in modern population dynamics models, for example, in epidemic spread studies [Citation1]. The volume-limit idea should be particularly appealing to molecular biology experimentalists, as it circumvents the problem of small longitudinal sample sizes due to instrumental limitations and cost considerations. The same approach is also applicable (although this is not the focus of the current paper) to stochastic epidemic models [Citation1].

Using the large-volume argument, Rempala [Citation16] has shown that the popular method of parameter estimation based on the least-squares criterion yields consistent and asymptotically efficient results when fitting the deterministic system of ordinary differential equations (ODEs) to stochastic data arising from large collections of molecules (e.g. in RT-PCR experiments) under the law of mass action kinetics. However, as also pointed out in [Citation16], the practical obstacle to the large-volume approximation for the least-squares estimates (LSEs) is the fact that their joint asymptotic covariance structure is given only implicitly, by the solution to a set of nonlinear differential equations. Consequently, because the associated inverse problems are often numerically ill-posed, it is quite difficult to obtain asymptotic confidence bounds for the LSEs, even in relatively simple models.

In the current paper, we aim to improve the practical applicability of the results discussed in [Citation16]. First, we consider a slightly more general version of the LSE, the so-called penalized LSE or PLSE. Whereas in many modern statistical analysis problems penalized methods are used to improve identifiability and decrease variability in model predictions, in our case the biggest potential gain from their usage is the stabilization of the variance matrix (for fixed volume), in a way analogous to the ridge model in classical regression analysis [Citation10]. Furthermore, based on the consistency of PLSEs, we derive and illustrate with examples two large-volume parametric bootstrap (Monte-Carlo) procedures which allow for consistent estimation of the covariance structure, possibly under limited and only partially observable trajectory data. In order to improve efficiency in re-computing model trajectories, we propose using the diffusion and closely related linear noise approximations (LNAs) [Citation12] within the bootstrap algorithm. We show that the consistency of our procedures follows from the classical large-volume limit theorem for density-dependent jump processes (see, e.g. [Citation7, Chapter 10]). Finally, in order to apply our methods to practical experimental data, we also propose a procedure based on a profile likelihood bootstrap algorithm which allows for a more realistic selection of the effective volume of the modelled system. In essence, we treat the volume as an estimable parameter of the model and fit it via the maximum likelihood method.

Beyond this introductory section, the paper is organized as follows. In Section 2, we introduce the concept of a stochastic biochemical network under mass action dynamics, along with the necessary notation. In Section 3, we discuss the least-squares estimation problem for mass action biochemical networks and prove basic consistency and normality results for penalized least-squares estimates (PLSEs). These PLSE results are analogous to those in [Citation16] but form the foundation for our later analysis and are provided for completeness. As they are consequences of a general law of large numbers and a central limit theorem (CLT) for density-dependent stochastic processes, we provide a brief review of the relevant facts about such processes in Appendix 1. In Section 4, our main results on the bootstrap algorithms for estimating the LSE distribution are discussed, along with the formulation of the required large-volume consistency and diffusion approximation results. The proofs are deferred to Appendix 2. Finally, in Section 5, we illustrate some of the concepts discussed earlier with the analysis of the LINE 1 retrotransposition model for both synthetic and real data, comparing the latter analysis to the one presented in [Citation16], which relied on simple asymptotic LSE bounds. Our conclusions and a brief summary are given in Section 6.

2. Jump processes and mass action kinetics

A reaction system or reaction network is typically defined as a set of $d$ constituent chemical species reacting according to a finite set of equations, each describing the system state change after the occurrence of a particular reaction. For instance, the chemical reaction $A + B \xrightarrow{\lambda} C$ is interpreted as 'a molecule of species A combines with a molecule of species B to create a molecule of species C at rate $\lambda$'. In this paper, the reaction system is treated as a continuous-time Markov chain, with its state described by a $d$-dimensional vector $X(t)$ keeping track of each species' molecular count at time $t$. The $k$th reaction is determined by the reaction rate function $\lambda_k(x)$ along with two vectors, $\nu_k$ and $\nu_k'$, which give, respectively, the numbers of molecules of each species consumed and produced in the reaction. Both $\nu_k$ and $\nu_k'$ are elements of $\mathbb{Z}_{\ge 0}^d$ and the corresponding reaction rate is a non-negative scalar. Under the assumptions that the system is well stirred and thermally equilibrated, the reaction rates take on the classical mass action form and the waiting times between reactions are exponentially distributed (see, e.g. [Citation8]). Intuitively, the mass action form of the $k$th reaction rate function represents the number of different ways one can choose the molecular substrates needed for reaction $k$ to occur [Citation7, Chapter 10]. Defining $|\nu_k| = \sum_i \nu_{ik}$, the $k$th reaction rate function has the form

$$\lambda_k^{(n)}(x) = \frac{n\kappa_k}{n^{|\nu_k|}} \prod_i \nu_{ik}!\binom{x_i}{\nu_{ik}}.$$

Here, the parameter $n$ denotes the effective volume of the system. Whereas in many classical chemistry applications $n$ is simply the physical volume of a container (e.g. a cell) multiplied by Avogadro's number, in biological systems, where the physical units of the parameters are often not readily available,Footnote1 it is convenient to consider $n$ simply as a scaling parameter proportional to the overall system volume [Citation2,Citation3].
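To make the rate formula concrete, here is a minimal Python sketch computing $\lambda_k^{(n)}(x)$ for a single reaction channel; the helper name `propensity` and its argument layout are our own illustrative choices rather than notation from the paper.

```python
import math
import numpy as np

def propensity(x, kappa, nu, n):
    """Mass action propensity: n * kappa_k / n^{|nu_k|} * prod_i nu_ik! * binom(x_i, nu_ik)."""
    rate = n * kappa / n ** int(np.sum(nu))
    for xi, vik in zip(x, nu):
        # nu_ik! * binom(x_i, nu_ik) equals the falling factorial x_i (x_i - 1) ... (x_i - nu_ik + 1)
        rate *= math.factorial(int(vik)) * math.comb(int(xi), int(vik))
    return rate
```

For example, for the reaction $A + B \to C$ with counts $x = (10, 5, 0)$, substrate vector $\nu_k = (1, 1, 0)$, $\kappa_k = 1$ and $n = 100$, the call `propensity(np.array([10, 5, 0]), 1.0, np.array([1, 1, 0]), 100)` returns $100 \cdot 10 \cdot 5 / 100^2 = 0.5$.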

The exponential waiting time distribution, along with the probability of a particular reaction [Citation8], justifies the system time-evolution equation

(1) $X(t) = X(0) + \sum_k Y_k\left(\int_0^t \lambda_k^{(n)}(X(s))\,ds\right)(\nu_k' - \nu_k),$

where the $Y_k$ are independent unit Poisson processes. Denoting by $c = n^{-1}x$ the species concentration per unit volume and observing the behaviour of the mass action rate functions in the large-volume limit ($n \to \infty$), one obtains the approximation

(2) $\lambda_k^{(n)}(x) \approx n\kappa_k \prod_i c_i^{\nu_{ik}} =: n\lambda_k(c).$

When $\lambda_k^{(n)}(x)$ is replaced with $n\lambda_k(x/n)$ in Equation (1), the process $X(t)$ becomes a member of a general family of the so-called density-dependent Markov jump processes (DDMJP) described in the appendix. As shown in the appendix (cf. Theorem A.1), the approximation (2) ties together the stochastic jump process and its large-volume limit, which turns out to be given by the solution of an integral version of the classical law of mass action ODE (the so-called reaction rate equation, cf., e.g. [Citation24]),

(3) $\dot{c}(t) = \sum_k \kappa_k \prod_i c_i^{\nu_{ik}}(t)\,(\nu_k' - \nu_k).$

The above ODE is parametrized by a vector of parameters, say $\beta$, involving the rate constants $\kappa_k$ as well as possibly the initial conditions $c(0)$. In what follows, we write $c^\beta(t)$ for any such solution to either Equation (3) or its integral version (see also Appendix 1).
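Equation (1) can be simulated exactly with Gillespie's direct method: draw an exponential waiting time at the total jump rate, pick the reaction that fires proportionally to its propensity, and update the state by $\nu_k' - \nu_k$. The sketch below reuses the hypothetical `propensity` helper above; the function name and interface are again ours.

```python
import numpy as np

def gillespie(x0, kappas, nus, nu_primes, n, t_max, rng=np.random.default_rng(1)):
    """Exact simulation of the pure jump process (1) via Gillespie's direct method."""
    t, x = 0.0, np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while t < t_max:
        rates = np.array([propensity(x, k, nu, n) for k, nu in zip(kappas, nus)])
        total = rates.sum()
        if total == 0.0:                  # absorbing state: no reaction can fire
            break
        t += rng.exponential(1.0 / total)             # exponential waiting time
        k = rng.choice(len(rates), p=rates / total)   # reaction k fires w.p. rate_k / total
        x = x + (nu_primes[k] - nus[k])               # state update by nu_k' - nu_k
        times.append(t); states.append(x.copy())
    return np.array(times), np.array(states)
```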

3. Penalized least squares

Let the counts of $d$ species observed at discrete time points $t_i$ ($i=1,\dots,T<\infty$) be denoted by $X^{(n,\beta)}_i \in \mathbb{Z}_{\ge 0}^d$, where for now we assume that the observed data arrive from a single trajectory of a DDMJP for which the effective system volume $n$ is known. The extension to multiple trajectories is straightforward but not considered here, except for the very simple case of independent observations (see Corollary 3.1 of Theorem 3.2). The case when $n$ is unknown is discussed in Section 5. Following Rempala [Citation16], we denote the scaled molecular counts $C_n(t_i) = C_n^\beta(t_i) = X^{(n,\beta)}(t_i)/n$ and assume that $\beta_0 \in \Theta \subset \mathbb{R}^p$ is the true parameter value we intend to estimate. Throughout the paper, the PLSE is defined as any solution (not necessarily unique) to the optimization problem

(4) $\hat\beta_n = \operatorname{argmin}_{\beta\in\Theta} \sum_i |C_n(t_i) - c^\beta(t_i)|^2 + 2\gamma_n h(\beta),$

where $\gamma_n \ge 0$, $h$ is a penalty function and $\Theta$ is an open set in $\mathbb{R}^p$ containing $\beta_0$. Common forms of $h(x)$ include $|x|_2^2 = \sum_i x_i^2$ for ridge regression, $|x|_1 = \sum_i |x_i|$ for the lasso [Citation23], and $|x|_2^2 + |x|_1$ for the elastic net [Citation29]. Note that in our setting we will typically have $\beta \ge 0$ component-wise, in which case the lasso penalty becomes (right) differentiable, as it reduces to $h(\beta) = \sum_i \beta_i$.
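A minimal sketch of fitting the criterion (4) numerically: solve the reaction rate ODE (3) at a candidate $\beta$, accumulate the squared distance to the scaled counts, add the penalty, and minimize. The helper names (`plse`, `rhs`) and the derivative-free Nelder-Mead optimizer are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def plse(t_obs, C_obs, rhs, beta_init, c0, gamma_n,
         penalty=lambda b: np.sum(np.abs(b))):
    """Penalized least squares (4); `rhs(t, c, beta)` is the RHS of the ODE (3).

    The default penalty is the lasso, which for beta >= 0 reduces to sum(beta).
    """
    def objective(beta):
        sol = solve_ivp(rhs, (0.0, t_obs[-1]), c0, t_eval=t_obs,
                        args=(beta,), rtol=1e-8)
        resid = C_obs - sol.y.T        # rows: time points, columns: species
        return np.sum(resid ** 2) + 2.0 * gamma_n * penalty(beta)

    # Nelder-Mead avoids differentiating the (possibly non-smooth) penalty
    return minimize(objective, beta_init, method="Nelder-Mead").x
```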

For any matrix $A$, define $\|A\| = \sqrt{\operatorname{Tr}(AA^\top)}$. The following two results, extending those in [Citation16], will be useful in the following sections. The first one establishes the strong law of large numbers (SLLN) and the consistency of $\hat\beta_n$ as $n \to \infty$.

Theorem 3.1

SLLN

Let $\partial_k c^\beta$ denote the derivative of $c^\beta \in \mathbb{R}_{\ge 0}^d$ with respect to the $k$th coordinate of $\beta \in \mathbb{R}^p$ and set $\nabla c^\beta = [\partial_k c_j^\beta] \in \mathbb{R}^{p\times d}$. Assume that $\|\nabla c^\beta\| < \infty$, that the SLLN for DDMJP holds (cf. Theorem A.1 in the appendix), that the subderivatives of the penalty function $h$ are bounded, and that $\gamma_n \to 0$ as $n \to \infty$. If

(5) for $\beta_1, \beta_2 \in \Theta$, $c^{\beta_1}(t_i) = c^{\beta_2}(t_i)$ for $i = 1,\dots,T$ implies $\beta_1 = \beta_2$,

and for some $\varepsilon > 0$

(6) $\sum_i \partial_k c^\beta(t_i)^\top\big(c^\beta(t_i) - c^{\beta_0}(t_i)\big) \ne 0$ for some $k$, whenever $\beta \ne \beta_0$ and $|\beta - \beta_0| < \varepsilon$,

then the penalized LSEs $\hat\beta_n$ given by Equation (4) are strongly consistent, that is, $\hat\beta_n \to \beta_0$ a.s. as $n \to \infty$.

Proof.

Let $\theta \in \mathbb{R}^p$ denote the vector of subderivatives (the subgradient) of $h(\beta)$. By the so-called KKT necessary optimality conditions (see, e.g. [Citation20, Chapter 3]), any solution $\hat\beta_n$ of the optimization problem (4) has to satisfy

$\sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\hat\beta}(t_i)\big) = \gamma_n \theta,$

with $C_n(t_i)$ and $c^{\hat\beta}(t_i) \in \mathbb{R}^d$ denoting, respectively, the observations of the $d$ chemical species concentrations measured at $t_i$ and the corresponding deterministic solution of Equation (3) evaluated at $t_i$ and $\hat\beta$. Adding and subtracting the deterministic limit gives

$\sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\beta_0}(t_i) + c^{\beta_0}(t_i) - c^{\hat\beta}(t_i)\big) = \gamma_n \theta$

or

(7) $\sum_i \nabla c^{\hat\beta}(t_i)\big(c^{\hat\beta}(t_i) - c^{\beta_0}(t_i)\big) + \gamma_n \theta = \sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\beta_0}(t_i)\big).$

Under the SLLN for DDMJP cited in the appendix (cf. Theorem A.1), the condition $\gamma_n \to 0$ and the boundedness of $\nabla c^\beta(t_i)$ imply that the RHS above converges a.s. to 0 and hence so does the LHS. Consequently, $\lim_n \sum_i \nabla c^{\hat\beta}(t_i)(c^{\hat\beta}(t_i) - c^{\beta_0}(t_i)) = 0$. Under the assumptions of trajectory identifiability (5) and non-degeneracy (6), asymptotic inversion gives $\lim_n \hat\beta_n = \beta_0$ a.s.

Remark 3.1.

Note that the conditions (5) and (6) are necessary to guarantee LSE consistency. The selection of the points $\{t_i\}$ is therefore important to ensure good statistical properties of the LSEs, as also seen in our next theorem below. The required boundedness of the subderivatives of $h$ is, on the other hand, a relatively mild condition, satisfied, e.g. for $h(x) = |x|_1$ and $h(x) = |x|_2^2$. Indeed, the latter is simply differentiable, and for the former, recall that the subderivative of $|x|$ is $-1$, $1$, or any number in $[-1,1]$, respectively, according to whether $x<0$, $x>0$ or $x=0$.

The second result of this section is the CLT for the scaled PLSEs $\sqrt{n}(\hat\beta - \beta_0)$. Similarly to the SLLN above, this one is also a consequence of the appropriate result for DDMJP.

Theorem 3.2

CLT

Suppose that the assumptions of the previous theorem hold and, additionally, $\gamma_n = o(n^{-1/2})$ and $h$ is twice differentiable in $\Theta \subset \mathbb{R}^p$. Furthermore, assume that the hypothesis of the CLT for DDMJP holds (cf. Theorem A.2 in the appendix). Then for any $s \in \mathbb{R}^p$

$\sup_x \big|P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big) - P(s^\top Z_n \le x)\big| \to 0$ as $n \to \infty$,

where $Z_n \sim \mathrm{MVN}(0, \Sigma_{\hat\beta})$ is a multivariate zero-centred Gaussian vector with covariance matrix

(8) $\Sigma_{\hat\beta} = \mathrm{Var}\Big[\sum_i (B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)\,\sqrt{n}\big(C_n(t_i) - c^{\beta_0}(t_i)\big)\Big] = \sum_i A_i\,\mathrm{Var}(V_i)\,A_i^\top + \sum_{i<j}\big[A_i\,\mathrm{Cov}(V_i,V_j)\,A_j^\top + A_j\,\mathrm{Cov}(V_j,V_i)\,A_i^\top\big],$

where $A_i = (B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)$. Here, $B_{\hat\beta} = \sum_i \nabla c^{\hat\beta}(t_i)[\nabla c^{\beta_0}(t_i)]^\top$, $H$ is the Hessian matrix of the penalty function $h$ and $\mathrm{Cov}(V_i,V_j) = \mathrm{Cov}(V(t_i),V(t_j))$ is given by Equation (A5) of Theorem A.2 in the appendix.

Proof.

In the notation of the previous proof, under the differentiability assumptions on the penalty function $h$, in Equation (7) we may take $\theta = \nabla h$, the gradient vector of the penalty. Expanding the LHS of Equation (7) about $\beta_0$ gives

$\sum_i \nabla c^{\hat\beta}(t_i)\big(C_n(t_i) - c^{\beta_0}(t_i)\big) + \gamma_n\nabla h(\hat\beta) = \sum_i \nabla c^{\hat\beta}(t_i)\big(c^{\hat\beta}(t_i) - c^{\beta_0}(t_i)\big) + \gamma_n H(\beta_0)(\hat\beta - \beta_0) + \gamma_n\nabla h(\beta_0) + O_p(|\hat\beta - \beta_0|)(\hat\beta - \beta_0)$
$= \sum_i \nabla c^{\hat\beta}(t_i)[\nabla c^{\beta_0}(t_i)]^\top(\hat\beta - \beta_0) + \gamma_n H(\beta_0)(\hat\beta - \beta_0) + O_p(|\hat\beta - \beta_0|)(\hat\beta - \beta_0) + \gamma_n\nabla h(\beta_0)$
$= (B_{\hat\beta} + \gamma_n H(\beta_0))(\hat\beta - \beta_0) + O_p(|\hat\beta - \beta_0|)(\hat\beta - \beta_0) + \gamma_n\nabla h(\beta_0).$

In view of the theorem assumptions, the higher order terms scaled by $\sqrt{n}$ tend to 0, and therefore

(9) $\sqrt{n}(\hat\beta - \beta_0) = (B_{\hat\beta} + \gamma_n H(\beta_0))^{-1}\sum_i \nabla c^{\hat\beta}(t_i)\sqrt{n}\big(C_n(t_i) - c^{\beta_0}(t_i)\big) - \gamma_n\sqrt{n}(B_{\hat\beta} + \gamma_n H(\beta_0))^{-1}\nabla h(\beta_0) + o_p(1) = (B_{\hat\beta} + \gamma_n H(\beta_0))^{-1}\sum_i \nabla c^{\hat\beta}(t_i)\sqrt{n}\big(C_n(t_i) - c^{\beta_0}(t_i)\big) + o_p(1),$

since the condition $\gamma_n = o(n^{-1/2})$ implies that the middle term converges to 0. From Theorem A.2 in the appendix, it follows that the $d$-vectors $\sqrt{n}(C_n(t_i) - c^{\beta_0}(t_i))$ ($i=1,\dots,T$) converge weakly to $d$-variate normal variables whose covariance structure is given by $\mathrm{Cov}(V_i,V_j)$ described in Equation (A5). Therefore, relation (9), along with continuous mapping and Slutsky's theorem arguments (see, e.g. [Citation22], p. 133), implies that $\sqrt{n}(\hat\beta - \beta_0) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0})$, where $\Rightarrow$ denotes the usual weak convergence of random vectors. Since $\mathrm{MVN}(0, \Sigma_{n,\hat\beta}) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0})$ by Slutsky's theorem, the hypothesis follows via the Cramér-Wold device (see, e.g. [Citation22], p. 56).

Remark 3.2.

Note that relation (9) indicates that for finite $n$ the penalty term may decrease the variability of the LSE by stabilizing the inverse of the matrix $B_{\hat\beta}$. Indeed, for instance for $h(x) = |x|_2^2$ this effect is similar to that observed in the ridge regression model. This is further illustrated in the numerical example in Section 5 (cf. Table 1).

Table 1. Penalized vs. ordinary LSEs for L1.

In Theorems 3.1 and 3.2, we have assumed that the data come from a single trajectory of the jump process. However, the results immediately extend to the case of multiple independent trajectories. This special case of independent observations is of interest, for instance, in RT-PCR experiments with multiple cell-culture dishes and therefore deserves special attention.

Corollary 3.1.

Assume that the trajectory data $X^{(n,\beta)}_i \in \mathbb{Z}_{\ge 0}^d$ ($i=1,\dots,T<\infty$) are mutually independent. Then $\mathrm{Cov}(V_i,V_j) = 0$ for $i \ne j$ and, assuming that the remaining conditions of Theorem 3.2 hold true, its hypothesis also holds with

$\Sigma_{\hat\beta} = \sum_i (B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)\,\mathrm{Var}(V_i)\,\big[(B_{\hat\beta} + \gamma_n H(\hat\beta))^{-1}\nabla c^{\hat\beta}(t_i)\big]^\top.$

As already indicated in Section 1, the complicated, non-explicit covariance structure of the collection of random variables $\sqrt{n}\,\nabla c^{\hat\beta}(t_i)(C_n(t_i) - c^{\beta_0}(t_i))$ ($i=1,\dots,T$) limits the usefulness of relation (9) for obtaining asymptotic confidence bounds for the penalized LSE. Although for some simple reaction networks an explicit analytic expression for Equation (8) may exist, in most cases of practical interest it can only be evaluated numerically (cf. Section 5). As such evaluations become more unstable with increasing system complexity (see, e.g. [Citation25]), methods of approximating the asymptotic distribution of the LSEs without explicitly evaluating formula (8) are naturally of great interest. This issue is discussed next.

4. Bootstrap algorithms

We present here two resampling algorithms for estimating the distribution of the PLSEs discussed in the previous section. As seen below, this may be done either directly, by considering the appropriate resampling distributions themselves, or indirectly, by considering the resampling estimates of the asymptotic covariance in Theorem 3.2.

4.1. Diffusion approximation

For the purpose of bootstrapping trajectories, it is useful to have an easy-to-simulate approximation to the concentration process $C_n(t) = X(t)/n$, where $X(t)$ is given by Equation (1) (cf. also Equation (A2) in the appendix). To this end, consider the following Itô diffusion process $D_n(t)$ (see, e.g. [Citation2]):

(10) $D_n(t) = D_n(0) + \sum_k \frac{1}{\sqrt{n}}\int_0^t \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F(D_n(s))\,ds,$

where the $B_k(t)$ are independent standard Brownian motions and

(11) $F(x) = \sum_k \lambda_k(x)(\nu_k' - \nu_k)$

is the drift function. Note that under the law of mass action approximation (2), $F(x)$ is asymptotically ($n \to \infty$) equal to the RHS of the reaction rate equation (3). As shown in Lemma A.1 in the appendix, the process defined by Equation (10) approximates $C_n(t)$ in the sense that both share the same deterministic large-volume limit, given by the solution $c(t)$ of the reaction rate equation (3). Note that there is an obvious computational benefit in considering the process $D_n$ instead of the pure jump process $X(t)$, which requires an exact (hence expensive) simulation of the trajectory (1) [Citation8]. For large systems, the simulation of $D_n$ is typically substantially less computationally intensive and may be performed via one of the many standard numerical methods available in the literature [Citation28, Chapter 5].
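A minimal Euler-Maruyama sketch for simulating the diffusion (10); the helper name `euler_maruyama` and the clipping of rates at zero before taking square roots are our illustrative choices.

```python
import numpy as np

def euler_maruyama(d0, lambdas, stoich, n, t_grid, rng):
    """Euler-Maruyama discretization of the diffusion (10).

    lambdas : list of concentration-scale rate functions lambda_k(c)
    stoich  : array of net change vectors nu_k' - nu_k, shape (K, d)
    """
    d = np.array(d0, dtype=float)
    path = [d.copy()]
    for dt in np.diff(t_grid):
        lam = np.array([max(lk(d), 0.0) for lk in lambdas])   # keep sqrt real
        drift = stoich.T @ lam              # F(D) = sum_k lambda_k(D) (nu_k' - nu_k)
        noise = stoich.T @ (np.sqrt(lam * dt) * rng.standard_normal(len(lam)))
        d = d + drift * dt + noise / np.sqrt(n)
        path.append(d.copy())
    return np.array(path)
```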

The considerations above lead to the diffusion bootstrap algorithm (Algorithm 1) for approximating either the distribution of the penalized LSE $\hat\beta$ directly or the covariance matrix $\Sigma_{\hat\beta}$. The formal justification of Algorithm 1 is given in the following consistency result (Theorem 4.1), stating that as the system volume increases, the bootstrap samples have the asymptotically correct empirical distribution and, consequently, may be used to estimate the covariance structure of the least-squares and penalized least-squares estimates with vanishing penalty. The proof is given in Appendix 2.
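The boxed statement of Algorithm 1 did not survive the extraction, so the following Python sketch reflects only the description in the text: fit $\hat\beta$ to the observed trajectory, repeatedly simulate the diffusion (10) at $\hat\beta$, refit each resample, and summarize the replicates. The names `fit` and `simulate` are placeholders for, e.g., the hypothetical `plse` and `euler_maruyama` sketches above.

```python
import numpy as np

def diffusion_bootstrap(t_obs, C_obs, fit, simulate, M, rng):
    """Algorithm 1 (sketch): parametric bootstrap via the diffusion approximation.

    fit(t_obs, traj)           -> PLSE for a trajectory observed on t_obs
    simulate(beta, t_obs, rng) -> one diffusion path D*_n(t_i) at beta
    Returns the bootstrap replicates and their empirical covariance,
    an estimate of the covariance matrix in (8).
    """
    beta_hat = fit(t_obs, C_obs)               # step 1: fit the observed data
    reps = np.empty((M, len(beta_hat)))
    for m in range(M):                         # steps 2-3: resample and refit
        D_star = simulate(beta_hat, t_obs, rng)
        reps[m] = fit(t_obs, D_star)
    return reps, np.cov(reps, rowvar=False)
```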

Theorem 4.1.

Assume $C_n(t) = X(t)/n$ is a scaled trajectory from the pure jump process (1) with $\beta = \beta_0$. Let $\hat\beta = \operatorname{argmin}_{\beta\in\Theta} \sum_i |C_n(t_i) - c^\beta(t_i)|^2 + \gamma_n h(\beta)$ be associated with the probability space $(\Omega, \mathcal{F}, P)$. Let $D^*_n(t)$ denote the diffusion process parametrized by the LSE $\beta = \hat\beta$, and let $\hat\beta^* = \operatorname{argmin}_{\beta\in\Theta} \sum_i |D^*_n(t_i) - c^\beta(t_i)|^2 + \gamma_n h(\beta)$ be the penalized least-squares estimate associated with the conditional probability space $(\Omega^*, \mathcal{F}^*, P^*_n)$. Then, under the assumptions of Theorem 3.2, for all $s \in \mathbb{R}^p$,

$\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big)\big| \to 0$ a.s. as $n \to \infty$.

Although in the above theorem we have assumed that the penalty function is differentiable, it should be noted that this is not necessary. Provided that the subderivatives of the penalty term are bounded, the term vanishes asymptotically and the theorem hypothesis still holds.

Remark 4.1.

From the proof of the theorem, it is evident that the hypothesis of Theorem 4.1 extends to the case of independent observations of Corollary 3.1.

4.2. Linear noise approximation

An alternative approximation of Equation (1), often numerically simpler than $D_n$, may be obtained by directly appealing to the limit theorem for DDMJP, that is, Theorem A.2 in the appendix. Since under the assumptions of Theorem A.2 we have $\sqrt{n}(C_n(t) - c^\beta(t)) \Rightarrow V(t)$ as $n \to \infty$, where $V(t)$ is the Gaussian process satisfying the distributionally equivalent diffusion equations (A3) and (A4), a natural approximation of the process $C_n(t)$ is

(12) $L_n(t) = c^\beta(t) + \frac{V(t)}{\sqrt{n}},$

where we write $\beta$ for $\beta_0$ for notational convenience. The process $L_n(t)$ is commonly referred to as the LNA of $C_n(t)$ [Citation12]. In our present context, it may also be heuristically derived as follows (for another heuristic derivation cf., e.g. [Citation28, Chapter 8]). Since $C_n(t) \approx D_n(t)$ given by Equation (10) and $C_n \to c^\beta$ a.s. (cf. Theorem A.1),

$C_n(t) \approx C_n(0) + \frac{1}{\sqrt{n}}\sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F(C_n(s))\,ds.$

Substituting $L_n(t)$ defined in Equation (12) for $C_n(t)$, and applying Taylor's expansion about $c^\beta(t)$ to the drift function $F(x)$, one obtains for large $n$

$c^\beta(t) + \frac{V(t)}{\sqrt{n}} \approx C_n(0) + \frac{1}{\sqrt{n}}\sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F\Big(c^\beta(s) + \frac{V(s)}{\sqrt{n}}\Big)\,ds$
$\approx C_n(0) + \frac{1}{\sqrt{n}}\sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t F(c^\beta(s))\,ds + \frac{1}{\sqrt{n}}\int_0^t \nabla F(c^\beta(s))V(s)\,ds,$

where $[\nabla F]_{ij} = [\partial_j F_i]$. Assuming that $C_n(0) = c_0 + v_0/\sqrt{n}$, where both $c_0$ and $v_0$ are non-random, and collecting terms of different order in $n$, the above yields

(13) $c^\beta(t) = c_0 + \int_0^t F(c^\beta(s))\,ds$

and

(14) $V(t) = v_0 + \sum_k \int_0^t \sqrt{\lambda_k(c^\beta(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \nabla F(c^\beta(s))V(s)\,ds,$

showing that the quantities $c^\beta(t)$ and $V(t)$ in Equation (12) are indeed (distributionally) the same as those defined in Theorem A.2.
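A sketch of simulating the LNA (12): integrate the deterministic path once, then generate the Gaussian fluctuation $V(t)$ with a forward-Euler step of the linear SDE (14). The helper names and the discretization are our assumptions. Note that a single set of $V$ draws can be rescaled by $1/\sqrt{n}$ for any candidate $n$, the property exploited in Section 5.3.

```python
import numpy as np

def simulate_lna(c_path, lambdas, grad_F, stoich, n, t_grid, rng, v0=None):
    """LNA resample (12): L_n(t) = c(t) + V(t)/sqrt(n), with V from the linear SDE (14).

    c_path : deterministic solution c(t_i) on t_grid, shape (len(t_grid), d)
    grad_F : function c -> Jacobian of the drift F, shape (d, d)
    """
    d = c_path.shape[1]
    v = np.zeros(d) if v0 is None else np.array(v0, dtype=float)
    V = [v.copy()]
    for i, dt in enumerate(np.diff(t_grid)):
        c = c_path[i]
        lam = np.array([max(lk(c), 0.0) for lk in lambdas])
        noise = stoich.T @ (np.sqrt(lam * dt) * rng.standard_normal(len(lam)))
        v = v + grad_F(c) @ v * dt + noise       # Euler step for (14)
        V.append(v.copy())
    return c_path + np.array(V) / np.sqrt(n)     # the same V draw can be reused for any n
```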

The usage of $L_n(t)$ in place of $D_n(t)$ leads to another resampling algorithm, similar to the diffusion bootstrap but potentially exhibiting better numerical stability due to the natural decoupling of the deterministic and stochastic components of the resamples following from Equation (12). For the record, we present this second resampling algorithm below as Algorithm 2.

Formally, the validity of Algorithm 2 is derived from the following analogue of Theorem 4.1. The proof is again deferred to the appendix.

Theorem 4.2.

Let $C_n(t)$ and $\hat\beta$ be as in Theorem 4.1. Let $L^*_n(t)$ denote the LNA process conditional on $\hat\beta$, and let $\hat\beta^* = \operatorname{argmin}_{\beta\in\Theta} \sum_i |L^*_n(t_i) - c^\beta(t_i)|^2 + \gamma_n h(\beta)$ be the penalized least-squares estimate associated with the conditional probability space $(\Omega^*, \mathcal{F}^*, P^*_n)$. Then, under the assumptions of Theorem 3.2, for all $s \in \mathbb{R}^p$,

$\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big)\big| \to 0$ a.s. as $n \to \infty$.

Note that Remark 4.1 following Theorem 4.1 also applies to Theorem 4.2.

4.3. Unobserved species and the EM algorithm

In most practical circumstances, some of the species in the chemical model will be unobservable due to technical or other experimental limitations, like, for instance, the inability to simultaneously record genomic and proteomic data [Citation16]. Let us briefly outline how the inference problem may be handled in such cases. Following the set-up of Section 3, denote by $x_{ij}$ the limiting ($n \to \infty$) scaled concentration of chemical species $j$ at time point $t_i$ (assumed observed) and let $x_j = (x_{1j},\dots,x_{Tj})$ and $x = (x_1,\dots,x_r) \in \mathbb{R}^{rT}$. Define in a similar fashion the vector $y \in \mathbb{R}^{(d-r)T}$ corresponding to the unobserved species, so that the full species vector is $[x, y] = (x, y)$. Under either the LNA or the diffusion approximation assumption, the joint limiting likelihood of the full data is multivariate normal with mean $c^\beta = [c_x^\beta, c_y^\beta] \in \mathbb{R}^{dT}$, the vector of the ODE solutions to Equation (3) for all $d$ species, and precision matrix

$\Sigma^{-1}(\beta) = \begin{pmatrix} \tilde\Sigma_{11}(\beta) & \tilde\Sigma_{12}(\beta) \\ \tilde\Sigma_{21}(\beta) & \tilde\Sigma_{22}(\beta) \end{pmatrix},$

where the partitioning pattern reflects the covariance structure between the species vectors $x$ and $y$. In this notation, the limiting log-likelihood function $\ell(x, y \,|\, \beta)$ of the full data satisfies

$\ell(x,y\,|\,\beta) \propto -\log|\Sigma(\beta)| - ([x,y] - c^\beta)^\top \Sigma^{-1}(\beta)([x,y] - c^\beta)$
$= -\log|\Sigma(\beta)| - (x - c_x^\beta)^\top\tilde\Sigma_{11}(\beta)(x - c_x^\beta) - (y - c_y^\beta)^\top\tilde\Sigma_{21}(\beta)(x - c_x^\beta) - (x - c_x^\beta)^\top\tilde\Sigma_{12}(\beta)(y - c_y^\beta) - (y - c_y^\beta)^\top\tilde\Sigma_{22}(\beta)(y - c_y^\beta).$

With the help of the above, the asymptotic inference on $\beta$ may be performed via the usual EM algorithm [Citation5]. In particular, it follows that the algorithm's E-step has a closed-form expression (see, e.g. [Citation14, Chapter 2]), and the resulting objective may be maximized numerically as soon as the Gaussian covariance structure is estimated (e.g. via Algorithm 1 or 2).
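The closed-form E-step amounts to the usual conditional-Gaussian computation for the unobserved block given the observed one. The sketch below is written in terms of the covariance partition rather than the precision partition used above (the two are equivalent); the function name `e_step` and the block ordering are our assumptions.

```python
import numpy as np

def e_step(x, c_x, c_y, Sigma):
    """Conditional mean and covariance of the unobserved block y given observed x.

    Sigma is the full covariance partitioned as [[S_xx, S_xy], [S_yx, S_yy]]
    with blocks ordered (observed, unobserved); c_x, c_y are the ODE means.
    """
    r = len(c_x)
    S_xx, S_xy = Sigma[:r, :r], Sigma[:r, r:]
    S_yx, S_yy = Sigma[r:, :r], Sigma[r:, r:]
    K = S_yx @ np.linalg.inv(S_xx)       # regression matrix of y on x
    y_mean = c_y + K @ (x - c_x)         # E[y | x]
    y_cov = S_yy - K @ S_xy              # Cov[y | x]
    return y_mean, y_cov
```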

5. Example: L1 system

As an example, consider the retrotranscription model of the LINE1 (or L1) gene introduced and studied earlier in [Citation18]. The system equations given below describe the first stage of the retrotranscription process, in which production of the L1 transcript ($X_1$) is activated by aggregated amounts of transcription facilitating proteins ($X_2$) that interact with the regulatory region of the gene. The set of chemical equations in the system with effective size $n$ is

(15) $X_2 \xrightarrow{\kappa_1} X_1 + X_2, \quad X_1 \xrightarrow{\kappa_2} \emptyset, \quad \emptyset \xrightarrow{n\kappa_3} X_2, \quad X_2 \xrightarrow{\kappa_4} \emptyset.$

In the above system, the so-called equal dispersal condition (EDC) is often assumed, where $\kappa_2 = \kappa_4$. Under EDC the re-parametrized mass action ODE solution $c^\beta$ in Equation (13) has the closed form

(16) $c_1^\beta(t) = \beta_1\beta_3\big[1 + \beta_5(1 + t\beta_2)e^{-\beta_2 t}\big], \quad c_2^\beta(t) = \beta_3\big(1 + \beta_5 e^{-\beta_2 t}\big),$

where $\beta_1 = \kappa_1/\kappa_2$, $\beta_2 = \kappa_2$, $\beta_3 = \kappa_3/\kappa_4$, $\beta_4 = \beta_2$, and $\beta_5$ gives the fold change from the initial concentrations [Citation16]. Note that in this parametrization the regularity conditions (5) and (6) are satisfied for the L1 system and that the diffusion approximation (10) specializes to

(17) $D_{n,1}(t) = D_{n,1}(0) + \frac{1}{\sqrt{n}}\Big[\int_0^t \sqrt{\beta_1\beta_2 D_{n,2}(s)}\,dB_1(s) - \int_0^t \sqrt{\beta_2 D_{n,1}(s)}\,dB_2(s)\Big] + \beta_1\beta_2\int_0^t D_{n,2}(s)\,ds - \beta_2\int_0^t D_{n,1}(s)\,ds,$
$D_{n,2}(t) = D_{n,2}(0) + \frac{1}{\sqrt{n}}\Big[\sqrt{\beta_3\beta_4}\,B_3(t) - \int_0^t \sqrt{\beta_2 D_{n,2}(s)}\,dB_4(s)\Big] + \beta_3\beta_2 t - \beta_2\int_0^t D_{n,2}(s)\,ds,$

where $B_1, B_2, B_3, B_4$ are independent standard Brownian motions. The LNA for L1 may be derived analogously.

5.1. Asymptotic LSE distribution for L1

In order to illustrate the bootstrap methods from Section 4, we compare the bootstrap estimates of the LSE distributions to their true asymptotic counterparts in the L1 model under EDC. The mass action ODE solution for L1 is given by Equation (16). Upon normalizing to the initial concentrations,Footnote2 $\beta_5 = 1$, and the drift function and its gradient are

$F(x_1,x_2) = (\beta_1\beta_2 x_2 - \beta_2 x_1,\; \beta_2\beta_3 - \beta_2 x_2), \quad \nabla F(x_1,x_2) = \begin{pmatrix} -\beta_2 & \beta_1\beta_2 \\ 0 & -\beta_2 \end{pmatrix}.$

The matrices $\Phi$ and $G$ in the corresponding CLT covariance calculations (Theorem A.2) are, respectively,

$\Phi(t,s) = \begin{pmatrix} e^{-\beta_2(t-s)} & \beta_1\beta_2(t-s)e^{-\beta_2(t-s)} \\ 0 & e^{-\beta_2(t-s)} \end{pmatrix}$

and

$G(x_1(s),x_2(s)) = \begin{pmatrix} \beta_1\beta_2 x_2(s) + \beta_2 x_1(s) & 0 \\ 0 & \beta_2\beta_3 + \beta_2 x_2(s) \end{pmatrix}.$

Recall that the asymptotic covariance of the components $V_i = V(t_i)$ of the scaled limiting process in Theorem 3.2 is

$\mathrm{Cov}(V_i,V_j) = \int_0^{t_i \wedge t_j} \Phi(t_i,s)\,G(c^\beta(s))\,\Phi(t_j,s)^\top\,ds,$

which may be numerically evaluated along with the remaining terms in the asymptotic variance expression (8) for $\Sigma_{\hat\beta}$. For comparison, we generated the L1 system trajectories with $\beta_0 = (\beta_1,\beta_2,\beta_3) = (5, 1, 0.1)$, $\beta_4 = \beta_2$ and $\beta_5 = 1$, and assumed the process to be observed at time points from 0.03 to 1.00 in alternating increments of 0.03 and 0.04. For example, the first five time points are 0.03, 0.07, 0.10, 0.13, 0.17. The resulting asymptotic covariance matrix for the LSE (with no penalty, $\gamma_n = 0$), obtained with the help of numerical integration, is shown in Figure 1 along with its bootstrap estimates for varying $n$. For the bootstrap, the diffusion approximation with the corresponding $\beta_0$ was applied using the Euler-Maruyama method with step size $dt = 0.01$. The number of bootstrap samples was taken as $M = 10{,}000$ for each of the system sizes $n = 10^2, 10^3, 10^4$. To additionally illustrate graphically the recovery of the asymptotic covariance structure by the bootstrap algorithms, we show in Figure 2 the theoretical (upper panel) and bootstrap-approximated (lower panel) bivariate marginal distributions of the LSE of $\beta_0$, denoted $\hat\beta$.
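A sketch of the numerical evaluation of $\mathrm{Cov}(V_i,V_j)$ for the L1 model, using the closed forms of $\Phi$, $G$ and the EDC path (16) stated above; the entrywise quadrature via `scipy.integrate.quad` is our choice of method.

```python
import numpy as np
from scipy.integrate import quad

b1, b2, b3 = 5.0, 1.0, 0.1            # beta_0 from the simulation study, beta_5 = 1

def c_beta(s):
    """Closed-form EDC solution (16) with beta_5 = 1."""
    return (b1 * b3 * (1.0 + (1.0 + s * b2) * np.exp(-b2 * s)),
            b3 * (1.0 + np.exp(-b2 * s)))

def Phi(t, s):
    """Fundamental matrix; the Jacobian of F is constant for the L1 model."""
    e = np.exp(-b2 * (t - s))
    return np.array([[e, b1 * b2 * (t - s) * e], [0.0, e]])

def G(s):
    """Diffusion matrix G(c(s)) along the deterministic path."""
    c1, c2 = c_beta(s)
    return np.diag([b1 * b2 * c2 + b2 * c1, b2 * b3 + b2 * c2])

def cov_V(ti, tj):
    """Cov(V_i, V_j) = int_0^{min(ti,tj)} Phi(ti,s) G(c(s)) Phi(tj,s)^T ds."""
    out = np.empty((2, 2))
    for a in range(2):
        for c in range(2):
            integrand = lambda s, a=a, c=c: (Phi(ti, s) @ G(s) @ Phi(tj, s).T)[a, c]
            out[a, c] = quad(integrand, 0.0, min(ti, tj))[0]
    return out
```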

Figure 1. Bootstrap estimate of the LSE covariance for the L1 system. True asymptotic LSE covariance matrix vs. diffusion bootstrap covariance matrices for L1 systems with varying n (via Algorithm 1). To facilitate comparisons, the penalty was removed, i.e. γn=0 for all cases considered.


Figure 2. LSE covariance recovery via bootstrap for the L1 system. The contour plots for the true asymptotic LSE bivariate marginal distributions (top panel) and their LNA bootstrap estimates (bottom panel) for the LSE in the L1 system with n=1000. The recovery of the multivariate Gaussian distribution covariance structure by the bootstrap procedure is clearly visible.


Figures 1 and 2 indicate that the empirical distribution of the bootstrapped LSEs closely approximates the theoretical distribution, illustrating in practice the application of our Algorithms 1 and 2. Note that obtaining the bootstrap samples only requires the ability to simulate from the diffusion process.

5.2. Penalized LSE for L1

In most practical cases, the data-generating process cannot be measured with arbitrary precision, due to various data collection limitations or errors arising during data processing. In general, such processing errors (see, e.g. [Citation19]) add variation to the LSEs and may compound the problems with formula (8), which in practice needs to be evaluated using estimated values of the model parameters at a fixed volume and hence is sensitive to the LSEs' lack of precision. The penalty term $\gamma_n h(\beta)$ in Equation (4) is intended to improve the precision of the LSEs by giving up a small amount of accuracy (recovering it in the limit, since $\gamma_n \to 0$). Since, as already indicated in Section 3, the penalty term also contributes to stabilizing the inverse matrix in Equation (8), the use of the penalized formulation (4) for LSEs with a fixed sample size may bring considerable benefits.

To illustrate this point, we now compare the bootstrap estimates for the lasso-penalized and ordinary ($\gamma_n = 0$) LSE in the L1 model for various system sizes. The values of the parameters $\beta_0$ are as above. Since the lasso penalty is right-differentiable in our setting, as $\beta \ge 0$ coordinate-wise, all the forms of penalty discussed in Section 3 are smooth and hence roughly equivalent (see [Citation10]). To account for observational errors, we also introduce an independent mean-zero Gaussian noise at each observation grid time point. The variance of this noise is set to 5% of the maximum concentration of mRNA L1 transcripts ($X_1$), which is equivalent to approximately 25% of the maximum concentration of the transcription facilitating proteins ($X_2$). The penalty rate is chosen as $\gamma_n = n^{-3/4}$ for $n = 10, 10^2, 10^3$, and $10^4$, so as to satisfy the LSE consistency requirements, although in a real data analysis one would typically choose it by some form of cross-validation (cf. [Citation6]). The results of the analysis for $M = 10{,}000$ resamples are summarized in Table 1. As seen from the table, for the system sizes $n = 10$ and $n = 100$ the penalized LSE enjoys, respectively, a 10-fold and a 2-fold decrease in variance compared with the ordinary LSE of $\beta_1$, with only a modest increase in bias. For the penalized estimate of $\beta_2$, there is nearly a 5-fold decrease in variance for $n = 10$ but only a moderate decrease for $n \ge 100$, while the variability of the $\beta_3$ estimates remains roughly the same for the LSE and the penalized LSE for $n \ge 100$. From the inspection of the table entries it should be clear that with noisy data the penalized LSE does better in terms of variance for small system sizes, whereas for larger $n$ the performance of the ordinary and penalized LSE is very similar.

5.3. RT-PCR Data for L1

In order to consider a more realistic biological example, in addition to the above synthetic data we also consider real-time PCR data from one of the time-course experiments analysing the L1 system in HeLa cell cultures [Citation13] under external chemical stress, as described in [Citation17]. The data consist of the L1 transcript values ($X_1$) only, as the protein ($X_2$) is unobservable. The L1 transcript measurements from the experiment are presented in Table 2, taken from Rempala [Citation16], where an LSE-based asymptotic confidence bound for this particular data set was obtained by approximating the overall system size and using the asymptotic distribution of the LSE. The system size approximation in [Citation16] was based on a relatively crude fluorescence conversion method introduced in [Citation17]. We now illustrate how the bootstrap approach may be used to help identify the most likely system size, without using a (possibly unreliable) method based on fluorescent signal conversion. Note that, unlike in the previous examples, the time-course L1 transcript measurements reported in Table 2 were obtained from genetically identical but spatially separated L1 cell cultures (different tissue culture plates) and hence represent not a single trajectory but rather multiple trajectories of the stochastic system (15), each observed at a different time point. Nonetheless, due to the genetic homogeneity of the tissue cultures, the initial conditions for the observed trajectories and the respective experimental system sizes $n$ may be assumed to be approximately equal, and Theorems 4.1 and 4.2 (and Algorithms 1 and 2) are still applicable in view of Remark 4.1.

Table 2. Time-course RT-PCR experiment results for L1 gene.

As in our earlier analysis, we took $\beta_4 = \beta_2$ and $\beta_5 = 1$, reducing Equation (16) to the following one-dimensional model:

(18) $c_1^\beta(t) = \beta_1\beta_3\big[1 + (1 + t\beta_2)e^{-\beta_2 t}\big].$

Since with the concentration $c_2^\beta$ unobserved this model is no longer identifiable, we also fix $\beta_3 = 1$ and hence reduce the initial L1 model to only two parameters $(\beta_1, \beta_2)$. Application of the LNA diffusion approximation and the EM algorithm to the limiting LSE problem, as discussed in Section 4.3 (with no penalty, i.e. $\gamma_n = 0$),Footnote3 gave the following values of the estimates, which are very similar to those obtained in [Citation16]: $\hat\beta_1 = 6.18$, $\hat\beta_2 = 0.027$.

Based on these numerical values, the bootstrap algorithm (Algorithm 2) was applied to calculate the confidence bounds for the observed longitudinal trajectory of the L1 transcripts. The bounds depend on the unknown system size $n$, which was estimated via the method of likelihood profiling as follows. First, for a grid of $n$ values (from $n = 500$ to 9000 in steps of 500), the bootstrap approximation based on $M = 5000$ resamples was obtained using Algorithm 2 (note that with Algorithm 2 the same set of bootstrap resamples may be used for different $n$). Next, the actual value of the observed error was compared with the modes of the error distributions for varying $n$. The best mode-to-data alignment (over the gridded values of $n$) was obtained at $n = 5000$, and hence this value was taken as the estimated effective system size. The process is illustrated in Figure 3. In Figure 4, the 95% confidence envelopes of the original analysis in [Citation16] (where the effective system size was taken as $n = 1000$) are compared with the ones obtained from the likelihood profiling ($n = 5000$). As may be seen from the plots, the new, larger estimate of the system size has the effect of shrinking the uncertainty around the observed trajectory due to generally tighter confidence bounds about the observed data points, illustrating the advantage of the bootstrap-based analysis for the data in Table 2.
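A sketch of the profiling step described above: because the LNA resamples decouple as $c(t) + V(t)/\sqrt{n}$, one set of fluctuation draws can be rescaled for every candidate $n$ on the grid. The helper name `profile_volume` and the histogram-mode alignment criterion are our illustrative reading of the procedure.

```python
import numpy as np

def profile_volume(resid_obs, V_draws, n_grid, bins=50):
    """Pick the n whose bootstrap residual-distribution mode is closest to the observed residual.

    resid_obs : observed residual, e.g. sum((data - c_fit)**2) over the time grid
    V_draws   : LNA fluctuation draws V(t_i) for the observed species, shape (M, T)
    """
    best_n, best_gap = None, np.inf
    for n in n_grid:
        resid_star = np.sum((V_draws / np.sqrt(n)) ** 2, axis=1)  # residuals of L_n about the fit
        hist, edges = np.histogram(resid_star, bins=bins)
        mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
        if abs(mode - resid_obs) < best_gap:
            best_n, best_gap = n, abs(mode - resid_obs)
    return best_n
```

Used with, e.g., `n_grid = range(500, 9001, 500)`, this mirrors the grid search reported above.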

Figure 3. Bootstrap likelihood profiling. The LNA bootstrap ($M = 5000$) plots approximating, for several different $n$ values, the residual distributions for the model (18) fitted to the data in Table 2, and comparing them with the actual observed residuals (grey vertical lines). As seen from the plots, around the value $n = 5000$ the mode of the residual distribution best aligns with the observed value. Note that the alignment is very poor for $n = 1000$, which is the estimated volume used in [Citation16].


Figure 4. Confidence envelopes for different $n$ values. The ODE model (18) fitted to the trajectory data in Table 2, plotted along with the fits' 95% confidence envelopes for two different estimated system sizes. The left panel shows the bounds obtained in [Citation16] for the system size $n = 1000$, whereas the right panel shows the bounds obtained by taking $n = 5000$.


6. Summary and discussion

We have proposed here, and justified with appropriate large-volume limit results, some new practical ways of estimating the covariance structure and, more generally, improving the performance of LSEs in biochemical network models. We discussed three possible approaches, based on (a) the penalized least-squares criterion, (b) fast bootstrap algorithms for covariance estimation and (c) a bootstrap-based profiling method for the determination of the effective system size.

With regard to (a), we have shown that under a vanishing penalty on the parameter size the LSEs continue to be consistent in the limit, but may additionally exhibit much better precision in small-volume settings. Concerning (b), we have argued that the usage of bootstrap Monte-Carlo algorithms may allow one to avoid the often difficult calculation of the LSEs' asymptotic covariance structure. To improve efficiency, we proposed the usage of diffusion-based bootstrap methods and showed them to be consistent in the large-volume limit. Two types of diffusion approximation were considered: (i) the classical chemical Langevin equation approximation and (ii) the LNA method directly derived from the CLT for jump processes. The latter, although slightly more biased in our simulation studies, seems particularly attractive as it allows one to generate resamples which are independent of the system size (volume) $n$. With regard to (c), this aspect of the LNA-based bootstrap may be used to develop an efficient algorithm for estimating the effective volume of the biochemical model, in essence by analysing the size of the model fit fluctuations for different volume sizes. We have illustrated the practical applicability of this approach to reducing model uncertainty for partially observed, longitudinal RT-PCR data from the LINE 1 retrotranscription network.

Overall, our current work seems to support the conclusion that bootstrap methods, although not without limitations, such as the requirement to be able to simulate from the entire reaction network, offer a viable alternative to a direct asymptotic approximation based on the estimated covariance structure. One hopes that the results presented here will encourage further study of the general application of resampling to biochemical network inference problems, both in their theoretical and applied aspects, in particular in the setting, not considered here, where both $T, n \to \infty$.

Acknowledgments

Research supported in part by US NIH grant R01CA-152158 (DFL, GAR) and US NSF grant DMS-1106485 (GAR).

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. See Section 5.3.

2. This is done out of convenience, to eliminate superfluous parameters. Note that any such renormalization possibly affects the units of the system volume $n$.

3. In this particular case, given the large system size $n$, the penalty would have only a negligible effect.

References

  • H. Andersson and T. Britton, Stochastic Epidemic Models and Their Statistical Analysis, Vol. 4, Springer, New York, NY, 2000.
  • D.F. Anderson and T.G. Kurtz, Continuous Time Markov Chain Models for Chemical Reaction Networks, in Design and Analysis of Biomolecular Circuits, H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore, eds., Springer, New York, NY, 2011, 3–42.
  • K. Ball, T.G. Kurtz, L. Popovic, and G. Rempala, Asymptotic analysis of multiscale approximations to reaction networks, Ann. Appl. Probab. 16(4) (2006), pp. 1925–1961. doi: 10.1214/105051606000000420
  • M.W. Chevalier and H. El-Samad, A data-integrated method for analyzing stochastic biochemical networks, J. Chem. Phys. 135(21) (2011), 214110. doi: 10.1063/1.3664126
  • A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. Ser. B (Methodol.) 39(1) (1977), pp. 1–38.
  • B. Efron and G. Gong, A leisurely look at the bootstrap, the jackknife, and cross-validation, Amer. Statist. 37(1) (1983), pp. 36–48.
  • S.N. Ethier and T.G. Kurtz, Markov Processes: Characterization and Convergence, Vol. 282, Wiley, New York, NY, 2009.
  • D.T. Gillespie, A rigorous derivation of the chemical master equation, Phys. A: Statist. Mech. Appl. 188(1) (1992), pp. 404–425. doi: 10.1016/0378-4371(92)90283-V
  • A. Golightly and D.J. Wilkinson, Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo, Interface Focus 1(6) (2011), pp. 807–820. doi: 10.1098/rsfs.2011.0047
  • T. Hastie, R. Tibshirani, J. Friedman, T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning, Vol. 2, Springer, New York, NY, 2009.
  • M. Komorowski, M.J. Costa, D.A. Rand, and M. P.H. Stumpf, Sensitivity, robustness, and identifiability in stochastic chemical kinetics models, Proc. Natl. Acad. Sci. USA 108(21) (2011), pp. 8645–8650. doi: 10.1073/pnas.1015814108
  • M. Komorowski, B. Finkenstädt, C.V. Harper, and D.A. Rand, Bayesian inference of biochemical kinetic parameters using the linear noise approximation, BMC Bioinformatics 10(1) (2009), p. 343. doi: 10.1186/1471-2105-10-343
  • J.R. Masters, HeLa cells 50 years on: The good, the bad and the ugly, Nat. Rev. Cancer 2(4) (2002), pp. 315–319. doi: 10.1038/nrc775
  • G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Vol. 382, Wiley-Interscience, New York, NY, 2007.
  • O.D. Perez, P.O. Krutzik, and G.P. Nolan, Flow cytometric analysis of kinase signaling cascades, Methods Mol. Biol. 263(2004), pp. 67–94.
  • G.A. Rempala, Least squares estimation in stochastic biochemical networks, Bull. Math. Biol. 74(8) (2012), pp. 1938–1955. doi: 10.1007/s11538-012-9744-y
  • G.A. Rempala, K.S. Ramos, and T. Kalbfleisch, A stochastic model of gene transcription: an application to l1 retrotransposition events, J. Theoret. Biol. 242(1) (2006), pp. 101–116. doi: 10.1016/j.jtbi.2006.02.010
  • G.A. Rempala, K.S. Ramos, T. Kalbfleisch, and I. Teneng, Validation of a mathematical model of gene transcription in aggregated cellular systems: application to l1 retrotransposition, J. Comput. Biol. 14(3) (2007), pp. 339–349. doi: 10.1089/cmb.2006.0125
  • A. Roberts, C. Trapnell, J. Donaghey, J.L. Rinn, and L. Pachter, Improving RNA-Seq expression estimates by correcting for fragment bias, Genome Biol. 12(3) (2011), p. R22. doi: 10.1186/gb-2011-12-3-r22
  • A. Ruszczynski, Nonlinear Optimization, Vol. 13, Princeton University Press, Princeton, NJ, 2011.
  • L. Salwinski and D. Eisenberg, In silico simulation of biological network dynamics, Nat. Biotechnol. 22(8) (2004), pp. 1017–1019. doi: 10.1038/nbt991
  • J. Shao, Mathematical Statistics, 2nd ed., Springer, New York, NY, 2003.
  • R. Tibshirani, Regression shrinkage and selection via the lasso: a retrospective, J. R. Statist. Soc: Ser. B (Statist. Methodol.) 73(3) (2011), pp. 273–282. doi: 10.1111/j.1467-9868.2011.00771.x
  • N.G. Van Kampen, Stochastic Processes in Physics and Chemistry, Vol. 1, Elsevier North-Holland, Amsterdam, 1992.
  • M.D. Wang, M.J. Schnitzer, H. Yin, R. Landick, J. Gelles, and S.M. Block, Force and velocity measured for single molecules of RNA polymerase, Science 282(5390) (1998), pp. 902–907. doi: 10.1126/science.282.5390.902
  • D.A. Wheeler, M. Srinivasan, M. Egholm, Y. Shen, L. Chen, A. McGuire, W. He, Y.-J. Chen, V. Makhijani, G.T. Roth, X. Gomes, K. Tartaro, F. Niazi, C.L. Turcotte, G.P. Irzyk, J.R. Lupski, C. Chinault, X.-z. Song, Y. Liu, Y. Yuan, L. Nazareth, X. Qin, D.M. Muzny, M. Margulies, G.M. Weinstock, R.A. Gibbs, and J.M. Rothberg, The complete genome of an individual by massively parallel DNA sequencing, Nature 452(7189) (2008), pp. 872–876. doi: 10.1038/nature06884
  • D.J. Wilkinson, Stochastic modelling for quantitative description of heterogeneous biological systems, Nat. Rev. Genet. 10(2) (2009), pp. 122–133. doi: 10.1038/nrg2509
  • D.J. Wilkinson, Stochastic Modelling for Systems Biology, Vol. 44, CRC Press, Boca Raton, FL, 2011.
  • H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Statist. Soc: Ser. B (Statist. Methodol.) 67(2) (2005), pp. 301–320. doi: 10.1111/j.1467-9868.2005.00503.x

Appendix 1. Density-dependent Markov jump process

In the following, suppose $\{X_n(t) : t \ge 0\}$ is a continuous-time Markov process on the positive orthant of the $d$-dimensional lattice $\mathbb{Z}_{\ge 0}^d$ with jump intensities

$q^{(n)}_{x,\,x+\nu_k'-\nu_k} = n\lambda_k(n^{-1}x)$ for $x \in \mathbb{Z}_{\ge 0}^d$, $\nu_k' - \nu_k \in \mathbb{Z}^d$,

and transition probabilities

$P\big(X_n(t+h) = x + \nu_k' - \nu_k \,|\, X_n(t) = x\big) = hn\lambda_k(n^{-1}x) + o(h)$

and

$P\big(X_n(t+h) = x \,|\, X_n(t) = x\big) = 1 - hn\sum_k \lambda_k(n^{-1}x) + o(h).$

It is assumed that there is a finite number of vectors $\nu_k' - \nu_k \in \mathbb{Z}^d$ such that $\sup_x \lambda_k(x) > 0$ and that $\lambda_k(x)$ is a continuous function of $x$ for each $k$. Such processes are known as DDMJPs, since their transition rates depend on the density of the process (see, e.g. [Citation16]). Note that under the approximation (2) the process given in Equation (1) is a special case of a DDMJP. If $\{Y_k(t) : t \ge 0\}$ is a collection of independent unit Poisson processes, then

(A1) $X_n(t) = X_n(0) + \sum_k Y_k\left(n\int_0^t \lambda_k(n^{-1}X_n(s))\,ds\right)(\nu_k' - \nu_k).$

Define the centred Poisson process $\hat Y_k(nt) = Y_k(nt) - nt$ and $C_n(t) = X_n(t)/n$. The corresponding drift function is given by Equation (11), that is, $F(x) = \sum_k \lambda_k(x)(\nu_k' - \nu_k)$, where for notational simplicity we no longer emphasize the dependence of the trajectory or the rates on any parameters. Then, Equation (A1) becomes

(A2) $C_n(t) = C_n(0) + n^{-1}\sum_k \hat Y_k\left(n\int_0^t \lambda_k(C_n(s))\,ds\right)(\nu_k' - \nu_k) + \int_0^t F(C_n(s))\,ds.$

The following result, due to Kurtz [Citation1, Chapter 5], establishes the SLLN for $C_n(t)$.

Theorem A.1

(SLLN)

Let $\lim_n C_n(0) = c_0$ and suppose that for any compact $K \subset \mathbb{R}^d$ there exists a constant $M_K$ such that $|F(x) - F(y)| \le M_K|x - y|$ for $x, y \in K$. Then $\lim_n \sup_{s \le t} |C_n(s) - c(s)| = 0$ a.s., where $c(t)$ is the deterministic solution to the integral equation $c(t) = c_0 + \int_0^t F(c(s))\,ds$.

For inferential purposes, it is important to know that the scaled fluctuation of the DDMJP around the deterministic ODE solution, $V_n(t) = \sqrt{n}(C_n(t) - c(t))$, is asymptotically Gaussian. This fact follows from the following CLT due to Kurtz [Citation1, Chapter 5]. Let $\nabla F \in \mathbb{R}^{d\times d}$ be such that $[\nabla F]_{ij} = [\partial_j F_i]$.

Theorem A.2

(CLT)

Suppose $\nabla F$ is continuous and $\lim_n V_n(0) = v_0$ is non-random. Then $V_n = \sqrt{n}(C_n(t) - c(t)) \Rightarrow V$, where the process $V$ is given by

(A3) $V(t) = v_0 + \sum_k B_k\Big(\int_0^t \lambda_k(c(s))\,ds\Big)(\nu_k' - \nu_k) + \int_0^t \nabla F(c(s))V(s)\,ds,$

or, in a distributionally equivalent Itô diffusion form, by

(A4) $V(t) = v_0 + \sum_k \int_0^t \sqrt{\lambda_k(c(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \nabla F(c(s))V(s)\,ds.$

The process $V$ is a Gaussian vector process with covariance

(A5) $\mathrm{Cov}(V(t),V(r)) = \int_0^{r\wedge t} \Phi(t,s)\,G(c(s))\,\Phi(r,s)^\top\,ds,$

where $\Phi \in \mathbb{R}^{d\times d}$ satisfies $\partial_s\Phi(t,s) = -\Phi(t,s)\nabla F(c(s))$, $\Phi(s,s) = I$, and $G(x) = \sum_k (\nu_k' - \nu_k)(\nu_k' - \nu_k)^\top\lambda_k(x)$.

Appendix 2. Diffusion approximation and the bootstrap

Recall the Itô diffusion process $D_n$ defined by Equation (10). The following result indicates that $D_n$ has the same consistency property as the scaled pure jump process (A2) and may therefore be used as its approximation.

Lemma A.1

Suppose that $\lim_n D_n(0) = c(0)$, a constant, and that for each compact set $K \subset \mathbb{R}^d$ there is a Lipschitz constant $M_K$ such that $|F(x) - F(y)| \le M_K|x - y|$ for $x, y \in K$. Then $\lim_n \sup_{s \le t} |D_n(s) - c(s)| = 0$ almost surely, where $c(t)$ is the solution to $c(t) = c(0) + \int_0^t F(c(s))\,ds$ and $F(x) = \sum_k (\nu_k' - \nu_k)\lambda_k(x)$, $x \in \mathbb{R}^d$.

Proof.

By the definition of $D_n(t)$ given in Equation (10),

$|D_n(s) - c(s)| = \Big|D_n(0) - c(0) + \sum_k \frac{1}{\sqrt{n}}\int_0^s \sqrt{\lambda_k(D_n(r))}\,dB_k(r)\,(\nu_k' - \nu_k) + \int_0^s \big(F(D_n(r)) - F(c(r))\big)\,dr\Big|$
$\le |D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\Big|\int_0^s \sqrt{\lambda_k(D_n(z))}\,dB_k(z)\Big| + \int_0^s M_K|D_n(r) - c(r)|\,dr,$

implying

$|D_n(s) - c(s)| \le |D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\bar\lambda_k \sup_{0\le r_1\le r_2\le s}|B_k(r_1) - B_k(r_2)| + \int_0^s M_K|D_n(r) - c(r)|\,dr,$

where $\bar\lambda_k = \sup_{x\in K}\lambda_k(x)$. For almost all $\omega$, $B_k(\omega, t)$ is a continuous process and thus achieves its maximum and minimum on every compact set, implying that $\sup_{0\le r_1\le r_2\le s}|B_k(\omega,r_1) - B_k(\omega,r_2)|$ is bounded, except on a set of measure 0. (We suppress the inherent sample-space notation $\omega$, except when needed, as is standard, for brevity.) Then, by Gronwall's inequality (cf. [Citation1, Chapter 5]),

$|D_n(s) - c(s)| \le \Big(|D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\bar\lambda_k \sup_{0\le r_1\le r_2\le s}|B_k(r_1) - B_k(r_2)|\Big)e^{M_K s},$

yielding

$\sup_{s\le t}|D_n(s) - c(s)| \le \Big(|D_n(0) - c(0)| + \frac{1}{\sqrt{n}}\sum_k |\nu_k' - \nu_k|\,\bar\lambda_k \sup_{0\le r_1\le r_2\le t}|B_k(r_1) - B_k(r_2)|\Big)e^{M_K t}.$

As $n \to \infty$, the first term on the RHS converges almost surely to 0 by the lemma's hypothesis. The second term involves a finite weighted sum of expressions of the type $\sup_{0\le r_1\le r_2\le t}|B_k(\omega,r_1) - B_k(\omega,r_2)|$, which are bounded for almost all $\omega$, implying that $(1/\sqrt{n})\sup_{0\le r_1\le r_2\le t}|B_k(\omega,r_1) - B_k(\omega,r_2)|$ converges to 0 a.s.

The following result shows that the properly centred and scaled diffusion process converges in distribution to the same limit as the centred and scaled pure jump process.

Lemma A.2.

With the assumption that $c_n(0) := \sqrt{n}(D_n(0) - c(0)) \to c_0$, a constant, and those of Theorem A.2, the process $V_n(t) = \sqrt{n}(D_n(t) - c(t))$ converges in distribution to a process $V'(t)$ which is distributionally equivalent to $V(t)$ defined in Theorem A.2.

Proof.

By definition,

$V_n(t) = \sqrt{n}(D_n(t) - c(t)) = c_n(0) + \sum_k \int_0^t \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \nabla F(c(s))V_n(s)\,ds + \int_0^t O(|D_n(s) - c(s)|)V_n(s)\,ds.$

Define, as before,

$U_n(t) := \sum_k \int_0^t \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t O(|D_n(s) - c(s)|)V_n(s)\,ds = -c_n(0) + V_n(t) - \int_0^t \nabla F(c(s))V_n(s)\,ds.$

Since the process $D_n(s)$ is non-anticipating and converges in probability pointwise to $c(s)$ by Lemma A.1, the dominated convergence theorem (DCT) for stochastic integrals implies that $\lim_n U_n(t)$ can be computed by interchanging the limit with the first integral, and the classical DCT can be applied to the second. This gives

$U_n(t) \Rightarrow \sum_k \int_0^t \lim_n \sqrt{\lambda_k(D_n(s))}\,dB_k(s)\,(\nu_k' - \nu_k) + \int_0^t \lim_n O(|D_n(s) - c(s)|)V_n(s)\,ds = \sum_k \int_0^t \sqrt{\lambda_k(c(s))}\,dB_k(s)\,(\nu_k' - \nu_k) =: U(t).$

Again, defining $\Phi$ to be the matrix-function solution to $\partial_s\Phi(t,s) = -\Phi(t,s)\nabla F(c(s))$, $\Phi(s,s) = I$, we have

$\int_0^t \Phi(t,s)\,dU_n(s) = \int_0^t \Phi(t,s)\,dV_n(s) - \int_0^t \Phi(t,s)\nabla F(c(s))V_n(s)\,ds = \Phi(t,t)V_n(t) - \Phi(t,0)c_n(0) - \int_0^t \partial_s\Phi(t,s)V_n(s)\,ds - \int_0^t \Phi(t,s)\nabla F(c(s))V_n(s)\,ds = V_n(t) - \Phi(t,0)c_n(0).$

Thus $V_n(t) = \Phi(t,0)c_n(0) + \int_0^t \Phi(t,s)\,dU_n(s)$, and $U_n \Rightarrow U$ implies

$V_n = h(U_n) \Rightarrow h(U) = \Phi(t,0)c_0 + \int_0^t \Phi(t,s)\,dU(s) =: V',$

again by the continuous mapping theorem. This is clearly a Gaussian vector process by definition, with covariance function

$E\big(V'(t)V'(r)^\top\big) = E\Big[\int_0^t \Phi(t,s)\,dU(s)\Big(\int_0^r \Phi(r,s)\,dU(s)\Big)^\top\Big] = \lim_{|s_{i+1}-s_i|\to 0} E\Big[\sum_i \Phi(t,s_i)\big(U(s_{i+1}) - U(s_i)\big)\sum_j \big(U(s_{j+1}) - U(s_j)\big)^\top\Phi(r,s_j)^\top\Big].$

Writing the increments of $U$ in terms of the Brownian motions and using the independence of $B_k$ and $B_l$ for $k \ne l$, as well as the independence of the increments $B_k(s_{i+1}) - B_k(s_i)$ and $B_k(s_{j+1}) - B_k(s_j)$ for $i \ne j$, only the diagonal terms survive:

$E\big(V'(t)V'(r)^\top\big) = \lim_{|s_{i+1}-s_i|\to 0} \sum_i \Phi(t,s_i)\sum_k \lambda_k(c(s_i))(\nu_k' - \nu_k)(\nu_k' - \nu_k)^\top E\big(B_k(s_{i+1}) - B_k(s_i)\big)^2\,\Phi(r,s_i)^\top = \lim_{|s_{i+1}-s_i|\to 0} \sum_i \Phi(t,s_i)G(c(s_i))\Phi(r,s_i)^\top(s_{i+1} - s_i) = \int_0^{t\wedge r} \Phi(t,s)G(c(s))\Phi(r,s)^\top\,ds,$

and therefore $V'$ and $V$ have the same distribution.

The following observation regarding the LNA process will be useful at this point.

Remark A.1.

In view of the definition of the LNA process $L_n(t)$ given by Equation (12), it is easy to see that the results of Lemmas A.1 and A.2 hold true when $D_n(t)$ is replaced by $L_n(t)$.

With the above lemmas, we are finally in a position to prove the bootstrap consistency result of Theorem 4.1.

Proof of Theorem 4.1

Let $\Rightarrow^*$ indicate that the sequence of probability distributions associated with the conditional (bootstrap) space converges to the distribution of a multivariate normal random vector with covariance $\Sigma_{\beta_0}$. The notation a.s.($\omega$) and a.s.($\omega^*$) indicates that the property holds for almost every $\omega$ and almost every $\omega^*$, respectively. In view of the assumed differentiability of the penalty function $h$, any minimizing solution $\hat\beta^*$ satisfying

$\hat\beta^* = \operatorname{argmin}_{\beta\in\Theta} \sum_i |D^*_n(t_i) - c^\beta(t_i)|^2 + 2\gamma_n h(\beta)$

must also necessarily satisfy

$\sum_i \nabla c^{\hat\beta^*}(t_i)\big(D^*_n(t_i) - c^{\hat\beta^*}(t_i)\big) + \gamma_n\nabla h(\hat\beta^*) = 0.$

Expansion about $\hat\beta$ gives

$\sum_i \nabla c^{\hat\beta^*}(t_i)\sqrt{n}\big(D^*_n(t_i) - c^{\hat\beta}(t_i)\big) = \sum_i \nabla c^{\hat\beta^*}(t_i)\sqrt{n}\big(c^{\hat\beta^*}(t_i) - c^{\hat\beta}(t_i)\big) + \sqrt{n}\gamma_n\nabla h(\hat\beta^*)$
$= \sum_i \nabla c^{\hat\beta^*}(t_i)[\nabla c^{\hat\beta}(t_i)]^\top\sqrt{n}(\hat\beta^* - \hat\beta) + \sqrt{n}\gamma_n H(\hat\beta)(\hat\beta^* - \hat\beta) + O(|\hat\beta^* - \hat\beta|)\sqrt{n}(\hat\beta^* - \hat\beta),$

where $H(\beta)$ is the Hessian of the penalty term evaluated at $\beta$. Lemma A.1 gives $O(|\hat\beta^* - \hat\beta|) \to 0$ a.s.($\omega^*$), implying that $O(|\hat\beta^* - \hat\beta|)\sqrt{n}(\hat\beta^* - \hat\beta)$ converges to 0 in view of Lemma A.2, validating the removal in the limit of the higher order terms. Setting $B^*_{\hat\beta} = \sum_i \nabla c^{\hat\beta^*}(t_i)[\nabla c^{\hat\beta}(t_i)]^\top + (\gamma_n/2)H(\hat\beta)$, we arrive at

$\sqrt{n}(\hat\beta^* - \hat\beta) = \sum_i (B^*_{\hat\beta})^{-1}\nabla c^{\hat\beta^*}(t_i)\sqrt{n}\big(D^*_n(t_i) - c^{\hat\beta}(t_i)\big).$

By the consistency of the diffusion approximation (Lemma A.1), $\hat\beta^* \to \hat\beta$ a.s.($\omega^*$), implying that $\nabla c^{\hat\beta^*}(t_i) \to \nabla c^{\hat\beta}(t_i)$ and $(B^*_{\hat\beta})^{-1} \to B_{\hat\beta}^{-1}$ a.s.($\omega^*$). Since $\sqrt{n}(D^*_n(t) - c^{\hat\beta}(t))$ converges to the same distribution as the Gaussian vector process with mean 0 and covariance function

$\Sigma_{\hat\beta}(t,r) = \int_0^{t\wedge r} \Phi_{\hat\beta}(t,s)G(c^{\hat\beta}(s))\Phi_{\hat\beta}(r,s)^\top\,ds$, where $\partial_s\Phi_{\hat\beta}(t,s) = -\Phi_{\hat\beta}(t,s)\nabla F(c^{\hat\beta}(s))$, $\Phi_{\hat\beta}(s,s) = I$,

we have

(A6) $\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P(s^\top Z_n \le x)\big| \to 0$ a.s. for all $s \in \mathbb{R}^p$,

where $Z_n \sim \mathrm{MVN}(0, \Sigma_{\hat\beta})$ is the zero-centred multivariate normal with covariance

$\Sigma_{\hat\beta} = \mathrm{Var}\Big[\sum_i B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\sqrt{n}\big(D^*_n(t_i) - c^{\hat\beta}(t_i)\big)\Big] = \sum_i B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\Sigma_{\hat\beta}(t_i,t_i)\big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\big]^\top + \sum_{i<j}\Big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\Sigma_{\hat\beta}(t_i,t_j)\big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_j)\big]^\top + B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_j)\Sigma_{\hat\beta}(t_j,t_i)\big[B_{\hat\beta}^{-1}\nabla c^{\hat\beta}(t_i)\big]^\top\Big].$

By the consistency of the penalized LSE for the pure jump process $C_n(t)$, $\hat\beta \to \beta_0$ a.s.($\omega$), implying that $\nabla c^{\hat\beta}(t_i) \to \nabla c^{\beta_0}(t_i)$ and $B_{\hat\beta}^{-1} \to B_{\beta_0}^{-1}$ a.s.($\omega$). An appeal to Slutsky's theorem gives

(A7) $\mathrm{MVN}(0, \Sigma_{\hat\beta}) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0}).$

Asymptotic normality of the penalized LSE for the pure jump process implies

(A8) $\sqrt{n}(\hat\beta - \beta_0) \Rightarrow \mathrm{MVN}(0, \Sigma_{\beta_0}).$

Application of the Cramér-Wold device to Equations (A7) and (A8), along with Equation (A6), implies

$\sup_x \big|P^*_n\big(\sqrt{n}\,s^\top(\hat\beta^* - \hat\beta) \le x\big) - P\big(\sqrt{n}\,s^\top(\hat\beta - \beta_0) \le x\big)\big| \to 0$ a.s.,

which in turn implies $\sqrt{n}(\hat\beta^* - \hat\beta) \Rightarrow^* \mathrm{MVN}(0, \Sigma_{\beta_0})$.

Proof of Theorem 4.2

In view of Remark A.1, the proof of Theorem 4.1 may be virtually repeated with $L^*_n(t)$ in place of $D^*_n(t)$ to obtain the hypothesis.