Journal of Biological Dynamics Volume 9, 2015 - Issue 1

Open access

Original Articles

Information content in data sets for a nucleated-polymerization model

H.T. Banks (corresponding author), Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27695-8212, USA

Marie Doumic, Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, 4 place Jussieu, boîte courrier 187, 75252 Paris Cedex 05, France; INRIA Paris-Rocquencourt, MAMBA project-team, domaine de Voluceau, BP 105, 78153 Rocquencourt, France

Carola Kruse, Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, 4 place Jussieu, boîte courrier 187, 75252 Paris Cedex 05, France; INRIA Paris-Rocquencourt, MAMBA project-team, domaine de Voluceau, BP 105, 78153 Rocquencourt, France

Stephanie Prigent, Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, 4 place Jussieu, boîte courrier 187, 75252 Paris Cedex 05, France; INRIA Paris-Rocquencourt, MAMBA project-team, domaine de Voluceau, BP 105, 78153 Rocquencourt, France

Human Rezaei, Institut National de Recherche Agronomique, UR892, Virologie Immunologie Moléculaires, Jouy-en-Josas, France

Pages 172-197 | Received 30 Nov 2014, Accepted 04 May 2015, Published online: 05 Jun 2015

https://doi.org/10.1080/17513758.2015.1050465

Abstract

We illustrate the use of statistical tools (asymptotic theories of standard error quantification using appropriate statistical models, bootstrapping, and model comparison techniques) in addition to sensitivity analysis that may be employed to determine the information content in data sets. We do this in the context of recent models [S. Prigent, A. Ballesta, F. Charles, N. Lenuzza, P. Gabriel, L.M. Tine, H. Rezaei, and M. Doumic, An efficient kinetic model for assemblies of amyloid fibrils and its application to polyglutamine aggregation, PLoS ONE 7 (2012), e43273. doi:10.1371/journal.pone.0043273.] for nucleated polymerization in proteins, about which very little is known regarding the underlying mechanisms; thus, the methodology we develop here may be of great help to experimentalists. We conclude that the investigated data sets will support with reasonable levels of uncertainty only the estimation of the parameters related to the early steps of the aggregation process.

Keywords:

inverse problems
polyglutamine and aggregation modelling
information content
sensitivity
uncertainty quantification

Mathematics Subject Classification:

65M32
62P10
64B10
49Q12

1. Introduction

As mathematical models become more complex with multiple states, increasing numbers of parameters need to be estimated using experimental data. Thus there is a need for critical analysis in model validation related to the reliability of parameter estimates obtained in model fitting. A recent concrete example involves previous HIV models [Citation1,Citation6] with 15 or more parameters to be estimated. In [Citation8], using recently developed parameter selectivity tools [Citation7] based on parameter sensitivity-based scores, it was shown that a number of these parameters could not be estimated with any degree of reliability. Moreover, we found that quantifiable uncertainty varies among patients depending upon the number of treatment interruptions (perturbations of therapy). This leads to a fundamental question: How much information with respect to model validation can be expected in a given data set or collection of data sets?

Here we illustrate the use of other tools: asymptotic theories (as the data sample size increases without bound) of standard error (SE) quantification using appropriate statistical models, bootstrapping (repeated sampling of synthetic data similar to the original data), and statistical (ANOVA type) model comparison techniques – these are discussed in some detail in the recent monograph [Citation11] as well as in numerous statistical texts. Such techniques are employed in addition to sensitivity theory in order to determine the information content in data sets. We pursue this in the context of recent models [Citation25] for nucleated polymerization in proteins.

After briefly outlining the biological context of amyloid formation, we describe the model in Section 2. In Section 3, we investigate the statistical model to be used with our noisy data. This is a necessary step in order to use the correct error model in our generalized least-squares (GLS) minimization. This also reveals information on our experimental observation process. Once we have found parameters which allow a reasonable fit, we determine the confidence we may have in our estimation procedures. We do this in Section 4, using both the condition number of the covariance matrix and a sensitivity analysis. This reveals a smaller number of parameters (than those estimated in [Citation25]) to be reasonably sensitive to the data sets, whereas others do not really affect the quality of the fits to our data. To further support our sensitivity findings, we then apply a bootstrapping analysis in Section 5. We are led to four main parameters and compare their resulting errors with the asymptotic confidence intervals of Section 4. Finally, in Section 6, we carry out model comparison tests [Citation3,Citation5,Citation11] as used in [Citation10], and these lead us to select three parameters out of the nine original ones estimated in [Citation25] that can be reliably estimated.

1.1. Protein polymerization

It is now known that several neurodegenerative disorders, including Alzheimer's disease, Huntington's disease, and prion diseases (for example, mad cow disease), are related to aggregations of proteins presenting an abnormal folding. These protein aggregates are called amyloids and have become a focus of modelling efforts in recent years [Citation12,Citation25,Citation29–31]. One of the main challenges in this field is to understand the key aggregation mechanisms, both qualitatively and quantitatively. In order to test our methodology on a relatively simple case, we focus here on polyglutamine (PolyQ)-containing proteins. This was also the case study chosen to illustrate the fairly general ordinary differential equation (ODE)–partial differential equation (PDE) model proposed in [Citation25]; the reason for our choice is that, as shown in [Citation25], the polymerization mechanisms prove to be simpler for PolyQ aggregation than for other types of proteins (e.g. PrP) [Citation21]. To understand data sets from experiments carried out by Human Rezaei and his team at INRA (Virologie et Immunologie Moleculaires) [Citation25], we adapt the general model to this context. The data sets (DS1–DS4) of interest to us here are depicted in Figure 1 and record the evolution of normalized total polymerized mass in time.

Figure 1. The data sets of interest from [Citation9,Citation25]. The total polymerized mass is measured by Thioflavin T (ThT), which is one of the most common experimental tools for in vitro protein polymerization [Citation25,Citation29]. (available in colour online)

In [Citation25] and a subsequent effort in [Citation9], the authors sought to investigate several questions including (i) understanding the key polymerization mechanisms, (ii) how to numerically approximate the model, and (iii) how to select parameters and calibrate the model. Here we briefly summarize results related to (ii) and focus primarily on (iii).

2. The model

2.1. Original ODE model

We briefly outline the model that is the same as that of [Citation25]. Let $(V, V^{*}, c_{i})$ be the main variables of interest consisting, respectively, of concentrations of the normal proteins that we will call monomers (basic subunits that are repeated in a chainlike fashion), of the monomeric proteins presenting an abnormal configuration that we will call conformers, and of the i-polymers made of i aggregated abnormal proteins. The following comprise the fundamental dynamics modelled in [Citation25] which the interested reader may consult for further details:

Monomer–conformer exchange: $V \underset{k_{I}^{-}}{\overset{k_{I}^{+}}{\rightleftharpoons}} V^{*}$. This represents the spontaneous formation of an active form of the monomer, denoted here $V^{*}$, out of the initially present inert form denoted V. The inert form cannot react and form fibrils, whereas the active conformer may.
Nucleation: $\underbrace{V^{*} + V^{*} + \dots + V^{*}}_{i_{0}} \underset{k_{off}^{N}}{\overset{k_{on}^{N}}{\rightleftharpoons}} c_{i_{0}}$. This is the spontaneous formation of the smallest stable polymer, formed by the addition of a certain number $i_{0}$ of active conformers. This smallest stable polymer is called the nucleus.
Polymerization by conformer addition: $c_{i} + V^{*} \overset{k_{on}^{i}}{\longrightarrow} c_{i + 1}$. Once a nucleus is formed, its size grows progressively by addition of an active conformer.

Other reactions like fragmentation and coalescence are negligible for the case of PolyQ-containing proteins (see [Citation25] for experimental justification).

The law of mass action in the deterministic framework (see [Citation5,Citation27] and the numerous references therein) translates the reaction $A + B \underset{k^{-}}{\overset{k^{+}}{\rightleftharpoons}} A' + B'$ into the ODE $d [A] / d t = - k^{+} [A] [B] + k^{-} [A'] [B']$. Using these basic ideas we obtain the infinite system of ODEs studied in [Citation25]:

$\frac{d V}{d t} = - k_{I}^{+} V + k_{I}^{-} V^{*}, \qquad (1)$

$\frac{d V^{*}}{d t} = k_{I}^{+} V - k_{I}^{-} V^{*} + i_{0} k_{off}^{N} c_{i_{0}} - V^{*} \sum_{i \geq i_{0}} k_{on}^{i} c_{i}, \qquad (2)$

$\frac{d c_{i_{0}}}{d t} = k_{on}^{N} (V^{*})^{i_{0}} - k_{off}^{N} c_{i_{0}} - k_{on}^{i_{0}} c_{i_{0}} V^{*}, \qquad (3)$

$\frac{d c_{i}}{d t} = V^{*} (k_{on}^{i - 1} c_{i - 1} - k_{on}^{i} c_{i}), \quad i = i_{0} + 1, \dots \qquad (4)$

with initial conditions $V (0) = c_{0}, V^{*} (0) = 0, c_{i_{0}} (0) = c_{i} (0) = 0$ and the mass balance equation

$\frac{d}{d t} (V + V^{*} + \sum_{i = i_{0}}^{\infty} i c_{i}) = 0.$

The experiments of interest to us measure the a-dimensional total polymerized mass (Madim), which is given by $M (t) = \sum_{i \geq i_{0}} i c_{i} (t) .$
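To make the link between the reaction scheme and the observable $M(t)$ concrete, the following is a minimal sketch of a truncated version of the ODE system. It is not the solver used in [Citation25] or [Citation9]: the truncation size `n_max`, the constant elongation rate `k_on`, the RK4 integrator, and every numerical value are illustrative assumptions. The sketch recovers $V^{*}$ from the stated mass balance rather than integrating its own equation, so total mass is conserved by construction.

```python
def simulate_polymerized_mass(c0=1.0, i0=2, n_max=30, k_I_plus=1.0, k_I_minus=0.5,
                              k_on_N=2.0, k_off_N=0.1, k_on=1.0,
                              t_end=8.0, dt=5e-3):
    """Integrate a truncated system (sizes i0..n_max) with classical RK4.

    All parameter values are hypothetical. Returns (times, M values, final state),
    where M(t) = sum_i i * c_i(t) and state = [V, c_i0, ..., c_n_max].
    """
    sizes = list(range(i0, n_max + 1))

    def v_star(state):
        # Active conformers from the mass balance V + V* + sum_i i*c_i = c0.
        return c0 - state[0] - sum(i * ci for i, ci in zip(sizes, state[1:]))

    def rhs(state):
        V, c = state[0], state[1:]
        vs = v_star(state)
        dV = -k_I_plus * V + k_I_minus * vs
        dc = [0.0] * len(c)
        # Nucleus: formation, dissociation, and elongation out of size i0.
        dc[0] = k_on_N * vs ** i0 - k_off_N * c[0] - k_on * c[0] * vs
        for j in range(1, len(c)):
            # The largest retained size has no outflow (closed truncation).
            outflow = k_on * c[j] if j < len(c) - 1 else 0.0
            dc[j] = vs * (k_on * c[j - 1] - outflow)
        return [dV] + dc

    def shifted(state, slope, h):
        return [s + h * k for s, k in zip(state, slope)]

    state = [c0] + [0.0] * len(sizes)      # V(0) = c0, all polymers absent
    times, mass = [0.0], [0.0]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, dt / 2))
        k3 = rhs(shifted(state, k2, dt / 2))
        k4 = rhs(shifted(state, k3, dt))
        state = [s + dt / 6 * (a + 2 * b + 2 * c_ + d)
                 for s, a, b, c_, d in zip(state, k1, k2, k3, k4)]
        times.append(step * dt)
        mass.append(sum(i * ci for i, ci in zip(sizes, state[1:])))
    return times, mass, state


times, mass, final_state = simulate_polymerized_mass()
```

With these made-up rates the returned `mass` list traces a sigmoidal-looking curve bounded by `c0`; realistic fibril sizes would require a far larger `n_max`, which is one motivation for the continuous approximation discussed next.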

2.2. An approximate PDE system and the associated forward problem

Amyloid formations are characterized by very long polymers (a fibril may contain up to $10^{6}$ monomer units). A PDE version of the standard model, where a continuous variable x approximates the discrete sizes i, is thus a reasonable approximation for large amyloid polymers. However, for small polymer sizes this resulting continuum approximation does not work very well. Thus, we take a ‘hybrid approach’ of retaining the ODE for smaller sizes and using the PDE for larger ones [Citation9].

We define a small parameter $ε = 1 / i_{M}$ , and let $x_{i} = i ε$ with $i_{M} ≫ 1$ being the average polymer size defined by $i_{M} = \frac{\sum_{i \geq i_{0}} i c_{i}}{\sum c_{i}} .$

Then after definition of dimensionless quantities $c^{ε} (t, x) = \sum c_{i} \mathbf{1}_{[x_{i}, x_{i + 1}]},$ we may obtain a PDE to replace the infinite ODE system. Rigorous derivations of such continuous integro-PDE models may be found in [Citation22] for coagulation-fragmentation equations, in [Citation15] for the limit of the so-called Becker–Döring system (polymerization and depolymerization reactions are the only reactions considered) towards its continuous limit called the Lifshitz–Slyozov model, and in [Citation19] for the growth-fragmentation ‘Prion Model’. A formal derivation for a full model, also including nucleation, is carried out in [Citation25].

Let $N_{0} \in \mathbb{N}$. We then use the approximation

$\frac{d V}{d t} = - k_{I}^{+} V + k_{I}^{-} V^{*}, \qquad \frac{d V^{*}}{d t} = k_{I}^{+} V - k_{I}^{-} V^{*} + i_{0} k_{off}^{N} c_{i_{0}} - V^{*} \sum_{i \geq i_{0}} k_{on}^{i} c_{i}, \qquad (5)$

$\frac{d c_{i_{0}}}{d t} = k_{on}^{N} (V^{*})^{i_{0}} - k_{off}^{N} c_{i_{0}} - k_{on}^{i_{0}} c_{i_{0}} V^{*}, \qquad (6)$

$\frac{d c_{i}}{d t} = V^{*} (k_{on}^{i - 1} c_{i - 1} - k_{on}^{i} c_{i}), \quad i \leq N_{0}, \qquad (7)$

$\partial_{t} c^{ε} (x, t) = - V^{*} \partial_{x} (k_{on} (x) c^{ε} (x, t)), \quad x \geq N_{0}, \qquad (8)$

with initial conditions $V (0) = c_{0}, V^{*} (0) = 0, c_{i_{0}} (0) = c_{i} (0) = 0, c^{ε} (x, 0) = 0,$ and the boundary condition $c^{ε} (x = N_{0}, t) = c_{N_{0}} (t) .$ Here we have passed to the continuous representations for chain lengths larger than $i = N_{0}$.

Then an assumed mass balance equation becomes $\frac{d}{d t} (V + V^{*} + \sum_{i = i_{0}}^{N_{0}} i c_{i} + \int_{N_{0}}^{\infty} x c^{ε} (x) d x) = 0.$

In [Citation9], we considered requirements for a good discretization scheme including the following: (i) it should conserve the a-dimensional total polymerized mass (Madim), (ii) it should be fast, and most importantly, (iii) it should be accurate.

To ensure the conservation of mass, we replace the ODE for $V^{*}$ by the mass conservation equation and obtain

$\frac{d V}{d t} = - k_{I}^{+} V + k_{I}^{-} V^{*},$

$V^{*} = c_{0} - V - \sum_{i = i_{0}}^{N_{0}} i c_{i} - \int_{N_{0}}^{\infty} x c^{ε} d x,$

$\frac{d c_{i_{0}}}{d t} = k_{on}^{N} (V^{*})^{i_{0}} - k_{off}^{N} c_{i_{0}} - k_{on}^{i_{0}} c_{i_{0}} V^{*},$

$\frac{d c_{i}}{d t} = V^{*} (k_{on}^{i - 1} c_{i - 1} - k_{on}^{i} c_{i}), \quad i \leq N_{0}, \qquad (9)$

$\partial_{t} c^{ε} (x, t) = - V^{*} \partial_{x} (k_{on} (x) c^{ε} (x, t)), \quad x \geq N_{0}, \qquad (10)$

with initial and boundary conditions as before.

We developed methodology for forward solutions in [Citation9]. We first point out that the desired spatial computational domain is very large as determined by the maximum size of observed polymers, with range up to $10^{6}$ . The peak in the distribution is at the left side of the domain of interest; for larger polymer sizes, the distribution is almost linearly decreasing.

Based on these and other considerations discussed in [Citation9], the PDE was approximated by the Finite Volume Method (see [Citation23] for discussions of Upwind, Lax–Wendroff, and flux limiter methods) with an adaptive mesh, refined towards the smaller polymer sizes. Furthermore, we kept the ratio between the step size and the corresponding mesh element constant, that is, we used $Δ x_{i} / x_{i} = q < 1$ with $Δ x_{i} = x_{i} - x_{i - 1}$, so that $x_{i} = x_{i - 1} / (1 - q)$. This mesh is quasi-linear in the sense that $Δ x_{i - 1} / Δ x_{i} = 1 + O (q)$. The resulting Upwind and Lax–Wendroff schemes are then consistent on the progressive mesh [Citation23]. For further details on these schemes including examples demonstrating convergence properties, the interested reader may consult [Citation9].
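The progressive mesh and a single transport update can be sketched as follows. This is only an illustration under stated assumptions, not the scheme of [Citation9]: the function names, the first-order upwind flux, the reading $Δ x_{i} = x_{i} - x_{i-1}$, and the sample values `q = 0.05`, `x_start = 500` are all hypothetical choices. The `inflow` argument stands for the flux entering at $x = N_{0}$ (for instance $V^{*} k_{on}(N_{0}) c_{N_{0}}$).

```python
def progressive_mesh(x_start, x_end, q):
    """Nodes x_0 < x_1 < ... with (x_i - x_{i-1}) / x_i = q, covering [x_start, x_end]."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    if x_start <= 0.0:
        raise ValueError("x_start must be positive")
    nodes = [x_start]
    while nodes[-1] < x_end:
        nodes.append(nodes[-1] / (1.0 - q))   # x_i = x_{i-1} / (1 - q)
    return nodes


def upwind_step(c, nodes, k_on, v_star, inflow, dt):
    """One explicit first-order upwind finite-volume step for d_t c = -V* d_x(k_on c).

    c[j] is the cell average on [nodes[j], nodes[j + 1]]. Transport is towards
    larger sizes, so the flux through each interface is taken from the cell on
    its left; `inflow` is the flux through the left boundary.
    """
    flux = [inflow] + [v_star * k_on(nodes[j + 1]) * c[j] for j in range(len(c))]
    return [c[j] - dt * (flux[j + 1] - flux[j]) / (nodes[j + 1] - nodes[j])
            for j in range(len(c))]


nodes = progressive_mesh(500.0, 1.0e6, q=0.05)
widths = [b - a for a, b in zip(nodes, nodes[1:])]
```

Because the mesh is geometric, about 150 cells cover the range from 500 to $10^{6}$ in this example, with the finest cells at the small-size end. The explicit step is only stable when `dt` respects a CFL-type bound of the form `dt * v_star * k_on(x) <= width` in every cell.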

3. The inverse problem

A major question in formulating the model for use in inverse problem scenarios is how best to represent parametrically the polymerization parameters $k_{on}^{i}$ of Equation (3) and the function $k_{on}$ of Equation (4) for our application. We do this in a combined piecewise continuous formulation.

For this continuous polymerization function $k_{on} (x)$, we use the piecewise linear representation (Figure 2)

$k_{on} (x) = \begin{cases} k_{on}^{min} + x \frac{k_{on}^{max} - k_{on}^{min}}{x_{1} i_{max} - i_{0}}, & x \leq x_{1} i_{max}, \\ k_{on}^{max}, & x_{1} i_{max} \leq x \leq x_{2} i_{max}, \\ k_{on}^{max} - x \frac{k_{on}^{max}}{i_{max} (1 - x_{2})}, & x_{2} i_{max} \leq x \leq i_{max}, \\ 0, & x \geq i_{max} . \end{cases}$

In the numerical approximations, we chose $i_{0} = 2, N_{0} = 500$. The discrete polymerization parameters $k_{on}^{i}$, $i = i_{0}, \dots, N_{0}$, are then obtained as $k_{on}^{i} := k_{on} (x = i) .$
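A small sketch of such a five-parameter rate function is given below. It makes one assumption that should be stated loudly: the two ramps are anchored at $(i_{0}, k_{on}^{min})$ and $(i_{max}, 0)$ so that the function is continuous at the breakpoints, which is how we read the trapezoidal shape; the function name and all numerical values are hypothetical.

```python
def k_on(x, k_min, k_max, x1, x2, i_max, i0=2):
    """Trapezoidal polymerization rate: ramp up, plateau, ramp down, zero beyond i_max.

    x1 and x2 are fractions of i_max with 0 < x1 <= x2 < 1.
    """
    ramp_end, plateau_end = x1 * i_max, x2 * i_max
    if x >= i_max:
        return 0.0
    if x >= plateau_end:
        # Falling ramp, reaching zero at i_max (continuity is an assumption here).
        return k_max * (i_max - x) / (i_max - plateau_end)
    if x >= ramp_end:
        return k_max
    # Rising ramp anchored at (i0, k_min); this offset is an assumption of the sketch.
    return k_min + (x - i0) * (k_max - k_min) / (ramp_end - i0)


# Discrete rates k_on^i = k_on(x = i) for i = i0, ..., N0 (illustrative values only).
i0, N0 = 2, 500
params = dict(k_min=1.0, k_max=3.0, x1=0.2, x2=0.6, i_max=1000.0)
k_on_discrete = [k_on(float(i), i0=i0, **params) for i in range(i0, N0 + 1)]
```

The list `k_on_discrete` has one entry per retained discrete size, and the same function can be evaluated at the mesh nodes of the continuous part.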

Figure 2. Parametric representation for $k_{on}$ .

Following [Citation25], we chose to approximate $k_{on}$ by a function as depicted in Figure 2. (According to our discussions with S. Prigent, H. Rezaei and J. Torrent, other choices like a Gaussian bell curve are also possible, but as we will subsequently conclude, the presently available data will not support estimation of parameters in these representations.) Thus with this parametrization we have five more parameters $k_{on}^{min}, k_{on}^{max},$ the fractions $x_{1}, x_{2},$ and $i_{max}$ in addition to the four basic parameters $k_{I}^{+}, k_{I}^{-}, k_{on}^{N}, k_{off}^{N}$ to be estimated using our data sets.

Thus, we seek to estimate (with acceptable quantification of uncertainties) the nine parameters that fit the data best: $k_{I}^{+}, k_{I}^{-}, k_{off}^{N}, k_{on}^{N}$, and $k_{on} (x)$ (represented in the parametric form depicted above with the five additional unknowns $k_{on}^{min}, k_{on}^{max},$ $x_{1}, x_{2}, i_{max}$). To do this we need an efficient discretization method as discussed above for the forward problem as well as a correct assumption on the measurement errors in the inverse problem.

3.1. Estimation of parameters

We make some standard statistical assumptions (see [Citation5,Citation11,Citation17,Citation28]) underlying our inverse problem formulations.

Assume that there exists a true or nominal set of parameters $θ_{0} = (k_{I}^{-}, \dots, i_{max}) .$
Let $E_{i}$ be i.i.d. with $E (E_{i}) = 0$ and $cov (E_{i}, E_{i}) = σ_{0}^{2}$ , where $i = 1, \dots, n$ and n is the number of observations or data points in the given data set.

Denote as $\hat{θ}$ the estimated parameter for $θ_{0}$ . The inverse problem is based on statistical assumptions on the observation error in the data.

If we assume an absolute error data model, then data points are taken with equal importance. This is represented by observations (here $ϵ_{i}$ is a realization of the error process $E_{i}$)

$y_{i} = M (t_{i}, θ_{0}) + ϵ_{i} . \qquad (11)$

On the other hand, if one assumes some type of relative error data model, then the error is proportional in some sense to the measured polymerized mass. This can be represented by observations of the form

$y_{i} = M (t_{i}, θ_{0}) + M (t_{i}, θ_{0})^{γ} ϵ_{i}, \quad γ \in (0, 1] . \qquad (12)$

Absolute error models dictate that we use the ordinary least-squares (OLS) inverse problem formulation [Citation5,Citation11] given by

$\hat{θ} = \arg\min_{θ} \sum_{i = 1}^{n} (y_{i} - M (t_{i}, θ))^{2}, \qquad (13)$

while for relative error models one should use inverse problem formulations with the GLS cost functional

$\hat{θ} = \arg\min_{θ} \sum_{i = 1}^{n} {(\frac{y_{i} - M (t_{i}, θ)}{M (t_{i}, θ)^{γ}})}^{2}, \quad γ \in (0, 1] . \qquad (14)$
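The two cost functionals differ only in the weighting of each squared residual, which a short sketch makes explicit. The function name and the sample numbers are hypothetical; the model values are passed in as a plain list so that the sketch does not depend on any particular forward solver.

```python
def gls_cost(y, model_values, gamma=0.0):
    """Sum of squared residuals, each divided by model_value**gamma.

    gamma = 0 reproduces the OLS cost; gamma in (0, 1] gives the GLS cost
    associated with a relative error model. Model values must be positive.
    """
    return sum(((yi - mi) / mi ** gamma) ** 2 for yi, mi in zip(y, model_values))


# Illustrative numbers only: two observations and the corresponding model values.
observations = [1.0, 2.0]
model_values = [1.5, 1.0]
ols = gls_cost(observations, model_values, gamma=0.0)
gls = gls_cost(observations, model_values, gamma=1.0)
```

For $γ > 0$, residuals at times where the model value is below one are amplified and those where it is above one are damped, so the early part of a normalized aggregation curve carries relatively more weight than under OLS.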

3.1.1. The residual plots.

To select the most appropriate statistical model, we used residual plots (see [Citation5,Citation11] for more details) with residuals given by $r_{i} = \frac{y_{i} - M (t_{i}, \hat{θ})}{M (t_{i}, \hat{θ})^{γ}}, γ \in [0, 1] .$

To illustrate what we are seeking for our data sets, we first used simulated relative error data (simulated data for $γ = 1$), then carried out the inverse problems for both a relative error cost functional (i.e. $γ = 1$) and an OLS cost functional (i.e. $γ = 0$). We then plotted the corresponding residuals vs. time and also residuals vs. the model values. The first plots are related to the correctness of our assumption that the errors are independent and identically distributed (i.i.d.), whereas the second plots contain information as to the correctness of the form of our proposed statistical model (Figures 3 and 4).

Figure 3. Plots with simulated data: (a) correct cost function vs. time $(γ = 1)$ ; (b) incorrect cost function vs. time $(γ = 0)$ .

Figure 4. Plots with simulated data: (a) correct cost function vs. model $(γ = 1)$ ; (b) incorrect cost function vs. model $(γ = 0)$ .
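The residual comparison described above can be imitated with a few lines of code. This sketch takes a shortcut that should be kept in mind: it evaluates residuals at a known curve instead of at a fitted $\hat{θ}$, and the logistic curve is a hypothetical stand-in for $M(t)$, not the polymerization model. The noise level and seed are likewise arbitrary.

```python
import math
import random


def model_curve(t):
    """Hypothetical stand-in for M(t): a logistic curve rising from near 0 to 1."""
    return 1.0 / (1.0 + math.exp(-1.5 * (t - 3.0)))


def simulate_relative_error_data(times, sigma, seed=0):
    """Data y_i = M(t_i) + M(t_i) * eps_i with eps_i ~ N(0, sigma^2), i.e. gamma = 1."""
    rng = random.Random(seed)
    return [model_curve(t) * (1.0 + sigma * rng.gauss(0.0, 1.0)) for t in times]


def residuals(y, model_values, gamma):
    """Residuals r_i = (y_i - M_i) / M_i**gamma."""
    return [(yi - mi) / mi ** gamma for yi, mi in zip(y, model_values)]


times = [0.1 * k for k in range(1, 81)]
m = [model_curve(t) for t in times]
y = simulate_relative_error_data(times, sigma=0.05)
r_gls = residuals(y, m, gamma=1.0)   # matches how the data were generated
r_ols = residuals(y, m, gamma=0.0)   # mismatched weighting
```

By construction `r_ols[i]` equals `m[i] * r_gls[i]`, so plotting `r_ols` against `m` shows a fan that widens with the model value, whereas `r_gls` scatters in a band of roughly constant width. That fan shape is the signature of an incorrectly assumed absolute error model.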

3.2. Statistical models of noise

We next carried out similar inverse problems with data set (DS) 4 of our experimental data collection. We first used DS4 on the interval $t \in [0, 8]$. Based on some earlier calculations we also chose the nucleation index $i_{0} = 2$ for all our subsequent calculations. The residual plots given in Figures 5 and 6 suggest strongly that neither of our first assumed statistical models and corresponding cost functionals (absolute error with OLS, or relative error with $γ = 1$ and simple GLS) is correct.

Figure 5. (a) $M (t_{k})$ (Madim) with OLS; (b) residuals vs. model: OLS.

Figure 6. (a) $M (t_{k})$ with GLS, $γ = 1$ ; (b) residuals vs. model: GLS.

Based on these initial results and the speculation that early periods of the polymerization process may be somewhat stochastic in nature, we chose to subsequently use all the data sets on the intervals $[t_{0}, 8]$ where $t_{0}$ is the first time when $M (t_{0}) > 0.12$ (thus 12% of the a-dimensional total polymerized mass). Moreover, we decided to use other values of γ between 0 and 1 to test DS4. Setting $i_{0} = 2$, we focused on the question of the most appropriate values of γ to use in a GLS approach (again see [Citation11] for further motivation and details). We then obtained the results with DS4 depicted in Figure 7. Analysis of these residuals suggests that either $γ = 0.6$ or $γ = 0.7$ might be satisfactory for use in a GLS setting.

Figure 7. Residuals for DS4 using different values of γ.

Motivated by these results, we next investigated the inverse problems for each of the four experimental data sets with initial concentration $c_{0} = 200\ μ\mathrm{mol}$ and $i_{0} = 2$. We carried out the optimization over all data points with $M (t_{k}) \geq 0.12$ and used the GLS method with $γ = 0.6$. The resulting graphics depicted in Figure 8 again suggest that $γ = 0.6$ is a reasonable value to use in our subsequent analysis of the PolyQ data with regard to its information content for inverse problem estimation and parameter uncertainty quantification.

Figure 8. Residuals for the four experimental data sets using $γ = 0.6$ .

4. SEs and asymptotic analysis

4.1. SEs for parameters using GLS

We employed first the asymptotic theory (as $n \to \infty$ ) for parameter uncertainty summarized in [Citation5,Citation11,Citation17] and references therein. In the case of GLS, the associated SEs for the estimated parameters $\hat{θ} = (k_{I}^{+}, \dots, i_{max})$ (vector length $κ_{θ} = 9$ ) are given by the following construction (for details see Chaps 3.2.5 and 3.2.6 of [Citation11]).

We may define the SEs by the formula

${SE}_{k} = \sqrt{Σ_{k k} (\hat{θ})}, \quad k = 1, \dots, 9,$

where the covariance matrix Σ is given by

$Σ (\hat{θ}) = {\hat{σ}}^{2} (χ^{T} (\hat{θ}) W (\hat{θ}) χ (\hat{θ}))^{- 1} .$

Here

$χ = \frac{\partial M}{\partial θ} = (\frac{\partial M (t_{1}; \hat{θ})}{\partial θ}, \dots, \frac{\partial M (t_{n}; \hat{θ})}{\partial θ})$

is the sensitivity matrix of size $n \times κ_{θ} = n \times 9$ (n being the number of data points and $κ_{θ} = 9$ being the number of estimated parameters) and W is defined by

$W^{- 1} (\hat{θ}) = diag (M (t_{1}; \hat{θ})^{2 γ}, \dots, M (t_{n}; \hat{θ})^{2 γ}) .$

We use the approximation of the variance

$σ_{0}^{2} \approx \hat{σ} (\hat{θ})^{2} = \frac{1}{n - κ_{θ}} \sum_{i = 1}^{n} \frac{1}{M (t_{i}; \hat{θ})^{2 γ}} (M (t_{i}; \hat{θ}) - y_{i})^{2} .$
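As a worked illustration of these formulae, the sketch below evaluates the SEs for the special case of two parameters, where the inverse of $χ^{T} W χ$ has a closed form. The function name is hypothetical and the restriction to $κ_{θ} = 2$ is a simplification made only to keep the example free of linear-algebra libraries; the nine-parameter case needs a general matrix inverse.

```python
import math


def asymptotic_se_2param(chi, model_values, y, gamma):
    """Asymptotic standard errors for two estimated parameters.

    chi          -- list of rows (dM/dtheta_1, dM/dtheta_2) at each t_i
    model_values -- M(t_i; theta_hat), assumed positive
    y            -- observations y_i
    gamma        -- exponent of the error model (0 gives the OLS case)
    """
    n = len(y)
    w = [1.0 / mi ** (2.0 * gamma) for mi in model_values]     # diagonal of W
    # Entries of the symmetric 2x2 matrix F = chi^T W chi.
    f11 = sum(wi * row[0] * row[0] for wi, row in zip(w, chi))
    f12 = sum(wi * row[0] * row[1] for wi, row in zip(w, chi))
    f22 = sum(wi * row[1] * row[1] for wi, row in zip(w, chi))
    det = f11 * f22 - f12 * f12
    if det <= 0.0:
        raise ValueError("F is singular: the sensitivities are linearly dependent")
    # Variance estimate with n - 2 degrees of freedom.
    sigma2 = sum(wi * (mi - yi) ** 2
                 for wi, mi, yi in zip(w, model_values, y)) / (n - 2)
    # The diagonal of F^{-1} is (f22, f11) / det.
    return math.sqrt(sigma2 * f22 / det), math.sqrt(sigma2 * f11 / det)
```

A quick sanity check is the straight-line model $M(t; a, b) = a + b t$ with $γ = 0$: the sensitivity rows are $(1, t_{i})$ and the function reproduces the textbook OLS standard errors of intercept and slope. The `det <= 0` guard is the two-parameter analogue of the invertibility requirement discussed next.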

To obtain a finite SE using asymptotic theory, the $κ_{θ} \times κ_{θ} = 9 \times 9$ matrix $F = χ^{T} (\hat{θ}) W (\hat{θ}) χ (\hat{θ})$ thus must be invertible. In the above problem, we do indeed obtain a good fit of the curve and good residuals (for the sake of brevity, not depicted here). However, we also found that the condition number of the matrix $F = χ^{T} (\hat{θ}) W (\hat{θ}) χ (\hat{θ})$ is $κ = 10^{24}$ . Looking more closely at the matrix F reveals a near linear dependence between certain rows, hence the large condition number. We thus quickly reach the following conclusions:

We obtain a set of parameters for which the model fits well, but we cannot have any reasonable confidence in them using the asymptotic theories from statistics (e.g. see the references given above).
We suspect that it may not be possible to obtain sufficient information from our data set curves to estimate all nine parameters with a high degree of confidence. This is based on our calculations with the corresponding Fisher matrices as well as our prior knowledge in that the graphs depicted in Figure 1 are very similar to Logistic or Gompertz curves. These curves can be quite well fit with parameterized models with only two or three carefully chosen parameters.
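The connection between nearly dependent sensitivities and a huge condition number can be seen already with two parameters. In the sketch below the sensitivity columns are invented for illustration (they are not computed from the polymerization model), and the helper names are hypothetical; the second column is deliberately built as an almost exact multiple of the first.

```python
import math


def fisher_entries(col_a, col_b):
    """Entries (f11, f12, f22) of chi^T chi for a two-column sensitivity matrix."""
    return (sum(a * a for a in col_a),
            sum(a * b for a, b in zip(col_a, col_b)),
            sum(b * b for b in col_b))


def condition_number_2x2(f11, f12, f22):
    """Ratio of largest to smallest eigenvalue of a symmetric positive definite 2x2 matrix."""
    trace, det = f11 + f22, f11 * f22 - f12 * f12
    lam_max = 0.5 * (trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0)))
    lam_min = det / lam_max          # avoids cancellation in trace - sqrt(...)
    return lam_max / lam_min


times = [0.5 * k for k in range(1, 17)]
s1 = [math.exp(-t) for t in times]
s2_dependent = [2.0 * v + 1e-6 * t for v, t in zip(s1, times)]   # almost 2 * s1
s2_distinct = [t * math.exp(-t) for t in times]

cond_dependent = condition_number_2x2(*fisher_entries(s1, s2_dependent))
cond_distinct = condition_number_2x2(*fisher_entries(s1, s2_distinct))
```

With the nearly proportional pair the condition number is many orders of magnitude larger than with the clearly distinct pair. In that situation the data cannot separate the two parameters, and the inverse in the covariance formula becomes numerically meaningless.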

To assist in initial understanding of these issues, we consider components of the associated sensitivity matrices $χ = \partial M / \partial θ$ .

4.2. Sensitivity analysis

For the sensitivity analysis, we follow [Citation5,Citation11] and carry out all computations using the differential system of sensitivity equations as detailed in those references. Hereafter all our analysis will be carried out using DS4 and the best estimate $\hat{θ}$ obtained for the latter. We find that the model is sensitive mainly to four parameters: $k_{I}^{+}, k_{I}^{-}, k_{on}^{N}, k_{off}^{N}$. The sensitivities for the remaining parameters are on an order of magnitude of $10^{- 6}$ or less. The model also shows some sensitivity with respect to $x_{1}$. However, the parameter $x_{1}$ appears in the model only as the factor $x_{1} i_{max}$. The sensitivities depicted in the following use $\hat{θ}$ for the nine best fit GLS parameters, that is, $\hat{θ}$ for $κ_{θ} = 9$. We note that since we use the a-dimensional quantity M in the cost functionals (OLS or GLS), it is the sensitivity of this quantity with respect to the parameters θ, rather than any relative sensitivities, that will determine changes in the cost functionals to be minimized with respect to changes in the parameters (Figures 9–13).

Figure 9. (a) Sensitivity w.r.t. $k_{I}^{-}$ ; (b) sensitivity w.r.t. $k_{I}^{+}$ .

Figure 10. (a) Sensitivity w.r.t. $k_{on}^{N}$ ; (b) sensitivity w.r.t. $k_{off}^{N}$ .

Figure 11. (a) Sensitivity w.r.t. $k_{on}^{min}$ ; (b) sensitivity w.r.t. $k_{on}^{max}$ .

Figure 12. (a) Sensitivity w.r.t. $x_{1}$ ; (b) sensitivity w.r.t. $x_{2}$ .

Figure 13. (a) Sensitivity w.r.t. $i_{max}$ ; (b) sensitivity w.r.t. $x_{11} = i_{max} x_{1}$ .
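The sensitivities above were obtained from the sensitivity equations. As an independent cross-check, one can approximate the same quantities with central finite differences, a different and simpler technique. The sketch below does this for a hypothetical two-parameter toy curve (not the polymerization model); the function names and the relative step size are assumptions.

```python
import math


def finite_difference_sensitivities(model, theta, times, rel_step=1e-6):
    """Approximate chi[i][k] = dM(t_i; theta) / d theta_k by central differences."""
    chi = [[0.0] * len(theta) for _ in times]
    for k, value in enumerate(theta):
        h = rel_step * max(abs(value), 1.0)
        up = list(theta)
        up[k] = value + h
        down = list(theta)
        down[k] = value - h
        for i, t in enumerate(times):
            chi[i][k] = (model(t, up) - model(t, down)) / (2.0 * h)
    return chi


def toy_model(t, theta):
    """Hypothetical saturating curve; a stand-in, not the polymerization model."""
    rate, plateau = theta
    return plateau * (1.0 - math.exp(-rate * t))


chi = finite_difference_sensitivities(toy_model, [0.5, 1.0], times=[1.0, 2.0, 4.0])
```

Each column of `chi` corresponds to one parameter. A column whose entries are uniformly tiny relative to the others signals a parameter that the observable barely depends on, which is the situation reported here for the parameters of $k_{on}$.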

5. Sensitivity motivated inverse problems

Based on the sensitivity findings depicted above, we investigated a series of inverse problems in which we attempted to estimate an increasing number of parameters beginning first with the fundamental parameters $k_{I}^{+}$ and $k_{I}^{-}$ . In each of these inverse problems, we attempted to ascertain uncertainty bounds for the estimated parameters using both the asymptotic theory described above and a GLS version of bootstrapping [Citation13,Citation14,Citation16,Citation18,Citation20].

A quick outline of the appropriate bootstrapping algorithm is given next.

5.1. Bootstrapping algorithm: nonconstant variance data

We suppose now that we are given experimental data $(t_{1}, y_{1}), \dots, (t_{n}, y_{n})$ from the underlying observation process

$Y_{i} = M (t_{i}; θ_{0}) + M (t_{i}; θ_{0})^{γ} {\tilde{E}}_{i}, \qquad (15)$

where $i = 1, \dots, n$ and the ${\tilde{E}}_{i}$ are i.i.d. with mean zero and constant variance $σ_{0}^{2}$. Then we see that $E (Y_{i}) = M (t_{i}; θ_{0})$ and $Var (Y_{i}) = σ_{0}^{2} M^{2 γ} (t_{i}; θ_{0})$, with associated corresponding realizations of $Y_{i}$ given by $y_{i} = M (t_{i}; θ_{0}) + M (t_{i}; θ_{0})^{γ} {\tilde{ϵ}}_{i} .$

A standard algorithm can be used to compute the corresponding bootstrapping estimate ${\hat{θ}}_{boot}$ of $θ_{0}$ and its empirical distribution. We treat the general case for nonlinear dependence of the model output on the parameters θ. The algorithm is given as follows.

1. First obtain the estimate ${\hat{θ}}^{0}$ from the entire sample $\{y_{i}\}$ using the GLS given in Equation (14) with $γ = 1$. An estimate ${\hat{θ}}_{boot}$ can be solved for iteratively as follows.
2. Define the nonconstant variance standardized residuals ${\bar{s}}_{i} = \frac{y_{i} - M (t_{i}; {\hat{θ}}^{0})}{M (t_{i}; {\hat{θ}}^{0})^{γ}}, i = 1, 2, \dots, n .$ Set $m = 0$.
3. Create a bootstrapping sample of size n using random sampling with replacement from the data (realizations) $\{{\bar{s}}_{1}, \dots, {\bar{s}}_{n}\}$ to form a bootstrapping sample $\{s_{1}^{m}, \dots, s_{n}^{m}\}$.
4. Create bootstrapping sample points $y_{i}^{m} = M (t_{i}; {\hat{θ}}^{0}) + M (t_{i}; {\hat{θ}}^{0})^{γ} s_{i}^{m},$ where $i = 1, \dots, n$.
5. Obtain a new estimate ${\hat{θ}}^{m + 1}$ from the bootstrapping sample $\{y_{i}^{m}\}$ using GLS.
6. Set $m = m + 1$ and repeat steps 3–5 until $m \geq M$ where M is large (e.g. $M = 1000$).

We then calculate the mean, SE, and confidence intervals using the formulae

${\hat{θ}}_{boot} = \frac{1}{M} \sum_{m = 1}^{M} {\hat{θ}}^{m},$

$Var (θ_{boot}) = \frac{1}{M - 1} \sum_{m = 1}^{M} ({\hat{θ}}^{m} - {\hat{θ}}_{boot})^{T} ({\hat{θ}}^{m} - {\hat{θ}}_{boot}),$

${SE}_{k} ({\hat{θ}}_{boot}) = \sqrt{Var (θ_{boot})_{k k}}, \qquad (16)$

where $θ_{boot}$ denotes the bootstrapping estimator.
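The algorithm and the summary formulae can be sketched end to end for a deliberately simple case. Everything specific here is an assumption made for illustration: the one-parameter model $M(t; θ) = θ g(t)$ with a known logistic shape $g$, the golden-section search used as the GLS optimizer, the use of a single γ in every step, the synthetic data, and the reduced number of bootstrap runs (200 rather than 1000) chosen to keep the example fast.

```python
import math
import random


def gls_fit_scale(g, y, gamma, lo=0.1, hi=10.0, iters=60):
    """Golden-section search for theta minimizing the GLS cost of M_i = theta * g_i."""
    def cost(theta):
        return sum(((yi - theta * gi) / (theta * gi) ** gamma) ** 2
                   for gi, yi in zip(g, y))
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def bootstrap_gls(g, y, gamma, n_boot=200, seed=1):
    """Residual bootstrap for nonconstant variance data; returns (theta0, mean, SE)."""
    rng = random.Random(seed)
    theta0 = gls_fit_scale(g, y, gamma)                              # step 1
    fitted = [theta0 * gi for gi in g]
    s_bar = [(yi - mi) / mi ** gamma for yi, mi in zip(y, fitted)]   # step 2
    estimates = []
    for _ in range(n_boot):                                          # steps 3-6
        s = [rng.choice(s_bar) for _ in s_bar]                       # resample with replacement
        y_m = [mi + mi ** gamma * si for mi, si in zip(fitted, s)]
        estimates.append(gls_fit_scale(g, y_m, gamma))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return theta0, mean, math.sqrt(var)


# Synthetic data from the assumed observation process with hypothetical values.
data_rng = random.Random(0)
times = [0.2 * k for k in range(1, 41)]
g = [1.0 / (1.0 + math.exp(-1.5 * (t - 3.0))) for t in times]
theta_true, gamma, sigma = 2.0, 0.6, 0.03
y = [theta_true * gi + (theta_true * gi) ** gamma * sigma * data_rng.gauss(0.0, 1.0)
     for gi in g]
theta0, boot_mean, boot_se = bootstrap_gls(g, y, gamma)
```

For a vector of parameters the scalar variance in the last lines becomes the sample covariance matrix of the estimates, and the SEs are the square roots of its diagonal. Percentile confidence intervals can be read directly from the sorted list of bootstrap estimates.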

5.2. Estimation of two parameters

We first carried out estimation for the two parameters $k_{I}^{+}$ and $k_{I}^{-}$, using the GLS formulation with $γ = 0.6$. We fixed globally (based on previous estimations with DS4) the parameter values

[Table: globally fixed parameter values; not reproduced in this version.]

and used the initial guesses for the parameters given by

[Table: initial guesses for $k_{I}^{+}$ and $k_{I}^{-}$; not reproduced in this version.]

We then used the bootstrapping algorithm and obtained the following means and SEs for $M = 1000$ which, as reported in the following, compare quite well with the asymptotic theory estimates. The corresponding distributions are shown in Figures 14 and 15.

Figure 14. Two parameters estimation ( $k_{I}^{+}$ , $k_{I}^{-}$ ). Bootstrapping distribution for $k_{I}^{+}$ . We use GLS and $M = 1000$ runs.

Figure 15. Two parameters estimation ( $k_{I}^{+}$ , $k_{I}^{-}$ ). Bootstrapping distribution for $k_{I}^{-}$ . We use GLS and $M = 1000$ runs.

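[Table: bootstrapping means and SEs for $k_{I}^{+}$ and $k_{I}^{-}$ compared with the asymptotic theory estimates; not reproduced in this version.]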

5.3. GLS estimation of three parameters

We tried next to estimate three parameters using the GLS formulation with $γ = 0.6$ . Once again we fixed all the parameters describing the domain and the polymerization function $k_{on}$ and also fixed either $k_{off}^{N}$ or $k_{on}^{N}$ in the corresponding inverse problems.

5.4. GLS estimation for $k_{I}^{+}, k_{I}^{-},$ and $k_{on}^{N}$

We fixed values as follows:

[Table: fixed parameter values; not reproduced in this version.]

We used as initial parameter values:

[Table: initial parameter values; not reproduced in this version.]

We obtained the estimated parameters together with the corresponding SEs, variances, and the condition numbers κ of the corresponding sensitivity matrices for the four data sets as reported in the following table. The 95% confidence results based on the asymptotic theory are also depicted for DS4 in Figure 16.

[Table: estimated parameters, SEs, variances, and condition numbers κ for the four data sets; not reproduced in this version.]

To compare these asymptotic results with bootstrapping, we carried out bootstrapping with DS4 for the estimation of $k_{I}^{+}$ , $k_{I}^{-}$ , and $k_{on}^{N}$ with the same initial values as above. We then obtained the following means and SE for a run with $M = 1000$ , in comparison to the asymptotic theory.

Figure 16. Confidence intervals.

Table


Of particular interest are the values obtained for $k_{on}^{N}$ and the corresponding bootstrapping SEs, which are extremely small. It should be noted that the sensitivity of the model output to $k_{on}^{N}$ is also very small. Thus one might conjecture that the iterations in the bootstrapping algorithm do not change the values of $k_{on}^{N}$ very much, and hence one observes the extremely small SEs produced for the bootstrapping estimates (Figures 17–19).

Figure 17. Estimation for $k_{I}^{+}$ , $k_{I}^{-},$ and $k_{on}^{N}$ : bootstrapping distribution for $k_{I}^{-}$ for GLS and $M = 1000$ runs.

Figure 18. Estimation for $k_{I}^{+}$ , $k_{I}^{-}$ , and $k_{on}^{N}$ : bootstrapping distribution for $k_{I}^{+}$ for GLS and $M = 1000$ runs.

Figure 19. Estimation for $k_{I}^{+}$ , $k_{I}^{-}$ , and $k_{on}^{N}$ : bootstrapping distribution for $k_{on}^{N}$ for GLS and $M = 1000$ runs.

5.5. GLS estimation for $k_{I}^{+}, k_{I}^{-},$ and $k_{off}^{N}$

In another test, we fixed $k_{on}^{N}$ and instead estimated $k_{off}^{N}$ (along with $k_{I}^{+}$ and $k_{I}^{-}$ ). We used the fixed values:

Table


and the initial guesses for the parameters to be estimated given by

Table


We obtained the estimated parameters and corresponding SE.

Table


Also in this case, we carried out bootstrapping for DS4. The bootstrapping distributions for $k_{I}^{+}$ , $k_{I}^{-}$ , and $k_{off}^{N}$ are found in Figures 20–22. We then obtained the following means and SEs for a run with $M = 1000$ in comparison to the asymptotic theory.

Figure 20. Three parameters estimation ( $k_{I}^{+}$ , $k_{I}^{-},$ and $k_{off}^{N}$ ): bootstrapping distribution for $k_{I}^{+}$ . We used GLS and $M = 1000$ runs.

Figure 21. Three parameters estimation ( $k_{I}^{+}$ , $k_{I}^{-},$ and $k_{off}^{N}$ ): bootstrapping distribution for $k_{I}^{-}$ . We used GLS and $M = 1000$ runs.

Figure 22. Three parameters estimation ( $k_{I}^{+}$ , $k_{I}^{-}$ and $k_{off}^{N}$ ): bootstrapping distribution for $k_{off}^{N}$ . We used GLS and $M = 1000$ runs.

Table


5.6. Estimation of four main parameters

Following the sensitivity analysis detailed above, we tried to estimate a combination of the parameters $k_{I}^{+}, k_{I}^{-}, k_{on}^{N}, k_{off}^{N}$ for the parameter set with $κ_{θ} = 4$ .

Parameters were fixed as follows from the original nine parameter fit:

Table


We obtained the following results for the estimation of the four parameters using DS1–DS4. For all four data sets, the condition number κ of the Fisher information matrix is too large for the matrix to be inverted reliably. This, along with the sensitivity results above, strongly suggests that the data sets do not contain sufficient information to estimate four or more parameters with any degree of certainty attached to the estimates.

Table

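To illustrate why a weakly sensitive parameter produces an ill-conditioned information matrix, the following sketch builds a two-parameter Fisher information matrix from two sensitivity columns and computes its condition number in closed form. It is only an illustration under simplifying assumptions: two parameters, constant variance `sigma2` (or sensitivities that have already been weighted), and made-up sensitivity values. The function names are hypothetical and this is not the authors' computation.

```python
import math


def fisher_information_2x2(s1, s2, sigma2=1.0):
    """Entries (a, b, c) of F = (1/sigma2) * chi^T chi = [[a, b], [b, c]].

    s1 and s2 are the columns of the sensitivity matrix chi, i.e. the
    derivatives of the model output with respect to each parameter,
    evaluated at the observation times.
    """
    a = sum(x * x for x in s1) / sigma2
    b = sum(x * y for x, y in zip(s1, s2)) / sigma2
    c = sum(y * y for y in s2) / sigma2
    return a, b, c


def condition_number_2x2(a, b, c):
    """Ratio of largest to smallest eigenvalue of [[a, b], [b, c]]."""
    half_trace = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_max = half_trace + radius
    det = a * c - b * b
    if det <= 0.0:
        # Numerically singular: the matrix cannot be inverted.
        return math.inf
    # Using det / lam_max avoids cancellation in half_trace - radius.
    lam_min = det / lam_max
    return lam_max / lam_min


# Made-up sensitivities: the second parameter barely affects the output.
sens_param_1 = [0.5, 1.0, 1.5, 2.0]
sens_param_2 = [2e-4, 1e-4, 3e-4, 1e-4]

kappa = condition_number_2x2(*fisher_information_2x2(sens_param_1, sens_param_2))
print(f"condition number = {kappa:.3e}")
```

For more than two parameters, a numerical linear algebra routine would replace the closed-form eigenvalues; the qualitative point is the same.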

6. Model comparison tests

A type of residual sum of squares (RSS)-based model selection criterion [Citation3,Citation5,Citation11] can be used as a tool for model comparison for certain classes of models. In particular, this is true for models such as those given in [Citation10] in which potentially extraneous mechanisms can be eliminated from the model by a simple restriction on the underlying parameter space while the form of the mathematical model remains unchanged. In other words, this methodology can be used to compare two nested mathematical models where the parameter set $Ω_{θ}^{H}$ (this notation is defined explicitly in Section 6.1) for the restricted model can be identified as a linearly restricted subset of the admissible parameter set $Ω_{θ}$ of the unrestricted model. Indeed, the RSS-based model selection criterion is a useful tool to determine whether or not certain terms in the mathematical models are important in describing the given experimental data.

6.1. Ordinary least squares

We now turn to the statistical model (11), $y_{i} = M (t_{i}, θ_{0}) + ϵ_{i}$ , where the measurement errors are assumed to be independent and identically distributed with zero mean and constant variance $σ_{0}^{2}$ . In addition, we assume that there exists $θ_{0}$ such that the statistical model $Y_{j} = M (t_{j}; θ_{0}) + E_{j}, j = 1, 2, \dots, n,$ (17) correctly describes the observation process. In other words, Equation (17) is the true model, and $θ_{0}$ is the true value of the mathematical model parameter θ.

With our assumption on measurement errors, the mathematical model parameter θ can be estimated by using the OLS method; that is, the OLS estimator of θ is obtained by solving $θ^{n} = \arg\min_{θ \in Ω_{θ}} J^{n} (θ; Y) .$ Here $Y = (Y_{1}, Y_{2}, \dots, Y_{n})^{T}$ , and the cost function $J^{n}$ is defined as $J^{n} (θ; Y) = \frac{1}{n} \sum_{k = 1}^{n} (Y_{k} - M (t_{k}; θ))^{2} .$ The corresponding realization ${\hat{θ}}^{n}$ of $θ^{n}$ is obtained by solving ${\hat{θ}}^{n} = \arg\min_{θ \in Ω_{θ}} J^{n} (θ; y),$ where $y$ is a realization of $Y$ (i.e. $y = (y_{1}, y_{2}, \dots, y_{n})^{T}$ ).

As alluded to in the introduction, we might also consider a restricted version of the mathematical model in which the unknown true parameter is assumed to lie in a subset $Ω_{θ}^{H} \subset Ω_{θ}$ of the admissible parameter space. We assume this restriction can be written as a linear constraint, $H θ_{0} = h$ , where $H \in \mathbb{R}^{κ_{r} \times κ_{q}}$ is a matrix having rank $κ_{r}$ (i.e. $κ_{r}$ is the number of constraints imposed), and $h$ is a known vector. Thus the restricted parameter space is $Ω_{θ}^{H} = \{θ \in Ω_{θ} : H θ = h\} .$ Then the null and alternative hypotheses are $\begin{aligned} H_{0} &: θ_{0} \in Ω_{θ}^{H}, \\ H_{A} &: θ_{0} \notin Ω_{θ}^{H} . \end{aligned}$ We may define the restricted parameter estimator as $θ^{n, H} = \arg\min_{θ \in Ω_{θ}^{H}} J^{n} (θ; Y),$ and the corresponding realization is denoted by ${\hat{θ}}^{n, H}$ . Since $Ω_{θ}^{H} \subset Ω_{θ}$ , it is clear that $J^{n} ({\hat{θ}}^{n}; y) \leq J^{n} ({\hat{θ}}^{n, H}; y) .$ This fact forms the basis for a model selection criterion based upon the residual sum of squares.
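As a worked illustration of the linear constraint, consider a hypothetical case with three parameters in which the restricted model holds $k_{off}^{N}$ at a known value $c$ (the symbol $c$ and this particular choice of parameter vector are assumptions made only for the example). One constraint is imposed, so $κ_{r} = 1$:

```latex
\theta = \begin{pmatrix} k_{I}^{+} \\ k_{I}^{-} \\ k_{off}^{N} \end{pmatrix},
\qquad
H = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix},
\qquad
h = c,
\qquad
\Omega_{\theta}^{H} = \{ \theta \in \Omega_{\theta} : k_{off}^{N} = c \},
\qquad
\kappa_{r} = \operatorname{rank}(H) = 1 .
```

Fixing a single parameter in this way is what makes the restricted and unrestricted models nested, and it leads to a comparison against a chi-square distribution with one degree of freedom.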

Using the standard assumptions (given in detail in [Citation11]), one can establish an asymptotic convergence result for the test statistic (a function of the observations that is used to determine whether or not the null hypothesis is rejected) $U^{n} = \frac{n (J^{n} (θ^{n, H}; Y) - J^{n} (θ^{n}; Y))}{J^{n} (θ^{n}; Y)},$ where the corresponding realization ${\hat{U}}^{n}$ is defined as ${\hat{U}}^{n} = \frac{n (J^{n} ({\hat{θ}}^{n, H}; y) - J^{n} ({\hat{θ}}^{n}; y))}{J^{n} ({\hat{θ}}^{n}; y)} .$ This asymptotic convergence result is summarized in the following theorem.

Theorem 6.1

Under the assumptions detailed in [Citation5,Citation11], and assuming that the null hypothesis $H_{0}$ is true, $U^{n}$ converges in distribution (as $n \to \infty$ ) to a random variable U having a chi-square distribution with $κ_{r}$ degrees of freedom.

The above theorem suggests that if the sample size n is sufficiently large, then $U^{n}$ is approximately chi-square distributed with $κ_{r}$ degrees of freedom. We use this fact to determine whether or not the null hypothesis $H_{0}$ is rejected. To do that, we choose a significance level α (usually chosen to be 0.05) and use $χ^{2}$ tables to obtain the corresponding threshold value τ so that $Prob (U > τ) = α$ . We next compute ${\hat{U}}^{n}$ and compare it to τ. If ${\hat{U}}^{n} > τ$ , then we reject the null hypothesis $H_{0}$ with confidence level $(1 - α) 100\%$ ; otherwise, we do not reject. We emphasize that care should be taken in stating conclusions: we either reject or do not reject $H_{0}$ at the specified level of confidence. The following table illustrates the threshold values for $χ^{2} (1)$ with the given significance level.

Table


Similar tables can be found in any elementary statistics text or online, or can be calculated with a software package such as Matlab. The table is given here for illustrative purposes and for use in the examples that follow.
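As a sketch of how such threshold values might be computed without a table, the following uses only the Python standard library. It relies on the fact that the square of a standard normal variable is $χ^{2} (1)$ distributed, so it applies to one degree of freedom only; other degrees of freedom would need a chi-square quantile routine from a statistics package. The function name `chi2_1_threshold` is hypothetical.

```python
from statistics import NormalDist


def chi2_1_threshold(alpha):
    """Threshold tau with Prob(U > tau) = alpha for U ~ chi-square(1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # If Z is standard normal, Z**2 is chi-square(1), so
    # Prob(Z**2 > tau) = alpha  <=>  sqrt(tau) = z_{1 - alpha/2}.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


for alpha in (0.10, 0.05, 0.01, 0.001):
    tau = chi2_1_threshold(alpha)
    confidence = 100.0 * (1.0 - alpha)
    print(f"alpha = {alpha:<6} tau = {tau:8.4f} confidence = {confidence:.1f}%")
```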

6.2. Generalized least squares

The model comparison results outlined can be extended to deal with GLS problems in which measurement errors are independent with $E (E_{k}) = 0$ and $Var (E_{k}) = σ_{0}^{2} w^{2} (t_{k}, \hat{θ})$ , $k = 1, 2, \dots, n$ , where w is some known real-valued function with $w (t, \hat{θ}) \neq 0$ for any t. This is achieved through rescaling the observations in accordance with their variance (as discussed in [Citation11]) so that the resulting (transformed) observations are identically distributed as well as independent.
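The rescaling can be sketched as a weighted cost function in which each residual is divided by its weight before squaring. The sketch below assumes weights of the form $w_{k} = M (t_{k}; \hat{θ})^{γ}$ with $γ = 0.6$ ; that specific form is an assumption made for illustration and should be checked against the GLS formulation defined earlier in the paper. The function name and the sample numbers are hypothetical.

```python
def gls_cost(observations, model_values, gamma=0.6):
    """Weighted cost (1/n) * sum(((y_k - M_k) / w_k)**2).

    Assumption: w_k = M_k**gamma, with M_k the model output at t_k
    evaluated at the current parameter estimate. With gamma = 0 the
    weights are all 1 and the cost reduces to the OLS cost J^n.
    """
    if len(observations) != len(model_values):
        raise ValueError("observations and model values must have equal length")
    if not observations:
        raise ValueError("at least one observation is required")
    total = 0.0
    for y_k, m_k in zip(observations, model_values):
        if m_k <= 0.0:
            raise ValueError("weights M_k**gamma require positive model values")
        w_k = m_k ** gamma
        total += ((y_k - m_k) / w_k) ** 2
    return total / len(observations)


# Made-up observations and model outputs, for illustration only.
y = [0.13, 0.21, 0.40, 0.78]
m = [0.12, 0.22, 0.41, 0.75]
print(f"GLS cost = {gls_cost(y, m):.6f}, OLS cost = {gls_cost(y, m, gamma=0.0):.6f}")
```

Once the residuals are rescaled in this way, the restricted and unrestricted costs can be compared with the same test statistic as in the OLS case.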

6.3. Results for PolyQ aggregation models

We then carried out a series of model comparison tests (we again used DS4) for nested models to determine if an added parameter yields a statistically significantly improved model fit. Our null hypothesis in each case was: $H_{0}$ : The restricted model is adequate (i.e. the fit to data is not significantly improved with the model containing the additional parameter as a parameter to be estimated). We obtained the following results.

The model with estimation of $\{k_{I}^{+}, k_{I}^{-}\}$ vs. the model with estimation of $\{k_{I}^{+}, k_{I}^{-}, k_{off}^{N}\}$ : we find with $n = 699$ , $J^{n} ({\hat{θ}}^{n, H}; y) = 0.0044192109$ , $J^{n} ({\hat{θ}}^{n}; y) = 0.0043709501$ , and ${\hat{U}}^{n} = 7.7178$ . Thus, we reject $H_{0}$ at a 99% confidence level.
The model with estimation of $\{k_{I}^{+}, k_{I}^{-}\}$ vs. the model with estimation of $\{k_{I}^{+}, k_{I}^{-}, k_{on}^{N}\}$ : we find $J^{n} ({\hat{θ}}^{n}; y) = 0.0044192108$ with ${\hat{U}}^{n} = 7.49 \times 10^{-6}$ . Thus, we do not reject $H_{0}$ at a 99% confidence level.
The model with estimation of $\{k_{I}^{+}, k_{I}^{-}, k_{off}^{N}\}$ vs. the model with estimation of $\{k_{I}^{+}, k_{I}^{-}, k_{off}^{N}, k_{on}^{N}\}$ : to the order of computation we find no difference in the cost functions in this case and therefore we do not reject $H_{0}$ at a confidence level of 99%.
The model with estimation of $\{k_{I}^{+}, k_{I}^{-}, k_{on}^{N}\}$ vs. the model with estimation of $\{k_{I}^{+}, k_{I}^{-}, k_{on}^{N}, k_{off}^{N}\}$ : we find $J^{n} ({\hat{θ}}^{n}; y) = 0.0043709780$ with ${\hat{U}}^{n} = 7.7133$ and hence we reject $H_{0}$ with a confidence level of 99%.
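The first comparison above can be checked by hand from the reported cost values. The following sketch evaluates the test statistic from Section 6.1 with $n = 699$ and the two reported costs, and compares it with the standard $χ^{2} (1)$ table value for $α = 0.01$ . The function name is hypothetical; the numerical inputs are those quoted in the list.

```python
def rss_test_statistic(n, cost_restricted, cost_unrestricted):
    """U = n * (J_restricted - J_unrestricted) / J_unrestricted."""
    if cost_unrestricted <= 0.0:
        raise ValueError("the unrestricted cost must be positive")
    return n * (cost_restricted - cost_unrestricted) / cost_unrestricted


# Standard chi-square(1) table value for significance level alpha = 0.01.
CHI2_1_THRESHOLD_99 = 6.635

# Reported costs: restricted model {k_I^+, k_I^-},
# unrestricted model {k_I^+, k_I^-, k_off^N}.
u_hat = rss_test_statistic(699, 0.0044192109, 0.0043709501)
reject = u_hat > CHI2_1_THRESHOLD_99
print(f"U = {u_hat:.4f}; reject H0 at the 99% level: {reject}")
```

The same function applied to the costs 0.0044192108 and 0.0043709780 gives the value reported for the last comparison in the list.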

From these and the preceding results we conclude that the information content of the typical data set for the dynamics considered here will support at most three parameters estimated with reasonable confidence levels and these are the parameters $\{k_{I}^{+}, k_{I}^{-}, k_{off}^{N}\}$ .

7. Conclusions and suggested further efforts

From the efforts reported above we draw several conclusions.

For the majority of data sets, the GLS residual plots with $γ = 0.6$ appear random when the fit is restricted to data points with $M (t_{k}) \geq 0.12$ . As conjectured earlier, this may be because the early formation of aggregates is somewhat stochastic in nature, which is not well described by either the mathematical or the statistical model. It appears that smaller polymer sizes need special consideration. Indeed, we suspect from additional discussions with our colleagues that the nucleation step might be dominated by a stochastic rather than a deterministic process in the early stages (i.e. for small polymer sizes). This is a possible direction for further investigation.

Based on several different mathematical/statistical methodologies (sensitivities, asymptotic analysis, bootstrapping, and model comparison tests), the data sets we considered do not contain sufficient information for the reliable estimation of all nine parameters of interest. Indeed our findings suggest that at most three parameters can be reliably estimated with the data sets typical of those presented here, and that these parameters are $\{k_{I}^{+}, k_{I}^{-}, k_{off}^{N}\}$ . Recent related efforts [Citation2] suggest that perhaps there are experimental design questions that could be addressed to collect data that might support the more sophisticated models derived in [Citation25], especially in order to investigate information coming from different initial concentrations. Indeed, we have considered here only data sets from experiments carried out with the same initial concentration. Adapting the previously used techniques to use, simultaneously or successively, all the information content in data sets from experiments carried out at different initial concentrations is a challenging problem (see [Citation24] for a discussion of the effect of initial concentration on nucleated polymerization).

Here we concluded that at most three parameters $\{k_{I}^{+}, k_{I}^{-}, k_{off}^{N}\}$ can be reliably estimated with the data sets investigated. The first two parameters determine the balance between the normal and abnormal protein concentrations and the third represents the stability of the nucleus against degradation into monomeric entities. These three parameters are related to the early steps of the aggregation process, and thus we conclude that the model applied to these data sets does not provide any insight into the polymerization of larger polymers. Since this is the case, there is little motivation to modify the polymerization function $k_{on}$ depicted in Figure until further data collection procedures are pursued. It is difficult to say whether the initiation parameters are biologically more or less interesting than other parameters in the later aggregation process. (Everything interests the biological community since very little is known with certainty.) What we can more prudently draw is a rather negative conclusion: this type of experiment seems to be more sensitive to the ignition of the reaction than to secondary pathways. This reveals a limit of the approach and argues for measuring not only the total polymerized mass but also the size distributions of fibrils if one really wants to select the proper model for self-acceleration, as for instance in [Citation26,Citation30,Citation31].

The methodology we developed here may be carried out for other types of proteins as well as related experiments, and this is of particular interest since ThT measurements as studied here are the most standard method for studying protein aggregation. For instance, in the seminal paper [Citation29], models are proposed and fit to data; our type of analysis would complement their findings and allow one to assess the quality of these fits and how much confidence the authors can have in their conclusions. These results in turn could be combined with optimal design techniques [Citation2,Citation4,Citation11] to design more informative experiments.

Acknowledgments

The authors are most grateful to a referee whose careful reading of an earlier version of this manuscript led to substantial improvements in the final version.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This research was supported in part (MD, CK) by the ERC Starting Grant SKIPPERAD, in part (HTB) by the National Institute of Allergy and Infectious Diseases [grant number NIAID R01AI071915-10], and in part (HTB) by the Air Force Office of Scientific Research [grant number AFOSR FA9550-12-1-0188].

References

[1] B.M. Adams, H.T. Banks, M. Davidian, and E.S. Rosenberg, Model fitting and prediction with HIV treatment interruption data, Center for Research in Scientific Computation Technical Report CRSC-TR05-40, NC State University, October, 2005; Bull. Math. Biol. 69 (2007), pp. 563–584.
[2] K. Adoteye, H.T. Banks, and K.B. Flores, Optimal design of non-equilibrium experiments for genetic network interrogation, CRSC-TR14-12, N.C. State University, Raleigh, NC, September, 2014; Appl. Math. Lett. 40 (2015), pp. 84–89. doi: 10.1016/j.aml.2014.09.013
[3] H.T. Banks and B.G. Fitzpatrick, Statistical methods for model comparison in parameter estimation problems for distributed systems, J. Math. Biol. 28 (1990), pp. 501–527. doi: 10.1007/BF00164161
[4] H.T. Banks and K.L. Rehm, Parameter estimation in distributed systems: Optimal design, CRSC TR14-06, N.C. State University, Raleigh, NC, May, 2014; Eurasian J. Math. Comput. Appl. 2 (2014), pp. 70–79.
[5] H.T. Banks and H.T. Tran, Mathematical and Experimental Modeling of Physical and Biological Processes, CRC Press, Boca Raton, FL, 2009.
[6] H.T. Banks, M. Davidian, S. Hu, G.M. Kepler, and E.S. Rosenberg, Modeling HIV immune response and validation with clinical data, J. Biol. Dyn. 2 (2008), pp. 357–385. doi: 10.1080/17513750701813184
[7] H.T. Banks, A. Cintron-Arias, and F. Kappel, Parameter selection methods in inverse problem formulation, CRSC-TR10-03, N.C. State University, February, 2010, Revised, November, 2010; in Mathematical Modeling and Validation in Physiology: Application to the Cardiovascular and Respiratory Systems, J.J. Batzel, M. Bachar, and F. Kappel, eds., Lecture Notes in Mathematics Vol. 2064, Springer-Verlag, Berlin, 2013, pp. 43–73.
[8] H.T. Banks, R. Baraldi, K. Cross, K. Flores, C. McChesney, L. Poag, and E. Thorpe, Uncertainty quantification in modeling HIV viral mechanics, CRSC-TR13-16, N.C. State University, Raleigh, NC, December, 2013; Math. Biosci. Eng., to appear.
[9] H.T. Banks, M. Doumic, and C. Kruse, Efficient numerical schemes for nucleation-aggregation models: Early steps, CRSC-TR14-01, N.C. State University, Raleigh, NC, March, 2014.
[10] H.T. Banks, J.E. Banks, K. Link, J.A. Rosenheim, C. Ross, and K.A. Tillman, Model comparison tests to determine data information content, CRSC-TR14-13, N.C. State University, Raleigh, NC, October, 2014; Appl. Math. Lett. 43 (2015), pp. 10–18.
[11] H.T. Banks, S. Hu, and W.C. Thompson, Modeling and Inverse Problems in the Presence of Uncertainty, Taylor/Francis-Chapman/Hall-CRC Press, Boca Raton, FL, 2014.
[12] V. Calvez, N. Lenuzza, M. Doumic, J.-P. Deslys, F. Mouthon, and B. Perthame, Prion dynamic with size dependency – strain phenomena, J. Biol. Dyn. 4(1) (2010), pp. 28–42. doi: 10.1080/17513750902935208
[13] R.J. Carroll and D. Ruppert, Transformation and Weighting in Regression, Chapman & Hall, New York, 1988.
[14] R.J. Carroll, C.F.J. Wu, and D. Ruppert, The effect of estimating weights in weighted least squares, J. Amer. Statist. Assoc. 83 (1988), pp. 1045–1054. doi: 10.1080/01621459.1988.10478699
[15] J.F. Collet, T. Goudon, F. Poupaud, and A. Vasseur, The Becker-Döring system and its Lifshitz-Slyozov limit, SIAM J. Appl. Math. 62 (2002), pp. 1488–1500. doi: 10.1137/S0036139900378852
[16] M. Davidian, Nonlinear Models for Univariate and Multivariate Response, ST 762 Lecture Notes, Chapters 2, 3, 9 and 11, 2007; http://www.stat.ncsu.edu/people/davidian/courses/st732/.
[17] M. Davidian and D.M. Giltinan, Nonlinear Models for Repeated Measurement Data, Chapman and Hall, London, 2000.
[18] T.J. DiCiccio and B. Efron, Bootstrap confidence intervals, Statist. Sci. 11 (1995), pp. 189–228.
[19] M. Doumic, T. Goudon, and T. Lepoutre, Scaling limit of a discrete prion dynamics model, Commun. Math. Sci. 7 (2009), pp. 839–865. doi: 10.4310/CMS.2009.v7.n4.a3
[20] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans, CBMS 38, SIAM Publishing, Philadelphia, PA, 1982.
[21] F. Eghiaian, T. Daubenfeld, Y. Quenet, M. van Audenhaege, A.P. Bouin, G. van der Rest, J. Grosclaude, and H. Rezaei, Diversity in prion protein oligomerization pathways results from domain expansion as revealed by hydrogen/deuterium exchange and disulfide linkage, PNAS 104(18) (2007), pp. 7414–7419. doi: 10.1073/pnas.0607745104
[22] P. Laurençot and S. Mischler, From the discrete to the continuous coagulation fragmentation equations, Proc. R. Soc. Edinb. Sect. A Math. 132 (2002), pp. 1219–1248. doi: 10.1017/S0308210500002080
[23] R.J. LeVeque, Finite-Volume Methods for Hyperbolic Problems, Cambridge University Press, Cambridge, 2002.
[24] E.T. Powers and D.L. Powers, The kinetics of nucleated polymerizations at high concentrations: Amyloid fibril formation near and above the ‘supercritical concentration’, Biophys. J. 91 (2006), pp. 122–132. doi: 10.1529/biophysj.105.073767
[25] S. Prigent, A. Ballesta, F. Charles, N. Lenuzza, P. Gabriel, L.M. Tine, H. Rezaei, and M. Doumic, An efficient kinetic model for assemblies of amyloid fibrils and its application to polyglutamine aggregation, PLoS ONE 7 (2012), e43273. doi: 10.1371/journal.pone.0043273
[26] S. Prigent, H.W. Haffaf, H.T. Banks, M. Hoffmann, H. Rezaei, and M. Doumic, Size distribution of amyloid fibrils. Mathematical models and experimental data, Int. J. Pure Appl. Math. 93(6) (2014), pp. 845–878. doi: 10.12732/ijpam.v93i6.10
[27] S.I. Rubinow, Introduction to Mathematical Biology, John Wiley & Sons, New York, 1975.
[28] G.A.F. Seber and C.J. Wild, Nonlinear Regression, J. Wiley & Sons, Hoboken, NJ, 2003.
[29] W.-F. Xue, S.W. Homans, and S.E. Radford, Systematic analysis of nucleation-dependent polymerization reveals new insights into the mechanism of amyloid self-assembly, Proc. Natl. Acad. Sci. USA 105 (2008), pp. 8926–8931. doi: 10.1073/pnas.0711664105
[30] W.-F. Xue, S.W. Homans, and S.E. Radford, Amyloid fibril length distribution quantified by atomic force microscopy single-particle image analysis, Protein Eng. Des. Sel.: PEDS 22 (2009), pp. 489–496. doi: 10.1093/protein/gzp026
[31] W.-F. Xue and S.E. Radford, An imaging and systems modeling approach to fibril breakage enables prediction of amyloid behavior, Biophys. J. 105 (2013), pp. 2811–2819. doi: 10.1016/j.bpj.2013.10.034
