
Experimental design for vector output systems

Pages 557-590 | Received 30 Apr 2012, Accepted 17 Apr 2013, Published online: 17 May 2013

Abstract

We formulate an optimal design problem for the selection of best states to observe and optimal sampling times for parameter estimation or inverse problems involving complex non-linear dynamical systems. An iterative algorithm for implementation of the resulting methodology is proposed. Its use and efficacy are illustrated on two applied problems of practical interest: (i) dynamic models of HIV progression and (ii) modelling of the Calvin cycle in plant metabolism and growth.

Introduction

In many scientific fields where mathematical modelling is utilized, models grow increasingly complex over time, often containing more state variables and parameters, as the underlying governing processes of a system become better understood and refinements in mechanisms are considered. Additionally, as technology produces and improves devices to measure physical and biological phenomena, new data become available to inform mathematical modelling efforts. The world is approaching an era in which the vast amounts of information available to researchers may be overwhelming or even counterproductive to those efforts. We explore a framework based on the Fisher Information Matrix (FIM) for a system of ordinary differential equations (ODEs) to determine when an experimenter should take samples and which variables should be measured when collecting information on a physical or biological process modelled by a dynamical system.

Inverse problem methodologies are discussed in [Citation1] in the context of dynamical system or mathematical model parameter estimation when a sufficient number of observations of one or more states (variables) are available. The choice of method depends on assumptions the modeller makes on the form of the error between the model and the observations (the statistical model). The most prevalent source of error is observation error, which is made when collecting data. (One can also consider model error, which originates from the differences between the model and the underlying process that the model describes. But this is often quite difficult to quantify.) Measurement error is most readily discussed in the context of statistical models. The three techniques commonly addressed are maximum likelihood estimators (MLE), used when the properties of the error distribution are known; ordinary least squares (OLS), for error with constant variance across observations; and generalized least squares (GLS), used when the variance of the data can be expressed as a non-constant function. Uncertainty quantification is also described for optimization problems of this type, namely in the form of observation error covariances, standard errors, residual plots, and sensitivity matrices. Techniques to approximate the variance of the error are also included in these discussions.

Experimental design using the FIM, which is based on sensitivity matrices, is described in [Citation2] for the case of scalar data. Sensitivity matrices are composed of functions that relate changes in the state variables to changes in the parameters. The first order quantifications of these relations are called traditional sensitivity functions and are useful in suggesting when a variable should be sampled to obtain the most information for estimating a particular parameter, especially when used in conjunction with the so-called second order sensitivity functions. That work also examines the usefulness of generalized sensitivity functions,[Citation3] which are calculated using the FIM and describe how information about the parameters is distributed across time for each variable. Both types of sensitivity functions are then used in numerical simulations to determine the optimal final time for an experiment on a process described by a logistic curve.

In [Citation4], the authors develop an experimental design theory using the FIM to identify optimal sampling times for experiments on physical processes (modelled by an ODE system) in which scalar or vector data will be taken. The experimental design technique developed there is applied in numerical simulations to the logistic curve, a simple ODE model describing glucose regulation, and a harmonic oscillator example.

In addition to when to take samples, the question of what variables to measure is also very important in designing effective experiments, especially when the number of state variables is large. Use of such a methodology to optimize what to measure would further reduce testing costs by eliminating extra experiments to measure variables neglected in previous trials. In [Citation5], the best set of variables for an ODE system modelling the Calvin cycle [Citation6] is identified using two methods. The first, an ad hoc statistical method, determines which variables directly influence an output of interest at any one particular time. A model using a subset of variables is determined via multivariate linear regression, and the efficacy of the model is then measured using the Akaike Information Criterion. The variables that appear in the best models at the most time points are then identified as the most important to measure. Such a method does not utilize the information on the underlying time-varying processes given by the dynamical system model. Extension of the second method first suggested in [Citation5], based on optimal design ideas, is the subject of our presentation here.

Building on the theory in [Citation4] and [Citation5], we formulate a previously unexplored optimal design problem to determine not only the optimal sampling variables out of a finite set of possible sampling variables but also the optimal sampling time distribution given a fixed final time. We compare the SE-optimal design introduced in [Citation2] and [Citation4] with the well-known methods of D-optimal and E-optimal design on a six-compartment HIV model [Citation7] and a thirty-one dimensional model of the Calvin cycle.[Citation6] Such models, in which there is a wide range of variables that could possibly be observed, are not only ideal for testing our proposed methodology but are also widely encountered in applications.

Mathematical background

 

Mathematical and statistical models

We explore our experimental design questions using a mathematical model
$$\frac{dx}{dt}(t) = g(t, x(t;\theta), q), \quad t \in [t_0, t_f], \qquad x(t_0;\theta) = x_0, \qquad (1)$$
where $x(t;\theta)$ is the vector of state variables of the system, generated using a parameter vector $\theta = (x_0; q) \in \mathbb{R}^p$, $p = m + r$, that contains the $m$ initial values and the $r$ system parameters listed in $q$; $g$ is a mapping $\mathbb{R}^{1+m+r} \to \mathbb{R}^m$; $t_0 \ge 0$ is the initial time; and $t_f < \infty$ is the final time. We define an observation process
$$f(t;\theta) = C x(t;\theta), \qquad (2)$$
where $C$ is an observation operator that maps $\mathbb{R}^m \to \mathbb{R}^N$, with $N$ the number of quantities observed at a single sampling time. If we were able to observe all states, each measured by a different sampling technique, then $N = m$ and $C = I_{m\times m}$; however, this is most often not the case because of the impossibility of, or the expense involved in, measuring all state variables. In other cases (such as the HIV example below) we may be able to directly observe only combinations of the states.

In order to discuss the amount of uncertainty in parameter estimates, we formulate a statistical model [Citation1] of the form
$$Y(t) = f(t;\theta_0) + \mathcal{E}(t), \quad t \in [t_0, t_f], \qquad (3)$$
where $\theta_0$ is the hypothesized true value of the unknown parameters and $\mathcal{E}$ is a vector random process that represents observation error for the measured quantities. We make the standard assumptions
$$E[\mathcal{E}(t)] = 0, \quad t \in [t_0,t_f],$$
$$\operatorname{Var}(\mathcal{E}(t)) = V_0(t) = \operatorname{diag}\big(\sigma_{0,1}^2(t), \sigma_{0,2}^2(t), \ldots, \sigma_{0,N}^2(t)\big), \quad t \in [t_0,t_f],$$
$$\operatorname{Cov}\big(\mathcal{E}_i(t), \mathcal{E}_i(s)\big) = \sigma_{0,i}^2(t)\,\delta(t-s), \quad s,t \in [t_0,t_f],$$
$$\operatorname{Cov}\big(\mathcal{E}_i(t), \mathcal{E}_j(s)\big) = 0, \quad i \ne j, \ s,t \in [t_0,t_f],$$
where $\delta(0) = 1$ and $\delta(t) = 0$ for $t \ne 0$. Realizations of the statistical model (3) are written
$$y(t) = f(t;\theta_0) + \epsilon(t), \quad t \in [t_0,t_f].$$
When collecting experimental data, it is often difficult to take continuous measurements of the observed variables. Instead, we assume that we have $n$ observations at times $t_j$, $j = 1,\ldots,n$, with $t_0 \le t_1 < t_2 < \cdots < t_n \le t_f$. We then write the observation process (2) as
$$f(t_j;\theta) = C x(t_j;\theta), \quad j = 1,2,\ldots,n, \qquad (4)$$
the discrete statistical model as
$$Y_j = f(t_j;\theta_0) + \mathcal{E}(t_j), \quad j = 1,2,\ldots,n, \qquad (5)$$
and a realization of the discrete statistical model as
$$y_j = f(t_j;\theta_0) + \epsilon(t_j), \quad j = 1,2,\ldots,n.$$
If we were given $\theta_0$, we could solve (1) for $x(t;\theta_0)$, a process known as solving the forward problem. Alternatively, if we had a set of data $y_j$, $j = 1,2,\ldots,n$, we could estimate $\theta_0$, a process known as solving the inverse problem. We will use this mathematical and statistical framework to develop a methodology that identifies the sampling variables providing the most information for estimating a given set of parameters and the most informative times at which the samples should be taken.
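The following minimal sketch shows how the forward problem (1), the observation process (2) and a realization of the discrete statistical model (5) fit together in code. The two-state model, parameter values, observation operator and noise level used here are hypothetical illustrations and are not taken from the examples treated later in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-state model standing in for (1); theta = (x0; q) with q = (a, b).
def g(t, x, q):
    a, b = q
    return [a * x[0] - b * x[0] * x[1],
            b * x[0] * x[1] - a * x[1]]

x0 = [1.0, 0.5]                     # initial values (part of theta)
q = (0.8, 0.3)                      # system parameters (part of theta)
t0, tf = 0.0, 20.0

# Observation operator C: observe only the sum x1 + x2 (N = 1, m = 2).
C = np.array([[1.0, 1.0]])

# Forward problem: solve (1), then apply the observation process (2) at discrete times (4).
sol = solve_ivp(g, (t0, tf), x0, args=(q,), dense_output=True, rtol=1e-8)
t_obs = np.linspace(t0, tf, 25)     # sampling times t_1, ..., t_n
f_obs = (C @ sol.sol(t_obs)).T      # f(t_j; theta_0), shape (n, N)

# Realization of the discrete statistical model (5): y_j = f(t_j; theta_0) + eps(t_j),
# with independent mean-zero noise whose variances form the diagonal of V_0 (illustrative).
rng = np.random.default_rng(0)
sigma0 = np.array([0.05])
y = f_obs + rng.normal(scale=sigma0, size=f_obs.shape)
```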

Formulation of the optimal design problem

Several methods exist to solve the inverse problem. A major factor [Citation1] in determining which method to use is the set of additional assumptions made about $\mathcal{E}(t)$. It is common practice to assume that realizations of $\mathcal{E}(t)$ at particular time points are independent and identically distributed (i.i.d.). If, additionally, the distributions describing the behaviour of the components of $\mathcal{E}(t)$ are known, then maximum likelihood methods may be used to find an estimate of $\theta_0$. On the other hand, if the distributions for $\mathcal{E}(t)$ are not known but the variance $V_0(t)$ (also unknown) is assumed to vary over time, weighted least squares methods are often used. We propose an optimal design problem formulation using a generalized weighted least squares criterion.

Let $\mathcal{P}_1([t_0,t_f])$ denote the set of all bounded distributions on the interval $[t_0,t_f]$. We consider the generalized weighted least squares cost functional for systems with vector output
$$J_{WLS}(y,\theta) = \int_{t_0}^{t_f} [y(t) - f(t;\theta)]^T V_0^{-1}(t)\, [y(t) - f(t;\theta)]\, dP_1(t), \qquad (6)$$
where $P_1(t) \in \mathcal{P}_1([t_0,t_f])$ is a general measure on the interval $[t_0,t_f]$. For a given continuous data set $y(t)$, we search for a parameter $\hat\theta$ that minimizes $J_{WLS}(y,\theta)$.

We next consider the case of observations collected at discrete times. If we choose a set of $n$ time points $\tau = \{t_j\}$, $j = 1,2,\ldots,n$, where $t_0 \le t_1 < t_2 < \cdots < t_n \le t_f$, and take
$$P(t) = P_\tau = \sum_{j=1}^{n} \delta_{t_j}, \qquad (7)$$
where $\delta_a$ represents the Dirac delta distribution with atom at $a$, then the weighted least squares criterion (6) for a finite number of observations becomes
$$J^n_{WLS}(y,\theta) = \sum_{j=1}^{n} [y(t_j) - f(t_j;\theta)]^T V_0^{-1}(t_j)\, [y(t_j) - f(t_j;\theta)].$$
To select a useful distribution of time points and set of observation variables, we introduce the $N \times p$ sensitivity matrices $\partial f(t;\theta)/\partial\theta$ and the $m \times p$ sensitivity matrices $\partial x(t;\theta)/\partial\theta$, determined using the differential operator in row-vector form $\nabla_\theta = (\partial/\partial\theta_1, \partial/\partial\theta_2, \ldots, \partial/\partial\theta_p)$ and the observation operator defined in (2),
$$\nabla_\theta f(t;\theta) = \frac{\partial f(t;\theta)}{\partial\theta} = C\,\frac{\partial x(t;\theta)}{\partial\theta} = C \begin{pmatrix} \dfrac{\partial x_1(t;\theta)}{\partial\theta_1} & \dfrac{\partial x_1(t;\theta)}{\partial\theta_2} & \cdots & \dfrac{\partial x_1(t;\theta)}{\partial\theta_p} \\ \dfrac{\partial x_2(t;\theta)}{\partial\theta_1} & \dfrac{\partial x_2(t;\theta)}{\partial\theta_2} & \cdots & \dfrac{\partial x_2(t;\theta)}{\partial\theta_p} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_m(t;\theta)}{\partial\theta_1} & \dfrac{\partial x_m(t;\theta)}{\partial\theta_2} & \cdots & \dfrac{\partial x_m(t;\theta)}{\partial\theta_p} \end{pmatrix} = C\, \nabla_\theta x(t;\theta). \qquad (8)$$
Using the sensitivity matrix $\nabla_\theta f(t;\theta_0)$, we may formulate the Generalized Fisher Information Matrix (GFIM). Consider the set $\mathcal{C} \subset \mathbb{R}^m$ of admissible observation maps and let $\mathcal{P}_2(\mathcal{C})$ represent the set of all bounded distributions $P_2(c)$ on $\mathcal{C}$. Then the GFIM may be written
$$F(P_1, P_2, \theta_0) \equiv \int_{t_0}^{t_f} \int_{\mathcal{C}} \frac{1}{\sigma^2(t,c)}\, \nabla_\theta^T f(t;\theta_0)\, \nabla_\theta f(t;\theta_0)\, dP_2(c)\, dP_1(t) = \int_{t_0}^{t_f} \int_{\mathcal{C}} \frac{1}{\sigma^2(t,c)}\, \nabla_\theta^T \big(c\, x(t;\theta_0)\big)\, \nabla_\theta \big(c\, x(t;\theta_0)\big)\, dP_2(c)\, dP_1(t). \qquad (9)$$
Taking $N$ different sampling maps in $\mathcal{C}$, represented by the $m$-dimensional row vectors $c_k$, $k = 1,2,\ldots,N$, we construct the discrete distribution on $\mathcal{C}$
$$P_C = \sum_{k=1}^{N} \delta_{c_k}, \qquad (10)$$
where $\delta_a$ again denotes the Dirac delta distribution with atom at $a$. Using $P_C$ in (10), we obtain the GFIM for multiple discrete observation methods taken continuously over $[t_0,t_f]$,
$$F(P_1, P_C, \theta_0) = \int_{t_0}^{t_f} \sum_{k=1}^{N} \frac{1}{\sigma^2(t,c_k)}\, \nabla_\theta^T \big(c_k x(t;\theta_0)\big)\, \nabla_\theta \big(c_k x(t;\theta_0)\big)\, dP_1(t) = \int_{t_0}^{t_f} \sum_{k=1}^{N} \nabla_\theta^T x(t;\theta_0)\, c_k^T \frac{1}{\sigma^2(t,c_k)}\, c_k\, \nabla_\theta x(t;\theta_0)\, dP_1(t) = \int_{t_0}^{t_f} \nabla_\theta^T x(t;\theta_0)\, \Big( \sum_{k=1}^{N} c_k^T \frac{1}{\sigma^2(t,c_k)} c_k \Big)\, \nabla_\theta x(t;\theta_0)\, dP_1(t) = \int_{t_0}^{t_f} \nabla_\theta^T x(t;\theta_0)\, \big(C^T V_0^{-1}(t)\, C\big)\, \nabla_\theta x(t;\theta_0)\, dP_1(t), \qquad (11)$$
where $C = (c_1, c_2, \ldots, c_N)^T \in \mathbb{R}^{N\times m}$ is the observation operator in (2) and (4), and $V_0(t) \in \mathbb{R}^{N\times N}$ is the covariance matrix described in (3). Applying the distribution $P_\tau$ of (7) to the GFIM (11) for discrete observation operators measured continuously yields the discrete $p \times p$ FIM for discrete observation operators measured at discrete times,
$$F(\tau, C, \theta_0) = F(P_\tau, P_C, \theta_0) = \sum_{j=1}^{n} \nabla_\theta^T x(t_j;\theta_0)\, C^T V_0^{-1}(t_j)\, C\, \nabla_\theta x(t_j;\theta_0). \qquad (12)$$
This describes the amount of information about the $p$ parameters of interest that is captured by the observed quantities described by the sampling maps $c_k$, $k = 1,2,\ldots,N$, listed in $C$, when they are measured at the time points in $\tau$.
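As a concrete illustration, the discrete FIM (12) is simply a weighted sum of quadratic forms in the observed sensitivities. The sketch below assumes the state sensitivity matrices $\nabla_\theta x(t_j;\theta_0)$ have already been computed (by automatic differentiation or finite differences) and stacked in an array; the dimensions, observation operator and noise level are illustrative, not taken from the examples below.

```python
import numpy as np

def discrete_fim(dxdtheta, C, V0):
    """Discrete FIM (12).
    dxdtheta: array of shape (n, m, p), the state sensitivities at each sampling time;
    C: N x m observation operator; V0: N x N observation covariance."""
    V0_inv = np.linalg.inv(V0)
    return sum(S.T @ C.T @ V0_inv @ C @ S for S in dxdtheta)

# Illustrative dimensions: n = 6 times, m = 3 states, p = 2 parameters, N = 1 observable.
rng = np.random.default_rng(0)
dxdtheta = rng.normal(size=(6, 3, 2))   # placeholder sensitivities
C = np.array([[1.0, 0.0, 1.0]])         # observe the sum x1 + x3
V0 = np.diag([0.05**2])
F = discrete_fim(dxdtheta, C, V0)       # p x p information matrix
```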

The questions of determining the best (in some sense) $C$ and $\tau$ are central to the optimal design of an experiment. Recall that the set of time points $\tau$ has an associated distribution $P_1(\tau) = P_\tau \in \mathcal{P}_1([t_0,t_f])$, where $\mathcal{P}_1([t_0,t_f])$ is the set of all bounded distributions on $[t_0,t_f]$. Similarly, the set of sampling maps $\{c_k\}$ has an associated bounded distribution $P_C \in \mathcal{P}_2(\mathcal{C})$. Define the space of bounded distributions $\mathcal{P}([t_0,t_f]\times\mathcal{C}) = \mathcal{P}_1([t_0,t_f]) \times \mathcal{P}_2(\mathcal{C})$ with elements $P = (P_\tau, P_C) \in \mathcal{P}$. Without loss of generality, assume that $\mathcal{C} \subset \mathbb{R}^m$ is closed and bounded, and assume that there exists a functional $\mathcal{J}: \mathbb{R}^{p\times p} \to \mathbb{R}^+$ of the GFIM (9). Then the optimal design problem associated with $\mathcal{J}$ is to select a distribution $\hat P \in \mathcal{P}$ such that
$$\mathcal{J}\big(F(\hat P, \theta_0)\big) = \min_{P \in \mathcal{P}} \mathcal{J}\big(F(P, \theta_0)\big), \qquad (13)$$
where $\mathcal{J}$ depends continuously on the elements of $F(P,\theta_0)$.

The Prohorov metric [Citation8] provides a general theoretical framework for the existence of $\hat P$ and for approximation in $\mathcal{P}([t_0,t_f]\times\mathcal{C})$ (such a framework is developed in [Citation2, Citation9]). The application of the Prohorov metric to optimal design problems formulated as (13) is explained more fully in [Citation2]: briefly, define the Prohorov metric $\rho$ on the space $\mathcal{P}([t_0,t_f]\times\mathcal{C})$ and consider the metric space $(\mathcal{P}([t_0,t_f]\times\mathcal{C}),\rho)$. Since $[t_0,t_f]\times\mathcal{C}$ is compact, $(\mathcal{P}([t_0,t_f]\times\mathcal{C}),\rho)$ is also compact. Additionally, by the properties of the Prohorov metric, $(\mathcal{P}([t_0,t_f]\times\mathcal{C}),\rho)$ is complete and separable. Therefore, an optimal distribution $\hat P$ exists and may be approximated by a discrete distribution.

The cost functional in (13) may take many forms. We focus on the use of traditional optimal design methods, namely the D-optimal, E-optimal and SE-optimal design criteria, to determine the form of $\mathcal{J}$. Each of these design criteria is a function of the inverse of the FIM (assumed hereafter to be invertible) defined in (12).

In D-optimal design, the cost functional is written
$$\mathcal{J}_D(F) = \det\big(F(\tau,C,\theta_0)^{-1}\big) = \frac{1}{\det\big(F(\tau,C,\theta_0)\big)}.$$
By minimizing $\mathcal{J}_D$, we minimize the volume of the confidence interval ellipsoid describing the uncertainty in our parameter estimates. Since $F$ is symmetric and positive semi-definite, $\mathcal{J}_D(F) \ge 0$. Additionally, since $F$ is assumed invertible, $\mathcal{J}_D(F) \ne 0$; therefore $\mathcal{J}_D: \mathbb{R}^{p\times p} \to \mathbb{R}^+$.

In E-optimal design, the cost functional $\mathcal{J}_E$ is the largest eigenvalue of $F(\tau,C,\theta_0)^{-1}$, or equivalently
$$\mathcal{J}_E(F) = \max \frac{1}{\operatorname{eig}\big(F(\tau,C,\theta_0)\big)},$$
the reciprocal of the smallest eigenvalue of $F$. To obtain a smaller largest standard error, we must reduce the length of the principal axis of the confidence interval ellipsoid. Since an eigenvalue $\lambda$ solves $\det(F - \lambda I) = 0$, an eigenvalue $\lambda = 0$ would mean $\det(F) = 0$, that is, that $F$ is not invertible. Since $F$ is positive definite, all eigenvalues are therefore positive. Thus $\mathcal{J}_E: \mathbb{R}^{p\times p} \to \mathbb{R}^+$.

In SE-optimal design, $\mathcal{J}_{SE}$ is the sum of the elements on the diagonal of $F(\tau,C,\theta_0)^{-1}$, each weighted by the respective parameter value,[Citation2, Citation4] written
$$\mathcal{J}_{SE}(F) = \sum_{i=1}^{p} \frac{\big(F(\tau,C,\theta_0)^{-1}\big)_{i,i}}{\theta_{0,i}^2}.$$
Thus in SE-optimal design the goal is to minimize the sum of squared errors of the parameters normalized by the true parameter values. As the diagonal elements of $F^{-1}$ are all positive and all parameters in $\theta_0 \in \mathbb{R}^p$ are assumed non-zero, $\mathcal{J}_{SE}: \mathbb{R}^{p\times p} \to \mathbb{R}^+$.
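Given an invertible $p\times p$ FIM, each of the three criteria amounts to a few lines of linear algebra. The minimal sketch below, with an arbitrary symmetric positive-definite matrix standing in for $F$ and illustrative nominal parameter values, evaluates $\mathcal{J}_D$, $\mathcal{J}_E$ and $\mathcal{J}_{SE}$.

```python
import numpy as np

def J_D(F):
    # D-optimal: determinant of F^{-1} (volume of the confidence ellipsoid).
    return 1.0 / np.linalg.det(F)

def J_E(F):
    # E-optimal: largest eigenvalue of F^{-1}, i.e. 1 / (smallest eigenvalue of F).
    return 1.0 / np.min(np.linalg.eigvalsh(F))

def J_SE(F, theta0):
    # SE-optimal: diagonal of F^{-1} weighted by the squared nominal parameter values.
    return np.sum(np.diag(np.linalg.inv(F)) / np.asarray(theta0)**2)

# Illustrative symmetric positive-definite FIM and nominal parameters.
A = np.array([[2.0, 0.3], [0.1, 1.5]])
F = A @ A.T
theta0 = np.array([0.8, 0.3])
print(J_D(F), J_E(F), J_SE(F, theta0))
```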

In [Citation4], it is shown that the D-, E- and SE-optimal design criteria select different time grids and yield different standard errors. We expect that these design cost functionals will also choose different observation variables (maps) in order to minimize different dimensions of the confidence interval ellipsoid.

Algorithm and optimization constraints

In most optimal design problems there is not a continuum of measurement possibilities; rather, there are only finitely many possible observation maps $c$. Denote this set by $\mathcal{C}_N \subset \mathbb{R}^m$. While we may still use the Prohorov metric-based framework to guarantee existence and convergence for (13), we have a stronger result, first proposed in [Citation5], that is useful in numerical implementation. Because $\mathcal{C}_N$ is finite, all probability distributions made from the elements of $\mathcal{C}_N$ have the form $P_2(\{c_k\}) = P_C = \sum_{k=1}^{N}\delta_{c_k}$ for a fixed $N$. Moreover, the set $\mathcal{P}_2^N(\mathcal{C}_N)$ of all distributions that use $N$ sampling methods is also finite. For a fixed distribution of time points $P_\tau$, we may compute using (12) the set of all possible FIM $F(P_\tau, P_C, \theta)$ that could be formulated from $c \in \mathcal{C}_N$. By the properties of matrix multiplication and addition, this set is also finite. Then the functional (13) applied to every $F$ in the set produces a finite set contained in $\mathbb{R}^+$. Because this set is finite, it is well ordered by the relation $\le$ and therefore has a minimal element. Thus for any distribution of time points $P_\tau$, we may find at least one solution $\hat{P}_C \in \mathcal{P}_2^N(\mathcal{C}_N)$. Moreover, $\hat{P}_C$ may be determined by a search over all matrices $C = (c_1, c_2, \ldots, c_N)^T$ formed from $N$ elements of $\mathcal{C}_N$. Therefore, for a fixed $P_\tau$ and any finite number $N$ of sampling methods, a globally optimal set of $N$ sampling methods may be determined.

Due to the computational demands of performing non-linear optimization over $n$ time points and $N$ observation maps simultaneously (a total of $n + N$ dimensions), and due to the difference in techniques between searching for an optimal $P_C$ in the finite set $\mathcal{C}_N$ and searching for an optimal distribution of $n$ sampling times, we instead solve the coupled set of equations
$$\hat C = \operatorname*{arg\,min}_{\{C \,\mid\, P_C \in \mathcal{P}_2^N(\mathcal{C}_N)\}} \mathcal{J}\big(F(\hat\tau, C, \theta_0)\big), \qquad \hat\tau = \operatorname*{arg\,min}_{\{\tau \,\mid\, P_\tau \in \mathcal{P}_1([t_0,t_f])\}} \mathcal{J}\big(F(\tau, \hat C, \theta_0)\big), \qquad (14)$$
where $C \in \mathbb{R}^{N\times m}$ represents a set of $N$ sampling maps and $\tau = \{t_j\}_{j=1}^{n}$, $t_0 \le t_1 < t_2 < \cdots < t_n \le t_f$, is an ordered set of $n$ sampling times. These equations are solved iteratively as
$$\hat C_i = \operatorname*{arg\,min}_{\{C \,\mid\, P_C \in \mathcal{P}_2^N(\mathcal{C}_N)\}} \mathcal{J}\big(F(\hat\tau_{i-1}, C, \theta_0)\big), \qquad \hat\tau_i = \operatorname*{arg\,min}_{\{\tau \,\mid\, P_\tau \in \mathcal{P}_1([t_0,t_f])\}} \mathcal{J}\big(F(\tau, \hat C_i, \theta_0)\big), \qquad (15)$$
where $\mathcal{J}$ is the D-, E- or SE-optimal design criterion. We begin by solving for $\hat C_1$, where $\hat\tau_0$ is specified by the user. The system (15) is iterated until $|\mathcal{J}(F(\hat\tau_i, \hat C_i, \theta_0)) - \mathcal{J}(F(\hat\tau_{i-1}, \hat C_{i-1}, \theta_0))| < \epsilon$ or until $\hat C_i = \hat C_{i-1}$. For each iteration, the first equation of (15) is solved using a global search over all admissible $C$. Since the sensitivity equations cannot easily be solved for the models chosen to illustrate our method, we use a modified version of tssolve.m by A. Attarian,[Citation10] which implements the myAD package developed by M. Fink.[Citation11]
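A compact sketch of the alternating scheme (15) is given below. To stay self-contained it uses a synthetic sensitivity function, an exhaustive search over $N$-element subsets of the candidate maps for the first equation, and a generic bounded optimizer from SciPy as a stand-in for the specialized time-grid routine discussed next; none of the numerical details here are taken from the paper.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

# Synthetic state-sensitivity matrix grad_theta x(t): m = 3 states, p = 2 parameters.
def dx_dtheta(t):
    return np.array([[np.sin(0.3 * t), t * np.exp(-0.1 * t)],
                     [np.cos(0.2 * t), np.exp(-0.05 * t)],
                     [0.1 * t,         np.sin(0.1 * t)]])

candidates = [np.eye(3)[k] for k in range(3)]    # C_N: observe one state per map
sigma2 = 0.05**2                                 # common (illustrative) observation variance

def fim(tau, C):
    C = np.atleast_2d(C)
    V0_inv = np.eye(C.shape[0]) / sigma2
    return sum(dx_dtheta(t).T @ C.T @ V0_inv @ C @ dx_dtheta(t) for t in tau)

def J_D(F):                                      # D-optimal cost functional
    return 1.0 / np.linalg.det(F)

t0, tf, n, N = 0.0, 10.0, 6, 2
tau = np.linspace(t0, tf, n)                     # user-specified initial grid (tau_hat_0)
prev_C, prev_J = None, np.inf

for _ in range(10):
    # First equation of (15): global search over all N-element subsets of candidate maps.
    C_hat = min((np.vstack(sub) for sub in combinations(candidates, N)),
                key=lambda C: J_D(fim(tau, C)))
    # Second equation of (15): optimize interior times (endpoints fixed, as in constraint C2).
    res = minimize(lambda s: J_D(fim(np.concatenate(([t0], np.sort(s), [tf])), C_hat)),
                   tau[1:-1], bounds=[(t0, tf)] * (n - 2))
    tau = np.concatenate(([t0], np.sort(res.x), [tf]))
    J_val = J_D(fim(tau, C_hat))
    # Stop when the selected maps repeat or the cost change falls below a tolerance.
    if (prev_C is not None and np.array_equal(C_hat, prev_C)) or abs(J_val - prev_J) < 1e-10:
        break
    prev_C, prev_J = C_hat, J_val
```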

Solving the second equation of (15) requires a non-linear constrained iterative optimization algorithm. While MATLAB's fmincon is a natural choice for such problems, as reported in [Citation4] it does not perform well in this situation. Instead, we use SolvOpt, developed by A. Kuntsevich and F. Kappel [Citation12] (which utilizes a modified version of Shor's r-algorithm), to search for an optimal distribution local to an initial uniformly spaced time point distribution. Many types of constraints may be placed on this optimization. The four constraints used in [Citation4] are:

  1. Optimize all time points such that $t_0 \le t_1 \le t_2 \le \cdots \le t_n \le t_f$. This method optimizes $n$ time points.

  2. The initial and final time points are fixed as $t_1 = t_0$ and $t_n = t_f$. The routine then optimizes over the remaining $n-2$ time points such that $t_j \le t_{j+1}$.

  3. Optimize the time steps $\nu_j \ge 0$. Fix $t_1 = t_0$ and $t_n = t_f$; the remaining time points are found via $t_{j+1} = t_j + \nu_j$, $j = 1,2,\ldots,n-2$. Additionally, $\sum_{j=1}^{n-2} \nu_j \le t_f - t_0$. This routine also optimizes over $n-2$ variables.

  4. Optimize the time steps $\nu_j \ge 0$. Fix $t_1 = t_0$; the remaining time points are found via $t_{j+1} = t_j + \nu_j$, $j = 1,2,\ldots,n-1$, such that $\sum_{j=1}^{n-1} \nu_j = t_f - t_0$. This method optimizes over $n-1$ variables.

We select two of these constraints, (C2) and (C3), to reduce the complexity of the time point distribution selection problem. Once either of the convergence requirements is met and $\hat C$ and $\hat\tau$ are determined, we compute standard errors using the asymptotic theory described in the next section.
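Constraint (C3) is convenient in practice because it replaces the ordering constraint on the times with simple non-negativity bounds on the steps plus one linear inequality. The small sketch below shows one way such a reparameterization can be coded; it is our own illustration, not the authors' SolvOpt implementation, and the rescaling used to enforce the step-sum bound is an arbitrary design choice.

```python
import numpy as np

def steps_to_times(nu, t0, tf):
    """Constraint C3: t_1 = t0, t_n = tf, interior times t_{j+1} = t_j + nu_j with
    nu_j >= 0 and sum(nu) <= tf - t0 (enforced here by rescaling, if needed)."""
    nu = np.clip(np.asarray(nu, dtype=float), 0.0, None)
    total = nu.sum()
    if total > tf - t0:
        nu = nu * (tf - t0) / total      # project back into the feasible region
    interior = t0 + np.cumsum(nu)
    return np.concatenate(([t0], interior, [tf]))

# Example: n = 6 times on [0, 10], so n - 2 = 4 step variables.
nu = [1.0, 2.5, 0.5, 3.0]
print(steps_to_times(nu, 0.0, 10.0))     # [ 0.   1.   3.5  4.   7.  10. ]
```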

Standard errors

In order to compare the ability of the different optimal design criteria to minimize uncertainty in parameter estimation, we compute the standard errors associated with the estimated parameters. We begin by selecting an ODE system, a nominal set of parameters $\theta$ to be estimated, the start and end times of the experiment $t_0$ and $t_f$, the number $n$ of sampling times, and the number $N$ of observation maps we wish to use. After an optimal $\tau$ and $C$ are determined according to one of the three previously described optimal design methods, we compute the standard errors for the parameters in $\theta$. There are multiple techniques [Citation4] available to compute standard errors; here we choose asymptotic theory because of its ease of implementation.

Asymptotic theory for standard errors

If we assume that the covariance matrix $V_0(t)$ is constant over time ($V_0(t) \equiv V_0 = \operatorname{Var}(\mathcal{E}(t_j)) = \operatorname{diag}(\sigma_{0,1}^2, \sigma_{0,2}^2, \ldots, \sigma_{0,N}^2)$), then we may use an ordinary least squares (OLS) framework to estimate standard errors. Once an optimal $\tau = \{t_j\}_{j=1}^{n}$, $t_0 \le t_1 < t_2 < \cdots < t_n \le t_f$, and $C$ are determined, we obtain data from an experiment or simulate data $\{y_j\}$ as a realization of the random process $\{Y_j\}$ described in (5), and then we estimate the parameters in $\theta$ by solving the inverse problem using the OLS criterion.[Citation1] The discrete OLS estimator is defined as
$$\theta_{OLS} = \operatorname*{arg\,min}_{\theta\in\Theta} \sum_{j=1}^{n} [Y_j - f(t_j;\theta)]^T V_0^{-1} [Y_j - f(t_j;\theta)], \qquad (16)$$
where $\Theta$ is the set of all admissible values of $\theta$, so that each element of the difference vector $Y_j - f(t_j;\theta)$ is weighted using the variance of its corresponding sampling map. One realization of this problem using data $y_j$, $j = 1,2,\ldots,n$, is written
$$\hat\theta_{OLS} = \operatorname*{arg\,min}_{\theta\in\Theta} \sum_{j=1}^{n} [y_j - f(t_j;\theta)]^T V_0^{-1} [y_j - f(t_j;\theta)]. \qquad (17)$$
However, calculating $\hat\theta_{OLS}$ still requires the unknown $V_0$. If the number $p$ of parameters is sufficiently small and the number $n$ of observations sufficiently large that $p < n$, then we may calculate the bias-adjusted estimate of the variances
$$V_0 \approx \hat V = \operatorname{diag}\left( \frac{1}{n-p} \sum_{j=1}^{n} [y_j - f(t_j;\theta)][y_j - f(t_j;\theta)]^T \right), \qquad (18)$$
and find the estimate of $\theta_0$ using
$$\theta_0 \approx \hat\theta_{OLS} = \operatorname*{arg\,min}_{\theta\in\Theta} \sum_{j=1}^{n} [y_j - f(t_j;\theta)]^T \hat V^{-1} [y_j - f(t_j;\theta)]. \qquad (19)$$
Therefore, finding $\hat\theta_{OLS}$ when $V_0$ is unknown requires solving the coupled system of equations (18) and (19).

We may utilize the asymptotic properties of the OLS estimator (16) to learn about the behaviour of the model (1) and (3). As the number of samples $n \to \infty$, $\theta_{OLS}$ has the following properties [Citation1, Citation13, Citation14]:
$$\theta_{OLS} \sim \mathcal{N}(\theta_0, \Sigma_0^n) \approx \mathcal{N}(\hat\theta_{OLS}, \hat\Sigma^n),$$
where
$$\Sigma_0^n \approx \left( \sum_{j=1}^{n} \chi_j^T(\theta_0)\, V_0^{-1}\, \chi_j(\theta_0) \right)^{-1} \qquad (20)$$
is the $p \times p$ covariance matrix and $\chi_j(\theta) = \chi_j^n(\theta) = \nabla_\theta f(t_j;\theta)$ is the $N \times p$ matrix
$$\chi_j(\theta) = \chi_j^n(\theta) = \begin{pmatrix} \dfrac{\partial f_1(t_j;\theta)}{\partial\theta_1} & \dfrac{\partial f_1(t_j;\theta)}{\partial\theta_2} & \cdots & \dfrac{\partial f_1(t_j;\theta)}{\partial\theta_p} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_N(t_j;\theta)}{\partial\theta_1} & \dfrac{\partial f_N(t_j;\theta)}{\partial\theta_2} & \cdots & \dfrac{\partial f_N(t_j;\theta)}{\partial\theta_p} \end{pmatrix}. \qquad (21)$$
The approximation $\hat\Sigma^n$ to the covariance matrix $\Sigma_0^n$ is
$$\Sigma_0^n \approx \hat\Sigma^n = \left( \sum_{j=1}^{n} \chi_j^T(\hat\theta_{OLS})\, \hat V^{-1}\, \chi_j(\hat\theta_{OLS}) \right)^{-1}. \qquad (22)$$
We may use $\hat\Sigma^n$ to approximate the standard errors of each parameter in $\hat\theta_{OLS}$. For the $k$th element of $\hat\theta_{OLS}$, written $\hat\theta_{OLS,k}$, the asymptotic standard error is
$$ASE(\theta_{0,k}) = \sqrt{\Sigma_{0,kk}^n} \approx ASE(\hat\theta_{OLS,k}) = \sqrt{\hat\Sigma_{kk}^n},$$
where $\Sigma_{0,kk}^n$ is the element in the $k$th row and $k$th column of $\Sigma_0^n$ and $\hat\Sigma_{kk}^n$ is the corresponding element of $\hat\Sigma^n$. Additionally, since the FIM is defined to be the inverse of the covariance matrix, we may use (22) to approximate the FIM by $F(\tau,C,\theta_0) \approx F(\hat\tau, \hat C, \hat\theta_{OLS}) = (\hat\Sigma^n)^{-1}$.
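Computationally, (22) and the asymptotic standard errors amount to assembling and inverting the approximate FIM built from the observation sensitivities $\chi_j(\hat\theta_{OLS})$. The sketch below assumes those sensitivities and the estimated covariance $\hat V$ are available; the dimensions and numerical values are illustrative and not tied to either example model in this paper.

```python
import numpy as np

def asymptotic_standard_errors(chi, V_hat):
    """chi: array of shape (n, N, p) holding chi_j(theta_hat); V_hat: N x N covariance.
    Returns (Sigma_hat, ASE) per (22): Sigma_hat = (sum_j chi_j^T V_hat^{-1} chi_j)^{-1},
    ASE_k = sqrt(Sigma_hat_kk)."""
    V_inv = np.linalg.inv(V_hat)
    F = sum(c.T @ V_inv @ c for c in chi)     # approximate FIM, i.e. (Sigma_hat)^{-1}
    Sigma_hat = np.linalg.inv(F)
    return Sigma_hat, np.sqrt(np.diag(Sigma_hat))

# Illustrative numbers: n = 4 sampling times, N = 2 observables, p = 3 parameters.
rng = np.random.default_rng(1)
chi = rng.normal(size=(4, 2, 3))
V_hat = np.diag([0.05**2, 0.02**2])
Sigma_hat, ase = asymptotic_standard_errors(chi, V_hat)
print(ase)
```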

Example 1: HIV model

After validating the code for time point selection on simple models such as those in [Citation4], we examined the performance of both the time point and observation operator selection algorithms on the log-scaled version of the HIV model developed in [Citation15]. While the analytic solution of this system cannot easily be found, this model has been studied, improved and successfully fit to, and indeed validated with, several sets of longitudinal data using parameter estimation techniques.[Citation7, Citation16, Citation17] Additionally, the sampling or observation operators used to collect data in a clinical setting, as well as the relative usefulness of these sampling techniques, are known.[Citation7] The model includes uninfected and infected CD4+ T-cells, called type 1 target cells ($T_1$ and $T_1^*$, respectively); uninfected and infected macrophages (subsequently determined to more likely be resting or inactive CD4+ T-cells), called type 2 target cells ($T_2$ and $T_2^*$); infectious free virus $V_I$; and the immune response $E$ produced by cytotoxic CD8+ T-lymphocytes. The HIV model with treatment function $u(t)$ is given by
$$\begin{aligned}
\dot T_1 &= \lambda_1 - d_1 T_1 - (1 - \epsilon_1 u(t))\, k_1 V_I T_1,\\
\dot T_2 &= \lambda_2 - d_2 T_2 - (1 - f\epsilon_1 u(t))\, k_2 V_I T_2,\\
\dot T_1^* &= (1 - \epsilon_1 u(t))\, k_1 V_I T_1 - \delta T_1^* - m_1 E T_1^*,\\
\dot T_2^* &= (1 - f\epsilon_1 u(t))\, k_2 V_I T_2 - \delta T_2^* - m_2 E T_2^*,\\
\dot V_I &= (1 - \epsilon_2 u(t))\, 10^3 N_T \delta (T_1^* + T_2^*) - \big[c + (1 - \epsilon_1 u(t))\,\rho_1 10^3 k_1 T_1 + (1 - f\epsilon_1 u(t))\,\rho_2 10^3 k_2 T_2\big] V_I,\\
\dot E &= \lambda_E + \frac{b_E (T_1^* + T_2^*)}{T_1^* + T_2^* + K_b} E - \frac{d_E (T_1^* + T_2^*)}{T_1^* + T_2^* + K_d} E - \delta_E E. \qquad (23)
\end{aligned}$$
The log-scaled HIV model, which is used to alleviate difficulties caused by the large differences in scale among the variables and parameters, is
$$\begin{aligned}
\dot x_1 &= \frac{10^{-x_1}}{\ln(10)} \Big( \lambda_1 - d_1 10^{x_1} - (1 - \epsilon_1 u(t))\, k_1 10^{x_1 + x_5} \Big),\\
\dot x_2 &= \frac{10^{-x_2}}{\ln(10)} \Big( \lambda_2 - d_2 10^{x_2} - (1 - f\epsilon_1 u(t))\, k_2 10^{x_5 + x_2} \Big),\\
\dot x_3 &= \frac{10^{-x_3}}{\ln(10)} \Big( (1 - \epsilon_1 u(t))\, k_1 10^{x_5 + x_1} - \delta 10^{x_3} - m_1 10^{x_6 + x_3} \Big),\\
\dot x_4 &= \frac{10^{-x_4}}{\ln(10)} \Big( (1 - f\epsilon_1 u(t))\, k_2 10^{x_5 + x_2} - \delta 10^{x_4} - m_2 10^{x_6 + x_4} \Big),\\
\dot x_5 &= \frac{10^{-x_5}}{\ln(10)} \Big( (1 - \epsilon_2 u(t))\, 10^3 N_T \delta (10^{x_3} + 10^{x_4}) - c\, 10^{x_5} - (1 - \epsilon_1 u(t))\,\rho_1 10^3 k_1 10^{x_1 + x_5} - (1 - f\epsilon_1 u(t))\,\rho_2 10^3 k_2 10^{x_2 + x_5} \Big),\\
\dot x_6 &= \frac{10^{-x_6}}{\ln(10)} \Big( \lambda_E + \frac{b_E (10^{x_3} + 10^{x_4})}{10^{x_3} + 10^{x_4} + K_b} 10^{x_6} - \frac{d_E (10^{x_3} + 10^{x_4})}{10^{x_3} + 10^{x_4} + K_d} 10^{x_6} - \delta_E 10^{x_6} \Big), \qquad (24)
\end{aligned}$$
where $T_1 = 10^{x_1}$, $T_2 = 10^{x_2}$, $T_1^* = 10^{x_3}$, $T_2^* = 10^{x_4}$, $V_I = 10^{x_5}$ and $E = 10^{x_6}$.

In [Citation7] this model's parameters are estimated for each of 45 different patients who were in a treatment programme for HIV at Massachusetts General Hospital (MGH). These individuals experienced interrupted treatment schedules, that is, periods in which the patient did not take any medication for viral load control. Seven of these patients adhered to a structured treatment interruption schedule planned by the clinician. We use the parameters estimated to fit the data of one of these patients, Patient 4 in [Citation7], as our "true" parameters $\theta_0$ for this model. This patient was chosen because the patient continued treatment for an extended period of time (1919 days), the corresponding data set contains 158 measurements of viral load and 107 measurements of CD4 cell count ($T_1 + T_1^*$) that exhibit a response to interruption in treatment, and the estimated parameters yield a model exhibiting trends similar to those in the data.

Table 1 Values of parameters in the HIV model (23).

Sensitivity analysis performed in [Citation15] and parameter subset selection carried out in [Citation18] identified subsets of the 20 parameters and six initial conditions that would likely be reliably estimated when solving the corresponding inverse problems. Based on their results, we treat the subset of parameters $\theta = (\lambda_1, d_1, \epsilon_1, k_1, \epsilon_2, N_T, c, b_E, x_1^0, x_2^0, x_5^0)$ as unknown and fix all other parameters. The values for the fixed parameters and $\theta_0$ are computed in [Citation7] and given in Table 1. It is important to choose parameters to which the model is sensitive; a poor choice for the components of $\theta$ will negatively affect the sensitivity matrices and may lead to a near-singular FIM. Based on currently available measurement methods, we allow the possible types of observations (1) infectious virus $x_5$, (2) immune response $x_6$, (3) total activated CD4 cells $x_1 + x_3$ and (4) type 2 (resting or unactivated) target cells $x_2 + x_4$, each with an assumed error variance of 10% of the initial variable values given by $x_0 = (3.0799, 1.2443, 1.7899, 0.2150, 5.9984, 0.7251)^T$.
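For concreteness, the four clinically available observables can be encoded as functions of the log-scaled state $x = (x_1, \ldots, x_6)$. The short sketch below is our own encoding (with the combined cell counts written as logs of summed counts, as in the figure captions that follow), evaluated at the initial state reported above; it is illustrative only.

```python
import numpy as np

# Candidate observation maps for the log-scaled HIV model (24); x = (x1, ..., x6).
candidate_maps = {
    "infectious virus":      lambda x: x[4],                           # x5
    "immune response":       lambda x: x[5],                           # x6
    "CD4 count":             lambda x: np.log10(10**x[0] + 10**x[2]),  # from x1, x3
    "type 2 target cells":   lambda x: np.log10(10**x[1] + 10**x[3]),  # from x2, x4
}

# Initial log-scaled state as reported in the text.
x0 = np.array([3.0799, 1.2443, 1.7899, 0.2150, 5.9984, 0.7251])
for name, h in candidate_maps.items():
    print(f"{name}: {h(x0):.4f}")
```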

HIV observable selection results, times fixed

To simulate the experience of a clinician gathering data as a patient regularly returns for scheduled testing, we fix the sampling times and choose the optimal observables. We consider choices of N=1, 2, 3 sampling operators out of the four possible observables, all N of which will be measured at n=51, 101, 201, 401 evenly spaced times over 2000 days, corresponding to measurements every 40, 20, 10 and 5 days, respectively. The N optimal sampling maps are determined for each of the three optimal design criteria, and the asymptotic standard errors (ASE) are calculated after carrying out the corresponding inverse problem calculations.

Table 2 Number of observables, number of time points, observables selected by the D-optimal cost functional, and the minimum and maximum standard error and associated parameter for the parameter subset in the HIV model (24).

Table 3 Number of observables, number of time points, observables selected by the E-optimal cost functional, and the minimum and maximum standard error and associated parameter for the parameter subset in the HIV model (24).

Table 4 Number of observables, number of time points, observables selected by the SE-optimal cost functional, and the minimum and maximum standard error and associated parameter for the parameter subset in the HIV model (24).

Tables 2, 3 and 4 display the optimal observation operators determined by the D-, E- and SE-optimal design cost functions, respectively, as well as the lowest and highest ASE. For all three optimal design criteria there was a distinct best choice of observables (listed in Tables 2-4) for each pair of n and N. When only N=1 observable could be measured, each design criterion consistently picked the same observable for all n; similarly, at N=2, both the D-optimal and SE-optimal design criteria were consistent in their selection over all n, and E-optimal differed only at n=401. Even at N=3, each optimal design method specified at least two of the same observables at all n.

The observables that were rated best changed between criteria, affirming the fact that each optimal design method minimizes different aspects of the standard error ellipsoid. At N=1 observable, D-optimal selects the CD4 cell count while E-optimal and SE-optimal choose the infectious virus count. As a result, the min(ASE) calculated for a parameter estimation problem using the D-optimal observables is approximately 1/3 lower than the min(ASE) of E- and SE-optimal for all tested time point distributions. Similarly, the max(ASE) calculated for E- and SE-optimal designed parameter estimation problems is approximately 1/3 lower than that of D-optimal. Thus at N=1, based on minimum and maximum asymptotic standard errors, there is no clear best choice of an optimal design cost function.

At N=2 allowed observables, both the D- and SE-optimal cost functions are minimized by an observation operator containing both activated CD4 cells (type 1 target cells) and type 2 target cells. The E-optimal cost function still favours infectious virus count, in addition to type 2 target cells (at n=51, 101, 201) or type 1 target cells (at n=401). As a result, max(ASE) in the E-optimal design parameter estimation problem is 0% to 20% lower than that of D- or SE-optimal; however, the D- and SE-optimal designs reduce min(ASE) by 20% to 30% of the min(ASE) of E-optimal. Therefore at N=2, the E-optimal cost functional would be preferable if the largest ASE's were to be reduced, but for the best overall improvement (as measured by percent reduction from the E-optimal ASE), D- and SE-optimal are recommended.

Table 5 Approximate asymptotic standard errors calculated by asymptotic theory (22) using optimally spaced n=35 time points under constraints C2 (left set of 3) and C3 (right set of 3) and optimally selected N=2 observables for the parameters of interest θ of the HIV model (24). The smallest ASE per parameter is highlighted in bold font.

Fig. 1 Solution of (24) plotted using the observables $\log(10^{x_1}+10^{x_3})$ (upper left), $\log(10^{x_2}+10^{x_4})$ (upper right), $x_5$ (lower left) and $x_6$ (lower right). Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=35 times under constraint C2.

When selecting N=3 observables, the three design criteria select many of the same observables; this is to be expected, since only four candidate observables are available in this simulation. For n=51, 101, 201, both the total CD4 cell count and the immune response E(t) are selected by all design criteria. The D-optimal criterion also chooses the type 2 cell count, so the lack of information on the virus count as measured by $x_5(t)$ leads to its high max(ASE), namely ASE($x_5^0$). E-optimal (and, at larger n, SE-optimal) chooses to measure the infectious virus count, reducing ASE($x_5^0$) and thus reducing max(ASE) by more than 50%. While at low n E-optimal has the lowest min(ASE) and max(ASE), SE-optimal performs better at high n, so when selecting N=3 observables the number of time points n may affect which optimal design cost function performs best.

HIV observables and time point selection results

When taking samples on a uniform time grid, the D-, E- and SE-optimal design criteria all choose observation operators that yield favourable ASE's, with some criteria performing best under certain circumstances. For example, the SE-optimal observables at n=401, N=3 yield the smallest standard errors; however, for all other values of n at N=3, E-optimal performs best. At N=2, E-optimal is a slightly weaker scheme. The examples in [Citation4] also reveal that D-, E- and SE-optimal designs are all competitive when selecting only time points for several different models. We now investigate the performance of these three criteria when selecting both an observation operator and a sampling time distribution using the algorithm described by Equations (14) and (15).

To maintain consistency across trials while slightly simplifying the parameter estimation problem, we treat the set of six parameters and three initial conditions $\theta = (\lambda_1, d_1, k_1, N_T, c, b_E, x_1^0, x_2^0, x_5^0)$ as unknown and fix all other parameters. We again allow the possible observations of (1) infectious virus $x_5$, (2) immune response $x_6$, (3) CD4 cells $x_1 + x_3$ and (4) type 2 target cells $x_2 + x_4$, each with an assumed error variance of 5% of the initial variable values given by $x_0$. To reflect the often limited resources of a clinical trial, we allow N=2 observation maps to be included in the observation operator C and examine the distribution of time points when n=35 or n=105 samples (each consisting of all observables in the observation operator) may be taken from $t_0 = 0$ through $t_f = 1460$. We begin all simulations with uniformly spaced sample times and use either time grid constraint C2 or C3. Both constraints assume that samples are taken at $t_0$ and $t_f$, so in effect we are optimizing the remaining 33 or 103 observation times.

When choosing N=2 observables and distributing n=35 time points using constraint C2, the D-optimal cost function yields the lowest ASE for the parameters $(\lambda_1, d_1, N_T, c, b_E, x_1^0)$, but the SE-optimal ASE's are of the same order of magnitude (Table 5). Both the D- and SE-optimal cost functions selected the same observables, and both were minimized by similar distributions of time points (Figure 1). The E-optimal design yields the smallest ASE for $(k_1, x_2^0, x_5^0)$, with an ASE($x_5^0$) that is half that of D- and SE-optimal. E-optimal, however, trails D- and SE-optimal in the accuracy of an estimate for $b_E$, with an ASE($b_E$) that is larger by an order of magnitude. The large difference in ASE's for these two parameters may be related to the observables selected by each design criterion: the E-optimal design chooses $x_5$ (the variable most closely related to $x_5^0$) as an observable, whereas D- and SE-optimal choose $x_6$ (whose right-hand side contains $b_E$). All three also chose $x_2 + x_4$. The distribution of time points selected under each optimal design criterion focuses on regions shortly before or after a change in behaviour of the selected observables (Figure 1). This may indicate that fewer sampling points could yield parameter estimates of similar accuracy. For the experimental design constraint of n=35 time points allocated using constraint C2, the D-optimal cost functional is best for most parameters, but E-optimal is best for some parameters.

The ASE calculated when choosing N=2 observables and distributing n=35 time points using constraint C3 are very similar (Table 5), and the selected observables remain unchanged; however, the optimal distribution of time points under each design criterion (Figure 2) is more uniform than when using constraint C2 (Figure 1). For small n with N=2 observables under constraint C3, D-optimal design is best for most parameters, but E-optimal is best for the remaining few. While the choice of a particular time point distribution constraint does not greatly impact the calculated ASE's for the parameters of interest, constraint C3 yields a measurement schedule that may be more feasible for a patient to follow in that it does not call for as many days with multiple observations.

Fig. 2 Solution of (24) plotted using the observables $\log(10^{x_1}+10^{x_3})$ (upper left), $\log(10^{x_2}+10^{x_4})$ (upper right), $x_5$ (lower left) and $x_6$ (lower right). Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=35 times under constraint C3.

Table 6 Approximate asymptotic standard errors calculated by asymptotic theory (22) using optimally spaced n=105 time points under constraints C2 (left set of 3) and C3 (right set of 3) and optimally selected N=2 observables for the parameters of interest θ of the HIV model (24). The smallest ASE per parameter is highlighted in bold font.

Fig. 3 Solution of (24) plotted using the observables $\log(10^{x_1}+10^{x_3})$ (upper left), $\log(10^{x_2}+10^{x_4})$ (upper right), $x_5$ (lower left) and $x_6$ (lower right). Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=105 times under constraint C2.

Choosing N=2 observables and distributing n=105 time points using constraint C2 leads to a different pattern in which optimal design criterion is best for which parameter. For this denser time point distribution, SE-optimal design is a much stronger candidate against D- and E-optimal (Table 6). It yields the lowest ASE's for $(\lambda_1, N_T, c, b_E)$, while E-optimal is best for $(k_1, x_1^0, x_2^0, x_5^0)$ and D-optimal is best for $(d_1, b_E)$. The D- and SE-optimal cost functions again choose the same observables, $x_6$ and $x_2 + x_4$, so their ASE's for all parameters are similar. In this scenario, the E-optimal design has the largest percent reduction in ASE from those of D- and SE-optimal for the parameters $x_1^0$ and $x_5^0$. E-optimal also changed its selected observables, to $x_1 + x_3$ and $x_2 + x_4$. The distribution of time points for the D-optimal design criterion appears very close to uniform (Figure 3), and the distributions for E- and SE-optimal are clustered near local maxima, local minima and other large changes in behaviour of the observables; however, even slight differences between the distributions of the E- and SE-optimal cost functions are visible. Consider the graph of infectious virus count in Figure 3. There is a cluster of SE-optimal times between t=1100 and t=1250, but no E-optimal times occur during that interval. Very soon after, between t=1250 and t=1400, there is a cluster of E-optimal times but no SE-optimal times. These clusters may indicate either that these periods in the patient's treatment history are key to characterizing the parameters or that fewer time points may be adequate to obtain sufficiently accurate parameter estimates.

Fig. 4 Solution of (24) plotted using the observables $\log(10^{x_1}+10^{x_3})$ (upper left), $\log(10^{x_2}+10^{x_4})$ (upper right), $x_5$ (lower left) and $x_6$ (lower right). Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=105 times under constraint C3.

Table 7 Top: Optimal 5 observables chosen by each optimal design criterion when estimating the parameters $\theta_a$ of the model in [Citation6]. Bottom: Approximate asymptotic standard errors calculated using asymptotic theory (22) for each parameter in $\theta_a$ using the optimal 5 observables with time point constraints of uniform spacing, constraint C2 and constraint C3. The smallest ASE for each parameter per time point constraint is highlighted in bold font.

The strength of SE-optimal design for large n does not hold for time grid constraint C3 (Table 6). E-optimal design provides the lowest ASE for the parameters $(\lambda_1, d_1, k_1, x_1^0)$, D-optimal is best for $(N_T, c, b_E, x_2^0, x_5^0)$, and the ASE calculated using the SE-optimal designed experiment is often the largest. The observables selected in this case are the same as in the n=105, constraint C2 case. The optimal time point distributions determined under all three design criteria are nearly uniform (Figure 4). Small differences between the distributions may be observed where the solution curves have slopes of large magnitude, indicating that the distributions are not exactly uniform. As the time point optimization routine is started from a uniform time point distribution, this may indicate that, when samples are taken at many times during an experiment, a uniform distribution is somewhat near optimal for model (24) or, more significantly, that a uniform distribution is not an appropriate initial distribution from which to obtain the optimal time point distribution.

Between D- and E-optimal there is no clear leader for optimal time point and observable selection in these examples for system (24). The SE-optimal criterion is useful in estimating some parameters under constraint C2 and in most cases yields standard errors on the same order of magnitude as the leading optimal design criterion. The optimal design algorithm successfully identifies observables and a time distribution that optimize an aspect of the Fisher Information Matrix as determined by the choice of optimal design cost functional; however, the time point distributions exhibit a tendency to cluster near times when the slopes of the solution curves are changing. We proceed to examine the algorithm's performance on a system that offers many possible observables and does not require as many sampling times.

Example 2: Zhu’s Calvin Cycle model

The second model we use as an example is characteristic of the large differential equation systems that often appear in industrial problems. In [Citation6], Zhu et al. present an ODE model for the Calvin cycle in fully grown spinach. The Calvin cycle, part of the light-independent reactions of photosynthesis, plays an important role in plant carbon fixation, which leads to the growth of the plant. The model contains 165 parameters and initial conditions, 31 ODEs and 7 concentration balance laws, and involves 38 state variables (metabolite concentrations in different parts of the cell) as well as a calculated photosynthetic CO2 uptake rate. The metabolites used in the model are denoted by RuBP, PGA, DPGA, T3P, FBP, E4P, S7P, ATP, SBP, NADPH, HexP, PenP, NADHc, NADc, ADPc, ATPc, GLUc, KGc, ADP in photorespiration, ATP in photorespiration, GCEA, GCA, PGCA, GCAc, GOAc, SERc, GLYc, HPRc, GCEAc, T3Pc, FBPc, HexPc, F26BPc, UDPGc, UTPc, SUCP, SUCc and PGAc, and the parameters are mainly initial conditions, maximum reaction velocities and Michaelis–Menten constants for reaction substrates, products, activators and inhibitors. A 'c' following a metabolite name indicates that the model compartment corresponds to the metabolite concentration in the cytosol; names lacking a 'c' refer to the metabolite in the chloroplast stroma. The full system of equations and parameter values may be found in the appendices of [Citation6]. While the model has not been validated with data as thoroughly as the family of HIV models discussed above, it is representative of the models used to describe plant enzyme kinetics and utilizes well-documented Michaelis–Menten enzyme kinetic formulations.[Citation19]

Fig. 5 Solutions of selected state variables in the Zhu model.[Citation6] Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C2 when sampling the optimal 5 observables to estimate $\theta_a$ (Table 7). Top left: carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

Fig. 6 Solutions of selected state variables in the Zhu model.[Citation6] Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C3 when sampling the optimal 5 observables to estimate $\theta_a$ (Table 7). Top left: carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

In [Citation5], sets of the optimal 3, 5, 10 and 15 metabolites are identified as the most useful to measure in an experiment in order to estimate a subset of 6 parameters, $\theta_a = [K_{M11}, K_{M521}, K_{I523}, K_C, K_{M1221}, K_{M1241}]^T$ with true values $\theta_{a0} = [0.0115, 0.0025, 0.00007, 0.0115, 0.15, 0.15]^T$, and a subset of 18 parameters, $\theta_c = [RuBP_0, SBP_0, K_{M11}, K_{M13}, K_{I13}, K_{E4}, K_{M9}, K_{M131}, K_{I135}, K_{E22}, K_{M511}, K_{M521}, K_{I523}, K_C, K_{M1221}, K_{M1241}, V_9, V_{58}]^T$ with true values $\theta_{c0} = [2, 0.3, 0.0115, 0.02, 0.075, 0.05, 0.05, 0.05, 0.4, 0.058, 0.02, 0.0025, 0.00007, 0.0115, 0.15, 0.15, 0.3242, 0.0168]^T$. The simulation was set in the framework of an experiment run for 3000 seconds over which 11 samples are taken at evenly spaced times for the optimization of estimates of $\theta_a$ and 21 samples for the estimation of $\theta_c$.

We use a similar setup for our simulations in order to judge the ability of the time and variable selections to minimize the asymptotic standard errors of both $\theta_a$ and $\theta_c$, using N=5 and N=10 observables and n=11 time points over the time interval $t \in [0, 3000]$. Each metabolite is assigned a variance of 5% of its initial value, and, to reduce computation time, all metabolites that do not change in concentration over time ($dx/dt = 0$) are excluded from the search algorithm. As it is often possible to measure a particular metabolite's concentration in plant tissue, we take $\mathcal{C}_N$ to be composed of 28 vectors in $\mathbb{R}^{1\times 40}$, each with a one in the element corresponding to the position of the differential equation describing the dynamics of that metabolite in the vector of model ODEs and zeros elsewhere.
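Because every candidate observable here is a single metabolite concentration, the set $\mathcal{C}_N$ is simply a collection of unit row vectors, and the global search in the first equation of (15) enumerates all N-element subsets. A small bookkeeping sketch follows; the index set of the 28 time-varying metabolites is a placeholder, not the actual indices from the Zhu model.

```python
import numpy as np
from itertools import combinations
from math import comb

m = 40                                  # length of the state vector in the implementation
# Placeholder: indices of the 28 metabolites whose concentrations change in time.
time_varying = list(range(28))

# C_N: one unit row vector per measurable metabolite.
candidate_maps = [np.eye(m)[k] for k in time_varying]

# Number of candidate observation operators C for N = 5 and N = 10 observables.
print(comb(len(candidate_maps), 5), comb(len(candidate_maps), 10))

# Example: assemble one candidate observation operator C (5 x m) from a subset.
subset = next(combinations(candidate_maps, 5))
C = np.vstack(subset)
```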

We compare the ASE calculated for n=11 evenly spaced measurements over the interval $t \in [0, 3000]$ against the optimally spaced n=11 time points according to constraints C2 and C3, and we plot the D-, E- and SE-optimal time point distributions on the solutions of selected compartments of the Zhu model.[Citation6] The four compartments chosen for graphing are the carbon uptake rate A(t) (units $\mu$mol m$^{-2}$ s$^{-1}$), a rate calculated from the output of the model that describes the plant's efficiency in using the available environmental resources to develop; adenosine triphosphate (ATP) in the chloroplast stroma, a well-known metabolite active in photosynthesis; sucrose, a sugar, in the cytosol (SUCc); and ribulose-1,5-bisphosphate (RuBP), a metabolite connected to the enzyme RuBisCO, which is essential to carbon fixation.

The simplest scenario we test is the selection of the optimal N=5 observables (metabolites) and n=11 time points to use when estimating the six parameters in $\theta_a$. For all three optimal design methods, the observables determined to be optimal under the uniform grid remained optimal after the time point distribution was optimized under constraints C2 and C3. These observables are listed in the upper portion of Table 7. All three optimal design methods identify the observables PGA, SERc and F26BPc, indicating that these three metabolites may be central to accurate estimates of the parameters in $\theta_a$; moreover, the E-optimal and SE-optimal cost functions selected the same set of five observables.

The similarity in results, however, does not extend to the selected time point distributions. Under constraint C2 (Figure 5), the D-optimal distribution is loosely clustered about the center of the time interval, the E-optimal time points are clustered about t=250 seconds, and SE-optimal chooses a small cluster of time points around the initial bump and allows a few samples after the solution reaches a steady state. Using constraint C3 (Figure 6), all three optimal design criteria place a majority of their sampling times before t=600 and allow only a few sampling times after the system reaches a steady state. The optimization of time point distributions yields asymptotic standard errors improved over those of the uniform distribution, sometimes by an order of magnitude or more. For all three time point constraints, D-optimal yields the smallest ASE's for the largest number of parameters, and SE-optimal for the second largest number. Both criteria perform better with the C3 optimal times than with the C2 optimal times. Therefore, in this simple case, using either the D- or SE-optimal design criterion with time point distribution constraint C3 would yield the best results.

The next scenario is the selection of the optimal N=10 observables and n=11 time points to use when estimating the six parameters in $\theta_a$. For all three optimal design methods, the observables determined to be optimal under the uniform grid remained optimal after the time point distribution was optimized under constraints C2 and C3. These observables are listed in the upper portion of Table 8. Of the 28 possible observables, 20 were selected by at least one optimal design criterion, 8 of which (DPGA, T3P, E4P, S7P, ATP, GCA, SERc, T3Pc) were selected by two criteria and one of which, GOAc, was selected by all three. Adding five observables does not have a large effect on the estimated ASE for the six parameters of interest: the ASE's listed in Table 8 for the N=10 observable case are of the same order as those in Table 7 for the N=5 observable case for each time point constraint.

The optimal time point distributions also differ from those selected in the five observable case. Under constraint C2 (Figure 7), the D-optimal distribution is loosely clustered near the boundaries of the time interval, while the E- and SE-optimal time points are scattered over the whole interval. Using constraint C3 (Figure 8), the D-optimal times are spread throughout the middle half of the time interval, the E-optimal times over the first half, and the SE-optimal times over the second half. While there are no easily discernible patterns between the distributions chosen under the C2 and C3 constraints, the optimization of the time point distributions again yields improved asymptotic standard errors over those of a uniform time distribution. For all three time point constraints, D-optimal yields the smallest ASE's for the largest number of parameters, and SE-optimal for the second largest number. Other than for the parameters it estimates best, SE-optimal is often the worst criterion. The time point distribution under constraint C2 allows smaller ASE's in the N=10 observable case. Therefore, using the D-optimal design criterion with time point distribution constraint C2 would yield the best results.

The addition of five observables for $\theta_a$ does not greatly impact most of the calculated ASE's. For the time point constraints C2 and C3, all but one of the minimum ASE's remain on the same order of magnitude, and some of the ASE's actually increase in going from N=5 to N=10 observables. This may indicate that for a small parameter set such as $\theta_a$, only a small amount of information is needed to obtain the best possible results; adding extra information (through additional observables) does not further improve the ASE's.

Table 8 Top: Optimal 10 observables chosen by each optimal design criterion when estimating the parameters $\theta_a$ of the model in [Citation6]. Bottom: Approximate asymptotic standard errors calculated using asymptotic theory (22) for each parameter in $\theta_a$ using the optimal 10 observables with time point constraints of uniform spacing, constraint C2 and constraint C3. The smallest ASE for each parameter per time point constraint is highlighted in bold font.

Fig. 7 Solutions of selected state variables in the Zhu model.[Citation6] Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C2 when sampling the optimal 10 observables to estimate $\theta_a$ (Table 8). Top left: carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

Fig. 8 Solutions of selected state variables in the Zhu model.[Citation6] Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C3 when sampling the optimal 10 observables to estimate $\theta_a$ (Table 8). Top left: carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

Now consider the parameter vector θc. We identify the optimal N=5 observables and the optimal distribution of n=11 time points using the D-, E-, and SE-optimal cost functionals. Three metabolites, RuBP, SBP and F26BPc, are selected by all three cost functionals, indicating that these three observables may be important to measure to accurately estimate a large number of the parameters in θc. The inclusion of RuBP0 and SBP0 in θc may also heavily influence the selection of observables. Many of the other metabolites selected by at least one of the cost functionals, namely, PGA, SERc and T3Pc, were also selected for θa in the experimental setup of N=5 observables and n=11 time points, indicating that they are still relevant to some of the parameters in θc.

Using the uniform time point distribution yields standard errors that are up to six orders of magnitude larger than the parameter value ($K_{I523} = 0.00007$) and are typically two or three orders of magnitude larger than the parameter values (Table 9). Constraint C2 also leads to large standard errors when using the D- and E-optimal cost functions. The E-optimal standard errors, in fact, do not change from those calculated using the uniform time point distribution, and, as shown in Figure 9, the E-optimal time distribution under constraint C2 is uniform. The SE-optimal design criterion performs best when using constraint C2 and produces much smaller standard errors, though the standard errors are still larger than the parameter values by orders of magnitude for most parameters. Both D- and SE-optimal perform favourably under constraint C3: the D-optimal, C3 design produces smaller ASE's than the SE-optimal, C2 design, and the SE-optimal, C3 design yields the smallest ASE's for almost all parameters. Under constraint C3 with the SE-optimal cost functional, the calculated ASE's are zero to one order of magnitude larger than the parameter values.

The favourable performance of the SE-optimal design criterion under constraint C2 may be attributed to its selected time point distribution (Figure 9). The SE-optimal distribution is the only one that places a large number of time points (5 out of 11) before the solution approaches a steady state at t=250. The E-optimal design, as mentioned previously, selects the uniform distribution, and the D-optimal design suggests sampling after the solution has achieved a steady state. When using constraint C3, the E-optimal design selects a distribution that maintains uniform spacing between points in the interior of the time interval (0,3000) (Figure 10), again leading to poor ASE’s. Both the D- and SE-optimal designs place the majority of the 11 time points in the first third of the time interval, thus capturing more of the dynamical system’s behaviour before it achieves a steady state. These front-heavy distributions greatly reduce the estimated ASE’s relative to those calculated using a uniform time distribution.
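
The effect of front-loading the sampling grid can be illustrated on a much simpler system. The sketch below uses a toy logistic model (not part of the Zhu model study; the parameter values, grids and noise level are arbitrary) to compare the asymptotic standard errors obtained from a uniform grid with those from a grid concentrated before the solution reaches steady state.

```python
# Toy illustration: ASE's from a uniform versus a front-loaded sampling grid for a
# logistic model x' = r*x*(1 - x/K) that settles to a steady state. Sensitivities
# are approximated by central differences; all numbers here are arbitrary.
import numpy as np
from scipy.integrate import solve_ivp

def solve_logistic(theta, times, x0=0.1):
    r, K = theta
    sol = solve_ivp(lambda t, x: r * x * (1 - x / K), (0.0, times[-1]), [x0],
                    t_eval=times, rtol=1e-8, atol=1e-10)
    return sol.y[0]

def asymptotic_se(theta, times, sigma2=1e-4, h=1e-6):
    chi = np.zeros((len(times), len(theta)))
    for j in range(len(theta)):
        tp = np.array(theta, dtype=float)
        tm = np.array(theta, dtype=float)
        tp[j] += h
        tm[j] -= h
        chi[:, j] = (solve_logistic(tp, times) - solve_logistic(tm, times)) / (2 * h)
    cov = sigma2 * np.linalg.inv(chi.T @ chi)   # OLS asymptotic covariance
    return np.sqrt(np.diag(cov))

theta = (0.5, 1.0)                                    # growth rate r, carrying capacity K
uniform_grid = np.linspace(1.0, 50.0, 11)             # evenly spaced over [1, 50]
front_grid = np.concatenate([np.linspace(1.0, 15.0, 8), [25.0, 35.0, 50.0]])
print("uniform grid ASE:", asymptotic_se(theta, uniform_grid))
print("front-loaded ASE:", asymptotic_se(theta, front_grid))
```

In this toy setting, placing most of the samples in the transient typically sharpens the estimate of the rate parameter r, mirroring the behaviour seen for the Zhu model under constraint C3.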

Table 9 Top: Optimal 5 observables chosen by each optimal design criterion when estimating the parameters θc of the model in [Citation6]. Bottom: Approximate asymptotic standard errors calculated using asymptotic theory (Equation 25) for each parameter in θc using the optimal 5 observables with time point constraints of uniform spacing, constraint C2 and constraint C3. Smallest ASE for each parameter per time point constraint is highlighted in bold font.

Fig. 9 Solutions of selected state variables in the Zhu model [Citation6]. Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C2 when sampling the optimal five observables to estimate θc (Table 9). Top left: Carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

Fig. 10 Solutions of selected state variables in the Zhu model [Citation6]. Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C3 when sampling the optimal five observables to estimate θc (Table 9). Top left: Carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

The last case tested is designing experiments with each of the three cost functionals to estimate θc when allowed N=10 observables and n=11 time points. As in the case of allowing only five observables for estimation of θc, the metabolites RuBP, SBP and F26BPc are selected by all three optimal design criteria. Additionally, PenP and SERc are selected by all three. A total of 15 different metabolites are chosen across the three optimal design criteria, of which only five are chosen by a single criterion and five are selected by two criteria (Table 10). The three optimal design criteria show more agreement on which 10 observables are important for θc than on which 10 are important for θa (in that case, a total of 20 different metabolites are chosen). This indicates that the three design criteria agree on observables that are central to understanding the behaviour of the model while also selecting observables that help minimize their individual cost functionals.
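
The kind of observable selection reported above can be mimicked, in schematic form, by scoring every candidate subset with a design cost built from the corresponding rows of the sensitivity array. The sketch below is only a brute-force illustration under an OLS assumption, not the iterative algorithm proposed in this paper; the sensitivity array is a random placeholder.

```python
# Schematic brute-force selection of N observables by a D-optimal-type cost.
# `sens` has shape (n_observables, n_times, n_params); here it is random filler.
import itertools
import numpy as np

def d_cost(sens, subset):
    """Stack the chosen observables' sensitivities and return det of the covariance."""
    chi = np.vstack([sens[i] for i in subset])          # (len(subset)*n_times, n_params)
    return np.linalg.det(np.linalg.inv(chi.T @ chi))    # smaller is better

def best_subset(sens, n_choose):
    subsets = itertools.combinations(range(sens.shape[0]), n_choose)
    return min(subsets, key=lambda s: d_cost(sens, s))

# Toy usage: choose 3 of 8 candidate observables for a 4-parameter model.
rng = np.random.default_rng(1)
sens = rng.normal(size=(8, 11, 4))
print(best_subset(sens, n_choose=3))
```

For realistic numbers of candidate observables an exhaustive search of this kind quickly becomes expensive, which is one reason an iterative selection procedure is attractive in practice.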

Similar to the five-observable case for θc, the ASE’s calculated when using a time point distribution optimized under the C2 or C3 constraint are smaller than those of the uniform time point distribution (though the difference is much less pronounced). While the ASE’s from the D- and SE-optimal criteria are typically smallest under the uniform time point distribution, the E-optimal designed experiment yields the smallest ASE’s for the largest number of parameters under constraint C2 (Table 10). The ASE’s for the D-optimal experiment under constraint C2 are typically larger than those for the D-optimal experiment with a uniform time distribution, and the SE-optimal ASE’s see only marginal improvements. For parameters for which the E-optimal ASE’s are not as small as those from the D- and SE-optimal designs, they are often still of the same order. These ASE’s remain orders of magnitude larger than the values of some parameters. The minimum ASE’s under constraint C3 are smaller than those under constraint C2 (and are notable improvements over those from the uniform time point distribution), but most are improved by only one order of magnitude or less. The D- and SE-optimal cost functions also yield the smallest standard errors for more parameters under C3 than under C2.

The performance improvements seen in the D- and SE-optimal experimental designs in moving from the C2-optimal time point distributions to those optimal under constraint C3 can be readily explained. Under constraint C2, neither criterion produced clusters of time points (Figure 11). The distributions generated by both criteria are more similar to the uniform distribution that was used as the initial seed for the time point optimization than to the highly clustered distributions seen for other selections of n and N (such as the one shown in Figure ). The D-optimal distribution also contains only one time point before t=500, thus ignoring the early dynamics of the system. The E-optimal time point distribution contains clusters at the very beginning (t<200) and very end (t>2900) of the time interval, as well as a loose cluster between t=1500 and t=2000.

All three optimal design criteria produce non-uniform distributions under constraint C3 (Figure 12). The time point distributions determined using the E- and SE-optimal cost functions place a majority of their time points before t=1000, capturing the dynamics early in the system solution. The D-optimal distribution also concentrates its sampling times before t=1000, but does not cluster the time points as tightly. All three cost functionals under time point constraint C3 produce smaller ASE’s than those calculated with a uniform time point distribution (Table 10). The distributions generated under both constraints C2 and C3 indicate that heavily weighting time periods in which the system is changing its pattern of behaviour (such as transitioning from oscillations to a steady state) produces smaller standard errors than a uniform or near-uniform distribution.

Table 10 Top: Optimal 10 observables chosen by each optimal design criterion when estimating the parameters θc of the model in [Citation6]. Bottom: Approximate asymptotic standard errors calculated using asymptotic theory (Equation 25) for each parameter in θc using the optimal 10 observables with time point constraints of uniform spacing, constraint C2 and constraint C3. Smallest ASE for each parameter per time point constraint is highlighted in bold font.

Fig. 11 Solutions of selected state variables in the Zhu model [Citation6]. Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C2 when sampling the optimal 10 observables to estimate θc (Table 10). Top left: Carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

Fig. 12 Solutions of selected state variables in the Zhu model [Citation6]. Plotted on top of each curve are the D-optimal (circle), E-optimal (square) and SE-optimal (x) n=11 sampling times under constraint C3 when sampling the optimal 10 observables to estimate θc (Table 10). Top left: Carbon uptake rate A(t); top right: ATP; bottom left: SUCc; bottom right: RuBP.

Our computations using the Zhu model of [Citation6] for the Calvin cycle indicate that both the selection of observables and the distribution of sampling times across the experiment affect the estimated asymptotic standard errors – sometimes by several orders of magnitude. While the ASE’s calculated for all exercises were often larger than the parameter values, this may be related to the model’s tendency to quickly reach a steady state.

Overall, the optimal design cost functions performed best under time point constraint C3, followed by C2 and finally the uniform distribution. While in some examples a particular cost functional did particularly well (such as D-optimal in the θa, 10-observable case and SE-optimal in the θc, 5-observable case), it is hard to determine a priori – or even from the results of a uniform time distribution – which cost functional will perform best for a particular set of parameters and observables. The reduction in ASE’s gained from adding observables is also difficult to predict. Adding five observables to better estimate θa did not result in noticeably smaller estimated asymptotic standard errors as calculated by asymptotic theory. The improvement may be better quantified using a different method of computing standard errors, such as Monte Carlo simulation or bootstrapping, but that is beyond the scope of this initial investigation.
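
As a point of comparison for such future work, standard errors can also be approximated without asymptotic theory. The following is a brief residual-bootstrap sketch on a simple exponential-decay model; the model, data and settings are hypothetical stand-ins and are not drawn from either example in this paper.

```python
# Residual-bootstrap standard errors for a simple nonlinear model y = a*exp(-b*t).
# Everything below is an illustrative stand-in, not data or code from this study.
import numpy as np
from scipy.optimize import least_squares

def model(theta, t):
    a, b = theta
    return a * np.exp(-b * t)

def bootstrap_se(t, y, theta_hat, n_boot=300, seed=0):
    rng = np.random.default_rng(seed)
    fitted = model(theta_hat, t)
    residuals = y - fitted
    estimates = []
    for _ in range(n_boot):
        # Resample residuals with replacement and refit the model.
        y_star = fitted + rng.choice(residuals, size=len(t), replace=True)
        refit = least_squares(lambda th: model(th, t) - y_star, theta_hat)
        estimates.append(refit.x)
    return np.std(np.array(estimates), axis=0, ddof=1)

# Synthetic data, an initial fit, then bootstrap standard errors.
t = np.linspace(0.0, 5.0, 25)
rng = np.random.default_rng(42)
y = model(np.array([2.0, 0.8]), t) + 0.05 * rng.normal(size=t.size)
theta_hat = least_squares(lambda th: model(th, t) - y, np.array([1.0, 1.0])).x
print("estimate:", theta_hat, "bootstrap SE:", bootstrap_se(t, y, theta_hat))
```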

Conclusion

Expanding the efforts reported in [Citation4] on time point distribution selection using the D-, E- and SE-optimal design criteria, we introduce a new methodology and algorithm for selecting both optimal observables and an optimal time point distribution. While the D- and E-optimal cost functions are well established in the literature, the SE-optimal design method is relatively new but competitive. We compare the abilities of these three design criteria to reduce the estimated asymptotic standard errors of selected subsets of parameters for two ordinary differential equation models representative of ODE systems used in industrial applications: the log-scaled HIV model (27) and the model of [Citation6] for the Calvin cycle. These examples suggest the strengths of each design method in selecting appropriate observable variables and sampling times.

Collecting patient data for the treatment of HIV is limited by the types and expenses of available assays. In efforts with experimental data,[Citation7] parameter estimation for the HIV model (27) has only been performed with one or two observed quantities. Our tests for determining the optimal sets of 1, 2 and 3 observables when data are collected at uniformly distributed times show that the SE-optimal design criterion performs very similarly to the D-optimal criterion; moreover, a small number of observables may be compensated for by allowing more sampling times. While measuring only one observable may not be adequate for accurate parameter estimates, taking either 201 or 401 measurements of two observables yielded standard errors similar to those of an experiment in which three observables are sampled 51–101 times.

For the HIV model, we compared the performance of all three design criteria under time point distribution constraints C2 and C3. When determining the optimal distribution of 35 time points, D-optimal yields the smallest ASE’s for the most parameters, followed by E-optimal, for both time point constraints. When determining the optimal distribution of 105 time points, however, D-, E-, and SE-optimal each yield the smallest ASE’s for some of the parameters under C2, and both D- and E-optimal are strong performers under constraint C3. The SE-optimal standard errors, while not the lowest, are on the same order of magnitude as those of D- and E-optimal. Therefore for our selected parameter values, D- and E-optimal more reliably yield the smallest standard errors.

The Zhu model of [Citation6] is representative of experimental environments in which a great number of state variables could in principle be measured but the costs of measurement limit the number of observed variables that may be sampled. We compared the estimated asymptotic standard errors calculated using the observables and time point distributions determined with the D-, E-, and SE-optimal design methods. For the small parameter subset θa, a small number of observables is sufficient to obtain ASE’s that are as small as possible; adding more observables does not improve the ASE’s for any of the design methods. Optimizing the time point distribution under either constraint C2 or C3 improves the ASE’s by up to two orders of magnitude. At both 5 and 10 observables, the D-optimal design provides the smallest ASE’s for the most parameters, followed by SE-optimal. The E-optimal standard errors, however, are often of the same order of magnitude, indicating that it remains competitive.

When treating the larger parameter subset θc of the Zhu model with each of the three design methods, we found that while an excellent selection of observables and time points may still yield small standard errors (as with the SE-optimal design with 5 observables under constraint C3), adding more information through additional observables reduces the ASE’s calculated under the uniform time distribution. At 5 observables, the SE-optimal cost function performs best and the E-optimal time distribution optimization fails to venture far from the uniform grid initially used; at 10 observables, E-optimal performs best under constraint C2, and all three design criteria perform best for different parameters under constraint C3. Thus for the Zhu model, it is difficult to predict which optimal design criterion will perform best in a particular case.

While the performance of each optimal design criterion is highly dependent upon the ODE system, the parameter subset, the numbers of observables and time points allowed, and even the constraints imposed on the time point distribution selection, the examples using the HIV model (27) and the Zhu model [Citation6] demonstrate that the D-, E-, and SE-optimal design methods are all competitive and useful. Moreover, in the Zhu model examples, selection of optimal time points can reduce the estimated asymptotic standard errors of parameters by several orders of magnitude, thus providing data that are more useful in a parameter estimation problem without taking more samples or measuring more observables. The new methodology developed and illustrated here can therefore be an important tool in designing experimental protocols for obtaining data to be used in estimating parameters of complex dynamic models.

Acknowledgments

This research was supported in part (HTB and KLR) by grant number R01AI071915-07 from the National Institute of Allergy and Infectious Diseases and in part (KLR) by fellowships from the CRSC and CQSB. The authors are grateful to Dr. Laura Potter of Bioinformatics, Syngenta Biotechnology, Inc., Research Triangle Park, NC, for her numerous conversations on plant biology experimentation and for initially providing, among a number of other references, the reference [Citation6].

References

  • Banks HT, Davidian M, Samuels JR, Sutton KL. Chapter 11, An inverse problem statistical methodology summary, CRSC Technical Report CRSC-TR08-01, NCSU, 2008. In: Chowell G, Hayman JM, Bettencourt LMA, Castillo-Chavez C, editors. Statistical estimation approaches in epidemiology. Berlin: Springer; 2009. p. 249–302.
  • Banks HT, Dediu S, Ernstberger SL, Kappel F. Generalized sensitivities and optimal experimental design, CRSC Technical Report CRSC-TR08-12, NCSU; 2008, revised 2009 [J. Inverse Ill-posed Probl. 2010;18:25–83].
  • Thomaseth K, Cobelli C. Generalized sensitivity functions in physiological system identification. Ann. Biomed. Eng. 1999;27:607–616.
  • Banks HT, Holm K, Kappel F. Comparison of optimal design methods in inverse problems, CRSC Technical Report CRSC-TR10-11, NCSU; 2010 [Inverse Probl. 2011;27:075002].
  • Avery M, Banks HT, Basu K, Cheng Y, Eager E, Khasawinah S, Potter L, Rehm KL. Experimental design and inverse problems in plant biological modeling, CRSC Technical Report CRSC-TR11-12, NCSU; 2011 [J. Inverse Ill-posed Probl., doi:10.1515/jiip-2012-0208].
  • Zhu X-G, Sturler E, Long SP. Optimizing the distribution of resources between enzymes of carbon metabolism can dramatically increase photosynthetic rate: a numerical simulation using an evolutionary algorithm. Plant Physiol. 2007;145:513–526.
  • Adams BM, Banks HT, Davidian M, Rosenberg ES. Model fitting and prediction with HIV treatment interruption data, CRSC Technical Report CRSC-TR05-40, NCSU; 2005 [Bull. Math. Biol. 2007;69:563–584].
  • Prohorov YV. Convergence of random processes and limit theorems in probability theory. Theor. Prob. Appl. 1956;1:157–214.
  • Banks HT, Bihari KL. Modeling and estimating uncertainty in parameter estimation, CRSC Technical Report CRSC-TR99-40, NCSU; 1999 [Inverse Problems. 2001;17:95–111].
  • Attarian A. tssolve.m, Retrieved August 2011, from http://www4.ncsu.edu/arattari/.
  • Fink M. myAD, Retrieved August 2011, from http://www.mathworks.com/matlabcentral/fileexchange/15235-automatic-differentiation-for-matlab.
  • Kuntsevich A, Kappel F. SolvOpt, Retrieved July 2011, from http://www.kfunigraz.ac.at/imawww/kuntsevich/solvopt/.
  • Davidian M, Giltinan D. Nonlinear models for repeated measurement data. London: Chapman & Hall; 1998.
  • Seber GAF, Wild CJ. Nonlinear regression. New York: Wiley; 1989.
  • Adams BM, Banks HT, Davidian M, Kwon H, Tran HT, Winfree SN, Rosenberg ES. HIV dynamics: modeling, data analysis, and optimal treatment protocols, CRSC Technical Report CRSC-TR04-05, NCSU; 2004 [J. Comput. Appl. Math. 2005;184:10–49].
  • Adams BM. Non-parametric parameter estimation and clinical data fitting with a model of HIV infection [PhD Thesis]. NC State Univ.; 2005.
  • Banks HT, Davidian M, Hu S, Kepler G, Rosenberg E. Modeling HIV, immune response and validation with clinical data, CRSC Technical Report CRSC-TR07-09, NCSU; 2007 [J. Biol. Dyn. 2008;2:357–385].
  • Banks HT, Cintrón-Arias A, Kappel F. Parameter selection methods in inverse problem formulation, CRSC Technical Report CRSC-TR10-03, NCSU, revised November 2010. In: Mathematical model development and validation in physiology: application to the cardiovascular and respiratory systems. Lecture notes in mathematics. Mathematical biosciences subseries. Springer-Verlag; in press.
  • Banks HT. Modeling and control in the biomedical sciences, Lecture notes in biomathematics, Vol. 6. Levin S, editor. Berlin: Springer-Verlag; 1975.
