Inverse Problems in Science and Engineering Volume 27, 2019 - Issue 6

Open access

Original Articles

Continuous analogue to iterative optimization for PDE-constrained inverse problems

R. Boiger, Institute of Mathematics, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria; Materials Center Leoben Forschung GmbH, Leoben, Austria

A. Fiedler, Chair of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Garching, Germany; Institute of Computational Biology, Helmholtz Zentrum München – German Research Center for Environmental Health, Neuherberg, Germany

J. Hasenauer (corresponding author), Chair of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Garching, Germany; Institute of Computational Biology, Helmholtz Zentrum München – German Research Center for Environmental Health, Neuherberg, Germany

B. Kaltenbacher, Institute of Mathematics, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria

Pages 710-734 | Received 29 Jun 2017, Accepted 24 May 2018, Published online: 10 Jul 2018

https://doi.org/10.1080/17415977.2018.1494167
ABSTRACT

The parameters of many physical processes are unknown and have to be inferred from experimental data. The corresponding parameter estimation problem is often solved using iterative methods such as steepest descent methods combined with trust regions. For a few problem classes, continuous analogues of iterative methods are also available. In this work, we expand the application of continuous analogues to function spaces and consider PDE (partial differential equation)-constrained optimization problems. We derive a class of continuous analogues, here coupled ODE (ordinary differential equation)–PDE models, and prove their convergence to the optimum under mild assumptions. We establish sufficient bounds for local stability and convergence for the tuning parameter of this class of continuous analogues, the retraction parameter. To evaluate the continuous analogues, we study the parameter estimation for a model of gradient formation in biological tissues. We observe good convergence properties, indicating that the continuous analogues are an interesting alternative to state-of-the-art iterative optimization methods.

KEYWORDS:

Partial differential equations
optimization
continuous analogues
mathematical biology
steady state

MSC SUBJECT CLASSIFICATIONS:

93D20
49N45
35K57
37N40

1. Introduction

Partial differential equations (PDEs) are used in various application areas to describe physical, chemical, biological, economic, or social processes. The parameters of these processes are often unknown and need to be estimated from experimental data [1,2]. The maximum likelihood and maximum a posteriori parameter estimates are given by the solutions of PDE-constrained optimization problems. Since PDE-constrained optimization problems are in general nonlinear, non-convex as well as computationally challenging, efficient and reliable optimization methods are required.

Over the last decades, a large number of numerical methods for PDE-constrained optimization problems have been proposed (see [3–8] and references therein). Beyond methodological contributions, there is an extensive literature on parameter estimation in specific applications [9]. Most of the theoretical and applied work focuses on iterative methods, which generate a discrete sequence of points along which the objective function decreases. In this manuscript, we pursue an alternative route and consider continuous analogues (see [10–14] and references therein).

Continuous analogues are formulations of iterative optimization methods in terms of differential equations and have been derived for a series of optimizers for real-valued optimization problems, including the Levenberg–Marquardt and the Newton–Raphson method [11,15]. For many constrained and unconstrained optimization problems, continuous analogues exhibit larger regions of attraction and more robust convergence than discrete iterative methods [15]. In a recent study, the concept of continuous analogues has been employed to develop a tailored parameter optimization method for ordinary differential equation (ODE) models with steady-state constraints [16]. This method outperformed iterative constrained optimization methods with respect to convergence and computational efficiency [16], and complemented methods based on computer algebra [17]. Although continuous analogues are generally promising, we are not aware of continuous analogues for solving PDE-constrained optimization problems.

In this manuscript, we formulate a continuous analogue for PDE-constrained optimization problems based on the structure of the PDE and the local geometry of its solution space. The continuous analogue is a coupled system of ODEs and PDEs, which has the optima of the PDE-constrained optimization problem as equilibrium points. We establish local stability and convergence under mild assumptions. The continuous analogue enables us to use adaptive numerical methods for solving optimization problems with PDE constraints. Beyond the generalization of previous results for ODEs (i.e. [16]) to function spaces, we provide rationales and constraints for the choice of tuning parameters, in particular of the retraction factor. The methods are applied to a model of gradient formation in biological tissues.

The manuscript is structured as follows. In Sections 2 and 3, we introduce the considered class of models and PDE-constrained optimization problems. For these classes, we propose a continuous analogue to descent methods in Section 4. In Section 5, we prove convergence of the continuous analogue to local optima. The assumptions under which stability and convergence to local optima are achieved are discussed in Section 6. The properties of the continuous analogues are studied in Section 7 for a model of gradient formation in biological tissues.

2. Mathematical model

We consider parameter estimation for models with elliptic and parabolic PDE constraints. In the parabolic case, the initial condition $u_{0}$ of the parabolic PDE is defined as the solution of an elliptic PDE:

$\begin{aligned} u_{t} (t) & = C (θ, t, u (t)), \quad t \in \, ] 0, T [ , \\ u (0) & = u_{0}, \end{aligned}$ (1)

with

$0 = C_{0} (θ, u_{0}),$ (2)

in which $u (t)$ is the time-dependent solution of the parabolic PDE (1), and C and $C_{0}$ are operators. The considered combination of parabolic and elliptic PDEs is flexible and allows for the analysis of many practically relevant scenarios. Here, $u_{0}$ denotes a stable steady state of the unperturbed system, while u denotes the transient solution of the perturbed system starting in the steady state of the unperturbed system $u_{0}$ . The observables of the models are defined via observation operators $B (θ, t, u (t))$ and $B_{0} (θ, u_{0})$ ,

$y (t) = B (θ, t, u (t)), \quad y_{0} = B_{0} (θ, u_{0}) .$ (3)

We work in the following functional analytic setting. Let V be a separable Banach space with dual space $V^{*}$ and let V be continuously and densely embedded into a Hilbert space H, such that $V \subseteq H ≅ H^{*} \subseteq V^{*}$ forms a Gelfand triple. Furthermore, we consider the initial value $u_{0} \in V$ and the transient state $u \in W (0, T) = L^{2} (0, T; V) \cap H^{1} (0, T; V^{*})$ . The parameters θ are assumed to be finite dimensional and real-valued, $θ \in R^{n_{θ}}$ . For each t and θ, the observation operators B, $B_{0}$ are mappings, $B : R^{n_{θ}} \times \, ] 0, T [ \, \times V \to Z$ , $B_{0} : R^{n_{θ}} \times V \to Z_{0}$ , where the observation spaces $Z, Z_{0}$ are Hilbert spaces. The operators C and $C_{0}$ are mappings, $C : R^{n_{θ}} \times \, ] 0, T [ \, \times V \to V^{*}$ and $C_{0} : R^{n_{θ}} \times V \to V^{*}$ . Existence of a weak solution holds, e.g. under the following assumptions on the differential operator C ([18], p. 770 ff.):

Assumption 2.1

There exists $ε_{θ} > 0$ such that for all $θ \in B_{ε_{θ}} (θ^{*}) = \{θ \in R^{n_{θ}} \mid ∥ θ - θ^{*} ∥ < ε_{θ}\} \subseteq R^{n_{θ}},$ the following holds.

The operator $- C (θ, t, \cdot)$ is monotone and hemicontinuous for each $t \in \, ] 0, T [$ .
The operator $- C (θ, t, \cdot)$ is coercive for each $t \in \, ] 0, T [,$ i.e. there exist constants $c_{1} > 0$ and $c_{2} \geq 0$ such that $- {⟨C (θ, t, v), v⟩}_{V^{*}, V} \geq c_{1} ∥ v ∥_{V}^{2} - c_{2} \quad \text{for all } v \in V, \ t \in \, ] 0, T [ .$
The operator $C (θ, \cdot, \cdot)$ satisfies a growth condition, i.e. there exist a non-negative function $c_{3} \in L^{2} (0, T)$ and a constant $c_{4} > 0$ such that $∥ C (θ, t, v) ∥_{V^{*}} \leq c_{3} (t) + c_{4} ∥ v ∥_{V} \quad \text{for all } v \in V, \ t \in \, ] 0, T [ .$
The function $t \mapsto ⟨ C (θ, t, w), v ⟩_{V^{*}, V}$ is measurable on $] 0, T [$ for all $v, w \in V$ .

For the differential operator $C_{0}$ , the first two items in Assumption 2.1 are assumed to hold, cf. Assumption 5.2 below.

In mathematical biology, the differential operator C is often semilinear and describes a reaction–diffusion–advection equation, $C (θ, t, u) = f (k, u) - \nabla_{x} \cdot (v u - D \nabla_{x} u),$ in which $u (t) \in V$ is a concentration vector, $x \in Ω \subseteq R^{n}$ is the spatial location and $f : R^{n_{k}} \times V \to V^{*}$ is the reaction term. The parameters $θ = (k, v, D)$ are the velocity vector $v \in R^{n}$ , the diffusion matrix $D \in R^{n \times n}$ and the kinetic parameters $k \in R^{n_{k}}$ . If D is positive definite and f fulfils a certain growth condition, then this operator satisfies Assumption 2.1.
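To make the structure of this operator concrete, the following minimal sketch evaluates a finite-difference approximation of $C (θ, t, u)$ in one space dimension for a single species. Everything specific here is an assumption made for illustration and is not taken from the article: the reaction term is linear degradation, $f (k, u) = - k u$ , the parameters k, v and D are scalars, the grid is uniform, and the boundary values are homogeneous Dirichlet. The function name `rda_operator` is hypothetical.

```python
import numpy as np


def rda_operator(u, k, v, D, dx):
    """Approximate C(theta, u) = -k*u - d/dx(v*u - D*du/dx) on interior grid nodes.

    Assumes constant scalar k, v, D and u = 0 at both boundary points.
    """
    padded = np.concatenate(([0.0], u, [0.0]))  # attach the Dirichlet boundary values
    dudx = (padded[2:] - padded[:-2]) / (2.0 * dx)  # central first derivative
    d2udx2 = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2  # second derivative
    return -k * u - v * dudx + D * d2udx2


# Consistency check on u(x) = sin(pi x): with k = v = 0 and D = 1 the operator
# reduces to the second derivative, which equals -pi^2 sin(pi x).
n = 99
dx = 1.0 / (n + 1)
x = dx * np.arange(1, n + 1)
u = np.sin(np.pi * x)
pure_diffusion = rda_operator(u, k=0.0, v=0.0, D=1.0, dx=dx)
```

For positive D the discretized diffusion part is negative definite, which is the finite-dimensional counterpart of the coercivity requirement on $- C$ in Assumption 2.1.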

3. Parameter estimation problem

We consider the estimation of the unknown model parameters θ from noise-corrupted measurements of the observables y. To obtain estimates for the unknown parameters, an objective function is minimized, e.g. the negative log-likelihood function or the sum-of-squared-residuals. In the following, we distinguish two cases.

3.1. Elliptic and parabolic PDE constraints

In the general case, observations are available for the initial state and the transient phase. The objective function $\tilde{J}$ depends on the parameters and the parameter-dependent solutions of the parabolic and the elliptic PDE, $\tilde{J} : R^{n_{θ}} \times V \times W (0, T) \to R$ , e.g.

$\tilde{J} (θ, u_{0}, u) = \frac{1}{2} ∥ y_{0} - B_{0} (θ, u_{0}) ∥_{Z_{0}}^{2} + \frac{1}{2} \int_{0}^{T} ∥ y (t) - B (θ, t, u (t)) ∥_{Z}^{2} \, d t .$

The optimization problem is given by

$\begin{aligned} \min_{θ, u_{0}, u} \quad & \tilde{J} (θ, u_{0}, u) \\ \text{s.t.} \quad & u_{t} = C (θ, t, u), \quad u (0) = u_{0}, \\ & 0 = C_{0} (θ, u_{0}) . \end{aligned}$ (4)

3.2. Elliptic PDE constraint

In many applications, only experimental data for the steady state of a process are available. Possible reasons include fast equilibration of the process and limitations of experimental devices. In this case, the problem is simplified and the objective function J depends only on the parameters and parameter-dependent solutions of the elliptic PDE, $J : R^{n_{θ}} \times V \to R$ , e.g. $J (θ, u_{0}) = \frac{1}{2} ∥ y_{0} - B_{0} (θ, u_{0}) ∥_{Z_{0}}^{2}$ . The optimization problem is given by

$\begin{aligned} \min_{θ, u_{0}} \quad & J (θ, u_{0}) \\ \text{s.t.} \quad & 0 = C_{0} (θ, u_{0}) . \end{aligned}$ (5)

The reduced formulation of the optimization problem (5) is given by

$\min_{θ} j (θ) := J (θ, ϕ_{0} (θ)),$ (6)

in which $ϕ_{0} (θ)$ denotes the parameter-dependent solution of $C_{0} (θ, u_{0}) = 0$ , and $j : R^{n_{θ}} \to R$ denotes the reduced objective function.

4. Continuous analogue of descent methods for PDE-constrained problems

Optimization problems of types (4) and (5) are currently often solved using iterative descent methods, combined with trust regions. In the following, we develop a continuous analogue of an iterative descent method for PDEs. For simplicity, we first consider the case of an elliptic PDE constraint (5) and afterwards generalize the results to the case of mixed parabolic and elliptic PDE constraints (4).

4.1. Elliptic PDE constraint

To solve optimization problem (5), we derive a coupled ODE–PDE system. The trajectory of this continuous analogue evolves in parameter and state space on the manifold defined by $C_{0} (θ, u_{0}) = 0$ towards a local minimum. To evolve on the manifold, the continuous analogue exploits the first-order geometry of the manifold, i.e. its tangent space.

Mathematically, the first-order geometry of $C_{0} (θ, ϕ_{0} (θ)) = 0$ is defined by the sensitivity equations

$\frac{\partial C_{0}}{\partial u_{0}} (θ, ϕ_{0} (θ)) \frac{\partial ϕ_{0}}{\partial θ_{i}} (θ) + \frac{\partial C_{0}}{\partial θ_{i}} (θ, ϕ_{0} (θ)) = 0, \quad i \in \{1, \dots, n_{θ}\} .$ (7)

The sensitivity equations can be reformulated to

$\frac{\partial ϕ_{0}}{\partial θ_{i}} (θ) = - {(\frac{\partial C_{0}}{\partial u_{0}} (θ, ϕ_{0} (θ)))}^{- 1} \frac{\partial C_{0}}{\partial θ_{i}} (θ, ϕ_{0} (θ)), \quad i \in \{1, \dots, n_{θ}\},$

provided the inverse of $(\partial C_{0} / \partial u_{0}) (θ, ϕ_{0} (θ))$ exists. We extend $\nabla_{θ} ϕ_{0}$ to points $(θ, u_{0})$ not necessarily lying on the solution manifold of $C_{0} = 0$ by defining the operator $S_{0} : R^{n_{θ}} \times V \to V^{n_{θ}}$ that provides the solution of the sensitivity equations

$\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}) {S_{0}}_{i} (θ, u_{0}) + \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}) = 0, \quad i \in \{1, \dots, n_{θ}\},$ (8)

for given θ and $u_{0}$ . With (7), it holds that

$S_{0} (θ, ϕ_{0} (θ)) = \nabla_{θ} ϕ_{0} (θ),$ (9)

provided the solutions to (7) and (8) are unique.
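The sensitivity equations (8) and the identity (9) can be checked numerically on a small discretized example. The sketch below is an illustration under stated assumptions, not the model used in the article: the steady-state constraint is taken as $C_{0} (θ, u_{0}) = D A u_{0} - k u_{0} + s$ with $θ = (D, k)$ , where A is a finite-difference Laplacian with homogeneous Dirichlet boundaries and s is a constant source. The helper names `laplacian`, `steady_state` and `sensitivities` are hypothetical.

```python
import numpy as np


def laplacian(n, dx):
    """Second-difference matrix on n interior nodes (homogeneous Dirichlet)."""
    return (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / dx**2


def steady_state(theta, A, s):
    """phi_0(theta): solve C0(theta, u) = D*A@u - k*u + s = 0 for u."""
    D, k = theta
    return np.linalg.solve(D * A - k * np.eye(len(s)), -s)


def sensitivities(theta, u, A):
    """S_0(theta, u): solve (dC0/du) S_i = -dC0/dtheta_i for each parameter, cf. (8)."""
    D, k = theta
    jacobian = D * A - k * np.eye(len(u))  # dC0/du0
    dC_dtheta = np.column_stack((A @ u, -u))  # columns: dC0/dD and dC0/dk
    return np.linalg.solve(jacobian, -dC_dtheta)


n = 20
dx = 1.0 / (n + 1)
A = laplacian(n, dx)
s = np.ones(n)
theta = np.array([0.1, 1.0])

u_star = steady_state(theta, A, s)
S = sensitivities(theta, u_star, A)

# Central finite differences of phi_0 with respect to each parameter, for comparison with (9).
h = 1e-6
S_fd = np.column_stack([
    (steady_state(theta + h * e, A, s) - steady_state(theta - h * e, A, s)) / (2.0 * h)
    for e in np.eye(2)
])
```

On the steady-state manifold the two arrays should agree up to finite-difference error, which is the discrete counterpart of (9). Away from the manifold, `sensitivities` remains well defined as long as the Jacobian is invertible, which is how $S_{0}$ is used in the text.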

In order to couple changes in θ with appropriate changes in $u_{0}$ , we now use the fact that the function $\nabla_{θ} ϕ_{0} (θ)$ provides the first-order term of the Taylor series expansion of the steady state with respect to the parameter vector θ,

$ϕ_{0} (θ + r Δ θ) = ϕ_{0} (θ) + \nabla_{θ} ϕ_{0} (θ) r Δ θ + o (r) \quad \text{as } r \to 0, \ r \in R .$ (10)

Defining ${\hat{ϕ}}_{0} (r) := ϕ_{0} (θ + r Δ θ)$ for some $Δ θ \in R^{n_{θ}}$ and differentiating (10) with respect to r yields

$\frac{d {\hat{ϕ}}_{0}}{d r} (r) = \nabla_{θ} ϕ_{0} (θ) Δ θ + o (1) = S_{0} (θ + r Δ θ, {\hat{ϕ}}_{0} (r)) Δ θ + o (1) \quad \text{as } r \to 0.$ (11)

This relation motivates the formulation of the coupled ODE–PDE model

$\begin{aligned} \frac{d θ}{d r} (r) & = g (θ, u_{0}), & θ (0) & = θ_{0}, \\ \frac{d u_{0}}{d r} (r) & = S_{0} (θ, u_{0}) \frac{d θ}{d r} (r) = S_{0} (θ, u_{0}) g (θ, u_{0}), & u_{0} (0) & = u_{0, 0}, \end{aligned}$ (12)

using the artificial time parameter r. For a change in the parameters $d θ / d r$ , the update in $u_{0}$ is chosen according to (11). Solutions of this dynamical system evolve on the manifold $C_{0} (θ (r), u_{0} (r)) = 0$ for arbitrary parameter update directions $g : R^{n_{θ}} \times V \to R^{n_{θ}}$ , provided that the initial state is on the manifold $C_{0} (θ_{0}, u_{0, 0}) = 0$ . The state variables of this coupled ODE–PDE system are θ and $u_{0}$ , and the path variable is r. To solve optimization problem (6), g is chosen as an arbitrary descent direction satisfying $\nabla j (θ)^{T} g (θ, u_{0}) < 0,$ more precisely satisfying Assumption 5.5 below.

For example, g can be chosen as a steepest descent direction

$g = \underset{∥ v ∥_{*} \leq 1}{\operatorname{argmin}} \, \nabla j (θ)^{T} v$ (13)

for some norm $∥ \cdot ∥_{*}$ . For the Euclidean norm, we obtain the gradient descent direction,

$g_{i} (θ, u_{0}) := - \frac{\partial J}{\partial θ_{i}} (θ, u_{0}) - {⟨\frac{\partial J}{\partial u_{0}} (θ, u_{0}), {S_{0}}_{i} (θ, u_{0})⟩}_{V^{*}, V} =: d_{i} (θ, u_{0}), \quad i \in \{1, \dots, n_{θ}\},$ (14)

in which we substituted $ϕ_{0} (θ)$ by $u_{0}$ extending the definition also to states $u_{0}$ that are not on the steady-state manifold. Likewise, defining $∥ v ∥_{*}^{2} := v^{T} H (θ, u_{0}) v$ with some positive definite matrix $H (θ, u_{0})$ , so that

$g (θ, u_{0}) = H (θ, u_{0})^{- 1} d (θ, u_{0})$ (15)

leads to a descent direction. Using, e.g. the Hessian of j,

$\begin{aligned} H_{i j} (θ, u_{0}) = {} & \frac{\partial^{2} J}{\partial θ_{i} \partial θ_{j}} (θ, u_{0}) + {⟨\frac{\partial^{2} J}{\partial u_{0} \partial θ_{j}} (θ, u_{0}), {S_{0}}_{i} (θ, u_{0})⟩}_{V^{*}, V} \\ & + {⟨\frac{\partial^{2} J}{\partial u_{0} \partial θ_{i}} (θ, u_{0}), {S_{0}}_{j} (θ, u_{0})⟩}_{V^{*}, V} + \frac{\partial^{2} J}{\partial u_{0}^{2}} (θ, u_{0}) ({S_{0}}_{i} (θ, u_{0}), {S_{0}}_{j} (θ, u_{0})) \\ & + {⟨\frac{\partial J}{\partial u_{0}} (θ, u_{0}), {T_{0}}_{i, j} (θ, u_{0})⟩}_{V^{*}, V}, \quad i, j \in \{1, \dots, n_{θ}\}, \end{aligned}$ (16)

with the second-order sensitivities defined by

$\begin{aligned} & \frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}) {T_{0}}_{i, j} (θ, u_{0}) + \frac{\partial^{2} C_{0}}{\partial u_{0}^{2}} (θ, u_{0}) ({S_{0}}_{i} (θ, u_{0}), {S_{0}}_{j} (θ, u_{0})) + \frac{\partial^{2} C_{0}}{\partial u_{0} \partial θ_{j}} (θ, u_{0}) {S_{0}}_{i} (θ, u_{0}) \\ & + \frac{\partial^{2} C_{0}}{\partial u_{0} \partial θ_{i}} (θ, u_{0}) {S_{0}}_{j} (θ, u_{0}) + \frac{\partial^{2} C_{0}}{\partial θ_{i} \partial θ_{j}} (θ, u_{0}) = 0, \quad i, j \in \{1, \dots, n_{θ}\}, \end{aligned}$

leads to Newton, Gauss–Newton (upon skipping the ${T_{0}}_{i, j}$ terms) or quasi-Newton methods (where approximations to the Hessian (16) are computed via low rank updates). As the Hessian is not guaranteed to be positive definite, regularization with a scaled identity matrix, $H_{μ} (θ, u_{0}) = H (θ, u_{0}) + μ I$ with $μ > 0$ , might be useful. However, how to choose μ and possible continuous update rules are out of the scope of this work.

The coupled ODE–PDE system (12) can be solved using numerical time-stepping methods. These numerical methods might, however, accumulate errors resulting in the divergence of the state $(θ (r), u_{0} (r))$ from the steady-state manifold. Additionally, the initial state, $u_{0, 0}$ , might not be on the steady-state manifold. To account for this, we include the retraction term $λ C_{0} (θ, u_{0})$ in the evolution equation of $u_{0}$ , with retraction factor $λ > 0$ . This yields the following continuous analogue of a descent method for optimization problems with elliptic PDE constraints,

$\begin{aligned} \frac{d θ}{d r} (r) & = g (θ, u_{0}), & θ (0) & = θ_{0}, \\ \frac{d u_{0}}{d r} (r) & = S_{0} (θ, u_{0}) g (θ, u_{0}) + λ C_{0} (θ, u_{0}), & u_{0} (0) & = u_{0, 0} . \end{aligned}$ (17)

As, for fixed θ, the equation $C_{0} (θ, u_{0}) = 0$ defines a stable steady state of the PDE (2), the retraction term stabilizes the manifold. For $λ ≫ 1$ , the system should first converge to the steady state $ϕ_{0} (θ_{0})$ for the initial parameter $θ_{0}$ and then move along the manifold to a local optimum $θ^{*}$ as illustrated in Figure 1.

Figure 1. The state of the system is illustrated along the trajectory of (17). In the first phase, the equilibration phase, the system converges to the manifold. The solution is not feasible during this phase as the equality constraint, $C_{0} (θ, u_{0}) = 0$ , is violated. In the course of the equilibration, the objective function value might increase. In the second phase, the minimization phase, the objective function is minimized along the steady-state manifold.

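The two phases described for the continuous analogue (17) can be reproduced on a small discretized example. The following sketch rests on assumptions that are not part of the article: a single unknown parameter k in the toy constraint $C_{0} (k, u_{0}) = D A u_{0} - k u_{0} + s$ with fixed D, an identity observation operator so that $J = \frac{1}{2} ∥ y_{0} - u_{0} ∥^{2}$ , noise-free synthetic data, the gradient direction (14), and a fixed-step linearly implicit Euler scheme in which only the retraction term is treated implicitly. The names `laplacian` and `continuous_analogue` are hypothetical.

```python
import numpy as np


def laplacian(n, dx):
    """Second-difference matrix on n interior nodes (homogeneous Dirichlet)."""
    return (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / dx**2


def continuous_analogue(k0, u_init, y0, A, s, D=0.1, lam=10.0, h=0.01, r_end=20.0):
    """Integrate system (17) for C0(k, u) = D*A@u - k*u + s along the path variable r."""
    identity = np.eye(len(s))
    k, u = float(k0), u_init.copy()
    for _ in range(int(round(r_end / h))):
        jac = D * A - k * identity  # dC0/du0 at the current parameter
        S = np.linalg.solve(jac, u)  # sensitivity (8): jac @ S = -dC0/dk = u
        g = S @ (y0 - u)  # gradient direction (14) for J = 0.5*||y0 - u||^2
        # The retraction term lam*C0 = lam*(jac @ u + s) is taken at the new state,
        # the tangential term S*g at the old state.
        u = np.linalg.solve(identity - h * lam * jac, u + h * (S * g + lam * s))
        k = k + h * g  # explicit Euler step for the parameter
    return k, u


n = 20
dx = 1.0 / (n + 1)
A = laplacian(n, dx)
s = np.ones(n)
k_true = 1.0
y0 = np.linalg.solve(0.1 * A - k_true * np.eye(n), -s)  # synthetic steady-state data

# Start with a wrong parameter and a state that is not on the steady-state manifold.
k_hat, u_hat = continuous_analogue(k0=2.0, u_init=np.zeros(n), y0=y0, A=A, s=s)
residual = np.linalg.norm(0.1 * (A @ u_hat) - k_hat * u_hat + s)
```

The fixed step size is chosen only to keep the sketch short. Since (17) is an ordinary initial value problem after spatial discretization, an adaptive stiff integrator, for instance SciPy's `solve_ivp` with an implicit method, could be used instead.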
4.2. Elliptic and parabolic PDE constraints

The continuous analogue for descent with elliptic PDE constraints can be generalized to problems with parabolic and elliptic PDE constraints. One possibility for doing so is to consider the partially reduced problem

$\begin{aligned} \min_{θ, u_{0}} \quad & \tilde{J} (θ, u_{0}) := \tilde{J} (θ, u_{0}, ϕ (θ, u_{0})) \\ \text{s.t.} \quad & 0 = C_{0} (θ, u_{0}), \end{aligned}$ (18)

in which $u = ϕ (θ, u_{0})$ denotes the solution to $u_{t} = C (θ, t, u)$ with $u (0) = u_{0}$ . Given this formulation, we can use continuous analogue (17) with

${\tilde{g}}_{i} (θ, u_{0}) = - \frac{\partial \tilde{J}}{\partial θ_{i}} (θ, u_{0}) - \frac{\partial \tilde{J}}{\partial u_{0}} (θ, u_{0}) {S_{0}}_{i} (θ, u_{0}), \quad i \in \{1, \dots, n_{θ}\} .$ (19)

To avoid the need for the solution operator $ϕ : R^{n_{θ}} \times V \to W (0, T)$ , a continuous analogue of the full problem can alternatively be formulated. This is beyond the scope of this study, though, and will be the subject of future research.

5. Local stability and convergence to a local optimum

The behaviour of the coupled ODE–PDE system (17) introduced in the previous section depends on the properties of the objective function and the PDE model, as well as the retraction factor λ. To prove that a solution of (17) with an appropriate retraction factor λ is well defined and converges to the local minimizer $(θ^{*}, u_{0}^{*}) = (θ^{*}, ϕ_{0} (θ^{*}))$ of (5), we impose the following assumptions.

Assumption 5.1

The descent direction vanishes at the minimizer $θ^{*}$ of the optimization problem $\min_{θ} j (θ),$ i.e. $g (θ^{*}, ϕ_{0} (θ^{*})) = 0.$

Assumption 5.2

There exists $ε_{θ} > 0$ such that for all $θ \in B_{ε_{θ}} (θ^{*}) = \{θ \in R^{n_{θ}} \mid ∥ θ - θ^{*} ∥ < ε_{θ}\} \subseteq R^{n_{θ}},$ the following holds.

The operator $- C_{0} (θ, \cdot)$ is monotone and hemicontinuous.
The operator $- C_{0} (θ, \cdot)$ is coercive.

Assumption 5.3

The function $C_{0} (θ, u_{0})$ is locally uniformly monotonically decreasing, i.e. there exist $γ_{c} > 0$ and $ε_{u_{0}} > 0$ such that ${⟨C_{0} (θ, u_{0}^{1}) - C_{0} (θ, u_{0}^{2}), u_{0}^{1} - u_{0}^{2}⟩}_{V^{*}, V} \leq - γ_{c} ∥ u_{0}^{1} - u_{0}^{2} ∥_{V}^{2}$ for all $θ \in B_{ε_{θ}} (θ^{*})$ and $u_{0}^{1}, u_{0}^{2} \in B_{ε_{u_{0}}} (ϕ_{0} (θ^{*})) := \{u_{0} \in V \mid ∥ u_{0} - ϕ_{0} (θ^{*}) ∥_{V} < ε_{u_{0}}\}$ .

Assumption 5.4

The sensitivity $S_{0} (θ, u_{0})$ is locally Lipschitz continuous with respect to $u_{0},$ i.e. there exists $L_{S_{0}} \geq 0$ such that $∥ S_{0} (θ, u_{0}^{1}) - S_{0} (θ, u_{0}^{2}) ∥_{V^{n_{θ}}} \leq L_{S_{0}} ∥ u_{0}^{1} - u_{0}^{2} ∥_{V}$ for all $θ \in B_{ε_{θ}} (θ^{*})$ and $u_{0}^{1}, u_{0}^{2} \in B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$ .

Assumption 5.5

The mapping $θ \mapsto g (θ, ϕ_{0} (θ))$ is uniformly monotonically decreasing on $B_{ε_{θ}} (θ^{*}),$ i.e. there exists $γ_{g} > 0$ such that $(g (θ, ϕ_{0} (θ)) - g (θ^{*}, ϕ_{0} (θ^{*})))^{T} (θ - θ^{*}) \leq - γ_{g} ∥ θ - θ^{*} ∥^{2}$ for all $θ \in B_{ε_{θ}} (θ^{*})$ .

Assumption 5.6

The descent direction g is locally Lipschitz continuous with respect to $u_{0},$ i.e. there exists $L_{g} \geq 0$ such that $∥ g (θ, u_{0}^{1}) - g (θ, u_{0}^{2}) ∥ \leq L_{g} ∥ u_{0}^{1} - u_{0}^{2} ∥_{V}$ for all $θ \in B_{ε_{θ}} (θ^{*})$ and $u_{0}^{1}, u_{0}^{2} \in B_{ε_{u_{0}}} (ϕ_{0} (θ^{*})) \subseteq V$ .

Moreover, g is uniformly bounded on $B_{ε_{θ}} (θ^{*}) \times B_{ε_{u_{0}}} (ϕ_{0} (θ^{*})),$ i.e. there exists $K_{g} \geq 0$ such that $∥ g (θ, u_{0}) ∥ \leq K_{g}$ for all $(θ, u_{0}) \in B_{ε_{θ}} (θ^{*}) \times B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$ .

5.1. Elliptic PDE constraints

Using Assumptions 5.1–5.6 and the existence of a weak solution (Assumption 2.1), we can prove the following theorem on stability and convergence for the continuous analogue of the descent method for elliptic PDE constraints.

Theorem 5.1

Let Assumptions 5.1–5.6 be satisfied. Then there exists a $λ^{*} \geq 0$ such that for all $λ > λ^{*}$ solutions to (17) are well defined for all $r > 0$ and the local minimizer $(θ^{*}, u_{0}^{*})$ of the optimization problem (5) is a locally exponentially stable steady state of the system (17).

Proof.

Define $\tilde{θ} := θ - θ^{*}$ and ${\tilde{u}}_{0} := u_{0} - ϕ_{0} (θ)$ , with $θ \in B_{ε_{θ}} (θ^{*})$ and $u_{0} \in B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$ , where $ϕ_{0} (θ^{*}) = u_{0}^{*}$ exists because of Assumption 5.2. We further define a Lyapunov function $V (r) = \frac{1}{2} ∥ \tilde{θ} (r) ∥^{2} + \frac{1}{2} ∥ {\tilde{u}}_{0} (r) ∥_{H}^{2}$ . To prove Theorem 5.1, we will show that the Lyapunov function decreases exponentially. The derivative along the trajectories is given by

$\frac{d}{d r} V (r) = \frac{d}{d r} (\frac{1}{2} ∥ \tilde{θ} (r) ∥^{2}) + \frac{d}{d r} (\frac{1}{2} ∥ {\tilde{u}}_{0} (r) ∥_{H}^{2}) .$

First, we bound the first summand from above, using (17) and Assumptions 5.1, 5.5 and 5.6,

$\begin{aligned} \frac{d}{d r} \frac{1}{2} ∥ \tilde{θ} ∥^{2} & = {(\frac{d}{d r} \tilde{θ})}^{T} \tilde{θ} \\ & = (g (θ, u_{0}) - g (θ, ϕ_{0} (θ)))^{T} \tilde{θ} + (g (θ, ϕ_{0} (θ)) - g (θ^{*}, ϕ_{0} (θ^{*})))^{T} \tilde{θ} \\ & \leq ∥ g (θ, u_{0}) - g (θ, ϕ_{0} (θ)) ∥ ∥ \tilde{θ} ∥ - γ_{g} ∥ θ - θ^{*} ∥^{2} \\ & \leq L_{g} ∥ {\tilde{u}}_{0} ∥_{V} ∥ \tilde{θ} ∥ - γ_{g} ∥ \tilde{θ} ∥^{2} . \end{aligned}$

Second, we bound the second summand from above, using (17) and $C_{0} (θ, ϕ_{0} (θ)) = 0$ , as well as the fact that by Assumption 5.3 we have (9),

$\begin{aligned} \frac{d}{d r} \frac{1}{2} ∥ {\tilde{u}}_{0} ∥_{H}^{2} & = {⟨\frac{d {\tilde{u}}_{0}}{d r}, {\tilde{u}}_{0}⟩}_{V^{*}, V} = {⟨\frac{d u_{0}}{d r} - \nabla_{θ} ϕ_{0} (θ) \frac{d θ}{d r}, {\tilde{u}}_{0}⟩}_{V^{*}, V} \\ & = {⟨(S_{0} (θ, u_{0}) - S_{0} (θ, ϕ_{0} (θ))) g (θ, u_{0}), {\tilde{u}}_{0}⟩}_{V^{*}, V} \\ & \quad + λ {⟨C_{0} (θ, u_{0}) - C_{0} (θ, ϕ_{0} (θ)), {\tilde{u}}_{0}⟩}_{V^{*}, V} \\ & \leq ∥ S_{0} (θ, u_{0}) - S_{0} (θ, ϕ_{0} (θ)) ∥_{V^{n_{θ}}} ∥ g (θ, u_{0}) ∥ ∥ {\tilde{u}}_{0} ∥_{V} \\ & \quad + λ {⟨C_{0} (θ, u_{0}) - C_{0} (θ, ϕ_{0} (θ)), {\tilde{u}}_{0}⟩}_{V^{*}, V} . \end{aligned}$

With Assumptions 5.3, 5.4 and 5.6, we get

$\frac{d}{d r} \frac{1}{2} ∥ {\tilde{u}}_{0} ∥_{H}^{2} \leq (L_{S_{0}} K_{g} - λ γ_{c}) ∥ {\tilde{u}}_{0} ∥_{V}^{2} .$

Hence, we can estimate the derivative of the Lyapunov function,

$\frac{d}{d r} V (r) \leq - (- L_{S_{0}} K_{g} + λ γ_{c}) ∥ {\tilde{u}}_{0} ∥_{V}^{2} + L_{g} ∥ {\tilde{u}}_{0} ∥_{V} ∥ \tilde{θ} ∥ - γ_{g} ∥ \tilde{θ} ∥^{2} .$

To show that $V$ decays exponentially, we have to show that $\frac{d}{d r} V (r) \leq - a V (r)$ for some $a > 0$ . Based on our estimates, proving Theorem 5.1 reduces to finding $a > 0$ with

$0 \leq (- L_{S_{0}} K_{g} + λ γ_{c} - \frac{a}{2}) ∥ {\tilde{u}}_{0} ∥_{V}^{2} - L_{g} ∥ {\tilde{u}}_{0} ∥_{V} ∥ \tilde{θ} ∥ + (γ_{g} - \frac{a}{2}) ∥ \tilde{θ} ∥^{2} .$ (20)

We want this inequality to be valid without restrictions on $∥ \tilde{θ} ∥$ or $∥ {\tilde{u}}_{0} ∥_{V}$ . Due to the last term, we can therefore only consider values of a that are smaller than $2 γ_{g}$ . For such a, completing the square shows that (20) is equivalent to

$0 \leq {(\sqrt{γ_{g} - \frac{a}{2}} ∥ \tilde{θ} ∥ - \frac{L_{g}}{2 \sqrt{γ_{g} - \frac{a}{2}}} ∥ {\tilde{u}}_{0} ∥_{V})}^{2} + (- L_{S_{0}} K_{g} + λ γ_{c} - \frac{a}{2} - \frac{L_{g}^{2}}{4 (γ_{g} - \frac{a}{2})}) ∥ {\tilde{u}}_{0} ∥_{V}^{2} .$

Since the first term on the right-hand side is non-negative, it suffices to find $a > 0$ such that

$λ γ_{c} - \frac{a}{2} - L_{S_{0}} K_{g} - \frac{L_{g}^{2}}{4 (γ_{g} - \frac{a}{2})} \geq 0.$

Multiplying by $4 γ_{g} - 2 a > 0$ , we obtain a quadratic inequality for a,

$a^{2} + 2 (- γ_{g} - λ γ_{c} + L_{S_{0}} K_{g}) a + (4 λ γ_{c} γ_{g} - 4 L_{S_{0}} K_{g} γ_{g} - L_{g}^{2}) \geq 0.$

The roots of the quadratic polynomial are given by

$a_{1, 2} = γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} \pm \sqrt{d},$

with discriminant

$d = (γ_{g} - λ γ_{c} + L_{S_{0}} K_{g})^{2} + L_{g}^{2} .$

The discriminant is non-negative, so

$a_{1} = γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} - \sqrt{d} \leq a_{2} = γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} + \sqrt{d}$

are real roots. In the following, we will assume that $a_{1} > 0$ , which can be achieved by choosing λ such that $λ > λ^{*} = L_{S_{0}} K_{g} / γ_{c} + L_{g}^{2} / (4 γ_{g} γ_{c}) \geq 0$ . This choice is justified as follows. As the square root, $\sqrt{d}$ , is non-negative, $γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} > 0$ , i.e. $λ > (L_{S_{0}} K_{g} - γ_{g}) / γ_{c}$ , needs to hold to ensure $a_{1} > 0$ . Squaring both sides of the inequality

$γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} > \sqrt{d}$ (21)

yields

$\begin{aligned} (γ_{g} + λ γ_{c} - L_{S_{0}} K_{g})^{2} & > (γ_{g} - λ γ_{c} + L_{S_{0}} K_{g})^{2} + L_{g}^{2} \\ \Leftrightarrow \quad λ & > \frac{L_{g}^{2}}{4 γ_{g} γ_{c}} + \frac{L_{S_{0}} K_{g}}{γ_{c}} . \end{aligned}$

Taking $λ > λ^{*} := \max \{\frac{L_{S_{0}} K_{g} - γ_{g}}{γ_{c}}, \frac{L_{g}^{2}}{4 γ_{g} γ_{c}} + \frac{L_{S_{0}} K_{g}}{γ_{c}}\} = \frac{L_{g}^{2}}{4 γ_{g} γ_{c}} + \frac{L_{S_{0}} K_{g}}{γ_{c}}$ ensures $a_{1} > 0$ .

Therefore, a either fulfils $0 < a < a_{1}$ with $a < 2 γ_{g}$ or $a_{2} < a < 2 γ_{g}$ , provided $λ > λ^{*}$ . Hence, we distinguish the following three cases:

(1) $2 γ_{g} < a_{1} < a_{2}$,
(2) $a_{1} < a_{2} \leq 2 γ_{g}$,
(3) $a_{1} \leq 2 γ_{g} \leq a_{2}$,

for the relation of $2 γ_{g}$, $a_{1}$ and $a_{2}$, as illustrated in Figure 2.

Figure 2. The function $f (a) = a^{2} + (- 2 γ_{g} - 2 λ γ_{c} + 2 L_{S_{0}} K_{g}) a + (4 λ γ_{c} γ_{g} - 4 L_{S_{0}} K_{g} γ_{g} - L_{g}^{2})$ is illustrated with the two roots $a_{1}$ and $a_{2}$ and the three different positions of $2 γ_{g}$ , as well as possible positions of a.

Case (1): $2 γ_{g} < a_{1}$ is equivalent to $λ γ_{c} - L_{S_{0}} K_{g} - γ_{g} > \sqrt{d} .$ If the term $λ γ_{c} - L_{S_{0}} K_{g} - γ_{g}$ is negative, the inequality cannot be valid. The term $λ γ_{c} - L_{S_{0}} K_{g} - γ_{g}$ is non-negative if $λ \geq (γ_{g} + L_{S_{0}} K_{g}) / γ_{c}$ . In this case, we can square the inequality and get a contradiction ( $0 > L_{g}^{2}$ ).

Case (2): $a_{2} \leq 2 γ_{g}$ is equivalent to $\sqrt{d} \leq γ_{g} + L_{S_{0}} K_{g} - λ γ_{c} .$ This leads to a contradiction with the same arguments as in case (1).

Case (3): $a_{1} \leq 2 γ_{g}$ is equivalent to $- γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} \leq \sqrt{d} .$ The left-hand side $- γ_{g} + λ γ_{c} - L_{S_{0}} K_{g}$ is non-negative for all $λ \geq (γ_{g} + L_{S_{0}} K_{g}) / γ_{c}$ . With squaring, we get $0 \leq L_{g}^{2}$ . On the other hand, if the term $- γ_{g} + λ γ_{c} - L_{S_{0}} K_{g}$ is negative, that is $λ < (γ_{g} + L_{S_{0}} K_{g}) / γ_{c}$ , we have $\sqrt{d} > 0$ . This is true for all λ, because d>0. In total, we find $a_{1} \leq 2 γ_{g}$ for all $λ > 0$ . Analogously we get for $a_{2}$ that $a_{2} \geq 2 γ_{g}$ is fulfilled for all $λ > 0$ . Hence, we know that $a_{1} \leq 2 γ_{g} \leq a_{2}$ holds for all $λ > 0$ and only case (3) is valid.

Altogether, we find that $a$ can be chosen in the interval $(0, a_{1}]$ provided $λ > λ^{*}$. In this case, it also holds that $\frac{d}{d r} V (r) \leq - \frac{a}{2} (∥ \tilde{u}_{0} (r) ∥_{V}^{2} + ∥ \tilde{θ} (r) ∥^{2}) \leq - \frac{a}{2 K_{V \to H}^{2}} ∥ \tilde{u}_{0} (r) ∥_{H}^{2} - \frac{a}{2} ∥ \tilde{θ} (r) ∥^{2} \leq - \tilde{a} V (r),$ with $\tilde{a} = (a / 2) \min \{1 / K_{V \to H}^{2}, 1\} > 0$, where $K_{V \to H}$ is the constant of the embedding of $V$ into $H$.
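The relation between $λ^{*}$, the roots $a_{1}$, $a_{2}$ and $2 γ_{g}$ derived in the proof can be checked numerically. The following is a minimal sketch; the function name `decay_rate_roots` and all constant values are hypothetical and chosen only for illustration, they are not taken from the application in Section 7.

```python
import math


def decay_rate_roots(gamma_g, gamma_c, L_g, L_S0, K_g, lam):
    """Return (lambda_star, a1, a2) for the quadratic inequality in a."""
    # Threshold for the retraction factor, as stated in the proof.
    lambda_star = L_S0 * K_g / gamma_c + L_g**2 / (4.0 * gamma_g * gamma_c)
    # Roots a_{1,2} = centre -/+ sqrt(d) of the quadratic polynomial.
    centre = gamma_g + lam * gamma_c - L_S0 * K_g
    d = (gamma_g - lam * gamma_c + L_S0 * K_g) ** 2 + L_g**2
    return lambda_star, centre - math.sqrt(d), centre + math.sqrt(d)


# Hypothetical constants, chosen only for illustration.
CONSTS = dict(gamma_g=1.0, gamma_c=2.0, L_g=0.5, L_S0=0.3, K_g=1.5)
LAMBDA_STAR = decay_rate_roots(lam=1.0, **CONSTS)[0]

for factor in (1.1, 2.0, 10.0):
    lam = factor * LAMBDA_STAR
    _, a1, a2 = decay_rate_roots(lam=lam, **CONSTS)
    print(f"lambda = {lam:.4f}: a1 = {a1:.4f}, a2 = {a2:.4f}")
```

For every $λ > λ^{*}$ in this example the smaller root is positive and the ordering of case (3) holds, while $a_{1}$ vanishes at $λ = λ^{*}$.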

Remark 5.1

To tune the choice of the retraction factor λ, we now consider the fact that the value of $a$ determines the speed at which $V (r)$ decreases; thus a convenient choice of the retraction factor $λ > λ^{*}$ maximizes $a$ to yield the fastest exponential decay. In our case, this means maximizing $a (λ) = a_{1} = γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} - \sqrt{d (λ)}$ with respect to λ. An elementary computation yields $\frac{d a}{d λ} (λ) = γ_{c} + \frac{γ_{g} γ_{c} - λ γ_{c}^{2} + L_{S_{0}} K_{g} γ_{c}}{\sqrt{d (λ)}} \geq 0$ with equality only if $L_{g} = 0$, thus $λ \mapsto a (λ)$ is monotonically increasing (strictly, if $L_{g} > 0$) and therefore $\begin{aligned} \sup_{λ \in (λ^{*}, \infty)} a (λ) & = \lim_{λ \to \infty} a (λ) = \lim_{λ \to \infty} (γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} - \sqrt{(γ_{g} - λ γ_{c} + L_{S_{0}} K_{g})^{2} + L_{g}^{2}}) \\ & = 2 γ_{g} . \end{aligned}$ In case $L_{g} = 0$, we have $a (λ) = γ_{g} + λ γ_{c} - L_{S_{0}} K_{g} - | γ_{g} - λ γ_{c} + L_{S_{0}} K_{g} |$. Distinguishing the two cases for the absolute value yields the maximal value $a (λ) = 2 γ_{g}$, attained at all $λ \geq (γ_{g} + L_{S_{0}} K_{g}) / γ_{c}$.

This shows that (unless $L_{g} = 0$) the exponential decay is maximized by choosing $λ > λ^{*}$ as large as possible. Nevertheless, in practice, λ should not be chosen too large in order to avoid stiffness of system (17).
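The monotone growth of $a (λ)$ towards $2 γ_{g}$ described in Remark 5.1 can be illustrated with a few evaluations. This is only a sketch under assumed constants; `decay_rate` and the values in `PARAMS` are hypothetical.

```python
import math


def decay_rate(lam, gamma_g, gamma_c, L_g, L_S0, K_g):
    """Evaluate a(lambda) = a_1 from Remark 5.1."""
    d = (gamma_g - lam * gamma_c + L_S0 * K_g) ** 2 + L_g**2
    return gamma_g + lam * gamma_c - L_S0 * K_g - math.sqrt(d)


# Hypothetical constants, chosen only for illustration.
PARAMS = dict(gamma_g=1.0, gamma_c=2.0, L_g=0.5, L_S0=0.3, K_g=1.5)
LAMBDAS = [0.5, 1.0, 2.0, 5.0, 10.0, 100.0, 1000.0]
RATES = [decay_rate(lam, **PARAMS) for lam in LAMBDAS]

for lam, rate in zip(LAMBDAS, RATES):
    print(f"lambda = {lam:7.1f}  a(lambda) = {rate:.6f}")
```

In this example the gain in $a$ flattens quickly once λ is a small multiple of $λ^{*}$, which is consistent with the advice not to choose λ too large.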

Remark 5.2

The proof provides a lower bound for the retraction factor λ, namely $λ > λ^{*} = L_{g}^{2} / (4 γ_{g} γ_{c}) + L_{S_{0}} K_{g} / γ_{c}$. In specific applications, it might not always be possible to explicitly compute all involved constants. If this is the case, an alternative Lyapunov function can be used to derive a lower bound for λ. A possible candidate for this Lyapunov function is $V (r) = j (θ (r)) - j (θ^{*}) + \frac{1}{2} ∥ u_{0} - ϕ_{0} (θ (r)) ∥_{H}^{2} .$ (22) With this choice and computations analogous to those above, different lower bounds involving different constants can be derived. The lower bound for the retraction factor can be estimated as $λ > \hat{λ}^{*} = \hat{L} / γ_{c} + \hat{L}_{g}^{2} / (4 γ_{c})$ with $\hat{L} = \begin{cases} \frac{((S_{0} (θ, u_{0}) - S_{0} (θ, ϕ_{0} (θ))) g (θ, u_{0}), u_{0} - ϕ_{0} (θ))_{V^{*}, V}}{∥ u_{0} - ϕ_{0} (θ) ∥_{V}^{2}}, & \text{if } u_{0} \neq ϕ_{0} (θ), \\ 0, & \text{else}, \end{cases}$ (23) and $\hat{L}_{g} = \begin{cases} \frac{g (θ, ϕ_{0} (θ))^{T} (g (θ, ϕ_{0} (θ)) - g (θ, u_{0}))}{∥ u_{0} - ϕ_{0} (θ) ∥_{V} ∥ g (θ, ϕ_{0} (θ)) ∥}, & \text{if } u_{0} \neq ϕ_{0} (θ) \text{ and } g (θ, ϕ_{0} (θ)) \neq 0, \\ 0, & \text{else} . \end{cases}$ (24) This bound depends on θ, the current parameter estimate during computation, and therefore requires a posteriori adaptation of the retraction factor. A practical implementation of such a retraction factor choice, which involves the evaluation of functionals of the residuals $g (θ, ϕ_{0} (θ)) - g (θ, u_{0})$, $u_{0} - ϕ_{0} (θ)$ as well as of the sensitivities $(S_{0} (θ, u_{0}) - S_{0} (θ, ϕ_{0} (θ))) g (θ, u_{0})$ (which can be done approximately, on a coarser computational mesh, and using adjoint techniques), is subject of future work.
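To make the structure of (23) and (24) concrete, the following sketch evaluates the resulting lower bound for finite-dimensional stand-ins of the involved quantities. It assumes that the $V$-norm and the dual pairing are replaced by the Euclidean norm and inner product, which is a simplification made here and not part of the remark; the function `retraction_lower_bound` and the example vectors are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def retraction_lower_bound(sens_term, state_resid, g_ref, g_cur, gamma_c):
    """Evaluate L_hat / gamma_c + L_hat_g**2 / (4 * gamma_c) from (23)-(24).

    sens_term   : (S_0(theta, u_0) - S_0(theta, phi_0(theta))) g(theta, u_0)
    state_resid : u_0 - phi_0(theta)
    g_ref, g_cur: g(theta, phi_0(theta)) and g(theta, u_0)
    Euclidean norms stand in for the V-norm (an assumption of this sketch).
    """
    resid_norm = norm(state_resid)
    if resid_norm == 0.0:
        return 0.0  # both quotients are defined as zero in this case
    L_hat = dot(sens_term, state_resid) / resid_norm**2
    g_norm = norm(g_ref)
    if g_norm == 0.0:
        L_hat_g = 0.0
    else:
        diff = [a - b for a, b in zip(g_ref, g_cur)]
        L_hat_g = dot(g_ref, diff) / (resid_norm * g_norm)
    return L_hat / gamma_c + L_hat_g**2 / (4.0 * gamma_c)


# Hypothetical example values.
EXAMPLE_BOUND = retraction_lower_bound(
    sens_term=[0.5, 0.0], state_resid=[1.0, 0.0],
    g_ref=[3.0, 4.0], g_cur=[3.0, 3.0], gamma_c=2.0)
print(EXAMPLE_BOUND)
```

In an actual computation, $ϕ_{0} (θ)$ is not available exactly, so the inputs would themselves be approximations, as indicated in the remark.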

Remark 5.3

If the cost function is locally convex but not locally uniformly convex, the parameters are identifiable but not strictly identifiable [Citation19]. In this case, $g (θ, ϕ_{0} (θ))$ defined by (14) is not uniformly monotone, but just monotone, i.e. formally $γ_{g} = 0$. Using a projection $P$ on the orthogonal complement of the null space of $\nabla^{2} j (θ^{*})$ might then facilitate the proof of convergence on this subspace. A possible Lyapunov function in this case is given by $V (r) = \frac{1}{2} ∥ P g (θ, ϕ_{0} (θ)) ∥^{2} + \frac{1}{2} ∥ u_{0} - ϕ_{0} (θ) ∥_{X}^{2}$. Denoting the smallest positive eigenvalue of $(d g / d θ) (θ, ϕ_{0} (θ))$ by $\underline{μ}$, we require that $ξ^{T} P (d g / d θ) (θ, ϕ_{0} (θ)) ξ \leq - \underline{μ} ∥ P ξ ∥^{2}$ for all $ξ \in R^{n_{θ}}$. Then the retraction factor λ should be chosen as $λ > λ^{*} = \hat{L} / γ_{c} + \hat{L}_{g}^{2} / (4 γ_{c} \underline{μ})$, with $\hat{L} = \begin{cases} \frac{((S_{0} (θ, u_{0}) - S_{0} (θ, ϕ_{0} (θ))) g (θ, u_{0}), u_{0} - ϕ_{0} (θ))_{V^{*}, V}}{∥ u_{0} - ϕ_{0} (θ) ∥_{V}^{2}}, & \text{if } u_{0} \neq ϕ_{0} (θ), \\ 0, & \text{else}, \end{cases}$ (25) and $\hat{L}_{g} = \begin{cases} \frac{g (θ, ϕ_{0} (θ))^{T} (g (θ, ϕ_{0} (θ)) - g (θ, u_{0}))}{∥ u_{0} - ϕ_{0} (θ) ∥_{V} ∥ g (θ, ϕ_{0} (θ)) ∥}, & \text{if } u_{0} \neq ϕ_{0} (θ) \text{ and } g (θ, ϕ_{0} (θ)) \neq 0, \\ 0, & \text{else} . \end{cases}$ (26) However, the null space of $\nabla^{2} j (θ^{*})$ depends on the unknown optimal parameter $θ^{*}$ and can in general not be assessed a priori, thus leaving this approach for further investigation.

Remark 5.4

Locality in Theorem 5.1 is only imposed in terms of the size of $ε_{θ}, ε_{u_{0}}$ in Assumptions 5.2–5.6. As a matter of fact, Assumptions 5.2, 5.3, 5.4, 5.6 will typically hold for $(θ, u_{0})$ values in a larger neighbourhood of the solution, possibly after imposing certain constraints on the parameter values such as nonnegativity, in order to ensure that the model defined by $C_{0}$ is valid. Smallness of $ε_{θ}, ε_{u_{0}}$ will mainly be required in Assumption 5.5 to guarantee, together with Assumption 5.1, that the descent direction defined by g really points towards $(θ^{*}, ϕ_{0} (θ^{*}))$.

To enlarge the convergence radius, globalization strategies for the solution of nonlinear equations or optimization problems can be employed. In particular, as the analysis in [Citation20,Citation21] indicates, using a particular time-stepping algorithm together with an appropriate step size control in place of the continuous flow defined by (17) might make it possible to drop the locality assumption in Theorem 5.1. Indeed, with $F (θ, u_{0}) = \begin{pmatrix} - d (θ, u_{0}) \\ - λ C_{0} (θ, u_{0}) \end{pmatrix}, \quad M (θ, u_{0}) = \begin{pmatrix} H (θ, u_{0}) & 0 \\ - S_{0} (θ, u_{0}) H (θ, u_{0}) & I \end{pmatrix}^{- 1},$ for the choice (15), method (17) can at least formally be cast into the framework of [Citation20, Equation (5.1)]; see also the introduction of [Citation21] for the infinite dimensional setting relevant here. However, the detailed conditions from [Citation21] would still have to be verified for our special setting, which will be subject of future research.

5.2. Elliptic and parabolic PDE constraints

As we consider the partially reduced form (18) of the optimization problem with elliptic and parabolic PDE constraints, the results established for the elliptic problem can be easily transferred given the existence of a solution operator for the parabolic problem.

Theorem 5.2

Let Assumptions 2.1, 5.1–5.6 be satisfied with g replaced by $\tilde{g}$ according to (19). Then there exists a $λ^{*} > 0$ such that for all $λ > λ^{*}$ solutions to $\begin{aligned} \frac{d θ}{d r} (r) & = \tilde{g} (θ, u_{0}), & θ (0) & = θ_{0}, \\ \frac{d u_{0}}{d r} (r) & = S_{0} (θ, u_{0}) \tilde{g} (θ, u_{0}) + λ C_{0} (θ, u_{0}), & u_{0} (0) & = u_{0, 0}, \end{aligned}$ (27) are well defined for all $r > 0$ and the local minimizer $(θ^{*}, u_{0}^{*})$ of the optimization problem (4) is a locally exponentially stable steady state of the system (27).

With the setting introduced in the last paragraph of Section 4, the result directly follows from Theorem 5.1.

6. Discussion of the assumptions

In this section, we discuss Assumptions 5.1–5.6 in more detail. In particular, we provide sufficient conditions for these assumptions which show that the assumptions are rather weak and fulfilled by many application problems.

Remark 6.1

Assumption 5.1

For the choice (14) of g, this condition is satisfied due to the identity $0 = - \nabla_{θ} j (θ^{*}) = - \nabla_{θ} J (θ^{*}, ϕ_{0} (θ^{*})) - \frac{\partial J}{\partial u_{0}} (θ^{*}, ϕ_{0} (θ^{*})) S_{0} (θ^{*}, ϕ_{0} (θ^{*})) .$ This remains valid more generally for descent directions defined by (15), such as Newton-type methods.

Remark 6.2

Assumptions 5.2 and 5.3

Dynamical systems in engineering, physics and life sciences will typically exhibit locally exponentially stable steady-state solutions. This implies that these systems are locally uniformly monotone. This is especially true for the optimal parameters $θ^{*}$ and thus (for smooth $C_{0} (θ, u_{0})$ ) also in a neighbourhood of $(θ^{*}, ϕ_{0} (θ^{*}))$ . Hence, around the true parameters, Assumption 5.3 and thus also the monotonicity part of Assumption 5.2 is locally fulfilled for most real-world systems.

An additional consequence of uniform monotonicity, hemicontinuity and coercivity of $C_{0}$ according to Assumption 5.2 is uniform monotonicity, hemicontinuity and coercivity of its linearization and thus, by the Theorem of Browder and Minty, existence and boundedness of the inverse $((\partial C_{0} / \partial u_{0}) (θ^{*}, u_{0}^{*}))^{- 1} : V^{*} \to V$ , is assured, as will be used below.

Remark 6.3

Assumption 5.4

The Lipschitz continuity of $S_{0}$ is satisfied for example if

$\partial C_{0} / \partial u_{0} : R^{n_{θ}} \times V \to L (V, V^{*})$ is continuous,
$\nabla_{θ} C_{0}$ and $\partial C_{0} / \partial u_{0}$ are locally Lipschitz continuous with respect to $u_{0}$ in a neighbourhood of $(θ^{*}, u_{0}^{*})$ , with Lipschitz constants $L_{C_{0}, θ}$ and $L_{C_{0}, u_{0}}$ , respectively,
$\nabla_{θ} C_{0}$ is uniformly bounded on this neighbourhood by $K_{C_{0}, θ}$ ,
$(\partial C_{0} / \partial u_{0}) (θ^{*}, u_{0}^{*})^{- 1}$ is bounded by $K_{C_{0}, u_{0}}$ , as discussed in Remark 6.2,

provided that $C_{0}$ is continuously (Fréchet) differentiable. This can be seen as follows.

We want to show that $∥ S_{0_{i}} (θ, u_{0}^{1}) - S_{0_{i}} (θ, u_{0}^{2}) ∥_{V} \leq L_{S_{i}} ∥ u_{0}^{1} - u_{0}^{2} ∥_{V}$ for all $i \in \{1, \dots, n_{θ}\}$, $θ \in B_{ε_{θ}} (θ^{*})$ and $u_{0}^{1}, u_{0}^{2} \in B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$, where $S_{0_{i}}$ is the $i$th component of $S_{0}$.

Using $S_{0_{i}} (θ, u_{0}) = ((\partial C_{0} / \partial u_{0}) (θ, u_{0}))^{- 1} ((\partial C_{0} / \partial θ_{i}) (θ, u_{0}))$ for $i \in \{1, \dots, n_{θ}\}$ yields $\begin{aligned} & ∥ S_{0_{i}} (θ, u_{0}^{1}) - S_{0_{i}} (θ, u_{0}^{2}) ∥_{V} \\ & \quad \leq ∥ (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{1}))^{- 1} ∥_{V^{*} \to V} ∥ \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}^{1}) - \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}^{2}) ∥_{V^{*}} \\ & \qquad + ∥ (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{1}))^{- 1} - (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{2}))^{- 1} ∥_{V^{*} \to V} ∥ \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}^{2}) ∥_{V^{*}} . \end{aligned}$ With $A^{- 1} - B^{- 1} = A^{- 1} (B - A) B^{- 1}$, we can further estimate $\begin{aligned} & ∥ S_{0_{i}} (θ, u_{0}^{1}) - S_{0_{i}} (θ, u_{0}^{2}) ∥_{V} \\ & \quad \leq ∥ (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{1}))^{- 1} ∥_{V^{*} \to V} \Big( ∥ \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}^{1}) - \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}^{2}) ∥_{V^{*}} \\ & \qquad + ∥ \frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{2}) - \frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{1}) ∥_{V \to V^{*}} ∥ (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{2}))^{- 1} ∥_{V^{*} \to V} ∥ \frac{\partial C_{0}}{\partial θ_{i}} (θ, u_{0}^{2}) ∥_{V^{*}} \Big) . \end{aligned}$ The fact that the inverses $((\partial C_{0} / \partial u_{0}) (θ, u_{0}^{1}))^{- 1}$, $((\partial C_{0} / \partial u_{0}) (θ, u_{0}^{2}))^{- 1}$ exist and are bounded follows from the regularity of $(\partial C_{0} / \partial u_{0}) (θ^{*}, ϕ_{0} (θ^{*}))$ and a perturbation argument.
Take $L = (\partial C_{0} / \partial u_{0}) (θ^{*}, ϕ_{0} (θ^{*}))$ and $M_{i} = (\partial C_{0} / \partial u_{0}) (θ, u_{0}^{i})$, $i = 1, 2$. By continuity of $(\partial C_{0} / \partial u_{0}) (θ, u_{0})$ in a neighbourhood of $(θ^{*}, u_{0}^{*})$ and by possibly decreasing $ε_{θ}$, $ε_{u_{0}}$, we get $∥ M_{i} - L ∥ = ∥ \frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{i}) - \frac{\partial C_{0}}{\partial u_{0}} (θ^{*}, ϕ_{0} (θ^{*})) ∥_{V \to V^{*}} < \frac{1}{∥ L^{- 1} ∥_{V^{*} \to V}}$ for any $(θ, u_{0}^{i}) \in B_{ε_{θ}} (θ^{*}) \times B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$. Therewith the operator $M_{i}^{- 1} = ((\partial C_{0} / \partial u_{0}) (θ, u_{0}^{i}))^{- 1}$ exists and is bounded by $∥ M_{i}^{- 1} ∥_{V^{*} \to V} = ∥ (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{i}))^{- 1} ∥_{V^{*} \to V} \leq \frac{∥ L^{- 1} ∥}{1 - ∥ L^{- 1} ∥ ∥ L - M_{i} ∥} \leq K_{C_{0}, u_{0}} .$ Using this bound and local Lipschitz continuity with respect to $u_{0}$ of the derivatives $\partial C_{0} / \partial θ_{i}$ and $\partial C_{0} / \partial u_{0}$ results in $∥ S_{0_{i}} (θ, u_{0}^{1}) - S_{0_{i}} (θ, u_{0}^{2}) ∥_{V} \leq K_{C_{0}, u_{0}} (L_{C_{0}, θ_{i}} + K_{C_{0}, u_{0}} K_{C_{0}, θ_{i}} L_{C_{0}, u_{0}}) ∥ u_{0}^{1} - u_{0}^{2} ∥_{V}$ and hence $∥ S_{0} (θ, u_{0}^{1}) - S_{0} (θ, u_{0}^{2}) ∥_{V^{n_{θ}}} \leq L_{S_{0}} ∥ u_{0}^{1} - u_{0}^{2} ∥_{V}$ with $L_{S_{0}} = \sum_{i = 1}^{n_{θ}} K_{C_{0}, u_{0}} (L_{C_{0}, θ_{i}} + K_{C_{0}, u_{0}} K_{C_{0}, θ_{i}} L_{C_{0}, u_{0}})$, where $K_{C_{0}, θ_{i}}$ denotes the bound of the derivative of the steady-state residual with respect to $θ_{i}$.
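The perturbation argument used above, namely that $∥ L - M ∥ < 1 / ∥ L^{- 1} ∥$ implies invertibility of $M$ with $∥ M^{- 1} ∥ \leq ∥ L^{- 1} ∥ / (1 - ∥ L^{- 1} ∥ ∥ L - M ∥)$, can be illustrated in finite dimensions. The sketch below uses $2 \times 2$ matrices and the maximum row sum norm as stand-ins for the operators and operator norms; the matrices `L_REF` and `M_PERT` are hypothetical.

```python
def inf_norm(A):
    """Maximum absolute row sum, an induced (submultiplicative) matrix norm."""
    return max(sum(abs(x) for x in row) for row in A)


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical reference operator L and perturbed operator M.
L_REF = [[4.0, 1.0], [0.0, 3.0]]
M_PERT = [[4.2, 0.9], [0.1, 3.1]]

norm_L_inv = inf_norm(inverse_2x2(L_REF))
norm_diff = inf_norm([[l - m for l, m in zip(row_l, row_m)]
                      for row_l, row_m in zip(L_REF, M_PERT)])

# Smallness condition ||L - M|| < 1 / ||L^{-1}|| of the perturbation argument.
if norm_L_inv * norm_diff >= 1.0:
    raise ValueError("perturbation too large for the bound to apply")

bound = norm_L_inv / (1.0 - norm_L_inv * norm_diff)
norm_M_inv = inf_norm(inverse_2x2(M_PERT))
print(f"||M^-1|| = {norm_M_inv:.4f} <= bound = {bound:.4f}")
```

In the setting of this remark, the right-hand side of the bound plays the role of the constant $K_{C_{0}, u_{0}}$.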

Remark 6.4

Assumption 5.5

If the descent direction g is defined by (14), then, according to [Citation19], Assumption 5.5 is equivalent to practical identifiability, which implies local structural identifiability of the parameter vector θ. The same holds true for more general choices (15), such as Newton-type methods, under a uniform positivity condition on $H (θ, u_{0})$.

If local structural identifiability cannot be guaranteed, one can still use regularization [Citation22], e.g. by adding a term $α (θ - θ_{p})^{T} Γ^{- 1} (θ - θ_{p})$ with positive definite Γ and positive α to the cost function $J (θ, u_{0})$ in (5), for which (17) yields a minimizer $(θ^{*} (α), u_{0}^{*} (α))$. Regularization theory, e.g. [Citation22], provides convergence of $θ^{*} (α)$ to a parameter $\bar{θ}$ that is consistent with the observations as $α \to 0$.

Remark 6.5

Assumption 5.6

As an example, we check Assumption 5.6 for the case of a gradient descent-based update of θ, i.e. g according to (14). We find $\begin{aligned} & ∥ g (θ, u_{0}^{1}) - g (θ, u_{0}^{2}) ∥ \\ & \quad = ∥ - \nabla_{θ} J (θ, u_{0}^{1}) - \frac{\partial J}{\partial u_{0}} (θ, u_{0}^{1}) S_{0} (θ, u_{0}^{1}) + \nabla_{θ} J (θ, u_{0}^{2}) + \frac{\partial J}{\partial u_{0}} (θ, u_{0}^{2}) S_{0} (θ, u_{0}^{2}) ∥ \\ & \quad \leq ∥ \nabla_{θ} J (θ, u_{0}^{1}) - \nabla_{θ} J (θ, u_{0}^{2}) ∥ + ∥ \frac{\partial J}{\partial u_{0}} (θ, u_{0}^{1}) - \frac{\partial J}{\partial u_{0}} (θ, u_{0}^{2}) ∥_{V \to R} ∥ S_{0} (θ, u_{0}^{1}) ∥_{V^{n_{θ}}} \\ & \qquad + ∥ \frac{\partial J}{\partial u_{0}} (θ, u_{0}^{2}) ∥_{V \to R} ∥ S_{0} (θ, u_{0}^{1}) - S_{0} (θ, u_{0}^{2}) ∥_{V^{n_{θ}}} . \end{aligned}$ Here, $S_{0}$ is Lipschitz continuous according to Remark 6.3. Boundedness of $S_{0}$ also holds under the assumptions made in Remark 6.3, namely that $((\partial C_{0} / \partial u_{0}) (θ, u_{0}))^{- 1}$ and $\nabla_{θ} C_{0}$ are locally uniformly bounded: $\begin{aligned} ∥ S_{0} (θ, u_{0}^{1}) ∥_{V^{n_{θ}}} & \leq ∥ (\frac{\partial C_{0}}{\partial u_{0}} (θ, u_{0}^{1}))^{- 1} ∥_{V^{*} \to V} ∥ \nabla_{θ} C_{0} (θ, u_{0}^{1}) ∥_{(V^{*})^{n_{θ}}} \\ & \leq K_{C_{0}, u_{0}} K_{C_{0}, θ} =: K_{S_{0}} . \end{aligned}$ (28) If J is differentiable and the derivatives $\nabla_{θ} J$ and $\partial J / \partial u_{0}$ are locally Lipschitz continuous with respect to $u_{0}$ with Lipschitz constants $L_{J, θ; u_{0}}$ and $L_{J, u_{0}; u_{0}}$, respectively, and $\partial J / \partial u_{0}$ is uniformly bounded by $K_{J, u_{0}}$ on $B_{ε_{θ}} (θ^{*}) \times B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$, we can conclude $∥ g (θ, u_{0}^{1}) - g (θ, u_{0}^{2}) ∥ \leq L_{g} ∥ u_{0}^{1} - u_{0}^{2} ∥_{V},$ with $L_{g} = L_{J, θ; u_{0}} + K_{J, u_{0}} L_{S_{0}} + L_{J, u_{0}; u_{0}} K_{S_{0}}$.

The more general choice (15) additionally requires Lipschitz continuity of $H (θ, u_{0})$, which in case of Newton-type methods with (16) amounts to higher smoothness of J and $C_{0}$.

Thus we have shown the following proposition.

Proposition 6.1

Let Assumption 6.1 be satisfied. Then there exist $ε_{θ}, ε_{u_{0}} > 0$ such that on $B_{ε_{θ}} (θ^{*}) \times B_{ε_{u_{0}}} (ϕ_{0} (θ^{*}))$, Assumptions 5.1–5.6 are fulfilled.

Assumption 6.1

1. $C_{0} (θ, \cdot)$ is locally uniformly monotonically decreasing.
2. $C_{0}$ is continuous and continuously (Fréchet) differentiable in a neighbourhood of $(θ^{*}, u_{0}^{*})$.
3. $\nabla_{θ} C_{0}$ and $\partial C_{0} / \partial u_{0}$ are locally Lipschitz continuous w.r.t. $u_{0}$ and $\nabla_{θ} C_{0}$ is bounded in a neighbourhood of $(θ^{*}, u_{0}^{*})$.
4. $g (θ, u_{0})$ is defined by (14) and θ is locally structurally identifiable.
5. J is continuously (Fréchet) differentiable in a neighbourhood of $(θ^{*}, u_{0}^{*})$.
6. $\partial J / \partial u_{0}$ and $\nabla_{θ} J$ are locally Lipschitz continuous with respect to $u_{0}$.
7. $\partial J / \partial u_{0}$ is bounded in a neighbourhood of $(θ^{*}, u_{0}^{*})$.

7. Application

To illustrate the continuous analogue of the descent method, we use it to study CCL21 gradient formation in biological tissues. This process is highly relevant in immune response [Citation23,Citation24] and described by a reaction–diffusion equation [Citation25]. In the following, we outline the model, estimate its parameters using the approach proposed in this paper and analyse the results.

7.1. Model formulation

CCL21 gradients are necessary for the guidance of dendritic cells towards lymphatic vessels [Citation26]. They are formed by the combination of several biological processes. The chemokine CCL21 is produced in the lymphatic vessels, which cover a subdomain $Ω_{L}$ of the domain of interest Ω, $Ω_{L} \subset Ω$. The source term is defined via the function $Q (x) = \begin{cases} 1, & \text{for } x \in Ω_{L}, \\ 0, & \text{otherwise} . \end{cases}$ The concentration of free CCL21 is denoted by u. Free CCL21 binds to a sugar whose concentration is denoted by s. The binding yields immobilized CCL21 whose concentration is denoted by c. The parameters $k_{1}, k_{- 1}, D, γ, α$ denote the binding and unbinding rates, the diffusion coefficient, the degradation rate and the production rate of CCL21 from the lymphatic vessels, respectively. A PDE model for the process has been developed in [Citation25] and is given by $\begin{aligned} u_{t} - D Δ u & = α Q - k_{1} u s + k_{- 1} c - γ u, \\ \dot{s} & = - k_{1} u s + k_{- 1} c, \\ \dot{c} & = k_{1} u s - k_{- 1} c, \end{aligned}$ (29) for $t \in ] 0, T [$ and $x \in Ω$, with initial conditions $u (0, x) = c (0, x) = 0$, $s (0, x) = s_{0}$ and no-flux boundary conditions $(\partial / \partial ν) u = 0$, where ν is the outer normal on the boundary of Ω. The parameter $s_{0}$ denotes the initial sugar concentration.

As the formation of the gradient is apparently fast, we consider the steady state of (29). By considering the equations for the time evolution of s and c, we find that $c = s_{0} u_{s} / (1 + u_{s})$ and $s = s_{0} / (u_{s} + 1)$ with the scaled CCL21 concentration $u_{s} := k_{1} u / k_{- 1}$. Using the additional reformulation $\tilde{D} = D / γ$, $\tilde{α} = α k_{1} / (γ k_{- 1})$, the scaled steady-state concentration of CCL21, $u_{s}$, has to fulfil $0 = \tilde{D} Δ u_{s} + \tilde{α} Q - u_{s}$ and the boundary conditions $(\partial / \partial ν) u_{s} = 0$.
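The steady-state relations stated above can be retraced in a few lines. The following derivation is a sketch that uses only model (29) and its initial conditions; it is not quoted from [Citation25].

```latex
% Conservation of sugar: adding the equations for s and c in (29) gives
\dot{s} + \dot{c} = 0
  \quad\Longrightarrow\quad s(t,x) + c(t,x) = s(0,x) + c(0,x) = s_0 .
% Steady state of the binding reaction, with u_s := k_1 u / k_{-1}:
0 = k_1 u s - k_{-1} c
  \quad\Longrightarrow\quad c = u_s \, s = u_s \, (s_0 - c)
  \quad\Longrightarrow\quad c = \frac{s_0 u_s}{1 + u_s}, \qquad s = \frac{s_0}{1 + u_s} .
% The binding terms cancel in the steady-state equation for u:
0 = D \Delta u + \alpha Q - \gamma u .
% Multiplying by k_1 / (\gamma k_{-1}):
0 = \frac{D}{\gamma} \Delta u_s + \frac{\alpha k_1}{\gamma k_{-1}} Q - u_s
  = \tilde{D} \Delta u_s + \tilde{\alpha} Q - u_s .
```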

For the considered process, imaging data have been collected [Citation27]. These images provide information about the localization of the lymphatic vessels (encoded in Q) and the concentration of immobilized CCL21. As the measured intensity values are corrupted by background fluorescence and as the data are not normalized, we model the readout following [Citation25] as $y_{i} = s_{l} (b + \int_{A_{i}} c (t, x) d x),$ where b denotes the intensity of the background fluorescence, $s_{l}$ is a scaling constant and $A_{i} \subset Ω$ is the domain of pixel i. As the parameters $s_{l}$, b and $s_{0}$ are structurally non-identifiable, we reformulate the model in terms of the products $\tilde{b} = s_{l} b$ and $\tilde{s}_{0} = s_{l} s_{0}$ and consider only $\tilde{b}$ and $\tilde{s}_{0}$ in the parameter estimation.

The optimization problem is then given by $\begin{aligned} \min_{θ, u_{s}} \quad & J (θ, u_{s}) = \frac{1}{2} \sum_{i = 1}^{M} \{ \log (2 π σ_{i}^{2} \bar{y}_{i}) + (\frac{\log (\bar{y}_{i}) - \log (y_{i})}{σ_{i}})^{2} \} \\ \text{s.t.} \quad & - \tilde{D} Δ u_{s} + u_{s} = \tilde{α} Q, \quad x \in Ω, \\ & \frac{\partial}{\partial ν} u_{s} = 0, \quad x \in \partial Ω, \\ & y_{i} = \tilde{b} + \int_{A_{i}} \frac{\tilde{s}_{0} u_{s} (x)}{u_{s} (x) + 1} d x = \tilde{b} + \tilde{s}_{0} h_{i} (u_{s}), \end{aligned}$ (30) where $Ω \subseteq R^{2}$, $σ_{i}$ is the scale parameter of the log-normally distributed measurement error and $h_{i} (u_{s}) = \int_{A_{i}} (u_{s} (x) / (u_{s} (x) + 1)) d x$, $i = 1, \dots, M$. The parameter vector θ is given by $θ = (\tilde{D}, \tilde{α}, \tilde{s}_{0}, \tilde{b}, σ) \in R^{n_{θ}}$, with $n_{θ} = 5$.
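The readout and the objective function in (30) can be sketched as follows. The sketch assumes one value of $u_{s}$ per pixel, so that $h_{i}$ is approximated by the pixel area times the integrand, and a single scale parameter σ for all pixels; both simplifications, as well as the function names `readout` and `objective`, are assumptions made here. The objective reproduces (30) as written there.

```python
import math


def readout(u_s_pixels, pixel_area, s0_tilde, b_tilde):
    """y_i = b_tilde + s0_tilde * h_i(u_s), with one u_s value per pixel."""
    return [b_tilde + s0_tilde * pixel_area * u / (u + 1.0) for u in u_s_pixels]


def objective(y_model, y_meas, sigma):
    """Objective of (30) as written there, with one sigma for all pixels."""
    total = 0.0
    for y, y_bar in zip(y_model, y_meas):
        if y <= 0.0 or y_bar <= 0.0:
            # The log-transformation is only defined for positive values.
            raise ValueError("log-transformed output requires positive values")
        total += math.log(2.0 * math.pi * sigma**2 * y_bar)
        total += ((math.log(y_bar) - math.log(y)) / sigma) ** 2
    return 0.5 * total


# Hypothetical pixel values and parameters.
Y_EXAMPLE = readout([0.5, 1.0, 3.0], pixel_area=1.0, s0_tilde=2.0, b_tilde=0.1)
print(objective(Y_EXAMPLE, Y_EXAMPLE, sigma=0.2))
```

The explicit check for non-positive values corresponds to the situation in which the objective function would otherwise become complex-valued.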

All parameters are assumed to be non-negative due to their biological meaning. The spaces V and $V^{*}$ for which we examine the problem are $V = H^{1} (Ω)$ and $V^{*} = H^{1} (Ω)^{*}$. The operator $C_{0}$ is given by $C_{0} (θ, u_{s}) = \tilde{D} Δ u_{s} - u_{s} + \tilde{α} Q$. For these spaces and operators, it can easily be checked that all assumptions for applying the method (17) are satisfied.
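To illustrate the operator $C_{0}$, the following sketch evaluates a discrete residual on a uniform one-dimensional grid with a second-order finite-difference Laplacian and mirrored ghost nodes for the no-flux boundary condition. This is a simplified stand-in for a discretization on the two-dimensional domain Ω; the function `steady_state_residual` and all values are hypothetical.

```python
def steady_state_residual(u, q, d_tilde, alpha_tilde, h):
    """Discrete C_0 = D~ * Laplace(u) - u + alpha~ * Q on a uniform 1D grid.

    Requires at least two grid points; h is the grid spacing.
    """
    n = len(u)
    residual = []
    for i in range(n):
        # Mirrored ghost nodes enforce the zero-flux boundary condition.
        left = u[i - 1] if i > 0 else u[1]
        right = u[i + 1] if i < n - 1 else u[n - 2]
        laplace = (left - 2.0 * u[i] + right) / h**2
        residual.append(d_tilde * laplace - u[i] + alpha_tilde * q[i])
    return residual


# With a source on the whole domain, the constant state u_s = alpha~ is steady.
RESIDUAL = steady_state_residual(
    u=[0.7] * 5, q=[1.0] * 5, d_tilde=0.3, alpha_tilde=0.7, h=0.25)
print(RESIDUAL)
```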

7.2. Numerical implementation

For the numerical simulation of the biological process, we employed a finite element discretization of the PDE model. The discretization was obtained using the MATLAB PDE toolbox and accounts for the stationary topology of the model (Figure 3A). The mesh consists of 2170 elements and the concentrations in these elements are the state variables of the discretization. For parameter optimization using the coupled ODE–PDE model (17), the same mesh was employed and the states of the discretized PDE were coupled with the ODE for the parameters. This yields a model with 2170+5 equations. The simulation-based method for parameter estimation was implemented in MATLAB extending the routine published in [Citation16]. The numerical simulation was performed using the MATLAB ODE solver ode15s, an implicit scheme applicable to stiff problems. To accelerate the calculations, we implemented the Jacobian of the coupled ODE–PDE model. The simulation of the continuous analogue was terminated if the gradient of the right-hand side became small, i.e. $∥ C_{0} (θ, u_{0}) ∥_{V^{*}} / ∥ u_{0} ∥_{V} \leq 10^{- 6}$. Furthermore, simulations were interrupted whenever the objective function value became complex, which can happen due to the log-transformation of the output.
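The coupling of the parameter ODE with the state equation in (17) and a relative termination criterion can be illustrated on a scalar toy problem. The sketch below is not the implementation described above: it assumes the toy model $C_{0} (θ, u) = θ - u$ and $J (θ, u) = \frac{1}{2} (u - y)^{2}$, uses explicit Euler steps instead of an implicit stiff solver, and adds a check on g to the stopping rule; all names and values are hypothetical.

```python
def simulate_continuous_analogue(theta0, u0, y_obs, lam,
                                 step=1e-2, tol=1e-6, max_steps=100_000):
    """Explicit Euler integration of system (17) for a scalar toy problem.

    Toy model (an assumption, not the CCL21 model):
      C_0(theta, u) = theta - u, so phi_0(theta) = theta and S_0 = 1,
      J(theta, u)   = 0.5 * (u - y_obs)**2.
    """
    theta, u = theta0, u0
    for n_steps in range(max_steps):
        c0 = theta - u          # steady-state residual C_0(theta, u)
        s0 = 1.0                # sensitivity of the steady state w.r.t. theta
        g = -(u - y_obs) * s0   # g = -dJ/dtheta - dJ/du * S_0
        # Relative residual criterion as in the text, plus a check on g
        # (the second check is an addition for this toy example).
        if abs(c0) <= tol * max(abs(u), 1e-12) and abs(g) <= tol:
            return theta, u, n_steps
        theta += step * g
        u += step * (s0 * g + lam * c0)
    raise RuntimeError("no convergence within max_steps")


THETA_HAT, U_HAT, N_STEPS = simulate_continuous_analogue(
    theta0=0.0, u0=0.5, y_obs=2.0, lam=5.0)
print(THETA_HAT, U_HAT, N_STEPS)
```

For this linear toy system the two decay rates are 1 and λ, so a large retraction factor forces a small explicit step size, which is one way to see why an implicit solver is a natural choice for the full problem.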

Figure 3. (A) Geometry of a lymphoid vessel obtained from biological imaging data [Citation27]. (B) Simulated data of the CCL21 gradient generated by simulating model (29).

7.3. Simulated data

To evaluate the convergence properties of the proposed algorithm for the models, we considered published simulated data for the ground truth (similar to [Citation28]). The geometry of lymphatic tissue was extracted from the available imaging data [Citation27] using the MATLAB PDE toolbox. On this geometry, the discretized PDE was simulated using biologically plausible parameter values (Table 1). The simulated data for the CCL21 gradient formation process were corrupted by noise to obtain a plausible scenario (Figure 3B).

Table 1. True parameters, estimated parameters and parameter ranges for the Latin hypercube sampling for the CCL21 model.

7.4. Optimization

The objective function for most parameter estimation problems is non-convex and can be multimodal. For this reason, we employed multi-start local optimization using the continuous analogue, for which we have established local convergence in this paper. The starting points for the local optimizations were sampled using a Latin hypercube approach with lower and upper bounds provided in Table 1. We used a linear parametrization for the states and a log-parametrization $ξ = \log(θ)$ for the parameters, following previous evaluations for biochemical systems [Citation29]. We did not implement any bounds on the values of parameters or states. The implementation of the multi-start local optimization is based upon the MATLAB toolbox PESTO [Citation30]. The implementation of the objective function and finite element schemes was adapted from [Citation25]. For the local optimization with the continuous analogue, we chose the negative gradient as descent direction.
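The sampling of starting points described above can be sketched as follows. This is an illustrative Python sketch rather than the PESTO routine used in the study; the bounds, the number of starts and the function name `latin_hypercube` are hypothetical, and $\log$ is assumed here to be the natural logarithm.

```python
import math
import random


def latin_hypercube(n_starts, bounds, seed=0):
    """Draw n_starts points so that each dimension has exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        strata = list(range(n_starts))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        width = (upper - lower) / n_starts
        # One uniformly distributed point inside each of the n_starts equal-width strata.
        columns.append([lower + (k + rng.random()) * width for k in strata])
    return [tuple(col[i] for col in columns) for i in range(n_starts)]


# Hypothetical bounds on xi = log(theta) for two parameters (not the values of Table 1).
xi_bounds = [(-5.0, 3.0), (-2.0, 2.0)]
xi_starts = latin_hypercube(10, xi_bounds)

# Back-transformation to the linear scale; theta is positive by construction.
theta_starts = [tuple(math.exp(x) for x in xi) for xi in xi_starts]
```

Sampling in $ξ$ rather than in $θ$ spreads the starting points evenly across orders of magnitude, which is one common motivation for the log-parametrization.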

As a reference, we also performed multi-start local optimization using a discrete iterative optimization method. We used the state-of-the-art optimizer fmincon.m with the interior point algorithm implemented in the MATLAB Optimization Toolbox, started from the same points that were sampled for the continuous analogue. This interior point algorithm employs either a Newton step, where the Hessian is approximated by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, or a conjugate gradient step using a trust region [Citation31–33]. The optimizer was provided with the objective function, the nonlinear constraint, as well as the corresponding derivatives. We used the same parametrization as for the continuous analogue and additionally constrained the parameter values $ξ = \log(θ)$ in the optimization by the same upper and lower bounds used for the sampling (Table 1). The value of $u_{0}$ at the nodes of the finite element mesh was constrained to lie in $[-1, 3]$. At most 2000 iterations and 4000 function evaluations were allowed per run.

7.5. Comparison of continuous analogue and discrete iterative procedure

We performed 100 local optimization runs each with the continuous analogue, using a retraction factor $λ = 10^{7}$, and with the discrete iterative method (Figure 4A). Both methods found the same best parameter value (Table 1) and achieved a good fit to the data. The assessment of the results revealed a good convergence of the continuous analogue. Almost 90% of the runs achieved an objective function value which was comparable with the best objective function value found across all runs (relative difference $< 0.001\%$). Overall, 96% of the runs finished successfully, meaning that either the optimization was stopped because the stopping criterion was fulfilled or the maximum number of iterations was reached, while 4% of the runs stopped prematurely.
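The bookkeeping behind these percentages can be sketched as follows. The function name `summarize_runs` and the objective values are hypothetical and serve only to illustrate the relative-difference criterion of 0.001%; a run that stopped prematurely is represented by `None`.

```python
import math


def summarize_runs(final_values, rel_tol=1e-5):
    """Return the best objective value and the fractions of finished and converged runs.

    rel_tol=1e-5 corresponds to a relative difference of 0.001 %.
    """
    finished = [v for v in final_values if v is not None and math.isfinite(v)]
    best = min(finished)
    # "<=" instead of "<" so that the best run itself always counts as converged.
    converged = [v for v in finished if abs(v - best) <= rel_tol * abs(best)]
    n_runs = len(final_values)
    return best, len(finished) / n_runs, len(converged) / n_runs


# Hypothetical final objective values of five runs (not data from this study).
values = [12.5, 12.50001, 12.6, 30.0, None]
best, frac_finished, frac_converged = summarize_runs(values)
```

Both fractions are taken relative to all started runs, so that prematurely stopped runs lower the reported convergence rate.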

Figure 4. Results of parameter estimation for CCL21 model. (A) Sorted objective function values for the multi-start optimization with continuous analogue ($λ = 10^{7}$) and discrete iterative procedure. Converged runs are indicated in blue. (B) CPU time needed per optimizer run for the optimization using the continuous analogue and the discrete iterative procedure (lighter grey colour indicates runs which stopped because the maximal number of iterations was reached). The box covers the range between the 25th and the 75th percentile of the distribution. The median CPU time is indicated by a line. (C) Histogram of values for ${\hat{λ}}^{*}$ (Remark 5.2) obtained for 1000 points sampled in parameter-state space. (D) Percentage of completed runs (top), converged runs (middle) and median as well as 25th and 75th percentile of the runtime of completed runs (bottom) for different values of λ. For each value of λ, 100 local optimization runs were performed.

The discrete iterative optimization converged to the optimal value for 66% of the runs (Figure 4A). Accordingly, its success rate was substantially lower than that of the proposed continuous analogue. Of the runs which did not converge to the global optimum, 25 were stopped because the maximal number of iterations was reached.

For the considered problem, the continuous analogue outperformed the discrete iterative method regarding the CPU time (Figure 4B). We found a median CPU time of 15 minutes for the continuous solver and 174 minutes for the discrete iterative procedure. In light of the fact that the discrete iterative method uses second-order information, it is interesting to observe that a continuous analogue using the negative gradient is more efficient. One possible explanation is that the efficiency of the continuous analogue is a result of the application of sophisticated numerical solvers. The adaptive, implicit solver ode15s, which is provided with the analytical Jacobian of the ODE–PDE model, might facilitate large step-sizes and fast convergence. Indeed, the Jacobian also provides second-order information.

7.6. Evaluation of retraction factor influence

As an analytical calculation of the bound for the retraction factor was not possible, we sampled 1000 points in parameter-state space and evaluated the estimate for the lower bound ${\hat{λ}}^{*}$ (Remark 5.2). The histogram of the resulting values for ${\hat{λ}}^{*}$ is presented in Figure 4C. The values for ${\hat{λ}}^{*}$ span many orders of magnitude, and the distribution peaks at $10^{4}$. This result indicates that very different retraction factors might be ideal for different starting points.

To investigate the convergence properties for the different values of the retraction factor λ, we performed 100 local optimization runs for a range of different retraction factors. For each retraction factor, we assessed the number of completed runs and the number of converged runs (Figure 4D). Interestingly, as λ was increased the percentage of completed runs decreased. Yet, for large retraction factors many of the completed runs also converged, while for small retraction factors no runs converged as the maximum number of iterations becomes too large. The median CPU time for the optimization of one run decreased for increasing values of λ (Figure 4D). Notably, for small values of λ, the median CPU time was roughly six to seven times higher than the smallest median CPU time observed. The quantiles indicate that the variability was also higher for small values of λ. These results indicate that the retraction factor should be chosen large enough but not too large.

In summary, the analysis of the model of CCL21 gradient formation revealed that the retraction factor λ has a substantial influence on the convergence properties as well as the run time. For low values of λ, starts did not converge, while for large values of λ, increasing stiffness of the problem could be observed. The best convergence properties were found in an intermediate regime, which could here also be identified by random sampling.
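The stiffness observed for large λ can be illustrated on the retraction term alone. The sketch below is an assumption-laden toy, not the solver used in the study: it integrates the scalar equation $du/dr = λ(θ - u)$ with fixed θ, and uses implicit Euler, the simplest implicit scheme, as a stand-in for the variable-order implicit solver ode15s.

```python
def explicit_euler(u, theta, lam, h, n_steps):
    """Explicit Euler for du/dr = lam * (theta - u); stable only if h * lam < 2."""
    for _ in range(n_steps):
        u = u + h * lam * (theta - u)
    return u


def implicit_euler(u, theta, lam, h, n_steps):
    """Implicit Euler for the same equation; stable for every step size h > 0."""
    for _ in range(n_steps):
        # Solve u_new = u + h * lam * (theta - u_new) for u_new (linear, closed form).
        u = (u + h * lam * theta) / (1.0 + h * lam)
    return u


lam, h = 1e7, 1e-3  # h * lam = 1e4, far outside the explicit stability region
u_explicit = explicit_euler(0.0, 1.0, lam, h, 10)
u_implicit = implicit_euler(0.0, 1.0, lam, h, 10)
```

With these settings the explicit iterates grow without bound, whereas the implicit iterates approach $θ$ within a few steps. This is consistent with the use of an implicit solver for large retraction factors, although it does not by itself explain the decreasing share of completed runs.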

8. Conclusion and outlook

Parameter estimation is an important problem in a wide range of applications. Robustness and performance of the available iterative methods are, however, often limited. In this study, we introduced continuous analogues of descent methods for optimization with PDE constraints. For these continuous analogues, we proved local convergence of their solutions to the optima. The necessary assumptions are fulfilled for a wide range of application problems, rendering the results interesting for several research fields.

We demonstrated the applicability of continuous analogues for a model of gradient formation in biological tissues and compared them with an iterative discrete procedure. The results highlight the potential of the continuous analogues, e.g. a high convergence rate and lower computation times than the discrete iterative procedure. For the comparison, we used the MATLAB optimization routine fmincon.m, a state-of-the-art discrete iterative procedure. Alternatives would be IPOPT or KNITRO. As fmincon is a generic interior point method, there may well be approaches which are more efficient for the considered PDE-constrained problems (see also the no free lunch theorem [Citation34]). The evaluation of the influence of the retraction factor revealed the importance of an appropriate choice of the retraction factor as well as the issue of premature stopping. In this study, we provide a lower bound for λ which ensures local convergence. As this bound might, however, be conservative and can only be assessed pointwise, the use of adaptive methods might be interesting. To address the issue of premature stopping, bounds for parameters and state variables have to be implemented, e.g. by including log-barrier functions [Citation35] in the objective function or through projection into the feasible space.
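As one possible reading of the log-barrier option mentioned above, following the standard construction in [Citation35], box constraints $\underline{ξ}_{i} < ξ_{i} < \overline{ξ}_{i}$ on the parameters could be folded into the objective function as shown below. The symbol $J$ for the original objective function, the barrier weight $μ > 0$ and the bound symbols are notation assumed here for illustration; analogous terms for the state variables are omitted.

$J_{μ}(ξ, u_{0}) = J(ξ, u_{0}) - μ \sum_{i} \left[ \log(ξ_{i} - \underline{ξ}_{i}) + \log(\overline{ξ}_{i} - ξ_{i}) \right]$

The barrier terms grow without bound as a parameter approaches either limit, so that, at least for the exact continuous trajectory, a descent path started inside the box would remain inside it. Smaller values of $μ$ give a closer approximation to the original objective.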

In the application problem, we only considered elliptic PDE constraints because, for the proposed continuous analogues, parabolic constraints can be encapsulated in the objective function. This changes the objective function landscape and indirectly influences the convergence. Conceptually, it should also be possible to formulate continuous analogues which do not require a solution operator for the parabolic PDE but instead have the solution of the parabolic PDE as a state variable. This mathematically more elegant approach is left for future research.

In conclusion, this study presented continuous analogues for a new problem class. Similar to other problem classes for which continuous analogues have been established [Citation15,Citation16], we expect an improvement of convergence and computation time. The continuous analogues for optimization also complement recent work on simulation-based methods for uncertainty analysis [Citation36]. The efficient implementation of these methods in easily accessible software packages should be a focus of future research as it would render the methods available to a broad community.

The method and its analysis apply as they are to the case of infinite dimensional parameters θ. However, in that situation, the inverse problem of identifying θ is often ill-posed, so the assumption of practical identifiability (cf. Assumption 5.5 and Remark 6.4) might not be satisfied. To restore stability, regularization can be employed, as pointed out in Remark 6.4.

Future research in this context will be concerned with globalization strategies, such as those proposed in [Citation20,Citation21], cf. Remark 5.4.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

B.K. acknowledges financial support by the Austrian Science Fund FWF under grants I2271 and P30054. A.F. and J.H. acknowledge financial support by the German Research Foundation (DFG) under grant HA7376/1-1.

References

[1] Isakov V. Inverse problems for partial differential equations. 2nd ed. New York, NY: Springer; 2006. (Applied Mathematical Sciences; 127).
[2] Tarantola A. Inverse problem theory and methods for model parameter estimation. Philadelphia: SIAM Society for Industrial and Applied Mathematics; 2005.
[3] Banks H, Kunisch K. Estimation techniques for distributed parameter systems. Boston, Basel, Berlin: Birkhäuser; 1989. (Systems & Control: Foundations & Applications; 1).
[4] Bock H, Carraro T, Jäger W, et al. Model based parameter estimation: theory and applications. Berlin, Heidelberg: Springer; 2013. (Contributions in Mathematical and Computational Sciences; 4).
[5] Carvalho EP, Martínez J, Martínez JM, et al. On optimization strategies for parameter estimation in models governed by partial differential equations. Math Comput Simul. 2015;114:14–24. doi: 10.1016/j.matcom.2010.07.020
[6] Hinze M, Pinnau R, Ulbrich M, et al. Optimization with PDE constraints. Netherlands: Springer; 2009. (Mathematical Modelling: Theory and Applications; 23).
[7] Ito K, Kunisch K. Lagrange multiplier approach to variational problems and applications. Philadelphia: SIAM Society for Industrial and Applied Mathematics; 2008. (Advances in Design and Control; 15).
[8] Xun X, Cao J, Mallick B, et al. Parameter estimation of partial differential equation models. J Am Stat Assoc. 2013;108:1009–1020. doi: 10.1080/01621459.2013.794730
[9] Nielsen BF, Lysaker M, Grøttum P. Computing ischemic regions in the heart with the bidomain model – first steps towards validation. IEEE Trans Med Imaging. 2013 Jun;32(6):1085–1096. doi: 10.1109/TMI.2013.2254123
[10] Airapetyan RG, Ramm AG, Smirnova AB. Continuous methods for solving nonlinear ill-posed problems. Operator Theory Appl. 2000;25:111.
[11] Kaltenbacher B, Neubauer A, Ramm AG. Convergence rates of the continuous regularized Gauss-Newton method. J Inv Ill-Posed Problems. 2002;10:261–280. doi: 10.1515/jiip.2002.10.3.261
[12] Tanabe K. Continuous Newton–Raphson method for solving an underdetermined system of nonlinear equations. Nonlinear Anal Theory Methods Appl. 1979;3(4):495–503. doi: 10.1016/0362-546X(79)90064-6
[13] Tanabe K. A geometric method in nonlinear programming. J Optim Theory Appl. 1980;30(2):181–210. doi: 10.1007/BF00934495
[14] Watson L, Bartholomew-Biggs M, Ford J. Optimization and nonlinear equations. Journal of computational and applied mathematics. Vol. 124. Amsterdam: The Netherlands; 2001.
[15] Tanabe K. Global analysis of continuous analogues of the Levenberg–Marquardt and Newton–Raphson methods for solving nonlinear equations. Ann Inst Statist Math. 1985;37(Part B):189–203. doi: 10.1007/BF02481091
[16] Fiedler A, Raeth S, Theis FJ, et al. Tailored parameter optimization methods for ordinary differential equation models with steady-state constraints. BMC Syst Biol. 2016 Aug;10:80.
[17] Rosenblatt M, Timmer J, Kaschek D. Customized steady-state constraints for parameter estimation in non-linear ordinary differential equation models. Front Cell Dev Biol. 2016;4:41. doi: 10.3389/fcell.2016.00041
[18] Zeidler E. Nonlinear functional analysis and its applications II/B: nonlinear monotone operators. New York, NY: Springer; 1990.
[19] Faller D, Klingmüller U, Timmer J. Simulation methods for optimal experimental design in systems biology. Simulation. 2003;79(12):717–725. doi: 10.1177/0037549703040937
[20] Potschka A. Backward step control for global Newton-type methods. SIAM J Numer Anal. 2016;54(1):361–387. doi: 10.1137/140968586
[21] Potschka A. Backward step control for Hilbert space problems. 2017; ArXiv:1608.01863 [math.NA].
[22] Engl H, Hanke M, Neubauer A. Regularization of inverse problems. Dordrecht: Kluwer Academic Publishers; 2000. (Mathematics and Its Applications; 375).
[23] Alvarez D, Vollmann EH, von Andrian UH. Mechanisms and consequences of dendritic cell migration. Immunity. 2008;29(3):325–342. doi: 10.1016/j.immuni.2008.08.006
[24] Mellman I, Steinman RM. Dendritic cells. Cell. 2001;106(3):255–258. doi: 10.1016/S0092-8674(01)00449-4
[25] Hock S, Hasenauer J, Theis FJ. Modeling of 2D diffusion processes based on microscopy data: parameter estimation and practical identifiability analysis. BMC Bioinformatics. 2013;14(10):1–6.
[26] Schumann K, Lammermann T, Bruckner M, et al. Immobilized chemokine fields and soluble chemokine gradients cooperatively shape migration patterns of dendritic cells. Immunity. 2010 May;32(5):703–713. doi: 10.1016/j.immuni.2010.04.017
[27] Weber M, Hauschild R, Schwarz J, et al. Interstitial dendritic cell guidance by haptotactic chemokine gradients. Science. 2013;339:328–332. doi: 10.1126/science.1228456
[28] Hross S. Parameter estimation and uncertainty quantification for reaction-diffusion models in image based systems biology [dissertation]. München: Technische Universität München; 2016.
[29] Raue A, Schilling M, Bachmann J, et al. Lessons learned from quantitative dynamical modeling in systems biology. PLoS One. 2013 Sep;8(9):e74335. doi: 10.1371/journal.pone.0074335
[30] Stapor P, Weindl D, Ballnus B, et al. Pesto: parameter estimation toolbox. Bioinformatics. 2017 Oct;btx676–btx676. Available from: http://dx.doi.org/10.1093/bioinformatics/btx676.
[31] Byrd RH, Hribar ME, Nocedal J. An interior point algorithm for large-scale nonlinear programming. SIAM J Optim. 1999;9:877–900. doi: 10.1137/S1052623497325107
[32] Byrd RH, Gilbert JC, Nocedal J. A trust region method based on interior point techniques for nonlinear programming. Math Program. 2000 Nov;89(1):149–185. Available from: http://dx.doi.org/10.1007/PL00011391.
[33] Waltz RA, Morales JL, Nocedal J, et al. An interior algorithm for nonlinear optimization that combines line search and trust region steps. Math Program. 2006 Jul;107(3):391–408. Available from: http://dx.doi.org/10.1007/s10107-004-0560-5.
[34] Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997 Apr;1(1):67–82. doi: 10.1109/4235.585893
[35] Boyd S, Vandenberghe L. Convex optimisation. UK: Cambridge University Press; 2004.
[36] Boiger R, Hasenauer J, Hross S, et al. Integration based profile likelihood calculation for PDE constrained parameter estimation problems. Inverse Prob. 2016 Dec;32(12):125009. doi: 10.1088/0266-5611/32/12/125009
