Abstract
Given observations of selected concentrations, one wishes to determine the unknown intensities and locations of the sources of a hazard. The concentration of the hazard is governed by a steady-state nonlinear diffusion–advection partial differential equation and a best fit to the data. The discretized version leads to a coupled nonlinear algebraic system and a nonlinear least squares problem. The coefficient matrix is a nonsingular M-matrix and is not symmetric. The iterative methods are compositions of nonnegative least squares and Picard/Newton methods, and a convergence proof is given. The singular values of the associated least squares matrix are important in the convergence proof, in the sensitivity to the parameters of the model, and in the location of the observation sites.
Public Interest Statement
Chemical or biological hazards can move by diffusion or fluid flow. The goal of the research is to make predictions about the locations and intensities of the hazards based on downstream observation sites. Mathematical models of the diffusion and flow are used to extrapolate from the observation data sites to unknown upstream locations and intensities of the hazards. The convergence of the iterative process for these extrapolations is studied. The sensitivities of the calculations to variations of the model parameters, observation data, and location of the observation sites are also studied.
1. Introduction
The paper is a continuation of the work in White (Citation2011). The first new result is a convergence proof of the algorithm to approximate a solution to a nonlinear least squares problem. A second new result is a sensitivity analysis of the solution as a function of the model parameters, measured data, and observation sites. Here, the singular value decomposition of the least squares matrix plays an important role.
A convergence proof is given for the algorithm to approximate the solution of a coupled nonlinear algebraic system and a nonnegative least squares problem; see Equations (9)–(11) in this paper and in White (Citation2011). These discrete models evolve from the continuous models using the advection–diffusion PDE in Equation (1) (see Pao, Citation1992). In contrast to the continuous models in Andrle and El Badia (Citation2015), Hamdi (Citation2007), and Mahar and Datta (Citation1997), this analysis focuses on the linear and semilinear algebraic systems. In Hamdi (Citation2007), the steady-state continuous linear model requires observations at all of the downstream portions of the boundary. In Andrle and El Badia (Citation2015), the continuous linear velocity-free model, where the intensities of the source term are time dependent, requires observations at the boundary over a given time interval. In both of these papers, the Kohn–Vogelius objective function is used for the symmetric models in the identification problems.
Here and in White (Citation2011) we assume the coefficient matrix associated with the steady-state discretized partial differential equation in Equation (1) is an M-matrix and may not be symmetric, which includes coefficient matrices from upwind finite difference schemes for the advection–diffusion PDE. The least squares matrix C( : , ssites), see Equations (6–8), is a k × l matrix with k > l, where k and l are the number of observation and source sites, respectively. The rank and singular values (see Meyer, Citation2000, section 5.12) of C( : , ssites) determine whether or not the observation and source sites are located so that the intensities can be accurately computed.
Four examples illustrate these methods. The first two examples in Section 3 have small numbers of unknowns, and they show some of the difficulties associated with the nonlinear problem. Sections 4 and 5 contain the convergence proof. Examples three and four in Section 6 are applications of the 1D and 2D versions of the steady-state advection–diffusion PDE. The sensitivity of the computed solutions to variation of the model parameters, the data, and the relative location of the source and observation sites is illustrated. The connection between this sensitivity and the singular values of the least squares matrix is established.
2. Problem formulation
Consider a hazardous substance with sources at a finite number of locations. The sources are typically located at points in a plane and can be modeled by delta functionals. The hazard is described as a concentration or density by a function of space and time, u(x, y, t). This is governed by the advection–diffusion PDE

(1) u_t = ∇ · (D∇u) − v · ∇u − au + f(u) + S

where S is a source term, D is the diffusion coefficient, v is the velocity of the medium, a is the decay rate, and f(u) is a nonlinear term such as in logistic growth models. The source term will be a finite linear combination of delta functionals
(2) S = Σ_{j=1}^{l} z_j δ(p − p_j)

where the coefficients z_j are the intensities and the locations p_j
are possibly unknown points in the line or plane. The source coefficients should be nonnegative, which is in contrast with impulsive culling, where the coefficients of the delta functionals are negative (see White, Citation2009).
We will assume the source sites and the observation sites do not intersect. The reason is that the ability to measure concentrations at a site most likely allows one to also measure the intensities at that same site.
Identification problem: Given data for u(osites) at a number of given observation sites, osites, determine the intensities and locations at the source sites, ssites, so that the computed concentrations approximate the given data.
2.1. Linear least squares
Since the governing PDE is to be discretized, we first consider the corresponding linear discrete problem
The coefficient matrix A is derived using finite differences, with upwind differences on the velocity term. Therefore, we assume A is a nonsingular M-matrix (see Berman & Plemmons, Citation1994, chapter 6 or Meyer, Citation2000, section 7.10). The following approach finds the solution in one least squares step. The coefficient matrix is n × n, and the nodes are partitioned into three ordered sets (the observation sites, the source sites, and the remaining nodes), whose order represents a reordering of the nodes.
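As a concrete illustration, the following sketch assembles a 1D upwind finite difference matrix for a steady advection–diffusion equation and checks the M-matrix sign pattern numerically; the grid size and the parameters (diffusion, velocity, decay) are illustrative assumptions, not values from the paper's examples.

```python
import numpy as np

def upwind_matrix(n, D=1.0, v=1.0, decay=0.1):
    """Tridiagonal matrix from central diffusion plus upwind advection (v > 0)."""
    h = 1.0 / (n + 1)
    main = (2.0 * D / h**2 + v / h + decay) * np.ones(n)
    lower = (-D / h**2 - v / h) * np.ones(n - 1)  # advection taken from the left
    upper = (-D / h**2) * np.ones(n - 1)
    return np.diag(main) + np.diag(lower, -1) + np.diag(upper, 1)

A = upwind_matrix(8)
off = A - np.diag(np.diag(A))
# M-matrix pattern: nonpositive off-diagonal entries and a nonnegative inverse.
print(np.all(off <= 0), np.all(np.linalg.inv(A) >= -1e-12))
```

The strictly dominant diagonal contributed by the decay term guarantees the inverse exists and is nonnegative, which is the M-matrix property assumed for A.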
The discrete identification problem is: given data at osites, find d(ssites) such that
In general, the reordered matrix has the following block structure with
(3)
where ,
,
and
.
Assumptions of this paper:
(1) A is an M-matrix so that A and a have inverses.
(2)
(3)
where . Solve Equation (5) for
and then
. Define
(6)
The computed approximations of the observed data are in and the source data are in
. This gives a least squares problem for z
(7)
Since a is , e is
, and
is
, the matrix C is
and the least squares matrix C( : , ssites) is
where
. The least squares problem in (7) has a unique solution if the columns of C( : , ssites) are linearly independent (C( : , ssites)z = 0 implies z = 0).
Impose the nonnegative condition on the components of the least squares solution. That is, consider finding a nonnegative z such that
(8)
If the matrix C( : , ssites) has full column rank, then the nonnegative least squares problem (8) has a unique solution (see Bjorck, Citation1996, section 5.2). The nonnegative least squares solution can be computed using the MATLAB command lsqnonneg.m (see MathWorks Inc., Citation2015).
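A Python analogue of this step uses scipy.optimize.nnls in place of lsqnonneg.m; the matrix C and the data g below are made-up illustrations, chosen so that the unconstrained least squares solution has a negative component.

```python
import numpy as np
from scipy.optimize import nnls

# Toy least squares matrix: k = 3 observation rows, l = 2 source columns.
C = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.5, 0.4]])
g = np.array([1.5, -0.2, 0.5])  # pulls the unconstrained solution negative
z, rnorm = nnls(C, g)           # enforces z >= 0 componentwise
print(z, rnorm)
```

Here nnls returns an exact zero for the clamped component, mirroring the active set strategy behind lsqnonneg.m.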
2.2. Least squares and nonlinear term
Consider a nonlinear variation of the linear problem

(9) A u = f(u) + d
where (as in assumption 3) and
.
Use the reordering in (3) and repeat the block elementary matrix transformation in (4) and (5) applied to the above. Using the notation one gets
Solve for and then for
where and
.
This leads to a nonlinear least squares problem for nonnegative z
(10)
The solution of the nonnegative least squares problem is input to the nonzero components in d, that is, d(ssites) = z. The coupled problem from (9) and (10) is to find a nonnegative z and u so that
(11)
3. Two algorithms for the coupled problem
If the nonnegative least squares problem has a unique solution given a suitable u, then we can write (11) as
or as a fixed point
The following algorithm is a composition of nonnegative least squares and a Picard update to form one iteration step. Using additional assumptions, we will show this is contractive and illustrate linear convergence. Example 3.2 is a simple implementation, and in Section 6 Example 6.1 uses the 1D steady-state version of the model in (1) while Example 6.2 uses the 2D steady-state model in (1).
Algorithm 1. NLS-Picard Method for (11)
choose nonnegative u
for m = 1, …, mmax
  solve the nonnegative least squares problem for z
  solve the linear problem for u
  test for convergence
end for-loop
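The loop can be sketched in Python as below; the small M-matrix, the logistic-type nonlinearity, the toy least squares matrix, and the data are all illustrative assumptions rather than one of the paper's test problems, and scipy.optimize.nnls stands in for lsqnonneg.m.

```python
import numpy as np
from scipy.optimize import nnls

def nls_picard(A, C, g_of_u, d_of_z, f, u0, mmax=100, tol=1e-8):
    """One NNLS solve for z, then one Picard linear solve for u, per iteration."""
    u = u0.copy()
    for m in range(1, mmax + 1):
        z, _ = nnls(C, g_of_u(u))                     # nonnegative least squares
        u_new = np.linalg.solve(A, f(u) + d_of_z(z))  # Picard update
        if np.linalg.norm(u_new - u) < tol:
            return z, u_new, m
        u = u_new
    return z, u, mmax

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])               # small nonsingular M-matrix
c = 0.05
f = lambda u: c * u * (1.0 - u)                 # logistic-type nonlinearity
C = np.array([[1.0], [0.5]])                    # toy least squares matrix, one source
g_of_u = lambda u: np.array([1.0, 0.4])         # toy data (independent of u here)
d_of_z = lambda z: np.array([0.0, 0.0, z[0]])   # source at the third node
z, u, iters = nls_picard(A, C, g_of_u, d_of_z, f, np.zeros(3))
print(z, iters)
```

With the small growth coefficient the composed map is contractive, so the iteration converges linearly in a few steps.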
The following algorithm uses a combination of nonnegative least squares and Newton like updates. Define F(z, u) and as
where is the diagonal matrix with components
.
The matrix is not the full derivative matrix of F(z, u) because it ignores the dependence of z on u. Choose
so that
has an inverse.
Algorithm 2. NLS-Newton Method for (11)
choose nonnegative z and u
for m = 1, …, mmax
  solve the nonnegative least squares problem
  solve the linear problem
  compute F(z, u)
  compute the approximate derivative matrix
  solve the linear problem for the Newton update
  test for convergence
end for-loop
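A sketch of the Newton update at a fixed source vector d, using the approximate derivative matrix that ignores the dependence of z on u as described above; the matrix, nonlinearity, and source vector are illustrative assumptions.

```python
import numpy as np

def newton_update(A, u, f, fprime, d, damping=1.0):
    """One damped Newton step for F(u) = A u - f(u) - d = 0 at fixed d."""
    J = A - np.diag(fprime(u))   # approximate derivative: z is held fixed
    r = A @ u - f(u) - d
    return u - damping * np.linalg.solve(J, r)

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
c = 0.05
f = lambda u: c * u * (1.0 - u)
fprime = lambda u: c * (1.0 - 2.0 * u)
d = np.array([0.0, 0.0, 1.0])   # a single source at the third node
u = np.zeros(3)
for _ in range(10):
    u = newton_update(A, u, f, fprime, d)
print(np.linalg.norm(A @ u - f(u) - d))
```

Even though the composed NLS-Newton iteration converges linearly, the Newton solve drives the nonlinear residual down rapidly, which helps explain the smaller iteration counts reported in Section 6.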
Variations of the NLS-Newton algorithm include using over-relaxation at the first linear solve and using damping on the second linear solve in the Newton update. Appropriate choices of the relaxation and damping parameters can accelerate convergence.
The calculations recorded in Table 4 and Figure 4 in White (Citation2011) illustrate the NLS-Newton algorithm for the 1D steady-state problem with a logistic growth term and variable growth rate c. Larger values of the parameter c resulted in larger concentrations. The intensities at the three source sites were identified. However, the number of iterations of the NLS-Newton algorithm increased as the parameter c increased. The algorithm did start to fail for c larger than 0.0100.
The lack of convergence occurs either because the associated fixed point problem does not have a contractive mapping or because the solutions become negative. The following example, with a nonzero source term and nonsymmetric A, illustrates the difficulty of solving a semilinear algebraic problem where the solution is required to be positive.
Example 3.1
Let and use
. Equation (9) has the form
The first equation is whose solution is either
or
. Put the nonzero solution into the second equation
and solve for
. If
, then the second equation is
and will have a positive solution.
The next example, with given data at the first and second nodes and a source term at the third node, illustrates the coupled problem in (11). This simple example can be solved both by hand calculations and by the NLS-Picard algorithm; more details are given in Appendix 1.
Example 3.2
Let the data be given at the first and second nodes
Let the matrix A and nonlinear terms be given by system
The submatrices of A are
Then one easily computes
The g(u) in the nonlinear least square Equation (11) is
The calculation with the NLS-Picard algorithm for this example with in White (Citation2015, testlsnonlc.m) confirms both the exact solution for
and convergence for
, which are given in Appendix 1. The number of iterations needed for convergence increases as c increases, and convergence fails for
. Both the exact and NLS-Picard calculations show
for
and
for
. The outputs for five values of
are given in Figure . The left graphs indicate the five solutions of (10) for fixed
and increasing c. The two “*” are the given data, which are from the solution of (10) with
and 5% error. The right graphs are the converged solutions of (11) for
.
4. Nonnegative least squares problem
The nonnegative least squares problem in (10) for nonnegative z has several forms. It may be written as
(12)
where
If C( : , ssites) has full column rank, then the matrix C( : , ssites)^T C( : , ssites) is symmetric positive definite (SPD). J(z) is the quadratic functional associated with the linear system (normal equations)
When the nonnegative condition is imposed on z and the matrix is SPD, then the following are equivalent (see Cryer, Citation1971):
Any solution of a variational inequality is unique and depends continuously on the right-hand side. The next theorem is a special case of this.
Theorem 4.1
Consider the nonnegative least squares problem in (10). If C( : , ssites) has full column rank, then there is only one nonnegative solution for each u. Let H(u) be the solution for each u in S where S is a bounded set of nonnegative n-vectors. Moreover, under a boundedness assumption on d, there is a constant
such that
Proof
Let z and w be solutions of (10). The variational inequalities for z with , and for w with
are
Add these to get a contradiction for the case z − w is not a zero vector
In order to show the continuity, use the above with and
Add these to obtain for
Use the Cauchy inequality
By the definition of g and the assumption on d we have for some
The solution of the variational problem for fixed u has the form H(u). The assumption that C( : , ssites) has full column rank implies there is a positive constant
such that
. This and the above theorem give the following
(13)
Although in Example 3.2 one can estimate these constants, generally they are not easily approximated. However, the constant associated with the full column rank of C( : , ssites) can be estimated by the smallest singular value of the least squares matrix.
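For a matrix with full column rank, ||Cz|| ≥ σ_min ||z|| for every z, so the smallest singular value provides such a constant; a small numpy check with a made-up matrix:

```python
import numpy as np

C = np.array([[1.0, 0.9],
              [0.5, 0.4],
              [0.2, 0.1]])                   # full column rank, made-up entries
sigma = np.linalg.svd(C, compute_uv=False)   # singular values, descending order
z = np.array([0.3, -0.7])
lhs = np.linalg.norm(C @ z)
rhs = sigma[-1] * np.linalg.norm(z)
print(lhs >= rhs)
```

Equality holds only when z is a right singular vector for the smallest singular value, so a small σ_min signals directions in which the data constrain z weakly.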
5. Convergence proof of the NLS-Picard algorithm
Let the assumptions in the previous theorem hold so that (12) is true for all . The mapping
may not give
This appears to be dependent on the particular problem and properties of
. If the bound is chosen so that
is also nonnegative, then assumption (1) implies
and hence
. The least squares condition requires the solution to be “close” to the given data at the osites, which suggests that G(u) should remain bounded. The following theorem gives general conditions on the problem.
Theorem 5.1
Let the assumptions of Theorem 4.1 hold for . Assume there is a solution to the coupled problem (11),
. Let
and
be from (12). If
then the NLS-Picard algorithm will converge to this solution of (11).
Proof
Let and
.
Since , this implies convergence
6. Applications and sensitivity analysis
Both the NLS-Picard and NLS-Newton methods converge linearly, that is, the errors at the next iteration are bounded by a contractive constant times the error from the current iteration. However, the contractive constant for the NLS-Newton algorithm is significantly smaller than the contractive constant for the NLS-Picard algorithm, and this results in fewer iteration steps for the NLS-Newton algorithm. The calculations in the next two examples can be done by either algorithm. The relevance of the singular values of the least squares matrix to the sensitivity of the calculations with variations in the data and the source and observation sites is demonstrated.
Example 6.1
Consider the steady-state 1D model in (1). The following calculations were done by the MATLAB code in White (Citation2015, poll1dlsnonla2.m) with seven observation sites and three source sites. The differential Equation (1) was discretized by the upwind finite difference approximations, and the source terms were the intensities times the approximation of the delta functionals.
In order to check the accuracy of the finite difference model, calculations were done on two refined grids, the finest with 480 points, which gave similar results as reported below for the coarsest grid. The condition number (ratio of the largest and smallest singular values) of the least squares matrix controls the relative errors in the computed least squares problem. The condition numbers for the three grids are about the same: 17.0170, 16.5666, and 16.4146, respectively.
For the NLS-Picard algorithm did not converge, but the NLS-Newton algorithm converged in 92 steps. The solid lines in Figure are with no random error in the data, and the dotted lines are from data with 5% random error. For
the NLS-Picard algorithm converged in 100 steps with
, and the NLS-Newton algorithm converged in 19 steps with
. If w is changed from 1.0 to 1.1, then the NLS-Newton algorithm converges in 16 steps with
. As c decreases, the iterations required for convergence and
decrease.
The dotted lines indicate some uncertainty in the numerical solutions, which can be a function of the physical parameters and the location of the observation sites relative to the source sites. If there is no error in the data, the intensities of the sources are in the vector
One hundred computations were done with 5% random errors in the data. The means and standard deviations of the computed intensities at the three source sites are
The standard deviation of the intensities at the center source in Figure is large relative to its mean value. The large computed least squares errors are a result of some relatively small singular values in the singular value decomposition of the least squares matrix

C( : , ssites) = U Σ V^T

where the singular values are the diagonal components of Σ; U is k × k, Σ is k × l, and V is l × l.
In this example, the third singular value is small
The least squares solution of (10) is given by the pseudo inverse of the least squares matrix times the data. Because the third singular value is relatively small, any errors in the data will be amplified. This suggests the second component, the center source, of the least squares solution will have significant errors. The errors can be decreased if the percentage random error is smaller or by an alternative location of the observation sites. In the 1D model this is not too difficult to remedy. However, in the 2D model this is more challenging.
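The amplification can be quantified: the worst-case magnification of data errors through the pseudo inverse equals the reciprocal of the smallest singular value. A numpy sketch with a synthetic 7 × 3 matrix (not the matrix from this example):

```python
import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((7, 3)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
C = U @ np.diag([1.0, 0.5, 1e-3]) @ V.T           # small third singular value
amplification = np.linalg.norm(np.linalg.pinv(C), 2)
print(amplification)  # worst-case magnification, 1/sigma_min
```

Relocating observation sites so that the smallest singular value grows directly reduces this magnification.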
Example 6.2
Consider the steady-state 2D model in (1) with boundary conditions equal to zero upstream and zero normal derivative downstream. The calculations in Figure were done by the MATLAB code in White (Citation2015, testlsnonlf2da3.m) using NLS-Picard with four variable observation sites and three fixed source sites. The three fixed sources are to the left, and the four observation sites are indicated by the symbol “*” above the site. The intensities times the approximation of the delta functionals is
where initially and
. If there is no random error in the data, then the computed source will be
The numerical experiments involve varying the location of the observation site at (x, 16) and varying the y-component of the velocity, vely. The velocity term in (1) is
. The base calculation in Figure uses
and
along with 5% random error in the data and
growth coefficient in the nonlinear term. The singular values of the least squares matrix and the mean and standard deviations of the intensities of the three computed sources are recorded for 100 computations.
and
Changing the y-component of the velocity can cause the concentration to be moved away from the observation sites and cause more uncertainty in the calculations, which is indicated by larger standard deviations and smaller singular values of the least squares matrix. Change to
and
.
and
Here, the third singular value has significantly decreased and can cause failure of the NLS-Picard algorithm or increased computation errors because of the variation in the data.
and
In this case the second and third singular values of the least squares matrix decreased, and the standard deviation of the second source increased.
Changing the location of the third observation site can also cause more uncertainty in the calculations, which is indicated by larger standard deviations and smaller singular values of the least squares matrix. Change to
and
and keep
.
and
Here, the third singular value has significantly decreased and can cause failure of the NLS-Picard algorithm or increased computation errors because of the variation in the data. The standard deviation of the third source has significantly increased.
and
In this case the third singular values of the least squares matrix have dramatically decreased, and the standard deviation of the third source is very large relative to the mean.
The calculation in Figure has six source sites (near the axes) and nine observation sites (downstream and below the “*”) (White, Citation2015, testlsnonlf2da6.m) using NLS-Picard. Here vely is larger, with the velocity equal to (1.00, 0.30), and the random error in the data is smaller, equal to 1%:
The source closest to the origin is the most difficult to identify; with one percent error in the data the standard deviation is 0.8512 and with five percent error this goes up to 4.2348. This happens despite the relatively close distribution of the singular values from 0.0782 to 0.0047 for a condition number equal to 16.7658.
7. Summary
In order to determine locations and intensities of the source sites from data at the observation sites, there must be more observation sites than source sites, and the observation sites must be chosen so that the least squares matrix C( : , ssites) has full column rank. Furthermore, the singular values in the singular value decomposition of the least squares matrix are important in the convergence proof and in the location of the observation sites.
Measurement errors in the observations as well as uncertain physical parameters can contribute to significant variation in the computed intensities of the sources. Multiple computations with varying these should be done, and the means and standard deviations should be noted. Large standard deviations from the mean indicate a lack of confidence in the computations. In this case, one must adjust the observation sites, which can be suggested by inspection of the smaller singular values for the least squares matrix and the corresponding columns in V of the singular value decomposition.
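The suggested practice of repeating the computation with perturbed data can be sketched as below; the linear model, the well-conditioned matrix, and the 5% noise level are stand-ins for the full identification algorithm, with scipy.optimize.nnls playing the role of lsqnonneg.m.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.2, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.4, 1.0],
              [0.2, 0.1, 0.5]])       # 4 observation sites, 3 source sites
z_true = np.array([2.0, 1.0, 3.0])    # intensities used to manufacture the data
g = C @ z_true
samples = []
for _ in range(100):
    noisy = g * (1.0 + 0.05 * rng.standard_normal(g.size))  # 5% random error
    z, _ = nnls(C, noisy)
    samples.append(z)
samples = np.asarray(samples)
print(samples.mean(axis=0).round(2), samples.std(axis=0).round(2))
```

Small standard deviations relative to the means indicate confidence in the identified intensities; large ones suggest relocating the observation sites.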
Additional information
Funding
Notes on contributors
Robert E. White
Professor Robert E. White has published numerous articles on numerical analysis including approximate solutions of partial differential equations, numerical linear algebra, and parallel algorithms. The most recent work is on hazard identification given observation data. He has authored three textbooks. The second edition of Computational Mathematics: Models, Methods and Analysis with MATLAB and MPI will be published in the fall of 2015 by CRC Press/Taylor and Francis.
References
- Andrle, M., & El Badia, A. (2015). On an inverse problem. Application to a pollution detection problem, II. Inverse Problems in Science and Engineering, 23, 389–412.
- Berman, A., & Plemmons, R. J. (1994). Nonnegative matrices in the mathematical sciences. Philadelphia, PA: SIAM.
- Bjorck, A. (1996). Numerical methods for least squares problems. Philadelphia, PA: SIAM.
- Cryer, C. W. (1971). The solution of a quadratic programming problem using systematic overrelaxation. SIAM Journal on Control and Optimization, 9, 385–392.
- Hamdi, A. (2007). Identification of point sources in two-dimensional advection–diffusion–reaction equations: Application to pollution sources in a river. Stationary case. Inverse Problems in Science and Engineering, 15, 855–870.
- Mahar, P. S., & Datta, B. (1997). Optimal monitoring network and ground-water-pollution source identification. Journal of Water Resources Planning and Management, 123, 199–207.
- MathWorks Inc. (2015). MATLAB documentation. Retrieved from http://www.mathworks.com
- Meyer, C. D. (2000). Matrix analysis and applied linear algebra. Philadelphia, PA: SIAM.
- Pao, C. V. (1992). Nonlinear parabolic and elliptic equations. New York, NY: Plenum Press.
- White, R. E. (2009). Populations with impulsive culling: Identification and control. International Journal of Computer Mathematics, 86, 2143–2164.
- White, R. E. (2011). Identification of hazards with impulsive sources. International Journal of Computer Mathematics, 88, 762–780.
- White, R. E. (2015). MATLAB codes for hazard identification. Retrieved from http://www4.ncsu.edu/eos/users/w/white/www/white/hazardid/hazardid.htm
Appendix 1
Details for Example 3.2
The least square problem in (11) can be solved for nonnegative
Next, consider the system in (11). The first two equations give the first and second unknowns in terms of the third unknown
Insert the formulas for , and
into the third equation in the system to get a nonlinear problem for a single unknown
and the single equation
When , this can be solved by Newton’s method to give
. Then solve for the first two unknowns and finally solve for z
In the above example, we were able to explicitly solve for nonnegative z in terms of the components of u. The problem is
where
The equivalent fixed point problem is
Expand the matrix-vector product to get
Assume u, v are nonnegative and bounded and in
Then and since
. If
, then
and G maps S into S.
The contraction mapping theorem requires G(u) to be contractive, that is,
One can approximate the differences
Then it is possible to estimate r as a function of c
Let and require
.