Inverse Problems in Science and Engineering Volume 28, 2020 - Issue 12


A new regularization approach for numerical differentiation

Abinash Nayak
Department of Mathematics, University of Alabama at Birmingham, Birmingham, AL, USA

Pages 1747-1772 | Received 15 Oct 2019, Accepted 22 Apr 2020, Published online: 04 Jun 2020

https://doi.org/10.1080/17415977.2020.1763983

Abstract

It is well known that numerical differentiation is an ill-posed problem and that one requires regularization methods to approximate the solution. The commonly practiced regularization methods are (external) parameter-based, like Tikhonov regularization, and have certain inherent difficulties associated with them. In such scenarios, iterative regularization methods serve as an attractive alternative. In this paper, we propose a novel iterative regularization method where the minimizing functional does not contain the noisy data directly, but rather a smoothed or integrated version of it. The advantage, in addition to circumventing the use of noisy data directly, is that the sequence of functions constructed during the descent process tends to avoid overfitting, and hence does not corrupt the recovery significantly. To demonstrate the effectiveness of our method, we compare its numerical results with those obtained from certain standard regularization methods such as Tikhonov regularization, total variation, etc.

Keywords:

Numerical differentiation
Newton-type methods
Mathematical programming methods
optimization and variational techniques
Volterra integral equations
inverse problems
Tikhonov regularization
iterative regularization
numerical analysis

1991 Mathematics Subject Classifications:

Primary 65D25
Secondary 49M15
65K05
65K10
45D05
45Q05

1. Introduction

In many applications, we want to calculate the derivative of a function measured experimentally, that is, to differentiate a function obtained from discrete noisy data. The problem consists of stably calculating the derivative of a smooth function $g$ given its noisy data $\tilde{g}$ such that $\| \tilde{g} - g \| \leq δ$, where the norm $\| \cdot \|$ can be either the $L^{\infty}$-norm or the $L^{2}$-norm. This turns out to be an ill-posed problem, since for a small $δ$ such that $\| \tilde{g} - g \| \leq δ$ we can have $\| \tilde{g}' - g' \|$ arbitrarily large, or $\tilde{g}$ may even fall outside the set of differentiable functions. Many methods and techniques have been introduced in the literature regarding this topic (see [1–13] and references therein). They mostly fall into one of three categories: difference methods, interpolation methods and regularization methods. The first two give satisfactory results provided that the function is given precisely, but they fail badly in the presence of even small amounts of noise unless the step size is chosen appropriately, that is, in a regularized way. Hence in such scenarios, regularization methods need to be employed to handle the instability arising from noisy data, with Tikhonov's regularization method being very popular in this respect. In a regularization method, the instability is bypassed by reducing the numerical differentiation problem to a family of well-posed problems, depending on a regularization (smoothing) parameter. Once an optimal value for this parameter is found, the corresponding well-posed problem is solved to obtain an estimate for the derivative. Unfortunately, the computation of the optimal value for this parameter is a nontrivial task, and it usually demands some prior knowledge of the errors involved. Though many techniques have been developed to find an appropriate parameter value, such as Morozov's discrepancy principle, the L-curve method and others, the recovered solution is still sometimes either over-fitted or over-smoothed, especially when $g$ has sharp edges or discontinuities, or when the data contains extreme noise. An attractive alternative is the class of iterative regularization methods, like the Landweber iteration. In an iterative regularization method (without any explicit dependence on external parameters), one starts the minimization process first and then uses the iteration index as a regularization parameter, i.e. one terminates the minimization process at an appropriate time to achieve regularization. More recently, iterative methods have been investigated in the framework of regularization of nonlinear problems, see [14–18].
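The instability described above is easy to reproduce numerically. The following minimal sketch (the test function, grid size and noise level are illustrative assumptions, not taken from the paper) differentiates clean and noisy samples of the same function by finite differences and compares the errors.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, illustrative only
x = np.linspace(0.0, 1.0, 1001)           # uniform grid with step 1e-3
g = np.sin(2 * np.pi * x)                 # assumed smooth test function
dg_exact = 2 * np.pi * np.cos(2 * np.pi * x)

delta = 1e-3                              # assumed noise level
g_noisy = g + delta * rng.standard_normal(x.size)

# Finite differences applied to the clean and to the noisy samples
err_clean = np.max(np.abs(np.gradient(g, x) - dg_exact))
err_noisy = np.max(np.abs(np.gradient(g_noisy, x) - dg_exact))

print(f"max error, clean data: {err_clean:.2e}")
print(f"max error, noisy data: {err_noisy:.2e}")
```

Since the difference quotient divides the data by the step size, a perturbation of size $δ$ in the data can produce an error of order $δ / h$ in the derivative, so refining the grid makes the noisy result worse rather than better.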

In this paper, we propose a new smoothing or regularization technique that doesn't involve any external parameters, thus avoiding all the difficulties associated with them. Furthermore, since this method does not involve the noisy data directly, it is very robust to large noise level in the data as well as to errors with non-zero mean. Thus it makes the method computationally feasible when dealing with real-life data, where we do not expect to have any knowledge of the errors involved or when there is extreme noise involved in the data.

Let us briefly outline our method. First, we lay out a brief summary of a typical (external) parameter-dependent regularization method, and then we present our new (external) parameter-independent smoothing technique. For a given differentiable function $g$, the problem of numerical differentiation, finding $φ = g'$, can be expressed via a Volterra integral equation:¹ $T_{D} φ = g_{1}, \qquad (1)$ where $g_{1}(x) = g(x) - g(a)$ and $(T_{D} φ)(x) := \int_{a}^{x} φ(ξ) \, dξ, \qquad (2)$ for all $x \in [a, b] \subset \mathbb{R}$. Here one can attempt to solve Equation (1) by approximating $φ$ with the minimizer of the functional $G_{1}(ψ) = \| T_{D} ψ - g_{1} \|_{2}^{2}$. However, as stated earlier, the recovery becomes unstable for a noisy $g$. So, to counter the ill-posed nature of this problem, regularization techniques are introduced: one instead minimizes the functional² $G_{2}(ψ, β) = \| T_{D} ψ - g_{1} \|_{2}^{2} + β \| D ψ \|, \qquad (3)$ where $β$ is a regularization parameter and $D$ is a differentiation operator of some order (see [1,14]). This converts the ill-posed problem to a family of conditionally well-posed problems depending on $β$. The first term (fitting term) of the functional $G_{2}$ ensures that the inverse solution, once the forward operator $T_{D}$ is applied to it, fits the given data well, and the second term (smoothing term) in (3) controls the smoothness of the inverse solution. Since the minimizer of the second term is not the solution of the inverse problem, unless it is a trivial solution, one needs to balance the smoothing and the fitting of the inverse recovery by finding an optimal value $β_{0}$. Then the exact solution $φ$ is approximated by minimizing the corresponding functional $G_{2}(\cdot, β_{0})$.

Given that a key step in any regularization technique is the conversion of an ill-posed problem to a (conditionally) well-posed problem, we make this our main goal. We start by integrating the data twice, which helps in smoothing out the noise. Thus (1) can be reformulated as: find a $φ$ that satisfies $T_{D} φ = - u_{1}'', \qquad (4)$ where $- u_{1}'' = g_{1}$ and $g_{1}' = g' = φ$. Equivalently, $u_{1}(x) = \int_{x}^{b} \int_{a}^{η} g_{1}(ξ) \, dξ \, dη$, or $u_{1}(x) = \int_{x}^{b} \int_{a}^{η} g(ξ) \, dξ \, dη - g(a) \left[ \frac{(b - a)^{2}}{2} - \frac{(x - a)^{2}}{2} \right]. \qquad (5)$ Now we find the solution of (4) by approximating it with the minimizer of the following functional: $G(ψ) = \| u_{ψ}' - u_{1}' \|_{2}^{2}, \qquad (6)$ where $u_{ψ}$ is the solution, for a given $ψ$, of the following boundary value problem: $- u_{ψ}'' = T_{D} ψ, \qquad (7)$ $u_{ψ}(a) = u_{1}(a), \quad u_{ψ}(b) = u_{1}(b). \qquad (8)$ Note that, unlike the functionals $G_{1}$ and $G_{2}$, which are defined on the data $g$ directly, the functional $G$ is defined on the transformed data $u_{1}'$. Thus, when the data $g$ has noise in it, working with the smoothed (integrated) data $u_{1}$ is more effective than working with the noisy data $g$ directly. It will be proved in later sections that this simple change in the working space from $g$ to $u_{1}$ drastically improves the stability and smoothness of the inverse recovery of $φ$, without adding any further smoothing terms.
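As an illustration of the double integration in (5), the following sketch computes $u_{1}$ from samples of $g$ on a grid. The cumulative trapezoid rule and the helper names `cumtrapz0` and `smoothed_data` are assumptions made for this example; the paper's own discretization is described in its Section 6.

```python
import numpy as np

def cumtrapz0(f, x):
    """Cumulative trapezoid integral of f from x[0]; the value at x[0] is 0."""
    out = np.zeros(f.size, dtype=float)
    out[1:] = np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
    return out

def smoothed_data(g, x):
    """u_1(x) = int_x^b int_a^eta (g(xi) - g(a)) dxi deta, as in (5)."""
    g1 = g - g[0]                    # g_1 = g - g(a)
    inner = cumtrapz0(g1, x)         # eta -> int_a^eta g_1
    outer = cumtrapz0(inner, x)      # x   -> int_a^x inner
    return outer[-1] - outer         # int_x^b inner = int_a^b - int_a^x

# Check against a case with a closed form: g(x) = x^2 on [0, 1]
# gives u_1(x) = (1 - x^4) / 12.
x = np.linspace(0.0, 1.0, 2001)
u1 = smoothed_data(x**2, x)
print(np.max(np.abs(u1 - (1.0 - x**4) / 12.0)))
```

By construction $u_{1}(b) = 0$, and differentiating twice gives $- u_{1}'' = g_{1}$, which is the relation used in (4).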

Remark 1.1

Typically, real-life data contain zero-mean additive noise, i.e. $\tilde{g}(x) = g(x) + ϵ(x), \qquad (9)$ for $x \in [a, b]$, such that $E_{μ}(ϵ) = 0$ and $E_{μ}(ϵ^{2}) \leq δ$ (or equivalently, $\| \tilde{g} - g \|_{L^{2}} \leq δ$), where $μ$ is the probability density function of the random variable $ϵ$. Therefore, integrating the data smooths out the noise present in the original data ($\tilde{g}$) and hence the new data ($\tilde{u}_{1}$, as defined by (5) for $\tilde{g}$) for the transformed equation (4) has a significantly reduced noise level, see the comparison in Figure .

We prove, in Section 3, that the functional $G$ is strictly convex and hence has a unique global minimizer which satisfies (1), that is, $φ = {\arg \min}_{ψ} G(ψ)$ satisfies $g' = φ$. The global minimum is reached via an iterative method using an upgraded steepest-descent or conjugate-gradient method, presented in Section 4, where the use of a (Sobolev) $H^{1}$-gradient for $G$ instead of the commonly used $L^{2}$-gradient is discussed; this is a crucial step in the optimization process. In Section 5, we shed some light on the stability, convergence and well-posedness of this technique. In Section 6, the Volterra operator $T_{D}$ is modified for better computational and numerical efficiency while keeping it consistent with the theory developed. In Section 7, we provide some numerical results and compare our method of numerical differentiation with some popular regularization methods, namely Tikhonov regularization, total variation, smoothing spline, mollification method and least square polynomial approximation (the comparison is done with results provided in [1,2,19]). Finally, in Section 8 we present an efficient (heuristic) stopping criterion for the descent process when we do not have any prior knowledge of the error norm.

2. Notations and preliminaries

We adopt the following notation throughout the paper. All functions are real-valued and defined on a bounded closed domain $[a, b] \subset \mathbb{R}$. For $1 \leq p < \infty$, $L^{p}[a, b] := (L^{p}, \| \cdot \|_{L^{p}}, [a, b])$ denotes the usual Banach space of $p$-integrable functions on $[a, b]$, and the space $L^{\infty}[a, b] := (L^{\infty}, \| \cdot \|_{L^{\infty}}, [a, b])$ contains the essentially bounded measurable functions. Likewise the Sobolev space $H^{q}[a, b] := (H^{q}, \| \cdot \|_{H^{q}}, [a, b])$ contains all the functions for which $f, f', \dots, f^{(q)} \in L^{2}[a, b]$, with weak differentiation understood, and the space $H_{0}^{q}[a, b] := \{ f \in H^{q}[a, b] : f \text{ vanishes at the boundary} \}$. The spaces $L^{2}[a, b]$ and $H^{q}[a, b]$ are Hilbert spaces with inner products denoted by $(\cdot, \cdot)_{L^{2}}$ and $(\cdot, \cdot)_{H^{q}}$, respectively.

Remark 2.1

Note that although the Volterra operator $T_{D}$ is well defined on the space of integrable functions ($L^{1}[a, b]$), we will restrict it to the space of $L^{2}[a, b]$ functions, that is, we shall consider $L^{2}[a, b]$ to be the search space for the solution of (1), since it is a Hilbert space with the inner product $(\cdot, \cdot)_{L^{2}}$. Hence the domain of the functional $G$ is $D_{G} = L^{2}[a, b]$. It is not hard to see that the Volterra operator $T_{D}$ is linear and bounded on $L^{2}[a, b]$.

Remark 2.2

From Remark 2.1, since $φ \in L^{2} [a, b]$ we have $T_{D} φ \in L^{2} [a, b]$ . Thus our problem space to be considered is $H^{1} [a, b]$ , that is, $g \in H^{1} [a, b]$ . Note that since we will not be working with g directly but rather with $u_{1}$ , integrating twice makes $u_{1} \in H^{3} [a, b] \subset H^{1} [a, b]$ . This is particularly significant in the sense that we are able to upgrade the smoothness of the working data or information space from $H^{1} [a, b]$ to $H^{3} [a, b]$ and hence improve the stability and efficiency of numerical computations. This effect can be seen in some of the numerical examples presented in Section 7 where the noise involved in the data is extreme.

Before we proceed to prove the convexity of $G$ and the well-posedness of the method, we manipulate $u_{1}$ further to push it to an even nicer space. From Equation (5), we have $u_{1}(b) = 0$ and $u_{1}(a) = \int_{a}^{b} \int_{a}^{η} g(ξ) \, dξ \, dη - g(a) \frac{(b - a)^{2}}{2}$. Now redefining the working data as $u(x) = u_{1}(x) - \frac{b - x}{b - a} u_{1}(a)$, we have $\begin{aligned} u(x) = {} & \int_{x}^{b} \int_{a}^{η} g(ξ) \, dξ \, dη - g(a) \left[ \frac{(b - a)^{2}}{2} - \frac{(x - a)^{2}}{2} \right] \\ & - \frac{b - x}{b - a} \left[ \int_{a}^{b} \int_{a}^{η} g(ξ) \, dξ \, dη - g(a) \frac{(b - a)^{2}}{2} \right], \end{aligned} \qquad (10)$ and this gives $u(a) = 0$, $u(b) = 0$ and $- u'' = - u_{1}'' = g_{1}$. Thus $u \in H_{0}^{3}[a, b]$ and the negative Laplace operator $- Δ = - \partial^{2} / \partial x^{2}$ is a positive operator in $L^{2}[a, b]$ on the domain $D_{Δ} = H_{0}^{2}[a, b] \supset H_{0}^{3}[a, b]$, since for any $v \in H_{0}^{2}[a, b]$ $\begin{aligned} (- Δ v, v)_{L^{2}} & = \int_{a}^{b} - (Δ v) v \, dx \\ & = [- v \nabla v]_{a}^{b} + \int_{a}^{b} | \nabla v |^{2} \, dx \\ & = (\nabla v, \nabla v)_{L^{2}} \geq 0, \end{aligned} \qquad (11)$ and equality holds when $\nabla v = 0$, which, together with the boundary conditions, implies $v \equiv 0$. Here, and unless otherwise specified, the notations $Δ$ and $\nabla$ will mean $\partial^{2} / \partial x^{2}$ and $\partial / \partial x$, respectively, throughout the paper. The functional $G(ψ)$ in (6) is now redefined, with the new $u$, as $G(ψ) = \| u_{ψ}' - u' \|_{2}^{2}. \qquad (12)$ The positivity of the operator $- Δ$ in $L^{2}[a, b]$ on the domain $D_{Δ}$ also helps us to obtain an upper and a lower bound for $G(ψ)$, for any $ψ \in L^{2}[a, b]$. First we get the trivial upper bound $G(ψ) = \| u' - u_{ψ}' \|_{L^{2}}^{2} \leq \sum_{j = 0}^{1} \left\| \frac{\partial^{j}}{\partial x^{j}} (u - u_{ψ}) \right\|_{L^{2}}^{2} = \| u - u_{ψ} \|_{H^{1}}^{2}. \qquad (13)$ To get a lower bound we note from (8) that $v = u - u_{ψ} \in H_{0}^{2}[a, b]$, thus $G(ψ) = \| u' - u_{ψ}' \|_{L^{2}}^{2} \geq λ_{1} \| u - u_{ψ} \|_{L^{2}}^{2}, \qquad (14)$ where $λ_{1} > 0$ is the smallest eigenvalue of the positive operator $- Δ$ on $D_{Δ}$. Hence, we get the bounds of $G(ψ)$ in terms of the $H^{1}[a, b]$ norm of $u - u_{ψ}$ as $\left( 1 + \frac{1}{λ_{1}} \right)^{- 1} \| u - u_{ψ} \|_{H^{1}}^{2} \leq G(ψ) = \| u' - u_{ψ}' \|_{L^{2}}^{2} \leq \| u - u_{ψ} \|_{H^{1}}^{2}. \qquad (15)$ Equation (15) indicates the stability of the method and the convergence of $u_{ψ_{m}}$ to $u$ in $H^{1}[a, b]$ if $G(ψ_{m}) \to 0$, as is explained in detail in Section 5.
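The constant $λ_{1}$ in (14) can be made concrete. Assuming that "vanishes at the boundary" means the Dirichlet conditions $v(a) = v(b) = 0$, the smallest eigenvalue of $- Δ$ on $[a, b]$ is the classical value $(π / (b - a))^{2}$. The sketch below (the finite-difference matrix and the grid size are choices made for this example) checks this value and the discrete analogue of inequality (14) on a random grid function.

```python
import numpy as np

def dirichlet_laplacian(a, b, n):
    """Second-order finite-difference matrix for -d^2/dx^2 on the interior
    nodes of a uniform grid with n intervals and zero boundary values."""
    h = (b - a) / n
    A = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h**2
    return A, h

a, b, n = 0.0, 2.0, 400
A, h = dirichlet_laplacian(a, b, n)
lam1 = np.linalg.eigvalsh(A)[0]            # smallest discrete eigenvalue
lam1_exact = (np.pi / (b - a)) ** 2        # continuous Dirichlet value

# Discrete analogue of (14): ||v'||^2 >= lam1 * ||v||^2 for v(a) = v(b) = 0
rng = np.random.default_rng(0)
v = rng.standard_normal(n - 1)
v_full = np.concatenate(([0.0], v, [0.0]))
grad_norm_sq = np.sum(np.diff(v_full) ** 2) / h
l2_norm_sq = h * np.sum(v ** 2)

print(lam1, lam1_exact)
print(grad_norm_sq >= lam1 * l2_norm_sq)
```

With this value, the constant in (15) is $(1 + (b - a)^{2} / π^{2})^{- 1}$, so the two-sided bound depends only on the length of the interval.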

3. Convexity of the functional G

In this section, we prove the convexity of the functional G, together with some important properties.

Theorem 3.1

(i)	An equivalent form of $G$, for any $ψ \in L^{2}[a, b]$, is $\begin{aligned} G(ψ) & = \| u' - u_{ψ}' \|_{L^{2}}^{2} \\ & = \int_{a}^{b} \left[ ({u'}^{2} - {u_{ψ}'}^{2}) - 2 (T_{D} ψ)(u - u_{ψ}) \right] dx. \end{aligned} \qquad (16)$
(ii)	For any $ψ_{1}, ψ_{2} \in L^{2}[a, b]$, we have $G(ψ_{1}) - G(ψ_{2}) = \int_{a}^{b} - 2 \, (T_{D}(ψ_{1} - ψ_{2})) \left( u - \frac{u_{ψ_{1}} + u_{ψ_{2}}}{2} \right) dx. \qquad (17)$
(iii)	The first Gâteaux derivative³ of $G$ at $ψ \in L^{2}[a, b]$ is given by $G'(ψ)[h] = \int_{a}^{b} (T_{D} h)(- 2 (u - u_{ψ})) \, dx, \qquad (18)$ where $h \in L^{2}[a, b]$. The $L^{2}$-gradient of $G$ at $ψ$ is given by $\nabla_{L^{2}}^{ψ} G = T_{D}^{*}(- 2 (u - u_{ψ})), \qquad (19)$ where the adjoint $T_{D}^{*}$ of $T_{D}$ is given, for any $f \in L^{2}[a, b]$, by⁴ $(T_{D}^{*} f)(x) = \int_{x}^{b} f(ξ) \, dξ, \quad \forall x \in [a, b]. \qquad (20)$
(iv)	The second Gâteaux derivative⁵ of $G$ at any $ψ \in L^{2}[a, b]$ is given by $G''(ψ)[h, k] = 2 (- Δ^{- 1}(T_{D} h), T_{D} k)_{L^{2}}, \qquad (21)$ where $h, k \in L^{2}[a, b]$. Hence for any $ψ \in L^{2}[a, b]$, $G''(ψ)$ is a positive definite quadratic form.
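The adjoint formula (20) can be checked numerically through the identity $(T_{D} h, f)_{L^{2}} = (h, T_{D}^{*} f)_{L^{2}}$. In the sketch below the trapezoid rule, the helper names and the two test functions are assumptions made for this example.

```python
import numpy as np

def trapz(y, x):
    """Composite trapezoid rule for the integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def volterra(f, x):
    """(T_D f)(x) = int_a^x f(xi) dxi, by the cumulative trapezoid rule."""
    out = np.zeros(f.size, dtype=float)
    out[1:] = np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
    return out

def volterra_adjoint(f, x):
    """(T_D^* f)(x) = int_x^b f(xi) dxi, as in (20)."""
    forward = volterra(f, x)
    return forward[-1] - forward

x = np.linspace(0.0, 1.0, 2001)
h_fun = np.cos(3.0 * x)       # arbitrary smooth direction h
f_fun = np.exp(x)             # arbitrary smooth function f

lhs = trapz(volterra(h_fun, x) * f_fun, x)          # (T_D h, f)
rhs = trapz(h_fun * volterra_adjoint(f_fun, x), x)  # (h, T_D^* f)
print(lhs, rhs)
```

Note that `volterra_adjoint` always returns zero at the right end point $b$, which is the property of the $L^{2}$-gradient (19) that motivates the Sobolev gradient in Section 4.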

For the proof of Theorem 3.1, we also need the following ancillary result.

Lemma 3.2

For fixed $ψ, h \in L^{2}[a, b]$ we have, in $H^{1}[a, b]$, $\lim_{ϵ \to 0} u_{ψ + ϵ h} = u_{ψ}. \qquad (22)$

Proof.

Subtracting the equations $- u_{ψ}'' = T_{D} ψ$ and $- u_{ψ + ϵ h}'' = T_{D}(ψ + ϵ h)$, we have $- Δ (u_{ψ + ϵ h} - u_{ψ}) = ϵ \, T_{D} h, \qquad (23)$ and using $u_{ψ + ϵ h} - u_{ψ} \in H_{0}^{2}[a, b]$ we get, via integration by parts, $(\nabla (u_{ψ + ϵ h} - u_{ψ}), \nabla (u_{ψ + ϵ h} - u_{ψ}))_{L^{2}} = ϵ \, (T_{D} h, u_{ψ + ϵ h} - u_{ψ})_{L^{2}}$. Now, as in (14), we have $λ_{1} \| u_{ψ + ϵ h} - u_{ψ} \|_{L^{2}}^{2} \leq \| \nabla (u_{ψ + ϵ h} - u_{ψ}) \|_{L^{2}}^{2}$, and using the Cauchy–Schwarz inequality, we have $\begin{aligned} & (1 + λ_{1}^{- 1})^{- 1} \| u_{ψ + ϵ h} - u_{ψ} \|_{H^{1}}^{2} \leq \| \nabla (u_{ψ + ϵ h} - u_{ψ}) \|_{L^{2}}^{2} = ϵ \, (T_{D} h, u_{ψ + ϵ h} - u_{ψ})_{L^{2}}, \\ & \implies (1 + λ_{1}^{- 1})^{- 1} \| u_{ψ + ϵ h} - u_{ψ} \|_{H^{1}} \leq ϵ \, \| T_{D} h \|_{L^{2}}, \end{aligned}$ where $λ_{1} > 0$. Hence $u_{ψ + ϵ h} \to u_{ψ}$ in $H^{1}[a, b]$ as $ϵ \to 0$, since the operator $T_{D}$ is bounded and $ψ, h \in L^{2}[a, b]$ are fixed, which implies that the right-hand side is $O(ϵ)$.

3.1. Proof of Theorem 3.1

The proofs of the first two properties, (i) and (ii), are straightforward via integration by parts, using the fact that $u - u_{ψ} \in H_{0}^{2}[a, b]$.

In order to prove (iii) and (iv), we use Lemma 3.2.

(iii)	The Gâteaux derivative of the functional $G$ at $ψ$ in the direction of $h \in L^{2}[a, b]$ is given by $G'(ψ)[h] = \lim_{ϵ \to 0} \frac{G(ψ + ϵ h) - G(ψ)}{ϵ}. \qquad (24)$ Now for a fixed $ϵ > 0$, we have, using (23), $\begin{aligned} \frac{G(ψ + ϵ h) - G(ψ)}{ϵ} & = ϵ^{- 1} \int_{a}^{b} \left[ (u' - u_{ψ + ϵ h}')^{2} - (u' - u_{ψ}')^{2} \right] dx \\ & = ϵ^{- 1} \int_{a}^{b} (u_{ψ}' - u_{ψ + ϵ h}')(2 u' - (u_{ψ + ϵ h}' + u_{ψ}')) \, dx \\ & = ϵ^{- 1} \int_{a}^{b} - Δ (u_{ψ} - u_{ψ + ϵ h}) (2 u - (u_{ψ + ϵ h} + u_{ψ})) \, dx \\ & = ϵ^{- 1} \int_{a}^{b} - ϵ \, (T_{D} h)(2 u - (u_{ψ + ϵ h} + u_{ψ})) \, dx \\ & = - (T_{D} h, 2 u - (u_{ψ + ϵ h} + u_{ψ}))_{L^{2}}. \end{aligned}$ Using Lemma 3.2, one obtains the Gâteaux derivative of $G$ at $ψ \in L^{2}[a, b]$ in the direction of $h \in L^{2}[a, b]$ as $G'(ψ)[h] = (T_{D} h, - 2 (u - u_{ψ}))_{L^{2}}$. Note that $(T_{D} h, - 2 (u - u_{ψ}))_{L^{2}} = (h, T_{D}^{*}(- 2 (u - u_{ψ})))_{L^{2}}$ for all $h \in L^{2}[a, b]$, where $T_{D}^{*}$ is the adjoint of the operator $T_{D}$. Hence, by the Riesz representation theorem, the $L^{2}$-gradient of the functional $G$ at $ψ$ is given by $\nabla_{L^{2}}^{ψ} G = T_{D}^{*}(- 2 (u - u_{ψ}))$.
(iv)	Finally, the second Gâteaux derivative of the functional $G$ at $ψ$ is given by $G''(ψ)[h, k] = \lim_{ϵ \to 0} \frac{G'(ψ + ϵ h)[k] - G'(ψ)[k]}{ϵ}. \qquad (25)$ Again for a fixed $ϵ > 0$, we have, using (23), $\begin{aligned} \frac{G'(ψ + ϵ h)[k] - G'(ψ)[k]}{ϵ} & = ϵ^{- 1} \int_{a}^{b} \left[ (T_{D} k)(- 2 (u - u_{ψ + ϵ h})) - (T_{D} k)(- 2 (u - u_{ψ})) \right] dx \\ & = ϵ^{- 1} \int_{a}^{b} - 2 \, (T_{D} k)(u_{ψ} - u_{ψ + ϵ h}) \, dx \\ & = ϵ^{- 1} \int_{a}^{b} - 2 \, (T_{D} k)(ϵ \, Δ^{- 1}(T_{D} h)) \, dx \\ & = 2 \int_{a}^{b} (T_{D} k)(- Δ^{- 1}(T_{D} h)) \, dx \\ & = 2 (- Δ^{- 1}(T_{D} h), T_{D} k)_{L^{2}}. \end{aligned}$ Hence from (25), letting $ϵ \to 0$, we get $G''(ψ)[h, k] = 2 (- Δ^{- 1}(T_{D} h), T_{D} k)_{L^{2}}$. Here we can see the strict convexity of the functional $G$, as for any $h \in L^{2}[a, b]$ we have $G''(ψ)[h, h] = 2 (- Δ^{- 1}(T_{D} h), T_{D} h)_{L^{2}} = 2 (y, - Δ y)_{L^{2}}$, where $- Δ y = T_{D} h$ and $y \in H_{0}^{2}[a, b]$ (from (23)). As $- Δ$ is a positive operator on $H_{0}^{2}[a, b]$, $y$ is the trivial solution if and only if $T_{D} h = 0$. But $\int_{a}^{x} h(ξ) \, dξ = 0$ for all $x \in [a, b]$ if and only if $h \equiv 0$. Thus $G''(ψ)$ is a positive definite form for any $ψ \in L^{2}[a, b]$.

In this section, we proved that the functional $G$ is strictly convex and hence has a unique minimizer, which is attained by the solution $φ$ of the inverse problem (1). We next discuss a descent algorithm that uses the $L^{2}$-gradient to derive other gradients that provide descent directions with better and faster descent rates.

4. The descent algorithm

Here we discuss the problem of minimizing the functional $G$ via a descent method. Theorem 3.1 suggests that the minimization of $G$ should be computationally effective, in that $φ$ is not only the unique global minimizer of $G$ but also the unique zero of the gradient, that is, $\nabla_{L^{2}}^{ψ} G \neq 0$ for $ψ \neq φ$. Now for a given $ψ \in L^{2}[a, b]$, let $h \in L^{2}[a, b]$ denote an update direction for $ψ$. Then Taylor's expansion gives $G(ψ - α h) = G(ψ) - α G'(ψ)[h] + O(α^{2})$ or, for sufficiently small $α > 0$, $G(ψ - α h) - G(ψ) \approx - α G'(ψ)[h]. \qquad (26)$ So if we choose the direction $h$ in such a way that $G'(ψ)[h] > 0$, then $G$ decreases when $ψ$ is updated to $ψ - α h$. Thus we can set up a recovery algorithm for $φ$, forming a sequence of values $G(ψ_{\text{initial}} - α_{1} h_{1}) > \dots > G(ψ_{m - 1} - α_{m} h_{m}) > G(ψ_{m} - α_{m + 1} h_{m + 1}) \geq 0$. We list a number of different gradient directions that can make $G'(ψ)[h] > 0$.

The $L^{2}$-Gradient:
First, notice from Theorem 3.1 that at a given $ψ \in L^{2}[a, b]$, $G'(ψ)[h] = (h, \nabla_{L^{2}}^{ψ} G)_{L^{2}}, \qquad (27)$ so if we choose the direction $h = \nabla_{L^{2}}^{ψ} G$ at $ψ$, then $G'(ψ)[h] > 0$. However, there are numerical issues associated with the $L^{2}$-gradient of $G$ during the descent process, stemming from the fact that it is always zero at $b$. Consequently, the boundary data at $b$ for the evolving functions $ψ_{m}$ are invariant during the descent and there is no control on the evolving boundary data at $b$. This can result in severe decay near the boundary point $b$ if $ψ_{\text{initial}}(b) \neq φ(b)$, as the end point of every such $ψ_{m}$ is glued to $ψ_{\text{initial}}(b)$, even though $ψ_{m} \to φ$ weakly in $L^{2}[a, b]$, as is proved in Section 5.
The $H^{1}$-Gradient:
One can circumvent this problem by opting for the Sobolev gradient $\nabla_{H^{1}}^{ψ} G$ instead (see [20]), which is also known as the Neuberger gradient. It is defined as follows: for any $h \in H^{1}[a, b]$, $\begin{aligned} G'(ψ)[h] & = (\nabla_{H^{1}}^{ψ} G, h)_{H^{1}} \\ & = (g', h')_{L^{2}} + (g, h)_{L^{2}} \\ & = - (g'', h)_{L^{2}} + (g, h)_{L^{2}} + [g' h]_{a}^{b} \\ & = (- g'' + g, h)_{L^{2}} + [g' h]_{a}^{b}, \end{aligned} \qquad (28)$ where $g = \nabla_{H^{1}}^{ψ} G$ (in this section $g$ denotes this gradient, not the data). Comparing with (27), one can obtain the Neuberger gradient $g$ at $ψ$ by solving the boundary value problem $\begin{aligned} - g'' + g & = \nabla_{L^{2}}^{ψ} G, \\ [g' h]_{a}^{b} & = 0. \end{aligned} \qquad (29)$ Setting $h = g$, the boundary condition becomes $[g' g]_{a}^{b} = g'(b) g(b) - g'(a) g(a) = 0. \qquad (30)$ This provides us with a gradient, $\nabla_{H^{1}}^{ψ} G$, with considerably more flexibility at the boundary points $a$ and $b$. In particular, consider the following cases:
1. Dirichlet Neuberger gradient : $g (a) = 0$ and $g (b) = 0$ .
2. Neumann Neuberger gradient : $g^{'} (a) = 0$ and $g^{'} (b) = 0$ .
3. Robin or mixed Neuberger gradient : $g (a) = 0$ and $g^{'} (b) = 0$ or $g^{'} (a) = 0$ and $g (b) = 0$ .
This smoothing technique was originally introduced and used by Neuberger. In addition to the flexibility at the end points, it makes the new gradient a preconditioned (smoothed) version of $\nabla_{L^{2}}^{ψ} G$, as $g = (I - Δ)^{- 1} \nabla_{L^{2}}^{ψ} G$, and hence gives superior convergence in steepest descent algorithms. So now choosing the descent direction $h = g$ at $ψ$ makes $G'(ψ)[h] = (g, \nabla_{H^{1}}^{ψ} G)_{H^{1}} > 0$ and hence $G(ψ - α h) - G(ψ) < 0$ (from (26)). As stated earlier, the greatest advantage of this gradient is the control of boundary data during the descent process, since, based on some prior information about the boundary data, we can choose any one of the three aforementioned gradients. For example, if $φ(a)$ and $φ(b)$ are known a priori, then one can define $ψ_{\text{initial}}$ as the straight line joining them and use the Dirichlet Neuberger gradient for the descent. Thus the boundary data is preserved in each of the evolving $ψ_{m}$ during the descent process, which leads to a much more efficient, and faster, descent compared to the normal $L^{2}$-gradient descent. Even when $φ |_{\{a, b\}}$ is unknown, one can use the Neumann Neuberger gradient, which allows free movement at the boundary points rather than gluing them to a fixed value. In the latter scenario, one can even take the average of $\nabla_{H^{1}}^{ψ} G$ and $\nabla_{L^{2}}^{ψ} G$ to make use of both gradients.
The $L^{2}$–$H^{1}$ Conjugate Gradient:
If one wishes to further boost the descent speed (by, roughly, a factor of 2) and make the best use of both gradients, then the standard Polak–Ribière conjugate gradient scheme (see [21]) can be implemented. The initial search direction at $ψ_{0}$ is $h_{0} = g_{0} = \nabla_{H^{1}}^{ψ_{0}} G$. At $ψ_{m}$ one can use an exact or inexact line search routine to minimize $G(ψ)$ in the direction of $h_{m}$, resulting in $ψ_{m + 1}$. Then $g_{m + 1} = \nabla_{H^{1}}^{ψ_{m + 1}} G$ and $h_{m + 1} = g_{m + 1} + γ_{m} h_{m}$, where $γ_{m} = \frac{(g_{m + 1} - g_{m}, g_{m + 1})_{H^{1}}}{(g_{m}, g_{m})_{H^{1}}} = \frac{(g_{m + 1} - g_{m}, \nabla_{L^{2}}^{ψ_{m + 1}} G)_{L^{2}}}{(g_{m}, \nabla_{L^{2}}^{ψ_{m}} G)_{L^{2}}}. \qquad (31)$
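To illustrate the Neumann Neuberger gradient, the sketch below solves the boundary value problem (29) with $g'(a) = g'(b) = 0$ on a uniform grid. The discretization (a lumped mass matrix plus a finite-difference stiffness matrix, which builds the Neumann condition in as a natural boundary condition) and the function name are assumptions made for this example, and the input is a hypothetical $L^{2}$-gradient rather than one computed from data.

```python
import numpy as np

def neumann_sobolev_gradient(grad_l2, x):
    """Solve -g'' + g = grad_l2 with g'(a) = g'(b) = 0 on a uniform grid."""
    n = x.size - 1
    h = (x[-1] - x[0]) / n
    w = np.full(n + 1, h)                 # trapezoid (lumped mass) weights
    w[[0, -1]] = h / 2
    K = (2 * np.eye(n + 1) - np.eye(n + 1, k=1) - np.eye(n + 1, k=-1)) / h
    K[0, 0] = K[-1, -1] = 1.0 / h         # natural (Neumann) end rows
    # Weak form of (29): (g', v') + (g, v) = (grad_l2, v) for all grid v
    return np.linalg.solve(K + np.diag(w), w * grad_l2)

x = np.linspace(0.0, 1.0, 201)
grad_l2 = 1.0 - x                         # hypothetical input, zero at b
g_h1 = neumann_sobolev_gradient(grad_l2, x)
print(grad_l2[-1], g_h1[-1])              # the H^1 gradient is free to move at b
```

The input vanishes at $b$, as every $L^{2}$-gradient of $G$ does, while the smoothed output does not, so a descent step along it can change the value of the iterate at the right end point.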

Remark 4.1

Though the $L^{2}$–$H^{1}$ conjugate gradient boosts the descent rate, it (sometimes) compromises the accuracy of the recovery, especially when the noise present in the data is extreme, see Table when $σ = 0.1$. One can also construct an $H^{1}$–$H^{1}$ conjugate gradient based only on the Sobolev gradient. This is smoother than the $L^{2}$–$H^{1}$ conjugate gradient (as Sobolev gradients are smoother, see Equation (28)) and hence improves the accuracy of the recovered solution (especially for smooth recovery), but it is a tad slower than the $L^{2}$–$H^{1}$ conjugate gradient. Finally, the simple Sobolev gradient ($\nabla_{H^{1}}^{ψ} G$) provides the most accurate recovery, but at the cost of the descent rate (it is the slowest amongst the three), see Tables and .

4.1. The line search method

We minimize the single-variable function $f_{m}(α) = G(ψ_{m + 1}(α))$, where $ψ_{m + 1}(α) = ψ_{m} - α \nabla_{H^{1}}^{ψ_{m}} G$, via a line search minimization, by first bracketing the minimum and then using a well-known optimization technique such as Brent minimization to approximate it further. Note that the function $f_{m}(α)$ is strictly decreasing in some neighbourhood of $α = 0$, as $f_{m}'(0) = - \| g_{m} \|_{H^{1}}^{2} < 0$.

In order to achieve numerical efficiency, we need to carefully choose the initial step size $α_{0}$. For that, we use the quadratic approximation of the function $f_{m}(α)$ as follows: $f_{m}(α) \approx G(ψ_{m}) - α G'(ψ_{m})[g_{m}] + \frac{1}{2} α^{2} G''(ψ_{m})[g_{m}, g_{m}], \qquad (32)$ which gives the minimizing value for $α$ as $α_{0} = \frac{G'(ψ_{m})[g_{m}]}{G''(ψ_{m})[g_{m}, g_{m}]}. \qquad (33)$ Since $α_{0}$ is derived from the quadratic approximation of the functional $G$, it is usually very close to the optimal value, thereby reducing the computational time of the descent algorithm significantly. Now if, for some $k = 0, 1, 2, \dots$, $f_{m}((k + 1) α_{0}) > f_{m}(k α_{0})$, then we have a bracket, $[\max \{k - 1, 0\} α_{0}, (k + 1) α_{0}]$, for the minimum, and one can use single-variable minimization solvers to approximate it.
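The bracketing rule above can be sketched as follows. The function names are hypothetical, and a golden-section search is used here as a simple stand-in for the Brent minimization mentioned in the text; the objective is a toy one-dimensional function, not the functional $G$.

```python
import math

def bracket_minimum(f, alpha0, max_steps=100):
    """Scan f at multiples of alpha0 and return (lo, hi) as soon as
    f((k + 1) * alpha0) > f(k * alpha0), following the rule in the text."""
    prev = f(0.0)
    for k in range(max_steps):
        cur = f((k + 1) * alpha0)
        if cur > prev:
            return max(k - 1, 0) * alpha0, (k + 1) * alpha0
        prev = cur
    raise RuntimeError("no bracket found within max_steps")

def golden_section(f, lo, hi, tol=1e-8):
    """Golden-section search for a minimiser of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - invphi * (hi - lo)
        d = lo + invphi * (hi - lo)
        if f(c) < f(d):
            hi = d          # the minimiser lies in [lo, d]
        else:
            lo = c          # the minimiser lies in [c, hi]
    return 0.5 * (lo + hi)

f = lambda alpha: (alpha - 2.3) ** 2      # toy objective, minimiser at 2.3
lo, hi = bracket_minimum(f, alpha0=1.0)
alpha_star = golden_section(f, lo, hi)
print(lo, hi, alpha_star)
```

When the objective is exactly quadratic along the search direction, the step $α_{0}$ from (33) is already the minimiser, and the scan stops after two evaluations with the bracket $[0, 2 α_{0}]$.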

In this section, we saw a descent algorithm, with various gradients, where, starting from an initial guess $ψ_{0} \in L^{2}[a, b]$, we obtain a sequence of $L^{2}$-functions $ψ_{m}$ for which the sequence $\{G(ψ_{m})\}$ is strictly decreasing. In the next section, we discuss the convergence of the $ψ_{m}$'s to $φ$ and the stability of the recovery.
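The pieces of this section can be put together in a compact sketch of steepest descent with a Neumann Sobolev gradient. Everything about the discretization is an assumption made for illustration: a uniform grid, the trapezoid rule for $T_{D}$, a zero initial guess, a fixed iteration count in place of the stopping criteria discussed later in the paper, and an exact line search (available because the discrete functional is quadratic) in place of bracketing. The sketch also uses the identity $G(ψ) = (r, w)_{L^{2}}$ with $r = g_{1} - T_{D} ψ$ and $- w'' = r$, $w(a) = w(b) = 0$, which follows from $- u'' = g_{1}$, (7) and integration by parts with $w = u - u_{ψ}$.

```python
import numpy as np

def build_operators(x):
    """Discrete T_D, Dirichlet -Laplacian A, and H^1 inner-product matrix M."""
    n = x.size - 1
    h = (x[-1] - x[0]) / n
    T = np.tril(np.full((n + 1, n + 1), h))      # cumulative trapezoid rule
    T[:, 0] = h / 2
    T[np.arange(n + 1), np.arange(n + 1)] = h / 2
    T[0, :] = 0.0                                # integral from a to a is 0
    A = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h**2
    w = np.full(n + 1, h)
    w[[0, -1]] = h / 2
    K = (2 * np.eye(n + 1) - np.eye(n + 1, k=1) - np.eye(n + 1, k=-1)) / h
    K[0, 0] = K[-1, -1] = 1.0 / h                # natural (Neumann) ends
    return T, A, np.diag(w) + K, h

def G_and_w(psi, g1, T, A, h):
    """Return G(psi) = (r, w) and w = u - u_psi at the interior nodes."""
    r = (g1 - T @ psi)[1:-1]
    w = np.linalg.solve(A, r)
    return h * (r @ w), w

def sobolev_descent(x, g, n_iter=50):
    """Steepest descent on the discrete G along the discrete H^1 gradient."""
    T, A, M, h = build_operators(x)
    g1 = g - g[0]
    psi = np.zeros(x.size)                       # assumed initial guess
    history = []
    for _ in range(n_iter):
        G, w = G_and_w(psi, g1, T, A, h)
        history.append(G)
        w_full = np.concatenate(([0.0], w, [0.0]))
        grad_e = -2.0 * h * (T.T @ w_full)       # Euclidean gradient, cf. (19)
        d = np.linalg.solve(M, grad_e)           # Sobolev gradient, cf. (29)
        Td = (T @ d)[1:-1]
        q = h * (Td @ np.linalg.solve(A, Td))    # equals G''[d, d] / 2
        if q <= 0.0:
            break
        psi = psi - (grad_e @ d) / (2.0 * q) * d  # exact step, cf. (33)
    history.append(G_and_w(psi, g1, T, A, h)[0])
    return psi, np.array(history)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
g_noisy = np.sin(np.pi * x) + 0.01 * rng.standard_normal(x.size)
psi, hist = sobolev_descent(x, g_noisy, n_iter=50)
print(hist[0], hist[-1])
```

The returned history makes the monotone decrease of $G(ψ_{m})$ visible; how closely `psi` approximates $g'$ depends on the noise level and on when the iteration is stopped, which is the topic of Sections 5 and 8.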

5. Convergence, stability and conditional well-posedness

Exact data: We first prove that the sequence of functions constructed during the descent process converges to the exact source function in the absence of any error term, and then prove the stability of the process in the presence of noise in the data.

5.1. Convergence

First we see that for the sequence $\{ψ_{m}\}$ produced by the steepest descent algorithm described in Section 4, we have $G(ψ_{m}) \to 0$, since the functional $G$ is non-negative and strictly convex (with the global minimizer $φ$, $G(φ) = 0$) and $G(ψ_{m + 1}) < G(ψ_{m})$. In this section, we prove that for any sequence of functions $\{ψ_{m}\} \subset L^{2}[a, b]$ such that $G(ψ_{m}) \to 0$, we have $ψ_{m} \overset{w}{\to} φ$ in $L^{2}[a, b]$, where $\overset{w}{\to}$ denotes weak convergence in $L^{2}[a, b]$.

Theorem 5.1

Suppose that $\{ψ_{m}\}$ is any sequence of $L^{2}$-functions such that the sequence $\{G (ψ_{m})\}$ tends to zero. Then $\{ψ_{m}\}$ converges weakly to ϕ in $L^{2} [a, b]$ and $\{u_{ψ_{m}}\}$ converges strongly to u in $H^{1} [a, b]$. Also, the sequence $\{g_{m}\}$ converges weakly to $g_{1}$ in $L^{2} [a, b]$, where $g_{m} = - u_{ψ_{m}}^{″}$ and $g_{1} = - u^{″}$.

Proof.

The proof of $u_{ψ_{m}} \overset{s}{\to} u$ in $H^{1} [a, b]$ follows directly from the bounds on $G (ψ)$ in Equation (15), which give $\| u - u_{ψ_{m}} \|_{H^{1}}^{2} \leq (1 + \frac{1}{λ_{1}}) G (ψ_{m}) .$ (34) To see the weak convergence of $\{ψ_{m}\}$ to ϕ in $L^{2} [a, b]$, we first prove that the sequence $\{T_{D} ψ_{m}\}$ converges weakly to $T_{D} φ$ in $L^{2} [a, b]$. Since $\{u_{ψ_{m}}\}$ converges strongly to u in $H^{1} [a, b]$, the sequences $\{u_{ψ_{m}}^{'}\}$ and $\{u_{ψ_{m}}\}$ converge weakly to $u^{'}$ and u in $L^{2} [a, b]$, respectively. Now we use the fact that $C_{0}^{\infty} [a, b]$ is dense in $L^{2} [a, b]$ (in the $L^{2}$-norm), i.e. for any $ψ \in L^{2} [a, b]$ and $ϵ > 0$ there exists a $\tilde{ψ} \in C_{0}^{\infty} [a, b]$ such that $\| ψ - \tilde{ψ} \|_{L^{2}} \leq ϵ$. So for any $\tilde{ψ} \in C_{0}^{\infty} [a, b]$ we have, using ${\tilde{ψ}}^{'} \in L^{2} [a, b]$, $\begin{aligned} (T_{D} ψ_{m} - T_{D} φ, \tilde{ψ})_{L^{2}} & = (- Δ (u_{ψ_{m}} - u), \tilde{ψ})_{L^{2}} \\ & = (\nabla (u_{ψ_{m}} - u), {\tilde{ψ}}^{'})_{L^{2}}, \end{aligned}$ (35) which tends to zero as $m \to \infty$. Hence, by the density of $C_{0}^{\infty} [a, b]$ in $L^{2} [a, b]$, it can be proved that the sequence $\{T_{D} ψ_{m}\}$ converges weakly to $T_{D} φ$ in $L^{2} [a, b]$.

To prove the weak convergence of $\{ψ_{m}\}$ to ϕ in $L^{2} [a, b]$, we can use $(T_{D} (ψ_{m} - φ), ψ)_{L^{2}} = (ψ_{m} - φ, T_{D}^{*} ψ)_{L^{2}}$ and (35). Therefore, our proof will be complete if we can show that the range of $T_{D}^{*}$ is dense in $L^{2} [a, b]$ in the $L^{2}$-norm. Again we start with any $\tilde{ψ} \in C_{0}^{\infty} [a, b]$, and using ${\tilde{ψ}}^{'} \in L^{2} [a, b]$, we have $\begin{aligned} (T_{D}^{*} (- {\tilde{ψ}}^{'})) (x) & = - \int_{x}^{b} {\tilde{ψ}}^{'} (η) d η \\ & = \tilde{ψ} (x), \end{aligned}$ i.e. $C_{0}^{\infty} [a, b] \subset \operatorname{Range} (T_{D}^{*})$. Hence $\operatorname{Range} (T_{D}^{*})$ is dense in $L^{2} [a, b]$ in the $L^{2}$-norm. For the convergence of $\{g_{m}\}$ to $g_{1}$, note that $T_{D} ψ_{m} = - u_{ψ_{m}}^{″} = g_{m}$ and $T_{D} φ = - u^{″} = g_{1}$. Since $\{T_{D} ψ_{m}\}$ converges weakly to $T_{D} φ$ in $L^{2} [a, b]$, it follows that $\{g_{m}\}$ converges weakly to $g_{1}$ in $L^{2} [a, b]$.
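
The range identity used in the proof, $T_{D}^{*} (- {\tilde{ψ}}^{'}) = \tilde{ψ}$, can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `adjoint_TD` is hypothetical, the integral is approximated by a composite midpoint rule, and the test function $\sin^{2} (π x)$ on $[0, 1]$ is chosen merely because it vanishes at both end points (it is not in $C_{0}^{\infty}$, but the identity only needs $\tilde{ψ} (b) = 0$).

```python
import math

def adjoint_TD(chi, b, n=2000):
    """Return the function x -> integral_x^b chi(s) ds (midpoint rule)."""
    def apply(x):
        h = (b - x) / n
        return h * sum(chi(x + (i + 0.5) * h) for i in range(n))
    return apply

# Assumed test function on [0, 1] and its exact derivative
b = 1.0
psi = lambda x: math.sin(math.pi * x) ** 2
dpsi = lambda x: math.pi * math.sin(2.0 * math.pi * x)

# Apply the adjoint operator to -psi' and compare with psi itself
recovered = adjoint_TD(lambda s: -dpsi(s), b)
sample_points = [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]
max_err = max(abs(recovered(x) - psi(x)) for x in sample_points)
```

The maximum deviation at the sample points is at the level of the quadrature error, which is consistent with $\tilde{ψ}$ lying in the range of $T_{D}^{*}$.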

Remark 5.2

It can be further proved that the sequence $\{ψ_{m}\}$ converges strongly to ϕ in $L^{2} [a, b]$. To prove this, one first needs to analyse the operator associated with the minimizing functional G defined in (12). From the definition of $u_{ψ}$ in (7), together with the criterion $u_{ψ} \in H_{0}^{2} [a, b]$, we have (from (10), with $g = T_{D} ψ$ and $(T_{D} ψ) (a) = 0$ by the definition of $T_{D}$) $u_{ψ}^{'} = - \int_{a}^{x} (T_{D} ψ) (ξ) d ξ + \frac{1}{b - a} \int_{a}^{b} \int_{a}^{η} (T_{D} ψ) (ξ) d ξ d η .$ (36) From the expression (36), the operator $L (ψ) := u_{ψ}^{'}$ is both linear and bounded, and hence minimizing the functional G in (12) is equivalent to Landweber iterations corresponding to the operator L. Then, from the convergence theory developed for Landweber iterations, the sequence $\{ψ_{m}\}$ converges to ϕ strongly in $L^{2} [a, b]$; for details on iterative regularization see [14,16].

Theorem 5.1 proves that for the given function $g_{1} \in L^{2} [a, b]$ we are able to construct a sequence of smooth functions, $\{g_{m}\} \subset L^{2} [a, b]$, that converges (weakly) to $g_{1}$ in $L^{2} [a, b]$. This is critical because, when the data contains noise, one needs to terminate the descent process at an appropriate instance to attain regularization (see Section 5.3), and the (weak) convergence helps us to construct such a stopping criterion in the absence of noise information (δ), see Section 8.

Noisy data: In this section, we consider data that contains noise and show that the sequence of functions constructed during the descent process using the noisy data still approximates the exact solution, under some conditions.

5.2. Stability

Here we prove the stability of the process. We do this by treating the problem of numerical differentiation as equivalent to finding the unique minimizer of the non-negative functional G, which makes the problem well-posed. That is, for a given $g \in H^{1} [a, b]$, and hence a given $u \in H_{0}^{3} [a, b]$ and the functional G, the problem of finding the derivative $φ \in L^{2} [a, b]$ such that $g^{'} = φ$ is equivalent to finding the minimizer of the functional G; finding a $ψ \in L^{2} [a, b]$ such that $G (ψ) \leq ϵ$, for any small $ϵ > 0$, is a conditionally well-posed problem. It is not hard to prove that if two functions g, $\tilde{g} \in H^{1} [a, b]$ are such that $\| g - \tilde{g} \|_{L^{2}} \leq δ$, where $δ > 0$ is small, then the corresponding u, $\tilde{u} \in H_{0}^{3} [a, b]$ also satisfy⁶ $\| u - \tilde{u} \|_{H^{1}} \leq C δ$, for some constant C.

Theorem 5.3

Suppose the function $\tilde{u}$ is a perturbed version of the function u such that $\| u - \tilde{u} \|_{H^{1}} \leq δ$, where $δ > 0$, and let ϕ, $\tilde{φ} \in L^{2} [a, b]$ denote their respective recovered functions, such that $T_{D} φ = - u^{″}$ and $T_{D} \tilde{φ} = - {\tilde{u}}^{″}$. Let the functional G, without loss of generality, be defined based on $\tilde{u}$, that is, $G (\tilde{φ}) = 0$. Then we have $0 \leq G (φ) \leq C δ^{2},$ (37) where C is some constant.

Proof.

Since $u_{φ} = u$ and ${\tilde{u}}_{\tilde{φ}} = \tilde{u}$, the proof follows from the definitions of the corresponding functionals, as $G (φ) = \| u_{φ}^{'} - {\tilde{u}}^{'} \|_{L^{2}}^{2} = \| u^{'} - {\tilde{u}}^{'} \|_{L^{2}}^{2} \leq C δ^{2}$.

In the next theorem, we prove that if a sequence of functions $\{ψ_{m}\}$ converges to $\tilde{φ}$ in $L^{2} [a, b]$, then it also approximates ϕ in $L^{2} [a, b]$, that is, if $G_{\tilde{u}} (ψ_{m})$ is small, then $G_{u} (ψ_{m})$ is also small, where $G_{u}$ and $G_{\tilde{u}}$ are the functionals formed based on u and $\tilde{u}$, respectively.

Theorem 5.4

Suppose that for a sequence of functions $\{ψ_{m}\} \subset L^{2} [a, b]$, the corresponding sequence $\{G_{\tilde{u}} (ψ_{m})\} \subset \mathbb{R}$ converges to zero, where the functional $G_{\tilde{u}}$ is formed based on $\tilde{u}$. Then for the original u such that $\| u - \tilde{u} \|_{H^{1}} \leq δ$, for small $δ > 0$, there exists an $M (δ) \in \mathbb{N}$ such that for all $m \geq M (δ)$, $G_{u} (ψ_{m}) \leq c δ^{2}$ for some constant c, where $G_{u}$ is the functional based on u.

Proof.

For u, $\tilde{u}$ and any $ψ_{m} \in L^{2} [a, b]$, we have $G_{u} (ψ_{m}) = \| u^{'} - u_{ψ_{m}}^{'} \|_{L^{2}}^{2}$ and $G_{\tilde{u}} (ψ_{m}) = \| {\tilde{u}}^{'} - u_{ψ_{m}}^{'} \|_{L^{2}}^{2}$. By the triangle inequality and $(p + q)^{2} \leq 2 p^{2} + 2 q^{2}$, $\begin{aligned} G_{u} (ψ_{m}) & = \| u^{'} - u_{ψ_{m}}^{'} \|_{L^{2}}^{2} \\ & \leq 2 \| {\tilde{u}}^{'} - u^{'} \|_{L^{2}}^{2} + 2 \| {\tilde{u}}^{'} - u_{ψ_{m}}^{'} \|_{L^{2}}^{2} \\ & \leq 2 δ^{2} + 2 G_{\tilde{u}} (ψ_{m}) . \end{aligned}$ Since $G_{\tilde{u}} (ψ_{m}) \to 0$ by assumption, there exists an $M (δ)$ such that $G_{\tilde{u}} (ψ_{m}) \leq δ^{2}$ for all $m \geq M (δ)$, and the result follows.

5.3. Conditional well-posedness (iterative-regularization)

As explained earlier, in an external-parameter-based regularization method (like Tikhonov-type regularization), one first converts the ill-posed problem into a family of well-posed problems (depending on the parameter value β) and then, only after finding an appropriate regularization parameter value (say $β_{0}$), proceeds to the recovery process, i.e. completely minimizes the corresponding functional $G (\cdot, β_{0})$ (as defined in (3)). In a classical iterative regularization method (not involving any external parameter like β), by contrast, one cannot recover a regularized solution by simply minimizing a related functional completely; instead, stopping the recovery process at an appropriate instance provides the regularizing effect. That is, one starts to minimize some related functional (to recover the solution) but then terminates the minimization process at an appropriate iteration before it has reached the minimum (to restrict the influence of the noise), i.e. the iteration stopping index serves as the regularization parameter; for details see [14,16].

In this section, we further explain the above phenomenon by showing that attempting to recover the true (or original) solution ϕ from noisy data distorts the recovery. First we see that for an exact g (or equivalently, an exact u) and the functional constructed from it (i.e. $G_{u}$), the true solution ϕ satisfies $G_{u} (φ) = 0$. However, for a given noisy $\tilde{g}$, with $\| \tilde{g} - g \|_{L^{2}} = δ > 0$, and the functional based on it (i.e. $G_{\tilde{u}}$), we will have $G_{\tilde{u}} (φ) > 0$, see Theorem 5.5. So if we construct a sequence of functions $ψ_{m}^{δ} \in L^{2} [a, b]$, using the descent algorithm and based on the noisy data $\tilde{g}$, such that $G_{\tilde{u}} (ψ_{m}^{δ}) \to 0$, then (from Theorem 5.1) we will have $ψ_{m}^{δ} \to \tilde{φ}$, where $\tilde{φ}$ is the recovered noisy solution satisfying $G_{\tilde{u}} (\tilde{φ}) = 0$. This implies that $ψ_{m}^{δ}$ initially approaches ϕ and then, upon further iterations, moves away from ϕ and approaches $\tilde{φ}$. Hence, the errors in the recoveries, $\| ψ_{m}^{δ} - φ \|_{L^{2}}$, exhibit semi-convergence, i.e. they first decrease and then increase. This is typical behaviour for an ill-posed problem and is managed, as stated above, by stopping the descent process at an appropriate iteration $M (δ)$ such that $G_{\tilde{u}} (ψ_{M (δ)}^{δ}) > 0$ but close to zero (due to the stability Theorems 5.3 and 5.4). Following similar arguments as in (15), we can obtain a lower bound for $G_{\tilde{u}} (φ)$.

Theorem 5.5

Given two functions u, $\tilde{u} \in H_{0}^{3} [a, b]$, their respective recoveries ϕ, $\tilde{φ} \in L^{2} [a, b]$ such that $T_{D} φ = - u^{″}$ and $T_{D} \tilde{φ} = - {\tilde{u}}^{″}$, and the functional $G_{\tilde{u}}$ defined based on $\tilde{u}$, that is, $G_{\tilde{u}} (\tilde{φ}) = 0$, we have $G_{\tilde{u}} (φ) = \| u^{'} - {\tilde{u}}^{'} \|_{L^{2}}^{2} \geq λ_{1} \| u - \tilde{u} \|_{L^{2}}^{2},$ (38a) as an $L^{2}$-lower bound and, as an $H^{1}$-lower bound, $G_{\tilde{u}} (φ) = \| u^{'} - {\tilde{u}}^{'} \|_{L^{2}}^{2} \geq {(1 + \frac{1}{λ_{1}})}^{- 1} \| u - \tilde{u} \|_{H^{1}}^{2},$ (38b) where $λ_{1} = π^{2} / (b - a)^{2}$ is the smallest eigenvalue of $- Δ$ on $H_{0}^{2} [a, b]$.
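
Both lower bounds can be traced back to the Poincaré inequality. The short derivation below is a sketch, writing $w = u - \tilde{u}$ (a shorthand introduced here, not used elsewhere in the paper) and assuming that w vanishes at a and b, as is the case for the difference of the two integrated data functions:

```latex
\begin{aligned}
\lambda_1 \|w\|_{L^2}^2 &\le \|w'\|_{L^2}^2
  && \text{(Poincar\'e inequality, } w(a) = w(b) = 0\text{)} \\
\|w\|_{H^1}^2 &= \|w\|_{L^2}^2 + \|w'\|_{L^2}^2
  \le \Bigl(\frac{1}{\lambda_1} + 1\Bigr) \|w'\|_{L^2}^2 \\
\Longrightarrow \quad \|w'\|_{L^2}^2 &\ge \Bigl(1 + \frac{1}{\lambda_1}\Bigr)^{-1} \|w\|_{H^1}^2 .
\end{aligned}
```

Since $G_{\tilde{u}} (φ) = \| w^{'} \|_{L^{2}}^{2}$, the first line is the $L^{2}$-bound and the last line is the $H^{1}$-bound.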

Therefore, combining Theorems 5.3 and 5.5, we have the following two-sided inequality for $G_{\tilde{u}} (φ)$, for some constants $C_{1}$ and $C_{2}$: $0 \leq C_{1} \| u - \tilde{u} \|_{H^{1}}^{2} \leq G_{\tilde{u}} (φ) \leq C_{2} \| u - \tilde{u} \|_{H^{1}}^{2} .$ Thus, when $δ \to 0$ we have $G_{\tilde{u}} (φ) \to 0$, which implies $φ_{δ} \to φ$ in $L^{2} [a, b]$. Although we would like to use the bounds in Theorem 5.5 to terminate the descent process, we do not know the exact g (or equivalently, the exact u), and hence cannot use them as a stopping condition. However, if the error norm $δ = \| g - g_{δ} \|_{L^{2}}$ is known, then one can use Morozov's discrepancy principle [22] as a stopping criterion for the iteration process, that is, terminate the iteration when $\| T ψ_{m} - g_{δ} \|_{L^{2}} \leq τ δ$ (39) for an appropriate $τ > 1$.⁷ For unknown δ, one usually resorts to heuristic approaches to stop the iterations, an example of which is presented in Section 8.
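
The discrepancy-based termination can be sketched generically as below. This is an illustration under stated assumptions, not the code used for the experiments: `step` stands for one descent update $ψ_{m} \mapsto ψ_{m + 1}$, `residual_norm` for $\| T ψ_{m} - g_{δ} \|_{L^{2}}$, and both names, as well as the scalar toy problem, are hypothetical.

```python
def descend_with_discrepancy_stop(step, residual_norm, psi0, delta,
                                  tau=1.1, max_iter=1000):
    """Iterate psi <- step(psi) until residual_norm(psi) <= tau * delta.

    Returns the last iterate and the iteration index at which the
    discrepancy condition was first met (or max_iter if it never was).
    """
    psi = psi0
    for m in range(max_iter):
        if residual_norm(psi) <= tau * delta:
            return psi, m
        psi = step(psi)
    return psi, max_iter

# Scalar toy problem (assumption): noisy datum 1.0, each step halves the residual
noisy_value = 1.0
step = lambda p: p + 0.5 * (noisy_value - p)
residual_norm = lambda p: abs(noisy_value - p)

psi_stop, m_stop = descend_with_discrepancy_stop(
    step, residual_norm, psi0=0.0, delta=0.1, tau=1.0)
```

In the toy run the residuals are 1, 0.5, 0.25, 0.125, 0.0625, so the iteration stops at index 4, before the noisy datum is fitted exactly.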

6. Numerical implementation

In this section, we provide an algorithm to compute the derivative numerically. Notice that although one can use the integral operator equation (1) to recover ϕ inversely, for computational efficiency we can further improve the operator and the operator equation while keeping the theory intact. First we see that the adjoint operator $T_{D}^{*}$ also provides an integral operator equation of interest, $- T_{D}^{*} φ = g_{2},$ (40) where $g_{2} (x) = g (x) - g (b)$ and $(T_{D}^{*} φ) (x) := \int_{x}^{b} φ (ξ) d ξ$ for all $x \in [a, b] \subset \mathbb{R}$. Now we can combine the operator equations (1) and (40) to get the following integral operator equation: $T φ = g_{3},$ (41) where $g_{3} (x) = 2 g (x) - (g (a) + g (b))$ and, for all $x \in [a, b] \subset \mathbb{R}$, $\begin{aligned} (T φ) (x) & := (T_{D} φ - T_{D}^{*} φ) (x) \\ & = \int_{a}^{x} φ (ξ) d ξ - \int_{x}^{b} φ (ξ) d ξ . \end{aligned}$ (42) The advantage of the operator equation (41) over (1) or (40) is that it recovers ϕ symmetrically at the end points. For example, if we consider the operator equation (40) for the recovery of ϕ, then during the descent process the $L^{2}$-gradient at ψ will be $\nabla_{L^{2}}^{ψ} G = - T_{D} (- 2 (u - u_{ψ}))$ (similar to (19)), which implies $\nabla_{L^{2}}^{ψ} G (a) = 0$ for all $ψ \in L^{2} [a, b]$. Hence the boundary data at a for all the evolving $ψ_{m}$'s are going to be invariant during the descent.⁸ As for (1), since $\nabla_{L^{2}}^{ψ} G (b) = 0$, the boundary data at b for all the evolving $ψ_{m}$'s are going to be invariant during the descent. 
Even though one can opt for the Sobolev gradient of G at ψ, $\nabla_{H^{1}}^{ψ} G$, to counter that problem, due to the intrinsic decay of the base function $\nabla_{L^{2}}^{ψ} G$ for all $ψ_{m}$'s at a or b, the recovery of ϕ near that respective boundary will not be as good (or symmetric) as at the other end. On the other hand, if we use the operator equation (41) for the descent recovery of ϕ, then the $L^{2}$-gradient at ψ is going to be $\nabla_{L^{2}}^{ψ} G = T^{*} (- 2 (u - u_{ψ})),$ (43) where $T^{*} = T_{D}^{*} - T_{D} \quad \text{or} \quad T^{*} = - T .$ (44) Thus $\nabla_{L^{2}}^{ψ} G (a) \neq 0$ and $\nabla_{L^{2}}^{ψ} G (b) \neq 0$, and hence the recovery of ϕ at both end points will be performed symmetrically. One can then derive other gradients, like the Neuberger or conjugate gradient, based on this $L^{2}$-gradient for the recovery of ϕ, depending on the scenario, that is, based on the prior knowledge of the boundary information (as explained in Section 4).
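
A discrete version of the combined operator T in (42) and of the data $g_{3}$ in (41) can be sketched as follows. The trapezoidal quadrature, the helper names `cumtrapz` and `apply_T`, and the test pair $g = \sin$, $φ = \cos$ on $[0, 1]$ are assumptions made for illustration only.

```python
import math

def cumtrapz(values, h):
    """Cumulative trapezoidal integral from the left end point."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append(out[-1] + 0.5 * h * (values[i - 1] + values[i]))
    return out

def apply_T(phi_vals, h):
    """(T phi)(x) = int_a^x phi - int_x^b phi on a uniform grid."""
    left = cumtrapz(phi_vals, h)          # int_a^x phi
    total = left[-1]                      # int_a^b phi
    return [2.0 * part - total for part in left]

# Assumed test pair: g = sin, so the exact derivative is phi = cos
a, b, n = 0.0, 1.0, 100
h = (b - a) / n
xs = [a + i * h for i in range(n + 1)]
g = [math.sin(x) for x in xs]
phi = [math.cos(x) for x in xs]

g3 = [2.0 * gi - (g[0] + g[-1]) for gi in g]   # right-hand side of (41)
T_phi = apply_T(phi, h)
max_err = max(abs(t - r) for t, r in zip(T_phi, g3))
```

On this grid $T φ$ agrees with $g_{3}$ up to quadrature error, and the values at the two end points are equal in magnitude with opposite sign, which reflects the symmetric treatment of a and b.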

Corresponding to the operator equation (41), the smooth or integrated data $u \in H_{0}^{3} [a, b]$ will be $\begin{aligned} u (x) & = 2 \int_{x}^{b} \int_{a}^{η} g (ξ) d ξ d η - (g (a) + g (b)) [\frac{(b - a)^{2}}{2} - \frac{(x - a)^{2}}{2}] \\ & \quad - \frac{b - x}{b - a} [2 \int_{a}^{b} \int_{a}^{η} g (ξ) d ξ d η - (g (a) + g (b)) \frac{(b - a)^{2}}{2}] . \end{aligned}$ (45) Thus our problem set-up is now as follows: for a given $g \in H^{1} [a, b]$ (and hence a given $u \in H_{0}^{3} [a, b]$) we want to find a $φ \in L^{2} [a, b]$ such that $T φ = - u^{″} .$ (46) Our inverse approach to obtain ϕ will be to minimize the functional G which is defined, for any $ψ \in L^{2} [a, b]$, by $G (ψ) = \| u^{'} - u_{ψ}^{'} \|_{L^{2}}^{2},$ (47) where u is as defined in (45) and $u_{ψ}$ is the solution of the boundary value problem $\begin{aligned} T ψ & = - u_{ψ}^{″}, \\ u_{ψ} (a) & = u (a) \text{ and } u_{ψ} (b) = u (b) . \end{aligned}$ (48) Since our new problem set-up is almost identical to the old one, the previous theorems and results developed for $T_{D}$ can be similarly extended to T. Next we provide a pseudo-code, Algorithm 1, for the descent algorithm described earlier.
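
The construction of the integrated data u in (45) from samples of g can be sketched as below. This is a minimal illustration, assuming a uniform grid and trapezoidal quadrature; the function names `cumtrapz` and `smoothed_data` are hypothetical, and the test data $g = \cos$ on $[- 0.5, 0.5]$ is noise-free.

```python
import math

def cumtrapz(values, h):
    """Cumulative trapezoidal integral from the left end point."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append(out[-1] + 0.5 * h * (values[i - 1] + values[i]))
    return out

def smoothed_data(g, a, b):
    """Evaluate u of Equation (45) on the uniform grid carrying g."""
    n = len(g) - 1
    h = (b - a) / n
    c = g[0] + g[-1]                       # g(a) + g(b)
    F = cumtrapz(g, h)                     # F(eta) = int_a^eta g
    I = cumtrapz(F, h)                     # I(x) = int_a^x F
    half = 0.5 * (b - a) ** 2
    bracket = 2.0 * I[-1] - c * half
    u = []
    for i in range(n + 1):
        x = a + i * h
        tail = I[-1] - I[i]                # int_x^b F
        u.append(2.0 * tail - c * (half - 0.5 * (x - a) ** 2)
                 - (b - x) / (b - a) * bracket)
    return u

a, b, n = -0.5, 0.5, 100
h = (b - a) / n
g = [math.cos(a + i * h) for i in range(n + 1)]
u = smoothed_data(g, a, b)

# Check -u'' against g_3 = 2 g - (g(a) + g(b)) with central differences
c = g[0] + g[-1]
max_err = max(abs(-(u[i + 1] - 2.0 * u[i] + u[i - 1]) / h ** 2 - (2.0 * g[i] - c))
              for i in range(1, n))
```

The computed u vanishes at both end points, and its negative second difference reproduces $g_{3}$ up to discretization error, in line with (46).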

Remark 6.1

If $φ (a)$ and $φ (b)$ are known a priori, then $ψ_{0}$ can be defined as the straight line joining them and we use the Dirichlet Neuberger gradient for the descent. If no prior information about $φ (a)$ and $φ (b)$ is available, then we simply choose $ψ_{0} \equiv 0$ and use the Sobolev gradient or conjugate gradient for the descent. Numerical experiments show that using any available information on $φ (a)$ or $φ (b)$, together with an appropriate gradient, significantly improves the convergence rate of the descent process and the efficiency of the recovery. In the examples presented here we have not assumed any prior knowledge of $φ (a)$ or $φ (b)$, to keep the problem settings as pragmatic as possible.

Remark 6.2

To solve the boundary value problem (29) while calculating the Neuberger gradients, we used the invariant embedding technique for better numerical results (see [23]). This is very important, as this technique enables us to convert the boundary value problem into a system of initial and final value problems, so that one can use initial value solvers, which are more robust than boundary value solvers that normally rely on shooting methods.

Remark 6.3

For all the numerical tests presented in Section 7, we assumed prior knowledge of the error norm (δ) and used it as a stopping criterion for the descent process, as explained in Section 5.3. We also compare the results obtained without any noise information (i.e. using a heuristic stopping strategy, see Section 8) with the results obtained using the noise information, see Tables and in Section 8.

7. Results

A MATLAB program was written to test the numerical viability of the method. We take an evenly spaced grid with $h = 10^{- 2}$ in all the examples, unless otherwise specified. In all the examples, we used the discrepancy principle (see (39)) to terminate the iterations when the discrepancy error goes below δ (which is assumed to be known). In Section 8, we discuss the stopping criterion when δ is unknown.

Example 7.1

Comparison with standard regularization methods

In this example, we compare the inverse recovery using our technique with some of the standard regularization methods. We perturb the smooth function $g (x) = \cos (x)$ on $[- 0.5, 0.5]$ by random noise to get $\tilde{g} (x) = g (x) + ϵ (x)$, where ε is a normal random variable with mean 0 and standard deviation σ; here we take the regularity of the data to be $H^{1}$ (i.e. $g \in H^{(k + 1)} [a, b]$ for k = 0). As in [1], we generated two data sets, one with $h = 10^{- 2}$ (the dense set) and the other with $h = 10^{- 1}$ (the sparse set). We tested with $σ = 0.01$ on both data sets and $σ = 0.1$ only on the dense set. We compare the relative errors, $\| φ - \tilde{φ} \|_{L^{2}} / \| φ \|_{L^{2}}$, obtained with our method (using the Neumann $\nabla_{H^{1}} G$-gradient) with the relative errors provided in [1], which are listed in Table . Here we can see that our method of numerical differentiation outperforms most of the other methods, in both the dense and sparse situations. Although the Tikhonov method performs better for k = 2, that is, when g is assumed to be in $H^{3} [a, b]$, for the same smoothness assumption, $g \in H^{1} [a, b]$ or k = 0, it performs poorly. In fact, one can prove that the ill-posed problem of numerical differentiation becomes well-posed when $g \in H^{(k + 1)} ([a, b])$ for k = 1, 2, see [14], which explains the small relative errors in Tikhonov regularization for k = 1, 2. As stated above, the results in Table are obtained using the Sobolev gradient, whereas Tables and show a comparison of the recovery errors using different gradients and different stopping criteria, as well as the descent rates associated with them. To compare with the total variation method, which is very effective in recovering discontinuities in the solution, we perform a test on a sharp-edged function similar to the one presented in [1]; the results are shown in Example 7.4. 
Hence we can consider this method a universal approach for all of these scenarios.

In the next example, we show that one does not have to assume normality of the error term. The assumption that the noise consists of iid normal random variables, which is critical for certain other methods, is not needed here; the noise can be a mixture of any random variables.

Example 7.2

Noise as a mixture of random variables

In the previous example, we saw that our method holds up in the presence of a large error norm. Here we show that this technique is very effective even in the presence of extreme noise. We perturb the function $g (x) = \sin (x / 3)$ on $[0, 3 π]$ to $\tilde{g} (x) = g (x) + ϵ (x)$,⁹ where ε is the error function obtained from a mixture of uniform($- δ$, δ) and normal(0, δ) random variables, with $δ = 0.5$. Figure shows the noisy $\tilde{g}$ and the exact g, and Figure (a) shows the computed derivative $\tilde{φ}$ vs. $φ (x) = \cos (x / 3) / 3$. The relative error for the recovery of $\tilde{φ}$ is 0.0071.

Figure 1. Noisy $\tilde{g}$ .

Figure 2. Inverse recovery of the derivative $\tilde{φ}$ and $T \tilde{φ}$ : (a) derivative $\tilde{φ}$ vs. ϕ and (b) recovery of $\tilde{φ}$ .

In the next example, we push the limits further by dropping the zero-mean assumption on the error term, an assumption that is again crucial for many other methods.

Example 7.3

Error with non-zero mean

In this example, we show that this method remains effective even when the noise has non-zero mean. We consider the setting of the previous example: $g (x) = \sin (x / 3)$ on $[0, 3 π]$ is perturbed to $\tilde{g} (x) = g (x) + ϵ (x)$, but here the error function ε is a mixture of uniform(-0.8δ, 1.2δ) and normal(0.1, δ), for $δ = 0.1$. Figure (b) shows the recovery of the derivative ϕ versus the true derivative. The relative error of the recovery for ϕ is around 0.0719.

In the following two examples, we provide the results of numerical differentiation performed on piecewise differentiable functions and compare them with the results obtained in [1,2].

Example 7.4

Discontinuous source function

Here we selected a function randomly from the many functions tested in [2]. The selected function is defined as $y_{2} (t) = \begin{cases} 1 - t, & t \in [0, 0.5], \\ t, & t \in (0.5, 1] . \end{cases}$ The function $y_{2}$ is differentiable except at the point t = 0.5, where it has a sharp edge. The function is then perturbed by a uniform($- δ$, δ) random variable to get the noisy data $y_{2_{δ}}$, shown in Figure a, where we increased the error norm in our testing from $δ = 0.001$ in [2] to $δ = 0.01$. Figure shows the recoveries using the method described here; comparing them with the result obtained in [2], one can see that our recovery outperforms the recovery in [2], especially at the boundary points. We also compare it with a similar result obtained in [1]¹⁰ using a total variation regularization method, shown in Figure (b).

Figure 3. Recoveries using our method: (a) numerical derivative $\tilde{φ}$ and (b) smooth approximation $T \tilde{φ}$ .

Figure 4. (a) Noisy data for Example 7.4 and (b) total variation regularization from [Citation1].

Figure 5. Integration smooths out the noise present in the data: (a) noisy $\tilde{g}$ vs exact g and (b) noisy $\tilde{u}$ vs exact u.

8. Stopping criterion II

As explained in Section 5.3, in the presence of noise, one has to terminate the descent process at an appropriate iteration to achieve regularization. The discrepancy principle [24–26] provides a stopping condition provided the error norm (δ) is known. However, in many practical situations it is very hard to obtain an estimate of the error norm. In such cases, heuristic approaches are taken to determine stopping criteria, such as the L-curve method [27,28]. In this section, we present a new heuristic approach to terminate the iterations when the error norm (δ) is unknown. First we notice that the minimizing functional G used here, as defined in (12), does not contain the noisy $\tilde{g}$ directly, but rather an integrated (smoothed) version of it ($\tilde{u}$), in contrast to a minimizing functional (such as $G_{2}$, defined in (3)) used in a standard regularization method. Hence, in addition to keeping the noisy data from affecting the recovery, the integration process also helps in constructing a stopping strategy, which is explained below. Figure shows the difference in g and $\tilde{g}$ vs. u and $\tilde{u}$, from Example 7.2. We can see, from Figure (b), that the integration smooths out the noise present in $\tilde{g}$ to get $\| \tilde{u} - u \|_{L^{2}} / \| u \|_{L^{2}} \approx 0.78 \%$, whereas the noise level in $\tilde{g}$ is $\| \tilde{g} - g \|_{L^{2}} / \| g \|_{L^{2}} \approx 55.44 \%$. Consequently, the sequence $\{{\tilde{g}}_{m} := T ψ_{m}\}_{m \geq 1}$, constructed during the descent process, converges weakly in $L^{2} [a, b]$ to $\tilde{g}$, rather than strongly, that is, for any $ϕ \in L^{2} [a, b]$ the sequence $\{({\tilde{g}}_{m} - \tilde{g}, ϕ)_{L^{2}}\}$ converges to zero. 
In other words, the integration mitigates the effects of the high oscillations originating from the random variable and also of any outliers (as their support is close to zero measure). Also, since the forward operator T (as defined in (42)) is smooth, the sequence $\{{\tilde{g}}_{m}\}$ first approximates the exact g (as it is also smooth, $g \in H^{1} [a, b]$), with the corresponding sequence $\{u_{ψ_{m}}\}$ approximating $\tilde{u} \approx u$, and then attempts to fit the noisy $\tilde{g}$, which leads to the phenomenon known as overfitting. However, when ${\tilde{g}}_{m}$ tries to overfit the data (i.e. fit $\tilde{g}$), the values $\| \tilde{u} - u_{ψ_{m}} \|_{L^{2}}$ increase, since the overfitting occurs in a smooth fashion (as T is a smooth operator) and, as a result, increases the integral values. This effect can be seen in Figure (a), the $\| u_{ψ_{m}} - u_{δ} \|_{L^{2}}$ descent for Example 7.1 (when $σ = 0.1$), and in Figure (b), the $\| T ψ_{m} - g_{δ} \|_{L^{2}}$ descent for Example 7.2. One can capture the recoveries at these fluctuating¹¹ points (of either $\| T ψ_{m} - g_{δ} \|_{L^{2}}$, $\| u_{ψ_{m}}^{'} - u_{δ}^{'} \|_{L^{2}}$ or $\| u_{ψ_{m}} - u_{δ} \|_{L^{2}}$) and choose the recovery corresponding to the earliest iteration for which $T ψ_{m}$ fits through $g_{δ}$. Choosing the early fluctuating iteration is especially important when dealing with data with a large error level, such as in Example 7.1 ($σ = 0.1$) and Example 7.2. For example, from Figure (b), if one captures the recovery at iteration 4 then the relative error in the recovery is only 8% (see Figure a). However, even if an appropriate early iteration is not selected, the recovery errors saturate after a certain number of iterations, rather than blowing up. 
This is significant when dealing with data having a small to moderate error level, such as in Example 7.1 ($σ = 0.01$), where one can notice (in Figure b) that the relative errors of the recoveries saturate after recovering the optimal solution, since $\| u - u_{δ} \|_{L^{2}} \approx 0$ for small δ. Tables and show the relative errors of the recoveries obtained using this heuristic stopping criterion.
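
One possible way to locate the fluctuating points described above is sketched below. It reflects only one reading of the heuristic: a fluctuating point is taken to be an iteration at which the monitored norm stops decreasing and starts to increase. The function name `fluctuation_points` and the sample sequence are hypothetical, and the final choice among the candidates (the earliest iteration at which $T ψ_{m}$ fits through $g_{δ}$) is still left to inspection.

```python
def fluctuation_points(norms):
    """Indices m where norms[m] is a local minimum of the monitored sequence.

    norms[m] may be any of the monitored quantities, for example
    ||T psi_m - g_delta|| or ||u_psi_m - u_delta|| recorded during the descent.
    """
    return [m for m in range(1, len(norms) - 1)
            if norms[m] <= norms[m - 1] and norms[m] < norms[m + 1]]

# Made-up sequence (assumption): decrease, brief increase, then decrease again
norms = [1.0, 0.6, 0.4, 0.45, 0.5, 0.3, 0.2, 0.25]
candidates = fluctuation_points(norms)
earliest = candidates[0] if candidates else None
```

For the made-up sequence the candidates are iterations 2 and 6, and the earliest one would be inspected first.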

Figure 6. Fluctuations during the descent process. (a) $\| u_{ψ_{m}} - u_{δ} \|_{L^{2}}$, Example 7.1 ($σ = 0.1$) and (b) $\| \tilde{g} - g_{m} \|_{L^{2}}$, Example 7.2.

Figure 7. Relative errors in the recoveries during the descent process: (a) Example 7.2 and (b) Example 7.1 ( $σ = 0.01$ ).

Remark 8.1

Note that this phenomenon does not occur in Landweber iterations, since one minimizes a functional containing the noisy data $g_{δ}$ directly, i.e. $G (ψ) = \| T ψ - g_{δ} \|_{L^{2}}^{2}$. Figure (a) shows the descent of $\| T ψ_{m} - g_{δ} \|_{L^{2}}$ when Landweber iterations are implemented for Example 7.1 ($σ = 0.01$), and Figure (b) shows the corresponding relative errors of the recovered solutions; one can see no fluctuations in Figure (a) but semi-convergence in Figure (b). Therefore, without any prior knowledge of δ it is hard to stop the descent process and avoid the ill-posedness. In contrast, notice the saturation of the relative errors of the recovery (Figure (b)) when we apply our method to the same problem.

Figure 8. Landweber iterations on Example 7.1 ($σ = 0.01$): (a) $\| T ψ_{m} - g_{δ} \|_{L^{2}}$-descent and (b) relative errors descent.

9. Conclusion and future research

This algorithm for numerical differentiation is very effective, even in the presence of extreme noise, as can be seen from the examples presented in Section 7. Furthermore, it serves as a universal method for a range of scenarios, such as when the data set is dense or sparse and when the function g is smooth or not smooth. The key feature of this technique is that we are able to upgrade the working space of the problem from $H^{1} [a, b]$ to $H_{0}^{3} [a, b]$, which is a much smoother space. Additionally, this method does not involve an external regularization parameter; for example, one does not have to determine the optimal parameter choice to balance between the fitting and the smoothing of the inverse recovery. Even the heuristic stopping criterion provides a good recovery, and hence the method remains applicable when the error norm is unknown.

In a follow-up paper, we improve this method to calculate derivatives of functions in higher dimensions as well as higher-order derivatives. Moreover, the method can be extended to encompass any linear inverse problem, thereby generalizing the theory. This will be presented in a forthcoming paper, where we apply the method to recover solutions of Fredholm integral equations, such as deconvolution problems or general Volterra equations.

Acknowledgments

I am very grateful to Prof. Ian Knowles for his support, encouragement and stimulating discussions throughout the preparation of this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 The technicality as to how far g can be weakened so that the solution of equation (1), $T_{D} φ = g_{1}$ , is uniquely and stably recovered (finding the φ inversely) will be discussed later.

2 Again, the domain space for the functionals $G_{1}$ , $G_{2}$ and G is discussed in detail later.

3 It can be further proved that it is also the first Fréchet derivative of G at ψ.

4 Hence $T_{D}^{*}$ is also a linear and bounded operator in $L^{2} [a, b]$ .

5 Again, it can be proved that it is the second Fréchet derivative of G at ψ.

6 Just for simplicity, we assume $g (a) = \tilde{g} (a)$ .

7 In our experiments, we considered $τ = 1$ and the termination condition as $| | T ψ_{m} - g_{δ} | |_{L^{2}} < δ$ .

8 As explained in the $L^{2}$ -gradient version of the descent algorithm for G, in Section 4.

9 To be consistent with footnote 6, we kept $\tilde{g} (a) = g (a)$ and $\tilde{g} (b) = g (b)$ , but this can be avoided.

10 Where the test function is $g (x) = | x - 0.5 |$ on $[0, 1]$ and the data set is 100 uniformly distributed points with $σ = 0.01$ .

11 The fluctuation occurs since the values of $| | \tilde{g} - g_{m} | |_{L^{2}}$ tend to decrease first (when approximating the exact g), then increase (when making a transition from g to $\tilde{g}$ ) and eventually decrease again (when trying to fit the noisy $\tilde{g}$ , i.e. overfitting).

References

Knowles I, Renka RJ. Methods for numerical differentiation of noisy data. Proceedings of the variational and topological methods: theory, applications, numerical simulations, and open problems, Vol. 21. San Marcos (TX): Texas State University; 2014. p. 235–246.
Lu S, Pereverzev SV. Numerical differentiation from a viewpoint of regularization theory. Math Comp. 2006;75(256):1853–1870.
Wei T, Hon YC. Numerical derivatives from one-dimensional scattered noisy data. J Phys: Conf Ser. 2005;12:171–179.
Ramm AG, Smirnova AB. On stable numerical differentiation. Math Comp. 2001;70(235):1131–1153.
Hào DN, Chuong LH, Lesnic D. Heuristic regularization methods for numerical differentiation. Comput Math Appl. 2012;63(4):816–826.
Jauberteau F, Jauberteau JL. Numerical differentiation with noisy signal. Appl Math Comput. 2009;215(6):2283–2297.
Stickel JJ. Data smoothing and numerical differentiation by a regularization method. Comput Chem Eng. 2010;34:467–475.
Zhao Z, Meng Z, He G. A new approach to numerical differentiation. J Comput Appl Math. 2009;232(2):227–239.
Wang Z, Wang H, Qiu S. A new method for numerical differentiation based on direct and inverse problems of partial differential equations. Appl Math Lett. 2015;43:61–67.
Murio DA. The mollification method and the numerical solution of ill-posed problems. New York: A Wiley-Interscience Publication, John Wiley & Sons, Inc.; 1993.
Hào DN. A mollification method for ill-posed problems. Numer Math. 1994;68(4):469–506.
Hào DN, Reinhardt H-J, Seiffarth F. Stable numerical fractional differentiation by mollification. Numer Funct Anal Optim. 1994;15(5–6):635–659.
Hào DN, Reinhardt H-J, Schneider A. Stable approximation of fractional derivatives of rough functions. BIT Numer Math. 1995;35(4):488–503.
Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Vol. 375. Dordrecht: Kluwer Academic Publishers Group; 1996; Mathematics and its applications.
Kaltenbacher B. Some Newton type methods for the regularization of nonlinear ill-posed problems. Inverse Probl. 1997;13:729–753.
Kaltenbacher B, Neubauer A, Scherzer O. Iterative regularization methods for nonlinear ill-posed problems. Vol. 6. Berlin: Walter de Gruyter GmbH & Co. KG; 2008; Radon series on computational and applied mathematics.
Bakushinskii AB. The problems of the convergence of the iteratively regularized Gauss–Newton method. Comput Math Math Phys. 1992;32:1353–1359.
Hanke M. A regularizing Levenberg-Marquardt scheme, with applications to inverse groundwater filtration problems. Inverse Probl. 1997;13:79–95.
Knowles I, Wallace R. A variational method for numerical differentiation. Numer Math. 1995;70(1):91–110.
Neuberger JW. Sobolev gradients in differential equations. New York: Springer-Verlag; 1997. (Lecture notes in mathematics; vol. 1670).
Knowles I. Variational methods for ill-posed problems. In: Neuberger JM, editor. Variational methods: open problems, recent progress, and numerical algorithms (Flagstaff, Arizona, 2002). Providence (RI): American Mathematical Society; 2004. p. 187–199. (Contemporary mathematics; vol. 357).
Morozov VA. Methods for solving incorrectly posed problems. New York: Springer-Verlag; 1984.
Fox L, Mayers DF. Numerical solution of ordinary differential equations. London: Chapman & Hall; 1987.
Gfrerer H. An a posteriori parameter choice for ordinary and iterated Tikhonov regularization of ill-posed problems leading to optimal convergence rates. Math Comp. 1987;49:507–522.
Morozov VA. On the solution of functional equations by the method of regularization. Sov Math Dokl. 1966;7:414–417.
Vainikko GM. The principle of the residual for a class of regularization methods. USSR Comput Math Math Phys. 1982;22:1–19.
Hansen PC. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 1992;34(4):561–580.
Lawson CL, Hanson RJ. Solving least squares problems. Englewood Cliffs (NJ): Prentice-Hall; 1974.
