
A new regularization approach for numerical differentiation

Pages 1747-1772 | Received 15 Oct 2019, Accepted 22 Apr 2020, Published online: 04 Jun 2020

Abstract

It is well known that the problem of numerical differentiation is ill-posed and that regularization methods are required to approximate the solution. The commonly practised regularization methods are (external) parameter-based, like Tikhonov regularization, and have certain inherent difficulties associated with them. In such scenarios, iterative regularization methods serve as an attractive alternative. In this paper, we propose a novel iterative regularization method in which the minimizing functional does not contain the noisy data directly, but rather a smoothed (integrated) version of it. The advantage, in addition to circumventing the direct use of the noisy data, is that the sequence of functions constructed during the descent process tends to avoid overfitting, and hence does not corrupt the recovery significantly. To demonstrate the effectiveness of our method, we compare its numerical results with those obtained from certain standard regularization methods, such as Tikhonov regularization, total variation, etc.


1. Introduction

In many applications, we want to calculate the derivative of a function measured experimentally, that is, to differentiate a function obtained from discrete noisy data. The problem consists of stably calculating the derivative of a smooth function $g$ given its noisy version $\tilde g$ such that $\|\tilde g - g\| \le \delta$, where the norm $\|\cdot\|$ can be either the $L^\infty$- or the $L^2$-norm. This turns out to be an ill-posed problem: for a small $\delta$ with $\|\tilde g - g\| \le \delta$ we can have $\|\tilde g' - g'\|$ arbitrarily large, and $\tilde g$ may even fall outside the set of differentiable functions. Many methods and techniques have been introduced in the literature regarding this topic (see [1–13] and the references therein). They mostly fall into one of three categories: difference methods, interpolation methods and regularization methods. One can use the first two to get satisfactory results provided that the function is given precisely, but they fail miserably in the presence of even small amounts of noise if the step size is not chosen appropriately or in a regularized way. Hence, in such scenarios, regularization methods need to be employed to handle the instability arising from noisy data, Tikhonov's regularization method being very popular in this respect. In a regularization method, the instability is bypassed by reducing the numerical differentiation problem to a family of well-posed problems, depending on a regularization (smoothing) parameter. Once an optimal value for this parameter is found, the corresponding well-posed problem is solved to obtain an estimate for the derivative. Unfortunately, the computation of the optimal value for this parameter is a nontrivial task, and it usually demands some prior knowledge of the errors involved. Though many techniques have been developed to find an appropriate parameter value, such as Morozov's discrepancy principle, the L-curve method and others, the recovered solution is still sometimes either over-fitted or over-smoothed, especially when $g$ has sharp edges or discontinuities, or when the data carry extreme noise. An attractive alternative is iterative regularization methods, like the Landweber iterations. In an iterative regularization method (without any explicit dependence on external parameters), one starts the minimization process and then uses the iteration index as a regularization parameter, i.e. one terminates the minimization process at an appropriate time to achieve regularization. More recently, iterative methods have been investigated in the framework of regularization of nonlinear problems, see [14–18].

In this paper, we propose a new smoothing or regularization technique that does not involve any external parameters, thus avoiding all the difficulties associated with them. Furthermore, since this method does not involve the noisy data directly, it is very robust to large noise levels in the data as well as to errors with non-zero mean. This makes the method computationally feasible when dealing with real-life data, where we cannot expect to have any knowledge of the errors involved, or when there is extreme noise in the data.

Let us briefly outline our method. First, we lay out a brief summary of a typical (external) parameter-dependent regularization method, and then we present our new (external) parameter-independent smoothing technique. For a given differentiable function $g$, the problem of numerical differentiation, finding $\varphi = g'$, can be expressed via a Volterra integral equation:Footnote1

(1) $T_D\varphi = g_1$,

where $g_1(x) = g(x) - g(a)$ and

(2) $(T_D\varphi)(x) := \int_a^x \varphi(\xi)\,d\xi$,

for all $x\in[a,b]\subset\mathbb{R}$. Here one can attempt to solve Equation (1) by approximating $\varphi$ with the minimizer of the functional $G_1(\psi) = \|T_D\psi - g_1\|_{L^2}^2$. However, as stated earlier, the recovery becomes unstable for a noisy $g$. So, to counter the ill-posed nature of this problem, regularization techniques are introduced: one instead minimizes the functionalFootnote2

(3) $G_2(\psi,\beta) = \|T_D\psi - g_1\|_{L^2}^2 + \beta\|D\psi\|$,

where $\beta$ is a regularization parameter and $D$ is a differentiation operator of some order (see [1,14]). This converts the ill-posed problem to a family of conditionally well-posed problems depending on $\beta$. The first term (fitting term) of the functional $G_2$ ensures that the inverse solution, when mapped through the forward operator $T_D$, fits the given data well, while the second term (smoothing term) in (3) controls the smoothness of the inverse solution. Since the minimizer of the second term is not the solution of the inverse problem, unless it is a trivial solution, one needs to balance the smoothing and fitting of the inverse recovery by finding an optimal value $\beta_0$. The exact solution $\varphi$ is then approximated by minimizing the corresponding functional $G_2(\cdot,\beta_0)$.
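For intuition, here is a minimal numerical sketch (our illustration, not code from the paper) of this classical approach: $T_D$ is discretized as a lower-triangular quadrature matrix and $G_2$ is minimized through its normal equations. The left-rectangle rule, the first-difference choice for $D$ and the squared penalty $\beta\|D\psi\|_{L^2}^2$ are assumptions made for the sake of the example.

```python
import numpy as np

def volterra_matrix(n, h):
    """Discrete T_D: (T_D psi)(x_i) ~ integral of psi over [a, x_i],
    using the left-rectangle rule."""
    return h * np.tril(np.ones((n, n)), k=-1)

def tikhonov_derivative(g_noisy, h, beta):
    """Minimize ||T_D psi - g1||^2 + beta ||D psi||^2, D = forward difference."""
    n = g_noisy.size
    A = volterra_matrix(n, h)
    g1 = g_noisy - g_noisy[0]
    D = (np.eye(n, k=1) - np.eye(n))[:-1] / h
    # Normal equations: (A^T A + beta D^T D) psi = A^T g1.
    return np.linalg.solve(A.T @ A + beta * (D.T @ D), A.T @ g1)
```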

Given that an important key in any regularization technique is the conversion of an ill-posed problem to a (conditionally) well-posed problem, we make this our main goal. We start by integrating the data twice, which helps in smoothing out the noise. Thus (1) can be reformulated as: find a $\varphi$ that satisfies

(4) $T_D\varphi = -u_1''$,

where $-u_1'' = g_1$ and $g_1' = g' = \varphi$. Equivalently, $u_1(x) = \int_x^b\int_a^\eta g_1(\xi)\,d\xi\,d\eta$, or

(5) $u_1(x) = \int_x^b\int_a^\eta g(\xi)\,d\xi\,d\eta - g(a)\left[\frac{(b-a)^2}{2} - \frac{(x-a)^2}{2}\right]$.

Now we find the solution of (4) by approximating it with the minimizer of the following functional:

(6) $G(\psi) = \|\nabla u_\psi - \nabla u_1\|_{L^2}^2$,

where $u_\psi$ is the solution, for a given $\psi$, of the boundary value problem

(7) $-\Delta u_\psi = T_D\psi$,
(8) $u_\psi(a) = u_1(a),\quad u_\psi(b) = u_1(b)$.

Note that unlike the functionals $G_1$ and $G_2$, which are defined on the data $g$ directly, the functional $G$ is defined on the transformed data $u_1$. Thus one can see that when the data $g$ has noise in it, working with the smoothed (integrated) data $u_1$ is more effective than working with the noisy data $g$ directly. It will be proved in later sections that this simple change in the working space from $g$ to $u_1$ drastically improves the stability and smoothness of the inverse recovery of $\varphi$, without adding any further smoothing terms.
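Concretely, the transformed data (5) can be computed from samples of $g$ by two cumulative quadratures. A sketch (trapezoidal rule assumed):

```python
import numpy as np

def integrated_data(g, x):
    """u1(x) = int_x^b int_a^eta g(xi) dxi deta - g(a)[(b-a)^2/2 - (x-a)^2/2],
    Equation (5), via cumulative trapezoidal quadrature on a grid x."""
    a, b = x[0], x[-1]
    inner = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) / 2 * np.diff(x))))
    outer = np.concatenate(([0.0], np.cumsum((inner[1:] + inner[:-1]) / 2 * np.diff(x))))
    # int_x^b inner = int_a^b inner - int_a^x inner
    return (outer[-1] - outer) - g[0] * ((b - a) ** 2 / 2 - (x - a) ** 2 / 2)
```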

Remark 1.1

Typically, real-life data have zero-mean additive noise in them, i.e.

(9) $\tilde g(x) = g(x) + \epsilon(x)$,

for $x\in[a,b]$, such that $E_\mu(\epsilon) = 0$ and $E_\mu(\epsilon^2) \le \delta^2$ (or equivalently, $\|\tilde g - g\|_{L^2} \lesssim \delta$), where $\mu$ is the probability density function of the random variable $\epsilon$. Therefore, integrating the data smooths out the noise present in the original data $\tilde g$, and hence the new data $\tilde u_1$ (as defined by (5) for $\tilde g$) for the transformed equation (4) has a significantly reduced noise level; see the comparison in Figure 5.
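A quick experiment (ours, not the paper's) illustrates the damping: double integration shrinks the relative size of zero-mean noise by a large factor.

```python
import numpy as np

rng = np.random.default_rng(1)
x, h = np.linspace(0, 3 * np.pi, 301, retstep=True)
g = np.sin(x / 3)
eps = rng.normal(0.0, 0.5, x.size)              # zero-mean noise, sigma = 0.5

# Crude double integral int_a^x int_a^eta f, enough to compare noise levels.
double_int = lambda f: np.cumsum(np.cumsum(f)) * h ** 2

print(np.linalg.norm(eps) / np.linalg.norm(g))  # relative noise in g: large
print(np.linalg.norm(double_int(eps)) / np.linalg.norm(double_int(g)))  # much smaller
```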

We prove, in Section 3, that the functional $G$ is strictly convex and hence has a unique global minimum, which satisfies (1); that is, $\varphi = \arg\min_\psi G(\psi)$ satisfies $g' = \varphi$. The global minimum is reached via an iterative method using an upgraded steepest-descent or conjugate-gradient method, presented in Section 4, where the use of a (Sobolev) $H^1$-gradient for $G$ instead of the commonly used $L^2$-gradient is discussed and is a crucial step in the optimization process. In Section 5, we shed some light on the stability, convergence and well-posedness of this technique. In Section 6, the Volterra operator $T_D$ is modified for better computational and numerical efficiency while keeping it consistent with the theory developed. In Section 7, we provide some numerical results and compare our method of numerical differentiation with some popular regularization methods, namely Tikhonov regularization, total variation, smoothing splines, the mollification method and least-squares polynomial approximation (the comparison is done with results provided in [1,2,19]). Finally, in Section 8 we present an efficient (heuristic) stopping criterion for the descent process when we do not have any prior knowledge of the error norm.

2. Notations and preliminaries

We adopt the following notations, used throughout the paper. All functions are real-valued and defined on a bounded closed interval $[a,b]\subset\mathbb{R}$. For $1\le p<\infty$, $L^p[a,b] := (L^p, \|\cdot\|_{L^p}, [a,b])$ denotes the usual Banach space of $p$-integrable functions on $[a,b]$, and the space $L^\infty[a,b] := (L^\infty, \|\cdot\|_{L^\infty}, [a,b])$ contains the essentially bounded measurable functions. Likewise, the Sobolev space $H^q[a,b] := (H^q, \|\cdot\|_{H^q}, [a,b])$ contains all functions for which $f, f',\ldots,f^{(q)}\in L^2[a,b]$, with differentiation understood in the weak sense, and $H_0^q[a,b] := \{f\in H^q : f \text{ vanishes at the boundary}\}$. The spaces $L^2[a,b]$ and $H^q[a,b]$ are Hilbert spaces with inner products denoted by $(\cdot,\cdot)_{L^2}$ and $(\cdot,\cdot)_{H^q}$, respectively.

Remark 2.1

Note that although the Volterra operator $T_D$ is well defined on the space of integrable functions $L^1[a,b]$, we will restrict it to $L^2[a,b]$; that is, we shall consider $L^2[a,b]$ to be the search space for the solution of (1), since it is a Hilbert space with the inner product $(\cdot,\cdot)_{L^2}$. Hence the domain of the functional $G$ is $D_G = L^2[a,b]$. It is not hard to see that the Volterra operator $T_D$ is linear and bounded on $L^2[a,b]$.

Remark 2.2

From Remark 2.1, since $\varphi\in L^2[a,b]$ we have $T_D\varphi\in H^1[a,b]$. Thus the problem space to be considered is $H^1[a,b]$; that is, $g\in H^1[a,b]$. Note that since we will not be working with $g$ directly but rather with $u_1$, integrating twice places $u_1\in H^3[a,b]\subset H^1[a,b]$. This is particularly significant in the sense that we are able to upgrade the smoothness of the working data, or information space, from $H^1[a,b]$ to $H^3[a,b]$, and hence improve the stability and efficiency of the numerical computations. This effect can be seen in some of the numerical examples presented in Section 7, where the noise involved in the data is extreme.

Before we proceed to prove the convexity of $G$ and the well-posedness of the method, we manipulate $u_1$ further to push it into an even nicer space. From Equation (5), we have $u_1(b) = 0$ and $u_1(a) = \int_a^b\int_a^\eta g(\xi)\,d\xi\,d\eta - g(a)\frac{(b-a)^2}{2}$. Now, redefining the working data as $u(x) = u_1(x) - \frac{b-x}{b-a}\,u_1(a)$, we have

(10) $u(x) = \int_x^b\int_a^\eta g(\xi)\,d\xi\,d\eta - g(a)\left[\frac{(b-a)^2}{2} - \frac{(x-a)^2}{2}\right] - \frac{b-x}{b-a}\left[\int_a^b\int_a^\eta g(\xi)\,d\xi\,d\eta - g(a)\frac{(b-a)^2}{2}\right]$,

and this gives $u(a) = 0$, $u(b) = 0$ and $-u'' = -u_1'' = g_1$. Thus $u\in H_0^3[a,b]$, and the negative Laplace operator $-\Delta = -\partial^2/\partial x^2$ is a positive operator in $L^2[a,b]$ on the domain $D_{-\Delta} = H_0^2[a,b]\supset H_0^3[a,b]$, since for any $v\in H_0^2[a,b]$

(11) $(-\Delta v, v)_{L^2} = -\int_a^b v''v\,dx = [-v'v]_a^b + \int_a^b |v'|^2\,dx = (\nabla v, \nabla v)_{L^2} \ge 0$,

and equality holds only when $\nabla v = 0$, which, together with the boundary conditions, implies $v\equiv 0$. Here, and until otherwise specified, the notations $\Delta$ and $\nabla$ will mean $\partial^2/\partial x^2$ and $\partial/\partial x$, respectively, throughout the paper. The functional $G(\psi)$ in (6) is now redefined, with the new $u$, as

(12) $G(\psi) = \|\nabla u_\psi - \nabla u\|_{L^2}^2$.

The positivity of the operator $-\Delta$ in $L^2[a,b]$ on the domain $D_{-\Delta}$ also helps us to obtain upper and lower bounds for $G(\psi)$, for any $\psi\in L^2[a,b]$. First we get the trivial upper bound

(13) $G(\psi) = \|\nabla(u-u_\psi)\|_{L^2}^2 \le \sum_{j=0}^{1}\left\|\frac{\partial^j}{\partial x^j}(u-u_\psi)\right\|_{L^2}^2 = \|u-u_\psi\|_{H^1}^2$.

To get a lower bound we note from (8) that $v = u - u_\psi\in H_0^2[a,b]$, thus

(14) $G(\psi) = \|\nabla(u-u_\psi)\|_{L^2}^2 \ge \lambda_1\|u-u_\psi\|_{L^2}^2$,

where $\lambda_1>0$ is the smallest eigenvalue of the positive operator $-\Delta$ on $D_{-\Delta}$. Hence, we get the bounds of $G(\psi)$ in terms of the $H^1[a,b]$ norm of $u-u_\psi$ as

(15) $\left(1+\frac{1}{\lambda_1}\right)^{-1}\|u-u_\psi\|_{H^1}^2 \le G(\psi) = \|\nabla(u-u_\psi)\|_{L^2}^2 \le \|u-u_\psi\|_{H^1}^2$.

Equation (15) indicates the stability of the method and the convergence of $u_{\psi_m}$ to $u$ in $H^1[a,b]$ if $G(\psi_m)\to 0$, as is explained in detail in Section 5.

3. Convexity of the functional G

In this section, we prove the convexity of the functional G, together with some important properties.

Theorem 3.1

  1. An equivalent form of $G$, for any $\psi\in L^2[a,b]$, is (16) $G(\psi) = \|\nabla u - \nabla u_\psi\|_{L^2}^2 = \int_a^b \left(|\nabla u|^2 - |\nabla u_\psi|^2\right) - 2\,(T_D\psi)(u-u_\psi)\,dx$.

  2. For any $\psi_1,\psi_2\in L^2[a,b]$, we have (17) $G(\psi_1) - G(\psi_2) = -\int_a^b 2\,\left(T_D(\psi_1-\psi_2)\right)\left(u - \frac{u_{\psi_1}+u_{\psi_2}}{2}\right)dx$.

  3. The first Gâteaux derivative,Footnote3 at $\psi\in L^2[a,b]$, of $G$ is given by (18) $G'(\psi)[h] = -\int_a^b (T_Dh)\left(2(u-u_\psi)\right)dx$, where $h\in L^2[a,b]$. And the $L^2$-gradient of $G$ at $\psi$ is given by (19) $\nabla_\psi^{L^2}G = -T_D^*\left(2(u-u_\psi)\right)$, where the adjoint $T_D^*$ of $T_D$ is given, for any $f\in L^2[a,b]$, byFootnote4 (20) $(T_D^*f)(x) = \int_x^b f(\xi)\,d\xi$, $x\in[a,b]$.

  4. The second Gâteaux derivative,Footnote5 at any $\psi\in L^2[a,b]$, of $G$ is given by (21) $G''(\psi)[h,k] = 2\left((-\Delta)^{-1}(T_Dh),\,(T_Dk)\right)_{L^2}$, where $h,k\in L^2[a,b]$. Hence for any $\psi\in L^2[a,b]$, $G''(\psi)$ is a positive definite quadratic form.

For the proof of Theorem 3.1, we also need the following ancillary result.

Lemma 3.2

For fixed $\psi,h\in L^2[a,b]$ we have, in $H^1[a,b]$, (22) $\lim_{\epsilon\to 0} u_{\psi+\epsilon h} = u_\psi$.

Proof.

Subtracting the equations $-\Delta u_\psi = T_D\psi$ and $-\Delta u_{\psi+\epsilon h} = T_D(\psi+\epsilon h)$, we have

(23) $-\Delta(u_{\psi+\epsilon h} - u_\psi) = \epsilon\,T_Dh$,

and using $u_{\psi+\epsilon h} - u_\psi\in H_0^2[a,b]$ we get, via integration by parts, $\left(\nabla(u_{\psi+\epsilon h}-u_\psi),\,\nabla(u_{\psi+\epsilon h}-u_\psi)\right)_{L^2} = \epsilon\left(T_Dh,\,u_{\psi+\epsilon h}-u_\psi\right)_{L^2}$. Now from (14) we have $\lambda_1\|u_{\psi+\epsilon h}-u_\psi\|_{L^2}^2 \le \|\nabla(u_{\psi+\epsilon h}-u_\psi)\|_{L^2}^2$, and using the Cauchy–Schwarz inequality we have $\left(1+\lambda_1^{-1}\right)^{-1}\|u_{\psi+\epsilon h}-u_\psi\|_{H^1}^2 \le \|\nabla(u_{\psi+\epsilon h}-u_\psi)\|_{L^2}^2 = \epsilon\left(T_Dh,\,u_{\psi+\epsilon h}-u_\psi\right)_{L^2}$, so that $\left(1+\lambda_1^{-1}\right)^{-1}\|u_{\psi+\epsilon h}-u_\psi\|_{H^1} \le \epsilon\,\|T_Dh\|_{L^2}$, where $\lambda_1>0$. Hence $u_{\psi+\epsilon h}\to u_\psi$ in $H^1[a,b]$ as $\epsilon\to 0$, since the operator $T_D$ is bounded and $\psi,h\in L^2[a,b]$ are fixed, which implies the right-hand side is of $O(\epsilon)$.

3.1. Proof of Theorem 3.1

The proofs of the first two properties, (i) and (ii), are straightforward via integration by parts, using the fact that $u-u_\psi\in H_0^2[a,b]$.

In order to prove (iii) and (iv), we use Lemma 3.2.

(iii)

The Gâteaux derivative of the functional $G$ at $\psi$ in the direction $h\in L^2[a,b]$ is given by

(24) $G'(\psi)[h] = \lim_{\epsilon\to 0}\dfrac{G(\psi+\epsilon h) - G(\psi)}{\epsilon}$.

Now for a fixed $\epsilon>0$ we have, using (23),

$\dfrac{G(\psi+\epsilon h)-G(\psi)}{\epsilon} = \epsilon^{-1}\int_a^b |\nabla(u-u_{\psi+\epsilon h})|^2 - |\nabla(u-u_\psi)|^2\,dx = \epsilon^{-1}\int_a^b \nabla(u_\psi - u_{\psi+\epsilon h})\cdot\nabla\left(2u-(u_{\psi+\epsilon h}+u_\psi)\right)dx = \epsilon^{-1}\int_a^b -\Delta(u_\psi - u_{\psi+\epsilon h})\left(2u-(u_{\psi+\epsilon h}+u_\psi)\right)dx = \epsilon^{-1}\int_a^b -\epsilon\,(T_Dh)\left(2u-(u_{\psi+\epsilon h}+u_\psi)\right)dx = -\left(T_Dh,\,2u-(u_{\psi+\epsilon h}+u_\psi)\right)_{L^2}$.

Using Lemma 3.2, one obtains the Gâteaux derivative of $G$ at $\psi\in L^2[a,b]$ in the direction $h\in L^2[a,b]$ as $G'(\psi)[h] = -\left(T_Dh,\,2(u-u_\psi)\right)_{L^2}$. Note that $-\left(T_Dh,\,2(u-u_\psi)\right)_{L^2} = -\left(h,\,T_D^*(2(u-u_\psi))\right)_{L^2}$ for all $h\in L^2[a,b]$, where $T_D^*$ is the adjoint of the operator $T_D$. Hence, by the Riesz representation theorem, the $L^2$-gradient of the functional $G$ at $\psi$ is given by $\nabla_\psi^{L^2}G = -T_D^*\left(2(u-u_\psi)\right)$.

(iv)

Finally, the second Gâteaux derivative of the functional $G$ at $\psi$ is given by

(25) $G''(\psi)[h,k] = \lim_{\epsilon\to 0}\dfrac{G'(\psi+\epsilon h)[k] - G'(\psi)[k]}{\epsilon}$.

Again for a fixed $\epsilon>0$ we have, using (23),

$\dfrac{G'(\psi+\epsilon h)[k]-G'(\psi)[k]}{\epsilon} = \epsilon^{-1}\int_a^b -(T_Dk)\left(2(u-u_{\psi+\epsilon h})\right) + (T_Dk)\left(2(u-u_\psi)\right)dx = \epsilon^{-1}\int_a^b 2\,(T_Dk)\left(u_{\psi+\epsilon h}-u_\psi\right)dx = \epsilon^{-1}\int_a^b 2\,(T_Dk)\left(\epsilon\,(-\Delta)^{-1}(T_Dh)\right)dx = 2\int_a^b (T_Dk)\left((-\Delta)^{-1}(T_Dh)\right)dx = 2\left((-\Delta)^{-1}(T_Dh),\,T_Dk\right)_{L^2}$.

Hence from (25), letting $\epsilon\to 0$, we get $G''(\psi)[h,k] = 2\left((-\Delta)^{-1}(T_Dh),\,T_Dk\right)_{L^2}$. Here we can see the strict convexity of the functional $G$: for any $h\in L^2[a,b]$ we have $G''(\psi)[h,h] = 2\left((-\Delta)^{-1}(T_Dh),\,T_Dh\right)_{L^2} = 2\,(y,\,-\Delta y)_{L^2}$, where $-\Delta y = T_Dh$ and $y\in H_0^2[a,b]$ (from (23)). As $-\Delta$ is a positive operator on $H_0^2[a,b]$, $y$ is the trivial solution if and only if $T_Dh = 0$. But $\int_a^x h(\xi)\,d\xi = 0$ for all $x\in[a,b]$ if and only if $h\equiv 0$. Thus $G''(\psi)$ is a positive definite form for any $\psi\in L^2[a,b]$.

In this section, we proved that the functional $G$ is strictly convex and hence has a unique minimizer, which is attained at the solution $\varphi$ of the inverse problem (1). We next discuss a descent algorithm that uses the $L^2$-gradient to derive other gradients that provide descent directions with better and faster descent rates.

4. The descent algorithm

Here we discuss the problem of minimizing the functional $G$ via a descent method. Theorem 3.1 suggests that the minimization of the functional $G$ should be computationally effective, in that $\varphi$ is not only the unique global minimum of $G$ but also the unique zero of the gradient; that is, $\nabla_\psi^{L^2}G\ne 0$ for $\psi\ne\varphi$. Now for a given $\psi\in L^2[a,b]$, let $h\in L^2[a,b]$ denote an update direction for $\psi$. Then Taylor's expansion gives $G(\psi-\alpha h) = G(\psi) - \alpha\,G'(\psi)[h] + O(\alpha^2)$, or, for sufficiently small $\alpha>0$, we have

(26) $G(\psi-\alpha h) - G(\psi) \approx -\alpha\,G'(\psi)[h]$.

So if we choose the direction $h$ in such a way that $G'(\psi)[h] > 0$, then we can decrease $G$ along this direction. Thus we can set up a recovery algorithm for $\varphi$, forming a sequence of values $G(\psi_{\mathrm{initial}}-\alpha_1 h_1) > \cdots > G(\psi_{m-1}-\alpha_m h_m) > G(\psi_m-\alpha_{m+1}h_{m+1}) > \cdots \ge 0$. We list a number of different gradient directions that can make $G'(\psi)[h] > 0$.

  1. The $L^2$-Gradient:

    First, notice from Theorem 3.1 that at a given $\psi\in L^2[a,b]$,

(27) $G'(\psi)[h] = \left(h,\,\nabla_\psi^{L^2}G\right)_{L^2}$,

so if we choose the direction $h = \nabla_\psi^{L^2}G$ at $\psi$, then $G'(\psi)[h] > 0$. However, there are numerical issues associated with the $L^2$-gradient of $G$ during the descent process, stemming from the fact that it is always zero at $b$. Consequently, the boundary data at $b$ for the evolving functions $\psi_m$ are invariant during the descent, and there is no control on the evolving boundary data at $b$. This can result in severe decay near the boundary point $b$ if $\psi_{\mathrm{initial}}(b)\ne\varphi(b)$, as the end point of all such $\psi_m$ is glued to $\psi_{\mathrm{initial}}(b)$, even though $\psi_m\to\varphi$ in $L^2[a,b]$, as is proved in Section 5.

  2. The $H^1$-Gradient:

    One can circumvent this problem by opting for the Sobolev gradient $\nabla_\psi^{H^1}G$ instead (see [20]), which is also known as the Neuberger gradient. It is defined as follows: for any $h\in H^1[a,b]$,

(28) $G'(\psi)[h] = \left(\nabla_\psi^{H^1}G,\,h\right)_{H^1} = (g,h)_{L^2} + (\nabla g,\nabla h)_{L^2} = (g,h)_{L^2} - (\Delta g,h)_{L^2} + [\nabla g\,h]_a^b = (g-\Delta g,\,h)_{L^2} + [\nabla g\,h]_a^b$,

where $g = \nabla_\psi^{H^1}G$. Comparing with (27), one can obtain the Neuberger gradient $g$ at $\psi$ by solving the boundary value problem

(29) $g - \Delta g = \nabla_\psi^{L^2}G,\qquad [\nabla g\,h]_a^b = 0$.

Setting $h = g$, the boundary condition becomes

(30) $[\nabla g\,g]_a^b = \nabla g(b)\,g(b) - \nabla g(a)\,g(a) = 0$.

This provides us a gradient, $\nabla_\psi^{H^1}G$, with considerably more flexibility at the boundary points $a$ and $b$. In particular, consider the following cases:

    1. Dirichlet Neuberger gradient: $g(a) = 0$ and $g(b) = 0$.

    2. Neumann Neuberger gradient: $\nabla g(a) = 0$ and $\nabla g(b) = 0$.

    3. Robin or mixed Neuberger gradient: $g(a) = 0$ and $\nabla g(b) = 0$, or $\nabla g(a) = 0$ and $g(b) = 0$.

    This excellent smoothing technique was originally introduced and used by Neuberger. In addition to the flexibility at the end points, it makes the new gradient a preconditioned (smoothed) version of $\nabla_\psi^{L^2}G$, since $g = (I-\Delta)^{-1}\nabla_\psi^{L^2}G$, and hence gives superior convergence in steepest descent algorithms; a finite-difference sketch of computing it is given after this list. So now choosing the descent direction $h = g$ at $\psi$ makes $G'(\psi)[h] = (g,\,\nabla_\psi^{H^1}G)_{H^1} > 0$ and hence $G(\psi-\alpha h) - G(\psi) < 0$ (from (26)). As stated earlier, the greatest advantage of this gradient is the control of the boundary data during the descent process, since based on prior information about the boundary data we can choose any one of the three aforementioned gradients. For example, if some prior knowledge of $\varphi(a)$ and $\varphi(b)$ is available, then one can define $\varphi_{\mathrm{initial}}$ as the straight line joining them and use the Dirichlet Neuberger gradient for the descent. The boundary data are then preserved in each of the evolving $\psi_m$ during the descent process, which leads to a much more efficient, and faster, descent compared to the plain $L^2$-gradient descent. Even when $\varphi|_{\{a,b\}}$ is unknown, one can use the Neumann Neuberger gradient, which allows free movement at the boundary points rather than gluing them to a fixed value. In the latter scenario, one can even take the average of $\nabla_\psi^{H^1}G$ and $\nabla_\psi^{L^2}G$ to make use of both gradients.

  3. The $L^2$–$H^1$ Conjugate Gradient:

    If one wishes to further boost the descent speed (by, roughly, a factor of 2) and make the best use of both gradients, then the standard Polak–Ribière conjugate gradient scheme (see [21]) can be implemented. The initial search direction at $\psi_0$ is $h_0 = g_0 = \nabla_{\psi_0}^{H^1}G$. At $\psi_m$ one can use an exact or inexact line search routine to minimize $G(\psi)$ in the direction of $h_m$, resulting in $\psi_{m+1}$. Then $g_{m+1} = \nabla_{\psi_{m+1}}^{H^1}G$ and $h_{m+1} = g_{m+1} + \gamma_m h_m$, where

(31) $\gamma_m = \dfrac{(g_{m+1}-g_m,\,g_{m+1})_{H^1}}{(g_m,\,g_m)_{H^1}} = \dfrac{(g_{m+1}-g_m,\,\nabla_{\psi_{m+1}}^{L^2}G)_{L^2}}{(g_m,\,\nabla_{\psi_m}^{L^2}G)_{L^2}}$.
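The finite-difference sketch promised above: a minimal solver for the boundary value problem (29), with the closure at each end point selectable as Dirichlet or Neumann. Central differences with ghost points are our illustrative discretization; the paper itself solves (29) by invariant embedding (see Remark 6.2).

```python
import numpy as np

def neuberger_gradient(grad_L2, h, bc=("neumann", "neumann")):
    """Solve (I - Delta) g = grad_L2, Equation (29), by central differences.
    bc selects the closure at (a, b): 'dirichlet' imposes g = 0, while
    'neumann' imposes g' = 0 via a ghost point."""
    n = grad_L2.size
    A = (1 + 2 / h**2) * np.eye(n) - (np.eye(n, k=1) + np.eye(n, k=-1)) / h**2
    rhs = grad_L2.astype(float)
    for idx, nbr, side in ((0, 1, bc[0]), (n - 1, n - 2, bc[1])):
        if side == "dirichlet":          # pin g to zero at this end point
            A[idx] = 0.0
            A[idx, idx] = 1.0
            rhs[idx] = 0.0
        else:                            # ghost point: g_{-1} = g_1, so g' = 0
            A[idx, nbr] *= 2
    return np.linalg.solve(A, rhs)
```

The Polak–Ribière update (31) then needs only these Sobolev gradients together with the stored $L^2$-gradients.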

Remark 4.1

Though the $L^2$–$H^1$ conjugate gradient boosts the descent rate, it (sometimes) compromises the accuracy of the recovery, especially when the noise present in the data is extreme; see the $\sigma=0.1$ results in Tables 2 and 3. One can also construct an $H^1$–$H^1$ conjugate gradient based only on the Sobolev gradient. This is smoother than the $L^2$–$H^1$ conjugate gradient (as Sobolev gradients are smoother, see Equation (28)) and hence improves the accuracy of the recovered solution (especially for smooth recoveries), but it is a tad slower than the $L^2$–$H^1$ conjugate gradient. Finally, the simple Sobolev gradient ($\nabla^{H^1}G$) provides the most accurate recovery, but at the cost of the descent rate (it is the slowest among the three); see Tables 2 and 3.

4.1. The line search method

We minimize the single-variable function $f_m(\alpha) = G(\psi_{m+1}(\alpha))$, where $\psi_{m+1}(\alpha) = \psi_m - \alpha\nabla_{\psi_m}^{H^1}G$, via a line search: first bracketing the minimum and then using a well-known optimization technique, such as Brent minimization, to approximate it further. Note that the function $f_m(\alpha)$ is strictly decreasing in some neighbourhood of $\alpha = 0$, as $f_m'(0) = -\|g_m\|_{H^1}^2 < 0$.

In order to achieve numerical efficiency, we need to choose the initial step size $\alpha_0$ carefully. For that, we use the quadratic approximation of the function $f_m(\alpha)$:

(32) $f_m(\alpha) \approx G(\psi_m) - \alpha\,G'(\psi_m)[g_m] + \frac{1}{2}\alpha^2\,G''(\psi_m)[g_m,g_m]$,

which gives the minimizing value for $\alpha$ as

(33) $\alpha_0 = \dfrac{G'(\psi_m)[g_m]}{G''(\psi_m)[g_m,g_m]}$.

Since $\alpha_0$ is derived from the quadratic approximation of the functional $G$, it is usually very close to the optimal value, thereby reducing the computational time of the descent algorithm significantly. Now if, for some $k = 0,1,2,\ldots$, we have $f_m((k+1)\alpha_0) > f_m(k\alpha_0)$, then we have a bracket, $[\max\{k-1,0\}\alpha_0,\,(k+1)\alpha_0]$, for the minimum, and one can use single-variable minimization solvers to approximate it.
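In code, the bracketing rule reads as follows (our sketch; `d1` and `d2` stand for the derivatives $G'(\psi_m)[g_m]$ and $G''(\psi_m)[g_m,g_m]$ in (33), computed by whatever surrounding machinery is available):

```python
def bracket_minimum(f, d1, d2, max_steps=50):
    """Bracket the minimizer of the 1-D function f(alpha) = G(psi_m - alpha g_m).
    alpha0 = d1 / d2 is the quadratic-model step of Equation (33)."""
    alpha0 = d1 / d2
    prev = f(0.0)
    for k in range(max_steps):
        cur = f((k + 1) * alpha0)
        if cur > prev:                   # f((k+1)a0) > f(k a0): minimum bracketed
            return max(k - 1, 0) * alpha0, (k + 1) * alpha0
        prev = cur
    return 0.0, max_steps * alpha0       # fallback if f kept decreasing
```

The returned bracket can then be handed to a Brent-type scalar minimizer (e.g. scipy.optimize.minimize_scalar).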

In this section, we presented a descent algorithm, with various gradients, where starting from an initial guess $\psi_0\in L^2[a,b]$ we obtain a sequence of $L^2$-functions $\psi_m$ for which the sequence $\{G(\psi_m)\}$ is strictly decreasing. In the next section, we discuss the convergence of the $\psi_m$'s to $\varphi$ and the stability of the recovery.

5. Convergence, stability and conditional well-posedness

Exact data: We first prove that the sequence of functions constructed during the descent process converges to the exact source function in the absence of any error term; the stability of the process in the presence of noise in the data is proved afterwards.

5.1. Convergence

First we see that for the sequence $\{\psi_m\}$ produced by the steepest descent algorithm described in Section 4 we have $G(\psi_m)\to 0$, since the functional $G$ is non-negative and strictly convex (with global minimizer $\varphi$, $G(\varphi) = 0$) and $G(\psi_{m+1}) < G(\psi_m)$. In this section, we prove that for any sequence of functions $\{\psi_m\}\subset L^2[a,b]$ such that $G(\psi_m)\to 0$, we have $\psi_m\rightharpoonup\varphi$ in $L^2[a,b]$, where $\rightharpoonup$ denotes weak convergence in $L^2[a,b]$.

Theorem 5.1

Suppose that $\{\psi_m\}$ is any sequence of $L^2$-functions such that the sequence $\{G(\psi_m)\}$ tends to zero. Then $\{\psi_m\}$ converges weakly to $\varphi$ in $L^2[a,b]$ and $\{u_{\psi_m}\}$ converges strongly to $u$ in $H^1[a,b]$. Also, the sequence $\{g_m\}$ converges weakly to $g_1$ in $L^2[a,b]$, where $g_m = -\Delta u_{\psi_m}$ and $g_1 = -\Delta u$.

Proof.

The strong convergence $u_{\psi_m}\to u$ in $H^1[a,b]$ is immediate from the bounds on $G(\psi)$ in Equation (15), which give

(34) $\|u-u_{\psi_m}\|_{H^1}^2 \le \left(1+\frac{1}{\lambda_1}\right) G(\psi_m)$.

To see the weak convergence of $\{\psi_m\}$ to $\varphi$ in $L^2[a,b]$, we first prove that the sequence $\{T_D\psi_m\}$ converges weakly to $T_D\varphi$ in $L^2[a,b]$. Since $\{u_{\psi_m}\}$ converges strongly to $u$ in $H^1[a,b]$, the sequences $\{u_{\psi_m}\}$ and $\{\nabla u_{\psi_m}\}$ converge to $u$ and $\nabla u$ in $L^2[a,b]$, respectively. Now we use the fact that $C_0^\infty[a,b]$ is dense in $L^2[a,b]$ (in the $L^2$-norm); i.e. for any $\psi\in L^2[a,b]$ and $\epsilon>0$ there exists a $\tilde\psi\in C_0^\infty[a,b]$ such that $\|\psi-\tilde\psi\|_{L^2}\le\epsilon$. So for any $\tilde\psi\in C_0^\infty[a,b]$ we have, using $\nabla\tilde\psi\in L^2[a,b]$,

(35) $\left(T_D\psi_m - T_D\varphi,\,\tilde\psi\right)_{L^2} = \left(-\Delta(u_{\psi_m}-u),\,\tilde\psi\right)_{L^2} = \left(\nabla(u_{\psi_m}-u),\,\nabla\tilde\psi\right)_{L^2}$,

which tends to zero as $m\to\infty$. Hence, by the density of $C_0^\infty[a,b]$ in $L^2[a,b]$, the sequence $\{T_D\psi_m\}$ converges weakly to $T_D\varphi$ in $L^2[a,b]$.

To prove the weak convergence of $\{\psi_m\}$ to $\varphi$ in $L^2[a,b]$, we can use $\left(T_D(\psi_m-\varphi),\,\psi\right)_{L^2} = \left(\psi_m-\varphi,\,T_D^*\psi\right)_{L^2}$ together with (35). Therefore, our proof will be complete if we can show that the range of $T_D^*$ is dense in $L^2[a,b]$ in the $L^2$-norm. Again we start with any $\tilde\psi\in C_0^\infty[a,b]$; using $\nabla\tilde\psi\in L^2[a,b]$, we have $\left(T_D^*(-\nabla\tilde\psi)\right)(x) = -\int_x^b \tilde\psi'(\eta)\,d\eta = \tilde\psi(x)$, i.e. $C_0^\infty[a,b]\subset\mathrm{Range}(T_D^*)$. Hence $\mathrm{Range}(T_D^*)$ is dense in $L^2[a,b]$ in the $L^2$-norm. For the convergence of $\{g_m\}$ to $g_1$, note that $T_D\psi_m = -\Delta u_{\psi_m} = g_m$ and $T_D\varphi = -\Delta u = g_1$; since $\{T_D\psi_m\}$ converges weakly to $T_D\varphi$ in $L^2[a,b]$, it follows that $\{g_m\}$ converges weakly to $g_1$ in $L^2[a,b]$.

Remark 5.2

It can be further proved that the sequence $\{\psi_m\}$ converges strongly to $\varphi$ in $L^2[a,b]$. To prove this, one first needs to analyse the operator associated with the minimizing functional $G$ as defined in (12); i.e. from the definition of $u_\psi$ in (7), together with the boundary conditions (8), we have (from (10), with $g$ replaced by $T_D\psi$ and $(T_D\psi)(a) = 0$, by the definition of $T_D$)

(36) $\nabla u_\psi(x) = -\int_a^x (T_D\psi)(\xi)\,d\xi + \frac{1}{b-a}\int_a^b\int_a^\eta (T_D\psi)(\xi)\,d\xi\,d\eta$.

From the expression (36), the operator $L(\psi) := \nabla u_\psi$ is both linear and bounded, and hence minimizing the functional $G$ in (12) is equivalent to Landweber iterations corresponding to the operator $L$. Then, from the convergence theories developed for Landweber iterations, the sequence $\{\psi_m\}$ converges to $\varphi$ strongly in $L^2[a,b]$; for details on iterative regularization see [14,16].
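To make the equivalence concrete, here is a schematic Landweber loop for a generic bounded linear map represented by a matrix L (our illustration; the step-size restriction is the standard one for Landweber iterations, not a statement from the paper):

```python
import numpy as np

def landweber(L, target, omega, m_max=1000, tol=1e-8):
    """Landweber iteration psi_{m+1} = psi_m - omega L^T (L psi_m - target).
    Converges for 0 < omega < 2 / ||L||^2 (largest singular value squared)."""
    psi = np.zeros(L.shape[1])
    for _ in range(m_max):
        residual = L @ psi - target
        if np.linalg.norm(residual) < tol:
            break
        psi -= omega * (L.T @ residual)
    return psi
```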

Theorem 5.1 proves that for the given function $g_1\in L^2[a,b]$ we are able to construct a sequence of smooth functions $\{g_m\}\subset L^2[a,b]$ that converges (weakly) to $g_1$ in $L^2[a,b]$. This is critical since, when the data has noise in it, one needs to terminate the descent process at an appropriate instance to attain regularization (see Section 5.3), and the (weak) convergence helps us to construct such a stopping criterion in the absence of noise information ($\delta$); see Section 8.

Noisy data: In this section, we consider data that has noise in it and show that the sequence of functions constructed during the descent process using the noisy data still approximates the exact solution, under some conditions.

5.2. Stability

Here we prove the stability of the process. We do so by considering the problem of numerical differentiation as equivalent to finding the unique minimizer of the positive functional $G$, which makes the problem well-posed. That is, for a given $g\in H^1[a,b]$, and hence a given $u\in H_0^3[a,b]$ and functional $G$, the problem of finding the derivative $\varphi\in L^2[a,b]$ such that $g' = \varphi$ is equivalent to finding the minimizer of the functional $G$; i.e. finding a $\psi\in L^2[a,b]$ such that $G(\psi)\le\epsilon$, for any small $\epsilon>0$, is a conditionally well-posed problem. It is not hard to prove that if two functions $g, \tilde g\in H^1[a,b]$ are such that $\|g-\tilde g\|_{L^2}\le\delta$, where $\delta>0$ is small, then the corresponding $u, \tilde u\in H_0^3[a,b]$ also satisfyFootnote6 $\|u-\tilde u\|_{H^1}\le C\delta$, for some constant $C$.

Theorem 5.3

Suppose the function $\tilde u$ is the perturbed version of the function $u$, with $\|u-\tilde u\|_{H^1}\le\delta$ for some $\delta>0$, and let $\varphi, \tilde\varphi\in L^2[a,b]$ denote the respective recovered functions, so that $T_D\varphi = -\Delta u$ and $T_D\tilde\varphi = -\Delta\tilde u$. Let the functional $G$, without loss of generality, be defined based on $\tilde u$, that is, $G(\tilde\varphi) = 0$. Then we have

(37) $0 \le G(\varphi) \le C\delta^2$,

where $C$ is some constant.

Proof.

Since $u_\varphi = u$ and $\tilde u_{\tilde\varphi} = \tilde u$, the proof follows from the definitions of the corresponding functionals, as $G(\varphi) = \|\nabla u_\varphi - \nabla\tilde u\|_{L^2}^2 = \|\nabla u - \nabla\tilde u\|_{L^2}^2 \le \|u-\tilde u\|_{H^1}^2 \le C\delta^2$.

In the next theorem, we prove that if a sequence of functions $\{\psi_m\}$ converges to $\tilde\varphi$ in $L^2[a,b]$, then it also approximates $\varphi$ in $L^2[a,b]$; that is, if $G_{\tilde u}(\psi_m)$ is small, then $G_u(\psi_m)$ is also small, where $G_u$ and $G_{\tilde u}$ are the functionals formed based on $u$ and $\tilde u$, respectively.

Theorem 5.4

Suppose that for a sequence of functions $\{\psi_m\}\subset L^2[a,b]$ the corresponding sequence $\{G_{\tilde u}(\psi_m)\}\subset\mathbb{R}$ converges to zero, where the functional $G_{\tilde u}$ is formed based on $\tilde u$. Then, for the original $u$ such that $\|u-\tilde u\|_{H^1}\le\delta$ with small $\delta>0$, there exists an $M(\delta)\in\mathbb{N}$ such that for all $m\ge M(\delta)$, $G_u(\psi_m)\le c\delta^2$ for some constant $c$, where $G_u$ is the functional based on $u$.

Proof.

For $u$, $\tilde u$ and any $\psi_m\in L^2[a,b]$, we have $G_u(\psi_m) = \|\nabla u - \nabla u_{\psi_m}\|_{L^2}^2$ and $G_{\tilde u}(\psi_m) = \|\nabla\tilde u - \nabla u_{\psi_m}\|_{L^2}^2$; then, using $(p+q)^2\le 2p^2+2q^2$, $G_u(\psi_m) = \|\nabla u - \nabla u_{\psi_m}\|_{L^2}^2 \le 2\|\nabla\tilde u - \nabla u\|_{L^2}^2 + 2\|\nabla\tilde u - \nabla u_{\psi_m}\|_{L^2}^2$, and hence from Theorem 5.1 the result follows.

5.3. Conditional well-posedness (iterative-regularization)

As explained earlier, in an external-parameter-based regularization method (like Tikhonov-type regularization) one first converts the ill-posed problem to a family of well-posed problems (depending on the parameter value $\beta$) and then, only after finding an appropriate regularization parameter value (say $\beta_0$), proceeds to the recovery process, i.e. completely minimizes the corresponding functional $G_2(\cdot,\beta_0)$ (as defined in (3)). In a classical iterative regularization method (not involving any external parameters like $\beta$), by contrast, one cannot recover a regularized solution by simply minimizing a related functional completely; instead, stopping the recovery process at an appropriate instance provides the regularizing effect on the solution. That is, one starts to minimize some related functional (to recover the solution) but then terminates the minimization process at an appropriate iteration before it has reached the minimum (to restrict the influence of the noise); i.e. here the iteration stopping index serves as a regularization parameter. For details see [14,16].

In this section, we further explain the above phenomenon by showing that if one attempts to recover the true (or original) solution $\varphi$ using noisy data, then the noise will distort the recovery. First, for an exact $g$ (or equivalently, an exact $u$) and the functional constructed from it (i.e. $G_u$), the true solution $\varphi$ satisfies $G_u(\varphi) = 0$. However, for a given noisy $\tilde g$, with $\|\tilde g - g\|_{L^2} = \delta > 0$, and the functional based on it (i.e. $G_{\tilde u}$), we will have $G_{\tilde u}(\varphi) > 0$; see Theorem 5.5. So if we construct a sequence of functions $\psi_m^\delta\in L^2[a,b]$, using the descent algorithm and based on the noisy data $\tilde g$, such that $G_{\tilde u}(\psi_m^\delta)\to 0$, then (from Theorem 5.1) we will have $\psi_m^\delta\rightharpoonup\tilde\varphi$, where $\tilde\varphi$ is the recovered noisy solution satisfying $G_{\tilde u}(\tilde\varphi) = 0$. This implies that initially $\psi_m^\delta$ approaches $\varphi$, and upon further iterations $\psi_m^\delta$ diverges away from $\varphi$ and approaches $\tilde\varphi$. Hence the errors in the recoveries, $\|\psi_m^\delta-\varphi\|_{L^2}$, follow a semi-convergence behaviour: they decrease first and then increase. This is typical of any ill-posed problem and is managed, as stated above, by stopping the descent process at an appropriate iteration $M(\delta)$ such that $G_{\tilde u}(\psi_{M(\delta)}^\delta) > 0$ but close to zero (justified by the stability Theorems 5.3 and 5.4). Following arguments similar to those in (15), we also have a lower bound for $G_{\tilde u}(\varphi)$.

Theorem 5.5

Given two functions $u, \tilde u\in H_0^3[a,b]$, their respective recoveries $\varphi, \tilde\varphi\in L^2[a,b]$, such that $T_D\varphi = -\Delta u$ and $T_D\tilde\varphi = -\Delta\tilde u$, and the functional $G_{\tilde u}$ defined based on $\tilde u$, that is, $G_{\tilde u}(\tilde\varphi) = 0$, we have

(38a) $G_{\tilde u}(\varphi) = \|\nabla u - \nabla\tilde u\|_{L^2}^2 \ge \lambda_1\|u-\tilde u\|_{L^2}^2$

as an $L^2$ lower bound, and for an $H^1$ lower bound we have

(38b) $G_{\tilde u}(\varphi) = \|\nabla u - \nabla\tilde u\|_{L^2}^2 \ge \left(1+\frac{1}{\lambda_1}\right)^{-1}\|u-\tilde u\|_{H^1}^2$,

where $\lambda_1 = \pi^2/(b-a)^2$ is the smallest eigenvalue of $-\Delta$ on $H_0^2[a,b]$.

Therefore, combining Theorems 5.3 and 5.5, we have the following two-sided inequality for $G_{\tilde u}(\varphi)$, for some constants $C_1$ and $C_2$: $0 \le C_1\|u-\tilde u\|_{H^1}^2 \le G_{\tilde u}(\varphi) \le C_2\|u-\tilde u\|_{H^1}^2$. Thus, when $\delta\to 0$ we have $G_{\tilde u}(\varphi)\to 0$, which implies $\tilde\varphi\to\varphi$ in $L^2[a,b]$. Now, though we would like to use the bounds in Theorem 5.5 to terminate the descent process, we do not know the exact $g$ (or equivalently, the exact $u$), and hence cannot use them as a stopping condition. However, if the error norm $\delta = \|g-g^\delta\|_{L^2}$ is known, then one can use Morozov's discrepancy principle, [22], as a stopping criterion for the iteration process; that is, terminate the iteration when

(39) $\|T\psi_m - g^\delta\|_{L^2} \le \tau\delta$

for an appropriate $\tau>1$;Footnote7 for unknown $\delta$ one usually resorts to heuristic approaches to stop the iterations, an example of which is presented in Section 8.

6. Numerical implementation

In this section, we provide an algorithm to compute the derivative numerically. Notice that though one can use the integral operator equation (1) to recover $\varphi$ inversely, for computational efficiency we can further improve the operator and the operator equation while keeping the theory intact. First, observe that the adjoint operator $T_D^*$ also provides an integral operator equation of interest,

(40) $T_D^*\varphi = g_2$,

where $g_2(x) = g(b) - g(x)$ and $(T_D^*\varphi)(x) := \int_x^b \varphi(\xi)\,d\xi$, for all $x\in[a,b]\subset\mathbb{R}$. Now we can combine the two operator equations (1) and (40) to get the following integral operator equation:

(41) $T\varphi = g_3$,

where $g_3(x) = 2g(x) - (g(a)+g(b))$ and, for all $x\in[a,b]\subset\mathbb{R}$,

(42) $(T\varphi)(x) := (T_D\varphi - T_D^*\varphi)(x) = \int_a^x \varphi(\xi)\,d\xi - \int_x^b \varphi(\xi)\,d\xi$.

The advantage of the operator equation (41) over (1) or (40) is that it recovers $\varphi$ symmetrically at the end points. For example, if we consider the operator equation (40) for the recovery of $\varphi$, then during the descent process the $L^2$-gradient at $\psi$ will be $\nabla_\psi^{L^2}G = -T_D(2(u-u_\psi))$ (similar to (19)), which implies $\nabla_\psi^{L^2}G(a) = 0$ for all $\psi\in L^2[a,b]$. Hence the boundary data at $a$ for all the evolving $\psi_m$'s are invariant during the descent.Footnote8 As for (1), since $\nabla_\psi^{L^2}G(b) = 0$, the boundary data at $b$ for all the evolving $\psi_m$'s are invariant during the descent. Even though one can opt for the Sobolev gradient $\nabla_\psi^{H^1}G$ to counter this problem, due to the intrinsic decay of the base function $\nabla_\psi^{L^2}G$ for all $\psi_m$'s at $a$ or $b$, the recovery of $\varphi$ near the respective boundary will not be as good (or as symmetric) as at the other end. On the other hand, if we use the operator equation (41) for the descent recovery of $\varphi$, then the $L^2$-gradient at $\psi$ is

(43) $\nabla_\psi^{L^2}G = -T^*(2(u-u_\psi))$,

where

(44) $T^* = T_D^* - T_D$, or $T^* = -T$.

Thus $\nabla_\psi^{L^2}G(a)\ne 0$ and $\nabla_\psi^{L^2}G(b)\ne 0$ in general, and hence the recovery of $\varphi$ at the two end points is performed symmetrically. Now one can derive other gradients, like the Neuberger or conjugate gradient, based on this $L^2$-gradient for the recovery of $\varphi$, depending on the scenario, that is, based on prior knowledge of the boundary information (as explained in Section 4).

Corresponding to the operator equation (41), the smooth (integrated) data $u\in H_0^3[a,b]$ will be

(45) $u(x) = 2\int_x^b\int_a^\eta g(\xi)\,d\xi\,d\eta - (g(a)+g(b))\left[\frac{(b-a)^2}{2} - \frac{(x-a)^2}{2}\right] - \frac{b-x}{b-a}\left[2\int_a^b\int_a^\eta g(\xi)\,d\xi\,d\eta - (g(a)+g(b))\frac{(b-a)^2}{2}\right]$.

Thus our problem set-up is now as follows: for a given $g\in H^1[a,b]$ (and hence a given $u\in H_0^3[a,b]$), we want to find a $\varphi\in L^2[a,b]$ such that

(46) $T\varphi = -\Delta u$.

Our inverse approach to obtaining $\varphi$ is to minimize the functional $G$, defined for any $\psi\in L^2[a,b]$ by

(47) $G(\psi) = \|\nabla u_\psi - \nabla u\|_{L^2}^2$,

where $u$ is as defined in (45) and $u_\psi$ is the solution of the boundary value problem

(48) $-\Delta u_\psi = T\psi,\qquad u_\psi(a) = u(a)\ \text{and}\ u_\psi(b) = u(b)$.

Since our new problem set-up is almost identical to the old one, the previous theorems and results developed for $T_D$ extend similarly to $T$. Next we provide a pseudo-code, Algorithm 1, for the descent algorithm described earlier.
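Algorithm 1 itself is typeset as a figure in the published article; the following self-contained sketch reflects our reading of the descent loop it describes. The left-rectangle quadrature, the finite-difference Laplacian, the Neumann Neuberger gradient and the backtracking line search are illustrative stand-ins for the paper's exact choices (which use, e.g., invariant embedding and Brent minimization).

```python
import numpy as np

def recover_derivative(g, x, delta, tau=1.0, m_max=2000):
    """Sketch of the descent method of Sections 4-6 (our reading of Algorithm 1,
    which appears as a figure in the published article)."""
    n, h = x.size, x[1] - x[0]
    a, b = x[0], x[-1]

    # Discrete operators: C ~ T_D (running integral), T = T_D - T_D^*, (42).
    C = h * np.tril(np.ones((n, n)), k=-1)
    T = 2 * C - C[-1]
    g3 = 2 * g - (g[0] + g[-1])                    # data of T(phi) = g3, (41)

    # Integrated data u of Equation (45); note that u(a) = u(b) = 0.
    outer = C @ (C @ g)                            # int_a^x int_a^eta g
    I2 = outer[-1] - outer                         # int_x^b int_a^eta g
    quad = (b - a) ** 2 / 2 - (x - a) ** 2 / 2
    u = (2 * I2 - (g[0] + g[-1]) * quad
         - (b - x) / (b - a) * (2 * outer[-1] - (g[0] + g[-1]) * (b - a) ** 2 / 2))

    # Interior Dirichlet Laplacian for the BVP (48), and the Neumann
    # Helmholtz matrix (I - Laplacian) for the Neuberger gradient (29).
    lap = (2 * np.eye(n - 2) - np.eye(n - 2, k=1) - np.eye(n - 2, k=-1)) / h ** 2
    helm = (1 + 2 / h ** 2) * np.eye(n) - (np.eye(n, k=1) + np.eye(n, k=-1)) / h ** 2
    helm[0, 1] *= 2
    helm[-1, -2] *= 2                              # ghost points: g'(a) = g'(b) = 0

    def u_psi(psi):                                # solve -u'' = T psi, (48)
        w = np.zeros(n)
        w[0], w[-1] = u[0], u[-1]
        rhs = (T @ psi)[1:-1]
        rhs[0] += w[0] / h ** 2
        rhs[-1] += w[-1] / h ** 2
        w[1:-1] = np.linalg.solve(lap, rhs)
        return w

    def G(psi):                                    # (47): ||grad(u - u_psi)||^2
        d = u - u_psi(psi)
        return np.sum(np.diff(d) ** 2) / h

    psi = np.zeros(n)                              # psi_0 = 0 (Remark 6.1)
    for _ in range(m_max):
        if np.sqrt(h) * np.linalg.norm(T @ psi - g3) <= tau * delta:
            break                                  # discrepancy principle (39)
        grad_L2 = -2 * (T.T @ (u - u_psi(psi)))    # (43), with discrete T^* ~ T^T
        g_H1 = np.linalg.solve(helm, grad_L2)      # Neumann Neuberger gradient
        alpha, G0 = 1.0, G(psi)
        while G(psi - alpha * g_H1) >= G0 and alpha > 1e-12:
            alpha /= 2                             # backtracking line search
        psi = psi - alpha * g_H1
    return psi
```

For instance, with `x = np.linspace(-0.5, 0.5, 101)` and noisy samples of $\cos(x)$, the returned `psi` approximates the derivative $-\sin(x)$ on the grid.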

Remark 6.1

If prior knowledge of $\varphi(a)$ and $\varphi(b)$ is available, then $\psi_0$ can be defined as the straight line joining them, and we use the Dirichlet Neuberger gradient for the descent. If no prior information about $\varphi(a)$ and $\varphi(b)$ is known, then we simply choose $\psi_0\equiv 0$ and use the Sobolev gradient or conjugate gradient for the descent. It has been observed numerically that having any information on $\varphi(a)$ or $\varphi(b)$ and using it, together with the appropriate gradient, significantly improves the convergence rate of the descent process and the efficiency of the recovery. In the examples presented here we have not assumed any prior knowledge of $\varphi(a)$ or $\varphi(b)$, to keep the problem settings as pragmatic as possible.

Remark 6.2

To solve the boundary value problem (29) while calculating the Neuberger gradients, we used the invariant embedding technique for better numerical results (see [23]). This is very important, as this technique enables us to convert the boundary value problem into a system of initial and final value problems, and hence one can use the more robust initial value solvers instead of boundary value solvers, which normally use shooting methods.

Remark 6.3

For all the numerical tests presented in Section 7, we assumed prior knowledge of the error norm $\delta$ and used it in the stopping criterion for the descent process, as explained in Section 5.3. We also compare the results obtained without any noise information (i.e. using the heuristic stopping strategy, see Section 8) with the results obtained using the noise information; see Tables 2 and 3 in Section 8.

7. Results

A MATLAB program was written to test the numerical viability of the method. We take an evenly spaced grid with $h = 10^{-2}$ in all the examples, unless otherwise specified. In all the examples, we used the discrepancy principle (see (39)) to terminate the iterations when the discrepancy error falls below $\delta$ (which is assumed to be known). In Section 8, we discuss the stopping criterion when $\delta$ is unknown.

Example 7.1

Comparison with standard regularization methods

In this example, we compare the inverse recovery using our technique with some of the standard regularization methods. We perturbed the smooth function $g(x) = \cos(x)$ on $[-.5,.5]$ by random noise to get $\tilde g(x) = g(x) + \epsilon(x)$, where $\epsilon$ is a normal random variable with mean 0 and standard deviation $\sigma$; here we consider the regularity of the data to be $H^1$ (i.e. $g\in H^{k+1}[a,b]$ for $k = 0$). As in [1], we generated two data sets, one with $h = 10^{-2}$ (the dense set) and the other with $h = 10^{-1}$ (the sparse set). We tested with $\sigma = 0.01$ on both data sets and with $\sigma = 0.1$ only on the dense set. We compare the relative errors, $\|\varphi-\tilde\varphi\|_{L^2}/\|\varphi\|_{L^2}$, obtained by our method (using the Neumann $\nabla^{H^1}G$ gradient) with the relative errors provided in [1]; they are listed in Table 1. Here we can see that our method of numerical differentiation outperforms most of the other methods, in both the dense and the sparse situation. Though the Tikhonov method performs better for $k = 2$, that is, when $g$ is assumed to be in $H^3[a,b]$, for the same smoothness consideration as ours, $g\in H^1[a,b]$ or $k = 0$, it fails miserably. In fact, one can prove that the ill-posed problem of numerical differentiation turns out to be well-posed when $g\in H^{k+1}[a,b]$ for $k = 1, 2$ (see [14]), which explains the small relative errors of Tikhonov regularization for $k = 1, 2$. As stated above, the results in Table 1 are obtained using the Sobolev gradient, whereas Tables 2 and 3 show a comparison of the recovery errors using different gradients and different stopping criteria, as well as the descent rates associated with them. To compare with the total variation method, which is very effective in recovering discontinuities in the solution, we perform a test on a sharp-edged function similar to the one presented in [1]; the results are shown in Example 7.4. Hence we can consider this method a universal approach for all scenarios.

Table 1. Relative errors in derivative approximation.

Table 2. Relative errors and the descent rates using different gradients.

Table 3. Relative errors and the descent rates using different gradients.

In the next example, we show that one does not have to assume normality of the error term; i.e. the assumption that the noise consists of i.i.d. normal random variables, which is critical for certain other methods, is not needed: the noise can be a mixture of arbitrary random variables.

Example 7.2

Noise as a mixture of random variables

In the previous example, we saw that our method holds up in the presence of a large error norm. Here we show that this technique is very effective even in the presence of extreme noise. We perturb the function $g(x) = \sin(x/3)$ on $[0,3\pi]$ to $\tilde g(x) = g(x) + \epsilon(x)$,Footnote9 where $\epsilon$ is the error function obtained from a mixture of uniform($-\delta$, $\delta$) and normal(0, $\delta$) random variables, with $\delta = 0.5$. Figure 1 shows the noisy $\tilde g$ and the exact $g$, and Figure 2(a) shows the computed derivative $\tilde\varphi$ vs. $\varphi(x) = \cos(x/3)/3$. The relative error of the recovery $\tilde\varphi$ is 0.0071.
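For concreteness, such mixture noise might be generated as follows (the equal mixing weights are our assumption; the paper does not state them):

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5
x = np.linspace(0, 3 * np.pi, 301)
g = np.sin(x / 3)

# Each sample comes from uniform(-delta, delta) or normal(0, delta),
# chosen with probability 1/2 each.
use_uniform = rng.random(x.size) < 0.5
eps = np.where(use_uniform,
               rng.uniform(-delta, delta, x.size),
               rng.normal(0.0, delta, x.size))
g_noisy = g + eps
```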

Figure 1. Noisy $\tilde g$.


Figure 2. Inverse recovery of the derivative $\tilde\varphi$ and $T\tilde\varphi$: (a) derivative $\tilde\varphi$ vs. $\varphi$ and (b) recovery of $\tilde\varphi$.


In the next example, we push the limits further by dropping the assumption of a zero-mean error term, which again is crucial for many other methods.

Example 7.3

Error with non-zero mean

In this example, we show that this method is impressive even when the noise involved has non-zero mean. We consider the settings of the previous example: $g(x) = \sin(x/3)$ on $[0,3\pi]$ is perturbed to $\tilde g(x) = g(x) + \epsilon(x)$, but here the error function $\epsilon$ is a mixture of uniform($-0.8\delta$, $1.2\delta$) and normal(0.1, $\delta$) random variables, for $\delta = 0.1$. Figure 2(b) shows the recovery of the derivative versus the true derivative. The relative error of the recovery of $\varphi$ is around 0.0719.

In the following example, we provide the results of numerical differentiation performed on a piecewise differentiable function and compare them with the results obtained in [1,2].

Example 7.4

Discontinuous source function

Here we selected a function at random from the many functions tested in [2]. The selected function has the following definition: $y_2(t) = 1-t$ for $t\in[0,0.5]$, and $y_2(t) = t$ for $t\in(0.5,1]$. The function $y_2$ is piecewise differentiable, except at the point $t = 0.5$, where it has a sharp edge. The function is then perturbed by a uniform($-\delta$, $\delta$) random variable to get the noisy data $y_2^\delta$, shown in Figure 4(a), where we even increased the error norm in our testing, from $\delta = 0.001$ in [2] to $\delta = 0.01$ in our case. Figure 3 shows the recoveries using the method described here; comparing with the result obtained in [2], one can see that our recovery outperforms the recovery in [2], especially at the boundary points. We also compare it with a similar result obtained in [1]Footnote10 using a total variation regularization method, shown in Figure 4(b).
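The test data for this example can be reproduced along the following lines (the grid density is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.01
t = np.linspace(0.0, 1.0, 101)
y2 = np.where(t <= 0.5, 1.0 - t, t)              # sharp edge at t = 0.5
y2_delta = y2 + rng.uniform(-delta, delta, t.size)
```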

Figure 3. Recoveries using our method: (a) numerical derivative $\tilde\varphi$ and (b) smooth approximation $T\tilde\varphi$.


Figure 4. (a) Noisy data for Example 7.4 and (b) total variation regularization from [1].


Figure 5. Integration smooths out the noise present in the data: (a) noisy $\tilde g$ vs. exact $g$ and (b) noisy $\tilde u$ vs. exact $u$.


8. Stopping criterion II

As explained in Section 5.3, in the presence of noise one has to terminate the descent process at an appropriate iteration to achieve regularization. The discrepancy principle [24–26] provides a stopping condition provided the error norm $\delta$ is known. However, in many practical situations it is very hard to determine an estimate of the error norm. In such cases, heuristic approaches are taken to determine stopping criteria, such as the L-curve method [27,28]. In this section, we present a new heuristic approach to terminate the iterations when the error norm $\delta$ is unknown. First, notice that the minimizing functional $G$ used here, as defined in (12), does not contain the noisy $\tilde g$ directly, but rather an integrated (smoothed) version of it ($\tilde u$), in contrast to the minimizing functional (such as $G_2$, defined in (3)) used in standard regularization methods. Hence, in addition to keeping the noisy data from affecting the recovery, the integration process also helps in constructing a stopping strategy, which is explained below.

Figure 5 shows the difference between $g$ and $\tilde g$ vs. that between $u$ and $\tilde u$, from Example 7.2. We can see, from Figure 5(b), that the integration smooths out the noise present in $\tilde g$, giving $\|\tilde u - u\|_{L^2}/\|u\|_{L^2}\approx 0.78\%$, whereas the noise level in $\tilde g$ is $\|\tilde g - g\|_{L^2}/\|g\|_{L^2}\approx 55.44\%$. Consequently, the sequence $\{\tilde g_m := T\psi_m\}_{m\ge 1}$, constructed during the descent process, converges to $\tilde g$ weakly in $L^2[a,b]$ rather than strongly; that is, for any $\phi\in L^2[a,b]$ the sequence $\{(\tilde g_m - \tilde g, \phi)_{L^2}\}$ converges to zero. In other words, the integration mitigates the effects of the high oscillations originating from the random variable, and also of any outliers (as their support is of close to zero measure).

Also, since the forward operator $T$ (as defined in (42)) is smoothing, the sequence $\tilde g_m$ first approximates the exact $g$ (as it is also smooth, $g\in H^1[a,b]$), with the corresponding sequence $\{u_{\psi_m}\}$ approximating $\tilde u\approx u$, and only then does the sequence $\{\tilde g_m\}$ attempt to fit the noisy $\tilde g$, which leads to the phenomenon known as overfitting. However, when $\tilde g_m$ tries to overfit the data (i.e. fit $\tilde g$), the sequence of values $\|\tilde u - u_{\psi_m}\|_{L^2}$ increases, since the overfitting occurs in a smooth fashion (as $T$ is a smoothing operator) and, as a result, increases the integral values. This effect can be seen in Figure 6(a), the $\|u_{\psi_m}-u^\delta\|_{L^2}$ descent for Example 7.1 (when $\sigma=0.1$), and in Figure 6(b), the $\|T\psi_m-g^\delta\|_{L^2}$ descent for Example 7.2. One can capture the recoveries at these fluctuatingFootnote11 points (of either $\|T\psi_m-g^\delta\|_{L^2}$, $\|u_{\psi_m}-u^\delta\|_{L^2}$ or $\|\nabla u_{\psi_m}-\nabla u^\delta\|_{L^2}$) and choose the recovery corresponding to the earliest iteration at which $T\psi_m$ fits through $g^\delta$. Choosing the early fluctuating iteration is especially important when dealing with data with a large error level, as in Example 7.1 ($\sigma=0.1$) and Example 7.2. For example, from Figure 6(b), if one captures the recovery at iteration 4, then the relative error of the recovery is only 8% (see Figure 7(a)). However, even if an appropriate early iteration is not selected, the recovery errors saturate after a certain number of iterations, rather than blowing up. This is significant when dealing with data having a small to moderate error level, such as in Example 7.1 ($\sigma=0.01$), where one can notice (in Figure 7(b)) that the relative errors of the recoveries reach saturation after the optimal solution has been recovered, since $\|u-u^\delta\|_{L^2}\approx 0$ for small $\delta$. Tables 2 and 3 show the relative errors of the recoveries obtained using this heuristic stopping criterion.
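A minimal reading of this criterion in code (our illustration): record one of the residual sequences named above during the descent and return the first iteration at which it stops decreasing.

```python
import numpy as np

def first_fluctuation(residuals):
    """Index of the first iterate where the monitored residual (e.g. the values
    ||T psi_m - g_delta|| recorded during the descent) stops decreasing; if it
    never fluctuates, the last iterate is returned."""
    r = np.asarray(residuals)
    increases = np.flatnonzero(np.diff(r) > 0)
    return int(increases[0]) if increases.size else len(r) - 1
```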

Figure 6. Fluctuations during the descent process: (a) $\|u_{\psi_m}-u^\delta\|_{L^2}$, Example 7.1 ($\sigma=0.1$) and (b) $\|\tilde g - g_m\|_{L^2}$, Example 7.2.


Figure 7. Relative errors in the recoveries during the descent process: (a) Example 7.2 and (b) Example 7.1 ($\sigma=0.01$).


Remark 8.1

Note that this phenomenon does not occur in Landweber iterations, since there one minimizes the functional containing the noisy data $g^\delta$ directly, i.e. $G(\psi) = \|T\psi - g^\delta\|_{L^2}^2$. Figure 8(a) shows the descent of $\|T\psi_m - g^\delta\|_{L^2}$ when Landweber iterations are implemented for Example 7.1 ($\sigma=0.01$), and Figure 8(b) shows the corresponding descent of the relative errors of the recovered solutions: one can see no fluctuations in Figure 8(a), but semi-convergence in Figure 8(b). Therefore, without any prior knowledge of $\delta$ it is hard to stop the descent process and avoid the ill-posedness. In contrast, notice the saturation of the relative errors of the recovery (Figure 7(b)) when we implement our method on the same problem.

Figure 8. Landweber iterations on Example 7.1 ($\sigma=0.01$): (a) $\|T\psi_m-g^\delta\|_{L^2}$ descent and (b) relative errors descent.


9. Conclusion and future research

This algorithm for numerical differentiation is very effective, even in the presence of extreme noise, as can be seen from the examples presented in Section 7. Furthermore, it serves as a universal method for dealing with all scenarios: when the data set is dense or sparse, and when the function $g$ is smooth or not. The key feature of this technique is that we are able to upgrade the working space of the problem from $H^1[a,b]$ to $H_0^3[a,b]$, which is a much smoother space. Additionally, this method enjoys the many advantages of not involving an external regularization parameter; for example, one does not have to determine an optimal parameter value to balance the fitting and smoothing of the inverse recovery. The heuristic approach to the stopping criterion also provides a much better recovery, and hence the method is very applicable in the absence of the error norm.

In a follow-up paper, we extend this method to calculate derivatives of functions in higher dimensions and higher-order derivatives. Moreover, we can extend this method to encompass any linear inverse problem and thereby generalize the theory; this will be presented in a coming paper, where we apply the method to recover solutions of Fredholm integral equations, such as deconvolution problems, and of general Volterra equations.

Acknowledgments

I am very grateful to Prof. Ian Knowles for his support, encouragement and stimulating discussions throughout the preparation of this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 The technicality of how far the regularity of $g$ can be weakened so that the solution of Equation (1) is uniquely and stably recovered (finding $\varphi$ inversely) will be discussed later.

2 Again, the domain spaces for the functionals $G_1$, $G_2$ and $G$ are discussed in detail later.

3 It can be further proved that it is also the first Fréchet derivative of $G$ at $\psi$.

4 Hence $T_D^*$ is also a linear and bounded operator on $L^2[a,b]$.

5 Again, it can be proved that it is the second Fréchet derivative of $G$ at $\psi$.

6 Just for simplicity, we assume $g(a) = \tilde g(a)$.

7 In our experiments, we considered $\tau = 1$ and the termination condition $\|T\psi_m - g^\delta\|_{L^2} < \delta$.

8 As explained in the $L^2$-gradient version of the descent algorithm for $G$, in Section 4.

9 To be consistent with Footnote 6, we kept $\tilde g(a) = g(a)$ and $\tilde g(b) = g(b)$, though this can be avoided.

10 Where the test function is $g(x) = |x - 0.5|$ on $[0,1]$ and the data set consists of 100 uniformly distributed points with $\sigma = 0.01$.

11 The fluctuation occurs since the values of $\|\tilde g - g_m\|_{L^2}$ tend to decrease first (when approximating the exact $g$), then increase (when making the transition from $g$ to $\tilde g$) and eventually decrease again (when trying to fit the noisy $\tilde g$, i.e. overfitting).

References

  • Knowles I, Renka RJ. Methods for numerical differentiation of noisy data. In: Proceedings of the Variational and Topological Methods: Theory, Applications, Numerical Simulations, and Open Problems; Vol. 21. San Marcos (TX): Texas State University; 2014. p. 235–246.
  • Lu S, Pereverzev SV. Numerical differentiation from a viewpoint of regularization theory. Math Comp. 2006;75(256):1853–1870.
  • Wei T, Hon YC. Numerical derivatives from one-dimensional scattered noisy data. J Phys: Conf Ser. 2005;12:171–179.
  • Ramm AG, Smirnova AB. On stable numerical differentiation. Math Comp. 2001;70(235):1131–1153.
  • Háo DN, Chuong LH, Lesnic D. Heuristic regularization methods for numerical differentiation. Comput Math Appl. 2012;63(4):816–826.
  • Jauberteau F, Jauberteau JL. Numerical differentiation with noisy signal. Appl Math Comput. 2009;215(6):2283–2297.
  • Stickel JJ. Data smoothing and numerical differentiation by a regularization method. Comput Chem Eng. 2010;34:467–475.
  • Zhao Z, Meng Z, He G. A new approach to numerical differentiation. J Comput Appl Math. 2009;232(2):227–239.
  • Wang Z, Wang H, Qiu S. A new method for numerical differentiation based on direct and inverse problems of partial differential equations. Appl Math Lett. 2015;43:61–67.
  • Murio DA. The mollification method and the numerical solution of ill-posed problems. New York: A Wiley-Interscience Publication, John Wiley & Sons, Inc.; 1993.
  • Hào DN. A mollification method for ill-posed problems. Numer Math. 1994;68(4):469–506.
  • Hào ÐN, Reinhardt H-J, Seiffarth F. Stable numerical fractional differentiation by mollification. Numer Funct Anal Optim. 1994;15(5–6):635–659.
  • Hào DN, Reinhardt H-J, Schneider A. Stable approximation of fractional derivatives of rough functions. BIT Numer Math. 1995;35(4):488–503.
  • Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Vol. 375. Dordrecht: Kluwer Academic Publishers Group; 1996; Mathematics and its applications.
  • Kaltenbacher B. Some Newton type methods for the regularization of nonlinear ill-posed problems. Inverse Probl. 1997;13:729–753.
  • Kaltenbacher B., Neubauer A., Scherzer O. Iterative regularization methods for nonlinear ill-posed problems. Vol. 6. Berlin: Walter de Gruyter GmbH & Co. KG; 2008; Radon series on computational and applied mathematics.
  • Bakushinskii AB. The problems of the convergence of the iteratively regularized Gauss–Newton method. Comput Math Math Phys. 1992;32:1353–1359.
  • Hanke M. A regularizing Levenberg-Marquardt scheme, with applications to inverse groundwater filtration problems. Inverse Probl. 1997;13:79–95.
  • Knowles I, Wallace R. A variational method for numerical differentiation. Numer Math. 1995;70(1):91–110. 65D25 (49M10) [96h:65031].
  • Neuberger JW. Sobolev gradients in differential equations. New York: Springer-Verlag; 1997. (Lecture notes in mathematics; vol. 1670).
  • Knowles I. Variational methods for ill-posed problems. In: Neuberger JM, editor. Variational methods: open problems, recent progress, and numerical algorithms (Flagstaff, Arizona, 2002). Providence (RI): American Mathematical Society; 2004. p. 187–199. (Contemporary mathematics; vol. 357).
  • Morozov VA. Methods for solving incorrectly posed problems. New York: Springer-Verlag; 1984.
  • Fox L, Mayers DF. Numerical solution of ordinary differential equations. London: Chapman & Hall; 1987.
  • Gfrerer H. An a posteriori parameter choice for ordinary and iterated Tikhonov regularization of ill-posed problems leading to optimal convergence rates. Math Comput. 1987;49:507–522.
  • Morozov VA. On the solution of functional equations by the method of regularization. Sov Math Dokl. 1966;7:414–417.
  • Vainikko GM. The principle of the residual for a class of regularization methods. USSR Comput Math Math Phys. 1982;22:1–19.
  • Hansen PC. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 1992;34(4):561–580.
  • Lawson CL, Hanson RJ. Solving least squares problems. Englewood Cliffs (NJ): Prentice-Hall; 1974.
