On the choice of Lagrange multipliers in the iterated Tikhonov method for linear ill-posed equations in Banach spaces

Pages 796-826 | Received 21 Feb 2019, Accepted 15 Aug 2019, Published online: 09 Sep 2019

ABSTRACT

This article is devoted to the study of nonstationary Iterated Tikhonov (nIT) type methods (Hanke M, Groetsch CW. Nonstationary iterated Tikhonov regularization. J Optim Theory Appl. 1998;98(1):37–53; Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Vol. 375, Mathematics and its Applications. Dordrecht: Kluwer Academic Publishers Group; 1996. MR 1408680) for obtaining stable approximations to linear ill-posed problems modelled by operators mapping between Banach spaces. Here we propose and analyse an a posteriori strategy for choosing the sequence of regularization parameters for the nIT method, aiming to obtain a pre-defined decay rate of the residual. Convergence analysis of the proposed nIT type method is provided (convergence, stability and semi-convergence results). Moreover, in order to test the method's efficiency, numerical experiments for three distinct applications are conducted: (i) a 1D convolution problem (smooth Tikhonov functional and Banach parameter-space); (ii) a 2D deblurring problem (nonsmooth Tikhonov functional and Hilbert parameter-space); (iii) a 2D elliptic inverse potential problem.

1. Introduction

In this article we investigate nonstationary Iterated Tikhonov (nIT) type methods [1,2] for obtaining stable approximations of linear ill-posed problems modelled by operators mapping between Banach spaces. The novelty of our approach consists in the introduction of an a posteriori strategy for choosing the sequence of regularization parameters (or, equivalently, the Lagrange multipliers) for the nIT iteration; these multipliers play a key role in the convergence speed of the method.

This new a posteriori strategy aims to enforce a pre-defined decay of the residual in each iteration; it differs from the classical choice for the Lagrange multipliers (see, e.g. [2,3]), which is based on an a priori strategy (typically geometrical) and leads to an unknown decay rate of the residual.

The inverse problem we are interested in consists of determining an unknown quantity $x \in X$ from given data $y \in Y$, where $X$, $Y$ are Banach spaces. We assume that data are obtained by indirect measurements of the parameter, this process being described by the ill-posed operator equation
$$ Ax = y, \tag{1} $$
where $A: X \to Y$ is a bounded linear operator, whose inverse $A^{-1}: \mathcal{R}(A) \to X$ either does not exist, or is not continuous. In practical situations, one does not know the data $y$ exactly; instead, only approximate measured data $y^\delta \in Y$ are available with
$$ \|y^\delta - y\|_Y \leq \delta, \tag{2} $$
where $\delta > 0$ is the (known) noise level. A comprehensive study of linear ill-posed problems in Banach spaces can be found in the textbook [4] (see, e.g. [1] for the corresponding theory in Hilbert spaces).

Iterated Tikhonov type methods are typically used for linear inverse problems. In the Hilbert space setting we refer the reader to [2] for linear operator equations, and also to [5] for the nonlinear case. In the Banach space setting the research is still ongoing. Some preliminary results can be found in [3] for linear operator equations; see [6] for the nonlinear case; see also [7]. In all references above, a priori strategies are used for choosing the Lagrange multipliers.

1.1. Main results: presentation and interpretation

The approach discussed in this manuscript is devoted to the Banach space setting, and consists in adopting an a posteriori strategy for the choice of the Lagrange multipliers. The strategy used here is inspired by the recent work [8], where the authors propose an endogenous strategy for the choice of the Lagrange multipliers in the nonstationary iterated Tikhonov method for solving (1), (2) when $X$ and $Y$ are Hilbert spaces. The penalty terms used in our Tikhonov functionals are the same as in [6] and consist of Bregman distances induced by (uniformly) convex functionals, e.g. the sum of the $L^2$-norm with the TV-seminorm. In our previous work [9], we implemented the method proposed in this paper and investigated its numerical performance (two different ill-posed problems were solved). Here we extend our previous results by presenting a complete convergence analysis. Additionally, the corresponding algorithm is implemented for solving three benchmark problems and more details concerning the computation of the minimizers of the Tikhonov functional are provided; our numerical results are compared with those obtained by the nIT method with the classical geometrical choice of Lagrange multipliers [6].

In what follows, we briefly interpret the main results: The proposed method defines each Lagrange multiplier such that the residual of the corresponding next iterate lies in an interval which depends on both the noise level and the residual of the current iterate (see (23)). This fact has the following consequences: (1) it forces a geometrical decay of the residual (Proposition 4.1); (2) it guarantees the possibility of computing multipliers in agreement with the theoretical convergence results (Theorems 4.6, 4.8 and 4.9); (3) the computation of the multipliers demands less numerical effort than the classical strategy of computing the multipliers by solving an equation in each iteration; (4) the next iterate is not uniquely determined by the current one; instead, it is chosen within a set of successors of the current iterate (Definition 4.7). We also address the actual computation of the Lagrange multipliers. Since each multiplier is implicitly defined by an inequality, we discuss a numerically efficient strategy for computing them, which is based on the decrease rate of the past residuals (Section 5.1).

1.2. Outline of the article

This manuscript is outlined as follows: In Section 2 a revision of relevant background material is presented. In Section 3 the new nIT method is introduced. Section 4 is devoted to the convergence analysis of the nIT method. In Section 5 possible implementations of our method are discussed; the evaluation of the Lagrange multipliers is addressed, as well as the issue of minimizing the Tikhonov functionals. Section 6 is devoted to numerical experiments, while Section 7 is dedicated to final remarks and conclusions.

2. Background material

For details on the material discussed in this section, we refer the reader to the textbooks [4,10].

Unless the contrary is explicitly stated, we always consider $X$ a real Banach space. The effective domain of the convex functional $f: X \to \overline{\mathbb{R}} := (-\infty, \infty]$ is defined as $\mathrm{Dom}(f) := \{x \in X : f(x) < \infty\}$. The set $\mathrm{Dom}(f)$ is always convex and we call $f$ proper provided it is non-empty. We call $f$ uniformly convex if there exists a continuous and strictly increasing function $\varphi: \mathbb{R}_0^+ \to \mathbb{R}_0^+$, with the property that $\varphi(t) = 0$ implies $t = 0$, such that
$$ f(\lambda x + (1-\lambda)y) + \lambda(1-\lambda)\,\varphi(\|x - y\|) \leq \lambda f(x) + (1-\lambda)f(y), \tag{3} $$
for all $\lambda \in (0,1)$ and $x, y \in X$. Of course, $f$ uniformly convex implies $f$ strictly convex, which in turn implies $f$ convex. The functional $f$ is lower semi-continuous (in short, l.s.c.) if for any sequence $(x_k)_{k\in\mathbb{N}} \subset X$ satisfying $x_k \to x$, it holds that $f(x) \leq \liminf_k f(x_k)$. It is called weakly lower semi-continuous (w.l.s.c.) if the above property holds true with $x_k \to x$ replaced by $x_k \rightharpoonup x$. Obviously, every w.l.s.c. functional is l.s.c. Further, any Banach space norm is w.l.s.c.

The sub-differential of a functional $f: X \to \overline{\mathbb{R}}$ is the point-to-set mapping $\partial f: X \to 2^{X^*}$ defined by $\partial f(x) := \{x^* \in X^* : f(x) + \langle x^*, y - x\rangle \leq f(y) \ \text{for all } y \in X\}$. Any element in the set $\partial f(x)$ is called a sub-gradient of $f$ at $x$. The effective domain of $\partial f$ is the set $\mathrm{Dom}(\partial f) := \{x \in X : \partial f(x) \neq \emptyset\}$. It is clear that the inclusion $\mathrm{Dom}(\partial f) \subset \mathrm{Dom}(f)$ holds whenever $f$ is proper.

Sub-differentiable functionals and l.s.c. convex functionals are closely related concepts. In fact, a sub-differentiable functional $f$ is convex and l.s.c. in any open convex subset of $\mathrm{Dom}(\partial f)$. On the other hand, a proper, convex and l.s.c. functional is always sub-differentiable on its effective domain.

The definition of the sub-differential readily yields $0 \in \partial f(x) \iff f(x) \leq f(y)$ for all $y \in X$. If $f, g: X \to \overline{\mathbb{R}}$ are convex functionals and there is a point $x \in \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$ where $f$ is continuous, then
$$ \partial(f+g)(x) = \partial f(x) + \partial g(x) \quad \text{for all } x \in X. \tag{4} $$
Moreover, if $Y$ is a real Banach space, $h: Y \to \overline{\mathbb{R}}$ is a convex functional, $b \in Y$, $A: X \to Y$ is a bounded linear operator and $h$ is continuous at some point of the range of $A$, then $\partial\big(h(\cdot - b)\big)(y) = (\partial h)(y - b)$ and $\partial(h \circ A)(x) = A^*\big(\partial h(Ax)\big)$, for all $x \in X$ and $y \in Y$, where $A^*: Y^* \to X^*$ is the Banach-adjoint of $A$. Consequently,
$$ \partial\big(h(A\cdot - b)\big)(x) = A^*(\partial h)(Ax - b) \quad \text{for all } x \in X. \tag{5} $$
If a convex functional $f: X \to \overline{\mathbb{R}}$ is Gâteaux-differentiable at $x \in X$, then $f$ has a unique sub-gradient at $x$, namely the Gâteaux-derivative itself: $\partial f(x) = \{f'(x)\}$.

The sub-differential of the convex functional
$$ f(x) = \tfrac{1}{p}\|x\|^p, \qquad p > 1, \tag{6} $$
is called the duality mapping and is denoted by $J_p$. It can be shown that, for all $x \in X$, $J_p(x) = \{x^* \in X^* : \langle x^*, x\rangle = \|x^*\|\,\|x\| \ \text{and} \ \|x^*\| = \|x\|^{p-1}\}$. Thus, the duality mapping has the inner-product-like properties $\langle x^*, y\rangle \leq \|x\|^{p-1}\|y\|$ and $\langle x^*, x\rangle = \|x\|^p$, for all $x^* \in J_p(x)$. By using the Riesz Representation Theorem, one can prove that $J_2(x) = x$ for all $x \in X$ whenever $X$ is a Hilbert space.
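For a concrete example beyond the Hilbert case, in the Lebesgue space $X = L^p(\Omega)$ the duality mapping admits the pointwise representation derived in Appendix A (see (A3)):
$$ J_p(g) = |g|^{p-1}\operatorname{sign}(g), \qquad \langle J_p(g), g\rangle = \int_\Omega |g|^p = \|g\|_{L^p}^p, \qquad \|J_p(g)\|_{L^{p^*}} = \|g\|_{L^p}^{p-1}, $$
with $p^*$ the conjugate exponent of $p$; the two identities on the right are exactly the inner-product-like properties listed above.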

Banach spaces are classified according to their geometrical characteristics. Many concepts concerning these characteristics are usually defined using the so-called modulus of convexity and modulus of smoothness, but most of these definitions can be equivalently stated by observing the properties of the functional $f$ defined in (6) (see Note 1). This functional is convex and sub-differentiable in any Banach space $X$. If (6) is Gâteaux-differentiable in the whole space $X$, this Banach space is called smooth. In this case, $J_p(x) = \partial f(x) = \{f'(x)\}$ and therefore the duality mapping $J_p: X \to X^*$ is single-valued. If the functional $f$ in (6) is Fréchet-differentiable in $X$, this space is called locally uniformly smooth, and it is called uniformly smooth provided $f$ is uniformly Fréchet-differentiable on bounded sets. As a result, the duality mapping is continuous (resp. uniformly continuous on bounded sets) in locally uniformly smooth (resp. uniformly smooth) spaces. It is immediate that uniform smoothness of a Banach space implies local uniform smoothness, which in turn implies smoothness of this space. Moreover, none of the converse implications holds. Similarly, a Banach space $X$ is called strictly convex whenever (6) is a strictly convex functional. Moreover, $X$ is called uniformly convex if the functional $f$ in (6) is uniformly convex. It is clear that uniform convexity implies strict convexity. It is well known that both uniformly smooth and uniformly convex Banach spaces are reflexive.

Assume $f$ is proper. Then, choosing elements $x, y \in X$ with $y \in \mathrm{Dom}(\partial f)$, we define the Bregman distance between $x$ and $y$ in the direction of $\xi \in \partial f(y)$ as $D_\xi f(x,y) := f(x) - f(y) - \langle \xi, x - y\rangle$. Obviously, $D_\xi f(y,y) = 0$ and, since $\xi \in \partial f(y)$, it additionally holds that $D_\xi f(x,y) \geq 0$. Moreover, it is straightforward to prove the Three Points Identity:
$$ D_{\xi_1}f(x_2, x_1) - D_{\xi_1}f(x_3, x_1) = D_{\xi_3}f(x_2, x_3) + \langle \xi_3 - \xi_1, x_2 - x_3\rangle, $$
for all $x_2 \in X$, $x_1, x_3 \in \mathrm{Dom}(\partial f)$, $\xi_1 \in \partial f(x_1)$ and $\xi_3 \in \partial f(x_3)$. Further, the functional $D_\xi f(\cdot, y)$ is strictly convex whenever $f$ is strictly convex, and in this case $D_\xi f(x,y) = 0$ iff $x = y$.

When $f$ is the functional defined in (6) and $X$ is a smooth Banach space, the Bregman distance has the special notation $\Delta_p(x,y)$, i.e.
$$ \Delta_p(x,y) := \tfrac{1}{p}\|x\|^p - \tfrac{1}{p}\|y\|^p - \langle J_p(y), x - y\rangle. $$
Since $J_2$ is the identity operator in Hilbert spaces, a simple application of the polarization identity shows that $\Delta_2(x,y) = \tfrac12\|x - y\|^2$ in these spaces.
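Spelling out the last claim, the computation behind $\Delta_2(x,y) = \tfrac12\|x-y\|^2$ is a single line, using $J_2(y) = y$:
$$ \Delta_2(x,y) = \tfrac12\|x\|^2 - \tfrac12\|y\|^2 - \langle y, x - y\rangle = \tfrac12\|x\|^2 - \langle x, y\rangle + \tfrac12\|y\|^2 = \tfrac12\|x - y\|^2. $$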

It is not difficult to prove (see e.g. [9]) that if $f: X \to \overline{\mathbb{R}}$ is uniformly convex, then
$$ \varphi(\|x - y\|) \leq D_\xi f(x,y) \tag{7} $$
for all $x \in X$, $y \in \mathrm{Dom}(\partial f)$ and $\xi \in \partial f(y)$, where $\varphi$ is the function in (3). In particular, in a smooth and uniformly convex Banach space $X$, the above inequality reads $\varphi(\|x - y\|) \leq \Delta_p(x,y)$.

We say that a functional $f: X \to \overline{\mathbb{R}}$ has the Kadec property if, for any sequence $(x_k)_{k\in\mathbb{N}} \subset X$, the weak convergence $x_k \rightharpoonup x$, together with $f(x_k) \to f(x) < \infty$, implies $x_k \to x$. It is not difficult to prove (see e.g. [6]) that any proper, w.l.s.c. and uniformly convex functional has the Kadec property. In particular, the norm in a uniformly convex Banach space has this property.

Concrete examples of Banach spaces of interest are the Lebesgue spaces $L^p(\Omega)$, the Sobolev spaces $W^{n,p}(\Omega)$, $n \in \mathbb{N}$, and the spaces of $p$-summable sequences $\ell^p(\mathbb{R})$. All these Banach spaces are both uniformly convex and uniformly smooth provided that $1 < p < \infty$.

3. The nIT method

In this section, we present the nonstationary iterated Tikhonov (nIT) type method considered in this article, which aims to find stable approximate solutions to the inverse problem (1), (2). The method proposed here is in the spirit of the method in [6]. The distinguishing feature is the use of an a posteriori strategy for the choice of the Lagrange multipliers, as detailed below.

For a fixed $r > 1$ and a uniformly convex penalty term $f$, the nIT method defines sequences $(x_k^\delta)_{k\in\mathbb{N}}$ in $X$ and $(\xi_k^\delta)_{k\in\mathbb{N}}$ in $X^*$ iteratively by
$$ x_k^\delta := \arg\min_{x\in X}\ \lambda_k^\delta\, r^{-1}\|Ax - y^\delta\|^r + D_{\xi_{k-1}^\delta}f(x, x_{k-1}^\delta), \qquad \xi_k^\delta := \xi_{k-1}^\delta - \lambda_k^\delta A^* J_r(Ax_k^\delta - y^\delta), $$
where the Lagrange multiplier $\lambda_k^\delta > 0$ is to be determined using only information about $A$, $\delta$, $y^\delta$ and $x_{k-1}^\delta$.
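To fix ideas, the following minimal sketch implements the iteration in the simplest special case $X = Y = \mathbb{R}^n$, $p = r = 2$ and $f(x) = \tfrac12\|x\|^2$, where the Bregman distance reduces to $\tfrac12\|x - x_{k-1}\|^2$ and each Tikhonov minimizer has a closed form; the function and variable names are ours, and the multipliers are assumed to be supplied by one of the strategies of Section 5.1.

    import numpy as np

    def nit_hilbert(A, y_delta, delta, lambdas, tau=1.001, x0=None):
        """Sketch of the nIT iteration for X = Y = R^n, p = r = 2, f = 0.5*||.||^2.
        'lambdas' is an iterable of Lagrange multipliers (see Section 5.1)."""
        n = A.shape[1]
        x = np.zeros(n) if x0 is None else x0.copy()
        for lam in lambdas:
            residual = y_delta - A @ x
            if np.linalg.norm(residual) <= tau * delta:   # discrepancy principle (11)
                break
            # x_k = argmin_x (lam/2)*||A x - y^delta||^2 + (1/2)*||x - x_{k-1}||^2
            x = x + np.linalg.solve(lam * (A.T @ A) + np.eye(n), lam * (A.T @ residual))
        return x

In this Hilbert setting $\xi_k^\delta = x_k^\delta$, so the dual update is implicit in the primal one.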

Our strategy for choosing the Lagrange multipliers is inspired by the work [8], where the authors propose an endogenous strategy for the choice of the Lagrange multipliers in the nonstationary iterated Tikhonov method for solving (1), (2) when $X$ and $Y$ are Hilbert spaces. This method is based on successive orthogonal projections onto a family of shrinking, separating convex sets. Specifically, the iterative method in [8] obtains the new iterate by projecting the current one onto a levelset of the residual function, whose level belongs to a range defined by the current residual and by the noise level. Moreover, the admissible Lagrange multipliers (in each iteration) shall be chosen in a non-degenerate interval.

Aiming to extend this framework to the Banach space setting, we are forced to introduce the Bregman distance and Bregman projections. This is due to the well-known fact that in Banach spaces the metric projection onto a convex and closed set $C$, defined as $P_C(x) = \arg\min_{z\in C}\|z - x\|^2$, loses the decreasing distance property of the orthogonal projection in Hilbert spaces. In order to recover this property, one should minimize in Banach spaces the Bregman distance instead of the norm-induced distance.

For the remainder of this article we adopt the following main assumptions:

  (A.1) There exists an element $x^\star \in X$ such that $Ax^\star = y$, where $y \in \mathcal{R}(A)$ is the exact data.

  (A.2) $f$ is a w.l.s.c. functional.

  (A.3) $f$ is a uniformly convex functional.

  (A.4) $X$ and $Y$ are reflexive Banach spaces and $Y$ is smooth.

Moreover, we denote by $\Omega_\mu^r \subset X$ the $\mu$-levelset of the residual functional $\|Ax - y^\delta\|$, i.e. $\Omega_\mu^r := \{x \in X : r^{-1}\|Ax - y^\delta\|^r \leq r^{-1}\mu^r\}$. Note that, since $A$ is a continuous linear operator, it follows that $\Omega_\mu^r$ is closed and convex. Now, given $\hat{x} \in \mathrm{Dom}(f)$ and $\xi \in \partial f(\hat{x})$, we define the Bregman projection of $\hat{x}$ onto $\Omega_\mu^r$ as a solution of the minimization problem
$$ \min_{x}\ D_\xi f(x, \hat{x}) \quad \text{s.t.} \quad r^{-1}\|Ax - y^\delta\|^r \leq r^{-1}\mu^r. \tag{8} $$
It is worth noticing that a solution of problem (8) depends on the sub-gradient $\xi$. Furthermore, since $D_\xi f(\cdot,\hat{x})$ is strictly convex (which follows from the uniform convexity of $f$), problem (8) has at most one solution. The fact that the Bregman projection is well defined when $\mu > \delta$ (in this case we set $P^f_{\Omega_\mu^r}(\hat{x}) := \arg\min_{x\in\Omega_\mu^r} D_\xi f(x,\hat{x})$) is a consequence of the following lemma.

Lemma 3.1

If $\mu > \delta$, then problem (8) has a solution.

Proof.

Assumption (A.1), together with equation (2) and the assumption that $\mu > \delta$, implies that the feasible set of problem (8), i.e. the set $\Omega_\mu^r$, is nonempty.

From Assumptions (A.2) and (A.3) it follows that $D_\xi f(\cdot,\hat{x})$ is proper, convex and l.s.c. Furthermore, relation (7) implies that $D_\xi f(\cdot,\hat{x})$ is a coercive functional. Hence, the lemma follows using the reflexivity of $X$ together with [11, Corollary 3.23] (see Note 2).

Note that if $0 \leq \mu \leq \mu'$, then $\Omega_\mu^r \subset \Omega_{\mu'}^r$ and $A^{-1}(y) \subset \Omega_\mu^r$ for all $\mu \geq \delta$. Furthermore, with the available information on the solution set of (1), $\Omega_\delta^r$ is the set of best possible approximate solutions for this inverse problem. However, since problem (8) may be ill-posed when $\mu = \delta$, our best choice is to generate $x_k^\delta$ from $x_{k-1}^\delta \notin \Omega_\delta^r$ as a solution of problem (8), with $\hat{x} = x_{k-1}^\delta$ and $\mu = \mu_k$ chosen such that we guarantee a reduction of the residual norm while preventing ill-posedness of (8).

For this purpose, we analyse in the sequel the minimization problem (8) by means of Lagrange multipliers. The Lagrangian function associated to problem (8) is
$$ L(x,\lambda) = \frac{\lambda}{r}\left(\|Ax - y^\delta\|^r - \mu^r\right) + D_\xi f(x,\hat{x}). $$
Note that, for each $\lambda > 0$, the function $L(\cdot,\lambda): X \to \overline{\mathbb{R}}$ is l.s.c. and convex. For any $\lambda > 0$ define the functions
$$ \pi(\hat{x},\lambda) := \arg\min_{x\in X} L(x,\lambda), \qquad G_{\hat{x}}(\lambda) := \|A\pi(\hat{x},\lambda) - y^\delta\|^r. \tag{9} $$
The next lemma provides a classical Lagrange multiplier result for problem (8), which will be useful for formulating the nIT method.

Lemma 3.2

If $\|A\hat{x} - y^\delta\| > \mu > \delta$, then the following assertions are equivalent:

  1. $x$ is a solution of (8);

  2. there exists $\lambda > 0$ satisfying $x = \pi(\hat{x},\lambda)$ and $G_{\hat{x}}(\lambda) = \mu^r$.

Proof.

It follows from (2), Assumption (A.1) and the hypothesis $\mu > \delta$ that $x^\star \in X$ satisfies $\|Ax^\star - y^\delta\|^r < \mu^r$. This inequality implies the Slater condition for problem (8). Thus, since $A$ is continuous and $D_\xi f(\cdot,\hat{x})$ is l.s.c., we conclude that $x$ is a solution of (8) if and only if there exists $\lambda \in \mathbb{R}$ such that the pair $(x,\lambda)$ satisfies the Karush–Kuhn–Tucker (KKT) conditions for this minimization problem [12], namely
$$ \lambda \geq 0, \qquad G_{\hat{x}}(\lambda) \leq \mu^r, \qquad \lambda\left(G_{\hat{x}}(\lambda) - \mu^r\right) = 0, \qquad 0 \in \partial_x L(x,\lambda). $$
If we assume $\lambda = 0$ in the relations above, then the definition of the Lagrangian function, together with the strict convexity of $D_\xi f(\cdot,\hat{x})$, implies that $\hat{x}$ is the unique minimizer of $L(\cdot,0)$. Moreover, since $\|A\hat{x} - y^\delta\| > \mu$, we conclude that the pair $(\hat{x},0)$ does not satisfy the KKT conditions. Consequently, we have $\lambda > 0$ and $G_{\hat{x}}(\lambda) - \mu^r = 0$. The lemma follows using the definition of $\pi(\hat{x},\lambda)$.

We are ready to present the nIT method for solving (1).

Properties (4) and (5), together with the definition of the duality mapping $J_r$, imply that the point $x_k^\delta \in X$ minimizes the optimization problem in [3.2] if and only if
$$ 0 \in \lambda_k^\delta A^* J_r(Ax_k^\delta - y^\delta) + \partial f(x_k^\delta) - \xi_{k-1}^\delta. \tag{10} $$
Hence, since $Y$ is a smooth Banach space, the duality mapping $J_r$ is single-valued and $\xi_{k-1}^\delta - \lambda_k^\delta A^* J_r(Ax_k^\delta - y^\delta) \in \partial f(x_k^\delta)$. Consequently, $\xi_k^\delta$ in step 3.2 of Algorithm 1 is well defined and it is a sub-gradient of $f$ at $x_k^\delta$.

Notice that the stopping criterion in Algorithm 1 corresponds to the discrepancy principle, i.e. the iteration stops at step $k(\delta)$ defined by
$$ k(\delta) := \min\left\{k \geq 1 : \|Ax_j^\delta - y^\delta\| > \tau\delta, \ j = 0,\dots,k-1, \ \text{and} \ \|Ax_k^\delta - y^\delta\| \leq \tau\delta\right\}. \tag{11} $$

Remark 3.3

Novel properties of the proposed method

  1. The strategy used here is inspired by the recent work [8], where the authors propose an endogenous strategy for the choice of the Lagrange multipliers in the nonstationary iterated Tikhonov method for solving (1), (2) when $X$ and $Y$ are Hilbert spaces.

  2. The penalty terms used in our Tikhonov functionals are the same as in [6] and consist of Bregman distances induced by (uniformly) convex functionals, e.g. the sum of the $L^2$-norm with the TV-seminorm.

  3. We present a complete convergence analysis for the proposed method, characterizing it as a regularization method.

4. Convergence analysis

In this section, we analyse the convergence properties of Algorithm 1. We begin by presenting the following result, which establishes an estimate for the decay of the residual $\|Ax_k^\delta - y^\delta\|$. It can be proved in much the same manner as [8, Proposition 4.1], and for the sake of brevity we omit the proof here.

Proposition 4.1

Let $(x_k^\delta)_{0\leq k\leq k(\delta)}$ be the (finite) sequence defined by the nIT method (Algorithm 1), with $\delta \geq 0$ and $y^\delta \in Y$ as in (2). Then,
$$ \left[\|Ax_k^\delta - y^\delta\| - \delta\right] \leq \eta\left[\|Ax_{k-1}^\delta - y^\delta\| - \delta\right] \leq \eta^k\left[\|Ax_0 - y^\delta\| - \delta\right], \qquad k = 1,\dots,k(\delta), $$
where $k(\delta) \in \mathbb{N}$ is defined by (11).

As a direct consequence of Proposition 4.1 we have that in the noisy data case, the discrepancy principle terminates the iteration after finitely many steps, i.e. k(δ)<. Furthermore, the corollary below gives an estimate for the stopping index k(δ).

Corollary 4.2

Let $(x_k^\delta)_{0\leq k\leq k(\delta)}$ be the (finite) sequence defined by the nIT method (Algorithm 1), with $\delta > 0$ and $y^\delta \in Y$ as in (2). Then the stopping index $k(\delta)$, defined in (11), satisfies
$$ k(\delta) \leq |\ln\eta|^{-1}\,\ln\!\left[\frac{\|Ax_0 - y^\delta\| - \delta}{(\tau - 1)\,\delta}\right] + 1. $$
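To give a feeling for the numbers involved (the values below are ours, chosen only for illustration), take $\eta = 0.9$, $\|Ax_0 - y^\delta\| = 1$, $\delta = 10^{-3}$ and $\tau = 1.001$; then
$$ k(\delta) \leq \frac{1}{|\ln 0.9|}\,\ln\!\left[\frac{1 - 10^{-3}}{(1.001 - 1)\cdot 10^{-3}}\right] + 1 \approx \frac{13.8}{0.105} + 1 \approx 132, $$
so the discrepancy principle is triggered after at most 132 iterations.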

In the next proposition we prove monotonicity of the sequence $(D_{\xi_k^\delta}f(x^\star, x_k^\delta))_{k\in\mathbb{N}}$ and we also estimate the gain $D_{\xi_{k-1}^\delta}f(x^\star, x_{k-1}^\delta) - D_{\xi_k^\delta}f(x^\star, x_k^\delta)$, where $x^\star \in X$ satisfies Assumption (A.1).

Proposition 4.3

Let $(x_k^\delta)_{0\leq k\leq k(\delta)}$ be the (finite) sequence defined by the nIT method (Algorithm 1), with $\delta \geq 0$ and $y^\delta \in Y$ as in (2). Then, for every $x^\star$ satisfying Assumption (A.1) it holds
$$ D_{\xi_k^\delta}f(x^\star, x_k^\delta) - D_{\xi_{k-1}^\delta}f(x^\star, x_{k-1}^\delta) \leq -D_{\xi_{k-1}^\delta}f(x_k^\delta, x_{k-1}^\delta) - \lambda_k^\delta\left(1 - \frac{1}{\tau}\right)\|Ax_k^\delta - y^\delta\|^r, \tag{12} $$
for $k = 1,\dots,k(\delta)-1$.

Proof.

Using the three points identity we have
$$ D_{\xi_k^\delta}f(x^\star, x_k^\delta) - D_{\xi_{k-1}^\delta}f(x^\star, x_{k-1}^\delta) = -D_{\xi_{k-1}^\delta}f(x_k^\delta, x_{k-1}^\delta) + \langle \xi_k^\delta - \xi_{k-1}^\delta, x_k^\delta - x^\star\rangle = -D_{\xi_{k-1}^\delta}f(x_k^\delta, x_{k-1}^\delta) - \lambda_k^\delta\langle A^*J_r(Ax_k^\delta - y^\delta), x_k^\delta - x^\star\rangle, \tag{13} $$
where the second equality above follows from the definition of $\xi_k^\delta$. Simple manipulations of the second term on the right-hand side above yield
$$ \lambda_k^\delta\langle A^*J_r(Ax_k^\delta - y^\delta), x_k^\delta - x^\star\rangle = \lambda_k^\delta\langle J_r(Ax_k^\delta - y^\delta), Ax_k^\delta - y^\delta\rangle + \lambda_k^\delta\langle J_r(Ax_k^\delta - y^\delta), y^\delta - y\rangle = \lambda_k^\delta\|Ax_k^\delta - y^\delta\|^r + \lambda_k^\delta\langle J_r(Ax_k^\delta - y^\delta), y^\delta - y\rangle. \tag{14} $$
Combining these two relations we obtain
$$ D_{\xi_k^\delta}f(x^\star, x_k^\delta) - D_{\xi_{k-1}^\delta}f(x^\star, x_{k-1}^\delta) \leq -D_{\xi_{k-1}^\delta}f(x_k^\delta, x_{k-1}^\delta) - \lambda_k^\delta\|Ax_k^\delta - y^\delta\|^r + \lambda_k^\delta\|Ax_k^\delta - y^\delta\|^{r-1}\|y^\delta - y\| \leq -D_{\xi_{k-1}^\delta}f(x_k^\delta, x_{k-1}^\delta) - \lambda_k^\delta\|Ax_k^\delta - y^\delta\|^r + \lambda_k^\delta\|Ax_k^\delta - y^\delta\|^{r-1}\delta, $$
where the last inequality follows from (2). Since $k \in \{1,\dots,k(\delta)-1\}$, we have $\tau\delta \leq \|Ax_k^\delta - y^\delta\|$. Thus,
$$ D_{\xi_k^\delta}f(x^\star, x_k^\delta) - D_{\xi_{k-1}^\delta}f(x^\star, x_{k-1}^\delta) \leq -D_{\xi_{k-1}^\delta}f(x_k^\delta, x_{k-1}^\delta) - \lambda_k^\delta\|Ax_k^\delta - y^\delta\|^r + \frac{\lambda_k^\delta}{\tau}\|Ax_k^\delta - y^\delta\|^r. $$
We deduce (12) from the above inequality.

Corollary 4.4

Let $(x_k)_{k\in\mathbb{N}}$ be the sequence defined by the nIT method (Algorithm 1) with $\delta = 0$ and let $(\lambda_k)_{k\in\mathbb{N}}$ be the sequence of corresponding Lagrange multipliers. Then, for all $k = 1, 2, \dots$ and any $x^\star$ satisfying (A.1), we have
$$ D_{\xi_k}f(x^\star, x_k) = D_{\xi_{k-1}}f(x^\star, x_{k-1}) - D_{\xi_{k-1}}f(x_k, x_{k-1}) - \lambda_k\|Ax_k - y\|^r. \tag{15} $$
Consequently, using a telescopic sum, we obtain
$$ \sum_{k=1}^{\infty}\lambda_k\|Ax_k - y\|^r < \infty. \tag{16} $$

Proof.

In the exact data case, i.e. $\delta = 0$ and $y^\delta = y$, equality (14) becomes $\lambda_k\langle A^*J_r(Ax_k - y), x_k - x^\star\rangle = \lambda_k\|Ax_k - y\|^r$. Combining the formula above with (13), for $\delta = 0$, we deduce (15). The result in (16) follows directly from (15).

We now fix $x_0 \in \mathrm{Dom}(\partial f)$ and $\xi_0 \in \partial f(x_0)$, and study the existence and uniqueness of a vector $x^\dagger \in X$ with the property
$$ D_{\xi_0}f(x^\dagger, x_0) = \inf\{D_{\xi_0}f(x, x_0) : x \in \mathrm{Dom}(f) \ \text{and} \ Ax = y\}. \tag{17} $$
Such an element $x^\dagger$ is called an $x_0$-minimal-distance solution and it is the counterpart of the $x_0$-minimal-norm solution of (1) in the Hilbert space setting [1].

Lemma 4.5

There exists a unique element $x^\dagger \in X$ satisfying (17).

Proof.

Let $(x_k)_{k\in\mathbb{N}} \subset \mathrm{Dom}(f)$ be a sequence satisfying $Ax_k = y$ and $D_{\xi_0}f(x_k, x_0) \to a := \inf\{D_{\xi_0}f(x, x_0) : x \in \mathrm{Dom}(f) \ \text{and} \ Ax = y\}$. Then the sequence $(D_{\xi_0}f(x_k, x_0))_{k\in\mathbb{N}}$ is bounded and, because $f$ is uniformly convex, the sequence $(x_k)_{k\in\mathbb{N}}$ is bounded as well. Since $X$ is reflexive, there exist a vector $\bar{x} \in X$ and a subsequence $(x_{k_j})_{j\in\mathbb{N}}$ such that $x_{k_j} \rightharpoonup \bar{x}$. It follows that $A\bar{x} = y$ and, because $f$ is w.l.s.c.,
$$ D_{\xi_0}f(\bar{x}, x_0) \leq \liminf_{j\to\infty} D_{\xi_0}f(x_{k_j}, x_0) = \lim_{j\to\infty} D_{\xi_0}f(x_{k_j}, x_0) = a, $$
which implies that $\bar{x}$ is an $x_0$-minimal-distance solution.

Suppose now that $x^\dagger \neq z$ are two $x_0$-minimal-distance solutions. Since $f$ is strictly convex, so is $D_{\xi_0}f(\cdot, x_0)$, and for any $\alpha \in (0,1)$ we obtain
$$ D_{\xi_0}f(\alpha x^\dagger + (1-\alpha)z, x_0) < \alpha D_{\xi_0}f(x^\dagger, x_0) + (1-\alpha)D_{\xi_0}f(z, x_0) = D_{\xi_0}f(x^\dagger, x_0), $$
which contradicts the minimality of $x^\dagger$.

In the next theorem we prove strong convergence of the sequence generated by the nIT algorithm in the noise-free case to the solution $x^\dagger$.

Theorem 4.6

Let $(x_k)_{k\in\mathbb{N}}$ be the sequence defined by the nIT method (Algorithm 1) with $\delta = 0$ and $(\lambda_k)_{k\in\mathbb{N}}$ the sequence of the corresponding Lagrange multipliers. Then $(x_k)_{k\in\mathbb{N}}$ converges strongly to $x^\dagger$.

Proof.

We first prove that $(x_k)_{k\in\mathbb{N}}$ is a Cauchy sequence in $X$. Let $0 \leq l < m$ and let $x^\star$ be a solution of (1); using the three points identity we have
$$ D_{\xi_l}f(x_m, x_l) - D_{\xi_l}f(x^\star, x_l) = -D_{\xi_m}f(x^\star, x_m) + \langle \xi_m - \xi_l, x_m - x^\star\rangle. \tag{18} $$

We observe that
$$ |\langle \xi_m - \xi_l, x_m - x^\star\rangle| = \Big|\sum_{j=l+1}^{m}\langle \xi_j - \xi_{j-1}, x_m - x^\star\rangle\Big| = \Big|\sum_{j=l+1}^{m}\lambda_j\langle J_r(Ax_j - y), A(x_m - x^\star)\rangle\Big| \leq \sum_{j=l+1}^{m}\lambda_j\|Ax_j - y\|^{r-1}\|Ax_m - y\|, \tag{19} $$
where the second equality above follows from the definition of $\xi_j$, and the inequality is a consequence of the properties of the duality mapping $J_r$. Proposition 4.1, with $\delta = 0$, implies that $\|Ax_m - y\| \leq \|Ax_j - y\|$ for all $j \leq m$. Therefore, we have
$$ |\langle \xi_m - \xi_l, x_m - x^\star\rangle| \leq \sum_{j=l+1}^{m}\lambda_j\|Ax_j - y\|^r. $$
Combining the inequality above with (18) we deduce that
$$ D_{\xi_l}f(x_m, x_l) \leq D_{\xi_l}f(x^\star, x_l) - D_{\xi_m}f(x^\star, x_m) + \sum_{j=l+1}^{m}\lambda_j\|Ax_j - y\|^r. $$
From (15) it follows that the sequence $(D_{\xi_k}f(x^\star, x_k))_{k\in\mathbb{N}}$ is monotonically decreasing. Thus, by the inequality above and (16), we have $D_{\xi_l}f(x_m, x_l) \to 0$ as $l,m \to \infty$, and from the uniform convexity of $f$ we obtain that $(x_k)_{k\in\mathbb{N}}$ is a Cauchy sequence in $X$. Therefore, there is $\bar{x} \in X$ such that $x_k \to \bar{x}$ as $k\to\infty$ and, since Proposition 4.1 implies that $\|Ax_k - y\| \to 0$ as $k\to\infty$ and $A$ is a continuous map, we conclude that $A\bar{x} = y$.

Now, we prove that $\bar{x} = x^\dagger$. We first observe that $\xi_k - \xi_0 \in \partial\big(D_{\xi_0}f(\cdot, x_0)\big)(x_k)$, which yields $\langle \xi_k - \xi_0, x^\dagger - x_k\rangle \leq D_{\xi_0}f(x^\dagger, x_0) - D_{\xi_0}f(x_k, x_0)$. Thus,
$$ D_{\xi_0}f(\bar{x}, x_0) \leq \liminf_{k\to\infty} D_{\xi_0}f(x_k, x_0) \leq D_{\xi_0}f(x^\dagger, x_0) + \liminf_{k\to\infty}\langle \xi_k - \xi_0, x_k - x^\dagger\rangle. \tag{20} $$
Next, we prove that
$$ \lim_{k\to\infty}\langle \xi_k - \xi_0, x_k - x^\dagger\rangle = 0, \tag{21} $$
which, in view of (20), will ensure that $D_{\xi_0}f(\bar{x}, x_0) \leq D_{\xi_0}f(x^\dagger, x_0)$, proving that $\bar{x} = x^\dagger$. Indeed, using equation (19) with $x^\star = x^\dagger$, for $m > l$, we have
$$ |\langle \xi_m - \xi_l, x_m - x^\dagger\rangle| = \Big|\sum_{k=l+1}^{m}\lambda_k\langle J_r(Ax_k - y), Ax_m - y\rangle\Big| \leq \sum_{k=l+1}^{m}\lambda_k\|Ax_k - y\|^r \to 0 \quad \text{as } l\to\infty. $$
Then, given $\epsilon > 0$, there exists $k_0 \in \mathbb{N}$ such that $k > k_0$ implies $|\langle \xi_k - \xi_{k_0}, x_k - x^\dagger\rangle| < \epsilon/2$. Now,
$$ |\langle \xi_{k_0} - \xi_0, x_k - x^\dagger\rangle| = \Big|\sum_{n=1}^{k_0}\lambda_n\langle J_r(Ax_n - y), Ax_k - y\rangle\Big| \leq \|Ax_k - y\|\sum_{n=1}^{k_0}\lambda_n\|Ax_n - y\|^{r-1}. $$
Since $\|Ax_k - y\| \to 0$ as $k\to\infty$, we conclude that there exists a number $k_1 \geq k_0$ such that $k > k_1$ implies $|\langle \xi_{k_0} - \xi_0, x_k - x^\dagger\rangle| < \epsilon/2$. Therefore, $k > k_1$ implies $|\langle \xi_k - \xi_0, x_k - x^\dagger\rangle| < \epsilon$, which proves (21), as we wanted.

Our intention in the remainder of this section is to study the convergence properties of the family $(x_{k(\delta)}^\delta)_{\delta>0}$ as the noise level $\delta$ approaches zero. To achieve this goal, we first establish a stability result connecting $x_k^\delta$ to $x_k$. Observe that, in general, $x_{k+1}^\delta$ is not uniquely defined from $x_k^\delta$, which motivates the following definition.

Definition 4.7

Let $0 < \eta_0 \leq \eta_1 < 1$ be pre-fixed constants. $\tilde{x} \in X$ is called a successor of $x_k^\delta$ if

  1. $k < k(\delta)$;

  2. There exists $0 \leq \tilde{\lambda} < \infty$ such that $\tilde{x} := \arg\min_{x\in X} T_{\tilde{\lambda}}^\delta(x)$, where $T_{\tilde{\lambda}}^\delta$ is the Tikhonov functional
$$ T_{\tilde{\lambda}}^\delta(x) := \tilde{\lambda}\,r^{-1}\|Ax - y^\delta\|^r + D_{\xi_k^\delta}f(x, x_k^\delta); \tag{22} $$

  3. The residual $\|A\tilde{x} - y^\delta\|$ belongs to the interval
$$ \left[(1-\eta_0)\delta + \eta_0\|Ax_k^\delta - y^\delta\|,\ (1-\eta_1)\delta + \eta_1\|Ax_k^\delta - y^\delta\|\right]. \tag{23} $$

In other words, there must exist a nonnegative Lagrange multiplier $\tilde{\lambda}$ such that the minimizer $\tilde{x}$ of the corresponding Tikhonov functional in (22) attains a residual $\|A\tilde{x} - y^\delta\|$ in the interval (23) (which is defined by convex combinations of the noise level $\delta$ with the residual at the current iterate $x_k^\delta$).

In the noisy-data case, as long as the discrepancy principle is not satisfied, the interval in (23) is a subset of
$$ \left(\delta,\ (1-\eta_1)\delta + \eta_1\|Ax_k^\delta - y^\delta\|\right] \tag{24} $$
because $(1-\eta_0)\delta + \eta_0\|Ax_k^\delta - y^\delta\| \geq \delta + \eta_0(\tau-1)\delta > \delta$. Therefore, if we consider a sequence generated by Algorithm 1 with the inequalities in [3.2] replaced by a more restrictive condition, as in item 3 of Definition 4.7, all the previous results still hold. Further, since $\|Ax_k^\delta - y^\delta\| > \tau\delta > \delta$ and $\eta_0 \leq \eta_1$, it is clear that the interval (23) is non-empty.

We note that the interval (23) approaches the interval (24) as $\eta_0 \to 0$, and therefore the latter is only slightly larger than the former for $\eta_0 \approx 0$. In the noise-free case, (23) reduces to the non-empty interval
$$ \left[\eta_0\|Ax_k - y\|,\ \eta_1\|Ax_k - y\|\right], \tag{25} $$
and, according to Theorem 4.6, the sequence $(x_k)_{k\in\mathbb{N}}$ converges to $x^\dagger$ whenever $x_{k+1}$ is a successor of $x_k$ for all $k\in\mathbb{N}$. In this situation, we call $(x_k)_{k\in\mathbb{N}}$ a noiseless sequence.

Now, we study the behaviour of $x_k^\delta$, for fixed $k$, as the noise level $\delta$ approaches zero. For the sake of notation, we define
$$ T_{\lambda_k}(x) := \lambda_k\,r^{-1}\|Ax - y\|^r + D_{\xi_k}f(x, x_k) $$
for $\delta = 0$ (compare with (22)).

Theorem 4.8

Stability

Let $(\delta_j)_{j\in\mathbb{N}}$ be a sequence of positive numbers converging to zero and fix $\tau > 1$. Assume that the sequences $(x_k^{\delta_j})_{0\leq k\leq k(\delta_j)}$, $j\in\mathbb{N}$, are fixed, where $x_{k+1}^{\delta_j}$ is a successor of $x_k^{\delta_j}$ for $k = 0,\dots,k(\delta_j)-1$. Further, assume that $Y$ is a locally uniformly smooth Banach space. If $x_0$ is not a solution of (1), then there exists a noiseless sequence $(x_k)_{k\in\mathbb{N}}$ such that, for every fixed number $k \in \mathbb{N}$, there exists a subsequence $(\delta_{j_m})_{m\in\mathbb{N}}$ (depending on $k$) satisfying
$$ x_n^{\delta_{j_m}} \to x_n, \quad \xi_n^{\delta_{j_m}} \to \xi_n \quad\text{and}\quad f(x_n^{\delta_{j_m}}) \to f(x_n) \quad\text{as } m\to\infty, \ \text{for } n = 0,\dots,k. $$

Proof.

Assume that $x_0$ is not a solution of $Ax = y$. We use an induction argument: since $x_0^\delta = x_0$ and $\xi_0^\delta = \xi_0$ for every $\delta \geq 0$, the result is clear for $k = 0$. The argument consists in successively choosing a subsequence of the current subsequence; to avoid notational overload, we denote a subsequence of $(\delta_j)$ still by $(\delta_j)$. Suppose the result holds true for some $k\in\mathbb{N}$. Then there exists a subsequence $(\delta_j)_{j\in\mathbb{N}}$ satisfying
$$ x_n^{\delta_j} \to x_n, \quad \xi_n^{\delta_j} \to \xi_n \quad\text{and}\quad f(x_n^{\delta_j}) \to f(x_n) \quad\text{as } j\to\infty, \ \text{for } n = 0,\dots,k, \tag{26} $$
where $x_n$ is a successor of $x_{n-1}$ for $n = 1,\dots,k$. Because $x_{k+1}^{\delta_j}$ is a successor of $x_k^{\delta_j}$, it is true that $k < k(\delta_j)$ for all $j$. For the same reason, there exists a non-negative number $\lambda_k^{\delta_j}$ such that $x_{k+1}^{\delta_j} = \arg\min_{x\in X} T_{\lambda_k^{\delta_j}}^{\delta_j}(x)$, with the resulting residual $\|Ax_{k+1}^{\delta_j} - y^{\delta_j}\|$ lying in the interval (23). Our task now is to prove that there exist a successor $x_{k+1}$ of $x_k$ and a subsequence $(\delta_j)$ of the current subsequence such that $x_{k+1}^{\delta_j} \to x_{k+1}$, $\xi_{k+1}^{\delta_j} \to \xi_{k+1}$ and $f(x_{k+1}^{\delta_j}) \to f(x_{k+1})$ as $j\to\infty$. Since the proof is relatively long, we divide it into four main steps: 1. we find a vector $\bar{x} \in X$ such that
$$ x_{k+1}^{\delta_j} \rightharpoonup \bar{x} \tag{27} $$
for a specific subsequence $(\delta_j)_{j\in\mathbb{N}}$; 2. using the third item of Definition 4.7, we show that the sequence $(\lambda_k^{\delta_j})_{j\in\mathbb{N}}$ is bounded, then choose a convergent subsequence and define
$$ \lambda_k := \lim_{j\to\infty}\lambda_k^{\delta_j} < \infty, \tag{28} $$
as well as
$$ x_{k+1} := \arg\min_{x\in X} T_{\lambda_k}(x); \tag{29} $$
3. we prove that $x_{k+1} = \bar{x}$, which guarantees that $x_{k+1}^{\delta_j} \rightharpoonup x_{k+1}$; 4. finally, we prove that
$$ f(x_{k+1}^{\delta_j}) \to f(x_{k+1}) \quad\text{as } j\to\infty, \tag{30} $$
which, in view of (27), will guarantee that $x_{k+1}^{\delta_j} \to x_{k+1}$ as $j\to\infty$, since $f$ has the Kadec property (see the last paragraph of Section 2); this last result, in turn, will prove that $\xi_{k+1}^{\delta_j} \to \xi_{k+1}$. Finally, we verify that $x_{k+1}$ is a successor of $x_k$, completing the induction argument.

Step 1: From (12) and the uniform convexity of $f$ it follows that the sequence $(x_{k+1}^{\delta_j})_{j\in\mathbb{N}}$ is bounded. Thus, there exist a subsequence $(\delta_j)$ of the current subsequence and a vector $\bar{x} \in X$ such that (27) holds true.

Step 2: We claim that, for each fixed $k\in\mathbb{N}$, there exists a constant $\lambda_{\max,k} > 0$ such that
$$ \lambda_k^{\delta_j} \leq \lambda_{\max,k} \quad\text{for all } j\in\mathbb{N}. \tag{31} $$
Indeed, assume the contrary. Then there is a subsequence satisfying $\lambda_k^{\delta_j} \to \infty$ as $j\to\infty$. But in this case,
$$ \liminf_{j\to\infty}\ \tfrac{1}{r}\|Ax_{k+1}^{\delta_j} - y^{\delta_j}\|^r \leq \liminf_{j\to\infty}\ \frac{T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}^{\delta_j})}{\lambda_k^{\delta_j}} \leq \liminf_{j\to\infty}\ \frac{T_{\lambda_k^{\delta_j}}^{\delta_j}(x^\star)}{\lambda_k^{\delta_j}} = \lim_{j\to\infty}\left(\tfrac{1}{r}\|Ax^\star - y^{\delta_j}\|^r + \frac{D_{\xi_k^{\delta_j}}f(x^\star, x_k^{\delta_j})}{\lambda_k^{\delta_j}}\right) \leq \lim_{j\to\infty}\left(\tfrac{1}{r}\delta_j^r + \frac{D_{\xi_0}f(x^\star, x_0)}{\lambda_k^{\delta_j}}\right) = 0. $$
This, together with the lower bound of the interval (23), implies that $x_0$ is a solution of $Ax = y$, because
$$ 0 \leq \eta_0^{k+1}\|Ax_0 - y\| = \eta_0^{k+1}\lim_{j\to\infty}\left(\|Ax_0^{\delta_j} - y^{\delta_j}\| - \delta_j\right) \leq \liminf_{j\to\infty}\left(\|Ax_{k+1}^{\delta_j} - y^{\delta_j}\| - \delta_j\right) = 0, \tag{32} $$
and we have a contradiction. Thus (31) is true. We fix a convergent subsequence and define $\lambda_k$ as in (28) and $x_{k+1}$ as in (29).

Step 3: Observe that
$$ \big|\langle \xi_k, \bar{x} - x_k\rangle - \langle \xi_k^{\delta_j}, x_{k+1}^{\delta_j} - x_k^{\delta_j}\rangle\big| \leq \big|\langle \xi_k, \bar{x} - x_{k+1}^{\delta_j}\rangle\big| + \|\xi_k\|\,\|x_k^{\delta_j} - x_k\| + \|\xi_k - \xi_k^{\delta_j}\|\,\|x_{k+1}^{\delta_j} - x_k^{\delta_j}\|. $$
Thus, from (27) and the induction hypothesis it follows that $\lim_{j\to\infty}\langle \xi_k^{\delta_j}, x_{k+1}^{\delta_j} - x_k^{\delta_j}\rangle = \langle \xi_k, \bar{x} - x_k\rangle$. Now, the weak lower semi-continuity of $f$, together with (27) and the induction hypothesis, implies that
$$ D_{\xi_k}f(\bar{x}, x_k) = f(\bar{x}) - f(x_k) - \langle \xi_k, \bar{x} - x_k\rangle \leq \liminf_{j\to\infty} f(x_{k+1}^{\delta_j}) - \lim_{j\to\infty} f(x_k^{\delta_j}) - \lim_{j\to\infty}\langle \xi_k^{\delta_j}, x_{k+1}^{\delta_j} - x_k^{\delta_j}\rangle = \liminf_{j\to\infty} D_{\xi_k^{\delta_j}}f(x_{k+1}^{\delta_j}, x_k^{\delta_j}). \tag{33} $$
From (27) we have
$$ Ax_{k+1}^{\delta_j} - y^{\delta_j} \rightharpoonup A\bar{x} - y \quad\text{as } j\to\infty, \tag{34} $$
which, together with (28) and the (weak) lower semi-continuity of Banach space norms, yields
$$ T_{\lambda_k}(\bar{x}) = \tfrac{\lambda_k}{r}\|A\bar{x} - y\|^r + D_{\xi_k}f(\bar{x}, x_k) \leq \liminf_{j\to\infty}\left(\tfrac{\lambda_k^{\delta_j}}{r}\|Ax_{k+1}^{\delta_j} - y^{\delta_j}\|^r + D_{\xi_k^{\delta_j}}f(x_{k+1}^{\delta_j}, x_k^{\delta_j})\right) = \liminf_{j\to\infty} T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}^{\delta_j}) \leq \liminf_{j\to\infty} T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}) = \lim_{j\to\infty} T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}) = T_{\lambda_k}(x_{k+1}). $$
This proves that $\bar{x} = x_{k+1}$, because $x_{k+1}$ is the unique minimizer of $T_{\lambda_k}$. Thus, $x_{k+1}^{\delta_j} \rightharpoonup x_{k+1}$ as $j\to\infty$. The above inequalities also ensure that $\liminf_{j} T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}^{\delta_j}) = T_{\lambda_k}(x_{k+1})$. Taking a subsequence if necessary, we can assume that the following sequences converge:
$$ a_j := D_{\xi_k^{\delta_j}}f(x_{k+1}^{\delta_j}, x_k^{\delta_j}), \qquad a := \lim_{j\to\infty} a_j, \tag{35} $$
and
$$ \mathrm{re}_j := r^{-1}\|Ax_{k+1}^{\delta_j} - y^{\delta_j}\|^r, \qquad \mathrm{re} := \lim_{j\to\infty}\mathrm{re}_j. \tag{36} $$
We can also assume for this subsequence that
$$ \lim_{j\to\infty} T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}^{\delta_j}) = T_{\lambda_k}(x_{k+1}). \tag{37} $$
Step 4: Define $c := D_{\xi_k}f(x_{k+1}, x_k)$ and observe that, by (33), the inequality $c \leq a$ holds. Thus, it suffices to prove that $a \leq c$ to ensure that $\lim_j D_{\xi_k^{\delta_j}}f(x_{k+1}^{\delta_j}, x_k^{\delta_j}) = D_{\xi_k}f(x_{k+1}, x_k)$, which will prove (30). We assume that $a > c$ and derive a contradiction. From (28), (35), (36) and (37), together with the definition of limit, it follows that there exists a number $N\in\mathbb{N}$ such that $j \geq N$ implies
$$ T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}^{\delta_j}) < T_{\lambda_k}(x_{k+1}) + \frac{a-c}{2}, \qquad \mathrm{re} \leq \mathrm{re}_j + \frac{a-c}{6\lambda_{\max,k}}, \qquad \lambda_k \leq \lambda_k^{\delta_j} + \frac{a-c}{6\,\mathrm{re}}, \qquad a \leq a_j + \frac{a-c}{6}. $$
Therefore, for any $j \geq N$ it holds
$$ T_{\lambda_k}(x_{k+1}) \leq \lambda_k\,\mathrm{re} + c = (\lambda_k - \lambda_k^{\delta_j})\,\mathrm{re} + \lambda_k^{\delta_j}\,\mathrm{re} + a - (a-c) \leq \frac{a-c}{6\,\mathrm{re}}\,\mathrm{re} + \lambda_k^{\delta_j}\left(\mathrm{re}_j + \frac{a-c}{6\lambda_{\max,k}}\right) + \left(a_j + \frac{a-c}{6}\right) - (a-c) \leq \lambda_k^{\delta_j}\,\mathrm{re}_j + a_j - \frac{a-c}{2} = T_{\lambda_k^{\delta_j}}^{\delta_j}(x_{k+1}^{\delta_j}) - \frac{a-c}{2} < T_{\lambda_k}(x_{k+1}), $$
which leads to an obvious contradiction, proving that $a \leq c$ as we wanted. Hence (30) holds and, since $f$ has the Kadec property, it follows, in view of (27), that $x_{k+1}^{\delta_j} \to x_{k+1}$. This last result, together with the continuity of the duality mapping in locally uniformly smooth Banach spaces, implies that $\xi_{k+1}^{\delta_j} \to \xi_{k+1}$.

It only remains to prove that $\|Ax_{k+1} - y\|$ belongs to the interval (25), which guarantees that $x_{k+1}$ is a successor of $x_k$. But this result follows by passing to the limit $j\to\infty$ in the sequence $\|Ax_{k+1}^{\delta_j} - y^{\delta_j}\|$, whose terms belong to the interval (23).

Theorem 4.9

Regularization

Let $(\delta_j)_{j\in\mathbb{N}}$ be a sequence of positive numbers converging to zero and fix $\tau > 1$. Assume that the sequences $(x_k^{\delta_j})_{0\leq k\leq k(\delta_j)}$, $j\in\mathbb{N}$, are fixed, where $x_{k+1}^{\delta_j}$ is a successor of $x_k^{\delta_j}$ for $k = 0,\dots,k(\delta_j)-1$. Further, assume that $Y$ is a locally uniformly smooth Banach space. Then
$$ \lim_{j\to\infty} x_{k(\delta_j)}^{\delta_j} = x^\dagger. \tag{38} $$

Proof.

The result is clear if $x_0$ is a solution of $Ax = y$; assume this is not the case. We first verify that the sequence $(k(\delta_j))_{j\in\mathbb{N}}$ has no convergent subsequences. Indeed, if for a subsequence $(\delta_{j_m})_{m\in\mathbb{N}}$ it is true that $k(\delta_{j_m}) \to n$ as $m\to\infty$, then, since $k(\delta_{j_m}) \in \mathbb{N}$ for all $m\in\mathbb{N}$, we must have $k(\delta_{j_m}) = n \in \mathbb{N}$ for $m$ large enough. From Theorem 4.8, the subsequence $(x_n^{\delta_{j_m}})_{m\in\mathbb{N}}$ has itself a subsequence (which we still denote by $(x_n^{\delta_{j_m}})_{m\in\mathbb{N}}$) which converges to $x_n$, that is, $\lim_m x_{k(\delta_{j_m})}^{\delta_{j_m}} = \lim_m x_n^{\delta_{j_m}} = x_n$. But $x_n$ is a solution of $Ax = y$ since
$$ \|y - Ax_n\| = \lim_{m\to\infty}\|y - Ax_{k(\delta_{j_m})}^{\delta_{j_m}}\| \leq \lim_{m\to\infty}\left(\|y - y^{\delta_{j_m}}\| + \|y^{\delta_{j_m}} - Ax_{k(\delta_{j_m})}^{\delta_{j_m}}\|\right) \leq \lim_{m\to\infty}(\tau + 1)\delta_{j_m} = 0. $$
As in the proof of Theorem 4.8 (see (32)), we conclude that $x_0$ is a solution of $Ax = y$, contradicting our assumption. Therefore $k(\delta_j) \to \infty$ as $j\to\infty$.

We now prove that each subsequence of $(x_{k(\delta_j)}^{\delta_j})_{j\in\mathbb{N}}$ has itself a subsequence which converges to $x^\dagger$. This will prove (38). We observe that, since any subsequence of $(\delta_j)_{j\in\mathbb{N}}$ is itself a sequence of positive numbers converging to zero, it suffices to prove that $(x_{k(\delta_j)}^{\delta_j})_{j\in\mathbb{N}}$ has a subsequence which converges to $x^\dagger$.

Our first step is to prove that, for every fixed $\epsilon > 0$, there exist a subsequence (which we still denote by $(\delta_j)_{j\in\mathbb{N}}$) depending on $\epsilon$, and a number $J = J(\epsilon)$, such that
$$ j \geq J \ \Longrightarrow\ D_{\xi_{k(\delta_j)}^{\delta_j}}f(x^\dagger, x_{k(\delta_j)}^{\delta_j}) < \epsilon. \tag{39} $$
In fact, fix $\epsilon > 0$ and let $(x_k)_{k\in\mathbb{N}}$ be the noiseless sequence constructed in Theorem 4.8. Since $x_{k+1}$ is a successor of $x_k$ for all $k\in\mathbb{N}$, the sequence $(x_k)_{k\in\mathbb{N}}$ converges to $x^\dagger$ (see Theorem 4.6). Then there exists a natural number $N = N(\epsilon)$ such that
$$ \|x_N - x^\dagger\| < \tfrac12\sqrt{\epsilon/2} \quad\text{and}\quad D_{\xi_N}f(x^\dagger, x_N) < \epsilon/2. $$
From Theorem 4.8 there exist a subsequence (still denoted by $(\delta_j)_{j\in\mathbb{N}}$) depending on $N$, and a number $J_1\in\mathbb{N}$, depending on $\epsilon$, such that
$$ j \geq J_1 \ \Longrightarrow\ \left[\|\xi_N^{\delta_j} - \xi_N\| < \sqrt{\epsilon/2} \quad\text{and}\quad \|x_N^{\delta_j} - x_N\| < \tfrac12\sqrt{\epsilon/2}\right]. $$
Since $k(\delta_j) \to \infty$, there is a number $J_2\in\mathbb{N}$ such that $k(\delta_j) \geq N$ for all $j \geq J_2$. It follows from (12) and the three points identity that, for $j \geq J := \max\{J_1, J_2\}$,
$$ D_{\xi_{k(\delta_j)}^{\delta_j}}f(x^\dagger, x_{k(\delta_j)}^{\delta_j}) \leq D_{\xi_N^{\delta_j}}f(x^\dagger, x_N^{\delta_j}) \leq D_{\xi_N}f(x^\dagger, x_N) - D_{\xi_N}f(x_N^{\delta_j}, x_N) + \|\xi_N^{\delta_j} - \xi_N\|\,\|x_N^{\delta_j} - x^\dagger\| \leq D_{\xi_N}f(x^\dagger, x_N) + \|\xi_N^{\delta_j} - \xi_N\|\left(\|x_N^{\delta_j} - x_N\| + \|x_N - x^\dagger\|\right) < \epsilon, $$
which proves (39).

Choosing $\epsilon = 1$, we can find a subsequence $(\delta_j)_{j\in\mathbb{N}}$ and select a number $j_1\in\mathbb{N}$ such that $D_{\xi_{k(\delta_{j_1})}^{\delta_{j_1}}}f(x^\dagger, x_{k(\delta_{j_1})}^{\delta_{j_1}}) < 1$. Since the current subsequence $(\delta_j)_{j\in\mathbb{N}}$ is also a sequence of positive numbers converging to zero, the above reasoning can be applied again in order to extract a subsequence of the current one satisfying (39) with $\epsilon = 1/2$. We choose a number $j_2 > j_1$ such that the inequality $D_{\xi_{k(\delta_{j_2})}^{\delta_{j_2}}}f(x^\dagger, x_{k(\delta_{j_2})}^{\delta_{j_2}}) < \tfrac12$ holds true. Using induction, it is therefore possible to construct a subsequence $(\delta_{j_m})_{m\in\mathbb{N}}$ with the property
$$ D_{\xi_{k(\delta_{j_m})}^{\delta_{j_m}}}f(x^\dagger, x_{k(\delta_{j_m})}^{\delta_{j_m}}) < \frac{1}{m} \quad\text{for all } m\in\mathbb{N}, $$
which implies that $\lim_{m\to\infty} D_{\xi_{k(\delta_{j_m})}^{\delta_{j_m}}}f(x^\dagger, x_{k(\delta_{j_m})}^{\delta_{j_m}}) = 0$ and, since $f$ is uniformly convex, $\lim_{m\to\infty}\|x_{k(\delta_{j_m})}^{\delta_{j_m}} - x^\dagger\| = 0$.

5. Algorithms and numerical implementation

5.1. Determining the Lagrange multipliers

As before, we consider the function $G_{\hat{x}}(\lambda) = \|Ax_\lambda - y^\delta\|^r$, where $x_\lambda = \pi(\hat{x},\lambda)$ represents the minimizer of the Tikhonov functional $T_\lambda^\delta(x) = \tfrac{1}{r}\lambda\|Ax - y^\delta\|^r + D_\xi f(x,\hat{x})$. In order to choose the Lagrange multiplier in the $k$-th iteration, our strategy consists in finding a $\lambda_k > 0$ such that $G_{x_{k-1}}(\lambda_k) \in [a_k, b_k]$, where
$$ a_k := \left((1-\eta_0)\delta + \eta_0\|Ax_{k-1} - y^\delta\|\right)^r \quad\text{and}\quad b_k := \left((1-\eta_1)\delta + \eta_1\|Ax_{k-1} - y^\delta\|\right)^r, $$
so that the target interval $[a_k, b_k]$ corresponds to the $r$-th power of the interval (23). Here $0 < \eta_0 \leq \eta_1 < 1$ are pre-defined constants.

We have employed three different methods to compute $\lambda_k$: the well-known secant and Newton methods and a third strategy, here called the adaptive method, which we now explain. Fix $\sigma_1, \sigma_2 \in (0,1)$, $c_1 > 1$ and some initial value $\lambda_0^\delta > 0$. In the $k$-th iteration, $k \geq 1$, we define $\lambda_k^\delta = c_k\lambda_{k-1}^\delta$, where
$$ c_k = \begin{cases} c_{k-1}\,\sigma_1, & \text{if } G_{x_{k-2}}(\lambda_{k-1}^\delta) < a_{k-1},\\ c_{k-1}\,\sigma_2^{-1}, & \text{if } G_{x_{k-2}}(\lambda_{k-1}^\delta) > b_{k-1},\\ c_{k-1}, & \text{otherwise}, \end{cases} \qquad \text{for } k \geq 2. $$
The idea behind the adaptive method is to observe the behaviour of the residual in the last iterations and to determine how much the Lagrange multiplier should be increased in the next iteration. For example, if the residual $G_{x_{k-2}}(\lambda_{k-1}^\delta) = \|Ax_{k-1} - y^\delta\|^r$ lies to the left of the target interval $[a_{k-1}, b_{k-1}]$, then $\lambda_{k-1}^\delta$ was too large. We thus multiply $c_{k-1}$ by a number $\sigma_1 \in (0,1)$ in order to reduce the rate of growth of the Lagrange multipliers $\lambda_k^\delta$, trying to hit the target interval in the next iteration.
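A compact implementation of this update rule might look as follows (a sketch: the function name, the default values of sigma1 and sigma2, and the way the data are passed around are our choices):

    def next_multiplier(lam_prev, c_prev, res_prev_r, a_prev, b_prev,
                        sigma1=0.5, sigma2=0.5):
        """One step of the adaptive rule: res_prev_r is G_{x_{k-2}}(lambda_{k-1}),
        i.e. ||A x_{k-1} - y^delta||^r, and [a_prev, b_prev] is the previous target."""
        if res_prev_r < a_prev:        # residual left of the target: lambda was too large
            c = c_prev * sigma1        #   -> slow down the growth of the multipliers
        elif res_prev_r > b_prev:      # residual right of the target: lambda was too small
            c = c_prev / sigma2        #   -> speed up the growth of the multipliers
        else:
            c = c_prev                 # inside the target interval: keep the current rate
        return c * lam_prev, c         # lambda_k = c_k * lambda_{k-1}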

Although the Newton method is efficient, in the sense that it normally finds a good approximation for the Lagrange multiplier in very few steps, it has the drawback of demanding the differentiability of the Tikhonov functional, and therefore it cannot be applied in all situations.

Because it does not require the evaluation of derivatives, the secant method can be used even for a nonsmooth Tikhonov functional. A disadvantage of this method is the high computational effort required to perform it.
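For completeness, here is a schematic secant iteration for this task; it treats $\lambda \mapsto G_{x_{k-1}}(\lambda)$ as a black box (each evaluation costs one minimization of the Tikhonov functional), aims at the midpoint of $[a_k, b_k]$ and stops as soon as the value lands in the target interval. The midpoint target, the safeguards and all names are our choices.

    def secant_multiplier(G, a_k, b_k, lam0, lam1, max_eval=20):
        """Secant iteration for a lambda with G(lambda) in [a_k, b_k];
        G(lam) evaluates ||A x_lam - y^delta||^r (one Tikhonov minimization per call)."""
        target = 0.5 * (a_k + b_k)
        f0, f1 = G(lam0) - target, G(lam1) - target
        for _ in range(max_eval):
            if a_k <= f1 + target <= b_k:      # residual already in the target interval
                return lam1
            if f1 == f0:                       # degenerate secant step: stop
                break
            lam0, lam1 = lam1, lam1 - f1 * (lam1 - lam0) / (f1 - f0)
            lam1 = max(lam1, 1e-12)            # keep the multiplier positive
            f0, f1 = f1, G(lam1) - target
        return lam1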

Among these three possibilities, the adaptive strategy is the cheapest one, since it only demands one minimization of the Tikhonov functional per iteration. Further, this simple strategy does not require the derivative of this functional, which makes it suitable for a wide range of applications.

Note that this third strategy may generate a $\lambda_k^\delta$ such that $G_{x_{k-1}}(\lambda_k^\delta) \notin [a_k, b_k]$ in some iterative steps. This is the reason for correcting the factors $c_k$ in each iteration. In our numerical experiments, the condition $G_{x_{k-1}}(\lambda_k^\delta) \in [a_k, b_k]$ was satisfied in almost all steps.

5.2. Minimization of the Tikhonov functional

In our numerical experiments, we are interested in solving the inverse problem (1), where $A: L^p(\Omega) \to L^2(\Omega)$, with $1 < p < \infty$, is linear and bounded, noisy data $y^\delta$ are available, and the noise level $\delta > 0$ is known.

In order to implement the nIT method (Algorithm 1), a minimizer of the Tikhonov functional (22) needs to be calculated in each iteration step. Minimizing this functional can itself be a very challenging task. We have used two algorithms for achieving this goal in our numerical experiments: 1. the Newton method, used for minimizing this functional in the case $p \neq 2$ with a smooth functional $f$ inducing the Bregman distance in the penalization term; 2. the so-called ADMM, employed in order to minimize the Tikhonov functional for the case $p = 2$ (Hilbert space) and a nonsmooth functional $f$. In the following, we explain the details.

First we consider the Newton method. Define the Bregman distance induced by the norm-functional $f(g) := \frac{1}{p}\|g\|_{L^p}^p$, $1 < p < \infty$, which leads to the smooth penalization term $D_\xi f(g,h) = \Delta_p(g,h)$, see Section 2. The resulting Tikhonov functional reads
$$ T_\lambda(g) = \tfrac12\lambda\|Ag - y^\delta\|^2 + \Delta_p(g, g_{k-1}), $$
where $g_{k-1}$ is the current iterate (see Note 3). In this case, the optimality condition (10) reads
$$ F(\bar{g}) = \lambda A^* y^\delta + J_p(g_{k-1}), \tag{40} $$
where $\bar{g} \in L^p(\Omega)$ is the minimizer of $T_\lambda(g)$ and $F(g) := \lambda A^*Ag + J_p(g)$.

In order to apply the Newton method to the nonlinear equation (40), one needs to evaluate the derivative of $F$, which (whenever it exists) is given by $F'(g) = \lambda A^*A + J_p'(g)$. Next we present an explicit expression for the Gâteaux derivative $J_p'(g)$ (the derivation of this expression is postponed to Appendix A, where the Gâteaux-differentiability of $J_p$ in $L^p(\Omega)$, for $p \geq 2$, is investigated). Given $g \in L^p(\Omega)$, with $p \geq 2$, it holds
$$ (J_p'(g))\,h = (p-1)|g|^{p-2}\,h, \qquad h \in L^p(\Omega), \tag{41} $$
where the linear operator $(p-1)|g|^{p-2}: h \mapsto (p-1)|g(\cdot)|^{p-2}h(\cdot)$ is to be understood in the pointwise sense. In the discretized setting, $J_p'(g)$ is a diagonal matrix whose $i$-th diagonal element is $(p-1)|g(x_i)|^{p-2}$, with $x_i$ being the $i$-th point of the chosen mesh.

In our numerical simulations, we consider the situation where the sought solution is sparse and, therefore, the case $p \approx 1$ is of interest. We stress the fact that (A2) (see Appendix A) holds true even for $1 < p < 2$ whenever $x \neq 0$. Using this fact, one can prove that (41) holds for these values of $p$, e.g. if $g$ does not change sign in $\Omega$ (i.e. either $g > 0$ in $\Omega$, or $g < 0$ in $\Omega$) and the direction $h$ is a bounded function on this set. However, these strong hypotheses are very difficult to check and, even if they are satisfied, we still expect stability problems when inverting the matrix $F'(g)$ if the function $g$ attains a small value at some point of the mesh, because the function in (A2) satisfies $\gamma''(x) \to \infty$ as $x \to 0$. In order to avoid this kind of problem in our numerical experiments, we have replaced the $i$-th element on the diagonal of the matrix $J_p'(g)$ by $\min\{(p-1)|g(x_i)|^{p-2}, 10^6\}$.
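The resulting Newton iteration for (40) can be sketched as follows in the discretized setting; the cap on the diagonal of $J_p'(g)$ implements the safeguard $\min\{(p-1)|g(x_i)|^{p-2}, 10^6\}$ just described, while the function names, the iteration count and the tolerance are our choices.

    import numpy as np

    def jp(g, p):
        """Discretized duality mapping J_p, cf. (A3)."""
        return np.abs(g) ** (p - 1) * np.sign(g)

    def newton_tikhonov_minimizer(A, y_delta, g_prev, lam, p,
                                  n_iter=20, cap=1e6, tol=1e-10):
        """Newton iteration for F(g) = lam*A^T A g + J_p(g) = lam*A^T y_delta + J_p(g_prev)."""
        rhs = lam * (A.T @ y_delta) + jp(g_prev, p)
        g = g_prev.copy()
        for _ in range(n_iter):
            F = lam * (A.T @ (A @ g)) + jp(g, p)
            # J_p'(g) is diagonal with entries (p-1)|g(x_i)|^{p-2}, capped at 1e6
            diag = np.minimum((p - 1) * np.abs(g) ** (p - 2), cap)
            Fprime = lam * (A.T @ A) + np.diag(diag)
            step = np.linalg.solve(Fprime, rhs - F)
            g = g + step
            if np.linalg.norm(step) <= tol * max(1.0, np.linalg.norm(g)):
                break
        return g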

The second method that we used in our experiments was the well-known Alternating Direction Method of Multipliers (ADMM), which has been implemented to minimize the Tikhonov functional associated with the inverse problem $Ax = y^\delta$, where $X = Y = \mathbb{R}^n$, $A: \mathbb{R}^n \to \mathbb{R}^n$, and $f: \mathbb{R}^n \to \overline{\mathbb{R}}$ is a nonsmooth function.

ADMM is an optimization scheme for solving linearly constrained programming problems with decomposable structure [13], which goes back to the works of Glowinski and Marrocco [14], and of Gabay and Mercier [15]. Specifically, this algorithm solves problems of the form
$$ \min_{(x,z)\in\mathbb{R}^n\times\mathbb{R}^m}\ \{\varphi(x) + \phi(z) : Mx + Bz = d\}, \tag{42} $$
where $\varphi: \mathbb{R}^n \to \overline{\mathbb{R}}$ and $\phi: \mathbb{R}^m \to \overline{\mathbb{R}}$ are convex, proper, l.s.c. functions, $M: \mathbb{R}^n \to \mathbb{R}^l$ and $B: \mathbb{R}^m \to \mathbb{R}^l$ are linear operators, and $d \in \mathbb{R}^l$.

ADMM solves the coupled problem (42) by performing a sequence of steps that decouple the functions $\phi$ and $\varphi$, making it possible to exploit the individual structure of these functions. It can be interpreted in terms of alternating minimization, with respect to $x$ and $z$, of the augmented Lagrangian function associated with problem (42). Indeed, ADMM consists of the iterations
$$ x^{k+1} = \arg\min_{x\in\mathbb{R}^n} L_\rho(x, z^k, u^k), \qquad z^{k+1} = \arg\min_{z\in\mathbb{R}^m} L_\rho(x^{k+1}, z, u^k), \qquad u^{k+1} = u^k + \rho\left(Mx^{k+1} + Bz^{k+1} - d\right), $$
where $\rho > 0$ and $L_\rho$ is the augmented Lagrangian function
$$ L_\rho(x,z,u) := \varphi(x) + \phi(z) + \langle u, Mx + Bz - d\rangle + \tfrac{\rho}{2}\|Mx + Bz - d\|_2^2. $$
The convergence results for ADMM guarantee, under suitable assumptions, that the sequences $(x^k)_{k\in\mathbb{N}}$, $(z^k)_{k\in\mathbb{N}}$ and $(u^k)_{k\in\mathbb{N}}$ generated by the method are such that $Mx^k + Bz^k - d \to 0$, $\varphi(x^k) + \phi(z^k) \to s^\star$ and $u^k \to u^\star$, where $s^\star$ is the optimal value of problem (42) and $u^\star$ is a solution of the dual problem associated with (42).

For minimizing the Tikhonov functional using ADMM, we introduce an additional decision variable $z$ such that the problem of minimizing $T_{\lambda_k^\delta}^\delta(x)$ for $x \in X$ is rewritten in the form of (42).

The specific choice of the functions $\phi$, $\varphi$ and the operators $M$ and $B$ is problem dependent (for a concrete example see, e.g. Section 6.2). This allows us to exploit the special form of the functional $T_{\lambda_k^\delta}^\delta$, and also to pose the minimization problem in a more suitable form, in order to be solved numerically.

In all numerical simulations presented in Section 6, the ADMM iteration is stopped when $\|Mx^k + Bz^k - d\|$ becomes smaller than a predefined threshold.

6. Numerical experiments

6.1. Deconvolution

In what follows we consider the application of the nIT method to the deconvolution problem modelled by the linear integral operator
$$ (Ax)(s) := \int_0^1 K(s,t)\,x(t)\,\mathrm{d}t = y(s), $$
where the kernel $K$ is the continuous function defined by
$$ K(s,t) = \begin{cases} \tfrac{4}{9}\,s\,(1-t), & s \leq t,\\ \tfrac{4}{9}\,t\,(1-s), & s > t. \end{cases} $$
This benchmark problem is considered in [3]. There, it is observed that $A: L^p[0,1] \to C[0,1]$ is continuous and bounded for $1 \leq p \leq \infty$. Thus, $A: L^p[0,1] \to L^r[0,1]$ is compact for $1 \leq r < \infty$.

In our experiment, $A$ is replaced by the discrete operator $A_d$, where the above integral is computed using a quadrature formula (trapezoidal rule) over a uniform partition of the interval $[0,1]$ with 400 nodes.

The exact solution of the discrete problem is the vector $x^\star \in \mathbb{R}^{400}$ with $x^\star(27) = 2$, $x^\star(72) = 1.25$, $x^\star(103) = 1.75$, $x^\star(255) = 1.25$, $x^\star(350) = 1.5$, and $x^\star(i) = 0$ elsewhere (see Note 4).

We compute $y = A_d x^\star$, the exact data, and add random Gaussian noise to $y \in \mathbb{R}^{400}$ to obtain the noisy data $y^\delta$ satisfying $\|y - y^\delta\|_Y \leq \delta$.

We follow [3] in the experimental setting and choose $\delta = 0.0005$, $\tau = 1.001$ (discrepancy principle), and $Y = L^2$. For the parameter space, two distinct choices are considered, namely $X = L^{1.001}$ and $X = L^2$.
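The discrete setup just described can be reproduced in a few lines (a sketch: the spike indices are written here in zero-based form, and the particular way the Gaussian noise is rescaled to the level $\delta$ is our choice):

    import numpy as np

    def kernel(s, t):
        # K(s,t) = (4/9) s (1-t) for s <= t, and (4/9) t (1-s) for s > t
        return np.where(s <= t, 4.0 / 9.0 * s * (1.0 - t), 4.0 / 9.0 * t * (1.0 - s))

    n = 400
    t = np.linspace(0.0, 1.0, n)                        # uniform partition of [0, 1]
    w = np.full(n, 1.0 / (n - 1)); w[[0, -1]] /= 2.0    # trapezoidal weights
    Ad = kernel(t[:, None], t[None, :]) * w             # (A_d x)_i ~ sum_j K(s_i, t_j) w_j x_j

    x_star = np.zeros(n)                                # sparse exact solution (0-based indices)
    x_star[[26, 71, 102, 254, 349]] = [2.0, 1.25, 1.75, 1.25, 1.5]
    y = Ad @ x_star                                     # exact data

    delta = 0.0005
    noise = np.random.randn(n)
    y_delta = y + delta * noise / np.linalg.norm(noise)  # enforces ||y - y_delta|| <= delta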

Numerical results for the deconvolution problem are presented in Figure 1 (for simplicity, all legends in this figure refer to the space $L^1$; however, $p = 1.001$ is used in the computations). The following methods are depicted:

  1. (BLUE) L2-penalization, Geometric sequence;

  2. (GREEN) L2-penalization, Secant method;

  3. (RED) L1.001-penalization, Geometric sequence;

  4. (PINK) L1.001-penalization, Secant method;

  5. (BLACK) L1.001-penalization, Newton method.

Figure 1. Deconvolution problem: Numerical experiments.


The six pictures in Figure 1 represent:

  1. [TOP] Iteration error in the $L^2$-norm (left) (see Note 5); residual in the $L^2$-norm (right);

  2. [CENTER] Number of linear systems/step (left); Lagrange multipliers (right);

  3. [BOTTOM] Exact solution and reconstructions with $L^2$-penalization (left); exact solution and reconstructions with $L^{1.001}$-penalization (right).

6.2. Image deblurring

In the sequel an application of the nIT method to an image deblurring problem is considered. This is a finite dimensional problem with spaces $X = \mathbb{R}^n\times\mathbb{R}^n$ and $Y = \mathbb{R}^n\times\mathbb{R}^n$. The vector $x \in X$ represents the pixel values of the original image to be restored, and $y \in Y$ contains the pixel values of the observed blurred image. In practice, only noisy blurred data $y^\delta \in Y$ satisfying (2) are available. The linear transformation $A$ represents some blurring operator.

In the numerical simulations we consider the situation where the blur of the image is modelled by a space invariant point spread function (PSF). The exact solution is the $512\times512$ Barbara image (see Figure 2), and $y^\delta$ is obtained by adding artificial noise to the exact data $y = Ax$ (here $A$ is the convolution operator corresponding to the PSF).

Figure 2. Image deblurring problem: (LEFT) Point Spread Function; (CENTER) Exact image; (RIGHT) Blurred image.


For this problem the nIT method is implemented with two distinct penalization terms, namely $f(x) = \|x\|_2^2$ ($L^2$ penalization) and $f(x) = \frac{\mu}{2}\|x\|_2^2 + \mathrm{TV}(x)$ ($L^2$+TV penalization). Here $\mu > 0$ is a regularization parameter and $\mathrm{TV}(x) = \|\nabla x\|_1$ is the total variation seminorm of $x$, where $\nabla: \mathbb{R}^n\times\mathbb{R}^n \to (\mathbb{R}^n\times\mathbb{R}^n)\times(\mathbb{R}^n\times\mathbb{R}^n)$ is the discrete gradient operator. We minimize the Tikhonov functional associated with the $L^2$+TV penalization term using the ADMM described in Section 5.

In our experiments the values $\mu = 10^{-4}$, $\delta = 0.00001$ and $\tau = 1.5$ are used. Moreover, $x_0 = y^\delta$ and $\xi_0 = \nabla^\top\operatorname{sign}(\nabla x_0)$ are chosen as initial guesses.

In Figure 2, the exact solution, the convolution kernel, and the noisy data are shown. The reconstructed images are shown in Figure 3, while the numerical results are presented in Figure 4. The following methods were implemented:

  1. (BLUE) L2-penalization, Geometric sequence;

  2. (RED) L2+TV-penalization, Geometric sequence;

  3. (PINK) L2+TV-penalization, Secant method;

  4. (GREEN) L2+TV-penalization, Adaptive method.

Figure 3. Image deblurring problem: Reconstructions (TOP LEFT) L2–Geometric; (TOP RIGHT) L2+TV–Geometric; (BOTTOM LEFT) L2+TV–Secant; (BOTTOM RIGHT) L2+TV–Adaptive.


Figure 4. Image deblurring problem: Numerical experiments.


The four pictures in Figure 4 represent:

  1. [TOP] Iteration error $\|x^\star - x_k^\delta\|$;

  2. [CENTER TOP] Residual $\|Ax_k^\delta - y^\delta\|$;

  3. [CENTER BOTTOM] Number of linear systems solved in each step;

  4. [BOTTOM] Lagrange multipliers $\lambda_k$.

Remark 6.1

The Tikhonov functional associated with the $L^2$+TV penalization term is minimized using the ADMM of Section 5. Note that, if $f(x) = \frac{\mu}{2}\|x\|_2^2 + \|\nabla x\|_1$, then one needs to solve
$$ \min_{x\in X}\ \tfrac12\lambda_k\|Ax - y^\delta\|^2 + \tfrac{\mu}{2}\|x - x_{k-1}^\delta\|^2 + \|\nabla x\|_1 - \|\nabla x_{k-1}^\delta\|_1 - \langle \xi_{k-1}^\delta, x - x_{k-1}^\delta\rangle $$
in each iteration. In order to use ADMM, we cast this problem into the form of (42) by defining $z = \nabla x$,
$$ \varphi(x) := \tfrac{\lambda_k}{2}\|Ax - y^\delta\|^2 + \tfrac{\mu}{2}\|x - x_{k-1}^\delta\|^2 - \langle \xi_{k-1}^\delta, x - x_{k-1}^\delta\rangle, \qquad \phi(z) = \|z\|_1 - \|\nabla x_{k-1}^\delta\|_1, $$
$M = \nabla$, $B = -I$ and $d = 0$.
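A minimal sketch of the resulting ADMM loop is given below, assuming $A$ and the discrete gradient $\nabla$ are available as sparse matrices A and G acting on the vectorized image; the $z$-update is componentwise soft-thresholding (the proximal map of $\|\cdot\|_1$), and the penalty parameter rho, the names and the stopping tolerance are our choices, not the paper's.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def soft_threshold(v, kappa):
        """Proximal map of kappa*||.||_1 (componentwise soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    def admm_l2tv_step(A, G, y_delta, x_prev, xi_prev, lam, mu,
                       rho=1.0, n_iter=100, tol=1e-6):
        """ADMM for the splitting of Remark 6.1 (z = G x, M = G, B = -I, d = 0);
        A and G are scipy.sparse matrices on the vectorized image."""
        n = x_prev.size
        x, z = x_prev.copy(), G @ x_prev
        u = np.zeros_like(z)
        # quadratic part of the x-update of L_rho (kept factorized once)
        K = lam * (A.T @ A) + mu * sp.identity(n) + rho * (G.T @ G)
        solve = spla.factorized(K.tocsc())
        rhs_const = lam * (A.T @ y_delta) + mu * x_prev + xi_prev
        for _ in range(n_iter):
            x = solve(rhs_const - G.T @ u + rho * (G.T @ z))
            z = soft_threshold(G @ x + u / rho, 1.0 / rho)
            r = G @ x - z                      # primal residual  Mx + Bz - d
            u = u + rho * r
            if np.linalg.norm(r) <= tol:       # stopping rule of Section 5.2
                break
        return x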

6.3. Inverse potential problem

The third application considered in this section, the Inverse Potential Problem (IPP), consists of recovering a coefficient function $x: \Omega \to \mathbb{R}$ from measurements of the Cauchy data of its corresponding potential on the boundary of the domain $\Omega = (0,1)\times(0,1)$.

The direct problem is modelled by the linear operator $A: L^2(\Omega) \ni x \mapsto w_\nu|_{\partial\Omega} \in L^2(\partial\Omega)$, where $w \in H^1(\Omega)$ solves the elliptic boundary value problem
$$ -\Delta w = x \ \ \text{in } \Omega, \qquad w = 0 \ \ \text{on } \partial\Omega. \tag{43} $$
Since $x \in L^2(\Omega)$, the Dirichlet boundary value problem in (43) admits a unique solution (known as the potential) $w \in H^2(\Omega)\cap H_0^1(\Omega)$ [16].

The inverse problem we are concerned with consists in determining the piecewise constant source function $x$ from measurements of the Neumann trace of $w$ on $\partial\Omega$. Using the above notation, the IPP can be written in the abbreviated form (1), with data $y = w_\nu|_{\partial\Omega}$.

In our implementations we follow [8] in the experimental setup: we set $\Omega = (0,1)\times(0,1)$ and assume that $x^\star \in H^1(\Omega)$ is a function with sharp gradients (see Figure 6).

The boundary value problem (43) is solved for $w$ using $x = x^\star$, and the exact data $y = w_\nu|_{\partial\Omega}$ for the inverse problem is computed. The noisy data $y^\delta$ for the inverse problem is obtained by adding to $y$ a normally distributed noise with zero mean, in order to achieve a prescribed relative noise level.
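For illustration, the forward map of (43) can be discretized with a standard five-point finite-difference scheme; the sketch below (our discretization choices, not necessarily those used in the experiments) solves the Dirichlet problem on a uniform grid and approximates the Neumann trace by one-sided differences.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def forward_ipp(x_interior, h):
        """Sketch of the forward operator of (43): solve -Laplace(w) = x with
        w = 0 on the boundary (5-point stencil on an m x m interior grid), then
        return the outward normal derivative on the boundary."""
        m = x_interior.shape[0]
        T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
        L = sp.kronsum(T, T) / h**2                     # 2D Dirichlet Laplacian
        w = spla.spsolve(L.tocsr(), x_interior.ravel()).reshape(m, m)
        # since w = 0 on the boundary, w_nu ~ -w(first interior layer)/h on each edge
        return np.concatenate([w[0, :], w[-1, :], w[:, 0], w[:, -1]]) * (-1.0 / h)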

In our numerical implementations we set $\tau = 5$ (discrepancy principle constant), $\delta = 0.5\%$ (relative noise level), and the initial guess $x_0 \equiv 1.5$ (a constant function in $\Omega$). For the parameter space we choose $X = L^p(\Omega)$ with $p = 1.5$, while the data space is $Y = L^2(\partial\Omega)$.

Numerical results for the inverse potential problem are presented in Figure 5. The following methods are depicted:

  1. (RED) Lp-penalization, Geometric sequence;

  2. (BLACK) Lp-penalization, Newton method;

  3. (BLUE) Lp-penalization, Adaptive method;

Figure 5. Inverse Potential problem: Numerical experiments.


Figure 6. Inverse Potential problem: (TOP LEFT) Exact solution x⋆. Reconstructions (TOP RIGHT) L1–Geometric; (BOTTOM LEFT) L1–Newton; (BOTTOM RIGHT) L1–Adaptive.

The four pictures in Figure 5 represent: [TOP] Iteration error in the $L^p$-norm; [CENTER TOP] Residual in the $L^2$-norm; [CENTER BOTTOM] Number of linear systems/step; [BOTTOM] Lagrange multipliers.

The corresponding reconstruction results are shown in Figure 6: [TOP LEFT] Exact solution; [TOP RIGHT] Geometric method; [BOTTOM LEFT] Newton method; [BOTTOM RIGHT] Adaptive method.

7. Conclusions

In this manuscript we investigate a novel strategy for choosing the regularization parameters in the nonstationary iterated Tikhonov (nIT) method for solving ill-posed operator equations modelled by linear operators acting between Banach spaces. The novelty of our approach consists in the a posteriori choice of the sequence of regularization parameters (i.e. Lagrange multipliers) for the nIT method.

A preliminary (numerical) investigation of this method was conducted in [Citation9]. In the present manuscript we derive a complete convergence analysis, proving convergence, stability and semi-convergence results (see Section 4). In Sections 6.1 and 6.2 we revisit two numerical applications discussed in [Citation9]. Moreover, in Section 6.3 we investigate a classical benchmark problem in the inverse problems literature, namely the 2D elliptic Inverse Potential Problem.

The Lagrange multipliers are chosen (a posteriori) in order to enforce a fast decay of the residual functional (see Algorithm 1 and Section 4). The computation of these multipliers is performed by means of three distinct methods: (1) a secant type method; (2) a Newton type method; (3) an adaptive method using a geometric sequence with non-constant growth rate, where the rate is updated after each step.

The computation of the iterative step of the nIT method requires the minimization of a Tikhonov Functional (see Section 4). This task is solved here using two distinct methods: (1) in the case of smooth penalization and Banach parameter-spaces the optimality condition (related to the Tikhonov functional) leads to a nonlinear equation, which is solved using a Newton type method; (2) in the case of nonsmooth penalization and Hilbert parameter-space, the ADMM method is used for minimizing the Tikhonov functional.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

A.L. acknowledges support from CNPq [grant number 311087/2017-5], and from the AvH Foundation.

Notes

1 The differentiability and convexity properties of this functional are independent of the particular choice of p>1.

2 This lemma guarantees that, given a reflexive Banach space $E$ and a nonempty, closed and convex set $A \subset E$, any convex, l.s.c., proper function $\phi: A \to (-\infty,\infty]$ attains its minimum on $A$.

3 Here (1) is replaced by $Ag = y^\delta$.

4 Notice that we are dealing with a discrete inverse problem, and discretization errors associated to the continuous model are not being considered.

5 For the purpose of comparison, the iteration error is plotted in the $L^2$-norm for both choices of the parameter space, $X = L^2$ and $X = L^{1.001}$.

References

  • Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Dordrecht: Kluwer Academic Publishers Group; 1996. (Mathematics and its applications; 375). MR 1408680.
  • Hanke M, Groetsch CW. Nonstationary iterated Tikhonov regularization. J Optim Theory Appl. 1998;98(1):37–53. doi: 10.1023/A:1022680629327
  • Jin Q, Stals L. Nonstationary iterated Tikhonov regularization for ill-posed problems in Banach spaces. Inverse Probl. 2012;28(3):104011.
  • Schuster T, Kaltenbacher B, Hofmann B, et al. Regularization methods in Banach spaces. Berlin: de Gruyter; 2012.
  • De Cezaro A, Baumeister J, Leitão A. Modified iterated Tikhonov methods for solving systems of nonlinear ill-posed equations. Inverse Probl Imaging. 2011;5(1):1–17. doi: 10.3934/ipi.2011.5.1
  • Jin Q, Zhong M. Nonstationary iterated Tikhonov regularization in Banach spaces with uniformly convex penalty terms. Numer Math. 2014;127(3):485–513. MR 3216817 doi: 10.1007/s00211-013-0594-9
  • Lorenz D, Schoepfer F, Wenger S. The linearized bregman method via split feasibility problems: analysis and generalizations. SIAM J Imaging Sci. 2014;7(2):1237–1262. doi: 10.1137/130936269
  • Boiger R, Leitão A, Svaiter BF. Range-relaxed criteria for choosing the Lagrange multipliers in nonstationary iterated Tikhonov method. IMA J Numer Anal. 2019;39(1). DOI:10.1093/imanum/dry066
  • Machado MP, Margotti F, Leitão A. On nonstationary iterated Tikhonov methods for ill-posed equations in Banach spaces. In: Hofmann B, Leitão A, Zubelli JP, editors. New trends in parameter identification for mathematical models. Cham: Springer International Publishing; 2018. p. 175–193.
  • Cioranescu I. Geometry of Banach spaces, duality mappings and nonlinear problems. Dordrecht: Kluwer Academic Publishers Group; 1990. (Mathematics and its applications; 62). MR 1079061.
  • Brezis H. Functional analysis, Sobolev spaces and partial differential equations. New York: Springer; 2011. (Universitext). MR 2759829.
  • Rockafellar RT. Conjugate duality and optimization. Philadelphia (PA): Society for Industrial and Applied Mathematics; 1974. (CBMS-NSF Regional Conference Series in Applied Mathematics; 16). Lectures given at Johns Hopkins University, Baltimore, MD, June 1973. MR 0373611.
  • Eckstein J, Bertsekas DP. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Program. 1992;55(3):293–318. MR 1168183. doi: 10.1007/BF01581204
  • Glowinski R, Marrocco A. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Rev. Française Automat. Informat. Recherche Opérationnelle Sér. Rouge Anal. Numér.. 1975;9(R-2):41–76. MR 0388811.
  • Gabay D, Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl. 1976;2(1):17–40. doi: 10.1016/0898-1221(76)90003-1
  • Evans LC. Partial differential equations. Providence, RI: American Mathematical Society; 1998. (Graduate studies in mathematics; 19).

Appendix

In this appendix we discuss how to compute the Gâteaux derivative of $J_p$. Given $g \in L^p(\Omega)$, the key idea for deriving a formula for $J_p'(g)$ is to observe the differentiability of the function $\gamma: \mathbb{R} \ni x \mapsto p^{-1}|x|^p \in \mathbb{R}$. This function is differentiable in $\mathbb{R}$ whenever $p > 1$ and, in this case, it holds
$$ \gamma'(x) = |x|^{p-1}\operatorname{sign}(x), \quad\text{where}\quad \operatorname{sign}(x) = \begin{cases} 1, & x > 0,\\ 0, & x = 0,\\ -1, & x < 0. \end{cases} \tag{A1} $$
Furthermore, $\gamma$ is twice differentiable in $\mathbb{R}$ if $p \geq 2$, with second derivative given by
$$ \gamma''(x) = (p-1)|x|^{p-2}. \tag{A2} $$
This formula still holds true for $1 < p < 2$, but only in $\mathbb{R}\setminus\{0\}$. In this case, $\gamma''(0)$ does not exist and $\gamma''(x)$ grows to infinity as $x$ approaches zero.

Since $J_p(g) = \partial\big(\tfrac{1}{p}\|g\|_{L^p}^p\big)$ can be identified with
$$ J_p(g) = |g|^{p-1}\operatorname{sign}(g) \tag{A3} $$
(see [10]), which looks very similar to $\gamma'$ in (A1), the bounded linear operator $J_p'(g): L^p(\Omega) \to L^{p^*}(\Omega)$ (with $p^*$ the conjugate exponent of $p$) should be, in some sense, similar to $\gamma''$ in (A2). We then fix $g \in L^p(\Omega)$ with $p \geq 2$ and try to prove (41), i.e.
$$ (J_p'(g))\,h = (p-1)|g|^{p-2}\,h, \qquad h \in L^p(\Omega), $$
where the linear operator $(p-1)|g|^{p-2}: h \mapsto (p-1)|g(\cdot)|^{p-2}h(\cdot)$ is to be understood in the pointwise sense. This ensures that $J_p$ is Gâteaux-differentiable in $L^p(\Omega)$ and that its derivative $J_p'$ can be identified with $(p-1)|\cdot|^{p-2}$.

Note that, given $h \in L^p(\Omega)$, equality (41) holds iff $\lim_{t\to 0} f_t = (p-1)|g(\cdot)|^{p-2}h(\cdot)$, where $f_t: \Omega \to \mathbb{R}$ is defined by $f_t(x) := t^{-1}\big[J_p(g + th) - J_p(g)\big](x)$. This is equivalent to proving that
$$ \lim_{t\to 0}\left\| f_t - (p-1)|g(\cdot)|^{p-2}h(\cdot)\right\|_{L^{p^*}(\Omega)}^{p^*} = 0. $$
In view of (A3) and (A2), it follows that for each fixed $x \in \Omega$ we have
$$ \lim_{t\to 0} f_t(x) = \lim_{t\to 0} t^{-1}\Big[|g(x)+th(x)|^{p-1}\operatorname{sign}(g(x)+th(x)) - |g(x)|^{p-1}\operatorname{sign}(g(x))\Big] = \frac{\mathrm{d}}{\mathrm{d}t}\Big[|g(x)+th(x)|^{p-1}\operatorname{sign}(g(x)+th(x))\Big]\Big|_{t=0} = \Big[(p-1)|g(x)+th(x)|^{p-2}h(x)\Big]\Big|_{t=0} = (p-1)|g(x)|^{p-2}h(x). $$
Thus, making use of Lebesgue's Dominated Convergence Theorem, we conclude that
$$ \lim_{t\to 0}\left\|f_t - (p-1)|g(\cdot)|^{p-2}h(\cdot)\right\|_{L^{p^*}(\Omega)}^{p^*} = \lim_{t\to 0}\int_\Omega \left|f_t(x) - (p-1)|g(x)|^{p-2}h(x)\right|^{p^*}\mathrm{d}x = \int_\Omega \lim_{t\to 0}\left|f_t(x) - (p-1)|g(x)|^{p-2}h(x)\right|^{p^*}\mathrm{d}x = 0, $$
which proves (41).
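As a quick numerical sanity check of (41) (illustrative only; the test data and names are ours), one can compare the difference quotient of $J_p$ with the pointwise multiplier on a discretized $L^p$ space:

    import numpy as np

    def jp(g, p):
        return np.abs(g) ** (p - 1) * np.sign(g)      # duality mapping, cf. (A3)

    rng = np.random.default_rng(0)
    p, t = 3.0, 1e-6
    g = rng.standard_normal(200)
    h = rng.standard_normal(200)

    quotient = (jp(g + t * h, p) - jp(g, p)) / t       # the function f_t above
    derivative = (p - 1) * np.abs(g) ** (p - 2) * h    # right-hand side of (41)
    print(np.max(np.abs(quotient - derivative)))       # small for p >= 2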
