Inverse Problems in Science and Engineering Volume 28, 2020 - Issue 6

On the choice of Lagrange multipliers in the iterated Tikhonov method for linear ill-posed equations in Banach spaces

M. P. Machado, Department of Mathematics, Federal University of Bahia, Salvador, Brazil

F. Margotti, Department of Mathematics, Federal University of St. Catarina, Florianópolis, Brazil

A. Leitão (corresponding author), Department of Mathematics, Federal University of St. Catarina, Florianópolis, Brazil

Pages 796-826 | Received 21 Feb 2019, Accepted 15 Aug 2019, Published online: 09 Sep 2019

https://doi.org/10.1080/17415977.2019.1662001

ABSTRACT

This article is devoted to the study of nonstationary Iterated Tikhonov (nIT) type methods (Hanke M, Groetsch CW. Nonstationary iterated Tikhonov regularization. J Optim Theory Appl. 1998;98(1):37–53; Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Vol. 375, Mathematics and its Applications. Dordrecht: Kluwer Academic Publishers Group; 1996. MR 1408680) for obtaining stable approximations to linear ill-posed problems modelled by operators mapping between Banach spaces. Here we propose and analyse an a posteriori strategy for choosing the sequence of regularization parameters for the nIT method, aiming to obtain a pre-defined decay rate of the residual. Convergence analysis of the proposed nIT type method is provided (convergence, stability and semi-convergence results). Moreover, in order to test the method's efficiency, numerical experiments for three distinct applications are conducted: (i) a 1D convolution problem (smooth Tikhonov functional and Banach parameter-space); (ii) a 2D deblurring problem (nonsmooth Tikhonov functional and Hilbert parameter-space); (iii) a 2D elliptic inverse potential problem.

KEYWORDS:

Ill-posed problems
Banach spaces
linear operators
iterated Tikhonov method

2010 MATHEMATICS SUBJECT CLASSIFICATIONS:

65J20
47J06

1. Introduction

In this article we investigate nonstationary Iterated Tikhonov (nIT) type methods [Citation1,Citation2] for obtaining stable approximations of linear ill-posed problems modelled by operators mapping between Banach spaces. The novelty of our approach consists in the introduction of an a posteriori strategy for choosing the sequence of regularization parameters (or, equivalently, the Lagrange multipliers) for the nIT iteration, which play a key role in the convergence speed of the nIT iteration.

This new a posteriori strategy aims to enforce a pre-defined decay of the residual in each iteration; it differs from the classical choice for the Lagrange multipliers (see, e.g. [Citation2,Citation3]), which is based on an a priori strategy (typically geometrical) and leads to an unknown decay rate of the residual.

The inverse problem we are interested in consists of determining an unknown quantity $x \in X$ from given data $y \in Y$, where X, Y are Banach spaces. We assume that data are obtained by indirect measurements of the parameter, this process being described by the ill-posed operator equation (1) $A x = y,$ where $A : X \to Y$ is a bounded linear operator, whose inverse $A^{- 1} : R (A) \to X$ either does not exist, or is not continuous. In practical situations, one does not know the data y exactly; instead, only approximate measured data $y^{δ} \in Y$ are available with (2) $‖ y^{δ} - y ‖_{Y} \leq δ,$ where $δ > 0$ is the (known) noise level. A comprehensive study of linear ill-posed problems in Banach spaces can be found in the textbook [Citation4] (see, e.g. [Citation1] for a corresponding theory in Hilbert spaces).

Iterated Tikhonov type methods are typically used for linear inverse problems. In the Hilbert space setting we refer the reader to [Citation2] for linear operator equations, and also to [Citation5] for the nonlinear case. In the Banach space setting the research is still ongoing. Some preliminary results can be found in [Citation3] for linear operator equations; see [Citation6] for the nonlinear case; see also [Citation7]. In all references above, a priori strategies are used for choosing the Lagrange multipliers.

1.1. Main results: presentation and interpretation

The approach discussed in this manuscript is devoted to the Banach space setting, and consists in adopting an a posteriori strategy for the choice of the Lagrange multipliers. The strategy used here is inspired by the recent work [Citation8], where the authors propose an endogenous strategy for the choice of the Lagrange multipliers in the nonstationary iterated Tikhonov method for solving (1), (2) when X and Y are Hilbert spaces. The penalty terms used in our Tikhonov functionals are the same as in [Citation6] and consist of Bregman distances induced by (uniformly) convex functionals, e.g. the sum of the $L^{2}$-norm with the TV-seminorm. In our previous work [Citation9], we implemented the method proposed in this paper and investigated its numerical performance (two different ill-posed problems were solved). Here, we extend our previous results by presenting a complete convergence analysis. Additionally, the corresponding algorithm is implemented for solving three benchmark problems and more details concerning the computation of the minimizers of the Tikhonov functional are provided; our numerical results are compared with the ones obtained by the nIT method with the classical geometrical choice of Lagrange multipliers [Citation6].

In what follows, we briefly interpret the main results: The proposed method defines each Lagrange multiplier such that the residual of the corresponding next iterate lies in an interval which depends on both the noise level and the residual of the current iterate (see (23): $[(1 - η_{0}) δ + η_{0} ‖ A x_{k}^{δ} - y^{δ} ‖, (1 - η_{1}) δ + η_{1} ‖ A x_{k}^{δ} - y^{δ} ‖]$). This fact has the following consequences: (1) it forces a geometrical decay of the residual (Proposition 4.1); (2) it guarantees the possibility of computing multipliers in agreement with the theoretical convergence results (Theorems 4.6, 4.8 and 4.9); (3) the computation of the multipliers demands less numerical effort than the classical strategy of computing the multipliers by solving an equation in each iteration; (4) the next iterate is not uniquely determined by the current one; instead, it is chosen within a set of successors of the current iterate (Definition 4.7). We also address the actual computation of the Lagrange multipliers. Since each multiplier is implicitly defined by an inequality, we discuss a numerically efficient strategy for computing them, which is based on the decrease rate of the past residuals (Section 5.1).

1.2. Outline of the article

This manuscript is outlined as follows: In Section 2 a review of relevant background material is presented. In Section 3 the new nIT method is introduced. Section 4 is devoted to the convergence analysis of the nIT method. In Section 5 possible implementations of our method are discussed; the evaluation of the Lagrange multipliers is addressed, as well as the issue of minimizing the Tikhonov functionals. Section 6 is devoted to numerical experiments, while Section 7 is dedicated to final remarks and conclusions.

2. Background material

For details on the material discussed in this section, we refer the reader to the textbooks [Citation4,Citation10].

Unless the contrary is explicitly stated, we always consider X a real Banach space. The effective domain of the convex functional $f : X \to \bar{R} := (- \infty, \infty]$ is defined as $Dom (f) := \{x \in X : f (x) < \infty\} .$ The set $Dom (f)$ is always convex and we call f proper provided it is non-empty. We call f uniformly convex if there exists a continuous and strictly increasing function $φ : R_{0}^{+} \to R_{0}^{+}$ with the property that $φ (t) = 0$ implies $t = 0,$ such that (3) $f (λ x + (1 - λ) y) + λ (1 - λ) φ (‖ x - y ‖) \leq λ f (x) + (1 - λ) f (y),$ for all $λ \in (0, 1)$ and $x, y \in X .$ Of course f uniformly convex implies f strictly convex, which in turn implies f convex. The functional f is lower semi-continuous (in short l.s.c.) if for any sequence $(x_{k})_{k \in N} \subset X$ satisfying $x_{k} \to x$, it holds $f (x) \leq \liminf_{k \to \infty} f (x_{k}) .$ It is called weakly lower semi-continuous (w.l.s.c.) if the above property holds true with $x_{k} \to x$ replaced by $x_{k} ⇀ x .$ Obviously, every w.l.s.c. functional is l.s.c. Further, any Banach space norm is w.l.s.c.

The sub-differential of a functional $f : X \to \bar{R}$ is the point-to-set mapping $\partial f : X \to 2^{X^{*}}$ defined by $\partial f (x) := \{x^{*} \in X^{*} : f (x) + ⟨ x^{*}, y - x ⟩ \leq f (y) \text{ for all } y \in X\} .$ Any element in the set $\partial f (x)$ is called a sub-gradient of f at x. The effective domain of $\partial f$ is the set $Dom (\partial f) := \{x \in X : \partial f (x) \neq \emptyset\} .$ It is clear that the inclusion $Dom (\partial f) \subset Dom (f)$ holds whenever f is proper.

Sub-differentiable functionals and l.s.c. convex functionals are very closely related concepts. In fact, a sub-differentiable functional f is convex and l.s.c. in any open convex set of $Dom (f) .$ On the other hand, a proper, convex and l.s.c. functional is always sub-differentiable on its effective domain.

The definition of sub-differential readily yields $0 \in \partial f (x) ⟺ f (x) \leq f (y) \text{ for all } y \in X .$ If $f, g : X \to \bar{R}$ are convex functionals and there is a point $x \in Dom (f) \cap Dom (g)$ where f is continuous, then (4) $\partial (f + g) (x) = \partial f (x) + \partial g (x) \text{ for all } x \in X .$ Moreover, if Y is a real Banach space, $h : Y \to \bar{R}$ is a convex functional, $b \in Y,$ $A : X \to Y$ is a bounded linear operator and h is continuous at some point of the range of A, then $\partial (h (\cdot - b)) (y) = (\partial h) (y - b) \text{ and } \partial (h \circ A) (x) = A^{*} (\partial h (A x)),$ for all $x \in X$ and $y \in Y,$ where $A^{*} : Y^{*} \to X^{*}$ is the Banach-adjoint of A. Consequently, (5) $\partial (h (A \cdot - b)) (x) = A^{*} (\partial h) (A x - b) \text{ for all } x \in X .$ If a convex functional $f : X \to \bar{R}$ is Gâteaux-differentiable at $x \in X,$ then f has a unique sub-gradient at x, namely, the Gâteaux-derivative itself: $\partial f (x) = \{\nabla f (x)\} .$

The sub-differential of the convex functional (6) $f (x) = \frac{1}{p} ‖ x ‖^{p}, \quad p > 1,$ is called the duality mapping and is denoted by $J_{p} .$ It can be shown that for all $x \in X,$ $J_{p} (x) = \{x^{*} \in X^{*} : ⟨ x^{*}, x ⟩ = ‖ x^{*} ‖ ‖ x ‖ \text{ and } ‖ x^{*} ‖ = ‖ x ‖^{p - 1}\} .$ Thus, the duality mapping has the inner-product-like properties: $⟨ x^{*}, y ⟩ \leq ‖ x ‖^{p - 1} ‖ y ‖ \text{ and } ⟨ x^{*}, x ⟩ = ‖ x ‖^{p},$ for all $x^{*} \in J_{p} (x)$ and $y \in X .$ By using the Riesz Representation Theorem, one can prove that $J_{2} (x) = x$ for all $x \in X$ whenever X is a Hilbert space.
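The two inner-product-like properties can be checked numerically in a concrete case. The following is a minimal sketch, assuming the space is the finite-dimensional sequence space $ℓ^{p}$ with the same exponent p as in (6), where the duality mapping acts componentwise as $|x_{i}|^{p - 1} sign (x_{i})$; the helper names `norm_p`, `duality_map` and `pairing` are illustrative and do not come from the article.

```python
import math

def norm_p(x, p):
    """l^p norm of a finite sequence x, for 1 < p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def duality_map(x, p):
    """J_p on l^p: componentwise gradient of (1/p) * ||x||_p^p."""
    return [math.copysign(abs(t) ** (p - 1), t) for t in x]

def pairing(x_star, x):
    """Duality pairing <x*, x> between l^q and l^p."""
    return sum(s * t for s, t in zip(x_star, x))

p = 1.5
q = p / (p - 1)                     # conjugate exponent: 1/p + 1/q = 1
x = [1.0, -2.0, 0.5]
jx = duality_map(x, p)

pairing_value = pairing(jx, x)      # expected to match ||x||_p^p
dual_norm = norm_p(jx, q)           # expected to match ||x||_p^(p-1)
```

For p = 2 the map reduces to the identity, in agreement with $J_{2} (x) = x$ in Hilbert spaces.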

Banach spaces are classified according to their geometrical characteristics. Many concepts concerning these characteristics are usually defined using the so-called modulus of convexity and modulus of smoothness, but most of these definitions can be equivalently stated in terms of the properties of the functional f defined in (6).¹ This functional is convex and sub-differentiable in any Banach space X. If the functional f in (6) is Gâteaux-differentiable in the whole space X, this Banach space is called smooth. In this case, $J_{p} (x) = \partial f (x) = \{\nabla f (x)\}$ and therefore, the duality mapping $J_{p} : X \to X^{*}$ is single-valued. If the functional f in (6) is Fréchet-differentiable in X, this space is called locally uniformly smooth and it is called uniformly smooth provided f is uniformly Fréchet-differentiable in bounded sets. As a result, the duality mapping is continuous (resp. uniformly continuous in bounded sets) in locally uniformly smooth (resp. uniformly smooth) spaces. It is immediate that uniform smoothness of a Banach space implies local uniform smoothness, which in turn implies smoothness of this space. Moreover, none of the converse implications holds. Similarly, a Banach space X is called strictly convex whenever the functional f in (6) is strictly convex. Moreover, X is called uniformly convex if the functional f in (6) is uniformly convex. It is clear that uniform convexity implies strict convexity. It is well-known that both uniformly smooth and uniformly convex Banach spaces are reflexive.

Assume f is proper. Then, choosing elements $x, y \in X$ with $y \in Dom (\partial f),$ we define the Bregman distance between x and y in the direction of $ξ \in \partial f (y)$ as $D_{ξ} f (x, y) := f (x) - f (y) - ⟨ ξ, x - y ⟩ .$ Obviously, $D_{ξ} f (y, y) = 0$ and, since $ξ \in \partial f (y),$ it additionally holds that $D_{ξ} f (x, y) \geq 0.$ Moreover, it is straightforward to prove the Three Points Identity: $D_{ξ_{1}} f (x_{2}, x_{1}) - D_{ξ_{1}} f (x_{3}, x_{1}) = D_{ξ_{3}} f (x_{2}, x_{3}) + ⟨ ξ_{3} - ξ_{1}, x_{2} - x_{3} ⟩,$ for all $x_{2} \in X,$ $x_{1}, x_{3} \in Dom (\partial f),$ $ξ_{1} \in \partial f (x_{1})$ and $ξ_{3} \in \partial f (x_{3}) .$ Further, the functional $D_{ξ} f (\cdot, y)$ is strictly convex whenever f is strictly convex, and in this case, $D_{ξ} f (x, y) = 0$ iff $x = y .$

When f is the functional defined in (6) and X is a smooth Banach space, the Bregman distance has the special notation $Δ_{p} (x, y),$ i.e. $Δ_{p} (x, y) := \frac{1}{p} ‖ x ‖^{p} - \frac{1}{p} ‖ y ‖^{p} - ⟨ J_{p} (y), x - y ⟩ .$ Since $J_{2}$ is the identity operator in Hilbert spaces, a simple application of the polarization identity shows that $Δ_{2} (x, y) = \frac{1}{2} ‖ x - y ‖^{2}$ in these spaces.
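As an illustration of the definition of $Δ_{p}$, the sketch below evaluates the Bregman distance in the finite-dimensional space $ℓ^{p}$ and compares the case p = 2 with half the squared Euclidean distance. The choice of $ℓ^{p}$ and the function names are assumptions made for this example only.

```python
import math

def norm_p(x, p):
    """l^p norm of a finite sequence x."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def duality_map(x, p):
    """J_p on l^p, acting componentwise."""
    return [math.copysign(abs(t) ** (p - 1), t) for t in x]

def bregman_p(x, y, p):
    """Delta_p(x, y) = (1/p)||x||^p - (1/p)||y||^p - <J_p(y), x - y>."""
    jy = duality_map(y, p)
    linear_term = sum(j * (a - b) for j, a, b in zip(jy, x, y))
    return norm_p(x, p) ** p / p - norm_p(y, p) ** p / p - linear_term

x = [1.0, -2.0, 0.5]
y = [0.3, 0.4, -1.0]

# In the Hilbert case p = 2 the Bregman distance is half the squared distance.
half_sq_dist = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
delta_2 = bregman_p(x, y, 2)

# For other exponents the distance is still nonnegative but not symmetric in general.
delta_15_xy = bregman_p(x, y, 1.5)
delta_15_yx = bregman_p(y, x, 1.5)
```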

It is not difficult to prove (see e.g. [Citation9]) that if $f : X \to \bar{R}$ is uniformly convex, then (7) $φ (‖ x - y ‖) \leq D_{ξ} f (x, y)$ for all $x \in X,$ $y \in Dom (\partial f)$ and $ξ \in \partial f (y),$ where $φ$ is the function in (3). In particular, in a smooth and uniformly convex Banach space X, the above inequality reads $φ (‖ x - y ‖) \leq Δ_{p} (x, y) .$

We say that a functional $f : X \to \bar{R}$ has the Kadec property if for any sequence $(x_{k})_{k \in N} \subset X,$ the weak convergence $x_{k} ⇀ x$ , together with $f (x_{k}) \to f (x) < \infty$ , implies $x_{k} \to x .$ It is not difficult to prove (see e.g. [Citation6]) that any proper, w.l.s.c. and uniformly convex functional has the Kadec property. In particular, the norm in a uniformly convex Banach space has this property.

Concrete examples of Banach spaces of interest are the Lebesgue space $L^{p} (Ω),$ the Sobolev space $W^{n, p} (Ω),$ $n \in N,$ and the space of $p$-summable sequences $ℓ^{p} (R) .$ All these Banach spaces are both uniformly convex and uniformly smooth provided that $1 < p < \infty$.

3. The nIT method

In this section, we present the nonstationary iterated Tikhonov (nIT) type method considered in this article, which aims to find stable approximate solutions to the inverse problem (1), (2). The method proposed here is in the spirit of the method in [Citation6]. The distinguishing feature is the use of an a posteriori strategy for the choice of the Lagrange multipliers, as detailed below.

For a fixed $r > 1$ and a uniformly convex penalty term f, the nIT method defines sequences $(x_{k}^{δ})_{k \in N}$ in X and $(ξ_{k}^{δ})_{k \in N}$ in $X^{*}$ iteratively by $\begin{aligned} x_{k}^{δ} & := \arg\min_{x \in X} \, λ_{k}^{δ} r^{- 1} ‖ A x - y^{δ} ‖^{r} + D_{ξ_{k - 1}^{δ}} f (x, x_{k - 1}^{δ}), \\ ξ_{k}^{δ} & := ξ_{k - 1}^{δ} - λ_{k}^{δ} A^{*} J_{r} (A x_{k}^{δ} - y^{δ}), \end{aligned}$ where the Lagrange multiplier $λ_{k}^{δ} > 0$ is to be determined using only information about A, δ, $y^{δ}$ and $x_{k - 1}^{δ}$.
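To make the two update formulas concrete, here is a minimal sketch of a single nIT step under strong simplifying assumptions that are not part of the article's general setting: X = Y is a finite-dimensional Hilbert space, r = 2, $f (x) = \frac{1}{2} ‖ x ‖^{2}$ (so that $ξ_{k} = x_{k}$ and the Bregman distance is half the squared distance), and A is a diagonal operator stored as a list of its diagonal entries. The function name `nit_step` and the sample data are hypothetical.

```python
def nit_step(a, y_delta, x_prev, xi_prev, lam):
    """One nIT step for a diagonal operator A = diag(a), r = 2, f = 0.5*||x||^2.

    The minimizer of lam/2 * ||A x - y_delta||^2 + 0.5 * ||x - x_prev||^2
    solves (1 + lam * a_i^2) x_i = x_prev_i + lam * a_i * y_i componentwise.
    """
    x_new = [(xp + lam * ai * yi) / (1.0 + lam * ai * ai)
             for ai, yi, xp in zip(a, y_delta, x_prev)]
    residual = [ai * xi - yi for ai, xi, yi in zip(a, x_new, y_delta)]
    # Sub-gradient update: J_2 is the identity and A* = A for a real diagonal A.
    xi_new = [xo - lam * ai * ri for xo, ai, ri in zip(xi_prev, a, residual)]
    return x_new, xi_new

a = [1.0, 0.5, 0.1]                  # decaying entries mimic ill-posedness
y_delta = [1.01, 0.49, 0.11]         # hypothetical noisy data
x0 = [0.0, 0.0, 0.0]
xi0 = list(x0)                       # xi_0 = x_0 because the gradient of f is the identity
x1, xi1 = nit_step(a, y_delta, x0, xi0, lam=2.0)
```

In this special case the updated sub-gradient coincides with the new iterate, which is the Hilbert-space form of the inclusion $ξ_{k}^{δ} \in \partial f (x_{k}^{δ})$.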

Our strategy for choosing the Lagrange multipliers is inspired by the work [Citation8], where the authors propose an endogenous strategy for the choice of the Lagrange multipliers in the nonstationary iterated Tikhonov method for solving (1), (2) when X and Y are Hilbert spaces. This method is based on successive orthogonal projection methods onto a family of shrinking, separating convex sets. Specifically, the iterative method in [Citation8] obtains the new iterate by projecting the current one onto a level set of the residual function, whose level belongs to a range defined by the current residual and by the noise level. Moreover, the admissible Lagrange multipliers (in each iteration) are to be chosen in a non-degenerate interval.

Aiming to extend this framework to the Banach space setting, we are forced to introduce Bregman distances and Bregman projections. This is due to the well-known fact that in Banach spaces the metric projection onto a convex and closed set C, defined as $P_{C} (x) = \arg\min_{z \in C} ‖ z - x ‖^{2}$, loses the decreasing distance property of the orthogonal projection in Hilbert spaces. In order to recover this property, one should minimize in Banach spaces the Bregman distance, instead of the norm-induced distance.

For the remainder of this article we adopt the following main assumptions:

(A.1) There exists an element $x^{⋆} \in X$ such that $A x^{⋆} = y$, where $y \in R (A)$ is the exact data.
(A.2) f is a w.l.s.c. function.
(A.3) f is a uniformly convex function.
(A.4) X and Y are reflexive Banach spaces and Y is smooth.

Moreover, we denote by $Ω_{μ}^{r} \subset X$ the μ-level set of the residual functional $‖ A x - y^{δ} ‖$, i.e. $Ω_{μ}^{r} := \{x \in X : r^{- 1} ‖ A x - y^{δ} ‖^{r} \leq r^{- 1} μ^{r}\} .$ Note that, since A is a continuous linear operator, it follows that $Ω_{μ}^{r}$ is closed and convex. Now, given $\hat{x} \in Dom (\partial f)$ and $ξ \in \partial f (\hat{x})$, we define the Bregman projection of $\hat{x}$ onto $Ω_{μ}^{r}$ as a solution of the minimization problem (8) $\begin{cases} \min D_{ξ} f (x, \hat{x}) \\ \text{s.t. } r^{- 1} ‖ A x - y^{δ} ‖^{r} \leq r^{- 1} μ^{r} . \end{cases}$ It is worth noticing that a solution of problem (8) depends on the sub-gradient ξ. Furthermore, since $D_{ξ} f (\cdot, \hat{x})$ is strictly convex (which follows from the uniform convexity of f), problem (8) has at most one solution. The fact that the Bregman projection is well defined when $μ > δ$ (in this case we set $P_{Ω_{μ}^{r}}^{f} (\hat{x}) := \arg\min_{x \in Ω_{μ}^{r}} D_{ξ} f (x, \hat{x})$) is a consequence of the following lemma.

Lemma 3.1

If $μ > δ,$ then problem (8) has a solution.

Proof.

Assumption (A.1), together with equation (2) and the assumption that $μ > δ$, implies that the feasible set of problem (8), i.e. the set $Ω_{μ}^{r},$ is nonempty.

From Assumptions (A.2) and (A.3) it follows that $D_{ξ} f (\cdot, \hat{x})$ is proper, convex and l.s.c. Furthermore, relation (7) implies that $D_{ξ} f (\cdot, \hat{x})$ is a coercive function. Hence, the lemma follows using the reflexivity of X together with [Citation11, Corollary 3.23].²

Note that if $0 \leq μ' \leq μ$, then $Ω_{μ'}^{r} \subseteq Ω_{μ}^{r}$, and that $A^{- 1} (y) \subset Ω_{μ}^{r}$ for all $μ \geq δ$. Furthermore, given the available information on the solution set of (1), $Ω_{δ}^{r}$ is the set of best possible approximate solutions for this inverse problem. However, since problem (8) may be ill-posed when $μ = δ$, our best choice is to generate $x_{k}^{δ}$ from $x_{k - 1}^{δ} \notin Ω_{δ}^{r}$ as a solution of problem (8), with $\hat{x} = x_{k - 1}^{δ}$ and $μ = μ_{k}$ chosen so as to guarantee a reduction of the residual norm while preventing ill-posedness of (8).

For this purpose, we analyse in the sequel the minimization problem (8) by means of Lagrange multipliers. The Lagrangian function associated to problem (8) is $L (x, λ) = \frac{λ}{r} (‖ A x - y^{δ} ‖^{r} - μ^{r}) + D_{ξ} f (x, \hat{x}) .$ Note that, for each $λ > 0$, the function $L (\cdot, λ) : X \to \bar{R}$ is l.s.c. and convex. For any $λ > 0$ define the functions (9) $π (\hat{x}, λ) := \arg\min_{x \in X} L (x, λ), \quad G_{\hat{x}} (λ) := ‖ A π (\hat{x}, λ) - y^{δ} ‖^{r} .$ The next lemma provides a classical Lagrange multiplier result for problem (8), which will be useful for formulating the nIT method.

Lemma 3.2

If $‖ A \hat{x} - y^{δ} ‖ > μ > δ,$ then the following assertions are equivalent:

x is a solution of (8);
there exists $λ^{*} > 0$ satisfying $x = π (\hat{x}, λ^{*})$ and $G_{\hat{x}} (λ^{*}) = μ^{r} .$

Proof.

It follows from (2), Assumption (A.1) and the hypothesis $μ > δ$ that $x^{⋆} \in X$ satisfies $‖ A x^{⋆} - y^{δ} ‖^{r} < μ^{r} .$ This inequality implies the Slater condition for problem (8). Thus, since A is continuous and $D_{ξ} f (\cdot, \hat{x})$ is l.s.c., we conclude that x is a solution of (8) if and only if there exists $λ \in R$ such that the point $(x, λ)$ satisfies the Karush–Kuhn–Tucker (KKT) conditions for this minimization problem [Citation12], namely $λ \geq 0, \quad G_{\hat{x}} (λ) \leq μ^{r}, \quad λ (G_{\hat{x}} (λ) - μ^{r}) = 0, \quad 0 \in \partial_{x} L (x, λ) .$ If we assume $λ = 0$ in the relations above, then the definition of the Lagrangian function, together with the strict convexity of $D_{ξ} f (\cdot, \hat{x})$, implies that $\hat{x}$ is the unique minimizer of $L (\cdot, 0)$. Moreover, since $‖ A \hat{x} - y^{δ} ‖ > μ$, we conclude that the pair $(\hat{x}, 0)$ does not satisfy the KKT conditions. Consequently, we have $λ > 0$ and $G_{\hat{x}} (λ) - μ^{r} = 0$. The lemma follows using the definition of $π (\hat{x}, λ)$.
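Lemma 3.2 characterizes the Bregman projection through the scalar equation $G_{\hat{x}} (λ^{*}) = μ^{r}$. The sketch below solves this equation by bracketing and bisection in a simplified setting that is assumed for illustration only: a finite-dimensional Hilbert space, r = 2, $f (x) = \frac{1}{2} ‖ x ‖^{2}$ and a diagonal operator with nonzero entries, for which the residual of $π (\hat{x}, λ)$ has a closed form and decreases as λ grows. All names and data are hypothetical.

```python
import math

def residual_norm(a, y_delta, x_hat, lam):
    """||A pi(x_hat, lam) - y_delta|| for A = diag(a), r = 2, f = 0.5*||x||^2.

    Componentwise, the residual at the minimizer equals
    (a_i * x_hat_i - y_i) / (1 + lam * a_i^2).
    """
    return math.sqrt(sum(((ai * xh - yi) / (1.0 + lam * ai * ai)) ** 2
                         for ai, xh, yi in zip(a, x_hat, y_delta)))

def find_multiplier(a, y_delta, x_hat, mu, iterations=200):
    """Approximate lam* with residual_norm(lam*) = mu (equivalent to G(lam*) = mu^2)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):          # enlarge the bracket until the residual drops to mu
        if residual_norm(a, y_delta, x_hat, hi) <= mu:
            break
        lo, hi = hi, 2.0 * hi
    for _ in range(iterations):          # plain bisection on the monotone residual
        mid = 0.5 * (lo + hi)
        if residual_norm(a, y_delta, x_hat, mid) > mu:
            lo = mid
        else:
            hi = mid
    return hi

a = [1.0, 0.5, 0.1]
noise = [0.01, -0.01, 0.01]
y_delta = [ai * 1.0 + ni for ai, ni in zip(a, noise)]   # exact solution (1, 1, 1) plus noise
delta = math.sqrt(sum(n * n for n in noise))
x_hat = [0.0, 0.0, 0.0]
mu = 0.5                                 # satisfies ||A x_hat - y_delta|| > mu > delta
lam_star = find_multiplier(a, y_delta, x_hat, mu)
```

Solving this equation to high accuracy at every iteration is the classical strategy; the method of this article only asks for the residual to land in an interval, which is cheaper.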

We are ready to present the nIT method for solving (1).

Properties (4) and (5), together with the definition of the duality mapping $J_{r}$, imply that the point $x_{k}^{δ} \in X$ solves the optimization problem in step [3.2] of Algorithm 1 if and only if (10) $0 \in λ_{k}^{δ} A^{*} J_{r} (A x_{k}^{δ} - y^{δ}) + \partial f (x_{k}^{δ}) - ξ_{k - 1}^{δ} .$ Hence, since Y is a smooth Banach space, the duality mapping $J_{r}$ is single-valued and $ξ_{k - 1}^{δ} - λ_{k}^{δ} A^{*} J_{r} (A x_{k}^{δ} - y^{δ}) \in \partial f (x_{k}^{δ}) .$ Consequently, $ξ_{k}^{δ}$ in step [3.2] of Algorithm 1 is well defined and it is a sub-gradient of f at $x_{k}^{δ}$.

Notice that the stopping criterion in Algorithm 1 corresponds to the discrepancy principle, i.e. the iteration stops at step $k (δ)$ defined by (11) $k (δ) := \min \{k \geq 1 : ‖ A x_{j}^{δ} - y^{δ} ‖ > τ δ, \; j = 0, \dots, k - 1, \text{ and } ‖ A x_{k}^{δ} - y^{δ} ‖ \leq τ δ\} .$
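The following sketch puts the pieces together: an iteration that accepts any multiplier whose next residual lies in the interval $[(1 - η_{0}) δ + η_{0} ‖ A x_{k - 1}^{δ} - y^{δ} ‖, (1 - η_{1}) δ + η_{1} ‖ A x_{k - 1}^{δ} - y^{δ} ‖]$ and stops by the discrepancy principle (11). It is not a transcription of Algorithm 1. It assumes a finite-dimensional Hilbert space, r = 2, $f (x) = \frac{1}{2} ‖ x ‖^{2}$, a diagonal operator with nonzero entries, constants $0 < η_{0} < η_{1} < 1$ and $τ > 1$, and a simple bracketing search for the multiplier; the function names and default values are hypothetical.

```python
import math

def tikhonov_step(a, y_delta, x_prev, lam):
    """Minimizer of lam/2*||A x - y_delta||^2 + 0.5*||x - x_prev||^2 for A = diag(a)."""
    return [(xp + lam * ai * yi) / (1.0 + lam * ai * ai)
            for ai, yi, xp in zip(a, y_delta, x_prev)]

def residual_norm(a, y_delta, x):
    return math.sqrt(sum((ai * xi - yi) ** 2 for ai, xi, yi in zip(a, x, y_delta)))

def pick_multiplier(a, y_delta, x_prev, lower, upper, iterations=200):
    """Return some lam whose next residual lies in [lower, upper]."""
    def next_residual(lam):
        return residual_norm(a, y_delta, tikhonov_step(a, y_delta, x_prev, lam))
    lo, hi = 0.0, 1.0
    for _ in range(iterations):          # the residual decreases as lam grows
        if next_residual(hi) <= upper:
            break
        lo, hi = hi, 2.0 * hi
    lam = hi
    for _ in range(iterations):          # stop as soon as the interval is hit
        r = next_residual(lam)
        if lower <= r <= upper:
            break
        if r > upper:
            lo = lam
        else:
            hi = lam
        lam = 0.5 * (lo + hi)
    return lam

def nit(a, y_delta, delta, tau=1.5, eta0=0.3, eta1=0.7, max_iter=100):
    """Iterate until the discrepancy principle ||A x_k - y_delta|| <= tau*delta holds."""
    x = [0.0] * len(a)
    residuals = [residual_norm(a, y_delta, x)]
    while residuals[-1] > tau * delta and len(residuals) <= max_iter:
        r = residuals[-1]
        lower = (1 - eta0) * delta + eta0 * r
        upper = (1 - eta1) * delta + eta1 * r
        lam = pick_multiplier(a, y_delta, x, lower, upper)
        x = tikhonov_step(a, y_delta, x, lam)
        residuals.append(residual_norm(a, y_delta, x))
    return x, residuals

a = [1.0, 0.5, 0.1]
noise = [0.01, -0.01, 0.01]
y_delta = [ai * 1.0 + ni for ai, ni in zip(a, noise)]   # exact solution (1, 1, 1) plus noise
delta = math.sqrt(sum(n * n for n in noise))
x_final, residuals = nit(a, y_delta, delta)
```

Because the search returns as soon as the residual falls inside the interval, it usually needs far fewer evaluations than solving $G_{\hat{x}} (λ) = μ^{r}$ exactly.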

Remark 3.3

Novel properties of the proposed method

The strategy used here is inspired by the recent work [Citation8], where the authors propose an endogenous strategy for the choice of the Lagrange multipliers in the nonstationary iterated Tikhonov method for solving (1), (2) when X and Y are Hilbert spaces.
The penalty terms used in our Tikhonov functionals are the same as in [Citation6] and consist of Bregman distances induced by (uniformly) convex functionals, e.g. the sum of the $L^{2}$-norm with the TV-seminorm.
We present a complete convergence analysis for the proposed method, characterizing it as a regularization method.

4. Convergence analysis

In this section, we analyse the convergence properties of Algorithm 1. We begin by presenting the following result that establishes an estimate for the decay of the residual $‖ A x_{k}^{δ} - y^{δ} ‖$ . It can be proved in much the same manner as [Citation8, Proposition 4.1], and for the sake of brevity we omit the proof here.

Proposition 4.1

Let $(x_{k}^{δ})_{0 \leq k \leq k (δ)}$ be the (finite) sequence defined by the nIT method (Algorithm 1), with $δ \geq 0$ and $y^{δ} \in Y$ as in (2). Then, $[‖ A x_{k}^{δ} - y^{δ} ‖ - δ] \leq η [‖ A x_{k - 1}^{δ} - y^{δ} ‖ - δ] \leq η^{k} [‖ A x_{0} - y^{δ} ‖ - δ], \quad k = 1, \dots, k (δ),$ where $k (δ) \in N$ is defined by (11).

As a direct consequence of Proposition 4.1 we have that in the noisy data case, the discrepancy principle terminates the iteration after finitely many steps, i.e. $k (δ) < \infty$ . Furthermore, the corollary below gives an estimate for the stopping index $k (δ)$ .

Corollary 4.2

Let $(x_{k}^{δ})_{0 \leq k \leq k (δ)}$ be the (finite) sequence defined by the nIT method (Algorithm 1), with $δ > 0$ and $y^{δ} \in Y$ as in (2). Then, the stopping index $k (δ),$ defined in (11), satisfies $k (δ) \leq | \ln η |^{- 1} \ln [\frac{‖ A x_{0} - y^{δ} ‖ - δ}{(τ - 1) δ}] + 1.$
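The bound follows from Proposition 4.1: at index $k (δ) - 1$ the residual still exceeds $τ δ$, so $(τ - 1) δ < η^{k (δ) - 1} [‖ A x_{0} - y^{δ} ‖ - δ]$, and taking logarithms gives the estimate. The short sketch below evaluates the bound for sample values; the function name and the numbers are illustrative assumptions, not data from the article.

```python
import math

def stopping_index_bound(initial_residual, delta, tau, eta):
    """Upper bound on k(delta) from Corollary 4.2.

    Requires delta > 0, tau > 1, 0 < eta < 1 and initial_residual > tau * delta.
    """
    ratio = (initial_residual - delta) / ((tau - 1.0) * delta)
    return math.log(ratio) / abs(math.log(eta)) + 1.0

# Sample values: initial residual 1.0, noise level 0.01, tau = 2, eta = 0.5.
bound = stopping_index_bound(1.0, 0.01, tau=2.0, eta=0.5)
max_steps = math.floor(bound)        # k(delta) is an integer not exceeding the bound
```

Halving the noise level adds roughly $\ln 2 / | \ln η |$ iterations, so the stopping index grows only logarithmically as $δ \to 0$.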

In the next proposition we prove monotonicity of the sequence $(D_{ξ_{k}^{δ}} f (x^{⋆}, x_{k}^{δ}))_{k \in N}$ and we also estimate the gain $D_{ξ_{k - 1}^{δ}} f (x^{⋆}, x_{k - 1}^{δ}) - D_{ξ_{k}^{δ}} f (x^{⋆}, x_{k}^{δ})$ , where $x^{⋆} \in X$ satisfies Assumption (A.1).

Proposition 4.3

Let $(x_{k}^{δ})_{0 \leq k \leq k (δ)}$ be the (finite) sequence defined by the nIT method (Algorithm 1), with $δ \geq 0$ and $y^{δ} \in Y$ as in (2). Then, for every $x^{⋆}$ satisfying Assumption (A.1) it holds (12) $D_{ξ_{k}^{δ}} f (x^{⋆}, x_{k}^{δ}) \leq D_{ξ_{k - 1}^{δ}} f (x^{⋆}, x_{k - 1}^{δ}) - D_{ξ_{k - 1}^{δ}} f (x_{k}^{δ}, x_{k - 1}^{δ}) - λ_{k}^{δ} (1 - \frac{1}{τ}) ‖ A x_{k}^{δ} - y^{δ} ‖^{r},$ for $k = 1, \dots, k (δ) - 1.$

Proof.

Using the three points identity we have (13) $\begin{aligned} D_{ξ_{k}^{δ}} f (x^{⋆}, x_{k}^{δ}) - D_{ξ_{k - 1}^{δ}} f (x^{⋆}, x_{k - 1}^{δ}) & = - D_{ξ_{k - 1}^{δ}} f (x_{k}^{δ}, x_{k - 1}^{δ}) + ⟨ ξ_{k}^{δ} - ξ_{k - 1}^{δ}, x_{k}^{δ} - x^{⋆} ⟩ \\ & = - D_{ξ_{k - 1}^{δ}} f (x_{k}^{δ}, x_{k - 1}^{δ}) - λ_{k}^{δ} ⟨ A^{*} J_{r} (A x_{k}^{δ} - y^{δ}), x_{k}^{δ} - x^{⋆} ⟩, \end{aligned}$ where the second equality above follows from the definition of $ξ_{k}^{δ}$. Since $A x^{⋆} = y$, simple manipulations of the second term on the right-hand side above yield (14) $\begin{aligned} λ_{k}^{δ} ⟨ A^{*} J_{r} (A x_{k}^{δ} - y^{δ}), x_{k}^{δ} - x^{⋆} ⟩ & = λ_{k}^{δ} ⟨ J_{r} (A x_{k}^{δ} - y^{δ}), A x_{k}^{δ} - y^{δ} ⟩ + λ_{k}^{δ} ⟨ J_{r} (A x_{k}^{δ} - y^{δ}), y^{δ} - y ⟩ \\ & = λ_{k}^{δ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r} + λ_{k}^{δ} ⟨ J_{r} (A x_{k}^{δ} - y^{δ}), y^{δ} - y ⟩ . \end{aligned}$ Combining these two relations we obtain $\begin{aligned} D_{ξ_{k}^{δ}} f (x^{⋆}, x_{k}^{δ}) - D_{ξ_{k - 1}^{δ}} f (x^{⋆}, x_{k - 1}^{δ}) & \leq - D_{ξ_{k - 1}^{δ}} f (x_{k}^{δ}, x_{k - 1}^{δ}) - λ_{k}^{δ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r} + λ_{k}^{δ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r - 1} ‖ y^{δ} - y ‖ \\ & \leq - D_{ξ_{k - 1}^{δ}} f (x_{k}^{δ}, x_{k - 1}^{δ}) - λ_{k}^{δ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r} + λ_{k}^{δ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r - 1} δ, \end{aligned}$ where the first inequality uses $‖ J_{r} (A x_{k}^{δ} - y^{δ}) ‖ = ‖ A x_{k}^{δ} - y^{δ} ‖^{r - 1}$ and the last inequality follows from (2). Since $k \in \{1, \dots, k (δ) - 1\}$, we have $τ δ \leq ‖ A x_{k}^{δ} - y^{δ} ‖$, i.e. $δ \leq τ^{- 1} ‖ A x_{k}^{δ} - y^{δ} ‖$. Thus, $D_{ξ_{k}^{δ}} f (x^{⋆}, x_{k}^{δ}) - D_{ξ_{k - 1}^{δ}} f (x^{⋆}, x_{k - 1}^{δ}) \leq - D_{ξ_{k - 1}^{δ}} f (x_{k}^{δ}, x_{k - 1}^{δ}) - λ_{k}^{δ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r} + \frac{λ_{k}^{δ}}{τ} ‖ A x_{k}^{δ} - y^{δ} ‖^{r} .$ We deduce (12) from the above inequality.

Corollary 4.4

Let $(x_{k})_{k \in N}$ be the sequence defined by the nIT method (Algorithm 1) with $δ = 0$ and $(λ_{k})_{k \in N}$ be the sequence of corresponding Lagrange multipliers. Then, for all $k = 1, 2, \dots,$ and any $x^{⋆}$ satisfying (A.1), we have (15) $D_{ξ_{k}} f (x^{⋆}, x_{k}) = D_{ξ_{k - 1}} f (x^{⋆}, x_{k - 1}) - D_{ξ_{k - 1}} f (x_{k}, x_{k - 1}) - λ_{k} ‖ A x_{k} - y ‖^{r} .$ Consequently, using a telescoping sum, we obtain (16) $\sum_{k = 1}^{\infty} λ_{k} ‖ A x_{k} - y ‖^{r} < \infty .$

Proof.

In the exact data case, i.e. $δ = 0$ and $y^{δ} = y$, equality (14) becomes $λ_{k} ⟨ A^{*} J_{r} (A x_{k} - y), x_{k} - x^{⋆} ⟩ = λ_{k} ‖ A x_{k} - y ‖^{r} .$ Combining the formula above with (13), when $δ = 0$, we deduce (15). The result in (16) follows directly from (15).
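Identity (15) can be verified numerically in a simple special case. The sketch below assumes a finite-dimensional Hilbert space, r = 2, $f (x) = \frac{1}{2} ‖ x ‖^{2}$ (so every Bregman distance is half a squared Euclidean distance) and a diagonal operator; the data, the multiplier value and the helper names are hypothetical.

```python
def half_sq_dist(u, v):
    """Bregman distance of f = 0.5*||x||^2, i.e. 0.5*||u - v||^2."""
    return 0.5 * sum((s - t) ** 2 for s, t in zip(u, v))

a = [1.0, 0.5, 0.1]                       # diagonal operator A = diag(a)
x_star = [1.0, -1.0, 2.0]                 # a solution of A x = y
y = [ai * xi for ai, xi in zip(a, x_star)]    # exact data, delta = 0
x_prev = [0.0, 0.0, 0.0]
lam = 3.0                                 # any positive multiplier

# One exact-data nIT step: minimizer of lam/2*||A x - y||^2 + 0.5*||x - x_prev||^2.
x_new = [(xp + lam * ai * yi) / (1.0 + lam * ai * ai)
         for ai, yi, xp in zip(a, y, x_prev)]
residual_sq = sum((ai * xi - yi) ** 2 for ai, xi, yi in zip(a, x_new, y))

lhs = half_sq_dist(x_star, x_new)
rhs = (half_sq_dist(x_star, x_prev)
       - half_sq_dist(x_new, x_prev)
       - lam * residual_sq)
```

Both sides agree up to rounding, and since every subtracted term is nonnegative the distance to $x^{⋆}$ cannot increase from one iterate to the next.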

We now fix an $x_{0} \in \operatorname{Dom} (\partial f)$ and $ξ_{0} \in \partial f (x_{0})$, and study the existence and uniqueness of a vector $x^{†} \in X$ with the property $D_{ξ_{0}} f (x^{†}, x_{0}) = \inf \{D_{ξ_{0}} f (x, x_{0}) : x \in \operatorname{Dom} (f) \text{ and } A x = y\} .$ (17) Such an element $x^{†}$ is called an $x_{0}$-minimal-distance solution and it is the equivalent of the $x_{0}$-minimal-norm solution of (1) in the Hilbert space setting [Citation1].

Lemma 4.5

There exists a unique element $x^{†} \in X$ satisfying (17).

Proof.

Let $(x_{k})_{k \in N} \subset \operatorname{Dom} (f)$ be a sequence satisfying $A x_{k} = y$ and $D_{ξ_{0}} f (x_{k}, x_{0}) \to a := \inf \{D_{ξ_{0}} f (x, x_{0}) : x \in \operatorname{Dom} (f) \text{ and } A x = y\} .$ Then the sequence $(D_{ξ_{0}} f (x_{k}, x_{0}))_{k \in N}$ is bounded, and because f is uniformly convex, $(x_{k})_{k \in N}$ is bounded as well. Since X is reflexive, there exist a vector $\bar{x} \in X$ and a subsequence $(x_{k_{j}})_{j \in N}$ such that $x_{k_{j}} ⇀ \bar{x} .$ It follows that $A \bar{x} = y$, and because f is w.l.s.c. we have $D_{ξ_{0}} f (\bar{x}, x_{0}) \leq \liminf_{j \to \infty} D_{ξ_{0}} f (x_{k_{j}}, x_{0}) = \lim_{j \to \infty} D_{ξ_{0}} f (x_{k_{j}}, x_{0}) = a,$ which implies that $\bar{x}$ is an $x_{0}$-minimal-distance solution.

Suppose now that $x \neq z$ are two $x_{0}$-minimal-distance solutions. Since f is strictly convex, so is $D_{ξ_{0}} f (\cdot, x_{0})$, and for any $α \in (0, 1)$ we obtain $D_{ξ_{0}} f (α x + (1 - α) z, x_{0}) < α D_{ξ_{0}} f (x, x_{0}) + (1 - α) D_{ξ_{0}} f (z, x_{0}) = D_{ξ_{0}} f (x, x_{0}),$ which contradicts the minimality of x.

In the next theorem we prove strong convergence of the sequence generated by the nIT algorithm in the noise-free case to the solution $x^{†}$ .

Theorem 4.6

Let $(x_{k})_{k \in N}$ be the sequence defined by the nIT method (Algorithm 1) with $δ = 0$ and $(λ_{k})_{k \in N}$ the sequence of the corresponding Lagrange multipliers. Then, $(x_{k})_{k \in N}$ converges strongly to $x^{†} .$

Proof.

We first prove that $(x_{k})_{k \in N}$ is a Cauchy sequence in X. Let $0 \leq l < m$ and let $x^{*}$ be a solution of (1). Using the three points identity we have $D_{ξ_{l}} f (x_{m}, x_{l}) - D_{ξ_{l}} f (x^{*}, x_{l}) = - D_{ξ_{m}} f (x^{*}, x_{m}) + ⟨ ξ_{m} - ξ_{l}, x_{m} - x^{*} ⟩ .$ (18)

We observe that $\begin{aligned} | ⟨ ξ_{m} - ξ_{l}, x_{m} - x^{*} ⟩ | & = | \sum_{j = l + 1}^{m} ⟨ ξ_{j} - ξ_{j - 1}, x_{m} - x^{*} ⟩ | \\ & = | \sum_{j = l + 1}^{m} - λ_{j} ⟨ J_{r} (A x_{j} - y), A (x_{m} - x^{*}) ⟩ | \\ & \leq \sum_{j = l + 1}^{m} λ_{j} ‖ A x_{j} - y ‖^{r - 1} ‖ A x_{m} - y ‖, \end{aligned}$ (19) where the second equality follows from the definition of $ξ_{j}$ and the inequality is a consequence of the properties of the duality mapping $J_{r}$. Proposition 4.1, with $δ = 0$, implies that $‖ A x_{m} - y ‖ \leq ‖ A x_{j} - y ‖$ for all $j \leq m$. Therefore, we have $| ⟨ ξ_{m} - ξ_{l}, x_{m} - x^{*} ⟩ | \leq \sum_{j = l + 1}^{m} λ_{j} ‖ A x_{j} - y ‖^{r} .$ Combining the inequality above with (18) we deduce that $D_{ξ_{l}} f (x_{m}, x_{l}) \leq D_{ξ_{l}} f (x^{*}, x_{l}) - D_{ξ_{m}} f (x^{*}, x_{m}) + \sum_{j = l + 1}^{m} λ_{j} ‖ A x_{j} - y ‖^{r} .$ From (15) it follows that the sequence $(D_{ξ_{k}} f (x^{*}, x_{k}))_{k \in N}$ is monotonically decreasing. Thus, by the inequality above and (16) we have $D_{ξ_{l}} f (x_{m}, x_{l}) \to 0$ as $l, m \to \infty$, and from the uniform convexity of f we obtain that $(x_{k})_{k \in N}$ is a Cauchy sequence in X. Therefore, there is $\bar{x} \in X$ such that $x_{k} \to \bar{x}$ as $k \to \infty$, and since Proposition 4.1 implies that $‖ A x_{k} - y ‖ \to 0$ as $k \to \infty$ and A is a continuous map, we conclude that $A \bar{x} = y$.

Now, we prove that $\bar{x} = x^{†} .$ We first observe that $ξ_{k} - ξ_{0} \in \partial (D_{ξ_{0}} f (\cdot, x_{0})) (x_{k})$, which yields $⟨ ξ_{k} - ξ_{0}, x^{†} - x_{k} ⟩ \leq D_{ξ_{0}} f (x^{†}, x_{0}) - D_{ξ_{0}} f (x_{k}, x_{0}) .$ Thus, $D_{ξ_{0}} f (\bar{x}, x_{0}) \leq \liminf_{k \to \infty} D_{ξ_{0}} f (x_{k}, x_{0}) \leq D_{ξ_{0}} f (x^{†}, x_{0}) + \liminf_{k \to \infty} ⟨ ξ_{k} - ξ_{0}, x_{k} - x^{†} ⟩ .$ (20) Next, we prove that $\lim_{k \to \infty} ⟨ ξ_{k} - ξ_{0}, x_{k} - x^{†} ⟩ = 0,$ (21) which in view of (20) will ensure that $D_{ξ_{0}} f (\bar{x}, x_{0}) \leq D_{ξ_{0}} f (x^{†}, x_{0}),$ proving that $\bar{x} = x^{†}$ (recall that $A \bar{x} = y$ and that the minimizer in (17) is unique by Lemma 4.5). Indeed, using (19) with $x^{*} = x^{†}$, for $m > l$, we have $| ⟨ ξ_{m} - ξ_{l}, x_{m} - x^{†} ⟩ | = | \sum_{k = l + 1}^{m} - λ_{k} ⟨ J_{r} (A x_{k} - y), A x_{m} - y ⟩ | \leq \sum_{k = l + 1}^{m} λ_{k} ‖ A x_{k} - y ‖^{r} \to 0$ as $l \to \infty .$ Then, given $ϵ > 0$ there exists $k_{0} \in N$ such that $k > k_{0} ⟹ | ⟨ ξ_{k} - ξ_{k_{0}}, x_{k} - x^{†} ⟩ | < \frac{ϵ}{2} .$ Now, $\begin{aligned} | ⟨ ξ_{k_{0}} - ξ_{0}, x_{k} - x^{†} ⟩ | & = | \sum_{n = 1}^{k_{0}} - λ_{n} ⟨ J_{r} (A x_{n} - y), A x_{k} - y ⟩ | \\ & \leq ‖ A x_{k} - y ‖ \sum_{n = 1}^{k_{0}} λ_{n} ‖ A x_{n} - y ‖^{r - 1} . \end{aligned}$ Since $‖ A x_{k} - y ‖ \to 0$ as $k \to \infty,$ we conclude that there exists a number $k_{1} \geq k_{0}$ such that $k > k_{1} ⟹ | ⟨ ξ_{k_{0}} - ξ_{0}, x_{k} - x^{†} ⟩ | < \frac{ϵ}{2} .$ Therefore, $k > k_{1}$ implies $| ⟨ ξ_{k} - ξ_{0}, x_{k} - x^{†} ⟩ | < ϵ,$ which proves (21) as we wanted.
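The statement of Theorem 4.6 can be illustrated in the Hilbert special case, where the $x_{0}$-minimal-distance solution has a closed form. The sketch below assumes $f = \frac{1}{2} ‖ \cdot ‖^{2}$, $r = 2$ and a small underdetermined matrix chosen for illustration, so that $x^{†} = x_{0} + A^{+} (y - A x_{0})$ with $A^{+}$ the pseudoinverse. A constant multiplier is used purely for simplicity; for this particular matrix it happens to contract the residual by a factor between 1/6 and 1/3 in every step.

```python
import numpy as np

# Illustrative data (assumed): A A^T = diag(2, 5), exact right-hand side.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 1.0]])
y = A @ np.array([1.0, -1.0, 0.5, 2.0])
x0 = np.array([0.3, 0.0, -0.2, 1.0])

# Hilbert-space x0-minimal-distance solution: the solution closest to x0.
x_dagger = x0 + np.linalg.pinv(A) @ (y - A @ x0)

lam = 1.0
x = x0.copy()
errors, ratios = [], []
for k in range(1, 41):
    res_prev = np.linalg.norm(y - A @ x)
    # Minimizer of lam/2*||A x - y||^2 + 1/2*||x - x_prev||^2, written in
    # data-space form: x = x_prev + lam*A^T (lam*A A^T + I)^{-1} (y - A x_prev)
    x = x + lam * A.T @ np.linalg.solve(lam * (A @ A.T) + np.eye(2), y - A @ x)
    errors.append(np.linalg.norm(x - x_dagger))
    if k <= 10:
        ratios.append(np.linalg.norm(y - A @ x) / res_prev)

print(errors[0], errors[-1])
print(min(ratios), max(ratios))
```

The iterates stay in $x_{0} + \mathrm{range} (A^{T})$, which is why the limit is the solution closest to $x_{0}$ rather than an arbitrary solution.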

Our intention in the remainder of this section is to study the convergence properties of the family $(x_{k (δ)}^{δ})_{δ > 0}$ as the noise level δ approaches zero. To achieve this goal we first establish a stability result connecting $x_{k}^{δ}$ to $x_{k}$ . Observe that in general, $x_{k + 1}^{δ}$ is not uniquely defined from $x_{k}^{δ},$ which motivates the following definition.

Definition 4.7

Let $0 < η_{0} \leq η_{1} < 1$ be pre-fixed constants. $\tilde{x} \in X$ is called a successor of $x_{k}^{δ}$ if

1. $k < k (δ)$;
2. There exists $0 \leq \tilde{λ} < \infty$ such that $\tilde{x} := \arg\min_{x \in X} T_{\tilde{λ}}^{δ} (x)$, where $T_{\tilde{λ}}^{δ}$ is the Tikhonov functional $T_{\tilde{λ}}^{δ} (x) := \tilde{λ} r^{- 1} ‖ A x - y^{δ} ‖^{r} + D_{ξ_{k}^{δ}} f (x, x_{k}^{δ});$ (22)
3. The residual $‖ A \tilde{x} - y^{δ} ‖$ belongs to the interval $[(1 - η_{0}) δ + η_{0} ‖ A x_{k}^{δ} - y^{δ} ‖, (1 - η_{1}) δ + η_{1} ‖ A x_{k}^{δ} - y^{δ} ‖] .$ (23)

In other words, there must exist a nonnegative Lagrange multiplier $\tilde{λ}$ such that the minimizer $\tilde{x}$ of the corresponding Tikhonov functional in (22) attains a residual $‖ A \tilde{x} - y^{δ} ‖$ in the interval (23) (which is defined by convex combinations of the noise level δ with the residual at the current iterate $x_{k}^{δ}$).

In the noisy-data case, as long as the discrepancy principle is not satisfied, the interval in (23) is a subset of $(δ, (1 - η_{1}) δ + η_{1} ‖ A x_{k}^{δ} - y^{δ} ‖]$ (24) because $(1 - η_{0}) δ + η_{0} ‖ A x_{k}^{δ} - y^{δ} ‖ \geq δ + η_{0} (τ - 1) δ > δ .$ Therefore, if we consider a sequence generated by Algorithm 1 with the inequalities in [3.2] replaced by a more restrictive condition, as in item 3 of Definition 4.7, all the previous results still hold. Further, since $‖ A x_{k}^{δ} - y^{δ} ‖ > τ δ > δ$ and $η_{0} \leq η_{1},$ it is clear that interval (23) is non-empty.
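The target interval of item 3 in Definition 4.7 is easy to compute explicitly. The following minimal sketch uses hypothetical helper names (`target_interval`, `is_admissible_residual`) and arbitrary sample values; it only restates (23) and the membership test, and setting `delta = 0` gives the noise-free version of the interval.

```python
def target_interval(delta, prev_residual, eta0, eta1):
    """Endpoints of interval (23): convex combinations of the noise level
    delta and the residual norm at the current iterate."""
    if not (0.0 < eta0 <= eta1 < 1.0):
        raise ValueError("need 0 < eta0 <= eta1 < 1")
    lower = (1.0 - eta0) * delta + eta0 * prev_residual
    upper = (1.0 - eta1) * delta + eta1 * prev_residual
    return lower, upper

def is_admissible_residual(residual, delta, prev_residual, eta0, eta1):
    """True if a candidate residual norm satisfies item 3 of Definition 4.7."""
    lower, upper = target_interval(delta, prev_residual, eta0, eta1)
    return lower <= residual <= upper

# Sample values (assumed): the discrepancy principle is not yet satisfied,
# i.e. prev_residual > tau * delta for some tau > 1.
lower, upper = target_interval(delta=0.1, prev_residual=1.0, eta0=0.2, eta1=0.5)
print(lower, upper)

# Noise-free case: the interval scales the previous residual by eta0 and eta1.
print(target_interval(delta=0.0, prev_residual=1.0, eta0=0.2, eta1=0.5))
```

Whenever the previous residual exceeds the noise level, the lower endpoint is strictly larger than `delta`, in line with the inclusion of (23) in (24).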

We note that interval (23) becomes close to the interval (24) as $η_{0} \to 0,$ and therefore, the former interval is only slightly smaller than the latter for $η_{0} \approx 0.$ In the noise-free case, (23) reduces to the non-empty interval $[η_{0} ‖ A x_{k} - y ‖, η_{1} ‖ A x_{k} - y ‖],$ (25) and according to Theorem 4.6, the sequence $(x_{k})_{k \in N}$ converges to $x^{†}$ whenever $x_{k + 1}$ is a successor of $x_{k}$ for all $k \in N$. In this situation, we call $(x_{k})_{k \in N}$ a noiseless sequence.

Now, we study the behaviour of $x_{k}^{δ},$ for fixed k, as the noise level δ approaches zero. To simplify the notation, for $δ = 0$ we define $T_{λ_{k}} (x) := \frac{λ_{k}}{r} ‖ A x - y ‖^{r} + D_{ξ_{k}} f (x, x_{k})$ (compare with (22)).

Theorem 4.8 (Stability)

Let $(δ_{j})_{j \in N}$ be a positive-zero sequence and fix $τ > 1.$ Assume that the sequences $(x_{k}^{δ_{j}})_{0 \leq k \leq k (δ_{j})},$ $j \in N,$ are fixed, where $x_{k + 1}^{δ_{j}}$ is a successor of $x_{k}^{δ_{j}}$ for $k = 0, \dots, k (δ_{j}) - 1.$ Further, assume that Y is a locally uniformly smooth Banach space. If $x_{0}$ is not a solution of (1), then there exists a noiseless sequence $(x_{k})_{k \in N}$ such that, for every fixed number $k \in N,$ there exists a subsequence $(δ_{j_{m}})_{m \in N}$ (depending on k) satisfying $x_{n}^{δ_{j_{m}}} \to x_{n}, \quad ξ_{n}^{δ_{j_{m}}} \to ξ_{n} \quad \text{and} \quad f (x_{n}^{δ_{j_{m}}}) \to f (x_{n}) \quad \text{as } m \to \infty, \text{ for } n = 0, \dots, k .$

Proof.

Assume that $x_{0}$ is not a solution of $A x = y .$ We use an induction argument: since $x_{0}^{δ} = x_{0}$ and $ξ_{0}^{δ} = ξ_{0}$ for every $δ \geq 0,$ the result is clear for $k = 0.$ The argument consists in successively choosing a subsequence of the current subsequence, and to avoid a notational overload, we denote a subsequence of $(δ_{j})$ still by $(δ_{j}) .$ Suppose the result holds true for some $k \in N .$ Then, there exists a subsequence $(δ_{j})_{j \in N}$ satisfying $x_{n}^{δ_{j}} \to x_{n}, \quad ξ_{n}^{δ_{j}} \to ξ_{n} \quad \text{and} \quad f (x_{n}^{δ_{j}}) \to f (x_{n}) \quad \text{as } j \to \infty, \text{ for } n = 0, \dots, k,$ (26) where $x_{n}$ is a successor of $x_{n - 1}$ for $n = 1, \dots, k .$ Because $x_{k + 1}^{δ_{j}}$ is a successor of $x_{k}^{δ_{j}},$ it is true that $k < k (δ_{j})$ for all j. For the same reason, there exists a non-negative number $λ_{k}^{δ_{j}}$ such that $x_{k + 1}^{δ_{j}} = \arg\min_{x \in X} T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x)$ with the resulting residual $‖ A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ‖$ lying in the interval (23) (with $δ = δ_{j}$). Our task now is proving that there exists a successor $x_{k + 1}$ of $x_{k}$ and a subsequence $(δ_{j})$ of the current subsequence such that $x_{k + 1}^{δ_{j}} \to x_{k + 1},$ $ξ_{k + 1}^{δ_{j}} \to ξ_{k + 1}$ and $f (x_{k + 1}^{δ_{j}}) \to f (x_{k + 1})$ as $j \to \infty$. Since the proof is rather long, we divide it into four main steps:
1. We find a vector $\bar{x} \in X$ such that $x_{k + 1}^{δ_{j}} ⇀ \bar{x}$ (27) for a specific subsequence $(δ_{j})_{j \in N} .$
2. Using the third item in Definition 4.7 we show that the sequence $(λ_{k}^{δ_{j}})_{j \in N}$ is bounded. Then, we choose a convergent subsequence and define $λ_{k} := \lim_{j \to \infty} λ_{k}^{δ_{j}} < \infty,$ (28) as well as $x_{k + 1} := \arg\min_{x \in X} T_{λ_{k}} (x) .$ (29)
3. We prove that $x_{k + 1} = \bar{x}$, which guarantees that $x_{k + 1}^{δ_{j}} ⇀ x_{k + 1} .$
4. Finally, we prove that $f (x_{k + 1}^{δ_{j}}) \to f (x_{k + 1}) \text{ as } j \to \infty,$ (30) which in view of (27) will guarantee that $x_{k + 1}^{δ_{j}} \to x_{k + 1}$ as $j \to \infty,$ since f has the Kadec property (see the last paragraph of Section 2). This last result, in turn, will prove that $ξ_{k + 1}^{δ_{j}} \to ξ_{k + 1} .$
Finally, we validate that $x_{k + 1}$ is a successor of $x_{k},$ completing the induction argument.

Step 1: It follows from (12) and the uniform convexity of f that the sequence $(x_{k + 1}^{δ_{j}})_{j \in N}$ is bounded. Thus, there exists a subsequence $(δ_{j})$ of the current subsequence, and a vector $\bar{x} \in X$ such that (27) holds true.

Step 2: We claim that, for each $k \in N$ fixed, there exists a constant $λ_{max, k} > 0$ such that $λ_{k}^{δ_{j}} \leq λ_{max, k} \text{ for all } j \in N .$ (31) Indeed, assume the contrary. Then, there is a subsequence satisfying $λ_{k}^{δ_{j}} \to \infty$ as $j \to \infty .$ But in this case, $\begin{aligned} \liminf_{j \to \infty} \frac{1}{r} ‖ A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ‖^{r} & \leq \liminf_{j \to \infty} \frac{T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}^{δ_{j}})}{λ_{k}^{δ_{j}}} \leq \liminf_{j \to \infty} \frac{T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x^{†})}{λ_{k}^{δ_{j}}} \\ & = \lim_{j \to \infty} (\frac{1}{r} ‖ A x^{†} - y^{δ_{j}} ‖^{r} + \frac{D_{ξ_{k}^{δ_{j}}} f (x^{†}, x_{k}^{δ_{j}})}{λ_{k}^{δ_{j}}}) \\ & \leq \lim_{j \to \infty} (\frac{1}{r} δ_{j}^{r} + \frac{D_{ξ_{0}} f (x^{†}, x_{0})}{λ_{k}^{δ_{j}}}) = 0. \end{aligned}$ This, together with the lower bound of interval (23), implies that $x_{0}$ is a solution of $A x = y$ because $\begin{aligned} 0 \leq η_{0}^{k + 1} ‖ A x_{0} - y ‖ & = η_{0}^{k + 1} \lim_{j \to \infty} (‖ A x_{0}^{δ_{j}} - y^{δ_{j}} ‖ - δ_{j}) \\ & \leq \liminf_{j \to \infty} (‖ A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ‖ - δ_{j}) = 0, \end{aligned}$ (32) and we have a contradiction. Thus (31) is true. We fix a convergent subsequence and define $λ_{k}$ as in (28) and $x_{k + 1}$ as in (29).

Step 3: Observe that $\begin{aligned} | ⟨ ξ_{k}, \bar{x} - x_{k} ⟩ - ⟨ ξ_{k}^{δ_{j}}, x_{k + 1}^{δ_{j}} - x_{k}^{δ_{j}} ⟩ | & \leq | ⟨ ξ_{k}, \bar{x} - x_{k + 1}^{δ_{j}} ⟩ | + ‖ ξ_{k} ‖ ‖ x_{k}^{δ_{j}} - x_{k} ‖ \\ & \quad + ‖ ξ_{k} - ξ_{k}^{δ_{j}} ‖ ‖ x_{k + 1}^{δ_{j}} - x_{k}^{δ_{j}} ‖ . \end{aligned}$ Thus, from (27) and the induction hypothesis it follows that $\lim_{j \to \infty} ⟨ ξ_{k}^{δ_{j}}, x_{k + 1}^{δ_{j}} - x_{k}^{δ_{j}} ⟩ = ⟨ ξ_{k}, \bar{x} - x_{k} ⟩ .$ Now, the weak lower semi-continuity of f, together with (27) and the induction hypothesis, implies that $\begin{aligned} D_{ξ_{k}} f (\bar{x}, x_{k}) & = f (\bar{x}) - f (x_{k}) - ⟨ ξ_{k}, \bar{x} - x_{k} ⟩ \\ & \leq \liminf_{j \to \infty} f (x_{k + 1}^{δ_{j}}) - \lim_{j \to \infty} f (x_{k}^{δ_{j}}) - \lim_{j \to \infty} ⟨ ξ_{k}^{δ_{j}}, x_{k + 1}^{δ_{j}} - x_{k}^{δ_{j}} ⟩ \\ & = \liminf_{j \to \infty} D_{ξ_{k}^{δ_{j}}} f (x_{k + 1}^{δ_{j}}, x_{k}^{δ_{j}}) . \end{aligned}$ (33) From (27) we have $A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ⇀ A \bar{x} - y \text{ as } j \to \infty,$ (34) which together with (28) and the lower semi-continuity of Banach space norms yields $\begin{aligned} T_{λ_{k}} (\bar{x}) & = \frac{λ_{k}}{r} ‖ A \bar{x} - y ‖^{r} + D_{ξ_{k}} f (\bar{x}, x_{k}) \\ & \leq \liminf_{j \to \infty} (\frac{λ_{k}^{δ_{j}}}{r} ‖ A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ‖^{r} + D_{ξ_{k}^{δ_{j}}} f (x_{k + 1}^{δ_{j}}, x_{k}^{δ_{j}})) \\ & = \liminf_{j \to \infty} T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}^{δ_{j}}) \leq \liminf_{j \to \infty} T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}) = \lim_{j \to \infty} T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}) = T_{λ_{k}} (x_{k + 1}) . \end{aligned}$ This proves that $\bar{x} = x_{k + 1}$ because $x_{k + 1}$ is the unique minimizer of $T_{λ_{k}} .$ Thus, $x_{k + 1}^{δ_{j}} ⇀ x_{k + 1}$ as $j \to \infty .$ The above inequalities also ensure that $\liminf_{j \to \infty} T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}^{δ_{j}}) = T_{λ_{k}} (x_{k + 1}) .$ Taking a subsequence if necessary, we can assume that the following sequences converge: $a_{j} := D_{ξ_{k}^{δ_{j}}} f (x_{k + 1}^{δ_{j}}, x_{k}^{δ_{j}}), \quad a := \lim_{j \to \infty} a_{j},$ (35) and $\mathrm{re}_{j} := ‖ A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ‖^{r}, \quad \mathrm{re} := \lim_{j \to \infty} \mathrm{re}_{j} .$ (36) We can also assume for this subsequence that $\lim_{j \to \infty} T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}^{δ_{j}}) = T_{λ_{k}} (x_{k + 1}) .$ (37)

Step 4: Define $c := D_{ξ_{k}} f (x_{k + 1}, x_{k})$ and observe that, from (33), the inequality $c \leq a$ holds. Thus, it suffices to prove that $a \leq c$ to ensure that $\lim_{j \to \infty} D_{ξ_{k}^{δ_{j}}} f (x_{k + 1}^{δ_{j}}, x_{k}^{δ_{j}}) = D_{ξ_{k}} f (x_{k + 1}, x_{k}),$ which will prove (30). We assume that $a > c$ and derive a contradiction. From (28), (35), (36) and (37), together with the definition of limit, there exists a number $N \in N$ such that $j \geq N$ implies $\begin{aligned} & T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}^{δ_{j}}) < T_{λ_{k}} (x_{k + 1}) + \frac{a - c}{2}, \quad \mathrm{re} \leq \mathrm{re}_{j} + \frac{a - c}{6 λ_{max, k}}, \\ & λ_{k} \leq λ_{k}^{δ_{j}} + \frac{a - c}{6 \, \mathrm{re}} \quad \text{and} \quad a \leq a_{j} + \frac{a - c}{6} . \end{aligned}$ Therefore, for any $j \geq N$ it holds $\begin{aligned} T_{λ_{k}} (x_{k + 1}) & \leq λ_{k} \mathrm{re} + c = (λ_{k} - λ_{k}^{δ_{j}}) \mathrm{re} + λ_{k}^{δ_{j}} \mathrm{re} + a - (a - c) \\ & \leq \frac{a - c}{6 \, \mathrm{re}} \mathrm{re} + λ_{k}^{δ_{j}} (\mathrm{re}_{j} + \frac{a - c}{6 λ_{max, k}}) + (a_{j} + \frac{a - c}{6}) - (a - c) \\ & \leq λ_{k}^{δ_{j}} \mathrm{re}_{j} + a_{j} - \frac{a - c}{2} = T_{λ_{k}^{δ_{j}}}^{δ_{j}} (x_{k + 1}^{δ_{j}}) - \frac{a - c}{2} < T_{λ_{k}} (x_{k + 1}), \end{aligned}$ which is a contradiction, proving that $a \leq c$ as we wanted. Hence (30) holds, and since f has the Kadec property, it follows, in view of (27), that $x_{k + 1}^{δ_{j}} \to x_{k + 1} .$ This last result, together with the continuity of the duality mapping in locally uniformly smooth Banach spaces, implies that $ξ_{k + 1}^{δ_{j}} \to ξ_{k + 1} .$

It only remains to prove that $‖ A x_{k + 1} - y ‖$ belongs to interval (25), which will guarantee that $x_{k + 1}$ is a successor of $x_{k} .$ But this result follows from applying the limit $j \to \infty$ to the sequence $‖ A x_{k + 1}^{δ_{j}} - y^{δ_{j}} ‖,$ which belongs to interval (23).

Theorem 4.9 (Regularization)

Proof.

The result is clear if $x_{0}$ is a solution of $A x = y .$ Assume this is not the case. We first validate that the sequence $(k (δ_{j}))_{j \in N}$ has no convergent subsequences. Indeed, if for a subsequence $(δ_{j_{m}})_{m \in N}$ it is true that $k (δ_{j_{m}}) \to n$ as $m \to \infty$, then since $k (δ_{j_{m}}) \in N$ for all $m \in N,$ we must have $k (δ_{j_{m}}) = n \in N$ for m large enough. From Theorem 4.8, the subsequence $(x_{n}^{δ_{j_{m}}})_{m \in N}$ has itself a subsequence (which we still denote by $(x_{n}^{δ_{j_{m}}})_{m \in N}$) which converges to $x_{n},$ that is, $\lim_{m \to \infty} x_{k (δ_{j_{m}})}^{δ_{j_{m}}} = \lim_{m \to \infty} x_{n}^{δ_{j_{m}}} = x_{n} .$ But $x_{n}$ is a solution of $A x = y$ since $\begin{aligned} ‖ y - A x_{n} ‖ & = \lim_{m \to \infty} ‖ y - A x_{k (δ_{j_{m}})}^{δ_{j_{m}}} ‖ \\ & \leq \lim_{m \to \infty} (‖ y - y^{δ_{j_{m}}} ‖ + ‖ y^{δ_{j_{m}}} - A x_{k (δ_{j_{m}})}^{δ_{j_{m}}} ‖) \\ & \leq \lim_{m \to \infty} (τ + 1) δ_{j_{m}} = 0. \end{aligned}$ As in the proof of Theorem 4.8 (see (32)) we conclude that $x_{0}$ is a solution of $A x = y,$ contradicting our assumption. Therefore $k (δ_{j}) \to \infty$ as $j \to \infty .$

We now prove that each subsequence of $(x_{k (δ_{j})}^{δ_{j}})_{j \in N}$ has itself a subsequence which converges to $x^{†} .$ This will prove (38), namely $\lim_{j \to \infty} x_{k (δ_{j})}^{δ_{j}} = x^{†} .$ We observe that since any subsequence of $(δ_{j})_{j \in N}$ is itself a positive-zero sequence, it suffices to prove that $(x_{k (δ_{j})}^{δ_{j}})_{j \in N}$ has a subsequence which converges to $x^{†}$.

Our first step is proving that, for every $ϵ > 0$ fixed, there exists a subsequence (which we still denote by $(δ_{j})_{j \in N}$) depending on ε, and a number $J = J (ϵ),$ such that $j \geq J ⟹ D_{ξ_{k (δ_{j})}^{δ_{j}}} f (x^{†}, x_{k (δ_{j})}^{δ_{j}}) < ϵ .$ (39) In fact, fix $ϵ > 0$ and let $(x_{k})_{k \in N}$ be the noiseless sequence constructed in Theorem 4.8. Since $x_{k + 1}$ is a successor of $x_{k}$ for all $k \in N,$ the sequence $(x_{k})_{k \in N}$ converges to $x^{†}$ (see Theorem 4.6). Then, there exists a natural number $N = N (ϵ)$ such that $‖ x_{N} - x^{†} ‖ < \frac{1}{2} \sqrt{\frac{ϵ}{2}} \quad \text{and} \quad D_{ξ_{N}} f (x^{†}, x_{N}) < \frac{ϵ}{2} .$ From Theorem 4.8 there exists a subsequence (still denoted by $(δ_{j})_{j \in N}$) depending on N, and a number $J_{1} \in N,$ depending on ε, such that $j \geq J_{1} ⟹ [‖ ξ_{N}^{δ_{j}} - ξ_{N} ‖ < \sqrt{\frac{ϵ}{2}} \quad \text{and} \quad ‖ x_{N}^{δ_{j}} - x_{N} ‖ < \frac{1}{2} \sqrt{\frac{ϵ}{2}}] .$ Since $k (δ_{j}) \to \infty,$ there is a number $J_{2} \in N$ such that $k (δ_{j}) \geq N$ for all $j \geq J_{2} .$ It follows from (12) and the three points identity that for $j \geq J := \max \{J_{1}, J_{2}\},$ $\begin{aligned} D_{ξ_{k (δ_{j})}^{δ_{j}}} f (x^{†}, x_{k (δ_{j})}^{δ_{j}}) & \leq D_{ξ_{N}^{δ_{j}}} f (x^{†}, x_{N}^{δ_{j}}) \\ & \leq D_{ξ_{N}} f (x^{†}, x_{N}) - D_{ξ_{N}} f (x_{N}^{δ_{j}}, x_{N}) + ‖ ξ_{N}^{δ_{j}} - ξ_{N} ‖ ‖ x_{N}^{δ_{j}} - x^{†} ‖ \\ & \leq D_{ξ_{N}} f (x^{†}, x_{N}) + ‖ ξ_{N}^{δ_{j}} - ξ_{N} ‖ (‖ x_{N}^{δ_{j}} - x_{N} ‖ + ‖ x_{N} - x^{†} ‖) < ϵ, \end{aligned}$ which proves (39).

Choosing $ϵ = 1,$ we can find a subsequence $(δ_{j})_{j \in N}$ and select a number $j_{1} \in N$ such that $D_{ξ_{k (δ_{j_{1}})}^{δ_{j_{1}}}} f (x^{†}, x_{k (δ_{j_{1}})}^{δ_{j_{1}}}) < 1.$ Since the current subsequence $(δ_{j})_{j \in N}$ is also a positive-zero sequence, the above reasoning can be applied again in order to extract a subsequence of the current one satisfying (39) with $ϵ = 1 / 2$. We choose a number $j_{2} \geq j_{1}$ such that the inequality $D_{ξ_{k (δ_{j_{2}})}^{δ_{j_{2}}}} f (x^{†}, x_{k (δ_{j_{2}})}^{δ_{j_{2}}}) < \frac{1}{2}$ holds true. Using induction, it is therefore possible to construct a subsequence $(δ_{j_{m}})_{m \in N}$ with the property $D_{ξ_{k (δ_{j_{m}})}^{δ_{j_{m}}}} f (x^{†}, x_{k (δ_{j_{m}})}^{δ_{j_{m}}}) < \frac{1}{m} \text{ for all } m \in N,$ which implies that $\lim_{m \to \infty} D_{ξ_{k (δ_{j_{m}})}^{δ_{j_{m}}}} f (x^{†}, x_{k (δ_{j_{m}})}^{δ_{j_{m}}}) = 0,$ and since f is uniformly convex, $\lim_{m \to \infty} ‖ x_{k (δ_{j_{m}})}^{δ_{j_{m}}} - x^{†} ‖ = 0.$

5. Algorithms and numerical implementation

5.1. Determining the Lagrange multipliers

As before, we consider the function $G_{\hat{x}} (λ) = ‖ A x_{λ} - y^{δ} ‖^{r}$, where $x_{λ} = π (λ, \hat{x})$ represents the minimizer of the Tikhonov functional $T_{λ}^{δ} (x) = \frac{1}{r} λ ‖ A x - y^{δ} ‖^{r} + D_{ξ} f (x, \hat{x}) .$ In order to choose the Lagrange multiplier in the k-th iteration, our strategy consists in finding a $λ_{k} > 0$ such that $G_{x_{k - 1}} (λ_{k}) \in [a_{k}, b_{k}]$, where, in agreement with the interval (23), $a_{k} := ((1 - η_{0}) δ + η_{0} ‖ A x_{k - 1} - y^{δ} ‖)^{r} \quad \text{and} \quad b_{k} := ((1 - η_{1}) δ + η_{1} ‖ A x_{k - 1} - y^{δ} ‖)^{r} .$ Here $0 < η_{0} \leq η_{1} < 1$ are pre-defined constants, so that $a_{k} \leq b_{k}$.

We have employed three different methods to compute $λ_{k}$: the well-known secant and Newton methods and a third strategy, here called the adaptive method, which we now explain. Fix $σ_{1}$, $σ_{2} \in (0, 1)$, $c_{1} > 1$ and some initial value $λ_{0}^{δ} > 0$. In the k-th iteration, $k \geq 1$, we define $λ_{k}^{δ} = c_{k} λ_{k - 1}^{δ}$, where $c_{k} = \begin{cases} c_{k - 1} σ_{1}, & \text{if } G_{x_{k - 2}} (λ_{k - 1}^{δ}) < a_{k - 1}, \\ c_{k - 1} σ_{2}^{- 1}, & \text{if } G_{x_{k - 2}} (λ_{k - 1}^{δ}) > b_{k - 1}, \\ c_{k - 1}, & \text{otherwise}, \end{cases} \quad \text{for } k \geq 2.$ The idea behind the adaptive method is to observe the behaviour of the residual in the last iterations and to determine how much the Lagrange multiplier should be increased in the next iteration. For example, if the residual $G_{x_{k - 2}} (λ_{k - 1}^{δ}) = ‖ A x_{k - 1} - y^{δ} ‖^{r}$ lies to the left of the target interval $[a_{k - 1}, b_{k - 1}]$, then $λ_{k - 1}^{δ}$ was too large. We thus multiply $c_{k - 1}$ by a number $σ_{1} \in (0, 1)$ in order to reduce the rate of growth of the Lagrange multiplier $λ_{k}^{δ}$, trying to hit the target in the next iteration.
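The update rule of the adaptive method translates almost literally into code. The sketch below uses hypothetical function names and sample values; it only implements the three-case update of the growth factor $c_{k}$ and the relation $λ_{k}^{δ} = c_{k} λ_{k - 1}^{δ}$, not the minimization of the Tikhonov functional.

```python
def update_growth_factor(c_prev, G_prev, a_prev, b_prev, sigma1, sigma2):
    """Adaptive update of the growth factor c_k from the last observed residual.

    G_prev is the r-th power of the previous residual norm and
    [a_prev, b_prev] is the interval it was supposed to hit.
    """
    if not (0.0 < sigma1 < 1.0 and 0.0 < sigma2 < 1.0):
        raise ValueError("sigma1 and sigma2 must lie in (0, 1)")
    if G_prev < a_prev:
        # Residual dropped too far: the multiplier grew too fast, slow down.
        return c_prev * sigma1
    if G_prev > b_prev:
        # Residual did not drop enough: let the multiplier grow faster.
        return c_prev / sigma2
    return c_prev  # target hit, keep the current growth rate

def next_multiplier(lam_prev, c_k):
    """lambda_k = c_k * lambda_{k-1}."""
    return c_k * lam_prev

# Sample values (assumed) showing the three branches.
c_slow = update_growth_factor(2.0, G_prev=0.1, a_prev=0.2, b_prev=0.5,
                              sigma1=0.5, sigma2=0.5)
c_fast = update_growth_factor(2.0, G_prev=0.6, a_prev=0.2, b_prev=0.5,
                              sigma1=0.5, sigma2=0.5)
c_same = update_growth_factor(2.0, G_prev=0.3, a_prev=0.2, b_prev=0.5,
                              sigma1=0.5, sigma2=0.5)
print(c_slow, c_fast, c_same, next_multiplier(10.0, c_fast))
```

Only one minimization of the Tikhonov functional per outer iteration is needed with this rule, since the factor is corrected from a residual that has already been computed.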

Although the Newton method is efficient, in the sense that it normally finds a good approximation for the Lagrange multiplier in very few steps, it has the drawback of demanding the differentiability of the Tikhonov functional, and therefore it cannot be applied in all situations.

Because it does not require the evaluation of derivatives, the secant method can be used even for a nonsmooth Tikhonov functional. A disadvantage of this method is the high computational effort required to perform it.
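A derivative-free search for the multiplier can be sketched as follows. The example assumes the Hilbert special case $f = \frac{1}{2} ‖ \cdot ‖^{2}$, $r = 2$, where the residual of the Tikhonov minimizer has a closed form, and it aims the secant iteration at the midpoint of $[a_{k}, b_{k}]$; both choices, as well as the data and the function names, are illustrative assumptions rather than the implementation used in the experiments. The interval endpoints are taken as the r-th powers of the endpoints of (23).

```python
import numpy as np

def residual_power(A, y_delta, x_hat, lam):
    """G(lam) = ||A x_lam - y_delta||^2 for f = 0.5*||x||^2 and r = 2.

    In this special case the residual of the Tikhonov minimizer equals
    (lam * A A^T + I)^{-1} (y_delta - A x_hat) up to sign.
    """
    m = A.shape[0]
    res = np.linalg.solve(lam * (A @ A.T) + np.eye(m), y_delta - A @ x_hat)
    return float(res @ res)

def secant_multiplier(G, a, b, lam0=0.0, lam1=1e-2, max_iter=100):
    """Secant iteration on G(lam) - target, stopped once G(lam) is in [a, b]."""
    target = 0.5 * (a + b)
    g0, g1 = G(lam0), G(lam1)
    for _ in range(max_iter):
        if a <= g1 <= b:
            return lam1
        lam2 = lam1 - (g1 - target) * (lam1 - lam0) / (g1 - g0)
        lam0, g0 = lam1, g1
        lam1, g1 = lam2, G(lam2)
    raise RuntimeError("secant iteration did not reach the target interval")

# Illustrative data (assumed).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 1.0]])
y_delta = np.array([1.5, 0.1])
x_hat = np.zeros(4)
delta, eta0, eta1 = 0.1, 0.3, 0.6

prev = np.linalg.norm(A @ x_hat - y_delta)
a = ((1.0 - eta0) * delta + eta0 * prev) ** 2
b = ((1.0 - eta1) * delta + eta1 * prev) ** 2

G = lambda lam: residual_power(A, y_delta, x_hat, lam)
lam_k = secant_multiplier(G, a, b)
print(lam_k, a, G(lam_k), b)
```

Each evaluation of `G` corresponds to one minimization of the Tikhonov functional, which is where the computational cost of this strategy comes from.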

Among these three possibilities, the adaptive strategy is the cheapest one, since it only demands one minimization of the Tikhonov functional per iteration. Further, this simple strategy does not require the derivative of this functional, which makes it fit in a large range of applications.

Note that this third strategy may generate a $λ_{k}^{δ}$ such that $G_{x_{k - 1}} (λ_{k}^{δ}) \notin [a_{k}, b_{k}]$ in some iterative steps. This is the reason for correcting the factors $c_{k}$ in each iteration. In our numerical experiments, the condition $G_{x_{k - 1}} (λ_{k}^{δ}) \in [a_{k}, b_{k}]$ was satisfied in almost all steps.

5.2. Minimization of the Tikhonov functional

In our numerical experiments, we are interested in solving the inverse problem (1), where $A : L^{p} (Ω) \to L^{2} (Ω)$, with $1 < p < \infty$, is linear and bounded, noisy data $y^{δ}$ are available, and the noise level $δ > 0$ is known.

In order to implement the nIT method (Algorithm 1), a minimizer of the Tikhonov functional (22) needs to be calculated in each iteration step. Minimizing this functional can itself be a very challenging task. We have used two algorithms for achieving this goal in our numerical experiments:
1. The Newton method was used for minimizing this functional in the case $p \neq 2$ and with a smooth function f, which induces the Bregman distance in the penalization term.
2. The so-called ADMM method was employed in order to minimize the Tikhonov functional for the case $p = 2$ (Hilbert space) and a nonsmooth functional f.
In the following, we explain the details.

First we consider the Newton method. Define the Bregman distance induced by the norm-functional $f (g) := \frac{1}{p} ‖ g ‖_{L^{p}}^{p},$ $1 < p < \infty,$ which leads to the smooth penalization term $D_{ξ} f (g, h) = Δ_{p} (g, h),$ see Section 2. The resulting Tikhonov functional reads $T_{λ} (g) = \frac{1}{2} λ ‖ A g - y^{δ} ‖^{2} + Δ_{p} (g, g_{k - 1}),$ where $g_{k - 1}$ is the current iterate.³ In this case, the optimality condition (10) reads: $F (\bar{g}) = λ A^{*} y^{δ} + J_{p} (g_{k - 1}),$ (40) where $\bar{g} \in L^{p} (Ω)$ is the minimizer of $T_{λ} (g)$ and $F (g) := λ A^{*} A g + J_{p} (g)$.

In order to apply the Newton method to the nonlinear equation (40), one needs to evaluate the derivative of F, which (whenever it exists) is given by $F^{'} (g) = λ A^{*} A + J_{p}^{'} (g)$. Next we present an explicit expression for the Gâteaux-derivative $J_{p}^{'} (g)$ (the derivation of this expression is postponed to Appendix A, where the Gâteaux-differentiability of $J_{p}$ in $L^{p} (Ω)$, for $p \geq 2$, is investigated). Given $g \in L^{p} (Ω)$, with $p \geq 2$, it holds $(J_{p}^{'} (g)) (h) = ⟨ (p - 1) | g |^{p - 2}, h ⟩, \quad \forall h \in L^{p} (Ω),$ (41) where the linear operator $(p - 1) | g |^{p - 2} : h \mapsto (p - 1) | g (\cdot) |^{p - 2} h (\cdot)$ is to be understood in the pointwise sense. In the discretized setting, $J_{p}^{'} (g)$ is a diagonal matrix whose $i$-th diagonal element is $(p - 1) | g (x_{i}) |^{p - 2},$ with $x_{i}$ being the $i$-th point of the chosen mesh.

In our numerical simulations, we consider the situation where the sought solution is sparse and, therefore, the case $p \approx 1$ is of interest. We stress the fact that (A2), i.e. $γ^{''} (x) = (p - 1) | x |^{p - 2}$ (see Appendix A), holds true even for $1 < p < 2$ whenever $x \neq 0$. Using this fact, one can prove that (41) holds for these values of p, e.g. if g does not change sign in Ω (i.e. either $g > 0$ in Ω, or $g < 0$ in Ω) and the direction h is a bounded function in this set. However, these strong hypotheses are very difficult to check, and even if they are satisfied, we still expect stability problems when inverting the matrix $F^{'} (g)$ if the function g attains a small value at some point of the mesh, because the function in (A2) satisfies $γ^{''} (x) \to \infty$ as $x \to 0$. In order to avoid this kind of problem in our numerical experiments, we have replaced the $i$-th element on the diagonal of the matrix $J_{p}^{'} (g)$ by $\min \{(p - 1) | g (x_{i}) |^{p - 2}, 10^{6}\}$.
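The Newton iteration for (40) with the capped diagonal can be sketched in the discretized setting as follows. The code assumes unit mesh weights (so that the discrete $L^{p}$ norm is the plain $\ell^{p}$ norm), uses hypothetical function names, and adds a simple step-halving safeguard on the residual norm that is not described in the text; it is meant as an illustration of the structure of $F$ and $F^{'}$, not as the authors' implementation.

```python
import numpy as np

def duality_map(g, p):
    """Discrete duality mapping J_p(g) = |g|^(p-1) * sign(g), componentwise."""
    return np.sign(g) * np.abs(g) ** (p - 1)

def duality_map_derivative_diag(g, p, cap=1e6):
    """Diagonal of J_p'(g): (p-1)|g_i|^(p-2), capped at `cap` as in the text."""
    with np.errstate(divide="ignore"):
        d = (p - 1) * np.abs(g) ** (p - 2)
    return np.minimum(d, cap)

def newton_tikhonov(A, y_delta, g_prev, lam, p, tol=1e-10, max_iter=50):
    """Solve F(g) = lam*A^T y_delta + J_p(g_prev), F(g) = lam*A^T A g + J_p(g)."""
    rhs = lam * A.T @ y_delta + duality_map(g_prev, p)

    def F(g):
        return lam * A.T @ (A @ g) + duality_map(g, p)

    g = g_prev.astype(float).copy()
    for _ in range(max_iter):
        res = F(g) - rhs
        if np.linalg.norm(res) < tol:
            break
        jac = lam * A.T @ A + np.diag(duality_map_derivative_diag(g, p))
        step = np.linalg.solve(jac, -res)
        # Safeguard (an assumption of this sketch): halve the step until the
        # residual norm decreases.
        t = 1.0
        while np.linalg.norm(F(g + t * step) - rhs) >= np.linalg.norm(res) and t > 1e-8:
            t *= 0.5
        g = g + t * step
    return g

# Small consistency check with p = 3: build data so that g_true solves (40).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
p, lam = 3.0, 1.0
g_true = np.array([1.0, -0.5])
g_prev = np.array([0.2, 0.1])
y_delta = A @ g_true + np.linalg.solve(
    A.T, duality_map(g_true, p) - duality_map(g_prev, p)) / lam
g_newton = newton_tikhonov(A, y_delta, g_prev, lam, p)
print(g_newton)
```

For $1 < p < 2$ the capped diagonal is no longer the exact derivative near zeros of $g$, so the iteration then behaves like a quasi-Newton method and the safeguard becomes more relevant.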

The second method that we used in our experiments was the well-known Alternating Direction Method of Multipliers (ADMM), which has been implemented to minimize the Tikhonov functional associated with the inverse problem $A x = y^{δ}$ , where $X = Y = R^{n}$ , $A : R^{n} \to R^{n}$ , and $f : R^{n} \to \bar{R}$ is a nonsmooth function.

ADMM is an optimization scheme for solving linearly constrained programming problems with decomposable structure [13], which goes back to the works of Glowinski and Marrocco [14], and of Gabay and Mercier [15]. Specifically, this algorithm solves problems of the form
$$\min_{(x, z) \in R^{n} \times R^{m}} \{ φ (x) + ϕ (z) : M x + B z = d \}, \tag{42}$$
where $φ : R^{n} \to \bar{R}$ and $ϕ : R^{m} \to \bar{R}$ are convex proper l.s.c. functions, $M : R^{n} \to R^{l}$ and $B : R^{m} \to R^{l}$ are linear operators, and $d \in R^{l}$ .

ADMM solves the coupled problem (42) by performing a sequence of steps that decouple the functions ϕ and φ, making it possible to exploit the individual structure of these functions. It can be interpreted in terms of alternating minimization, with respect to x and z, of the augmented Lagrangian function associated with problem (42). Indeed, ADMM consists of the iterations
$$\begin{aligned} x_{k + 1} & = \arg\min_{x \in R^{n}} L_{ρ} (x, z_{k}, u_{k}), \\ z_{k + 1} & = \arg\min_{z \in R^{m}} L_{ρ} (x_{k + 1}, z, u_{k}), \\ u_{k + 1} & = u_{k} + ρ (M x_{k + 1} + B z_{k + 1} - d), \end{aligned}$$
where $ρ > 0$ and $L_{ρ}$ is the augmented Lagrangian function
$$L_{ρ} (x, z, u) := φ (x) + ϕ (z) + ⟨ u, M x + B z - d ⟩ + \frac{ρ}{2} ‖ M x + B z - d ‖_{2}^{2} .$$
The convergence results for ADMM guarantee, under suitable assumptions, that the sequences $(x_{k})_{k \in N}$ , $(z_{k})_{k \in N}$ and $(u_{k})_{k \in N}$ generated by the method are such that $M x_{k} + B z_{k} - d \to 0$ , $φ (x_{k}) + ϕ (z_{k}) \to s^{⋆}$ and $u_{k} \to u^{⋆}$ , where $s^{⋆}$ is the optimal value of problem (42) and $u^{⋆}$ is a solution of the dual problem associated with (42).

For minimizing the Tikhonov functional using ADMM we introduce an additional decision variable z such that the problem of minimizing $T_{λ_{k}^{δ}}^{δ} (x)$ over $x \in X$ is rewritten in the form (42).

The specific choice of the functions ϕ, φ and the operators M and B is problem dependent (for a concrete example see, e.g. Section 6.2). This allows us to exploit the special form of the functional $T_{λ_{k}^{δ}}^{δ}$ , and also to pose the minimization problem in a more suitable form, in order to be solved numerically.

In all numerical simulations presented in Section 6, the ADMM method is stopped when $‖ M x_{k} + B z_{k} - d ‖$ becomes smaller than a predefined threshold.
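To make the three ADMM updates concrete, the sketch below applies them to one simple instance of (42): $φ (x) = \frac{1}{2} ‖ A x - b ‖_{2}^{2}$ , $ϕ (z) = α ‖ z ‖_{1}$ , $M = I$ , $B = - I$ and $d = 0$ . This instance, the names and the default parameters are assumptions made for illustration; they are not the splitting used in the experiments. Besides the residual $‖ M x_{k} + B z_{k} - d ‖$ , the loop also monitors the change in z between iterations, an extra safeguard not mentioned in the text, because the residual alone can vanish before the iterates have settled.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Minimizer of kappa * ||z||_1 + 0.5 * ||z - v||^2 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_sparse_least_squares(A, b, alpha, rho=1.0, tol=1e-10, max_iter=5000):
    """ADMM for min 0.5 * ||A x - b||^2 + alpha * ||z||_1 subject to x - z = 0."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(max_iter):
        # x-update: zero gradient of L_rho(., z, u), a linear system
        x = np.linalg.solve(lhs, Atb - u + rho * z)
        # z-update: shrinkage of x + u / rho
        z_old = z
        z = soft_threshold(x + u / rho, alpha / rho)
        # multiplier update with the residual M x + B z - d = x - z
        u = u + rho * (x - z)
        if np.linalg.norm(x - z) <= tol and np.linalg.norm(z - z_old) <= tol:
            break
    return x, z, u
```

With $A = I$ the minimizer is the soft-thresholded data, which gives a quick sanity check of the implementation.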

6. Numerical experiments

6.1. Deconvolution

In what follows we consider the application of the nIT method to the deconvolution problem modelled by the linear integral operator
$$A x := \int_{0}^{1} K (s, t) x (t) \, d t = y (s),$$
where the kernel K is the continuous function defined by
$$K (s, t) = \begin{cases} 49 s (1 - t), & s \leq t, \\ 49 t (1 - s), & s > t. \end{cases}$$
This benchmark problem is considered in [3]. There, it is observed that $A : L^{p} [0, 1] \to C [0, 1]$ is continuous and bounded for $1 \leq p \leq \infty$ . Thus, $A : L^{p} [0, 1] \to L^{r} [0, 1]$ is compact for $1 \leq r < \infty$ .

In our experiment, A is replaced by the discrete operator $A_{d}$ , where the above integral is computed using a quadrature formula (trapezoidal rule) over a uniform partition of the interval $[0, 1]$ with 400 nodes.
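One way to assemble such a discrete operator is sketched below. It assumes that the 400 nodes include both endpoints of $[0, 1]$ and that the same nodes are used for s and t; the text fixes neither detail, and the function names are hypothetical.

```python
import numpy as np

def kernel(s, t):
    """K(s, t) = 49 s (1 - t) for s <= t, and 49 t (1 - s) for s > t."""
    return np.where(s <= t, 49.0 * s * (1.0 - t), 49.0 * t * (1.0 - s))

def discretize_operator(n=400):
    """Matrix of the trapezoidal-rule approximation of the integral operator."""
    nodes = np.linspace(0.0, 1.0, n)
    h = nodes[1] - nodes[0]
    weights = np.full(n, h)
    weights[0] = weights[-1] = h / 2.0  # end nodes carry half weight
    S, T = np.meshgrid(nodes, nodes, indexing="ij")
    # Row i approximates the integral of K(s_i, t) x(t) dt.
    return kernel(S, T) * weights[None, :], nodes
```

Since $K (s_{i}, \cdot)$ is piecewise linear with its kink at the node $s_{i}$ , the trapezoidal rule integrates it exactly, so applying the matrix to the constant vector of ones reproduces $\int_{0}^{1} K (s, t) \, d t = 24.5 \, s (1 - s)$ at the nodes up to rounding.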

The exact solution of the discrete problem is the vector $x^{⋆} \in R^{400}$ with $x^{⋆} (27) = 2$ , $x^{⋆} (72) = 1.25$ , $x^{⋆} (103) = 1.75$ , $x^{⋆} (255) = 1.25$ , $x^{⋆} (350) = 1.5$ and $x^{⋆} (i) = 0$ elsewhere.⁴

We compute $y = A_{d} x^{⋆}$ , the exact data, and add random Gaussian noise to $y \in R^{400}$ to get the noisy data $y^{δ}$ satisfying $‖ y - y^{δ} ‖_{Y} \leq δ$ .
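The sparse exact solution and the noisy data can be generated along the following lines. This sketch assumes that the indices of $x^{⋆}$ are 1-based, that $‖ \cdot ‖_{Y}$ is the plain Euclidean norm, and that the noise is rescaled so that $‖ y - y^{δ} ‖ = δ$ exactly; none of these choices is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def exact_solution(n=400):
    """Sparse vector x* with five spikes (positions read as 1-based indices)."""
    x = np.zeros(n)
    spikes = {27: 2.0, 72: 1.25, 103: 1.75, 255: 1.25, 350: 1.5}
    for index, value in spikes.items():
        x[index - 1] = value  # shift to Python's 0-based indexing
    return x

def add_noise(y, delta, rng):
    """Add Gaussian noise rescaled to have Euclidean norm exactly delta."""
    noise = rng.standard_normal(y.shape)
    return y + delta * noise / np.linalg.norm(noise)
```

A call such as `add_noise(y, 5e-4, np.random.default_rng(0))` then returns data at noise level $δ = 0.0005$ ; the seed is arbitrary.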

We follow [3] in the experimental setting and choose $δ = 0.0005$ , $τ = 1.001$ (discrepancy principle), and $Y = L^{2}$ . For the parameter space, two distinct choices are considered, namely $X = L^{1.001}$ and $X = L^{2}$ .

Numerical results for the deconvolution problem are presented in Figure 1 (for simplicity, all legends in this figure refer to the space $L^{1}$ ; however, $p = 1.001$ is used in the computations). The following methods are depicted:

(BLUE) $L^{2}$ -penalization, Geometric sequence;
(GREEN) $L^{2}$ -penalization, Secant method;
(RED) $L^{1.001}$ -penalization, Geometric sequence;
(PINK) $L^{1.001}$ -penalization, Secant method;
(BLACK) $L^{1.001}$ -penalization, Newton method.

Figure 1. Deconvolution problem: Numerical experiments.

The six pictures in Figure 1 represent:

[TOP] Iteration error in $L^{2}$ -norm (left);⁵ residual in $L^{2}$ -norm (right);
[CENTER] Number of linear systems/step (left); Lagrange multipliers (right);
[BOTTOM] exact solution and reconstructions with $L^{2}$ -penalization (left); exact solution and reconstructions with $L^{1.001}$ -penalization (right).

6.2. Image deblurring

In the sequel an application of the nIT method to an image deblurring problem is considered. This is a finite dimensional problem with spaces $X = R^{n} \times R^{n}$ and $Y = R^{n} \times R^{n}$ . The vector $x \in X$ represents the pixel values of the original image to be restored, and $y \in Y$ contains the pixel values of the observed blurred image. In practice, only noisy blurred data $y^{δ} \in Y$ satisfying (2) is available. The linear transformation A represents some blurring operator.

In the numerical simulations we consider the situation where the blur of the image is modelled by a space invariant point spread function (PSF). The exact solution is the $512 \times 512$ Barbara image (see Figure 2), and $y^{δ}$ is obtained by adding artificial noise to the exact data $y = A x$ (here A is the convolution operator corresponding to the PSF).

Figure 2. Image deblurring problem: (LEFT) Point Spread Function; (CENTER) Exact image; (RIGHT) Blurred image.

For this problem the nIT method is implemented with two distinct penalization terms, namely $f (x) = ‖ x ‖_{2}^{2}$ ( $L^{2}$ penalization) and $f (x) = \frac{μ}{2} ‖ x ‖_{2}^{2} + T V (x)$ ( $L^{2} + T V$ penalization). Here $μ > 0$ is a regularization parameter and $T V (x) = ‖ \nabla x ‖_{1}$ is the total variation norm of x, where $\nabla : R^{n} \times R^{n} \to (R^{n} \times R^{n}) \times (R^{n} \times R^{n})$ is the discrete gradient operator. We minimize the Tikhonov functional associated with the $L^{2} + T V$ penalization term using the ADMM described in Section 5.

In our experiments the values $μ = 10^{- 4}$ , $δ = 0.00001$ and $τ = 1.5$ are used. Moreover, $x_{0} = y^{δ}$ and $ξ_{0} = \nabla^{*} (\operatorname{sign} (\nabla x_{0}))$ are chosen as initial guesses.
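The discrete gradient, its adjoint, the total variation and the initial subgradient $ξ_{0}$ can be sketched as follows. The forward differences with a zero last row and column are an assumption (the text does not specify the discretization of $\nabla$ ), and the function names are hypothetical.

```python
import numpy as np

def grad(x):
    """Forward differences along both image axes (last row/column set to zero)."""
    gx = np.zeros_like(x, dtype=float)
    gy = np.zeros_like(x, dtype=float)
    gx[:-1, :] = x[1:, :] - x[:-1, :]
    gy[:, :-1] = x[:, 1:] - x[:, :-1]
    return gx, gy

def grad_adjoint(gx, gy):
    """Adjoint of grad with respect to the Euclidean inner product."""
    out = np.zeros_like(gx, dtype=float)
    out[:-1, :] -= gx[:-1, :]
    out[1:, :] += gx[:-1, :]
    out[:, :-1] -= gy[:, :-1]
    out[:, 1:] += gy[:, :-1]
    return out

def total_variation(x):
    """TV(x) = ||grad x||_1, the sum of absolute differences."""
    gx, gy = grad(x)
    return np.abs(gx).sum() + np.abs(gy).sum()

def initial_subgradient(x0):
    """xi_0 = grad^*(sign(grad x0))."""
    gx, gy = grad(x0)
    return grad_adjoint(np.sign(gx), np.sign(gy))
```

By construction $⟨ ξ_{0}, x_{0} ⟩ = ⟨ \operatorname{sign} (\nabla x_{0}), \nabla x_{0} ⟩ = ‖ \nabla x_{0} ‖_{1}$ , which is a convenient check of the adjoint.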

In Figure 2, the exact solution, the convolution kernel, and the noisy data are shown. The reconstructed images are shown in Figure 3, while the numerical results are presented in Figure 4. The following methods were implemented:

(BLUE) $L^{2}$ -penalization, Geometric sequence;
(RED) $L^{2} + T V$ -penalization, Geometric sequence;
(PINK) $L^{2} + T V$ -penalization, Secant method;
(GREEN) $L^{2} + T V$ -penalization, Adaptive method.

Figure 3. Image deblurring problem: Reconstructions (TOP LEFT) $L^{2}$ –Geometric; (TOP RIGHT) $L^{2} + T V$ –Geometric; (BOTTOM LEFT) $L^{2} + T V$ –Secant; (BOTTOM RIGHT) $L^{2} + T V$ –Adaptive.

Figure 4. Image deblurring problem: Numerical experiments.

The four pictures in Figure 4 represent:

[TOP] Iteration error $‖ x^{⋆} - x_{k}^{δ} ‖$ ;
[CENTER TOP] Residual $‖ A x_{k}^{δ} - y^{δ} ‖$ ;
[CENTER BOTTOM] Number of linear systems solved in each step;
[BOTTOM] Lagrange multiplier $λ_{k}$ .

Remark 6.1

The Tikhonov functional associated with the $L^{2} + T V$ penalization term is minimized using the ADMM described in Section 5. Note that, if $f (x) = \frac{μ}{2} ‖ x ‖_{2}^{2} + ‖ \nabla x ‖_{1}$ , then one needs to solve
$$\min_{x \in X} \frac{λ_{k}}{2} ‖ A x - y^{δ} ‖^{2} + \frac{μ}{2} ‖ x - x_{k - 1}^{δ} ‖^{2} + ‖ \nabla x ‖_{1} - ‖ \nabla x_{k - 1}^{δ} ‖_{1} - ⟨ ξ_{k - 1}^{δ}, x - x_{k - 1}^{δ} ⟩$$
in each iteration. In order to use ADMM, we cast this problem in the form (42) by defining $z = \nabla x$ , $φ (x) := \frac{λ_{k}}{2} ‖ A x - y^{δ} ‖^{2} + \frac{μ}{2} ‖ x - x_{k - 1}^{δ} ‖^{2} - ⟨ ξ_{k - 1}^{δ}, x - x_{k - 1}^{δ} ⟩$ , $ϕ (z) := ‖ z ‖_{1} - ‖ \nabla x_{k - 1}^{δ} ‖_{1}$ , $M = - \nabla$ , $B = I$ and $d = 0$ .
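A one-dimensional analogue of this splitting is sketched below, with a forward-difference matrix D standing in for $\nabla$ . The constant $- ‖ \nabla x_{k - 1}^{δ} ‖_{1}$ is dropped because it does not change the minimizer. The function names, the value of ρ, the tolerances and the additional stopping test on the change in z are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def forward_difference_matrix(n):
    """(n-1) x n matrix D with (D x)_i = x_{i+1} - x_i."""
    return np.diff(np.eye(n), axis=0)

def tikhonov_tv_step(A, y_delta, x_prev, xi_prev, lam, mu,
                     rho=1.0, tol=1e-10, max_iter=5000):
    """ADMM with M = -D, B = I, d = 0 for one Tikhonov minimization (1D sketch)."""
    n = A.shape[1]
    D = forward_difference_matrix(n)
    z = D @ x_prev
    u = np.zeros(n - 1)
    lhs = lam * A.T @ A + mu * np.eye(n) + rho * D.T @ D
    const = lam * A.T @ y_delta + mu * x_prev + xi_prev
    x = np.array(x_prev, dtype=float)
    for _ in range(max_iter):
        # x-update: phi is quadratic, so the minimizer solves a linear system
        x = np.linalg.solve(lhs, const + D.T @ (u + rho * z))
        Dx = D @ x
        # z-update: soft-thresholding of D x - u / rho with threshold 1 / rho
        z_old = z
        v = Dx - u / rho
        z = np.sign(v) * np.maximum(np.abs(v) - 1.0 / rho, 0.0)
        # multiplier update with the residual M x + B z - d = z - D x
        u = u + rho * (z - Dx)
        if np.linalg.norm(z - Dx) <= tol and np.linalg.norm(z - z_old) <= tol:
            break
    return x
```

For two unknowns with $A = I$ , $λ_{k} = 1$ , $μ = 0$ , data $(0, 4)$ and zero $x_{k - 1}^{δ}$ and $ξ_{k - 1}^{δ}$ , the minimizer of $\frac{1}{2} ‖ x - y ‖^{2} + | x_{2} - x_{1} |$ is $(1, 3)$ , which the iteration reproduces.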

6.3. Inverse potential problem

The third application considered in this section, the Inverse Potential Problem (IPP), consists of recovering a coefficient function $x : Ω \to R$ from measurements of the Cauchy data of its corresponding potential on the boundary of the domain $Ω = (0, 1) \times (0, 1)$ .

The direct problem is modelled by the linear operator $A : L^{2} (Ω) ∋ x \mapsto w_{ν} |_{\partial Ω} \in L^{2} (\partial Ω)$ , where $w \in H^{1} (Ω)$ solves the elliptic boundary value problem
$$Δ w = x \ \text{in } Ω; \quad w = 0 \ \text{on } \partial Ω . \tag{43}$$
Since $x \in L^{2} (Ω)$ , the Dirichlet boundary value problem in (43) admits a unique solution (known as potential) $w \in H^{2} (Ω) \cap H_{0}^{1} (Ω)$ [16].

The inverse problem we are concerned with consists in determining the piecewise constant source function x from measurements of the Neumann trace of w at $\partial Ω$ . Using the above notation, the IPP can be written in the abbreviated form (1), with data $y = w_{ν} |_{\partial Ω}$ .

In our implementations we follow [8] in the experimental setup: we set $Ω = (0, 1) \times (0, 1)$ and assume that $x^{⋆} \in H^{1} (Ω)$ is a function with sharp gradients (see Figure 6).

The boundary value problem (43) is solved for w using $x = x^{⋆}$ , and the exact data $y = w_{ν} |_{\partial Ω}$ for the inverse problem is computed. The noisy data $y^{δ}$ for the inverse problem is obtained by adding to y a normally distributed noise with zero mean, in order to achieve a prescribed relative noise level.
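A rough finite-difference sketch of the forward map $x \mapsto w_{ν} |_{\partial Ω}$ is given below. It assumes a uniform grid with n interior nodes per direction, the standard 5-point Laplacian, and first-order one-sided differences for the normal derivative; the text does not say how (43) is discretized, so all of this (including the dense matrix, which is only practical for small n) is illustrative.

```python
import numpy as np

def forward_operator(source):
    """Approximate Neumann data on the four edges for Laplace(w) = source, w = 0 on the boundary.

    `source` is an (n, n) array of values at the interior nodes, with axis 0
    along the first coordinate and axis 1 along the second.
    """
    n = source.shape[0]
    h = 1.0 / (n + 1)
    second_diff = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
                   + np.diag(np.ones(n - 1), -1)) / h**2
    identity = np.eye(n)
    laplacian = np.kron(second_diff, identity) + np.kron(identity, second_diff)
    w = np.linalg.solve(laplacian, source.ravel()).reshape(n, n)
    # Since w vanishes on the boundary, the outward normal derivative is
    # approximated by (0 - adjacent interior value) / h on every edge.
    return {"left": -w[0, :] / h, "right": -w[-1, :] / h,
            "bottom": -w[:, 0] / h, "top": -w[:, -1] / h}
```

For the source $x (s, t) = - 2 π^{2} \sin (π s) \sin (π t)$ the potential is $w = \sin (π s) \sin (π t)$ , whose outward normal derivative equals $- π \sin (π \cdot)$ on each edge; the sketch reproduces this up to discretization error.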

In our numerical implementations we set $τ = 5$ (discrepancy principle constant), $δ = 0.5\%$ (relative noise level), and the initial guess $x_{0} \equiv 1.5$ (a constant function in Ω). For the parameter space we choose $X = L^{p} (Ω)$ with $p = 1.5$ , while the data space is $Y = L^{2} (\partial Ω)$ .

Numerical results for the inverse potential problem are presented in Figure 5. The following methods are depicted:

(RED) $L^{p}$ -penalization, Geometric sequence;
(BLACK) $L^{p}$ -penalization, Newton method;
(BLUE) $L^{p}$ -penalization, Adaptive method.

Figure 5. Inverse Potential problem: Numerical experiments.

Figure 6. Inverse Potential problem: (TOP LEFT) Exact solution $x^{⋆}$ . Reconstructions (TOP RIGHT) $L^{1}$ –Geometric; (BOTTOM LEFT) $L^{1}$ –Newton; (BOTTOM RIGHT) $L^{1}$ –Adaptive.

The four pictures in Figure 5 represent: [TOP] Iteration error in $L^{p}$ -norm; [CENTER TOP] Residual in $L^{2}$ -norm; [CENTER BOTTOM] Number of linear systems/step; [BOTTOM] Lagrange multipliers.

The corresponding reconstruction results are shown in Figure 6: [TOP LEFT] Exact solution; [TOP RIGHT] Geometric method; [BOTTOM LEFT] Newton method; [BOTTOM RIGHT] Adaptive method.

7. Conclusions

In this manuscript we investigate a novel strategy for choosing the regularization parameters in the nonstationary iterated Tikhonov (nIT) method for solving ill-posed operator equations modelled by linear operators acting between Banach spaces. The novelty of our approach consists in defining strategies for choosing a sequence of regularization parameters (i.e. Lagrange multipliers) for the nIT method.

A preliminary (numerical) investigation of this method was conducted in [9]. In the present manuscript we derive a complete convergence analysis, proving convergence, stability and semi-convergence results (see Section 4). In Sections 6.1 and 6.2 we revisit two numerical applications discussed in [9]. Moreover, in Section 6.3 we investigate a classical benchmark problem in the inverse problems literature, namely the 2D elliptic Inverse Potential Problem.

The Lagrange multipliers are chosen (a posteriori) in order to enforce a fast decay of the residual functional (see Algorithm 1 and Section 4). The computation of these multipliers is performed by means of three distinct methods: (1) a secant type method; (2) a Newton type method; (3) an adaptive method using a geometric sequence with non-constant growth rate, where the rate is updated after each step.

The computation of the iterative step of the nIT method requires the minimization of a Tikhonov functional (see Section 4). This task is solved here using two distinct methods: (1) in the case of smooth penalization and Banach parameter-spaces, the optimality condition (related to the Tikhonov functional) leads to a nonlinear equation, which is solved using a Newton type method; (2) in the case of nonsmooth penalization and Hilbert parameter-space, ADMM is used for minimizing the Tikhonov functional.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

A.L. acknowledges support from CNPq [grant number 311087/2017-5], and from the AvH Foundation.

Notes

1 The differentiability and convexity properties of this functional are independent of the particular choice of p > 1.

2 This Lemma guarantees that, given a reflexive Banach space E, and a nonempty closed convex set $A \subset E$ , then any convex l.s.c proper function $ϕ : A \to (- \infty, \infty]$ achieves its minimum on A.

3 Here (1) is replaced by $A g = y^{δ}$ .

4 Notice that we are dealing with a discrete inverse problem, and discretization errors associated with the continuous model are not being considered.

5 For the purpose of comparison, the iteration error is plotted in the $L^{2}$ -norm for both choices of the parameter space $X = L^{2}$ and $X = L^{1.001}$ .

References

[1] Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Dordrecht: Kluwer Academic Publishers Group; 1996. (Mathematics and its applications; 375). MR 1408680.
[2] Hanke M, Groetsch CW. Nonstationary iterated Tikhonov regularization. J Optim Theory Appl. 1998;98(1):37–53. doi: 10.1023/A:1022680629327
[3] Jin Q, Stals L. Nonstationary iterated Tikhonov regularization for ill-posed problems in Banach spaces. Inverse Probl. 2012;28(3):104011.
[4] Schuster T, Kaltenbacher B, Hofmann B, et al. Regularization methods in Banach spaces. Berlin: de Gruyter; 2012.
[5] De Cezaro A, Baumeister J, Leitão A. Modified iterated Tikhonov methods for solving systems of nonlinear ill-posed equations. Inverse Probl Imaging. 2011;5(1):1–17. doi: 10.3934/ipi.2011.5.1
[6] Jin Q, Zhong M. Nonstationary iterated Tikhonov regularization in Banach spaces with uniformly convex penalty terms. Numer Math. 2014;127(3):485–513. MR 3216817. doi: 10.1007/s00211-013-0594-9
[7] Lorenz D, Schoepfer F, Wenger S. The linearized Bregman method via split feasibility problems: analysis and generalizations. SIAM J Imaging Sci. 2014;7(2):1237–1262. doi: 10.1137/130936269
[8] Boiger R, Leitão A, Svaiter BF. Range-relaxed criteria for choosing the Lagrange multipliers in nonstationary iterated Tikhonov method. IMA J Numer Anal. 2019;39(1). doi: 10.1093/imanum/dry066
[9] Machado MP, Margotti F, Leitão A. On nonstationary iterated Tikhonov methods for ill-posed equations in Banach spaces. In: Hofmann B, Leitão A, Zubelli JP, editors. New trends in parameter identification for mathematical models. Cham: Springer International Publishing; 2018. p. 175–193.
[10] Cioranescu I. Geometry of Banach spaces, duality mappings and nonlinear problems. Dordrecht: Kluwer Academic Publishers Group; 1990. (Mathematics and its applications; 62). MR 1079061.
[11] Brezis H. Functional analysis, Sobolev spaces and partial differential equations. New York: Springer; 2011. (Universitext). MR 2759829.
[12] Rockafellar RT. Conjugate duality and optimization. Philadelphia (PA): Society for Industrial and Applied Mathematics; 1974. (Conference Board of the Mathematical Sciences Regional Conference Series in Applied Mathematics; 16). Lectures given at the Johns Hopkins University, Baltimore, Md., June 1973. MR 0373611.
[13] Eckstein J, Bertsekas DP. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Program. 1992;55(3):293–318. MR 1168183. doi: 10.1007/BF01581204
[14] Glowinski R, Marrocco A. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Rev. Française Automat. Informat. Recherche Opérationnelle Sér. Rouge Anal. Numér. 1975;9(R-2):41–76. MR 0388811.
[15] Gabay D, Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl. 1976;2(1):17–40. doi: 10.1016/0898-1221(76)90003-1
[16] Evans LC. Partial differential equations. Providence, RI: American Mathematical Society; 1998. (Graduate studies in mathematics; 19).

Appendix

In this appendix we discuss how to compute the Gâteaux derivative of $J_{p}$ . Given $g \in L^{p} (Ω)$ , the key idea for deriving a formula for $J_{p}^{'} (g)$ is to observe the differentiability of the function $γ : R ∋ x \mapsto p^{- 1} | x |^{p} \in R$ . This function is differentiable in $R$ whenever $p > 1$ and, in this case, it holds
$$γ^{'} (x) = | x |^{p - 1} \operatorname{sign} (x), \quad \text{where} \quad \operatorname{sign} (x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x = 0, \\ - 1, & \text{if } x < 0. \end{cases} \tag{A1}$$
Furthermore, γ is twice differentiable in $R$ if $p \geq 2$ , with second derivative given by
$$γ^{''} (x) = (p - 1) | x |^{p - 2} . \tag{A2}$$
This formula still holds true for $1 < p < 2$ , but only in $R ∖ \{ 0 \}$ . In this case, $γ^{''} (0)$ does not exist and $γ^{''} (x)$ grows to infinity as x approaches zero.

Since $J_{p} (g) = (\frac{1}{p} ‖ g ‖_{L^{p}}^{p})^{'}$ can be identified with
$$J_{p} (g) = | g |^{p - 1} \operatorname{sign} (g) \tag{A3}$$
(see [10]), which looks very similar to $γ^{'}$ in (A1), the bounded linear operator $J_{p}^{'} (g) : L^{p} (Ω) \to L^{p^{*}} (Ω)$ should be in some sense similar to $γ^{''}$ in (A2). We then fix $g \in L^{p} (Ω)$ with $p \geq 2$ and try to prove (41), i.e. $(J_{p}^{'} (g)) (h) = ⟨ (p - 1) | g |^{p - 2}, h ⟩, \forall h \in L^{p} (Ω),$ where the linear operator $(p - 1) | g |^{p - 2} : h \mapsto (p - 1) | g (\cdot) |^{p - 2} h (\cdot)$ is to be understood in pointwise sense. This ensures that $J_{p}$ is Gâteaux-differentiable in $L^{p} (Ω)$ and its derivative $J_{p}^{'}$ can be identified with $(p - 1) | \cdot |^{p - 2}$ .

Note that, given $h \in L^{p} (Ω)$ , equality (41) holds iff $\lim_{t \to 0} f_{t} = (p - 1) | g (\cdot) |^{p - 2} h (\cdot)$ , where $f_{t} : Ω \to R$ is defined by $f_{t} (x) := t^{- 1} [J_{p} (g + t h) - J_{p} (g)] (x)$ . This is equivalent to proving that $\lim_{t \to 0} ‖ f_{t} - (p - 1) | g (\cdot) |^{p - 2} h (\cdot) ‖_{L^{p^{*}} (Ω)}^{p^{*}} = 0.$ In view of (A3) and (A2), it follows that for each fixed $x \in Ω$ we have
$$\begin{aligned} \lim_{t \to 0} f_{t} (x) & = \lim_{t \to 0} t^{- 1} [| g (x) + t h (x) |^{p - 1} \operatorname{sign} (g (x) + t h (x)) - | g (x) |^{p - 1} \operatorname{sign} (g (x))] \\ & = \frac{d}{d t} [| g (x) + t h (x) |^{p - 1} \operatorname{sign} (g (x) + t h (x))]_{t = 0} \\ & = [(p - 1) | g (x) + t h (x) |^{p - 2} h (x)]_{t = 0} = (p - 1) | g (x) |^{p - 2} h (x) . \end{aligned}$$
Thus, making use of Lebesgue's Dominated Convergence Theorem, we conclude that
$$\begin{aligned} \lim_{t \to 0} ‖ f_{t} - (p - 1) | g (\cdot) |^{p - 2} h (\cdot) ‖_{L^{p^{*}} (Ω)}^{p^{*}} & = \lim_{t \to 0} \int_{Ω} | f_{t} (x) - (p - 1) | g (x) |^{p - 2} h (x) |^{p^{*}} \, d x \\ & = \int_{Ω} | \lim_{t \to 0} f_{t} (x) - (p - 1) | g (x) |^{p - 2} h (x) |^{p^{*}} \, d x = 0, \end{aligned}$$
which proves (41).
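The pointwise limit used in this argument can be checked numerically for a discretized g. The sketch below compares the difference quotient $f_{t}$ with the right-hand side of (41) for a small fixed t; the function names and the step size are hypothetical, and the check illustrates the formula for $p \geq 2$ without replacing the proof.

```python
import numpy as np

def duality_map(g, p):
    """Componentwise J_p(g) = |g|^(p-1) * sign(g), cf. (A3)."""
    return np.abs(g) ** (p - 1) * np.sign(g)

def difference_quotient(g, h, p, t=1e-6):
    """f_t = [J_p(g + t h) - J_p(g)] / t, evaluated componentwise."""
    return (duality_map(g + t * h, p) - duality_map(g, p)) / t

def derivative_formula(g, h, p):
    """Right-hand side of (41): (p - 1) |g|^(p-2) h, evaluated componentwise."""
    return (p - 1) * np.abs(g) ** (p - 2) * h
```

For $p = 3$ , $g = (1, - 2, 0.5)$ and $h = (1, 1, - 1)$ , both expressions agree with $(2, 4, - 1)$ up to an error of order t.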
