Full article: Dualization and Automatic Distributed Parameter Selection of Total Generalized Variation via Bilevel Optimization

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

Total Generalized Variation (TGV) regularization in image reconstruction relies on an infimal convolution type combination of generalized first- and second-order derivatives. This helps to avoid the staircasing effect of Total Variation (TV) regularization, while still preserving sharp contrasts in images. The associated regularization effect crucially hinges on two parameters whose proper adjustment represents a challenging task. In this work, a bilevel optimization framework with a suitable statistics-based upper level objective is proposed in order to automatically select these parameters. The framework allows for spatially varying parameters, thus enabling better recovery in high-detail image areas. A rigorous dualization framework is established, and for the numerical solution, a Newton type method for the solution of the lower level problem, i.e. the image reconstruction problem, and a bilevel TGV algorithm are introduced. Denoising tests confirm that automatically selected distributed regularization parameters lead in general to improved reconstructions when compared to results for scalar parameters.

Keywords:

1. Introduction

In this work we analyze and implement a bilevel optimization framework for automatically selecting spatially varying regularization parameters $α : = (α_{0}, α_{1}) \in C {(\bar{Ω})}^{2}, α > 0,$ in the following image reconstruction problem: (1.1) $minimize \frac{1}{2} \int_{Ω} {(T u - f)}^{2} d x + {TGV}_{α}^{2} (u) over u \in BV (Ω),$ (1.1) where the second-order Total Generalized Variation (TGV) regularization is given by (1.2) $\begin{matrix} {TGV}_{α}^{2} (u) = \sup {\int_{Ω} u {div}^{2} ϕ d x : ϕ \in C_{c}^{\infty} (Ω, S^{d \times d}), | ϕ (x) | \leq α_{0} (x), \\ | div ϕ (x) | \leq α_{1} (x), for all x \in Ω} . \end{matrix}$ (1.2)

Here, $Ω \subseteq R^{d}$ is a bounded, open image domain with Lipschitz boundary, $S^{d \times d}$ denotes the space of d × d symmetric matrices, $T : L^{2} (Ω) \to L^{2} (Ω)$ is a bounded linear (output) operator, and f denotes given data which satisfies (1.3) $f = T u_{true} + η .$ (1.3)

In this context, η models a highly oscillatory (random) component with zero mean and known quadratic deviation (variance) $σ^{2}$ from the mean. Further, $L^{2} (Ω)$ denotes the standard Lebesgue space [Citation1], and $| \cdot |,$ represents the Euclidean vector norm or its associated matrix norm. The space of infinitely differentiable functions with compact support in $Ω$ and values in $S^{d \times d}$ is denoted by $C_{c}^{\infty} (Ω, S^{d \times d}) .$ Further, we refer to Section 2 for the definition of the first- and second-order divergences $div$ and ${div}^{2},$ respectively.

Originally, the TGV functional was introduced for scalar parameters $α_{0}, α_{1} > 0$ only; see [Citation2]. It serves as a higher order extension of the well-known Total Variation (TV) regularizer [Citation3,Citation4], preserves edges (i.e., sharp contrast) [Citation5,Citation6], and promotes piecewise affine reconstructions while avoiding the often adverse staircasing effect (i.e., piecewise constant structures) of TV [Citation7–9]; see for an illustration. These properties of TGV have made it a successful regularizer in variational image restoration for a variety of applications [Citation2, Citation10–16]. Extensions to manifold-valued data, multimodal and dynamic problems [Citation17–22] have been proposed as well. In all of these works, the choice of the scalar parameters $α_{0}, α_{1}$ is made “manually” via a direct grid search. Alternatively, selection schemes relying on a known ground truth u_true have been studied; see [Citation23–26]. The latter approach, however, is primarily of interest when investigating the mere capabilities of TGV regularization.

Figure 1. Gaussian denoising: Typical difference between TV (piecewise constant) and TGV reconstructions (piecewise affine).

While there exist automated parameter choice rules for TV regularization, see for instance [Citation27] and the references therein, analogous techniques and results for the TGV parameters are very scarce. One of the very few contributions is [Citation28] where, however, a spatially varying fidelity weight rather then regularization parameter is computed. Compared to the choice of the regularization weight in TV-based models, the infimal convolution type regularization incorporated into the TGV functional significantly complicates the selection; compare the equivalent definition (2.1) below. Further difficulties arise when these parameters are spatially varying as in (1.2). In that case, by appropriately choosing $α = (α_{0}, α_{1}),$ one wishes to smoothen homogeneous areas in the image while preserving fine scale details. The overall target is then to not only select the parameters in order to reduce noise while avoiding oversmoothing, as in the TV case, but also to ensure that the interplay of α₀ and α₁ will not produce any staircasing.

For this delicate selection task and inspired by [Citation27, Citation29] for TV, in this work we propose a bilevel minimization framework for an automated selection of $α$ in the TGV case. Formally, the setting can be characterized as follows: (1.4) ${\begin{matrix} minimize a statistics - based (upper level) objective over (u, α) \\ subject t o u solving (1.1) for a regularization weight α = (α_{0}, α_{1}) . \end{matrix}$ (1.4)

Note here that the optimization variable $α$ enters the lower level minimization problem (1.1) as a parameter, thus giving rise to $u = u (α) .$ We also mention that this optimization format falls into the general framework which is discussed in our review paper [Citation30] where the general opportunities and mathematical as well as algorithmic aspects of bilevel optimization in generating structured non-smooth regularization functionals are discussed in detail.

As our statistical set-up parallels the one in [Citation27, Citation29], here we resort to the upper level objective proposed in that work. It is based on localized residuals $R : L^{2} (Ω) \to L^{\infty} (Ω)$ with (1.5) $R u (x) = \int_{Ω} w (x, y) {(T u - f)}^{2} (y) d y,$ (1.5) where $w \in L^{\infty} (Ω \times Ω)$ with $\int_{Ω} \int_{Ω} w (x, y) dxdy = 1 .$ Note that $R u (x)$ can be interpreted as a local variance keeping in mind that, assuming Gaussian noise of variance $σ^{2},$ we have that $\int_{Ω} {(T u_{true} - f)}^{2} d x = \int_{Ω} η^{2} d x = σ^{2} | Ω | .$ Consequently, if a reconstructed image u is close to u_true then it is expected that for every $x \in Ω$ the value of $R u (x)$ will be close to $σ^{2} .$ Hence it is natural to consider an upper level objective which aims to approximately keep $R u$ within a corridor ${\underline{σ}}^{2} \leq σ^{2} \leq {\bar{σ}}^{2}$ with positive bounds ${\underline{σ}}^{2}, {\bar{σ}}^{2} .$ This can be achieved by utilizing the function $F : L^{2} (Ω) \to R$ with (1.6) $F (v) : = \frac{1}{2} \int_{Ω} \max {(v - {\bar{σ}}^{2}, 0)}^{2} d x + \frac{1}{2} \int_{Ω} \min {(v - {\underline{σ}}^{2}, 0)}^{2} d x .$ (1.6)

The function $F (R \cdot)$ is then indeed suitable as an upper level objective. This is demonstrated in , where we show (in the middle and right plots) the objective values for a series of scalar TGV denoising results and for a variety of parameters $(α_{0}, α_{1})$ for the image depicted on the left. Regarding the choices of $\underline{σ}, \bar{σ}, w$ we refer to Section 5. Upon inspection of we find that the functional $F (R \cdot)$ is minimized for a pair of scalar parameters $(α_{0}, α_{1})$ that is close to the one maximizing the peak-signal-to-noise-ratio (PSNR). Note, however, that in order to truly optimize the PSNR, one would need the ground truth image u_true, which is of course typically not available. In contrast to this, we emphasize that $F (R \cdot)$ does not involve any ground truth information. Rather, it only relies on statistical properties of the noise.

Figure 2. Suitability of the functional $F (R \cdot)$ as an upper level objective. Evaluation of $F (R u)$ where u solves the TGV denoising problem (1.1) (T = Id), for a variety of scalar parameters $(α_{0}, α_{1}) .$

Figure 2. Suitability of the functional F(R·) as an upper level objective. Evaluation of F(Ru) where u solves the TGV denoising problem (1.1) (T = Id), for a variety of scalar parameters (α0,α1).

For analytical and numerical reasons, rather than having (1.1) as the lower level problem for the bilevel minimization framework (1.4), one can use alternative formulations, as was done for instance in [Citation27, Citation29] for TV models, where the Fenchel predual problem was used instead, see also [Citation30] for a thorough discussion on the corresponding TGV model. This yields a bilevel problem which is expressed in terms of dual variables and is equivalent to the one stated in terms of the primal variable u. In this way, one has to treat a more amenable variational inequality of the first kind rather than one of second kind in the primal setting in the constraint system of the resulting bilevel optimization problem. Numerically, one may then utilize very efficient and resolution independent, function space based solution algorithms, like (inexact) semismooth Newton methods [Citation31]. The other option that we will consider in this work, is to minimize the upper level objective subject to the primal-dual optimality conditions, for which Newton methods can also be applied for their solution, see for instance [Citation32] for an inexact semismooth Newton solver which operates on the primal-dual optimality conditions for TV regularization. We should also mention that this approach requires $T^{*} T$ to be invertible, with $T^{*}$ being the adjoint of T, which is true when T is injective with closed range. We note that this does not exclude the use of our bilevel scheme to inverse problems whose forward operator does not satisfy this condition. A noninvertibility of $T^{*} T$ can for instance be treated by adding a small regularization term of the form $\frac{κ}{2} \int_{Ω} u^{2} d x,$ in combination with a Levenberg-Marquardt algorithmic scheme that sends the parameter κ to zero along the iterations. Alternatively, instead of Newton, one can resort to first-order methods for solving the lower level problem for which invertibility of $T^{*} T$ is not required. Such approaches are definitely interesting and necessary for many inverse problems however since they would deviate from the main focus of our paper which is the automatic computation of spatially distributed regularization weights, we will keep this invertibility assumption in the following.

2. The structure of the paper

Basic facts on the TGV functional with spatially varying parameters along with functional analytic foundations needed for (pre)dualization are the subjects of Section 2. Section 2.4 is concerned with the derivation of the predual problem of (1.1) and the corresponding primal-dual optimality conditions. Regularized versions of these conditions are the focus of Section 3. Besides respective primal-dual optimality conditions, we study the asymptotic behavior of these problems and their associated solutions under vanishing regularization. Section 4 introduces the bilevel TGV problem for which the primal-dual optimality conditions serve as constraints. The numerical solution of the proposed bilevel problem is the subject of Section 5. It is also argued that every regularized instance of the lower level problem can be solved efficiently by employing an (inexact) semismooth Newton method. The paper ends by a report on extensive numerical tests along with conclusions drawn from these computational results.

Summarizing, this work provides not only a user-friendly and novel hierarchical variational framework for automatic selection of the TGV regularization parameters, but by making these parameters spatially dependent it leads to an overall performance improvement; compare, e.g., the results in Section 5.

3. The dual form of the weighted TGV functional

3.1. Total generalized variation

We recall here some basic facts about the TGV functional (1.2) with constant parameters $α_{0}, α_{1}$ and assume throughout that the reader is familiar with the basic concepts of functions of bounded variation (BV); see [Citation33] for a detailed account. For a function $ϕ \in C_{c}^{\infty} (Ω, S^{d \times d})$ the first- and second-order divergences are, respectively, given by ${(div ϕ)}_{i} = \sum_{j = 1}^{d} \frac{\partial ϕ_{i j}}{\partial x_{j}}, i = 1, \dots, d, and {div}^{2} ϕ = \sum_{i = 1}^{d} \frac{\partial^{2} ϕ_{i i}}{\partial x_{i}^{2}} + 2 \sum_{i < j} \frac{\partial^{2} ϕ_{i j}}{\partial x_{i} \partial x_{j}} .$

In [Citation14] it was shown that a function $u \in L^{1} (Ω)$ has finite $TGV$ value if and only if it belongs to $BV (Ω) .$ Here $BV (Ω)$ denotes the Banach space of function of bounded variation over $Ω$ with associated norm $‖ \cdot ‖_{BV (Ω)} .$ Moreover, the bounded generalized variation norm $‖ \cdot ‖_{BGV} : = ‖ \cdot ‖_{L^{1} (Ω)} + {TGV}_{α}^{2} (\cdot)$ is equivalent to $‖ \cdot ‖_{BV (Ω)} .$ Similarly to TV, TGV is a convex functional which is lower semicontinuous with respect to the strong L¹ convergence. In [Citation14, Citation34] it is demonstrated that the TGV functional can be equivalently written as (2.1) ${TGV}_{α}^{2} (u) = \min_{w \in BD (Ω)} α_{1} | D u - w | (Ω) + α_{0} | E w | (Ω),$ (2.1) where $BD (Ω)$ is the Banach space of functions of bounded deformation, with $E$ denoting the distributional symmetrized gradient [Citation35,Citation36]. The asymptotical behavior of the TGV model in image restoration with respect to scalars $α_{0}, α_{1}$ was studied in [Citation37]; see also [Citation6]. For instance, when T = Id and either α₀ or α₁ converges to zero, then the corresponding solutions of (1.1) converge (weakly $^{*}$ in $BV (Ω)$ ) to f. When both of the parameters are sent to infinity, then the solutions converge weakly $^{*}$ to the L²-linear regression solution for f. We further note that the set of affine functions constitutes the kernel of the TGV functional.

For specific symmetric functions u, there exist combinations of $α_{0}, α_{1}$ such that ${TGV}_{α} (u) = α_{1} TV (u) .$ In general one can show that there exists a constant C > 0 such that if $α_{0} / α_{1} > C,$ then the $TGV$ value does not depend on α₀ and, up to an affine correction, it is equivalent to $TV .$ In that case the reconstructed images still suffer from a kind of (affine) staircasing effect [Citation37].

The fine structure of TGV reconstructions has been studied analytically mainly in dimension one in [Citation5, Citation38–41]. Under some additional regularity assumptions (compare [Citation6]) it can be shown that for TGV denoising the jump set of the solution is essentially contained in the jump set of the data; see [Citation42] for the TV case.

3.2. The space $W_{0}^{q} ({div}^{2}; Ω)$

Next we introduce several function spaces which will be useful in our subsequent development. For this purpose, let $1 \leq q \leq \infty$ and $p \in L^{q} (Ω, R^{d}) .$ Recall that $div p \in L^{q} (Ω)$ if there exists $w \in L^{q} (Ω)$ such that $\int_{Ω} \nabla ϕ \cdot p d x = - \int_{Ω} ϕ w d x, for all ϕ \in C_{c}^{\infty} (Ω) .$

In that case w is unique and we set $div p = w .$ Based on this first-order divergence, we define the Banach space $W^{q} (div; Ω) : = {p \in L^{q} (Ω, R^{d}) : div p \in L^{q} (Ω)},$ endowed with the norm $‖ p ‖_{W^{q} (div; Ω)}^{q} : = ‖ p ‖_{L^{q} (Ω, R^{d})}^{q} + ‖ div p ‖_{L^{q} (Ω)}^{q} .$ Similarly one obtains the Banach space $W^{q} ({div}^{2}; Ω)$ as the space of all functions $p \in L^{q} (Ω, S^{d \times d})$ whose first- and second-order divergences, $div p$ and ${div}^{2} p,$ respectively, belong to $L^{q} (Ω) .$ We note that for a $p \in L^{q} (Ω, S^{d \times d}),$ we have that $div p \in L^{q} (Ω, R^{d})$ if there exists an $ω \in L^{q} (Ω, R^{d})$ such that $\int_{Ω} E ϕ \cdot p d x = - \int_{Ω} ϕ \cdot ω d x, for all ϕ \in C_{c}^{\infty} (Ω, R^{d}),$ with $E ϕ$ denoting the L¹ function representing the absolutely continuous part of $E ϕ,$ with respect to the Lebesgue measure. Note that since $ϕ$ is smooth, we have $E ϕ = E ϕ .$ As before ω is unique and we set $div p = ω .$ Finally ${div}^{2} p \in L^{q} (Ω)$ if there exists a function $v \in L^{q} (Ω)$ such that $\int_{Ω} \nabla ϕ \cdot div p d x = - \int_{Ω} ϕ v d x, for all ϕ \in C_{c}^{\infty} (Ω) .$

This space is equipped with the norm $‖ p ‖_{W^{q} ({div}^{2}; Ω)}^{q} : = ‖ p ‖_{L^{q} (Ω)}^{q} + ‖ div p ‖_{L^{q} (Ω, R^{d})}^{q} + ‖ {div}^{2} p ‖_{L^{q} (Ω)}^{q} .$ We refer to [Citation12] for a more general definition of these spaces. Note that when q = 2 these spaces are Hilbertian and then the standard notation is $H (div; Ω)$ and $H ({div}^{2}; Ω);$ see [Citation43]. The Banach spaces $W_{0}^{q} (div; Ω)$ and $W_{0}^{q} ({div}^{2}; Ω)$ are defined as $W_{0}^{q} (div; Ω) = {\bar{C_{c}^{\infty} (Ω, R^{d})}}^{‖ \cdot ‖_{W^{q} (div; Ω)}}, W_{0}^{q} ({div}^{2}; Ω) = {\bar{C_{c}^{\infty} (Ω, S^{d \times d})}}^{‖ \cdot ‖_{W^{q} ({div}^{2}; Ω)}},$ with the analogous notation $H_{0} (div; Ω)$ and $H_{0} ({div}^{2}; Ω)$ for q = 2. Using the definitions above, the following integration by parts formulae hold true: (2.2) $\int_{Ω} \nabla ϕ \cdot p d x = - \int_{Ω} ϕ div p d x, for all p \in W_{0}^{q} (div; Ω), ϕ \in C^{\infty} (\bar{Ω}, R),$ (2.2) (2.3) $\int_{Ω} E ϕ \cdot p d x = - \int_{Ω} ϕ \cdot div p d x, for all p \in W_{0}^{q} ({div}^{2}; Ω), ϕ \in C^{\infty} (\bar{Ω}, R^{d}),$ (2.3) (2.4) $\int_{Ω} \nabla ϕ \cdot div p d x = - \int_{Ω} ϕ {div}^{2} p d x, for all p \in W_{0}^{q} ({div}^{2}; Ω), ϕ \in C^{\infty} (\bar{Ω}, R) .$ (2.4)

3.3. Weighted TGV

Throughout the remainder of this work we use the weighted TGV functional (1.2) with $α_{0}, α_{1} \in C (\bar{Ω})$ and $α_{0} (x), α_{1} (x) > \underline{α} > 0,$ $\underline{α} \in R, x \in Ω .$ We denote by $| \cdot |$ the finite dimensional Euclidean norm. We note that for an $R^{ℓ}$ -valued finite Radon measure $μ = (m_{1}, \dots, m_{ℓ}),$ we denote by $| μ |$ its total variation measure, where for every Borel $E \subseteq Ω$ $| μ | (E) = \sup {\sum_{n = 0}^{\infty} | μ (E_{n}) | : E_{n} Borel pairwise disjoint, E = \cup_{n = 0}^{\infty} E_{n}} .$

Note also that it can be shown $| μ | (Ω) = \sup {\sum_{i = 1}^{ℓ} \int_{Ω} ϕ_{i} d μ_{i} : ϕ \in C_{c}^{\infty} (Ω, R^{ℓ}), | ϕ (x) | \leq 1, for all x \in Ω} .$

We will show that for $u \in L^{2} (Ω)$ the space $C_{c}^{\infty} (Ω, S^{d \times d})$ in (1.2) can be substituted by $H_{0} ({div}^{2}; Ω)$ (and by $W_{0}^{d} ({div}^{2}; Ω)$ for $u \in L^{d / d - 1} (Ω)$ ). This fact will be instrumental when deriving the predual of the TGV minimization problem. For this we need the following result:

Proposition 2.1.

The weighted ${TGV}_{α}^{2}$ functional (1.2) admits the equivalent expression (2.5) ${TGV}_{α}^{2} (u) = \min_{w \in BD (Ω)} \int_{Ω} α_{1} d | D u - w | + \int_{Ω} α_{0} d | E w | .$ (2.5)

Proof.

The proof is analogous to the one for the scalar TGV functional; see for instance [Citation34, Theorem 3.5] or [Citation12, Proposition 2.8]. Here, we highlight only the significant steps. Indeed, given $u \in L^{1} (Ω),$ the idea is to define $\begin{matrix} U = C_{0}^{1} (Ω, R^{d}) \times C_{0}^{2} (Ω; S^{d \times d}), V = C_{0}^{1} (Ω, R^{d}), \\ Λ : U \to V, Λ (u_{1}, u_{2}) = - u_{1} - div u_{2}, \\ F_{1} : U \to \bar{R}, F_{1} (u_{1}, u_{2}) = - \int_{Ω} u div u_{1} + I_{{| \cdot (x) | \leq α_{1} (x)}} (u_{1}) + I_{{| \cdot (x) | \leq α_{0} (x)}} (u_{2}), \\ F_{2} : V \to \bar{R}, F_{2} (v) = I_{{0}} (v) . \end{matrix}$

Here, $I_{S} (\cdot)$ denotes the indicator function of a set S. Now, after realizing that (2.6) ${TGV}_{α}^{2} (u) = \sup_{(u_{1}, u_{2}) \in U} - F_{1} (u_{1}, u_{2}) - F_{2} (Λ (u_{1}, u_{2})),$ (2.6) the proof proceeds by showing that the dual problem of (2.6) is equivalent to (2.5) and then applying the Fenchel duality result [Citation44]. We note that in order to achieve zero duality gap between the primal and the dual problem the so-called Attouch-Brezis condition needs to be satisfied see [Citation45], that is one needs to show that the set $\underset{λ \geq 0}{\cup} λ (dom (F_{2}) - Λ (dom (F_{1})))$ is a closed subspace of V. Indeed one can easily see that $\begin{matrix} \underset{λ \geq 0}{\cup} λ (dom (F_{2}) - Λ (dom (F_{1}))) = {λ (u_{1} + div u_{2}) : (u_{1}, u_{2}) \in U, \\ | u_{1} (x) | \leq α_{1} (x), | u_{2} (x) | \leq α_{0} (x), \forall x \in Ω} = V . \end{matrix}$

Note that for the above equality we crucially used the fact that α₁ is bounded away from zero.

The other subtle point is the following density result which is required in order to show that (2.6) is indeed equal to (1.2): (2.7) $\begin{matrix} {\bar{{ϕ \in C_{c}^{\infty} (Ω, S^{d \times d}) : | ϕ (x) | \leq α_{0} (x), | div ϕ (x) | \leq α_{1} (x), for all x \in Ω}}}^{‖ \cdot ‖_{C_{0}^{2}}} \\ = {ψ \in C_{0}^{2} (Ω, S^{d \times d}) : | ψ (x) | \leq α_{0} (x), | div ψ (x) | \leq α_{1} (x), for all x \in Ω} . \end{matrix}$ (2.7)

Indeed let ψ belong to the second set in (2.7), and let $ϵ > 0 .$ Choose $0 < λ_{ϵ} < 1$ such that (2.8) $‖ ψ - λ_{ϵ} ψ ‖_{C_{0}^{2}} < ϵ / 2.$ (2.8)

Since α₀ and α₁ are continuous and bounded away from zero there exists $α_{ϵ} > 0,$ smaller than the minimum of $α_{0}, α_{1},$ such that $| λ_{ϵ} ψ (x) | \leq α_{0} (x) - α_{ϵ}, | div λ_{ϵ} ψ (x) | \leq α_{1} (x) - α_{ϵ}, for all x \in Ω .$

From standard density properties there exists a function $ϕ_{ϵ} \in C_{c}^{\infty} (Ω, S^{d \times d})$ such that the following conditions hold for all $x \in Ω :$ (2.9) $‖ ϕ_{ϵ} - λ_{ϵ} ψ ‖_{C_{0}^{2}} < ϵ / 2, | ϕ_{ϵ} (x) - λ_{ϵ} ψ (x) | \leq α_{ϵ} / 2, | div ϕ_{ϵ} (x) - div λ_{ϵ} ψ (x) | \leq α_{ϵ} / 2,$ (2.9) which implies (2.10) $| ϕ_{ϵ} (x) | \leq α_{0} (x) - α_{ϵ} / 2, | div ϕ_{ϵ} (x) | \leq α_{1} (x) - α_{ϵ} / 2, for all x \in Ω .$ (2.10)

Then, from (2.10) it follows that $ϕ_{ϵ}$ belongs to the first set in (2.7) and from (2.8) and (2.9) we get that $‖ ψ - ϕ_{ϵ} ‖_{C_{0}^{2}} < ϵ .$ □

Now we are ready to establish the density result needed for dualization. For the sake of the flow of presentation we defer the proof, which parallels the one of [Citation12, Proposition 3.3], to Appendix A. Below “a.e.” stands for “almost every” with respect to the Lebesgue measure.

Proposition 2.2.

For $u \in L^{2} (Ω)$ , the weighted TGV functional (1.2) can be equivalently written as (2.11) $\begin{matrix} {TGV}_{α}^{2} (u) = \sup {\int_{Ω} u {div}^{2} p d x : p \in H_{0} ({div}^{2}; Ω), & | p (x) | \leq α_{0} (x), \\ | div p (x) | \leq α_{1} (x), fora . e . x \in Ω} . \end{matrix}$ (2.11)

Remark:

By slightly amending the proof of Proposition 2.2, one can also show that if $u \in L^{d / d - 1} (Ω) \supset BV (Ω)$ then the dual variables p can be taken to belong to $W_{0}^{d} ({div}^{2}; Ω)$ rather than $H_{0} ({div}^{2}; Ω) .$

3.4. The predual weighted TGV problem

Now we study the predual problem for the weighted TGV model with continuous weights, i.e., we use the regularization functional (1.2) or equivalently (2.11). For $T \in L (L^{2} (Ω), L^{2} (Ω))$ we assume that $B : = T^{*} T$ is invertible and define $‖ v ‖_{B}^{2} = \int_{Ω} v B^{- 1} v,$ which induces a norm in $L^{2} (Ω)$ equivalent to the standard one; compare [Citation31]. When $T^{*} T$ is not invertible one can add an extra regularization term of the form $\frac{κ}{2} \int_{Ω} u^{2} d x,$ for some small $κ > 0,$ to the objective in (1.1). Then $B = κ Id + T^{*} T$ will be invertible, see a similar approach in [Citation27, Citation29, Citation31]. We mention that a different set-up, as adopted for instance in [Citation46], assumes $T \in L (L^{d / d - 1}, L^{2} (Ω)),$ since $BV (Ω)$ continuously embeds to $L^{d / d - 1} (Ω) .$ Then one readily finds that for d > 2, $B$ cannot be invertible, in general. Even though invertibility of $T^{*} T$ is not necessary when dealing only with the TGV regularization problem (1.1), additional challenges would arise in our overall bilevel optimization context, such as, e.g., a lack of uniqueness of the solution for the lower level problem, i.e., genuine set-valuedness of the regularization-weight-to-reconstruction map cf. the constraint set in (1.4). In the next proposition we prove zero duality gap for (1.1) and its predual. We note however that the invertibility assumption on $T^{*} T$ is not needed to establish zero duality gap but rather to get an explicit formulation of the predual problem. An adaptation of the following proposition without this invertibility assumption can be done for instance by simply adjusting the corresponding proof of [Citation46, Theorem 5.4] where analogous dualization results for a class of structural TV regularizers were considered.

Proposition 2.3.

Let $f \in L^{2} (Ω)$ , then there exists a solution to the primal problem (2.12) $minimize \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} + {TGV}_{α}^{2} (u) over u \in BV (Ω),$ (2.12) as well as to its predual problem (2.13) $\begin{matrix} minimize \frac{1}{2} ‖ T^{*} f - {div}^{2} p ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2}}^{2} over p \in H_{0} ({div}^{2}; Ω) \\ subject to | p (x) | \leq α_{0} (x), | div p (x) | \leq α_{1} (x), fora . e . x \in Ω, \end{matrix}$ (2.13) and there is no duality gap, i.e., the primal and predual optimal objective values are equal. Moreover, the solutions u and p of these problems satisfy (2.14) $B u = T^{*} f - {div}^{2} p .$ (2.14)

Proof.

We set $U = H_{0} ({div}^{2}; Ω), V = L^{2} (Ω),$ $Λ : U \to V$ with $Λ p = {div}^{2} p,$ and also $F_{1} : U \to \bar{R}$ and $F_{2} : V \to \bar{R}$ with (2.15) $F_{1} (p) = I_{{| \cdot (x) | \leq α_{0} (x), fora . e . x}} (p) + I_{{| div \cdot (x) | \leq α_{1} (x), fora . e . x}} (p),$ (2.15) (2.16) $F_{2} (ψ) = \frac{1}{2} ‖ T^{*} f - ψ ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} .$ (2.16)

Immediately one gets that (2.17) $\inf_{p \in U} F_{1} (p) + F_{2} (Λ p) = \min_{\begin{matrix} p \in H_{0} ({div}^{2}; Ω) \\ | p (x) | \leq α_{0} (x) \\ | div p (x) | \leq α_{1} (x) \end{matrix}} \frac{1}{2} ‖ T^{*} f - {div}^{2} p ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} .$ (2.17)

The problem in (2.17) admits a solution. Indeed, first observe that the objective is bounded from below. Then note that since $\frac{1}{2} ‖ T \cdot - f ‖_{L^{2} (Ω)}^{2}$ is continuous at $0 \in L^{2} (Ω),$ its convex conjugate (see [Citation44] for a general definition) which is equal to $\frac{1}{2} ‖ T^{*} f + \cdot ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2}$ is coercive in $L^{2} (Ω);$ see [Citation47, Theorem 4.4.10]. Hence, any infimizing sequence ${(p_{n})}_{n \in N}$ is bounded in $H_{0} ({div}^{2}; Ω),$ and thus there exist an (unrelabeled) subsequence and $p \in H ({div}^{2}; Ω)$ such that $p_{n} ⇀ p, div p_{n} ⇀ div p$ and ${div}^{2} p_{n} ⇀ {div}^{2} p$ weakly in L². We also have that p is a feasible point since the set ${(h, div h, {div}^{2} h) : h \in H_{0} ({div}^{2}; Ω), | h (x) | \leq α_{0} (x), | div h (x) | \leq α_{1} (x), fora . e . x \in Ω},$ is weakly closed. Then p is a minimizer of (2.17) as $\frac{1}{2} ‖ T^{*} f - \cdot ‖_{B}^{2}$ is weakly lower semicontinuous in $L^{2} (Ω) .$

We now calculate the expression $F_{1}^{*} (Λ^{*} u) + F_{2}^{*} (- u)$ for $u \in V^{*} = L^{2} (Ω) .$ As before one verifies by direct computation that $F_{2}^{*} (- u) = \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} .$ Moreover, $\begin{matrix} F_{1}^{*} (Λ^{*} u) = \sup_{p \in U} {{〈 Λ^{*} u, p 〉}_{U^{*}, U} - F_{1} (p)} = \sup_{p \in U} {{〈 u, Λ p 〉}_{L^{2} (Ω), L^{2} (Ω)} - F_{1} (p)} \\ = \sup_{\begin{matrix} | p (x) | \leq α_{0} (x) \\ | div p (x) | \leq α_{1} (x) \\ p \in H_{0} ({div}^{2}; Ω) \end{matrix}} \int_{Ω} {div}^{2} p d x = {TGV}_{α}^{2} (u), \end{matrix}$ where for the last equality we used Proposition 2.2 and its underlying density result. In order to prove that there is no duality gap, it suffices again to verify the Attouch-Brezis condition and show that the set $\underset{λ \geq 0}{\cup} λ (dom (F_{2}) - Λ (dom (F_{1})))$ is a closed subspace of V. It is immediate to see that $dom (F_{2}) = L^{2} (Ω),$ and hence the condition holds true. Thus, we also get existence of a solution for the primal problem (2.12). Finally (2.14) follows from the optimality condition (Euler-Lagrange system) that corresponds to $Λ p \in \partial F_{2}^{*} (- u) .$ □

The primal-dual optimality conditions for the problems (2.12) and (2.13) read (2.18) $p \in \partial F_{1}^{*} (Λ^{*} u),$ (2.18) (2.19) $Λ p \in \partial F_{2}^{*} (- u),$ (2.19) and we note once again that (2.14) corresponds to (2.19) with F₂ and Λ as in the proof of Proposition 2.12. Instead of making the optimality condition that corresponds to (2.18) explicit, we are interested in the analogous optimality conditions written in the variables u and w of the equivalent primal weighted TGV problem (2.20) $\min_{\overset{u \in BV (Ω)}{w \in BD (Ω)}} \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} d | D u - w | + \int_{Ω} α_{0} d | E w | .$ (2.20)

For this purpose note first that the predual problem (2.13) can be equivalently written as (2.21) ${\begin{matrix} minimize \frac{1}{2} ‖ T^{*} f + div q ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} over (p, q) \in H_{0} ({div}^{2}; Ω) \times H_{0} (div; Ω), \\ subject to - div p = q, | p (x) | \leq α_{0} (x), | q (x) | \leq α_{1} (x), fora . e . x \in Ω . \end{matrix}$ (2.21)

The next proposition characterizes the solutions (w, u) and (p, q) of the problems (2.20), and (2.21) respectively. Before we state it, we note that in its proof we will make use of the following density results: (2.22) ${\bar{C_{α_{0}}}}^{L^{2} (Ω)} = K_{α_{0}}, {\bar{C_{α_{1}}}}^{H_{0} (div; Ω)} = K_{α_{1}},$ (2.22) where (2.23) $C_{α_{0}} : = {({div}^{2} ϕ, div ϕ) : ϕ \in C_{c}^{\infty} (Ω, S^{d \times d}), | ϕ (x) | \leq α_{0} (x), for all x \in Ω},$ (2.23) (2.24) $K_{α_{0}} : = {({div}^{2} p, div p) : p \in H_{0} ({div}^{2}; Ω), | p (x) | \leq α_{0} (x), for a . e . x \in Ω},$ (2.24) (2.25) $C_{α_{1}} : = {ψ : ψ \in C_{c}^{\infty} (Ω, R^{d}), | ψ (x) | \leq α_{1} (x), for all x \in Ω},$ (2.25) (2.26) $K_{α_{1}} : = {q : q \in H_{0} (div; Ω), | q (x) | \leq α_{1} (x), for a . e . x \in Ω} .$ (2.26)

These results can be proven by using the duality arguments of the proof of Proposition 2.2, which originate from [Citation12], or with the use of mollification techniques; see [Citation48–50].

Proposition 2.4.

The pair $(p, q) \in H_{0} ({div}^{2}; Ω) \times H_{0} (div; Ω)$ is a solution to (2.21), and $(w, u) \in BD (Ω) \times BV (Ω)$ is a solution to (2.20) if and only if the following optimality conditions are satisfied: (2.27) $B u = T^{*} f + div q,$ (2.27) (2.28) $q = - div p,$ (2.28) (2.29) $| q (x) | \leq α_{1} (x) for a . e . x \in Ω$ (2.29) (2.30) $\begin{matrix} and 〈 D u - w, \tilde{q} - q 〉 \leq 0 for every \tilde{q} \in H_{0} (div; Ω) with | \tilde{q} (x) | \leq α_{1} (x) for a . e . x \in Ω, \\ | p (x) | \leq α_{0} (x) for a . e . x \in Ω \end{matrix}$ (2.30) $and 〈 E w, \tilde{p} - p 〉 \leq 0 for every \tilde{p} \in H_{0} ({div}^{2}; Ω) with | \tilde{p} (x) | \leq α_{0} (x) fora . e . x \in Ω .$

Proof.

Define $X = (X_{1}, X_{2}) = H_{0} ({div}^{2}, Ω) \times H_{0} (div, Ω),$ $Y = (Y_{1}, Y_{2}) = H_{0} (div; Ω) \times L^{2} (Ω), Λ : X \to Y$ with $Λ (p, q) = (q + div p, div q),$ and $F_{1} : X \to \bar{R}, F_{2} : Y \to \bar{R}$ with (2.31) $F_{1} (p, q) = I_{{| \cdot (x) | \leq α_{0} (x), for a . e . x}} (p) + I_{{| \cdot (x) | \leq α_{1} (x), for a . e . x}} (q),$ (2.31) (2.32) $F_{2} (ϕ, ψ) = I_{{0}} (ϕ) + \frac{1}{2} ‖ T^{*} f + ψ ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} .$ (2.32)

One checks immediately that $\min_{(p, q) \in X} F_{1} (p, q) + F_{2} (Λ (p, q))$ corresponds to (2.21) with the dual problem reading $\min_{(w, u) \in Y^{*}} F_{1}^{*} (- Λ^{*} (w, u)) + F_{2}^{*} (w, u) .$ Observe that since $- {〈 Λ^{*} (w, u), (p, q) 〉}_{X^{*}, X} = - {〈 (w, u), Λ (p, q) 〉}_{Y^{*}, Y} = - {〈 w, div p 〉}_{Y_{1}^{*}, Y_{1}} - {〈 w, q 〉}_{Y_{1}^{*}, Y_{1}} - {〈 u, div q 〉}_{Y_{2}^{*}, Y_{2}},$ we have (2.33) $F_{1}^{*} (- Λ^{*} (w, u)) = \sup_{\overset{p \in H_{0} ({div}^{2}; Ω)}{| p (x) | \leq α_{0} (x)}} - {〈 w, div p 〉}_{Y_{1}^{*}, Y_{1}} + \sup_{\overset{q \in H_{0} (div; Ω)}{| q (x) | \leq α_{1} (x)}} - {〈 w, q 〉}_{Y_{1}^{*}, Y_{1}} - {〈 u, div q 〉}_{Y_{2}^{*}, Y_{2}} .$ (2.33)

Note that the suprema above are always greater or equal to the corresponding suprema over $C_{c}^{\infty} (Ω, S^{d \times d}) \subset H_{0} ({div}^{2}; Ω)$ and $C_{c}^{\infty} (Ω, R^{d}) \subset H_{0} (div; Ω) .$ One can easily check that there is no duality gap between these primal and dual problems – the Attouch-Brezis condition is satisfied – and thus since the primal problem is finite so is the dual. Using the fact that α₀ is bounded from below, this implies in particular that w, seen as distribution through its action on $C_{c}^{\infty} (Ω, R^{d}),$ has a distributional symmetrized gradient $E w$ with bounded Radon norm, and hence it is a Radon measure. It follows that $w \in L^{1} (Ω, R^{d})$ yielding $w \in BD (Ω);$ see [Citation34], which means that for $ψ \in C_{c}^{\infty} (Ω, R^{d})$ we have ${〈 w, ψ 〉}_{Y_{1}^{*}, Y_{1}} = \int_{Ω} w \cdot ψ d x .$ Using density of $C_{c}^{\infty} (Ω, R^{d})$ in $H_{0} (div; Ω)$ this also implies ${〈 w, q 〉}_{Y_{1}^{*}, Y_{1}} = \int_{Ω} w \cdot q d x$ and similarly ${〈 w, div p 〉}_{Y_{1}^{*}, Y_{1}} = \int_{Ω} w \cdot div p d x$ in (2.33). Using now also the density results (2.22) we have $\begin{matrix} F_{1}^{*} (- Λ^{*} (w, u)) = \sup_{\overset{p \in H_{0} ({div}^{2}; Ω)}{| p (x) | \leq α_{0} (x)}} - \int_{Ω} w \cdot div p d x + \sup_{\overset{q \in H_{0} (div; Ω)}{| q (x) | \leq α_{1} (x)}} - \int_{Ω} w \cdot q d x - \int_{Ω} u div q d x \\ = \sup_{\overset{ϕ \in C_{c}^{\infty} (Ω, S^{d \times d})}{| ϕ (x) | \leq α_{0} (x)}} - \int_{Ω} w \cdot div ϕ d x + \sup_{\overset{ψ \in C_{c}^{\infty} (Ω, R^{d})}{| ψ (x) | \leq α_{1} (x)}} - \int_{Ω} w \cdot ψ d x - \int_{Ω} u div ψ d x \\ = \sup_{\overset{ϕ \in C_{c}^{\infty} (Ω, S^{d \times d})}{| ϕ (x) | \leq α_{0} (x)}} 〈 E w, ϕ 〉 + \sup_{\overset{ψ \in C_{c}^{\infty} (Ω, R^{d})}{| ψ (x) | \leq α_{1} (x)}} 〈 D u - w, ψ 〉, \\ = \int_{Ω} α_{0} d | E w | + \int_{Ω} α_{1} d | D u - w | . \end{matrix}$

Here we used the fact that since the distribution Du – w has a finite Radon norm, due to α₁ being bounded away from zero, it can be represented by an $R^{d}$ -valued finite Radon measure and in particular $u \in BV (Ω) .$ Furthermore, as in the proof of Proposition 2.3 we have $F_{2}^{*} (w, u) = \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} .$

The fact that there is no duality gap is ensured by Propositions 2.1, 2.2 and 2.3. We now turn our attention to the optimality conditions (2.34) $(p, q) \in \partial F_{1}^{*} (- Λ^{*} (w, u)),$ (2.34) (2.35) $Λ (p, q) \in \partial F_{2}^{*} ((w, u)) .$ (2.35)

It can be checked again that (2.35) gives (2.27) and (2.28). We now expand on (2.34). We have that $(p, q) \in \partial F_{1}^{*} (- Λ^{*} (w, u))$ which is equivalent to $- Λ^{*} (w, u) \in \partial F_{1} (p, q),$ that is $F_{1} (p, q) = 0$ and $\begin{matrix} {〈 - Λ^{*} (w, u), (\tilde{p} - p, \tilde{q} - q) 〉}_{X^{*}, X} \leq F_{1} (\tilde{p}, \tilde{q}) \\ \Leftrightarrow - 〈 w, div (\tilde{p} - p) 〉 - 〈 w, \tilde{q} - q 〉 - 〈 u, div \tilde{q} - div q 〉 \leq F_{1} (\tilde{p}, \tilde{q}) \\ \Leftrightarrow 〈 E w, \tilde{p} - p 〉 \leq I_{{| \cdot (x) | \leq α_{0} (x), f . a . e . x}} (\tilde{p}) \\ 〈 D u - w, \tilde{q} - q 〉 \leq I_{{| \cdot (x) | \leq α_{1} (x), f . a . e . x}} (\tilde{q}) \\ \Leftrightarrow 〈 E w, \tilde{p} - p 〉 \leq 0 \\ 〈 D u - w, \tilde{q} - q 〉 \leq 0, \end{matrix}$ with the last two inequalities holding for any $\tilde{p} \in H_{0} ({div}^{2}; Ω)$ with $| \tilde{p} (x) | \leq α_{0} (x)$ for a.e. $x \in Ω$ and for any $\tilde{q} \in H_{0} (div; Ω)$ with $| \tilde{q} (x) | \leq α_{1} (x)$ for a.e. $x \in Ω .$ Hence we obtain (2.29) and (2.30). □

4. A Series of regularized problems

4.1. Regularization of the primal problem

With the aim of lifting the regularity of u and w to avoid measure-valued derivatives, we next consider the following regularized version of the primal weighted TGV problem (2.20): (3.1) $\begin{matrix} minimize \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} & + \int_{Ω} α_{1} | \nabla u - w | d x + \int_{Ω} α_{0} | E w | d x \\ + \frac{μ}{2} ‖ \nabla u ‖_{L^{2} (Ω)}^{2} + \frac{ν}{2} ‖ w ‖_{H^{1} (Ω, R^{d})}^{2} over (w, u) \in H^{1} (Ω, R^{d}) \times H^{1} (Ω), \end{matrix}$ (3.1) for some constants $0 < μ, ν ≪ 1 .$ Existence of solutions for (3.1) follows from standard arguments.

Observe that (3.1) is equivalent to $\min_{(w, u) \in \hat{X}} Q_{1} (w, u) + Q_{2} (R (w, u))$ where $\hat{X} = H^{1} (Ω, R^{d}) \times H^{1} (Ω),$ $\hat{Y} = L^{2} (Ω, S^{d \times d}) \times L^{2} (Ω, R^{d}), R : \hat{X} \to \hat{Y}$ with $R (w, u) = (E w, \nabla u - w), Q_{1} : \hat{X} \to R, Q_{2} : \hat{Y} \to R$ with $Q (w, u) = \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} + \frac{μ}{2} ‖ \nabla u ‖_{L^{2} (Ω, R^{d})}^{2} + \frac{ν}{2} ‖ w ‖_{H^{1} (Ω, R^{d})}^{2}$ and $Q_{2} (ψ, ϕ) = \int_{Ω} α_{1} | ϕ | d x + \int_{Ω} α_{0} | ψ | d x .$ Note that the Attouch-Brezis condition is satisfied since $dom (Q_{2}) = Y .$

Proposition 3.1.

The pairs $(w, u) \in H^{1} (Ω, R^{d}) \times H^{1} (Ω)$ and $(p, q) \in L^{2} (Ω, R^{d \times d}) \times L^{2} (Ω, R^{d})$ are solutions to (3.1) and its predual problem, respectively, if and only if the following optimality conditions are satisfied: (3.2) $B u - μ Δ u + \nabla^{*} q - T^{*} f = 0 in H^{1} {(Ω)}^{*},$ (3.2) (3.3) $ν w - ν Δ w - q + E^{*} p = 0 in H^{1} {(Ω, R^{d})}^{*},$ (3.3) (3.4) ${\begin{matrix} | q | \leq α_{1}, \\ α_{1} (\nabla u - w) - q | \nabla u - w | = 0 & if | q (x) | = α_{1} (x), \\ \nabla u - w = 0 & if | q (x) | < α_{1} (x), \end{matrix}$ (3.4) (3.5) ${\begin{matrix} | p | \leq α_{0}, \\ α_{0} E w - p | E w | = 0 & if | p (x) | = α_{0} (x), \\ E w = 0 & if | p (x) | < α_{0} (x), \end{matrix}$ (3.5) where the multiplications are regarded component-wise.

Proof.

The proof follows again easily by calculating the corresponding primal-dual optimality conditions. □

Next we study the relationship between the solutions of (2.20) and (3.1) as the parameters μ, ν tend to zero.

Proposition 3.2.

Let $μ_{n}, ν_{n} \to 0$ and let ${(w_{n}, u_{n})}_{n \in N}$ be a sequence of solution pairs of the problem (3.1). Then $u_{n} \overset{*}{⇀} u^{*}$ and $w_{n} \overset{*}{⇀} w^{*}$ in $BV (Ω)$ and $BD (Ω)$ respectively, where $(w^{*}, u^{*})$ is a solution pair for (2.20). The convergence of w_n is up to a subsequence.

Proof.

For convenience of notation, define the energies $\begin{matrix} E_{n} (w, u) = \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} | \nabla u - w | d x + \int_{Ω} α_{0} | E w | d x + \frac{μ_{n}}{2} ‖ \nabla u ‖_{L^{2} (Ω)}^{2} + \frac{ν_{n}}{2} ‖ w ‖_{H^{1} (Ω, R^{d})}^{2}, \\ E (w, u) = \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} d | D u - w | + \int_{Ω} α_{0} d | E w | . \end{matrix}$

We have (3.6) $\begin{matrix} \frac{1}{2} ‖ T u_{n} - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} | \nabla u_{n} - w_{n} | d x + \int_{Ω} α_{0} | E w_{n} | d x \leq E_{n} (w_{n}, u_{n}) \leq E_{n} (0, 0) \leq \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} . \end{matrix}$ (3.6)

Thus, the sequences ${(u_{n})}_{n \in N}$ and ${(w_{n})}_{n \in N}$ are bounded in $BV (Ω)$ and $BD (Ω),$ respectively. In order to see this, note that by setting ${\underline{α}}_{i} : = \min_{x \in \bar{Ω}} α_{i} (x),$ i = 0, 1, we get $\begin{matrix} {TGV}_{{\underline{α}}_{0}, {\underline{α}}_{1}}^{2} (u_{n}) = \min_{w \in BD (Ω)} {\underline{α}}_{1} ‖ \nabla u_{n} - w ‖_{M} + {\underline{α}}_{0} ‖ E w ‖_{M} \\ \leq \int_{Ω} α_{1} | \nabla u_{n} - w_{n} | d x + \int_{Ω} α_{0} | E w_{n} | d x \leq \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} . \end{matrix}$

Hence, ${(u_{n})}_{n \in N}$ is bounded in the sense of second-order TGV. Again using [Citation47, Theorem 4.4.10], we get that $\frac{1}{2} ‖ T \cdot - f ‖_{L^{2} (Ω)}^{2}$ is coercive, since it is the convex conjugate of $\frac{1}{2} ‖ T^{*} f + \cdot ‖_{B}^{2} - \frac{1}{2} ‖ f ‖_{L^{2}}^{2}$ which is continuous at $0 \in L^{2} (Ω) .$ This implies further that this sequence is bounded both on $L^{2} (Ω)$ and $BV (Ω) .$ The bound on ${(w_{n})}_{n \in N}$ in $BD (Ω)$ then follows from (3.6).

From compactness theorems in those spaces (for $BD (Ω)$ see for instance [Citation36, Remark 2.4]) we have that there exist $u^{*} \in BV (Ω) \cap L^{2} (Ω)$ and $w^{*} \in BD (Ω)$ such that $u_{n_{k}} \overset{*}{⇀} u^{*}$ in $BV (Ω)$ and weakly in $L^{2} (Ω),$ and $w_{n_{k}} \overset{*}{⇀} w^{*}$ in $BD (Ω)$ along suitable subsequences. Due to the lower semicontinuity of the functional E with respect to these convergences, we have for any pair $(\tilde{w}, \tilde{u}) \in H^{1} (Ω, R^{d}) \times H^{1} (Ω)$ (3.7) $E (w^{*}, u^{*}) \leq \underset{k \to \infty}{\lim \inf} E (w_{n_{k}}, u_{n_{k}}) \leq \underset{k \to \infty}{\lim \inf} E_{n_{k}} (w_{n_{k}}, u_{n_{k}}) \leq \underset{k \to \infty}{\lim \inf} E_{n_{k}} (\tilde{w}, \tilde{u}) = E (\tilde{w}, \tilde{u}) .$ (3.7)

Recall now that $LD (Ω) = {w \in L^{1} (Ω, R^{d}) : E w \in L^{1} (Ω, R^{d \times d})}$ is a Banach space endowed with the norm $‖ w ‖_{LD (Ω)} = ‖ w ‖_{L^{1} (Ω, R^{d})} + ‖ E w ‖_{L^{1} (Ω, R^{d \times d})}$ and that $C^{\infty} (\bar{Ω}, R^{d})$ is dense in that space; see [Citation35, Proposition 1.3]. From this, in combination with the fact that $C^{\infty} (\bar{Ω})$ is dense in $W^{1, 1} (Ω) \cap L^{2} (Ω)$ we have that for every $(\hat{w}, \hat{u}) \in LD (Ω) \times (W^{1, 1} (Ω) \cap L^{2} (Ω))$ there exists a sequence ${({\hat{w}}_{h}, {\hat{u}}_{h})}_{h \in N} \in C^{\infty} (\bar{Ω}, R^{d}) \times C^{\infty} (\bar{Ω}) \subseteq H^{1} (Ω, R^{d}) \times H^{1} (Ω),$ such that $E ({\hat{w}}_{h}, {\hat{u}}_{h}) \to E (\hat{w}, \hat{u}) .$ Hence, since (3.7) holds we have that (3.8) $E (w^{*}, u^{*}) \leq E (\hat{w}, \hat{u}), for all (\hat{w}, \hat{u}) \in LD (Ω) \times (W^{1, 1} (Ω) \cap L^{2} (Ω)) .$ (3.8)

Finally, by following similar steps as in the proof of [Citation6, Theorem 3], we can show that for every $(w, u) \in BD (Ω) \times (BV (Ω) \cap L^{2} (Ω))$ there exists a sequence ${(w_{h}, u_{h})}_{h \in N} \in LD (Ω) \times (W^{1, 1} (Ω) \cap L^{2} (Ω))$ such that $‖ u_{h} - u ‖_{L^{2} (Ω)} \to 0, \int_{Ω} α_{1} | \nabla u_{h} - w_{h} | d x \to \int_{Ω} α_{1} d | D u - w |, \int_{Ω} α_{0} | E w_{h} | d x \to \int_{Ω} α_{0} d | E w |,$ which implies again that $E (w_{h}, u_{h}) \to E (w, u) .$ This, together with (3.8) yields $E (w^{*}, u^{*}) \leq E (w, u), for all (w, u) \in BD (Ω) \times BV (Ω) .$

This yields that $(w^{*}, u^{*})$ is a solution pair for (2.20). Finally, from the uniqueness of the solution $u^{*}$ for (2.20) we get that the whole initial sequence ${(u_{n})}_{n \in N}$ converges to $u^{*}$ weakly $^{*}$ in $BV (Ω) .$ The uniqueness follows from the fact that T is injective and hence the functional $\frac{1}{2} ‖ T \cdot - f ‖_{L^{2} (Ω)}^{2}$ is strictly convex. □

We now proceed to the second level of regularization of the problem (3.1), which, in addition to lifting the regularity of u and w, respectively, also smoothes the non-differentiable constituents. For this purpose, we define the following primal problem, which will also be treated numerically, below: $\begin{matrix} (p_{γ}) minimize \frac{1}{2} ‖ T u - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} φ_{γ} (\nabla u - w) d x + \int_{Ω} α_{0} φ_{γ} (E w) d x \\ + \frac{μ}{2} ‖ \nabla u ‖_{L^{2} (Ω)}^{2} + \frac{ν}{2} ‖ w ‖_{H^{1} (Ω, R^{d})}^{2} over (w, u) \in H^{1} (Ω, R^{d}) \times H^{1} (Ω) . \end{matrix}$

Here $φ_{γ}$ denotes the Huber-regularized version of the Euclidean norm. That is, for a vector $v \in S, S = R^{d}$ or $R^{d \times d}$ and $γ > 0$ we use (3.9) $φ_{γ} (v) (x) = {\begin{matrix} | v (x) | - \frac{1}{2} γ & if | v (x) | \geq γ, \\ \frac{1}{2 γ} | v (x) |^{2} & if | v (x) | < γ, \end{matrix}$ (3.9) with $| \cdot |$ denoting either the Euclidean norm in $R^{d}$ or the Frobenius norm in $R^{d \times d} .$ We mention that this type of Huber regularization of $TV$ -type terms in the primal problem corresponds to an L² regularization of the dual variables in the predual [Citation32, Citation51]. In order to illustrate this consider the following denoising problem $P_{γ}$ without any H¹ regularization: (3.10) $minimize \frac{1}{2} ‖ u - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} d φ_{γ_{1}} (D u - w) + \int_{Ω} α_{0} d φ_{γ_{2}} (E w) over (w, u) \in BD (Ω) \times BV (Ω),$ (3.10) where $\begin{matrix} \int_{Ω} α_{1} d φ_{γ_{1}} (D u - w) = \int_{Ω} α_{1} φ_{γ_{1}} (\nabla u - w) d x + \int_{Ω} α_{1} d | D^{s} u |, \\ \int_{Ω} α_{0} d φ_{γ_{1}} (E w) = \int_{Ω} α_{0} φ_{γ_{2}} (E w) d x + \int_{Ω} α_{0} d | E^{s} w | . \end{matrix}$

Here $D^{s} u$ and $E^{s} w$ denote the singular parts with respect to the Lebesgue measure of Du, and $E w$ respectively, following the Lebesgue decomposition $D u = \int_{Ω} \nabla u d x + D^{s} u$ and $E w = \int_{Ω} E w d x + E^{s} w .$ The corresponding predual problem of (3.10) is given by (3.11) $\begin{matrix} maximize - \frac{1}{2} ‖ f + div q ‖_{L^{2} (Ω)}^{2} - \frac{γ_{0}}{2} \int_{Ω} \frac{1}{α_{0}} | p |^{2} d x - \frac{γ_{1}}{2} \int_{Ω} \frac{1}{α_{1}} | q |^{2} d x + \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2}, \\ over (p, q) \in W_{0}^{d} ({div}^{2}; Ω) \times W_{0}^{d} (div; Ω), \\ subject to q = - div p, | p (x) | \leq α_{0} (x), | q (x) | \leq α_{1} (x) . \end{matrix}$ (3.11) The proof is similar to the one of Proposition 2.4 with $F_{1} (p, q) = I_{{| \cdot (x) | \leq α_{0} (x)}} (p) + I_{{| \cdot (x) | \leq α_{1} (x)}} (q) - \frac{γ_{0}}{2} \int_{Ω} \frac{1}{α_{0}} | p |^{2} d x - \frac{γ_{1}}{2} \int_{Ω} \frac{1}{α_{1}} | q |^{2} d x,$ and in the dualization process we use the fact that for an S-valued measure μ we have, $\int_{Ω} α d φ_{γ} (μ) = \sup {\int_{Ω} ϕ d μ - I_{{| \cdot (x) | \leq α (x)}} (ϕ) - \frac{γ}{2} \int_{Ω} \frac{1}{α} | ϕ |^{2} d x : ϕ \in C_{c}^{\infty} (Ω, S)};$ see for instance [Citation52].

Returning to the (doubly) regularized primal problem $P_{γ},$ we are primarily interested in its associated first-order optimality conditions.

Proposition 3.3.

We have that the pairs $(w, u) \in H^{1} (Ω, R^{d}) \times H^{1} (Ω)$ and $(p, q) \in L^{2} (Ω, R^{d \times d}) \times L^{2} (Ω, R^{d})$ are solution to $P_{γ}$ and its predual problem, respectively, if and only if the following optimality conditions are satisfied: (3.12) $B u - μ Δ u + \nabla^{*} q - T^{*} f = 0 in H^{1} {(Ω)}^{*},$ (3.12) (3.13) $ν w - ν Δ w - q + E^{*} p = 0 in H^{1} {(Ω, R^{d})}^{*},$ (3.13) (3.14) $\max (| \nabla u - w |, γ_{1}) q - α_{1} (\nabla u - w) = 0 in L^{2} (Ω, R^{d}),$ (3.14) (3.15) $\max (| E w |, γ_{0}) p - α_{0} E w = 0 in L^{2} (Ω, S^{d \times d}) .$ (3.15)

(3.16)

The proof of Proposition 3.3 follows from calculating the corresponding primal-dual optimality conditions as in Proposition 3.1. Finally, in order to avoid constraint degeneracy and for the sake of differentiability for the bilevel scheme in the next section, we also employ here a smoothed version $\max_{δ} (\cdot, γ)$ of $\max$ and its derivative, denoted by $X_{δ},$ defined as follows for $r \geq 0$ and for $0 < \frac{δ}{2} < γ :$ ${max}_{δ} (r, γ) = {\begin{matrix} γ \\ \frac{1}{2 δ} {(r + \frac{δ}{2} - γ)}^{2} + γ, \\ r \end{matrix} X_{δ} (r, γ) = {\begin{matrix} 0 & if r \leq γ - \frac{δ}{2}, \\ \frac{1}{δ} (r + \frac{δ}{2} - γ) & if γ - \frac{δ}{2} < r < γ + \frac{δ}{2}, \\ 1 & if r > γ + \frac{δ}{2} . \end{matrix}$

Thus we arrive at the following regularized optimality conditions: (Opt1) $B u - μ Δ u + \nabla^{*} q - T^{*} f = 0,$ (Opt1) (Opt2) $ν w - ν Δ w - q + E^{*} p = 0,$ (Opt2) (Opt3) ${max}_{δ} (| \nabla u - w |, γ_{1}) q - α_{1} (\nabla u - w) = 0,$ (Opt3) (Opt4) ${max}_{δ} (| E w |, γ_{0}) p - α_{0} E w = 0 .$ (Opt4) Note that these would correspond to the optimality conditions of an analogue problem to $P_{γ}$ where the Huber function $φ_{γ}$ is substituted by an analogous C² version $φ_{γ, δ},$ having the property $\nabla φ_{γ, δ} (x) = x / \max_{δ} (| x |, γ)$ pointwise, and which do not write down explicitly here. The relevant approximation result now follows, where we have set $γ_{0} = γ_{1} = γ$ for simplicity.

Proposition 3.4.

Let $(w, u, q, p)$ and $(w_{γ, δ}, u_{γ, δ}, p_{γ, δ}, q_{γ, δ})$ satisfy the optimality conditions (3.2)–(3.5) and $(O p t_{1})$ – $(O p t_{4}),$ respectively. Then, as $γ, δ \to 0,$ we have $u_{γ, δ} \to u$ strongly in $H^{1} (Ω), w_{γ, δ} \to w$ strongly in $H^{1} (Ω, R^{d})$ as well as $div q_{γ, δ} \to div q$ and $q_{γ, δ} + div p_{γ, δ} \to q + div p$ weakly $^{*}$ in $H^{1} {(Ω)}^{*}$ and $H^{1} {(Ω, R^{d})}^{*},$ respectively.

Proof.

By subtracting first two equations of the optimality system of Proposition 3.1 and 3.3, respectively, we get for all $v \in H^{1} (Ω), ω \in H^{1} (Ω, R^{d})$ (3.17) $\int_{Ω} B (u - u_{γ, δ}) v d x + μ \int_{Ω} \nabla (u - u_{γ, δ}) \nabla v d x = \int_{Ω} (q_{γ, δ} - q) \nabla v d x,$ (3.17) (3.18) $ν \int_{Ω} (w - w_{γ, δ}) ω d x + ν \int_{Ω} \nabla (w - w_{γ, δ}) \nabla ω d x = \int_{Ω} (q - q_{γ, δ}) ω d x + \int_{Ω} (p_{γ, δ} - p) E ω d x .$ (3.18)

When using $v = u - u_{γ, δ}$ and $ω = w - w_{γ, δ}$ in the equations above and adding them up we get (3.19) $‖ u - u_{γ, δ} ‖_{B^{- 1}}^{2} + μ ‖ \nabla u - \nabla u_{γ, δ} ‖_{L^{2} (Ω, R^{d})}^{2} + ν ‖ w - w_{γ, δ} ‖_{H^{1} (Ω, R^{d})}^{2} = R_{1} + R_{2},$ (3.19) where $R_{1} : = \int_{Ω} {(q_{γ, δ} - q)}^{⊤} [\nabla u - w - (\nabla u_{γ, δ} - w_{γ, δ})] d x, R_{2} : = \int_{Ω} {(p_{γ, δ} - p)}^{⊤} E (w - w_{γ, δ}) d x .$

We now estimate R₁ and R₂. Consider the partitions of $Ω$ into disjoint sets (up to sets of measure zero) $Ω = A \cup I = A_{γ, δ} \cup I_{γ, δ},$ where $\begin{matrix} A & = {x \in Ω : | \nabla u - w | > 0}, I = Ω ∖ A, \\ A_{γ, δ} & = {x \in Ω : | \nabla u_{γ, δ} - w_{γ, δ} | > γ + \frac{δ}{2}}, I_{γ, δ} = Ω ∖ A_{γ, δ} . \end{matrix}$

We estimate R₁ separately on the disjoint sets $A_{γ, δ} \cap A,$ $A_{γ, δ} \cap I,$ $I_{γ, δ} \cap A$ and $I_{γ, δ} \cap I .$ Recall that $| q (x) | \leq α_{1} (x),$ as well as $| q_{γ, δ} (x) | \leq α_{1} (x)$ for almost every $x \in Ω .$ Starting from $A_{γ, δ} \cap A$ and noticing that $q = α_{1} \frac{\nabla u - w}{| \nabla u - w |}, q_{γ, δ} = α_{1} \frac{\nabla u_{γ, δ} - w_{γ, δ}}{| \nabla u_{γ, δ} - w_{γ, δ} |},$ it follows that pointwise on $A_{γ, δ} \cap A$ (with argument x left off for ease of notation) we have $\begin{matrix} {(q_{γ, δ} - q)}^{⊤} [\nabla u - w - (\nabla u_{γ, δ} - w_{γ, δ})] = q_{γ, δ} (\nabla u - w) - α_{1} | \nabla u_{γ, δ} - w_{γ, δ} | - α_{1} | \nabla u - w | + q (\nabla u_{γ, δ} - w_{γ, δ}) \\ \leq α_{1} | \nabla u - w | - α_{1} | \nabla u_{γ, δ} - w_{γ, δ} | - α_{1} | \nabla u - w | + α_{1} | \nabla u_{γ, δ} - w_{γ, δ} | \\ = 0. \end{matrix}$

Turning now to the set $A_{γ, δ} \cap I$ and recalling $\nabla u - w = 0$ we have ${(q_{γ, δ} - q)}^{⊤} [\nabla u - w - (\nabla u_{γ, δ} - w_{γ, δ})] \leq - α_{1} | \nabla u_{γ, δ} - w_{γ, δ} | + | q | | \nabla u_{γ, δ} - w_{γ, δ} | \leq 0.$

For the set $I_{γ, δ} \cap A,$ note that $| \nabla u_{γ, δ} - w_{γ, δ} | \leq γ + \frac{δ}{2}, \nabla u_{γ, δ} - w_{γ, δ} = \frac{γ}{α_{1}} q_{γ, δ} .$

Thus, we can estimate $\begin{matrix} {(q_{γ, δ} - q)}^{⊤} [\nabla u - w - (\nabla u_{γ, δ} - w_{γ, δ})] \leq q_{γ, δ} (\nabla u - w) - α_{1} | \nabla u - w | - q_{γ, δ} (\nabla u_{γ, δ} - w_{γ, δ}) + q (\nabla u_{γ, δ} - w_{γ, δ}) \\ \leq α_{1} | \nabla u - w | - α_{1} | \nabla u - w | - q_{γ, δ} (\nabla u_{γ, δ} - w_{γ, δ}) + q (\nabla u_{γ, δ} - w_{γ, δ}) \\ \leq (2 γ + δ) α_{1} \end{matrix}$

Similarly, for the set $I_{γ, δ} \cap I$ we get ${(q_{γ, δ} - q)}^{⊤} [\nabla u - w - (\nabla u_{γ, δ} - w_{γ, δ})] \leq 2 (γ + δ) α_{1} .$

Combining the above estimates we have $R_{1} \leq \int_{Ω} (2 γ + δ) α_{1} d x \to 0$ and for R₂ we similarly get $R_{2} \leq \int_{Ω} (2 γ + δ) α_{0} d x \to 0.$

Hence, from (3.19) and the fact that $‖ \cdot ‖_{B^{- 1}}$ is equivalent to $‖ \cdot ‖_{L^{2} (Ω)},$ we obtain the desired convergences for $u_{γ, δ}$ and $w_{γ, δ} .$ From this result and using (3.17) and (3.18) we get that for every $v \in H^{1} (Ω)$ and for every $ω \in H^{1} (Ω, R^{d})$ we have $\int_{Ω} v div q_{γ, δ} d x \to \int_{Ω} v div q d x and \int_{Ω} ω (q_{γ, δ} + div p_{γ, δ}) d x \to \int_{Ω} ω (q + div p) d x,$ as $γ, δ \to 0 .$ This completes the proof. □

Finally, the following approximation result holds true, when all the parameters μ, ν, γ, δ tend to zero.

Proposition 3.5.

Let $μ_{n}, ν_{n}, γ_{n}, δ_{n} \to 0$ , and denote by $u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} \in H^{1} (Ω)$ the solution of (Opt₁)–(Opt₁) with $(μ, ν, γ, δ) = (μ_{n}, ν_{n}, γ_{n}, δ_{n})$ . Then $u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} \overset{*}{⇀} u^{*}$ in $BV (Ω)$ , where $u^{*}$ solves (2.20).

Proof.

It is easy to show that $u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} \to u^{*}$ in $L^{1} (Ω) .$ Indeed, we have $‖ u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} - u^{*} ‖_{L^{1} (Ω)} \leq ‖ u_{μ_{n}, ν_{n}, 0, 0} - u^{*} ‖_{L^{1} (Ω)} + ‖ u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} - u_{μ_{n}, ν_{n}, 0, 0} ‖_{L^{1} (Ω)} .$

According to Proposition 3.2 it holds that $‖ u_{μ_{n}, ν_{n}, 0, 0} - u^{*} ‖_{L^{1} (Ω)} \to 0 .$ The other term tends to zero according to Equationequation (3.19)(3.19) $‖ u - u_{γ, δ} ‖_{B^{- 1}}^{2} + μ ‖ \nabla u - \nabla u_{γ, δ} ‖_{L^{2} (Ω, R^{d})}^{2} + ν ‖ w - w_{γ, δ} ‖_{H^{1} (Ω, R^{d})}^{2} = R_{1} + R_{2},$ (3.19) of Proposition 3.4. There, the estimates for R₁, R₂ are not affected if we substitute u and $u_{γ, δ}$ by $u_{μ_{n}, ν_{n}, 0, 0}$ and $u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}},$ respectively. In other words, the estimate $‖ u_{μ_{n}, ν_{n}, 0, 0} - u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} ‖_{L^{2} (Ω)}^{2} \leq C (2 γ_{n} + δ_{n}) | Ω | ‖ α_{0} + α_{1} ‖_{\infty}$ holds for some C > 0 and hence $‖ u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} - u_{μ_{n}, ν_{n}, 0, 0} ‖_{L^{1} (Ω)} \to 0 .$

To finish the proof and show that the convergence is weak $^{*}$ in $BV (Ω),$ it suffices to establish that $\int_{Ω} | \nabla u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} d x |$ is uniformly bounded in n; see [Citation33, Proposition 3.13]. Observe first that, if $φ_{γ, δ}$ is the C² regularized Huber function, that corresponds to the δ-smoothing of $\max,$ then as in the proof of Proposition 3.2 we get (3.20) $\int_{Ω} α_{1} φ_{γ, δ} (\nabla u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} - w_{μ_{n}, ν_{n}, γ_{n}, δ_{n}}) d x + \int_{Ω} α_{0} φ_{γ, δ} (E w_{μ_{n}, ν_{n}, γ_{n}, δ_{n}}) d x \leq \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} .$ (3.20)

As in (3.9) we also have that $φ_{γ, δ} (\cdot) \geq | \cdot | - \frac{1}{2} γ,$ and hence we obtain (3.21) $\int_{Ω} α_{1} | \nabla u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} - w_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} | d x + \int_{Ω} α_{0} | E w_{μ_{n}, ν_{n}, γ_{n}, δ_{n}} | d x \leq \frac{1}{2} ‖ f ‖_{L^{2} (Ω)}^{2} + \frac{(‖ α_{1} ‖_{\infty} + ‖ α_{0} ‖_{\infty}) | Ω | γ_{n}}{2} \leq K,$ (3.21) for some constant K > 0. Then, as in the proof of Proposition 3.2, we get that ${(u_{μ_{n}, ν_{n}, γ_{n}, δ_{n}})}_{n \in N}$ is bounded in TGV which, together with the L¹ bound, gives the desired bound in TV. □

5. The bilevel optimization scheme

In this section we will adapt the bilevel optimization framework developed in [Citation27, Citation29] in order to automatically select the regularization functions α₀ and α₁. The main idea is to minimize a suitable upper level objective over both the image u and the regularization parameters α₀, α₁ subject to u being a solution to a (regularized) TGV-based reconstruction problem with these regularization weights.

It is useful to recall the definitions of the localized residual $R$ and the function F as stated in the introduction: (4.1) $R u (x) = \int_{Ω} w (x, y) {(T u - f)}^{2} (y) d y,$ (4.1) where $w \in L^{\infty} (Ω \times Ω)$ with $\int_{Ω} \int_{Ω} w (x, y) dxdy = 1$ and (4.2) $F (v) : = \frac{1}{2} \int_{Ω} \max {(v - {\bar{σ}}^{2}, 0)}^{2} d x + \frac{1}{2} \int_{Ω} \min {(v - {\underline{σ}}^{2}, 0)}^{2} d x,$ (4.2) for some appropriately chosen ${\underline{σ}}^{2}, {\bar{σ}}^{2} .$

We thus end up to the following bilevel minimization problem: (4.3) ${\begin{matrix} \min J (u, α_{0}, α_{1}) : = F (R (u)) + \frac{λ_{0}}{2} ‖ α_{0} ‖_{H^{1} (Ω)}^{2} + \frac{λ_{1}}{2} ‖ α_{1} ‖_{H^{1} (Ω)}^{2}, \\ over (u, α_{0}, α_{1}) \in H^{1} (Ω) \times A_{a d}^{0} \times A_{a d}^{1}, \\ \begin{matrix} subject to (u, w) = \underset{(\tilde{u}, \tilde{w})}{argmin} & \frac{1}{2} ‖ T \tilde{u} - f ‖_{L^{2} (Ω)}^{2} + \int_{Ω} α_{1} φ_{γ, δ} (\nabla \tilde{u} - \tilde{w}) d x + \int_{Ω} α_{0} φ_{γ, δ} (E \tilde{w}) d x \\ + \frac{μ}{2} ‖ \nabla \tilde{u} ‖_{L^{2} (Ω)}^{2} + \frac{ν}{2} ‖ \tilde{w} ‖_{H^{1} (Ω, R^{d})}^{2} . \end{matrix} \end{matrix}$ (4.3)

Utilizing the regularized primal-dual first-order optimality characterization (Opt₁)–(Opt₄) of the solution to the lower level problem of (4.3), we arrive at the following mathematical program with equilibrium constraints (MPEC, for short) which is equivalent to (4.3): $(P_{TGV}) {\begin{matrix} \min J (u, α_{0}, α_{1}) : = F (R (u)) + \frac{λ_{0}}{2} ∥ α_{0} ∥_{H^{1} (Ω)}^{2} + \frac{λ_{1}}{2} ∥ α_{1} ∥_{H^{1} (Ω)}^{2}, \\ over (u, α_{0}, α_{1}) \in H^{1} (Ω) \times A_{a d}^{0} \times A_{a d}^{1}, \\ \begin{matrix} subject to B u - μ Δ u + \nabla^{*} q - T^{*} f = 0, \\ ν w - ν Δ w - q + E^{*} p = 0, \\ {max}_{δ} (| \nabla u - w |, γ_{1}) q - α_{1} (\nabla u - w) = 0, \\ {max}_{δ} (| E w |, γ_{0}) p - α_{0} E w = 0. \end{matrix} \end{matrix}$

Note that in view of the equivalence of (4.3) and ( $P$ _TGV), we will still refer the latter as the bilevel TGV problem. A few words about ( $P$ _TGV) are in order. Here, α_i are forced to be contained in the box constraint sets (4.4) $A_{a d}^{i} : = {α_{i} \in H^{1} (Ω) : {\underline{α}}_{i} \leq α_{i} \leq {\bar{α}}_{i}}, i = 0, 1,$ (4.4) with ${\underline{α}}_{i}, {\bar{α}}_{i} \in L^{2} (Ω)$ and $0 < \underline{ϵ} \leq {\underline{α}}_{i} (x) < {\bar{α}}_{i} (x) - \bar{ϵ}$ in $Ω$ for some $\underline{ϵ}, \bar{ϵ} > 0,$ i = 0, 1. Note that the H¹ regularity on the parameter functions $α_{0}, α_{1}$ facilitates the existence and differential sensitivity analysis as established in [Citation27, Citation29] for the TV case. Note, however, that this setting does not guarantee a priori that these functions belong to $C (\bar{Ω}),$ the regularity required for applying the dualization results of the previous sections. Nevertheless, under mild data assumptions, one can make use of the following regularity result of the H¹–projection onto the sets $A_{a d}^{0}$ and $A_{a d}^{1};$ see [Citation29, Corollary 2.3] for a proof.

Proposition 4.1.

Let $Ω \subset R^{ℓ}$ with $ℓ = 1, 2, 3$ be a bounded convex set and let $A_{a d} : = {α \in H^{1} (Ω) : \underline{α} \leq α \leq \bar{α}}$ , where $\underline{α}, \bar{α} \in H^{2} (Ω)$ such that $\underline{α} \leq \bar{α}$ and $\frac{\partial \underline{α}}{\partial ν} = \frac{\partial \bar{α}}{\partial ν} = 0$ in $H^{1 / 2} (\partial Ω)$ . Then if $ω^{*} = P_{A_{a d}} (ω) : = \underset{α \in A}{argmin} \frac{1}{2} ‖ α - ω ‖_{H^{1} (Ω)}^{2},$ it holds $ω \in H^{2} (Ω) and \frac{\partial ω}{\partial ν} = 0 \Rightarrow ω^{*} \in H^{2} (Ω) and \frac{\partial ω^{*}}{\partial ν} = 0.$

In particular, if ${\underline{α}}_{0}, {\bar{α}}_{0},$ ${\underline{α}}_{1}, {\bar{α}}_{1}$ as well as the initializations for α₁ and α₀ are constant functions, then along the projected gradient iterations of Algorithm Citation2, the weights are guaranteed to belong to $H^{2} (Ω)$ which (for dimension $d \leq 2$ ) embeds into $C (\bar{Ω}) .$

We briefly note that in the TV case it can be shown [Citation30, Citation46] that $W^{1, 1}$ regularity for the regularization parameter α suffices to establish a dualization framework. A corresponding result is not yet known for TGV, even though one expects that it could be shown by similar arguments. Hence, here we will also make use of the H¹–projection regularity result as described above.

Regarding the box constraints (4.4) in [Citation25] it was shown that for a PSNR-optimizing upper level objective $\tilde{J} (u, α) = ‖ u (α) - f ‖_{L^{2} (Ω)}^{2}$ subject to H¹ and Huber regularized TV and TGV denoising problems, under some mild conditions on the data f, the optimal scalar solutions α and $(α_{0}, α_{1})$ are strictly positive. As depicted in the upper level objective discussed here appears close to optimizing the PSNR, keeping the parameters strictly positive via (4.4) seems, however, necessary for the time being.

We now briefly discuss how to treat the bilevel problem ( $P$ _TGV). Let $(α_{0}, α_{1}) \mapsto u (α_{0}, α_{1})$ denote the solution map for the lower level problem, equivalently of the optimality conditions (Opt₁)–(Opt₄). Then the problem ( $P$ _TGV) admits the following reduced version (4.5) $4 \min \hat{J} (α_{0}, α_{1}) : = J (u (α_{0}, α_{1}), α_{0}, α_{1}) over α_{0} \in A_{a d}^{0}, α_{1} \in A_{a d}^{1} .$ (4.5)

Similarly to the TV case [Citation27], one can show that the reduced functional $\hat{J} : H^{1} (Ω) \times H^{1} (Ω) \to R$ is differentiable. We can then apply the KKT framework in Banach spaces [Citation53]: (4.6) ${\begin{matrix} minimize J (x) over x \in X, \\ subject to x \in C and g (x) = 0, \end{matrix}$ (4.6) where $V, A, Z$ are Banach spaces, $X = V \times A,$ $J : X \to R$ and $g : X \to Z$ are Fréchet differentiable and continuous differentiable functions, respectively, and $C \subset X$ is a non-empty, closed convex set. In the bilevel TGV problem ( $P$ _TGV) we have $V = H^{1} (Ω) \times H^{1} (Ω, R^{d}) \times L^{2} (Ω, R^{d}), L^{2} (Ω, S^{d \times d}),$ $A = H^{1} (Ω) \times H^{1} (Ω),$ $C = V \times A_{a d}^{0} \times A_{a d}^{1},$ and $Z = H^{1} {(Ω)}^{*} \times H^{1} {(Ω, R^{d})}^{*} \times L^{2} (Ω, R^{d}), L^{2} (Ω, S^{d \times d}) .$ Here $g : X \to Z$ is defined by the optimality conditions (Opt₁)–(Opt₄). Finally, for $x = (u, w, q, p, α_{0}, α_{1}) : = (x, α_{0}, α_{1}),$ we have $J (x) = J (u, α_{0}, α_{1}) .$ Note that the framework of (4.6) guarantees the existence of an adjoint variable $x^{*} \in V$ with the help of which an optimal solution of ( $P$ _TGV) can be characterized. This adjoint variable also allows the computation of the derivative of the reduced objective $\hat{J}'$ in an amenable way, see next section.

We will skip here the proofs for the differentiability of the functions g and the reduced objective J as well as the existence proofs for ( $P$ _TGV). These results can be shown similarly to the corresponding assertions for TV; see [Citation27, Theorem. 6.1, Proposition 6.2, Proposition 6.3].

6. Numerical implementation

In this section we will describe a Newton method for the lower level problem, a projected gradient algorithm for the solution of the discretized version of the bilevel problem ( $P$ _TGV), as well as provide corresponding numerical examples in denoising.

6.1. Newton solver for the lower level problem

Before we proceed to devising of a projected gradient algorithm for the solution of the bilevel problem, we discuss first a primal-dual Newton algorithm for the solution of the first-order optimality conditions (Opt₁)– (Opt₄) re-written here for the sake of readability: (5.1) $B u - μ Δ u - div q - f = 0,$ (5.1) (5.2) $ν w - ν Δ w - q - div p = 0,$ (5.2) (5.3) ${max}_{δ} (| \nabla u - w |, γ_{1}) q - α_{1} (\nabla u - w) = 0,$ (5.3) (5.4) ${max}_{δ} (| E w |, γ_{0}) p - α_{0} E w = 0.$ (5.4)

A few words on the discrete involved quantities are in order. Images (d = 2) are considered as elements of $U_{h} : = {u | u : Ω_{h} \to R}$ where $Ω_{h} = {1, 2, \dots, n} \times {1, 2, \dots, m}$ is a discrete cartesian grid that corresponds to the image pixels. The mesh size, defined as the distance between the grid points, is set to $h = 1 / \sqrt{n m} .$ We define the associated discrete function spaces $W_{h} = U_{h} \times U_{h}, V_{h} = U_{h} \times U_{h} \times U_{h},$ so that $p \in V_{h}$ with $p = (p^{11}, p^{12}, p^{22}) .$ For the discrete gradient and divergence we have, $\nabla : W_{h} \to V_{h}$ and $div : V_{h} \to W_{h}$ satisfying the adjoint relation $\nabla = - {div}^{⊤},$ setting zero values at the ghost points. The discretized symmetrized gradient Ew is defined as $\frac{1}{2} (\nabla w + {(\nabla w)}^{⊤}) .$ For the discretized versions of the Laplacian, we use the standard five-point stencils with zero Neumann boundary conditions, by setting the function values of ghost grid points to be the same with the function value of the nearest grid point in $Ω_{h} .$ Note that these act on the primal variables u and w, which satisfy natural boundary conditions in contrast to the dual variable.

The system of Equationequations (5.1)–(5.4) can be shortly written as $g_{pd} (x) = 0,$ where $x = (u, w, q, p) .$ We compute the derivative of $g_{pd}$ at a point $x = (u, w, q, p)$ as the following block-matrix: $D g_{pd} (x) = D g_{pd} (u, w, q, p) = [\begin{matrix} A & B \\ C & D \end{matrix}],$ where (5.5) $A = [\begin{matrix} B - μ Δ & 0 \\ 0 & ν (I - Δ) \end{matrix}], B = [\begin{matrix} - div & 0 \\ - I & - div \end{matrix}], D = [\begin{matrix} {max}_{δ} (| \nabla u - w |, γ_{1}) & 0 \\ 0 & {max}_{δ} (| E w |, γ_{0}) \end{matrix}],$ (5.5) (5.6) $C = [\begin{matrix} - α_{1} \nabla + q X_{δ} (| \nabla u - w |, γ_{1}) \frac{\nabla u - w}{| \nabla u - w |} \cdot \nabla & α_{1} I + q X_{δ} (| \nabla u - w |, γ_{1}) \frac{\nabla u - w}{| \nabla u - w |} \cdot (- I) \\ 0 & - α_{0} E + p X_{δ} (| E w |, γ_{0}) \frac{E w}{| E w |} \cdot E \end{matrix}] .$ (5.6)

Given $x^{k},$ the Newton iteration for solving the system of Equationequations (5.1)–(5.4), or $g_{pd} (x) = 0$ for short, reads $x^{k + 1} = x^{k} - D F {(x^{k})}^{- 1} F (x^{k}),$ which can also be written as (5.7) $D g_{pd} (x^{k}) x^{k + 1} = D g_{pd} (x^{k}) x^{k} - g_{pd} (x^{k}) .$ (5.7)

Here it is convenient to introduce the notation $D g_{pd} (x^{k}) = D g_{pd} (u^{k}, w^{k}, q^{k}, p^{k}) = [\begin{matrix} A & B \\ C_{k} & D_{k} \end{matrix}]$ since only the submatrices C and D depend on k. Note that the right-hand side $D g_{pd} (x^{k}) x^{k} - g_{pd} (x^{k})$ of the linear system (5.7) can be written as $D g_{pd} (x^{k}) x^{k} - F (x^{k}) = (\begin{matrix} b_{1}^{k} \\ b_{2}^{k} \end{matrix}),$ where $b_{1}^{k} = {(f, 0)}^{⊤}, and b_{2}^{k} = {(q^{k} X_{δ} (| \nabla u^{k} - w^{k} |, γ_{1}) | \nabla u^{k} - w^{k} |, p^{k} X_{δ} (| E w^{k} |, γ_{0}) | E w^{k} |)}^{⊤} .$

Notation-wise, the components that appear in $b_{2}^{k}$ should be regarded as the diagonals of the corresponding diagonal matrices that we mentioned before, multiplied component-wise. By introducing the notation $x_{1}^{k} = {(u^{k}, w^{k})}^{⊤}, x_{2}^{k} = {(q^{k}, p^{k})}^{⊤},$ the Newton system (5.7) can be written as (5.8) $[\begin{matrix} A & B \\ C_{k} & D_{k} \end{matrix}] (\begin{matrix} x_{1}^{k + 1} \\ x_{2}^{k + 1} \end{matrix}) = (\begin{matrix} b_{1}^{k} \\ b_{2}^{k} \end{matrix}) .$ (5.8)

The above system can be simplified utilizing the Schur complement: First solve for the primal variables $x_{1}^{k + 1} = (u^{k + 1}, w^{k + 1})$ and then recover the dual ones $x_{2}^{k + 1} = (q^{k + 1}, p^{k + 1}) .$ This yields $(A - B D_{k}^{- 1} C_{k}) x_{1}^{k + 1} = b_{1}^{k} - B D_{k}^{- 1} b_{2}^{k}, x_{2}^{k + 1} = D_{k}^{- 1} (b_{2}^{k} - C_{k} x_{1}^{k + 1}) .$

The folllowing result then holds.

Lemma 5.1.

If $(q^{k}, p^{k})$ belong to the feasible set, i.e., $| q^{k} | \leq α_{1}$ and $| p^{k} | \leq α_{0}$ component-wise, then the matrix $S_{k} : = (A - B D_{k}^{- 1} C_{k})$ is positive definite and for the minimum eigenvalues we have $λ_{\min} (S_{k}) \geq λ_{\min} (A) > 0$ . Furthermore, $S_{k}^{- 1}$ is bounded independently of k.

The proof of Lemma 5.1 follows the steps of the analogous proof in [Citation32] and is hence omitted. Summarizing, the Newton method for the solution of the (5.1)–(5.4) is outlined in Algorithm Citation1. Here we have followed [Citation32] and project in every iteration the variables q, p onto the feasible sets such that the result of Lemma 5.1 holds.

Algorithm 1 Newton algorithm for the solution of the regularized TGV primal problem $P_{γ}$

while some stopping criterion is not satisfied do

Solve the linear system for $x_{1}^{k + 1} = (u^{k + 1}, w^{k + 1})$ $(A - B D_{k}^{- 1} C_{k}) x_{1}^{k + 1} = b_{1}^{k} - B D_{k}^{- 1} b_{2}^{k}$

Update ${\tilde{x}}_{2}^{k + 1} = ({\tilde{q}}^{k + 1}, {\tilde{p}}^{k + 1})$ as follows ${\tilde{x}}_{2}^{k + 1} = D_{k}^{- 1} (b_{2}^{k} - C_{k} x_{1}^{k + 1})$

Compute $q^{k + 1}, p^{k + 1}$ as projections of ${\tilde{q}}^{k + 1}, {\tilde{p}}^{k + 1}$ onto the feasible sets ${q : | q | \leq α_{1}}, {p : | p | \leq α_{0}}$

end while

The projections onto the feasible sets are defined, respectively, as (5.9) $q = \frac{\tilde{q}}{\max {1, \frac{| \tilde{q} |}{α_{1}}}}, p = \frac{\tilde{p}}{\max {1, \frac{| \tilde{p} |}{α_{0}}}},$ (5.9) with the equalities above to be considered component-wise.

6.2. The numerical algorithm for ( $P$ _TGV)

We now describe our strategy for solving the discretized version of the bilevel TGV problem ( $P$ _TGV). We note that in most of the experiments we will keep α₀ a scalar – this is justified by the numerical results; see the relevant discussion later on. We will always mention the small modifications on the algorithm when α₀ is a scalar. We will also make use here of the discrete Laplacian with zero Neumann boundary conditions $Δ : U_{h} \to U_{h}$ which is used to act on the weight function α₁. These are the desired boundary conditions for $α_{0}, α_{1}$ as dictated by the regularity result for the H¹–projection in [Citation29, Corollary 2.3]. For a function $u \in U_{h}$ we define the discrete $ℓ^{2}$ norm as $∥ u ∥_{ℓ^{2} (Ω_{h})}^{2} = h^{2} \sum_{(i, j) \in Ω_{h}} | u_{i, j} |^{2} .$

For the discrete H¹ norm applied to the weight function α₁ we use $∥ α_{1} ∥_{H^{1} (Ω_{h})} = h \sqrt{α_{1}^{⊤} (I - Δ) α_{1}},$ while the dual norm is defined as $∥ r ∥_{H^{1} {(Ω_{h})}^{*}} = ∥ {(I - Δ)}^{- 1} r ∥_{H^{1} (Ω_{h})} = h \sqrt{r^{⊤} {(I - Δ)}^{- 1} r}$ based on the $H^{1} \to H^{1} {(Ω)}^{*}$ Riesz map $α \mapsto r = (I - Δ) α .$ For the discrete version of the averaging filter in the definition of the localized residuals (4.1) we use a filter of size $n_{w} \times n_{w},$ with entries of equal value whose sum is equal to one. With these definitions the discrete version of the bilevel TGV ( $P$ _TGV), is the following: $(P_{TGV}^{h}) {\begin{matrix} minimize \frac{1}{2} ‖ {(R (u) - {\bar{σ}}^{2})}^{+} ∥_{ℓ^{2} (Ω_{h})}^{2} + \frac{1}{2} ∥ {({\underline{σ}}^{2} - R (u))}^{+} ∥_{ℓ^{2} (Ω_{h})}^{2} + \frac{λ_{0}}{2} ∥ α_{0} ∥_{H^{1} (Ω_{h})}^{2} + \frac{λ_{0}}{2} ∥ α_{1} ∥_{H^{1} (Ω_{h})}^{2}, \\ over (u, α_{0}, α_{1}) \in U_{h} \times {(A_{a d}^{0})}_{h} \times {(A_{a d}^{1})}_{h}, \\ \begin{matrix} subject to B u - μ Δ u - div q - f = 0, \\ ν w - ν Δ w - q - div p = 0, \\ {max}_{δ} (| \nabla u - w |, γ_{1}) q - α_{1} (\nabla u - w) = 0, \\ {max}_{δ} (| E w |, γ_{0}) p - α_{0} E w = 0. \end{matrix} \end{matrix}$

Here the box constraint sets are defined as $\begin{matrix} {(A_{a d}^{0})}_{h} = {α_{0} \in U_{h} : {\underline{α}}_{0} \leq {(α_{0})}_{i, j} \leq {\bar{α}}_{0}, for all (i, j) \in Ω_{h}}, \\ {(A_{a d}^{1})}_{h} = {α_{1} \in U_{h} : {\underline{α}}_{1} \leq {(α_{1})}_{i, j} \leq {\bar{α}}_{1}, for all (i, j) \in Ω_{h}} . \end{matrix}$

The discretized versions of (5.1)–(5.4) and the upper level objective are still denoted by $g_{pd} (x) = 0,$ and J respectively.

Regarding the choice of the lower and upper bounds for the local variance ${\underline{σ}}^{2}$ and ${\bar{σ}}^{2},$ respectively, we follow here the following rules, where $σ^{2}$ is the variance of the “Gaussian” noise contaminating the data: (5.10) ${\bar{σ}}^{2} = σ^{2} (1 + \frac{\sqrt{2}}{n_{w}}), {\underline{σ}}^{2} = σ^{2} (1 - \frac{\sqrt{2}}{n_{w}}) .$ (5.10)

The formulae (5.10) are based on the statistics of the extremes; see [Citation29, Section 4.2.1].

We now proceed by describing the algorithm for the numerical solution of ( $P_{TGV}^{h}$ ). In essence, we employ a discretized projected gradient method with Armijo line search. We briefly describe how the discrete gradient of the reduced objective functional is computed with the help of the adjoint equation. The corresponding discretized version of the latter $D_{x} g_{pd} {(x^{*})}^{⊤} = - D_{x} J (x),$ where $x^{*} : = (u^{*}, w^{*}, q^{*}, p^{*})$ is the adjoint variable, reads (5.11) $[\begin{matrix} A^{⊤} & C^{⊤} \\ B^{⊤} & D^{⊤} \end{matrix}] (\begin{matrix} u^{*} \\ w^{*} \\ q^{*} \\ p^{*} \end{matrix}) = (\begin{matrix} - 2 (u - f) (w * ({(R (u) - {\bar{σ}}^{2})}^{+} - {({\underline{σ}}^{2} - R (u))}^{+})) \\ 0 \\ 0 \\ 0 \end{matrix}) : = (\begin{matrix} b_{1}^{*} \\ b_{2}^{*} \end{matrix}),$ (5.11) where the matrices above were defined in (5.5) and (5.6). The equation can be solved again for $x_{1}^{*} : = (u^{*}, w^{*})$ first and then subsequently for $x_{2}^{*} : = (q^{*}, p^{*})$ as follows $\begin{matrix} (A^{⊤} - C^{⊤} {(D^{⊤})}^{- 1} B^{⊤}) x_{1}^{*} = b_{1}^{*}, \\ x_{2}^{*} = {(D^{⊤})}^{- 1} (b_{2}^{*} - B^{⊤} x_{1}^{*}) . \end{matrix}$

The derivatives of the reduced objective with respect to α₀ and α₁, respectively, are (5.12) ${\hat{J}}_{α_{0}}^{'} (α_{0}, α_{1}) = {(D_{α_{0}} g_{pd})}^{⊤} x^{*} + D_{α_{0}} J (α_{0}, α_{1})$ (5.12) (5.13) $\begin{matrix} = [\begin{matrix} I d & I d & 2 I d \end{matrix}] (\begin{matrix} 0 \\ 0 \\ 0 \\ - diag (E w) \end{matrix}) (\begin{matrix} u^{*} \\ w^{*} \\ q^{*} \\ p^{*} \end{matrix}) + λ_{0} (I d - Δ) α_{0} \\ = - [I d I d 2 I d] d iag (E w) p^{*} + λ_{0} (I d - Δ) α_{0}, \end{matrix}$ (5.13) (5.14) $\begin{matrix} {\hat{J}}_{α_{1}}^{'} (α_{0}, α_{1}) = {(D_{α_{1}} g_{pd})}^{⊤} x^{*} + D_{α_{1}} J (α_{0}, α_{1}) \\ = [\begin{matrix} I d & I d \end{matrix}] (\begin{matrix} 0 \\ 0 \\ - diag (D u - w) \\ 0 \end{matrix}) (\begin{matrix} u^{*} \\ w^{*} \\ q^{*} \\ p^{*} \end{matrix}) + λ_{1} (I d - Δ) α_{1}, \\ = - [I d I d] d iag (D u - w) q^{*} + λ_{1} (I d - Δ) α_{1}, \end{matrix}$ (5.14) where $x = (u, w, q, p)$ solves $g_{pd} (x) = 0$ for $α_{0}, α_{1} .$ The corresponding reduced gradients are (5.15) $\nabla_{α_{i}} \hat{J} (α_{0}, α_{1}) = {(I - Δ)}^{- 1} {\hat{J}}_{α_{i}}^{'} (α_{0}, α_{1}), i = 0, 1.$ (5.15)

We note that in the case of a scalar α₀, we set $λ_{0} = 0 .$ Then, ${\hat{J}}_{α_{0}}' (α_{0}, α_{1}) = - [1 1 21] diag (E w) p^{*},$ and $\nabla_{α_{0}} \hat{J} (α_{0}, α_{1}) = {\hat{J}}_{α_{0}}' (α_{0}, α_{1}) .$ Here $1$ denotes a matrix of size $1 \times n m$ with all entries equal to one. In summary, the projected gradient algorithm for the solutions of ( $P_{TGV}^{h}$ ) is described in Algorithm Citation2

We lastly note that the projections $P_{{(A_{a d}^{0})}_{h}}, P_{{(A_{a d}^{1})}_{h}}$ are computed as described in [Citation29, Algorithm Citation4], that is via the semismooth Newton method developed in [Citation54]. We only mention that the original discretized H¹–projection problem $P_{{(A_{a d})}_{h}} (\tilde{α})$ given by (5.16) ${\begin{matrix} \min \frac{1}{2} ‖ α - \tilde{α} ‖_{H^{1} (Ω_{h})}^{2} : = \frac{h^{2}}{2} {(α - \tilde{α})}^{⊤} (I - Δ) (α - \tilde{α}), \\ over α \in {(A_{a d})}_{h} = {α \in U_{h} : \underline{α} \leq α_{i, j} \leq \bar{α}}, \end{matrix}$ (5.16) is approximated by the following penalty version: (5.17) $\min_{α \in U_{h}} \frac{1}{2} ‖ α - \tilde{α} ‖_{H^{1} (Ω_{h})}^{2} + \frac{1}{ϵ_{α}} (\frac{1}{2} ‖ {(α - \bar{α})}^{+} ‖_{ℓ^{2} (Ω_{h})}^{2} + \frac{1}{2} ‖ {(\underline{α} - α)}^{+} ‖_{ℓ^{2} (Ω_{h})}^{2}),$ (5.17) with some small $ϵ_{α} > 0 .$ For the projection regarding a scalar α₀, we simply set $P_{{(A_{a d}^{0})}_{h}} (α_{0}) = \max (\min (α_{0}, {\bar{α}}_{0}), {\underline{α}}_{0}) .$

Algorithm 2: Discretized projected gradient method for the bilevel TGV problem ( $P_{TGV}^{h}$ )

Input: f, ${\underline{α}}_{0},$ ${\bar{α}}_{0}, {\underline{α}}_{1},$ ${\bar{α}}_{1}, \bar{σ}, \underline{σ},$ λ₀, λ₁, μ, ν, γ₀, γ₁, δ, $n_{w} τ_{0}^{0}, τ_{1}^{0},$ $0 < c < 1, 0 < θ_{-} < 1 \leq θ_{+}$

Initialize: $α_{0}^{0} \in {(A_{a d}^{0})}_{h},$ $α_{1}^{0} \in {(A_{a d}^{1})}_{h}$ and set k = 0.

repeat

Use the Algorithm Citation1 to compute the solution $x^{k} = (u^{k}, w^{k}, q^{k}, p^{k})$ of the lower level problem $g_{pd} (u^{k}, w^{k}, q^{k}, p^{k}) = 0$

Solve the adjoint Equationequation (5.11)(5.11) $[\begin{matrix} A^{⊤} & C^{⊤} \\ B^{⊤} & D^{⊤} \end{matrix}] (\begin{matrix} u^{*} \\ w^{*} \\ q^{*} \\ p^{*} \end{matrix}) = (\begin{matrix} - 2 (u - f) (w * ({(R (u) - {\bar{σ}}^{2})}^{+} - {({\underline{σ}}^{2} - R (u))}^{+})) \\ 0 \\ 0 \\ 0 \end{matrix}) : = (\begin{matrix} b_{1}^{*} \\ b_{2}^{*} \end{matrix}),$ (5.11) for $(u^{*}, w^{*}, q^{*}, p^{*})$

Compute the derivative of the reduced objective with respect to α₀ and α₁ as in (5.13) and (5.14)

Compute the reduced gradients $\nabla_{α_{i}} \hat{J} (α_{0}^{k}, α_{1}^{k}) = {(I - Δ)}^{- 1} {\hat{J}}_{α_{i}}' (α_{0}^{k}, α_{1}^{k}), i = 0, 1$

Compute the trial points $α_{i}^{k + 1} = P_{{(A_{a d}^{i})}_{h}} (α_{i}^{k} - τ_{i}^{k} \nabla_{α_{i}} \hat{J} (α_{0}^{k}, α_{1}^{k})), i = 0, 1$

while $\begin{matrix} \hat{J} (α_{0}^{k + 1}, α_{1}^{k + 1}) > \hat{J} (α_{0}^{k}, α_{1}^{k}) \\ + c ({\hat{J}}_{α_{0}}^{'} {(α_{0}^{k}, α_{1}^{k})}^{⊤} (α_{0}^{k + 1} - α_{0}^{k}) + {\hat{J}}_{α_{1}}^{'} {(α_{0}^{k}, α_{1}^{k})}^{⊤} (α_{1}^{k + 1} - α_{1}^{k})) \end{matrix}$

do (Armijo line search)

Set $τ_{0}^{k} : = θ_{-} τ_{0}^{k}, τ_{1}^{k} : = θ_{-} τ_{1}^{k}$ and re-compute $α_{i}^{k + 1} = P_{{(A_{a d}^{i})}_{h}} (α_{i}^{k} - τ_{i}^{k} \nabla_{α_{i}} \hat{J} (α_{0}^{k}, α_{1}^{k})), i = 0, 1$

end while

Update $τ_{0}^{k + 1} = θ_{+} τ_{0}^{k}, τ_{1}^{k + 1} = θ_{+} τ_{1}^{k}$ and $k : = k + 1$

until some stopping condition is satisfied

6.3. Numerical examples in denoising

We now discuss some weighted TGV numerical examples, with regularization weights produced automatically by Algorithm Citation2. We are particularly interested in the degree of improvement over the scalar TGV examples. We are also interested in whether the statistics-based upper level objective enforces an automatic choice of regularization parameters that ultimately leads to a reduction of the staircasing effect. Our TGV results are also compared with the bilevel weighted TV method of [Citation27, Citation29]. In order to have a fair comparison, we use a Huber TV regularization for the corresponding lower lever problem, with the latter substituted by the TV primal-dual optimality conditions, $\begin{matrix} u - div p - f = 0 \\ \max (| \nabla u |, γ) p - α \nabla u = 0, \end{matrix}$ where α is now the spatially dependent regularization parameter for TV. We use the same values for the Huber parameter γ both in bilevel TV and bilevel TGV. The associated test images are depicted in with resolution $n = m = 256 .$ The first one is the well-known “Cameraman” image which essentially consists of a combination of piecewise constant parts and texture. The next two images, “Parrot” and “Turtle” contain large piecewise affine type areas, thus they are more suitable for the TGV prior. The final image “hatchling” is characterized by highly oscillatory patterns of various kinds, depicting sand in various degrees of focus.

Figure 3. Test images, resolution $256 \times 256 .$

Figure 3. Test images, resolution 256×256.

Parameter values: For the lower level primal-dual TGV problem we used μ = 0, ν = 1, $δ = 10^{- 5},$ $γ_{0} = γ_{1} = 10^{- 3},$ and a mesh size h = 1. For the H¹–projection, we set $ϵ_{α} = 10^{- 6},$ and we also weighted the discrete Laplacian Δ with $6 \times 10^{4} .$ For the lower and upper bounds of α₀ and α₁ we set here ${\underline{α}}_{0} = 10^{- 2}, {\bar{α}}_{0} = 10$ and ${\underline{α}}_{1} = 10^{- 4}, {\bar{α}}_{1} = 10 .$ We also set $λ_{1} = 10^{- 11}$ and when we spatially varied α₀ we also set $λ_{0} = 10^{- 11} .$ We used a normalized $n_{w} \times n_{w}$ filter for w (i.e., with entries $1 / n_{w}^{2}$ ), with n_w = 7. The local variance barriers ${\underline{σ}}^{2}$ and ${\bar{σ}}^{2}$ were set according to (5.10). For our noisy images we have $σ^{2} = 10^{- 2},$ and thus the corresponding values for $(\underline{σ}, \bar{σ})$ are $(0.00798, 0.01202) .$ For the Armijo line search the parameters were $τ_{0}^{0} = 0.05,$ $τ_{1}^{0} = 100, c = 10^{- 9}, θ_{-} = 0.25, θ_{+} = 2 .$ We solved each lower level problem until the residual of each of the optimality conditions (5.1)–(5.4) had Euclidean norm less than $10^{- 4} .$ MATLAB’s backslash was used for the solution of the linear systems.

We note that the initialization of the algorithm needs some attention. As it was done in [Citation29] for the TV case, $α_{0}^{0}$ and $α_{0}^{1}$ must be large enough in order to produce cartoon-like images, providing the local variance estimator with useful information. However, if α₀ is initially too large then there is a danger of falling into the regime, in which the TGV functional and hence the solution map of (at least the non-regularized) lower level problem does not depend on α₀. In that case the derivative of the reduced functional with respect to α₀ will be close to zero, thus making no or little progress with respect to its optimal choice. Indeed this was confirmed after some numerical experimentation. Note that an analogous phenomenon can occur also in the case where α₀ is much smaller than α₁. In that case it is the effect of α₁ which vanishes. This has been shown theoretically in [Citation37, Proposition 2] for dimension one, but numerical experiments indicate that this phenomenon persists also in higher dimensions. In our examples we used $α_{1}^{0} = 0.25$ and $α_{0}^{0} = 0.2 .$ Regarding the termination of the projected gradient algorithm, we used a fixed number of iterations n = 40. Neither the upper level objective nor the argument changed significantly after running the algorithm for more iterations; see for instance . The same holds true for the corresponding PSNR and SSIM values. We also note that a termination criterion as in [Citation29] based on the proximity measures $‖ P_{{(A_{a d}^{i})}_{h}} (α_{i}^{k} - \nabla_{α_{i}} \hat{J} (α_{0}^{k}, α_{1}^{k})) - α_{i}^{k} ‖_{H^{1} (Ω_{h})},$ i = 0, 1, is also possible here.

Figure 4. Upper level objective values vs projected gradient iterations for the problem ( $P_{TGV}^{h}$ ) (right) of (scalar α₀, spatial α₁).

Figure 4. Upper level objective values vs projected gradient iterations for the problem (PTGVh) (right) of Figure 5 (scalar α0, spatial α1).

We note that due to the line search, the number of times that the lower level problem has to be solved is more than the number of projected gradient iterations. For instance for the four examples the lower level problem had to be solved 58, 57, 57, and 57 times, respectively, with typically 8-12 Newton iterations needed per each lower level problem, This resulted in each bilevel problem (40 projected gradient iterations) requiring approximately 45-50 times the CPU time needed to solve one instance of the lower level problem. We note however that in general after 8-9 projected gradient iterations the PSNR and SSIM values of the reconstruction were essentially reaching their top values, requiring about 6-8 times the CPU time needed for the lower level problem. Further improvement with respect to computational times can be achieved using faster descent methods than the one we used here, like for instance [Citation55,Citation56].

For the first series of examples we keep the parameter α₀ scalar, whose value nevertheless is determined by the bilevel algorithms. We depict the examples in . The first row shows the noisy images, while the second contains the bilevel TV results. The third row depicts the best scalar TGV results with respect to SSIM, where we have computed the optimal scalars $α_{0}, α_{1}$ with a manual grid method. The fourth row shows the results of ( $P_{TGV}^{h}$ ). Detailed sections of all the images of are highlighted in . The weight functions α₁ for the bilevel TV and the bilevel TGV algorithms are shown in . In we report all PSNR and SSIM values of the best scalar methods (scalar TV, scalar TGV) with respect to both quality measures, the corresponding values of the bilevel TV and TGV algorithms, as well as the ones the correspond to solving the bilevel TGV with the statistics-based upper level objective (4.2) but using scalar parameters only (third row from the end in ). For the latter case, we used no additional regularization for the weights in the upper level objective. We also report the PSNR and SSIM values of the computed results when both α₀, α₁ are spatially varying (last row), which we discuss later on in this section. We next comment on the results for each image.

Figure 5. First row: noisy images. Second row: bilevel TV. Third row: Best scalar TGV (SSIM). Fourth row: bilevel TGV.

Figure 6. Details of the reconstructions shown in .

Figure 7. First row: the computed regularization functions α for bilevel TV. Second row: the computed regularization functions α₁ for bilevel TGV.

Figure 7. First row: the computed regularization functions α for bilevel TV. Second row: the computed regularization functions α1 for bilevel TGV.

Table 1. PSNR and SSIM comparisons for the images of .

Display Table

Cameraman: Here both the best PSNR and SSIM are obtained by the bilevel TV algorithm. This is probably not surprising due to the piecewise constant nature of this image. However, the bilevel TGV algorithm improve upon its scalar version with respect to both measures.

Parrot: Here the best results with respect to both PSNR and SSIM are achieved by the bilevel TGV algorithm, ( $P_{TGV}^{h}$ ). There is significant improvement over all TV methods, which is due to the parameters being chosen in a way such that the staircasing effect diminishes. Furthermore, we observe improvement over the scalar TGV result especially around the parrot’s eye, where the weights α₁ drop significantly; see the second column of .

Turtle: We get analogous results here as well, with the bilevel TGV ( $P_{TGV}^{h}$ ) producing the best results both with respect to PSNR and SSIM. There is a significant reduction of the staircasing effect, while the weight α₁ drops in the detailed areas of the image (head and flipper of the turtle).

Hatchling: In this image, the best PSNR and best SSIM are achieved by the scalar TGV when its parameters are manually optimized with respect to each measure (using the ground truth). However, the (ground truth-free) bilevel TGV achieves a better SSIM and PSNR value compared to these two results, respectively, achieving a better balance with respect to these measures. Similarly the scalar TV results are generally better than the ones of bilevel TV. We attribute this to the fact that the natural oscillatory features of the image are interpreted as noise by the upper level objective. Nevertheless, all the bilevel methods are able to locate and preserve better the eyes area, i.e., sand in focus, with the weight α₁ dropping there significantly.

We remark that in all four examples, the ground truth-free bilevel TGV with at least one spatially varying parameter always produces better results with respect to both PSNR and SSIM, than the corresponding ground truth-free bilevel TGV with both parameters being scalar, compare the third last with the second last row of .

Finally, we would like to experiment with the case were also the weight α₀ varies spatially. We note that by spatially varying both TGV parameters, the reduced problem becomes highly non-convex with many combinations of these parameters leading to similar values for the upper level objective. In order to deal with this, we solve a slightly different problem by performing one step of alternating optimization. Here we optimize only with respect to a spatially varying α₀ having fixed the spatial weight α₁ as it has been computed by the previous experiments, see the last row of . According to our numerical experiments, this strategy produces satisfactory results. As initialization for α₀, we set it constant, equal to 5.

In we depict the computed spatially varying parameters α₀ as well as the corresponding PSNR and SSIM values. Observe that the shape of α₀ is different to the one of α₁, compare the last row of to the second row of . This implies that a non-constant ratio of $α_{0} / α_{1}$ is preferred throughout the image domain. Secondly, by spatially varying α₀ we only get a slight improvement with respect to PSNR and SSIM in the first two images, compare the second last and the last row of . However, it is interesting to observe the spatial adaptation of α₀ with respect to piecewise constant versus piecewise smooth areas. The values of α₀ are high in large piecewise constant areas, like the background of cameraman, the left area of the parrot image, as well as the top-right and the bottom-left corner of the turtle image. This is not so surprising as large values of α₀ imply a large ratio $α_{0} / α_{1}$ and a promotion of TV like behavior in those areas. We can observe this in more detail at the parrot image, see last row of . On the contrary, the values of α₀ are kept small in piecewise smooth areas like the right part of the parrot image and the sun rays around the turtle’s body. This results in low ratio $α_{0} / α_{1}$ and thus to a more TGV like behavior, reducing the staircasing effect. This is another indication of the fact that by minimizing the statistics-based upper level objective one is able not only to better preserve detailed areas but also to finely adjust the TGV parameters such that the staircasing is reduced.

Figure 8. Experiments with optimizing over a spatially varying α₀. Top row: the automatically computed scalar parameters α₀, that correspond to the images of the last row of . Middle row: the automatically computed spatially varying parameters α₀, where α₁ has been kept fixed (last row of ). The weight α₀ is adapted to piecewise constant parts having there large values and hence promoting TV like behavior, see for instance the parrot image at the last row. On the contrary α₀ has low values in piecewise smooth parts promoting a TGV like behavior reducing the staircasing.

Figure 8. Experiments with optimizing over a spatially varying α0. Top row: the automatically computed scalar parameters α0, that correspond to the images of the last row of Figure 5. Middle row: the automatically computed spatially varying parameters α0, where α1 has been kept fixed (last row of Figure 7). The weight α0 is adapted to piecewise constant parts having there large values and hence promoting TV like behavior, see for instance the parrot image at the last row. On the contrary α0 has low values in piecewise smooth parts promoting a TGV like behavior reducing the staircasing.

Lastly, we would like to see the degree of improvement of bilevel TGV with spatial parameters over the other ground truth-free approaches, i.e. using the statistics-based upper level objective, over a larger set of images. We thus used the 24 images of the Kodak dataset http://r0k.us/graphics/kodak/, and selected two middle square parts from each photograph, resized to 256 × 256 pixels, see left part of . We added two more photographs (see bottom-right part in the depicted collection) to reach 50 photographs in total. In each photograph, we run the bilevel TV and bilevel TGV experiments after adding the same type of Gaussian noise (with only α₁ being spatially varying), as well as the bilevel TGV with both parameters to be scalar and we compared the corresponding PSNR differences, ${PSNR}_{spatial TGV} - {PSNR}_{spatial TV},$ ${PSNR}_{spatial TGV} - {PSNR}_{scalar TGV},$ as well as the corresponding SSIM ones, for every one of the 50 images. The PSNR improvement for the bilevel TGV over bilevel TV (both spatial parameters) was 0.05 ± 0.14 (mean ± standard deviation) while the improvement with respect to SSIM was more moderate, 0.003 ± 0.008. In (top) we also show the corresponding histograms of this difference. The fact that for some images the bilevel TV algorithm produces images of higher PSNR and/or SSIM, as it was the case for the cameraman, is perhaps not so surprising as the same has been observed in similar tests in [Citation26] in the case of scalar parameters only and for a bilevel scheme that maximizes the PSNR of the reconstructed image, using a ground truth-based upper level objective. We stress that for large ratio $α_{0} / α_{1}$ TGV is equivalent to TV only up an affine correction [Citation37] and for images that have a piecewise constant structure, setting α₀ in TGV to be large enough might not be enough to achieve the same reconstruction as TV. On the other hand the improvement of bilevel TGV with spatially varying α₁ versus its scalar analogue was $0.08 \pm 0.17$ for the PSNR and 0.005 ± 0.008 for the SSIM, see bottom histograms of . We note that the last two methods require a similar computational effort since the additional H¹–projection for the spatially varying weight is typically quite fast and does not significantly contribute to the overall computational time.

Figure 9. Histograms of the PSNR and SSIM differences between the bilevel spatial TGV and bilevel spatial TV (top) and between the bilevel spatial TGV and bilevel scalar TGV (bottom), for reconstructions of noisy versions of the 50 images on the left. In all bilevel algorithms the ground truth-free statistics-based upper level objective was used.

7. Conclusion

In this work we have adapted the bilevel optimization framework of [Citation27, Citation29] for automatically computing spatially dependent regularization parameters for the TGV regularizer. We established a rigorous dualization framework for the lower level TGV minimization problem that formed the basis for its algorithmic treatment via a Newton method. We showed that the bilevel optimization framework with the statistics/localized residual based upper level objective is able to automatically produce spatially varying parameters that not only adapt to the level of detail in the image but also reduce the staircasing effect.

Future continuation of this work includes adaptation of the bilevel TGV framework for advanced inverse problems tasks, i.e., Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) reconstruction as well as in multimodal medical imaging problems where structural TV based regularizers (edge aligning) have been suggested. That will also require devising a bilevel scheme where the invertibility assumption on T is dropped, since the linear operators involved in these inverse problems do not satisfy this assumption. Adaptation of the framework for different noise distributions e.g. Poisson, Salt & Pepper as well as combination of those [Citation57,Citation58], should also be investigated. Finally, a fine structural analysis of the weighted TGV regularized solutions in the spirit of [Citation59,Citation60] would be also of interest.

Additional information

Funding

This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – The Berlin Mathematics Research Center MATH+ (EXC-2046/1, project ID: 390685689). It was further supported by the MATHEON Research Center project CH12 funded by the Einstein Center for Mathematics (ECMath) Berlin. M.H and K.P. acknowledge support of Institut Henri Poincaré (UMS 839 CNRS-Sorbonne Université), and LabEx CARMIN (ANR-10-LABX-59-01). C.N.R. was supported by NSF grant DMS-2012391. H.S. acknowledges the financial support of Alexander von Humboldt Foundation.

References

Adams, R. A., Fournier, J. (2003). Sobolev Spaces. 2nd ed. Cambridge, MA: Academic Press,
Google Scholar
Ambrosio, L., Fusco, N., Pallara, D. (2000). Functions of Bounded Variation and Free Discontinuity Problems. USA: Oxford University Press.
Google Scholar
Attouch, H., Brezis, H. (1986). Duality for the Sum of Convex Functions in General Banach Spaces. Vol. 34, North-Holland Mathematical Library, pp. 125–133.
Google Scholar
Barzilai, J., Borwein, J. M. (1988). Two-point step size gradient methods. IMA J Numer Anal. 8(1):141–148. DOI: 10.1093/imanum/8.1.141.
Web of Science ®Google Scholar
Benning, M., Brune, C., Burger, M., Müller, J. (2013). Higher-order TV methods: Enhancement via Bregman iteration. J Sci Comput. 54(2-3):269–310. DOI: 10.1007/s10915-012-9650-3.
Web of Science ®Google Scholar
Bergounioux, M., Papoutsellis, E., Stute, S., Tauber, C. (2018). Infimal convolution spatiotemporal PET reconstruction using total variation based priors, HAL preprint https://hal.archives-ouvertes.fr/hal-01694064.
Google Scholar
Borwein, J. M., Vanderwerff, J. D. (2010). Convex Functions. Cambridge: Cambridge University Press.
Google Scholar
Bredies, K., Dong, Y., Hintermüller, M. (2013). Spatially dependent regularization parameter selection in total generalized variation models for image restoration. Inter. J. Comput. Math. 90(1):109–123. DOI: 10.1080/00207160.2012.700400.
Web of Science ®Google Scholar
Bredies, K., Holler, M. (2012). Artifact-free JPEG decompression with total generalized variation. VISAP 2012: Proceedings of the International Conference on Computer Vision and Applications,
Google Scholar
Bredies, K., Holler, M. (2013). A TGV regularized wavelet based zooming model. Scale Space and Variational Methods in Computer Vision. Springer, DOI: 10.1007/978-3-642-38267-3_13., pp. 149–160.
Google Scholar
Bredies, K., Holler, M. (2014). Regularization of linear inverse problems with total generalized variation. J. Inverse Ill-Posed Prob. 22(6):871–913. DOI: 10.1515/jip-2013-0068.
Web of Science ®Google Scholar
Bredies, K., Holler, M. (2015). A TGV-based framework for variational image decompression, zooming, and reconstruction. Part I: Analytics. SIAM J. Imaging Sci. 8(4):2814–2850. DOI: 10.1137/15M1023865.
Web of Science ®Google Scholar
Bredies, K., Holler, M. (2015). A TGV-based framework for variational image decompression, zooming, and reconstruction. Part II: Numerics. SIAM J. Imaging Sci. 8(4):2851–2886. DOI: 10.1137/15M1023877.
Web of Science ®Google Scholar
Bredies, K., Holler, M., Storath, M., Weinmann, A. (2018). Total generalized variation for manifold-valued data. SIAM J. Imaging Sci. 11(3):1785–1848. DOI: 10.1137/17M1147597.
Web of Science ®Google Scholar
Bredies, K., Kunisch, K., Pock, T. (2010). Total generalized variation. SIAM J. Imaging Sci. 3(3):492–526. DOI: 10.1137/090769521.
Web of Science ®Google Scholar
Bredies, K., Kunisch, K., Valkonen, T. (2013). Properties of L1-TGV 2: The one-dimensional case. J. Math. Anal. Appl. 398(1):438–454. DOI: 10.1016/j.jmaa.2012.08.053.
Web of Science ®Google Scholar
Bredies, K., Valkonen, T. (2011). Inverse problems with second-order total generalized variation constraints. Proceedings of SampTA 2011 - 9th International Conference on Sampling Theory and Applications, Singapore,
Google Scholar
Burger, M., Papafitsoros, K., Papoutsellis, E., Schönlieb, C. B. (2016). Infimal convolution regularisation functionals of BV and [Formula: see text] Spaces: Part I: The Finite [Formula: see text] Case. J. Math. Imaging Vis. 55(3):343–369. DOI: 10.1007/s10851-015-0624-6.
PubMed Web of Science ®Google Scholar
Calatroni, L., Chung, C., Los Reyes, J. D., Schönlieb, C. B., Valkonen, T. (2017). Bilevel approaches for learning of variational imaging models. RADON Book Series on Computational and Applied Mathematics. Vol. 18, Berlin, Boston: De Gruyter, https://www.degruyter.com/view/product/458544.
Google Scholar
Calatroni, L., De Los Reyes, J. C., Schönlieb, C. B. (2017). Infimal convolution of data discrepancies for mixed noise removal. SIAM J. Imaging Sci. 10(3):1196–1233. DOI: 10.1137/16M1101684.
Web of Science ®Google Scholar
Calatroni, L., Papafitsoros, K. (2019). Analysis and automatic parameter selection of a variational model for mixed gaussian and salt-and-pepper noise removal. Inverse Prob. 35(11):114001. DOI: 10.1088/1361-6420/ab291a.
Web of Science ®Google Scholar
Caselles, V., Chambolle, A., Novaga, M. (2007). The discontinuity set of solutions of the TV denoising problem and some extensions. Multiscale Model. Simul. 6(3):879–894. DOI: 10.1137/070683003.
Web of Science ®Google Scholar
Chambolle, A., Duval, V., Peyré, G., Poon, C. (2017). Geometric properties of solutions to the total variation denoising problem. Inverse Prob. 33(1):015002. http://stacks.iop.org/0266-5611/33/i=1/a=015002. DOI: 10.1088/0266-5611/33/1/015002.
Web of Science ®Google Scholar
Chambolle, A., Lions, P. L. (1997). Image recovery via total variation minimization and related problems. Numerische Mathematik. 76(2):167–188. DOI: 10.1007/s002110050258.
Web of Science ®Google Scholar
Van Chung, C., De los Reyes, J. C., Schönlieb, C. B. (2017). Learning optimal spatially-dependent regularization parameters in total variation image denoising. Inverse Prob. 33(7):074005. DOI: 10.1088/1361-6420/33/7/074005.
Web of Science ®Google Scholar
De Los Reyes, J. C., Schönlieb, C. B., Valkonen, T. (2016). The structure of optimal parameters for image restoration problems. J. Math. Anal. Appl. 434(1):464–500. DOI: 10.1016/j.jmaa.2015.09.023.
Web of Science ®Google Scholar
De Los Reyes, J. C., Schönlieb, C. B., Valkonen, T. (2017). Bilevel parameter learning for higher-order Total Variation regularisation models. J. Math. Imaging Vis. 57(1):1–25. DOI: 10.1007/s10851-016-0662-8.
PubMed Web of Science ®Google Scholar
Demengel, F., Temam, R. (1984). Convex functions of a measure and applications. Indiana Univ. Math. J. 33(5):673–709. DOI: 10.1512/iumj.1984.33.33036.
Web of Science ®Google Scholar
Ekeland, I., Temam, R. (1999). Convex analysis and variational problems. Classics Appl. Mathem. Soci. Indust. Appl. Math.
Google Scholar
Girault, V., Raviart, P. A. (1986). Finite Element Method for Navier-Stokes Equation. Berlin, Germany: Springer.
Google Scholar
Hintermüller, M., Holler, M., Papafitsoros, K. (2018). A function space framework for structural total variation regularization with applications in inverse problems. Inverse Prob. 34(6):064002. http://stacks.iop.org/0266-5611/34/i=6/a=064002. DOI: 10.1088/1361-6420/aab586.
Web of Science ®Google Scholar
Hintermüller, M., Kunisch, K. (2006). Path-following methods for a class of constrained minimization problems in function space. SIAM J. Optim. 17(1):159–187. DOI: 10.1137/040611598.
Web of Science ®Google Scholar
Hintermüller, M., Papafitsoros, K. (2019). Generating structured nonsmooth priors and associated primal-dual methods. Processing, Analyzing and Learning of Images, Shapes, and Forms: Part 2. In: Ron K. and Xue-Cheng T., eds. Handbook of Numerical Analysis, Vol. 20, DOI: 10.1016/bs.hna.2019.08.001., pp. 437–502.
Google Scholar
Hintermüller, M., Papafitsoros, K., Rautenberg, C. N. (2017). Analytical aspects of spatially adapted total variation regularisation. J. Math. Anal. Appl. 454(2):891–935. DOI: 10.1016/j.jmaa.2017.05.025.
Web of Science ®Google Scholar
Hintermüller, M., Papafitsoros, K., Rautenberg, C. N. (2020). Variable step mollifiers and applications. Integr. Equ. Oper. Theory. 92(6). DOI: 10.1007/s00020-020-02608-2.
Web of Science ®Google Scholar
Hintermüller, M., Rautenberg, C. N. (2015). On the density of classes of closed convex sets with pointwise constraints in Sobolev spaces. J. Math. Anal. Appl. 426(1):585–593. DOI: 10.1016/j.jmaa.2015.01.060.
Web of Science ®Google Scholar
Hintermüller, M., Rautenberg, C. N. (2017). Optimal selection of the regularization function in a weighted total variation model. Part I: Modelling and theory. J. Math. Imaging Vis. 59(3):498–514. DOI: 10.1007/s10851-017-0744-2.
Web of Science ®Google Scholar
Hintermüller, M., Rautenberg, C. N., Rösel, S. (2017). Density of convex intersections and applications. Proc. Royal Soci. London A Math. Phys. Engng Sci. 473(2205) DOI: 10.1098/rspa.2016.0919.
Google Scholar
Hintermüller, M., Rautenberg, C. N., Wu, T., Langer, A. (2017). Optimal selection of the regularization function in a weighted total variation model. Part II: Algorithm, its analysis and numerical tests. J. Math. Imaging Vis. 59(3):515–533. DOI: 10.1007/s10851-017-0736-2.
Web of Science ®Google Scholar
Hintermüller, M., Stadler, G. (2006). An infeasible primal-dual algorithm for total bounded variation–based inf-convolution-type image restoration. SIAM J. Sci. Comput. 28(1):1–23. DOI: 10.1137/040613263.
Web of Science ®Google Scholar
Holler, M., Kunisch, K. (2014). On infimal convolution of TV-type functionals and applications to video and image reconstruction. SIAM J. Imaging Sci. 7(4):2258–2300. DOI: 10.1137/130948793.
Web of Science ®Google Scholar
Huber, R., Haberfehlner, G., Holler, M., Kothleitner, G., Bredies, K. (2019). Total generalized variation regularization for multi-modal electron tomography. Nanoscale. 11(12):5617–5632. DOI: 10.1039/c8nr09058k.
PubMed Web of Science ®Google Scholar
Jalalzai, K. Discontinuities of the minimizers of the weighted or anisotropic total variation for image reconstruction, arXiv preprint 1402.0026 (2014), http://arxiv.org/abs/1402.0026.
Google Scholar
Jalalzai, K. (2016). Some remarks on the staircasing phenomenon in total variation-based image denoising. J. Math. Imaging Vis. 54(2):256–268. DOI: http://dx.doi.org/10.1007/s10851-015-0600-1.
Web of Science ®Google Scholar
Knoll, F., Bredies, K., Pock, T., Stollberger, R. (2011). Second order total generalized variation (TGV) for MRI. Magn. Reson. Med. 65(2):480–491. DOI: 10.1002/mrm.22595.
PubMed Web of Science ®Google Scholar
Knoll, F., Holler, M., Koesters, T., Otazo, R., Bredies, K., Sodickson, D. K. (2017). Joint MR-PET reconstruction using a multi-channel image regularizer. IEEE Trans. Med. Imaging. 36(1):1–16. DOI: 10.1109/TMI.2016.2564989.
PubMed Web of Science ®Google Scholar
Kunisch, K., Hintermüller, M. (2004). Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math. 64(4):1311–1333. DOI: 10.1137/S0036139903422784.
Web of Science ®Google Scholar
Nesterov, Y. E. (1983). A method for solving the convex programming problem with convergence rate O(1/k2). Soviet Math. Dokl. 27:367–372.
Google Scholar
Papafitsoros, K. (2014). Novel higher order regularisation methods for image reconstruction. Ph.D. thesis. University of Cambridge. https://www.repository.cam.ac.uk/handle/1810/246692.
Google Scholar
Papafitsoros, K., Bredies, K. (2015). Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, CB3 0WA, Cambridge A study of the one dimensional total generalised variation regularisation problem. Inverse Prob. Imaging. 9(2):511–550. DOI: 10.3934/ipi.2015.9.511.
Web of Science ®Google Scholar
Papafitsoros, K., Valkonen, T. (2015). Asymptotic behaviour of total generalised variation. Scale Space and Variational Methods in Computer Vision: 5th International Conference, SSVM 2015, Proceedings. In: Jean-François A., Mila N., and Nicolas P., eds. Springer International Publishing, pp. 702–714. DOI: 10.1007/978-3-319-18461-6_56.
Google Scholar
Pöschl, C., Scherzer, O. (2015). Exact solutions of one-dimensional total generalized variation. Comm. Math. Sci. 13(1):171–202. DOI: 10.4310/CMS.2015.v13.n1.a9.
Web of Science ®Google Scholar
Ring, W. (2000). Structural properties of solutions to total variation regularization problems. Esaim: M2an. 34(4):799–810. DOI: 10.1051/m2an:2000104.
Web of Science ®Google Scholar
Rudin, L. I., Osher, S., Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 60(1-4):259–268. DOI: 10.1016/0167-2789(92)90242-F.
Web of Science ®Google Scholar
Schloegl, M., Holler, M., Schwarzl, A., Bredies, K., Stollberger, R. (2017). Infimal convolution of total generalized variation functionals for dynamic MRI. Magn Reson Med. 78(1):142–155. DOI: 10.1002/mrm.26352.
PubMed Web of Science ®Google Scholar
Temam, R. (1985). Mathematical Problems in Plasticity. Vol. 15, Gauthier-Villars Paris.
Google Scholar
Temam, R., Strang, G. (1980). Functions of bounded deformation. Arch. Rational Mech. Anal. 75(1):7–21. DOI: 10.1007/BF00284617.
Web of Science ®Google Scholar
Valkonen, T. (2017). The jump set under geometric regularisation. Part 2: Higher-order approaches. J. Math. Anal. Appl. 453(2):1044–1085. DOI: 10.1016/j.jmaa.2017.04.037.
Web of Science ®Google Scholar
Valkonen, T., Bredies, K., Knoll, F. (2013). Total generalized variation in diffusion tensor imaging. SIAM J. Imaging Sci. 6(1):487–525. DOI: 10.1137/120867172.
Web of Science ®Google Scholar
Zowe, J., Kurcyusz, S. (1979). Regularity and stability for the mathematical programming problem in Banach spaces. Appl Math Optim. 5(1):49–62. DOI: 10.1007/BF01442543.
Web of Science ®Google Scholar

Appendix A

We provide here the proof of Proposition 2.2:Proof of Proposition 2.2 . The proof follows [Citation12, Proposition 3.3]. Denote by

C_{α}, K_{α}

the following convex sets

(A.1)

C_{α} = {{div}^{2} ϕ : ϕ \in C_{c}^{\infty} (Ω, S^{d \times d}), | ϕ (x) | \leq α_{0} (x), | div ϕ (x) | \leq α_{1} (x), for all x \in Ω},

(A.1)

(A.2)

K_{α} = {{div}^{2} p : p \in H_{0} ({div}^{2}; Ω), | p (x) | \leq α_{0} (x), | div p (x) | \leq α_{1} (x), fora . e . x \in Ω} .

(A.2)

It suffices to show that (A.3) ${\bar{C_{α}}}^{L^{2} (Ω)} = K_{α} .$ (A.3)

We first show that $K_{α}$ is closed in $L^{2} (Ω) .$ Let $g \in L^{2} (Ω)$ and assume that there exists ${(p_{n})}_{n \in N} \subset H_{0} ({div}^{2}; Ω)$ where every p_n satisfies the convex constraints and ${div}^{2} p_{n} \to g$ in $L^{2} (Ω) .$ By boundedness of $α_{0}, α_{1}$ we have that there exist $h_{0} \in L^{2} (Ω, S^{d \times d}),$ $h_{1} \in L^{2} (Ω, R^{d})$ and a subsequence of ${(p_{n_{k}})}_{k \in N}$ such that $p_{n_{k}} ⇀ h_{0} and div p_{n_{k}} ⇀ h_{1},$ in $L^{2} (Ω)$ and $L^{2} (Ω, R^{d}),$ respectively. Using that, we have for every $ϕ \in C_{c}^{\infty} (Ω, R^{d})$ (A.4) $\int_{Ω} \nabla ϕ \cdot h_{0} d x = \lim_{k \to \infty} \int_{Ω} \nabla ϕ \cdot p_{n_{k}} d x = - \lim_{k \to \infty} \int_{Ω} ϕ \cdot div p_{n_{k}} d x = - \int_{Ω} ϕ \cdot h_{1} d x,$ (A.4) thus $h_{1} = div h_{0} .$ Similarly we derive that $g = div h_{1} = {div}^{2} h_{0}$ and hence $h_{0} \in H ({div}^{2}; Ω) .$ Finally note that the set ${(h, div h, {div}^{2} h) : h \in H_{0} ({div}^{2}; Ω), | h (x) | \leq α_{0} (x), | div h (x) | \leq α_{1} (x), fora . e . x \in Ω},$ is a norm-closed and convex subset of $L^{2} (Ω, (S^{d \times d} \times R^{d} \times R))$ and hence weakly closed. Since ${(p_{n_{k}}, div p_{n_{k}}, {div}^{2} p_{n_{k}})}_{k \in N}$ belongs to that set, converging weakly to $(h_{0}, div h_{0}, {div}^{2} h_{0})$ we get that the latter also belongs there. Thus, $K_{α}$ is closed in $L^{2} (Ω)$ and since $C_{α} \subset K_{α},$ we get ${\bar{C_{α}}}^{L^{2} (Ω)} \subset K_{α} .$

It remains to show the other direction, i.e., $K_{α} \subset {\bar{C_{α}}}^{L^{2} (Ω)} .$ Toward that, note first that the functional ${TGV}_{α}^{2} (Ω) : L^{2} (Ω) \to \bar{R},$ can also be written as ${TGV}_{α}^{2} (u) = I_{C_{α}}^{*} (u) .$

Using the convexity of $C_{α}$ one gets ${TGV}_{α}^{2^{*}} (v) = I_{C_{α}}^{* *} (v) = I_{{\bar{C_{α}}}^{L^{2} (Ω)}} (v) .$

Secondly, note that due to the lower bounds on $α_{0}, α_{1},$ for $u \in L^{2} (Ω),$ we have that ${TGV}_{α}^{2} (u) < \infty$ if and only if $u \in BV (Ω) .$ Indeed this holds from the equivalence of the (scalar) $‖ \cdot ‖_{BGV}$ with $‖ \cdot ‖_{BV (Ω)}$ and from the estimate ${TGV}_{\underline{α}, \underline{α}}^{2} (u) \leq {TGV}_{α}^{2} (u) \leq ‖ α_{1} ‖_{\infty} TV (u),$ for every $u \in L^{2} (Ω) .$ This means that if for ${div}^{2} p \in K_{α}$ it holds (A.5) $\int_{Ω} u {div}^{2} p d x \leq {TGV}_{α}^{2} (u), for all u \in BV (Ω) \cap L^{2} (Ω),$ (A.5) then in fact the inequality (A.5) will hold for every $u \in L^{2} (Ω)$ and thus ${TGV}_{α}^{2^{*}} ({div}^{2} p) = 0$ which implies ${div}^{2} p \in {\bar{C_{α}}}^{L^{2} (Ω)} .$ Thus in order to finish the proof it suffices to show (A.5) for every ${div}^{2} p \in K_{α} .$ In view of Proposition 2.1 it suffices to show (A.6) $\int_{Ω} u {div}^{2} p d x \leq \min_{w \in BD (Ω)} \int_{Ω} α_{1} d | D u - w | + \int_{Ω} α_{0} d | E w |$ (A.6) for all $u \in BV (Ω) \cap L^{2} (Ω) .$ The first step toward that is to show that for every $w \in BD (Ω)$ and for every $p \in H_{0} ({div}^{2}; Ω)$ with $| p (x) | \leq α_{0} (x)$ and $| div p (x) | \leq α_{1} (x)$ for a.e. $x \in Ω,$ it holds (A.7) $\int_{Ω} w \cdot div p d x \leq \int_{Ω} α_{0} d | E w | .$ (A.7)

Indeed, note first that from (2.3), we get for every $ϕ \in C^{\infty} (\bar{Ω}, R^{d})$ (A.8) $| \int_{Ω} ϕ \cdot div p d x | = | \int_{Ω} p \cdot E ϕ d x | \leq \int_{Ω} | p | | E ϕ | d x \leq \int_{Ω} α_{0} d | E ϕ | .$ (A.8)

Recall now that every $w \in BD (Ω)$ can be strictly approximated by a sequence ${(ϕ_{n})}_{n \in N} \subset C^{\infty} (\bar{Ω}, R^{d}),$ that is $ϕ_{n} \to w$ in $L^{1} (Ω, R^{d})$ and $| E ϕ | (Ω) \to | E w | (Ω),$ see [Citation12, Proposition 2.10]. Furthermore, using that, along with Reshetnyak’s continuity theorem [Citation33, Theorem 2.39] we also get that $\int_{Ω} α_{0} d | E ϕ_{n} | \to \int_{Ω} α_{0} d | E w |, as n \to \infty .$

Using the above and the fact that $div p \in L^{\infty} (Ω, R^{d}),$ by taking limits in (A.8) we obtain (A.7). Finally in order to obtain (A.5) let again $p \in H_{0} ({div}^{2}; Ω)$ with $| p (x) | \leq α_{0} (x)$ and $| div p (x) | \leq α_{1} (x)$ for a.e. $x \in Ω,$ and let $ϕ \in C^{\infty} (\bar{Ω}, R) .$ Then by using (2.4) and (A.7), we have for every $w \in BD (Ω)$ $\begin{matrix} \int_{Ω} ϕ {div}^{2} p d x \leq | \int_{Ω} \nabla ϕ \cdot div p d x | = | \int_{Ω} (\nabla ϕ - w) \cdot div p d x + \int_{Ω} w div p d x | \\ \leq \int_{Ω} | \nabla ϕ - w | | div p | d x + \int_{Ω} α_{0} d | E w | \leq \int_{Ω} α_{1} | \nabla ϕ - w | d x + \int_{Ω} α_{0} d | E w | . \end{matrix}$

Similarly as before given $u \in BV (Ω) \cap L^{2} (Ω)$ and $w \in BD (Ω),$ there exists a sequence ${(ϕ_{n})}_{n \in N} \subset C^{\infty} (\bar{Ω}, R)$ such that $ϕ_{n} \to u$ in $L^{2} (Ω)$ and $| \nabla ϕ_{n} - w | (Ω) \to | D u - w | (Ω) .$ This follows from a modification of the proof of [Citation12, Proposition 2.10] where one takes advantage of the L² integrability of u. Using again the Reshetnyak’s continuity theorem and taking limits we get that for every $w \in BD (Ω)$ $\int_{Ω} u {div}^{2} p d x \leq \int_{Ω} α_{1} d | D u - w | + \int_{Ω} α_{0} d | E w | .$

By taking the minimum over $w \in BD (Ω),$ we obtain (A.5). □

Dualization and Automatic Distributed Parameter Selection of Total Generalized Variation via Bilevel Optimization

Abstract

1. Introduction

2. The structure of the paper

3. The dual form of the weighted TGV functional

3.1. Total generalized variation

3.2. The space $W_{0}^{q} ({div}^{2}; Ω)$

3.3. Weighted TGV

3.4. The predual weighted TGV problem

4. A Series of regularized problems

4.1. Regularization of the primal problem

5. The bilevel optimization scheme

6. Numerical implementation

6.1. Newton solver for the lower level problem

6.2. The numerical algorithm for ( $P$ _TGV)

6.3. Numerical examples in denoising

Table 1. PSNR and SSIM comparisons for the images of .

7. Conclusion

References

Appendix A

Information for

Open access

Opportunities

Help and information

Dualization and Automatic Distributed Parameter Selection of Total Generalized Variation via Bilevel Optimization

Abstract

1. Introduction

2. The structure of the paper

3. The dual form of the weighted TGV functional

3.1. Total generalized variation

3.2. The space W0q(div2;Ω)

3.3. Weighted TGV

3.4. The predual weighted TGV problem

4. A Series of regularized problems

4.1. Regularization of the primal problem

5. The bilevel optimization scheme

6. Numerical implementation

6.1. Newton solver for the lower level problem

6.2. The numerical algorithm for (PTGV)

6.3. Numerical examples in denoising

Table 1. PSNR and SSIM comparisons for the images of Figure 5.

7. Conclusion

Additional information

Funding

References

Appendix A

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date

3.2. The space $W_{0}^{q} ({div}^{2}; Ω)$

6.2. The numerical algorithm for ( $P$ _TGV)

Table 1. PSNR and SSIM comparisons for the images of .