ABSTRACT
In this article, we study the balancing principle for Tikhonov regularization in Hilbert scales for deterministic and statistical nonlinear inverse problems. While the rates of convergence in the deterministic setting are order optimal, they prove to be order optimal only up to a logarithmic term in the stochastic framework. The two-step approach allows us to consider a data-driven algorithm in a general error model for which an exponential behaviour of the tail of the estimator chosen in the first step is valid. Finally, we compute the overall rate of convergence for a Hammerstein operator equation and for a parameter identification problem. Moreover, we illustrate these rates for the latter application after studying some large sample properties of the local polynomial estimator in a general stochastic framework.
1. Introduction
Solving inverse problems in a deterministic setting involves reconstructing a mathematical object in an ill-posed operator equation (see e.g. [Citation1]). The original article [Citation2] considers the stochastic extension of inverse problems to (Gaussian) random fields and formalizes it as a statistical estimation question by discretization with the help of sample functions. Loosely speaking, statistical inverse problems are inverse problems with random noise. Moreover, classical statistical problems like convolution or error-in-variables models can be rewritten as linear statistical inverse problems [Citation3].
In this paper, we consider the problem of estimating an unknown, not directly observable quantity from the noisy values of an element
related to
by the operator equation
(1)
where is a nonlinear, injective operator between two Hilbert spaces
and
. We assume that the equation is ill-posed, in the sense that
is not continuous. First, we have at our disposal the data Y perturbed by deterministic noise such that
(2)
where ,
is the normalized deterministic error and
is the deterministic noise level. The results obtained in the framework (2) are further utilized in the stochastic process setting that is generalized here to the abstract noise model
(3)
where is a normalized Hilbert space process,
is the corresponding stochastic noise level, and the relation (3) is understood in the weak sense. Our general noise model is inspired by [Citation4] and its relation to several commonly used statistical models was investigated in [Citation3]. A classical example is the inverse nonparametric regression problem, where the data are indirectly related to the functional mean by an operator equation, in contrast to direct regression, where we observe noisy values of this function.
A first method for solving nonlinear inverse problems in the statistical setting, inspired by Tikhonov regularization and named Method of Regularization, was studied by O’Sullivan, Lukas and Wahba (see [Citation5–Citation7]). The straightforward generalization of this method to the infinite-dimensional noise model is
where is an initial guess for
. However, in the important case when
is white noise
is infinite with probability 1 for all
, so
is not well-defined.
Since our aim is to focus on data-driven methods, we would like to include statistical models that go beyond the additive setting and for which the direct statistical problem (like the direct nonparametric regression problem) has already been studied and is better understood. Hence, we choose a two-step estimator as suggested and studied in [Citation8], which gives us the flexibility to embed progress made in the direct stochastic setting into statistical nonlinear inverse problems. In the deterministic case, a pre-smoothing step allows one to overcome the oversmoothing effect of the Tikhonov and Landweber regularization methods (see [Citation9]). Data estimation in this framework can be achieved by Tikhonov regularization, regularization of the embedding operator or by wavelet shrinkage (see [Citation10]). In the first step, a possibly biased estimator of
is determined from the data Y such that the error estimate
(4)
holds true and the rate of convergence that depends on the stochastic noise
is known. This is a nonparametric regression problem which has been studied intensively in the statistical literature if
is a Gaussian white noise process and the smoothness conditions are ellipsoids in the Sobolev spaces
on the domain (0, 1) (see [Citation11] and references therein). In this case, the optimal rates of convergence (4) for adaptive and non-adaptive estimators are
. Moreover, this rate will also hold for non-Gaussian white noise if the error probability distribution is regular in the classical sense, since in this case the distribution-free nonparametric regression problem is asymptotically equivalent to a homoskedastic Gaussian shift white noise model (see [Citation12,Citation13]).
will play the role of the noise level in the next step. In the second step, the estimator
is constructed as the solution of the Tikhonov minimization problem
(5)
Therefore, this method allows the discretization of the problem (3) for a large variety of designs, and the unknown noise level is de facto estimated in the first step. On the other hand, this step can make the computation cumbersome, and it is superfluous for designs for which methods and convergence analysis are already available (as for Poisson noise in [Citation14]). Moreover, the rates of convergence depend on the nature of the noise through the tail behaviour of the stochastic method chosen in the first step, making it necessary to specify conditions on the error probability distribution in order to concretize this result for different settings. It follows from [Citation8] that the method is consistent in the sense that for any choice of minimizing elements
we have
(6)
if and
. Moreover, in [Citation15] rates of convergence for a large set of smoothness classes expressed with the help of Hilbert scales were computed and the optimality of this method was proved for linear operators. Regularization in Hilbert scales was studied for the first time by Natterer [Citation16] for linear and by Neubauer [Citation17] for nonlinear deterministic inverse problems. Linear statistical inverse problems have been studied in a Hilbert scale framework in [Citation4,Citation18–Citation22]. In [Citation15] rates of convergence were obtained for nonlinear statistical inverse problems in Hilbert scales for an a priori choice of the regularization parameter
. An a priori choice of the regularization parameter yields order-optimal rates of convergence but requires knowledge of the noise levels
and
, and of the smoothness of the solution q. We focus in this paper on an adaptive tuning parameter rule, the balancing principle, which was proposed by Lepskii for the choice of the regularization parameter for a regression problem in [Citation23]. His principle of bounding the best possible accuracy by a function which is non-increasing with respect to
-values was used by Mathé and Pereverzev to obtain order-optimal estimators in variable Hilbert scales (see [Citation24]). Afterwards, this principle was applied by various authors to deterministic linear and nonlinear inverse problems [Citation24–Citation26].
Our aim is to study rates of convergence of the two-step method with balancing principle choice of the regularization parameter in a general noise model for a large class of smoothnesses expressed with the help of Hilbert scales. This extends the method in [Citation15] to a fully data-driven algorithm in both steps for a general class of additive noise models, including some non-Gaussian white noise processes, as specified before. To this aim, we introduce in the following section Hilbert scales and their corresponding norms
, and prove rates of convergence of the error for the Lepskii estimator in nonlinear deterministic inverse problems. In addition to the common assumptions on the operator F as in [Citation15], a smallness condition on the noise level
is necessary, as is usually the case for Lepskii-type tuning methods. The deterministic rates of convergence allow us to obtain rates of convergence of
with respect to
that are order optimal up to a log-factor for nonlinear statistical inverse problems under a supplementary condition on the tail behaviour of the estimator
in Section 2.2. The overall rate of convergence for the two-step method depends on the nature of the noise through the asymptotic behaviour of the tail and the mean square error of the estimator chosen in the first step. We will illustrate the overall capacity of the two-step approach to reach order optimal (up to a logarithmic term) rates of convergence in Section 3 of our article, which contains the numerical simulation for the reconstruction of a reaction coefficient from Gaussian noisy data in the setting of an inverse nonparametric regression. The asymptotic equivalence between this discrete model and that of the Gaussian white noise (3) is discussed for example in [Citation11], as well as the relation between the variance in the continuous setting and the sample size n:
. We also present new results regarding this behaviour for the local polynomial estimator in the case of errors with Gaussian or log-concave probability distribution, which are useful for illustrating the numerical performance of our algorithm in the last section. Due to the great flexibility of the two-step method, methodological progress in the study of the local polynomial estimator in a large class of statistical models, such as the binary response model and inhomogeneous exponential and Poisson models, by adaptive weights smoothing could provide the necessary tools to deal with non-Gaussian settings (see [Citation27]). The study of the overall optimal rates of convergence for these cases is the subject of further research.
Two different applications are considered in this paper: a statistical inverse problem defined by the Hammerstein operator, for which we also define the corresponding Hilbert scale, and the problem of the reconstruction of a reaction coefficient already studied in [Citation15]. We compute the overall rates of convergence for these two examples, which turn out to be optimal up to a log-factor, and graphically compare the numerical and theoretical rates of convergence for the second application in the last section.
2. Balancing principle for nonlinear inverse problems in Hilbert scales
The overall rate of convergence of the two-step method for the balancing choice of the regularization parameter depends on the stochastic model and can be computed by considering first the deterministic setting and afterwards extending it to the statistical framework. In the following, we focus on the rates of convergence for the Lepskii choice of the regularization parameter for nonlinear inverse problems in Hilbert scales, separately in a deterministic and in a stochastic framework. The order optimal rates of convergence proved in the deterministic setting and the properties of the nonparametric estimator in the pre-smoothing step lead to rates of convergence that are optimal up to a logarithmic term in the Gaussian stochastic case.
2.1. Rates of convergence for balancing principle in deterministic setting
We consider in this section a deterministic framework, meaning that we assume the noise model (2). To characterize the smoothness of the solution
, a Hilbert scale is defined in the following with the help of a densely defined, unbounded, self-adjoint, strictly-positive operator L in the Hilbert space
. We consider the set
of the elements x for which all powers of L are defined
is dense in
and, by the spectral theorem,
is well-defined for all
. For
and
, let
Then the Hilbert space is defined as the completion of
with respect to the norm
and
is called the Hilbert scale induced by L. A comprehensive introduction to Hilbert scales can be found in [Citation1,Citation28]. In our proofs we are going to use repeatedly the interpolation inequality
(7)
which holds for any . We define our estimator
as
(8)
where and
is an initial guess. We obtain rates of convergence for the deterministic error
if the following assumptions on the operator F hold:
Assumptions 2.1:
(A) (assumptions on F) If
(B) (smoothing properties of
(C) (Lipschitz continuity of
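For illustration, the Tikhonov step (8) can be sketched numerically. The following minimal sketch treats a linearized forward map as a matrix and realizes the Hilbert-scale penalty by a diagonal stand-in for the operator L raised to the scale index; the forward operator, the scale operator and all parameter values below are illustrative assumptions, not the concrete setting of the paper.

```python
import numpy as np

# Sketch of the Tikhonov step (8) for a LINEAR forward map F:
#   minimize ||F x - y||^2 + alpha * ||L_s (x - x0)||^2,
# where L_s is a diagonal stand-in for the Hilbert-scale operator.
# F, L_s, alpha and the noise level are illustrative choices.

def tikhonov_hilbert_scale(F, y, L_s, alpha, x0):
    """Solve the normal equations of the penalized least-squares problem."""
    P = L_s.T @ L_s
    A = F.T @ F + alpha * P
    b = F.T @ y + alpha * P @ x0
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
n = 50
F = np.tril(np.ones((n, n))) / n            # smoothing (integration-like) map
x_true = np.sin(np.linspace(0, np.pi, n))   # exact solution
y = F @ x_true + 1e-3 * rng.standard_normal(n)  # noisy data, delta ~ 1e-3

L_s = np.diag(np.arange(1, n + 1, dtype=float))  # diagonal "L^s"
x_hat = tikhonov_hilbert_scale(F, y, L_s, alpha=1e-6, x0=np.zeros(n))
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

The penalty damps the high-index (rough) components most strongly, which is exactly the role of the Hilbert-scale norm in (8).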
We define the estimator corresponding to the exact data
by
(11)
We treat the noise free error and the data noise error
separately, as is common for these adaptive methods (see e.g. [Citation29]).
Theorem 2.2:
If the Assumptions 2.1 hold true and if fulfills
(12)
then the estimation
holds true, where and K is a constant depending on
.
For better readability of this paper, the proof can be found in Appendix 1. The interval for the smoothness degree is due to the technical condition (7) and can be interpreted as a saturation limit (see e.g. [Citation17]). The exponent of
in the bound on the noise free error is similar to those computed for linear inverse problems, but the constant depends on the
norm of
. It was not the aim of this research to overcome this restriction, but it might be an interesting topic for future study.
Theorem 2.3:
Under the assumptions of Theorem 2.2 we have that
(13)
where and
are constants depending on
, respectively
.
The proof of this theorem can also be found in Appendix 1. It is based, together with the proof of the previous theorem, on the computation of bounds for the noise-free and data noise terms in and
norms, and on the interpolation inequality (7). The classical results for linear inverse problems yield a decreasing function in
as upper-bound for the data noise error term. Due to the non-linearity, the product
appears in the upper bound and because of the adaptivity, we can control it only through expressions containing the term
. The constants can be explicitly computed, i.e.
(14)
but for the sake of simplicity we refrain from calculating them formally in the proof. Finally, we get a sum of an increasing and a decreasing function in the tuning parameter, and we define an adaptive choice of the regularization parameter based on this. To ensure consistency of the method, a smallness condition on and c needs to be fulfilled, as we will see below. First, we present a direct consequence of the previous results.
Corollary 2.4:
Under the assumptions and notations of Theorems 2.2 and 2.3, it holds
(15)
where and
are the constants from Theorem 2.3.
The choice of the regularization parameter gives the order optimal rate of convergence
with which depends on
, p, s, q,
,
and c.
Proof The first inequality follows immediately from Theorems 2.2 and 2.3 and allows us to estimate the tuning parameter , by balancing between the upper bounds of the approximation and noise-free errors. For
we get
Figure 1. Upper bounds for the nonlinear data noise error (dotted line), Lepskii rule Φ (dashed line), linear data noise error (full line) for error level δ2=0.5 (left, diamond) and 0.01 (right, diamond), and the corresponding minimum points for Ψ (circle).
We discuss now an adaptive choice of the regularization parameter based on the balancing principle and on the previous bounds computed in Theorems 2.2 and 2.3. We follow here the approach from [Citation24,Citation25,Citation30]. Hence, we estimate the value of the regularization parameter that approximates the minimizer of the total error
by the minimizer of the bounding sum resulting from the aforesaid theorems. This optimum is computed from a finite set of possible parameters
where ,
is usually equal to
, and we denote
. We define the non-increasing function
,
and the function ,
with the constants and
as in Theorem 2.3. It can be easily checked that
is non-decreasing on the set M if
since the minimum point of the real-valued expression
is smaller than
in our setting.
If the inequality
(16)
holds, then we can formulate a rule to choose the tuning parameter and define an order-optimal estimator independent of the smoothness q. Condition (16) assures the existence of at least one intersection point between the curves defined by
and
and is similar to the one usually used in the framework of linear inverse problems, as Figure 1 also illustrates. It implies as well that our results will be valid only for
distances c and noise levels small enough such that the right-hand side of (16) bounds an algebraic expression involving these two values. An explicit example of this relationship is given in Section 4.2.
Therefore, we determine the regularization parameter in the set M for which the following inequality holds
and . The computation of
and
is infeasible, since q is unknown; therefore we approximate them by
where ,
and
(see [Citation29] for details). Moreover, a simplified version of the Lepskii criterion was also presented in the aforementioned article, which has its origins in the quasi-optimality choice of the regularization parameter proposed by Tikhonov and Glasko. In this case, we choose the estimator
corresponding to the regularization parameter
with the index
If condition (16) holds, all three sets defined above are non-empty and, for
, not equal to M.
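The selection rule above can be sketched in a few lines. The noise-propagation bound psi, the comparison constant kappa = 4 and the synthetic family of estimators below are illustrative placeholders for the quantities of Theorems 2.2 and 2.3, not the paper's exact constants.

```python
import numpy as np

# Schematic balancing-principle (Lepskii) selection over a finite geometric
# grid M of regularization parameters: take the largest alpha_j whose
# estimator stays kappa*psi(alpha_i)-close to all estimators with smaller
# parameters alpha_i. All concrete choices below are illustrative.

def lepskii_select(alphas, estimators, psi, kappa=4.0):
    """Return the index of the selected parameter (alphas sorted increasingly)."""
    j_star = 0
    for j in range(len(alphas)):
        if all(np.linalg.norm(estimators[j] - estimators[i]) <= kappa * psi(alphas[i])
               for i in range(j)):
            j_star = j
        else:
            break
    return j_star

delta = 1e-2                          # deterministic noise level
alphas = 1.5 ** np.arange(-20, 5)     # geometric grid, as in the text
psi = lambda a: delta / np.sqrt(a)    # decreasing data-noise bound (assumed)

# toy estimator family: approximation error (grows with alpha) in one
# direction, propagated noise (decays with alpha) in an orthogonal one
bias_dir, noise_dir = np.array([1.0, 0.0]), np.array([0.0, 1.0])
estimators = [np.sqrt(a) * bias_dir + psi(a) * noise_dir for a in alphas]

j_star = lepskii_select(alphas, estimators, psi)
print("selected alpha:", alphas[j_star])
```

The selected parameter sits near the crossing of the two error contributions, which is the balancing behaviour the rule is designed to mimic without knowing the smoothness q.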
Theorem 2.5:
If the assumptions of Theorem 2.2 and the inequality (16) hold true, we have that
and we get order optimal rates of convergence both for and
i.e.
for constants and
which depend on
and c.
Proof The inequalities in Theorem 2.5 follow from the order relation between the sets involved in the definition of the three tuning rules. fulfills
for any since
, and it is obvious that
for (see [Citation29]). Since the proof of the second part of the theorem is similar for both rates of convergence, we just show the second result in the following paragraph. From the previous inequality
it holds that
and we can write
Moreover, the definition of allows us to obtain order optimal rates of convergence (see e.g. [Citation15,Citation16] and references therein for a discussion of optimal rates for Tikhonov regularization in Hilbert scales).
Remark 1:
The last condition (16) implies that our results will be valid only for a set of noise levels depending on the
norm c. Generally, the feasibility of the estimator depends on the initial guess, and this problem can be solved for example by considering different a-priori values and applying the algorithm for each of them.
2.2. Rates of convergence for balancing principle in stochastic setting
In this section we consider the stochastic noise model (3) and apply the two-step method to this statistical inverse problem. Hence, we first choose an estimator
as data and its known rate of convergence (4) as noise level. We define the estimators
as
(17)
where and
is an initial guess. In the following, the balancing principle is going to be applied in a similar way as in the previous section and rates of convergence are going to be proven. These results are obtained under the supplementary condition for the tail of the probability distribution of the estimator
(18)
for any greater than 1 such that
, where
,
and
are positive universal constants and
. We choose the set of regularization parameters as
with and the estimator
with
The rates of convergence for this estimator can be computed using similar techniques as in [Citation31,Citation32].
Theorem 2.6:
If the assumptions of Theorem 2.2 and the conditions (16) and (18) are fulfilled then, for all
, we have the following error bound for
where is a constant which depends on
,
,
and
.
Proof For fulfilling the conditions of Theorem 2.6 we denote
(19)
as the set of extreme values of the error of the estimator
. Due to the bias-variance decomposition
we can rewrite the inequality (19) as
From the Cauchy-Schwarz inequality, the set inclusion holds
leading to
From (18) we get
(20)
for all such that
. With the help of conditional expectation we can write
where is the complementary set of
in
. We apply the deterministic convergence result for Lepskii choice of the regularization parameter (see Theorem 2.5) on the set
for
and use the exponential inequality (20) to bound
for noise levels
. This yields
Remark 2:
As in the deterministic setting, we can choose a simplified version of the choice of the regularization parameter that reduces the computational cost. We select the estimator with
In the same way as in Theorem 2.6 we can prove that the rate of convergence is equal to
(21)
3. Applications
3.1. Hammerstein operator
In this section, we compute the overall rate of convergence of the two-step method for an inverse problem originating in a nonlinear integral equation and inspired by [Citation33]. We consider here the Hammerstein operator
(22)
where is a function in the space of Hölder continuous functions
with bounded second derivative
(23)
for all , and
denotes the Sobolev space of index 1 of functions on the interval [0, 1]. Our inverse problem is to determine a from the knowledge of F(a). Conditions similar to those in Assumptions 2.1 were used in [Citation34] to study the Landweber iterative method in Hilbert scales. The following properties of the operator F and its Fréchet derivative will be useful for choosing an appropriate Hilbert scale. The detailed proofs of the following theoretical results can be found in [Citation35].
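Since the display (22) is not reproduced here, a discretization can only be sketched under an assumed form: a common Hammerstein-type example in the cited literature is F(a)(t) = ∫₀ᵗ φ(a(s)) ds with a smooth nonlinearity φ. Both this form and the choice φ(u) = u³ below are illustrative assumptions.

```python
import numpy as np

# Discretization of a Hammerstein-type operator under the ASSUMED form
#   F(a)(t) = int_0^t phi(a(s)) ds,
# evaluated by a right Riemann sum on a uniform grid. The nonlinearity
# phi(u) = u**3 and the parameter a below are illustrative choices.

def hammerstein(a_vals, phi, grid):
    """Cumulative integral of phi(a) along the grid (right Riemann sum)."""
    da = np.diff(grid, prepend=grid[0])
    return np.cumsum(phi(a_vals) * da)

grid = np.linspace(0, 1, 201)
a = np.sin(np.pi * grid)                 # a candidate parameter in H^1(0, 1)
Fa = hammerstein(a, lambda u: u ** 3, grid)
print(Fa[-1])                            # close to int_0^1 sin(pi s)^3 ds = 4/(3 pi)
```

The cumulative-integral structure makes the smoothing (and hence ill-posed) character of the forward map visible: F(a) is one degree smoother than φ(a).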
Lemma 3.1:
F is a continuous, compact, weakly closed and Fréchet differentiable operator with the Fréchet derivative
(24)
The adjoint of the Fréchet derivative is given by
(25)
Here is defined by
.
We assume from now on that
(26)
In this case, the operator F is injective and we determine the set in order to choose a Hilbert scale which fulfills Assumptions 2.1. Let the linear integral operator
be defined as
.
is the composition of three operators:
, the multiplication operator
and
. The operator
has the range
and
as is in
. Note that due to (26) there exists
such that
for all since
. Since the restriction
is an isomorphism by elliptic regularity results and since is equivalent to
we finally get
As the operators and
are injective, we have that
is bijective between
and
and
for all
.
We would like to have a Hilbert scale with the following properties: it holds
(27a)
(or higher regularity of
) to be able to show Assumptions 2.1, and we need
to be a member of the Hilbert scale, e.g.
(27b)
Unfortunately, the restriction
is not self-adjoint, so we consider the operator defined below as the generator of the Hilbert scale.
Proposition 3.2:
The operator
is self-adjoint in , and
(27c)
It follows that the Hilbert scale generated by the operator
has the desired properties (27a), (27b) and (27c).
Theorem 3.3:
For the Hilbert scale defined before and under the supplementary conditions (23) and (26), Assumption 2.1(B) is fulfilled with
. Moreover, Assumption 2.1(C) is also satisfied under the same conditions if the diameter of
in
-norm is smaller than a
.
Proof Assumption 2.1(C) is equivalent to
(28)
for all for some
such that
. It follows by (25) that
The last inequality holds because
Hence, the diameter should be smaller than
.
Proposition 3.4:
If and
, where
is the largest integer smaller than q, then
.
Proof Since the composition between and
belongs to
(see [Citation36]), it follows immediately that
, for
.
Now, the overall rate of convergence can be computed.
Corollary 3.5:
In the case of the stochastic noise model (3), assuming
white noise with Gaussian or regular non-Gaussian probability distribution, if the conditions of Theorem 3.3 hold true and if
,
,
, and
, then
can be estimated such that the rate of convergence (4) holds true with
. If the white noise has a Gaussian distribution and the condition (16) holds, then we get the rates of convergence
while for regular concave log-probability distributions we obtain
Proof The first statement follows from Proposition 3.4, from asymptotic equivalence in Le Cam’s sense between nonparametric regression with regular error distribution i.e. presenting a locally asymptotic stochastic expansion (for exact definition of the regularity conditions see [Citation12]) and the white noise with drift, and from the theory of nonparametric regression (see [Citation11]). The smoothness of is Sobolev of order
, and rates of convergence on Sobolev balls were computed for nonparametric regression problems in the last reference; they correspond to the statement of Corollary 3.5. The second statement is a consequence of the rates of convergence computed in [Citation37], Theorem 2.6 and Lemma 4.3.
3.2. Reconstruction of a reaction coefficient
To illustrate the convergence behaviour of our method, we consider in this section a parameter identification problem as a nonlinear operator equation and compute overall rates of convergence for the Lepskii estimator based on theoretical properties already proved in [Citation15]. Furthermore, we compare its numerical and theoretical rates of convergence for different smoothness conditions with the help of Monte Carlo simulations.
3.2.1. Theoretical rates of convergence
The direct problem is modeled by the differential equation
where f and g are infinitely smooth, and the domain is chosen to be (0, 1) for simplicity. For a given bound
we introduce the set
(29)
and we formulate the inverse problem of identifying the parameter knowing
as an operator equation with the help of the parameter-to-solution operator
(30)
The Hilbert scale will be generated by the square root
of the positive, self-adjoint operator B defined by
We notice that the elements of this Hilbert scale with integer index are subsets of Sobolev spaces with Dirichlet boundary conditions on the even derivatives
In [Citation15] it was proven that the operator F together with this choice of Hilbert scale fulfills the Assumptions 2.1 for and
when the condition
(31)
and the smallness condition on D(F) hold, and that the regularity of the parameter with
determines the smoothness of the exact data
. Hence, we can obtain an overall convergence rate result in this case as well.
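For the numerical experiments it is convenient to have the direct solver at hand. Since the displayed differential equation is not reproduced here, we assume the standard reaction-diffusion form -u''(x) + a(x)u(x) = f(x) on (0,1) with homogeneous Dirichlet boundary values; this form, as well as the choices of f and a below, is an illustrative assumption.

```python
import numpy as np

# Finite-difference solver for the ASSUMED direct problem
#   -u''(x) + a(x) u(x) = f(x) on (0,1),  u(0) = u(1) = 0,
# verified against a manufactured solution. The coefficient a, the
# right-hand side f and the boundary data are illustrative choices.

def solve_direct(a_vals, f_vals, n):
    """Solve the two-point boundary value problem on n interior grid points."""
    h = 1.0 / (n + 1)
    main = 2.0 / h ** 2 + a_vals                 # tridiagonal discretization
    off = -np.ones(n - 1) / h ** 2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, f_vals)

n = 199
x = np.linspace(0, 1, n + 2)[1:-1]               # interior grid points
a = 1.0 + x ** 2                                 # reaction coefficient a(x) >= 1
u_true = np.sin(np.pi * x)                       # manufactured solution
f = (np.pi ** 2 + a) * u_true                    # f chosen so u_true solves the PDE
u = solve_direct(a, f, n)
print(np.max(np.abs(u - u_true)))                # O(h^2) discretization error
```

Applying this solver for a candidate a gives a discretized parameter-to-solution map on which the two-step reconstruction can be run.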
Corollary 3.6:
In the case of the stochastic noise model (3), assuming
white noise with Gaussian or regular non-Gaussian probability distribution, for the Hilbert scale defined above
and under the supplementary condition (32) and if
,
,
, and
, then
can be estimated such that the rate of convergence (4) holds true with
. If the white noise has a Gaussian distribution and the condition (16) holds, then we get the rates of convergence
while for regular concave log-probability distributions we obtain
4. Numerical simulations
In this final section we illustrate the employment of the balancing principle and the influence of the smoothness of the parameter on the rate of convergence of
in the two-step method for the reconstruction of a reaction coefficient by numerical simulations. First, we choose the local polynomial estimator as the pre-smoothing method and review some of its properties in the framework of a discretized version of the problem (3). A new result about the tail behaviour (18) of the local polynomial estimator in a general, non-Gaussian noise model (see [Citation38] for a review of the literature on this subject) is also proved. Moreover, the values of
and
for the parameter-to-solution operator (31), and the empirical overall rate of convergence for the two-step method are computed, and the rates are illustrated in our graphics for different smoothnesses of
.
Figure 2. Exact parameter a† (full line - q=2.5, dashed line - q=3.5, dotted line - q=4.5), and its corresponding empirical and theoretical (dashed-dotted) rate of convergence on a log-log scale.
4.1. Tail behaviour of local polynomial estimator
In the first step, we consider the nonparametric regression problem of estimating from the data
such that
(32)
where are independent, centred random variables with
and
is a deterministic design. In the case of Gaussian nonparametric regression, when
are independent, identically distributed N(0, 1) random variables, this is the discretized version of the Gaussian white noise model in (3) with
the standard Wiener process on [0,1] and the stochastic noise level
(see e.g. [Citation11]).
As usual in this setting, we assume that belongs to the Hölder class
on
i.e. it is a
times differentiable function whose derivative
satisfies
where is the integer part of
. For the case that the co-domain
of the operator F is the Sobolev space
, the compact embedding of the Hilbert space
into the Hölder space
follows from the Morrey’s inequality (as in [Citation39]).
The regression function can be locally approximated by
for small enough. The vector
can be estimated by means of the local polynomial estimator of order
.
Definition 4.1:
The local polynomial estimator of order d of is defined by
(33)
where ,
is a kernel i.e. an integrable function satisfying
,
is the bandwidth and
is an integer. The local polynomial estimator of order d of
is the first element of the vector
and is denoted by
.
The existence and uniqueness of the solution of the optimization problem (34) were proven in [Citation11] under the following hypotheses, which we will assume to hold from now on.
Assumptions 4.2:
(A) There exists an
(B) The frequency of the points
(C) K is bounded by a constant
Consequently, the local polynomial estimator of order d of as well as of
exist and are given by
(34)
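In matrix form, the kernel-weighted least-squares problem behind Definition 4.1 can be implemented directly: at each point x0 one fits a degree-d polynomial in the shifted variable and keeps its constant coefficient. The Epanechnikov-type kernel, bandwidth, design and test function below are illustrative choices.

```python
import numpy as np

# Local polynomial estimator of order d (Definition 4.1): at each x0, fit a
# degree-d polynomial by kernel-weighted least squares in X - x0; the
# estimate of f(x0) is the constant coefficient of the local fit.
# Kernel, bandwidth h, design and regression function are illustrative.

def local_poly(x0, X, Y, h, d, kernel=lambda u: np.maximum(1 - u ** 2, 0.0)):
    w = kernel((X - x0) / h)                       # kernel weights
    B = np.vander(X - x0, d + 1, increasing=True)  # local polynomial basis
    coef = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * Y))
    return coef[0]                                 # fitted value at x0

rng = np.random.default_rng(1)
n = 400
X = np.linspace(0, 1, n)                    # deterministic design, as in (32)
f = lambda x: np.sin(2 * np.pi * x)
Y = f(X) + 0.1 * rng.standard_normal(n)     # Gaussian errors, sigma = 0.1
est = np.array([local_poly(x0, X, Y, h=0.1, d=1) for x0 in X])
print(np.sqrt(np.mean((est - f(X)) ** 2)))  # empirical root mean squared error
```

Shrinking the bandwidth h with n at the rate dictated by the Hölder smoothness recovers the nonparametric rates quoted after (4).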
We focus in the following on the tail behaviour of the local polynomial estimator for a fixed bandwidth h under different noise conditions.
Lemma 4.3:
Let us assume that the vector of errors has a log-concave, continuous distribution on
i.e. the logarithm of its probability density is concave on the set where it is defined. If the Assumptions 4.2 hold, we have
where is a universal constant,
and
.
Moreover, if the errors are independent, centred Gaussian random variables, then it holds
where is a universal constant.
Proof From (35) we can write
This is a positive polynomial of order two with symmetric coefficients in the random variables . In this case, results regarding the higher moments are already available (see Theorem 7 in [Citation40]). For the sake of completeness we reproduce this result in Appendix 1 (see Lemma 1) and use it to bound the moments of the polynomial
as
where ,
and
is a universal constant. From the Markov inequality we get
(35)
Let us now choose the integer such that
. For values of t such that
we can apply the previous inequality and obtain
The second inequality in Lemma 4.3, leading to Chernoff-type bounds, is a direct consequence of hypercontractivity inequalities for Gaussian Hilbert spaces (see Theorem 6.7 in [Citation41]).
Remark 3:
The log-concave probability distributions include the normal, exponential, logistic, chi-square, chi and Laplace distributions. A survey of these can be found in [Citation42]. The stochastic error is assumed to be a zero-mean stochastic process with bounded covariance operator. Therefore, considering a general class of distributions larger than the Gaussian fits into the general setting of our problem. Due to Lemma 4.3, the overall rate of convergence of the two-step method will depend on the class of probability distributions to which the discretized noise term belongs.
Remark 4:
Results about the asymptotic optimality of the local polynomial estimator with the regularization parameter chosen by cross-validation were proved in [Citation43,Citation44]. Details about the practical implementation of this method can be found in [Citation45].
Remark 5:
An adaptive local polynomial estimator based on a particular implementation of Lepskii scheme and its convergence properties for local and global risks for a wide range of characterizations of smoothness and accuracy measures are presented in [Citation37]. The rates of convergence depend on the smoothness in the scale of Sobolev spaces of the exact data , the degree of the polynomial and the index of the Lebesgue norm of the global risk, and are optimal or optimal up to a logarithmic term, as expected from [Citation46,Citation47].
Remark 6:
A data-driven approach leading to adaptive estimation of both the polynomial order and the bandwidth with the help of cross-validation can be found in [Citation48]. This method is uniformly consistent for a large class of functions and reaches the optimal rate in the case of a correctly specified parametric model, i.e. when
is a polynomial whose order does not depend on the sample size. Nevertheless, an order-optimal approach aiming to adaptively tailor the degree and the tuning parameter of the local polynomial estimator is still an open problem.
4.2. Rate of convergence of the Tikhonov estimator for the balancing principle
In the following we compute and illustrate the empirical rate of convergence of the Tikhonov estimator for the balancing principle in the stochastic setting with the local polynomial estimator as pre-smoothing method. Since theoretical and numerical results imply that the Lepskii constant depends on the chosen application, we present a result concerning the computation of this constant for the inverse problem defined in (Equation 31). To this aim, the values of
and
are given in the following lemma under general conditions for the direct operator.
Lemma 4.4:
If the assumptions from Corollary 3.6 are fulfilled and the supplementary conditions
(36)
hold, then we have
where D is the length of the interval .
Proof The Fréchet derivative of the operator defined in (Equation 31) is
where (see [Citation1]). The following bounds follow from the Gelfand triple structure of the Hilbert scale and the Banach algebra property of the Hilbert space
(see [Citation15] and references within):
where k and C are Sobolev constants corresponding to embedding of , respectively
into
. From [Citation49,Citation50] it follows that these constants take the values
and
.
Under the supplementary conditions (36), classical methods in partial differential equations lead to the following bounds on the solution of the direct problem
where for an
(see e.g. Exercise 4 in [Citation51]). Since
and
we get and
from Lemma 3.6.
Remark 7:
The computation of the Lepskii constant is also feasible for smooth domains in
,
. For example, results about the value of the Poincaré constant for convex domains in
,
are available in [Citation52].
In our numerical simulations, the unknown parameter is chosen as a B-spline of order 2, 3 or 4 which corresponds to smoothness up to
,
respectively
, and its upper bound
is equal to 1. The use of different larger values of
does not change the asymptotic rates of convergence, hence the method is robust with respect to
. The exact data
has smoothness up to
,
respectively
corresponding to the chosen B-splines. From the Sobolev embedding theorem, this implies a Hölder condition of order up to 4, 5 and 6.
The Gaussian white noise was discretized such that we obtain an inverse regression problem with equidistant deterministic design and normally distributed errors with variance
. Since the discrete regression design is asymptotically equivalent to the Gaussian white noise model (Equation 3) (see e.g. [Citation11]), we consider the increasing sample sizes n = 1000, 2512, 6310, 15849, 39811, 100000 instead of letting the noise level go to 0 in the theoretical noise model, and we run 100 simulations for each sample size.
Remark 8:
Theorem 2.6 holds for noise levels that fulfill the inequality (Equation 16). In our numerical example, we can explicitly formulate the relation between the noise level
and the
norm of the difference between the initial guess and exact solution c. From Lemma 4.4, we get the values
and
since
,
,
and
for a choice of the exact data
. The values of the constants
and
can be accordingly derived as
and, for a choice of the initial value for the regularization parameter , (Equation 16) becomes
For we have
and the sufficient conditions for this inequality to hold are and
. Hence, for a noise level of 0.01, our initial guess should be as close in the
norm to the exact solution as 0.16 to ensure that condition (Equation 16) holds. Moreover, the quality of the reconstruction is very sensitive to the choice of the a-priori guess.
To illustrate the rates of convergence for the two-step method with the Lepskii choice of the regularization parameter, we used in the first step of the algorithm a local polynomial estimator with Gaussian kernel for the direct regression problem as stated in (Equation 33) and an a-priori chosen bandwidth. We chose the quadratic local polynomial estimator, since the linear estimator shows larger bias in regions with higher curvature, and we varied the values of the bandwidth by using a constant
chosen from the grid between 0.1 and 1 with step size 0.1 as a factor to the a-priori value
. The empirical values of the rates of convergence
were computed by leave-one-out cross-validation and used in the second step of the method. An exhaustive introduction to the application of the local polynomial estimator and its challenges can be found in [Citation45]. The tail behaviour for this estimator fulfills the condition in Theorem 2.6 as a consequence of Lemma 4.3. For a polynomial of degree 2, it follows from Theorem 5.10 and Remark 5.13 in [Citation41] that
and
.
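Leave-one-out cross-validation over a bandwidth grid, as used in this first step, can be sketched as follows. This is a self-contained illustration with a Gaussian kernel, not the paper's code; all names are hypothetical.

```python
import numpy as np

def locpoly(x0, X, Y, h, d=2):
    """Local polynomial fit of order d at x0 with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    U = np.vander(X - x0, N=d + 1, increasing=True)
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(sw[:, None] * U, sw * Y, rcond=None)
    return theta[0]

def loo_cv_bandwidth(X, Y, bandwidths, d=2):
    """Pick the bandwidth minimizing the leave-one-out prediction error."""
    scores = []
    for h in bandwidths:
        errs = [(Y[i] - locpoly(X[i], np.delete(X, i), np.delete(Y, i), h, d)) ** 2
                for i in range(len(X))]
        scores.append(float(np.mean(errs)))
    return bandwidths[int(np.argmin(scores))]
```

In the simulations described above, the candidate bandwidths are multiples of the a-priori value, with factors 0.1, 0.2, ..., 1.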
In the second step, we applied the balancing principle to choose the regularization parameter as studied in Theorem 2.6. The Lepskii factor was chosen from a grid of values between 0.1 and 0.05, by minimizing the empirical mean square error, and the Lepskii constant is computed as
. An overview of the constants used for simulations and computation is given in Table 1.
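The second step, the balancing (Lepskii) selection, can be sketched generically: on a decreasing grid of regularization parameters, accept the largest parameter whose estimate is compatible, up to a multiple of the known noise-propagation bound, with all estimates for smaller parameters. The factor `4 * kappa` below is one common form of the comparison constant and is an assumption of this sketch, not the paper's exact constant (which is derived from Lemma 4.4 and Theorem 2.6).

```python
import numpy as np

def lepskii_choice(estimates, noise_bounds, kappa=1.0):
    """Balancing-principle selection on a grid of decreasing regularization
    parameters.

    estimates[j]    -- estimator computed with the j-th (decreasing) parameter
    noise_bounds[j] -- known (increasing) bound on the propagated noise
    Returns the index of the selected parameter.
    """
    m = len(estimates)
    for j in range(m):
        # accept the parameter if it is compatible with all smaller ones
        if all(np.linalg.norm(estimates[j] - estimates[k]) <= 4.0 * kappa * noise_bounds[k]
               for k in range(j, m)):
            return j            # first (largest) admissible parameter
    return m - 1                # fall back to the smallest parameter
```

The design rationale: the bias of the largest admissible parameter is automatically of the order of the noise bound, so bias and variance are balanced without knowing the smoothness of the solution.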
We present in Figure the convergence rates for the chosen smoothness of the exact solution, considering the sample sizes 1000, 2512, 6310, 15849, 39811, 100000. The logarithm of the empirical estimation of the expected squared error of
is plotted over the logarithm of the empirical values of the rates of convergence of the local polynomial estimator for the three choices of
shown in Figure . The empirical rates of convergence are plotted in full, dashed and dotted lines according to the three different smoothness conditions, while the theoretical ones corresponding to a line with slope
are illustrated in the dashed-dotted lines. Starting with a sample size
, the linear trend dominates, approaching asymptotically the theoretical linear slopes.
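The empirical rates read off such log-log plots are simply the slopes of a least-squares line through the points $(\log n, \log(\text{error}))$. A one-function sketch (the function name is hypothetical):

```python
import numpy as np

def empirical_rate(ns, errors):
    """Slope of log(error) versus log(n): the empirical convergence rate
    read off a log-log plot."""
    slope, _intercept = np.polyfit(np.log(ns), np.log(errors), 1)
    return float(slope)
```

Comparing this empirical slope with the theoretical one is exactly the comparison made between the full/dashed/dotted lines and the dash-dotted reference lines in the figure.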
Table 1. Overview of the constants.
Acknowledgements
The author would like to thank the Isaac Newton Institute for Mathematical Sciences for support and hospitality during the programme ‘Variational methods and effective algorithms for imaging and vision’ when work on this paper was also undertaken.
No potential conflict of interest was reported by the author.
References
- Engl HW , Hanke M , Neubauer A . Regularization of inverse problems. Vol. 375, Mathematics and its applications. Dordrecht: Kluwer Academic Publishers Group; 1996.
- Sudakov VN , Halfin LA . A statistical approach to the correctness of the problems of mathematical physics. Dokl Akad Nauk SSSR. 1964;157:1058–1060.
- Bissantz N , Hohage T , Munk A , et al . Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J Numer Anal. 2007;45(6):2610–2636.
- Mathé P , Pereverzev S . Optimal discretization of inverse problems in Hilbert scales. Regularization and self-regularization of projection methods. SIAM J Numer Anal. 2001;38:1999–2021.
- O’Sullivan F . Convergence characteristics of methods of regularization estimators for nonlinear operator equations. SIAM J Numer Anal. 1990;27(6):1635–1649.
- Lukas MA . Robust generalized cross-validation for choosing the regularization parameter. Inverse Probl. 2006;22(5):1883–1902.
- Wahba G . Spline models for observational data. Vol. 59, CBMS-NSF regional conference series in applied mathematics. Philadelphia (PA): Society for Industrial and Applied Mathematics (SIAM); 1990.
- Bissantz N , Hohage T , Munk A . Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Probl. 2004;20(6):1773–1789.
- Klann E , Ramlau R . Regularization by fractional filter methods and data smoothing. Inverse Probl. 2008;24(2):025018, 26.
- Klann E , Maaß P , Ramlau R . Two-step regularization methods for linear inverse problems. J Inverse Ill-Posed Probl. 2006;14(6):583–607.
- Tsybakov AB . Introduction to nonparametric estimation. Springer series in statistics. New York (NY): Springer; 2009. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats.
- Grama I , Nussbaum M . Asymptotic equivalence for nonparametric regression. Math Methods Statist. 2002;11(1):1–36.
- Meister A , Reiß M . Asymptotic equivalence for nonparametric regression with non-regular errors. Probab Theory Rel Fields. 2013;155(1–2):201–229.
- Werner F , Hohage T . Convergence rates in expectation for Tikhonov-type regularization of inverse problems with Poisson data. Inverse Probl. 2012;28(10):104004, 15.
- Hohage T , Pricop M . Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise. Inverse Probl Imag. 2008;2(2):271–290.
- Natterer F . Error bounds for Tikhonov regularization in Hilbert scales. Appl Anal. 1984;18(1–2):29–37.
- Neubauer A . Tikhonov regularization of nonlinear ill-posed problems in Hilbert scales. Appl Anal. 1992;46(1–2):59–72.
- Mair BA , Ruymgaart FH . Statistical inverse estimation in Hilbert scales. SIAM J Appl Math. 1996;56(5):1424–1444.
- Nussbaum M , Pereverzev S . The degree of ill-posedness in stochastic and deterministic noise models. Berlin: WIAS; 1999. Technical Report.
- Goldenshluger A , Pereverzev S . On adaptive inverse estimation of linear functionals in Hilbert scales. Bernoulli. 2003;9:783–807.
- Tautenhahn U . Error estimates for regularized solutions of nonlinear ill-posed problems. Inverse Probl. 1994;10(2):485–500.
- Polzehl J , Spokoiny V . Error estimates for regularization methods in Hilbert scales. SIAM J Numer Anal. 1996;33(6):2120–2130.
- Lepskiĭ OV . A problem of adaptive estimation in Gaussian white noise. Teor Veroyatnost i Primenen. 1990;35(3):459–470.
- Mathé P , Pereverzev S . Geometry of linear ill-posed problems in variable Hilbert scales. Inverse Probl. 2003;19(3):789–803.
- Lu S , Pereverzev S , Ramlau R . An analysis of Tikhonov regularization for nonlinear ill-posed problems under general smoothness assumptions. Inverse Probl. 2007;23(1):217–230.
- Bauer F , Hohage T . A Lepskij-type stopping rule for regularized Newton methods. Inverse Probl. 2005;21(6):1975–1991.
- Polzehl J , Spokoiny VG . Adaptive weights smoothing with applications to image restoration. J R Stat Soc Ser B Stat Methodol. 2000;62(2):335–354.
- Kreĭn SG , Petunin JI . Scales of Banach spaces. Uspehi Mat Nauk. 1966;21(2 (128)):89–168.
- Pereverzev S , Schock E . On the adaptive selection of the parameter in regularization of ill-posed problems. SIAM J Numer Anal. 2005;43(5):2060–2076. electronic.
- Mathé P . The Lepskiĭ principle revisited. Inverse Probl. 2006;22(3):L11–L15.
- Bauer F , Pereverzev S . Regularization without preliminary knowledge of smoothness and error behaviour. Eur J Appl Math. 2005;16(3):303–317.
- Bauer F . An alternative approach to the oblique derivative problem in potential theory [PhD thesis]. Aachen: Universität Kaiserslautern; 2004.
- Neubauer A . On Landweber iteration for nonlinear ill-posed problems in Hilbert scales. Numer Math. 2000;85(2):309–328.
- Egger H , Neubauer A . Preconditioning Landweber iteration in Hilbert scales. Numer Math. 2005;101(4):643–662.
- Pricop M . Tikhonov regularization in Hilbert scales for nonlinear statistical inverse problems. Uelvesbüll: Der Andere Verlag; 2007.
- Renardy M , Rogers RC . An introduction to partial differential equations. 2nd ed. Vol. 13, Texts in applied mathematics. New York (NY): Springer-Verlag; 2004.
- Goldenshluger A , Nemirovski A . On spatially adaptive estimation of nonparametric regression. Math Methods Statist. 1997;6(2):135–170.
- Avery M . Literature review for local polynomial regression, unpublished manuscript; 2013. Available from: http://www4.ncsu.edu/mravery/AveryReview2.pdf
- Adams RA , Fournier J . Sobolev spaces. 2nd ed. Vol. 140, Pure and applied mathematics. Oxford: Elsevier Science; 2003.
- Carbery A , Wright J . Distributional and Lq norm inequalities for polynomials over convex bodies in Rn. Math Res Lett. 2001;8(3):233–248.
- Janson S . Gaussian Hilbert spaces. Vol. 129, Cambridge tracts in mathematics. Cambridge: Cambridge University Press; 1997.
- Bagnoli M , Bergstrom T . Log-concave probability and its applications. Econom Theory. 2005;26(2):445–469.
- Xia Y , Li WK . Asymptotic behavior of bandwidth selected by the cross-validation method for local polynomial fitting. J Multivariate Anal. 2002;83(2):265–287.
- Li Q , Racine J . Cross-validated local linear nonparametric regression. Statist Sinica. 2004;14(2):485–512.
- Fan J , Gijbels I . Local polynomial modelling and its applications. Vol. 66, Monographs on statistics and applied probability. London: Chapman & Hall; 1996.
- Lepskiĭ OV . Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor Veroyatnost i Primenen. 1991;36(4):645–659.
- Lepskiĭ OV . Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Adaptive estimates. Teor Veroyatnost i Primenen. 1992;37(3):468–481.
- Hall PG , Racine JS . Infinite order cross-validated local polynomial regression. Department of Economics Working Papers McMaster University;2013.
- Chua SK , Wheeden RL . A note on sharp 1-dimensional Poincaré inequalities. Proc Amer Math Soc. 2006;134(8):2309–2316. electronic.
- Talenti G . Best constant in Sobolev inequality. Ann Mat Pura Appl (4). 1976;110:353–372.
- Taylor ME . Partial differential equations. I. Vol. 115, Applied mathematical sciences. New York (NY): Springer-Verlag; 1996.
- Payne LE , Weinberger HF . An optimal Poincaré inequality for convex domains. Arch Rat Mech Anal. 1960;5:286–292.
Appendix 1
Proof of Theorem 2.2:
As is a solution of the minimization problem for the Tikhonov functional, the following inequality holds
It follows that
From Assumption 2.1 C it follows that
Replacing the difference in the previous inequality we obtain
(37)
Using now Assumption 2.1 B we get that
Since we assumed that we have that
where . We apply the interpolation inequality (Equation 7) for
and obtain
(A1)
Then it holds
and we can write
From (Equation A2) we have that
and it follows
and
(A2)
Using the interpolation inequality (Equation 7), the desired rate of convergence is obtained
with K a constant that depends on .
Proof of Theorem 2.3:
Let be a solution of the optimization problem (Equation 8) and let us consider the Euler-Lagrange equation for this optimization problem. From classical results in the calculus of variations we know that the first variation of the Tikhonov functional
defined as the right-hand side in (Equation 8), is null for every direction
and, since we have Fréchet differentiability in
it follows that
for all
. This means
Due to the same arguments, since is the solution of the minimization problem (Equation 11), and from the Euler-Lagrange equation, it follows
As we have that
and we can choose the direction
in both equalities. It follows that
Taking the difference between these two relations we obtain
(A3)
From Assumption 2.1 we can write the Taylor polynomial as
and we get
We denote the three integrals as respectively
and, from Assumption 2.1, we can bound their norms by
We now write the relation (Equation A4) as
Using the Taylor formula we get
It follows
From the Cauchy-Schwarz inequality, Assumptions 2.1 and the norm estimates for ,
and
we have
We can rewrite this inequality as
(A4)
As we have
and it holds
To simplify the notation, we introduce a positive constant that will vary from line to line in the following paragraphs. We have that
and this implies
(A5)
since we computed the rates of convergence for the norm of the approximation error
in (Equation A3). Using (Equation A3) and (Equation A6) in (Equation A5) we notice that
From (Equation A6) it holds
We now apply the interpolation inequality (Equation 7) to obtain
Since for all x, y positive and
, we get the following inequality
meaning that
where we used the inequalities and
to bound the numerical constants. We now apply four times the Young inequality
which holds for all ,
and
with
. For the second term we take
,
and
, for the fourth
,
and
, for the fifth
,
and
, for the sixth term
,
and
. This yields
(A6)
where and can be easily computed from the second term and from an upper bound for
, and
is a constant that depends on
such that
.
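The Young inequality applied four times in the proof above is, in its standard weighted form (a restatement of the classical inequality; the particular exponents and weights used in the proof are those listed in the text):

```latex
ab \;\le\; \frac{\varepsilon^{p} a^{p}}{p} + \frac{b^{q}}{q\,\varepsilon^{q}},
\qquad a, b \ge 0,\quad \varepsilon > 0,\quad p, q > 1,\quad \frac{1}{p} + \frac{1}{q} = 1,
```

which follows from the unweighted case $ab \le a^{p}/p + b^{q}/q$ applied to the pair $(\varepsilon a,\, b/\varepsilon)$. The free parameter $\varepsilon$ is what allows the terms on the right-hand side of (A6) to be absorbed with small constants.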
For the sake of completeness, we now state the following result from [Citation40], which was applied in the proof of Lemma 4.3.
Lemma A.1:
Let be a polynomial of degree at most d. Suppose
and
is a log-concave probability measure on
. Then there is an absolute constant C such that