
Nonlinear Tikhonov regularization in Hilbert scales with balancing principle tuning parameter in statistical inverse problems

Pages 205-236 | Received 10 Jul 2016, Accepted 10 Mar 2018, Published online: 25 Mar 2018

ABSTRACT

In this article, we study the balancing principle for Tikhonov regularization in Hilbert scales for deterministic and statistical nonlinear inverse problems. While the rates of convergence in the deterministic setting are order optimal, they prove to be order optimal up to a logarithmic term in the stochastic framework. The two-step approach allows us to consider a data-driven algorithm in a general error model for which the tail of the estimator chosen in the first step exhibits an exponential decay. Finally, we compute the overall rate of convergence for a Hammerstein operator equation and for a parameter identification problem. Moreover, we illustrate these rates for the latter application after studying some large sample properties of the local polynomial estimator in a general stochastic framework.


1. Introduction

Solving inverse problems in a deterministic setting involves reconstructing a mathematical object in an ill-posed operator equation (see e.g. [Citation1]). The original article [Citation2] considers the stochastic extension of inverse problems to (Gaussian) random fields and formalizes it as a statistical estimation question by discretization with the help of sample functions. Loosely speaking, statistical inverse problems are inverse problems with random noise. Moreover, classical statistical problems like convolution or error-in-variables models can be rewritten as linear statistical inverse problems [Citation3].

In this paper, we consider the problem of estimating an unknown, not directly observable quantity $a$ from the noisy values of an element $u$ related to $a$ by the operator equation
\[ F(a) = u, \tag{1} \]

where $F : D(F) \subset X \to Y$ is a nonlinear, injective operator between two Hilbert spaces $X$ and $Y$. We assume that the equation is ill-posed, in the sense that $F^{-1}$ is not continuous. First, we have at our disposal the data $Y$ perturbed by deterministic noise such that
\[ Y = u + \delta\xi, \tag{2} \]

where $\xi \in Y$, $\|\xi\| \le 1$, is the normalized deterministic error and $\delta$ is the deterministic noise level. The results obtained in the framework (Equation2) are further utilized in the stochastic process setting, which is generalized here to the abstract noise model
\[ Y = u + \sigma\epsilon, \tag{3} \]

where $\epsilon$ is a normalized Hilbert space process, $\sigma$ is the corresponding stochastic noise level, and the relation (Equation3) is understood in the weak sense. Our general noise model is inspired by [Citation4] and its relation to several commonly used statistical models was investigated in [Citation3]. A classical example is the inverse nonparametric regression problem, where the data are indirectly related to the functional mean by an operator equation, in contrast to direct regression, where we observe noisy values of this function.

A first method for solving nonlinear inverse problems in the statistical setting, inspired by Tikhonov regularization and named Method of Regularization, was studied by O’Sullivan, Lukas and Wahba (see [Citation5–Citation7]). The straightforward generalization of this method to the infinite-dimensional noise model is
\[ \bar a := \operatorname{argmin}_{a \in D(F)} \|F(a) - Y\|_Y^2 + \alpha\|a - a_0\|_X^2, \]

where $a_0$ is an initial guess for $a$. However, in the important case when $\epsilon$ is white noise, $\|F(a) - Y\|_Y$ is infinite with probability 1 for all $a \in D(F)$, so $\bar a$ is not well-defined.

Since our aim is to focus on data-driven methods, we would like to include statistical models that go beyond the additive setting and for which the direct statistical problem (like the direct nonparametric regression problem) is already studied and better understood. Hence, we choose a two-step estimator as suggested and studied in [Citation8], which gives us the flexibility to embed progress made in the direct stochastic setting into statistical nonlinear inverse problems. In the deterministic case, a pre-smoothing step allows one to overcome the oversmoothing effect of the Tikhonov and Landweber regularization methods (see [Citation9]). Data estimation in this framework can be achieved by Tikhonov regularization, regularization of the embedding operator or by wavelet shrinkage (see [Citation10]). In the first step, a possibly biased estimator $\hat u$ of $u$ is determined from the data $Y$ such that the error estimate
\[ E\|\hat u - u\|_Y^2 \le \beta^2 \tag{4} \]

holds true and the rate of convergence $\beta$, which depends on the stochastic noise level $\sigma$, is known. This is a nonparametric regression problem which has been studied intensively in the statistical literature if $\epsilon$ is a Gaussian white noise process and the smoothness conditions are ellipsoids in the Sobolev spaces $H^b$ on the domain (0, 1) (see [Citation11] and references therein). In this case, the optimal rates of convergence (Equation4) for adaptive and non-adaptive estimators are $\beta \asymp \sigma^{\frac{2b}{2b+1}}$. Moreover, this rate also holds for non-Gaussian white noise if the error probability distribution is regular in the classical sense, since in this case the distribution-free nonparametric regression problem is asymptotically equivalent to a homoskedastic Gaussian shift white noise model (see [Citation12,Citation13]). $\beta$ will play the role of the noise level in the next step. In the second step, the estimator $\hat a$ is constructed as the solution of the Tikhonov minimization problem
\[ \hat a := \operatorname{argmin}_{a \in D(F)} \|F(a) - \hat u\|_Y^2 + \alpha\|a - a_0\|_X^2. \tag{5} \]
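To make the two-step construction concrete, the following minimal Python sketch assumes a generic smoother for the first step and a black-box minimizer for the Tikhonov functional in the second step; the names presmooth, tikhonov_step and the crude penalty are illustrative and not taken from the paper.

import numpy as np
from scipy.optimize import minimize

def presmooth(x, y, h=0.05):
    # Step 1 (illustrative): a simple Nadaraya-Watson smoother standing in for
    # the nonparametric estimator u_hat; its empirical risk plays the role of beta.
    K = lambda t: np.exp(-0.5 * t ** 2)
    W = K((x[:, None] - x[None, :]) / h)
    return (W @ y) / W.sum(axis=1)

def tikhonov_step(F, u_hat, a0, alpha, penalty_sq):
    # Step 2: minimize ||F(a) - u_hat||^2 + alpha * ||a - a0||^2 over a grid
    # representation of a; penalty_sq is a discretization of the squared penalty norm.
    J = lambda a: np.sum((F(a) - u_hat) ** 2) + alpha * penalty_sq(a - a0)
    return minimize(J, a0, method="L-BFGS-B").x

Here F would be a discretization of the forward operator (for instance the Hammerstein operator of Section 3.1) and penalty_sq a discretization of the squared Hilbert-scale norm; any pre-smoother with a known risk bound can replace the Nadaraya-Watson stand-in.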

Therefore, this method allows the discretization of the problem (Equation3) for a large variety of designs, and the unknown noise level is de facto estimated in the first step. On the other hand, this step can make the computation cumbersome, and it is superfluous for designs for which methods and convergence analysis are already available (like for Poisson noise in [Citation14]). Moreover, the rates of convergence depend on the nature of the noise through the tail behaviour of the stochastic method chosen in the first step, making it necessary to specify conditions on the error probability distribution in order to concretize this result for different settings. It follows from [Citation8] that the method is consistent in the sense that for any choice of minimizing elements $\hat a \in D(F)$ we have
\[ E\|\hat a - a^\dagger\|_X^2 \to 0 \tag{6} \]

if $\alpha \to 0$ and $\beta^2/\alpha \to 0$. Moreover, in [Citation15] rates of convergence for a large set of smoothness classes expressed with the help of Hilbert scales were computed and optimality of this method was proved for linear operators. Regularization in Hilbert scales was studied for the first time by Natterer [Citation16] for linear and by Neubauer [Citation17] for nonlinear deterministic inverse problems. Linear statistical inverse problems have been studied in a Hilbert scale framework in [Citation4,Citation18–Citation22]. In [Citation15] rates of convergence were obtained for nonlinear statistical inverse problems in Hilbert scales for an a-priori choice of the regularization parameter $\alpha$. An a-priori choice of the regularization parameter yields order-optimal rates of convergence but requires the knowledge of the noise levels $\delta$ and $\sigma$, and of the smoothness $q$ of the solution. We focus in this paper on an adaptive tuning parameter rule, the balancing principle, which was proposed by Lepskii for the choice of the regularization parameter for a regression problem in [Citation23]. His principle of bounding the best possible accuracy by a function which is non-increasing with respect to the $\alpha$-values was used by Mathé and Pereverzev to obtain order-optimal estimators in variable Hilbert scales (see [Citation24]). Afterwards this principle was applied by different authors to deterministic linear and nonlinear inverse problems [Citation24–Citation26].

Our aim is to study rates of convergence of the two-step method with balancing principle choice of the regularization parameter in a general noise model for a large class of smoothnesses expressed with the help of Hilbert scales. This extends the method in [Citation15] to a fully data-driven algorithm in both steps for a general class of additive noise models, including some non-Gaussian white noise processes, as specified before. To this aim, we introduce in the following section Hilbert scales $X_s$ and their corresponding norms $\|\cdot\|_s$, $s \in \mathbb{R}$, and prove rates of convergence of the error for the Lepskii estimator in nonlinear deterministic inverse problems. In addition to the common assumptions on the operator $F$ as in [Citation15], a smallness condition on the noise level $\delta$ is necessary, as is usually the case for Lepskii tuning methods. The deterministic rates of convergence allow us to obtain rates of convergence of $E\|\hat a - a^\dagger\|_0^2$ with respect to $\beta$ that are order optimal up to a log-factor for nonlinear statistical inverse problems under a supplementary condition on the tail behaviour of the estimator $\hat u$ in Section 2.2. The overall rate of convergence for the two-step method depends on the nature of the noise through the asymptotic behaviour of the tail and the mean square error of the estimator chosen in the first step. We will illustrate the overall capacity of the two-step approach to reach order optimal (up to a logarithmic term) rates of convergence in Section 3 of our article, containing the numerical simulation for the reconstruction of a reaction coefficient from Gaussian noisy data in the setting of an inverse nonparametric regression. The asymptotic equivalence between this discrete model and that of the Gaussian white noise (Equation3) is discussed for example in [Citation11], as well as the relation between the variance in the continuous setting and the sample size $n$: $\sigma \asymp 1/\sqrt{n}$. We also present new results regarding this behaviour for the local polynomial estimator in the case of errors with Gaussian or log-concave probability distribution, which are useful to illustrate the numerical performance of our algorithm in the last section. Due to the great flexibility of the two-step method, methodological progress in the study of the local polynomial estimator in a large class of statistical models like the binary response model, inhomogeneous exponential and Poisson models by adaptive weights smoothing could provide the necessary tools to deal with non-Gaussian settings (see [Citation27]). The study of the overall optimal rates of convergence for these cases is the subject of further research.

Two different applications are considered in this paper: one statistical inverse problem defined by the Hammerstein operator, for which we also define the corresponding Hilbert scale and the problem of the reconstruction of a reaction coefficient already studied in [Citation15]. We compute the overall rate of convergence for these two examples that turns out to be optimal up to a log-factor, and graphically compare the numerical and theoretical rates of convergence for the second application in the last section.

2. Balancing principle for nonlinear inverse problems in Hilbert scales

The overall rate of convergence of the two-step method for the balancing choice of the regularization parameter depends on the stochastic model and can be computed by considering first the deterministic setting and afterwards extending it to the statistical framework. In the following, we focus on the rates of convergence for the Lepskii choice of the regularization parameter for nonlinear inverse problems in Hilbert scales, separately in a deterministic and in a stochastic framework. The order optimal rates of convergence proved in the deterministic setting and the properties of the nonparametric estimator in the pre-smoothing step lead to rates of convergence that are optimal up to a logarithmic term in the Gaussian stochastic case.

2.1. Rates of convergence for balancing principle in deterministic setting

We consider in this section a deterministic framework, meaning that we assume the noise model (Equation2). To characterize the smoothness of the solution $a^\dagger$, a Hilbert scale is defined in the following with the help of a densely defined, unbounded, self-adjoint, strictly-positive operator $L$ in the Hilbert space $X$. We consider the set $M$ of the elements $x$ for which all powers of $L$ are defined,
\[ M = \bigcap_{k=0}^{\infty} D(L^k). \]
$M$ is dense in $X$ and, by the spectral theorem, $L^s$ is well-defined for all $s \in \mathbb{R}$. For $x, y \in M$ and $s \in \mathbb{R}$, let
\[ \langle x, y\rangle_s := \langle L^s x, L^s y\rangle, \qquad \|x\|_s := \|L^s x\|. \]

Then the Hilbert space $X_s$ is defined as the completion of $M$ with respect to the norm $\|\cdot\|_s$, and $(X_s)_{s\in\mathbb{R}}$ is called the Hilbert scale induced by $L$. A comprehensive introduction to Hilbert scales can be found in [Citation1,Citation28]. In our proofs we are going to use repeatedly the interpolation inequality
\[ \|x\|_r \le \|x\|_t^{\frac{s-r}{s-t}}\,\|x\|_s^{\frac{r-t}{s-t}}, \qquad x \in X_s, \tag{7} \]
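As a purely illustrative aside (not part of the paper), the Hilbert-scale norms and the interpolation inequality (7) can be checked numerically in a diagonal sequence-space model, where the generator L is assumed to act by multiplication with k on the k-th coefficient:

import numpy as np

rng = np.random.default_rng(0)
k = np.arange(1.0, 201.0)              # assumed eigenvalues of the generator L
x = rng.normal(size=k.size) / k ** 2   # coefficients of a sufficiently smooth element x

def hs_norm(x, s):
    # Hilbert-scale norm ||x||_s = ||L^s x|| in the eigenbasis of L
    return np.sqrt(np.sum(k ** (2 * s) * x ** 2))

# interpolation inequality (7): ||x||_r <= ||x||_t^((s-r)/(s-t)) * ||x||_s^((r-t)/(s-t)), t < r < s
t, r, s = -1.0, 0.0, 2.0
lhs = hs_norm(x, r)
rhs = hs_norm(x, t) ** ((s - r) / (s - t)) * hs_norm(x, s) ** ((r - t) / (s - t))
assert lhs <= rhs * (1 + 1e-12)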

which holds for any $t < r < s$. We define our estimator $a_\alpha^\delta$ as
\[ a_\alpha^\delta := \operatorname{argmin}_{a \in D(F)\cap(a_0+X_s)} \|F(a) - Y\|_Y^2 + \alpha\|a - a_0\|_s^2, \tag{8} \]

where $s \ge 0$ and $a_0 \in D(F)$ is an initial guess. We obtain rates of convergence for the deterministic error $\|a_\alpha^\delta - a^\dagger\|_0$ if the following assumptions on the operator $F$ hold:

Assumptions 2.1:

(A)

(assumptions on $F$) If $F(a) = F(a^\dagger)$ for some $a \in D(F)\cap(a_0+X_s)$, then $a = a^\dagger$. Moreover, $D(F)$ is convex, $F : D(F)\cap(a_0+X_s) \to Y$ is weakly sequentially closed, and $F : D(F) \subset X \to Y$ has a Fréchet derivative $F' : D(F) \to L(X, Y)$.

(B)

(smoothing properties of $F'$) There exist constants $p \in [0, s]$ and $\lambda, \Lambda > 0$ such that for all $h \in X$
\[ \lambda\|h\|_{-p} \le \|F'(a^\dagger)h\|_Y \le \Lambda\|h\|_{-p}. \tag{9} \]

(C)

(Lipschitz continuity of $F'$ and boundedness condition on $D(F)$) There exists a constant $C_L > 0$ such that
\[ \|F'(a) - F'(a^\dagger)\|_{X_{-p}\to Y} \le C_L\|a - a^\dagger\|_0 \le \frac{\lambda^2}{2\Lambda} \tag{10} \]
for all $a \in B_\rho(a^\dagger)\cap(a_0+X_s)$ for some $\rho > 0$ such that $B_\rho(a^\dagger) \subset D(F)$.

We define the estimator $a_\alpha$ corresponding to the exact data $u$ by
\[ a_\alpha := \operatorname{argmin}_{a \in D(F)\cap(a_0+X_s)} \|F(a) - u\|_Y^2 + \alpha\|a - a_0\|_s^2. \tag{11} \]

We treat the noise-free error $\|a_\alpha - a^\dagger\|_0$ and the data noise error $\|a_\alpha^\delta - a_\alpha\|_0$ separately, as is common for these adaptive methods (see e.g. [Citation29]).

Theorem 2.2:

If the Assumptions 2.1 hold true and if $a^\dagger \in D(F)$ fulfills
\[ a^\dagger - a_0 \in X_q \quad\text{for some}\quad q \in [s, 2s+p], \tag{12} \]

then the estimate
\[ \|a_\alpha - a^\dagger\|_0 \le K\,\alpha^{\frac{q}{2(p+s)}} \]

holds true, where $c = \|a^\dagger - a_0\|_q$ and $K$ is a constant depending on $s, p, q, \lambda, \Lambda, c$.

For better readability of this paper, the proof can be found in Appendix 1. The interval $[s, 2s+p]$ for the smoothness degree is due to the technical condition (Equation7) and can be interpreted as a saturation limit (see e.g. [Citation17]). The exponent of $\alpha$ in the bound on the noise-free error is similar to the one computed for linear inverse problems, but the constant depends on the $q$-norm of $a^\dagger - a_0$. It was not the aim of this research to overcome this restriction, but it might be an interesting topic to study.

Theorem 2.3:

Under the assumptions of Theorem 2.2 we have that
\[ \|a_\alpha^\delta - a_\alpha\|_0^2 \le k_1\,\alpha^{\frac{q}{p+s}} + (k_1 + k_2)\,\delta^2\alpha^{-\frac{p}{s+p}}, \tag{13} \]

where $k_1 \ge K$ and $k_2$ are constants depending on $s, p, q, \lambda, \Lambda, c$, respectively on $s, p, \lambda, \Lambda$.

The proof of this Theorem can also be found in Appendix 1. It is based, together with the proof of the previous theorem, on the computation of bounds for the noise-free and data noise terms in the $\|\cdot\|_s$ and $\|\cdot\|_{-p}$ norms, and on the interpolation inequality (Equation7). The classical results for linear inverse problems yield a decreasing function in $\alpha$ as upper bound for the data noise error term. Due to the non-linearity, the product $\|a_\alpha^\delta - a_\alpha\|_{-p}\,\|a_\alpha - a^\dagger\|_{-p}$ appears in the upper bound, and because of the adaptivity we can control it only through expressions containing the term $\alpha^{\frac{q}{p+s}}$. The constants can be explicitly computed, i.e.
\[ k_2 = 12\left(\frac{\Lambda^2}{\lambda^2} + \frac{1}{4}\right)^{1+\frac{s}{s+p}}\Lambda^{-\frac{2s}{s+p}}, \tag{14} \]

but for the sake of simplicity we refrain from calculating them formally in the proof. Finally, we obtain a sum of an increasing and a decreasing function of the tuning parameter, and we define an adaptive choice of the regularization parameter based on this. To ensure consistency of the method, a smallness condition on $\delta$ and $c$ needs to be fulfilled, as we will see below. First, we present a direct consequence of the previous results.

Corollary 2.4:

Under the assumptions and notations of Theorems 2.2 and 2.3, it holds that
\[ \|a_\alpha^\delta - a^\dagger\|_0^2 \le 4k_1\,\alpha^{\frac{q}{p+s}} + 2(k_1 + k_2)\,\delta^2\alpha^{-\frac{p}{s+p}}, \tag{15} \]

where k1 and k2 are the constants from Theorem 2.3.

The choice of the regularization parameter $\alpha \asymp \delta^{\frac{2(p+s)}{p+q}}$ gives the order optimal rate of convergence
\[ \|a_\alpha^\delta - a^\dagger\|_0^2 \le 2C\,\delta^{\frac{2q}{p+q}}, \]

where $C$ depends on $a^\dagger$, $p$, $s$, $q$, $\lambda$, $\Lambda$ and $c$.

Proof The first inequality follows immediately from Theorems 2.2 and 2.3 and allows us to estimate the tuning parameter $\alpha$ by balancing the upper bounds of the approximation and data noise errors. For $\alpha = \delta^{\frac{2(p+s)}{p+q}}$ we get
\[ \|a_\alpha^\delta - a^\dagger\|_0^2 \le 2C\left(\delta^{\frac{2(p+s)}{p+q}}\right)^{\frac{q}{p+s}} = 2C\,\delta^{\frac{2q}{p+q}}. \]

Figure 1. Upper bounds for the nonlinear data noise error (dotted line), Lepskii rule Φ (dashed line), linear data noise error (full line) for error level δ2=0.5 (left, diamond) and 0.01 (right, diamond), and the corresponding minimum points for Ψ (circle).


We discuss now an adaptive choice of the regularization parameter based on the balancing principle and on the previous bounds computed in Theorems 2.2 and 2.3. We follow here the approach from [Citation24,Citation25,Citation30]. Hence, we estimate the value of the regularization parameter $\alpha$ that approximates the minimizer of the total error $\|a_\alpha^\delta - a^\dagger\|_0^2$ by the minimizer of the bounding sum resulting from the aforesaid theorems. This optimum is computed from a finite set of possible parameters
\[ M = \left\{\alpha_1,\ \alpha_j = \alpha_1(l^2)^{j-1},\ j = 2,\dots,m\right\}, \quad\text{with}\quad m = \min\{k : \alpha_k \ge 1\}, \]

where $l > 1$, $\alpha_1$ is usually equal to $\delta^2$, and we denote $a_j = a_{\alpha_j}^\delta$. We define the non-increasing function
\[ \Psi : \{1, 2, \dots, m\} \to \mathbb{R}_+, \qquad \Psi(j) = 6k_2\,\delta^2\alpha_j^{-\frac{p}{s+p}}, \]

and the function
\[ \Phi : \{1, 2, \dots, m\} \to \mathbb{R}_+, \qquad \Phi(j) = 4k_1\,\alpha_j^{\frac{q}{p+s}} + 2k_1\,\delta^2\alpha_j^{-\frac{p}{s+p}}, \]

with the constants $k_1$ and $k_2$ as in Theorem 2.3. It can be easily checked that $\Phi$ is non-decreasing on the set $M$ if $\alpha_1 > \frac{1}{2}\delta^2$, since the minimum point of the real-valued expression $4k_1\alpha^{\frac{q}{p+s}} + 2k_1\delta^2\alpha^{-\frac{p}{s+p}}$ is smaller than $\frac{1}{2}\delta^2$ in our setting.

If the inequality
\[ \Phi(1) \le \Psi(1) \tag{16} \]

holds, then we can formulate a rule to choose the tuning parameter $\alpha$ and define an order-optimal estimator independent of the smoothness $q$. Condition (Equation16) assures the existence of at least one intersection point between the curves defined by $\Phi$ and $\Psi$ and is similar to the one usually used in the framework of linear inverse problems, as Figure 1 also illustrates. It implies as well that our results will be valid only for $q$-distances $c$ and noise levels small enough such that the right hand side of (Equation16) bounds an algebraic expression involving these two values. An explicit example of this relationship is given in Section 4.2.

Therefore, we determine the regularization parameter in the set $M$ for which the following inequality holds,
\[ i_\ast = \max\left\{i : \alpha_i \in M,\ k_1\left(2\alpha_i^{\frac{q+p}{p+s}} + \delta^2\right) \le 3k_2\,\delta^2\right\}, \]

and $\alpha_\ast = \alpha_{i_\ast}$. The computation of $\alpha_\ast$ and $a_\ast = a_{\alpha_{i_\ast}}$ is unfeasible, since $q$ is unknown; therefore we approximate them by
\[ i_+ = \max\left\{i : \|a_i - a_j\|_0^2 \le 32k_2\,\delta^2\alpha_j^{-\frac{p}{s+p}},\ j = 1, 2, \dots, i\right\}, \]

where $a_i = a_{\alpha_i}$, $\alpha_+ = \alpha_{i_+}$ and $a_+ = a_{\alpha_+}$ (see [Citation29] for details). Moreover, a simplified version of the Lepskii criterion was also presented in the aforementioned article, which has its origins in the quasi-optimality choice of the regularization parameter proposed by Tikhonov and Glasko. In this case, we choose the estimator $a_\circ = a_{\alpha_{i_\circ}}$ corresponding to the regularization parameter $\alpha_\circ = \alpha_{i_\circ}$ with the index
\[ i_\circ = \max\left\{i : \|a_j - a_{j-1}\|_0^2 \le 32k_2\,\delta^2\alpha_{j-1}^{-\frac{p}{s+p}},\ j = 2, \dots, i\right\}. \]

If condition (Equation16) holds, all three sets defined above are different from the empty set and, for k1k2, not equal to M.
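A minimal Python sketch of the balancing selection, assuming the reconstructions a_j have already been computed on the geometric grid; the function name lepskii_select is illustrative, and the problem-dependent constant (32 k_2 in the deterministic rule above, or its stochastic analogue) is hidden in the threshold callable:

import numpy as np

def lepskii_select(estimators, threshold):
    # estimators[j] is the reconstruction a_j for alpha_j (alphas increasing);
    # threshold(j) is the admissible squared distance, e.g. 32*k2*delta**2*alphas[j]**(-p/(s+p)).
    # Returns the largest index i with ||a_i - a_j||^2 <= threshold(j) for all j <= i.
    chosen = 0
    for i in range(len(estimators)):
        if all(np.sum((estimators[i] - estimators[j]) ** 2) <= threshold(j)
               for j in range(i + 1)):
            chosen = i
        else:
            break
    return chosen

# geometric grid M: alpha_1 = delta**2, alpha_j = alpha_1 * l**(2*(j-1)), stopped at the first value >= 1
delta, l = 1e-2, 1.5
alphas = [delta ** 2]
while alphas[-1] < 1:
    alphas.append(alphas[-1] * l ** 2)

Here the Euclidean distance of the coefficient vectors stands in for the $\|\cdot\|_0$-norm; the simplified rule of the last display compares only consecutive reconstructions and therefore needs a single pass over the grid.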

Theorem 2.5:

If the assumptions of Theorem 2.2 and the inequality (Equation16) hold true, we have that
\[ i_\ast \le i_+ \le i_\circ \]

and we get order optimal rates of convergence both for $a_+$ and $a_\circ$, i.e.
\[ \|a_+ - a^\dagger\|_0 \le c_+\,\delta^{\frac{q}{p+q}}, \qquad \|a_\circ - a^\dagger\|_0 \le c_\circ\,\delta^{\frac{q}{p+q}}, \]

for constants $c_+$ and $c_\circ$ which depend on $a^\dagger, p, s, \gamma, \Gamma$ and $c$.

Proof The inequalities in Theorem 2.5 follow from the order relation between the sets involved in the definition of the three tuning rules. a fulfillsa-aj022a-a02+aj-a0232k2δ2αj-ps+p

for any ji since α-ps+pαj-ps+p, and it is obvious thataj-aj-10232k2δ2αj-1-ps+p

for ji+ (see [Citation29]). Since the proof of the second part of the theorem is similar for both rates of convergence, we just show the second result in the following paragraph. From the previous inequality i it holds that ii and we can writea-a0a-a0+j=i+1iaαj-1-aj0a-a0+42k2δj=0i-i-11αp2(s+p)lpjs+p

Moreover, the definition of α allows us to obtain order optimal rates of convergence (see e.g. [Citation15,Citation16] and references therein for a discussion about optimal rates for Tikhonov regularization in Hilbert scales.)a-a042k2δαp2(s+p)1-1l(i-i+)ps+p1-1lps+p+42k2δα-p2(s+p)cδqp+q.

Remark 1:

The last condition (Equation16) implies that our results will be valid only for a set of noise levels depending on the q-norm c. Generally, the feasibility of the estimator depends on the initial guess, and this problem can be solved for example by considering different a-priori values and applying the algorithm for each of them.

2.2. Rates of convergence for balancing principle in stochastic setting

In this section we consider the stochastic noise model (Equation3) and apply the two-step method to this statistical inverse problem. Hence, we first choose an estimator $\hat u$ as data and its known rate of convergence (Equation4) as noise level. We define the estimator $\hat a_\alpha$ as
\[ \hat a_\alpha := \operatorname{argmin}_{a \in D(F)\cap(a_0+X_s)} \|F(a) - \hat u\|_Y^2 + \alpha\|a - a_0\|_s^2, \tag{17} \]

where $s \ge 0$ and $a_0 \in X$ is an initial guess. In the following, the balancing principle is going to be applied in a similar way as in the previous section and rates of convergence are going to be proven. These results are obtained under the supplementary condition on the tail of the probability distribution of the estimator $\hat u$,
\[ P\left(\omega : \|\hat u - E\hat u\|_Y^2 \ge \tau\, E\|\hat u - E\hat u\|_Y^2\right) \le l_1\exp(-l_2\tau^\eta), \tag{18} \]

for any $\tau$ greater than 1 such that $l_2(\tau-1)^\eta > l_3$, where $l_1$, $l_2$ and $l_3$ are positive universal constants and $0 < \eta \le 1$. We choose the set of regularization parameters as
\[ M := \left\{\alpha_1 = \beta^2,\ \alpha_j = \beta^2(l^2)^{j-1},\ j = 2,\dots,\min\{k : \alpha_k \ge 1\}\right\} \]

with l>1 and the estimator a^+=a^αis+ withis+=max{i:a^i-a^j042·k21+(l2ln1β)1ηβαj-p2(s+p),j=1,2,,i}.

The rates of convergence for this estimator can be computed using similar techniques as in [Citation31,Citation32].

Theorem 2.6:

If the assumptions of Theorem 2.2 and the conditions (Equation16) and (Equation18) are fulfilled then, for all $\beta \le \exp\left(-l_3\frac{p+q}{2q}\right)$, we have the following error bound for $\hat a_+$:
\[ E\|\hat a_+ - a^\dagger\|_0^2 \le K_+\,\beta^{\frac{2q}{p+q}}\left(\ln\frac{1}{\beta}\right)^{\frac{1}{\eta}\frac{q}{p+q}}, \]

where K+ is a constant which depends on c+, l1, l2 and CL.

Proof For $\tau$ fulfilling the conditions of Theorem 2.6 we denote
\[ \Omega_\tau = \left\{\|\hat u - u\|_Y^2 \ge \tau\, E\|\hat u - u\|_Y^2\right\} \tag{19} \]

as the set of extreme values of the $Y$-error of the estimator $\hat u$. Due to the bias-variance decomposition
\[ E\|\hat u - u\|_Y^2 = E\|\hat u - E\hat u\|_Y^2 + \|E\hat u - u\|_Y^2 \]

we can rewrite the inequality (Equation19) asu^-Eu^Y2+2u^-Eu^,Eu^-uY+Eu^-uY2τEu^-Eu^Y2+τEu^-uY2.

From Cauchy-Schwarz inequality, the set inclusion holdsΩτ{u^-Eu^Y2+1τ-1u^-Eu^Y2+(τ-1)Eu^-uY2+Eu^-uY2τEu^-Eu^Y2+τEu^-uY2}

leading toΩτu^-Eu^Y2(τ-1)Eu^-Eu^Y2.

From (Equation18) we get
\[ P(\Omega_\tau) \le l_1\exp\left(-l_2(\tau-1)^\eta\right) \tag{20} \]

for all $\tau \ge 1$ such that $l_2(\tau-1)^\eta > l_3$. With the help of conditional expectation we can write
\[ E\|\hat a_+ - a^\dagger\|_0^2 = P(\Omega_\tau)\,E\left[\|\hat a_+ - a^\dagger\|_0^2 \,\middle|\, \Omega_\tau\right] + P(C\Omega_\tau)\,E\left[\|\hat a_+ - a^\dagger\|_0^2 \,\middle|\, C\Omega_\tau\right] \]

where CΩτ=Ω\Ωτ is the complementary set of Ωτ in Ω. We apply the deterministic convergence result for Lepskii choice of the regularization parameter (see Theorem 2.5) on the set CΩτ for τ=1+l22qp+qln1β1η and use the exponential inequality (Equation20) to bound P(Ωτ) for noise levels βexp(-l3p+q2q) . This yieldsEa^+-a02l1exp-(τ-1)ηl2λ22ΛCL+1-P(Ωτ)c+(τβ2)qp+ql1λ22ΛCLexp-2qp+qln1β+c+β21+l22qp+qln1β1ηqp+ql1λ22ΛCLβ2qp+q+c+β2qp+q1+l22qp+qln1βqη(p+q)K+β2qp+qln1βqη(p+q).

Remark 2:

Like in the deterministic setting, we can choose a simplified version for the choice of the regularization parameter that reduces the computational cost. We select the estimator a^=a^αi withi=maxi=1,2,m:a^i-a^i-1042k21+(l2ln1β)1ηβαi-1-p2(s+p)

In the same way as in the Theorem 2.6 we can prove that the rate of convergence is equal to(21) E(a^-a02)Kβ2qp+qln1β1ηqp+q.(21)

3. Applications

3.1. Hammerstein operator

In this section, we compute the overall rate of convergence of the two-step method for an inverse problem originating in a non-linear integral equation and inspired by [Citation33]. We consider here the Hammerstein operator
\[ F : H^1 \to L^2, \qquad F(a)(x) = \int_0^x \Phi(a(t))\,dt, \tag{22} \]

where $\Phi$ is a function in the space of Hölder continuous functions $C^{2,1}(\mathbb{R})$ with bounded second derivative,
\[ |\Phi''(t)| \le \|\Phi''\|_{C^{0,1}} \le K \tag{23} \]

for all $t \in \mathbb{R}$, and $H^1$ denotes the Sobolev space of index 1 of functions on the interval [0, 1]. Our inverse problem is to determine $a$ from the knowledge of $F(a)$. Conditions similar to Assumptions 2.1 were used in [Citation34] to study the Landweber iterative method in Hilbert scales. The following properties of the operator $F$ and its Fréchet derivative are going to be useful for choosing an appropriate Hilbert scale. The detailed proofs of the following theoretical results can be found in [Citation35].
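As an illustration (with assumed, not prescribed, choices of Φ and a), the operator (22) can be evaluated on a grid by the cumulative trapezoidal rule; Φ = arctan satisfies Φ' > 0 and has a bounded, Lipschitz second derivative in the spirit of (23) and (26):

import numpy as np

def hammerstein(a_vals, t_grid, Phi=np.arctan):
    # F(a)(x) = integral_0^x Phi(a(t)) dt, approximated by cumulative trapezoids
    integrand = Phi(a_vals)
    dt = np.diff(t_grid)
    increments = 0.5 * dt * (integrand[1:] + integrand[:-1])
    return np.concatenate(([0.0], np.cumsum(increments)))

t = np.linspace(0.0, 1.0, 201)
u_exact = hammerstein(1.0 + t ** 2, t)   # data F(a) for the illustrative choice a(t) = 1 + t^2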

Lemma 3.1:

$F$ is a continuous, compact, weakly closed and Fréchet differentiable operator with the Fréchet derivative
\[ (F'(a)h)(t) = \int_0^t \Phi'(a(s))\,h(s)\,ds. \tag{24} \]

The adjoint of the Fréchet derivative is given by
\[ F'(a)^\ast g = B^{-1}\left(\Phi'(a(\cdot))\int_\cdot^1 g(s)\,ds\right). \tag{25} \]

Here B:vH2:v(0)=v(1)=0L2 is defined by Bv:=-v+v.

We assume from now on that
\[ \Phi'(s) > 0 \quad\text{for all}\ s \in \mathbb{R}. \tag{26} \]

In this case, the operator F is injective and we determine the set R(F(a)) in order to choose a Hilbert scale which fulfills Assumptions 2.1. Let the linear integral operator A:L2L2 be defined as (Af)(x)=0xf(t)dt. F(a) is the composition of three operators: A, the multiplication operator MΦa:L2L2,MΦa(h)=(Φa)h and B-1. The operator A:L2L2,A(g)(s)=s1g(t)dt has the rangeR(A)=uH1:u(1)=0

andM(Φa)(R(A))=uH1:u(1)=0

as Φa is in H1. Note that due to (Equation26) there exists γ>1 such that1γΦ(a(s))γ

for all s[0,1] since a<. Since the restrictionB~:=B|D(B)H3:vH3:v(0)=v(1)=0H1

is an isomorphism by elliptic regularity results and since (Bv)(1)=0 is equivalent to v(1)=v(1) we finally getR(F(a))=R(B-1MΦaA)=vH3:v(0)=v(1)=0,v(1)=v(1).

As the operators A,M(Φa) and B-1 are injective, we have that F(a) is bijective between L2 and R(F(a)) and F(a)wH3wL2 for all wL2.

We would like to have a Hilbert scale XssR with the following properties: it holds (27a) X0=H1(27a) (or higher regularity of X0) to be able to show Assumptions 2.1, and we needR(F(a))

to be a member of the Hilbert scale, e.g.(27b) X2=wH3:w(0)=w(1)=0,w(1)=w(1)w,vX2=B~w,B~vH1=01w(3)v(3)+3wv+3wv+wvdx.(27b) Unfortunately, the restriction B~:R(F(a))H1 is not self-adjoint and we need to consider the operator defined underneath as the generator of the Hilbert scale below.

Proposition 3.2:

The operatorL:D(L)=wH5:w(0)=w(1)=w(3)(0)=0,w(1)=w(1)H1Lw:=w-2w+w(4).

is self-adjoint in H1, and(27c) D(L12)=R(F(a)).(27c)

It follows thereof that the Hilbert scale Xs generated by the operator L14 has the desired properties (Equation27a), (Equation27b) and (Equation27c).

Theorem 3.3:

For the Hilbert scale Xs defined before and under the supplementary conditions (Equation23) and (Equation26), Assumption 2.1(B) is fulfilled with p=2. Moreover, Assumptions 2.1(C) is also satisfied under the same conditions if the diameter of D(F)(a0+Xs) in X0-norm is smaller than a ρ>0.

Proof Assumptions 2.1(C) is equivalent to(28) F(a)-F(a)YX2CLa-a0λ22Λ(28)

for all aBρ(a)a0+Xs for some ρ>0 such that BρD(F). It follows, by (Equation25) that(F(a)-F(a))hX2=(Φ(a(·))-Φ(a(·)))·1h(s)ds0(Φa)-(Φa)H1·1h(s)dsH1K(2+a0)a-a0hL2.

The last inequality holds because(Φa)-(Φa)H1=Φa-ΦaL2+(Φa)(a)-(Φa)aL2Ka-aL2+(Φa)((a)-a)+(a)(Φa-Φa))L2Ka-aL2+K(a)-aL2+Φa-Φa(a)L22Ka-aH1+ΦC0,1a-aaH12Ka-aH1+ΦC0,1aH1a-aH1aH1K(2+a0)a-aH1.

Hence, the diameter ρ should be smaller than λ24ΛK(2+a0).

Proposition 3.4:

If $a^\dagger \in H^{q+1}$ and $\Phi \in C^{\lfloor q\rfloor+1,1}$, where $\lfloor q\rfloor$ is the biggest integer smaller than $q$, then $u \in H^{q+2}$.

Proof Since the composition of $\Phi$ and $a^\dagger$ belongs to $H^{q+1}$ (see [Citation36]), it follows immediately that $u \in H^{q+2}$ for $q \ge 2$.

Now, the overall rate of convergence can be computed.

Corollary 3.5:

In the case of the stochastic noise model (Equation3), assuming ϵ white noise with Gaussian or regular non-Gaussian probability distribution, if the conditions of Theorem 3.3 hold true and if s2, q[s,2s+2], a-a0Xq, and aHq+1, then u can be estimated such that the rate of convergence (Equation4) holds true with βσq+2q+2+1/2. If the white noise has Gaussian distribution and the condition (Equation16) holds, then we get the rates of convergenceEa^-a02=Oσqq+2+1/2log1σ3q2(q+2)

while for regular concave log-probability distributions we obtainEa^-a02=Oσqq+2+1/2log1σ2qq+2.

Proof The first statement follows from Proposition 3.4, from asymptotic equivalence in Le Cam’s sense between nonparametric regression with regular error distribution i.e. presenting a locally asymptotic stochastic expansion (for exact definition of the regularity conditions see [Citation12]) and the white noise with drift, and from the theory of nonparametric regression (see [Citation11]). The smoothness of u is Sobolev of order q+2 , and rates of convergence on Sobolev balls were computed for nonparametric regression problems in the last reference and they correspond to the statement from the Corollary 3.5. The second statement is a consequence of the rates of convergence computed in [Citation37], Theorem 2.6 and Lemma 4.3.

3.2. Reconstruction of a reaction coefficient

To illustrate the convergence behaviour of our method, we consider in this section a parameter identification problem as a nonlinear operator equation and compute over-all rates of convergence for the Lepskii estimator based on theoretical properties already proved in [Citation15]. Furthermore, we compare its numerical and theoretical rates of convergence for different smoothness conditions with the help of Monte-Carlo simulations.

3.2.1. Theoretical rates of convergence

The direct problem is modeled by the differential equation
\[ -\Delta u(x) + a(x)u(x) = f(x),\ x \in \Omega, \qquad u|_{\partial\Omega} = g, \]

where $f$ and $g$ are infinitely smooth, and the domain $\Omega$ is chosen to be (0, 1) for simplicity. For a given bound $\gamma > 0$ we introduce the set
\[ D(F) = \left\{a \in L^2((0,1)) : 0 \le a(t) \le \gamma,\ t \in [0,1]\right\} \tag{29} \]

and we formulate the inverse problem of identifying the parameter $a \in D(F)$ knowing $u_a = u_a(x)$ as an operator equation with the help of the parameter-to-solution operator
\[ F : D(F) \to L^2(0,1), \qquad F(a) := u_a. \tag{30} \]

The Hilbert scale $(X_s)_{s\in\mathbb{R}}$ will be generated by the square root $B^{1/2}$ of the positive, self-adjoint operator $B$ defined by
\[ B : H_0^1(0,1)\cap H^2(0,1) \to L^2(0,1), \qquad Bv := -\Delta v + v. \]

We notice that the elements of this Hilbert scale with integer index are subsets of Sobolev spaces with Dirichlet boundary conditions on the even derivatives:
\[ X_1 = H_0^1((0,1)), \quad X_2 = H^2((0,1))\cap H_0^1((0,1)), \quad X_3 = \{v \in H^3((0,1)) : v, \Delta v \in H_0^1((0,1))\}. \]

In [Citation15] it was proven that the operator $F$ together with this choice of Hilbert scale fulfills the Assumptions 2.1 for $p = 2$ and $s = 2$ when the condition
\[ c_u = \inf_{x \in (0,1)} u(x) > 0 \tag{31} \]

and the smallness condition on $D(F)$ hold, and that the regularity of the parameter $a^\dagger \in H^q((0,1))$ with $q > 1/2$ determines the smoothness of the exact data $u \in H^{q+2}((0,1))$. Hence, we can also obtain an overall convergence rate result in this case.
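For the numerical experiments the parameter-to-solution map (30) has to be discretized; the following finite-difference sketch on (0,1) is illustrative (the function name solve_reaction and the concrete choices of a, f and the boundary values are assumptions, not taken from the paper):

import numpy as np

def solve_reaction(a, f, g0, g1, n=200):
    # Second-order finite differences for -u'' + a*u = f on (0,1), u(0)=g0, u(1)=g1,
    # i.e. a discrete version of the forward map F(a) = u_a of (30) on a uniform grid.
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    xi = x[1:-1]                                   # interior nodes
    main = 2.0 / h ** 2 + a(xi)                    # tridiagonal system
    off = -np.ones(n - 2) / h ** 2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = f(xi) + 0.0
    rhs[0] += g0 / h ** 2
    rhs[-1] += g1 / h ** 2
    u = np.empty(n + 1)
    u[0], u[-1] = g0, g1
    u[1:-1] = np.linalg.solve(A, rhs)
    return x, u

x, u = solve_reaction(lambda t: 1.0 + t, lambda t: np.ones_like(t), 1.0, 1.0)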

Corollary 3.6:

In the case of the stochastic noise model (Equation3), assuming ϵ white noise with Gaussian or regular non-Gaussian probability distribution, for the Hilbert scale defined above Xs and under the supplementary condition (Equation32) and if s2, q[s,2s+2], a-a0Xq, and aHq((0,1)), then u can be estimated such that the rate of convergence (Equation4) holds true with βσq+1q+1+1/2. If the white noise has Gaussian distribution and the conditions (Equation16) holds, then we get the rates of convergenceEa^-a02=Oσqq+2+1/2log1σ3q2(q+2)

while for regular concave log-probability distributions we obtainEa^-a02=Oσqq+2+1/2log1σ2qq+2.

4. Numerical simulations

In this final section we illustrate, by numerical simulations, the use of the balancing principle and the influence of the smoothness of the parameter $a^\dagger$ on the rate of convergence of $\hat u$ in the two-step method for the reconstruction of a reaction coefficient. First, we choose the local polynomial estimator as the pre-smoothing method and review some of its properties in the framework of a discretized version of the problem (Equation3). A new result about the tail behaviour (Equation18) of the local polynomial estimator in a general, non-Gaussian noise model (see [Citation38] for a review of the literature on this subject) is also proved. Moreover, the values of $\lambda$ and $\Lambda$ for the parameter-to-solution operator (Equation31) and the empirical overall rate of convergence for the two-step method are computed, and the rates are illustrated in our graphics for different smoothnesses of $a^\dagger$.


Figure 2. Exact parameter a† (full line - q=2.5, dashed line - q=3.5, dotted line - q=4.5), and its corresponding empirical and theoretical (dashed - dotted) rate of convergence on a log-log scale

4.1. Tail behaviour of local polynomial estimator

In the first step, we consider the nonparametric regression problem of estimating $u$ from the data $(Y_i, X_i)_{i=1,\dots,n}$ such that
\[ Y_i = u(X_i) + \epsilon_i, \qquad i = 1,\dots,n, \tag{32} \]

where $(\epsilon_i)_{i=1,\dots,n}$ are independent, centred random variables with $\mathrm{var}(\epsilon_i) \le v$ and $X_i \in [0,1]$, $i = 1,\dots,n$, is a deterministic design. In the case of Gaussian nonparametric regression, when the $\epsilon_i$, $i = 1,\dots,n$, are independent, identically $N(0,1)$ distributed, this is the discretized version of the Gaussian white noise model in (Equation3) with $\epsilon$ the standard Wiener process on [0,1] and the stochastic noise level $\sigma = 1/\sqrt{n}$ (see e.g. [Citation11]).

As usual in this setting, we assume that $u$ belongs to the Hölder class $\Sigma(l, M)$ on $[0,1]$, i.e. it is an $\lfloor l\rfloor$ times differentiable function whose derivative $u^{(\lfloor l\rfloor)}$ satisfies
\[ \left|u^{(\lfloor l\rfloor)}(x) - u^{(\lfloor l\rfloor)}(x')\right| \le M|x - x'|^{l - \lfloor l\rfloor}, \qquad x, x' \in [0,1], \]

where $\lfloor l\rfloor$ is the integer part of $l > 0$. For the case that the co-domain $Y$ of the operator $F$ is the Sobolev space $W^{l+1,2}$, the compact embedding of the Hilbert space $W^{l+1,2}$ into the Hölder space $\Sigma(l, M)$ follows from Morrey's inequality (as in [Citation39]).

The regression function $u$ can be locally approximated by
\[ u(x+f) \approx \sum_{k=0}^{\lfloor l\rfloor} \frac{u^{(k)}(x)}{k!}\,f^k \]

for $f > 0$ small enough. The vector $\theta(x) = \left(u^{(k)}(x)f^k\right)_{k=0,\dots,\lfloor l\rfloor}$ can be estimated by means of the local polynomial estimator of order $d = \lfloor l\rfloor$.

Definition 4.1:

The local polynomial estimator of order $d$ of $\theta(x)$ is defined by
\[ \hat\theta_{nh}(x) = \operatorname{argmin}_{\theta\in\mathbb{R}^{d+1}} \frac{1}{nh}\sum_{i=1}^n \left(Y_i - \theta^T V\!\left(\frac{X_i-x}{h}\right)\right)^2 K\!\left(\frac{X_i-x}{h}\right), \tag{33} \]

where $V(u) = \left(1, u, \frac{u^2}{2!}, \dots, \frac{u^d}{d!}\right)$, $K : \mathbb{R} \to \mathbb{R}$ is a kernel, i.e. an integrable function satisfying $\int K(u)\,du = 1$, $h > 0$ is the bandwidth and $d \ge 0$ is an integer. The local polynomial estimator of order $d$ of $u(x)$ is the first element of the vector $\hat\theta_{nh}(x)$ and is denoted by $\hat u_{nh}(x) = V^T(0)\,\hat\theta_{nh}(x)$.

The existence and uniqueness of the solution of the optimization problem (Equation34) were proven in [Citation11] under the following hypotheses, which we will assume to hold from now on.

Assumptions 4.2:

(A)

There exists an $n_0 \in \mathbb{N}$ such that the eigenvalues of the Hessian matrix of the objective function in the optimization problem (Equation34),
\[ B_{hx} = \left(\frac{\partial^2}{\partial\theta_p\partial\theta_q}\,\frac{1}{nh}\sum_{i=1}^n \left(Y_i - \theta^T V\!\left(\frac{X_i-x}{h}\right)\right)^2 K\!\left(\frac{X_i-x}{h}\right)\right)_{p,q=0,\dots,d} = \frac{2}{nh}\sum_{i=1}^n V\!\left(\frac{X_i-x}{h}\right) V^T\!\left(\frac{X_i-x}{h}\right) K\!\left(\frac{X_i-x}{h}\right), \]
are bounded from below uniformly in $x$ by a $\lambda_0 > 0$ for any $n \ge n_0$.

(B)

The frequency of the points $(X_i)_{i=1,\dots,n}$ in any interval $[a,b] \subset [0,1]$ is uniformly bounded with respect to $n$ by $a_0\max\left(b-a, \frac{1}{n}\right)$ for a positive constant $a_0$.

(C)

$K$ is bounded by a constant $K_{\max}$ on $\mathbb{R}$ and has compact support $\mathrm{supp}(K) \subset [-1,1]$.

Consequently, the local polynomial estimators of order $d$ of $\theta(x)$ as well as of $u(x)$ exist and are given by
\[ \hat\theta_{nh}(x) = \frac{1}{nh}\sum_{i=1}^n Y_i\,B_{hx}^{-1} V\!\left(\frac{X_i-x}{h}\right) K\!\left(\frac{X_i-x}{h}\right), \qquad \hat u_{nh}(x) = \frac{1}{nh}\sum_{i=1}^n Y_i\,V^T(0)\,B_{hx}^{-1} V\!\left(\frac{X_i-x}{h}\right) K\!\left(\frac{X_i-x}{h}\right). \tag{34} \]
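For concreteness, the weighted least-squares form (33) of the local polynomial estimator can be implemented directly (it coincides with the closed form (34) whenever the local Gram matrix is invertible); the helper name local_poly and the Epanechnikov kernel are illustrative choices:

import numpy as np
from math import factorial

def local_poly(x0, X, Y, h, d=2,
               kernel=lambda t: np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)):
    # weighted least squares of (33): minimize sum_i (Y_i - theta^T V((X_i-x0)/h))^2 K((X_i-x0)/h)
    t = (X - x0) / h
    w = kernel(t)                                            # kernel weights
    V = np.vstack([t ** k / factorial(k) for k in range(d + 1)]).T
    G = (V * w[:, None]).T @ V                               # local Gram matrix (needs enough design points in the window)
    theta = np.linalg.solve(G, (V * w[:, None]).T @ Y)
    return theta[0]                                          # u_hat(x0) is the first component of theta

# noisy regression data as in (32) and pointwise estimation of u
n = 500
X = np.linspace(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + 0.1 * np.random.default_rng(1).normal(size=n)
u_hat = np.array([local_poly(x0, X, Y, h=0.1) for x0 in X])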

We focus in the following on the tail behaviour of the local polynomial estimator for a fixed bandwidth h under different noise conditions.

Lemma 4.3:

Let us assume that the vector of errors ϵ=ϵii=1,,n has a log-concave, continuous distribution on Rn i.e. the logarithm of this probability density is concave on the set where its logarithm is defined. If the Assumptions 4.2 hold, we havePu^nh-Eu^nhL2(0,1)2tEu^nh-Eu^nhL2(0,1)2e·exp-tC1e2

where C1 is an universal constant, h>0 and te2C12.

Moreover, if the errors ϵii=1,,n are independent, centred Gaussian random variables, then it holdsPu^nh-Eu^nhL2(0,1)2tEu^nh-Eu^nhL2(0,1)2exp-C2·t

where C2 is an universal constant.

Proof From (Equation35) we can writeu^nh-Eu^nhL2(0,1)2=011nhi=1nϵiVt(0)Bhx-1VXi-xhKXi-xh2dx=1(nh)2i,j=1nϵiϵj01Vt(0)Bhx-1VXi-xhKXi-xhVt(0)Bhx-1VXj-xh·KXj-xhdx.

This is a positive polynomial of order two with symmetric coefficients in the random variables ϵi,i=1,,n. In this case, results regarding the higher moments are already available (see Theorem 7 in [Citation40]). For the sake of completeness we reproduce this result in Appendix 1 (see Lemma 1) and use it to bound the moments of the polynomialf(ϵ)=1(nh)2i,j=1nϵiϵj01Vt(0)Bhx-1VXi-xhKXi-xhVt(0)Bhx-1VXj-xh·KXj-xhdx

asEf(ϵ)q2C1q2qEf(ϵ)q2

where d=r=2, q2 and C1 is an universal constant. From the Markov inequality we get(35) Pf(ϵ)tEf(ϵ)C1q2tq(35)

Let us choose now the integer q such that 2teC1-1q2teC1. For values of t such that q2 we can apply the previous inequality and obtainPf(ϵ)tEf(ϵ)C1q2tqexp-qe·exp-tC1e2.

The second inequality in Lemma 4.3, leading to Chernoff bound like estimates, is a direct consequence of hypercontractivity inequalities for Gaussian Hilbert spaces (see Theorem 6.7 in [Citation41]).

Remark 3:

The log-concave probability distributions include the normal, exponential, logistic, chi-square, chi and Laplace distributions. A survey of these can be found in [Citation42]. The stochastic error is assumed to be a zero mean stochastic process with bounded covariance operator. Therefore, considering a general class of distributions larger than the Gaussian one fits into the general setting of our problem. Due to Lemma 4.3, the overall rate of convergence of the two-step method will depend on the class of probability distributions to which the discretized noise term belongs.

Remark 4:

Results about the asymptotic optimality of the local polynomial estimator with the regularization parameter chosen by cross-validation were proved in [Citation43,Citation44]. Details about the practical implementation of this method can be found in [Citation45].

Remark 5:

An adaptive local polynomial estimator based on a particular implementation of Lepskii scheme and its convergence properties for local and global risks for a wide range of characterizations of smoothness and accuracy measures are presented in [Citation37]. The rates of convergence depend on the smoothness in the scale of Sobolev spaces of the exact data u, the degree of the polynomial and the index of the Lebesgue norm of the global risk, and are optimal or optimal up to a logarithmic term, as expected from [Citation46,Citation47].

Remark 6:

A data-driven approach leading to adaptive estimation of both polynomial order and the bandwidth with the help of cross-validation can be found in [Citation48]. This method is uniformly consistent for a large class of functions u and reaches the optimal rate for the case of correctly specified parametric model i.e. when u is a polynomial whose order does not depend on sample size. Nevertheless, an order optimal approach aiming to adaptively tailor the degree and the tuning parameter of the local polynomial estimator is still an open problem.

4.2. Rate of convergence of the Tikhonov estimator for the balancing principle

In the following we compute and illustrate the empirical rate of convergence of the Tikhonov estimator for the balancing principle in the stochastic setting, with the local polynomial estimator as pre-smoothing method. Since theoretical and numerical results imply that the Lepskii constant $k_2$ depends on the chosen application, we present a result concerning the computation of this constant for the inverse problem defined in (Equation31). To this aim, the values of $\Lambda$ and $\lambda$ are given in the following lemma under general conditions on the direct operator.

Lemma 4.4:

If the assumptions from Corollary 3.6 are fulfilled and the supplementary conditions
\[ f \ge 0, \quad g > 0, \quad f - a\cdot g \ge 0 \tag{36} \]

hold, then we haveλ=1max(1,γ)infΩgΛ=1+3·γ2·1+Dπ21+Dπ2·12D2·supΩf+supΩg·D

where D is the length of the interval Ω.

Proof The Fréchet derivative of the operator defined in (Equation31) isF(a)h=T(a)-1(-h·ua),T(a):H2(Ω)H01(Ω)L2(Ω)

where T(a)=ua (see [Citation1]).The following bounds follow from the Gelfand triple structure of the Hilbert scale and the Banach algebra property of the Hilbert space H2(Ω) (see [Citation15] and references within):1max(1,γ)11uah-2F(a)h01+3·γ2·k2C·uaX0h-2

where k and C are Sobolev constants corresponding to embedding of H01(Ω), respectively H2(Ω)H01(Ω) into L2(Ω). From [Citation49,Citation50] it follows that these constants take the values k=1+Dπ and C=1+Dπ2.

Under the supplementary conditions (3.6), classical methods in partial differential equations lead to following bounds on the solution of the direct probleminfΩgua(x)-V(x)·supΩf+supΩginfΩgua(x)12D2·supΩf+supΩg

where V(x)=12|x-x0|2-D2,xΩ for an x0Ω (see e.g. Exercise 4 in [Citation51]). Since1max(1,γ)11ua=infΩuq=infΩg

anduaX0D·12D2·supΩf+supΩg

we get λ and Λ from Lemma 3.6.

Remark 7:

The computation of the Lepskii constant is also feasible for smooth domains $\Omega$ in $\mathbb{R}^n$, $n > 1$. For example, results about the value of the Poincaré constant for convex domains in $\mathbb{R}^n$, $n > 1$, are available in [Citation52].

In our numerical simulations, the unknown parameter $a^\dagger$ is chosen as a B-spline of order 2, 3 or 4, which corresponds to smoothness up to $H^{2.5}$, $H^{3.5}$ and $H^{4.5}$ respectively, and its upper bound $\gamma$ is equal to 1. The use of larger values of $\gamma$ does not change the asymptotic rates of convergence, hence the method is robust with respect to $\gamma$. The exact data $u$ has smoothness up to $H^{4.5}$, $H^{5.5}$ and $H^{6.5}$ corresponding to the chosen B-splines. By the Sobolev embedding theorem, this implies a Hölder condition of order up to 4, 5 and 6.

The Gaussian white noise was discretized such that we obtain an inverse regression problem with equidistant deterministic design $X_1,\dots,X_n \in (0,1)$ and normally distributed errors with variance $\sigma^2 = 0.01$. Since the discrete regression design is asymptotically equivalent to the Gaussian white noise model (Equation3) (see e.g. [Citation11]), we consider the increasing sample sizes $n \in \{1000, 2512, 6310, 15849, 39811, 100000\}$ instead of letting the noise level go to 0 in the theoretical noise model, and 100 simulations for each sample size.

Remark 8:

Theorem 2.6 holds for noise levels that fulfill the inequality (Equation16). In our numerical example, we can explicitly formulate the relation between the noise level β and the q-norm of the difference between the initial guess and exact solution c. From Lemma 4.4, we get the values λ=1 and Λ=3.4 since γ=1, D=1, supΩf=1 and supΩg=1 for a choice of the exact data u(x)=1+x·(1-x). The values of the constants k1 and k2 can be accordingly derived ask1=54·2q4c2+597.42·23(q+2)16c1.5+195.98·2q2c+136.199·2q2ck2=143.24

and, for a choice of the initial value for the regularization parameter α1=β2, (Equation16) becomesk1·2β2(q-s)p+2+13·k2

For q=2.5 we have83.28·c2+941.53·c1.5+490.948·c+151.77·c2β14+1429.73

and the sufficient conditions for this inequality to hold are c121+2β140.26 and c1. Hence, for a noise level of 0.01, our initial guess should be as close in the q-norm to the exact solution as 0.16 to ensure that condition (Equation16) holds. Moreover, the quality of the reconstruction is very sensitive to the choice of the a-priori guess.

To illustrate the rates of convergence for the two-step method with Lepskii choice of the regularization parameter, we used in the first step of the algorithm a local polynomial estimator with Gaussian kernel for the direct regression problem as stated in (Equation33) and an a-priori chosen bandwidth. We chose the quadratic local polynomial estimator, since the linear estimator shows larger bias in regions with higher curvature, and we varied the values of the bandwidth by using a constant $c_h$, chosen from the grid between 0.1 and 1 with step size 0.1, as a factor to the a-priori value $n^{-1/5}$. The empirical values of the rates of convergence $\hat\beta$ were computed by leave-1-out cross-validation and used in the second step of the method. An exhaustive introduction to the application of the local polynomial estimator and its challenges can be found in [Citation45]. The tail behaviour of this estimator fulfills the condition in Theorem 2.6 as a consequence of Lemma 4.3. For a polynomial of degree 2, it follows from Theorem 5.10 and Remark 5.13 in [Citation41] that $l_2 = 1/e$ and $\eta = 1$.
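A sketch of this data-driven first step, using a small weighted polynomial fit with Gaussian kernel weights as the local estimator: the bandwidth factor c_h is selected on the grid 0.1, 0.2, ..., 1 by leave-one-out cross-validation, and the resulting empirical risk is used as the noise level beta_hat for the second step (the concrete function names are illustrative assumptions):

import numpy as np

def local_fit(x0, X, Y, h, degree=2):
    # local polynomial fit at x0 with Gaussian kernel weights (cf. (33));
    # np.polyfit weights multiply the residuals, hence the square root
    w = np.sqrt(np.exp(-0.5 * ((X - x0) / h) ** 2))
    coeffs = np.polyfit(X - x0, Y, degree, w=w)
    return coeffs[-1]                      # constant term = estimate of u(x0)

def loo_bandwidth(X, Y, c_grid=np.arange(0.1, 1.05, 0.1), degree=2):
    # leave-one-out cross-validation over the bandwidths h = c_h * n^(-1/5);
    # the minimal empirical risk stands in for the noise level beta_hat of step one
    n = len(X)
    risks = []
    for c in c_grid:
        h = c * n ** (-0.2)
        errs = [(Y[i] - local_fit(X[i], np.delete(X, i), np.delete(Y, i), h, degree)) ** 2
                for i in range(n)]
        risks.append(np.mean(errs))
    best = int(np.argmin(risks))
    return c_grid[best], risks[best]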

In the second step, we applied the balancing principle to choose the regularization parameter as studied in Theorem 2.6. The Lepskii factor $k$ was chosen from a grid of values between 0.1 and 0.05 by minimizing the empirical mean square error, and the Lepskii constant is computed as $k\cdot\left(1 + \left(l_2\log\frac{1}{\hat\beta}\right)^{\frac{1}{\eta}}\right)\hat\beta$. An overview of the constants used for simulations and computation is given in Table 1.

We present in Figure 2 the convergence rates for the chosen smoothnesses of the exact solution, considering the sample sizes 1000, 2512, 6310, 15849, 39811, 100000. The logarithm of the empirical estimation of the expected squared error $E\|\hat a_+ - a^\dagger\|_0^2$ of $\hat a_+$ is plotted over the logarithm of the empirical values of the rates of convergence of the local polynomial estimator for the three choices of $a^\dagger$ shown in Figure 2. The empirical rates of convergence are plotted in full, dashed and dotted lines according to the three different smoothness conditions, while the theoretical ones, corresponding to a line with slope $\frac{q}{2+q}$, are illustrated by the dashed-dotted lines. Starting with the sample size $n = 15849$, the linear trend dominates, approaching asymptotically the theoretical linear slopes.

Table 1. Overview of the constants.

Acknowledgements

The author would like to thank the Isaac Newton Institute for Mathematical Sciences for support and hospitality during the programme ‘Variational methods and effective algorithms for imaging and vision’ when work on this paper was also undertaken.

Additional information

Funding

This work was supported by the Research Training Group 1023: Identification in Mathematical Models that was founded in 2004 at the Faculty of Mathematics at the University of Göttingen and was supported by the DFG (Deutsche Forschungsgemeinschaft) and by the Isaac Newton Institute for Mathematical Sciences (EPSRC [grant number EP/K032208/1], [grant number EP/R014604/]).

Notes

No potential conflict of interest was reported by the author.

References

  • Engl HW , Hanke M , Neubauer A . Regularization of inverse problems. Vol. 375, Mathematics and its applications. Dordrecht: Kluwer Academic Publishers Group; 1996.
  • Sudakov VN , Halfin LA . A statistical approach to the correctness of the problems of mathematical physics. Dokl Akad Nauk SSSR. 1964;157:1058–1060.
  • Bissantz N , Hohage T , Munk A , et al . Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J Numer Anal. 2007;45(6):2610–2636.
  • Mathé P , Pereverzev S . Optimal discretization of inverse problems in Hilbert scales. Regularization and self-regularization of projection methods. SIAM J Numer Anal. 2001;38:1999–2021.
  • O’Sullivan F . Convergence characteristics of methods of regularization estimators for nonlinear operator equations. SIAM J Numer Anal. 1990;27(6):1635–1649.
  • Lukas MA . Robust generalized cross-validation for choosing the regularization parameter. Inverse Probl. 2006;22(5):1883–1902.
  • Wahba G . Spline models for observational data. Vol. 59, CBMS-NSF regional conference series in applied mathematics. Philadelphia (PA): Society for Industrial and Applied Mathematics (SIAM); 1990.
  • Bissantz N , Hohage T , Munk A . Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Probl. 2004;20(6):1773–1789.
  • Klann E , Ramlau R . Regularization by fractional filter methods and data smoothing. Inverse Probl. 2008;24(2):025018, 26.
  • Klann E , Maaß P , Ramlau R . Two-step regularization methods for linear inverse problems. J Inverse Ill-Posed Probl. 2006;14(6):583–607.
  • Tsybakov AB . Introduction to nonparametric estimation. Springer series in statistics. New York (NY): Springer; 2009. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats.
  • Grama I , Nussbaum M . Asymptotic equivalence for nonparametric regression. Math Methods Statist. 2002;11(1):1–36.
  • Meister A , Reiß M . Asymptotic equivalence for nonparametric regression with non-regular errors. Probab Theory Rel Fields. 2013;155(1–2):201–229.
  • Werner F , Hohage T . Convergence rates in expectation for Tikhonov-type regularization of inverse problems with Poisson data. Inverse Probl. 2012;28(10):104004, 15.
  • Hohage T , Pricop M . Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise. Inverse Probl Imag. 2008;2(2):271–290.
  • Natterer F . Error bounds for Tikhonov regularization in Hilbert scales. Appl Anal. 1984;18(1–2):29–37.
  • Neubauer A . Tikhonov regularization of nonlinear ill-posed problems in Hilbert scales. Appl Anal. 1992;46(1–2):59–72.
  • Mair BA , Ruymgaart FH . Statistical inverse estimation in Hilbert scales. SIAM J Appl Math. 1996;56(5):1424–1444.
  • Nussbaum M , Pereverzev S . The degree of ill-posedness in stochastic and deterministic noise models. Berlin: WIAS; 1999. Technical Report.
  • Goldenshluger A , Pereverzev S . On adaptive inverse estimation of linear functionals in Hilbert scales. Bernoulli. 2003;9:783–807.
  • Tautenhahn U . Error estimates for regularized solutions of nonlinear ill-posed problems. Inverse Probl. 1994;10(2):485–500.
  • Polzehl J , Spokoiny V . Error estimates for regularization methods in Hilbert scales. SIAM J Numer Anal. 1996;33(6):2120–2130.
  • Lepskiĭ OV . A problem of adaptive estimation in Gaussian white noise. Teor Veroyatnost i Primenen. 1990;35(3):459–470.
  • Mathé P , Pereverzev S . Geometry of linear ill-posed problems in variable Hilbert scales. Inverse Probl. 2003;19(3):789–803.
  • Lu S , Pereverzev S , Ramlau R . An analysis of Tikhonov regularization for nonlinear ill-posed problems under general smoothness assumptions. Inverse Probl. 2007;23(1):217–230.
  • Bauer F , Hohage T . A Lepskij-type stopping rule for regularized Newton methods. Inverse Probl. 2005;21(6):1975–1991.
  • Polzehl J , Spokoiny VG . Adaptive weights smoothing with applications to image restoration. J R Stat Soc Ser B Stat Methodol. 2000;62(2):335–354.
  • Kreĭn SG , Petunin JI . Scales of Banach spaces. Uspehi Mat Nauk. 1966;21(2 (128)):89–168.
  • Pereverzev S , Schock E . On the adaptive selection of the parameter in regularization of ill-posed problems. SIAM J Numer Anal. 2005;43(5):2060–2076. electronic.
  • Mathé P . The Lepskiĭ principle revisited. Inverse Probl. 2006;22(3):L11–L15.
  • Bauer F , Pereverzev S . Regularization without preliminary knowledge of smoothness and error behaviour. Eur J Appl Math. 2005;16(3):303–317.
  • Bauer F . An alternative approach to the oblique derivative problem in potential theory [PhD thesis]. Aachen: Universität Kaiserslautern; 2004.
  • Neubauer A . On Landweber iteration for nonlinear ill-posed problems in Hilbert scales. Numer Math. 2000;85(2):309–328.
  • Egger H , Neubauer A . Preconditioning Landweber iteration in Hilbert scales. Numer Math. 2005;101(4):643–662.
  • Pricop M . Tikhonov regularization in Hilbert scales for nonlinear statistical inverse problems. Uelvesbüll: Der Andere Verlag; 2007.
  • Renardy M , Rogers RC . An introduction to partial differential equations. 2nd ed. Vol. 13, Texts in applied mathematics. New York (NY): Springer-Verlag; 2004.
  • Goldenshluger A , Nemirovski A . On spatially adaptive estimation of nonparametric regression. Math Methods Statist. 1997;6(2):135–170.
  • Avery M . Literature review for local polynomial regression, unpublished manuscript; 2013. Available from: http://www4.ncsu.edu/mravery/AveryReview2.pdf
  • Adams RA , Fournier J . Sobolev spaces. 2nd ed. Vol. 140, Pure and applied mathematics. Oxford: Elsevier Science; 2003.
  • Carbery A , Wright J . Distributional and lq norm inequalities for polynomials over convex bodies in R n . Math Res Lett. 2001;8(3):233–248.
  • Janson S . Gaussian Hilbert spaces. Vol. 129, Cambridge tracts in mathematics. Cambridge: Cambridge University Press; 1997.
  • Bagnoli M , Bergstrom T . Log-concave probability and its applications. Econom Theory. 2005;26(2):445–469.
  • Xia Y , Li WK . Asymptotic behavior of bandwidth selected by the cross-validation method for local polynomial fitting. J Multivariate Anal. 2002;83(2):265–287.
  • Li Q , Racine J . Cross-validated local linear nonparametric regression. Statist Sinica. 2004;14(2):485–512.
  • Fan J , Gijbels I . Local polynomial modelling and its applications. Vol. 66, Monographs on statistics and applied probability. London: Chapman & Hall; 1996.
  • Lepskiĭ OV . Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor Veroyatnost i Primenen. 1991;36(4):645–659.
  • Lepskiĭ OV . Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Adaptive estimates. Teor Veroyatnost i Primenen. 1992;37(3):468–481.
  • Hall PG , Racine JS . Infinite order cross-validated local polynomial regression. Department of Economics Working Papers McMaster University;2013.
  • Chua SK , Wheeden RL . A note on sharp 1-dimensional Poincaré inequalities. Proc Amer Math Soc. 2006;134(8):2309–2316. electronic.
  • Talenti G . Best constant in Sobolev inequality. Ann Mat Pura Appl (4). 1976;110:353–372.
  • Taylor ME . Partial differential equations. I. Vol. 115, Applied mathematical sciences. New York (NY): Springer-Verlag; 1996.
  • Payne LE , Weinberger HF . An optimal Poincaré inequality for convex domains. Arch Rat Mech Anal. 1960;1960(5):286–292.

Appendix 1

Proof of the Theorem 2.2:

As aα is a solution of the minimization problem for the Tikhonov functional, the following inequality holdsF(aα)-uY2+αaα-a0s2F(a)-uY2+αa-a0s2=αa-a0s2.

It follows thatF(aα)-uY2+αaα-as2α(a-a0s2+aα-as2-aα-a0s2)=2αa-a0,a-aαs.

From Assumption 2.1 C it follows thatF(aα)=F(a)+F(a)(aα-a)+rα,withrαYλ24Λaα-a-p.

Replacing the difference F(aα)-F(a) in the previous inequality we obtain(37) F(a)(aα-a)Y2+rαY2+2F(a)(aα-a),rα+αaα-as22αa-a0,a-aαs.(37)

Using now Assumption 2.1 B we get thatλ2aα-a-p2+αaα-as22Λλ24Λaα-a-p2+2αa-a0,a-aαs.

Since we assumed that a-a0Xq we have thatλ22aα-a-p2+αaα-as22cαaα-a2s-q

where c=a-a0q. We apply the interpolation inequality (Equation7) for -p2s-qs and obtain(A1) λ22aα-a-p2+αaα-as22cαaα-a-p-s+qs+paα-as2s-q+ps+p.(A1)

Then it holdsaα-a-p24cλ2αaα-a-p-s+qs+paα-as2s-q+ps+p

and we can writeaα-a-p24cλ22(s+p)3s+2p-qα2(s+p)3s+2p-qaα-as2(2s-q+p)3s+2p-q.

From (EquationA2) we have thataα-as22caα-as2s-q+ps+p4cλ2q-s3s+2p-qα-s+q3s+2p-qaα-as(-s+q)(2s-q+p)(s+p)(3s+2p-q)=2q+2p+s3s+2p-qc2(s+p)3s+2p-qαλ2-s+q3s+2p-qaα-as2(2s-q+p)3s+2p-q

and it followsaα-as22q+2p+ss+pc2αλ2-s+qs+p

and(A2) aα-a-p24cλ2α2(s+p)3s+2p-q2q+2p+ss+p2s-q+p3s+2p-qc22s-q+p3s+2p-qαλ2-s+qs+p2s+p-q3s+2p-q22s+3p+qs+pc2αλ2p+qs+p.(A2)

Using the interpolation inequality (Equation7) the desired rate of convergence is obtainedaα-a0aα-a-psp+saα-asps+p21+q2(s+p)cαλ212qp+sK·α12qp+s

with K a constant that depends on s,p,q,λ,c.

Proof of the Theorem 2.3:

Let aαδ be a solution of the optimization problem (Equation8) and let us consider the Lagrange-Euler equation for the optimization problem (Equation8). From classical results for calculus of variations we know that the first variation of the Tikhonov functional Jα defined as the right hand side in (Equation8), is null for every direction hXs and, since we have Fréchet differentiability in aαδ it follows that Jα(aαδ)h=0 for all hXs. This meansF(aαδ)-Y,F(aαδ)hY+αaαδ-a0,hs=0,hXs.

Due to the same arguments, since aα is the solution of the minimization problem (Equation11) and from the Lagrange-Euler equation it followsF(aα)-u,F(aα)hY+αaα-a0,hs=0,hXs.

As aαδ-a0,aα-a0Xs we have that aαδ-aαXs and we can choose the direction h=aαδ-aα in both equalities. It follows thatF(aαδ)-Y,F(aαδ)(aαδ-aα)Y+αaαδ-a0,aαδ-aαs=0, F(aα)-u,F(aα)(aαδ-aα)Y+αaα-a0,aαδ-aαs=0.

Taking the difference between these two relations we obtain(A3) F(aαδ)-Y,F(aαδ)(aαδ-aα)Y-F(aα)-u,F(aα)(aαδ-aα)Y+αaαδ-aα,aαδ-aαs=0.(A3)

From Assumption 2.1 we can write the Taylor polynomial asF(aα)=F(a)+F(a)(aα-a)+01F(a+t(aα-a))-F(a)(aα-a)dtF(aαδ)=F(a)+F(a)(aαδ-a)+01F(a+t(aαδ-a))-F(a)(aαδ-a)dt

and we getF(aαδ)-F(aα)=F(a)(aαδ-aα)-01F(a+t(aα-a))-F(a)(aα-a)dt+01F(a+t(aαδ-a))-F(a)(aαδ-aα)dt+01F(a+t(aαδ-a))-F(a)(aα-a)dt.

We denote the three integrals as I1,I2 respectively I3 and, from Assumption 2.1, we can bound their norms byI101F(a+t(aα-a))-F(a)YX-paα-a-pdt01CLtaα-a0aα-a-pdt=CL2aα-a0aα-a-pλ24Λaα-a-pI201F(a+t(aαδ-a))-F(a)YX-paαδ-aα-pdt01CLtaαδ-a0aαδ-aα-pdtCL2aαδ-a0aαδ-aα-pλ24Λaαδ-aα-pI301F(a+t(aαδ-a))-F(a)YX-paα-a-pdtI301CLtaαδ-a0aα-a-pdtCL2aαδ-a0aα-a-pλ24Λaα-a-p. We write now the relation (EquationA4) asF(aαδ)-F(aα)+F(aα)-u+u-Y,(F(aαδ)-F(a))(aαδ-aα)Y-F(aα)-u,(F(aα)-F(a))(aαδ-aα)Y+F(aαδ)-F(aα)-Y+u,F(a)(aαδ-aα)Y+αaαδ-aαs2=0.

Using the Taylor formula we getF(a)(aαδ-aα)-I1+I2+I3+F(a)(aα-a)+I1+u-Y,(F(aαδ)-F(a))(aαδ-aα)Y+αaαδ-aαs2-F(a)(aα-a)+I1,(F(aα)-F(a))(aαδ-aα)Y+F(a)(aαδ-aα)-I1+I2+I3+u-Y,F(a)(aαδ-aα)Y=0.

It followsαaαδ-aαs2+F(a)(aαδ-aα)Y2=F(a)(aα-a)+I1,(F(aα)-F(a))(aαδ-aα)Y--I1+I2+I3+u-Y,F(a)(aαδ-aα)Y-F(a)(aαδ-aα)+I2+I3+F(a)(aα-a)+u-Y,(F(aαδ)-F(a))(aαδ-aα)Y.

From Cauchy-Schwarz inequality, Assumptions 2.1 and the norm estimates for I1, I2 and I3 we haveλ2aαδ-aα-p2+αaαδ-aαs2Λaα-a-p+λ24Λaα-a-pλ24Λaαδ-aα-p+Λaαδ-aα-pλ24Λaα-a-p+λ24Λaαδ-aα-p+λ24Λaα-a-p+δ+δ+Λaαδ-aα-p+λ24Λaαδ-aα-p+Λaα-a-p+λ24Λaα-a-pλ24Λaαδ-aα-p.

We can rewrite this inequality as(A4) λ2aαδ-aα-p2+αaαδ-aαs2λ22+λ416Λ2aαδ-aα-p2+λ2+λ48Λ2aα-a-paαδ-aα-p+Λ+λ24Λδaαδ-aα-p(A4)

As λΛ we have λ416Λ2λ216 and it holds7λ216aαδ-aα-p2+αaαδ-aαs2λ2+λ48Λ2aα-a-paαδ-aα-p+Λ+λ24Λδaαδ-aα-p

To simplify the notation, we introduce a positive constant kR that will vary from line to line in the following paragraphs. We have thataαδ-aα-p21671+λ28Λ2aα-a-paαδ-aα-p+167Λλ2+14Λδaαδ-aα-pkaα-a-paαδ-aα-p+Λλ2+14Λδaαδ-aα-p

and this implies(A5) aαδ-aα-pkaα-a-p+Λλ2+14Λδkc2αλ2p+q2(s+p)+Λλ2+14Λδ(A5)

since we computed the rates of convergence for the -p-norm of the approximation error aα-a-p2 in (EquationA3). Using (EquationA3) and (EquationA6) in (EquationA5) we notice thataαδ-aαs21αaαδ-aα-p98λ2aα-a-p+54Λδ1αkc2αλ2p+q2(s+p)+Λλ2+14Λδ94λ2c2αλ2p+q2(s+p)+54Λδkc22αλ2q-sp+s+cΛδ2λ2p+q2(p+s)α-2s-p+q2(p+s)+Λ2λ2+14δ2α

From (EquationA6) it holdsaαδ-aα-p2kc22αλ2p+qs+p+Λλ2+14Λ2δ2kc22αλ2p+qs+p+Λλ2+14Λ2δ2

We apply now the interpolation inequality (Equation7) to obtainaαδ-aα02aαδ-aα-p2sp+saαδ-aαs2pp+s

Since (x+y)rxr+yr for all x, y positive and 0r1, we get the following inequalityaαδ-aα02kc2ss+p2αλ2s(p+q)(s+p)2+Λλ2+14Λ2ss+pδ2ss+p·k2αλ2p(q-s)(p+s)2+Λps+p2λ2p(p+q)2(p+s)2αp(-2s-p+q)2(p+s)2δps+p+Λ2λ2+14ps+pδ2αps+p

meaning thataαδ-aα02kc22αλ2qp+s+α2sq+pq-p22(s+p)22λ2(p+q)(2s+p)2(s+p)2δps+p+Λ2λ2+141+ss+pΛ-2ss+pα-ps+pδ2+Λλ2+14Λ2ss+pc2ps+p2αλ2p(q-s)(s+p)2δ2ss+p+cps+pΛps+pΛλ2+14Λ2ss+p2λ2p(p+q)2(s+p)2αp(-2s-p+q)2(s+p)2δ1+ss+p+c2ss+pΛ2λ2+14ps+p2λ2s(p+q)(s+p)2αqs-p2(s+p)2δ2ps+p

where we used the inequalities ps+p0.5 and ss+p1 to bound the numerical constants. We now apply four times the Young inequalityabakk+bll,

which holds for all a,b>0, 1<k,l< and 1k+1l=1 with ak=δ2α-ps+p. For the second term we take b=α2qs+pq2(s+p)2, k=2(s+p)p and l=2(s+p)2s+p, for the fourth b=αpq(s+p)2, l=s+pp and k=s+ps, for the fifth b=αpq2(s+p)2, k=2(s+p)2s+p and l=2(s+p)p, for the sixth term b=αpq(s+p)2, k=s+ps and l=s+pp. This yields(A6) aαδ-aα02k1αqp+s+k1+k2δ2α-ps+p(A6)

where and k2=12Λ2λ2+141+ss+pΛ-2ss+p can be easily computed from the second term and from an upper bound for k, and k1 is a constant that depends on s,p,q,λ,Λ,c such that k1K.

For the sake of completeness, we state now the following result from [Citation40] that was applied in our article in the proof of Lemma 4.3.

Lemma A.1:

Let p:RnR be a polynomial of degree at most d. Suppose 0<rq< and μ is a log-concave probability measure on Rn. Then there is an absolute constant C such thatp(x)pqdμ(x)1qCmax(q,1)max(r,1)p(x)rddμ(x)1r.
