
A forward–backward penalty scheme with inertial effects for monotone inclusions. Applications to convex bilevel programming

Pages 1855-1880 | Received 13 Jan 2018, Accepted 23 Nov 2018, Published online: 11 Dec 2018

ABSTRACT

We investigate a forward–backward splitting algorithm of penalty type with inertial effects for finding the zeros of the sum of a maximally monotone operator, a cocoercive operator and the normal cone to the set of zeros of another cocoercive operator. Weak ergodic convergence is obtained for the iterates, provided that a condition expressed via the Fitzpatrick function of the operator describing the underlying set of the normal cone is satisfied. Under strong monotonicity assumptions, strong convergence of the sequence of generated iterates is proved. As a particular instance we consider a convex bilevel minimization problem with the sum of a non-smooth and a smooth function in the upper level and another smooth function in the lower level. We show that in this context weak non-ergodic and strong convergence can also be achieved under inf-compactness assumptions for the involved functions.

AMS SUBJECT CLASSIFICATIONS:

1. Introduction and preliminaries

1.1. Motivation and problem formulation

During the last couple of years one can observe in the optimization community an increasing interest in numerical schemes for solving variational inequalities expressed as monotone inclusion problems of the form (1) $0\in Ax+N_M(x)$, where $\mathcal{H}$ is a real Hilbert space, $A:\mathcal{H}\rightrightarrows\mathcal{H}$ is a maximally monotone operator, $M:=\operatorname{argmin}h$ is the set of global minima of the proper, convex and lower semicontinuous function $h:\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$, and $N_M:\mathcal{H}\rightrightarrows\mathcal{H}$ is the normal cone operator of the set $M$. The article [Citation1] was the starting point for a series of papers [Citation2–12] addressing this topic or related ones. All these papers share the common feature that the proposed iterative schemes use penalization strategies, namely, they evaluate the penalized function $h$ via its gradient, in case the function is smooth (see, for instance, [Citation3]), and via its proximal operator, in case it is non-smooth (see, for instance, [Citation4]).

Weak ergodic convergence has been obtained in [Citation3,Citation4] under the hypothesis: (2) for all $p\in\operatorname{Ran}N_M$, $\sum_{n\geq1}\lambda_n\beta_n\left[h^*\!\left(\frac{p}{\beta_n}\right)-\sigma_M\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$, with $(\lambda_n)_{n\geq1}$ the sequence of step sizes, $(\beta_n)_{n\geq1}$ the sequence of penalty parameters, $h^*:\mathcal{H}\to\overline{\mathbb{R}}$ the Fenchel conjugate function of $h$, $\sigma_M$ the support function of $M$, and $\operatorname{Ran}N_M$ the range of the normal cone operator $N_M:\mathcal{H}\rightrightarrows\mathcal{H}$. Let us mention that (2) is the discretized counterpart of a condition introduced in [Citation1] for continuous-time non-autonomous differential inclusions.
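To make condition (2) concrete, the following sketch evaluates its partial sums on a toy instance of our own choosing (not taken from the paper): $\mathcal{H}=\mathbb{R}$, $h(x)=\frac12x^2$ (so $h^*(q)=\frac12q^2$, $M=\{0\}$, $\sigma_M\equiv0$), $\lambda_n=1/n$ and $\beta_n=n$. The summand then reduces to $p^2/(2n^2)$, so the series converges (to $p^2\pi^2/12$).

```python
import math

def summand(n, p, lam, beta, h_conj, sigma_M):
    """One term lambda_n * beta_n * [h*(p/beta_n) - sigma_M(p/beta_n)] of condition (2)."""
    return lam(n) * beta(n) * (h_conj(p / beta(n)) - sigma_M(p / beta(n)))

# Toy data (our illustrative choice): h(x) = x^2/2, M = {0}.
h_conj = lambda q: 0.5 * q * q      # Fenchel conjugate of x^2/2
sigma_M = lambda q: 0.0             # support function of {0}
lam = lambda n: 1.0 / n             # step sizes: sum lam = +inf, sum lam^2 < +inf
beta = lambda n: float(n)           # penalty parameters

p = 1.0
partial = sum(summand(n, p, lam, beta, h_conj, sigma_M) for n in range(1, 100001))
# Here each summand equals p^2 / (2 n^2), so the series sums to p^2 * pi^2 / 12.
print(partial)
```

The same routine can be reused to test other pairs $(\lambda_n,\beta_n)$ against a given $h$.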

One motivation for studying numerical algorithms for monotone inclusions of type (1) comes from the fact that, when $A=\partial f$ is the convex subdifferential of a proper, convex and lower semicontinuous function $f:\mathcal{H}\to\overline{\mathbb{R}}$, they furnish iterative methods for solving bilevel optimization problems of the form (3) $\min_{x\in\mathcal{H}}\{f(x):x\in\operatorname{argmin}h\}$. Among the applications where bilevel programming problems play an important role we mention the modelling of Stackelberg games, the determination of Wardrop equilibria for network flows, convex feasibility problems [Citation13], domain decomposition methods for PDEs [Citation14], image processing problems [Citation6], and optimal control problems [Citation4].

Later on, in [Citation7], the following monotone inclusion problem, which turned out to be more suitable for applications, has been addressed in the same spirit of penalty algorithms: (4) $0\in Ax+Dx+N_M(x)$, where $A:\mathcal{H}\rightrightarrows\mathcal{H}$ is a maximally monotone operator, $D:\mathcal{H}\to\mathcal{H}$ is a cocoercive operator and the constraint set $M$ is the set of zeros of another cocoercive operator $B:\mathcal{H}\to\mathcal{H}$. The provided algorithm of forward–backward type evaluates the operator $A$ by a backward step and the two single-valued operators by forward steps. For the convergence analysis, (2) has been replaced by a condition formulated in terms of the Fitzpatrick function associated with the operator $B$, which we will also use in this paper. In [Citation5], several particular situations in which this new condition is fulfilled have been provided.

The aim of this work is to endow the forward–backward penalty scheme for solving (4) from [Citation7] with inertial effects, which means that the new iterate is defined in terms of the previous two iterates. Inertial algorithms have their roots in the time discretization of second-order differential systems [Citation15]. They can accelerate the convergence of the iterates when minimizing a differentiable function [Citation16] and the convergence of the objective function values when minimizing the sum of a convex non-smooth and a convex smooth function [Citation17,Citation18]. Moreover, as emphasized in [Citation19], see also [Citation20], algorithms with inertial effects may detect optimal solutions of minimization problems which cannot be found by their non-inertial variants. In recent years, inertial algorithms have attracted considerable interest (see, for instance, [Citation8,Citation9,Citation15,Citation17,Citation20–32]).

We prove weak ergodic convergence of the sequence generated by the inertial forward–backward penalty algorithm to a solution of the monotone inclusion problem (4), under reasonable assumptions on the sequences of step sizes, penalty and inertial parameters. When the operator $A$ is assumed to be strongly monotone, we also prove strong convergence of the generated iterates to the unique solution of (4).

In Section 3, we address the minimization of the sum of a convex non-smooth and a convex smooth function with respect to the set of minimizers of another convex and smooth function. Besides the convergence results obtained from the general case, we achieve weak non-ergodic and strong convergence statements under inf-compactness assumptions for the involved functions. The weak non-ergodic theorem is a useful alternative to the one in [Citation9], where a similar statement has been obtained for the inertial forward–backward penalty algorithm with constant inertial parameter, under assumptions which are quite complicated and hard to verify (see also [Citation11,Citation12]).

1.2. Notations and preliminaries

In this subsection we introduce some notions and basic results which we will use throughout this paper (see [Citation33–35]). Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and associated norm $\|\cdot\|=\sqrt{\langle\cdot,\cdot\rangle}$.

For a function $\Psi:\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$, we denote by $\operatorname{Dom}\Psi=\{x\in\mathcal{H}:\Psi(x)<+\infty\}$ its effective domain and say that $\Psi$ is proper if $\operatorname{Dom}\Psi\neq\emptyset$ and $\Psi(x)>-\infty$ for all $x\in\mathcal{H}$. The conjugate function of $\Psi$ is $\Psi^*:\mathcal{H}\to\overline{\mathbb{R}}$, $\Psi^*(u)=\sup_{x\in\mathcal{H}}\{\langle x,u\rangle-\Psi(x)\}$. The convex subdifferential of $\Psi$ at the point $x\in\mathcal{H}$ is the set $\partial\Psi(x)=\{p\in\mathcal{H}:\langle y-x,p\rangle\leq\Psi(y)-\Psi(x)\ \forall y\in\mathcal{H}\}$, whenever $\Psi(x)\in\mathbb{R}$. By convention, $\partial\Psi(x)=\emptyset$ if $\Psi(x)\in\{\pm\infty\}$.

Let $M$ be a non-empty subset of $\mathcal{H}$. The indicator function of $M$, denoted by $\delta_M:\mathcal{H}\to\overline{\mathbb{R}}$, takes the value $0$ on $M$ and $+\infty$ otherwise. The convex subdifferential of the indicator function is the normal cone of $M$, that is, $N_M(x)=\{p\in\mathcal{H}:\langle y-x,p\rangle\leq0\ \forall y\in M\}$ if $x\in M$, and $N_M(x)=\emptyset$ otherwise. Notice that for $x\in M$ we have $p\in N_M(x)$ if and only if $\sigma_M(p)=\langle x,p\rangle$, where $\sigma_M=\delta_M^*$ is the support function of $M$.

For an arbitrary set-valued operator $A:\mathcal{H}\rightrightarrows\mathcal{H}$ we denote by $\operatorname{Gr}A=\{(x,v)\in\mathcal{H}\times\mathcal{H}:v\in Ax\}$ its graph, by $\operatorname{Dom}A=\{x\in\mathcal{H}:Ax\neq\emptyset\}$ its domain, by $\operatorname{Ran}A=\{v\in\mathcal{H}:\exists x\in\mathcal{H}\text{ with }v\in Ax\}$ its range and by $A^{-1}:\mathcal{H}\rightrightarrows\mathcal{H}$ its inverse operator, defined by $(v,x)\in\operatorname{Gr}A^{-1}$ if and only if $(x,v)\in\operatorname{Gr}A$. We also use the notation $\operatorname{Zer}A=\{x\in\mathcal{H}:0\in Ax\}$ for the set of zeros of the operator $A$. We say that $A$ is monotone if $\langle x-y,v-w\rangle\geq0$ for all $(x,v),(y,w)\in\operatorname{Gr}A$. A monotone operator $A$ is said to be maximally monotone if there exists no proper monotone extension of the graph of $A$ on $\mathcal{H}\times\mathcal{H}$. Let us mention that if $A$ is maximally monotone, then $\operatorname{Zer}A$ is a convex and closed set [Citation33, Proposition 23.39]. We refer to [Citation33, Section 23.4] for conditions ensuring that $\operatorname{Zer}A$ is non-empty. If $A$ is maximally monotone, then one has the following characterization of the set of its zeros: (5) $z\in\operatorname{Zer}A$ if and only if $\langle u-z,y\rangle\geq0$ for all $(u,y)\in\operatorname{Gr}A$. The operator $A$ is said to be $\gamma$-strongly monotone with $\gamma>0$ if $\langle x-y,v-w\rangle\geq\gamma\|x-y\|^2$ for all $(x,v),(y,w)\in\operatorname{Gr}A$. If $A$ is maximally monotone and strongly monotone, then $\operatorname{Zer}A$ is a singleton, thus non-empty [Citation33, Corollary 23.27].

The resolvent of $A$, $J_A:\mathcal{H}\rightrightarrows\mathcal{H}$, is defined by $J_A:=(\operatorname{Id}+A)^{-1}$, where $\operatorname{Id}:\mathcal{H}\to\mathcal{H}$ denotes the identity operator on $\mathcal{H}$. If $A$ is maximally monotone, then $J_A:\mathcal{H}\to\mathcal{H}$ is single-valued and maximally monotone [Citation33, Proposition 23.7, Corollary 23.10]. For an arbitrary $\gamma>0$ we have the following identity [Citation33, Proposition 23.18]: $J_{\gamma A}+\gamma J_{\gamma^{-1}A^{-1}}\circ\gamma^{-1}\operatorname{Id}=\operatorname{Id}$. We denote by $\Gamma(\mathcal{H})$ the family of proper, convex and lower semicontinuous extended real-valued functions defined on $\mathcal{H}$. When $\Psi\in\Gamma(\mathcal{H})$ and $\gamma>0$, we denote by $\operatorname{prox}_{\gamma\Psi}(x)$ the proximal point with parameter $\gamma$ of $\Psi$ at $x\in\mathcal{H}$, which is the unique optimal solution of the optimization problem $\inf_{y\in\mathcal{H}}\left\{\Psi(y)+\frac{1}{2\gamma}\|y-x\|^2\right\}$. Notice that $J_{\gamma\partial\Psi}=(\operatorname{Id}+\gamma\partial\Psi)^{-1}=\operatorname{prox}_{\gamma\Psi}$, thus $\operatorname{prox}_{\gamma\Psi}:\mathcal{H}\to\mathcal{H}$ is a single-valued operator fulfilling the so-called Moreau decomposition formula: $\operatorname{prox}_{\gamma\Psi}+\gamma\operatorname{prox}_{\gamma^{-1}\Psi^*}\circ\gamma^{-1}\operatorname{Id}=\operatorname{Id}$. The function $\Psi:\mathcal{H}\to\overline{\mathbb{R}}$ is said to be $\gamma$-strongly convex with $\gamma>0$ if $\Psi-\frac{\gamma}{2}\|\cdot\|^2$ is a convex function. This property implies that $\partial\Psi$ is $\gamma$-strongly monotone.
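The Moreau decomposition can be sanity-checked on a one-dimensional example. Take $\Psi=|\cdot|$, whose proximal map is soft-thresholding and whose conjugate $\Psi^*=\delta_{[-1,1]}$ has the projection onto $[-1,1]$ as proximal map for every parameter; the concrete functions below are our own illustrative choices, not part of the paper.

```python
def prox_abs(x, gamma):
    """prox of gamma * |.| at x: soft-thresholding."""
    return max(abs(x) - gamma, 0.0) * (1.0 if x >= 0 else -1.0)

def prox_conj(x, gamma):
    """prox of gamma * delta_[-1,1] at x: projection onto [-1,1] (parameter-free)."""
    return min(max(x, -1.0), 1.0)

gamma = 0.5
for x in [-2.3, -0.4, 0.0, 0.7, 3.1]:
    # Moreau decomposition: prox_{gamma Psi}(x) + gamma * prox_{Psi*/gamma}(x/gamma) = x
    lhs = prox_abs(x, gamma) + gamma * prox_conj(x / gamma, 1.0 / gamma)
    print(x, lhs)
```

Each printed pair should agree, confirming the decomposition for this pair of conjugate functions.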

The Fitzpatrick function [Citation36] associated with a monotone operator $A$ is defined as $\varphi_A:\mathcal{H}\times\mathcal{H}\to\overline{\mathbb{R}}$, $\varphi_A(x,u):=\sup_{(y,v)\in\operatorname{Gr}A}\{\langle x,v\rangle+\langle y,u\rangle-\langle y,v\rangle\}$, and it is a convex and lower semicontinuous function. For insights into the outstanding role played by the Fitzpatrick function in relating convex analysis with the theory of monotone operators we refer to [Citation33,Citation34,Citation37–39] and the references therein. If $A$ is maximally monotone, then $\varphi_A$ is proper and fulfills $\varphi_A(x,u)\geq\langle x,u\rangle$ for all $(x,u)\in\mathcal{H}\times\mathcal{H}$, with equality if and only if $(x,u)\in\operatorname{Gr}A$. Notice that if $\Psi\in\Gamma(\mathcal{H})$, then $\partial\Psi$ is a maximally monotone operator and it holds $(\partial\Psi)^{-1}=\partial\Psi^*$. Furthermore, the following inequality is true (see [Citation37]): (6) $\varphi_{\partial\Psi}(x,u)\leq\Psi(x)+\Psi^*(u)$ for all $(x,u)\in\mathcal{H}\times\mathcal{H}$. We present in the following some statements that will be essential when carrying out the convergence analysis. Let $(x_n)_{n\geq0}$ be a sequence in $\mathcal{H}$ and $(\lambda_n)_{n\geq1}$ a sequence of positive real numbers. The sequence of weighted averages $(z_n)_{n\geq1}$ is defined for every $n\geq1$ as (7) $z_n:=\frac{1}{\tau_n}\sum_{k=1}^{n}\lambda_kx_k$, where $\tau_n:=\sum_{k=1}^{n}\lambda_k$.
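The weighted (ergodic) averages in (7) can be updated incrementally, since $\tau_nz_n=\tau_{n-1}z_{n-1}+\lambda_nx_n$; a minimal sketch (the sample sequences are our illustrative choices):

```python
def ergodic_averages(points, steps):
    """Running weighted averages z_n = (1/tau_n) * sum_k lambda_k x_k with tau_n = sum_k lambda_k."""
    averages, weighted_sum, tau = [], 0.0, 0.0
    for x, lam in zip(points, steps):
        weighted_sum += lam * x   # tau_n * z_n
        tau += lam                # tau_n
        averages.append(weighted_sum / tau)
    return averages

xs = [4.0, 2.0, 1.0]
lams = [1.0, 0.5, 0.25]
print(ergodic_averages(xs, lams))  # z_1 = 4, z_2 = (4 + 1)/1.5, z_3 = (4 + 1 + 0.25)/1.75
```

In a vector-valued setting the same update applies coordinate-wise.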

Lemma 1.1

Opial-Passty

Let $Z$ be a non-empty subset of $\mathcal{H}$ and assume that the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists for every element $u\in Z$. If every sequential weak cluster point of $(x_n)_{n\geq0}$, respectively $(z_n)_{n\geq1}$, lies in $Z$, then the sequence $(x_n)_{n\geq0}$, respectively $(z_n)_{n\geq1}$, converges weakly to an element of $Z$ as $n\to+\infty$.

The following two results can be found in [Citation5,Citation7].

Lemma 1.2

Let $(\theta_n)_{n\geq0}$, $(\xi_n)_{n\geq1}$ and $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$. If there exists $n_0\geq1$ such that $\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})-\xi_n+\delta_n$ for all $n\geq n_0$, and $\alpha$ is such that $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$, then the following statements are true:

  1. $\sum_{n\geq1}[\theta_n-\theta_{n-1}]_+<+\infty$, where $[s]_+:=\max\{s,0\}$;

  2. the limit $\lim_{n\to+\infty}\theta_n$ exists;

  3. the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

The following result follows from Lemma 1.2, applied with $\alpha_n:=0$ and $\theta_n:=\rho_n-\rho$ for all $n\geq1$, where $\rho$ is a lower bound of $(\rho_n)_{n\geq1}$.

Lemma 1.3

Let $(\rho_n)_{n\geq1}$ be a sequence in $\mathbb{R}$ which is bounded from below, and let $(\xi_n)_{n\geq1}$, $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$. If there exists $n_0\geq1$ such that $\rho_{n+1}-\rho_n\leq-\xi_n+\delta_n$ for all $n\geq n_0$, then the following statements are true:

  1. the sequence $(\rho_n)_{n\geq1}$ is convergent;

  2. the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

The following result, which will be useful in this work, shows that statement (ii) in Lemma 1.3 can also be obtained when $(\rho_n)_{n\geq1}$ is not bounded from below, provided it has a particular form.

Lemma 1.4

Let $(\rho_n)_{n\geq1}$ be a sequence in $\mathbb{R}$ and $(\xi_n)_{n\geq1}$, $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$ and $\rho_n:=\theta_n-\alpha_n\theta_{n-1}+\chi_n$ for all $n\geq1$, where $(\theta_n)_{n\geq0}$, $(\chi_n)_{n\geq1}$ are sequences in $\mathbb{R}_+$ and there exists $\alpha$ such that $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$. If there exists $n_0\geq1$ such that (8) $\rho_{n+1}-\rho_n\leq-\xi_n+\delta_n$ for all $n\geq n_0$, then the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

Proof.

We fix an integer $\bar N\geq n_0$, sum up the inequalities in (8) for $n=n_0,n_0+1,\ldots,\bar N$ and obtain (9) $\rho_{\bar N+1}-\rho_{n_0}\leq-\sum_{n=n_0}^{\bar N}\xi_n+\sum_{n=n_0}^{\bar N}\delta_n\leq\sum_{n\geq1}\delta_n<+\infty$. Hence the sequence $(\rho_n)_{n\geq1}$ is bounded from above. Let $\bar\rho>0$ be an upper bound of this sequence. For all $n\geq1$ it holds $\theta_n-\alpha\theta_{n-1}\leq\theta_n-\alpha_n\theta_{n-1}+\chi_n=\rho_n\leq\bar\rho$, from which we deduce that (10) $-\rho_n\leq-\theta_n+\alpha\theta_{n-1}\leq\alpha\theta_{n-1}$. By induction we obtain for all $n\geq n_0+1$ (11) $\theta_n\leq\alpha\theta_{n-1}+\bar\rho\leq\alpha^{n-n_0}\theta_{n_0}+\bar\rho\sum_{k=1}^{n-n_0}\alpha^{k-1}\leq\alpha^{n-n_0}\theta_{n_0}+\frac{\bar\rho}{1-\alpha}$. Then inequality (9) combined with (10) and (11) leads to (12) $\sum_{n=n_0}^{\bar N}\xi_n\leq\rho_{n_0}-\rho_{\bar N+1}+\sum_{n=n_0}^{\bar N}\delta_n\leq\rho_{n_0}+\alpha\theta_{\bar N}+\sum_{n\geq1}\delta_n\leq\rho_{n_0}+\alpha^{\bar N-n_0+1}\theta_{n_0}+\frac{\alpha\bar\rho}{1-\alpha}+\sum_{n\geq1}\delta_n<+\infty$. We let $\bar N$ converge to $+\infty$ and obtain that $\sum_{n\geq1}\xi_n<+\infty$.

2. The general monotone inclusion problem

In this section we address the following monotone inclusion problem.

Problem 2.1

Let $\mathcal{H}$ be a real Hilbert space, $A:\mathcal{H}\rightrightarrows\mathcal{H}$ a maximally monotone operator, $D:\mathcal{H}\to\mathcal{H}$ an $\eta$-cocoercive operator with $\eta>0$, $B:\mathcal{H}\to\mathcal{H}$ a $\mu$-cocoercive operator with $\mu>0$, and let $M:=\operatorname{Zer}B$. The monotone inclusion problem to solve reads $0\in Ax+Dx+N_M(x)$.

The following forward–backward penalty algorithm with inertial effects for solving Problem 2.1 will be the focus of our investigations in this paper.

When $D=0$ and $B=\nabla h$, where $h:\mathcal{H}\to\mathbb{R}$ is a convex and differentiable function with $\mu^{-1}$-Lipschitz continuous gradient for $\mu>0$ and fulfilling $\min h=0$, Problem 2.1 recovers the monotone inclusion problem addressed in [Citation3, Section 3], and Algorithm 2.2 can be seen as an inertial version of the iterative scheme considered therein. When $B=0$, we have $M=\mathcal{H}$, hence $N_M(x)=\{0\}$ for all $x\in\mathcal{H}$, and Algorithm 2.2 is nothing else than the inertial version of the classical forward–backward algorithm (see, for instance, [Citation33,Citation40]).
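From (14) below, one step of Algorithm 2.2 reads $x_{n+1}=J_{\lambda_nA}\big(x_n+\alpha_n(x_n-x_{n-1})-\lambda_n(Dx_n+\beta_nBx_n)\big)$. The following sketch runs this iteration on a toy instance in $\mathbb{R}^2$ chosen by us purely for illustration: $A=0$ (so $J_{\lambda_nA}=\operatorname{Id}$), $Dx=x-c$ with $c=(2,3)$, and $Bx=(0,x_2)$, which is $1$-cocoercive with $M=\operatorname{Zer}B$ the horizontal axis; the unique solution is then $(c_1,0)$. The parameter sequences are one admissible choice, not prescribed by the paper.

```python
# Inertial forward-backward penalty step (toy instance, our choices):
#   x_{n+1} = J_{lam_n A}( x_n + alpha_n (x_n - x_{n-1}) - lam_n (D x_n + beta_n B x_n) )
c = (2.0, 3.0)
D = lambda x: (x[0] - c[0], x[1] - c[1])     # gradient of (1/2)||x - c||^2, 1-cocoercive
B = lambda x: (0.0, x[1])                    # Id minus projection onto the x1-axis, 1-cocoercive
resolvent = lambda y, lam: y                 # J_{lam A} = Id since A = 0 in this toy

alpha = 0.1                                  # constant inertial parameter, < 1/3
x_prev = x = (0.0, 0.0)
for n in range(1, 20001):
    lam, beta = (n + 1) ** -0.75, (n + 1) ** 0.5   # sum lam = +inf, sum lam^2 < +inf
    d, b = D(x), B(x)
    y = tuple(x[i] + alpha * (x[i] - x_prev[i]) - lam * (d[i] + beta * b[i]) for i in range(2))
    x_prev, x = x, resolvent(y, lam)
print(x)  # should approach the solution (2, 0)
```

For a non-trivial $A$ (e.g. a subdifferential), the `resolvent` placeholder would be replaced by the corresponding proximal map.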

Hypotheses 2.2

The convergence analysis will be carried out under the following hypotheses (see also [Citation7]):

  1. $A+N_M$ is maximally monotone and $\operatorname{Zer}(A+D+N_M)\neq\emptyset$;

  2. for every $p\in\operatorname{Ran}N_M$, $\sum_{n\geq1}\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\beta_n}\right)-\sigma_M\left(\frac{p}{\beta_n}\right)\right]<+\infty$.

Since $A$ and $N_M$ are maximally monotone operators, the sum $A+N_M$ is maximally monotone provided some specific regularity conditions are fulfilled (see [Citation33–35,Citation38]). Furthermore, since $D$ is also maximally monotone [Citation33, Example 20.28] and $\operatorname{Dom}D=\mathcal{H}$, if $A+N_M$ is maximally monotone, then $A+D+N_M$ is also maximally monotone.

Let us also notice that for $p\in\operatorname{Ran}N_M$ there exists $\hat u\in M$ such that $p\in N_M(\hat u)$, hence, for every $\beta>0$ it holds $\sup_{u\in M}\varphi_B\left(u,\frac{p}{\beta}\right)-\sigma_M\left(\frac{p}{\beta}\right)\geq\left\langle\hat u,\frac{p}{\beta}\right\rangle-\sigma_M\left(\frac{p}{\beta}\right)=0$. For situations where (H2fitz) is satisfied we refer the reader to [Citation5,Citation8,Citation9,Citation11].
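As an illustration of the quantities entering (H2fitz), the Fitzpatrick function of $B=\operatorname{Id}$ on $\mathbb{R}$ has the closed form $\varphi_{\operatorname{Id}}(x,u)=\frac14(x+u)^2$ (the supremum over the graph is attained at $y=(x+u)/2$). This example is our own; the brute-force maximization below merely confirms the formula and the inequality $\varphi_B(x,u)\geq\langle x,u\rangle$ on a few sample points.

```python
def fitzpatrick_id(x, u, grid):
    """Brute-force sup over (y, y) in Gr(Id) of x*v + y*u - y*v with v = y."""
    return max(x * y + y * u - y * y for y in grid)

grid = [k / 1000.0 for k in range(-5000, 5001)]   # y in [-5, 5], step 1e-3
for x, u in [(1.0, 2.0), (-0.5, 3.0), (0.0, 0.0), (2.0, -2.0)]:
    approx = fitzpatrick_id(x, u, grid)
    closed = 0.25 * (x + u) ** 2
    print(x, u, approx, closed, approx >= x * u - 1e-9)
```

The last printed value checks $\varphi_{\operatorname{Id}}(x,u)\geq xu$, with equality exactly when $u=x$, i.e. on the graph of the identity.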

Before formulating the main theorem of this section we will prove some useful technical results.

Lemma 2.3

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2 and let $(u,y)\in\operatorname{Gr}(A+D+N_M)$ be such that $y=v+Du+p$ with $v\in Au$ and $p\in N_M(u)$. Further, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ be such that $1-\varepsilon_3>0$. Then the following inequality holds for all $n\geq1$:
(13) $\|x_{n+1}-u\|^2-\|x_n-u\|^2-\alpha_n\left(\|x_n-u\|^2-\|x_{n-1}-u\|^2\right)\leq-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$.

Proof.

Let $n\geq1$ be fixed. According to the definition of the resolvent of the operator $A$ we have
(14) $x_n-x_{n+1}-\lambda_n\left(Dx_n+\beta_nBx_n\right)+\alpha_n(x_n-x_{n-1})\in\lambda_nAx_{n+1}$
and, since $\lambda_nv\in\lambda_nAu$, the monotonicity of $A$ guarantees
(15) $\left\langle x_{n+1}-u,x_n-x_{n+1}-\lambda_n\left(Dx_n+\beta_nBx_n+v\right)+\alpha_n(x_n-x_{n-1})\right\rangle\geq0$
or, equivalently,
(16) $2\langle u-x_{n+1},x_n-x_{n+1}\rangle\leq2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle$.
For the term on the left-hand side of (16) we have
(17) $2\langle u-x_{n+1},x_n-x_{n+1}\rangle=\|x_{n+1}-u\|^2+\|x_{n+1}-x_n\|^2-\|x_n-u\|^2$.
Since $-2\alpha_n\langle u-x_n,x_n-x_{n-1}\rangle=-\alpha_n\|u-x_{n-1}\|^2+\alpha_n\|u-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$ and $2\langle x_{n+1}-x_n,\alpha_n(x_n-x_{n-1})\rangle\leq4\varepsilon_1\|x_{n+1}-x_n\|^2+\frac{\alpha_n^2}{4\varepsilon_1}\|x_n-x_{n-1}\|^2$, by adding the two relations we obtain the following estimate for the second term on the right-hand side of (16):
(18) $-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2+4\varepsilon_1\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2$.
We turn now our attention to the first term on the right-hand side of (16), which can be written as
(19) $2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle=2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle+2\lambda_n\beta_n\langle x_n-x_{n+1},Bx_n\rangle+2\lambda_n\langle x_n-x_{n+1},Dx_n+v\rangle$.
We have
(20) $2\lambda_n\beta_n\langle x_n-x_{n+1},Bx_n\rangle\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2\|Bx_n\|^2$
and
(21) $2\lambda_n\langle x_n-x_{n+1},Dx_n+v\rangle\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{2}{\varepsilon_2}\lambda_n^2\|Dx_n+v\|^2\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2$.
On the other hand, we have
(22) $2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle=2\lambda_n\beta_n\langle u-x_n,Bx_n\rangle+2\lambda_n\langle u-x_n,Dx_n-Du\rangle+2\lambda_n\langle u-x_n,Du+v\rangle$.
Since $0<\varepsilon_3<1$ and $Bu=0$, the cocoercivity of $B$ gives us
(23) $2\lambda_n\beta_n\langle u-x_n,Bx_n\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle$.
Similarly, the cocoercivity of $D$ gives us
(24) $2\lambda_n\langle u-x_n,Dx_n-Du\rangle\leq-2\eta\lambda_n\|Dx_n-Du\|^2$.
Combining (23)–(24) with (22) and using the definition of the Fitzpatrick function together with the fact that $\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)=\left\langle u,\frac{p}{\varepsilon_3\beta_n}\right\rangle$, we obtain
(25) $2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,Du+v\rangle=-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y-p\rangle=-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y\rangle+2\varepsilon_3\lambda_n\beta_n\left[\langle u,Bx_n\rangle+\left\langle x_n,\frac{p}{\varepsilon_3\beta_n}\right\rangle-\langle x_n,Bx_n\rangle-\left\langle u,\frac{p}{\varepsilon_3\beta_n}\right\rangle\right]\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y\rangle+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$.
The inequalities (20), (21) and (25) lead to
(26) $2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle\leq\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\varepsilon_2\|x_{n+1}-x_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$.
Finally, by combining (17), (18) and (26), we obtain (13).

From now on we will assume that for $0<\alpha<\frac13$ the constants $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ are chosen such that (C4) $1-\varepsilon_3>0$, $\varepsilon_2<1-4\varepsilon_1-\alpha-\frac{\alpha^2}{4\varepsilon_1}$ and $\sup_{n\geq1}\lambda_n\beta_n<\mu\varepsilon_2(1-\varepsilon_3)$. As a consequence, there exists $s$ with $0<s\leq1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}$, which means that for all $n\geq1$ it holds (27) $\alpha_{n+1}+\frac{\alpha_{n+1}^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq\alpha+\frac{\alpha^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq-s$. On the other hand, there exists $t$ with $0<t\leq\mu(1-\varepsilon_3)-\frac{1}{\varepsilon_2}\sup_{n\geq1}\lambda_n\beta_n$, which means that for all $n\geq1$ it holds (28) $\frac{1}{\varepsilon_2}\lambda_n\beta_n-\mu(1-\varepsilon_3)\leq-t$.

Remark 2.4

  • Since $0<\alpha<\frac13$, one can always find $\varepsilon_1,\varepsilon_2>0$ such that $\varepsilon_2<1-4\varepsilon_1-\alpha-\frac{\alpha^2}{4\varepsilon_1}$. One possible choice is $\varepsilon_1=\frac{\alpha}{4}$ and $0<\varepsilon_2<1-3\alpha$. From the second inequality in (C4) it follows that $1-3\varepsilon_1-\varepsilon_2>\varepsilon_1+\alpha+\frac{\alpha^2}{4\varepsilon_1}>0$.

  • As $1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}>0$, it is always possible to choose $s$ such that $0<s\leq1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}$. Since in this case $\alpha+\frac{\alpha^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq-s$, one has (27).

The following proposition brings us closer to the convergence result.

Proposition 2.5

Let $0<\alpha<\frac13$, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2 and assume that Hypotheses 2.2 are verified. Then the following statements are true:

  1. the sequence $(\|x_{n+1}-x_n\|)_{n\geq0}$ belongs to $\ell^2$ and the sequence $(\lambda_n\beta_n\|Bx_n\|^2)_{n\geq1}$ belongs to $\ell^1$;

  2. if, moreover, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}Bx_n=0$ and thus every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $M$;

  3. for every $u\in\operatorname{Zer}(A+D+N_M)$, the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists.

Proof.

Since $\lim_{n\to+\infty}\lambda_n=0$, there exists an integer $n_0\geq1$ such that $\lambda_n\leq\frac{\varepsilon_2\eta}{2}$ for all $n\geq n_0$, which ensures that $\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\leq0$. According to Lemma 2.3, for every $(u,y)\in\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$, with $v\in Au$ and $p\in N_M(u)$, and all $n\geq n_0$ the following inequality holds:
(29) $\|x_{n+1}-u\|^2-\|x_n-u\|^2-\alpha_n\left(\|x_n-u\|^2-\|x_{n-1}-u\|^2\right)\leq-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+2\lambda_n\beta_n\left(\frac{1}{\varepsilon_2}\lambda_n\beta_n-\mu(1-\varepsilon_3)\right)\|Bx_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$.
We consider $u\in\operatorname{Zer}(A+D+N_M)$, which means that we can take $y=0$ in (29). For all $n\geq1$ we denote
(30) $\theta_n:=\|x_n-u\|^2$, $\rho_n:=\theta_n-\alpha_n\theta_{n-1}+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2$
and
(31) $\delta_n:=\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$.
Using that $(\alpha_n)_{n\geq1}$ is non-decreasing, for all $n\geq n_0$ this yields
(32) $\rho_{n+1}-\rho_n\leq\left[\alpha_{n+1}+\frac{\alpha_{n+1}^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\right]\|x_{n+1}-x_n\|^2+2\lambda_n\beta_n\left(\frac{1}{\varepsilon_2}\lambda_n\beta_n-\mu(1-\varepsilon_3)\right)\|Bx_n\|^2+\delta_n\leq-s\|x_{n+1}-x_n\|^2-2t\lambda_n\beta_n\|Bx_n\|^2+\delta_n$,
where $s,t>0$ are chosen according to (27) and (28), respectively.

Thanks to (H2fitz) and (C1) it holds
(33) $\sum_{n\geq1}\delta_n=\frac{4}{\varepsilon_2}\|Du+v\|^2\sum_{n\geq1}\lambda_n^2+2\varepsilon_3\sum_{n\geq1}\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]<+\infty$.
Hence, according to Lemma 1.4, we obtain
(34) $\sum_{n\geq0}\|x_{n+1}-x_n\|^2<+\infty$ and $\sum_{n\geq1}\lambda_n\beta_n\|Bx_n\|^2<+\infty$,
which proves (i). If, in addition, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}Bx_n=0$, which means that every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $\operatorname{Zer}B=M$.

In order to prove (iii), we consider again inequality (29) for an arbitrary element $u\in\operatorname{Zer}(A+D+N_M)$ and $y=0$. With the notations in (30) and (31), we get for all $n\geq n_0$
(35) $\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n$.
According to (33) and (34) we have
(36) $\sum_{n\geq1}\left[\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n\right]\leq\left(\alpha+\frac{\alpha^2}{4\varepsilon_1}\right)\sum_{n\geq1}\|x_n-x_{n-1}\|^2+\sum_{n\geq1}\delta_n<+\infty$,
therefore, by Lemma 1.2, the limit $\lim_{n\to+\infty}\theta_n=\lim_{n\to+\infty}\|x_n-u\|^2$ exists, which means that the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists, too.

Remark 2.6

The condition (C3), imposed in combination with $0<\alpha<\frac13$ on the sequence of inertial parameters $(\alpha_n)_{n\geq1}$, is the one proposed in [Citation15, Proposition 2.1] when addressing the convergence of the inertial proximal point algorithm. However, the statements in the proposition above and in the following convergence theorem remain valid if one alternatively assumes that there exists $\alpha$ such that $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$ and $\sum_{n\geq1}\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2<+\infty$. This can be realized if, for a fixed $p>1$, one chooses $\alpha_n\leq\min\left\{\alpha,\,2\varepsilon_1\left(-1+\sqrt{1+\frac{1}{\varepsilon_1n^p\|x_n-x_{n-1}\|^2}}\right)\right\}$ for all $n\geq1$ with $x_n\neq x_{n-1}$ (and any $\alpha_n\in[0,\alpha]$ otherwise). Indeed, in this situation we have $\frac{\alpha_n^2}{4\varepsilon_1}+\alpha_n\leq\frac{1}{n^p\|x_n-x_{n-1}\|^2}$, which gives $\sum_{n\geq1}\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2\leq\sum_{n\geq1}\frac{1}{n^p}<+\infty$.
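One verifiable choice of such an adaptive inertial parameter is the positive root of $\frac{a^2}{4\varepsilon_1}+a=\frac{1}{n^p\|x_n-x_{n-1}\|^2}$; the closed form below is our own reconstruction of that root (treat the exact formula as an assumption) and the script confirms the identity numerically.

```python
import math

def alpha_bound(eps1, n, p, step_norm_sq):
    """Positive root a of a^2/(4 eps1) + a = 1 / (n^p * step_norm_sq)."""
    c = 1.0 / (n ** p * step_norm_sq)
    return 2.0 * eps1 * (-1.0 + math.sqrt(1.0 + c / eps1))

eps1, p = 0.1, 1.5
for n, d2 in [(1, 4.0), (10, 0.25), (1000, 1e-6)]:
    a = alpha_bound(eps1, n, p, d2)
    # plugging the root back in recovers 1 / (n^p * d2)
    print(n, a, a * a / (4 * eps1) + a, 1.0 / (n ** p * d2))
```

Any $\alpha_n$ below $\min\{\alpha,a\}$ then satisfies the summability requirement of the remark.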

Now we are ready to prove the main theorem of this section, which addresses the convergence of the sequence generated by Algorithm 2.2.

Theorem 2.8

Let $0<\alpha<\frac13$, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2, let $(z_n)_{n\geq1}$ be the sequence defined in (7) and assume that Hypotheses 2.2 are verified. Then the following statements are true:

  1. the sequence $(z_n)_{n\geq1}$ converges weakly to an element of $\operatorname{Zer}(A+D+N_M)$ as $n\to+\infty$;

  2. if $A$ is $\gamma$-strongly monotone with $\gamma>0$, then $(x_n)_{n\geq0}$ converges strongly to the unique element of $\operatorname{Zer}(A+D+N_M)$ as $n\to+\infty$.

Proof.

  1. According to Proposition 2.5 (iii), the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists for every $u\in\operatorname{Zer}(A+D+N_M)$. Let $z$ be a sequential weak cluster point of $(z_n)_{n\geq1}$. We will show that $z\in\operatorname{Zer}(A+D+N_M)$ by using the characterization (5) of maximal monotonicity, and the conclusion will then follow from Lemma 1.1. To this end we consider an arbitrary $(u,y)\in\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$, where $v\in Au$ and $p\in N_M(u)$. From (29), with the notations (30) and (31), we have for all $n\geq n_0$
(37) $\rho_{n+1}-\rho_n\leq-s\|x_{n+1}-x_n\|^2-2t\lambda_n\beta_n\|Bx_n\|^2+\delta_n+2\lambda_n\langle u-x_n,y\rangle\leq\delta_n+2\lambda_n\langle u-x_n,y\rangle$.
Recall from (33) that $\sum_{n\geq1}\delta_n<+\infty$. Since $(x_n)_{n\geq0}$ is bounded, the sequence $(\rho_n)_{n\geq1}$ is also bounded.

    We fix an arbitrary integer $\bar N\geq n_0$ and sum up the inequalities in (37) for $n=n_0+1,n_0+2,\ldots,\bar N$. This yields $\rho_{\bar N+1}-\rho_{n_0+1}\leq\sum_{n\geq1}\delta_n+2\left\langle\sum_{n=1}^{n_0}\lambda_nx_n-\sum_{n=1}^{n_0}\lambda_nu,y\right\rangle+2\left\langle\tau_{\bar N}u-\sum_{n=1}^{\bar N}\lambda_nx_n,y\right\rangle$. After dividing this last inequality by $2\tau_{\bar N}=2\sum_{n=1}^{\bar N}\lambda_n$, we obtain
(38) $\frac{1}{2\tau_{\bar N}}\left(\rho_{\bar N+1}-\rho_{n_0+1}\right)\leq\frac{1}{2\tau_{\bar N}}T+\langle u-z_{\bar N},y\rangle$,
where $T:=\sum_{n\geq1}\delta_n+2\left\langle\sum_{n=1}^{n_0}\lambda_nx_n-\sum_{n=1}^{n_0}\lambda_nu,y\right\rangle\in\mathbb{R}$. By passing in (38) to the limit and using that $\lim_{\bar N\to+\infty}\tau_{\bar N}=\lim_{\bar N\to+\infty}\sum_{n=1}^{\bar N}\lambda_n=+\infty$ while $(\rho_n)_{n\geq1}$ is bounded, we get $\liminf_{\bar N\to+\infty}\langle u-z_{\bar N},y\rangle\geq0$. As $z$ is a sequential weak cluster point of $(z_n)_{n\geq1}$, the above inequality gives us $\langle u-z,y\rangle\geq0$, which by (5) finally means that $z\in\operatorname{Zer}(A+D+N_M)$.

  2. Let $u\in\mathcal{H}$ be the unique element of $\operatorname{Zer}(A+D+N_M)$ and take $y=0=v+Du+p$ as above. Since $A$ is $\gamma$-strongly monotone with $\gamma>0$, inequality (15) improves for all $n\geq1$ to $\left\langle x_{n+1}-u,x_n-x_{n+1}-\lambda_n\left(Dx_n+\beta_nBx_n+v\right)+\alpha_n(x_n-x_{n-1})\right\rangle\geq\gamma\lambda_n\|x_{n+1}-u\|^2$ or, equivalently, $2\gamma\lambda_n\|x_{n+1}-u\|^2+2\langle u-x_{n+1},x_n-x_{n+1}\rangle\leq2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle$. By using again (17), (18) and (26) we obtain for all $n\geq1$
$2\gamma\lambda_n\|x_{n+1}-u\|^2+\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\left(\|x_n-u\|^2-\|x_{n-1}-u\|^2\right)-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\left(u,\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$.
With the notations in (30) and (31), this yields for all $n\geq n_0$
$2\gamma\lambda_n\|x_{n+1}-u\|^2+\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n$.
By taking into account (36), from Lemma 1.2 we get $2\gamma\sum_{n\geq1}\lambda_n\|x_n-u\|^2<+\infty$. According to (C1) we have $\sum_{n\geq1}\lambda_n=+\infty$, hence $\liminf_{n\to+\infty}\|x_n-u\|=0$; since the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists by Proposition 2.5 (iii), it must be equal to zero. This provides the desired conclusion.

3. Applications to convex bilevel programming

We now employ the results obtained in the previous section, in the context of monotone inclusions, to the solving of convex bilevel programming problems.

Problem 3.1

Let $\mathcal{H}$ be a real Hilbert space, $f:\mathcal{H}\to\overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function, and $g,h:\mathcal{H}\to\mathbb{R}$ convex differentiable functions with $L_g$-Lipschitz continuous and, respectively, $L_h$-Lipschitz continuous gradients. Suppose that $\operatorname{argmin}h\neq\emptyset$ and $\min h=0$. The bilevel programming problem to solve reads $\min_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}$.

The assumption $\min h=0$ is not restrictive since, otherwise, one can replace $h$ by $h-\min h$.

Hypotheses 3.2

The convergence analysis will be carried out under the following hypotheses:

  1. $\partial f+N_{\operatorname{argmin}h}$ is maximally monotone and $S:=\operatorname{argmin}_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}\neq\emptyset$;

  2. for every $p\in\operatorname{Ran}N_{\operatorname{argmin}h}$, $\sum_{n\geq1}\lambda_n\beta_n\left[h^*\left(\frac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin}h}\left(\frac{p}{\beta_n}\right)\right]<+\infty$.

Under the above hypotheses we have $\partial f+\nabla g+N_{\operatorname{argmin}h}=\partial\left(f+g+\delta_{\operatorname{argmin}h}\right)$ and hence $S=\operatorname{Zer}(\partial f+\nabla g+N_{\operatorname{argmin}h})$. Since, according to the Theorem of Baillon–Haddad (see, e.g., [Citation33, Corollary 18.16]), $\nabla g$ and $\nabla h$ are $L_g^{-1}$-cocoercive and, respectively, $L_h^{-1}$-cocoercive, and $\operatorname{argmin}h=\operatorname{Zer}\nabla h$, solving the bilevel programming problem in Problem 3.1 reduces to solving the monotone inclusion $0\in\partial f(x)+\nabla g(x)+N_{\operatorname{argmin}h}(x)$. By using Algorithm 2.2 to this end, we obtain the following iterative scheme.
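For Problem 3.1 the backward step becomes a proximal step of $f$, so one iteration reads $x_{n+1}=\operatorname{prox}_{\lambda_nf}\big(x_n+\alpha_n(x_n-x_{n-1})-\lambda_n(\nabla g(x_n)+\beta_n\nabla h(x_n))\big)$. The sketch below runs this on a one-dimensional toy instance of our own choosing: $f(x)=|x|$, $g(x)=\frac12(x-3)^2$ and $h(x)=\frac12\max(x-1,0)^2$, so that $\operatorname{argmin}h=(-\infty,1]$ and the bilevel solution is $x^*=1$; the parameter sequences are one admissible choice, not prescribed by the paper.

```python
def prox_abs(y, lam):
    """prox of lam * |.|: soft-thresholding."""
    return max(abs(y) - lam, 0.0) * (1.0 if y >= 0 else -1.0)

grad_g = lambda x: x - 3.0                       # g(x) = (x - 3)^2 / 2, L_g = 1
grad_h = lambda x: max(x - 1.0, 0.0)             # h(x) = max(x - 1, 0)^2 / 2, L_h = 1

alpha = 0.1                                      # constant inertial parameter, < 1/3
x_prev = x = 0.0
for n in range(1, 20001):
    lam, beta = (n + 1) ** -0.75, (n + 1) ** 0.5
    y = x + alpha * (x - x_prev) - lam * (grad_g(x) + beta * grad_h(x))
    x_prev, x = x, prox_abs(y, lam)
print(x)  # should approach the bilevel solution x* = 1
```

Note that the iterates track the penalized solutions $1+1/(1+\beta_n)$, which tend to $x^*=1$ as the penalty parameter grows.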

By using inequality (6), one can easily notice that (H2prog) implies (H2fitz), which means that the convergence statements for Algorithm 3.3 can be derived as particular instances of the ones obtained in the previous section.

Alternatively, one can use to this end the following lemma and employ the same ideas and techniques as in Section 2. Lemma 3.3 is similar to Lemma 2.3; however, it will allow us to provide convergence statements also for the sequence of function values $(h(x_n))_{n\geq0}$.

Lemma 3.3

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and let $(u,y)\in\operatorname{Gr}(\partial f+\nabla g+N_{\operatorname{argmin}h})$ be such that $y=v+\nabla g(u)+p$ with $v\in\partial f(u)$ and $p\in N_{\operatorname{argmin}h}(u)$. Further, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ be such that $1-\varepsilon_3>0$, and set $\mu:=L_h^{-1}$ and $\eta:=L_g^{-1}$. Then the following inequality holds for all $n\geq1$:
$\|x_{n+1}-u\|^2-\|x_n-u\|^2-\alpha_n\left(\|x_n-u\|^2-\|x_{n-1}-u\|^2\right)\leq-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|\nabla h(x_n)\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|\nabla g(x_n)-\nabla g(u)\|^2+\lambda_n\beta_n\left(h(u)-h(x_n)\right)+\frac{4}{\varepsilon_2}\lambda_n^2\|\nabla g(u)+v\|^2+\varepsilon_3\lambda_n\beta_n\left[h^*\left(\frac{2p}{\varepsilon_3\beta_n}\right)-\sigma_{\operatorname{argmin}h}\left(\frac{2p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$.

Proof.

Let $n\geq1$ be fixed. The proof follows by combining the estimates used in the proof of Lemma 2.3 with some inequalities which better exploit the convexity of $h$. As in (23) we have $2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle$. Since $h$ is convex, the following relation also holds: $2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq2\lambda_n\beta_n\left(h(u)-h(x_n)\right)$. Taking the mean of the two inequalities above gives $2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq-\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2+\varepsilon_3\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle+\lambda_n\beta_n\left(h(u)-h(x_n)\right)$. Using the same techniques as in the derivation of (25), now with the factor $\varepsilon_3\lambda_n\beta_n$ and inequality (6), we get
$2\lambda_n\langle u-x_n,v+\nabla g(x_n)+\beta_n\nabla h(x_n)\rangle\leq-\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2-2\eta\lambda_n\|\nabla g(x_n)-\nabla g(u)\|^2+\lambda_n\beta_n\left(h(u)-h(x_n)\right)+2\lambda_n\langle u-x_n,y\rangle+\varepsilon_3\lambda_n\beta_n\left[h(u)+h^*\left(\frac{2p}{\varepsilon_3\beta_n}\right)-\sigma_{\operatorname{argmin}h}\left(\frac{2p}{\varepsilon_3\beta_n}\right)\right]$.
With these improved estimates, and using that $h(u)=0$, the conclusion follows as in the proof of Lemma 2.3.

By using now Lemma 3.3, one obtains, after slightly adapting the proof of Proposition 2.5, the following result.

Proposition 3.4

Let $0<\alpha<\frac13$, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and assume that Hypotheses 3.2 are verified. Then the following statements are true:

  1. the sequence $(\|x_{n+1}-x_n\|)_{n\geq0}$ belongs to $\ell^2$ and the sequences $(\lambda_n\beta_n\|\nabla h(x_n)\|^2)_{n\geq1}$ and $(\lambda_n\beta_nh(x_n))_{n\geq1}$ belong to $\ell^1$;

  2. if, moreover, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}h(x_n)=\lim_{n\to+\infty}\|\nabla h(x_n)\|=0$ and thus every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $\operatorname{argmin}h$;

  3. for every $u\in S$, the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists.

Finally, the above proposition leads to the following convergence result.

Theorem 3.6

Let $0<\alpha<\frac13$, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3, let $(z_n)_{n\geq1}$ be the sequence defined in (7) and assume that Hypotheses 3.2 are verified. Then the following statements are true:

  1. the sequence $(z_n)_{n\geq1}$ converges weakly to an element of $S$ as $n\to+\infty$;

  2. if $f$ is $\gamma$-strongly convex with $\gamma>0$, then $(x_n)_{n\geq0}$ converges strongly to the unique element of $S$ as $n\to+\infty$.

In what follows we show that under inf-compactness assumptions one can achieve weak non-ergodic convergence for the sequence $(x_n)_{n\geq0}$. Weak non-ergodic convergence has been obtained for Algorithm 3.3 in [Citation9] when $\alpha_n=\alpha$ for all $n\geq1$ and under restrictive choices of both the sequence of step sizes and the sequence of penalty parameters.

We denote $(f+g)^{\min}:=\min_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}$. For every element $x\in\mathcal{H}$, we denote by $\operatorname{dist}(x,S)=\inf_{u\in S}\|x-u\|$ the distance from $x$ to $S$. In particular, $\operatorname{dist}(x,S)=\|x-\operatorname{Pr}_Sx\|$, where $\operatorname{Pr}_Sx$ denotes the projection of $x$ onto $S$. The projection operator $\operatorname{Pr}_S$ is firmly non-expansive [Citation33, Proposition 4.8], which means
(39) $\|\operatorname{Pr}_Sx-\operatorname{Pr}_Sy\|^2+\|(\operatorname{Id}-\operatorname{Pr}_S)x-(\operatorname{Id}-\operatorname{Pr}_S)y\|^2\leq\|x-y\|^2$ for all $x,y\in\mathcal{H}$.
Denoting $d(x)=\frac12\operatorname{dist}(x,S)^2=\frac12\|x-\operatorname{Pr}_Sx\|^2$ for all $x\in\mathcal{H}$, one has that $x\mapsto d(x)$ is differentiable and it holds $\nabla d(x)=x-\operatorname{Pr}_Sx$ for all $x\in\mathcal{H}$.
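The identity $\nabla d(x)=x-\operatorname{Pr}_Sx$ can be checked numerically in one dimension, taking for instance $S=[0,1]$ (our illustrative choice) and comparing $x-\operatorname{Pr}_Sx$ with a central finite difference of $d$:

```python
def proj(x):
    """Projection onto S = [0, 1]."""
    return min(max(x, 0.0), 1.0)

def d(x):
    """d(x) = dist(x, S)^2 / 2."""
    return 0.5 * (x - proj(x)) ** 2

eps = 1e-6
for x in [-1.5, 0.3, 2.0]:
    fd = (d(x + eps) - d(x - eps)) / (2 * eps)   # central finite difference of d
    print(x, fd, x - proj(x))
```

Both columns agree up to discretization error, including at the interior point, where the gradient vanishes.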

Lemma 3.6

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and assume that Hypotheses 3.2 are verified. Then the following inequality holds for all $n\geq1$:
(40) $d(x_{n+1})-d(x_n)-\alpha_n\left(d(x_n)-d(x_{n-1})\right)+\lambda_n\left[(f+g)(x_{n+1})-(f+g)^{\min}\right]\leq\left(\frac{L_g}{2}\lambda_n+\frac{L_h}{4}\lambda_n\beta_n+\frac{\alpha_n}{2}\right)\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$.

Proof.

Let $n\geq1$ be fixed. Since $d$ is convex, we have
(41) $d(x_{n+1})-d(x_n)\leq\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_{n+1}-x_n\rangle$.
By (14), there exists $v_{n+1}\in\partial f(x_{n+1})$ such that $x_n-x_{n+1}-\lambda_n\left(\nabla g(x_n)+\beta_n\nabla h(x_n)\right)+\alpha_n(x_n-x_{n-1})=\lambda_nv_{n+1}$ and, so,
(42) $\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_{n+1}-x_n\rangle=\left\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},-\lambda_nv_{n+1}-\lambda_n\nabla g(x_n)-\lambda_n\beta_n\nabla h(x_n)+\alpha_n(x_n-x_{n-1})\right\rangle$.

Since $v_{n+1}\in\partial f(x_{n+1})$, we get
(43) $-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},v_{n+1}\rangle\leq\lambda_n\left[f(\operatorname{Pr}_Sx_{n+1})-f(x_{n+1})\right]$.
Using the convexity of $g$, it follows that
(44) $g(x_n)-g(\operatorname{Pr}_Sx_{n+1})\leq\langle\nabla g(x_n),x_n-\operatorname{Pr}_Sx_{n+1}\rangle$.
On the other hand, the Descent Lemma gives
(45) $g(x_{n+1})\leq g(x_n)+\langle\nabla g(x_n),x_{n+1}-x_n\rangle+\frac{L_g}{2}\|x_{n+1}-x_n\|^2$.
By adding (44) and (45), it yields
(46) $-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla g(x_n)\rangle\leq\lambda_n\left[g(\operatorname{Pr}_Sx_{n+1})-g(x_{n+1})\right]+\frac{L_g\lambda_n}{2}\|x_{n+1}-x_n\|^2$.
Using the $\frac{1}{L_h}$-cocoercivity of $\nabla h$ combined with the fact that $\nabla h(\operatorname{Pr}_Sx_{n+1})=0$ (as $\operatorname{Pr}_Sx_{n+1}$ belongs to $S$), it yields $\langle x_n-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle\geq\frac{1}{L_h}\|\nabla h(x_n)\|^2$. Therefore
(47) $-\lambda_n\beta_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle\leq\lambda_n\beta_n\left[\langle x_n-x_{n+1},\nabla h(x_n)\rangle-\frac{1}{L_h}\|\nabla h(x_n)\|^2\right]\leq\frac{\lambda_n\beta_nL_h}{4}\|x_{n+1}-x_n\|^2$.

Further, we have $\alpha_n\left\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1}-(x_n-\operatorname{Pr}_Sx_n),x_n-x_{n-1}\right\rangle\leq\frac{\alpha_n}{2}\|(\operatorname{Id}-\operatorname{Pr}_S)x_{n+1}-(\operatorname{Id}-\operatorname{Pr}_S)x_n\|^2+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2\leq\frac{\alpha_n}{2}\|x_{n+1}-x_n\|^2+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2$, by (39), and $\alpha_n\langle x_n-\operatorname{Pr}_Sx_n,x_n-x_{n-1}\rangle=\alpha_nd(x_n)+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2-\frac{\alpha_n}{2}\|x_{n-1}-\operatorname{Pr}_Sx_n\|^2\leq\alpha_nd(x_n)+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2-\alpha_nd(x_{n-1})$. By adding the two relations above, we obtain
(48) $\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_n-x_{n-1}\rangle\leq\frac{\alpha_n}{2}\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2+\alpha_n\left(d(x_n)-d(x_{n-1})\right)$.
By combining (43), (46), (47) and (48) with (41) and (42), and taking into account that $(f+g)(\operatorname{Pr}_Sx_{n+1})=(f+g)^{\min}$, we obtain the desired conclusion.

Definition 3.7

A function $\Psi:\mathcal{H}\to\overline{\mathbb{R}}$ is said to be inf-compact if for every $r>0$ and every $\kappa\in\mathbb{R}$ the set $\operatorname{Lev}_\kappa^r\Psi:=\{x\in\mathcal{H}:\|x\|\leq r,\ \Psi(x)\leq\kappa\}$ is relatively compact in $\mathcal{H}$.

A useful property of inf-compact functions follows.

Lemma 3.8

Let $\Psi:\mathcal{H}\to\overline{\mathbb{R}}$ be inf-compact and let $(x_n)_{n\geq0}$ be a bounded sequence in $\mathcal{H}$ such that $(\Psi(x_n))_{n\geq0}$ is bounded as well. If the sequence $(x_n)_{n\geq0}$ converges weakly to an element $\hat x$ as $n\to+\infty$, then it converges strongly to this element.

Proof.

Let $\bar r>0$ and $\bar\kappa\in\mathbb{R}$ be such that $\|x_n\|\leq\bar r$ and $\Psi(x_n)\leq\bar\kappa$ for all $n\geq1$. Hence, $(x_n)_{n\geq0}$ lies in the set $\operatorname{Lev}_{\bar\kappa}^{\bar r}(\Psi)$, which is relatively compact. Then every subsequence of $(x_n)_{n\geq0}$ has a further, strongly convergent subsequence and, since $(x_n)_{n\geq0}$ converges weakly to $\hat x$, every such strongly convergent subsequence has limit $\hat x$. The desired conclusion follows.

We can now formulate the weak non-ergodic convergence result.

Theorem 3.10

Let $0<\alpha<\frac13$, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4) together with $0<\liminf_{n\to+\infty}\lambda_n\beta_n$. Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3, and assume that Hypotheses 3.2 are verified and that either $f+g$ or $h$ is inf-compact. Then the following statements are true:

  1. $\lim_{n\to+\infty}d(x_n)=0$;

  2. the sequence $(x_n)_{n\geq0}$ converges weakly to an element of $S$ as $n\to+\infty$;

  3. if $h$ is inf-compact, then the sequence $(x_n)_{n\geq0}$ converges strongly to an element of $S$ as $n\to+\infty$.

Proof.

  1. Thanks to Lemma 3.6, for all $n\geq1$ we have
(49) $d(x_{n+1})-d(x_n)+\lambda_n\left[(f+g)(x_{n+1})-(f+g)^{\min}\right]\leq\alpha_n\left(d(x_n)-d(x_{n-1})\right)+\zeta_n$,
where $\zeta_n:=\left(\frac{L_g}{2}\lambda_n+\frac{L_h}{4}\lambda_n\beta_n+\frac{\alpha_n}{2}\right)\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$. From Proposition 3.4 (i), combined with the fact that both sequences $(\lambda_n)_{n\geq1}$ and $(\lambda_n\beta_n)_{n\geq1}$ are bounded, it follows that $\sum_{n\geq1}\zeta_n<+\infty$.

    In general, since the sequence $(x_n)_{n\geq0}$ is not necessarily contained in $\operatorname{argmin}h$, we have to treat two different cases.

    Case 1: There exists an integer $n_1\geq1$ such that $(f+g)(x_n)\geq(f+g)^{\min}$ for all $n\geq n_1$. In this case, we obtain from Lemma 1.2 that:

    • the limit $\lim_{n\to+\infty}d(x_n)$ exists;

    • $\sum_{n\geq n_1}\lambda_n\left[(f+g)(x_{n+1})-(f+g)^{\min}\right]<+\infty$. Moreover, since $(\lambda_n)_{n\geq1}\notin\ell^1$, we must have (50) $\liminf_{n\to+\infty}(f+g)(x_n)\leq(f+g)^{\min}$.

    Consider a subsequence $(x_{n_k})_{k\geq0}$ of $(x_n)_{n\geq0}$ such that $\lim_{k\to+\infty}(f+g)(x_{n_k})=\liminf_{n\to+\infty}(f+g)(x_n)$ and note that, thanks to (50), the sequence $((f+g)(x_{n_k}))_{k\geq0}$ is bounded from above. From Proposition 3.4 (ii)–(iii) we get that $(x_{n_k})_{k\geq0}$ and $(h(x_{n_k}))_{k\geq0}$ are bounded as well. Thus, since either $f+g$ or $h$ is inf-compact, there exists a subsequence $(x_{n_l})_{l\geq0}$ of $(x_{n_k})_{k\geq0}$ which converges strongly to an element $\hat x$ as $l\to+\infty$. According to Proposition 3.4 (ii), $\hat x$ belongs to $\operatorname{argmin}h$. On the other hand,
(51) $\lim_{l\to+\infty}(f+g)(x_{n_l})=\liminf_{n\to+\infty}(f+g)(x_n)\geq(f+g)(\hat x)\geq(f+g)^{\min}$.
We deduce from (50)–(51) that $(f+g)(\hat x)=(f+g)^{\min}$, in other words, that $\hat x\in S$. In conclusion, thanks to the continuity of $d$, $\lim_{n\to+\infty}d(x_n)=\lim_{l\to+\infty}d(x_{n_l})=d(\hat x)=0$.
    Case 2: For every $n\geq1$ there exists some $n'>n$ such that $(f+g)(x_{n'})<(f+g)^{\min}$. We define the set $V:=\{n\geq1:(f+g)(x_n)<(f+g)^{\min}\}$. There exists an integer $n_2\geq2$ such that for all $n\geq n_2$ the set $\{k\leq n:k\in V\}$ is non-empty. Hence, for all $n\geq n_2$ the number $t_n:=\max\{k\leq n:k\in V\}$ is well-defined. By definition $t_n\leq n$ for all $n\geq n_2$ and, moreover, the sequence $(t_n)_{n\geq n_2}$ is non-decreasing with $\lim_{n\to+\infty}t_n=+\infty$. Indeed, if $\lim_{n\to+\infty}t_n=t\in\mathbb{R}$, then for all $n>t$ it would hold $(f+g)(x_n)\geq(f+g)^{\min}$, a contradiction. Choose an integer $N\geq n_2$.

    • If $t_N<N$, then for all $n=t_N,\dots,N-1$ we have $(f+g)(x_{n+1})\ge(f+g)^*$, so inequality (49) gives
(52) $d(x_{n+1})-d(x_n)\le d(x_{n+1})-d(x_n)+\lambda_n\big[(f+g)(x_{n+1})-(f+g)^*\big]\le\alpha_n\big[d(x_n)-d(x_{n-1})\big]+\zeta_n$.
Summing (52) for $n=t_N,\dots,N-1$ and using that $(\alpha_n)_{n\ge1}$ is non-decreasing, it yields
(53) $d(x_N)-d(x_{t_N})\le\sum_{n=t_N}^{N-1}\big[\alpha_n d(x_n)-\alpha_{n-1}d(x_{n-1})\big]+\sum_{n=t_N}^{N-1}\zeta_n\le\alpha\,d(x_{N-1})+\sum_{n\ge t_N}\zeta_n$.

    • If $t_N=N$, then $d(x_N)=d(x_{t_N})$ and we have
(54) $d(x_N)-\alpha\,d(x_{N-1})\le d(x_{t_N})+\sum_{n\ge t_N}\zeta_n$.

    For all $n\ge1$ we define $a_n:=d(x_n)-\alpha\,d(x_{n-1})$. In both cases it yields
(55) $a_N\le d(x_{t_N})+\sum_{n=t_N}^{N}\zeta_n\le d(x_{t_N})+\sum_{n\ge t_N}\zeta_n$.
Passing in (55) to the limit as $N\to+\infty$, and using that $\sum_{n\ge t_N}\zeta_n\to0$ since $\sum_{n\ge1}\zeta_n<+\infty$ and $t_N\to+\infty$, we obtain
(56) $\limsup_{n\to+\infty}a_n\le\limsup_{n\to+\infty}d(x_{t_n})$.
Let $u\in S$. For all $n\ge1$ we have $d(x_n)=\frac{1}{2}\operatorname{dist}(x_n,S)^2\le\frac{1}{2}\|x_n-u\|^2$, which shows that $(d(x_n))_{n\ge0}$ is bounded, as $\lim_{n\to+\infty}\|x_n-u\|$ exists. We obtain
(57) $\limsup_{n\to+\infty}a_n=\limsup_{n\to+\infty}\big[d(x_n)-\alpha\,d(x_{n-1})\big]\ge(1-\alpha)\limsup_{n\to+\infty}d(x_n)\ge0$.
Further, for all $n\ge n_2$ we have $(f+g)(x_{t_n})<(f+g)^*$, which gives
(58) $\limsup_{n\to+\infty}(f+g)(x_{t_n})\le(f+g)^*$.
This means that the sequence $((f+g)(x_{t_n}))_{n\ge n_2}$ is bounded from above. Consider a subsequence $(x_{t_k})_{k\ge0}$ of $(x_{t_n})_{n\ge n_2}$ such that $\lim_{k\to+\infty}d(x_{t_k})=\limsup_{n\to+\infty}d(x_{t_n})$. From Proposition 3.4 (ii)–(iii) we get that $(x_{t_k})_{k\ge0}$ and $(h(x_{t_k}))_{k\ge0}$ are bounded as well. Thus, since either $f+g$ or $h$ is inf-compact, there exists a subsequence $(x_{t_l})_{l\ge0}$ of $(x_{t_k})_{k\ge0}$ which converges strongly to an element $\hat{x}$ as $l\to+\infty$. According to Proposition 3.4 (ii)–(iii), $\hat{x}$ belongs to $\operatorname{argmin}h$. Furthermore, by the lower semicontinuity of $f+g$ it holds
(59) $\liminf_{l\to+\infty}(f+g)(x_{t_l})\ge(f+g)(\hat{x})\ge(f+g)^*$.
We deduce from (58) and (59) that
$(f+g)^*\le(f+g)(\hat{x})\le\liminf_{l\to+\infty}(f+g)(x_{t_l})\le\limsup_{n\to+\infty}(f+g)(x_{t_n})\le(f+g)^*$,
which gives $\hat{x}\in S$. Thanks to the continuity of $d$ we get
(60) $\limsup_{n\to+\infty}d(x_{t_n})=\lim_{l\to+\infty}d(x_{t_l})=d(\hat{x})=0$.
By combining (56), (57) and (60), it yields
$0\le(1-\alpha)\limsup_{n\to+\infty}d(x_n)\le\limsup_{n\to+\infty}a_n\le\limsup_{n\to+\infty}d(x_{t_n})=0$,
which, together with $d(x_n)\ge0$ for all $n\ge0$, implies $\lim_{n\to+\infty}d(x_n)=0$.
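The Case-2 index sequence $t_n=\max\{k\le n:k\in V\}$ can be visualised with a tiny sketch. The set $V$ below is a hypothetical, hand-picked example chosen only to illustrate the mechanism; in Case 2 one only knows that $V$ is infinite.

```python
# Sketch of the Case-2 construction t_n = max{k <= n : k in V}; the set V of
# "good" indices is hypothetical, chosen only to illustrate the mechanism.
V = {2, 3, 7, 11}  # in Case 2 this set is infinite; truncated for the demo

def t(n):
    # well-defined as soon as {k <= n : k in V} is non-empty (here: n >= 2)
    return max(k for k in V if k <= n)

print([t(n) for n in range(2, 13)])  # [2, 3, 3, 3, 3, 7, 7, 7, 7, 11, 11]
```

The printed sequence is non-decreasing and equals $n$ exactly at the indices $n\in V$, which is the behaviour the proof exploits.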

  2. According to statement 1, we have $\lim_{n\to+\infty}d(x_n)=0$; since $d$ is convex and continuous, it is weakly lower semicontinuous, thus every weak cluster point of the sequence $(x_n)_{n\ge0}$ belongs to $S$. From Lemma 1.1 it follows that $(x_n)_{n\ge0}$ converges weakly to a point in $S$ as $n\to+\infty$.

  3. Since $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, from Proposition 3.4 (ii) we have that $\lim_{n\to+\infty}h(x_n)=0$. Since $(x_n)_{n\ge0}$ is bounded, there exist $\bar{r}>0$ and $\bar{\kappa}\in\mathbb{R}$ such that $\|x_n\|\le\bar{r}$ and $h(x_n)\le\bar{\kappa}$ for all $n\ge1$. Thanks to statement 2, the sequence $(x_n)_{n\ge0}$ converges weakly to an element of $S$. Therefore, according to Lemma 3.8, it converges strongly to this element of $S$.
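To make the convergence asserted above concrete, the following is a minimal numerical sketch on a toy convex bilevel instance in $\mathbb{R}^2$. The iteration is an illustrative inertial forward–backward penalty step of the general type studied in this paper, not the exact form of Algorithm 3.3; the functions $f$, $g$, $h$, the parameter choices and all variable names are assumptions made for the demo.

```python
import numpy as np

# Toy convex bilevel instance (illustrative only, not taken from the paper):
#   lower level:  h(x) = 0.5 * x[1]**2, so argmin h = {x : x[1] = 0}
#   upper level:  minimize f(x) + g(x) over argmin h, where
#     f(x) = tau * |x[0]|            (non-smooth, handled via its prox)
#     g(x) = 0.5 * ||x - (1, 1)||^2  (smooth, L_g = 1)
#   The unique bilevel solution is x* = (1 - tau, 0).
tau = 0.5
p = np.array([1.0, 1.0])

def grad_g(x):
    return x - p

def grad_h(x):
    return np.array([0.0, x[1]])

def prox_f(x, step):
    # soft-thresholding of the first coordinate only
    out = x.copy()
    out[0] = np.sign(x[0]) * max(abs(x[0]) - tau * step, 0.0)
    return out

alpha = 0.2                        # inertial parameter, 0 < alpha < 1/3
x_prev = np.array([2.0, 2.0])      # x_{-1}
x = x_prev.copy()                  # x_0
for n in range(1, 20001):
    lam = 1.0 / np.sqrt(n)         # step sizes: sum of lam_n diverges
    beta = np.sqrt(n)              # penalty parameters: lam_n * beta_n = 1
    y = x + alpha * (x - x_prev)   # inertial extrapolation
    x_prev, x = x, prox_f(y - lam * (grad_g(x) + beta * grad_h(x)), lam)

print(np.round(x, 2))  # close to the bilevel solution (0.5, 0.0)
```

The second coordinate is driven to the feasible set $\operatorname{argmin}h$ by the growing penalty $\beta_n\nabla h$, while the vanishing steps $\lambda_n$ with $\sum_n\lambda_n=+\infty$ let the upper-level objective select the point $(1-\tau,0)$ within it, mirroring the interplay of $(\lambda_n)$ and $(\beta_n)$ in the theorem's hypotheses.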

Acknowledgments

The second author gratefully acknowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO), which is funded by the Austrian Science Fund (FWF, project W1260-N35).

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

Research partially supported by FWF (Austrian Science Fund), projects I 2419-N32 and W1260-N35.

References

  • Attouch H, Czarnecki M-O. Asymptotic behavior of coupled dynamical systems with multiscale aspects. J Differ Equ. 2010;248(6):1315–1344.
  • Attouch H, Cabot A, Czarnecki M-O. Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans Am Math Soc. 2018;370(2):755–790.
  • Attouch H, Czarnecki M-O, Peypouquet J. Coupling forward–backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J Optim. 2011;21(4):1251–1274.
  • Attouch H, Czarnecki M-O, Peypouquet J. Prox-penalization and splitting methods for constrained variational problems. SIAM J Optim. 2011;21(1):149–173.
  • Banert S, Boţ RI. Backward penalty schemes for monotone inclusion problems. J Optim Theory Appl. 2015;166(3):930–948.
  • Boţ RI, Csetnek ER. A Tseng's type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J Math. 2014;42(4):451–465.
  • Boţ RI, Csetnek ER. Forward–backward and Tseng's type penalty schemes for monotone inclusion problems. Set-Valued Var Anal. 2014;22(2):313–331.
  • Boţ RI, Csetnek ER, Nimana N. Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data. Optim Lett. 2018;12(1):17–33.
  • Boţ RI, Csetnek ER, Nimana N. An inertial proximal-gradient penalization scheme for constrained convex optimization problems. Vietnam J Math. 2018;46(1):53–71.
  • Frankel P, Peypouquet J. Lagrangian-penalization algorithm for constrained optimization and variational inequalities. Set-Valued Var Anal. 2012;20(2):169–185.
  • Noun N, Peypouquet J. Forward–backward penalty scheme for constrained convex minimization without inf-compactness. J Optim Theory Appl. 2013;158(3):787–795.
  • Peypouquet J. Coupling the gradient method with a general exterior penalization scheme for convex minimization. J Optim Theory Appl. 2012;153(1):123–138.
  • Attouch H, Briceno-Arias LM, Combettes PL. A parallel splitting method for coupled monotone inclusions. SIAM J Control Optim. 2010;48(5):3246–3270.
  • Attouch H, Bolte J, Redont P, Soubeyran A. Alternating proximal algorithms for weakly coupled convex minimization problems, applications to dynamical games and PDE's. J Convex Anal. 2008;15(3):485–506.
  • Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9(1):3–11.
  • Polyak BT. Introduction to optimization. New York: Publications Division, Optimization Software Inc.; 1987. Translations Series in Mathematics and Engineering.
  • Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202.
  • Chambolle A, Dossal C. On the convergence of the iterates of the ‘Fast Iterative Shrinkage/Thresholding Algorithm’. J Optim Theory Appl. 2015;166(3):968–982.
  • Bertsekas DP. Nonlinear programming. Cambridge (MA): Athena Scientific; 1999.
  • Boţ RI, Csetnek ER, Laszlo SC. An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J Comput Optim. 2016;4(1):3–25.
  • Alvarez F. On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J Control Optim. 2000;38(4):1102–1119.
  • Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J Optim. 2004;14(3):773–782.
  • Attouch H, Czarnecki M-O. Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J Differ Equ. 2017;262(3):2745–2770.
  • Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J Optim. 2014;24(1):232–256.
  • Boţ RI, Csetnek ER. An inertial forward–backward–forward primal–dual splitting algorithm for solving monotone inclusion problems. Numer Algorithms. 2016;71(3):519–540.
  • Boţ RI, Csetnek ER. Penalty schemes with inertial effects for monotone inclusion problems. Optimization. 2017;66(6):313–331.
  • Boţ RI, Csetnek ER, Hendrich C. Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl Math Comput. 2015;256(1):472–487.
  • Chen C, Chan RH, Ma S, Yang J. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J Imag Sci. 2015;8(4):2239–2267.
  • Chen C, Ma S, Yang J. A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J Optim. 2015;25(4):2120–2142.
  • Maingé P-E. Convergence theorems for inertial Krasnosel'skiĭ–Mann type algorithms. J Comput Appl Math. 2008;219(1):223–236.
  • Maingé P-E, Moudafi A. Convergence of new inertial proximal methods for dc programming. SIAM J Optim. 2008;19(1):397–413.
  • Moudafi A, Oliny M. Convergence of a splitting inertial proximal method for monotone operators. J Comput Appl Math. 2003;155(2):447–454.
  • Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. New York (NY): Springer; 2011. CMS Books in Mathematics.
  • Boţ RI. Conjugate duality in convex optimization. Vol. 637. Berlin, Heidelberg: Springer; 2010. Lecture Notes in Economics and Mathematical Systems.
  • Zălinescu C. Convex analysis in general vector spaces. Singapore: World Scientific; 2002.
  • Fitzpatrick S. Representing monotone operators by convex functions. Proceedings of the Centre for Mathematical Analysis. Workshop/Miniconference on Functional Analysis and Optimization, Vol. 20, Australian National University, Canberra; 1988.
  • Bauschke HH, McLaren DA, Sendov HS. Fitzpatrick functions: inequalities, examples and remarks on a problem by S. Fitzpatrick. J Convex Anal. 2006;13(3):499–523.
  • Borwein JM. Maximal monotonicity via convex analysis. J Convex Anal. 2006;13(3):561–586.
  • Burachik RS, Svaiter BF. Maximal monotone operators, convex functions and a special family of enlargements. Set-Valued Anal. 2002;10(4):297–316.
  • Combettes PL. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization. 2004;53(5–6):475–504.