ABSTRACT
We investigate a forward–backward splitting algorithm of penalty type with inertial effects for finding the zeros of the sum of a maximally monotone operator, a cocoercive operator, and the convex normal cone to the set of zeros of another cocoercive operator. Weak ergodic convergence is obtained for the iterates, provided that a condition expressed via the Fitzpatrick function of the operator describing the underlying set of the normal cone is verified. Under strong monotonicity assumptions, strong convergence of the sequence of generated iterates is proved. As a particular instance we consider a convex bilevel minimization problem with the sum of a non-smooth and a smooth function in the upper level and another smooth function in the lower level. We show that in this context weak non-ergodic and strong convergence can also be achieved under inf-compactness assumptions on the involved functions.
1. Introduction and preliminaries
1.1. Motivation and problem formulation
During the last couple of years one can observe in the optimization community an increasing interest in numerical schemes for solving variational inequalities expressed as monotone inclusion problems of the form
(1)
(1) where
is a real Hilbert space,
is a maximally monotone operator,
is the set of global minima of the proper, convex and lower semicontinuous function
and
is the normal cone of the set M. The article [Citation1] was the starting point for a series of papers [Citation2–12] addressing this topic or related ones. All these papers share the common feature that the proposed iterative schemes use penalization strategies, namely, they evaluate the penalty function h via its gradient, in case the function is smooth (see, for instance, [Citation3]), and via its proximal operator, in case it is non-smooth (see, for instance, [Citation4]).
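On the real line the two penalization strategies can be sketched side by side; the penalty function h, the step size and the test point below are illustrative choices, not data from the paper:

```python
import numpy as np

def grad_step(x, grad_h, lam):
    """Forward (gradient) step on a smooth penalty h."""
    return x - lam * grad_h(x)

def prox_abs(x, lam):
    """Proximal (backward) step for the non-smooth penalty h = |.|, i.e.
    soft-thresholding: prox_{lam|.|}(x) = sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.0, -0.3, 0.5])
print(grad_step(x, lambda z: 2 * z, 0.25))  # gradient step on h(z) = z**2
print(prox_abs(x, 0.5))                     # proximal step on h(z) = |z|
```

Soft-thresholding is the standard closed form of the proximal operator of a multiple of the absolute value.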
Weak ergodic convergence has been obtained in [Citation3,Citation4] under the hypothesis:
(2)
(2) with
, the sequence of step sizes,
, the sequence of penalty parameters,
, the Fenchel conjugate function of h, and
the range of the normal cone operator
. Let us mention that (2) is the discretized counterpart of a condition introduced in [Citation1] for continuous-time non-autonomous differential inclusions.
One motivation for studying numerical algorithms for monotone inclusions of type (1) comes from the fact that, when
is the convex subdifferential of a proper, convex and lower semicontinuous function
, they furnish iterative methods for solving bilevel optimization problems of the form
(3)
(3) Among the applications where bilevel programming problems play an important role we mention the modelling of Stackelberg games, the determination of Wardrop equilibria for network flows, convex feasibility problems [Citation13], domain decomposition methods for PDEs [Citation14], image processing problems [Citation6], and optimal control problems [Citation4].
Later on, in [Citation7], the following monotone inclusion problem, which turned out to be more suitable for applications, was addressed in the same spirit of penalty algorithms
(4)
(4) where
is a maximally monotone operator,
is a cocoercive operator and the constraint set M is the set of zeros of another cocoercive operator
. The provided algorithm of forward–backward type evaluates the operator A by a backward step and the two single-valued operators by forward steps. For the convergence analysis, (2) has been replaced by a condition formulated in terms of the Fitzpatrick function associated with the operator B, which we will also use in this paper. In [Citation5], several particular situations for which this new condition is fulfilled have been provided.
The aim of this work is to endow the forward–backward penalty scheme for solving (4) from [Citation7] with inertial effects, which means that the new iterate is defined in terms of the previous two iterates. Inertial algorithms have their roots in the time discretization of second-order differential systems [Citation15]. They can accelerate the convergence of the iterates when minimizing a differentiable function [Citation16] and the convergence of the objective function values when minimizing the sum of a convex non-smooth and a convex smooth function [Citation17,Citation18]. Moreover, as emphasized in [Citation19], see also [Citation20], algorithms with inertial effects may detect optimal solutions of minimization problems which cannot be found by their non-inertial variants. In recent years, a huge interest in inertial algorithms can be noticed (see, for instance, [Citation8,Citation9,Citation15,Citation17,Citation20–32]).
We prove weak ergodic convergence of the sequence generated by the inertial forward–backward penalty algorithm to a solution of the monotone inclusion problem (4), under reasonable assumptions on the sequences of step sizes, penalty and inertial parameters. When the operator A is assumed to be strongly monotone, we also prove strong convergence of the generated iterates to the unique solution of (4).
In Section 3, we address the minimization of the sum of a convex non-smooth and a convex smooth function with respect to the set of minimizers of another convex and smooth function. Besides the convergence results obtained from the general case, we achieve weak non-ergodic and strong convergence statements under inf-compactness assumptions on the involved functions. The weak non-ergodic theorem is a useful alternative to the one in [Citation9], where a similar statement has been obtained for the inertial forward–backward penalty algorithm with constant inertial parameter under assumptions which are quite complicated and hard to verify (see also [Citation11,Citation12]).
1.2. Notations and preliminaries
In this subsection we introduce some notions and basic results which we will use throughout this paper (see [Citation33–35]). Let be a real Hilbert space with inner product
and associated norm
.
For a function , we denote
its effective domain and say that Ψ is proper, if
and
for all
. The conjugate function of Ψ is
. The convex subdifferential of Ψ at the point
is the set
, whenever
. We take by convention
, if
.
Let M be a non-empty subset of . The indicator function of M, which is denoted by
, takes the value 0 on M and
otherwise. The convex subdifferential of the indicator function is the normal cone of M, that is
, if
, and
otherwise. Notice that for
we have
if and only if
, where
is the support function of M.
For an arbitrary set-valued operator we denote by
its graph, by
its domain, by
its range and by
its inverse operator, defined by
if and only if
. We also use the notation
for the set of zeros of the operator A. We say that A is monotone, if
for all
. A monotone operator A is said to be maximally monotone, if there exists no proper monotone extension of the graph of A on
. Let us mention that if A is maximally monotone, then
is a convex and closed set [Citation33, Proposition 23.39]. We refer to [Citation33, Section 23.4] for conditions ensuring that
is non-empty. If A is maximally monotone, then one has the following characterization for the set of its zeros
(5)
(5) The operator A is said to be γ-strongly monotone with
, if
for all
. If A is maximally monotone and strongly monotone, then
is a singleton, thus non-empty [Citation33, Corollary 23.27].
The resolvent of , is defined by
, where
denotes the identity operator on
. If A is maximally monotone, then
is single-valued and maximally monotone [Citation33, Proposition 23.7, Corollary 23.10]. For an arbitrary
, we have the following identity [Citation33, Proposition 23.18]
We denote
the family of proper, convex and lower semicontinuous extended real-valued functions defined on
. When
and
, we denote by
the proximal point with parameter γ of function Ψ at point
, which is the unique optimal solution of the optimization problem
Notice that
, thus
is a single-valued operator fulfilling the so-called Moreau's decomposition formula:
The function
is said to be
strongly convex with
, if
is a convex function. This property implies that
is
strongly monotone.
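Moreau's decomposition can be checked numerically for Ψ = |·| on the real line, whose conjugate is the indicator of [−1, 1]; a minimal sanity check (γ is an arbitrary illustrative parameter):

```python
import numpy as np

def prox_abs(x, gamma):
    # prox_{gamma|.|}(x): soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_conj(x, gamma):
    # Psi = |.| has conjugate Psi* = indicator of [-1, 1], whose prox
    # (for any parameter gamma) is the projection onto [-1, 1]
    return np.clip(x, -1.0, 1.0)

gamma = 0.7
x = np.linspace(-3, 3, 13)
# Moreau: prox_{gamma Psi}(x) + gamma * prox_{Psi*/gamma}(x / gamma) = x
lhs = prox_abs(x, gamma) + gamma * prox_conj(x / gamma, 1.0 / gamma)
assert np.allclose(lhs, x)
```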
The Fitzpatrick function [Citation36] associated with a monotone operator A is defined as
and it is a convex and lower semicontinuous function. For insights into the outstanding role played by the Fitzpatrick function in relating convex analysis to the theory of monotone operators we refer to [Citation33,Citation34,Citation37–39] and the references therein. If A is maximally monotone, then
is proper and it fulfills
with equality if and only if
. Notice that if
, then
is a maximally monotone operator and it holds
. Furthermore, the following inequality is true (see [Citation37]):
(6)
(6) In what follows we present some statements that will be essential when carrying out the convergence analysis. Let
be a sequence in
and
be a sequence of positive real numbers. The sequence of weighted averages
is defined for every
as
(7)
(7)
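The weighted averages in (7) can be computed with a cumulative sum; a minimal sketch with generic positive weights w_k (the exact weights of (7) are as defined in the paper):

```python
import numpy as np

def weighted_averages(xs, weights):
    """Ergodic averages z_n = sum_{k<=n} w_k x_k / sum_{k<=n} w_k."""
    xs = np.asarray(xs, dtype=float)
    w = np.asarray(weights, dtype=float)
    num = np.cumsum(w[:, None] * xs, axis=0)  # running weighted sums
    den = np.cumsum(w)[:, None]               # running total weights
    return num / den

xs = [[1.0], [0.0], [1.0], [0.0]]
print(weighted_averages(xs, [1.0, 1.0, 1.0, 1.0]).ravel())
```

The averaged sequence can converge even when the underlying iterates oscillate, which is exactly the point of weak ergodic convergence statements.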
Lemma 1.1
Opial-Passty
Let Z be a non-empty subset of and assume that the limit
exists for every element
. If every sequential weak cluster point of
respectively
lies in Z, then the sequence
respectively
converges weakly to an element in Z as
.
The following two results can be found in [Citation5,Citation7].
Lemma 1.2
Let and
be sequences in
with
. If there exists
such that
and α such that
then the following statements are true:
, where
;
the limit
exists.
the sequence
belongs to
.
The following result follows from Lemma 1.2, applied in case and
for all
, where ρ is a lower bound for
.
Lemma 1.3
Let be a sequence in
, which is bounded from below, and
,
be sequences in
with
. If there exists
such that
then the following statements are true:
the sequence
is convergent.
the sequence
belongs to
.
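The convergence mechanism behind Lemma 1.3 — decrease up to a summable perturbation — can be illustrated by a toy recursion; the contraction factor and the error sequence 1/n² below are illustrative choices:

```python
# A non-negative sequence that decreases up to a summable perturbation
# converges; here a_{n+1} = 0.5 * a_n + 1/n^2 (illustrative recursion).
a = 1.0
values = [a]
for n in range(1, 20000):
    a = 0.5 * a + 1.0 / n**2   # decrease term plus summable error
    values.append(a)

# the tail settles down, indicating convergence (here to 0)
assert abs(values[-1] - values[-2]) < 1e-6
```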
The following result, which will be useful in this work, shows that statement (ii) in Lemma 1.3 can also be obtained when is not bounded from below, but has a particular form.
Lemma 1.4
Let be a sequence in
and
be sequences in
with
and
where
are sequences in
and there exists α such that
If there exists
such that
(8)
(8) then the sequence
belongs to
.
Proof.
We fix an integer , sum up the inequalities in (8) for
and obtain
(9)
(9) Hence the sequence
is bounded from above. Let
be an upper bound of this sequence. For all
it holds
from which we deduce that
(10)
(10) By induction we obtain for all
(11)
(11) Then inequality (9) combined with (10) and (11) leads to
(12)
(12)
We let
converge to
and obtain that
.
2. The general monotone inclusion problem
In this section we address the following monotone inclusion problem.
Problem 2.1
Let be a real Hilbert space,
a maximally monotone operator,
an
cocoercive with
a
cocoercive with
and assume that
. The monotone inclusion problem to solve reads
The following forward–backward penalty algorithm with inertial effects for solving Problem 2.1 will be in the focus of our investigations in this paper.
When D=0 and , where
is a convex and differentiable function with
-Lipschitz continuous gradient with
fulfilling
, then Problem 2.1 recovers the monotone inclusion problem addressed in [Citation3, Section 3] and Algorithm 2.2 can be seen as an inertial version of the iterative scheme considered there. When B=0, we have that
and Algorithm 2.2 is nothing else than the inertial version of the classical forward–backward algorithm (see for instance [Citation33,Citation40]).
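A generic iteration of this type — inertial extrapolation, forward steps on D and on an increasingly penalized B, and a backward (resolvent) step on A — can be sketched on a toy instance; all operators and parameter choices below are illustrative and do not reproduce the exact update or parameter conditions of Algorithm 2.2:

```python
import numpy as np

# Illustrative instance: A = 0 (so the resolvent J_{lam A} = Id),
# D = grad g with g(x) = 0.5*||x - (1,1)||^2, and B(x) = (0, x2),
# so M = zer B is the x1-axis; the constrained solution is (1, 0).
D = lambda x: x - np.array([1.0, 1.0])
B = lambda x: np.array([0.0, x[1]])

lam, alpha = 0.05, 0.2            # step size and inertial parameter (illustrative)
x_prev = x = np.zeros(2)
for n in range(1, 5001):
    beta_n = 0.5 * np.sqrt(n)     # penalty parameters beta_n -> infinity
    y = x + alpha * (x - x_prev)  # inertial extrapolation
    # forward steps on D and beta_n * B; J_{lam A} = Id here
    x_prev, x = x, y - lam * D(x) - lam * beta_n * B(x)

print(x)  # approaches the constrained solution (1, 0)
```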
Hypotheses 2.2
The convergence analysis will be carried out under the following hypotheses (see also [Citation7]):
;
for every
.
Since A and are maximally monotone operators, the sum
is maximally monotone, provided some specific regularity conditions are fulfilled (see [Citation33–35,Citation38]). Furthermore, since D is also maximally monotone [Citation33, Example 20.28] and
, if
is maximally monotone, then
is also maximally monotone.
Let us also notice that for there exists
such that
, hence, for every
it holds
For situations where (
) is satisfied we refer the reader to [Citation5,Citation8,Citation9,Citation11].
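The defining properties of the Fitzpatrick function — it majorizes the inner product everywhere, with equality exactly on the graph — can be verified numerically for a simple operator; we take B = Id on the real line, for which the supremum has the closed form (x+u)²/4 (illustrative example):

```python
import numpy as np

# For B = Id on R (maximally monotone), the Fitzpatrick function is
# F_B(x, u) = sup_y { x*y + y*u - y^2 } = (x + u)^2 / 4.
def fitzpatrick_id(x, u, ys=np.linspace(-50, 50, 200001)):
    return np.max(x * ys + ys * u - ys**2)   # supremum over a fine grid

for x, u in [(1.0, 2.0), (-0.5, 3.0), (2.0, 2.0)]:
    F = fitzpatrick_id(x, u)
    assert abs(F - (x + u) ** 2 / 4) < 1e-4  # closed form
    assert F >= x * u - 1e-9                 # F_B(x, u) >= <x, u> always
# equality holds exactly on the graph, i.e. when u = Bx = x
assert abs(fitzpatrick_id(2.0, 2.0) - 2.0 * 2.0) < 1e-4
```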
Before formulating the main theorem of this section we will prove some useful technical results.
Lemma 2.3
Let be the sequence generated by Algorithm 2.2 and
be an element in
such that y=v+Du+p with
and
. Further, let
be such that
. Then the following inequality holds for all
(13)
(13)
Proof.
Let be fixed. According to the definition of the resolvent of the operator A we have
(14)
(14) and, since
, the monotonicity of A guarantees
(15)
(15) or, equivalently,
(16)
(16) For the term on the left-hand side of (16) we have
(17)
(17) Since
and
by adding the two inequalities, we obtain the following estimate for the second term on the right-hand side of (16)
(18)
(18)
We now turn our attention to the first term on the right-hand side of (16), which can be written as
(19)
(19)
We have
(20)
(20) and
(21)
(21)
On the other hand, we have
(22)
(22)
Since
and Bu=0, the cocoercivity of B gives us
(23)
(23) Similarly, the cocoercivity of D gives us
(24)
(24) Combining (23)–(24) with (22) and by using the definition of the Fitzpatrick function and the fact that
, we obtain
(25)
(25)
The inequalities (20), (21) and (25) lead to
(26)
(26)
Finally, by combining (17), (18) and (26), we obtain (13).
From now on we will assume that for the constants
and the sequences
and
are chosen such that
As a consequence, there exists
, which means that for all
it holds
(27)
(27) On the other hand, there exists
, which means that for all
it holds
(28)
(28)
Remark 2.4
Since
, one can always find
such that
. One possible choice is
and
. From the second inequality in
it follows that
.
As
it is always possible to choose s such that
. Since in this case
, one has (27).
The following proposition brings us closer to the convergence result.
Proposition 2.5
Let ,
and the sequences
and
satisfy condition
. Let
be the sequence generated by Algorithm 2.2 and assume that the Hypotheses 2.2 are verified. Then the following statements are true:
the sequence
belongs to
and the sequence
belongs to
;
if, moreover,
, then
and thus every cluster point of the sequence
lies in M.
for every
, the limit
exists.
Proof.
Since , there exists an integer
such that
for all
. According to Lemma 2.3, for every
such that y=v+Du+p, with
and
, and all
the following inequality holds
(29)
(29)
We consider
, which means that we can take y=0 in (29). For all
we denote
(30)
(30) and
(31)
(31) Using that
is non-decreasing, for all
it yields
(32)
(32)
where s,t>0 are chosen according to (27) and (28), respectively.
Thanks to () and (C) it holds
(33)
(33) Hence, according to Lemma 1.4, we obtain
(34)
(34) which proves (i). If, in addition,
, then
, which means that every cluster point of the sequence
lies in
.
In order to prove (iii), we consider again the inequality (29) for an arbitrary element
and y=0. With the notations in (30) and (31), we get for all
(35)
(35) According to (33) and (34) we have
(36)
(36) therefore, by Lemma 1.2, the limit
exists, which means that the limit
exists, too.
Remark 2.6
The condition (C) that we imposed in combination with
on the sequence of inertial parameters
is the one proposed in [Citation15, Proposition 2.1] when addressing the convergence of the inertial proximal point algorithm. However, the statements in the proposition above and in the following convergence theorem remain valid if one alternatively assumes that there exists
such that
for all
and
This can be realized if one chooses for a fixed p>1
Indeed, in this situation we have that
for all
, which gives
Now we are ready to prove the main theorem of this section, which addresses the convergence of the sequence generated by Algorithm 2.2.
Theorem 2.8
Let ,
and the sequences
and
satisfy condition
. Let
be the sequence generated by Algorithm 2.2,
be the sequence defined in (7) and assume that the Hypotheses 2.2 are verified. Then the following statements are true:
the sequence
converges weakly to an element in
as
.
if A is γ-strongly monotone with
, then
converges strongly to the unique element in
as
.
Proof.
According to Proposition 2.5 (iii), the limit
exists for every
. Let z be a sequential weak cluster point of
. We will show that
, by using the characterization (5) of maximal monotonicity, and the conclusion will follow from Lemma 1.1. To this end we consider an arbitrary
such that y=v+Du+p, where
and
. From (29), with the notations (30) and (31), we have for all
(37)
(37) Recall from (33) that
. Since
is bounded, the sequence
is also bounded.
We fix an arbitrary integer
and sum up the inequalities in (37) for
. This yields
After dividing this last inequality by
, we obtain
(38)
(38) where
. By passing in (38) to the limit and by using that
, we get
As z is a sequential weak cluster point of
, the above inequality gives us
, which finally means that
.
Let
be the unique element in
. Since A is
strongly monotone with
, the formula in (15) reads for all
or, equivalently,
By using again (17), (18) and (26) we obtain for all
By using the notations in (30) and (31), this yields for all
By taking into account (36), from Lemma 1.2 we get
According to (C) we have
, which implies that the limit
must be equal to zero. This provides the desired conclusion.
3. Applications to convex bilevel programming
We will employ the results obtained in the previous section, in the context of monotone inclusions, to solve convex bilevel programming problems.
Problem 3.1
Let be a real Hilbert space,
a proper, convex and lower semicontinuous function and
differentiable functions with
-Lipschitz continuous and, respectively,
-Lipschitz continuous gradients. Suppose that
and
. The bilevel programming problem to solve reads
The assumption is not restrictive as, otherwise, one can replace h with
.
Hypotheses 3.2
The convergence analysis will be carried out under the following hypotheses:
;
for every
.
In the above hypotheses, we have that and hence
. Since according to the Theorem of Baillon–Haddad (see, e.g. [Citation33, Corollary 18.16]),
and
are
-cocoercive and, respectively,
-cocoercive, and
, solving the bilevel programming problem in Problem 3.1 reduces to solving the monotone inclusion
By applying Algorithm 2.2 to this setting, we obtain the following iterative scheme.
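A proximal-gradient penalty iteration with inertial effects of this kind can be sketched on a toy bilevel instance; the functions, step size and penalty parameters below are illustrative and do not reproduce the exact update or parameter conditions of Algorithm 3.3:

```python
import numpy as np

# Illustrative bilevel instance on R^2: upper level f(x) = 0.5*||x||_1
# (non-smooth) plus g(x) = 0.5*||x - (2,1)||^2 (smooth); lower level
# h(x) = 0.5*x2^2, whose set of minimizers is the x1-axis.
# The solution is (1.5, 0).
prox_f = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - 0.5 * t, 0.0)
grad_g = lambda x: x - np.array([2.0, 1.0])
grad_h = lambda x: np.array([0.0, x[1]])

lam, alpha = 0.05, 0.2            # step size and inertial parameter (illustrative)
x_prev = x = np.zeros(2)
for n in range(1, 5001):
    beta_n = 0.5 * np.sqrt(n)     # penalty parameters beta_n -> infinity
    y = x + alpha * (x - x_prev)  # inertial extrapolation
    # gradient steps on g and the penalized h, proximal step on f
    x_prev, x = x, prox_f(y - lam * grad_g(x) - lam * beta_n * grad_h(x), lam)

print(x)  # approaches the solution (1.5, 0)
```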
By using the inequality (6), one can easily notice that
implies
, which means that the convergence statements for Algorithm 3.3 can be derived as particular instances of the ones obtained in the previous section.
Alternatively, one can use to this end the following lemma and employ the same ideas and techniques as in Section 2. Lemma 3.3 is similar to Lemma 2.3; however, it will allow us to provide convergence statements also for the sequence of function values .
Lemma 3.3
Let be the sequence generated by Algorithm 3.3 and
be an element in
such that
with
and
. Further, let
be such that
. Then the following inequality holds for all
Proof.
Let be fixed. The proof follows by combining the estimates used in the proof of Lemma 2.3 with some inequalities which better exploit the convexity of h. From (23) we have
Since h is convex, the following relation also holds
Summing up the two inequalities above gives
Using the same techniques as in the derivation of (25), we get
With these improved estimates, the conclusion follows as in the proof of Lemma 2.3.
By using now Lemma 3.3, one obtains, after slightly adapting the proof of Proposition 2.5, the following result.
Proposition 3.4
Let ,
and the sequences
and
satisfy condition
. Let
be the sequence generated by Algorithm 3.3 and assume that the Hypotheses 3.2 are verified. Then the following statements are true:
the sequence
belongs to
and the sequences
and
belong to
;
if, moreover,
, then
and thus every cluster point of the sequence
lies in
.
for every
, the limit
exists.
Finally, the above proposition leads to the following convergence result.
Theorem 3.6
Let ,
and the sequences
and
satisfy condition
. Let
be the sequence generated by Algorithm 3.3,
be the sequence defined in (7) and assume that the Hypotheses 3.2 are verified. Then the following statements are true:
the sequence
converges weakly to an element in
as
.
if f is
strongly convex with
, then
converges strongly to the unique element in
as
.
In what follows we will show that under inf-compactness assumptions one can achieve weak non-ergodic convergence for the sequence . Weak non-ergodic convergence has been obtained for Algorithm 3.3 in [Citation9] when
for all
and for restrictive choices for both the sequence of step sizes and penalty parameters.
We denote by . For every element x in
, we denote by
the distance from x to
. In particular,
, where
denotes the projection of x onto
. The projection operator
is firmly non-expansive [Citation33, Proposition 4.8], that is
(39)
(39) Denoting
for all
, one has that
is differentiable and it holds
for all
.
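The firm non-expansiveness (39) of the projection and the gradient formula for d can be checked numerically; here M is taken, for illustration, as the closed unit ball, and d is assumed to be half the squared distance (as is standard in this setting):

```python
import numpy as np

# Projection onto M = closed unit ball in R^2; firm non-expansiveness (39)
# states ||Px - Py||^2 <= <Px - Py, x - y> for all x, y.
def proj_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    px, py = proj_ball(x), proj_ball(y)
    assert np.dot(px - py, px - py) <= np.dot(px - py, x - y) + 1e-12

# d(x) = 0.5 * dist(x, M)^2 has gradient x - P_M(x): finite-difference check
d = lambda x: 0.5 * np.linalg.norm(x - proj_ball(x)) ** 2
x = np.array([2.0, 1.0])
eps = 1e-6
fd = np.array([(d(x + eps * e) - d(x - eps * e)) / (2 * eps) for e in np.eye(2)])
assert np.allclose(fd, x - proj_ball(x), atol=1e-5)
```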
Lemma 3.6
Let be the sequence generated by Algorithm 3.3 and assume that the Hypotheses 3.2 are verified. Then the following inequality holds for all
(40)
(40)
Proof.
Let be fixed. Since d is convex, we have
(41)
(41) Then there exists
such that (see (14))
and, so,
(42)
(42)
Since , we get
(43)
(43) Using the convexity of g it follows
(44)
(44) On the other hand, the Descent Lemma gives
(45)
(45) By adding (44) and (45), it yields
(46)
(46) Using the
cocoercivity of
combined with the fact that
(as
belongs to
), it yields
Therefore
(47)
(47)
Further, we have
and
By adding the two relations above, we obtain
(48)
(48)
By combining (43), (46), (47) and (48) with (42) we obtain the desired conclusion.
Definition 3.7
A function is said to be inf-compact if for every r>0 and
the set
is relatively compact in
.
A useful property of inf-compact functions follows.
Lemma 3.8
Let be inf-compact and
be a bounded sequence in
such that
is bounded as well. If the sequence
converges weakly to an element in
as
, then it converges strongly to this element.
Proof.
Let be and
such that for all
Hence,
belongs to the set
, which is relatively compact. Then
has at least one strongly convergent subsequence. Since every strongly convergent subsequence
of
has as limit
, the desired conclusion follows.
We can now formulate the weak non-ergodic convergence result.
Theorem 3.10
Let ,
, the sequences
and
satisfy the condition
,
be the sequence generated by Algorithm 3.3, and assume that the Hypotheses 3.2 are verified and that either f+g or h is inf-compact. Then the following statements are true:
;
the sequence
converges weakly to an element in
as
;
if h is inf-compact, then the sequence
converges strongly to an element in
as
.
Proof.
Thanks to Lemma 3.6, for all
we have
(49)
(49) where
From Proposition 3.4 (i), combined with the fact that both sequences
and
are bounded, it follows that
.
In general, since
is not necessarily included in
, we have to treat two different cases.
Case 1: There exists an integer
such that
for all
. In this case, we obtain from Lemma 1.2 that:
the limit
exists.
. Moreover, since
, we must have
(50)
(50)
Consider a subsequence
of
such that
and note that, thanks to (50), the sequence
is bounded. From Proposition 3.4 (ii)–(iii) we get that also
and
are bounded. Thus, since either f+g or h is inf-compact, there exists a subsequence
of
, which converges strongly to an element
as
. According to Proposition 3.4 (ii)–(iii),
belongs to
. On the other hand,
(51)
(51) We deduce from (50)–(51) that
, or in other words, that
. In conclusion, thanks to the continuity of d,
Case 2: For all
there exists some
such that
. We define the set
There exists an integer
such that for all
the set
is non-empty. Hence, for all
the number
is well-defined. By definition
for all
and moreover the sequence
is non-decreasing and
. Indeed, if
, then for all
it holds
, a contradiction. Choose an integer
.
If
, then, for all
, since
, the inequality (49) gives
(52)
(52) Summing (52) for
and using that
is non-decreasing, it yields
(53)
(53)
If
, then
and we have
(54)
(54)
For all
we define
. In both cases it yields
(55)
(55) Passing in (55) to the limit as
we obtain that
(56)
(56) Let be
. For all
we have
which shows that
is bounded, as
exists. We obtain
(57)
(57) Further, for all
we have
, which gives
(58)
(58) This means that the sequence
is bounded from above. Consider a subsequence
of
such that
From Proposition 3.4 (ii)–(iii) we get that also
and
are bounded. Thus, since either f+g or h is inf-compact, there exists a subsequence
of
, which converges strongly to an element
as
. According to Proposition 3.4 (ii)–(iii),
belongs to
. Furthermore, it holds
(59)
(59) We deduce from (58) and (59) that
which gives
. Thanks to the continuity of d we get
(60)
(60) By combining (56), (57) and (60), it yields
which implies
and thus
According to (i) we have
, thus every weak cluster point of the sequence
belongs to
. From Lemma 1.1 it follows that
converges weakly to a point in
as
.
Since
, from Proposition 3.4(ii) we have that
Since
is bounded, there exist
and
such that for all
Thanks to (ii) the sequence
converges weakly to an element in
. Therefore, according to Lemma 3.8, it converges strongly to this element in
.
Acknowledgments
The second author gratefully acknowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO), which is funded by the Austrian Science Fund (FWF, project W1260-N35).
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Radu Ioan Boţ http://orcid.org/0000-0002-4469-314X
References
- Attouch H, Czarnecki M-O. Asymptotic behavior of coupled dynamical systems with multiscale aspects. J Differ Equ. 2010;248(6):1315–1344.
- Attouch H, Cabot A, Czarnecki M-O. Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans Am Math Soc. 2018;370(2):755–790.
- Attouch H, Czarnecki M-O, Peypouquet J. Coupling forward–backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J Optim. 2011;21(4):1251–1274.
- Attouch H, Czarnecki M-O, Peypouquet J. Prox-penalization and splitting methods for constrained variational problems. SIAM J Optim. 2011;21(1):149–173.
- Banert S, Boţ RI. Backward penalty schemes for monotone inclusion problems. J Optim Theory Appl. 2015;166(3):930–948.
- Boţ RI, Csetnek ER. A Tseng's type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J Math. 2014;42(4):451–465.
- Boţ RI, Csetnek ER. Forward–backward and Tseng's type penalty schemes for monotone inclusion problems. Set-Valued Var Anal. 2014;22(2):313–331.
- Boţ RI, Csetnek ER, Nimana N. Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data. Optim Lett. 2018;12(1):17–33.
- Boţ RI, Csetnek ER, Nimana N. An inertial proximal-gradient penalization scheme for constrained convex optimization problems. Vietnam J Math. 2018;46(1):53–71.
- Frankel P, Peypouquet J. Lagrangian-penalization algorithm for constrained optimization and variational inequalities. Set-Valued Var Anal. 2012;20(2):169–185.
- Noun N, Peypouquet J. Forward–backward penalty scheme for constrained convex minimization without inf-compactness. J Optim Theory Appl. 2013;158(3):787–795.
- Peypouquet J. Coupling the gradient method with a general exterior penalization scheme for convex minimization. J Optim Theory Appl. 2012;153(1):123–138.
- Attouch H, Briceno-Arias LM, Combettes PL. A parallel splitting method for coupled monotone inclusions. SIAM J Control Optim. 2010;48(5):3246–3270.
- Attouch H, Bolte J, Redont P, Soubeyran A. Alternating proximal algorithms for weakly coupled convex minimization problems, applications to dynamical games and PDE's. J Convex Anal. 2008;15(3):485–506.
- Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9(1):3–11.
- Polyak BT. Introduction to optimization. New York: Publications Division, Optimization Software Inc.; 1987. Translations Series in Mathematics and Engineering.
- Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202.
- Chambolle A, Dossal C. On the convergence of the iterates of the ‘Fast Iterative Shrinkage/Thresholding Algorithm’. J Optim Theory Appl. 2015;166(3):968–982.
- Bertsekas DP. Nonlinear programming. Cambridge (MA): Athena Scientific; 1999.
- Boţ RI, Csetnek ER, Laszlo SC. An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J Comput Optim. 2016;4(1):3–25.
- Alvarez F. On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J Control Optim. 2000;38(4):1102–1119.
- Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J Optim. 2004;14(3):773–782.
- Attouch H, Czarnecki M-O. Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J Differ Equ. 2017;262(3):2745–2770.
- Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J Optim. 2014;24(1):232–256.
- Boţ RI, Csetnek ER. An inertial forward–backward–forward primal–dual splitting algorithm for solving monotone inclusion problems. Numer Algorithms. 2016;71(3):519–540.
- Boţ RI, Csetnek ER. Penalty schemes with inertial effects for monotone inclusion problems. Optimization. 2017;66(6):313–331.
- Boţ RI, Csetnek ER, Hendrich C. Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl Math Comput. 2015;256(1):472–487.
- Chen C, Chan RH, Ma S, Yang J. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J Imag Sci. 2015;8(4):2239–2267.
- Chen C, Ma S, Yang J. A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J Optim. 2015;25(4):2120–2142.
- Maingé P-E. Convergence theorems for inertial Krasnosel'skiĭ–Mann type algorithms. J Comput Appl Math. 2008;219(1):223–236.
- Maingé P-E, Moudafi A. Convergence of new inertial proximal methods for dc programming. SIAM J Optim. 2008;19(1):397–413.
- Moudafi A, Oliny M. Convergence of a splitting inertial proximal method for monotone operators. J Comput Appl Math. 2003;155(2):447–454.
- Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. New York (NY): Springer; CMS Books in Mathematics; 2011.
- Boţ RI. Conjugate duality in convex optimization. Vol. 637. Berlin, Heidelberg: Springer; 2010. Lecture Notes in Economics and Mathematical Systems.
- Zălinescu C. Convex analysis in general vector spaces. Singapore: World Scientific; 2002.
- Fitzpatrick S. Representing monotone operators by convex functions. Proceedings of the Centre for Mathematical Analysis. Workshop/Miniconference on Functional Analysis and Optimization, Vol. 20, Australian National University, Canberra; 1988.
- Bauschke HH, McLaren DA, Sendov HS. Fitzpatrick functions: inequalities, examples and remarks on a problem by S. Fitzpatrick. J Convex Anal. 2006;13(3):499–523.
- Borwein JM. Maximal monotonicity via convex analysis. J Convex Anal. 2006;13(3):561–586.
- Burachik RS, Svaiter BF. Maximal monotone operators, convex functions and a special family of enlargements. Set-Valued Anal. 2002;10(4):297–316.
- Combettes PL. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization. 2004;53(5–6):475–504.