![MathJax Logo](/templates/jsp/_style2/_tandf/pb2/images/math-jax.gif)
ABSTRACT
We investigate a second-order dynamical system with variable damping in connection with the minimization of a nonconvex differentiable function. The dynamical system is formulated in the spirit of the differential equation which models Nesterov's accelerated convex gradient method. We show that the generated trajectory converges to a critical point, if a regularization of the objective function satisfies the Kurdyka- Lojasiewicz property. We also provide convergence rates for the trajectory formulated in terms of the Lojasiewicz exponent.
COMMUNICATED BY:
1. Introduction
Consider the (not necessarily convex) optimization problem
(1)
(1) where
is a Fréchet differentiable function with
-Lipschitz continuous gradient, i.e. there exists
such that
for all
. We associate to (Equation1
(1)
(1) ) the second-order dynamical system (for
)
(2)
(2) where
and
.
The study of the system (Equation2(2)
(2) ) is motivated by the recent developments related to the approaching of the solving of convex optimization problems from a continuous perspective.
In [Citation1], Su, Boyd and Candes proposed the following dynamical system
(3)
(3) as the continuous counterpart of the Nesterov's accelerated gradient method [see [Citation2]] for minimizing g in the convex case. This research has been deepened by Attouch and his co-authors [see [Citation3,Citation4]], who proved that, if
, then the generated trajectory
converges to a minimizer of g as
, while the convergence rate of the objective function along the trajectory is
. The convergence of the trajectory is actually the continuous counterpart of a result due to Chambolle and Dossal [see [Citation5]], which proves the convergence of the iterates of the modified FISTA algorithm [see [Citation6]].
Recently, in [Citation7], investigations have been performed concerning the convergence rate of the objective function along the trajectory in the subcritical case , while some open questions related to the asymptotic properties of the trajectory have been formulated.
In this manuscript, we carry out, in the nonconvex setting, an asymptotic analysis of the dynamical system (Equation2(2)
(2) ), which can be seen as a perturbation of the dynamical system (Equation3
(3)
(3) ) that models Nesterov's accelerated gradient method in the convex case. To the best of our knowledge, this is the first contribution addressing
second-order dynamical systems with variable damping associated to nonconvex optimization problems. We show that the generated trajectory converges to a critical point of g as
, provided the following regularization of g,
satisfies the Kurdyka–Łojasiewicz inequality. Moreover, we derive convergence rates in the terms of Łojasiewicz exponent, for the trajectory, its velocity and its acceleration. One of the major future goals is to study the asymptotic properties of the system (Equation2
(2)
(2) ) in case
. For other investigations of the asymptotic analysis of
second-order dynamical systems with time-dependent damping, we refer to the papers of Haraux and Jendoubi [Citation8] and Balti [Citation9].
For , the convergence of the trajectory generated by (Equation2
(2)
(2) ) to a critical point of g has been shown by Bégout, Bolte and Jendoubi in [Citation10] in the hypothesis that g is of class
and it satisfies the Kurdyka–Łojasiewicz property with a desingularizing function satisfying a restrictive condition [see also the papers of Haraux and Jendoubi [Citation11] and Chill and Jendoubi [Citation12]]. On the other hand, the dynamical system (Equation2
(2)
(2) ) is, for
, a particular instance of the second-order dynamical system of proximal-gradient type studied in [Citation13].
The following numerical scheme, with starting points ,
(4)
(4) where
is the step size, can be seen as a discrete counterpart of (Equation2
(2)
(2) ). One can notice that for
, this iterative scheme algorithm is similar to Nesterov's accelerated convex gradient method.
In the following, we prove that (Equation2(2)
(2) ) can be seen in an informal way as the exact limit of (Equation4
(4)
(4) )). We take to this end in (Equation4
(4)
(4) ) small step sizes and follow the same approach as Su, Boyd and Candes in [Citation1, Section 2]. For this purpose, we rewrite (Equation4
(4)
(4) ) in the form
(5)
(5) and introduce the Ansatz
for some twice continuously differentiable function
. We let
and get
Then, as the step size s goes to zero, from the Taylor expansion of x, we obtain
and
Furthermore, since
it follows
. Consequently, (Equation5
(5)
(5) ) can be written as
or, equivalently
Hence,
After dividing by
and letting
, we obtain
which, after division by t, gives (Equation2
(2)
(2) ), namely
2. Existence and uniqueness of the trajectory
We consider on the finite-dimensional space the Euclidean topology. If
is a local minimizer of g, then
. We denote by
the set of critical points of g.
We are considering in the asymptotic analysis of the dynamical system (Equation2(2)
(2) ) strong global solutions.
Definition 2.1
We say that is a strong global solution of (Equation2
(2)
(2) ), if the following properties are satisfied:
are locally absolutely continuous, in other words, absolutely continuous on each interval
for
;
for almost every
;
and
.
Recall that a function is absolutely continuous on an interval
, if there exists an integrable function
such that
It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative
by the integration formula above. On the other hand, if
(where
) is absolutely continuous and
is L-Lipschitz continuous (where
), then the function
is absolutely continuous, too. Moreover,
is almost everywhere differentiable and the inequality
holds for almost every
[see [Citation14,Citation15]].
We prove existence and uniqueness of a strong global solution of (Equation2(2)
(2) ) by making use of the Cauchy–Lipschitz–Picard Theorem for absolutely continues trajectories [see for example [Citation16, Proposition 6.2.1], [Citation17, Theorem 54]]. The key argument is that one can rewrite (Equation2
(2)
(2) ) as a particular first-order dynamical system in a suitably chosen product space [see also [Citation18]].
Theorem 2.1
For every starting points there exists a unique strong global solution of the dynamical system (Equation2
(2)
(2) ).
Proof.
By making use of the notation , the system (Equation2
(2)
(2) ) can be rewritten as a first-order dynamical system:
(6)
(6) where
First we show that is
-Lipschitz continuous for every
and that the Lipschitz constant is a function of time with the property that
. Indeed, for every
, we have
Obviously, the Lipschitz constant function
is continuous, hence integrable, on
for all
, consequently,
.
Next we show that for all
. Let
be fixed. For
, one has
and the conclusion follows by the continuity of
on
.
The Cauchy–Lipschitz–Picard Theorem guarantees existence and uniqueness of the trajectory of the first-order dynamical system (Equation6(6)
(6) ) and thus of the second-order dynamical system (Equation2
(2)
(2) ).
The next result shows that the acceleration of the trajectory generated by (Equation2(2)
(2) ) is also locally absolutely continuous on
.
Proposition 2.1
For the starting points let x be the unique strong global solution of (Equation2
(2)
(2) ). Then
is locally absolutely continuous on
hence the third-order derivative
exists almost everywhere on
Proof.
Let T>0 be fixed. According to Theorem 2.1, is absolutely continuous on
. We endow the product space
with the 1-norm. For arbitrary
, we have
where
Let be . Since the functions
and
are absolutely continuous on
, there exists
such that for any finite family of intervals
, the implication
holds. Consequently,
hence
is absolutely continuous on
, which shows that
is absolutely continuous
. This proves that
is locally absolutely continuous on
, which means that the third-order derivative
exists almost everywhere on
The following results provides an estimate for the third-order derivative of the strong global solution of the dynamical system (Equation2(2)
(2) ) in terms its first- and second-order derivatives.
Lemma 2.1
For the starting points let x be the unique strong global solution of (Equation2
(2)
(2) ). Then, for almost every
it holds
(7)
(7)
Proof.
Let be such that
. We have for almost every h>0 that
Hence,
By taking the limit as
, we obtain
Since
, we conclude
Remark 2.1
For
we have that
for almost every
.
3. Convergence of trajectories
In this section, we study the convergence of the trajectory of the dynamical system (Equation2(2)
(2) ). We denote by
the set of limit points of the trajectory x.
Before proving a first result in this sense, we recall two technical lemmas which will play an essential role in the asymptotic analysis.
Lemma 3.1
Suppose that is locally absolutely continuous and bounded below and that there exists
such that for almost every
Then there exists
.
Lemma 3.2
Suppose that is locally absolutely continuous and
and that there exists
such that for almost every
Then it holds
.
Theorem 3.1
Assume that g is bounded from below and, for let x be the unique strong global solution of the dynamical system (Equation2
(2)
(2) ). Then the following statements are true:
there exists
such that the limit
exists and is finite;
and
Proof.
Choose . By using the
-Lipschitz continuity of
, for almost every
it holds
Taking into account that
and
we obtain for almost every
(8)
(8) Since
, there exists
such that for every
it holds
(9)
(9) For simplicity, we denote
Let be Since
, we have
Furthermore, by the
-Lipschitz property of
, it holds
By integrating (Equation8
(8)
(8) ) on
, we obtain
(10)
(10) Taking into account that g is bounded from bellow, by letting
, we obtain
Consequently
, hence
On the other hand, (Equation8
(8)
(8) ) and Lemma 3.1 ensure that the limit
(11)
(11) exists and is finite.
As for almost every
and
, according to Lemma 3.2, it follows that
As for almost every
and
(see Remark 2.1 and (i)), according to Lemma 3.2, it follows that
Finally, by using that , (Equation11
(11)
(11) ) becomes
(12)
(12)
Let Then there exists a sequence
such that
as
By using the continuity of
, we have
which shows that
In the following result, we use the distance function to a set, defined for as
for all
. The following result is a direct consequence of Theorem 3.1.
Lemma 3.3
Assume that g is bounded from below and, for let x be the unique strong global solution of the dynamical system (Equation2
(2)
(2) ). Define
Let be
and
such that for every
the inequalities (Equation9
(9)
(9) ) hold. For every
define
Then the following statements are true:
the limit
exists and is finite;
H is finite and constant on
is nonempty and compact;
Proof.
(i) According to Theorem 3.1(iii),
hence
(ii) and (iii) are nothing else than (Equation8(8)
(8) ) and (Equation12
(12)
(12) ), respectively.
(iv) follows directly from (iii).
(v) Since for every
, by using (Equation2
(2)
(2) ), we have for almost every
(vi) Since
and (see (i))
by Theorem 3.1(iv) one obtains
Assume that x is bounded.
(vii) Since we obtain that u and v are bounded, too. Thus, the set of limit points
is nonempty. Furthermore, since
and
is bounded, it is enough to show that
is closed.
Let be and assume that
We show that
Obviously, for every
, there exists a sequence
, such that
Let be . Since
there exists
such that for every
it holds
Let
be fixed. Since
there exists
such that for every
it holds
Let be
such that
. Obviously
as
and for every
Hence,
thus
(viii) follows from the definition of the set .
Remark 3.1
Combining (iii) and (iv) in Lemma 3.3, it follows that for every and
such that
as
we have
The convergence of the trajectory generated by the dynamical system (Equation2(2)
(2) ) will be shown in the framework of functions satisfying the Kurdyka–Łojasiewicz property. For
, we denote by
the class of concave and continuous functions
such that
, ϕ is continuously differentiable on
, continuous at 0 and
for all
.
Definition 3.1
Kurdyka–Łojasiewicz property
Let be a differentiable function. We say that f satisfies the Kurdyka–Łojasiewicz (KL) property at
if there exist
, a neighbourhood U of
and a function
such that for all x in the intersection
the following inequality holds
If f satisfies the KL property at each point in
, then f is called a KL function.
The origins of this notion go back to the pioneering work of Łojasiewicz [Citation19], where it is proved that for a real-analytic function and a critical point
(that is
), there exists
such that the function
is bounded around
. This corresponds to the situation when
. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [Citation20] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [Citation21–24].
To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [Citation21–27] and the references therein for more details regarding all the classes mentioned above and illustrating examples.
An important role in our convergence analysis will be played by the following uniformized KL property given in [Citation27, Lemma 6].
Lemma 3.4
Let be a compact set and let
be a differentiable function. Assume that f is constant on Ω and f satisfies the KL property at each point of Ω. Then there exist
and
such that for all
and for all x in the intersection
(13)
(13) the following inequality holds
(14)
(14)
The following theorem is the main result of the paper and concerns the global asymptotic convergence of the trajectory generated by (Equation2(2)
(2) ).
Theorem 3.2
Assume that g is bounded from below and, for let x be the unique strong global solution of (Equation2
(2)
(2) ). Suppose that
is a KL function and x is bounded. Then the following statements are true:
there exists
such that
Proof.
Let be and
such that for every
the inequalities (Equation9
(9)
(9) ) hold. Furthermore, we will use the notations made in Lemma 3.3, according to which we can choose
. It holds
Case I. There exists such that
From Lemma 3.3(ii), we have
hence
On the other hand,
hence
Hence,
, which leads to
Since A<0 and
for every
, the latter inequality can hold only if
Consequently,
and x is constant on
Case II. We assume that holds for every
. Let
According to Lemma 3.3, Ω is nonempty and compact and H is constant on Ω. Since H is a KL function, according to Lemma 3.4, there exist
and
such that for every
in the intersection
one has
Since , there exists
such that
for every
. Since
, there exists
such that
for every
. Hence, for every
, we have
According to Lemma 3.3 (ii) and (v), we have and
hence
for almost every
By integrating on the interval , for
, we obtain
Since ϕ is bounded from below, and
for every
and
was arbitrarily chosen, we obtain that
which leads to
and further to
Since p is bounded from above on
and
we obtain that
Finally, since , the limit
exists and it is finite. In conclusion, there exists
such that
Remark 3.2
According to Remark 2.1, there exists N>0 such that for almost every
hence, under the hypotheses of Theorem 3.2, one has
Remark 3.3
Since the class of semi-algebraic functions is closed under addition [see, for example, [Citation27]] and is semi-algebraic, the conclusion of the previous theorem holds, if, instead of asking that H is a KL function, we ask that g is semi-algebraic.
Remark 3.4
Assume that g is coercive, that is
For
, let
be the unique global solution of (Equation2
(2)
(2) ). Then x is bounded.
Indeed, notice that g is bounded from below, being a continuous and coercive function [see [Citation28]]. From (Equation10(10)
(10) ), it follows that
is contained for every
in a lower level set of g, which is bounded. According to Theorem 3.1,
, which implies that x is bounded.
4. Convergence rates
In this section, we will assume that the regularized function H satisfies the Lojasiewicz property, which, as noted in the previous section, corresponds to a particular choice of the desingularizing function ϕ [see [Citation19,Citation22,Citation25]].
Definition 4.1
Let be a differentiable function. The function f is said to fulfil the Łojasiewicz property, if for every
there exist
and
such that
The number θ is called the Łojasiewicz exponent of f at the critical point
In the following theorem, we provide convergence rates for the trajectory generated by (Equation2(2)
(2) ), its velocity and acceleration in terms of the Łojasiewicz exponent of H [see, also, [Citation22,Citation25]].
Theorem 4.1
Assume that g is bounded from below and, for let x be the unique strong global solution of (Equation2
(2)
(2) ). Suppose that x is bounded, let
be such that
and suppose that
fulfils the Łojasiewicz property at
with Łojasiewicz exponent θ. Let be (see Remark 2.1)
Then, there exist
and T>0 such that for almost every
, the following statements are true:
if
then x converges in finite time;
if
then
and
if
then
and
Proof.
Let be and
such that for every
the inequalities (Equation9
(9)
(9) ) hold. We define for every
Let
be fixed. For
, we have
By taking the limit as
, we obtain
(15)
(15)
Furthermore, for T>t, we have
By taking the limit as
, we obtain
(16)
(16)
According to Remark 2.1, it holds for almost every
,
By taking the limit as
, we obtain
(17)
(17) Next, we show that there exists m<0 such that
(18)
(18)
Indeed,
where
It is an easy verification that
As we have seen in the proof of Theorem 3.2, if there exists such that
, then x is constant on
and the conclusion follows.
On the other hand, if for every one has that
, then according to the proof of Theorem 3.2, there exist
and
such that
and
Busing (Equation18
(18)
(18) ), we obtain that
where
For , we integrate the last relation on the interval
, where
, which yields
By taking the limits as
, we get
On the other hand, according to the KL property for H and Lemma 3.3 (v), we have
hence
By denoting
, one can easily see that a>1 and so
Taking into account that
, the previous inequality is nothing else than
(19)
(19) where
If , then (Equation19
(19)
(19) ) becomes
By multiplying with
and integrating on
, it follows that there exists
such that
where
Using (Equation15
(15)
(15) )–(Equation17
(17)
(17) ), we obtain
which proves (b).
Assume now that . In this case, (Equation19
(19)
(19) ) leads to
By integrating on
we obtain
where
Then there exists
such that
for every
, thus, x is constant on
and (a) follows.
Assume now that In this case, (Equation19
(19)
(19) ) leads to
By integrating on
we obtain
where
or, equivalently,
Using again (Equation15
(15)
(15) )–(Equation17
(17)
(17) ), we obtain
which proves (c).
Acknowledgements
The authors are thankful to an anonymous reviewer for comments and remarks which were helpful to gain a better insight into the asymptotic behaviour of the trajectories of the studied dynamical system.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Radu Ioan Bo http://orcid.org/0000-0002-4469-314X
Additional information
Funding
References
- Su W, Boyd S, Candes EJ. A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. J Mach Learn Res. 2016;17:1–43.
- Nesterov YE. A method for solving the convex programming problem with convergence rate O(1/k2). (Russian) Dokl Akad Nauk SSSR. 1983;269(3):543–547.
- Attouch H, Chbani Z, Peypouquet J, et al. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math Program. 2018;168(1–2, Ser. B):123–175. doi: 10.1007/s10107-016-0992-8
- Attouch H, Peypouquet J, Redont P. Fast convex optimization via inertial dynamics with Hessian driven damping. J Differ Equat. 2016;261(10):5734–5783. doi: 10.1016/j.jde.2016.08.020
- Chambolle A, Dossal C. On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J Optim Theory Appl. 2015;166(3):968–982. doi: 10.1007/s10957-015-0746-4
- Beck A, Teboulle M. A fast iterative shrinkage- thresholding algorithm for linear inverse problems. SIAM J Imag Sci. 2009;2(1):183–202. doi: 10.1137/080716542
- Attouch H, Chbani Z, Riahi H. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α≤3. 2017. arXiv:1706.05671.
- Haraux A, Jendoubi MA. Asymptotics for a second order differential equation with a linear, slowly time-decaying damping term. Evol Equat Contr Theor. 2013;2(3):461–470. doi: 10.3934/eect.2013.2.461
- Balti M. Asymptotic behavior for second-order differential equations with nonlinear slowly time-decaying damping and integrable source. Electron J Differ Equat. 2015;302:11pp.
- Bégout P, Bolte J, Jendoubi MA. On damped second-order gradient systems. J Differ Equat. 2015;259:3115–3143. doi: 10.1016/j.jde.2015.04.016
- Haraux A, Jendoubi MA. Convergence of solutions of second-order gradient-like systems with analytic nonliniearities. J Differ Equat. 1998;144(2):313–320. doi: 10.1006/jdeq.1997.3393
- Chill R, Jendoubi MA. Asymptotics for a second order differential equation with a linear, slowly time-decaying damping term. Evol Equat Contr Theor. 2013;2(3):461–470. doi: 10.3934/eect.2013.2.461
- Boţ RI, Csetnek ER, László SC. Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems. J Evol Equat. 2018. doi:10.1007/s00028-018-0441-7.
- Abbas B, Attouch H, Svaiter BF. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J Optim Theory Appl. 2014;161(2):331–360. doi: 10.1007/s10957-013-0414-5
- Attouch H, Svaiter BF. A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J Control Optim. 2011;49(2):574–598. doi: 10.1137/100784114
- Haraux A. Systèmes Dynamiques Dissipatifs et Applications, Recherches en Mathématiques Appliquées 17, Masson, Paris, 1991.
- Sontag ED. Mathematical control theory. Deterministic finite-dimensional systems. 2nd ed. New York: Springer-Verlag; 1998. (Texts in Applied Mathematics 6).
- Alvarez F, Attouch H, Bolte J, et al. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J Math Pures et Appl. 2002;81(8):747–779. doi: 10.1016/S0021-7824(01)01253-3
- Łojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, 87–89, 1963.
- Kuryka K. On gradients of functions definable in o-minimal structures. Ann l Fourier (Grenoble). 1998;48(3):769–783. doi: 10.5802/aif.1638
- Attouch H, Bolte J, Redont P, et al. Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math Oper Res. 2010;35(2):438–457. doi: 10.1287/moor.1100.0449
- Bolte J, Daniilidis A, Lewis A. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J Optim. 2006;17(4):1205–1223. doi: 10.1137/050644641
- Bolte J, Daniilidis A, Lewis A, et al. Clarke subgradients of stratifiable functions. SIAM J Optim. 2007;18(2):556–572. doi: 10.1137/060670080
- Bolte J, Daniilidis A, Ley O, et al. Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans Am Math Soc. 2010;362(6):3319–3363. doi: 10.1090/S0002-9947-09-05048-X
- Attouch H, Bolte J. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math Program. 2009;116(1–2, Series B):5–16. doi: 10.1007/s10107-007-0133-5
- Attouch H, Bolte J, Svaiter BF. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math Program. 2013;137(1–2, Series A):91–129. doi: 10.1007/s10107-011-0484-9
- Bolte J, Sabach S, Teboulle M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program Ser A. 2014;146(1–2):459–494. doi: 10.1007/s10107-013-0701-9
- Rockafellar RT, Wets RJ-B. Variational analysis, fundamental principles of mathematical sciences, 317. Berlin: Springer-Verlag; 1998.