Publication Cover
Applicable Analysis
An International Journal
Volume 99, 2020 - Issue 3
1,617
Views
12
CrossRef citations to date
0
Altmetric
Articles

A second-order dynamical approach with variable damping to nonconvex smooth minimization

ORCID Icon, &
Pages 361-378 | Received 02 May 2018, Accepted 26 Jun 2018, Published online: 09 Jul 2018

ABSTRACT

We investigate a second-order dynamical system with variable damping in connection with the minimization of a nonconvex differentiable function. The dynamical system is formulated in the spirit of the differential equation which models Nesterov's accelerated convex gradient method. We show that the generated trajectory converges to a critical point, if a regularization of the objective function satisfies the Kurdyka- Lojasiewicz property. We also provide convergence rates for the trajectory formulated in terms of the Lojasiewicz exponent.

COMMUNICATED BY:

AMS SUBJECT CLASSIFICATIONS:

View correction statement:
Erratum

1. Introduction

Consider the (not necessarily convex) optimization problem (1) infxRng(x),(1) where g:RnR is a Fréchet differentiable function with Lg-Lipschitz continuous gradient, i.e. there exists Lg0 such that g(x)g(y)Lgxy for all x,yRn. We associate to (Equation1) the second-order dynamical system (for tt0) (2) x¨(t)+αt+γx˙(t)+g(x(t))=0,x(t0)=u0,x˙(t0)=v0,(2) where t0>0,u0,v0Rn,αR and γ(0,+).

The study of the system (Equation2) is motivated by the recent developments related to the approaching of the solving of convex optimization problems from a continuous perspective.

In [Citation1], Su, Boyd and Candes proposed the following dynamical system (3) x¨(t)+αtx˙(t)+g(x(t))=0,x(t0)=u0,x˙(t0)=v0,(3) as the continuous counterpart of the Nesterov's accelerated gradient method [see [Citation2]] for minimizing g in the convex case. This research has been deepened by Attouch and his co-authors [see [Citation3,Citation4]], who proved that, if α>3, then the generated trajectory x(t) converges to a minimizer of g as t+, while the convergence rate of the objective function along the trajectory is o(1/t2). The convergence of the trajectory is actually the continuous counterpart of a result due to Chambolle and Dossal [see [Citation5]], which proves the convergence of the iterates of the modified FISTA algorithm [see [Citation6]].

Recently, in [Citation7], investigations have been performed concerning the convergence rate of the objective function along the trajectory in the subcritical case α3, while some open questions related to the asymptotic properties of the trajectory have been formulated.

In this manuscript, we carry out, in the nonconvex setting, an asymptotic analysis of the dynamical system (Equation2), which can be seen as a perturbation of the dynamical system (Equation3) that models Nesterov's accelerated gradient method in the convex case. To the best of our knowledge, this is the first contribution addressing second-order dynamical systems with variable damping associated to nonconvex optimization problems. We show that the generated trajectory converges to a critical point of g as t+, provided the following regularization of g, H:Rn×RnR,H(u,v)=g(u)+12uv2, satisfies the Kurdyka–Łojasiewicz inequality. Moreover, we derive convergence rates in the terms of Łojasiewicz exponent, for the trajectory, its velocity and its acceleration. One of the major future goals is to study the asymptotic properties of the system (Equation2) in case γ=0. For other investigations of the asymptotic analysis of second-order dynamical systems with time-dependent damping, we refer to the papers of Haraux and Jendoubi [Citation8] and Balti [Citation9].

For α=0, the convergence of the trajectory generated by (Equation2) to a critical point of g has been shown by Bégout, Bolte and Jendoubi in [Citation10] in the hypothesis that g is of class C2 and it satisfies the Kurdyka–Łojasiewicz property with a desingularizing function satisfying a restrictive condition [see also the papers of Haraux and Jendoubi [Citation11] and Chill and Jendoubi [Citation12]]. On the other hand, the dynamical system (Equation2) is, for α=0, a particular instance of the second-order dynamical system of proximal-gradient type studied in [Citation13].

The following numerical scheme, with starting points x0,x1Rn, (4) (k1) yk=xk+(1γs)kαγsk+α(xkxk1),xk+1=yksg(yk),(4) where s1/Lg is the step size, can be seen as a discrete counterpart of (Equation2). One can notice that for γ=0, this iterative scheme algorithm is similar to Nesterov's accelerated convex gradient method.

In the following, we prove that (Equation2) can be seen in an informal way as the exact limit of (Equation4)). We take to this end in (Equation4) small step sizes and follow the same approach as Su, Boyd and Candes in [Citation1, Section 2]. For this purpose, we rewrite (Equation4) in the form (5) xk+1xks=(1γs)kαγsk+αxkxk1ssg(yk)k1(5) and introduce the Ansatz xkx(ks) for some twice continuously differentiable function x:[0,+)Rn. We let k=t/s and get x(t)xk,x(t+s)xk+1,x(ts)xk1. Then, as the step size s goes to zero, from the Taylor expansion of x, we obtain xk+1xks=x˙(t)+12x¨(t)s+o(s) and xkxk1s=x˙(t)12x¨(t)s+o(s).

Furthermore, since sg(yk)g(xk)sLgykxk=sLg(1γs)kαγsk+αxkxk1=o(s), it follows sg(yk)=sg(xk)+o(s). Consequently, (Equation5) can be written as x˙(t)+12x¨(t)s+o(s)=(1γs)tαγst+αsx˙(t)12x¨(t)s+o(s)sg(x(t))+o(s) or, equivalently (t+αs)(x˙(t)+12x¨(t)s+o(s))=((1γs)tαγs)(x˙(t)12x¨(t)s+o(s))s(t+αs)g(x(t))+o(s). Hence, 12(2t+αsγtsαγs)x¨(t)s+(γts+αs+αγs)x˙(t)+s(t+αs)g(x(t))=o(s). After dividing by s and letting s0, we obtain tx¨(t)+(γt+α)x˙(t)+tg(x(t))=0, which, after division by t, gives (Equation2), namely x¨(t)+αt+γx˙(t)+g(x(t))=0.

2. Existence and uniqueness of the trajectory

We consider on the finite-dimensional space Rn the Euclidean topology. If xRn is a local minimizer of g, then g(x)=0. We denote by crit(g)={xRn:g(x)=0} the set of critical points of g.

We are considering in the asymptotic analysis of the dynamical system (Equation2) strong global solutions.

Definition 2.1

We say that x:[t0,+)Rn is a strong global solution of (Equation2), if the following properties are satisfied:

  1. x,x˙:[t0,+)Rn are locally absolutely continuous, in other words, absolutely continuous on each interval [t0,T] for t0<T<+;

  2. x¨(t)+(α/t+γ)x˙(t)+g(x(t))=0 for almost every tt0;

  3. x(t0)=u0 and x˙(t0)=v0.

Recall that a function x:[t0,+)Rn is absolutely continuous on an interval [t0,T], if there exists an integrable function y:[t0,T]Rn such that x(t)=x(0)+t0ty(s)dst[t0,T]. It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative x˙=y by the integration formula above. On the other hand, if x:[t0,T]Rn (where T>t0) is absolutely continuous and B:RnRn is L-Lipschitz continuous (where L0), then the function Bx is absolutely continuous, too. Moreover, Bx is almost everywhere differentiable and the inequality (d/dt)B(x(t))Lx˙(t) holds for almost every tt0 [see [Citation14,Citation15]].

We prove existence and uniqueness of a strong global solution of (Equation2) by making use of the Cauchy–Lipschitz–Picard Theorem for absolutely continues trajectories [see for example [Citation16, Proposition 6.2.1], [Citation17, Theorem 54]]. The key argument is that one can rewrite (Equation2) as a particular first-order dynamical system in a suitably chosen product space [see also [Citation18]].

Theorem 2.1

For every starting points u0,v0Rn there exists a unique strong global solution of the dynamical system (Equation2).

Proof.

By making use of the notation X(t)=(x(t),x˙(t)), the system (Equation2) can be rewritten as a first-order dynamical system: (6) X˙(t)=F(t,X(t)),X(t0)=(u0,v0),(6) where F:[t0,+)×Rn×RnRn×Rn,F(t,u,v)=(v,(α/t+γ)vg(u)).

First we show that F(t,,) is L(t)-Lipschitz continuous for every tt0 and that the Lipschitz constant is a function of time with the property that L()Lloc1([t0,+)). Indeed, for every (u,v),(u¯,v¯)Rn×Rn, we have F(t,u,v)F(t,u¯,v¯)=vv¯2+αt+γ(v¯v)+(g(u¯)g(u))21+2αt+γ2vv¯2+2Lg2uu¯21+2Lg2+2αt+γ2vv¯2+uu¯2=1+2Lg2+2αt+γ2(u,v)(u¯,v¯). Obviously, the Lipschitz constant function tL(t):=1+2Lg2+2(α/t+γ)2 is continuous, hence integrable, on [t0,T] for all t0<T<+, consequently, LLloc1([t0,+)).

Next we show that F(,u,v)Lloc1([t0,+),Rn×Rn) for all u,vRn. Let u,vRn be fixed. For t0<T<+, one has t0TF(t,u,v)dt=t0Tv2+αt+γv+g(u)2dtt0T1+2αt+γ2v2+2g(u)2dtv2+g(u)2t0T3+2αt+γ2dt and the conclusion follows by the continuity of t3+2(α/t+γ)2 on [t0,T].

The Cauchy–Lipschitz–Picard Theorem guarantees existence and uniqueness of the trajectory of the first-order dynamical system (Equation6) and thus of the second-order dynamical system (Equation2).

The next result shows that the acceleration of the trajectory generated by (Equation2) is also locally absolutely continuous on [t0,+).

Proposition 2.1

For the starting points u0,v0Rn, let x be the unique strong global solution of (Equation2). Then x¨ is locally absolutely continuous on [t0,+), hence the third-order derivative x(3) exists almost everywhere on [t0,+).

Proof.

Let T>0 be fixed. According to Theorem 2.1, X(t):=(x(t),x˙(t)) is absolutely continuous on [t0,T]. We endow the product space Rn×Rn with the 1-norm. For arbitrary s,t[t0,T], we have X˙(s)X˙(t)1=F(s,X(s))F(t,X(t))1=x˙(s)x˙(t),αs+γx˙(s)+αt+γx˙(t)g(x(s))+g(x(t))1(1+γ)x˙(s)x˙(t)+αsx˙(s)αtx˙(t)+g(x(s))g(x(t))(1+γ)x˙(s)x˙(t)+|α|sx˙(s)x˙(t)+αsx˙(t)αtx˙(t)+Lgx(s)x(t)L1x˙(s)x˙(t)+L2αsαt+Lgx(s)x(t), where L1:=maxt[t0,T]1+γ+|α|tandL2:=maxt[t0,T]x˙(t).

Let be ε>0. Since the functions x˙(),tα/t and x() are absolutely continuous on [t0,T], there exists η>0 such that for any finite family of intervals Ik=(ak,bk)[t0,T], the implication IkIj= and k|bkak|<ηkx˙(bk)x˙(ak)<ε3L1,kαbkαak<ε3L2andkx(bk)x(ak)<ε3Lg holds. Consequently, kX˙(ak)X˙(bk)<ε3+ε3+ε3=ε, hence X˙()=(x˙(),x¨()) is absolutely continuous on [t0,T], which shows that x¨ is absolutely continuous [t0,T]. This proves that x¨ is locally absolutely continuous on [t0,+), which means that the third-order derivative x(3) exists almost everywhere on [t0,+).

The following results provides an estimate for the third-order derivative of the strong global solution of the dynamical system (Equation2) in terms its first- and second-order derivatives.

Lemma 2.1

For the starting points u0,v0Rn, let x be the unique strong global solution of (Equation2). Then, for almost every t[t0,+), it holds (7) x(3)(t)Lg+|α|t2x˙(t)+γ+|α|tx¨(t).(7)

Proof.

Let t[t0,+) be such that X˙(t)=F(t,X(t)). We have for almost every h>0 that X˙(t+h)X˙(t)1=F(t+h,X(t+h))F(t,X(t))1=x˙(t+h)x˙(t),αt+h+γx˙(t+h)+αt+γx˙(t)g(x(t+h))+g(x(t))1=x˙(t+h)x˙(t)+αt+h+γx˙(t+h)+αt+γx˙(t)g(x(t+h))+g(x(t))(1+γ)x˙(t+h)x˙(t)+αt+hx˙(t+h)αtx˙(t)+g(x(t+h))g(x(t))(1+γ)x˙(t+h)x˙(t)+αt+hx˙(t+h)αtx˙(t)+Lgx(t+h)x(t). Hence, X˙(t+h)X˙(t)h1(1+γ)x˙(t+h)x˙(t)h+(α/(t+h))x˙(t+h)(α/t)x˙(t)h+Lgx(t+h)x(t)h. By taking the limit as h0, we obtain X¨(t)1(1+γ)x¨(t)+αtx˙(t)+Lgx˙(t). Since X¨(t)1=x(3)(t)+x¨(t), we conclude x(3)(t)Lg+|α|t2x˙(t)+γ+|α|tx¨(t).

Remark 2.1

For N:=maxtt0Lg+|α|t2,γ+|α|t, we have that x(3)(t)N(x¨(t)+x˙(t)) for almost every t[t0,+).

3. Convergence of trajectories

In this section, we study the convergence of the trajectory of the dynamical system (Equation2). We denote by ω(x):={x¯Rn:tk+ such that x(tk)x¯ as k+} the set of limit points of the trajectory x.

Before proving a first result in this sense, we recall two technical lemmas which will play an essential role in the asymptotic analysis.

Lemma 3.1

Suppose that F:[0,+)R is locally absolutely continuous and bounded below and that there exists GL1([0,+)) such that for almost every t[0,+) ddtF(t)G(t). Then there exists limt+F(t)R.

Lemma 3.2

Suppose that F:[0,+)[0,+) is locally absolutely continuous and FLp([0,+)),1p<, and that there exists G:[0,+)R,GLr([0,+)),1r, such that for almost every t[0,+) ddtF(t)G(t). Then it holds limt+F(t)=0.

Theorem 3.1

Assume that g is bounded from below and, for u0,v0Rn, let x be the unique strong global solution of the dynamical system (Equation2). Then the following statements are true:

  1. x˙,x¨L2([t0,+),Rn);

  2. there exists β>0 such that the limit limt+g(βa˙x(t)+x(t)) exists and is finite;

  3. limt+x¨(t)=0 and limt+x˙(t)=0;

  4. ω(x)crit(g).

Proof.

Choose 0<β<min(2/Lg,(Lg2+2γLgLg)/Lg). By using the Lg-Lipschitz continuity of g, for almost every t[t0,+) it holds ddtg(βx˙(t)+x(t))=βx¨(t)+x˙(t),g(βx˙(t)+x(t))=βx¨(t)+x˙(t),g(βx˙(t)+x(t))g(x(t))+βx¨(t)+x˙(t),x¨(t)αt+γx˙(t)βx¨(t)21+βγ+αβtx¨(t),x˙(t)γ+αtx˙(t)2+Lgβx¨(t)+x˙(t)βx˙(t). Taking into account that βx¨(t)+x˙(t)βx˙(t)β2x¨(t)x˙(t)+βx˙(t)212β2x¨(t)2+β+12β2x˙(t)2 and 1+βγ+αβtx¨(t),x˙(t)=12ddt1+βγ+αβtx˙(t)2αβ2t2x˙(t)2, we obtain for almost every t[t0,+) (8) ddtg(βx˙(t)+x(t))+121+βγ+αβtx˙(t)2β+Lgβ22x¨(t)2+γ+Lgβ+Lgβ22αtαβ2t2x˙(t)2.(8) Since 0<β<min(2/Lg,(Lg2+2γLgLg)/Lg), there exists t1>0 such that for every tt1 it holds (9) 1+βγ+αβt>0,β+Lgβ22<0andγ+Lgβ+Lgβ22αtαβ2t2<0.(9) For simplicity, we denote A:=β+Lgβ22andB(t):=γ+Lgβ+Lgβ22αtαβ2t2t[t0,+).

Let be T>t1. Since xC2([t1,T],Rn), we have x,x˙,x¨L2([t1,T],Rn). Furthermore, by the Lg-Lipschitz property of g, it holds gL2([t1,T],Rn). By integrating (Equation8) on [t1,T], we obtain (10) g(βx˙(T)+x(T))+121+βγ+αβTx˙(T)2t1TAx¨(t)2dtt1TB(t)x˙(t)2dtg(βx˙(t1)+x(t1))+121+βγ+αβt1x˙(t1)2.(10) Taking into account that g is bounded from bellow, by letting T+, we obtain t1(Ax¨(t)2)dt<+andt1(B(t)x˙(t)2)dt<+ Consequently x¨()2,B()x˙()2L1([t0,+),R), hence x¨,x˙L2([t0,+),Rn). On the other hand, (Equation8) and Lemma 3.1 ensure that the limit (11) limt+g(βx˙(t)+x(t))+121+βγ+αβtx˙(t)2(11) exists and is finite.

As for almost every t[t0,+) ddt(x˙(t)2)=2x¨(t),x˙(t)x˙(t)2+x¨(t)2 and x˙()2+x¨()2L1([t0,+)), according to Lemma 3.2, it follows that limt+x˙(t)=0.

As for almost every t[t0,+) ddt(x¨(t)2)=2x(3)(t),x¨(t)x(3)(t)2+x¨(t)2 and x(3)()2+x¨()2L1([t0,+)) (see Remark 2.1 and (i)), according to Lemma 3.2, it follows that limt+x¨(t)=0.

Finally, by using that limt+x˙(t)=0, (Equation11) becomes (12) limt+g(βx˙(t)+x(t))+121+βγ+αβtx˙(t)2=limt+g(βx˙(t)+x(t))R.(12)

Let x¯ω(x). Then there exists a sequence tk+,k+ such that x(tk)x¯ as k+. By using the continuity of g, we have 0=limk+x¨(tk)+αtk+γx˙(tk)+g(x(tk))=g(x¯), which shows that x¯crit(g).

In the following result, we use the distance function to a set, defined for ARn as dist(x,A)=infyAxy for all xRn. The following result is a direct consequence of Theorem 3.1.

Lemma 3.3

Assume that g is bounded from below and, for u0,v0Rn, let x be the unique strong global solution of the dynamical system (Equation2). Define H:Rn×RnR,H(x,y)=g(x)+12xy2. Let be 0<β<min(2/Lg,(Lg2+2γLgLg)/Lg) and t1>0 such that for every tt1 the inequalities (Equation9) hold. For every t[t1,+), define u(t):=βx˙(t)+x(t),v(t):=1+βγ+αβt+βx˙(t)+x(t),A=β+Lgβ22,B(t):=γ+Lgβ+Lgβ22αtαβ2t2andp(t):=Lgβ+γ+|α|t+21+βγ+αβt. Then the following statements are true:

  1. ω(u)=ω(v)=ω(x);

  2. ddtH(u(t),v(t))Ax¨(t)2+B(t)x˙(t)20 for almost every tt1;

  3. the limit limt+H(u(t),v(t))=limt+g(βa˙x(t)+x(t)) exists and is finite;

  4. H is finite and constant on ω(u,v)={(x¯,x¯)Rn×Rn:x¯ω(x)};

  5. H(u(t),v(t))x¨(t)+p(t)x˙(t) for almost every tt1;

  6. ω(u,v)crit(H).

If x is bounded, then
  1. ω(u,v) is nonempty and compact;

  2. limt+dist((u(t),v(t)),ω(u,v))=0.

Proof.

(i) According to Theorem 3.1(iii), limt+βx˙(t)=limt+1+βγ+αβt+βx˙(t)=0, hence ω(u)=ω(v)=ω(x).

(ii) and (iii) are nothing else than (Equation8) and (Equation12), respectively.

(iv) follows directly from (iii).

(v) Since H(x,y)=(g(x)+xy,yx) for every (x,y)Rn×Rn, by using (Equation2), we have for almost every t[t1,+) H(u(t),v(t))g(u(t))+2u(t)v(t)g(u(t))g(x(t))+g(x(t))+2u(t)v(t)Lgβx˙(t)+x¨(t)γ+αtx˙(t)+2u(t)v(t)x¨(t)+Lgβ+γ+|α|t+21+βγ+αβtx˙(t)=x¨(t)+p(t)x˙(t).

(vi) Since critH={(x,y)Rn×Rn:H(x,y)=(0,0)}={(x¯,x¯)Rn×Rn:x¯crit(g)} and (see (i)) ω(u,v)={(x¯,x¯)Rn×Rn:x¯ω(x)}, by Theorem 3.1(iv) one obtains ω(u,v)crit(H).

Assume that x is bounded.

(vii) Since x˙(t)0,t+, we obtain that u and v are bounded, too. Thus, the set of limit points ω(u,v) is nonempty. Furthermore, since ω(u,v)={(x¯,x¯)Rn×Rn:x¯ω(x)} and ω(x) is bounded, it is enough to show that ω(x) is closed.

Let be (x¯n)n1ω(x) and assume that limn+x¯n=x. We show that xω(x). Obviously, for every n1, there exists a sequence tkn+,k+, such that limk+x(tkn)=x¯n.

Let be ϵ>0. Since limn+x¯n=x, there exists N(ϵ)N such that for every nN(ϵ) it holds x¯nx<ϵ2. Let n1 be fixed. Since limk+x(tkn)=x¯n, there exists k(n,ϵ)N such that for every kk(n,ϵ) it holds x(tkn)x¯n<ϵ2. Let be knk(n,ε) such that tknn>n. Obviously tknn as n+ and for every nN(ϵ) x(tknn)x<ϵ. Hence, limn+x(tknn)=x, thus xω(x).

(viii) follows from the definition of the set ω(u,v).

Remark 3.1

Combining (iii) and (iv) in Lemma 3.3, it follows that for every x¯ω(x) and tk+ such that x(tk)x¯ as k+ we have limk+H(u(tk),v(tk))=H(x¯,x¯).

The convergence of the trajectory generated by the dynamical system (Equation2) will be shown in the framework of functions satisfying the Kurdyka–Łojasiewicz property. For η(0,+], we denote by Θη the class of concave and continuous functions ϕ:[0,η)[0,+) such that ϕ(0)=0, ϕ is continuously differentiable on (0,η), continuous at 0 and ϕ(s)>0 for all s(0,η).

Definition 3.1

Kurdyka–Łojasiewicz property

Let f:RnR be a differentiable function. We say that f satisfies the Kurdyka–Łojasiewicz (KL) property at x¯Rn if there exist η(0,+], a neighbourhood U of x¯ and a function ϕΘη such that for all x in the intersection U{xRn:f(x¯)<f(x)<f(x¯)+η}, the following inequality holds ϕ(f(x)f(x¯))f(x))1. If f satisfies the KL property at each point in Rn, then f is called a KL function.

The origins of this notion go back to the pioneering work of Łojasiewicz [Citation19], where it is proved that for a real-analytic function f:RnR and a critical point x¯Rn (that is f(x¯)=0), there exists θ[1/2,1) such that the function |ff(x¯)|θf1 is bounded around x¯. This corresponds to the situation when ϕ(s)=C(1θ)1s1θ. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [Citation20] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [Citation21–24].

To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [Citation21–27] and the references therein for more details regarding all the classes mentioned above and illustrating examples.

An important role in our convergence analysis will be played by the following uniformized KL property given in [Citation27, Lemma 6].

Lemma 3.4

Let ΩRn be a compact set and let f:RnR be a differentiable function. Assume that f is constant on Ω and f satisfies the KL property at each point of Ω. Then there exist ϵ,η>0 and ϕΘη such that for all x¯Ω and for all x in the intersection (13) {xRn:dist(x,Ω)<ε}{xRn:f(x¯)<f(x)<f(x¯)+η},(13) the following inequality holds (14) ϕ(f(x)f(x¯))f(x)1.(14)

The following theorem is the main result of the paper and concerns the global asymptotic convergence of the trajectory generated by (Equation2).

Theorem 3.2

Assume that g is bounded from below and, for u0,v0Rn, let x be the unique strong global solution of (Equation2). Suppose that H:Rn×RnR,H(x,y)=g(x)+12xy2 is a KL function and x is bounded. Then the following statements are true:

  1. x˙,x¨L1([t0,+),Rn);

  2. there exists x¯crit(g) such that limt+x(t)=x¯.

Proof.

Let be 0<β<min(2/Lg,(Lg2+2γLgLg)/Lg) and t1>0 such that for every tt1 the inequalities (Equation9) hold. Furthermore, we will use the notations made in Lemma 3.3, according to which we can choose (x~,x~)ω(u,v). It holds limt+H(u(t),v(t))=H(x~,x~).

Case I. There exists t¯t1 such that H(u(t¯),v(t¯))=H(x~,x~). From Lemma 3.3(ii), we have ddtH(u(t),v(t))Ax¨(t)2+B(t)x˙(t)20 for almost every tt1, hence H(u(t),v(t))H(x~,x~) for every tt¯. On the other hand, H(u(t),v(t))limt+H(u(t),v(t))=H(x~,x~) for every tt1, hence H(u(t),v(t))=H(x~,x~) for every tt¯. Hence, (d/dt)H(u(t),v(t))=0, which leads to 0Ax¨(t)2+B(t)x˙(t)20 for almost every tt¯. Since A<0 and B(t)<0 for every tt1, the latter inequality can hold only if x˙(t)=x¨(t)=0 for almost every tt¯. Consequently, x˙,x¨L1([t0,+),Rn) and x is constant on [t¯,+).

Case II. We assume that H(u(t),v(t))>H(x~,x~) holds for every tt1. Let Ω:=ω(u,v). According to Lemma 3.3, Ω is nonempty and compact and H is constant on Ω. Since H is a KL function, according to Lemma 3.4, there exist ε,η>0 and ϕΘη such that for every (z~,w~) in the intersection {(z,w)Rn×Rn:dist((z,w),Ω)<ε}{(z,w)Rn×Rn:H(x¯,x¯)<H(z,w)<H(x~,x~)+η} one has ϕ(H(z~,w~)H(x~,x~))H(z~,w~)1.

Since limt+dist(u(t),v(t),Ω)=0, there exists t2t1 such that dist(u(t),v(t)),Ω)<ϵ for every tt2. Since limt+H(u(t),v(t))=H(x¯,x¯), there exists t3t1 such that H(x¯,x¯)<H(u(t),v(t))<H(x¯,x¯)+η for every tt3. Hence, for every tT:=max(t2,t3), we have ϕ(H(u(t),v(t))H(x¯,x¯))H(u(t),v(t))1.

According to Lemma 3.3 (ii) and (v), we have (d/dt)H(u(t),v(t))Ax¨(t)2+B(t)x˙(t)20 and H(u(t),v(t))x¨(t)+p(t)x˙(t), hence ddtϕ(H(u(t),v(t))H(x~,x~))=ϕ(H(u(t),v(t))H(x~,x~))ddtH(u(t),v(t))Ax¨(t)2+B(t)x˙(t)2x¨(t)+p(t)x˙(t) for almost every t[T,+).

By integrating on the interval [T,T¯], for T¯>T, we obtain ϕ(H(u(T¯),v(T¯))H(x~,x~))TT¯Ax¨(s)2+B(s)x˙(s)2x¨(s)+p(s)x˙(s)dsϕ(H(u(T),v(T))H(x~,x~)).

Since ϕ is bounded from below, A<0,B(s)<0 and p(s)>0 for every sT and T¯ was arbitrarily chosen, we obtain that 0T+Ax¨(s)2B(s)x˙(s)2x¨(s)+p(s)x˙(s)dsϕ(H(u(T),v(T))H(x¯,x¯)), which leads to tx¨(t)2x¨(t)+p(t)x˙(t),tx˙(t)2x¨(t)+p(t)x˙(t)L1([t0,+),Rn) and further to tx¨(t)x˙(t)x¨(t)+p(t)x˙(t)L1([t0,+),Rn). Since p is bounded from above on [t0,+) and x˙(t)+x¨(t)=x¨(t)2x¨(t)+p(t)x˙(t)+p(t)x˙(t)2x¨(t)+p(t)x˙(t)+(1+p(t))x¨(t)x˙(t)x¨(t)+p(t)x˙(t), we obtain that x˙,x¨L1([t0,+),Rn).

Finally, since x˙L1([t0,+),Rn), the limit limt+x(t) exists and it is finite. In conclusion, there exists x¯crit(g) such that limt+x(t)=x¯.

Remark 3.2

According to Remark 2.1, there exists N>0 such that x(3)(t)N(x¨(t)+x˙(t)) for almost every tt0, hence, under the hypotheses of Theorem 3.2, one has x(3)L1([t0,+),Rn).

Remark 3.3

Since the class of semi-algebraic functions is closed under addition [see, for example, [Citation27]] and (x,y)12xy2 is semi-algebraic, the conclusion of the previous theorem holds, if, instead of asking that H is a KL function, we ask that g is semi-algebraic.

Remark 3.4

Assume that g is coercive, that is limu+g(u)=+. For u0,v0Rn, let xC2([0,+),Rn) be the unique global solution of (Equation2). Then x is bounded.

Indeed, notice that g is bounded from below, being a continuous and coercive function [see [Citation28]]. From (Equation10), it follows that βx˙(T)+x(T) is contained for every Tt1 in a lower level set of g, which is bounded. According to Theorem 3.1, βx˙(t)0,t+, which implies that x is bounded.

4. Convergence rates

In this section, we will assume that the regularized function H satisfies the Lojasiewicz property, which, as noted in the previous section, corresponds to a particular choice of the desingularizing function ϕ [see [Citation19,Citation22,Citation25]].

Definition 4.1

Let f:RnR be a differentiable function. The function f is said to fulfil the Łojasiewicz property, if for every x¯critf there exist K,ϵ>0 and θ(0,1) such that |f(x)f(x¯)|θKf(x) for every x fulfilling xx¯<ϵ. The number θ is called the Łojasiewicz exponent of f at the critical point x¯.

In the following theorem, we provide convergence rates for the trajectory generated by (Equation2), its velocity and acceleration in terms of the Łojasiewicz exponent of H [see, also, [Citation22,Citation25]].

Theorem 4.1

Assume that g is bounded from below and, for u0,v0Rn, let x be the unique strong global solution of (Equation2). Suppose that x is bounded, let x¯crit(g) be such that limt+x(t)=x¯ and suppose that H:Rn×RnR,H(x,y)=g(x)+12xy2 fulfils the Łojasiewicz property at (x¯,x¯)critH with Łojasiewicz exponent θ. Let be (see Remark 2.1) N:=maxtt0Lg+|α|t2,γ+|α|t. Then, there exist a1,a2,a3,a4>0 and T>0 such that for almost every t[T,+), the following statements are true:

  1. if θ(0,12), then x converges in finite time;

  2. if θ=12, then x(t)x¯a1ea2t, x˙(t)a1ea2t and x¨(t)Na1ea2t;

  3. if θ(12,1), then x(t)x¯(a3t+a4)(1θ)/(2θ1), x˙(t)(a3t+a4)(1θ)/(2θ1) and x¨(t)N(a3t+a4)(1θ)/(2θ1).

Proof.

Let be 0<β<min(2/Lg,(Lg2+2γLgLg)/Lg) and t1>0 such that for every tt1 the inequalities (Equation9) hold. We define for every t[t1,+) σ(t):=t+(x˙(s)+x¨(s))ds. Let t[t1,+) be fixed. For Tt, we have x(t)x¯=x(T)x¯tTx˙(s)dsx(T)x¯+tTx˙(s)ds. By taking the limit as T+, we obtain (15) x(t)x¯t+x˙(s)dsσ(t).(15)

Furthermore, for T>t, we have x˙(t)=x˙(T)tTx¨(s)dsx˙(T)+tTx¨(s)ds. By taking the limit as T+, we obtain (16) x˙(t)t+x¨(s)dsσ(t).(16)

According to Remark 2.1, it holds x(3)(t)N(x¨(t)+x˙(t)) for almost every tt1, x¨(t)=x¨(T)tTx(3)(s)dsx¨(T)+tTx(3)(s)dsx¨(T)+NtT(x¨(s)+x˙(s))dsT>t. By taking the limit as T+, we obtain (17) x¨(t)Nσ(t).(17) Next, we show that there exists m<0 such that (18) Ax¨(t)2+B(t)x˙(t)2x¨(t)+p(t)x˙(t)m(x˙(t)+x¨(t)).(18)

Indeed, (x¨(t)+p(t)x˙(t))(x˙(t)+x¨(t))=x¨(t)2+(1+p(t))x˙(t)x¨(t)+p(t)x˙(t)232+p(t)2x¨(t)2+12+3p(t)2x˙(t)2Amx¨(t)2+B(t)mx˙(t)2, where m:=maxmaxtt1A3/2+p(t)/2,maxtt1B(t)(3/2)p(t)+1/2. It is an easy verification that m<0.

As we have seen in the proof of Theorem 3.2, if there exists t¯t1 such that H(u(t¯),v(t¯))=H(x¯,x¯), then x is constant on [t¯,+) and the conclusion follows.

On the other hand, if for every tt1 one has that H(u(t),v(t))>H(x¯,x¯), then according to the proof of Theorem 3.2, there exist ϵ>0 and Tt1 such that (u(t),v(t))(x¯,x¯)<ϵ and ddt(H(u(t),v(t))H(x¯,x¯))1θAx¨(t)2+B(t)x˙(t)2x¨(t)+p(t)x˙(t) for almost every tT. Busing (Equation18), we obtain that M(x˙(t)+x¨(t))+ddt(H(u(t),v(t))H(x¯,x¯))1θ0 for almost every tT, where M:=m>0.

For tT, we integrate the last relation on the interval [t,T~], where T~>t, which yields MtT~(x˙(s)+x¨(s))ds+(H(u(T~),v(T~))H(x¯,x¯))1θ(H(u(t),v(t))H(x¯,x¯))1θ. By taking the limits as T~+, we get Mσ(t)(H(u(t),v(t))H(x¯,x¯))1θ. On the other hand, according to the KL property for H and Lemma 3.3 (v), we have (H(u(t),v(t))H(x¯,x¯))θKH(u(t),v(t))K(x¨(t)+p(t)x˙(t)) for almost every tT, hence Mσ(t)K(1θ)/θ(x¨(t)+p(t)x˙(t))(1θ)/θ for almost every tT. By denoting a:=maxtTp(t)(0,+), one can easily see that a>1 and so Mσ(t)(aK)(1θ)/θ(x˙(t)+x¨(t))(1θ)/θ for almost every tT. Taking into account that x˙(t)+x¨(t)=σ˙(t), the previous inequality is nothing else than (19) cσθ/(1θ)(t)σ˙(t) for almost every tT,(19) where c:=Mθ/(1θ)/aK>0.

If θ=12, then (Equation19) becomes cσ(t)+σ˙(t)0 for almost every tT. By multiplying with ect and integrating on [T,t], it follows that there exists a1>0 such that σ(t)a1ea2t for every tT, where a2=c. Using (Equation15)–(Equation17), we obtain x(t)x¯a1ea2t,x˙(t)a1ea2tandx¨(t)Na1ea2t for every tT, which proves (b).

Assume now that 0<θ<12. In this case, (Equation19) leads to ddtσ(12θ)/(1θ)(t)=12θ1θσθ/(1θ)(t)σ˙(t)c12θ1θ for almost every tT. By integrating on [T,t] we obtain σ(12θ)/(1θ)(t)α¯t+β¯, for every tT, where α¯>0. Then there exists TˆT such that σ(t)0 for every tTˆ, thus, x is constant on [Tˆ,+) and (a) follows.

Assume now that 12<θ<1. In this case, (Equation19) leads to ddtσ(12θ)/(1θ)(t)=12θ1θσθ/(1θ)(t)σ˙(t)c2θ11θ for almost every tT. By integrating on [T,t] we obtain σ(12θ)/(1θ)(t)a3t+a4 for every tT, where a3,a4>0, or, equivalently, σ(t)(a3t+a4)(1θ)/(2θ1) for every tT. Using again (Equation15)–(Equation17), we obtain x(t)x¯(a3t+a4)(1θ)/(2θ1),x˙(t)(a3t+a4)(1θ)/(2θ1)andx¨(t)N(a3t+a4)(1θ)/(2θ1) for every tT, which proves (c).

Acknowledgements

The authors are thankful to an anonymous reviewer for comments and remarks which were helpful to gain a better insight into the asymptotic behaviour of the trajectories of the studied dynamical system.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

Radu Ioan Bo's research was partially supported by FWF (Austrian Science Fund) [project I 2419-N32] ; Ernö Robert Csetnek's research was supported by FWF (Austrian Science Fund) [project P 29809-N32] and Szilárd Csaba László's research was supported by a grant of Ministry of Research and Innovation - Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii (CNCS-UEFISCDI) [project number PN-III-P1-1.1-TE-2016-0266] within PNCDI III.

References

  • Su W, Boyd S, Candes EJ. A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. J Mach Learn Res. 2016;17:1–43.
  • Nesterov YE. A method for solving the convex programming problem with convergence rate O(1/k2). (Russian) Dokl Akad Nauk SSSR. 1983;269(3):543–547.
  • Attouch H, Chbani Z, Peypouquet J, et al. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math Program. 2018;168(1–2, Ser. B):123–175. doi: 10.1007/s10107-016-0992-8
  • Attouch H, Peypouquet J, Redont P. Fast convex optimization via inertial dynamics with Hessian driven damping. J Differ Equat. 2016;261(10):5734–5783. doi: 10.1016/j.jde.2016.08.020
  • Chambolle A, Dossal C. On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J Optim Theory Appl. 2015;166(3):968–982. doi: 10.1007/s10957-015-0746-4
  • Beck A, Teboulle M. A fast iterative shrinkage- thresholding algorithm for linear inverse problems. SIAM J Imag Sci. 2009;2(1):183–202. doi: 10.1137/080716542
  • Attouch H, Chbani Z, Riahi H. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α≤3. 2017. arXiv:1706.05671.
  • Haraux A, Jendoubi MA. Asymptotics for a second order differential equation with a linear, slowly time-decaying damping term. Evol Equat Contr Theor. 2013;2(3):461–470. doi: 10.3934/eect.2013.2.461
  • Balti M. Asymptotic behavior for second-order differential equations with nonlinear slowly time-decaying damping and integrable source. Electron J Differ Equat. 2015;302:11pp.
  • Bégout P, Bolte J, Jendoubi MA. On damped second-order gradient systems. J Differ Equat. 2015;259:3115–3143. doi: 10.1016/j.jde.2015.04.016
  • Haraux A, Jendoubi MA. Convergence of solutions of second-order gradient-like systems with analytic nonliniearities. J Differ Equat. 1998;144(2):313–320. doi: 10.1006/jdeq.1997.3393
  • Chill R, Jendoubi MA. Asymptotics for a second order differential equation with a linear, slowly time-decaying damping term. Evol Equat Contr Theor. 2013;2(3):461–470. doi: 10.3934/eect.2013.2.461
  • Boţ RI, Csetnek ER, László SC. Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems. J Evol Equat. 2018. doi:10.1007/s00028-018-0441-7.
  • Abbas B, Attouch H, Svaiter BF. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J Optim Theory Appl. 2014;161(2):331–360. doi: 10.1007/s10957-013-0414-5
  • Attouch H, Svaiter BF. A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J Control Optim. 2011;49(2):574–598. doi: 10.1137/100784114
  • Haraux A. Systèmes Dynamiques Dissipatifs et Applications, Recherches en Mathématiques Appliquées 17, Masson, Paris, 1991.
  • Sontag ED. Mathematical control theory. Deterministic finite-dimensional systems. 2nd ed. New York: Springer-Verlag; 1998. (Texts in Applied Mathematics 6).
  • Alvarez F, Attouch H, Bolte J, et al. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J Math Pures et Appl. 2002;81(8):747–779. doi: 10.1016/S0021-7824(01)01253-3
  • Łojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, 87–89, 1963.
  • Kuryka K. On gradients of functions definable in o-minimal structures. Ann l Fourier (Grenoble). 1998;48(3):769–783. doi: 10.5802/aif.1638
  • Attouch H, Bolte J, Redont P, et al. Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math Oper Res. 2010;35(2):438–457. doi: 10.1287/moor.1100.0449
  • Bolte J, Daniilidis A, Lewis A. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J Optim. 2006;17(4):1205–1223. doi: 10.1137/050644641
  • Bolte J, Daniilidis A, Lewis A, et al. Clarke subgradients of stratifiable functions. SIAM J Optim. 2007;18(2):556–572. doi: 10.1137/060670080
  • Bolte J, Daniilidis A, Ley O, et al. Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans Am Math Soc. 2010;362(6):3319–3363. doi: 10.1090/S0002-9947-09-05048-X
  • Attouch H, Bolte J. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math Program. 2009;116(1–2, Series B):5–16. doi: 10.1007/s10107-007-0133-5
  • Attouch H, Bolte J, Svaiter BF. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math Program. 2013;137(1–2, Series A):91–129. doi: 10.1007/s10107-011-0484-9
  • Bolte J, Sabach S, Teboulle M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program Ser A. 2014;146(1–2):459–494. doi: 10.1007/s10107-013-0701-9
  • Rockafellar RT, Wets RJ-B. Variational analysis, fundamental principles of mathematical sciences, 317. Berlin: Springer-Verlag; 1998.