Original Articles

Analysis of the gradient method with an Armijo–Wolfe line search on a class of non-smooth convex functions

Azam Asl & Michael L. Overton
Pages 223-242 | Received 03 Jun 2019, Accepted 24 Sep 2019, Published online: 09 Oct 2019
 

ABSTRACT

It has long been known that the gradient (steepest descent) method may fail on non-smooth problems, but the examples that have appeared in the literature are either devised specifically to defeat a gradient or subgradient method with an exact line search or are unstable with respect to perturbation of the initial point. We give an analysis of the gradient method with steplengths satisfying the Armijo and Wolfe inexact line search conditions on the non-smooth convex function $f(x) = a\,|x^{(1)}| + \sum_{i=2}^{n} x^{(i)}$. We show that if a is sufficiently large, satisfying a condition that depends only on the Armijo parameter, then, when the method is initiated at any point $x_0 \in \mathbb{R}^n$ with $x_0^{(1)} \neq 0$, the iterates converge to a point $\bar{x}$ with $\bar{x}^{(1)} = 0$, although f is unbounded below. We also give conditions under which the iterates satisfy $f(x_k) \to -\infty$, using a specific Armijo–Wolfe bracketing line search. Our experimental results demonstrate that our analysis is reasonably tight.
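To fix ideas, here is a minimal Python sketch of the method analysed in the paper, applied to $f(x) = a|x^{(1)}| + \sum_{i=2}^{n} x^{(i)}$. It is not the authors' implementation; the parameter values (c1, c2, a = 10, the iteration limits) and the doubling/bisection bracketing scheme are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of the gradient method
# with an Armijo-Wolfe bracketing line search on f(x) = a|x^(1)| + sum x^(i).
# Parameter values (c1, c2, a, iteration limits) are illustrative assumptions.
import numpy as np

def f(x, a):
    return a * abs(x[0]) + np.sum(x[1:])

def grad(x, a):
    # Gradient of f, defined wherever x[0] != 0.
    g = np.ones_like(x)
    g[0] = a * np.sign(x[0])
    return g

def armijo_wolfe_step(x, d, a, c1=0.1, c2=0.5, max_bracket=60):
    """Doubling/bisection bracketing for a step t satisfying
    Armijo: f(x + t d) <= f(x) + c1 t grad(x)'d, and
    Wolfe:  grad(x + t d)'d >= c2 grad(x)'d."""
    g0d = grad(x, a) @ d
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_bracket):
        if f(x + t * d, a) > f(x, a) + c1 * t * g0d:      # Armijo fails: step too long
            hi = t
        elif grad(x + t * d, a) @ d < c2 * g0d:           # Wolfe fails: step too short
            lo = t
        else:
            return t                                      # both conditions hold
        t = 2.0 * t if np.isinf(hi) else 0.5 * (lo + hi)  # expand or bisect
    return t

def gradient_method(x0, a, iters=50):
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        d = -grad(x, a)                                   # steepest descent direction
        x = x + armijo_wolfe_step(x, d, a) * d
    return x

# For sufficiently large a (the paper quantifies this via the Armijo
# parameter), the iterates converge to a point with first component 0,
# although f is unbounded below (send x^(2), ..., x^(n) to -infinity).
x_bar = gradient_method([1.0, 0.0, 0.0], a=10.0)
print(x_bar, f(x_bar, a=10.0))
```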

2010 MATHEMATICS SUBJECT CLASSIFICATIONS:

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. There is a subtle distinction between the Wolfe condition given here and that given in [Citation16], since here the Wolfe condition is understood to fail if the gradient of f does not exist at $x_k + t d_k$, while in [Citation16] it is understood to fail if the function of one variable $s \mapsto f(x_k + s d_k)$ is not differentiable at $s = t$. For the example analysed here, these conditions are equivalent.
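To spell out the distinction, the two readings of the Wolfe condition can be written as follows (an illustrative restatement, using the standard line-search shorthand $\phi(s) = f(x_k + s d_k)$ and Wolfe parameter $c_2$, which are not defined in this excerpt):

```latex
% Two readings of the Wolfe (curvature) condition, with \phi(s) = f(x_k + s d_k):
% (i) gradient form, which fails whenever \nabla f does not exist at x_k + t d_k:
\nabla f(x_k + t d_k)^T d_k \;\ge\; c_2\, \nabla f(x_k)^T d_k
% (ii) one-variable form, which fails only when \phi is not differentiable at s = t:
\phi'(t) \;\ge\; c_2\, \phi'(0)
```

For the f analysed here, the set where $\nabla f$ does not exist is the hyperplane $x^{(1)} = 0$, on which f is affine; since $d_k = -\nabla f(x_k)$ has first component $\mp a \neq 0$, the line $x_k + s d_k$ crosses that hyperplane at a single value of s, at which $\phi$ also has a kink, so (i) and (ii) fail at exactly the same steps.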

2. The same oscillatory behaviour occurs if we replace the Wolfe condition by the Goldstein condition $f(x_k + t d_k) \geq f(x_k) + c_2\, t\, \nabla f(x_k)^T d_k$.
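As an illustration only (not code from the paper; f and grad are assumed callables, x and d are NumPy arrays, and the parameter values are placeholder choices with 0 < c1 < c2 < 1), an acceptance test using this Goldstein condition in place of the Wolfe condition might read:

```python
# Sketch of a line-search acceptance test pairing the Armijo condition with
# the Goldstein lower bound of note 2 (in place of the Wolfe condition).
def goldstein_accepts(f, grad, x, d, t, c1=0.1, c2=0.9):
    gd = grad(x) @ d                                 # directional derivative (< 0 for descent)
    decrease = f(x + t * d) <= f(x) + c1 * t * gd    # Armijo: sufficient decrease
    not_short = f(x + t * d) >= f(x) + c2 * t * gd   # Goldstein: step not too short
    return decrease and not_short
```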

4. In our implementation, we made no attempt to determine whether $\hat{f}$ is differentiable at a given point or not. This is essentially impossible in floating point arithmetic, but as noted earlier, the gradient is defined at randomly generated points with probability one; there is no reason to suppose that any of the methods tested will generate points where $\hat{f}$ is not differentiable, except in the limit, and hence the 'subgradient' method actually reduces to the gradient method with $t_k = 1/k$. See [Citation16] for further discussion.
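For concreteness, a minimal sketch of the 'subgradient' method described in this note (not the tested implementation; fhat_grad and the iteration count are assumed placeholders):

```python
# Sketch of the 'subgradient' method of note 4 with steplength t_k = 1/k.
# At randomly generated points f-hat is differentiable with probability one,
# so the oracle returns the gradient and the method coincides with the
# gradient method. fhat_grad is an assumed gradient oracle.
import numpy as np

def subgradient_method(fhat_grad, x0, iters=1000):
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, iters + 1):
        g = fhat_grad(x)         # gradient (= subgradient almost everywhere)
        x = x - (1.0 / k) * g    # predetermined steplength t_k = 1/k
    return x
```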

Additional information

Funding

Michael L. Overton was supported in part by the National Science Foundation [grant number DMS-1620083].

Notes on contributors

Azam Asl

Azam Asl obtained her B.Sc. in Computer Engineering at Sharif University of Technology in Tehran in 2008. She will receive her Ph.D. in Computer Science from New York University in 2020.

Michael L. Overton

Michael L. Overton obtained his Ph.D. in Computer Science from Stanford University in 1979. He is a Silver Professor of Computer Science and Mathematics at the Courant Institute of Mathematical Sciences, New York University.
