Original Articles

Analysis of the gradient method with an Armijo–Wolfe line search on a class of non-smooth convex functions

Azam Asl & Michael L. Overton
Pages 223-242 | Received 03 Jun 2019, Accepted 24 Sep 2019, Published online: 09 Oct 2019
 

ABSTRACT

It has long been known that the gradient (steepest descent) method may fail on non-smooth problems, but the examples that have appeared in the literature are either devised specifically to defeat a gradient or subgradient method with an exact line search or are unstable with respect to perturbation of the initial point. We give an analysis of the gradient method with steplengths satisfying the Armijo and Wolfe inexact line search conditions on the non-smooth convex function $f(x) = a|x^{(1)}| + \sum_{i=2}^{n} x^{(i)}$. We show that if $a$ is sufficiently large, satisfying a condition that depends only on the Armijo parameter, then, when the method is initiated at any point $x_0 \in \mathbb{R}^n$ with $x_0^{(1)} \neq 0$, the iterates converge to a point $\bar{x}$ with $\bar{x}^{(1)} = 0$, although $f$ is unbounded below. We also give conditions under which $f(x_k) \to -\infty$, using a specific Armijo–Wolfe bracketing line search. Our experimental results demonstrate that our analysis is reasonably tight.
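To make the setting concrete, the following is a minimal numerical sketch, not the authors' implementation: the gradient method with a standard Armijo–Wolfe bracketing line search applied to the function $f$ from the abstract. All parameter values here (c1, c2, a = 10, the starting point) are illustrative assumptions; the paper gives the precise threshold on $a$ in terms of the Armijo parameter.

```python
import numpy as np

# f(x) = a*|x[0]| + sum(x[1:]): convex, unbounded below, non-smooth on x[0] = 0.
def f(x, a):
    return a * abs(x[0]) + np.sum(x[1:])

def grad(x, a):
    # Gradient where it exists (x[0] != 0); np.sign(0) = 0 yields a subgradient.
    g = np.ones_like(x)
    g[0] = a * np.sign(x[0])
    return g

def armijo_wolfe(x, d, a, c1=0.1, c2=0.5, max_iter=60):
    # Bracketing line search: shrink the bracket when Armijo fails (t too long),
    # double or bisect when Wolfe fails (t too short), stop when both hold.
    f0, g0d = f(x, a), grad(x, a) @ d
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + t * d, a) > f0 + c1 * t * g0d:   # Armijo (sufficient decrease) fails
            hi = t
        elif grad(x + t * d, a) @ d < c2 * g0d:   # Wolfe (curvature) fails
            lo = t
        else:
            return t
        t = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return t

def gradient_method(x0, a, iters=30):
    x = x0.copy()
    for _ in range(iters):
        d = -grad(x, a)
        x = x + armijo_wolfe(x, d, a) * d
        print(f"x[0] = {x[0]: .3e},  f(x) = {f(x, a): .6f}")
    return x

gradient_method(np.array([1.0, 0.0, 0.0]), a=10.0)
```

In runs of this sketch with $a$ large enough relative to the Armijo parameter, the first coordinate of the iterates is driven to zero and the function values settle near a finite limit even though $\inf f = -\infty$, which is the failure mode the paper analyses.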

2010 MATHEMATICS SUBJECT CLASSIFICATIONS:

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. There is a subtle distinction between the Wolfe condition given here and that given in [Citation16], since here the Wolfe condition is understood to fail if the gradient of $f$ does not exist at $x_k + t d_k$, while in [Citation16] it is understood to fail if the function of one variable $s \mapsto f(x_k + s d_k)$ is not differentiable at $s = t$. For the example analysed here, these conditions are equivalent.
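For concreteness, here is a hypothetical helper (the name and signature are mine, not the paper's) expressing the convention used here: the Wolfe condition is simply deemed to fail whenever the gradient does not exist at the trial point.

```python
# Illustrative sketch: grad_or_none is assumed to return None at points where
# f is not differentiable, so the Wolfe test fails at such trial points.
def wolfe_holds(grad_or_none, x, d, t, g0d, c2=0.5):
    g = grad_or_none(x + t * d)
    return g is not None and (g @ d) >= c2 * g0d
```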

2. The same oscillatory behaviour occurs if we replace the Wolfe condition by the Goldstein condition $f(x_k + t d_k) \geq f(x_k) + c_2\, t\, \nabla f(x_k)^T d_k$.
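A minimal sketch of that replacement, with hypothetical names: the Goldstein lower bound rejects a step as too short, playing the role the Wolfe curvature condition plays elsewhere.

```python
# Illustrative check of the Goldstein condition quoted in the note:
# f(x + t*d) >= f(x) + c2 * t * grad(x)^T d.
def goldstein_holds(f, grad, x, d, t, c2=0.9):
    return f(x + t * d) >= f(x) + c2 * t * (grad(x) @ d)
```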

4. In our implementation, we made no attempt to determine whether $\hat{f}$ is differentiable at a given point or not. This is essentially impossible in floating point arithmetic, but as noted earlier, the gradient is defined at randomly generated points with probability one; there is no reason to suppose that any of the methods tested will generate points where $\hat{f}$ is not differentiable, except in the limit, and hence the 'subgradient' method actually reduces to the gradient method with $t_k = 1/k$. See [Citation16] for further discussion.
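A short sketch of that observation (illustrative, not the tested code): since the gradient exists at randomly generated points with probability one, a 'subgradient' method with steplength $t_k = 1/k$ is, in practice, the gradient method with that steplength.

```python
import numpy as np

def subgradient_method(grad, x0, iters=1000):
    # grad(x) returns the gradient, which is defined almost everywhere.
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, iters + 1):
        x = x - (1.0 / k) * grad(x)
    return x
```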

Additional information

Funding

Michael L. Overton was supported in part by National Science Foundation [grant number DMS-1620083].

Notes on contributors

Azam Asl

Azam Asl obtained her B.Sc. in Computer Engineering at Sharif University of Technology in Tehran in 2008. She will receive her Ph.D. in Computer Science from New York University in 2020.

Michael L. Overton

Michael L. Overton obtained his Ph.D. in Computer Science from Stanford University in 1979. He is a Silver Professor of Computer Science and Mathematics at the Courant Institute of Mathematical Sciences, New York University.
