Abstract
In this paper, we consider the problem of finding ε-approximate stationary points of convex functions that are p-times differentiable with ν-Hölder continuous pth derivatives. We present tensor methods with and without acceleration. Specifically, we show that the non-accelerated schemes take at most iterations to reduce the norm of the gradient of the objective below a given
. For accelerated tensor schemes, we establish improved complexity bounds of
and
, when the Hölder parameter
is known. For the case in which ν is unknown, we obtain a bound of
for a universal accelerated scheme. Finally, we also obtain a lower complexity bound of
for finding ε-approximate stationary points using p-order tensor methods.
1. Introduction
1.1. Motivation
In this paper, we study tensor methods for unconstrained optimization, i.e. methods in which the iterates are obtained by the (approximate) minimization of models defined from high-order Taylor approximations of the objective function. This type of method is not new in the optimization literature (see, e.g. [Citation1,Citation4,Citation33]). Recently, interest in tensor methods has been renewed by the work in [Citation2], where p-order tensor methods were proposed for unconstrained minimization of nonconvex functions with Lipschitz continuous pth derivatives. It was shown that these methods take at most iterations to find an ε-approximate first-order stationary point of the objective, generalizing the bound of
, originally established in [Citation31] for the Cubic Regularization of Newton's Method (p = 2). After [Citation2], several high-order methods have been proposed and analysed for nonconvex optimization (see, e.g. [Citation9–11,Citation23]), resulting even in worst-case complexity bounds for the number of iterations that p-order methods need to generate approximate qth order stationary points [Citation7,Citation8].
More recently, in [Citation27], p-order tensor methods with and without acceleration were proposed for unconstrained minimization of convex functions with Lipschitz continuous pth derivatives. As is usual in Convex Optimization, these methods aim at generating a point such that
, where f is the objective function,
is its optimal value and
is a given precision. Specifically, it was shown that the non-accelerated scheme takes at most
iterations to reduce the functional residual below a given
, while the accelerated scheme takes at most
iterations to accomplish the same task. Auxiliary problems in these methods consist in the minimization of a
-regularization of the pth order Taylor approximation of the objective, which is a multivariate polynomial. A remarkable result shown in [Citation27] (which distinguishes this work from [Citation1]) is that, in the convex case, the auxiliary problems in tensor methods become convex when the corresponding regularization parameter is sufficiently large. Since [Citation27], several high-order methods have been proposed for convex optimization (see, e.g. [Citation13,Citation14,Citation18,Citation20]), including near-optimal methods [Citation5,Citation15,Citation21,Citation28,Citation29] motivated by the second-order method in [Citation24]. In particular, in [Citation18], we have adapted and generalized the methods in [Citation16,Citation17,Citation27] to handle convex functions with ν-Hölder continuous pth derivatives. It was shown that the non-accelerated schemes take at most
iterations to generate a point with functional residual smaller than a given
, while the accelerated variants take only
iterations when the parameter ν is explicitly used in the scheme. For the case in which ν is not known, we also proposed a universal accelerated scheme for which we established an iteration complexity bound of
.
As a natural development, in this paper, we present variants of the p-order methods proposed in [Citation18] that aim at generating a point
such that
, for a given threshold
. In the context of nonconvex optimization, finding approximate stationary points is usually the best one can expect from local optimization methods. In the context of convex optimization, one motivation to search for approximate stationary points is the fact that the norm of the gradient may serve as a measure of feasibility and optimality when one applies the dual approach for solving constrained convex problems (see, e.g. [Citation26]). Another motivation comes from the inexact high-order proximal-point methods, recently proposed in [Citation28,Citation29], in which the iterates are computed as approximate stationary points of uniformly convex models.
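To make the stopping rule discussed above concrete, the sketch below runs plain gradient descent until the gradient norm drops below ε. The quadratic objective and fixed step size are illustrative assumptions only; they stand in for, and are not, the tensor methods studied in this paper.

```python
import numpy as np

def gradient_descent_to_stationarity(grad, x0, step, eps, max_iter=10000):
    """Run plain gradient descent until the gradient norm falls below eps.

    `grad` is a gradient oracle and `step` a fixed step size; both are
    illustrative placeholders for the high-order schemes of the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:  # eps-approximate stationary point
            break
        x = x - step * g
    return x

# Toy instance: f(x) = 0.5*||x||^2, so grad f(x) = x and the minimizer is 0.
x = gradient_descent_to_stationarity(lambda x: x, x0=[4.0, -3.0], step=0.5, eps=1e-6)
```

Here the stationarity measure ‖∇f(x)‖ plays exactly the role of the feasibility/optimality certificate mentioned above.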
Specifically, our contributions are the following:
We show that the non-accelerated schemes in [Citation18] take at most
iterations to reduce the norm of the gradient of the objective below a given
, when the objective is convex, and
iterations, when f is nonconvex. These complexity bounds extend our previous results reported in [Citation16] for regularized Newton methods (case p = 2). Moreover, our complexity bound for the nonconvex case agrees in order with the bounds obtained in [Citation9,Citation23] for different tensor methods.
For accelerated tensor schemes, we establish improved complexity bounds of
, when the Hölder parameter
is known. This result generalizes the bound of
obtained in [Citation26] for the accelerated gradient method (
). In contrast, when ν is unknown, we prove a bound of
for a universal accelerated scheme.
For the case in which ν and the corresponding Hölder constant are known, we propose tensor schemes for the composite minimization problem. In particular, we prove a bound of
iterations, where R is an upper bound for the initial distance to the optimal set. This result generalizes the bounds obtained in [Citation26] for first-order and second-order accelerated schemes combined with a regularization approach (p = 1, 2 and
). We also prove a bound of
iterations, where S is an upper bound for the initial functional residual.
Considering the same class of difficult functions described in [Citation18], we derive a lower complexity bound of
iterations (in terms of the initial distance to the optimal set), and a lower complexity bound of
iterations (in terms of the initial functional residual), for p-order tensor methods to find ε-approximate stationary points of convex functions with ν-Hölder continuous pth derivatives. These bounds generalize the corresponding bounds given in [Citation6] for first-order methods (
).
The paper is organized as follows. In Section 2, we define our problem. In Section 3, we present complexity results for tensor schemes without acceleration. In Section 4, we present complexity results for accelerated schemes. In Section 5, we analyse tensor schemes for the composite minimization problem. Finally, in Section 6, we establish lower complexity bounds for tensor methods to find ε-approximate stationary points of convex functions under the Hölder condition. Some auxiliary results are left in the Appendix.
1.2. Notations and generalities
Let be a finite-dimensional real vector space, and
be its dual space. We denote by
the value of the linear functional
at point
. Spaces
and
are equipped with conjugate Euclidean norms:
(1) where
is a self-adjoint positive definite operator (
). For a smooth function
, denote by
its gradient, and by
its Hessian evaluated at point
. Then
and
for
.
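Since the display in (1) is not recoverable from this copy, it is worth recording the standard form of these conjugate norms in Nesterov's notation, which this section appears to follow (an assumption based on the usual convention):

```latex
\|x\| = \langle B x, x \rangle^{1/2}, \quad x \in \mathbb{E},
\qquad
\|g\|_{*} = \langle g, B^{-1} g \rangle^{1/2}
          = \max_{\|x\| \le 1} \langle g, x \rangle, \quad g \in \mathbb{E}^{*}.
```

With this choice, the Cauchy–Schwarz inequality takes the form ⟨g, x⟩ ≤ ‖g‖_* ‖x‖.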
For any integer , denote by
the directional derivative of function f at x along directions
,
. For any
and
we have
If
, we denote
by
. Using this notation, the pth order Taylor approximation of function f at
can be written as follows:
(2) where
(3) Since
is a symmetric p-linear form, its norm is defined as:
It can be shown that (see, e.g. Appendix 1 in [Citation30])
Similarly, since
is also a symmetric p-linear form for fixed
, it follows that
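Because the displayed formulas in (2)–(3) were lost in this copy, the standard form of the objects involved is worth recording (an assumption based on the usual convention): the pth-order Taylor approximation, and the fact that the norm of the symmetric p-linear form can be computed along a single repeated direction (the result attributed to Appendix 1 in [Citation30]), read

```latex
\Phi_{x,p}(y) \;=\; f(x) + \sum_{i=1}^{p} \frac{1}{i!}\, D^{i} f(x)[y-x]^{i},
\qquad
\|D^{p} f(x)\| \;=\; \max_{\|h\| \le 1} \big| D^{p} f(x)[h]^{p} \big|.
```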
2. Problem statement
In this paper, we consider methods for solving the following minimization problem
(4) where
is a p-times differentiable convex function. We assume that (4) has at least one optimal solution
. As in [Citation18], the level of smoothness of the objective f will be characterized by the family of Hölder constants
(5) From (5), it can be shown that, for all
,
(6)
(7)
and
(8) Given
, if
and
, by (6) we have
(9) This property motivates the use of the following class of models of f around
:
(10) Note that, by (9), if
then
for all
.
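For reference, the Hölder quantity in (5) and the resulting Taylor-remainder bound have the following generic shape. The exact constants used in (5)–(9) are not recoverable from this copy, so the factor 1/p! below is only the coarse bound obtained from the integral form of the Taylor remainder:

```latex
H_{f,p}(\nu) := \sup_{x \neq y}
  \frac{\|D^{p} f(x) - D^{p} f(y)\|}{\|x-y\|^{\nu}},
\qquad
\big| f(y) - \Phi_{x,p}(y) \big|
  \;\le\; \frac{H_{f,p}(\nu)}{p!}\, \|y-x\|^{p+\nu}.
```

The second inequality follows by bounding the remainder
(1/(p−1)!)∫₀¹(1−τ)^{p−1}(D^p f(x+τ(y−x)) − D^p f(x))[y−x]^p dτ with the Hölder condition.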
3. Tensor schemes without acceleration
Let us consider the following assumption:
H1:
Remark 3.1
If ν is unknown, by (11) we set
in Algorithm 1. The resulting algorithm is a universal scheme that can be viewed as a generalization of the universal second-order method (6.10) in [Citation16]. Moreover, it is worth mentioning that for p = 3 and
, one can use Gradient Methods with Bregman distance [Citation19] to approximately solve (12) in the sense of (13).
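To make the last remark concrete: Bregman (relatively smooth) gradient steps replace the Euclidean prox by a kernel adapted to quartic growth. Below is a minimal sketch of one such step for the kernel h(x) = ¼‖x‖⁴ + ½‖x‖², for which the step has a closed radial form; the toy model gradient and step constant are illustrative assumptions, not the actual subproblem (12).

```python
import numpy as np

def bregman_step(grad_g, x, L):
    """One Bregman gradient step with kernel h(x) = 0.25*||x||^4 + 0.5*||x||^2.

    The optimality condition grad_h(y) = grad_h(x) - grad_g(x)/L, with
    grad_h(z) = (||z||^2 + 1) z, forces y to be radial along
    v = grad_h(x) - grad_g(x)/L, with radius t solving t^3 + t = ||v||.
    """
    v = (np.dot(x, x) + 1.0) * x - grad_g(x) / L
    r = np.linalg.norm(v)
    if r == 0.0:
        return np.zeros_like(x)
    t = r  # Newton's method on t^3 + t - r = 0 (monotone, unique positive root)
    for _ in range(100):
        t -= (t**3 + t - r) / (3.0 * t**2 + 1.0)
    return (t / r) * v

# Toy model: g(x) = 0.5*||x||^2, grad_g(x) = x; iterating drives x to 0.
x = np.array([2.0, 0.0])
for _ in range(50):
    x = bregman_step(lambda z: z, x, L=1.0)
```

The closed-form radial solve is what makes this kernel attractive for the p = 3 subproblems mentioned above.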
For both cases (ν known or unknown), Algorithm 1 is a particular instance of Algorithm 1 in [Citation18] in which for all
. Let us define the following function of ε:
(15) The next lemma provides upper bounds on
and on the number of calls of the oracle in Algorithm 1.
Lemma 3.1
Suppose that H1 holds. Given , assume that
is a sequence generated by Algorithm 1 such that
(16) Then,
(17) and, consequently,
(18) Moreover, the number
of calls of the oracle after T iterations is bounded as follows:
(19)
Proof.
Let us prove (17) by induction. Clearly, it holds for t = 0. Assume that (17) is true for some t,
. If ν is known, then by (11) we have
. Thus, it follows from H1 and Lemma A.2 in [Citation18] that the final value of
cannot be bigger than
, since otherwise we should stop the line search earlier. Therefore,
that is, (17) holds for t + 1. On the other hand, if ν is unknown, we have
. In view of (16), Corollary A.5 in [Citation18] and
, we must have
Consequently, it follows that
that is, (17) holds for t + 1. This completes the induction argument. Using (17), for
we get
. Finally, note that at the tth iteration of Algorithm 1, the oracle is called
times. Since
, it follows that
. Thus, by (17) we get
and the proof is complete.
Let us consider the additional assumption:
H2: The level sets of f are bounded, that is,
The next theorem gives global convergence rates for Algorithm 1 in terms of the functional residual.
Theorem 3.2
Suppose that H1 and H2 are true and let be a sequence generated by Algorithm 1 such that, for
we have
Let m be the first iteration number such that
and assume that m<T. Then
(20) and, for all k,
we have
(21)
Proof.
By Lemma 3.1, this result follows from Theorem 3.2 in [Citation18] with .
Now, we can derive global convergence rates for Algorithm 1 in terms of the norm of the gradient.
Theorem 3.3
Under the same assumptions of Theorem 3.2, if T = m + 3s for some then
(22) Consequently,
(23) with
Proof.
By Theorem 3.2, we have
(24) for all k,
. In particular, it follows from (14) and (24) that
Therefore,
and so (22) holds. By assumption, we have
. Thus, by (22) we get
(25)
Finally, by analysing separately the cases in which ν is known and unknown, it follows from (25) and (15) that (23) is true.
Remark 3.2
Suppose that the objective f in (4) is nonconvex and bounded from below by
. Then, it follows from (14) and (18) that
Summing up these inequalities, we get
and so, by (15), we obtain
. This bound generalizes the bound of
proved in [Citation16] for p = 2. It agrees in order with the complexity bounds proved in [Citation23] and [Citation9] for different universal tensor methods.
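The summation argument sketched in Remark 3.2 has the following generic shape, with constants suppressed; C stands for the quantity coming from (14)–(15), whose exact form is not recoverable from this copy:

```latex
f(x_{t}) - f(x_{t+1})
  \;\ge\; C\, \|\nabla f(x_{t+1})\|_{*}^{\frac{p+\nu}{p+\nu-1}}
\quad\Longrightarrow\quad
T \cdot C \Big( \min_{1 \le t \le T} \|\nabla f(x_{t})\|_{*} \Big)^{\frac{p+\nu}{p+\nu-1}}
  \;\le\; f(x_{0}) - f^{*},
```

so that min_t ‖∇f(x_t)‖_* ≤ ε after at most O(ε^{−(p+ν)/(p+ν−1)}) iterations; for p = ν = 1 this recovers the classical O(ε^{−2}) rate of gradient descent.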
4. Accelerated tensor schemes
The schemes presented here generalize the procedures described in [Citation26] for p = 1 and p = 2. Specifically, our general scheme is obtained by adding Step 2 of Algorithm 1 at the end of Algorithm 4 in [Citation18], in order to relate the functional decrease to the norm of the gradient of f at suitable points:
Let us define the following function of ε:
(27) In Algorithm 2, note that
is independent of
. The next theorem establishes global convergence rates for the functional residual with respect to
.
Theorem 4.1
Assume that H1 holds and let the sequence be generated by Algorithm 2 such that, for
we have
(28) Then,
(29) for
.
Proof.
As in the proof of Lemma 3.1, it follows from (28), (27) and Lemmas A.6 and A.7 in [Citation18] that
which gives
Then, (29) follows directly from Theorem 4.3 in [Citation18] with
.
Now we can obtain global convergence rates for Algorithm 2 in terms of the norm of the gradient.
Theorem 4.2
Suppose that H1 holds and let sequences and
be generated by Algorithm 2. Assume that, for
we have
(30) If T = 2s for some
then
(31) where
with and
defined in (15) and (27), respectively. Consequently,
(32) if ν is known (i.e.
), and
(33)
if ν is unknown (i.e. ).
Proof.
By Theorem 4.1, we have
(34) for
. On the other hand, as in Lemma 3.1, by (30) we get
where
is defined in (15). Then, in view of (26), it follows that
(35) for
. In particular,
for
. Moreover, by the definition of
, we get
and
. Therefore
(36) and
(37) Now, since T = 2s, summing up (37), we get
Thus,
and so (31) holds. By assumption, we have
. Thus, it follows from (31) that
(38)
If ν is known, by (15) and (27) we have
. Then,
and so
(39) Combining (38), (39) and
, we obtain (32). If ν is unknown, it follows from (15) and (27) that
Then,
and so
(40) Combining (38), (40) and
, we obtain (33).
Remark 4.1
When , bounds (32) and (33) have the same dependence on ε. However, when
, the bound of
obtained for the universal scheme (i.e.
) is worse than the bound of
obtained for the non-universal scheme (i.e.
). In both cases, these complexity bounds are better than the bound of
proved for Algorithm 1.
5. Composite minimization
From now on, we will assume that ν and are known. In this setting, we can consider the composite minimization problem:
(41) where
is a convex function satisfying H1 (see page 5), and
is a simple closed convex function whose effective domain has nonempty relative interior, that is,
. We assume that there exists at least one optimal solution
for (41). By (6), if
we have
This motivates the following class of models of
around a fixed point
:
(42) where
is defined in (10). The next lemma gives a sufficient condition for function
to be convex. Its proof is an adaptation of the proof of Theorem 1 in [Citation27].
Lemma 5.1
Suppose that H1 holds for some . Then, for any
we have
(43) Moreover, if
then function
is convex for any
.
Proof.
For any , it follows from (8) that
Since
is arbitrary, we get (43).
Now, suppose that . Then, by (43) we have
Therefore,
is convex.
From Lemma 5.1, if , it follows that
is also convex. In this case, since
, any solution
of
(44) satisfies the first-order optimality condition:
(45) Therefore, there exists
such that
(46) Instead of solving (44) exactly, in our algorithms we consider inexact solutions
such that (see Note 1)
(47) for some
and
. For such points
, we define
(48) with
satisfying (47). Clearly, we have
.
Lemma 5.2
Suppose that H1 holds and let be an approximate solution of (44) such that (47) holds for some
. If
(49) then
(50)
Proof.
By (48), (7), (10), (47) and (49), we have
(51)
where the last inequality is due to
. On the other hand, by (6), (42) and (49), we have
Note that
. Thus,
(52)
Finally, combining (51) and (52), we get (50).
In this composite context, let us consider the following scheme:
For p = 3, point at Step 1 can be computed by Algorithm 2 in [Citation19], which is linearly convergent. As far as we know, the development of efficient algorithms to approximately solve (44)–(42) with p>3 is still an open problem.
Theorem 5.3
Suppose that H1 holds and that is bounded from below by
. Given
assume that
is a sequence generated by Algorithm 3 such that
for
. Then,
(53)
Proof.
By Lemma 5.2, bound (53) follows as in Remark 3.2.
5.1. Extended accelerated scheme
Let us consider the following variant of Algorithm 2 for composite minimization:
The next theorem gives the global convergence rate for Algorithm 4 in terms of the norm of the gradient. Its proof is a direct adaptation of the proof of Theorem 4.2.
Theorem 5.4
Suppose that H1 holds. Assume that is a sequence generated by Algorithm 4 such that
(56) If T = 2s for some
then
(57) Consequently,
(58)
Proof.
In view of Theorem A.2, we have
(59) for
. On the other hand, by (55) and Lemma 5.2, we have
(60) for
. Thus,
and, consequently,
(61) and
(62) Since T = 2s, combining (59), (61) and (62), we obtain
where
. Therefore,
which gives (57). Finally, by (56) we have
. Thus, (58) follows directly from (57).
5.2. Regularization approach
Now, let us consider the ideal situation in which ν, and
are known. In this case, a complexity bound with a better dependence on ε can be obtained by repeatedly applying an accelerated algorithm to a suitable regularization of
. Specifically, given
, consider the regularized problem
(63) for
(64)
Lemma 5.5
Given and
let
be defined by
, where
is the Euclidean norm defined in (1). Then,
where
.
Proof.
See [Citation32].
As a consequence of the lemma above, we have the following property.
Lemma 5.6
If H1 holds, then the pth derivative of in (64) is ν-Hölder continuous with constant
.
In view of Lemma 5.6, to solve (63) we can use the following instance of Algorithm A1 (see Appendix):
Let us consider the following restart procedure based on Algorithm 5.
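The restart idea can be sketched as follows. Everything below is a stand-in: the inner solver is plain gradient descent rather than the accelerated Algorithm 5, the regularizer uses the Euclidean norm with exponent q = p + ν in the spirit of (63)–(64), and all constants are illustrative assumptions rather than the paper's choices.

```python
import numpy as np

def restart_regularized(grad_F, x0, q, delta, eps, inner_iters=2000, step=0.1, rounds=10):
    """Sketch of a regularize-and-restart loop: repeatedly (approximately)
    minimize F + (delta/q)*||x - center||^q, restart from the point found,
    and stop once ||grad F|| <= eps. Plain gradient descent stands in for
    the accelerated inner scheme; all constants are assumptions.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        center = x.copy()

        def grad_reg(y):
            # gradient of the regularized objective around the current center
            d = y - center
            n = np.linalg.norm(d)
            return grad_F(y) + delta * n**(q - 2) * d if n > 0 else grad_F(y)

        for _ in range(inner_iters):
            x = x - step * grad_reg(x)
        if np.linalg.norm(grad_F(x)) <= eps:
            break
        delta *= 0.5  # tighten the regularization before the next restart
    return x

# Toy instance: F(x) = 0.5*||x||^2 with grad_F(x) = x, and q = p + nu = 3.
x = restart_regularized(lambda z: z, x0=[3.0, -1.0], q=3, delta=1.0, eps=1e-5)
```

Shrinking δ between restarts is what lets the regularized minimizers track a stationary point of the original objective.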
Theorem 5.7
Suppose that H1 holds and let be a sequence generated by Algorithm 6 such that
(68) Then,
(69)
Proof.
Let . By Theorem A.2 and (66), we have
(70)
On the other hand, by Lemma 5 in [Citation12] and Lemma 1 in [Citation25], function
is uniformly convex of degree
with parameter
. Thus,
(71) Combining (70) and (71), we obtain
, and so
(72) Thus, it follows from (70) and (72) that
(73) In view of Lemma 5.2, by (67) and (65), we get
(74) Then, combining (73) and (74), it follows that
In particular, for k = T−1, it follows from (68) that
(75) Since
, it follows that
. Thus, combining this with (75), we get (69).
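The uniform-convexity inequality invoked at (71) (and again at (89) below) is standard: for f uniformly convex of degree q ≥ 2 with parameter σ, minimizing the lower model over y gives

```latex
f(y) \;\ge\; f(x) + \langle \nabla f(x), y - x \rangle + \frac{\sigma}{q}\,\|y-x\|^{q}
\quad\Longrightarrow\quad
f(x) - f^{*} \;\le\; \frac{q-1}{q}
  \left(\frac{1}{\sigma}\right)^{\frac{1}{q-1}}
  \|\nabla f(x)\|_{*}^{\frac{q}{q-1}}.
```

For q = 2 this reduces to the usual strong-convexity bound f(x) − f* ≤ ‖∇f(x)‖²/(2σ).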
Corollary 5.8
Suppose that H1 holds and that . Then, Algorithm 6 with
(76) performs at most
(77) iterations of Algorithm 5 in order to generate
such that
.
Proof.
By Theorem 5.7, we can obtain with
(78) Moreover, it follows from (65), (76), the definition of
in Lemma 5.6,
and
that
(79)
Combining (78), (79) and (76), we have
(80) At this point
, we have
(81) Since
is uniformly convex of degree
with parameter
, it follows from (74) and (73) that
Therefore,
, and so
(82)
(83)
Now, combining (81), (83) and (76), we obtain
(84) The conclusion is obtained by noticing that, for δ given in (76) we have
(85)
Thus, (77) follows from multiplying (80) and (85).
Suppose now that is known. In this case, we have the following variant of Theorem 5.7.
Theorem 5.9
Suppose that H1 holds and let be a sequence generated by Algorithm 6 such that
(86) Then,
(87)
Proof.
By (75), we have
(88) Since
is uniformly convex of degree
with parameter
we have
(89)
Combining (88) and (89), we get (87).
Corollary 5.10
Suppose that H1 holds and that . Then, Algorithm 6 with
(90) performs at most
(91) iterations of Algorithm 5 in order to generate
such that
.
Proof.
By Theorem 5.9, we can obtain with
(92)
In view of (90),
and
, we also have
(93) Thus, from (92) and (93), it follows that
(94) At this point
we have
By (82) and (89),
(95)
Thus, it follows from (95) and (90) that
Finally, by (66) and (90) we have
Thus, (91) follows by multiplying (94) by the upper bound on m given above.
6. Lower complexity bounds under Hölder condition
In this section, we derive lower complexity bounds for p-order tensor methods applied to the problem (4) in terms of the norm of the gradient of f, where the objective f is convex and
for some
.
For simplicity, assume that and
. Given an approximation
for the solution of (4), we consider p-order methods that compute trial points of the form
, where the search direction
is the solution of an auxiliary problem of the form
(96) with
,
and q>1. Denote by
the set of all stationary points of function
and define the linear subspace
(97) More specifically, we consider the class of p-order tensor methods characterized by the following assumption.
Assumption 6.1
Given , the method generates a sequence of test points
such that
(98) Given
, we consider the same family of difficult problems discussed in [Citation18], namely:
(99) The next lemma establishes that for each
we have
.
Lemma 6.1
Given an integer the pth derivative of
is ν-Hölder continuous with
(100)
Proof.
See Lemma 5.1 in [Citation18].
The next lemma provides additional properties of .
Lemma 6.2
Given an integer , let function
be defined by (99). Then,
has a unique global minimizer
. Moreover,
(101)
Proof.
See Lemma 5.2 in [Citation18].
Our goal is to understand the behaviour of the tensor methods specified by Assumption 6.1 when applied to the minimization of with a suitable k. For that, let us consider the following subspaces:
Lemma 6.3
For any and
.
Proof.
It follows directly from (99).
Lemma 6.4
Let be a p-order tensor method satisfying Assumption 6.1. If
is applied to the minimization of
starting from
then the sequence
of test points generated by
satisfies
Proof.
See Lemma 2 in [Citation27].
The next lemma gives a lower bound for the norm of the gradient of at suitable points.
Lemma 6.5
Let k be an integer in the interval with
. If
then
.
Proof.
In view of (99), we have
(102) where
(103) and
(104) By (104) and (103), we have
Since
, it follows that
for i>k. Therefore,
which means that
. Then, from (102), we obtain
(105)
By (104), we have
(106) Consequently,
(107) and
(108) From (108), it can be checked that
(109) with
(110) Now, combining (107) and (108)–(109), we get
(111) Then, it follows from (106) and (111) that
(112)
Finally, by (105) and (112), we have
and the proof is complete.
The next theorem establishes a lower bound for the rate of convergence of p-order tensor methods with respect to the initial functional residual .
Theorem 6.6
Let be a p-order tensor method satisfying Assumption 6.1. Assume that for any function f with
this method ensures the rate of convergence:
(113) where
is the sequence generated by method
and
is the optimal value of f. Then, for all
such that
we have
(114)
Proof.
Suppose that method is applied to minimize function
with initial point
. By Lemma 6.4, we have
for all k,
. Thus, from Lemma 6.5, it follows that
(115) Then, combining (113), (115), Lemma 6.1 and Lemma 6.2, we get
where constant
is given in (114).
Remark 6.1
Theorem 6.6 gives a lower bound of for the rate of convergence of tensor methods with respect to the initial functional residual. For first-order methods in the Lipschitz case (i.e.
), we have
. This gives a lower complexity bound of
iterations for finding ε-stationary points of convex functions using first-order methods, which coincides with the lower bound (8a) in [Citation6]. Moreover, in view of Corollary 5.8, Algorithm 6 is suboptimal in terms of the initial residual, exhibiting a complexity gap that increases as p grows.
Now, we obtain a lower bound for the rate of convergence of p-order tensor methods with respect to the distance .
Theorem 6.7
Let be a p-order tensor method satisfying Assumption 6.1. Assume that for any function f with
this method ensures the rate of convergence:
(116) where
is the sequence generated by method
and
is a global minimizer of f. Then, for all
such that
we have
(117)
Proof.
Let us apply method for minimizing function
starting from point
. By Lemma 6.4, we have
for all k,
. Thus, from Lemma 6.5, it follows that
(118) Then, combining (116), (118), Lemma 6.1 and Lemma 6.2, we get
where constant
is given in (117).
Remark 6.2
Theorem 6.7 establishes that the lower bound for the rate of convergence of tensor methods in terms of the norm of the gradient is also of . For first-order methods in the Lipschitz case (i.e.
), we have
. This gives a lower complexity bound of
for finding ε-stationary points of convex functions using first-order methods, which coincides with the lower bound (8b) in [Citation6].
Remark 6.3
The rate of corresponds to a worst-case complexity bound of
iterations necessary to ensure
. Note that, for
, we have
Thus, by increasing the power of the oracle (i.e. the order p), our non-universal schemes become nearly optimal. For example, if
and
, we have
.
7. Conclusion
In this paper, we presented p-order methods that can find ε-approximate stationary points of convex functions that are p-times differentiable with ν-Hölder continuous pth derivatives. For the universal and the non-universal schemes without acceleration, we established iteration complexity bounds of for finding
such that
. For the case in which ν is known, we obtained improved complexity bounds of
and
for the corresponding accelerated schemes. For the case in which ν is unknown, we obtained a bound of
for a universal accelerated scheme. Similar bounds were also obtained for tensor schemes adapted to the minimization of composite convex functions. A lower complexity bound of
was obtained for the referred problem class. Therefore, in practice, our non-universal schemes become nearly optimal as we increase the order p.
As an additional result, we showed that Algorithm 6 takes at most iterations to find ε-stationary points of uniformly convex functions of degree
in the form (64). Notice that strongly convex functions are uniformly convex of degree 2. Thus, our result generalizes the known bound of
obtained for first-order schemes (p = 1) applied to strongly convex functions with Lipschitz continuous gradients (
). At this point, it is not clear to us how p-order methods (with
) behave when the objective functions are strongly convex with ν-Hölder continuous pth derivatives. Nevertheless, from the remarks made in [Citation12, p. 6] for p = 2, it appears that in our case the class of uniformly convex functions of degree
is the most suitable for p-order methods from a physical point of view.
Acknowledgements
The authors are very grateful to an anonymous referee, whose comments helped to improve the first version of this paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes on contributors
G. N. Grapiglia
G. N. Grapiglia obtained his doctoral degree in Mathematics in 2014 at Universidade Federal do Paraná (UFPR), Brazil. Currently, he is an Assistant Professor at UFPR. His research covers the development, analysis and application of optimization methods.
Yurii Nesterov
Yurii Nesterov is a professor at the Center for Operations Research and Econometrics (CORE) in Catholic University of Louvain (UCL), Belgium. He received his Ph.D. degree (Applied Mathematics) in 1984 at the Institute of Control Sciences, Moscow. His research interests are related to complexity issues and efficient methods for solving various optimization problems. He has received several international prizes, among which are the Dantzig Prize from SIAM and Mathematical Programming society (2000), the John von Neumann Theory Prize from INFORMS (2009), the SIAM Outstanding Paper Award (2014), and the Euro Gold Medal from the Association of European Operations Research Societies (2016). In 2018, he also won an Advanced Grant from the European Research Council.
Notes
1. Conditions (47) have already been used in [Citation20] and are the composite analogue of the conditions proposed in [Citation2]. It is worth mentioning that, for p = 3 and
, the tensor model
has very nice relative smoothness properties (see [Citation27]) which allow the approximate solution of (44) by Bregman Proximal Gradient Algorithms [Citation3,Citation22].
References
- M. Baes, Estimate sequence methods: Extensions and approximations (2009). Available at http://www.optimization-online.org/DB_FILE/2009/08/2372.pdf.
- E.G. Birgin, J.L. Gardenghi, J.M. Martínez, S.A. Santos, and Ph.L. Toint, Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models, Math. Program. 163 (2017), pp. 359–368. doi: 10.1007/s10107-016-1065-8
- J. Bolte, S. Sabach, M. Teboulle, and Y. Vaesburg, First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems, SIAM. J. Optim. 28 (2018), pp. 2131–2151. doi: 10.1137/17M1138558
- A. Bouaricha, Tensor methods for large, sparse unconstrained optimization, SIAM. J. Optim. 7 (1997), pp. 732–756. doi: 10.1137/S1052623494267723
- S. Bubeck, Q. Jiang, Y.T. Lee, Y. Li, and A. Sidford, Near-optimal method for highly smooth convex optimization (2019). Available at arXiv, math. OC/1812.08026v2.
- Y. Carmon, J.C. Duchi, O. Hinder, and A. Sidford, Lower bounds for finding stationary points II: first-order methods (2017). Available at arXiv, math. OC/1711.00841.
- C. Cartis, N.I.M. Gould, and Ph.L. Toint, Second-order optimality and beyond: characterization and evaluation complexity in convexly constrained nonlinear optimization, Found. Comput. Math. 18 (2018), pp. 1073–1107. doi: 10.1007/s10208-017-9363-y
- C. Cartis, N.I.M. Gould, and Ph.L. Toint, Strong evaluation complexity bounds for arbitrary-order optimization of nonconvex nonsmooth composite functions (2020). Available at arXiv:2001.10802 [math.OC].
- C. Cartis, N.I.M. Gould, and Ph.L. Toint, Universal regularized methods – varying the power, the smoothness, and the accuracy, SIAM J. Optim. 29 (2019), pp. 595–615. doi: 10.1137/16M1106316
- X. Chen, Ph.L. Toint, and H. Wang, Complexity of partially separable convexly constrained optimization with non-Lipschitz singularities, SIAM J. Optim. 29 (2019), pp. 874–903. doi: 10.1137/18M1166511
- X. Chen and Ph.L. Toint, High-order evaluation complexity for convexly-constrained optimization with non-Lipschitzian group sparsity terms, Math. Program. (2020). doi:10.1007/s10107-020-01470-9.
- N. Doikov and Yu. Nesterov, Minimizing uniformly convex functions by cubic regularization of Newton method (2019). Available at arXiv:1905.02671 [math.OC].
- N. Doikov and Yu. Nesterov, Contracting proximal methods for smooth convex optimization. CORE Discussion Paper 2019/27.
- N. Doikov and Yu. Nesterov, Inexact tensor methods with dynamic accuracies (2020). Available at arXiv:2002.09403 [math.OC].
- A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C.A. Uribe, The global rate of convergence for optimal tensor methods in smooth convex optimization (2019). Available at arXiv:1809.00389v11 [math.OC].
- G.N. Grapiglia and Yu. Nesterov, Regularized Newton methods for minimizing functions with Hölder continuous Hessians, SIAM J. Optim. 27 (2017), pp. 478–506. doi: 10.1137/16M1087801
- G.N. Grapiglia and Yu. Nesterov, Accelerated regularized Newton methods for minimizing composite convex functions, SIAM J. Optim. 29 (2019), pp. 77–99. doi: 10.1137/17M1142077
- G.N. Grapiglia and Yu. Nesterov, Tensor methods for minimizing convex functions with Hölder continuous higher-order derivatives. To appear in SIAM J. Optim.
- G.N. Grapiglia and Yu. Nesterov, On inexact solution of auxiliary problems in tensor methods for convex optimization, Optim. Methods Softw. (2020). doi:10.1080/10556788.2020.1731749.
- B. Jiang, T. Lin, and S. Zhang, A unified adaptive tensor approximation scheme to accelerated composite convex optimization (2020). Available at arXiv:1811.02427v2 [math.OC].
- B. Jiang, H. Wang, and S. Zhang, An optimal high-order tensor method for convex optimization (2020). Available at arXiv:1812.06557v3 [math.OC].
- H. Lu, R.M. Freund, and Yu. Nesterov, Relatively smooth convex optimization by first-order methods, and applications, SIAM J. Optim. 28 (2018), pp. 333–354. doi: 10.1137/16M1099546
- J.M. Martínez, On high-order model regularization for constrained optimization, SIAM J. Optim. 27 (2017), pp. 2447–2458. doi: 10.1137/17M1115472
- R.D.C. Monteiro and B.F. Svaiter, An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods, SIAM J. Optim. 23 (2013), pp. 1092–1125. doi: 10.1137/110833786
- Yu. Nesterov, Accelerating the cubic regularization of Newton's method on convex problems, Math. Program. 112 (2008), pp. 159–181. doi: 10.1007/s10107-006-0089-x
- Yu. Nesterov, How to make gradients small, Optima 88 (2012), pp. 10–11.
- Yu. Nesterov, Implementable tensor methods in unconstrained convex optimization, Math. Program. (2019). doi:10.1007/s10107-019-01449-1.
- Yu. Nesterov, Inexact accelerated high-order proximal-point methods. CORE Discussion Paper 2020/08.
- Yu. Nesterov, Inexact high-order proximal-point methods with auxiliary search procedure. CORE Discussion Paper 2020/10.
- Yu. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, SIAM, Philadelphia, 1994.
- Yu. Nesterov and B.T. Polyak, Cubic regularization of Newton method and its global performance, Math. Program. 108 (2006), pp. 177–205. doi: 10.1007/s10107-006-0706-8
- A. Rodomanov and Yu. Nesterov, Smoothness parameter of power of Euclidean norm, J. Optim. Theory Appl. 185 (2020), pp. 303–326. doi: 10.1007/s10957-020-01653-6
- R.B. Schnabel and T. Chow, Tensor methods for unconstrained optimization using second derivatives, SIAM J. Optim. 1 (1991), pp. 293–315. doi: 10.1137/0801020
Appendix. Accelerated scheme for composite minimization
To solve problem (41), we can apply the following modification of Algorithm 3 in [Citation18]:
In order to establish a convergence rate for Algorithm A1, we will need the following result.
Lemma A.1
Suppose that H1 holds and let be an approximate solution to such that
(A3)
for some . If
then
(A4)
Proof.
Denote . Then,
which gives
(A5)
From (A5), the rest of the proof follows exactly as in the proof of Lemma A.6 in [Citation18].
Theorem A.2
Suppose that H1 holds and let the sequence be generated by Algorithm A1. Then, for
(A6)
Proof.
For all , we have
(A7)
Indeed, (A7) is true for t = 0 because and . Suppose that (A7) is true for some . Then,
Thus, (A7) follows by induction. Now, let us prove that
(A8)
Again, using , we see that (A8) is true for t = 0. Assume that (A8) is true for some . Note that is uniformly convex of degree with parameter . Thus, by the induction assumption
Consequently,
(A9)
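For reference, the uniform convexity property invoked above reads as follows in one common normalization (for a function $\psi$, degree $q \ge 2$, and parameter $\sigma > 0$); the symbols $\psi$, $q$, and $\sigma$ are generic placeholders, since the specific quantities used in the proof are not legible in this rendering:

```latex
\psi(y) \;\ge\; \psi(x) + \langle \nabla \psi(x),\, y - x \rangle
  + \frac{\sigma}{q}\,\|y - x\|^{q},
\qquad \forall\, x, y.
```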
Since f is convex and differentiable and , we have
(A10)
and
(A11)
Using (A10) and (A11) in (A9), it follows that
(A12)
Note that and . Thus, combining (A12) and Lemma A.1, we obtain
where the last inequality follows from (A1) exactly as in the proof of Theorem 4.3 in [Citation18]. Thus, (A8) also holds for t + 1, which completes the induction argument.
Now, combining (A7) and (A8), we have
(A13)
Once again, as in the proof of Theorem 4.3 in [Citation18], it follows from (A1) that
(A14)
Finally, (A6) follows directly from (A13) and (A14).