Convergence rate analysis of the gradient descent–ascent method for convex

Abstract

In this paper, we study the gradient descent–ascent method for convex–concave saddle-point problems. We derive a new non-asymptotic global convergence rate in terms of distance to the solution set by using the semidefinite programming performance estimation method. The given convergence rate incorporates most parameters of the problem and it is exact for a large class of strongly convex-strongly concave saddle-point problems for one iteration. We also investigate the algorithm without strong convexity and we provide some necessary and sufficient conditions under which the gradient descent–ascent enjoys linear convergence.

1. Introduction

We consider the convex–concave saddle point problem

(1) $\min_{x \in R^{n}} \max_{y \in R^{m}} F (x, y),$

where $F : R^{n} \times R^{m} \to (- \infty, \infty)$, $F (\cdot, y)$ is convex for any fixed $y \in R^{m}$, and $F (x, \cdot)$ is concave for any fixed $x \in R^{n}$. We assume that problem (1) has a solution, that is, there exists a saddle point $(x^{⋆}, y^{⋆}) \in R^{n} \times R^{m}$ with $F (x^{⋆}, y) \leq F (x^{⋆}, y^{⋆}) \leq F (x, y^{⋆}), \quad \forall x \in R^{n}, \forall y \in R^{m} .$ We denote the solution set of problem (1) by $S^{⋆}$.

We call F smooth if, for some $L_{x}, L_{y}, L_{xy}$, we have

$\begin{aligned} (i) \quad & ‖ \nabla_{x} F (x_{2}, y) - \nabla_{x} F (x_{1}, y) ‖ \leq L_{x} ‖ x_{2} - x_{1} ‖ \quad \forall x_{1}, x_{2}, y \\ (ii) \quad & ‖ \nabla_{y} F (x, y_{2}) - \nabla_{y} F (x, y_{1}) ‖ \leq L_{y} ‖ y_{2} - y_{1} ‖ \quad \forall x, y_{1}, y_{2} \\ (iii) \quad & ‖ \nabla_{x} F (x, y_{2}) - \nabla_{x} F (x, y_{1}) ‖ \leq L_{xy} ‖ y_{2} - y_{1} ‖ \quad \forall x, y_{1}, y_{2} \\ (iv) \quad & ‖ \nabla_{y} F (x_{2}, y) - \nabla_{y} F (x_{1}, y) ‖ \leq L_{xy} ‖ x_{2} - x_{1} ‖ \quad \forall x_{1}, x_{2}, y . \end{aligned}$

The function F is said to be strongly convex-strongly concave if

$\begin{aligned} (i) \quad & F (\cdot, y) - \frac{μ_{x}}{2} ‖ \cdot ‖^{2} \text{ is convex for any fixed } y \\ (ii) \quad & F (x, \cdot) + \frac{μ_{y}}{2} ‖ \cdot ‖^{2} \text{ is concave for any fixed } x, \end{aligned}$

for some $μ_{x}, μ_{y} > 0$. Note that strong convexity-strong concavity implies that problem (1) has a unique solution $(x^{⋆}, y^{⋆})$. We denote the set of smooth strongly convex-strongly concave functions by $F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$.

Problem (1) has applications in game theory [Citation4], robust optimization [Citation5], adversarial training [Citation14], and reinforcement learning [Citation30], to name but a few. In addition, various algorithms have been developed for solving saddle point problems; see e.g. [Citation15,Citation16,Citation18,Citation25,Citation26,Citation28,Citation29].

One of the simplest approaches for handling problem (1), introduced in [Citation2, Chapter 6], is the gradient descent–ascent method, which may be regarded as a generalization of the gradient method to saddle point problems. The gradient descent–ascent method is described in Algorithm 1.
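Algorithm 1 itself is not reproduced in this text. Reading the update off the constraints $x^{2} = x^{1} - t G_{x}^{1, 1}$ and $y^{2} = y^{1} + t G_{y}^{1, 1}$ used in Section 2, one iteration appears to be a simultaneous gradient step with a constant step length t. The following is a minimal sketch under that assumption; the function name `gda`, its arguments, and the quadratic test problem are illustrative choices and do not come from the paper.

```python
import numpy as np

def gda(grad_x, grad_y, x, y, t, num_iters):
    """Simultaneous gradient descent-ascent with a constant step length t."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(num_iters):
        # Both partial gradients are evaluated at the current point (x, y).
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - t * gx, y + t * gy   # descend in x, ascend in y
    return x, y

# Hypothetical test problem: F(x, y) = x^2 + x*y - y^2/2, with saddle point (0, 0).
grad_x = lambda x, y: 2.0 * x + y
grad_y = lambda x, y: x - y
x_out, y_out = gda(grad_x, grad_y, [1.0], [1.0], t=0.1, num_iters=200)
```

On this test problem the iterates contract towards the saddle point at a linear rate, which is the behaviour analysed in the rest of the paper.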

The local and global linear convergence of Algorithm 1 have been investigated in the literature; see [Citation3,Citation13,Citation17,Citation33] and the references therein. As we investigate the global linear convergence rate of Algorithm 1, we mention one known global convergence result, which is derived by using variational inequality techniques. Let $z = (x, y)$ and let the function $ϕ : R^{n + m} \to R^{n + m}$ be given by $ϕ (z) = (\nabla_{x} F (z), - \nabla_{y} F (z))^{T}$. It is shown, see e.g. [Citation22], that

$\begin{aligned} ‖ ϕ (\bar{z}) - ϕ (\hat{z}) ‖ & \leq 2 L ‖ \bar{z} - \hat{z} ‖, \\ ⟨ ϕ (\bar{z}) - ϕ (\hat{z}), \bar{z} - \hat{z} ⟩ & \geq μ ‖ \bar{z} - \hat{z} ‖^{2}, \end{aligned}$

where $L = \max \{L_{x}, L_{y}, L_{xy}\}$ and $μ = \min \{μ_{x}, μ_{y}\}$. That is, ϕ is Lipschitz continuous and strongly monotone. By Facchinei and Pang [Citation12, Theorem 12.1.2], for $t \in (0, \frac{μ}{2 L^{2}})$, we have

(2) $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \leq (1 + 4 L^{2} t^{2} - 2 μt) (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) .$

In this study, we revisit Algorithm 1 and improve the convergence rate (2). Indeed, we derive a new convergence rate involving most parameters of problem (1). It is worth noting that if one sets $L = \max \{L_{x}, L_{y}, L_{xy}\}$ and $μ = \min \{μ_{x}, μ_{y}\}$, the new bound dominates the convergence rate (2) for any step length $t \in (0, \frac{μ}{2 L^{2}})$.

Furthermore, by setting $t = \frac{μ}{4 L^{2}}$ in (2), one can infer that Algorithm 1 has a complexity of $O (\frac{L^{2}}{μ^{2}} \ln (\frac{1}{ϵ}))$ to compute iterates $(x^{k}, y^{k})$ such that $‖ x^{k} - x^{⋆} ‖^{2} + ‖ y^{k} - y^{⋆} ‖^{2} \leq ϵ (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2})$, which is the known iteration complexity bound in the literature; see e.g. [Citation6,Citation32]. In this study, thanks to the new convergence rate given in Theorem 2.2, the order of complexity of $O ((\frac{L}{μ} + \frac{L_{xy}^{2}}{μ^{2}}) \ln (\frac{1}{ϵ}))$ is obtained when $L = \max \{L_{x}, L_{y}\}$ and $μ = \min \{μ_{x}, μ_{y}\}$, which is more informative than the above-mentioned bound. Moreover, by providing an example, we show that the given convergence rate is exact for one iteration.
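The step from the rate (2) to the complexity $O (\frac{L^{2}}{μ^{2}} \ln (\frac{1}{ϵ}))$ can be spelled out as follows. This is a routine calculation added here for convenience, not taken from the paper; it writes $\rho$ for the contraction factor in (2) and uses the elementary bound $\ln (1 / \rho) \geq 1 - \rho$ for $\rho \in (0, 1)$.

```latex
\begin{align*}
\rho &:= \bigl(1 + 4L^{2}t^{2} - 2\mu t\bigr)\Big|_{t = \mu/(4L^{2})}
      = 1 + \frac{\mu^{2}}{4L^{2}} - \frac{\mu^{2}}{2L^{2}}
      = 1 - \frac{\mu^{2}}{4L^{2}}, \\
\rho^{\,k-1} \le \epsilon
  &\iff k - 1 \ge \frac{\ln(1/\epsilon)}{\ln(1/\rho)}, \\
\ln(1/\rho) \ge 1 - \rho = \frac{\mu^{2}}{4L^{2}}
  &\implies k - 1 \ge \frac{4L^{2}}{\mu^{2}}\,\ln\frac{1}{\epsilon}
  \ \text{ is sufficient.}
\end{align*}
```

Since $μ \leq L$, the factor $\rho$ lies in $[\frac{3}{4}, 1)$, so the logarithms above are well defined.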

The goal of this work is not to achieve the optimal algorithmic complexity for the class of saddle point problems introduced above. Rather, we have the more modest goal of giving the best possible worst-case complexity analysis of the gradient descent–ascent method (Algorithm 1). It is important to note that there are accelerated and extra gradient descent–ascent methods with better worst-case complexity than Algorithm 1; see e.g. [Citation18,Citation28]. In particular, the accelerated methods may be shown to have a worst-case complexity $O (\sqrt{\frac{L_{x}^{2}}{μ_{x}^{2}} + \frac{L_{xy}^{2}}{μ_{x} μ_{y}} + \frac{L_{y}^{2}}{μ_{y}^{2}}} \cdot \ln (\frac{1}{ϵ}))$ , which may be compared to the best-known lower complexity bound $O (\sqrt{\frac{L_{x}}{μ_{x}} + \frac{L_{xy}^{2}}{μ_{x} μ_{y}} + \frac{L_{y}}{μ_{y}}} \cdot \ln (\frac{1}{ϵ}))$ for the class of pure first-order algorithms [Citation34].

The paper is organized as follows. First, we present basic definitions and preliminaries used to establish the results. Section 2 is devoted to the study of the linear convergence of Algorithm 1. In Section 3, we study the linear convergence of the gradient descent–ascent method without strong convexity. Indeed, we let $F \in F (L_{x}, L_{y}, L_{xy}, 0, 0)$ and give some necessary and sufficient conditions for the linear convergence. Moreover, we derive a convergence rate under this setting.

Notation 1

The n-dimensional Euclidean space is denoted by $R^{n}$ . We use $⟨ \cdot, \cdot ⟩$ and $‖ \cdot ‖$ to denote the Euclidean inner product and norm, respectively. For a matrix A, $A_{ij}$ denotes its $(i, j)$ th entry, and $A^{T}$ represents the transpose of A. We use $λ_{max} (A)$ and $λ_{min} (A)$ to denote the largest and the smallest eigenvalue of symmetric matrix A, respectively.

Let $X \subseteq R^{n}$ . We denote the distance function to X by $d_{X} (x) := inf_{\bar{x} \in X} ‖ x - \bar{x} ‖$ and the set-valued mapping $Π_{X} (x)$ stands for the projection of x on X, i.e. $Π_{X} (x) := {y \in X : ‖ x - y ‖ = d_{X} (x)}$ .

We call a differentiable function $f : R^{n} \to (- \infty, \infty)$ L-smooth if $‖ \nabla f (x_{1}) - \nabla f (x_{2}) ‖ \leq L ‖ x_{1} - x_{2} ‖ \quad \forall x_{1}, x_{2} \in R^{n} .$ The function $f : R^{n} \to R$ is called μ-strongly convex if the function $x \mapsto f (x) - \frac{μ}{2} ‖ x ‖^{2}$ is convex. Clearly, any convex function is 0-strongly convex. We denote the set of real-valued convex functions which are L-smooth and μ-strongly convex by $F_{μ, L} (R^{n})$ .

Let $I$ be a finite index set and let $\{(x^{i}; g^{i}; f^{i})\}_{i \in I} \subseteq R^{n} \times R^{n} \times R$ . The set $\{(x^{i}; g^{i}; f^{i})\}_{i \in I}$ is called $F_{μ, L}$ -interpolable if there exists $f \in F_{μ, L} (R^{n})$ with $f (x^{i}) = f^{i}, \quad g^{i} \in ∂f (x^{i}), \quad i \in I .$ The next theorem gives necessary and sufficient conditions for $F_{μ, L}$ -interpolability.

Theorem 1.1

[Citation27, Theorem 4]

Let $L \in (0, \infty)$ and $μ \in [0, \infty)$ and let $I$ be a finite index set. The set ${(x^{i}; g^{i}; f^{i})}_{i \in I} \subseteq R^{n} \times R^{n} \times R$ is $F_{μ, L}$ -interpolable if and only if for any $i, j \in I$ , we have

(3) $\frac{1}{2 (1 - \frac{μ}{L})} (\frac{1}{L} {‖ g^{i} - g^{j} ‖}^{2} + μ {‖ x^{i} - x^{j} ‖}^{2} - \frac{2 μ}{L} ⟨ g^{j} - g^{i}, x^{j} - x^{i} ⟩) \leq f^{i} - f^{j} - ⟨ g^{j}, x^{i} - x^{j} ⟩ .$

It is worth mentioning that, under the assumptions of Theorem 1.1, the set ${(x^{i}; g^{i}; f^{i})}_{i \in I}$ is interpolable with an L-smooth μ-strongly concave function if and only if for any $i, j \in I$ , we have

(4) $\frac{1}{2 (1 - \frac{μ}{L})} (\frac{1}{L} {‖ g^{i} - g^{j} ‖}^{2} + μ {‖ x^{i} - x^{j} ‖}^{2} + \frac{2 μ}{L} ⟨ g^{j} - g^{i}, x^{j} - x^{i} ⟩) \leq - f^{i} + f^{j} + ⟨ g^{j}, x^{i} - x^{j} ⟩ .$
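Conditions (3) and (4) are finite sets of inequalities, so they can be checked directly on given data. The sketch below does this with NumPy, assuming $μ < L$ so that the factor $\frac{1}{2 (1 - \frac{μ}{L})}$ is finite. The function names, the tolerance, and the sample data drawn from $f (x) = \frac{a}{2} ‖ x ‖^{2}$ are illustrative assumptions, not part of the paper.

```python
import numpy as np

def satisfies_condition_3(xs, gs, fs, mu, L, tol=1e-10):
    """Check inequality (3) for every ordered pair (i, j); assumes 0 <= mu < L."""
    scale = 1.0 / (2.0 * (1.0 - mu / L))
    for i in range(len(xs)):
        for j in range(len(xs)):
            dx, dg = xs[j] - xs[i], gs[j] - gs[i]
            lhs = scale * (dg @ dg / L + mu * (dx @ dx) - 2.0 * mu / L * (dg @ dx))
            rhs = fs[i] - fs[j] - gs[j] @ (xs[i] - xs[j])
            if lhs > rhs + tol:
                return False
    return True

def satisfies_condition_4(xs, gs, fs, mu, L, tol=1e-10):
    """Condition (4) is condition (3) applied to the negated values and gradients."""
    return satisfies_condition_3(xs, [-g for g in gs], [-f for f in fs], mu, L, tol)

# Hypothetical data sampled from f(x) = (a/2)|x|^2, which is in F_{mu,L} iff mu <= a <= L.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(4)]

def sample(a):
    return [a * x for x in xs], [0.5 * a * (x @ x) for x in xs]

gs_ok, fs_ok = sample(2.0)     # curvature inside [mu, L] = [1, 3]
gs_bad, fs_bad = sample(4.0)   # curvature above L
ok = satisfies_condition_3(xs, gs_ok, fs_ok, mu=1.0, L=3.0)
bad = satisfies_condition_3(xs, gs_bad, fs_bad, mu=1.0, L=3.0)
```

Implementing (4) by negating the data reflects the fact that f is L-smooth and μ-strongly concave exactly when $- f$ belongs to $F_{μ, L} (R^{n})$.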

2. The gradient descent–ascent method

In this section, we study the convergence rate of the gradient descent–ascent method when $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$ with $\min \{μ_{x}, μ_{y}\} > 0$. Specifically, we investigate the worst-case behaviour of one step of Algorithm 1 in terms of the distance to the unique saddle point $(x^{⋆}, y^{⋆})$. Let $(x^{2}, y^{2})$ be generated by the algorithm using the starting point $(x^{1}, y^{1})$. The worst-case convergence rate is in essence the optimal value of an optimization problem. Indeed, the worst-case convergence rate of Algorithm 1 is given by the optimal value of the following abstract optimization problem:

(5) $\begin{aligned} \max \quad & \frac{‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2}}{‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}} \\ \text{s.t.} \quad & (x^{2}, y^{2}) \text{ is generated by Algorithm 1 w.r.t. } F, x^{1}, y^{1} \\ & (x^{⋆}, y^{⋆}) \text{ is the unique saddle point of problem (1)} \\ & F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y}) \\ & x^{1} \in R^{n}, y^{1} \in R^{m} . \end{aligned}$

In problem (5), $F, x^{1}, x^{2}, x^{⋆}, y^{1}, y^{2}, y^{⋆}$ are decision variables and $μ_{x}, L_{x}, μ_{y}, L_{y}, L_{xy}, t$ are fixed parameters. At first glance, problem (5) seems completely intractable, but its solution may in fact be approximated using a suitable semidefinite programming (SDP) problem, as shown below. This type of approximation is an example of the so-called SDP performance estimation method, which was introduced by Drori and Teboulle [Citation10].

Suppose that

$\begin{aligned} F^{i, j} & = F (x^{i}, y^{j}), \quad i, j \in \{1, 2, ⋆\}, \\ G_{x}^{i, j} & = \nabla_{x} F (x^{i}, y^{j}), \quad i, j \in \{1, 2, ⋆\}, \\ G_{y}^{i, j} & = \nabla_{y} F (x^{i}, y^{j}), \quad i, j \in \{1, 2, ⋆\}, \end{aligned}$

where the indices 1, 2 and ⋆ refer to the starting point, the point generated by Algorithm 1 and the saddle point of the problem, respectively. Note that, due to the necessary and sufficient optimality conditions for convex–concave saddle point problems, we have $G_{x}^{⋆, ⋆} = 0, G_{y}^{⋆, ⋆} = 0.$ By using Theorem 1.1, problem (5) may be relaxed as a finite dimensional optimization problem,

(6) $\begin{aligned} \max \quad & \frac{‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2}}{‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}} \\ \text{s.t.} \quad & \{(x^{1}; G_{x}^{1, k}; F^{1, k}), (x^{2}; G_{x}^{2, k}; F^{2, k}), (x^{⋆}; G_{x}^{⋆, k}; F^{⋆, k})\} \text{ satisfy (3) for } k \in \{1, 2, ⋆\} \text{ w.r.t. } μ_{x}, L_{x} \\ & \{(y^{1}; G_{y}^{k, 1}; F^{k, 1}), (y^{2}; G_{y}^{k, 2}; F^{k, 2}), (y^{⋆}; G_{y}^{k, ⋆}; F^{k, ⋆})\} \text{ satisfy (4) for } k \in \{1, 2, ⋆\} \text{ w.r.t. } μ_{y}, L_{y} \\ & ‖ G_{x}^{k, i} - G_{x}^{k, j} ‖ \leq L_{xy} ‖ y^{i} - y^{j} ‖, \quad i, j, k \in \{1, 2, ⋆\} \\ & ‖ G_{y}^{i, k} - G_{y}^{j, k} ‖ \leq L_{xy} ‖ x^{i} - x^{j} ‖, \quad i, j, k \in \{1, 2, ⋆\} \\ & x^{2} = x^{1} - t G_{x}^{1, 1} \\ & y^{2} = y^{1} + t G_{y}^{1, 1}, \\ & G_{x}^{⋆, ⋆} = 0, G_{y}^{⋆, ⋆} = 0. \end{aligned}$

In problem (6), $\{(x^{i}; G_{x}^{i, j}; F^{i, j})\}$ and $\{(y^{i}; G_{y}^{j, i}; F^{j, i})\}$ $(i, j \in \{1, 2, ⋆\})$ are decision variables. We may assume that $x^{⋆} = 0$ and $y^{⋆} = 0$ as Algorithm 1 is invariant under translation. By elimination, problem (6) may be reformulated as follows,

(7) $\begin{aligned} \max \quad & \frac{‖ x^{1} - t G_{x}^{1, 1} ‖^{2} + ‖ y^{1} + t G_{y}^{1, 1} ‖^{2}}{‖ x^{1} ‖^{2} + ‖ y^{1} ‖^{2}} \\ \text{s.t.} \quad & \{(x^{1}; G_{x}^{1, k}; F^{1, k}), (x^{1} - t G_{x}^{1, 1}; G_{x}^{2, k}; F^{2, k}), (0; G_{x}^{⋆, k}; F^{⋆, k})\} \text{ satisfy (3) for } k \in \{1, 2, ⋆\} \text{ w.r.t. } μ_{x}, L_{x} \\ & \{(y^{1}; G_{y}^{k, 1}; F^{k, 1}), (y^{1} + t G_{y}^{1, 1}; G_{y}^{k, 2}; F^{k, 2}), (0; G_{y}^{k, ⋆}; F^{k, ⋆})\} \text{ satisfy (4) for } k \in \{1, 2, ⋆\} \text{ w.r.t. } μ_{y}, L_{y} \\ & ‖ G_{x}^{k, 1} - G_{x}^{k, 2} ‖^{2} \leq L_{xy}^{2} ‖ t G_{y}^{1, 1} ‖^{2}, \quad k \in \{1, 2, ⋆\} \\ & ‖ G_{y}^{1, k} - G_{y}^{2, k} ‖^{2} \leq L_{xy}^{2} ‖ t G_{x}^{1, 1} ‖^{2}, \quad k \in \{1, 2, ⋆\} \\ & ‖ G_{x}^{k, 1} - G_{x}^{k, ⋆} ‖^{2} \leq L_{xy}^{2} ‖ y^{1} ‖^{2}, \quad k \in \{1, 2, ⋆\} \\ & ‖ G_{y}^{1, k} - G_{y}^{⋆, k} ‖^{2} \leq L_{xy}^{2} ‖ x^{1} ‖^{2}, \quad k \in \{1, 2, ⋆\} \\ & ‖ G_{x}^{k, 2} - G_{x}^{k, ⋆} ‖^{2} \leq L_{xy}^{2} ‖ y^{1} + t G_{y}^{1, 1} ‖^{2}, \quad k \in \{1, 2, ⋆\} \\ & ‖ G_{y}^{2, k} - G_{y}^{⋆, k} ‖^{2} \leq L_{xy}^{2} ‖ x^{1} - t G_{x}^{1, 1} ‖^{2}, \quad k \in \{1, 2, ⋆\} \\ & G_{x}^{⋆, ⋆} = 0, G_{y}^{⋆, ⋆} = 0. \end{aligned}$

To approximate the solution of problem (7), we reformulate it as a semidefinite program by using the Gram matrix of the unknown vectors in the problem. Indeed, we form the Gram matrices X and Y as follows,

$\begin{aligned} U & = (\begin{matrix} x^{1} & x^{2} & G_{x}^{1, 1} & G_{x}^{1, 2} & G_{x}^{1, ⋆} & G_{x}^{2, 1} & G_{x}^{2, 2} & G_{x}^{2, ⋆} & G_{x}^{⋆, 1} & G_{x}^{⋆, 2} \end{matrix}) \\ V & = (\begin{matrix} y^{1} & y^{2} & G_{y}^{1, 1} & G_{y}^{1, 2} & G_{y}^{1, ⋆} & G_{y}^{2, 1} & G_{y}^{2, 2} & G_{y}^{2, ⋆} & G_{y}^{⋆, 1} & G_{y}^{⋆, 2} \end{matrix}) \\ X & = U^{T} U, \quad Y = V^{T} V . \end{aligned}$

This results in an SDP problem, as long as we view the value $‖ x^{1} ‖^{2} + ‖ y^{1} ‖^{2}$, which appears in the denominator of the objective of problem (7), as a fixed parameter. For this reason we may indeed view problem (7) as an SDP problem in the positive semidefinite matrix variables X and Y. The interested reader may refer to [Citation27,Citation31] for more details concerning the Gram matrix formulation, and SDP performance estimation in general.
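To illustrate why the Gram matrix substitution turns quadratic expressions in the unknown vectors into linear expressions in the matrix variable, here is a small NumPy sketch. It uses only two of the columns of U, namely $x^{1}$ and $G_{x}^{1, 1}$, filled with random data; the dimension, the step length, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 5, 0.3                      # illustrative dimension and step length
x1 = rng.standard_normal(n)        # stands in for x^1
g11 = rng.standard_normal(n)       # stands in for G_x^{1,1}

U = np.column_stack([x1, g11])     # two of the columns of U
X = U.T @ U                        # 2x2 Gram matrix, positive semidefinite by construction

coeffs = np.array([1.0, -t])       # x^1 - t*G_x^{1,1} equals U @ coeffs
quad_via_gram = coeffs @ X @ coeffs            # linear in the entries of X
quad_direct = np.linalg.norm(x1 - t * g11) ** 2
min_eig = np.linalg.eigvalsh(X).min()
```

The two values `quad_via_gram` and `quad_direct` agree up to rounding, which is the mechanism that lets the numerator of the objective in (7) and its norm constraints be written in terms of X and Y alone.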

For the convenience of the analysis, we investigate the linear convergence of Algorithm 1 in terms of $L = max {L_{x}, L_{y}}$ and $μ = min {μ_{x}, μ_{y}}$ . Before we present the main theorem in this section, we need to present a lemma.

Lemma 2.1

Let $0 < μ \leq L$ , $c \geq 0$ and let $I = (0, \frac{2 μ}{μL + c^{2}})$ . Let the function $u : I \to R$ be given by $u (t) = \frac{1}{2} (L^{2} + μ^{2} + 2 c^{2}) t^{2} - (L + μ) t + \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 c^{2} t^{2}} .$ Then u is convex on I and $u (I) \subseteq [- 1, 0)$ .

Proof.

Consider the function $v : I \to R$ given by $v (t) = (L^{2} + μ^{2} + 2 c^{2}) t + (L - μ) \sqrt{(Lt + μt - 2)^{2} + 4 c^{2} t^{2}} .$ The function v is convex and positive on I. By elementary calculus, one can show that $v' (0) > 0$ . So v is increasing on I due to the convexity. As the product of positive monotone convex functions is a convex function, the function $t \mapsto tv (t)$ is also convex. Since $u (t) = \frac{1}{2} tv (t) - (L + μ) t$ , this implies the convexity of u. Indeed, u is strictly convex on I. Since strictly convex functions attain their maximum at the endpoints of a given interval, $u (t) < \max \{u (0), u (\frac{2 μ}{μL + c^{2}})\} = 0$ for $t \in I$ . It remains to show that $\min_{t \in I} u (t) \geq - 1$ . This follows from the fact that $u (t) \geq \frac{1}{2} (L^{2} + μ^{2}) t^{2} - (L + μ) t \geq \frac{- 1}{2} (1 + \frac{2 Lμ}{L^{2} + μ^{2}}) \geq - 1,$ and the proof is complete.
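The statement of Lemma 2.1 can be spot-checked numerically. The sketch below evaluates u on a grid of interior points of I for one illustrative parameter choice and tests the range $[- 1, 0)$ and discrete convexity via second differences. It is a sanity check under assumed parameter values, not a substitute for the proof above.

```python
import numpy as np

def u(t, L, mu, c):
    """The function u from Lemma 2.1."""
    root = np.sqrt((L * t + mu * t - 2.0) ** 2 + 4.0 * c**2 * t**2)
    return (0.5 * (L**2 + mu**2 + 2.0 * c**2) * t**2
            - (L + mu) * t + 0.5 * (L - mu) * t * root)

L, mu, c = 2.0, 1.0, 1.0                    # illustrative values with 0 < mu <= L, c >= 0
t_max = 2.0 * mu / (mu * L + c**2)          # right endpoint of I
ts = np.linspace(0.0, t_max, 1001)[1:-1]    # interior grid points of I
vals = u(ts, L, mu, c)

in_range = bool(np.all((vals >= -1.0) & (vals < 0.0)))
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]
looks_convex = bool(np.all(second_diff >= -1e-12))
```

For these values, u also vanishes at the right endpoint `t_max`, in line with the identity $\max \{u (0), u (\frac{2 μ}{μL + c^{2}})\} = 0$ used in the proof.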

By the weak duality theorem for SDP, one may demonstrate an upper bound for the optimal value of the SDP problem (7) by constructing a feasible solution to its dual problem, i.e. feasible dual multipliers for the constraints of problem (7). This is done in the next theorem. In the proof, the correct values of the dual multipliers are simply given, and their correctness is verified. The correct values were obtained by solving the SDP problem (7) repeatedly for different numerical values of the parameters, and noting the (numerical) optimal dual multiplier values. Based on these values, it was possible to deduce the analytical expressions for the multipliers. For this reason, the proof was found in a computer-assisted way, but it does not rely on any numerical calculations. Having said that, the proof involves a long identity, given in full in Appendix 2 to this paper, that is so long that it could only be obtained in a computer-assisted way.

Theorem 2.2

Let $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$ . Suppose that $L = \max \{L_{x}, L_{y}\}$ and $μ = \min \{μ_{x}, μ_{y}\} > 0$ . If $t \in (0, \frac{2 μ}{μL + L_{xy}^{2}})$ , then Algorithm 1 generates $(x^{2}, y^{2})$ such that

(8) $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \leq α (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}),$

where $α = 1 + \frac{1}{2} (L^{2} + μ^{2} + 2 L_{xy}^{2}) t^{2} - (L + μ) t + \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 L_{xy}^{2} t^{2}} .$
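To get a feel for the factor α, the sketch below evaluates it numerically and compares it with two reference values. The first is the exact one-step worst case on a scalar quadratic instance, $F (x, y) = \frac{L}{2} x^{2} + L_{xy} xy - \frac{μ}{2} y^{2}$, for which one step of Algorithm 1 is a linear map. The second is the older rate (2). The instance is a hypothetical choice made for this illustration and is not necessarily the example the authors use; the parameter values are likewise assumptions.

```python
import numpy as np

def alpha(t, L, mu, L_xy):
    """Contraction factor alpha from Theorem 2.2."""
    root = np.sqrt((L * t + mu * t - 2.0) ** 2 + 4.0 * L_xy**2 * t**2)
    return (1.0 + 0.5 * (L**2 + mu**2 + 2.0 * L_xy**2) * t**2
            - (L + mu) * t + 0.5 * (L - mu) * t * root)

L, mu, L_xy = 2.0, 1.0, 1.0     # illustrative parameters
t = 0.1                         # inside (0, 2*mu/(mu*L + L_xy**2)) = (0, 2/3)

# Hypothetical instance F(x, y) = (L/2) x^2 + L_xy*x*y - (mu/2) y^2, saddle point (0, 0).
# One gradient descent-ascent step is the linear map z -> M z.
M = np.array([[1.0 - t * L, -t * L_xy],
              [t * L_xy, 1.0 - t * mu]])
worst_ratio = np.linalg.norm(M, 2) ** 2      # max over z of |M z|^2 / |z|^2

a = alpha(t, L, mu, L_xy)
L_old = max(L, L_xy)                         # L as defined for rate (2)
old_rate = 1.0 + 4.0 * L_old**2 * t**2 - 2.0 * mu * t   # needs t < mu / (2 * L_old**2)
```

On this instance the squared spectral norm of M coincides with α, so the bound (8) is attained there, while the rate (2) is noticeably weaker at the same step length.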

Proof.

As mentioned earlier, we may assume without loss of generality that $x^{⋆} = 0$ and $y^{⋆} = 0$ . By assumption, $F (\cdot, y) \in F_{μ, L} (R^{n})$ and $- F (x, \cdot) \in F_{μ, L} (R^{m})$ for any fixed x, y. Without loss of generality, we may also assume that $L_{xy} = 1$ , by replacing F by $\frac{1}{L_{xy}} F$ . This follows from the observation that Algorithm 1 generates the same point $(x^{2}, y^{2})$ for the problem $\min_{x \in R^{n}} \max_{y \in R^{m}} \frac{1}{L_{xy}} F (x, y),$ with the step length $L_{xy} t$ . Moreover, one has $\frac{1}{L_{xy}} F \in F (\frac{L_{x}}{L_{xy}}, \frac{L_{y}}{L_{xy}}, 1, \frac{μ_{x}}{L_{xy}}, \frac{μ_{y}}{L_{xy}})$ if and only if $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$ . Now let $t \in (0, \frac{2 μ}{μL + 1})$ and define the multipliers

$\begin{aligned} \bar{α} & = 1 + \frac{1}{2} (L^{2} + μ^{2} + 2) t^{2} - (L + μ) t + \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}}, \\ β & = \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}}, \\ γ_{1} & = \frac{t (t^{2} (2 + L^{2} + Lμ) - t (3 L + μ) + (Lt - 1) β + 2)}{β}, \\ γ_{2} & = \frac{t (t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) β + 2)}{β}, \\ γ_{3} & = \frac{t^{2} (β + Lt - μt)}{2 β} . \end{aligned}$

One can verify that $γ_{1}, γ_{2}, γ_{3} \geq 0$ ; since this calculation is somewhat tedious, we present it in Appendix 1. Moreover, Lemma 2.1 implies that $\bar{α} \in [0, 1)$ .
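The sign claim for the multipliers can be spot-checked numerically before turning to Appendix 1. The sketch below evaluates $γ_{1}, γ_{2}, γ_{3}$ (in the normalization $L_{xy} = 1$) on a grid of admissible step lengths for two illustrative pairs $(L, μ)$. The function name, the grid, and the parameter pairs are assumptions made for this check, which does not replace the analytical argument.

```python
import numpy as np

def multipliers(t, L, mu):
    """gamma_1, gamma_2, gamma_3 from the proof, normalized so that L_xy = 1."""
    beta = np.sqrt((L * t + mu * t - 2.0) ** 2 + 4.0 * t**2)
    g1 = t * (t**2 * (2.0 + L**2 + L * mu) - t * (3.0 * L + mu)
              + (L * t - 1.0) * beta + 2.0) / beta
    g2 = t * (t**2 * (2.0 + mu**2 + L * mu) - t * (3.0 * mu + L)
              + (1.0 - mu * t) * beta + 2.0) / beta
    g3 = t**2 * (beta + L * t - mu * t) / (2.0 * beta)
    return g1, g2, g3

min_gamma = np.inf
for L, mu in [(2.0, 1.0), (5.0, 0.5)]:          # illustrative pairs with 0 < mu <= L
    t_max = 2.0 * mu / (mu * L + 1.0)           # right endpoint of the step-length interval
    ts = np.linspace(0.0, t_max, 501)[1:-1]     # interior grid points
    for g in multipliers(ts, L, mu):
        min_gamma = min(min_gamma, float(g.min()))
```

A non-negative `min_gamma` (up to rounding) is consistent with the claim; note that $γ_{1}$ becomes very small as t approaches 0, so a small tolerance is appropriate when testing its sign.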

The idea of the proof is now as follows: we first establish that, for any feasible solution of the SDP problem (Equation7(7) $\begin{aligned} max & \frac{{‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2}}{‖ x^{1} ‖^{2} + ‖ y^{1} ‖^{2}} \\ s . t . & {(x^{1}; G_{x}^{1, k}; F^{1, k}), (x^{1} - t G_{x}^{1, 1}; G_{x}^{2, k}; F^{2, k}), (0; G_{x}^{⋆, k}; F^{⋆, k})} satisfy (3) for \\ k \in {1, 2, ⋆} w . r . t . μ_{x}, L_{x} \\ {(y^{1}; G_{y}^{k, 1}; F^{k, 1}), (y^{1} + t G_{y}^{1, 1}; G_{y}^{k, 2}; F^{k, 2}), (0; G_{y}^{k, ⋆}; F^{k, ⋆})} satisfy (4) for \\ k \in {1, 2, ⋆} w . r . t . μ_{y}, L_{y} \\ {‖ G_{x}^{k, 1} - G_{x}^{k, 2} ‖}^{2} \leq L_{xy}^{2} {‖ t G_{y}^{1, 1} ‖}^{2}, k \in {1, 2, ⋆} \\ {‖ G_{y}^{1, k} - G_{y}^{2, k} ‖}^{2} \leq L_{xy}^{2} {‖ t G_{x}^{1, 1} ‖}^{2}, k \in {1, 2, ⋆} \\ {‖ G_{x}^{k, 1} - G_{x}^{k, ⋆} ‖}^{2} \leq L_{xy}^{2} ‖ y^{1} ‖^{2}, k \in {1, 2, ⋆} \\ {‖ G_{y}^{1, k} - G_{y}^{⋆, k} ‖}^{2} \leq L_{xy}^{2} ‖ x^{1} ‖^{2}, k \in {1, 2, ⋆} \\ {‖ G_{x}^{k, 2} - G_{x}^{k, ⋆} ‖}^{2} \leq L_{xy}^{2} {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2}, k \in {1, 2, ⋆} \\ {‖ G_{y}^{2, k} - G_{y}^{⋆, k} ‖}^{2} \leq L_{xy}^{2} {‖ x^{1} - t G_{x}^{1, 1} ‖}^{2}, k \in {1, 2, ⋆} \\ G_{x}^{⋆, ⋆} = 0, G_{y}^{⋆, ⋆} = 0. \end{aligned}$ (7) ), it holds that (9) ${‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2} - \bar{α} ({‖ x^{1} ‖}^{2} + {‖ y^{1} ‖}^{2}) \leq 0.$ (9) We do this by establishing an algebraic identity for the left-hand side of the inequality (Equation9(9) ${‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2} - \bar{α} ({‖ x^{1} ‖}^{2} + {‖ y^{1} ‖}^{2}) \leq 0.$ (9) ). 
The first and last terms of this identity (shown in full in Appendix 2 to this paper) are as follows: $\begin{aligned} {‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2} - \bar{α} ({‖ x^{1} ‖}^{2} + {‖ y^{1} ‖}^{2}) \\ = - γ_{1} (F^{1, 1} - F^{⋆, 1} - ⟨ G_{x}^{⋆, 1}, x^{1} ⟩ \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{x}^{1, 1} - G_{x}^{⋆, 1} ‖}^{2} + μ {‖ x^{1} ‖}^{2} - \frac{2 μ}{L} ⟨ G_{x}^{⋆, 1} - G_{x}^{1, 1}, - x^{1} ⟩)) \\ ⋮ \\ - \frac{t {(β + Lt - μt)}^{2}}{4 (L - μ) β} {‖ G_{y}^{1, 1} - G_{y}^{⋆, 1} - G_{y}^{1, ⋆} ‖}^{2} . \end{aligned}$ Note that the first term on the right-hand side is indeed non-positive, since $γ_{1} \geq 0$, and the expression in brackets is non-negative at any feasible solution of the SDP problem (7), since it corresponds to one of the constraints in (7). The last term is non-positive as well, since it is the product of a non-positive multiplier with a squared expression. The remaining terms in the identity are similarly non-positive (see Appendix 2), proving the inequality (9). All that remains is to recognize that, in (9), $G_{x}^{1, 1}$ corresponds to $\nabla_{x} F (x^{1}, y^{1})$, so that $x^{1} - t G_{x}^{1, 1}$ corresponds to $x^{2}$, etc. 
This yields the statement of the theorem, after rescaling to remove the assumption $L_{xy} = 1$ .

One may wonder how we obtained the (analytical) expression for α in Theorem 2.2. Consider the optimization problem $min_{x \in R^{n}} f (x),$ (10) where $f \in F_{μ, L}$. It is known that the quadratic function $q (x) = x^{T} Qx$ with $λ_{max} (Q) = L$ and $λ_{min} (Q) = μ$ attains the worst-case convergence rate for the gradient method; see e.g. [Citation9]. We guessed that this property may hold for problem (1) and we investigated the bilinear saddle point problem $min_{x \in R^{2}} max_{y \in R^{2}} \frac{1}{2} x^{T} (\begin{matrix} L_{x} & 0 \\ 0 & μ_{x} \end{matrix}) x + x^{T} (\begin{matrix} 0 & L_{xy} \\ L_{xy} & 0 \end{matrix}) y - \frac{1}{2} y^{T} (\begin{matrix} L_{y} & 0 \\ 0 & μ_{y} \end{matrix}) y,$ (11) where $L_{x} \geq μ_{x} > 0$, $L_{y} \geq μ_{y} > 0$ and $L_{xy}$ are fixed parameters, and we derived the worst-case convergence rate of Algorithm 1 with respect to this problem. Our numerical experiments showed that the derived convergence rate is the same as the optimal value of the semidefinite programming problem corresponding to problem (7). Moreover, as a by-product, we show that the convergence rate (8) is exact for one iteration by using problem (11); see Proposition 2.4.

Theorem 2.2 provides some new information concerning Algorithm 1. Firstly, Theorem 2.2 improves the known convergence factor in the literature; see our discussion in the Introduction. In addition, it establishes the convergence rate for step lengths in a larger interval. Secondly, it does not assume second-order continuous differentiability of F, which is commonly used for deriving a local convergence rate; see [Citation17,Citation20,Citation33]. Finally, the given convergence rate incorporates three parameters, $μ = min {μ_{x}, μ_{y}}$, $L = max {L_{x}, L_{y}}$ and $L_{xy}$, which is more informative than the results in the literature, which are mostly given in terms of $μ = min {μ_{x}, μ_{y}}$ and $L = max {L_{x}, L_{y}, L_{xy}}$; see [Citation21,Citation33,Citation34] and references therein. Even if one considers $L = max {L_{x}, L_{y}, L_{xy}}$ and $μ = min {μ_{x}, μ_{y}}$, convergence rate (8) dominates (2). This follows from the fact that, for $t \in (0, \frac{μ}{2 L^{2}})$, one has

$\begin{aligned} (1 + 4 L^{2} t^{2} - 2 μt) - (1 + \frac{1}{2} (3 L^{2} + μ^{2}) t^{2} - (L + μ) t + \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 L^{2} t^{2}}) \\ \geq (2 L^{2} + Lμ - μ^{2}) t^{2} \geq 2 L^{2} t^{2}, \end{aligned}$ where the first inequality results from $\sqrt{(Lt + μt - 2)^{2} + 4 L^{2} t^{2}} \leq (2 - Lt - μt) + 2 Lt$. In addition, in this case, the step length can take values in a larger interval, since $(0, \frac{μ}{2 L^{2}}) \subseteq (0, \frac{2 μ}{L (L + μ)})$. Moreover, Conjecture 2.6 discusses the convergence rate in terms of $L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y}$.
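
For readers who want the intermediate step, the first inequality above can be expanded as follows. This is a worked check that uses only quantities already defined in the text. For $t \in (0, \frac{μ}{2 L^{2}})$ one has $(L + μ) t < 2$, so the bound on the square root reads $\sqrt{(Lt + μt - 2)^{2} + 4 L^{2} t^{2}} \leq 2 + (L - μ) t$, and hence $\begin{aligned} \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 L^{2} t^{2}} & \leq (L - μ) t + \frac{1}{2} (L - μ)^{2} t^{2}, \\ 4 L^{2} t^{2} - 2 μt - \frac{1}{2} (3 L^{2} + μ^{2}) t^{2} + (L + μ) t - (L - μ) t - \frac{1}{2} (L - μ)^{2} t^{2} & = (2 L^{2} + Lμ - μ^{2}) t^{2} . \end{aligned}$ The terms linear in t cancel, and the final bound by $2 L^{2} t^{2}$ uses $Lμ \geq μ^{2}$.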

The next proposition gives the optimal step length with respect to the worst case convergence rate.

Proposition 2.3

Let $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$. If $L = max {L_{x}, L_{y}}$ and $μ = min {μ_{x}, μ_{y}} > 0$, then the optimal step length for Algorithm 1 with respect to the bound (8) is $t^{⋆} = \frac{2 ((L + μ) \sqrt{L_{xy}^{2} + Lμ} + L_{xy} (μ - L))}{(4 L_{xy}^{2} + (L + μ)^{2}) \sqrt{L_{xy}^{2} + Lμ}} .$ (12) Moreover, the convergence rate with respect to $t^{⋆}$ is $α^{⋆} = \frac{8 L_{xy} (L^{2} - μ^{2}) \sqrt{Lμ + L_{xy}^{2}} + {(L^{2} - μ^{2})}^{2} + 16 L_{xy}^{2} (Lμ + L_{xy}^{2})}{{((L + μ)^{2} + 4 L_{xy}^{2})}^{2}} .$ (13)

Proof.

Let $α : [0, \frac{2 μ}{μL + L_{xy}^{2}}] \to R$ be given by $α (t) = 1 + \frac{1}{2} (L^{2} + μ^{2} + 2 L_{xy}^{2}) t^{2} - (L + μ) t + \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 L_{xy}^{2} t^{2}} .$ By Lemma 2.1, α is a strictly convex function on its domain. By doing some algebra, one can verify that $α^{'} (t^{⋆}) = 0$, which implies that $t^{⋆}$ is the minimizer.
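
The algebra in this proof can be sanity-checked numerically. The following minimal sketch evaluates α(t), the step length (12) and the rate (13) for one illustrative parameter choice and compares them on a grid. The parameter values and the function names `alpha`, `t_star` and `alpha_star` are assumptions made for this illustration, not part of the paper.

```python
import math

def alpha(t, L, mu, Lxy):
    """One-step factor alpha(t) as defined in the proof of Proposition 2.3."""
    root = math.sqrt((L * t + mu * t - 2.0) ** 2 + 4.0 * Lxy**2 * t**2)
    return (1.0 + 0.5 * (L**2 + mu**2 + 2.0 * Lxy**2) * t**2
            - (L + mu) * t + 0.5 * (L - mu) * t * root)

def t_star(L, mu, Lxy):
    """Step length from formula (12)."""
    s = math.sqrt(Lxy**2 + L * mu)
    return 2.0 * ((L + mu) * s + Lxy * (mu - L)) / ((4.0 * Lxy**2 + (L + mu) ** 2) * s)

def alpha_star(L, mu, Lxy):
    """Rate from formula (13)."""
    s = math.sqrt(L * mu + Lxy**2)
    num = (8.0 * Lxy * (L**2 - mu**2) * s + (L**2 - mu**2) ** 2
           + 16.0 * Lxy**2 * (L * mu + Lxy**2))
    return num / ((L + mu) ** 2 + 4.0 * Lxy**2) ** 2

# Illustrative parameter values (assumed, not taken from the paper).
L, mu, Lxy = 2.0, 1.0, 1.0
t_max = 2.0 * mu / (mu * L + Lxy**2)       # right end of the domain of alpha
ts = t_star(L, mu, Lxy)

# Coarse grid search over the domain, to compare against alpha(t_star).
grid = [t_max * i / 2000 for i in range(2001)]
grid_min = min(alpha(t, L, mu, Lxy) for t in grid)

print("t*          =", ts)
print("alpha(t*)   =", alpha(ts, L, mu, Lxy))
print("alpha* (13) =", alpha_star(L, mu, Lxy))
print("grid minimum=", grid_min)
```

For these values the grid minimum should not fall below α(t⋆), and α(t⋆) should agree with (13) up to rounding.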

If $L_{xy} = 0$, problem (1) reduces to a separable optimization problem, since the variables x and y are then independent. Under this assumption, the optimal step length given by Proposition 2.3 is $t^{⋆} = \frac{2}{L + μ}$, which is the well-known optimal step length for the optimization problem $min_{x \in R^{n}} f (x),$ where $f \in F_{μ, L}$; see [Citation24, Theorem 2.1.15]. Moreover, the convergence rate corresponding to $t^{⋆}$ is $α^{⋆} = (\frac{L - μ}{L + μ})^{2}$. By some algebra, one can show that under the assumptions of Proposition 2.3, Algorithm 1 has a complexity of $O ((\frac{L}{μ} + \frac{L_{xy}^{2}}{μ^{2}}) \ln (\frac{1}{ϵ}))$. Note that the lower iteration complexity bound for first-order methods with $L = max {L_{x}, L_{y}}$ and $μ = min {μ_{x}, μ_{y}}$ is $Ω (\sqrt{\frac{L}{μ} + \frac{L_{xy}^{2}}{μ^{2}}} \ln (\frac{1}{ϵ}))$; see [Citation34].

As mentioned earlier, we calculated the convergence rate by using problem (11). The next proposition states that the bound (8) is tight for some class of bilinear saddle point problems.

Proposition 2.4

Let $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$. Suppose that $L_{x} = L_{y}$ and $min {μ_{x}, μ_{y}} > 0$. If $t \in (0, \frac{2 μ}{μL + L_{xy}^{2}})$, then convergence rate (8) is exact for one iteration.

Proof.

To establish the proposition, it suffices to introduce a problem for which Algorithm 1 generates $(x^{2}, y^{2})$ with respect to the initial point $(x^{1}, y^{1})$ such that $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} = α (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}),$ where α is the convergence rate factor given in Theorem 2.2. Consider problem (11). Due to the symmetry of Algorithm 1 and the class of problems, we may assume $μ_{x} \geq μ_{y}$. Moreover, without loss of generality, we can take $L_{xy} = 1$; see our discussion in the proof of Theorem 2.2. Suppose $L = L_{x}$, $μ = μ_{y}$ and $β = \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}}$. One can verify that Algorithm 1 with the initial point $\begin{aligned} x_{1}^{1} = 0, x_{2}^{1} = \sqrt{\frac{2 - t (L + μ) + β}{2 β}}, \\ y_{1}^{1} = - t \sqrt{\frac{2}{β (2 - t (L + μ) + β)}}, y_{2}^{1} = 0 \end{aligned}$ generates $(x^{2}, y^{2})$ with the desired equality.
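
The "one can verify" step can be reproduced numerically. The sketch below performs one simultaneous descent-ascent step on problem (11) from the initial point given in the proof and compares the resulting ratio with α. It makes the extra simplifying assumption $L_{x} = L_{y} = L$ and $μ_{x} = μ_{y} = μ$ (a symmetric special case chosen for this illustration), with $L_{xy} = 1$. The helper names and parameter values are hypothetical.

```python
import math

def alpha(t, L, mu, Lxy=1.0):
    """Convergence factor alpha(t) with L = max{L_x, L_y}, mu = min{mu_x, mu_y}."""
    root = math.sqrt((L * t + mu * t - 2.0) ** 2 + 4.0 * Lxy**2 * t**2)
    return (1.0 + 0.5 * (L**2 + mu**2 + 2.0 * Lxy**2) * t**2
            - (L + mu) * t + 0.5 * (L - mu) * t * root)

def gda_step(x, y, t, Ax, Ay, Lxy):
    """One simultaneous step on problem (11); Ax, Ay hold the diagonals."""
    # grad_x F = Ax x + B y and grad_y F = B^T x - Ay y, with B = Lxy * [[0, 1], [1, 0]].
    gx = (Ax[0] * x[0] + Lxy * y[1], Ax[1] * x[1] + Lxy * y[0])
    gy = (Lxy * x[1] - Ay[0] * y[0], Lxy * x[0] - Ay[1] * y[1])
    x_new = (x[0] - t * gx[0], x[1] - t * gx[1])   # descent in x
    y_new = (y[0] + t * gy[0], y[1] + t * gy[1])   # ascent in y
    return x_new, y_new

def sq_norm(v):
    return sum(c * c for c in v)

# Illustrative values (assumed); t lies in (0, 2*mu/(mu*L + 1)) = (0, 2/3).
L, mu, t = 2.0, 1.0, 0.3
beta = math.sqrt((L * t + mu * t - 2.0) ** 2 + 4.0 * t**2)
p = 2.0 - t * (L + mu) + beta

# Initial point from the proof of Proposition 2.4; the saddle point is (0, 0).
x1 = (0.0, math.sqrt(p / (2.0 * beta)))
y1 = (-t * math.sqrt(2.0 / (beta * p)), 0.0)

x2, y2 = gda_step(x1, y1, t, (L, mu), (L, mu), 1.0)
ratio = (sq_norm(x2) + sq_norm(y2)) / (sq_norm(x1) + sq_norm(y1))

print("one-step ratio:", ratio)
print("alpha(t)      :", alpha(t, L, mu))
```

In this symmetric case the starting point is a unit vector, and the two printed numbers should coincide up to rounding.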

One may wonder why we stress one iteration in Proposition 2.4. Based on our numerical results, if $L_{xy} > 0$, under the setting of Theorem 2.2, we observed that $‖ x^{k} - x^{⋆} ‖^{2} + ‖ y^{k} - y^{⋆} ‖^{2} < α^{k - 1} (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}), k \geq 3,$ for some $t \in (0, \frac{2 μ}{μL + L_{xy}^{2}})$. The reason may be related to the fact that the vector field ${(\begin{matrix} \nabla_{x} F (x, y) & - \nabla_{y} F (x, y) \end{matrix})}^{T}$ is not conservative.

It may be of interest whether inequality (8) may hold without strong convexity. Without strong convexity, the solution set may not be a singleton. Hence, we investigate the distance to the solution set, that is, whether there exists $0 \leq α < 1$ with $d_{S^{⋆}}^{2} ((x^{2}, y^{2})) \leq α d_{S^{⋆}}^{2} ((x^{1}, y^{1})) .$ The next proposition says that, in general, the answer is negative. Indeed, it gives an example with $min {μ_{x}, μ_{y}} = 0$ and a unique saddle point for which $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \geq α (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}),$ for some $α \geq 1$, no matter how close $(x^{1}, y^{1})$ is to the unique saddle point and which positive step length t is taken. In the next proposition, we may assume without loss of generality $μ_{x} = 0$ and construct an example analogous to that given in Proposition 2.4.

Proposition 2.5

Let $L, L_{xy}, μ_{y}, t, r > 0$ be given. Then there exist $α \geq 1$ and a function $F \in F (L, L, L_{xy}, 0, μ_{y})$ with the unique saddle point $(x^{⋆}, y^{⋆})$ and $(x^{1}, y^{1})$ such that, for $(x^{2}, y^{2})$ generated by Algorithm 1, we have $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \geq α (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}),$ and $‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2} = r^{2}$ .

Proof.

As discussed before, we may assume $L_{xy} = 1$. Consider the bilinear saddle point problem $min_{x \in R^{2}} max_{y \in R^{2}} F (x, y) = \frac{1}{2} x^{T} (\begin{matrix} L & 0 \\ 0 & 0 \end{matrix}) x + x^{T} (\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}) y - \frac{1}{2} y^{T} (\begin{matrix} L & 0 \\ 0 & μ_{y} \end{matrix}) y .$ It is clear that $F \in F (L, L, L_{xy}, 0, μ_{y})$, and the unique saddle point is $(x^{⋆}, y^{⋆}) = (0, 0)$. Suppose that $\begin{aligned} x_{1}^{1} = 0, x_{2}^{1} = r \sqrt{\frac{2 - tL + β}{2 β}}, \\ y_{1}^{1} = - rt \sqrt{\frac{2}{β (2 - tL + β)}}, y_{2}^{1} = 0, \end{aligned}$ where $β = \sqrt{(Lt - 2)^{2} + 4 t^{2}}$. One can verify that Algorithm 1 generates $(x^{2}, y^{2})$ with $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \geq α (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) = α r^{2},$ where $α = 1 + \frac{1}{2} (L^{2} + 2) t^{2} - Lt + \frac{1}{2} Lt \sqrt{(Lt - 2)^{2} + 4 t^{2}}$. By Proposition 2.3, one can infer that $α \geq 1$.
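
As a complementary check (a short direct computation, not the route taken in the proof), the bound $α \geq 1$ can also be read off from the expression for α itself. Under the normalization $L_{xy} = 1$ used above, $\sqrt{(Lt - 2)^{2} + 4 t^{2}} \geq | Lt - 2 | \geq 2 - Lt$, and therefore $\begin{aligned} α & = 1 + \frac{1}{2} (L^{2} + 2) t^{2} - Lt + \frac{1}{2} Lt \sqrt{(Lt - 2)^{2} + 4 t^{2}} \\ & \geq 1 + \frac{1}{2} (L^{2} + 2) t^{2} - Lt + \frac{1}{2} Lt (2 - Lt) = 1 + t^{2} . \end{aligned}$ In particular, $α > 1$ for every positive step length t in this example.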

Note that r can take any positive value in Proposition 2.5. By Proposition 2.4, one can infer that the convergence rate factor for bilinear saddle point problems may not be improved for one iteration, since the given example is a bilinear saddle point problem. Furthermore, the given convergence rate factor is tight when $L_{x} = L_{y}$. As discussed in [Citation28], the function $H (x, y) = F (\sqrt[4]{\frac{L_{y}}{L_{x}}} x, \sqrt[4]{\frac{L_{x}}{L_{y}}} y)$ shares the same smoothness constants with respect to x and y, that is, $\nabla_{x} H (\cdot, y)$ and $\nabla_{y} H (x, \cdot)$ are Lipschitz continuous with the same modulus $\sqrt{L_{x} L_{y}}$. However, gradient methods are not invariant under scaling; see [Citation8, Chapter 9]. Hence, we may lose generality by assuming this condition.

Based on our numerical results and analysis of problem (11), we conjecture the following (exact) convergence rate of Algorithm 1 in terms of $L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y}$. Due to the symmetry of Algorithm 1, we may assume that $L_{x} \geq L_{y}$. Moreover, Proposition 2.4 implies that bound (8) is tight when $μ_{y} \leq μ_{x}$. Hence, we need only consider $μ_{y} > μ_{x}$.

Conjecture 2.6

Let $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$ . Suppose that $μ_{y} > μ_{x} > 0$ , $max {L_{x}, L_{y}} = L_{x}$ and $\begin{aligned} c = \frac{1}{2} (L_{y}^{2} + μ_{x}^{2}) t - (L_{y} + μ_{x}) + \frac{1}{2} (L_{y} - μ_{x}) \sqrt{(L_{y} t + μ_{x} t - 2)^{2} + 4 L_{xy}^{2} t^{2}}, \\ \bar{μ} = \frac{c + 2 L_{x} - L_{x}^{2} t + L_{x} L_{xy}^{2} t^{2} - (c + L_{x} (2 - L_{x} t)) \sqrt{1 + t (c + t L_{xy}^{2})}}{t L_{xy} (c + t L_{xy}^{2} + L_{x} (2 - L_{x} t))}, \\ α (μ, L, L_{xy}, t) = 1 + \frac{1}{2} (L^{2} + μ^{2} + 2 L_{xy}^{2}) t^{2} - (L + μ) t \\ + \frac{1}{2} (L - μ) t \sqrt{(Lt + μt - 2)^{2} + 4 L_{xy}^{2} t^{2}} . \end{aligned}$ Then, one of the following scenarios holds.

(a) Assume that $μ_{x} μ_{y} (L_{x} - L_{y}) \geq L_{xy}^{2} (μ_{y} - μ_{x})$ and $t \in (0, \frac{2 μ_{y}}{L_{x} μ_{y} + L_{xy}^{2}})$.
1. If $μ_{y} \leq \bar{μ}$, then $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \leq α (μ_{y}, L_{x}, L_{xy}, t) (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) .$
2. If $μ_{y} \geq \bar{μ}$, then $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \leq α (μ_{x}, L_{y}, L_{xy}, t) (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) .$
(b) Assume that $μ_{x} μ_{y} (L_{x} - L_{y}) \leq L_{xy}^{2} (μ_{y} - μ_{x})$ and $t \in (0, \frac{2 μ_{x}}{L_{y} μ_{x} + L_{xy}^{2}})$.
1. If $μ_{y} \leq \bar{μ}$, then $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \leq α (μ_{y}, L_{x}, L_{xy}, t) (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) .$
2. If $μ_{y} \geq \bar{μ}$, then $‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2} \leq α (μ_{x}, L_{y}, L_{xy}, t) (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) .$

Although we have extensive numerical evidence supporting Conjecture 2.6, we have been unable to prove either part (a) or part (b). To be more precise, we have verified numerically that the optimal values of problems (7) and (11) correspond to the expressions in Conjecture 2.6 for many different numerical values of the parameters $L_{x}$, $L_{y}$, $L_{xy}$, $μ_{x}$, and $μ_{y}$, but we could not derive the analytical expressions for the dual multipliers of SDP problem (7) that are needed to prove the conjecture.

2.1. Numerical illustration

In this section we provide randomly generated examples to compare the optimal step length (12) given in this paper with the known step length $t = μ / (4 L^{2})$ for the bilinear problem $min_{x \in R^{5}} max_{y \in R^{4}} \frac{1}{2} x^{T} A_{x} x + x^{T} A_{xy} y - \frac{1}{2} y^{T} A_{y} y,$ where $A_{x}$ and $A_{y}$ are symmetric positive definite matrices. Moreover, the instances are constructed such that the spectra of $A_{x}$ and $A_{y}$ are contained in the interval $[0.5, 5]$. For this class of instances, one has $L = max {λ_{max} (A_{x}), λ_{max} (A_{y})} \in [0.5, 5]$ and $μ = min {λ_{min} (A_{x}), λ_{min} (A_{y})} \in [0.5, L]$. The matrix $A_{xy} \in R^{5 \times 4}$ has entries chosen uniformly at random from $[0, 1]$, and subsequently we set $L_{xy} = ‖ A_{xy} ‖_{2}$. By construction, the solution (saddle point) is $(x^{⋆}, y^{⋆}) = (0, 0)$. The starting points $x^{1}$ and $y^{1}$ are randomly drawn unit vectors so that the initial condition $‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2} = 2$ is satisfied.

In Figure 1 we show average values (over 100 randomly generated instances) for the convergence indicator $‖ x^{k} - x^{⋆} ‖^{2} + ‖ y^{k} - y^{⋆} ‖^{2}$ after k iterations, for the two step lengths.¹ Note that our new step length (12) gives a clear improvement over the known step length $μ / (4 L^{2})$.

Figure 1. Mean values of $‖ x^{k} - x^{⋆} ‖^{2} + ‖ y^{k} - y^{⋆} ‖^{2}$ for 100 randomly generated instances for each iteration k using the two different step lengths t.
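
The experiment can be sketched on a single instance as follows. This is not the authors' code. For brevity it assumes diagonal $A_{x}$ and $A_{y}$ with entries drawn from $[0.5, 5]$ (a simplification of the symmetric positive definite matrices described above), estimates $‖ A_{xy} ‖_{2}$ by power iteration, and uses an arbitrary seed and iteration count. All function names are hypothetical.

```python
import math
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def spectral_norm(A, iters=500):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        nrm = math.sqrt(sum(c * c for c in w))
        v = [c / nrm for c in w]
    return math.sqrt(sum(c * c for c in matvec(A, v)))

def t_star(L, mu, Lxy):
    """Step length from formula (12)."""
    s = math.sqrt(Lxy**2 + L * mu)
    return 2.0 * ((L + mu) * s + Lxy * (mu - L)) / ((4.0 * Lxy**2 + (L + mu) ** 2) * s)

def random_unit(n, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    nrm = math.sqrt(sum(c * c for c in v))
    return [c / nrm for c in v]

def run_gda(dx, dy, Axy, x, y, t, iters):
    """Run simultaneous descent-ascent; return the squared distance to (0, 0)."""
    Ayx = transpose(Axy)
    for _ in range(iters):
        By, Btx = matvec(Axy, y), matvec(Ayx, x)
        gx = [dx[i] * x[i] + By[i] for i in range(len(x))]    # grad_x F
        gy = [Btx[j] - dy[j] * y[j] for j in range(len(y))]   # grad_y F
        x = [x[i] - t * gx[i] for i in range(len(x))]
        y = [y[j] + t * gy[j] for j in range(len(y))]
    return sum(c * c for c in x) + sum(c * c for c in y)

rng = random.Random(0)                                   # arbitrary seed
dx = [rng.uniform(0.5, 5.0) for _ in range(5)]           # diagonal of A_x (assumed diagonal)
dy = [rng.uniform(0.5, 5.0) for _ in range(4)]           # diagonal of A_y (assumed diagonal)
Axy = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(5)]
L, mu = max(dx + dy), min(dx + dy)
Lxy = spectral_norm(Axy)
x1, y1 = random_unit(5, rng), random_unit(4, rng)        # initial squared distance is 2

k = 50
err_new = run_gda(dx, dy, Axy, x1, y1, t_star(L, mu, Lxy), k)
err_old = run_gda(dx, dy, Axy, x1, y1, mu / (4.0 * L**2), k)
print("step (12):     ", err_new)
print("step mu/(4L^2):", err_old)
```

Averaging such runs over many random instances would give curves of the kind reported in Figure 1; a single run only indicates the trend.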

3. Linear convergence without strong convexity

In this section, we study the linear convergence of Algorithm 1 without assuming strong convexity. Indeed, we suppose that $F \in F (L_{x}, L_{y}, L_{xy}, 0, 0)$ and we propose some necessary and sufficient conditions for linear convergence. This subject has received some attention in recent years and some sufficient conditions have been proposed in [Citation11,Citation33] under which Algorithm 1 enjoys a local linear convergence rate or is linearly convergent for bilinear saddle point problems. This topic has been investigated extensively in the context of optimization. The interested reader can refer to [Citation1,Citation7,Citation19,Citation23] and references therein. In this study, we extend the quadratic gradient growth property introduced in [Citation19] to saddle point problems.

Recall that we denote the non-empty solution set of problem (1) by $S^{⋆}$. As we do not assume strong convexity (concavity), $S^{⋆}$ may not be a singleton. Note that $S^{⋆}$ is a closed convex set under our assumptions. Recall that $Π_{S^{⋆}} ((x, y))$ denotes the projection of $(x, y)$ onto $S^{⋆}$.

Definition 3.1

Let $μ_{F} > 0$. A function F has a quadratic gradient growth if for any $x \in R^{n}$ and $y \in R^{m}$, $⟨ \nabla_{x} F (x, y), x - x^{⋆} ⟩ - ⟨ \nabla_{y} F (x, y), y - y^{⋆} ⟩ \geq μ_{F} d_{S^{⋆}}^{2} ((x, y)),$ (14) where $(x^{⋆}, y^{⋆}) = Π_{S^{⋆}} ((x, y))$.

Note that if we set $y = y^{⋆}$ in (14), we have $⟨ \nabla_{x} F (x, y^{⋆}), x - x^{⋆} ⟩ \geq μ_{F} ‖ x - x^{⋆} ‖^{2} .$ Hence, $L_{x}$-smoothness implies that $μ_{F} \leq L_{x}$. Consequently, due to the symmetry, we have $μ_{F} \leq min {L_{x}, L_{y}}$. The next proposition states that the quadratic gradient growth condition is weaker than strong convexity–strong concavity. Indeed, strong convexity–strong concavity implies the quadratic gradient growth property.

Proposition 3.2

Let $F \in F (L_{x}, L_{y}, L_{xy}, μ_{x}, μ_{y})$ . If $min {μ_{x}, μ_{y}} > 0$ , then F has a quadratic gradient growth with $μ_{F} = min {μ_{x}, μ_{y}}$ .

Proof.

Under the assumptions, problem (1) has a unique solution $(x^{⋆}, y^{⋆})$ and $\nabla_{x} F (x^{⋆}, y^{⋆}) = 0$ and $\nabla_{y} F (x^{⋆}, y^{⋆}) = 0$. Let $μ = min {μ_{x}, μ_{y}}$ and $L = max {L_{x}, L_{y}}$. Suppose that $(x, y) \in R^{n} \times R^{m}$. By Theorem 1.1, we have

$\begin{aligned} 0 & \leq (F (x^{⋆}, y) - F (x, y) + ⟨ \nabla_{x} F (x, y), x - x^{⋆} ⟩ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ \nabla_{x} F (x^{⋆}, y) - \nabla_{x} F (x, y) ‖}^{2} \\ + μ {‖ x - x^{⋆} ‖}^{2} - \frac{2 μ}{L} ⟨ \nabla_{x} F (x, y) - \nabla_{x} F (x^{⋆}, y), x - x^{⋆} ⟩)) + (F (x, y^{⋆}) - F (x^{⋆}, y^{⋆}) \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ \nabla_{x} F (x, y^{⋆}) ‖}^{2} + μ {‖ x - x^{⋆} ‖}^{2} - \frac{2 μ}{L} ⟨ \nabla_{x} F (x, y^{⋆}), x - x^{⋆} ⟩)) + (F (x, y) \\ - F (x, y^{⋆}) - ⟨ \nabla_{y} F (x, y), y - y^{⋆} ⟩ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ \nabla_{y} F (x, y^{⋆}) - \nabla_{y} F (x, y) ‖}^{2} + μ {‖ y - y^{⋆} ‖}^{2} \\ - \frac{2 μ}{L} ⟨ \nabla_{y} F (x, y^{⋆}) - \nabla_{y} F (x, y), y - y^{⋆} ⟩)) + (F (x^{⋆}, y^{⋆}) - F (x^{⋆}, y) - \frac{L}{2 (L - μ)} \\ \times (\frac{1}{L} {‖ \nabla_{y} F (x^{⋆}, y) ‖}^{2} + μ {‖ y - y^{⋆} ‖}^{2} - \frac{2 μ}{L} ⟨ \nabla_{y} F (x^{⋆}, y), y^{⋆} - y ⟩)) \\ = \frac{- μ^{2}}{L - μ} {‖ (x - x^{⋆}) - \frac{1}{2 μ} (\nabla_{x} F (x, y) + \nabla_{x} F (x, y^{⋆}) - \nabla_{x} F (x^{⋆}, y)) ‖}^{2} \\ - \frac{1}{4 (L - μ)} {‖ \nabla_{x} F (x, y) - \nabla_{x} F (x, y^{⋆}) - \nabla_{x} F (x^{⋆}, y) ‖}^{2} \\ - \frac{μ^{2}}{L - μ} {‖ (y - y^{⋆}) + \frac{1}{2 μ} (\nabla_{y} F (x, y) - \nabla_{y} F (x, y^{⋆}) + \nabla_{y} F (x^{⋆}, y)) ‖}^{2} \\ - \frac{1}{4 (L - μ)} {‖ \nabla_{y} F (x, y) - \nabla_{y} F (x, y^{⋆}) - \nabla_{y} F (x^{⋆}, y) ‖}^{2} \\ - μ ({‖ x - x^{⋆} ‖}^{2} + {‖ y - y^{⋆} ‖}^{2}) + ⟨ \nabla_{x} F (x, y), x - x^{⋆} ⟩ - ⟨ \nabla_{y} F (x, y), y - y^{⋆} ⟩ . \end{aligned}$ Hence, $μ ({‖ x - x^{⋆} ‖}^{2} + {‖ y - y^{⋆} ‖}^{2}) \leq ⟨ \nabla_{x} F (x, y), x - x^{⋆} ⟩ - ⟨ \nabla_{y} F (x, y), y - y^{⋆} ⟩,$ and the proof is complete.

Note that the converse of Proposition 3.2 does not necessarily hold. Consider the following saddle point problem $min_{x \in R} max_{y \in R} F (x, y) := f (x + y) - 2 y^{2},$ (15) where $f (s) = \begin{cases} 0 & | s | \leq 1 \\ (s - 1)^{2} & s > 1 \\ (s + 1)^{2} & s < - 1. \end{cases}$ It is seen that F is not strongly convex-strongly concave and the solution set of problem (15) is ${(x, 0) : | x | \leq 1}$. By doing some algebra, one can check that F has a quadratic gradient growth with $μ_{F} = 1$ while it is not strongly convex with respect to the first component. For the case that $F (\cdot, y)$ is not strongly convex and $F (x, \cdot)$ is not strongly concave, one may consider the uncoupled problem $min_{x \in R} max_{y \in R} f (x) - f (y)$.
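
The "some algebra" for example (15) can be complemented by a numerical spot check. The sketch below evaluates the gap between the two sides of (14) with $μ_{F} = 1$ on a grid, using the projection onto ${(x, 0) : | x | \leq 1}$. The grid, the tolerance and the function names are assumptions made for this illustration, and a grid check is supporting evidence rather than a proof.

```python
def f_prime(s):
    """Derivative of the piecewise-quadratic f from example (15)."""
    if s > 1.0:
        return 2.0 * (s - 1.0)
    if s < -1.0:
        return 2.0 * (s + 1.0)
    return 0.0

def growth_gap(x, y, mu_F=1.0):
    """Left-hand side minus right-hand side of (14) for F(x, y) = f(x + y) - 2 y^2."""
    gx = f_prime(x + y)                  # partial derivative of F in x
    gy = f_prime(x + y) - 4.0 * y        # partial derivative of F in y
    x_star = max(-1.0, min(1.0, x))      # projection onto S* = {(x, 0): |x| <= 1}
    y_star = 0.0
    dist2 = (x - x_star) ** 2 + (y - y_star) ** 2
    return gx * (x - x_star) - gy * (y - y_star) - mu_F * dist2

# Grid on [-5, 5] x [-5, 5] with spacing 0.1 (an arbitrary choice).
points = [(-5.0 + 0.1 * i, -5.0 + 0.1 * j) for i in range(101) for j in range(101)]
worst = min(growth_gap(x, y) for x, y in points)
print("smallest gap on the grid:", worst)
```

A non-negative smallest gap (up to rounding) is consistent with quadratic gradient growth with $μ_{F} = 1$, while $F (\cdot, 0) = f$ is flat on $[- 1, 1]$ and hence not strongly convex.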

In what follows, by using performance estimation, we establish that Algorithm 1 enjoys linear convergence when $F \in F (L_{x}, L_{y}, L_{xy}, 0, 0)$ has a quadratic gradient growth. Without loss of generality, we may assume that $(0, 0) = Π_{S^{⋆}} ((x^{1}, y^{1}))$. To establish the linear convergence, it suffices to show that $d_{S^{⋆}}^{2} ((x^{2}, y^{2})) \leq ‖ x^{2} ‖^{2} + ‖ y^{2} ‖^{2} \leq α d_{S^{⋆}}^{2} ((x^{1}, y^{1})),$ for some $α \in [0, 1)$. Similarly to Section 2, we formulate the following optimization problem $\begin{aligned} max & \frac{{‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2}}{‖ x^{1} ‖^{2} + ‖ y^{1} ‖^{2}} \\ s . t . & {(x^{1}; G_{x}^{1, k}; F^{1, k}), (x^{1} - t G_{x}^{1, 1}; G_{x}^{2, k}; F^{2, k}), (0; G_{x}^{⋆, k}; F^{⋆, k})} satisfy (3) for \\ k \in {1, 2, ⋆} w . r . t . μ_{x} = 0, L_{x} \\ {(y^{1}; G_{y}^{k, 1}; F^{k, 1}), (y^{1} + t G_{y}^{1, 1}; G_{y}^{k, 2}; F^{k, 2}), (0; G_{y}^{k, ⋆}; F^{k, ⋆})} satisfy (4) for \\ k \in {1, 2, ⋆} w . r . t . μ_{y} = 0, L_{y} \\ ‖ G_{x}^{k, i} - G_{x}^{k, j} ‖ \leq L_{xy} ‖ y^{i} - y^{j} ‖, i, j, k \in {1, 2, ⋆} \\ ‖ G_{y}^{i, k} - G_{y}^{j, k} ‖ \leq L_{xy} ‖ x^{i} - x^{j} ‖, i, j, k \in {1, 2, ⋆} \\ μ_{F} (‖ x^{1} ‖^{2} + ‖ y^{1} ‖^{2}) \leq ⟨ G_{x}^{1, 1}, x^{1} ⟩ - ⟨ G_{y}^{1, 1}, y^{1} ⟩, \\ G_{x}^{⋆, ⋆} = 0, G_{y}^{⋆, ⋆} = 0. \end{aligned}$ (16) Note that in the formulation (16), we only use a subset of constraints for the performance estimation. In the next theorem, we prove the linear convergence of Algorithm 1 when F has a quadratic gradient growth.

Theorem 3.3

Let $F \in F (L_{x}, L_{y}, L_{xy}, 0, 0)$ and $L = max {L_{x}, L_{y}}$. Assume that F has a quadratic gradient growth with $μ_{F} > 0$. If $t \in (0, \frac{2 μ_{F}}{L μ_{F} + 2 L_{xy} \sqrt{μ_{F} (L - μ_{F})} + L_{xy}^{2}})$, then Algorithm 1 generates $(x^{2}, y^{2})$ such that $d_{S^{⋆}}^{2} ((x^{2}, y^{2})) \leq α d_{S^{⋆}}^{2} ((x^{1}, y^{1})),$ (17) where $α = t (2 t L_{xy} \sqrt{μ_{F} (L - μ_{F})} + μ_{F} (Lt - 2) + t L_{xy}^{2}) + 1.$

Proof.

The argument is similar to that of Theorem 2.2. It is seen that for any step length t in the given interval, $α \in [0, 1)$ . We may assume without loss of generality $L_{xy} = 1$ . By the assumptions, $F (\cdot, y) \in F_{0, L} (R^{n})$ and $F (x, \cdot) \in F_{0, L} (R^{m})$ for any fixed x, y. Suppose that

$\begin{aligned} \bar{α} = t (2 t \sqrt{μ_{F} (L - μ_{F})} + μ_{F} (Lt - 2) + t) + 1, β = t^{2} (μ_{F} \sqrt{L - μ_{F}} + \sqrt{μ_{F}}), \\ γ_{1} = t^{2} (\frac{μ_{F}}{\sqrt{μ_{F} (L - μ_{F})}} + μ_{F}), γ_{2} = \frac{t^{2} (μ_{F} (L - μ_{F}) + \sqrt{μ_{F} (L - μ_{F})})}{μ_{F}}, \\ γ_{3} = - \frac{t^{2} (μ_{F} (L + μ_{F}) + \sqrt{μ_{F} (L - μ_{F})})}{μ_{F}} + \frac{β}{\sqrt{L - μ_{F}}} + 2 t, γ_{4} = \frac{1}{2} t^{2} (\sqrt{μ_{F} (L - μ_{F})} + 1) . \end{aligned}$ One may readily verify that $γ_{1}, γ_{2}, γ_{3}, γ_{4} \geq 0$ . By doing some algebra, one can show that $\begin{aligned} {‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2} - \bar{α} (‖ x^{1} ‖^{2} + ‖ y^{1} ‖^{2}) + γ_{1} (F^{1, 1} - F^{⋆, 1} - ⟨ G_{x}^{⋆, 1}, x^{1} ⟩ \\ - \frac{1}{2 L} {‖ G_{x}^{1, 1} - G_{x}^{⋆, 1} ‖}^{2}) + γ_{2} (F^{⋆, 1} - F^{1, 1} + ⟨ G_{x}^{1, 1}, x^{1} ⟩ - \frac{1}{2 L} {‖ G_{x}^{⋆, 1} - G_{x}^{1, 1} ‖}^{2}) + γ_{2} (F^{1, ⋆} \\ - F^{⋆, ⋆} - \frac{1}{2 L} {‖ G_{x}^{1, ⋆} ‖}^{2}) + γ_{1} (F^{⋆, ⋆} - F^{1, ⋆} + ⟨ G_{x}^{1, ⋆}, x^{1} ⟩ - \frac{1}{2 L} {‖ G_{x}^{1, ⋆} ‖}^{2}) + γ_{1} (F^{1, ⋆} - F^{1, 1} \\ + ⟨ G_{y}^{1, ⋆}, y^{1} ⟩ - \frac{1}{2 L} {‖ G_{y}^{1, 1} - G_{y}^{1, ⋆} ‖}^{2}) + γ_{2} (F^{1, 1} - F^{1, ⋆} - ⟨ G_{y}^{1, 1}, y^{1} ⟩ - \frac{1}{2 L} {‖ G_{y}^{1, ⋆} - G_{y}^{1, 1} ‖}^{2}) \\ + γ_{2} (- F^{⋆, 1} + F^{⋆, ⋆} - \frac{1}{2 L} {‖ G_{y}^{⋆, 1} ‖}^{2}) + γ_{1} (- F^{⋆, ⋆} + F^{⋆, 1} + ⟨ G_{y}^{⋆, 1}, - y^{1} ⟩ - \frac{1}{2 L} {‖ G_{y}^{⋆, 1} ‖}^{2}) \\ + γ_{3} (⟨ G_{x}^{1, 1}, x^{1} ⟩ - ⟨ G_{y}^{1, 1}, y^{1} ⟩ - μ_{F} ({‖ x^{1} ‖}^{2} + {‖ y^{1} ‖}^{2})) + γ_{4} (‖ x^{1} ‖^{2} - {‖ G_{y}^{1, 1} - G_{y}^{⋆, 1} ‖}^{2}) \\ + γ_{4} (‖ x^{1} ‖^{2} - {‖ G_{y}^{1, ⋆} ‖}^{2}) + γ_{4} (‖ y^{1} ‖^{2} - {‖ G_{x}^{1, 1} - G_{x}^{1, ⋆} ‖}^{2}) + γ_{4} (‖ y^{1} ‖^{2} - {‖ G_{x}^{⋆, 1} ‖}^{2}) \\ = - ζ_{1} {‖ x^{1} + ζ_{2} G_{x}^{1, 1} - ζ_{3} (G_{x}^{1, ⋆} - G_{x}^{⋆, 1}) ‖}^{2} - ζ_{4} {‖ G_{x}^{1, 1} - G_{x}^{1, ⋆} - G_{x}^{⋆, 1} ‖}^{2} \\ - ζ_{1} {‖ y^{1} - 
ζ_{2} G_{y}^{1, 1} - ζ_{3} (G_{y}^{1, ⋆} - G_{y}^{⋆, 1}) ‖}^{2} - ζ_{4} {‖ G_{y}^{1, 1} - G_{y}^{⋆, 1} - G_{y}^{1, ⋆} ‖}^{2} \leq 0, \end{aligned}$ where the multipliers $ζ_{1}, ζ_{2}, ζ_{3}, ζ_{4}$ are given as follows $\begin{aligned} ζ_{1} = μ_{F} (\frac{β}{\sqrt{L - μ_{F}}} - μ_{F} t^{2}), ζ_{2} = \frac{β}{2 μ_{F} \sqrt{μ_{F}} t^{2}} - \frac{1}{μ_{F}}, ζ_{3} = \frac{t^{2} (\sqrt{μ_{F} (L - μ_{F})} + 1)}{2 μ_{F} t^{2}}, \\ ζ_{4} = \frac{1}{4} (\frac{2 t^{2} (μ_{F} (L - μ_{F}) + 1)}{\sqrt{μ_{F} (L - μ_{F})}} - \frac{{(2 μ_{F} t^{2} (μ_{F} - L) + β \sqrt{L - μ_{F}})}^{2}}{μ_{F} (L - μ_{F}) t^{2} \sqrt{μ_{F} (L - μ_{F})}}) . \end{aligned}$ One can show by some algebra that $ζ_{1}, ζ_{4} \geq 0$ . Hence, for any feasible solution of problem (16), we have $\frac{{‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2}}{{‖ x^{1} ‖}^{2} + {‖ y^{1} ‖}^{2}} \leq \bar{α},$ and the proof is complete.
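As a rough numerical illustration of the bound just proved, the following sketch evaluates $\bar{α}$ from the displayed formula, under the normalisation $L_{xy} = 1$ used in the proof. The function names `alpha_bar` and `max_step` and the sample values are ours, not the paper's. The threshold returned by `max_step` is obtained only by solving $\bar{α} = 1$ for $t > 0$ in that formula, so it should be read as an assumption and may differ from the interval stated in Theorem 3.3.

```python
import math


def alpha_bar(t, mu_f, L):
    """Rate from the displayed formula, assuming L_xy = 1 and 0 < mu_f <= L."""
    root = math.sqrt(mu_f * (L - mu_f))
    return t * (2 * t * root + mu_f * (L * t - 2) + t) + 1


def max_step(mu_f, L):
    """Step length at which alpha_bar equals 1 (solve alpha_bar(t) = 1, t > 0)."""
    root = math.sqrt(mu_f * (L - mu_f))
    return 2 * mu_f / (2 * root + mu_f * L + 1)


if __name__ == "__main__":
    mu_f, L = 0.5, 2.0  # illustrative values only
    t_max = max_step(mu_f, L)
    for t in (0.25 * t_max, 0.5 * t_max, 0.99 * t_max):
        print(f"t = {t:.4f}, alpha_bar = {alpha_bar(t, mu_f, L):.6f}")
```

For step lengths strictly between 0 and `max_step(mu_f, L)` the computed value is below one, which is the regime in which the bound yields linear convergence.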

In Theorem 3.3 we obtained linear convergence by using quadratic gradient growth. The next theorem states that the quadratic gradient growth property is also a necessary condition for linear convergence.

Theorem 3.4

If Algorithm 1 is linearly convergent for any initial point, then F has a quadratic gradient growth for some $μ_{F} > 0$ .

Proof.

Let $(x^{1}, y^{1}) \in R^{n} \times R^{m}$ and let $(x^{2}, y^{2})$ be generated by Algorithm 1. Suppose that $(x^{⋆}, y^{⋆}) = Π_{S^{⋆}} ((x^{2}, y^{2}))$ . As Algorithm 1 is linearly convergent, there exists $α \in [0, 1)$ with $d_{S^{⋆}}^{2} ((x^{2}, y^{2})) \leq α d_{S^{⋆}}^{2} ((x^{1}, y^{1})) \leq α (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) .$ (18) Note that $d_{S^{⋆}}^{2} ((x^{2}, y^{2})) = ‖ x^{2} - x^{⋆} ‖^{2} + ‖ y^{2} - y^{⋆} ‖^{2}$ by the choice of $(x^{⋆}, y^{⋆})$ . By setting $x^{2} = x^{1} - t \nabla_{x} F (x^{1}, y^{1})$ and $y^{2} = y^{1} + t \nabla_{y} F (x^{1}, y^{1})$ in inequality (18), expanding the squares and dropping the non-negative term $t^{2} (‖ \nabla_{x} F (x^{1}, y^{1}) ‖^{2} + ‖ \nabla_{y} F (x^{1}, y^{1}) ‖^{2})$ , we get $\frac{1 - α}{2 t} (‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}) \leq ⟨ \nabla_{x} F (x^{1}, y^{1}), x^{1} - x^{⋆} ⟩ - ⟨ \nabla_{y} F (x^{1}, y^{1}), y^{1} - y^{⋆} ⟩,$ which, together with $d_{S^{⋆}}^{2} ((x^{1}, y^{1})) \leq ‖ x^{1} - x^{⋆} ‖^{2} + ‖ y^{1} - y^{⋆} ‖^{2}$ , implies that $μ_{F} d_{S^{⋆}}^{2} ((x^{1}, y^{1})) \leq ⟨ \nabla_{x} F (x^{1}, y^{1}), x^{1} - x^{⋆} ⟩ - ⟨ \nabla_{y} F (x^{1}, y^{1}), y^{1} - y^{⋆} ⟩,$ for $μ_{F} = \frac{1 - α}{2 t}$ and the proof is complete.
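The argument can be checked numerically on a toy problem. The sketch below uses the scalar function $F (x, y) = \frac{a}{2} x^{2} + b x y - \frac{a}{2} y^{2}$ , which is our own choice and not an example from the paper; its unique saddle point is the origin. For this particular function one gradient descent–ascent step is a scaled rotation, so the one-step contraction factor is exactly $α = (1 - t a)^{2} + t^{2} b^{2}$ . The code then checks at random points that $μ_{F} = \frac{1 - α}{2 t}$ satisfies the quadratic gradient growth inequality. All names and parameter values are illustrative assumptions.

```python
import random


def grad(x, y, a, b):
    """Partial derivatives of F(x, y) = a/2 * x**2 + b*x*y - a/2 * y**2."""
    return a * x + b * y, b * x - a * y


def gda_step(x, y, t, a, b):
    """One step: descent in x, ascent in y, with step length t."""
    gx, gy = grad(x, y, a, b)
    return x - t * gx, y + t * gy


def check_growth(a=1.0, b=2.0, t=0.1, trials=1000, seed=0):
    """Return (alpha, mu_f, ok); ok is True if both inequalities hold at all samples."""
    rng = random.Random(seed)
    alpha = (1 - t * a) ** 2 + (t * b) ** 2  # exact contraction factor for this F
    mu_f = (1 - alpha) / (2 * t)             # constant from the proof of Theorem 3.4
    ok = alpha < 1
    for _ in range(trials):
        x, y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        x2, y2 = gda_step(x, y, t, a, b)
        d1 = x * x + y * y        # squared distance of (x1, y1) to the saddle point
        d2 = x2 * x2 + y2 * y2    # squared distance of (x2, y2) to the saddle point
        gx, gy = grad(x, y, a, b)
        ok = ok and d2 <= alpha * d1 + 1e-9               # inequality (18)
        ok = ok and mu_f * d1 <= gx * x - gy * y + 1e-9   # quadratic gradient growth
    return alpha, mu_f, ok
```

With the default values this gives $α = 0.85$ and $μ_{F} = 0.75$ , which is below the exact growth constant $a = 1$ of this function, as one would expect from a constant derived from a convergence rate.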

4. Concluding remarks

In this study, we provided a new convergence rate for the gradient descent–ascent method for saddle point problems. Furthermore, we gave some necessary and sufficient conditions for linear convergence without strong convexity. We employed the performance estimation method to prove the results. For future work, it would be interesting to consider the case where the variables x and y in the saddle point problem are constrained to lie in given compact convex sets, since many saddle point problems fall in this category. In this case, one could use the performance estimation framework to analyse other methods, e.g. proximal type algorithms.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Dutch Scientific Council (NWO) grant OCENW.GROOT.2019.015, Optimization for and with Machine Learning (OPTIMAL).

Notes on contributors

Moslem Zamani

Moslem Zamani obtained his PhD degree from the University of Avignon and the University of Tehran. He worked as a postdoctoral researcher at Tilburg University. He is currently a researcher at UCLouvain. His main research interests include non-linear optimization and machine learning algorithms.

Hadi Abbaszadehpeivasti

Hadi Abbaszadehpeivasti earned his bachelor's degree in applied mathematics from the University of Zanjan and subsequently obtained master's degrees in industrial engineering from both Sharif University of Technology and Sabancı University. He is currently a PhD student at Tilburg University, focusing on the complexity of first-order methods. His primary research interests include mathematical optimization and machine learning algorithms.

Etienne de Klerk

Etienne de Klerk obtained his PhD degree from the Delft University of Technology in The Netherlands in 1997. From January 1998 to September 2003, he held assistant professorships at the Delft University of Technology, and from September 2003 to September 2005 an associate professorship at the University of Waterloo, Canada, in the Department of Combinatorics & Optimization. In September 2004 he was appointed at Tilburg University, The Netherlands, first as an associate professor, and then as full professor (from June 2009). From August 29th, 2012, until August 31st, 2013, he was also appointed as full professor in the Division of Mathematics of the School of Physical and Mathematical Sciences at the Nanyang Technological University in Singapore. From September 1st, 2015 to August 31st 2019, he also held a part-time full professorship at the Delft University of Technology. Dr. De Klerk's main research interest is mathematical optimization, and, in particular, semidefinite programming.

Notes

1 The 100 random instances and starting points that we generated to produce Figure may be found on GitHub; see: https://github.com/molsemzamani/Bilinear-Minimax.

References

H. Abbaszadehpeivasti, E. de Klerk, and M. Zamani, Conditions for linear convergence of the gradient method for non-convex optimization, Optim. Lett. 17 (2023), pp. 1105–1125.
K.J. Arrow, L. Hurwicz, H. Uzawa, H.B. Chenery, S.M. Johnson, and S. Karlin, Studies in Linear and Non-linear Programming, Vol. 2, Stanford University Press, California, 1958.
W. Azizian, I. Mitliagkas, S. Lacoste-Julien and G. Gidel, A tight and unified analysis of gradient-based methods for a whole spectrum of differentiable games, in International Conference on Artificial Intelligence and Statistics, PMLR, 2020, pp. 2863–2873.
T. Başar and G.J. Olsder, Dynamic Noncooperative Game Theory, SIAM, Philadelphia, PA, 1998.
A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Vol. 28, Princeton University Press, Princeton, 2009.
A. Beznosikov, B. Polyak, E. Gorbunov, D. Kovalev, and A. Gasnikov, Smooth monotone stochastic variational inequalities and saddle point problems – survey, preprint (2022). Available at arXiv:2208.13592.
J. Bolte, T.P. Nguyen, J. Peypouquet, and B.W. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Math. Program. 165 (2017), pp. 471–507.
S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, 2004.
E. De Klerk, F. Glineur, and A.B. Taylor, On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions, Optim. Lett. 11 (2017), pp. 1185–1199.
Y. Drori and M. Teboulle, Performance of first-order methods for smooth convex minimization: A novel approach, Math. Program. 145 (2014), pp. 451–482.
S.S. Du and W. Hu, Linear convergence of the primal-dual gradient method for convex–concave saddle point problems without strong convexity, in The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 196–205.
F. Facchinei and J.S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems, Springer, New York, 2003.
A. Fallah, A. Ozdaglar and S. Pattathil, An optimal multistage stochastic gradient method for minimax problems, in 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, 2020, pp. 3573–3579.
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, Adv. Neural. Inf. Process. Syst. 27 (2014), pp. 1–9.
E.Y. Hamedani and N.S. Aybat, A primal-dual algorithm with line search for general convex–concave saddle point problems, SIAM. J. Optim. 31 (2021), pp. 1299–1329.
R. Jiang and A. Mokhtari, Generalized optimistic methods for convex–concave saddle point problems, preprint (2022). Available at arXiv:2202.09674.
T. Liang and J. Stokes, Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks, in The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 907–915.
T. Lin, C. Jin and M.I. Jordan, Near-optimal algorithms for minimax optimization, in Conference on Learning Theory, PMLR, 2020, pp. 2738–2779.
Z.Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: A general approach, Ann. Oper. Res. 46 (1993), pp. 157–178.
L. Mescheder, S. Nowozin, and A. Geiger, The numerics of GANs, Adv. Neural. Inf. Process. Syst. 30 (2017), pp. 1–11.
A. Mokhtari, A. Ozdaglar and S. Pattathil, A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach, in International Conference on Artificial Intelligence and Statistics, PMLR, 2020, pp. 1497–1507.
A. Mokhtari, A.E. Ozdaglar, and S. Pattathil, Convergence rate of O(1/k) for optimistic gradient and extragradient methods in smooth convex–concave saddle point problems, SIAM. J. Optim. 30 (2020), pp. 3230–3251.
I. Necoara, Y. Nesterov, and F. Glineur, Linear convergence of first order methods for non-strongly convex optimization, Math. Program. 175 (2019), pp. 69–107.
Y. Nesterov, Lectures on Convex Optimization, Vol. 137, Springer, Berlin, 2018.
J. Nie, Z. Yang, and G. Zhou, The saddle point problem of polynomials, Found. Comput. Math. 22 (2022), pp. 1133–1169.
J.W. Simpson-Porco, B.K. Poolla, N. Monshizadeh, and F. Dörfler, Input–output performance of linear–quadratic saddle-point algorithms with application to distributed resource allocation problems, IEEE. Trans. Automat. Contr. 65 (2019), pp. 2032–2045.
A.B. Taylor, J.M. Hendrickx, and F. Glineur, Smooth strongly convex interpolation and exact worst-case performance of first-order methods, Math. Program. 161 (2017), pp. 307–345.
Y. Wang and J. Li, Improved algorithms for convex–concave minimax optimization, Adv. Neural. Inf. Process. Syst. 33 (2020), pp. 4800–4810.
Z. Xu, H. Zhang, Y. Xu, and G. Lan, A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems, Math. Program. 201 (2023), pp. 635–706.
M. Yang, O. Nachum, B. Dai, L. Li, and D. Schuurmans, Off-policy evaluation via the regularized Lagrangian, Adv. Neural. Inf. Process. Syst. 33 (2020), pp. 6551–6561.
M. Zamani, H. Abbaszadehpeivasti, and E. de Klerk, The exact worst-case convergence rate of the alternating direction method of multipliers, preprint (2022). Available at arXiv:2206.09865.
G. Zhang, X. Bao, L. Lessard, and R. Grosse, A unified analysis of first-order methods for smooth games via integral quadratic constraints, J. Mach. Learn. Res. 22 (2021), pp. 1–39.
G. Zhang, Y. Wang, L. Lessard and R.B. Grosse, Near-optimal local convergence of alternating gradient descent–ascent for minimax optimization, in International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 7659–7679.
J. Zhang, M. Hong, and S. Zhang, On lower iteration complexity bounds for the convex concave saddle point problems, Math. Program. 194 (2022), pp. 901–935.

Appendices

Appendix 1.

Non-negativity of multipliers in Theorem 2.2

Recall that, in the proof of Theorem 2.2, $γ_{1}$ is defined by $γ_{1} = \frac{t (t^{2} (2 + L^{2} + Lμ) - t (3 L + μ) + (Lt - 1) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2)}{\sqrt{(L t + μ t - 2)^{2} + 4 t^{2}}} .$ Since t is non-negative, we only need to prove that ${\hat{γ}}_{1} := 2 t^{2} - 3 Lt - μt + (Lt - 1) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + L^{2} t^{2} + Lμ t^{2} + 2$ is non-negative. We show that the following optimization problem is lower bounded by zero, $\begin{aligned} min_{L, t, μ} & {\hat{γ}}_{1} \\ s . t . & L \geq μ, μ \geq 0, t \geq 0, \end{aligned}$ where $L, t, μ$ are decision variables. First we consider the case that $Lt - 1 \leq 0$ . We have the following optimization problem $\begin{aligned} min_{L, t} & (min_{0 \leq μ \leq L} 2 t^{2} + L^{2} t^{2} + Lμ t^{2} - 3 Lt - μt + (Lt - 1) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2) \\ s . t . & Lt \leq 1, L \geq μ, t \geq 0. \end{aligned}$ (A1) The function ${\hat{γ}}_{1}$ is concave in μ, therefore, we just consider $μ = 0$ and $μ = L$ .

First we consider the case that $μ = 0$ . By substituting $μ = 0$ in ${\hat{γ}}_{1}$ we have ${\hat{γ}}_{1} = 2 t^{2} + (Lt - 1) (Lt - 2 + \sqrt{(Lt - 2)^{2} + 4 t^{2}}) .$ We argue that the above function is non-negative on the feasible set of problem (A1). By a conjugate multiplication of $Lt - 2 + \sqrt{(Lt - 2)^{2} + 4 t^{2}}$ one has ${\hat{γ}}_{1} = 2 t^{2} (1 - \frac{2 (1 - Lt)}{(2 - Lt) + \sqrt{(Lt - 2)^{2} + 4 t^{2}}}) .$ Since $Lt \leq 1$ , we have $(2 - Lt) + \sqrt{(Lt - 2)^{2} + 4 t^{2}} \geq 2 (2 - Lt) \geq 2 (1 - Lt) \geq 0$ , and we conclude that $0 \leq \frac{2 (1 - Lt)}{(2 - Lt) + \sqrt{(Lt - 2)^{2} + 4 t^{2}}} \leq 1,$ which proves that ${\hat{γ}}_{1}$ is non-negative.

Now we consider the case that $μ = L$ . By substituting $μ = L$ we have ${\hat{γ}}_{1} = 2 t^{2} + 2 L^{2} t^{2} - 4 Lt + 2 (Lt - 1) \sqrt{(Lt - 1)^{2} + t^{2}} + 2.$ Now we show that $\frac{1}{2} {\hat{γ}}_{1} = t^{2} + (Lt - 1) ((Lt - 1) + \sqrt{(Lt - 1)^{2} + t^{2}})$ is non-negative on the given set. Note that, again by conjugate multiplication, $\frac{1}{2} {\hat{γ}}_{1} = t^{2} (1 - \frac{(1 - Lt)}{(1 - Lt) + \sqrt{(Lt - 1)^{2} + t^{2}}}),$ which is always non-negative due to the non-negativity of $(1 - Lt)$ .

Now we consider the case that $Lt - 1 > 0$ . We have $\begin{aligned} {\hat{γ}}_{1} & = 2 t^{2} + L^{2} t^{2} + Lμ t^{2} - 3 Lt - μt + (Lt - 1) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2 \\ & \geq 2 t^{2} + L^{2} t^{2} + Lμ t^{2} - 3 Lt - μt + (Lt - 1) | Lt + μt - 2 | + 2. \end{aligned}$ Here, we need to consider two sub-cases. Firstly, when $2 - Lt - μt \geq 0$ , we have $2 t^{2} + L^{2} t^{2} + Lμ t^{2} - 3 Lt - μt + (Lt - 1) (2 - Lt - μt) + 2 = 2 t^{2} \geq 0.$ If $Lt + μt - 2 \geq 0$ , we have $\begin{aligned} 2 t^{2} + L^{2} t^{2} + Lμ t^{2} - 3 Lt - μt + (Lt - 1) (Lt + μt - 2) + 2 \\ = 2 t^{2} + 2 (Lt - 1) (Lt + μt - 2) \geq 0, \end{aligned}$ which completes the proof.
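The non-negativity of ${\hat{γ}}_{1}$ can also be sanity-checked by random sampling. The sketch below evaluates the defining expression of ${\hat{γ}}_{1}$ on random triples with $L \geq μ \geq 0$ and $t \geq 0$ and records the smallest value found. The sampling ranges, the function names and the tolerance are our assumptions, and such a check is of course no substitute for the proof above.

```python
import math
import random


def gamma1_hat(L, mu, t):
    """The quantity gamma_1 hat from Appendix 1 (normalisation L_xy = 1)."""
    root = math.sqrt((L * t + mu * t - 2) ** 2 + 4 * t ** 2)
    return (2 * t ** 2 - 3 * L * t - mu * t + (L * t - 1) * root
            + L ** 2 * t ** 2 + L * mu * t ** 2 + 2)


def min_over_samples(n=20000, seed=1):
    """Smallest sampled value of gamma1_hat over L >= mu >= 0 and t >= 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(n):
        mu = rng.uniform(0.0, 5.0)
        L = mu + rng.uniform(0.0, 5.0)  # enforces L >= mu
        t = rng.uniform(0.0, 3.0)
        worst = min(worst, gamma1_hat(L, mu, t))
    return worst


if __name__ == "__main__":
    # Values slightly below zero can only come from floating-point rounding.
    print("smallest sampled value:", min_over_samples())
```

Because the expression involves cancellation between large terms, the comparison with zero should allow a small tolerance (for example $10^{-9}$ for the ranges used here).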

To show that $γ_{2}$ is non-negative we follow the same procedure. Recall the definition of $γ_{2}$ : $γ_{2} = \frac{t (t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2)}{\sqrt{(L t + μ t - 2)^{2} + 4 t^{2}}} .$ We define ${\hat{γ}}_{2}$ as ${\hat{γ}}_{2} = t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2.$ Due to $t \geq 0$ , we only need to show that ${\hat{γ}}_{2}$ is non-negative. To this end, we show that the following optimization problem is lower bounded by zero. $\begin{aligned} min_{L, t, μ} & {\hat{γ}}_{2} = t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2 \\ s . t . & L \geq μ, μ \geq 0, t \geq 0. \end{aligned}$ First we consider the case that $1 - μt \geq 0$ . We have ${\hat{γ}}_{2} \geq t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) | Lt + μt - 2 | + 2.$ We consider two sub-cases. Firstly, $Lt + μt - 2 \geq 0$ : $\begin{aligned} {\hat{γ}}_{2} & = t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2 \\ & \geq t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) (Lt + μt - 2) + 2 = 2 t^{2} \geq 0. \end{aligned}$ Now assume that $Lt + μt - 2 \leq 0$ . $\begin{aligned} {\hat{γ}}_{2} & = t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2 \\ & \geq t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) (2 - Lt - μt) + 2 \\ & = 2 t^{2} + 2 (1 - μt) (2 - Lt - μt) \geq 0. \end{aligned}$ Now we consider the case that $1 - μt \leq 0$ . We have the following optimization problem $\begin{aligned} min_{t, μ} & (min_{μ \leq L \leq \frac{2 μ - t}{μt}} {\hat{γ}}_{2} = t^{2} (2 + μ^{2} + Lμ) - t (3 μ + L) + (1 - μt) \sqrt{(Lt + μt - 2)^{2} + 4 t^{2}} + 2) \\ s . t . & μt \geq 1, μ \geq 0, t \geq 0. \end{aligned}$ Note that ${\hat{γ}}_{2}$ is concave with respect to the variable L. Therefore, we should study the boundaries of L. 
If we set $L = μ$ we have $\begin{aligned} {\hat{γ}}_{2} & = 2 (t^{2} + μ^{2} t^{2} - 2 μt + (1 - μt) \sqrt{(μt - 1)^{2} + t^{2}} + 1) \\ = 2 t^{2} + 2 (μt - 1) ((μt - 1) - \sqrt{(μt - 1)^{2} + t^{2}}) . \end{aligned}$ By conjugate multiplication, we have ${\hat{γ}}_{2} = 2 t^{2} (1 - \frac{(μt - 1)}{μt - 1 + \sqrt{(μt - 1)^{2} + t^{2}}}) \geq 0.$ By $L \leq \frac{2 μ - t}{μt}$ one can see that $L \leq \frac{2}{t}$ . Setting $L = \frac{2}{t}$ : $\begin{aligned} {\hat{γ}}_{2} & = - μt + 2 t^{2} + μ^{2} t^{2} + (1 - μt) \sqrt{4 t^{2} + μ^{2} t^{2}} \\ = 2 t^{2} (1 - \frac{2 (μt - 1)}{μt + \sqrt{μ^{2} t^{2} + 4 t^{2}}}) \geq 0. \end{aligned}$ This completes the proof.
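The two conjugate-multiplication identities used for the boundary values $L = μ$ and $L = \frac{2}{t}$ are easy to get wrong by a sign, so a numerical comparison may be helpful. The sketch below compares the defining expression of ${\hat{γ}}_{2}$ with the two closed forms at random points with $μt \geq 1$ . The function names, sampling ranges and tolerance are illustrative assumptions, not part of the paper.

```python
import math
import random


def gamma2_hat(L, mu, t):
    """The quantity gamma_2 hat from Appendix 1 (normalisation L_xy = 1)."""
    root = math.sqrt((L * t + mu * t - 2) ** 2 + 4 * t ** 2)
    return (t ** 2 * (2 + mu ** 2 + L * mu) - t * (3 * mu + L)
            + (1 - mu * t) * root + 2)


def closed_form_L_eq_mu(mu, t):
    """Closed form of gamma_2 hat at L = mu after conjugate multiplication."""
    u = mu * t - 1
    return 2 * t ** 2 * (1 - u / (u + math.sqrt(u ** 2 + t ** 2)))


def closed_form_L_eq_2_over_t(mu, t):
    """Closed form of gamma_2 hat at L = 2/t after conjugate multiplication."""
    s = mu * t
    return 2 * t ** 2 * (1 - 2 * (s - 1) / (s + math.sqrt(s ** 2 + 4 * t ** 2)))


def check_endpoints(n=5000, seed=2):
    """Return (largest relative mismatch, smallest closed-form value)."""
    rng = random.Random(seed)
    mismatch, smallest = 0.0, float("inf")
    for _ in range(n):
        t = rng.uniform(0.1, 3.0)
        mu = rng.uniform(1.0 / t, 1.0 / t + 5.0)  # the case mu * t >= 1
        pairs = (
            (gamma2_hat(mu, mu, t), closed_form_L_eq_mu(mu, t)),
            (gamma2_hat(2.0 / t, mu, t), closed_form_L_eq_2_over_t(mu, t)),
        )
        for direct, closed in pairs:
            mismatch = max(mismatch, abs(direct - closed) / (1.0 + abs(direct)))
            smallest = min(smallest, closed)
    return mismatch, smallest
```

The closed forms are the numerically preferable way to evaluate the boundary values, since they avoid the cancellation present in the defining expression.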

Appendix 2.

Identity used in the proof of Theorem 2.2

The proof of Theorem 2.2 requires the following identity, which may be verified through direct (symbolic) calculation:

$\begin{aligned} {‖ x^{1} - t G_{x}^{1, 1} ‖}^{2} + {‖ y^{1} + t G_{y}^{1, 1} ‖}^{2} - \bar{α} ({‖ x^{1} ‖}^{2} + {‖ y^{1} ‖}^{2}) + γ_{1} (F^{1, 1} - F^{⋆, 1} - ⟨ G_{x}^{⋆, 1}, x^{1} ⟩ \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{x}^{1, 1} - G_{x}^{⋆, 1} ‖}^{2} + μ ‖ x^{1} ‖^{2} - \frac{2 μ}{L} ⟨ G_{x}^{⋆, 1} - G_{x}^{1, 1}, - x^{1} ⟩)) + γ_{2} (F^{⋆, 1} - F^{1, 1} \\ + ⟨ G_{x}^{1, 1}, x^{1} ⟩ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{x}^{⋆, 1} - G_{x}^{1, 1} ‖}^{2} + μ ‖ x^{1} ‖^{2} - \frac{2 μ}{L} ⟨ G_{x}^{1, 1} - G_{x}^{⋆, 1}, x^{1} ⟩)) + γ_{2} (F^{1, ⋆} \\ - F^{⋆, ⋆} - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{x}^{1, ⋆} ‖}^{2} + μ {‖ x^{1} ‖}^{2} - \frac{2 μ}{L} ⟨ G_{x}^{1, ⋆}, x^{1} ⟩)) + γ_{1} (F^{⋆, ⋆} - F^{1, ⋆} + ⟨ G_{x}^{1, ⋆}, x^{1} ⟩ \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{x}^{1, ⋆} ‖}^{2} + μ {‖ x^{1} ‖}^{2} - \frac{2 μ}{L} ⟨ G_{x}^{1, ⋆}, x^{1} ⟩)) + γ_{1} (F^{1, ⋆} - F^{1, 1} + ⟨ G_{y}^{1, ⋆}, y^{1} ⟩ - \frac{L}{2 (L - μ)} \\ \times (\frac{1}{L} {‖ G_{y}^{1, 1} - G_{y}^{1, ⋆} ‖}^{2} + μ ‖ y^{1} ‖^{2} - \frac{2 μ}{L} ⟨ G_{y}^{1, ⋆} - G_{y}^{1, 1}, y^{1} ⟩)) + γ_{2} (F^{1, 1} - F^{1, ⋆} - ⟨ G_{y}^{1, 1}, y^{1} ⟩ \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{y}^{1, ⋆} - G_{y}^{1, 1} ‖}^{2} + μ {‖ y^{1} ‖}^{2} - \frac{2 μ}{L} ⟨ - G_{y}^{1, 1} + G_{y}^{1, ⋆}, y^{1} ⟩)) + γ_{2} (- F^{⋆, 1} + F^{⋆, ⋆} \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{y}^{⋆, 1} ‖}^{2} + μ {‖ y^{1} ‖}^{2} - \frac{2 μ}{L} ⟨ G_{y}^{⋆, 1}, - y^{1} ⟩)) + γ_{1} (- F^{⋆, ⋆} + F^{⋆, 1} + ⟨ G_{y}^{⋆, 1}, - y^{1} ⟩ \\ - \frac{L}{2 (L - μ)} (\frac{1}{L} {‖ G_{y}^{⋆, 1} ‖}^{2} + μ {‖ y^{1} ‖}^{2} - \frac{2 μ}{L} ⟨ - G_{y}^{⋆, 1}, y^{1} ⟩)) + γ_{3} (‖ x^{1} ‖^{2} - {‖ G_{y}^{1, 1} - G_{y}^{⋆, 1} ‖}^{2}) \\ + γ_{3} (‖ x^{1} ‖^{2} - {‖ G_{y}^{1, ⋆} ‖}^{2}) + γ_{3} ({‖ y^{1} ‖}^{2} - {‖ G_{x}^{1, 1} - G_{x}^{1, ⋆} ‖}^{2}) + γ_{3} (‖ y^{1} ‖^{2} - {‖ G_{x}^{⋆, 1} ‖}^{2}) \\ = - ζ_{1} {‖ x^{1} - ζ_{2} G_{x}^{1, 1} - ζ_{3} (G_{x}^{1, ⋆} - G_{x}^{⋆, 1}) ‖}^{2} - ζ_{4} {‖ G_{x}^{1, 1} - G_{x}^{1, ⋆} - 
G_{x}^{⋆, 1} ‖}^{2} \\ - ζ_{1} {‖ y^{1} + ζ_{2} G_{y}^{1, 1} - ζ_{3} (G_{y}^{1, ⋆} - G_{y}^{⋆, 1}) ‖}^{2} - ζ_{4} {‖ G_{y}^{1, 1} - G_{y}^{⋆, 1} - G_{y}^{1, ⋆} ‖}^{2}, \end{aligned}$ where $ζ_{1}, ζ_{2}, ζ_{3}, ζ_{4}$ are given by $\begin{aligned} ζ_{1} & = \frac{1}{2} t (\frac{(L^{2} + μ^{2}) β}{L - μ} - \frac{2 t^{2} (L - μ)}{β} + (L + μ) (t (L + μ) - 2)), \\ ζ_{2} & = - \frac{(L^{2} t - L - μ^{2} t + μ) β - L^{2} t (Lt + μt - 3) - (L + μ) (μ^{2} t^{2} - 2 μt + 2 t^{2} + 2) + μ^{2} t}{2 t^{2} (L + μ)^{2} (Lμ + 1) - 8 Lμt (L + μ) + 8 Lμ}, \\ ζ_{3} & = - \frac{t (L^{2} + 6 Lμ + μ^{2}) - 2 t^{2} (L + μ) (Lμ + 1) - (L - μ) β - 2 (L + μ)}{2 t^{2} (L + μ)^{2} (Lμ + 1) - 8 Lμt (L + μ) + 8 Lμ}, \\ ζ_{4} & = \frac{t {(β + Lt - μt)}^{2}}{4 (L - μ) β} . \end{aligned}$ Note that $ζ_{1}, ζ_{4} \geq 0$ , as required.
