A subspace inertial method for derivative-free nonlinear monotone equations

Abstract

We introduce a subspace inertial line search algorithm (SILSA) for finding solutions of nonlinear monotone equations (NME). At each iteration, a new point is generated in a subspace built from the previous points. Among the finitely many points forming the subspace, the point with the largest residual norm is replaced by the new point to update the subspace. In this way, SILSA leaves regions far from the solution of NME and approaches regions near it, leading to fast convergence to the solution. This study analyzes global convergence and derives complexity upper bounds on the number of iterations and the number of function evaluations required by SILSA. Numerical results show that SILSA is promising compared to the basic line search algorithm with several known derivative-free directions.

1. Introduction

This paper introduces an efficient derivative-free algorithm for solving the system of monotone equations $F(x) = 0, \quad x \in R^{n},$ (1) where x is a vector in $R^{n}$ and $F : R^{n} \to R^{n}$ is a monotone function, possibly with large n. In finite precision arithmetic, for a given threshold $\epsilon > 0$ and an initial point $x_{0} \in R^{n}$, the algorithm finds an ε-approximate solution $x_{\epsilon}$ of the problem (1), i.e. a point whose residual norm $\|F(x_{\epsilon})\|$ does not exceed $\min(\epsilon, \|F(x_{0})\|)$. Here $\|\cdot\|$ is the Euclidean norm.

The problem (1) appears in various practical applications, including constrained neural networks [Citation1], nonlinear compressed sensing [Citation2], phase retrieval [Citation3], and economic and chemical equilibrium problems [Citation4].

Different algorithms [Citation5–17] have been proposed and analyzed for finding an ε-approximate solution of the problem (1). However, these approaches either require the computation of the true Jacobian matrix, which can be computationally expensive and memory-intensive and makes them unsuitable for high-dimensional problems, or use an approximation of it, which may require a large number of function evaluations in high dimensions. As a result, these methods are not ideal for tackling large-scale nonlinear systems of equations.

To address these limitations, Solodov and Svaiter [Citation18] proposed a basic line search algorithm (BLSA) augmented with a projection scheme for finding an ε-approximate solution of (1). Compared to methods that require the true Jacobian matrix or an approximation of it, derivative-free methods [Citation19–38] have a simpler structure and lower memory requirements, making them suitable for solving large-scale problems. Nevertheless, BLSA does not guarantee a reduction of the residual norm at every accepted point. Instead, the residual norm can jump down or up non-monotonically until an ε-approximate solution of (1) is obtained. Hence, the convergence of BLSA is relatively slow.

To accelerate the convergence of BLSA towards an ε-approximate solution of (1), one potential approach is to integrate BLSA with inertial methods, such as those proposed in [Citation39–45], which construct steps based on the two previous accepted points, as described in [Citation29–31]. Despite using two previous points to generate new points, these methods still do not guarantee that the resulting points have the lowest residual norms in comparison to the prior accepted points. Another potential approach, proposed in [Citation46], is to augment the algorithm with an extrapolation step in the line search condition to enforce a residual norm reduction at each accepted point. However, in ill-conditioned problems, this technique may have difficulty finding such points, since it does not accept points whose residual norms have not been reduced.

The global convergence of BLSA with various derivative-free directions to an ε-approximate solution of (1) has been established in [Citation24,Citation29–32,Citation35]. However, to the best of our knowledge, no attempt has been made to determine the maximum number of iterations and the maximum number of function evaluations that BLSA requires to find an ε-approximate solution of (1). Such complexity upper bounds would make it possible to estimate the cost of the algorithm before it is run and to identify which parameters appear in the bounds.

1.1. Contribution

This study proposes a new derivative-free line search algorithm, named subspace inertial line search algorithm (SILSA), which aims to find an ε-approximate solution of the problem (1) in Euclidean space. The underlying mapping of (1) is monotone and Lipschitz continuous. The proposed subspace inertial point uses the information of the previously evaluated points. It uses a procedure that replaces the point with the largest residual norm in the subspace by a newly evaluated point. Due to this replacement, SILSA moves from regions containing points with large residual norms to regions containing points with low residual norms, leading to fast convergence to an ε-approximate solution of the problem (1). Additionally, the algorithm employs a spectral derivative-free direction based on the efficient direction of Liu and Storey [Citation32], along with an improved version of BLSA by Solodov and Svaiter [Citation18]. Moreover, we establish the global convergence property of SILSA under mild conditions and derive complexity upper bounds on the number of iterations and the number of function evaluations required by SILSA to find an ε-approximate solution of (1).

1.2. Organization of the paper

The paper is structured as follows. Section 2 provides preliminaries. Section 3 introduces SILSA, which comprises a new subspace inertial technique in Subsection 3.1 and a new derivative-free direction in Subsection 3.2. Section 4 investigates theoretical results for SILSA, which include auxiliary results in Subsection 4.1, global convergence in Subsection 4.2, and complexity results in Subsection 4.3. In Section 5, we compare SILSA with BLSA using several known derivative-free directions. Conclusions are given in Section 6.

2. Preliminaries

A mapping $F : R^{n} \to R^{n}$ is said to be monotone if the condition $(F(x) - F(y))^{T} (x - y) \geq 0, \quad \forall x, y \in R^{n}$ (2) holds in a Euclidean space $R^{n}$.
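As a quick numerical illustration of the monotonicity condition (2), the following sketch samples random pairs of points and evaluates the inner product in (2). The test mapping $F_i(x) = x_i + \sin(x_i)$ is a hypothetical example chosen here, not one taken from the paper; each of its components is nondecreasing, so the mapping is monotone. A sampling check like this can only refute monotonicity, never prove it.

```python
import math
import random

def F(x):
    # Hypothetical test mapping (an assumption of this sketch):
    # F_i(x) = x_i + sin(x_i), whose components are nondecreasing.
    return [xi + math.sin(xi) for xi in x]

def monotonicity_gap(F, x, y):
    """Return (F(x) - F(y))^T (x - y), which is >= 0 for a monotone F."""
    Fx, Fy = F(x), F(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(Fx, Fy, x, y))

random.seed(0)
gaps = []
for _ in range(1000):
    x = [random.uniform(-10.0, 10.0) for _ in range(5)]
    y = [random.uniform(-10.0, 10.0) for _ in range(5)]
    gaps.append(monotonicity_gap(F, x, y))

# Smallest sampled value of the left-hand side of (2).
min_gap = min(gaps)
```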

In this paper, we make the following assumptions:

(A1)	The function $F : R^{n} \to R^{n}$ is continuously differentiable and Lipschitz continuous with the Lipschitz constant L>0.
(A2)	F is monotone, i.e. the condition (2) holds.
(A3)	The solution set $X^{*}$ of the system (1) is nonempty.

In the following subsections, we review the important concepts (basic line search, least and most promising points, inertial point, and complexity bound) that are used throughout this study.

2.1. Basic line search algorithm (BLSA)

In this subsection, we introduce the concept of BLSA, which generates a sequence of iterates, denoted by $\{y_{k}\}_{k \geq 0}$, to find an ε-approximate solution of (1). Given the iterate $y_{k}$ and a direction $d_{k}$, BLSA requires the step size $\alpha_{k}$ to satisfy the line search condition $-F(y_{k} + \alpha_{k} d_{k})^{T} d_{k} \geq \sigma \alpha_{k} \|F(y_{k} + \alpha_{k} d_{k})\| \|d_{k}\|^{2}.$ (3) Once the direction $d_{k}$ satisfies the descent condition $F(y_{k})^{T} d_{k} \leq -c \|F(y_{k})\|^{2}$, with $0 < c < 1$, (4) and a step size $\alpha_{k}$ satisfying the line search condition (3) has been found, BLSA accepts the new point $y_{k+1} := y_{k} + \alpha_{k} d_{k}, \quad \forall k \geq 0.$ (5)
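The line search condition (3) can be checked with a simple backtracking loop. The sketch below is only illustrative: it assumes a backtracking rule $\alpha \leftarrow r\alpha$ with $0 < r < 1$ (the rule used later in step (S1) of SILSA), and the function name `blsa_line_search`, the default parameter values, and the test mapping $F_i(x) = x_i + \sin(x_i)$ are hypothetical choices made for this example.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def blsa_line_search(F, y, d, sigma=1e-4, alpha0=1.0, r=0.5, max_backtracks=60):
    """Backtrack on alpha until condition (3) holds; return (alpha, trial point).

    sigma, alpha0, r and max_backtracks are illustrative defaults, not values
    prescribed by the text.
    """
    alpha = alpha0
    dd = dot(d, d)
    for _ in range(max_backtracks):
        trial = [yi + alpha * di for yi, di in zip(y, d)]
        F_trial = F(trial)
        # Condition (3): -F(y + a d)^T d >= sigma * a * ||F(y + a d)|| * ||d||^2
        if -dot(F_trial, d) >= sigma * alpha * norm(F_trial) * dd:
            return alpha, trial
        alpha *= r
    raise RuntimeError("no step size satisfying (3) was found")

def F(x):
    # Hypothetical monotone test mapping, not taken from the paper.
    return [xi + math.sin(xi) for xi in x]

y = [1.0, -2.0]
d = [-v for v in F(y)]          # d = -F(y) satisfies (4) for any 0 < c < 1
alpha, y_next = blsa_line_search(F, y, d)
```

Taking $d = -F(y)$ gives $F(y)^{T} d = -\|F(y)\|^{2}$, so the descent condition (4) holds with equality in the limit $c \to 1$ and strictly for every $0 < c < 1$.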

2.2. Least and most promising points

In this subsection, we define two important concepts, which are needed to clarify our algorithm: the least promising point (LP point), defined as the point with the highest residual norm among the evaluated points, and the most promising point (MP point), defined as the point with the lowest residual norm among the evaluated points. These definitions are motivated by the fact that there is no guarantee of a residual norm reduction in each step of BLSA.
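The two definitions amount to an argmax and an argmin over the residual norms of the evaluated points. A minimal sketch, in which the helper names and the linear test mapping $F(x) = 2x$ are hypothetical and used only for illustration:

```python
import math

def residual_norm(F, x):
    """Euclidean norm of the residual F(x)."""
    return math.sqrt(sum(v * v for v in F(x)))

def lp_mp_indices(norms):
    """Return (index of the LP point, index of the MP point)."""
    idx = range(len(norms))
    lp = max(idx, key=norms.__getitem__)   # highest residual norm
    mp = min(idx, key=norms.__getitem__)   # lowest residual norm
    return lp, mp

def F(x):
    # Hypothetical mapping for illustration only.
    return [2.0 * xi for xi in x]

points = [[3.0, 4.0], [0.5, 0.0], [1.0, -1.0]]
norms = [residual_norm(F, p) for p in points]   # 10, 1, 2*sqrt(2)
lp, mp = lp_mp_indices(norms)
```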

2.3. Inertial step

In this subsection, the traditional inertial point $v_{k} = y_{k} + e_{k} (y_{k} - y_{k-1})$ (6) is defined, where $y_{k-1}, y_{k} \in R^{n}$ are two distinct points generated by BLSA and $e_{k} \in [0, 1)$ is called the extrapolation step size, which can be updated in various ways [Citation29–31,Citation39–43,Citation45], one of which is $e_{k} = \min\{e_{\max}, k^{-2} \|y_{k} - y_{k-1}\|^{-2}\},$ (7) where $0 < e_{\max} \leq 1$ is a tuning parameter. This choice guarantees the global convergence of BLSA in combination with the inertial step (7), e.g. see [Citation29, Lemma 4.5].
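The inertial point (6) with the extrapolation step size (7) can be computed in a few lines. This is a sketch under stated assumptions: the function name `inertial_point` and the default `e_max = 0.5` are hypothetical, and the guard against $k < 1$ or coinciding points is added here because (7) is undefined in those cases.

```python
def inertial_point(y_prev, y, k, e_max=0.5):
    """Return (e_k, v_k) following (6) and (7) for k >= 1 and y != y_prev."""
    diff = [a - b for a, b in zip(y, y_prev)]
    sq_norm = sum(v * v for v in diff)
    if k < 1 or sq_norm == 0.0:
        raise ValueError("(7) needs k >= 1 and two distinct points")
    # (7): e_k = min{e_max, k^{-2} * ||y_k - y_{k-1}||^{-2}}
    e_k = min(e_max, 1.0 / (k * k * sq_norm))
    # (6): v_k = y_k + e_k * (y_k - y_{k-1})
    v_k = [a + e_k * dv for a, dv in zip(y, diff)]
    return e_k, v_k

# ||y_k - y_{k-1}||^2 = 25 and k = 1, so e_k = min(0.5, 1/25) = 0.04.
e_k, v_k = inertial_point([0.0, 0.0], [3.0, 4.0], k=1)
```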

Since BLSA does not guarantee that $y_{k-1}$ and $y_{k}$ are MP points, the effectiveness of the inertial point may decrease. However, a subspace inertial point based on the previous MP points makes it possible to generate a new MP point or a point close to two previous MP points. In this way, the newly generated point is far from the previous LP points.

2.4. Complexity bound

In this subsection, we define the complexity bound, i.e. the maximum number of iterations and the maximum number of function evaluations required to find an ε-approximate solution $x_{\epsilon}$ of the problem (1) that satisfies the theoretical criterion $\|F(x_{\epsilon})\| \leq \min(\epsilon, \|F(x_{0})\|).$ (8) Let us define $f(x) := \frac{1}{2} \|F(x)\|^{2}$ and its true gradient $g(x) := J(x)^{T} F(x)$, where $J(x)$ denotes the true Jacobian of F at x. Using the linear approximation $F(x + d) \approx F(x) + J(x) d$, we have $f(x + d) \approx q(d) := f(x) + g(x)^{T} d + \frac{1}{2} \|J(x) d\|^{2},$ where $q(d)$ is a convex function. Assumptions (A1)–(A2) imply that for every $x, d \in R^{n}$, we have $f(x + d) - f(x) = g(x)^{T} d + \frac{1}{2} \gamma^{2} \|d\|^{2},$ (9) where γ depends on x and d and satisfies $|\gamma| \leq L$ (general case), $0 \leq \gamma \leq L$ (convex case). (10) The following result is a variant of [Citation47, Proposition 2] and is a crucial component in obtaining the complexity bound for our algorithm. This result is independent of the particular derivative-free line search. Here we use the basic line search algorithm, which is different from the line search algorithm of [Citation47].

Proposition 2.1

Consider $x, d \in R^{n}$ and $\Delta_{f} \geq 0$, where $\Delta_{f}$ is a threshold on f. Then at least one of the following conditions is satisfied:

(i)	$f (x + d) < f (x) - Δ_{f}$ ,
(ii)	$f (x + d) > f (x) + Δ_{f}$ and $f (x - d) < f (x) - Δ_{f}$ ,
(iii)	$|g(x)^{T} d| \leq \Delta_{f} + \frac{1}{2} L^{2} \|d\|^{2}$.

Proof.

Assume that (iii) does not hold, i.e. $|g(x)^{T} d| > \Delta_{f} + \frac{1}{2} L^{2} \|d\|^{2}.$ Although the condition (4) holds, we cannot guarantee $g(x)^{T} d < 0$ because the true Jacobian $J(x)$ in $g(x) = J(x)^{T} F(x)$ is not available. Hence, we distinguish the following two cases:

Case 1. If $g(x)^{T} d \leq 0$, from (9) and (10), we have $f(x + d) - f(x) \leq g(x)^{T} d + \frac{1}{2} L^{2} \|d\|^{2} = -|g(x)^{T} d| + \frac{1}{2} L^{2} \|d\|^{2} < -\Delta_{f};$ (11) hence (i) holds.

Case 2. If $g(x)^{T} d \geq 0$, from (9) and (10), we have $f(x - d) - f(x) \leq g(x)^{T} (-d) + \frac{1}{2} L^{2} \|d\|^{2} = -|g(x)^{T} d| + \frac{1}{2} L^{2} \|d\|^{2} < -\Delta_{f};$ (12) hence the second inequality of (ii) holds. The first inequality of (ii) follows from $f(x + d) - f(x) \geq g(x)^{T} d - \frac{1}{2} L^{2} \|d\|^{2} > \Delta_{f}$.

In exact precision arithmetic, the aim is to obtain an exact solution of the problem (1). However, in finite precision arithmetic, the algorithm may get stuck before finding an approximate solution of (1), especially in nearly flat areas of the search space. For finite termination, the theoretical criterion (8) is used to find an ε-approximate solution of (1).

2.5. Existing derivative-free directions

We here discuss several conjugate gradient (CG) type directions and their derivative-free variants.

Let us begin with a well-known CG method that aims to minimize an unconstrained smooth function $f : R^{n} \to R$ using the iterative formula $x_{k+1} = x_{k} + \alpha_{k} d_{k}, \quad \forall k \geq 0,$ (13) where $\alpha_{k}$ is a step size determined by a line search procedure. The search direction is computed by $d_{0} := -g(x_{0}), \quad d_{k} := -g(x_{k}) + \beta_{k} d_{k-1}, \quad \forall k \geq 1,$ (14) where $g(x_{k})$ is the true gradient of $f(x)$ at $x_{k}$ and $\beta_{k} \in R$ is the CG parameter.

Some classical formulas for the CG parameter are

$\beta_{k}^{PRP} := \frac{g(x_{k})^{T} v_{k-1}}{\|g(x_{k-1})\|^{2}}$ of Polak–Ribière–Polyak (PRP) [Citation36,Citation37], where $v_{k-1} := g(x_{k}) - g(x_{k-1})$;
$β_{k}^{F R} := \frac{‖ g (x_{k}) ‖^{2}}{‖ g (x_{k - 1}) ‖^{2}}$ of Fletcher–Reeves (FR) [Citation26];
$β_{k}^{L S} := \frac{g (x_{k})^{T} v_{k - 1}}{- d_{k - 1}^{T} g (x_{k - 1})}$ of Liu–Storey (LS) [Citation32];
$β_{k}^{D Y} := \frac{‖ g (x_{k}) ‖^{2}}{d_{k - 1}^{T} v_{k - 1}}$ of Dai–Yuan (DY) [Citation25];
$\beta_{k}^{DL} := \frac{g(x_{k})^{T} v_{k-1}}{d_{k-1}^{T} v_{k-1}} - t \frac{g(x_{k})^{T} s_{k-1}}{d_{k-1}^{T} v_{k-1}}$ of Dai–Liao (DL) [Citation48], where $t \geq 0$ and $s_{k-1} := x_{k} - x_{k-1}$.

For other CG-type directions, see the survey [Citation28].
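To make the formulas concrete, the sketch below evaluates the PRP, FR, LS and DY parameters for one small set of vectors. It assumes the usual convention that the gradient difference is $g(x_{k}) - g(x_{k-1})$; the function name `cg_betas` and the numerical vectors are hypothetical and serve only as an illustration.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cg_betas(g, g_prev, d_prev):
    """Return the PRP, FR, LS and DY parameters for one CG iteration.

    g, g_prev : gradients at x_k and x_{k-1}
    d_prev    : previous search direction d_{k-1}
    """
    v = [a - b for a, b in zip(g, g_prev)]   # gradient difference g_k - g_{k-1}
    return {
        "PRP": dot(g, v) / dot(g_prev, g_prev),
        "FR": dot(g, g) / dot(g_prev, g_prev),
        "LS": dot(g, v) / (-dot(d_prev, g_prev)),
        "DY": dot(g, g) / dot(d_prev, v),
    }

# Illustrative data: d_{k-1} = -g_{k-1} (a steepest descent first step).
g_prev = [2.0, 0.0]
d_prev = [-2.0, 0.0]
g = [1.0, 2.0]
betas = cg_betas(g, g_prev, d_prev)
# Here g^T v = 3, ||g_prev||^2 = 4, ||g||^2 = 5, d_prev^T v = 2.
```

With $d_{k-1} = -g(x_{k-1})$ the PRP and LS values coincide, since then $-d_{k-1}^{T} g(x_{k-1}) = \|g(x_{k-1})\|^{2}$.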

To identify an ε-approximate solution of (1), BLSA generates the sequence $\{x_{k}\}_{k \geq 0}$ given by $x_{k+1} = x_{k} - \lambda_{k} F(z_{k}), \quad \lambda_{k} = \frac{F(z_{k})^{T} (x_{k} - z_{k})}{\|F(z_{k})\|^{2}}$ (15) with the trial point $z_{k} = x_{k} + \alpha_{k} d_{k}$. As cheap and useful choices for $d_{k}$ based on CG directions, this paper focuses on the following three derivative-free search directions:

Motivated by the PRP method, the derivative-free direction $d_{0} = -F(x_{0}), \quad d_{k} = -F(x_{k}) + \beta_{k} d_{k-1}$ was proposed in [Citation24], where $\beta_{k} = \frac{F(x_{k})^{T} y_{k-1}}{\|F(x_{k-1})\|^{2}}$ and $y_{k-1} = F(x_{k}) - F(x_{k-1})$.
Inspired by the FR method, the derivative-free direction $d_{0} = - F (x_{0}), d_{k} = - F (x_{k}) + β_{k}^{F R} v_{k - 1} - θ_{k} F (x_{k})$ was proposed in [Citation35] with $β_{k}^{F R} = \frac{‖ F (x_{k}) ‖^{2}}{‖ F (x_{k - 1}) ‖^{2}}$ and the three different choices $θ_{k}^{(1)} = \frac{F (x_{k})^{T} v_{k - 1}}{‖ F (x_{k - 1}) ‖^{2}}, θ_{k}^{(2)} = \frac{‖ F (x_{k}) ‖^{2} ‖ v_{k - 1} ‖^{2}}{‖ F (x_{k - 1}) ‖^{4}}, θ_{k}^{(3)} = θ_{k}^{(1)} + (β_{k}^{F R})^{2},$ where $v_{k} = z_{k} - x_{k}$ .
Motivated by the LS method, the derivative-free direction $d_{0} = - F (x_{0}), d_{k} = - F (x_{k}) + β_{k}^{E L S} d_{k - 1}$ was proposed in [Citation49], where $β_{k}^{E L S} = \frac{F (x_{k})^{T} y_{k - 1}}{F (x_{k - 1})^{T} d_{k - 1}} - t \frac{‖ y_{k - 1} ‖^{2} F (x_{k})^{T} d_{k - 1}}{(F (x_{k - 1})^{T} d_{k - 1})^{2}}$ and $t \geq \frac{1}{4}$ .
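Whatever direction is chosen, the projection step (15) moves $x_{k}$ onto the hyperplane $\{x : F(z_{k})^{T} (x - z_{k}) = 0\}$. The sketch below illustrates this step only; the function name `project_step`, the test mapping $F_i(x) = x_i + \sin(x_i)$ (whose unique zero is the origin), and the fixed step size 0.5 are assumptions made for the example rather than choices from the paper.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def project_step(F, x, z):
    """Apply (15): x_next = x - lambda * F(z), lambda = F(z)^T (x - z) / ||F(z)||^2."""
    Fz = F(z)
    lam = dot(Fz, [a - b for a, b in zip(x, z)]) / dot(Fz, Fz)
    return [a - lam * f for a, f in zip(x, Fz)]

def F(x):
    # Hypothetical monotone test mapping; its only zero is x* = 0.
    return [xi + math.sin(xi) for xi in x]

x = [1.0, -2.0]
d = [-v for v in F(x)]                          # simplest direction d = -F(x)
z = [a + 0.5 * b for a, b in zip(x, d)]         # trial point with an assumed step 0.5
x_next = project_step(F, x, z)

# Residual of the hyperplane equation at x_next (zero up to rounding).
hyperplane_residual = dot(F(z), [a - b for a, b in zip(x_next, z)])
```

Because F is monotone and $F(z)^{T} (x - z) > 0$ in this example, the hyperplane separates x from the solution, so the projected point is no farther from $x^{*} = 0$ than x was.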

These methods are particularly well-suited for tackling large-scale non-smooth problems, since they utilize only function values and require minimal memory. Furthermore, the stability of the search directions is independent of the type of line search employed. It has been demonstrated that the sequence $\{x_{k}\}_{k \geq 0}$ generated by these methods globally converges to the solution of (1), provided that the underlying mapping F is monotone and L-Lipschitz continuous [Citation18]. In Section 5, we present and evaluate several derivative-free directions that are based on the well-known CG directions and compare them with our proposed method.

3. Modified derivative-free algorithm

As mentioned in the introduction, several iterative inertial methods have been proposed in [Citation29–31] for obtaining an ε-approximate solution of the nonlinear monotone equation (1) in Euclidean space. The authors established that the sequences generated by their methods converge globally to the solution of the problem under mild conditions. Their primary contribution is in achieving an ε-approximate solution of (1) at a faster rate.

The concepts of MP and LP points were defined in Section 2. Since BLSA cannot guarantee that the two points used to construct the inertial point are previous MP points, the traditional inertial point has a low probability of being an MP point. To address this issue, the inertial point should ideally be constructed from the previous MP points, or at the very least from a point in close proximity to them. In this way, SILSA reduces the oscillation intensity of the residual norm by moving from regions containing LP points to regions containing MP points.

3.1. Novel subspace inertial method

In this subsection, we present a novel subspace inertial method that chooses the ingredients of the subspace from the previous MP points, thereby accelerating convergence to an ε-approximate solution of (1).

Let $\{x_{k}\}_{k \geq 0}$ be the sequence generated by our method. At the kth iteration of SILSA, we save the points generated by SILSA as the columns of the matrix $X^{k} \in R^{n \times m}$ and their residual norms as the components of the vector $NF^{k} := (\|F(X_{:1}^{k})\|, \dots, \|F(X_{:m}^{k})\|) \in R^{1 \times m}.$ Here the jth column of $X^{k}$ is denoted by $X_{:j}^{k}$ and m is the subspace dimension. We now introduce our novel subspace inertial point $w_{k} := x_{k} + e_{k} \sum_{j=1}^{m-1} \lambda_{j} (X_{:j+1}^{k} - X_{:j}^{k}),$ (16) where the extrapolation step size $e_{k} := \min\{e_{\max}, k^{-2} \|\sum_{j=1}^{m-1} \lambda_{j} (X_{:j+1}^{k} - X_{:j}^{k})\|^{-2}\}$ (17) is computed so that the condition $\sum_{k=1}^{\infty} \|\sum_{j=1}^{m-1} \lambda_{j} (X_{:j+1}^{k} - X_{:j}^{k})\| < \infty$ (18) is satisfied. Here $0 < e_{\max} \leq 1$ is the maximum value for $e_{k}$ and $0 < \lambda_{j} < 1$ for $j = 1, \dots, m-1$ are called subspace step sizes, satisfying $\sum_{j=1}^{m-1} \lambda_{j} = 1.$ (19) These subspace step sizes will be chosen in Section 5 such that (19) holds. The condition (18) results in $\alpha_{k} \|d_{k}\| \to 0$ (see Lemma 4.1, below), where $\alpha_{k}$ satisfies the line search condition (3). This result guarantees the global convergence of SILSA (see Theorem 4.3, below).
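The subspace inertial point (16) with the step size (17) can be sketched as follows. The stored points are kept as a list of columns rather than a matrix, and the function name `subspace_inertial_point`, the default `e_max = 0.9`, and the fallback used when the weighted sum of differences vanishes are all assumptions of this sketch, since (17) is undefined in that degenerate case.

```python
def subspace_inertial_point(x_k, X, lambdas, k, e_max=0.9):
    """Return (e_k, w_k) following (16) and (17).

    X       : list of m stored points (the columns X_{:1}, ..., X_{:m})
    lambdas : m - 1 subspace step sizes in (0, 1) that sum to 1, see (19)
    """
    n = len(x_k)
    # s = sum_j lambda_j * (X_{:j+1} - X_{:j})
    s = [0.0] * n
    for j, lam in enumerate(lambdas):
        for i in range(n):
            s[i] += lam * (X[j + 1][i] - X[j][i])
    sq_norm = sum(v * v for v in s)
    if k < 1 or sq_norm == 0.0:
        # Assumed fallback: no extrapolation is possible, keep w_k = x_k.
        return e_max, list(x_k)
    e_k = min(e_max, 1.0 / (k * k * sq_norm))          # (17)
    w_k = [a + e_k * b for a, b in zip(x_k, s)]        # (16)
    return e_k, w_k

# m = 3 stored points, equal weights, k = 2:
# s = 0.5*(1, 0) + 0.5*(0, 2) = (0.5, 1), ||s||^2 = 1.25, e_k = 0.25/1.25 = 0.2.
X = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
e_k, w_k = subspace_inertial_point([1.0, 2.0], X, [0.5, 0.5], k=2)
```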

To update the matrix $X^{k}$ and the vector $NF^{k}$ at the kth iteration of SILSA, we replace the LP point with a new MP point (if any). Therefore, we use the new subspace inertial point (16), which involves a weighted average of the m previous MP points. This increases the chance that SILSA discovers an MP point, i.e. $x_{b}$ with $b = \underset{i = 1:m}{\operatorname{argmin}} (NF_{i}^{k})$.

It should be noted that when m=2, the traditional inertial point (6) differs from our subspace inertial point (16), because (16) replaces the LP point among the previous MP points with a new point. This means that (16) is not restricted to using only the two previous points $x_{k-1}$ and $x_{k}$, while the traditional inertial point (6) is limited to exactly these two points. If these two points are LP points, then BLSA using (6) cannot move quickly from regions with LP points to regions with MP points, while SILSA using (16) has a good chance of having more MP points in the inertial subspace, since this subspace is updated by removing LP points as described above.

3.2. Novel derivative-free direction

Motivated by the CG method given in [Citation32], we introduce the spectral derivative-free direction $d_{k} := \begin{cases} -\theta_{0} F(w_{k}) & \text{if } k = 0, \\ -\theta_{k} F(w_{k}) + \beta_{k}^{DFLS} d_{k-1} & \text{if } k \geq 1, \end{cases}$ (20) with the scalar parameter $\beta_{k}^{DFLS} := -\frac{F(w_{k})^{T} y_{k-1}}{F(w_{k-1})^{T} d_{k-1}}$ (21) and the spectral parameter $\theta_{k} := \begin{cases} c & \text{if } k = 0, \\ c + \beta_{k}^{DFLS} \frac{F(w_{k})^{T} d_{k-1}}{\|F(w_{k})\|^{2}} & \text{if } k \geq 1, \end{cases}$ (22) where 0<c<1 is given, $y_{k-1} := F(w_{k}) - F(w_{k-1})$, and $w_{k}$ comes from (16).

Lemma 3.1

The search direction $d_{k}$ computed by (20) for $k \geq 0$ satisfies the descent condition (4) with $y_{k}$ replaced by $w_{k}$.

Proof.

For the tuning parameter 0<c<1, we have $d_{0}^{T} F(w_{0}) = -c \|F(w_{0})\|^{2}$ and $\begin{aligned} d_{k}^{T} F(w_{k}) &= (-\theta_{k} F(w_{k}) + \beta_{k}^{DFLS} d_{k-1})^{T} F(w_{k}) \\ &= -\left(c + \beta_{k}^{DFLS} \frac{F(w_{k})^{T} d_{k-1}}{\|F(w_{k})\|^{2}}\right) \|F(w_{k})\|^{2} + \beta_{k}^{DFLS} F(w_{k})^{T} d_{k-1} \\ &= -c \|F(w_{k})\|^{2} \quad \text{for } k \geq 1; \end{aligned}$ hence $d_{k}$ satisfies (4) for all $k \geq 0$.
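The identity $d_{k}^{T} F(w_{k}) = -c \|F(w_{k})\|^{2}$ from the proof can be checked numerically. The sketch below implements (20)–(22) for $k \geq 1$, taking $y_{k-1} = F(w_{k}) - F(w_{k-1})$ as in step (S6) of the algorithm description; the function name `dfls_direction` and the residual vectors are hypothetical illustration data.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def dfls_direction(Fw, Fw_prev, d_prev, c=0.5):
    """Return d_k from (20)-(22) for k >= 1.

    Fw, Fw_prev : residuals F(w_k) and F(w_{k-1})
    d_prev      : previous direction d_{k-1}
    """
    y_prev = [a - b for a, b in zip(Fw, Fw_prev)]          # y_{k-1}
    beta = -dot(Fw, y_prev) / dot(Fw_prev, d_prev)         # (21)
    theta = c + beta * dot(Fw, d_prev) / dot(Fw, Fw)       # (22)
    return [-theta * f + beta * dp for f, dp in zip(Fw, d_prev)]   # (20)

c = 0.5
Fw_prev = [1.0, 2.0]
d_prev = [-c * v for v in Fw_prev]     # d_0 = -theta_0 * F(w_0) with theta_0 = c
Fw = [0.5, -1.0]
d = dfls_direction(Fw, Fw_prev, d_prev, c)

# Difference between d_k^T F(w_k) and -c * ||F(w_k)||^2; zero up to rounding.
descent_gap = dot(d, Fw) + c * dot(Fw, Fw)
```

The spectral parameter $\theta_{k}$ is built precisely so that the $\beta_{k}^{DFLS}$ terms cancel, which is why the check holds for any residual vectors with nonzero denominators.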

3.3. Subspace inertial line search with SILSA

In this subsection, we give a detailed description of our subspace inertial derivative-free algorithm, which we call SILSA. This algorithm is designed to find an ε-approximate solution of (1) and is an improved version of BLSA that incorporates the subspace inertial method for faster convergence. In practice, the new subspace inertial method generates points that are, at worst, close to the previous MP points. Specifically, the new method replaces the previous MP point with the largest residual norm by the new point. This substitution causes SILSA to move from regions with LP points to regions with MP points, and in practice quickly finds an approximate solution of (1).

The SILSA algorithm incorporates several tuning parameters: $\sigma > 0$ and $0 < \bar{\gamma} < 1$ (the line search parameters), $0 \leq \delta_{\min} < 1$ (the minimum threshold for $\delta_{k}$), $\delta_{\max}$ with $\delta_{\min} < \delta_{\max} \leq 1$ (the initial value for $\delta_{k}$), $\omega_{d} > 1$ (the parameter for updating $\delta_{k}$), 0<c<1 (the parameter for computing $d_{k}$), $m \geq 2$ (the subspace dimension), $r \in (0, 1)$ (the parameter for reducing $\alpha_{k}$), and $0 < e_{\max} \leq 1$ (the maximum value for $e_{k}$).

We now describe how the SILSA algorithm works:

(S0)	(Initialization) First, we choose an initial point $x_{0} \in R^{n}$ . Next, we select the inertial weights $0 < λ_{j} < 1$ , for $j = 1, \dots, m - 1$ such that the condition $\sum_{j = 1}^{m - 1} λ_{j} = 1$ is satisfied. We then choose the initial inertial point $w_{0} = x_{0}$ , and set the search direction to the negative residual vector at $x_{0}$ . The initial parameter $e_{0}$ for adjusting the inertial point is set to the tuning parameter $0 < e_{max} \leq 1$ , while the initial step size $0 < δ_{0} \leq 1$ is set to the tuning parameter $0 < δ_{max} \leq 1$ .
(S1)	(Line search algorithm) At the kth iteration, SILSA performs lineSearch (line 5) along the derivative-free direction $d_{k}$ ($d_{k}$ is computed by searchDir in line 16 for $k \geq 1$). Initially, lineSearch sets j=0 and takes the initial step size $\alpha_{k,0} := \delta_{k}.$ (23) The other step sizes are then reduced by a given factor 0<r<1 according to $\alpha_{k,j+1} := r \alpha_{k,j}$ for $j \geq 0$, (24) and j is increased until the line search condition $-F(w_{k} + \alpha_{k,j} d_{k})^{T} d_{k} \geq \sigma \alpha_{k,j} \|F(w_{k} + \alpha_{k,j} d_{k})\| \|d_{k}\|^{2}$ (25) is satisfied. Once this condition holds, we set $\alpha_{k} := \alpha_{k,j}$ and compute the accepted point $z_{k} := w_{k} + \alpha_{k} d_{k}$ (26) and its residual vector $F(z_{k})$. If $\|F(z_{k})\|$ is below a given threshold $\epsilon > 0$, $z_{k}$ is chosen as an approximate solution of (1), and SILSA terminates.
(S2)	(Checking reduction of the residual norm) If $\|F(z_{k})\| > \epsilon$, then checkDec (line 8) checks whether or not the decrease condition $f(z_{k}) < f(w_{k}) - \bar{\gamma} \delta_{k}$ (27) holds, where $f(z_{k}) := \frac{1}{2} \|F(z_{k})\|^{2}$ and $f(w_{k}) := \frac{1}{2} \|F(w_{k})\|^{2}$. Accordingly, using the tuning parameter $\omega_{d} > 1$, it then either increases the step size $\delta_{k}$ of the decrease condition (27), i.e. $\delta_{k+1} := \min(\omega_{d} \delta_{k}, \delta_{\max}),$ (28) or decreases it, i.e. $\delta_{k+1} := \delta_{k} / \omega_{d}.$ (29) Moreover, the extrapolation step size $e_{k}$ is computed by (17).
(S3)	(Projection of $w_{k}$ onto the hyperplane $H := \{w \in R^{n} \mid F(z_{k})^{T} (w - z_{k}) = 0\}$) The point $w_{k}$ is projected onto H by projectPoint (line 9), and then the new point $x_{k+1} := w_{k} - \mu_{k} F(z_{k})$ (30) is computed with the step size $\mu_{k} := \frac{F(z_{k})^{T} (w_{k} - z_{k})}{\|F(z_{k})\|^{2}}$, together with its residual norm $\|F(x_{k+1})\|$. If $\|F(x_{k+1})\|$ is below a given threshold $\epsilon > 0$, $x_{k+1}$ is chosen as an approximate solution of (1) and SILSA ends.
(S4)	(Update of the subspace of the old points) If the residual norm $\|F(x_{k+1})\|$ at the next point $x_{k+1}$ is greater than a given threshold $\epsilon > 0$, then updateSubspace (line 11) updates the information of the subspace inertial point, which includes the matrix $X^{k}$ and the vector $NF^{k}$ (defined in Section 3.1). If $k + 1 \geq m$ holds, then a new point is added to the subspace. Specifically, we find the index $i_{w}$ of the previous MP point with the largest residual norm and replace it with the new point $x_{k+1}$, i.e. $i_{w} := \underset{i = 1:m}{\operatorname{argmax}} \{NF_{:i}^{k+1}\}, \quad X_{:i_{w}}^{k+1} = x_{k+1}, \quad NF_{:i_{w}}^{k+1} = \|F(x_{k+1})\|.$ On the other hand, if k+1<m, we set $X_{:k+1}^{k+1} = x_{k+1}$ and $NF_{:k+1}^{k+1} = \|F(x_{k+1})\|$. By adding more points with lower residual norms to the subspace, this strategy increases the chance of finding an ε-approximate solution of (1).
(S5)	(Computation of a new inertial point) After calculating the new subspace inertial point $w_{k+1}$ by (16) and determining its residual norm, if $\|F(w_{k+1})\|$ is less than the specified threshold $\epsilon > 0$, then $w_{k+1}$ is taken as an approximate solution of (1), and SILSA terminates. Alternatively, if $\delta_{k}$ is less than a given threshold $\delta_{\min} > 0$, then $w_{k}$ is taken as an approximate solution of (1), and SILSA terminates.
(S6)	(Computation of the search direction) If $\|F(w_{k+1})\|$ is greater than ε, the difference between the residuals at the inertial points $w_{k+1}$ and $w_{k}$, denoted by $y_{k} := F(w_{k+1}) - F(w_{k})$, is computed. Then the derivative-free direction $d_{k+1}$ is computed by (20), whose parameters $\beta_{k+1}^{DFLS}$ and $\theta_{k+1}$ are computed by (21) and (22) and depend on the values of $y_{k}$, $d_{k}$, $w_{k}$, and $w_{k+1}$. The tuning parameters $\delta_{\max}$, $\delta_{\min}$, $\omega_{d}$, and r appear in the complexity bound on the number of function evaluations, which is discussed in Section 4.3. It is worth noting that, based on the update rules for $\delta_{k}$ in (28) and (29), $\delta_{k}$ is always less than or equal to $\delta_{\max}$ for all values of k.
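Two of the bookkeeping steps above, the update of $\delta_{k}$ in (S2) and the subspace update in (S4), can be sketched as small helper functions. This is not the authors' implementation: the function names are hypothetical, the sketch assumes that $\delta_{k}$ is increased by (28) when the decrease condition (27) holds and decreased by (29) otherwise, and it uses the number of stored points as a stand-in for the test $k + 1 < m$.

```python
def update_delta(f_z, f_w, delta, gamma_bar, omega_d, delta_max):
    """Update delta_k after checking the decrease condition (27).

    Assumption of this sketch: (28) is applied when (27) holds, (29) otherwise.
    """
    if f_z < f_w - gamma_bar * delta:            # decrease condition (27)
        return min(omega_d * delta, delta_max)   # (28)
    return delta / omega_d                       # (29)

def update_subspace(X, NF, x_new, nf_new, m):
    """Store x_new in the subspace as in (S4); X and NF are modified in place.

    X  : list of stored points (columns of the matrix X^k)
    NF : list of their residual norms
    """
    if len(X) < m:
        # Subspace not yet full: append the new point.
        X.append(list(x_new))
        NF.append(nf_new)
    else:
        # Replace the stored point with the largest residual norm.
        i_w = max(range(m), key=NF.__getitem__)
        X[i_w] = list(x_new)
        NF[i_w] = nf_new
    return X, NF

# Illustrative data with m = 3: the point with residual norm 5.0 is replaced.
X = [[0.0], [1.0], [2.0]]
NF = [3.0, 5.0, 1.0]
update_subspace(X, NF, [9.0], 2.0, m=3)

delta_up = update_delta(0.4, 1.0, 0.5, gamma_bar=0.1, omega_d=2.0, delta_max=1.0)
delta_down = update_delta(1.0, 1.0, 0.5, gamma_bar=0.1, omega_d=2.0, delta_max=1.0)
```

Capping the increase at $\delta_{\max}$ is what keeps $\delta_{k} \leq \delta_{\max}$ for all k, as noted at the end of (S6).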

4. Convergence analysis and complexity

In this section, we first present several auxiliary results, which are needed to establish global convergence and the complexity bounds, and then the main theoretical results.

4.1. Some auxiliary results

The following results have a key role in proving global convergence.

Lemma 4.1

Let $\{w_{k}\}_{k \geq 0}$ and $\{x_{k}\}_{k \geq 0}$ be the two sequences generated by SILSA, assume that the assumptions (A1)–(A3) hold, and define $\Delta^{k} := \max_{j = 1:m-1} \|X_{:j+1}^{k} - X_{:j}^{k}\|$. Then:

(i)	The inequality $\|x_{k+1} - x^{*}\|^{2} \leq \|w_{k} - x^{*}\|^{2} - \sigma^{2} \|w_{k} - z_{k}\|^{4}$ (31) holds.
(ii)	The sequence $\{x_{k}\}_{k \geq 0}$ is bounded, $\sum_{k=0}^{\infty} \|w_{k} - z_{k}\|^{4} < \infty$ and so $\lim_{k \to \infty} \|w_{k} - z_{k}\| = \lim_{k \to \infty} \alpha_{k} \|d_{k}\| = 0.$ (32)
(iii)	$\Delta^{k}$ is finite and $\sum_{j=1}^{m-1} \lambda_{j} \|X_{:j+1}^{k} - X_{:j}^{k}\| \leq (m-1) \Delta^{k} < \infty.$ (33)
(iv)	There exists a positive constant $\bar{\Gamma}_{w}$ such that $\|F(w_{k})\| \leq \bar{\Gamma}_{w}, \quad \forall k \geq 0.$ (34)
(v)	If the direction $d_{k}$ is bounded, i.e. $\|d_{k}\| \leq \Gamma_{d}$ for a positive constant $\Gamma_{d}$, then there exists a positive constant $\Gamma_{z}$ such that $\|F(z_{k})\| \leq \Gamma_{z}, \quad \forall k \geq 0.$ (35) Here, $z_{k}$ is from (26).
(vi)	If $F(w_{k}) \neq 0$ for any k, then $\|d_{k}\| \geq c \|F(w_{k})\|$ and $d_{k} \neq 0$.

Proof.

Let $x^{*} \in X^{*}$ be a solution of (1), where $X^{*} \subset R^{n}$ denotes the solution set of (1).

(i-ii)	The proof is similar to that of [Citation29, Lemma 4.5], with the difference that the extrapolation step size (17) and the condition (18) are used instead of the traditional extrapolation step size (6) and the condition (7), respectively.
(iii)	There exist two positive integers $k > k^{'}$ such that $Δ^{k} = max_{j = 1 : m - 1} ‖ X_{: j + 1}^{k} - X_{: j}^{k} ‖ = ‖ x_{k} - x_{k^{'}} ‖ \leq ‖ x_{k} ‖ + ‖ x_{k^{'}} ‖ < \infty$ due to (ii). Then, the condition (33) holds, because $0 < m < \infty$ and $\sum_{j = 1}^{m - 1} λ_{j} = 1$ .
(iv)	From (ii), since ${x_{k}}_{k \geq 0}$ is bounded, there exists a positive constant $Γ_{0}$ such that $‖ x_{k} ‖ \leq Γ_{0}$ for all $k \geq 0$ . From (17) and since $0 < e_{max} \leq 1$ , we obtain $e_{k} \leq e_{max} \leq 1$ for all k. Hence, by (16) and (33), the sequence ${w_{k}}_{k \geq 0}$ is bounded, i.e. $\begin{aligned} ‖ w_{k} ‖ & = ‖ x_{k} + e_{k} \sum_{j = 1}^{m - 1} λ_{j} (X_{: j + 1}^{k} - X_{: j}^{k}) ‖ \\ & \leq ‖ x_{k} ‖ + e_{k} \sum_{j = 1}^{m - 1} λ_{j} ‖ X_{: j + 1}^{k} - X_{: j}^{k} ‖ \\ & \leq Γ_{0} + 2 Γ_{0} (m - 1) = (2 m - 1) Γ_{0} . \end{aligned}$ Since F is continuous by (A1), (34) is therefore valid.
(v)	From (23), (28), and since $0 < δ_{max} \leq 1$ , we obtain $α_{k} \leq δ_{max} \leq 1$ for all k. Hence, (26) and (iv) result in $‖ z_{k} ‖ = ‖ w_{k} + α_{k} d_{k} ‖ \leq ‖ w_{k} ‖ + ‖ d_{k} ‖ \leq (2 m - 1) Γ_{0} + Γ_{d} .$ Therefore the continuity of F implies that (35) is valid.
(vi)	From (4) and the fact that $F (w_{k}) \neq 0$ for all k, we have $- ‖ d_{k} ‖ \leq \frac{F (w_{k})^{T} d_{k}}{‖ F (w_{k}) ‖} \leq - c ‖ F (w_{k}) ‖ < 0,$ resulting in $‖ d_{k} ‖ \geq c ‖ F (w_{k}) ‖$ and $d_{k} \neq 0$ .
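The role of the subspace inertial point (16) and the extrapolation step size (17) in the proof above can be made concrete with a short Python sketch. The function name `inertial_point`, the list-of-points layout for the matrix $X^{k}$ , and the fallback $e_{k} = e_{max}$ when $k = 0$ or the weighted sum vanishes are assumptions made for illustration only, since (17) is undefined in those two situations.

```python
import math


def inertial_point(x_k, X, lam, k, e_max=1e-4):
    """Sketch of the inertial point (16) with extrapolation step size (17).

    x_k : current iterate
    X   : list of the m stored points X_{:1}, ..., X_{:m}
    lam : list of the m - 1 weights lambda_1, ..., lambda_{m-1}
    k   : iteration counter
    """
    n = len(x_k)
    s = [0.0] * n  # s = sum_j lam_j (X_{:j+1} - X_{:j})
    for j in range(len(X) - 1):
        for i in range(n):
            s[i] += lam[j] * (X[j + 1][i] - X[j][i])
    norm_s = math.sqrt(sum(v * v for v in s))
    if k == 0 or norm_s == 0.0:
        e_k = e_max  # assumption: fallback where (17) is undefined
    else:
        e_k = min(e_max, 1.0 / (k * k * norm_s * norm_s))  # (17)
    w_k = [x + e_k * v for x, v in zip(x_k, s)]            # (16)
    return w_k, e_k


# Hypothetical subspace with m = 3 stored points in R^2.
X = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
lam = [0.5, 0.5]
w_early, e_early = inertial_point([1.0, 2.0], X, lam, k=1)
w_late, e_late = inertial_point([1.0, 2.0], X, lam, k=1000)
```

For small k the cap $e_{max}$ is active, while for large k the factor $k^{- 2}$ takes over and forces $e_{k} ‖ \sum_{j} λ_{j} (X_{: j + 1}^{k} - X_{: j}^{k}) ‖^{2} \leq k^{- 2}$ , which is the property used in the convergence argument.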

In the following result, under the assumption that the residual norms are bounded below, bounds for the search directions and the step sizes are established.

Proposition 4.2

Let ${x_{k}}_{k \geq 0}$ and ${w_{k}}_{k \geq 0}$ be the two sequences generated by SILSA and assume that the assumptions (A1)–(A3) hold. If there is a positive constant ${\underline{Γ}}_{w}$ such that $‖ F (w_{k}) ‖ \geq {\underline{Γ}}_{w}$ for all k, then the following two statements are valid:

(i)	The search directions $d_{k}$ are bounded above, i.e. (36) $0 < ‖ d_{k} ‖ \leq Γ_{d} := c {\bar{Γ}}_{w} + \frac{4 {\bar{Γ}}_{w}^{2}}{c {\underline{Γ}}_{w}^{2} σ} \forall k,$ (36) where σ is from the line search condition (25), ${\bar{Γ}}_{w}$ is from (34), and c is from (4).
(ii)	If the line search condition (25) cannot be satisfied, then line search step sizes $α_{k}$ are bounded, i.e. (37) $\underline{α} := \frac{r c {\bar{Γ}}_{w}^{2}}{(L + σ Γ_{z}) Γ_{d}^{2}} \leq α_{k} \leq δ_{max} \leq 1,$ (37) where r is the reduction factor from (24), L is from (A2), and $δ_{max}$ is a tuning parameter.

Proof.

(i)	By the Cauchy–Schwarz inequality and (25), we have $\begin{aligned} ‖ F (z_{k}) ‖ ‖ w_{k} - z_{k} ‖ & \geq F (z_{k})^{T} (w_{k} - z_{k}) \geq σ α_{k, j}^{2} ‖ F (z_{k}) ‖ ‖ d_{k} ‖^{2} \\ & = σ ‖ F (z_{k}) ‖ ‖ w_{k} - z_{k} ‖^{2} . \end{aligned}$ Thus, we obtain (38) $σ ‖ w_{k} - z_{k} ‖ \leq 1, \forall k \geq 0.$ (38) It follows from Lemma 4.1(iv), (4), (20), (34), and (38) that $\begin{aligned} | θ_{k} | & = | c - \frac{(F (w_{k})^{T} y_{k - 1}) (F (w_{k})^{T} d_{k - 1})}{F (w_{k - 1})^{T} d_{k - 1} ‖ F (w_{k}) ‖^{2}} | \\ & \leq c + \frac{‖ F (w_{k}) ‖ ‖ y_{k - 1} ‖ ‖ F (w_{k}) ‖ ‖ d_{k - 1} ‖}{| F (w_{k - 1})^{T} d_{k - 1} | ‖ F (w_{k}) ‖^{2}} \end{aligned}$ and $| β_{k}^{D F L S} | = | \frac{F (w_{k})^{T} y_{k - 1}}{F (w_{k - 1})^{T} d_{k - 1}} | \leq \frac{‖ F (w_{k}) ‖ ‖ y_{k - 1} ‖}{| F (w_{k - 1})^{T} d_{k - 1} |}$ , resulting in $\begin{aligned} 0 < ‖ d_{k} ‖ & = ‖ - θ_{k} F (w_{k}) + β_{k}^{D F L S} d_{k - 1} ‖ \\ & \leq c ‖ F (w_{k}) ‖ + 2 \frac{‖ F (w_{k}) ‖ ‖ y_{k - 1} ‖}{| F (w_{k - 1})^{T} d_{k - 1} |} ‖ d_{k - 1} ‖ \\ & \leq c ‖ F (w_{k}) ‖ + 4 \frac{{\bar{Γ}}_{w}^{2}}{c ‖ F (w_{k - 1}) ‖^{2}} ‖ w_{k - 1} - z_{k - 1} ‖ \leq Γ_{d} = c {\bar{Γ}}_{w} + \frac{4 {\bar{Γ}}_{w}^{2}}{c {\underline{Γ}}_{w}^{2} σ} . \end{aligned}$
(ii)	From Lemma 4.1(vi), $d_{k} \neq 0$ . We show that lineSearch always terminates in a finite number of steps. From (23), we have $α_{k, 0} = δ_{k}$ .
Then, according to the rule for updating $α_{k, j}$ in (24), we have $α_{k, j} = r^{j} δ_{k}$ . If the condition (25) with $α_{k, j} = r^{j} δ_{k}$ does not hold for any j, i.e. (39) $- F (w_{k} + r^{j} δ_{k} d_{k})^{T} d_{k} < σ r^{j} δ_{k} ‖ F (w_{k} + r^{j} δ_{k} d_{k}) ‖ ‖ d_{k} ‖^{2},$ (39) then, as j goes to infinity, we have $- F (w_{k})^{T} d_{k} \leq 0$ , since $δ_{k} \leq δ_{max}$ , $‖ d_{k} ‖ \leq Γ_{d}$ (from (i)), and $‖ F (w_{k} + r^{j} δ_{k} d_{k}) ‖ \leq Γ_{z}$ . This contradicts (4). Hence lineSearch terminates finitely; there is a positive integer $j^{'}$ such that $α_{k} = α_{k, j^{'}} = r^{j^{'}} δ_{k}$ satisfies (25).
Since (39) holds for $j = j^{'} - 1$ , applying (22) into (4) and using (A2), we have $\begin{aligned} c ‖ F (w_{k}) ‖^{2} & = - F (w_{k})^{T} d_{k} \\ & = (F (w_{k} + r^{j^{'} - 1} δ_{k} d_{k})^{T} d_{k} - F (w_{k})^{T} d_{k}) - F (w_{k} + r^{j^{'} - 1} δ_{k} d_{k})^{T} d_{k} \\ & \leq L r^{j^{'} - 1} δ_{k} ‖ d_{k} ‖^{2} + σ r^{j^{'} - 1} δ_{k} ‖ F (w_{k} + r^{j^{'} - 1} δ_{k} d_{k}) ‖ ‖ d_{k} ‖^{2}, \end{aligned}$ leading to $\begin{aligned} δ_{max} & \geq α_{k} = r^{j^{'}} δ_{k} \geq \frac{r c ‖ F (w_{k}) ‖^{2}}{(L + σ ‖ F (w_{k} + r^{j^{'} - 1} δ_{k} d_{k}) ‖) ‖ d_{k} ‖^{2}} \geq \underline{α} \\ & = \frac{r c {\bar{Γ}}_{w}^{2}}{(L + σ Γ_{z}) Γ_{d}^{2}} \end{aligned}$ from Lemma 4.1(iv).
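The backtracking procedure analysed in this proof, with the initial step (23), the reduction rule (24), and the acceptance test (25), can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' lineSearch: the function name `line_search`, the cap `max_backtracks`, and the test mapping $F (x) = x$ are assumptions introduced here.

```python
import math


def line_search(F, w, d, delta_k, r=0.5, sigma=0.01, max_backtracks=60):
    """Backtracking sketch: alpha_{k,0} = delta_k (23), alpha_{k,j+1} = r alpha_{k,j} (24),
    accepted as soon as the condition (25) holds.

    Returns the accepted step size, the trial point z_k, and the number of
    evaluations of F that were used.
    """
    d_norm_sq = sum(v * v for v in d)
    alpha = delta_k
    for j in range(max_backtracks + 1):
        z = [wi + alpha * di for wi, di in zip(w, d)]
        Fz = F(z)
        lhs = -sum(a * b for a, b in zip(Fz, d))
        rhs = sigma * alpha * math.sqrt(sum(v * v for v in Fz)) * d_norm_sq
        if lhs >= rhs:  # condition (25)
            return alpha, z, j + 1
        alpha *= r      # reduction rule (24)
    raise RuntimeError("no step size satisfying (25) was found")


# Hypothetical monotone test mapping F(x) = x with a deliberately long direction.
identity = lambda x: list(x)
alpha, z, nf = line_search(identity, [1.0, 1.0], [-3.0, -3.0], delta_k=0.5)
```

In this example the first trial step overshoots the zero of F, so (25) fails once and the second trial step is accepted after two evaluations of F.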

4.2. Convergence analysis

The following result is the main global convergence result for SILSA. Variants of this result can be found in [Citation29–31], but with a different inertial point.

Theorem 4.3

Suppose that (A1)–(A3) hold and ${w_{k}}_{k \geq 0}$ , ${z_{k}}_{k \geq 0}$ , ${x_{k}}_{k \geq 0}$ are the three sequences generated by SILSA. Let $δ_{min} = 0$ . Then, at least one of (40) $lim_{k \to \infty} ‖ F (z_{k}) ‖ = 0, lim_{k \to \infty} ‖ F (w_{k}) ‖ = 0, lim_{k \to \infty} ‖ F (x_{k}) ‖ = 0$ (40) holds. Moreover, the sequences ${x_{k}}_{k \geq 0}$ and ${w_{k}}_{k \geq 0}$ converge to a solution of (1).

Proof.

If $‖ F (w_{k}) ‖ = 0$ , then SILSA terminates and accepts $w_{k}$ as a solution of (1) (see line 13 of SILSA). Otherwise, SILSA continues. Suppose, for the sake of contradiction, that there is a positive constant ${\underline{Γ}}_{w}$ such that $‖ F (w_{k}) ‖ > {\underline{Γ}}_{w}$ holds for all k. Then Proposition 4.2(i) implies that there is a positive constant $Γ_{d}$ such that $0 < ‖ d_{k} ‖ \leq Γ_{d}$ for all k ( $d_{k} \neq 0$ from Lemma 4.1(vi)). Moreover, if $‖ F (w_{k} + r^{j^{'}} δ_{k} d_{k}) ‖ = 0$ , then SILSA terminates and accepts $z_{k} = w_{k} + r^{j^{'}} δ_{k} d_{k}$ as a solution of (1) (here $j^{'}$ is a positive integer such that $α_{k} = α_{k, j^{'}} = r^{j^{'}} δ_{k}$ satisfies (25)). Otherwise, from Lemma 4.1(v), there is a positive constant $Γ_{z}$ such that $0 < ‖ F (z_{k}) ‖ \leq Γ_{z}$ with $z_{k} = w_{k} + r^{j^{'}} δ_{k} d_{k}$ . As such, the assumptions of Proposition 4.2(ii) are verified and this proposition implies that there is a positive constant $\underline{α}$ such that $α_{k} \geq \underline{α}$ for all k. Hence, from Lemma 4.1(vi), we obtain $α_{k} ‖ d_{k} ‖ > \underline{α} c ‖ F (w_{k}) ‖ > \underline{α} c {\underline{Γ}}_{w} > 0$ for all k, which contradicts (32). Therefore, $lim_{k \to \infty} ‖ F (w_{k}) ‖ = 0$ is obtained.
From (32), $lim_{k \to \infty} ‖ x_{k} - w_{k} ‖ = 0$ , and the Lipschitz continuity of F in (A2), we obtain $\begin{aligned} lim_{k \to \infty} ‖ F (x_{k}) ‖ - lim_{k \to \infty} ‖ F (w_{k}) ‖ & \leq lim_{k \to \infty} ‖ F (x_{k}) - F (w_{k}) ‖ \\ & \leq L lim_{k \to \infty} ‖ x_{k} - w_{k} ‖ = 0, \end{aligned}$ which consequently implies (41) $lim_{k \to \infty} ‖ F (x_{k}) ‖ = 0.$ (41) From the continuity of F, the boundedness of ${x_{k}}_{k \geq 0}$ , and (41), it follows that the sequence ${x_{k}}_{k \geq 0}$ , generated by SILSA, has an accumulation point $x^{*}$ such that $F (x^{*}) = 0$ . On the other hand, the sequence ${‖ x_{k} - x^{*} ‖}_{k \geq 0}$ is convergent by Lemma 4.1, which means that the sequence ${x_{k}}_{k \geq 0}$ globally converges to the solution $x^{*}$ of (1).

4.3. Complexity results

This section concerns an investigation of the complexity of SILSA. Firstly, we establish an upper limit on the number of function evaluations required by lineSearch. Following that, we determine an upper bound on the number of iterations needed for SILSA to converge, with or without a reduction in residual norms. Consequently, we derive an upper threshold for the total number of function evaluations necessary to successfully find an approximate solution of (Equation1(1) $F (x) = 0, x \in R^{n},$ (1) ) by SILSA.

Proposition 4.4

Let ${x_{k}}_{k \geq 0}$ be the sequence generated by SILSA and assume that (A1)–(A3) hold. Assuming that the initial step size is bounded by $0 < δ_{max} \leq 1$ and that the parameter 0<r<1 is used to decrease the step size in lineSearch, the number nf of function evaluations used by lineSearch (line 5 of SILSA) is bounded by $⌈ \log_{r^{- 1}} \frac{δ_{max}}{\underline{α}} ⌉,$ where $\underline{α}$ is the positive constant from Proposition 4.2(ii).

Proof.

From Proposition 4.2(ii), we have $α_{k, n f} \geq \underline{α}$ for all k. By (23) and (24), we have $r^{n f} δ_{max} \geq α_{k, n f} = r^{n f} δ_{k} \geq \underline{α},$ leading to $n f \leq ⌈ \log_{r^{- 1}} \frac{δ_{max}}{\underline{α}} ⌉$ since $r \in (0, 1)$ .

By means of (27), we define the index set $I_{k}$ as the set of all iterations k such that $f (z_{k}) < f (w_{k}) - \bar{γ} δ_{k}$ . This set contains the iterations exhibiting reductions of more than $\bar{γ} δ_{k}$ in the residual norms, while the index set $I_{k}^{c}$ is the complement of $I_{k}$ .

Theorem 4.5

Let ${w_{k}}_{k \geq 0}$ , ${z_{k}}_{k \geq 0}$ , ${x_{k}}_{k \geq 0}$ be the sequences generated by SILSA, let $x_{ϵ}$ be an ε-approximate solution of (1) found by SILSA, and assume that (A1)–(A3) hold. Moreover, suppose that the tuning parameters $0 < \bar{γ} < 1$ (parameter for line search), $0 < δ_{min} < δ_{max} \leq 1$ (minimal threshold and initial value for the step size $δ_{k}$ ), 0<r<1 (parameter for reducing step size by lineSearch), $1 \leq ω_{d} < \infty$ (parameter for updating the step size $δ_{k}$ ) are given. Then the following statements are valid:

(i)	The number of iterations of SILSA with reductions in the residual norms is bounded by (42) $| I_{k} | \leq \frac{f (x_{0}) - f (x_{ϵ})}{\bar{γ} δ_{min}} .$ (42)
(ii)	The number of iterations of SILSA without reductions in the residual norms is bounded by (43) $| I_{k}^{c} | \leq \log_{ω_{d}} \frac{δ_{max}}{δ_{min}} .$ (43)
(iii)	The number of iterations of SILSA is bounded by $N = | I_{k} | + | I_{k}^{c} | \leq \frac{f (x_{0}) - f (x_{ϵ})}{\bar{γ} δ_{min}} + \log_{ω_{d}} \frac{δ_{max}}{δ_{min}} = O (δ_{min}^{- 1}) .$
(iv)	The number of function evaluations of SILSA is bounded by $n f_{t o t a l} \leq N ⌈ \log_{r^{- 1}} \frac{δ_{max}}{\underline{α}} ⌉ .$
(v)	If there is a positive constant $M_{0}$ such that (44) $‖ g (w_{k}) ‖ \leq M_{0} ‖ F (w_{k}) ‖$ (44) for all k and SILSA has no iteration with a reduction in the residual norm, SILSA finds at least a point $w_{k}$ with at most $O (ϵ^{- 2})$ function evaluations satisfying $‖ F (w_{k}) ‖ = O (ϵ)$ . Here $g (w_{k}) = J (w_{k})^{T} F (w_{k})$ comes from Section 2.

Proof.

(i)	The index set $I_{k}$ is defined as ${k ∣ f (z_{k}) < f (w_{k}) - \bar{γ} δ_{k}}$ . By the definition of $I_{k}$ , we have $f (x_{0}) - f (x_{ϵ}) \geq \sum_{j \in I_{k}} (f (w_{j}) - f (z_{j})) \geq \bar{γ} \sum_{j \in I_{k}} δ_{j} \geq \bar{γ} \sum_{j \in I_{k}} δ_{min} = | I_{k} | \bar{γ} δ_{min},$ which yields the result in (42).
(ii)	The set $I_{k}^{c}$ is defined as ${1, 2, \dots, k} ∖ I_{k}$ . Updating $δ_{k} = δ_{k - 1} / ω_{d}$ guarantees that $δ_{min} \leq δ_{k} \leq δ_{max}$ , which leads to the derivation of (43).
(iii)	Combining the results from (i) and (ii) yields the stated bound.
(iv)	The result is obtained from (iii) and Proposition 4.4.
(v)	Consider any $j \in N \cup {0}$ with $j < \infty$ . During the execution of lineSearch, the trial points $w_{k} + α_{k, j} d_{k}$ are generated, the last of which satisfies the line search condition (25) and is accepted as $z_{k}$ . However, the condition (27) along $\pm d_{k}$ may not be satisfied. In the worst case, we assume that (27) is not satisfied. Then, by applying Proposition 2.1(iii), we have $| g (w_{k})^{T} (α_{k} d_{k}) | \leq \bar{γ} δ_{k} + \frac{L}{2} ‖ α_{k} d_{k} ‖^{2} .$ We now consider the following two cases:
- Case 1: If $‖ F (w_{k}) ‖ \leq ϵ := \sqrt{δ_{min}}$ , then $x_{k} = w_{k}$ is an ε-approximate solution of (1). Hence SILSA finds a point $x_{k}$ whose residual norm is at most ε with at most $O (ϵ^{- 2})$ function evaluations.
- Case 2: Assuming that $‖ F (w_{k}) ‖ > ϵ$ for all k, we can apply Proposition 4.2(i) to obtain the condition (36), which ensures that $‖ d_{k} ‖ \leq Γ_{d}$ for all k. Additionally, Lemma 4.1(iv) and Proposition 4.2(ii) guarantee that the condition (37) holds, i.e. $α_{k} \geq \underline{α}$ for all k. Consequently, after a finite number of iterations, SILSA terminates due to the rule for updating $δ_{k}$ in (29), which implies the existence of a positive integer $k_{0}$ such that $α_{k} \leq δ_{k} \leq δ_{min}$ for $k \geq k_{0}$ . Considering the worst-case scenario where there is no reduction of the residual norm at $z_{k}$ for all k (i.e. $I_{k}$ is empty), we can use Proposition 2.1, (36), (37), and (44) to obtain $\begin{aligned} M_{0} | F (w_{k})^{T} (α_{k} d_{k}) | & \leq | g (w_{k})^{T} (α_{k} d_{k}) | \leq \bar{γ} δ_{k} + \frac{L}{2} ‖ α_{k} d_{k} ‖^{2} \\ & \leq \bar{γ} δ_{min} + \frac{L}{2} α_{k}^{2} Γ_{d}^{2} \end{aligned}$ for all $k \geq k_{0}$ , i.e. $\begin{aligned} c ‖ F (w_{k}) ‖^{2} & = | F (w_{k})^{T} d_{k} | \leq \frac{\bar{γ} δ_{min}}{M_{0} α_{k}} + \frac{L}{2 M_{0}} Γ_{d}^{2} α_{k} \\ & \leq (\frac{\bar{γ}}{M_{0} \underline{α}} + \frac{L Γ_{d}^{2}}{2 M_{0}}) δ_{min} . \end{aligned}$ Combining the results of the two cases, we obtain $‖ F (w_{k}) ‖ = O (ϵ) = O (\sqrt{δ_{min}})$ , which completes the proof.
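To see how the bounds of Proposition 4.4 and Theorem 4.5(i)–(iv) combine, the following Python sketch evaluates them for given constants. The function name `silsa_bounds` and all numerical inputs are hypothetical; in particular, $\underline{α}$ and $f (x_{0}) - f (x_{ϵ})$ are not known in practice, and the sketch assumes $ω_{d} > 1$ and $δ_{min} > 0$ so that the logarithm and the quotient are defined.

```python
import math


def silsa_bounds(f0, f_eps, gamma_bar, delta_min, delta_max, omega_d, r, alpha_low):
    """Evaluate the worst-case bounds of Theorem 4.5(i)-(iv).

    Returns (N, nf_ls, nf_total): the iteration bound, the per-call bound on
    function evaluations in the line search, and their product.
    """
    # (42): iterations with a reduction in the residual norm
    n_reduce = (f0 - f_eps) / (gamma_bar * delta_min)
    # (43): iterations without a reduction (log to base omega_d)
    n_no_reduce = math.log2(delta_max / delta_min) / math.log2(omega_d)
    N = n_reduce + n_no_reduce
    # Proposition 4.4: ceil(log_{1/r}(delta_max / alpha_low))
    nf_ls = math.ceil(math.log2(delta_max / alpha_low) / math.log2(1.0 / r))
    return N, nf_ls, N * nf_ls


# Hypothetical constants chosen as powers of two so the logarithms are exact.
N, nf_ls, nf_total = silsa_bounds(
    f0=1.0, f_eps=0.0, gamma_bar=0.5,
    delta_min=0.125, delta_max=0.5, omega_d=2.0, r=0.5, alpha_low=0.0625,
)
```

The first term dominates as $δ_{min}$ decreases, which reflects the $O (δ_{min}^{- 1})$ order stated in (iii).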

As stated in the introduction, our complexity bound is of the same order as the bounds obtained by Cartis et al. [Citation50], Curtis et al. [Citation51], Dodangeh and Vicente [Citation52], Dodangeh et al. [Citation53], Kimiaei and Neumaier [Citation47], and Vicente [Citation54] for other optimization methods.

5. Numerical experiments

In this section, we present a comparative analysis of our algorithm, SILSA, with 10 well-known algorithms (discussed below) on a set of 18 test problems with the dimensions $n \in {10, 50, 300, 500, 1000, 5000} .$ This results in a total of 108 test functions. The Matlab codes of these 18 problems are available in Section 7. To ensure that all test problems used are monotone in finite precision arithmetic, we randomly generated $10^{6}$ distinct points x and y and verified that the condition $\frac{(x - y)^{T} (F (x) - F (y))}{| x - y |^{T} (| F (x) | + | F (y) |) + 1} \leq - 0.1$ was fulfilled.

We compare SILSA with the following algorithms:

BLSA-DY, BLSA with the derivative-free direction using the CG direction of Dai and Yuan [Citation25].
BLSA-HZ, BLSA with the derivative-free direction using the CG direction of Hager and Zhang [Citation27].
BLSA-PR, BLSA with the derivative-free direction using the CG direction of Polak–Ribière–Polyak (PRP) [Citation36,Citation37].
BLSA-FR, BLSA with the derivative-free direction using the CG direction of Fletcher and Reeves (FR) [Citation26].
BLSA-3PR, BLSA with the derivative-free direction using the CG direction of Zhang et al. [Citation38].
BLSA-3A, BLSA with the derivative-free direction using the CG direction of Andrei [Citation22].
BLSA-IM, BLSA with the derivative-free direction of Ivanov et al. [Citation55].
BLSA-AK, BLSA with the derivative-free direction of Abubakar and Kumam [Citation56].
BLSA-HD, BLSA with the derivative-free direction of Huang et al. [Citation57].
BLSA-SS, BLSA with the derivative-free direction of Sabi'u et al. [Citation58].

In our comparison, the line search parameters for all algorithms were set to $σ = 0.01$ and r=0.5. We performed a tuning process to choose these two values for the selected test problems. The values of the other tuning parameters of the proposed algorithms are default values. For SILSA, the default values of the tuning parameters are as follows:

$δ_{max} = 0.5$ (the initial step size $δ_{k}$ ), $δ_{min} = 0$ (the minimum threshold for the step size $δ_{k}$ ), $ω_{d} = 2$ (the parameter for updating the step size $δ_{k}$ ), c=0.5 (the direction parameter), $e_{max} = 10^{- 4}$ (maximum value for $e_{k}$ ), $\bar{γ} = 10^{- 20}$ (the line search parameter), $λ_{i}^{0} = \ln (μ + \frac{1}{2}) - \ln i$ (the initial values for weights), and m=10 (the subspace inertial dimension). Here $μ = 4 + ⌊ 3 \ln n ⌋$ was chosen and the normalized version $λ_{i} := λ_{i}^{0} / \sum_{j = 1}^{m - 1} λ_{j}^{0}$ of $λ_{i}^{0}$ for $i = 1, 2, \dots, m - 1$ was computed.
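The default weights described above are easy to reproduce. The following Python sketch computes the initial weights $λ_{i}^{0}$ and their normalization $λ_{i}$ ; the function name `inertial_weights` is hypothetical, and the comment on positivity is an observation about the formula rather than a statement from the paper.

```python
import math


def inertial_weights(n, m=10):
    """Normalized weights lambda_1, ..., lambda_{m-1} for problem dimension n."""
    mu = 4 + math.floor(3 * math.log(n))
    # Initial weights lambda_i^0 = ln(mu + 1/2) - ln(i); they are positive
    # only while i < mu + 1/2, so small n combined with large m can give
    # negative entries.
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, m)]
    total = sum(raw)
    return [v / total for v in raw]


lam = inertial_weights(n=10, m=10)
```

For n=10 and m=10 this gives $μ = 10$ and nine positive, decreasing weights that sum to one.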

Following the data profile of Moré and Wild [Citation59] and the performance profile of Dolan and Moré [Citation60],

the data profile $δ_{s} (κ)$ of the solver s for a positive value of κ measures the fraction of problems that the solver s can solve with at most $κ (n + 1)$ function evaluations, where n is the dimension of problems,
the performance profile $ρ_{s} (τ)$ of the solver s for a positive value of τ measures the relative efficiency of the solver s in solving the set of problems.

In particular, $ρ_{s} (1)$ is the fraction of problems on which the solver s wins compared to the other solvers, and, for sufficiently large τ (or κ), $ρ_{s} (τ)$ (or $δ_{s} (κ)$ ) is the fraction of problems that the solver s can solve. The measure for efficiency considered in this paper is the number nf of function evaluations. The efficiency with respect to nf is called nf efficiency. All algorithms were terminated as soon as one of the conditions $‖ F (x_{ϵ}) ‖ \leq 10^{- 5}$ , $nf \geq nfmax = 10000$ , and $sec \geq secmax = 360$ sec was satisfied. Here sec denotes time in seconds.
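The two profiles described above can be computed from a table of function-evaluation counts. The Python sketch below is a simplified illustration under stated assumptions: the function name `profiles`, the convention that an unsolved run is recorded as `math.inf`, and the sample table are hypothetical, and at least one problem is assumed to be solved by some solver.

```python
import math


def profiles(nf, dims, tau, kappa):
    """Performance profile rho_s(tau) and data profile delta_s(kappa) in terms of nf.

    nf[p][s] : function evaluations of solver s on problem p (math.inf if unsolved)
    dims[p]  : dimension n of problem p
    Problems solved by no solver are ignored.
    """
    rows = [(row, n) for row, n in zip(nf, dims) if min(row) < math.inf]
    n_solvers = len(nf[0])
    rho, delta = [], []
    for s in range(n_solvers):
        # performance ratio nf[p][s] / min_s nf[p][s] at most tau
        rho.append(sum(row[s] <= tau * min(row) for row, _ in rows) / len(rows))
        # solved within kappa (n + 1) function evaluations
        delta.append(sum(row[s] <= kappa * (n + 1) for row, n in rows) / len(rows))
    return rho, delta


# Hypothetical counts for two solvers on three problems of dimension 9.
table = [[10, 20], [math.inf, 30], [math.inf, math.inf]]
rho1, _ = profiles(table, [9, 9, 9], tau=1, kappa=1)
rho2, delta3 = profiles(table, [9, 9, 9], tau=2, kappa=3)
```

Here `rho1` gives the fraction of wins of each solver, and the third problem is dropped because no solver solves it.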

To evaluate their robustness and efficiency, we plot the data and performance profiles of all algorithms. From Figure 1, we conclude that SILSA is competitive with the other algorithms. Of the 108 test functions, SILSA is able to solve 95% of them, while also having the lowest number of function evaluations on 45% of these problems.

Figure 1. Results for problems with $n \in {10, 50, 300, 500, 1000, 5000}$ , the maximum number of function evaluations ( $nfmax = 10000$ ), the maximum time in seconds ( $secmax = 360$ sec), and $ϵ = 10^{- 5}$ : Data profiles $δ (κ)$ (left) in dependence of a bound κ on the cost ratio, performance profiles $ρ (τ)$ (right) in dependence of a bound τ on the performance ratio, in terms of nf. Problems solved by no solver are ignored.


6. Conclusion

The paper discusses an improved derivative-free line search method for nonlinear monotone equations. Our line search is a combination of the basic line search algorithm proposed by Solodov and Svaiter [Citation18] and a novel subspace inertial point whose goal is to speed up reaching an approximate solution of nonlinear monotone equations. The subspace is generated based on a finite number of the previous MP points such that a point with largest residual norm among the previous MP points is replaced by a new evaluated point. The global convergence and worst case complexity results are proved. Numerical results show that our improved line search method is competitive with the state-of-the-art derivative-free methods.

7. Test problems in Matlab

The Matlab codes of all 18 test problems are as follows:

The problems 1–4, 6–8, 10, 13 are from [Citation46], problem 5 is from [Citation14], the problems 9, 14, 15 are from [Citation61], problem 12 is from [Citation62], and the problems 16–18 are from the present paper. To obtain the problems 16–18, the nonlinear complementarity problem $(x, s) \geq 0, s = F (x), x^{T} s = 0$ is converted to the nonsmooth equations (45) $(\begin{matrix} s - F (x) \\ min {x, s} \end{matrix}) = 0,$ (45) where F is a monotone operator. Following [Citation63], by defining $ϕ (μ, a, b) := a + b - \sqrt{(a - b)^{2} + 4 μ}$ for all $(μ, a, b) \in R^{3}$ , the problem (45) is transformed into $(\begin{matrix} s - F (x) \\ ϕ (μ, x_{1}, s_{1}) \\ ϕ (μ, x_{2}, s_{2}) \\ \vdots \\ ϕ (μ, x_{n}, s_{n}) \end{matrix}) = 0.$ For all test problems, the initial points were chosen to be $x_{i} = i / (i + 2)$ for $i = 1, \dots, n$ .
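The smoothing step above can be illustrated in Python as follows. This sketch is not one of the authors' Matlab test problems: the function names `phi` and `smoothed_ncp_residual` and the sample mapping $F (x) = x + 1$ are hypothetical. It relies on the identity $ϕ (0, a, b) = a + b - | a - b | = 2 min {a, b}$ , so that for $μ = 0$ the smoothed system has the same zeros as (45).

```python
import math


def phi(mu, a, b):
    """Smoothing function phi(mu, a, b) = a + b - sqrt((a - b)^2 + 4 mu)."""
    return a + b - math.sqrt((a - b) ** 2 + 4.0 * mu)


def smoothed_ncp_residual(F, x, s, mu):
    """Stack s - F(x) on top of phi(mu, x_i, s_i) for i = 1, ..., n."""
    Fx = F(x)
    top = [si - fi for si, fi in zip(s, Fx)]
    bottom = [phi(mu, xi, si) for xi, si in zip(x, s)]
    return top + bottom


# Hypothetical monotone mapping F(x) = x + 1; x = 0, s = 1 solves the NCP.
shift = lambda x: [xi + 1.0 for xi in x]
res = smoothed_ncp_residual(shift, [0.0], [1.0], mu=0.0)
```

For $μ > 0$ the square root is differentiable everywhere, which is the reason for replacing the nonsmooth min term.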

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The first author acknowledges financial support of the Austrian Science Foundation under Project No. P 34317. The second author is grateful to King Fahd University of Petroleum and Minerals for providing excellent research facilities. The third author is supported by an FWO junior postdoctoral fellowship [12AK924N]. In addition, she received funding from the Flemish Government (AI Research Program). Susan Ghaderi is affiliated with Leuven.AI - KU Leuven Institute for AI, B-3000, Leuven, Belgium.

References

Chorowski J, Zurada JM. Learning understandable neural networks with nonnegative weight constraints. IEEE Trans Neural Netw Learn Syst. 2015;26(1):62–69. doi: 10.1109/TNNLS.2014.2310059
Blumensath T. Compressed sensing with nonlinear observations and related nonlinear optimization problems. IEEE Trans Inf Theory. 2013;59:3466–3474. doi: 10.1109/TIT.2013.2245716
Candes EJ, Li X, Soltanolkotabi M. Phase retrieval via wirtinger flow: theory and algorithms. IEEE Trans Inf Theory. 2015 Apr;61(4):1985–2007. doi: 10.1109/TIT.2015.2399924
Dirkse SP, Ferris MC. Mcplib: a collection of nonlinear mixed complementarity problems. Optim Methods Softw. 1995 Jan;5(4):319–345. doi: 10.1080/10556789508805619
Ahookhosh M, Amini K, Bahrami S. Two derivative-free projection approaches for systems of large-scale nonlinear monotone equations. Numer Algorithms. 2012 Oct;64(1):21–42. doi: 10.1007/s11075-012-9653-z
Ahookhosh M, Artacho FJA, Fleming RMT, et al. Local convergence of the Levenberg–Marquardt method under Hölder metric subregularity. Adv Comput Math. 2019 Jun;45(5-6):2771–2806. doi: 10.1007/s10444-019-09708-7
Ahookhosh M, Fleming RMT, Vuong PT. Finding zeros of hölder metrically subregular mappings via globally convergent Levenberg–Marquardt methods. Optim Methods Softw. 2020 Jan;37(1):113–149. doi: 10.1080/10556788.2020.1712602
Amini K, Shiker MA, Kimiaei M. A line search trust-region algorithm with nonmonotone adaptive radius for a system of nonlinear equations. 4OR. 2016;14(2):133–152. doi: 10.1007/s10288-016-0305-3
Brown PN, Saad Y. Convergence theory of nonlinear Newton–Krylov algorithms. SIAM J Optim. 1994;4(2):297–330. doi: 10.1137/0804017
Dennis JE, Moré JJ. A characterization of superlinear convergence and its application to quasi-Newton methods. Math Comput. 1974;28(126):549–560. doi: 10.1090/mcom/1974-28-126
Esmaeili H, Kimiaei M. A trust-region method with improved adaptive radius for systems of nonlinear equations. Math Methods Oper Res. 2016;83(1):109–125. doi: 10.1007/s00186-015-0522-0
Kimiaei M. Nonmonotone self-adaptive Levenberg–Marquardt approach for solving systems of nonlinear equations. Numer Funct Anal Optim. 2018;39(1):47–66. doi: 10.1080/01630563.2017.1351988
Kimiaei M, Esmaeili H. A trust-region approach with novel filter adaptive radius for system of nonlinear equations. Numer Algorithms. 2016;73(4):999–1016. doi: 10.1007/s11075-016-0126-7
Li D, Fukushima M. A globally and superlinearly convergent Gauss–Newton-based bfgs method for symmetric nonlinear equations. SIAM J Numer Anal. 1999;37(1):152–172. doi: 10.1137/S0036142998335704
Yamashita N, Fukushima M. On the Rate of Convergence of the Levenberg-Marquardt Method. In: Alefeld G, Chen X, editors. Topics in Numerical Analysis. Springer Vienna.2001. p. 239–249.
Yuan Y. Recent advances in numerical methods for nonlinear equations and nonlinear least squares. NACO. 2011;1(1):15–34. doi: 10.3934/naco.2011.1.15
Zhou G, Toh KC. Superlinear convergence of a Newton-type algorithm for monotone equations. J Optim Theory Appl. 2005;125(1):205–221. doi: 10.1007/s10957-004-1721-7
Solodov MV, Svaiter BF. A globally convergent inexact Newton method for systems of monotone equations. In: Reformulation: Nonsmooth, piecewise smooth, semismooth and smoothing methods. Springer; 1998. p. 355–369.
Al-Baali M, Narushima Y, Yabe H. A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization. Comput Optim Appl. 2014 May;60(1):89–110. doi: 10.1007/s10589-014-9662-z
Aminifard Z, Babaie-Kafaki S. A modified descent Polak–Ribiére–Polyak conjugate gradient method with global convergence property for nonconvex functions. Calcolo. 2019;56(2):1–11. doi: 10.1007/s10092-019-0312-9
Aminifard Z, Hosseini A, Babaie-Kafaki S. Modified conjugate gradient method for solving sparse recovery problem with nonconvex penalty. Signal Process. 2022;193:108424. doi: 10.1016/j.sigpro.2021.108424
Andrei N. A simple three-term conjugate gradient algorithm for unconstrained optimization. Comput Appl Math. 2013;241:19–29. doi: 10.1016/j.cam.2012.10.002
Andrei N. Nonlinear conjugate gradient methods for unconstrained optimization. 1st ed. Cham, Switzerland: Springer; 2020.
Cheng W. A PRP type method for systems of monotone equations. Math Comput Model. 2009;50(1-2):15–20.
Dai YH, Yuan Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J Optim. 1999;10(1):177–182. doi: 10.1137/S1052623497318992
Fletcher R, Reeves CM. Function minimization by conjugate gradients. Comput J. 1964;7(2):149–154. doi: 10.1093/comjnl/7.2.149
Hager WW, Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J Optim. 2005;16(1):170–192. doi: 10.1137/030601880
Hager WW, Zhang H. A survey of nonlinear conjugate gradient methods. Pacific J Optim. 2006;2(1):35–58.
Ibrahim A, Kumam P, Abubakar AB, et al. A method with inertial extrapolation step for convex constrained monotone equations. J Inequal Appl. 2021;2021(1):1–25. doi: 10.1186/s13660-021-02719-3
Ibrahim AH, Kumam P, Abubakar AB, et al. Accelerated derivative-free method for nonlinear monotone equations with an application. Numer Linear Algebra Appl. 2021 Nov;29(3):e2424. doi: 10.1002/nla.v29.3
Ibrahim AH, Kumam P, Sun M, et al. Projection method with inertial step for nonlinear equations: application to signal recovery. J Ind Manag. 2021;19:30–55.
Liu Y, Storey C. Efficient generalized conjugate gradient algorithms, part 1: theory. J Optim Theory Appl. 1991;69(1):129–137. doi: 10.1007/BF00940464
Liu Y, Zhu Z, Zhang B. Two sufficient descent three-term conjugate gradient methods for unconstrained optimization problems with applications in compressive sensing. J Appl Math Comput. 2021;68:1787–1816. doi: 10.1007/s12190-021-01589-8
Lotfi M, Mohammad Hosseini S. An efficient hybrid conjugate gradient method with sufficient descent property for unconstrained optimization. Optim Methods Softw. 2021;37:1725–1739. doi: 10.1080/10556788.2021.1977808
Papp Z, Rapajić S. FR type methods for systems of large-scale nonlinear monotone equations. Appl Math Comput. 2015;269:816–823.
Polak E, Ribière G. Note sur la convergence de méthodes de directions conjuguées. ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique. 1969;3(R1):35–43.
Polyak BT. The conjugate gradient method in extremal problems. USSR Comput Math Math Phys. 1969;9(4):94–112. doi: 10.1016/0041-5553(69)90035-4
Zhang L, Zhou W, Li DH. A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J Numer Anal. 2006;26(4):629–640. doi: 10.1093/imanum/drl016
Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J Optim. 2004 Jan;14(3):773–782. doi: 10.1137/S1052623403427859
Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Var Anal. 2001;9(1):3–11. doi: 10.1023/A:1011253113155
Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J Optim. 2014;24(1):232–256. doi: 10.1137/130910294
Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202. doi: 10.1137/080716542
Boţ RI, Grad SM. Inertial forward–backward methods for solving vector optimization problems. Optimization. 2018 Feb;67(7):959–974. doi: 10.1080/02331934.2018.1440553
Boţ RI, Nguyen DK. A forward-backward penalty scheme with inertial effects for monotone inclusions, applications to convex bilevel programming. Optimization. 2018 Dec;68(10):1855–1880.
Boţ R, Sedlmayer M, Vuong PT. A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs. arXiv e-prints, arXiv:2003.07886, 2020 Mar.
Ibrahim AH, Kimiaei M, Kumam P. A new black box method for monotone nonlinear equations. Optimization. 2021 Nov;72:1119–1137. doi: 10.1080/02331934.2021.2002326
Kimiaei M, Neumaier A. Efficient global unconstrained black box optimization. Math Program Comput. 2022 Feb;14:365–414. doi: 10.1007/s12532-021-00215-9
Dai YH, Liao LZ. New conjugacy conditions and related nonlinear conjugate gradient methods. Appl Math Optim. 2001 Jan;43(1):87–101. doi: 10.1007/s002450010019
Li M. A Liu-Storey-type method for solving large-scale nonlinear monotone equations. Numer Funct Anal Optim. 2014;35(3):310–322. doi: 10.1080/01630563.2013.812656
Cartis C, Sampaio P, Toint P. Worst-case evaluation complexity of non-monotone gradient-related algorithms for unconstrained optimization. Optimization. 2014 Jan;64(5):1349–1361. doi: 10.1080/02331934.2013.869809
Curtis FE, Lubberts Z, Robinson DP. Concise complexity analyses for trust region methods. Optim Lett. 2018 Jun;12(8):1713–1724. doi: 10.1007/s11590-018-1286-2
Dodangeh M, Vicente LN. Worst case complexity of direct search under convexity. Math Program. 2014 Nov;155(1-2):307–332. doi: 10.1007/s10107-014-0847-0
Dodangeh M, Vicente LN, Zhang Z. On the optimal order of worst case complexity of direct search. Optim Lett. 2015 Jun;10(4):699–708. doi: 10.1007/s11590-015-0908-1
Vicente L. Worst case complexity of direct search. EURO J Comput Optim. 2013 May;1(1-2):143–153. doi: 10.1007/s13675-012-0003-7
Ivanov B, Milovanović GV, Stanimirović PS. Accelerated Dai-Liao projection method for solving systems of monotone nonlinear equations with application to image deblurring. J Glob Optim. 2022 Jul;85(2):377–420. doi: 10.1007/s10898-022-01213-4
Abubakar AB, Kumam P. A descent Dai-Liao conjugate gradient method for nonlinear equations. Numer Algorithms. 2018 May;81(1):197–210. doi: 10.1007/s11075-018-0541-z
Huang F, Deng S, Tang J. A derivative-free memoryless BFGS hyperplane projection method for solving large-scale nonlinear monotone equations. Soft Comput. 2022 Oct;27(7):3805–3815. doi: 10.1007/s00500-022-07536-4
Sabi'u J, Shah A, Stanimirović PS, et al. Modified optimal Perry conjugate gradient method for solving system of monotone equations with applications. Appl Numer Math. 2023 Feb;184:431–445. doi: 10.1016/j.apnum.2022.10.016
Moré JJ, Wild SM. Benchmarking derivative-free optimization algorithms. SIAM J Optim. 2009 Jan;20(1):172–191. doi: 10.1137/080724083
Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Math Program. 2002 Jan;91(2):201–213. doi: 10.1007/s101070100263
La Cruz W, Martínez J, Raydan M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Math Comput. 2006;75(255):1429–1448. doi: 10.1090/S0025-5718-06-01840-0
Gao P, He C. An efficient three-term conjugate gradient method for nonlinear monotone equations with convex constraints. Calcolo. 2018 Nov;55(4):53. doi: 10.1007/s10092-018-0291-2
Geiger C, Kanzow C. On the resolution of monotone complementarity problems. Comput Optim Appl. 1996 Mar;5(2):155–173. doi: 10.1007/BF00249054
