Optimization
A Journal of Mathematical Programming and Operations Research
Research Article

A subspace inertial method for derivative-free nonlinear monotone equations

Received 03 Apr 2023, Accepted 23 Aug 2023, Published online: 04 Sep 2023

Abstract

We introduce a subspace inertial line search algorithm (SILSA) for finding solutions of nonlinear monotone equations (NME). At each iteration, a new point is generated in a subspace spanned by previously evaluated points. Among the finitely many points forming the subspace, the point with the largest residual norm is replaced by the new point to update the subspace. In this way, SILSA leaves regions far from a solution of the NME and approaches regions near it, leading to fast convergence. This study establishes global convergence and complexity upper bounds on the number of iterations and the number of function evaluations required by SILSA. Numerical results show that SILSA is promising compared to the basic line search algorithm with several known derivative-free directions.

1. Introduction

This paper introduces an efficient derivative-free algorithm for solving monotone equations (1) $F(x)=0$, $x\in\mathbb{R}^n$, where $x$ is a vector in $\mathbb{R}^n$ and $F:\mathbb{R}^n\to\mathbb{R}^n$ is a monotone function, possibly with large $n$. In finite precision arithmetic, for a given threshold $\epsilon>0$ and an initial point $x_0\in\mathbb{R}^n$, the algorithm finds an $\epsilon$-approximate solution $x_\epsilon$ of the problem (1), whose residual norm is below $\min(\epsilon,\|F(x_0)\|)$. Here $\|\cdot\|$ is the Euclidean norm.

The problem (1) appears in various practical applications, including constrained neural networks [Citation1], nonlinear compressed sensing [Citation2], phase retrieval [Citation3], and economic and chemical equilibrium problems [Citation4].

Different algorithms [Citation5–17] have been proposed and analyzed for finding an $\epsilon$-approximate solution of the problem (1). However, these approaches either require the computation of the (real) Jacobian matrix, which can be computationally expensive and memory-intensive and thus unsuitable for high-dimensional problems, or use an approximation of it, which may require a large number of function evaluations in high dimensions. As a result, these methods are not ideal for tackling large-scale nonlinear systems of equations.

To address these limitations, Solodov and Svaiter [Citation18] proposed a basic line search algorithm (BLSA) augmented with a projection scheme for finding an $\epsilon$-approximate solution of (1). Compared to methods that require the (real) Jacobian matrix or its approximation, derivative-free methods [Citation19–38] have a simpler structure and lower memory requirements, making them suitable for solving large-scale problems. Nevertheless, BLSA does not guarantee a reduction of the residual norm at every accepted point. Instead, the residual norm can jump down or up non-monotonically until an $\epsilon$-approximate solution of (1) is obtained. Hence, the convergence of BLSA is relatively slow.

To accelerate the convergence of BLSA toward an $\epsilon$-approximate solution of (1), one potential approach is to integrate BLSA with inertial methods, such as those proposed in [Citation39–45], which construct steps based on the two previously accepted points, as described in [Citation29–31]. Despite using two previous points to generate new points, these methods still do not guarantee that the resulting points have the lowest residual norms among the previously accepted points. Another potential approach, proposed in [Citation46], is to augment the line search condition with an extrapolation step that enforces a residual norm reduction at each accepted point. However, for ill-conditioned problems this technique may struggle to find such points, since it does not accept points whose residual norms have not been reduced.

The global convergence of BLSA with various derivative-free directions to an $\epsilon$-approximate solution of (1) has been established in [Citation24,Citation29–32,Citation35]. However, to the best of our knowledge, no attempt has been made to determine the maximum number of iterations and the maximum number of function evaluations required by BLSA to find an $\epsilon$-approximate solution of (1). Complexity upper bounds on the number of iterations and the number of function evaluations are therefore of interest: they estimate the cost of the algorithm before it is run and reveal which parameters appear in such bounds.

1.1. Contribution

This study proposes a new derivative-free line search algorithm, named subspace inertial line search algorithm (SILSA), which aims to find an $\epsilon$-approximate solution of the problem (1) in Euclidean space. The underlying mapping of (1) is monotone and Lipschitz continuous. The proposed subspace inertial point uses the information of previously evaluated points. A replacement procedure exchanges the point with the largest residual norm in the subspace for a newly evaluated point. Due to this replacement, SILSA moves from regions containing points with large residual norms to regions containing points with low residual norms, leading to fast convergence to an $\epsilon$-approximate solution of (1). Additionally, the algorithm employs a spectral derivative-free direction based on the efficient direction of Liu and Storey [Citation32], along with an improved version of the BLSA of Solodov and Svaiter [Citation18]. Moreover, we establish the global convergence of SILSA under mild conditions and derive complexity upper bounds on the number of iterations and the number of function evaluations required by SILSA to find an $\epsilon$-approximate solution of (1).

1.2. Organization of the paper

The paper is structured as follows. Section 2 provides preliminaries. Section 3 introduces SILSA, which comprises a new subspace inertial technique in Subsection 3.1 and a new derivative-free direction in Subsection 3.2. Section 4 presents theoretical results for SILSA: auxiliary results in Subsection 4.1, global convergence in Subsection 4.2, and complexity results in Subsection 4.3. In Section 5, we compare SILSA with BLSA using several known derivative-free directions. Conclusions are given in Section 6.

2. Preliminaries

A mapping $F:\mathbb{R}^n\to\mathbb{R}^n$ is said to be monotone if the condition (2) $(F(x)-F(y))^T(x-y)\ge0$, $x,y\in\mathbb{R}^n$, holds.

Throughout this paper, we make the following assumptions:

(A1)

The function $F:\mathbb{R}^n\to\mathbb{R}^n$ is continuously differentiable and Lipschitz continuous with Lipschitz constant $L>0$.

(A2)

$F$ is monotone, i.e. the condition (2) holds.

(A3)

The solution set $X^*$ of the system (1) is nonempty.

In the following subsections, we review the important concepts (basic line search, least and most promising points, inertial point, and complexity bound) that we use throughout this study.

2.1. Basic line search algorithm (BLSA)

In this subsection, we introduce the concept of BLSA, which generates a sequence of iterates, denoted by $\{y_k\}_{k\ge0}$, to find an $\epsilon$-approximate solution of (1). It requires that the step size $\alpha_k$ satisfies the line search condition (3) $-F(y_k+\alpha_kd_k)^Td_k\ge\sigma\alpha_k\|F(y_k+\alpha_kd_k)\|\,\|d_k\|^2$. After the direction $d_k$ satisfies the descent condition (4) $F(y_k)^Td_k\le-c\|F(y_k)\|^2$, with $0<c<1$, and the step size $\alpha_k$ is found by satisfying the line search condition (3), BLSA accepts the new point (5) $y_{k+1}:=y_k+\alpha_kd_k$, $k\ge0$.
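To make the backtracking concrete, the following MATLAB sketch finds a step size satisfying a condition of the shape of (3) by repeatedly shrinking a trial step. The function name, the safeguard on the number of trials, and the parameter values are illustrative assumptions, not the authors' implementation.

function alpha = blsaLineSearch(F, y, d, sigma, r, alpha0)
% Backtracking sketch for a line search of the form (3) (illustrative only).
% F : handle for the residual mapping; y : current point;
% d : direction satisfying (4); sigma, r, alpha0 : assumed parameters.
  alpha = alpha0;
  for j = 1:100                                   % safeguard on the number of trials
    Fz = F(y + alpha*d);                          % residual at the trial point
    if -Fz'*d >= sigma*alpha*norm(Fz)*norm(d)^2   % condition (3)
      return;
    end
    alpha = r*alpha;                              % shrink the step and try again
  end
end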

2.2. Least and most promising points

In this subsection, we define two important concepts, which are needed to clarify our algorithm: the least promising point (LP point), defined as the point with the highest residual norm among the evaluated points, and the most promising point (MP point), defined as the point with the lowest residual norm among the evaluated points. These definitions are motivated by the fact that there is no guarantee of a residual norm reduction in each step of BLSA.

2.3. Inertial step

In this subsection, the traditional inertial point (6) $v_k=y_k+e_k(y_k-y_{k-1})$ is defined, where $y_{k-1},y_k\in\mathbb{R}^n$ are two distinct points generated by BLSA and $e_k\in[0,1)$ is called the extrapolation step size, which can be updated in various ways [Citation29–31,Citation39–43,Citation45], one of which is (7) $e_k=\min\bigl\{e_{\max},\ \tfrac{1}{k^2\|y_k-y_{k-1}\|^2}\bigr\}$, where $0<e_{\max}\le1$ is a tuning parameter. This choice of $e_k$ guarantees the global convergence of BLSA combined with the inertial step (6); see, e.g., [Citation29, Lemma 4.5].
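As a small illustration, the inertial point (6) with the step size rule (7) can be computed in a few lines of MATLAB; the function name and the squared-norm reading of (7) used here follow the reconstruction above and are assumptions.

function vk = inertialPoint(yk, ykm1, k, emax)
% Traditional inertial point (6) with extrapolation step size (7) (sketch).
% yk, ykm1 : current and previous iterates; k : iteration counter;
% emax : tuning parameter with 0 < emax <= 1.
  s  = yk - ykm1;
  ek = min(emax, 1/(k^2*norm(s)^2 + eps));   % rule (7); eps guards against s = 0
  vk = yk + ek*s;                            % inertial point (6)
end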

Since BLSA provides no guarantee that $y_{k-1}$ and $y_k$ are MP points, the effectiveness of the inertial point may be reduced. Employing a subspace inertial point based on the previous MP points, however, gives us a chance to generate a new MP point, or at least a point close to the previous MP points. In this way, the newly generated point stays away from the previous LP points.

2.4. Complexity bound

In this subsection, we define the complexity bound, i.e. the maximum number of iterations and the maximum number of function evaluations required to find an $\epsilon$-approximate solution $x_\epsilon$ of the problem (1) satisfying the theoretical criterion (8) $\|F(x_\epsilon)\|\le\min(\epsilon,\|F(x_0)\|)$. Let us define $f(x):=\tfrac12\|F(x)\|^2$ and its true gradient $g(x):=J(x)^TF(x)$, where $J(x)$ denotes the true Jacobian of $F$ at $x$. Using the linear approximation $F(x+d)\approx F(x)+J(x)d$, we have $f(x+d)\approx q(d):=f(x)+g(x)^Td+\tfrac12\|J(x)d\|^2$, where $q(d)$ is a convex function. Assumptions (A1)–(A2) imply that for every $x,d\in\mathbb{R}^n$ we have (9) $f(x+d)-f(x)=g(x)^Td+\tfrac12\gamma^2\|d\|^2$, where $\gamma$ depends on $x$ and $d$ and satisfies (10) $|\gamma|\le L$ (general case), $0\le\gamma\le L$ (convex case). The following result is a variant of [Citation47, Proposition 2], which is a crucial component for obtaining the complexity bound of our algorithm. This result is independent of a particular derivative-free line search. Here we use the basic line search algorithm, which differs from the line search algorithm of [Citation47].

Proposition 2.1

Consider $x,d\in\mathbb{R}^n$ and $\Delta_f\ge0$, where $\Delta_f$ is a threshold on $f$. Then at least one of the following conditions is satisfied:

(i)

$f(x+d)<f(x)-\Delta_f$,

(ii)

$f(x+d)>f(x)+\Delta_f$ and $f(x-d)<f(x)-\Delta_f$,

(iii)

$|g(x)^Td|\le\Delta_f+\tfrac12L^2\|d\|^2$.

Proof.

Assume that (iii) is not satisfied. Hence we have $|g(x)^Td|>\Delta_f+\tfrac12L^2\|d\|^2$. Although the condition (4) holds, we cannot guarantee $g(x)^Td<0$, because the true Jacobian $J(x)$ at $x$ is not available in $g(x)=J(x)^TF(x)$. Hence, we consider the proof in the following two cases:

Case 1. If $g(x)^Td\le0$, then from (9) and (10) we have (11) $f(x+d)-f(x)\le g(x)^Td+\tfrac12L^2\|d\|^2=-|g(x)^Td|+\tfrac12L^2\|d\|^2<-\Delta_f$; hence (i) holds.

Case 2. If $g(x)^Td\ge0$, then from (9) and (10) we have (12) $f(x-d)-f(x)\le-g(x)^Td+\tfrac12L^2\|d\|^2=-|g(x)^Td|+\tfrac12L^2\|d\|^2<-\Delta_f$; hence the second inequality of (ii) holds. The first inequality of (ii) is obtained from $f(x+d)-f(x)\ge g(x)^Td-\tfrac12L^2\|d\|^2>\Delta_f$.

In exact precision arithmetic, the aim is to obtain an exact solution of the problem (1). However, in the presence of finite precision arithmetic, the algorithm may get stuck before finding an approximate solution of (1), especially in nearly flat regions of the search space. For a finite termination, the theoretical criterion (8) is used to find an $\epsilon$-approximate solution of (1).

2.5. Existing derivative-free directions

We here discuss several conjugate gradient (CG) type directions and their derivative-free variants.

Let us begin with a well-known CG method that aims to minimize an unconstrained smooth function $f:\mathbb{R}^n\to\mathbb{R}$ using the iterative formula (13) $x_{k+1}=x_k+\alpha_kd_k$, $k\ge0$, where $\alpha_k$ is a step size determined by a line search procedure. The search direction is computed as (14) $d_0:=-g(x_0)$, $d_k:=-g(x_k)+\beta_kd_{k-1}$, $k\ge1$, where $g(x_k)$ is the true gradient of $f$ at $x_k$ and $\beta_k\in\mathbb{R}$ is the CG parameter.

Some classical formulas for the CG parameter are

  1. $\beta_k^{\rm PRP}:=\dfrac{g(x_k)^Tv_{k-1}}{\|g(x_{k-1})\|^2}$ of Polak–Ribière–Polyak (PRP) [Citation36,Citation37], where $v_k:=g(x_k)-g(x_{k-1})$;

  2. $\beta_k^{\rm FR}:=\dfrac{\|g(x_k)\|^2}{\|g(x_{k-1})\|^2}$ of Fletcher–Reeves (FR) [Citation26];

  3. $\beta_k^{\rm LS}:=\dfrac{g(x_k)^Tv_{k-1}}{-d_{k-1}^Tg(x_{k-1})}$ of Liu–Storey (LS) [Citation32];

  4. $\beta_k^{\rm DY}:=\dfrac{\|g(x_k)\|^2}{d_{k-1}^Tv_{k-1}}$ of Dai–Yuan (DY) [Citation25];

  5. $\beta_k^{\rm DL}:=\dfrac{g(x_k)^Tv_{k-1}}{d_{k-1}^Tv_{k-1}}-t\dfrac{g(x_k)^Ts_{k-1}}{d_{k-1}^Tv_{k-1}}$ of Dai–Liao (DL) [Citation48], where $t\ge0$ and $s_k:=x_k-x_{k-1}$.
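For reference, the five CG parameters listed above can be evaluated directly from the current and previous gradients and directions. The MATLAB snippet below is a plain transcription of the formulas as reconstructed here; the function name and argument list are assumptions.

function B = cgBetas(gk, gkm1, dkm1, skm1, t)
% Classical CG parameters (items 1-5 above), transcribed directly (sketch).
  vkm1  = gk - gkm1;                               % gradient difference
  B.PRP = (gk'*vkm1)/norm(gkm1)^2;                 % Polak-Ribiere-Polyak
  B.FR  = norm(gk)^2/norm(gkm1)^2;                 % Fletcher-Reeves
  B.LS  = (gk'*vkm1)/(-dkm1'*gkm1);                % Liu-Storey
  B.DY  = norm(gk)^2/(dkm1'*vkm1);                 % Dai-Yuan
  B.DL  = (gk'*vkm1 - t*(gk'*skm1))/(dkm1'*vkm1);  % Dai-Liao
end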

For other CG-type directions, see the survey [Citation28].

To identify an $\epsilon$-approximate solution of (1), BLSA generates the sequence $\{x_k\}_{k\ge0}$ given by (15) $x_{k+1}=x_k-\lambda_kF(z_k)$, $\lambda_k=\dfrac{F(z_k)^T(x_k-z_k)}{\|F(z_k)\|^2}$, with the trial point $z_k=x_k+\alpha_kd_k$. As a cheap and useful choice for $d_k$ based on CG directions, this paper focuses on the following three derivative-free search directions:

  1. Motivated by the PRP method, the derivative-free direction $d_0=-F(x_0)$, $d_k=-F(x_k)+\beta_kd_{k-1}$ was proposed in [Citation24], where $\beta_k=\dfrac{F(x_k)^Ty_{k-1}}{\|F(x_{k-1})\|^2}$ and $y_k=F(x_k)-F(x_{k-1})$.

  2. Inspired by the FR method, the derivative-free direction $d_0=-F(x_0)$, $d_k=-F(x_k)+\beta_k^{\rm FR}v_{k-1}-\theta_kF(x_k)$ was proposed in [Citation35] with $\beta_k^{\rm FR}=\dfrac{\|F(x_k)\|^2}{\|F(x_{k-1})\|^2}$ and the three different choices $\theta_k^{(1)}=\dfrac{F(x_k)^Tv_{k-1}}{\|F(x_{k-1})\|^2}$, $\theta_k^{(2)}=\dfrac{\|F(x_k)\|^2\|v_{k-1}\|^2}{\|F(x_{k-1})\|^4}$, $\theta_k^{(3)}=\theta_k^{(1)}+(\beta_k^{\rm FR})^2$, where $v_k=z_k-x_k$.

  3. Motivated by the LS method, the derivative-free direction $d_0=-F(x_0)$, $d_k=-F(x_k)+\beta_k^{\rm ELS}d_{k-1}$ was proposed in [Citation49], where $\beta_k^{\rm ELS}=\dfrac{F(x_k)^Ty_{k-1}}{-F(x_{k-1})^Td_{k-1}}-t\dfrac{\|y_{k-1}\|^2F(x_k)^Td_{k-1}}{(F(x_{k-1})^Td_{k-1})^2}$ and $t\ge\tfrac14$.
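Once a trial point $z_k$ has been accepted by the line search along one of the directions above, the projection update (15) is a few lines of MATLAB; the sketch below assumes a residual handle F and column vectors xk and zk, and the function name is an assumption.

function xk_next = projectionStep(F, xk, zk)
% Projection step (15) of BLSA (sketch).
  Fz      = F(zk);
  lambda  = (Fz'*(xk - zk))/norm(Fz)^2;   % projection step size in (15)
  xk_next = xk - lambda*Fz;               % next iterate produced by (15)
end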

These methods are particularly well-suited for tackling large-scale non-smooth problems, since they utilize only function values and require minimal memory. Furthermore, the stability of the search directions is independent of the type of line search employed. It has been demonstrated that the sequence $\{x_k\}_{k\ge0}$ generated by these methods globally converges to the solution of (1), provided that the underlying mapping $F$ is monotone and $L$-Lipschitz continuous [Citation18]. In Section 5, we present and evaluate several derivative-free directions that are based on well-known CG directions and compare them with our proposed method.

3. Modified derivative-free algorithm

As mentioned in the introduction, several iterative inertial methods have been proposed in [Citation29–31] for obtaining an $\epsilon$-approximate solution of the nonlinear monotone equation (1) in Euclidean space. The authors established that the sequences generated by their methods converge globally to a solution of the problem under mild conditions. Their primary contribution is achieving an $\epsilon$-approximate solution of (1) at a faster rate.

The concepts of MP and LP points were defined in Section 2. Since BLSA cannot guarantee that the two points used to construct the inertial step are the previous MP points, the traditional inertial point has a low probability of being an MP point. To address this issue, the inertial point should ideally be constructed from the previous MP points, or at least from a point in close proximity to them. Hence, SILSA reduces the oscillation of the residual norm by moving from regions containing LP points to regions containing MP points.

3.1. Novel subspace inertial method

In this subsection, we present a novel subspace inertial method that chooses the ingredients of the subspace from the previous MP points, thereby accelerating convergence to an $\epsilon$-approximate solution of (1).

Let $\{x_k\}_{k\ge0}$ be the sequence generated by our method. At the $k$th iteration of SILSA, we store the points generated by SILSA as the columns of the matrix $X^k\in\mathbb{R}^{n\times m}$ and their residual norms as the components of the vector $NF^k:=\bigl(\|F(X^k_{:,1})\|,\dots,\|F(X^k_{:,m})\|\bigr)\in\mathbb{R}^{1\times m}$. Here the $j$th column of $X^k$ is denoted by $X^k_{:,j}$ and $m$ is the subspace dimension. We now introduce our novel subspace inertial point (16) $w_k:=x_k+e_k\sum_{j=1}^{m-1}\lambda_j\bigl(X^k_{:,j+1}-X^k_{:,j}\bigr)$, where the extrapolation step size (17) $e_k:=\min\Bigl\{e_{\max},\ \dfrac{1}{k^2\bigl\|\sum_{j=1}^{m-1}\lambda_j(X^k_{:,j+1}-X^k_{:,j})\bigr\|^2}\Bigr\}$ is computed so that the condition (18) $\sum_{k=1}^{\infty}e_k\Bigl\|\sum_{j=1}^{m-1}\lambda_j\bigl(X^k_{:,j+1}-X^k_{:,j}\bigr)\Bigr\|^2<\infty$ is satisfied. Here $0<e_{\max}\le1$ is the maximum value for $e_k$ and $0<\lambda_j<1$ for $j=1,\dots,m-1$ are called subspace step sizes, satisfying (19) $\sum_{j=1}^{m-1}\lambda_j=1$. These subspace step sizes will be chosen in Section 5 such that (19) holds. The condition (18) results in $\|\alpha_kd_k\|\to0$ (see Lemma 4.1 below), where $\alpha_k$ satisfies the line search condition (3). This result guarantees the global convergence of SILSA (see Theorem 4.3 below).

To update the matrix $X^k$ and the vector $NF^k$ at the $k$th iteration of SILSA, we replace the LP point with a new MP point (if any). Therefore, we use the new subspace inertial point (16), which involves a weighted average of the $m$ previous MP points. This increases the chance that SILSA discovers an MP point, i.e. $x_b$ with $b=\arg\min_{i=1:m}(NF^k_i)$.

It should be noted that even when $m=2$, the traditional inertial point (6) differs from our subspace inertial point (16), because (16) replaces the LP point among the previous MP points with a new point. This means that (16) is not restricted to using only the two previous points $x_{k-1}$ and $x_k$, while the traditional inertial point (6) is limited to exactly these two points. If these two points are LP points, then BLSA using (6) cannot move quickly from regions with LP points to regions with MP points, while SILSA using (16) has a good chance of having more MP points in the inertial subspace, since the subspace is updated by removing LP points as described above.
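A compact MATLAB sketch of the subspace inertial point (16) with the extrapolation step size (17) is given below; X, lam, and emax follow the notation of this subsection, the squared-norm form of (17) is the reconstruction used above, and the function name is an assumption.

function wk = subspaceInertialPoint(xk, X, lam, k, emax)
% Subspace inertial point (16)-(17) (sketch).
% xk : current iterate; X : n-by-m matrix of stored points;
% lam : subspace step sizes lambda_1..lambda_{m-1} (column vector summing to one).
  s  = (X(:,2:end) - X(:,1:end-1))*lam;      % sum_j lambda_j (X_{:,j+1} - X_{:,j})
  ek = min(emax, 1/(k^2*norm(s)^2 + eps));   % extrapolation step size (17)
  wk = xk + ek*s;                            % subspace inertial point (16)
end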

3.2. Novel derivative-free direction

Motivated by the CG method given in [Citation32], we introduce the spectral derivative-free direction (20) $d_k:=\begin{cases}-\theta_0F(w_k)&\text{if }k=0,\\-\theta_kF(w_k)+\beta_k^{\rm DFLS}d_{k-1}&\text{if }k\ge1,\end{cases}$ with the scalar parameter (21) $\beta_k^{\rm DFLS}:=\dfrac{F(w_k)^Ty_{k-1}}{-F(w_{k-1})^Td_{k-1}}$ and the spectral parameter (22) $\theta_k:=\begin{cases}c&\text{if }k=0,\\c+\beta_k^{\rm DFLS}\dfrac{F(w_k)^Td_{k-1}}{\|F(w_k)\|^2}&\text{if }k\ge1,\end{cases}$ where $0<c<1$ is given and $w_k$ comes from (16).

Lemma 3.1

The search direction $d_k$ computed by (20) satisfies the descent condition (4) for all $k\ge0$.

Proof.

For the tuning parameter $0<c<1$, we have $d_0^TF(w_0)=-c\|F(w_0)\|^2$ and, for $k\ge1$, $d_k^TF(w_k)=\bigl(-\theta_kF(w_k)+\beta_k^{\rm DFLS}d_{k-1}\bigr)^TF(w_k)=-\Bigl(c+\beta_k^{\rm DFLS}\dfrac{F(w_k)^Td_{k-1}}{\|F(w_k)\|^2}\Bigr)\|F(w_k)\|^2+\beta_k^{\rm DFLS}F(w_k)^Td_{k-1}=-c\|F(w_k)\|^2$; hence $d_k$ satisfies (4) for all $k\ge0$.
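The direction (20) with the parameters (21) and (22) takes only a few lines to evaluate. The sketch below follows the reconstruction above, including its sign conventions, so it should be read as an illustration rather than as the authors' code.

function d = dflsDirection(Fw, Fw_prev, d_prev, c, k)
% Spectral derivative-free direction (20)-(22) (sketch).
% Fw, Fw_prev : residuals at w_k and w_{k-1}; d_prev : previous direction; c in (0,1).
  if k == 0
    d = -c*Fw;                                   % theta_0 = c in (22)
  else
    y_prev = Fw - Fw_prev;                       % y_{k-1}
    beta   = (Fw'*y_prev)/(-Fw_prev'*d_prev);    % parameter (21), as reconstructed
    theta  = c + beta*(Fw'*d_prev)/norm(Fw)^2;   % spectral parameter (22)
    d      = -theta*Fw + beta*d_prev;            % direction (20); then d'*Fw = -c*norm(Fw)^2
  end
end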

3.3. Subspace inertial line search with SILSA

In this subsection, we give a detailed description of our subspace inertial derivative-free algorithm, which we call SILSA. This algorithm is designed to find an $\epsilon$-approximate solution of (1) and is an improved version of BLSA that incorporates the subspace inertial method for faster convergence. In practice, the new subspace inertial method generates points that are, at worst, close to the previous MP points. Specifically, the new method replaces the stored point with the greatest residual norm by the newly evaluated point. This substitution causes SILSA to move from regions with LP points to regions with MP points, and in practice it quickly finds an approximate solution of (1).

The SILSA algorithm incorporates several tuning parameters: $\sigma>0$ and $0<\bar\gamma<1$ (the line search parameters), $0\le\delta_{\min}<1$ (the minimum threshold for $\delta_k$), $\delta_{\min}<\delta_{\max}\le1$ ($\delta_{\max}$ is the initial value for $\delta_k$), $\omega_d>1$ (the parameter for updating $\delta_k$), $0<c<1$ (the parameter for computing $d_k$), $m\ge2$ (the subspace dimension), $r\in(0,1)$ (the parameter for reducing $\alpha_k$), and $0<e_{\max}\le1$ (the maximum value for $e_k$).

We now describe how SILSA algorithm works:

(S0)

(Initialization) First, we choose an initial point $x_0\in\mathbb{R}^n$. Next, we select the inertial weights $0<\lambda_j<1$ for $j=1,\dots,m-1$ such that the condition $\sum_{j=1}^{m-1}\lambda_j=1$ is satisfied. We then choose the initial inertial point $w_0=x_0$ and set the search direction to the negative residual vector at $x_0$. The initial parameter $e_0$ for adjusting the inertial point is set to the tuning parameter $0<e_{\max}\le1$, while the initial step size $0<\delta_0\le1$ is set to the tuning parameter $0<\delta_{\max}\le1$.

(S1)

(Line search algorithm) At the $k$th iteration, SILSA performs lineSearch (line 5) along the derivative-free direction $d_k$ ($d_k$ is computed by searchDir in line 16 for $k\ge1$). Initially, lineSearch sets $j=0$ and takes the initial step size (23) $\alpha_{k,0}:=\delta_k$. The subsequent step sizes are reduced by a given factor $0<r<1$ according to (24) $\alpha_{k,j+1}:=r\alpha_{k,j}$ for $j\ge0$, and $j$ is increased until the line search condition (25) $-F(w_k+\alpha_{k,j}d_k)^Td_k\ge\sigma\alpha_{k,j}\|F(w_k+\alpha_{k,j}d_k)\|\,\|d_k\|^2$ is satisfied. Once this condition holds, we set $\alpha_k:=\alpha_{k,j}$ and compute the accepted point (26) $z_k:=w_k+\alpha_kd_k$ and its residual vector $F(z_k)$. If $\|F(z_k)\|$ is below the given threshold $\epsilon>0$, then $z_k$ is chosen as an approximate solution of (1) and SILSA terminates.

(S2)

(Checking reduction of the residual norm) If $\|F(z_k)\|>\epsilon$, then checkDec (line 8) checks whether or not the decrease condition (27) $f(z_k)<f(w_k)-\bar\gamma\delta_k$ holds, where $f(z_k):=\tfrac12\|F(z_k)\|^2$ and $f(w_k):=\tfrac12\|F(w_k)\|^2$. Accordingly, using the tuning parameter $\omega_d>1$, it then either increases the step size $\delta_k$ of the decrease condition (27), i.e. (28) $\delta_{k+1}:=\min(\omega_d\delta_k,\delta_{\max})$, or decreases it, i.e. (29) $\delta_{k+1}:=\delta_k/\omega_d$ (a sketch of this update appears at the end of this subsection). Moreover, the extrapolation step size $e_k$ is computed by (17).

(S3)

(Projection of $w_k$ onto the hyperplane $H:=\{w\in\mathbb{R}^n\mid F(z_k)^T(w-z_k)=0\}$) The point $w_k$ is projected onto $H$ by projectPoint (line 9), and the new point (30) $x_{k+1}:=w_k-\mu_kF(z_k)$ is computed with the step size $\mu_k:=\dfrac{F(z_k)^T(w_k-z_k)}{\|F(z_k)\|^2}$, together with its residual norm $\|F(x_{k+1})\|$. If $\|F(x_{k+1})\|$ is below the given threshold $\epsilon>0$, then $x_{k+1}$ is chosen as an approximate solution of (1) and SILSA ends.

(S4)

(Update of the subspace of old points) If the residual norm at the next point $x_{k+1}$, denoted by $\|F(x_{k+1})\|$, is greater than the given threshold $\epsilon>0$, then updateSubspace updates the information of the subspace inertial point, i.e. the matrix $X^k$ and the vector $NF^k$ (defined in Subsection 3.1), in line 11. If $k+1\ge m$, the new point replaces a stored point. Specifically, we find the index $i_w$ of the stored point with the largest residual norm and replace it with the new point $x_{k+1}$, i.e. $i_w:=\arg\max_{i=1:m}\{NF^{k+1}_i\}$, $X^{k+1}_{:,i_w}=x_{k+1}$, $NF^{k+1}_{i_w}=\|F(x_{k+1})\|$. On the other hand, if $k+1<m$, we set $X^{k+1}_{:,k+1}=x_{k+1}$ and $NF^{k+1}_{k+1}=\|F(x_{k+1})\|$ (see the sketch at the end of this subsection). By adding points with lower residual norms to the subspace, this strategy increases the chance of finding an $\epsilon$-approximate solution of (1).

(S5)

(Computation of a new inertial point) After calculating the new subspace inertial point $w_{k+1}$ by (16) and determining its residual norm, if $\|F(w_{k+1})\|$ is found to be less than the given threshold $\epsilon>0$, then $w_{k+1}$ is accepted as an approximate solution of (1) and SILSA terminates. Alternatively, if $\delta_k$ is found to be less than the given threshold $\delta_{\min}>0$, then $w_k$ is accepted as an approximate solution of (1) and SILSA terminates.

(S6)

(Computation of the search direction) If $\|F(w_{k+1})\|$ is greater than $\epsilon$, the difference between the residuals at the inertial points $w_{k+1}$ and $w_k$, denoted by $y_k:=F(w_{k+1})-F(w_k)$, is computed. Then the derivative-free direction $d_{k+1}$ is computed by (20), whose parameters $\beta_{k+1}^{\rm DFLS}$ and $\theta_{k+1}$ are computed by (21) and (22) and depend on the values of $y_k$, $d_k$, $w_k$, and $w_{k+1}$.

The tuning parameters $\delta_{\max}$, $\delta_{\min}$, $\omega_d$, and $r$ appear in the complexity bound on the number of function evaluations, which is discussed in Subsection 4.3. It is worth noting that, based on the update rules (28) and (29) for $\delta_k$, we have $\delta_k\le\delta_{\max}$ for all $k$.
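The bookkeeping in steps (S2) and (S4) is straightforward; the following MATLAB sketch combines the update of $\delta_k$ in (27)-(29) with the subspace update of (S4). The function name and variable names are assumptions consistent with the notation above, and the association of (28) with a successful decrease follows the description of $I_k^c$ in Subsection 4.3.

function [delta, X, NF] = silsaBookkeeping(F, zk, wk, x_next, delta, ...
                                           gammabar, omega_d, delta_max, X, NF, k, m)
% Bookkeeping of steps (S2) and (S4) (sketch).
  fz = 0.5*norm(F(zk))^2;  fw = 0.5*norm(F(wk))^2;
  if fz < fw - gammabar*delta                  % decrease condition (27)
    delta = min(omega_d*delta, delta_max);     % rule (28): expand
  else
    delta = delta/omega_d;                     % rule (29): shrink
  end
  nfx = norm(F(x_next));                       % residual norm of the projected point
  if k+1 >= m
    [~, iw]  = max(NF);                        % stored point with the largest residual norm
    X(:,iw)  = x_next;  NF(iw)  = nfx;         % overwrite it by the new point
  else
    X(:,k+1) = x_next;  NF(k+1) = nfx;         % still filling the subspace
  end
end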

4. Convergence analysis and complexity

In this section, we first present several auxiliary results that are necessary to establish global convergence and the complexity bounds, and then the main theoretical results.

4.1. Some auxiliary results

The following results have a key role in proving global convergence.

Lemma 4.1

Let $\{w_k\}_{k\ge0}$ and $\{x_k\}_{k\ge0}$ be the two sequences generated by SILSA, assume that the assumptions (A1)–(A3) hold, and define $\Delta_k:=\max_{j=1:m-1}\|X^k_{:,j+1}-X^k_{:,j}\|$. Then:

(i)

The inequality (31) $\|x_{k+1}-x^*\|^2\le\|w_k-x^*\|^2-\sigma^2\|w_k-z_k\|^4$ holds.

(ii)

The sequence $\{x_k\}_{k\ge0}$ is bounded, $\sum_{k=0}^{\infty}\|w_k-z_k\|^4<\infty$, and so (32) $\lim_{k\to\infty}\|w_k-z_k\|=\lim_{k\to\infty}\|\alpha_kd_k\|=0$.

(iii)

$\Delta_k$ is finite and (33) $\sum_{j=1}^{m-1}\lambda_j\|X^k_{:,j+1}-X^k_{:,j}\|\le(m-1)\Delta_k<\infty$.

(iv)

There exists a positive constant $\bar\Gamma_w$ such that (34) $\|F(w_k)\|\le\bar\Gamma_w$ for all $k\ge0$.

(v)

If the direction $d_k$ is bounded, i.e. $\|d_k\|\le\Gamma_d$ for a positive constant $\Gamma_d$, then there exists a positive constant $\Gamma_z$ such that (35) $\|F(z_k)\|\le\Gamma_z$ for all $k\ge0$. Here $z_k$ is from (26).

(vi)

If $F(w_k)\ne0$ for any $k$, then $\|d_k\|\ge c\|F(w_k)\|$ and $d_k\ne0$.

Proof.

Let $x^*\in X^*\subseteq\mathbb{R}^n$ be a solution of equation (1), where $X^*$ denotes the solution set.

(i-ii)

The proof is analogous to that of [Citation29, Lemma 4.5], with the difference that the subspace inertial point (16) and the extrapolation step size (17), satisfying the condition (18), are used instead of the traditional inertial point (6) and the extrapolation step size (7), respectively.

(iii)

There exist two indices $k''>k'$ such that $\Delta_k=\max_{j=1:m-1}\|X^k_{:,j+1}-X^k_{:,j}\|=\|x_{k''}-x_{k'}\|\le\|x_{k''}\|+\|x_{k'}\|<\infty$, due to (ii). Then the condition (33) holds, because $0<m<\infty$ and $\sum_{j=1}^{m-1}\lambda_j=1$.

(iv)

From (ii), since $\{x_k\}_{k\ge0}$ is bounded, there exists a positive constant $\Gamma_0$ such that $\|x_k\|\le\Gamma_0$ for all $k\ge0$. From (17) and since $0<e_{\max}\le1$, we obtain $e_k\le e_{\max}\le1$ for all $k$. Hence, by (16) and (33), the sequence $\{w_k\}_{k\ge0}$ is bounded, i.e. $\|w_k\|=\bigl\|x_k+e_k\sum_{j=1}^{m-1}\lambda_j(X^k_{:,j+1}-X^k_{:,j})\bigr\|\le\|x_k\|+e_k\sum_{j=1}^{m-1}\lambda_j\|X^k_{:,j+1}-X^k_{:,j}\|\le\Gamma_0+2\Gamma_0(m-1)=(2m-1)\Gamma_0$. Since $F$ is continuous by (A1), (34) is valid.

(v)

From (23), (28), and since $0<\delta_{\max}\le1$, we obtain $\alpha_k\le\delta_{\max}\le1$ for all $k$. Hence, (26) and (iv) result in $\|z_k\|=\|w_k+\alpha_kd_k\|\le\|w_k\|+\|d_k\|\le(2m-1)\Gamma_0+\Gamma_d$. Therefore the continuity of $F$ implies that (35) is valid.

(vi)

From (4) and the fact that $F(w_k)\ne0$ for all $k$, we have $F(w_k)^Td_k\le-c\|F(w_k)\|^2<0$, so that $\|d_k\|\ge\dfrac{-F(w_k)^Td_k}{\|F(w_k)\|}\ge c\|F(w_k)\|>0$, resulting in $d_k\ne0$.

In the following result, under the assumption that the residual norms are bounded below, upper bounds on the search directions and lower bounds on the line search step sizes are derived.

Proposition 4.2

Let $\{x_k\}_{k\ge0}$ and $\{w_k\}_{k\ge0}$ be the two sequences generated by SILSA and assume that the assumptions (A1)–(A3) hold. If there is a positive constant $\underline\Gamma_w$ such that $\|F(w_k)\|\ge\underline\Gamma_w$ for all $k$, then the following two statements are valid:

(i)

The search directions $d_k$ are bounded above, i.e. (36) $0<\|d_k\|\le\Gamma_d:=c\bar\Gamma_w+\dfrac{4\bar\Gamma_w^2}{c\underline\Gamma_w^2\sigma}$ for all $k$, where $\sigma$ is from the line search condition (25), $\bar\Gamma_w$ is from (34), and $c$ is from (4).

(ii)

If the line search condition (25) is not satisfied by the initial step size, then the line search step sizes $\alpha_k$ are bounded, i.e. (37) $\underline\alpha:=\dfrac{rc\,\underline\Gamma_w^2}{(L+\sigma\Gamma_z)\Gamma_d^2}\le\alpha_k\le\delta_{\max}\le1$, where $r$ is from the update rule (24), $L$ is the Lipschitz constant from (A1), and $\delta_{\max}$ is a tuning parameter.

Proof.

By the Cauchy–Schwarz inequality and (25), we have $\|F(z_k)\|\,\|w_k-z_k\|\ge F(z_k)^T(w_k-z_k)\ge\sigma\alpha_k^2\|F(z_k)\|\,\|d_k\|^2=\sigma\|F(z_k)\|\,\|w_k-z_k\|^2$. Thus, we obtain (38) $\sigma\|w_k-z_k\|\le1$ for all $k\ge0$. It follows from Lemma 4.1(iv), (4), (20), (34), and (38) that $|\theta_k|=\Bigl|c-\dfrac{(F(w_k)^Ty_{k-1})(F(w_k)^Td_{k-1})}{F(w_{k-1})^Td_{k-1}\,\|F(w_k)\|^2}\Bigr|\le c+\dfrac{\|F(w_k)\|\,\|y_{k-1}\|\,\|F(w_k)\|\,\|d_{k-1}\|}{|F(w_{k-1})^Td_{k-1}|\,\|F(w_k)\|^2}$ and $|\beta_k^{\rm DFLS}|=\Bigl|\dfrac{F(w_k)^Ty_{k-1}}{F(w_{k-1})^Td_{k-1}}\Bigr|\le\dfrac{\|F(w_k)\|\,\|y_{k-1}\|}{|F(w_{k-1})^Td_{k-1}|}$, resulting in $0<\|d_k\|=\|-\theta_kF(w_k)+\beta_k^{\rm DFLS}d_{k-1}\|\le c\|F(w_k)\|+\dfrac{2\|F(w_k)\|\,\|y_{k-1}\|}{|F(w_{k-1})^Td_{k-1}|}\|d_{k-1}\|\le c\|F(w_k)\|+\dfrac{4\bar\Gamma_w^2}{c\|F(w_{k-1})\|^2}\|w_{k-1}-z_{k-1}\|\le\Gamma_d=c\bar\Gamma_w+\dfrac{4\bar\Gamma_w^2}{c\underline\Gamma_w^2\sigma}$.

(ii) From Lemma 4.1(vi), $d_k\ne0$. We show that lineSearch always terminates in a finite number of steps. From (23), we have $\alpha_{k,0}=\delta_k$. Then, according to the update rule (24), $\alpha_{k,j}=r^j\delta_k$. If the condition (25) with $\alpha_{k,j}=r^j\delta_k$ does not hold, i.e. (39) $-F(w_k+r^j\delta_kd_k)^Td_k<\sigma r^j\delta_k\|F(w_k+r^j\delta_kd_k)\|\,\|d_k\|^2$, then, letting $j$ go to infinity, we obtain $-F(w_k)^Td_k\le0$, which contradicts (4), since $\delta_k\le\delta_{\max}$, $\|d_k\|\le\Gamma_d$ (from (i)), and $F(w_k+r^j\delta_kd_k)\to F(w_k)$ with $\|F(w_k+r^j\delta_kd_k)\|\le\Gamma_z$. Hence lineSearch terminates finitely; there is a positive integer $j$ such that $\alpha_k=\alpha_{k,j}=r^j\delta_k$ satisfies (25). As long as (39) holds for $j-1$, applying (22) in (4) and using the Lipschitz continuity of $F$ from (A1), we have $c\|F(w_k)\|^2=-F(w_k)^Td_k=\bigl(F(w_k+r^{j-1}\delta_kd_k)-F(w_k)\bigr)^Td_k-F(w_k+r^{j-1}\delta_kd_k)^Td_k\le Lr^{j-1}\delta_k\|d_k\|^2+\sigma r^{j-1}\delta_k\|F(w_k+r^{j-1}\delta_kd_k)\|\,\|d_k\|^2$, leading to $\delta_{\max}\ge\alpha_k=r^j\delta_k\ge\dfrac{rc\|F(w_k)\|^2}{\bigl(L+\sigma\|F(w_k+r^{j-1}\delta_kd_k)\|\bigr)\|d_k\|^2}\ge\underline\alpha=\dfrac{rc\,\underline\Gamma_w^2}{(L+\sigma\Gamma_z)\Gamma_d^2}$ from Lemma 4.1(v).

4.2. Convergence analysis

The following result establishes the global convergence of SILSA. Variants of this result can be found in [Citation29–31], but with a different inertial point.

Theorem 4.3

Suppose that (A1)–(A3) hold and let $\{w_k\}_{k\ge0}$, $\{z_k\}_{k\ge0}$, $\{x_k\}_{k\ge0}$ be the three sequences generated by SILSA. Let $\delta_{\min}=0$. Then at least one of (40) $\lim_{k\to\infty}\|F(z_k)\|=0$, $\lim_{k\to\infty}\|F(w_k)\|=0$, $\lim_{k\to\infty}\|F(x_k)\|=0$ holds. Moreover, the sequences $\{x_k\}_{k\ge0}$ and $\{w_k\}_{k\ge0}$ converge to a solution of (1).

Proof.

If $F(w_k)=0$, then SILSA terminates and accepts $w_k$ as a solution of (1) (see line 13 of SILSA). Otherwise, SILSA continues and therefore there is a positive constant $\underline\Gamma_w$ such that $\|F(w_k)\|>\underline\Gamma_w$ holds for all $k$. Hence Proposition 4.2(i) implies that there is a positive constant $\Gamma_d$ such that $0<\|d_k\|\le\Gamma_d$ for all $k$ ($d_k\ne0$ from Lemma 4.1(vi)). Moreover, if $F(w_k+r^j\delta_kd_k)=0$, then SILSA terminates and accepts $z_k=w_k+r^j\delta_kd_k$ as a solution of (1) (here $j$ is the positive integer such that $\alpha_k=\alpha_{k,j}=r^j\delta_k$ satisfies (25)). Otherwise, from Lemma 4.1(v), there is a positive constant $\Gamma_z$ such that $0<\|F(z_k)\|\le\Gamma_z$ with $z_k=w_k+r^j\delta_kd_k$. As such, the assumptions of Proposition 4.2(ii) are verified, and this proposition implies that there is a positive constant $\underline\alpha$ such that $\alpha_k\ge\underline\alpha$ for all $k$. Hence, from Lemma 4.1(vi), we obtain $\|\alpha_kd_k\|\ge\underline\alpha c\|F(w_k)\|>\underline\alpha c\,\underline\Gamma_w>0$ for all $k$, which contradicts (32). Therefore $\lim_{k\to\infty}\|F(w_k)\|=0$ is obtained. From (32), $\lim_{k\to\infty}\|x_k-w_k\|=0$, and the continuity of $F$ results in $\lim_{k\to\infty}\bigl|\|F(x_k)\|-\|F(w_k)\|\bigr|\le\lim_{k\to\infty}\|F(x_k)-F(w_k)\|\le L\lim_{k\to\infty}\|x_k-w_k\|=0$, which consequently implies (41) $\lim_{k\to\infty}\|F(x_k)\|=0$. From the continuity of $F$, the boundedness of $\{x_k\}_{k\ge0}$, and (41), the sequence $\{x_k\}_{k\ge0}$ generated by SILSA has an accumulation point $x^*$ such that $F(x^*)=0$. On the other hand, the sequence $\{\|x_k-x^*\|\}_{k\ge0}$ is convergent by Lemma 4.1, which means that the sequence $\{x_k\}_{k\ge0}$ globally converges to the solution $x^*$ of (1).

4.3. Complexity results

This section investigates the complexity of SILSA. First, we establish an upper bound on the number of function evaluations required by lineSearch. Then we determine an upper bound on the number of iterations needed by SILSA, with or without a reduction in the residual norms. Consequently, we derive an upper bound on the total number of function evaluations needed by SILSA to find an approximate solution of (1).

Proposition 4.4

Let $\{x_k\}_{k\ge0}$ be the sequence generated by SILSA and assume that (A1)–(A3) hold. Assuming that the initial step size is bounded by $0<\delta_{\max}\le1$ and that the parameter $0<r<1$ is used to decrease the step size in lineSearch, the number $n_f$ of function evaluations used by lineSearch (line 5 of SILSA) is bounded by $\log_{1/r}\bigl(\delta_{\max}/\underline\alpha\bigr)$, where $\underline\alpha$ is the positive constant from Proposition 4.2(ii).

Proof.

From Proposition 4.2(ii), we have $\alpha_{k,n_f}\ge\underline\alpha$ for all $k$. By (23) and (24), we have $r^{n_f}\delta_{\max}\ge\alpha_{k,n_f}=r^{n_f}\delta_k\ge\underline\alpha$, leading to $n_f\le\log_{1/r}\bigl(\delta_{\max}/\underline\alpha\bigr)$ since $r\in(0,1)$.
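As a numerical illustration of this bound (with an arbitrarily assumed value for the unknown constant $\underline\alpha$, here called alpha_low), the bound can be evaluated directly in MATLAB:

% Illustration of the bound in Proposition 4.4 with assumed parameter values.
r = 0.5;  delta_max = 0.5;  alpha_low = 1e-6;    % alpha_low plays the role of underline-alpha
nf_bound = log(delta_max/alpha_low)/log(1/r);    % log_{1/r}(delta_max/alpha_low), about 18.9
fprintf('at most %d backtracking steps per lineSearch call\n', ceil(nf_bound));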

By means of (27), we define the index set $I_k$ as the set of all iterations $k$ such that $f(z_k)<f(w_k)-\bar\gamma\delta_k$. This set comprises the iterations exhibiting at least a $\bar\gamma\delta_k$ reduction in the residual norms, while the index set $I_k^c$ is the complement of $I_k$.

Theorem 4.5

Let $\{w_k\}_{k\ge0}$, $\{z_k\}_{k\ge0}$, $\{x_k\}_{k\ge0}$ be the sequences generated by SILSA, let $x_\epsilon$ be an $\epsilon$-approximate solution of (1) found by SILSA, and assume that (A1)–(A3) hold. Moreover, the tuning parameters $0<\bar\gamma<1$ (line search parameter), $0<\delta_{\min}<\delta_{\max}\le1$ (minimal threshold and initial value for the step size $\delta_k$), $0<r<1$ (parameter for reducing the step size in lineSearch), and $1<\omega_d<\infty$ (parameter for updating the step size $\delta_k$) are given. Then the following statements are valid:

(i)

The number of iterations of SILSA with reductions in the residual norms is bounded by (42) $|I_k|\le\dfrac{f(x_0)-f(x_\epsilon)}{\bar\gamma\delta_{\min}}$.

(ii)

The number of iterations of SILSA without reductions in the residual norms is bounded by (43) $|I_k^c|\le\log_{\omega_d}\dfrac{\delta_{\max}}{\delta_{\min}}$.

(iii)

The number of iterations of SILSA is bounded by $N=|I_k|+|I_k^c|\le\dfrac{f(x_0)-f(x_\epsilon)}{\bar\gamma\delta_{\min}}+\log_{\omega_d}\dfrac{\delta_{\max}}{\delta_{\min}}=\mathcal O(\delta_{\min}^{-1})$.

(iv)

The number of function evaluations of SILSA is bounded by $n_f^{\rm total}\le N\log_{1/r}\dfrac{\delta_{\max}}{\underline\alpha}$.

(v)

If there is a positive constant $M_0$ such that (44) $\|g(w_k)\|\ge M_0\|F(w_k)\|$ for all $k$ and SILSA has no iteration with a reduction in the residual norm, then SILSA finds at least one point $w_k$ satisfying $\|F(w_k)\|=\mathcal O(\epsilon)$ within at most $\mathcal O(\epsilon^{-2})$ function evaluations. Here $g(w_k)=J(w_k)^TF(w_k)$ comes from Section 2.

Proof.

  1. The index set $I_k$ is defined as $\{k\mid f(z_k)<f(w_k)-\bar\gamma\delta_k\}$. By the definition of $I_k$, we have $f(x_0)-f(x_\epsilon)\ge\sum_{j\in I_k}\bigl(f(w_j)-f(z_j)\bigr)\ge\bar\gamma\sum_{j\in I_k}\delta_j\ge\bar\gamma\sum_{j\in I_k}\delta_{\min}=|I_k|\,\bar\gamma\delta_{\min}$, which yields the result in (42).

  2. The set $I_k^c$ is defined as $\{1,2,\dots,k\}\setminus I_k$. The update $\delta_k=\delta_{k-1}/\omega_d$ guarantees that $\delta_{\min}\le\delta_k\le\delta_{\max}$, which leads to the bound (43).

  3. Combining the results from (i) and (ii) yields the desired outcome.

  4. The result is obtained from (iii) and Proposition 4.4.

  5. Consider any $j\in\mathbb{N}\cup\{0\}$ with $j<\infty$. During the execution of lineSearch, the trial points $w_k+\alpha_{k,j}d_k$ are generated, the last of which satisfies the line search condition (25) and is accepted as $z_k$. However, the condition (27) along $\pm d_k$ may not be satisfied. In the worst case, we assume that (27) is not satisfied. Then, by applying Proposition 2.1(iii), we have $|g(w_k)^T(\alpha_kd_k)|\le\bar\gamma\delta_k+\tfrac12L^2\|\alpha_kd_k\|^2$. We now consider the following two cases:

    • Case 1: If $\|F(w_k)\|\le\epsilon:=\delta_{\min}$, then $x_k=w_k$ is an $\epsilon$-approximate solution of (1). Hence SILSA finds a point $x_k$ whose residual norm is at most $\epsilon$ with at most $\mathcal O(\epsilon^{-2})$ function evaluations.

    • Case 2: Assuming that $\|F(w_k)\|>\epsilon$ for all $k$, we can apply Proposition 4.2(i) to obtain the condition (36), which ensures that $\|d_k\|\le\Gamma_d$ for all $k$. Additionally, Lemma 4.1(v) and Proposition 4.2(ii) guarantee that the condition (37) holds, i.e. $\alpha_k\ge\underline\alpha$ for all $k$. Consequently, after a finite number of iterations, SILSA terminates due to the update rule (29) for $\delta_k$, which implies the existence of a positive integer $k_0$ such that $\alpha_k\le\delta_k\le\delta_{\min}$ for $k\ge k_0$. Considering the worst-case scenario where there is no reduction of the residual norm at $z_k$ for any $k$ (i.e. $I_k$ is empty), we can use Proposition 2.1, (36), (37), and (44) to obtain $M_0|F(w_k)^T(\alpha_kd_k)|\le|g(w_k)^T(\alpha_kd_k)|\le\bar\gamma\delta_k+\tfrac12L^2\|\alpha_kd_k\|^2\le\bar\gamma\delta_{\min}+\tfrac12L^2\alpha_k^2\Gamma_d^2$ for all $k\ge k_0$, i.e. $c\|F(w_k)\|^2=|F(w_k)^Td_k|\le\dfrac{\bar\gamma\delta_{\min}}{M_0\alpha_k}+\dfrac{L^2\Gamma_d^2}{2M_0}\alpha_k\le\Bigl(\dfrac{\bar\gamma}{M_0\underline\alpha}+\dfrac{L^2\Gamma_d^2}{2M_0}\Bigr)\delta_{\min}$. Combining the results of the two cases, we obtain $\|F(w_k)\|=\mathcal O(\epsilon)=\mathcal O(\delta_{\min})$, which completes the proof.

As stated in the introduction, our complexity bound is of the same order as the bounds obtained by Cartis et al. [Citation50], Curtis et al. [Citation51], Dodangeh and Vicente [Citation52], Dodangeh et al. [Citation53], Kimiaei and Neumaier [Citation47], and Vicente [Citation54] for other optimization methods.

5. Numerical experiments

In this section, we present a comparative analysis of our algorithm SILSA with 10 well-known algorithms (discussed below) on a set of 18 test problems with the dimensions $n\in\{10,50,300,500,1000,5000\}$. This results in a total of 108 test functions. The Matlab codes of these 18 problems are described in Section 7. To ensure that all test problems used are monotone in finite precision arithmetic, we randomly generated $10^6$ distinct points $x$ and $y$ and verified that the condition $(x-y)^T(F(x)-F(y))\ge-\bigl(|x-y|^T(|F(x)|+|F(y)|)+1\bigr)\cdot0.1$ was fulfilled.
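The randomized monotonicity check described above can be scripted directly; the sketch below follows the reconstructed form of the test, and the slack value, the number of samples, the problem dimension, the sampling distribution, and the function name are all assumptions.

function ok = checkMonotone(F, n, nSamples, tol)
% Randomized monotonicity check (sketch); F : residual handle, n : dimension.
  ok = true;
  for trial = 1:nSamples
    x = randn(n,1);  y = randn(n,1);
    lhs = (x - y)'*(F(x) - F(y));
    if lhs < -(abs(x - y)'*(abs(F(x)) + abs(F(y))) + 1)*tol
      ok = false;  return;                   % monotonicity violated beyond the slack
    end
  end
end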

We compare SILSA with the following algorithms:

  1. BLSA-DY, BLSA with the derivative-free direction using the CG direction of Dai and Yuan [Citation25].

  2. BLSA-HZ, BLSA with the derivative-free direction using the CG direction of Hager and Zhang [Citation27].

  3. BLSA-PR, BLSA with the derivative-free direction using the CG direction of Polak–Ribière–Polyak (PRP) [Citation36,Citation37].

  4. BLSA-FR, BLSA with the derivative-free direction using the CG direction of Fletcher and Reeves (FR) [Citation26].

  5. BLSA-3PR, BLSA with the derivative-free direction using the CG direction of Zhang et al. [Citation38].

  6. BLSA-3A, BLSA with the derivative-free direction using the CG direction of Andrei [Citation22].

  7. BLSA-IM, BLSA with the derivative-free direction of Ivanov et al. [Citation55].

  8. BLSA-AK, BLSA with the derivative-free direction of Abubakar and Kumam [Citation56].

  9. BLSA-HD, BLSA with the derivative-free direction of Huang et al. [Citation57].

  10. BLSA-SS, BLSA with the derivative-free direction of Sabi'u et al. [Citation58].

In our comparison, the line search parameters for all algorithms were set to $\sigma=0.01$ and $r=0.5$. We performed a tuning process to choose these two values for the selected test problems. The other tuning parameters of all algorithms were set to their default values. For SILSA, the default values of the tuning parameters are as follows:

$\delta_{\max}=0.5$ (the initial value of the step size $\delta_k$), $\delta_{\min}=0$ (the minimum threshold for the step size $\delta_k$), $\omega_d=2$ (the parameter for updating the step size $\delta_k$), $c=0.5$ (the direction parameter), $e_{\max}=10^{-4}$ (the maximum value for $e_k$), $\bar\gamma=10^{-20}$ (the line search parameter), $\lambda_i^0=\ln(\mu+\tfrac12)-\ln i$ (the initial values for the weights), and $m=10$ (the subspace dimension). Here $\mu=4+3\ln n$ was chosen, and the normalized weights $\lambda_i:=\lambda_i^0/\sum_{j=1}^{m-1}\lambda_j^0$ for $i=1,2,\dots,m-1$ were used.
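The subspace step sizes can be generated and normalized in a few lines of MATLAB; the formula for the initial weights below follows the reconstruction above and should be treated as an assumption, while the normalization enforcing (19) is exactly as stated.

function lam = subspaceWeights(n, m)
% Construction of the subspace step sizes lambda_1..lambda_{m-1} (sketch).
  mu   = 4 + 3*log(n);
  i    = (1:m-1)';
  lam0 = log(mu + 1/2) - log(i);    % assumed form of the initial weights lambda_i^0
  lam  = lam0/sum(lam0);            % normalization enforcing (19): sum(lam) = 1
end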

Following the data profile of Moré and Wild [Citation59] and the performance profile of Dolan and Moré [Citation60],

  • the data profile $\delta_s(\kappa)$ of the solver $s$ for a positive value of $\kappa$ measures the fraction of problems that the solver $s$ can solve with at most $\kappa(n+1)$ function evaluations, where $n$ is the dimension of the problem,

  • the performance profile $\rho_s(\tau)$ of the solver $s$ for a positive value of $\tau$ measures the relative efficiency of the solver $s$ in solving the set of problems.

In particular, the fraction of problems on which the solver $s$ wins compared to the other solvers is $\rho_s(1)$, and the fraction of problems that the solver $s$ can solve for sufficiently large $\tau$ (or $\kappa$) is $\rho_s(\tau)$ (or $\delta_s(\kappa)$). The efficiency measure considered in this paper is the number $n_f$ of function evaluations; the efficiency with respect to $n_f$ is called nf efficiency. All algorithms terminated when one of the conditions $\|F(x_\epsilon)\|\le10^{-5}$, $n_f\ge n_f^{\max}=10000$, and $\mathrm{sec}\ge\mathrm{sec}_{\max}=360$ sec was satisfied. Here sec denotes the time in seconds.

To evaluate their robustness and efficiency, we plot the data and performance profiles of all algorithms. From Figure 1, we conclude that SILSA is competitive with the other algorithms. Of the 112 test functions, SILSA is able to solve 95% of them, while also having the lowest number of function evaluations on 45% of these problems.


Figure 1. Results for problems with $n\in\{10,50,300,500,1000,5000\}$, the maximum number of function evaluations ($n_f^{\max}=10000$), the maximum time in seconds ($\mathrm{sec}_{\max}=360$ sec), and $\epsilon=10^{-5}$: data profiles $\delta(\kappa)$ (left) in dependence of a bound $\kappa$ on the cost ratio, and performance profiles $\rho(\tau)$ (right) in dependence of a bound $\tau$ on the performance ratio, in terms of $n_f$. Problems solved by no solver are ignored.

6. Conclusion

This paper discussed an improved derivative-free line search method for nonlinear monotone equations. Our line search combines the basic line search algorithm proposed by Solodov and Svaiter [Citation18] with a novel subspace inertial point whose goal is to speed up reaching an approximate solution of nonlinear monotone equations. The subspace is generated from a finite number of the previous MP points, and the point with the largest residual norm among them is replaced by a newly evaluated point. Global convergence and worst-case complexity results were proved. Numerical results show that our improved line search method is competitive with state-of-the-art derivative-free methods.

7. Test problems in Matlab

The Matlab codes of all 18 test problems are as follows:

Problems 1–4, 6–8, 10, and 13 are from [Citation46], problem 5 is from [Citation14], problems 9, 14, and 15 are from [Citation61], problem 12 is from [Citation62], and problems 16–18 are from the present paper. To obtain problems 16–18, the nonlinear complementarity problem $(x,s)\ge0$, $s=F(x)$, $x^Ts=0$ is converted to the nonsmooth equations (45) $\begin{pmatrix}s-F(x)\\ \min\{x,s\}\end{pmatrix}=0$, where $F$ is a monotone operator. Following [Citation63], by defining $\phi(\mu,a,b):=a+b-\sqrt{(a-b)^2+4\mu}$ for all $(\mu,a,b)\in\mathbb{R}^3$, the problem (45) is transformed into $\begin{pmatrix}s-F(x)\\ \phi(\mu,x_1,s_1)\\ \vdots\\ \phi(\mu,x_n,s_n)\end{pmatrix}=0$. For all test problems, the initial points were chosen as $x_i=i/(i+2)$ for $i=1,\dots,n$.
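For completeness, the smoothed reformulation used to build problems 16–18 can be coded as a single residual function; the operator handle Fop, the stacking of x and s into one vector z, and the function name are assumptions of this sketch.

function res = ncpResidual(z, Fop, mu)
% Smoothed reformulation of the NCP used to build problems 16-18 (sketch).
% Fop : handle for the monotone operator F; mu >= 0 : smoothing parameter;
% z = [x; s] stacks the variables x and the slacks s.
  n   = numel(z)/2;
  x   = z(1:n);  s = z(n+1:end);
  phi = x + s - sqrt((x - s).^2 + 4*mu);   % phi(mu, x_i, s_i), applied componentwise
  res = [s - Fop(x); phi];                 % smoothed version of system (45)
end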

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The first author acknowledges financial support of the Austrian Science Foundation under Project No. P 34317. The second author is grateful to King Fahd University of Petroleum and Minerals for providing excellent research facilities. The third author is supported by an FWO junior postdoctoral fellowship [12AK924N]. In addition, she received funding from the Flemish Government (AI Research Program). Susan Ghaderi is affiliated with Leuven.AI - KU Leuven Institute for AI, B-3000, Leuven, Belgium.

References

  • Chorowski J, Zurada JM. Learning understandable neural networks with nonnegative weight constraints. IEEE Trans Neural Netw Learn Syst. 2015;26(1):62–69. doi: 10.1109/TNNLS.2014.2310059
  • Blumensath T. Compressed sensing with nonlinear observations and related nonlinear optimization problems. IEEE Trans Inf Theory. 2013;59:3466–3474. doi: 10.1109/TIT.2013.2245716
  • Candes EJ, Li X, Soltanolkotabi M. Phase retrieval via wirtinger flow: theory and algorithms. IEEE Trans Inf Theory. 2015 Apr;61(4):1985–2007. doi: 10.1109/TIT.2015.2399924
  • Dirkse SP, Ferris MC. Mcplib: a collection of nonlinear mixed complementarity problems. Optim Methods Softw. 1995 Jan;5(4):319–345. doi: 10.1080/10556789508805619
  • Ahookhosh M, Amini K, Bahrami S. Two derivative-free projection approaches for systems of large-scale nonlinear monotone equations. Numer Algorithms. 2012 Oct;64(1):21–42. doi: 10.1007/s11075-012-9653-z
  • Ahookhosh M, Artacho FJA, Fleming RMT, et al. Local convergence of the Levenberg–Marquardt method under Hölder metric subregularity. Adv Comput Math. 2019 Jun;45(5-6):2771–2806. doi: 10.1007/s10444-019-09708-7
  • Ahookhosh M, Fleming RMT, Vuong PT. Finding zeros of hölder metrically subregular mappings via globally convergent Levenberg–Marquardt methods. Optim Methods Softw. 2020 Jan;37(1):113–149. doi: 10.1080/10556788.2020.1712602
  • Amini K, Shiker MA, Kimiaei M. A line search trust-region algorithm with nonmonotone adaptive radius for a system of nonlinear equations. 4OR. 2016;14(2):133–152. doi: 10.1007/s10288-016-0305-3
  • Brown PN, Saad Y. Convergence theory of nonlinear Newton–Krylov algorithms. SIAM J Optim. 1994;4(2):297–330. doi: 10.1137/0804017
  • Dennis JE, Moré JJ. A characterization of superlinear convergence and its application to quasi-Newton methods. Math Comput. 1974;28(126):549–560. doi: 10.1090/mcom/1974-28-126
  • Esmaeili H, Kimiaei M. A trust-region method with improved adaptive radius for systems of nonlinear equations. Math Methods Oper Res. 2016;83(1):109–125. doi: 10.1007/s00186-015-0522-0
  • Kimiaei M. Nonmonotone self-adaptive Levenberg–Marquardt approach for solving systems of nonlinear equations. Numer Funct Anal Optim. 2018;39(1):47–66. doi: 10.1080/01630563.2017.1351988
  • Kimiaei M, Esmaeili H. A trust-region approach with novel filter adaptive radius for system of nonlinear equations. Numer Algorithms. 2016;73(4):999–1016. doi: 10.1007/s11075-016-0126-7
  • Li D, Fukushima M. A globally and superlinearly convergent Gauss–Newton-based bfgs method for symmetric nonlinear equations. SIAM J Numer Anal. 1999;37(1):152–172. doi: 10.1137/S0036142998335704
  • Yamashita N, Fukushima M. On the rate of convergence of the Levenberg–Marquardt method. In: Alefeld G, Chen X, editors. Topics in numerical analysis. Springer Vienna; 2001. p. 239–249.
  • Yuan Y. Recent advances in numerical methods for nonlinear equations and nonlinear least squares. NACO. 2011;1(1):15–34. doi: 10.3934/naco.2011.1.15
  • Zhou G, Toh KC. Superlinear convergence of a Newton-type algorithm for monotone equations. J Optim Theory Appl. 2005;125(1):205–221. doi: 10.1007/s10957-004-1721-7
  • Solodov MV, Svaiter BF. A globally convergent inexact Newton method for systems of monotone equations. In: Reformulation: Nonsmooth, piecewise smooth, semismooth and smoothing methods. Springer; 1998. p. 355–369.
  • Al-Baali M, Narushima Y, Yabe H. A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization. Comput Optim Appl. 2014 May;60(1):89–110. doi: 10.1007/s10589-014-9662-z
  • Aminifard Z, Babaie-Kafaki S. A modified descent Polak–Ribiére–Polyak conjugate gradient method with global convergence property for nonconvex functions. Calcolo. 2019;56(2):1–11. doi: 10.1007/s10092-019-0312-9
  • Aminifard Z, Hosseini A, Babaie-Kafaki S. Modified conjugate gradient method for solving sparse recovery problem with nonconvex penalty. Signal Process. 2022;193:108424. doi: 10.1016/j.sigpro.2021.108424
  • Andrei N. A simple three-term conjugate gradient algorithm for unconstrained optimization. Comput Appl Math. 2013;241:19–29. doi: 10.1016/j.cam.2012.10.002
  • Andrei N. Nonlinear conjugate gradient methods for unconstrained optimization. 1. Cham, Switzerland: Springer Cham; 2020.
  • Cheng W. A PRP type method for systems of monotone equations. MCM. 2009;50(1-2):15–20.
  • Dai YH, Yuan Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J Optim. 1999;10(1):177–182. doi: 10.1137/S1052623497318992
  • Fletcher R, Reeves CM. Function minimization by conjugate gradients. Comput J. 1964;7(2):149–154. doi: 10.1093/comjnl/7.2.149
  • Hager WW, Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J Optim. 2005;16(1):170–192. doi: 10.1137/030601880
  • Hager WW, Zhang H. A survey of nonlinear conjugate gradient methods. Pacific J Optim. 2006;2(1):35–58.
  • Ibrahim A, Kumam P, Abubakar AB, et al. A method with inertial extrapolation step for convex constrained monotone equations. J Inequal Appl. 2021;2021(1):1–25. doi: 10.1186/s13660-021-02719-3
  • Ibrahim AH, Kumam P, Abubakar AB, et al. Accelerated derivative-free method for nonlinear monotone equations with an application. Numer Linear Algebra Appl. 2021 Nov;29(3):e2424. doi: 10.1002/nla.v29.3
  • Ibrahim AH, Kumam P, Sun M, et al. Projection method with inertial step for nonlinear equations: application to signal recovery. J Ind Manag. 2021;19:30–55.
  • Liu Y, Storey C. Efficient generalized conjugate gradient algorithms, part 1: theory. J Optim Theory Appl. 1991;69(1):129–137. doi: 10.1007/BF00940464
  • Liu Y, Zhu Z, Zhang B. Two sufficient descent three-term conjugate gradient methods for unconstrained optimization problems with applications in compressive sensing. J Appl Math Comput. 2021;68:1787–1816. doi: 10.1007/s12190-021-01589-8
  • Lotfi M, Mohammad Hosseini S. An efficient hybrid conjugate gradient method with sufficient descent property for unconstrained optimization. Optim Methods Softw. 2021;37:1725–1739. doi: 10.1080/10556788.2021.1977808
  • Papp Z, Rapajić S. FR type methods for systems of large-scale nonlinear monotone equations. Appl Math Comput. 2015;269:816–823.
  • Polak E, Ribiere G. Note sur la convergence de méthodes de directions conjuguées. ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique. 1969;3(R1):35–43.
  • Polyak BT. The conjugate gradient method in extremal problems. USSR Comput Math Math Phys. 1969;9(4):94–112. doi: 10.1016/0041-5553(69)90035-4
  • Zhang L, Zhou W, Li DH. A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J Numer. 2006;26(4):629–640. doi: 10.1093/imanum/drl016
  • Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in hilbert space. SIAM J Optim. 2004 Jan;14(3):773–782. doi: 10.1137/S1052623403427859
  • Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Var Anal. 2001;9(1):3–11. doi: 10.1023/A:1011253113155
  • Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J Optim. 2014;24(1):232–256. doi: 10.1137/130910294
  • Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202. doi: 10.1137/080716542
  • Boţ RI, Grad SM. Inertial forward–backward methods for solving vector optimization problems. Optimization. 2018 Feb;67(7):959–974. doi: 10.1080/02331934.2018.1440553
  • Boţ RI, Nguyen DK. A forward-backward penalty scheme with inertial effects for monotone inclusions, applications to convex bilevel programming. Optimization. 2018 Dec;68(10):1855–1880.
  • Boţ R, Sedlmayer M, Vuong PT. A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs. arXiv e-prints, arXiv:2003.07886, 2020 Mar.
  • Ibrahim AH, Kimiaei M, Kumam P. A new black box method for monotone nonlinear equations. Optimization. 2021 Nov;72:1119–1137. doi: 10.1080/02331934.2021.2002326
  • Kimiaei M, Neumaier A. Efficient global unconstrained black box optimization. Math Program Comput. 2022 Feb;14:365–414. doi: 10.1007/s12532-021-00215-9
  • Dai YH, Liao LZ. New conjugacy conditions and related nonlinear conjugate gradient methods. Appl Math Optim. 2001 Jan;43(1):87–101. doi: 10.1007/s002450010019
  • Li M. An Liu-Storey-type method for solving large-scale nonlinear monotone equations. Numer Funct Anal Optim. 2014;35(3):310–322. doi: 10.1080/01630563.2013.812656
  • Cartis C, Sampaio P, Toint P. Worst-case evaluation complexity of non-monotone gradient-related algorithms for unconstrained optimization. Optimization. 2014 Jan;64(5):1349–1361. doi: 10.1080/02331934.2013.869809
  • Curtis FE, Lubberts Z, Robinson DP. Concise complexity analyses for trust region methods. Optim Lett. 2018 Jun;12(8):1713–1724. doi: 10.1007/s11590-018-1286-2
  • Dodangeh M, Vicente LN. Worst case complexity of direct search under convexity. Math Program. 2014 Nov;155(1-2):307–332. doi: 10.1007/s10107-014-0847-0
  • Dodangeh M, Vicente LN, Zhang Z. On the optimal order of worst case complexity of direct search. Optim Lett. 2015 Jun;10(4):699–708. doi: 10.1007/s11590-015-0908-1
  • Vicente L. Worst case complexity of direct search. EURO J Comput Optim. 2013 May;1(1-2):143–153. doi: 10.1007/s13675-012-0003-7
  • Ivanov B, Milovanović GV, Stanimirović PS. Accelerated dai-liao projection method for solving systems of monotone nonlinear equations with application to image deblurring. J Glob Optim. 2022 Jul;85(2):377–420. doi: 10.1007/s10898-022-01213-4
  • Abubakar AB, Kumam P. A descent Dai-Liao conjugate gradient method for nonlinear equations. Numer Algorithms. 2018 May;81(1):197–210. doi: 10.1007/s11075-018-0541-z
  • Huang F, Deng S, Tang J. A derivative-free memoryless BFGS hyperplane projection method for solving large-scale nonlinear monotone equations. Soft Comput. 2022 Oct;27(7):3805–3815. doi: 10.1007/s00500-022-07536-4
  • Sabi'u J, Shah A, Stanimirović PS, et al. Modified optimal Perry conjugate gradient method for solving system of monotone equations with applications. Appl Numer Math. 2023 Feb;184:431–445. doi: 10.1016/j.apnum.2022.10.016
  • Moré JJ, Wild SM. Benchmarking derivative-free optimization algorithms. SIAM J Optim. 2009 Jan;20(1):172–191. doi: 10.1137/080724083
  • Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Math Program. 2002 Jan;91(2):201–213. doi: 10.1007/s101070100263
  • La Cruz W, Martínez J, Raydan M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Math Comput. 2006;75(255):1429–1448. doi: 10.1090/S0025-5718-06-01840-0
  • Gao P, He C. An efficient three-term conjugate gradient method for nonlinear monotone equations with convex constraints. Calcolo. 2018 Nov;55(4):53. doi: 10.1007/s10092-018-0291-2
  • Geiger C, Kanzow C. On the resolution of monotone complementarity problems. Comput Optim Appl. 1996 Mar;5(2):155–173. doi: 10.1007/BF00249054