Optimization
A Journal of Mathematical Programming and Operations Research
Research Article

A fresh look at nonsmooth Levenberg–Marquardt methods with applications to bilevel optimization

Received 06 Jun 2023, Accepted 29 Jan 2024, Published online: 12 Mar 2024

Abstract

In this paper, we revisit the classical problem of solving over-determined systems of nonsmooth equations numerically. We suggest a nonsmooth Levenberg–Marquardt method for its solution which, in contrast to the existing literature, does not require local Lipschitzness of the data functions. This is possible when using Newton-differentiability instead of semismoothness as the underlying tool of generalized differentiation. Conditions for local fast convergence of the method are given. Afterwards, in the context of over-determined mixed nonlinear complementarity systems, our findings are applied, and globalized solution methods, based on a residual induced by the maximum and the Fischer–Burmeister function, respectively, are constructed. The assumptions for local fast convergence are worked out and compared. Finally, these methods are applied for the numerical solution of bilevel optimization problems. We recall the derivation of a stationarity condition taking the shape of an over-determined mixed nonlinear complementarity system involving a penalty parameter, formulate assumptions for local fast convergence of our solution methods explicitly, and present results of numerical experiments. Particularly, we investigate whether the treatment of the appearing penalty parameter as an additional variable is beneficial or not.


1. Introduction

Nowadays, the numerical solution of nonsmooth equations is a classical topic in computational mathematics. Looking back roughly 30 years, Qi and colleagues developed the celebrated nonsmooth Newton method, see [Citation1, Citation2], based on the notion of semismoothness for locally Lipschitzian functions which is due to Mifflin, see [Citation3]. Keeping the convincing local convergence properties of Newton-type methods in mind, this allowed for a fast numerical solution of Karush–Kuhn–Tucker systems, associated with constrained nonlinear optimization problems with inequality constraints, without the need of smoothing or relaxing the involved complementarity slackness condition. Indeed, the latter can be reformulated as a nonsmooth equation with the aid of so-called NCP-functions, see [Citation4–6] for an overview, where NCP abbreviates nonlinear complementarity problem. Recall that a continuous function $\varphi\colon\mathbb R^2\to\mathbb R$ is referred to as an NCP-function whenever
$$\forall(a,b)\in\mathbb R^2\colon\quad \varphi(a,b)=0\ \Longleftrightarrow\ a\le 0,\ b\le 0,\ ab=0$$
is valid. A rather prominent example of an NCP-function is the famous Fischer–Burmeister (FB) function $\varphi_{\mathrm{FB}}\colon\mathbb R^2\to\mathbb R$ given by
$$\forall(a,b)\in\mathbb R^2\colon\quad \varphi_{\mathrm{FB}}(a,b):=a+b+\sqrt{a^2+b^2},\tag{1}$$
see [Citation7] for its origin. Another popular NCP-function is the maximum function $(a,b)\mapsto\max(a,b)$. We refer the reader to [Citation7–10] for further early references dealing with semismooth Newton-type methods to solve nonlinear complementarity systems. Let us note that even prior to the development of semismooth Newton methods 30 years ago, the numerical solution of nonsmooth equations via Newton-type algorithms had been addressed in the literature, see e.g. [Citation11–15]. We refer the interested reader to the monographs [Citation16–18] for a comprehensive overview.

Let us mention some popular directions of research which have been enriched or motivated by the theory of semismooth Newton methods. For example, this theory has been extended to Levenberg–Marquardt (LM) methods in order to allow for the treatment of over-determined or irregular systems of equations, see e.g. [Citation10, Citation19–22]. It has also been applied in order to solve so-called mixed nonlinear complementarity problems which are a natural extension of nonlinear complementarity problems and may also involve pure (smooth) equations, see e.g. [Citation23–28]. Recently, the variational notion of semismoothness* for set-valued mappings has been introduced in [Citation29] in order to discuss the convergence properties of Newton's method for the numerical solution of so-called generalized equations given via set-valued operators. There also exist infinite-dimensional extensions of nonsmooth Newton methods, see e.g. [Citation30–36]. In contrast to the finite-dimensional case, in infinite dimensions, the underlying concept of semismoothness is often replaced by so-called Newton-differentiability. In principle, this notion of generalized differentiation encapsulates all necessities to address the convergence analysis of associated Newton-type methods properly. Notably, it does not rely on the Lipschitzness of the underlying mapping. In the recent paper [Citation37], this favourable property of Newton-derivatives has been used to construct a Newton-type method for the numerical solution of a certain stationarity system associated with so-called mathematical programs with complementarity constraints, as these stationarity conditions can be reformulated as a system of discontinuous equations.

Let us point the reader's attention to the fact that some stationarity conditions for bilevel optimization problems, see [Citation38, Citation39] for an introduction, can be reformulated as (over-determined) systems of nonsmooth equations involving an (unknown) penalization parameter λ. This observation has been used in the recent papers [Citation40–42] in order to solve bilevel optimization problems numerically. In [Citation40], the authors used additional dummy variables in order to transform the naturally over-determined system into a square system, and applied a globalized semismooth Newton method for its computational solution. The papers [Citation41, Citation42] directly tackle the over-determined nonsmooth system with Gauss–Newton or LM methods. However, in order to deal with the nonsmoothness, they either assume that strict complementarity holds or they smooth the complementarity condition. In all three papers [Citation40–42], it has been pointed out that the choice of the penalty parameter λ in the stationarity system is, on the one hand, crucial for the success of the approach but, on the other hand, difficult to realize in practice.

The contributions in this paper touch several different aspects. Motivated by the results from [Citation37], we study the numerical solution of over-determined systems of nonsmooth equations with the aid of LM methods based on the notion of Newton-differentiability. We show local superlinear convergence of this method under reasonable assumptions, see Theorem 3.2. Furthermore, we point out that even for local quadratic convergence, local Lipschitzness of the underlying mapping is not necessary. To be more precise, a one-sided Lipschitz estimate, which is referred to as calmness in the literature, is enough for that purpose. Thus, our theory applies to situations where the underlying mapping can be even discontinuous. Our next step is the application of this method for the numerical solution of over-determined mixed nonlinear complementarity systems. Here, we strike a classical path and reformulate the appearing complementarity conditions with the FB function and the maximum function in order to obtain a system of nonlinear equations. To globalize the associated Newton methods, we exploit the well-known fact that the squared norm of the residual associated with the FB function is continuously differentiable, and make use of gradient steps with respect to this squared norm in combination with an Armijo step size selection if LM directions do not yield sufficient improvements. We work out our abstract conditions guaranteeing local fast convergence for both approaches and state global convergence results in Theorems 4.6 and 4.12. Furthermore, we compare these findings with the ones in the classical paper [Citation9] where a related comparison has been done in the context of semismooth Newton-type methods for square systems of nonsmooth equations. Finally, we apply both methods in order to solve bilevel optimization problems via their over-determined stationarity system of nonsmooth equations mentioned earlier. In contrast to [Citation41, Citation42], we neither assume strict complementarity to hold nor do we smooth or relax the complementarity condition. Some assumptions for local fast convergence are worked out, see Theorems 5.3 and 5.4. As extensive numerical testing of related approaches has been carried out in [Citation40–42], our experiments in Section 5.2 focus on specific features of the developed algorithms. Particularly, we comment on the possibility to treat the aforementioned penalty parameter λ as a variable, investigate this approach numerically, and compare the outcome with the situation where λ is handled as a parameter.

The remainder of the paper is organized as follows. Section 2 provides an overview of the notation and preliminaries used in the manuscript. Furthermore, we recall the notion of Newton-differentiability and present some associated calculus rules in Section 2.2. A local nonsmooth LM method for Newton-differentiable mappings is discussed in Section 3. Section 4 is dedicated to the derivation of globalized nonsmooth LM methods for the numerical solution of over-determined mixed nonlinear complementarity systems. First, we provide the analysis for the situation where the complementarity condition is reformulated with the maximum function in Section 4.1. Afterwards, in Section 4.2, we comment on the differences which pop up when the FB function is used instead. The obtained theory is used in Section 5 in order to solve bilevel optimization problems. In Section 5.1, we first recall the associated stationarity system of interest and characterize some scenarios where it provides necessary optimality conditions. Second, we present different ways on how to model these stationarity conditions as an over-determined mixed nonlinear complementarity system. The numerical solution of these systems with the aid of nonsmooth LM methods is then investigated in Section 5.2 where results of computational experiments are evaluated. The paper closes with some concluding remarks in Section 6.

2. Notation and preliminaries

2.1. Notation

By $\mathbb N$, we denote the positive integers. The set of all real matrices with $m\in\mathbb N$ rows and $n\in\mathbb N$ columns will be represented by $\mathbb R^{m\times n}$, and $\mathbb O$ denotes the all-zero matrix of appropriate dimensions while we use $\mathbb I_n$ for the identity matrix in $\mathbb R^{n\times n}$. If $M\in\mathbb R^{m\times n}$ is a matrix and $I\subset\{1,\dots,m\}$ is arbitrary, then $M_I$ shall be the matrix which results from $M$ by deleting those rows whose associated index does not belong to $I$. Whenever the square matrix $N\in\mathbb R^{n\times n}$ is symmetric, $\lambda_{\min}(N)$ denotes the smallest eigenvalue of $N$. For any $x\in\mathbb R^n$, $\operatorname{diag}(x)\in\mathbb R^{n\times n}$ is the diagonal matrix whose main diagonal is given by $x$ and whose other entries vanish.

For arbitrary $n\in\mathbb N$, the space $\mathbb R^n$ will be equipped with the standard Euclidean inner product as well as the Euclidean norm $\|\cdot\|\colon\mathbb R^n\to\mathbb R$. Further, we equip $\mathbb R^{m\times n}$ with the matrix norm induced by the Euclidean norm, i.e. with the spectral norm, and denote it by $\|\cdot\|\colon\mathbb R^{m\times n}\to\mathbb R$ as well, as this cannot cause any confusion. For arbitrary $x\in\mathbb R^n$ and $\epsilon>0$, $B_\epsilon(x):=\{y\in\mathbb R^n\,|\,\|y-x\|\le\epsilon\}$ represents the closed $\epsilon$-ball around $x$. Recall that a sequence $\{x^k\}_{k\in\mathbb N}\subset\mathbb R^n$ is said to converge superlinearly to some $\bar x\in\mathbb R^n$ whenever $\|x^{k+1}-\bar x\|\in o(\|x^k-\bar x\|)$. The convergence $x^k\to\bar x$ is said to be quadratic if $\|x^{k+1}-\bar x\|\in O(\|x^k-\bar x\|^2)$.

Frequently, for brevity of notation, we interpret tuples of vectors as a single block column vector, i.e. we exploit the identification
$$\forall i\in\{1,\dots,\ell\},\ x_i\in\mathbb R^{n_i}\colon\quad \mathbb R^{n_1}\times\dots\times\mathbb R^{n_\ell}\ni(x_1,\dots,x_\ell)\ \cong\ \begin{bmatrix}x_1\\\vdots\\x_\ell\end{bmatrix}\in\mathbb R^{n_1+\dots+n_\ell}$$
for any $\ell\in\mathbb N$ with $\ell\ge 2$ and $n_1,\dots,n_\ell\in\mathbb N$.

A function $H\colon\mathbb R^n\to\mathbb R^m$ is called calm at $\bar x\in\mathbb R^n$ whenever there are constants $C>0$ and $\epsilon>0$ such that
$$\forall x\in B_\epsilon(\bar x)\colon\quad \|H(x)-H(\bar x)\|\le C\,\|x-\bar x\|.$$
We note that calmness can be seen as a one-sided local Lipschitz property which is, in general, weaker than local Lipschitzness. Indeed, one can easily check that there exist functions which are calm at some fixed reference point but discontinuous in each neighbourhood of this point.
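To illustrate this gap between calmness and local Lipschitzness, here is a minimal example of our own (not taken from the paper): consider $H\colon\mathbb R\to\mathbb R$ given by
$$H(x):=\begin{cases}x& x\in\mathbb Q,\\ 0& x\notin\mathbb Q.\end{cases}$$
Then $|H(x)-H(0)|\le|x-0|$ for all $x\in\mathbb R$, so $H$ is calm at $\bar x:=0$ with constant $C:=1$, although $H$ is discontinuous at every $x\neq 0$ and, thus, in each neighbourhood of the origin.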

Next, let $H\colon\mathbb R^n\to\mathbb R^m$ be differentiable at $\bar x\in\mathbb R^n$. Then $H'(\bar x)\in\mathbb R^{m\times n}$ denotes the Jacobian of $H$ at $\bar x$. Similarly, for a differentiable scalar-valued function $h\colon\mathbb R^n\to\mathbb R$, $\nabla h(\bar x)\in\mathbb R^n$ is the gradient of $h$ at $\bar x$. Clearly, $h'(\bar x)=\nabla h(\bar x)^\top$ by construction. In the case where $h$ is twice differentiable, $\nabla^2 h(\bar x)\in\mathbb R^{n\times n}$ denotes the Hessian of $h$ at $\bar x$. Partial derivatives with respect to particular variables are represented in analogous fashion.

We use $\Gamma\colon\mathbb R^n\rightrightarrows\mathbb R^m$ in order to express that $\Gamma$ is a so-called set-valued mapping which assigns to each $x\in\mathbb R^n$ a (potentially empty) subset of $\mathbb R^m$. For any such set-valued mapping, the sets defined by $\operatorname{dom}\Gamma:=\{x\in\mathbb R^n\,|\,\Gamma(x)\neq\emptyset\}$ and $\operatorname{gph}\Gamma:=\{(x,y)\in\mathbb R^n\times\mathbb R^m\,|\,y\in\Gamma(x)\}$ are the domain and the graph of $\Gamma$, respectively. We say that $\Gamma$ is inner semicontinuous at $(\bar x,\bar y)\in\operatorname{gph}\Gamma$ whenever for each sequence $\{x^k\}_{k\in\mathbb N}\subset\mathbb R^n$ such that $x^k\to\bar x$, there exists a sequence $\{y^k\}_{k\in\mathbb N}\subset\mathbb R^m$ such that $y^k\to\bar y$ which satisfies $y^k\in\Gamma(x^k)$ for all large enough $k\in\mathbb N$. We emphasize that whenever $\Gamma$ is inner semicontinuous at $(\bar x,\bar y)$, then $\bar x$ is an interior point of $\operatorname{dom}\Gamma$. At a fixed point $(\bar x,\bar y)\in\operatorname{gph}\Gamma$, $\Gamma$ is said to be calm whenever there are constants $\epsilon,\delta,L>0$ such that
$$\forall x\in B_\epsilon(\bar x),\ \forall y\in\Gamma(x)\cap B_\delta(\bar y),\ \exists\tilde y\in\Gamma(\bar x)\colon\quad \|y-\tilde y\|\le L\,\|x-\bar x\|.$$
We note that whenever $H\colon\mathbb R^n\to\mathbb R^m$ is a single-valued function which is calm at $\bar x$, then the set-valued mapping $x\mapsto\{H(x)\}$ is calm at $(\bar x,H(\bar x))$. The converse is not necessarily true.

2.2. Newton-differentiability

In order to construct numerical methods for the computational solution of nonsmooth equations, one has to choose a suitable concept of generalized differentiation. Typically, the idea of semismoothness is used for that purpose, see [Citation2, Citation3]. Here, however, we strike a different path and exploit the notion of Newton-differentiability, see [Citation32, Citation35], whose origins can be found in infinite-dimensional optimization. The latter has the inherent advantage that, in contrast to semismoothness, it is defined for non-Lipschitz functions and enjoys a natural calculus which follows the lines known from standard differentiability. These beneficial features have been used recently in order to solve stationarity conditions of complementarity-constrained optimization problems which can be reformulated as systems of discontinuous equations, see [Citation37].

Let us start with the formal definition of Newton-differentiability before presenting some essential calculus rules. Most of this material is taken from the recent contribution [Citation37, Section 2.3].

Definition 2.1

Let $F\colon\mathbb R^p\to\mathbb R^q$ and $D_N F\colon\mathbb R^p\to\mathbb R^{q\times p}$ be given mappings and let $M\subset\mathbb R^p$ be nonempty. We say that

  1. $F$ is Newton-differentiable on $M$ with Newton-derivative $D_N F$ whenever for each $z\in M$, we have $\|F(z+d)-F(z)-D_N F(z+d)\,d\|=o(\|d\|)$,

  2. $F$ is Newton-differentiable on $M$ of order $\alpha\in(0,1]$ with Newton-derivative $D_N F$ whenever for each $z\in M$, we have $\|F(z+d)-F(z)-D_N F(z+d)\,d\|=O(\|d\|^{1+\alpha})$.

We note that the Newton-derivative of a mapping $F\colon\mathbb R^p\to\mathbb R^q$ is not necessarily uniquely determined if $F$ is Newton-differentiable on some set $M\subset\mathbb R^p$. It can be easily checked that any continuously differentiable function $F\colon\mathbb R^p\to\mathbb R^q$ is Newton-differentiable on $\mathbb R^p$ when $D_N F:=F'$ is chosen. If, additionally, $F'$ is locally Lipschitz continuous, then the order of Newton-differentiability is 1.

In the recent paper [Citation37], the authors present several important calculus rules for Newton-derivatives including a chain rule which we recall below, see [Citation37, Lemma 2.11].

Lemma 2.2

Suppose that $F\colon\mathbb R^p\to\mathbb R^q$ is Newton-differentiable on $M\subset\mathbb R^p$ with Newton-derivative $D_N F$, and that $G\colon\mathbb R^q\to\mathbb R^s$ is Newton-differentiable on $F(M)$ with Newton-derivative $D_N G$. Further, assume that $D_N F$ is bounded on a neighbourhood of $M$, and that $D_N G$ is bounded on a neighbourhood of $F(M)$. Then $G\circ F$ is Newton-differentiable on $M$ with Newton-derivative given by $z\mapsto D_N G(F(z))\,D_N F(z)$. If both $F$ and $G$ are Newton-differentiable of order $\alpha\in(0,1]$, then $G\circ F$ is Newton-differentiable of order $\alpha$ with the Newton-derivative given above.

Let us note that Lemma 2.2 directly gives a sum rule which applies to the concept of Newton-differentiability. However, a direct proof via Definition 2.1 shows that a sum rule for Newton-differentiability holds without additional boundedness assumptions on the Newton-derivatives of the involved functions.

Let us inspect some examples which will be of major interest in this paper.

Example 2.3

  (a) The function $\max\colon\mathbb R^2\to\mathbb R$ is Newton-differentiable on $\mathbb R^2$ of order 1 with Newton-derivative given by
$$\forall(a,b)\in\mathbb R^2\colon\quad D_N\max(\cdot,\cdot)(a,b):=\begin{cases}(1,0)& a\ge b,\\ (0,1)& a<b,\end{cases}$$
  see [Citation37, Example 2.8].

  (b) For arbitrary $p\in\mathbb N$, we investigate Newton-differentiability of the Euclidean norm $\|\cdot\|\colon\mathbb R^p\to\mathbb R$. Noting that $\|\cdot\|$ is continuously differentiable on $\mathbb R^p\setminus\{0\}$ with locally Lipschitzian derivative, it is Newton-differentiable of order 1 there. Let us consider the mapping $D_N\|\cdot\|\colon\mathbb R^p\to\mathbb R^{1\times p}$ given by
$$\forall z\in\mathbb R^p\colon\quad D_N\|\cdot\|(z):=\begin{cases}\dfrac{z^\top}{\|z\|}& z\neq 0,\\[1ex] \dfrac{\sqrt p}{p}\,e^\top& z=0.\end{cases}\tag{2}$$
  Here, $e\in\mathbb R^p$ denotes the all-ones vector. At the origin, we have $\|d\|-D_N\|\cdot\|(d)\,d=0$ for each $d\in\mathbb R^p$, so we already obtain Newton-differentiability of $\|\cdot\|$ on $\mathbb R^p$ of order 1. We also note that the precise value of $D_N\|\cdot\|(0)$ is completely irrelevant for the validity of this property. However, the particular choice in (2) will be beneficial later on.

  (c) Let us investigate Newton-differentiability of the aforementioned FB function $\varphi_{\mathrm{FB}}\colon\mathbb R^2\to\mathbb R$ given in (1). Relying on the sum rule (note that $\varphi_{\mathrm{FB}}$ is the sum of the linear function $(a,b)\mapsto a+b$ and $\|\cdot\|$ on $\mathbb R^2$) and respecting our arguments from Example 2.3(b), we obtain that $\varphi_{\mathrm{FB}}$ is Newton-differentiable on $\mathbb R^2$ of order 1 with Newton-derivative given by
$$D_N\varphi_{\mathrm{FB}}(a,b):=\begin{cases}\Bigl(1+\dfrac{a}{\sqrt{a^2+b^2}},\,1+\dfrac{b}{\sqrt{a^2+b^2}}\Bigr)&(a,b)\neq(0,0),\\[1ex] \Bigl(1+\dfrac{\sqrt2}{2},\,1+\dfrac{\sqrt2}{2}\Bigr)&(a,b)=(0,0)\end{cases}$$
  for all $(a,b)\in\mathbb R^2$, where we made use of (2). A small numerical illustration of these Newton-derivatives is sketched below.
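The following Python sketch implements the Newton-derivatives from Example 2.3 and checks the defining estimate of Definition 2.1 numerically at the kink of the FB function. It follows our reconstruction of the sign conventions in (1) and (2), so it should be read as an illustration under these assumptions rather than as a verbatim transcription of the paper.

```python
import numpy as np

def DN_max(a, b):
    """Newton-derivative of (a, b) -> max(a, b) from Example 2.3(a)."""
    return np.array([1.0, 0.0]) if a >= b else np.array([0.0, 1.0])

def DN_norm(z):
    """Newton-derivative (2) of the Euclidean norm; the value at 0 is one admissible choice."""
    z = np.asarray(z, dtype=float)
    n = np.linalg.norm(z)
    return z / n if n > 0 else np.full(z.size, np.sqrt(z.size) / z.size)

def phi_FB(a, b):
    """FB-type NCP-function from (1): phi(a, b) = a + b + sqrt(a^2 + b^2)."""
    return a + b + np.hypot(a, b)

def DN_phi_FB(a, b):
    """Newton-derivative of phi_FB via the sum rule and DN_norm, cf. Example 2.3(c)."""
    return np.array([1.0, 1.0]) + DN_norm([a, b])

# Numerical check of Definition 2.1 at the kink point z = (0, 0):
# |phi_FB(z + d) - phi_FB(z) - DN_phi_FB(z + d) d| should be o(||d||) (here it even vanishes).
rng = np.random.default_rng(0)
for t in (1e-1, 1e-3, 1e-6):
    d = t * rng.standard_normal(2)
    res = abs(phi_FB(*d) - phi_FB(0.0, 0.0) - DN_phi_FB(*d) @ d)
    print(f"||d|| = {np.linalg.norm(d):.1e}, residual = {res:.2e}")
```

As in the text, the residual at the origin is exactly zero, which reflects that the error in the Newton-differentiability estimate can be much smaller than the generic $o(\|d\|)$ bound.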

3. A local nonsmooth Levenberg–Marquardt method

The paper [Citation8] introduces a semismooth Newton-type method for square complementarity systems whose globalization is based on the FB function. An extension to LM-type methods (which even can handle inexact solutions of the subproblems) can be found in [Citation10, Citation19]. A theoretical and numerical comparison of these methods is provided in [Citation9]. An application of the semismooth LM method in the context of mixed complementarity systems can be found in [Citation20], and these ideas were applied in the context of semiinfinite optimization e.g. in [Citation21, Citation22].

In this subsection, we aim to analyse the local convergence properties of nonsmooth LM methods in a much broader context which covers not only the applications within the aforementioned papers, but also some new ones in bilevel optimization, see e.g. Section 5 and our comments in Section 6. Our approach via Newton-derivatives is slightly different from the one in the literature which exploits semismoothness of the underlying map of interest and, thus, particularly, local Lipschitz continuity.

Throughout the subsection, we assume that $F\colon\mathbb R^p\to\mathbb R^q$ is a given mapping with $q>p$ and inherent nonsmooth structure. We aim to solve the (over-determined) nonlinear system of equations
$$F(z)=0.\tag{3}$$
Classically, this can be done by minimizing the squared residual of a first-order linearization associated with $F$. The basic idea behind LM methods is now to add a classical square-norm regularization term to this optimization problem. Let us consider a current iterate $z^k$ where $F$ is Newton-differentiable, and consider the minimization of
$$d\mapsto\tfrac12\,\|F(z^k)+D_N F(z^k)\,d\|^2+\tfrac{\nu_k}2\,\|d\|^2,$$
which is a strictly convex function. Above, we exploited the Newton-derivative in order to find a linearization of $F$ at $z^k$, and $\nu_k>0$ is a regularization parameter. By means of the chain rule, a necessary optimality condition for this surrogate problem is given by
$$\bigl(D_N F(z^k)^\top D_N F(z^k)+\nu_k\,\mathbb I_p\bigr)\,d=-D_N F(z^k)^\top F(z^k).\tag{4}$$
Observing that the matrix $D_N F(z^k)^\top D_N F(z^k)$ is at least positive semidefinite, (4) indeed possesses a uniquely determined solution $d^k$.

This motivates the formulation of Algorithm 3.1 for the numerical treatment of (3). We assume that $D_N F\colon\mathbb R^p\to\mathbb R^{q\times p}$ is a given function which serves as a Newton-derivative on a suitable subset of $\mathbb R^p$ which will be specified later.
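Since the listing of Algorithm 3.1 is not reproduced here, the following Python sketch shows the local iteration we have in mind: in each step, the regularized normal equation (4) is solved for the LM direction and a full step is taken. The residual F, its Newton-derivative DN_F, the regularization sequence, and the stopping tolerance are placeholders to be supplied by the user; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def local_nonsmooth_LM(F, DN_F, z0, nu_seq, tol=1e-10, max_iter=100):
    """Local LM iteration: solve (DN_F(z)^T DN_F(z) + nu_k I) d = -DN_F(z)^T F(z), then z <- z + d.

    F      : callable, residual mapping R^p -> R^q (q >= p)
    DN_F   : callable, Newton-derivative, returns a (q, p) array
    nu_seq : iterable of positive regularization parameters nu_k
    """
    z = np.asarray(z0, dtype=float)
    for _, nu in zip(range(max_iter), nu_seq):
        Fz = F(z)
        if np.linalg.norm(Fz) <= tol:
            break
        J = DN_F(z)                              # Newton-derivative at the current iterate
        A = J.T @ J + nu * np.eye(z.size)        # regularized Gauss-Newton matrix, positive definite
        d = np.linalg.solve(A, -J.T @ Fz)        # LM direction from (4)
        z = z + d
    return z

# Illustrative usage with a simple over-determined nonsmooth residual F(z) = (z, max(z, 0)):
F = lambda z: np.array([z[0], max(z[0], 0.0)])
DN_F = lambda z: np.array([[1.0], [1.0 if z[0] >= 0 else 0.0]])
z_star = local_nonsmooth_LM(F, DN_F, np.array([0.7]), nu_seq=(0.5 ** k for k in range(100)))
print(z_star)  # converges to the root z = 0
```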

In the subsequent theorem, we present a local convergence result regarding Algorithm 3.1.

Theorem 3.2

Let $M\subset\mathbb R^p$ be a set such that a solution $\bar z\in\mathbb R^p$ of (3) satisfies $\bar z\in M$. Assume that $F$ is Newton-differentiable on $M$ with Newton-derivative $D_N F\colon\mathbb R^p\to\mathbb R^{q\times p}$. We assume that there are $\delta>0$ and $C>0$ such that, for all $z\in B_\delta(\bar z)$, $D_N F(z)$ possesses full column rank $p$ such that
$$\lambda_{\min}\bigl(D_N F(z)^\top D_N F(z)\bigr)\ge\frac1C,\qquad \|D_N F(z)\|\le C.\tag{5}$$
Then there exists $\epsilon>0$ such that, for each starting point $z^0\in B_\epsilon(\bar z)$ and each null sequence $\{\nu_k\}_{k\in\mathbb N}\subset(0,\epsilon)$, Algorithm 3.1 terminates after finitely many steps or produces a sequence $\{z^k\}_{k\in\mathbb N}$ which converges to $\bar z$ superlinearly. Furthermore, if $F$ is even Newton-differentiable on $M$ of order 1, and if $\nu_k\in O(\|F(z^k)\|)$ while $F$ is calm at $\bar z$, then the convergence is quadratic.

Proof.

Due to the assumptions of the theorem, and by Newton-differentiability of $F$, we can choose $\epsilon\in(0,\min(\delta,1/(4C)))$ so small such that the following estimates hold true for all $d\in B_\epsilon(0)$ and $\nu>0$:
$$\|(D_N F(\bar z+d)^\top D_N F(\bar z+d)+\nu\,\mathbb I_p)^{-1}\|\le C,\tag{6a}$$
$$\|D_N F(\bar z+d)\|\le C,\tag{6b}$$
$$\|F(\bar z+d)-F(\bar z)-D_N F(\bar z+d)\,d\|\le\frac1{4C^2}\,\|d\|.\tag{6c}$$
Using $F(\bar z)=0$, for each $z^k\in B_\epsilon(\bar z)$ and $\nu_k\in(0,\epsilon)$, we find
$$\begin{aligned}\|z^{k+1}-\bar z\|&=\bigl\|z^k-(D_N F(z^k)^\top D_N F(z^k)+\nu_k\,\mathbb I_p)^{-1}D_N F(z^k)^\top F(z^k)-\bar z\bigr\|\\&\le C\,\bigl\|D_N F(z^k)^\top\bigl(F(z^k)-F(\bar z)-D_N F(z^k)(z^k-\bar z)\bigr)\bigr\|+C\,\nu_k\,\|z^k-\bar z\|\\&\le C^2\,\|F(z^k)-F(\bar z)-D_N F(z^k)(z^k-\bar z)\|+C\,\nu_k\,\|z^k-\bar z\|\\&\le\tfrac14\,\|z^k-\bar z\|+\tfrac14\,\|z^k-\bar z\|=\tfrac12\,\|z^k-\bar z\|.\end{aligned}\tag{7}$$
Thus, we have $\|z^{k+1}-\bar z\|\le\tfrac12\|z^k-\bar z\|$, i.e. $z^{k+1}\in B_{\epsilon/2}(\bar z)$. Thus, if $z^0\in B_\epsilon(\bar z)$ and $\{\nu_k\}_{k\in\mathbb N}\subset(0,\epsilon)$, we have $z^k\to\bar z$ in the case where Algorithm 3.1 generates an infinite sequence. Furthermore, the definition of Newton-differentiability, (7), and $\nu_k\to 0$ give $\|z^{k+1}-\bar z\|=o(\|z^k-\bar z\|)$, i.e. the convergence $z^k\to\bar z$ is superlinear.

Finally, assume that $F$ is Newton-differentiable of order 1 and calm at $\bar z$, and that $\nu_k\in O(\|F(z^k)\|)$. Then there is a constant $K>0$ such that the estimate (7) can be refined as
$$\begin{aligned}\|z^{k+1}-\bar z\|&\le C^2\,\|F(z^k)-F(\bar z)-D_N F(z^k)(z^k-\bar z)\|+C\,\nu_k\,\|z^k-\bar z\|\\&\le O(\|z^k-\bar z\|^2)+C\,K\,\|F(z^k)\|\,\|z^k-\bar z\|=O(\|z^k-\bar z\|^2),\end{aligned}$$
where we used $F(\bar z)=0$ and the calmness of $F$ at $\bar z$ in the last step.

Let us briefly compare the assumptions of Theorem 3.2, which are used to guarantee the superlinear or quadratic convergence of a sequence generated by Algorithm 3.1, with the ones exploited in the literature where nonsmooth LM methods are considered from the viewpoint of semismoothness, see e.g. [Citation10, Section 2]. Therefore, we fix a solution $\bar z\in\mathbb R^p$ of (3). The full column rank assumption on the Newton-derivative, locally around $\bar z$, corresponds to so-called BD-regularity of the point $\bar z$ which demands that all matrices within Bouligand's generalized Jacobian at $\bar z$ (if available), see e.g. [Citation1, Section 2], possess full column rank. We note that, by upper semicontinuity of Bouligand's generalized Jacobian, this full rank assumption extends to a neighbourhood of $\bar z$. Hence, these full rank assumptions are, essentially, of the same type although the one from Theorem 3.2 is more general since it addresses situations where the underlying map is allowed to be non-Lipschitz. Second, Theorem 3.2 assumes boundedness of the Newton-derivative, locally around $\bar z$. In the context of semismooth LM methods, local boundedness of the generalized derivative holds inherently by construction of Bouligand's generalized Jacobian and local Lipschitzness of the underlying mapping. Third, for quadratic convergence, the regularization parameters need to satisfy $\nu_k\in O(\|F(z^k)\|)$ in Theorem 3.2, and this assumption is also used in [Citation10]. Similarly, since Newton-differentiability of order 1 is a natural counterpart of so-called strict semismoothness, the assumptions for quadratic convergence are also the same. Summarizing these impressions, Algorithm 3.1 and Theorem 3.2 generalize the already existing theory on nonsmooth LM methods to a broader setting under reasonable assumptions. We also note that our analysis made no use of deeper underlying properties of the generalized derivative; we mainly used its definition for our purposes. However, it should be observed that a sophisticated choice of the Newton-derivative, which is not uniquely determined for a given map as mentioned in Section 2.2, may lead to weaker assumptions in Theorem 3.2 than a bad choice of it. Indeed, it may even happen that the assumptions of Theorem 3.2 are valid for a particular choice of the Newton-derivative while they are violated for another one. Thus, when Algorithm 3.1 (or a suitable globalization) is implemented, one has to keep in mind to choose the Newton-derivative in such a way that the assumptions of Theorem 3.2 are valid (if this is actually possible), as the requirements of Theorem 3.2 need to be satisfied for one particular choice of the Newton-derivative (and not for all possible choices). Clearly, this choice depends on structural properties of the nonsmooth mapping $F$ under consideration and is, potentially, a laborious task as we will see in Section 4, see [Citation37] as well.

The following corollary of Theorem 3.2 shows that $\|F(z^{k+1})\|\in o(\|F(z^k)\|)$ can be expected under reasonable assumptions, and this will be of essential importance later on.

Corollary 3.3

Under the assumptions of Theorem 3.2 which guarantee the superlinear convergence of $\{z^k\}_{k\in\mathbb N}$ generated by Algorithm 3.1, we additionally have
$$\frac{\|F(z^{k+1})\|}{\|F(z^k)\|}\to 0$$
provided $F$ is calm at $\bar z$.

Proof.

We choose $\epsilon>0$ as in the proof of Theorem 3.2 and observe $\{z^k\}_{k\in\mathbb N}\subset B_\epsilon(\bar z)$. Exploiting the Newton-differentiability of $F$ and $F(\bar z)=0$, we have
$$F(z^k)=D_N F(z^k)(z^k-\bar z)+o(\|z^k-\bar z\|),$$
and some transformations give, for sufficiently large $k\in\mathbb N$ and by boundedness of the sequence $\{D_N F(z^k)\}_{k\in\mathbb N}$,
$$\|z^k-\bar z\|\le\bigl\|(D_N F(z^k)^\top D_N F(z^k))^{-1}D_N F(z^k)^\top F(z^k)\bigr\|+\tfrac12\,\|z^k-\bar z\|,$$
i.e.
$$\tfrac12\,\|z^k-\bar z\|\le\bigl\|(D_N F(z^k)^\top D_N F(z^k))^{-1}D_N F(z^k)^\top F(z^k)\bigr\|\le C^2\,\|F(z^k)\|$$
due to (5). Thus, we find
$$\frac{\|F(z^{k+1})\|}{\|F(z^k)\|}\le\frac{2C^2\,\|F(z^{k+1})-F(\bar z)\|}{\|z^k-\bar z\|}\le\frac{2C^2L\,\|z^{k+1}-\bar z\|}{\|z^k-\bar z\|}\le\frac{2C^2L\,o(\|z^k-\bar z\|)}{\|z^k-\bar z\|}\to 0$$
as $k\to\infty$, where $L>0$ is a local calmness constant of $F$ at $\bar z$.

In [Citation37, Section 3.2], a function is constructed which possesses the following properties:

  1. it is Newton-differentiable on its set of roots with globally bounded and nonvanishing Newton-derivative,

  2. it is discontinuous in each open neighbourhood of its set of roots, and

  3. it is calm at each point from its set of roots.

This function is then used to construct a nonsmooth Newton-type method for the computational solution of stationarity systems associated with complementarity-constrained optimization problems. We note that it crucially violates the standard requirement of local Lipschitzness which is typically exploited in the context of nonsmooth Newton-type methods. Similarly as in [Citation37], the analysis in Theorem 3.2 and Corollary 3.3 only requires calmness of the mapping F (as well as some other natural assumptions) in order to get local fast convergence of a nonsmooth LM method. Thus, the ideas from [Citation37] can be carried over to the situation where over-determined stationarity systems of complementarity-constrained optimization problems need to be solved (such systems would, exemplary, arise when applying the theory from Section 5.1 to bilevel optimization problems with additional complementarity constraints at the upper-level stage or to the so-called combined reformulation of bilevel optimization problems which makes use of the so-called value function and Karush–Kuhn–Tucker reformulation at the same time, see [Citation43]).

For the globalization of Algorithm 3.1, one typically needs to impose additional assumptions like the smoothness of $z\mapsto\|F(z)\|^2$. In the next section, we address the prototypical setting of mixed nonlinear complementarity systems and carve out two classical globalization strategies.

4. A global nonsmooth Levenberg–Marquardt method for mixed nonlinear complementarity systems

For functions $H\colon\mathbb R^{p_1}\times\mathbb R^{p_2}\to\mathbb R^{q_1}$ and $G\colon\mathbb R^{p_1}\times\mathbb R^{p_2}\to\mathbb R^{p_2}$ being continuously differentiable, we aim to solve the mixed nonlinear complementarity system
$$H(w,\xi)=0,\qquad G(w,\xi)\le 0,\quad \xi\ge 0,\quad G(w,\xi)^\top\xi=0,\tag{MNLCS}$$
where, in the application we have in mind, $q_1>p_1$ holds true, see Section 5. A comprehensive overview of available theoretical and numerical aspects as well as applications of mixed nonlinear complementarity problems can be found in the monograph [Citation16]. Typical solution approaches are based on nonsmooth Newton-type methods, see [Citation26], smoothing methods, see [Citation23, Citation27], active-set and interior-point methods, see [Citation24, Citation28], as well as penalty methods, see [Citation25]. Let us note that the classical formulation of mixed nonlinear complementarity systems as considered in [Citation23, Citation24, Citation26–28] is essentially equivalent to the one we are suggesting in (MNLCS), see [Citation23, Section 4] and [Citation24, Section 1] for arguments which reveal this equivalence. However, in Section 5, it will be beneficial to work with the model (MNLCS) to preserve structural properties.

For later use, we introduce some index sets depending on a pair of vectors $(w,\xi)\in\mathbb R^{p_1}\times\mathbb R^{p_2}$:
$$\begin{aligned}I^{0}(w,\xi)&:=\{i\in\{1,\dots,p_2\}\,|\,G_i(w,\xi)=0\},\qquad & I^{-}(w,\xi)&:=\{i\in\{1,\dots,p_2\}\,|\,G_i(w,\xi)<0\},\\ I^{+}(w,\xi)&:=\{i\in I^{0}(w,\xi)\,|\,\xi_i>0\},\qquad & I^{00}(w,\xi)&:=\{i\in I^{0}(w,\xi)\,|\,\xi_i=0\}.\end{aligned}$$
Above, $G_1,\dots,G_{p_2}\colon\mathbb R^{p_1}\times\mathbb R^{p_2}\to\mathbb R$ are the component functions of $G$.

Based on NCP-functions, see Section 1, the complementarity condition
$$G(w,\xi)\le 0,\quad \xi\ge 0,\quad G(w,\xi)^\top\xi=0\tag{8}$$
can be restated in form of a (nonsmooth) equality constraint. In this paper, we will focus on two popular choices, namely the maximum function and the famous FB function. Clearly, (8) is equivalent to
$$\max(G(w,\xi),-\xi)=0,$$
where $\max\colon\mathbb R^{p_2}\times\mathbb R^{p_2}\to\mathbb R^{p_2}$ is exploited to express the componentwise maximum. Using $\phi_{\mathrm{FB}}\colon\mathbb R^{p_2}\times\mathbb R^{p_2}\to\mathbb R^{p_2}$, given by
$$\forall\xi^1,\xi^2\in\mathbb R^{p_2}\colon\quad \phi_{\mathrm{FB}}(\xi^1,\xi^2):=\begin{pmatrix}\varphi_{\mathrm{FB}}(\xi^1_1,\xi^2_1)\\\vdots\\\varphi_{\mathrm{FB}}(\xi^1_{p_2},\xi^2_{p_2})\end{pmatrix},$$
where $\varphi_{\mathrm{FB}}\colon\mathbb R^2\to\mathbb R$ is the FB function defined in (1), (8) is also equivalent to
$$\phi_{\mathrm{FB}}(G(w,\xi),-\xi)=0.$$
These nonsmooth equations motivate the consideration of the special residual mappings $F_{\max},F_{\mathrm{FB}}\colon\mathbb R^{p_1}\times\mathbb R^{p_2}\to\mathbb R^{q_1}\times\mathbb R^{p_2}$ given by
$$\forall w\in\mathbb R^{p_1},\ \forall\xi\in\mathbb R^{p_2}\colon\quad F_{\max}(w,\xi):=\begin{bmatrix}H(w,\xi)\\\max(G(w,\xi),-\xi)\end{bmatrix},\qquad F_{\mathrm{FB}}(w,\xi):=\begin{bmatrix}H(w,\xi)\\\phi_{\mathrm{FB}}(G(w,\xi),-\xi)\end{bmatrix},\tag{9}$$
since precisely their roots are the solutions of (MNLCS). Let us note that, in a certain sense, these residuals are equivalent. This can be distilled from [Citation44, Lemma 3.1].
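As an illustration of (9) and of the merit function $\Psi_{\mathrm{FB}}$ used below, the following Python sketch assembles both residuals from user-supplied mappings H and G. The sign convention (complementarity understood as $G(w,\xi)\le 0$, $\xi\ge 0$, so the NCP-functions are applied to the pairs $(G_i(w,\xi),-\xi_i)$) follows our reconstruction of the formulas; treat the code as a hedged sketch under this assumption.

```python
import numpy as np

def phi_FB(a, b):
    """Componentwise NCP-function from (1), applied to vectors."""
    return a + b + np.hypot(a, b)

def make_residuals(H, G):
    """Return F_max and F_FB from (9) for H(w, xi) in R^{q1} and G(w, xi) in R^{p2}."""
    def F_max(w, xi):
        return np.concatenate([H(w, xi), np.maximum(G(w, xi), -xi)])

    def F_FB(w, xi):
        return np.concatenate([H(w, xi), phi_FB(G(w, xi), -xi)])

    return F_max, F_FB

def Psi_FB(F_FB, w, xi):
    """Smooth merit function Psi_FB(w, xi) = 0.5 * ||F_FB(w, xi)||^2 used for globalization."""
    r = F_FB(w, xi)
    return 0.5 * float(r @ r)
```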

Lemma 4.1

There exist constants $K_1,K_2>0$ such that
$$\forall w\in\mathbb R^{p_1},\ \forall\xi\in\mathbb R^{p_2}\colon\quad K_1\,\|F_{\max}(w,\xi)\|\le\|F_{\mathrm{FB}}(w,\xi)\|\le K_2\,\|F_{\max}(w,\xi)\|.$$

Due to the inherent nonsmoothness of $F_{\max}$ and $F_{\mathrm{FB}}$ from (9), we may now apply Algorithm 3.1 in order to solve $F_{\max}(w,\xi)=0$ or $F_{\mathrm{FB}}(w,\xi)=0$. Note that we have $p:=p_1+p_2$ and $q:=q_1+p_2$ in the context of Section 3. In the literature, the approach via $F_{\mathrm{FB}}$ is quite popular due to the well-known observation that the mapping $\Psi_{\mathrm{FB}}\colon\mathbb R^{p_1}\times\mathbb R^{p_2}\to\mathbb R$ given by
$$\forall w\in\mathbb R^{p_1},\ \forall\xi\in\mathbb R^{p_2}\colon\quad \Psi_{\mathrm{FB}}(w,\xi):=\tfrac12\,\|F_{\mathrm{FB}}(w,\xi)\|^2$$
is continuously differentiable, allowing for a globalization of suitable local solution methods. This is due to the fact that the squared FB function is continuously differentiable, see e.g. [Citation45, Lemma 3.4]. However, it has been reported in [Citation9] in the context of square nonlinear complementarity systems that a mixed approach combining both residuals from (9) might be beneficial since the assumptions for local fast convergence could be weaker while the local convergence is faster.

4.1. The mixed approach via both residual functions

First, let us focus on this mixed approach where the central LM search direction will be computed via $F_{\max}$ given in (9) in order to end up with less complicated Newton-derivatives in the LM system. Still, we exploit the smooth function $\Psi_{\mathrm{FB}}$ for globalization purposes. Later, in Section 4.2, we briefly comment on the method which exclusively uses $F_{\mathrm{FB}}$.

Below, we provide formulas for a Newton-derivative of $F_{\max}$ and the gradient of $\Psi_{\mathrm{FB}}$ for later use.

Lemma 4.2

On each bounded subset of $\mathbb R^{p_1}\times\mathbb R^{p_2}$, the mapping $F_{\max}$ is Newton-differentiable with Newton-derivative given by
$$(w,\xi)\mapsto\begin{bmatrix}H'_w(w,\xi)&H'_\xi(w,\xi)\\ D(I^{\ge}(w,\xi))\,G'_w(w,\xi)&D(I^{\ge}(w,\xi))\,G'_\xi(w,\xi)-D(I^{<}(w,\xi))\end{bmatrix},$$
and this mapping, again, is bounded on bounded sets. Whenever the derivatives of $G$ and $H$ are locally Lipschitzian, the order of Newton-differentiability is 1. Above, the index sets $I^{\ge}(w,\xi)$ and $I^{<}(w,\xi)$ are given by
$$I^{\ge}(w,\xi):=\{i\in\{1,\dots,p_2\}\,|\,G_i(w,\xi)\ge-\xi_i\},\qquad I^{<}(w,\xi):=\{i\in\{1,\dots,p_2\}\,|\,G_i(w,\xi)<-\xi_i\},$$
and, for each $I\subset\{1,\dots,p_2\}$, $D(I)\in\mathbb R^{p_2\times p_2}$ is the diagonal matrix given by
$$\forall i,j\in\{1,\dots,p_2\}\colon\quad D(I)_{ij}:=\begin{cases}1& i=j,\ i\in I,\\ 0&\text{otherwise.}\end{cases}$$

Proof.

This follows easily from Lemma 2.2, Example 2.3(a), and our comments right after Definition 2.1.

Throughout the section, we will denote the Newton-derivative of $F_{\max}$ which has been characterized in Lemma 4.2 by $D_N F_{\max}$.

The next result follows simply by computing the derivative of the squared FB function and using the standard chain rule.

Lemma 4.3

At each point $(w,\xi)\in\mathbb R^{p_1}\times\mathbb R^{p_2}$, $\Psi_{\mathrm{FB}}$ is continuously differentiable, and its gradient is given by
$$\nabla\Psi_{\mathrm{FB}}(w,\xi)=\begin{bmatrix}H'_w(w,\xi)&H'_\xi(w,\xi)\\ \tilde D_G(w,\xi)\,G'_w(w,\xi)&\tilde D_G(w,\xi)\,G'_\xi(w,\xi)-\tilde D_\xi(w,\xi)\end{bmatrix}^\top F_{\mathrm{FB}}(w,\xi),$$
where
$$\tilde D_G(w,\xi):=\operatorname{diag}(v^a(w,\xi)),\qquad \tilde D_\xi(w,\xi):=\operatorname{diag}(v^b(w,\xi)).$$
Above, $v^a(w,\xi),v^b(w,\xi)\in\mathbb R^{p_2}$ are the vectors given by
$$\forall i\in\{1,\dots,p_2\}\colon\quad (v^a(w,\xi))_i:=\begin{cases}1+\dfrac{G_i(w,\xi)}{\sqrt{G_i^2(w,\xi)+\xi_i^2}}& i\notin I^{00}(w,\xi),\\[1ex] 1+\dfrac{\sqrt2}{2}& i\in I^{00}(w,\xi),\end{cases}\qquad (v^b(w,\xi))_i:=\begin{cases}1-\dfrac{\xi_i}{\sqrt{G_i^2(w,\xi)+\xi_i^2}}& i\notin I^{00}(w,\xi),\\[1ex] 1+\dfrac{\sqrt2}{2}& i\in I^{00}(w,\xi).\end{cases}\tag{10}$$
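The two derivative objects from Lemmas 4.2 and 4.3 can be assembled as in the following NumPy sketch. The block signs follow our reconstruction of the convention $\max(G(w,\xi),-\xi)$, and the user supplies the partial Jacobians Hw, Hxi, Gw, Gxi together with the values of G and xi; the code is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np

def DN_Fmax(Hw, Hxi, Gw, Gxi, Gval, xi):
    """Newton-derivative of F_max (Lemma 4.2): rows with G_i >= -xi_i keep the derivative of G_i,
    rows with G_i < -xi_i keep the derivative of -xi_i."""
    act = (Gval >= -xi)                      # boolean mask for the index set I_>=
    D_act = np.diag(act.astype(float))       # D(I_>=)
    D_in = np.diag((~act).astype(float))     # D(I_<)
    return np.vstack([np.hstack([Hw, Hxi]),
                      np.hstack([D_act @ Gw, D_act @ Gxi - D_in])])

def grad_Psi_FB(Hw, Hxi, Gw, Gxi, Gval, xi, F_FB_val):
    """Gradient of Psi_FB (Lemma 4.3): DN_FFB^T * F_FB with the scaling vectors from (10)."""
    r = np.hypot(Gval, xi)
    biact = (r <= 1e-12)                     # (numerically) biactive indices I_00
    va = np.where(biact, 1.0 + np.sqrt(2) / 2,
                  1.0 + np.divide(Gval, r, out=np.zeros_like(r), where=~biact))
    vb = np.where(biact, 1.0 + np.sqrt(2) / 2,
                  1.0 - np.divide(xi, r, out=np.zeros_like(r), where=~biact))
    DN_FFB = np.vstack([np.hstack([Hw, Hxi]),
                        np.hstack([np.diag(va) @ Gw, np.diag(va) @ Gxi - np.diag(vb)])])
    return DN_FFB.T @ F_FB_val
```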

For local superlinear convergence of our method of interest, we need the following result.

Lemma 4.4

Fix a solution $(\bar w,\bar\xi)\in\mathbb R^{p_1}\times\mathbb R^{p_2}$ of (MNLCS) and assume that for each index set $I\subset I^{00}(\bar w,\bar\xi)$, the matrix
$$\begin{bmatrix}H'_w(\bar w,\bar\xi)&H'_\xi(\bar w,\bar\xi)\\ G'_w(\bar w,\bar\xi)_{I^{+}(\bar w,\bar\xi)\cup I}&G'_\xi(\bar w,\bar\xi)_{I^{+}(\bar w,\bar\xi)\cup I}\\ \mathbb O&-(\mathbb I_{p_2})_{I^{-}(\bar w,\bar\xi)\cup I^c}\end{bmatrix},$$
where we used $I^c:=I^{00}(\bar w,\bar\xi)\setminus I$, possesses linearly independent columns. Then the matrices $M(w,\xi)^\top M(w,\xi)$, where
$$M(w,\xi):=\begin{bmatrix}H'_w(w,\xi)&H'_\xi(w,\xi)\\ D(I^{\ge}(w,\xi))\,G'_w(w,\xi)&D(I^{\ge}(w,\xi))\,G'_\xi(w,\xi)-D(I^{<}(w,\xi))\end{bmatrix},$$
are uniformly positive definite in a neighbourhood of $(\bar w,\bar\xi)$.

Proof.

Suppose that the assertion is false. Then there exist sequences $\{w^k\}_{k\in\mathbb N}\subset\mathbb R^{p_1}$, $\{\xi^k\}_{k\in\mathbb N}\subset\mathbb R^{p_2}$, $\{(d_w^k,d_\xi^k)\}_{k\in\mathbb N}\subset\mathbb R^{p_1}\times\mathbb R^{p_2}$, and $\{\eta_k\}_{k\in\mathbb N}\subset[0,\infty)$ such that $w^k\to\bar w$, $\xi^k\to\bar\xi$, $\eta_k\to 0$, and, for each $k\in\mathbb N$, $\|d_w^k\|+\|d_\xi^k\|=1$ as well as
$$\eta_k=\begin{bmatrix}d_w^k\\ d_\xi^k\end{bmatrix}^\top M(w^k,\xi^k)^\top M(w^k,\xi^k)\begin{bmatrix}d_w^k\\ d_\xi^k\end{bmatrix}.$$
For brevity of notation, we make use of the abbreviations
$$H'_w(k):=H'_w(w^k,\xi^k),\quad H'_\xi(k):=H'_\xi(w^k,\xi^k),\quad (G_i)'_w(k):=(G_i)'_w(w^k,\xi^k),\quad (G_i)'_\xi(k):=(G_i)'_\xi(w^k,\xi^k)\tag{11}$$
for all $i\in\{1,\dots,p_2\}$. Due to $D(I^{\ge}(w^k,\xi^k))\,D(I^{<}(w^k,\xi^k))=\mathbb O$, the above gives
$$\eta_k=\|H'_w(k)d_w^k\|^2+2\,(H'_w(k)d_w^k)^\top(H'_\xi(k)d_\xi^k)+\|H'_\xi(k)d_\xi^k\|^2+\sum_{i\in I^{\ge}(w^k,\xi^k)}\bigl((G_i)'_w(k)d_w^k+(G_i)'_\xi(k)d_\xi^k\bigr)^2+\sum_{i\in I^{<}(w^k,\xi^k)}(d_\xi^k)_i^2.\tag{12}$$
By continuity of $G$, each index $i\in I^{+}(\bar w,\bar\xi)$ lies in $I^{\ge}(w^k,\xi^k)$ for sufficiently large $k\in\mathbb N$. Furthermore, any $i\in I^{-}(\bar w,\bar\xi)$ lies in $I^{<}(w^k,\xi^k)$ for sufficiently large $k\in\mathbb N$. For $i\in I^{00}(\bar w,\bar\xi)$, two scenarios are possible. Either there is an infinite subset $K_i\subset\mathbb N$ such that $i\in I^{\ge}(w^k,\xi^k)$ for all $k\in K_i$, or $i\in I^{<}(w^k,\xi^k)$ holds for all large enough $k\in\mathbb N$. Anyhow, since there are only finitely many indices in $\{1,\dots,p_2\}$, we may choose an infinite subset $K\subset\mathbb N$ as well as an index set $I\subset I^{00}(\bar w,\bar\xi)$ such that $I^{\ge}(w^k,\xi^k)=I^{+}(\bar w,\bar\xi)\cup I$ and $I^{<}(w^k,\xi^k)=I^{-}(\bar w,\bar\xi)\cup I^c$ is valid for each $k\in K$, where $I^c:=I^{00}(\bar w,\bar\xi)\setminus I$. Hence, for each $k\in K$, (12) is equivalent to
$$\eta_k=\|H'_w(k)d_w^k\|^2+2\,(H'_w(k)d_w^k)^\top(H'_\xi(k)d_\xi^k)+\|H'_\xi(k)d_\xi^k\|^2+\sum_{i\in I^{+}(\bar w,\bar\xi)\cup I}\bigl((G_i)'_w(k)d_w^k+(G_i)'_\xi(k)d_\xi^k\bigr)^2+\sum_{i\in I^{-}(\bar w,\bar\xi)\cup I^c}(d_\xi^k)_i^2.\tag{13}$$
Clearly, along a subsubsequence (without relabelling), $\{(d_w^k,d_\xi^k)\}_{k\in K}$ converges to some $(d_w,d_\xi)\in\mathbb R^{p_1}\times\mathbb R^{p_2}$ such that $\|d_w\|+\|d_\xi\|=1$. Thus, taking the limit $k\to_K\infty$ in (13) gives
$$0=\|H'_w(\bar w,\bar\xi)d_w\|^2+2\,(H'_w(\bar w,\bar\xi)d_w)^\top(H'_\xi(\bar w,\bar\xi)d_\xi)+\|H'_\xi(\bar w,\bar\xi)d_\xi\|^2+\sum_{i\in I^{+}(\bar w,\bar\xi)\cup I}\bigl((G_i)'_w(\bar w,\bar\xi)d_w+(G_i)'_\xi(\bar w,\bar\xi)d_\xi\bigr)^2+\sum_{i\in I^{-}(\bar w,\bar\xi)\cup I^c}(d_\xi)_i^2.$$
This implies that the matrix $\tilde M_I(\bar w,\bar\xi)^\top\tilde M_I(\bar w,\bar\xi)$, where

$$\tilde M_I(\bar w,\bar\xi):=\begin{bmatrix}H'_w(\bar w,\bar\xi)&H'_\xi(\bar w,\bar\xi)\\ D(I^{+}(\bar w,\bar\xi)\cup I)\,G'_w(\bar w,\bar\xi)&D(I^{+}(\bar w,\bar\xi)\cup I)\,G'_\xi(\bar w,\bar\xi)-D(I^{-}(\bar w,\bar\xi)\cup I^c)\end{bmatrix},$$

which is naturally positive semidefinite by construction, is not positive definite. Thus, $\tilde M_I(\bar w,\bar\xi)$ cannot possess full column rank, contradicting the lemma's assumptions.

The qualification condition postulated in Lemma 4.4 actually corresponds to the linear independence of the columns of all elements of Bouligand's generalized Jacobian of $F_{\max}$ at $(\bar w,\bar\xi)$, i.e. these assumptions recover the BD-regularity condition from the literature, and the latter is well established in the context of solution algorithms for nonsmooth systems. Let us also mention that it can be easily checked by means of simple examples that full column rank of the Newton-derivative of $F_{\max}$ at $(\bar w,\bar\xi)$, as constructed in Lemma 4.2, does, in general, not guarantee the uniform positive definiteness which has been shown under the assumptions of Lemma 4.4.

We note that each solution of (MNLCS) is a global minimizer of $\Psi_{\mathrm{FB}}$ and, thus, a stationary point of this function. The converse statement is not likely to hold. Even for quadratic systems, one needs strong additional assumptions in order to obtain such a result.

Next, we present the method of our interest in Algorithm 4.5. For brevity of notation, we introduce
$$z:=\begin{bmatrix}w\\\xi\end{bmatrix},\qquad d:=\begin{bmatrix}\delta_w\\\delta_\xi\end{bmatrix},$$
and similarly, we define $z^k$ and $d^k$. Recall that $D_N F_{\max}(z)$ denotes the Newton-derivative of $F_{\max}$ at $z$ characterized in Lemma 4.2.
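The precise listing of Algorithm 4.5 is not reproduced above. The following Python sketch therefore only mirrors the globalization pattern described in the surrounding text: an LM direction computed from $F_{\max}$, acceptance of the full step if $\Psi_{\mathrm{FB}}$ is reduced by a factor $\kappa$, otherwise an angle/length test with a fallback to the steepest-descent direction of $\Psi_{\mathrm{FB}}$ and an Armijo backtracking line search. Parameter names, the concrete tests, and the coupling of the regularization parameter to the residual are our own hedged choices and may differ in detail from the paper.

```python
import numpy as np

def globalized_LM(F_max, F_FB, DN_F_max, grad_Psi, z0, kappa=0.5, sigma=1e-4, beta=0.5,
                  rho1=1e-8, rho2=1e-8, tol=1e-8, max_iter=500):
    """Globalized nonsmooth LM method in the spirit of Algorithm 4.5 (hedged reconstruction)."""
    z = np.asarray(z0, dtype=float)
    Psi = lambda v: 0.5 * float(F_FB(v) @ F_FB(v))
    for _ in range(max_iter):
        if np.linalg.norm(F_max(z)) <= tol:
            break
        J = DN_F_max(z)
        nu = np.linalg.norm(F_FB(z))             # regularization tied to the current residual (assumption)
        d = np.linalg.solve(J.T @ J + nu * np.eye(z.size), -J.T @ F_max(z))   # LM direction
        if Psi(z + d) <= kappa * Psi(z):         # ratio test: accept the full LM step
            z = z + d
            continue
        g = grad_Psi(z)
        # angle/length safeguard: fall back to the steepest-descent direction of Psi_FB if it fails
        if -(g @ d) < rho1 * np.linalg.norm(g) * np.linalg.norm(d) or np.linalg.norm(d) < rho2 * np.linalg.norm(g):
            d = -g
        alpha = 1.0                              # Armijo backtracking on Psi_FB
        while alpha > 1e-16 and Psi(z + alpha * d) > Psi(z) + sigma * alpha * float(g @ d):
            alpha *= beta
        z = z + alpha * d
    return z
```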

We now present the central convergence result associated with Algorithm 4.5. Its proof is similar to the one of [Citation37, Theorem 5.2] but included for the purpose of completeness.

Theorem 4.6

Let {zk}kN be a sequence generated by Algorithm 4.5.

(a)

If $\Psi_{\mathrm{FB}}(z^k+d^k)\le\kappa\,\Psi_{\mathrm{FB}}(z^k)$ holds infinitely many times in Step 4, then $\{\Psi_{\mathrm{FB}}(z^k)\}_{k\in\mathbb N}$ is a null sequence and each accumulation point of $\{z^k\}_{k\in\mathbb N}$ is a solution of (MNLCS).

(b)

Each accumulation point of $\{z^k\}_{k\in\mathbb N}$ is a stationary point of $\Psi_{\mathrm{FB}}$.

(c)

If an accumulation point $\bar z$ of $\{z^k\}_{k\in\mathbb N}$ satisfies the assumptions of Lemma 4.4, then the whole sequence $\{z^k\}_{k\in\mathbb N}$ converges to $\bar z$ superlinearly. If the derivatives of $G$ and $H$ are locally Lipschitz continuous functions, then the convergence is even quadratic.

Proof.

  1. We note that Algorithm 4.5 is a descent method with respect to $\Psi_{\mathrm{FB}}$ which is bounded from below by 0. Thus, the assumptions guarantee $\Psi_{\mathrm{FB}}(z^k)\to 0$. Noting that $\Psi_{\mathrm{FB}}$ is continuous, each accumulation point $\bar z$ of $\{z^k\}_{k\in\mathbb N}$ satisfies $\Psi_{\mathrm{FB}}(\bar z)=0$ in this situation, giving $F_{\mathrm{FB}}(\bar z)=0$, and this means that $\bar z$ solves (MNLCS).

  2. If the assumptions of the first statement hold, the assertion is clear. Thus, we may assume that, along the tail of the sequence, $\Psi_{\mathrm{FB}}(z^k+d^k)>\kappa\,\Psi_{\mathrm{FB}}(z^k)$ is valid. Assume without loss of generality that $\{z^k\}_{k\in\mathbb N}$ converges to some point $\bar z$. We proceed by a distinction of cases.

    If $\liminf_{k\to\infty}\|d^k\|=0$, Algorithm 4.5 automatically gives $d^k=-\nabla\Psi_{\mathrm{FB}}(z^k)$ along a subsequence (without relabelling). Taking the limit along this subsequence gives $\nabla\Psi_{\mathrm{FB}}(\bar z)=0$ by continuity of $\nabla\Psi_{\mathrm{FB}}$.

    Next, consider the case $\liminf_{k\to\infty}\alpha_k\|d^k\|>0$. Noting that $\{\Psi_{\mathrm{FB}}(z^k)\}_{k\in\mathbb N}$ is monotonically decreasing and bounded from below, this sequence converges. Furthermore, for all large enough iterations $k\in\mathbb N$, the choice of the step size and the fact that the search direction is a descent direction for $\Psi_{\mathrm{FB}}$ guarantee that
$$0\le-\alpha_k\sigma\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k\le\Psi_{\mathrm{FB}}(z^k)-\Psi_{\mathrm{FB}}(z^{k+1}).$$
Thus, since $\{\Psi_{\mathrm{FB}}(z^k)\}_{k\in\mathbb N}$ is a Cauchy sequence, we find $\alpha_k\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k\to 0$. Noting that, by construction of Algorithm 4.5, each search direction passes the angle test $\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k\le-\rho_1\,\|\nabla\Psi_{\mathrm{FB}}(z^k)\|\,\|d^k\|$ for large enough $k\in\mathbb N$, the estimate
$$\alpha_k\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k\le 0\le\rho_1\alpha_k\,\|\nabla\Psi_{\mathrm{FB}}(z^k)\|\,\|d^k\|\le-\alpha_k\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k$$
follows. Taking the limit $k\to\infty$ and noting that $\{\alpha_k\|d^k\|\}_{k\in\mathbb N}$ is bounded away from zero, $\|\nabla\Psi_{\mathrm{FB}}(z^k)\|\to 0$ follows, so $\nabla\Psi_{\mathrm{FB}}(\bar z)=0$ is obtained from continuity of $\nabla\Psi_{\mathrm{FB}}$.

    Finally, we assume that $\liminf_{k\to\infty}\|d^k\|>0$ and $\liminf_{k\to\infty}\alpha_k\|d^k\|=0$. For simplicity of notation, let $\alpha_k\|d^k\|\to 0$. Then we have $\alpha_k\to 0$ by assumption. Particularly, for large enough $k\in\mathbb N$, the step size candidate $\beta^{-1}\alpha_k$ is rejected, i.e.
$$\Psi_{\mathrm{FB}}(z^k+\beta^{-1}\alpha_k d^k)>\Psi_{\mathrm{FB}}(z^k)+\beta^{-1}\alpha_k\sigma\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k.$$
The smoothness of $\Psi_{\mathrm{FB}}$ allows for the application of the mean value theorem in order to find $\theta_k\in(0,\beta^{-1}\alpha_k)$ such that
$$\Psi_{\mathrm{FB}}(z^k+\beta^{-1}\alpha_k d^k)-\Psi_{\mathrm{FB}}(z^k)=\beta^{-1}\alpha_k\,\nabla\Psi_{\mathrm{FB}}(z^k+\theta_k d^k)^\top d^k,$$
and together with the above,
$$\nabla\Psi_{\mathrm{FB}}(z^k+\theta_k d^k)^\top d^k>\sigma\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k$$
follows. Clearly, $\alpha_k\|d^k\|\to 0$ gives $\theta_k\|d^k\|\to 0$. Noting that $\nabla\Psi_{\mathrm{FB}}$ is uniformly continuous on each closed ball around $\bar z$, for arbitrary $\epsilon>0$, we can ensure
$$0<\nabla\Psi_{\mathrm{FB}}(z^k+\theta_k d^k)^\top d^k-\sigma\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k=\bigl(\nabla\Psi_{\mathrm{FB}}(z^k+\theta_k d^k)-\nabla\Psi_{\mathrm{FB}}(z^k)\bigr)^\top d^k+(1-\sigma)\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k\le\epsilon\,\|d^k\|+(1-\sigma)\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k$$
for large enough $k\in\mathbb N$. Combining this with the validity of the angle test gives
$$\epsilon\,\|d^k\|>-(1-\sigma)\,\nabla\Psi_{\mathrm{FB}}(z^k)^\top d^k\ge(1-\sigma)\,\rho_1\,\|\nabla\Psi_{\mathrm{FB}}(z^k)\|\,\|d^k\|,$$
i.e. $(1-\sigma)\rho_1\|\nabla\Psi_{\mathrm{FB}}(z^k)\|<\epsilon$ for all large enough $k\in\mathbb N$. Since $\epsilon>0$ has been chosen arbitrarily, $\nabla\Psi_{\mathrm{FB}}(z^k)\to 0$ follows, which gives $\nabla\Psi_{\mathrm{FB}}(\bar z)=0$.

  3. Let $\{z^k\}_{k\in K}$ be a subsequence fulfilling $z^k\to_K\bar z$ for some point $\bar z$ which satisfies the assumptions of Lemma 4.4. Furthermore, we note that $F_{\max}$ is locally Lipschitzian by construction. Clearly, by $F_{\mathrm{FB}}(z^k)\to_K 0$, which holds since $\bar z$ solves $F_{\max}(z)=0$ and, thus, also $F_{\mathrm{FB}}(z)=0$, $\{\nu_k\}_{k\in K}$ is a null sequence. Due to Lemma 4.1, it holds $\nu_k\in O(\|F_{\max}(z^k)\|)$, so we know that, for sufficiently large $k\in K$, $z^k$ lies in the radius of attraction of $\bar z$ mentioned in Theorem 3.2 while $\nu_k$ is sufficiently small in order to apply Theorem 3.2 to get the desired results if the LM direction is actually accepted. This, however, follows for all large enough $k\in K$ from Corollary 3.3 since Lemma 4.1 gives
$$\frac{\Psi_{\mathrm{FB}}(z^k+d^k)}{\Psi_{\mathrm{FB}}(z^k)}=\left(\frac{\|F_{\mathrm{FB}}(z^k+d^k)\|}{\|F_{\mathrm{FB}}(z^k)\|}\right)^2\le\left(\frac{K_2}{K_1}\right)^2\left(\frac{\|F_{\max}(z^k+d^k)\|}{\|F_{\max}(z^k)\|}\right)^2\to_K 0$$
for the LM directions $\{d^k\}_{k\in K}$.

Remark 4.7

  1. Clearly, Algorithm 4.5 is a descent method with respect to $\Psi_{\mathrm{FB}}$, i.e. the sequence $\{\|F_{\mathrm{FB}}(z^k)\|\}_{k\in\mathbb N}$ is monotonically decreasing. This directly gives that $\{\nu_k\}_{k\in\mathbb N}$ is monotonically decreasing and, thus, bounded by its trivial lower boundedness.

  2. Besides the standard angle test, there is another condition in Step 7 which avoids that the LM direction is chosen if it tends to zero while the angle test is passed. This is due to the following observation. Suppose that (along a suitable subsequence without relabelling) the LM directions $\{d^k\}_{k\in\mathbb N}$ pass the angle test but tend to zero. In order to prove in Theorem 4.6 that the accumulation points of $\{z^k\}_{k\in\mathbb N}$ are stationary for $\Psi_{\mathrm{FB}}$, one can exploit boundedness of the matrices $D_N F_{\max}(z^k)^\top D_N F_{\max}(z^k)+\nu_k\,\mathbb I_p$ which would give
$$\hat M_I(\bar w,\bar\xi)^\top\begin{bmatrix}H(\bar w,\bar\xi)\\\max(G(\bar w,\bar\xi),-\bar\xi)\end{bmatrix}=0\tag{14}$$
by definition of the LM direction where $(\bar w,\bar\xi)$ is an accumulation point of $\{z^k\}_{k\in\mathbb N}$. Above, we used the matrix

    $$\hat M_I(\bar w,\bar\xi):=\begin{bmatrix}H'_w(\bar w,\bar\xi)&H'_\xi(\bar w,\bar\xi)\\ D(I^{>}(\bar w,\bar\xi)\cup I)\,G'_w(\bar w,\bar\xi)&D(I^{>}(\bar w,\bar\xi)\cup I)\,G'_\xi(\bar w,\bar\xi)-D(I^{<}(\bar w,\bar\xi)\cup I^c)\end{bmatrix}$$

    as well as the index set
$$I^{>}(\bar w,\bar\xi):=\{i\in\{1,\dots,p_2\}\,|\,G_i(\bar w,\bar\xi)>-\bar\xi_i\},$$
and the pair $(I,I^c)$ is a disjoint partition of $\{i\in\{1,\dots,p_2\}\,|\,G_i(\bar w,\bar\xi)=-\bar\xi_i\}$. We note, however, that (14) is underdetermined, so we cannot deduce $H(\bar w,\bar\xi)=0$ and $\max(G(\bar w,\bar\xi),-\bar\xi)=0$ which would give us stationarity of $\bar z:=(\bar w,\bar\xi)$ for $\Psi_{\mathrm{FB}}$. This is pretty much in contrast to the situation in [Citation9, Theorem 4.6] where, for square systems, stationarity has been shown under some additional assumptions.

  3. Following the literature, see e.g. [Citation9, Citation10], it is also possible to incorporate inexact solutions of the LM equation in Algorithm 4.5 in a canonical way. Combined with a suitable solver for this equation, this approach may lead to an immense speed-up of the method. For brevity of presentation, we omit this discussion here but just point out the possibility of investigating it.

4.2. On using the Fischer–Burmeister function exclusively

In this subsection, we briefly comment on an algorithm, related to Algorithm 4.5, which fully relies on the residual $F_{\mathrm{FB}}$ introduced in (9). For completeness, we first present a result regarding the Newton-differentiability of this function which basically follows from the chain rule stated in Lemma 2.2 and Example 2.3(c).

Lemma 4.8

On each bounded subset of $\mathbb R^{p_1}\times\mathbb R^{p_2}$, the mapping $F_{\mathrm{FB}}$ is Newton-differentiable with Newton-derivative given by
$$(w,\xi)\mapsto\begin{bmatrix}H'_w(w,\xi)&H'_\xi(w,\xi)\\ \tilde D_G(w,\xi)\,G'_w(w,\xi)&\tilde D_G(w,\xi)\,G'_\xi(w,\xi)-\tilde D_\xi(w,\xi)\end{bmatrix},$$
and this mapping, again, is bounded on bounded sets. Whenever the derivatives of $G$ and $H$ are locally Lipschitzian, the order of Newton-differentiability is 1. Above, the matrices $\tilde D_G(w,\xi),\tilde D_\xi(w,\xi)\in\mathbb R^{p_2\times p_2}$ are those defined in Lemma 4.3.

Subsequently, we will denote the Newton-derivative of $F_{\mathrm{FB}}$ characterized above by $D_N F_{\mathrm{FB}}$. Observe that, due to Lemma 4.3, we have $D_N F_{\mathrm{FB}}(z)^\top F_{\mathrm{FB}}(z)=\nabla\Psi_{\mathrm{FB}}(z)$ for each $z\in\mathbb R^p$.

We also need to figure out, in which situations the Newton-derivative from Lemma 4.8 satisfies the assumptions for local fast convergence.

Lemma 4.9

Fix a solution $(\bar w,\bar\xi)\in\mathbb R^{p_1}\times\mathbb R^{p_2}$ of (MNLCS) and assume that for each pair $(a,b)\in\mathbb R^{p_2}\times\mathbb R^{p_2}$ of vectors satisfying
$$\forall i\in I^{-}(\bar w,\bar\xi)\colon\ a_i=0,\ b_i=1,\qquad \forall i\in I^{+}(\bar w,\bar\xi)\colon\ a_i=1,\ b_i=0,\qquad \forall i\in I^{00}(\bar w,\bar\xi)\colon\ (a_i-1)^2+(b_i-1)^2=1,\tag{15}$$
the matrix
$$\begin{bmatrix}H'_w(\bar w,\bar\xi)&H'_\xi(\bar w,\bar\xi)\\ \hat D_G^a(\bar w,\bar\xi)\,G'_w(\bar w,\bar\xi)&\hat D_G^a(\bar w,\bar\xi)\,G'_\xi(\bar w,\bar\xi)-\hat D_\xi^b(\bar w,\bar\xi)\end{bmatrix},\tag{16}$$
where we used
$$\hat D_G^a(\bar w,\bar\xi):=\operatorname{diag}(a)\,D(I^{+}(\bar w,\bar\xi)\cup I^{00}(\bar w,\bar\xi)),\qquad \hat D_\xi^b(\bar w,\bar\xi):=\operatorname{diag}(b)\,D(I^{-}(\bar w,\bar\xi)\cup I^{00}(\bar w,\bar\xi)),$$
possesses linearly independent columns. Then the matrices $N(w,\xi)^\top N(w,\xi)$, where
$$N(w,\xi):=\begin{bmatrix}H'_w(w,\xi)&H'_\xi(w,\xi)\\ \tilde D_G(w,\xi)\,G'_w(w,\xi)&\tilde D_G(w,\xi)\,G'_\xi(w,\xi)-\tilde D_\xi(w,\xi)\end{bmatrix},$$
are uniformly positive definite in a neighbourhood of $(\bar w,\bar\xi)$.

Proof.

For the proof, we partially mimic our arguments from the proof of Lemma 4.4. Thus, let us suppose that the assertion is false. Then there exist sequences $\{w^k\}_{k\in\mathbb N}\subset\mathbb R^{p_1}$, $\{\xi^k\}_{k\in\mathbb N}\subset\mathbb R^{p_2}$, $\{(d_w^k,d_\xi^k)\}_{k\in\mathbb N}\subset\mathbb R^{p_1}\times\mathbb R^{p_2}$, and $\{\eta_k\}_{k\in\mathbb N}\subset[0,\infty)$ such that $w^k\to\bar w$, $\xi^k\to\bar\xi$, $\eta_k\to 0$, and, for each $k\in\mathbb N$, $\|d_w^k\|+\|d_\xi^k\|=1$ as well as
$$\eta_k=\begin{bmatrix}d_w^k\\ d_\xi^k\end{bmatrix}^\top N(w^k,\xi^k)^\top N(w^k,\xi^k)\begin{bmatrix}d_w^k\\ d_\xi^k\end{bmatrix}.$$
Again, we make use of the abbreviations from (11) and obtain, by definition of the matrix $N(w^k,\xi^k)$,
$$\begin{aligned}\eta_k={}&\|H'_w(k)d_w^k\|^2+2\,(H'_w(k)d_w^k)^\top(H'_\xi(k)d_\xi^k)+\|H'_\xi(k)d_\xi^k\|^2\\&+\sum_{i=1}^{p_2}(v^a(w^k,\xi^k))_i^2\bigl((G_i)'_w(k)d_w^k+(G_i)'_\xi(k)d_\xi^k\bigr)^2\\&-2\sum_{i=1}^{p_2}(v^a(w^k,\xi^k))_i(v^b(w^k,\xi^k))_i\bigl((G_i)'_w(k)d_w^k+(G_i)'_\xi(k)d_\xi^k\bigr)(d_\xi^k)_i\\&+\sum_{i=1}^{p_2}(v^b(w^k,\xi^k))_i^2(d_\xi^k)_i^2,\end{aligned}\tag{17}$$
where we exploited the vectors $v^a(w^k,\xi^k)$ and $v^b(w^k,\xi^k)$ defined in (10). For each $i\in I^{+}(\bar w,\bar\xi)$, $i\in I^{00}_c(w^k,\xi^k):=\{1,\dots,p_2\}\setminus I^{00}(w^k,\xi^k)$ holds for large enough $k\in\mathbb N$ by continuity of $G_i$, and we find the convergences $(v^a(w^k,\xi^k))_i\to 1$ and $(v^b(w^k,\xi^k))_i\to 0$ as $k\to\infty$. Similarly, for each $i\in I^{-}(\bar w,\bar\xi)$, we find $i\in I^{00}_c(w^k,\xi^k)$ for all large enough $k\in\mathbb N$, and we also have $(v^a(w^k,\xi^k))_i\to 0$ and $(v^b(w^k,\xi^k))_i\to 1$ in this case. It remains to consider the indices $i\in I^{00}(\bar w,\bar\xi)$. By construction, we know that the sequence $\{((v^a(w^k,\xi^k))_i,(v^b(w^k,\xi^k))_i)\}_{k\in\mathbb N}\subset\mathbb R^2$ belongs to the sphere of radius 1 around $(1,1)$ for each $k\in\mathbb N$ and, thus, possesses an accumulation point in this sphere. Thus, taking into account that $I^{00}(\bar w,\bar\xi)$ is a finite set, we find an infinite set $K\subset\mathbb N$ and vectors $a,b\in\mathbb R^{p_2}$ satisfying $v^a(w^k,\xi^k)\to_K a$, $v^b(w^k,\xi^k)\to_K b$, and (15). We may also assume for simplicity that the convergences $d_w^k\to_K d_w$ and $d_\xi^k\to_K d_\xi$ hold for a pair $(d_w,d_\xi)\in\mathbb R^{p_1}\times\mathbb R^{p_2}$ which is nonvanishing. Taking the limit $k\to_K\infty$ in (17) then gives
$$\begin{aligned}0={}&\|H'_w(\bar w,\bar\xi)d_w\|^2+2\,(H'_w(\bar w,\bar\xi)d_w)^\top(H'_\xi(\bar w,\bar\xi)d_\xi)+\|H'_\xi(\bar w,\bar\xi)d_\xi\|^2\\&+\sum_{i\in I^{+}(\bar w,\bar\xi)}\bigl((G_i)'_w(\bar w,\bar\xi)d_w+(G_i)'_\xi(\bar w,\bar\xi)d_\xi\bigr)^2+\sum_{i\in I^{-}(\bar w,\bar\xi)}(d_\xi)_i^2\\&+\sum_{i\in I^{00}(\bar w,\bar\xi)}\bigl(a_i((G_i)'_w(\bar w,\bar\xi)d_w+(G_i)'_\xi(\bar w,\bar\xi)d_\xi)-b_i(d_\xi)_i\bigr)^2,\end{aligned}$$
which, similar as in the proof of Lemma 4.4, implies that $(d_w,d_\xi)$ belongs to the kernel of the matrix from (16). This, however, contradicts the lemma's assumptions.

We note that the assumption of Lemma 4.9, which corresponds to the full column rank of all matrices in Bouligand's generalized Jacobian of $F_{\mathrm{FB}}$ at the reference point, i.e. BD-regularity, is more restrictive than the one from Lemma 4.4. Indeed, if the assumption of Lemma 4.9 holds, then one can choose the vectors $a,b\in\mathbb R^{p_2}$ such that
$$\forall i\in\{1,\dots,p_2\}\colon\quad a_i:=\begin{cases}0& i\in I^{-}(\bar w,\bar\xi)\cup I^c,\\ 1& i\in I^{+}(\bar w,\bar\xi)\cup I,\end{cases}\qquad b_i:=\begin{cases}0& i\in I^{+}(\bar w,\bar\xi)\cup I,\\ 1& i\in I^{-}(\bar w,\bar\xi)\cup I^c\end{cases}$$
for arbitrary $I\subset I^{00}(\bar w,\bar\xi)$ and $I^c:=I^{00}(\bar w,\bar\xi)\setminus I$ in order to validate the assumption of Lemma 4.4. Clearly, the assumptions of Lemmas 4.4 and 4.9 coincide whenever the biactive set $I^{00}(\bar w,\bar\xi)$ is empty. This situation is called strict complementarity in the literature.

The subsequent example shows that the assumptions of Lemma 4.9 can be strictly stronger than those of Lemma 4.4.

Example 4.10

Let us consider the mixed linear complementarity system
$$w-\xi=0,\qquad w\le 0,\ \xi\ge 0,\ w\,\xi=0,$$
which possesses the uniquely determined solution $(\bar w,\bar\xi):=(0,0)$. In the context of this example, the functions $G,H\colon\mathbb R\times\mathbb R\to\mathbb R$ are given by
$$\forall(w,\xi)\in\mathbb R\times\mathbb R\colon\quad G(w,\xi):=w,\qquad H(w,\xi):=w-\xi.$$
Clearly, the assumption of Lemma 4.4 holds since the matrices
$$\begin{bmatrix}1&-1\\1&0\end{bmatrix},\qquad \begin{bmatrix}1&-1\\0&-1\end{bmatrix}$$
possess full column rank 2. However, the matrix
$$\begin{bmatrix}1&-1\\a&-b\end{bmatrix}$$
possesses column rank 1 for $a:=b:=1+\sqrt2/2$, i.e. the assumptions of Lemma 4.9 are violated.
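A quick numerical confirmation of the rank claims in Example 4.10, based on the matrices as reconstructed above, can be carried out with NumPy:

```python
import numpy as np

a = b = 1 + np.sqrt(2) / 2
# Matrices from Lemma 4.4 for I = {1} and I = {} (empty), and the matrix (16) from Lemma 4.9:
M_full = np.array([[1.0, -1.0], [1.0, 0.0]])
M_empty = np.array([[1.0, -1.0], [0.0, -1.0]])
N_ab = np.array([[1.0, -1.0], [a, -b]])

print(np.linalg.matrix_rank(M_full), np.linalg.matrix_rank(M_empty))  # 2 2 -> Lemma 4.4 applies
print(np.linalg.matrix_rank(N_ab))                                    # 1   -> Lemma 4.9 fails
```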

From the viewpoint of semismooth solution methods, it has also been observed in [Citation9, Propositions 2.8 and 2.10, Example 2.1] that the mixed approach via $F_{\max}$ needs less restrictive assumptions than the one via $F_{\mathrm{FB}}$ in order to yield local fast convergence.

Next, we state the globalized nonsmooth LM method for the numerical solution of (MNLCS) via exclusive use of $F_{\mathrm{FB}}$ from (9) in Algorithm 4.11.

Below, we formulate a convergence result which addresses Algorithm 4.11.

Theorem 4.12

Let {zk}kN be a sequence generated by Algorithm 4.11.

(a)

If $\Psi_{\mathrm{FB}}(z^k+d^k)\le\kappa\,\Psi_{\mathrm{FB}}(z^k)$ holds infinitely many times in Step 4, then $\{\Psi_{\mathrm{FB}}(z^k)\}_{k\in\mathbb N}$ is a null sequence and each accumulation point of $\{z^k\}_{k\in\mathbb N}$ is a solution of (MNLCS).

(b)

Each accumulation point of $\{z^k\}_{k\in\mathbb N}$ is a stationary point of $\Psi_{\mathrm{FB}}$.

(c)

If an accumulation point $\bar z$ of $\{z^k\}_{k\in\mathbb N}$ satisfies the assumptions of Lemma 4.9, then the whole sequence $\{z^k\}_{k\in\mathbb N}$ converges to $\bar z$ superlinearly. If the derivatives of $G$ and $H$ are locally Lipschitz continuous functions, then the convergence is even quadratic.

Proof.

The only major difference to the proof of Theorem 4.6 concerns the second statement. More precisely, we need to show that, if the ratio test in Step 4 is violated along the tail of the sequence, and if $d^k\to 0$ along a subsequence (without relabelling) while $z^k\to\bar z$ for some $\bar z$, then $\bar z$ is stationary for $\Psi_{\mathrm{FB}}$. As in the proof of Theorem 4.6, this is clear if $d^k=-\nabla\Psi_{\mathrm{FB}}(z^k)$ holds infinitely many times. Thus, without loss of generality, let us assume that $d^k$ is the LM direction for all $k\in\mathbb N$, i.e. the uniquely determined solution of (18). Boundedness of $\{\nu_k\}_{k\in\mathbb N}$ together with Lemma 4.8 gives boundedness of the matrices $\{D_N F_{\mathrm{FB}}(z^k)^\top D_N F_{\mathrm{FB}}(z^k)+\nu_k\,\mathbb I_p\}_{k\in\mathbb N}$. Thus, $d^k\to 0$ gives $\nabla\Psi_{\mathrm{FB}}(\bar z)=0$ by continuity of $\nabla\Psi_{\mathrm{FB}}$ and definition of $d^k$ in (18).

5. Applications in optimistic bilevel optimization

5.1. Model problem and optimality conditions

We consider the so-called standard optimistic bilevel optimization problem
$$\min_{x,y}\ F(x,y)\quad\text{s.t.}\quad G(x,y)\le 0,\ y\in S(x),\tag{OBPP}$$
where $S\colon\mathbb R^n\rightrightarrows\mathbb R^m$ is given by
$$\forall x\in\mathbb R^n\colon\quad S(x):=\operatorname*{argmin}_y\{f(x,y)\,|\,y\in Y(x)\}.\tag{19}$$
The terminus 'standard' has been coined in [Citation46]. Above, $F,f\colon\mathbb R^n\times\mathbb R^m\to\mathbb R$ are referred to as the upper- and lower-level objective function, respectively, and assumed to be twice continuously differentiable. Furthermore, the mapping $G\colon\mathbb R^n\times\mathbb R^m\to\mathbb R^s$ is the twice continuously differentiable upper-level constraint function, and the set-valued mapping $Y\colon\mathbb R^n\rightrightarrows\mathbb R^m$ is given by
$$\forall x\in\mathbb R^n\colon\quad Y(x):=\{y\in\mathbb R^m\,|\,g(x,y)\le 0\},$$
where the describing lower-level constraint function $g\colon\mathbb R^n\times\mathbb R^m\to\mathbb R^t$ is also assumed to be twice continuously differentiable. The component functions of $g$ will be denoted by $g_1,\dots,g_t$.

We consider the so-called lower-level value function reformulation
$$\min_{x,y}\ F(x,y)\quad\text{s.t.}\quad G(x,y)\le 0,\ g(x,y)\le 0,\ f(x,y)-\varphi(x)\le 0,\tag{VFR}$$
where $\varphi\colon\mathbb R^n\to\overline{\mathbb R}:=\mathbb R\cup\{\pm\infty\}$ is the lower-level value function given by
$$\forall x\in\mathbb R^n\colon\quad \varphi(x):=\inf_y\{f(x,y)\,|\,y\in Y(x)\}.\tag{20}$$
It is well known that (OBPP) and (VFR) possess the same local and global minimizers.

Let us fix a feasible point $(\bar x,\bar y)\in\mathbb R^n\times\mathbb R^m$ of (OBPP). Under mild assumptions, one can show that the lower-level value function $\varphi$ is locally Lipschitz continuous in a neighbourhood of $\bar x$, and its Clarke subdifferential, see [Citation47], obeys the upper estimate
$$\partial^c\varphi(\bar x)\subset\bigl\{\nabla_x f(\bar x,\bar y)+g'_x(\bar x,\bar y)^\top\hat\nu\,\big|\,\hat\nu\in\Lambda(\bar x,\bar y)\bigr\},\tag{21}$$
where $\Lambda(\bar x,\bar y)$ is the lower-level Lagrange multiplier set which comprises all vectors $\hat\nu\in\mathbb R^t$ such that
$$\nabla_y f(\bar x,\bar y)+g'_y(\bar x,\bar y)^\top\hat\nu=0,\qquad \hat\nu\ge 0,\ g(\bar x,\bar y)\le 0,\ \hat\nu^\top g(\bar x,\bar y)=0.$$
Let us now assume that $(\bar x,\bar y)$ is already a local minimizer of (OBPP) and, thus, of (VFR) as well. Again, under some additional assumptions, $(\bar x,\bar y)$ is a (nonsmooth) stationary point of (VFR) (in Clarke's sense). Keeping the estimate (21) in mind while noting that, by definition of the lower-level value function, $f(\bar x,\bar y)=\varphi(\bar x)$ is valid, this amounts to the existence of $\mu\in\mathbb R^s$, $\nu,\hat\nu\in\mathbb R^t$, and $\lambda\in\mathbb R$ such that
$$\nabla F(\bar x,\bar y)+G'(\bar x,\bar y)^\top\mu+g'(\bar x,\bar y)^\top(\nu-\lambda\hat\nu)=0,\tag{22a}$$
$$\nabla_y f(\bar x,\bar y)+g'_y(\bar x,\bar y)^\top\hat\nu=0,\tag{22b}$$
$$\mu\ge 0,\ G(\bar x,\bar y)\le 0,\ \mu^\top G(\bar x,\bar y)=0,\tag{22c}$$
$$\nu\ge 0,\ g(\bar x,\bar y)\le 0,\ \nu^\top g(\bar x,\bar y)=0,\tag{22d}$$
$$\hat\nu\ge 0,\ g(\bar x,\bar y)\le 0,\ \hat\nu^\top g(\bar x,\bar y)=0,\tag{22e}$$
$$\lambda\ge 0.\tag{22f}$$
In the subsequently stated lemma, we postulate some conditions which ensure that local minimizers of (OBPP) indeed are stationary in the sense that there exist multipliers which solve the system (22). Related results can be found e.g. in [Citation48–50]. As we work with slightly different constraint qualifications, we provide a proof for the convenience of the reader.

Lemma 5.1

Fix a local minimizer $(\bar x,\bar y)\in\mathbb R^n\times\mathbb R^m$ of (OBPP) such that $\bar x$ is an interior point of $\operatorname{dom}S$. The fulfillment of the following conditions implies that there are multipliers $\mu\in\mathbb R^s$, $\nu,\hat\nu\in\mathbb R^t$, and $\lambda\in\mathbb R$ which solve the system (22).

(a)

The functions $f$ and $g_1,\dots,g_t$ are convex in $(x,y)$.

(b)

Either the functions $f$ and $g$ are affine in $(x,y)$ or the lower-level Mangasarian–Fromovitz constraint qualification (LMFCQ)
$$g'_y(\bar x,\bar y)^\top\hat\nu=0,\ \hat\nu\ge 0,\ \hat\nu^\top g(\bar x,\bar y)=0\ \Longrightarrow\ \hat\nu=0$$
holds.

(c)

The set-valued mapping $\Phi\colon\mathbb R\times\mathbb R^s\rightrightarrows\mathbb R^n\times\mathbb R^m$ given by
$$\forall(r,u)\in\mathbb R\times\mathbb R^s\colon\quad \Phi(r,u):=\{(x,y)\in\operatorname{gph}Y\,|\,f(x,y)-\varphi(x)\le r,\ G(x,y)\le u\}$$
is calm at $((0,0),(\bar x,\bar y))$.

Proof.

The imposed convexity assumptions on the lower-level data functions guarantee that $\varphi$ is convex, see e.g. [Citation51, Lemma 2.1]. Given that $\bar x$ is an interior point of $\operatorname{dom}S$, we have $|\varphi(x)|<\infty$ for all $x$ in some neighbourhood of $\bar x$, and therefore $\varphi$ is Lipschitz continuous around $\bar x$. Thus, (VFR) is a Lipschitz optimization problem around the point $(\bar x,\bar y)$. Hence, it follows from [Citation52, Theorem 4.1] and [Citation53, Theorem 6.12] that the calmness of $\Phi$ at $((0,0),(\bar x,\bar y))$ yields the existence of $\mu\in\mathbb R^s$, $\lambda\in\mathbb R$, $\vartheta\in\partial^c\varphi(\bar x)$, and $\upsilon\in N_{\operatorname{gph}Y}(\bar x,\bar y)$ such that condition (22c) holds together with
$$\nabla F(\bar x,\bar y)+G'(\bar x,\bar y)^\top\mu+\lambda\bigl(\nabla f(\bar x,\bar y)-(\vartheta,0)\bigr)+\upsilon=0,\tag{23a}$$
$$\lambda\ge 0,\ f(\bar x,\bar y)-\varphi(\bar x)\le 0,\ \lambda\,(f(\bar x,\bar y)-\varphi(\bar x))=0.\tag{23b}$$
Above, $N_{\operatorname{gph}Y}(\bar x,\bar y)$ denotes the normal cone, in the sense of convex analysis, to the graph of $Y$ (which is a convex set under the assumptions made) at the point $(\bar x,\bar y)$. Due to the validity of LMFCQ or the fact that $g_1,\dots,g_t$ are affine, there exists some $\nu\in\mathbb R^t$ satisfying (22d) such that $\upsilon=g'(\bar x,\bar y)^\top\nu$, see e.g. [Citation53, Theorem 6.14] for the nonlinear case (in the linear case, this is a consequence of the well-known Farkas lemma). Furthermore, combining the fulfillment of LMFCQ with the full convexity of the lower-level data functions implies that inclusion (21) holds, see e.g. [Citation51, Theorem 2.1]. On the other hand, if $f$ and $g$ are affine, then (21) holds due to [Citation54, Proposition 4.1]. In both situations, we can find some $\hat\nu\in\mathbb R^t$ such that (22b), (22e), and $\vartheta=\nabla_x f(\bar x,\bar y)+g'_x(\bar x,\bar y)^\top\hat\nu$ are satisfied. Plugging this information into (23a) gives
$$\nabla_x F(\bar x,\bar y)+G'_x(\bar x,\bar y)^\top\mu+g'_x(\bar x,\bar y)^\top(\nu-\lambda\hat\nu)=0,\qquad \nabla_y F(\bar x,\bar y)+G'_y(\bar x,\bar y)^\top\mu+\lambda\nabla_y f(\bar x,\bar y)+g'_y(\bar x,\bar y)^\top\nu=0.$$
Now, making use of (22b) yields (22). Finally, it remains to show that (23b) reduces to (22f). This, however, is obvious as $f(\bar x,\bar y)=\varphi(\bar x)$ holds due to $\bar y\in S(\bar x)$.

Remark 5.2

  1. Note that assumption (a) in Lemma 5.1 can be replaced by so-called inner semicontinuity of the lower-level optimal solution mapping $S$ from (19) at $(\bar x,\bar y)$ as this, together with LMFCQ, still yields validity of the estimate (21) although $\varphi$ is likely to be not convex in this situation, see e.g. [Citation50, Corollary 5.3, Theorem 6.1].

  2. We note that assumption (c) is potentially weaker than the standard conditions used in the literature which combine a so-called partial calmness condition, see [Citation55], with MFCQ-type conditions with respect to the upper- and lower-level inequality constraints. On the one hand, we admit that a calmness-type assumption on the constraint including the lower-level value function is comparatively restrictive, see e.g. [Citation56–58] for discussions, so that (22) can be seen as a reliable necessary optimality condition for (OBPP) in selected situations only. On the other hand, from a numerical perspective, the computational solution of the system (22) turned out to be surprisingly effective in order to determine minimizers of optimistic bilevel optimization problems, see e.g. [Citation41, Citation42, Citation59], and this observation covers situations where the assumptions of Lemma 5.1 are not necessarily satisfied.

  3. In the case where the functions $f$ and $g$ are affine in $(x,y)$, the associated lower-level value function $\varphi$ is piecewise affine. Hence, whenever $G$ is affine as well, the set-valued mapping $\Phi$ from assumption (c) is so-called polyhedral, i.e. its graph is the union of finitely many convex polyhedral sets. It is well known that such mappings are inherently calm at all points of their graph, see [Citation60, Proposition 1]. Thus, assumptions (a), (b), and (c) are satisfied in this situation.

  4. A second standard approach to optimality conditions for bilevel optimization problems is based on its so-called Karush–Kuhn–Tucker reformulation, but as shown in [Citation59], both approaches are, in general, completely different when comparing the resulting optimality conditions and associated constraint qualifications. Optimality conditions which are based on the reformulation (VFR) merely discriminate from each other due to different upper estimates for the subdifferential of the optimal value function and the associated constraint qualifications which ensure their validity. For a detailed comparison, we refer the interested reader to [Citation48, Citation49, Citation61].

Let us now transfer the necessary optimality conditions from (22) into a mixed nonlinear complementarity system. In order to do so in a reasonable way, we need to comment on the role of the appearing real number $\lambda$. Therefore, let us mention again that system (22) is nothing else but the (nonsmooth) Karush–Kuhn–Tucker system of (VFR) where the (Clarke) subdifferential of the implicitly known function $\varphi$ has been approximated from above by initial problem data in terms of (21). Having this in mind, there are at least two possible interpretations of the meaning behind $\lambda$. On the one hand, it may represent the Lagrange multiplier associated with the constraint $f(x,y)-\varphi(x)\le 0$ in (VFR). Clearly, in order to incorporate optimality for the lower-level problem into the system (22), the multiplier $\hat\nu$ characterized in (22b), (22e), has to be meaningful, i.e. $\lambda$ has to be positive. Similarly, we can interpret $\lambda$ as a partial penalty parameter which provides local optimality of $(\bar x,\bar y)$ for
$$\min_{x,y}\ F(x,y)+\lambda\,(f(x,y)-\varphi(x))\quad\text{s.t.}\quad G(x,y)\le 0,\ g(x,y)\le 0,$$
whose (nonsmooth) Karush–Kuhn–Tucker system reduces to (22) under mild assumptions, and this is the fundamental idea behind the aforementioned concept of partial calmness from [Citation55]. We note that, whenever a feasible point $(\bar x,\bar y)\in\mathbb R^n\times\mathbb R^m$ of (OBPP) is a local minimizer of the above partially penalized problem for some $\lambda$, then this also holds for each larger value of this parameter. Similarly, the case $\lambda=0$ would not be reasonable here as this means that lower-level optimality is not a restriction at all. Summing up these considerations, we may work with $\lambda>0$.

Subsequently, we will introduce some potential approaches for the reformulation of (Equation22) as a mixed nonlinear complementarity system.

5.1.1. Parametric approach

Following the ideas in [Citation42], we suppose that λ>0 is not a variable in the system (Equation22), but a given parameter which has to be chosen before the system is solved. Although this approach is challenging due to the obvious difficulty of choosing an appropriate value for λ, it comes along with some theoretical advantages as we will see below.

For a compact notation, let us introduce the block variables
\[
w:=\begin{bmatrix}x\\ y\end{bmatrix}\in\mathbb R^{n+m},\qquad
\xi:=\begin{bmatrix}\mu\\ \nu\\ \hat\nu\end{bmatrix}\in\mathbb R^{s+2t}
\]
as well as, for some fixed λ > 0, functions $L^{\mathrm{op}}_\lambda,\ell^{\mathrm{op}}\colon\mathbb R^{n+m}\times\mathbb R^{s+2t}\to\mathbb R$ of Lagrangian type given by
\[
\forall w\in\mathbb R^{n+m},\,\forall\xi\in\mathbb R^{s+2t}:\quad
L^{\mathrm{op}}_\lambda(w,\xi):=F(x,y)+\mu^\top G(x,y)+(\nu-\lambda\hat\nu)^\top g(x,y),\qquad
\ell^{\mathrm{op}}(w,\xi):=f(x,y)+\hat\nu^\top g(x,y).
\]
Setting
\[
\mathcal G(w,\xi):=\begin{bmatrix}G(x,y)\\ g(x,y)\\ g(x,y)\end{bmatrix},\qquad
\mathcal H(w,\xi):=\begin{bmatrix}\nabla_w L^{\mathrm{op}}_\lambda(w,\xi)\\ \nabla_y\ell^{\mathrm{op}}(w,\xi)\end{bmatrix},
\tag{24}
\]
the solutions of the associated nonlinear complementarity system (EquationMNLCS) are precisely the solutions of (Equation22) with a priori given λ. We note that $\mathcal G$ does not depend on the multipliers in this setting.
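To indicate how the setting (Equation24) can be evaluated in practice, the following Python sketch assembles the corresponding residual for a small, purely hypothetical toy instance. It is not the implementation used in our experiments; in particular, the data functions, the complementarity convention ($\xi\ge 0$, $\mathcal G(w,\xi)\le 0$, $\xi^\top\mathcal G(w,\xi)=0$), and the pairing of the Fischer–Burmeister function with $(\xi_i,-\mathcal G_i(w,\xi))$ are assumptions made only for this illustration.

```python
# Minimal sketch (not the authors' code) of the parametric setting (24) for a toy
# instance with n = m = 1 and s = t = 1.  We assume that the mixed nonlinear
# complementarity system requires H(w, xi) = 0, xi >= 0, G(w, xi) <= 0 and
# xi_i * G_i(w, xi) = 0, and that the FB residual pairs xi_i with -G_i(w, xi).
import numpy as np

# toy data: F(x,y) = (x-1)^2 + (y-2)^2, f(x,y) = (y-x)^2,
#           G(x,y) = -x <= 0,            g(x,y) = y - x <= 0
def residual_para(w, xi, lam):
    x, y = w
    mu, nu, nu_hat = xi
    dF = np.array([2.0 * (x - 1.0), 2.0 * (y - 2.0)])     # gradient of F
    df_y = 2.0 * (y - x)                                   # y-gradient of f
    JG = np.array([[-1.0, 0.0]])                           # Jacobian of G
    Jg = np.array([[-1.0, 1.0]])                           # Jacobian of g
    # H(w, xi) = (grad_w L_lam^op, grad_y ell^op), cf. (24)
    grad_L = dF + JG.T @ np.array([mu]) + Jg.T @ np.array([nu - lam * nu_hat])
    grad_ell = np.array([df_y + Jg[0, 1] * nu_hat])
    # calligraphic G(w, xi) = (G(x,y), g(x,y), g(x,y)), paired with (mu, nu, nu_hat)
    cG = np.array([-x, y - x, y - x])
    fb = xi - cG - np.sqrt(xi ** 2 + cG ** 2)              # phi_FB(xi_i, -G_i)
    return np.concatenate([grad_L, grad_ell, fb])

print(residual_para(np.array([1.0, 1.0]), np.array([0.0, 0.0, 1.0]), lam=0.5))
```

A Levenberg–Marquardt step would then be computed from a (generalized) Jacobian of this stacked residual; the precise construction is the one described in Section 4.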

Let us now state some assumptions which guarantee local fast convergence of Algorithms 4.5 and 4.11 when applied to the above setting. Therefore, we first need to fix a point $(w,\xi)=((x,y),(\mu,\nu,\hat\nu))\in\mathbb R^{n+m}\times\mathbb R^{s+2t}$ which satisfies the stationarity conditions (Equation22) with given λ > 0. For such a point, we introduce the following index sets:
\[
\begin{aligned}
I^0_G(x,y)&:=\{i\in\{1,\dots,s\}\,|\,G_i(x,y)=0\}, &
I^-_G(x,y)&:=\{i\in\{1,\dots,s\}\,|\,G_i(x,y)<0\},\\
I^+_G(x,y,\mu)&:=\{i\in I^0_G(x,y)\,|\,\mu_i>0\}, &
I^{00}_G(x,y,\mu)&:=\{i\in I^0_G(x,y)\,|\,\mu_i=0\},\\
I^0_g(x,y)&:=\{i\in\{1,\dots,t\}\,|\,g_i(x,y)=0\}, &
I^-_g(x,y)&:=\{i\in\{1,\dots,t\}\,|\,g_i(x,y)<0\},\\
I^+_g(x,y,\nu)&:=\{i\in I^0_g(x,y)\,|\,\nu_i>0\}, &
I^{00}_g(x,y,\nu)&:=\{i\in I^0_g(x,y)\,|\,\nu_i=0\}.
\end{aligned}
\]
Furthermore, we make use of the so-called critical subspace defined by
\[
\mathcal C(w,\xi):=\bigl\{\delta w\in\mathbb R^{n+m}\,\big|\,
\nabla G_i(x,y)^\top\delta w=0\ (i\in I^+_G(x,y,\mu)),\ \
\nabla g_i(x,y)^\top\delta w=0\ (i\in I^+_g(x,y,\nu)\cup I^+_g(x,y,\hat\nu))\bigr\}.
\]
We say that the lower-level linear independence constraint qualification (LLICQ) holds at (x,y) whenever the gradients
\[
\nabla_y g_i(x,y)\quad(i\in I^0_g(x,y))
\]
are linearly independent (note that, at the lower-level stage, only y is a variable). Analogously, the bilevel linear independence constraint qualification (BLICQ) is said to hold at (x,y) whenever the gradients
\[
\nabla G_i(x,y)\quad(i\in I^0_G(x,y)),\qquad \nabla g_i(x,y)\quad(i\in I^0_g(x,y))
\]
are linearly independent.
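The index sets above are straightforward to evaluate numerically; the following sketch indicates one possible way to do so, with a hypothetical tolerance replacing exact comparisons.

```python
# Illustrative sketch: computing the index sets used here for a given primal-dual
# point; tol mimics exact comparisons up to rounding (hypothetical tolerance).
import numpy as np

def index_sets(c_vals, multipliers, tol=1e-10):
    """c_vals: values of the constraints at (x, y); multipliers: associated multipliers.
    Returns the active set I^0, the inactive set I^-, and the partition I^+ / I^00 of I^0."""
    I0  = [i for i, ci in enumerate(c_vals) if abs(ci) <= tol]        # active
    Im  = [i for i, ci in enumerate(c_vals) if ci < -tol]             # inactive
    Ip  = [i for i in I0 if multipliers[i] > tol]                     # active, positive multiplier
    I00 = [i for i in I0 if abs(multipliers[i]) <= tol]               # active, zero multiplier (biactive)
    return I0, Im, Ip, I00

# example: two upper-level constraints G(x,y) = (0, -1.3) with mu = (0.7, 0)
print(index_sets(np.array([0.0, -1.3]), np.array([0.7, 0.0])))
```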

The following two theorems are inspired by [Citation41, Theorem 3.3]. Our first result provides conditions which guarantee validity of the assumptions of Lemma 4.4 in the setting (Equation24). These assumptions, thus, give local fast convergence of Algorithm 4.5 in the present setting.

Theorem 5.3

Let $(w,\xi)=((x,y),(\mu,\nu,\hat\nu))\in\mathbb R^{n+m}\times\mathbb R^{s+2t}$ be a solution of the stationarity system (Equation22) with given λ > 0. Let LLICQ and BLICQ be satisfied at (x,y). Finally, assume that the second-order condition
\[
\forall\,\delta w\in\mathcal C(w,\xi)\setminus\{0\}:\quad
\delta w^\top\nabla^2_{ww}L^{\mathrm{op}}_\lambda(w,\xi)\,\delta w>0
\tag{25}
\]
holds. Then, in the specific setting modelled in (Equation24), the assumptions of Lemma 4.4 are valid.

Proof.

For arbitrary sets $I_G\subseteq I^{00}_G(x,y,\mu)$, $I_g\subseteq I^{00}_g(x,y,\nu)$, and $\hat I_g\subseteq I^{00}_g(x,y,\hat\nu)$, we need to show that the system comprising
\[
\nabla^2_{ww}L^{\mathrm{op}}_\lambda(w,\xi)\,\delta w+G'(x,y)^\top\delta\mu+g'(x,y)^\top(\delta\nu-\lambda\,\delta\hat\nu)=0,
\tag{26a}
\]
\[
\bigl(\nabla_y\ell^{\mathrm{op}}\bigr)'_w(w,\xi)\,\delta w+g'_y(x,y)^\top\delta\hat\nu=0
\tag{26b}
\]
as well as the sign conditions
\[
\begin{aligned}
\nabla G_i(x,y)^\top\delta w&=0 && i\in I^+_G(x,y,\mu)\cup I_G, &&\text{(27a)}\\
\nabla g_i(x,y)^\top\delta w&=0 && i\in I^+_g(x,y,\nu)\cup I^+_g(x,y,\hat\nu)\cup I_g\cup\hat I_g, &&\text{(27b)}\\
\delta\mu_i&=0 && i\in I^-_G(x,y)\cup(I_G)^{\mathrm c}, &&\text{(27c)}\\
\delta\nu_i&=0 && i\in I^-_g(x,y)\cup(I_g)^{\mathrm c}, &&\text{(27d)}\\
\delta\hat\nu_i&=0 && i\in I^-_g(x,y)\cup(\hat I_g)^{\mathrm c} &&\text{(27e)}
\end{aligned}
\]
only possesses the trivial solution. Above, we used $(I_G)^{\mathrm c}:=I^{00}_G(x,y,\mu)\setminus I_G$, $(I_g)^{\mathrm c}:=I^{00}_g(x,y,\nu)\setminus I_g$, and $(\hat I_g)^{\mathrm c}:=I^{00}_g(x,y,\hat\nu)\setminus\hat I_g$.

Multiplying (Equation26a) from the left with $\delta w^\top$ and respecting (Equation27a) gives $\delta w^\top\nabla^2_{ww}L^{\mathrm{op}}_\lambda(w,\xi)\,\delta w=0$. On the other hand, (Equation27a) and (Equation27b) yield $\delta w\in\mathcal C(w,\xi)$. Hence, the assumptions of the theorem can be used to find $\delta w=0$. Now, we can exploit LLICQ in order to obtain $\delta\hat\nu=0$ from (Equation26b) and (Equation27e). Finally, with the aid of BLICQ, (Equation26a), (Equation27c), and (Equation27d), we find $\delta\mu=0$ and $\delta\nu=0$. This shows the claim.

Let us note from the proof that Theorem 5.3 remains correct whenever (Equation25) is replaced by the slightly weaker condition
\[
\forall\,\delta w\in\mathcal C(w,\xi)\setminus\{0\}:\quad
\delta w^\top\nabla^2_{ww}L^{\mathrm{op}}_\lambda(w,\xi)\,\delta w\neq 0.
\tag{28}
\]
However, (Equation25) is related to second-order sufficient optimality conditions for the characterization of strict local minimizers of (EquationOBPP), see e.g. [Citation62], and as we aim to find local minimizers of (EquationOBPP), this seems to be a more natural assumption than (Equation28), which seemingly also covers saddle points of $L^{\mathrm{op}}_\lambda$.

Under slightly stronger conditions, we can prove that even the assumptions of Lemma 4.9 hold in the precise setting from (Equation24) which, in turn, guarantee local fast convergence of Algorithm 4.11 in the present setting.

Theorem 5.4

Let $(w,\xi)=((x,y),(\mu,\nu,\hat\nu))\in\mathbb R^{n+m}\times\mathbb R^{s+2t}$ be a solution of the stationarity system (Equation22) with given λ > 0. Let LLICQ and BLICQ be satisfied at (x,y), and let $I^{00}_g(x,y,\hat\nu)\subseteq I^+_g(x,y,\nu)$ hold. Finally, assume that the second-order condition (Equation25) holds. Then, in the specific setting modelled in (Equation24), the assumptions of Lemma 4.9 are valid.

Proof.

For each $i\in I^{00}_G(x,y,\mu)$, let $(a^G_i,b^G_i)\in\mathbb R^2$ satisfy
\[
(a^G_i-1)^2+(b^G_i-1)^2=1.
\]
Similarly, for each $i\in I^{00}_g(x,y,\nu)$ ($i\in I^{00}_g(x,y,\hat\nu)$), let $(a^g_i,b^g_i)\in\mathbb R^2$ ($(\hat a^g_i,\hat b^g_i)\in\mathbb R^2$) satisfy
\[
(a^g_i-1)^2+(b^g_i-1)^2=1\qquad\bigl((\hat a^g_i-1)^2+(\hat b^g_i-1)^2=1\bigr).
\]
We need to show that the system comprising the conditions from (Equation26) as well as the sign conditions
\[
\begin{aligned}
\nabla G_i(x,y)^\top\delta w&=0 && i\in I^+_G(x,y,\mu), &&\text{(29a)}\\
\nabla g_i(x,y)^\top\delta w&=0 && i\in I^+_g(x,y,\nu)\cup I^+_g(x,y,\hat\nu), &&\text{(29b)}\\
a^G_i\,\nabla G_i(x,y)^\top\delta w-b^G_i\,\delta\mu_i&=0 && i\in I^{00}_G(x,y,\mu), &&\text{(29c)}\\
a^g_i\,\nabla g_i(x,y)^\top\delta w-b^g_i\,\delta\nu_i&=0 && i\in I^{00}_g(x,y,\nu), &&\text{(29d)}\\
\hat a^g_i\,\nabla g_i(x,y)^\top\delta w-\hat b^g_i\,\delta\hat\nu_i&=0 && i\in I^{00}_g(x,y,\hat\nu), &&\text{(29e)}\\
\delta\mu_i&=0 && i\in I^-_G(x,y), &&\text{(29f)}\\
\delta\nu_i=\delta\hat\nu_i&=0 && i\in I^-_g(x,y) &&\text{(29g)}
\end{aligned}
\]
only possesses the trivial solution.

For later use, we introduce index sets $P^G_{10},P^G_{01},P^G_+\subseteq I^{00}_G(x,y,\mu)$ by means of
\[
P^G_{10}:=\{i\in I^{00}_G(x,y,\mu)\,|\,a^G_i=1,\ b^G_i=0\},\qquad
P^G_{01}:=\{i\in I^{00}_G(x,y,\mu)\,|\,a^G_i=0,\ b^G_i=1\},\qquad
P^G_+:=I^{00}_G(x,y,\mu)\setminus(P^G_{10}\cup P^G_{01}).
\]
Let us note that $a^G_i,b^G_i>0$ holds for each $i\in P^G_+$, which gives $\nabla G_i(x,y)^\top\delta w=(b^G_i/a^G_i)\,\delta\mu_i$ by (Equation29c). Furthermore, for each $i\in P^G_{10}$, we have $\nabla G_i(x,y)^\top\delta w=0$, while $\delta\mu_i=0$ is valid for each $i\in P^G_{01}$. Let the index sets $P^g_{10},P^g_{01},P^g_+\subseteq I^{00}_g(x,y,\nu)$ and $\hat P^g_{10},\hat P^g_{01},\hat P^g_+\subseteq I^{00}_g(x,y,\hat\nu)$ be defined in analogous fashion. Note that for each $i\in\hat P^g_+\subseteq I^{00}_g(x,y,\hat\nu)\subseteq I^+_g(x,y,\nu)$, (Equation29b) and (Equation29e) give $(\hat b^g_i/\hat a^g_i)\,\delta\hat\nu_i=\nabla g_i(x,y)^\top\delta w=0$.

Clearly, (Equation29a) and (Equation29b) give $\delta w\in\mathcal C(w,\xi)$. Multiplying (Equation26a) from the left with $\delta w^\top$ while respecting (Equation25), (Equation29a), and the above discussion gives
\[
0\le\delta w^\top\nabla^2_{ww}L^{\mathrm{op}}_\lambda(w,\xi)\,\delta w
=-\sum_{i\in P^G_+}\frac{b^G_i}{a^G_i}(\delta\mu_i)^2
 -\sum_{i\in P^g_+}\frac{b^g_i}{a^g_i}(\delta\nu_i)^2
 +\lambda\sum_{i\in\hat P^g_+}\frac{\hat b^g_i}{\hat a^g_i}(\delta\hat\nu_i)^2
=-\sum_{i\in P^G_+}\frac{b^G_i}{a^G_i}(\delta\mu_i)^2
 -\sum_{i\in P^g_+}\frac{b^g_i}{a^g_i}(\delta\nu_i)^2\le 0.
\]
Thus, we have $\delta w^\top\nabla^2_{ww}L^{\mathrm{op}}_\lambda(w,\xi)\,\delta w=0$, so (Equation25) yields $\delta w=0$. Now, we can argue as in the proof of Theorem 5.3 in order to obtain $\delta\mu=0$, $\delta\nu=0$, and $\delta\hat\nu=0$ from (Equation26b), (Equation29f), and (Equation29g) with the aid of LLICQ and BLICQ. This completes the proof.

Let us note that both Theorems 5.3 and 5.4 substantially improve upon [Citation41, Theorem 3.3], where strict complementarity is assumed.

5.1.2. Variable approach

The discussions in [Citation40, Citation42] underline that, from a numerical point of view, treating λ as a parameter in (Equation22) is nontrivial since choosing an appropriate value for it is quite involved. We therefore aim to interpret λ as a variable which is determined by the solution algorithm. In this section, we suggest two associated approaches.

First, we may consider the block variables
\[
w:=\begin{bmatrix}x\\ y\end{bmatrix}\in\mathbb R^{n+m},\qquad
\xi:=\begin{bmatrix}\mu\\ \nu\\ \hat\nu\\ \lambda\end{bmatrix}\in\mathbb R^{s+2t+1}
\]
as well as Lagrangian-type functions $L^{\mathrm{op}}_1,\ell^{\mathrm{op}}_1\colon\mathbb R^{n+m}\times\mathbb R^{s+2t+1}\to\mathbb R$ given by
\[
\forall w\in\mathbb R^{n+m},\,\forall\xi\in\mathbb R^{s+2t+1}:\quad
L^{\mathrm{op}}_1(w,\xi):=F(x,y)+\mu^\top G(x,y)+(\nu-\lambda\hat\nu)^\top g(x,y),\qquad
\ell^{\mathrm{op}}_1(w,\xi):=f(x,y)+\hat\nu^\top g(x,y).
\]
Setting
\[
\mathcal G(w,\xi):=\begin{bmatrix}G(x,y)\\ g(x,y)\\ g(x,y)\\ 0\end{bmatrix},\qquad
\mathcal H(w,\xi):=\begin{bmatrix}\nabla_w L^{\mathrm{op}}_1(w,\xi)\\ \nabla_y\ell^{\mathrm{op}}_1(w,\xi)\end{bmatrix},
\tag{30}
\]
we can reformulate (Equation22) as a mixed complementarity system of type (EquationMNLCS). Here, λ is treated as part of the Lagrange multiplier vector associated with (Equation22), and this seems to be rather natural.

A second idea is based on a squaring trick. Observe that the system (Equation22) can be equivalently reformulated by combining
\[
\nabla F(\bar x,\bar y)+G'(\bar x,\bar y)^\top\mu+g'(\bar x,\bar y)^\top(\nu-\zeta^2\hat\nu)=0
\]
with (Equation22b)–(Equation22e) for multipliers $\mu\in\mathbb R^s$, $\nu,\hat\nu\in\mathbb R^t$, and $\zeta\in\mathbb R$. This eliminates the sign condition (Equation22f). Thus, using the block variables
\[
w:=\begin{bmatrix}x\\ y\\ \zeta\end{bmatrix}\in\mathbb R^{n+m+1},\qquad
\xi:=\begin{bmatrix}\mu\\ \nu\\ \hat\nu\end{bmatrix}\in\mathbb R^{s+2t}
\]
as well as Lagrangian-type functions $L^{\mathrm{op}}_2,\ell^{\mathrm{op}}_2\colon\mathbb R^{n+m+1}\times\mathbb R^{s+2t}\to\mathbb R$ given by
\[
\forall w\in\mathbb R^{n+m+1},\,\forall\xi\in\mathbb R^{s+2t}:\quad
L^{\mathrm{op}}_2(w,\xi):=F(x,y)+\mu^\top G(x,y)+(\nu-\zeta^2\hat\nu)^\top g(x,y),\qquad
\ell^{\mathrm{op}}_2(w,\xi):=f(x,y)+\hat\nu^\top g(x,y),
\]
we can recast the resulting system in the form (EquationMNLCS) when using
\[
\mathcal G(w,\xi):=\begin{bmatrix}G(x,y)\\ g(x,y)\\ g(x,y)\end{bmatrix},\qquad
\mathcal H(w,\xi):=\begin{bmatrix}\nabla_{(x,y)}L^{\mathrm{op}}_2(w,\xi)\\ \nabla_y\ell^{\mathrm{op}}_2(w,\xi)\end{bmatrix}.
\tag{31}
\]
Note that in both settings (Equation30) and (Equation31), the mapping $\mathcal G$ does not depend on the multiplier.
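For illustration only, the following sketch contrasts the two modifications on the level of the residual: in the setting (Equation30), the additional complementarity pair for λ consists of λ and a zero constraint component, whereas the setting (Equation31) simply substitutes λ = ζ². The helper function fb and all concrete numbers below are hypothetical and do not stem from our implementation.

```python
# Hypothetical sketch of the two modifications: Var1 appends lambda to the multiplier
# block and pairs it with a zero constraint component, Var2 replaces lambda by zeta^2.
import numpy as np

def fb(a, b):
    """Fischer-Burmeister function phi_FB(a, b) = a + b - sqrt(a^2 + b^2)."""
    return a + b - np.sqrt(a * a + b * b)

# Var1: the extra complementarity pair is (lambda, -0); its FB term vanishes iff lambda >= 0.
lam = -0.3
print(fb(lam, 0.0))          # nonzero (here negative) value signals violation of lambda >= 0

# Var2: the sign condition disappears because lambda := zeta^2 is nonnegative by construction.
zeta = 0.7
lam_var2 = zeta ** 2         # value of lambda entering grad L_2^op in (31)
print(lam_var2)
```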

Let us point the reader's attention to the fact that both approaches discussed above come with the serious disadvantage that a result similar to Theorem 5.3 does not seem to hold. In order to see this, one can try to mimic the proof of Theorem 5.3 in the settings (Equation30) and (Equation31), respectively. After all possible eliminations, one ends up with $\delta\nu=\delta\lambda\,\hat\nu$ and $\delta\nu=2\zeta\,\delta\zeta\,\hat\nu$, respectively, and both of these linear equations possess a nonvanishing kernel. Clearly, these arguments extend to Theorem 5.4 as well.

5.2. Computational experiments

We would like to point the reader's attention to the fact that the application of Gauss–Newton and LM methods to the numerical solution of a smoothed version of system (Equation22) has been investigated in [Citation41, Citation42]. Therein, the authors challenged their methods by running experiments on the 124 nonlinear bilevel optimization problems which are contained in the BOLIB library from [Citation63]. Let us note that only some of these examples satisfy the assumptions of Lemma 5.1, so that (Equation22) does not always provide a reliable necessary optimality condition, i.e. system (Equation22) may fail to possess a solution in many of these instances. In this regard, it is not surprising that, in [Citation41, Section 5.4], the authors admit that the methods under consideration most often terminate because the maximum number of iterations is reached or because the norm of the difference of two consecutive iterates becomes small, which is clearly not satisfactory. In [Citation42, Section 4.1.4], the authors introduce additional termination criteria, based on the difference of the norm of the FB residual and certain iteration-dependent thresholds, in order to enforce termination even in situations where the algorithm did not fully converge. It is not clarified in [Citation42] why the authors do not check for approximate stationarity of the squared FB residual instead. Summarizing this, the experiments in [Citation41, Citation42] illustrate some global properties of the promoted solution approaches but do not respect the assumptions needed to ensure (local fast) convergence. The results are, thus, of limited significance.

Here, we restrict ourselves to illustrating certain features of our nonsmooth LM methods from Algorithms 4.5 and 4.11 on problems where the assumptions of Lemma 5.1 or even those of Theorems 5.3 and 5.4 hold. Furthermore, we carefully distinguish between the parametric approach from Section 5.1.1, where the multiplier λ is treated as a parameter which has to be chosen a priori, see [Citation40, Citation41] for further comments on how this parameter can be chosen in numerical practice, and the variable approach from Section 5.1.2, where λ is an additional variable.

5.2.1. Implementation and setting

We implemented Algorithm 4.5 (method mixLM) and Algorithm 4.11 (method FBLM) for the three different settings described in (Equation24) (setting Para), (Equation30) (setting Var1), and (Equation31) (setting Var2), yielding a total of six algorithms, in MATLAB R2022b based on user-supplied gradients and Hessians. Numerical experiments were run for two bilevel optimization problems whose minimizers satisfy the stationarity conditions from (Equation22):

Experiment 1

the problem from [Citation62, Example 8] where (Equation22) holds at the global minimizer for each λ>0,

Experiment 2

the inverse transportation problem from [Citation64, Section 5.3.2, Experiment 3] whose lower-level problem and upper-level constraints are fully affine.

For each experiment, a certain number of (random) starting points has been generated, and each of the algorithms has been run from these starting points. If the termination criterion $\|F_{\mathrm{FB}}(z^k)\|<\tau_{\mathrm{abs}}$ (Term = 1) is violated in Algorithms 4.5 and 4.11, we additionally check validity of $\|\nabla\Psi_{\mathrm{FB}}(z^k)\|<\tau^{\mathrm{stat}}_{\mathrm{abs}}$ (Term = 2) for some constant $\tau^{\mathrm{stat}}_{\mathrm{abs}}>0$ in order to capture situations where a stationary point of the squared FB residual $\Psi_{\mathrm{FB}}$ has been found. Furthermore, we equip each algorithm with a maximum number of possible iterations and terminate if it is hit (Term = 0).
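The following sketch summarizes this termination logic; the function and parameter names are chosen for illustration only and do not stem from our MATLAB implementation.

```python
# Sketch of the termination logic described above (names hypothetical).
import numpy as np

def termination_flag(F_FB, grad_Psi_FB, k, tau_abs, tau_abs_stat, k_max):
    """Return 1 if the FB residual is small, 2 if the squared FB residual appears
    stationary, 0 if the iteration budget is exhausted, and None otherwise."""
    if np.linalg.norm(F_FB) < tau_abs:
        return 1                      # Term = 1: (approximate) solution found
    if np.linalg.norm(grad_Psi_FB) < tau_abs_stat:
        return 2                      # Term = 2: stationary point of Psi_FB detected
    if k >= k_max:
        return 0                      # Term = 0: maximum number of iterations hit
    return None                       # keep iterating

print(termination_flag(np.array([1e-7, 0.0]), np.array([1e-2]), k=10,
                       tau_abs=1e-6, tau_abs_stat=1e-8, k_max=100000))
```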

In order to compare the output of the six methods, we make use of so-called performance profiles, see [Citation65], based on the following indicators: total number of iterations, execution time (in seconds), upper-level objective value, and percentage of full LM steps (i.e. the quotient of the number of full LM steps over the total number of iterations). We denote by $t_{s,i}>0$ the metric of comparison (associated with the current performance index) of solver $s\in S$ for solving the instance $i\in I$ of the problem, where $S$ is the set of solvers and $I$ represents the different starting points. We define the performance ratio by
\[
\forall s\in S,\,\forall i\in I:\quad r_{s,i}:=\frac{t_{s,i}}{\min\{t_{s',i}\,|\,s'\in S\}}.
\]
This quantity relates the performance of solver $s\in S$ on instance $i\in I$ to the best performance of any solver in $S$ on instance $i$. The cumulative distribution function $\omega_s\colon[1,\infty)\to[0,1]$ of the current performance index associated with the solver $s\in S$ is defined by
\[
\forall\tau\in[1,\infty):\quad \omega_s(\tau):=\frac{|\{i\in I\,|\,r_{s,i}\le\tau\}|}{|I|}.
\]
The performance profile for a fixed performance index shows (the illustrative parts of) the graphs of all the functions $\omega_s$, $s\in S$. The value $\omega_s(1)$ represents the fraction of problem instances for which solver $s\in S$ shows the best performance. For arbitrary $\tau\ge 1$, $\omega_s(\tau)$ is the fraction of problem instances for which solver $s\in S$ shows at most the τ-fold of the best performance.
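For illustration, the following sketch computes the performance ratios and the functions $\omega_s$ for made-up metric data; the function name and the toy numbers are ours and not taken from [Citation65].

```python
# Sketch of the performance profile computation in the sense of [Citation65];
# t[s, i] holds the (positive) performance metric of solver s on instance i.
import numpy as np

def performance_profile(t, taus):
    """t: array of shape (num_solvers, num_instances) with positive metrics.
    Returns ratios r and omega with omega[s, j] = |{i : r[s,i] <= taus[j]}| / |I|."""
    r = t / t.min(axis=0, keepdims=True)          # performance ratios r[s, i]
    omega = np.array([[np.mean(r[s] <= tau) for tau in taus] for s in range(t.shape[0])])
    return r, omega

t = np.array([[6.0,  8.0, 12.0],                   # solver 1 on three instances
              [7.0, 16.0,  9.0]])                  # solver 2 on the same instances
r, omega = performance_profile(t, taus=[1.0, 1.5, 2.0])
print(r)       # ratios relative to the best solver per instance
print(omega)   # omega[s, 0] = fraction of instances on which solver s is best
```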

Finally, let us comment on the precise construction of the performance metric. For the comparison of the total number of iterations and the execution time (in seconds), we simply rely on the index under consideration. Computed upper-level objective values are shifted by the minimum function value known in the literature to ensure that the associated performance metric does not produce negative outputs (in addition, a small positive offset is added to ensure positivity). In order to benchmark the percentage of full LM steps, we subtract the aforementioned quotient of full LM steps over the total number of steps from 1 (again, a positive offset is added to ensure positivity). This guarantees that small values of the metric indicate a desirable behaviour.
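A sketch of these two transformations is given below; the concrete offset value and the function names are hypothetical choices made only for this illustration, as we merely require the offset to be small and positive.

```python
# Sketch of the metric transformations described above (offset value hypothetical).
import numpy as np

def objective_metric(upper_level_values, best_known_value, offset=1e-8):
    """Shift objective values by the best known value so that the metric stays positive."""
    return np.asarray(upper_level_values) - best_known_value + offset

def full_lm_step_metric(num_full_lm_steps, num_total_iterations, offset=1e-8):
    """Transform the percentage of full LM steps so that *small* values are desirable."""
    return 1.0 - np.asarray(num_full_lm_steps) / np.asarray(num_total_iterations) + offset

print(objective_metric([0.02, 0.5], best_known_value=0.02))
print(full_lm_step_metric([9, 3], [10, 20]))
```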

5.2.2. Numerical examples

Experiment 1 We investigate the bilevel optimization problem from [Citation62, Example 8] which is given by minx,y(x8)2+(y9)2s.t.x0,yargminy{(y3)2|y2x}.Its global minimizer is (x¯,y¯):=(9,3), and the corresponding stationarity conditions from (Equation22) hold for each λ>0 when choosing μ:=0, νˆ0, and ν:=2+λνˆ. One can easily check that the assumptions of Theorem 5.3 are valid. If νˆ>0 is chosen, strict complementarity holds. However, even for νˆ:=0, we have Ig00(x¯,y¯,νˆ)=Ig+(x¯,y¯,ν) so that the assumptions of Theorem 5.4 hold as well for each feasible choice of the multipliers.

For this experiment, we took the maximum number of iterations to be $10^5$ and chose the following parameters for the algorithms: $q:=0.8$, $\tau_{\mathrm{abs}}:=10^{-6}$, $\tau^{\mathrm{stat}}_{\mathrm{abs}}:=10^{-8}$, $\beta:=0.5$, $\sigma:=0.5$, $\gamma_1:=\gamma_2:=0.5$, $\rho_1:=10^{-2}$, $\rho_2:=10^{-12}$, and $\rho:=10^{-2}$. We challenged our algorithms with 121 starting points constructed in the following way. The pair (x,y) is chosen from $\{0,1,\dots,10\}\times\{-5,-4,\dots,5\}$. For the multipliers, we always chose $\mu:=\nu:=\hat\nu:=1$ and $\lambda:=1$. In the case of setting Var2, we made use of $\zeta:=1$. The results of the experiments are illustrated in Figure 1 and Table 1.
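For illustration, the starting points of this experiment could be generated as follows; this is only a sketch and not a reproduction of our MATLAB implementation.

```python
# Sketch of the starting-point construction for Experiment 1 (121 grid points;
# multipliers and lambda initialized with ones, as described above).
import numpy as np
from itertools import product

starting_points = []
for x0, y0 in product(range(0, 11), range(-5, 6)):            # {0,...,10} x {-5,...,5}
    w0 = np.array([float(x0), float(y0)])
    xi0 = np.ones(3)                                           # (mu, nu, nu_hat) = (1, 1, 1)
    starting_points.append((w0, xi0, 1.0))                     # lambda = 1 (and zeta = 1 in Var2)
print(len(starting_points))                                    # 121
```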

Figure 1. Performance profiles for Experiment 1. From top left to bottom right: number of total iterations, execution time, upper-level objective value, and percentage of full LM steps.


Table 1. Averaged performance indices for Experiment 1.

We immediately see that the approaches Para and Var2 perform almost equally well. These methods terminate due to a small residual or stationarity of the squared FB residual in 90% of all runs, and the actual optimal solution is found in approximately 60% of all runs, with slight advantages for mixLM. Let us now report on Var1. The optimal solution is found in about 57% of all runs. However, in this setting, both algorithms much more often do not terminate due to stationarity of the squared FB residual but are aborted since the maximum number of iterations is reached – the latter happens in 43% of the runs. Often, both algorithms seem to be close (but not too close) to a stationary point of the squared FB residual in this case, but this is not detected by our termination criterion. For a better overview, Figure 2 illustrates the termination behaviour of all six methods.

Coming back to an overall comparison, mixLM in setting Para converges to the global minimizer of the problem for starting points (x,y) chosen from
\[
(\{0,\dots,10\}\times\{0,\dots,5\})\cup(\{5,\dots,10\}\times\{-2\})\cup(\{9,10\}\times\{-3\}).
\]
In most of the associated runs, a total number of 6 to 10 iterations is performed, out of which almost all are full LM steps, i.e. we observe local fast convergence in our experiments. Related phenomena can be observed for the remaining five approaches. Particularly, despite the absence of any theoretical guarantees, local fast convergence is present in the variable settings described in Section 5.1.2. More precisely, we observed that whenever our algorithms approach the global minimizer, then this happens in a small number of full LM steps, and the convergence is, indeed, quadratic. The large average number of iterations documented in Table 1 results from the fact that whenever the algorithms do not approach the global minimizer, they tend to get stuck in stationary points of the squared residual which are approached by slow gradient steps. Typically, the algorithms are aborted in this situation since the maximum number of iterations is reached.

The performance profiles in Figure 1 indicate that the settings Para and Var2 are superior to Var1 when the total number of iterations and the execution time are investigated. Recalling that Var1 does not stop until the maximum number of iterations is reached in comparatively many runs, its (averaged) running time is twice as large as for Para and Var2. We note that mixLM (even in setting Var1) performs more full LM steps than FBLM, see Table 1 as well. For more than 80% of all starting points, mixLM in setting Var1 carries out the highest number of full LM steps among the six methods which we are comparing here. However, there are some critical instances where, in the setting Var1, comparatively many gradient steps are done, which explains the numbers in Table 1. All this is in line with the above comments, and it also extends to the computed upper-level objective value, although one has to be careful as all six methods compute the global minimizer almost equally often. However, Var1 ends up in points with comparatively high objective value in many more runs than the other two settings. Taking a look at the averaged numbers in Table 1, we observe that mixLM performs slightly faster than FBLM for all three settings. Furthermore, mixLM finds the global minimizer more often than FBLM. The highest number of total iterations can be observed for FBLM in setting Var1.

Figure 2. Reason for termination for Experiment 1.


Experiment 2 We consider the bilevel optimization problem
\[
\min_{x,y}\;\tfrac12\,\|y-y^{\mathrm o}\|^2\quad\text{s.t.}\quad x\ge 0,\;\; e^\top x\ge e^\top b^{\mathrm{dem}},\;\; y\in S(x),
\]
where $S\colon\mathbb R^n\rightrightarrows\mathbb R^{n\times\ell}$ is the solution mapping of the parametric transportation problem
\[
\min_y\;\sum_{i=1}^n\sum_{j=1}^\ell c_{ij}\,y_{ij}\quad\text{s.t.}\quad
\sum_{j=1}^\ell y_{ij}\le x_i\ (i=1,\dots,n),\quad
\sum_{i=1}^n y_{ij}\ge b^{\mathrm{dem}}_j\ (j=1,\dots,\ell),\quad y\ge 0.
\tag{TR(x)}
\]
Here, $\ell\in\mathbb N$ is a positive integer, $b^{\mathrm{dem}}\in\mathbb N^\ell$ is an integer vector modelling the minimum demand of the ℓ consumers, and $c\in[0,1]^{n\times\ell}$ is a cost matrix. In (TR(x)), the parameter $x\in\mathbb R^n$ represents the offer provided at n warehouses which is unknown and shall be reconstructed from a given (noisy) transportation plan $y^{\mathrm o}\in\mathbb R^{n\times\ell}$. For our experiments, we chose n := 5, ℓ := 7, and relied on the data matrices given in [Citation64, Appendix]. As mentioned in [Citation64], the best known solution of this bilevel optimization problem comes along with an upper-level objective value of $5.07\cdot 10^{-4}$.
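To indicate how this instance fits the general format of Section 5.1, the following sketch stacks the lower-level constraints of (TR(x)) into a single mapping with g(x,y) ≤ 0. The data c and b^dem below are random placeholders generated only for this illustration; the actual matrices are taken from [Citation64, Appendix].

```python
# Sketch of the lower-level constraints of (TR(x)) in the format g(x, y) <= 0;
# the cost matrix c and the demand vector b_dem are random placeholders.
import numpy as np

n, ell = 5, 7
rng = np.random.default_rng(0)
c = rng.uniform(0.0, 1.0, size=(n, ell))        # cost matrix in [0,1]^{n x ell}
b_dem = rng.integers(1, 5, size=ell)            # integer minimum demands

def lower_level_constraints(x, y):
    """Stack the offer, demand, and nonnegativity constraints of (TR(x)) as g(x,y) <= 0."""
    offer  = y.sum(axis=1) - x                   # sum_j y_ij - x_i <= 0
    demand = b_dem - y.sum(axis=0)               # b_j^dem - sum_i y_ij <= 0
    nonneg = -y.reshape(-1)                      # -y_ij <= 0
    return np.concatenate([offer, demand, nonneg])

x = rng.uniform(1.0, 10.0, size=n)
y = rng.uniform(0.0, 10.0, size=(n, ell))
print(lower_level_constraints(x, y).shape)       # (n + ell + n*ell,) = (47,)
```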

All six methods were tested on a collection of 500 random starting points which were created in the following way. For the construction of the pair (x,y), the components of x are chosen randomly from the interval [1,10] while the components of y are chosen randomly from [0,10], based on a uniform distribution in each case. The associated multiplier vectors as well as ζ in setting Var2 are defined as in our first experiment. For our algorithms, we chose a maximum number of $10^4$ iterations, and we made use of the following values for all appearing parameters: $q:=0.9$, $\tau_{\mathrm{abs}}:=10^{-4}$, $\tau^{\mathrm{stat}}_{\mathrm{abs}}:=10^{-3}$, $\beta:=0.9$, $\sigma:=0.4$, $\gamma_1:=10^{-4}$, $\gamma_2:=0.05$, and $\rho_1:=\rho_2:=\rho:=10^{-4}$. The resulting performance profiles and averaged performance indices are documented in Figure 3 and Table 2.

Figure 3. Performance profiles for Experiment 2. From top left to bottom right: number of total iterations, execution time, upper-level objective value, and percentage of full LM steps.


Table 2. Averaged performance indices for Experiment 2.

From Figure 3, it is easy to see that FBLM performs better than mixLM with respect to the total number of iterations, the execution time, and the percentage of full LM steps, although FBLM in setting Var2 cannot challenge the same algorithm in settings Para and Var1 for these three criteria, see Table 2 as well. Our choice of $\tau_{\mathrm{abs}}$ and $\tau^{\mathrm{stat}}_{\mathrm{abs}}$ ensured that all six methods terminated either due to a small residual or due to stationarity of the squared FB residual in most of the runs. Let us point out that mixLM terminated due to stationarity of the squared FB residual in most of the runs, and the same holds true for FBLM in setting Var1, see Figure 4 as well.

Figure 4. Reason for termination for Experiment 2.


This, however, does not tell the full story. In the last row of Table 2, we document the number of runs where the final iterate produces an upper-level objective value between $5.06\cdot10^{-4}$ and $5.08\cdot10^{-4}$, i.e. we count the number of runs where a point is produced whose associated upper-level function value is close to the best known one (below, we will refer to such points as reasonable). As it turns out, mixLM in setting Var1 is not competitive at all in this regard. Indeed, this method most often terminates in points possessing an upper-level objective value around $4.07\cdot10^{4}$ and a norm of the FB residual around $10^{5}$. We suspect that this method tends to get stuck in stationary points of the squared FB residual which are not feasible for the inverse transportation problem. Hence, the performance profile which monitors the upper-level objective value in Figure 3 is of limited meaningfulness. The situation is far better for FBLM, where reasonable points are computed much more frequently, although it has to be mentioned that the settings Para (79% reasonable) and Var2 (76% reasonable) again outrun Var1 (32% reasonable) in this regard. Interestingly, mixLM in setting Var2 always terminated due to a small value of the squared FB residual, but it produced reasonable points in more than 82% of all runs. An individual fine-tuning of the parameters for the settings Para, Var1, and Var2 may lead to more convincing termination reasons, but we abstained from it for the sake of a better comparison of all methods under consideration. The performance profiles in Figure 3 show that FBLM in the settings Para and Var1 is superior to FBLM in setting Var2, and when only the computed function values are taken into account, then, for the price of a higher computation time, mixLM in setting Var2 is acceptable as well.

Let us note that (Equation22) turns out to be an over-determined mixed linear complementarity system in the particular setting considered here, and, thus, a certain error bound condition is present. In the light of the available literature on smooth LM methods, see e.g. [Citation66], this could be a reason for the observed local fast convergence for a large number of starting points, although we did not prove such a result in this paper. Furthermore, it has to be observed that the reformulation via Var2 sacrifices the linearity of the system, but we obtained the best results in this setting when convergence to reasonable points is the underlying criterion.

Summary Let us briefly summarize that our experiments demonstrated the competitiveness of the settings Para and Var2, while setting Var1 turned out to suffer from numerical disadvantages in both experiments. While, in our first experiment, there was no significant difference between Para and Var2, our second (and, by far, more challenging) experiment revealed that Var2 may even offer some benefits over Para. Our experiments do not indicate whether it is generally better to apply mixLM or FBLM, but at least we can suggest that, whenever the parameter λ is unknown, it might be reasonable to apply mixLM in setting Var2 to obtain good solutions.

6. Concluding remarks

In this paper, we exploited the concept of Newton-differentiability in order to design a nonsmooth Levenberg–Marquardt method for the numerical solution of over-determined nonsmooth equations. Our method possesses desirable local convergence properties under reasonable assumptions and may be applied in situations where the underlying nonsmooth mapping is even discontinuous. We applied this idea to over-determined mixed nonlinear complementarity systems where a suitable globalization is possible via gradient steps with respect to the squared norm of the residual induced by the Fischer–Burmeister function. We investigated the method in two different flavours regarding the computation of the Levenberg–Marquardt direction, namely via a residual given in terms of the maximum function on the one hand and in terms of the Fischer–Burmeister function on the other hand. For both methods, global convergence results have been derived, and assumptions for local fast convergence have been specified. Our analysis recovers the impression from [Citation9], obtained in a slightly different framework, that the approach via the maximum residual works under less restrictive assumptions. Finally, we applied these globalized nonsmooth Levenberg–Marquardt methods in order to solve bilevel optimization problems numerically. Theoretically, we were in a position to verify local fast convergence of both algorithms under less restrictive assumptions than those stated in the literature, which, among others, assume strict complementarity, see [Citation41, Section 3]. Some numerical experiments were discussed in order to illustrate interesting computational features of this approach.

Our theoretical and numerical investigation of bilevel optimization problems solely focused on the optimistic approach, but it might also be possible to address (similarly over-determined) stationarity systems related to pessimistic bilevel optimization via a similar approach. Next, we would like to point the reader's attention to the fact that the (inherent) inner semicontinuity assumption on the solution mapping in Lemma 5.1, see Remark 5.2 as well, is comparatively strong and may fail in many practically relevant scenarios. However, it is well known from the literature that, in the presence of so-called inner semicompactness, which is inherent whenever the solution mapping is locally bounded, slightly more general necessary optimality conditions can be derived which comprise additional geometric constraints of polyhedral type addressing the multipliers, see e.g. [Citation49, Theorem 4.9]. For the numerical solution of this stationarity system, a numerical method would be necessary which is capable of solving an over-determined system of equations subject to polyhedral constraints where either the polyhedral constraint set is not convex or the equations are nonsmooth. We refer the interested reader to [Citation67–69] for examples of such methods. In the future, it needs to be checked how these methods apply to the described setting of bilevel optimization, and how the assumptions for local fast convergence can be verified in this situation. Yet another way to extend our approach to more general bilevel optimization problems could be to consider the combined reformulation of bilevel optimization problems which merges the value function and the Karush–Kuhn–Tucker reformulation, see [Citation43]. A potential associated stationarity system can be reformulated as an over-determined system of discontinuous nonsmooth equations which, similarly as in [Citation37], still can be solved with the aid of the nonsmooth Levenberg–Marquardt method developed in Section 3, as the latter is reasonable for functions which are merely calm at their roots, and this property is likely to hold, see [Citation37, Lemma 3.3(d)]. Let us also mention that it would be interesting to study whether an error bound condition is enough to yield local fast convergence of Algorithms 4.5 and 4.11 even in the nonsmooth setting, see [Citation66] for the analysis in the smooth case. Furthermore, it remains to be seen whether some reasonable conditions can be found which guarantee local fast convergence of our algorithms when applied to stationarity systems of bilevel optimization problems where the multiplier associated with the value function constraint is treated as a variable, see Section 5.1.2.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Qi L. Convergence analysis of some algorithms for solving nonsmooth equations. Math Oper Res. 1993;18(1):227–244. https://doi.org/10.1287/moor.18.1.227
  • Qi L, Sun J. A nonsmooth version of Newton's method. Math Program. 1993;58:353–367. https://doi.org/10.1007/BF01581275
  • Mifflin R. Semismooth and semiconvex functions in constrained optimization. SIAM J Control Optim. 1977;15(6):959–972. https://doi.org/10.1137/0315061
  • Galántai A. Properties and construction of NCP functions. Comput Optim Appl. 2012;52(3):805–824. https://doi.org/10.1007/s10589-011-9428-9
  • Kanzow C, Yamashita N, Fukushima M. New NCP-functions and their properties. J Optim Theory Appl. 1997;94(1):115–135. https://doi.org/10.1023/A:1022659603268
  • Sun D, Qi L. On NCP-functions. Comput Optim Appl. 1999;13(1):201–220. https://doi.org/10.1023/A:1008669226453
  • Fischer A. A special Newton-type optimization method. Optimization. 1992;24(3-4):269–284. https://doi.org/10.1080/02331939208843795
  • De Luca T, Facchinei F, Kanzow C. A semismooth equation approach to the solution of nonlinear complementarity systems. Math Program. 1996;75:407–439. https://doi.org/10.1007/BF02592192
  • De Luca T, Facchinei F, Kanzow C. A theoretical and numerical comparison of some semismooth algorithms for complementarity problems. Comput Optim Appl. 2000;16:173–205. https://doi.org/10.1023/A:1008705425484
  • Facchinei F, Kanzow C. A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems. Math Program. 1997;76:493–512. https://doi.org/10.1007/BF02614395
  • Kojima M, Shindo S. Extensions of Newton and Quasi-Newton methods to systems of PC1 equations. J Oper Res Soc Japan. 1986;29(4):352–374. https://doi.org/10.15807/JORSJ.29.352
  • Kummer B. Newton's method for non-differentiable functions. In: Guddat J, Bank B, Hollatz H, Kall P, Klatte D, Kummer B, Lommatzsch K, Tammer K, Vlach M, Zimmermann K, editors. Advances in mathematical optimization. Berlin: Akademie-Verlag; 1988. p. 114–125. https://doi.org/10.1515/9783112479926-011
  • Kummer B. Newton's method based on generalized derivatives for nonsmooth functions: convergence analysis. In: Oettli W, Pallaschke D, editors. Advances in optimization. Springer: Berlin; 1992. p. 171–194. https://doi.org/10.1007/978-3-642-51682-5_12
  • Pang J-S. Newton's method for B-differentiable equations. Math Oper Res. 1990;15(2):311–341. https://doi.org/10.1287/moor.15.2.311
  • Robinson SM. Newton's method for a class of nonsmooth functions. Set-Valued Anal. 1994;2:291–305. https://doi.org/10.1007/BF01027107
  • Facchinei F, Pang J-S. Finite-dimensional variational inequalities and complementarity problems. New York: Springer; 2003. https://doi.org/10.1007/b97543
  • Izmailov AF, Solodov MV. Newton-type methods for optimization and variational problems. New York: Springer; 2014. https://doi.org/10.1007/978-3-319-04247-3
  • Klatte D, Kummer B. Nonsmooth equations in optimization. regularity, calculus, methods and applications. Boston: Kluwer Academic; 2002. https://doi.org/10.1007/b130810
  • Jiang H. Global convergence analysis of the generalized Newton and Gauss–Newton methods of the Fischer–Burmeister equation for the complementarity problem. Math Oper Res. 1999;24(3):529–543. https://doi.org/10.1287/moor.24.3.529
  • Kanzow C, Petra S. On a semismooth least squares formulation of complementarity problems with gap reduction. Optim Methods Softw. 2004;19(5):507–525. https://doi.org/10.1080/10556780410001683096
  • Ma C, Wang C. A nonsmooth Levenberg–Marquardt method for solving semi-infinite programming problems. J Comput Appl Math. 2009;230(2):633–642. https://doi.org/10.1016/j.cam.2009.01.004
  • Qi L, Wu S-Y, Zhou G. Semismooth Newton methods for solving semi-infinite programming problems. J Glob Optim. 2003;27:215–232. https://doi.org/10.1023/A:1024814401713
  • Chen C, Mangasarian OL. A class of smoothing functions for nonlinear and mixed complementarity problems. Comput Optim Appl. 1996;5:97–138. https://doi.org/10.1007/BF00249052
  • Daryina AN, Izmailov AF, Solodov MV. A class of active-set Newton methods for mixed complementarity problems. SIAM J Optim. 2005;15(2):409–429. https://doi.org/10.1137/S105262340343590X
  • Huang C, Wang S. A penalty method for a mixed nonlinear complementarity problem. Nonlinear Anal Theory Methods Appl. 2012;75(2):588–597. https://doi.org/10.1016/j.na.2011.08.061
  • Kanzow C. Global optimization techniques for mixed complementarity problems. J Glob Optim. 2000;16:1–21. https://doi.org/10.1023/A:1008331803982
  • Li D, Fukushima M. Smoothing Newton and quasi-Newton methods for mixed complementarity problems. Comput Optim Appl. 2000;17:203–230. https://doi.org/10.1023/A:1026502415830
  • Monteiro RDC, Pang J-S. Properties of an interior-point mapping for mixed complementarity problems. Math Oper Res. 1996;21(3):629–654. https://doi.org/10.1287/moor.21.3.629
  • Gfrerer H, Outrata JV. On a semismooth* Newton method for solving generalized equations. SIAM J Optim. 2021;31(1):489–517. https://doi.org/10.1137/19M1257408
  • Brokate M, Ulbrich M. Newton differentiability of convex functions in normed spaces and of a class of operators. SIAM J Optim. 2022;32(2):1265–1287. https://doi.org/10.1137/21M1449531
  • Chen X, Nashed Z, Qi L. Smoothing methods and semismooth methods for nondifferentiable operator equations. SIAM J Numer Anal. 2000;38(4):1200–1216. https://doi.org/10.1137/S0036142999356719
  • Hintermüller M, Ito K, Kunisch K. The primal-dual active set strategy as a semismooth Newton method. SIAM J Optim. 2002;13(3):865–888. https://doi.org/10.1137/s1052623401383558
  • Ito K, Kunisch K. Lagrange multiplier approach to variational problems and applications. Philadelphia: SIAM; 2008. https://doi.org/10.1137/1.9780898718614
  • Schiela A. A simplified approach to semismooth Newton methods in function space. SIAM J Optim. 2008;19(3):1417–1432. https://doi.org/10.1137/060674375
  • Ulbrich M. Semismooth Newton methods for operator equations in function spaces. SIAM J Optim. 2002;13(3):805–841. https://doi.org/10.1137/s1052623400371569
  • Ulbrich M. Semismooth Newton methods for variational inequalities and constrained optimization problems in function spaces. Philadelphia: SIAM; 2011. https://doi.org/10.1137/1.9781611970692
  • Harder F, Mehlitz P, Wachsmuth G. Reformulation of the M-stationarity conditions as a system of discontinuous equations and its solution by a semismooth Newton method. SIAM J Optim. 2021;31(2):1459–1488. https://doi.org/10.1137/20M1321413
  • Dempe S. Foundations of bilevel programming. Dordrecht: Kluwer; 2002. https://doi.org/10.1007/b101970
  • Dempe S, Kalashnikov V, Pérez-Valdéz G, et al. Bilevel programming problems – theory, algorithms and applications to energy networks. Berlin: Springer; 2015. https://doi.org/10.1007/978-3-662-45827-3
  • Fischer A, Zemkoho AB, Zhou S. Semismooth Newton-type method for bilevel optimization: global convergence and extensive numerical experiments. Optim Methods Softw. 2022;37(5):1770–1804. https://doi.org/10.1080/10556788.2021.1977810
  • Fliege J, Tin A, Zemkoho AB. Gauss–Newton-type methods for bilevel optimization. Comput Optim Appl. 2021;78:793–824. https://doi.org/10.1007/s10589-020-00254-3
  • Tin A, Zemkoho AB. Levenberg–Marquardt method and partial exact penalty parameter selection in bilevel optimization. Optim Eng. 2023;24:1343–1385. https://doi.org/10.1007/s11081-022-09736-1
  • Ye JJ, Zhu DL. New necessary optimality conditions for bilevel programs by combining the MPEC and value function approaches. SIAM J Optim. 2010;20(4):1885–1905. https://doi.org/10.1137/080725088
  • Tseng P. Growth behavior of a class of merit functions for the nonlinear complementarity problem. J Optim Theory Appl. 1996;89(1):17–37. https://doi.org/10.1007/bf02192639
  • Facchinei F, Soares J. A new merit function for nonlinear complementarity problems and a related algorithm. SIAM J Optim. 1997;7(1):225–247. https://doi.org/10.1137/S1052623494279110
  • Zemkoho AB. Solving ill-posed bilevel programs. Set-Valued Var Anal. 2016;24:423–448. https://doi.org/10.1007/s11228-016-0371-x
  • Clarke FH. Optimization and nonsmooth analysis. Philadelphia: SIAM; 1990. https://doi.org/10.1137/1.9781611971309
  • Dempe S, Dutta J, Mordukhovich BS. New necessary optimality conditions in optimistic bilevel programming. Optimization. 2007;56(5-6):577–604. https://doi.org/10.1080/02331930701617551
  • Dempe S, Zemkoho AB. The generalized Mangasarian–Fromovitz constraint qualification and optimality conditions for bilevel programs. J Optim Theory Appl. 2011;148(1):46–68. https://doi.org/10.1007/s10957-010-9744-8
  • Mordukhovich BS, Nam NM, Phan HM. Variational analysis of marginal functions with applications to bilevel programming. J Optim Theory Appl. 2012;152(3):557–586. https://doi.org/10.1007/s10957-011-9940-1
  • Tanino T, Ogawa T. An algorithm for solving two-level convex optimization problems. Int J Syst Sci. 1984;15(2):163–174. https://doi.org/10.1080/00207728408926552
  • Henrion R, Jourani A, Outrata J. On the calmness of a class of multifunctions. SIAM J Optim. 2002;13(2):603–618. https://doi.org/10.1137/S1052623401395553
  • Rockafellar RT, Wets RJ-B. Variational analysis. Berlin: Springer; 1998. https://doi.org/10.1007/978-3-642-02431-3
  • Ye JJ, Wu S-Y. First order optimality conditions for generalized semi-infinite programming problems. J Optim Theory Appl. 2008;137(2):419–434. https://doi.org/10.1007/s10957-008-9352-z
  • Ye JJ, Zhu DL. Optimality conditions for bilevel programming problems. Optimization. 1995;33(1):9–27. https://doi.org/10.1080/02331939508844060
  • Henrion R, Surowiec TM. On calmness conditions in convex bilevel programming. Appl Anal. 2011;90(6):951–970. https://doi.org/10.1080/00036811.2010.495339
  • Ke R, Yao W, Ye JJ, et al. Generic property of the partial calmness condition for bilevel programming problems. SIAM J Optim. 2022;32(2):604–634. https://doi.org/10.1137/20M1371403
  • Mehlitz P, Minchenko LI, Zemkoho AB. A note on partial calmness for bilevel optimization problems with linearly structured lower level. Optim Lett. 2021;15:1277–1291. https://doi.org/10.1007/s11590-020-01636-6
  • Zemkoho AB, Zhou S. Theoretical and numerical comparison of the Karush–Kuhn–Tucker and value function reformulation in bilevel optimization. Comput Optim Appl. 2021;78:625–674. https://doi.org/10.1007/s10589-020-00250-7
  • Robinson SM. Some continuity properties of polyhedral multifunctions. In: König H, Korte B, Ritter K, editors. Mathematical programming at oberwolfach. Berlin: Springer; 1981. p. 206–214. https://doi.org/10.1007/BFb0120929
  • Dempe S, Mordukhovich BS, Zemkoho AB. Necessary optimality conditions in pessimistic bilevel programming. Optimization. 2014;63(4):505–533. https://doi.org/10.1080/02331934.2012.696641
  • Mehlitz P, Zemkoho AB. Sufficient optimality conditions in bilevel programming. Math Oper Res. 2021;46(4):1573–1598. https://doi.org/10.1287/moor.2021.1122
  • Zhou S, Zemkoho AB, Tin A. BOLIB: bilevel optimization LIBrary of test problems. In: Dempe S, Zemkoho AB, editors. Bilevel optimization: advances and next challenges. Cham: Springer International; 2020. p. 563–580. https://doi.org/10.1007/978-3-030-52119-6_19
  • Mehlitz P. Asymptotic regularity for Lipschitzian nonlinear optimization problems with applications to complementarity constrained and bilevel programming. Optimization. 2023;72(1):277–320. https://doi.org/10.1080/02331934.2022.2031190
  • Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Math Program. 2002;91(2):201–213. https://doi.org/10.1007/s101070100263
  • Yamashita N, Fukushima M. On the rate of convergence of the Levenberg–Marquardt method. In: Alefeld G, Chen X, editors. Topics in numerical analysis. Vienna: Springer; 2001. p. 239–249. https://doi.org/10.1007/978-3-7091-6217-0_18
  • Behling R, Fischer A, Haeser G, et al. On the constrained error bound condition and the projected Levenberg–Marquardt method. Optimization. 2017;66(8):1397–1411. https://doi.org/10.1080/02331934.2016.1200578
  • Facchinei F, Fischer A, Herrich M. A family of Newton methods for nonsmooth constrained systems with nonisolated solutions. Math Methods Oper Res. 2013;77:433–443. https://doi.org/10.1007/s00186-012-0419-0
  • Facchinei F, Fischer A, Herrich M. An LP-Newton method: nonsmooth equations, KKT systems, and nonisolated solutions. Math Program. 2014;146:1–36. https://doi.org/10.1007/s10107-013-0676-6