Optimization
A Journal of Mathematical Programming and Operations Research
Research Article

Portfolio reshaping under 1st-order stochastic dominance constraints by the exact penalty function methods

Received 31 Jan 2023, Accepted 21 Jan 2024, Published online: 09 Feb 2024

Abstract

The paper studies a financial portfolio selection problem under 1st-order stochastic dominance constraints. These constraints impose lower bounds on the return profile of the portfolio. In particular, they allow searching for a better portfolio than some reference portfolio by comparing their cumulative distribution functions. Candidate objective functions are the average return, a value at risk, or the average value at risk. The resulting optimization problems are computationally hard because of possibly non-convex constraints and possibly discontinuous objective functions. In the case of a discrete distribution of the return, we develop numerical procedures to solve the problem. The proposed approach uses new exact penalty functions to tackle the 1st-order stochastic dominance constraints. The resulting penalized objective function is further optimized by the stochastic successive smoothing method as a local optimizer within a branch and bound global optimization scheme. The approach is numerically and graphically illustrated on small portfolio selection problems up to dimension 10.

1. Introduction

We consider the general constrained optimization problem
$$\text{minimize } f(x),\quad x\in X, \tag{1}$$
where the set $X\subset\mathbb{R}^n$ constitutes the set of feasible solutions. The projective exact penalty method (cf. [Citation1]) involves a map $\pi_X\colon\mathbb{R}^n\to X$ such that
$$\pi_X(\mathbb{R}^n)\subset X \quad\text{and}\quad \pi_X(x)=x \ \text{ for every } x\in X. \tag{2}$$
The map $\pi_X$ in (2) is not specified. For $X$ convex, the projection
$$\pi_X(x)=\operatorname*{arg\,min}\{\|y-x\|:\ y\in X\} \tag{3}$$
is well defined and thus is a candidate for the global optimization problem (4) below. However, the projection (3) might not be available at cheap computational cost.

For a star-shaped domain $X$, there exists a point $x_0\in X$ such that every line segment $[x,x_0]:=\{\lambda x_0+(1-\lambda)x:\ \lambda\in[0,1]\}$ is fully contained in $X$, provided that $x\in X$. The function
$$\pi_X(x)=\lambda_x x_0+(1-\lambda_x)x,\quad x\notin X,\quad\text{where } \lambda_x:=\sup\{\lambda:\ \lambda x_0+(1-\lambda)x\notin X\},$$
as well satisfies the conditions (2).

For a single-valued and continuous projection mapping $\pi_X(\cdot)$, the global and local solutions of the constrained problem (1) and of the unconstrained global optimization problem
$$\text{minimize } f(\pi_X(x))+\|x-\pi_X(x)\|,\quad x\in\mathbb{R}^n, \tag{4}$$
coincide. Hence, the global unconstrained optimization problem (4) can be considered instead of the constrained optimization problem (1).

The problem (4) is not necessarily smooth or convex, so solving (4) requires non-smooth local and global optimization methods (cf. [Citation1]).
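To make the transform (4) concrete, here is a minimal numerical sketch for a hypothetical box-constrained set $X$, where the convex projection (3) is available in closed form. The objective `f` is an arbitrary non-convex stand-in, not the portfolio objective of the later sections; all names are ours.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n; for this convex set
    the projection (3) is available in closed form."""
    return np.clip(x, lo, hi)

def f(x):
    # non-convex test objective, a stand-in for the hard objective in (1)
    return np.sum((x - 0.7) ** 2) + 0.3 * np.sin(5.0 * x).sum()

def penalized(x, lo=0.0, hi=1.0):
    """Unconstrained objective (4): f(pi_X(x)) + ||x - pi_X(x)||."""
    p = project_box(x, lo, hi)
    return f(p) + np.linalg.norm(x - p)

# inside the box the two objectives coincide ...
x_in = np.array([0.2, 0.9])
assert abs(penalized(x_in) - f(x_in)) < 1e-12
# ... outside, the penalty adds the distance to the box
x_out = np.array([1.5, -0.4])
assert penalized(x_out) > f(project_box(x_out, 0.0, 1.0))
```

On the box, `penalized` reproduces `f` exactly; outside, every value is dominated by the value at the projected point, which is what makes minimizing (4) equivalent to minimizing (1).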

The present paper outlines the described methodology for a financial portfolio optimization problem under specific constraints, namely 1st-order stochastic dominance (FSD) constraints, where each feasible portfolio stochastically dominates a given reference portfolio. This means that a decision maker with a non-decreasing utility function will prefer any feasible portfolio to the reference one. Such portfolio optimization settings were considered in [Citation2,Citation3] (with 2nd-order stochastic dominance (SSD) constraints) and in [Citation4] (for linear problems with 1st-order stochastic dominance constraints). Problems with FSD constraints are much harder than those with SSD constraints, because the former are non-convex. Noyan and Ruszczyński [Citation5] reduce such problems to linear mixed-integer problems. The present paper develops a different approach to such problems, which is applicable to the nonlinear case as well. Our approach consists in applying an exact penalty method to remove the FSD constraints and solving the resulting penalty problem by non-smooth global optimization methods.

Outline of the paper.

First, Section 2 reviews the literature on decision-making and portfolio optimization under stochastic dominance constraints.

In Section 3, we set the problem of a portfolio optimization under 1st-order stochastic dominance constraints and provide some examples of such problems. Next (cf. Section 4), we reduce the portfolio optimization problem under 1st-order stochastic dominance constraints to the unconstrained problems by means of new exact projective non-smooth and discontinuous penalty functions.

Fourth, we review the successive smoothing method for local optimization of non-smooth and discontinuous functions. This method is used as a local optimizer within the branch and bound framework for the global optimization of the penalty functions. We finally give numerical illustrations (Section 5) of the proposed approach to financial portfolio optimization under 1st-order stochastic dominance constraints on portfolios containing up to 10 components with one risk-free asset.

2. Literature review

The problem of financial portfolio optimization belongs to the class of decision-making problems under uncertainty. The choice of a particular portfolio is accompanied by an uncertain result in the form of a distribution of future returns. This and more general decision problems under stochastic uncertainty are studied in the theory of stochastic programming (cf. [Citation6]). In the general case, the formalization of such problems is carried out using preferences defined on the set of possible uncertain results of decisions made. Preferences establish partial order relationships on the set of decision outcomes, i.e. they satisfy the axioms of reflexivity, transitivity, and antisymmetry. Partial order relations make it possible to narrow the choice of preferable solutions to a subset of non-dominated alternatives. Under additional assumptions about the properties of preferences, the latter can be represented in a numerical form, and then the problem of choosing preferred outcomes turns into a problem of multi-objective optimization. A discussion of stochastic programming problems from this perspective can be found in [Citation7]. Conversely, any numerical function on the set of outcomes specifies a preference relation.

In optimization and financial portfolio management problems, investments are often allocated subject to utility and risk criteria. The original settings of this type were proposed by Markowitz [Citation8] and Roy [Citation9], who used the mean return as a measure of utility and the variance of returns as a measure of risk. An attractive feature of these formulations is the relative simplicity of the resulting optimization problems.

Subsequently, other utility and risk measures were proposed and used, such as quantiles, averaged quantiles, semi-deviations of returns from the average value, probabilities of returns falling into a profit or loss area, general coherent risk measures, and others (the references include [Citation10–25], cf. also the references therein). The corresponding decision selection problems are computationally more complex and require adapting known methods, or even developing new solution methods. For example in [Citation26], the problems of optimizing a financial portfolio in terms of averaged quantiles are reduced to a linear programming problem and can be effectively solved by existing software tools. However, problems that involve quantiles or probabilities are much more difficult because they are non-convex and non-smooth with a possibly non-convex and disconnected admissible region. The problem may be even harder, if the return depends nonlinearly on the portfolio structure (see, e.g. the discussion of the properties of these problems in [Citation12,Citation13]). For example, in the works by Noyan et al. [Citation4], Benati and Rizzi [Citation11], Luedtke et al. [Citation18], Norkin and Boyko [Citation19], Kibzun et al. [Citation16] and Norkin et al. [Citation20], such problems are reduced to problems of mixed-integer programming in the case of a discrete distribution of random data. Gaivoronski and Pflug [Citation13] developed a special method for smoothing a variational series to optimize a portfolio by a quantile criterion. Wozabal et al. [Citation25] give a review of the quantile constrained portfolio selection problem, present a difference-of-convex representation of involved quantiles, and develop a branch and bound algorithm to solve the reformulated problem.

On the other hand, natural relations of stochastic dominance of the first, second and higher orders are known on the set of probability distributions. For example, the first-order stochastic dominance relation holds when one distribution function lies everywhere below (or on) another distribution function. The second-order stochastic dominance relation is determined by comparing the integrals of the distribution functions of the random variables. The second-order stochastic dominance relation can express the decision maker's negative attitude towards risk (see a discussion of these issues in [Citation27,Citation28]).

A natural question arises about the connection between decision-making problems in a multi-criteria formulation and in terms of certain preference relations, in particular, stochastic dominance relations. The connection between mean-risk models and second-order stochastic dominance relations was studied in the works by Ogryczak and Ruszczyński [Citation29–31]. In the works by Dentcheva and Ruszczyński [Citation27,Citation32] it is shown under what conditions the problem of decision-making in terms of preference relations is reduced to the problem of optimizing a numerical indicator.

Dentcheva and Ruszczyński [Citation2,Citation3] proposed a mixed financial portfolio optimization model in which a numerical criterion is optimized, and constraints are specified using second-order stochastic dominance relations. The feasible set in this setting consists of decisions, which dominate some reference one and are preferred by any risk averse decision maker. In the case of a discrete distribution of random data, the problem is reduced to a linear programming problem of (large) dimension. These works have given rise to a large stream of work on stochastic optimization problems under second-order stochastic dominance constraints (see the reviews by Gutjahr and Pichler [Citation7], Dai et al. [Citation33], Dentcheva and Ruszczyński [Citation34] and Fábián et al. [Citation35]).

Noyan et al. [Citation4,Citation5] and Dentcheva and Ruszczyński [Citation36] considered similar mixed problems, but with first-order stochastic dominance relations in the constraints. To solve them, a method of reduction to problems of linear mixed-integer programming with subsequent continuous relaxation of Boolean constraints and the introduction of additional cutting constraints is proposed. The paper [Citation33] develops a quite different approach to solving the problem based on the dual reformulation from [Citation37] by discretization of the space of one-dimensional utilities by step-wise functions, smoothing the latter, and applying the stochastic gradient method for the (local) optimization of the approximate dual function. Dentcheva et al. [Citation38] studied the stability of these problems with respect to perturbation of the involved distributions on the basis of general studies of the stability of stochastic programming problems.

In this article, we consider similar financial portfolio optimization problems under 1st-order stochastic dominance constraints, but from a different point of view, and apply a different solution approach that is also applicable to problems with nonlinear random return functions. This problem is viewed as the problem of optimizing the risk profile of the portfolio according to the preferences of the decision maker. In one statement, the decision maker sets the desired risk profile (the form of the cumulative distribution function) and tries to find an acceptable portfolio that dominates this risk profile. In the other statement, provided that an admissible portfolio exists, i.e. a portfolio dominating some reference portfolio (which can be any index or risk-free portfolio), one chooses a portfolio with the desired risk profile by optimizing one or another function (for example, risk measures such as average quantiles).

In this way it is possible to satisfy the needs of both a risk-seeking and a risk-averse decision maker. In the first case, the average quantile function for high returns is maximized; in the second case, the average quantile function for low returns is maximized, in both cases with a lower bound on the quantile risk profile. In the first case, the risk profile is stretched; in the second, it is compressed and becomes more like the profile of a deterministic value. The reshaping of the risk profile can be achieved both through the selection of a different objective function and by adding new securities, e.g. commodities, to the portfolio, cf. [Citation39].

In the problems under consideration, the objective function is nonlinear, non-convex, and possibly non-smooth or even discontinuous, and the number of constraints is a continuum (uncountable). However, when the reference profile has a step-wise character, one can restrict attention to a finite number of constraints, one per step of the reference profile. In this problem, the admissible region may turn out to be non-convex and disconnected. Thus, the problem under consideration is a global optimization problem with highly complex and nonlinear constraints. To solve it, we first reduce it to an unconstrained global optimization problem by applying new penalty functions, namely, discontinuous penalty functions as in [Citation40,Citation41] and the so-called projective penalty functions as in [Citation1,Citation42]. In the first case, the objective function is extended outside the admissible region by large but finite penalty values; in the second case, it is extended at infeasible points by summing the value of the objective function at the projection of the point onto the feasible set and the distance to this projection. Here, the projection is made in the direction of some known internal feasible point. After such a transformation, the problem is still a complex unconstrained global optimization problem. To solve it, we further apply the method of successive smoothing of the penalty function, i.e. we minimize successive smoothed approximations of the penalized function, starting from relatively large smoothing parameters and gradually decreasing them to zero. It is known that the smoothed functions can be optimized by the method of stochastic gradients, where the latter have the form of finite-difference vectors in random directions, cf. [Citation43–46]. Here, smoothing plays a dual role: firstly, it allows optimizing non-smooth and discontinuous functions and, secondly, it levels out shallow local extrema.
Although smoothing makes it possible to ignore small local extrema, it does not guarantee convergence to the global extremum. Therefore, we embed the sequential smoothing method into a general scheme of the stochastic branch and bound method, where the smoothing method plays the role of a local optimizer on subsets of the optimization domain. The branch and bound scheme is designed so that the computations concentrate in the most promising areas of the search for the global extremum. Our approach is similar to that of Dai et al. [Citation33], but it is applied to the primal problem, uses an exact penalty method instead of the Lagrangian one, and optimizes the penalized function by stochastic finite-difference gradient methods. Besides, we provide global search by a specific stochastic branch and bound algorithm, which is well suited for parallelization.
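The successive smoothing idea described above can be sketched as follows. This is a toy illustration on a simple non-smooth function, not the authors' implementation; the step size, smoothing schedule, and test function are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # non-smooth test function with global minimum at the origin
    return np.abs(x).sum()

def smoothed_min(f, x, sigmas=(1.0, 0.3, 0.1, 0.03), steps=400, lr=0.05):
    """Minimize successive smoothed approximations of f: for each smoothing
    parameter sigma, take stochastic finite-difference gradient steps
    g = (f(x + sigma*u) - f(x)) / sigma * u in random directions u,
    gradually decreasing sigma towards zero."""
    x = np.asarray(x, dtype=float)
    for sigma in sigmas:
        for _ in range(steps):
            u = rng.standard_normal(x.shape)
            g = (f(x + sigma * u) - f(x)) / sigma * u
            x = x - lr * g
    return x

x_star = smoothed_min(f, [2.0, -1.5])
assert f(x_star) < 1.0  # far below the initial value f = 3.5, near the minimum
```

Large `sigma` flattens shallow local structure; small `sigma` recovers accuracy near the minimizer, mirroring the dual role of smoothing described above.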

This article describes the financial portfolio optimization model with 1st-order stochastic dominance constraints and illustrates the proposed solution approach on problems of reshaping the risk profile of portfolios of small dimension. The results of changing the shape of the risk profile are presented in graphical form, which allows visually comparing the resulting profile with the reference one and, if necessary, continuing to adapt the profile to the preferences of the decision maker.

3. Mathematical problem setting

The financial portfolio is described by a vector $x=(x_1,\dots,x_n)^\top$ of asset values $x_i$ and by a random vector of asset returns $\omega=(\omega_1,\dots,\omega_n)^\top\in\Omega$, $i=1,\dots,n$, over some fixed time interval; $(\cdot)^\top$ denotes the transposition of a vector. Denote by
$$X:=\Big\{x\in\mathbb{R}^n:\ \sum_{i=1}^n x_i\le 1,\ x_i\ge c_i\Big\}$$
the set of admissible portfolios with a unit maximal total cost of the whole portfolio; $c_i$ is a lower bound on the value of component $i$ of the portfolio (e.g. a short-selling constraint or a limitation on borrowing assets). In the definition of the set $X$, the inequality $\sum_{i=1}^n x_i\le 1$ is used, which means that $x_0:=1-\sum_{i=1}^n x_i$, the non-invested funds, have zero yield. The portfolio is characterized by a random return $f(x,\omega)=\omega^\top x$, by the mean return $\mu(x)=\mathbb{E}_\omega f(x,\omega)=\sum_{i=1}^n x_i\,\mathbb{E}_\omega\omega_i$ for the considered period of time, and by the variance of return $\sigma^2(x)=\mathbb{E}_\omega\big(f(x,\omega)-\mathbb{E}_\omega f(x,\omega)\big)^2$, where $\mathbb{E}_\omega$ denotes the mathematical expectation with respect to the distribution of the random vector $\omega$.

The classical financial portfolio models assume a linear dependence of the return on the portfolio structure, for which alternative problem reformulations are available. Nonlinearities appear when random returns are modelled by some parametric distribution. Another example of nonlinear portfolio return appears in the dynamic portfolio optimization problem with fixed mix portfolio control strategy, cf. [Citation47].

Suppose the random vector $\omega$ is given by a discrete (e.g. empirical) distribution $\{\omega^1,\dots,\omega^m\}$ with equiprobable values $\omega^i\in\mathbb{R}^n$, $i=1,2,\dots,m$. Then the average portfolio return $\mu(x)=\mathbb{E}\,\omega^\top x$ and the cumulative distribution function (CDF) of the portfolio return $F_x(t)=\Pr\{\omega^\top x\le t\}$ are given by
$$f(x)=\mu_m(x)=\frac1m\sum_{i=1}^m(\omega^i)^\top x \quad\text{and}\quad F_x(t)=F_{x,m}(t)=m^{-1}\,\#\{i:\ (\omega^i)^\top x\le t\}.$$
The portfolio optimization problem under 1st-order stochastic dominance constraints is
$$\text{maximize } f(x) \tag{5}$$
$$\text{subject to } F_x(t)\le F_{\mathrm{ref}}(t)\ \text{ for all } t\in\mathbb{R},\quad x\in X. \tag{6}$$
The objective function $f(x)$ in (5) quantifies the profit, which is to be maximized. As objectives $f(x)$ in the master problem (5), we consider

  • the mean value $\mu(x)$,

  • some Value-at-Risk function $\mathrm{V@R}_\gamma(x)$ at risk level $\gamma\in(0,1)$,

  • the Average Value-at-Risk function
$$\mathrm{AV@R}_{\alpha,\beta}(x):=\frac{1}{\beta-\alpha}\int_\alpha^\beta\mathrm{V@R}_\gamma(x)\,d\gamma,\quad 0\le\alpha<\beta\le 1,$$
in particular $\mathrm{AV@R}_\gamma(x):=\mathrm{AV@R}_{\gamma,1}(x)$ and $\mathrm{AV@R}_{0,1}(x)=\mu(x)$.
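For a discrete scenario distribution, these three candidate objectives can be estimated directly from the scenario matrix. The sketch below assumes the "return quantile" reading of V@R and approximates the AV@R integral by averaging quantiles over a midpoint grid; the function names are ours, and sign conventions for V@R vary across the literature.

```python
import numpy as np

def portfolio_returns(x, scenarios):
    """Equiprobable scenario returns (omega^i)^T x, i = 1..m."""
    return scenarios @ x

def mean_return(x, scenarios):
    return portfolio_returns(x, scenarios).mean()

def var_gamma(x, scenarios, gamma):
    """V@R_gamma read as the gamma-quantile of the return distribution
    (sign conventions vary; this is an illustrative choice)."""
    return np.quantile(portfolio_returns(x, scenarios), gamma)

def avar(x, scenarios, alpha, beta, grid=1000):
    """AV@R_{alpha,beta}(x) = (beta-alpha)^{-1} * int_alpha^beta V@R_gamma dgamma,
    approximated by averaging quantiles at midpoints of a uniform grid."""
    gammas = np.linspace(alpha, beta, grid, endpoint=False) + (beta - alpha) / (2 * grid)
    return np.quantile(portfolio_returns(x, scenarios), gammas).mean()

# tiny illustration with m = 4 scenarios and n = 2 assets
scen = np.array([[0.05, 0.10], [0.02, -0.03], [0.00, 0.07], [-0.01, 0.12]])
x = np.array([0.5, 0.5])
assert np.isclose(var_gamma(x, scen, 0.5), 0.045)
# AV@R_{0,1} approximately recovers the mean, as stated above
assert abs(avar(x, scen, 0.0, 1.0) - mean_return(x, scen)) < 0.01
```

The small mismatch between `avar(x, scen, 0, 1)` and the exact mean comes from the quantile interpolation of the discrete sample.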

The constraints (6) address risk: the cumulative distribution function $F_x(\cdot)$ of the feasible portfolio $x$ must not be worse than the reference function $F_{\mathrm{ref}}(t)$ (also called a reference risk profile) for all $t\in\mathbb{R}$. The reference function itself may be given by $F_{x_{\mathrm{ref}}}(t+\delta(t))$, where $x_{\mathrm{ref}}$ is some reference portfolio with cumulative distribution function $F_{x_{\mathrm{ref}}}(t)=\Pr\{\omega^\top x_{\mathrm{ref}}\le t\}$.

We can formally associate some random variable $\xi_{\mathrm{ref}}$ with the CDF $F_{\mathrm{ref}}(t)$. Then the family of inequalities (6) ensures that the random variable $\xi_x=\omega^\top x$ dominates the random variable $\xi_{\mathrm{ref}}$ in the 1st stochastic order. We further remark that there can be several stochastic dominance constraints with corresponding reference CDFs $F^i_{\mathrm{ref}}$, which can be replaced by the single CDF $F_{\mathrm{ref}}=\min_i F^i_{\mathrm{ref}}$.

The constraints (6) are tight, often too tight to ensure a non-empty set of feasible solutions. The function $\delta(t)\ge 0$ relaxes these tight constraints by accepting higher losses at given probabilities.

The function $F_x(\cdot)$ is non-decreasing and continuous from the right (upper semicontinuous); the functions $F_{\mathrm{ref}}(\cdot)$ and $\delta(\cdot)$ are assumed to be right-continuous.

Lemma 3.1

The feasible set in (6) is either empty or compact.

Proof.

Indeed, define
$$X^<:=\{x\in X:\ F^<_x(t)\le F_{\mathrm{ref}}(t)\ \ \forall t\in\mathbb{R}\} \quad\text{and}\quad X^\le:=\{x\in X:\ F_x(t)\le F_{\mathrm{ref}}(t)\ \ \forall t\in\mathbb{R}\}$$
with right-continuous $F_{\mathrm{ref}}(\cdot)$, $F_x(\cdot)$ and left-continuous $F^<_x(t):=\Pr\{\omega^\top x<t\}$. On one hand, it holds that $X^\le\subset X^<$, since $F^<_x(t)\le F_x(t)$. On the other hand, if $x\in X^<$, then it holds for any $t$ that
$$F_x(t)\le\lim_{\tau\downarrow t}F^<_x(\tau)\le\lim_{\tau\downarrow t}F_{\mathrm{ref}}(\tau)=F_{\mathrm{ref}}(t).$$
So $X^<\subset X^\le$ and, hence, $X^<=X^\le$. In this case, the function $F^<_x(t)$ appears to be lower semicontinuous in $(x,t)$; hence the sets $\{x\in\mathbb{R}^n:\ F^<_x(t)\le F_{\mathrm{ref}}(t)\}$ and $\{x\in X:\ F^<_x(t)\le F_{\mathrm{ref}}(t)\}$ are closed for each $t$. If the feasible set $X^\le=X^<$ in (6) is non-empty, then it is closed and bounded and, hence, compact. This completes the proof.

Lemma 3.2

If the function $F_x(t)$ has only finitely many jumps, e.g. at $T_x=\{(\omega^1)^\top x,\dots,(\omega^m)^\top x\}$, then $\{x\in X:\ F_x(t)\le F_{\mathrm{ref}}(t)\ \forall t\in\mathbb{R}\}=\{x\in X:\ F_x(t)\le F_{\mathrm{ref}}(t),\ t\in T_x\}$.

Proof.

Let $T_x=\{t_1,t_2,\dots\}$, $F_x(t)=F_x(t_k)$ for $t\in[t_k,t_{k+1})$, $k\ge1$, and $F_x(t)=0$ for $t<t_1$. Then $F_x(t)=F_x(t_k)\le F_{\mathrm{ref}}(t_k)\le F_{\mathrm{ref}}(t)$ for all $t\in[t_k,t_{k+1})$, $k\ge1$. Besides, $F_x(t)=0\le F_{\mathrm{ref}}(t)$ for all $t<t_1$.
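Lemma 3.2 suggests a finite test of the FSD constraint (6) for empirical distributions. The sketch below checks the inequality on the (conservative) finite grid formed by the jump points of both samples; the helper names are hypothetical.

```python
import numpy as np

def empirical_cdf(values, t):
    """F_{x,m}(t) = m^{-1} #{i : v_i <= t} for equiprobable scenario returns;
    t may be a scalar or an array of evaluation points."""
    values = np.asarray(values)
    return np.mean(values[:, None] <= np.atleast_1d(t), axis=0)

def fsd_dominates(port_returns, ref_returns):
    """Check F_x(t) <= F_ref(t) for all t. By Lemma 3.2 it suffices to test
    at the jump points of F_x; here we use both samples' points, which is
    a conservative finite grid."""
    ts = np.concatenate([port_returns, ref_returns])
    return bool(np.all(empirical_cdf(port_returns, ts)
                       <= empirical_cdf(ref_returns, ts) + 1e-12))

# a sample shifted up by a constant dominates the original in 1st order
ref = np.array([0.01, -0.02, 0.03, 0.00])
assert fsd_dominates(ref + 0.005, ref)
assert not fsd_dominates(ref - 0.005, ref)
```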

An alternative problem formulation. A reformulation of the problem (5)–(6) involves the inverse cumulative distribution functions instead of the CDFs. To this end, let
$$Q_x(\alpha):=\sup_{t\in\mathbb{R}}\{t:\ F_x(t)\le\alpha\},\quad\alpha\in[0,1],$$
be the return quantile (generalized inverse) function associated with the decision $x$, and let $Q_{\mathrm{ref}}(\alpha)$ be some reference quantile function, continuous from below (lower semicontinuous).

Consider the problem
$$\text{maximize } f(x) \tag{7}$$
$$\text{subject to } Q_{\mathrm{ref}}(\alpha)\le Q_x(\alpha)\ \text{ for all } \alpha\in[0,1],\quad x\in X. \tag{8}$$
The following lemmas demonstrate that problem (7)–(8) is well-defined and equivalent to the initial problem statement (5)–(6).
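For an equiprobable sample, the generalized inverse $Q_x$ can be computed in closed form, which makes the quantile form of the dominance constraint easy to test numerically. A minimal sketch with our own helper (the $\alpha=1$ endpoint is clamped to the largest observation, whereas the formal supremum is $+\infty$):

```python
import numpy as np

def q_x(returns, alpha):
    """Generalized inverse Q_x(alpha) = sup{t : F_x(t) <= alpha} for an
    equiprobable sample; alpha = 1 is clamped to the largest value."""
    v = np.sort(np.asarray(returns))
    m = len(v)
    k = min(int(np.floor(alpha * m)), m - 1)
    return v[k]

ref = np.array([0.01, -0.02, 0.03, 0.00])
up = ref + 0.005  # FSD-dominates ref
# the quantile form of the constraint: Q_ref(alpha) <= Q_x(alpha) for all alpha
alphas = np.linspace(0.0, 0.999, 200)
assert all(q_x(ref, a) <= q_x(up, a) for a in alphas)
```

For the sorted sample $v_1\le\dots\le v_m$, the set $\{t: F_x(t)\le\alpha\}$ is $(-\infty, v_{k+1})$ with $k=\lfloor\alpha m\rfloor$, so the supremum is $v_{k+1}$; this is what the index arithmetic implements.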

Lemma 3.3

Define $F^<_x(t):=\Pr\{\omega^\top x<t\}$ and $Q^<_x(\alpha):=\sup\{t:\ F^<_x(t)\le\alpha\}$. Then $Q_x(\alpha)=Q^<_x(\alpha)$, both functions $Q_x(\alpha)$ and $Q^<_x(\alpha)$ are upper semicontinuous in $(x,\alpha)$, and hence the feasible set in (8) is compact.

Proof.

First, it holds for any $\alpha\in[0,1]$ that $Q^<_x(\alpha)=\sup\{t:\ F^<_x(t)\le\alpha\}\ge\sup\{t:\ F_x(t)\le\alpha\}=Q_x(\alpha)$, since $F^<_x(t)\le F_x(t)$. Let us prove the opposite inequality, $Q_x(\alpha)\ge Q^<_x(\alpha)$. For a given $\alpha$ there exists $t_{x,\alpha}$ such that $Q^<_x(\alpha)=\sup\{t:\ F^<_x(t)\le\alpha\}=t_{x,\alpha}$, and thus $F^<_x(t)\le\alpha$ for all $t\le t_{x,\alpha}$. For any $t<t_{x,\alpha}$, it holds that $F_x(t)\le F^<_x(t_{x,\alpha})\le\alpha$. Hence, we obtain the required opposite inequality, $Q_x(\alpha)=\sup\{t:\ F_x(t)\le\alpha\}\ge t_{x,\alpha}=Q^<_x(\alpha)$, and thus $Q_x(\alpha)=Q^<_x(\alpha)$. Next, $Q^<_x(\cdot)$ is a maximum function; by Aubin and Ekeland [Citation48, Ch. 1, Sec. 1, Prop. 21] it is upper semicontinuous in $(x,\alpha)$. Hence, the feasible set in (8) is either empty or compact as an intersection of the compact sets $\{x\in X:\ Q_x(\alpha)\ge Q_{\mathrm{ref}}(\alpha)\}$, $\alpha\in[0,1]$. The proof is complete.

The next lemma states that problems (5)–(6) and (7)–(8) are equivalent.

Lemma 3.4

Let $Q_{\mathrm{ref}}(\alpha):=\sup\{t:\ F_{\mathrm{ref}}(t)\le\alpha\}$ (cf. (8)), where $F_{\mathrm{ref}}(t)$ is the same (right-continuous) reference distribution function as in (6). Then the problems (5)–(6) and (7)–(8) are equivalent, i.e. they have the same objective function and their feasible sets coincide.

Proof.

We have to prove that (6) and (8) define the same feasible set. Assume (6) is fulfilled but (8) is not. Then there exists $\alpha_0$ such that $Q_{\mathrm{ref}}(\alpha_0)>Q_x(\alpha_0)$. This implies that $F_x(Q_{\mathrm{ref}}(\alpha_0))>\alpha_0$. From here, by assumption (6), $F_{\mathrm{ref}}(Q_{\mathrm{ref}}(\alpha_0))\ge F_x(Q_{\mathrm{ref}}(\alpha_0))>\alpha_0$. But by right continuity of $F_{\mathrm{ref}}$, the supremum in $\sup\{t:\ F_{\mathrm{ref}}(t)\le\alpha_0\}$ is achieved at $t=Q_{\mathrm{ref}}(\alpha_0)$, so $F_{\mathrm{ref}}(Q_{\mathrm{ref}}(\alpha_0))\le\alpha_0$, a contradiction.

Assume (8) is fulfilled but (6) is not. Then there exists $t_0$ such that $F_{\mathrm{ref}}(t_0)<F_x(t_0)$. Due to the right continuity of a distribution function, there is $t_1>t_0$ such that $\alpha_0:=F_{\mathrm{ref}}(t_1)<F_x(t_0)$. By definition of a quantile, $Q_{\mathrm{ref}}(\alpha_0)=\sup\{t:\ F_{\mathrm{ref}}(t)\le\alpha_0\}\ge t_1>t_0$. On the other hand, $F_x(t)\le\alpha_0<F_x(t_0)$ implies $t<t_0$, so that $Q_x(\alpha_0)\le t_0<Q_{\mathrm{ref}}(\alpha_0)$, which contradicts (8). This completes the proof.

Lemma 3.5

If the reference function $Q_{\mathrm{ref}}$ has a step-like character with steps at $A_{\mathrm{ref}}=\{0=\alpha_1,\alpha_2,\dots,\alpha_k=1\}$, i.e. $Q_{\mathrm{ref}}(\alpha)=Q_{\mathrm{ref}}(\alpha_i)$ for $\alpha\in[\alpha_i,\alpha_{i+1})$, $i\in\{1,\dots,k-1\}$, then
$$\{x\in X:\ Q_{\mathrm{ref}}(\alpha)\le Q_x(\alpha)\ \ \forall\alpha\in[0,1]\}=\{x\in X:\ Q_{\mathrm{ref}}(\alpha)\le Q_x(\alpha)\ \ \forall\alpha\in A_{\mathrm{ref}}\}.$$

Proof.

Assume $Q_{\mathrm{ref}}(\alpha_i)\le Q_x(\alpha_i)$ for all $i\in\{1,\dots,k\}$. Then $Q_{\mathrm{ref}}(\alpha)=Q_{\mathrm{ref}}(\alpha_i)\le Q_x(\alpha_i)\le Q_x(\alpha)$ for all $\alpha\in[\alpha_i,\alpha_{i+1})$, $i\in\{1,\dots,k-1\}$, since $Q_x$ is non-decreasing; besides, $Q_{\mathrm{ref}}(\alpha_k)\le Q_x(\alpha_k)$.

Employing different objective functions $f(x)$ in (7) allows reshaping the risk profiles $F_x(t)$ and $Q_x(\alpha)$ in a desirable manner. For example, the problem
$$\text{maximize } \mathrm{AV@R}_{\gamma,1}(x) \quad\text{subject to } Q_{x_{\mathrm{ref}}}(\alpha)-\delta(\alpha)\le Q_{x,m}(\alpha),\ \ \delta(\alpha)\ge0\ (\alpha\in[0,1]),\ \ x\in X,$$
can be used for searching for a more risky but potentially more profitable portfolio than some reference one $x_{\mathrm{ref}}$ with risk profile $Q_{x_{\mathrm{ref}}}(\alpha)$ and step-back function $\delta(\alpha)$. The problem
$$\text{maximize } \mathrm{AV@R}_{0,\gamma}(x) \quad\text{subject to } Q_{x_{\mathrm{ref}}}(\alpha)-\delta(\alpha)\le Q_{x,m}(\alpha),\ \ \delta(\alpha)\ge0\ (\alpha\in[0,1]),\ \ x\in X,$$
can be used to obtain a less risky and less profitable portfolio than the reference portfolio $x_{\mathrm{ref}}$. Note, however, that the objective functions in these problems can be discontinuous.

Examples

The following two examples address particular cases of the problem setting (7)–(8) involving the generalized inverse. Example 3.7 extends the problem setting from portfolio optimization to decision-making under dangerous threats.

Example 3.6

Portfolio selection under a single Value-at-Risk (V@R) constraint, cf. [Citation25] and corresponding references therein

Let $Q_x(\alpha)$ be the $\alpha$-quantile of the random return $\xi_x=\omega^\top x$ for a given $\alpha$, let $q_0$ be the reference value for $Q_x(\alpha_0)$, and let $\tau\le\min_{1\le i\le m}\omega_i$ a.s. Consider the problem
$$\text{maximize } f(x)=\mathbb{E}\,\omega^\top x \quad\text{subject to } Q_x(\alpha_0)\ge q_0,\quad x\in X, \tag{9}$$
where $\alpha_0$ is a fixed risk level.

With the reference quantile function
$$Q_{\mathrm{ref}}(\alpha)=\begin{cases} q_0 & \text{if } \alpha\ge\alpha_0,\\ \tau & \text{if } \alpha<\alpha_0,\end{cases}$$
the constraints in this example are equivalent to $Q_x(\alpha)\ge Q_{\mathrm{ref}}(\alpha)$, $\alpha\in[0,1]$, $x\in X$, i.e. they are of the form (8).

Example 3.7

Decision making under catastrophic risks, [Citation49]

Catastrophic risks, such as catastrophic floods, earthquakes, tsunamis, etc., designate 'low probability – high consequences' events. Usually, they are described by a list of possible extreme events (indexed by $i=1,2,\dots,I$) that can happen once in 10, 50, 100, etc., years. Decision-making under catastrophic risks means designing certain mitigation measures to prevent unacceptable losses. The following framework for decision-making under catastrophic risks is proposed.

Let a vector of parameters $x\in X$ describe a decision (a complex of countermeasures) from some (compact) set $X$ of possible decisions, each associated with costs $c(x)$. For each kind of event $i$, experts can define reasonable ('acceptable') levels of losses $q_i$ due to this event, $0<q_1<\dots<q_I$. Suppose we can model each event $i$, its consequences and losses $\ell_i(x)$ under the decision $x\in X$. Then the corresponding decision-making problem is
$$\text{minimize } c(x) \quad\text{subject to } \ell_i(x)\le q_i,\ i=1,2,\dots,I;\quad x\in X. \tag{10}$$
Although the framework does not include explicit probabilities of the events $i$, we can formally introduce probabilities $1>p_1>p_2>\dots>p_I$, e.g. $p_1=1/10$, $p_2=1/50$, $p_3=1/100$, etc., that event $i$ happens in any given year. By defining the two quantile functions
$$Q_{\mathrm{ref}}(\alpha)=\begin{cases} q_I, & \alpha\in[0,p_I),\\ q_{I-1}, & \alpha\in[p_I,p_{I-1}),\\ \ \vdots & \\ q_1, & \alpha\in[p_2,p_1),\\ 0, & \alpha\in[p_1,1],\end{cases} \qquad Q_x(\alpha)=\begin{cases} \ell_I(x), & \alpha\in[0,p_I),\\ \ell_{I-1}(x), & \alpha\in[p_I,p_{I-1}),\\ \ \vdots & \\ \ell_1(x), & \alpha\in[p_2,p_1),\\ 0, & \alpha\in[p_1,1],\end{cases}$$
we can formally express the constraints (10) as $Q_x(\alpha)\le Q_{\mathrm{ref}}(\alpha)$ for all $\alpha\in[0,1]$, i.e. in terms of 1st-order stochastic dominance.

4. The solution approach: exact penalty functions

In the case of a discrete random vector $\omega$, the function $F_x(t)$ in (6) is discontinuous in $t$ and $x$. For the solution of the problem (5)–(6), we apply the exact discontinuous and the exact projective penalty functions from [Citation1,Citation40–42,Citation50].

4.1. Finding a feasible solution

If the distribution function $F_{x,m}(\cdot)$ has a step-wise character with jumps at the points $T_x=\{t_1,\dots,t_m\}$, we may set
$$G_m(x):=\max_{t\in T_x}\big(F_{x,m}(t)-F_{\mathrm{ref}}(t)\big).$$
With that, due to Lemma 3.2, the stochastic dominance constraints in (6) are equivalent to the inequality $G_m(x)\le 0$.

To find a feasible solution $x_0$ for the problem (5)–(6), we solve the problem
$$\min_{x\in X} G_m(x). \tag{11}$$
If for some $x_0\in X$ it holds that $G_m(x_0)\le 0$, then $x_0$ is a feasible solution of the problem (5)–(6).
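A crude way to attack (11) numerically is random search over admissible portfolios. The sketch below uses a simplex-constrained $X$ (a simplification of the set $X$ of Section 3), synthetic scenario data, and a constant relaxation $\delta$, all of which are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical scenario matrix: m = 50 scenarios, n = 3 assets
scen = rng.normal(loc=[0.03, 0.05, 0.02], scale=[0.02, 0.08, 0.01], size=(50, 3))

def ecdf(values, t):
    return np.mean(np.asarray(values)[:, None] <= np.atleast_1d(t), axis=0)

x_ref = np.array([0.5, 0.3, 0.2])
ref_returns = scen @ x_ref
delta = 0.01  # constant relaxation delta(t), cf. the discussion of (6)
F_ref = lambda t: ecdf(ref_returns - delta, t)  # = F_{x_ref}(t + delta)

def G_m(x):
    """G_m(x) = max_{t in T_x} (F_{x,m}(t) - F_ref(t));
    G_m(x) <= 0 certifies feasibility of x for the FSD constraint."""
    r = scen @ x
    return np.max(ecdf(r, r) - F_ref(r))

def find_feasible(tries=2000):
    """Crude random search for (11): min_{x in X} G_m(x) over the simplex."""
    best_x, best_g = None, np.inf
    for _ in range(tries):
        x = rng.dirichlet(np.ones(3))  # random point of the unit simplex
        g = G_m(x)
        if g < best_g:
            best_x, best_g = x, g
    return best_x, best_g

# the relaxed reference portfolio certifies itself as feasible
assert G_m(x_ref) <= 1e-12
x0, g0 = find_feasible()
assert np.isfinite(g0)
```

Random search is only a baseline; the paper's actual procedure replaces it with smoothing-based local search inside branch and bound.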

Similarly, due to Lemma 3.5, to find a feasible solution of problem (7)–(8), we solve the problem
$$\min_{x\in X} H_m(x):=\max_{\alpha\in A_{\mathrm{ref}}}\big(Q_{\mathrm{ref}}(\alpha)-Q_{x,m}(\alpha)\big), \tag{12}$$
where $A_{\mathrm{ref}}$ is the set of jump points of $Q_{\mathrm{ref}}(\cdot)$.

4.2. Structure of the feasible set

The structure of the feasible set of the problems (5)–(6) and (7)–(8) heavily depends on the choice of the reference profiles $F_{\mathrm{ref}}(t)$ and $Q_{\mathrm{ref}}(\alpha)$. Figure 1 illustrates disconnected and non-convex feasible sets for a portfolio consisting of only 3 components.

Figure 1. Possible (disconnected and non-convex) shapes of feasible sets under 1st-order SDC ($x_{\mathrm{ref}}=(x_1=0.31,\ x_2=0.69)$, $\delta=0.05,0,0$).


Suppose the portfolio $x_0=(0.31,0.69,0)$ includes the first two assets of Table 9 from the appendix, with random return $\omega$ given by the first two columns of this table. Let $F_0(t):=\Pr\{\omega^\top x_0\le t\}$ be the risk profile of this portfolio and $F_{\mathrm{ref}}(t):=F_0(t+\delta)$ (with constant $\delta=5\%$) be the reference risk profile. The left panel of Figure 1 displays the shape of the non-convex disjoint feasible set of problem (5)–(6) for this example. The middle panel gives an example of the feasible set of problem (5)–(6) when a risk-free asset is infeasible. The right panel of Figure 1 corresponds to the case of a feasible risk-free portfolio.

When the distribution of returns $\omega$ is discrete (i.e. represented by scenarios), the distribution function $F_x(t)$ of the portfolio return is discontinuous in $x$ as a weighted sum of step-wise indicator functions, so the stochastic dominance constraints are represented by discontinuous functions. Generally, therefore, we deal with discontinuous optimization problems, which are a challenge for optimization theory. In particular, the constraints in these problems can be non-convex and disjoint. Our first step towards a solution of such problems is to eliminate the constraints by a penalty method. However, the standard exact penalty method, which adds a penalty term to the objective function, is invalid in this case. In the next section we consider two new exact penalty methods, a discontinuous and a projective one.

4.3. Exact discontinuous penalty functions

To find an optimal solution of problem (5)–(6), we solve the problem
$$\max_{x\in X} F(x):=\begin{cases} f(x) & \text{if } G_m(x)\le0,\\ c-G_m(x) & \text{else},\end{cases} \tag{13}$$
where $c<\sup_{\{x\in X:\ G_m(x)\le0\}} f_m(x)$, for example, $c=f(x_0)$ with $G_m(x_0)\le0$.

Obviously, the global maxima of the problems (5)–(6) and (13) coincide.
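The construction (13) is easy to state in code. The toy instance below is ours (a one-dimensional `f` and `G`, not the portfolio functions); it only illustrates that the discontinuous penalty pushes infeasible values below the feasible ones, so the unconstrained maximizer coincides with the constrained one.

```python
import numpy as np

def discontinuous_penalty(f, G, x0):
    """Exact discontinuous penalty (13): keep f on the feasible set
    {G <= 0}; outside it, return c - G(x) with c = f(x0) for a known
    feasible x0, so every infeasible value lies below a feasible one."""
    c = f(x0)
    def F(x):
        g = G(x)
        return f(x) if g <= 0 else c - g
    return F

# toy instance: maximize f on {x : G(x) <= 0} = [-1, 1]
f = lambda x: -(x - 2.0) ** 2
G = lambda x: abs(x) - 1.0
F = discontinuous_penalty(f, G, x0=0.0)

xs = np.linspace(-3, 3, 6001)
x_best = xs[np.argmax([F(x) for x in xs])]
assert abs(x_best - 1.0) < 2e-3  # constrained maximizer is x = 1
```

Note the jump of `F` at the boundary of the feasible set: this is why the subsequent smoothing step is needed before gradient-type methods can be applied.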

We can further remove the constraint $x\in X$ from problem (13) by subtracting the exact projective penalty term $\|x-\pi_X(x)\|$, where $\pi_X(x)$ is the projection of $x$ on $X$, from the objective function in (13), and solve the global problem
$$\max_{x\in\mathbb{R}^n}\Phi(x):=F(\pi_X(x))-\|x-\pi_X(x)\| \tag{14}$$
instead.

4.4. Exact projective penalty functions

Let $x_0\in X$ be some feasible solution of problem (5)–(6), i.e. $x_0\in X$ and $G_m(x_0)\le0$. For any $x\in\mathbb{R}^n$ denote $x_\lambda=(1-\lambda)x_0+\lambda x$. Define a projection point
$$p_{G_m}(x)=\begin{cases} x & \text{if } G_m(x)\le0,\\ x_{\lambda_x} & \text{if } G_m(x)>0,\end{cases} \tag{15}$$
where $\lambda_x=\sup\{\lambda\in[0,1]:\ G_m(x_\lambda)\le0\}$. Now, instead of the constrained problem (5)–(6), consider the unconstrained problem
$$\max_{x\in\mathbb{R}^n}\Big[F_m(x):=f_m\big(p_{G_m}(\pi_X(x))\big)-\big\|p_{G_m}(\pi_X(x))-\pi_X(x)\big\|-\|x-\pi_X(x)\|\Big]. \tag{16}$$
Similarly, instead of the constrained problem (7)–(8), we can consider the unconstrained problem
$$\max_{x\in\mathbb{R}^n}\Big[F_m(x):=f_m\big(p_{H_m}(\pi_X(x))\big)-\big\|p_{H_m}(\pi_X(x))-\pi_X(x)\big\|-\|x-\pi_X(x)\|\Big]. \tag{17}$$
The exact projective penalty method was studied and tested in Norkin [Citation1,Citation42] and Galvan [Citation50]. The main features of the method are: it is exact, it does not require selection of the right penalty parameter, and it does not use the objective function values outside the feasible set. Also, for the considered portfolio reshaping problem, the exact projective penalty function can be found in a closed form.

By Norkin [Citation1, Theorem 4.4], the global maxima of problems (5)–(6) and (16) coincide, and, if the mapping $p_{G_m}(\pi_X(x))$ is continuous, the local maxima of both problems coincide as well, i.e. the optimization problems are equivalent. The mapping $p_{G_m}(\pi_X(x))$ is continuous if the feasible set is convex and the projection mapping (15) uses an interior point $x_0$ of the feasible set of the considered problem. For example, a feasible risk-free portfolio can represent such a point.
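Numerically, the radial projection (15) reduces to a one-dimensional root search for $\lambda_x$, e.g. by bisection. The following sketch implements the projective penalty for points already in $X$ (so the outer projection $\pi_X$ of (16) is omitted); the ball-constrained toy instance and all names are our assumptions.

```python
import numpy as np

def lambda_x(G, x0, x, tol=1e-10):
    """lambda_x = sup{lambda in [0,1] : G((1-lambda)*x0 + lambda*x) <= 0},
    found by bisection; assumes G(x0) <= 0 and that the feasible part of
    the segment from x0 is an interval (as for star-shaped sets)."""
    if G(x) <= 0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G((1 - mid) * np.asarray(x0) + mid * np.asarray(x)) <= 0:
            lo = mid
        else:
            hi = mid
    return lo

def projective_penalty(f, G, x0):
    """Projective penalty, cf. (15)-(16), for x already in X:
    F(x) = f(p(x)) - ||p(x) - x|| with p(x) the radial projection (15)."""
    def F(x):
        x = np.asarray(x, dtype=float)
        lam = lambda_x(G, x0, x)
        p = (1 - lam) * np.asarray(x0) + lam * x
        return f(p) - np.linalg.norm(p - x)
    return F

# toy: feasible set {||x|| <= 1}, objective f(x) = x[0], center x0 = 0
f = lambda x: x[0]
G = lambda x: np.linalg.norm(x) - 1.0
F = projective_penalty(f, G, np.zeros(2))
# at x = (2, 0): p = (1, 0), so F = f(p) - ||p - x|| = 1 - 1 = 0
assert abs(F(np.array([2.0, 0.0])) - 0.0) < 1e-6
```

Because $x_0$ is an interior point of the ball, the composed mapping is continuous, matching the equivalence condition stated above.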

Remark 4.1

Computational aspects

The calculation of the projections \(p_{G_m}(\cdot)\) and \(p_{H_m}(\cdot)\) requires finding roots of the equations \(\phi(\lambda)\equiv G_m(x_\lambda)=0\) and \(\psi(\lambda)\equiv H_m(x_\lambda)=0\). This requires multiple evaluations of the functions \(\phi(\lambda)\) and \(\psi(\lambda)\) and hence multiple constructions of \(F_{x_\lambda,m}(t)=\Pr\{\omega^\top x_\lambda < t\}\) or \(Q_{x_\lambda,m}(\alpha)\) for different portfolios \(x_\lambda\). This may take considerable time in the case of a large number of observations m. However, in the case of a specific feasible point, namely a risk-free (feasible) portfolio, these functions can easily be found through \(F_{x,m}(t)\) and \(Q_{x,m}(\alpha)\), as the following statements show.

Proposition 4.2

Let \(x^0=(1,0,\dots,0)\) be a risk-free portfolio with fixed return r, and let \(x\in X\) be an arbitrary portfolio with random return \(f(x,\omega)=\omega^\top x\), return cumulative distribution function \(F_x(t)\), \(t\in\mathbb{R}\), and corresponding inverse (quantile) function \(Q_x(\alpha)\), \(\alpha\in[0,1]\), where \(X=\{x\in\mathbb{R}^n : \sum_{i=1}^n x_i = 1,\ x_i\ge 0,\ i=1,\dots,n\}\). Consider a mixed portfolio of the form \(x_\lambda = \lambda x + (1-\lambda)x^0\), \(\lambda\in[0,1]\). Its distribution function and inverse distribution function are expressed through \(F_x(t)\) and \(Q_x(\alpha)\) as
\[
F_{x_\lambda}(t) = F_x\bigl((t-(1-\lambda)r)/\lambda\bigr), \qquad Q_{x_\lambda}(\alpha) = \lambda Q_x(\alpha) + (1-\lambda) r.
\]

Proof.

Denote \(f(x,\omega)=\omega^\top x\) and \(f(x_\lambda,\omega)=\omega^\top x_\lambda\). Then
\[
F_{x_\lambda}(t) = \Pr\{f(x_\lambda,\omega) < t\} = \Pr\{(1-\lambda)\omega^\top x^0 + \lambda f(x,\omega) < t\} = \Pr\{(1-\lambda)r + \lambda f(x,\omega) < t\} = \Pr\{f(x,\omega) < (t-(1-\lambda)r)/\lambda\} = F_x\bigl((t-(1-\lambda)r)/\lambda\bigr).
\]
Next, by definition, \(Q_{x_\lambda}(\alpha)\) is the optimal value of the optimization problem
\[
\max\{t\in\mathbb{R} : F_{x_\lambda}(t)\le\alpha\} = \max\{t\in\mathbb{R} : \Pr\{f(x_\lambda,\omega) < t\}\le\alpha\} = \max\{t\in\mathbb{R} : F_x\bigl((t-(1-\lambda)r)/\lambda\bigr)\le\alpha\}.
\]
With the variable \(\tau = (t-(1-\lambda)r)/\lambda\), the latter problem is equivalent to
\[
\max\{\lambda\tau + (1-\lambda)r : F_x(\tau)\le\alpha\}.
\]
The optimal value of this problem equals
\[
\lambda \sup_{\tau\in\mathbb{R}}\{\tau : F_x(\tau)\le\alpha\} + (1-\lambda)r = \lambda Q_x(\alpha) + (1-\lambda)r = Q_{x_\lambda}(\alpha),
\]
which completes the proof.

The preceding proposition shows that the quantile function \(Q_{x_\lambda}(\alpha)\) of the mixed portfolio \(x_\lambda = \lambda x + (1-\lambda)x^0\) is the weighted average of the return r of the risk-free portfolio \(x^0\) and \(Q_x(\alpha)\), the return quantile function of the portfolio x.
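Proposition 4.2 can be checked empirically on a sample: the empirical quantile of the mixed return λ f(x,ω) + (1−λ) r equals the mixed quantile λ Q_x(α) + (1−λ) r, because an increasing affine transform commutes with (linearly interpolated) sample quantiles. A quick sketch with synthetic returns (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
risky = rng.normal(0.08, 0.2, size=10_000)   # sample of the risky portfolio return
r, lam = 0.03, 0.6                           # risk-free return and mixing weight
mixed = lam * risky + (1 - lam) * r          # return of x_lam = lam*x + (1-lam)*x0

for alpha in (0.1, 0.4, 0.7):
    q_mixed = np.quantile(mixed, alpha)
    q_formula = lam * np.quantile(risky, alpha) + (1 - lam) * r
    assert abs(q_mixed - q_formula) < 1e-12  # Proposition 4.2 on samples
```

This is exactly the re-use argument of the next paragraph: the empirical quantiles of the risky portfolio are computed once and then rescaled for every λ.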

Thus, for a known risk-free feasible portfolio \(x^0=(1,0,\dots,0)\) with a fixed return r>0, the calculations can be reduced considerably. Indeed, we need to calculate only one CDF \(F_{x,m}(\cdot)\) and can re-use it for the CDFs \(F_{x_\lambda,m}(\cdot)\) for different values of λ. In the case of a discrete reference CDF \(F_{\mathrm{ref}}(t)\) with jump points \((T_{\mathrm{ref}},A_{\mathrm{ref}})=\{(t_1,\alpha_1),\dots,(t_k,\alpha_k),\dots\}\), the projection \(p_{H_m}(x)\) can be found in analytical form, as the following proposition states.

Proposition 4.3

Assume that \(Q_{\mathrm{ref}}(1) < r\); then there exists a risk-free internal feasible portfolio \(x^0\) with return r. Further, the projection \(p_{H_m}(x)\) can be stated in the closed form \(p_{H_m}(x) = (1-\lambda_x)x^0 + \lambda_x x\), where
\[
\lambda_x = \begin{cases} 1, & \text{if } H_m(x) \le 0,\\[2pt] \displaystyle\min_{\{\alpha\in A_{\mathrm{ref}} :\ Q_x(\alpha) < Q_{\mathrm{ref}}(\alpha)\}} \frac{Q_{\mathrm{ref}}(\alpha) - r}{Q_x(\alpha) - r}, & \text{if } H_m(x) > 0. \end{cases}
\]

Proof.

It holds that \(Q_{\mathrm{ref}}(\alpha) \le Q_{\mathrm{ref}}(1) < r = Q_{x^0}(\alpha)\), i.e. the risk-free portfolio \(x^0=(1,0,\dots,0)\) is feasible and internal.

For \(H_m(x)\le 0\) we have, by definition, \(p_{H_m}(x) = x\). So assume that \(H_m(x) > 0\). Consider the portfolios \(x_\lambda = (1-\lambda)x^0 + \lambda x\), \(\lambda\in[0,1]\), and the function
\[
h_x(\alpha,\lambda) = Q_{\mathrm{ref}}(\alpha) - Q_{x_\lambda}(\alpha) = Q_{\mathrm{ref}}(\alpha) - r - \lambda\bigl(Q_x(\alpha) - r\bigr).
\]
For \(\alpha\in A_{\mathrm{ref}}\) such that \(Q_x(\alpha) < Q_{\mathrm{ref}}(\alpha)\) it holds that \(h_x(\alpha,1) > 0\) and \(h_x(\alpha,0) < 0\), and the function \(h_x(\alpha,\cdot)\) is linear and strictly monotone. So the projection corresponds to the largest λ such that \(h_x(\alpha,\lambda)\le 0\) for all \(\alpha\in A_{\mathrm{ref}}\), that is, to
\[
\lambda_x = \min_{\{\alpha\in A_{\mathrm{ref}} :\ Q_x(\alpha) < Q_{\mathrm{ref}}(\alpha)\}} \frac{Q_{\mathrm{ref}}(\alpha) - r}{Q_x(\alpha) - r},
\]
which completes the proof.

The proposition shows that, under the stated conditions, the feasible set \(\{x\in X : H_m(x)\le 0\}\) is star-shaped with respect to the feasible point \(x^0\) that represents a risk-free portfolio.

If \(Q_x(\alpha)\), \(\alpha\in A_{\mathrm{ref}}\), are continuous functions of x, then, under the conditions of the preceding proposition, \(\lambda_x\) is continuous. Hence, the projection mapping \(p_m(x) = \lambda_x x + (1-\lambda_x)x^0\) is also continuous. It follows from [Citation1, Theorem 4.4] that problems (Equation7)–(Equation8) and (Equation17) are equivalent.
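The closed-form projection of Proposition 4.3 reduces to a minimum over the jump points of the reference CDF. A sketch (the dictionaries mapping α to quantile values are our illustrative data layout, not the paper's):

```python
def lambda_projection(q_x, q_ref, r):
    # Proposition 4.3 (assumes Q_ref(1) < r): the largest lam in [0,1] with
    # Q_ref(a) <= lam*Q_x(a) + (1-lam)*r at all jump points a of the reference CDF
    violated = [a for a in q_ref if q_x[a] < q_ref[a]]
    if not violated:
        return 1.0                      # H_m(x) <= 0: x is already feasible
    return min((q_ref[a] - r) / (q_x[a] - r) for a in violated)

# illustrative quantile values at the jump points A_ref = {0.2, 0.6}, r = 0.05
q_ref = {0.2: 0.01, 0.6: 0.03}
q_x = {0.2: 0.00, 0.6: 0.04}
lam = lambda_projection(q_x, q_ref, 0.05)
```

Here λ_x = 0.8, and one checks that the mixed quantile λ Q_x(α) + (1−λ) r meets the reference bound with equality at the binding level α = 0.2.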

5. Numerical optimization of discontinuous penalty functions

In this section, we consider a numerical method for the optimization of the generally discontinuous functions \(G_m(x)\), \(H_m(x)\) and \(F_m(x)\) in (Equation11)–(Equation17). The idea consists in sequential approximation of the original function by smooth (averaged) ones and optimizing the latter by stochastic optimization methods. For this we develop stochastic finite-difference estimates of the gradients of the smoothed functions. Although the successive smoothing method has certain global optimization abilities (as discussed in [Citation42]), to strengthen this property we embed it as a local optimizer into a branch and bound scheme.

The problem setting satisfies the following conditions, which justify the tools employed to numerically solve the problem.

  1. The original optimization problems are correctly set, i.e. they have feasible and optimal solutions. Lemmas 3.1, 3.3 and 3.4 ensure this assumption.

  2. The original constrained problems are transformed into unconstrained ones, and this transformation is assumed to be exact. The exact discontinuous and exact projective penalty methods discussed in Section 4 and in [Citation1] provide this transformation.

  3. Penalty functions are approximated by a sequence of smoothed functions, and this sequence epi-converges to the target penalty function. The so-called strongly lower semicontinuous functions (see Definition 5.1 and Theorem 5.4 in Subsection 5.1 below) ensure this property.

  4. The level sets of the penalty and the smoothed functions are uniformly bounded. This property is provided by the penalty term \(\|x-\pi_X(x)\|\) in (Equation16) and (Equation17), related to the projection on the convex set X.

  5. For the unconstrained minimization of the smoothed functions, we apply a stochastic finite-difference gradient method in combination with a branch and bound method, which are assumed to approximately find their global minima. Both applied methods are heuristic; their structure and convergence properties are discussed in Subsection 5.2 below.

  6. Convergence of the approximate global minima to global minima of the penalty function (on a subset where the penalty function values can be approached through its continuity points) is guaranteed by properties of epi-convergence, cf. [Citation51, Theorem 7.33].

5.1. Averaged functions

We limit the consideration to the case of the so-called strongly lower semicontinuous functions.

Definition 5.1

Strongly lower semicontinuous functions, cf. [Citation43]

A function \(F:\mathbb{R}^n\to\mathbb{R}\) is called lower semicontinuous (lsc) at a point x if \(\liminf_{k\to\infty} F(x^k) \ge F(x)\) for all sequences \(x^k\to x\).

A function \(F:\mathbb{R}^n\to\mathbb{R}\) is called strongly lower semicontinuous (strongly lsc) at a point x if it is lower semicontinuous at x and there exists a sequence \(x^k\to x\) such that F is continuous at each \(x^k\) and \(F(x^k)\to F(x)\). A function F is called strongly lower semicontinuous (strongly lsc) on \(X\subseteq\mathbb{R}^n\) if this is the case for all \(x\in X\).

The property of strong lower semi-continuity is preserved under continuous transformations. Some further properties of these functions are discussed in [Citation52].

If the function \(F:\mathbb{R}^n\to\mathbb{R}\) is continuous almost everywhere (i.e. it is discontinuous on some set of measure zero), then its (lower) regularization
\[
\hat F(x) \equiv \liminf\{F(x^k) : x^k\to x,\ F \text{ is continuous at } x^k\} \tag{18}
\]
is strongly lower semicontinuous and coincides with F almost everywhere.

The averaged functions obtained from the original non-smooth or discontinuous function by convolution with some kernel have smoother characteristics. For this reason, they are often used in optimization theory (see [Citation43,Citation53] and references therein).

Definition 5.2

The set (family) of bounded and integrable functions \(\{\psi_\theta : \mathbb{R}^n\to\mathbb{R}_+,\ \theta\in\mathbb{R}_+\}\) satisfying for any \(\epsilon>0\) the condition
\[
\lim_{\theta\downarrow 0} \int_{\epsilon B} \psi_\theta(z)\,dz = 1, \qquad B \equiv \{x\in\mathbb{R}^n : \|x\|\le 1\},
\]
is called a family of mollifiers. The kernels \(\{\psi_\theta\}\) are said to be smooth if the functions \(\psi_\theta(\cdot)\) are continuously differentiable.

A function \(F:\mathbb{R}^n\to\mathbb{R}^1\) is called bounded at infinity if there are positive numbers C and r such that \(|F(x)|\le C\) for all x with \(\|x\|\ge r\).

Given a locally integrable, bounded at infinity function \(F:\mathbb{R}^n\to\mathbb{R}^1\) and a family of smoothing kernels \(\{\psi_\theta\}\), the associated family of averaged functions \(\{F_\theta : \theta\in\mathbb{R}_+\}\) is
\[
F_\theta(x) \equiv \int_{\mathbb{R}^n} F(x+z)\psi_\theta(z)\,dz = \int_{\mathbb{R}^n} F(z)\psi_\theta(z-x)\,dz. \tag{19}
\]
Smoothing kernels can have unbounded support \(\operatorname{supp}\psi_\theta = \{x : \psi_\theta(x) > 0\}\). To ensure the existence of the integrals (Equation19), we assume that the function F is bounded at infinity. We can always assume this property if we are interested in the behaviour of F within some bounded area. If \(\operatorname{supp}\psi_\theta \to \{0\}\) as \(\theta\to 0\), then this assumption is superfluous.

For example, a family of kernels can be constructed as follows. Let ψ be some probability density function with bounded support \(\operatorname{supp}\psi\), and let \(\{\theta_\nu : \nu=1,2,\dots\}\) be a positive numerical sequence tending to 0 as \(\nu\to\infty\). Then the smoothing kernels on \(\mathbb{R}^n\) can be taken as
\[
\psi_{\theta_\nu}(z) \equiv \frac{1}{(\theta_\nu)^n}\,\psi(z/\theta_\nu),
\]
where \(\theta_\nu\) is a bandwidth.

If the function F is not continuous, then we cannot expect the averaged functions \(F_\theta\) to converge to F uniformly. But we do not need that: we need a convergence of the averaged functions \(F_\theta\) to F that guarantees the convergence of the minima of \(F_\theta\) to the minima of F. This property is guaranteed by the so-called epi-convergence of functions.

Definition 5.3

Epi-convergence, cf. Rockafellar and Wets [Citation54]

A sequence of functions \(\{F_\nu : \mathbb{R}^n\to\overline{\mathbb{R}},\ \nu\in\mathbb{N}\}\) epi-converges to a function \(F:\mathbb{R}^n\to\overline{\mathbb{R}}\) at a point x iff

  1. \(\liminf_{\nu} F_\nu(x^\nu) \ge F(x)\) for all sequences \(x^\nu\to x\), and

  2. \(\lim_{\nu} F_\nu(x^\nu) = F(x)\) for some sequence \(x^\nu\to x\).

The sequence {Fν} epi-converges to F, if this is the case at every point xRn.

Theorem 5.4

Epi-convergence of averaged functions, cf. [Citation43], Theorem 3.7 and [Citation51], Example 7.19

For a strongly lower semicontinuous, locally integrable function \(F:\mathbb{R}^n\to\mathbb{R}^1\), any associated sequence of averaged functions \(\{F_\nu = F_{\theta_\nu} : \theta_\nu\in\mathbb{R}_+\}\) epi-converges to F as \(\theta_\nu\to 0\).

Note that in the optimization problem without constraints it follows that \(\lim_{\nu}(\inf_x F_\nu) = \inf_x F\), cf. [Citation51, Theorem 7.33].

We remark that for an almost everywhere continuous function F, the corresponding averaged functions \(F_{\theta_\nu}\) epi-converge to its regularization \(\hat F\) defined in (Equation18).

To optimize discontinuous functions, we approximate them with averaged functions. The convolution of a discontinuous function with the corresponding kernel (probability density) improves analytical properties of the resulting function, but increases the computational complexity of the problem, since it transforms the deterministic function into an expectation, which is a multidimensional integral. This transformation of the problem, which involves probability measures, naturally links to well-studied and well-established methods in stochastic optimization.

We can consider smoothed functions obtained by employing a differentiable kernel with unbounded support, for example the Gaussian kernel given by the probability density \(\psi(y) = (2\pi)^{-n/2} e^{-\|y\|^2/2}\). Consider the family
\[
F_\theta(x) = \int_{\mathbb{R}^n} F(x+\theta y)\psi(y)\,dy = \frac{1}{\theta^n}\int_{\mathbb{R}^n} F(z)\psi\Bigl(\frac{z-x}{\theta}\Bigr)dz, \qquad \theta>0,
\]
of averaged functions. Suppose that F is globally bounded (one may even allow \(|F(x)|\le \gamma_1 + \gamma_2\|x\|^{\gamma_3}\) with some non-negative constants \(\gamma_1\), \(\gamma_2\) and \(\gamma_3\)). Then for a strongly lsc function F, the averaged functions \(F_\theta\) epi-converge to F as \(\theta\to 0\), and each function \(F_\theta\) is analytical with gradient (cf. Stein's lemma)
\[
\nabla F_\theta(x) = \frac{1}{\theta^{n+2}}\int_{\mathbb{R}^n} F(z)\psi\Bigl(\frac{z-x}{\theta}\Bigr)(z-x)\,dz
= \frac{1}{\theta}\int_{\mathbb{R}^n} F(x+\theta y)\psi(y)\,y\,dy
= -\frac{1}{\theta}\int_{\mathbb{R}^n} F(x-\theta y)\psi(y)\,y\,dy
= \frac{1}{\theta}\int_{\mathbb{R}^n} \bigl(F(x+\theta y)-F(x)\bigr)\psi(y)\,y\,dy
= \frac{1}{2\theta}\int_{\mathbb{R}^n} \bigl(F(x+\theta y)-F(x-\theta y)\bigr)\psi(y)\,y\,dy,
\]
or
\[
\nabla F_\theta(x) = \mathbb{E}_\eta \frac{1}{\theta}\bigl(F(x+\theta\eta)-F(x)\bigr)\eta = \mathbb{E}_\eta \frac{1}{2\theta}\bigl(F(x+\theta\eta)-F(x-\theta\eta)\bigr)\eta, \tag{20}
\]
where the random vector η follows the standard normal distribution and \(\mathbb{E}_\eta\) is the mathematical expectation over η.

It follows that the random vector
\[
\xi_\theta(x,\eta) = \frac{\eta}{2\theta}\bigl(F(x+\theta\eta) - F(x-\theta\eta)\bigr) \tag{21}
\]
with a standard Gaussian random vector η is an unbiased estimate of the gradient \(\nabla F_\theta(x)\).
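Formula (Equation21) is straightforward to implement and test: averaging the estimator over a batch should recover \(\nabla F_\theta\), which for a linear F equals the true gradient for every θ. A sketch (batch size and test function are our illustrative choices):

```python
import numpy as np

def smoothed_grad(F, x, theta, m=4000, rng=None):
    # Monte-Carlo estimate of grad F_theta(x) using the central form of (21):
    # xi_theta(x, eta) = eta/(2*theta) * (F(x + theta*eta) - F(x - theta*eta))
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x, float)
    g = np.zeros_like(x)
    for _ in range(m):
        eta = rng.standard_normal(x.size)
        g += (F(x + theta * eta) - F(x - theta * eta)) / (2.0 * theta) * eta
    return g / m
```

For a linear F(x) = a·x the smoothed gradient is a for every θ, so the estimate reproduces a up to Monte-Carlo noise of order \(\|a\|\sqrt{n/m}\).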

5.2. Stochastic methods for minimization of discontinuous penalty functions

Consider a problem of constrained minimization of a generally discontinuous function subject to a box or other convex constraints. The target problems are (Equation11)–(Equation17).

Such problems can be solved, e.g. by collective random search algorithms. In this section, we develop stochastic quasi-gradient algorithms to solve these problems.

A problem of constrained optimization can be reduced to the problem of unconstrained optimization of a coercive function F(x) by using non-smooth or discontinuous penalty functions as described in [Citation1,Citation42,Citation50] (for the case of the present paper see Section 4).

Suppose the function F(x) is strongly lower semicontinuous. In view of Theorem 5.4, it is always possible to construct a sequence of smoothed averaged functions Fθν that epi-converges to F. Due to this property, the global minima of Fθν converge to the global minima of F as θν0. Convergence of local minima was studied in [Citation43].

Let us consider some procedures for optimizing the function F using approximating averaged functions Fθν.

Suppose one can find the global minima \(\{x^\nu\}\) of the functions \(F_{\theta_\nu}\), ν=0,1,…. Then any limit point of the sequence \(\{x^\nu\}\) is a global minimum of the function F. However, finding the global minima of \(F_{\theta_\nu}\) can be quite a difficult task, so we consider the following method.

The successive stochastic smoothing method, cf. [Citation42]

The method sequentially minimizes a sequence of smoothed functions \(F_{\theta_\nu}\) with decreasing smoothing parameter \(\theta_\nu\to 0\). The sequence of approximations \(x^\nu\) is constructed by implementing the following steps (cf. [Citation42]).

The successive smoothing method as a local optimizer.

  1. Initialization. Fix a box [lb, ub] with lower (lb) and upper (ub) bounds on the problem variables; select a number N of (decreasing) smoothing steps, a decreasing rate \(\alpha\in[1/2,1)\), and a number K of iterations of the Nemirovski-Yudin method (NYM); select a starting (maximal) smoothing parameter \(\theta_1\), e.g. \(\theta_1 = \|ub - lb\|\). Select a (random) starting point \(x^1\in(lb,ub)\). Set ν=1, the initial smoothing iteration count.

  2. Smoothing iterations. For a fixed smoothing parameter \(\theta_\nu\), \(\nu\ge 1\), minimize the smoothed function \(F_{\theta_\nu}\) by some stochastic optimization method, starting from the point \(x^\nu\) and using the finite-difference stochastic gradients (Equation21), to find the next approximation \(x^{\nu+1}\). For example, this can be done by the following stochastic optimization algorithm.

    1. Initialization of the Nemirovski-Yudin method (NYM). Fix the sample size \(m\ge 1\) for the gradient estimation. Set k=1, \(y^1 = z^1 = x^\nu\).

    2. Batch estimation of the gradient \(\nabla F_{\theta_\nu}(y^k)\). Sample independent, normally distributed directions \(\eta_1,\dots,\eta_m\in\mathbb{R}^n\) and calculate the current estimate of \(\nabla F_{\theta_\nu}(y^k)\) by employing (Equation21) as
\[
\xi_{\theta_\nu}(y^k) = \frac{1}{m}\sum_{i=1}^m \frac{1}{2\theta_\nu}\bigl(F(y^k+\theta_\nu\eta_i) - F(y^k-\theta_\nu\eta_i)\bigr)\eta_i.
\]

    3. Iterations of NYM:
\[
y^{k+1} = \Pi_{[lb,ub]}\Bigl(y^k - \rho_k \frac{\xi_{\theta_\nu}(y^k)}{\|\xi_{\theta_\nu}(y^k)\| + \epsilon}\Bigr), \qquad \rho_k = \theta_\nu/k, \quad \epsilon > 0,
\]
\[
z^{k+1} = \Bigl(1 - \frac{\rho_k}{\sum_{i=1}^k \rho_i}\Bigr) z^k + \frac{\rho_k}{\sum_{i=1}^k \rho_i}\, y^{k+1},
\]
where \(\Pi_{[lb,ub]}(\cdot)\) is the projection operator onto the box [lb, ub].

    4. Stopping of NYM. Increase k by 1. If k < K, then go to step 2 (batch estimation); else continue with the Gelfand–Zetlin–Nesterov step below.

  3. Gelfand–Zetlin–Nesterov step. Set \(x^{\nu+1} = z^{k+1} + \lambda(z^{k+1} - z^k)\), \(\lambda\in[0,1)\), e.g. λ = 0.8.

  4. Transition to the next, less smoothed, step (or stop). Increase the smoothing iteration number ν by 1 and decrease the smoothing parameter, \(\theta_{\nu+1} = \alpha\theta_\nu\). If ν < N, go to step 2; else return \(x^\nu\) as an approximate solution of the optimization problem.
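The steps above can be condensed into a short sketch. This is a simplified variant, not the authors' implementation: the step rule \(\rho_k=\theta_\nu/\sqrt{k}\), the parameter values and the smooth test function below are our choices for illustration.

```python
import numpy as np

def successive_smoothing(F, lb, ub, N=8, alpha=0.5, K=30, m=20,
                         lam=0.8, eps=1e-12, seed=0):
    # Sketch of the successive smoothing local optimizer: the outer loop
    # shrinks theta by the factor alpha; the inner loop is a projected,
    # normalized stochastic-gradient run (NYM-like) with iterate averaging.
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub)                  # random starting point
    theta = float(np.max(ub - lb))           # starting smoothing parameter
    for _ in range(N):
        y = x.copy()
        z, z_prev = x.copy(), x.copy()
        rho_sum = 0.0
        for k in range(1, K + 1):
            # batch finite-difference gradient estimate, cf. (21)
            g = np.zeros_like(y)
            for _ in range(m):
                eta = rng.standard_normal(y.size)
                g += (F(y + theta * eta) - F(y - theta * eta)) / (2 * theta) * eta
            g /= m
            rho = theta / np.sqrt(k)         # step rule (our choice)
            y = np.clip(y - rho * g / (np.linalg.norm(g) + eps), lb, ub)
            rho_sum += rho
            z_prev, z = z, (1 - rho / rho_sum) * z + (rho / rho_sum) * y
        x = np.clip(z + lam * (z - z_prev), lb, ub)  # extrapolation step
        theta *= alpha
    return x
```

On a smooth convex test function the sketch settles near the minimizer as θ shrinks; the early, heavily smoothed passes move the iterate into the right region and the later passes refine it.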

The successive smoothing method described above starts with a smoother function that disregards the fine structure of the objective, in order to find the most promising regions of the search space, before gradually making the approximation more exact. The successive smoothing with gradually vanishing degree of smoothing distinguishes our method from other smoothing methods, which use a fixed smoothing for the minimization of non-smooth functions. In our method we allocate a fixed number K of iterations to minimize each smoothed function \(F_{\theta_\nu}(\cdot)\). For the particular case K=1 (and λ=0), the convergence rate of the successive smoothing method on the class of Lipschitz non-smooth convex functions was studied in [Citation55]. Roughly speaking, the rate of convergence (in function value) of the method is proportional to the range of the function on the feasible set multiplied by the factor \(n/(mN)\), where n is the dimension of the space and m is the number of finite differences used for the estimation of gradients. The method stops after N smoothing iterations. Another practical stopping rule is to stop the computations if there is no progress of the algorithm after several iterations.

For the minimization of the smoothed function \(F_\theta\) under fixed \(\theta=\theta_\nu\), one can apply any stochastic finite-difference optimization method based on the finite-difference representations (Equation20) and (Equation21) of the gradients of the smoothed function, not necessarily the NY mirror descent [Citation56]; e.g. a stochastic version of Nesterov's method [Citation57] can be used. In the described NY algorithm, the local optimization of \(F_\theta\) stops after K iterations.

One more possible stopping rule is based on estimating the gradients \(\nabla F_\theta(x)\) during the iterative optimization process. In general, this is a rather complicated and time-consuming procedure that requires the calculation of multidimensional integrals. However, asymptotically consistent estimates can be constructed in parallel with the construction of the main minimization sequence by the following so-called averaging procedure [Citation45,Citation58,Citation59].

Consider the stochastic optimization procedure
\[
x^{k+1} = x^k - \rho_k z^k, \qquad z^0 = \xi_\theta(x^0,\eta^0), \quad x^0\in\mathbb{R}^n,
\]
\[
z^{k+1} = z^k - \lambda_k\bigl(z^k - \xi_\theta(x^k,\eta^k)\bigr), \qquad k=0,1,\dots,
\]
for the iterative optimization of a function \(F_\theta(x)\) and the parallel evaluation of its gradients \(\nabla F_\theta(x)\), where the vectors \(\xi_\theta(x^k,\eta^k)\) are given by (Equation21). For the conditional expectations it holds that \(\mathbb{E}(\xi_\theta(x^k,\eta^k)\mid x^k) = \nabla F_\theta(x^k)\). Let the numbers \(\rho_k\), \(\lambda_k\) satisfy the conditions
\[
0\le\lambda_k\le 1, \quad \lim_{k\to\infty}\lambda_k = 0, \quad \sum_{k=0}^\infty \lambda_k = +\infty, \quad \sum_{k=0}^\infty \lambda_k^2 < +\infty, \quad \lim_{k\to\infty}\rho_k/\lambda_k = 0.
\]
Then, with probability one, it holds (cf. Ermoliev [Citation58, Theorem V.8]) that
\[
\|z^k - \nabla F_\theta(x^k)\| \to 0 \quad \text{as } k\to\infty.
\]
If \(\|\nabla F_{\theta_\nu}(x^\nu)\| \le \epsilon_\nu \to 0\), then, by results of [Citation43,Citation45,Citation60], the constructed sequence \(x^\nu\) asymptotically converges to the set of points satisfying necessary optimality conditions for F. Some other stopping rules for stochastic gradient methods are discussed, e.g. in [Citation61].
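The averaging procedure can be sketched and checked on a quadratic test function, for which \(\nabla F_\theta = \nabla F\) exactly, so the tracking error \(\|z^k - \nabla F_\theta(x^k)\|\) is directly observable. The step-size sequences below are our illustrative choices satisfying the stated conditions:

```python
import numpy as np

rng = np.random.default_rng(1)
c = np.array([0.5, -1.0])
F = lambda v: float(np.sum((v - c) ** 2))    # for this F: grad F_theta(x) = 2(x - c)
theta = 0.1

def xi(x):
    # the stochastic gradient estimate (21)
    eta = rng.standard_normal(2)
    return (F(x + theta * eta) - F(x - theta * eta)) / (2 * theta) * eta

x = np.array([2.0, 2.0])
z = xi(x)                                    # z^0 = xi_theta(x^0, eta^0)
for k in range(1, 20001):
    lam_k = k ** -0.7                        # sum lam = inf, sum lam^2 < inf
    rho_k = 0.01 / k                         # rho_k / lam_k -> 0
    s = xi(x)                                # sample at the current point x^k
    x, z = x - rho_k * z, z - lam_k * (z - s)
```

After the run, z tracks the current gradient 2(x − c) up to a small residual, while x slowly descends; z can therefore drive a gradient-norm stopping test at essentially no extra cost.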

Figure 2 provides a graphical illustration of the performance of the successive smoothing method on problem (Equation14).

Figure 2. Illustration of the smoothing method performance on two-asset (1,2) portfolio selection. Examples of the trajectories of the method for different discontinuous penalties. \(x_{\mathrm{ref}} = (x_1=0.3,\ x_2=0.7)\), δ = 0.05.

Global optimization issues.

The optimization problems under consideration, (Equation5)–(Equation6) and (Equation7)–(Equation8), are challenging: they are non-convex and multi-extremal, discontinuous, disconnected and constrained. So we apply a number of different techniques to tackle and solve them.

To remove discontinuous constraints, we use exact discontinuous penalty functions, and to remove structural portfolio constraints and bound constraints, we apply exact projective penalty functions, cf. [Citation1,Citation42,Citation50].

In the case of multi-extremal problems (as in Figure 3) we employ a version of the branch and bound method [Citation1] with the successive smoothing method as a local minimizer (in [Citation1], the successive quadratic approximation method was used as the local optimizer).

Figure 3. Illustration of the smoothing method performance on two-asset (4,9) portfolio selection. Example of the trajectory of the method for a non-connected feasible set. \(x_{\mathrm{ref}} = (x_4=0.3,\ x_9=0.7)\), δ = 0.05. The green point is the starting point, the red one is the final point.

To solve problem (Equation11)–(Equation17), we apply the following branch and bound (cut) algorithm acting in a box [lb,ub]Rn.

The Branch & Bound algorithm.

Initialization.

Set the initial partition \(P_0 = \{X = [lb,ub]\}\), select a random starting point \(\tilde x^0\in X\) and apply some local optimization algorithm A to the problem under consideration. As a result, we find a better point \(\bar x^0\in X\) such that \(F(\bar x^0) < F(\tilde x^0)\). Set the B&B iteration count k=0. Set tolerances ϵ>0 and δ>0. Set the integer stopping parameter \(p\ge 2\).

B&B iteration.

Suppose at iteration k we have a partition \(P_k = \{X_i : i=1,\dots,N_k\}\) of the set \(X = \cup_{i=1}^{N_k} X_i\) consisting of smaller boxes \(X_i\). For each \(X_i\), there is a known feasible point \(\bar x_i\in X_i\) and the corresponding value \(F(\bar x_i)\); the record value is \(V_k = \min_{1\le i\le N_k} F(\bar x_i)\). Set \(P_{k+1} = \emptyset\).

For each such set \(X_i\in P_k\), choose a random starting point \(\tilde x_i\) and apply some local optimization algorithm (e.g. the successive smoothing one) to the problem \(\min_{x\in X_i} F(x)\) to find a better point \(\bar{\bar x}_i\in X_i\), \(F(\bar{\bar x}_i) < F(\tilde x_i)\).

If the values \(F(\bar x_i)\) and \(F(\bar{\bar x}_i)\) are sufficiently different, say \(F(\bar x_i) - F(\bar{\bar x}_i) \ge \epsilon\), or the points \(\bar{\bar x}_i\) and \(\bar x_i\) are sufficiently distant, \(\|\bar{\bar x}_i - \bar x_i\| \ge \delta\), we subdivide the box \(X_i = X_i'\cup X_i''\) into two subboxes \(X_i'\) and \(X_i''\) so that \(\bar x_i\in X_i'\) and \(\bar{\bar x}_i\in X_i''\). In this case, the partition \(P_{k+1}\) is updated by adding the successors \(X_i'\) and \(X_i''\), i.e. \(P_{k+1} \leftarrow P_{k+1}\cup X_i'\cup X_i''\). Otherwise, if both the values \(F(\bar x_i)\), \(F(\bar{\bar x}_i)\) and the points \(\bar{\bar x}_i\), \(\bar x_i\) are close, the set \(X_i\ni\bar x_i\) goes unchanged to the updated partition, \(P_{k+1} \leftarrow P_{k+1}\cup X_i\).

When all elements \(X_i\in P_k\) are examined in this way, i.e. the new partition \(P_{k+1}\) with elements \(X_i\), \(i=1,\dots,N_{k+1}\), and points \(\bar x_i\in X_i\) has been constructed, we update the achieved record value, \(V_{k+1} = \min_{1\le i\le N_{k+1}} F(\bar x_i)\).

Check for the stop.

If there is no progress of the B&B method during p B&B iterations, i.e. \(V_{k+1} = V_{k+1-p}\), then stop; otherwise, repeat the B&B iteration.

Remark 5.5

The described B&B aims at subdividing a non-convex multiextremal problem into sub-problems, each containing only one local minimum. To prevent the method from stopping early, two mechanisms are used. First, the stopping parameter \(p\ge 2\) is introduced: the method stops only after p unproductive B&B iterations. In all numerical experiments, p=5 was sufficient to find the global minimum. Second, for each new run of the local minimizer, a random starting point is used, which increases the probability of finding a new local minimum of the sub-problem.

Let us emphasize that as the local optimization algorithm within the B&B scheme, one can apply any reasonable (even heuristic) algorithm that allows improving the current solution. In particular, in the numerical experiments we also used well-implemented off-the-shelf optimization algorithms, e.g. a sequential quadratic approximation algorithm, which formally is not applicable to the considered problem, but which quickly finds local optima and thus considerably speeds up the overall optimization procedure.

The function values \(F(\bar x_i)\) of the objective provide upper bounds for the optimal values \(F_i^* = \min_{x\in X_i} F(x)\). If lower bounds \(L_i\le F_i^*\) were known, then the subsets \(X_i\in P_k\) such that \(L_i\ge V_k\) could be safely ignored, i.e. excluded from the current partition \(P_k\). Heuristically, if some set \(X_i\) remains unchanged during several (say p) B&B iterations, it can be ignored or rarely examined in future iterations. Further results on the B&B algorithm described above are available in [Citation1].
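A compressed sketch of the B&B scheme, with a crude random local search standing in for the smoothing local optimizer; the splitting rule, tolerances and the multi-extremal test function are illustrative choices, not the paper's implementation:

```python
import numpy as np

def local_opt(F, lo, hi, rng, iters=200):
    # crude random local search in a box; a stand-in for the smoothing method
    x = rng.uniform(lo, hi)
    fx = F(x)
    step = (hi - lo) / 5.0
    for _ in range(iters):
        y = np.clip(x + rng.normal(0.0, step), lo, hi)
        fy = F(y)
        if fy < fx:
            x, fx = y, fy
        step = step * 0.98
    return x, fx

def branch_and_bound(F, lb, ub, eps=1e-3, delta=5e-2, p=3, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    parts = [((lb, ub), local_opt(F, lb, ub, rng))]
    record, stall = parts[0][1][1], 0
    while stall < p:
        nxt = []
        for (lo, hi), (x_old, f_old) in parts:
            x_new, f_new = local_opt(F, lo, hi, rng)
            if f_old - f_new >= eps or np.linalg.norm(x_new - x_old) >= delta:
                j = int(np.argmax(hi - lo))          # split the widest side
                mid = 0.5 * (lo[j] + hi[j])
                h2, l2 = hi.copy(), lo.copy()
                h2[j], l2[j] = mid, mid
                for lo2, hi2 in ((lo, h2), (l2, hi)):
                    pts = [(x, f) for (x, f) in ((x_old, f_old), (x_new, f_new))
                           if lo2[j] <= x[j] <= hi2[j]]
                    best = min(pts, key=lambda t: t[1]) if pts \
                        else local_opt(F, lo2, hi2, rng)
                    nxt.append(((lo2, hi2), best))
            else:
                nxt.append(((lo, hi), min([(x_old, f_old), (x_new, f_new)],
                                          key=lambda t: t[1])))
        parts = nxt
        new_record = min(f for _, (_, f) in parts)
        stall = 0 if record - new_record > eps else stall + 1
        record = min(record, new_record)
    return min((xf for _, xf in parts), key=lambda t: t[1])
```

On a one-dimensional multi-extremal test function such as sin(5x) + 0.1x² on [−3, 3], the scheme subdivides the box until each part holds essentially one basin and returns a point in one of the deepest basins.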

6. Numerical illustration

For the numerical illustration of the proposed algorithm we return to portfolio optimization under 1st-order stochastic dominance constraints. We use a small data set of annual returns of nine US companies from [Citation62, Table 1, page 13], provided in the appendix.

6.1. Testing the successive smoothing method on the discontinuous portfolio optimization problems

First, let us illustrate the proposed approach on a two-dimensional portfolio (the first two columns of the table from the appendix) by solving problem (Equation5)–(Equation6). For this we first fix some initial reference portfolio, \(x_{\mathrm{ref}} = (0.7, 0.3)\), with average return \(\mu_{\mathrm{ref}} = 0.0629\) and corresponding CDF \(F_{\mathrm{ref}}(t)\). Then, we relax the constraint (Equation6) by replacing \(F_{\mathrm{ref}}(t)\) with the shifted function \(F_{\mathrm{ref}}(t-\delta)\), where δ = 0.05 independently of t. With this new reference function, we solve problem (Equation13) with different values of the parameter c ∈ {0.0659, 1.0659} by the stochastic smoothing method from Subsection 5.2.

Figure 2 presents the results. The pictures illustrate how the method climbs to the global maximum of the discontinuous objective function. In the two presented examples, the method finds better portfolios with average return μ ≈ 0.0640. The right picture also highlights the set of feasible portfolios, because by setting a larger c we decrease the values of the penalized function at the infeasible points. It can be seen that the proposed version of the smoothing method is not very sensitive to discontinuities of the minimized function.

One more example is presented in Figure 3. The reference 2-security portfolio is \(x_{\mathrm{ref}} = (x_4=0.3,\ x_9=0.7)\), δ = 0.05. In this example, the feasible set is not connected. The optimal portfolio consists of the two assets, \(x_{4,9} = (x_4=0.8779,\ x_9=0.1219)\), with expected return \(\mu(x_{4,9}) = 0.1664\). If we extend the portfolio to nine assets, the proposed method obtains the better return \(\mu_{1:9} = 0.1724\) with the optimal portfolio x = (0.0117, 0.0131, 0.0730, 0.2936, 0.4619, 0.0123, 0.0080, 0.0324, 0.0880), \(\sum_i x_i = 0.9941\), within the same bounds on risk, G(x) = 0.

6.2. Lower bounding the risk-return profile and maximizing the tail return

This subsection presents results of the optimization of 3- and 10-component portfolios under 1st-order stochastic dominance constraints. The constraint is given by its (reference) CDF, and as objective functions we employ the average value of the portfolio return (AV), the Value-at-Risk (quantile, V@Rα) and the Average Value-at-Risk (AV@Rα) indicators with levels α = 40% and α = 70%. Results are provided in tabular and graphical form; each set of experiments is specified by the number of portfolio components (3 or 10), the parameter α (40% or 70%), and the exact penalty method applied (discontinuous or projective, from Subsections 4.3 and 4.4).

Each table contains three numerical rows corresponding to three kinds of the objective functions:

  1. the mean,

  2. the V@R, and

  3. the AV@R.

The rows ‘Mean’, ‘V@R’ and ‘AV@R’ show the objective values of the corresponding indicators for the three optimal portfolios. The other columns show the structure of the obtained portfolios. Each table is supplemented by a reference figure containing three graphs displaying the optimal risk profile. The blue broken line in a graph depicts the reference CDF, a (left) bound on the portfolio return, CDF([0.05; 0.05; 0.1; 0.11; 0.125])=[0; 0.2; 0.4; 0.6; 1]. The red (right) broken line shows the CDF of the actual optimal portfolio return. So the lines display the reference and the actual risk profiles of the optimal portfolios.
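The feasibility behind these risk-profile plots is a pointwise CDF comparison: the optimal portfolio's CDF must not exceed the reference CDF anywhere. On return samples this check takes only a few lines (function and variable names are ours, for illustration):

```python
import numpy as np

def fsd_feasible(port_returns, ref_returns, tol=1e-12):
    # 1st-order dominance feasibility on samples: the empirical CDF of the
    # portfolio return must lie below (right of) the reference CDF everywhere
    port = np.asarray(port_returns, float)
    ref = np.asarray(ref_returns, float)
    grid = np.union1d(port, ref)            # all jump points of both CDFs
    cdf = lambda s: np.mean(s[:, None] <= grid[None, :], axis=0)
    return bool(np.all(cdf(port) <= cdf(ref) + tol))
```

Shifting every return up by δ preserves feasibility, mirroring the relaxed reference profile \(F_{\mathrm{ref}}(t-\delta)\) used in the experiments.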

Tables 1 and 2 (and the corresponding Figures 4 and 5) compare the results of solving the portfolio optimization problems by means of the discontinuous and the projective penalty methods, respectively, for α = 40%. As can be seen, Figures 4 and 5 are very similar, which indicates that both penalty methods are applicable and give close results.

Figure 4. Profiles of optimal 3-component portfolios: maximizing the tail returns under the risk-return lower bound. Discontinuous penalties. (1) Optimal average return. (2) Optimal V@R40%. (3) Optimal AV@R40%; cf. Table 1.

Figure 5. Profiles of optimal 3-component portfolios: maximizing the tail returns under the risk-return lower bound. Analytical projective penalties. (1) Optimal average return. (2) Optimal V@R0.4. (3) Optimal AV@R0.4. See Table 2.

Table 1. Optimal 3 component portfolios, discontinuous penalties: (1) Max mean; (2) Max V@R; (3) Max AV@R.

Table 2. Optimal 3 component portfolios, analytical projection: (1) Max mean. (2) Max V@R40%. (3) Max AV@R40%.

The next two tables, Tables 3 and 4 (and the corresponding Figures 6 and 7), show the effect of extending a portfolio with new securities, from 3 to 10 components, for α = 70%. The objective function values in Table 4 are greater than the corresponding values in Table 3. The corresponding Figures 6 and 7 indicate the changes in the risk profiles of the optimal portfolios due to this enlargement. The increase of the objective functions also comes from the risk profiles pressing closer to the reference ones. The pictures also show the influence of the different objective functions on the risk profiles of the optimal portfolios. Finally, Table 5 shows the structure of the optimal 10-component portfolios.

Figure 6. Profiles of optimal 3-component portfolios: maximizing the tail returns under the risk-return lower bound. Analytical projective penalties. (1) Optimal average return. (2) Optimal V@R70%. (3) Optimal AV@R70%. See Table 3.

Figure 7. Profiles of optimal 10-component portfolios: maximizing the tail returns under the risk-return lower bound. Analytical projective penalties. (1) Optimal average return; (2) optimal V@R70%; (3) optimal AV@R70%; cf. Tables 4 and 5.

Table 3. Optimal 3 component portfolios, analytical projection: (1) Max mean. (2) Max V@R. (3) Max AV@R.

Table 4. Optimal 10 component portfolios, analytical projection: (1) Max mean. (2) Max V@R. (3) Max AV@R.

Table 5. The optimal 10 component portfolio.

6.3. Summary of numerical experience

The concrete computational results of the numerical experiments depend on the following settings. We used

  1. three different objective functions: the mean value, the quantile and the average quantile functions at different risk levels;

  2. two kinds of penalty functions, discontinuous and projective ones (the latter is applied when there is an internal risk-free portfolio);

  3. a specific heuristic stochastic branch and bound algorithm with adjustable parameters;

  4. several local optimization algorithms: the Nemirovski-Yudin and Nesterov finite-difference stochastic optimization algorithms, and also a deterministic sequential quadratic programming one;

  5. random starting points and adjustable parameters for local optimization algorithms.


The problems under consideration can have many local optima close to the global optimum, so the combined B&B algorithms may get stuck at different deep local optima and may have different random running times (from several seconds to several minutes). To validate the global optimum, it is necessary to change the parameters of the algorithm and re-run it.

It was observed that the combined algorithm with Nesterov's local optimization method works better than the one combined with Nemirovski-Yudin's method. Besides, the combined heuristic B&B algorithm that uses a well-implemented deterministic sequential quadratic approximation algorithm (sqp from Matlab's Optimization Toolbox) as a local optimizer works much faster (10 times or more) than the stochastic optimization algorithms (seconds against minutes).

7. Conclusions

The paper considers a specific optimization method, which transforms a constrained optimization problem into an unconstrained global optimization problem. The paper illustrates the procedure for an optimization problem with uncountably many constraints. More specifically, the paper considers financial portfolio optimization under 1st-order stochastic dominance constraints.

The designed optimization techniques are aimed at interactively solving the portfolio reshaping problem, i.e. the interactive adaptation of the portfolio risk profile to the portfolio manager's preferences by changing risk bounds and optimizing different risk criteria.

In the literature, similar portfolio optimization problems are mostly considered under 2nd-order stochastic dominance constraints, which lead to convex problems. The few exceptions include [Citation4,Citation5,Citation27,Citation33,Citation37,Citation63]. The 1st-order constraints put lower bounds on the risk profile (CDF) of the optimized portfolio. As the objective function, different aggregated indicators can serve, e.g. the expected value, the Value-at-Risk, or the Average Value-at-Risk. In this setting, we put lower bounds on low returns and try to maximize higher returns.
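For discrete return distributions, the 1st-order dominance constraint F_portfolio(t) <= F_reference(t) for all t can be verified on the finite union of scenario returns, and its maximal violation can feed a penalty term. The sketch below (with illustrative names; it is not the paper's exact penalty construction) computes that violation from empirical CDFs:

```python
import numpy as np

def fsd_violation(portfolio_returns, reference_returns):
    """Maximal violation of the 1st-order stochastic dominance constraint
    F_portfolio(t) <= F_reference(t), evaluated on the finite union of
    scenario returns; 0 means the portfolio dominates the reference."""
    grid = np.union1d(portfolio_returns, reference_returns)
    sp = np.sort(np.asarray(portfolio_returns, dtype=float))
    sr = np.sort(np.asarray(reference_returns, dtype=float))
    # empirical CDFs: fraction of scenarios with return <= t
    F_p = np.searchsorted(sp, grid, side="right") / sp.size
    F_r = np.searchsorted(sr, grid, side="right") / sr.size
    return max(0.0, float(np.max(F_p - F_r)))
```

A penalized objective along these lines reads f(x) + M * fsd_violation(r(x), r_ref) for a sufficiently large M, which is the general shape of the exact penalty approach, although the paper's penalty functions are constructed differently.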

Such constraints make the problem non-convex and hard for numerical treatment. We propose new exact penalty functions to handle the constraints, a new stochastic B&B algorithm, and new stochastic optimization (smoothing) techniques for solving the penalty problems. The approach is numerically and graphically illustrated on small test examples. The advantage of the proposed approach to financial portfolio optimization consists in the additional visual control of the risk profile of the optimal portfolio.

The proposed B&B algorithm is well-suited for effective parallelization (parallel implementation of the B&B iteration) and thus has much room for acceleration and scaling; this may be a topic for future research.

Disclosure statement

This study builds upon publicly available data collected in Table 9. The corresponding data, codes, and tests are available at the first author's ResearchGate web-page. The authors have no conflicts of interest to disclose.

Additional information

Funding

The authors gratefully acknowledge funding by the Volkswagenstiftung (Volkswagen Foundation), the National Research Fund of Ukraine (Project ID 2020.02/0121), and the DFG, German Research Foundation (Project ID 416228727, SFB 1410).

References

  • Norkin VI. The projective exact penalty method for general constrained optimization. Preprint. Kyiv: V. M. Glushkov Institute of Cybernetics; 2022.
  • Dentcheva D, Ruszczyński A. Optimization with stochastic dominance constraints. SIAM J Optim. 2003;14:548–566. doi: 10.1137/s1052623402420528
  • Dentcheva D, Ruszczyński A. Portfolio optimization with stochastic dominance constraints. J Bank Financ. 2006;30:433–451. doi: 10.1016/j.jbankfin.2005.04.024
  • Noyan N, Rudolf G, Ruszczyński A. Relaxations of linear programming problems with first order stochastic dominance constraints. Oper Res Lett. 2006;34:653–659. doi: 10.1016/j.orl.2005.10.004
  • Noyan N, Ruszczyński A. Valid inequalities and restrictions for stochastic programming problems with first order stochastic dominance constraints. Math Program. 2008;114:249–275. doi: 10.1007/s10107-007-0100-1
  • Shapiro A, Dentcheva D, Ruszczyński A. Lectures on stochastic programming. 3rd ed. SIAM; 2021. (MOS-SIAM Series on Optimization). doi: 10.1137/1.9781611976595
  • Gutjahr WJ, Pichler A. Stochastic multi-objective optimization: a survey on non-scalarizing methods. Ann Oper Res. 2013;236(2):1–25. doi: 10.1007/s10479-013-1369-5
  • Markowitz HM. Portfolio selection. J Finance. 1952;7(1):77–91. doi: 10.2307/2975974
  • Roy AD. Safety first and the holding of assets. Econometrica. 1952;20:431–449. doi: 10.2307/1907413
  • Artzner P, Delbaen F, Eber J-M, et al. Coherent measures of risk. Math Financ. 1999;9:203–228. doi: 10.1111/mafi.1999.9.issue-3
  • Benati S, Rizzi R. A mixed integer linear programming formulation of the optimal mean/value-at-risk portfolio problem. Eur J Oper Res. 2007;176:423–434. doi: 10.1016/j.ejor.2005.07.020
  • Gaivoronski AA, Pflug GC. Finding optimal portfolios with constraints on value-at-risk. In: Green B, editor. Proceedings of the Third International Stockholm Seminar on Risk Behaviour and Risk Management. Stockholm University; 1999.
  • Gaivoronski AA, Pflug GC. Value at risk in portfolio optimization: properties and computational approach. J Risk. 2005;7:1–31. doi: 10.21314/JOR.2005.106
  • Kataoka S. A stochastic programming model. Econometrica. 1963;31:181–196. doi: 10.2307/1910956
  • Kibzun AI, Kan YS. Stochastic programming problems with probability and quantile functions. Chichester: John Wiley & Sons; 1996.
  • Kibzun AI, Naumov AV, Norkin VI. On reducing a quantile optimization problem with discrete distribution to a mixed integer programming problem. Autom Remote Control. 2013;74:951–967. doi: 10.1134/S0005117913060064
  • Kirilyuk V. Risk measures in stochastic programming and robust optimization problems. Cybern Syst Anal. 2015;51:874–885. doi: 10.1007/s10559-015-9780-3
  • Luedtke J, Ahmed S, Nemhauser G. An integer programming approach for linear programs with probabilistic constraints. Math Program. 2010;122:247–272. doi: 10.1007/s10107-008-0247-4
  • Norkin VI, Boyko SV. Safety-first portfolio selection. Cybern Syst Anal. 2012;48:180–191. doi: 10.1007/s10559-012-9396-9
  • Norkin VI, Kibzun AI, Naumov AV. Reducing two-stage probabilistic optimization problems with discrete distribution of random data to mixed-integer programming problems. Cybern Syst Anal. 2014;50:679–692. doi: 10.1007/s10559-014-9658-9
  • Pflug GC, Römisch W. Modeling, measuring and managing risk. River Edge (NJ): World Scientific; 2007. doi: 10.1142/9789812708724
  • Prekopa A. Stochastic programming. Dordrecht: Kluwer Academic Publishers; 1995.
  • Sen S. Relaxation for probabilistically constrained programs with discrete random variables. Oper Res Lett. 1992;11:81–86. doi: 10.1016/0167-6377(92)90037-4
  • Telser LG. Safety first and hedging. Rev Econ Stud. 1955/56;23:1–16. doi: 10.2307/2296146
  • Wozabal D, Hochreiter R, Pflug GC. A D. C. formulation of value-at-risk constrained optimization. Optimization. 2010;59:377–400. doi: 10.1080/02331931003700731
  • Rockafellar RT, Uryasev S. Optimization of conditional value-at-risk. J Risk. 2000;2(3):21–41. doi: 10.21314/JOR.2000.038
  • Dentcheva D, Ruszczyński A. Risk preferences on the space of quantile functions. Math Program Ser B. 2013;148:181–200. doi: 10.1007/s10107-013-0724-2
  • Müller A, Stoyan D. Comparison methods for stochastic models and risks. Chichester: John Wiley & Sons; 2002.
  • Ogryczak W, Ruszczyński A. From stochastic dominance to mean-risk models: semideviations as risk measures. Eur J Oper Res. 1999;116:33–50. doi: 10.1016/S0377-2217(98)00167-2
  • Ogryczak W, Ruszczyński A. On consistency of stochastic dominance and mean–semideviation models. Math Program Ser B. 2001;89:217–232. doi: 10.1007/s101070000203
  • Ogryczak W, Ruszczyński A. Dual stochastic dominance and related mean-risk models. SIAM J Optim. 2002;13(1):60–78. doi: 10.1137/S1052623400375075
  • Dentcheva D, Ruszczyński A. Common mathematical foundations of expected utility and dual utility theories. SIAM J Optim. 2013;23(1):381–405. doi: 10.1137/120868311
  • Dai H, Xue Y, He N, et al. Learning to optimize with stochastic dominance constraints. In: International Conference on Artificial Intelligence and Statistics. PMLR; 2023. p. 8991–9009.
  • Dentcheva D, Ruszczyński A. Portfolio optimization with risk control by stochastic dominance constraints. In: Infanger G, editor. Stochastic programming. The state of the art. In honor of George B. Dantzig. New York: Springer; 2011. p. 189–212.
  • Fábián CI, Mitra G, Roman D, et al. Portfolio choice models based on second-order stochastic dominance measures: an overview and a computational study. In: Bertocchi M, Consigli G, Dempster MAH, editors. Stochastic optimization methods in finance and energy. New York: Springer; 2011. p. 441–470. (International Series in Operations Research & Management Science). ISBN 978-1-4419-9586-5. doi: 10.1007/978-1-4419-9586-5_18
  • Dentcheva D, Ruszczyński A. Risk preferences on the space of quantile functions. Math Program. 2014;148(1-2):181–200. doi: 10.1007/s10107-013-0724-2
  • Dentcheva D, Ruszczyński A. Semi-infinite probabilistic optimization: first order stochastic dominance constraints. Optimization. 2004;53:583–601. doi: 10.1080/02331930412331327148
  • Dentcheva D, Henrion R, Ruszczyński A. Stability and sensitivity of optimization problems with first order stochastic dominance constraints. SIAM J Optim. 2007;18:322–337. doi: 10.1137/060650118
  • Frydenberg S, Sønsteng Henriksen TE, Pichler A, et al. Can commodities dominate stock and bond portfolios? Ann Oper Res. 2019;282(1-2):155–177. doi: 10.1007/s10479-018-2996-7
  • Batukhtin V. On solving discontinuous extremal problems. J Optim Theory Appl. 1993;77:575–589. doi: 10.1007/BF00940451
  • Knopov P, Norkin V. Stochastic optimization methods for the stochastic storage process control. In: Blondin MJ, et al., editors. Intelligent control and smart energy management. Springer Optimization and Its Applications 181; 2022. p. 79–111. doi: 10.1007/978-3-030-84474-5_3
  • Norkin VI. A stochastic smoothing method for nonsmooth global optimization. Cybernetics and Computer Technologies. 2020:5–14. doi: 10.34229/2707-451X.20.1.1
  • Ermoliev YM, Norkin VI, Wets RJ-B. The minimization of semicontinuous functions: mollifier subgradients. SIAM J Control Optim. 1995;33:149–167. doi: 10.1137/S0363012992238369
  • Mayne D, Polak E. On solving discontinuous extremal problems. J Optim Theory Appl. 1984;43:601–613. doi: 10.1007/BF00935008
  • Mikhalevich VS, Gupal AM, Norkin VI. Methods of nonconvex optimization. Moscow: Nauka; 1987. (In Russian).
  • Nesterov Y, Spokoiny V. Random gradient-free minimization of convex functions. Found Comput Math. 2017;17:527–566. doi: 10.1007/s10208-015-9296-2
  • Norkin V, Pflug GC, Ruszczyński A. A branch and bound method for stochastic global optimization. Math Program. 1998;83:425–450. doi: 10.1007/bf02680569
  • Aubin J-P, Ekeland I. Applied nonlinear analysis. New York: John Wiley and Sons; 1984.
  • Norkin VI. On measuring and profiling catastrophic risks. Cybern Syst Anal. 2006;42(6):839–850. doi: 10.1007/s10559-006-0124-1
  • Galvan G, Sciandrone M, Eucidi S. A parameter-free unconstrained reformulation for nonsmooth problems with convex constraints. Comput Optim Appl. 2021;80:33–53. doi: 10.1007/s10589-021-00296-1
  • Rockafellar RT, Wets RJ-B. Variational analysis. Berlin: Springer; 2009 (1st ed. 1998, 3rd printing). (Grundlehren der mathematischen Wissenschaften). ISBN 9783540627722. doi: 10.1007/978-3-642-02431-3
  • Ermoliev Y, Norkin V. On constrained discontinuous optimization. In: Stochastic Programming Methods and Technical Applications: Proceedings of the 3rd GAMM/IFIP-Workshop on ‘Stochastic Optimization: Numerical Methods and Technical Applications’ held at the Federal Armed Forces University Munich, Neubiberg/München, Germany, June 17–20, 1996, Springer: Berlin, Heidelberg; 1998; p. 128–144.
  • Gupal AM, Norkin VI. Algorithm for the minimization of discontinuous functions. Cybernetics. 1977;13(2):220–223. doi: 10.1007/BF01073313
  • Rockafellar RT, Wets RJ-B. Variational analysis. Berlin: Springer; 1998. Available at https://books.google.com/books?id=w-NdOE5fD8AC. doi: 10.1007/978-3-642-02431-3
  • Norkin V, Pichler A, Kozyriev A. Constrained global optimization by smoothing. arXiv; 2023. doi: 10.48550/arXiv.2308.08422
  • Nemirovsky A, Yudin D. Problem complexity and method efficiency in optimization. New York: J. Wiley & Sons; 1983.
  • Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Math Dokl. 1983;27(2):372–376.
  • Ermoliev YM. Methods of stochastic programming. Moscow: Nauka; 1976. (In Russian).
  • Gupal AM. Stochastic methods for minimization of nondifferentiable functions. Autom Remote Control. 1979;4(40):529–534.
  • Polyak BT. Introduction to optimization. New York: Optimization Software, Inc. Publications Division; 1987.
  • Pflug GC. Stepsize rules, stopping times, and their implementation in stochastic quasigradient algorithms. In: Ermoliev Y, Wets RJ-B, editors. Numerical techniques for stochastic optimization. Berlin: Springer-Verlag; 1988. p. 353–372.
  • Markowitz HM. Portfolio selection: efficient diversification of investments. New York: John Wiley & Sons, Chapman & Hall; 1959.
  • Luedtke J. New formulations for optimization under stochastic dominance constraints. SIAM J Optim. 2008;19(3):1433–1450. doi: 10.1137/070707956

Appendix

Table A1. Return data set from [Citation62, Table 1, page 13], with artificial bond column.