Optimization
A Journal of Mathematical Programming and Operations Research
Research Article

Portfolio reshaping under 1st-order stochastic dominance constraints by the exact penalty function methods

Received 31 Jan 2023, Accepted 21 Jan 2024, Published online: 09 Feb 2024

Abstract

The paper studies a financial portfolio selection problem under 1st-order stochastic dominance constraints. These constraints impose lower bounds on the return profile of the portfolio. In particular, they allow searching for a better portfolio than some reference portfolio by comparing their cumulative distribution functions. Candidate objective functions are the average return, a value at risk, or the average value at risk. The resulting optimization problems are computationally hard because of possibly non-convex constraints and possibly discontinuous objective functions. In the case of a discrete distribution of the return, we develop numerical procedures to solve the problem. The proposed approach uses new exact penalty functions to tackle the 1st-order stochastic dominance constraints. The resulting penalized objective function is further optimized by the stochastic successive smoothing method as a local optimizer within a branch and bound global optimization scheme. The approach is numerically and graphically illustrated on small portfolio selection problems up to dimension 10.

1. Introduction

We consider the general constrained optimization problem
$$\text{minimize } f(x),\quad x\in X, \tag{1}$$
where the set $X\subset\mathbb{R}^n$ constitutes the set of feasible solutions. The projective exact penalty method (cf. [Citation1]) involves a map $\pi_X\colon\mathbb{R}^n\to X$ such that
$$\pi_X(\mathbb{R}^n)\subset X \quad\text{and}\quad \pi_X(x)=x \ \text{ for every } x\in X. \tag{2}$$
The map $\pi_X$ in (2) is not specified. For $X$ convex, the projection
$$\pi_X(x)=\operatorname*{arg\,min}\{\|y-x\|:\ y\in X\} \tag{3}$$
is well defined and thus is a candidate for the global optimization problem (4) below. However, the projection (3) might not be available at cheap computational cost.

For a star-shaped domain $X$, there exists a point $x_0\in X$ such that every line segment $[x,x_0]:=\{\lambda x_0+(1-\lambda)x:\ \lambda\in[0,1]\}$ is fully contained in $X$, provided that $x\in X$. The function
$$\pi_X(x)=\lambda_x x_0+(1-\lambda_x)x,\quad x\notin X,\quad\text{where } \lambda_x:=\sup\{\lambda:\ \lambda x_0+(1-\lambda)x\notin X\},$$
as well satisfies the conditions (2).

For a single-valued and continuous projection mapping $\pi_X(\cdot)$, the global and local solutions of the constrained problem (1) and of the unconstrained global optimization problem
$$\text{minimize } f(\pi_X(x))+\|x-\pi_X(x)\|,\quad x\in\mathbb{R}^n, \tag{4}$$
coincide. Hence, the global unconstrained optimization problem (4) can be considered instead of the constrained optimization problem (1).

The problem (4) is not necessarily smooth or convex, so solving (4) requires non-smooth local and global optimization methods (cf. [Citation1]).
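To make the transform (4) concrete, here is a minimal numerical sketch for a hypothetical box-constrained set $X$, where the convex projection (3) is available in closed form. The objective `f` is an arbitrary non-convex stand-in, not the portfolio objective of the later sections; all names are ours.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n; for this convex set
    the projection (3) is available in closed form."""
    return np.clip(x, lo, hi)

def f(x):
    # non-convex test objective, a stand-in for the hard objective in (1)
    return np.sum((x - 0.7) ** 2) + 0.3 * np.sin(5.0 * x).sum()

def penalized(x, lo=0.0, hi=1.0):
    """Unconstrained objective (4): f(pi_X(x)) + ||x - pi_X(x)||."""
    p = project_box(x, lo, hi)
    return f(p) + np.linalg.norm(x - p)

# inside the box the two objectives coincide ...
x_in = np.array([0.2, 0.9])
assert abs(penalized(x_in) - f(x_in)) < 1e-12
# ... outside, the penalty adds the distance to the box
x_out = np.array([1.5, -0.4])
assert penalized(x_out) > f(project_box(x_out, 0.0, 1.0))
```

On the box, `penalized` reproduces `f` exactly; outside, every value is dominated by the value at the projected point, which is what makes minimizing (4) equivalent to minimizing (1).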

The present paper outlines the described methodology for a financial portfolio optimization problem under specific constraints, namely 1st-order stochastic dominance (FSD) constraints, where each feasible portfolio stochastically dominates a given reference portfolio. This means that a decision maker with a non-decreasing utility function will prefer any feasible portfolio to the reference one. Such portfolio optimization settings were considered in [Citation2,Citation3] (with 2nd-order stochastic dominance (SSD) constraints) and in [Citation4] (for linear problems with 1st-order stochastic dominance constraints). Problems with FSD constraints are much harder than those with SSD constraints, because the former are non-convex. Noyan and Ruszczyński [Citation5] reduce such problems to linear mixed-integer problems. The present paper develops a different approach to such problems, which is applicable to the nonlinear case as well. Our approach consists in applying an exact penalty method to remove the FSD constraints and solving the resulting penalty problem by non-smooth global optimization methods.

Outline of the paper.

First, Section 2 reviews the literature on decision-making and portfolio optimization under stochastic dominance constraints.

In Section 3, we set the problem of a portfolio optimization under 1st-order stochastic dominance constraints and provide some examples of such problems. Next (cf. Section 4), we reduce the portfolio optimization problem under 1st-order stochastic dominance constraints to the unconstrained problems by means of new exact projective non-smooth and discontinuous penalty functions.

Fourth, we review the successive smoothing method for local optimization of non-smooth and discontinuous functions. This method is used as a local optimizer within the branch and bound framework for the global optimization of the penalty functions. We finally give numerical illustrations (Section 5) of the proposed approach to financial portfolio optimization under 1st-order stochastic dominance constraints on portfolios containing up to 10 components with one risk-free asset.

2. Literature review

The problem of financial portfolio optimization belongs to the class of decision-making problems under uncertainty. The choice of a particular portfolio is accompanied by an uncertain result in the form of a distribution of future returns. This and more general decision problems under stochastic uncertainty are studied in the theory of stochastic programming (cf. [Citation6]). In the general case, the formalization of such problems is carried out using preferences defined on the set of possible uncertain results of decisions made. Preferences establish partial order relationships on the set of decision outcomes, i.e. they satisfy the axioms of reflexivity, transitivity, and antisymmetry. Partial order relations make it possible to narrow the choice of preferable solutions to a subset of non-dominated alternatives. Under additional assumptions about the properties of preferences, the latter can be represented in a numerical form, and then the problem of choosing preferred outcomes turns into a problem of multi-objective optimization. A discussion of stochastic programming problems from this perspective can be found in [Citation7]. Conversely, any numerical function on the set of outcomes specifies a preference relation.

In optimization and financial portfolio management problems, investments are often allocated subject to utility and risk criteria. The original settings of this type were proposed by Markowitz [Citation8] and Roy [Citation9], who used the mean return as a measure of utility and the variance of returns as a measure of risk. An attractive feature of these formulations is the relative simplicity of the resulting optimization problems.

Subsequently, other utility and risk measures were proposed and used, such as quantiles, averaged quantiles, semi-deviations of returns from the average value, probabilities of returns falling into a profit or loss area, general coherent risk measures, and others (the references include [Citation10–25], cf. also the references therein). The corresponding decision selection problems are computationally more complex and require adapting known methods, or even developing new solution methods. For example in [Citation26], the problems of optimizing a financial portfolio in terms of averaged quantiles are reduced to a linear programming problem and can be effectively solved by existing software tools. However, problems that involve quantiles or probabilities are much more difficult because they are non-convex and non-smooth with a possibly non-convex and disconnected admissible region. The problem may be even harder, if the return depends nonlinearly on the portfolio structure (see, e.g. the discussion of the properties of these problems in [Citation12,Citation13]). For example, in the works by Noyan et al. [Citation4], Benati and Rizzi [Citation11], Luedtke et al. [Citation18], Norkin and Boyko [Citation19], Kibzun et al. [Citation16] and Norkin et al. [Citation20], such problems are reduced to problems of mixed-integer programming in the case of a discrete distribution of random data. Gaivoronski and Pflug [Citation13] developed a special method for smoothing a variational series to optimize a portfolio by a quantile criterion. Wozabal et al. [Citation25] give a review of the quantile constrained portfolio selection problem, present a difference-of-convex representation of involved quantiles, and develop a branch and bound algorithm to solve the reformulated problem.

On the other hand, natural relations of stochastic dominance of the first, second and higher orders are known on the set of probability distributions. For example, the first-order stochastic dominance relation holds when one distribution function lies everywhere below (or on) another distribution function. The second-order stochastic dominance relation is determined by comparing the integrals of the distribution functions of the random variables. The second-order stochastic dominance relation can express the decision maker's negative attitude towards risk (see a discussion of these issues in [Citation27,Citation28]).

A natural question arises about the connection between decision-making problems in a multi-criteria formulation and in terms of certain preference relations, in particular, stochastic dominance relations. The connection between mean-risk models and second-order stochastic dominance relations was studied in the works by Ogryczak and Ruszczyński [Citation29–31]. In the works by Dentcheva and Ruszczyński [Citation27,Citation32] it is shown under what conditions the problem of decision-making in terms of preference relations is reduced to the problem of optimizing a numerical indicator.

Dentcheva and Ruszczyński [Citation2,Citation3] proposed a mixed financial portfolio optimization model in which a numerical criterion is optimized, and constraints are specified using second-order stochastic dominance relations. The feasible set in this setting consists of decisions, which dominate some reference one and are preferred by any risk averse decision maker. In the case of a discrete distribution of random data, the problem is reduced to a linear programming problem of (large) dimension. These works have given rise to a large stream of work on stochastic optimization problems under second-order stochastic dominance constraints (see the reviews by Gutjahr and Pichler [Citation7], Dai et al. [Citation33], Dentcheva and Ruszczyński [Citation34] and Fábián et al. [Citation35]).

Noyan et al. [Citation4,Citation5] and Dentcheva and Ruszczyński [Citation36] considered similar mixed problems, but with first-order stochastic dominance relations in the constraints. To solve them, a method of reduction to problems of linear mixed-integer programming with subsequent continuous relaxation of Boolean constraints and the introduction of additional cutting constraints is proposed. The paper [Citation33] develops a quite different approach to solving the problem based on the dual reformulation from [Citation37] by discretization of the space of one-dimensional utilities by step-wise functions, smoothing the latter, and applying the stochastic gradient method for the (local) optimization of the approximate dual function. Dentcheva et al. [Citation38] studied the stability of these problems with respect to perturbation of the involved distributions on the basis of general studies of the stability of stochastic programming problems.

In this article, we consider similar financial portfolio optimization problems under 1st-order stochastic dominance constraints, but from a different point of view, and apply a different solution approach that is also applicable to problems with nonlinear random return functions. This problem is viewed as the problem of optimizing the risk profile of the portfolio according to the preferences of the decision maker. In one statement, the decision maker sets the desired risk profile (the form of the cumulative distribution function) and tries to find an acceptable portfolio that dominates this risk profile. In the other statement, provided that an admissible portfolio exists, i.e. a portfolio dominating some reference portfolio (which can be any index or risk-free portfolio), one chooses a portfolio with the desired risk profile by optimizing one or another function (for example, risk measures such as average quantiles).

In this way it is possible to satisfy the needs of both a risk-seeking and a risk-averse decision maker. In the first case, the average quantile function for high returns is maximized; in the second case, the average quantile function for low returns is maximized, in both cases with a lower bound on the quantile risk profile. In the first case, the risk profile is stretched; in the second, it is compressed and becomes more like the profile of a deterministic value. The reshaping of the risk profile can be achieved both through the selection of a different objective function and by adding new securities, e.g. commodities, to the portfolio, cf. [Citation39].

In the problems under consideration, the objective function is nonlinear, non-convex, and possibly non-smooth or even discontinuous, and the number of constraints is a continuum (uncountable). However, when the reference profile has a step-wise character, one can restrict attention to a finite number of constraints, one per step of the reference profile. In this problem, the admissible region may turn out to be non-convex and disconnected. Thus, the problem under consideration is a global optimization problem with highly complex and nonlinear constraints. To solve it, we first reduce it to an unconstrained global optimization problem by applying new penalty functions, namely, discontinuous penalty functions as in [Citation40,Citation41] and the so-called projective penalty functions as in [Citation1,Citation42]. In the first case, the objective function is extended outside the admissible region by large but finite penalty values; in the second case, it is extended at infeasible points by summing the value of the objective function at the projection of the point onto the feasible set and the distance to this projection. Here, the projection is made in the direction of some known internal feasible point. After such a transformation, the problem is still a complex unconstrained global optimization problem. To solve it, we further apply the method of successive smoothing of the penalty function, i.e. we minimize successive smoothed approximations of the penalized function, starting from relatively large smoothing parameters and gradually decreasing them to zero. It is known that the smoothed functions can be optimized by the method of stochastic gradients, where the latter have the form of finite-difference vectors in random directions, cf. [Citation43–46]. Here, smoothing plays a dual role: firstly, it allows optimizing non-smooth and discontinuous functions and, secondly, it levels out shallow local extrema.
Although smoothing makes it possible to ignore small local extrema, it does not guarantee convergence to the global extremum. Therefore, we embed the sequential smoothing method into a general scheme of the stochastic branch and bound method, where the smoothing method plays the role of a local optimizer on subsets of the optimization domain. The branch and bound scheme is designed so that the computations concentrate in the most promising areas of the search for the global extremum. Our approach is similar to that of Dai et al. [Citation33], but it is applied to the primal problem, uses an exact penalty method instead of the Lagrangian one, and optimizes the penalized function by stochastic finite-difference gradient methods. Besides, we provide global search by a specific stochastic branch and bound algorithm, which is well suited for parallelization.
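The successive smoothing idea described above can be sketched as follows. This is a toy illustration on a simple non-smooth function, not the authors' implementation; the step size, smoothing schedule, and test function are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # non-smooth test function with global minimum at the origin
    return np.abs(x).sum()

def smoothed_min(f, x, sigmas=(1.0, 0.3, 0.1, 0.03), steps=400, lr=0.05):
    """Minimize successive smoothed approximations of f: for each smoothing
    parameter sigma, take stochastic finite-difference gradient steps
    g = (f(x + sigma*u) - f(x)) / sigma * u in random directions u,
    gradually decreasing sigma towards zero."""
    x = np.asarray(x, dtype=float)
    for sigma in sigmas:
        for _ in range(steps):
            u = rng.standard_normal(x.shape)
            g = (f(x + sigma * u) - f(x)) / sigma * u
            x = x - lr * g
    return x

x_star = smoothed_min(f, [2.0, -1.5])
assert f(x_star) < 1.0  # far below the initial value f = 3.5, near the minimum
```

Large `sigma` flattens shallow local structure; small `sigma` recovers accuracy near the minimizer, mirroring the dual role of smoothing described above.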

This article describes the financial portfolio optimization model with 1st-order stochastic dominance constraints and illustrates the proposed solution approach on problems of reshaping the risk profile of portfolios of small dimension. The results of changing the shape of the risk profile are presented in graphical form, which allows visually comparing the resulting profile with the reference one and, if necessary, continuing to adapt the profile to the preferences of the decision maker.

3. Mathematical problem setting

The financial portfolio is described by a vector $x=(x_1,\dots,x_n)^\top$ of asset values $x_i$ and by a random vector of asset returns $\omega=(\omega_1,\dots,\omega_n)^\top\in\Omega$, $i=1,\dots,n$, over some fixed time interval; $(\cdot)^\top$ denotes the transposition of a vector. Denote by
$$X:=\Big\{x\in\mathbb{R}^n:\ \sum_{i=1}^n x_i\le 1,\ x_i\ge c_i\Big\}$$
the set of admissible portfolios with a unit maximal total cost of the whole portfolio; $c_i$ is a lower bound on the value of component $i$ of the portfolio (e.g. a short-selling constraint or a limitation on borrowing assets). In the definition of the set $X$, the inequality $\sum_{i=1}^n x_i\le 1$ is used, which means that $x_0:=1-\sum_{i=1}^n x_i$, the non-invested funds, have zero yield. The portfolio is characterized by a random return $f(x,\omega)=\omega^\top x$, by the mean return $\mu(x)=\mathbb{E}_\omega f(x,\omega)=\sum_{i=1}^n x_i\,\mathbb{E}_\omega\omega_i$ for the considered period of time, and by the variance of return $\sigma^2(x)=\mathbb{E}_\omega\big(f(x,\omega)-\mathbb{E}_\omega f(x,\omega)\big)^2$, where $\mathbb{E}_\omega$ denotes the mathematical expectation with respect to the distribution of the random vector $\omega$.

The classical financial portfolio models assume a linear dependence of the return on the portfolio structure, for which alternative problem reformulations are available. Nonlinearities appear when random returns are modelled by some parametric distribution. Another example of nonlinear portfolio return appears in the dynamic portfolio optimization problem with fixed mix portfolio control strategy, cf. [Citation47].

Suppose the random vector $\omega$ is given by a discrete (e.g. empirical) distribution $\{\omega^1,\dots,\omega^m\}$ with equiprobable values $\omega^i\in\mathbb{R}^n$, $i=1,2,\dots,m$. Then the average portfolio return $\mu(x)=\mathbb{E}\,\omega^\top x$ and the cumulative distribution function (CDF) of the portfolio return $F_x(t)=\Pr\{\omega^\top x\le t\}$ are given by
$$f(x)=\mu_m(x)=\frac1m\sum_{i=1}^m(\omega^i)^\top x \quad\text{and}\quad F_x(t)=F_{x,m}(t)=m^{-1}\,\#\{i:\ (\omega^i)^\top x\le t\}.$$
The portfolio optimization problem under 1st-order stochastic dominance constraints is
$$\text{maximize } f(x) \tag{5}$$
$$\text{subject to } F_x(t)\le F_{\mathrm{ref}}(t)\ \text{ for all } t\in\mathbb{R},\quad x\in X. \tag{6}$$
The objective function $f(x)$ in (5) quantifies the profit, which is to be maximized. As objectives $f(x)$ in the master problem (5), we consider

  • the mean value $\mu(x)$,

  • some Value-at-Risk function $\mathrm{V@R}_\gamma(x)$ at risk level $\gamma\in(0,1)$,

  • the Average Value-at-Risk function
$$\mathrm{AV@R}_{\alpha,\beta}(x):=\frac{1}{\beta-\alpha}\int_\alpha^\beta\mathrm{V@R}_\gamma(x)\,d\gamma,\quad 0\le\alpha<\beta\le 1,$$
in particular $\mathrm{AV@R}_\gamma(x):=\mathrm{AV@R}_{\gamma,1}(x)$ and $\mathrm{AV@R}_{0,1}(x)=\mu(x)$.
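For a discrete scenario distribution, these three candidate objectives can be estimated directly from the scenario matrix. The sketch below assumes the "return quantile" reading of V@R and approximates the AV@R integral by averaging quantiles over a midpoint grid; the function names are ours, and sign conventions for V@R vary across the literature.

```python
import numpy as np

def portfolio_returns(x, scenarios):
    """Equiprobable scenario returns (omega^i)^T x, i = 1..m."""
    return scenarios @ x

def mean_return(x, scenarios):
    return portfolio_returns(x, scenarios).mean()

def var_gamma(x, scenarios, gamma):
    """V@R_gamma read as the gamma-quantile of the return distribution
    (sign conventions vary; this is an illustrative choice)."""
    return np.quantile(portfolio_returns(x, scenarios), gamma)

def avar(x, scenarios, alpha, beta, grid=1000):
    """AV@R_{alpha,beta}(x) = (beta-alpha)^{-1} * int_alpha^beta V@R_gamma dgamma,
    approximated by averaging quantiles at midpoints of a uniform grid."""
    gammas = np.linspace(alpha, beta, grid, endpoint=False) + (beta - alpha) / (2 * grid)
    return np.quantile(portfolio_returns(x, scenarios), gammas).mean()

# tiny illustration with m = 4 scenarios and n = 2 assets
scen = np.array([[0.05, 0.10], [0.02, -0.03], [0.00, 0.07], [-0.01, 0.12]])
x = np.array([0.5, 0.5])
assert np.isclose(var_gamma(x, scen, 0.5), 0.045)
# AV@R_{0,1} approximately recovers the mean, as stated above
assert abs(avar(x, scen, 0.0, 1.0) - mean_return(x, scen)) < 0.01
```

The small mismatch between `avar(x, scen, 0, 1)` and the exact mean comes from the quantile interpolation of the discrete sample.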

The constraints (6) address risk: the cumulative distribution function $F_x(\cdot)$ of the feasible portfolio $x$ must not be worse than the reference function $F_{\mathrm{ref}}(t)$ (also called a reference risk profile) for all $t\in\mathbb{R}$. The reference function itself may be given by $F_{x_{\mathrm{ref}}}(t+\delta(t))$, where $x_{\mathrm{ref}}$ is some reference portfolio with cumulative distribution function $F_{x_{\mathrm{ref}}}(t)=\Pr\{\omega^\top x_{\mathrm{ref}}\le t\}$.

We can formally associate some random variable $\xi_{\mathrm{ref}}$ with the CDF $F_{\mathrm{ref}}(t)$. Then the family of inequalities (6) ensures that the random variable $\xi_x=\omega^\top x$ dominates the random variable $\xi_{\mathrm{ref}}$ in the 1st stochastic order. We further remark that there can be several stochastic dominance constraints with corresponding reference CDFs $F^i_{\mathrm{ref}}$, which can be replaced by the single CDF $F_{\mathrm{ref}}=\min_i F^i_{\mathrm{ref}}$.

The constraints (6) are tight, often too tight to ensure a non-empty set of feasible solutions. The function $\delta(t)\ge 0$ relaxes these tight constraints by accepting higher losses at given probabilities.

The function $F_x(\cdot)$ is non-decreasing and continuous from the right (upper semicontinuous); the functions $F_{\mathrm{ref}}(\cdot)$ and $\delta(\cdot)$ are assumed to be right-continuous.

Lemma 3.1

The feasible set in (6) is either empty or compact.

Proof.

Indeed, define
$$X^<:=\{x\in X:\ F^<_x(t)\le F_{\mathrm{ref}}(t)\ \ \forall t\in\mathbb{R}\} \quad\text{and}\quad X^\le:=\{x\in X:\ F_x(t)\le F_{\mathrm{ref}}(t)\ \ \forall t\in\mathbb{R}\}$$
with right-continuous $F_{\mathrm{ref}}(\cdot)$, $F_x(\cdot)$ and left-continuous $F^<_x(t):=\Pr\{\omega^\top x<t\}$. On one hand, it holds that $X^\le\subset X^<$, since $F^<_x(t)\le F_x(t)$. On the other hand, if $x\in X^<$, then it holds for any $t$ that
$$F_x(t)\le\lim_{\tau\downarrow t}F^<_x(\tau)\le\lim_{\tau\downarrow t}F_{\mathrm{ref}}(\tau)=F_{\mathrm{ref}}(t).$$
So $X^<\subset X^\le$ and, hence, $X^<=X^\le$. In this case, the function $F^<_x(t)$ appears to be lower semicontinuous in $(x,t)$; hence the sets $\{x\in\mathbb{R}^n:\ F^<_x(t)\le F_{\mathrm{ref}}(t)\}$ and $\{x\in X:\ F^<_x(t)\le F_{\mathrm{ref}}(t)\}$ are closed for each $t$. If the feasible set $X^\le=X^<$ in (6) is non-empty, then it is closed and bounded and, hence, compact. This completes the proof.

Lemma 3.2

If the function $F_x(t)$ has only finitely many jumps, e.g. at $T_x=\{(\omega^1)^\top x,\dots,(\omega^m)^\top x\}$, then $\{x\in X:\ F_x(t)\le F_{\mathrm{ref}}(t)\ \forall t\in\mathbb{R}\}=\{x\in X:\ F_x(t)\le F_{\mathrm{ref}}(t),\ t\in T_x\}$.

Proof.

Let $T_x=\{t_1,t_2,\dots\}$, $F_x(t)=F_x(t_k)$ for $t\in[t_k,t_{k+1})$, $k\ge1$, and $F_x(t)=0$ for $t<t_1$. Then $F_x(t)=F_x(t_k)\le F_{\mathrm{ref}}(t_k)\le F_{\mathrm{ref}}(t)$ for all $t\in[t_k,t_{k+1})$, $k\ge1$. Besides, $F_x(t)=0\le F_{\mathrm{ref}}(t)$ for all $t<t_1$.
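Lemma 3.2 suggests a finite test of the FSD constraint (6) for empirical distributions. The sketch below checks the inequality on the (conservative) finite grid formed by the jump points of both samples; the helper names are hypothetical.

```python
import numpy as np

def empirical_cdf(values, t):
    """F_{x,m}(t) = m^{-1} #{i : v_i <= t} for equiprobable scenario returns;
    t may be a scalar or an array of evaluation points."""
    values = np.asarray(values)
    return np.mean(values[:, None] <= np.atleast_1d(t), axis=0)

def fsd_dominates(port_returns, ref_returns):
    """Check F_x(t) <= F_ref(t) for all t. By Lemma 3.2 it suffices to test
    at the jump points of F_x; here we use both samples' points, which is
    a conservative finite grid."""
    ts = np.concatenate([port_returns, ref_returns])
    return bool(np.all(empirical_cdf(port_returns, ts)
                       <= empirical_cdf(ref_returns, ts) + 1e-12))

# a sample shifted up by a constant dominates the original in 1st order
ref = np.array([0.01, -0.02, 0.03, 0.00])
assert fsd_dominates(ref + 0.005, ref)
assert not fsd_dominates(ref - 0.005, ref)
```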

An alternative problem formulation. A reformulation of the problem (5)–(6) involves the inverse cumulative distribution functions instead of the CDFs. To this end, let
$$Q_x(\alpha):=\sup_{t\in\mathbb{R}}\{t:\ F_x(t)\le\alpha\},\quad\alpha\in[0,1],$$
be the return quantile (generalized inverse) function associated with the decision $x$, and let $Q_{\mathrm{ref}}(\alpha)$ be some reference quantile function, continuous from below (lower semicontinuous).

Consider the problem
$$\text{maximize } f(x) \tag{7}$$
$$\text{subject to } Q_{\mathrm{ref}}(\alpha)\le Q_x(\alpha)\ \text{ for all } \alpha\in[0,1],\quad x\in X. \tag{8}$$
The following lemmas demonstrate that problem (7)–(8) is well-defined and equivalent to the initial problem statement (5)–(6).
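For an equiprobable sample, the generalized inverse $Q_x$ can be computed in closed form, which makes the quantile form of the dominance constraint easy to test numerically. A minimal sketch with our own helper (the $\alpha=1$ endpoint is clamped to the largest observation, whereas the formal supremum is $+\infty$):

```python
import numpy as np

def q_x(returns, alpha):
    """Generalized inverse Q_x(alpha) = sup{t : F_x(t) <= alpha} for an
    equiprobable sample; alpha = 1 is clamped to the largest value."""
    v = np.sort(np.asarray(returns))
    m = len(v)
    k = min(int(np.floor(alpha * m)), m - 1)
    return v[k]

ref = np.array([0.01, -0.02, 0.03, 0.00])
up = ref + 0.005  # FSD-dominates ref
# the quantile form of the constraint: Q_ref(alpha) <= Q_x(alpha) for all alpha
alphas = np.linspace(0.0, 0.999, 200)
assert all(q_x(ref, a) <= q_x(up, a) for a in alphas)
```

For the sorted sample $v_1\le\dots\le v_m$, the set $\{t: F_x(t)\le\alpha\}$ is $(-\infty, v_{k+1})$ with $k=\lfloor\alpha m\rfloor$, so the supremum is $v_{k+1}$; this is what the index arithmetic implements.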

Lemma 3.3

Define $F^<_x(t):=\Pr\{\omega^\top x<t\}$ and $Q^<_x(\alpha):=\sup\{t:\ F^<_x(t)\le\alpha\}$. Then $Q_x(\alpha)=Q^<_x(\alpha)$, both functions $Q_x(\alpha)$ and $Q^<_x(\alpha)$ are upper semicontinuous in $(x,\alpha)$, and hence the feasible set in (8) is compact.

Proof.

First, it holds for any $\alpha\in[0,1]$ that $Q^<_x(\alpha)=\sup\{t:\ F^<_x(t)\le\alpha\}\ge\sup\{t:\ F_x(t)\le\alpha\}=Q_x(\alpha)$, since $F^<_x(t)\le F_x(t)$. Let us prove the opposite inequality, $Q_x(\alpha)\ge Q^<_x(\alpha)$. For a given $\alpha$ there exists $t_{x,\alpha}$ such that $Q^<_x(\alpha)=\sup\{t:\ F^<_x(t)\le\alpha\}=t_{x,\alpha}$, and thus $F^<_x(t)\le\alpha$ for all $t\le t_{x,\alpha}$. For any $t<t_{x,\alpha}$, it holds that $F_x(t)\le F^<_x(t_{x,\alpha})\le\alpha$. Hence, we obtain the required opposite inequality, $Q_x(\alpha)=\sup\{t:\ F_x(t)\le\alpha\}\ge t_{x,\alpha}=Q^<_x(\alpha)$, and thus $Q_x(\alpha)=Q^<_x(\alpha)$. Next, $Q^<_x(\cdot)$ is a maximum function; by Aubin and Ekeland [Citation48, Ch. 1, Sec. 1, Prop. 21] it is upper semicontinuous in $(x,\alpha)$. Hence, the feasible set in (8) is either empty or compact as an intersection of the compact sets $\{x\in X:\ Q_x(\alpha)\ge Q_{\mathrm{ref}}(\alpha)\}$, $\alpha\in[0,1]$. The proof is complete.

The next lemma states that problems (5)–(6) and (7)–(8) are equivalent.

Lemma 3.4

Let $Q_{\mathrm{ref}}(\alpha):=\sup\{t:\ F_{\mathrm{ref}}(t)\le\alpha\}$ (cf. (8)), where $F_{\mathrm{ref}}(t)$ is the same (right-continuous) reference distribution function as in (6). Then the problems (5)–(6) and (7)–(8) are equivalent, i.e. they have the same objective function and their feasible sets coincide.

Proof.

We have to prove that (6) and (8) define the same feasible set. Assume (6) is fulfilled but (8) is not. Then there exists $\alpha_0$ such that $Q_{\mathrm{ref}}(\alpha_0)>Q_x(\alpha_0)$. This implies that $F_x(Q_{\mathrm{ref}}(\alpha_0))>\alpha_0$. From here, by assumption (6), $F_{\mathrm{ref}}(Q_{\mathrm{ref}}(\alpha_0))\ge F_x(Q_{\mathrm{ref}}(\alpha_0))>\alpha_0$. But by right continuity of $F_{\mathrm{ref}}$, the supremum in $\sup\{t:\ F_{\mathrm{ref}}(t)\le\alpha_0\}$ is achieved at $t=Q_{\mathrm{ref}}(\alpha_0)$, so $F_{\mathrm{ref}}(Q_{\mathrm{ref}}(\alpha_0))\le\alpha_0$, a contradiction.

Assume (8) is fulfilled but (6) is not. Then there exists $t_0$ such that $F_{\mathrm{ref}}(t_0)<F_x(t_0)$. Due to the right continuity of a distribution function, there is $t_1>t_0$ such that $\alpha_0:=F_{\mathrm{ref}}(t_1)<F_x(t_0)$. By definition of a quantile, $Q_{\mathrm{ref}}(\alpha_0)=\sup\{t:\ F_{\mathrm{ref}}(t)\le\alpha_0\}\ge t_1>t_0$. On the other hand, $F_x(t)\le\alpha_0<F_x(t_0)$ implies $t<t_0$, so that $Q_x(\alpha_0)\le t_0<Q_{\mathrm{ref}}(\alpha_0)$, which contradicts (8). This completes the proof.

Lemma 3.5

If the reference function $Q_{\mathrm{ref}}$ has a step-like character with steps at $A_{\mathrm{ref}}=\{0=\alpha_1,\alpha_2,\dots,\alpha_k=1\}$, i.e. $Q_{\mathrm{ref}}(\alpha)=Q_{\mathrm{ref}}(\alpha_i)$ for $\alpha\in[\alpha_i,\alpha_{i+1})$, $i\in\{1,\dots,k-1\}$, then
$$\{x\in X:\ Q_{\mathrm{ref}}(\alpha)\le Q_x(\alpha)\ \ \forall\alpha\in[0,1]\}=\{x\in X:\ Q_{\mathrm{ref}}(\alpha)\le Q_x(\alpha)\ \ \forall\alpha\in A_{\mathrm{ref}}\}.$$

Proof.

Assume $Q_{\mathrm{ref}}(\alpha_i)\le Q_x(\alpha_i)$ for all $i\in\{1,\dots,k\}$. Then $Q_{\mathrm{ref}}(\alpha)=Q_{\mathrm{ref}}(\alpha_i)\le Q_x(\alpha_i)\le Q_x(\alpha)$ for all $\alpha\in[\alpha_i,\alpha_{i+1})$, $i\in\{1,\dots,k-1\}$, since $Q_x$ is non-decreasing; besides, $Q_{\mathrm{ref}}(\alpha_k)\le Q_x(\alpha_k)$.

Employing different objective functions $f(x)$ in (7) allows reshaping the risk profiles $F_x(t)$ and $Q_x(\alpha)$ in a desirable manner. For example, the problem
$$\text{maximize } \mathrm{AV@R}_{\gamma,1}(x) \quad\text{subject to } Q_{x_{\mathrm{ref}}}(\alpha)-\delta(\alpha)\le Q_{x,m}(\alpha),\ \ \delta(\alpha)\ge0\ (\alpha\in[0,1]),\ \ x\in X,$$
can be used for searching for a more risky but potentially more profitable portfolio than some reference one $x_{\mathrm{ref}}$ with risk profile $Q_{x_{\mathrm{ref}}}(\alpha)$ and step-back function $\delta(\alpha)$. The problem
$$\text{maximize } \mathrm{AV@R}_{0,\gamma}(x) \quad\text{subject to } Q_{x_{\mathrm{ref}}}(\alpha)-\delta(\alpha)\le Q_{x,m}(\alpha),\ \ \delta(\alpha)\ge0\ (\alpha\in[0,1]),\ \ x\in X,$$
can be used to obtain a less risky and less profitable portfolio than the reference portfolio $x_{\mathrm{ref}}$. Note, however, that the objective functions in these problems can be discontinuous.

Examples

The following two examples address particular cases of the problem setting (7)–(8) involving the generalized inverse. Example 3.7 extends the problem setting from portfolio optimization to decision-making under dangerous threats.

Example 3.6

Portfolio selection under a single Value-at-Risk (V@R) constraint, cf. [Citation25] and corresponding references therein

Let $Q_x(\alpha)$ be the $\alpha$-quantile of the random return $\xi_x=\omega^\top x$ for a given $\alpha$, let $q_0$ be the reference value for $Q_x(\alpha_0)$, and let $\tau\le\min_{1\le i\le m}\omega_i$ a.s. Consider the problem
$$\text{maximize } f(x)=\mathbb{E}\,\omega^\top x \quad\text{subject to } Q_x(\alpha_0)\ge q_0,\quad x\in X, \tag{9}$$
where $\alpha_0$ is a fixed risk level.

With the reference quantile function
$$Q_{\mathrm{ref}}(\alpha)=\begin{cases} q_0 & \text{if } \alpha\ge\alpha_0,\\ \tau & \text{if } \alpha<\alpha_0,\end{cases}$$
the constraints in this example are equivalent to $Q_x(\alpha)\ge Q_{\mathrm{ref}}(\alpha)$, $\alpha\in[0,1]$, $x\in X$, i.e. they are of the form (8).

Example 3.7

Decision making under catastrophic risks, [Citation49]

Catastrophic risks, such as catastrophic floods, earthquakes, tsunamis, etc., designate 'low probability – high consequences' events. Usually, they are described by a list of possible extreme events (indexed by $i=1,2,\dots,I$) that can happen once in 10, 50, 100, etc., years. Decision-making under catastrophic risks means designing certain mitigation measures to prevent unacceptable losses. The following framework for decision-making under catastrophic risks is proposed.

Let a vector of parameters $x\in X$ describe a decision (a complex of countermeasures) from some (compact) set $X$ of possible decisions, each associated with costs $c(x)$. For each kind of event $i$, experts can define reasonable ('acceptable') levels of losses $q_i$ due to this event, $0<q_1<\dots<q_I$. Suppose we can model each event $i$, its consequences and losses $\ell_i(x)$ under the decision $x\in X$. Then the corresponding decision-making problem is
$$\text{minimize } c(x) \quad\text{subject to } \ell_i(x)\le q_i,\ i=1,2,\dots,I;\quad x\in X. \tag{10}$$
Although the framework does not include explicit probabilities of the events $i$, we can formally introduce probabilities $1>p_1>p_2>\dots>p_I$, e.g. $p_1=1/10$, $p_2=1/50$, $p_3=1/100$, etc., that event $i$ happens in any given year. By defining the two quantile functions
$$Q_{\mathrm{ref}}(\alpha)=\begin{cases} q_I, & \alpha\in[0,p_I),\\ q_{I-1}, & \alpha\in[p_I,p_{I-1}),\\ \ \vdots & \\ q_1, & \alpha\in[p_2,p_1),\\ 0, & \alpha\in[p_1,1],\end{cases} \qquad Q_x(\alpha)=\begin{cases} \ell_I(x), & \alpha\in[0,p_I),\\ \ell_{I-1}(x), & \alpha\in[p_I,p_{I-1}),\\ \ \vdots & \\ \ell_1(x), & \alpha\in[p_2,p_1),\\ 0, & \alpha\in[p_1,1],\end{cases}$$
we can formally express the constraints (10) as $Q_x(\alpha)\le Q_{\mathrm{ref}}(\alpha)$ for all $\alpha\in[0,1]$, i.e. in terms of 1st-order stochastic dominance.

4. The solution approach: exact penalty functions

In the case of a discrete random vector $\omega$, the function $F_x(t)$ in (6) is discontinuous in $t$ and $x$. For the solution of the problem (5)–(6), we apply the exact discontinuous and the exact projective penalty functions from [Citation1,Citation40–42,Citation50].

4.1. Finding a feasible solution

If the distribution function $F_{x,m}(\cdot)$ has a step-wise character with jumps at the points $T_x=\{t_1,\dots,t_m\}$, we may set
$$G_m(x):=\max_{t\in T_x}\big(F_{x,m}(t)-F_{\mathrm{ref}}(t)\big).$$
With that, due to Lemma 3.2, the stochastic dominance constraints in (6) are equivalent to the inequality $G_m(x)\le 0$.

To find a feasible solution $x_0$ for the problem (5)–(6), we solve the problem
$$\min_{x\in X} G_m(x). \tag{11}$$
If for some $x_0\in X$ it holds that $G_m(x_0)\le 0$, then $x_0$ is a feasible solution of the problem (5)–(6).
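A crude way to attack (11) numerically is random search over admissible portfolios. The sketch below uses a simplex-constrained $X$ (a simplification of the set $X$ of Section 3), synthetic scenario data, and a constant relaxation $\delta$, all of which are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical scenario matrix: m = 50 scenarios, n = 3 assets
scen = rng.normal(loc=[0.03, 0.05, 0.02], scale=[0.02, 0.08, 0.01], size=(50, 3))

def ecdf(values, t):
    return np.mean(np.asarray(values)[:, None] <= np.atleast_1d(t), axis=0)

x_ref = np.array([0.5, 0.3, 0.2])
ref_returns = scen @ x_ref
delta = 0.01  # constant relaxation delta(t), cf. the discussion of (6)
F_ref = lambda t: ecdf(ref_returns - delta, t)  # = F_{x_ref}(t + delta)

def G_m(x):
    """G_m(x) = max_{t in T_x} (F_{x,m}(t) - F_ref(t));
    G_m(x) <= 0 certifies feasibility of x for the FSD constraint."""
    r = scen @ x
    return np.max(ecdf(r, r) - F_ref(r))

def find_feasible(tries=2000):
    """Crude random search for (11): min_{x in X} G_m(x) over the simplex."""
    best_x, best_g = None, np.inf
    for _ in range(tries):
        x = rng.dirichlet(np.ones(3))  # random point of the unit simplex
        g = G_m(x)
        if g < best_g:
            best_x, best_g = x, g
    return best_x, best_g

# the relaxed reference portfolio certifies itself as feasible
assert G_m(x_ref) <= 1e-12
x0, g0 = find_feasible()
assert np.isfinite(g0)
```

Random search is only a baseline; the paper's actual procedure replaces it with smoothing-based local search inside branch and bound.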

Similarly, due to Lemma 3.5, to find a feasible solution of problem (7)–(8), we solve the problem
$$\min_{x\in X} H_m(x):=\max_{\alpha\in A_{\mathrm{ref}}}\big(Q_{\mathrm{ref}}(\alpha)-Q_{x,m}(\alpha)\big), \tag{12}$$
where $A_{\mathrm{ref}}$ is the set of jump points of $Q_{\mathrm{ref}}(\cdot)$.

4.2. Structure of the feasible set

The structure of the feasible set of the problems (5)–(6) and (7)–(8) heavily depends on the choice of the reference profiles $F_{\mathrm{ref}}(t)$ and $Q_{\mathrm{ref}}(\alpha)$. Figure 1 illustrates disconnected and non-convex feasible sets for a portfolio consisting of only 3 components.

Figure 1. Possible (disconnected and non-convex) shapes of feasible sets under 1st-order SDC ($x_{\mathrm{ref}}=(x_1=0.31,\ x_2=0.69)$, $\delta=0.05,0,0$).


Suppose the portfolio $x_0=(0.31,0.69,0)$ includes the first two assets of Table 9 from the appendix, with random return $\omega$ given by the first two columns of this table. Let $F_0(t):=\Pr\{\omega^\top x_0\le t\}$ be the risk profile of this portfolio and $F_{\mathrm{ref}}(t):=F_0(t+\delta)$ (with constant $\delta=5\%$) be the reference risk profile. The left panel of Figure 1 displays the shape of the non-convex disjoint feasible set of problem (5)–(6) for this example. The middle panel gives an example of the feasible set of problem (5)–(6) when a risk-free asset is infeasible. The right panel of Figure 1 corresponds to the case of a feasible risk-free portfolio.

When the distribution of returns $\omega$ is discrete (i.e. represented by scenarios), the distribution function $F_x(t)$ of the portfolio return is discontinuous in $x$ as a weighted sum of step-wise indicator functions, so the stochastic dominance constraints are represented by discontinuous functions. Generally, therefore, we deal with discontinuous optimization problems, which are a challenge for optimization theory. In particular, the constraints in these problems can be non-convex and disjoint. Our first step towards a solution of such problems is to eliminate the constraints by a penalty method. However, the standard exact penalty method, which adds a penalty term to the objective function, is invalid in this case. In the next section we consider two new exact penalty methods, a discontinuous and a projective one.

4.3. Exact discontinuous penalty functions

To find an optimal solution of problem (5)–(6), we solve the problem
$$\max_{x\in X} F(x):=\begin{cases} f(x) & \text{if } G_m(x)\le0,\\ c-G_m(x) & \text{else},\end{cases} \tag{13}$$
where $c<\sup_{\{x\in X:\ G_m(x)\le0\}} f_m(x)$, for example, $c=f(x_0)$ with $G_m(x_0)\le0$.

Obviously, the global maxima of the problems (5)–(6) and (13) coincide.
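The construction (13) is easy to state in code. The toy instance below is ours (a one-dimensional `f` and `G`, not the portfolio functions); it only illustrates that the discontinuous penalty pushes infeasible values below the feasible ones, so the unconstrained maximizer coincides with the constrained one.

```python
import numpy as np

def discontinuous_penalty(f, G, x0):
    """Exact discontinuous penalty (13): keep f on the feasible set
    {G <= 0}; outside it, return c - G(x) with c = f(x0) for a known
    feasible x0, so every infeasible value lies below a feasible one."""
    c = f(x0)
    def F(x):
        g = G(x)
        return f(x) if g <= 0 else c - g
    return F

# toy instance: maximize f on {x : G(x) <= 0} = [-1, 1]
f = lambda x: -(x - 2.0) ** 2
G = lambda x: abs(x) - 1.0
F = discontinuous_penalty(f, G, x0=0.0)

xs = np.linspace(-3, 3, 6001)
x_best = xs[np.argmax([F(x) for x in xs])]
assert abs(x_best - 1.0) < 2e-3  # constrained maximizer is x = 1
```

Note the jump of `F` at the boundary of the feasible set: this is why the subsequent smoothing step is needed before gradient-type methods can be applied.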

We can further remove the constraint $x\in X$ from problem (13) by subtracting the exact projective penalty term $\|x-\pi_X(x)\|$, where $\pi_X(x)$ is the projection of $x$ on $X$, from the objective function in (13), and solve the global problem
$$\max_{x\in\mathbb{R}^n}\Phi(x):=F(\pi_X(x))-\|x-\pi_X(x)\| \tag{14}$$
instead.

4.4. Exact projective penalty functions

Let $x_0\in X$ be some feasible solution of problem (5)–(6), i.e. $x_0\in X$ and $G_m(x_0)\le0$. For any $x\in\mathbb{R}^n$ denote $x_\lambda=(1-\lambda)x_0+\lambda x$. Define a projection point
$$p_{G_m}(x)=\begin{cases} x & \text{if } G_m(x)\le0,\\ x_{\lambda_x} & \text{if } G_m(x)>0,\end{cases} \tag{15}$$
where $\lambda_x=\sup\{\lambda\in[0,1]:\ G_m(x_\lambda)\le0\}$. Now, instead of the constrained problem (5)–(6), consider the unconstrained problem
$$\max_{x\in\mathbb{R}^n}\Big[F_m(x):=f_m\big(p_{G_m}(\pi_X(x))\big)-\big\|p_{G_m}(\pi_X(x))-\pi_X(x)\big\|-\|x-\pi_X(x)\|\Big]. \tag{16}$$
Similarly, instead of the constrained problem (7)–(8), we can consider the unconstrained problem
$$\max_{x\in\mathbb{R}^n}\Big[F_m(x):=f_m\big(p_{H_m}(\pi_X(x))\big)-\big\|p_{H_m}(\pi_X(x))-\pi_X(x)\big\|-\|x-\pi_X(x)\|\Big]. \tag{17}$$
The exact projective penalty method was studied and tested in Norkin [Citation1,Citation42] and Galvan [Citation50]. The main features of the method are: it is exact, it does not require selection of the right penalty parameter, and it does not use the objective function values outside the feasible set. Also, for the considered portfolio reshaping problem, the exact projective penalty function can be found in a closed form.

By Norkin [Citation1, Theorem 4.4], the global maxima of problems (5)–(6) and (16) coincide, and, if the mapping $p_{G_m}(\pi_X(x))$ is continuous, the local maxima of both problems coincide as well, i.e. the optimization problems are equivalent. The mapping $p_{G_m}(\pi_X(x))$ is continuous if the feasible set is convex and the projection mapping (15) uses an interior point $x_0$ of the feasible set of the considered problem. For example, a feasible risk-free portfolio can represent such a point.
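Numerically, the radial projection (15) reduces to a one-dimensional root search for $\lambda_x$, e.g. by bisection. The following sketch implements the projective penalty for points already in $X$ (so the outer projection $\pi_X$ of (16) is omitted); the ball-constrained toy instance and all names are our assumptions.

```python
import numpy as np

def lambda_x(G, x0, x, tol=1e-10):
    """lambda_x = sup{lambda in [0,1] : G((1-lambda)*x0 + lambda*x) <= 0},
    found by bisection; assumes G(x0) <= 0 and that the feasible part of
    the segment from x0 is an interval (as for star-shaped sets)."""
    if G(x) <= 0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G((1 - mid) * np.asarray(x0) + mid * np.asarray(x)) <= 0:
            lo = mid
        else:
            hi = mid
    return lo

def projective_penalty(f, G, x0):
    """Projective penalty, cf. (15)-(16), for x already in X:
    F(x) = f(p(x)) - ||p(x) - x|| with p(x) the radial projection (15)."""
    def F(x):
        x = np.asarray(x, dtype=float)
        lam = lambda_x(G, x0, x)
        p = (1 - lam) * np.asarray(x0) + lam * x
        return f(p) - np.linalg.norm(p - x)
    return F

# toy: feasible set {||x|| <= 1}, objective f(x) = x[0], center x0 = 0
f = lambda x: x[0]
G = lambda x: np.linalg.norm(x) - 1.0
F = projective_penalty(f, G, np.zeros(2))
# at x = (2, 0): p = (1, 0), so F = f(p) - ||p - x|| = 1 - 1 = 0
assert abs(F(np.array([2.0, 0.0])) - 0.0) < 1e-6
```

Because $x_0$ is an interior point of the ball, the composed mapping is continuous, matching the equivalence condition stated above.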

Remark 4.1

Computational aspects

The calculation of the projections \(p_{G_m}(\cdot)\) and \(p_{H_m}(\cdot)\) requires finding roots of the equations \(\phi(\lambda)\equiv G_m(x_\lambda)=0\) and \(\psi(\lambda)\equiv H_m(x_\lambda)=0\). This requires multiple evaluations of the functions \(\phi(\lambda)\) and \(\psi(\lambda)\) and hence multiple constructions of \(F_{x_\lambda,m}(t)=\Pr\{\omega^\top x_\lambda < t\}\) or \(Q_{x_\lambda,m}(\alpha)\) for different portfolios \(x_\lambda\). This may take considerable time in the case of a large number of observations m. However, in the case of a specific feasible point, namely a risk-free (feasible) portfolio, these functions can easily be found through \(F_{x,m}(t)\) and \(Q_{x,m}(\alpha)\), as the following statements show.

Proposition 4.2

Let \(x^0=(1,0,\dots,0)\) be a risk-free portfolio with fixed return r, and let \(x\in X\) be an arbitrary portfolio with random return \(f(x,\omega)=\omega^\top x\), return cumulative distribution function \(F_x(t)\), \(t\in\mathbb{R}\), and corresponding inverse (quantile) function \(Q_x(\alpha)\), \(\alpha\in[0,1]\), where \(X=\{x\in\mathbb{R}^n : \sum_{i=1}^n x_i = 1,\ x_i\ge 0,\ i=1,\dots,n\}\). Consider a mixed portfolio of the form \(x_\lambda = \lambda x + (1-\lambda)x^0\), \(\lambda\in[0,1]\). Its distribution function and inverse distribution function are expressed through \(F_x(t)\) and \(Q_x(\alpha)\) as
\[
F_{x_\lambda}(t) = F_x\bigl((t-(1-\lambda)r)/\lambda\bigr), \qquad Q_{x_\lambda}(\alpha) = \lambda Q_x(\alpha) + (1-\lambda) r.
\]

Proof.

Denote \(f(x,\omega)=\omega^\top x\) and \(f(x_\lambda,\omega)=\omega^\top x_\lambda\). Then
\[
F_{x_\lambda}(t) = \Pr\{f(x_\lambda,\omega) < t\} = \Pr\{(1-\lambda)\omega^\top x^0 + \lambda f(x,\omega) < t\} = \Pr\{(1-\lambda)r + \lambda f(x,\omega) < t\} = \Pr\{f(x,\omega) < (t-(1-\lambda)r)/\lambda\} = F_x\bigl((t-(1-\lambda)r)/\lambda\bigr).
\]
Next, by definition, \(Q_{x_\lambda}(\alpha)\) is the optimal value of the optimization problem
\[
\max\{t\in\mathbb{R} : F_{x_\lambda}(t)\le\alpha\} = \max\{t\in\mathbb{R} : \Pr\{f(x_\lambda,\omega) < t\}\le\alpha\} = \max\{t\in\mathbb{R} : F_x\bigl((t-(1-\lambda)r)/\lambda\bigr)\le\alpha\}.
\]
With the variable \(\tau = (t-(1-\lambda)r)/\lambda\), the latter problem is equivalent to
\[
\max\{\lambda\tau + (1-\lambda)r : F_x(\tau)\le\alpha\}.
\]
The optimal value of this problem equals
\[
\lambda \sup_{\tau\in\mathbb{R}}\{\tau : F_x(\tau)\le\alpha\} + (1-\lambda)r = \lambda Q_x(\alpha) + (1-\lambda)r = Q_{x_\lambda}(\alpha),
\]
which completes the proof.

The preceding proposition shows that the quantile function \(Q_{x_\lambda}(\alpha)\) of the mixed portfolio \(x_\lambda = \lambda x + (1-\lambda)x^0\) is the weighted average of the return r of the risk-free portfolio \(x^0\) and \(Q_x(\alpha)\), the return quantile function of the portfolio x.
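Proposition 4.2 can be checked empirically on a sample: the empirical quantile of the mixed return λ f(x,ω) + (1−λ) r equals the mixed quantile λ Q_x(α) + (1−λ) r, because an increasing affine transform commutes with (linearly interpolated) sample quantiles. A quick sketch with synthetic returns (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
risky = rng.normal(0.08, 0.2, size=10_000)   # sample of the risky portfolio return
r, lam = 0.03, 0.6                           # risk-free return and mixing weight
mixed = lam * risky + (1 - lam) * r          # return of x_lam = lam*x + (1-lam)*x0

for alpha in (0.1, 0.4, 0.7):
    q_mixed = np.quantile(mixed, alpha)
    q_formula = lam * np.quantile(risky, alpha) + (1 - lam) * r
    assert abs(q_mixed - q_formula) < 1e-12  # Proposition 4.2 on samples
```

This is exactly the re-use argument of the next paragraph: the empirical quantiles of the risky portfolio are computed once and then rescaled for every λ.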

Thus, for a known risk-free feasible portfolio \(x^0=(1,0,\dots,0)\) with a fixed return r>0, the calculations can be reduced considerably. Indeed, we need to calculate only one CDF \(F_{x,m}(\cdot)\) and can re-use it for the CDFs \(F_{x_\lambda,m}(\cdot)\) for different values of λ. In the case of a discrete reference CDF \(F_{\mathrm{ref}}(t)\) with jump points \((T_{\mathrm{ref}},A_{\mathrm{ref}})=\{(t_1,\alpha_1),\dots,(t_k,\alpha_k),\dots\}\), the projection \(p_{H_m}(x)\) can be found in analytical form, as the following proposition states.

Proposition 4.3

Assume that \(Q_{\mathrm{ref}}(1) < r\); then there exists a risk-free internal feasible portfolio \(x^0\) with return r. Further, the projection \(p_{H_m}(x)\) can be stated in the closed form \(p_{H_m}(x) = (1-\lambda_x)x^0 + \lambda_x x\), where
\[
\lambda_x = \begin{cases} 1, & \text{if } H_m(x) \le 0,\\[2pt] \displaystyle\min_{\{\alpha\in A_{\mathrm{ref}} :\ Q_x(\alpha) < Q_{\mathrm{ref}}(\alpha)\}} \frac{Q_{\mathrm{ref}}(\alpha) - r}{Q_x(\alpha) - r}, & \text{if } H_m(x) > 0. \end{cases}
\]

Proof.

It holds that \(Q_{\mathrm{ref}}(\alpha) \le Q_{\mathrm{ref}}(1) < r = Q_{x^0}(\alpha)\), i.e. the risk-free portfolio \(x^0=(1,0,\dots,0)\) is feasible and internal.

For \(H_m(x)\le 0\) we have, by definition, \(p_{H_m}(x) = x\). So assume that \(H_m(x) > 0\). Consider the portfolios \(x_\lambda = (1-\lambda)x^0 + \lambda x\), \(\lambda\in[0,1]\), and the function
\[
h_x(\alpha,\lambda) = Q_{\mathrm{ref}}(\alpha) - Q_{x_\lambda}(\alpha) = Q_{\mathrm{ref}}(\alpha) - r - \lambda\bigl(Q_x(\alpha) - r\bigr).
\]
For \(\alpha\in A_{\mathrm{ref}}\) such that \(Q_x(\alpha) < Q_{\mathrm{ref}}(\alpha)\) it holds that \(h_x(\alpha,1) > 0\) and \(h_x(\alpha,0) < 0\), and the function \(h_x(\alpha,\cdot)\) is linear and strictly monotone. So the projection corresponds to the largest λ such that \(h_x(\alpha,\lambda)\le 0\) for all \(\alpha\in A_{\mathrm{ref}}\), that is, to
\[
\lambda_x = \min_{\{\alpha\in A_{\mathrm{ref}} :\ Q_x(\alpha) < Q_{\mathrm{ref}}(\alpha)\}} \frac{Q_{\mathrm{ref}}(\alpha) - r}{Q_x(\alpha) - r},
\]
which completes the proof.

The proposition shows that, under the stated conditions, the feasible set \(\{x\in X : H_m(x)\le 0\}\) is star-shaped with respect to the feasible point \(x^0\) that represents a risk-free portfolio.

If \(Q_x(\alpha)\), \(\alpha\in A_{\mathrm{ref}}\), are continuous functions of x, then, under the conditions of the preceding proposition, \(\lambda_x\) is continuous. Hence, the projection mapping \(p_m(x) = \lambda_x x + (1-\lambda_x)x^0\) is also continuous. It follows from [Citation1, Theorem 4.4] that problems (Equation7)–(Equation8) and (Equation17) are equivalent.
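The closed-form projection of Proposition 4.3 reduces to a minimum over the jump points of the reference CDF. A sketch (the dictionaries mapping α to quantile values are our illustrative data layout, not the paper's):

```python
def lambda_projection(q_x, q_ref, r):
    # Proposition 4.3 (assumes Q_ref(1) < r): the largest lam in [0,1] with
    # Q_ref(a) <= lam*Q_x(a) + (1-lam)*r at all jump points a of the reference CDF
    violated = [a for a in q_ref if q_x[a] < q_ref[a]]
    if not violated:
        return 1.0                      # H_m(x) <= 0: x is already feasible
    return min((q_ref[a] - r) / (q_x[a] - r) for a in violated)

# illustrative quantile values at the jump points A_ref = {0.2, 0.6}, r = 0.05
q_ref = {0.2: 0.01, 0.6: 0.03}
q_x = {0.2: 0.00, 0.6: 0.04}
lam = lambda_projection(q_x, q_ref, 0.05)
```

Here λ_x = 0.8, and one checks that the mixed quantile λ Q_x(α) + (1−λ) r meets the reference bound with equality at the binding level α = 0.2.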

5. Numerical optimization of discontinuous penalty functions

In this section, we consider a numerical method for the optimization of the generally discontinuous functions \(G_m(x)\), \(H_m(x)\) and \(F_m(x)\) in (Equation11)–(Equation17). The idea consists in sequential approximation of the original function by smooth (averaged) ones and optimizing the latter by stochastic optimization methods. For this we develop stochastic finite-difference estimates of the gradients of the smoothed functions. Although the successive smoothing method has certain global optimization abilities (as discussed in [Citation42]), to strengthen this property we embed it as a local optimizer into a branch and bound scheme.

The problem setting satisfies the following conditions, which justify the tools employed to numerically solve the problem.

  1. The original optimization problems are correctly set, i.e. they have feasible and optimal solutions. Lemmas 3.1, 3.3 and 3.4 ensure this assumption.

  2. The original constrained problems are transformed into unconstrained ones, and this transformation is assumed to be exact. The exact discontinuous and exact projective penalty methods discussed in Section 4 and in [Citation1] provide this transformation.

  3. Penalty functions are approximated by a sequence of smoothed functions, and this sequence epi-converges to the target penalty function. The so-called strongly lower semicontinuous functions (see Definition 5.1 and Theorem 5.4 in Subsection 5.1 below) ensure this property.

  4. The level sets of the penalty and the smoothed functions are uniformly bounded. This property is provided by the penalty term \(\|x-\pi_X(x)\|\) in (Equation16) and (Equation17), related to the projection on the convex set X.

  5. For the unconstrained minimization of the smoothed functions, we apply a stochastic finite-difference gradient method in combination with a branch and bound method, which are assumed to approximately find their global minima. Both applied methods are heuristic; their structure and convergence properties are discussed in Subsection 5.2 below.

  6. Convergence of the approximate global minima to global minima of the penalty function (on a subset where the penalty function values can be approached through its continuity points) is guaranteed by properties of epi-convergence, cf. [Citation51, Theorem 7.33].

5.1. Averaged functions

We limit the consideration to the case of the so-called strongly lower semicontinuous functions.

Definition 5.1

Strongly lower semicontinuous functions, cf. [Citation43]

A function \(F:\mathbb{R}^n\to\mathbb{R}\) is called lower semicontinuous (lsc) at a point x if \(\liminf_{k\to\infty} F(x^k) \ge F(x)\) for all sequences \(x^k\to x\).

A function \(F:\mathbb{R}^n\to\mathbb{R}\) is called strongly lower semicontinuous (strongly lsc) at a point x if it is lower semicontinuous at x and there exists a sequence \(x^k\to x\) such that F is continuous at each \(x^k\) and \(F(x^k)\to F(x)\). A function F is called strongly lower semicontinuous (strongly lsc) on \(X\subseteq\mathbb{R}^n\) if this is the case for all \(x\in X\).

The property of strong lower semi-continuity is preserved under continuous transformations. Some further properties of these functions are discussed in [Citation52].

If the function \(F:\mathbb{R}^n\to\mathbb{R}\) is continuous almost everywhere (i.e. it is discontinuous on some set of measure zero), then its (lower) regularization
\[
\hat F(x) \equiv \liminf\{F(x^k) : x^k\to x,\ F \text{ is continuous at } x^k\} \tag{18}
\]
is strongly lower semicontinuous and coincides with F almost everywhere.

The averaged functions obtained from the original non-smooth or discontinuous function by convolution with some kernel have smoother characteristics. For this reason, they are often used in optimization theory (see [Citation43,Citation53] and references therein).

Definition 5.2

The set (family) of bounded and integrable functions \(\{\psi_\theta : \mathbb{R}^n\to\mathbb{R}_+,\ \theta\in\mathbb{R}_+\}\) satisfying for any \(\epsilon>0\) the condition
\[
\lim_{\theta\downarrow 0} \int_{\epsilon B} \psi_\theta(z)\,dz = 1, \qquad B \equiv \{x\in\mathbb{R}^n : \|x\|\le 1\},
\]
is called a family of mollifiers. The kernels \(\{\psi_\theta\}\) are said to be smooth if the functions \(\psi_\theta(\cdot)\) are continuously differentiable.

A function \(F:\mathbb{R}^n\to\mathbb{R}^1\) is called bounded at infinity if there are positive numbers C and r such that \(|F(x)|\le C\) for all x with \(\|x\|\ge r\).

Given a locally integrable, bounded at infinity function \(F:\mathbb{R}^n\to\mathbb{R}^1\) and a family of smoothing kernels \(\{\psi_\theta\}\), the associated family of averaged functions \(\{F_\theta : \theta\in\mathbb{R}_+\}\) is
\[
F_\theta(x) \equiv \int_{\mathbb{R}^n} F(x+z)\psi_\theta(z)\,dz = \int_{\mathbb{R}^n} F(z)\psi_\theta(z-x)\,dz. \tag{19}
\]
Smoothing kernels can have unbounded support \(\operatorname{supp}\psi_\theta = \{x : \psi_\theta(x) > 0\}\). To ensure the existence of the integrals (Equation19), we assume that the function F is bounded at infinity. We can always assume this property if we are interested in the behaviour of F within some bounded area. If \(\operatorname{supp}\psi_\theta \to \{0\}\) as \(\theta\to 0\), then this assumption is superfluous.

For example, a family of kernels can be constructed as follows. Let ψ be some probability density function with bounded support \(\operatorname{supp}\psi\), and let \(\{\theta_\nu : \nu=1,2,\dots\}\) be a positive numerical sequence tending to 0 as \(\nu\to\infty\). Then the smoothing kernels on \(\mathbb{R}^n\) can be taken as
\[
\psi_{\theta_\nu}(z) \equiv \frac{1}{(\theta_\nu)^n}\,\psi(z/\theta_\nu),
\]
where \(\theta_\nu\) is a bandwidth.

If the function F is not continuous, then we cannot expect the averaged functions \(F_\theta\) to converge to F uniformly. But we do not need that: we need a convergence of the averaged functions \(F_\theta\) to F that guarantees the convergence of the minima of \(F_\theta\) to the minima of F. This property is guaranteed by the so-called epi-convergence of functions.

Definition 5.3

Epi-convergence, cf. Rockafellar and Wets [Citation54]

A sequence of functions \(\{F_\nu : \mathbb{R}^n\to\overline{\mathbb{R}},\ \nu\in\mathbb{N}\}\) epi-converges to a function \(F:\mathbb{R}^n\to\overline{\mathbb{R}}\) at a point x iff

  1. \(\liminf_{\nu} F_\nu(x^\nu) \ge F(x)\) for all sequences \(x^\nu\to x\), and

  2. \(\lim_{\nu} F_\nu(x^\nu) = F(x)\) for some sequence \(x^\nu\to x\).

The sequence {Fν} epi-converges to F, if this is the case at every point xRn.

Theorem 5.4

Epi-convergence of averaged functions, cf. [Citation43], Theorem 3.7 and [Citation51], Example 7.19

For a strongly lower semicontinuous, locally integrable function \(F:\mathbb{R}^n\to\mathbb{R}^1\), any associated sequence of averaged functions \(\{F_\nu = F_{\theta_\nu} : \theta_\nu\in\mathbb{R}_+\}\) epi-converges to F as \(\theta_\nu\to 0\).

Note that in the optimization problem without constraints it follows that \(\lim_{\nu}(\inf_x F_\nu) = \inf_x F\), cf. [Citation51, Theorem 7.33].

We remark that for an almost everywhere continuous function F, the corresponding averaged functions \(F_{\theta_\nu}\) epi-converge to its regularization \(\hat F\) defined in (Equation18).

To optimize discontinuous functions, we approximate them with averaged functions. The convolution of a discontinuous function with the corresponding kernel (probability density) improves analytical properties of the resulting function, but increases the computational complexity of the problem, since it transforms the deterministic function into an expectation, which is a multidimensional integral. This transformation of the problem, which involves probability measures, naturally links to well-studied and well-established methods in stochastic optimization.

We can consider smoothed functions obtained by employing a differentiable kernel with unbounded support, for example the Gaussian kernel given by the probability density \(\psi(y) = (2\pi)^{-n/2} e^{-\|y\|^2/2}\). Consider the family
\[
F_\theta(x) = \int_{\mathbb{R}^n} F(x+\theta y)\psi(y)\,dy = \frac{1}{\theta^n}\int_{\mathbb{R}^n} F(z)\psi\Bigl(\frac{z-x}{\theta}\Bigr)dz, \qquad \theta>0,
\]
of averaged functions. Suppose that F is globally bounded (one may even allow \(|F(x)|\le \gamma_1 + \gamma_2\|x\|^{\gamma_3}\) with some non-negative constants \(\gamma_1\), \(\gamma_2\) and \(\gamma_3\)). Then for a strongly lsc function F, the averaged functions \(F_\theta\) epi-converge to F as \(\theta\to 0\), and each function \(F_\theta\) is analytical with gradient (cf. Stein's lemma)
\[
\nabla F_\theta(x) = \frac{1}{\theta^{n+2}}\int_{\mathbb{R}^n} F(z)\psi\Bigl(\frac{z-x}{\theta}\Bigr)(z-x)\,dz
= \frac{1}{\theta}\int_{\mathbb{R}^n} F(x+\theta y)\psi(y)\,y\,dy
= -\frac{1}{\theta}\int_{\mathbb{R}^n} F(x-\theta y)\psi(y)\,y\,dy
= \frac{1}{\theta}\int_{\mathbb{R}^n} \bigl(F(x+\theta y)-F(x)\bigr)\psi(y)\,y\,dy
= \frac{1}{2\theta}\int_{\mathbb{R}^n} \bigl(F(x+\theta y)-F(x-\theta y)\bigr)\psi(y)\,y\,dy,
\]
or
\[
\nabla F_\theta(x) = \mathbb{E}_\eta \frac{1}{\theta}\bigl(F(x+\theta\eta)-F(x)\bigr)\eta = \mathbb{E}_\eta \frac{1}{2\theta}\bigl(F(x+\theta\eta)-F(x-\theta\eta)\bigr)\eta, \tag{20}
\]
where the random vector η follows the standard normal distribution and \(\mathbb{E}_\eta\) is the mathematical expectation over η.

It follows that the random vector
\[
\xi_\theta(x,\eta) = \frac{\eta}{2\theta}\bigl(F(x+\theta\eta) - F(x-\theta\eta)\bigr) \tag{21}
\]
with a standard Gaussian random vector η is an unbiased estimate of the gradient \(\nabla F_\theta(x)\).
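Formula (Equation21) is straightforward to implement and test: averaging the estimator over a batch should recover \(\nabla F_\theta\), which for a linear F equals the true gradient for every θ. A sketch (batch size and test function are our illustrative choices):

```python
import numpy as np

def smoothed_grad(F, x, theta, m=4000, rng=None):
    # Monte-Carlo estimate of grad F_theta(x) using the central form of (21):
    # xi_theta(x, eta) = eta/(2*theta) * (F(x + theta*eta) - F(x - theta*eta))
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x, float)
    g = np.zeros_like(x)
    for _ in range(m):
        eta = rng.standard_normal(x.size)
        g += (F(x + theta * eta) - F(x - theta * eta)) / (2.0 * theta) * eta
    return g / m
```

For a linear F(x) = a·x the smoothed gradient is a for every θ, so the estimate reproduces a up to Monte-Carlo noise of order \(\|a\|\sqrt{n/m}\).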

5.2. Stochastic methods for minimization of discontinuous penalty functions

Consider a problem of constrained minimization of a generally discontinuous function subject to a box or other convex constraints. The target problems are (Equation11)–(Equation17).

Such problems can be solved, e.g. by collective random search algorithms. In this section, we develop stochastic quasi-gradient algorithms to solve these problems.

A problem of constrained optimization can be reduced to the problem of unconstrained optimization of a coercive function F(x) by using non-smooth or discontinuous penalty functions as described in [Citation1,Citation42,Citation50] (for the case of the present paper see Section 4).

Suppose the function F(x) is strongly lower semicontinuous. In view of Theorem 5.4, it is always possible to construct a sequence of smoothed averaged functions Fθν that epi-converges to F. Due to this property, the global minima of Fθν converge to the global minima of F as θν0. Convergence of local minima was studied in [Citation43].

Let us consider some procedures for optimizing the function F using approximating averaged functions Fθν.

Suppose one can find the global minima \(\{x^\nu\}\) of the functions \(F_{\theta_\nu}\), ν=0,1,…. Then any limit point of the sequence \(\{x^\nu\}\) is a global minimum of the function F. However, finding the global minima of \(F_{\theta_\nu}\) can be quite a difficult task, so we consider the following method.

The successive stochastic smoothing method, cf. [Citation42]

The method sequentially minimizes a sequence of smoothed functions \(F_{\theta_\nu}\) with decreasing smoothing parameter \(\theta_\nu\to 0\). The sequence of approximations \(x^\nu\) is constructed by implementing the following steps (cf. [Citation42]).

The successive smoothing method as a local optimizer.

  1. Initialization. Fix a box [lb, ub] with lower (lb) and upper (ub) bounds on the problem variables; select a number N of (decreasing) smoothing steps, a decreasing rate \(\alpha\in[1/2,1)\), and a number K of iterations of the Nemirovski-Yudin method (NYM); select a starting (maximal) smoothing parameter \(\theta_1\), e.g. \(\theta_1 = \|ub - lb\|\). Select a (random) starting point \(x^1\in(lb,ub)\). Set ν=1, the initial smoothing iteration count.

  2. Smoothing iterations. For a fixed smoothing parameter \(\theta_\nu\), \(\nu\ge 1\), minimize the smoothed function \(F_{\theta_\nu}\) by some stochastic optimization method, starting from the point \(x^\nu\) and using the finite-difference stochastic gradients (Equation21), to find the next approximation \(x^{\nu+1}\). For example, this can be done by the following stochastic optimization algorithm.

    1. Initialization of the Nemirovski-Yudin method (NYM). Fix the sample size \(m\ge 1\) for the gradient estimation. Set k=1, \(y^1 = z^1 = x^\nu\).

    2. Batch estimation of the gradient \(\nabla F_{\theta_\nu}(y^k)\). Sample independent, normally distributed directions \(\eta_1,\dots,\eta_m\in\mathbb{R}^n\) and calculate the current estimate of \(\nabla F_{\theta_\nu}(y^k)\) by employing (Equation21) as
\[
\xi_{\theta_\nu}(y^k) = \frac{1}{m}\sum_{i=1}^m \frac{1}{2\theta_\nu}\bigl(F(y^k+\theta_\nu\eta_i) - F(y^k-\theta_\nu\eta_i)\bigr)\eta_i.
\]

    3. Iterations of NYM:
\[
y^{k+1} = \Pi_{[lb,ub]}\Bigl(y^k - \rho_k \frac{\xi_{\theta_\nu}(y^k)}{\|\xi_{\theta_\nu}(y^k)\| + \epsilon}\Bigr), \qquad \rho_k = \theta_\nu/k, \quad \epsilon > 0,
\]
\[
z^{k+1} = \Bigl(1 - \frac{\rho_k}{\sum_{i=1}^k \rho_i}\Bigr) z^k + \frac{\rho_k}{\sum_{i=1}^k \rho_i}\, y^{k+1},
\]
where \(\Pi_{[lb,ub]}(\cdot)\) is the projection operator onto the box [lb, ub].

    4. Stopping of NYM. Increase k by 1. If k < K, then go to step 2 (batch estimation); else continue with the Gelfand–Zetlin–Nesterov step below.

  3. Gelfand–Zetlin–Nesterov step. Set \(x^{\nu+1} = z^{k+1} + \lambda(z^{k+1} - z^k)\), \(\lambda\in[0,1)\), e.g. λ = 0.8.

  4. Transition to the next, less smoothed, step (or stop). Increase the smoothing iteration number ν by 1 and decrease the smoothing parameter, \(\theta_{\nu+1} = \alpha\theta_\nu\). If ν < N, go to step 2; else return \(x^\nu\) as an approximate solution of the optimization problem.
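The steps above can be condensed into a short sketch. This is a simplified variant, not the authors' implementation: the step rule \(\rho_k=\theta_\nu/\sqrt{k}\), the parameter values and the smooth test function below are our choices for illustration.

```python
import numpy as np

def successive_smoothing(F, lb, ub, N=8, alpha=0.5, K=30, m=20,
                         lam=0.8, eps=1e-12, seed=0):
    # Sketch of the successive smoothing local optimizer: the outer loop
    # shrinks theta by the factor alpha; the inner loop is a projected,
    # normalized stochastic-gradient run (NYM-like) with iterate averaging.
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub)                  # random starting point
    theta = float(np.max(ub - lb))           # starting smoothing parameter
    for _ in range(N):
        y = x.copy()
        z, z_prev = x.copy(), x.copy()
        rho_sum = 0.0
        for k in range(1, K + 1):
            # batch finite-difference gradient estimate, cf. (21)
            g = np.zeros_like(y)
            for _ in range(m):
                eta = rng.standard_normal(y.size)
                g += (F(y + theta * eta) - F(y - theta * eta)) / (2 * theta) * eta
            g /= m
            rho = theta / np.sqrt(k)         # step rule (our choice)
            y = np.clip(y - rho * g / (np.linalg.norm(g) + eps), lb, ub)
            rho_sum += rho
            z_prev, z = z, (1 - rho / rho_sum) * z + (rho / rho_sum) * y
        x = np.clip(z + lam * (z - z_prev), lb, ub)  # extrapolation step
        theta *= alpha
    return x
```

On a smooth convex test function the sketch settles near the minimizer as θ shrinks; the early, heavily smoothed passes move the iterate into the right region and the later passes refine it.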

The successive smoothing method described above starts with a smoother function that disregards the fine structure of the objective, in order to find the most promising regions of the search space, before gradually making the approximation more exact. The successive smoothing with gradually vanishing degree of smoothing distinguishes our method from other smoothing methods, which use a fixed smoothing for the minimization of non-smooth functions. In our method we allocate a fixed number K of iterations to minimize each smoothed function \(F_{\theta_\nu}(\cdot)\). For the particular case K=1 (and λ=0), the convergence rate of the successive smoothing method on the class of Lipschitz non-smooth convex functions was studied in [Citation55]. Roughly speaking, the rate of convergence (in function value) of the method is proportional to the range of the function on the feasible set multiplied by the factor \(n/(mN)\), where n is the dimension of the space and m is the number of finite differences used for the estimation of gradients. The method stops after N smoothing iterations. Another practical stopping rule is to stop the computations if there is no progress of the algorithm after several iterations.

For the minimization of the smoothed function \(F_\theta\) under fixed \(\theta=\theta_\nu\), one can apply any stochastic finite-difference optimization method based on the finite-difference representations (Equation20) and (Equation21) of the gradients of the smoothed function, not necessarily the NY mirror descent [Citation56]; e.g. a stochastic version of Nesterov's method [Citation57] can be used. In the described NY algorithm, the local optimization of \(F_\theta\) stops after K iterations.

One more possible stopping rule is based on estimating the gradients \(\nabla F_\theta(x)\) during the iterative optimization process. In general, this is a rather complicated and time-consuming procedure that requires the calculation of multidimensional integrals. However, asymptotically consistent estimates can be constructed in parallel with the construction of the main minimization sequence by the following so-called averaging procedure [Citation45,Citation58,Citation59].

Consider the stochastic optimization procedure
\[
x^{k+1} = x^k - \rho_k z^k, \qquad z^0 = \xi_\theta(x^0,\eta^0), \quad x^0\in\mathbb{R}^n,
\]
\[
z^{k+1} = z^k - \lambda_k\bigl(z^k - \xi_\theta(x^k,\eta^k)\bigr), \qquad k=0,1,\dots,
\]
for the iterative optimization of a function \(F_\theta(x)\) and the parallel evaluation of its gradients \(\nabla F_\theta(x)\), where the vectors \(\xi_\theta(x^k,\eta^k)\) are given by (Equation21). For the conditional expectations it holds that \(\mathbb{E}(\xi_\theta(x^k,\eta^k)\mid x^k) = \nabla F_\theta(x^k)\). Let the numbers \(\rho_k\), \(\lambda_k\) satisfy the conditions
\[
0\le\lambda_k\le 1, \quad \lim_{k\to\infty}\lambda_k = 0, \quad \sum_{k=0}^\infty \lambda_k = +\infty, \quad \sum_{k=0}^\infty \lambda_k^2 < +\infty, \quad \lim_{k\to\infty}\rho_k/\lambda_k = 0.
\]
Then, with probability one, it holds (cf. Ermoliev [Citation58, Theorem V.8]) that
\[
\|z^k - \nabla F_\theta(x^k)\| \to 0 \quad \text{as } k\to\infty.
\]
If \(\|\nabla F_{\theta_\nu}(x^\nu)\| \le \epsilon_\nu \to 0\), then, by results of [Citation43,Citation45,Citation60], the constructed sequence \(x^\nu\) asymptotically converges to the set of points satisfying necessary optimality conditions for F. Some other stopping rules for stochastic gradient methods are discussed, e.g. in [Citation61].
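The averaging procedure can be sketched and checked on a quadratic test function, for which \(\nabla F_\theta = \nabla F\) exactly, so the tracking error \(\|z^k - \nabla F_\theta(x^k)\|\) is directly observable. The step-size sequences below are our illustrative choices satisfying the stated conditions:

```python
import numpy as np

rng = np.random.default_rng(1)
c = np.array([0.5, -1.0])
F = lambda v: float(np.sum((v - c) ** 2))    # for this F: grad F_theta(x) = 2(x - c)
theta = 0.1

def xi(x):
    # the stochastic gradient estimate (21)
    eta = rng.standard_normal(2)
    return (F(x + theta * eta) - F(x - theta * eta)) / (2 * theta) * eta

x = np.array([2.0, 2.0])
z = xi(x)                                    # z^0 = xi_theta(x^0, eta^0)
for k in range(1, 20001):
    lam_k = k ** -0.7                        # sum lam = inf, sum lam^2 < inf
    rho_k = 0.01 / k                         # rho_k / lam_k -> 0
    s = xi(x)                                # sample at the current point x^k
    x, z = x - rho_k * z, z - lam_k * (z - s)
```

After the run, z tracks the current gradient 2(x − c) up to a small residual, while x slowly descends; z can therefore drive a gradient-norm stopping test at essentially no extra cost.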

Figure 2 provides a graphical illustration of the performance of the successive smoothing method on problem (Equation14).

Figure 2. Illustration of the smoothing method performance on two-asset (1,2) portfolio selection. Examples of the trajectories of the method for different discontinuous penalties. \(x_{\mathrm{ref}} = (x_1=0.3,\ x_2=0.7)\), δ = 0.05.

Global optimization issues.

The optimization problems under consideration, (Equation5)–(Equation6) and (Equation7)–(Equation8), are challenging: they are non-convex and multi-extremal, discontinuous, disconnected and constrained. So we apply a number of different techniques to tackle and solve them.

To remove discontinuous constraints, we use exact discontinuous penalty functions, and to remove structural portfolio constraints and bound constraints, we apply exact projective penalty functions, cf. [Citation1,Citation42,Citation50].

In the case of multi-extremal problems (as in Figure 3) we employ a version of the branch and bound method [Citation1] with the successive smoothing method as a local minimizer (in [Citation1], the successive quadratic approximation method was used as the local optimizer).

Figure 3. Illustration of the smoothing method performance on two-asset (4,9) portfolio selection. Example of the trajectory of the method for a non-connected feasible set. \(x_{\mathrm{ref}} = (x_4=0.3,\ x_9=0.7)\), δ = 0.05. The green point is the starting point, the red one is the final point.

To solve problem (Equation11)–(Equation17), we apply the following branch and bound (cut) algorithm acting in a box [lb,ub]Rn.

The Branch & Bound algorithm.

Initialization.

Set the initial partition \(P_0 = \{X = [lb,ub]\}\), select a random starting point \(\tilde x^0\in X\) and apply some local optimization algorithm A to the problem under consideration. As a result, we find a better point \(\bar x^0\in X\) such that \(F(\bar x^0) < F(\tilde x^0)\). Set the B&B iteration count k=0. Set tolerances ϵ>0 and δ>0. Set the integer stopping parameter \(p\ge 2\).

B&B iteration.

Suppose at iteration k we have a partition \(P_k = \{X_i : i=1,\dots,N_k\}\) of the set \(X = \cup_{i=1}^{N_k} X_i\) consisting of smaller boxes \(X_i\). For each \(X_i\), there is a known feasible point \(\bar x_i\in X_i\) and the corresponding value \(F(\bar x_i)\); the record value is \(V_k = \min_{1\le i\le N_k} F(\bar x_i)\). Set \(P_{k+1} = \emptyset\).

For each such set \(X_i\in P_k\), choose a random starting point \(\tilde x_i\) and apply some local optimization algorithm (e.g. the successive smoothing one) to the problem \(\min_{x\in X_i} F(x)\) to find a better point \(\bar{\bar x}_i\in X_i\), \(F(\bar{\bar x}_i) < F(\tilde x_i)\).

If the values \(F(\bar x_i)\) and \(F(\bar{\bar x}_i)\) are sufficiently different, say \(F(\bar x_i) - F(\bar{\bar x}_i) \ge \epsilon\), or the points \(\bar{\bar x}_i\) and \(\bar x_i\) are sufficiently distant, \(\|\bar{\bar x}_i - \bar x_i\| \ge \delta\), we subdivide the box \(X_i = X_i'\cup X_i''\) into two subboxes \(X_i'\) and \(X_i''\) so that \(\bar x_i\in X_i'\) and \(\bar{\bar x}_i\in X_i''\). In this case, the partition \(P_{k+1}\) is updated by adding the successors \(X_i'\) and \(X_i''\), i.e. \(P_{k+1} \leftarrow P_{k+1}\cup X_i'\cup X_i''\). Otherwise, if both the values \(F(\bar x_i)\), \(F(\bar{\bar x}_i)\) and the points \(\bar{\bar x}_i\), \(\bar x_i\) are close, the set \(X_i\ni\bar x_i\) goes unchanged to the updated partition, \(P_{k+1} \leftarrow P_{k+1}\cup X_i\).

When all elements \(X_i\in P_k\) are examined in this way, i.e. the new partition \(P_{k+1}\) with elements \(X_i\), \(i=1,\dots,N_{k+1}\), and points \(\bar x_i\in X_i\) has been constructed, we update the achieved record value, \(V_{k+1} = \min_{1\le i\le N_{k+1}} F(\bar x_i)\).

Check for the stop.

If there is no progress of the B&B method during p B&B iterations, i.e. \(V_{k+1} = V_{k+1-p}\), then stop; otherwise, repeat the B&B iteration.

Remark 5.5

The described B&B aims at subdividing a non-convex multiextremal problem into sub-problems, each containing only one local minimum. To prevent the method from stopping early, two mechanisms are used. First, the stopping parameter \(p\ge 2\) is introduced: the method stops only after p unproductive B&B iterations. In all numerical experiments, p=5 was sufficient to find the global minimum. Second, for each new run of the local minimizer, a random starting point is used, which increases the probability of finding a new local minimum of the sub-problem.

Let us emphasize that as the local optimization algorithm within the B&B scheme, one can apply any reasonable (even heuristic) algorithm that allows improving the current solution. In particular, in the numerical experiments we also used well-implemented off-the-shelf optimization algorithms, e.g. a sequential quadratic approximation algorithm, which formally is not applicable to the considered problem, but which quickly finds local optima and thus considerably speeds up the overall optimization procedure.

The function values \(F(\bar x_i)\) of the objective provide upper bounds for the optimal values \(F_i^* = \min_{x\in X_i} F(x)\). If lower bounds \(L_i\le F_i^*\) were known, then the subsets \(X_i\in P_k\) such that \(L_i\ge V_k\) could be safely ignored, i.e. excluded from the current partition \(P_k\). Heuristically, if some set \(X_i\) remains unchanged during several (say p) B&B iterations, it can be ignored or rarely examined in future iterations. Further results on the B&B algorithm described above are available in [Citation1].
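A compressed sketch of the B&B scheme, with a crude random local search standing in for the smoothing local optimizer; the splitting rule, tolerances and the multi-extremal test function are illustrative choices, not the paper's implementation:

```python
import numpy as np

def local_opt(F, lo, hi, rng, iters=200):
    # crude random local search in a box; a stand-in for the smoothing method
    x = rng.uniform(lo, hi)
    fx = F(x)
    step = (hi - lo) / 5.0
    for _ in range(iters):
        y = np.clip(x + rng.normal(0.0, step), lo, hi)
        fy = F(y)
        if fy < fx:
            x, fx = y, fy
        step = step * 0.98
    return x, fx

def branch_and_bound(F, lb, ub, eps=1e-3, delta=5e-2, p=3, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    parts = [((lb, ub), local_opt(F, lb, ub, rng))]
    record, stall = parts[0][1][1], 0
    while stall < p:
        nxt = []
        for (lo, hi), (x_old, f_old) in parts:
            x_new, f_new = local_opt(F, lo, hi, rng)
            if f_old - f_new >= eps or np.linalg.norm(x_new - x_old) >= delta:
                j = int(np.argmax(hi - lo))          # split the widest side
                mid = 0.5 * (lo[j] + hi[j])
                h2, l2 = hi.copy(), lo.copy()
                h2[j], l2[j] = mid, mid
                for lo2, hi2 in ((lo, h2), (l2, hi)):
                    pts = [(x, f) for (x, f) in ((x_old, f_old), (x_new, f_new))
                           if lo2[j] <= x[j] <= hi2[j]]
                    best = min(pts, key=lambda t: t[1]) if pts \
                        else local_opt(F, lo2, hi2, rng)
                    nxt.append(((lo2, hi2), best))
            else:
                nxt.append(((lo, hi), min([(x_old, f_old), (x_new, f_new)],
                                          key=lambda t: t[1])))
        parts = nxt
        new_record = min(f for _, (_, f) in parts)
        stall = 0 if record - new_record > eps else stall + 1
        record = min(record, new_record)
    return min((xf for _, xf in parts), key=lambda t: t[1])
```

On a one-dimensional multi-extremal test function such as sin(5x) + 0.1x² on [−3, 3], the scheme subdivides the box until each part holds essentially one basin and returns a point in one of the deepest basins.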

6. Numerical illustration

For the numerical illustration of the proposed algorithm we return to portfolio optimization under 1st-order stochastic dominance constraints. We use a small data set of annual returns of nine US companies from [Citation62, Table 1, page 13], provided in the appendix.

6.1. Testing the successive smoothing method on the discontinuous portfolio optimization problems

First, let us illustrate the proposed approach on a two-dimensional portfolio (the first two columns of the table from the appendix) by solving problem (Equation5)–(Equation6). For this we first fix some initial reference portfolio, \(x_{\mathrm{ref}} = (0.7, 0.3)\), with average return \(\mu_{\mathrm{ref}} = 0.0629\) and corresponding CDF \(F_{\mathrm{ref}}(t)\). Then, we relax the constraint (Equation6) by replacing \(F_{\mathrm{ref}}(t)\) with the shifted function \(F_{\mathrm{ref}}(t-\delta)\), where δ = 0.05 independently of t. With this new reference function, we solve problem (Equation13) with different values of the parameter c ∈ {0.0659, 1.0659} by the stochastic smoothing method from Subsection 5.2.

Figure 2 presents the results. The pictures illustrate how the method climbs to the global maximum of the discontinuous objective function. In the two presented examples, the method finds better portfolios with average return μ ≈ 0.0640. The right picture also highlights the set of feasible portfolios, because by setting a larger c we decrease the values of the penalized function at the infeasible points. It can be seen that the proposed version of the smoothing method is not very sensitive to discontinuities of the minimized function.

One more example is presented in Figure 3. The reference 2-security portfolio is \(x_{\mathrm{ref}} = (x_4=0.3,\ x_9=0.7)\), δ = 0.05. In this example, the feasible set is not connected. The optimal portfolio consists of the two assets, \(x_{4,9} = (x_4=0.8779,\ x_9=0.1219)\), with expected return \(\mu(x_{4,9}) = 0.1664\). If we extend the portfolio to nine assets, the proposed method obtains the better return \(\mu_{1:9} = 0.1724\) with the optimal portfolio x = (0.0117, 0.0131, 0.0730, 0.2936, 0.4619, 0.0123, 0.0080, 0.0324, 0.0880), \(\sum_i x_i = 0.9941\), within the same bounds on risk, G(x) = 0.

6.2. Lower bounding the risk-return profile and maximizing the tail return

This subsection presents results of the optimization of 3- and 10-component portfolios under 1st-order stochastic dominance constraints. The constraint is given by its (reference) CDF, and as objective functions we employ the average value of the portfolio return (AV), the Value-at-Risk (quantile, V@Rα) and the Average Value-at-Risk (AV@Rα) indicators with levels α = 40% and α = 70%. Results are provided in tabular and graphical form; each set of experiments is specified by the number of portfolio components (3 or 10), the parameter α (40% or 70%), and the exact penalty method applied (discontinuous or projective, from Subsections 4.3 and 4.4).

Each table contains three numerical rows corresponding to three kinds of the objective functions:

  1. the mean,

  2. the V@R, and

  3. the AV@R.

The rows ‘Mean’, ‘V@R’ and ‘AV@R’ show the objective values of the corresponding indicators for the three optimal portfolios. The other columns show the structure of the obtained portfolios. Each table is supplemented by a reference figure containing three graphs displaying the optimal risk profile. The blue broken line in a graph depicts the reference CDF, a (left) bound on the portfolio return, CDF([0.05; 0.05; 0.1; 0.11; 0.125])=[0; 0.2; 0.4; 0.6; 1]. The red (right) broken line shows the CDF of the actual optimal portfolio return. So the lines display the reference and the actual risk profiles of the optimal portfolios.
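The feasibility behind these risk-profile plots is a pointwise CDF comparison: the optimal portfolio's CDF must not exceed the reference CDF anywhere. On return samples this check takes only a few lines (function and variable names are ours, for illustration):

```python
import numpy as np

def fsd_feasible(port_returns, ref_returns, tol=1e-12):
    # 1st-order dominance feasibility on samples: the empirical CDF of the
    # portfolio return must lie below (right of) the reference CDF everywhere
    port = np.asarray(port_returns, float)
    ref = np.asarray(ref_returns, float)
    grid = np.union1d(port, ref)            # all jump points of both CDFs
    cdf = lambda s: np.mean(s[:, None] <= grid[None, :], axis=0)
    return bool(np.all(cdf(port) <= cdf(ref) + tol))
```

Shifting every return up by δ preserves feasibility, mirroring the relaxed reference profile \(F_{\mathrm{ref}}(t-\delta)\) used in the experiments.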

Tables 1 and 2 (and the corresponding Figures 4 and 5) compare the results of solving the portfolio optimization problems by means of the discontinuous and the projective penalty methods, respectively, for α = 40%. As can be seen, Figures 4 and 5 are very similar, which indicates that both penalty methods are applicable and give close results.

Figure 4. Profiles of optimal 3-component portfolios: maximizing the tail returns under the risk-return lower bound. Discontinuous penalties. (1) Optimal average return. (2) Optimal V@R40%. (3) Optimal AV@R40%; cf. Table 1.

Figure 5. Profiles of optimal 3-component portfolios: maximizing the tail returns under the risk-return lower bound. Analytical projective penalties. (1) Optimal average return. (2) Optimal V@R0.4. (3) Optimal AV@R0.4. See Table 2.

Table 1. Optimal 3 component portfolios, discontinuous penalties: (1) Max mean; (2) Max V@R; (3) Max AV@R.

Table 2. Optimal 3 component portfolios, analytical projection: (1) Max mean. (2) Max V@R40%. (3) Max AV@R40%.

The next two tables, Tables 3 and 4 (and the corresponding Figures 6 and 7), show the effect of extending a portfolio with new securities, from 3 to 10 components, for α = 70%. The objective function values in Table 4 are greater than the corresponding values in Table 3. The corresponding Figures 6 and 7 indicate the changes in the risk profiles of the optimal portfolios due to this enlargement. The increase of the objective functions also comes from the risk profiles pressing closer to the reference ones. The pictures also show the influence of the different objective functions on the risk profiles of the optimal portfolios. Finally, Table 5 shows the structure of the optimal 10-component portfolios.

Figure 6. Profiles of optimal 3-component portfolios: maximizing the tail returns under the risk-return lower bound. Analytical projective penalties. (1) Optimal average return. (2) Optimal V@R70%. (3) Optimal AV@R70%. See Table 3.

Figure 7. Profiles of optimal 10-component portfolios: maximizing the tail returns under the risk-return lower bound. Analytical projective penalties. (1) Optimal average return; (2) optimal V@R70%; (3) optimal AV@R70%; cf. Tables 4 and 5.

Table 3. Optimal 3 component portfolios, analytical projection: (1) Max mean. (2) Max V@R. (3) Max AV@R.

Table 4. Optimal 10 component portfolios, analytical projection: (1) Max mean. (2) Max V@R. (3) Max AV@R.

Table 5. The optimal 10 component portfolio.

6.3. Summary of numerical experience

The concrete computational results of the numerical experiments depend on the following settings. We used

  1. three different objective functions: the mean value, the quantile and the average quantile functions at different risk levels;

  2. two kinds of penalty functions, discontinuous and projective ones (the latter is applied when there is an internal risk-free portfolio);

  3. a specific heuristic stochastic branch and bound algorithm with adjustable parameters;

  4. several local optimization algorithms: the Nemirovski-Yudin and Nesterov finite-difference stochastic optimization algorithms, and also a deterministic sequential quadratic programming one;

  5. random starting points and adjustable parameters for local optimization algorithms.


The problems under consideration can have many local optima close to the global optimum, so the combined B&B algorithms may get stuck at different deep local optima and may have different random running times (from several seconds to several minutes). To validate the global optimum, it is necessary to change the parameters of the algorithm and re-run it.

It was observed that the combined algorithm with Nesterov's local optimization method works better than the one combined with Nemirovski-Yudin's method. Besides, the combined heuristic B&B algorithm that uses a well-implemented deterministic sequential quadratic approximation algorithm (sqp from Matlab's Optimization Toolbox) as a local optimizer works much faster (10 times or more) than the stochastic optimization algorithms (seconds against minutes).

7. Conclusions

The paper considers a specific optimization method, which transforms a constrained optimization problem into an unconstrained global optimization problem. The paper illustrates the procedure for an optimization problem with uncountably many constraints. More specifically, the paper considers financial portfolio optimization under 1st-order stochastic dominance constraints.

The designed optimization techniques are aimed at interactively solving the portfolio reshaping problem, i.e. the interactive adaptation of the portfolio risk profile to the portfolio manager's preferences by changing risk bounds and optimizing different risk criteria.

In the literature, similar portfolio optimization problems are mostly considered under 2nd-order stochastic dominance constraints, which lead to convex problems. The few exceptions include [Citation4,Citation5,Citation27,Citation33,Citation37,Citation63]. The 1st-order constraints put lower bounds on the risk profile (CDF) of the optimized portfolio. As the objective function, different aggregated indicators can serve, e.g. the expected value, the Value-at-Risk, or the Average Value-at-Risk. In this setting, we put lower bounds on low returns and try to maximize higher returns.
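For discrete return distributions, the 1st-order dominance constraint F_portfolio(t) <= F_reference(t) for all t can be verified on the finite union of scenario returns, and its maximal violation can feed a penalty term. The sketch below (with illustrative names; it is not the paper's exact penalty construction) computes that violation from empirical CDFs:

```python
import numpy as np

def fsd_violation(portfolio_returns, reference_returns):
    """Maximal violation of the 1st-order stochastic dominance constraint
    F_portfolio(t) <= F_reference(t), evaluated on the finite union of
    scenario returns; 0 means the portfolio dominates the reference."""
    grid = np.union1d(portfolio_returns, reference_returns)
    sp = np.sort(np.asarray(portfolio_returns, dtype=float))
    sr = np.sort(np.asarray(reference_returns, dtype=float))
    # empirical CDFs: fraction of scenarios with return <= t
    F_p = np.searchsorted(sp, grid, side="right") / sp.size
    F_r = np.searchsorted(sr, grid, side="right") / sr.size
    return max(0.0, float(np.max(F_p - F_r)))
```

A penalized objective along these lines reads f(x) + M * fsd_violation(r(x), r_ref) for a sufficiently large M, which is the general shape of the exact penalty approach, although the paper's penalty functions are constructed differently.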

Such constraints make the problem non-convex and hard for numerical treatment. We propose new exact penalty functions to handle the constraints, a new stochastic B&B algorithm, and new stochastic optimization (smoothing) techniques for solving the penalty problems. The approach is numerically and graphically illustrated on small test examples. The advantage of the proposed approach to financial portfolio optimization consists in the additional visual control of the risk profile of the optimal portfolio.

The proposed B&B algorithm is well-suited for effective parallelization (parallel implementation of the B&B iteration) and thus has much room for acceleration and scaling; this may be a topic for future research.

Disclosure statement

This study builds upon publicly available data collected in Table 9. The corresponding data, codes, and tests are available at the first author's ResearchGate web-page. The authors have no conflicts of interest to disclose.

Additional information

Funding

The authors gratefully acknowledge funding by the Volkswagenstiftung (Volkswagen Foundation), the National Research Fund of Ukraine (Project ID 2020.02/0121), and the DFG, German Research Foundation (Project ID 416228727, SFB 1410).

References

  • Norkin VI. The projective exact penalty method for general constrained optimization. Preprint. Kyiv: V. M. Glushkov Institute of Cybernetics; 2022.
  • Dentcheva D, Ruszczyński A. Optimization with stochastic dominance constraints. SIAM J Optim. 2003;14:548–566. doi: 10.1137/s1052623402420528
  • Dentcheva D, Ruszczyński A. Portfolio optimization with stochastic dominance constraints. J Bank Financ. 2006;30:433–451. doi: 10.1016/j.jbankfin.2005.04.024
  • Noyan N, Rudolf G, Ruszczyński A. Relaxations of linear programming problems with first order stochastic dominance constraints. Oper Res Lett. 2006;34:653–659. doi: 10.1016/j.orl.2005.10.004
  • Noyan N, Ruszczyński A. Valid inequalities and restrictions for stochastic programming problems with first order stochastic dominance constraints. Math Program. 2008;114:249–275. doi: 10.1007/s10107-007-0100-1
  • Shapiro A, Dentcheva D, Ruszczyński A. Lectures on stochastic programming. 3rd ed. SIAM; 2021. (MOS-SIAM Series on Optimization). doi: 10.1137/1.9781611976595
  • Gutjahr WJ, Pichler A. Stochastic multi-objective optimization: a survey on non-scalarizing methods. Ann Oper Res. 2013;236(2):1–25. doi: 10.1007/s10479-013-1369-5
  • Markowitz HM. Portfolio selection. J Finance. 1952;7(1):77–91. doi: 10.2307/2975974
  • Roy AD. Safety first and the holding of assets. Econometrica. 1952;20:431–449. doi: 10.2307/1907413
  • Artzner P, Delbaen F, Eber J-M, et al. Coherent measures of risk. Math Financ. 1999;9:203–228. doi: 10.1111/mafi.1999.9.issue-3
  • Benati S, Rizzi R. A mixed integer linear programming formulation of the optimal mean/value-at-risk portfolio problem. Eur J Oper Res. 2007;176:423–434. doi: 10.1016/j.ejor.2005.07.020
  • Gaivoronski AA, Pflug GC. Finding optimal portfolios with constraints on value-at-risk. In: Green B, editor. Proceedings of the Third International Stockholm Seminar on Risk Behaviour and Risk Management. Stockholm University; 1999.
  • Gaivoronski AA, Pflug GC. Value at risk in portfolio optimization: properties and computational approach. J Risk. 2005;7:1–31. doi: 10.21314/JOR.2005.106
  • Kataoka S. A stochastic programming model. Econometrica. 1963;31:181–196. doi: 10.2307/1910956
  • Kibzun AI, Kan YS. Stochastic programming problems with probability and quantile functions. Chichester: John Wiley & Sons; 1996.
  • Kibzun AI, Naumov AV, Norkin VI. On reducing a quantile optimization problem with discrete distribution to a mixed integer programming problem. Autom Remote Control. 2013;74:951–967. doi: 10.1134/S0005117913060064
  • Kirilyuk V. Risk measures in stochastic programming and robust optimization problems. Cybern Syst Anal. 2015;51:874–885. doi: 10.1007/s10559-015-9780-3
  • Luedtke J, Ahmed S, Nemhauser G. An integer programming approach for linear programs with probabilistic constraints. Math Program. 2010;122:247–272. doi: 10.1007/s10107-008-0247-4
  • Norkin VI, Boyko SV. Safety-first portfolio selection. Cybern Syst Anal. 2012;48:180–191. doi: 10.1007/s10559-012-9396-9
  • Norkin VI, Kibzun AI, Naumov AV. Reducing two-stage probabilistic optimization problems with discrete distribution of random data to mixed-integer programming problems. Cybern Syst Anal. 2014;50:679–692. doi: 10.1007/s10559-014-9658-9
  • Pflug GC, Römisch W. Modeling, measuring and managing risk. River Edge (NJ): World Scientific; 2007. doi: 10.1142/9789812708724
  • Prekopa A. Stochastic programming. Dordrecht: Kluwer Academic Publishers; 1995.
  • Sen S. Relaxation for probabilistically constrained programs with discrete random variables. Oper Res Lett. 1992;11:81–86. doi: 10.1016/0167-6377(92)90037-4
  • Telser LG. Safety first and hedging. Rev Econ Stud. 1955/56;23:1–16. doi: 10.2307/2296146
  • Wozabal D, Hochreiter R, Pflug GC. A D. C. formulation of value-at-risk constrained optimization. Optimization. 2010;59:377–400. doi: 10.1080/02331931003700731
  • Rockafellar RT, Uryasev S. Optimization of conditional value-at-risk. J Risk. 2000;2(3):21–41. doi: 10.21314/JOR.2000.038
  • Dentcheva D, Ruszczyński A. Risk preferences on the space of quantile functions. Math Program Ser B. 2013;148:181–200. doi: 10.1007/s10107-013-0724-2
  • Müller A, Stoyan D. Comparison methods for stochastic models and risks. Chichester: John Wiley & Sons; 2002.
  • Ogryczak W, Ruszczyński A. From stochastic dominance to mean-risk models: semideviations as risk measures. Eur J Oper Res. 1999;116:33–50. doi: 10.1016/S0377-2217(98)00167-2
  • Ogryczak W, Ruszczyński A. On consistency of stochastic dominance and mean–semideviation models. Math Program Ser B. 2001;89:217–232. doi: 10.1007/s101070000203
  • Ogryczak W, Ruszczyński A. Dual stochastic dominance and related mean-risk models. SIAM J Optim. 2002;13(1):60–78. doi: 10.1137/S1052623400375075
  • Dentcheva D, Ruszczyński A. Common mathematical foundations of expected utility and dual utility theories. SIAM J Optim. 2013;23(1):381–405. doi: 10.1137/120868311
  • Dai H, Xue Y, He N, et al. Learning to optimize with stochastic dominance constraints. In: International Conference on Artificial Intelligence and Statistics. PMLR; 2023. p. 8991–9009.
  • Dentcheva D, Ruszczyński A. Portfolio optimization with risk control by stochastic dominance constraints. In: Infanger G, editor. Stochastic programming. The state of the art. In honor of George B. Dantzig. New York: Springer; 2011. p. 189–212.
  • Fábián CI, Mitra G, Roman D, et al. Portfolio choice models based on second-order stochastic dominance measures: an overview and a computational study. In: Bertocchi M, Consigli G, Dempster MAH, editors. Stochastic optimization methods in finance and energy. New York: Springer; 2011. p. 441–470. (International Series in Operations Research & Management Science). ISBN 978-1-4419-9586-5. doi: 10.1007/978-1-4419-9586-5_18
  • Dentcheva D, Ruszczyński A. Risk preferences on the space of quantile functions. Math Program. 2014;148(1-2):181–200. doi: 10.1007/s10107-013-0724-2
  • Dentcheva D, Ruszczyński A. Semi-infinite probabilistic optimization: first order stochastic dominance constraints. Optimization. 2004;53:583–601. doi: 10.1080/02331930412331327148
  • Dentcheva D, Henrion R, Ruszczyński A. Stability and sensitivity of optimization problems with first order stochastic dominance constraints. SIAM J Optim. 2007;18:322–337. doi: 10.1137/060650118
  • Frydenberg S, Sønsteng Henriksen TE, Pichler A, et al. Can commodities dominate stock and bond portfolios? Ann Oper Res. 2019;282(1-2):155–177. doi: 10.1007/s10479-018-2996-7
  • Batukhtin V. On solving discontinuous extremal problems. J Optim Theory Appl. 1993;77:575–589. doi: 10.1007/BF00940451
  • Knopov P, Norkin V. Stochastic optimization methods for the stochastic storage process control. In: Blondin MJ, et al., editors. Intelligent control and smart energy management. Springer Optimization and Its Applications 181; 2022. p. 79–111. doi: 10.1007/978-3-030-84474-5_3
  • Norkin VI. A stochastic smoothing method for nonsmooth global optimization. Cybernetics and Computer Technologies. 2020:5–14. doi: 10.34229/2707-451X.20.1.1
  • Ermoliev YM, Norkin VI, Wets RJ-B. The minimization of semicontinuous functions: mollifier subgradients. SIAM J Control Optim. 1995;33:149–167. doi: 10.1137/S0363012992238369
  • Mayne D, Polak E. On solving discontinuous extremal problems. J Optim Theory Appl. 1984;43:601–613. doi: 10.1007/BF00935008
  • Mikhalevich VS, Gupal AM, Norkin VI. Methods of nonconvex optimization. Moscow: Nauka; 1987. (In Russian).
  • Nesterov Y, Spokoiny V. Random gradient-free minimization of convex functions. Found Comput Math. 2017;17:527–566. doi: 10.1007/s10208-015-9296-2
  • Norkin V, Pflug GC, Ruszczyński A. A branch and bound method for stochastic global optimization. Math Program. 1998;83:425–450. doi: 10.1007/bf02680569
  • Aubin J-P, Ekeland I. Applied nonlinear analysis. New York: John Wiley and Sons; 1984.
  • Norkin VI. On measuring and profiling catastrophic risks. Cybern Syst Anal. 2006;42(6):839–850. doi: 10.1007/s10559-006-0124-1
  • Galvan G, Sciandrone M, Eucidi S. A parameter-free unconstrained reformulation for nonsmooth problems with convex constraints. Comput Optim Appl. 2021;80:33–53. doi: 10.1007/s10589-021-00296-1
  • Rockafellar RT, Wets RJ-B. Variational analysis. Berlin: Springer; 2009 (1st ed. 1998, 3rd printing). (Grundlehren der mathematischen Wissenschaften). ISBN 9783540627722. doi: 10.1007/978-3-642-02431-3
  • Ermoliev Y, Norkin V. On constrained discontinuous optimization. In: Stochastic Programming Methods and Technical Applications: Proceedings of the 3rd GAMM/IFIP-Workshop on ‘Stochastic Optimization: Numerical Methods and Technical Applications’ held at the Federal Armed Forces University Munich, Neubiberg/München, Germany, June 17–20, 1996, Springer: Berlin, Heidelberg; 1998; p. 128–144.
  • Gupal AM, Norkin VI. Algorithm for the minimization of discontinuous functions. Cybernetics. 1977;13(2):220–223. doi: 10.1007/BF01073313
  • Rockafellar RT, Wets RJ-B. Variational analysis. Berlin: Springer; 1998. Available at https://books.google.com/books?id=w-NdOE5fD8AC. doi: 10.1007/978-3-642-02431-3
  • Norkin V, Pichler A, Kozyriev A. Constrained global optimization by smoothing. arXiv; 2023. doi: 10.48550/arXiv.2308.08422
  • Nemirovsky A, Yudin D. Problem complexity and method efficiency in optimization. New York: J. Wiley & Sons; 1983.
  • Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Math Dokl. 1983;27(2):372–376.
  • Ermoliev YM. Methods of stochastic programming. Moscow: Nauka; 1976. (In Russian).
  • Gupal AM. Stochastic methods for minimization of nondifferentiable functions. Autom Remote Control. 1979;4(40):529–534.
  • Polyak BT. Introduction to optimization. New York: Optimization Software, Inc. Publications Division; 1987.
  • Pflug GC. Stepsize rules, stopping times, and their implementation in stochastic quasigradient algorithms. In: Ermoliev Y, Wets RJ-B, editors. Numerical techniques for stochastic optimization. Berlin: Springer-Verlag; 1988. p. 353–372.
  • Markowitz HM. Portfolio selection: efficient diversification of investments. New York: John Wiley & Sons, Chapman & Hall; 1959.
  • Luedtke J. New formulations for optimization under stochastic dominance constraints. SIAM J Optim. 2008;19(3):1433–1450. doi: 10.1137/070707956

Appendix

Table A1. Return data set from [Citation62, Table 1, page 13], with artificial bond column.