Research Article

On pricing of discrete Asian and Lookback options under the Heston model

Received 23 Nov 2022, Accepted 27 May 2024, Published online: 26 Jun 2024

Abstract

We propose a new, data-driven approach for the efficient pricing of – fixed- and floating-strike – discrete arithmetic Asian and Lookback options when the underlying process is driven by the Heston model dynamics. The method proposed in this article constitutes an extension of Perotti and Grzelak [Fast sampling from time-integrated bridges using deep learning, J. Comput. Math. Data Sci. 5 (2022)], where the problem of sampling from time-integrated stochastic bridges was addressed. The model relies on the Seven-League scheme [S. Liu et al., The seven-league scheme: Deep learning for large time step Monte Carlo simulations of stochastic differential equations, Risks 10 (2022), p. 47], where artificial neural networks are employed to ‘learn’ the distribution of the random variable of interest utilizing stochastic collocation points [L.A. Grzelak et al., The stochastic collocation Monte Carlo sampler: Highly efficient sampling from expensive distributions, Quant. Finance 19 (2019), pp. 339–356]. The method results in a robust procedure for Monte Carlo pricing. Furthermore, semi-analytic formulae for option pricing are provided in a simplified, yet general, framework. The model guarantees high accuracy and a reduction of the computational time by up to thousands of times compared to classical Monte Carlo pricing schemes.


1. Introduction

A non-trivial problem in the financial field is the pricing of path-dependent derivatives, such as Asian and Lookback options. The payoffs of such derivatives are expressed as functions of the underlying process monitored over the life of the option. The monitoring can be either continuous or discrete. Depending on the specification of the underlying dynamics, only a few case-specific theoretical formulae exist. For example, under the Black-Scholes and Merton log-normal dynamics, closed-form formulae were derived for continuously-monitored geometric Asian options (see, for instance, Devreese et al. [Citation8]). In the same model framework, Goldman et al. [Citation10] and Conze [Citation6] derived analytic formulae for the continuously-monitored Lookback option, using probabilistic arguments such as the reflection principle. Options whose payoffs are discretely-monitored, however, are less tractable analytically, and so approximations have been developed, as for the discrete Lookback option under lognormal dynamics [Citation15]. Furthermore, the pricing task becomes even more challenging under stochastic volatility frameworks, where no applicable closed-form theoretical solutions are known.

Whenever an exact theoretical pricing formula is not available, a rich literature on numerical methods and approximations exists. The three main classes of approaches are Monte Carlo (MC) methods (e.g. [Citation16]), partial differential equations (PDEs) techniques (see the extensive work in [Citation27,Citation28]), and Fourier-inversion based techniques (among many relevant works, we report [Citation3,Citation31]).

Monte Carlo methods are by far the most flexible approaches, since they require no particular assumption on either the underlying dynamics or the targeted payoff. Furthermore, they benefit from a straightforward implementation based on the discretization of the time horizon, as in the well-known Euler-Maruyama scheme. The cost to be paid, however, is typically a significant computational time to obtain accurate results. PDE approaches are more problem-specific, since they require the derivation of the partial differential equation which describes the evolution of the option value over time; the PDE is then usually solved using finite difference methods. Fourier-inversion-based techniques exploit the relationship between the probability density function (PDF) and the characteristic function (ChF) to recover the underlying transition density by means of the Fast Fourier Transform (FFT). Thanks to the speed of the FFT algorithm, such methods allow high-speed numerical evaluation, but they are often problem-specific, depending on the underlying dynamics. A relevant example is [Citation7], where a numerical method is proposed for the pricing of discrete arithmetic Asian options under stochastic volatility model dynamics. In the same group of techniques, we also refer to [Citation19], where a unified framework for the pricing of discrete Asian options is described, allowing for regime-switching jump diffusion underlying dynamics. A further example is [Citation21], where a close approximation of the classic Heston model is proposed – via the CTMC model – which allows for a general SWIFT-based (see [Citation20]) pricing approach with application to exotic derivatives, such as discrete Asian options.

In this article, we propose a data-drivenFootnote1 extension to MC schemes that allows for efficient pricing of discretely-monitored Asian and Lookback options, without losing the flexibility typical of MC methods. We develop the methodology in the complex framework of the stochastic volatility model of Heston [Citation14], with extensive application to the case where the Feller condition is not satisfied. Under these dynamics, we show how to price fixed- or floating-strike discrete Asian and Lookback options. Moreover, the pricing model is also applied to the challenging task of pricing options with both a fixed- and a floating-strike component. We underline that the strengths of the method are its speed and accuracy, coupled with significant flexibility. The procedure is, indeed, independent of the underlying dynamics (it could be applied, for instance, to any stochastic volatility model), and it is not sensitive to the targeted payoff.

Inspired by the works in [Citation22,Citation25], the method relies on the technique of Stochastic Collocation (SC) [Citation12], which is employed to accurately approximate the targeted distribution by means of piecewise polynomials. Artificial neural networks (ANNs) are used for fast recovery of the coefficients which uniquely determine the piecewise approximation. Given these coefficients, the pricing can be performed in an ‘MC fashion’, sampling from the target (approximated) distribution and computing the numerical average of the discounted payoffs. Furthermore, in a simplified setting, we provide a semi-analytic formula that allows options to be priced directly, without sampling from the desired distribution. In both situations (MC and semi-analytic pricing) we report a significant computational speed-up, without affecting the accuracy of the result, which remains comparable with that of expensive MC methods.

The remainder of the paper is as follows. In Section 2, we formally define discrete arithmetic Asian and Lookback options, as well as the model framework for the underlying process. Then, in Section 3, the pricing model is described. Two different cases are considered, in increasing order of complexity, to handle efficiently both unconditional sampling (Section 3.2) and conditional sampling (Section 3.3) for the pricing of discrete arithmetic Asian and Lookback options. Section 4 provides theoretical support for the given numerical scheme. The quality of the methodology is also inspected empirically with several numerical experiments, reported in Section 5. Section 6 concludes.

2. Discrete arithmetic Asian and Lookback options

In a generic setting, given the present time $t_0 \geq 0$, the payoff at time $T > t_0$ of a discrete arithmetic Asian or Lookback option, with underlying process $S(t)$, can be written as:
$$H_\omega(T;S) = \max\big(\omega\,(A(S) - K_1 S(T) - K_2),\, 0\big), \quad (1)$$
where $A(S) \equiv A(S(t);\, t_0 < t \leq T)$ is a deterministic function of the underlying process $S$, the constants $K_1, K_2 \geq 0$ control the floating- and fixed-strike components of the option, and $\omega = \pm 1$. In particular, discrete arithmetic Asian and Lookback options are obtained by setting the quantity $A(S)$ respectively as follows:
$$A(S) := \frac{1}{N}\sum_{n \in I} S(t_n), \qquad A(S) := \omega \max_{n \in I}\, \omega S(t_n), \quad (2)$$
with $I = \{1,\dots,N\}$ a set of indexes, $t_1 < t_2 < \dots < t_N = T$ a discrete set of future monitoring dates, and $\omega$ as in (1).
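To make the conventions in (1) and (2) concrete, the following minimal Python sketch (illustrative only; the function names are ours) evaluates $A(S)$ and the payoff from a path monitored at the dates $t_1, \dots, t_N$:

```python
import numpy as np

def A_asian(S_monitored):
    # Arithmetic average over the N monitoring dates, first case of Eq. (2).
    return np.mean(S_monitored)

def A_lookback(S_monitored, omega):
    # omega * max_n(omega * S(t_n)): the running maximum for omega = +1,
    # the running minimum for omega = -1, second case of Eq. (2).
    return omega * np.max(omega * np.asarray(S_monitored))

def payoff(A_value, S_T, K1, K2, omega):
    # Discrete Asian/Lookback payoff of Eq. (1).
    return max(omega * (A_value - K1 * S_T - K2), 0.0)
```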

Note that in both the discrete arithmetic Asian and the Lookback case, $A(S)$ is expressed as a deterministic transformation of the underlying process' path, which is the only requirement for applying the proposed method. Therefore, in the paper, we always refer to the class of discrete arithmetic Asian options, often just called Asian options, for simplicity; the theory, however, holds for both classes of products. In fact, the pricing model applies to any product with a path-dependent European-type payoff, requiring only a different definition of $A(S)$.

2.1. Pricing of arithmetic Asian options and Heston framework

This section focuses on the risk-neutral pricing of arithmetic – fixed- and floating-strike – Asian options, whose payoff is given in (1) with $A(S)$ in (2). By setting $K_1 = 0$ or $K_2 = 0$, we obtain two special cases: the fixed- and the floating-strike arithmetic Asian option. From Equation (1), for $K_1 = 0$, the simplified payoff of a fixed-strike arithmetic Asian option reads:
$$H^{fx}_\omega(T;S) = \max\big(\omega\,(A(S) - K_2),\, 0\big), \quad (3)$$
with corresponding risk-neutral present value:
$$V^{fx}_\omega(t_0) = \frac{M(t_0)}{M(T)}\,\mathbb{E}^{\mathbb{Q}}_{t_0}\!\left[\max\big(\omega\,(A(S) - K_2),\, 0\big)\right], \quad (4)$$
where we assume the money-savings account $M(t)$ to be defined through the deterministic dynamics $dM(t) = r M(t)\,dt$, with constant interest rate $r \geq 0$. For $K_2 = 0$, instead, the payoff in Equation (1) becomes the one of a floating-strike arithmetic Asian option:
$$H^{fl}_\omega(T;S) = \max\big(\omega\,(A(S) - K_1 S(T)),\, 0\big). \quad (5)$$
The payoff in (5) is less tractable than the one in (3) because of the presence of two dependent stochastic unknowns, namely $S(T)$ and $A(S)$. However, a representation similar to the one in Equation (4) can be achieved, allowing for a unique pricing approach in both cases. By a change of measure from the risk-neutral measure $\mathbb{Q}$ to the measure $\mathbb{Q}^S$ associated with the numéraire $S(t)$, i.e. the stock measure, we prove the following proposition.

Proposition 2.1

Pricing of floating-strike arithmetic Asian option under the stock measure

Under the stock measure $\mathbb{Q}^S$, the value at time $t_0 \geq 0$ of an arithmetic floating-strike Asian option, with maturity $T > t_0$ and future monitoring dates $t_n$, $n \in \{1,\dots,N\}$, reads:
$$V^{fl}_\omega(t_0) = S(t_0)\,\mathbb{E}^{S}_{t_0}\!\left[\max\big(\omega\,(A^{fl}(S) - K_1),\, 0\big)\right], \quad (6)$$
with $A^{fl}(S)$ defined as:
$$A^{fl}(S) := A\!\left(\frac{S(\cdot)}{S(T)}\right) = \frac{A(S)}{S(T)},$$
where $A(S)$ is defined in (2).

Proof.

For a proof, see Appendix A.1.

Both representations, (4) and (6), can be treated in the same way. This means that the present value of both a fixed- and a floating-strike Asian option can be computed similarly, as stated in the following proposition (see, e.g. [Citation13]).

Proposition 2.2

Symmetry of fixed- and floating-strike Asian option present value

Let us consider the process $S(t)$ and the money-savings account $M(t)$, for $t \geq t_0$. Then, the same representation holds for the value at time $t_0$ of both fixed- and floating-strike Asian options, with maturity $T > t_0$, underlying process $S(t)$, and future monitoring dates $t_n$, $n \in \{1,\dots,N\}$. The present value is given by:
$$V^{\lambda}_\omega(t_0) = \begin{cases} \dfrac{M(t_0)}{M(T)}\,\mathbb{E}^{\mathbb{Q}}_{t_0}\!\left[\max\big(\omega\,(A(S) - K_2),\, 0\big)\right], & \text{for } \lambda = fx,\\[2mm] S(t_0)\,\mathbb{E}^{S}_{t_0}\!\left[\max\big(\omega\,(A^{fl}(S) - K_1),\, 0\big)\right], & \text{for } \lambda = fl. \end{cases} \quad (7)$$

Proof.

The proof follows by direct comparison of Equations (4) and (6).

When $K_1 \neq 0$ and $K_2 \neq 0$, Equation (1) is the payoff of a fixed- and floating-strike arithmetic Asian option. Its present value does not allow any simplified representation, and we write it as the expectation of the discounted payoff under the risk-neutral measure $\mathbb{Q}$:
$$V_\omega(t_0) = \frac{M(t_0)}{M(T)}\,\mathbb{E}^{\mathbb{Q}}_{t_0}\!\left[\max\big(\omega\,(A(S) - K_1 S(T) - K_2),\, 0\big)\right]. \quad (8)$$
Comparing Equations (7) and (8) unveils a difference between the two settings. Equation (7) is characterized by a unique unknown stochastic quantity, $A(S)$, whereas in (8) an additional term appears, namely the stock price at final time, $S(T)$. Furthermore, the two stochastic quantities in (8) are not independent. This suggests that different procedures should be employed for the different payoffs. In particular, in a MC setting, to value (7) we only have to sample from the unconditional distribution of $A(S)$, while in (8) the MC scheme requires dealing with both the sampling of $S(T)$ and the conditional sampling of $A(S)\,|\,S(T)$.

Let us define the stochastic volatility dynamics of Heston for the underlying stochastic process $S(t)$, with initial value $S(t_0) = S_0$, through the following system of stochastic differential equations (SDEs):
$$dS(t) = r S(t)\,dt + \sqrt{v(t)}\,S(t)\,dW_x(t), \quad S(t_0) = S_0, \quad (9)$$
$$dv(t) = \kappa(\bar{v} - v(t))\,dt + \gamma\sqrt{v(t)}\,dW_v(t), \quad v(t_0) = v_0, \quad (10)$$
with $r, \kappa, \bar{v}, v_0 \geq 0$ and $\gamma > 0$ the constant interest rate, the speed of mean reversion, the long-term mean of the variance process, the initial variance, and the volatility-of-volatility, respectively. $W_x(t)$ and $W_v(t)$ are Brownian Motions (BMs) under the risk-neutral measure $\mathbb{Q}$ with correlation coefficient $\rho \in [-1,1]$, i.e. $dW_x(t)\,dW_v(t) = \rho\,dt$.Footnote2 The dynamics in (9) and (10) are defined in the risk-neutral framework. However, Proposition 2.1 entails a different measure framework, whose dynamics still fall within the class of Heston stochastic volatility models, with adjusted parameters (see Proposition A.1).

3. Swift numerical pricing using deep learning

This section focuses on the efficient pricing of discrete arithmetic Asian options in a MC setting. The method uses a Stochastic Collocation (SC) [Citation12] based approach to approximate the target distribution. Then, artificial neural networks (ANNs) ‘learn’ the proxy of the desired distribution, allowing for fast recovery [Citation22,Citation25].

3.1. ‘Compressing’ distribution with stochastic collocation

In the framework of MC methods, the idea of SC – based on the probability integral transformFootnote3 – is to approximate the relationship between a ‘computationally expensive’ random variable, say $A$, and a ‘computationally cheap’ one, say $\xi$. The approximation is then used for sampling. A random variable is ‘expensive’ if its inverse CDF is not known in analytic form and needs to be computed numerically. With SC, the sampling of $A$ is performed at the cost of sampling $\xi$ (see [Citation12]). Formally, the following mapping is used to generate samples from $A$:
$$A \overset{d}{=} F_A^{-1}(F_\xi(\xi)) =: g(\xi) \approx \tilde{g}(\xi), \quad (11)$$
with $F_A$ and $F_\xi$ the CDFs of $A$ and $\xi$, respectively, and $\tilde{g}$ a suitable, easily evaluable approximation of $g$. The reason why we prefer $\tilde{g}$ to $g$ is that, by definition, every evaluation of $g$ requires the numerical inversion of $F_A$, the CDF of $A$.

Many possible choices of $\tilde{g}$ exist. In [Citation12,Citation25], $\tilde{g}$ is an $(M-1)$-degree polynomial expressed in the Lagrange basis, defined on collocation points (CPs) $\boldsymbol{\xi} := \{\xi_k\}_{k=1}^M$ computed as Gauss-Hermite quadrature nodes, i.e.:
$$\tilde{g}(x) := \sum_{k=1}^{M} a_k \ell_k(x), \qquad \ell_k(x) := \prod_{\substack{1 \leq j \leq M \\ j \neq k}} \frac{x - \xi_j}{\xi_k - \xi_j}, \quad k = 1,\dots,M, \quad (12)$$
where the coefficients $\mathbf{a} := \{a_k\}_{k=1}^M$ of the polynomial in the Lagrange basis representation, called collocation values (CVs), are derived by imposing the system of equations:
$$g(\xi_k) = \tilde{g}(\xi_k) =: a_k, \quad k = 1,\dots,M, \quad (13)$$
which requires only $M$ evaluations of $g$.

In this work, we define $\tilde{g}$ as a piecewise polynomial. In particular, we divide the domain $\mathbb{R}$ of the random variable $\xi$ (which for us is standard normally distributedFootnote4) into three regions, and in each region we define $\tilde{g}$ as a polynomial. In other words, given the partition of the real line:
$$\Omega_- \cup \Omega_M \cup \Omega_+ := (-\infty, -\bar{\xi}) \cup [-\bar{\xi}, \bar{\xi}] \cup (\bar{\xi}, +\infty), \quad (14)$$
for a certain $\bar{\xi} > 0$, $\tilde{g}$ is specified as:
$$\tilde{g}(\xi) := g_-(\xi)\,\mathbb{1}_{\Omega_-}(\xi) + g_M(\xi)\,\mathbb{1}_{\Omega_M}(\xi) + g_+(\xi)\,\mathbb{1}_{\Omega_+}(\xi), \quad (15)$$
where $\mathbb{1}_{(\cdot)}$ is the indicator function, and $g_-, g_M, g_+$ are suitable polynomials. To ensure high accuracy in the approximation, $g_M$ is defined as a Lagrange polynomial of high degree $M-1$. The CPs $\boldsymbol{\xi}$, which identify the Lagrange basis in (12), are chosen as Chebyshev nodes in the bounded interval $\Omega_M = [-\bar{\xi}, \bar{\xi}]$ [Citation9], while the CVs $\mathbf{a}$ are defined as in (13). The choice of Chebyshev nodes allows increasing the degree of the interpolation (i.e. the number of CPs and CVs) while avoiding Runge's phenomenon within the interval $\Omega_M$ [Citation26]. However, the behaviour of $g_M$ outside $\Omega_M$ is out of control: we expect the high-degree polynomial $g_M$ to be a poor approximation of $g$ in $\Omega_-$ and $\Omega_+$. Therefore, we define $g_-$ and $g_+$ as linear (or at most quadratic) polynomials, with degrees $M_- - 1$ and $M_+ - 1$, built on the extreme CPs of $\boldsymbol{\xi}$.

Summarizing, $g_-$, $g_M$ and $g_+$ are all defined as Lagrange polynomials:
$$g_{(\cdot)}(x) := \sum_{k \in I_{(\cdot)}} a_k \ell_k(x), \qquad \ell_k(x) := \prod_{\substack{j \in I_{(\cdot)} \\ j \neq k}} \frac{x - \xi_j}{\xi_k - \xi_j}, \quad k \in I_{(\cdot)}, \quad (16)$$
where the sets of indexes for $g_-$, $g_M$ and $g_+$ are $I_- = \{1,2\}$, $I_M = \{1,\dots,M\}$ and $I_+ = \{M-1, M\}$, respectively (if a quadratic extrapolation is preferred, we get $I_- = \{1,2,3\}$ and $I_+ = \{M-2, M-1, M\}$).
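As an illustration, a minimal numpy sketch of the piecewise map (15)–(16) follows. It assumes the CPs are stored in increasing order with $\xi_M = \bar{\xi}$, uses linear extrapolation ($I_- = \{1,2\}$, $I_+ = \{M-1, M\}$), and takes the CVs as given (in practice they are produced by the ANN of Section 3.2):

```python
import numpy as np

def lagrange_eval(x, nodes, values, idx):
    # Evaluate the Lagrange interpolant built on nodes[idx], values[idx] at x.
    y = np.zeros_like(x, dtype=float)
    for k in idx:
        lk = np.ones_like(x, dtype=float)
        for j in idx:
            if j != k:
                lk *= (x - nodes[j]) / (nodes[k] - nodes[j])
        y += values[k] * lk
    return y

def g_tilde(x, cps, cvs):
    # Piecewise map of Eq. (15): high-degree Chebyshev/Lagrange interpolation on
    # Omega_M = [-xi_bar, xi_bar], linear extrapolation on Omega_- and Omega_+.
    x = np.asarray(x, dtype=float)
    M, xi_bar = len(cps), cps[-1]         # CPs assumed in increasing order
    left, right = x < -xi_bar, x > xi_bar
    mid = ~(left | right)
    out = np.empty_like(x)
    out[left] = lagrange_eval(x[left], cps, cvs, [0, 1])            # g_-
    out[mid] = lagrange_eval(x[mid], cps, cvs, list(range(M)))      # g_M
    out[right] = lagrange_eval(x[right], cps, cvs, [M - 2, M - 1])  # g_+
    return out

# Sampling A at the cost of a standard normal:
# A_samples = g_tilde(np.random.standard_normal(10**5), cps, cvs)
```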

Remark 3.1

‘Compressed’ distributions

The SC technique is a tool to ‘compress’ the information regarding $A$ in (11) into a small number of coefficients, the CVs $\mathbf{a}$. Indeed, the relationship between $A$ and $\mathbf{a}$ is bijective, provided the distribution of the random variable $\xi$ and the corresponding CPs $\boldsymbol{\xi}$ (or, equivalently, the Lagrange basis in (12)) are specified a priori.

3.2. Semi-analytical pricing of fixed- or floating-strike Asian options

Let us first consider the pricing of fixed- or floating-strike Asian options. Both products allow for the same representation, in which the only unknown stochastic quantity is $A^\lambda(S)$, $\lambda \in \{fx, fl\}$, as given in Proposition 2.2. For simplicity, in the absence of ambiguity, we call $A^\lambda(S)$ just $A$.

For pricing purposes, we can benefit from the SC technique presented in the previous section, provided we know the map $\tilde{g}$ (or, equivalently, the CVs $\mathbf{a}$), i.e.:
$$V_\omega(t_0) = C\,\mathbb{E}\!\left[\max(\omega(A - K), 0)\right] \approx C\,\mathbb{E}\!\left[\max(\omega(\tilde{g}(\xi) - K), 0)\right] =: \tilde{V}_\omega(t_0), \quad (17)$$
where $\xi$ is a standard normally distributed random variable, and $C$ is a constant coherent with Proposition 2.2. We note that $\tilde{V}_\omega(t_0)$ is the expectation of (the positive part of) polynomials of a standard normal random variable. Hence, a semi-analytic formula exists, in a similar fashion as the one given in [Citation11].

Proposition 3.2

Semi-analytic pricing formula

Let $\tilde{V}_\omega(t_0)$ (and $C$) be defined as in Equation (17), with $\tilde{g}$ defined in (15). Assume further that $\boldsymbol{\alpha}_-$, $\boldsymbol{\alpha}_M$ and $\boldsymbol{\alpha}_+$ are the coefficients in the canonical basis of monomials for the three polynomials $g_-$, $g_M$ and $g_+$, of degrees $M_- - 1$, $M - 1$ and $M_+ - 1$, respectively. Then, using the notation $a \vee b = \max(a,b)$, the following semi-analytic pricing approximation holds:
$$\begin{aligned} \frac{\tilde{V}_\omega(t_0)}{\omega C} ={}& \left[\sum_{i=0}^{M_{-\omega}-1} \omega^i \alpha_{-\omega,i}\, m_i\big(\omega c_K,\ {-\bar{\xi}} \vee \omega c_K\big)\right]\Big(F_\xi({-\bar{\xi}} \vee \omega c_K) - F_\xi(\omega c_K)\Big) \\ &+ \left[\sum_{i=0}^{M-1} \omega^i \alpha_{M,i}\, m_i\big({-\bar{\xi}} \vee \omega c_K,\ \bar{\xi} \vee \omega c_K\big)\right]\Big(F_\xi(\bar{\xi} \vee \omega c_K) - F_\xi({-\bar{\xi}} \vee \omega c_K)\Big) \\ &+ \left[\sum_{i=0}^{M_{\omega}-1} \omega^i \alpha_{\omega,i}\, m_i\big(\bar{\xi} \vee \omega c_K,\ {+\infty}\big)\right]\Big(1 - F_\xi(\bar{\xi} \vee \omega c_K)\Big) - K\Big(1 - F_\xi(\omega c_K)\Big), \end{aligned}$$
where $F_\xi$ is the CDF of the standard normal random variable $\xi$, $m_i(a,b) := \mathbb{E}[\xi^i \,|\, a \leq \xi \leq b]$Footnote5, $c_K$ satisfies $K = \tilde{g}(c_K)$, and $\omega = \pm 1$ according to the call/put case.

Proof.

For a proof of the previous proposition, see Appendix A.2.

We also note that Proposition 3.2 takes as input the coefficients in the canonical basis, not those in the Lagrange basis, $\mathbf{a}$, in (13).

Remark 3.3

Change of basis

Given a Lagrange basis identified by $M$ collocation points $\boldsymbol{\xi}$, any $(M-1)$-degree polynomial $g(\xi)$ is uniquely determined by the corresponding $M$ coefficients $\mathbf{a}$. A linear transformation connects the $M$ coefficients in the Lagrange basis with the $M$ coefficients $\boldsymbol{\alpha}$ in the canonical basis of monomials. In particular, it holds:
$$\mathcal{M}\boldsymbol{\alpha} = \mathbf{a},$$
with $\mathcal{M} \equiv \mathcal{M}(\boldsymbol{\xi})$ an $M \times M$ Vandermonde matrix with element $\mathcal{M}_{k,i} := \xi_k^{i-1}$ in position $(k,i)$. The matrix $\mathcal{M}$ admits an inverse; thus, the coefficients $\boldsymbol{\alpha}$ in the canonical basis are the result of a matrix-vector multiplication, provided the coefficients $\mathbf{a}$ in the Lagrange basis are known. Moreover, since the matrix $\mathcal{M}$ only depends on $\boldsymbol{\xi}$, its inverse can be computed a priori once the CPs $\boldsymbol{\xi}$ are fixed.
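A minimal sketch of this change of basis (our own helper, under the assumptions of Remark 3.3):

```python
import numpy as np

def lagrange_to_monomial(cps, cvs):
    # Solve M alpha = a, with Vandermonde matrix M_{k,i} = xi_k^{i-1}.
    V = np.vander(cps, increasing=True)   # columns: xi^0, xi^1, ..., xi^{M-1}
    return np.linalg.solve(V, cvs)        # alpha, lowest order first

# Since V depends only on the fixed CPs, np.linalg.inv(V) can be precomputed
# offline, reducing the change of basis to a matrix-vector product.
```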

Proposition 3.2 provides a semi-analytic formula for the pricing of fixed- or floating-strike Asian options. It requires, however, the inversion of the map $\tilde{g}$ (to obtain $c_K$), which typically is not available in analytic form. On the other hand, since both the CPs $\boldsymbol{\xi}$ and the CVs $\mathbf{a}$ are known, a proxy of $\tilde{g}^{-1}$ is easily achievable by interpolation on the pairs of values $(a_k, \xi_k)$, $k = 1,\dots,M$.

The remaining problem is to recover the CVs $\mathbf{a}$ (which identify $\tilde{g}$) in an accurate and fast way. We recall that, for $k = 1,\dots,M$, each CV $a_k$ is defined in terms of the exact map $g$ and the CP $\xi_k$ through the relationship:
$$a_k := g(\xi_k) = F_A^{-1}(F_\xi(\xi_k)).$$
The presence of $F_A^{-1}$ makes it impossible to compute $\mathbf{a}$ directly in an efficient manner. On the other hand, by definition, the CVs $\mathbf{a}$ are quantiles of the random variable $A \equiv A(S)$, which depends on the parameters $\mathbf{p}$ of the underlying process $S$. As a consequence, there must exist some unknown mapping $H$ which links $\mathbf{p}$ to the corresponding $\mathbf{a}$. We approximate such a mapping from synthetic data by setting up a regression problem, which is solved with an ANN $\tilde{H}$ (in the same fashion as in [Citation22,Citation25]). We have the following mapping:
$$\mathbf{p} \mapsto \mathbf{a} := H(\mathbf{p}) \approx \tilde{H}(\mathbf{p}), \quad \mathbf{p} \in \Omega_p,\ \mathbf{a} \in \Omega_a,$$
with $\Omega_p$ and $\Omega_a$ the spaces of the underlying model parameters and of the CVs, respectively, while the ANN $\tilde{H}$ is the result of an optimization process on a synthetic training setFootnote6:
$$\mathcal{T} = \{(\mathbf{p}_i, \mathbf{a}_i) : i \in \{1,\dots,N_{pairs}\}\}. \quad (18)$$
The pricing procedure is summarized in the following algorithm (a minimal implementation sketch follows the algorithm).

Algorithm: Semi-analytic pricing

1. Fix the $M$ collocation points $\boldsymbol{\xi}$.

2. Given the parameters $\mathbf{p}$, approximate the $M$ collocation values, i.e. $\mathbf{a} \approx \tilde{H}(\mathbf{p})$.

3. Given $\mathbf{a}$, compute the coefficients $\boldsymbol{\alpha}_-$, $\boldsymbol{\alpha}_M$ and $\boldsymbol{\alpha}_+$ for $g_-$, $g_M$ and $g_+$ (see Remark 3.3).

4. Given $K$ and $\mathbf{a}$, compute $c_K$ of Proposition 3.2 by interpolating $\tilde{g}^{-1}$ on $(\mathbf{a}, \boldsymbol{\xi})$.

5. Given the coefficients $\boldsymbol{\alpha}_-$, $\boldsymbol{\alpha}_M$, $\boldsymbol{\alpha}_+$, and $c_K$, use Proposition 3.2 to compute $\tilde{V}_\omega(t_0)$.
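Below is a minimal Python sketch of step 5 for the call case ($\omega = +1$), where, assuming as in the text that $\tilde{g}$ is monotone increasing, the expectation reduces to $\int_{c_K}^{\infty}(\tilde{g}(x) - K)\,\phi(x)\,dx$; each polynomial piece is integrated exactly through the standard recursion for truncated normal moments. The helper names are ours, and the put case follows analogously from Proposition 3.2 with $\omega = -1$:

```python
import numpy as np
from scipy.stats import norm

def _t(x, i):
    # x^i * phi(x), with the convention that it vanishes at +/- infinity.
    return 0.0 if not np.isfinite(x) else x**i * norm.pdf(x)

def trunc_moments(i_max, a, b):
    # I_i = int_a^b x^i phi(x) dx via integration by parts:
    # I_i = a^{i-1} phi(a) - b^{i-1} phi(b) + (i-1) I_{i-2}.
    I = np.zeros(i_max + 1)
    I[0] = norm.cdf(b) - norm.cdf(a)
    if i_max >= 1:
        I[1] = _t(a, 0) - _t(b, 0)
    for i in range(2, i_max + 1):
        I[i] = _t(a, i - 1) - _t(b, i - 1) + (i - 1) * I[i - 2]
    return I

def sa_call_price(alpha_minus, alpha_mid, alpha_plus, xi_bar, cK, K, C):
    # V for a call: C * int_{cK}^inf (g_tilde(x) - K) phi(x) dx, piece by piece
    # in the monomial basis (coefficients alpha_*, lowest order first).
    pieces = [(alpha_minus, -np.inf, -xi_bar),
              (alpha_mid,   -xi_bar,  xi_bar),
              (alpha_plus,   xi_bar,  np.inf)]
    total = -K * (1.0 - norm.cdf(cK))
    for alpha, lo, hi in pieces:
        lo = max(lo, cK)
        if hi <= lo:
            continue
        I = trunc_moments(len(alpha) - 1, lo, hi)
        total += sum(alpha[i] * I[i] for i in range(len(alpha)))
    return C * total
```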

3.3. Swift Monte Carlo pricing of fixed- and floating-strike Asian options

Let us consider the case of an option whose payoff has both a fixed- and a floating-strike component. The present value of such a derivative is given by:
$$V_\omega(t_0) = \frac{M(t_0)}{M(T)}\,\mathbb{E}^{\mathbb{Q}}_{t_0}\!\left[\max\big(\omega\,(A(S) - K_1 S(T) - K_2),\, 0\big)\right],$$
hence the price of the option is a function of the two dependent quantities $A(S)$ and $S(T)$. This means that, even in a MC setting, the dependency between $A(S)$ and $S(T)$ has to be respected. Therefore, a different methodology with respect to the one proposed in the previous section needs to be developed.

Due to the availability of efficient and accurate sampling techniques for the underlying process $S$ at a given future time $T$ (we use the COS method [Citation24] enhanced with SC [Citation12], which we call COS-SC), the main issue is the sampling of the conditional random variable $A(S)\,|\,S(T)$. This task is addressed in the same fashion as in [Citation25], where ANNs and stochastic collocation are applied for efficient sampling from time-integrals of stochastic bridgesFootnote7, namely $\int_{t_0}^{T} S(t)\,dt$ given the value of $S(T)$. The underlying idea here is the same, since the random variable $A(S)$ is conditional on $S(T)$. In particular, in the previous sections we pointed out that the distribution of $A(S)$ has an unknown parametric form which depends on the set of Heston parameters $\mathbf{p}$. Similarly, we expect the distribution of $A\,|\,S(T) = \hat{S}$ to be parametric in the ‘augmented’ set of parameters $\mathbf{p}_{\hat{S}} := \mathbf{p} \cup \{\hat{S}\}$. Hence, there exists a mapping $H$ which links $\mathbf{p}_{\hat{S}}$ with the CVs $\mathbf{a}_{\hat{S}}$ corresponding to the conditional distribution $A(S)\,|\,S(T) = \hat{S}$. We approximate $H$ by means of a suitable ANN $\tilde{H}$, getting the following mapping scheme:
$$\mathbf{p}_{\hat{S}} \mapsto \mathbf{a}_{\hat{S}} := H(\mathbf{p}_{\hat{S}}) \approx \tilde{H}(\mathbf{p}_{\hat{S}}), \quad \mathbf{p}_{\hat{S}} \in \Omega_p^S,\ \mathbf{a}_{\hat{S}} \in \Omega_a^S,$$
where $\Omega_p^S$ and $\Omega_a^S$ are respectively the spaces of the underlying model parameters (augmented with $\hat{S}$) and of the CVs (corresponding to the conditional distribution $A(S)\,|\,S(T)$), and $\tilde{H}$ is the result of a regression problem on a suitable training set $\mathcal{T}$ (see Equation (18)). We first propose a brute force sampling scheme.

Algorithm: Brute force conditional sampling and pricing

1. Fix the $M$ collocation points $\boldsymbol{\xi}$.

2. Given the parameters $\mathbf{p}$, for $j = 1,\dots,N_{paths}$, repeat:

  (a) generate the sample $\hat{S}_j$ from $S(T)$ (e.g. with the COS-SC method [Citation12,Citation24]);

  (b) given $\mathbf{p}_{\hat{S}_j}$, approximate the $M$ conditional CVs, i.e. $\mathbf{a}_{\hat{S}_j} \approx \tilde{H}(\mathbf{p}_{\hat{S}_j})$;

  (c) given $\mathbf{a}_{\hat{S}_j}$, use SC to generate the conditional sample $\hat{A}_j$.

3. Given the pairs $(\hat{S}_j, \hat{A}_j)$, $j = 1,\dots,N_{paths}$, and any desired $(K_1, K_2)$, evaluate:
$$V_\omega(t_0) \approx \frac{1}{N_{paths}}\frac{M(t_0)}{M(T)} \sum_{j=1}^{N_{paths}} \max\big(\omega\,(\hat{A}_j - K_1\hat{S}_j - K_2),\, 0\big).$$

Nonetheless, the brute force sampling proposed above requires $N_{paths}$ evaluations of $\tilde{H}$ (see step 2(b) in the previous algorithm). This is a massive computational cost, even if a single evaluation of an ANN is high-speed. We can, however, benefit from a further approximation: we compute the CVs using $\tilde{H}$ only at specific reference values of $S(T)$, and derive the intermediate cases by (linear) interpolation. We choose a set of $Q$ equally-spaced values $\{S_1, S_2, \dots, S_Q\}$ for $S(T)$, defined as:
$$S_q := S_{min} + \frac{q-1}{Q-1}(S_{max} - S_{min}), \quad q = 1,\dots,Q, \quad (19)$$
where the boundaries are quantiles corresponding to the probabilities $p_{min}, p_{max} \in (0,1)$, i.e. $S_{min} := F_{S(T)}^{-1}(p_{min})$ and $S_{max} := F_{S(T)}^{-1}(p_{max})$.

Writing $\mathbf{p}_q = \mathbf{p}_{S_q}$ and $\mathbf{a}_q = \mathbf{a}_{S_q}$, $q = 1,\dots,Q$, we compute the grid $\mathcal{G}$ of reference CVs with only $Q$ ANN evaluations, namely:
$$\begin{bmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_q \\ \vdots \\ \mathbf{a}_Q \end{bmatrix} = \begin{bmatrix} H(\mathbf{p}_1) \\ \vdots \\ H(\mathbf{p}_q) \\ \vdots \\ H(\mathbf{p}_Q) \end{bmatrix} \approx \begin{bmatrix} \tilde{H}(\mathbf{p}_1) \\ \vdots \\ \tilde{H}(\mathbf{p}_q) \\ \vdots \\ \tilde{H}(\mathbf{p}_Q) \end{bmatrix} =: \mathcal{G}, \quad (20)$$
where $\mathbf{a}_q$, $H(\mathbf{p}_q)$ and $\tilde{H}(\mathbf{p}_q)$, $q = 1,\dots,Q$, are row vectors. Interpolation on $\mathcal{G}$ is much faster than the evaluation of $\tilde{H}$; therefore, the grid-based conditional sampling is more efficient than the brute force one, particularly when a huge number of MC samples is required.

The algorithm for the grid-based sampling procedure, to be used in place of step 2 of the previous algorithm, is reported here (a sketch combining it with the pricing step follows the algorithm).

Algorithm: Grid-based conditional sampling

2.1. Fix the boundary probabilities $p_{min}, p_{max} \in (0,1)$ and compute the boundary quantiles $S_{min} := F_{S(T)}^{-1}(p_{min})$ and $S_{max} := F_{S(T)}^{-1}(p_{max})$ (e.g. with the COS method [Citation24]).

2.2. Compute the reference values $S_q := S_{min} + \frac{q-1}{Q-1}(S_{max} - S_{min})$, $q = 1,\dots,Q$.

2.3. Given the ‘augmented’ parameters $\mathbf{p}_q$, evaluate $\tilde{H}$ $Q$ times to compute $\mathcal{G}$ (see (20)).

2.4. Given the parameters $\mathbf{p}$ and the grid $\mathcal{G}$, for $j = 1,\dots,N_{paths}$, repeat:

  (a) generate the sample $\hat{S}_j$ from $S(T)$ (e.g. with the COS-SC method [Citation12,Citation24]);

  (b) given $\hat{S}_j$, approximate the $M$ conditional CVs $\mathbf{a}_{\hat{S}_j}$ by interpolation in $\mathcal{G}$;

  (c) given $\mathbf{a}_{\hat{S}_j}$, use SC to generate the conditional sample $\hat{A}_j$.
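For concreteness, a minimal sketch combining the grid construction, the interpolation, and the pricing step is given below. Here `ann` and `sample_S_T` are placeholders for the trained network $\tilde{H}$ and a COS-SC sampler of $S(T)$, `g_tilde` is the piecewise map from the sketch in Section 3.1, and the augmented input is simplified with respect to the one actually used in Section 5:

```python
import numpy as np

def grid_based_pricing(sample_S_T, ann, p, cps, S_min, S_max, Q,
                       K1, K2, omega, discount, n_paths):
    # Steps 2.2-2.4: grid of reference CVs from Q ANN evaluations (Eq. (20)),
    # then linear interpolation of the CVs at each sampled S(T).
    S_ref = np.linspace(S_min, S_max, Q)                        # Eq. (19)
    G = np.stack([ann(np.append(p, s)) for s in S_ref])         # Q x M grid
    S_hat = sample_S_T(n_paths)                                 # samples of S(T)
    cvs = np.stack([np.interp(S_hat, S_ref, G[:, m])
                    for m in range(G.shape[1])], axis=1)        # n_paths x M
    xi = np.random.standard_normal(n_paths)
    A_hat = np.array([g_tilde(np.array([x]), cps, c)[0]
                      for x, c in zip(xi, cvs)])                # conditional samples
    payoffs = np.maximum(omega * (A_hat - K1 * S_hat - K2), 0.0)
    return discount * payoffs.mean()                            # discount = M(t0)/M(T)
```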

4. Error analysis

This section is dedicated to the assessment and discussion of the error introduced by the main approximations used in the proposed pricing method. Two primary sources of error are identifiable. The first is due to the SC technique: in Section 3.1 the exact map $g$ is approximated by means of the piecewise polynomial $\tilde{g}$. The second is a regression error, present in both Sections 3.2 and 3.3, since ANNs $\tilde{H}$ are used instead of the exact mappings $H$. For the error introduced by the SC technique, we bound the $L^2$-distance $\epsilon_{SC}$ between the exact distribution and its SC proxy, showing that $g$, which is approximated by $g_M$ in $\Omega_M$, is an analytic function; $\epsilon_{SC}$ is then used to provide a direct bound on the option price error $\epsilon_P$. Regarding the approximation of $H$ via $\tilde{H}$, we provide a general convergence result for ReLU-architecture ANNs, i.e. ANNs with Rectified Linear Units as activation functions.

4.1. Stochastic collocation error using Chebyshev polynomials

Let us consider the error introduced in the methodology by the SC technique (Section 3.1), and investigate how it affects the option price. We restrict the analysis to the case of fixed- or floating-strike discrete arithmetic Asian and Lookback options (Section 3.2). We define the error $\epsilon_P$ as the $L^1$-distance between the real price $V_\omega(t_0)$ and its approximation $\tilde{V}_\omega(t_0)$, i.e.:
$$\epsilon_P := |\tilde{V}_\omega(t_0) - V_\omega(t_0)|. \quad (21)$$
Given the standard normal kernel $\xi \sim \mathcal{N}(0,1)$, we define the SC error as the (squared) $L^2$-norm of $g - \tilde{g}$, i.e.:
$$\epsilon_{SC} := \mathbb{E}\!\left[(g - \tilde{g})^2(\xi)\right]. \quad (22)$$
We decompose $\epsilon_{SC}$ according to the piecewise definition of $\tilde{g}$, namely:
$$\epsilon_{SC} = \mathbb{E}\!\left[(g\mathbb{1}_{\Omega_-} - g_-)^2(\xi)\right] + \mathbb{E}\!\left[(g\mathbb{1}_{\Omega_M} - g_M)^2(\xi)\right] + \mathbb{E}\!\left[(g\mathbb{1}_{\Omega_+} - g_+)^2(\xi)\right] =: \epsilon_- + \epsilon_M + \epsilon_+,$$
with the domains $\Omega_{(\cdot)}$ defined in Equation (14), i.e. for $\bar{\xi} > 0$:
$$\Omega_- = (-\infty, -\bar{\xi}), \quad \Omega_M = [-\bar{\xi}, \bar{\xi}], \quad \Omega_+ = (\bar{\xi}, +\infty).$$
To deal with the ‘extrapolation’ errors $\epsilon_-$ and $\epsilon_+$, we formulate the following assumption.

Assumption 4.1

The functions $(g\mathbb{1}_{\Omega_-} - g_-)^2$ and $(g\mathbb{1}_{\Omega_+} - g_+)^2$ are $\mathcal{O}\!\left(\mathrm{e}^{x^2/2}\right)$. Equivalently, $g^2\mathbb{1}_{\Omega_-}$ and $g^2\mathbb{1}_{\Omega_+}$ are $\mathcal{O}\!\left(\mathrm{e}^{x^2/2}\right)$ (since $g_-$ and $g_+$ are polynomials).

Given Assumption 4.1 and the fact that $\xi \sim \mathcal{N}(0,1)$Footnote8, the ‘extrapolation’ errors $\epsilon_-$ and $\epsilon_+$ vanish, at an exponential rate, as $\bar{\xi}$ tends to infinity, i.e. $\epsilon_- = \epsilon_-(\bar{\xi})$, $\epsilon_+ = \epsilon_+(\bar{\xi})$, and:
$$\epsilon_-(\bar{\xi}) \to 0, \quad \epsilon_+(\bar{\xi}) \to 0, \quad \text{for } \bar{\xi} \to +\infty. \quad (23)$$
An illustration of the speed of convergence is reported in Figure 1. Figure 1(a) shows that the growth of $g\mathbb{1}_{\Omega_+}$ is (much) slower than exponential (consistently with Assumption 4.1), whereas Figure 1(b) illustrates the exponential decay of $\epsilon_-$ and $\epsilon_+$ as $\bar{\xi}$ increases.

Figure 1. Left: (slow) growth of $g\mathbb{1}_{\Omega_+}$ compared to the linear extrapolation $g_+$. Right: exponential decay of $\epsilon_-(\bar{\xi})$ and $\epsilon_+(\bar{\xi})$ in (23) as $\bar{\xi}$ increases. The upper x-axis represents the probability $F_\xi(\bar{\xi})$.


Therefore, if $\bar{\xi}$ is taken sufficiently large, the error $\epsilon_{SC}$ in (22) is mainly driven by the ‘interpolation’ error $\epsilon_M$, whose estimate is connected to error bounds for Chebyshev polynomial interpolation; this is the focus of the next part.

Theorem 4.1

Error bound for analytic function [Citation9,Citation29]

Let $f$ be a real function on $[-1,1]$ and $f_M$ its $(M-1)$-degree polynomial interpolant built on the Chebyshev nodes $\xi_k := \cos\!\left(\frac{k-1}{M-1}\pi\right)$, $k = 1,\dots,M$. If $f$ has an analytic extension to a Bernstein ellipse $B$ with foci $\pm 1$ and major and minor semiaxis lengths summing up to $\varrho > 1$, such that $\sup_B |f| \leq \frac{\varrho - 1}{4}\,\bar{C}$ for some constant $\bar{C} > 0$, then, for each $M \geq 1$, the following bound holds:
$$\|f - f_M\|_{L^\infty([-1,1])} \leq \bar{C}\,\varrho^{1-M}.$$

Since $g\mathbb{1}_{\Omega_M}$ is approximated by means of the $(M-1)$-degree polynomial $g_M$ built on Chebyshev nodes, to apply Theorem 4.1 we verify the required assumptions, namely the boundedness of $g\mathbb{1}_{\Omega_M}$ in $\Omega_M$ and its analyticity.

We recall that:
$$g = F_A^{-1} \circ F_\xi, \quad (24)$$
with $F_A$ and $F_\xi$ the CDFs of $A(S)$ and $\xi$, respectively. Hence, boundedness on the compact domain $\Omega_M$ is satisfied because the map $g$ is monotone increasing (as a composition of monotone increasing functions) and defined everywhere in $\Omega_M$.

Furthermore, since the CDF of a standard normal, $F_\xi$, is analytic, it follows from (24) that $g$ is analytic if $F_A^{-1}$ is analytic. The analyticity of $F_A^{-1}$ is fulfilled if $F_A$ is analytic and $F_A' = f_A$ does not vanish in the domain $\Omega_M$. Observe that, by restricting the domain to $\Omega_M$, the latter condition is trivially satisfied, because we are ‘far’ from the tails of $A(S)$ (which correspond to the extrapolation domains $\Omega_-$ and $\Omega_+$), and $f_A$ does not vanish in regions other than the tails.

On the contrary, proving that $F_A$ is analytic is not trivial because of the lack of an explicit formula for $F_A$. However, it is beneficial to represent $F_A$ through the characteristic function (ChF) of $A(S)$, $\phi_A$. For that purpose, we use a well-known inversion result.

Theorem 4.2

ChF inversion theorem

Let us denote by $F$ and $\phi$ the CDF and the ChF of a given real-valued random variable defined on $\mathbb{R}$. Then, it is possible to retrieve $F$ from $\phi$ according to the inversion formula:
$$F(x) - F(0) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \phi(u)\,\frac{1 - e^{-iux}}{iu}\,du,$$
with the integral understood as a principal value.

Proof.

For detailed proof, we refer to [Citation17].
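As a quick sanity check of Theorem 4.2 (a toy example of ours, not part of the original analysis), one can recover the standard normal CDF from its ChF $\phi(u) = e^{-u^2/2}$; by symmetry, the imaginary part of the integrand is odd and the principal value reduces to a real integral:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# For the standard normal, phi(u) = exp(-u^2/2) is real and even, so
# F(x) - F(0) = (1/pi) * int_0^inf sin(u x)/u * exp(-u^2/2) du.
x = 1.3
val, _ = quad(lambda u: np.sin(u * x) / u * np.exp(-u**2 / 2), 0.0, np.inf)
print(val / np.pi, norm.cdf(x) - 0.5)   # both approx 0.4032
```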

Thanks to Theorem 4.2, if $\phi_A$ is analytic, then so is $F_A$ (as long as the inversion integral is well defined). Thus, the problem becomes determining whether $\phi_A$ is analytic. We rely on a characterization of entireFootnote9 ChFs, which can be used in this framework to show that, in the cases of – fixed- or floating-strike – discrete arithmetic Asian and Lookback options, the (complex extension of the) function $\phi_A$ is analytic in a certain domain.

Theorem 4.3

Characterization of entire ChFs [Citation4]

Let $Y$ be a real random variable. Then, the complex function $\phi(z) := \mathbb{E}[e^{izY}]$, $z \in \mathbb{C}$, is entire if and only if the absolute moments of $Y$ exist for any order, i.e. $\mathbb{E}[|Y|^k] < +\infty$ for any $k \in \mathbb{N}$, and the following limit holds:
$$\lim_{k \to +\infty}\left(\frac{|\mathbb{E}[Y^k]|}{k!}\right)^{\frac{1}{k}} = 0. \quad (25)$$

Proof.

A reference for proof is given in [Citation4].

When dealing with the Heston model, there is no closed-form expression for the moments of the underlying process $S(t)$, nor for the moments of its transform $A(S)$. Nonetheless, a conditional case can be studied and employed as a starting point for a convergence result.

Proposition 4.4

Conditional ChF $\phi_{A|V}$ is entire

Let us define the $N$-dimensional random vector $V$, with values in $\Omega_V := \mathbb{R}_+^N$, as:
$$V := \big[I_v(t_1), I_v(t_2), \dots, I_v(t_N)\big]^T, \qquad I_v(t_n) := \int_{t_0}^{t_n} v(\tau)\,d\tau,\ n = 1,\dots,N.$$
Let the complex conditional characteristic function $\phi_{A|V}(z) := \mathbb{E}[e^{izA}\,|\,V]$, $z \in \mathbb{C}$, be the extended ChF of the conditional random variable $A\,|\,V$, with $A \equiv A(S)$ as given in Equation (2).

Then, $\phi_{A|V}(z)$ is entire.

Then, ϕA|V(z) is entire.

Proof.

See Appendix A.4.

From now on, using the notation of Proposition 4.4, we assume that the following condition on the tail behaviour of the random vector $(V, A)$ is satisfied. Informally, we require the density of the joint distribution of $(V, A)$ to have uniform (w.r.t. $V$) exponential decay as $A$ goes to $+\infty$.

Assumption 4.2

There exists a point $z \in \mathbb{C}$, $z = x - iy$, with $x, y \in \mathbb{R}$ and $y > 0$, such that:
$$\int_{\Omega_V \times \mathbb{R}_+} e^{ya}\,dF_{V,A}(v,a) < +\infty,$$
with $F_{V,A}(\cdot,\cdot)$ the joint distribution of the random vector $V$ and the random variable $A$.

Thanks to Assumption 4.2, the ChF $\phi_A(z)$ is well defined for any $z \in S_y \subset \mathbb{C}$, with the strip $S_y := \mathbb{R} + i[-y, y]$. Moreover, applying Fubini's Theorem, for any $z \in S_y$, we have:
$$\phi_A(z) = \int_{\Omega_V} \phi_{A|V=v}(z)\,dF_V(v). \quad (26)$$
Thus, we can show that the ChF $\phi_A(z)$ is analytic in the strip $S_y$ (the details are given in Appendix A.2).

Proposition 4.5

ChF $\phi_A$ is analytic

Let $\phi_A(z) := \mathbb{E}[e^{izA}]$, $z \in S_y$, with $A \equiv A(S)$. Then, $\phi_A(z)$ is analytic in $S_y$.

Proof.

A proof is given in Appendix A.2.

Thanks to Proposition 4.5, and consistently with the previous discussion, we conclude that the map $g$ in (24) is analytic on the domain $\Omega_M$. Therefore, we can apply Theorem 4.1, which yields the following error estimate:
$$\|g\mathbb{1}_{\Omega_M} - g_M\|_{L^\infty(\Omega_M)} \leq \bar{C}\,\varrho^{1-M},$$
for certain $\varrho > 1$ and $\bar{C} > 0$. As a consequence, the following bound for the $L^2$-error $\epsilon_M$ holds:
$$\epsilon_M = \mathbb{E}\!\left[(g\mathbb{1}_{\Omega_M} - g_M)^2(\xi)\right] \leq \bar{C}^2 \varrho^{2-2M}. \quad (27)$$
The exponential convergence is also confirmed numerically, as reported in Figure 2. In Figure 2(a) we can appreciate the improvement in the approximation of $g\mathbb{1}_{\Omega_M}$ by means of $g_M$ when $M$ is increased, whereas Figure 2(b) reports the exponential decay of $\epsilon_M$.

Figure 2. Left: exact map $g\mathbb{1}_{\Omega_M}$ (blue) compared to the interpolant $g_M$ in the domain $\Omega_M$, for M = 4, 5. Right: exponential decay of the $L^2$-error $\epsilon_M$ in (27) in the degree of the polynomial $g_M$.


Using (27), the $L^2$-norm of $g - \tilde{g}$, $\epsilon_{SC}$ in (22), is bounded by:
$$\epsilon_{SC} = \mathbb{E}\!\left[(g(\xi) - \tilde{g}(\xi))^2\right] \leq \epsilon_-(\bar{\xi}) + \bar{C}^2\varrho^{2-2M} + \epsilon_+(\bar{\xi}) =: \epsilon_{SC}(\bar{\xi}, M),$$
which goes to zero as $\bar{\xi} \in \mathbb{R}_+$ and $M \in \mathbb{N}$ tend to $+\infty$. Therefore, for any $\epsilon > 0$ there exist $\bar{\xi} \in \mathbb{R}_+$ and $M \in \mathbb{N}$ such that:
$$\epsilon_{SC}(\bar{\xi}, M) < \epsilon^2, \quad (28)$$
and, because of the exponential decay, we expect that $\bar{\xi}$ and $M$ do not need to be taken too large.
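The decay in (27) is easy to reproduce in isolation. The following toy check of ours (a stand-in analytic function on $[-1,1]$, not the actual map $g$) interpolates on Chebyshev nodes and prints the sup-error as $M$ grows:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Toy check of Theorem 4.1 / Eq. (27): interpolate an analytic stand-in for g
# on Chebyshev nodes and observe the (roughly) exponential error decay in M.
f = lambda x: np.exp(np.sin(2.0 * x))
x_test = np.linspace(-1.0, 1.0, 2001)
for M in (5, 10, 15, 20):
    nodes = np.cos(np.arange(M) / (M - 1) * np.pi)   # cos((k-1)/(M-1) * pi)
    p = Polynomial.fit(nodes, f(nodes), deg=M - 1)   # interpolating polynomial
    err = np.max(np.abs(p(x_test) - f(x_test)))
    print(M, f"{err:.2e}")
```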

Eventually, we can exploit the bound in (28) to control the pricing error $\epsilon_P$ in (21). By employing the well-known inequality $\max(a+b, 0) \leq \max(a,0) + \max(b,0)$ and the Cauchy-Schwarz inequality, we can write:
$$\tilde{V}_\omega(t_0) = \mathbb{E}\!\left[\big(\omega(\tilde{g}(\xi) - K)\big)^+\right] \leq \sqrt{\mathbb{E}\!\left[(\tilde{g}(\xi) - g(\xi))^2\right]} + \mathbb{E}\!\left[\big(\omega(g(\xi) - K)\big)^+\right] \leq \sqrt{\epsilon_{SC}(\bar{\xi}, M)} + V_\omega(t_0),$$
and using the same argument twice (exchanging the roles of $g$ and $\tilde{g}$), we end up with the following bound for the option price error:
$$\epsilon_P \leq \sqrt{\epsilon_{SC}(\bar{\xi}, M)} \leq \epsilon,$$
with $\bar{\xi}$ and $M$ as in (28).

4.2. Artificial neural network regression error

As the final part of the error analysis, we investigate when ANNs are suitable approximating maps. In particular, we focus on ANNs with ReLU architectures, namely ANNs whose activation units are all Rectified Linear Units, defined as $\phi(x) = x\,\mathbb{1}_{x>0}(x)$.

Consider the Sobolev space $\big(W^{n,\infty}([0,1]^d), \|\cdot\|_{n,d}\big)$, with $n, d \in \mathbb{N}\setminus\{0\}$, namely the space of functions in $C^{n-1}([0,1]^d)$ whose derivatives up to the $(n-1)$th order are all Lipschitz continuous, equipped with the norm $\|\cdot\|_{n,d}$ defined as:
$$\|f\|_{n,d} = \max_{|\mathbf{n}| \leq n}\ \operatorname*{ess\,sup}_{x \in [0,1]^d} |D^{\mathbf{n}} f(x)|, \quad (29)$$
with $\mathbf{n} := (n_1,\dots,n_d) \in \mathbb{N}^d$, $|\mathbf{n}| = \sum_{i=1}^d n_i$, and $D^{\mathbf{n}}$ the weak derivative operator. Furthermore, we define the unit ball $B^{n,d} := \{f \in W^{n,\infty}([0,1]^d) : \|f\|_{n,d} \leq 1\}$. Then, the following approximation result holds:

Theorem 4.6

Convergence for ReLU ANN

For any choice of $d, n \in \mathbb{N}\setminus\{0\}$ and $\epsilon \in (0,1)$, there exists an architecture $H(x\,|\,\cdot)$ based on ReLU (Rectified Linear Unit) activation functions $\phi$, i.e. $\phi(x) = x\,\mathbb{1}_{x>0}(x)$, such that:

(1) $H(x\,|\,\cdot)$ is able to approximate any function $f \in B^{n,d}$ with an error smaller than $\epsilon$, i.e. there exists a matrix of weights $W$ such that $\|f(\cdot) - H(\cdot\,|\,W)\|_{n,d} < \epsilon$;

(2) $H$ has at most $c\,(\ln(1/\epsilon) + 1)$ layers and at most $c\,\epsilon^{-d/n}(\ln(1/\epsilon) + 1)$ weights and neurons, with $c = c(d, n)$ an appropriate constant depending on $d$ and $n$.

Proof.

A proof is available in [Citation30].

Essentially, Theorem 4.6 states that there always exists a ReLU architecture (with a bounded number of layers and activation units) suitable to approximate, at any desired precision, functions with a certain level of regularity (determined by $\big(W^{n,\infty}([0,1]^d), \|\cdot\|_{n,d}\big)$).

Remark 4.7

Input scaling

We emphasize that although Theorem 4.6 applies to (a subclass of sufficiently regular) functions whose domain is the d-dimensional hypercube $[0,1]^d$, this is not restrictive. Indeed, as long as the regularity conditions are fulfilled, Theorem 4.6 holds for any function defined on a d-dimensional hyperrectangle, since it is always possible to linearly map its domain into the d-dimensional hypercube.

Furthermore, we observe that all convergence results for ANNs rely on the assumption that the training is performed successfully, so that the final error in the optimization process is negligible. Under this assumption, Theorem 4.6 provides a robust theoretical justification for using ReLU-based ANNs as regressors. The quality of the result can also be investigated empirically, as shown in the next section (see, for instance, Figure 4).

Figure 3. Illustration of a dense ANN with (from the left) one input layer, three hidden layers, and one output layer. Each white node represents an activation unit.


Figure 4. First experiment: FxA. Left: scatter plot of the real CVs ($a_k$, $k = 1,\dots,21$, with different colours) against the predicted ones. Right: zoom on the ‘worst’ case, namely $a_{10}$.


5. Numerical experiments

In this part of the paper, we detail some numerical experiments. We focus on applying the methodology of Section 3.2 to the numerical pricing of fixed-strike discrete arithmetic Asian and Lookback options, and we address the general case of discrete arithmetic Asian options described in Section 3.3. For each pricing experiment, errors and timing results are given. The ground-truth benchmarks are computed via MC using the almost exact simulation of the Heston model, detailed in Appendix A.3.

All the computations are implemented and run on a MacBook Air (M1, 2020) machine, with Apple M1 chip and 16 GB of RAM. The code is written in Python, and torch is the library used for the design and training of the ANNs, as in [Citation25].

5.1. A benchmark from the literature

To assess the quality of the proposed methodology, we compare the method against the benchmarks available in Table 5 of [Citation19]. In the experiment, we consider the prices of 5 discrete Asian call options with n = 201 equally spaced monitoring dates from time $t_0 = 0$ to T = 0.25. The underlying initial value is $S(t_0) = 100$, while the other Heston parameters (Set BM) as well as the target strikes are given in Table 1. To produce those results, a toy model has been trained based on the ranges provided in Table 1. The ANN employed consists of 2 hidden layers, each with 20 hidden units. The results are computed with $N_{paths} = 10^6$ Monte Carlo paths (consistent with the benchmark [Citation19]) and are presented in Table 2. For both the benchmark and the SC technique, the value (V) and the 95% confidence interval (95CI) of the option price are reported. For the SA technique, instead, only the value is reported, since no sampling is involved and so there is no information on the variance of the estimate. All the results are within the BM 95% confidence interval, confirming the high accuracy of the proposed method.

5.2. Experiments' specifications

Among the three example applications presented, two rely on the technique of Section 3.2, while the third is based on the theory of Section 3.3. The first experiment is the pricing of fixed-strike discrete arithmetic Asian options (FxA) with an underlying stock price process following the Heston dynamics. The second example, instead, is connected to the ‘interest rate world’ and concerns the pricing of fixed-strike discrete Lookback swaptions (FxL); we assume the underlying swap rate is driven by a displaced Heston model with drift-less dynamics, as typically used for interest rates. The last one is an application to the pricing of fixed- and floating-strike discrete arithmetic Asian options (FxFlA) on a stock price driven by the Heston dynamics. In the first (FxA) and last (FxFlA) experiments, $A(S)$ in (2) is specified as:
$$A(S) = \frac{1}{5}\sum_{n=1}^{5} S(t_n), \qquad t_n := T - (5-n)\tau_A,\ n = 1,\dots,5, \quad (30)$$
with monitoring time lag $\tau_A = \frac{1}{12}$ and option maturity $T > 4\tau_A$. Differently, in the second experiment (FxL), $A(S)$ is given by:
$$A(S) = \min_n S(t_n), \qquad t_n := T - (30-n)\tau_L,\ n = 1,\dots,30, \quad (31)$$
with monitoring time lag $\tau_L = \frac{1}{120}$ and option maturity $T > 29\tau_L$. Observe that, assuming the unit is 1 year, with 12 identical months and 360 days, the choices of $\tau_A$ and $\tau_L$ correspond respectively to 1 month and 3 days of time lag between monitoring dates.

Table 1. Training set for benchmark replication.

Table 2. Benchmark (BM) replication results (cf. Table 5 of [Citation19]).

5.3. Artificial neural network development

In this section, we provide the details about the generation of the training set, for each experiment, and the consequent training of the ANN used in the pricing model.

5.3.1. Training set generation

The training sets are generated through MC simulations, using the almost exact sampling of Appendix A.3. In the first two applications (FxA and FxL) the two training sets are defined as in (18), and in particular they read:
$$\mathcal{T}_{FxA} = \big\{\big(\{r, \kappa, \gamma, \rho, \bar{v}, v_0, T\}_i,\ \{a_1,\dots,a_{21}\}_i\big) : i \in \{1,\dots,N_{pairs}^{FxA}\}\big\},$$
$$\mathcal{T}_{FxL} = \big\{\big(\{\kappa, \gamma, \rho, \bar{v}, v_0, T\}_i,\ \{a_1,\dots,a_{21}\}_i\big) : i \in \{1,\dots,N_{pairs}^{FxL}\}\big\}.$$
The Heston parameters, i.e. $\mathbf{p}\setminus\{T\}$Footnote10, are sampled using Latin Hypercube Sampling (LHS), to ensure the best filling of the parameter space [Citation22,Citation25] (a minimal sketch is given below). For each set $\mathbf{p}\setminus\{T\}$, $N_{paths}$ paths are generated, with a time step $\Delta t$ and a time horizon up to $T_{max}$. The underlying process $S$ is monitored at each time $T$ for which there are enough past observations to compute $A(S)$, i.e.:
$$T \geq 4\tau_A + \Delta t \quad \text{for FxA}, \qquad T \geq 29\tau_L + \Delta t \quad \text{for FxL}.$$
Consequently, the product of the number of Heston parameter sets and the number of available maturities determines the size of the two training sets (i.e. $N_{pairs}^{FxA}$ and $N_{pairs}^{FxL}$).
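A minimal LHS sketch with scipy (the bounds below are illustrative placeholders; the actual training ranges are those reported in Table 3):

```python
from scipy.stats import qmc

# Latin Hypercube Sampling of the Heston parameter space for FxA.
lo = [0.00, 0.10, 0.10, -0.90, 0.01, 0.01]   # r, kappa, gamma, rho, v_bar, v0
hi = [0.10, 2.00, 1.00,  0.00, 0.50, 0.50]
sampler = qmc.LatinHypercube(d=len(lo), seed=0)
params = qmc.scale(sampler.random(n=1000), lo, hi)   # 1000 parameter sets
```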

For each $\mathbf{p}$, the CVs $\mathbf{a}$ corresponding to $A(S)$ are computed as:
$$a_k := F_A^{-1}(F_\xi(\xi_k)) \approx Q_A(F_\xi(\xi_k)), \quad k \in \{1,\dots,21\},$$
where $Q_A$ is the empirical quantile function of $A(S)$, used as a numerical proxy of $F_A^{-1}$, and $\xi_k$ are the CPs computed as Chebyshev nodes:
$$\xi_k := \bar{\xi}\cos\left(\frac{k-1}{20}\pi\right), \quad k \in \{1,\dots,21\},$$
with $\bar{\xi} := F_\xi^{-1}(0.993) \approx 2.46$. We note that this definition of $\bar{\xi}$ prevents any CV $a_k$ from lying ‘deeply’ in the tails of $A(S)$, which are more sensitive to numerical instability in a MC simulation.
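A compact sketch of this CV extraction from a MC population (the helper name is ours):

```python
import numpy as np
from scipy.stats import norm

def collocation_values(A_samples, M=21, p_bar=0.993):
    # CVs as empirical quantiles of A(S) evaluated at the normal probabilities
    # of the Chebyshev CPs xi_k = xi_bar * cos((k-1)/(M-1) * pi).
    xi_bar = norm.ppf(p_bar)                       # ~2.46
    cps = np.sort(xi_bar * np.cos(np.arange(M) / (M - 1) * np.pi))
    cvs = np.quantile(A_samples, norm.cdf(cps))    # Q_A(F_xi(xi_k))
    return cps, cvs
```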

The information about the generation of the two training sets is reported in Table 3. Observe that $\mathcal{T}_{FxA}$ contains more elements than $\mathcal{T}_{FxL}$ because of computational constraints: the higher number of monitoring dates of $A(S)$ in FxL makes the generation time of $\mathcal{T}_{FxL}$ more than twice that of $\mathcal{T}_{FxA}$ (for the same number of pairs).

Table 3. Training sets TFxA and TFxL generation details.

Since in the general procedure (see Section 3.3) ANNs are used to learn the conditional distribution $A(S)\,|\,S(T)$ (not just $A(S)$!), the third experiment requires a training set which also contains information about the conditioning value, $S(T)$. We define $\mathcal{T}_{FxFlA}$ as:
$$\mathcal{T}_{FxFlA} = \big\{\big(\{r, \kappa, \gamma, \rho, \bar{v}, v_0, T, S_q, p_q\}_i,\ \{a_1,\dots,a_{14}\}_i\big) : i \in \{1,\dots,N_{pairs}\}\big\},$$
where $p_q$ is the probability corresponding to the quantile $S_q$, given as in (19), i.e.:
$$S_q = S_{min} + \frac{q-1}{14}(S_{max} - S_{min}), \quad q \in \{1,\dots,15\},$$
with $S_{min} = F_{S(T)}^{-1}(p_{min})$ and $S_{max} = F_{S(T)}^{-1}(p_{max})$. Heuristic arguments drove the choice of adding to the input set $\mathbf{p}$ the probability $p_q := F_{S(T)}(S_q)$, i.e. the probability implied by the final value $S_q$: the ANN training process turns out to be more accurate when both $S_q$ and $p_q$ are included in $\mathbf{p}$. As before, the sets of Heston parameters are sampled using LHS. For each set, $N_{tot}$ paths are generated, with a time step $\Delta t$ and a time horizon up to $T_{max}$. The underlying process $S$ is monitored at each time $T$ for which there are enough past observations to compute $A(S)$, i.e. $T \geq 4\tau_A + \Delta t$. For any maturity $T$ and any realization $S_q$, the inverse CDF of the conditional random variable $A(S)\,|\,S(T) = S_q$ is approximated with the empirical quantile function $Q_{A|S_q}$, built on the $N_{paths}$ ‘closest’ paths to $S_q$, i.e. those $N_{paths}$ paths whose final values $S(T)$ are closest to $S_q$.

Eventually, for any input set $\mathbf{p} = \{r, \kappa, \gamma, \rho, \bar{v}, v_0, T, S_q, p_q\}$, the CVs $\mathbf{a}$ corresponding to $A(S)\,|\,S(T) = S_q$ are computed as:
$$a_k := F_{A|S_q}^{-1}(F_\xi(\xi_k)) \approx Q_{A|S_q}(F_\xi(\xi_k)), \quad k \in \{1,\dots,14\},$$
with $\xi_k$ the Chebyshev nodes:
$$\xi_k := \bar{\xi}\cos\left(\frac{k-1}{13}\pi\right), \quad k \in \{1,\dots,14\},$$
and $\bar{\xi} := F_\xi^{-1}(0.993) \approx 2.46$.

The information about the generation of the training set $\mathcal{T}_{FxFlA}$ is reported in Table 4.

Table 4. Training set TFxFlA generation details.

5.3.2. Artificial neural network training

Each training set $\mathcal{T}_{(\cdot)}$ stores a finite number of pairs $(\mathbf{p}, \mathbf{a})$, in which each $\mathbf{p}$ and each corresponding $\mathbf{a}$ are connected by the mapping $H$. The artificial neural network $\tilde{H}$ is used to approximate and generalize $H$ for inputs $\mathbf{p}$ not in $\mathcal{T}_{(\cdot)}$. The architecture of $\tilde{H}$ was initially chosen according to [Citation22,Citation25], and then suitably adjusted by heuristic arguments.

$\tilde{H}$ is a fully connected (or dense) ANN with five layers – one input, one output, and three hidden (HidL) – as illustrated in Figure 3. The input and output layers have numbers of units (neurons) – input size (InS) and output size (OutS) – coherent with the targeted problem (FxA, FxL, or FxFlA). Each hidden layer has the same hidden size (HidS) of 200 neurons, selected as the optimal one among different settings. ReLU (Rectified Linear Unit), defined as $\phi(x) := \max(x, 0)$ [Citation23], is the non-linear activation unit (ActUn) for each neuron. The loss function (LossFunc) is the Mean Squared Error (MSE) between the actual outputs $\mathbf{a}$ (available in $\mathcal{T}$) and the ones predicted by the ANN, $\tilde{H}(\mathbf{p})$. The optimization process is composed of 3,000 epochs (E). During each epoch, the major fraction (70%) of $\mathcal{T}$ (the actual training set) is ‘back-propagated’ through the ANN in batches of size 1024 (B). The stochastic gradient-based optimizer (Opt) Adam [Citation18] is employed in the optimization; in particular, the optimizer updates the ANN weights based on the gradient computed on each random batch (during each epoch). The initial learning rate (InitLR) is $10^{-3}$, with a decay rate (DecR) of 0.1 and a decay step (DecS) of 1,000 epochs. The details are reported in Table 5, and a minimal sketch is given below.
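A minimal PyTorch sketch consistent with the reported hyper-parameters (the function names are ours; `loader` is assumed to yield `(p, a)` batches of size 1024 from the 70% training split):

```python
import torch
import torch.nn as nn

def make_ann(in_size, out_size, hid=200):
    # Dense ReLU network matching Table 5: three hidden layers of 200 units.
    return nn.Sequential(
        nn.Linear(in_size, hid), nn.ReLU(),
        nn.Linear(hid, hid), nn.ReLU(),
        nn.Linear(hid, hid), nn.ReLU(),
        nn.Linear(hid, out_size),
    )

def train(model, loader, epochs=3000, lr=1e-3):
    # Adam with MSE loss and the decaying learning rate of Table 5
    # (factor 0.1 every 1,000 epochs).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.1)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for p_batch, a_batch in loader:
            opt.zero_grad()
            loss_fn(model(p_batch), a_batch).backward()
            opt.step()
        sched.step()
    return model
```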

Furthermore, during the optimization routine, 20% of $\mathcal{T}$ is used to validate the result (namely, to avoid overfitting of the training set). Eventually, the remaining 10% of $\mathcal{T}$ is used for testing the quality of the ANN. Figure 4 provides a visual insight into the high accuracy the ANN reaches at the end of the training process. Figure 4(a) shows the scatter plot of the real CVs $a_k$, $k = 1,\dots,21$, against the ones predicted by the ANN, for the experiment FxA; Figure 4(b) zooms in on the ‘worst’ case, namely the CV $a_{10}$, which nonetheless reaches the extremely high $R^2$ score of 0.9994.

Table 5. Artificial neural network and optimization details.

5.4. Sampling and pricing

Given the trained model from the previous section, we can now focus on the actual sampling and/or pricing of options. In particular, for the first two experiments, we consider the following payoffs:
$$\text{FxA:}\quad \max\big(\omega\,(A(S) - K_2),\, 0\big), \quad \frac{K_2}{S(t_0)} \in [0.8, 1.2], \quad (32)$$
$$\text{FxL:}\quad \max\big(\omega\,(A(S) - K_2),\, 0\big), \quad \frac{K_2}{S(t_0)} \in [0.8, 1.2], \quad (33)$$
whereas for the third, FxFlA, we have:
$$\max\big(\omega\,(A(S) - K_1 S(T) - K_2),\, 0\big), \quad \left(\frac{K_1}{S(t_0)}, \frac{K_2}{S(t_0)}\right) \in [0.4, 0.6] \times [0.4, 0.6], \quad (34)$$
with $A(S)$ defined as in (30) for FxA and FxFlA, and as in (31) for FxL.

All the results in the following sections are compared to a MC benchmark obtained using the almost exact simulation described in Appendix A.3.

5.4.1. Numerical results for FxA

The procedure described in Section 3.2 is employed to price fixed-strike discrete Asian options with payoffs as in (32), with underlying stock price initial value $S(t_0) = 1$. In this experiment, the ANN is trained on Heston model parameter ranges which include the examples proposed in [Citation1], representing some real applications. Furthermore, we note the following aspect.

Remark 5.1

Scaled underlying process and (positive) homogeneity of A

The unit initial value is not restrictive. Indeed, the underlying stock price dynamics in Equations (9) and (10) are independent of $S(t_0)$, with the initial value only acting as a multiplicative constant (this can easily be proved by means of Itô's lemma). Moreover, since $A(S)$ is (positively) homogeneous in $S$, $A(S)$ can easily be ‘scaled’ according to the desired initial value: for any constant $c > 0$, $c\,A(S) \overset{d}{=} A(cS)$.

The methodology is tested on different randomly chosen sets of Heston parameters. We report the details for two specific sets, Set I and Set II, available in Table 6. For Set I, in Figure 5, we compare the population of $A(S)$ obtained employing SC with the MC benchmark (both with $N_{paths} = 10^5$ paths). Figure 5(a) shows the highly accurate approximation of the exact map $g = F_A^{-1} \circ F_\xi$ by means of the piecewise polynomial approximation $\tilde{g}$. As a consequence, both the PDF (see Figure 5(b)) and the CDF (see Figure 5(c)) match perfectly. Moreover, the methodology is employed to value fixed-strike arithmetic Asian options (calls and puts) for two sets of parameters (Set I and Set II) and 50 different strikes $K_2$. The resulting prices are reported in Figure 6(a,c). Figure 6(b,d) displays the standard errors for MC and SC on the left y-axis, and the pricing error $\epsilon_P$ for SC and SA (in units of the corresponding MC standard error) on the right y-axis. The pricing error tends to be more significant the more out of the money the option is, due to the smaller SE.

The timing results are reported (in milliseconds) in Table 7, for different choices of $N_{paths}$. Both SC and SA times refer only to the online pricing computational time, namely the time required for the pricing procedure, excluding the training set generation and the ANN training, which are performed – only once – offline. The semi-analytic formula requires a constant evaluation time, as does the SC technique (for fixed $N_{paths}$), whereas the MC simulation time depends on the maturity $T$ (since we keep the same MC step in every simulation). Therefore, the methodology becomes more convenient the longer the maturity $T$ of the option. The option pricing computational time is reduced by tens of times when using SC to generate the population of $A$, and by hundreds of times if the semi-analytic (SA) formula is employed.

Table 6. Tested Heston parameter sets.

Figure 5. Left: comparison between maps g from MC and g~ from SC (with linear extrapolation). Center: comparison between MC histogram of A(S) and the numerical PDF from SC. Right: comparison between MC numerical CDF of A(S) and the numerical CDF from SC.


Figure 6. Left: fixed-strike discrete Asian option prices for 50 different strikes under the Heston model dynamics. Right: MC and SC standard errors (left y-axis) and pricing errors in units of the corresponding MC standard errors SE, obtained with $N_{paths} = 10^5$ for both MC and SC (right y-axis). Top: call with Set I. Bottom: put with Set II.


Table 7. Timing results for option pricing.

Eventually, the error distribution of 10,000 different option prices (one call and one put, with 100 values of $K_2$ each, for 50 randomly chosen sets of Heston parameters and maturities) is given in Figure 7(a). The SA prices (assuming a linear extrapolation) are compared with MC benchmarks. The outcome is satisfactory and shows the robustness of the proposed methodology. The error is smaller than three times the MC standard error in more than 90% of the cases when $N_{paths} = 10^5$ (red histogram), and in more than 80% of the cases when $N_{paths} = 2 \times 10^5$ (blue histogram).

5.4.2. Numerical results for FxL

In this section, we use the procedure to efficiently value the pipeline risk typically embedded in mortgages. The pipeline risk (in mortgages) is one of the risks a financial institution is exposed to whenever a client buys a mortgage: when a client decides to buy a mortgage, there is a grace period (from one to three months in The Netherlands) during which (s)he is allowed to pick the most convenient rate, namely the minimum one.

Figure 7. Pricing error ϵP distribution. The error is expressed in units of the corresponding Monte Carlo benchmark standard error (SE), and reported for two different numbers of MC paths. Left: FxA (10,000 values). Center: FxL (10,000 values). Right: FxFlA (54,000 values).


Observe now that a suitable Lookback option on a payer swap, namely a Lookback payer swaption, perfectly replicates the optionality offered to the client. In other words, the ‘cost’ of the pipeline risk is assessed by evaluating a proper Lookback swaption. In particular, we price fixed-strike discrete Lookback swaptions with a monitoring period of 3 months and a 3-day frequency (see Equation (31)).

We assume the underlying swap rate $S(t)$, $0 \leq t \leq T$, is driven by the dynamics given in Equations (9) and (10), with $S(t_0) = 0.05$ and a parallel shift of $\theta = 0.03$. By introducing a shift, we also handle the possible situation of negative rates, which would otherwise require a different model specification.

Remark 5.2

Parallel shift of S(t) and A(S)

A parallel shift $\theta$ does not affect the training set generation. Indeed, since $A(S) = \min_n S(t_n)$, it holds that $A(S - \theta) \overset{d}{=} A(S) - \theta$. Then, it is enough to sample from $A(S)$ (built from the paths of $S(t)$ without shift) and perform the shift afterwards to get the desired distribution.

The timing results from the application of the procedure are comparable to the ones in Section 5.4.1 (see Table 7). Furthermore, in Figure 7(b), we report the pricing error distribution obtained by pricing call and put options for 50 randomly chosen sets of Heston parameters and 100 values of $K_2$. In this experiment, we observe that over 95% of the errors are within three MC SE when $N_{paths} = 10^5$, and the percentage is about 90% when $N_{paths} = 2 \times 10^5$.

5.4.3. Numerical results for FxFlA

The third and last experiment consists of the conditional sampling of $A(S)\,|\,S(T)$. The samples are then used, together with $S(T)$, for the pricing of fixed- and floating-strike discrete Asian options.

The procedure is tested on 30 randomly chosen sets of Heston parameters. Both the MC benchmark and the SC procedure are based on populations of $N_{paths} = 10^5$ paths. In the SC pricing, the process $S(T)$ is sampled using the COS method [Citation24] combined with SC (COS-SC), to avoid a huge number of numerical CDF inversions [Citation12] and so increase efficiency. Then, we apply the grid-based algorithm of Section 3.3: we evaluate the ANN at a reduced number of reference quantiles, and we compute the CVs corresponding to each sample of $S(T)$ by means of linear interpolation. The CVs identify the map $\tilde{g}$, which is employed for the conditional sampling. Figure 8(a) shows the cloud of points (for parameter Set III in Table 6) of the bivariate distribution $(S(T), A(S))$ generated using the procedure against the MC benchmark, while Figure 8(b) focuses on the marginal distribution of $A(S)$. We can appreciate a good match between the two distributions.

Figure 8. Left: joint distribution of (S(T),A(S)), for the Heston set of parameters in Set III (see Table 6). Right: marginal distribution of A(S), for the same set of Heston parameters.

For each set, we price 30×30 call and put options with equally-spaced strikes K1 and K2 in the ranges of (34). The results for the particular case of call options with Set III in Table  and K2 = 0.5 are illustrated in Figure . The SC option prices and the corresponding MC benchmarks are plotted on the left. On the right, the standard errors for MC and SC are reported (left y-axis), and the absolute pricing error ϵP is shown in units of the corresponding MC standard error (right y-axis). The timing results, reported in Table  for Npaths = 10^5 and Npaths = 5×10^4, take into account the computational time for pricing all 900 different call options (one for each combination of K1 and K2). Figure (c) displays the pricing error ϵP distribution for the 30 randomly chosen Heston parameter sets (for each set, 900 call and 900 put options are priced, one for every combination of K1 and K2, for an overall total of 54,000 values). About 92% of the errors are within three MC SEs when Npaths = 5×10^4; the percentage is around 80% when Npaths = 10^5.

Figure 9. Left: fixed-float-strike discrete Asian call option prices for 30 different K1, K2=0.5, and Heston parameter Set III. Right: MC and SC standard errors (left y-axis) and pricing errors in units of the corresponding MC standard errors SE, obtained with Npaths=10^5 for both MC and SC (right y-axis).

It might look surprising that the general procedure performs better than the special one, but an important aspect must be accounted for. The high correlation between S(T) and A(S) makes the ANN's task easier, in the sense that the distribution of A(S)|S(T) typically has low variance around S(T). In other words, the ANN only has to 'learn' a small correction to obtain A(S)|S(T) from S(T) (recall that S(T) is an input of the ANN), whereas the ANN in the special procedure learns the unconditional distribution of A(S) with no information on the terminal value S(T), relying only on the Heston parameters. As a result, a small loss of accuracy due to an imperfect training process, or more likely an imperfect training set, is less significant in the conditional case than in the unconditional one.

6. Conclusion

In this work, we presented a robust, data-driven procedure for the pricing of fixed- and floating-strike discrete Asian and Lookback options under the stochastic volatility model of Heston. The use of Stochastic Collocation techniques combined with deep artificial neural networks allows the methodology to reach a high level of accuracy while reducing the computational time by tens of times compared to classical Monte Carlo benchmarks. Furthermore, we provided a semi-analytic pricing formula for European-type options whose payoff is a piecewise polynomial mapping of a standard normal random variable. This result increases the speed-up further, up to hundreds of times, without any deterioration in accuracy. An analysis of the error provides theoretical justification for the proposed scheme, and the problem of sampling from both unconditional and conditional distributions is further investigated from a numerical perspective. Finally, the numerical results provide clear evidence of the quality of the method.

Acknowledgments

We would like to thank the two anonymous reviewers whose insightful comments and constructive feedback greatly contributed to the enhancement of the quality of this article. Additionally, we extend our appreciation to Rabobank (the Netherlands) for funding this project.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 The meaning of ‘data-driven’ here is the one given in [Citation22]. The (empirical) distribution of interest is computed for a set of structural parameters and stored. Such synthetic ‘data’ are used to ‘drive’ the training of a suitable model.

2 We remark that in most real applications (with a few exceptions such as some commodities and FX rates) the correlation is negative. Furthermore, the phenomenon of ‘moment explosion’ for certain choices of Heston parameters involving positive correlation is discussed in [Citation2].

3 Given two random variables X, Y, with CDFs $F_X, F_Y$, it holds that $F_X(X) \overset{d}{=} F_Y(Y)$.

4 The two main reasons for ξ being standard normal are the availability of such a distribution in most computing tools, and the 'similarity' between a standard normal r.v. and (the logarithm of) A(S) (see [Citation12] for more details).

5 A recursive formula for the computation of mi(a,b) is given in Appendix A.2.

6 The synthetic data are generated via MC simulation, as explained in Section 5.

7 By stochastic bridge, we mean any stochastic process conditioned on both its initial and final values.

8 The PDF $f_\xi$ works as a damping factor in $\epsilon_- = \mathbb{E}\big[\big(g - 1_{\Omega_-}g_-\big)^2(\xi)\big]$ and $\epsilon_+ = \mathbb{E}\big[\big(g - 1_{\Omega_+}g_+\big)^2(\xi)\big]$.

9 Entire functions are complex analytic functions in the whole complex plane C.

10 $p_{\{T\}} = \{r, \kappa, \gamma, \rho, \bar{v}, v_0\}$ and $p_{\{T\}} = \{\kappa, \gamma, \rho, \bar{v}, v_0\}$ for FxA and FxL, respectively.

11 Under the risk-neutral measure Q. However, the same scheme applies under the underlying process measure $\mathbb{Q}^S$, with only a minor difference, i.e. $k_1 := (\rho\kappa/\gamma + 1/2)\Delta t - \rho/\gamma$.

References

  • L.B. Andersen, Efficient simulation of the Heston stochastic volatility model, J. Comput. Finance 11 (2007), pp. 1–42.
  • L.B. Andersen and V.V. Piterbarg, Moment explosions in stochastic volatility models, Finance Stoch. 11(1) (2007), pp. 29–50.
  • E. Benhamou, Fast Fourier transform for discrete Asian options, J. Comput. Finance 6(1) (2002), pp. 49–68.
  • S.V. Berezin, On analytic characteristic functions and processes governed by SDEs, St. Petersburg Polytech. University J.: Phys. Math. 2(2) (2016), pp. 144–149.
  • M. Broadie and Ö. Kaya, Exact simulation of stochastic volatility and other affine jump diffusion processes, Oper. Res. 54(2) (2006), pp. 217–231.
  • A. Conze, Path dependent options: The case of Lookback options, J. Finance 46(5) (1991), pp. 1893–1907.
  • S. Corsaro, I. Kyriakou, D. Marazzina, and Z. Marino, A general framework for pricing Asian options under stochastic volatility on parallel architectures, Eur. J. Oper. Res. 272(3) (2019), pp. 1082–1095.
  • J. Devreese, D. Lemmens, and J. Tempere, Path integral approach to Asian options in the Black-Scholes model, Physica A Stat. Mech. Appl. 389(4) (2010), pp. 780–788.
  • M. Gaß, K. Glau, M. Mahlstedt, and M. Mair, Chebyshev interpolation for parametric option pricing, Finance Stoch. 22 (2018), pp. 701–731.
  • M.B. Goldman, H.B. Sosin, and M.A. Gatto, Path dependent options: ‘Buy at the low, sell at the high’, J. Finance 34(5) (1979), pp. 1111–1127.
  • L.A. Grzelak and C.W. Oosterlee, From arbitrage to arbitrage-free implied volatilities, J. Comput. Finance 20(3) (2016), pp. 1–19.
  • L.A. Grzelak, J.A.S. Witteveen, M. Suárez-Taboada, and C.W. Oosterlee, The stochastic collocation Monte Carlo sampler: Highly efficient sampling from expensive distributions, Quant. Finance 19(2) (2019), pp. 339–356.
  • V. Henderson and R. Wojakowski, On the equivalence of floating- and fixed-strike Asian options, J. Appl. Probab. 39(2) (2002), pp. 391–394.
  • S.L. Heston, A closed-form solution for options with stochastic volatility with applications to bond and currency options, Rev. Financ. Stud. 6(2) (1993), pp. 327–343.
  • R.C. Heynen and H.M. Kat, Lookback options with discrete and partial monitoring of the underlying price, Appl. Math. Finance 2(4) (1995), pp. 273–284.
  • A. Kemna and A. Vorst, A pricing method for options based on average asset values, J. Bank. Financ. 14(1) (1990), pp. 113–129.
  • M.G. Kendall, The Advanced Theory of Statistics, Wiley, London (UK), 1945.
  • D.P. Kingma and J.L. Ba, Adam: A method for stochastic optimization, in Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, 2015.
  • J.L. Kirkby and D. Nguyen, Efficient Asian option pricing under regime switching jump diffusions and stochastic volatility models, Ann. Finance 16(3) (2020), pp. 307–351.
  • A. Leitao, L. Ortiz-Gracia, and E.I. Wagner, SWIFT valuation of discretely monitored arithmetic Asian options, J. Comput. Sci. 28 (2018), pp. 120–139.
  • A. Leitao Rodriguez, J. Lars Kirkby, and L. Ortiz-Gracia, The CTMC–Heston model: Calibration and exotic option pricing with SWIFT, J. Comput. Finance 24(4) (2021), pp. 71–114.
  • S. Liu, L.A. Grzelak, and C.W. Oosterlee, The seven-league scheme: Deep learning for large time step Monte Carlo simulations of stochastic differential equations, Risks 10(3) (2022), p. 47.
  • C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, Activation functions: Comparison of trends in practice and research for deep learning, arXiv e-prints, 2018.
  • C.W. Oosterlee and L.A. Grzelak, Mathematical Modeling and Computation in Finance, World Scientific Publishing Europe Ltd., London, 2019.
  • L. Perotti and L.A. Grzelak, Fast sampling from time-integrated bridges using deep learning, J. Comput. Math. Data Sci. 5 (2022) p. 100060.
  • L.N. Trefethen, Approximation Theory and Approximation Practice, Extended Edition, Society for Industrial and Applied Mathematics, Philadelphia (US), 2019.
  • J. Vecer, Unified pricing of Asian options, Risk 15(6) (2002), pp. 113–116.
  • P. Wilmott, J. Dewynne, and S. Howison, Option Pricing: Mathematical Models and Computation, Oxford Financial Press, Oxford, 1993.
  • S. Xiang, X. Chen, and H. Wang, Error bounds for approximation in Chebyshev points, Numer. Math. 116 (2010), pp. 463–491.
  • D. Yarotsky, Error bounds for approximations with Deep ReLU networks, Neural Netw. 94 (2017), pp. 103–114.
  • B. Zhang and C.W. Oosterlee, Efficient pricing of European-style Asian options under exponential Lévy processes based on Fourier Cosine expansions, SIAM J. Financ. Math. 4 (2013), pp. 399–426.

Appendix

Proofs and lemmas

A.1. Underlying process measure for floating-strike options

Proof of Proposition 2.1

Under the risk-neutral measure $\mathbb{Q}$, the value at time $t_0 \geq 0$ of a floating-strike Asian option, with maturity $T > t_0$, underlying process $S(t)$, and future monitoring dates $t_n$, $n \in \{1, \dots, N\}$, is given by:
\[ V^{\mathrm{fl}}_\omega(t_0) = \mathbb{E}^{\mathbb{Q}}_{t_0}\Big[\frac{M(t_0)}{M(T)}\max\big(\omega(A(S) - K_1 S(T)), 0\big)\Big]. \]
We define a Radon-Nikodym derivative to change the measure from the risk-neutral measure $\mathbb{Q}$ to the stock measure $\mathbb{Q}^S$, namely the measure associated with the numéraire $S$:
\[ \frac{d\mathbb{Q}^S}{d\mathbb{Q}} = \frac{S(T)}{S(t_0)}\frac{M(t_0)}{M(T)}, \]
which yields the following present value, expressed as an expectation under the measure $\mathbb{Q}^S$:
\[ V^{\mathrm{fl}}_\omega(t_0) = \mathbb{E}^{S}_{t_0}\Big[\frac{M(t_0)}{M(T)}\max\big(\omega(A(S) - K_1 S(T)), 0\big)\frac{S(t_0)}{S(T)}\frac{M(T)}{M(t_0)}\Big] = S(t_0)\,\mathbb{E}^{S}_{t_0}\Big[\max\Big(\omega\Big(\frac{A(S)}{S(T)} - K_1\Big), 0\Big)\Big]. \]

Proposition Appendix A.1

The Heston model under the underlying process measure

Using the same notation as in (9) and (10), under the stock $S(t)$ measure, $\mathbb{Q}^S$, the Heston framework yields the following dynamics for the process $S(t)$:
\[ dS(t) = (r + v(t))S(t)\,dt + \sqrt{v(t)}\,S(t)\,dW^S_x(t), \quad S(t_0) = S_0, \]
\[ dv(t) = \hat\kappa(\hat{\bar v} - v(t))\,dt + \gamma\sqrt{v(t)}\,dW^S_v(t), \quad v(t_0) = v_0, \]
with $\hat\kappa = \kappa - \gamma\rho$, $\hat{\bar v} = \kappa\bar v/\hat\kappa$, and where $W^S_x(t)$ and $W^S_v(t)$ are BMs under the underlying process measure $\mathbb{Q}^S$ such that $dW^S_x(t)\,dW^S_v(t) = \rho\,dt$.

Proof.

Under the stock measure $\mathbb{Q}^S$, implied by the stock $S(t)$ as numéraire, all the assets discounted with $S$ must be martingales. In particular, this entails that $M(t)/S(t)$ must be a martingale, where $M(t)$ is the money-savings account defined by $dM(t) = rM(t)\,dt$.

From (9) and (10), using the Cholesky decomposition, the Heston model can be expressed in terms of independent Brownian motions, $\widetilde W_x(t)$ and $\widetilde W_v(t)$, through the following system of SDEs:
\[ dS(t) = rS(t)\,dt + \sqrt{v(t)}\,S(t)\,d\widetilde W_x(t), \]
\[ dv(t) = \kappa(\bar v - v(t))\,dt + \gamma\sqrt{v(t)}\big[\rho\,d\widetilde W_x(t) + \sqrt{1-\rho^2}\,d\widetilde W_v(t)\big]. \]
After application of Itô's lemma we find:
\[ d\frac{M(t)}{S(t)} = \frac{1}{S(t)}rM(t)\,dt - \frac{M(t)}{S^2(t)}\big(rS(t)\,dt + \sqrt{v(t)}\,S(t)\,d\widetilde W_x(t)\big) + \frac{M(t)}{S^3(t)}v(t)S^2(t)\,dt, \]
which implies the following measure transformation:
\[ d\widetilde W_x(t) = d\widetilde W^S_x(t) + \sqrt{v(t)}\,dt. \]
Thus, under the stock measure $\mathbb{Q}^S$, the dynamics of $S(t)$ reads:
\[ dS(t) = rS(t)\,dt + \sqrt{v(t)}\,S(t)\big(d\widetilde W^S_x(t) + \sqrt{v(t)}\,dt\big) = (r + v(t))S(t)\,dt + \sqrt{v(t)}\,S(t)\,d\widetilde W^S_x(t), \]
while for the dynamics of $v(t)$ we find:
\[ dv(t) = \kappa(\bar v - v(t))\,dt + \gamma\sqrt{v(t)}\big[\rho\big(d\widetilde W^S_x(t) + \sqrt{v(t)}\,dt\big) + \sqrt{1-\rho^2}\,d\widetilde W_v(t)\big] = \big[\kappa(\bar v - v(t)) + \gamma\rho v(t)\big]dt + \gamma\sqrt{v(t)}\big[\rho\,d\widetilde W^S_x(t) + \sqrt{1-\rho^2}\,d\widetilde W_v(t)\big]. \]
Setting $\hat\kappa := \kappa - \gamma\rho$, $\hat{\bar v} := \kappa\bar v/\hat\kappa$, $W^S_x(t) := \widetilde W^S_x(t)$, and $W^S_v(t) := \rho\,\widetilde W^S_x(t) + \sqrt{1-\rho^2}\,\widetilde W_v(t)$, the proof is complete.

A.2. Semi-analytic pricing formula

Result Appendix A.1

Moments of truncated standard normal distribution

Let $\xi \sim \mathcal{N}(0,1)$ and $a, b \in [-\infty, +\infty]$, $a < b$. Then, the recursive expression for
\[ m_i(a,b) := \mathbb{E}\big[\xi^i \,\big|\, a \leq \xi \leq b\big], \]
the $i$th moment of the truncated standard normal distribution $\xi \,|\, a \leq \xi \leq b$, reads:
\[ m_i(a,b) = (i-1)\,m_{i-2}(a,b) - \frac{b^{i-1}f_\xi(b) - a^{i-1}f_\xi(a)}{F_\xi(b) - F_\xi(a)}, \quad i \in \mathbb{N}\setminus\{0\}, \]
where $m_{-1}(a,b) := 0$, $m_0(a,b) := 1$, and $f_\xi$ and $F_\xi$ are the PDF and the CDF of $\xi$, respectively.
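A direct implementation of the recursion is straightforward. The sketch below is ours (relying on scipy for the standard normal PDF/CDF) and cross-checks the resulting moments against numerical quadrature.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def trunc_norm_moments(a, b, i_max):
    """m_i(a, b) = E[xi^i | a <= xi <= b], xi ~ N(0,1), via the recursion of
    Result A.1; the endpoints a, b may be -np.inf / +np.inf."""
    Z = norm.cdf(b) - norm.cdf(a)
    def edge(x, i):
        # boundary term x^{i-1} f_xi(x); it vanishes at infinite endpoints
        return 0.0 if np.isinf(x) else x ** (i - 1) * norm.pdf(x)
    m = {-1: 0.0, 0: 1.0}
    for i in range(1, i_max + 1):
        m[i] = (i - 1) * m[i - 2] - (edge(b, i) - edge(a, i)) / Z
    return m

# cross-check against numerical quadrature on a finite interval
a, b = -0.5, 2.0
m = trunc_norm_moments(a, b, 4)
Z = norm.cdf(b) - norm.cdf(a)
for i in range(5):
    num, _ = quad(lambda x, i=i: x**i * norm.pdf(x), a, b)
    assert np.isclose(m[i] * Z, num)
```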

Result Appendix A.2

Expectation of polynomial of truncated normal distribution

Let $p(x) = \sum_{i=0}^{M-1}\alpha_i x^i$ be an $(M-1)$-degree polynomial and let $\xi \sim \mathcal{N}(0,1)$, with $f_\xi$, $F_\xi$ its PDF and CDF, respectively. Then, for any $a, b \in [-\infty, +\infty]$ with $a < b$, the following holds:
\[ \int_a^b p(x)f_\xi(x)\,dx = \sum_{i=0}^{M-1}\alpha_i\, m_i(a,b)\,\big(F_\xi(b) - F_\xi(a)\big), \tag{A1} \]
with $m_i(a,b)$ as defined in Result Appendix A.1.

Proof.

The proof immediately follows from the equalities:
\[ \int_a^b p(x)f_\xi(x)\,dx = \sum_{i=0}^{M-1}\alpha_i\int_a^b x^i f_\xi(x)\,dx = \sum_{i=0}^{M-1}\alpha_i\,\mathbb{E}\big[\xi^i 1_{[a,b]}(\xi)\big] = \sum_{i=0}^{M-1}\alpha_i\,\mathbb{E}\big[\xi^i \,\big|\, a \leq \xi \leq b\big]\,\mathbb{P}[a \leq \xi \leq b]. \tag{A2} \]

Proof of Proposition 3.2

The approximation $\widetilde g$ is strictly increasing in the domain of interest. Then, setting $c_K = \widetilde g^{-1}(K)$, we have:
\[ \frac{\widetilde V_\omega(t_0)}{C} = \int_{-\infty}^{+\infty}\max\big(\omega(\widetilde g(x) - K), 0\big)f_\xi(x)\,dx = \int_{\omega c_K}^{+\infty}\omega\big(\widetilde g(\omega y) - K\big)f_\xi(\omega y)\,dy = \omega\Big(\int_{\omega c_K}^{+\infty}\widetilde g(\omega y)f_\xi(y)\,dy - K\,\mathbb{P}[\xi > \omega c_K]\Big), \tag{A3} \]
where the first equality holds by definition of expectation, the second one relies on a suitable change of variable ($y = \omega x$, reducing to $y = -x$ in the put case), and the last one holds thanks to the even symmetry of $f_\xi$. We define the integral $I_\omega(c_K)$ as:
\[ I_\omega(c_K) := \int_{\omega c_K}^{+\infty}\widetilde g(\omega x)f_\xi(x)\,dx, \]
and using the definition of $\widetilde g$ as a piecewise polynomial, we get:
\[ I_\omega(c_K) = \int_{\omega c_K}^{(-\bar\xi)\vee\omega c_K} g_{-\omega}(\omega x)f_\xi(x)\,dx + \int_{(-\bar\xi)\vee\omega c_K}^{\bar\xi\vee\omega c_K} g_M(\omega x)f_\xi(x)\,dx + \int_{\bar\xi\vee\omega c_K}^{+\infty} g_{\omega}(\omega x)f_\xi(x)\,dx. \tag{A4} \]
The thesis follows by applying Result Appendix A.2 to each term in (A4) and exploiting the definition of $F_\xi$.
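To illustrate how (A3)-(A4) combine with Results Appendix A.1-A.2, the sketch below prices a call semi-analytically for a toy map given by a single strictly increasing polynomial (so the three-piece split in (A4) collapses to a single term). The coefficients and strike are arbitrary choices of ours, cross-checked by Monte Carlo.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# toy map: a strictly increasing cubic standing in for the piecewise g~
alpha = np.array([0.05, 0.02, 0.0, 0.004])   # g(x) = 0.05 + 0.02x + 0.004x^3
g = lambda x: np.polyval(alpha[::-1], x)

def call_price(K):
    cK = brentq(lambda x: g(x) - K, -20.0, 20.0)   # c_K = g^{-1}(K)
    Z = 1.0 - norm.cdf(cK)                         # P[xi > c_K]
    # moments m_i(c_K, +inf) via the recursion of Result A.1
    m = {-1: 0.0, 0: 1.0}
    for i in range(1, len(alpha)):
        m[i] = (i - 1) * m[i - 2] + cK ** (i - 1) * norm.pdf(cK) / Z
    # E[(g(xi) - K)^+] = sum_i alpha_i m_i(c_K, inf) Z - K Z, by Result A.2
    return sum(alpha[i] * m[i] for i in range(len(alpha))) * Z - K * Z

K = 0.06
xi = np.random.default_rng(0).standard_normal(10**6)
mc = np.maximum(g(xi) - K, 0.0).mean()             # Monte Carlo cross-check
print(f"semi-analytic: {call_price(K):.6f}, MC: {mc:.6f}")
```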

A.3. Almost exact simulation from the Heston model

In a MC framework, the most common scheme employed in the industry is the Euler-Maruyama discretization of the system of SDEs describing the underlying dynamics. For the stochastic volatility model of Heston, such a scheme can be improved by simulating the variance process v(t) (see (10)) exactly, as shown in [Citation5]. This increases accuracy and avoids the numerical issues that arise when a discretized variance path violates the theoretical non-negativity of v(t), leading to the so-called almost exact simulation of the Heston model [Citation24].

Result Appendix A.3

Almost exact simulation from the Heston model

Given $X(t) := \log S(t)$, its dynamics (see footnote 11) between the consecutive times $t_i$ and $t_{i+1}$ is discretized with the following scheme:
\[ x_{i+1} \approx x_i + k_0 + k_1 v_i + k_2 v_{i+1} + \sqrt{k_3 v_i}\,\xi, \qquad v_{i+1} = \bar c\,\chi^2(\delta, \bar\kappa v_i), \tag{A5} \]
with the quantities:
\[ \Delta t := t_{i+1} - t_i, \quad \delta := \frac{4\kappa\bar v}{\gamma^2}, \quad \bar c := \frac{\bar v}{\delta}\big(1 - e^{-\kappa\Delta t}\big), \quad \bar\kappa := \bar c^{-1}e^{-\kappa\Delta t}, \]
the noncentral chi-squared random variable $\chi^2(\delta, \eta)$ with $\delta$ degrees of freedom and non-centrality parameter $\eta$, and $\xi \sim \mathcal{N}(0,1)$. The remaining constants are defined as:
\[ k_0 := \Big(r - \frac{\rho\kappa\bar v}{\gamma}\Big)\Delta t, \quad k_1 := \Big(\frac{\rho\kappa}{\gamma} - \frac{1}{2}\Big)\Delta t - \frac{\rho}{\gamma}, \quad k_2 := \frac{\rho}{\gamma}, \quad k_3 := (1-\rho^2)\Delta t. \]

Derivation.

Given $X(t) = \log S(t)$, by applying Itô's lemma and the Cholesky decomposition to the dynamics in (9) and (10), we get:
\[ dX(t) = \Big(r - \frac{1}{2}v(t)\Big)dt + \sqrt{v(t)}\Big[\rho\,d\widetilde W_v(t) + \sqrt{1-\rho^2}\,d\widetilde W_x(t)\Big], \tag{A6} \]
\[ dv(t) = \kappa(\bar v - v(t))\,dt + \gamma\sqrt{v(t)}\,d\widetilde W_v(t), \tag{A7} \]
where $\widetilde W_x(t)$ and $\widetilde W_v(t)$ are independent BMs.

By integrating (A6) and (A7) over the time interval $[t_i, t_{i+1}]$, the following discretization scheme is obtained:
\[ x_{i+1} = x_i + \int_{t_i}^{t_{i+1}}\Big(r - \frac{1}{2}v(t)\Big)dt + \rho\int_{t_i}^{t_{i+1}}\sqrt{v(t)}\,d\widetilde W_v(t) + \sqrt{1-\rho^2}\int_{t_i}^{t_{i+1}}\sqrt{v(t)}\,d\widetilde W_x(t), \tag{A8} \]
\[ v_{i+1} = v_i + \kappa\int_{t_i}^{t_{i+1}}(\bar v - v(t))\,dt + \gamma\int_{t_i}^{t_{i+1}}\sqrt{v(t)}\,d\widetilde W_v(t), \tag{A9} \]
where $x_i := X(t_i)$, $x_{i+1} := X(t_{i+1})$, $v_i := v(t_i)$, $v_{i+1} := v(t_{i+1})$.

Given $v_i$, the variance $v_{i+1}$ is distributed as a suitably scaled noncentral chi-squared random variable [Citation24]. Therefore, we substitute $\int_{t_i}^{t_{i+1}}\sqrt{v(t)}\,d\widetilde W_v(t)$ in (A8) using (A9), ending up with:
\[ x_{i+1} = x_i + \int_{t_i}^{t_{i+1}}\Big(r - \frac{1}{2}v(t)\Big)dt + \frac{\rho}{\gamma}\Big(v_{i+1} - v_i - \kappa\int_{t_i}^{t_{i+1}}(\bar v - v(t))\,dt\Big) + \sqrt{1-\rho^2}\int_{t_i}^{t_{i+1}}\sqrt{v(t)}\,d\widetilde W_x(t). \]
We approximate the remaining integrals employing the value of the integrand at the left integration boundary, as in the Euler-Maruyama discretization scheme. The scheme (A5) follows by collecting the terms and employing the property $\widetilde W_x(t_{i+1}) - \widetilde W_x(t_i) \overset{d}{=} \sqrt{\Delta t}\,\xi$, with $\xi \sim \mathcal{N}(0,1)$ and $\Delta t := t_{i+1} - t_i$.
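For completeness, a minimal NumPy sketch of scheme (A5) follows; the function name and parameter values are illustrative, and the code targets the risk-neutral dynamics (footnote 11 gives the variant of k1 under QS).

```python
import numpy as np

def heston_aes_paths(s0, v0, r, kappa, vbar, gamma, rho, T, n_steps, n_paths, seed=0):
    """Almost exact simulation, scheme (A5): exact noncentral chi-squared step
    for the variance, Euler-type step for log S given (v_i, v_{i+1})."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    delta = 4.0 * kappa * vbar / gamma**2
    cbar = vbar / delta * (1.0 - np.exp(-kappa * dt))
    kbar = np.exp(-kappa * dt) / cbar
    k0 = (r - rho * kappa * vbar / gamma) * dt
    k1 = (rho * kappa / gamma - 0.5) * dt - rho / gamma
    k2 = rho / gamma
    k3 = (1.0 - rho**2) * dt
    x = np.full(n_paths, np.log(s0))
    v = np.full(n_paths, v0)
    for _ in range(n_steps):
        v_next = cbar * rng.noncentral_chisquare(delta, kbar * v)
        x = x + k0 + k1 * v + k2 * v_next + np.sqrt(k3 * v) * rng.standard_normal(n_paths)
        v = v_next
    return np.exp(x), v

# example run with arbitrary parameters
S_T, v_T = heston_aes_paths(1.0, 0.04, 0.02, 1.5, 0.06, 0.4, -0.7, 1.0, 250, 100_000)
```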

A.4. SC error analysis for Chebyshev interpolation

The two following lemmas are useful to show that the conditional complex ChF $\phi_{A|V}(z) = \mathbb{E}[e^{izA}\,|\,V]$ is an analytic function of $z \in \mathbb{C}$. The first provides the law of the conditional stock-price distribution, whereas the second gives algebraic bounds for the target function $A(S)$.

Lemma Appendix A.2

Conditional distribution under Heston

Let $S(t)$ be the solution at time $t$ of Equation (9) and $I_v(t) := \int_{t_0}^t v(\tau)\,d\tau$, with $v$ driven by the dynamics in Equation (10). Then, the following equality in distribution holds:
\[ S(t)\,\big|\,I_v(t) \overset{d}{=} \exp\big(\mu(I_v(t)) + \sigma(I_v(t))\,\xi\big), \]
with $\xi \sim \mathcal{N}(0,1)$, and $\mu$ and $\sigma$ defined as $\mu(y) := \log S(t_0) + r(t - t_0) - y/2$ and $\sigma(y) := \sqrt{y}$. Furthermore, for any $k = 0, 1, \dots$, the following holds:
\[ \mathbb{E}\big[S(t)^k \,\big|\, I_v(t)\big] = \exp\Big(k\mu(I_v(t)) + \frac{1}{2}k^2\sigma^2(I_v(t))\Big). \tag{A10} \]
In other words, the stock price given the time-integral of the variance process $I_v$ is log-normally distributed, with parameters depending on $I_v$, and its moments up to any order are given by Equation (A10).

Proof.

Writing (9) in integral form we get:
\[ S(t) = S(t_0)\exp\Big(r(t - t_0) - \frac{1}{2}\int_{t_0}^t v(\tau)\,d\tau + \int_{t_0}^t \sqrt{v(\tau)}\,dW_x(\tau)\Big). \]
Considering the conditional distribution $S(t)\,|\,I_v(t)$ (instead of $S(t)$), the only source of randomness is the Itô integral (due to the presence of the Brownian motion $W_x(t)$). The thesis follows since the Itô integral of a deterministic integrand is normally distributed with zero mean and variance given by the time integral of the squared integrand (over the same interval). Therefore, $S(t)\,|\,I_v(t)$ is log-normally distributed, with moments given by (A10).
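A quick numerical sanity check of (A10) is possible in the degenerate case of a deterministic (here constant) variance path, where the conditioning is trivial; the parameter values below are arbitrary.

```python
import numpy as np

# With v(t) = v0 constant, I_v(t) = v0 * (t - t0) is known, and S(t) is
# log-normal with the parameters mu, sigma of Lemma A.2.
rng = np.random.default_rng(2)
S0, r, v0, t = 1.0, 0.03, 0.09, 2.0
Iv = v0 * t
mu, sig = np.log(S0) + r * t - 0.5 * Iv, np.sqrt(Iv)
S_t = np.exp(mu + sig * rng.standard_normal(10**6))
for k in range(1, 4):
    analytic = np.exp(k * mu + 0.5 * k**2 * sig**2)   # Equation (A10)
    assert abs((S_t**k).mean() / analytic - 1.0) < 0.05
```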

Lemma Appendix A.3

Algebraic bounds

Let us consider $\{s_1, \dots, s_N\}$, with $s_n > 0$ for each $n = 1, \dots, N$. Then, for any $k = 1, 2, \dots$, we have:

(1) $\big(\sum_n s_n\big)^k \leq 2^{(N-1)(k-1)}\sum_n s_n^k$;

(2) $\big(\min_n s_n\big)^k \leq s_n^k$ for any $n = 1, \dots, N$.

Proof.

The second claim is obvious. We prove here the first one. We recall that, given $a, b > 0$ and any $k = 1, 2, \dots$, the following inequality holds:
\[ (a+b)^k \leq 2^{k-1}(a^k + b^k). \tag{A11} \]
Then, applying (A11) $N-1$ times we get:
\[ \Big(\sum_{n=1}^N s_n\Big)^k \leq 2^{k-1}\Big(s_1^k + \Big(\sum_{n=2}^N s_n\Big)^k\Big) \leq \dots \leq 2^{(N-1)(k-1)}s_N^k + \sum_{n=1}^{N-1} 2^{n(k-1)}s_n^k, \]
which can be further bounded by:
\[ \Big(\sum_{n=1}^N s_n\Big)^k \leq 2^{(N-1)(k-1)}s_N^k + \sum_{n=1}^{N-1} 2^{(N-1)(k-1)}s_n^k = \sum_{n=1}^N 2^{(N-1)(k-1)}s_n^k. \]
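The bounds are elementary but cheap to verify numerically; the short check below exercises both claims on a random positive sample.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(0.1, 2.0, size=5)   # N = 5 positive numbers
N = len(s)
for k in range(1, 6):
    assert s.sum() ** k <= 2 ** ((N - 1) * (k - 1)) * (s ** k).sum()   # claim (1)
    assert s.min() ** k <= (s ** k).min()                              # claim (2)
print("Lemma A.3 bounds verified")
```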

We have all the ingredients to prove Proposition 4.4.

Proof of Proposition 4.4

To exploit the characterization of entire ChFs in Theorem 4.3, we need to show the finiteness of each absolute moment, as well as that Equation (25) is satisfied. Both conditions can be proved using Lemmas Appendix A.2 and A.3. For $k = 0, 1, \dots$, we consider two cases:

  1. If $A = \frac{1}{N}\sum_n S(t_n)$, then thanks to Lemma Appendix A.3 we have:
\[ \mathbb{E}\big[|A|^k \,\big|\, V\big] = \frac{1}{N^k}\,\mathbb{E}\Big[\Big(\sum_n S(t_n)\Big)^{k} \,\Big|\, V\Big] \leq \frac{2^{(N-1)(k-1)}}{N^k}\,\mathbb{E}\Big[\sum_n S(t_n)^k \,\Big|\, V\Big] = \frac{2^{(N-1)(k-1)}}{N^k}\sum_n \mathbb{E}\big[S(t_n)^k \,\big|\, V\big], \]
whereas from Lemma Appendix A.2 it follows that:
\[ \mathbb{E}\big[|A|^k \,\big|\, V\big] \leq \frac{2^{(N-1)(k-1)}}{N^k}\sum_n \mathbb{E}\big[S(t_n)^k \,\big|\, V\big] \tag{A12} \]
\[ = \frac{2^{(N-1)(k-1)}}{N^k}\sum_n \exp\Big(k\mu_n(V) + \frac{1}{2}k^2\sigma_n^2(V)\Big), \tag{A13} \]
where $\mu_n(V) := \mu(I_v(t_n))$ and $\sigma_n(V) := \sigma(I_v(t_n))$, $n = 1, \dots, N$.

  2. If $A = \min_n S(t_n)$, then we immediately have:
\[ \mathbb{E}\big[|A|^k \,\big|\, V\big] \leq \mathbb{E}\big[S(t_n)^k \,\big|\, V\big] \tag{A14} \]
\[ = \exp\Big(k\mu_n(V) + \frac{1}{2}k^2\sigma_n^2(V)\Big), \tag{A15} \]
for an arbitrary $n = 1, \dots, N$.

The finiteness of the absolute moments up to any order follows directly from Equations (A13) and (A15), respectively, since the $I_v(t_n)$ are finite (indeed, they are time-integrals of continuous paths over compact intervals).

Finally, thanks to Jensen's inequality, we have $|\mathbb{E}[A\,|\,V]|^k \leq \mathbb{E}[|A|^k\,|\,V]$. This, together with the at-most exponential growth (in $k$) of the absolute moments of $A|V$, ensures that the limit in Equation (25) holds. Then, by Theorem 4.3, $\phi_{A|V}(z)$ is an entire function of the complex variable $z \in \mathbb{C}$.

Proof of Proposition 4.5

The goal here is to apply Morera's theorem. Hence, let $\gamma \subset S_y$ be any piecewise $C^1$ closed curve in the strip $S_y$. Then:
\[ \oint_\gamma \phi_A(z)\,dz \overset{(26)}{=} \oint_\gamma \int_{\Omega_V} \phi_{A|V=v}(z)\,dF_V(v)\,dz \overset{\text{Fubini}}{=} \int_{\Omega_V} \oint_\gamma \phi_{A|V=v}(z)\,dz\,dF_V(v) \overset{\text{Cauchy}}{=} 0, \]
where in the first equality we exploit the representation (26) of the unconditional ChF $\phi_A$ in terms of the conditional ChFs $\phi_{A|V}$, in the second we use Fubini's theorem to exchange the order of integration, and in the last we apply Cauchy's integral theorem to $\oint_\gamma \phi_{A|V=v}(z)\,dz$.