Abstract
In this paper we reformulate automatic differentiation (in particular backward automatic differentiation, also known as adjoint automatic differentiation, AAD) for random variables. While this is only a formal re-interpretation, it allows one to investigate the algorithms in the presence of stochastic operators like expectation, conditional expectation or indicator functions.
We then specify the algorithms to efficiently incorporate non-pathwise operators (like conditional expectation operators). Under a comparatively mild assumption it is possible to retain the simplicity of the backward automatic differentiation algorithm in the presence of conditional expectation operators. This simplifies important applications, like, in mathematical finance, the application of backward automatic differentiation to the valuation of Bermudan options or the calculation of xVAs.
We give the proof for a generalized version of the result. We then discuss in detail how the framework allows a dramatic reduction of the memory requirements and improves the performance of a tapeless implementation of automatic differentiation: while the implementation brings advantages similar to ‘vector AAD’ (sometimes called tape compression) for free, it allows improvements beyond this. We present the implementation aspects and show how concepts from object-functional programming, like immutable objects and lazy evaluation, enable additional reductions of the memory requirements.
Acknowledgments
We are grateful to Amine Chaieb and Marco Noll for stimulating discussions.
Disclosure statement
No potential conflict of interest was reported by the author.
Notes
† On a 64-bit system a memory address requires the same storage space as an IEEE 754 double-precision floating-point number.
† Although this assumption may be non-trivial in general, it is trivial if we consider Ω to be a discrete (Monte-Carlo) sampling space.
‡ Speaking of matrices: for derivatives, right-multiplication is replaced by left-multiplication, but for pathwise operators the derivative is self-adjoint.
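To make this concrete (our notation, assuming a finite (Monte-Carlo) sample space Ω = {ω_1, …, ω_n}): a pathwise operator A multiplies each realization by a path-dependent factor, so its matrix representation is diagonal and hence symmetric,

\[
(A x)(\omega_k) \;=\; a(\omega_k)\, x(\omega_k), \qquad
A \;=\; \mathrm{diag}\big(a(\omega_1), \ldots, a(\omega_n)\big) \;=\; A^{\top} .
\]

A diagonal matrix equals its transpose, which is the sense in which the derivative of a pathwise operator is self-adjoint.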
‡ For example (possibly slightly oversimplified): for small x we have that 1+x is a good approximation of exp(x), and … is still a comparably good approximation for …, but … is maybe not a good approximation for … .
† Here we refer to the result stated in Section 1.1.1, where … refers to the memory requirement of the valuation and … refers to the computation time requirement of the valuation.
† In so-called ‘managed’ languages, like Java, the virtual machine will free the memory used to store an object once no other references to the object are held (cyclic references may be detected and removed).
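As a minimal sketch of why this matters for a tapeless implementation (a hypothetical class for illustration only, not the finmath.net API): operations on an immutable random variable return new objects, and intermediate results become eligible for garbage collection as soon as they are no longer referenced.

```java
// Hypothetical illustration, not the finmath.net API: an immutable
// random variable holding one realization per Monte-Carlo path.
// Every operation returns a new object; once an intermediate result is
// no longer referenced, the JVM may reclaim its memory automatically.
public final class SimpleRandomVariable {

    private final double[] values;   // realizations, one per path

    public SimpleRandomVariable(double[] values) {
        this.values = values.clone();   // defensive copy keeps the object immutable
    }

    public SimpleRandomVariable add(SimpleRandomVariable other) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] + other.values[i];
        }
        return new SimpleRandomVariable(result);
    }

    public SimpleRandomVariable mult(double scalar) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] * scalar;
        }
        return new SimpleRandomVariable(result);
    }

    public double average() {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    public static void main(String[] args) {
        SimpleRandomVariable x = new SimpleRandomVariable(new double[] { 1.0, 2.0, 3.0 });
        // The intermediate result of x.mult(2.0) becomes unreachable after
        // this statement and is eligible for garbage collection.
        SimpleRandomVariable y = x.mult(2.0).add(x);
        System.out.println(y.average());   // prints 6.0
    }
}
```

Note that no explicit deallocation or tape management appears anywhere; the reachability of objects alone determines which intermediate results survive.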
‡ This design has been utilized in finmath.net since its first release (2004).
† The backward algorithm constructs the Snell envelope.
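For reference (standard notation; T_1 < … < T_N denote the exercise dates, V_{T_i} the exercise value at T_i, and U the Snell envelope): the backward induction reads

\[
U_{T_N} \;=\; V_{T_N}, \qquad
U_{T_i} \;=\; \max\Big( V_{T_i},\; \mathbb{E}\big( U_{T_{i+1}} \,\big|\, \mathcal{F}_{T_i} \big) \Big),
\quad i = N-1, \ldots, 1 ,
\]

so each backward step applies a conditional expectation operator, which is where the treatment of non-pathwise operators enters.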
‡ Since the method does not use a shift, there is no dependency on the shift size and we get a horizontal line.
† We use a LIBOR Market Model in normal specification, i.e. with dynamics dL_i(t) = σ_i(t) dW_i(t), where L_i(t) denotes the forward rate for the period [T_i, T_{i+1}] observed at time t, and the model vegas are the partial derivatives with respect to σ_i.
‡ The results may be reproduced by running the unit test LIBORMarketModelNormalAADSensitivitiesTest in finmath.net.
14 It is this random variable which is propagated by the algorithm in Theorem 1, and the initial value is the unit random variable 1.