Abstract
We consider the problem of execution timing in optimal execution. Specifically, we formulate the optimal execution problem of an infinitesimal order as an optimal stopping problem. Using a novel neural network architecture, we develop two data-driven approaches to this problem, one based on supervised learning and the other based on reinforcement learning. Temporal difference learning can be applied to both, extending these two methods to many variants. Through numerical experiments on historical market data, we demonstrate that these methods achieve significant cost reductions. Insights from the numerical experiments reveal various tradeoffs in the use of temporal difference learning, including convergence rates, data efficiency, and a tradeoff between bias and variance.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 This assumption is justifiable in our setting as the execution horizon is typically quite short, and might be measured in seconds to minutes. Over such short time horizons, non-stationarity can be ignored. Beyond this, note that the time of the day is also included as a state variable so the price dynamics allow for time-of-day effects even though they are stationary.
2 In this paper, our Monte Carlo updates utilize empirical samples and do not require a generative model as in typical Monte Carlo simulations.
3 The results in the figure use the RNN architecture described in appendix 4. Qualitatively, however, the results also hold for general neural network architectures, since the differences in running time are caused by the complexity of the loss function evaluation.
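The contrast drawn in the abstract and in note 2, between Monte Carlo updates built from empirical samples and temporal difference updates, can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a sell order that must be executed by the last step of a single observed price path, and the names `td_targets`, `mc_targets`, and `cont_hat` (a current estimate of the continuation value at each step) are hypothetical. The temporal difference target bootstraps from the estimate one step ahead, while the Monte Carlo target follows the greedy stopping rule along the observed path and uses the realized execution price, so no generative model is needed.

```python
def td_targets(prices, cont_hat):
    """One-step temporal difference targets for the continuation value.

    prices   : observed prices along one empirical path (sell order assumed)
    cont_hat : current continuation-value estimates, same length as prices
               (the last entry is unused because execution is forced there)
    """
    last = len(prices) - 1
    targets = []
    for t in range(last):
        if t + 1 == last:
            # Execution is forced at the final step.
            targets.append(prices[last])
        else:
            # Bootstrap: best of stopping next step or the estimated continuation.
            targets.append(max(prices[t + 1], cont_hat[t + 1]))
    return targets


def mc_targets(prices, cont_hat):
    """Monte Carlo targets: realized price under the greedy stopping rule."""
    last = len(prices) - 1
    targets = []
    for t in range(last):
        s = t + 1
        # Keep waiting while the estimate says continuing beats stopping.
        while s < last and prices[s] < cont_hat[s]:
            s += 1
        targets.append(prices[s])
    return targets


prices = [10.0, 10.2, 9.9, 10.1]
cont_hat = [10.0, 10.1, 10.0, 0.0]
print(td_targets(prices, cont_hat))  # [10.2, 10.0, 10.1]
print(mc_targets(prices, cont_hat))  # [10.2, 10.1, 10.1]
```

In this toy path the two targets differ at the second step: the temporal difference target inherits the estimate 10.0, whereas the Monte Carlo target uses the realized price 10.1. This is the usual source of the bias and variance tradeoff between the two kinds of update.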