Abstract
In this paper, we investigate high-frequency market behavior using neural networks trained on order book data. Extensive experiments are conducted on 110 asset pairs, covering 97% of the spot-futures pairs on the Korea Exchange. We propose an efficient training scheme that improves both performance and training stability. Using this scheme, we measure the lead–lag relationship between the spot and futures markets by comparing the performance gain that each market's data provides in predicting the other. In addition, we analyze the gradients of the trained models to identify important market features that neural networks learn through training, revealing characteristics of the market microstructure. Our results show that highly complex neural network models can successfully learn market features such as order imbalance, the spread-volatility correlation, and mean reversion.
Acknowledgments
The authors thank their project counterpart for providing valuable datasets. Constructive comments from Prof. Jinwoo Shin are greatly appreciated. Author names are listed in alphabetical order.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Geonhwan Ju https://orcid.org/0000-0001-9661-5162
Kyoung-Kuk Kim https://orcid.org/0000-0002-9661-8707
Dong-Young Lim http://orcid.org/0000-0002-4677-965X
Notes
1 Since the amount of data varies across assets, assets with less market activity require more training epochs. We find that 160 epochs are sufficient to guarantee convergence for all assets.
2 The cross-validation results show that training on data dated after the test set yields no performance gain in predicting micro-movements. This is mainly due to the highly localized nature of short-term price dynamics.
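Because later data brings no gain, a split in which every training date precedes the test date is a natural choice. The following is a minimal sketch of such a split, assuming a rolling window of the `n_train` most recent trading days before each test day; the function name `walk_forward_splits` and the rolling-window scheme are hypothetical illustrations, not the procedure reported in the paper.

```python
from datetime import date, timedelta


def walk_forward_splits(dates, n_train):
    """Yield (train_dates, test_date) pairs in which every training date
    strictly precedes the test date, so no later data enters training.

    Hypothetical rolling-window scheme: train on the n_train trading days
    immediately before each test day.
    """
    ordered = sorted(set(dates))
    for i in range(n_train, len(ordered)):
        yield ordered[i - n_train:i], ordered[i]


if __name__ == "__main__":
    days = [date(2020, 1, 1) + timedelta(days=k) for k in range(5)]
    for train, test in walk_forward_splits(days, 3):
        print([d.isoformat() for d in train], "->", test.isoformat())
```

A rolling window keeps the training data close to the test day, in line with the localized dynamics the note describes. An expanding window, which keeps all earlier days, would be an equally plausible variant.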
3 We tried time delays of up to 60 s and found that seven labels were sufficient to improve training stability. Since labels with longer time delays are less correlated with short-term price movements, using more such labels results in underfitting.
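The note does not spell out how the seven labels are constructed. One plausible construction, assumed here purely for illustration, attaches to each order-book event the direction of the mid-price move over several fixed horizons. The horizons in `DELAYS_S`, the mid-price-direction target, and the function name `multi_horizon_labels` are hypothetical, not the authors' settings.

```python
import numpy as np

# Hypothetical horizons in seconds; the note states only "seven labels, up to 60 s".
DELAYS_S = (1, 2, 5, 10, 20, 30, 60)


def multi_horizon_labels(times, mid, delays=DELAYS_S):
    """Return (labels, valid), both of shape (n_events, n_delays).

    labels[i, j] is the sign (-1, 0, +1) of mid(t_i + delays[j]) - mid(t_i),
    where the mid-price at a future time is the last quote at or before it.
    valid[i, j] is False when t_i + delays[j] runs past the last timestamp;
    those entries are left at 0 and should be masked out of the loss.
    `times` must be sorted in ascending order.
    """
    times = np.asarray(times, dtype=float)
    mid = np.asarray(mid, dtype=float)
    labels = np.zeros((len(times), len(delays)), dtype=int)
    valid = np.zeros((len(times), len(delays)), dtype=bool)
    for j, d in enumerate(delays):
        # Index of the last quote observed at or before t + d.
        idx = np.searchsorted(times, times + d, side="right") - 1
        ok = times + d <= times[-1]
        labels[ok, j] = np.sign(mid[idx[ok]] - mid[ok]).astype(int)
        valid[:, j] = ok
    return labels, valid


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    m = [100.0, 101.0, 101.0, 99.0, 100.0, 102.0]
    lab, ok = multi_horizon_labels(t, m, delays=(1, 3))
    print(lab)
    print(ok)
```

In this sketch, longer horizons leave fewer valid events near the end of each session. This is one practical reason, alongside the weaker correlation noted above, not to keep adding ever-longer delays.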