Research Papers

A hybrid convolutional neural network with long short-term memory for statistical arbitrage

Pages 595-613 | Received 16 May 2022, Accepted 10 Feb 2023, Published online: 09 Mar 2023
 

Abstract

We propose a CNN-LSTM deep learning model, which has been trained to classify profitable from unprofitable spread sequences of cointegrated stocks, for a large scale market backtest ranging from January 1991 to December 2017. We show that the proposed model can achieve high levels of accuracy and successfully derives features from the market data. We formalize and implement a trading strategy based on the model output which generates significant risk-adjusted excess returns that are orthogonal to market risks. The generated out-of-sample Sharpe ratio and alpha coefficient significantly outperform the reference model, which is based on a standard deviation rule, even after accounting for transaction costs.

Acknowledgements

We thank the editor and two anonymous referees for carefully reading the manuscript and for several constructive and detailed comments that helped to improve our paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Harlacher (2016) finds that differences between the ADF test and other unit root tests, such as the Phillips–Perron or the Phillips–Ouliaris test, are not significant.

2 CNN-LSTM architectures have been found to achieve state-of-the-art performance in time series forecasting tasks related, e.g., to heart rate signals (Swapna et al. 2018), rainfall intensity (Shi et al. 2017), particulate matter (Huang and Kuo 2018), waterworks operations (Cao et al. 2018), or the gold price (Livieris et al. 2020).

3 Hyperparameters are inspired by the choices in Livieris et al. (2020), except for the number of filters, where we found better optimization results with a lower number of filters than the 32 and 64 used in Livieris et al. (2020).

4 We tested the following hyperparameters on 10 randomly selected pairs: number of hidden LSTM layers ∈ {1, 2} and LSTM cells per layer ∈ {2, 5, 10, 15, 20}. We found that a single layer with 10 cells returned the most accurate results.

5 The total number of trainable parameters of the model is 1,891.

6 For example, each element of the input vector $r^i$ that is passed to the outermost layer is standardized according to $\tilde{r}^i_t = (r^i_t - \mathrm{mean}(r^i)) / \mathrm{stddev}(r^i)$.
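As a minimal sketch of this standardization (not the authors' code; the paper does not specify whether the population or sample standard deviation is used, so the population variant is assumed here):

```python
from statistics import mean, pstdev

def standardize(returns):
    """Z-score a return sequence: subtract its mean and divide by its
    standard deviation, as in r~_t = (r_t - mean(r)) / stddev(r)."""
    mu = mean(returns)
    sigma = pstdev(returns)  # population std dev (assumption; variant unspecified in the paper)
    return [(r - mu) / sigma for r in returns]
```

After this transformation, each input sequence has zero mean and unit standard deviation, which keeps inputs on a comparable scale across pairs.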

7 According to the classification in Krauss (2017), this model represents a stochastic control approach.

9 We did not observe any problems related to vanishing or exploding gradients during the training of the models.

10 Precision is defined as $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$ and the F1 score as $\mathrm{TP}/(\mathrm{TP}+\tfrac{1}{2}(\mathrm{FP}+\mathrm{FN}))$, where TP, FP, and FN refer to true positives, false positives, and false negatives, respectively.
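These definitions can be restated directly in code (an illustration of the formulas above, not the authors' evaluation pipeline):

```python
def precision_and_f1(y_true, y_pred, positive=1):
    """Compute precision = TP/(TP+FP) and F1 = TP/(TP + 0.5*(FP+FN))
    for binary labels; `positive` marks the profitable class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp)
    f1 = tp / (tp + 0.5 * (fp + fn))
    return precision, f1
```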

11 We compared different values of the extrapolation parameter k and obtained the most promising results for k = 5. We found that the superior performance of the k = 5 variant can be attributed to its better out-of-sample classification accuracy compared to the alternatives of k = 10 or 20 days. Final average out-of-sample accuracies are 68.5% for k = 5, 66.5% for k = 10, and 67.1% for k = 20.

12 Alpha and beta coefficients relate to the one-factor model regression. We discuss further dependencies on risk factors in Section 4.3.3.

13 We use the statsmodels library (Seabold and Perktold 2010) in Python with default parameters for the linear regression.
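The one-factor regression behind the alpha and beta coefficients in note 12 can be sketched without the statsmodels dependency via ordinary least squares (a hypothetical, dependency-light equivalent; the function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def one_factor_alpha_beta(strategy_excess, market_excess):
    """Regress strategy excess returns on market excess returns,
    r_s = alpha + beta * r_m + eps, via ordinary least squares."""
    X = np.column_stack([np.ones(len(market_excess)), market_excess])
    coef, *_ = np.linalg.lstsq(X, np.asarray(strategy_excess), rcond=None)
    alpha, beta = coef  # intercept = alpha, slope = beta
    return alpha, beta
```

With default settings, statsmodels' OLS produces the same point estimates as this closed-form solution.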

14 The authors thank Kenneth French for making all data available on his website: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

15 Note that Chen and Bassett (2014) show that, due to the self-financing nature of these factor portfolios and the market capitalization structure, this interpretation is not necessarily true.

16 It is important to note that deep learning techniques such as LSTM or CNN models were only introduced in the late 1990s. The high risk-adjusted returns in the 1990s therefore need to be seen against the backdrop that neither the theory nor the necessary technology for this strategy was available to the majority of market participants.

17 Note that for the backtests with m > 5 we need to re-optimize the extended CNN-LSTM models in each trading period based on the enlarged data set, i.e. on m = 20. The results for m ∈ {10, 15, 20} are therefore based on newly trained models.

18 We refer to Petersen (2020) for a detailed mathematical study of neural networks.

19 We refer to the LSTM model as established by Gers et al. (2000), who modified the original LSTM of Hochreiter and Schmidhuber (1997) and proposed a total of three gates named according to their functions: the input, output, and forget gates.

20 Subscripts express to–from relationships, i.e. $W_{f,h}$ denotes the recurrent weight connection from the previous time step's hidden state $h_{t-1}$ to the current time step's forget gate $f_t$.
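In this notation, the forget gate of Gers et al. (2000) takes the standard form (restated here for reference; $x_t$, $W_{f,x}$, and $b_f$ denote the current input, input-to-gate weight, and bias, following the subscript convention above):

```latex
f_t = \sigma\left( W_{f,x}\, x_t + W_{f,h}\, h_{t-1} + b_f \right)
```

with analogous expressions for the input gate $i_t$ and output gate $o_t$.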
