Abstract
Machine learning is an increasingly important and controversial topic in quantitative finance. A lively debate persists as to whether machine learning techniques can be practical investment tools. Although machine learning algorithms can uncover subtle, contextual, and nonlinear relationships, overfitting poses a major challenge when one is trying to extract signals from noisy historical data. We describe some of the basic concepts of machine learning and provide a simple example of how investors can use machine learning techniques to forecast the cross-section of stock returns while limiting the risk of overfitting.
Disclosure: The authors report no conflicts of interest.
Editor’s Note
Submitted 19 July 2018
Accepted 30 January 2019 by Stephen J. Brown
Notes
1 “Bagging” is an abbreviation for “bootstrap aggregating,” that is, averaging forecasts from models trained on different bootstrap samples of the training data. “Boosting” is the process of reweighting observations to put more weight on misclassifications from prior forecasting rounds.
2 An alternative approach is to use a robust objective function, such as the pairwise rank correlation between returns and forecasts in a regression setting.
4 We refer interested readers to chapter 7 in López de Prado (2018) for an in-depth treatment of cross-validation for financial data.
5 Machine learning algorithms (MLAs) are well known for their ability to tease signals from big data, for instance, detecting sentiment in text or predicting future sales from social media posts. Although these applications are certainly promising, they are not the focus of this article. Our goal is to show how MLAs can be more effective than traditional quantitative techniques even when using widely known quant signals to forecast security returns.
6 Practitioners could also use a machine learning model to aggregate the information of the individual signals.
7 Results for individual regions are available on request.
8 Gu et al. (2018) found that price trend, volatility, and liquidity are by far the most important features. Our analysis suggests that these categories are important, but we also found that the percentage of shares sold short, the difference between put and call implied volatilities, and characteristics derived from financial statement information are among the 10 most important features.
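
Illustration for note 2: the note mentions a pairwise rank correlation between returns and forecasts as a robust objective. The sketch below is one possible reading of that idea, using Kendall's tau (the share of concordant minus discordant pairs). The choice of Kendall's tau, the function name `kendall_tau`, and the toy numbers are assumptions made for illustration; they are not taken from the article. The example shows why such a measure is robust: inflating the largest return a hundredfold leaves the score unchanged because only the ordering of returns matters.

```python
from itertools import combinations


def kendall_tau(returns, forecasts):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.

    Hypothetical helper for illustration; tied pairs contribute zero.
    """
    if len(returns) != len(forecasts):
        raise ValueError("returns and forecasts must have the same length")
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")

    score = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when returns and forecasts order it the same way.
        product = (returns[i] - returns[j]) * (forecasts[i] - forecasts[j])
        if product > 0:
            score += 1
        elif product < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


# Toy data (made up): five stocks, one forecast and one realized return each.
forecasts = [0.1, 0.2, 0.3, 0.4, 0.5]
returns = [0.01, 0.03, 0.02, 0.05, 0.04]

# Same data, but the largest return is replaced by an extreme outlier.
returns_outlier = [0.01, 0.03, 0.02, 5.00, 0.04]

tau_clean = kendall_tau(returns, forecasts)
tau_outlier = kendall_tau(returns_outlier, forecasts)
print(tau_clean, tau_outlier)
```

Because the outlier does not change the ranking of returns, both calls yield the same score, whereas a squared-error objective would be dominated by the single extreme observation.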