Machine Learning for Stock Selection

Abstract

Machine learning is an increasingly important and controversial topic in quantitative finance. A lively debate persists as to whether machine learning techniques can be practical investment tools. Although machine learning algorithms can uncover subtle, contextual, and nonlinear relationships, overfitting poses a major challenge when one is trying to extract signals from noisy historical data. We describe some of the basic concepts of machine learning and provide a simple example of how investors can use machine learning techniques to forecast the cross-section of stock returns while limiting the risk of overfitting.

Disclosure: The authors report no conflicts of interest.

Editor’s Note

Submitted 19 July 2018

Accepted 30 January 2019 by Stephen J. Brown

Notes

1 “Bagging” is an abbreviation for “bootstrap aggregating,” or averaging forecasts from different training sets. “Boosting” is the process of reweighting observations to put more weight on misclassifications from prior forecasting rounds.

2 An alternative approach is to use a robust objective function, such as the pairwise rank correlation between returns and forecasts in a regression setting.

4 We refer interested readers to chapter 7 in López de Prado (2018) for an in-depth treatment of cross-validation for financial data.

5 Machine learning algorithms (MLAs) are well known for their ability to tease signals from big data, for instance, by detecting sentiment in text or predicting future sales from social media posts. Although these applications are certainly promising, they are not the focus of this article. Our goal is to show how MLAs can be more effective than traditional quantitative techniques even when using widely known quant signals to forecast security returns.

6 Practitioners could also use a machine learning model to aggregate the information of the individual signals.

7 Results for individual regions are available on request.

8 Gu et al. (2018) found that price trend, volatility, and liquidity are by far the most important features. Our analysis suggests that these categories are important, but we also found that the percentage of shares sold short, the difference between put and call implied volatilities, and characteristics derived from financial statement information are among the 10 most important features.
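
As an illustration of the robust objective mentioned in Note 2, the following is a minimal sketch of a pairwise rank correlation between returns and forecasts. It assumes the Kendall tau-a form of the statistic (concordant minus discordant pairs, divided by the number of pairs), which may differ from the authors' exact definition. The function name `pairwise_rank_correlation` and the sample numbers are hypothetical and are not taken from the article.

```python
import numpy as np


def pairwise_rank_correlation(returns, forecasts):
    """Kendall tau-a (an assumed form of the Note 2 objective):
    (concordant pairs - discordant pairs) / total number of pairs."""
    r = np.asarray(returns, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    if r.ndim != 1 or r.shape != f.shape or r.size < 2:
        raise ValueError("need two 1-D arrays of equal length >= 2")
    # Sign of every pairwise difference; the product is +1 when a pair of
    # stocks is ordered the same way by returns and by forecasts, -1 when
    # the ordering is reversed, and 0 when either side is tied.
    dr = np.sign(r[:, None] - r[None, :])
    df = np.sign(f[:, None] - f[None, :])
    upper = np.triu_indices(r.size, k=1)  # visit each unordered pair once
    return float(np.sum(dr[upper] * df[upper]) / upper[0].size)


# Hypothetical cross-section of four stocks.
returns = np.array([-0.02, 0.01, 0.03, 0.05])
forecasts = np.array([0.1, 0.2, 0.3, 0.4])
tau = pairwise_rank_correlation(returns, forecasts)

# Replacing one return with an extreme value does not change the statistic,
# because only the ordering of returns enters the calculation.
outlier_returns = returns.copy()
outlier_returns[-1] = 5.0
tau_outlier = pairwise_rank_correlation(outlier_returns, forecasts)
```

Because the statistic depends only on orderings, a single extreme return cannot dominate it the way it can dominate a squared-error objective, which is the sense in which such an objective is robust.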
