Abstract
Using a machine-learning technique known as random forests, we analyze the role of investor confidence in forecasting monthly aggregate realized stock-market volatility of the United States (US), over and above a wide array of macroeconomic and financial variables. We estimate random forests on data for the period from 2001 to 2020, and study horizons of up to one year by computing forecasts for recursive and rolling estimation windows. We find that investor confidence, and especially investor-confidence uncertainty, has out-of-sample predictive value for overall realized volatility, as well as for its “good” and “bad” variants. Our results have important implications for investors and policymakers.
Notes
1 The reader is referred to Limongi Concetto and Ravazzolo (2019) for a detailed review of the international literature on this topic.
2 The MU and FU indexes are available for download from: https://www.sydneyludvigson.com/data-and-appendixes.
3 Note that the University of Michigan survey-based consumer confidence index is one of the 8 macro and financial factors included in Model 2.
4 The data are publicly available for download from: https://www.aaii.com/sentimentsurvey/sent_results, and are published every Wednesday, based on the poll conducted the previous Friday.
5 The indexes are downloadable from: https://som.yale.edu/faculty-research-centers/centers-initiatives/international-center-for-finance/data/stock-market-confidence-indices/united-states-stock-market-confidence-indices.
6 It should be noted that the “grf” package offers the option to use different subsamples for constructing a tree and for making predictions. We deactivate this option, as in a classic random forest.
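As an illustration, a classic random forest of the kind described here (each tree uses the same data both for placing splits and for computing leaf predictions) can be sketched in Python with scikit-learn; the data below are simulated for illustration only, and the paper itself works with the R “grf” package:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))                   # hypothetical predictor matrix
y = X[:, 0] ** 2 + 0.5 * rng.normal(size=500)    # nonlinear target + noise

# Classic random forest: every tree uses its full bootstrap sample
# for both split selection and leaf-mean estimation, which mirrors
# deactivating the subsample-splitting ("honesty") option in grf.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, y)
preds = rf.predict(X)
```

The nonlinear target is chosen so that the forest's averaging over axis-aligned splits has something to pick up that a linear model would miss.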
7 A table given at the end of the paper (Appendix) shows that using the mean absolute forecast error (MAFE) rather than the RMSFE gives qualitatively similar results.
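For reference, the two loss functions are RMSFE = sqrt((1/T) Σ e_t²) and MAFE = (1/T) Σ |e_t|, where e_t is the forecast error. A minimal sketch with made-up actual and forecast values (not the paper's data):

```python
import numpy as np

def rmsfe(actual, forecast):
    """Root mean squared forecast error."""
    e = np.asarray(actual) - np.asarray(forecast)
    return np.sqrt(np.mean(e ** 2))

def mafe(actual, forecast):
    """Mean absolute forecast error."""
    e = np.asarray(actual) - np.asarray(forecast)
    return np.mean(np.abs(e))

actual = [1.0, 2.0, 1.5, 3.0]
forecast = [1.2, 1.8, 1.5, 2.5]
```

Because the RMSFE squares the errors, it penalizes large misses more heavily than the MAFE, which is why agreement between the two is a useful robustness check.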
8 Following the suggestion of an anonymous reviewer, we also estimated Models 1 through 6 by the ordinary least-squares (OLS) technique on the full sample of data. To get an idea of the improvement in fit that results when we add the various explanatory variables, we then computed the adjusted R2 statistic for each of the six models. The results (for RV and h = 1; results for the other metrics of realized volatility and the other forecast horizons show a qualitatively similar pattern and are not reported) are as follows: Model 1: 0.448, Model 2: 0.625, Model 3: 0.629, Model 4: 0.627, Model 5: 0.658, Model 6: 0.668. Thus, Model 2 fits much better than Model 1 in terms of the adjusted R2 statistic, Models 2 through 4 produce similar adjusted R2 statistics, and the overall model fit improves further when we move to Models 5 and 6.
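The adjusted R2 statistic penalizes the in-sample fit for the number of predictors k via adj. R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), so it only rises when an added variable improves the fit by more than chance. A minimal OLS sketch (illustrative data, not the paper's models):

```python
import numpy as np

def adjusted_r2(y, X):
    """Fit OLS with an intercept and return the adjusted R-squared.
    X is the (n, k) predictor matrix without the constant column."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])          # add intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS coefficients
    resid = y - Z @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

A perfectly linear relationship yields an adjusted R2 of 1; adding irrelevant columns to X lowers the statistic even though the unadjusted R2 cannot fall.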
9 Given the large number of predictors and the potential for nonlinearities in the data-generating process, it is not surprising that, when we compare forecast accuracy by means of the Clark-West test, it is only in the case of the baseline HAR-RV model that random forests do not perform better than the benchmark OLS technique. For all other models, the Clark-West test yields significant results. Detailed results are available from the authors upon request.
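The Clark-West (2007) MSPE-adjusted comparison for nested models can be sketched as follows; the function name and the simulated forecasts are illustrative, not the paper's implementation. The statistic adjusts the squared-error differential by the squared gap between the two forecasts, and a large positive t-statistic favours the larger model:

```python
import numpy as np

def clark_west(actual, f_bench, f_large):
    """Clark-West MSPE-adjusted t-statistic for nested model comparison.
    f_bench: forecasts from the nested benchmark; f_large: forecasts
    from the larger model that nests the benchmark."""
    y = np.asarray(actual, float)
    e1 = y - np.asarray(f_bench, float)   # benchmark forecast errors
    e2 = y - np.asarray(f_large, float)   # larger-model forecast errors
    adj = (np.asarray(f_bench, float) - np.asarray(f_large, float)) ** 2
    f = e1 ** 2 - (e2 ** 2 - adj)         # adjusted loss differential
    n = f.size
    # t-statistic of the mean of f (regression of f on a constant)
    return f.mean() / (f.std(ddof=1) / np.sqrt(n))

# Simulated example: the larger model uses the true predictor x,
# the benchmark ignores it, so the statistic should be clearly positive.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_true = x + 0.5 * rng.normal(size=500)
t_stat = clark_west(y_true, np.zeros(500), x)
```

The adjustment term matters because, under the null that the extra predictors are irrelevant, the larger model's estimated coefficients add pure noise to its forecasts; without the correction the test would be biased against the larger model.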
10 As a robustness check, we consider alternative rolling-estimation windows. The results, summarized at the end of the paper, confirm the results for the recursive-estimation windows. In addition, as suggested by an anonymous reviewer, we estimated three alternative models. Model A features the HAR-RV predictors, the 8 factors, and the confidence indexes and their standard deviations. In Model B, we added the bull-bear variables. In Model C, we further added the MU and FU predictors. Test results (not reported, but available from the authors upon request) showed that Model B hardly performs better than Model A, and that Model C performs better than Model B. In other words, the confidence indexes and their standard deviations and the MU and FU predictors both contain predictive value for realized volatility, in line with the results reported above.