Perspectives

Long-Horizon Predictability: A Cautionary Tale


Abstract

Long-horizon return regressions effectively have small sample sizes. Using overlapping long-horizon returns provides only marginal benefit. Adjustments for overlapping observations have greatly overstated t-statistics. The evidence from regressions at multiple horizons is often misinterpreted. As a result, much less statistical evidence of long-horizon return predictability exists than is implied by research, which casts doubt on claims about forecasts based on stock market valuations and factor timing.

Disclosure: AQR Capital Management is a global investment management firm that may or may not apply investment techniques or methods of analysis similar to those described herein. The views expressed here are those of the authors and not necessarily those of AQR.

Editor’s Note

Submitted 2 April 2018

Accepted 1 August 2018 by Stephen J. Brown

Acknowledgments

We would like to thank Stephen J. Brown, Daniel Giamouridis, and two anonymous reviewers. We would also like to thank Cliff Asness, Esben Hedegaard, Antti Ilmanen, Bryan Kelly, Toby Moskowitz, and Lasse Pedersen for helpful comments and suggestions.

Notes

1 The history in finance of studying the statistics of long-horizon regressions is extensive. For general applications, see Hansen and Hodrick (1980); Newey and West (1987); Richardson and Smith (1991); Andrews (1991). For applications to return predictability, see Richardson and Stock (1989); Hodrick (1992); Richardson (1993); Nelson and Kim (1993); Goetzmann and Jorion (1993); Boudoukh and Richardson (1994); Valkanov (2003); Boudoukh, Richardson, and Whitelaw (2008); Hjalmarsson (2011); Britten-Jones, Neuberger, and Nolte (2011); Kostakis, Magdalinos, and Stamatogiannis (2015). All of these methods provide ways to correct for the inference problem in a framework of overlapping errors.

2 Asness, Ilmanen, and Maloney (2017) discussed the issues related to valuation-based long-horizon regressions from a more practical perspective. They contrasted the visually appealing relationship between starting valuations and next-decade realized market returns against the disappointing economic gains achieved by market-timing trading rules based on time-varying valuations. They further explained mechanically why, given the apparent statistical evidence of predictability, such contrarian market-timing strategies have not outperformed the buy-and-hold portfolio over the past half-century.

3 For illustrative purposes, in the simulations to follow, we assumed that the predictive variable, $X_t$, follows a first-order autoregressive process [AR(1)] with parameters corresponding to those of 1/CAPE. We know that the innovations in AR processes for such valuation ratios as 1/CAPE and stock returns are contemporaneously correlated, which leads to a bias toward predictability (see, e.g., Stambaugh 1993, 1999). So as not to conflate this bias with the overlapping versus nonoverlapping focus of this article, we assumed in our simulations that this correlation is zero. That said, for robustness, we confirmed similar findings under different contemporaneous correlation assumptions matched to the data. Of particular importance is that all the results and implications followed similarly. An interesting finding (not pursued here) is that the predictability bias worsened as the horizon increased (see also Nelson and Kim 1993; Torous, Valkanov, and Yan 2004). Note that the simulated p-values for the actual empirical applications in a later table do incorporate the nonzero contemporaneous correlation.
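The simulation design described in this note can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the unit-variance shocks, and the example parameter values (`rho=0.95`, 240 observations, a 12-period horizon) are all assumptions chosen for readability rather than values calibrated to 1/CAPE. Returns are drawn independently of the predictor, so the null of no predictability holds and the contemporaneous correlation is zero.

```python
import random
from statistics import pvariance


def simulate_ar1(n, rho, rng):
    """Draw n observations of a zero-mean AR(1) process with unit-variance shocks."""
    # Start from the stationary distribution (requires |rho| < 1).
    x = [rng.gauss(0.0, 1.0) / (1.0 - rho**2) ** 0.5]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x


def rolling_sum(values, window):
    """Sums over every run of `window` consecutive values (overlapping windows)."""
    total = sum(values[:window])
    out = [total]
    for t in range(window, len(values)):
        total += values[t] - values[t - window]
        out.append(total)
    return out


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_null_slopes(n_sims, n_obs, horizon, rho, seed=0):
    """Slopes of horizon-period returns on the lagged predictor under the null."""
    rng = random.Random(seed)
    overlapping, nonoverlapping = [], []
    for _ in range(n_sims):
        x = simulate_ar1(n_obs, rho, rng)
        # r[t] is the one-period return realized after x[t]; independent of x.
        r = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = rolling_sum(r, horizon)  # y[t] = r[t] + ... + r[t + horizon - 1]
        x_used = x[: len(y)]
        overlapping.append(ols_slope(x_used, y))
        # Keep every horizon-th pair so the return windows do not overlap.
        nonoverlapping.append(ols_slope(x_used[::horizon], y[::horizon]))
    return overlapping, nonoverlapping


if __name__ == "__main__":
    ol, nol = simulate_null_slopes(n_sims=200, n_obs=240, horizon=12, rho=0.95)
    print("variance ratio (overlapping / nonoverlapping):", pvariance(ol) / pvariance(nol))
```

The printed ratio is a Monte Carlo estimate, so it varies with the seed and the number of simulations.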

4 See Boudoukh and Richardson (1994) and Boudoukh et al. (2008). For particular assumptions about the autoregressive process for $X_t$, Equation 2 can be written analytically. For example, assuming $X_t$ follows an AR(1) process with autoregressive parameter $\rho_X$, one can show that $\mathrm{var}(\hat{\beta}_J^{ol}) = \mathrm{var}(\hat{\beta}_J^{nol})\left\{\frac{1}{J} + \frac{2}{J^2}\,\frac{\rho_X}{1-\rho_X}\left[(J-1) - \frac{\rho_X}{1-\rho_X}\left(1-\rho_X^{J-1}\right)\right]\right\}$.
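As a rough numerical check of the AR(1) expression in this note, the sketch below evaluates the ratio of the overlapping to the nonoverlapping variance in two ways: with the closed form, and with a direct sum over the AR(1) autocorrelations $\rho_X^j$. The summation form and both function names are our own restatement (it follows from the geometric-series identity), not something stated in the source.

```python
def overlap_variance_ratio(J: int, rho_x: float) -> float:
    """Closed-form var(beta_ol) / var(beta_nol) for an AR(1) predictor."""
    if J < 1:
        raise ValueError("J must be a positive integer")
    if not -1.0 < rho_x < 1.0:
        raise ValueError("rho_x must lie strictly between -1 and 1")
    k = rho_x / (1.0 - rho_x)
    return 1.0 / J + (2.0 / J**2) * k * ((J - 1) - k * (1.0 - rho_x ** (J - 1)))


def overlap_variance_ratio_sum(J: int, rho_x: float) -> float:
    """Same ratio written as 1/J + (2/J^2) * sum_{j=1}^{J-1} (J - j) * rho_x^j."""
    weighted = sum((J - j) * rho_x**j for j in range(1, J))
    return 1.0 / J + (2.0 / J**2) * weighted


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(rho, overlap_variance_ratio(10, rho), overlap_variance_ratio_sum(10, rho))
```

With an uncorrelated predictor the ratio equals 1/J, the full gain one would expect from J times as many observations. As the autoregressive parameter approaches 1, the ratio approaches 1, so overlapping observations add almost nothing.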

5 One can show that as $\rho_j \to 1$, then $\theta(J, \rho_J) \to (J-1)/J$ and $\mathrm{var}(\hat{\beta}_J^{ol}) \to \mathrm{var}(\hat{\beta}_J^{nol})$.

6 A plethora of empirical methodologies focus on implementation issues in small samples; examples are Hansen and Hodrick (1980); Newey and West (1987); Andrews (1991); Robinson (1998); and Kiefer and Vogelsang (2005). Exceptions are Richardson and Smith (1991); Hodrick (1992); Boudoukh and Richardson (1994); and Boudoukh et al. (2008), who imposed the null hypothesis of no predictability and calculated the standard errors analytically, thus avoiding the implementation issue. Recent papers by Hjalmarsson (2011) and Britten-Jones et al. (2011) used empirical methodologies to address some of these issues.

7 A large body of literature shows the poor small-sample properties of Newey–West estimators when a large number of lags are used in estimation. See, for example, Richardson and Stock (1989); Andrews (1991); Nelson and Kim (1993); Goetzmann and Jorion (1993); Newey and West (1994); Bekaert, Hodrick, and Marshall (1997); Valkanov (2003); Hjalmarsson (2011); Britten-Jones et al. (2011); Chen and Tsang (2013).

8 Following the discussion in note 3, is also virtually identical over a range of contemporaneous correlation assumptions for returns and the predictive variable.

9 Recall that the p-value here represents the probability of rejecting the null hypothesis of no predictability when it is true. In other words, the p-value represents the probability of a mistake. Standard two-sided 5% tests might suggest p-values of 2.5% and 97.5% with corresponding t-statistics of –1.96 and +1.96, the so-called two-standard-error rule of thumb.
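One way to read this note is that the reported p-value is the percentile of the observed statistic within its simulated null distribution, so values near 0 or 1 both signal rejection. The sketch below illustrates that reading. The function names are hypothetical, and the percentile interpretation is our assumption based on the 2.5% and 97.5% figures above.

```python
from bisect import bisect_right
from statistics import NormalDist


def simulated_p_value(observed, null_draws):
    """Share of null-simulated statistics that are less than or equal to `observed`."""
    ordered = sorted(null_draws)
    return bisect_right(ordered, observed) / len(ordered)


def two_sided_reject(p_value, size=0.05):
    """Reject when the percentile falls in either tail of total mass `size`."""
    return p_value <= size / 2 or p_value >= 1 - size / 2


if __name__ == "__main__":
    # Under a standard normal null, t = -1.96 and t = +1.96 sit near the
    # 2.5% and 97.5% percentiles, matching the two-standard-error rule of thumb.
    normal = NormalDist()
    print(normal.cdf(-1.96), normal.cdf(1.96))
```

In the long-horizon setting, `null_draws` would come from simulating the regression under no predictability rather than from the normal distribution.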

10 Not all researchers agree with this view; see, for example, Lewellen (2004); Campbell and Yogo (2006); Ang and Bekaert (2007); Campbell and Thompson (2008); Cochrane (2008).

11 Note that the simulated p-values are generated under joint distributional assumptions of returns, R, and predictive variable X. Thus, these p-values appropriately reflect any biases arising from lagged regressors (see Kendall 1954; Stambaugh 1993, 1999).

12 This finding is consistent with Asness, Chandra, Ilmanen, and Israel (2017), who found some weak evidence for value-spread timing on a standalone basis but little evidence of improvement when the timing was applied in a multifactor context that already had exposure to the value factor, because the timing only increased the exposure to the value factor beyond the optimal point.

13 To this point, a growing literature suggests that dividend (and, more broadly, cash flow) growth is, in fact, predictable (e.g., see Chen, Da, and Zhao 2013; Golez 2014; Møller and Sander 2017; Asimakopoulos, Asimakopoulos, Kourogenis, and Tsiritakis 2017).