Original Articles

$R^2$ Bounds for Predictive Models: What Univariate Properties Tell us About Multivariate Predictability

Pages 681-695 | Received 01 Mar 2016, Published online: 31 May 2018
 

ABSTRACT

A long-standing puzzle in macroeconomic forecasting has been that a wide variety of multivariate models have struggled to out-predict univariate models consistently. We seek an explanation for this puzzle in terms of population properties. We derive bounds for the predictive $R^2$ of the true, but unknown, multivariate model from univariate ARMA parameters alone. These bounds can be quite tight, implying little forecasting gain even if we knew the true multivariate model. We illustrate using CPI inflation data. Supplementary materials for this article are available online.

ACKNOWLEDGMENTS

The authors thank the editor, associate editor, and three anonymous referees, along with seminar participants at Birmingham, Cambridge, Essex Business School, Lancaster, the National Bank of Poland, Norges Bank, Nottingham, Strathclyde, the Tinbergen Institute, and Universitat Pompeu Fabra for helpful comments. The authors also thank Andrew Harvey, Gary Koop, Kevin Lee, Marco Lippi, Hashem Pesaran, and Ron Smith for helpful comments on earlier versions of this article, and thank Joshua Chan for code, support, and advice on estimating unobserved-components models.

Notes

1 On the problems of providing consistent forecasting performance over time, for a range of macro time series, see, for example, D’Agostino and Surico (2012); Chauvet and Potter (2013); Rossi (2013a); Estrella and Stock (2015); and Stock and Watson (2007, 2009, 2010, 2016). In contrast, Banbura, Giannone, and Reichlin (2010), Koop (2013), and Carriero, Clark, and Marcellino (2016), for example, found that large Bayesian VAR models can (but do not always) outpredict smaller models, including univariate (AR) models; and Stock and Watson (2002) found that forecasts from factor models can outperform univariate (AR) benchmarks, but typically less so for nominal than real variables.

2 We derive moment conditions for the ARMA(1,1) models implied by these UC representations. As far as we are aware, these derivations are also new.

3 We use the notation of the generic ABCD representation of Fernández-Villaverde et al. (2007). They assume that this system represents the rational expectations solution of a DSGE model (in which case the matrices $(A, B, C, D)$ are usually functions of a lower-dimensional vector of deep parameters, $\delta$). But the representation is sufficiently general to capture the key properties of a wide range of multivariate models, including VAR and factor models. Note that the state vector $z_t$ may contain information from the history of $y_t$ itself. In the benchmark structural DSGE model of Smets and Wouters (2007), for example, $z_t$ contains levels of six out of the seven observables in $y_t$. The system can also represent the companion form of a VAR.

4 It allows for possibly complex eigenvalues, and hence possibly complex elements of $z_t$. It can be generalized completely by letting $M$ take the Jordan form (with 1's on the subdiagonal). This admits, in terms of the discussion below, ARMA($p$, $q$) representations with $q > p$, but does not otherwise change the nature of our results.

5 All proofs are in the online appendix.

6 This draws on the seminal work of Zellner and Palm (1974) and Wallis (1977).

7 The limiting case $|\theta_i| = 1$ for some $i$, which is not invertible but is still fundamental, may in principle arise if $y_t$ has been over-differenced. But since this case essentially arises from a misspecification of the structural (multivariate) model, we do not consider it further.

8 Which may in principle, as noted above, contain information from the history of $y_t$ itself.

9 Note that, as discussed in Lippi and Reichlin (1994), some of the $\theta_i$ may be complex conjugates.

10 Note that if $\theta_i = 0$ for some $i$ (hence the ARMA is not a minimal representation), the nonfundamental representation is undefined, but we can still use Equation (9) to calculate $R^2_{\max} = 1$.

11 Not least because the predictive errors $\xi_t$ cannot in general be jointly IID with the innovation to a time series representation of $q_t$ (a point made forcefully by Pástor and Stambaugh 2009).

12 This is indeed the null hypothesis of no Granger causality from $q_t$, as originally formulated by Granger (1969) (although in practice in most econometric testing $y_{t-1}$ is typically only included via a finite set of autoregressive terms).

13 Pu is the preimage of u in Pr.

14 One of the referees objected to our use of the term “restrictions” on the predictive system. Clearly in causal terms the properties of the predictive system determine univariate properties, and not vice-versa. However, in strict mathematical terms, if we observe (or assume) a population univariate property, this does indeed restrict the parameter space of predictive systems that could have generated that property.

15 A specification of this form has, for example, dominated the finance literature on predictive return regressions, with $y_t$ some measure of returns or excess returns, and $x_t$ some stationary valuation criterion.

16 We focus here on the time-invariant case; but in Section 6 we extend the analysis to the case where $\mu$, $\sigma^2_\tau$, and $\sigma^2_c$ are all potentially time-varying.

17 See Appendix I.1 for the reparameterization in the time-varying case, which nests the time-invariant case here.

18 Note that the moment condition in Equation (15) is satisfied by $\theta$ and also by $\theta^{-1}$. While in general, as discussed in Section 2.2, there will be multiple nonfundamental representations of the same order, in this particular case, with $r = q = 1$, there is only one.
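
As a rough illustration of why a single moment condition can be satisfied by both $\theta$ and $\theta^{-1}$, the sketch below uses the textbook first-order autocorrelation of an MA(1), $\rho_1 = \theta/(1+\theta^2)$, as a stand-in. It is not the article's moment condition (15), which is not reproduced on this page; the function name `ma1_roots` and the value 0.4 are hypothetical choices made only for this example.

```python
import math


def ma1_roots(rho1):
    """Return both MA(1) parameters consistent with autocorrelation rho1.

    Assumption (not from the article): y_t = e_t + theta * e_{t-1}, so that
    rho1 = theta / (1 + theta**2), i.e. rho1*theta**2 - theta + rho1 = 0.
    Real solutions exist only for 0 < |rho1| <= 0.5.
    """
    if rho1 == 0 or abs(rho1) > 0.5:
        raise ValueError("need 0 < |rho1| <= 0.5")
    disc = math.sqrt(1.0 - 4.0 * rho1 ** 2)
    fundamental = (1.0 - disc) / (2.0 * rho1)     # root with |theta| <= 1
    nonfundamental = (1.0 + disc) / (2.0 * rho1)  # its reciprocal
    return fundamental, nonfundamental


# Illustrative value only; not an estimate from the article.
theta, theta_inv = ma1_roots(0.4)
print(theta, theta_inv, theta * theta_inv)
```

The two roots of the quadratic multiply to one, so with a single MA parameter there is exactly one fundamental and one nonfundamental solution.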

19 Note that only in the limiting case as $\theta \to 0$ does it actually reveal $y_{t+1}$ perfectly.

20 Mitchell, Robertson, and Wright (2018) proved a generalization of this result for $r \geq 1$, for any predicted series with $c_y(1) < 1$.

21 Any ARMA model has a state-space representation (Hamilton 1994, chap. 13, pp. 375–376). Permanent mean shifts induce a unit root that can be differenced out to derive a stationary ABCD representation.

22 The methodology could be generalized to higher-order ARMA representations.

23 Stock and Watson (2007), for example, noted that their unobserved components stochastic volatility model (as employed in the next section) implies a time-varying MA(1) representation, but the estimates of $\theta_t$ that they presented are derived using a time-invariant formula.

24 As such the methodology applied here could in principle be extended to higher-order predictive systems.

25 Nor is the nonfundamental MA parameter, $\gamma_t$, equal to $\theta_t^{-1}$, except in the limiting time-invariant case.

26 In Lippi and Reichlin's (1994) terms, this would imply that the minimal ARMA($p$, $q$) is the fundamental representation, which provides the lower $R^2$ bound, while there would exist a nonfundamental “nonbasic” ARMA($r$, $r$) representation, with $r > q$, in which all the $\theta_i$ in the macroeconomist's ARMA, Equation (5), are replaced with their reciprocals, which provides the true upper bound. But the nonbasic nature of this representation would mean that the true upper bound would be unknowable.

27 Note that this would also rule out q = 0, that is, a pure AR(p). While such representations are widely used in empirical applications, the derivation from a structural model shows that, absent restrictions on the ABCD parameters, such representations can only be rationalized as approximations for the true ARMA(r, r).

28 Most of the arguments presented here also apply in the time-varying case, to which we revert below after discussing our empirical application.

29 Note that Stock and Watson used a time-invariant formula to derive an estimate of the implied time-varying MA parameter; however, we show below that in this context this generates very similar answers to the exact recursive formula.

30 In the SWC framework, with no AR component, stochastic volatility in the implicit single predictor can be captured by time variation in $\beta_t$.

31 See online Appendix I.1. Note that CKP also use restrictions that bound both $\tau_t$ and $\mu_t$. We impose bounds on $\mu_t$, as in CKP, but not on $\tau_t$, since this would change the order of the ARMA representation. However, we find that our estimated unobserved components are affected only minimally by whether we impose the bound on $\tau_t$.

32 Note that, as in the time-invariant case analyzed in Section 3.3, $c_t$ can be viewed in filtering terms as an estimate of the true predictor, conditional upon the history of $y_t$, and the identifying assumption $E(\zeta_{\tau,t}, \zeta_{c,t}) = 0$, $\forall t$.

33 We gratefully acknowledge use of Joshua Chan's Matlab code for both the SWC and CKP models, available at http://joshuachan.org/code.html . As detailed in online Appendices L and M, we do investigate the robustness of results to some of these specification choices.

34 Panels C and D of Figure M.1 (see online appendix) also show that results are robust to consideration of a more diffuse prior for $\sigma^2_\tau$ in CKP. Such a diffuse prior is in line with the similarly diffuse prior employed in SWC.

35 Chan, Koop, and Potter's (2013) out-of-sample predictability tests (their Table 5) also show that differences between the CKP and Stock-Watson's UC model are relatively modest, certainly for one-step ahead forecasts, which are our focus in this article.

36 The time-invariant formula in Equation (17), for the SWC/MA(1) case, is simply $R^2_{\min} = \theta^2/(1 + \theta^2)$. In Panels A and B of Figure M.1 (online Appendix), we show that applying the time-invariant formulas from Section 3 to the time-varying UC and ARMA estimates usually gives good, or (in the case of the SWC representation) very good, approximations to the true, recursive values we derive from our moment conditions. Exceptions to this general rule arise when estimates of $\theta_t$ are close to, or exceed, unity.
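
The time-invariant MA(1) formula in note 36 can be checked numerically. The following is a minimal sketch, assuming a time-invariant MA(1) with white-noise innovations of variance `sigma2`; the function names and the value 0.5 are hypothetical and are not estimates from the article.

```python
def r2_min_ma1(theta):
    """Lower R^2 bound for a time-invariant MA(1): theta^2 / (1 + theta^2)."""
    return theta ** 2 / (1.0 + theta ** 2)


def r2_from_variances(theta, sigma2=1.0):
    """Same quantity as 1 - innovation variance / Var(y_t).

    Assumption: y_t is MA(1) in white noise e_t with variance sigma2,
    so Var(y_t) = (1 + theta^2) * sigma2 under either sign convention.
    """
    var_y = (1.0 + theta ** 2) * sigma2
    return 1.0 - sigma2 / var_y


# Illustrative parameter value only.
theta = 0.5
print(r2_min_ma1(theta), r2_from_variances(theta))
```

Both routes give the same number because the one-step-ahead forecast error of the fundamental MA(1) is the innovation itself.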

37 From Equation (17), the time-invariant formula is $R^2_{\min} = (\lambda - \theta)^2/(1 - \lambda^2 + (\lambda - \theta)^2)$.
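
A short numerical check of the ARMA(1,1) formula in note 37 may be useful. This is a sketch under an assumed sign convention, $y_t = \lambda y_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1}$ with $|\lambda| < 1$, which is not stated on this page; the function names and parameter values are hypothetical.

```python
def r2_min_arma11(lam, theta):
    """Time-invariant lower R^2 bound for an ARMA(1,1), as in note 37."""
    if abs(lam) >= 1.0:
        raise ValueError("need |lam| < 1 for stationarity")
    num = (lam - theta) ** 2
    return num / (1.0 - lam ** 2 + num)


def r2_from_ma_weights(lam, theta, n_terms=2000):
    """Cross-check via the MA(infinity) weights of the assumed ARMA(1,1).

    Assumption: y_t = lam*y_{t-1} + e_t - theta*e_{t-1}, whose weights are
    psi_0 = 1 and psi_j = (lam - theta) * lam**(j-1) for j >= 1.
    R^2 = 1 - 1 / sum(psi_j^2), with the sum truncated at n_terms.
    """
    var_ratio = 1.0 + sum(
        ((lam - theta) * lam ** (j - 1)) ** 2 for j in range(1, n_terms + 1)
    )
    return 1.0 - 1.0 / var_ratio


# Illustrative parameter values only; not estimates from the article.
print(r2_min_arma11(0.9, 0.5), r2_from_ma_weights(0.9, 0.5))
```

Setting `lam` to zero reduces the expression to $\theta^2/(1 + \theta^2)$, the MA(1) case given in note 36.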

38 As noted in the discussion of Proposition 3, fundamentalness does not impose an upper bound of unity in every period. Note also that the proof shows that the nonfundamental MA parameter $\gamma_t$ is only equal to $\theta_t^{-1}$ on average, so when $\hat{\theta}_t > 1$ this does not imply that $\hat{\gamma}_t < 1$; indeed it is always higher than $\hat{\theta}_t$.

39 In Panels E and F of online Figure M.1, we show 16.5%, 50%, and 83.5% quantiles of the posterior distribution of $(R^2_{\max,t} - R^2_{\min,t})$ for SWC and CKP. The range of values of the gap between the upper and lower bounds is more revealing of the impact of parameter uncertainty than the range for either bound in isolation, since $R^2_{\min,t}$ and $R^2_{\max,t}$ are strongly correlated across replications. The posterior intervals are much narrower for CKP than SWC.

40 As noted above (see footnote 36), online Figure M.1 shows that the time-invariant formulas mostly provide a good approximation to the true values.

41 Time variation in $\beta_t$ allows us to make the single predictor pure white noise.

42 Results for other OECD countries are similar; see Panel E of Figures L.2–L.8 in the online appendix.

43 For the general case, this result relates to the correlation between $u_t$ and $v_t$, the innovation to the predictor, but in the MA(1) case $x_t = v_t$.

44 We have not found a way to generalize Proposition 2 to the time-varying case; however, we would defend the approach used here on the basis both of the (usually) fairly good approximations provided by time-invariant formulas for the $R^2$ bounds, and the logic of Corollaries 3 and 4, together with the concept of the predictive space, all of which must apply even in a time-varying context.

45 This conclusion also holds, to varying extents, in most other countries (see online Appendix L). An exception is Italian inflation, where the SWC estimates of $\hat{\theta}_t$ (see Figure L.6, Panel B) are much lower, implying lower values for $\hat{\rho}_{\min,t}$ (using the time-invariant formula) than in the US.

46 The discussion in online Appendix L shows that in most countries it is quite hard even to distinguish conclusively between time-varying MA(1) and ARMA(1,1) representations of CPI inflation. So it seems highly unlikely that higher-order representations could be estimated. In practice, we are not aware of readily available estimation routines that would allow us to estimate a higher-order ARMA model with time-varying parameters (or equivalently a UC model for the level of inflation, $Y_t$, with multiple stationary components). However, applying standard time-invariant ARMA estimation techniques to CPI inflation suggests minimal gains from increasing ARMA order in any of the eight countries we examine.

47 For example, Mitchell, Robertson, and Wright (2018) work through the implications of the case of a true ARMA(2,2) that generates data for inflation consistent (within the range of sampling variation) with Stock and Watson's MA(1) representation. They show that the bounds from the ARMA(1,1) provide a very good approximation, in the sense that the “predictive space” either has very little mass outside these bounds, or only contains predictive systems with properties that we would rule out on a priori grounds (e.g., cases in which both predictors have $\lambda_i < 0$, or are perfectly negatively correlated).
