
In-Sample or Out-of-Sample Tests of Predictability: Which One Should We Use?

Pages 371-402 | Published online: 06 Feb 2007

Abstract

It is widely known that significant in-sample evidence of predictability does not guarantee significant out-of-sample predictability. This is often interpreted as an indication that the in-sample evidence is likely to be spurious and should be discounted. In this paper, we question this interpretation. Our analysis shows that neither data mining, nor dynamic misspecification of the model under the null, nor unmodelled structural change under the null is a plausible explanation of the observed tendency of in-sample tests to reject the no-predictability null more often than out-of-sample tests. We provide an alternative explanation based on the higher power of in-sample tests of predictability in many situations. We conclude that the results of in-sample tests of predictability will typically be more credible than the results of out-of-sample tests.

Acknowledgment

We have benefited from comments at the 2002 European Econometric Society Meeting, the 2002 NBER Summer Institute and the 2002 EC2 Conference. We also thank seminar participants at Bocconi, Bonn, CORE, the European Central Bank, Exeter, Helsinki, INSEAD, Leuven, Montreal, Pittsburgh, Pompeu Fabra, Southampton, Tokyo Metropolitan, Tokyo, Warwick, Waseda, Yokohama National and York. We especially thank two anonymous referees, Valentina Corradi, Todd Clark, Frank Diebold, Robert Engle, Scott Gilbert, Clive Granger, Alastair Hall, Kirstin Hubrich, Michael McCracken, Peter Reinhard Hansen, Barbara Rossi, Norman Swanson, and Ken West for helpful discussions. Part of this research was conducted while the second author served as an adviser at the European Central Bank (ECB). The views expressed in this paper do not necessarily reflect the opinion of the ECB or its staff.

Notes

a. This paper does not deal with forecast accuracy tests for nonnested models (see, e.g., West, 1996).

b. We focus on asymptotic results because finite-sample size distortions in practice can be effectively eliminated by the use of bootstrap methods (see, e.g., Clark and McCracken, 2004; Kilian, 1999; Kilian and Taylor, 2003; Mark, 1995; Rapach and Wohar, 2003).

c. McCracken (2001) studies out-of-sample inference involving forecasting models that were in turn selected by an inconsistent model selection procedure. His methodology, however, presumes that no respecification of the forecast model occurs after the out-of-sample test is conducted. Thus, he rules out data mining of the form described here.

d. Hansen (2001) discusses some possible drawbacks of White's proposal. Note that these possible drawbacks do not apply in our context because our model is nested and the null hypothesis holds with equality.

e. Our analysis is a natural extension of work in classical statistics on the testing of multiple hypotheses (see, e.g., Anderson, 1994; Dasgupta and Spurrier, 1997; Royen, 1984). A similar framework has also been used by Hansen (2000), who proposed bootstrap inference for the distribution of R² in the presence of data mining.

f. There is one counterexample to this tendency, in which out-of-sample tests will tend to have higher power than in-sample tests. Suppose that the break in β occurs at exactly [λT] with λ = 0.5, and further suppose that β = −c in the first half of the sample and β = c in the second half, where c is some constant. In that case, the in-sample test has zero power asymptotically, whereas the out-of-sample test retains some power. This counterexample, however, is more of an intellectual curiosity because it requires three unrealistic conditions. First, a switch in sign seems unlikely in situations that would suggest the use of a one-sided t-test, as is typically the case in applied work. Second, it is unlikely that the deviations from β = 0 exactly offset one another. Third, it is unlikely that the break occurs exactly at [0.5T]. Even for small deviations from these assumptions, the counterexample breaks down.
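The intuition behind note f can be illustrated with a small simulation (a hypothetical sketch of our own, not code from the paper; the data-generating process and variable names are assumptions made for illustration): when the slope flips from −c to +c at exactly the midpoint, a full-sample ("in-sample") regression averages the two halves to a slope near zero, while each half-sample regression on its own recovers the nonzero slope that an out-of-sample scheme could exploit.

```python
# Hypothetical sketch: sign-switching break at [0.5T] kills the
# full-sample slope estimate but not the half-sample estimates.
import numpy as np

rng = np.random.default_rng(0)
T, c = 1000, 0.5
x = rng.standard_normal(T)                       # predictor
beta = np.where(np.arange(T) < T // 2, -c, c)    # beta = -c, then +c
y = beta * x + rng.standard_normal(T)            # predictand

def ols_slope(x, y):
    """Slope of a no-intercept OLS regression of y on x."""
    return float(x @ y / (x @ x))

full = ols_slope(x, y)                           # offsetting halves
first = ols_slope(x[: T // 2], y[: T // 2])
second = ols_slope(x[T // 2 :], y[T // 2 :])

print(f"full-sample slope:  {full:+.3f}")   # near 0: in-sample test has ~no power
print(f"first-half slope:   {first:+.3f}")  # near -c
print(f"second-half slope:  {second:+.3f}") # near +c
```

The same logic explains why the counterexample is fragile: shifting the break date away from [0.5T], or making the two deviations unequal in magnitude, leaves a nonzero full-sample slope and restores the power of the in-sample test.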

g. A related test that is robust against dynamic misspecification has also been proposed by Chao et al. (2001). Yet another test of predictability that allows for dynamic misspecification under the null is presented in Corradi and Swanson (2002), but that paper focuses on testing the equal predictive accuracy of two nested models against the alternative of possibly nonlinear predictability.

