Research Article

Testing for bias in forecasts for independent binary outcomes

ABSTRACT

This letter deals with a test on forecast bias in predicting independent binary outcomes, where the outcomes are either 1 or 0, and the predictions are probabilities. The test concerns two parameter restrictions in a simple logit model. Size-corrected power experiments show remarkable power.

JEL CODES:

I. Introduction and motivation

This letter proposes a test for forecast bias in predicting independent binary outcomes, where the outcomes are either 1 or 0 and the predictions are probabilities. There is no need to know how the predictions were created: they can come from a logit model (Cramer 1991), a probit model, a linear probability model, or expert judgement.

In a standard regression model for continuous outcomes, one can consider an auxiliary regression that links the predictions to the realizations. If the realizations $y_i$ and forecasts $\hat{y}_i$ are continuous variables, from either cross-sectional or time series data, and the forecast sample is $i = 1, 2, \ldots, N$, then one can examine bias using the auxiliary regression

$$y_i = \alpha + \beta \hat{y}_i + \varepsilon_i$$

The parameters are estimated using Ordinary Least Squares. The Wald-type test of interest concerns the joint hypothesis that $\alpha = 0$ and $\beta = 1$. Under the null hypothesis, there is no forecast bias. This regression is called the Mincer-Zarnowitz regression; see Mincer and Zarnowitz (1969).
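As a minimal sketch of the regression above, the following Python code estimates the intercept and slope by OLS and computes the Wald statistic for the joint hypothesis of a zero intercept and unit slope. The function name `mincer_zarnowitz_wald` and the simulated data are illustrative assumptions, not taken from the original study. Under standard regularity conditions the statistic is compared with a chi-squared distribution with 2 degrees of freedom, whose 5% critical value is about 5.99.

```python
import random

def mincer_zarnowitz_wald(y, yhat):
    """OLS of y on a constant and yhat; Wald statistic for alpha=0 and beta=1."""
    n = len(y)
    mean_f = sum(yhat) / n
    mean_y = sum(y) / n
    sxx = sum((f - mean_f) ** 2 for f in yhat)
    sxy = sum((f - mean_f) * (v - mean_y) for f, v in zip(yhat, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_f
    # Residual variance with n - 2 degrees of freedom
    s2 = sum((v - alpha - beta * f) ** 2 for f, v in zip(yhat, y)) / (n - 2)
    # Wald = d' (X'X) d / s2, with d = (alpha - 0, beta - 1)
    d0, d1 = alpha, beta - 1.0
    sum_f = sum(yhat)
    sum_ff = sum(f * f for f in yhat)
    wald = (n * d0 * d0 + 2.0 * sum_f * d0 * d1 + sum_ff * d1 * d1) / s2
    return alpha, beta, wald

# Hypothetical example: unbiased forecasts plus noise
rng = random.Random(0)
forecasts = [rng.gauss(0.0, 1.0) for _ in range(200)]
realizations = [f + rng.gauss(0.0, 0.5) for f in forecasts]
alpha, beta, wald = mincer_zarnowitz_wald(realizations, forecasts)
print(f"alpha={alpha:.3f} beta={beta:.3f} Wald={wald:.3f}")
```

Because the restrictions cover both coefficients, the Wald statistic reduces to a quadratic form in the cross-product matrix of the regressors, so no matrix inversion is needed.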

In this letter, I propose a similar test for independent binary outcomes, that is, the realizations are either 1 or 0 and the predictions are estimated probabilities that the outcome is 1. The question is whether these probabilities are unbiased. Note that if the predictions are also 1 or 0, one can resort to variants of tests on hit rates, see for example Franses and Paap (2001, page 65), but a test for the hit rate is not the focus here. The new test turns out to be as easy to apply as the one based on the Mincer-Zarnowitz regression. Simulations show that the size-corrected power of the test is high, even in small samples. The test for forecast bias is illustrated using the Goldman Sachs predictions of which football teams would reach the second round of the 2018 World Cup in Russia.

II. The main idea

Consider N forecasts $\hat{p}_i$ and N observations $y_i$, where $y_i$ can take values 1 or 0, and where the forecasts are numbers between 0 and 1. An example is the dataset in Table 1, which refers to the Goldman Sachs forecasts. The interest is to see if there is forecast bias.

Table 1. Realizations and forecasts concerning surviving the first round of the 2018 World Cup in Russia. Data source is Exhibit 2 of the 11 June 2018 Global Macro Research report of the Goldman Sachs Group, Inc

The key identity to design the test is

$$\mathrm{Prob}(y_i = 1) = \hat{p}_i$$

where simple algebra gives

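$$\mathrm{Prob}(y_i = 1) = \frac{\exp\left(\log\frac{\hat{p}_i}{1 - \hat{p}_i}\right)}{1 + \exp\left(\log\frac{\hat{p}_i}{1 - \hat{p}_i}\right)} = \hat{p}_i$$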
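The algebra behind this identity can be spelled out in two steps, assuming the forecast $\hat{p}_i$ lies strictly between 0 and 1 (at exactly 0 or 1 the log-odds are not defined). First, the exponential undoes the logarithm:

$$\exp\left(\log\frac{\hat{p}_i}{1 - \hat{p}_i}\right) = \frac{\hat{p}_i}{1 - \hat{p}_i}$$

Second, substituting the odds into the ratio and multiplying numerator and denominator by $1 - \hat{p}_i$ gives

$$\frac{\hat{p}_i / (1 - \hat{p}_i)}{1 + \hat{p}_i / (1 - \hat{p}_i)} = \frac{\hat{p}_i}{(1 - \hat{p}_i) + \hat{p}_i} = \hat{p}_i$$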

The middle term can be recognized as the expression for the logit model (Franses and Paap 2001, page 54), which for a single variable $x_i$ is given by

$$\mathrm{Prob}(y_i = 1) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$

Hence, a Mincer Zarnowitz type test for the null hypothesis of no forecast bias can be based on the logit model

$$\mathrm{Prob}(y_i = 1) = \Lambda\left(\alpha + \beta \log\frac{\hat{p}_i}{1 - \hat{p}_i}\right) \qquad (1)$$

with $\Lambda$ the logistic function, and on the Wald test for $\alpha = 0$ and $\beta = 1$, jointly.
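The test can be sketched in a few lines of standard-library Python. The code fits the two-parameter logit model by Newton-Raphson, with the log-odds of the forecasts as the single regressor, and then forms the Wald statistic from the information matrix. The function names, the starting values and the small datasets are illustrative assumptions, and the original results were obtained with other software. The sketch assumes every forecast lies strictly between 0 and 1 and that the outcomes are not perfectly separated by the forecasts, since otherwise the Maximum Likelihood estimates do not exist.

```python
import math

def logistic(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score_and_information(a, b, y, x):
    """Score vector (g0, g1) and information matrix entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for yi, xi in zip(y, x):
        p = logistic(a + b * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def logit_bias_test(y, p_hat, max_iter=100, tol=1e-10):
    """Fit Prob(y=1) = logistic(a + b*logit(p_hat)); Wald test of a=0, b=1."""
    x = [math.log(p / (1.0 - p)) for p in p_hat]  # log-odds of the forecasts
    a, b = 0.0, 1.0  # start at the null hypothesis (an arbitrary choice)
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(a, b, y, x)
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 * score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    _, _, h00, h01, h11 = score_and_information(a, b, y, x)
    det = h00 * h11 - h01 * h01
    se_a, se_b = math.sqrt(h11 / det), math.sqrt(h00 / det)
    d0, d1 = a, b - 1.0
    # Wald = d' I d, where I is the information matrix (inverse covariance)
    wald = h00 * d0 * d0 + 2.0 * h01 * d0 * d1 + h11 * d1 * d1
    return a, b, se_a, se_b, wald

# Hypothetical data (not Table 1): outcome frequencies match the forecasts
p_cal = [0.25] * 4 + [0.5] * 2 + [0.75] * 4
y_cal = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(logit_bias_test(y_cal, p_cal))

# Hypothetical data where high forecasts go with few successes
p_bad = [0.25] * 4 + [0.75] * 4
y_bad = [1, 1, 1, 0, 1, 0, 0, 0]
print(logit_bias_test(y_bad, p_bad))
```

In the first dataset the observed frequencies equal the forecast probabilities, so the estimates sit at the null values and the Wald statistic is essentially zero. In the second the slope estimate has the wrong sign and the statistic is large.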

III. Simulations

To see how the test works in practice, I consider various simulation experiments. As no comparable test is available (see Note 1), I focus only on the proposed test. For sample size N, I generate 2N observations, where the first half is used to estimate the model parameters and the second half is used to create and evaluate the forecasts. The Data Generating Process (DGP) is

$$\mathrm{Prob}(y_i = 1) = \frac{\exp(0 + 2x_i)}{1 + \exp(0 + 2x_i)}$$

where $x_i \sim N(0, 1)$. The binary data on $y_i$ are created as follows:

$$y_i = 1 \quad \text{when} \quad \mathrm{Prob}(y_i = 1) > 0.5$$
$$y_i = 0 \quad \text{when} \quad \mathrm{Prob}(y_i = 1) \le 0.5$$
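The data generation step, as described, can be sketched as follows. The function name `simulate_dgp`, the seed and the sample size are illustrative assumptions. The sketch covers only the generation of the 2N observations and their split into an estimation half and a forecast half.

```python
import math
import random

def simulate_dgp(n, rng):
    """Draw x_i ~ N(0,1), set Prob(y_i=1) = logistic(0 + 2*x_i), threshold at 0.5."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    probs = [1.0 / (1.0 + math.exp(-(0.0 + 2.0 * x))) for x in xs]
    ys = [1 if p > 0.5 else 0 for p in probs]
    return xs, probs, ys

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
N = 500
xs, probs, ys = simulate_dgp(2 * N, rng)

# First half for estimation, second half for creating and evaluating forecasts
x_est, y_est = xs[:N], ys[:N]
x_fc, y_fc = xs[N:], ys[N:]
print(len(x_est), len(x_fc), sum(y_est) / N)
```

With the threshold rule as stated, an observation has outcome 1 exactly when its value of x is positive, because the logistic function exceeds 0.5 only for positive arguments.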

Next, the parameters in the logit model are estimated using Maximum Likelihood, see Franses and Paap (2001, section 4.2). The estimated parameters are used for the second set of N observations to create $\hat{p}_i$. Finally, the logit model in (1) is considered and the Wald test is computed. I use 1000 simulation runs.

First, I examine whether the test has proper size. This turns out not to be the case: even for $N = 10000$, the rejection rate is 16.2%. To obtain a new 5% critical value, the 95th percentile of the simulated Wald test values is taken, which equals 10.71. With this new critical value, size-corrected power experiments can be run.
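The size correction can be sketched as taking an empirical quantile of the Wald statistics simulated under the null hypothesis. The function names and the placeholder list of statistics are illustrative assumptions. The text does not state which nominal critical value produced the 16.2% rejection rate; the asymptotic chi-squared value with 2 degrees of freedom, about 5.99, is the usual choice and is used here only as an assumption.

```python
def empirical_critical_value(stats, level=0.05):
    """Return the (1 - level) empirical quantile, e.g. the 950th of 1000 ordered values."""
    ordered = sorted(stats)
    n = len(ordered)
    k = min(max(round((1.0 - level) * n), 1), n)  # 1-based rank of the quantile
    return ordered[k - 1]

def rejection_rate(stats, critical_value):
    """Share of simulated statistics that exceed the critical value."""
    return sum(s > critical_value for s in stats) / len(stats)

# Placeholder statistics standing in for Wald values simulated under the null
null_stats = [float(v) for v in range(1, 101)]
cv = empirical_critical_value(null_stats, level=0.05)
print(cv, rejection_rate(null_stats, cv), rejection_rate(null_stats, 5.99))
```

By construction, the rejection rate at the empirical critical value is close to the chosen level, which is what makes the later power figures comparable across sample sizes.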

To create data under the alternative hypothesis, I replace observations on $y_i$ in the second set of N observations. In successive experiments, 5%, 10%, 15%, up to 90% of the observations with $y_i = 1$ are replaced by $y_i = 0$. The size-corrected power for $N = 50$, 100, 500 and 1000 is displayed in Table 2.
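A minimal sketch of this contamination step is given below. The text does not say which of the observations with outcome 1 are replaced, so drawing them at random without replacement is an assumption, as are the function name `flip_ones_to_zeros` and the example data.

```python
import random

def flip_ones_to_zeros(y, fraction, rng):
    """Return a copy of y in which the given fraction of the 1s is set to 0."""
    ones = [i for i, v in enumerate(y) if v == 1]
    k = round(fraction * len(ones))        # number of 1s to replace
    flipped = set(rng.sample(ones, k))     # assumption: chosen at random
    return [0 if i in flipped else v for i, v in enumerate(y)]

rng = random.Random(2018)  # arbitrary seed
y_forecast_sample = [1] * 20 + [0] * 20    # hypothetical outcomes
fractions = [step / 100 for step in range(5, 95, 5)]  # 5%, 10%, ..., 90%
for frac in fractions[:3]:
    y_alt = flip_ones_to_zeros(y_forecast_sample, frac, rng)
    print(frac, sum(y_alt))
```

The size-corrected power for a given fraction is then the share of simulation runs in which the Wald statistic computed on the contaminated sample exceeds 10.71.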

Clearly, the size-corrected power is quite high, even for small samples.

IV. Illustration

To illustrate the new test, consider the data in Table 1. There are 32 countries, of which 16 attained the knockout stage of the 2018 World Cup football tournament. Attaining this stage is labelled as 1, and having to leave the tournament after the first round is labelled as 0. The third column of Table 1 presents the probabilities assigned by Goldman Sachs of attaining the second round.

Table 2. Size-corrected power. The 5% critical value is set at 10.71

The Maximum Likelihood parameter estimates (obtained using EViews version 8.0) are 0.033 (0.440) and 1.596 (0.569) for $\alpha$ and $\beta$, respectively, with estimated standard errors in parentheses. The McFadden R-squared (Franses and Paap 2001, page 64) is 0.282, so the logit model fits the data quite well. Finally, the Wald test for the joint hypothesis that $\alpha = 0$ and $\beta = 1$ equals 1.100, which is substantially smaller than the size-corrected critical value of 10.71. This suggests that the Goldman Sachs forecasts were unbiased.
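As a rough consistency check on the reported Wald statistic, one can combine the two estimates and their standard errors while assuming that the covariance between the two estimates is negligible. That covariance is not reported, so this is an assumption and the result is only an approximation:

$$W \approx \left(\frac{0.033 - 0}{0.440}\right)^2 + \left(\frac{1.596 - 1}{0.569}\right)^2 \approx 0.006 + 1.097 \approx 1.10$$

Under this assumption, nearly all of the statistic comes from the deviation of the slope estimate from 1, which is about one standard error.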

Acknowledgments

Thanks to Richard Paap for helpful comments and Max Welz for programming.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1 There are tests on the so-called hit rate, that is the fraction of correctly predicted 1 and 0 observations, but that concerns another feature of the forecasts.

References

  • Cramer, J. S. 1991. The Logit Model: An Introduction for Economists. New York: Routledge.
  • Franses, P. H., and R. Paap. 2001. Quantitative Models in Marketing Research. Cambridge UK: Cambridge University Press.
  • Mincer, J., and V. Zarnowitz. 1969. “The Evaluation of Economic Forecasts.” In Economic Forecasts and Expectations, edited by J. Mincer. New York: National Bureau of Economic Research.