Testing for bias in forecasts for independent binary outcomes


ABSTRACT

This letter deals with a test on forecast bias in predicting independent binary outcomes, where the outcomes are either 1 or 0, and the predictions are probabilities. The test is a Wald test of two parameter restrictions in a simple logit model. Size-corrected power experiments show that the test has high power.



I. Introduction and motivation

This letter deals with a test on forecast bias in predicting independent binary outcomes, where the outcomes are either 1 or 0, and the predictions are probabilities. The test does not require knowledge of how the predictions were created: they can be based on a logit model (Cramer 1991), a probit model, a linear probability model, or expert judgement.

For continuous outcomes, forecast bias can be examined with an auxiliary regression that links the forecasts to the realizations. Suppose that the realizations $y_i$ and the forecasts $\hat{y}_i$ are continuous variables, either cross-sectional or time series data, with forecast sample $i = 1, 2, \ldots, N$. Bias can then be examined using the auxiliary regression

$$y_i = \alpha + \beta \hat{y}_i + \varepsilon_i$$

The parameters are estimated using Ordinary Least Squares. The Wald test of interest concerns the joint hypothesis that $\alpha = 0$ and $\beta = 1$. Under this null hypothesis, there is no forecast bias. This regression is called the Mincer-Zarnowitz regression; see Mincer and Zarnowitz (1969).
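For continuous forecasts, the regression and the joint Wald test can be sketched in Python with NumPy. This is a minimal sketch assuming homoskedastic errors, so the covariance matrix is the classical OLS one, $\hat{\sigma}^2 (X'X)^{-1}$; the function name `mincer_zarnowitz_wald` and the simulated data are illustrative assumptions, not part of the letter.

```python
import numpy as np


def mincer_zarnowitz_wald(y, y_hat):
    """OLS of y on a constant and y_hat, plus the Wald statistic for alpha = 0, beta = 1."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    X = np.column_stack([np.ones_like(y_hat), y_hat])  # constant and forecast
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)         # OLS estimates (alpha, beta)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                # classical (homoskedastic) OLS covariance
    diff = coef - np.array([0.0, 1.0])                   # distance from the null (0, 1)
    wald = float(diff @ np.linalg.solve(cov, diff))
    return coef, wald


# Simulated example: unbiased forecasts versus forecasts with a level and slope bias.
rng = np.random.default_rng(1)
y_hat = rng.normal(size=500)
y_unbiased = y_hat + rng.normal(scale=0.5, size=500)
y_biased = 1.0 + 0.5 * y_hat + rng.normal(scale=0.5, size=500)

coef_u, wald_u = mincer_zarnowitz_wald(y_unbiased, y_hat)
coef_b, wald_b = mincer_zarnowitz_wald(y_biased, y_hat)
print(coef_u, wald_u)
print(coef_b, wald_b)
```

Under the null hypothesis, the statistic is asymptotically $\chi^2(2)$ distributed, with a 5% critical value of about 5.99.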

In this letter, I propose a similar test for independent binary outcomes, that is, the realizations are either 1 or 0, and the predictions are estimated probabilities that the outcome is 1. The question is whether these probabilities are unbiased. Note that if the predictions are also 1 or 0, one can resort to variants of tests on hit rates (see, for example, Franses and Paap 2001, p. 65), but a test for the hit rate is not the focus here. The new test is as easy to implement as the test based on the Mincer-Zarnowitz regression, and power simulations show that it works well. The test for forecast bias is illustrated using the 2018 Goldman Sachs predictions of which football teams would make it to the second round of the 2018 World Cup in Russia.

II. The main idea

Consider $N$ forecasts $\hat{p}_i$ and $N$ observations $y_i$, where $y_i$ takes the value 1 or 0 and the forecasts lie between 0 and 1. An example is the dataset in Table 1, which contains the Goldman Sachs forecasts. The aim is to test whether these forecasts are biased.

Table 1. Realizations and forecasts concerning surviving the first round of the 2018 World Cup in Russia. Data source is Exhibit 2 of the 11 June 2018 Global Macro Research report of the Goldman Sachs Group, Inc


The key identity for designing the test is the following, where $\text{Prob}(y_i)$ denotes the probability that $y_i = 1$:

$$\text{Prob}(y_i) = \hat{p}_i$$

Simple algebra shows that this can be written as

$$\text{Prob}(y_i) = \frac{\exp\left(\log\left(\frac{\hat{p}_i}{1 - \hat{p}_i}\right)\right)}{1 + \exp\left(\log\left(\frac{\hat{p}_i}{1 - \hat{p}_i}\right)\right)} = \hat{p}_i$$
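As a one-line numerical check of this identity, take $\hat{p}_i = 0.8$ (an illustrative value, not one of the Goldman Sachs forecasts):

```latex
\log\left(\frac{0.8}{1 - 0.8}\right) = \log 4, \qquad
\frac{\exp(\log 4)}{1 + \exp(\log 4)} = \frac{4}{1 + 4} = 0.8 .
```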

The middle term is the expression for the logit model (Franses and Paap 2001, p. 54), which for a single explanatory variable $x_i$ is given by

$$\text{Prob}(y_i) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$

Hence, by setting $x_i = \log(\hat{p}_i / (1 - \hat{p}_i))$, a Mincer-Zarnowitz-type test for the null hypothesis of no forecast bias can be based on the logit model

$$\text{Prob}(y_i) = \Lambda\left(\alpha + \beta \log\left(\frac{\hat{p}_i}{1 - \hat{p}_i}\right)\right) \qquad (1)$$

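with $\Lambda(z) = \exp(z) / (1 + \exp(z))$ the logistic function, together with a Wald test of the joint hypothesis $\alpha = 0$ and $\beta = 1$. Under this null hypothesis, (1) reduces to $\text{Prob}(y_i) = \hat{p}_i$, that is, the forecasts are unbiased.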
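The test can be sketched in Python with NumPy. This minimal sketch assumes that all forecasts lie strictly between 0 and 1 (otherwise the log-odds are infinite), fits the logit model (1) by Maximum Likelihood with Newton-Raphson iterations, and uses the inverse of the information matrix as the covariance matrix of the estimates. The names `fit_logit` and `forecast_bias_wald` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np


def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Maximum Likelihood logit via Newton-Raphson; returns estimates and covariance."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))          # fitted probabilities
        grad = X.T @ (y - p)                            # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # information matrix
        step = np.linalg.solve(info, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return theta, np.linalg.inv(info)


def forecast_bias_wald(y, p_hat):
    """Wald statistic for H0: alpha = 0 and beta = 1 in logit model (1)."""
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    log_odds = np.log(p_hat / (1.0 - p_hat))            # regressor in (1)
    X = np.column_stack([np.ones_like(log_odds), log_odds])
    theta, cov = fit_logit(X, y)
    diff = theta - np.array([0.0, 1.0])
    wald = float(diff @ np.linalg.solve(cov, diff))
    return theta, cov, wald


# Simulated example: outcomes drawn with the forecast probabilities (no bias),
# and outcomes drawn with half of each forecast probability (biased forecasts).
rng = np.random.default_rng(2)
p_hat = rng.uniform(0.05, 0.95, size=5000)
y_unbiased = rng.binomial(1, p_hat)
y_biased = rng.binomial(1, 0.5 * p_hat)

theta_u, cov_u, wald_u = forecast_bias_wald(y_unbiased, p_hat)
theta_b, cov_b, wald_b = forecast_bias_wald(y_biased, p_hat)
print(theta_u, wald_u)
print(theta_b, wald_b)
```

The nominal reference distribution for the statistic is $\chi^2(2)$; since the simulations in Section III find the test to be oversized, a simulated critical value may be preferable in small samples.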

III. Simulations

To examine how the test performs in practice, I run several simulation experiments. As no comparable test is available,¹ I consider only the proposed test. For sample size $N$, I generate $2N$ observations; the first half is used to estimate the model parameters, and the second half is used to create and evaluate the forecasts. The Data Generating Process (DGP) is

$$\text{Prob}(y_i) = \frac{\exp(0 + 2x_i)}{1 + \exp(0 + 2x_i)}$$

where $x_i \sim N(0, 1)$. The binary data on $y_i$ are created as follows:

$$y_i = \begin{cases} 1 & \text{when } \text{Prob}(y_i) > 0.5 \\ 0 & \text{when } \text{Prob}(y_i) \leq 0.5 \end{cases}$$

Next, the parameters of the logit model are estimated on the first $N$ observations by Maximum Likelihood (see Franses and Paap 2001, section 4.2). The estimated parameters are then applied to the second set of $N$ observations to create the forecasts $\hat{p}_i$. Finally, the logit model in (1) is estimated on the second set and the Wald test is computed. I use 1000 simulation runs.

First, I examine whether the test has the correct size. It does not: even for $N = 10000$, the rejection rate at the nominal 5% level is 16.2%. To obtain a size-corrected 5% critical value, I take the 95th percentile of the simulated Wald statistics, which equals 10.71. With this critical value, size-corrected power experiments can be run.

To create data under the alternative hypothesis, I replace observations on $y_i$ in the second set of $N$ observations: in turn, 5%, 10%, 15%, and so on up to 90% of the observations with $y_i = 1$ are set to $y_i = 0$. The size-corrected power for $N = 50$, 100, 500 and 1000 is displayed in Table 2.

Clearly, the size-corrected power is quite high, even for small samples.
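The simulation design can be sketched in Python with NumPy as follows; all function names are hypothetical. Two assumptions are labeled in the code. First, a literal threshold at 0.5 makes $y_i = 1$ exactly when $x_i > 0$, so the first-half logit fit would face perfect separation and have no finite Maximum Likelihood estimates; the sketch therefore draws $y_i$ as Bernoulli with probability $\text{Prob}(y_i)$, the usual way to simulate from a logit DGP. Second, the observations with $y_i = 1$ that are switched to 0 under the alternative are chosen at random. The sample size and number of runs in the usage lines are kept small so that it runs quickly; the output is not meant to reproduce Table 2.

```python
import numpy as np


def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Maximum Likelihood logit via Newton-Raphson; returns estimates and covariance."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return theta, np.linalg.inv(info)


def wald_stat(y, p_hat):
    """Wald statistic for alpha = 0, beta = 1 in logit model (1)."""
    log_odds = np.log(p_hat / (1.0 - p_hat))
    X = np.column_stack([np.ones_like(log_odds), log_odds])
    theta, cov = fit_logit(X, y.astype(float))
    diff = theta - np.array([0.0, 1.0])
    return float(diff @ np.linalg.solve(cov, diff))


def one_run(n, rng, replace_share=0.0):
    """One replication: estimate on the first n draws, forecast and test on the next n."""
    x = rng.normal(size=2 * n)
    prob = 1.0 / (1.0 + np.exp(-(0.0 + 2.0 * x)))        # DGP with alpha = 0, beta = 2
    y = rng.binomial(1, prob)                             # ASSUMPTION: Bernoulli draws, not a 0.5 threshold
    X = np.column_stack([np.ones(2 * n), x])
    theta, _ = fit_logit(X[:n], y[:n].astype(float))      # estimation sample
    p_hat = 1.0 / (1.0 + np.exp(-(X[n:] @ theta)))        # forecasts for the evaluation sample
    y_eval = y[n:].copy()
    if replace_share > 0:                                 # alternative: switch a share of 1s to 0
        ones = np.flatnonzero(y_eval == 1)
        k = int(round(replace_share * ones.size))
        y_eval[rng.choice(ones, size=k, replace=False)] = 0  # ASSUMPTION: random selection
    return wald_stat(y_eval, p_hat)


def size_corrected_power(n, runs, shares, seed=0):
    """Empirical 5% critical value under the null and rejection rates under the alternatives."""
    rng = np.random.default_rng(seed)
    null_stats = np.array([one_run(n, rng) for _ in range(runs)])
    crit = float(np.quantile(null_stats, 0.95))           # 95th percentile of simulated Wald values
    power = {s: float(np.mean([one_run(n, rng, s) > crit for _ in range(runs)]))
             for s in shares}
    return crit, power


crit, power = size_corrected_power(n=200, runs=200, shares=[0.1, 0.5])
print(crit, power)
```

Taking the empirical 95th percentile of the null statistics as the critical value fixes the rejection rate under the null at 5% by construction, which is what makes the power figures comparable across sample sizes.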

IV. Illustration

To illustrate the new test, consider the data in Table 1. Of the 32 countries, 16 reached the knockout stage of the 2018 World Cup football tournament. Reaching this stage is labelled as 1, and leaving the tournament after the first round as 0. The third column of Table 1 presents the probabilities assigned by Goldman Sachs to reaching the second round.

Table 2. Size-corrected power. The 5% critical value is set at 10.71


The Maximum Likelihood parameter estimates (computed with EViews version 8.0) are 0.033 (0.440) for $\alpha$ and 1.596 (0.569) for $\beta$, with estimated standard errors in parentheses. The McFadden R-squared (Franses and Paap 2001, p. 64) is 0.282, so the logit model fits the data quite well. Finally, the Wald statistic for the joint hypothesis $\alpha = 0$ and $\beta = 1$ equals 1.100, which is well below the size-corrected critical value of 10.71. Hence the null hypothesis of no forecast bias is not rejected, which suggests that the Goldman Sachs forecasts were unbiased.
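The reported Wald value can be roughly checked from the estimates and standard errors alone. The covariance between the two estimates is not reported, so this sketch assumes it is zero; the Wald statistic then reduces to the sum of the squared t-ratios against $\alpha = 0$ and $\beta = 1$.

```python
# Estimates and standard errors reported for the Goldman Sachs data.
alpha_hat, se_alpha = 0.033, 0.440
beta_hat, se_beta = 1.596, 0.569

# ASSUMPTION: zero covariance between the two estimates (not reported in the text),
# so the Wald statistic is the sum of squared t-ratios against (0, 1).
wald_approx = ((alpha_hat - 0.0) / se_alpha) ** 2 + ((beta_hat - 1.0) / se_beta) ** 2

critical_value = 10.71  # size-corrected 5% critical value from Section III
print(round(wald_approx, 3), wald_approx < critical_value)
```

This approximation gives about 1.10, close to the reported 1.100, which is consistent with a small covariance term; it is a plausibility check, not a substitute for the full Wald statistic.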

Acknowledgments

Thanks to Richard Paap for helpful comments and Max Welz for programming.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

¹ There are tests on the so-called hit rate, that is the fraction of correctly predicted 1 and 0 observations, but that concerns another feature of the forecasts.

References

Cramer, J. S. 1991. The Logit Model: An Introduction for Economists. New York: Routledge.
Franses, P. H., and R. Paap. 2001. Quantitative Models in Marketing Research. Cambridge UK: Cambridge University Press.
Mincer, J., and V. Zarnowitz. 1969. “The Evaluation of Economic Forecasts.” In Economic Forecasts and Expectations, edited by J. Mincer. New York: National Bureau of Economic Research.