826
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Inclusion of older annual data into time series models for recent quarterly data

ORCID Icon

ABSTRACT

This note proposes time series models for data where the frequency changes over time. As an example, for many countries, in the past, real GDP growth was observed annually and since a few years or decades, the data are available per quarter. Modifying the time series models allows for the incorporation of these older annual data, without a need for imputation. Illustrations for real GDP growth in China and the Netherlands show the merits of the method.

JEL CLASSIFICATION:

I. Introduction

For many countries, national statistics institutes collect reliable quarterly data for real Gross Domestic Product (GDP) growth, whereas before they used to collect those data only at the annual level. For example, quarterly real GDP growth in China is made available since 1992Q1 by the National Bureau of Statistics of China (NBSC) (www.stats.gov.cn), while annual real GDP growth rates are available from the World Bank since 1961. Statistics Netherlands makes available their real GDP growth rates per quarter through https://opendata.cbs.nl/statline/, where at the time of writing this paper, the available data start in 1995Q1, whereas the World Bank published annual real GDP growth rates again since 1961.

presents the quarterly and annual data for China, starting in 1961 and ending in 2009Q4, see Franses and Mees (Citation2013) for an analysis. It is the purpose of this note to propose a model for the annual and quarterly data at the same time,Footnote1 that is, to account for the gaps in the quarterly data before 1992Q1. The nominal GDP data in current prices data in China are reported in a format that differs from many other countries but is convenient for the present purposes. That is, NBSC presents the data in cumulated format. Writing GDP in quarter Q in year T as ZQ,T, then the nominal GDP data look like

X1,T=Z1,T
X2,T=Z1,Tr+Z2,T
X3,T=Z1,Trr+Z2,Tr+Z3,T
X4,T=Z1,Trrr+Z2,Trr+Z3,Tr+Z4,T

Figure 1. Quarterly and annual real DGP growth for China, starting in 1961 and ending in 2009Q4

Figure 1. Quarterly and annual real DGP growth for China, starting in 1961 and ending in 2009Q4

where ‘r’ means the first revision, ‘rr’ the second revision and ‘rrr’ the third revision. The way the data are compiled in the fourth quarter matches with annual GDP. As a consequence, writing XTas total annual GDP, it is clear that

100logXTlogXT1=100logX4,TlogX4,T1

In other words, the annual data in the years before 1992 for China match with the annual growth rates per quarter, but then only observed in the fourth quarter. In turn, this means that the annual data before 1992 can be treated as quarterly rates with gaps for the quarters Q1, Q2 and Q3, see again .

In Section 2 this paper continues with analysing a first-order autoregression for the quarterly GDP growth dates in China. Section 3 focuses on a second-order autoregression for the quarterly data for the Netherlands, which first have to be turned in the cumulated format like in China. Section 4 concludes with various further research topics.

II. Real GDP growth in China: autoregression of order 1

This section will focus on a first-order autoregression and its formulation in case of gaps in the data. The intention is not to impute newly created data in these gaps, but merely to adjust the expression of the model to accommodate the gaps.

Assume there are quarterly real GDP growth figures, based on quarterly cumulative levels data xt, and continue with

yt=100(logxtlogxt4)

It is convenient to write a first autoregression as

(1) ytδ=α1yt1δ+εt(1)

where δ is the mean of the variable and where εt is a zero mean white noise process with variance σ2. When this model is fitted to the Chinese data for 1992Q1 to 2009Q4, there are no signs of significant residual autocorrelation, so this autoregression seems to fit the Chinese data well.

Backward substitution of the model in (1) towards yt4 results in

(2) ytδ=α14yt4δ+εt+α1εt1+α12εt2+α13εt3(2)

This shows that skip-sampling the data with frequency 4 does not introduce additional autocorrelation, that is, εt+α1εt1+α12εt2+α13εt3 is uncorrelated with εt4+α1εt5+α12εt6+α13εt7. The variance of the error term in the part of the sample that involves the annual data is however larger than σ2 as it is σ21+α12+α14+α16. One may now want to make use of this explicit expression, but one can also resort to the (Newey and West) HAC estimator for the standard errors of the parameters.

Defining a variable lagt which takes a value of 4 until 1992Q1 and of 1 afterwards, the expressions in (1) and (2) can be combined into

(3) ytδ=α1lagtytlagtδ+ut(3)

where ut is a zero mean white noise process with time-varying variance.

Fitting model (3) using the Generalized Method of Moments (GMM) (performed in EViews version 11), the estimation results for the sample 1962Q4 to 2009Q4 (102 effective observations) are

δˆ=9.0881.279
αˆ1=0.7650.045

where HAC corrected standard errors are given in parentheses. The Ordinary Least Squares (OLS) based estimate of the standard error would have been 0.035, so indeed the HAC correction takes care of the larger variance in the first part of the sample as the HAC standard error exceeds the OLS standard error.

When model (1) is fitted to the effective sample 1992Q2 to 2009Q4, the following estimation results are obtained:

δˆ=9.5861.205
αˆ1=0.9180.044

where now the OLS-based standard errors are reported in parentheses. Comparing the two sets of estimates, it can be seen that the mean growth rate is about the same across the two samples, but the estimates for the first autoregressive parameter differ substantially. In fact, a Dicky Fuller test statistic for the shorter sample is (0.918–1)/0.044 = −1.864, which suggests the presence of a unit root, while the same statistic for the full sample is (0.765–1)/0.045 = −5.222, which suggests a clear rejection of the unit root null hypothesis.Footnote2

Hence, one can now conclude that the longer sample introduces more power to a unit root test. However, looking again at the data in , it may be that the observation in 1961Q4 is influential as it is large and negative. Re-estimating model (3) for the sample 1963Q4 to 2009Q4 (now with 101 effective observations) gives

δˆ=9.1641.182
αˆ1=0.7330.093

and clearly the HAC-based estimate of the standard error for α1 almost doubles. The Dicky-Fuller unit root test value now becomes (0.733–1)/0.093 = −2.871, which is close to the 5% critical value. Hence, one still would conclude here that, based on the full sample, the unit root null hypothesis gets rejected.

III. Real GDP growth in the Netherlands: autoregression of order 2

In this section the focus is on real GDP data for the Netherlands. A graph of the (non-cumulated) real GDP data for 1995Q1 to 2019Q4 is presented in . The cumulated levels of real GDP data appear in . presents the annual growth rates when observed quarterly, based on not cumulated and cumulated data. It is clear that the two series do not differ much. Finally, adds the annual data observed in the fourth quarter, starting from 1961 onwards.

Figure 2. Real Gross Domestic Product 1995Q1-2019Q4 for the Netherlands (in billions of Euros)

Figure 2. Real Gross Domestic Product 1995Q1-2019Q4 for the Netherlands (in billions of Euros)

Figure 3. Real Gross Domestic Product 1995Q1-2019Q4 for the Netherlands (in billions of Euros), when cumulated as Q1, Q1+ Q2, Q1+ Q2+ Q3 and Q1+ Q2+ Q3+ Q4

Figure 3. Real Gross Domestic Product 1995Q1-2019Q4 for the Netherlands (in billions of Euros), when cumulated as Q1, Q1+ Q2, Q1+ Q2+ Q3 and Q1+ Q2+ Q3+ Q4

Figure 4. Annual growth rates of Real GDP, 1996Q1-2019Q4, for the Netherlands, based on quarterly GDP figures (GROWTH) and based on cumulative quarterly GDP figures (GROWTH_C), computed as yt=100logxtlogxt4

Figure 4. Annual growth rates of Real GDP, 1996Q1-2019Q4, for the Netherlands, based on quarterly GDP figures (GROWTH) and based on cumulative quarterly GDP figures (GROWTH_C), computed as yt=100logxt−logxt−4

Figure 5. Annual growth rates of real GDP for 1961Q4-2019Q4 for the Netherlands

Figure 5. Annual growth rates of real GDP for 1961Q4-2019Q4 for the Netherlands

An analysis of the autocorrelation function of real GDP growth for the last part of the sample involving the actual quarterly data indicates that an autoregression of order 2 describes the data well. To allow for simple backward substitution, as is done in (2), it is convenient to write the second-order autoregression as

ytδα1yt1δ=α2(yt1δα1yt2δ)+εt.
.

Re-arranging terms gives

yt=δ1α11α2+α1+α2yt1α1α2yt2+εt

and this allows to compute the residuals. Again, the parameters can be estimated using GMM. Backward substitution gives

ytδα14yt4δ=α24(yt4δα14yt8δ)+ut

where ut is a zero mean white noise process with time-varying variance. More generally, if lagt is variable that takes a value 4 until 1996.1 and 1 afterwards, then the general form of this second-order autoregression is

ytδα1lagtytlagtδ=α2lagtytlagtδα1lagtyt2lagtδ+ut

where ut still is a zero mean white noise process with time-varying variance.

The estimation results for the sample for the Netherlands that starts in 1963.4 (and has 128 effective observations) are

δˆ=3.0700.608
αˆ1=0.8540.059
αˆ2=0.2760.093

with the HAC corrected standard errors in parentheses. Clearly, all parameters differ from zero, significantly. The Dicky Fuller test value is (0.854–1)/0.059 = −2.475, which is close to the 10% critical value.

The estimation results for the effective sample 1996Q3 to 2019Q4 (94 observations) are

δˆ=1.8920.689
αˆ1=0.7970.099
αˆ2=0.3340.156

where again, now the OLS-based estimated standard errors are in parentheses. The Dicky Fuller test value is (0.797–1)/0.099 = −2.051, which is much further away from the 10% critical value. Hence, also here, one can see that including past annual data provides more evidence against the null hypothesis of a unit root.

IV. Conclusion

This paper has proposed a simple method to include past annual data into models for currently available quarterly data. Application to real GDP growth per quarter for China and the Netherlands showed that the method is easy to apply. As there are longer spans of data to analyse, it was also found that there was more evidence against the unit root null hypothesis.

The method can easily be applied to any type of time series and for any frequency. So, if data are available now at the monthly level and before at the quarterly level, the same tools can be used. Note that it is important to write the model such that there is a multiplication of first-order autoregressive terms in order to make the representation easy to handle.

A next step can be to consider vector autoregressive models and autoregressive distributed lag models for data with similar features.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1 In a sense, my approach mimics a MIDAS model, where two or more variables observed at different frequencies are correlated, see Breitung and Roling (Citation2015), Clements and Galvão (Citation2008), and Ghysels, Santa-Clara, and Valkanov (Citation2004) for useful references. In the present paper, however, the focus is on mixed frequencies over time, and not across variables.

2 For the case of an intercept and no trend, the 5% critical value of the Dicky-Fuller test for 100 observations is −2.89, and the 10% critical value is −2.58.

References