Applied Economics Letters Volume 26, 2019 - Issue 1


Spurious principal components

Philip Hans Franses, Econometric Institute, Erasmus School of Economics, Rotterdam, The Netherlands. Correspondence: [email protected]

Eva Janssens, Econometric Institute, Erasmus School of Economics, Rotterdam, The Netherlands

Pages 37-39 | Published online: 01 Feb 2018

https://doi.org/10.1080/13504851.2018.1433292


ABSTRACT

Principal component regression (PCR) is often used to forecast macroeconomic variables when there are many predictors. In this letter, we argue that it makes sense to pre-whiten the predictors before including them in a PCR. With simulation experiments, we show that without such pre-whitening, spurious principal components can appear and that these can become spuriously significant in a PCR. With an illustration based on annual inflation rates for five African countries, we show that non-spurious principal components can be genuinely relevant in empirical forecasting models.

KEYWORDS: Principal component regression; pre-whitening; spurious regressions

JEL CLASSIFICATION:

I. Introduction and motivation

Principal component regression (PCR) is frequently used to forecast macroeconomic variables when there are many predictors; see Stock and Watson (1999, 2002), Bernanke, Boivin, and Eliasz (2005), Heij, Van Dijk, and Groenen (2011) and many others. The idea of the PCR is that the predictors are summarized in a few principal components, and that these new variables enter as explanatory variables in a regression model. When summarizing the predictors, it is typical practice to take growth rates of the predictors in case of unit roots, but otherwise the variables are usually included as they are. In this letter, we recommend pre-whitening all predictors, that is, fitting for example autoregressive models to the data and using the residuals as the new predictors in principal components analysis (PCA). When the PCA results for raw and pre-whitened data are similar, one may well have found non-spurious principal components.

We base our recommendation on a few simulation experiments, which show that without such pre-whitening one runs the risk of finding spurious principal components and spuriously significant newly created regressors in the PCR. The reasons why such spurious effects can arise are the same as those given in Yule (1926), Ames and Reiter (1961) and, of course, Granger and Newbold (1974).

We also illustrate what a PCR can look like with spurious and with non-spurious principal components.

II. Simulation experiments

Consider the creation of four time series variables using the following data generating process (DGP). The four variables are independent, and each is generated as a first-order autoregression. The error terms are all independent draws from a standard normal distribution. The starting values are always equal to 0. In the simulations, t runs from 1 to 50, 100 or 500.
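In symbols, the process described above can be written as follows. The notation is assumed here for illustration only: $x_{i,t}$ denotes the three series that enter the PCA, $y_t$ the fourth series, $\rho$ an autoregressive parameter taken to be common to all series, and $T$ the sample size.

```latex
\begin{aligned}
x_{i,t} &= \rho\, x_{i,t-1} + \varepsilon_{i,t}, \qquad i = 1, 2, 3, \\
y_t &= \rho\, y_{t-1} + \varepsilon_{4,t}, \\
\varepsilon_{j,t} &\sim \text{i.i.d. } N(0, 1), \qquad j = 1, \dots, 4, \\
x_{i,0} &= y_0 = 0, \qquad t = 1, \dots, T, \quad T \in \{50, 100, 500\}.
\end{aligned}
```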

First, we create principal components for three of these variables, based on their correlation matrix. This implies that the sum of the eigenvalues is equal to 3. If the three variables were each a white noise process, the estimated eigenvalues should all be approximately equal to 1. However, as the autoregressive parameter moves away from 0 and approaches 1, we may expect spurious non-zero correlations to appear across the variables, as already demonstrated in Yule (1926), and hence we may expect the first eigenvalue to deviate from 1.
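The eigenvalue experiment described above can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the common autoregressive parameter `rho`, the seed and the reduced number of replications are all illustrative choices.

```python
import numpy as np

def simulate_ar1(rho, T, k, rng):
    """Simulate k independent AR(1) series of length T, all starting at 0."""
    eps = rng.standard_normal((T, k))
    x = np.zeros((T + 1, k))              # row 0 holds the starting values
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + eps[t - 1]
    return x[1:]                          # observations t = 1, ..., T

def first_eigenvalue(x):
    """Largest eigenvalue of the correlation matrix of the columns of x."""
    corr = np.corrcoef(x, rowvar=False)
    return np.linalg.eigvalsh(corr)[-1]   # eigvalsh sorts in ascending order

def mean_first_eigenvalue(rho, T, reps=1000, seed=0):
    """Monte Carlo mean and SD of the first eigenvalue for three series."""
    rng = np.random.default_rng(seed)
    values = [first_eigenvalue(simulate_ar1(rho, T, 3, rng)) for _ in range(reps)]
    return float(np.mean(values)), float(np.std(values))

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        mean, sd = mean_first_eigenvalue(rho, T=100)
        print(f"rho={rho:.1f}: mean first eigenvalue {mean:.3f} (SD {sd:.3f})")
```

Because the largest eigenvalue is a maximum over three estimates, its average lies somewhat above 1 even for white noise in finite samples; the point of the comparison is how much further it moves as `rho` grows.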

A confirmation of these expectations is summarized in Table 1. The cells in the first panel present the average value of the first eigenvalue and its SD across 10,000 replications. It is clear that the larger the autoregressive parameter, the larger the first eigenvalue. When the sample size increases, the deviation from 1 becomes smaller, but not by much. In the second panel, we report the frequency of parameters that are significant at the 5% level, associated with the first principal component in the PCR. There, the fourth variable is generated like the other three variables, and the PCR includes the first lag of the first principal component as a regressor. Clearly, more than 5% of the parameters are significant, but the spurious effects tend to disappear as the sample size increases.

Table 1. The data generating process.
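The second-panel experiment can be sketched along the following lines. Several details are assumptions made for illustration: the regression contains only an intercept and the lagged first principal component, the test uses ordinary OLS standard errors with the normal critical value 1.96, and all function names are hypothetical.

```python
import numpy as np

def simulate_ar1(rho, T, k, rng):
    """Simulate k independent AR(1) series of length T, all starting at 0."""
    eps = rng.standard_normal((T, k))
    x = np.zeros((T + 1, k))
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + eps[t - 1]
    return x[1:]

def first_pc(x):
    """First principal component based on the correlation matrix of x."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    _, vecs = np.linalg.eigh(np.corrcoef(x, rowvar=False))
    return z @ vecs[:, -1]                # eigenvector of the largest eigenvalue

def t_stat_lagged_pc(y, pc):
    """OLS of y_t on a constant and pc_{t-1}; returns the slope t-statistic."""
    X = np.column_stack([np.ones(len(y) - 1), pc[:-1]])
    target = y[1:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def rejection_frequency(rho, T, reps=1000, seed=0, crit=1.96):
    """Share of replications in which the lagged PC appears significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        data = simulate_ar1(rho, T, 4, rng)
        y, x = data[:, 0], data[:, 1:]    # y is independent of the x series
        hits += int(abs(t_stat_lagged_pc(y, first_pc(x))) > crit)
    return hits / reps

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho={rho:.1f}: rejection frequency {rejection_frequency(rho, 100):.3f}")
```

Since `y` is generated independently of the series behind the component, any rejection frequency well above the nominal 5% reflects spurious significance.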

Table 2 presents similar information as Table 1, although now all variables have been pre-whitened, that is, for all variables we first estimate a first-order autoregression and then proceed with the residuals. We store the residuals of these regressions and estimate the first principal component for them. From the cells in Table 2 we learn that pre-whitening makes the spurious results disappear, not only for the eigenvalues and principal components but also for the PCR.

Table 2. The data generating process.
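The pre-whitening step can be sketched as follows: each series is regressed on its own first lag, and the PCA is then applied to the residuals. Including an intercept in the autoregression is an assumption of this sketch, as are the function names, the seed and the number of replications.

```python
import numpy as np

def simulate_ar1(rho, T, k, rng):
    """Simulate k independent AR(1) series of length T, all starting at 0."""
    eps = rng.standard_normal((T, k))
    x = np.zeros((T + 1, k))
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + eps[t - 1]
    return x[1:]

def prewhiten_ar1(x):
    """Residuals from regressing each column on a constant and its first lag."""
    n = x.shape[0] - 1
    resid = np.empty((n, x.shape[1]))
    for j in range(x.shape[1]):
        X = np.column_stack([np.ones(n), x[:-1, j]])
        beta, *_ = np.linalg.lstsq(X, x[1:, j], rcond=None)
        resid[:, j] = x[1:, j] - X @ beta
    return resid

def first_eigenvalue(x):
    """Largest eigenvalue of the correlation matrix of the columns of x."""
    return np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[-1]

def compare_first_eigenvalues(rho, T, reps=1000, seed=0):
    """Mean first eigenvalue for raw and for pre-whitened series."""
    rng = np.random.default_rng(seed)
    raw, white = [], []
    for _ in range(reps):
        x = simulate_ar1(rho, T, 3, rng)
        raw.append(first_eigenvalue(x))
        white.append(first_eigenvalue(prewhiten_ar1(x)))
    return float(np.mean(raw)), float(np.mean(white))

if __name__ == "__main__":
    raw_mean, white_mean = compare_first_eigenvalues(0.9, 100)
    print(f"raw: {raw_mean:.3f}, pre-whitened: {white_mean:.3f}")
```

One observation is lost per series because the first value has no lag, so the residual matrix has one row fewer than the raw data.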

III. Illustration

What is it that we recommend to practitioners so that they can recognize non-spurious principal components? We recommend comparing the eigenvalues before and after pre-whitening. In case of non-spurious results, these eigenvalues should be similar.

Consider as an illustration the three annual inflation rates for France, Japan and the USA; see Franses and Janssens (2017) for data and graphs on these series and the others used later. If we fit a first-order autoregression to each of these variables, the estimated autoregressive coefficients are 0.931, 0.776 and 0.823, respectively. These values are all close to 1, and we should therefore be wary of issues similar to those observed in the simulation experiments earlier.

When we apply PCA to the correlation matrix, we obtain for the raw data the eigenvalues 2.425, 0.446 and 0.129, and for the residuals after fitting country-specific autoregressive models of order 1, the eigenvalues 2.359, 0.418 and 0.223. Hence, in both situations there clearly is a single dominant principal component, explaining fractions of 0.808 and 0.786 of the variation, respectively. The weights in the first principal components are 0.610, 0.535 and 0.584 for the raw data, and 0.600, 0.553 and 0.578 for the pre-whitened data. Not only are the eigenvalues very similar, the weights are also very similar.

Consider now the five annual inflation rates for the North African countries Algeria, Egypt, Libya, Morocco and Tunisia. The first-order autocorrelations are 0.772, 0.704, 0.248, 0.654 and 0.096, respectively. The first eigenvalue obtained from PCA for the raw data is 2.348, and the first principal component covers 0.470 of the total variance. The weights are 0.379, 0.421, 0.539, 0.433 and 0.448. When we fit first-order autoregressions and apply PCA to the residuals, we get a first eigenvalue of 1.870, which is associated with only 0.374 of the total variance. The weights have become 0.404, 0.213, 0.628, 0.212 and 0.594, which seem markedly different from those for the raw data. Hence, we may have found a spurious principal component here.
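The variance shares quoted above follow from dividing the first eigenvalue by the number of variables, because the eigenvalues of a correlation matrix sum to that number. The short sketch below reproduces this arithmetic from the reported figures. The largest absolute change in the weights is added purely as an illustrative summary of how much the weights move; it is not a measure used by the authors.

```python
def variance_share(first_eigenvalue, n_variables):
    """Share of total variance covered by the first principal component."""
    return first_eigenvalue / n_variables

def max_weight_change(raw_weights, whitened_weights):
    """Largest absolute difference between two sets of component weights."""
    return max(abs(a - b) for a, b in zip(raw_weights, whitened_weights))

# Reported first eigenvalues and weights: (raw, pre-whitened)
world = {"n": 3, "eig": (2.425, 2.359),
         "w": ([0.610, 0.535, 0.584], [0.600, 0.553, 0.578])}
north_africa = {"n": 5, "eig": (2.348, 1.870),
                "w": ([0.379, 0.421, 0.539, 0.433, 0.448],
                      [0.404, 0.213, 0.628, 0.212, 0.594])}

if __name__ == "__main__":
    for name, d in (("France/Japan/USA", world), ("North Africa", north_africa)):
        raw_share = variance_share(d["eig"][0], d["n"])
        white_share = variance_share(d["eig"][1], d["n"])
        change = max_weight_change(*d["w"])
        print(f"{name}: share raw {raw_share:.3f}, "
              f"pre-whitened {white_share:.3f}, max weight change {change:.3f}")
```

For France, Japan and the USA the shares are 0.808 and 0.786 and no weight moves by more than 0.018, whereas for North Africa the share drops from 0.470 to 0.374 and one weight moves by 0.221.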

In Table 3, we report the estimation results for inflation in Botswana and Lesotho, two countries that are quite far away from North Africa, but for which inflation may resonate with worldwide inflation (which we take to be the first principal component for France, Japan and the USA). Each first row shows that the North African principal component seems significant at close to the 5% level, while each second row shows that the world-based principal component is significant at a level much lower than 5%. The forecast performance of the model including the non-spurious principal component is clearly better. When we include both principal components in a single PCR, we obtain p values of 0.168 and 0.186 for the North African component, respectively. The correlation between the two principal components is only 0.335, so this loss of significance is not due to high correlation between these two variables. Hence, the non-spurious principal component makes the spurious component obsolete.

Table 3. Estimation results and evaluation of one-step-ahead forecasts, sample 1961–2015.


This illustration shows that comparing PCA outcomes for raw and pre-whitened data can be useful to diagnose non-spurious principal components.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Ames, E., and S. Reiter. 1961. “Distributions of Correlation Coefficients in Economic Time Series.” Journal of the American Statistical Association 56: 637–656. doi:10.1080/01621459.1961.10480650.
Bernanke, B. S., J. Boivin, and P. Eliasz. 2005. “Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach.” The Quarterly Journal of Economics 120: 387–422.
Franses, P. H., and E. Janssens. 2017. “Inflation in Africa, 1960-2015.” Econometric Institute Report EI-2017-26, Erasmus School of Economics. https://repub.eur.nl/pub/102219
Granger, C. W. J., and P. Newbold. 1974. “Spurious Regressions in Econometrics.” Journal of Econometrics 2: 111–120. doi:10.1016/0304-4076(74)90034-7.
Heij, C., D. Van Dijk, and P. J. F. Groenen. 2011. “Real-Time Macroeconomic Forecasting with Leading Indicators: An Empirical Comparison.” International Journal of Forecasting 27: 466–481. doi:10.1016/j.ijforecast.2010.04.008.
Stock, J. H., and M. W. Watson. 1999. “Forecasting Inflation.” Journal of Monetary Economics 44: 293–335. doi:10.1016/S0304-3932(99)00027-6.
Stock, J. H., and M. W. Watson. 2002. “Forecasting Using Principal Components from a Large Number of Predictors.” Journal of the American Statistical Association 97: 1167–1179. doi:10.1198/016214502388618960.
Yule, G. U. 1926. “Why Do We Sometimes Get Nonsense Correlations between Time-Series? A Study in Sampling and the Nature of Time-Series.” Journal of the Royal Statistical Society A 89: 1–69. doi:10.2307/2341482.
