Abstract
To understand within-person psychological processes, one may fit VAR(1) models (or continuous-time variants thereof) to multivariate time series and display the VAR(1) coefficients as a network. This approach has two major problems. First, the contemporaneous correlations between the variables will frequently be substantial, yielding multicollinearity issues. In addition, the shared effects of the variables are not included in the network. Consequently, VAR(1) networks can be hard to interpret. Second, cross-validation results show that the highly parametrized VAR(1) model is prone to overfitting. In this article, we compare the pros and cons of two potential solutions to both problems. The first is to impose a lasso penalty on the VAR(1) coefficients, setting some of them to zero. The second, which has not yet been pursued in psychological network analysis, is principal component VAR(1), termed PC-VAR(1). In this approach, the variables are first reduced to a few principal components, which are rotated toward simple structure; then VAR(1) analysis (or a continuous-time analog) is applied to the rotated components. Reanalyzing the data of a single participant of the COGITO study, we show that PC-VAR(1) has better predictive performance than the lasso approach and that networks based on PC-VAR(1) clearly represent both the lagged and the contemporaneous variable relations.
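The three PC-VAR(1) steps named above (reduce, rotate, fit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: varimax is assumed as the rotation toward simple structure, the variables are assumed to be standardized first, the number of components is chosen by hand, and all function names (`varimax`, `fit_var1`, `pc_var1`) are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return an orthogonal rotation matrix that maximizes the varimax criterion."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)  # column sums of squared loadings
        gradient = loadings.T @ (rotated ** 3 - rotated * col_ss / n_vars)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        if criterion > 0 and s.sum() < criterion * (1 + tol):
            break  # criterion no longer improves
        criterion = s.sum()
    return rotation


def fit_var1(series):
    """Least-squares VAR(1): series[t] = intercept + transition @ series[t-1] + error."""
    lagged = np.column_stack([np.ones(len(series) - 1), series[:-1]])
    coef, *_ = np.linalg.lstsq(lagged, series[1:], rcond=None)
    # transition[i, j] is the effect of variable j at t-1 on variable i at t
    return coef[0], coef[1:].T


def pc_var1(data, n_components):
    """Reduce to principal components, rotate them, then fit VAR(1) to the scores."""
    # Assumption: variables are standardized before the component analysis.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    n_obs = z.shape[0]
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * np.sqrt(n_obs - 1)  # unit-variance scores
    loadings = vt[:n_components].T * s[:n_components] / np.sqrt(n_obs - 1)
    rotation = varimax(loadings)  # assumed rotation criterion
    scores_rot = scores @ rotation
    loadings_rot = loadings @ rotation
    _, transition = fit_var1(scores_rot)
    return loadings_rot, scores_rot, transition


# Illustrative run on simulated noise (not the COGITO data).
rng = np.random.default_rng(0)
demo = rng.standard_normal((200, 6))
loadings_rot, scores_rot, transition = pc_var1(demo, n_components=2)
print(transition.shape)  # (2, 2)
```

In a network built this way, the rotated loadings would describe the contemporaneous relations (which variables belong to which component), while the component-level transition matrix would describe the lagged relations.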
Notes
1 In Bulteel, Tuerlinckx, Brose, and Ceulemans (2016b), we propose relative importance metrics as a way to include shared effects. Because that approach does not resolve the other problems identified here, we do not consider it in this article.
2 Similar graphs are drawn to visualize structural equation models, although they are based on other conventions. In particular, manifest variables are displayed as squares, and latent variables as circles. Covariances and residuals are drawn as double-headed arrows, and directed relations as single-headed arrows.
3 Similar graphs are drawn to visualize structural equation models, although they are based on other conventions. In particular, manifest variables are displayed as squares, and latent variables as circles. Covariances and residuals are drawn as double-headed arrows, and directed relations as single-headed arrows.
4 A lag length selection analysis (Brandt & Williams, 2007) was performed by means of the Bayesian information criterion (BIC; Schwarz, 1978), computed using the R package vars (Pfaff, 2008). The BIC was minimized for a VAR model of order 1 (the values for lag lengths 1, 2, and 3 are –2.45, 6.03, and 12.08, respectively). The conclusions are the same after standardizing the data.
5 We also applied EPFA to the three presented data sets. Results can be requested from the first author.