2,077
Views
1
CrossRef citations to date
0
Altmetric
Articles

Bayesian Nonparametric Panel Markov-Switching GARCH Models

ORCID Icon, &

Abstract

This article proposes Bayesian nonparametric inference for panel Markov-switching GARCH models. The model incorporates series-specific hidden Markov chain processes that drive the GARCH parameters. To cope with the high-dimensionality of the parameter space, the article assumes soft parameter pooling through a hierarchical prior distribution and introduces cross sectional clustering through a Bayesian nonparametric prior distribution. An MCMC posterior approximation algorithm is developed and its efficiency is studied in simulations under alternative settings. An empirical application to financial returns data in the United States is offered with a portfolio performance exercise based on forecasts. A comparison shows that the Bayesian nonparametric panel Markov-switching GARCH model provides good forecasting performances and economic gains in optimal asset allocation.

1 Introduction

Over the last 10 years, there has been an increasing interest in the study of volatility of large panels of asset returns, with a special focus on dynamic dependence and heterogeneity across assets. Studies on asset volatility are relevant in strategic investing decisions and useful to professional investors, such as investment companies, pension funds and mutual funds, to improve the portfolio allocation.

In this article we consider a benchmark dataset of the S&P100 constituents and compute the percentage log-returns at weekly frequency from 6th January 2000 to 3rd October 2020.

reports the estimates of the log-volatility and log-kurtosis of the 78 constituents considered in the analysis (top panel). Three dates are selected within the dot.com bubble, the financial crisis and COVID outbreak periods, respectively, where significant volatility in returns was observed. The figure indicates that volatility and kurtosis change over time with time series clustering effects. This calls for the use of Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models with Markov-switching (MS) effects as to deal with regime changes and temporal clustering of the conditional volatility (e.g., see Engle Citation1982; Bollerslev Citation1986; Ang and Timmermann Citation2012; Bauwens and Otranto Citation2016).

Figure 1: Top panel: rolling window estimates of the log-volatility (left) and log-kurtosis (right) for the S&P100’s constituents from 6th January 2000 to 3rd October 2020 (30 weeks window). Vertical bars indicate three reference dates: 6th July 2002, 23rd August 2008 and 22nd February 2020. Bottom panel: cross-sectional distribution of the log-volatility (left) and log-kurtosis (right) in three reference dates.

Figure 1: Top panel: rolling window estimates of the log-volatility (left) and log-kurtosis (right) for the S&P100’s constituents from 6th January 2000 to 3rd October 2020 (30 weeks window). Vertical bars indicate three reference dates: 6th July 2002, 23rd August 2008 and 22nd February 2020. Bottom panel: cross-sectional distribution of the log-volatility (left) and log-kurtosis (right) in three reference dates.

Furthermore, the cross-sectional distribution of the volatility and kurtosis exhibits multiple modes and long tails (see bottom panel). This fact seems to imply cross-section heterogeneity in the data with possible similarities in the dynamics. Given that these effects can only be partially captured by independent MS models, several multivariate GARCH models have been proposed to account for dependence (see Virbickaite, Ausín, and Galeano Citation2015; Bauwens and Otranto Citation2016, Citation2020).

Nevertheless, the estimation of a large number of parameters with the available data dimension may lead to intractability, overfitting and loss of efficiency. Strong restrictions, such as the parameter pooling assumption, can be used, even though those appear to be too restrictive. Instead, different approaches, such as shrinkage or sparse estimation, can be applied. Compared to standard approaches (e.g., see regularization techniques), the Bayesian framework based on hierarchical prior distributions provides a coherent approach to inference that naturally allows for partial pooling and sharing of information across equations. Bayesian inference and hierarchical priors have been successfully used in econometrics to avoid overparameterization and overfitting (e.g., see Canova and Ciccarelli Citation2004). Bayesian inference accommodates for various degrees of shrinking and for sparse estimation through the choice of suitable classes of prior distributions, such as Bayesian Lasso prior (Park and Casella Citation2008) and spike-and-slab (George and McCulloch Citation1993), and can be combined with other dimensionalityreduction strategies.

In this respect, evidence of cluster-wise dependence in the distribution of financial asset returns (see Bauwens and Rombouts Citation2007) has prompted researchers to adopt cross-sectional clustering of the time series as a building block for a dimensionality reduction step in large dimensional problems and over-parameterized models (e.g., see Hirano Citation2002; Billio, Casarin, and Rossini Citation2019; Fisher and Jensen Citation2022). To address this issue, this article uses a panel MSGARCH model with cross-sectional clustering based on a Bayesian nonparametric (BNP) prior (Lo Citation1984). In order to detect the number of regimes, we extend the approach of Otranto and Gallo (Citation2002) to a panelframework.

A hierarchical Pitman-Yor process prior (Pitman and Yor 1997) for the MSGARCH parameters is considered. In the first stage of the hierarchical prior, cross-unit heterogeneity is allowed for, while shrinking all unit-specific parameters toward a common mean. The second stage of the hierarchy allows for mixed effects in the common mean. There are many advantages in using this hierarchical nonparametric prior. First, our approach allows for making inference on the number of mixture components in the cross-sectional clustering. Second, it adds flexibility to the model allowing for different shapes of the prior and posterior predictive distributions. Third, the predictive distribution incorporates uncertainty in the parameters and in the number of mixture components. Nonparametric Bayesian techniques have been largely and successfully used in different fields such as biostatistics (Do, Müller, and Tang Citation2005), biology Arbel, Mengersen, and Rousseau (Citation2016), medicine (Xu et al. Citation2016), and neuroimaging (Zhang et al. Citation2016). For an introduction to Bayesian nonparametrics see Hjort et al. (Citation2010) and for a review of models and applications in different fields see Müller and Mitra (Citation2013).

The inference proposed in this article is novel in some respects. As such, the article contributes to the Bayesian nonparametrics literature for time series analysis (e.g., see Taddy and Kottas Citation2009; Jensen and Maheu Citation2010; Griffin and Steel Citation2011; Di Lucca et al. Citation2013; Bassetti, Casarin, and Leisen Citation2014; Billio, Casarin, and Rossini Citation2019; Nieto-Barajas and Quintana Citation2016; Griffin and Kalli Citation2018). The article innovates on the Bayesian nonparametric dynamic panel model in Hirano (Citation2002) by introducing Markov-switching and GARCH dynamics and extends the nonparametric switching regression in Taddy and Kottas (Citation2009) to a panel model with GARCH dynamics. Our approach differs from those in Hirano (Citation2002) and Taddy and Kottas (Citation2009), and is in line with the strategies for large dimensional and over-parameterized models (e.g., see MacLehose and Dunson Citation2010; Wang Citation2010; Billio, Casarin, and Rossini Citation2019), where nonparametric hierarchical priors are used to combine partial pooling and clustering effects in theparameter space.

Further, differently from Hirano (Citation2002) and Taddy and Kottas (Citation2009), the article uses a MCMC algorithm for posterior approximation that relies on the efficient sampling method developed in Walker (Citation2007), Kalli, Griffin, and Walker (Citation2011), and Hatjispyros, Nicoleris, and Walker (Citation2011). Lastly, the article makes a contribution to the literature on Bayesian Markov-switching panel models (e.g., see Kaufmann Citation2010, Citation2015; Billio et al. Citation2016; Casarin et al. Citation2019) by introducing GARCH effects and allowing for a flexible nonparametric specification.

The estimation of MSGARCH models is also a difficult task given the path dependence problem (Gray Citation1996) and approximation methods have been considered (e.g., see Haas, Mittnik, and Paolella Citation2004; Ardia Citation2008; Bauwens, Preminger, and Rombouts Citation2010; He and Maheu Citation2010; Bauwens, Dufays, and Rombouts Citation2014; Dufays Citation2016; Wee, Chen, and Dunsmuir Citation2022). In this article, we extend the univariate Gibbs sampler by Billio, Casarin, and Osuntuyi (Citation2016) to a multiple time series set-up and provide an efficient MCMC procedure for the hidden states of a panel MSGARCH model. The proposed method relies on a combination of blocking Gibbs and generalized Metropolis samplers.

An empirical application to financial returns of the S&P100 constituents in the United States over the period 6th January 2000–3rd October 2020 is provided. The analysis offers an optimal portfolio comparison based on one-step ahead forecasts generated by the Bayesian nonparametric panel MSGARCH (BNP-MSGARCH) and competitive models, such as a Bayesian nonparametric GARCH model without regime changes (BNP-GARCH), and a parametric GARCH. First, we detect the number of regimes using a new developed panel version of the univariate procedure by Otranto and Gallo (Citation2002), fit the BNP-MSGARCH to the data, and examine the clusters’ composition. For the BNP-MSGARCH model, we consider two different regime identifications: (a) a restriction on the expected returns (BNP-MSGARCH-μ); and (b) a restriction on the unconditional volatility level (BNP-MSGARCH-γ), for example, see Ardia et al. (Citation2019). Second, the BNP-MSGARCH is evaluated against the other GARCH models in out-of-sample forecasts through statistical accuracy measures. Third, a portfolio exercise is carried out to ascertain economic gains. The main results show that the BNP-MSGARCH model is competitive against the other models for point-forecast and superior for density-forecast and portfolio performance.

The article is organized as follows. Section 2 describes the panel MSGARCH model and the Bayesian nonparametric prior distribution. Section 3 presents the data augmentation strategy and the posterior approximation method. Section 4 offers the empirical application. Section 5 concludes.

2 A Bayesian Nonparametric MSGARCH Model

We assume that the observable variable yit for the ith unit of the panel at time t satisfies (1) yit=μi(sit)+σitεit,εitiidN(0,1),(1) t=1,,T,  i=1,,N,

where N(μ,σ2) denotes the Gaussian distribution with location μ and scale σ. The conditional variance is as follows (2) σit2=γi(sit)+αi(sit)εit12+βi(sit)σit12,(2) which is the MSGARCH model, and sit, t=1,,T is a hidden Markov chain process with transition probability P(sit=k|sit1=l)=pi,lk, k,l=1,,K, with K the number of states. The functional form for the switching parameters is (3) μi(sit)=k=1KμikI(sit=k),αi(sit)=k=1KαikI(sit=k),(3) (4) βi(sit)=k=1KβikI(sit=k),γi(sit)=k=1KγikI(sit=k).(4)

We cope with the high dimension of the parameter space and related overfitting issues by exploiting cross-sectional clustering of the series. More specifically we propose to combine two modeling strategies. First, we assume soft parameter pooling through a hierarchical prior distribution with two stages, and second we introduce clustering effects in the parameter space through a nonparametric prior. The resulting joint prior distribution for the MSGARCH parameters is given by the following.

In the first stage, the rows pi,k=(pi,k1,,pi,kK) of the transition matrix follow a Dirichlet distribution (5) (pi,k1,,pi,kK)iidDir(ϕrk1,,ϕrkK),i=1,,N,(5) for all regimes k=1,,K, where the precision parameter ϕ shrinks the unit-specific probabilities toward a common value rk=(rk1,,rkK), that is the Dirichlet distribution parameter. For the second stage we assume (6) (rk1,,rkK)iidDir(1/K,,1/K).(6)

To address the high-dimensionality issue, the first stage of the hierarchical prior shrinks the switching parameters toward some common values, and the second stage clusters the units in Mk groups C1,k,,CMk,k within each regime, such that Ch,kCl,k= for hl and h=1MkCh,k={1,,N}. In the first stage, for k=1,,K we assume (7) μikN(μ˜ik*,s),γik/aBe(rγ˜ik*/a,r(1γ˜ik*/a)),(7) (8) αikBe(rα˜ik*,r(1α˜ik*)),βikBe(rβ˜ik*,r(1β˜ik*)),(8) where Be(α,β) is the beta distribution with mean α/(α+β) and a>0. The scale hyperparameters s and r are shrinking θik=(μik,γik,αik,βik)R×[0,a]×[0,1]2 toward the parameter θ˜ik*=(μ˜ik*,γ˜ik*,α˜ik*,β˜ik*)R×[0,a]×[0,1]2 which is assumed to be constant for all units in the same cluster, that is for all iCh,k where h=1,,Mk (for further details see Section 3 and (23)).

The second stage of the hierarchy generates the clusters of parameters. For each regime k we assume a Pitman-Yor process (PYP) prior (9) θ˜ik*|GkiidGk,GkPYP(ν,ψ,H0),(9) with base measure H0 and concentration and dispersion parameters ν[0,1] and ψ>ν, respectively. We assume H0 is the product of independent normal and uniform distributions N(μ;m*,s*), U(γ;0,a), U(α;0,1) and U(β;0,1), which are usually chosen priors in parametric Bayesian inference for MSGARCH (e.g., see Billio, Casarin, and Osuntuyi Citation2016). The PYP introduced in Pitman and Yor (Citation1997) is a generalization of the Dirichlet process which can be obtained for ν=0 (e.g., see Bassetti, Casarin, and Leisen Citation2014).

Through the illustration of the Chinese Restaurant metaphor, the clustering structure of the PYP is defined by a Polya-Urn sampling scheme. The parameter θ˜i* of the ith unit is either equal to one of the other units or a new one from the base distribution H0, that is, (10) θ˜ik*|θ˜1k*,,θ˜i1k*=1ψν+ij=1i1δθ˜jk*(θ˜ik*)+ψψν+iH0(θ˜ik*).(10)

This sequential allocation procedure generates clusters in the parameter space, where the number of clusters Mk is random with the following prior distribution P(Mk=h)=νh1Γ(ψ/ν+h)Γ(ψ+1)Γ(ψ/ν+1)Γ(ψ+N)Sν(N,h),with hN, where Sν(N,h) is a generalized Stirling number of the first kind, and Γ(x) is the one-parameter gamma function (e.g., see Bassetti, Casarin, and Leisen Citation2014). To evaluate the prior mean of the number of clusters, we use E(Mk)={h=1Nψψ+h1,if  ν=0,Γ(ψ+ν+N)Γ(ψ+1)νΓ(ψ+ν)Γ(ψ+N)ψν,if  ν0.

We summarize our Bayesian nonparametric model in the Directed Acyclic Graph (DAG) representation of .

Figure 2: DAG of the Bayesian nonparametric panel MSGARCH model. It exhibits the hierarchical structure of the observations yt=(y1t,,yNt) (boxes), the latent state variables st=(s1t,,sNt) (gray circles), the parameters of the transition probability matrix Pi=(pi,1,,pi,K), θi=(μi,γi,αi,βi), the hyperparameters of the first stage R=(r1,,rK), θ˜i=(μ˜i*,γ˜i*,α˜i*,β˜i*) and of the second stage G (white circles). The directed arrows show the conditional independence structure of the model.

Figure 2: DAG of the Bayesian nonparametric panel MSGARCH model. It exhibits the hierarchical structure of the observations yt=(y1t,…,yNt) (boxes), the latent state variables st=(s1t,…,sNt) (gray circles), the parameters of the transition probability matrix Pi=(pi,1,…,pi,K), θi=(μi,γi,αi,βi), the hyperparameters of the first stage R=(r1,…,rK), θ˜i∗=(μ˜i*,γ˜i*,α˜i*,β˜i*) and of the second stage G (white circles). The directed arrows show the conditional independence structure of the model.

The PYP clustering effects on the cross section correspond to a probabilistic clustering of the parameters based on an infinite mixture distribution. The PYP can be written in a Sethuraman’s like representation as a discrete random measure (see Sethuraman Citation1994) (11) Gk(dθ)=h=1Whkδθhk(dθ),(11)

where the atoms θhk* are iid random variables from the base measure H0 and the random weights Whk have the stick-breaking representation (12) Whk=Vhkl=1h1(1Vlk),(12) with VlkBe(1ν,ψ+νl), l=1,2,,h (see Arbel, Blasi, and Prünster Citation2019).

By integrating out the discrete part of the hierarchical prior, the following infinite mixture representation of the prior distribution on θ is obtained (13) θik|Gkindπ(θik|θ*)Gk(dθ*)=h=1Whkπ(θik|θhk*),(13) where π(θik|θ*) is the joint parameter distribution at the first stage of hierarchical prior (see (7)–(8)) and Gk(θ*) is the distribution at the second stage. In conclusion the PYP prior allows for probabilistic clustering in the parameter space.

The predictive density induced by our prior assumptions can be written as (14) yit|G,sitindh=1Whsitft(yit|sit,θisit)π(dθisit|θhsit*),(14) where G=(G1,,GK) is the collection of state-dependent measures, ft(yit|sit=k,θisit)=f(yit|μik,σik,t) is the transition kernel of the MSGARCH with σik,t2=γik+αikεit12+βikσit12 for k=1,,K and θik=(μik,γik,αik,βik). This prior predictive density accounts for various forms of possible heterogeneity in the data such as asymmetry, excess of kurtosis and multimodality.

3 Posterior Approximation

In this section we discuss the approximation of the posterior distribution, some identification issues and the selection of the number of regimes. We also summarize the simulation results for approximation efficiency and inference effectiveness. Let Θ=(θ1,,θK) be the collection of the units and regime parameters θk=(θ1k,,θNk) and P=(P1,,PN) the collection of transition probability matrices. Let Y=(y1,,yT) and S=(s1,,sT) be the collection over time of observation vectors yt=(y1t,,yNt) and of latent vectors st=(s1t,,sNt), respectively. The likelihood function of the panel MSGARCH model is not tractable being written in integral form. However, a data-augmentation principle can be applied to develop efficient posterior simulation methods (Frühwirth-Schnatter Citation1994). We introduce the set of auxiliary allocation variables ξik,t=I(sit=k) to write the complete-data likelihood function as follows (15) L(Y,Ξ|Θ,P)=t=1Ti=1Nf(yit|sit,θi,sit)l=1Kk=1lpi,lkξil,t1ξik,t,(15) where Ξ=(Ξ1,,ΞT) is the collection of the latent vectors Ξt=(ξ1t,,ξNt) over time, with ξit=(ξi1,t,,ξiK,t).

The joint hierarchical prior distribution is (16) π(Θ,G)=l=1K(i=1Nπ(θil|Gl)k=1lpi,lkrk1)π(Vl)π(Θl*),(16) where the infinite mixture priors, for k=1,,K, are given by (17) π(θik|Gk)=h=1Whkπ(θik|θhk*),(17)

where π(θik|θhk*)= N(μik|μhk*,s) Be(αik|αhk*,r) Be(βik|βhk*,r) Be(γik/a|γhk*/a,r) is the first-stage joint prior distribution given in (7)–(8), and (18) π(Θk*)=h=1N(μhk*;m*,s*)U(αhk*;0,1)U(βhk*;0,1)U(γhk*;0,a),(18) (19) π(Vk)=l=1Be(Vlk;1ν,ψ+νl),(19) are the joint distributions of the infinite collection of atoms and stick-breaking variables, Θk*=(θ1k*,θ2k*,) and Vk=(V1k,V2k,), respectively.

The joint prior distribution in a Bayesian nonparametric framework is usually not tractable since its support is the space of the discrete random measures which are infinite-dimensional objects (see (17)–(19)). Nevertheless, the data-augmentation principle can be applied in order to make the inference problem more tractable. Following the recent Bayesian nonparametric literature (e.g., see Bassetti, Casarin, and Leisen Citation2014; Bassetti, Casarin, and Ravazzolo Citation2018; Billio, Casarin, and Rossini Citation2019), we introduce a set of slice variables UikU(0,1) and define the index set Aik={h|Uik<Whk}. Then the infinite mixture can be demarginalized as follows (20) π(θik|Uk,Vk,θk*)=h=1I(Uik<Whk)π(θik|θhk*)=hAikπ(θik|θhk*),(20) which is a almost-surely finite mixture since Card(Aik)< a.s., where Uk=(U1k,,UNk) is the collection of slice variables.

Following the standard practice in finite mixture modeling we introduce the latent allocation variable DikAik and obtain (21) π(θik|Uk,Dk,Vk,θk*)=I(Uik<WDikk)π(θik|θDikk*),(21) where Dk=(D1k,,DNk). Let us denote with V=(V1,,VK), U=(U1,,UK) and Θ*=(θ1*,,θK*) the collections of regime-specific auxiliary variables and atoms. The joint posterior distribution π(Ξ,Θ,P,U,D,V,Θ*|Y) is proportional to (22) L(Y,Ξ|Θ,P)=l=1K(i=1Nπ(θil|Ul,Dl,Vl,Θl*)k=1lpi,lkrk1)×π(Vl)π(Θl).(22)

Note that the allocation variables allow to reconcile the notations used in the hierarchical model of (7) and the random measure representation in (11)–(13) as follows (23) θ˜ik*=θDikk*.(23)

A Gibbs sampler is used to generate random samples from the joint posterior and to approximate the Bayesian estimator. The Gibbs sampler iterates the following steps

  1. Sample slice and stick-breaking variables U and V given Ξ,Θ,P,D,Θ*,Y

  2. Sample the transition probability matrices P given Ξ,Θ,U,D,V,Θ*,Y

  3. Sample the atoms Θ* given Ξ,Θ,P,U,D,V,Y

  4. Sample the MSGARCH parameters Θ given Ξ,P,U,D,V,Θ*,Y

  5. Sample the switching allocation variables Ξ given Θ,P,U,D,V,Θ*,Y

  6. Sample the mixture allocation variables D given Ξ,Θ,P,U,V,Θ*,Y

The derivation of the full conditional distributions and sampling method are given in Appendix A. The parameter estimator θ̂, with θ element of Θ, is approximated as average of MCMC draws from the posterior. The latent state estimator is approximated as ŝit=k=1Kkξ̂ik,t, with ξ̂ik,t the kth element of the switching allocation variable estimator ξ̂it=argmax{q̂(ξit|Y)}, where q̂(ξit|Y)} is the MCMC approximation of the discrete posterior distribution q(ξit|Y), obtained from the Forward Filtering Backward Sampling step.

The likelihood function and the posterior distribution remain unchanged with respect to any state permutation. This identification issue and the related label switching problem are commonly solved by imposing a prior restriction on the parameters θik, or on a transformation q(θik) (see Celeux Citation1998; Frühwirth-Schnatter Citation2001, Citation2006) such as a prior ordering, q(θi1)>q(θi2)>>q(θiK), with some economic interpretation. In this article we implement two alternative strategies based on performance regimes μi1>μi2>>μiK (BNP-MSGARCH-μ model) and on volatility regimes γi1/(1αi1βi1)>γi2/(1αi2βi2)>>γiK/(1αiKβiK) (BNP-MSGARCH-γ model). The first identification scheme mimics an investing strategy which classifies risky assets into different groups (styles) and move invested funds among these styles depending on their relative performance. This strategy would be decisive in monitoring the relative performance of style portfolios, especially during large market drops so to better understand where markets are heading (see Bianchi Citation2020). In the literature, this approach for asset allocation is known as “style investing” or “style strategies” (Barberis and Shleifer Citation2003) and the style identification criterion is known as “style features.” The classification of assets usually relies on industry sectors (Jame and Tong Citation2014) or factors (Barberis and Shleifer Citation2003), whereas in this article it is driven by our statistical model. The second identification scheme is more common in volatility modeling and can be used to protect portfolio allocation against excess of volatility (i.e., Barro, Canestrelli, and Consigli Citation2019).

In order to detect the number of regimes, we extend the univariate approach by Otranto and Gallo (Citation2002) to a panel data framework. As in Otranto and Gallo (Citation2002), we exploit the fact that the joint density of a panel MS model is a mixture density with the number of components equal to the number of regimes and specify the following multivariate BNP model yt|(μ˜t,Λ˜t1) indNN(μ˜t,Λ˜t1), t=1,2,,T, with location μ˜t=(μ˜1t,,μ˜Nt) and diagonal precision matrix Λ˜t=diag(λ˜1t,,λ˜Nt), such that (μ˜t,Λ˜t1)|GiidG, where GPYP(υ0,ϕ0,G0) is a PY prior. This approach is flexible and robust to structural change-points at the beginning or at the end of the sample. Procedure details are given in Appendix A. For MCMC approximation efficiency and inference effectiveness, we run simulation experiments for four different data generating processes (DGPs) using BNP-MSGARCH-μ. We consider two regimes and various degrees of separation in the cluster intercept and variance. The simulation results show that the MCMC chain converges and a thinning of 10 is needed to reduce the Monte Carlo variance. Details are reported in Appendix B (supplementary materials).

4 Empirical Application

This section presents an empirical application of the BNP-MSGARCH model to the S&P100 constituents over the period 6th January 2000–3rd October 2020. 78 assets of the 101 constituents are selected in order to keep the panel balanced. Further, since style investing uses weekly or monthly predictions (e.g., see Froot and Teo Citation2008; Brookfield, Su, and Bangassa Citation2015), the percentage log-returns at weekly frequency are obtained. As a preliminary evidence, the cross-sectional heterogeneity observed in is robust to the choice of the rolling window size (see Appendix C, supplementary materials). In the analysis, we proceed in three steps: (i) detection of the number of regimes; (ii) forecast comparison; (iii) portfolio analysis.

The number of regimes is detected using the procedure described in Section 3, the BNP-MSGARCH model is fitted to the data (using both BNP-MSGARCH-μ and BNP-MSGARCH-γ), and the clusters’ composition is examined. We set two regimes (the extend procedure of Otranto and Gallo (Citation2002) points to 2 regimes, see ). When ordering the expected returns in the BNP-MSGARCH-μ model (μi1>μi2), regime 1 corresponds to a relative over-performance state and regime 2 to an under-performance state. Observations of the assets are assigned to one of the two regimes on the basis of the largest posterior probability (see ŝit in Section 3). We notice that this identification constraint is strongly supported by the data and allows us to separate the assets returns in two performance regimes (see ).

Figure 3: The posterior co-clustering matrix in regime 1 (left) and 2 (right) for expected returns (top) and volatility (bottom) identification constraints. In each block, gray shades represent the sector membership of the assets.

Figure 3: The posterior co-clustering matrix in regime 1 (left) and 2 (right) for expected returns (top) and volatility (bottom) identification constraints. In each block, gray shades represent the sector membership of the assets.

Figure 4: Price-to-Earning for the assets in clusters of regime 1 (left) and of regime 2 (right), constraint on expected returns (top panel). Logarithm of implied volatility for the assets in clusters of regime 1 (left) and of regime 2 (right), constraint on volatility (bottom panel).

Figure 4: Price-to-Earning for the assets in clusters of regime 1 (left) and of regime 2 (right), constraint on expected returns (top panel). Logarithm of implied volatility for the assets in clusters of regime 1 (left) and of regime 2 (right), constraint on volatility (bottom panel).

For cluster identification, we assume a quite diffuse PYP prior distribution (ν=0 and ψ=10). The posterior distribution is concentrated (see Figure C5) suggesting a substantial revision of the prior information and the MAP estimates of the number of clusters is 2 and 3 for regime 1 and 2, respectively. To study the composition of the clusters in the two regimes, we use the sector classification (for details, see Appendix C) and some fundamental financial ratios. First, we identify the clusters using the co-clustering probability matrix, which contains the probability P({Dik=Djk}|Mk,Y) that the parameters θik* and θjk* are in the same cluster. This probability can be easily approximated by using the MCMC samples as rRkδ(Djk(r)Dik(r))/Card(Rk), where Dik(r) is a sample of the allocation variable for the i-unit parameters in the regimes k, and Rk={r=1,,R|Nk(r)=M} contains the values of MCMC iterations such that the parameters of the panel units are allocated to exactly M mixture components. A spectral clustering algorithm has been applied to reorder the series and provide a better graphical representation of the clusters.

The top panel of reports the co-clustering matrices for the two regimes in case of expected returns identification constraint. In each block matrix, the algorithm identifies a constituent (asset) belonging to a cluster with the label 1 (gray shaded patch) and 0 (white patch) otherwise. The gray shades represent the sectors in the clusters. Following Wade and Ghahramani (Citation2018), we use the variation of information (VI) metric proposed by Meilâ (Citation2007) to compare the two regimes (in terms of clusters). This measure compares the information in the two regimes with the information shared between the two regimes. The normalized value of VI is equal to 0.20 (VI lies in the interval 0log2(N), and a normalized value is obtained dividing VI by log2(N)), which suggests a substantial difference between the clustering and composition in the two regimes (see ).

In particular, for cluster 1 in both regimes, the majority of the sectors representing the assets are: manufacturing (about 40% in both regimes), financial and insurance (19% in the first regime), and wholesale and retail (25% in the second regime). Similar results for the sectors are found for cluster 2 in the two regimes: the manufacturing sector represents about 40% of the assets in the two regimes, while the financial and insurance sector is about 18% for regime 2, and information and communication is around 20%.

Further, in order to characterize the clusters in terms of the market size, measured by market capitalization, of the constituents, we first rank the companies by computing the average size of each of them using the last year of the sample period. Then, we classify the assets into three groups, namely small (bottom panel 30%), medium (middle panel 40%) and big (top panel 30%) companies (see Tab. C4). In regime 1, companies with the medium size represent the largest majority about 40%, and a similar outcome is also observed for regime 2.

For the cluster composition, we also compute the value of the Price-to-Earnings ratio (PE) for all the clusters (the average PE is computed over the last 10 years following the standard practice in style analysis). For regime 1, clusters 1 and 2 have values of PE equal to 26.97 and 32.38, respectively. For regime 2, these values are 27.08, 35.69, and 23.72 for clusters 1, 2, and 3, respectively. In both regimes cluster 2 is overvalued. Further to this, shows that dynamics in clusters 1 and 2 in regime 1 resemble those in regime 2.

Compared to the case of constraint on expected returns, the volatility identification strategy returns separated regimes for the single series, while the identification is less effective for the panel as a whole. Further, the posterior distribution concentrates on 5 and 4 cluster for regime 1 and 2, respectively (see Figure C5). The normalized value of VI is equal to 0.41 (see bottom panel of ). The analysis of implied volatility points to a difference in the level rather than in the dynamics (see ). Further details of all the results on cluster composition are reported in Appendix C.

As regards the one-step ahead forecast comparison, we use the following models: BNP-MSGARCH-μ, BNP-MSGARCH-γ, BNP-GARCH, and GARCH. All the models are GARCH(1,1). The forecasts are computed for the period December 26, 2014–October 3, 2020 by a rolling window of 780 observations. We assess the forecasts by the RMSE and CRPS for the full out-of-sample period, and pre-COVID and COVID periods. The RMSE in indicates that models show equal predictive ability, especially for the full out-of-sample period and the pre-COVID period, with marginal differences (Appendix C includes box-plots of RMSEs). In the COVID period, the performance of all models worsens, but the largest flexibility of the nonparametric GARCH models ensures better forecasts.

Table 1: Out-of-sample evaluation. RMSE and CRPS.

When looking at CRPS, BNP-MSGARCH-γ ranks first and BNP-MSGARCH-μ places second. This result is likely due to the fact that CRPS accounts for differences not only in mean but also in higher order moments of the forecast distributions. The BNP-GARCH and the GARCH have similar performance, with the exception of the COVID period.

A Markowitz portfolio allocation based on forecasts is carried out to establish economic gains. An investor deals with the following decision problem V=i=1Nj=1NXiXjσij, s.t. E=i=1NXiμyi=y¯,i=1NXi=1,Xi0, where E is the expected return from a portfolio, Xi is portfolio weights chosen to minimize the portfolio risk, and μyi is the expected return of the ith asset. Xi is assumed to be nonnegative (no short sales). We omit the constraint E=i=1NXiμyi=y¯ to build a minimum variance portfolio (see Hlouskova, Schmidheiny, and Wagner Citation2009). For the portfolio, we compute the Sharpe ratio for each model using the weekly treasury bill rate from Kenneth French’s data library as a risk free (RF) rate (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html)

In terms of Sharpe ratio, BNP-MSGARCH models gain the best portfolio performance (), indicating that the largest flexibility of nonparametric models has an economic value. The portfolio proportion invested in value stocks rises in periods of increasing overperforming probability (see Figure C9). The weights to stocks with large implied volatility decrease when the probability of high-volatility regime increases. In particular, value stocks are undervalued at the beginning of the outbreak and reconvey to a larger portfolio proportion toward the end of 2020.

5 Conclusion

The study of volatility in large panels of financial assets is central for professional investors focusing on protection of invested portfolios, and for policy maker aiming at the stability of the economic system. The evidence of clusters in financial returns has attracted researchers’ attention. This article proposes a new Bayesian nonparametric (BNP) inference for Markov-switching GARCH (BNP-MSGARCH) models with clustering effects to deal with regime changes, temporal and cross-sectional clustering in large panels of financial data.

Within the BNP-MSGARCH framework, the number of regimes is detected using a new BNP procedure that extends the univariate approach of Otranto and Gallo (Citation2002) to panel data. Further, to capture cross-sectional clustering, this article proposes a hierarchical Pitman-Yor process prior for the MSGARCH parameters. The simulations show that our posterior approximation procedure is efficient and our inference is able to recover the true value of the parameters and the number of clusters and regimes.

An application to the S&P100 constituents is offered for forecasting and portfolio allocation purposes. There is clear-cut evidence of asset clustering with heterogenous composition across sectors and style features, which ensures a superior density-forecast performance and a better portfolio allocation for the BNP-MSGARCH model.

Supplementary Materials

Simulation results and further details on the empirical application are given in the online supplementary materials.

Supplemental material

Supplemental Material

Download PDF (2.6 MB)

Supplemental Material

Download Zip (116.8 KB)

Acknowledgments

The authors are grateful to the Editor and the Reviewers for their useful comments which significantly improved the quality of the article. Also, we would like to thank all the conference participants for helpful discussions at the “International Society for Bayesian Analysis World meeting,” 2022, Montreal, Canada, the “Annual Meeting of the Statistical Society of Canada,” 2022, Ottawa, Canada and the “Advances in Bayesian Analysis Workshop,” 2022, Venice, Italy. We benefited greatly from suggestions and discussions with Federico Bassetti, John M. Maheu, Mark J. Jensen and Herman K. van Dijk. This research used the SCSCF and HPC multiprocessor cluster systems at Ca’ Foscari University of Venice.

References

  • Ang, A., and Timmermann, A. (2012), “Regime Changes and Financial Markets,” Annual Review of Financial Economics, 4, 313–337. DOI: 10.1146/annurev-financial-110311-101808.
  • Arbel, J., Blasi, P. D., and Prünster, I. (2019), “Stochastic Approximations to the Pitman–Yor Process,” Bayesian Analysis, 14, 1201–1219. DOI: 10.1214/18-BA1127.
  • Arbel, J., Mengersen, K., and Rousseau, J. (2016), “Bayesian Nonparametric Dependent Model for Partially Replicated Data: The Influence of Fuel Spills on Species Diversity,” The Annals of Applied Statistics, 10, 1496–1516. DOI: 10.1214/16-AOAS944.
  • Ardia, D. (2008), Financial Risk Management with Bayesian Estimation of GARCH Models: Theory and Applications, volume 612 of Lecture Notes in Economics and Mathematical Systems, Berlin: Springer.
  • Ardia, D., Bluteau, K., Boudt, K., Catania, L., and Trottier, D.-A. (2019), “Markov-Switching GARCH Models in R: The MSGARCH Package,” Journal of Statistical Software, 91, 1–38. DOI: 10.18637/jss.v091.i04.
  • Barberis, N., and Shleifer, A. (2003), “Style Investing,” Journal of Financial Economics, 68, 161–199. DOI: 10.1016/S0304-405X(03)00064-3.
  • Barro, D., Canestrelli, E., and Consigli, G. (2019), “Volatility versus Downside Risk: Performance Protection in Dynamic Portfolio Strategies,” Computational Management Science, 16, 433–479. DOI: 10.1007/s10287-018-0310-4.
  • Bassetti, F., Casarin, R., and Leisen, F. (2014), “Beta-Product Dependent Pitman-Yor Processes for Bayesian Inference,” Journal of Econometrics, 180, 49–72. DOI: 10.1016/j.jeconom.2014.01.007.
  • Bassetti, F., Casarin, R., and Ravazzolo, F. (2018), “Bayesian Nonparametric Calibration and Combination of Predictive Distributions,” Journal of the American Statistical Association, 113, 675–685. DOI: 10.1080/01621459.2016.1273117.
  • Bauwens, L., Dufays, A., and Rombouts, J. (2014), “Marginal Likelihood for Markov-switching and Change-Point GARCH,” Journal of Econometrics, 178, 508–522. DOI: 10.1016/j.jeconom.2013.08.017.
  • Bauwens, L., and Otranto, E. (2016), “Modeling the Dependence of Conditional Correlations on Market Volatility,” Journal of Business and Economic Statistics, 34, 254–268. DOI: 10.1080/07350015.2015.1037882.
  • Bauwens, L., and Otranto, E. (2020), “Nonlinearities and Regimes in Conditional Correlations with Different Dynamics,” Journal of Econometrics, 217, 496–522.
  • Bauwens, L., Preminger, A., and Rombouts, J. (2010), “Theory and Inference for a Markov Switching GARCH Model,” Econometrics Journal, 13, 218–244. DOI: 10.1111/j.1368-423X.2009.00307.x.
  • Bauwens, L., and Rombouts, J. (2007), “Bayesian Clustering of Many GARCH Models,” Econometric Reviews, 26, 365–86. DOI: 10.1080/07474930701220576.
  • Bianchi, F. (2020), “The Great Depression and the Great Recession: A View from Financial Markets,” Journal of Monetary Economics, 114, 240–261. DOI: 10.1016/j.jmoneco.2019.03.010.
  • Billio, M., Casarin, R., and Osuntuyi, A. (2016), “Efficient Gibbs Sampling for Markov Switching GARCH Models,” Computational Statistics and Data Analysis, 100, 37–57. DOI: 10.1016/j.csda.2014.04.011.
  • Billio, M., Casarin, R., Ravazzolo, F., and Van Dijk, H. (2016), “Interactions between Eurozone and US Booms and Busts: A Bayesian Panel Markov-switching VAR Model,” Journal of Applied Econometrics, 31, 1352–1370. DOI: 10.1002/jae.2501.
  • Billio, M., Casarin, R., and Rossini, L. (2019), “Bayesian Nonparametric Sparse VAR Models,” Journal of Econometrics, 212, 97–115. DOI: 10.1016/j.jeconom.2019.04.022.
  • Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327. DOI: 10.1016/0304-4076(86)90063-1.
  • Brookfield, D., Su, C., and Bangassa, K. (2015), “Investment Style Positioning of UK Unit Trusts,” The European Journal of Finance, 21, 946–970. DOI: 10.1080/1351847X.2013.788533.
  • Canova, F., and Ciccarelli, M. (2004), “Forecasting and Turning Point Prediction in a Bayesian Panel VAR Model,” Journal of Econometrics, 120, 327–359. DOI: 10.1016/S0304-4076(03)00216-1.
  • Carter, C. K., and Kohn, R. (1994), “On Gibbs Sampling for State Space Models,” Biometrika, 81, 209–230. DOI: 10.1093/biomet/81.3.541.
  • Casarin, R., Foroni, C., Marcellino, M., and Ravazzolo, F. (2019), “Uncertainty Through the Lenses of a Mixed-Frequency Bayesian Panel Markov Switching Model,” Annals of Applied Statistics, 12, 2559–2586.
  • Celeux, G. (1998), “Bayesian Inference for Mixture: The Label Switching Problem,” in Compstat, eds. R. Payne and P. Green, pp. 227–232, Heidelberg: Physica.
  • Di Lucca, M., Guglielmi, A., Müller, P., and Quintana, F. (2013), “A Simple Class of Bayesian Nonparametric Autoregression Models,” Bayesian Analysis, 8, 63–88. DOI: 10.1214/13-BA803.
  • Do, K.-A., Müller, P., and Tang, F. (2005), “A Bayesian Mixture Model for Differential Gene Expression,” Journal of the Royal Statistical Society, Series C, 54, 627–644. DOI: 10.1111/j.1467-9876.2005.05593.x.
  • Dufays, A. (2016), “Infinite-state Markov-switching for Dynamic Volatility and Correlation Models,” Journal of Financial Econometrics, 14, 418–460. DOI: 10.1093/jjfinec/nbv017.
  • Engle, R. F. (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation,” Econometrica, 50, 987–1007. DOI: 10.2307/1912773.
  • Fisher, M., and Jensen, M. J. (2022), “Bayesian Nonparametric Learning of How Skill is Distributed Across the Mutual Fund Industry,” Journal of Econometrics, 230, 131–153. DOI: 10.1016/j.jeconom.2021.04.002.
  • Froot, K., and Teo, M. (2008), “Style Investing and Institutional Investors,” Journal of Financial and Quantitative Analysis, 43, 883–906. DOI: 10.1017/S0022109000014381.
  • Frühwirth-Schnatter, S. (1994), “Data Augmentation and Dynamic Linear Models,” Journal of Time Series Analysis, 15, 183–202. DOI: 10.1111/j.1467-9892.1994.tb00184.x.
  • Frühwirth-Schnatter, S.(2001), “Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models,” Journal of the American Statistical Association, 96, 194–209.
  • Frühwirth-Schnatter, S.(2006), Finite Mixture and Markov Switching Models, New York: Springer.
  • George, E. I., and McCulloch, R. E. (1993), “Variable Selection via Gibbs Sampling,” Journal of the American Statistical Association, 88, 881–889. DOI: 10.1080/01621459.1993.10476353.
  • Gray, S. F. (1996), “Modeling the Conditional Distribution of Interest Rates as a Regime-switching Process,” Journal of Financial Economics, 42, 27–62. DOI: 10.1016/0304-405X(96)00875-6.
  • Griffin, J., and Kalli, M. (2018), “Bayesian Nonparametric Vector Autoregressive Models,” Journal of Econometrics, 203, 267–282. DOI: 10.1016/j.jeconom.2017.11.009.
  • Griffin, J. E., and Steel, M. F. J. (2011), “Stick-Breaking Autoregressive Processes,” Journal of Econometrics, 162, 383–396. DOI: 10.1016/j.jeconom.2011.03.001.
  • Haas, M., Mittnik, S., and Paolella, M. (2004), “A New Approach to Markov Switching GARCH Models,” Journal of Financial Econometrics, 2, 493–530. DOI: 10.1093/jjfinec/nbh020.
  • Hatjispyros, S. J., Nicoleris, T. N., and Walker, S. G. (2011), “Dependent Mixtures of Dirichlet Processes,” Computational Statistics & Data Analysis, 55, 2011–2025. DOI: 10.1016/j.csda.2010.12.005.
  • He, Z., and Maheu, J. (2010), “Real Time Detection of Structural Breaks in GARCH Models,” Computational Statistics & Data Analysis, 54, 2628–2640. DOI: 10.1016/j.csda.2009.09.038.
  • Hirano, K. (2002), “Semiparametric Bayesian Inference in Autoregressive Panel Data Models,” Econometrica, 70, 781–799. DOI: 10.1111/1468-0262.00305.
  • Hjort, N. L., Homes, C., Müuller, P., and Walker, S. G. (2010), Bayesian Nonparametrics, Cambridge: Cambridge University Press.
  • Hlouskova, J., Schmidheiny, K., and Wagner, M. (2009), “Multistep Predictions for Multivariate GARCH Models: Closed form Solution and the Value for Portfolio Management,” Journal of Empirical Finance, 14, 330–336. DOI: 10.1016/j.jempfin.2008.09.002.
  • Jame, R., and Tong, Q. (2014), “Industry-based Style Investing,” Journal of Financial Markets, 19, 110–130. DOI: 10.1016/j.finmar.2013.08.004.
  • Jensen, J. M., and Maheu, M. J. (2010), “Bayesian Semiparametric Stochastic Volatility Modeling,” Journal of Econometrics, 157, 306–316. DOI: 10.1016/j.jeconom.2010.01.014.
  • Kalli, M., Griffin, J. E., and Walker, S. G. (2011), “Slice Sampling Mixture Models,” Statistics and Computing, 21, 93–105. DOI: 10.1007/s11222-009-9150-y.
  • Kaufmann, S. (2010), “Dating and Forecasting Turning Points by Bayesian Clustering with Dynamic Structure: A Suggestion with an Application to Austrian Data,” Journal of Applied Econometrics, 25, 309–344. DOI: 10.1002/jae.1076.
  • Kaufmann, S. (2015), “K-state Switching Models with Time-varying Transition Distributions: Does Loan Growth Signal Stronger Effects of Variables on Inflation?” Journal of Econometrics, 187, 82–94.
  • Klaassen, F. (2002), “Improving GARCH Volatility Forecasts with Regime Switching GARCH,” Empirical Economics, 27, 363–394. DOI: 10.1007/s001810100100.
  • Lo, A. Y. (1984), “On a Class of Bayesian Nonparametric Estimates: I. Density Estimates,” The Annals of Statistics, 12, 351–357. DOI: 10.1214/aos/1176346412.
  • MacLehose, R., and Dunson, D. (2010), “Bayesian Semiparametric Multiple Shrinkage,” Biometrics, 66, 455–462. DOI: 10.1111/j.1541-0420.2009.01275.x.
  • Meilâ, M. (2007), “Comparing Clusterings – An Information Based Distance,” Journal of Multivariate Analysis, 98, 873–895.
  • Müller, P., and Mitra, R. (2013), “Bayesian Nonparametric Inference – Why and How,” Bayesian Analysis, 8, 269–302. DOI: 10.1214/13-BA811.
  • Nakatsuma, T. (1998), “A Markov-chain Sampling Algorithm for GARCH Models,” Studies in Nonlinear Dynamics and Econometrics, 3, 107–117.
  • Nieto-Barajas, L. E., and Quintana, F. A. (2016), “A Bayesian Non-parametric Dynamic AR Model for Multiple Time Series Analysis,” Journal of Time Series Analysis, 37, 675–689. DOI: 10.1111/jtsa.12182.
  • Otranto, E., and Gallo, G. M. (2002), “A Nonparametric Bayesian Approach to Detect the Number of Regimes in Markov Switching Models,” Econometric Reviews, 21, 477–496. DOI: 10.1081/ETC-120015387.
  • Park, T., and Casella, G. (2008), “The Bayesian Lasso,” Journal of the American Statistical Association, 103, 681–686. DOI: 10.1198/016214508000000337.
  • Pitman, J., and Yor, M. (1997), “The Two Parameter Poisson-Dirichlet Distribution Derived from a Stable Subordinator,” Annals of Probability, 25, 855–900.
  • Sethuraman, J. (1994), “A Constructive Definition of Dirichlet Priors,” Statistica Sinica, 4, 639–650.
  • Taddy, M. A., and Kottas, A. (2009), “Markov Switching Dirichlet Process Mixture Regression,” Bayesian Analysis, 4, 793–816. DOI: 10.1214/09-BA430.
  • Virbickaite, A., Ausín, M. C., and Galeano, P. (2015), “Bayesian Inference Methods for Univariate and Multivariate GARCH Models: A Survey,” Journal of Economic Surveys, 29, 76–96. DOI: 10.1111/joes.12046.
  • Wade, S., and Ghahramani, Z. (2018), “Bayesian Cluster Analysis: Point Estimation and Credible Balls,” (with discussion), Bayesian Analysis, 13, 559–626. DOI: 10.1214/17-BA1073.
  • Walker, S. G. (2007), “Sampling the Dirichlet Mixture Model with Slices,” Communications in Statistics - Simulation and Computation, 36, 45–54. DOI: 10.1080/03610910601096262.
  • Wang, H. (2010), “Sparse Seemingly Unrelated Regression Modelling: Applications in Finance and Econometrics,” Computational Statistics & Data Analysis, 54, 2866–2877. DOI: 10.1016/j.csda.2010.03.028.
  • Wee, D. C., Chen, F., and Dunsmuir, W. T. (2022), “Likelihood Inference for Markov Switching GARCH(1,1) Models using Sequential Monte Carlo,” Econometrics and Statistics, 21, 50–68. DOI: 10.1016/j.ecosta.2020.03.004.
  • Xu, Y., Müller, P., Wahed, A. S., and Thall, P. F. (2016), “Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times,” Journal of the American Statistical Association, 111, 921–950. DOI: 10.1080/01621459.2015.1086353.
  • Zhang, L., Guindani, M., Versace, F., Engelmann, J. M., and Vannucci, M. (2016), “A Spatiotemporal Nonparametric Bayesian Model of Multi-subject fMRI Data,” The Annals of Applied Statistics, 10, 638–666. DOI: 10.1214/16-AOAS926.

Appendix A

Proof of the Results in Section 3

We introduce for h1 the set of parameters allocated to the hth mixture component in the regime k, Dhk={i=1,,N|Dik=h} and the set of the non-empty mixture components Dk*={h|Dhk}. The number of stick-breaking components needed for the finite mixture representation is Dk*=max{Dik,i=1,,N}. When sampling from the full conditional distribution of Θk* and Vk, only Nk* element are sampled, where Nk* is the smallest integer such that h=1Nk*Whk>1Uk*, where Uk*=min{Uik,i=1,,N}.

A.1 Full Conditional Distribution of V and U

Let us split Vk in three blocks: Vk*={Vlk:lDk*}, Vk**={VkDk*+1,,VkDk*+Nk*} and Vk***={Vlk:l>Nk*}. The samples are generated from a collapsed Gibbs step

  1. the full conditional of the elements in Vk* given Ξ,Θ,P,D,Θ*,Y (24) f(Vlk|)Be(1ν+i=1NI(Dik=l),ψ+νl+i=1NI(Dik>l))×lDk*(24)

  2. 2. the full conditional of the elements of Vk** and Vk*** given Ξ,Θ,P,D,V*,Θ*,Y, which coincides with the prior distributions Be(1ν,ψ+νl) for l>Dk*

  3. the full conditional of the elements of Uk given V and Ξ,Θ,P,D,Θ*,Y, f(Uik|)I(Uik<WDikk), which is uniform on the interval (0,WDik)

A.2 Full Conditional Distribution of P and R

We apply a collapsed-Gibbs step and sample rk, k=1,,K, given Ξ,Θp,D,V,Y, and pk=(p1,k,,pN,k), k=1,,K, is the collection of the kth row of the transition probability matrices given (r1,,rK) and Ξ,Θp,D,V,Y. As for the transition probabilities, from standard calculations in Markov-switching regressions we obtain (25) f(pi,k|)h=1Kpi,khϕrkh+ni,kh1(25) Dir(ϕrk1+ni,k1,,ϕrkK+ni,kK),where ni,kh=t=1Tξikt1ξiht. The marginal distribution is f(rk|)Δ[0,1]KNi=1Nh=1Kpi,khϕrkh+ni,kh1Γ(ϕ)Γ(ϕrkh)dpi,khπ(rk)h=1Krkhd1i=1NΓ(ϕrkh+ni,kh)Γ(ϕ+ni,k)Γ(ϕ)Γ(ϕrkh),

where ni,k=ni,k1++ni,kK, and Δ[0,1]K={(p1,,pK)RK|pk>0 k, p1++pK=1} is the K-dim standard simplex. From the properties of the gamma functions Γ(ϕrkh+ni,kh)=l=1ni,kh(ϕrkh+l1)Γ(ϕrkh),Γ(ϕ+ni,k)=l=1ni,k(ϕ+l1)Γ(ϕ),we have: f(rk|)Dir(1/K+mk1,,1/K+mkK)g(rk), where g(rk)=i=1Nh=1Kl=2ni,kh(ϕrkh+l1), mkh=Card(Mkh) and Mkh={i=1,,N|ni,kh>0}. Samples from this full conditional distribution are obtained by a Metropolis-Hastings (MH) algorithm with independent proposal distribution Dir(1/K+mk1,,1/K+mkK).

A.3 Full Conditional Distribution of Θ*

The full conditional distribution of θhk*=(μhk*,γhk*,αhk*,βhk*) can be sampled by simulating iteratively from the following conditional distributions. The full conditional of μhk* is f(μhk*|)N(μhk*|m*,s*)iDhkN(μik|μhk*,s)N(μhk*|m¯hk,s¯hk),where m¯hk=s¯hk2(m*s*2+iDhkμiks2) and s¯hk=(1s*2+Card(Dhk)s2)1/2. The full conditional distribution of γhk* is (26) f(γhk*|)I[0,a](γhk*)iDhkBe(γik/a|rγhk*/a,r(1γhk*/a)),(26) I[0,a](γhk*)iDhkexp{(rγhk*/a1)log(γik/a)+(r(1γhk*/a)1)log(1γik/a)}Γ(rγhk*/a)Γ(r(1γhk*/a)),exp{κhkγhk*}(1Γ(rγhk*/a)Γ(r(1γhk*/a)))Card(Dhk)I[0,a](γhk*), where κhk=raiDhklog((aγik)/γik) samples from this conditional distribution are obtained by a MH algorithm with independent proposal distribution (1exp{κhkγhk*})(1exp{aκhk})1I[0,a](γhk*). The full conditional of αhk* is

f(αhk*|)I[0,1](αhk*)iDhkexp{(rαhk*1)log(αik)+(r(1αhk*)1)log(1αik)}Γ(rαhk*)Γ(r(1αhk*)),exp{τhkαhk*}(1Γ(rαhk*)Γ(r(1αhk*))) Card(Dhk)I[0,1](αhk*), where τhk=riDhklog((1αik)/αik). Samples from this conditional distribution are obtained by MH with independent proposal distribution (1exp{τhkαhk*})(1exp{τhk})1I[0,1](αhk*). Similar argument is applied to derive the full conditional distributions of βhk*.

A.4 Full Conditional Distribution of Θ

The full conditional distribution of the elements of θik k=1,,K is discussed. Let μi=(μi1,,μiK), its full conditional distribution is (27) f(μi|)(t=1TN(yit|μi(sit),σit))k=1KN(μik|μ˜ik*,s),(27) which is not tractable due to the recursive form of σit2. Thus, we sample from the full conditional by MH with proposal distribution obtained through the approximation σit*2 of σit2. The joint full conditional of μi can be approximated by a normal distribution with mean mi=Si(mi1/si12,mi2/si22,,miK/siK2) and diagonal covariance matrix Si=diag(si12,si22,,siK2) where mik=sik2(μ˜ik*s2+tTy,ikyitσit*2) and sik2=(1s2+tTy,ik1σit*2)1, with Ty,ik={t=1,,T|sit=k} and σit*2=γi(sit)+αi(sit)(yt1μi(sit1))2+(βi(sit))σt1*2. The constructed mean and variance are used to define the parameters of the normal mixture proposal distribution for μi, that is f(μi|)=0.05N(μi|mi,Si)+0.95N(μi|μi(r1),Si). As regards the parameters of the volatility process, the full conditional is (28) f(γi,αi,βi|)k=1KBe(γik/a|rγ˜ik*/a,r(1γ˜ik*/a))(28) Be(αik|rα˜ik*,r(1α˜ik*))Be(βik|rβ˜ik*,r(1β˜ik*))t=1TN(yit|μi(sit),σit), where γi=(γi1,,γiK), αi=(αi1,,αiK) and βi=(βi1,,βiK). We follow the ARMA approximation of the MSGARCH process, that is (29) σit2=γi(sit)+αi(sit)ϵit12+βi(sit)σit12,(29) (30) ϵit2=γi(sit)+(αi(sit)+βi(sit))ϵit12βi(sit)(ϵit12σit12)+(ϵit2σit2).(30)

Let wit=ϵit2σit2=(ϵit2σit21)σit2=(χ2(1)1)σit2 with Et1[wit]=0 and vart1[wit]=2σit4. Subject to the above, we assume witwit*N(0,2σit4) (Nakatsuma Citation1998). The auxiliary ARMA model for the squared error term ϵit2 is ϵit2=γisit+(αi(sit)+βi(sit))ϵit12βi(sit)wit1*+wit* with wit*N(0,2σit4), which returns wit*=ϵit2γi(sit)αi(sit)ϵit12βi(sit)(ϵit12wit1*). Following Ardia (Citation2008) we further express wit* as a linear function of the (3K×1) vector of volatility parameters θiσ=(γi1,,γiK,αi1,,αiK,βi1,,βiK). To do this, we approximate the function wt* by the first order Taylor’s expansion about θiσ(r1)=(γi1(r1),,γiK(r1),αi1(r1),,αiK(r1),βi1(r1),,βiK(r1)) as (31) wit*wit**=wit*(θiσ(r1))+it(θiσθiσ(r1)),(31) where it=(it1,it2,,itK) with itk=(wit*γik,wit*αik,wit*βik). it satisfies the recurrence: it=utξit+it1(βiξit), where ut=(1,ϵit12,(ϵit12wit1*)), i0k=0 and ξit is a row vector.

Upon defining rit*=wit*(θiπ(r1))itθiσ(r1), it turns out that wit**=rit*+itθiσ. Furthermore, by defining μi=(μi1,μi2,,μiK),the T×1 vectors wi=(wi1**,,wiT**) and ri*=(ri1*,,riT*), and a T×3K matrix i=(i1,i2,,iT) as well as a T×T diagonal matrix Υi=2diag(σi1**4,σi2**4,,σiT**4) with σit**2=(ξitγi(r1))+(ξitαi(r1))(yt1ξt1μi(r))2+(ξitβi(r1))σit1**2, we have wi=ri*+iθiσ. Using this linear approximation, the full conditional distribution of θiσ approximates as (32) f(θiσ|ξi,1:T(r1),μi(r),y1:T)1|Υi|12exp(wiΥi1wi2)IΘ(θiσ)N3K(miσ,Siσ)IΘ(θiσ),(32) where Θ={γi1>0,,γiK>0,0<αi1<1,,0<αiK<1,0<βi1<1,0<βiK<1} and Siσ=(iΥi1i)1, miσ=SiσiΥi1ri*. The mean and variance defined above are used to characterize proposal distribution for θiσ, that is a mixture of truncated normal distributions. In our MCMC exercise, we sample θiσ from the normal mixture f(θiσ|)=0.05N(θiσ|miσ,Siσ)+0.95N(θiσ|θiσ(r1),Siσ) and check that each sample satisfies the constraints.

A.5 Full Conditional Distribution of Ξ

The full joint conditional distribution of the state variables, ξi,1:T=(ξi1,,ξiT) with ξit=(ξi1,t,,ξiK,t), given the parameter values and return series (33) p(ξi,1:T|)t=1Tf(yit|sit,θisit,)k=1Kl=1kpi,lkξilt1ξikt,(33) is a nonstandard distribution. For this reason, following Billio, Casarin, and Osuntuyi (Citation2016), we propose a MH algorithm with proposal distribution given by an approximation of the smoothed probability p(ξi,1:T|). The algorithm involves running a Forward Filtering Backward Sampling (FFBS) on an auxiliary model to generate proposals at each iteration step (Carter and Kohn Citation1994; Frühwirth-Schnatter Citation1994). Among several alternative collapsing procedures (see Billio, Casarin, and Osuntuyi Citation2016), we adopt the MSGARCH model by Klaassen (Citation2002) as an auxiliary model as it accounts for the highest amount of information in its construction. We denote the proposal distribution by (34) q(ξi,1:T|θi,yi,1:T)=q(ξiT|θi,yi,1:T)t=1T1q(ξit|ξit+1,θi,yi,1:t),(34)

where q(ξit|ξit+1,θi,yi,1:t)=q(ξit|yi,1:t,θi)q(ξit+1|ξit,θi)/q(ξit+1| yi,1:t,θi), and q(ξit|yi,1:t,θi) is the filtered probability. At time t, given θi and yi,1:t, the predicted and filtered distributions are (35) q(ξit|θi,yi,1:t1)=k=1K(l=1Kpi,lkξil,t)q(ξit1=ek|θi,yi,1:t1),(35) (36) q(ξit|θi,yi,1:t)=g(yit|ξit,θi,yi,1:t1)q(ξit|θi,yi,1:t1)/c,(36) where c=k=1Kg(yit|ξit=ek,θi,yi,1:t1)q(ξit=ek|θi,yi,1:t1), ek is the k th row of the identity matrix IK. The conditional density of the unit i under the auxiliary model is (37) g(yit|ξit,θi,yi,1:t1)τ=1t1hiτexp((yiτμi(siτ))22hiτ2),(37) where hit2=γi(sit)+αi(sit)ϵ(y)it12+βi(sit)σ(y)i,kt12 with ϵ(y)it1=yit1μ(y)i,kt1, μ(y)ik,t1=E[μi(sit)|yi,1:t1,ξit=ek], and σ(y)i,kt12=E[σit12(yi,1:t2,ξit1,ξit2)|yi,1:t1,ξit=ek].

Using the output of the forward filtering, we compute q(ξiT|θi,yi,1:T) and q(ξit|ξit+1,θi,yi,1:t)l=1K(k=1Kpi,lkξi1,t)ξil,t+1q(ξit|θi,yi,1:t), t=T1,T2,,2,1. Then, at each time, step we sample ξT from q(ξT|θi,yi,1:T) and ξit from q(ξit|ξit+1,θi,yi,1:t) iteratively for t=T1,T2,,2,1. This is the backward sampling step. Samples from q(ξit|ξit+1,θi,yi,1:t) can be obtained by multinomial sampling.

A.6 Full Conditional Distribution of D

The full conditional of Dik is P(Dik=h|)=ch/c for hAki, with ch=N(μik|μhk*,s) Be(αik|rαhk*,r(1αhk*)) Be(βik|rβhk*,r(1βhk*))Be(γik/a|rγhk*/a,r(1γhk*/a))/a where c=hAkich is the normalizing constant and a a real positive constant.

A.7 Detection of the Number of Regimes

We present the panel version of the univariate approach by Otranto and Gallo (Citation2002). We assume yt| (μ˜t, Λ˜t1) ind NN( μ˜t, Λ˜t1), t=1, 2,…, T, with (μ˜t, Λ˜t1) |GiidG, and G PYP(υ0, ϕ0,G0). The BNP model allows for the infinite mixture representation yt| G k=1Wk φ(yt| μk, Λk1), where φ(y|μ,Σ) denotes the density of a multivariate normal with location and scale parameters, μ and Σ, respectively. We assume the base measure G0 is a product of N independent hierarchical prior distributions G0(μ,Λ)= Ga(τ;c/2,d/2) j=1N N(μj;mj,τλj1) Ga(λj;a/2,b/2). Instead of using the sampler in Otranto and Gallo (Citation2002), we apply a more efficient sampler which relies on the stick-breaking representation (Walker Citation2007). The sampler requires the use of the allocation, DtiidP(Dt=k)=Wk t=1,,T, stick-breaking VkindBe(1υ0,ϕ0+kυ0) k=1,2,, and slice UtU(0,WDt) variables. The allocation variable has the interpretation of regime allocation as in Otranto and Gallo (Citation2002). The full conditionals of μk, Λk, and τ are (38) f(μkj|)N((tDkyjt+mjτ)/((Tk+τ)),1/(λkj(Tk+τ))),(38) (39) f(λkj|)Ga((Tk+a)/2,((Tk1)sjk2+(Tkτ(mjy¯jk)2)/    (Tk+τ)+b)/2),(39) (40) f(τ|)Ga((N+c)/2,(d+j=1N(μkjmj)2λkj)/2),(40)

j=1,,N, where Dk={t=1,,T|Dt=k}, Tk=Card(Dk) sjk2=tDk(yjty¯jk)2/(Tk1), y¯jk=tDkyjt/Tk. Let D*=max{Dt,t=1,,T} be the maximum value of the allocation variables, and Dt* the smallest integer such that k=1Dt*Wk>1Ut. Since Dt*< a.s., the infinite mixture reduces to a finite mixture. The full conditionals of the steak-breaking components, the slice variables and the allocation variables are (41) f(Vl|)Be(1υ0+t=1TI(Dt=l),ϕ0+υ0l(41) (42) +t=1TI(Dik>l)), lD*,f(Ut|)U(Ut<WDt),(42) (43) f(Dt=k|)φ(yt|μk,Λk1), kDt*.(43)