
Intrinsic Bayesian estimation of linear time series models

Shawn Ni & Dongchu Sun
Pages 275-287 | Received 10 May 2019, Accepted 15 Mar 2020, Published online: 02 Apr 2020

Abstract

Intrinsic loss functions (such as the Kullback–Leibler divergence, i.e. the entropy loss) have been used extensively in place of conventional loss functions for independent samples, but applications to serially correlated samples are scant. In the present study, we examine the Bayes estimator of the Linear Time Series (LTS) model under the entropy loss. We derive the Bayes estimator and show that it involves a frequentist expectation of the regressors. We propose a Markov chain Monte Carlo procedure that jointly simulates the posteriors of the LTS parameters and the frequentist expectation of the regressors. We conduct Bayesian estimation of an LTS model for seasonal effects in some U.S. macroeconomic variables.

AMS 1991 Subject Classifications:

1. Introduction

To analyse the dynamics of multivariate economic systems, researchers frequently employ Linear Time Series (LTS) models (see, for example, Sims, Citation1980 and the ensuing literature). Bayesian inference for such models often requires point estimates of the parameters because reporting the entire posterior distribution is made difficult by a prohibitively large number of parameters. A critical aspect of Bayesian estimation is the choice of loss function.

In this study, we derive the Bayes estimator of LTS models based on the intrinsic loss. We illustrate a computational problem that arises from serial correlation in these models when the intrinsic loss is applied, and we present our solution to the problem.

A loss function $L(\theta,\hat\theta)$ measures the distance between the parameter $\theta$ and its estimate $\hat\theta$. Such a metric is often specified for convenience given the problem at hand rather than grounded in a general principle. Bernardo and Juárez (Citation2003) noted that for inferential purposes what matters most is not the distance between $\theta$ and $\hat\theta$; instead it is the intrinsic loss, the distance between the probability model $f(x\mid\hat\theta)$ (corresponding to the estimate $\hat\theta$) and $f(x\mid\theta)$ (corresponding to the actual parameter $\theta$). Robert (Citation1994, Citation1996) proposed using the logarithmic divergence (also known as the Kullback–Leibler divergence or the entropy loss) as the intrinsic loss. The intrinsic loss has a number of desirable properties not generally possessed by conventional loss functions. For example, it is invariant to transformations of the data $x$ or the parameter $\theta$, and it has the additive property that the loss for two independent data sets is the sum of the losses for each data set.

The intrinsic loss has been used for Bayesian estimation with independent samples. It has also been used in various contexts for time series data. For instance, Kitamura and Stutzer (Citation1997) used the Kullback–Leibler distance to derive a frequentist estimator for nonlinear models. Solo et al. (Citation2001) used the Kullback–Leibler distance for evaluation of signal processing models. Robertson et al. (Citation2005) used the entropy divergence for evaluation of forecasting densities. Fernandez-Villaverde and Rubio-Ramirez (Citation2004) used the Kullback–Leibler distance to evaluate dynamic equilibrium models in economics. However, employing the intrinsic loss for Bayesian estimation of time series models leads to technical challenges.

To illustrate the difference between the intrinsic loss in independent models and in serially correlated models, consider the following examples. First, suppose $y=\{y_1,\dots,y_T\}$, where $y_t$ ($t=1,\dots,T$) are independently identically distributed (iid) $N(\rho,1)$ and we are interested in estimating the mean parameter $\rho$ under the entropy loss $\kappa(\hat\rho\mid\rho)=\int \log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}\,f(y\mid\rho)\,dy=E_{y\mid\rho}\log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}$. By the assumption on the model, $f(y\mid\rho)\propto\exp\{-\frac{1}{2}\sum_{t=1}^T(y_t-\rho)^2\}$. It is easy to verify that $\kappa(\hat\rho\mid\rho)=(T/2)(\rho-\hat\rho)^2$. In this case, the intrinsic loss coincides with the commonly used quadratic loss, which implies that the Bayes estimator of $\rho$ is the posterior mean. Now consider an AR(1) model: $y_t=\rho y_{t-1}+\epsilon_t$, for $t=1,\dots,T$, where $\epsilon_t$ is iid $N(0,1)$ and $\rho$ is the only unknown parameter.

The entropy loss is still $\kappa(\hat\rho\mid\rho)=\int\log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}\,f(y\mid\rho)\,dy=E_{y\mid\rho}\log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}$, but now $f$ is the density of the AR variable. Substituting in the distribution of the data gives $\kappa(\hat\rho\mid\rho)=\frac{(\rho-\hat\rho)^2}{2}E_{y\mid\rho}\sum_{t=1}^T y_{t-1}^2=(\rho-\hat\rho)^2\,\delta(y_0,\rho)$, where $\delta(y_0,\rho)=\frac{1}{2}E_{y\mid\rho}\sum_{t=1}^T y_{t-1}^2=\frac{1}{2}\sum_{t=1}^T\left\{\rho^{2(t-1)}y_0^2+1+\rho^2+\cdots+\rho^{2(t-2)}\right\}=\frac{1}{2}\left[\frac{1-\rho^{2T}}{1-\rho^2}y_0^2+\frac{1}{1-\rho^2}\left(T-\frac{1-\rho^{2T}}{1-\rho^2}\right)\right]$. It is obvious that $\delta(y_0,\rho)$ is an increasing function of $\rho^2$ and is nonnegative for any $\rho$. A Bayes estimator (which is called a generalised Bayes estimator if the prior is improper) minimises the Bayesian posterior expected loss. If the entropy loss is employed, the Bayes estimator for $\rho$ with a given initial condition $y_0$ is $\hat\rho_E=\arg\min_{\hat\rho} E_{\rho\mid y}[\delta(y_0,\rho)(\rho-\hat\rho)^2]=E_{\rho\mid y}\{\delta(y_0,\rho)\rho\}/E_{\rho\mid y}\{\delta(y_0,\rho)\}$. Note that if $\rho$ is positive, $\rho$ and $\delta(y_0,\rho)$ are positively correlated. It follows that the Bayes estimator under the entropy loss for a positive $\rho$ is larger than the posterior mean. It is well known that the MLE $\hat\rho_M$ (and the posterior mean under the constant prior) is biased downward, especially when the true parameter is close to unity (see MacKinnon & Smith, Citation1998). Note that under the constant prior, $\sum_{t=1}^T y_{t-1}^2$ is the posterior precision (i.e. the inverse of the posterior variance) for $\rho$. Hence the weight on the squared estimation error in the intrinsic loss function, $\delta(y_0,\rho)=\frac12 E_{y\mid\rho}\sum_{t=1}^T y_{t-1}^2$, is larger in the region of $\rho$ where the posterior precision is high. It is in the spirit of Zellner's (Citation1978, Citation1998) 'precision of estimation' loss. This is in contrast to the quadratic loss, which imposes the same weight on all regions of $\rho$.
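As a concrete illustration of this weighted-posterior-mean form, the following sketch computes $\hat\rho_E$ from a set of posterior draws of $\rho$. It is a minimal example, assuming the draws are already available from some sampler; the function names and the toy draws are purely illustrative.

```python
import numpy as np

def delta(y0, rho, T):
    """delta(y0, rho) = 0.5 * E[sum_{t=1}^T y_{t-1}^2 | rho] for the AR(1) model."""
    r2 = rho ** 2
    if np.isclose(r2, 1.0):
        # limit as rho^2 -> 1: sum_t E[y_{t-1}^2] = T*y0^2 + T*(T-1)/2
        return 0.5 * (T * y0 ** 2 + T * (T - 1) / 2.0)
    g = (1.0 - r2 ** T) / (1.0 - r2)        # sum_{t=1}^T rho^{2(t-1)}
    return 0.5 * (g * y0 ** 2 + (T - g) / (1.0 - r2))

def intrinsic_bayes_ar1(rho_draws, y0, T):
    """Entropy-loss Bayes estimator: the delta-weighted posterior mean of rho."""
    w = np.array([delta(y0, r, T) for r in rho_draws])
    return np.sum(w * rho_draws) / np.sum(w)

# toy usage with hypothetical posterior draws of rho
rho_draws = np.random.default_rng(0).normal(0.9, 0.05, size=5000)
print(np.mean(rho_draws), intrinsic_bayes_ar1(rho_draws, y0=1.0, T=100))
```

With positive draws of $\rho$, the weighted mean exceeds the plain posterior mean, illustrating the upward adjustment discussed above.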

Now we turn to the model of interest. The LTS for a $p$-dimensional endogenous variable $y_t$ and a $q$-dimensional exogenous (predetermined) variable $x_{0t}$ ($t=1,\dots,T$) has the form
(1) $y_t=x_{0t}B_0+\sum_{j=1}^L y_{t-j}B_j+\epsilon_t,$
where $L$ is a known positive integer, $B_0$ is a $q\times p$ unknown matrix, $B_j$ is an unknown $p\times p$ matrix, $\epsilon_1,\dots,\epsilon_T$ are iid $N_p(0,\Sigma)$ errors, and $\Sigma$ is an unknown $p\times p$ positive definite matrix.

A special case of the above LTS model arises when all of the lag coefficients $B_1,\dots,B_L$ are zero (i.e. all regressors, with $q>p$, are exogenous variables). The exogenous variables may be functions of time. For example, in modelling climate temperature or holiday consumer spending, seasonal dummies may be introduced into the model. In economic applications, these exogenous variables may also be variables representing government policies. Another special case arises when $x_{0t}$ is a $1\times p$ constant vector with elements of unity and the regressors otherwise include only lags of the variable $y_t$. The LTS model then becomes a Vector AutoRegression (VAR), which is commonly used for modelling macroeconomic time series.

We can rewrite Equation (Equation1) in the familiar matrix form
(2) $Y=X\Phi+\epsilon,$
where
$X=\begin{pmatrix}x_1\\ \vdots\\ x_T\end{pmatrix}=(X_0,X_1);\quad X_0=\begin{pmatrix}x_{01}\\ \vdots\\ x_{0T}\end{pmatrix},\quad X_1=\begin{pmatrix}y_0&\cdots&y_{1-L}\\ \vdots& &\vdots\\ y_{T-1}&\cdots&y_{T-L}\end{pmatrix};\quad \Phi=\begin{pmatrix}B_0\\ \Phi_1\end{pmatrix}=\begin{pmatrix}B_0\\ B_1\\ \vdots\\ B_L\end{pmatrix};\quad Y=\begin{pmatrix}y_1\\ \vdots\\ y_T\end{pmatrix},\quad \epsilon=\begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_T\end{pmatrix}.$
Here $X_0$ and $X_1$ are $T\times q$ and $T\times Lp$; the former does not depend on the parameters $\Sigma$ and $\Phi$, but the latter does. $Y$ and $\epsilon$ are $T\times p$ matrices, $\Phi$ is a $(q+Lp)\times p$ matrix of unknown parameters, $x_t$ is a $1\times(q+Lp)$ row vector, and $X$ is a $T\times(q+Lp)$ matrix of observations. The likelihood function of $(\Phi,\Sigma)$ based on $Y$ is then
(3) $f(Y\mid\Phi,\Sigma)\propto\frac{1}{|\Sigma|^{T/2}}\exp\left\{-\frac{1}{2}\sum_{t=1}^T(y_t-x_t\Phi)\Sigma^{-1}(y_t-x_t\Phi)'\right\}=\frac{1}{|\Sigma|^{T/2}}\,\mathrm{etr}\left\{-\frac{1}{2}\Sigma^{-1}(Y-X\Phi)'(Y-X\Phi)\right\}.$
Here and hereafter $\mathrm{etr}(A)$ denotes $\exp(\mathrm{tr}(A))$ for a matrix $A$.
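For concreteness, here is a minimal sketch of how the stacked matrices in (2) and the log of the likelihood (3) can be formed numerically. It is only an illustration under the stated conventions (rows of $Y$ are $y_1,\dots,y_T$; pre-sample values occupy the first $L$ rows of the input array); all names are illustrative.

```python
import numpy as np

def build_design(y, x0, L):
    """Stack the LTS data into Y (T x p) and X = (X0, X1) (T x (q + L*p)).

    y  : (T + L) x p array; the first L rows are the pre-sample values y_{1-L},...,y_0.
    x0 : T x q array of exogenous regressors.
    """
    T = x0.shape[0]
    Y = y[L:L + T]                                        # y_1, ..., y_T
    lags = [y[L - j:L - j + T] for j in range(1, L + 1)]  # y_{t-1}, ..., y_{t-L}
    X = np.hstack([x0] + lags)
    return Y, X

def lts_loglik(Y, X, Phi, Sigma):
    """Log-likelihood (3) up to an additive constant."""
    T = Y.shape[0]
    E = Y - X @ Phi
    Sinv = np.linalg.inv(Sigma)
    return -0.5 * T * np.linalg.slogdet(Sigma)[1] - 0.5 * np.trace(E @ Sinv @ E.T)
```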

The present paper achieves two goals. The first is the derivation of the Bayes estimator of the LTS model under the entropy loss. We show that the entropy loss on $(\Phi,\Sigma)$ can be written as the sum of a loss pertaining to the covariance matrix $\Sigma$ and a loss on the normalised estimation error of $\Phi$, and that it is non-separable in $\Sigma$ and $\Phi$. The $\Phi$-part of the entropy loss for the LTS takes the form $\mathrm{tr}\{\hat\Sigma^{-1}(\Phi-\hat\Phi)'E_{(X\mid\Phi,\Sigma)}(X'X)(\Phi-\hat\Phi)\}$, where $\hat\Sigma$ is the Bayes estimator of $\Sigma$. Under the entropy loss, the Bayes estimator distinctly differs from the posterior mean and differs from that of the iid multivariate normal model. The part of the intrinsic loss function associated with the regression coefficients turns out to be related to a conventional loss function. For estimation of a matrix parameter such as $\Phi$ in the simultaneous equations context, Zellner (Citation1978, Citation1998) proposed a 'precision of estimation' loss that can also be written as $\mathrm{tr}\{\Sigma^{-1}(\Phi-\hat\Phi)'(X'X)(\Phi-\hat\Phi)\}$. However, in Zellner's simultaneous equations model $X'X$ is taken as given, whereas in the LTS model $X'X$ involves predetermined variables and its expectation depends on the parameters $(\Phi,\Sigma)$.

The second goal concerns numerical computation of the intrinsic Bayes estimator via Markov chain Monte Carlo (MCMC). We propose a general algorithm that generates regressors as latent parameters in the simulation of the posteriors of the LTS parameters. Data augmentation in this study differs from that in Tanner and Wong's (Citation1987) seminal paper in both motivation and implementation. Tanner and Wong use data augmentation to alter the likelihood function for easier MCMC simulation of the posteriors. In this study, the likelihood function of the generated data is the same as the likelihood of the sample data. Here, data augmentation does not make posterior simulation easier. Instead, it makes it possible to compute the frequentist moment $E_{(X\mid\Phi,\Sigma)}(X'X)$ of the LTS variables. The frequentist moment, simulated jointly with the parameters, is used to produce Bayes estimates under the entropy loss.

Besides the choice of loss function, the choice of prior also plays a pivotal role in Bayesian estimation. The Jeffreys prior on $\Sigma$ (see Zellner, Citation1971) is a noninformative prior that gives rise to conditional posteriors in well-known distributions. Ni et al. (Citation2007) conducted Bayesian estimation of the VAR model under the entropy loss, using the Jeffreys prior for $\Sigma$. However, despite its popularity the Jeffreys prior is known to produce unsatisfactory results in multi-parameter settings. In this study we simulate the LTS model under a combination of a normal prior on the regression parameters and the Yang and Berger (Citation1994) reference prior on $\Sigma$. The conditional posterior of $\Sigma$ is simulated using a Metropolis-Hastings algorithm. Our empirical application shows that despite the fact that LTS models involve a large number of parameters and a large number of latent variables, the data-augmentation algorithm is quite efficient.

In Section 2 of the paper, we derive the Bayes estimator of LTS models under the entropy loss function and discuss computation of the weighting matrix in the Bayes estimator. In Section 3, we present a general algorithm using generated data as latent parameters. In Section 4, we lay out the MCMC algorithm for computing $(\Phi,\Sigma)$ in the LTS model. In Section 5, we first compare the intrinsic Bayes estimator with other estimators in a numerical example and then estimate an LTS model using seasonally unadjusted macroeconomic data. In Section 6 we offer concluding remarks.

2. Entropy loss function for the iid multivariate and LTS models

2.1. Entropy loss function for the iid model

We first consider the entropy loss function (Robert, Citation1994, p. 74) for a multivariate normal distribution. Let $Y=(x_1,x_2,\dots,x_T)$ be a random sample from $N_p(\mu,\Sigma)$. One can compute the entropy loss function as
$L(\tilde\mu,\tilde\Sigma;\mu,\Sigma)=\int\log\frac{p(x\mid\mu,\Sigma)}{p(x\mid\tilde\mu,\tilde\Sigma)}\,p(x\mid\mu,\Sigma)\,dx=\frac{T}{2}\left[\mathrm{tr}(\Sigma\tilde\Sigma^{-1})-\log|\Sigma\tilde\Sigma^{-1}|-p+(\tilde\mu-\mu)'\tilde\Sigma^{-1}(\tilde\mu-\mu)\right].$
Here $p(x\mid\mu,\Sigma)$ is the density of $N_p(\mu,\Sigma)$. Clearly, the loss function has two parts. One part is related to the means $\tilde\mu$ and $\mu$ (with $\tilde\Sigma$ as the weighting matrix), and the other part is related to $\tilde\Sigma$ and $\Sigma$. The following fact states that the Bayes estimator of $\mu$ is the posterior mean of $\mu$ but that of $\Sigma$ is larger than the posterior mean of $\Sigma$.

Fact 2.1

Under the entropy loss $L$, the generalised Bayes estimator of $(\mu,\Sigma)$ is $\hat\mu_{iid}=E(\mu\mid Y)$ and $\hat\Sigma_{iid}=E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)$.

Note that $Y$ represents the data; the expectation $E(\cdot)$ and variance $\mathrm{var}(\cdot)$ are taken with respect to the posterior distribution. The proof is in the appendix.
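As an illustration of Fact 2.1, the following minimal sketch turns posterior draws of $(\mu,\Sigma)$ into the two estimates; the draws themselves are assumed to come from some existing sampler, and the function name is illustrative.

```python
import numpy as np

def iid_intrinsic_estimates(mu_draws, Sigma_draws):
    """Fact 2.1: mu_hat = E(mu | Y); Sigma_hat = E(Sigma | Y) + var(mu | Y).

    mu_draws    : M x p array of posterior draws of mu.
    Sigma_draws : M x p x p array of posterior draws of Sigma.
    """
    mu_hat = mu_draws.mean(axis=0)
    Sigma_hat = Sigma_draws.mean(axis=0) + np.cov(mu_draws, rowvar=False)
    return mu_hat, Sigma_hat
```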

2.2. Entropy loss functions for LTS models

Recall that for the LTS model (Equation2), the likelihood function of $(\Phi,\Sigma)$ is of the form (Equation3). The entropy loss for the LTS model is
(4) $L(\tilde\Phi,\tilde\Sigma;\Phi,\Sigma)=E_{(Y\mid\Phi,\Sigma)}\log\frac{f(Y\mid\Phi,\Sigma)}{f(Y\mid\tilde\Phi,\tilde\Sigma)},$
where, for computing the expectation on the right-hand side, $(\tilde\Phi,\tilde\Sigma)$ is not a function of $Y$. The entropy loss $L(\tilde\Phi,\tilde\Sigma;\Phi,\Sigma)$ can be decomposed into two parts. One part measures the loss associated with the covariance matrix only, while the second part measures the loss on the coefficients $\Phi$ but is related to the covariance matrix $\Sigma$ as well. Because $y_t-x_t\Phi$ ($t=1,\dots,T$) are iid $N_p(0,\Sigma)$, we have
$E_{(X\mid\Phi,\Sigma)}(Y-X\Phi)=0,\quad E_{(X\mid\Phi,\Sigma)}\{(Y-X\Phi)'X\Phi\}=0,\quad E_{(X\mid\Phi,\Sigma)}\{(Y-X\Phi)'(Y-X\Phi)\}=T\Sigma.$
Then
$E_{(X\mid\Phi,\Sigma)}\left\{\log\frac{|\Sigma|^{-T/2}\,\mathrm{etr}\{-\frac12\Sigma^{-1}(Y-X\Phi)'(Y-X\Phi)\}}{|\hat\Sigma|^{-T/2}\,\mathrm{etr}\{-\frac12\hat\Sigma^{-1}(Y-X\hat\Phi)'(Y-X\hat\Phi)\}}\right\}$
$=\frac{T}{2}\left(\log|\hat\Sigma\Sigma^{-1}|-p\right)+\frac12\,\mathrm{tr}\,E_{(X\mid\Phi,\Sigma)}\left\{(Y-X\hat\Phi)\hat\Sigma^{-1}(Y-X\hat\Phi)'\right\}$
$=\frac{T}{2}\left(\log|\hat\Sigma\Sigma^{-1}|-p\right)+\frac{T}{2}\mathrm{tr}(\hat\Sigma^{-1}\Sigma)+\frac12\,\mathrm{tr}\,E_{(X\mid\Phi,\Sigma)}\left\{X(\Phi-\hat\Phi)\hat\Sigma^{-1}(\Phi-\hat\Phi)'X'\right\}$
$=\frac{T}{2}\left\{\mathrm{tr}(\hat\Sigma^{-1}\Sigma)-\log|\hat\Sigma^{-1}\Sigma|-p\right\}+\frac12\,\mathrm{tr}\left\{\hat\Sigma^{-1}(\Phi-\hat\Phi)'E_{(X\mid\Phi,\Sigma)}(X'X)(\Phi-\hat\Phi)\right\}.$
The result of this derivation is summarised in the following lemma.

Lemma 2.1

The entropy loss function for the LTS model is
(5) $L(\tilde\Phi,\tilde\Sigma;\Phi,\Sigma)=\frac{T}{2}\left\{\mathrm{tr}(\tilde\Sigma^{-1}\Sigma)-\log|\tilde\Sigma^{-1}\Sigma|-p\right\}+\frac{T}{2}\,\mathrm{tr}\left\{\tilde\Sigma^{-1}(\Phi-\tilde\Phi)'W(\Phi-\tilde\Phi)\right\},$
where
(6) $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X).$

This lemma can be proved using algebra similar to that in the iid case. However, there is an important difference. For the LTS model, the Bayes estimators involve the matrix $W$, a frequentist expectation of $X'X$ for given parameters $(\Phi,\Sigma)$. For the iid case, no such term is present. The next theorem gives the form of the Bayes estimators under the entropy loss.

Theorem 2.1

The generalised Bayes estimator of $(\Phi,\Sigma)$ under the entropy loss is
(7) $\hat\Phi_E=\{E(W\mid Y)\}^{-1}E(W\Phi\mid Y),$

(8) $\hat\Sigma_E=E(\Sigma\mid Y)+E\{(\Phi-\hat\Phi_E)'W(\Phi-\hat\Phi_E)\mid Y\}.$

The above theorem can be proved in the same way as Fact 2.1.

In the special case with no lag coefficients in the regression, we have $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X)=\frac{1}{T}X_0'X_0$, which is not a function of $\Sigma$ and $\Phi$. It follows that the Bayes estimator $\hat\Phi_E$ is the posterior mean, as it is for the iid model. This observation is stated in the following remark.

Remark 1

If $B_j=0$ for $j>0$, then $\hat\Phi_E=E(\Phi\mid Y)$ and $\hat\Sigma_E=E(\Sigma\mid Y)+\frac{1}{T}E\{(\Phi-\hat\Phi_E)'X_0'X_0(\Phi-\hat\Phi_E)\mid Y\}$.

However, the Bayes estimator for the LTS model is generally different from that of the iid case. The Bayes estimator $\hat\Phi_E$ for the LTS model is not the posterior mean. To compare the estimator $\hat\Phi_E$ with the posterior mean, note that in general $\hat\Phi_E=E(\Phi\mid Y)+\{E(W\mid Y)\}^{-1}\mathrm{Cov}(W,\Phi\mid Y)$. Because $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X)$ and $\Phi$ are likely to be positively correlated, the Bayes estimator of $\Phi$ under the intrinsic loss is likely to be larger than the posterior mean. It is known that the MLE and the posterior mean of $\Phi$ under a diffuse prior are likely to have a downward bias when the true parameters are close to a random walk, a typical pattern of macroeconomic data. The form of the Bayes estimator of $\Phi$ based on the intrinsic loss is helpful in correcting the bias in the posterior mean.

The estimator in the LTS model involves the frequentist expectation $W$. The $W$ matrix depends on the specification of the regressors $X$. If the regressors include lags of $Y$, computation of the $W$ matrix becomes nontrivial.

Using the notation in Equation (Equation1), the frequentist expectation matrix can be written as
$E_{(X\mid\Phi,\Sigma)}(X'X)=\begin{pmatrix}X_0'X_0 & E_{(X\mid\Phi,\Sigma)}(X_0'X_1)\\ E_{(X\mid\Phi,\Sigma)}(X_1'X_0) & E_{(X\mid\Phi,\Sigma)}(X_1'X_1)\end{pmatrix}.$

For the block $X_0'X_0$, which involves only exogenous variables, there is no need to derive a closed-form expression as a function of the parameters $\Phi$ and $\Sigma$. On the other hand, due to the serial correlation of $y_t$, computation of $E_{(X\mid\Phi,\Sigma)}(X_1'X_1)$ is not straightforward, and in the presence of exogenous variables no analytical expression for $W$ is available. In the following, we discuss approaches to Bayesian estimation under the entropy loss for the general LTS model.

3. Approaches to computing the expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$

Theorem 2.1 shows that under the entropy loss the Bayes estimator of $(\Phi,\Sigma)$ involves the frequentist expectation $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X)$, and we need to compute the posterior moments $E(W\mid Y)$, $E(W\Phi\mid Y)$, and $E(\Phi'W\Phi\mid Y)$.

The frequentist expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$ depends on $\Phi$ and $\Sigma$. For the LTS model, $E_{(X\mid\Phi,\Sigma)}(X'X)$ does not have an analytical form and needs to be computed numerically for a given $(\Phi,\Sigma)$. We use $Y$ and $X$ to denote the observed data in the LTS model (Equation2). To compute $E_{(X\mid\Phi,\Sigma)}(X'X)$, we generate data $Y^*$ and $X^*$ from the same model (Equation2) for given parameters $\Phi$ and $\Sigma$. There is only one observed data set $(Y,X)$, but there are many sets of generated data $(Y^*,X^*)$. Since $\Phi$ and $\Sigma$ need to be simulated by an MCMC algorithm, $Y^*$ and $X^*$ need to be generated for each draw of $\Phi$ and $\Sigma$.

One approach to computing $E_{(X\mid\Phi,\Sigma)}(X'X)$ is straightforward but time-consuming: for each $\Phi_k$ and $\Sigma_k$ drawn in the $k$th MCMC cycle, we generate many sets of $X^*$ and use the average of $X^{*\prime}X^*$ to approximate $E_{(X\mid\Phi,\Sigma)}(X'X)$. While this approach is possible in theory, its high computational cost renders it infeasible in practice. For practical purposes, we must take an alternative approach to compute the Bayes estimates.

Fortunately, there is an alternative approach that requires little additional computational cost beyond simulating $(\Phi,\Sigma)$. Suppose we simulate one set of data $X_k^*$ from the LTS model in each MCMC cycle with the simulated parameters $(\Phi_{k-1},\Sigma_{k-1})$, and then simulate the parameters of the next MCMC cycle, $(\Phi_k,\Sigma_k)$, conditional on both the sample data $X$ and the simulated data $X_k^*$. We will demonstrate that posterior moments such as $E(W\mid Y)$, $E(W\Phi\mid Y)$, and $E(\Phi'W\Phi\mid Y)$ can be computed from the simulated parameters $(\Phi_k,\Sigma_k)$ and the jointly simulated data $X_k^*$ (for $k=1,\dots,M$). The simulated data are in essence latent parameters. They are not of interest per se, but they are useful for simulation of the quantity of interest (i.e. the frequentist expectation $W$). Data augmentation is not uncommon in Bayesian simulations, but as noted in the introduction, this data-augmented simulation approach differs from its other uses in the econometrics and statistics literature. One question of practical importance remains: the simulated lag block $X_1^*$ has dimension $T\times Lp$, which can contain a large number of elements. Do we have to simulate very long Markov chains to ensure that the averages are good approximations of the posterior moments? Fortunately, our numerical results show that the answer is 'no'.
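To make the idea concrete, here is a minimal sketch of how the intrinsic Bayes estimates (7) and (8) can be assembled from the joint MCMC output $(\Phi_k,\Sigma_k,X_k^*)$, replacing $E_{(X\mid\Phi,\Sigma)}(X'X)$ with $X_k^{*\prime}X_k^*$ in each cycle. The array layout and function name are illustrative assumptions.

```python
import numpy as np

def intrinsic_estimates_lts(Phi_draws, Sigma_draws, Xstar_draws, T):
    """Theorem 2.1 via MCMC output, with W approximated by X_k*' X_k* / T per cycle.

    Phi_draws   : M x (q+Lp) x p posterior draws of Phi.
    Sigma_draws : M x p x p posterior draws of Sigma.
    Xstar_draws : M x T x (q+Lp) regressor matrices generated jointly with the parameters.
    """
    M = len(Phi_draws)
    XtX = np.einsum('mti,mtj->mij', Xstar_draws, Xstar_draws) / T   # X_k*' X_k* / T
    EW = XtX.mean(axis=0)                                            # approximates E(W | Y)
    EWPhi = np.einsum('mij,mjk->ik', XtX, Phi_draws) / M             # approximates E(W Phi | Y)
    Phi_E = np.linalg.solve(EW, EWPhi)                               # equation (7)
    # equation (8): E(Sigma | Y) + E{(Phi - Phi_E)' W (Phi - Phi_E) | Y}
    dev = Phi_draws - Phi_E
    quad = np.einsum('mji,mjk,mkl->il', dev, XtX, dev) / M
    Sigma_E = Sigma_draws.mean(axis=0) + quad
    return Phi_E, Sigma_E
```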

In the following we propose a general algorithm that formalises the data-augmentation idea discussed above.

3.1. A general algorithm using data as latent parameters

Suppose that the observed data $X$ have the density $f(x\mid\theta)$, where the parameter vector $\theta$ is unknown. The prior $\pi(\theta)$ can be informative or noninformative. Let $X^*$ be a random vector (or matrix) with the same density $f(x\mid\theta)$. Let $h(\theta)$ be a function of the parameters $\theta$. We are interested in the posterior mean of the quantity $E_{(X^*\mid\theta)}g(X^*,h(\theta))$ given the data $X$.

Our algorithm is based on the following fact:
$E\{E_{(X^*\mid\theta)}g(X^*,h(\theta))\mid X\}=\frac{\int\left\{\int g(x,h(\theta))f(x\mid\theta)\,dx\right\}f(X\mid\theta)\pi(\theta)\,d\theta}{\int f(X\mid\theta)\pi(\theta)\,d\theta}=\int\!\!\int g(x,h(\theta))\,\pi(x,\theta\mid X)\,dx\,d\theta,$
where
(9) $\pi(x,\theta\mid X)=\frac{f(x\mid\theta)f(X\mid\theta)\pi(\theta)}{\int\!\!\int f(\tilde x\mid\tilde\theta)f(X\mid\tilde\theta)\pi(\tilde\theta)\,d\tilde x\,d\tilde\theta}.$
If we have a random sample $(X_k^*,\theta_k)$, $k=1,\dots,M$, from the joint distribution (Equation9), we can estimate $E[E_{(X^*\mid\theta)}\{g(X^*,h(\theta))\}\mid X]$ by
$\hat E\big[E\{g(X^*,h(\theta))\mid\theta\}\mid X\big]=\hat E_{(X^*,\theta\mid X)}\{g(X^*,h(\theta))\}=\frac{1}{M}\sum_{k=1}^M g(X_k^*,h(\theta_k)).$

The problem becomes one of generating observations from the joint distribution of $(X^*,\theta)$ given the data $X$. For this task the following MCMC method can be used.

Suppose that at the beginning of cycle $k$ we have $(X_{k-1}^*,\theta_{k-1})$.

Simulating the full conditional posterior: we sample from $\pi(\theta\mid X^*,X)\propto f(X^*\mid\theta)\,f(X\mid\theta)\,\pi(\theta)$.

Step 1. Simulate $X_k^*\sim f(x\mid\theta_{k-1})$.

Step 2. Simulate $\theta_k\sim\pi(\theta\mid X_k^*,X)\propto f(X_k^*\mid\theta)\,f(X\mid\theta)\,\pi(\theta)$.
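The two steps above can be wired into a generic sampler such as the following sketch. The callables `simulate_data`, `draw_theta` and `g` are model-specific and user-supplied; they, the burn-in handling and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def latent_data_mcmc(X_obs, theta0, simulate_data, draw_theta, g, M=10000, burn=1000):
    """Generic latent-data sampler for E{ E_{X*|theta} g(X*, h(theta)) | X }.

    simulate_data(theta)       -> one draw X* ~ f(x | theta)          (Step 1)
    draw_theta(X_star, X_obs)  -> one draw from pi(theta | X*, X)     (Step 2)
    g(X_star, theta)           -> quantity whose posterior mean is wanted
    """
    theta, total, kept = theta0, 0.0, 0
    for k in range(M + burn):
        X_star = simulate_data(theta)            # Step 1
        theta = draw_theta(X_star, X_obs)        # Step 2
        if k >= burn:
            total = total + g(X_star, theta)     # running sum; g may return an array
            kept += 1
    return total / kept                          # Monte Carlo estimate of the posterior mean
```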

4. Bayesian estimation of (Φ,Σ) in LTS models

4.1. Priors

The Bayes estimator of the LTS model depends on the prior of $(\Phi,\Sigma)$. We assume prior independence, so the prior $\pi(\Phi,\Sigma)$ is $\pi(\varphi)\pi(\Sigma)$, the product of the priors for $\Phi$ and $\Sigma$.

For estimation of the regression coefficients $\Phi$, a popular informative prior for $\varphi=\mathrm{vec}(\Phi)$ is the normal distribution $N(\varphi_0,M_0)$, with hyperparameters $\varphi_0$ and $M_0$:
(10) $\pi_N(\varphi)\propto|M_0|^{-1/2}\exp\left\{-\frac12(\varphi-\varphi_0)'M_0^{-1}(\varphi-\varphi_0)\right\}.$

A popular class of noninformative priors on $\Sigma$ is $\pi_b(\Sigma)\propto1/|\Sigma|^{b/2}$. If $b=p+1$, $\pi_b(\Sigma)$ becomes the Jeffreys prior (see Zellner, Citation1971).

Ni et al. (Citation2007) examined the intrinsic Bayes estimator under the prior $\pi_b(\varphi,\Sigma)=\pi_N(\varphi)\pi_b(\Sigma)$.

In the appendix we show that the posteriors $\pi(\Sigma\mid X^*,X)$ and $\pi(\Phi\mid X^*,X)$ can be obtained in analytical form.

As mentioned in the introduction, in multiple-parameter settings the Jeffreys prior often has undesirable properties. Bernardo (Citation1979) proposed an approach that derives a reference prior by breaking a single multiparameter problem into a consecutive series of problems with fewer parameters. For examples where reference priors produce more desirable estimates than Jeffreys priors, see Berger and Bernardo (Citation1992) and Sun and Berger (Citation1998), among others. In estimating the variance-covariance matrix $\Sigma$ based on an iid random sample from a normal population with known mean, Yang and Berger (Citation1994) re-parameterised the matrix $\Sigma$ as $O'DO$, where $D$ is a diagonal matrix whose elements are the eigenvalues of $\Sigma$ (in increasing or decreasing order) and $O$ is an orthogonal matrix. The following reference prior is derived by giving the vectorised $D$ higher priority than the vectorised $O$:
(11) $\pi_R(\Sigma)\propto\frac{1}{|\Sigma|\prod_{1\le i<j\le p}(d_i-d_j)},$
where $d_1>d_2>\cdots>d_p$ are the eigenvalues of $\Sigma$.
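For reference, a minimal sketch of the log of the prior density (11), up to the normalising constant, might look as follows; the helper name is illustrative.

```python
import numpy as np

def log_reference_prior(Sigma):
    """log pi_R(Sigma) up to a constant: -log|Sigma| - sum_{i<j} log(d_i - d_j)."""
    d = np.sort(np.linalg.eigvalsh(Sigma))[::-1]          # eigenvalues in decreasing order
    gaps = d[:, None] - d[None, :]
    log_gaps = np.log(gaps[np.triu_indices_from(gaps, k=1)])
    return -np.sum(np.log(d)) - np.sum(log_gaps)
```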

For the numerical and empirical exercises in this study we use the normal-reference prior $\pi_N(\varphi)\pi_R(\Sigma)$. The conditional densities under the normal-reference prior are given in the appendix. Ni and Sun (Citation2003) proved that the posteriors of $(\Phi,\Sigma)$ are proper under the normal-reference prior. However, the conditional posterior $\pi(\Sigma\mid\Phi,X^*,X)$ does not have an analytical form and must be sampled numerically.

4.2. A simulation algorithm for LTS models under the normal-reference prior

We employ an MCMC method to sample from the posterior. In particular, we use the Gibbs sampling method (cf. Gelfand & Smith, Citation1990). The following algorithm simulates the posteriors of LTS parameters conditional on both the sample and generated data.

Suppose that at cycle $k$ we have $(\Phi_{k-1},\Sigma_{k-1})$ (with initial values of $\Phi$ and $\Sigma$, e.g. the MLE).

Algorithm MCMC:

Step 1. Generate Yk|Σk1,Φk1.

Simulate $y_{k,t}\sim N\big(x_{0t}B_{k-1,0}+\sum_{i=1}^L y_{k,t-i}B_{k-1,i},\;\Sigma_{k-1}\big)$ for $t=1,\dots,T$. Define
$Y_k=\begin{pmatrix}y_{k,1}\\ \vdots\\ y_{k,T}\end{pmatrix}\quad\text{and}\quad X_k=\begin{pmatrix}x_{01}&y_{k,0}&\cdots&y_{k,1-L}\\ x_{02}&y_{k,1}&\cdots&y_{k,2-L}\\ \vdots& & &\vdots\\ x_{0T}&y_{k,T-1}&\cdots&y_{k,T-L}\end{pmatrix}.$

Step 2. Generate Φk|Σk1,Yk,Y.

Simulate $\varphi_k=\mathrm{vec}(\Phi_k)\sim N(\mu_k,V_k)$, where
(12) $\mu_k=V_k\big[\{\Sigma_{k-1}^{-1}\otimes(X'X+X_k'X_k)\}\hat\varphi_{MLE}^k+M_0^{-1}\varphi_0\big];$
(13) $\hat\varphi_{MLE}^k=\mathrm{vec}\{(X'X+X_k'X_k)^{-1}(X'Y+X_k'Y_k)\};$
(14) $V_k=\{M_0^{-1}+\Sigma_{k-1}^{-1}\otimes(X'X+X_k'X_k)\}^{-1}.$
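A minimal sketch of Steps 1 and 2 is given below. It assumes the pre-sample values used to start the simulated series are supplied by the user (e.g. taken from the observed data), and it uses the column-stacking vec convention implied by the Kronecker products in (12)-(14); all names are illustrative.

```python
import numpy as np
rng = np.random.default_rng(1)

def step1_generate_data(x0, y_pre, Phi_prev, Sigma_prev, L):
    """Step 1: simulate y_{k,t} ~ N(x_{0t}B_0 + sum_i y_{k,t-i}B_i, Sigma_{k-1}), t = 1..T."""
    T, p = x0.shape[0], Sigma_prev.shape[0]
    hist = [row for row in y_pre[-L:]]          # pre-sample values, ending with y_0
    Yk, Xk = np.empty((T, p)), []
    for t in range(T):
        xrow = np.concatenate([x0[t]] + [hist[-i] for i in range(1, L + 1)])
        Yk[t] = xrow @ Phi_prev + rng.multivariate_normal(np.zeros(p), Sigma_prev)
        Xk.append(xrow)
        hist.append(Yk[t])
    return Yk, np.array(Xk)

def step2_draw_phi(X, Y, Xk, Yk, Sigma_prev, M0_inv, phi0):
    """Step 2: draw vec(Phi_k) from N(mu_k, V_k) as in (12)-(14)."""
    A = X.T @ X + Xk.T @ Xk
    data_prec = np.kron(np.linalg.inv(Sigma_prev), A)
    Vk = np.linalg.inv(M0_inv + data_prec)
    phi_mle = np.linalg.solve(A, X.T @ Y + Xk.T @ Yk).ravel(order='F')  # column-stacked vec
    mu_k = Vk @ (data_prec @ phi_mle + M0_inv @ phi0)
    return rng.multivariate_normal(mu_k, Vk)    # reshape with order='F' to recover Phi_k
```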

Steps 3 to 6 generate Σk|Σk1,Φk,Yk,Y.

Step 3: Calculate $S_k=S(\Phi_k)+S^*(\Phi_k)=(Y-X\Phi_k)'(Y-X\Phi_k)+(Y_k-X_k\Phi_k)'(Y_k-X_k\Phi_k)$. Decompose $\Sigma_{k-1}=O'DO$, where $O$ is an orthogonal matrix, $D=\mathrm{diag}(d_1,\dots,d_p)$ and $d_1>d_2>\cdots>d_p$. Let $d_i^\#=\log(d_i)$, $D^\#=\mathrm{diag}(d_1^\#,\dots,d_p^\#)$ and $\Sigma_{k-1}^\#=O'D^\#O$.

Step 4: Select a random symmetric $p\times p$ matrix $V$ with elements $v_{ij}=z_{ij}\big/\sqrt{\sum_{l\le m}z_{lm}^2}$, where $z_{ij}\sim N(0,1)$ ($1\le i\le j\le p$; the other elements of $V$ are defined by symmetry).

Step 5: Generate $\lambda\sim N(0,1)$ and set $\Psi=\Sigma_{k-1}^\#+\lambda V$. Decompose $\Psi=Q'C^\#Q$, where $Q$ is an orthogonal matrix, $C^\#=\mathrm{diag}(c_1^\#,\dots,c_p^\#)$ and $c_1^\#>c_2^\#>\cdots>c_p^\#$. Compute
$\beta_k=T\sum_{i=1}^p(d_i^\#-c_i^\#)+\frac12\mathrm{tr}\left[\left\{(\exp\Sigma_{k-1}^\#)^{-1}-(\exp\Psi)^{-1}\right\}S_k\right]+\sum_{i<j}\log(d_i^\#-d_j^\#)-\sum_{i<j}\log(c_i^\#-c_j^\#).$

Step 6: Define $C=\mathrm{diag}(\exp(c_1^\#),\dots,\exp(c_p^\#))$ and $\tilde\Sigma=Q'CQ$. Simulate $u\sim\mathrm{uniform}(0,1)$ and let
$\Sigma_k=\begin{cases}\tilde\Sigma, & \text{if } u\le\min\{1,\exp(\beta_k)\},\\ \Sigma_{k-1}, & \text{otherwise}.\end{cases}$

Note that the acceptance probability is $\min\{1,\exp(\beta_k)\}$ with $\exp(\beta_k)=\pi(\Psi\mid\Phi_k,Y_k,Y)/\pi(\Sigma_{k-1}^\#\mid\Phi_k,Y_k,Y)$, where the conditional posterior $\pi(\Sigma^\#\mid\Phi,Y^*,Y)$ is given in (EquationA14). To accelerate convergence, we repeat Step 6 up to five times until a new candidate is accepted.
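For illustration, a single Metropolis update corresponding to Steps 3-6 could be sketched as follows. This is only an outline under the stated proposal (a normal step in a random symmetric direction on the log-matrix scale); the helper names are illustrative and not the paper's code.

```python
import numpy as np
rng = np.random.default_rng(2)

def log_gaps(v):
    """sum_{i<j} log(v_(i) - v_(j)) for the decreasingly ordered entries of v."""
    v = np.sort(v)[::-1]
    diff = v[:, None] - v[None, :]
    return np.sum(np.log(diff[np.triu_indices(len(v), k=1)]))

def sigma_metropolis_step(Sigma_prev, Phi_k, Y, X, Yk, Xk, T):
    """Steps 3-6: propose Sigma on the log-matrix scale, accept with prob min{1, exp(beta_k)}."""
    p = Sigma_prev.shape[0]
    # Step 3: pooled residual cross-products and log decomposition of the current Sigma
    R, Rk = Y - X @ Phi_k, Yk - Xk @ Phi_k
    S_k = R.T @ R + Rk.T @ Rk
    d, O = np.linalg.eigh(Sigma_prev)                  # Sigma_prev = O diag(d) O'
    d_hash = np.log(d)
    Sig_hash = O @ np.diag(d_hash) @ O.T
    # Step 4: random symmetric direction with unit norm over the upper triangle
    Z = rng.standard_normal((p, p))
    V = np.triu(Z) + np.triu(Z, 1).T
    V /= np.sqrt(np.sum(np.triu(Z) ** 2))
    # Step 5: candidate Psi = Sigma# + lambda*V and the log acceptance ratio beta_k
    Psi = Sig_hash + rng.standard_normal() * V
    c_hash, Q = np.linalg.eigh(Psi)
    Sigma_cand = Q @ np.diag(np.exp(c_hash)) @ Q.T     # exp(Psi)
    beta = (T * np.sum(d_hash - c_hash)
            + 0.5 * np.trace((np.linalg.inv(Sigma_prev) - np.linalg.inv(Sigma_cand)) @ S_k)
            + log_gaps(d_hash) - log_gaps(c_hash))
    # Step 6: accept or reject
    return Sigma_cand if np.log(rng.uniform()) < min(0.0, beta) else Sigma_prev
```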

4.3. Computing the posterior average loss

From Lemma 2.1, given the estimate $(\hat\Phi,\hat\Sigma)$, which is computed for a given data sample $Y$, we write the posterior average loss $E_{(\Phi,\Sigma)\mid Y}L(\hat\Phi,\hat\Sigma;\Phi,\Sigma)$ as $E_{(\Phi,\Sigma)\mid Y}L_1(\hat\Sigma;\Sigma)+E_{(\Phi,\Sigma)\mid Y}L_2(\hat\Phi,\hat\Sigma;\Phi,\Sigma)$. We decompose the $\Sigma_k$ in the $k$th MCMC cycle as $\Sigma_k=Q'D_kQ$, where $D_k$ is the diagonal matrix consisting of the eigenvalues of $\Sigma_k$, $D_k=\mathrm{diag}(d_{k1},d_{k2},\dots,d_{kp})$, and $Q$ is an orthogonal matrix with $Q'Q=I$.

The posterior average loss under the intrinsic loss can be computed from the posterior draws $(\Phi_k,\Sigma_k,X_k)$ generated by the MCMC procedure ($k=1,2,\dots,M$), with
(15) $\hat E_{(\Phi,\Sigma)\mid Y}L_1(\hat\Sigma;\Sigma)=\hat E_{(\Phi,\Sigma)\mid Y}\frac{T}{2}\left\{\mathrm{tr}(\hat\Sigma^{-1}\Sigma)-\log|\hat\Sigma^{-1}\Sigma|-p\right\}=\frac{T}{2}\left\{\mathrm{tr}\Big(\hat\Sigma^{-1}\frac{1}{M}\sum_{k=1}^M\Sigma_k\Big)+\log|\hat\Sigma|-p-\frac{1}{M}\sum_{k=1}^M\sum_{i=1}^p\log d_{ki}\right\};$
(16) $\hat E_{(\Phi,\Sigma)\mid Y}L_2(\hat\Phi,\hat\Sigma;\Phi,\Sigma)=\hat E_{(\Phi,\Sigma)\mid Y}\frac12\mathrm{tr}\big[\hat\Sigma^{-1}(\Phi-\hat\Phi)'E_{(X\mid\Phi,\Sigma)}(X'X)(\Phi-\hat\Phi)\big]=\frac12\left\{\mathrm{tr}\Big(\hat\Sigma^{-1}\frac{1}{M}\sum_{k=1}^M\Phi_k'X_k'X_k\Phi_k\Big)+\mathrm{tr}\Big(\hat\Phi\hat\Sigma^{-1}\hat\Phi'\frac{1}{M}\sum_{k=1}^M X_k'X_k\Big)\right\}-\mathrm{tr}\Big(\hat\Sigma^{-1}\hat\Phi'\frac{1}{M}\sum_{k=1}^M X_k'X_k\Phi_k\Big),$
where $E_{(X\mid\Phi,\Sigma)}(X'X)=TW$ is approximated in each cycle by $X_k'X_k$.

Note that all terms in the posterior entropy loss are functions of the simulated $\Sigma$, $\Phi$, and $X$ over the MCMC cycles. The required moments of the simulated parameters can be accumulated within the MCMC cycles, just as the posterior mean is, without the need to store all of the simulated parameters.
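A sketch of this computation from stored draws (or, equivalently, from running sums accumulated during the chain) might look as follows; array shapes and names are illustrative.

```python
import numpy as np

def posterior_average_loss(Phi_hat, Sigma_hat, Phi_draws, Sigma_draws, Xk_draws, T):
    """Evaluate (15) and (16) from MCMC output (Phi_k, Sigma_k, X_k), k = 1..M."""
    M, p = len(Sigma_draws), Sigma_hat.shape[0]
    Sinv = np.linalg.inv(Sigma_hat)
    # (15): Sigma-related loss L1
    mean_Sigma = Sigma_draws.mean(axis=0)
    mean_logdet = np.mean([np.linalg.slogdet(S)[1] for S in Sigma_draws])
    L1 = 0.5 * T * (np.trace(Sinv @ mean_Sigma)
                    + np.linalg.slogdet(Sigma_hat)[1] - p - mean_logdet)
    # (16): Phi-related loss L2, with E(X'X | Phi, Sigma) replaced by X_k'X_k per cycle
    XtX = np.einsum('mti,mtj->mij', Xk_draws, Xk_draws)
    dev = Phi_draws - Phi_hat
    L2 = 0.5 * np.mean([np.trace(Sinv @ dev[k].T @ XtX[k] @ dev[k]) for k in range(M)])
    return L1, L2
```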

5. A numerical example and an empirical study

5.1. A numerical example

In this section we first simulate data from an LTS model
(17) $y_t=c+(y_{t-1},x_t)B+\epsilon_t,$
for $t=1,\dots,T$. The dimension of the VAR variable $y_t$ is 5. The exogenous variable $x_t$ is a scalar representing seasonal cycles, with $x_1=0$, $x_2=1$, $x_3=0$, $x_4=1$, and $x_t=x_{t-4}$ for $t>4$.

Now we let the true parameters be
$\Sigma=\mathrm{diag}(1,2,3,4,5),\qquad c=(0,0,0,0,0),\qquad B=\begin{pmatrix}0.5\,I_5\\ 0.1\;\;0.2\;\;0.3\;\;0.4\;\;0.5\end{pmatrix}.$

The last row of the matrix $B$ contains the parameters of the seasonal dummy. The discussion in Section 2 shows that with this parameter setting there is no closed-form expression for the frequentist expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$, and we need to simulate it using the data-augmentation algorithm proposed in Section 4.

We generate one data sample of $T=100$ observations from the LTS model with the above parameters for $\Sigma$ and $\Phi$. The MLEs of the parameters are

$\hat\Sigma_{MLE}=\begin{pmatrix}0.93201&0.12358&0.08462&0.27246&0.11782\\ 0.12358&1.76018&0.02217&0.16508&0.26475\\ 0.08462&0.02217&2.44980&0.07874&0.01940\\ 0.27246&0.16508&0.07874&3.44815&0.80056\\ 0.11782&0.26475&0.01940&0.80056&4.03612\end{pmatrix},\qquad \hat\Phi_{MLE}=\begin{pmatrix}0.05633&0.14700&0.01158&0.16497&0.26224\\ 0.51936&0.24397&0.21705&0.12090&0.25149\\ 0.02140&0.41804&0.03495&0.06221&0.28700\\ 0.04826&0.03265&0.50279&0.03351&0.10127\\ 0.06176&0.01439&0.14174&0.37051&0.01758\\ 0.01940&0.00140&0.08389&0.01734&0.39826\\ 0.15900&0.22462&0.21547&0.01671&1.06891\end{pmatrix}.$

We conduct the simulation with a diffuse prior on $\Phi$ and the Yang–Berger reference prior on $\Sigma$. The length of the MCMC chain is set at 100,000. The intrinsic Bayes estimates are

(18) $\hat\Sigma_E=\begin{pmatrix}1.25871&0.17644&0.08826&0.26711&0.13247\\ 0.17644&2.33691&0.00497&0.15278&0.28849\\ 0.08826&0.00497&2.97993&0.03929&0.07036\\ 0.26711&0.15278&0.03929&3.76133&0.74181\\ 0.13247&0.28849&0.07036&0.74181&4.57260\end{pmatrix},$
(19) $\hat\Phi_E=\begin{pmatrix}0.05655&0.09316&0.00092&0.11812&0.16943\\ 0.46112&0.18679&0.13921&0.09552&0.13973\\ 0.02891&0.37594&0.01426&0.05658&0.18949\\ 0.04417&0.03156&0.45267&0.02360&0.10665\\ 0.05809&0.02705&0.11909&0.33560&0.02602\\ 0.02400&0.00105&0.08313&0.02921&0.34976\\ 0.11188&0.08832&0.08980&0.04434&0.34897\end{pmatrix}.$

The acceptance rate for the Metropolis step employed for sampling of Σ from the posterior conditional on other parameters and data is 24%.

Now we compare the Bayes estimator with the MLE and posterior mean. The posterior mean obtained using Option 1 is

(20) $\hat\Sigma_{Mean}=\begin{pmatrix}1.17992&0.16657&0.08449&0.25618&0.12700\\ 0.16657&2.20489&0.00476&0.14659&0.27587\\ 0.08449&0.00476&2.81882&0.03801&0.06557\\ 0.25618&0.14659&0.03801&3.57382&0.71561\\ 0.12700&0.27587&0.06557&0.71561&4.35069\end{pmatrix},$
(21) $\hat\Phi_{Mean}=\begin{pmatrix}0.05990&0.09731&0.00193&0.12276&0.17558\\ 0.43287&0.18741&0.14185&0.09236&0.13907\\ 0.02772&0.35030&0.01270&0.05628&0.19118\\ 0.04650&0.03116&0.42790&0.02733&0.10462\\ 0.05718&0.02602&0.12006&0.31275&0.02437\\ 0.02373&0.00041&0.08285&0.02851&0.32792\\ 0.11167&0.08778&0.08838&0.04414&0.34923\end{pmatrix}.$

It is known that the posterior mean of $\Sigma$ minimises the expected posterior loss of $L_1$. But it does not minimise the intrinsic loss, because the estimator of $\Sigma$ also influences the weight on the $\Phi$-related loss in $L_2$. Table 1 reports the average posterior loss of the MLE, the posterior mean, and the Bayes estimator for the data sample generated in the example. The Bayes estimator improves the $L_2$-related risk with a tradeoff of a larger $L_1$-related risk. Table 1 shows that the Bayes estimator induces a lower posterior risk than the posterior mean by making the $L_2$-related risk substantially lower and the $L_1$-related risk only slightly higher. Both the posterior mean and the Bayes estimator dominate the MLE.

Table 1. Posterior average loss of the estimates in the Example.

5.2. An empirical study: seasonal effects in a macroeconomic model

We now turn to an empirical application of Bayesian estimation under the entropy loss. We estimate an LTS model consisting of seasonal dummies and four macroeconomic variables: the return of the Standard and Poor 500 stock price index (which represents weighted stock prices of large companies), the 3-month Treasury Bill rate, the growth rate of payroll (including government jobs as well as private sector jobs), and the growth rate of industrial production (in that order). All series are measured in percentage terms. The series are monthly data from 1970:1 to 2002:12 and are not seasonally adjusted. The payroll data are obtained from the Bureau of Labor Statistics; the remaining series are obtained from the Federal Reserve Board. There are 12 dummy variables, representing January to December.

The role of seasonal fluctuations in business cycles has been noted by a number of economists. Barsky and Miron (Citation1989) argued that for the U.S. economy the characteristics of seasonal fluctuations are similar to the conventional characterisations of business cycles. Cecchetti et al. (Citation1997) estimated production functions of various industries based on how their responses to seasonal shocks vary with the state of the business cycle. Ghysels (Citation1988) showed that univariate seasonal adjustment of endogenous variables is not harmless because information on the interactions among endogenous variables is lost. Miron and Beaulieu (Citation1996) provided a survey of econometric and economic issues in understanding business cycles through seasonal fluctuations. In the finance literature, numerous studies argue that stock returns appear to have a seasonal component. Rozeff and Kinney (Citation1976) documented a celebrated 'turn of the year' effect, which refers to the seemingly abnormally high returns in January and July, especially for stocks with small market capitalisations. A number of theories have been developed to explain the phenomenon. Reinganum (Citation1983) attributed the high stock return in January to end-of-year tax-loss selling in December. Chang and Pinegar (Citation1989) found that industrial production trails the seasonal movement of stock returns by one month. The point estimates of the seasonal effects reported in the literature are model dependent and based on OLS or MLE. We will estimate the seasonal effects; our primary interest lies in the comparison of the posterior mean with the Bayes estimate under the entropy loss.

In this section, we employ an LTS model
(22) $y_t=(x_{0t},y_{t-1},\dots,y_{t-L})\Phi+\epsilon_t,$
for $t=1,\dots,T$. The dimension of the VAR variable $y_t$ is four. The exogenous variable $x_{0t}=(x_{0t,1},\dots,x_{0t,12})$ is a 12-dimensional vector representing seasonal cycles, with $x_{0t,1}$ equal to 1 if period $t$ is January and 0 otherwise, $\dots$, and $x_{0t,12}$ equal to 1 if period $t$ is December and 0 otherwise.

Based on the Schwarz criterion, for each sample period the lag length $L$ of the LTS is 2. The Yang–Berger reference prior is applied to the covariance matrix $\Sigma$. The prior for the LTS coefficient $\varphi$ is a rather diffuse $N(0,M_0)$. Here $M_0$ is a diagonal matrix with 10.0 as the diagonal element for parameters corresponding to the dummy variables and 2.0 as the diagonal element for parameters corresponding to the lag coefficients. We draw the posterior from $M$ MCMC cycles after $0.1\times M$ burn-in runs. The MCMC length $M$ is set at 50,000, 100,000, and 1,000,000.

Under algorithm MCMC, reducing the length of the MCMC chain to 100,000 or 50,000 from 1,000,000 makes little difference. The MLE, posterior mean, and Bayes estimate of the covariance matrix $\Sigma$ are as follows. As dictated by the theoretical result, the Bayes estimate under the entropy loss, $\hat\Sigma_E$, is larger than the posterior mean $\hat\Sigma_{Mean}$.
$\hat\Sigma_{MLE}=\begin{pmatrix}19.775&0.335&0.094&0.059\\ 0.335&0.254&0.055&0.011\\ 0.094&0.055&0.937&0.070\\ 0.059&0.011&0.070&0.044\end{pmatrix}.$
With $M=1{,}000{,}000$, the posterior mean and entropy-based Bayes estimates are
$\hat\Sigma_{Mean}=\begin{pmatrix}20.908&0.354&0.100&0.063\\ 0.354&0.271&0.058&0.012\\ 0.100&0.058&0.996&0.075\\ 0.063&0.012&0.075&0.048\end{pmatrix},\qquad \hat\Sigma_E=\begin{pmatrix}22.729&0.384&0.101&0.078\\ 0.384&0.295&0.057&0.012\\ 0.101&0.057&1.074&0.081\\ 0.078&0.012&0.081&0.052\end{pmatrix}.$
The posterior standard deviations of the elements of the covariance matrix are
$\begin{pmatrix}1.538&0.124&0.235&0.052\\ 0.124&0.020&0.027&0.006\\ 0.235&0.027&0.074&0.012\\ 0.052&0.006&0.012&0.003\end{pmatrix}.$
The difference between the estimates $\hat\Sigma_{Mean}$ and $\hat\Sigma_E$ is large relative to the posterior standard deviations.

The above point estimates and standard deviations are similar to those with $M=50{,}000$. The MCMC algorithm yields posteriors with few outliers. This is because in Step 2 of the MCMC algorithm, $\varphi$ is generated using an average of the sample data and the generated data, rather than the sample data alone. A few outliers in the posterior may affect the posterior mean only slightly, but they can change the posterior risk and the Bayesian estimate substantially, because the few explosive parameter draws carry disproportionately large weights in the posterior average loss.

The intrinsic Bayes estimate dominates the posterior mean by a large margin in terms of posterior expected loss. The large difference in the posterior expected loss is mainly due to the difference in the $\Phi$-related risk, i.e. the quadratic term in (Equation5). This difference in risk is approximately $\frac12\mathrm{tr}\big[\hat\Sigma_E^{-1}E\{(\Phi-\hat\Phi_E)'W(\Phi-\hat\Phi_E)\mid Y\}\hat\Sigma_E^{-1}(\hat\Sigma_E-\hat\Sigma_{Mean})\big]$, which is proportional to the frequentist expectation $W$, and the latter is comparable to $X'X$. $X'X$ is quite large in this application, largely due to the strong serial correlation of the 3-month T-bill rate. As a result, with a larger $\hat\Sigma$ the Bayesian estimate substantially reduces the posterior risk compared with the posterior mean. Simulation with $M=1{,}000{,}000$ shows that the posterior average loss of the posterior mean estimate (3230.6) is larger than that of the intrinsic Bayes estimate (69.7). The lower overall posterior average loss of the intrinsic Bayes estimate is achieved by substantially lowering the risk of the quadratic part, from 3225.3 for the posterior mean to 61.8. The first term of the loss, related to $\Sigma$, is slightly larger for the intrinsic Bayes estimate (7.9) than for the posterior mean estimate (5.3). As noted earlier, the intrinsic Bayes estimate improves the $\Phi$-related loss with a tradeoff of a larger $\Sigma$-related loss. The empirical result shows that the Bayesian estimate induces a lower posterior average loss than the posterior mean by making the $\Phi$-related loss substantially lower and the $\Sigma$-related loss only slightly higher.

We now turn to comparing the estimates of the regression coefficients $\Phi$. Table 2 reports the MLE, the posterior mean, the intrinsic Bayes estimate, and the posterior standard deviations. As a consequence of applying a rather diffuse prior, the posterior mean $\hat\Phi_{Mean}$ is quite similar to the MLE. For stock returns of the Standard and Poor 500 index, the MLE and the posterior mean estimate indicate a moderate positive seasonal factor in January, which is smaller than the seasonal factors of March, April, October, November, and December. Most surprisingly, October registers the largest seasonal gain, despite the fact that the sample includes the October 1987 sell-off. The data of recent years suggest that the estimates of seasonality in large-capitalisation stock returns are quite sensitive to the sample period and regression model. In comparison to the MLE and the posterior mean, the intrinsic Bayesian estimate shows a smaller January effect and much smaller end-of-year positive seasonal returns. The sum of the seasonal coefficients of the MLE is above 11%, while that of the intrinsic Bayesian estimates is about half as much. The large discrepancy between the posterior mean and the intrinsic Bayesian estimate casts doubt on the robustness of the seasonality of returns of the Standard and Poor 500 stock index.

Table 2. Estimates of three equations.

Compared to the stock return, the seasonality of the industrial production growth rate is much more robust. The most distinct pattern is a steep decline in July followed by a surge in August; then weakness at the end of the year precedes a strong rebound in February. The strong showing of industrial production in February and August is consistent with the pattern reported in Chang and Pinegar (Citation1989), while the effect of Standard and Poor stock returns on predicted industrial production is quite small. The magnitude of the seasonal effects under the entropy-loss-based Bayesian estimates is on average slightly larger than that of the posterior mean.

Lastly, we examine the estimates of the employment growth rate equation. The most prominent seasonal patterns are the decline in January followed by a rebound in February and March, and the weakness in July followed by a recovery in September. The estimated seasonality in payroll growth is somewhat different from that of industrial production growth. Note that in the year 2002 about 83% of payroll consisted of service sector jobs, while 85% of industrial production concerned the manufacturing sector. The subject of interest is the point estimates. As with the industrial production equation, the entropy-loss-based Bayes estimates for the payroll growth rate equation are similar to the posterior mean.

In summary, the Bayesian estimates based on the entropy loss show a qualitatively similar seasonal pattern to that of the posterior mean estimates for industrial production and employment growth but a distinctly different one for stock returns. The posterior average loss of the Bayes estimates with respect to the entropy loss is substantially smaller than that of the posterior mean.

6. Concluding remarks

In this paper we investigate properties of the Bayes estimators of the LTS model parameters $(\Phi,\Sigma)$ derived from the entropy loss function. These estimators are distinctly different from those of the multivariate iid model because of the serial correlation of the time series variables. Bayesian computation under the entropy loss requires simulating a frequentist moment of the regressors. We propose a data-augmenting algorithm for simulating the posteriors and computing the Bayes estimators under the entropy loss and a normal-reference prior. The algorithm, which draws from the full conditional posterior, is shown to be quite efficient. A novel approach taken in this paper concerns generating data in an MCMC as latent parameters. This idea may be useful in other contexts for simulating complicated posterior moments. Our empirical application to a macroeconomic problem shows that the Bayes estimates under the entropy loss can differ substantially from the posterior mean.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Shawn Ni

Dr. Shawn Ni holds a PhD in Economics from University of Minnesota. He is currently Middlebush Professor of Economics and Adjunct Professor of Statistics at University of Missouri-Columbia. He conducts research on a wide range of empirical economics topics and Bayesian statistics.

Dongchu Sun

Dr. Dongchu Sun holds a PhD in Statistics from Purdue University. He is a research professor of statistics at the University of Nebraska-Lincoln and East China Normal University. His research interests include Bayesian analysis, small area estimation, decision theory, business and econometrics, space-time and longitudinal models, and smoothing splines.

References

Appendix. Proof of Fact 2.1 and posterior properties

A.1. Proof of Fact  2.1

Proof.

Let $(\tilde\mu,\tilde\Sigma)$ denote an arbitrary estimator of $(\mu,\Sigma)$. For the entropy loss function $L$ and posterior $\pi(\mu,\Sigma\mid Y)$, the expected posterior loss is
$R(\tilde\mu,\tilde\Sigma\mid Y)=\frac{T}{2}E\left\{(\tilde\mu-\mu)'\tilde\Sigma^{-1}(\tilde\mu-\mu)+\mathrm{tr}(\Sigma\tilde\Sigma^{-1})-\log|\Sigma\tilde\Sigma^{-1}|-p\;\Big|\;Y\right\}.$
The Bayes estimator, which minimises the expected posterior loss, can be derived from the first-order conditions. Note that because $\tilde\Sigma$ is symmetric,
$\frac{\partial R(\tilde\mu,\tilde\Sigma\mid Y)}{\partial\tilde\mu}=T\,E\{\tilde\Sigma^{-1}(\tilde\mu-\mu)\mid Y\}.$
Setting the derivative to 0 yields $\hat\mu_{iid}=E(\mu\mid Y)$.

The following identities are known (e.g. Harville, Citation1998, p. 327) for symmetric matrices $A$ and $B$:
$\frac{\partial\log|A|}{\partial A}=2A^{-1}-\mathrm{diag}(A^{-1}),\qquad \frac{\partial\,\mathrm{tr}(AB^{-1})}{\partial B}=-2B^{-1}AB^{-1}+\mathrm{diag}(B^{-1}AB^{-1}).$
Here $\mathrm{diag}(A)$ is the diagonal matrix whose diagonal elements are those of $A$. Using the conclusion that the estimator of $\mu$ is the posterior mean, we have $E\{(\hat\mu_{iid}-\mu)(\hat\mu_{iid}-\mu)'\mid Y\}=\mathrm{var}(\mu\mid Y)$. Applying this result to the derivative with respect to $\tilde\Sigma$, we have
$\frac{\partial R(\hat\mu_{iid},\tilde\Sigma\mid Y)}{\partial\tilde\Sigma}=\frac{T}{2}\Big\{-2\tilde\Sigma^{-1}\mathrm{var}(\mu\mid Y)\tilde\Sigma^{-1}+\mathrm{diag}\big(\tilde\Sigma^{-1}\mathrm{var}(\mu\mid Y)\tilde\Sigma^{-1}\big)-2\tilde\Sigma^{-1}E(\Sigma\mid Y)\tilde\Sigma^{-1}+\mathrm{diag}\big(\tilde\Sigma^{-1}E(\Sigma\mid Y)\tilde\Sigma^{-1}\big)+2\tilde\Sigma^{-1}-\mathrm{diag}(\tilde\Sigma^{-1})\Big\}$
$=\frac{T}{2}\Big\{2\tilde\Sigma^{-1}\big(I-[E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)]\tilde\Sigma^{-1}\big)-\mathrm{diag}\big(\tilde\Sigma^{-1}(I-[E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)]\tilde\Sigma^{-1})\big)\Big\}.$
The derivative is 0 when $I=[E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)]\tilde\Sigma^{-1}$, which yields $\hat\Sigma_{iid}=E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)$.

A.2. Posterior properties

A.2.1. $(\theta\mid X^*,X)$ under the normal prior for $\Phi$ and a Jeffreys-type prior for $\Sigma$

A commonly used noninformative prior for $\Sigma$ is the Jeffreys prior $\pi_J(\Sigma)\propto|\Sigma|^{-(p+1)/2}$. The prior for $\Sigma$ in the RATS statistical package is a modified version of the Jeffreys prior, $\pi_A(\Sigma)\propto|\Sigma|^{-\{(L+1)p/2+1\}}$. Zellner's maximal data information (MDI) prior is $\pi_M(\Sigma)\propto|\Sigma|^{-1/2}$. For analysis of prior choice in VAR models see Ni and Sun (Citation2003) and Sun and Ni (Citation2004). We consider a class of joint priors,
(A1) $\pi_b(\varphi,\Sigma)=\pi_N(\varphi)\pi_b(\Sigma),$
where $\pi_N(\varphi)$ is the normal prior for $\varphi$ given by (Equation10), and $\pi_b(\Sigma)$ ($b\in\mathbb{R}$) is given by
(A2) $\pi_b(\Sigma)\propto\frac{1}{|\Sigma|^{b/2}}.$
Note that $\pi_{NJ}$, $\pi_{NA}$, and $\pi_{NM}$ are special cases of (EquationA1) when $b$ equals $p+1$, $(L+1)p+2$ and 1, respectively.

We propose using the posterior quantities conditional on the simulated data $X^*$, $h(\theta\mid X^*,X)$, instead of the marginal posterior $h(\theta\mid X)$, as the estimator of $h(\theta)$. Note that the posterior is
$f(\Phi,\Sigma\mid X^*,X)\propto\frac{1}{|\Sigma|^{T+b/2}}\,\mathrm{etr}\left\{-\frac12\Sigma^{-1}(Y-X\Phi)'(Y-X\Phi)\right\}\,\mathrm{etr}\left\{-\frac12\Sigma^{-1}(Y^*-X^*\Phi)'(Y^*-X^*\Phi)\right\}\exp\left\{-\frac12(\varphi-\varphi_0)'M_0^{-1}(\varphi-\varphi_0)\right\}.$
We would like to express the posterior of $\Sigma$ in terms of the data $X$ and $X^*$. Integrating out $\Phi$ results in
$f(\Sigma\mid X^*,X)=|M_0+\Sigma\otimes(X'X+X^{*\prime}X^*)^{-1}|^{-1/2}\,|\Sigma|^{-(T+b/2)}\,\mathrm{etr}\left\{-\frac12\Sigma^{-1}\big[(Y-X\hat\Phi_M)'(Y-X\hat\Phi_M)+(Y^*-X^*\hat\Phi_M^*)'(Y^*-X^*\hat\Phi_M^*)\big]\right\}$
$\times\exp\left\{-\frac12(\tilde\varphi-\varphi_0)'\big[M_0+\Sigma\otimes(X'X+X^{*\prime}X^*)^{-1}\big]^{-1}(\tilde\varphi-\varphi_0)\right\}\times\exp\left\{-\frac12(\hat\varphi_M-\hat\varphi_M^*)'\Big(\Sigma^{-1}\otimes\big[(X'X)^{-1}+(X^{*\prime}X^*)^{-1}\big]^{-1}\Big)(\hat\varphi_M-\hat\varphi_M^*)\right\},$
where the lower case denotes the vec operator: $\varphi=\mathrm{vec}(\Phi)$, $\tilde\Phi=(X'X+X^{*\prime}X^*)^{-1}(X'Y+X^{*\prime}Y^*)$, $\hat\Phi_M=(X'X)^{-1}X'Y$, and $\hat\Phi_M^*=(X^{*\prime}X^*)^{-1}X^{*\prime}Y^*$.

The posterior $f(\Phi,\Sigma\mid X^*,X)$ has a closed form when the prior for $\varphi$ can be written as $N(\varphi_0,M_0)$ with
(A3) $M_0=\Sigma\otimes\Omega_0,$
where $\Omega_0$ is an $(Lp+1)\times(Lp+1)$ known covariance matrix. In the extreme case $\Omega_0^{-1}\to0$, $M_0^{-1}\to0$ and the prior approaches a constant prior. Under the assumption (EquationA3), we have
$f(\Sigma\mid X^*,X)\propto|\Omega_0+(X'X+X^{*\prime}X^*)^{-1}|^{-p/2}\,|\Sigma|^{-T-\frac{b-Lp-1}{2}}\,\mathrm{etr}\left\{-\frac12 V\Sigma^{-1}\right\},$
where
$V(X,X^*)=(\hat\Phi_M-\hat\Phi_M^*)'\big[(X'X)^{-1}+(X^{*\prime}X^*)^{-1}\big]^{-1}(\hat\Phi_M-\hat\Phi_M^*)+(\tilde\Phi-\Phi_0)'\big[\Omega_0+(X'X+X^{*\prime}X^*)^{-1}\big]^{-1}(\tilde\Phi-\Phi_0)+(Y-X\hat\Phi_M)'(Y-X\hat\Phi_M)+(Y^*-X^*\hat\Phi_M^*)'(Y^*-X^*\hat\Phi_M^*).$
Thus $f(\Sigma\mid X^*,X)\sim IW\{V,\,2T+b-(L+1)p-2\}$. The mean of $\Sigma\mid X^*,X$ is $V(X,X^*)/\{2T+b-(L+1)p-2-p-1\}=V(X,X^*)/\{2T+b-(L+2)p-3\}$. The posterior mean $E(\Sigma\mid X)$ is estimated by $\sum_{j=1}^M E(\Sigma\mid X_j^*,X)/M$.

The marginal posterior $f(\Phi\mid X^*,X)$ can be obtained by integrating out $\Sigma$ in $f(\Phi,\Sigma\mid X^*,X)$. It is easy to verify that
(A4) $f(\Phi,\Sigma\mid X^*,X)\propto|\Sigma|^{-T-b/2}\,\mathrm{etr}\left\{-\frac12 U\Sigma^{-1}\right\}=|\Sigma|^{-(\nu+p+1)/2}\,\mathrm{etr}\left\{-\frac12 U\Sigma^{-1}\right\},$
where the degrees of freedom $\nu=2T+b-p-1$ and
(A5) $U=(\Phi-\Phi_u)'(\Omega_0^{-1}+X'X+X^{*\prime}X^*)(\Phi-\Phi_u)+V(X,X^*),$
(A6) $\Phi_u=(\Omega_0^{-1}+X'X+X^{*\prime}X^*)^{-1}(\Omega_0^{-1}\Phi_0+X'Y+X^{*\prime}Y^*).$

It follows from (EquationA4) that
(A7) $f(\Phi\mid X^*,X)\propto|U|^{-\nu/2}=\big|(\Phi-\Phi_u)'(\Omega_0^{-1}+X'X+X^{*\prime}X^*)(\Phi-\Phi_u)+V(X,X^*)\big|^{-\nu/2},$
which is a matrix version of the Student-t distribution. The mean of $\Phi\mid X^*,X$ is $\Phi_u$. To calculate the intrinsic Bayes estimator, note that the frequentist expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$ can be estimated by $\sum_{j=1}^M X_j^{*\prime}X_j^*/M$, and the posterior mean $E\{E_{(X\mid\Phi,\Sigma)}(X'X)\,\Phi\mid X\}$ is estimated by $\sum_{j=1}^M X_j^{*\prime}X_j^*\,\Phi_u(X_j^*,X)/M$.

A.2.2. Conditional posteriors under normal prior for Φ and the Yang-Berger reference prior for Σ

Fact A.1

Consider the normal prior for $\varphi$ given in (Equation10). The conditional density of $\varphi$ given $(\Sigma;Y)$ is $N(\mu_M,V_M)$, where
(A8) $\mu_M=\hat\varphi_{MLE}+\{M_0^{-1}+\Sigma^{-1}\otimes(X'X)\}^{-1}M_0^{-1}(\varphi_0-\hat\varphi_{MLE});$
(A9) $V_M=\{M_0^{-1}+\Sigma^{-1}\otimes(X'X)\}^{-1},$
where
(A10) $\hat\varphi_{MLE}=\mathrm{vec}\{(X'X)^{-1}X'Y\}.$

Fact A.2

Consider the normal prior for $\varphi$ given in (Equation10). The conditional density of $\varphi$ given $(\Sigma;Y^*,Y)$ is $N(\varphi_M,V_M)$, where
$V_M=\{M_0^{-1}+\Sigma^{-1}\otimes(X'X+X^{*\prime}X^*)\}^{-1},$
$\varphi_M=V_M\big[\{\Sigma^{-1}\otimes(X'X+X^{*\prime}X^*)\}\hat\varphi_{MLE}^*+M_0^{-1}\varphi_0\big],$
$\hat\varphi_{MLE}^*=\mathrm{vec}\{(X'X+X^{*\prime}X^*)^{-1}(X'Y+X^{*\prime}Y^*)\}.$

Fact A.3

The conditional density of $\Sigma$ given $(\varphi,Y)$ is
(A11) $\pi(\Sigma\mid\Phi,Y)\propto\frac{\mathrm{etr}\{-\frac12\Sigma^{-1}S(\Phi)\}}{|\Sigma|^{T/2+1}\prod_{1\le i<j\le p}(d_i-d_j)},$
where $S(\Phi)=(Y-X\Phi)'(Y-X\Phi)$.

Fact A.4

The conditional density of $\Sigma$ given $(\varphi,Y^*,Y)$ is
(A12) $\pi(\Sigma\mid\Phi,Y^*,Y)\propto\frac{\mathrm{etr}\{-\frac12\Sigma^{-1}(S(\Phi)+S^*(\Phi))\}}{|\Sigma|^{T+1}\prod_{1\le i<j\le p}(d_i-d_j)},$
where $S^*(\Phi)=(Y^*-X^*\Phi)'(Y^*-X^*\Phi)$.

We have shown how simulated data facilitate computation of a frequentist moment in the Bayes estimator. The simulated data can also be used to reduce the variance of the MCMC, making the simulation more efficient.

For simulation of $\Sigma$, we adopt a hit-and-run algorithm used in Yang and Berger (Citation1994). In implementing the algorithm, we consider the one-to-one transformation $\Sigma^\#=\log(\Sigma)$, or $\Sigma=\exp(\Sigma^\#)$ in the sense that $\Sigma=\sum_{j=0}^\infty(\Sigma^\#)^j/j!$. The reason for simulating $\Sigma$ as $\exp(\Sigma^\#)$ is to ensure that the generated $\Sigma$ matrices are positive definite. It can be shown that the conditional posterior density of $\Sigma^\#$ given $(\Phi,Y)$ is
(A13) $\pi(\Sigma^\#\mid\Phi,Y)=\pi(\Sigma^\#\mid S(\Phi))\propto\frac{\mathrm{etr}\left\{-\frac{T}{2}\Sigma^\#-\frac12(\exp\Sigma^\#)^{-1}S(\Phi)\right\}}{\prod_{i<j}(d_i^\#-d_j^\#)},$
and that the conditional posterior density of $\Sigma^\#$ given $(\Phi,Y^*,Y)$ is
(A14) $\pi(\Sigma^\#\mid\Phi,Y^*,Y)=\pi(\Sigma^\#\mid S(\Phi),S^*(\Phi))\propto\frac{\mathrm{etr}\left[-T\Sigma^\#-\frac12(\exp\Sigma^\#)^{-1}\{S(\Phi)+S^*(\Phi)\}\right]}{\prod_{i<j}(d_i^\#-d_j^\#)},$
where $\Sigma^\#=O'D^\#O$, $O$ is an orthogonal matrix, and $D^\#=\mathrm{diag}(d_1^\#,\dots,d_p^\#)$ with $d_1^\#\ge\cdots\ge d_p^\#$. Note that $\exp(\Sigma^\#)=O'\exp(D^\#)O$.
