
Intrinsic Bayesian estimation of linear time series models

Shawn Ni & Dongchu Sun
Pages 275-287 | Received 10 May 2019, Accepted 15 Mar 2020, Published online: 02 Apr 2020

Abstract

Intrinsic loss functions (such as the Kullback–Leibler divergence, i.e. the entropy loss) have been used extensively in place of conventional loss functions for independent samples, but applications to serially correlated samples are scant. In the present study, we examine the Bayes estimator of the Linear Time Series (LTS) model under the entropy loss. We derive the Bayes estimator and show that it involves a frequentist expectation of the regressors. We propose a Markov chain Monte Carlo procedure that jointly simulates the posteriors of the LTS parameters and the frequentist expectation of the regressors. We conduct Bayesian estimation of an LTS model for seasonal effects in some U.S. macroeconomic variables.

AMS 1991 Subject Classifications:

1. Introduction

To analyse the dynamics of multivariate economic systems, researchers frequently employ Linear Time Series (LTS) models (see, for example, Sims, Citation1980 and the ensuing literature). Bayesian inference for such models often requires point estimates of the parameters because reporting the entire posterior distribution is made difficult by a prohibitively large number of parameters. A critical aspect of Bayesian estimation is the choice of loss function.

In this study, we derive the Bayes estimator of LTS models based on the intrinsic loss. We illustrate a computational problem that arises from serial correlation in these models when the intrinsic loss is applied, and we present our solution to the problem.

A loss function $L(\theta,\hat\theta)$ measures the distance between the parameter $\theta$ and its estimate $\hat\theta$. Such a metric is often specified for convenience given the problem at hand rather than grounded in a general principle. Bernardo and Juárez (Citation2003) noted that for inferential purposes what matters most is not the distance between $\theta$ and $\hat\theta$; instead it is the intrinsic loss, the distance between the probability model $f(x\mid\hat\theta)$ (corresponding to the estimate $\hat\theta$) and $f(x\mid\theta)$ (corresponding to the actual parameter $\theta$). Robert (Citation1994, Citation1996) proposed using the logarithmic divergence (also known as the Kullback–Leibler divergence or the entropy loss) as the intrinsic loss. The intrinsic loss has a number of desirable properties not generally possessed by conventional loss functions. For example, it is invariant to transformations of the data $x$ or the parameter $\theta$, and it has the additive property that the loss for two independent data sets is the sum of the losses for each data set.

The intrinsic loss has been used for Bayesian estimation with independent samples. It has also been used in various contexts for time series data. For instance, Kitamura and Stutzer (Citation1997) used the Kullback–Leibler distance to derive a frequentist estimator for nonlinear models. Solo et al. (Citation2001) used the Kullback–Leibler distance for evaluation of signal processing models. Robertson et al. (Citation2005) used the entropy divergence for evaluation of forecasting densities. Fernandez-Villaverde and Rubio-Ramirez (Citation2004) used the Kullback–Leibler distance to evaluate dynamic equilibrium models in economics. However, employing the intrinsic loss for Bayesian estimation of time series models leads to technical challenges.

To illustrate the difference between the intrinsic loss in independent models and in serially correlated models, consider the following examples. First, suppose $y=\{y_1,\dots,y_T\}$, where $y_t$ ($t=1,\dots,T$) are independently identically distributed (iid) $N(\rho,1)$ and we are interested in estimating the mean parameter $\rho$ under the entropy loss $\kappa(\hat\rho\mid\rho)=\int \log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}\,f(y\mid\rho)\,dy=E_{y\mid\rho}\log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}$. By the assumption on the model, $f(y\mid\rho)\propto\exp\{-\frac{1}{2}\sum_{t=1}^T(y_t-\rho)^2\}$. It is easy to verify that $\kappa(\hat\rho\mid\rho)=(T/2)(\rho-\hat\rho)^2$. In this case, the intrinsic loss coincides with the commonly used quadratic loss, which implies that the Bayes estimator of $\rho$ is the posterior mean. Now consider an AR(1) model: $y_t=\rho y_{t-1}+\epsilon_t$, for $t=1,\dots,T$, where $\epsilon_t$ is iid $N(0,1)$ and $\rho$ is the only unknown parameter.

The entropy loss is still $\kappa(\hat\rho\mid\rho)=\int\log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}\,f(y\mid\rho)\,dy=E_{y\mid\rho}\log\frac{f(y\mid\rho)}{f(y\mid\hat\rho)}$, but now $f$ is the density of the AR variable. Substituting in the distribution of the data gives $\kappa(\hat\rho\mid\rho)=\frac{(\rho-\hat\rho)^2}{2}E_{y\mid\rho}\sum_{t=1}^T y_{t-1}^2=(\rho-\hat\rho)^2\,\delta(y_0,\rho)$, where $\delta(y_0,\rho)=\frac{1}{2}E_{y\mid\rho}\sum_{t=1}^T y_{t-1}^2=\frac{1}{2}\sum_{t=1}^T\left\{\rho^{2(t-1)}y_0^2+1+\rho^2+\cdots+\rho^{2(t-2)}\right\}=\frac{1}{2}\left[\frac{1-\rho^{2T}}{1-\rho^2}y_0^2+\frac{1}{1-\rho^2}\left(T-\frac{1-\rho^{2T}}{1-\rho^2}\right)\right]$. It is obvious that $\delta(y_0,\rho)$ is an increasing function of $\rho^2$ and is nonnegative for any $\rho$. A Bayes estimator (which is called a generalised Bayes estimator if the prior is improper) minimises the Bayesian posterior expected loss. If the entropy loss is employed, the Bayes estimator for $\rho$ with a given initial condition $y_0$ is $\hat\rho_E=\arg\min_{\hat\rho} E_{\rho\mid y}[\delta(y_0,\rho)(\rho-\hat\rho)^2]=E_{\rho\mid y}\{\delta(y_0,\rho)\rho\}/E_{\rho\mid y}\{\delta(y_0,\rho)\}$. Note that if $\rho$ is positive, $\rho$ and $\delta(y_0,\rho)$ are positively correlated. It follows that the Bayes estimator under the entropy loss for a positive $\rho$ is larger than the posterior mean. It is well known that the MLE $\hat\rho_M$ (and the posterior mean under the constant prior) is biased downward, especially when the true parameter is close to unity (see MacKinnon & Smith, Citation1998). Note that under the constant prior, $\sum_{t=1}^T y_{t-1}^2$ is the posterior precision (i.e. the inverse of the posterior variance) for $\rho$. Hence the weight on the squared estimation error in the intrinsic loss function, $\delta(y_0,\rho)=\frac12 E_{y\mid\rho}\sum_{t=1}^T y_{t-1}^2$, is larger in the region of $\rho$ where the posterior precision is high. It is in the spirit of Zellner's (Citation1978, Citation1998) 'precision of estimation' loss. This is in contrast to the quadratic loss, which imposes the same weight on all regions of $\rho$.
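As a concrete illustration of this weighted-posterior-mean form, the following sketch computes $\hat\rho_E$ from a set of posterior draws of $\rho$. It is a minimal example, assuming the draws are already available from some sampler; the function names and the toy draws are purely illustrative.

```python
import numpy as np

def delta(y0, rho, T):
    """delta(y0, rho) = 0.5 * E[sum_{t=1}^T y_{t-1}^2 | rho] for the AR(1) model."""
    r2 = rho ** 2
    if np.isclose(r2, 1.0):
        # limit as rho^2 -> 1: sum_t E[y_{t-1}^2] = T*y0^2 + T*(T-1)/2
        return 0.5 * (T * y0 ** 2 + T * (T - 1) / 2.0)
    g = (1.0 - r2 ** T) / (1.0 - r2)        # sum_{t=1}^T rho^{2(t-1)}
    return 0.5 * (g * y0 ** 2 + (T - g) / (1.0 - r2))

def intrinsic_bayes_ar1(rho_draws, y0, T):
    """Entropy-loss Bayes estimator: the delta-weighted posterior mean of rho."""
    w = np.array([delta(y0, r, T) for r in rho_draws])
    return np.sum(w * rho_draws) / np.sum(w)

# toy usage with hypothetical posterior draws of rho
rho_draws = np.random.default_rng(0).normal(0.9, 0.05, size=5000)
print(np.mean(rho_draws), intrinsic_bayes_ar1(rho_draws, y0=1.0, T=100))
```

With positive draws of $\rho$, the weighted mean exceeds the plain posterior mean, illustrating the upward adjustment discussed above.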

Now we turn to the model of interest. The LTS for a $p$-dimensional endogenous variable $y_t$ and a $q$-dimensional exogenous (predetermined) variable $x_{0t}$ ($t=1,\dots,T$) has the form
(1) $y_t=x_{0t}B_0+\sum_{j=1}^L y_{t-j}B_j+\epsilon_t,$
where $L$ is a known positive integer, $B_0$ is a $q\times p$ unknown matrix, $B_j$ is an unknown $p\times p$ matrix, $\epsilon_1,\dots,\epsilon_T$ are iid $N_p(0,\Sigma)$ errors, and $\Sigma$ is an unknown $p\times p$ positive definite matrix.

A special case of the above LTS model arises when all of the lag coefficients $B_1,\dots,B_L$ are zero (i.e. all regressors, with $q>p$, are exogenous variables). The exogenous variables may be functions of time. For example, in modelling climate temperature or holiday consumer spending, seasonal dummies may be introduced into the model. In economic applications, these exogenous variables may also be variables representing government policies. Another special case arises when $x_{0t}$ is a $1\times p$ constant vector with elements of unity and the regressors otherwise include only lags of the variable $y_t$. The LTS model then becomes a Vector AutoRegression (VAR), which is commonly used for modelling macroeconomic time series.

We can rewrite Equation (Equation1) in the familiar matrix form
(2) $Y=X\Phi+\epsilon,$
where
$X=\begin{pmatrix}x_1\\ \vdots\\ x_T\end{pmatrix}=(X_0,X_1);\quad X_0=\begin{pmatrix}x_{01}\\ \vdots\\ x_{0T}\end{pmatrix},\quad X_1=\begin{pmatrix}y_0&\cdots&y_{1-L}\\ \vdots& &\vdots\\ y_{T-1}&\cdots&y_{T-L}\end{pmatrix};\quad \Phi=\begin{pmatrix}B_0\\ \Phi_1\end{pmatrix}=\begin{pmatrix}B_0\\ B_1\\ \vdots\\ B_L\end{pmatrix};\quad Y=\begin{pmatrix}y_1\\ \vdots\\ y_T\end{pmatrix},\quad \epsilon=\begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_T\end{pmatrix}.$
Here $X_0$ and $X_1$ are $T\times q$ and $T\times Lp$; the former does not depend on the parameters $\Sigma$ and $\Phi$, but the latter does. $Y$ and $\epsilon$ are $T\times p$ matrices, $\Phi$ is a $(q+Lp)\times p$ matrix of unknown parameters, $x_t$ is a $1\times(q+Lp)$ row vector, and $X$ is a $T\times(q+Lp)$ matrix of observations. The likelihood function of $(\Phi,\Sigma)$ based on $Y$ is then
(3) $f(Y\mid\Phi,\Sigma)\propto\frac{1}{|\Sigma|^{T/2}}\exp\left\{-\frac{1}{2}\sum_{t=1}^T(y_t-x_t\Phi)\Sigma^{-1}(y_t-x_t\Phi)'\right\}=\frac{1}{|\Sigma|^{T/2}}\,\mathrm{etr}\left\{-\frac{1}{2}\Sigma^{-1}(Y-X\Phi)'(Y-X\Phi)\right\}.$
Here and hereafter $\mathrm{etr}(A)$ denotes $\exp(\mathrm{tr}(A))$ for a matrix $A$.
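For concreteness, here is a minimal sketch of how the stacked matrices in (2) and the log of the likelihood (3) can be formed numerically. It is only an illustration under the stated conventions (rows of $Y$ are $y_1,\dots,y_T$; pre-sample values occupy the first $L$ rows of the input array); all names are illustrative.

```python
import numpy as np

def build_design(y, x0, L):
    """Stack the LTS data into Y (T x p) and X = (X0, X1) (T x (q + L*p)).

    y  : (T + L) x p array; the first L rows are the pre-sample values y_{1-L},...,y_0.
    x0 : T x q array of exogenous regressors.
    """
    T = x0.shape[0]
    Y = y[L:L + T]                                        # y_1, ..., y_T
    lags = [y[L - j:L - j + T] for j in range(1, L + 1)]  # y_{t-1}, ..., y_{t-L}
    X = np.hstack([x0] + lags)
    return Y, X

def lts_loglik(Y, X, Phi, Sigma):
    """Log-likelihood (3) up to an additive constant."""
    T = Y.shape[0]
    E = Y - X @ Phi
    Sinv = np.linalg.inv(Sigma)
    return -0.5 * T * np.linalg.slogdet(Sigma)[1] - 0.5 * np.trace(E @ Sinv @ E.T)
```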

The present paper achieves two goals. The first is the derivation of the Bayes estimator of the LTS model under the entropy loss. We show that the entropy loss on $(\Phi,\Sigma)$ can be written as the sum of a loss pertaining to the covariance matrix $\Sigma$ and a loss on the normalised estimation error of $\Phi$, and that it is non-separable in $\Sigma$ and $\Phi$. The $\Phi$-part of the entropy loss for the LTS takes the form $\mathrm{tr}\{\hat\Sigma^{-1}(\Phi-\hat\Phi)'E_{(X\mid\Phi,\Sigma)}(X'X)(\Phi-\hat\Phi)\}$, where $\hat\Sigma$ is the Bayes estimator of $\Sigma$. Under the entropy loss, the Bayes estimator distinctly differs from the posterior mean and differs from that of the iid multivariate normal model. The part of the intrinsic loss function associated with the regression coefficients turns out to be related to a conventional loss function. For estimation of a matrix parameter such as $\Phi$ in the simultaneous equations context, Zellner (Citation1978, Citation1998) proposed a 'precision of estimation' loss that can also be written as $\mathrm{tr}\{\Sigma^{-1}(\Phi-\hat\Phi)'(X'X)(\Phi-\hat\Phi)\}$. However, in Zellner's simultaneous equations model $X'X$ is taken as given, whereas in the LTS model $X'X$ involves predetermined variables and its expectation depends on the parameters $(\Phi,\Sigma)$.

The second goal concerns numerical computation of the intrinsic Bayes estimator via Markov chain Monte Carlo (MCMC). We propose a general algorithm that generates regressors as latent parameters in the simulation of the posteriors of the LTS parameters. Data augmentation in this study differs from that in Tanner and Wong's (Citation1987) seminal paper in both motivation and implementation. Tanner and Wong use data augmentation to alter the likelihood function for easier MCMC simulation of the posteriors. In this study, the likelihood function of the generated data is the same as the likelihood of the sample data. Here, data augmentation does not make posterior simulation easier. Instead, it makes it possible to compute the frequentist moment $E_{(X\mid\Phi,\Sigma)}(X'X)$ of the LTS variables. The frequentist moment, simulated jointly with the parameters, is used to produce Bayes estimates under the entropy loss.

Besides the choice of loss function, the choice of prior also plays a pivotal role in Bayesian estimation. The Jeffreys prior on $\Sigma$ (see Zellner, Citation1971) is a noninformative prior that gives rise to conditional posteriors in well-known distributions. Ni et al. (Citation2007) conducted Bayesian estimation of the VAR model under the entropy loss, using the Jeffreys prior for $\Sigma$. However, despite its popularity the Jeffreys prior is known to produce unsatisfactory results in multi-parameter settings. In this study we simulate the LTS model under a combination of a normal prior on the regression parameters and the Yang and Berger (Citation1994) reference prior on $\Sigma$. The conditional posterior of $\Sigma$ is simulated using a Metropolis-Hastings algorithm. Our empirical application shows that despite the fact that LTS models involve a large number of parameters and a large number of latent variables, the data-augmentation algorithm is quite efficient.

In Section 2 of the paper, we derive the Bayes estimator of LTS models under the entropy loss function and discuss computation of the weighting matrix in the Bayes estimator. In Section 3, we present a general algorithm using generated data as latent parameters. In Section 4, we lay out the MCMC algorithm for computing $(\Phi,\Sigma)$ in the LTS model. In Section 5, we first compare the intrinsic Bayes estimator with other estimators in a numerical example and then estimate an LTS model using seasonally unadjusted macroeconomic data. In Section 6 we offer concluding remarks.

2. Entropy loss function for the iid multivariate and LTS models

2.1. Entropy loss function for the iid model

We first consider the entropy loss function (Robert, Citation1994, p. 74) for a multivariate normal distribution. Let $Y=(x_1,x_2,\dots,x_T)$ be a random sample from $N_p(\mu,\Sigma)$. One can compute the entropy loss function as
$L(\tilde\mu,\tilde\Sigma;\mu,\Sigma)=\int\log\frac{p(x\mid\mu,\Sigma)}{p(x\mid\tilde\mu,\tilde\Sigma)}\,p(x\mid\mu,\Sigma)\,dx=\frac{T}{2}\left[\mathrm{tr}(\Sigma\tilde\Sigma^{-1})-\log|\Sigma\tilde\Sigma^{-1}|-p+(\tilde\mu-\mu)'\tilde\Sigma^{-1}(\tilde\mu-\mu)\right].$
Here $p(x\mid\mu,\Sigma)$ is the density of $N_p(\mu,\Sigma)$. Clearly, the loss function has two parts. One part is related to the means $\tilde\mu$ and $\mu$ (with $\tilde\Sigma$ as the weighting matrix), and the other part is related to $\tilde\Sigma$ and $\Sigma$. The following fact states that the Bayes estimator of $\mu$ is the posterior mean of $\mu$ but that of $\Sigma$ is larger than the posterior mean of $\Sigma$.

Fact 2.1

Under the entropy loss $L$, the generalised Bayes estimator of $(\mu,\Sigma)$ is $\hat\mu_{iid}=E(\mu\mid Y)$ and $\hat\Sigma_{iid}=E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)$.

Note that $Y$ represents the data; the expectation $E(\cdot)$ and variance $\mathrm{var}(\cdot)$ are taken with respect to the posterior distribution. The proof is in the appendix.
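As an illustration of Fact 2.1, the following minimal sketch turns posterior draws of $(\mu,\Sigma)$ into the two estimates; the draws themselves are assumed to come from some existing sampler, and the function name is illustrative.

```python
import numpy as np

def iid_intrinsic_estimates(mu_draws, Sigma_draws):
    """Fact 2.1: mu_hat = E(mu | Y); Sigma_hat = E(Sigma | Y) + var(mu | Y).

    mu_draws    : M x p array of posterior draws of mu.
    Sigma_draws : M x p x p array of posterior draws of Sigma.
    """
    mu_hat = mu_draws.mean(axis=0)
    Sigma_hat = Sigma_draws.mean(axis=0) + np.cov(mu_draws, rowvar=False)
    return mu_hat, Sigma_hat
```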

2.2. Entropy loss functions for LTS models

Recall that for the LTS model (Equation2), the likelihood function of $(\Phi,\Sigma)$ is of the form (Equation3). The entropy loss for the LTS model is
(4) $L(\tilde\Phi,\tilde\Sigma;\Phi,\Sigma)=E_{(Y\mid\Phi,\Sigma)}\log\frac{f(Y\mid\Phi,\Sigma)}{f(Y\mid\tilde\Phi,\tilde\Sigma)},$
where, for computing the expectation on the right-hand side, $(\tilde\Phi,\tilde\Sigma)$ is not a function of $Y$. The entropy loss $L(\tilde\Phi,\tilde\Sigma;\Phi,\Sigma)$ can be decomposed into two parts. One part measures the loss associated with the covariance matrix only, while the second part measures the loss on the coefficients $\Phi$ but is related to the covariance matrix $\Sigma$ as well. Because $y_t-x_t\Phi$ ($t=1,\dots,T$) are iid $N_p(0,\Sigma)$, we have
$E_{(X\mid\Phi,\Sigma)}(Y-X\Phi)=0,\quad E_{(X\mid\Phi,\Sigma)}\{(Y-X\Phi)'X\Phi\}=0,\quad E_{(X\mid\Phi,\Sigma)}\{(Y-X\Phi)'(Y-X\Phi)\}=T\Sigma.$
Then
$E_{(X\mid\Phi,\Sigma)}\left\{\log\frac{|\Sigma|^{-T/2}\,\mathrm{etr}\{-\frac12\Sigma^{-1}(Y-X\Phi)'(Y-X\Phi)\}}{|\hat\Sigma|^{-T/2}\,\mathrm{etr}\{-\frac12\hat\Sigma^{-1}(Y-X\hat\Phi)'(Y-X\hat\Phi)\}}\right\}$
$=\frac{T}{2}\left(\log|\hat\Sigma\Sigma^{-1}|-p\right)+\frac12\,\mathrm{tr}\,E_{(X\mid\Phi,\Sigma)}\left\{(Y-X\hat\Phi)\hat\Sigma^{-1}(Y-X\hat\Phi)'\right\}$
$=\frac{T}{2}\left(\log|\hat\Sigma\Sigma^{-1}|-p\right)+\frac{T}{2}\mathrm{tr}(\hat\Sigma^{-1}\Sigma)+\frac12\,\mathrm{tr}\,E_{(X\mid\Phi,\Sigma)}\left\{X(\Phi-\hat\Phi)\hat\Sigma^{-1}(\Phi-\hat\Phi)'X'\right\}$
$=\frac{T}{2}\left\{\mathrm{tr}(\hat\Sigma^{-1}\Sigma)-\log|\hat\Sigma^{-1}\Sigma|-p\right\}+\frac12\,\mathrm{tr}\left\{\hat\Sigma^{-1}(\Phi-\hat\Phi)'E_{(X\mid\Phi,\Sigma)}(X'X)(\Phi-\hat\Phi)\right\}.$
The result of this derivation is summarised in the following lemma.

Lemma 2.1

The entropy loss function for the LTS model is
(5) $L(\tilde\Phi,\tilde\Sigma;\Phi,\Sigma)=\frac{T}{2}\left\{\mathrm{tr}(\tilde\Sigma^{-1}\Sigma)-\log|\tilde\Sigma^{-1}\Sigma|-p\right\}+\frac{T}{2}\,\mathrm{tr}\left\{\tilde\Sigma^{-1}(\Phi-\tilde\Phi)'W(\Phi-\tilde\Phi)\right\},$
where
(6) $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X).$

This lemma can be proved using algebra similar to that in the iid case. However, there is an important difference. For the LTS model, the Bayes estimators involve the matrix $W$, a frequentist expectation of $X'X$ for given parameters $(\Phi,\Sigma)$. For the iid case, no such term is present. The next theorem gives the form of the Bayes estimators under the entropy loss.

Theorem 2.1

The generalised Bayes estimator of $(\Phi,\Sigma)$ under the entropy loss is
(7) $\hat\Phi_E=\{E(W\mid Y)\}^{-1}E(W\Phi\mid Y),$

(8) $\hat\Sigma_E=E(\Sigma\mid Y)+E\{(\Phi-\hat\Phi_E)'W(\Phi-\hat\Phi_E)\mid Y\}.$

The above theorem can be proved in the same way as Fact 2.1.

In the special case with no lag coefficients in the regression, we have $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X)=\frac{1}{T}X_0'X_0$, which is not a function of $\Sigma$ and $\Phi$. It follows that the Bayes estimator $\hat\Phi_E$ is the posterior mean, as it is for the iid model. This observation is stated in the following remark.

Remark 1

If $B_j=0$ for $j>0$, then $\hat\Phi_E=E(\Phi\mid Y)$ and $\hat\Sigma_E=E(\Sigma\mid Y)+\frac{1}{T}E\{(\Phi-\hat\Phi_E)'X_0'X_0(\Phi-\hat\Phi_E)\mid Y\}$.

However, the Bayes estimator for the LTS model is generally different from that of the iid case. The Bayes estimator $\hat\Phi_E$ for the LTS model is not the posterior mean. To compare the estimator $\hat\Phi_E$ with the posterior mean, note that in general $\hat\Phi_E=E(\Phi\mid Y)+\{E(W\mid Y)\}^{-1}\mathrm{Cov}(W,\Phi\mid Y)$. Because $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X)$ and $\Phi$ are likely to be positively correlated, the Bayes estimator of $\Phi$ under the intrinsic loss is likely to be larger than the posterior mean. It is known that the MLE and the posterior mean of $\Phi$ under a diffuse prior are likely to have a downward bias when the true parameters are close to a random walk, a typical pattern of macroeconomic data. The form of the Bayes estimator of $\Phi$ based on the intrinsic loss is helpful in correcting the bias in the posterior mean.

The estimator in the LTS model involves the frequentist expectation $W$. The $W$ matrix depends on the specification of the regressors $X$. If the regressors include lags of $Y$, computation of the $W$ matrix becomes nontrivial.

Using the notation in Equation (Equation1), the frequentist expectation matrix can be written as
$E_{(X\mid\Phi,\Sigma)}(X'X)=\begin{pmatrix}X_0'X_0 & E_{(X\mid\Phi,\Sigma)}(X_0'X_1)\\ E_{(X\mid\Phi,\Sigma)}(X_1'X_0) & E_{(X\mid\Phi,\Sigma)}(X_1'X_1)\end{pmatrix}.$

For the block $X_0'X_0$, which involves only exogenous variables, there is no need to derive a closed-form expression as a function of the parameters $\Phi$ and $\Sigma$. On the other hand, due to the serial correlation of $y_t$, computation of $E_{(X\mid\Phi,\Sigma)}(X_1'X_1)$ is not straightforward, and in the presence of exogenous variables no analytical expression for $W$ is available. In the following, we discuss approaches to Bayesian estimation under the entropy loss for the general LTS model.

3. Approaches to computing the expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$

Theorem 2.1 shows that under the entropy loss the Bayes estimator of $(\Phi,\Sigma)$ involves the frequentist expectation $W=\frac{1}{T}E_{(X\mid\Phi,\Sigma)}(X'X)$, and we need to compute the posterior moments $E(W\mid Y)$, $E(W\Phi\mid Y)$, and $E(\Phi'W\Phi\mid Y)$.

The frequentist expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$ depends on $\Phi$ and $\Sigma$. For the LTS model, $E_{(X\mid\Phi,\Sigma)}(X'X)$ does not have an analytical form and needs to be computed numerically for a given $(\Phi,\Sigma)$. We use $Y$ and $X$ to denote the observed data in the LTS model (Equation2). To compute $E_{(X\mid\Phi,\Sigma)}(X'X)$, we generate data $Y^*$ and $X^*$ from the same model (Equation2) for given parameters $\Phi$ and $\Sigma$. There is only one observed data set $(Y,X)$, but there are many sets of generated data $(Y^*,X^*)$. Since $\Phi$ and $\Sigma$ need to be simulated by an MCMC algorithm, $Y^*$ and $X^*$ need to be generated for each draw of $\Phi$ and $\Sigma$.

One approach to computing $E_{(X\mid\Phi,\Sigma)}(X'X)$ is straightforward but time-consuming: for each $\Phi_k$ and $\Sigma_k$ drawn in the $k$th MCMC cycle, we generate many sets of $X^*$ and use the average of $X^{*\prime}X^*$ to approximate $E_{(X\mid\Phi,\Sigma)}(X'X)$. While this approach is possible in theory, its high computational cost renders it infeasible in practice. For practical purposes, we must take an alternative approach to compute the Bayes estimates.

Fortunately, there is an alternative approach that requires little additional computational cost beyond simulating $(\Phi,\Sigma)$. Suppose we simulate one set of data $X_k^*$ from the LTS model in each MCMC cycle with the simulated parameters $(\Phi_{k-1},\Sigma_{k-1})$, and then simulate the parameters of the next MCMC cycle, $(\Phi_k,\Sigma_k)$, conditional on both the sample data $X$ and the simulated data $X_k^*$. We will demonstrate that posterior moments such as $E(W\mid Y)$, $E(W\Phi\mid Y)$, and $E(\Phi'W\Phi\mid Y)$ can be computed from the simulated parameters $(\Phi_k,\Sigma_k)$ and the jointly simulated data $X_k^*$ (for $k=1,\dots,M$). The simulated data are in essence latent parameters. They are not of interest per se, but they are useful for simulation of the quantity of interest (i.e. the frequentist expectation $W$). Data augmentation is not uncommon in Bayesian simulations, but as noted in the introduction, this data-augmented simulation approach differs from its other uses in the econometrics and statistics literature. One question of practical importance remains: the simulated lag block $X_1^*$ has dimension $T\times Lp$, which can contain a large number of elements. Do we have to simulate very long Markov chains to ensure that the averages are good approximations of the posterior moments? Fortunately, our numerical results show that the answer is 'no'.
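To make the idea concrete, here is a minimal sketch of how the intrinsic Bayes estimates (7) and (8) can be assembled from the joint MCMC output $(\Phi_k,\Sigma_k,X_k^*)$, replacing $E_{(X\mid\Phi,\Sigma)}(X'X)$ with $X_k^{*\prime}X_k^*$ in each cycle. The array layout and function name are illustrative assumptions.

```python
import numpy as np

def intrinsic_estimates_lts(Phi_draws, Sigma_draws, Xstar_draws, T):
    """Theorem 2.1 via MCMC output, with W approximated by X_k*' X_k* / T per cycle.

    Phi_draws   : M x (q+Lp) x p posterior draws of Phi.
    Sigma_draws : M x p x p posterior draws of Sigma.
    Xstar_draws : M x T x (q+Lp) regressor matrices generated jointly with the parameters.
    """
    M = len(Phi_draws)
    XtX = np.einsum('mti,mtj->mij', Xstar_draws, Xstar_draws) / T   # X_k*' X_k* / T
    EW = XtX.mean(axis=0)                                            # approximates E(W | Y)
    EWPhi = np.einsum('mij,mjk->ik', XtX, Phi_draws) / M             # approximates E(W Phi | Y)
    Phi_E = np.linalg.solve(EW, EWPhi)                               # equation (7)
    # equation (8): E(Sigma | Y) + E{(Phi - Phi_E)' W (Phi - Phi_E) | Y}
    dev = Phi_draws - Phi_E
    quad = np.einsum('mji,mjk,mkl->il', dev, XtX, dev) / M
    Sigma_E = Sigma_draws.mean(axis=0) + quad
    return Phi_E, Sigma_E
```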

In the following we propose a general algorithm that formalises the data-augmentation idea discussed above.

3.1. A general algorithm using data as latent parameters

Suppose that the observed data $X$ have the density $f(x\mid\theta)$, where the parameter vector $\theta$ is unknown. The prior $\pi(\theta)$ can be informative or noninformative. Let $X^*$ be a random vector (or matrix) with the same density $f(x\mid\theta)$. Let $h(\theta)$ be a function of the parameters $\theta$. We are interested in the posterior mean of the quantity $E_{(X^*\mid\theta)}g(X^*,h(\theta))$ given the data $X$.

Our algorithm is based on the following fact:
$E\{E_{(X^*\mid\theta)}g(X^*,h(\theta))\mid X\}=\frac{\int\left\{\int g(x,h(\theta))f(x\mid\theta)\,dx\right\}f(X\mid\theta)\pi(\theta)\,d\theta}{\int f(X\mid\theta)\pi(\theta)\,d\theta}=\int\!\!\int g(x,h(\theta))\,\pi(x,\theta\mid X)\,dx\,d\theta,$
where
(9) $\pi(x,\theta\mid X)=\frac{f(x\mid\theta)f(X\mid\theta)\pi(\theta)}{\int\!\!\int f(\tilde x\mid\tilde\theta)f(X\mid\tilde\theta)\pi(\tilde\theta)\,d\tilde x\,d\tilde\theta}.$
If we have a random sample $(X_k^*,\theta_k)$, $k=1,\dots,M$, from the joint distribution (Equation9), we can estimate $E[E_{(X^*\mid\theta)}\{g(X^*,h(\theta))\}\mid X]$ by
$\hat E\big[E\{g(X^*,h(\theta))\mid\theta\}\mid X\big]=\hat E_{(X^*,\theta\mid X)}\{g(X^*,h(\theta))\}=\frac{1}{M}\sum_{k=1}^M g(X_k^*,h(\theta_k)).$

The problem becomes one of generating observations from the joint distribution of $(X^*,\theta)$ given the data $X$. For this task the following MCMC method can be used.

Suppose that at the beginning of cycle $k$ we have $(X_{k-1}^*,\theta_{k-1})$.

Simulating the full conditional posterior: we sample from $\pi(\theta\mid X^*,X)\propto f(X^*\mid\theta)\,f(X\mid\theta)\,\pi(\theta)$.

Step 1. Simulate $X_k^*\sim f(x\mid\theta_{k-1})$.

Step 2. Simulate $\theta_k\sim\pi(\theta\mid X_k^*,X)\propto f(X_k^*\mid\theta)\,f(X\mid\theta)\,\pi(\theta)$.
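The two steps above can be wired into a generic sampler such as the following sketch. The callables `simulate_data`, `draw_theta` and `g` are model-specific and user-supplied; they, the burn-in handling and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def latent_data_mcmc(X_obs, theta0, simulate_data, draw_theta, g, M=10000, burn=1000):
    """Generic latent-data sampler for E{ E_{X*|theta} g(X*, h(theta)) | X }.

    simulate_data(theta)       -> one draw X* ~ f(x | theta)          (Step 1)
    draw_theta(X_star, X_obs)  -> one draw from pi(theta | X*, X)     (Step 2)
    g(X_star, theta)           -> quantity whose posterior mean is wanted
    """
    theta, total, kept = theta0, 0.0, 0
    for k in range(M + burn):
        X_star = simulate_data(theta)            # Step 1
        theta = draw_theta(X_star, X_obs)        # Step 2
        if k >= burn:
            total = total + g(X_star, theta)     # running sum; g may return an array
            kept += 1
    return total / kept                          # Monte Carlo estimate of the posterior mean
```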

4. Bayesian estimation of (Φ,Σ) in LTS models

4.1. Priors

The Bayes estimator of the LTS model depends on the prior of $(\Phi,\Sigma)$. We assume prior independence, so the prior $\pi(\Phi,\Sigma)$ is $\pi(\varphi)\pi(\Sigma)$, the product of the priors for $\Phi$ and $\Sigma$.

For estimation of the regression coefficients $\Phi$, a popular informative prior for $\varphi=\mathrm{vec}(\Phi)$ is the normal distribution $N(\varphi_0,M_0)$, with hyperparameters $\varphi_0$ and $M_0$:
(10) $\pi_N(\varphi)\propto|M_0|^{-1/2}\exp\left\{-\frac12(\varphi-\varphi_0)'M_0^{-1}(\varphi-\varphi_0)\right\}.$

A popular class of noninformative priors on $\Sigma$ is $\pi_b(\Sigma)\propto1/|\Sigma|^{b/2}$. If $b=p+1$, $\pi_b(\Sigma)$ becomes the Jeffreys prior (see Zellner, Citation1971).

Ni et al. (Citation2007) examined the intrinsic Bayes estimator under the prior $\pi_b(\varphi,\Sigma)=\pi_N(\varphi)\pi_b(\Sigma)$.

In the appendix we show that the posteriors $\pi(\Sigma\mid X^*,X)$ and $\pi(\Phi\mid X^*,X)$ can be obtained in analytical form.

As mentioned in the introduction, in multiple-parameter settings the Jeffreys prior often has undesirable properties. Bernardo (Citation1979) proposed an approach that derives a reference prior by breaking a single multiparameter problem into a consecutive series of problems with fewer parameters. For examples where reference priors produce more desirable estimates than Jeffreys priors, see Berger and Bernardo (Citation1992) and Sun and Berger (Citation1998), among others. In estimating the variance-covariance matrix $\Sigma$ based on an iid random sample from a normal population with known mean, Yang and Berger (Citation1994) re-parameterised the matrix $\Sigma$ as $O'DO$, where $D$ is a diagonal matrix whose elements are the eigenvalues of $\Sigma$ (in increasing or decreasing order) and $O$ is an orthogonal matrix. The following reference prior is derived by giving the vectorised $D$ higher priority than the vectorised $O$:
(11) $\pi_R(\Sigma)\propto\frac{1}{|\Sigma|\prod_{1\le i<j\le p}(d_i-d_j)},$
where $d_1>d_2>\cdots>d_p$ are the eigenvalues of $\Sigma$.
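For reference, a minimal sketch of the log of the prior density (11), up to the normalising constant, might look as follows; the helper name is illustrative.

```python
import numpy as np

def log_reference_prior(Sigma):
    """log pi_R(Sigma) up to a constant: -log|Sigma| - sum_{i<j} log(d_i - d_j)."""
    d = np.sort(np.linalg.eigvalsh(Sigma))[::-1]          # eigenvalues in decreasing order
    gaps = d[:, None] - d[None, :]
    log_gaps = np.log(gaps[np.triu_indices_from(gaps, k=1)])
    return -np.sum(np.log(d)) - np.sum(log_gaps)
```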

For the numerical and empirical exercises in this study we use the normal-reference prior $\pi_N(\varphi)\pi_R(\Sigma)$. The conditional densities under the normal-reference prior are given in the appendix. Ni and Sun (Citation2003) proved that the posteriors of $(\Phi,\Sigma)$ are proper under the normal-reference prior. However, the conditional posterior $\pi(\Sigma\mid\Phi,X^*,X)$ does not have an analytical form and must be sampled numerically.

4.2. A simulation algorithm for LTS models under the normal-reference prior

We employ an MCMC method to sample from the posterior. In particular, we use the Gibbs sampling method (cf. Gelfand & Smith, Citation1990). The following algorithm simulates the posteriors of LTS parameters conditional on both the sample and generated data.

Suppose that at cycle $k$ we have $(\Phi_{k-1},\Sigma_{k-1})$ (with initial values of $\Phi$ and $\Sigma$, e.g. the MLE).

Algorithm MCMC:

Step 1. Generate Yk|Σk1,Φk1.

Simulate $y_{k,t}\sim N\big(x_{0t}B_{k-1,0}+\sum_{i=1}^L y_{k,t-i}B_{k-1,i},\;\Sigma_{k-1}\big)$ for $t=1,\dots,T$. Define
$Y_k=\begin{pmatrix}y_{k,1}\\ \vdots\\ y_{k,T}\end{pmatrix}\quad\text{and}\quad X_k=\begin{pmatrix}x_{01}&y_{k,0}&\cdots&y_{k,1-L}\\ x_{02}&y_{k,1}&\cdots&y_{k,2-L}\\ \vdots& & &\vdots\\ x_{0T}&y_{k,T-1}&\cdots&y_{k,T-L}\end{pmatrix}.$

Step 2. Generate Φk|Σk1,Yk,Y.

Simulate $\varphi_k=\mathrm{vec}(\Phi_k)\sim N(\mu_k,V_k)$, where
(12) $\mu_k=V_k\big[\{\Sigma_{k-1}^{-1}\otimes(X'X+X_k'X_k)\}\hat\varphi_{MLE}^k+M_0^{-1}\varphi_0\big];$
(13) $\hat\varphi_{MLE}^k=\mathrm{vec}\{(X'X+X_k'X_k)^{-1}(X'Y+X_k'Y_k)\};$
(14) $V_k=\{M_0^{-1}+\Sigma_{k-1}^{-1}\otimes(X'X+X_k'X_k)\}^{-1}.$
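A minimal sketch of Steps 1 and 2 is given below. It assumes the pre-sample values used to start the simulated series are supplied by the user (e.g. taken from the observed data), and it uses the column-stacking vec convention implied by the Kronecker products in (12)-(14); all names are illustrative.

```python
import numpy as np
rng = np.random.default_rng(1)

def step1_generate_data(x0, y_pre, Phi_prev, Sigma_prev, L):
    """Step 1: simulate y_{k,t} ~ N(x_{0t}B_0 + sum_i y_{k,t-i}B_i, Sigma_{k-1}), t = 1..T."""
    T, p = x0.shape[0], Sigma_prev.shape[0]
    hist = [row for row in y_pre[-L:]]          # pre-sample values, ending with y_0
    Yk, Xk = np.empty((T, p)), []
    for t in range(T):
        xrow = np.concatenate([x0[t]] + [hist[-i] for i in range(1, L + 1)])
        Yk[t] = xrow @ Phi_prev + rng.multivariate_normal(np.zeros(p), Sigma_prev)
        Xk.append(xrow)
        hist.append(Yk[t])
    return Yk, np.array(Xk)

def step2_draw_phi(X, Y, Xk, Yk, Sigma_prev, M0_inv, phi0):
    """Step 2: draw vec(Phi_k) from N(mu_k, V_k) as in (12)-(14)."""
    A = X.T @ X + Xk.T @ Xk
    data_prec = np.kron(np.linalg.inv(Sigma_prev), A)
    Vk = np.linalg.inv(M0_inv + data_prec)
    phi_mle = np.linalg.solve(A, X.T @ Y + Xk.T @ Yk).ravel(order='F')  # column-stacked vec
    mu_k = Vk @ (data_prec @ phi_mle + M0_inv @ phi0)
    return rng.multivariate_normal(mu_k, Vk)    # reshape with order='F' to recover Phi_k
```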

Steps 3 to 6 generate Σk|Σk1,Φk,Yk,Y.

Step 3: Calculate $S_k=S(\Phi_k)+S^*(\Phi_k)=(Y-X\Phi_k)'(Y-X\Phi_k)+(Y_k-X_k\Phi_k)'(Y_k-X_k\Phi_k)$. Decompose $\Sigma_{k-1}=O'DO$, where $O$ is an orthogonal matrix, $D=\mathrm{diag}(d_1,\dots,d_p)$ and $d_1>d_2>\cdots>d_p$. Let $d_i^\#=\log(d_i)$, $D^\#=\mathrm{diag}(d_1^\#,\dots,d_p^\#)$ and $\Sigma_{k-1}^\#=O'D^\#O$.

Step 4: Select a random symmetric $p\times p$ matrix $V$ with elements $v_{ij}=z_{ij}\big/\sqrt{\sum_{l\le m}z_{lm}^2}$, where $z_{ij}\sim N(0,1)$ ($1\le i\le j\le p$; the other elements of $V$ are defined by symmetry).

Step 5: Generate $\lambda\sim N(0,1)$ and set $\Psi=\Sigma_{k-1}^\#+\lambda V$. Decompose $\Psi=Q'C^\#Q$, where $Q$ is an orthogonal matrix, $C^\#=\mathrm{diag}(c_1^\#,\dots,c_p^\#)$ and $c_1^\#>c_2^\#>\cdots>c_p^\#$. Compute
$\beta_k=T\sum_{i=1}^p(d_i^\#-c_i^\#)+\frac12\mathrm{tr}\left[\left\{(\exp\Sigma_{k-1}^\#)^{-1}-(\exp\Psi)^{-1}\right\}S_k\right]+\sum_{i<j}\log(d_i^\#-d_j^\#)-\sum_{i<j}\log(c_i^\#-c_j^\#).$

Step 6: Define $C=\mathrm{diag}(\exp(c_1^\#),\dots,\exp(c_p^\#))$ and $\tilde\Sigma=Q'CQ$. Simulate $u\sim\mathrm{uniform}(0,1)$ and let
$\Sigma_k=\begin{cases}\tilde\Sigma, & \text{if } u\le\min\{1,\exp(\beta_k)\},\\ \Sigma_{k-1}, & \text{otherwise}.\end{cases}$

Note that the acceptance probability is $\min\{1,\exp(\beta_k)\}$ with $\exp(\beta_k)=\pi(\Psi\mid\Phi_k,Y_k,Y)/\pi(\Sigma_{k-1}^\#\mid\Phi_k,Y_k,Y)$, where the conditional posterior $\pi(\Sigma^\#\mid\Phi,Y^*,Y)$ is given in (EquationA14). To accelerate convergence, we repeat Step 6 up to five times until a new candidate is accepted.
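For illustration, a single Metropolis update corresponding to Steps 3-6 could be sketched as follows. This is only an outline under the stated proposal (a normal step in a random symmetric direction on the log-matrix scale); the helper names are illustrative and not the paper's code.

```python
import numpy as np
rng = np.random.default_rng(2)

def log_gaps(v):
    """sum_{i<j} log(v_(i) - v_(j)) for the decreasingly ordered entries of v."""
    v = np.sort(v)[::-1]
    diff = v[:, None] - v[None, :]
    return np.sum(np.log(diff[np.triu_indices(len(v), k=1)]))

def sigma_metropolis_step(Sigma_prev, Phi_k, Y, X, Yk, Xk, T):
    """Steps 3-6: propose Sigma on the log-matrix scale, accept with prob min{1, exp(beta_k)}."""
    p = Sigma_prev.shape[0]
    # Step 3: pooled residual cross-products and log decomposition of the current Sigma
    R, Rk = Y - X @ Phi_k, Yk - Xk @ Phi_k
    S_k = R.T @ R + Rk.T @ Rk
    d, O = np.linalg.eigh(Sigma_prev)                  # Sigma_prev = O diag(d) O'
    d_hash = np.log(d)
    Sig_hash = O @ np.diag(d_hash) @ O.T
    # Step 4: random symmetric direction with unit norm over the upper triangle
    Z = rng.standard_normal((p, p))
    V = np.triu(Z) + np.triu(Z, 1).T
    V /= np.sqrt(np.sum(np.triu(Z) ** 2))
    # Step 5: candidate Psi = Sigma# + lambda*V and the log acceptance ratio beta_k
    Psi = Sig_hash + rng.standard_normal() * V
    c_hash, Q = np.linalg.eigh(Psi)
    Sigma_cand = Q @ np.diag(np.exp(c_hash)) @ Q.T     # exp(Psi)
    beta = (T * np.sum(d_hash - c_hash)
            + 0.5 * np.trace((np.linalg.inv(Sigma_prev) - np.linalg.inv(Sigma_cand)) @ S_k)
            + log_gaps(d_hash) - log_gaps(c_hash))
    # Step 6: accept or reject
    return Sigma_cand if np.log(rng.uniform()) < min(0.0, beta) else Sigma_prev
```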

4.3. Computing the posterior average loss

From Lemma 2.1, given the estimate $(\hat\Phi,\hat\Sigma)$, which is computed for a given data sample $Y$, we write the posterior average loss $E_{(\Phi,\Sigma)\mid Y}L(\hat\Phi,\hat\Sigma;\Phi,\Sigma)$ as $E_{(\Phi,\Sigma)\mid Y}L_1(\hat\Sigma;\Sigma)+E_{(\Phi,\Sigma)\mid Y}L_2(\hat\Phi,\hat\Sigma;\Phi,\Sigma)$. We decompose the $\Sigma_k$ in the $k$th MCMC cycle as $\Sigma_k=Q'D_kQ$, where $D_k$ is the diagonal matrix consisting of the eigenvalues of $\Sigma_k$, $D_k=\mathrm{diag}(d_{k1},d_{k2},\dots,d_{kp})$, and $Q$ is an orthogonal matrix with $Q'Q=I$.

The posterior average loss under the intrinsic loss can be computed from the posterior draws $(\Phi_k,\Sigma_k,X_k)$ generated by the MCMC procedure ($k=1,2,\dots,M$), with
(15) $\hat E_{(\Phi,\Sigma)\mid Y}L_1(\hat\Sigma;\Sigma)=\hat E_{(\Phi,\Sigma)\mid Y}\frac{T}{2}\left\{\mathrm{tr}(\hat\Sigma^{-1}\Sigma)-\log|\hat\Sigma^{-1}\Sigma|-p\right\}=\frac{T}{2}\left\{\mathrm{tr}\Big(\hat\Sigma^{-1}\frac{1}{M}\sum_{k=1}^M\Sigma_k\Big)+\log|\hat\Sigma|-p-\frac{1}{M}\sum_{k=1}^M\sum_{i=1}^p\log d_{ki}\right\};$
(16) $\hat E_{(\Phi,\Sigma)\mid Y}L_2(\hat\Phi,\hat\Sigma;\Phi,\Sigma)=\hat E_{(\Phi,\Sigma)\mid Y}\frac12\mathrm{tr}\big[\hat\Sigma^{-1}(\Phi-\hat\Phi)'E_{(X\mid\Phi,\Sigma)}(X'X)(\Phi-\hat\Phi)\big]=\frac12\left\{\mathrm{tr}\Big(\hat\Sigma^{-1}\frac{1}{M}\sum_{k=1}^M\Phi_k'X_k'X_k\Phi_k\Big)+\mathrm{tr}\Big(\hat\Phi\hat\Sigma^{-1}\hat\Phi'\frac{1}{M}\sum_{k=1}^M X_k'X_k\Big)\right\}-\mathrm{tr}\Big(\hat\Sigma^{-1}\hat\Phi'\frac{1}{M}\sum_{k=1}^M X_k'X_k\Phi_k\Big),$
where $E_{(X\mid\Phi,\Sigma)}(X'X)=TW$ is approximated in each cycle by $X_k'X_k$.

Note that all terms in the posterior entropy loss are functions of the simulated $\Sigma$, $\Phi$, and $X$ over the MCMC cycles. The required moments of the simulated parameters can be accumulated within the MCMC cycles, just as the posterior mean is, without the need to store all of the simulated parameters.
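A sketch of this computation from stored draws (or, equivalently, from running sums accumulated during the chain) might look as follows; array shapes and names are illustrative.

```python
import numpy as np

def posterior_average_loss(Phi_hat, Sigma_hat, Phi_draws, Sigma_draws, Xk_draws, T):
    """Evaluate (15) and (16) from MCMC output (Phi_k, Sigma_k, X_k), k = 1..M."""
    M, p = len(Sigma_draws), Sigma_hat.shape[0]
    Sinv = np.linalg.inv(Sigma_hat)
    # (15): Sigma-related loss L1
    mean_Sigma = Sigma_draws.mean(axis=0)
    mean_logdet = np.mean([np.linalg.slogdet(S)[1] for S in Sigma_draws])
    L1 = 0.5 * T * (np.trace(Sinv @ mean_Sigma)
                    + np.linalg.slogdet(Sigma_hat)[1] - p - mean_logdet)
    # (16): Phi-related loss L2, with E(X'X | Phi, Sigma) replaced by X_k'X_k per cycle
    XtX = np.einsum('mti,mtj->mij', Xk_draws, Xk_draws)
    dev = Phi_draws - Phi_hat
    L2 = 0.5 * np.mean([np.trace(Sinv @ dev[k].T @ XtX[k] @ dev[k]) for k in range(M)])
    return L1, L2
```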

5. A numerical example and an empirical study

5.1. A numerical example

In this section we first simulate data from an LTS model
(17) $y_t=c+(y_{t-1},x_t)B+\epsilon_t,$
for $t=1,\dots,T$. The dimension of the VAR variable $y_t$ is 5. The exogenous variable $x_t$ is a scalar representing seasonal cycles, with $x_1=0$, $x_2=1$, $x_3=0$, $x_4=1$, and $x_t=x_{t-4}$ for $t>4$.

Now we let the true parameters be
$\Sigma=\mathrm{diag}(1,2,3,4,5),\qquad c=(0,0,0,0,0),\qquad B=\begin{pmatrix}0.5\,I_5\\ 0.1\;\;0.2\;\;0.3\;\;0.4\;\;0.5\end{pmatrix}.$

The last row of the matrix $B$ contains the parameters of the seasonal dummy. The discussion in Section 2 shows that with this parameter setting there is no closed-form expression for the frequentist expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$, and we need to simulate it using the data-augmentation algorithm proposed in Section 4.

We generate one data sample of $T=100$ observations from the LTS model with the above parameters for $\Sigma$ and $\Phi$. The MLEs of the parameters are

$\hat\Sigma_{MLE}=\begin{pmatrix}0.93201&0.12358&0.08462&0.27246&0.11782\\ 0.12358&1.76018&0.02217&0.16508&0.26475\\ 0.08462&0.02217&2.44980&0.07874&0.01940\\ 0.27246&0.16508&0.07874&3.44815&0.80056\\ 0.11782&0.26475&0.01940&0.80056&4.03612\end{pmatrix},\qquad \hat\Phi_{MLE}=\begin{pmatrix}0.05633&0.14700&0.01158&0.16497&0.26224\\ 0.51936&0.24397&0.21705&0.12090&0.25149\\ 0.02140&0.41804&0.03495&0.06221&0.28700\\ 0.04826&0.03265&0.50279&0.03351&0.10127\\ 0.06176&0.01439&0.14174&0.37051&0.01758\\ 0.01940&0.00140&0.08389&0.01734&0.39826\\ 0.15900&0.22462&0.21547&0.01671&1.06891\end{pmatrix}.$

We conduct the simulation with a diffuse prior on $\Phi$ and the Yang–Berger reference prior on $\Sigma$. The length of the MCMC chain is set at 100,000. The intrinsic Bayes estimates are

(18) $\hat\Sigma_E=\begin{pmatrix}1.25871&0.17644&0.08826&0.26711&0.13247\\ 0.17644&2.33691&0.00497&0.15278&0.28849\\ 0.08826&0.00497&2.97993&0.03929&0.07036\\ 0.26711&0.15278&0.03929&3.76133&0.74181\\ 0.13247&0.28849&0.07036&0.74181&4.57260\end{pmatrix},$
(19) $\hat\Phi_E=\begin{pmatrix}0.05655&0.09316&0.00092&0.11812&0.16943\\ 0.46112&0.18679&0.13921&0.09552&0.13973\\ 0.02891&0.37594&0.01426&0.05658&0.18949\\ 0.04417&0.03156&0.45267&0.02360&0.10665\\ 0.05809&0.02705&0.11909&0.33560&0.02602\\ 0.02400&0.00105&0.08313&0.02921&0.34976\\ 0.11188&0.08832&0.08980&0.04434&0.34897\end{pmatrix}.$

The acceptance rate for the Metropolis step employed for sampling of Σ from the posterior conditional on other parameters and data is 24%.

Now we compare the Bayes estimator with the MLE and posterior mean. The posterior mean obtained using Option 1 is

(20) $\hat\Sigma_{Mean}=\begin{pmatrix}1.17992&0.16657&0.08449&0.25618&0.12700\\ 0.16657&2.20489&0.00476&0.14659&0.27587\\ 0.08449&0.00476&2.81882&0.03801&0.06557\\ 0.25618&0.14659&0.03801&3.57382&0.71561\\ 0.12700&0.27587&0.06557&0.71561&4.35069\end{pmatrix},$
(21) $\hat\Phi_{Mean}=\begin{pmatrix}0.05990&0.09731&0.00193&0.12276&0.17558\\ 0.43287&0.18741&0.14185&0.09236&0.13907\\ 0.02772&0.35030&0.01270&0.05628&0.19118\\ 0.04650&0.03116&0.42790&0.02733&0.10462\\ 0.05718&0.02602&0.12006&0.31275&0.02437\\ 0.02373&0.00041&0.08285&0.02851&0.32792\\ 0.11167&0.08778&0.08838&0.04414&0.34923\end{pmatrix}.$

It is known that the posterior mean of $\Sigma$ minimises the expected posterior loss of $L_1$. But it does not minimise the intrinsic loss, because the estimator of $\Sigma$ also influences the weight on the $\Phi$-related loss in $L_2$. Table 1 reports the average posterior loss of the MLE, the posterior mean, and the Bayes estimator for the data sample generated in the example. The Bayes estimator improves the $L_2$-related risk with a tradeoff of a larger $L_1$-related risk. Table 1 shows that the Bayes estimator induces a lower posterior risk than the posterior mean by making the $L_2$-related risk substantially lower and the $L_1$-related risk only slightly higher. Both the posterior mean and the Bayes estimator dominate the MLE.

Table 1. Posterior average loss of the estimates in the Example.

5.2. An empirical study: seasonal effects in a macroeconomic model

We now turn to an empirical application of Bayesian estimation under the entropy loss. We estimate an LTS model consisting of seasonal dummies and four macroeconomic variables: the return of the Standard and Poor 500 stock price index (which represents weighted stock prices of large companies), the 3-month Treasury Bill rate, the growth rate of payroll (including government jobs as well as private sector jobs), and the growth rate of industrial production (in that order). All series are measured in percentage terms. The series are monthly data from 1970:1 to 2002:12 and are not seasonally adjusted. The payroll data are obtained from the Bureau of Labor Statistics; the remaining series are obtained from the Federal Reserve Board. There are 12 dummy variables, representing January to December.

The role of seasonal fluctuations in business cycles has been noted by a number of economists. Barsky and Miron (Citation1989) argued that for the U.S. economy the characteristics of seasonal fluctuations are similar to the conventional characterisations of business cycles. Cecchetti et al. (Citation1997) estimated production functions of various industries based on how their responses to seasonal shocks vary with the state of the business cycle. Ghysels (Citation1988) showed that univariate seasonal adjustment of endogenous variables is not harmless because information on the interactions among endogenous variables is lost. Miron and Beaulieu (Citation1996) provided a survey of econometric and economic issues in understanding business cycles through seasonal fluctuations. In the finance literature, numerous studies argue that stock returns appear to have a seasonal component. Rozeff and Kinney (Citation1976) documented a celebrated 'turn of the year' effect, which refers to the seemingly abnormally high returns in January and July, especially for stocks with small market capitalisations. A number of theories have been developed to explain the phenomenon. Reinganum (Citation1983) attributed the high stock return in January to end-of-year tax-loss selling in December. Chang and Pinegar (Citation1989) found that industrial production trails the seasonal movement of stock returns by one month. The point estimates of the seasonal effects reported in the literature are model dependent and based on OLS or MLE. We will estimate the seasonal effects; our primary interest lies in the comparison of the posterior mean with the Bayes estimate under the entropy loss.

In this section, we employ an LTS model
(22) $y_t=(x_{0t},y_{t-1},\dots,y_{t-L})\Phi+\epsilon_t,$
for $t=1,\dots,T$. The dimension of the VAR variable $y_t$ is four. The exogenous variable $x_{0t}=(x_{0t,1},\dots,x_{0t,12})$ is a 12-dimensional vector representing seasonal cycles, with $x_{0t,1}$ equal to 1 if period $t$ is January and 0 otherwise, $\dots$, and $x_{0t,12}$ equal to 1 if period $t$ is December and 0 otherwise.

Based on the Schwarz criterion, for each sample period the lag length $L$ of the LTS is 2. The Yang–Berger reference prior is applied to the covariance matrix $\Sigma$. The prior for the LTS coefficient $\varphi$ is a rather diffuse $N(0,M_0)$. Here $M_0$ is a diagonal matrix with 10.0 as the diagonal element for parameters corresponding to the dummy variables and 2.0 as the diagonal element for parameters corresponding to the lag coefficients. We draw the posterior from $M$ MCMC cycles after $0.1\times M$ burn-in runs. The MCMC length $M$ is set at 50,000, 100,000, and 1,000,000.

Under algorithm MCMC, reducing the length of the MCMC chain to 100,000 or 50,000 from 1,000,000 makes little difference. The MLE, posterior mean, and Bayes estimate of the covariance matrix $\Sigma$ are as follows. As dictated by the theoretical result, the Bayes estimate under the entropy loss, $\hat\Sigma_E$, is larger than the posterior mean $\hat\Sigma_{Mean}$.
$\hat\Sigma_{MLE}=\begin{pmatrix}19.775&0.335&0.094&0.059\\ 0.335&0.254&0.055&0.011\\ 0.094&0.055&0.937&0.070\\ 0.059&0.011&0.070&0.044\end{pmatrix}.$
With $M=1{,}000{,}000$, the posterior mean and entropy-based Bayes estimates are
$\hat\Sigma_{Mean}=\begin{pmatrix}20.908&0.354&0.100&0.063\\ 0.354&0.271&0.058&0.012\\ 0.100&0.058&0.996&0.075\\ 0.063&0.012&0.075&0.048\end{pmatrix},\qquad \hat\Sigma_E=\begin{pmatrix}22.729&0.384&0.101&0.078\\ 0.384&0.295&0.057&0.012\\ 0.101&0.057&1.074&0.081\\ 0.078&0.012&0.081&0.052\end{pmatrix}.$
The posterior standard deviations of the elements of the covariance matrix are
$\begin{pmatrix}1.538&0.124&0.235&0.052\\ 0.124&0.020&0.027&0.006\\ 0.235&0.027&0.074&0.012\\ 0.052&0.006&0.012&0.003\end{pmatrix}.$
The difference between the estimates $\hat\Sigma_{Mean}$ and $\hat\Sigma_E$ is large relative to the posterior standard deviations.

The above point estimates and standard deviations are similar to those with $M=50{,}000$. The MCMC algorithm yields posteriors with few outliers. This is because in Step 2 of the MCMC algorithm, $\varphi$ is generated using an average of the sample data and the generated data, rather than the sample data alone. A few outliers in the posterior may affect the posterior mean only slightly, but they can change the posterior risk and the Bayesian estimate substantially, because the few explosive parameter draws carry disproportionately large weights in the posterior average loss.

The intrinsic Bayes estimate dominates the posterior mean by a large margin in terms of posterior expected loss. The large difference in the posterior expected loss is mainly due to the difference in the $\Phi$-related risk, i.e. the quadratic term in (Equation5). This difference in risk is approximately $\frac12\mathrm{tr}\big[\hat\Sigma_E^{-1}E\{(\Phi-\hat\Phi_E)'W(\Phi-\hat\Phi_E)\mid Y\}\hat\Sigma_E^{-1}(\hat\Sigma_E-\hat\Sigma_{Mean})\big]$, which is proportional to the frequentist expectation $W$, and the latter is comparable to $X'X$. $X'X$ is quite large in this application, largely due to the strong serial correlation of the 3-month T-bill rate. As a result, with a larger $\hat\Sigma$ the Bayesian estimate substantially reduces the posterior risk compared with the posterior mean. Simulation with $M=1{,}000{,}000$ shows that the posterior average loss of the posterior mean estimate (3230.6) is larger than that of the intrinsic Bayes estimate (69.7). The lower overall posterior average loss of the intrinsic Bayes estimate is achieved by substantially lowering the risk of the quadratic part, from 3225.3 for the posterior mean to 61.8. The first term of the loss, related to $\Sigma$, is slightly larger for the intrinsic Bayes estimate (7.9) than for the posterior mean estimate (5.3). As noted earlier, the intrinsic Bayes estimate improves the $\Phi$-related loss with a tradeoff of a larger $\Sigma$-related loss. The empirical result shows that the Bayesian estimate induces a lower posterior average loss than the posterior mean by making the $\Phi$-related loss substantially lower and the $\Sigma$-related loss only slightly higher.

We now turn to comparing the estimates of the regression coefficients $\Phi$. Table 2 reports the MLE, the posterior mean, the intrinsic Bayes estimate, and the posterior standard deviations. As a consequence of applying a rather diffuse prior, the posterior mean $\hat\Phi_{Mean}$ is quite similar to the MLE. For stock returns of the Standard and Poor 500 index, the MLE and the posterior mean estimate indicate a moderate positive seasonal factor in January, which is smaller than the seasonal factors of March, April, October, November, and December. Most surprisingly, October registers the largest seasonal gain, despite the fact that the sample includes the October 1987 sell-off. The data of recent years suggest that the estimates of seasonality in large-capitalisation stock returns are quite sensitive to the sample period and regression model. In comparison to the MLE and the posterior mean, the intrinsic Bayesian estimate shows a smaller January effect and much smaller end-of-year positive seasonal returns. The sum of the seasonal coefficients of the MLE is above 11%, while that of the intrinsic Bayesian estimates is about half as much. The large discrepancy between the posterior mean and the intrinsic Bayesian estimate casts doubt on the robustness of the seasonality of returns of the Standard and Poor 500 stock index.

Table 2. Estimates of three equations.

Compared to the stock return, the seasonality of the industrial production growth rate is much more robust. The most distinct pattern is a steep decline in July followed by a surge in August; then weakness at the end of the year precedes a strong rebound in February. The strong showing of industrial production in February and August is consistent with the pattern reported in Chang and Pinegar (Citation1989), while the effect of Standard and Poor stock returns on predicted industrial production is quite small. The magnitude of the seasonal effects under the entropy-loss-based Bayesian estimates is on average slightly larger than that of the posterior mean.

Lastly, we examine the estimates of the employment growth rate equation. The most prominent seasonal patterns are the decline in January followed by a rebound in February and March, and the weakness in July followed by a recovery in September. The estimated seasonality in payroll growth is somewhat different from that of industrial production growth. Note that in the year 2002 about 83% of payroll consisted of service sector jobs, while 85% of industrial production concerned the manufacturing sector. The subject of interest is the point estimates. As with the industrial production equation, the entropy-loss-based Bayes estimates for the payroll growth rate equation are similar to the posterior mean.

In summary, the Bayesian estimates based on the entropy loss show a qualitatively similar seasonal pattern to that of the posterior mean estimates for industrial production and employment growth but a distinctly different one for stock returns. The posterior average loss of the Bayes estimates with respect to the entropy loss is substantially smaller than that of the posterior mean.

6. Concluding remarks

In this paper we investigate properties of the Bayes estimators of the LTS model parameters $(\Phi,\Sigma)$ derived from the entropy loss function. These estimators are distinctly different from those of the multivariate iid model because of the serial correlation of the time series variables. Bayesian computation under the entropy loss requires simulating a frequentist moment of the regressors. We propose a data-augmenting algorithm for simulating the posteriors and computing the Bayes estimators under the entropy loss and a normal-reference prior. The algorithm, which draws from the full conditional posterior, is shown to be quite efficient. A novel approach taken in this paper concerns generating data in an MCMC as latent parameters. This idea may be useful in other contexts for simulating complicated posterior moments. Our empirical application to a macroeconomic problem shows that the Bayes estimates under the entropy loss can differ substantially from the posterior mean.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Shawn Ni

Dr. Shawn Ni holds a PhD in Economics from University of Minnesota. He is currently Middlebush Professor of Economics and Adjunct Professor of Statistics at University of Missouri-Columbia. He conducts research on a wide range of empirical economics topics and Bayesian statistics.

Dongchu Sun

Dr. Dongchu Sun holds a PhD in Statistics from Purdue University. He is a research professor of statistics at the University of Nebraska-Lincoln and East China Normal University. His research interests include Bayesian analysis, small area estimation, decision theory, business and econometrics, space-time and longitudinal models, and smoothing splines.

References

Appendix. Proof of Fact 2.1 and posterior properties

A.1. Proof of Fact  2.1

Proof.

Let $(\tilde\mu,\tilde\Sigma)$ denote an arbitrary estimator of $(\mu,\Sigma)$. For the entropy loss function $L$ and posterior $\pi(\mu,\Sigma\mid Y)$, the expected posterior loss is
$R(\tilde\mu,\tilde\Sigma\mid Y)=\frac{T}{2}E\left\{(\tilde\mu-\mu)'\tilde\Sigma^{-1}(\tilde\mu-\mu)+\mathrm{tr}(\Sigma\tilde\Sigma^{-1})-\log|\Sigma\tilde\Sigma^{-1}|-p\;\Big|\;Y\right\}.$
The Bayes estimator, which minimises the expected posterior loss, can be derived from the first-order conditions. Note that because $\tilde\Sigma$ is symmetric,
$\frac{\partial R(\tilde\mu,\tilde\Sigma\mid Y)}{\partial\tilde\mu}=T\,E\{\tilde\Sigma^{-1}(\tilde\mu-\mu)\mid Y\}.$
Setting the derivative to 0 yields $\hat\mu_{iid}=E(\mu\mid Y)$.

The following identities are known (e.g. Harville, Citation1998, p. 327) for symmetric matrices $A$ and $B$:
$\frac{\partial\log|A|}{\partial A}=2A^{-1}-\mathrm{diag}(A^{-1}),\qquad \frac{\partial\,\mathrm{tr}(AB^{-1})}{\partial B}=-2B^{-1}AB^{-1}+\mathrm{diag}(B^{-1}AB^{-1}).$
Here $\mathrm{diag}(A)$ is the diagonal matrix whose diagonal elements are those of $A$. Using the conclusion that the estimator of $\mu$ is the posterior mean, we have $E\{(\hat\mu_{iid}-\mu)(\hat\mu_{iid}-\mu)'\mid Y\}=\mathrm{var}(\mu\mid Y)$. Applying this result to the derivative with respect to $\tilde\Sigma$, we have
$\frac{\partial R(\hat\mu_{iid},\tilde\Sigma\mid Y)}{\partial\tilde\Sigma}=\frac{T}{2}\Big\{-2\tilde\Sigma^{-1}\mathrm{var}(\mu\mid Y)\tilde\Sigma^{-1}+\mathrm{diag}\big(\tilde\Sigma^{-1}\mathrm{var}(\mu\mid Y)\tilde\Sigma^{-1}\big)-2\tilde\Sigma^{-1}E(\Sigma\mid Y)\tilde\Sigma^{-1}+\mathrm{diag}\big(\tilde\Sigma^{-1}E(\Sigma\mid Y)\tilde\Sigma^{-1}\big)+2\tilde\Sigma^{-1}-\mathrm{diag}(\tilde\Sigma^{-1})\Big\}$
$=\frac{T}{2}\Big\{2\tilde\Sigma^{-1}\big(I-[E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)]\tilde\Sigma^{-1}\big)-\mathrm{diag}\big(\tilde\Sigma^{-1}(I-[E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)]\tilde\Sigma^{-1})\big)\Big\}.$
The derivative is 0 when $I=[E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)]\tilde\Sigma^{-1}$, which yields $\hat\Sigma_{iid}=E(\Sigma\mid Y)+\mathrm{var}(\mu\mid Y)$.

A.2. Posterior properties

A.2.1. $(\theta\mid X^*,X)$ under the normal prior for $\Phi$ and a Jeffreys-type prior for $\Sigma$

A commonly used noninformative prior for $\Sigma$ is the Jeffreys prior $\pi_J(\Sigma)\propto|\Sigma|^{-(p+1)/2}$. The prior for $\Sigma$ in the RATS statistical package is a modified version of the Jeffreys prior, $\pi_A(\Sigma)\propto|\Sigma|^{-\{(L+1)p/2+1\}}$. Zellner's maximal data information (MDI) prior is $\pi_M(\Sigma)\propto|\Sigma|^{-1/2}$. For analysis of prior choice in VAR models see Ni and Sun (Citation2003) and Sun and Ni (Citation2004). We consider a class of joint priors,
(A1) $\pi_b(\varphi,\Sigma)=\pi_N(\varphi)\pi_b(\Sigma),$
where $\pi_N(\varphi)$ is the normal prior for $\varphi$ given by (Equation10), and $\pi_b(\Sigma)$ ($b\in\mathbb{R}$) is given by
(A2) $\pi_b(\Sigma)\propto\frac{1}{|\Sigma|^{b/2}}.$
Note that $\pi_{NJ}$, $\pi_{NA}$, and $\pi_{NM}$ are special cases of (EquationA1) when $b$ equals $p+1$, $(L+1)p+2$ and 1, respectively.

We propose using the posterior quantities conditional on the simulated data $X^*$, $h(\theta\mid X^*,X)$, instead of the marginal posterior $h(\theta\mid X)$, as the estimator of $h(\theta)$. Note that the posterior is
$f(\Phi,\Sigma\mid X^*,X)\propto\frac{1}{|\Sigma|^{T+b/2}}\,\mathrm{etr}\left\{-\frac12\Sigma^{-1}(Y-X\Phi)'(Y-X\Phi)\right\}\,\mathrm{etr}\left\{-\frac12\Sigma^{-1}(Y^*-X^*\Phi)'(Y^*-X^*\Phi)\right\}\exp\left\{-\frac12(\varphi-\varphi_0)'M_0^{-1}(\varphi-\varphi_0)\right\}.$
We would like to express the posterior of $\Sigma$ in terms of the data $X$ and $X^*$. Integrating out $\Phi$ results in
$f(\Sigma\mid X^*,X)=|M_0+\Sigma\otimes(X'X+X^{*\prime}X^*)^{-1}|^{-1/2}\,|\Sigma|^{-(T+b/2)}\,\mathrm{etr}\left\{-\frac12\Sigma^{-1}\big[(Y-X\hat\Phi_M)'(Y-X\hat\Phi_M)+(Y^*-X^*\hat\Phi_M^*)'(Y^*-X^*\hat\Phi_M^*)\big]\right\}$
$\times\exp\left\{-\frac12(\tilde\varphi-\varphi_0)'\big[M_0+\Sigma\otimes(X'X+X^{*\prime}X^*)^{-1}\big]^{-1}(\tilde\varphi-\varphi_0)\right\}\times\exp\left\{-\frac12(\hat\varphi_M-\hat\varphi_M^*)'\Big(\Sigma^{-1}\otimes\big[(X'X)^{-1}+(X^{*\prime}X^*)^{-1}\big]^{-1}\Big)(\hat\varphi_M-\hat\varphi_M^*)\right\},$
where the lower case denotes the vec operator: $\varphi=\mathrm{vec}(\Phi)$, $\tilde\Phi=(X'X+X^{*\prime}X^*)^{-1}(X'Y+X^{*\prime}Y^*)$, $\hat\Phi_M=(X'X)^{-1}X'Y$, and $\hat\Phi_M^*=(X^{*\prime}X^*)^{-1}X^{*\prime}Y^*$.

The posterior $f(\Phi,\Sigma\mid X^*,X)$ has a closed form when the prior for $\varphi$ can be written as $N(\varphi_0,M_0)$ with
(A3) $M_0=\Sigma\otimes\Omega_0,$
where $\Omega_0$ is an $(Lp+1)\times(Lp+1)$ known covariance matrix. In the extreme case $\Omega_0^{-1}\to0$, $M_0^{-1}\to0$ and the prior approaches a constant prior. Under the assumption (EquationA3), we have
$f(\Sigma\mid X^*,X)\propto|\Omega_0+(X'X+X^{*\prime}X^*)^{-1}|^{-p/2}\,|\Sigma|^{-T-\frac{b-Lp-1}{2}}\,\mathrm{etr}\left\{-\frac12 V\Sigma^{-1}\right\},$
where
$V(X,X^*)=(\hat\Phi_M-\hat\Phi_M^*)'\big[(X'X)^{-1}+(X^{*\prime}X^*)^{-1}\big]^{-1}(\hat\Phi_M-\hat\Phi_M^*)+(\tilde\Phi-\Phi_0)'\big[\Omega_0+(X'X+X^{*\prime}X^*)^{-1}\big]^{-1}(\tilde\Phi-\Phi_0)+(Y-X\hat\Phi_M)'(Y-X\hat\Phi_M)+(Y^*-X^*\hat\Phi_M^*)'(Y^*-X^*\hat\Phi_M^*).$
Thus $f(\Sigma\mid X^*,X)\sim IW\{V,\,2T+b-(L+1)p-2\}$. The mean of $\Sigma\mid X^*,X$ is $V(X,X^*)/\{2T+b-(L+1)p-2-p-1\}=V(X,X^*)/\{2T+b-(L+2)p-3\}$. The posterior mean $E(\Sigma\mid X)$ is estimated by $\sum_{j=1}^M E(\Sigma\mid X_j^*,X)/M$.

The marginal posterior $f(\Phi\mid X^*,X)$ can be obtained by integrating out $\Sigma$ in $f(\Phi,\Sigma\mid X^*,X)$. It is easy to verify that
(A4) $f(\Phi,\Sigma\mid X^*,X)\propto|\Sigma|^{-T-b/2}\,\mathrm{etr}\left\{-\frac12 U\Sigma^{-1}\right\}=|\Sigma|^{-(\nu+p+1)/2}\,\mathrm{etr}\left\{-\frac12 U\Sigma^{-1}\right\},$
where the degrees of freedom $\nu=2T+b-p-1$ and
(A5) $U=(\Phi-\Phi_u)'(\Omega_0^{-1}+X'X+X^{*\prime}X^*)(\Phi-\Phi_u)+V(X,X^*),$
(A6) $\Phi_u=(\Omega_0^{-1}+X'X+X^{*\prime}X^*)^{-1}(\Omega_0^{-1}\Phi_0+X'Y+X^{*\prime}Y^*).$

It follows from (EquationA4) that
(A7) $f(\Phi\mid X^*,X)\propto|U|^{-\nu/2}=\big|(\Phi-\Phi_u)'(\Omega_0^{-1}+X'X+X^{*\prime}X^*)(\Phi-\Phi_u)+V(X,X^*)\big|^{-\nu/2},$
which is a matrix version of the Student-t distribution. The mean of $\Phi\mid X^*,X$ is $\Phi_u$. To calculate the intrinsic Bayes estimator, note that the frequentist expectation $E_{(X\mid\Phi,\Sigma)}(X'X)$ can be estimated by $\sum_{j=1}^M X_j^{*\prime}X_j^*/M$, and the posterior mean $E\{E_{(X\mid\Phi,\Sigma)}(X'X)\,\Phi\mid X\}$ is estimated by $\sum_{j=1}^M X_j^{*\prime}X_j^*\,\Phi_u(X_j^*,X)/M$.

A.2.2. Conditional posteriors under normal prior for Φ and the Yang-Berger reference prior for Σ

Fact A.1

Consider the normal prior for $\varphi$ given in (Equation10). The conditional density of $\varphi$ given $(\Sigma;Y)$ is $N(\mu_M,V_M)$, where
(A8) $\mu_M=\hat\varphi_{MLE}+\{M_0^{-1}+\Sigma^{-1}\otimes(X'X)\}^{-1}M_0^{-1}(\varphi_0-\hat\varphi_{MLE});$
(A9) $V_M=\{M_0^{-1}+\Sigma^{-1}\otimes(X'X)\}^{-1},$
where
(A10) $\hat\varphi_{MLE}=\mathrm{vec}\{(X'X)^{-1}X'Y\}.$

Fact A.2

Consider the normal prior for $\varphi$ given in (Equation10). The conditional density of $\varphi$ given $(\Sigma;Y^*,Y)$ is $N(\varphi_M,V_M)$, where
$V_M=\{M_0^{-1}+\Sigma^{-1}\otimes(X'X+X^{*\prime}X^*)\}^{-1},$
$\varphi_M=V_M\big[\{\Sigma^{-1}\otimes(X'X+X^{*\prime}X^*)\}\hat\varphi_{MLE}^*+M_0^{-1}\varphi_0\big],$
$\hat\varphi_{MLE}^*=\mathrm{vec}\{(X'X+X^{*\prime}X^*)^{-1}(X'Y+X^{*\prime}Y^*)\}.$

Fact A.3

The conditional density of $\Sigma$ given $(\varphi,Y)$ is
(A11) $\pi(\Sigma\mid\Phi,Y)\propto\frac{\mathrm{etr}\{-\frac12\Sigma^{-1}S(\Phi)\}}{|\Sigma|^{T/2+1}\prod_{1\le i<j\le p}(d_i-d_j)},$
where $S(\Phi)=(Y-X\Phi)'(Y-X\Phi)$.

Fact A.4

The conditional density of $\Sigma$ given $(\varphi,Y^*,Y)$ is
(A12) $\pi(\Sigma\mid\Phi,Y^*,Y)\propto\frac{\mathrm{etr}\{-\frac12\Sigma^{-1}(S(\Phi)+S^*(\Phi))\}}{|\Sigma|^{T+1}\prod_{1\le i<j\le p}(d_i-d_j)},$
where $S^*(\Phi)=(Y^*-X^*\Phi)'(Y^*-X^*\Phi)$.

We have shown how simulated data facilitate computation of a frequentist moment in the Bayes estimator. The simulated data can also be used to reduce the variance of the MCMC, making the simulation more efficient.

For simulation of $\Sigma$, we adopt a hit-and-run algorithm used in Yang and Berger (Citation1994). In implementing the algorithm, we consider the one-to-one transformation $\Sigma^\#=\log(\Sigma)$, or $\Sigma=\exp(\Sigma^\#)$ in the sense that $\Sigma=\sum_{j=0}^\infty(\Sigma^\#)^j/j!$. The reason for simulating $\Sigma$ as $\exp(\Sigma^\#)$ is to ensure that the generated $\Sigma$ matrices are positive definite. It can be shown that the conditional posterior density of $\Sigma^\#$ given $(\Phi,Y)$ is
(A13) $\pi(\Sigma^\#\mid\Phi,Y)=\pi(\Sigma^\#\mid S(\Phi))\propto\frac{\mathrm{etr}\left\{-\frac{T}{2}\Sigma^\#-\frac12(\exp\Sigma^\#)^{-1}S(\Phi)\right\}}{\prod_{i<j}(d_i^\#-d_j^\#)},$
and that the conditional posterior density of $\Sigma^\#$ given $(\Phi,Y^*,Y)$ is
(A14) $\pi(\Sigma^\#\mid\Phi,Y^*,Y)=\pi(\Sigma^\#\mid S(\Phi),S^*(\Phi))\propto\frac{\mathrm{etr}\left[-T\Sigma^\#-\frac12(\exp\Sigma^\#)^{-1}\{S(\Phi)+S^*(\Phi)\}\right]}{\prod_{i<j}(d_i^\#-d_j^\#)},$
where $\Sigma^\#=O'D^\#O$, $O$ is an orthogonal matrix, and $D^\#=\mathrm{diag}(d_1^\#,\dots,d_p^\#)$ with $d_1^\#\ge\cdots\ge d_p^\#$. Note that $\exp(\Sigma^\#)=O'\exp(D^\#)O$.
