Abstract
Intrinsic loss functions (such as the Kullback–Leibler divergence, i.e. the entropy loss) have been used extensively in place of conventional loss functions for independent samples, but applications to serially correlated samples are scant. In the present study, we examine the Bayes estimator of the Linear Time Series (LTS) model under the entropy loss. We derive the Bayes estimator and show that it involves a frequentist expectation of the regressors. We propose a Markov Chain Monte Carlo procedure that jointly simulates the posteriors of the LTS parameters and the frequentist expectation of the regressors. We conduct Bayesian estimation of an LTS model for seasonal effects in some U.S. macroeconomic variables.
1. Introduction
To analyse the dynamics of multivariate economic systems, researchers frequently employ Linear Time Series (LTS) models (see, for example, Sims, Citation1980 and the ensuing literature). Bayesian inference for such models often requires a point estimate of the parameters, because reporting the entire posterior distribution is made difficult by a prohibitively large number of parameters. A critical aspect of Bayesian estimation is the choice of loss function.
In this study, we derive the Bayes estimator of LTS models based on the intrinsic loss. We illustrate a computational problem arising from serial correlation in the models when applying the intrinsic loss, and our solution to the problem.
A loss function measures the distance between the parameter θ and its estimate θ̂. Such a metric is often specified for convenience given the problem at hand instead of being grounded in a general principle. Bernardo and Juárez (Citation2003) noted that for inferential purposes, what matters most is not the distance between θ and θ̂; rather, it is the intrinsic loss – the distance between the probability model f(x | θ̂) (corresponding to the estimate θ̂) and f(x | θ) (corresponding to the actual parameter θ). Robert (Citation1994, Citation1996) proposed using the logarithmic divergence (also known as the Kullback–Leibler divergence or the entropy loss) as the intrinsic loss. The intrinsic loss has a number of desirable properties not generally possessed by conventional loss functions. For example, it is invariant to transformations of the data x or the parameter θ, and it has the additive property that the loss for the union of two independent data sets is the sum of the losses for each data set.
The intrinsic loss has been used for Bayesian estimation with independent samples. It has also been used in various contexts for time series data. For instance, Kitamura and Stutzer (Citation1997) used the Kullback–Leibler distance to derive a frequentist estimator for nonlinear models. Solo et al. (Citation2001) used the Kullback–Leibler distance for the evaluation of signal processing models. Robertson et al. (Citation2005) used the entropy divergence for the evaluation of forecasting densities. Fernandez-Villaverde and Rubio-Ramirez (Citation2004) used the Kullback–Leibler distance to evaluate dynamic equilibrium models in economics. However, employing the intrinsic loss for Bayesian estimation of time series models leads to technical challenges.
To illustrate the difference between the intrinsic loss in independent models and in serially correlated models, consider the following examples. First, suppose x_t (t = 1, …, T) are independently identically distributed (iid) N(ρ, 1), and we are interested in estimating the mean parameter ρ under the entropy loss
L(ρ, ρ̂) = E_{x|ρ}[ log{ f(x | ρ) / f(x | ρ̂) } ].
By the assumption on the model, f(x | ρ) = ∏_{t=1}^T φ(x_t − ρ), where φ is the standard normal density. It is easy to verify that L(ρ, ρ̂) = T(ρ − ρ̂)²/2. In this case, the intrinsic loss coincides with the commonly used quadratic loss, which implies that the Bayes estimator of ρ is the posterior mean. Now consider an AR(1) model: x_t = ρ x_{t−1} + ε_t for t = 1, …, T, where ε_t is iid N(0, 1), and ρ is the only unknown parameter.
The entropy loss is still L(ρ, ρ̂) = E_{x|ρ}[ log{ f(x | ρ) / f(x | ρ̂) } ], but now f is the density of the AR variable. Substituting in the distribution of the data gives
L(ρ, ρ̂) = (ρ − ρ̂)² w(ρ)/2,
where
w(ρ) = E_{x|ρ}[ Σ_{t=1}^T x_{t−1}² ].
It is obvious that L(ρ, ρ̂) is an increasing function of (ρ − ρ̂)² and is nonnegative for any ρ. A Bayes estimator (which is called a generalised Bayes estimator if the prior is improper) minimises the Bayesian posterior expected loss. If the entropy loss is employed, the Bayes estimator for ρ with a given initial condition x_0 is
ρ̂ = E[ρ w(ρ) | x] / E[w(ρ) | x].
Note that if ρ is positive, ρ and w(ρ) are positively correlated. It follows that the Bayes estimator under the entropy loss for a positive ρ is larger than the posterior mean. It is well known that the MLE (and the posterior mean under a constant prior) of ρ is biased downward, especially when the true parameter is close to unity (see MacKinnon & Smith, Citation1998). Note that under the constant prior, Σ_{t=1}^T x_{t−1}² is the posterior precision (i.e. the inverse of the posterior variance) for ρ. Hence the weight on the squared estimation error in the intrinsic loss function, w(ρ), is larger in the region of ρ where the posterior precision is high. This is in the spirit of Zellner's (Citation1978, Citation1998) 'precision of estimation' loss, and in contrast to the quadratic loss, which imposes the same weight on all regions of ρ.
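The mechanics of this example can be checked with a small simulation. The sketch below is our own illustration (not from the paper): it draws ρ from its posterior under a constant prior with known unit error variance, computes the weight w(ρ) = E[Σ_t x²_{t−1} | ρ, x₀] from the recursion E[x²_t] = ρ²E[x²_{t−1}] + 1, and forms the entropy-loss Bayes estimator as the w-weighted posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate AR(1) data with a persistent true rho and unit error variance.
T, rho_true, x0 = 100, 0.9, 0.0
x = np.empty(T + 1)
x[0] = x0
for t in range(1, T + 1):
    x[t] = rho_true * x[t - 1] + rng.normal()

# Under a constant prior (error variance known), rho | x ~ N(rho_ols, 1 / sum x_{t-1}^2).
sx2 = np.sum(x[:-1] ** 2)
rho_ols = np.sum(x[1:] * x[:-1]) / sx2
rho_draws = rng.normal(rho_ols, np.sqrt(1.0 / sx2), size=20_000)

def w(rho):
    """Frequentist expectation of sum_t x_{t-1}^2 given rho and x_0,
    using the recursion E[x_t^2] = rho^2 E[x_{t-1}^2] + 1."""
    m, total = x0 ** 2, 0.0
    for _ in range(T):
        total += m
        m = rho ** 2 * m + 1.0
    return total

weights = np.array([w(r) for r in rho_draws])
post_mean = rho_draws.mean()
rho_bayes = np.sum(rho_draws * weights) / np.sum(weights)
```

Because w(ρ) is increasing in ρ on the positive axis, the weighted average `rho_bayes` exceeds the plain posterior mean, which is the bias-correcting effect described above.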
Now we turn to the model of interest. The LTS of a p-dimensional column endogenous variable y_t and a q-dimensional column exogenous (predetermined) variable z_t has the form:
y_t = Σ_{j=1}^L Φ_j y_{t−j} + Γ z_t + ε_t, (1)
where L is a known positive integer, Φ_j is a p × p unknown matrix, Γ is an unknown p × q matrix, ε_t are iid N(0, Σ) errors, and Σ is an unknown p × p positive definite matrix.
A special case of the above LTS model is that all of the lag coefficients are zero (i.e. all regressors, with q > p, are exogenous variables). The exogenous variables may be functions of time. For example, in modelling of climate temperature or holiday consumer spending, seasonal dummies may be introduced in the model. In economic applications, these exogenous variables may also be variables of government policies. Another special case is that z_t is a constant vector with elements of unity, so that the regressors only include lags of the variable y_t. The LTS model becomes a Vector AutoRegression (VAR), which is commonly used for modelling of macroeconomic time series.
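As a concrete illustration of the model class, the following sketch simulates a small LTS with one lag and quarterly seasonal dummies; the dimensions and parameter values are hypothetical choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

p, q, T = 2, 4, 200                       # 2 endogenous variables, 4 seasonal dummies
Phi = np.array([[0.5, 0.1],
                [0.0, 0.4]])              # lag-1 coefficient matrix (p x p)
Gamma = rng.normal(0.0, 0.5, size=(p, q)) # seasonal-dummy coefficients (p x q)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])            # error covariance matrix
chol = np.linalg.cholesky(Sigma)

y = np.zeros((T + 1, p))
for t in range(1, T + 1):
    z = np.eye(q)[(t - 1) % q]            # quarterly dummy vector z_t
    eps = chol @ rng.normal(size=p)       # iid N(0, Sigma) errors
    y[t] = Phi @ y[t - 1] + Gamma @ z + eps
```

With all lag coefficients set to zero the same loop produces the purely exogenous special case, and dropping the dummies in favour of a constant z_t gives the VAR special case mentioned above.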
We can rewrite Equation (1) in the familiar matrix form
Y = XB + E, (2)
where Y = (y_1, …, y_T)′ and E = (ε_1, …, ε_T)′ are T × p matrices; the exogenous block of X does not depend on the parameters B and Σ, but the lagged endogenous block does. X is the T × (pL + q) matrix of observations whose tth row is the row vector (y′_{t−1}, …, y′_{t−L}, z′_t), and B = (Φ_1, …, Φ_L, Γ)′ is the (pL + q) × p matrix of unknown parameters. The likelihood function of (B, Σ) based on (Y, X) is then
f(Y | B, Σ) ∝ |Σ|^{−T/2} exp{ −(1/2) tr[ Σ^{−1} (Y − XB)′(Y − XB) ] }. (3)
Here and hereafter |A| is the determinant of a matrix A.
The present paper achieves two goals. The first one is the derivation of the Bayes estimator of the LTS model under the entropy loss. We show that the entropy loss is non-separable in B and Σ: it can be written as the sum of a loss pertaining to the covariance matrix Σ and a loss for the normalised estimation error of B that involves Σ as well. The B-part of the entropy loss for the LTS is of the form (1/2) tr[ Σ̂^{−1} (B̂ − B)′ E(X′X) (B̂ − B) ], where B̂ is the Bayes estimator of B. Under the entropy loss, the Bayes estimator distinctly differs from the posterior mean and differs from that of the iid multivariate normal model. The part of the intrinsic loss function associated with the regression coefficients turns out to be related to a conventional loss function. For the estimation of a matrix parameter such as B in the simultaneous-equations context, Zellner (Citation1978, Citation1998) proposed a 'precision of estimation' loss that can also be written as a quadratic form of this kind. However, in Zellner's simultaneous-equations model, X is taken as given, whereas in the LTS the predetermined variable X depends on the parameters B and Σ.
The second goal concerns numerical estimation of the intrinsic Bayes estimator via Markov Chain Monte Carlo (MCMC). We propose a general algorithm that generates regressors as latent parameters in the simulation of the posteriors of the LTS parameters. Data augmentation in this study differs in motivation and implementation from that in Tanner and Wong's (Citation1987) seminal paper. Tanner and Wong use data augmentation to alter the likelihood function for easier MCMC simulation of the posteriors. In this study, the likelihood function of the generated data is the same as the likelihood of the sample data, and data augmentation does not make posterior simulation easier. Instead, it makes it possible to compute frequentist moments of the LTS variables. The frequentist moments, simulated jointly with the parameters, are used to produce Bayes estimates under the entropy loss.
Besides the choice of loss function, the choice of prior also plays a pivotal role in Bayesian estimation. The Jeffreys prior (see Zellner, Citation1971) is a noninformative prior for (B, Σ) that gives rise to conditional posteriors in well-known distributions. Ni et al. (Citation2007) conducted Bayesian estimation of a VAR model under the entropy loss, using the Jeffreys prior. However, despite its popularity, the Jeffreys prior is known to produce unsatisfactory results in multi-parameter settings. In this study we simulate the LTS model under a combination of a normal prior on the regression parameters and the Yang and Berger (Citation1994) reference prior on Σ. The conditional posterior of Σ is simulated using a Metropolis–Hastings algorithm. Our empirical application shows that despite the fact that LTS models involve a large number of parameters and a large number of latent variables, the data-augmentation algorithm is quite efficient.
In Section 2 of the paper, we derive the Bayes estimator of LTS models under the entropy loss function and discuss the computation of the weighting matrix in the Bayes estimator. In Section 3, we present a general algorithm using generated data as latent parameters. In Section 4, we lay out the MCMC algorithm for computing (B̂, Σ̂) in the LTS model. In Section 5, we first compare the intrinsic Bayes estimator with other estimators in a numerical example and then estimate an LTS model using seasonally unadjusted macroeconomic data. In Section 6 we offer concluding remarks.
2. Entropy loss function for the iid multivariate and LTS models
2.1. Entropy loss function for the iid model
We first consider the entropy loss function (Robert, Citation1994, p. 74) for a multivariate normal distribution. Let x_1, …, x_n be a random sample from N_p(μ, Σ). One can compute the entropy loss function as
L((μ, Σ), (μ̂, Σ̂)) = E_{x|μ,Σ}[ log{ f(x | μ, Σ) / f(x | μ̂, Σ̂) } ] = (n/2){ (μ − μ̂)′ Σ̂^{−1} (μ − μ̂) + tr(Σ̂^{−1}Σ) − log|Σ̂^{−1}Σ| − p }.
Here f(x | μ, Σ) is the density of N_p(μ, Σ). Clearly, the loss function has two parts. One part is related to the means μ and μ̂ (with Σ̂^{−1} as the weighting matrix), and the other part is related to Σ and Σ̂. The following fact states that the Bayes estimator of μ is the posterior mean of μ, but that of Σ is larger than the posterior mean of Σ.
Fact 2.1
Under the entropy loss L, the generalised Bayes estimator of (μ, Σ) is
μ̂ = E(μ | x),  Σ̂ = E(Σ | x) + Var(μ | x).
Note that x represents the data; the expectation E(· | x) and the variance Var(· | x) are with respect to the posterior distribution. The proof is in the appendix.
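Fact 2.1 can be spot-checked numerically in the univariate case p = 1, where it reads μ̂ = E(μ | x) and σ̂² = E(σ² | x) + Var(μ | x). The sketch below uses an arbitrary, hypothetical cloud of posterior draws (our construction, not from the paper) and verifies that this estimator attains the smallest sample-average entropy loss among nearby candidates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint posterior draws of (mu, sigma^2); any proper sample would do.
mu = rng.normal(2.0, 0.3, size=100_000)
sig2 = 1.0 + rng.gamma(2.0, 0.5, size=100_000)

def avg_entropy_loss(mu_hat, sig2_hat):
    """Sample average over posterior draws of KL( N(mu, sig2) || N(mu_hat, sig2_hat) )."""
    kl = 0.5 * (sig2 / sig2_hat + (mu - mu_hat) ** 2 / sig2_hat - 1.0
                + np.log(sig2_hat / sig2))
    return kl.mean()

mu_star = mu.mean()                   # claimed Bayes estimator of mu
sig2_star = sig2.mean() + mu.var()    # claimed Bayes estimator of sigma^2: E(sigma^2) + Var(mu)

best = avg_entropy_loss(mu_star, sig2_star)
naive = avg_entropy_loss(mu_star, sig2.mean())  # plain posterior mean of sigma^2
```

Since Var(μ | x) > 0, the plain posterior mean of σ² always gives a strictly larger posterior expected entropy loss than the estimator of Fact 2.1.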
2.2. Entropy loss functions for LTS models
Recall that for the LTS model (2), the likelihood function of (B, Σ) is of the form (3). The entropy loss for the LTS model is
L((B, Σ), (B̂, Σ̂)) = E_{Y|B,Σ}[ log{ f(Y | B, Σ) / f(Y | B̂, Σ̂) } ], (4)
where for computing the expectation on the right-hand side, (B̂, Σ̂) is not a function of Y. The entropy loss (4) can be decomposed into two parts. One part measures the loss associated with the covariance matrix only, while the second part measures the loss of the coefficients B, but it is related to the covariance matrix Σ as well. Because ε_t are iid N(0, Σ), we have
E_{Y|B,Σ}[ (Y − XB)′(Y − XB) ] = TΣ  and  E_{Y|B,Σ}(X′E) = 0.
Then the expectation of the log-likelihood ratio can be evaluated term by term. The result of this derivation can be summarised by the following lemma.
Lemma 2.1
The entropy loss function for the LTS model is
L((B, Σ), (B̂, Σ̂)) = (T/2){ tr(Σ̂^{−1}Σ) − log|Σ̂^{−1}Σ| − p } + (1/2) tr[ Σ̂^{−1} (B̂ − B)′ Ω(B, Σ) (B̂ − B) ], (5)
where
Ω(B, Σ) = E_{Y|B,Σ}(X′X). (6)
This Lemma can be proved using algebra similar to that in the iid case. However, there is an important difference. For the LTS model, the Bayes estimators involve the matrix Ω(B, Σ), a frequentist expectation of X′X for given parameters (B, Σ). For the iid case, no such term is present. The next theorem gives the form of the Bayes estimators under the entropy loss.
Theorem 2.1
The generalised Bayes estimator of (B, Σ) under the entropy loss is
B̂ = { E[Ω(B, Σ) | Y] }^{−1} E[ Ω(B, Σ) B | Y ], (7)
Σ̂ = E(Σ | Y) + (1/T) E[ (B − B̂)′ Ω(B, Σ) (B − B̂) | Y ], (8)
where Ω(B, Σ) = E_{Y|B,Σ}(X′X) is the frequentist expectation in (6).
The above theorem can be proved similarly to Fact 2.1.
Under the special case with no lag coefficients in the regression, we have Ω(B, Σ) = X′X, which is not a function of B and Σ. It follows that the Bayes estimator of B is the posterior mean, as it is for the iid model. This observation is stated in the following remark.
Remark 1
If Φ_j = 0 for j > 0, then B̂ = E(B | Y).
However, the Bayes estimator for the LTS model is generally different from the iid case: it is not the posterior mean. To compare the estimator B̂ with the posterior mean, note that in general
{ E[Ω | Y] }^{−1} E[ ΩB | Y ] ≠ E(B | Y).
Because B and Ω(B, Σ) are likely to be positively correlated, the Bayes estimator of B under the intrinsic loss is likely to be larger than the posterior mean. It is known that the MLE and the posterior mean of B under a diffuse prior are likely to have a downward bias when the true parameters are close to a random walk, a typical pattern of macroeconomic data. The form of the Bayes estimator of B based on the intrinsic loss is helpful in correcting the bias in the posterior mean.
The estimator in the LTS model involves the frequentist expectation Ω = E(X′X). The Ω matrix depends on the specification of the regressors X. If the regressors are specified as functions of lags of y_t, the computation of the Ω matrix becomes nontrivial.
Using the notation in Equation (1), the frequentist expectation matrix E(X′X) can be written in terms of expected cross-products of the lagged endogenous variables y_{t−j} and the exogenous variables z_t. For the exogenous variables z_t, there is no need to derive a general closed-form expression for the corresponding terms as functions of the parameters B and Σ. On the other hand, due to the serial correlation of y_t, the computation of the expectations involving the lagged endogenous variables is not straightforward. In the presence of exogenous variables, no analytical expression for E(X′X) is available. In the following, we discuss approaches to Bayesian estimation under the entropy loss for the general LTS model.
3. Approaches to computing the expectation E(X′X)
Theorem 2.1 shows that under the entropy loss the Bayes estimator for (B, Σ) involves the frequentist expectation Ω(B, Σ) = E(X′X), and we need to compute the posterior moments E[Ω | Y], E[ΩB | Y], and E[(B − B̂)′Ω(B − B̂) | Y].
The frequentist expectation Ω depends on B and Σ. For the LTS model, Ω does not have an analytical form and needs to be computed numerically for a given (B, Σ). We use Y and X to denote the observed data in the LTS model (2). We generate Ỹ and X̃ from the same model (2) given the parameters B and Σ, in order to compute Ω. There is only one observed data set (Y, X), but there are many sets of generated (Ỹ, X̃). Suppose B and Σ need to be simulated by an MCMC algorithm; then Ỹ and X̃ need to be generated for each draw of B and Σ.
One approach to computing E(X′X) is straightforward but time-consuming: for each B and Σ drawn in the kth MCMC cycle, we generate many sets of X̃ and use the average of X̃′X̃ to approximate E(X′X). While this approach is possible in theory, its high computational cost renders it infeasible in practice. For practical purposes, we must take an alternative approach to compute Bayes estimates.
Fortunately, we have an alternative approach that does not require much additional computational cost beyond simulating the posterior of (B, Σ). Suppose we simulate one set of data (Ỹ^(k), X̃^(k)) from the LTS model in each MCMC cycle with the simulated parameters (B^(k), Σ^(k)), and then simulate the parameters of the next MCMC cycle, (B^(k+1), Σ^(k+1)), conditional on both the sample data (Y, X) and the simulated data (Ỹ^(k), X̃^(k)). We will demonstrate that the posterior moments of E(X′X) required by Theorem 2.1 can be computed through the simulated parameters (B^(k), Σ^(k)) and the jointly simulated data (Ỹ^(k), X̃^(k)). The simulated data are in essence latent parameters. They are not the subject of our interest per se, but they are useful for the simulation of the parameter of interest (i.e. the frequentist expectation E(X′X)). Data augmentation is not uncommon in Bayesian simulations, but as we noted in the introduction, this data-augmented simulation approach differs from its other uses in the econometrics and statistics literature. One question of practical importance remains, though: the simulated matrix X̃ has T × (pL + q) elements, which can be quite large. Do we have to simulate very long Markov chains to assure that the averages are good approximations of the posterior mean? Fortunately, our numerical results show that the answer to the question is "no".
In the following we propose a general algorithm that formalises the data-augmentation idea discussed above.
3.1. A general algorithm using data as latent parameters
Suppose that the observed data x has the density f(x | θ), where the parameter vector θ is unknown. A prior π(θ) can be informative or noninformative. Let z be a random vector (or a matrix) with the density f(z | θ). Let h(θ) = E_{z|θ}[G(z)] be a function of the parameters θ. We are interested in the posterior mean of the quantity h(θ) given the data x.
Our algorithm is based on the following fact:
E[h(θ) | x] = E[G(z) | x],
where (z, θ) has the joint density
f(z, θ | x) ∝ f(z | θ) f(x | θ) π(θ). (9)
If we have a random sample (z^(k), θ^(k)), k = 1, …, M, from the joint distribution (9), we can estimate E[h(θ) | x] by using the result
(1/M) Σ_{k=1}^M G(z^(k)) → E[h(θ) | x].
The problem becomes to generate observations from the joint distribution of (z, θ) given the data x. For this task the following MCMC method can be used.
Suppose that at the beginning of cycle k we have θ^(k−1).
Simulating the full conditional posteriors, we sample from f(z, θ | x) as follows.
Step 1. Simulate z^(k) from f(z | θ^(k−1)).
Step 2. Simulate θ^(k) from f(θ | x, z^(k)) ∝ f(z^(k) | θ) f(x | θ) π(θ).
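The two steps can be illustrated on a toy model where the answer is known in closed form. In the sketch below (our construction, not from the paper), x_i ~ N(θ, 1) with a flat prior, the latent datum is z ~ N(θ, 1), and h(θ) = E(z² | θ) = θ² + 1, so E[h(θ) | x] = x̄² + 1/n + 1 exactly; averaging G(z) = z² over the chain recovers this value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x_i ~ N(theta, 1), i = 1..n, with a flat prior, so theta | x ~ N(xbar, 1/n).
n = 50
x = rng.normal(0.8, 1.0, size=n)
xbar = x.mean()

# Quantity of interest: h(theta) = E[z^2 | theta] = theta^2 + 1 for latent z ~ N(theta, 1),
# so the exact posterior mean is E[h(theta) | x] = xbar^2 + 1/n + 1.
exact = xbar ** 2 + 1.0 / n + 1.0

theta = xbar  # initial value
g_sum, cycles = 0.0, 200_000
for _ in range(cycles):
    z = rng.normal(theta, 1.0)                        # Step 1: draw latent data from the model
    cond_mean = (n * xbar + z) / (n + 1)              # Step 2: theta | x, z under the flat prior
    theta = rng.normal(cond_mean, (1.0 / (n + 1)) ** 0.5)
    g_sum += z ** 2                                   # accumulate G(z) = z^2

estimate = g_sum / cycles
```

Integrating z out of the joint density (9) returns the ordinary posterior of θ, so the θ-chain still targets the correct posterior while the z-draws deliver the frequentist moment.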
4. Bayesian estimation of (B, Σ) in LTS models
4.1. Priors
The Bayes estimator of the LTS depends on the prior of (B, Σ). We assume prior independence, so the prior for (B, Σ) is the product of the priors for B and Σ.
For the estimation of the regression coefficients B, a popular informative prior for β = vec(B) is the normal distribution with hyperparameters β_0 and V_0:
π(β) ∝ exp{ −(1/2)(β − β_0)′ V_0^{−1} (β − β_0) }. (10)
A popular class of non-informative priors on Σ is π(Σ) ∝ |Σ|^{−b/2}. If b = p + 1, π(Σ) becomes the Jeffreys prior (see Zellner, Citation1971). Ni et al. (Citation2007) examined the intrinsic Bayes estimator under this prior. In the appendix we show that the conditional posteriors of B and Σ can be obtained in analytical form.
As mentioned in the introduction, in multiple-parameter settings the Jeffreys prior often has undesirable properties. Bernardo (Citation1979) proposed an approach of deriving a reference prior by breaking a single multiparameter problem into a consecutive series of problems with fewer numbers of parameters. For examples where the reference priors produce more desirable estimates than the Jeffreys priors, see Berger and Bernardo (Citation1992) and Sun and Berger (Citation1998), among others. In estimating the variance-covariance matrix based on an iid random sample from a normal population with known mean, Yang and Berger (Citation1994) re-parameterised the matrix Σ as Σ = O′ΛO, where Λ is a diagonal matrix the elements of which are the eigenvalues of Σ (in increasing or decreasing order) and O is an orthogonal matrix. The following reference prior is derived by giving the vectorised Λ higher priority over the vectorised O:
π_R(Σ) ∝ 1 / { |Σ| ∏_{i<j} (λ_i − λ_j) }, (11)
where λ_1 > ⋯ > λ_p are the eigenvalues of Σ.
For the numerical and empirical exercises in this study we use the normal-reference prior π(β)π_R(Σ). The conditional densities under the normal-reference prior are in the appendix. Ni and Sun (Citation2003) proved that the posteriors of (B, Σ) are proper under the normal-reference prior. But the conditional posterior of Σ does not have an analytical form and must be sampled numerically.
4.2. A simulation algorithm for LTS models under the normal-reference prior
We employ an MCMC method to sample from the posterior. In particular, we use the Gibbs sampling method (cf. Gelfand & Smith, Citation1990). The following algorithm simulates the posteriors of the LTS parameters conditional on both the sample and the generated data.
Suppose that at cycle k we have (B^(k−1), Σ^(k−1)) (with an initial draw of B and Σ, e.g. the MLE).
Algorithm MCMC:
Step 1. Generate the latent data (Ỹ^(k), X̃^(k)).
Simulate ỹ_t^(k) from the LTS model given (B^(k−1), Σ^(k−1)) for t = 1, …, T. Define the stacked latent regressor matrix X̃^(k) accordingly.
Step 2. Generate B^(k).
Simulate β^(k) = vec(B^(k)) from a normal distribution whose mean and covariance, given in (12)–(14), combine the normal prior (10) with the likelihoods of the sample data and the generated data.
Steps 3 to 6 generate Σ^(k).
Step 3: Calculate the matrix logarithm of Σ^(k−1). Decompose Σ^(k−1) = O′ΛO, where O is an orthogonal matrix and Λ = diag(λ_1, …, λ_p). Let A = O′(log Λ)O.
Step 4: Select a random symmetric matrix W, with elements w_ij drawn independently for i ≤ j (the other elements of W are defined by symmetry).
Step 5: Generate the candidate: set A* = A + W and decompose A* = O*′Λ*O*, where O* is an orthogonal matrix and Λ* = diag(λ*_1, …, λ*_p). Compute Σ* = O*′ exp(Λ*) O*.
Step 6: Define the acceptance probability α for the candidate Σ*. Simulate u from Uniform(0, 1) and let Σ^(k) = Σ* if u ≤ α, and Σ^(k) = Σ^(k−1) otherwise.
Note the acceptance probability α is a ratio of conditional posteriors, where the conditional posterior of Σ is given in (EquationA14). To accelerate the convergence, we repeat Steps 4 to 6 up to five times until a new candidate is accepted.
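Steps 3 to 5 amount to a random-walk proposal on the matrix logarithm of Σ, which by construction always returns a symmetric positive definite candidate. The sketch below is a generic rendering of that construction; the proposal scale τ is our own assumption, and the acceptance step (Step 6), which requires the conditional posterior in (A14), is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(7)

def propose_sigma(Sigma, tau=0.1):
    """Candidate covariance matrix: perturb log(Sigma) by a random symmetric matrix (Steps 3-5)."""
    lam, O = np.linalg.eigh(Sigma)                     # Sigma = O diag(lam) O'
    A = O @ np.diag(np.log(lam)) @ O.T                 # matrix logarithm of Sigma (Step 3)
    W = rng.normal(0.0, tau, size=Sigma.shape)
    W = (W + W.T) / 2.0                                # random symmetric perturbation (Step 4)
    lam_new, O_new = np.linalg.eigh(A + W)             # decompose the perturbed log matrix (Step 5)
    return O_new @ np.diag(np.exp(lam_new)) @ O_new.T  # back-transform: symmetric positive definite

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
cand = propose_sigma(Sigma)
```

Exponentiating the perturbed eigenvalues guarantees positive definiteness, so no draw ever has to be rejected for leaving the parameter space.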
4.3. Computing the posterior average loss
From Lemma 2.1, given the estimate (B̂, Σ̂), which is computed for a given data sample, we write the posterior average loss as E[ L((B, Σ), (B̂, Σ̂)) | Y ]. We decompose the draw Σ^(k) in the kth MCMC cycle as Σ^(k) = O^(k)′Λ^(k)O^(k), where Λ^(k) is the diagonal matrix that consists of the eigenvalues of Σ^(k): Λ^(k) = diag(λ_1^(k), …, λ_p^(k)), and O^(k) is an orthogonal matrix with O^(k)′O^(k) = I.
The posterior average loss under the intrinsic loss can be computed using the posterior draws generated by the MCMC procedure, with
E[ L | Y ] ≈ (1/M) Σ_{k=1}^M { (T/2)[ tr(Σ̂^{−1}Σ^(k)) − log|Σ̂^{−1}Σ^(k)| − p ] (15)
+ (1/2) tr[ Σ̂^{−1} (B̂ − B^(k))′ X̃^(k)′X̃^(k) (B̂ − B^(k)) ] }, (16)
where |Σ^(k)| = ∏_{i=1}^p λ_i^(k).
Note that all terms in the posterior entropy loss are functions of the simulated B^(k), Σ^(k), and X̃^(k) over the MCMC cycles. The moments of the simulated parameters can be computed within the MCMC cycles, just as the posterior mean is, without the need to store all of the simulated parameters.
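The no-storage point can be implemented with a small accumulator; the sketch below is a generic illustration of ours, computing a posterior mean as a running average, which would be applied to each trace term in (15) and (16) as the chain advances.

```python
import numpy as np

class RunningMean:
    """Accumulate a mean over MCMC cycles without storing the draws."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, value):
        self.n += 1
        self.total += value

    @property
    def mean(self):
        return self.total / self.n

rng = np.random.default_rng(3)
acc = RunningMean()
draws = rng.normal(size=1000)   # stand-in for per-cycle loss terms
for d in draws:
    acc.update(d)
```

Memory use is constant in the chain length, which matters when the chain runs for hundreds of thousands of cycles.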
5. A numerical example and an empirical study
5.1. A numerical example
In this section we first simulate data from an LTS model (17) for t = 1, …, T. The dimension of the VAR variable y_t is 5. The exogenous variable z_t is a scalar representing seasonal cycles, taking four fixed values in the first four periods, with z_t = z_{t−4} for t > 4.
Now we let the true parameters be as follows. The last row of the coefficient matrix contains the parameters of the seasonal dummies. The discussion in Section 2 shows that with this parameter setting there is no closed-form expression for the frequentist expectation E(X′X), and we need to simulate it using the data-augmentation algorithm proposed in Section 4.
We generate one data sample of T = 100 observations from the LTS model with the above parameters for B and Σ. The MLE of the parameters are as follows. We conduct the simulation with a diffuse prior on B and the Yang–Berger reference prior on Σ. The length of the MCMC chain is set at 100,000. The intrinsic Bayes estimates are given in (18) and (19). We also record the acceptance rate for the Metropolis step employed for sampling Σ from the posterior conditional on the other parameters and the data.
Now we compare the Bayes estimator with the MLE and the posterior mean. The posterior mean obtained using Option 1 is given in (20) and (21).
It is known that the posterior mean of B minimises the expected posterior quadratic loss of B. But it does not minimise the intrinsic loss, because the estimator of Σ also influences the weight of the B-related loss in (5). Table 1 reports the average posterior loss of the MLE, the posterior mean, and the Bayes estimator for the data sample generated in the example. The Bayes estimator improves the B-related risk with a tradeoff of a larger Σ-related risk. Table 1 shows that the Bayes estimator induces lower posterior risk than the posterior mean by making the B-related risk substantially lower and the Σ-related risk only slightly higher. Both the posterior mean and the Bayes estimator dominate the MLE.
Table 1. Posterior average loss of the estimates in the Example.
5.2. An empirical study: seasonal effects in a macroeconomic model
We now turn to an empirical application of Bayesian estimation under the entropy loss. We estimate an LTS consisting of seasonal dummies and four macroeconomic variables: the return of the Standard and Poor 500 stock price index (which represents weighted stock prices of large companies), the 3-month Treasury Bill rate, the growth rate of payroll (including government jobs as well as private-sector jobs), and the growth rate of industrial production (in that order). All series are measured in percentage terms. The series are monthly data from 1970:1 to 2002:12 and are not seasonally adjusted. The payroll data are obtained from the Bureau of Labor Statistics; the remaining series are obtained from the Federal Reserve Board. There are 12 dummy variables, representing January to December.
The role of seasonal fluctuations in business cycles has been noted by a number of economists. Barsky and Miron (Citation1989) argued that for the U.S. economy the characteristics of seasonal fluctuations are similar to the conventional characterisations of business cycles. Cecchetti et al. (Citation1997) estimated production functions of various industries based on how their responses to seasonal shocks vary with the state of the business cycle. Ghysels (Citation1988) showed that univariate seasonal adjustment of endogenous variables is not harmless, because information on the interactions among endogenous variables will be lost. Miron and Beaulieu (Citation1996) provided a survey on econometric and economic issues in understanding business cycles through seasonal fluctuations. In the finance literature, numerous studies argue that stock returns appear to have seasonal components. Rozeff and Kinney (Citation1976) documented a celebrated "turn of the year" effect, which refers to the seemingly abnormally high returns in January and July, especially for stocks with small market capitalisations. A number of theories have been developed to explain the phenomenon. Reinganum (Citation1983) attributed the high stock return in January to end-of-year tax-loss selling in December. Chang and Pinegar (Citation1989) found that industrial production trails the seasonal movement of stock returns by one month. The reported point estimates of the seasonal effects in the literature are model-dependent and based on OLS or MLE. We will estimate the seasonal effects. Our primary interest lies in the comparison of the posterior mean with the Bayes estimate under the entropy loss.
In this section, we employ an LTS model (22) for t = 1, …, T. The dimension of the VAR variable y_t is four. The exogenous variable z_t is a 12-dimensional vector representing seasonal cycles, with the first element equal to 1 if period i is January and 0 otherwise, and so on through the twelfth element, which equals 1 if period i is December and 0 otherwise.
Based on the Schwarz criterion, for each sample period the lag length L of the LTS is 2. The Yang–Berger reference prior is applied to the covariance matrix Σ. The prior for the LTS coefficients B is a rather diffuse normal distribution. Its covariance matrix is diagonal, with 10.0 being the diagonal element for parameters corresponding to the dummy variables and 2.0 being the diagonal element for parameters corresponding to the lag coefficients. We draw the posterior from M MCMC cycles after burn-in runs. The MCMC length M is set at 50,000, 100,000, and 1,000,000.
Under the algorithm MCMC, reducing the length of the chain from 1,000,000 to 100,000 or 50,000 makes little difference. The MLE, posterior mean, and Bayes estimate of the covariance matrix are as follows. As dictated by the theoretical result, the Bayes estimate under the entropy loss, Σ̂, is larger than the posterior mean E(Σ | Y).
With M = 1,000,000, the posterior mean and entropy-based Bayes estimates are as follows, together with the posterior standard deviations of the elements of the covariance matrix. The difference between the two estimates is large relative to the posterior standard deviations.
The above point estimates and standard deviations are similar to those with M = 50,000. The MCMC algorithm yields posteriors with few outliers. This is because in Step 2 of the MCMC algorithm, B is generated from an average of sample data and generated data, instead of the sample data alone. A few outliers in the posterior may affect the posterior mean slightly, but they can change the posterior risk and the Bayesian estimate substantially, because the few explosive parameter draws carry disproportionately large weights in the posterior average loss.
The intrinsic Bayes estimate dominates the posterior mean by a large margin in terms of posterior expected loss. The large difference in the posterior expected loss is mainly due to the difference in the B-related risk, i.e. the quadratic term in (5). This difference in risk is approximately proportional to the frequentist expectation E(X′X), and the latter is comparable to the observed X′X, which is quite large in this application, largely due to the strong serial correlation of the 3-month T-bill rates. As a result, with a larger weighting matrix the Bayesian estimate substantially reduces the posterior risk, compared with the posterior mean. Simulation with M = 1,000,000 shows that the posterior average loss of the posterior mean estimate is larger than that of the intrinsic Bayes estimate.
The lower overall posterior average loss of the intrinsic Bayes estimate is achieved by substantially lowering the risk of the quadratic part, from 3225.3 for the posterior mean to 61.8. The first term of the loss, the part related to Σ, is slightly larger under the intrinsic Bayes estimate than under the posterior mean estimate. As noted earlier, the intrinsic Bayes estimate improves the B-related loss with a tradeoff of a larger Σ-related loss. The empirical result shows that the Bayesian estimate induces lower posterior average loss than the posterior mean by making the B-related loss substantially lower and the Σ-related loss only slightly higher.
We now turn to comparing the estimates of the regression coefficients B. Table 2 reports the MLE, the posterior mean, the intrinsic Bayes estimate, and the posterior standard deviations. As a consequence of applying a rather diffuse prior, the posterior mean of B is quite similar to the MLE. For stock returns of the Standard and Poor 500 index, the MLE and the posterior mean estimate indicate a moderate positive seasonal factor in January, which is smaller than the seasonal factors of March, April, October, November, and December. Most surprisingly, October registers the largest seasonal gain, despite the fact that the sample includes the 1987 October sell-off. The data of recent years suggest that the estimates of seasonality in large-capitalisation stock returns are quite sensitive to the sample period and regression model. In comparison to the MLE and the posterior mean, the intrinsic Bayesian estimate shows a smaller January effect and much smaller end-of-the-year positive seasonal returns. The sum of the seasonal coefficients of the intrinsic Bayesian estimates is about half that of the MLE. The large discrepancy between the posterior mean and the intrinsic Bayesian estimate casts doubt on the robustness of the seasonality of returns of the Standard and Poor 500 index.
Table 2. Estimates of three equations.
Compared to the stock return, the seasonality of the industrial production growth rate is much more robust. The most distinct pattern is a steep decline in July followed by a surge in August; then weakness at the end of the year precedes a strong rebound in February. The strong showing of industrial production in February and August is consistent with the pattern reported in Chang and Pinegar (Citation1989), while the component of industrial production predicted by Standard and Poor stock returns is quite small. The magnitude of the seasonal effects under the entropy-loss-based Bayesian estimates is on average slightly larger than that of the posterior mean.
Lastly, we examine the estimates of the employment growth rate equation. The most prominent seasonal patterns are the decline in January followed by a rebound in February and March, and the weakness in July followed by a recovery in September. The estimated seasonality in payroll growth is somewhat different from that of the industrial production growth. Note that in 2002 much of the payroll consisted of service-sector jobs, while industrial production largely concerns the manufacturing sector. The subject of interest is the point estimates. Similar to the industrial production equation, the entropy-loss-based Bayes estimates for the payroll growth rate equation are similar to the posterior mean.
In summary, the Bayesian estimates based on the entropy loss show a qualitatively similar seasonal pattern to that of the posterior mean estimates for industrial production and employment growth but a distinctly different one for stock returns. The posterior average loss of the Bayes estimates with respect to the entropy loss is substantially smaller than that of the posterior mean.
6. Concluding remarks
In this paper we investigate properties of Bayes estimators of the LTS model derived from the entropy loss function. These estimators are distinctly different from those of the multivariate iid model because of the serial correlation of the time series variables. Bayesian computation under the entropy loss requires simulating a frequentist moment of the regressors. We propose a data-augmenting algorithm for the simulation of posteriors and the computation of Bayes estimators under the entropy loss and a normal-reference prior. The algorithm, which draws from the full conditional posteriors, is shown to be quite efficient. A novel approach taken in this paper concerns generating data in an MCMC as latent parameters. This idea may be useful in other contexts for simulating complicated posterior moments. Our empirical application to a macroeconomic problem shows that the Bayes estimates under the entropy loss can differ substantially from the posterior mean.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Shawn Ni
Dr. Shawn Ni holds a PhD in Economics from University of Minnesota. He is currently Middlebush Professor of Economics and Adjunct Professor of Statistics at University of Missouri-Columbia. He conducts research on a wide range of empirical economics topics and Bayesian statistics.
Dongchu Sun
Dr. Dongchu Sun holds a PhD in Statistics from Purdue University. He is a research professor of statistics at the University of Nebraska-Lincoln and East China Normal University. His research interests include Bayesian analysis, small area estimation, decision theory, business and econometrics, space-time and longitudinal models, and smoothing splines.
References
- Barsky, R. B., & Miron, J. A. (1989). The seasonal cycle and the business cycle. The Journal of Political Economy, 97(3), 503–534. https://doi.org/10.1086/261614
- Berger, J. O., & Bernardo, J. M. (1992). On the development of reference priors. In J. M. Bernardo et al. (Eds.), Bayesian statistics 4. Oxford University Press.
- Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society, Series B, 41, 113–147.
- Bernardo, J. M., & Juárez, M. A. (2003). Intrinsic estimation. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, & M. West (Eds.), Bayesian statistics (Vol. 7, pp. 465–476). Oxford University Press.
- Cecchetti, S. G., Kashyap, A. K., & Wilcox, D. W. (1997). Interactions between the seasonal and business cycles in production and inventories. American Economic Review, 87, 884–892.
- Chang, E. C., & Pinegar, M. J. (1989). Seasonal fluctuations in industrial production and stock market seasonals. The Journal of Financial and Quantitative Analysis, 24(1), 59–74. https://doi.org/10.2307/2330748
- Fernandez-Villaverde, J., & Rubio-Ramirez, J. F. (2004). Comparing dynamic equilibrium economies to data. Journal of Econometrics, 123(1), 153–187. https://doi.org/10.1016/j.jeconom.2003.10.031
- Gelfand, A. E., & Smith, A. F. M. (1990). Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398–409. https://doi.org/10.1080/01621459.1990.10476213
- Ghysels, E. (1988). A study toward a dynamic theory of seasonality for economic time series. Journal of the American Statistical Association, 83(401), 168–172. https://doi.org/10.1080/01621459.1988.10478583
- Harville, D. A. (1998). Matrix algebra from a statistician's perspective. Taylor & Francis Group.
- Kitamura, Y., & Stutzer, M. (1997). An information-theoretic alternative to generalized method of moments estimation. Econometrica, 65(4), 861–874. https://doi.org/10.2307/2171942
- MacKinnon, J. G., & Smith, A. A. (1998). Approximate bias correction in econometrics. Journal of Econometrics, 85(2), 205–230. https://doi.org/10.1016/S0304-4076(97)00099-7
- Miron, J. A., & Beaulieu, J. J. (1996). What have macroeconomists learned about business cycles from the study of seasonal cycles? The Review of Economics and Statistics, 78(1), 54–66. https://doi.org/10.2307/2109847
- Ni, S., & Sun, D. (2003). Noninformative priors and frequentist risks of Bayesian estimators of vector-autoregressive models. Journal of Econometrics, 115(1), 159–197. https://doi.org/10.1016/S0304-4076(03)00099-X
- Ni, S., Sun, D., & Sun, X. (2007). Intrinsic Bayesian estimation of vector autoregression impulse responses. Journal of Business and Economic Statistics, 25(2), 163–176. https://doi.org/10.1198/073500106000000378
- Reinganum, M. R. (1983). The anomalous stock market behavior of small firms in January: Empirical tests for tax-loss selling effects. Journal of Financial Economics, 12(1), 89–104. https://doi.org/10.1016/0304-405X(83)90029-6
- Robert, C. P. (1994). The Bayesian choice. Springer-Verlag.
- Robert, C. P. (1996). Intrinsic losses. Theory and Decision, 40(2), 191–214. https://doi.org/10.1007/BF00133173
- Robertson, J. C., Tallman, E. W., & Whiteman, C. H. (2005). Forecasting using relative entropy. Journal of Money, Credit, and Banking, 37(3), 383–401. https://doi.org/10.1353/mcb.2005.0034
- Rozeff, M. S., & Kinney, W. R. (1976). Capital market seasonality: The case of stock returns. Journal of Financial Economics, 3(4), 379–402. https://doi.org/10.1016/0304-405X(76)90028-3
- Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 48(1), 1–48. https://doi.org/10.2307/1912017
- Solo, V., Purdon, P., Weisskoff, R., & Brown, E. (2001). A signal estimation approach to functional MRI. IEEE Transactions on Medical Imaging, 20(1), 26–35. https://doi.org/10.1109/42.906422
- Sun, D., & Berger, J. O. (1998). Reference priors under partial information. Biometrika, 85(1), 55–71. https://doi.org/10.1093/biomet/85.1.55
- Sun, D., & Ni, S. (2004). Bayesian analysis of VAR models with noninformative priors. Journal of Statistical Planning and Inference, 121(2), 291–309. https://doi.org/10.1016/S0378-3758(03)00116-2
- Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398), 528–540. https://doi.org/10.1080/01621459.1987.10478458
- Yang, R., & Berger, J. O. (1994). Estimation of a covariance matrix using the reference prior. The Annals of Statistics, 22(3), 1195–1211. https://doi.org/10.1214/aos/1176325625
- Zellner, A. (1971). An introduction to Bayesian inference in econometrics. John Wiley & Sons.
- Zellner, A. (1978). Estimation of functions of population means and regression coefficients including structural coefficients: A minimum expected loss approach. Journal of Econometrics, 8(2), 127–158. https://doi.org/10.1016/0304-4076(78)90024-6
- Zellner, A. (1998). The finite sample properties of simultaneous equations estimates and estimators: Bayesian and non-Bayesian approaches. Journal of Econometrics, 83(1–2), 185–212. https://doi.org/10.1016/S0304-4076(97)00069-9
Appendix. Proof of Fact 2.1 and posterior properties
A.1. Proof of Fact 2.1
Proof.

Let denote an arbitrary estimator of . For the entropy loss function L and posterior , the expected posterior loss is

The Bayes estimator, which minimises the expected posterior loss, can be derived from the first-order conditions. Because is symmetric,

Setting the derivative to zero yields .

The following identities are known (e.g. Harville, Citation1998, p. 327) for symmetric matrices and :

Here is a diagonal matrix whose diagonal elements are those of . Using the result that the estimator of is the posterior mean, we have . Substituting this result into the derivative with respect to , we find that the derivative is zero when , which yields the stated estimator.
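For reference, the entropy loss appearing in the proof is the Kullback–Leibler divergence between the two sampling models, which under normality has a standard closed form (a textbook identity; the generic symbols $\mu$, $\Sigma$ here are not the paper's notation):

```latex
\mathrm{KL}\bigl(N_p(\mu_1,\Sigma_1)\,\|\,N_p(\mu_2,\Sigma_2)\bigr)
 = \tfrac{1}{2}\Bigl[\operatorname{tr}\bigl(\Sigma_2^{-1}\Sigma_1\bigr)
 + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
 - p + \ln\frac{|\Sigma_2|}{|\Sigma_1|}\Bigr]
```

The trace and quadratic-form terms are what introduce the frequentist moment of the regressors into the first-order conditions.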
A.2. Posterior properties
A.2.1. Posterior under a normal prior for the coefficient matrix and a Jeffreys-type prior for the error covariance matrix
A commonly used noninformative prior for the error covariance matrix is the Jeffreys prior. The prior used in the RATS statistical package is a modified version of the Jeffreys prior, and Zellner's maximal data information (MDI) prior is another common choice. For analysis of prior choice in VAR models see Ni and Sun (Citation2003) and Sun and Ni (Citation2004). We consider a class of joint priors,

(A1) where is the normal prior given by (Equation10), and is given by

(A2) Note that the Jeffreys, RATS, and MDI priors are special cases of (EquationA2) when b equals p + 1, , and 1, respectively.
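The covariance priors above can be summarised as a single power family; a common way to write it (with $\Sigma$ standing in for the error covariance matrix, a notational assumption here) is:

```latex
\pi_b(\Sigma) \propto |\Sigma|^{-b/2},
\qquad\text{e.g. the Jeffreys prior}\quad
\pi_J(\Sigma) \propto |\Sigma|^{-(p+1)/2}
\quad (b = p+1).
```

Varying the exponent b thus moves smoothly between the noninformative priors compared in the text.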
We propose using the posterior quantities conditional on the simulated data, , instead of the marginal posterior, , as the estimator of . Note that the posterior

We would like to express the posterior of in terms of . Integrating out results in

where the lower case denotes the vec operator: , , and

The posterior has a closed form when the prior for can be written as with

(A3) where is an Lp + 1 by Lp + 1 known covariance matrix. In the extreme case of , , and the prior approaches a constant prior. Under assumption (EquationA3), we have

where . The mean of is . The posterior mean is estimated by .
The marginal posterior of can be obtained by integrating out in . It is easy to verify that

(A4) where the degrees of freedom and

(A5)

(A6)

It follows from (EquationA4) that

(A7) which is a matrix version of the Student-t distribution. The mean of is . To calculate the intrinsic Bayes estimator, note that the frequentist expectation can be estimated by . The posterior mean is estimated by .
A.2.2. Conditional posteriors under a normal prior for the coefficient matrix and the Yang–Berger reference prior for the error covariance matrix
Fact A.1

Consider the normal prior given in (Equation10). The conditional density of given is

where

(A8)

(A9) where

(A10)
Fact A.2

Consider the normal prior given in (Equation10). The conditional density of given is

where
Fact A.3

The conditional density of given is

(A11) where .
Fact A.4

The conditional density of given is

(A12) where .
We have shown how simulated data facilitate computation of a frequentist moment in the Bayes estimator. Below we show that the simulated data can also be used to reduce the variance of the MCMC, making the simulation more efficient.
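A minimal sketch of the simulated-data idea, using a hypothetical AR(1) model rather than the paper's LTS setup: at each posterior draw of the parameters, a synthetic sample is generated and the regressor moment is accumulated, so the frequentist expectation ends up averaged over the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(rho, sigma, T, rng):
    """Simulate a synthetic AR(1) path y_t = rho*y_{t-1} + sigma*eps_t."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + sigma * rng.standard_normal()
    return y

# Stand-ins for draws from the posterior of (rho, sigma).
posterior_draws = [(0.5 + 0.05 * rng.standard_normal(),
                    abs(1.0 + 0.1 * rng.standard_normal()))
                   for _ in range(200)]

T = 300
moment = 0.0
for rho, sigma in posterior_draws:
    # Data-augmentation step: generate a synthetic sample at the current
    # parameter values, then record the regressor moment sum(y_{t-1}^2).
    y = simulate_ar1(rho, sigma, T, rng)
    moment += np.sum(y[:-1] ** 2)
moment /= len(posterior_draws)

print(moment > 0)
```

Treating the synthetic series as a latent block inside the sampler is what lets a frequentist expectation be computed jointly with the posterior draws.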
For simulation of , we adopt a hit-and-run algorithm used in Yang and Berger (Citation1994). In implementing the algorithm, we consider a one-to-one transformation , or , in the sense that

The reason for simulating as is to ensure that the generated matrices are positive definite. It can be shown that the conditional posterior density of given is

(A13) and that the conditional posterior density of given is

(A14) where , is an orthogonal matrix, , with . Note that .
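The role of the transformation, guaranteeing that every generated covariance matrix is positive definite, can be illustrated with a simple eigenvalue-exponentiation map (a generic construction for illustration, not the exact transformation used in the paper):

```python
import numpy as np

def to_positive_definite(A):
    """Map a square matrix A to a positive-definite matrix by symmetrising
    it and exponentiating its eigenvalues: Sigma = O diag(exp(lam)) O'."""
    A = 0.5 * (A + A.T)                 # symmetrise the input
    lam, O = np.linalg.eigh(A)          # spectral decomposition A = O diag(lam) O'
    return (O * np.exp(lam)) @ O.T      # exp(lam) > 0, so Sigma is PD

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Sigma = to_positive_definite(A)

# Every eigenvalue of the image is strictly positive.
print(np.all(np.linalg.eigvalsh(Sigma) > 0))
```

Because the map from the unconstrained symmetric matrix to its matrix exponential is one-to-one, the sampler can move freely in the unconstrained space while every proposal remains a valid covariance matrix.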