![MathJax Logo](/templates/jsp/_style2/_tandf/pb2/images/math-jax.gif)
ABSTRACT
The conditional autoregressive model is a routinely used statistical model for areal data that arise from, for instances, epidemiological, socio-economic or ecological studies. Various multivariate conditional autoregressive models have also been extensively studied in the literature and it has been shown that extending from the univariate case to the multivariate case is not trivial. The difficulties lie in many aspects, including validity, interpretability, flexibility and computational feasibility of the model. In this paper, we approach the multivariate modelling from an element-based perspective instead of the traditional vector-based perspective. We focus on the joint adjacency structure of elements and discuss graphical structures for both the spatial and non-spatial domains. We assume that the graph for the spatial domain is generally known and fixed while the graph for the non-spatial domain can be unknown and random. We propose a very general specification for the multivariate conditional modelling and then focus on three special cases, which are linked to well-known models in the literature. Bayesian inference for parameter learning and graph learning is provided for the focused cases, and finally, an example with public health data is illustrated.
1. Introduction
Areal data, sometimes called lattice data, are usually represented by an undirected graph where each vertex represents an areal unit and each edge represents a neighbouring relationship. A finite set of random variables on an undirected graph, where each vertex is a random variable, is called a Markov random field if it has the Markov property. Hence, the Markov random field models are often used for the areal data. The univariate conditional autoregressive (CAR) model, originated from Besag (Citation1974), is a Gaussian Markov random field model, for which the joint distribution is multivariate Gaussian. Let be a vector of random variables on I areal units (i.e., I vertices). The zero-centred conditional autoregressive model specifies full conditional Gaussian distributions
where
is the collection of
for
. The resulting joint distribution, derived using Brook's lemma, has a density function as follows,
where
is an identity matrix;
is an
matrix whose off-diagonal entries are
and diagonal entries are zeros, and
. The joint distribution is multivariate Gaussian if and only if
is symmetric and positive definite. A further parameterization on
and
is needed to reduce the number of parameters in the model. Consider a so-called adjacency matrix
for the undirected graph, where the
th entry
if unit i and unit
are neighbours (denoted as
) and
otherwise. One popular parameterization is to let
and
, where
is the ith row sum of
, representing the total number of neighbours of unit i. Let
. When ρ is strictly between the smallest and largest eigenvalues of
, or sufficiently, when
, and
, the joint distribution of
is a zero-mean multivariate Gaussian distribution:
. This is called the proper conditional autoregressive model in the literature. When
, it is called the intrinsic conditional autoregressive model which is an improper distribution due to the singular covariance matrix.
Turning to the multivariate case, consider J responses (e.g., multiple diseases) on I areal units. Let be an
matrix-variate, where the ijth entry
is a random variable for the ith areal unit and jth response. Each column of
is an areal vector for a single response and hence can be modelled by the univariate conditional autoregressive model. However, a multivariate model is desired for the matrix-variate
in order to simultaneously model the dependence across responses. Initially proposed by Mardia (Citation1988), the multivariate conditional autoregressive model specifies full conditional distributions on row vectors of
. Let
be the ith row vector of
. Following Besag (Citation1974), specify
(1)
(1) where
and
are
matrices needing a further parameterization. To make the joint distribution for
a multivariate Gaussian,
and
must satisfy certain conditions (Mardia, Citation1988). Gelfand and Vounatsou (Citation2003) showed a convenient parameterization,
and
. When
and
is positive definite,
has a zero-mean multivariate Gaussian distribution:
. It is clear that this multivariate specification is a Kronecker-product formula where
models the covariance structure across rows of
(spatial domain) and
models the covariance structure across columns of
(response domain). From the modelling perspective, Mardia's specification has a difficulty with parameterization. It is usually difficult to have a meaningful parameterization for
and
unless one pursues a simple formulation. It is arguable that the Mardia's specification presents a conflict, where the between vector variation is specified through an inverse covariance matrix, but the within vector variation is specified through a covariance matrix. It seems more intuitive to either work with the joint covariance or the joint inverse covariance directly. Notice that most multivariate spatial models for point reference data focus on the joint covariance structure. In this paper, we focus on the joint inverse covariance structure of elements in the multivariate areal data. In particular, we consider the joint adjacency structure of the lattice based on graphical structures of both the spatial domain and the response domain. We build a framework for graph-based multivariate conditional autoregressive models and discuss parameterizations under this framework. The advantage is that this framework is very general and we demonstrate it through multiple case examples. Furthermore, we allow graph learning for multiple responses in such models, which is potentially useful for many modern applications.
We shall point out other recent work on multivariate conditional autoregressive models. Kim, Sun, and Tsutakawa (Citation2001) and Jin, Carlin, and Banerjee (Citation2005) proposed conditional autoregressive models for bivariate areal data. Multivariate models were considered by Gelfand and Vounatsou (Citation2003), Jin, Banerjee, and Carlin (Citation2007), MacNab (Citation2011, Citation2016), Martinez-Beneito (Citation2013), Martinez-Beneito, Botella-Rocamora, and Banerjee (Citation2017) among many others. MacNab (Citation2018) reviewed some recent developments on multivariate Gaussian Markov random field models. We will show that some of the earlier work can be reconstructed in our proposed framework and some can be extended to graphical models. The paper is organised as follows. Section 2 presents the general framework and three special parameterizations. Section 3 presents a real data example using the proposed models. Section 5 contains further discussions and remarks. Technical details are given in the Appendices.
2. Graph-based multivariate conditional autoregressive models
2.1. General framework
Instead of specifying full conditional distributions on vectors like (Equation1(1)
(1) ), we approach this problem from an element-based perspective. Following Besag (Citation1974), specify full conditional distributions for each element
in the matrix-variate
as follows,
where
means either
or
. In fact, here we consider a lattice consisting of all elements in
. Using Brook's lemma, the resulting joint distribution for
is
where
is an
identity matrix,
and
can be expressed block-wisely,
and the diagonal elements
are zeros. The joint distribution for
is multivariate Gaussian if and only if
is symmetric and positive definite. It is desired that
and
are further parameterised to reduce the number of parameters in the model. We denote this general model
for later use.
Consider the adjacency structure of the undirected graph for the lattice of . In the univariate situation, the adjacency structure is determined by those geographical locations. Two areal units are connected by an edge if they are neighbours geographically. However, it is not obvious which elements should be neighbours in
. Consider that the J responses can be connected through an undirected graph. Let
be the adjacency matrix for all I areal units and
be the adjacency matrix for all J responses. Both the spatial graph and the response graph are then uniquely determined by
and
, respectively. Let
be the joint adjacency matrix for the lattice of
. A general construction of
can be made through
and
,
(2)
(2) This construction connects
with
,
and
, meaning its spatial neighbour, response neighbour and interaction neighbour, respectively. One may add edges for secondary neighbours or drop edges in a specific modelling. For example, some reduced constructions would be (i)
(independent conditional autoregressive models, no dependence between responses); (ii)
(independent multivariate variables, no spatial dependence); and (iii)
(drop edges for interaction neighbours
).
Let denote entries in
, analogous to the block-wise notation
for
. Let
be the jth row sum in
and
be the ith row sum in
. Then the ijth row sum in
is
. Let
,
and
. With the adjacency constructions and notations, we then explore further parameterisation on
and
in the following subsections, and specifically, we discuss three specifications made from this general framework, all of which are linked to well-known models in the literature.
2.2. Model 1: nonseparable multifold specification
Kim et al. (Citation2001) developed a twofold conditional autoregressive model for bivariate areal data (J = 2), using different linkage parameters for different types of neighbours. Those linkage parameters, in their work, are called smoothing and bridging parameters, representing the strength of information sharing. If we extend their specification to an arbitrary J, we can parameterise and
in the following way (assuming
and
):
where
,
and
are linkage parameters and
are variance components. Linkage parameters are for three types of neighbour:
,
and
. Having this specification, the conditional mean of
essentially is
which is a weighted average of all its neighbours in
. This specification generalises (Kim et al., Citation2001)' twofold model and hence could be called a multifold specification. It can be shown that, for this parameterisation, the joint precision matrix is
(3)
(3) where
,
,
and
are
symmetric matrices with entries
and
, respectively, and the operator ° means an element-wise product. A derivation of (Equation3
(3)
(3) ) is given in Appendix 1. Note that only nonzero entries of
and
are parameters in the model, the number of which depends on
.
In order to make (Equation3(3)
(3) ) positive definite, constraints on
,
and
are needed, assuming that
. In general, it is difficult to find a sufficient and necessary condition for the positive definiteness of (Equation3
(3)
(3) ). Kim et al. (Citation2001)'s solution to this problem was a sufficient condition:
, under which the matrix (Equation3
(3)
(3) ) is diagonally dominant and hence is positive definite. Though their proof was under J = 2, it is true for any J by the same arguments. The advantage of this condition is that it is simple and implementable. However, this is not a necessary condition meaning that it is impossible to reach all possible positive definite structures for the model under such a condition. In a Bayesian model, priors on parameters
,
and
can be chosen based on their actual constraints. In our case, a uniform prior
is adequate for these linkage parameters. Priors on the variance components
can be weakly-informative inverse gamma priors
. Inference and computation under this model are given in Appendix 2.
2.3. Model 2: separable specification with homogeneous spatial smoothing
Gelfand and Vounatsou (Citation2003)'s Kronecker-product model is a convenient parameterisation of Mardia (Citation1988)'s model. In our framework, this specification can be obtained and extended by having the following parameterisation for and
(assuming
and
):
where ρ and
are linkage parameters, and
are variance components. This parameterisation does not seem straightforward, but is much clearer in the form of conditional mean:
(4)
(4) Note that for a single response, the univariate conditional autoregressive model specifies
. In the multivariate setting,
is no longer the conditional mean for
and their conditional difference is regressed on other differences through
. This parameterisation yields the joint precision matrix
(5)
(5) where
is a symmetric
matrix with entries
. A derivation of (Equation5
(5)
(5) ) is given in Appendix 1. The linkage parameter ρ is interpreted as a spatial smoothing parameter and
controls the dependence across J responses. It is noteworthy that only nonzero entries in
are parameters in the model and we denote
for simplicity. The notation
, commonly used in graphical models, means the precision matrix restricted by graph
. The model (Equation5
(5)
(5) ) is a natural extension of Gelfand and Vounatsou (Citation2003)'s model. When
is the complete graph (any two vertices are connected),
is free of zero entries. Then let
and (Equation5
(5)
(5) ) is equivalent to Gelfand and Vounatsou (Citation2003)'s specification. We call this a completely separable specification because the Kronecker product completely separates the spatial domain and the response domain. This complete separation is often not desirable because it makes the spatial smoothing common for all j. We call this homogeneous spatial smoothing because the linkage ρ is the same for any i and
which distinguishes Model 2 from Model 3 in the next section.
The joint precision matrix (Equation5(5)
(5) ) is positive definite if
and
is positive definite. Let
be the cone of symmetric positive definite matrices restricted by
and then
. In a Bayesian model, a widely used prior on
is the G-Wishart distribution (Atay-Kayis & Massam, Citation2005; Letac & Massam, Citation2007). The G-Wishart distribution is a conjugate family for the precision matrix of a Gaussian graphical model, whose density function is given by
where b>2 is the number of degrees of freedom;
is the scale matrix and
is the normalising constant. It is practically attractive because of its conjugacy. That said, for a prior distribution
and a given sample covariance matrix
of sample size n, the posterior distribution of
is
. Inference and computation under this model are given in Appendix 2.
2.4. Model 3: separable specification with heterogenous spatial smoothing
Dobra, Lenkoski, and Rodriguez (Citation2011) introduced a multivariate lattice model by giving Kronecker-product G-Wishart priors to the matrix-variate . In our framework,
and
can be parameterised in the following way (assuming
and
):
which is equivalent to the version of conditional mean
(6)
(6) Comparing (Equation6
(6)
(6) ) with (Equation4
(4)
(4) ), instead of a homogeneous spatial smoothing with ρ, it has a heterogeneous specification with
. This is hence more flexible in the spatial domain. The resulting joint precision matrix is
(7)
(7) where
is a symmetric
matrix with entries
and
is a symmetric
matrix with entries
. A derivation of (Equation7
(7)
(7) ) is given in Appendix 1. We again use
and
for simplicity. In model (Equation5
(5)
(5) ), the spatial part is the conventional conditional autoregressive model while in model (Equation7
(7)
(7) ), it is modelled by a more flexible one
.
The precision matrix (Equation7(7)
(7) ) is positive definite if both
and
are positive definite. In a Bayesian model, both can have G-Wishart priors. The specification has an obvious problem of identification:
, where z is an arbitrary constant scalar. Following Wang and West (Citation2009), one can impose a constraint
and add an auxiliary variable z. Then specify a joint prior on
:
(8)
(8) where
is the density of G-Wishart distribution. Transform this joint density to
and we obtain the desired joint prior. There is no additional constraint imposed on
and let
. Inference and computation under this model are given in Appendix 2.
2.5. Priors for the graph
The two types of graphs used in this modelling framework should be treated differently. On one hand, the spatial graph should be treated known and fixed because the geographical locations and their neighbouring structure is fixed in most scenarios. On the other hand, the response graph should be treated unknown because we often know little about the relationship between multiple responses. In the literature of Gaussian graphical model determination, usually the unknown graph is assumed random and a prior on the graph is assigned. The Markov chain Monte Carlo (MCMC) sampling scheme, such as the reversible jump MCMC (Green, Citation1995), is often used to sample graphs from the posterior distribution. In this paper, we adopt and slightly modify existing MCMC algorithms for the graph determination (Dobra et al., Citation2011; Wang & Li, Citation2012), with computational details given in Appendix 2, for each aforementioned model. For the prior choice of , consider
(9)
(9) where
is the beta function, m is the total number of possible edges
,
, and a and b are given hyperparameters. More details about this prior can be found in Scott and Berger (Citation2006) and Scott and Carvalho (Citation2009). The following prior is often used as well (Dobra et al., Citation2011):
(10)
(10) where
is a given hyperparameter. Sparser graphs can be favoured by choosing a small value for π. The prior (Equation9
(9)
(9) ) can be obtained by integrating π out with a hyperprior
on π.
3. An application
We illustrate the proposed models with a real example of disease mapping. It is known that smoking is linked with multiple diseases in the population, of which leading diseases include lung diseases and heart diseases. The dataset under consideration here includes six variables, among which four variables are related to the smoke exposure and the other two are diseases. Obtained from the 2011 Missouri County Level Survey, the four smoke exposure variables are: Current Cigarette Smoking, Current Smokeless Tobacco Use, Current Other Tobacco Use, Exposure to Secondhand Smoke. Data are binary responses to the survey questionnaires (Yes or No), aggregated to each county level. The other two variables, obtained from the Surveillance, Epidemiology and End Results (SEER) program, are the Lung Cancer Mortality and the Heart Diseases Mortality, both of which are counts for each county within a specified time period. To summarise, we have I = 115 counties and J = 6 response variables. Let be the numbers of respondents in the survey and let
and
be the age-adjusted expected mortality for the two diseases. Then, the proportions
are empirical estimates of the prevalences of the survey variables, and the proportions
are standardised mortality ratios of the diseases.
Consider a Bayesian hierarchical model for . We use the binomial-logit model and the Poisson-lognormal model (Banerjee, Gelfand, & Carlin, Citation2004) for
and
, respectively, i.e.,
For simplicity, we do not consider other covariates in this example. The primary interest here is to model the random effects
, which are expected to be correlated in both the spatial domain and the response domain. To complete the model specification, specify a weakly-informative normal prior for the intercepts
and a multivariate conditional autoregressive model
for the random effects
. We apply the three proposed versions of
here. Hyperparameters for prior distributions are specified as follows. For the graph, noticing that the choice of π in (Equation10
(10)
(10) ) can influence the posterior inference, we consider the prior (Equation10
(10)
(10) ) with both
in favour of a sparse graph and
as no preference. All other priors are chosen to be only weakly-informative and have little impact on the posterior inference. In Model 1, we specify hyperparameters in the inverse gamma prior as
. In Model 2, we specify hyperparameters in the G-Wishart prior as b = 3 and
. In Model 3, we specify hyperparameters in the two G-Wishart priors as
and
, which implies a prior mode at a proper conditional autoregressive model. For each model, we perform the Markov chain Monte Carlo for 150,000 iterations with a burn-in size of 50,000. Posterior results are based on the remaining samples. Figure shows the convergence of the log-joint-posterior and notice that they all converge quickly.
Figure 1. Data analysis: convergence of the log-joint posterior under the three models (first 50,000 iterations), where
represents the collection of all parameters.
![Figure 1. Data analysis: convergence of the log-joint posterior logp(θ∣data) under the three models (first 50,000 iterations), where θ represents the collection of all parameters.](/cms/asset/f53ab6d5-dd24-486d-8f44-278c209e9d51/tstf_a_1666242_f0001_ob.jpg)
Table shows the posterior edge inclusion probabilities for the response graph . First, all three models seem to agree on the link between Cigarette Smoking and Secondhand Smoke Exposure, as well as the link between Lung Diseases Mortality and Heart Diseases Mortality. There is a moderate agreement on the links between Secondhand Smoke Exposure and Lung Diseases Mortality, and between Cigarette Smoking and Lung Diseases Mortality. In general, Model 1 tends to be a sparser graph, which is possibly due to the diagonal dominance condition. Model 2 is the simplest model as reflected by its pD, the effective number of parameters but has the largest DIC. Model 3 is the most flexible model among the three, and the inferred graph tends to be denser than the other two. It is as expected that its pD is larger but the overall criterion DIC is much smaller than the other two. Second, the edge inclusion probabilities are in general higher when
, as expected, but it has little material impact on the final inferred graph. The DIC has little change with different π values. Lastly, Figures – show the maps of spatial random effects
for the three models, respectively, and for a problem of disease mapping, this is often the eventual output for practitioners.
Table 1. Data analysis: Posterior edge inclusion probabilities for the response graph and deviance information criterion for Models 1–3, respectively.
4. Simulation
To validate the proposed algorithms, we perform a simulation study on a regular grid (I = 49 area units) with J = 4 response variables. Consider the true response graph with two edges
and
. In this simulation study, we do not consider the scenario with misspecified models, and therefore, data are generated under each of the three models and the correct model is then used for inference. The parameter settings are given as follows. For Model 1,
,
,
and
. For Model 2,
,
,
and
. For Model 3, parameters are the same as Model 2 but
is generated from
. We repeat the simulation and inference process for L = 50 times and for each time, the MCMC iteration number is 5000. We consider three measures for validating and comparing the three algorithms. The first measure is the mean inclusion probability matrix with standard deviations. We call the second measure the error rate of mis-identified edges. If we use 0.5 as the threshold for identifying an edge in the graph, for each replication, we obtain an inferred graph and then compare with the true graph to record a proportion of wrong edges/non-edges. The error rate is the average proportion of L replications. The third measure is the mean absolute error (MAE) of random effects in the model,
where
is the true value and
is the posterior mean.
Simulation results are given in Table . For all three models, the algorithms can correctly identify the true edges. The algorithm for Model 1 appears to be unstable as the standard deviation is large and tends to underestimate inclusion probabilities, while the algorithm for Model 3 tends to overestimate inclusion probabilities for non-edges. The algorithm for Model 2 presents the smallest error rate and MAE. Note that this simulation study validates the proposed algorithms under correct model specifications and hence the result cannot imply that Model 2 is the best model for a real dataset. In fact, as shown in the data analysis, Model 2 is the simplest specification and is the least preferred model in that case according to DIC.
Table 2. Simulation: Mean posterior edge inclusion probabilities (standard deviations in parentheses), error rates of mis-identified edges, and the mean absolute errors of random effects.
5. Further discussion
In this paper, we proposed a modelling framework for multivariate areal data from a graphical model perspective. We rebuilt three well-known models in our framework and developed Bayesian inference tools for the proposed models. It is our perspective that this framework is very general and can contain other models that are beyond the cases discussed in the paper. For example, Jin et al. (Citation2007) specified a co-regionalised areal data model, in which their Case 3 is a very general specification. We show that this specification can be reproduced and extended in our framework. Consider the Cholesky decomposition . Jin et al. (Citation2007)'s Case 3 specification of the joint covariance matrix is
whose inverse is then
(11)
(11) where
is a symmetric
matrix. Let
and
. Obviously it is one-to-one from
to
. Specification (Equation11
(11)
(11) ) is hence equivalent to
(12)
(12) where
is a symmetric
matrix with entries
. To reproduce this specification in our framework, parameterise
and
as follows (assuming
and
):
This parameterisation leads to the joint precision matrix
(13)
(13) The expression (Equation13
(13)
(13) ) reduces to (Equation12
(12)
(12) ) which is equivalent to Jin et al. (Citation2007)'s (Equation11
(11)
(11) ) when
is a complete graph. A derivation of (Equation13
(13)
(13) ) is given in Appendix 1. The validity of this model relies on the positive definiteness of (Equation13
(13)
(13) ). Jin et al. (Citation2007) showed that it is positive definite if
is positive definite and eigenvalues of
are between
and
, reciprocals of the smallest and largest eigenvalues of
, which are known constants. The graphical version (Equation13
(13)
(13) ) must also satisfy this condition, that is,
, where
is any eigenvalue of
. Considering that both
and
are restricted by the underlying graph, the eigenvalue condition is not easy to implement in computations. This matter is worth investigating in the future.
In general, flexible models are desired for modelling multivariate areal data because overly simplistic models may misspecify the true underlying covariance structure. However, there is almost always a trade-off between the simplicity and the flexibility of a model. It is probably reasonable to allow certain flexibilities for specific purposes, such as in this paper, for learning a graphical relationship between multiple responses. It is usually the practitioner's choice whether a more flexible but complicated model is needed for the problem at hand, especially when the performance improvement is negligible.
Acknowledgments
The author would like to thank two editors and two anonymous referees for their helpful comments which lead to a much improved paper.
Disclosure statement
No potential conflict of interest was reported by the author.
ORCID
Ye Liang http://orcid.org/0000-0002-6513-8962
Additional information
Notes on contributors
Ye Liang
Ye Liang is an associate professor in the Department of Statistics at Oklahoma State University. He received his Ph.D. in statistics from the University of Missouri.
References
- Atay-Kayis, A., & Massam, H. (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika, 92, 317–335. doi: 10.1093/biomet/92.2.317
- Banerjee, S., Gelfand, A., & Carlin, B. (2004). Hierarchical modeling and analysis for spatial data. Boca Raton, FL: CRC Press/Chapman & Hall.
- Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B, 36, 192–236.
- Dobra, A., Lenkoski, A., & Rodriguez, A. (2011). Bayesian inference for general Gaussian graphical models with applications to multivariate lattice data. Journal of the American Statistical Association, 106, 1418–1433. doi: 10.1198/jasa.2011.tm10465
- Gelfand, A., Sahu, S., & Carlin, B. (1995). Efficient parameterizations for normal linear mixed models. Biometrika, 82, 479–488. doi: 10.1093/biomet/82.3.479
- Gelfand, A., & Vounatsou, P. (2003). Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics, 4, 11–25. doi: 10.1093/biostatistics/4.1.11
- Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732. doi: 10.1093/biomet/82.4.711
- Jin, X., Banerjee, S., & Carlin, B. (2007). Order-free co-regionalized areal data models with application to multiple-disease mapping. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69, 817–838. doi: 10.1111/j.1467-9868.2007.00612.x
- Jin, X., Carlin, B., & Banerjee, S. (2005). Generalized hierarchical multivariate CAR models for areal data. Biometrics, 61, 950–961. doi: 10.1111/j.1541-0420.2005.00359.x
- Kim, H., Sun, D., & Tsutakawa, R. (2001). A bivariate Bayes method for improving estimates of mortality rates with a twofold conditional autoregressive model. Journal of the American Statistical Association, 96, 1506–1521. doi: 10.1198/016214501753382408
- Letac, G., & Massam, H. (2007). Wishart distributions for decomposable graphs. The Annals of Statistics, 35, 1278–1323. doi: 10.1214/009053606000001235
- MacNab, Y. (2011). On Gaussian Markov random fields and Bayesian disease mapping. Statistical Methods in Medical Research, 20, 49–68. doi: 10.1177/0962280210371561
- MacNab, Y. C. (2016). Linear models of coregionalization for multivariate lattice data: A general framework for coregionalized multivariate CAR models. Statistics in Medicine, 35, 3827–3850. doi: 10.1002/sim.6955
- MacNab, Y. C. (2018). Some recent work on multivariate Gaussian Markov random fields. Test, 27, 497–541. doi: 10.1007/s11749-018-0605-3
- Mardia, K. (1988). Multidimentional multivariate Gaussian Markov random fields with application to image processing. Journal of Multivariate Analysis, 24, 265–284. doi: 10.1016/0047-259X(88)90040-1
- Martinez-Beneito, M. (2013). A general modeling framework for multivariate disease mapping. Biometrika, 100, 539–553. doi: 10.1093/biomet/ast023
- Martinez-Beneito, M. A., Botella-Rocamora, P., & Banerjee, S. (2017). Towards a multidimensional approach to Bayesian disease mapping. Bayesian Analysis, 12, 239–259. doi: 10.1214/16-BA995
- Scott, J., & Berger, J. (2006). An exploratory of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference, 136, 2144–2162. doi: 10.1016/j.jspi.2005.08.031
- Scott, J., & Carvalho, C. (2009). Feature-inclusion stochastic search for Gaussian graphical models. Journal of Computational and Graphical Statistics, 17, 790–808. doi: 10.1198/106186008X382683
- Wang, H., & Li, S. (2012). Efficient Gaussian graphical model determination under G-Wishart prior distributions. Electronic Journal of Statistics, 6, 168–198. doi: 10.1214/12-EJS669
- Wang, H., & West, M. (2009). Bayesian analysis of matrix normal graphical models. Biometrika, 96, 821–834. doi: 10.1093/biomet/asp049
Appendices
Appendix 1. Derivations
A.1. Derivation of Equation (3)
With the parameterisation in Model 1, we have and
Then immediately we have expression (Equation3
(3)
(3) ) for
.
A.2. Derivation of Equation (5)
With the parameterisation in Model 2, we have and
Then
which is expression (Equation5
(5)
(5) ).
A.3. Derivation of Equation (7)
With the parameterisation in Model 3, we have and
Then
which is expression (Equation7
(7)
(7) ).
A.4. Derivation of Equation (13)
With the parameterisation in Model 4, we have and
Then
which is expression (Equation13
(13)
(13) ).
Appendix 2: Bayesian computations
A.5. A hierarchical generalised linear model
For illustration, we now assume a full Bayesian hierarchical model and give computational details for this model. Assume binomial counts for J responses and I areal units. Specify a Bayesian model as follows, for
and
,
where
is a given constant,
is the matrix-variate of
, and priors for
and
depend on the specific parameterisation. This section is organised as follows: we first give details of updating effects parameters
and
, and then, separately for each model, details of updating parameters of
and updating the random response graph
.
A.6. Updating effects parameters
Our experience has shown that the convergence is poor if we directly update and
. We apply the hierarchical centring technique (Gelfand, Sahu, & Carlin, Citation1995) and block sampling. Let
and
has a non-centred MCAR prior. We update
instead of
. The full conditional distribution of
is
We use Metropolis-Hastings algorithm to sample
from this conditional density. We block sample
in the following way. For now denote
, the joint precision matrix. Let
and
be a
vector such that
is the sum of the first I elements in
,
is the sum of the second I elements in
and so on. Partition
into
blocks and define
where
is the all-one vector. Then the full conditional distribution for the vector
is
A.7. Model 1: updating ![](//:0)
, ![](//:0)
, ![](//:0)
, ![](//:0)
and ![](//:0)
![](//:0)
Given the current graph , parameters are updated through Gibbs sampling. Recall priors on these parameters:
and
. Let
be the jth column vector of
,
and
be the jth diagonal block of
. The full conditional distribution of
is given by
It can be shown that the transformed one
is log-concave when
. Thus, we use the adaptive rejection sampling to update
.
Let be an
matrix, where
and
be the jth column vector of
. Let
as in (Equation3
(3)
(3) ). Then
,
and
are sequentially updated through following full conditional distributions,
We use Metropolis-Hastings algorithm to update these parameters. Note that evaluating the sparse
could be computationally intensive. An efficient algorithm, usually based on the Cholesky decomposition, on sparse matrices is helpful.
The graph is updated through a simple reversible jump MCMC algorithm. Propose a new graph
by only adding or deleting one edge from
. Without loss of generality, suppose that one edge
is added to the new graph. Dimension has been changed by 2 from
to
. Propose
and
, and let
and
. The Jacobian from
to
hence is 1. Choose a Bernoulli jump proposal with odds
and systematically scan through the graph for updating. Accept the move from
to
with probability
where
A.8. Model 2: updating ρ, ![](//:0)
and ![](//:0)
![](//:0)
Given the current graph , parameters are updated through Gibbs sampling. Recall priors on these parameters:
and
. Use Metropolis-Hastings algorithm to update ρ. It can be shown that the full conditional distribution for ρ is
Let
be an
matrix, where
and
be the jth column vector of
. Let
be an
matrix with
. Then the full conditional distribution of
is
which is GWis(
). For sampling from the G-Wishart distribution, we use the block Gibbs sampler, given the set of maximum cliques, introduced by Wang and Li (Citation2012).
The graph is updated using (Wang & Li, Citation2012)'s partial analytic structure algorithm (p. 188, Algorithm 2).
A.9. Model 3: updating ![](//:0)
, ![](//:0)
and ![](//:0)
![](//:0)
Given the current graph , parameters are updated through Gibbs sampling. Recall that we impose a constraint and use a joint prior (Equation8
(8)
(8) ) on
and have
. Let both
and
be
matrices, where
and
. Let
be the jth column vector of
and
be the ith row vector of
. Then let
be
with
and
be
with
. With these notations, we have
where
and
;
and
The graph
is updated using (Wang & Li, Citation2012)'s partial analytic structure algorithm (p. 188, Algorithm 2).