
Graph-based multivariate conditional autoregressive models

Pages 158-169 | Received 25 Nov 2018, Accepted 07 Sep 2019, Published online: 16 Sep 2019

ABSTRACT

The conditional autoregressive model is a routinely used statistical model for areal data that arise from, for instance, epidemiological, socio-economic or ecological studies. Various multivariate conditional autoregressive models have also been extensively studied in the literature, and it has been shown that extending from the univariate case to the multivariate case is not trivial. The difficulties lie in many aspects, including validity, interpretability, flexibility and computational feasibility of the model. In this paper, we approach multivariate modelling from an element-based perspective instead of the traditional vector-based perspective. We focus on the joint adjacency structure of elements and discuss graphical structures for both the spatial and non-spatial domains. We assume that the graph for the spatial domain is generally known and fixed, while the graph for the non-spatial domain can be unknown and random. We propose a very general specification for multivariate conditional modelling and then focus on three special cases, which are linked to well-known models in the literature. Bayesian inference for parameter learning and graph learning is provided for each of the three cases, and finally, an example with public health data is presented as illustration.

1. Introduction

Areal data, sometimes called lattice data, are usually represented by an undirected graph in which each vertex represents an areal unit and each edge represents a neighbouring relationship. A finite set of random variables on an undirected graph, one per vertex, is called a Markov random field if it has the Markov property; hence Markov random field models are often used for areal data. The univariate conditional autoregressive (CAR) model, originating with Besag (Citation1974), is a Gaussian Markov random field model, whose joint distribution is multivariate Gaussian. Let $u=(u_1,\ldots,u_I)^T$ be a vector of random variables on $I$ areal units (i.e., $I$ vertices). The zero-centred conditional autoregressive model specifies full conditional Gaussian distributions
$$u_i \mid u_{-i} \sim N\Bigl(\sum_{i'\neq i} b_{ii'}u_{i'},\ \tau_i^2\Bigr),\quad i=1,\ldots,I,$$
where $u_{-i}$ is the collection of $u_{i'}$ for $i'\neq i$. The resulting joint distribution, derived using Brook's lemma, has density
$$f(u\mid T_{\mathrm{CAR}},B_{\mathrm{CAR}})\propto \exp\Bigl\{-\tfrac12\, u^T T_{\mathrm{CAR}}^{-1}(I-B_{\mathrm{CAR}})u\Bigr\},$$
where $I$ is an identity matrix, $B_{\mathrm{CAR}}$ is an $I\times I$ matrix whose off-diagonal entries are $b_{ii'}$ and whose diagonal entries are zeros, and $T_{\mathrm{CAR}}=\mathrm{diag}\{\tau_1^2,\ldots,\tau_I^2\}$. The joint distribution is multivariate Gaussian if and only if $T_{\mathrm{CAR}}^{-1}(I-B_{\mathrm{CAR}})$ is symmetric and positive definite. A further parameterisation of $B_{\mathrm{CAR}}$ and $T_{\mathrm{CAR}}$ is needed to reduce the number of parameters in the model. Consider the so-called adjacency matrix $C_{\mathrm{CAR}}$ of the undirected graph, whose $ii'$th entry is $C_{ii'}=1$ if units $i$ and $i'$ are neighbours (denoted $i\sim i'$) and $C_{ii'}=0$ otherwise. One popular parameterisation sets $b_{ii'}=\rho C_{ii'}/C_{i+}$ and $\tau_i^2=\sigma^2/C_{i+}$, where $C_{i+}$ is the $i$th row sum of $C_{\mathrm{CAR}}$, i.e., the total number of neighbours of unit $i$. Let $D_{\mathrm{CAR}}=\mathrm{diag}\{C_{1+},\ldots,C_{I+}\}$. When $\rho$ lies strictly between the reciprocals of the smallest and largest eigenvalues of $D_{\mathrm{CAR}}^{-1/2}C_{\mathrm{CAR}}D_{\mathrm{CAR}}^{-1/2}$ (sufficiently, when $|\rho|<1$) and $\sigma^2>0$, the joint distribution of $u$ is a zero-mean multivariate Gaussian distribution: $u\sim N\{0,\ \sigma^2(D_{\mathrm{CAR}}-\rho C_{\mathrm{CAR}})^{-1}\}$. This is called the proper conditional autoregressive model in the literature. When $\rho=1$, it is called the intrinsic conditional autoregressive model, which is an improper distribution due to the singular precision matrix $D_{\mathrm{CAR}}-C_{\mathrm{CAR}}$.
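For concreteness, the following minimal numpy sketch (our own illustration; the function name and toy graph are not from the paper) assembles the proper CAR precision matrix $\sigma^{-2}(D_{\mathrm{CAR}}-\rho C_{\mathrm{CAR}})$ for a small path graph and confirms positive definiteness when $|\rho|<1$:

```python
import numpy as np

def car_precision(C, rho, sigma2):
    """Precision matrix of the proper CAR model: (D_CAR - rho * C_CAR) / sigma^2."""
    D = np.diag(C.sum(axis=1))   # D_CAR: diagonal matrix of neighbour counts
    return (D - rho * C) / sigma2

# A four-unit path graph 1-2-3-4.
C = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
Q = car_precision(C, rho=0.9, sigma2=1.0)
assert np.all(np.linalg.eigvalsh(Q) > 0)   # proper whenever |rho| < 1
```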

Turning to the multivariate case, consider $J$ responses (e.g., multiple diseases) on $I$ areal units. Let $U$ be an $I\times J$ matrix-variate whose $ij$th entry $u_{ij}$ is a random variable for the $i$th areal unit and $j$th response. Each column of $U$ is an areal vector for a single response and hence can be modelled by the univariate conditional autoregressive model. However, a multivariate model is desired for the matrix-variate $U$ in order to simultaneously model the dependence across responses. Initially proposed by Mardia (Citation1988), the multivariate conditional autoregressive model specifies full conditional distributions on the row vectors of $U$. Let $u_i$ be the $i$th row vector of $U$. Following Besag (Citation1974), specify
(1) $$u_i \mid u_{-i} \sim N\Bigl(\sum_{i'\neq i} B_{ii'}u_{i'},\ \Sigma_i\Bigr),\quad i=1,\ldots,I,$$
where $B_{ii'}$ and $\Sigma_i$ are $J\times J$ matrices requiring further parameterisation. For the joint distribution of $\mathrm{vec}(U^T)$ to be multivariate Gaussian, $B_{ii'}$ and $\Sigma_i$ must satisfy certain conditions (Mardia, Citation1988). Gelfand and Vounatsou (Citation2003) gave a convenient parameterisation, $B_{ii'}=(\rho C_{ii'}/C_{i+})I_J$ and $\Sigma_i=\Sigma/C_{i+}$. When $|\rho|<1$ and $\Sigma$ is positive definite, $\mathrm{vec}(U^T)$ has a zero-mean multivariate Gaussian distribution: $\mathrm{vec}(U^T)\sim N\{0,\ (D_{\mathrm{CAR}}-\rho C_{\mathrm{CAR}})^{-1}\otimes\Sigma\}$. This multivariate specification is clearly a Kronecker-product formula in which $(D_{\mathrm{CAR}}-\rho C_{\mathrm{CAR}})^{-1}$ models the covariance structure across the rows of $U$ (spatial domain) and $\Sigma$ models the covariance structure across the columns of $U$ (response domain). From the modelling perspective, Mardia's specification is difficult to parameterise: a meaningful parameterisation for $B_{ii'}$ and $\Sigma_i$ is hard to obtain unless one pursues a simple formulation. It is arguable that Mardia's specification presents a conflict, in that the between-vector variation is specified through an inverse covariance matrix while the within-vector variation is specified through a covariance matrix. It seems more intuitive to work directly with either the joint covariance or the joint inverse covariance. Notice that most multivariate spatial models for point-referenced data focus on the joint covariance structure. In this paper, we focus on the joint inverse covariance structure of the elements in multivariate areal data. In particular, we consider the joint adjacency structure of the lattice based on graphical structures of both the spatial domain and the response domain. We build a framework for graph-based multivariate conditional autoregressive models and discuss parameterisations under this framework. The framework is very general, as we demonstrate through multiple case examples. Furthermore, it allows graph learning for multiple responses, which is potentially useful for many modern applications.

We shall point out other recent work on multivariate conditional autoregressive models. Kim, Sun, and Tsutakawa (Citation2001) and Jin, Carlin, and Banerjee (Citation2005) proposed conditional autoregressive models for bivariate areal data. Multivariate models were considered by Gelfand and Vounatsou (Citation2003), Jin, Banerjee, and Carlin (Citation2007), MacNab (Citation2011, Citation2016), Martinez-Beneito (Citation2013) and Martinez-Beneito, Botella-Rocamora, and Banerjee (Citation2017), among many others. MacNab (Citation2018) reviewed some recent developments on multivariate Gaussian Markov random field models. We will show that some of this earlier work can be reconstructed in our proposed framework and some can be extended to graphical models. The paper is organised as follows. Section 2 presents the general framework and three special parameterisations. Section 3 presents a real data example using the proposed models. Section 4 reports a simulation study validating the proposed algorithms. Section 5 contains further discussion and remarks. Technical details are given in the Appendices.

2. Graph-based multivariate conditional autoregressive models

2.1. General framework

Instead of specifying full conditional distributions on vectors as in (1), we approach the problem from an element-based perspective. Following Besag (Citation1974), specify full conditional distributions for each element $u_{ij}$ of the matrix-variate $U$:
$$u_{ij}\mid u_{-\{ij\}} \sim N\Bigl(\sum_{\{i'j'\}\neq\{ij\}} b_{\{ij\},\{i'j'\}}u_{i'j'},\ \tau_{ij}^2\Bigr),\quad i=1,\ldots,I,\ j=1,\ldots,J,$$
where $\{i'j'\}\neq\{ij\}$ means either $i'\neq i$ or $j'\neq j$. In effect, we consider a lattice consisting of all elements of $U$. Using Brook's lemma, the resulting joint distribution for $\mathrm{vec}(U)$ is
$$f(\mathrm{vec}(U)\mid B,T)\propto \exp\Bigl\{-\tfrac12\,\mathrm{vec}(U)^T T^{-1}(I-B)\,\mathrm{vec}(U)\Bigr\},$$
where $I$ is an $IJ\times IJ$ identity matrix, $T=\mathrm{diag}\{\tau_{11}^2,\ldots,\tau_{I1}^2,\ldots,\tau_{1J}^2,\ldots,\tau_{IJ}^2\}$, and $B$ can be expressed block-wise as
$$B=\begin{pmatrix} B_{11} & \cdots & B_{1J}\\ \vdots & \ddots & \vdots\\ B_{J1} & \cdots & B_{JJ}\end{pmatrix},\quad\text{where}\quad B_{jj'}=\begin{pmatrix} b_{\{1j\},\{1j'\}} & \cdots & b_{\{1j\},\{Ij'\}}\\ \vdots & \ddots & \vdots\\ b_{\{Ij\},\{1j'\}} & \cdots & b_{\{Ij\},\{Ij'\}}\end{pmatrix},$$
and the diagonal elements $b_{\{ij\},\{ij\}}$ are zeros. The joint distribution for $\mathrm{vec}(U)$ is multivariate Gaussian if and only if $T^{-1}(I-B)$ is symmetric and positive definite. It is desirable that $B$ and $T$ be further parameterised to reduce the number of parameters in the model. We denote this general model $\mathrm{MCAR}(B,T)$ for later use.

Consider the adjacency structure of the undirected graph for the lattice of $U$. In the univariate situation, the adjacency structure is determined by the geographical locations: two areal units are connected by an edge if they are geographical neighbours. However, it is not obvious which elements of $U$ should be neighbours. Suppose that the $J$ responses can be connected through an undirected graph. Let $C^{(s)}$ be the adjacency matrix for the $I$ areal units and $C^{(r)}$ the adjacency matrix for the $J$ responses. The spatial graph and the response graph are then uniquely determined by $C^{(s)}$ and $C^{(r)}$, respectively. Let $C$ be the joint adjacency matrix for the lattice of $U$. A general construction of $C$ can be made through $C^{(s)}$ and $C^{(r)}$:
(2) $$C = C^{(r)}\otimes C^{(s)} + C^{(r)}\otimes I_I + I_J\otimes C^{(s)}.$$
This construction connects $u_{ij}$ with $u_{i'j}$ ($i'\sim i$), $u_{ij'}$ ($j'\sim j$) and $u_{i'j'}$ ($i'\sim i$ and $j'\sim j$): its spatial neighbours, response neighbours and interaction neighbours, respectively. One may add edges for secondary neighbours or drop edges in a specific modelling exercise. For example, some reduced constructions are (i) $C=I_J\otimes C^{(s)}$ (independent conditional autoregressive models, no dependence between responses); (ii) $C=C^{(r)}\otimes I_I$ (independent multivariate variables, no spatial dependence); and (iii) $C=C^{(r)}\otimes I_I+I_J\otimes C^{(s)}$ (dropping the edges for interaction neighbours $u_{i'j'}$).
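The construction (2) and its reduced forms are straightforward to assemble with Kronecker products; the sketch below (helper names are our own, not from the paper) illustrates them on a three-unit path graph with two linked responses:

```python
import numpy as np

def joint_adjacency(Cs, Cr):
    """Joint adjacency (2): C = Cr (x) Cs + Cr (x) I_I + I_J (x) Cs,
    with vec(U) stacking the columns of U response by response."""
    I_s, I_r = np.eye(Cs.shape[0]), np.eye(Cr.shape[0])
    return np.kron(Cr, Cs) + np.kron(Cr, I_s) + np.kron(I_r, Cs)

# Example: 3-unit path graph, J = 2 responses joined by one edge.
Cs = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
Cr = np.array([[0., 1.], [1., 0.]])
C = joint_adjacency(Cs, Cr)                              # 6 x 6 joint adjacency
# Reduced constructions discussed in the text:
C_i   = np.kron(np.eye(2), Cs)                           # (i) independent CARs
C_ii  = np.kron(Cr, np.eye(3))                           # (ii) no spatial dependence
C_iii = np.kron(Cr, np.eye(3)) + np.kron(np.eye(2), Cs)  # (iii) no interaction edges
```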

Let $C_{\{ij\},\{i'j'\}}$ denote the entries of $C$, analogous to the block-wise notation $b_{\{ij\},\{i'j'\}}$ for $B$. Let $d_j^{(r)}$ be the $j$th row sum of $C^{(r)}$ and $d_i^{(s)}$ the $i$th row sum of $C^{(s)}$. Then the $ij$th row sum of $C$ is $d_{ij}=d_j^{(r)}d_i^{(s)}+d_j^{(r)}+d_i^{(s)}$. Let $D^{(r)}=\mathrm{diag}\{d_1^{(r)},\ldots,d_J^{(r)}\}$, $D^{(s)}=\mathrm{diag}\{d_1^{(s)},\ldots,d_I^{(s)}\}$ and $D=\mathrm{diag}\{d_{11},\ldots,d_{I1},\ldots,d_{1J},\ldots,d_{IJ}\}$. With these adjacency constructions and notations, we explore further parameterisations of $B$ and $T$ in the following subsections; specifically, we discuss three specifications derived from this general framework, all linked to well-known models in the literature.

2.2. Model 1: nonseparable multifold specification

Kim et al. (Citation2001) developed a twofold conditional autoregressive model for bivariate areal data (J = 2), using different linkage parameters for different types of neighbours. Those linkage parameters, in their work, are called smoothing and bridging parameters, representing the strength of information sharing. If we extend their specification to an arbitrary $J$, we can parameterise $B$ and $T$ in the following way (assuming $i'\neq i$ and $j'\neq j$):
$$b_{\{ij\},\{i'j\}}=\frac{\lambda_j}{d_{ij}}C_{\{ij\},\{i'j\}},\qquad b_{\{ij\},\{ij'\}}=\frac{\psi_{jj'}}{d_{ij}}\sqrt{\frac{\delta_j}{\delta_{j'}}}\,C_{\{ij\},\{ij'\}},\qquad b_{\{ij\},\{i'j'\}}=\frac{\varphi_{jj'}}{d_{ij}}\sqrt{\frac{\delta_j}{\delta_{j'}}}\,C_{\{ij\},\{i'j'\}},\qquad \tau_{ij}^2=\frac{\delta_j}{d_{ij}},$$
where $\lambda_j$, $\psi_{jj'}$ and $\varphi_{jj'}$ are linkage parameters and the $\delta_j$ are variance components. The linkage parameters correspond to the three types of neighbour: $u_{i'j}$, $u_{ij'}$ and $u_{i'j'}$. Under this specification, the conditional mean of $u_{ij}$ is
$$E(u_{ij}\mid u_{-\{ij\}})=\frac{1}{d_{ij}}\Biggl(\lambda_j\sum_{i'\sim i}u_{i'j}+\sum_{j'\sim j}\psi_{jj'}\sqrt{\frac{\delta_j}{\delta_{j'}}}\,u_{ij'}+\sum_{i'\sim i}\sum_{j'\sim j}\varphi_{jj'}\sqrt{\frac{\delta_j}{\delta_{j'}}}\,u_{i'j'}\Biggr),$$
a weighted average of all its neighbours in $C$. This specification generalises Kim et al. (Citation2001)'s twofold model and hence could be called a multifold specification. It can be shown that, for this parameterisation, the joint precision matrix is
(3) $$T^{-1}(I-B)=(\Delta^{-1/2}\otimes I_I)\bigl\{D-\Lambda\otimes C^{(s)}-(\Psi\circ C^{(r)})\otimes I_I-(\Phi\circ C^{(r)})\otimes C^{(s)}\bigr\}(\Delta^{-1/2}\otimes I_I),$$
where $\Delta=\mathrm{diag}\{\delta_1,\ldots,\delta_J\}$, $\Lambda=\mathrm{diag}\{\lambda_1,\ldots,\lambda_J\}$, $\Psi$ and $\Phi$ are $J\times J$ symmetric matrices with entries $\psi_{jj'}$ and $\varphi_{jj'}$, respectively, and the operator $\circ$ denotes the element-wise product. A derivation of (3) is given in Appendix 1. Note that only the nonzero entries of $\Psi$ and $\Phi$ are parameters in the model, the number of which depends on $C^{(r)}$.
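As an illustration of how the pieces of (3) fit together, here is a dense-matrix sketch (our own; the function name is hypothetical) that assembles the Model 1 joint precision from $C^{(s)}$, $C^{(r)}$ and the linkage and variance parameters:

```python
import numpy as np

def model1_precision(Cs, Cr, lam, Psi, Phi, delta):
    """Joint precision (3) of the multifold Model 1 (a sketch).

    Cs: (I,I) spatial adjacency; Cr: (J,J) response adjacency;
    lam: (J,) smoothing parameters; Psi, Phi: (J,J) symmetric linkage
    matrices (only entries where Cr == 1 enter); delta: (J,) variances.
    """
    I, J = Cs.shape[0], Cr.shape[0]
    ds, dr = Cs.sum(axis=1), Cr.sum(axis=1)
    # Row sums d_ij = d_j^(r) d_i^(s) + d_j^(r) + d_i^(s), stacked response-major.
    d = (np.outer(dr, ds) + dr[:, None] + ds[None, :]).ravel()
    M = (np.diag(d)
         - np.kron(np.diag(lam), Cs)
         - np.kron(Psi * Cr, np.eye(I))   # Hadamard restriction to the graph
         - np.kron(Phi * Cr, Cs))
    scale = np.kron(np.diag(delta ** -0.5), np.eye(I))
    return scale @ M @ scale

# Diagonal dominance, hence positive definiteness, is guaranteed when all
# |lambda_j|, |psi_jj'| and |phi_jj'| are below 1 (the sufficient condition below).
```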

In order to make (3) positive definite, constraints on $\lambda_j$, $\psi_{jj'}$ and $\varphi_{jj'}$ are needed, assuming $\delta_j>0$. In general, it is difficult to find a sufficient and necessary condition for the positive definiteness of (3). Kim et al. (Citation2001)'s solution to this problem was a sufficient condition, $\max\{|\lambda_j|,|\psi_{jj'}|,|\varphi_{jj'}|;\ j,j'\}<1$, under which the matrix (3) is diagonally dominant and hence positive definite. Though their proof was for J = 2, it holds for any $J$ by the same arguments. The advantage of this condition is that it is simple and implementable. However, it is not a necessary condition, so not all possible positive definite structures for the model can be reached under it. In a Bayesian model, priors on the parameters $\lambda_j$, $\psi_{jj'}$ and $\varphi_{jj'}$ can be chosen based on their actual constraints. In our case, a uniform prior $\mathrm{Unif}(-1,1)$ is adequate for these linkage parameters. Priors on the variance components $\delta_j$ can be weakly informative inverse gamma priors $\mathrm{IG}(a_j,b_j)$. Inference and computation under this model are given in Appendix 2.

2.3. Model 2: separable specification with homogeneous spatial smoothing

Gelfand and Vounatsou (Citation2003)'s Kronecker-product model is a convenient parameterisation of Mardia (Citation1988)'s model. In our framework, this specification can be obtained and extended through the following parameterisation of $B$ and $T$ (assuming $i'\neq i$ and $j'\neq j$):
$$b_{\{ij\},\{i'j\}}=\frac{\rho}{d_i^{(s)}}C_{\{ij\},\{i'j\}},\qquad b_{\{ij\},\{ij'\}}=-\frac{\omega_{jj'}}{\omega_{jj}}C_{\{ij\},\{ij'\}},\qquad b_{\{ij\},\{i'j'\}}=\frac{\rho\,\omega_{jj'}}{d_i^{(s)}\omega_{jj}}C_{\{ij\},\{i'j'\}},\qquad \tau_{ij}^2=\frac{1}{d_i^{(s)}\omega_{jj}},$$
where $\rho$ and the $\omega_{jj'}$ are linkage parameters and the $1/\omega_{jj}$ are variance components. This parameterisation does not look straightforward, but it is much clearer in the form of the conditional mean:
(4) $$E\Bigl(u_{ij}-\frac{\rho}{d_i^{(s)}}\sum_{i'\sim i}u_{i'j}\ \Big|\ u_{-\{ij\}}\Bigr)=-\sum_{j'\sim j}\frac{\omega_{jj'}}{\omega_{jj}}\Bigl(u_{ij'}-\frac{\rho}{d_i^{(s)}}\sum_{i'\sim i}u_{i'j'}\Bigr).$$
Note that for a single response, the univariate conditional autoregressive model specifies $E(u_i\mid u_{-i})=\rho\sum_{i'\sim i}u_{i'}/d_i^{(s)}$. In the multivariate setting, $\rho\sum_{i'\sim i}u_{i'j}/d_i^{(s)}$ is no longer the conditional mean of $u_{ij}\mid u_{-\{ij\}}$; instead, their conditional difference is regressed on the other differences through the $\omega_{jj'}$. This parameterisation yields the joint precision matrix
(5) $$T^{-1}(I-B)=\{\Omega\circ(I_J+C^{(r)})\}\otimes(D^{(s)}-\rho C^{(s)})=\Omega_{C^{(r)}}\otimes(D^{(s)}-\rho C^{(s)}),$$
where $\Omega$ is a symmetric $J\times J$ matrix with entries $\omega_{jj'}$. A derivation of (5) is given in Appendix 1. The linkage parameter $\rho$ is interpreted as a spatial smoothing parameter, and $\Omega$ controls the dependence across the $J$ responses. It is noteworthy that only the nonzero entries of $\Omega$ are parameters in the model, and we write $\Omega_{C^{(r)}}=\Omega\circ(I_J+C^{(r)})$ for simplicity. The notation $\Omega_{C^{(r)}}$, commonly used in graphical models, means the precision matrix restricted by the graph $C^{(r)}$. The model (5) is a natural extension of Gelfand and Vounatsou (Citation2003)'s model. When $C^{(r)}$ is the complete graph (any two vertices are connected), $\Omega_{C^{(r)}}$ is free of zero entries; then, letting $\Sigma=\Omega^{-1}$, (5) is equivalent to Gelfand and Vounatsou (Citation2003)'s specification. We call this a completely separable specification because the Kronecker product completely separates the spatial domain and the response domain. Complete separation is often undesirable because it makes the spatial smoothing common to all $j$. We call this homogeneous spatial smoothing, because the linkage $\rho$ is the same for any $i$ and $i'$, which distinguishes Model 2 from Model 3 in the next section.
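A sketch of the assembly of (5), with the Hadamard restriction of $\Omega$ to the response graph made explicit (the function name is our own):

```python
import numpy as np

def model2_precision(Cs, Cr, Omega, rho):
    """Joint precision (5): {Omega o (I_J + Cr)} (x) (Ds - rho * Cs).

    Entries of Omega where (I_J + Cr) is zero are masked out,
    mirroring the graph restriction Omega_{C^(r)}.
    """
    J = Cr.shape[0]
    Ds = np.diag(Cs.sum(axis=1))
    Omega_Cr = Omega * (np.eye(J) + Cr)   # Hadamard restriction to the graph
    return np.kron(Omega_Cr, Ds - rho * Cs)
```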

The joint precision matrix (5) is positive definite if $|\rho|<1$ and $\Omega_{C^{(r)}}$ is positive definite. Let $M^+(C^{(r)})$ be the cone of symmetric positive definite matrices restricted by $C^{(r)}$, so that $\Omega_{C^{(r)}}\in M^+(C^{(r)})$. In a Bayesian model, a widely used prior on $\Omega_{C^{(r)}}$ is the G-Wishart distribution (Atay-Kayis & Massam, Citation2005; Letac & Massam, Citation2007). The G-Wishart distribution is a conjugate family for the precision matrix of a Gaussian graphical model, with density
$$p(\Omega_{C^{(r)}}\mid b,V)=I_{C^{(r)}}(b,V)^{-1}\,\bigl|\Omega_{C^{(r)}}\bigr|^{(b-2)/2}\exp\Bigl\{-\frac12\mathrm{tr}(V\Omega_{C^{(r)}})\Bigr\}\,\mathbf{1}\bigl\{\Omega_{C^{(r)}}\in M^+(C^{(r)})\bigr\},$$
where $b>2$ is the number of degrees of freedom, $V$ is the scale matrix and $I_{C^{(r)}}(\cdot)$ is the normalising constant. The prior is practically attractive because of its conjugacy: for a prior distribution $\mathrm{GWis}(b,V)$ and a given sample covariance matrix $S$ of sample size $n$, the posterior distribution of $\Omega_{C^{(r)}}$ is $\mathrm{GWis}(b+n,V+S)$. Inference and computation under this model are given in Appendix 2.

2.4. Model 3: separable specification with heterogeneous spatial smoothing

Dobra, Lenkoski, and Rodriguez (Citation2011) introduced a multivariate lattice model by giving Kronecker-product G-Wishart priors to the matrix-variate $U$. In our framework, $B$ and $T$ can be parameterised in the following way (assuming $i'\neq i$ and $j'\neq j$):
$$b_{\{ij\},\{i'j\}}=-\frac{\omega_{ii'}^{(s)}}{\omega_{ii}^{(s)}}C_{\{ij\},\{i'j\}},\qquad b_{\{ij\},\{ij'\}}=-\frac{\omega_{jj'}^{(r)}}{\omega_{jj}^{(r)}}C_{\{ij\},\{ij'\}},\qquad b_{\{ij\},\{i'j'\}}=-\frac{\omega_{ii'}^{(s)}\omega_{jj'}^{(r)}}{\omega_{ii}^{(s)}\omega_{jj}^{(r)}}C_{\{ij\},\{i'j'\}},\qquad \tau_{ij}^2=\frac{1}{\omega_{ii}^{(s)}\omega_{jj}^{(r)}},$$
which is equivalent to the conditional-mean form
(6) $$E\Bigl(u_{ij}+\frac{1}{\omega_{ii}^{(s)}}\sum_{i'\sim i}\omega_{ii'}^{(s)}u_{i'j}\ \Big|\ u_{-\{ij\}}\Bigr)=-\sum_{j'\sim j}\frac{\omega_{jj'}^{(r)}}{\omega_{jj}^{(r)}}\Bigl(u_{ij'}+\frac{1}{\omega_{ii}^{(s)}}\sum_{i'\sim i}\omega_{ii'}^{(s)}u_{i'j'}\Bigr).$$
Comparing (6) with (4), instead of homogeneous spatial smoothing with $\rho$, this is a heterogeneous specification with the $\omega_{ii'}^{(s)}$, and it is hence more flexible in the spatial domain. The resulting joint precision matrix is
(7) $$T^{-1}(I-B)=\{\Omega^{(r)}\circ(I_J+C^{(r)})\}\otimes\{\Omega^{(s)}\circ(I_I+C^{(s)})\}=\Omega_{C^{(r)}}\otimes\Omega_{C^{(s)}},$$
where $\Omega^{(r)}$ is a symmetric $J\times J$ matrix with entries $\omega_{jj'}^{(r)}$ and $\Omega^{(s)}$ is a symmetric $I\times I$ matrix with entries $\omega_{ii'}^{(s)}$. A derivation of (7) is given in Appendix 1. We again write $\Omega_{C^{(r)}}$ and $\Omega_{C^{(s)}}$ for simplicity. In model (5), the spatial part is the conventional conditional autoregressive model, while in model (7) it is modelled by the more flexible $\Omega_{C^{(s)}}$.
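The corresponding assembly of (7) is a single Kronecker product of two graph-restricted matrices; a one-function sketch (hypothetical name) under the same conventions as above:

```python
import numpy as np

def model3_precision(Cs, Cr, Omega_s, Omega_r):
    """Joint precision (7): {Omega_r o (I_J + Cr)} (x) {Omega_s o (I_I + Cs)}."""
    I, J = Cs.shape[0], Cr.shape[0]
    return np.kron(Omega_r * (np.eye(J) + Cr), Omega_s * (np.eye(I) + Cs))
```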

The precision matrix (7) is positive definite if both $\Omega_{C^{(r)}}$ and $\Omega_{C^{(s)}}$ are positive definite. In a Bayesian model, both can be given G-Wishart priors. The specification has an obvious identification problem: $\Omega_{C^{(r)}}\otimes\Omega_{C^{(s)}}=z\Omega_{C^{(r)}}\otimes(1/z)\Omega_{C^{(s)}}$ for any constant scalar $z>0$. Following Wang and West (Citation2009), one can impose the constraint $\Omega_{C^{(r)},11}=1$ and introduce an auxiliary variable $z$. Then specify a joint prior on $(z,z\Omega_{C^{(r)}})$:
(8) $$p(z,z\Omega_{C^{(r)}}\mid b^{(r)},V^{(r)})\propto p_{\mathrm{GWis}}(z\Omega_{C^{(r)}}\mid b^{(r)},V^{(r)})\cdot 1,$$
where $p_{\mathrm{GWis}}(\cdot)$ is the density of the G-Wishart distribution. Transforming this joint density to $(z,\Omega_{C^{(r)}})$ gives the desired joint prior. No additional constraint is imposed on $\Omega_{C^{(s)}}$; let $\Omega_{C^{(s)}}\sim\mathrm{GWis}(b^{(s)},V^{(s)})$. Inference and computation under this model are given in Appendix 2.

2.5. Priors for the graph

The two types of graph used in this modelling framework should be treated differently. On one hand, the spatial graph should be treated as known and fixed, because the geographical locations and their neighbouring structure are fixed in most scenarios. On the other hand, the response graph should be treated as unknown, because we often know little about the relationships between multiple responses. In the literature on Gaussian graphical model determination, the unknown graph is usually assumed random and is assigned a prior. Markov chain Monte Carlo (MCMC) sampling schemes, such as reversible jump MCMC (Green, Citation1995), are often used to sample graphs from the posterior distribution. In this paper, we adopt and slightly modify existing MCMC algorithms for graph determination (Dobra et al., Citation2011; Wang & Li, Citation2012), with computational details given in Appendix 2 for each aforementioned model. For the prior on $C^{(r)}$, consider
(9) $$P(C^{(r)})\propto B\bigl(a+\mathrm{size}(C^{(r)}),\ b+m-\mathrm{size}(C^{(r)})\bigr)\big/B(a,b),$$
where $B(\cdot,\cdot)$ is the beta function, $m=\binom{J}{2}$ is the total number of possible edges, $\mathrm{size}(C^{(r)})\in\{0,1,\ldots,m\}$ is the number of edges in the graph, and $a$ and $b$ are given hyperparameters. More details about this prior can be found in Scott and Berger (Citation2006) and Scott and Carvalho (Citation2009). The following prior is often used as well (Dobra et al., Citation2011):
(10) $$P(C^{(r)})\propto\pi^{\mathrm{size}(C^{(r)})}(1-\pi)^{m-\mathrm{size}(C^{(r)})},$$
where $\pi\in(0,1)$ is a given hyperparameter. Sparser graphs can be favoured by choosing a small value of $\pi$. The prior (9) can be obtained from (10) by integrating $\pi$ out against a $\mathrm{Beta}(a,b)$ hyperprior.
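Both priors are easy to evaluate on the log scale; the sketch below (our own illustration) computes (10) directly and (9) via log-gamma functions, using $B(a,b)=\Gamma(a)\Gamma(b)/\Gamma(a+b)$:

```python
import math

def log_prior_bernoulli(size, m, pi):
    """Log of prior (10): pi^size * (1 - pi)^(m - size), up to a constant."""
    return size * math.log(pi) + (m - size) * math.log(1.0 - pi)

def log_prior_beta(size, m, a, b):
    """Log of prior (9): B(a + size, b + m - size) / B(a, b),
    i.e., (10) with pi integrated out against Beta(a, b)."""
    return (math.lgamma(a + size) + math.lgamma(b + m - size)
            - math.lgamma(a + b + m)
            - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))

# J = 6 responses gives m = J*(J-1)/2 = 15 possible edges.
m = 15
print(log_prior_bernoulli(size=2, m=m, pi=0.2))
print(log_prior_beta(size=2, m=m, a=1.0, b=1.0))
```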

3. An application

We illustrate the proposed models with a real example of disease mapping. Smoking is known to be linked with multiple diseases in the population, the leading ones including lung diseases and heart diseases. The dataset under consideration includes six variables: four related to smoke exposure and two diseases. Obtained from the 2011 Missouri County-Level Survey, the four smoke exposure variables are Current Cigarette Smoking, Current Smokeless Tobacco Use, Current Other Tobacco Use and Exposure to Secondhand Smoke. Data are binary responses to the survey questionnaires (Yes or No), aggregated to the county level. The other two variables, obtained from the Surveillance, Epidemiology and End Results (SEER) program, are Lung Cancer Mortality and Heart Disease Mortality, both of which are counts for each county within a specified time period. To summarise, we have I = 115 counties and J = 6 response variables. Let $y_{ij}$ denote the observed counts, let $n_{i1},\ldots,n_{i4}$ be the numbers of respondents in the survey, and let $E_{i5}$ and $E_{i6}$ be the age-adjusted expected mortality counts for the two diseases. Then the proportions $y_{ij}/n_{ij}$, $j=1,\ldots,4$, are empirical estimates of the prevalences of the survey variables, and the ratios $y_{ij}/E_{ij}$, $j=5,6$, are standardised mortality ratios of the diseases.

Consider a Bayesian hierarchical model for $y_{ij}$. We use the binomial-logit model for the survey variables and the Poisson-lognormal model (Banerjee, Gelfand, & Carlin, Citation2004) for the mortality counts, i.e.,
$$y_{ij}\sim\mathrm{Bin}(n_{ij},p_{ij}),\quad \mathrm{logit}(p_{ij})=\beta_j+u_{ij},\quad i=1,\ldots,115,\ j=1,\ldots,4;$$
$$y_{ij}\sim\mathrm{Poi}(E_{ij}\eta_{ij}),\quad \log(\eta_{ij})=\beta_j+u_{ij},\quad i=1,\ldots,115,\ j=5,6.$$
For simplicity, we do not consider other covariates in this example. The primary interest is in modelling the random effects $u_{ij}$, which are expected to be correlated in both the spatial domain and the response domain. To complete the model specification, we give the intercepts $\beta_j$ a weakly informative normal prior and the random effects $U=\{u_{ij}\}$ a multivariate conditional autoregressive prior $\mathrm{MCAR}(B,T)$. We apply the three proposed versions of $\mathrm{MCAR}(B,T)$ here. Hyperparameters for the prior distributions are specified as follows. For the graph, noticing that the choice of $\pi$ in (10) can influence the posterior inference, we consider the prior (10) with both $\pi=0.2$, favouring a sparse graph, and $\pi=0.5$, expressing no preference. All other priors are chosen to be only weakly informative and have little impact on the posterior inference. In Model 1, we set the inverse gamma hyperparameters to $a_j=b_j=0.5$. In Model 2, we set the G-Wishart hyperparameters to $b=3$ and $V=I$. In Model 3, we set the hyperparameters of the two G-Wishart priors to $b^{(r)}=3$, $V^{(r)}=I$, $b^{(s)}=24$ and $V^{(s)}=(b^{(s)}-2)(D-0.95C)^{-1}$, which implies a prior mode at a proper conditional autoregressive model. For each model, we run the Markov chain Monte Carlo for 150,000 iterations with a burn-in of 50,000, and posterior results are based on the remaining samples. Figure 1 shows the convergence of the log-joint posterior; all three chains converge quickly.
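For readers implementing the likelihood, a minimal sketch of the two data models (assuming scipy is available; the function name and interface are our own):

```python
import numpy as np
from scipy.stats import binom, poisson
from scipy.special import expit

def loglik(y, n_or_E, beta, u, kind):
    """Log-likelihood of one response column under the two data models.

    kind = 'binomial': y_ij ~ Bin(n_ij, p_ij) with logit(p_ij) = beta_j + u_ij.
    kind = 'poisson':  y_ij ~ Poi(E_ij * eta_ij) with log(eta_ij) = beta_j + u_ij.
    """
    if kind == 'binomial':
        return binom.logpmf(y, n_or_E, expit(beta + u)).sum()
    return poisson.logpmf(y, n_or_E * np.exp(beta + u)).sum()
```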

Figure 1. Data analysis: convergence of the log-joint posterior $\log p(\theta\mid\mathrm{data})$ under the three models (first 50,000 iterations), where $\theta$ represents the collection of all parameters.

Table 1 shows the posterior edge inclusion probabilities for the response graph $C^{(r)}$. First, all three models appear to agree on the link between Cigarette Smoking and Secondhand Smoke Exposure, as well as on the link between Lung Cancer Mortality and Heart Disease Mortality. There is moderate agreement on the links between Secondhand Smoke Exposure and Lung Cancer Mortality, and between Cigarette Smoking and Lung Cancer Mortality. In general, Model 1 tends to infer a sparser graph, possibly due to the diagonal dominance condition. Model 2 is the simplest model, as reflected by its pD (the effective number of parameters), but has the largest DIC. Model 3 is the most flexible of the three, and its inferred graph tends to be denser than the other two. As expected, its pD is larger, but its overall DIC is much smaller than those of the other two models. Second, the edge inclusion probabilities are in general higher when $\pi=0.5$, as expected, but this has little material impact on the final inferred graph, and the DIC changes little with different values of $\pi$. Lastly, Figures 2–4 show the maps of the spatial random effects $u_{ij}$ under the three models, respectively; for a disease-mapping problem, these maps are often the eventual output for practitioners.

Figure 2. Data analysis: posterior means of the spatial random effects $u_{ij}$ under Model 1.

Figure 3. Data analysis: posterior means of the spatial random effects $u_{ij}$ under Model 2.

Figure 4. Data analysis: posterior means of the spatial random effects $u_{ij}$ under Model 3.

Table 1. Data analysis: Posterior edge inclusion probabilities for the response graph and deviance information criterion for Models 1–3, respectively.

4. Simulation

To validate the proposed algorithms, we perform a simulation study on a $7\times7$ regular grid (I = 49 areal units) with J = 4 response variables. Consider a true response graph with two edges, $C^{(r)}_{13}$ and $C^{(r)}_{24}$. In this simulation study we do not consider scenarios with misspecified models; data are generated under each of the three models and the correct model is then used for inference. The parameter settings are as follows. For Model 1, $\lambda_j=0.95$, $\varphi_{jj'}=\psi_{jj'}=0.9$, $\delta_j=1$ and $\beta_j=1$. For Model 2, $\rho=0.9$, $\omega_{jj}=4$, $\omega_{13}=\omega_{24}=3.2$ and $\beta_j=1$. For Model 3, the parameters are the same as for Model 2, but $\Omega_{C^{(s)}}$ is generated from $\mathrm{GWis}(10,\ 8(D-0.9C)^{-1})$. We repeat the simulation and inference process L = 50 times, and each time the MCMC is run for 5000 iterations. We consider three measures for validating and comparing the three algorithms. The first is the mean inclusion probability matrix with standard deviations. The second we call the error rate of mis-identified edges: using 0.5 as the threshold for identifying an edge, for each replication we obtain an inferred graph, compare it with the true graph, and record the proportion of wrong edges/non-edges; the error rate is the average proportion over the $L$ replications. The third is the mean absolute error (MAE) of the random effects in the model,
$$\mathrm{MAE}=\frac{1}{L}\frac{1}{J}\frac{1}{I}\sum_{l}\sum_{j}\sum_{i}\frac{\bigl|\hat{u}_{ij}^{\,l}-u_{ij}^{\,l}\bigr|}{\bigl|u_{ij}^{\,l}\bigr|},$$
where $u_{ij}^{l}$ is the true value and $\hat{u}_{ij}^{l}$ is the posterior mean.
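The second and third measures can be computed as in the following sketch (our own; `mae` here is the relative error defined above, and `prob` is an inferred inclusion probability matrix):

```python
import numpy as np

def edge_error_rate(prob, true_adj, thresh=0.5):
    """Proportion of mis-identified edges/non-edges over the m possible pairs."""
    J = true_adj.shape[0]
    iu = np.triu_indices(J, k=1)            # upper-triangular (unique) pairs
    return np.mean((prob[iu] > thresh) != (true_adj[iu] == 1))

def mae(u_hat, u_true):
    """Mean absolute (relative) error of the random effects, as in the text."""
    return np.mean(np.abs((u_hat - u_true) / u_true))
```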

Simulation results are given in Table 2. For all three models, the algorithms correctly identify the true edges. The algorithm for Model 1 appears unstable, with large standard deviations, and tends to underestimate inclusion probabilities, while the algorithm for Model 3 tends to overestimate inclusion probabilities for non-edges. The algorithm for Model 2 gives the smallest error rate and MAE. Note that this simulation study validates the proposed algorithms under correct model specifications, so the results do not imply that Model 2 is the best model for a real dataset. In fact, as shown in the data analysis, Model 2 is the simplest specification and was the least preferred model in that case according to DIC.

Table 2. Simulation: Mean posterior edge inclusion probabilities (standard deviations in parentheses), error rates of mis-identified edges, and the mean absolute errors of random effects.

5. Further discussion

In this paper, we proposed a modelling framework for multivariate areal data from a graphical model perspective. We rebuilt three well-known models in our framework and developed Bayesian inference tools for the proposed models. In our view, the framework is very general and can contain other models beyond the cases discussed in the paper. For example, Jin et al. (Citation2007) specified a co-regionalised areal data model whose Case 3 is a very general specification; we show that it too can be reproduced and extended in our framework. Consider the Cholesky decomposition $\Sigma=AA^T$. Jin et al. (Citation2007)'s Case 3 specifies the joint covariance matrix $(A\otimes I_I)(I_J\otimes D^{(s)}-\Phi\otimes C^{(s)})^{-1}(A\otimes I_I)^T$, whose inverse is
(11) $$(AA^T)^{-1}\otimes D^{(s)}-(A^{-1})^T\Phi A^{-1}\otimes C^{(s)},$$
where $\Phi$ is a symmetric $J\times J$ matrix. Let $\Omega=\Sigma^{-1}=(AA^T)^{-1}$ and $Q=(A^{-1})^T\Phi A^{-1}$; the map from $(A,\Phi)$ to $(\Omega,Q)$ is clearly one-to-one. Specification (11) is hence equivalent to
(12) $$\Omega\otimes D^{(s)}-Q\otimes C^{(s)},$$
where $Q$ is a symmetric $J\times J$ matrix with entries $q_{jj'}$. To reproduce this specification in our framework, parameterise $B$ and $T$ as follows (assuming $i'\neq i$ and $j'\neq j$):
$$b_{\{ij\},\{i'j\}}=\frac{q_{jj}}{d_i^{(s)}\omega_{jj}}C_{\{ij\},\{i'j\}},\qquad b_{\{ij\},\{ij'\}}=-\frac{\omega_{jj'}}{\omega_{jj}}C_{\{ij\},\{ij'\}},\qquad b_{\{ij\},\{i'j'\}}=\frac{q_{jj'}}{d_i^{(s)}\omega_{jj}}C_{\{ij\},\{i'j'\}},\qquad \tau_{ij}^2=\frac{1}{d_i^{(s)}\omega_{jj}}.$$
This parameterisation leads to the joint precision matrix
(13) $$T^{-1}(I-B)=\{\Omega\circ(I_J+C^{(r)})\}\otimes D^{(s)}-\{Q\circ(I_J+C^{(r)})\}\otimes C^{(s)}=\Omega_{C^{(r)}}\otimes D^{(s)}-Q_{C^{(r)}}\otimes C^{(s)}.$$
Expression (13) reduces to (12), which is equivalent to Jin et al. (Citation2007)'s (11), when $C^{(r)}$ is a complete graph. A derivation of (13) is given in Appendix 1. The validity of this model relies on the positive definiteness of (13). Jin et al. (Citation2007) showed that it is positive definite if $\Omega$ is positive definite and the eigenvalues of $\Phi=A^TQA$ lie between $1/\xi_{\min}$ and $1/\xi_{\max}$, the reciprocals of the smallest and largest eigenvalues of $D^{(s)-1/2}C^{(s)}D^{(s)-1/2}$, which are known constants. The graphical version (13) must satisfy the analogous condition $1/\xi_{\min}\le\lambda(\Omega_{C^{(r)}}^{-1}Q_{C^{(r)}})\le 1/\xi_{\max}$, where $\lambda(M)$ denotes any eigenvalue of $M$. Given that both $\Omega_{C^{(r)}}$ and $Q_{C^{(r)}}$ are restricted by the underlying graph, this eigenvalue condition is not easy to implement in computations, a matter worth investigating in the future.
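Under the stated assumptions, a sketch of the graphical version (13) and of a numerical check of the eigenvalue condition (both our own illustrations, for a connected spatial graph with no zero eigenvalues in $\xi$):

```python
import numpy as np

def jin_case3_precision(Cs, Cr, Omega, Q):
    """Joint precision (13): Omega_Cr (x) Ds - Q_Cr (x) Cs."""
    mask = np.eye(Cr.shape[0]) + Cr         # graph restriction I_J + Cr
    Ds = np.diag(Cs.sum(axis=1))
    return np.kron(Omega * mask, Ds) - np.kron(Q * mask, Cs)

def eigenvalue_condition_holds(Cs, Omega_Cr, Q_Cr):
    """Check 1/xi_min <= lambda(Omega_Cr^{-1} Q_Cr) <= 1/xi_max numerically."""
    ds = Cs.sum(axis=1)
    xi = np.linalg.eigvalsh(Cs / np.sqrt(np.outer(ds, ds)))
    lam = np.linalg.eigvals(np.linalg.solve(Omega_Cr, Q_Cr)).real
    return lam.min() >= 1.0 / xi.min() and lam.max() <= 1.0 / xi.max()
```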

In general, flexible models are desired for modelling multivariate areal data, because overly simplistic models may misspecify the true underlying covariance structure. However, there is almost always a trade-off between the simplicity and the flexibility of a model. It is probably reasonable to allow certain flexibilities for specific purposes, such as, in this paper, learning a graphical relationship between multiple responses. Whether a more flexible but complicated model is needed for the problem at hand is usually the practitioner's choice, especially when the performance improvement is negligible.

Acknowledgments

The author would like to thank two editors and two anonymous referees for their helpful comments, which led to a much improved paper.

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Notes on contributors

Ye Liang

Ye Liang is an associate professor in the Department of Statistics at Oklahoma State University. He received his Ph.D. in statistics from the University of Missouri.

References

  • Atay-Kayis, A., & Massam, H. (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika, 92, 317–335. doi: 10.1093/biomet/92.2.317
  • Banerjee, S., Gelfand, A., & Carlin, B. (2004). Hierarchical modeling and analysis for spatial data. Boca Raton, FL: CRC Press/Chapman & Hall.
  • Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B, 36, 192–236.
  • Dobra, A., Lenkoski, A., & Rodriguez, A. (2011). Bayesian inference for general Gaussian graphical models with applications to multivariate lattice data. Journal of the American Statistical Association, 106, 1418–1433. doi: 10.1198/jasa.2011.tm10465
  • Gelfand, A., Sahu, S., & Carlin, B. (1995). Efficient parameterizations for normal linear mixed models. Biometrika, 82, 479–488. doi: 10.1093/biomet/82.3.479
  • Gelfand, A., & Vounatsou, P. (2003). Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics, 4, 11–25. doi: 10.1093/biostatistics/4.1.11
  • Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732. doi: 10.1093/biomet/82.4.711
  • Jin, X., Banerjee, S., & Carlin, B. (2007). Order-free co-regionalized areal data models with application to multiple-disease mapping. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69, 817–838. doi: 10.1111/j.1467-9868.2007.00612.x
  • Jin, X., Carlin, B., & Banerjee, S. (2005). Generalized hierarchical multivariate CAR models for areal data. Biometrics, 61, 950–961. doi: 10.1111/j.1541-0420.2005.00359.x
  • Kim, H., Sun, D., & Tsutakawa, R. (2001). A bivariate Bayes method for improving estimates of mortality rates with a twofold conditional autoregressive model. Journal of the American Statistical Association, 96, 1506–1521. doi: 10.1198/016214501753382408
  • Letac, G., & Massam, H. (2007). Wishart distributions for decomposable graphs. The Annals of Statistics, 35, 1278–1323. doi: 10.1214/009053606000001235
  • MacNab, Y. (2011). On Gaussian Markov random fields and Bayesian disease mapping. Statistical Methods in Medical Research, 20, 49–68. doi: 10.1177/0962280210371561
  • MacNab, Y. C. (2016). Linear models of coregionalization for multivariate lattice data: A general framework for coregionalized multivariate CAR models. Statistics in Medicine, 35, 3827–3850. doi: 10.1002/sim.6955
  • MacNab, Y. C. (2018). Some recent work on multivariate Gaussian Markov random fields. Test, 27, 497–541. doi: 10.1007/s11749-018-0605-3
  • Mardia, K. (1988). Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. Journal of Multivariate Analysis, 24, 265–284. doi: 10.1016/0047-259X(88)90040-1
  • Martinez-Beneito, M. (2013). A general modeling framework for multivariate disease mapping. Biometrika, 100, 539–553. doi: 10.1093/biomet/ast023
  • Martinez-Beneito, M. A., Botella-Rocamora, P., & Banerjee, S. (2017). Towards a multidimensional approach to Bayesian disease mapping. Bayesian Analysis, 12, 239–259. doi: 10.1214/16-BA995
  • Scott, J., & Berger, J. (2006). An exploration of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference, 136, 2144–2162. doi: 10.1016/j.jspi.2005.08.031
  • Scott, J., & Carvalho, C. (2009). Feature-inclusion stochastic search for Gaussian graphical models. Journal of Computational and Graphical Statistics, 17, 790–808. doi: 10.1198/106186008X382683
  • Wang, H., & Li, S. (2012). Efficient Gaussian graphical model determination under G-Wishart prior distributions. Electronic Journal of Statistics, 6, 168–198. doi: 10.1214/12-EJS669
  • Wang, H., & West, M. (2009). Bayesian analysis of matrix normal graphical models. Biometrika, 96, 821–834. doi: 10.1093/biomet/asp049

Appendices

Appendix 1. Derivations

A.1. Derivation of Equation (3)

With the parameterisation in Model 1, we have $T=D^{-1}(\Delta\otimes I_I)$ and
$$B=(\Delta^{1/2}\otimes I_I)D^{-1}\bigl\{\Lambda\otimes C^{(s)}+(\Psi\circ C^{(r)})\otimes I_I+(\Phi\circ C^{(r)})\otimes C^{(s)}\bigr\}(\Delta^{-1/2}\otimes I_I).$$
Since the diagonal matrices involved commute, expression (3) for $T^{-1}(I-B)$ follows immediately.

A.2. Derivation of Equation (5)

With the parameterisation in Model 2, we have $T=(\mathrm{diag}(\Omega)\otimes D^{(s)})^{-1}$ and
$$B=\rho I_J\otimes D^{(s)-1}C^{(s)}-\mathrm{diag}(\Omega)^{-1}(\Omega\circ C^{(r)})\otimes I_I+\rho\,\mathrm{diag}(\Omega)^{-1}(\Omega\circ C^{(r)})\otimes D^{(s)-1}C^{(s)}.$$
Then
$$T^{-1}(I-B)=\mathrm{diag}(\Omega)\otimes D^{(s)}-\rho\,\mathrm{diag}(\Omega)\otimes C^{(s)}+(\Omega\circ C^{(r)})\otimes D^{(s)}-\rho\,(\Omega\circ C^{(r)})\otimes C^{(s)}=\{\Omega\circ(I_J+C^{(r)})\}\otimes(D^{(s)}-\rho C^{(s)}),$$
which is expression (5).

A.3. Derivation of Equation (7)

With the parameterisation in Model 3, we have $T=\{\mathrm{diag}(\Omega^{(r)})\otimes\mathrm{diag}(\Omega^{(s)})\}^{-1}$ and
$$B=-I_J\otimes\mathrm{diag}(\Omega^{(s)})^{-1}(\Omega^{(s)}\circ C^{(s)})-\mathrm{diag}(\Omega^{(r)})^{-1}(\Omega^{(r)}\circ C^{(r)})\otimes I_I-\mathrm{diag}(\Omega^{(r)})^{-1}(\Omega^{(r)}\circ C^{(r)})\otimes\mathrm{diag}(\Omega^{(s)})^{-1}(\Omega^{(s)}\circ C^{(s)}).$$
Then
$$T^{-1}(I-B)=\mathrm{diag}(\Omega^{(r)})\otimes\mathrm{diag}(\Omega^{(s)})+\mathrm{diag}(\Omega^{(r)})\otimes(\Omega^{(s)}\circ C^{(s)})+(\Omega^{(r)}\circ C^{(r)})\otimes\mathrm{diag}(\Omega^{(s)})+(\Omega^{(r)}\circ C^{(r)})\otimes(\Omega^{(s)}\circ C^{(s)})=\{\Omega^{(r)}\circ(I_J+C^{(r)})\}\otimes\{\Omega^{(s)}\circ(I_I+C^{(s)})\},$$
which is expression (7).

A.4. Derivation of Equation (13)

With the parameterisation of Section 5 (the extension of Jin et al. (Citation2007)'s Case 3), we have $T=(\mathrm{diag}(\Omega)\otimes D^{(s)})^{-1}$ and
$$B=\mathrm{diag}(\Omega)^{-1}\mathrm{diag}(Q)\otimes D^{(s)-1}C^{(s)}-\mathrm{diag}(\Omega)^{-1}(\Omega\circ C^{(r)})\otimes I_I+\mathrm{diag}(\Omega)^{-1}(Q\circ C^{(r)})\otimes D^{(s)-1}C^{(s)}.$$
Then
$$T^{-1}(I-B)=\mathrm{diag}(\Omega)\otimes D^{(s)}-\mathrm{diag}(Q)\otimes C^{(s)}+(\Omega\circ C^{(r)})\otimes D^{(s)}-(Q\circ C^{(r)})\otimes C^{(s)}=\{\Omega\circ(I_J+C^{(r)})\}\otimes D^{(s)}-\{Q\circ(I_J+C^{(r)})\}\otimes C^{(s)},$$
which is expression (13).

Appendix 2: Bayesian computations

A.5. A hierarchical generalised linear model

For illustration, we now assume a full Bayesian hierarchical model and give computational details for it. Assume binomial counts $y_{ij}$ out of $n_{ij}$ for $J$ responses and $I$ areal units. Specify a Bayesian model as follows, for $i=1,\ldots,I$ and $j=1,\ldots,J$:
$$y_{ij}\mid p_{ij}\sim\mathrm{Bin}(n_{ij},p_{ij}),\quad \mathrm{logit}(p_{ij})=\beta_j+u_{ij},\quad \beta_j\sim N(0,\tau_0^2),\quad U\mid B,T\sim\mathrm{MCAR}(B,T),$$
where $\tau_0^2$ is a given constant, $U$ is the matrix-variate of the $u_{ij}$, and the priors for $B$ and $T$ depend on the specific parameterisation. This section is organised as follows: we first give details on updating the effects parameters $\beta_j$ and $u_{ij}$, and then, separately for each model, details on updating the MCAR parameters and the random response graph $C^{(r)}$.

A.6. Updating effects parameters

Our experience has shown that convergence is poor if we update $\beta_j$ and $u_{ij}$ directly. We therefore apply the hierarchical centring technique (Gelfand, Sahu, & Carlin, Citation1995) and block sampling. Let $\gamma_{ij}=\beta_j+u_{ij}$, so that $\gamma=\mathrm{vec}[(\gamma_{ij})_{I\times J}]$ has a non-zero-centred MCAR prior, and update $(\gamma_{ij},\beta_j)$ instead of $(u_{ij},\beta_j)$. The full conditional distribution of $\gamma_{ij}$ is
$$p(\gamma_{ij}\mid\cdot)\propto\frac{e^{y_{ij}\gamma_{ij}}}{(1+e^{\gamma_{ij}})^{n_{ij}}}\exp\Bigl[-\frac{1}{2\tau_{ij}^2}\Bigl\{\gamma_{ij}-\beta_j-\sum_{\{i'j'\}\neq\{ij\}}b_{\{ij\},\{i'j'\}}(\gamma_{i'j'}-\beta_{j'})\Bigr\}^2\Bigr].$$
We use a Metropolis-Hastings algorithm to sample $\gamma_{ij}$ from this conditional density. We block sample $\beta$ as follows. Denote by $M=T^{-1}(I-B)$ the joint precision matrix. Let $\gamma^*=M\gamma$ and let $\bar\gamma$ be the $J\times1$ vector such that $\bar\gamma_1$ is the sum of the first $I$ elements of $\gamma^*$, $\bar\gamma_2$ the sum of the second $I$ elements, and so on. Partition $M$ into $J\times J$ blocks $M_{jj'}$ and define
$$H=\begin{pmatrix}\mathbf{1}^TM_{11}\mathbf{1}&\cdots&\mathbf{1}^TM_{1J}\mathbf{1}\\\vdots&\ddots&\vdots\\\mathbf{1}^TM_{J1}\mathbf{1}&\cdots&\mathbf{1}^TM_{JJ}\mathbf{1}\end{pmatrix},$$
where $\mathbf{1}$ is the all-one vector. Then the full conditional distribution of the vector $\beta$ is
$$(\beta\mid\cdot)\sim N\bigl\{(H+\tau_0^{-2}I)^{-1}\bar\gamma,\ (H+\tau_0^{-2}I)^{-1}\bigr\}.$$
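A sketch of the block update for $\beta$ implied by this derivation (dense linear algebra; function and argument names are our own):

```python
import numpy as np

def update_beta(M, gamma, I, J, tau0_sq, rng):
    """Block Gibbs update for beta: gamma is the IJ-vector of centred effects
    stacked response by response, M the joint precision T^{-1}(I - B)."""
    gamma_star = M @ gamma
    gbar = gamma_star.reshape(J, I).sum(axis=1)   # block sums of M @ gamma
    ones = np.ones(I)
    H = np.array([[ones @ M[j*I:(j+1)*I, k*I:(k+1)*I] @ ones
                   for k in range(J)] for j in range(J)])
    prec = H + np.eye(J) / tau0_sq
    cov = np.linalg.inv(prec)
    return rng.multivariate_normal(cov @ gbar, cov)
```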

A.7. Model 1: updating δj, λj, ψjj, φjj and C(r)

Given the current graph $C^{(r)}$, parameters are updated through Gibbs sampling. Recall the priors: $\delta_j\sim\mathrm{IG}(a_j,b_j)$ and $\lambda_j,\psi_{jj'},\varphi_{jj'}\sim\mathrm{Unif}(-1,1)$. Let $u_j$ be the $j$th column vector of $U$, $j=1,\ldots,J$, and $D_j$ the $j$th diagonal block of $D$. The full conditional distribution of $\delta_j$ is
$$p(\delta_j\mid\cdot)\propto\delta_j^{-I/2-a_j-1}\exp\Bigl\{-\frac{1}{2\delta_j}u_j^T(D_j-\lambda_jC^{(s)})u_j+\sum_{j'\sim j}\frac{1}{\sqrt{\delta_j\delta_{j'}}}u_j^T(\psi_{jj'}I_I+\varphi_{jj'}C^{(s)})u_{j'}-\frac{b_j}{\delta_j}\Bigr\}.$$
It can be shown that the transformed density of $1/\delta_j$ is log-concave when $I+2a_j-1>0$. Thus, we use adaptive rejection sampling to update $\delta_j$.

Let $W$ be the $I\times J$ matrix with $\mathrm{vec}(W)=(\Delta^{-1/2}\otimes I_I)\mathrm{vec}(U)$, and let $w_j$ be the $j$th column vector of $W$. Let $M=D-\Lambda\otimes C^{(s)}-(\Psi\circ C^{(r)})\otimes I_I-(\Phi\circ C^{(r)})\otimes C^{(s)}$ as in (3). Then $\lambda_j$, $\psi_{jj'}$ and $\varphi_{jj'}$ are sequentially updated through the following full conditional distributions:
$$p(\lambda_j\mid\cdot)\propto|M(\lambda_j)|^{1/2}\exp\Bigl\{\frac{\lambda_j}{2}w_j^TC^{(s)}w_j\Bigr\},\qquad p(\psi_{jj'}\mid\cdot)\propto|M(\psi_{jj'})|^{1/2}\exp\{\psi_{jj'}w_j^Tw_{j'}\},\qquad p(\varphi_{jj'}\mid\cdot)\propto|M(\varphi_{jj'})|^{1/2}\exp\{\varphi_{jj'}w_j^TC^{(s)}w_{j'}\}.$$
We use a Metropolis-Hastings algorithm to update these parameters. Note that evaluating the determinant of the sparse matrix $M$ can be computationally intensive; an efficient sparse-matrix algorithm, usually based on the Cholesky decomposition, is helpful.
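For illustration, one Metropolis-Hastings step for $\lambda_j$ might look as follows (a dense-matrix sketch of our own; in practice the determinant evaluation would use a sparse Cholesky, as noted above):

```python
import numpy as np

def mh_update_lambda_j(j, lam, Psi, Phi, W, Cs, Cr, d, step, rng):
    """One random-walk MH step for lambda_j under the Unif(-1, 1) prior.

    W: (I,J) matrix of scaled effects; d: IJ-vector of row sums d_ij."""
    def log_target(lj):
        lam_prop = lam.copy()
        lam_prop[j] = lj
        M = (np.diag(d) - np.kron(np.diag(lam_prop), Cs)
             - np.kron(Psi * Cr, np.eye(Cs.shape[0]))
             - np.kron(Phi * Cr, Cs))
        sign, logdet = np.linalg.slogdet(M)
        if sign <= 0:
            return -np.inf                     # outside the valid region
        return 0.5 * logdet + 0.5 * lj * (W[:, j] @ Cs @ W[:, j])

    prop = lam[j] + step * rng.standard_normal()
    # Symmetric proposal; the flat prior only requires |prop| < 1.
    if abs(prop) < 1 and np.log(rng.uniform()) < log_target(prop) - log_target(lam[j]):
        lam[j] = prop                          # accept the move
    return lam
```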

The graph $C^{(r)}$ is updated through a simple reversible jump MCMC algorithm. Propose a new graph $C^{(r)*}$ by adding or deleting a single edge of $C^{(r)}$. Without loss of generality, suppose an edge $\{j_0,k_0\}$ is added to the new graph. The dimension then changes by 2, from $(C^{(r)},\Psi,\Phi)$ to $(C^{(r)*},\Psi^*,\Phi^*)$. Propose $u_1\sim U(-1,1)$ and $u_2\sim U(-1,1)$, and set $\psi^*_{j_0k_0}=u_1$ and $\varphi^*_{j_0k_0}=u_2$; the Jacobian from $(\Psi,\Phi,u_1,u_2)$ to $(\Psi^*,\Phi^*)$ is hence 1. Choose a Bernoulli jump proposal with odds $q(C^{(r)},C^{(r)*})/q(C^{(r)*},C^{(r)})=p(C^{(r)*})/p(C^{(r)})$ and systematically scan through the graph for updating. Accept the move from $C^{(r)}$ to $C^{(r)*}$ with probability $\min\{1,\alpha\}$, where
$$\alpha=\frac{|M^*|^{1/2}}{|M|^{1/2}}\exp\Bigl\{-\frac12\mathrm{vec}(W)^T(M^*-M)\mathrm{vec}(W)\Bigr\}.$$

A.8. Model 2: updating ρ, ΩC(r) and C(r)

Given the current graph $C^{(r)}$, parameters are updated through Gibbs sampling. Recall the priors: $\rho\sim U(-1,1)$ and $\Omega_{C^{(r)}}\sim\mathrm{GWis}(b,V)$. A Metropolis-Hastings algorithm is used to update $\rho$; it can be shown that its full conditional distribution is
$$p(\rho\mid\cdot)\propto\bigl|D^{(s)}-\rho C^{(s)}\bigr|^{J/2}\exp\Bigl\{\frac{\rho}{2}\mathrm{vec}(U)^T(\Omega_{C^{(r)}}\otimes C^{(s)})\mathrm{vec}(U)\Bigr\}.$$
Let $W_1$ be the $I\times J$ matrix with $\mathrm{vec}(W_1)=\{I_J\otimes(D^{(s)}-\rho C^{(s)})^{1/2}\}\mathrm{vec}(U)$, and let $w_{1,j}$ be its $j$th column vector. Let $S$ be the $J\times J$ matrix with $s_{jj'}=w_{1,j}^Tw_{1,j'}$. Then the full conditional distribution of $\Omega_{C^{(r)}}$ is
$$p(\Omega_{C^{(r)}}\mid\cdot)\propto\bigl|\Omega_{C^{(r)}}\bigr|^{(b+I-2)/2}\exp\Bigl\{-\frac12\mathrm{tr}\bigl(\Omega_{C^{(r)}}(V+S)\bigr)\Bigr\}\,\mathbf{1}\bigl\{\Omega_{C^{(r)}}\in M^+(C^{(r)})\bigr\},$$
which is $\mathrm{GWis}(b+I,V+S)$. To sample from the G-Wishart distribution, we use the block Gibbs sampler of Wang and Li (Citation2012), given the set of maximum cliques.
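The conjugate update for $\Omega_{C^{(r)}}$ reduces to computing $S=W_1^TW_1$; a sketch of our own, using a dense symmetric square root in place of whatever factorisation a production sampler would use:

```python
import numpy as np
from scipy.linalg import sqrtm

def omega_posterior_params(U, Cs, rho, b, V):
    """Return the GWis(b + I, V + S) parameters for Omega_{C^(r)},
    where vec(W1) = {I_J (x) (Ds - rho*Cs)^(1/2)} vec(U), i.e., W1 = R U."""
    Ds = np.diag(Cs.sum(axis=1))
    R = np.real(sqrtm(Ds - rho * Cs))   # symmetric square root of the spatial precision
    W1 = R @ U                          # applies the root to each response column
    return b + U.shape[0], V + W1.T @ W1
```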

The graph $C^{(r)}$ is updated using Wang and Li (Citation2012)'s partial analytic structure algorithm (Algorithm 2, p. 188).

A.9. Model 3: updating ΩC(r), ΩC(s) and C(r)

Given the current graph $C^{(r)}$, parameters are updated through Gibbs sampling. Recall that we impose a constraint and use the joint prior (8) on $(z,z\Omega_{C^{(r)}})$, with $\Omega_{C^{(s)}}\sim\mathrm{GWis}(b^{(s)},V^{(s)})$. Let $W^{(r)}$ and $W^{(s)}$ be $I\times J$ matrices with $\mathrm{vec}(W^{(r)})=(I_J\otimes\Omega_{C^{(s)}}^{1/2})\mathrm{vec}(U)$ and $\mathrm{vec}(W^{(s)})=(\Omega_{C^{(r)}}^{1/2}\otimes I_I)\mathrm{vec}(U)$. Let $w_j^{(r)}$ be the $j$th column vector of $W^{(r)}$ and $w_i^{(s)}$ the $i$th row vector of $W^{(s)}$. Then let $S^{(r)}$ be the $J\times J$ matrix with $s_{jj'}^{(r)}=w_j^{(r)T}w_{j'}^{(r)}$ and $S^{(s)}$ the $I\times I$ matrix with $s_{ii'}^{(s)}=w_i^{(s)}w_{i'}^{(s)T}$. With these notations, we have $(z\mid\cdot)\sim\mathrm{Ga}(a_z,b_z)$, where $a_z=J(b^{(r)}-2)/2+\nu(C^{(r)})$ and $b_z=\mathrm{tr}(\Omega_{C^{(r)}}V^{(r)})/2$; $(\Omega_{C^{(r)}}\mid\cdot)\sim\mathrm{GWis}(b^{(r)}+I,\ zV^{(r)}+S^{(r)})$ and $(\Omega_{C^{(s)}}\mid\cdot)\sim\mathrm{GWis}(b^{(s)}+J,\ V^{(s)}+S^{(s)})$. The graph $C^{(r)}$ is updated using Wang and Li (Citation2012)'s partial analytic structure algorithm (Algorithm 2, p. 188).
