3,630
Views
0
CrossRef citations to date
0
Altmetric
STATISTICS

Bayesian inference for generalized linear mixed models: A comparison of different statistical software procedures

ORCID Icon &
Article: 1896102 | Received 19 Aug 2020, Accepted 22 Feb 2021, Published online: 13 Apr 2021

ABSTRACT

Bayesian inference for generalized linear mixed models (GLMM) is appealing, but its widespread use has been hampered by the lack of a fast implementation tool and the difficulty in specifying prior distributions. In this paper, we conduct an extensive simulation study to evaluate the performance of INLA for estimation of the hierarchical Poisson regression models with overdispersion in comparison with JAGS and Stan while assuming a variety of prior specifications for variance components. Further, we analysed the influence of different factors such as small number of observations per cluster, different values of the cluster variance and estimation from a misspecified model. A simulation study has shown that the approximation strategy employed by INLA is accurate in general and that all software leads to similar results for most of the cases considered. Estimation of the variance components, however, is difficult when their true value is small for all estimation methods and prior specifications. The estimates obtained for all software tend to be biased downward or upward depending on the assumed priors.

PUBLIC INTEREST STATEMENT

Longitudinal count data are now an integral part of experimental and empirical studies across a range of disciplines from the medical to the social and business sciences. Special models for longitudinal data are required when there are repeated measurements of the count outcome from the same individual over time, which leads to a dependence structure in the data. Generalized linear mixed models (GLMM) are one of the most used models for modeling longitudinal count data. Bayesian inference for generalized linear mixed effect models (GLMM) is appealing, but its widespread use has been hampered by the lack of a fast implementation tool and the difficulty in specifying prior distributions. In this paper, we conduct an extensive simulation study to evaluate the performance of INLA for estimation of the hierarchical Poisson regression models with overdispersion in comparison with JAGS and Stan while assuming a variety of prior specifications for variance components.

1 Introduction

The Integrated Nested Laplace Approximation (INLA) by Rue et al. (Citation2009) is a Bayesian estimation method which is computationally faster than its predecessors. Since its introduction, its performance compared to other software for Bayesian analysis has been widely reported in the literature. Taylor and Diggle (Citation2013) compared INLA and Markov chain Monte Carlo (MCMC) in the context of spatial log-Gaussian Cox processes. Their simulation study confirms the advantage of INLA in terms of computational time, but shows that INLA has a lower predictive accuracy in certain scenarios. The use of INLA for Bayesian inference for generalized linear mixed models (GLMMs) was investigated by Fong et al. (Citation2010) making a comparison with the maximum likelihood approach via Penalized Quasi Likelihood (Breslow & Clayton, Citation1993). Fong et al. (Citation2010) showed that the approximations were inaccurate for binary data with few or no replications. Further, the performance of Bayesian inference using INLA for a random intercept logit model was investigated by Grilli et al. (Citation2015), who presented a comparison with Bayesian MCMC Gibbs sampling and maximum likelihood with Adaptive Gaussian Quadrature (AGQ) approximation. In contrast to the previous two studies, Grilli et al. (Citation2015) showed that the specification of the prior distribution is more relevant than the choice of the estimation method. Following Fong et al. (Citation2010), INLA’s developers addressed the inaccuracy of INLA for binary data with few or no replications by introducing a new correction term for INLA (Ferkingstad & Rue et al., Citation2015) and claim their correction has significantly improved the accuracy.

In this paper, we extend the investigation of Grilli et al. (Citation2015) to longitudinal count data with overdispersion and evaluate the performance of INLA in comparison with JAGS (Plummer, Citation2003) and Stan (Stan Development Team, Citation2015) while assuming a variety of prior specifications for variance components. In contrast with the comparison presented in Ferkingstad and Rue et al. (Citation2015), based on the difference between the results obtained by INLA after the correction to the results obtained by MCMC simulation, we base our comparison on the difference between the parameter estimates obtained for all methods and the true parameter values. Further, we investigate the influence of small sample sizes and true values on the cluster variance and overdispersion.

We proceed as follows. In Section 2, the two case studies used for illustration are presented. In section 3, we review the generalized linear mixed effects model, Bayesian estimation approaches and prior specifications. Section 4 presents results obtained for the two studies where we apply the three estimation methods to the two longitudinal count data sets. The simulation design and the main results of the simulation study are commented in Section 5. Finally, Section 6 gives a discussion and concluding remarks.

2 Case studies

Two data sets with longitudinal count outcomes were used to illustrate the methodological aspects discussed in this paper.

2.1 Anopheles mosquitoes count data

A longitudinal entomological study was conducted between June 2013 and November 2013 in Jimma town, south-western Ethiopia, to investigate whether the ecological transformation due to resettlement has an influence on the abundance and species composition of Anopheles mosquitoes. This was carried out by comparing villages who are at the centre of the town with those newly emerged villages located at the suburb of the town. The study design and rationale are presented in Degefa et al. (Citation2015).

The study consists of a longitudinal count data where adult Anopheles mosquitoes resting inside human habitations were collected monthly (June 2013—November 2013) from 40 selected houses using pyrethrum spray catches (PSCs). Half of the households belong to resettled villages and are considered to be at risk of Malaria infection. The second half of the household belongs to villages located in low-risk areas (control). (left panel) presents the monthly female Anopheles mosquito count while the right panel presents the mean evolution over time.

Figure 1. Female Anopheles mosquito count data. Individual count (left panel) and average profile (right panel) in at risk and control villages, Jimma town, Southwest Ethiopia (June—November 2013)

Figure 1. Female Anopheles mosquito count data. Individual count (left panel) and average profile (right panel) in at risk and control villages, Jimma town, Southwest Ethiopia (June—November 2013)

2.2 The epilepsy data

The epilepsy data set contains information about 59 epileptics’ patients that are randomized into two treatment arms, a placebo or a new drug in a randomized clinical trial of anticonvulsant therapy (Thall & Vail, Citation1990). The response variable, the number of epilepsy seizures, was measured at four visits over time. The data was presented and analysed by Breslow and Clayton (Citation1993) and used as an illustrating example in Fong et al. (Citation2010) who evaluated the performance of INLA in comparison with penalized quasi-likelihood (PQL). The epilepsy data set is available in R (R Core Team, Citation2016) as part of the R package INLA (Rue et al., Citation2009). shows the individual and average profiles of the epilepsy data for both treatment groups.

Figure 2. Individual profiles (left panel) and average profile (right panel) of the epilepsy data for both treatment groups

Figure 2. Individual profiles (left panel) and average profile (right panel) of the epilepsy data for both treatment groups

3 Hierarchical Bayesian models for count data

3.1 Hierarchical poisson-normal models

Generalized linear mixed models (GLMMs) extend the generalized linear model with a subject-specific random effect, usually of Gaussian type, added to the linear predictor to give a rich family of models that have been used in a wide variety of applications (McCulloch & Neuhaus, Citation2001; Molenberghs & Verbeke, Citation2005). The analysis presented in this paper is focused on count variables (number of female Anopheles mosquitoes and number of epilepsy seizures) which were repeatedly measured over time. Hence, we formulated a hierarchical Poisson model for both case studies. Let Yij represent the response variable of the ith subject measured at time j, i=1,2,,I and j=1,2,,J. A Poisson-Normal hierarchical (HPN) model can be formulated as

YijPoisson(λij),
ηij=log(λij)=xijTβ+b0i,
(1) b0iN(0,σb02),(1)

where xij is a p-dimensional design matrix of the fixed effect parameters β and b0i is a random intercept typically used to model repeated count responses. To complete the specification, we used non-informative prior distributions, namely, a normal distribution with mean zero and variance 10,000 for β‘s and considered various non-informative prior distributions for σb02 (see Section 3.2).

The HPN model specified in (1) assumes that all sources of variability in the data can be captured by the random effects and the Poisson variability. A commonly encountered problem related to count data is overdispersion (or underdispersion), which may cause serious flaws in precision estimation and inference (Breslow, Citation1990) if not appropriately accounted for. A number of extensions of the HPN models have been proposed to account for extra-dispersion in the setting of longitudinal count data (2Citation015; Aregay et al., Citation2013; Molenberghs et al., Citation2007). Aregay et al. (Citation2015) extend the HPN model in (1) by considering two separate random effects added to the linear predictor. The mean structure for their model (HPNOD) is given by

ηij=log(λij)=xijTβ+b0i+uij,
(2) uijN(0,σu2),(2)

where b0i is the random intercept used to account for possible clustering effect as before and the random effect uij included to accommodate the overdispersion not captured by the normal random effect b0i. The assumed priors for σu2 are discussed in section 3.2. We choose the additive overdispersion model above rather than the multiplicative overdispersion model (Aregay et al., Citation2013; Molenberghs et al., Citation2007) to keep parametrization of the model to be the same across softwares. Further, a comparison of the additive model and the multiplicative model has been described by Aregay et al. (Citation2015) and they reported the performance of the two approaches to be the same.

3.1.1 Model formulation for Anopheles mosquitoes count data

Let Yij represent the number of female Anopheles mosquito counts for household i during month j of the follow-up period, let tij be the time point (months) at which Yij has been measured, tij=1,,6 for all households, and let xi be an indicator variable, which denotes the village type of the ith household, which takes the value of one if the household is located in a resettled (risk) village and zero if the household is located in a control village. We assumed Yij|β,b0iPoisson(λij),i=1,,40,j=1,,6, and that the pattern of Anopheles mosquito abundance over time is log-linear and possibly different between the control and at-risk villages. Thus, the HPN model has a linear predictor of the form

(3) ηij=log(λij)=β00xi+β01(1xi)+β10xitij+β11(1xi)tij+b0i,(3)

and the mean structure for the HPNOD is given by

(4) ηij=log(λij)=β00xi+β01(1xi)+β10xitij+β11(1xi)tij+b0i+uij.(4)

3.1.2 Model formulation for epilepsy data

Breslow and Clayton (Citation1993) analyzed the epilepsy data set presented in Section 2.2 using likelihood-based inference via penalized quasi-likelihood (PQL). Fong et al. (Citation2010) used this data set to illustrate how Bayesian inference may be performed using INLA. We concentrate on the two random-effects models fit by Breslow and Clayton (Citation1993) and Fong et al. (Citation2010). Let Yij be the number of seizures for patient i at visit j, i=1,2,,59 and j=1,2,,4, assumed to be conditionally independent Poisson variables with mean exp(λij), where the linear predictor for the HPN model is given by

ηij=log(λij)=β0+β1log(Baselinei/4)+β2Trti+β3Trti×log(Baselinei/4)
(5) +β4log(Agei)+β5V4i+b0i,(5)

and the linear predictor for the HPNOD model is given by

ηij=log(λij=β0+β1log(Baselinei/4)+β2Trti+β3Trti×log(Baselinei/4)
(6) +β4log(Agei)+β5V4i+b0i+uij.(6)

Here, Baselinei/4 denote the one-fourth of the baseline seizure count for the ith patient, Trti is an indicator variable for the treatment arm of the ith patient, which takes the value of one if the patient is assigned to the new drug and zero if the patient is assigned to the placebo group, and V4i is an indicator variable for the fourth visit. To aid convergence when fitting the HPN and HPNOD models (5) and (6), respectively, the covariates log(Baselinei/4), log(Agei), and Trti×log(Baselinei/4) were centred about their mean.

3.2 Priors for the variance components of the hierarchical Poisson models

A Bayesian approach is attractive for modelling complex longitudinal count data, but requires the specification of prior distributions for all the random elements of the model. For the hierarchical models in Section 3.1, this involves choosing priors for the regression coefficients and the hyperparameters σb02 and σu2 of subject and observation-specific random effects, respectively. Two classes of prior distributions, informative and non-informative priors, are used in Bayesian modelling. One can use informative priors when substantial prior information is available, for instance, from previous studies relevant to the current data set. Non-informative prior distributions are intended to allow Bayesian inference for parameters about which little is known beyond the data included in the analysis at hand (Gelman, Citation2006). In this paper, we focus on the choose of non-informative prior.

The specification of a non-informative prior to express the absence of prior information about the cluster variance and overdispersion is often difficult (Gelman, Citation2006). Various non-informative prior distributions have been suggested in the Bayesian literature, including an improper uniform density on the scale of standard deviation (Gelman et al., Citation2003), proper distributions such as the inversegamma(,) with small positive for the variance (Lunn et al., Citation2012), and conditionally conjugate folded-non-central-t family of prior distributions for the standard deviation (Gelman, Citation2006). In this paper, we concentrate on the latter two approaches. We considered three specifications based on inversegamma(,) for the variance and half-Cauchy prior (a special case of the conditionally conjugate folded-non-central-t family of prior distributions) for the standard deviation (see in the supplementary material):

1. p(σ2)Γ(1,0.0005)—the default choice of the INLA software (Rue et al., Citation2009);

2. p(σ2)Γ(0.001,0.001)—the default choice of the BUGS software (Lunn et al., Citation2012) and the most popular choice in Bayesian analysis;

3. p(σ2)Γ(0.5,0.0164)—a specification proposed by Fong et al. (Citation2010);

4. Half—Cauchy prior with scale 25 on σ—a specification proposed by (Gelman, Citation2006).

4 Result

4.1 Analysis of the Anopheles mosquitoes count data

In this subsection, we present the analysis of the Anopheles mosquitoes count data introduced in Section 2.1. We considered three estimation routines (namely: Stan, JAGS, and INLA) where all of them are accessed through R software version 3.3.2 (R Core Team, Citation2016) to fit the HPN and HPNOD models presented in section 3.1.1 and 3.1.2. For the Bayesian inference using runjags (Denwood, Citation2016) we run 3 chains with 30,000 MCMC iterations per chain from which the first 10,000 iterations are considered burn-in period, while for RStan (Stan Development Team, Citation2016) the results are based on 4 chains with 4000 MCMC iterations per chain (with the first 2000 the burn-in period). Note that the models can be fitted in Stan using brms (Bürkner, Citation2017) as well. An elaborate discussion about two packages, RStan and brms, is given in Section 6 of the supplementary appendix. Model selection was made using the Deviance Information Criteria (DIC, Spiegelhalter et al., Citation2002) for INLA and JAGS and the widely applicable information criteria (WAIC, Vehtari et al., Citation2017) for Stan.

The posterior means and standard deviations for the parameters of the HPN and HPNOD models estimated using INLA, JAGS, and Stan using the four prior specifications are presented in . For all priors considered, the DIC and WAIC values of the HPNOD model are slightly higher than those of the HPN model, suggesting the latter to be a preferred model for this dataset.

Table 1. Parameter estimates of the HPN and HPNOD models for the Anopheles mosquitoes count data using INLA, JAGS and Stan

All prior distributions and all methods of estimations we examined yielded similar results for the regression coefficients. However, the results in revealed the influence of the software used to fit the models and the prior specification on the parameter estimates corresponding to the random effect terms. The three estimation methods give similar point estimates for the variance of the random intercept under the Γ(0.001,0.001) and Γ(0.5,0.0164) priors in both the HPN and HPNOD models. On the other hand, the difference among estimation methods is noticeable under the Γ(1,0.0005) and half Cauchy prior, where the posterior mean for σb02 obtained from Stan is about 6% larger than INLA under Γ(1,0.0005) prior and about 13% larger under the half Cauchy prior. shows the posterior density of the precision of the random intercept (σb02) obtained from JAGS, Stan and INLA using the four prior distributions. All estimation methods lead to a similar result under Γ(0.001,0.001) and Γ(0.5,0.0164) priors, but the posterior density of INLA for σb02 deviates from that of Stan under the Γ(1,0.0005) and half Cauchy prior. For INLA, we observed differences in the point estimates of σb02 and σu2 across priors, where a profound variation across priors is observed in the posterior density of σu2 ().

Figure 3. Anopheles mosquitoes count data: Posterior density for the precision of the random intercept (σb02) obtained for the HPN model

Figure 3. Anopheles mosquitoes count data: Posterior density for the precision of the random intercept (σb0−2) obtained for the HPN model

Figure 4. Anopheles mosquitoes count data: Marginal posterior distribution of σb (left panel) and σu (right panel) for the analysis of Anopheles mosquitoes count data using HPNOD model estimated via INLA. Pr1—Γ(1,0.0005), Pr2—Γ(0.001,0.001), Pr3—Γ(0.5,0.0164) and Pr4—(half-Cauchy(0,25)

Figure 4. Anopheles mosquitoes count data: Marginal posterior distribution of σb (left panel) and σu (right panel) for the analysis of Anopheles mosquitoes count data using HPNOD model estimated via INLA. Pr1—Γ(1,0.0005), Pr2—Γ(0.001,0.001), Pr3—Γ(0.5,0.0164) and Pr4—(half-Cauchy(0,25)

4.2 Analysis of the epilepsy data

The HPN and HPNOD models formulated in Equationequations (5) and (Equation6) were fitted using the three software and four priors discussed in the previous section.

Figure 5. Epilepsy data: Posterior density for the precision of the random intercept (σb02) obtained for the HPNOD model

Figure 5. Epilepsy data: Posterior density for the precision of the random intercept (σb0−2) obtained for the HPNOD model

presents the posterior means obtained for all fitted models. The DIC (WAIC) values of the HPNOD model obtained for all estimation methods are smaller than those of the HPN model for all prior settings, indicating that the first model is preferred. The three estimation methods lead to virtually identical results for the HPNOD model under priors 2 and 3. However, under the half Cauchy prior (prior 4), the posterior means for the random intercept (σb02) and overdispersion (σu2) obtained for Stan and JAGS are slightly higher than the posterior mean obtained for INLA. show the posterior density of the precision of the random intercept (σb02) and the precision of the overdispersion parameter (σu2) obtained from JAGS, Stan and INLA using the four prior distributions. We notice a small difference between JAGS and INLA as well as Stan and INLA for the two prior specifications (priors 1 and 2). Differences between estimation methods are clearly seen when the half Cauchy prior is used.

Figure 6. Epilepsy data: posterior density for the precision of the overdispersion parameter (σu2) obtained for the HPNOD model

Figure 6. Epilepsy data: posterior density for the precision of the overdispersion parameter (σu−2) obtained for the HPNOD model

Table 2. Parameter estimates of the HPN and HPNOD models for the epilepsy data set using INLA, JAGS and stan

5 Simulation study

A simulation study was conducted to compare the performance of the three Bayesian software and the four prior specifications for σb0 and σu presented in Section 3.2.

5.1 Simulation setting

The simulation represents a longitudinal study where the data are Poisson distributed. The steps for the simulation study are as follows: For i=1,,N clusters (subjects), j=1,,J observations per cluster observed at equally spaced sampling times tij=(ti1,,tiJ) , we generate YijPoisson(λij) where

(7) ηij=log(λij)=β00xi+β01(1xi)+β10xitij+β11(1xi)tij+b0i+uij,(7)

with b0iN(0,σb02),uijN(0,σu2), and xi is an indicator variable such that xi=0 for iN/2 and xi=1 otherwise. The model formulated in (7) is the HPNOD model discussed in (3.1.1). Low, moderate, and high levels of overdispersion were considered corresponding to σu=0.2,1 and 2, respectively (Aregay et al., Citation2015). A model without overdispersion (HPN) was formulated by omitting uij from the mean structure in (7). The true values of the fixed effect parameters for both the HPN and HPNOD models are β=(β00,β01,β10,β11) =(2,2,0.05,0.2) . To study how the performance of the estimation methods and the role of the prior distribution are affected by the value of the cluster variance, we used four different values for the standard deviation parameter of the random intercept term, i.e σb0=0.1,0.5,1,1.5. Further, in order to study the effect of small number of observations per cluster, we considered four balanced designs with J=2,5 observations per cluster and the number of clusters equal to N=20 and 40. In total 4×4×2×2=64 simulation setting was considered. For each setting 500 datasets were generated. For each dataset, the HPN and HPNOD models were fitted with INLA, JAGS, and Stan using a flat prior distribution N(0,1000) for the fixed effect parameters and using the four prior distributions listed in Section 3.2 for σb02 and σu2. For JAGS and Stan, we ran 3 chains with 20,000 iterations per chain from which the first 10,000 iterations were considered as burn-in period. For each parameter of interest θ, the relative bias was calculated by

1500Σi=1500θˆiθ/θ,

and the mean squared error (MSE) by

1500Σi=1500Var(θˆi)+θˆiθ2,

where θˆi is the parameter estimate for θ obtained for the ith simulation. A simulation study for unbalanced longitudinal data was conducted as well. The simulation setting and result are discussed in detail in Section 5 of the supplementary appendix.

5.2 Simulation results

5.2.1 Estimation of σb0

(upper left panel) presents the simulation results obtained for the datasets generated without overdispersion. For all prior specifications, and for all software, differences were observed for σb0=0.1. For INLA and JAGS, prior 1 led to an underestimation of σb0 while prior 24 led to an over estimation. For Stan, prior 4 led to over estimation, while priors 1–3 led to an underestimation of σb0. Note that this pattern was observed for both HPN and HPNOD, which in this case, mis-specified the mean structure (by including the over dispersion parameter).

The upper right and the lower panels in present the simulation results obtained for the datasets generated with varying levels of overdispersion (σu=0.2, σu=1, and σu=2). In this case, the HPN model mis-specified the mean structure of the underlying model used to generate the data (by omitting the overdispersion parameter). For datasets generated with low levels of overdispersion (σu=0.2), we observe a similar pattern to that of no overdispersion. When the data are generated with a moderate to high level of overdispersion, for the HPN model, all prior specifications and software have a tendency to overestimate the value of σb0 with the magnitude of overestimation decreasing as the value of σb0 increases. For the HPNOD model (which specified the mean structure correctly), for σb0=0.1, prior 1 consistently led to an under estimation of σb0 across all software but with the smallest magnitude compared to the overestimation observed for the other priors.

Figure 7. Relative bias for the random intercept σb0 (y-axis) as a function of the true value of σb0 (x-axis); panels by level of overdispersion σu. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Figure 7. Relative bias for the random intercept σb0 (y-axis) as a function of the true value of σb0 (x-axis); panels by level of overdispersion σu. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Similar patterns were observed for the second simulation study of unbalanced longitudinal data (see Section 5 of the supplementary material). When the true values for σb are small, substantial variation is observed among software and prior specifications and this variation vanishes when the true values are relatively large.

5.2.2 Estimation of σu

shows the results for the estimation of σu across the levels of σb0 for the HPNOD model. Differences between the priors can be observed for σu=0.2. For INLA and Stan, all priors led to an underestimation of σu (prior 1 with the highest magnitude). For JAGS, prior 4 led to an overestimation of σu while the other priors to an underestimation.

Figure 8. Relative bias for the overdispersion parameter σu (y-axis) as a function of the true value for σu (x-axis). Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Figure 8. Relative bias for the overdispersion parameter σu (y-axis) as a function of the true value for σu (x-axis). Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

5.2.3 Estimation of β‘s

The relative biases for the regression coefficients for given values of σb0 and σu are shown in . For INLA and Stan, all priors lead to similar relative bias for all regression coefficients under all settings. For JAGS, we observed a different pattern for β10 when the data are generated with a high level of overdispersion (σu=2) and the HPNOD model used to fit the data. The relative bias for this parameter increases in a linear fashion with the value of the standard deviation of the random intercept.

Figure 9. Relative bias for regression coefficients: β00 (top left panel), β01 (top right panel), β10 (bottom left panel), and β11 (bottom right panel). Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Figure 9. Relative bias for regression coefficients: β00 (top left panel), β01 (top right panel), β10 (bottom left panel), and β11 (bottom right panel). Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

5.2.4 Effect of cluster size

A simulation study also investigated the effect of a small number of observations per cluster on the estimation. Two and five observations per cluster were considered. , presents the relative bias for all the parameters of the HPNOD model. The results obtained for N=20, σb0=1, and σu=1 indicate that, as expected, the relative bias decreases as the sample size increases. Note that this pattern was observed for all software and priors considered. The results obtained for the HPN model are similar and presented in Section 3 of the supplementary appendix.

Figure 10. Relative bias for HPNOD model parameters (y-axis) as a function number of observation per cluster J (x-axis). Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20, σb0=1, and σu=1

Figure 10. Relative bias for HPNOD model parameters (y-axis) as a function number of observation per cluster J (x-axis). Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20, σb0=1, and σu=1

5.3 Random intercept and slope model

We extend the simulation setting in Section 5.1 to a random intercept and slope model. For i=1,,N clusters (subjects), j=1,,J observations per cluster observed at equally spaced sampling times tij=(ti1,,tiJ) , we generate YijPoisson(λij) where

(8) ηij=log(λij)=β00xi+β01(1xi)+β10xitij+β11(1xi)tij+b0i+b1itij+uij,(8)

with b0iN(0,σb02), b1iN(0,σb12), uijN(0,σu2), and xi is an indicator variable such that xi=0 for iN/2 and xi=1 otherwise. We considered four set of values for σb0 and σb1, i.e σb000σb1=0.1000.1, 0.1001.5,1.5000.1,and1.5001.5. Further, low, moderate, and high levels of overdispersion were considered corresponding to σu=0.2,1 and 2, respectively. A model without overdispersion (HPN) was formulated by omitting uij from the mean structure in (8). The true values of the fixed effect parameters are β=(β00,β01,β10,β11) =(2,2,0.05,0.2) . For each setting, we made 100 simulated data sets consisting of 20 subjects with 5 measurements per subject and fit the HPN and HPNOD models with INLA, JAGS, and Stan using the four prior distributions listed in Section 3.2 for σb02, σb12 and σu2. For JAGS and Stan, we run 3 chains with 20,000 iterations per chain from which the first 10,000 iterations were considered as burn-in period. Then, we computed the relative bias and mean squared for each parameter (see Section 5.1).

5.3.1 Estimation of σb0 and σb1

The relative bias for σb0 obtained for the datasets generated with varying level of overdispersion (σu2=0,0.2,1,and2) and different values of σb1 (σb1=0.1andσb1=1.5) are presented in . We observe substantial variation among priors and software when σb0=0.1. When σb0=0.1 and the data generated without overdispersion, Prior 1 (Γ(1,0.0005)) leads to underestimation while prior 2–4 leads to overestimation for all software. The relative bias decreases as the true value for σb0 increases. We also observed the influence of the level of variation in the random slope (σb1) on the estimate of σb0 especially for JAGS.

Figure 11. Relative bias for σb0 as a function of the it’s true value (x-axis); panels by level of overdispersion (σu) and true values of σb1. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Figure 11. Relative bias for σb0 as a function of the it’s true value (x-axis); panels by level of overdispersion (σu) and true values of σb1. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

presents the relative bias for σb1 obtained for the datasets generated with varying levels of overdispersion (σu2=0,0.2,1,and2) and different values of σb0 (σb0=0.1andσb0=1.5). As before, variation among priors and software is observed when the true value for σb1 is small (σb1=0.1) and the true value for σb0 is large (σb1=1.5). Overall, INLA performs better than JAGS and Stan under all scenarios.

Figure 12. Relative bias for σb1 as a function of the it’s true value (x-axis); panels by level of overdispersion (σu) and true values of σb0. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Figure 12. Relative bias for σb1 as a function of the it’s true value (x-axis); panels by level of overdispersion (σu) and true values of σb0. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

5.3.2 Estimation of σu

presents the relative bias for σu obtained for the datasets generated with varying levels of overdispersion. The variation among priors decreases as the true value of overdispersion (σu) increases. However, a substantial difference is observed among software. INLA performs better than JAGS and Stan under all scenarios. JAGS and Stan consistently underestimate σu when σb000σb1=0.1000.1 and 0.1001.5. Further, when the data are generated with σb000σb1=0.1001.5 or 1.5001.5, JAGS and Stan overestimate σu when σu=0.2 and underestimate σu when σu=1orσu=2.

Figure 13. Relative bias for σu as a function of it’s true value (x-axis) obtained. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

Figure 13. Relative bias for σu as a function of it’s true value (x-axis) obtained. Red lines for prior 1 (Γ(1,0.0005)), green for prior 2 (Γ(0.001,0.001)), light blue for prior 3 (Γ(0.5,0.0164)) and purple for prior 4 (half-Cauchy(0,25)). N=20 and J=5

6 Concluding remarks

In this paper, we performed a Monte Carlo simulation study in order to simultaneously evaluate the performance of the Bayesian estimation methods and prior specifications for variance components in the context of longitudinal count data. We compared the results obtained with INLA to the results obtained with JAGS which uses Gibbs sampling and Stan which uses Hamiltonian Monte Carlo while assuming a variety of prior specifications for variance components. We analysed the influence of different factors such as small number of observations per cluster, different values of the random effect variance and estimation from a misspecified model on the bias and mean squared errors of the parameter estimates.

A simulation study has shown that the approximation strategy employed by INLA is accurate in general and all software leads to similar results for most of the cases considered. Estimation of the variance components, however, is difficult when their true value is small for all estimation methods and prior specifications. The estimates obtained for all software tend to be biased downward or upward depending on the prior. The results of the simulation study also show that there is an effect of cluster size. For all software and prior specifications, the relative bias for all parameters decrease as cluster size increases. For the random intercept and slope model, INLA performs better than JAGS and Stan under all scenarios. In our simulation study of the random intercept and slope model, we only considered independent prior for the random intercept and slope. Future research should focus on the comparison of different software assuming general priors for variance-covariance.

Supplemental material

Supplemental Material

Download Latex File (187.7 KB)

Acknowledgements

The authors gratefully acknowledge the support from VLIR-UOS. For the simulations, the Flemish Supercomputer Centre, funded by the Hercules Foundation and the Flemish Government of Belgium – department EWI, was used.

Disclosure statement

No potential conflict of interest was reported by the authors.

Supplemental data

Supplemental data for this article can be accessed here.

Additional information

Funding

The authors received no direct funding for this research.

Notes on contributors

Belay Birlie Yimer

Belay Birlie Yimer is currently a research associate at Versus Arthritis Centre for Epidemiology, Division of Musculoskeletal and Dermatological Sciences, The University of Manchester, UK. Prior to joining the University of Manchester, Belay was an assistant professor in the Department of Statistics, Jimma University, Ethiopia. He has completed his B.Sc. and M.Sc. in statistics and is currently pursuing his PhD in statistics. His main research area focuses on Bayesian methods and modelling complex time-to-event data with applications to infectious and chronic diseases.

Ziv Shkedy

Ziv Shkedy is a Professor of Biostatistics and bioinformatics at the University of Hasselt in Belgium.

References