833
Views
1
CrossRef citations to date
0
Altmetric
Articles in the special topic of Bayesian analysis

Bayesian quantile semiparametric mixed-effects double regression models

ORCID Icon, , ORCID Icon & ORCID Icon
Pages 303-315 | Received 07 Feb 2020, Accepted 15 Jan 2021, Published online: 05 Feb 2021

Abstract

Semiparametric mixed-effects double regression models have been used for analysis of longitudinal data in a variety of applications, as they allow researchers to jointly model the mean and variance of the mixed-effects as a function of predictors. However, these models are commonly estimated based on the normality assumption for the errors and the results may thus be sensitive to outliers and/or heavy-tailed data. Quantile regression is an ideal alternative to deal with these problems, as it is insensitive to heteroscedasticity and outliers and can make statistical analysis more robust. In this paper, we consider Bayesian quantile regression analysis for semiparametric mixed-effects double regression models based on the asymmetric Laplace distribution for the errors. We construct a Bayesian hierarchical model and then develop an efficient Markov chain Monte Carlo sampling algorithm to generate posterior samples from the full posterior distributions to conduct the posterior inference. The performance of the proposed procedure is evaluated through simulation studies and a real data application.

1. Introduction

Semiparametric mixed-effects double (SPMED) regression models can be viewed as a useful extension to semiparametric mixed-effects models and provide a more flexible framework for analysis of longitudinal data in a wide range of disciplines, such as biological studies, econometrics, fiance, and social sciences. Compared to that of traditional semiparametric mixed-effects models, they allow researchers to simultaneously model the mean and variance of the mixed-effects as a function of predictors. In many practical applications, we shall be interested in modeling heteroscedastic data by assuming that both the location and scale parameters depend on a set of predictors. For example, in Taguchi-type designed experiments, the researchers are interested in dealing with the scale structures and modeling the source of variability in the observed data.

Many estimation methods have recently been developed for jointly modeling the mean and variance of semiparametric mixed-effects models from both frequentist and Bayesian perspectives. Examples include (Aitkin, Citation1987) for the maximum likelihood (ML) estimation for the mean and variance in normal regression models, Cepeda and Gamerman (Citation2000) for Bayesian analysis of modeling of variance heterogeneity in normal regression models, Lombardia and Sperlich (Citation2007) for the ML estimation of generalized mixed-effects models, Chen and Tang (Citation2010) for Bayesian analysis of semiparametric reproductive dispersion mixed-effects models, Chen and Ye (Citation2009Citation2011) for Bayesian hierarchical modeling methods on dual response surfaces. Tang and Duan (Citation2012) for Bayesian analysis of semiparametric generalized partial linear mixed models, to name a just few. We observe that estimation of the mean parameters in these models often depends on the normality assumption for the errors, which implies that the results may not be robust when the data exhibit heavy-tailed behavior and/or contain outliers. To tackle this issue, we may employ some heavy-tailed distributions for the errors, which should be less insensitive to heteroscedasticity and outliers and thus make the statistical analysis more robust; see, for example, Fonseca et al. (Citation2008); Kang et al. (Citation2018); Wu et al. (Citation2017).

Alternatively, quantile regression (QR) may be employed to deal with these problems, as it is insensitive to heteroscedasticity and a number of error distributions; see, for example, Koenker and Bassett (Citation1978); Koenker et al. (Citation2018). QR can be view as a type of regression analysis, as it depicts the effects of the predictors on the conditional quantile function of the response variable and thus provides information that the classical mean regression may fail to reflect. In recent decades, the parameter estimation in QR has drawn increasingly attention in the literature, such as the Majorize-Minimization algorithm of Hunter and Lange (Citation2000), the Expectation–Maximization of Tian et al. (Citation2014), and Bayesian method of Yu andMoyeed (Citation2001).

It is worth noting that due to a close relationship between the models with the asymmetric Laplace distribution (ALD) for the errors and the QR analysis (e.g., Yu and Moyeed (Citation2001)), Bayesian QR analysis has been widely studied by specifying the ALD as a working likelihood function. Such a specification is based on the fact that the problem of estimating the regression coefficients in linear QR models is equivalent to that of maximizing the likelihood function in linear models with the ALD for the errors. Furthermore, the utilization of the ALD could make the statistical analysis more robust than the normal distributed error and its mixture representation (Kozumi & Kobayashi, Citation2011)) allows researchers to develop a Bayesian hierarchical model for conducting the posterior inference; see, for example, Alhamzawi and Ali (Citation2018); Kotz et al. (Citation2001); Tian and Song (Citation2020). Yang et al. (Citation2016) discussed the asymptotic validity of posterior inference of pseudo-Bayesian quantile regression methods when the ALD is used as the working likelihood function. Bayesian QR analysis has also been studied in linear mixed-effects models. For instance, Waldmann et al. (Citation2013) studied Bayesian semiparametric additive regression models. Tian et al. (Citation2016) studied Bayesian joint QR analysis for mixed-effects models with different data features. Zhang et al. (Citation2019) investigated Bayesian quantile regression-based partially linear mixed-effects models for analysis of longitudinal data.

As seen in many practical applications, the assumptions of equal variances and normality for the errors may not be appropriate for modeling the data that exhibit heteroscedasticity and have tails heavier than those of a normal distribution. To avoid these assumptions, we may consider semiparametric mixed-effect models (e.g., Geraci (Citation2019); Geraci and Bottai (Citation2014)) or SPMED models (e.g., Xu et al. (Citation2016)) with flexible errors. The main difference between the two types of models is that SPMED models jointly consider the quantile of the response variable and the variance structures of random-effects as a function of the predictors. This framework allows researchers to model various quantiles and the variance based on the same set of the predictors through using parametric linear models that have been extensively studied in the literature. In addition, it is not only insensitive to heteroscedasticity and outliers, but also allows researchers to develop an efficient Markov chain Monte Carlo (MCMC) algorithm for performing the posterior inference. To be best of my knowledge, not much work has been done for conducting Bayesian QR analysis for the SPMED models under the ALD for the error. In this paper, we fill this gap by developing a Bayesian hierarchical SPMED model based on an assumption that both the quantile and variance parameters rely on a set of predictors under consideration, which could depict the effects of a set of predictors on the complete conditional distribution of a response variable.

The remainder of this paper is organized as follows. In Section 2, we briefly overview SPMED regression models and discuss the specification of the ALD as a working likelihood for the errors. In Section 3, we discuss prior distributions of the model parameters and develop an efficient MCMC-based sampling algorithm for drawing the posterior inference of all unknown parameters. In Section 4, we conduct simulation studies to investigate the performance of the proposed methods. In Section 5, a real-data application is provided for illustrative purposes, and finally, some concluding remarks are given in Section 6 with the Metropolis-Hastings algorithm provided in the Appendix.

2. Quantile semiparametric mixed-effects double regression models

In this section, we introduce quantile SPMED regression models in Section 2.1 and discuss Bayesian analysis of quantile SPMED regression models under the ALD error in Section 2.2.

2.1. Quantile SPMED regression models

Consider the semiparametric mixed-effects model of the form (1) yij=xijTβ+g(tij)+vi+ϵij,i=1,2,,n,j=1,2,,mi,(1) where yij is the response variable of the ith subject on the jth measurement, xij=(xij1,,xijp)T is a p×1 vector of predictor variables, β is a p×1 vector of unknown regression coefficients, g(tij) is an unknown smooth function associated with a univariate observed covariate at time tij, vi is a random-effect of the ith subject with viN(0,σi2) while σi2 being the heterogeneity variance, and ϵij is the error term. Here, the superscript T represents the transpose of a matrix or a vector. To avoid the need of an intercept, it is assumed that the response y=(y11,ymnn)T is zero centered. In practical applications, there often exists the variability related to predictors. This motivates us to assume the existence of variance heterogeneity for each subject, in which σi2 is related to a set of predictors zi=(zi1,,ziq)T, such that σi2=h(zi,γ), where h(,) is a known function and γ=(γ1,,γq)T is a q×1 vector of regression coefficients. There are several known forms for h(zi,γ), such as the log-linear model or the power product model (e.g., Xu et al. (Citation2016)).

Numerous techniques such as the smoothing splines and the kernel methods can be adopted to handle the nonparametric function g() in (Equation1); see, for example, Fan and Gijbels (Citation1996); Kai et al. (Citation2011), among others. In this paper, we consider the B-spline approximation to convert g() into a linear function, which consists of a set of basis functions. Without loss of generality, we assume that tij[0,1] can be partitioned as 0=s0<s1<<skn<skn+1=1, where si is an internal knot. There are K=kn+M normalized B-spline basis functions {πk(tij)} of order M that form a basis for the linear spline space, where πk() is the kth basis function for k=1,2,,K. By following the idea of He et al. (Citation2005), we may set the number of knots to be the integer part of N1/5 with N=i=1nmi. The model (Equation1) can then be simply linearized as (2) yij=xijTβ+bijTα+vi+ϵij,i=1,2,,n,j=1,2,,mi,(2) where bij=(π1(tij),,πK(tij))T is a K×1 vector of basis functions and α is a K×1 vector of the regression coefficients for the basis functions.

We observe from Kozumi and Kobayashi (Citation2011) that if the distribution function of the error ϵij in (Equation2) is restricted to have the τth quantile equal to zero, such that 0fτ(ϵij)dϵij=τ, where fτ() is the probability density function (pdf) of the error, then the τth quantile regression estimator for βτ and ατ can be obtained by minimizing the following objective loss function (3) (βˆτ, αˆτ)=argminβτ,ατi=1nj=1miρτ(yijxijTβτbijTατvi),(3) where τ(0,1) is a given quantile level, and the check loss function ρτ() is defined as ρτ(u)=u{τI(u<0)}, where I() denotes the indicator function. Since the estimators cannot be directly obtained by differentiating the objective function in (Equation3), we may employ numerical methods to calculate these quantile regression estimators; see, for example, Koenker and Park (Citation1996).

2.2. Bayesian quantile SPMED regression models

Within a Bayesian QR framework, we need to specify a density function fτ() for the error in (Equation2). Owing to an equivalence between the minimization problem in (Equation3) and the maximization of the likelihood function with the ALD for the errors (e.g., Yu and Moyeed (Citation2001)), we may assume the error in (Equation2) to be the ALD, denoted by ϵALD(μ,θ,τ), whose pdf is given by fτ(ϵ|μ,θ,τ)=τ(1τ)θexpρτϵμθ, where μ is the location, θ is the scale, and τ(0,1) stands for the skewness. Analogous to Kozumi and Kobayashi (Citation2011), we can represent the ALD as a normal-exponential mixture summarized in the following proposition.

Proposition 2.1

Let eE(θ1) and rN(0,1) be two independent random variables, where E(ϑ) represents an exponential distribution with mean ϑ. If ϵALD(μ,θ,τ), then it can be represented by ϵ=μ+k1e+rk2θe, where k1=12ττ(1τ) and k2=2τ(1τ).

According to Proposition 2.1, the model in (Equation2) under the ALD for the errors can be rewritten as (4) yij=xijTβ+bijTα+vi+k1eij+rijk2θeij,i=1,2,,n,j=1,2,,mi,(4) where eijE(θ1) and rijN(0,1) are independent of each other.

Let y=(y1T,,ynT)T be the vector of all response observations with yi=(yi1,,yimi)T, t=(t1T,,tnT)T be the time sequence vector with ti=(ti1,,timi)T, X=(X1T,,XnT)T be the design matrix with Xi=(xi1,,ximi)T, B=(B1T,,BnT)T with Bi=(bi1,,bimi)T, e=(e1T,,enT)T with ei=(ei1,,eimi)T, and r=(r1T,,rnT)T with ri=(ri1,,rimi)T. Denote v~=(v1T,,vnT)T with vi=vi1mi, where ⊗ is the Kronecker product and 1mi is a vector consisting of mi 1s. Then the model in (Equation4) can be rewritten as a matrix form (5) y=Xβ+Bα+v~+k1e+rk2θe,(5) where the symbol ‘’ is the Hadamard product that takes two matrices of the same dimensions to multiply element by element. The likelihood function of all model parameters is given by (6) L(α,β,γ,θ,v,e|y,X,Z,t)|Σ|12|E|12×exp12(yμ)TE1(yμ)12vTΣ1v,(6) where Σ=diag(σ12,,σn2) with σi2=h(zi,γ) for i=1,,n, h(,) is a known function and diag() is the matrix with its argument on the diagonal, γ=(γ1,,γq)T is a q×1 vector of the regression coefficients, E=k2θdiag(eT), μ=Xβ+Bα+v~+k1e, v=(v1,,vn)T, and Z=(z1,,zn)T with zi=(zi1,,ziq)T.

3. Posterior inference

Bayesian analysis begins with prior specifications for the unknown parameters (α,β,γ,θ) in (Equation6). We assume that (α,β,γ) are independently distributed as multivariate normal distributions, such that α|ϕ2NK(α0,ϕ2IK), β|θNp(β0,θBβ), and γNq(γ0,Bγ), respectively, where α0, β0, γ0, Bβ, and Bγ are the prespecified hyperparameters and IK is the identity matrix, and ϕ2 follows an inverse Gamma distribution, denoted by ϕ2InvGamma(aϕ2,bϕ2), where the shape parameter aϕ2 and the scale parameter bϕ2 are known positive constants. We assume that θInvGamma(aθ,bθ), where aθ and bθ are known positive constants. The resulting joint posterior distribution of the unknown parameters Φ=(α,β,γ,θ) is given by (7) p(Φ,e,v|y,X,Z,t)L(α,β,γ,θ,v,e|y,X,Z,t)p(e|θ)p(α|ϕ2)×p(β|θ)p(γ)p(ϕ2)p(θ),(7) which is indirectly tractable for performing the posterior inference. To tackle this issue, we first derive the full conditional distributions of the unknown parameters and then construct the Gibbs sampler with the Metropolis–Hastings sampling algorithms to generate posterior samples from their full conditional distributions as follows.

  • Full conditional distribution of α:

    Since α|ϕ2NK(α0,ϕ2IK), we have (8) p(α|β,γ,θ,v,e,y,X,Z,t)L(α|θ,β,γ,v,e,y,X,Z,t)p(α|ϕ2)exp12(αα0)TBα1(αα0),(8) where α0=Bα((ϕ2)1α0+BTE1(yXβv~k1e)) and Bα=(BTE1B+(ϕ2)1IK)1.

  • Full conditional distribution of β:

    Since β|θNp(β0,θBβ), we have (9) p(β|α,γ,θ,v,e,y,X,Z,t)L(β|θ,α,γ,v,e,y,X,Z,t)p(β|θ)exp12(ββ0)TBβ1(ββ0),(9) where β0=Bβ(θ1Bβ1β0+XTE1(yv~Bαk1e)) and Bβ=(XTE1X+θ1Bβ1)1.

  • Full conditional distribution of γ:

    Since γNq(γ0,Bγ), we have (10) p(γ|α,β,θ,v,e,y,X,Z,t)L(γ|θ,β,α,v,e,y,X,Z,t)p(γ)|Σ|12exp12vTΣ1v12(γγ0)TBγ1(γγ0).(10)

  • Full conditional distribution of θ:

    Since eij|θE(θ1),β|θNp(β0,θBβ), and θInvGamma(aθ,bθ), it can be easily shown that (11) p(θ|α,β,γ,v,e,y,X,Z,t)L(θ|β,α,γ,v,e,y,X,Z,t)p(e|θ)p(β|θ)p(θ)θaθ1expbθθ,(11) where aθ=3N+p2+aθ and bθ=12(yμ)TE01(yμ)+12(ββ0)TBβ1(ββ0)+eT1N+bθ with E0=k2diag(eT).

  • Full conditional distribution of ϕ2:

    Since α|ϕ2NK(α0,ϕ2IK) and ϕ2InvGamma(aϕ2,bϕ2), we have (12) p(ϕ2|α)p(α|ϕ2)p(ϕ2)(ϕ2)aϕ21expbϕ2ϕ2,(12) where aϕ2=K2+aϕ2 and bϕ2=12(αα0)T(αα0)+bϕ2.

  • Full conditional distribution of e:

    Since eij|θE(θ), the full conditional distribution of each eij is given by (13) p(eij|α,β,γ,θ,v,y,X,Z,t)L(eij|θ,β,α,γ,v,y,X,Z,t)p(eij|θ)eij12expaeeij+beij/eij2,(13) where ae=k12+2k2k2θ and beij=(yijxijTβbijTαvi)2k2θ.

  • Full conditional distribution of v:

    The conditional posterior v is given by (14) p(v|α,β,γ,θ,e,y,X,Z,t)exp12(v~TE1v~2(yXβBαk1e)TE1v~+vTΣ1v)12exp12(vv0)TBv1(vv0),(14) where v0=Bvw and Bv=(A+Σ1)1 with A=(k2θ)1diag(j=1m1e1j1,,j=1mnenj1) and w=(w1,,wn)T with  wi=(k2θ)1j=1mj{(yijxijTβbijTαk1eij)eij1}.

According to the conditional posteriors from (Equation8) to (Equation14), we an construct an efficient MCMC-based sampling algorithm for generating posterior samples from the full conditional distributions of the unknown parameters summarized in Algorithm 1.

It is worth noting that except the conditional distribution of γ in Step 4, the sampling schedules for other parameters are based on known distributions and can thus be easily implemented. After the burn-in period, we may assume that the posterior samplers from Algorithm 1 have converged to the joint posterior distribution in (Equation7). Then we collect a total number of M MCMC samples Φ(k)=(θ(k),α(k),β(k),γ(k)), for k=1,,M with M<J. The posterior means of the parameters (α~,β~,γ~,θ~) can be estimated as follows α~=1Mk=1Mα(k),β~=1Mk=1Mβ(k),γ~=1Mk=1Mγ(k),andθ~=1Mk=1Mθ(k), respectively.

4. Simulation study

In this section, we conduct simulation studies to evaluate the performance of the proposed Bayesian procedure with respect to different choices of prior information in Section 4.1 and several types of errors in Section 4.2. We here adopt the cubic spline approximation (i.e., M = 4) for the nonparametric function in (Equation1). Since the cubic B-spline has two continuous derivatives and a piecewise polynomial of degree three behaves numerically well, it should be sufficient to lead to smooth approximations. All the results were based on 10,000 iterations after discarding the first 2,000 as the burn-in period. There is no evidence of lack of convergence in MCMC simulations according to the run length control diagnostic due to Raftery and Lewis (Citation1992) and the convergence diagnostic test statistic (at a significance level of 5%) proposed by Geweke (Citation1992).

4.1. Quantile regression models with the ALD error

In the simulation study, we consider the model (Equation1) of the form (15) yij=xijTβ+g(tij)+vi+k1eij+rijk2θeij,i=1,2,,n,j=1,2,,m,(15) where we set g(tij)=sin(2πtij) and m = 4. We generate tij's from a uniform distribution on (0, 1), xij is a 3×1 vector, whose elements are independently sampled from N(0,1), and β=(1,0.8,1)T. For the random effect vi, we consider a log-linear structure, such that log(σi2)=ziTγ with γ=(1,0.5)T and zi=(zi1,zi2)T, where zi1 and zi2 are independently sampled from N(0,1). Then vi is generated from N(0,σi2). Since rijN(0,1), we have rijk2θeijN(0,k2θeij) with eijE(1). We can generate yij from N(μij,k2θeij) with μij=xijTβ+g(tij)+vi+k1eij.

We conduct the sensitivity analysis of the proposed method with respect to three different choices of the hyperparameters for β0 and γ0 as follows:

  • Type I: Accurate prior information with β0=(1,0.8,1)T and γ0=(1,0.5)T.

  • Type II: Inaccurate prior information with β0=1.5×(1,0.8,1)T and γ0=1.5×(1,0.5)T.

  • Type III: No prior information with β0=(0,0,0)T and γ0=(0,0)T.

Other hyperparameters are set as σγ2=4, aθ=bθ=1, aϕ2=bϕ2=1, Bβ=I3, and Bγ=I2 to reflect weak prior information. It deserves mentioning that we can easily incorporate available prior information into the proposed Bayesian procedure by specifying different values of the hyperparameters. In each type, we consider n={30,80,160} and quantile levels τ={0.25,0.5,0.75}. We generate 100 replications from each combination under consideration.

Figures  and  depict the true sine curve against its estimated one based on the cubic B-spline approximation. We observe that all estimated curves are close to the true ones, indicating that the cubic B-spline approximation performs well for estimating g() in (Equation15). Table  provides the simulation results for θ, β1,β2,β3, γ1, and γ2, including the estimated bias and the mean squared error (MSE) under different quantile levels, sample sizes, and the three types of prior information. Several conclusions from this table can be summarized as follows:

  1. As one expects, the bias and MSE of all the parameters decrease significantly at each quantile level as the sample size increases. For instance, there is no significant difference between n = 80 and n = 160. This illustrates that the proposed method could provide accurate estimates even when the sample size is moderate (e.g., n = 80).

  2. We observe that Bayesian estimates are quite robust for the prior specifications for the unknown parameters. The bias and MSE of the parameters do not have distinct difference across three types of considered prior information. We also observe that initial values of parameters and the prior inputs do not affect the performance of the proposed Bayesian method.

  3. It is worth noting that Bayesian estimates of γ have relatively large bias and MSE compared to the ones for other parameters. This is mainly because the parameter γ is related to the variance of the random-effects. Of particulate note is that Bayesian estimates of θ usually have a small bias regardless of different sample sizes and prior information.

  4. The results of bias and MSE of all the parameters clearly show that the performance of the proposed Bayesian method is quite satisfactory for any specific combination of sample size and prior information among different quantiles under consideration.

Figure 1. The true sine curve versus its estimated one when n = 80 and τ=0.5. Type I (left panel), Type II (middle panel), and Type III (right panel).

Figure 1. The true sine curve versus its estimated one when n = 80 and τ=0.5. Type I (left panel), Type II (middle panel), and Type III (right panel).

Figure 2. The true sine curve versus the estimated curve when n = 160 and τ=0.25. Type I (left panel), Type II (middle panel), and Type III (right panel).

Figure 2. The true sine curve versus the estimated curve when n = 160 and τ=0.25. Type I (left panel), Type II (middle panel), and Type III (right panel).

Table 1. Bias and MSE (in parenthesis) of the Bayesian estimates under ALD error distributions.

4.2. Quantile regression models with the non-ALD errors

To further illustrate the performance of the proposed Bayesian method, we consider the following three distributions for the errors

  • Type A: ϵijN(μ,4), where μ is chosen such that the τth quantile is 0.

  • Type B: ϵijLaplace(μ,2), where μ is chosen such that the τth quantile is 0.

  • Type C: ϵij0.3N(μ+1,1)+0.7N(μ,4), where μ is chosen such that the τth quantile is 0.

Other simulation settings and the choices of the hyperparameters keep the same as the ones in Section 4.1. We consider n = 80 and τ={0.25,0.5,0.75}. We consider noninformative priors by setting β0=(0,0,0)T and γ0=(0,0)T. Numerical results are provided in Table . It can be seen that under Type C, the MSEs of Bayesian estimates are smaller for all the parameters, especially for τ={0.25,0.75} and that Bayesian estimates are quite accurate regardless of the errors under consideration.

Table 2. Bias and MSE (in parenthesis) of the Bayesian estimates under non-AL error distributions.

5. Real data application: the multiCenter AIDS cohort study

In this section, we consider the MultiCenter AIDS Cohort Study (MACS) data, which is available in the ‘timereg’ package (Scheike & Zhang, Citation2011). The MACS data set consists of 283 human immunodeficiency virus (HIV) status of homosexual men. This data set has been widely used to study the mean CD4 percentage depletion over time and the effects of other physical status, including age of the patient at the start of the trial, smoking status, and the post-infection CD4 percentage; see, for example, Fan and Li (Citation2004); Xu et al. (Citation2016); Zhao and Xue (Citation2010). Owing to the difference of the trend of CD4 depletion between high CD4 percentage and low CD4 percentage patients, this data set is of specific interest to us for the implementation of the proposed Bayesian quantile SPMED models.

Let yij be the observation of CD4 percentage at the current visit, xij1 be the smoking status (0 for non-smoker and 1 for smoker), xij2 be the age of the patient at the start of the trial, and xij3 be the post-infection CD4 percentage. To avoid the need of an intercept, it is assume that the response y is zero centered. We also assume that the variance of the random effect vi has a linear relationship with two predictors zi1=1mij=1mixij1 and zi2=1mij=1mixij3. The Bayesian quantile SPMED regression model can be written as yij=xijTβ+g(tij)+vi+k1eij+rijk2θeij, where xij=(xij1,xij2,xij3)T, β=(β1,β2,β3)T, k1=12ττ(1τ), and k2=2τ(1τ). Here, eijE(θ1), rijN(0,1), and viN(0,σi2) with σi2=γ1zi1+γ2zi2 for i=1,2,,283,j=1,2,,mi.

We are interested in investigating the relationship between the mean CD4 percentage g(tij) and time tij at different quantile levels τ={0.05,0.25,0.5,0.75,0.95}. Due to a lack of prior knowledge, we set all the hyperparameters to be small and let initial values of all the unknown parameters be 0. For the implementation of the Metropolis-Hastings algorithm for sampling γ, we set σγ2=4 to achieve approximate acceptance rates of about 35% (Gelman & Gilks, Citation1996). For each quantile level, we run the proposed MCMC-based sampling algorithm 55,000 iterations, discard the first 5,000 as the burn-in period, and then perform a thinning factor of 10, leading to a total number of 5,000 samples for conducting the posterior inference.

Figures  and  depict the trace and density plots of 5,000 posterior samples of the model parameters. It can be seen from these figures that the MCMC chains of all the parameters rapidly converge to their stationary distributions. Furthermore, we present the autocorrelation function (ACF) plot to in Figure  check the autocorrelations between the posterior samples of all the parameters, which shows that the autocorrelations between posterior samples decline to zero at a very fast rate. Similar results are also observed with respect to other quantiles and are not shown here for simplicity. We also conduct the run length control diagnostic of Raftery and Lewis (Citation1992) and the convergence diagnostic test statistic of Geweke (Citation1992) at a significance level of 5%, which reconfirm that there is no evidence of lack of convergence of all the MCMC chains.

Figure 3. The trace plot and density plot of 5,000 posterior samples of (θ,γ1,γ2) when τ=0.5.

Figure 3. The trace plot and density plot of 5,000 posterior samples of (θ,γ1,γ2) when τ=0.5.

Figure 4. The trace plot and density plot of 5,000 posterior samples of (β1,β2,β3) when τ=0.5.

Figure 4. The trace plot and density plot of 5,000 posterior samples of (β1,β2,β3) when τ=0.5.

Figure 5. The ACF plot of 5,000 posterior samples of the unknown parameters when τ=0.5.

Figure 5. The ACF plot of 5,000 posterior samples of the unknown parameters when τ=0.5.

Table  summarizes Bayesian estimates (EST) of the unknown parameters (θ,ϕ2,β,γ) and their standard deviation estimates (SD). Figure  depicts the estimated CD4 depletion trends under five different quantile levels. As one expects, the CD4 depletion trends are quite distinct among diverse groups of patients. For instance, for the patients who have a high CD4 level around 50, their CD4 percentages almost get back to their original level after 6 years. Other patients' CD4 percentages could decrease quickly after an infection, whereas they would increase at the end. This observation indicates drug usage can be distributed according to the patients' starting CD4 percentages. It deserves mentioning that our results at τ=0.5 are in good agreement with the ones by fitting the local linear fitting method of Fan and Li (Citation2004) and by using the semiparametric varying-coefficient partially linear models of Zhao and Xue(Citation2011).

Figure 6. The mean CD4 percentage g(tij) vs time tij at the 5 different quantile levels.

Figure 6. The mean CD4 percentage g(tij) vs time tij at the 5 different quantile levels.

Table 3. The Bayesian estimates of the unknown parameters in the MACS data.

6. Concluding remarks

In this paper, we developed Bayesian methods for parameter estimation in quantile semiparametric mixed-effects double regression models by modeling the variance of the mixed-effects as a function of predictors. A Bayesian hierarchical model was developed and an efficient MCMC-based sampling algorithm was proposed for performing the full posterior inference. Numerical results from both simulation studies and a real data application showed that the proposed procedures perform well in general settings and can accurately estimate the parameters of interest under different scenarios. It is known that variable selection is as important as parameter estimation in modeling the longitudinal data, and for future work, we conduct Bayesian variable selection in quantile semiparametric mixed-effects double regression models.

Acknowledgements

The authors greatly appreciate the two reviewers for their thoughtful comments and suggestions, which have improved the quality of the manuscript. This work was based on the first author's dissertation research which was supervised by the corresponding author. Dr. Wu was supported by the National Natural Science Foundation of China under grant 11861041. Drs. Keying Ye and Min Wang were partially supported by a grant from the UTSA Vice President for Research, Economic Development, and Knowledge Enterprise at the University of Texas at San Antonio.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Duo Zhang

Duo Zhang is a statistician and his research interests include quantile regression, statistical analysis, and applications.

Liucang Wu

Liucang Wu is a professor of Statistics in the Faculty of Science at Kunming University of Science and Technology. His research interests include Bayesian analysis, statistical decision theory, multivariate analysis, and statistical applications.

Keying Ye

Keying Ye is a professor of Statistics in the Department of Management Science and Statistics at the University of Texas at San Antonio. His research interests include Bayesian analysis, statistical inference and decision theory, and statistical applications.

Min Wang

Min Wang is an associate professor of Statistics in the Department of Management Science and Statistics at the University of Texas at San Antonio. His research interests include Bayesian statistics, high dimensional inference, statistical applications, and machine learning.

References

Appendix

The Metropolis–Hastings algorithm for sampling γ in Algorithm 1 is summarized as follows. We choose the commonly used multivariate normal distribution as the proposal distribution (Chib & Greenberg, Citation1994), denoted by Nq(m(k),σγ2V(k)) with m(k)=argmaxlogp(γ|v(k),y,X,Z,t),V(k)={(H)1}γ=m,H=2p(γ|v(k),y,X,Z,t)γγT, where H is the Hessian matrix and σγ2 is chosen such that the average acceptance rate is between 0.25 and 0.45 (Gelman & Gilks, Citation1996). For the (k+1)th iteration, we sample γ(k) by the following two steps:

  1. Generate a new candidate γ from the proposal distribution Nq(m(k),σγ2V(k)).

  2. Let γ(k)=γ,if Unif (0,1)ω(γ,γ(k)),γ(k),otherwise,, where ω(γ,γ(k)) is the acceptance ratio given by ω(γ,γ(k))=min1,p(γ|v(k),y,X,Z,t)p(γ(k)|v(k),y,X,Z,t).

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.