Articles in the special topic of Bayesian analysis

Bayesian analysis for quantile smoothing spline

Pages 346-364 | Received 02 Jan 2021, Accepted 18 Jun 2021, Published online: 06 Jul 2021

Abstract

In Bayesian quantile smoothing spline [Thompson, P., Cai, Y., Moyeed, R., Reeve, D., & Stander, J. (2010). Bayesian nonparametric quantile regression using splines. Computational Statistics and Data Analysis, 54, 1138–1150.], a fixed scale parameter in the asymmetric Laplace likelihood tends to produce misleading fitted curves. To solve this problem, we propose a new Bayesian quantile smoothing spline (NBQSS), which treats the scale parameter as random. To begin with, we justify its objective prior options by establishing one sufficient and one necessary condition for posterior propriety under two classes of general priors, including the invariant prior for the scale component. We then develop partially collapsed Gibbs sampling to facilitate the computation. Out of practical concern, we extend the theoretical results to NBQSS with unobserved knots. Finally, simulation studies and two real data analyses reveal three main findings. Firstly, NBQSS usually outperforms other competing curve-fitting methods. Secondly, NBQSS with unobserved knots behaves better than NBQSS without them in terms of estimation accuracy and precision. Thirdly, NBQSS is robust to possible outliers and provides accurate estimation.

1. Introduction

While mean regression has been widely used in statistics, quantile regression, first proposed by Koenker and Bassett (Citation1978), is regarded as a powerful complement to mean regression. It models the conditional quantiles of the dependent variable as a function of the covariates. There is an extensive literature on quantile regression in classical statistics from various fields, including the works of Koenker (Citation2004), Chernozhukov (Citation2005), Wang et al. (Citation2007), Li and Zhu (Citation2008) and Wu and Liu (Citation2009). We recommend Koenker (Citation2005) for a comprehensive review of quantile regression. Suppose that we have N samples (x1,y1),…,(xN,yN). The linear quantile regression model for the pth quantile is yi = xi′β + ui, where the ui are independent with pth quantile zero. Koenker and Bassett (Citation1978) proposed to obtain the estimator of the coefficient vector β = (β1,…,βm)′ by solving the minimisation (1) min_β ∑_{i=1}^N ρ_p(yi − xi′β), where ρ_p(x) is the check function (2) ρ_p(x) = px if x > 0 and ρ_p(x) = (p−1)x if x ≤ 0. From a frequentist view, the consistency and asymptotic normality of the estimators have been well established (Koenker, Citation2005). However, the Bayesian approach is preferred when the priority is making inference with a small sample size or obtaining credible intervals for all the parameters simultaneously. Under the Bayesian framework, Yu and Moyeed (Citation2001) assumed that the error term ui follows the asymmetric Laplace distribution (ALD). This choice has received tremendous attention for the guaranteed posterior consistency of the Bayesian estimators (Sriram et al., Citation2013) and its robustness (Yu & Moyeed, Citation2001) even when the random errors do not follow the ALD. The density of the ALD with location parameter μ ∈ ℝ, scale parameter σ > 0 and pth quantile, ALD(μ,σ,p), is (3) p(y|μ,σ,p) = p(1−p)σ exp{−σρ_p(y−μ)}. With the ALD, the likelihood function of β is

(4) p(y|β,σ,p) = p^N (1−p)^N σ^N exp{−σ ∑_{i=1}^N ρ_p(yi − xi′β)}.

Clearly, with a constant prior on β, the posterior mode of β given σ is the minimiser of (1).
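The check function (2) and the ALD density (3) are simple to express in code. The sketch below is our own illustration (the function names are not from the paper); the numerical check confirms that the ALD(μ,σ,p) density places mass p to the left of μ, i.e. its pth quantile is μ.

```python
import numpy as np

def check_loss(u, p):
    """Check function rho_p from Equation (2)."""
    return np.where(u > 0, p * u, (p - 1) * u)

def ald_pdf(y, mu, sigma, p):
    """Asymmetric Laplace density from Equation (3)."""
    return p * (1 - p) * sigma * np.exp(-sigma * check_loss(y - mu, p))

# mass to the left of mu is p, so the p-th quantile of ALD(mu, sigma, p) is mu
grid = np.linspace(-40.0, 0.0, 400001)
print(np.trapz(ald_pdf(grid, 0.0, 1.0, 0.3), grid))  # close to 0.3
```

This is why placing a constant prior on β and maximising the ALD likelihood reproduces the check-loss minimisation in (1).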

Notice that (4) is based on a linear relationship between the response variable and a set of predictors. In reality, the underlying relationship between a univariate covariate x and y may be a nonlinear function f(x). In the discussion of Cole (Citation1988), Jones (Citation1988) proposed to estimate the quantile smoothing spline by (5) min_f ∑_{i=1}^N ρ_p(yi − f(xi)) + η ∫_0^T (f″(x))² dx, where η ∈ ℝ₊ controls the smoothness of the spline. Koenker et al. (Citation1994) pointed out that the resulting quadratic program in (5) poses some serious computational issues, and they estimated f(x) by minimising (6) min_f ∑_{i=1}^N ρ_p(yi − f(xi)) + η ∫_0^1 |f″(x)| dx, where f(x) belongs to the space U2 of functions on [0,1] defined as follows: (7) U2 = {g : g(x) = a0 + a1x + ∫_0^1 (x−y)₊ dμ(y), V(μ) < +∞}. They showed that the solution is a linear spline with knots at the xi and that the computation enjoys the virtues of linear programming. However, the frequentist approaches cannot easily provide uncertainty evaluations for the estimation of f(·) or a stable estimator of η. From a Bayesian perspective, Thompson et al. (Citation2010) proposed the original Bayesian quantile smoothing spline to solve (5). Using the ALD in (3) with σ = 1 as the likelihood, they imposed the prior on z = (f(x1),…,f(xN))′, (8) p(z|η) ∝ η^{(N−2)/2} (2π)^{−(N−2)/2} (μ1⋯μ_{N−2})^{−1/2} exp{−(η/2) z′Az}, where A is a smoothing matrix (see Section 2) and μ1,…,μ_{N−2} are the inverses of the N−2 non-zero eigenvalues of A. However, two undesirable issues exist in this method. Firstly, as demonstrated by Santos and Bolfarine (Citation2016), the likelihood with σ fixed is not flexible enough to capture the variability of the data, especially when p is away from 0.5, which leads to inaccurate estimates. Secondly, and worse, our extensive simulation studies and one real data example (Section 4) show that this method automatically places either an extremely large or an extremely small value on η, hence the resulting curve tends to be either too flat or overfitted and fails to capture the true trend of the underlying curve.

One natural way to alleviate this problem is to employ a random σ instead of a fixed value. We probe this option for the Bayesian quantile smoothing spline using the likelihood (4) and refer to it as the new Bayesian quantile smoothing spline (NBQSS). In fact, a random σ has been studied in other Bayesian quantile curve-fitting methods. For instance, Yue and Rue (Citation2011) proposed Bayesian quantile regression for additive mixed models. Hu et al. (Citation2015) established semiparametric Bayesian quantile regression for partially linear additive models. However, they all employed proper conjugate priors on the scale parameter, which is not favourable when little information is available. In this paper, we explore objective priors for the scale parameter σ and investigate the propriety of the joint posterior under general classes of priors including the invariant prior on σ. We also develop a partially collapsed Gibbs sampling algorithm for the joint posterior. Furthermore, to deal with unobserved knots in some intervals, we extend our framework to the unobserved-knots case seamlessly.

The remainder of the paper is organised as follows. In Section 2, we present NBQSS and establish the corresponding conditions for posterior propriety under two common prior specifications. We also develop the related ready-to-use partially collapsed Gibbs sampling. In Section 3, we extend our conclusions to the case with unobserved knots out of practical consideration. In Section 4, to evaluate the performance of our method, we compare NBQSS with four other competing curve-fitting methods through extensive simulation studies. In addition, in the case of unobserved knots, we investigate the behaviour of NBQSS under two unobserved-knot mechanisms. In Section 5, we conclude with the highlights of this paper, recommendations for the priors and research priorities for the future.

2. Bayesian quantile smoothing spline

2.1. Prior specification and posterior propriety

We formally address the minimisation (5). To be more specific, we assume the knots xi are arranged in increasing order, 0 ≤ x1 < ⋯ < xn ≤ T. Suppose that wi is the number of observations at knot xi, i = 1,…,n, with ∑_{i=1}^n wi = N, and y_ij is the jth observation at xi, j = 1,…,wi. Then, (5) can be rewritten as (9) min_{f∈W2²} ∑_{i=1}^n ∑_{j=1}^{wi} ρ_p(y_ij − f(xi)) + η ∫_0^T (f″(x))² dx, where W2² = {f : f, f′ are absolutely continuous, ∫_0^T (f″(t))² dt < +∞}. The solution to (9) is a natural cubic spline with knots xi, i = 1,…,n (Koenker et al., Citation1994; Wahba, Citation1990). Therefore, solving (9) is equivalent to solving (10) min_{θ0,θ1,a} ∑_{i=1}^n ∑_{j=1}^{wi} ρ_p(y_ij − θ0 − θ1xi − ∑_{k=1}^n ak R(xk,xi)) + η a′Ra, where R is an n×n positive definite matrix with (i,j) element R(xi,xj) and R(·,·) is a reproducing kernel (Gu, Citation2013) of the following form: (11) R(x,y) = ∫_0^1 (x−u)₊(y−u)₊ du = (1/6)(x∧y)²{3(x∨y) − (x∧y)}, where (x−u)₊ = max{x−u, 0}, x∨y = max(x,y), x∧y = min(x,y), a = (a1,…,an)′, and a′Ra is the penalty term corresponding to ∫_0^T (f″(t))² dt in (9).
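The closed form of the reproducing kernel in (11) can be checked against the integral definition numerically; the sketch below is our own illustration (helper names are not from the paper).

```python
import numpy as np

def R_closed(x, y):
    """Closed form of the reproducing kernel in Equation (11)."""
    lo, hi = min(x, y), max(x, y)
    return lo ** 2 * (3.0 * hi - lo) / 6.0

def R_numeric(x, y, m=200001):
    """The integral definition of R, evaluated by the trapezoidal rule."""
    u = np.linspace(0.0, 1.0, m)
    return np.trapz(np.maximum(x - u, 0.0) * np.maximum(y - u, 0.0), u)

print(R_closed(0.3, 0.7))  # approx 0.027; R_numeric(0.3, 0.7) agrees to ~1e-6
```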

To handle the minimisation in (10) under the Bayesian framework, we assume the likelihood of (θ0,θ1,σ,a) based on y = (y_11,…,y_{1w1},…,y_{nwn})′ is (12) p(y|θ0,θ1,a,σ,p) ∝ σ^N exp{−σ ∑_{i=1}^n ∑_{j=1}^{wi} ρ_p(y_ij − θ0 − θ1xi − ∑_{k=1}^n ak R(xk,xi))}. Defining θ = (θ0,θ1)′, we assign the prior for (θ,a) to be (13) π(θ,a|δ) ∝ δ^{−n/2} exp{−a′Ra/(2δ)}. Note that with the likelihood (12) and the prior (13), the posterior mode of (θ,a|δ,σ,y) is the solution to (10).

As our primary interest lies in the fitted curves rather than the estimation of (θ,a), inspired by Speckman and Sun (Citation2003), we alternatively unify the prior of (θ,a) as the prior of the unknown functional values zi = θ0 + θ1xi + ∑_{k=1}^n ak R(xk,xi). Here, we denote z = (z1,…,zn)′ = Tθ + Ra, T = (1n, x) and x = (x1,…,xn)′; then the likelihood (12) can be written as (14) p(y|z,p,σ) ∝ σ^N exp{−σρ_p(y − Cz)}, where C = diag(1_{w1},…,1_{wn}) and C′C = diag(w1,w2,…,wn). Notice that this reparameterisation reduces the number of parameters in the model.

Utilising (13) and the relationship between z and (θ,a), we can obtain that the prior distribution of z is a partial information normal (PIN) prior (Speckman & Sun, Citation2003) with density (15) π(z|δ) ∝ |A|₊^{1/2} δ^{−(n−2)/2} exp{−z′Az/(2δ)},

where A = R⁻¹ − R⁻¹T(T′R⁻¹T)⁻¹T′R⁻¹ is a positive semi-definite matrix with rank n−2 and |A|₊ is the product of the positive eigenvalues of A. According to Sun et al. (Citation1999), the PIN prior can be interpreted in two parts: a constant prior on the null space of A and a proper multivariate normal prior on the range of A.
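The matrix A can be formed directly from the kernel matrix R and T = (1n, x). The sketch below is our own construction on an arbitrary knot grid; it verifies numerically that A annihilates constant and linear trends, consistent with the constant prior on the null space of A described above.

```python
import numpy as np

def pin_matrix(x):
    """A = R^{-1} - R^{-1} T (T' R^{-1} T)^{-1} T' R^{-1} for knots x."""
    n = len(x)
    lo = np.minimum.outer(x, x)
    hi = np.maximum.outer(x, x)
    R = lo ** 2 * (3.0 * hi - lo) / 6.0    # kernel matrix from Equation (11)
    T = np.column_stack([np.ones(n), x])   # basis of the null space of A
    Rinv = np.linalg.inv(R)
    M = Rinv @ T
    return Rinv - M @ np.linalg.inv(T.T @ M) @ M.T

x = np.linspace(0.1, 1.0, 10)
A = pin_matrix(x)
# A (theta0 + theta1 * x) = 0: the PIN prior is flat over linear trends
resid = np.linalg.norm(A @ (2.0 + 3.0 * x)) / np.linalg.norm(A)
print(resid < 1e-6)   # True, up to rounding
```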

For a fully Bayesian analysis, one common prior for (σ,δ) is the pair of independent conjugate priors (16) π(σ,δ) ∝ σ^{a0−1} exp(−b0σ) · δ^{−(a1+1)} exp(−b1/δ). Prior (16) has been widely used in Bayesian P-splines (Lang & Brezger, Citation2004) and Bayesian hierarchical linear mixed models (Hobert & Casella, Citation1996). Lang and Brezger (Citation2004) suggested the invariant prior (a0 = b0 = 0) on σ, a1 = 1 and a small b1 (such as 10⁻²). However, there are two undesirable issues with introducing this specification in NBQSS. To start with, the posterior estimates may be sensitive to the choice of the hyperparameter b1 (see Section 4). More importantly, the joint posterior could be improper in some important cases. To demonstrate this point, we allow a0, a1 to be real and b0, b1 to be nonnegative so that limiting cases of the Gamma priors, such as the invariant prior, are included. With the likelihood (14) and the priors (15) and (16), the following theorem establishes the corresponding sufficient and necessary conditions for posterior propriety.

Theorem 2.1

Define SSE = y′(I_N − C(C′C)⁻¹C′)y. The posterior of (z,σ,δ|y) is proper if Conditions A, B and C hold.

Condition A. One of the following holds:

  1. b1 > 0 and N − 2 + a0 > 0;

  2. b1 = 0 and a1 < 0.

Condition B. One of the following holds:

  1. SSE + 2b0 > 0 and n − 2 + 2a1 > 0;

  2. SSE + 2b0 = 0 and N − n + a0 < 0.

Condition C. N − 2 + a0 + 2a1 > 0.

By Theorem 2.1, the posterior of (z,σ,δ|y) does not exist when a0 = b0 = 0 and SSE = 0, or equivalently when each xi corresponds to only one observation, since Condition (B2) of Theorem 2.1 is violated.

To overcome the deficiencies of (16), instead of (σ,δ), we can place priors on σ and the smoothing parameter η = 1/(2σδ). To include a slightly more general class of independent priors for (σ,η), we specify (17) π(σ,η) ∝ σ^{−(a+1)} h(η), σ > 0, η > 0, where a ≥ 0 is fixed and h(η) is a general prior for η. The idea of placing the prior on (σ,η) rather than (σ,δ) is motivated by the prior used in Bayesian additive smoothing splines (Sun & Speckman, Citation2008). With the likelihood (14) and the priors (15) and (17), we have the following theorem.

Theorem 2.2

We have the following two cases,

  (a) Under the priors (15) and (17), the joint posterior of (z,σ,η|y) is proper if h(η) is proper and n > 2 + a.

  (b) When SSE > 0, suppose there exist L > 0 and b > 0 such that (18) h(η) ≤ L η^{−(1+b)}, η > 0; then the joint posterior of (z,σ,η|y) is proper if n > 2 + 2b and N > 2 + a + b.

Theorem 2.2 establishes sufficient conditions for the propriety of the joint posterior. Based on Theorem 2.2(a), when we use the invariant prior for σ (a = 0), regardless of whether SSE = 0 or SSE > 0, the joint posterior is proper whenever h(η) is proper and there are at least three knots. When SSE > 0, Theorem 2.2(b) allows an improper prior on η. With small values of a and b, for example a = 0 and b = 0.01, the joint posterior is proper when there are at least three knots and four total observations. For the necessary condition, we have the following theorem.

Theorem 2.3

Necessary condition

When SSE = 0, the necessary condition for the propriety of the joint posterior of (z,σ,η|y) is that n > 2 + a and there exists a positive ϵ such that (19) ∫_0^ϵ η^a h(η) dη + ∫_ϵ^{+∞} h(η) dη < +∞.

By Theorems 2.2(a) and 2.3, when SSE = 0 and the invariant prior is imposed on σ (a = 0), a proper h(η) is both a sufficient and a necessary condition for the propriety of the joint posterior of (z,σ,η|y).

Remark 2.1

When SSE = 0 and a = 0, the condition for the posterior propriety is the same as the case in the Bayesian smoothing spline (Tong et al., Citation2018).

Here, we impose the invariant prior on σ for objectivity. Finally, we need to specify a prior for η to complete the fully Bayesian analysis. In accordance with Theorem 2.2, when the invariant prior is used for σ, we need a proper prior for η to ensure a proper joint posterior. We adopt the scaled Pareto prior for η, (20) h(η) = c/(c+η)², η > 0, c > 0. In fact, (20) has been widely used in Bayesian smoothing splines (Cheng & Speckman, Citation2012; Tong et al., Citation2018) owing to its heavy tail and the straightforward interpretation of the hyperparameter c, which is the median of the distribution. Furthermore, (20) is equivalent to the hierarchical structure (21) η|s ∼ Exp(s), s ∼ Exp(c), where s is a latent variable. This hierarchical structure offers computational benefits through the resulting conjugacy of the full conditional distribution of η. For the hyperparameter c, we select c to be the smoothing parameter η_opt tuned by generalised approximate cross-validation (GACV) (Yuan, Citation2006). A sensitivity test of the hyperparameter c suggests the fitted curves are quite robust to its choice (see Section 4).
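The equivalence between (20) and the hierarchy (21), and the fact that c is the prior median, can be checked by Monte Carlo. A small sketch of ours, with an arbitrary value c = 0.7 (rate parameterisation of the exponentials):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 0.7                                            # hyperparameter: the prior median
s = rng.exponential(scale=1.0 / c, size=500_000)   # s ~ Exp(c), rate c
eta = rng.exponential(scale=1.0 / s)               # eta | s ~ Exp(s), rate s

# marginally eta ~ h(eta) = c / (c + eta)^2, with CDF eta / (c + eta)
print(np.median(eta))        # close to c = 0.7
print(np.mean(eta <= 2.1))   # close to 2.1 / (0.7 + 2.1) = 0.75
```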

2.2. Partially collapsed Gibbs sampling

To facilitate Gibbs sampling involving the ALD, we decompose the ALD as a mixture of an exponential and a scaled normal distribution (Kozumi & Kobayashi, Citation2011). To be specific, if Y is distributed as ALD(μ,σ,p), then (22) Y =_d ξ1ν + ξ2σ^{−1/2}√ν z + μ, where ξ1 = (1−2p)/{p(1−p)}, ξ2² = 2/{p(1−p)}, ν ∼ Exp(σ), z ∼ N(0,1), and ν and z are independent. This representation allows us to express the quantile regression model as a normal regression with exponentially distributed latent variables. Incorporating the latent variables s and ν = (ν_11,…,ν_{nwn})′ through (21) and (22), we can obtain the full conditional distributions of the unknown parameters including the latent variables. We further apply partially collapsed Gibbs sampling (van Dyk & Park, Citation2008), marginalising out the latent variable ν when sampling σ, to achieve a better mixing property. The sampling procedure for the joint distribution of (z,σ,η,s,ν|y) is as follows.

  1. Sample σ from Gamma(N + n/2 − 1, ρ_p(y − Cz) + ηz′Az).

  2. Sample z from N(μ,Σ), where Σ = (Ω + 2σηA)⁻¹, μ = ΣB, B = (B1,…,Bn)′ with Bi = ∑_{j=1}^{wi} (y_ij − ξ1ν_ij)/(ξ2²ν_ij), and Ω = diag(C1,…,Cn) with Ci = ∑_{j=1}^{wi} 1/(ξ2²ν_ij).

  3. Sample η from Gamma(n/2, s + σz′Az).

  4. Sample s from Gamma(2, η + c).

  5. The density of ν_ij is proportional to ν_ij^{−1/2} exp{−(ξ1²/(2ξ2²) + σ)ν_ij − (y_ij − zi)²/(2ξ2²ν_ij)}, i = 1,…,n, j = 1,…,wi. If we set ν*_ij = 1/ν_ij, then ν*_ij follows the inverse Gaussian (IG) distribution IG(√(ξ1² + 2σξ2²)/|y_ij − zi|, ξ1²/ξ2² + 2σ), where a random variable X follows the IG distribution (Jørgensen, Citation1982), IG(ν,λ), if its density is √{λ/(2πx³)} exp{−λ(x−ν)²/(2ν²x)}.
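The mixture representation (22) on which these conditionals rest can be verified by simulation: draws of μ + ξ1ν + ξ2σ^{−1/2}√ν z should have their pth empirical quantile at μ and mean μ + ξ1/σ, matching ALD(μ,σ,p). A sketch of ours with arbitrary values p = 0.3, σ = 2, μ = 1.5:

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma, mu = 0.3, 2.0, 1.5
xi1 = (1.0 - 2.0 * p) / (p * (1.0 - p))
xi2 = np.sqrt(2.0 / (p * (1.0 - p)))

n = 400_000
nu = rng.exponential(scale=1.0 / sigma, size=n)    # nu ~ Exp(sigma), rate sigma
z = rng.standard_normal(n)
y = mu + xi1 * nu + xi2 * np.sqrt(nu / sigma) * z  # Equation (22)

print(np.quantile(y, p))   # close to mu = 1.5: the p-th quantile of ALD(mu, sigma, p)
print(y.mean())            # close to mu + xi1 / sigma, the ALD mean
```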

3. Smoothing spline with unobserved knots

In practice, it is crucial to consider situations where knots are unobserved in several intervals but we are interested in the fitted values at some given knots. Here, unobserved knots refer to knots at which there is no observation. This situation is common. For instance, in bond transaction data, trading data for short maturities are more abundant, owing to better liquidity, than for long maturities, yet we may still need fitted values at key long-maturity knots without observations. Notice that a regular NBQSS can provide the whole fitted curve based on the observed knots and hence an estimate at any given knot, including unobserved ones. In contrast, in this section, we directly incorporate the functional values at the unobserved knots as unknown parameters in the model, so that information can be borrowed from the observed knots. We expect NBQSS with unobserved knots to exhibit less uncertainty than a regular NBQSS and to give better predictions at the unobserved knots. We prespecify the positions and the number of the unobserved knots. Admittedly, the choice of optimal locations or number of added knots remains a challenging problem, but it is beyond the scope of this paper.

Consider {x1,x2,…,xn} as the complete knots. Assume that there are m unobserved knots, say {x̃1,x̃2,…,x̃m}, in {x1,x2,…,xn}, and mark the remaining observed knots as {x1,x2,…,x_{n−m}}. We still assume that there are wi observations at xi, i = 1,…,n−m, with ∑_{i=1}^{n−m} wi = N.

Notice that one nice aspect of the priors associated with smoothing splines is that they extend naturally to f(x) at arbitrary unobserved knots (Nychka, Citation2000). Therefore, we introduce the incidence matrix, denoted C, into the likelihood of (z,σ), whose kernel is (23) σ^N exp{−σρ_p(y − Cz)}, z = (z1,…,z_{n−m},z̃1,…,z̃m)′, where (z1,…,z_{n−m}) are the unknown functional values at the observed knots (x1,…,x_{n−m}) and (z̃1,…,z̃m) correspond to the knots (x̃1,…,x̃m). Here C′C = diag(w1,…,wn), where wi is positive when zi corresponds to an observed knot and zero otherwise, and C indicates the response variables corresponding to zi, i = 1,…,n−m. For an intuitive understanding, we provide an illustration of C. Suppose there are three knots x1 < x̃1 < x2, with 2 observations at x1, 2 observations at x2 and no observation at x̃1. Then
C =
[1 0 0
 1 0 0
 0 0 1
 0 0 1]
and C′C = diag(2,0,2).
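The incidence matrix in the three-knot example can be built mechanically from the per-knot observation counts; a minimal sketch of ours (the helper name is not from the paper):

```python
import numpy as np

def incidence(counts):
    """Incidence matrix C: row r has a single 1 in the column of the knot
    that observation r belongs to, and zeros elsewhere."""
    N, n = sum(counts), len(counts)
    C = np.zeros((N, n))
    row = 0
    for i, w in enumerate(counts):
        C[row:row + w, i] = 1.0
        row += w
    return C

C = incidence([2, 0, 2])   # 2 obs at x1, none at the unobserved knot, 2 at x2
print(C.T @ C)             # diag(2, 0, 2), as in the illustration above
```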

The following theorem establishes the posterior propriety for NBQSS with unobserved knots.

Theorem 3.1

Under the likelihood (23) and the PIN prior (15) on z, the conditional posterior of (z|σ,δ,y) or (z|σ,η,y), after integrating out (z̃1,…,z̃m), has the same structure as in the case with no additional knots.

By Theorem 3.1, it follows immediately that, if we specify the prior for (σ,δ) or (σ,η), the conditions for propriety of the joint posterior of all the parameters are the same as for the regular NBQSS.

4. Numerical analysis

4.1. Sensitivity analyses of the hyperparameters

In this section, we perform sensitivity analyses in NBQSS with respect to its two candidate prior specifications: the Gamma prior (16) for δ and the scaled Pareto prior (20) for η. Assume the data are generated from (24) yi = f(xi) + ϵi, ϵi i.i.d. N(0, 0.06²), with (25) f(x) = {1+e^{4(x−0.3)}}⁻¹ + {1+e^{3(x−0.2)}}⁻¹ + {1+e^{4(x−0.7)}}⁻¹ + {1+e^{5(x−0.8)}}⁻¹. We equally divide x ∈ [−2,2] into 50 pieces and take the knots to be {x1,…,x50}. For the knots {x1,…,x24,x26,…,x50}, we generate one observation each. For the knot x25, we generate 2 observations to ensure the existence of the posterior of (z,σ,δ|y). The function (25) is also utilised by Jullion and Lambert (Citation2007) to check the sensitivity of the hyperparameters of the Gamma prior in Bayesian P-splines. In Figure 1, we find that the fitted curves fluctuate considerably with the choice of the hyperparameter b1 in the Gamma prior (16), while they are quite robust to the choice of c in the scaled Pareto prior (20). Additionally, we compare the estimation performance of the Gamma prior for δ with that of the scaled Pareto prior for η by the integrated squared error (ISE), defined as (26) ISE = ∫_{−2}^{2} (f̂(x) − f(x))² dx. A smaller ISE indicates a better estimate of f(x). The results are summarised in Table 1, from which we find that the scaled Pareto prior for η outperforms the Gamma prior for δ, especially when the hyperparameter is chosen to be small.
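The ISE in (26) is straightforward to approximate on a grid; a small sketch of ours with a toy f̂ (a constant vertical shift of f, so the exact ISE is 0.05² × 4 = 0.01):

```python
import numpy as np

def ise(fhat, f, a=-2.0, b=2.0, m=20001):
    """Integrated squared error, Equation (26), by the trapezoidal rule on [a, b]."""
    x = np.linspace(a, b, m)
    return np.trapz((fhat(x) - f(x)) ** 2, x)

print(ise(lambda x: np.sin(x) + 0.05, np.sin))   # 0.05^2 * 4 = 0.01
```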

Figure 1. The curves are fitted when p = 0.5 and the true curves are the solid lines. Graph (a): Gamma prior, Ga(a1,b1), for δ with a1=1 and b1=0.01 (dash-dotted), b1=0.1 (dotted), b1=0.5 (dashed); Graph (b): Scaled Pareto prior for η with c = 0.05 (dash-dotted), c = 0.1 (dashed) and c = 0.5 (dotted).


Table 1. ISEs for Gamma prior on δ and scaled Pareto prior on η.

4.2. Comparison studies

Our comparison studies are conducted from two perspectives. First, we evaluate the performance of our method by comparing it with four other popular competing methods. Second, we investigate how the proposed method works when there are unobserved knots. We generate the data by (27) yi = f(xi) + a_p + ϵi, i = 1,…,n, where f(·) is a prespecified function and a_p is a tuning constant that ensures the pth quantile of a_p + ϵi is zero. We consider the following three distributions for the error term:

  • Normal distribution, N(0,0.12);

  • ALD(0,0.05,p), where p is the chosen quantile;

  • Cauchy distribution with the location parameter 0 and the scale parameter 0.013, C(0,0.013).

Given a quantile p and one of the error distributions above, we simulate 200 datasets. We split each dataset into a training dataset and a validation dataset. For the training dataset, we equally divide (0,1) into 20 intervals and simulate 5 observations at each knot. For the validation dataset, we equally divide (0,1) into 30 intervals and generate one observation at each knot. The validation dataset is employed to obtain the optimal regularisation parameter. We compare the different procedures by the median integrated absolute error (MIAE) and the median integrated squared error (MISE), MIAE = median ∫_0^1 |f̂(x) − f(x)| dx and MISE = median ∫_0^1 (f̂(x) − f(x))² dx, where the median is taken over the 200 simulated datasets.
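MIAE and MISE can be computed by applying a quadrature rule to each replicated fit and taking medians. A sketch of ours with three hypothetical "fits" that are constant shifts of the truth, so the integrated errors are known exactly:

```python
import numpy as np

def miae_mise(fits, f, m=2001):
    """Median integrated absolute / squared error over replicated fits on (0, 1)."""
    x = np.linspace(0.0, 1.0, m)
    iae = [np.trapz(np.abs(fh(x) - f(x)), x) for fh in fits]
    ise = [np.trapz((fh(x) - f(x)) ** 2, x) for fh in fits]
    return np.median(iae), np.median(ise)

truth = lambda x: np.sin(np.pi * x)
fits = [lambda x, d=d: truth(x) + d for d in (-0.2, 0.1, 0.3)]
print(miae_mise(fits, truth))   # medians of |d| and d^2: close to (0.2, 0.04)
```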

4.2.1. Comparison with candidate approaches

We use Monte Carlo simulation to compare our method with the following approaches:

  1. New Bayesian quantile smoothing spline (NBQSS);

  2. Original Bayesian quantile smoothing spline (OBQSS) (Thompson et al., Citation2010);

  3. Quantile smoothing spline (QSS) (Koenker et al., Citation1994);

  4. Bayesian smoothing spline (BSS) (Tong et al., Citation2018);

  5. Smoothing spline (SS) (Wahba, Citation1990).

We generate the data by (27) with the following two underlying curves: (28) Curve I: f(x) = 2 + 0.25 sin(0.25πx) + 0.25 ln(x), 0 < x < 1; (29) Curve II: f(x) = 2 + 0.2 sin(5πx) + 0.25 ln(x), 0 < x < 1. Curve I is monotone while Curve II is somewhat wiggly. For the smoothing parameter η in NBQSS, we adopt the scaled Pareto prior (20), with the hyperparameter chosen by GACV. For the smoothing parameter η in OBQSS, a Gamma(α,β) prior is utilised and its hyperparameters are selected in accordance with the suggestion of Thompson et al. (Citation2010). Specifically, β = 0.1/GCV(mean spline) and α = GCV(mean spline)/β, where GCV(mean spline) is the value of η chosen by generalised cross-validation (Craven & Wahba, Citation1978). The numerical results for Curves I and II are listed in Tables 2 and 3, respectively.

Table 2. MIAEs and MISEs for Curve I with p = 0.1, 0.3 and 0.5.

Table 3. MIAEs and MISEs for Curve II with p = 0.1, 0.3 and 0.5.
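For normal errors, the shift a_p in (27) is simply minus the pth normal quantile. A hedged sketch of the training-data generation for Curve I (the grid choices, seed and helper names are our own assumptions for illustration):

```python
import numpy as np
from statistics import NormalDist

def curve_I(x):
    """Curve I from Equation (28)."""
    return 2.0 + 0.25 * np.sin(0.25 * np.pi * x) + 0.25 * np.log(x)

def simulate_curve_I(p, n_knots=20, reps=5, seed=0):
    """y = f(x) + a_p + eps with N(0, 0.1^2) errors, as in Equation (27);
    a_p makes the p-th quantile of a_p + eps equal to zero."""
    rng = np.random.default_rng(seed)
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]   # interior knots in (0, 1)
    x = np.repeat(knots, reps)
    a_p = -NormalDist(0.0, 0.1).inv_cdf(p)             # shift for normal errors
    return x, curve_I(x) + a_p + rng.normal(0.0, 0.1, size=x.size)
```

With many replications, the empirical pth quantile of y − f(x) should sit near zero, which is the point of the shift a_p.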

From Tables 2 and 3, NBQSS uniformly outperforms OBQSS, which indicates that our method is an effective improvement over the original method. Among the three quantile methods (NBQSS, OBQSS and QSS), NBQSS performs best, with the lowest MIAE and MISE in 8 out of 9 comparisons. However, when the errors are normally distributed, the two mean regression methods (BSS and SS) perform better than the three quantile regression methods, with lower MIAE and MISE. In contrast, when the errors follow the ALD or the Cauchy distribution, the two quantile approaches (NBQSS and QSS) perform better at all the selected quantiles. In addition, OBQSS has the worst performance among the five candidate methods no matter which quantiles or random errors are used. Also, OBQSS tends to perform worse as p moves away from 0.5, which echoes the findings of Santos and Bolfarine (Citation2016). Although a similar trend appears in all quantile methods, our method still shows considerable advantages over the other candidate models.

We also demonstrate the behaviour of the fitted curves for Curves I and II with ALD errors and p = 0.3. In the 200 simulation runs, we can obtain the fitted value at any point x ∈ (0,1). We treat the median over the 200 runs at each point x as the estimate and draw the fitted curves in Figures 2 and 3. We also plot the pointwise empirical 2.5% and 97.5% quantiles of the 200 estimates to display the variability of the estimation.

Figure 2. The curves are fitted for Curve I under p = 0.3 and ALD error. Graph (a) NBQSS method; Graph (b) OBQSS method; Graph (c) QSS method; Graph (d) BSS method; Graph (e) SS method. The solid lines are the true curve and the fitted curve. Two dashed lines are 2.5% and 97.5% pointwise empirical quantiles.


Figure 3. The curves are fitted for Curve II under p = 0.3 and ALD error. Graph (a) NBQSS method; Graph (b) OBQSS method; Graph (c) QSS method; Graph (d) BSS method; Graph (e) SS method. The solid lines are the true curve and the fitted curve. Two dashed lines are 2.5% and 97.5% pointwise empirical quantiles.


From Figures 2 and 3, we find that OBQSS tends to give a relatively flat curve, which fails to capture the true trend, especially when the curve is wiggly. Based on our experience, OBQSS chooses large values of η in this case, which results in a flat fitted curve. For example, for Curve I with p = 0.3 and ALD-distributed random errors, the posterior median of η in NBQSS is 0.004 while in OBQSS it is 1.135. This phenomenon requires further investigation.

4.2.2. Comparison of NBQSS with and without unobserved knots

We investigate how NBQSS performs with unobserved knots. We generate the data by (27) with the underlying curve (30) f(x) = 0.8 sin(πx), 0 < x < 1. We equally divide (0,1) into 20 intervals and regard the resulting knots as the complete knots. Two representative unobserved-knot mechanisms are considered:

  • Mechanism I:  α% of the complete knots are unobserved in the middle.

  • Mechanism II:  two knots around 0.5 have observations and (α/2)% of the complete knots are unobserved in each of two intervals, (0,0.47) and (0.52,1).

We take α% = 30%, 50% and 70% to represent low, medium and high unobserved proportions. For each observed knot, we randomly generate 5 observations. We illustrate the two mechanisms in Figure 4. We implement the regular NBQSS (without unobserved knots) and NBQSS with unobserved knots (we refer to it as NBQSSK, where 'K' stands for knot). For Mechanism I, we add 6 uniformly spaced knots in the interval without knots. For Mechanism II, we add 3 uniformly spaced knots in each of the two intervals without knots, hence 6 unobserved knots in total. We consider p = 0.1, 0.5 and 0.9. MIAEs and MISEs are recorded in Tables 4 and 5.
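The two mechanisms can be sketched as boolean masks over the 20 complete knots. This is our own illustration: the exact knot grid and which flanking knots are dropped under Mechanism II are assumptions, not taken from the paper.

```python
import numpy as np

def mechanism_I(knots, alpha):
    """Mechanism I: a contiguous middle block of alpha% of the knots is unobserved."""
    n = len(knots)
    m = int(round(alpha * n))
    mask = np.ones(n, dtype=bool)       # True = observed
    start = (n - m) // 2
    mask[start:start + m] = False
    return mask

def mechanism_II(knots, alpha):
    """Mechanism II: the knots nearest 0.5 stay observed; (alpha/2)% of the knots
    are unobserved in each of (0, 0.47) and (0.52, 1)."""
    n = len(knots)
    half = int(round(alpha * n / 2.0))
    mask = np.ones(n, dtype=bool)
    left = np.where(knots < 0.47)[0]
    right = np.where(knots > 0.52)[0]
    mask[left[-half:]] = False          # drop the knots closest to the centre
    mask[right[:half]] = False
    return mask

knots = np.linspace(0.0, 1.0, 22)[1:-1]   # 20 complete knots in (0, 1)
print(mechanism_I(knots, 0.30).sum())     # 14 knots remain observed
```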

Figure 4. One illustration of Mechanisms I and II. Graphs (a)–(c) correspond to Mechanism I with α% = 30%, 50% and 70%. Graphs (d)–(f) correspond to Mechanism II with α% = 30%, 50% and 70%.


Table 4. MIAEs and MISEs for Mechanism I with quantiles p = 0.1, 0.5, 0.9 and unobserved proportions α%=30%,50%,70%.

Table 5. MIAEs and MISEs for Mechanism II with quantiles p = 0.1, 0.5, 0.9 and unobserved proportions α%=30%,50%,70%.

From Tables 4 and 5, we find two similarities between NBQSS and NBQSSK. Firstly, as α increases, the prediction errors of both methods increase. Secondly, both methods perform best at p = 0.5 compared with the other selected quantiles. In addition, based on Tables 4 and 5, Table 6 summarises, for brevity, the percentage of cases in which NBQSSK outperforms NBQSS across the different combinations of unobserved proportions and error distributions. From Table 6, we find that NBQSSK beats NBQSS in all the scenarios, especially for large α, which substantiates that NBQSSK provides better predictions than the regular NBQSS.

Table 6. Summary of cases where NBQSSK outperforms NBQSS.

Furthermore, to offer a more intuitive comparison of NBQSS and NBQSSK, we present the fitted curves and their empirical 95% credible intervals with quantile p = 0.1 and ALD random errors for Mechanisms I and II, setting the unobserved proportion α% = 50% and 70%. From Figure 5, we have several main findings. Firstly, NBQSSK performs similarly to NBQSS in the intervals with observed knots. Secondly, both methods provide wider empirical credible intervals for Mechanism I than for Mechanism II, which implies that the uncertainty surges when there is no guiding information in the middle. Thirdly, for Mechanism I, the empirical 95% credible intervals of NBQSSK are shorter than those of NBQSS, especially in the middle interval. This reveals that NBQSSK can provide a more efficient uncertainty evaluation at the unobserved knots; however, the advantage of NBQSSK is less obvious for Mechanism II. Besides, for Mechanism I, NBQSSK is more advantageous over NBQSS when the unobserved proportion is α% = 70% than when it is α% = 50%, which echoes our findings in Tables 4 and 5. Finally, for either mechanism, α% = 70% yields wider empirical 95% credible intervals than α% = 50%.

Figure 5. The solid lines are the true curve and fitted curves. The curves are fitted with p = 0.1, ALD random error. Dashed lines are empirical 95% credible interval corresponding to NBQSSK and dotted lines are empirical 95% credible interval corresponding to NBQSS. Graph (a): Mechanism I, α%=50%; Graph (b): corresponds to Mechanism II, α%=50%; Graph (c): Mechanism I, α%=70%; Graph (d): Mechanism II, α%=70%.

Figure 5. The solid lines are the true curve and fitted curves. The curves are fitted with p = 0.1, ALD random error. Dashed lines are empirical 95% credible interval corresponding to NBQSSK and dotted lines are empirical 95% credible interval corresponding to NBQSS. Graph (a): Mechanism I, α%=50%; Graph (b): corresponds to Mechanism II, α%=50%; Graph (c): Mechanism I, α%=70%; Graph (d): Mechanism II, α%=70%.

Remark 4.1

Notice that the above simulation studies are based on 5 observations at each observed knot. We have also conducted simulation studies with unequal numbers of observations at each observed knot; the main findings are similar to those reported in the tables above. This implies that the relative performance of NBQSSK and NBQSS is irrespective of the number of observations at each observed knot.

4.3. Real dataset

4.3.1. Motorcycle dataset

We analysed the well-known data set utilised by Silverman (1985) to demonstrate nonparametric regression curve fitting. It has frequently been used to motivate and demonstrate spline-based methodology, since the underlying curve makes polynomial modelling inappropriate. The data contain 113 observations of accelerometer readings taken through time in an experiment on the efficacy of crash helmets. For NBQSS, we fit the quantile curves with p = 0.1, 0.5 and 0.9. To assess their performance, we also fit the median curve with OBQSS and the mean curves with BSS and SS. Figure 6 shows the fitted curves of these methods. In Graph (a), with p = 0.5, NBQSS captures the general trend; with more quantiles, NBQSS offers an overview of the distribution. Interestingly, in Graph (b), the curve fitted by OBQSS oscillates considerably and tends to overfit; similar results hold for OBQSS with p = 0.1 and 0.9 (not shown here). The estimated smoothing parameter is \(8.167\times10^{-6}\) for OBQSS, compared with 0.068 for NBQSS. To compare with the two mean curve fitting methods, we record the median of absolute deviations (MAD), defined as
\[
\mathrm{MAD}=\operatorname{median}_{1\le i\le 113}\,\lvert y_i-\hat y_i\rvert,\tag{31}
\]
for each method. The MAD for NBQSS with p = 0.5, BSS and SS is 8.755, 12.979 and 13.028, respectively. This indicates that the median curve fitted by NBQSS is relatively robust to points with big deviations compared with the mean curve fitting methods.
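For concreteness, the MAD criterion in Equation (31) can be computed in a few lines of Python; this is our own illustrative sketch (function and variable names are ours, not from the paper):

```python
import statistics

def mad(y, y_hat):
    """Median of the absolute deviations |y_i - yhat_i|, as in Equation (31)."""
    return statistics.median(abs(yi - fi) for yi, fi in zip(y, y_hat))
```

Because the median ignores the size of extreme residuals, a handful of large deviations barely moves the MAD, which is why it is a natural criterion for comparing robust quantile fits with mean fits.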

Figure 6. Graph (a): NBQSS method with p = 0.1, 0.5 and 0.9; Graph (b): OBQSS method with p = 0.5; Graph (c): BSS method; Graph (d): SS method.


4.3.2. China bond medium term note

The term structure of interest rates is the series of interest rates ordered by time to maturity at a given time. It is a fundamental concept in economic and financial theory, underlying fixed-income securities analysis, derivative pricing, hedging operations, etc.; the term structure of interest rates is also known as a yield curve. In financial markets only a limited number of bonds are traded, so an interpolation method is necessary to estimate the yield curve across the whole maturity spectrum. Tong et al. (2018) employed BSS to fit the China bond yield curve, and their simulation studies show that BSS outperforms traditional yield curve fitting methods such as the Nelson–Siegel model (Nelson & Siegel, 1987) and the Svensson extension model (Svensson, 1994). However, when fitting the yield curve for the China Bond Medium Term Note (CBMTN), a type of bond issued by corporations to raise capital, there always exist underlying outliers disturbing the signal. As a comparison with BSS, we employ NBQSS to obtain the quantile curves for CBMTN. The data set was downloaded from Wind (https://www.wind.com.cn/). This analysis focuses on the yield curve of CBMTN with rating 'CAAA' on 25 September 2018, the highest rating of this bond, associated with the lowest yield but also the lowest risk. There are 282 transaction records for CBMTN, and some obviously abnormal transactions exist in the data. Figure 7 shows the NBQSS curves for p = 0.1, 0.5 and 0.9 along with the BSS curve; the transaction data are marked as grey points. We find that the BSS curve is lifted up overall by the outliers compared with NBQSS using p = 0.5, especially when the time to maturity lies in [0,5]. The results imply that the quantile curves fitted by NBQSS are robust to possible outliers while BSS is not. In addition, we record the MAD for NBQSS with p = 0.5 and for BSS: 0.451 and 0.790, respectively, which indicates that NBQSS provides more accurate estimation than BSS.

Figure 7. BSS curve (solid) and NBQSS with p=0.1 (dashed), 0.5 (dotted) and 0.9 (dotted-dash) curves on 25 September 2018.


5. Comments

In this paper, we numerically demonstrate the issue associated with a fixed σ in OBQSS. As a solution, we systematically investigate NBQSS with a random scale parameter. We establish conditions for the posterior propriety of NBQSS under two common prior options: conjugate priors for (σ,δ) and a general prior of the form \(\sigma^{-(a+1)}h(\eta)\) on (σ,η). These conditions are easy for practitioners to verify and hence serve as a guide for specifying priors. We recommend imposing the prior on (σ,η) rather than (σ,δ), both for an ensured proper joint posterior and for a relatively robust hyperparameter specification. In practice, researchers often face unobserved knots when dealing with curve fitting; therefore, we extend our theoretical results to NBQSS with unobserved knots. Finally, our simulation studies imply that NBQSS performs best among the candidate quantile methods and that NBQSS with unobserved knots performs better than regular NBQSS in most cases. As with any simulation study, these cannot cover all possibilities, but we believe that they are sufficiently wide ranging to provide useful insights into the comparative performance of the different procedures.

Several unsolved issues in our work are still worth investigating. For example, as shown in the simulation studies, all quantile methods perform relatively worse near the boundary, such as at p = 0.1, and more work is encouraged to improve their boundary behaviour under the Bayesian framework. Also, when there are unobserved knots, how to choose the positions and the number of added knots to obtain better fitted curves is of interest for NBQSS. Furthermore, we do not consider the crossing problem for different quantiles in this article. Without special restrictions, quantile regression functions estimated at different orders can cross each other, which violates the laws of probability. Therefore, another direction is to develop theoretical results for the non-crossing Bayesian regularised regression quantile (Liu & Wu, 2011).

Acknowledgements

The authors sincerely thank two anonymous reviewers for the thoughtful and constructive suggestions they provided that led to considerable improvements in the presentation of our work.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The project was supported by the National Natural Science Foundation of China [Grant Number 11671146].

Notes on contributors

Zhongheng Cai

Zhongheng Cai is a PhD candidate in the School of Statistics, East China Normal University, Shanghai, China. His research interests include Bayesian statistics.

Dongchu Sun

Dr. Dongchu Sun received his PhD in 1991 from the Department of Statistics, Purdue University, under the guidance of Professor James O. Berger.


Appendices

Appendix 1. Proof of Theorem 2.1

Proof.

With the likelihood (14) and priors (15) and (16), the posterior of \((z,\sigma,\delta\mid y)\) is proportional to
\[
\sigma^{N+a_0-1}\exp\Big\{-\sigma\rho_p(y-Cz)-\frac{1}{2\delta}z'Az\Big\}\exp(-b_0\sigma)\,\frac{1}{\delta^{a_1+\frac{n-2}{2}+1}}\exp\Big(-\frac{b_1}{\delta}\Big).\tag{A1}
\]
Integrating out \(\sigma\), (A1) is proportional to
\[
\frac{1}{\big[\rho_p(y-Cz)+b_0\big]^{N+a_0}}\cdot\frac{1}{\delta^{\frac{n-2}{2}+a_1+1}}\exp\Big\{-\frac{1}{2\delta}z'Az-\frac{b_1}{\delta}\Big\}.\tag{A2}
\]
We only prove the sufficiency; the necessity can be proved in a similar way.

Since \(\min\{p,1-p\}\,|z|_1\le\rho_p(z)\le\max\{p,1-p\}\,|z|_1\) and \(\|z\|_2\le|z|_1\le\sqrt{n}\,\|z\|_2\) for all \(z\in\mathbb{R}^n\), there exists a positive number \(M_1\) such that
\[
\rho_p(y-Cz)+b_0\ \ge\ M_1\sqrt{(y-Cz)'(y-Cz)}+b_0.\tag{A3}
\]
Moreover, there exists \(M_2>0\) such that
\[
\text{the right-hand side of (A3)}\ \ge\ M_2\sqrt{M_1(y-Cz)'(y-Cz)+b_0}.\tag{A4}
\]
Hence, we just need to consider
\[
\frac{1}{\big[M_1(y-Cz)'(y-Cz)+b_0\big]^{(N+a_0)/2}}\cdot\frac{1}{\delta^{\frac{n-2}{2}+a_1+1}}\exp\Big\{-\frac{1}{2\delta}z'Az-\frac{b_1}{\delta}\Big\}.\tag{A5}
\]
We consider
\[
\sigma_1^{\frac{N+a_0}{2}-1}\,\frac{1}{\delta^{\frac{n-2}{2}}}\exp\Big\{-M_1\sigma_1(y-Cz)'(y-Cz)-b_0\sigma_1-\frac{1}{2\delta}z'Az\Big\}\,\frac{1}{\delta^{a_1+1}}\exp\Big(-\frac{b_1}{\delta}\Big),\tag{A6}
\]
since its integral with respect to \(\sigma_1\) is proportional to (A5). Now set \(y_1=\sqrt{2M_1}\,y\), \(z_1=\sqrt{2M_1}\,z\) and \(A_1=\frac{1}{2M_1}A\); then (A6) becomes
\[
\sigma_1^{\frac{N+a_0}{2}-1}\,\frac{1}{\delta^{\frac{n-2}{2}}}\exp\Big\{-\frac{\sigma_1}{2}(y_1-Cz_1)'(y_1-Cz_1)-b_0\sigma_1-\frac{1}{2\delta}z_1'A_1z_1\Big\}\,\frac{1}{\delta^{a_1+1}}\exp\Big(-\frac{b_1}{\delta}\Big),
\]
which agrees with the normal-error case in Speckman and Sun (2003), and the result follows.
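The norm inequalities used at the start of this proof are easy to check numerically. The following sketch (our own illustration, not code from the paper) verifies \(\min\{p,1-p\}|z|_1\le\rho_p(z)\le\max\{p,1-p\}|z|_1\) and \(\|z\|_2\le|z|_1\le\sqrt n\,\|z\|_2\) on a few test vectors:

```python
import math

def rho_p(z, p):
    """Check-function loss of Equation (2), summed over the entries of z."""
    return sum(p * v if v > 0 else (p - 1) * v for v in z)

def bounds_hold(z, p, tol=1e-12):
    """Verify the two norm-equivalence inequalities used in the proof."""
    l1 = sum(abs(v) for v in z)
    l2 = math.sqrt(sum(v * v for v in z))
    rp = rho_p(z, p)
    check_rho = min(p, 1 - p) * l1 <= rp + tol and rp <= max(p, 1 - p) * l1 + tol
    check_norm = l2 <= l1 + tol and l1 <= math.sqrt(len(z)) * l2 + tol
    return check_rho and check_norm
```

These equivalences are exactly what allow the check-function likelihood to be sandwiched between multiples of a Gaussian-type exponent.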

Appendix 2. Proof of Theorem 2.2

For (a), the following lemma is needed,

Lemma A.1

Assume \(Z\sim N(0,\sigma^2)\). For any \(t>0\) and \(\mu\in\mathbb{R}\),

  1. \[
\sqrt{\frac{2}{\pi}}\,\frac{\sigma t}{1+\sigma^2t^2}\exp\Big(-\frac{\mu^2}{2\sigma^2}\Big)\ \le\ \mathrm{E}\exp(-t|Z-\mu|)\ \le\ \min\Big\{1,\ \sqrt{\frac{2}{e\pi}}\,\frac{1}{t|\mu|},\ \frac{2}{1+\sqrt{\frac{\pi e}{2}\mu^2t^2}}\Big\},\tag{A7}
\]

  2. \[
\mathrm{E}\exp(-t|Z-\mu|)\ \le\ \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma t}\exp\Big(-\frac{\mu^2}{4\sigma^2}\Big)+\exp\Big(-\frac{|\mu|t}{2}\Big).\tag{A8}
\]

Proof.

For (1), the third inequality is trivial; we focus on the other two. For the first one, we have
\begin{align*}
\mathrm{E}\big(\exp(-t|Z-\mu|)\big)&=\int_{-\infty}^{+\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-t|z-\mu|-\frac{z^2}{2\sigma^2}\Big)\,dz
=\int_{-\infty}^{+\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-t|z|-\frac{(z+\mu)^2}{2\sigma^2}\Big)\,dz\\
&=\exp\Big(-\frac{\mu^2}{2\sigma^2}\Big)\int_{-\infty}^{+\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-t|z|-\frac{z^2}{2\sigma^2}-\frac{z\mu}{\sigma^2}\Big)\,dz\\
&=\exp\Big(-\frac{\mu^2}{2\sigma^2}\Big)\int_{0}^{+\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-tz-\frac{z^2}{2\sigma^2}\Big)\Big\{\exp\Big(-\frac{z\mu}{\sigma^2}\Big)+\exp\Big(\frac{z\mu}{\sigma^2}\Big)\Big\}\,dz.
\end{align*}
By the arithmetic–geometric mean inequality, \(\exp(z\mu/\sigma^2)+\exp(-z\mu/\sigma^2)\ge2\). Hence
\[
\mathrm{E}\exp(-t|Z-\mu|)\ \ge\ \exp\Big(-\frac{\mu^2}{2\sigma^2}\Big)\,\mathrm{E}\exp(-t|Z|).
\]
We can obtain
\[
\mathrm{E}\big(\exp(-t|Z|)\big)=\sqrt{\frac{2}{\pi}}\,\frac{1-\Phi(t\sigma)}{\phi(t\sigma)},
\]
where \(\Phi\) and \(\phi\) are the standard normal c.d.f. and p.d.f., respectively. By the inequality (Gordon, 1941)
\[
\frac{t}{1+t^2}\ \le\ \frac{1-\Phi(t)}{\phi(t)}\ \le\ \frac{1}{t},\qquad t>0,
\]
the first inequality holds. For the second inequality, we need the hierarchical (scale-mixture) structure of the Laplace distribution. Andrews and Mallows (1974) showed that
\[
\frac{t}{2}\exp(-t|Z-\mu|)=\int_0^{+\infty}\frac{1}{\sqrt{2\pi}\,y}\exp\Big(-\frac{(Z-\mu)^2}{2y^2}\Big)\,t^2y\exp\Big(-\frac{y^2t^2}{2}\Big)\,dy.\tag{A9}
\]
Hence, by Fubini's theorem,
\[
\frac{t}{2}\mathrm{E}\big(\exp(-t|Z-\mu|)\big)=\int_0^{+\infty}\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{1}{\sqrt{2\pi}\,y}\exp\Big(-\frac{(z-\mu)^2}{2y^2}-\frac{z^2}{2\sigma^2}\Big)\,dz\;t^2y\exp\Big(-\frac{y^2t^2}{2}\Big)\,dy.\tag{A10}
\]
Since \(\mu\mid z,y\sim N(z,y^2)\) and \(z\mid\sigma^2\sim N(0,\sigma^2)\) in this hierarchical reading, the marginal distribution of \(\mu\mid y,\sigma^2\) is \(N(0,\sigma^2+y^2)\), so the inner integral equals
\[
I(\mu,y)=\frac{1}{\sqrt{2\pi(\sigma^2+y^2)}}\exp\Big(-\frac{\mu^2}{2(\sigma^2+y^2)}\Big).\tag{A11}
\]
Since the function \(f(w)=w^{-1/2}\exp(-\mu^2/(2w))\), \(w=\sigma^2+y^2\), is unimodal with maximum value \(|\mu|^{-1}\exp(-1/2)\), we have
\[
I(\mu,y)\le\frac{1}{\sqrt{2\pi e}\,|\mu|}.
\]
Hence
\[
\mathrm{E}\big(\exp(-t|Z-\mu|)\big)\le\sqrt{\frac{2}{\pi e}}\,\frac{1}{|\mu|t}.
\]
Clearly \(\mathrm{E}(\exp(-t|Z-\mu|))\le1\), so the second inequality holds.
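As a numerical sanity check on (A7), the sketch below (our own illustration, not code from the paper) evaluates \(\mathrm{E}\exp(-t|Z-\mu|)\) by trapezoidal quadrature and compares it with the stated lower bound and with the upper bound \(\sqrt{2/(e\pi)}/(t|\mu|)\):

```python
import math

def expect_exp_neg_abs(t, mu, sigma, lo=-25.0, hi=25.0, steps=100_000):
    """E[exp(-t|Z - mu|)] for Z ~ N(0, sigma^2), by trapezoidal quadrature."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-z * z / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += w * math.exp(-t * abs(z - mu)) * density
    return total * h

def lower_bound(t, mu, sigma):
    """Left-hand side of (A7)."""
    return (math.sqrt(2 / math.pi) * sigma * t / (1 + sigma ** 2 * t ** 2)
            * math.exp(-mu ** 2 / (2 * sigma ** 2)))

def upper_bound(t, mu):
    """The first two terms of the minimum on the right-hand side of (A7)."""
    return min(1.0, math.sqrt(2 / (math.e * math.pi)) / (t * abs(mu)))
```

The bounds are not tight in general, but the sandwich is exactly what the propriety arguments below need: a bound in \(\sigma\) and \(\eta\) rather than an exact value of the expectation.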

For (2), since
\[
\frac{t}{2}\mathrm{E}\big(\exp(-t|Z-\mu|)\big)=\int_0^{+\infty}\frac{1}{\sqrt{2\pi(\sigma^2+y^2)}}\,t^2y\exp\Big(-\frac{\mu^2}{2\sigma^2+2y^2}-\frac{y^2t^2}{2}\Big)\,dy,\tag{A12}
\]
set \(x=y/\sigma\), \(\mu'=\mu/\sigma\) and \(t'=\sigma t\). Then
\begin{align*}
\text{(A12)}&=\frac{1}{\sqrt{2\pi}}\,\frac{t'^2}{\sigma}\int_0^{+\infty}\frac{x}{\sqrt{1+x^2}}\exp\Big(-\frac{\mu'^2}{2+2x^2}-\frac{t'^2x^2}{2}\Big)\,dx\\
&\le\frac{1}{\sqrt{2\pi}}\,\frac{t'^2}{\sigma}\Big\{\int_0^{1}x\exp\Big(-\frac{\mu'^2}{4}-\frac{t'^2x^2}{2}\Big)\,dx+\int_1^{+\infty}\exp\Big(-\frac{\mu'^2}{4x^2}-\frac{t'^2x^2}{2}\Big)\,dx\Big\}.\tag{A13}
\end{align*}
Since the identity
\[
\int_0^{+\infty}f\Big(\Big(ax-\frac{b}{x}\Big)^2\Big)\,dx=\frac{1}{a}\int_0^{+\infty}f(x^2)\,dx
\]
holds for positive \(a\) and \(b\), we have
\[
\text{(A13)}\le\frac{1}{\sqrt{2\pi}}\,\frac{1}{\sigma}\exp\Big(-\frac{\mu'^2}{4}\Big)+\frac{t'}{2\sigma}\exp\Big(-\frac{|\mu'|t'}{2}\Big)
=\frac{1}{\sqrt{2\pi}}\,\frac{1}{\sigma}\exp\Big(-\frac{\mu^2}{4\sigma^2}\Big)+\frac{t}{2}\exp\Big(-\frac{|\mu|t}{2}\Big).\tag{A14}
\]
This implies the result.
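The Andrews–Mallows representation (A9) can also be verified numerically. Since \((1/(\sqrt{2\pi}\,y))\,t^2y=t^2/\sqrt{2\pi}\), the identity reads \((t/2)e^{-t|x|}=\int_0^{\infty}(t^2/\sqrt{2\pi})\exp(-x^2/(2y^2)-y^2t^2/2)\,dy\); the sketch below (our own illustration) checks it by quadrature:

```python
import math

def mixture_rhs(t, x, hi=50.0, steps=200_000):
    """Right-hand side of the Andrews-Mallows identity, by quadrature on (0, hi]."""
    h = hi / steps
    total = 0.0
    for k in range(1, steps + 1):
        y = k * h
        w = 0.5 if k == steps else 1.0
        total += w * math.exp(-x * x / (2 * y * y) - y * y * t * t / 2)
    return total * h * t * t / math.sqrt(2 * math.pi)

def laplace_lhs(t, x):
    """Left-hand side (t/2) * exp(-t|x|)."""
    return (t / 2) * math.exp(-t * abs(x))
```

This scale-mixture-of-normals structure is what turns the asymmetric Laplace likelihood into Gaussian integrals, and it also underlies the partially collapsed Gibbs sampler used in the paper.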

Assume SSE = 0, and first consider \(a>0\). The joint posterior of \((z,\sigma,\eta\mid y)\) is proportional to
\[
\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\big\{-\sigma\rho_p(y-Cz)-\sigma\eta z'Az\big\}\,h(\eta).\tag{A15}
\]
By the spectral decomposition,
\[
A=P\Lambda P',\tag{A16}
\]
where \(P\) is an orthogonal matrix and \(\Lambda=\mathrm{diag}(0,0,\eta_3,\ldots,\eta_n)\). Let \(D_p=\min(p,1-p)\). Then
\[
\text{(A15)}\le\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\big\{-D_p\sigma\|y-z\|_2-\sigma\eta z'Az\big\}\,h(\eta).\tag{A17}
\]
Set \(z^*=P'z\) and \(y^*=P'y\). Then
\begin{align*}
\text{(A17)}&=\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\big\{-D_p\sigma\|y^*-z^*\|_2-\sigma\eta z^{*\prime}\Lambda z^*\big\}\,h(\eta)\\
&\le\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\Big\{-\frac{D_p}{\sqrt n}\sigma|y^*-z^*|_1-\sigma\eta z^{*\prime}\Lambda z^*\Big\}\,h(\eta)\\
&=\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\Big\{-\frac{D_p}{\sqrt n}\sigma\sum_{i=1}^{2}|y_i^*-z_i^*|-\frac{D_p}{\sqrt n}\sigma\sum_{i=3}^{n}|y_i^*-z_i^*|-\sigma\eta\sum_{i=3}^{n}\eta_iz_i^{*2}\Big\}\,h(\eta).\tag{A18}
\end{align*}
Integrating out \(z_1^*\) and \(z_2^*\), (A18) is proportional to
\[
\sigma^{\frac{3n}{2}-4-a}\,\eta^{\frac{n-2}{2}}\exp\Big\{-\frac{D_p}{\sqrt n}\sigma\sum_{i=3}^{n}|y_i^*-z_i^*|-\sigma\eta\sum_{i=3}^{n}\eta_iz_i^{*2}\Big\}\,h(\eta).\tag{A19}
\]
Moreover,
\[
\text{(A19)}\propto\sigma^{n-3-a}\prod_{i=3}^{n}\mathrm{E}_{z_i^*}\Big[\exp\Big(-\frac{D_p}{\sqrt n}\sigma|y_i^*-z_i^*|\Big)\Big]\,h(\eta),\qquad z_i^*\sim N\Big(0,\frac{1}{2\eta_i\sigma\eta}\Big).\tag{A20}
\]
By Lemma A.1(1), (A20) is bounded above by an expression proportional to
\[
\sigma^{n-3-a}\prod_{i=3}^{n}\frac{2}{1+\sqrt{\frac{\pi eD_p^2}{2n}y_i^{*2}\sigma^2}}\,h(\eta).\tag{A21}
\]
When \(n\ge3\), \(\sigma^{n-3-a}\prod_{i=3}^{n}2\big(1+\sqrt{\pi eD_p^2/(2n)}\,|y_i^*|\sigma\big)^{-1}\) has a finite integral with respect to \(\sigma\) when \(a>0\). Hence, if \(h(\eta)\) is proper, the posterior of \((z,\sigma,\eta\mid y)\) is proper.

When \(a=0\), the joint posterior of \((z,\sigma,\eta\mid y)\) is proportional to
\[
\sigma^{\frac{3n}{2}-2}\,\eta^{\frac{n-2}{2}}\exp\big\{-\sigma\rho_p(y-z)-\sigma\eta z'Az\big\}\,h(\eta).\tag{A22}
\]
Following the same procedure as in the case \(a>0\), we only need to show that
\[
\sigma^{\frac{3n}{2}-4}\,\eta^{\frac{n-2}{2}}\exp\Big\{-\frac{D_p}{\sqrt n}\sigma\sum_{i=3}^{n}|y_i^*-z_i^*|-\sigma\eta\sum_{i=3}^{n}\eta_iz_i^{*2}\Big\}\,h(\eta)\tag{A23}
\]
has a finite integral. By Lemma A.1(2), there exists a constant \(D_1\) such that (A23) is bounded above by
\[
D_1\,\sigma^{n-3}\prod_{i=3}^{n}\Big\{\sqrt{\frac{4\eta_i\eta}{\pi\sigma}}\exp\Big(-\frac{y_i^{*2}\eta_i\sigma\eta}{2}\Big)+\exp\Big(-\frac{D_p|y_i^*|\sigma}{2\sqrt n}\Big)\Big\}\,h(\eta).\tag{A24}
\]
Expanding the product gives \(2^{n-2}\) terms. For any term involving both types of factors, we apply the inequality
\[
y\exp\Big(-\frac{\mu^2y^2}{2}\Big)\le\frac{1}{\sqrt{e}\,|\mu|},\qquad y>0,
\]
to each factor of the form \(\sqrt{4\eta_i\eta/(\pi\sigma)}\exp(-y_i^{*2}\eta_i\sigma\eta/2)\); the resulting term then no longer involves \(\eta\) and has a finite integral with respect to \(\sigma\). Hence we only need to consider the term
\[
\sigma^{n-3}\Big(\frac{\eta}{\sigma}\Big)^{\frac{n-2}{2}}\exp\Big\{-\sigma\eta\sum_{i=3}^{n}\frac{y_i^{*2}\eta_i}{2}\Big\}.\tag{A25}
\]
Set \(\sigma_1=\sigma\eta\); then the integral of (A25) with respect to \(\sigma\) is proportional to that of
\[
\sigma_1^{\frac{n-4}{2}}\exp\Big\{-\sigma_1\sum_{i=3}^{n}\frac{y_i^{*2}\eta_i}{2}\Big\},
\]
which is finite when \(n\ge3\). Hence, if \(h(\eta)\) is proper, the posterior of \((z,\sigma,\eta\mid y)\) is proper.
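The elementary bound invoked above, \(y\exp(-\mu^2y^2/2)\le1/(\sqrt e\,|\mu|)\) for \(y>0\), with the maximum attained at \(y=1/|\mu|\), is easy to confirm numerically (our own illustration, not code from the paper):

```python
import math

def g(y, mu):
    """The function y * exp(-mu^2 * y^2 / 2) appearing in the proof."""
    return y * math.exp(-mu * mu * y * y / 2)

def g_max(mu):
    """Its claimed maximum 1 / (sqrt(e) * |mu|), attained at y = 1/|mu|."""
    return 1.0 / (math.sqrt(math.e) * abs(mu))
```

A one-line calculus check: \(g'(y)=(1-\mu^2y^2)\exp(-\mu^2y^2/2)\) vanishes at \(y=1/|\mu|\), where \(g\) equals \(e^{-1/2}/|\mu|\).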

When SSE \(>0\), the joint posterior of \((z,\sigma,\eta\mid y)\) is proportional to
\[
\sigma^{N-1-a}\,\sigma^{\frac{n-2}{2}}\,\eta^{\frac{n-2}{2}}\exp\big\{-\sigma\rho_p(y-Cz)-\sigma\eta z'Az\big\}\,h(\eta).\tag{A26}
\]
By the same argument as in Theorem 2.1, (A26) is bounded above by
\[
\sigma^{N+\frac{n-2}{2}-1-a}\,\eta^{\frac{n-2}{2}}\exp\big\{-\min\{p,1-p\}\,\sigma\sqrt{(y-Cz)'(y-Cz)}-\sigma\eta z'Az\big\}\,h(\eta).\tag{A27}
\]
Let \(P_y=(C'C)^{-1}C'y\) and \(z^*=z-P_y\), so that \((y-Cz)'(y-Cz)=z^{*\prime}Wz^*+\mathrm{SSE}\) with \(W=C'C\). Rewriting (A27) in terms of \(z^*\) and using norm equivalence, there exists \(C_1>0\) such that (A27) is bounded above by
\[
\sigma^{N+\frac{n-2}{2}-1-a}\,\eta^{\frac{n-2}{2}}\exp\Big\{-C_1\sigma\Big(\sum_{i=1}^{n}w_i|z_i^*|+\sqrt{\mathrm{SSE}}\Big)-\sigma\eta(z^*+P_y)'A(z^*+P_y)\Big\}\,h(\eta),\tag{A28}
\]
where the \(w_i\) are positive weights. Since \(\sigma^{N-n}\exp(-C_1\sigma\sqrt{\mathrm{SSE}})\) is bounded, we only need to consider
\[
\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\Big\{-C_1\sigma\sum_{i=1}^{n}w_i|z_i^*|-\sigma\eta(z^*+P_y)'A(z^*+P_y)\Big\}\,h(\eta).\tag{A29}
\]
Similarly, we can find \(C_2>0\) such that
\[
\text{(A29)}\le\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\big\{-C_2\sigma\rho_p(z-P_y)-\sigma\eta z'Az\big\}\,h(\eta).\tag{A30}
\]
Let \(z_1=C_2z\), \(y_1=C_2P_y\) and \(A_1=A/C_2^2\); then (A30) is proportional to the joint posterior of \((z,\sigma,\eta\mid y)\) when SSE = 0. By Theorem 2.2(a), the result holds.

For (b), the posterior of \((z,\sigma,\delta\mid y)\) is proportional to
\begin{align*}
&\sigma^{N-a}\,\delta^{-\frac{n-2}{2}}\exp\Big\{-\sigma\rho_p(y-Cz)-\frac{1}{2\delta}z'Az\Big\}\,\frac{1}{\sigma^2\delta^2}\,h\Big(\frac{1}{2\sigma\delta}\Big)\\
&\quad\le\sigma^{N-a}\,\delta^{-\frac{n-2}{2}}\exp\Big\{-\sigma\rho_p(y-Cz)-\frac{1}{2\delta}z'Az\Big\}\,\frac{1}{\sigma^2\delta^2}\,\sigma^{1+b}\delta^{1+b}.\tag{A31}
\end{align*}
We recognise (A31) as a special case of the posterior in Theorem 2.1. Hence it has a finite integral with respect to \((z,\sigma,\delta)\) if and only if \(n>2+2b\) and \(N>2+a+b\).

Appendix 3. Proof of Theorem 2.3

With an argument similar to that of Theorem 2.2, there exist constants \(D_4,D_5>0\) such that
\[
\text{(A15)}\ge D_4\,\sigma^{\frac{3n}{2}-2-a}\,\eta^{\frac{n-2}{2}}\exp\Big\{-D_5\sigma\sum_{i=3}^{n}|y_i^*-z_i^*|-\sigma\eta\sum_{i=3}^{n}\eta_iz_i^{*2}\Big\}\,h(\eta).\tag{A32}
\]
By the lower bound in Lemma A.1(1) and the inequality
\[
\frac{xy}{1+xy}\ \ge\ \frac{x}{1+x}\cdot\frac{y}{1+y},\qquad x>0,\ y>0,
\]
and ignoring multiplicative constants, the right-hand side of (A32), after integrating out \(z^*\), is bounded below by
\[
\sigma^{n-3-a}\,\sigma^{\frac{n-2}{2}}\prod_{i=3}^{n}\Big(1+\frac{\sigma}{2\eta_i}\Big)^{-1}\cdot\frac{\eta^{\frac{n-2}{2}}}{(1+\eta)^{n-2}}\exp\Big\{-\sigma\eta\sum_{i=3}^{n}\eta_iy_i^{*2}\Big\}\,h(\eta).\tag{A33}
\]
Set \(\sigma_1=\sigma\eta\); then (A33) is bounded below by
\[
\sigma_1^{n-3-a}\,\sigma_1^{\frac{n-2}{2}}\prod_{i=3}^{n}\Big(1+\frac{\sigma_1}{2\eta_i}\Big)^{-1}\exp\Big\{-\sigma_1\sum_{i=3}^{n}\eta_iy_i^{*2}\Big\}\cdot\frac{\eta^{a}}{(1+\eta)^{2n-4}}\,h(\eta).\tag{A34}
\]
Since
\[
\sigma_1^{n-3-a}\,\sigma_1^{\frac{n-2}{2}}\prod_{i=3}^{n}\Big(1+\frac{\sigma_1}{2\eta_i}\Big)^{-1}\exp\Big\{-\sigma_1\sum_{i=3}^{n}\eta_iy_i^{*2}\Big\}
\]
has a finite, positive integral when \(n\ge2+\frac{2a}{3}\), a necessary condition for the posterior propriety of \((z,\sigma,\eta\mid y)\) is
\[
\int_0^{+\infty}\frac{\eta^{a}}{(1+\eta)^{2n-4}}\,h(\eta)\,d\eta<+\infty.
\]
Combining this with the fact that \((1+\eta)^{-(2n-4)}\) is bounded and bounded away from zero on \((0,\epsilon)\) for arbitrarily large \(\epsilon\), the condition
\[
\int_0^{\epsilon}\eta^{a}h(\eta)\,d\eta<+\infty\tag{A35}
\]
is also necessary. Next, we focus on the integral of \(\eta\) over \((\epsilon,+\infty)\); without loss of generality, assume \(\epsilon=1\). Combining (A12) with (A19), the integral of (A32) with respect to \(z^*\) is proportional to
\[
\int_0^{+\infty}\!\!\cdots\!\int_0^{+\infty}\sigma^{-1-a}\prod_{i=3}^{n}\sigma^2t_i\Big(\frac{1}{2\eta_i\sigma\eta}+t_i^2\Big)^{-1/2}\exp\Big\{-\frac{y_i^{*2}}{\frac{1}{\eta_i\sigma\eta}+2t_i^2}-\frac{D_p^2\sigma^2t_i^2}{2n}\Big\}\,dt_3\cdots dt_n.\tag{A36}
\]
Let \(t_i=s_i/\sigma\); then
\[
\text{(A36)}\propto\int_0^{+\infty}\!\!\cdots\!\int_0^{+\infty}\sigma^{\frac{3n}{2}-4-a}\exp\Big\{-\sigma\sum_{i=3}^{n}\Big(\frac{y_i^{*2}}{\frac{1}{\eta_i\eta}+2s_i^2}+\frac{D_p^2s_i^2}{2n}\Big)\Big\}\prod_{i=3}^{n}s_i\Big(\frac{1}{2\eta_i\eta}+s_i^2\Big)^{-1/2}\,ds_3\cdots ds_n.\tag{A37}
\]
After integrating out \(\sigma\) in (A37), the result is proportional to
\[
f(\eta)=\int_0^{+\infty}\!\!\cdots\!\int_0^{+\infty}\Big\{\sum_{i=3}^{n}\Big(\frac{y_i^{*2}}{\frac{1}{\eta_i\eta}+2s_i^2}+\frac{D_p^2s_i^2}{2n}\Big)\Big\}^{-(\frac{3n}{2}-3-a)}\prod_{i=3}^{n}s_i\Big(\frac{1}{2\eta_i\eta}+s_i^2\Big)^{-1/2}\,ds_3\cdots ds_n.\tag{A38}
\]
Since \(\eta\ge1\), we have
\[
f(\eta)\ \ge\ \int_0^{+\infty}\!\!\cdots\!\int_0^{+\infty}\Big\{\sum_{i=3}^{n}\Big(\frac{y_i^{*2}}{2s_i^2}+\frac{D_p^2s_i^2}{2n}\Big)\Big\}^{-(\frac{3n}{2}-3-a)}\prod_{i=3}^{n}s_i\Big(\frac{1}{2\eta_i}+s_i^2\Big)^{-1/2}\,ds_3\cdots ds_n=:Q.\tag{A39}
\]
We will show that \(Q\) is finite under \(n>2+a\). Divide each interval \((0,+\infty)\) into \((0,1)\cup[1,+\infty)\); \(Q\) then becomes the sum of \(2^{n-2}\) integrals, all of which can be handled in a similar way. Take, for example, the one with \(s_3\in(0,1)\) and all the other \(s_i\in(1,+\infty)\).
\[
\int_0^{1}\!\int_1^{+\infty}\!\!\cdots\!\int_1^{+\infty}\Big\{\sum_{i=3}^{n}\Big(\frac{y_i^{*2}}{2s_i^2}+\frac{D_p^2s_i^2}{2n}\Big)\Big\}^{-(\frac{3n}{2}-3-a)}\prod_{i=3}^{n}s_i\Big(\frac{1}{2\eta_i}+s_i^2\Big)^{-1/2}\,ds_3\cdots ds_n.\tag{A40}
\]
Since \(y_3^{*2}/(2s_3^2)+D_p^2s_3^2/(2n)\ge D_p|y_3^*|/\sqrt n\) by the arithmetic–geometric mean inequality, and \(s_i(1/(2\eta_i)+s_i^2)^{-1/2}\le1\) for \(i=4,\ldots,n\),
\[
\text{(A40)}\le\int_1^{+\infty}\!\!\cdots\!\int_1^{+\infty}\Big\{\frac{D_p|y_3^*|}{\sqrt n}+\frac{D_p^2}{2n}\sum_{i=4}^{n}s_i^2\Big\}^{-(\frac{3n}{2}-3-a)}\,ds_4\cdots ds_n.\tag{A41}
\]
Using the polar transformation for \((s_4,\ldots,s_n)\),
\[
s_4=r\cos\phi_1,\quad s_5=r\sin\phi_1\cos\phi_2,\quad\ldots,\quad s_n=r\sin\phi_1\sin\phi_2\cdots\sin\phi_{n-4},\tag{A42}
\]
where \(r^2\ge n-3\) and \(0\le\phi_i\le\pi/2\), with Jacobian determinant \(r^{n-4}\sin^{n-5}(\phi_1)\cdots\sin(\phi_{n-4})\), the right-hand side of (A41) is proportional to
\[
\int_{\sqrt{n-3}}^{+\infty}r^{n-4}\Big\{\frac{D_p|y_3^*|}{\sqrt n}+\frac{D_p^2r^2}{2n}\Big\}^{-(\frac{3n}{2}-3-a)}\,dr,
\]
which is finite when \(n>3/2+a\). Similarly, all \(2^{n-2}\) integrals can be handled in this way, and the case in which all the \(s_i\in(1,+\infty)\) gives the strictest constraint on \(n\), namely \(n>2+a\). Hence
\[
\int_{\epsilon}^{+\infty}h(\eta)\,d\eta<+\infty.\tag{A43}
\]
Combining (A35) with (A43), the result holds.
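Two elementary inequalities drive the bounds in this appendix: \(xy/(1+xy)\ge[x/(1+x)][y/(1+y)]\) for \(x,y>0\) (equivalent, after clearing denominators, to \((1+x)(1+y)\ge1+xy\)), and the arithmetic–geometric mean step \(u/(2s^2)+cs^2/2\ge\sqrt{cu}\) used to pass from (A40) to (A41). The sketch below (our own illustration, not code from the paper) checks both on a grid:

```python
import math

def ratio_ineq_holds(x, y):
    """xy/(1+xy) >= (x/(1+x)) * (y/(1+y)); equivalent to (1+x)(1+y) >= 1 + xy."""
    return x * y / (1 + x * y) >= (x / (1 + x)) * (y / (1 + y))

def am_gm_holds(u, s, c, tol=1e-12):
    """u/(2 s^2) + c s^2/2 >= sqrt(c u): the AM-GM step behind (A41)."""
    return u / (2 * s * s) + c * s * s / 2 >= math.sqrt(c * u) - tol
```

With \(u=y_3^{*2}\) and \(c=D_p^2/n\), the AM–GM bound reproduces the lower bound \(D_p|y_3^*|/\sqrt n\) quoted above.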

Appendix 4. Proof of Theorem 3.1

The following lemma can be found in de Oliveira (2007).

Lemma A.2

There exists a full-rank \(n\times(n-2)\) matrix \(L\) satisfying \(L'T=0\), \(T=(\mathbf 1,x)\), and \(L'L=I_{n-2}\), for which the following hold:

  1. \(A=R^{-1}-R^{-1}T(T'R^{-1}T)^{-1}T'R^{-1}=L(L'RL)^{-1}L'\);

  2. \(L'RL=D\), where \(D\) is an \((n-2)\times(n-2)\) diagonal matrix with positive diagonal elements.

From the special structure of \(A\), we can show that all of its principal submatrices of order \(n-2\) are positive definite. Let \(T^*\) be the orthogonalisation of \(T\), so that \((L,T^*)\) is an orthogonal matrix. Without loss of generality, take the upper-left principal submatrix of \(A\) of order \(n-2\), denoted \(A_{\mathrm{sub}}\). Its determinant is \(|L_{\mathrm{sub}}|^2|D|^{-1}\), where \(L_{\mathrm{sub}}\) consists of the first \(n-2\) rows of \(L\). In addition, since \((L,T^*)\) is orthogonal, by Theorem 6.3 in Zhang (2011) we have
\[
|L_{\mathrm{sub}}|=c\begin{vmatrix}1&x_{n-1}\\1&x_n\end{vmatrix},
\]
where \(c\) is a positive number depending on \(x\). Hence
\[
|A_{\mathrm{sub}}|=c^2\begin{vmatrix}1&x_{n-1}\\1&x_n\end{vmatrix}^2|D|^{-1},
\]
which is positive. The same argument applies to the other principal submatrices of order \(n-2\), so all of them are positive definite. Now we can prove the theorem.

Here we only consider the conditional posterior of \((z\mid\sigma,\delta,y)\); a similar argument applies to \((z\mid\sigma,\eta,y)\). The conditional posterior of \((z\mid\sigma,\delta,y)\) is proportional to
\[
\sigma^{N}\,\delta^{-\frac{n-2}{2}}\exp\Big\{-\sigma\rho_p(y-Cz_o)-\frac{1}{2\delta}z'Az\Big\},\tag{A44}
\]
where \(z_o=(z_1,\ldots,z_{n-m})'\). Partition the precision matrix \(A\) as
\[
A=\begin{pmatrix}A_{11}&A_{12}\\A_{12}'&A_{22}\end{pmatrix},\tag{A45}
\]
where \(A_{11}\) is \((n-m)\times(n-m)\) and \(A_{22}\) is \(m\times m\); moreover, \(A_{22}\) is positive definite. Integrating out \((z_{n-m+1},\ldots,z_n)\), (A44) is proportional to
\[
\sigma^{N}\,\delta^{-\frac{n-m-2}{2}}\exp\Big\{-\sigma\rho_p(y-Cz_o)-\frac{1}{2\delta}z_o'A_{11.2}z_o\Big\},\tag{A46}
\]
where \(A_{11.2}=A_{11}-A_{12}A_{22}^{-1}A_{12}'\) has rank \(n-m-2\). Then (A46) has the same form as in the case where \(C'C\) is of full rank. Hence the result holds.
