
Power-expected-posterior prior Bayes factor consistency for nested linear models with increasing dimensions

Pages 162-171 | Received 13 May 2019, Accepted 14 Jan 2020, Published online: 30 Jan 2020

Abstract

The power-expected-posterior (PEP) prior is used in this paper for comparing nested linear models. The asymptotic behaviour of the method is investigated for different values of the power parameter of the prior. Focus is on the consistency of the Bayes factor when comparing the full model M_p versus a generic submodel M_ℓ. In each case, we allow the true generating model to be either M_p or M_ℓ, and we keep the dimension of M_ℓ fixed, while the dimension of M_p can be either fixed or grow as O(n), with n denoting the sample size.

1. Introduction

Pérez and Berger (2002) developed priors for objective Bayesian model comparison through the device of 'imaginary training samples'. The expected-posterior prior (EPP) for the parameters of a model is an expectation of the posterior distribution given imaginary observations y* of size n*. The expectation is taken with respect to a suitable probability measure of a reference model M_0, while the posterior distribution is computed via Bayes' theorem starting from a default, typically improper, prior. One of the advantages of using EPPs is that impropriety of the baseline priors causes no indeterminacy in the computation of Bayes factors. On the other hand, EPPs depend on the training sample size and, particularly in variable selection problems, imaginary design matrices must also be introduced under each competing model, so the resulting prior further depends on this choice (for a detailed discussion of this issue, see Fouskakis, Ntzoufras, & Draper, 2015). The selection of a minimal training sample, of size n*, has been proposed (see, for example, Berger & Pericchi, 2004) to make the information content of the prior as small as possible, and this is an appealing idea. But even under this set-up, the resulting prior can be influential when the sample size n is not much larger than the total number of parameters of the full model (see Fouskakis et al., 2015).

The power-expected-posterior (PEP) prior, introduced by Fouskakis et al. (2015), is an objective prior which amalgamates ideas from the power prior (Ibrahim & Chen, 2000), the expected-posterior prior (Pérez & Berger, 2002) and the unit-information-prior approach of Kass and Wasserman (1995) to simultaneously (a) produce a minimally informative prior and (b) diminish the effect of training samples under the EPP methodology. The main idea is to substitute the likelihood in the EPP by a density-normalised version of a power-likelihood. Fouskakis et al. (2015) and Fouskakis and Ntzoufras (2016b) studied the PEP priors in detail for the variable selection problem in Gaussian regression models. In the first paper, they introduced the PEP prior by considering as parameters of interest both the coefficients of the model and the error variance, while in the second paper they studied the conditional version of PEP, named PCEP, where only the coefficients are parameters of interest and the error variance is a common nuisance parameter. Here we focus on the former case. Under this approach, for every model M_ℓ in 𝓜 (the set of all models under consideration) the sampling distribution f_ℓ(·|β_ℓ, σ_ℓ²) is specified by

(1)  (Y | X_ℓ, β_ℓ, σ_ℓ², M_ℓ) ∼ N_n(X_ℓ β_ℓ, σ_ℓ² I_n),

where Y = (Y_1, …, Y_n)^T is a vector containing the responses for all subjects, X_ℓ is an n × d_ℓ design matrix containing the values of the explanatory variables in its columns, I_n is the n × n identity matrix, β_ℓ is a vector of length d_ℓ summarising the effects of the covariates in model M_ℓ on the response Y, and σ_ℓ² is the error variance under model M_ℓ. Finally, we denote by p the total number of explanatory variables under consideration and by M_p the full model, including all p covariates.

Furthermore, we denote by π^N_ℓ(β_ℓ, σ_ℓ²) the baseline prior for the parameters of model M_ℓ. Here we use the independence Jeffreys prior (or reference prior) as the baseline prior distribution. Hence, for any M_ℓ ∈ 𝓜, we have

(2)  π^N_ℓ(β_ℓ, σ_ℓ²) = c_ℓ / σ_ℓ²,

where c_ℓ is an unknown normalising constant.

We assume that in 𝓜 there exists a model M_0, with parameters β_0 and σ_0², sampling distribution f_0(·|β_0, σ_0²) and baseline prior π^N_0(β_0, σ_0²) ∝ σ_0^{−2}, which is nested in each of the remaining models, and we consider it as a reference model. This is the typical case in the variable selection problem studied in this paper. Given a set of imaginary data y* = (y*_1, …, y*_{n*})^T and a positive power parameter δ, used essentially to regulate the contribution of the imaginary data to the 'final' prior, we introduce the density-normalised power-likelihood under model M_ℓ, given by

(3)  f_ℓ(y* | β_ℓ, σ_ℓ², δ, X*_ℓ) = f_ℓ(y* | β_ℓ, σ_ℓ², X*_ℓ)^{1/δ} / ∫ f_ℓ(y* | β_ℓ, σ_ℓ², X*_ℓ)^{1/δ} dy*.

The above density-normalised power-likelihood is still a normal distribution, with variance inflated by a factor of δ; in the above, X*_ℓ denotes the imaginary design matrix under model M_ℓ. In a similar manner, under the reference model, the density-normalised power-likelihood takes the form of (3), but now using the likelihood f_0(y* | β_0, σ_0², X*_0) of M_0.

In order to apply the PEP methodology, the density-normalised power-likelihood (3) is used to evaluate, under the imaginary data and the baseline prior, the prior predictive distribution m^N_0(y* | δ, X*_0) of model M_0, as well as the posterior distribution of the parameters of model M_ℓ:

(4)  π^N_ℓ(β_ℓ, σ_ℓ² | y*, δ, X*_ℓ) = f_ℓ(y* | β_ℓ, σ_ℓ², δ, X*_ℓ) π^N_ℓ(β_ℓ, σ_ℓ²) / m^N_ℓ(y* | δ, X*_ℓ),

where

(5)  m^N_j(y* | δ, X*_j) = ∫∫ f_j(y* | β_j, σ_j², δ, X*_j) π^N_j(β_j, σ_j²) dβ_j dσ_j²

is the prior predictive distribution of model M_j, for j = ℓ, 0.

Finally, the imposed prior for the parameters of any model M_ℓ has the following form:

(6)  π^PEP_ℓ(β_ℓ, σ_ℓ² | δ, X*_ℓ) = ∫ π^N_ℓ(β_ℓ, σ_ℓ² | y*, δ, X*_ℓ) m^N_0(y* | δ, X*_0) dy*.

The default choice for δ is to set it equal to n*, i.e. the sample size of the imaginary data, so that the overall information of the imaginary data in the posterior is equal to one data point. Furthermore, setting n* = n and, consequently, X*_ℓ ≡ X_ℓ for the imaginary design matrix simplifies significantly the overwhelming computations required when considering all possible 'minimal' training samples (Pérez & Berger, 2002), while it also avoids the complicated issue (in some cases) of defining the size of the minimal training samples (Berger & Pericchi, 2004). In addition, under the choice n* = n, the PEP prior remains relatively non-informative even for models with dimension close to the sample size n, while the effect on the evaluation of each model is minimal, since the resulting Bayes factors are robust over different values of n*. Detailed information about the default specifications of the PEP prior is provided in Fouskakis et al. (2015). Finally, the null model (with no explanatory variables) is a standard choice for the reference model in regression problems; see, for example, Pérez and Berger (2002). In the above definition of the PEP prior, the power parameter can also be model dependent, denoted then by δ_ℓ.

Fouskakis and Ntzoufras (2016a) proved the consistency of the Bayes factor when using the PEP methodology, with the independence Jeffreys as a baseline prior, for Gaussian linear models, under very mild conditions on the design matrix, when the dimension of each model is fixed and both the size of the training sample and the power parameter are set equal to the sample size n. In a similar manner as in Fouskakis and Ntzoufras (2016a), when comparing the full model M_p to a reduced model M_ℓ, the Bayes factor under the PEP prior is given by

(7)  BF^PEP_{pℓ} = [2Γ(n−p)/Γ²((n−p)/2)] ∫₀^{π/2} (sin φ)^{n−d_ℓ−1} (cos φ)^{n−p−1} (δ + sin²φ)^{(n−p)/2} / (δ (RSS_p/RSS_ℓ) + sin²φ)^{(n−d_ℓ)/2} dφ,

with RSS_j denoting the residual sum of squares of model M_j (j = ℓ, p). For large n, we can approximate the Bayes factor given in (7) as

(8)  BF^PEP_{pℓ} ≈ (1/ρ_{pℓ})^{(n−d_ℓ)/2} (1/δ)^{(p−d_ℓ)/2} (1/2)^{(p−d_ℓ)/2},

if p is a fixed constant; and as

(9)  BF^PEP_{pℓ} ≈ (1/ρ_{pℓ})^{(rp−d_ℓ)/2} (1/δ)^{(p−d_ℓ)/2} 2^{(2(r−1)p−1)/2} (r−1)^{(r−1)p/2} r^{(rp−d_ℓ−1)/2} / (2r−1)^{((2r−1)p−d_ℓ−1)/2},

if p increases as n grows to infinity and (n−p) grows to infinity, with rate r > 1 such that n = r × p (for a detailed proof of (8) and (9) see Innocent, 2016).
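
As a numerical sanity check, the one-dimensional integral in (7) can be evaluated directly by quadrature. The Python sketch below is ours, not part of the original paper; it assumes the reconstruction of (7) above, computes the integrand on the log scale for numerical stability, and uses illustrative values n = 50, p = 5, d_ℓ = 2, δ = n.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def pep_bayes_factor(n, p, d, delta, rho):
    """Quadrature evaluation of BF_{p,l}^PEP in (7).

    n: sample size; p = dim(M_p); d = dim(M_l), with n > p > d;
    delta: power parameter; rho = RSS_p / RSS_l in (0, 1].
    """
    # log of the constant 2 * Gamma(n - p) / Gamma((n - p)/2)^2
    log_const = np.log(2.0) + gammaln(n - p) - 2.0 * gammaln((n - p) / 2.0)

    def integrand(phi):
        s2 = np.sin(phi) ** 2
        log_f = ((n - d - 1) * np.log(np.sin(phi))
                 + (n - p - 1) * np.log(np.cos(phi))
                 + 0.5 * (n - p) * np.log(delta + s2)
                 - 0.5 * (n - d) * np.log(delta * rho + s2))
        return np.exp(log_f)

    integral, _ = quad(integrand, 0.0, np.pi / 2)
    return np.exp(log_const) * integral

# A smaller rho (the full model fits much better) should yield a larger Bayes factor
bf_small_rho = pep_bayes_factor(n=50, p=5, d=2, delta=50, rho=0.5)
bf_large_rho = pep_bayes_factor(n=50, p=5, d=2, delta=50, rho=0.9)
```

The monotone decrease of BF^PEP_{pℓ} in ρ_{pℓ} is immediate from (7), since ρ_{pℓ} enters only through the denominator of the integrand.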

In the rest of the paper, we denote ρ_{pℓ} = RSS_p/RSS_ℓ and ϵ_{pℓ} = β_T^T X_T^T (I_n − H_ℓ) X_T β_T / (n σ_T²), where M_T denotes the 'true' model and H_ℓ the hat matrix of model M_ℓ (see Casella, Girón, Martínez, & Moreno, 2009). Since the reduced model M_ℓ is nested in the full model M_p, we have that ρ_{pℓ} ∈ (0, 1].

Finally, the following results hold, as n increases, with respect to the distribution and the limiting behaviour of the statistic ρ_{pℓ} (see Girón, Moreno, & Casella, 2010):

  • If dim(M_ℓ) = d_ℓ = O(1) and dim(M_p) = p = O(1):

    1. When sampling from model M_ℓ, the distribution of the statistic ρ_{pℓ} is the central beta distribution Be((n−p)/2, (p−d_ℓ)/2) and lim_{n→+∞} ρ_{pℓ} = 1.

    2. When sampling from model M_p, the distribution of the statistic ρ_{pℓ} is the non-central beta distribution Be((n−p)/2, (p−d_ℓ)/2; 0, nϵ_{pℓ}) and lim_{n→+∞} ρ_{pℓ} = 1/(1+ϵ), with lim_{n→+∞} ϵ_{pℓ} = ϵ > 0.

  • If dim(M_ℓ) = d_ℓ = O(1) and dim(M_p) = p = O(n), with r = lim_{n,p→+∞} n/p > 1 and p > d_ℓ > 1:

    1. When sampling from model M_ℓ, the distribution of the statistic ρ_{pℓ} is the central beta distribution Be(p(r−1)/2, (p−d_ℓ)/2) and lim_{n→+∞} ρ_{pℓ} = (r−1)/r, r > 1.

    2. When sampling from model M_p, the distribution of the statistic ρ_{pℓ} is the non-central beta distribution Be(p(r−1)/2, (p−d_ℓ)/2; 0, rpϵ_{pℓ}) and lim_{n→+∞} ρ_{pℓ} = (r−1)/(r(1+ϵ)), where lim_{n→+∞} ϵ_{pℓ} = ϵ > 0.
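
The central-beta result in the first case above is easy to verify by simulation. The sketch below (Python/NumPy, ours, not from the paper) repeatedly samples under M_ℓ with a fixed, illustrative design and compares the Monte Carlo mean of ρ_{pℓ} with the Be((n−p)/2, (p−d_ℓ)/2) mean a/(a+b).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 3, 8
X_full = rng.standard_normal((n, p))   # illustrative design of M_p
X_red = X_full[:, :d]                  # M_l uses the first d columns, hence nested in M_p
beta_red = np.array([1.0, -2.0, 0.5])  # illustrative coefficients of M_l

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

# Sampling from M_l: rho_{pl} = RSS_p / RSS_l should follow Be((n-p)/2, (p-d)/2)
rhos = np.array([
    rss(X_full, y) / rss(X_red, y)
    for y in (X_red @ beta_red + rng.standard_normal(n) for _ in range(2000))
])
mc_mean = rhos.mean()
beta_mean = ((n - p) / 2) / ((n - p) / 2 + (p - d) / 2)   # a / (a + b)
```

For fixed design and normal errors, this beta law is exact, since (RSS_ℓ − RSS_p)/σ² and RSS_p/σ² are independent chi-squared variates under M_ℓ.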

In this paper, we examine the consistency of the Bayes factor, for nested normal linear models, under the PEP methodology, using the pair of models M_ℓ and M_p. The number of parameters of the simpler model M_ℓ is always fixed, while that of the full model is of order O(n^α), where α ∈ {0, 1}. We investigate the effect of the power parameter δ by examining four different scenarios. In each case, the 'true' model is set equal to either M_ℓ or M_p.

2. Bayes factor consistency under power-expected-posterior priors

In what follows we set the size of the training sample n* equal to the sample size n, as in Fouskakis et al. (2015).

2.1. When the power δ=n

First, we consider the case where the power parameter is set equal to the sample size n, and study the consistency when the dimension p of the full model M_p is either a fixed constant or grows to infinity.

Then (7) becomes:

(10)  BF^PEP_{pℓ} = [2Γ(n−p)/Γ²((n−p)/2)] ∫₀^{π/2} (sin φ)^{n−d_ℓ−1} (cos φ)^{n−p−1} (n + sin²φ)^{(n−p)/2} / (nρ_{pℓ} + sin²φ)^{(n−d_ℓ)/2} dφ.

2.1.1. When dim(M_ℓ) = O(1) and dim(M_p) = O(1)

Theorem 2.1

Let the sample size n increase, remaining strictly greater than the dimension of the full model M_p. Furthermore, suppose that the dimensions of both models under consideration are fixed natural numbers, i.e. dim(M_ℓ) = d_ℓ = O(1) and dim(M_p) = p = O(1), where p > d_ℓ > 1. Under the condition δ = n, when sampling from model M_j, where j is either ℓ or p, we have: lim_{n→+∞} BF^PEP_{pℓ} = 0 if j = ℓ, and +∞ if j = p.

Proof.

For δ = n, (8) becomes

(11)  BF^PEP_{pℓ} ≈ (1/(2n))^{(p−d_ℓ)/2} (1/ρ_{pℓ})^{(n−d_ℓ)/2}.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (11) becomes:

(12)  BF^PEP_{pℓ} ≈ (1/(2n))^{(p−d_ℓ)/2}.

Since p and d_ℓ are constants and n goes to infinity, we get lim_{n→+∞} BF^PEP_{pℓ} = 0. Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent under the reduced model M_ℓ.

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (11) becomes:

(13)  BF^PEP_{pℓ} ≈ (1/(2n))^{(p−d_ℓ)/2} (1+ϵ)^{n/2} = (1/2)^{(p−d_ℓ)/2} exp{ n [ −((p−d_ℓ)/2)(log n / n) + (1/2) log(1+ϵ) ] }.

Thus lim_{n→+∞} BF^PEP_{pℓ} = exp{ lim_{n→+∞} (n/2) log(1+ϵ) } = +∞, since ϵ > 0 and (n/2) log(1+ϵ) → +∞ as n → +∞. Therefore, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent when sampling from the full model M_p.
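
Theorem 2.1 can be visualised numerically by tracking the large-n approximation (11) on the log scale under both sampling scenarios. The small Python sketch below is ours, with illustrative values p = 5, d_ℓ = 2 and limiting ϵ = 0.3; log BF drifts to −∞ under M_ℓ and to +∞ under M_p.

```python
import numpy as np

def log_bf_approx(n, p, d, rho):
    # log of (11): BF ≈ (1/(2n))^((p-d)/2) * (1/rho)^((n-d)/2)
    return -0.5 * (p - d) * np.log(2.0 * n) - 0.5 * (n - d) * np.log(rho)

p, d, eps = 5, 2, 0.3                 # illustrative dimensions and limiting epsilon
ns = np.array([50, 200, 1000, 5000])
log_bf_under_Ml = log_bf_approx(ns, p, d, 1.0)              # rho -> 1 under M_l
log_bf_under_Mp = log_bf_approx(ns, p, d, 1.0 / (1 + eps))  # rho -> 1/(1+eps) under M_p
```

Under M_ℓ the log Bayes factor behaves like −((p−d_ℓ)/2) log(2n), a slow drift to −∞; under M_p the dominant term (n/2) log(1+ϵ) grows linearly in n.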

2.1.2. When dim(M_ℓ) = O(1) and dim(M_p) = O(n)

Theorem 2.2

Let δ = n and suppose that the reduced model M_ℓ has a fixed number of parameters, i.e. dim(M_ℓ) = d_ℓ = O(1), as the sample size n increases, while in the full model M_p the number of parameters increases at rate dim(M_p) = p = O(n), with r = lim_{n,p→+∞} n/p > 1 and p > d_ℓ > 1. Then:

  1. When sampling from model M_ℓ, lim_{n→+∞} BF^PEP_{pℓ} = 0.

  2. When sampling from model M_p, lim_{n→+∞} BF^PEP_{pℓ} = 0 if r > 1 is a fixed constant; while if r > 1 is a large number, the limit is 0 if lim_{n→+∞} ϵ_{pℓ} < ϵ_{p2}(r) and +∞ if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p2}(r), for the function ϵ_{p2}: (1, +∞) → ℝ, r ↦ (2rp)^{1/r} − 1.

Proof.

By replacing n → rp and δ = rp, (9) becomes

(14)  BF^PEP_{pℓ} ≈ (1/(2(r−1)p))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} ((r−1)/(rρ_{pℓ}))^{(pr−d_ℓ)/2}.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (14) becomes BF^PEP_{pℓ} ≈ (1/(2(r−1)p))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2}, and then BF^PEP_{pℓ} ≈ [ (1/(2(r−1)p)) (2r/(2r−1))^{2r−1} ]^{p/2} ((2r−1)(r−1)p/r)^{d_ℓ/2} (1 − 1/(2r))^{1/2}. So for large values of p we have BF^PEP_{pℓ} ≈ [ (1/(2(r−1)p)) (2r/(2r−1))^{2r−1} ]^{p/2}, and then

(15)  BF^PEP_{pℓ} ≈ (1/p)^{p/2} if r > 1 is a fixed constant, and ≈ (1/(2rp))^{p/2} if r is a large number.

In both cases, for large p, we get lim_{n→+∞} BF^PEP_{pℓ} = 0, since lim_{n→+∞} (1/p)^{p/2} = lim_{n→+∞} exp{−(p/2) log p} = 0. Thus the Bayes factor of the full model M_p against the reduced model M_ℓ is consistent under the reduced model M_ℓ.

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (14) becomes BF^PEP_{pℓ} ≈ (1/(2(r−1)p))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} (1+ϵ)^{(rp−d_ℓ)/2}. So for large p, we have BF^PEP_{pℓ} ≈ (1/p)^{p/2} if r > 1 is a fixed constant, and ≈ ((1+ϵ)^r/(2rp))^{p/2} (2rp)^{d_ℓ/2} if r is a large number. Then, for r > 1 a large number, lim_{n→+∞} BF^PEP_{pℓ} is 0 if (1+ϵ)^r/(2rp) < 1, behaves as (2rp)^{d_ℓ/2} if (1+ϵ)^r/(2rp) = 1, and is +∞ if (1+ϵ)^r/(2rp) > 1. Solving the equation (1+ϵ)^r/(2rp) = 1 for ϵ, we get ϵ = (2rp)^{1/r} − 1. Therefore, using the function ϵ_{p2}: (1, +∞) → ℝ, r ↦ (2rp)^{1/r} − 1, we have lim_{n→+∞} BF^PEP_{pℓ} = 0 if r > 1 is a fixed constant, while if r > 1 is a large number the limit is 0 if lim_{n→+∞} ϵ_{pℓ} < ϵ_{p2}(r) and +∞ if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p2}(r). Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent under the full model M_p if and only if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p2}(r) when r is large and goes to infinity.

2.2. When the power δ = (n−p)

Second, we consider the case where the power δ = (n−p), and study the consistency when the dimension p of the full model M_p is either a fixed constant or grows to infinity. Then (7) becomes:

BF^PEP_{pℓ} = [2Γ(n−p)/Γ²((n−p)/2)] ∫₀^{π/2} (sin φ)^{n−d_ℓ−1} (cos φ)^{n−p−1} ((n−p) + sin²φ)^{(n−p)/2} / ((n−p)ρ_{pℓ} + sin²φ)^{(n−d_ℓ)/2} dφ.

2.2.1. When dim(M_ℓ) = O(1) and dim(M_p) = O(1)

Let the sample size n increase, remaining strictly greater than the dimension of the full model M_p. Furthermore, suppose that the dimensions of both models under consideration are fixed natural numbers, i.e. dim(M_ℓ) = d_ℓ = O(1) and dim(M_p) = p = O(1), where p > d_ℓ > 1.

For δ = (n−p), (8) becomes BF^PEP_{pℓ} ≈ (1/2)^{(p−d_ℓ)/2} (1/(n−p))^{(p−d_ℓ)/2} (1/ρ_{pℓ})^{(n−d_ℓ)/2}, and then, since p and d_ℓ are fixed constants, for large values of n we get

(16)  BF^PEP_{pℓ} ≈ (1/(2n))^{(p−d_ℓ)/2} (1/ρ_{pℓ})^{(n−d_ℓ)/2}.

Working as in the proof of Theorem 2.1, we conclude that the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent when sampling from either model.

2.2.2. When dim(M_ℓ) = O(1) and dim(M_p) = O(n)

Theorem 2.3

Let δ = (n−p) and suppose that the reduced model M_ℓ has a fixed number of parameters, i.e. dim(M_ℓ) = d_ℓ = O(1), as the sample size n increases, while in the full model M_p the number of parameters increases at rate dim(M_p) = p = O(n), with r = lim_{n,p→+∞} n/p > 1 and p > d_ℓ > 1. Then:

  1. When sampling from model M_ℓ, lim_{n→+∞} BF^PEP_{pℓ} = 0.

  2. When sampling from model M_p, lim_{n→+∞} BF^PEP_{pℓ} = 0 if r > 1 is a fixed constant; while if r > 1 is a large number, the limit is 0 if lim_{n→+∞} ϵ_{pℓ} < ϵ_{p2}(r) and +∞ if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p2}(r), for the function ϵ_{p2}: (1, +∞) → ℝ, r ↦ (2rp)^{1/r} − 1.

Proof.

By replacing n → rp and δ = rp − p = (r−1)p, (9) becomes

(17)  BF^PEP_{pℓ} ≈ (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} (r/(2p(r−1)²))^{(p−d_ℓ)/2} ((r−1)/(rρ_{pℓ}))^{(pr−d_ℓ)/2}.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (17) becomes BF^PEP_{pℓ} ≈ (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} (r/(2p(r−1)²))^{(p−d_ℓ)/2}. So for large values of p we have BF^PEP_{pℓ} ≈ (1/p)^{p/2} if r > 1 is a fixed constant, and ≈ (1/(2rp))^{p/2} if r is a large number. In both cases, for large p, we get lim_{n→+∞} BF^PEP_{pℓ} = 0. Thus the Bayes factor of the full model M_p against the reduced model M_ℓ is consistent under the reduced model M_ℓ.

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (17) becomes BF^PEP_{pℓ} ≈ (r/(2p(r−1)²))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} (1+ϵ)^{(rp−d_ℓ)/2}, or

(18)  BF^PEP_{pℓ} ≈ [ (2r/(2r−1))^{2r−1} r(1+ϵ)^r / (2p(r−1)²) ]^{p/2} ((2r−1)(r−1)² p / (r²(1+ϵ)))^{d_ℓ/2} (1 − 1/(2r))^{1/2}.

So for large p we have BF^PEP_{pℓ} ≈ (1/p)^{p/2} if r > 1 is a fixed constant, and ≈ ((1+ϵ)^r/(2rp))^{p/2} (2rp)^{d_ℓ/2} if r is a large number. Thus, working as in the proof of Theorem 2.2, we conclude that the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent under the full model M_p if and only if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p2}(r) when r is large and goes to infinity.

2.3. When the power δ=p

Third, we consider the case where the power is equal to the dimension of the full model, and study the consistency when the dimension p = dim(M_p) of the full model M_p is either a fixed constant or grows to infinity.

Under this set-up, (7) becomes:

BF^PEP_{pℓ} = [2Γ(n−p)/Γ²((n−p)/2)] ∫₀^{π/2} (sin φ)^{n−d_ℓ−1} (cos φ)^{n−p−1} (p + sin²φ)^{(n−p)/2} / (pρ_{pℓ} + sin²φ)^{(n−d_ℓ)/2} dφ.

2.3.1. When dim(M_ℓ) = O(1) and dim(M_p) = O(1)

Theorem 2.4

Let δ = p, and let the sample size n increase, remaining strictly greater than the dimension of the full model M_p. Furthermore, suppose that the dimensions of both models under consideration are fixed natural numbers, i.e. dim(M_ℓ) = d_ℓ = O(1) and dim(M_p) = p = O(1), where p > d_ℓ > 1. Then, when sampling from model M_j, where j is either ℓ or p, we have: lim_{n→+∞} BF^PEP_{pℓ} = Constant > 0 if j = ℓ, and +∞ if j = p.

Proof.

For δ = p, (8) becomes

(19)  BF^PEP_{pℓ} ≈ (1/(2p))^{(p−d_ℓ)/2} (1/ρ_{pℓ})^{(n−d_ℓ)/2}.

Then we consider the following two cases.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (19) becomes BF^PEP_{pℓ} ≈ (1/(2p))^{(p−d_ℓ)/2}. Since p and d_ℓ are constants, with p > d_ℓ > 1, we get lim_{n→+∞} BF^PEP_{pℓ} = (1/(2p))^{(p−d_ℓ)/2} = Constant > 0. Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is inconsistent under the reduced model M_ℓ.

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (19) becomes BF^PEP_{pℓ} ≈ (1/(2p))^{(p−d_ℓ)/2} (1+ϵ)^{(n−d_ℓ)/2} ≈ exp{(n/2) log(1+ϵ)}, and thus lim_{n→+∞} BF^PEP_{pℓ} = +∞. Therefore, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent when sampling from the full model M_p.

2.3.2. When dim(M_ℓ) = O(1) and dim(M_p) = O(n)

Theorem 2.5

Let δ = p and suppose that the reduced model M_ℓ has a fixed number of parameters, i.e. dim(M_ℓ) = d_ℓ = O(1), as the sample size n increases, while in the full model M_p the number of parameters increases at rate dim(M_p) = p = O(n), with r = lim_{n,p→+∞} n/p > 1 and p > d_ℓ > 1. Then:

  1. When sampling from model M_ℓ, lim_{n→+∞} BF^PEP_{pℓ} = 0.

  2. When sampling from model M_p, lim_{n→+∞} BF^PEP_{pℓ} = 0 if r > 1 is a fixed constant; while if r > 1 is a large number, the limit is 0 if lim_{n→+∞} ϵ_{pℓ} < ϵ_{p1}(r) and +∞ if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p1}(r), for the function ϵ_{p1}: (1, +∞) → ℝ, r ↦ (2p)^{1/r} − 1.

Proof.

By replacing n → rp and δ = p, (9) becomes

(20)  BF^PEP_{pℓ} ≈ (r/(2(r−1)p))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} ((r−1)/(rρ_{pℓ}))^{(pr−d_ℓ)/2}.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (20) becomes BF^PEP_{pℓ} ≈ (r/(2(r−1)p))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2}, and then BF^PEP_{pℓ} ≈ [ (r/(2(r−1)p)) (2r/(2r−1))^{2r−1} ]^{p/2} ((2r−1)(r−1)p/r²)^{d_ℓ/2} (1 − 1/(2r))^{1/2}. So for large values of p we have

(21)  BF^PEP_{pℓ} ≈ (1/p)^{p/2} if r > 1 is a fixed constant, and ≈ (1/(2p))^{p/2} if r is a large number.

In both cases, for large p, we get lim_{n→+∞} BF^PEP_{pℓ} = 0. Thus the Bayes factor of the full model M_p against the reduced model M_ℓ is consistent under the reduced model M_ℓ.

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (20) becomes BF^PEP_{pℓ} ≈ (r/(2(r−1)p))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} (1+ϵ)^{(rp−d_ℓ)/2}. So for large p, we have BF^PEP_{pℓ} ≈ (1/p)^{p/2} if r > 1 is a fixed constant, and ≈ ((1+ϵ)^r/(2p))^{p/2} (2p)^{d_ℓ/2} if r is a large number. Then, for r > 1 a large number, lim_{n→+∞} BF^PEP_{pℓ} is 0 if (1+ϵ)^r/(2p) < 1, behaves as (2p)^{d_ℓ/2} if (1+ϵ)^r/(2p) = 1, and is +∞ if (1+ϵ)^r/(2p) > 1. Solving the equation (1+ϵ)^r/(2p) = 1 for ϵ, we get ϵ = (2p)^{1/r} − 1. Therefore, using the function ϵ_{p1}: (1, +∞) → ℝ, r ↦ (2p)^{1/r} − 1, we have lim_{n→+∞} BF^PEP_{pℓ} = 0 if r > 1 is a fixed constant, while if r > 1 is a large number the limit is 0 if lim_{n→+∞} ϵ_{pℓ} < ϵ_{p1}(r) and +∞ if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p1}(r). Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent under the full model M_p if and only if lim_{n→+∞} ϵ_{pℓ} ≥ ϵ_{p1}(r) when r is large and goes to infinity.

2.4. When the power δ > 0 is a fixed constant

Finally, we consider the case where the power parameter is set equal to a fixed positive constant δ, and study the consistency when the dimension p of the full model M_p is either a fixed constant or grows to infinity.

Then (7) becomes:

(22)  BF^PEP_{pℓ} = [2Γ(n−p)/Γ²((n−p)/2)] ∫₀^{π/2} (sin φ)^{n−d_ℓ−1} (cos φ)^{n−p−1} (δ + sin²φ)^{(n−p)/2} / (δρ_{pℓ} + sin²φ)^{(n−d_ℓ)/2} dφ.

2.4.1. When dim(M_ℓ) = O(1) and dim(M_p) = O(1)

Theorem 2.6

Let the sample size n increase, remaining strictly greater than the dimension of the full model M_p. Furthermore, suppose that the dimensions of both models under consideration are fixed natural numbers, i.e. dim(M_ℓ) = d_ℓ = O(1) and dim(M_p) = p = O(1), where p > d_ℓ > 1. Under a fixed power parameter δ > 0, when sampling from model M_j, where j is either ℓ or p, we have: lim_{n→+∞} BF^PEP_{pℓ} = 0 if j = ℓ and δ is large; Constant > 0 if j = ℓ and δ is not large; and +∞ if j = p.

Proof.

For fixed δ, (8) becomes

(23)  BF^PEP_{pℓ} ≈ (1/(2δ))^{(p−d_ℓ)/2} (1/ρ_{pℓ})^{(n−d_ℓ)/2}.

Then we consider the following two cases.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (23) becomes BF^PEP_{pℓ} ≈ (1/(2δ))^{(p−d_ℓ)/2}. Since p and d_ℓ are constants, with p > d_ℓ > 1, if δ is large we get lim_{n→+∞} BF^PEP_{pℓ} = 0, while if δ is not large we get lim_{n→+∞} BF^PEP_{pℓ} = Constant > 0. Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent under the reduced model M_ℓ only for large values of δ.

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (23) becomes BF^PEP_{pℓ} ≈ (1/(2δ))^{(p−d_ℓ)/2} (1+ϵ)^{(n−d_ℓ)/2} ≈ exp{(n/2) [ −((p−d_ℓ)/n) log(2δ) + log(1+ϵ) ]}. Thus lim_{n→+∞} BF^PEP_{pℓ} = +∞. Therefore, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent when sampling from the full model M_p.

2.4.2. When dim(M_ℓ) = O(1) and dim(M_p) = O(n)

Theorem 2.7

Let δ > 0 be a fixed constant and suppose that the reduced model M_ℓ has a fixed number of parameters, i.e. dim(M_ℓ) = d_ℓ = O(1), as the sample size n increases, while in the full model M_p the number of parameters increases at rate dim(M_p) = p = O(n), with r = lim_{n,p→+∞} n/p > 1 and p > d_ℓ > 1. Then:

  1. When sampling from model M_ℓ, lim_{n→+∞} BF^PEP_{pℓ} = 0 if δ > β₁(r), +∞ if δ < β₁(r), and Constant > 0 if δ = β₁(r), for the continuous and decreasing function β₁: (1, +∞) → ℝ, r ↦ (2r/(2r−1))^{2r−1} · r/(2(r−1)).

  2. When sampling from model M_p, lim_{n→+∞} BF^PEP_{pℓ} = 0 if δ > β₂(r); +∞ if δ < β₂(r); +∞ if δ = β₂(r) and δ is large; and Constant > 0 if δ = β₂(r) and δ is small; for the continuous function β₂: (1, +∞) → ℝ, r ↦ β₁(r)(1+ϵ)^r.

Proof.

By replacing n → rp, with δ fixed, (9) becomes

(24)  BF^PEP_{pℓ} ≈ (r/(2(r−1)δ))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} ((r−1)/(rρ_{pℓ}))^{(pr−d_ℓ)/2}.

(a) Suppose that the Reduced Model M is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (24) becomes BF^PEP_{pℓ} ≈ (r/(2(r−1)δ))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2}, and then BF^PEP_{pℓ} ≈ [ (r/(2(r−1)δ)) (2r/(2r−1))^{2r−1} ]^{p/2} ((2r−1)(r−1)δ/r²)^{d_ℓ/2} (1 − 1/(2r))^{1/2}. We consider the following cases:

  • If (r/(2(r−1)δ))(2r/(2r−1))^{2r−1} = 1 ⇔ δ = β₁(r), then BF^PEP_{pℓ} ≈ ((2r−1)(r−1)δ/r²)^{d_ℓ/2} (1 − 1/(2r))^{1/2}. Thus, for any r > 1, lim_{n→+∞} BF^PEP_{pℓ} = Constant > 0.

  • If (r/(2(r−1)δ))(2r/(2r−1))^{2r−1} ≠ 1, for large values of p we get BF^PEP_{pℓ} ≈ [ (r/(2(r−1)δ)) (2r/(2r−1))^{2r−1} ]^{p/2}. Then:

    1. If (r/(2(r−1)δ))(2r/(2r−1))^{2r−1} < 1 ⇔ δ > β₁(r), then lim_{n→+∞} BF^PEP_{pℓ} = 0.

    2. If (r/(2(r−1)δ))(2r/(2r−1))^{2r−1} > 1 ⇔ δ < β₁(r), then lim_{n→+∞} BF^PEP_{pℓ} = +∞.

Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is consistent under the reduced model M_ℓ if and only if the power δ > β₁(r).

(b) Suppose that the Full Model Mp is true

Using the asymptotic results for ρ_{pℓ} given in Section 1, (24) becomes BF^PEP_{pℓ} ≈ (r/(2(r−1)δ))^{(p−d_ℓ)/2} (2r/(2r−1))^{((2r−1)p−d_ℓ−1)/2} (1+ϵ)^{(pr−d_ℓ)/2}, or BF^PEP_{pℓ} ≈ [ (2r/(2r−1))^{2r−1} r(1+ϵ)^r/(2(r−1)δ) ]^{p/2} ((2r−1)(r−1)δ/(r²(1+ϵ)))^{d_ℓ/2} (1 − 1/(2r))^{1/2}. We consider the following cases:

  • If (2r/(2r−1))^{2r−1} (r(1+ϵ)^r/(2(r−1)δ)) = 1 ⇔ δ = β₂(r), then BF^PEP_{pℓ} ≈ ((2r−1)(r−1)δ/(r²(1+ϵ)))^{d_ℓ/2} (1 − 1/(2r))^{1/2}, and for large values of δ we have lim_{n→+∞} BF^PEP_{pℓ} ≈ lim_{n→+∞} (2δ/(1+ϵ))^{d_ℓ/2} = +∞, while if δ is not large, lim_{n→+∞} BF^PEP_{pℓ} ≈ lim_{n→+∞} (2δ/(1+ϵ))^{d_ℓ/2} = Constant > 0.

  • If (2r/(2r−1))^{2r−1} (r(1+ϵ)^r/(2(r−1)δ)) ≠ 1, for large values of p we have BF^PEP_{pℓ} ≈ [ (2r/(2r−1))^{2r−1} (r(1+ϵ)^r/(2(r−1)δ)) ]^{p/2}. Then:

    1. If (2r/(2r−1))^{2r−1} (r(1+ϵ)^r/(2(r−1)δ)) < 1 ⇔ δ > β₂(r), then lim_{n→+∞} BF^PEP_{pℓ} = 0.

    2. If (2r/(2r−1))^{2r−1} (r(1+ϵ)^r/(2(r−1)δ)) > 1 ⇔ δ < β₂(r), then lim_{n→+∞} BF^PEP_{pℓ} = +∞.

Thus, the Bayes factor of the full model M_p versus the reduced model M_ℓ is inconsistent under the full model M_p if δ > β₂(r), or when δ = β₂(r) and δ is small.
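
The threshold β₁(r) of Theorem 2.7 is easy to tabulate. The Python sketch below is ours; it evaluates the reconstructed expression β₁(r) = (2r/(2r−1))^{2r−1} · r/(2(r−1)) and illustrates that, as a function, it decreases towards e/2 ≈ 1.359 as r grows (since (1 + 1/(2r−1))^{2r−1} → e and r/(2(r−1)) → 1/2).

```python
import math

def beta1(r):
    """Threshold beta_1(r) = (2r/(2r-1))^(2r-1) * r / (2(r-1)) for r > 1."""
    return (2.0 * r / (2.0 * r - 1.0)) ** (2.0 * r - 1.0) * r / (2.0 * (r - 1.0))

rs = [1.5, 2.0, 3.0, 5.0, 10.0, 100.0]
vals = [beta1(r) for r in rs]
limit = math.e / 2.0   # (2r/(2r-1))^(2r-1) -> e and r/(2(r-1)) -> 1/2 as r -> infinity
```

Under the theorem's case (1), this means a fixed power δ larger than β₁(r) (hence any δ above β₁ at the relevant rate r) keeps the Bayes factor consistent under the reduced model.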

3. Summary and conclusions

In this paper, we examined the asymptotic behaviour of the power-expected-posterior methodology when comparing nested normal linear models. Emphasis was placed on the consistency of the Bayes factor of the full model M_p versus a generic submodel M_ℓ. The number of parameters of the simpler model M_ℓ was always kept fixed, while that of the full model was set to be of order O(n^α), where α ∈ {0, 1}. We investigated the effect of the prior power parameter δ by examining four different scenarios. In each case, the 'true' model was set equal to either M_ℓ or M_p. Tables 1–3 summarise our findings.

Table 1. Consistency of BF^PEP_{pℓ} when model M_ℓ has dimension dim(M_ℓ) = d_ℓ = O(1) and δ ∈ {n, n−p}.

Table 2. Consistency of BF^PEP_{pℓ} when model M_ℓ has dimension dim(M_ℓ) = d_ℓ = O(1) and δ = p.

Table 3. Consistency of BF^PEP_{pℓ} when model M_ℓ has dimension dim(M_ℓ) = d_ℓ = O(1) and δ > 0 fixed.

The consistency properties of the power-expected-posterior (PEP) prior Bayes factors are eminently reasonable, assuming that we are sampling from either of the candidate models. The Bayes factor is always consistent for fixed dimensions of the candidate models; and even in the difficult situation in which the dimension of the alternative model grows with the sample size, for the situations described in Tables 1–3, the PEP Bayes factor is consistent unless the alternative model is extremely close to the null model, in which case, we conjecture, the lack of consistency is not a critical issue, at least for prediction purposes.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

D. Fouskakis

D. Fouskakis is an Associate Professor in the Department of Mathematics, at the National Technical University of Athens, in Greece. He is also the Director of the Stats Lab at the same University. His research mostly focuses on Bayesian model and variable selection, on objective priors and on stochastic optimization methods.

J. K. Innocent

J. K. Innocent received a Ph.D. in Mathematics from the University of Puerto Rico, Puerto Rico, USA, in 2016. He is currently back in Haiti, where he teaches mathematics and statistics courses at the university level. His main research areas are Bayesian statistics, statistical analysis, biostatistics and epidemiology.

L. Pericchi

L. Pericchi is a Full Professor in the Department of Mathematics of the University of Puerto Rico, Río Piedras, USA. He is also the Director of the Center of Biostatistics and Bioinformatics of the College of Natural Sciences. His research is in the theory and applications of statistics, with emphasis on the Bayesian approach.

References

  • Berger, J., & Pericchi, L. (2004). Training samples in objective Bayesian model selection. Annals of Statistics, 32, 841–869. doi: 10.1214/009053604000000229
  • Casella, G., Girón, F. J., Martínez, M. L., & Moreno, E. (2009). Consistency of Bayesian procedures for variable selection. Annals of Statistics, 37, 1207–1228. doi: 10.1214/08-AOS606
  • Fouskakis, D., & Ntzoufras, I. (2016a). Limiting behaviour of the Jeffreys power-expected-posterior Bayes factor in Gaussian linear models. Brazilian Journal of Probability and Statistics, 30, 299–320. doi: 10.1214/15-BJPS281
  • Fouskakis, D., & Ntzoufras, I. (2016b). Power-conditional-expected priors: Using g-priors with random imaginary data for variable selection power-conditional-expected priors. Journal of Computational and Graphical Statistics, 25, 647–664. doi: 10.1080/10618600.2015.1036996
  • Fouskakis, D., Ntzoufras, I., & Draper, D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Analysis, 10, 75–107. doi: 10.1214/14-BA887
  • Girón, F. J., Moreno, E., & Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Annals of Statistics, 38, 1937–1952. doi: 10.1214/09-AOS754
  • Ibrahim, J. G., & Chen, M. H. (2000). Power prior distributions for regression models. Statistical Science, 15, 46–60. doi: 10.1214/ss/1009212673
  • Innocent, J. K. (2016). Bayes factors consistency for nested linear models with increasing dimensions (Unpublished doctoral dissertation). University of Puerto Rico.
  • Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934. doi: 10.1080/01621459.1995.10476592
  • Pérez, J. M., & Berger, J. O. (2002). Expected-posterior prior distributions for model selection. Biometrika, 89, 491–512. doi: 10.1093/biomet/89.3.491
