Abstract
The power-expected-posterior prior is used in this paper for comparing nested linear models. The asymptotic behaviour of the method is investigated for different values of the power parameter of the prior. Focus is given to the consistency of the Bayes factor when comparing the full model $M_p$ versus a generic submodel $M_\ell$. In each case, we allow the true generating model to be either $M_\ell$ or $M_p$, and we keep the dimension of $M_\ell$ fixed, while the dimension of $M_p$ can be either fixed or grow with the sample size n.
1. Introduction
Pérez and Berger (2002) developed priors for objective Bayesian model comparison through the device of 'imaginary training samples'. The expected-posterior prior (EPP) for the parameters under a model is an expectation of the posterior distribution given imaginary observations of size $n^*$. The expectation is taken with respect to a suitable probability measure of a reference model $M_0$, while the posterior distribution is computed via Bayes's theorem starting from a default, typically improper, prior. One of the advantages of using EPPs is that impropriety of baseline priors causes no indeterminacy in the computation of Bayes factors. On the other hand, EPPs depend on the training sample size; in variable selection problems in particular, imaginary design matrices must also be introduced under each competing model, so the resulting prior further depends on this choice (for a detailed discussion of this issue, see Fouskakis, Ntzoufras, & Draper, 2015). The selection of a minimal training sample has been proposed (see, for example, Berger & Pericchi, 2004) to make the information content of the prior as small as possible, and this is an appealing idea. But even under this set-up, the resulting prior can be influential when the sample size n is not much larger than the total number of parameters under the full model (see Fouskakis et al., 2015).
The power-expected-posterior (PEP) prior, introduced by Fouskakis et al. (2015), is an objective prior which amalgamates ideas from the power prior (Ibrahim & Chen, 2000), the expected-posterior prior (Pérez & Berger, 2002) and the unit-information-prior approach of Kass and Wasserman (1995) to simultaneously (a) produce a minimally informative prior and (b) diminish the effect of training samples under the EPP methodology. The main idea is to substitute the likelihood by a density-normalised version of a power-likelihood in the EPP. Fouskakis et al. (2015) and Fouskakis and Ntzoufras (2016b) studied in detail the PEP priors for the variable selection problem in Gaussian regression models. In the first paper, they introduced the PEP prior by considering as parameters of interest both the coefficients of the model and the error variance, while in the second paper they studied the conditional version of PEP, named PCEP, where they considered only the coefficients as the parameters of interest and the error variance as a common nuisance parameter. Here we focus on the former case. Under this approach, for every model $M_\ell$ in $\mathcal{M}$ (the set of all models under consideration) the sampling distribution is specified by
$$ Y \mid X_\ell, \beta_\ell, \sigma^2_\ell \sim N_n\!\left(X_\ell \beta_\ell,\, \sigma^2_\ell I_n\right), \qquad (1) $$
where $Y$ is a vector containing the responses for all subjects, $X_\ell$ is an $n \times d_\ell$ design matrix containing the values of the explanatory variables in its columns, $I_n$ is the $n \times n$ identity matrix, $\beta_\ell$ is a vector of length $d_\ell$ summarising the effects of the covariates in model $M_\ell$ on the response $Y$, and $\sigma^2_\ell$ is the error variance for model $M_\ell$. Finally, by p we denote the total number of explanatory variables under consideration and by $M_p$ the full model, including all p covariates.
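As a concrete illustration of this sampling distribution (with purely hypothetical sizes and parameter values), the following Python sketch draws one response vector from $N_n(X_\ell \beta_\ell, \sigma^2_\ell I_n)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n observations, a model with 3 columns (intercept + 2 covariates)
n, d = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])  # design matrix
beta = np.array([1.0, 2.0, -1.0])   # illustrative coefficients
sigma2 = 4.0                        # illustrative error variance

# One draw from the sampling distribution N_n(X beta, sigma2 * I_n)
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
```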
Furthermore, we denote by $\pi^N_\ell(\beta_\ell, \sigma^2_\ell)$ the baseline prior for the parameters of model $M_\ell$. Here we use the independence Jeffreys prior (or reference prior) as the baseline prior distribution. Hence, for any $M_\ell$ in $\mathcal{M}$, we have
$$ \pi^N_\ell(\beta_\ell, \sigma^2_\ell) = \frac{c_\ell}{\sigma^2_\ell}, \qquad (2) $$
where $c_\ell$ is an unknown normalising constant.
We assume that in $\mathcal{M}$ there exists a model $M_0$, with parameters $\beta_0$ and $\sigma^2_0$, sampling distribution $f(y \mid \beta_0, \sigma^2_0; M_0)$ and baseline prior $\pi^N_0(\beta_0, \sigma^2_0)$, which is nested into each of the remaining models, and we consider it as a reference model. This is the typical case in the variable selection problem studied in this paper. Given then a set of imaginary data $y^*$ of size $n^*$ and a positive power parameter δ, which is used to regulate, essentially, the contribution of the imaginary data to the 'final' prior, we introduce the density-normalised power-likelihood, under model $M_\ell$, given by
$$ f(y^* \mid \beta_\ell, \sigma^2_\ell, \delta; M_\ell) = \frac{f(y^* \mid \beta_\ell, \sigma^2_\ell; M_\ell)^{1/\delta}}{\int f(y^* \mid \beta_\ell, \sigma^2_\ell; M_\ell)^{1/\delta}\, dy^*}. \qquad (3) $$
The above density-normalised power-likelihood is still a normal distribution, with variance inflated by a factor of δ; in the above, $X^*_\ell$ denotes the imaginary design matrix under model $M_\ell$. In a similar manner, under the reference model, the density-normalised power-likelihood takes the form of (3), but using now the likelihood $f(y^* \mid \beta_0, \sigma^2_0; M_0)$ of $M_0$.
In order to apply the PEP methodology, the density-normalised power-likelihood (3) is used to evaluate, under the imaginary data and the baseline prior, the prior predictive distribution $m^N_0(y^* \mid \delta)$ of the reference model $M_0$, as well as the posterior distribution of the parameters of model $M_\ell$,
$$ \pi^N_\ell(\beta_\ell, \sigma^2_\ell \mid y^*, \delta) = \frac{f(y^* \mid \beta_\ell, \sigma^2_\ell, \delta; M_\ell)\, \pi^N_\ell(\beta_\ell, \sigma^2_\ell)}{m^N_\ell(y^* \mid \delta)}, \qquad (4) $$
where
$$ m^N_\ell(y^* \mid \delta) = \int\!\!\int f(y^* \mid \beta_\ell, \sigma^2_\ell, \delta; M_\ell)\, \pi^N_\ell(\beta_\ell, \sigma^2_\ell)\, d\beta_\ell\, d\sigma^2_\ell \qquad (5) $$
is the prior predictive distribution of model $M_\ell$ for the imaginary data $y^*$.
Finally, the imposed prior for the parameters of any model $M_\ell$ has the following form:
$$ \pi^{PEP}_\ell(\beta_\ell, \sigma^2_\ell \mid \delta) = \int \pi^N_\ell(\beta_\ell, \sigma^2_\ell \mid y^*, \delta)\, m^N_0(y^* \mid \delta)\, dy^*. \qquad (6) $$
The default choice for δ is to set it equal to $n^*$, i.e. the sample size of the imaginary data, so that the overall information of the imaginary data in the posterior is equal to one data point. Furthermore, setting $n^* = n$ and, consequently, the design matrix of the imaginary data $X^*_\ell = X_\ell$ simplifies significantly the overwhelming computations required when considering all possible 'minimal' training samples (Pérez & Berger, 2002), while it also avoids the complicated issue (in some cases) of defining the size of the minimal training samples (Berger & Pericchi, 2004). In addition, under the choice $\delta = n^* = n$, the PEP prior remains relatively non-informative even for models with dimension close to the sample size n, while the effect on the evaluation of each model is minimal, since the resulting Bayes factors are robust over different values of $n^*$. Detailed information about the default specifications of the PEP prior is provided in Fouskakis et al. (2015). Finally, the null model (with no explanatory variables) is a standard choice for the reference model in regression problems; see, for example, Pérez and Berger (2002). In the above definition of the PEP prior, the power parameter can also be model dependent, denoted then by $\delta_\ell$.
Fouskakis and Ntzoufras (2016a) proved the consistency of the Bayes factor when using the PEP methodology, with the independence Jeffreys as a baseline prior, for Gaussian linear models, under very mild conditions on the design matrix, when the dimension of each model is fixed, the size of the training sample is equal to the sample size n and the power parameter is also set equal to n. In a similar manner as in Fouskakis and Ntzoufras (2016a), when comparing the full model $M_p$ to a reduced model $M_\ell$, the Bayes factor under the PEP prior is given by (7), with $RSS_j$ denoting the residual sum of squares of model $M_j$, $j \in \{\ell, p\}$. For large n, we can approximate the Bayes factor given in (7) as (8) if p is a fixed constant, and as (9) if p grows to infinity with n at a suitable rate (for a detailed proof of (8) and (9) see Innocent, 2016).
In the rest of the paper, we denote by $M_T$ the 'true' model and by $H_j$ the hat matrix of model $M_j$ (see Casella, Girón, Martínez, & Moreno, 2009). Since the reduced model $M_\ell$ is nested in the full model $M_p$, we have that $RSS_p \le RSS_\ell$.
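This nesting inequality is easy to verify numerically. The sketch below (with arbitrary simulated data) computes each residual sum of squares through the hat matrix $H_j = X_j (X_j^T X_j)^{-1} X_j^T$ and checks that adding columns can only decrease the residual sum of squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X_full = np.column_stack([np.ones(n), rng.standard_normal((n, 4))])  # full design, 5 columns
X_red = X_full[:, :2]                                                # nested reduced design
y = rng.standard_normal(n)

def rss(X, y):
    """Residual sum of squares via the hat matrix H = X (X'X)^{-1} X'."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y
    return float(resid @ resid)

rss_full, rss_red = rss(X_full, y), rss(X_red, y)
# Nesting implies RSS_p <= RSS_l, so the ratio statistic lies in (0, 1]
ratio = rss_full / rss_red
```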
Finally, the following results hold, as n increases, with respect to the distribution and the limiting behaviour of the statistic $RSS_p / RSS_\ell$ (see Girón, Moreno, & Casella, 2010):
- If $\ell$ and $p$ are fixed:
  - When sampling from model $M_\ell$, the distribution of the statistic $RSS_p / RSS_\ell$ is the central beta distribution $\mathrm{Beta}\!\left(\frac{n-p}{2}, \frac{p-\ell}{2}\right)$, and $RSS_p / RSS_\ell \to 1$ in probability.
  - When sampling from model $M_p$, the distribution of the statistic $RSS_p / RSS_\ell$ is the non-central beta distribution $\mathrm{Beta}\!\left(\frac{n-p}{2}, \frac{p-\ell}{2}; \lambda_n\right)$, and $RSS_p / RSS_\ell$ converges in probability to a limit strictly smaller than 1, with non-centrality parameter $\lambda_n = \beta_p^T X_p^T (I_n - H_\ell) X_p \beta_p / \sigma^2_p$.
- If $\ell$ is fixed and $p$ grows to infinity with n:
  - When sampling from model $M_\ell$, the distribution of the statistic $RSS_p / RSS_\ell$ is again the central beta distribution $\mathrm{Beta}\!\left(\frac{n-p}{2}, \frac{p-\ell}{2}\right)$.
  - When sampling from model $M_p$, the distribution of the statistic is the corresponding non-central beta distribution, where the limiting behaviour now depends on the rate at which p grows with n.
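The central-beta behaviour under the reduced model can be checked by simulation. In the sketch below (dimensions chosen arbitrarily), data are generated from the reduced model and the Monte Carlo mean of $RSS_p/RSS_\ell$ is compared with the mean $(n-p)/(n-\ell)$ of the $\mathrm{Beta}((n-p)/2, (p-\ell)/2)$ distribution, which tends to 1 as n grows with p fixed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, l = 200, 5, 2      # illustrative: sample size, full and reduced dimensions
reps = 2000

def rss(X, y):
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix of the given design
    resid = y - H @ y
    return float(resid @ resid)

ratios = np.empty(reps)
for i in range(reps):
    X_full = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
    X_red = X_full[:, :l]
    # Generate under the reduced model: only the first l coefficients are non-zero
    y = X_red @ np.array([1.0, 0.5]) + rng.standard_normal(n)
    ratios[i] = rss(X_full, y) / rss(X_red, y)

beta_mean = (n - p) / (n - l)    # mean of Beta((n-p)/2, (p-l)/2)
mc_mean = ratios.mean()
```

The Monte Carlo mean should lie very close to $(n-p)/(n-\ell) \approx 0.985$ for these illustrative dimensions, consistent with the statistic concentrating near 1.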
In this paper, we examine the consistency of the Bayes factor, for nested normal linear models, under the PEP methodology, using the pair of models $M_\ell$ and $M_p$. The number of parameters of the simpler model $M_\ell$ is always fixed, while that of the full model $M_p$ is either fixed or grows with n. We investigate the effect of the power parameter δ by examining four different scenarios. In each case, the 'true' model is set equal to either $M_\ell$ or $M_p$.
2. Bayes factor consistency under power-expected-posterior priors
In what follows we set the size of the training sample equal to the sample size n, as in Fouskakis et al. (2015).
2.1. When the power δ = n
First, we consider the case where the power parameter is set equal to the sample size n, and study the consistency when the dimension p of the full model $M_p$ is either a fixed constant or grows to infinity.
Then (7) becomes (10).
2.1.1. When $\ell$ and p are fixed
Theorem 2.1
Let the sample size n increase, with n strictly greater than the dimension of the full model $M_p$. Furthermore, suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, i.e. $d_\ell = \ell$ and $d_p = p$, with $\ell < p$. Then, under mild conditions on the design matrix, when sampling from model $M_j$, where j is either ℓ or p, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent.
Proof.
For δ = n, (8) becomes (11).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (11) becomes (12). Since p and ℓ are constants and n goes to infinity, the Bayes factor tends to zero. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (11) becomes (13), and the Bayes factor tends to infinity as $n \to \infty$. Therefore, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from the full model $M_p$.
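The exact PEP expressions are not reproduced here, but the qualitative content of Theorem 2.1 can be illustrated with the standard BIC approximation to the log Bayes factor, $\log BF_{p\ell} \approx \tfrac{n}{2}\log(RSS_\ell/RSS_p) - \tfrac{p-\ell}{2}\log n$, which is also consistent for fixed dimensions. All numerical choices in this sketch are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, l = 10_000, 5, 2   # illustrative sizes; consistency is an n -> infinity statement

def rss(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def log_bf_bic(X_full, X_red, y):
    """BIC approximation to the log Bayes factor of the full versus the reduced model."""
    n_obs = len(y)
    return 0.5 * (n_obs * np.log(rss(X_red, y) / rss(X_full, y))
                  - (X_full.shape[1] - X_red.shape[1]) * np.log(n_obs))

X_full = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
X_red = X_full[:, :l]

y_red = X_red @ np.array([1.0, 0.5]) + rng.standard_normal(n)                    # M_l true
y_full = X_full @ np.array([1.0, 0.5, 1.0, -1.0, 0.8]) + rng.standard_normal(n)  # M_p true

lbf_red_true = log_bf_bic(X_full, X_red, y_red)    # typically negative: evidence for M_l
lbf_full_true = log_bf_bic(X_full, X_red, y_full)  # typically positive: evidence for M_p
```

Under the reduced model the penalty term dominates the fit term, driving the approximate log Bayes factor below zero, while under the full model the fit term grows linearly in n and dominates the penalty.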
2.1.2. When $\ell$ is fixed and p grows with n
Theorem 2.2
Let δ = n and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size n increases, while in the full model $M_p$ the number of parameters grows with n. Then:
- When sampling from model $M_\ell$, the Bayes factor of $M_p$ versus $M_\ell$ is consistent.
- When sampling from model $M_p$, the Bayes factor is consistent if and only if a condition expressed through a function given in the proof holds.
Proof.
By replacing δ = n in (9), we obtain (14).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (14) becomes, for large values of p, the expression in (15). In both cases, for large p, the Bayes factor tends to zero. Thus the Bayes factor of the full model $M_p$ against the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (14) becomes, for large p, an expression whose limiting behaviour is characterised by solving the corresponding equation for ε. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the full model $M_p$ if and only if the resulting condition holds when r is large and goes to infinity.
2.2. When the power
Second, we consider a second choice for the power parameter and study the consistency when the dimension p of the full model $M_p$ is either a fixed constant or grows to infinity. Then (7) becomes:
2.2.1. When $\ell$ and p are fixed
Let the sample size n increase, with n strictly greater than the dimension of the full model $M_p$. Furthermore, suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, i.e. $d_\ell = \ell$ and $d_p = p$, with $\ell < p$. For this choice of δ, (8) becomes (16), since p and ℓ are fixed constants and n is large. Working as in the proof of Theorem 2.1, we conclude that the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from either model.
2.2.2. When $\ell$ is fixed and p grows with n
Theorem 2.3
Let the power be as above and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size n increases, while in the full model $M_p$ the number of parameters grows with n. Then:
- When sampling from model $M_\ell$, the Bayes factor of $M_p$ versus $M_\ell$ is consistent.
- When sampling from model $M_p$, the Bayes factor is consistent if and only if a condition expressed through a function given in the proof holds.
Proof.
By replacing this choice of δ in (9), we obtain (17).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (17) becomes, for large values of p, an expression that tends to zero in both cases. Thus the Bayes factor of the full model $M_p$ against the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (17) becomes (18) for large p. Working as in the proof of Theorem 2.2, we conclude that the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the full model $M_p$ if and only if the corresponding condition holds when r is large and goes to infinity.
2.3. When the power equals the dimension of the full model
Third, we consider the case where the power δ is set equal to the dimension of the full model $M_p$, and study the consistency when this dimension is either a fixed constant or grows to infinity. Under this set-up, (7) becomes:
2.3.1. When $\ell$ and p are fixed
Theorem 2.4
Let δ be equal to the dimension of the full model, let the sample size n increase, with n strictly greater than the dimension of the full model $M_p$, and suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, i.e. $d_\ell = \ell$ and $d_p = p$, with $\ell < p$. Then the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is inconsistent when sampling from $M_\ell$ and consistent when sampling from $M_p$.
Proof.
For δ = p, (8) becomes (19). Then we consider the following two cases.
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (19) becomes an expression which, since p and ℓ are constants with $\ell < p$, does not converge to zero. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is inconsistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (19) becomes an expression that tends to infinity. Therefore, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from the full model $M_p$.
2.3.2. When $\ell$ is fixed and p grows with n
Theorem 2.5
Let δ be equal to the dimension of the full model and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size n increases, while in the full model $M_p$ the number of parameters grows with n. Then:
- When sampling from model $M_\ell$, the Bayes factor of $M_p$ versus $M_\ell$ is consistent.
- When sampling from model $M_p$, the Bayes factor is consistent if and only if a condition expressed through a function given in the proof holds.
Proof.
By replacing δ = p in (9), we obtain (20).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (20) becomes, for large values of p, the expression in (21). In both cases, for large p, the Bayes factor tends to zero. Thus the Bayes factor of the full model $M_p$ against the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (20) becomes, for large p, an expression whose limiting behaviour is characterised by solving the corresponding equation for ε. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the full model $M_p$ if and only if the resulting condition holds when r is large and goes to infinity.
2.4. When the power is a fixed constant δ
Finally, we consider the case where the power parameter is set equal to a fixed non-negative constant δ, and study the consistency when the dimension p of the full model is either a fixed constant or grows to infinity. Then (7) becomes (22).
2.4.1. When $\ell$ and p are fixed
Theorem 2.6
Let the sample size n increase, with n strictly greater than the dimension of the full model $M_p$. Furthermore, suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, i.e. $d_\ell = \ell$ and $d_p = p$, with $\ell < p$. Then the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from $M_p$, while when sampling from $M_\ell$ it is consistent only for large values of δ.
Proof.
For fixed δ, (8) becomes (23). Then we consider the following two cases.
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (23) becomes an expression whose limit depends on δ: since p and ℓ are constants with $\ell < p$, the Bayes factor tends to zero when δ is large, but not otherwise. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$ only for large values of δ.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (23) becomes an expression that tends to infinity. Therefore, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from the full model $M_p$.
2.4.2. When $\ell$ is fixed and p grows with n
Theorem 2.7
Let δ be a fixed non-negative constant and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size n increases, while in the full model $M_p$ the number of parameters grows with n. Then:
- When sampling from model $M_\ell$, the limiting behaviour of the Bayes factor is governed by a continuous and decreasing function of δ.
- When sampling from model $M_p$, the limiting behaviour of the Bayes factor is governed by a continuous function of δ.
Proof.
By replacing the fixed constant δ in (9), we obtain (24).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (24) becomes an expression whose limit we examine case by case: in the first case the limit holds for any r > 1, while in the second case it is obtained for large values of p. Thus, the Bayes factor of the full model versus the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$ if and only if the power δ is sufficiently large.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (24) becomes an expression whose limit we again examine case by case: in the first case the Bayes factor diverges for large values of δ but not when δ is small, while in the second case, for large values of p, the limit follows under the corresponding condition. Thus, the Bayes factor of the full model versus the reduced model $M_\ell$ is inconsistent under the full model $M_p$ either under the first condition, or under the second condition when δ is small.
3. Summary and conclusions
In this paper, we examined the asymptotic behaviour of the power-expected-posterior methodology when comparing nested normal linear models. Emphasis was given on the consistency of the Bayes factor of the full model versus a generic submodel
. The number of parameters of the simplest model
was kept always fixed, while for the full model was set of order
where
. We investigated the effect of the prior power parameter
, by examining four different scenarios. In each case, the ‘true’ model was set equal to either
or
. Tables – summarise our findings.
Table 1. Consistency of the Bayes factor of the full model versus the reduced model.
Table 2. Consistency of the Bayes factor of the full model versus the reduced model.
Table 3. Consistency of the Bayes factor of the full model versus the reduced model.
The consistency properties of the power-expected-posterior (PEP) prior Bayes factors are eminently reasonable, assuming that we are sampling from either of the candidate models. The Bayes factor is always consistent for fixed dimensions of the candidate models; even in the difficult situation in which the dimension of the alternative model grows with the sample size, for the situations described in Tables 1–3 the PEP Bayes factor is consistent, unless the alternative model is extremely close to the null model. In that case, we conjecture, the lack of consistency is not a critical issue, at least for prediction purposes.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
D. Fouskakis
D. Fouskakis is an Associate Professor in the Department of Mathematics, at the National Technical University of Athens, in Greece. He is also the Director of the Stats Lab at the same University. His research mostly focuses on Bayesian model and variable selection, on objective priors and on stochastic optimization methods.
J. K. Innocent
J. K. Innocent received a Ph.D. in Mathematics from the University of Puerto Rico, Puerto Rico, USA, in 2016. He has since returned to Haiti, where he teaches mathematics and statistics courses at university level. His main research areas are Bayesian statistics, statistical analysis, biostatistics and epidemiology.
L. Pericchi
L. Pericchi is a Full Professor in the Department of Mathematics of the University of Puerto Rico Rio Piedras, USA. He is also the Director of the Center of Biostatistics and Bioinformatics of the College of Natural Sciences. His research is in the Theory and Applications of Statistics, with emphasis in the Bayesian Approach.
References
- Berger, J., & Pericchi, L. (2004). Training samples in objective Bayesian model selection. Annals of Statistics, 32, 841–869. doi: 10.1214/009053604000000229
- Casella, G., Girón, F. J., Martínez, M. L., & Moreno, E. (2009). Consistency of Bayesian procedures for variable selection. Annals of Statistics, 37, 1207–1228. doi: 10.1214/08-AOS606
- Fouskakis, D., & Ntzoufras, I. (2016a). Limiting behaviour of the Jeffreys power-expected-posterior Bayes factor in Gaussian linear models. Brazilian Journal of Probability and Statistics, 30, 299–320. doi: 10.1214/15-BJPS281
- Fouskakis, D., & Ntzoufras, I. (2016b). Power-conditional-expected priors: Using g-priors with random imaginary data for variable selection power-conditional-expected priors. Journal of Computational and Graphical Statistics, 25, 647–664. doi: 10.1080/10618600.2015.1036996
- Fouskakis, D., Ntzoufras, I., & Draper, D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Analysis, 10, 75–107. doi: 10.1214/14-BA887
- Girón, F. J., Moreno, E., & Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Annals of Statistics, 38, 1937–1952. doi: 10.1214/09-AOS754
- Ibrahim, J. G., & Chen, M. H. (2000). Power prior distributions for regression models. Statistical Science, 15, 46–60. doi: 10.1214/ss/1009212673
- Innocent, J. K. (2016). Bayes factors consistency for nested linear models with increasing dimensions (Unpublished doctoral dissertation). University of Puerto Rico.
- Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934. doi: 10.1080/01621459.1995.10476592
- Pérez, J. M., & Berger, J. O. (2002). Expected-posterior prior distributions for model selection. Biometrika, 89, 491–512. doi: 10.1093/biomet/89.3.491