Abstract
The power-expected-posterior (PEP) prior is used in this paper to compare nested linear models. The asymptotic behaviour of the method is investigated for different values of the power parameter of the prior. Focus is given to the consistency of the Bayes factor when comparing the full model $M_p$ with a generic submodel $M_\ell$. In each case, we allow the true generating model to be either $M_p$ or $M_\ell$, and we keep the dimension of $M_\ell$ fixed, while the dimension of $M_p$ can be either fixed or grow with $n$, where $n$ denotes the sample size.
1. Introduction
Pérez and Berger (2002) developed priors for objective Bayesian model comparison through the device of 'imaginary training samples'. The expected-posterior prior (EPP) for the parameters of a model $M_\ell$ is the expectation of the posterior distribution of those parameters given imaginary observations $y^*$ of size $n^*$. The expectation is taken with respect to a suitable probability measure of a reference model $M_0$, while the posterior distribution is computed via Bayes's theorem starting from a default, typically improper, prior. One of the advantages of using EPPs is that the impropriety of the baseline priors causes no indeterminacy in the computation of Bayes factors. On the other hand, EPPs depend on the training sample size $n^*$; in variable selection problems in particular, imaginary design matrices must also be introduced under each competing model, and the resulting prior therefore further depends on this choice (for a detailed discussion of this issue, see Fouskakis, Ntzoufras, & Draper, 2015). The selection of a minimal training sample has been proposed (see, for example, Berger & Pericchi, 2004) to make the information content of the prior as small as possible, and this is an appealing idea. But even under this set-up, the resulting prior can be influential when the sample size $n$ is not much larger than the total number of parameters of the full model (see Fouskakis et al., 2015).
The power-expected-posterior (PEP) prior, introduced by Fouskakis et al. (2015), is an objective prior which amalgamates ideas from the power prior (Ibrahim & Chen, 2000), the expected-posterior prior (Pérez & Berger, 2002) and the unit-information-prior approach of Kass and Wasserman (1995) to simultaneously (a) produce a minimally informative prior and (b) diminish the effect of training samples under the EPP methodology. The main idea is to substitute the likelihood in the EPP by a density-normalised version of a power-likelihood. Fouskakis et al. (2015) and Fouskakis and Ntzoufras (2016b) studied the PEP priors in detail for the variable selection problem in Gaussian regression models. In the first paper, they introduced the PEP prior by considering as parameters of interest both the coefficients of the model and the error variance, while in the second paper they studied the conditional version of PEP, named PCEP, where only the coefficients are treated as parameters of interest and the error variance as a common nuisance parameter. Here we focus on the former case. Under this approach, for every model $M_\ell$ in $\mathcal{M}$ (the set of all models under consideration) the sampling distribution is specified by (1) $Y_n \mid X_\ell, \beta_\ell, \sigma^2_\ell; M_\ell \sim N_n(X_\ell \beta_\ell,\, \sigma^2_\ell I_n)$, where $Y_n$ is a vector containing the responses for all $n$ subjects, $X_\ell$ is an $n \times k_\ell$ design matrix containing the values of the explanatory variables of $M_\ell$ in its columns, $I_n$ is the $n \times n$ identity matrix, $\beta_\ell$ is a vector of length $k_\ell$ summarising the effects of the covariates in model $M_\ell$ on the response, and $\sigma^2_\ell$ is the error variance for model $M_\ell$. Finally, by $p$ we denote the total number of explanatory variables under consideration and by $M_p$ the full model, including all $p$ covariates.
Furthermore, we denote by $\pi^N_\ell(\beta_\ell, \sigma^2_\ell)$ the baseline prior for the parameters of model $M_\ell$. Here we use the independence Jeffreys prior (or reference prior) as the baseline prior distribution. Hence, for any $M_\ell \in \mathcal{M}$, we have (2) $\pi^N_\ell(\beta_\ell, \sigma^2_\ell) = c_\ell / \sigma^2_\ell$, where $c_\ell$ is an unknown normalising constant.
We assume that in $\mathcal{M}$ there exists a model $M_0$, with parameters $\beta_0$ and $\sigma^2_0$, sampling distribution of the form (1) and baseline prior of the form (2), which is nested in each of the remaining models, and we consider it as the reference model. This is the typical case in the variable selection problem studied in this paper. Given then a set of imaginary data $y^*$ of size $n^*$ and a positive power parameter δ, used essentially to regulate the contribution of the imaginary data to the 'final' prior, we introduce the density-normalised power-likelihood under model $M_\ell$, given by (3) $f(y^* \mid \beta_\ell, \sigma^2_\ell, \delta; M_\ell) = \dfrac{f(y^* \mid \beta_\ell, \sigma^2_\ell; M_\ell)^{1/\delta}}{\int f(y^* \mid \beta_\ell, \sigma^2_\ell; M_\ell)^{1/\delta}\, dy^*} = N_{n^*}(X^*_\ell \beta_\ell,\, \delta\,\sigma^2_\ell I_{n^*})$. The above density-normalised power-likelihood is still a normal distribution, with variance inflated by a factor of δ; here $X^*_\ell$ denotes the imaginary design matrix under model $M_\ell$. In a similar manner, under the reference model, the density-normalised power-likelihood takes the form of (3), but using now the likelihood of $M_0$.
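The variance-inflation property just stated can be checked numerically in the one-dimensional case. The following sketch (plain NumPy; the values of μ, σ and δ are arbitrary illustrations, not taken from the paper) raises a normal density to the power 1/δ, renormalises it on a grid, and compares the result with the $N(\mu, \delta\sigma^2)$ density.

```python
import numpy as np

mu, sigma, delta = 1.5, 2.0, 4.0  # illustrative values only

# Wide, fine grid so that the numerical normalisation is accurate
y = np.linspace(mu - 40.0, mu + 40.0, 200001)
dy = y[1] - y[0]

def normal_pdf(x, m, s2):
    return np.exp(-(x - m) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

# Power-likelihood f(y)^{1/delta}, density-normalised on the grid
powered = normal_pdf(y, mu, sigma ** 2) ** (1.0 / delta)
powered /= powered.sum() * dy

# Claimed closed form: a normal density with variance inflated by delta
target = normal_pdf(y, mu, delta * sigma ** 2)

print(bool(np.max(np.abs(powered - target)) < 1e-6))  # prints: True
```

The agreement is exact up to numerical integration error, since raising the Gaussian kernel to the power 1/δ multiplies the exponent's denominator by δ, and the normalising constant is restored by the division.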
In order to apply the PEP methodology, the density-normalised power-likelihood (3) is used to evaluate, under the imaginary data and the baseline prior, the prior predictive distribution of each model, as well as the posterior distribution of the parameters of model $M_\ell$, given in (4), where (5) is the prior predictive distribution of model $M_\ell$ for $y^*$.
Finally, the imposed prior for the parameters of any model $M_\ell \in \mathcal{M}$ has the form given in (6). The default choice for δ is to set it equal to $n^*$, i.e. the sample size of the imaginary data, so that the overall information of the imaginary data in the posterior is equal to one data point. Furthermore, setting $n^* = n$ and, consequently, $X^*_\ell = X_\ell$ simplifies significantly the overwhelming computations required when considering all possible 'minimal' training samples (Pérez & Berger, 2002), while it also avoids the complicated issue (in some cases) of defining the size of the minimal training samples (Berger & Pericchi, 2004). In addition, under the choice $n^* = n$, the PEP prior remains relatively non-informative even for models with dimension close to the sample size $n$, while the effect on the evaluation of each model is minimal, since the resulting Bayes factors are robust over different values of $n^*$. Detailed information about the default specifications of the PEP prior is provided in Fouskakis et al. (2015). Finally, the null model (with no explanatory variables) is a standard choice for the reference model in regression problems; see, for example, Pérez and Berger (2002). In the above definition of the PEP prior, the power parameter can also be model dependent, denoted then by $\delta_\ell$.
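For concreteness, the construction described above can be written out. The following is a sketch of the standard PEP formulas, in the form given by Fouskakis et al. (2015), using the notation introduced here (with $\theta_\ell = (\beta_\ell, \sigma^2_\ell)$); it stands in for the equations (4)–(6) referenced in the text.

```latex
% Posterior of the parameters of M_l given the imaginary data y* (cf. Eq. (4)):
\pi^{N}_{\ell}(\theta_\ell \mid y^{*}; \delta)
   = \frac{f(y^{*} \mid \theta_\ell, \delta; M_\ell)\, \pi^{N}_{\ell}(\theta_\ell)}
          {m^{N}_{\ell}(y^{*} \mid \delta)},
\quad \text{with prior predictive (cf. Eq. (5))}
\quad m^{N}_{\ell}(y^{*} \mid \delta)
   = \int f(y^{*} \mid \theta_\ell, \delta; M_\ell)\, \pi^{N}_{\ell}(\theta_\ell)\, d\theta_\ell .

% PEP prior: the posterior above, averaged over imaginary data generated
% from the prior predictive of the reference model M_0 (cf. Eq. (6)):
\pi^{PEP}_{\ell}(\theta_\ell)
   = \int \pi^{N}_{\ell}(\theta_\ell \mid y^{*}; \delta)\, m^{N}_{0}(y^{*} \mid \delta)\, dy^{*} .
```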
Fouskakis and Ntzoufras (2016a) proved the consistency of the Bayes factor under the PEP methodology, with the independence Jeffreys as baseline prior, for Gaussian linear models, under very mild conditions on the design matrix, when the dimension of each model is fixed, the size of the training sample is equal to the sample size $n$ and the power parameter is also set equal to $n$. In a similar manner as in Fouskakis and Ntzoufras (2016a), when comparing the full model $M_p$ to a reduced model $M_\ell$, the Bayes factor under the PEP prior is given by (7), with $RSS_j$ denoting the residual sum of squares of model $M_j$ ($j \in \{\ell, p\}$). For large $n$, we can approximate the Bayes factor given in (7) as in (8) if $p$ is a fixed constant, and as in (9) if $p$ increases to infinity with $n$, at a suitable rate (for a detailed proof of (8) and (9) see Innocent, 2016).
In the rest of the paper, we denote by $M_T$ the 'true' model and by $H_j = X_j (X_j^\top X_j)^{-1} X_j^\top$ the hat matrix of model $M_j$ (see Casella, Girón, Martínez, & Moreno, 2009). Since the reduced model $M_\ell$ is nested in the full model $M_p$, we have that $RSS_p \le RSS_\ell$.
Finally, the following results hold, as $n$ increases, with respect to the distribution and the limiting behaviour of the statistic (see Girón, Moreno, & Casella, 2010):
If and :
When sampling from model , the distribution of the statistic is the central beta distribution and
When sampling from model , the distribution of the statistic is the non-central beta distribution and with
If and with
When sampling from model , the distribution of the statistic is the central beta distribution and
When sampling from model the distribution of the statistic is the non-central beta distribution and where
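The limiting behaviour described above can be illustrated by simulation. The sketch below assumes the statistic in question is the ratio $RSS_p/RSS_\ell$ (as in Girón et al., 2010); for nested Gaussian linear models this ratio follows, under the reduced model, a $\mathrm{Beta}\big((n-p)/2,\,(p-\ell)/2\big)$ distribution with mean $(n-p)/(n-\ell)$, which tends to 1 as $n$ grows. The model dimensions and coefficients are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ell, p = 200, 2, 6  # illustrative sample size and model dimensions

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ beta_hat) ** 2))

X = rng.standard_normal((n, p))
X[:, 0] = 1.0  # intercept column
beta_reduced = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0])  # only M_ell's effects non-zero

ratios = []
for _ in range(2000):
    y = X @ beta_reduced + rng.standard_normal(n)  # data generated under M_ell
    ratios.append(rss(X, y) / rss(X[:, :ell], y))  # RSS_p / RSS_ell
ratios = np.array(ratios)

# Under M_ell: RSS_p/RSS_ell ~ Beta((n-p)/2, (p-ell)/2), mean (n-p)/(n-ell)
print(round(float(ratios.mean()), 3), round((n - p) / (n - ell), 3))
```

Repeating the experiment with data generated under the full model shows the ratio concentrating strictly below 1, in line with the non-central case described above.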
In this paper, we examine the consistency of the Bayes factor for nested normal linear models under the PEP methodology, using the pair of models $M_\ell$ and $M_p$. The number of parameters of the simpler model $M_\ell$ is always fixed, while that of the full model $M_p$ is allowed to grow with the sample size. We investigate the effect of the power parameter δ by examining four different scenarios. In each case, the 'true' model is set equal to either $M_\ell$ or $M_p$.
2. Bayes factor consistency under power-expected-posterior priors
In what follows we set the size of the training sample equal to the sample size $n$, as in Fouskakis et al. (2015).
2.1. When the power δ = n
First, we consider the case where the power parameter is set equal to the sample size $n$, and study the consistency when the dimension $p$ of the full model is either a fixed constant or grows to infinity.
Then (7) becomes (10).
2.1.1. When the dimensions of both models are fixed
Theorem 2.1
Let the sample size $n$ increase, remaining strictly greater than the dimension of the full model $M_p$. Furthermore, suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, with $\ell < p$. Then, under the stated condition, when sampling from model $M_j$, where $j$ is either $\ell$ or $p$, we have:
Proof.
For $\delta = n$, (8) becomes (11).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (11) becomes (12). Since $p$ and $\ell$ are constants and $n$ goes to infinity, the stated limit follows. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (11) becomes (13), and the stated limit follows as $n \to \infty$. Therefore, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from the full model $M_p$.
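The two cases of the proof can be mimicked numerically. Since the exact PEP expressions are not reproduced here, the sketch below uses a BIC-type surrogate for $2\log BF_{p\ell}$, namely $n\log(RSS_\ell/RSS_p) - (p-\ell)\log n$ (an assumption of this illustration: it shares the leading-order behaviour of consistent Bayes factors for fixed dimensions, not the exact constants of (11)). Under $M_\ell$ the first term stays bounded in probability while the penalty grows, so the surrogate diverges to $-\infty$; under $M_p$ the first term grows linearly in $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
ell, p = 2, 6  # illustrative fixed model dimensions

def log_bf_surrogate(n, beta_true):
    """BIC-type surrogate for 2*log BF(M_p vs M_ell): n*log(RSS_ell/RSS_p) - (p-ell)*log(n)."""
    X = rng.standard_normal((n, p))
    X[:, 0] = 1.0  # intercept column
    y = X @ beta_true + rng.standard_normal(n)
    def rss(Z):
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        return float(np.sum((y - Z @ b) ** 2))
    return n * np.log(rss(X[:, :ell]) / rss(X)) - (p - ell) * np.log(n)

beta_reduced = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0])  # M_ell true
beta_full = np.array([1.0, 0.5, 0.5, 0.5, 0.5, 0.5])     # M_p true

for n in (100, 1000, 10000):
    print(n,
          round(log_bf_surrogate(n, beta_reduced), 1),  # drifts to -infinity: against M_p
          round(log_bf_surrogate(n, beta_full), 1))     # grows to +infinity: for M_p
```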
2.1.2. When the dimension of the full model grows with $n$
Theorem 2.2
Let $\delta = n$ and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size $n$ increases, while in the full model $M_p$ the number of parameters increases with $n$ at the stated rate. Then:
When sampling from model
When sampling from model for some function given by .
Proof.
By substituting these choices of δ and $p$ into (9), we obtain (14).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (14) becomes the expression above, so that for large values of $p$ we have (15). In both cases, for large $p$, we get the stated limit. Thus the Bayes factor of the full model $M_p$ against the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (14) becomes the expression above, so that for large $p$ the stated approximation holds. Solving the resulting equation for ε and using the function defined in the statement of the theorem, we conclude that the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the full model $M_p$ if and only if the stated condition holds, when $r$ is large and goes to infinity.
2.2. When the power
Second, we consider the case where the power parameter takes the value specified above, and study the consistency when the dimension $p$ of the full model is either a fixed constant or grows to infinity. Then (7) becomes:
2.2.1. When the dimensions of both models are fixed
Let the sample size $n$ increase, remaining strictly greater than the dimension of the full model $M_p$. Furthermore, suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, with $\ell < p$.
For this choice of δ, (8) becomes an expression which, since $p$ and $\ell$ are fixed constants, reduces for large values of $n$ to (16). Working as in the proof of Theorem 2.1, we conclude that the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from either model.
2.2.2. When the dimension of the full model grows with $n$
Theorem 2.3
Let δ be as specified above and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size $n$ increases, while in the full model $M_p$ the number of parameters increases with $n$ at the stated rate. Then:
When sampling from model
When sampling from model for some function given by
Proof.
By substituting these choices of δ and $p$ into (9), we obtain (17).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (17) becomes the expression above, so that for large values of $p$ the stated approximation holds. In both cases, for large $p$, we get the stated limit. Thus the Bayes factor of the full model $M_p$ against the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (17) becomes (18), so that for large $p$ the stated approximation holds. Working as in the proof of Theorem 2.2, we conclude that the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the full model $M_p$ if and only if the stated condition holds, when $r$ is large and goes to infinity.
2.3. When the power δ equals the dimension of the full model
Third, we consider the case where the power is equal to the dimension of the full model $M_p$, and study the consistency when the dimension of the full model is either a fixed constant or grows to infinity.
Under this set-up, (7) becomes:
2.3.1. When the dimensions of both models are fixed
Theorem 2.4
Let δ equal the dimension of the full model, let the sample size $n$ increase, remaining strictly greater than the dimension of $M_p$, and suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, with $\ell < p$. Then, when sampling from model $M_j$, where $j$ is either $\ell$ or $p$, we have:
Proof.
For this choice of δ, (8) becomes (19). Then we consider the following two cases.
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (19) becomes the expression above. Since $p$ and $\ell$ are constants, the stated limit follows. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is inconsistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (19) becomes the expression above, and the stated limit follows. Therefore, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from the full model $M_p$.
2.3.2. When the dimension of the full model grows with $n$
Theorem 2.5
Let δ equal the dimension of the full model and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size $n$ increases, while in the full model $M_p$ the number of parameters increases with $n$ at the stated rate. Then:
When sampling from model
When sampling from model for some function given by .
Proof.
By substituting these choices of δ and $p$ into (9), we obtain (20).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (20) becomes the expression above, so that for large values of $p$ we have (21). In both cases, for large $p$, we get the stated limit. Thus the Bayes factor of the full model $M_p$ against the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (20) becomes the expression above, so that for large $p$ the stated approximation holds. Solving the resulting equation for ε and using the function defined in the statement of the theorem, we conclude that the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the full model $M_p$ if and only if the stated condition holds, when $r$ is large and goes to infinity.
2.4. When the power δ is a fixed constant
Finally, we consider the case where the power parameter is set equal to a fixed non-negative constant δ, and study the consistency when the dimension $p$ of the full model is either a fixed constant or grows to infinity.
Then (7) becomes (22).
2.4.1. When the dimensions of both models are fixed
Theorem 2.6
Let the sample size $n$ increase, remaining strictly greater than the dimension of the full model $M_p$. Furthermore, suppose that the dimensions of both models under consideration are fixed non-negative natural numbers, with $\ell < p$. Then, under the stated condition, when sampling from model $M_j$, where $j$ is either $\ell$ or $p$, we have:
Proof.
For fixed δ, (8) becomes (23). Then we consider the following two cases.
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (23) becomes the expression above. Since $p$ and $\ell$ are constants, if δ is large we obtain the first stated limit, while if δ is not large we obtain the second. Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$ only for large values of δ.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (23) becomes the expression above, and the stated limit follows. Therefore, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent when sampling from the full model $M_p$.
2.4.2. When the dimension of the full model grows with $n$
Theorem 2.7
Let δ be a fixed constant and suppose that the reduced model $M_\ell$ has a fixed number of parameters as the sample size $n$ increases, while in the full model $M_p$ the number of parameters increases with $n$ at the stated rate. Then:
When sampling from model for a continuous and decreasing function .
When sampling from model for a continuous function
Proof.
By substituting these choices of δ and $p$ into (9), we obtain (24).
(a) Suppose that the Reduced Model is true
Using the asymptotic results given in Section 1, (24) becomes the expression above. We consider the following cases:
If then Thus for any r>1
If for large values of p we get Then if
Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is consistent under the reduced model $M_\ell$ if and only if the power δ satisfies the stated condition.
(b) Suppose that the Full Model is true
Using the asymptotic results given in Section 1, (24) becomes the expression above. We consider the following cases:
If then and for large values of δ we have while if δ is not large
If , for large value p we have Then if
Thus, the Bayes factor of the full model $M_p$ versus the reduced model $M_\ell$ is inconsistent under the full model $M_p$ either in the first case, or in the second case when δ is small.
3. Summary and conclusions
In this paper, we examined the asymptotic behaviour of the power-expected-posterior methodology when comparing nested normal linear models. Emphasis was given to the consistency of the Bayes factor of the full model $M_p$ versus a generic submodel $M_\ell$. The number of parameters of the simpler model $M_\ell$ was always kept fixed, while that of the full model $M_p$ was allowed to grow with the sample size. We investigated the effect of the prior power parameter δ by examining four different scenarios. In each case, the 'true' model was set equal to either $M_\ell$ or $M_p$. Tables 1–3 summarise our findings.
Table 1. Consistency of when model has dimension and .
Table 2. Consistency of when model has dimension and .
Table 3. Consistency of when model has dimension and .
The consistency properties of the power-expected-posterior (PEP) prior Bayes factors are eminently reasonable, assuming that we are sampling from one of the candidate models. The Bayes factor is always consistent for fixed dimensions of the candidate models; and even in the difficult situation in which the alternative model can grow with the sample size, in the settings described in Tables 1–3 the PEP Bayes factor is consistent, unless the alternative model is extremely close to the null model, in which case, we conjecture, the lack of consistency is not a critical issue, at least for prediction purposes.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
D. Fouskakis
D. Fouskakis is an Associate Professor in the Department of Mathematics, at the National Technical University of Athens, in Greece. He is also the Director of the Stats Lab at the same University. His research mostly focuses on Bayesian model and variable selection, on objective priors and on stochastic optimization methods.
J. K. Innocent
J. K. Innocent received a Ph.D. in Mathematics at the University of Puerto Rico, USA, in 2016. He is currently back in Haiti, where he teaches mathematics and statistics courses at university level. His main research areas are Bayesian statistics, statistical analysis, biostatistics and epidemiology.
L. Pericchi
L. Pericchi is a Full Professor in the Department of Mathematics of the University of Puerto Rico, Río Piedras, USA. He is also the Director of the Center of Biostatistics and Bioinformatics of the College of Natural Sciences. His research is in the theory and applications of statistics, with emphasis on the Bayesian approach.
References
- Berger, J., & Pericchi, L. (2004). Training samples in objective Bayesian model selection. Annals of Statistics, 32, 841–869. doi: 10.1214/009053604000000229
- Casella, G., Girón, F. J., Martínez, M. L., & Moreno, E. (2009). Consistency of Bayesian procedures for variable selection. Annals of Statistics, 37, 1207–1228. doi: 10.1214/08-AOS606
- Fouskakis, D., & Ntzoufras, I. (2016a). Limiting behaviour of the Jeffreys power-expected-posterior Bayes factor in Gaussian linear models. Brazilian Journal of Probability and Statistics, 30, 299–320. doi: 10.1214/15-BJPS281
- Fouskakis, D., & Ntzoufras, I. (2016b). Power-conditional-expected priors: Using g-priors with random imaginary data for variable selection power-conditional-expected priors. Journal of Computational and Graphical Statistics, 25, 647–664. doi: 10.1080/10618600.2015.1036996
- Fouskakis, D., Ntzoufras, I., & Draper, D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Analysis, 10, 75–107. doi: 10.1214/14-BA887
- Girón, F. J., Moreno, E., & Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Annals of Statistics, 38, 1937–1952. doi: 10.1214/09-AOS754
- Ibrahim, J. G., & Chen, M. H. (2000). Power prior distributions for regression models. Statistical Science, 15, 46–60. doi: 10.1214/ss/1009212673
- Innocent, J. K. (2016). Bayes factors consistency for nested linear models with increasing dimensions (Unpublished doctoral dissertation). University of Puerto Rico.
- Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934. doi: 10.1080/01621459.1995.10476592
- Pérez, J. M., & Berger, J. O. (2002). Expected-posterior prior distributions for model selection. Biometrika, 89, 491–512. doi: 10.1093/biomet/89.3.491