A discussion of prior-based Bayesian information criterion (PBIC)

Pages 17-18 | Received 12 Feb 2019, Accepted 13 Feb 2019, Published online: 13 Mar 2019

Professor Bayarri and coauthors' paper (hereafter, PBIC) offers a stimulating and welcome addition to the already extensive, and still rapidly expanding, literature on model selection and related topics. In a 2013 review of model selection in linear mixed models, Müller, Scealy, and Welsh (Citation2013) classified the main approaches to mixed model selection into three categories: information criteria, shrinkage methods, and fence methods. The current paper is not specifically about mixed model selection problems; however, as one shall see, the two are connected in various ways.

The paper focuses on a special case of the information criteria, namely the Bayesian information criterion (BIC) and its extensions. In this regard, two other references may be mentioned, in addition to those cited by the authors. One is the δ-BIC method of Broman and Speed (Citation2002), in which the logarithmic penalty is multiplied by a tuning constant, δ, to improve finite-sample performance; the other is the extended BIC proposed by Chen and Chen (Citation2008), which allows the number of covariates to increase with the sample size.
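For concreteness, a minimal sketch of how these penalties differ is given below, assuming a maximised log-likelihood `loglik`, `p` free parameters, sample size `n` and `P` candidate covariates; the function names and signatures are my own, not the cited authors'.

```python
import math

def bic(loglik, p, n):
    """Standard BIC: -2 * log-likelihood + p * log(n)."""
    return -2.0 * loglik + p * math.log(n)

def delta_bic(loglik, p, n, delta):
    """delta-BIC in the spirit of Broman & Speed (2002): the log(n)
    penalty is multiplied by a tuning constant delta; delta = 1
    recovers the ordinary BIC."""
    return -2.0 * loglik + delta * p * math.log(n)

def extended_bic(loglik, p, n, P, gamma=1.0):
    """Extended BIC in the spirit of Chen & Chen (2008): an extra term
    penalises the size of the class of models that use p of the P
    candidate covariates; gamma in [0, 1], with gamma = 0 giving BIC."""
    return (-2.0 * loglik + p * math.log(n)
            + 2.0 * gamma * math.log(math.comb(P, p)))
```

As the text below notes, the finite-sample behaviour of `delta_bic` depends heavily on the choice of δ, which is exactly the kind of tuning question that the fence methods discussed later address in a data-driven way.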

The current paper notes a number of problems with the general use of BIC. Similar observations were made, regarding not just BIC but information criteria in general, by Jiang, Rao, Gu, and Nguyen (Citation2008) in the context of mixed model selection. Among the problems mentioned in both papers is the so-called effective sample size (ESS). The issue arose naturally in Jiang et al. (Citation2008) because those authors were concerned with correlated observations. Intuitively, when the data are correlated, the ESS is smaller than the total number of observations, owing to the 'redundancy' in the data: each data point does not bring as much new information as an independent data point would. Consider an extreme case in which $n$ data points are so correlated that they are identical; obviously, the ESS in this case should be 1, rather than $n$. Another example, given in Jiang et al. (Citation2008) (also see Jiang & Nguyen, Citation2015), is a linear mixed model, which may be viewed as a two-way extension of the group mean model discussed extensively in PBIC. In this linear mixed model, the observations $y_{ij}$ satisfy $y_{ij} = x_{ij}'\beta + u_i + v_j + e_{ij}$, $i = 1, \dots, m_1$, $j = 1, \dots, m_2$, where $x_{ij}$ is a vector of known covariates, $\beta$ is a vector of unknown regression coefficients (the fixed effects), $u_i, v_j$ are random effects, and $e_{ij}$ is an additional error. It is assumed that the $u_i$'s, $v_j$'s and $e_{ij}$'s are independent, with $u_i \sim N(0, \sigma_u^2)$, $v_j \sim N(0, \sigma_v^2)$, and $e_{ij} \sim N(0, \sigma_e^2)$. It is well known (e.g., Hartley & Rao, Citation1967; Harville, Citation1977; Miller, Citation1977) that, in this case, the ESS for estimating $\sigma_u^2$ and $\sigma_v^2$ is not the total sample size, $n = m_1 m_2$, but $m_1$ and $m_2$, respectively. Now suppose that one wishes to select the fixed covariates, which are components of $x_{ij}$, under the assumed model structure using BIC. It is not clear what should take the place of $n$ in the $\log(n)$ penalty (it does not make sense to let $n = m_1 m_2$). Note that $m_1$ and $m_2$, as the ESS for estimating $\sigma_u^2$ and $\sigma_v^2$ respectively, have an intuitive interpretation: they are the numbers of appearances of the $u_i$'s and $v_j$'s, respectively, in the model. In general, the ESS for correlated data is somewhere between 1 and the sample size $n$ (this is also noted in PBIC), but exact quantification of the ESS is difficult. In PBIC, the authors consider independent, rather than dependent, data; still, they show that the ESS issue arises when it comes to estimating different parameters. More importantly, the authors are able to quantify the ESS, in a certain way. I wonder whether this quantification has some general, intuitive explanation, as in the special examples discussed above. By the way, the notation $n_i^e$, used to denote the ESS for estimating the $i$th group mean in the group mean model, might cause some confusion, as it can be read as the $e$th power of $n_i$; perhaps $n_{e,i}$ would be a better notation?
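To make the ESS point concrete, here is a small Monte Carlo sketch (my own illustration, not taken from either paper), assuming no fixed effects and a known error variance for simplicity: two designs with the same $n = m_1 m_2$ but different $m_1$ yield very different precision for estimating $\sigma_u^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m1, m2, sd_u=1.0, sd_v=1.0, sd_e=1.0):
    """Simulate the two-way crossed random-effects model
    y_ij = u_i + v_j + e_ij (fixed effects omitted for simplicity)."""
    u = rng.normal(0.0, sd_u, size=(m1, 1))
    v = rng.normal(0.0, sd_v, size=(1, m2))
    e = rng.normal(0.0, sd_e, size=(m1, m2))
    return u + v + e

def mom_sigma_u2(y, var_e=1.0):
    """Moment estimator of sigma_u^2: each row mean equals
    u_i + vbar + ebar_i, so the sample variance of the row means
    estimates sigma_u^2 + sigma_e^2 / m2 (vbar is common to all rows
    and drops out of the variance)."""
    m1, m2 = y.shape
    return y.mean(axis=1).var(ddof=1) - var_e / m2

# Same total sample size n = 10,000, very different precision:
for m1, m2 in [(10, 1000), (1000, 10)]:
    est = [mom_sigma_u2(simulate(m1, m2)) for _ in range(200)]
    print(f"m1={m1:4d}, m2={m2:4d}, n={m1 * m2}: "
          f"SD of estimator = {np.std(est):.3f}")
```

The Monte Carlo standard deviation shrinks roughly like $1/\sqrt{m_1}$, not $1/\sqrt{n}$, in line with the interpretation that the number of appearances of the $u_i$'s governs the information about $\sigma_u^2$.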

Another problem, noted both in PBIC and in Jiang et al. (Citation2008), is how to reasonably count the number of (free) parameters, or the degrees of freedom associated with the parameters. In this regard, Ye (Citation1998) introduced the generalised degrees of freedom, which, notably, need not be an integer. This is similar to the ESS, which can also be non-integer. A rough sketch of Ye's measure is given below.
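The following is a minimal perturbation sketch of Ye's measure, under my own choice of interface (a generic `fit_predict` routine that maps a response vector to its fitted values) and my own tuning defaults.

```python
import numpy as np

def gdf(fit_predict, y, n_mc=200, tau=0.5, seed=0):
    """Monte Carlo estimate of the generalised degrees of freedom
    (Ye, 1998): perturb the response with N(0, tau^2) noise, refit,
    and take the slope of each fitted value on its own perturbation,
    summed over observations. For a fixed linear smoother this
    recovers the trace of the hat matrix; for a procedure that
    includes model selection it is typically larger than the nominal
    parameter count, and need not be an integer."""
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = rng.normal(0.0, tau, size=(n_mc, n))
    fits = np.array([fit_predict(y + d) for d in deltas])
    return sum(np.cov(fits[:, i], deltas[:, i])[0, 1]
               for i in range(n)) / tau ** 2

# For ordinary least squares with design matrix X, the estimate is
# close to X.shape[1], the usual degrees of freedom:
#   gdf(lambda y: X @ np.linalg.lstsq(X, y, rcond=None)[0], y)
```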

As noted, there is an extensive literature on model selection, even if one restricts attention to BIC extensions. Furthermore, even though most of these extensions have been proven consistent, their finite-sample performance can differ substantially. For example, the δ-BIC (Broman & Speed, Citation2002) corresponds to a class of criteria indexed by different values of δ, and the finite-sample performance of the criterion depends heavily on the choice of δ. The question of which BIC extension is the best is a difficult one to answer, if it can be answered at all. An alternative is to let the data speak (assuming that the data know the answer but not how to speak without help). A natural way of doing this is via the fence methods (e.g., Jiang & Nguyen, Citation2015). The idea consists of constructing a statistical fence to carefully isolate a subset of candidate models, known as the correct models. Once the fence is built, the optimal model can be selected from those within the fence according to a criterion of optimality that can incorporate practical considerations. A standard criterion of optimality is parsimony, that is, choosing the model within the fence that is the simplest, e.g., in terms of dimensionality. In a mathematical expression, the fence is constructed via the inequality
$$Q(M) - Q(M^*) \le c, \tag{1}$$
where $M$ denotes a candidate model, $Q(\cdot)$ is a measure of lack-of-fit, $M^*$ is a candidate model that has the minimum $Q$ [so that $Q(M^*)$ is the baseline measure], and $c$ is a tuning constant. Note that, essentially, all model selection strategies, including the information criteria, amount to balancing model fit against model complexity. The fitting part is controlled by the fence inequality (1); the complexity part is controlled by the parsimony criterion, if the latter is used to select the optimal model within the fence. Thus, for example, a penalty for model complexity, corresponding to the terms other than $-2l(\hat{\theta})$ in PBIC or PBIC$^*$ [$-2l(\hat{\theta})$ is the $Q$ in this case], or to $\delta\log(n)$ in δ-BIC, is not needed. The final question comes down to the choice of $c$ in (1), which may be viewed as a cut-off. This is where the data have something to say. Typically, lack-of-fit and complexity measures move in opposite directions, much like the Type-I and Type-II errors in hypothesis testing. Thus, (1) might be viewed as the standard strategy of controlling the probability of Type-I error, but with a major difference. Instead of using a given cut-off, such as α = 0.05 in hypothesis testing, the $c$ in (1) is chosen in a data-driven manner by maximising the 'posterior' probability that a candidate model is selected, leading to the adaptive fence (e.g., Jiang & Nguyen, Citation2015, ch. 3).
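Below is a minimal sketch of the fence, and of the adaptive choice of $c$, under the definitions above; the lack-of-fit measure `Q` (e.g., $-2$ times the maximised log-likelihood), the dimension function `dim`, the bootstrap interface and the grid search are my own choices of interface rather than a prescription from the cited work.

```python
def fence_select(models, Q, dim, c):
    """Fence selection via inequality (1): keep the candidates M with
    Q(M) - Q(M*) <= c, where M* minimises the lack-of-fit Q, then
    return the simplest model within the fence (parsimony).
    Models must be hashable, e.g., tuples of covariate names."""
    q = {m: Q(m) for m in models}
    q_star = min(q.values())                     # baseline Q(M*)
    inside = [m for m in models if q[m] - q_star <= c]
    return min(inside, key=dim)                  # most parsimonious

def adaptive_fence(models, Q_boot, dim, c_grid, bootstraps):
    """Adaptive fence in the spirit of Jiang & Nguyen (2015, ch. 3):
    for each cut-off c on a grid, record how often each candidate is
    selected across bootstrap samples, and choose the c at which the
    most frequently selected model attains the highest empirical
    ('posterior') selection probability."""
    best_c, best_p = None, -1.0
    for c in c_grid:
        picks = [fence_select(models, lambda m: Q_boot(m, b), dim, c)
                 for b in bootstraps]
        p_star = max(picks.count(m) for m in models) / len(picks)
        if p_star > best_p:
            best_c, best_p = c, p_star
    return best_c
```

In practice one locates the value of $c$ at which the maximal empirical selection probability peaks; the grid search above is the simplest version of that idea.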

Finally, consistency has been widely used as the standard asymptotic property in model selection, but it is not very useful for comparing different model selection criteria that are all consistent. Although further asymptotic properties have been proposed, such as the oracle property (Fan & Li, Citation2001), much of the issue remains: virtually every new model selection procedure that is proposed is consistent and has the oracle property. What is really needed, when it comes to asymptotic comparison of different model selection procedures, is a property analogous to efficiency in parameter estimation. So far, no such property has been established and widely accepted.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Broman, K. W., & Speed, T. P. (2002). A model selection approach for the identification of quantitative trait loci in experimental crosses. Journal of the Royal Statistical Society, Series B, 64, 641–656. doi: 10.1111/1467-9868.00354
  • Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771. doi: 10.1093/biomet/asn034
  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360. doi: 10.1198/016214501753382273
  • Hartley, H. O., & Rao, J. N. K. (1967). Maximum likelihood estimation for the mixed analysis of variance model. Biometrika, 54, 93–108. doi: 10.1093/biomet/54.1-2.93
  • Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and related problems. Journal of the American Statistical Association, 72, 320–340. doi: 10.1080/01621459.1977.10480998
  • Jiang, J., & Nguyen, T. (2015). The fence methods. Singapore: World Scientific.
  • Jiang, J., Rao, J. S., Gu, Z., & Nguyen, T. (2008). Fence methods for mixed model selection. The Annals of Statistics, 36, 1669–1692. doi: 10.1214/07-AOS517
  • Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of analysis of variance. The Annals of Statistics, 5, 746–762. doi: 10.1214/aos/1176343897
  • Müller, S., Scealy, J. L., & Welsh, A. H. (2013). Model selection in linear mixed models. Statistical Science, 28, 135–167. doi: 10.1214/12-STS410
  • Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association, 93, 120–131. doi: 10.1080/01621459.1998.10474094
