The authors should be congratulated on a stimulating piece of work. There is definitely a great need to better understand information-based criteria such as BIC. I especially enjoyed the way they simplified the problem through the Laplace approximation, allowing for closed-form calculations. This made me fondly remember some old papers of R. A. Fisher (1922). However, as any good academician, I have a few comments:
First, I do not believe that the name of the method is very descriptive. When I first read the title of the paper, I thought it was going to be about introducing user-selected prior information into BIC. Instead, the proposed PBIC criterion is based on a very particular prior. This prior is flat for the parameters that are common to all the models, and Cauchy-like (Equation (9)) for an orthogonal transformation of the parameters under selection. I view this as both the biggest strength and the biggest weakness of this paper. On the one hand, the fact that classical BIC does not depend on the choice of prior can be viewed as its big advantage. There is no room for the user to doubt whether a particular prior is compatible with the problem at hand. Consequently, it also makes it easier for BIC to be accepted in the frequentist community. On the other hand, the particular prior selected for PBIC allows for more detailed calculations. This leads to a more refined formula than BIC, one that accounts for the different precision with which different parameters are estimated. In any case, I believe the name should reflect this specific choice of prior explicitly. What about Cauchy prior-based Bayesian Information Criterion (CBIC)?
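To make concrete the point that classical BIC involves no prior at all, here is a minimal sketch assuming a hypothetical toy comparison: i.i.d. normal data with known unit variance, where model M0 fixes the mean at 0 and model M1 estimates it. The data values and model pair are illustrative assumptions, not taken from the paper; only the maximized log-likelihood, the number of free parameters k, and the sample size n enter the criterion.

```python
import math


def gaussian_loglik(data, mean, sigma=1.0):
    """Log-likelihood of i.i.d. N(mean, sigma^2) observations."""
    n = len(data)
    ss = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def bic(max_loglik, k, n):
    """Classical BIC: -2 log L(theta_hat) + k log n (smaller is better).

    Note that no prior density appears anywhere in this formula.
    """
    return -2.0 * max_loglik + k * math.log(n)


# Hypothetical data, chosen only for illustration.
data = [0.3, -0.1, 0.8, 0.5, 0.2, 0.6, -0.2, 0.4]
n = len(data)
xbar = sum(data) / n  # MLE of the mean under M1

# M0: mean fixed at 0 (no free parameters).
bic0 = bic(gaussian_loglik(data, 0.0), k=0, n=n)
# M1: mean estimated by maximum likelihood (one free parameter).
bic1 = bic(gaussian_loglik(data, xbar), k=1, n=n)

print(f"BIC(M0) = {bic0:.4f}, BIC(M1) = {bic1:.4f}")
```

With unit variance, BIC(M0) minus BIC(M1) reduces to n times the squared sample mean minus log n, so the comparison depends only on the data and n. A prior-based criterion such as PBIC would instead add terms that depend on the prior chosen for the parameter under selection.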
Second, PBIC's use of a particular prior opens it to criticism of the prior selected. In particular, Johnson and Rossell (2010) argue that for model selection it is better to use non-local priors, i.e., priors whose density vanishes at the origin, so that they place little mass near it. One could implement this by replacing Equation (9) with
Calculations similar to those made in the paper lead to
(1)

While Equation (1) is somewhat longer than PBIC, it is just as intuitive. Would it make sense to consider this, or some other similar formula?
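The difference between a local and a non-local prior can be seen numerically. The sketch below is an illustrative assumption, not the PBIC prior or the discussant's replacement: it uses a standard Cauchy density as a stand-in for the Cauchy-like prior of Equation (9), and the first-order normal moment prior of Johnson and Rossell (2010), (x^2/tau) N(x; 0, tau), with a hypothetical scale tau = 1. The local prior is positive at the origin, while the non-local one vanishes there and puts very little mass near it.

```python
import math


def cauchy_pdf(x, scale=1.0):
    """Cauchy(0, scale) density: a local prior, positive at the origin."""
    return 1.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def normal_pdf(x, var):
    """N(0, var) density."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def moment_prior_pdf(x, tau=1.0):
    """First-order normal moment prior: (x^2 / tau) * N(x; 0, tau).

    A non-local density: it is exactly zero at x = 0. It integrates to 1
    because E[x^2] = tau under N(0, tau).
    """
    return (x * x / tau) * normal_pdf(x, tau)


def riemann_integral(f, lo=-20.0, hi=20.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Prior mass assigned to a small neighbourhood of the origin.
near_zero_cauchy = riemann_integral(cauchy_pdf, -0.1, 0.1, 1000)
near_zero_moment = riemann_integral(moment_prior_pdf, -0.1, 0.1, 1000)

for x in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(f"x = {x:3.1f}  cauchy = {cauchy_pdf(x):.4f}  moment = {moment_prior_pdf(x):.4f}")
print(f"mass in [-0.1, 0.1]: cauchy = {near_zero_cauchy:.4f}, moment = {near_zero_moment:.6f}")
```

Because the non-local prior gives almost no support to small effects, evidence in favour of the null accumulates faster than under a local prior, which is the motivation Johnson and Rossell (2010) give for using such priors in model selection.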
Third, as discussed in the paper, the selection of is the weakest point. Unfortunately, I could not come up with a more clever, less ad hoc way to do this than the authors did.
Finally, I would like to ask the authors whether they believe there is room for a fiducial information criterion.
Disclosure statement
No potential conflict of interest was reported by the author.
ORCID
Jan Hannig http://orcid.org/0000-0002-4164-0173
References
- Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222, 309–368.
- Johnson, V. E., & Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 143–170. doi: 10.1111/j.1467-9868.2009.00730.x