Abstract
We explore the arguments for maximizing the “evidence” as an algorithm for model selection. We show, using a new definition of model complexity that we term “flexibility,” that maximizing the evidence should appeal to both Bayesian and frequentist statisticians. This is due to flexibility’s unique position in the exact decomposition of log-evidence into log-fit minus flexibility. In the Gaussian linear model, flexibility is asymptotically equal to the Bayesian information criterion (BIC) penalty, but we caution against using BIC in place of flexibility for model selection.
Acknowledgments
We began this article while CEP was the Heilbronn Distinguished Visitor in Data Science at the University of Bristol, UK, in Spring 2019. We would like to thank the two reviewers for their detailed comments on previous versions of this article, which improved it considerably.