Statistical Computing and Graphics

Black Box Variational Bayesian Model Averaging

Pages 85-96 | Received 23 Jun 2021, Accepted 22 Mar 2022, Published online: 29 Apr 2022

References

  • Ambrogioni, L., Lin, K., Fertig, E., Vikram, S., Hinne, M., Moore, D., and van Gerven, M. (2021), “Automatic Structured Variational Inference,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, eds. A. Banerjee, and K. Fukumizu, pp. 676–684. PMLR.
  • Ardia, D., Baştürk, N., Hoogerheide, L., and van Dijk, H. K. (2012), “A Comparative Study of Monte Carlo Methods for Efficient Evaluation of Marginal Likelihood,” Computational Statistics and Data Analysis, 56, 3398–3414 (1st Issue of the Annals of Computational and Financial Econometrics; Sixth Special Issue on Computational Econometrics). DOI: 10.1016/j.csda.2010.09.001.
  • Audi, G., Wapstra, A., and Thibault, C. (2003), “The AME2003 Atomic Mass Evaluation: (ii). Tables, Graphs and References,” Nuclear Physics A, 729, 337–676. DOI: 10.1016/j.nuclphysa.2003.11.003.
  • Balasubramanian, J. B., Visweswaran, S., Cooper, G. F., and Gopalakrishnan, V. (2014), “Selective Model Averaging with Bayesian Rule Learning for Predictive Biomedicine,” AMIA Joint Summits on Translational Science Proceedings, 2014, 17–22.
  • Bartel, J., Quentin, P., Brack, M., Guet, C., and Håkansson, H.-B. (1982), “Towards a Better Parametrisation of Skyrme-like Effective Forces: A Critical Study of the SkM Force,” Nuclear Physics A, 386, 79–100. DOI: 10.1016/0375-9474(82)90403-1.
  • Bernardo, J. M., and Smith, A. F. M. (1994), “Reference Analysis,” in Bayesian Theory, Chapter “Inference,” Chichester: Wiley.
  • Bhattacharya, S., and Maiti, T. (2021), “Statistical Foundation of Variational Bayes Neural Networks,” Neural Networks, 137, 151–173. DOI: 10.1016/j.neunet.2021.01.027.
  • Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017), “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, 112, 859–877. DOI: 10.1080/01621459.2017.1285773.
  • Bottou, L., Le Cun, Y., and Bengio, Y. (1997), “Global Training of Document Processing Systems Using Graph Transformer Networks,” in Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 489–493. IEEE.
  • Casella, G., and Robert, C. P. (1996), “Rao-Blackwellisation of Sampling Schemes,” Biometrika, 83, 81–94. DOI: 10.1093/biomet/83.1.81.
  • Chabanat, E., Bonche, P., Haensel, P., Meyer, J., and Schaeffer, R. (1995), “New Skyrme Effective Forces for Supernovae and Neutron Rich Nuclei,” Physica Scripta, 1995, 231–233. DOI: 10.1088/0031-8949/1995/T56/034.
  • Clarke, B. (2003), “Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored,” The Journal of Machine Learning Research, 4, 683–712.
  • Clyde, M., and Iversen, E. (2013), “Bayesian Model Averaging in the M-Open Framework,” in Bayesian Theory and Applications, Oxford: Oxford University Press, pp. 484–498.
  • Clyde, M. A., Ghosh, J., and Littman, M. L. (2011), “Bayesian Adaptive Sampling for Variable Selection and Model Averaging,” Journal of Computational and Graphical Statistics, 20, 80–101. DOI: 10.1198/jcgs.2010.09049.
  • Dobaczewski, J., Flocard, H., and Treiner, J. (1984), “Hartree-Fock-Bogolyubov Description of Nuclei Near the Neutron-Drip Line,” Nuclear Physics A, 422, 103–139. DOI: 10.1016/0375-9474(84)90433-0.
  • Dua, D., and Graff, C. (2017), “UCI Machine Learning Repository.”
  • Duchi, J., Hazan, E., and Singer, Y. (2011), “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization,” The Journal of Machine Learning Research, 12, 2121–2159.
  • Feroz, F., Hobson, M. P., and Bridges, M. (2009), “Multinest: An Efficient and Robust Bayesian Inference Tool for Cosmology and Particle Physics,” Monthly Notices of the Royal Astronomical Society, 398, 1601–1614. DOI: 10.1111/j.1365-2966.2009.14548.x.
  • Fortunato, M., Blundell, C., and Vinyals, O. (2017), “Bayesian Recurrent Neural Networks,” arXiv preprint arXiv:1704.02798.
  • Fragoso, T. M., Bertoli, W., and Louzada, F. (2018), “Bayesian Model Averaging: A Systematic Review and Conceptual Classification,” International Statistical Review, 86, 1–28. DOI: 10.1111/insr.12243.
  • Friel, N., and Pettitt, A. N. (2008), “Marginal Likelihood Estimation via Power Posteriors,” Journal of the Royal Statistical Society, Series B, 70, 589–607. DOI: 10.1111/j.1467-9868.2007.00650.x.
  • Friel, N., and Wyse, J. (2012), “Estimating the Evidence—A Review,” Statistica Neerlandica, 66, 288–308. DOI: 10.1111/j.1467-9574.2011.00515.x.
  • Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2013), Bayesian Data Analysis (3rd ed.), Boca Raton, FL: CRC Press.
  • Geweke, J. (1999), “Using Simulation Methods for Bayesian Econometric Models: Inference, Development, and Communication,” Econometric Reviews, 18, 1–73. DOI: 10.1080/07474939908800428.
  • Hernández, B., Raftery, A. E., Pennington, S. R., and Parnell, A. C. (2018), “Bayesian Additive Regression Trees Using Bayesian Model Averaging,” Statistics and Computing, 28, 869–890. DOI: 10.1007/s11222-017-9767-1.
  • Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999), “Bayesian Model Averaging: A Tutorial,” Statistical Science, 14, 382–401.
  • Hoffman, M., and Blei, D. (2015), “Stochastic Structured Variational Inference,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (Vol. 38), San Diego, CA, pp. 361–369. PMLR.
  • Hoffman, M. D., and Gelman, A. (2014), “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo,” Journal of Machine Learning Research, 15, 1351–1381.
  • Hooten, M. B., and Hobbs, N. T. (2015), “A Guide to Bayesian Model Selection for Ecologists,” Ecological Monographs, 85, 3–28. DOI: 10.1890/14-0661.1.
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning: with Applications in R, New York, NY: Springer New York.
  • Jaureguiberry, X., Vincent, E., and Richard, G. (2014), “Variational Bayesian Model Averaging for Audio Source Separation,” in 2014 IEEE Workshop on Statistical Signal Processing (SSP), pp. 33–36.
  • Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999), “An Introduction to Variational Methods for Graphical Models,” Machine Learning, 37, 183–233. DOI: 10.1023/A:1007665907178.
  • Kass, R. E., and Raftery, A. E. (1995), “Bayes Factors,” Journal of the American Statistical Association, 90, 773–795. DOI: 10.1080/01621459.1995.10476572.
  • Kejzlar, V., Neufcourt, L., Nazarewicz, W., and Reinhard, P.-G. (2020), “Statistical Aspects of Nuclear Mass Models,” Journal of Physics G: Nuclear and Particle Physics, 47, 094001. DOI: 10.1088/1361-6471/ab907c.
  • Kingma, D., and Ba, J. (2014), “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations.
  • Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. (2016), “Improved Variational Inference with Inverse Autoregressive Flow,” in Advances in Neural Information Processing Systems (Vol. 29), eds. D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett. Curran Associates, Inc.
  • Klüpfel, P., Reinhard, P.-G., Bürvenich, T. J., and Maruhn, J. A. (2009), “Variations on a Theme by Skyrme: A Systematic Study of Adjustments of Model Parameters,” Physical Review C, 79, 034310. DOI: 10.1103/PhysRevC.79.034310.
  • Kobyzev, I., Prince, S. J., and Brubaker, M. A. (2021), “Normalizing Flows: An Introduction and Review of Current Methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3964–3979. DOI: 10.1109/TPAMI.2020.2992934.
  • Kortelainen, M., Lesinski, T., Moré, J. J., Nazarewicz, W., Sarich, J., Schunck, N., Stoitsov, M. V., and Wild, S. M. (2010), “Nuclear Energy Density Optimization,” Physical Review C, 82, 024313. DOI: 10.1103/PhysRevC.82.024313.
  • Kortelainen, M., McDonnell, J., Nazarewicz, W., Reinhard, P.-G., Sarich, J., Schunck, N., Stoitsov, M. V., and Wild, S. M. (2012), “Nuclear Energy Density Optimization: Large Deformations,” Physical Review C, 85, 024304. DOI: 10.1103/PhysRevC.85.024304.
  • Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2017), “Automatic Differentiation Variational Inference,” Journal of Machine Learning Research, 18, 1–45.
  • Latouche, P., and Robin, S. S. (2016), “Variational Bayes Model Averaging for Graphon Functions and Motif Frequencies Inference in W-graph Models,” Statistics and Computing, 26, 1173–1185. DOI: 10.1007/s11222-015-9607-0.
  • Leamer, E. E. (1978), Specification Searches: Ad Hoc Inference with Nonexperimental Data, New York: Wiley.
  • Lenk, P. (2009), “Simulation Pseudo-bias Correction to the Harmonic Mean Estimator of Integrated Likelihoods,” Journal of Computational and Graphical Statistics, 18, 941–960. DOI: 10.1198/jcgs.2009.08022.
  • Madigan, D., Gavrin, J., and Raftery, A. E. (1995), “Eliciting Prior Information to Enhance the Predictive Performance of Bayesian Graphical Models,” Communications in Statistics – Theory and Methods, 24, 2271–2292. DOI: 10.1080/03610929508831616.
  • Masegosa, A. (2020), “Learning Under Model Misspecification: Applications to Variational and Ensemble Methods,” in Advances in Neural Information Processing Systems (Vol. 33), eds. H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, pp. 5479–5491. Curran Associates, Inc.
  • Mukhopadhyay, M., and Dunson, D. B. (2020), “Targeted Random Projection for Prediction from High-dimensional Features,” Journal of the American Statistical Association, 115, 1998–2010. DOI: 10.1080/01621459.2019.1677240.
  • Nazarewicz, W. (2016), “Challenges in Nuclear Structure Theory,” Journal of Physics G: Nuclear and Particle Physics, 43, 044002. DOI: 10.1088/0954-3899/43/4/044002.
  • Neal, R. (2001), “Annealed Importance Sampling,” Statistics and Computing, 11, 125–139. DOI: 10.1023/A:1008923215028.
  • Neufcourt, L., Cao, Y., Giuliani, S., Nazarewicz, W., Olsen, E., and Tarasov, O. B. (2020a), “Beyond the Proton Drip Line: Bayesian Analysis of Proton-Emitting Nuclei,” Physical Review C, 101, 014319. DOI: 10.1103/PhysRevC.101.014319.
  • Neufcourt, L., Cao, Y., Giuliani, S., Nazarewicz, W., Olsen, E., and Tarasov, O. B. (2020b), “Quantified Limits of the Nuclear Landscape,” Physical Review C, 101, 044307.
  • Neufcourt, L., Cao, Y., Nazarewicz, W., Olsen, E., and Viens, F. (2019), “Neutron Drip Line in the Ca Region from Bayesian Model Averaging,” Physical Review Letters, 122, 062502. DOI: 10.1103/PhysRevLett.122.062502.
  • Neufcourt, L., Cao, Y., Nazarewicz, W., and Viens, F. (2018), “Bayesian Approach to Model-Based Extrapolation of Nuclear Observables,” Physical Review C, 98, 034318. DOI: 10.1103/PhysRevC.98.034318.
  • Pajor, A. (2017), “Estimating the Marginal Likelihood Using the Arithmetic Mean Identity,” Bayesian Analysis, 12, 261–287. DOI: 10.1214/16-BA1001.
  • Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., and Lakshminarayanan, B. (2021), “Normalizing Flows for Probabilistic Modeling and Inference,” Journal of Machine Learning Research, 22, 1–64.
  • Papamakarios, G., Pavlakou, T., and Murray, I. (2017), “Masked Autoregressive Flow for Density Estimation,” in Advances in Neural Information Processing Systems (Vol. 30), eds. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc.
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019), “Pytorch: An Imperative Style, High-performance Deep Learning Library,” in Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc.
  • Peterson, C., and Anderson, J. R. (1987), “A Mean Field Theory Learning Algorithm for Neural Networks,” Complex Systems, 1, 995–1019.
  • Phillips, D. R., Furnstahl, R. J., Heinz, U., Maiti, T., Nazarewicz, W., Nunes, F. M., Plumlee, M., Pratola, M. T., Pratt, S., Viens, F. G., and Wild, S. M. (2021), “Get on the Band Wagon: A Bayesian Framework for Quantifying Model Uncertainties in Nuclear Dynamics,” Journal of Physics G: Nuclear and Particle Physics, 48, 1–39.
  • Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997), “Bayesian Model Averaging for Linear Regression Models,” Journal of the American Statistical Association, 92, 179–191. DOI: 10.1080/01621459.1997.10473615.
  • Raftery, A. E., Newton, M. A., Satagopan, J. M., and Krivitsky, P. N. (2007), “Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity,” Bayesian Statistics, 8, 1–45.
  • Ranganath, R., Gerrish, S., and Blei, D. (2014), “Black Box Variational Inference,” in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33 of Proceedings of Machine Learning Research, pp. 814–822. PMLR.
  • Ranganath, R., Tran, D., and Blei, D. M. (2016), “Hierarchical Variational Models,” in Proceedings of the 33rd International Conference on International Conference on Machine Learning, Volume 48, ICML’16, pp. 2568–2577. JMLR.
  • Rezende, D., and Mohamed, S. (2015), “Variational Inference with Normalizing Flows,” in Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, Lille, France, eds. F. Bach and D. Blei, pp. 1530–1538. PMLR.
  • Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
  • Ross, S. M. (2006), Simulation (4th ed.), Orlando, FL: Academic Press, Inc.
  • Ruiz, F. J. R., Titsias, M. K., and Blei, D. M. (2016), “Overdispersed Black-Box Variational Inference,” in Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI’16, Arlington, VA, pp. 647–656. AUAI Press.
  • Salvatier, J., Wiecki, T. V., and Fonnesbeck, C. (2016), “Probabilistic Programming in Python Using pymc3,” PeerJ Computer Science, 2, e55. DOI: 10.7717/peerj-cs.55.
  • Schorning, K., Bornkamp, B., Bretz, F., and Dette, H. (2016), “Model Selection Versus Model Averaging in Dose Finding Studies,” Statistics in Medicine, 35, 4021–4040. DOI: 10.1002/sim.6991.
  • Shazeer, N., and Stern, M. (2018), “Adafactor: Adaptive Learning Rates with Sublinear Memory Cost,” in Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, eds. J. Dy and A. Krause, pp. 4596–4604. PMLR.
  • Silvestro, D., Schnitzler, J., Liow, L. H., Antonelli, A., and Salamin, N. (2014), “Bayesian Estimation of Speciation and Extinction from Incomplete Fossil Occurrence Data,” Systematic Biology, 63, 349–367. DOI: 10.1093/sysbio/syu006.
  • Skilling, J. (2006), “Nested Sampling for General Bayesian Computation,” Bayesian Analysis, 1, 833–860. DOI: 10.1214/06-BA127.
  • Steel, M. F. J. (2020), “Model Averaging and its use in Economics,” Journal of Economic Literature, 58, 644–719. DOI: 10.1257/jel.20191385.
  • Tieleman, T., and Hinton, G. (2012), “Lecture 6.5—RMSProp: Divide the Gradient by a Running Average of its Recent Magnitude,” COURSERA: Neural Networks for Machine Learning.
  • Titsias, M., and Lázaro-Gredilla, M. (2014), “Doubly Stochastic Variational Bayes for Non-conjugate Inference,” in Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, Bejing, China, eds. E. P. Xing and T. Jebara, pp. 1971–1979. PMLR.
  • Tran, D., Blei, D. M., and Airoldi, E. M. (2015), “Copula Variational Inference,” in Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NeurIPS’15, pp. 3564–3572, Cambridge, MA: MIT Press.
  • Tran, D., Ranganath, R., and Blei, D. M. (2017), “Hierarchical Implicit Models and Likelihood-Free Variational Inference,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS’17, pp. 5529–5539.
  • Vandaele, W. (1978), “Participation in Illegitimate Activities: Ehrlich Revisited,” in Deterrence and Incapacitation: Estimating the Effects of Criminal Sanctions on Crime Rates, eds. A. Blumstein et al., pp. 270–335, Washington, DC: National Academy of Sciences.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. U., and Polosukhin, I. (2017), “Attention is All You Need,” in Advances in Neural Information Processing Systems, volume 30, eds. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc.
  • Wainwright, M. J., and Jordan, M. I. (2008), “Graphical Models, Exponential Families, and Variational Inference,” Foundations and Trends® in Machine Learning, 1, 1–305.
  • Wang, M., Audi, G., Kondev, F. G., Huang, W. J., Naimi, S., and Xu, X. (2017), “The AME2016 Atomic Mass Evaluation (II). Tables, Graphs and References,” Chinese Physics C, 41, 030003. DOI: 10.1088/1674-1137/41/3/030003.
  • Wang, Y., and Blei, D. M. (2018), “Frequentist Consistency of Variational Bayes,” Journal of the American Statistical Association, 114, 1147–1161. DOI: 10.1080/01621459.2018.1473776.
  • Wei, W., Visweswaran, S., and Cooper, G. F. (2011), “The Application of Naive Bayes Model Averaging to Predict Alzheimer’s Disease from Genome-Wide Data,” Journal of the American Medical Informatics Association, 18, 370–375. DOI: 10.1136/amiajnl-2011-000101.
  • Weilbach, C., Beronov, B., Wood, F., and Harvey, W. (2020), “Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models,” in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, eds. S. Chiappa and R. Calandra, pp. 4441–4451. PMLR.
  • Wen, X. (2015), “Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing,” Biostatistics, 16, 701–712. DOI: 10.1093/biostatistics/kxv009.
  • Zeiler, M. D. (2012), “Adadelta: An Adaptive Learning Rate Method,” arXiv preprint arXiv:1212.5701.
  • Zellner, A. (1986), “On Assessing Prior Distributions and Bayesian Regression Analysis with g Prior Distributions,” in Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, eds. P. Goel and A. Zellner, pp. 233–243, New York: Elsevier.
  • Zhang, F., and Gao, C. (2020), “Convergence Rates of Variational Posterior Distributions,” The Annals of Statistics, 48, 2180–2207. DOI: 10.1214/19-AOS1883.
