Theory and Methods

Frequentist Consistency of Variational Bayes

Pages 1147-1161 | Received 01 May 2017, Published online: 06 Aug 2018


