Journal of the American Statistical Association, Volume 114, 2019, Issue 527

Theory and Methods

Frequentist Consistency of Variational Bayes

Yixin Wang, Department of Statistics, Columbia University, New York, NY

https://orcid.org/0000-0002-6617-4842

David M. Blei, Department of Statistics and Department of Computer Science, Columbia University, New York, NY

Pages 1147-1161 | Received 01 May 2017, Published online: 06 Aug 2018

https://doi.org/10.1080/01621459.2018.1473776

References

Abbe, E., and Sandon, C. (2015), “Community Detection in General Stochastic Block Models: Fundamental Limits and Efficient Recovery Algorithms,” in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), Piscataway, NJ: IEEE, pp. 670–688.
Alquier, P., and Ridgway, J. (2017), “Concentration of Tempered Posteriors and of Their Variational Approximations,” arXiv:1706.09293.
Alquier, P., Ridgway, J., and Chopin, N. (2016), “On the Properties of Variational Approximations of Gibbs Posteriors,” Journal of Machine Learning Research, 17, 1–41.
Amir-Moez, A., and Johnston, G. (1969), “On the Product of Diagonal Elements of a Positive Matrix,” Mathematics Magazine, 42, 24–26.
Beckenbach, E. F., and Bellman, R. (2012), Inequalities (Vol. 30), New York: Springer Science & Business Media.
Bernstein, S. N. (1917), Theory of Probability, Moscow, Leningrad.
Bickel, P., Choi, D., Chang, X., and Zhang, H. (2013), “Asymptotic Normality of Maximum Likelihood and Its Variational Approximation for Stochastic Blockmodels,” The Annals of Statistics, 41, 1922–1943.
Bickel, P., and Kleijn, B. (2012), “The Semiparametric Bernstein–von Mises Theorem,” The Annals of Statistics, 40, 206–237.
Bickel, P. J., and Yahav, J. A. (1967), “Asymptotically Pointwise Optimal Procedures in Sequential Analysis,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 401–413).
Bishop, C. M. (2006), Pattern Recognition and Machine Learning, New York: Springer-Verlag.
Blei, D., Kucukelbir, A., and McAuliffe, J. (2016), “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, 112, 859–877.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003), “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 3, 993–1022.
Bontemps, D. (2011), “Bernstein–Von Mises Theorems for Gaussian Regression With Increasing Number of Regressors,” The Annals of Statistics, 39, 2557–2584.
Boucheron, S., and Gassiat, E. (2009), “A Bernstein-Von Mises Theorem for Discrete Probability Distributions,” Electronic Journal of Statistics, 3, 114–148.
Braides, A. (2006), “A Handbook of Γ-Convergence,” in Handbook of Differential Equations: Stationary Partial Differential Equations (Vol. 3), eds. M. Chipot and P. Quittner, The Netherlands: Elsevier, pp. 101–213.
Breslow, N. E., and Clayton, D. G. (1993), “Approximate Inference in Generalized Linear Mixed Models,” Journal of the American Statistical Association, 88, 9–25.
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., and Riddell, A. (2015), “Stan: A Probabilistic Programming Language,” Journal of Statistical Software, 76, 1–32.
Castillo, I. (2012a), “Semiparametric Bernstein–Von Mises Theorem and Bias, Illustrated With Gaussian Process Priors,” Sankhya A, 74, 194–221.
——— (2012b), “A Semiparametric Bernstein–Von Mises Theorem for Gaussian Process Priors,” Probability Theory and Related Fields, 152, 53–99.
——— (2014), “On Bayesian Supremum Norm Contraction Rates,” The Annals of Statistics, 42, 2058–2091.
Castillo, I., and Nickl, R. (2012), “Nonparametric Bernstein–Von Mises Theorems,” arXiv:1208.3862.
——— (2013), “Nonparametric Bernstein–Von Mises Theorems in Gaussian White Noise,” The Annals of Statistics, 41, 1999–2028.
——— (2014), “On the Bernstein–von Mises Phenomenon for Nonparametric Bayes Procedures,” The Annals of Statistics, 42, 1941–1969.
Castillo, I., and Rousseau, J. (2015), “A Bernstein–Von Mises Theorem for Smooth Functionals in Semiparametric Models,” The Annals of Statistics, 43, 2353–2383.
Celisse, A., Daudin, J.-J., and Pierre, L. (2012), “Consistency of Maximum-Likelihood and Variational Estimators in the Stochastic Block Model,” Electronic Journal of Statistics, 6, 1847–1899.
Chen, Y.-C., Wang, Y. S., and Erosheva, E. A. (2017), “On the Use of Bootstrap With Variational Inference: Theory, Interpretation, and a Two-Sample Test Example,” arXiv:1711.11057.
Cox, D. D. (1993), “An Analysis of Bayesian Inference for Nonparametric Regression,” The Annals of Statistics, 21, 903–923.
Dal Maso, G. (2012), An Introduction to Γ-Convergence (Vol. 8), New York: Springer Science & Business Media.
De Blasi, P., and Hjort, N. L. (2009), “The Bernstein–Von Mises Theorem in Semiparametric Competing Risks Models,” Journal of Statistical Planning and Inference, 139, 2316–2328.
Dempster, A., Laird, N., and Rubin, D. (1977), “Maximum Likelihood From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38.
Diaconis, P., and Freedman, D. (1986), “On the Consistency of Bayes Estimates,” The Annals of Statistics, 14, 1–26.
——— (1997), “Consistency of Bayes Estimates for Nonparametric Regression: A Review,” in Festschrift for Lucien Le Cam, eds. D. Pollard, E. Torgersen, and G. L. Yang, New York: Springer, pp. 157–165.
——— (1998), “Consistency of Bayes Estimates for Nonparametric Regression: Normal Theory,” Bernoulli, 4, 411–444.
Dieng, A. B., Tran, D., Ranganath, R., Paisley, J., and Blei, D. M. (2017), “Variational Inference via χ-Upper Bound Minimization,” in Advances in Neural Information Processing Systems, pp. 2729–2738.
Freedman, D. (1999), “Wald Lecture: On the Bernstein-Von Mises Theorem With Infinite-Dimensional Parameters,” The Annals of Statistics, 27, 1119–1141.
Gelfand, A. E., and Smith, A. F. (1990), “Sampling-Based Approaches to Calculating Marginal Densities,” Journal of the American Statistical Association, 85, 398–409.
Ghorbani, B., Javadi, H., and Montanari, A. (2018), “An Instability in Variational Inference for Topic Models,” arXiv:1802.00568.
Ghosal, S., and van der Vaart, A. (2017), Fundamentals of Nonparametric Bayesian Inference (Vol. 44), Cambridge, UK: Cambridge University Press.
Ghosh, J., and Ramamoorthi, R. (2003), Bayesian Nonparametrics (Springer Series in Statistics), New York: Springer.
Giordano, R., Broderick, T., and Jordan, M. I. (2017a), “Covariances, Robustness, and Variational Bayes,” arXiv:1709.02536.
Giordano, R., Liu, R., Varoquaux, N., Jordan, M. I., and Broderick, T. (2017b), “Measuring Cluster Stability for Bayesian Nonparametrics Using the Linear Bootstrap,” arXiv:1712.01435.
Hall, P., Ormerod, J. T., and Wand, M. (2011a), “Theory of Gaussian Variational Approximation for a Poisson Mixed Model,” Statistica Sinica, 21, 369–389.
Hall, P., Pham, T., Wand, M. P., and Wang, S. S. (2011b), “Asymptotic Normality and Valid Inference for Gaussian Variational Approximation,” The Annals of Statistics, 39, 2502–2532.
Hastings, W. (1970), “Monte Carlo Sampling Methods Using Markov Chains and Their Applications,” Biometrika, 57, 97–109.
Hoffman, M., Blei, D., Wang, C., and Paisley, J. (2013), “Stochastic Variational Inference,” Journal of Machine Learning Research, 14, 1303–1347.
Hoffman, M. D., and Gelman, A. (2014), “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo,” Journal of Machine Learning Research, 15, 1593–1623.
Hofman, J., and Wiggins, C. (2008), “Bayesian Approach to Network Modularity,” Physical Review Letters, 100, 258701-1–258701-4.
James, L. F. (2008), “Large Sample Asymptotics for the Two-Parameter Poisson–Dirichlet Process,” in Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, eds. B. S. Clarke and S. Ghosal, Beachwood, OH: Institute of Mathematical Statistics, pp. 187–199.
Jiang, J. (2007), Linear and Generalized Linear Mixed Models and Their Applications, New York: Springer Science & Business Media.
Johnstone, I. M. (2010), “High Dimensional Bernstein-Von Mises: Simple Examples,” Institute of Mathematical Statistics Collections, 6, 87.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999), “An Introduction to Variational Methods for Graphical Models,” Machine Learning, 37, 183–233.
Kim, Y. (2006), “The Bernstein–Von Mises Theorem for the Proportional Hazard Model,” The Annals of Statistics, 34, 1678–1700.
——— (2009), “A Bernstein-Von Mises Theorem for Doubly Censored Data,” Statistica Sinica, 19, 581–595.
Kim, Y., and Lee, J. (2004), “A Bernstein-Von Mises Theorem in the Nonparametric Right-Censoring Model,” The Annals of Statistics, 32, 1492–1512.
Kleijn, B., and Van der Vaart, A. (2012), “The Bernstein-von-Mises Theorem Under Misspecification,” Electronic Journal of Statistics, 6, 354–381.
Knapik, B. T., van der Vaart, A. W., and van Zanten, J. H. (2011), “Bayesian Inverse Problems With Gaussian Priors,” The Annals of Statistics, 39, 2626–2657.
Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2017), “Automatic Differentiation Variational Inference,” The Journal of Machine Learning Research, 18, 430–474.
Laplace, P. (1809), “Mémoire Sur Les Intégrales Définies et leur Application aux Probabilités, et Spécialement à la Recherche du Milieu qu’il Faut Choisir Entre les Résultats des Observations,” Mémoires Présentés à l’Académie Des Sciences, Paris.
Leahu, H. (2011), “On the Bernstein-Von Mises Phenomenon in the Gaussian White Noise Model,” Electronic Journal of Statistics, 5, 373–404.
Le Cam, L. (1953), “On Some Asymptotic Properties of Maximum Likelihood Estimates and Related Bayes’ Estimates,” University of California Publications in Statistics, 1, 277–330.
Le Cam, L., and Yang, G. L. (2012), Asymptotics in Statistics: Some Basic Concepts, New York: Springer Science & Business Media.
Lehmann, E. L., and Casella, G. (2006), Theory of Point Estimation, New York: Springer Science & Business Media.
Li, Y., and Turner, R. E. (2016), “Rényi Divergence Variational Inference,” in Advances in Neural Information Processing Systems, pp. 1073–1081.
Liu, Q., and Wang, D. (2016), “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm,” in Advances In Neural Information Processing Systems, pp. 2378–2386.
Lu, Y. (2017), “On the Bernstein-Von Mises Theorem for High Dimensional Nonlinear Bayesian Inverse Problems,” arXiv:1706.00289.
Lu, Y., Stuart, A. M., and Weber, H. (2017), “Gaussian Approximations for Probability Measures on R^d,” SIAM/ASA Journal on Uncertainty Quantification, 5, 1136–1165.
McCullagh, P. (1984), “Generalized Linear Models,” European Journal of Operational Research, 16, 285–292.
McCulloch, C. E., and Neuhaus, J. M. (2001), Generalized Linear Mixed Models, New York: Wiley Online Library.
Mossel, E., Neeman, J., and Sly, A. (2012), “Stochastic Block Models and Reconstruction,” arXiv:1202.1499.
Murphy, K. P. (2012), Machine Learning: A Probabilistic Perspective, Cambridge, MA: MIT Press.
Murphy, S. A., and Van der Vaart, A. W. (2000), “On Profile Likelihood,” Journal of the American Statistical Association, 95, 449–465.
Ormerod, J. T., and Wand, M. P. (2010), “Explaining Variational Approximations,” The American Statistician, 64, 140–153.
Ormerod, J. T., You, C., and Müller, S. (2014), “A Variational Bayes Approach to Variable Selection,” Electronic Journal of Statistics, 11, 3549–3594.
Panov, M., and Spokoiny, V. (2015), “Finite Sample Bernstein–Von Mises Theorem for Semiparametric Problems,” Bayesian Analysis, 10, 665–710.
——— (2014), “Critical Dimension in the Semiparametric Bernstein–Von Mises Theorem,” Proceedings of the Steklov Institute of Mathematics, 287, 232–255.
Pati, D., Bhattacharya, A., and Yang, Y. (2017), “On Statistical Optimality of Variational Bayes,” arXiv:1712.08983.
Ranganath, R., Tran, D., Altosaar, J., and Blei, D. (2016a), “Operator Variational Inference,” in Advances in Neural Information Processing Systems, pp. 496–504.
Ranganath, R., Tran, D., and Blei, D. (2016b), “Hierarchical Variational Models,” in Proceedings of The 33rd International Conference on Machine Learning, pp. 324–333.
Ray, K. (2017), “Adaptive Bernstein-Von Mises Theorems in Gaussian White Noise,” The Annals of Statistics, 45, 2511–2536.
Rivoirard, V., and Rousseau, J. (2012), “Bernstein–Von Mises Theorem for Linear Functionals of the Density,” The Annals of Statistics, 40, 1489–1523.
Robert, C., and Casella, G. (2004), Monte Carlo Statistical Methods (Springer Texts in Statistics), New York: Springer-Verlag.
Roberts, S. J., Husmeier, D., Rezek, I., and Penny, W. (1998), “Bayesian Approaches to Gaussian Mixture Modeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1133–1142.
Sheth, R., and Khardon, R. (2017), “Excess Risk Bounds for the Bayes Risk Using Variational Inference in Latent Gaussian Models,” in Advances in Neural Information Processing Systems, pp. 5157–5167.
Snijders, T. A., and Nowicki, K. (1997), “Estimation and Prediction for Stochastic Blockmodels for Graphs With Latent Block Structure,” Journal of Classification, 14, 75–100.
Spokoiny, V. (2013), “Bernstein-Von Mises Theorem for Growing Parameter Dimension,” arXiv:1302.3430.
Tran, D., Blei, D., and Airoldi, E. M. (2015a), “Copula Variational Inference,” in Advances in Neural Information Processing Systems, pp. 3564–3572.
Tran, D., Ranganath, R., and Blei, D. M. (2015b), “The Variational Gaussian Process,” arXiv:1511.06499.
Van der Vaart, A. W. (2000), Asymptotic Statistics (Vol. 3), Cambridge, UK: Cambridge University Press.
Von Mises, R. (1931), Wahrscheinlichkeitsrechnung und ihre Anwendungen in der Statistik und der Theoretischen Physik, Leipzig und Wien.
Wainwright, M. J., and Jordan, M. I. (2008), “Graphical Models, Exponential Families, and Variational Inference,” Foundations and Trends® in Machine Learning, 1, 1–305.
Wang, B., and Titterington, D. (2004), “Convergence and Asymptotic Normality of Variational Bayesian Approximations for Exponential Family Models With Missing Values,” in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, AUAI Press, pp. 577–584.
——— (2005), “Inadequacy of Interval Estimates Corresponding to Variational Bayesian Approximations,” in Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, eds. R. G. Cowell and Z. Ghahramani, pp. 373–380.
Wang, B., and Titterington, D. (2006), “Convergence Properties of a General Algorithm for Calculating Variational Bayesian Estimates for a Normal Mixture Model,” Bayesian Analysis, 1, 625–650.
Wang, Y. J., and Wong, G. Y. (1987), “Stochastic Blockmodels for Directed Graphs,” Journal of the American Statistical Association, 82, 8–19.
Westling, T., and McCormick, T. H. (2015), “Beyond Prediction: A Framework for Inference with Variational Approximations in Mixture Models,” arXiv:1510.08151.
Yang, Y., Pati, D., and Bhattacharya, A. (2017), “α-Variational Inference With Statistical Guarantees,” arXiv:1710.03266.
You, C., Ormerod, J. T., and Müller, S. (2014), “On Variational Bayes Estimation and Variational Information Criteria for Linear Regression Models,” Australian & New Zealand Journal of Statistics, 56, 73–87.
Zhang, A. Y., and Zhou, H. H. (2017), “Theoretical and Computational Guarantees of Mean Field Variational Inference for Community Detection,” arXiv:1710.11268.
Zhang, F., and Gao, C. (2017), “Convergence Rates of Variational Posterior Distributions,” arXiv:1712.02519.