Review

Variational Inference: A Review for Statisticians

David M. Blei, Alp Kucukelbir & Jon D. McAuliffe
Pages 859–877 | Received 01 Jan 2016, Published online: 13 Jul 2017

References

  • Ahmed, A. , Aly, M. , Gonzalez, J. , Narayanamurthy, S. , and Smola, A. (2012), “Scalable Inference in Latent Variable Models,” in International Conference on Web Search and Data Mining , pp. 123–132.
  • Airoldi, E. , Blei, D. , Fienberg, S. , and Xing, E. (2008), “Mixed Membership Stochastic Blockmodels,” Journal of Machine Learning Research , 9, 1981–2014.
  • Amari, S. (1982), “Differential Geometry of Curved Exponential Families-Curvatures and Information Loss,” The Annals of Statistics , 10, 357–385.
  • ——— (1998), “Natural Gradient Works Efficiently in Learning,” Neural Computation , 10, 251–276.
  • Archambeau, C. , Cornford, D. , Opper, M. , and Shawe-Taylor, J. (2007a), “Gaussian Process Approximations of Stochastic Differential Equations,” Workshop on Gaussian Processes in Practice , 1, 1–16.
  • Archambeau, C. , Opper, M. , Shen, Y. , Cornford, D. , and Shawe-Taylor, J. (2007b), “Variational Inference for Diffusion Processes,” in Neural Information Processing Systems , pp. 17–24.
  • Armagan, A. , Clyde, M. , and Dunson, D. (2011), “Generalized Beta Mixtures of Gaussians,” in Neural Information Processing Systems , pp. 523–531.
  • Armagan, A. , and Dunson, D. (2011), “Sparse Variational Analysis of Linear Mixed Models for Large Data Sets,” Statistics & Probability Letters , 81, 1056–1062.
  • Barber, D. (2012), Bayesian Reasoning and Machine Learning , Cambridge, UK : Cambridge University Press.
  • Barber, D. , and Bishop, C. M. (1998), “Ensemble Learning in Bayesian Neural Networks,” in Generalization in Neural Networks and Machine Learning , ed. C. M. Bishop, New York : Springer Verlag, pp. 215–237.
  • Barber, D. , and Chiappa, S. (2006), “Unified Inference for Variational Bayesian Linear Gaussian State-Space Models,” in Neural Information Processing Systems , pp. 81–88.
  • Barber, D. , and van de Laar, P. (1999), “Variational Cumulant Expansions for Intractable Distributions,” Journal of Artificial Intelligence Research , 10, 435–455.
  • Barber, D. , and Wiegerinck, W. (1999), “Tractable Variational Structures for Approximating Graphical Models,” in Neural Information Processing Systems , pp. 183–189.
  • Beal, M. , and Ghahramani, Z. (2003), “The Variational Bayesian EM Algorithm for Incomplete Data: With Application to Scoring Graphical Model Structures,” in Bayesian Statistics (Vol. 7), eds. J. Bernardo , M. Bayarri , J. Berger , A. Dawid , D. Heckerman , A. Smith , and M. West , Oxford, UK : Oxford University Press, pp. 453–464.
  • Bernardo, J. , and Smith, A. (1994), Bayesian Theory , Chichester, UK : Wiley.
  • Bickel, P. , Choi, D. , Chang, X. , and Zhang, H. (2013), “Asymptotic Normality of Maximum Likelihood and its Variational Approximation for Stochastic Blockmodels,” The Annals of Statistics , 41, 1922–1943.
  • Bishop, C. (2006), Pattern Recognition and Machine Learning , New York : Springer.
  • Bishop, C. , Lawrence, N. , Jaakkola, T. , and Jordan, M. I. (1998), “Approximating Posterior Distributions in Belief Networks using Mixtures,” in Neural Information Processing Systems , pp. 416–422.
  • Bishop, C. , and Winn, J. (2000), “Non-linear Bayesian Image Modelling,” in European Conference on Computer Vision , pp. 3–17.
  • Blei, D. (2012), “Probabilistic Topic Models,” Communications of the ACM , 55, 77–84.
  • Blei, D. , and Jordan, M. I. (2006), “Variational Inference for Dirichlet Process Mixtures,” Bayesian Analysis , 1, 121–144.
  • Blei, D. , and Lafferty, J. (2007), “A Correlated Topic Model of Science,” Annals of Applied Statistics , 1, 17–35.
  • Blei, D. , Ng, A. , and Jordan, M. I. (2003), “Latent Dirichlet Allocation,” Journal of Machine Learning Research , 3, 993–1022.
  • Braun, M. , and McAuliffe, J. (2010), “Variational Inference for Large-Scale Models of Discrete Choice,” Journal of the American Statistical Association , 105, 324–335.
  • Brown, L. (1986), Fundamentals of Statistical Exponential Families , Hayward, CA : Institute of Mathematical Statistics.
  • Bugbee, B. , Breidt, F. , and van der Woerd, M. (2016), “Laplace Variational Approximation for Semiparametric Regression in the Presence of Heteroscedastic Errors,” Journal of Computational and Graphical Statistics , 25, 225–245.
  • Carbonetto, P. , and Stephens, M. (2012), “Scalable Variational Inference for Bayesian Variable Selection in Regression, and its Accuracy in Genetic Association Studies,” Bayesian Analysis , 7, 73–108.
  • Celisse, A. , Daudin, J.-J. , and Pierre, L. (2012), “Consistency of Maximum-Likelihood and Variational Estimators in the Stochastic Block Model,” Electronic Journal of Statistics , 6, 1847–1899.
  • Challis, E. , and Barber, D. (2013), “Gaussian Kullback-Leibler Approximate Inference,” The Journal of Machine Learning Research , 14, 2239–2286.
  • Chan, A. , and Vasconcelos, N. (2009), “Layered Dynamic Textures,” IEEE Transactions on Pattern Analysis and Machine Intelligence , 31, 1862–1879.
  • Cohen, S. , and Smith, N. (2010), “Covariance in Unsupervised Learning of Probabilistic Grammars,” The Journal of Machine Learning Research , 11, 3017–3051.
  • Cummins, M. , and Newman, P. (2008), “FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance,” The International Journal of Robotics Research , 27, 647–665.
  • de Freitas, N. D. , Højen-Sørensen, P. , Jordan, M. , and Russell, S. (2001), “Variational MCMC,” in Uncertainty in Artificial Intelligence , pp. 120–127.
  • Damianou, A. , Titsias, M. , and Lawrence, N. (2011), “Variational Gaussian Process Dynamical Systems,” in Neural Information Processing Systems , pp. 2510–2518.
  • Daunizeau, J. , Adam, V. , and Rigoux, L. (2014), “VBA: A Probabilistic Treatment of Nonlinear Models for Neurobiological and Behavioural Data,” PLoS Computational Biology , 10, e1003441.
  • Dempster, A. , Laird, N. , and Rubin, D. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society , Series B, 39, 1–38.
  • Deng, L. (2004), “Switching Dynamic System Models for Speech Articulation and Acoustics,” in Mathematical Foundations of Speech and Language Processing, eds. M. Johnson, S. P. Khudanpur, M. Ostendorf, and R. Rosenfeld, New York : Springer, pp. 115–133.
  • Diaconis, P. , and Ylvisaker, D. (1979), “Conjugate Priors for Exponential Families,” The Annals of Statistics , 7, 269–281.
  • Du, L. , Lu, R. , Carin, L. , and Dunson, D. (2009), “A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation,” in Neural Information Processing Systems , pp. 486–494.
  • Ermis, B. , and Bouchard, G. (2014), “Iterative Splits of Quadratic Bounds for Scalable Binary Tensor Factorization,” in Uncertainty in Artificial Intelligence , pp. 192–199.
  • Erosheva, E. A. , Fienberg, S. E. , and Joutard, C. (2007), “Describing Disability through Individual-Level Mixture Models for Multivariate Binary Data,” The Annals of Applied Statistics , 1, 346–384.
  • Flandin, G. , and Penny, W. (2007), “Bayesian fMRI Data Analysis with Sparse Spatial Basis Function Priors,” NeuroImage , 34, 1108–1125.
  • Foti, N. , Xu, J. , Laird, D. , and Fox, E. (2014), “Stochastic Variational Inference for Hidden Markov Models,” in Neural Information Processing Systems , pp. 3599–3607.
  • Furmston, T. , and Barber, D. (2010), “Variational Methods for Reinforcement Learning,” in Artificial Intelligence and Statistics , pp. 241–248.
  • Gelfand, A. , and Smith, A. (1990), “Sampling Based Approaches to Calculating Marginal Densities,” Journal of the American Statistical Association , 85, 398–409.
  • Geman, S. , and Geman, D. (1984), “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence , 6, 721–741.
  • Gershman, S. J. , Blei, D. M. , Norman, K. A. , and Sederberg, P. B. (2014), “Decomposing Spatiotemporal Brain Patterns into Topographic Latent Sources,” NeuroImage , 98, 91–102.
  • Ghahramani, Z. , and Jordan, M. I. (1997), “Factorial Hidden Markov Models,” Machine Learning , 29, 245–273.
  • Giordano, R. J. , Broderick, T. , and Jordan, M. I. (2015), “Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes,” in Neural Information Processing Systems , pp. 1441–1449.
  • Grimmer, J. (2011), “An Introduction to Bayesian Inference via Variational Approximations,” Political Analysis , 19, 32–47.
  • Hall, P. , Ormerod, J. , and Wand, M. (2011a), “Theory of Gaussian Variational Approximation for a Poisson Mixed Model,” Statistica Sinica , 21, 369–389.
  • Hall, P. , Pham, T. , Wand, M. , and Wang, S. (2011b), “Asymptotic Normality and Valid Inference for Gaussian Variational Approximation,” Annals of Statistics , 39, 2502–2532.
  • Harrison, L. , and Green, G. (2010), “A Bayesian Spatiotemporal Model for Very Large Data Sets,” NeuroImage , 50, 1126–1141.
  • Hastings, W. (1970), “Monte Carlo Sampling Methods using Markov Chains and their Applications,” Biometrika , 57, 97–109.
  • Hensman, J. , Fusi, N. , and Lawrence, N. (2013), “Gaussian Processes for Big Data,” in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence , Corvallis, OR : AUAI Press, pp. 282–290.
  • Hensman, J. , Rattray, M. , and Lawrence, N. (2012), “Fast Variational Inference in the Conjugate Exponential Family,” in Neural Information Processing Systems , pp. 2888–2896.
  • Hinton, G. , and Van Camp, D. (1993), “Keeping the Neural Networks Simple by Minimizing the Description Length of the Weights,” in Computational Learning Theory , pp. 5–13.
  • Hoffman, M. , Blei, D. , and Mimno, D. M. (2012), “Sparse Stochastic Inference for Latent Dirichlet Allocation,” in Proceedings of the 29th International Conference on Machine Learning (ICML-12), eds. J. Langford and J. Pineau, New York: ACM, pp. 1599–1606.
  • Hoffman, M. D. , Blei, D. , Wang, C. , and Paisley, J. (2013), “Stochastic Variational Inference,” Journal of Machine Learning Research , 14, 1303–1347.
  • Hoffman, M. D. , and Blei, D. M. (2015), “Structured Stochastic Variational Inference,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (Vol. 38), eds. G. Lebanon and S. V. N. Vishwanathan, San Diego, CA: Proceedings of Machine Learning Research, pp. 361–369.
  • Hoffman, M. D. , and Gelman, A. (2014), “The No-U-turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo,” The Journal of Machine Learning Research , 15, 1593–1623.
  • Honkela, A. , Tornio, M. , Raiko, T. , and Karhunen, J. (2008), “Natural Conjugate Gradient in Variational Inference,” in Neural Information Processing, eds. J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, New York : Springer, pp. 305–314.
  • Jaakkola, T. , and Jordan, M. I. (1996), “Computing Upper and Lower Bounds on Likelihoods in Intractable Networks,” in Uncertainty in Artificial Intelligence , pp. 340–348.
  • ——— (1997), “A Variational Approach to Bayesian Logistic Regression Models and their Extensions,” in Artificial Intelligence and Statistics , pp. 1–12.
  • ——— (2000), “Bayesian Parameter Estimation via Variational Methods,” Statistics and Computing , 10, 25–37.
  • Ji, C. , Shen, H. , and West, M. (2010), “Bounded Approximations for Marginal Likelihoods,” Technical Report, Duke University.
  • Johnson, M. , and Willsky, A. (2014), “Stochastic Variational Inference for Bayesian Time Series Models,” in International Conference on Machine Learning , pp. 1854–1862.
  • Jojic, N. , and Frey, B. (2001), “Learning Flexible Sprites in Video Layers,” in Computer Vision and Pattern Recognition , pp. 1–8.
  • Jojic, V. , Jojic, N. , Meek, C. , Geiger, D. , Siepel, A. , Haussler, D. , and Heckerman, D. (2004), “Efficient Approximations for Learning Phylogenetic HMM Models from Data,” Bioinformatics , 20, 161–168.
  • Jordan, M. I. , Ghahramani, Z. , Jaakkola, T. , and Saul, L. (1999), “An Introduction to Variational Methods for Graphical Models,” Machine Learning , 37, 183–233.
  • Khan, M. E. , Bouchard, G. , Murphy, K. P. , and Marlin, B. M. (2010), “Variational Bounds for Mixed-Data Factor Analysis,” in Neural Information Processing Systems , pp. 1108–1116.
  • Kiebel, S. , Daunizeau, J. , Phillips, C. , and Friston, K. (2008), “Variational Bayesian Inversion of the Equivalent Current Dipole Model in EEG/MEG,” NeuroImage , 39, 728–741.
  • Kingma, D. , and Welling, M. (2014), “Auto-Encoding Variational Bayes,” in Proceedings of the 2nd International Conference on Learning Representations (ICLR) .
  • Knowles, D. , and Minka, T. (2011), “Non-Conjugate Variational Message Passing for Multinomial and Binary Regression,” in Neural Information Processing Systems , pp. 1701–1709.
  • Kucukelbir, A. , Ranganath, R. , Gelman, A. , and Blei, D. (2015), “Automatic Variational Inference in Stan,” in Neural Information Processing Systems , pp. 568–576.
  • Kucukelbir, A. , Tran, D. , Ranganath, R. , Gelman, A. , and Blei, D. M. (2017), “Automatic Differentiation Variational Inference,” Journal of Machine Learning Research , 18, 1–45.
  • Kullback, S. , and Leibler, R. (1951), “On Information and Sufficiency,” The Annals of Mathematical Statistics , 22, 79–86.
  • Kurihara, K. , and Sato, T. (2006), “Variational Bayesian Grammar Induction for Natural Language,” in Grammatical Inference: Algorithms and Applications , New York : Springer, pp. 84–96.
  • Kushner, H. , and Yin, G. (1997), Stochastic Approximation Algorithms and Applications , New York : Springer.
  • Lashkari, D. , Sridharan, R. , Vul, E. , Hsieh, P. , Kanwisher, N. , and Golland, P. (2012), “Search for Patterns of Functional Specificity in the Brain: A Nonparametric Hierarchical Bayesian Model for Group fMRI Data,” NeuroImage , 59, 1348–1368.
  • Lauritzen, S. , and Spiegelhalter, D. (1988), “Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems,” Journal of the Royal Statistical Society , Series B, 50, 157–224.
  • Le Cun, Y. , and Bottou, L. (2004), “Large Scale Online Learning,” in Neural Information Processing Systems , pp. 217–224.
  • Leisink, M. , and Kappen, H. (2001), “A Tighter Bound for Graphical Models,” Neural Computation , 13, 2149–2171.
  • Liang, P. , Jordan, M. I. , and Klein, D. (2009), “Probabilistic Grammars and Hierarchical Dirichlet Processes,” in The Handbook of Applied Bayesian Analysis , eds. A. O’Hagan , and M. West , New York : Oxford University Press, pp. 776–819.
  • Liang, P. , Petrov, S. , Klein, D. , and Jordan, M. I. (2007), “The Infinite PCFG using Hierarchical Dirichlet Processes,” in Empirical Methods in Natural Language Processing , pp. 688–697.
  • Likas, A. , and Galatsanos, N. (2004), “A Variational Approach for Bayesian Blind Image Deconvolution,” IEEE Transactions on Signal Processing , 52, 2222–2233.
  • Logsdon, B. , Hoffman, G. , and Mezey, J. (2010), “A Variational Bayes Algorithm for Fast and Accurate Multiple Locus Genome-Wide Association Analysis,” BMC Bioinformatics , 11, 58.
  • MacKay, D. J. (1997), “Ensemble Learning for Hidden Markov Models,” unpublished manuscript, available at http://www.inference.eng.cam.ac.uk/mackay/ensemblePaper.pdf .
  • Manning, J. R. , Ranganath, R. , Norman, K. A. , and Blei, D. M. (2014), “Topographic Factor Analysis: A Bayesian Model for Inferring Brain Networks from Neural Data,” PLoS ONE , 9, e94914.
  • Marlin, B. M. , Khan, M. E. , and Murphy, K. P. (2011), “Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models,” in International Conference on Machine Learning , pp. 633–640.
  • McGrory, C. A. , and Titterington, D. M. (2007), “Variational Approximations in Bayesian Model Selection for Finite Mixture Distributions,” Computational Statistics and Data Analysis , 51, 5352–5367.
  • Metropolis, N. , Rosenbluth, A. , Rosenbluth, M. , Teller, A. , and Teller, E. (1953), “Equation of State Calculations by Fast Computing Machines,” Journal of Chemical Physics , 21, 1087–1092.
  • Minka, T. P. (2001), “Expectation Propagation for Approximate Bayesian Inference,” in Uncertainty in Artificial Intelligence , pp. 362–369.
  • ——— (2005), “Divergence Measures and Message Passing,” Technical Report, Microsoft Research.
  • Minka, T. , Winn, J. , Guiver, J. , Webster, S. , Zaykov, Y. , Yangel, B. , Spengler, A. , and Bronskill, J. (2014), Infer.NET 2.6. Cambridge, UK : Microsoft Research.
  • Naseem, T. , Chen, H. , Barzilay, R. , and Johnson, M. (2010), “Using Universal Linguistic Knowledge to Guide Grammar Induction,” in Empirical Methods in Natural Language Processing , pp. 1234–1244.
  • Nathoo, F. , Babul, A. , Moiseev, A. , Virji-Babul, N. , and Beg, M. (2014), “A Variational Bayes Spatiotemporal Model for Electromagnetic Brain Mapping,” Biometrics , 70, 132–143.
  • Neal, R. M. , and Hinton, G. E. (1998), “A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants,” in Learning in Graphical Models , New York: Springer, pp. 355–368.
  • Neville, S. , Ormerod, J. , and Wand, M. (2014), “Mean Field Variational Bayes for Continuous Sparse Signal Shrinkage: Pitfalls and Remedies,” Electronic Journal of Statistics , 8, 1113–1151.
  • Nott, D. J. , Tan, S. L. , Villani, M. , and Kohn, R. (2012), “Regression Density Estimation with Variational Methods and Stochastic Approximation,” Journal of Computational and Graphical Statistics , 21, 797–820.
  • Opper, M. , and Winther, O. (2005), “Expectation Consistent Approximate Inference,” The Journal of Machine Learning Research , 6, 2177–2204.
  • Ormerod, J. , You, C. , and Muller, S. (2014), “A Variational Bayes Approach to Variable Selection,” unpublished manuscript, available at http://www.maths.usyd.edu.au/u/jormerod/JTOpapers/VariableSelectionFinal.pdf .
  • Paisley, J. , Blei, D. , and Jordan, M. I. (2012), “Variational Bayesian Inference with Stochastic Search,” in Proceedings of the 29th International Conference on International Conference on Machine Learning , Madison, WI: Omnipress, pp. 1363–1370.
  • Parisi, G. (1988), Statistical Field Theory , Melville, NY: Addison-Wesley.
  • Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference , San Francisco, CA: Morgan Kaufmann.
  • Penny, W. , Kiebel, S. , and Friston, K. (2003), “Variational Bayesian Inference for fMRI Time Series,” NeuroImage , 19, 727–741.
  • Penny, W. , Trujillo-Barreto, N. , and Friston, K. (2005), “Bayesian fMRI Time Series Analysis with Spatial Priors,” NeuroImage , 24, 350–362.
  • Peterson, C. , and Anderson, J. (1987), “A Mean Field Theory Learning Algorithm for Neural Networks,” Complex Systems , 1, 995–1019.
  • Raj, A. , Stephens, M. , and Pritchard, J. (2014), “fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets,” Genetics , 197, 573–589.
  • Ramos, F. , Upcroft, B. , Kumar, S. , and Durrant-Whyte, H. (2012), “A Bayesian Approach for Place Recognition,” Robotics and Autonomous Systems , 60, 487–497.
  • Ranganath, R. , Gerrish, S. , and Blei, D. (2014), “Black Box Variational Inference,” in Artificial Intelligence and Statistics , pp. 814–822.
  • Ranganath, R. , Tran, D. , and Blei, D. (2016), “Hierarchical Variational Models,” in International Conference on Machine Learning , pp. 324–333.
  • Regier, J. , Miller, A. , McAuliffe, J. , Adams, R. , Hoffman, M. , Lang, D. , Schlegel, D. , and Prabhat (2015), “Celeste: Variational Inference for a Generative Model of Astronomical Images,” in International Conference on Machine Learning , pp. 2095–2103.
  • Reyes-Gomez, M. , Ellis, D. , and Jojic, N. (2004), “Multiband Audio Modeling for Single-Channel Acoustic Source Separation,” in Acoustics, Speech, and Signal Processing , pp. 641–644.
  • Rezende, D. J. , Mohamed, S. , and Wierstra, D. (2014), “Stochastic Backpropagation and Approximate Inference in Deep Generative Models,” in Proceedings of the 31st International Conference on Machine Learning (Vol. 32), eds. E. P. Xing and T. Jebara, Beijing, China: Proceedings of Machine Learning Research, pp. 1278–1286.
  • Robbins, H. , and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics , 22, 400–407.
  • Robert, C. , and Casella, G. (2004), Monte Carlo Statistical Methods (Springer Texts in Statistics) , New York : Springer-Verlag.
  • Roberts, S. , Guilford, T. , Rezek, I. , and Biro, D. (2004), “Positional Entropy During Pigeon Homing I: Application of Bayesian Latent State Modelling,” Journal of Theoretical Biology , 227, 39–50.
  • Roberts, S. , and Penny, W. (2002), “Variational Bayes for Generalized Autoregressive Models,” IEEE Transactions on Signal Processing , 50, 2245–2257.
  • Rohde, D. , and Wand, M. (2016), “Semiparametric Mean Field Variational Bayes: General Principles and Numerical Issues,” Journal of Machine Learning Research , 17, 1–47.
  • Salimans, T. , Kingma, D. , and Welling, M. (2015), “Markov Chain Monte Carlo and Variational Inference: Bridging the Gap,” in International Conference on Machine Learning , pp. 1218–1226.
  • Salimans, T. , and Knowles, D. (2014), “On using Control Variates with Stochastic Approximation for Variational Bayes,” arXiv preprint, arXiv:1401.1022. Available at https://arxiv.org/abs/1401.1022
  • Sanguinetti, G. , Lawrence, N. , and Rattray, M. (2006), “Probabilistic Inference of Transcription Factor Concentrations and Gene-Specific Regulatory Activities,” Bioinformatics , 22, 2775–2781.
  • Sato, M. (2001), “Online Model Selection Based on the Variational Bayes,” Neural Computation , 13, 1649–1681.
  • Sato, M. , Yoshioka, T. , Kajihara, S. , Toyama, K. , Goda, N. , Doya, K. , and Kawato, M. (2004), “Hierarchical Bayesian Estimation for MEG Inverse Problem,” NeuroImage , 23, 806–826.
  • Saul, L. , and Jordan, M. I. (1996), “Exploiting Tractable Substructures in Intractable Networks,” in Neural Information Processing Systems , pp. 486–492.
  • Saul, L. K. , Jaakkola, T. , and Jordan, M. I. (1996), “Mean Field Theory for Sigmoid Belief Networks,” Journal of Artificial Intelligence Research , 4, 61–76.
  • Spall, J. (2003), Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , New York : Wiley.
  • Stan Development Team (2015), Stan Modeling Language Users Guide and Reference Manual, Version 2.8.0. New York: Columbia University.
  • Stegle, O. , Parts, L. , Durbin, R. , and Winn, J. (2010), “A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies,” PLoS Computational Biology , 6, e1000770.
  • Sudderth, E. B. , and Jordan, M. I. (2009), “Shared Segmentation of Natural Scenes using Dependent Pitman-Yor Processes,” in Neural Information Processing Systems , pp. 1585–1592.
  • Sung, J. , Ghahramani, Z. , and Bang, Y. (2008), “Latent-Space Variational Bayes,” IEEE Transactions on Pattern Analysis and Machine Intelligence , 30, 2236–2242.
  • Sykacek, P. , Roberts, S. , and Stokes, M. (2004), “Adaptive BCI Based on Variational Bayesian Kalman Filtering: An Empirical Evaluation,” IEEE Transactions on Biomedical Engineering , 51, 719–727.
  • Tan, L. , and Nott, D. (2013), “Variational Inference for Generalized Linear Mixed Models using Partially Noncentered Parametrizations,” Statistical Science , 28, 168–188.
  • Tan, L. , and Nott, D. (2014), “A Stochastic Variational Framework for Fitting and Diagnosing Generalized Linear Mixed Models,” Bayesian Analysis , 9, 963–1004.
  • Tan, L. , and Nott, D. (2018), “Gaussian Variational Approximation with Sparse Precision Matrices,” Statistics and Computing , 28, 259–275. Available at https://doi.org/10.1007/s11222-017-9729-7
  • Tipping, M. , and Lawrence, N. (2005), “Variational Inference for Student-t Models: Robust Bayesian Interpolation and Generalised Component Analysis,” Neurocomputing , 69, 123–141.
  • Titsias, M. , and Lawrence, N. (2010), “Bayesian Gaussian Process Latent Variable Model,” in Artificial Intelligence and Statistics , pp. 844–851.
  • Titsias, M. , and Lázaro-Gredilla, M. (2014), “Doubly Stochastic Variational Bayes for Non-Conjugate Inference,” in International Conference on Machine Learning , pp. 1971–1979.
  • Tran, D. , Ranganath, R. , and Blei, D. M. (2016), “The Variational Gaussian Process,” in International Conference on Learning Representations , pp. 1–4.
  • Ueda, N. , and Ghahramani, Z. (2002), “Bayesian Model Search for Mixture Models Based on Optimizing Variational Bounds,” Neural Networks , 15, 1223–1241.
  • Van Den Broek, B. , Wiegerinck, W. , and Kappen, B. (2008), “Graphical Model Inference in Optimal Control of Stochastic Multi-Agent Systems,” Journal of Artificial Intelligence Research , 32, 95–122.
  • Vermaak, J. , Lawrence, N. D. , and Pérez, P. (2003), “Variational Inference for Visual Tracking,” in Computer Vision and Pattern Recognition , pp. 1–8.
  • Villegas, M. , Paredes, R. , and Thomee, B. (2013), “Overview of the ImageCLEF 2013 Scalable Concept Image Annotation Subtask,” in CLEF Evaluation Labs and Workshop , pp. 308–328.
  • Wainwright, M. J. , and Jordan, M. I. (2008), “Graphical Models, Exponential Families, and Variational Inference,” Foundations and Trends in Machine Learning , 1, 1–305.
  • Wand, M. (2014), “Fully Simplified Multivariate Normal Updates in Non-Conjugate Variational Message Passing,” Journal of Machine Learning Research , 15, 1351–1369.
  • Wand, M. , Ormerod, J. , Padoan, S. , and Frühwirth, R. (2011), “Mean Field Variational Bayes for Elaborate Distributions,” Bayesian Analysis , 6, 847–900.
  • Wang, B. , and Titterington, D. (2005), “Inadequacy of Interval Estimates Corresponding to Variational Bayesian Approximations,” in Artificial Intelligence and Statistics , pp. 373–380.
  • Wang, B. , and Titterington, D. (2006), “Convergence Properties of a General Algorithm for Calculating Variational Bayesian Estimates for a Normal Mixture Model,” Bayesian Analysis , 1, 625–650.
  • Wang, C. , and Blei, D. (2013), “Variational Inference in Nonconjugate Models,” Journal of Machine Learning Research , 14, 1005–1031.
  • Wang, C. , and Blei, D. (2015), “A General Method for Robust Bayesian Modeling,” arXiv preprint, arXiv:1510.05078. Available at https://arxiv.org/abs/1510.05078
  • Wang, P. , and Blunsom, P. (2013), “Collapsed Variational Bayesian Inference for Hidden Markov Models,” in Artificial Intelligence and Statistics , pp. 599–607.
  • Wang, Y. , and Mori, G. (2009), “Human Action Recognition by Semilatent Topic Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence , 31, 1762–1774.
  • Waterhouse, S. , MacKay, D. , and Robinson, T. (1996), “Bayesian Methods for Mixtures of Experts,” in Neural Information Processing Systems , pp. 351–357.
  • Welling, M. , and Teh, Y. (2011), “Bayesian Learning via Stochastic Gradient Langevin Dynamics,” in International Conference on Machine Learning , pp. 681–688.
  • Westling, T. , and McCormick, T. H. (2015), “Establishing Consistency and Improving Uncertainty Estimates of Variational Inference Through M-estimation,” arXiv preprint, arXiv:1510.08151. Available at https://arxiv.org/abs/1510.08151
  • Wiggins, C. , and Hofman, J. (2008), “Bayesian Approach to Network Modularity,” Physical Review Letters , 100, 258701.
  • Wingate, D. , and Weber, T. (2013), “Automated Variational Inference in Probabilistic Programming,” arXiv preprint, arXiv:1301.1299. Available at https://arxiv.org/abs/1301.1299
  • Winn, J. , and Bishop, C. (2005), “Variational Message Passing,” Journal of Machine Learning Research , 6, 661–694.
  • Wipf, D. , and Nagarajan, S. (2009), “A Unified Bayesian Framework for MEG/EEG Source Imaging,” NeuroImage , 44, 947–966.
  • Woolrich, M. , Behrens, T. , Beckmann, C. , Jenkinson, M. , and Smith, S. (2004), “Multilevel Linear Modeling for fMRI Group Analysis using Bayesian Inference,” NeuroImage , 21, 1732–1747.
  • Xing, E. , Wu, W. , Jordan, M. I. , and Karp, R. (2004), “LOGOS: A Modular Bayesian Model for de novo Motif Detection,” Journal of Bioinformatics and Computational Biology , 2, 127–154.
  • Yedidia, J. S. , Freeman, W. T. , and Weiss, Y. (2001), “Generalized Belief Propagation,” in Neural Information Processing Systems , pp. 689–695.
  • Yogatama, D. , Wang, C. , Routledge, B. , Smith, N. A. , and Xing, E. (2014), “Dynamic Language Models for Streaming Text,” Transactions of the Association for Computational Linguistics , 2, 181–192.
  • You, C. , Ormerod, J. , and Muller, S. (2014), “On Variational Bayes Estimation and Variational Information Criteria for Linear Regression Models,” Australian & New Zealand Journal of Statistics , 56, 73–87.
  • Yu, T. , and Wu, Y. (2005), “Decentralized Multiple Target Tracking using Netted Collaborative Autonomous Trackers,” in Computer Vision and Pattern Recognition , pp. 939–946.
  • Zumer, J. , Attias, H. , Sekihara, K. , and Nagarajan, S. (2007), “A Probabilistic Algorithm Integrating Source Localization and Noise Suppression for MEG and EEG Data,” NeuroImage , 37, 102–115.
