Theory and Methods

Mini-Batch Metropolis–Hastings With Reversible SGLD Proposal

Pages 386-394 | Received 05 Sep 2019, Accepted 06 Jun 2020, Published online: 14 Sep 2020

References

  • Agarwal, A., and Duchi, J. C. (2011), “Distributed Delayed Stochastic Optimization,” in Advances in Neural Information Processing Systems, pp. 873–881.
  • Ahn, S., Korattikara, A., and Welling, M. (2012), “Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring,” arXiv no. 1206.6380.
  • Andrieu, C., and Roberts, G. O. (2009), “The Pseudo-Marginal Approach for Efficient Monte Carlo Computations,” The Annals of Statistics, 37, 697–725. DOI: https://doi.org/10.1214/07-AOS574.
  • Bardenet, R., Doucet, A., and Holmes, C. (2014), “Towards Scaling up Markov Chain Monte Carlo: An Adaptive Subsampling Approach,” in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 405–413.
  • Bardenet, R., Doucet, A., and Holmes, C. (2015), “On Markov Chain Monte Carlo Methods for Tall Data,” arXiv no. 1505.02827.
  • Bierkens, J., Fearnhead, P., and Roberts, G. (2019), “The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data,” The Annals of Statistics, 47, 1288–1320. DOI: https://doi.org/10.1214/18-AOS1715.
  • Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017), “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, 112, 859–877. DOI: https://doi.org/10.1080/01621459.2017.1285773.
  • Chen, H., Seita, D., Pan, X., and Canny, J. (2016), “An Efficient Minibatch Acceptance Test for Metropolis-Hastings,” arXiv no. 1610.06848.
  • Chen, T., Fox, E., and Guestrin, C. (2014), “Stochastic Gradient Hamiltonian Monte Carlo,” in International Conference on Machine Learning, pp. 1683–1691.
  • De Sa, C., Chen, V., and Wong, W. (2018), “Minibatch Gibbs Sampling on Large Graphical Models,” arXiv no. 1806.06086.
  • Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., Yang, K., and Le, Q.V. (2012), “Large Scale Distributed Deep Networks,” in Advances in Neural Information Processing Systems, pp. 1223–1231.
  • Duchi, J., Hazan, E., and Singer, Y. (2011), “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization,” Journal of Machine Learning Research, 12, 2121–2159.
  • Ghorbani, B., Javadi, H., and Montanari, A. (2018), “An Instability in Variational Inference for Topic Models,” arXiv no. 1802.00568.
  • Jacob, P. E., and Thiery, A. H. (2015), “On Nonnegative Unbiased Estimators,” The Annals of Statistics, 43, 769–784. DOI: https://doi.org/10.1214/15-AOS1311.
  • Kingma, D. P., and Ba, J. (2014), “Adam: A Method for Stochastic Optimization,” arXiv no. 1412.6980.
  • Korattikara, A., Chen, Y., and Welling, M. (2014), “Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget,” in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 181–189.
  • Krizhevsky, A., and Hinton, G. (2009), “Learning Multiple Layers of Features From Tiny Images,” Technical Report, University of Toronto.
  • Li, C., Chen, C., Carlson, D. E., and Carin, L. (2016), “Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks,” in AAAI (Vol. 2), p. 4.
  • Li, D., and Wong, W. H. (2017), “Mini-Batch Tempered MCMC,” arXiv no. 1707.09705.
  • Maclaurin, D., and Adams, R. P. (2014), “Firefly Monte Carlo: Exact MCMC With Subsets of Data,” in UAI, pp. 543–552.
  • Mukherjee, S. S., Sarkar, P., Wang, Y. X. R., and Yan, B. (2018), “Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues,” in Advances in Neural Information Processing Systems, pp. 10717–10727.
  • Neal, R. M. (2011), “MCMC Using Hamiltonian Dynamics,” in Handbook of Markov Chain Monte Carlo (Vol. 2), eds. S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Boca Raton, FL: CRC Press, pp. 113–160.
  • Neiswanger, W., Wang, C., and Xing, E. (2013), “Asymptotically Exact, Embarrassingly Parallel MCMC,” in UAI.
  • Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2019), “Speeding up MCMC by Efficient Data Subsampling,” Journal of the American Statistical Association, 114, 831–843. DOI: https://doi.org/10.1080/01621459.2018.1448827.
  • Raginsky, M., Rakhlin, A., and Telgarsky, M. (2017), “Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis,” arXiv no. 1702.03849.
  • Robbins, H., and Monro, S. (1985), “A Stochastic Approximation Method,” in Herbert Robbins Selected Papers, Springer, pp. 102–109.
  • Roberts, G. O., and Tweedie, R. L. (1996), “Exponential Convergence of Langevin Distributions and Their Discrete Approximations,” Bernoulli, 2, 341–363. DOI: https://doi.org/10.2307/3318418.
  • Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., and McCulloch, R. E. (2016), “Bayes and Big Data: The Consensus Monte Carlo Algorithm,” International Journal of Management Science and Engineering Management, 11, 78–88. DOI: https://doi.org/10.1080/17509653.2016.1142191.
  • Teh, Y. W., Thiery, A. H., and Vollmer, S. J. (2016), “Consistency and Fluctuations for Stochastic Gradient Langevin Dynamics,” Journal of Machine Learning Research, 17, 193–225.
  • Tieleman, T., and Hinton, G. (2012), “Lecture 6.5-rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude,” COURSERA: Neural Networks for Machine Learning, 4, 26–31.
  • Wang, X., and Dunson, D. B. (2013), “Parallelizing MCMC via Weierstrass Sampler,” arXiv no. 1312.4605.
  • Welling, M., and Teh, Y. W. (2011), “Bayesian Learning via Stochastic Gradient Langevin Dynamics,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688.
  • Xing, C., Arpit, D., Tsirigotis, C., and Bengio, Y. (2018), “A Walk With SGD,” arXiv no. 1802.08770.
  • Ye, N., Zhu, Z., and Mantiuk, R. K. (2017), “Langevin Dynamics With Continuous Tempering for Training Deep Neural Networks,” arXiv no. 1703.04379.
  • Yin, D., Pananjady, A., Lam, M., Papailiopoulos, D., Ramchandran, K., and Bartlett, P. (2018), “Gradient Diversity: A Key Ingredient for Scalable Distributed Learning,” in AISTATS (Vol. 84), pp. 1998–2007.
  • Zhang, C., Öztireli, C., Mandt, S., and Salvi, G. (2019), “Active Mini-Batch Sampling Using Repulsive Point Processes,” in Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33), pp. 5741–5748. DOI: https://doi.org/10.1609/aaai.v33i01.33015741.
