References
- Agarwal, A., and Duchi, J. C. (2011), “Distributed Delayed Stochastic Optimization,” in Advances in Neural Information Processing Systems, pp. 873–881.
- Ahn, S., Korattikara, A., and Welling, M. (2012), “Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring,” arXiv no. 1206.6380.
- Andrieu, C., and Roberts, G. O. (2009), “The Pseudo-Marginal Approach for Efficient Monte Carlo Computations,” The Annals of Statistics, 37, 697–725. DOI: https://doi.org/10.1214/07-AOS574.
- Bardenet, R., Doucet, A., and Holmes, C. (2014), “Towards Scaling up Markov Chain Monte Carlo: An Adaptive Subsampling Approach,” in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 405–413.
- Bardenet, R., Doucet, A., and Holmes, C. (2015), “On Markov Chain Monte Carlo Methods for Tall Data,” arXiv no. 1505.02827.
- Bierkens, J., Fearnhead, P., and Roberts, G. (2019), “The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data,” The Annals of Statistics, 47, 1288–1320. DOI: https://doi.org/10.1214/18-AOS1715.
- Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017), “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, 112, 859–877. DOI: https://doi.org/10.1080/01621459.2017.1285773.
- Chen, H., Seita, D., Pan, X., and Canny, J. (2016), “An Efficient Minibatch Acceptance Test for Metropolis-Hastings,” arXiv no. 1610.06848.
- Chen, T., Fox, E., and Guestrin, C. (2014), “Stochastic Gradient Hamiltonian Monte Carlo,” in International Conference on Machine Learning, pp. 1683–1691.
- De Sa, C., Chen, V., and Wong, W. (2018), “Minibatch Gibbs Sampling on Large Graphical Models,” arXiv no. 1806.06086.
- Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., Yang, K., and Le, Q. V. (2012), “Large Scale Distributed Deep Networks,” in Advances in Neural Information Processing Systems, pp. 1223–1231.
- Duchi, J., Hazan, E., and Singer, Y. (2011), “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization,” Journal of Machine Learning Research, 12, 2121–2159.
- Ghorbani, B., Javadi, H., and Montanari, A. (2018), “An Instability in Variational Inference for Topic Models,” arXiv no. 1802.00568.
- Jacob, P. E., and Thiery, A. H. (2015), “On Nonnegative Unbiased Estimators,” The Annals of Statistics, 43, 769–784. DOI: https://doi.org/10.1214/15-AOS1311.
- Kingma, D. P., and Ba, J. (2014), “Adam: A Method for Stochastic Optimization,” arXiv no. 1412.6980.
- Korattikara, A., Chen, Y., and Welling, M. (2014), “Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget,” in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 181–189.
- Krizhevsky, A., and Hinton, G. (2009), “Learning Multiple Layers of Features From Tiny Images,” Technical Report, University of Toronto.
- Li, C., Chen, C., Carlson, D. E., and Carin, L. (2016), “Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16).
- Li, D., and Wong, W. H. (2017), “Mini-Batch Tempered MCMC,” arXiv no. 1707.09705.
- Maclaurin, D., and Adams, R. P. (2014), “Firefly Monte Carlo: Exact MCMC With Subsets of Data,” in UAI, pp. 543–552.
- Mukherjee, S. S., Sarkar, P., Wang, Y. X. R., and Yan, B. (2018), “Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues,” in Advances in Neural Information Processing Systems, pp. 10717–10727.
- Neal, R. M. (2011), “MCMC Using Hamiltonian Dynamics,” in Handbook of Markov Chain Monte Carlo (Vol. 2), eds. S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Boca Raton, FL: CRC Press, pp. 113–160.
- Neiswanger, W., Wang, C., and Xing, E. (2013), “Asymptotically Exact, Embarrassingly Parallel MCMC,” in UAI.
- Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2019), “Speeding up MCMC by Efficient Data Subsampling,” Journal of the American Statistical Association, 114, 831–843. DOI: https://doi.org/10.1080/01621459.2018.1448827.
- Raginsky, M., Rakhlin, A., and Telgarsky, M. (2017), “Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis,” arXiv no. 1702.03849.
- Robbins, H., and Monro, S. (1985), “A Stochastic Approximation Method,” in Herbert Robbins Selected Papers, Springer, pp. 102–109.
- Roberts, G. O., and Tweedie, R. L. (1996), “Exponential Convergence of Langevin Distributions and Their Discrete Approximations,” Bernoulli, 2, 341–363. DOI: https://doi.org/10.2307/3318418.
- Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., and McCulloch, R. E. (2016), “Bayes and Big Data: The Consensus Monte Carlo Algorithm,” International Journal of Management Science and Engineering Management, 11, 78–88. DOI: https://doi.org/10.1080/17509653.2016.1142191.
- Teh, Y. W., Thiery, A. H., and Vollmer, S. J. (2016), “Consistency and Fluctuations for Stochastic Gradient Langevin Dynamics,” Journal of Machine Learning Research, 17, 193–225.
- Tieleman, T., and Hinton, G. (2012), “Lecture 6.5-RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude,” COURSERA: Neural Networks for Machine Learning, 4, 26–31.
- Wang, X., and Dunson, D. B. (2013), “Parallelizing MCMC via Weierstrass Sampler,” arXiv no. 1312.4605.
- Welling, M., and Teh, Y. W. (2011), “Bayesian Learning via Stochastic Gradient Langevin Dynamics,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688.
- Xing, C., Arpit, D., Tsirigotis, C., and Bengio, Y. (2018), “A Walk With SGD,” arXiv no. 1802.08770.
- Ye, N., Zhu, Z., and Mantiuk, R. K. (2017), “Langevin Dynamics With Continuous Tempering for Training Deep Neural Networks,” arXiv no. 1703.04379.
- Yin, D., Pananjady, A., Lam, M., Papailiopoulos, D., Ramchandran, K., and Bartlett, P. (2018), “Gradient Diversity: A Key Ingredient for Scalable Distributed Learning,” in AISTATS (Vol. 84), pp. 1998–2007.
- Zhang, C., Öztireli, C., Mandt, S., and Salvi, G. (2019), “Active Mini-Batch Sampling Using Repulsive Point Processes,” in Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33), pp. 5741–5748. DOI: https://doi.org/10.1609/aaai.v33i01.33015741.