Theory and Methods

Efficient Multimodal Sampling via Tempered Distribution Flow

Pages 1446-1460 | Received 27 Mar 2022, Accepted 15 Feb 2023, Published online: 26 May 2023

References

  • Ambrosio, L., Gigli, N., and Savaré, G. (2008), Gradient Flows: In Metric Spaces and in the Space of Probability Measures, Basel: Birkhäuser.
  • Bhattacharya, R. (1978), “Criteria for Recurrence and Existence of Invariant Measures for Multidimensional Diffusions,” The Annals of Probability, 6, 541–553. DOI: 10.1214/aop/1176995476.
  • Bond-Taylor, S., Leach, A., Long, Y., and Willcocks, C. G. (2021), “Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-based and Autoregressive Models,” arXiv preprint, arXiv:2103.04922.
  • Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011), Handbook of Markov Chain Monte Carlo, Boca Raton, FL: Chapman & Hall/CRC.
  • Che, T., Zhang, R., Sohl-Dickstein, J., Larochelle, H., Paull, L., Cao, Y., and Bengio, Y. (2020), “Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling,” in Advances in Neural Information Processing Systems (Vol. 33).
  • Cheng, X., Chatterji, N. S., Abbasi-Yadkori, Y., Bartlett, P. L., and Jordan, M. I. (2018), “Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting,” arXiv preprint arXiv:1805.01648.
  • Chwialkowski, K., Strathmann, H., and Gretton, A. (2016), “A Kernel Test of Goodness of Fit,” in International Conference on Machine Learning, pp. 2606–2615. PMLR.
  • Dinh, L., Krueger, D., and Bengio, Y. (2014), “NICE: Non-Linear Independent Components Estimation,” arXiv preprint arXiv:1410.8516.
  • Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016), “Density Estimation Using Real NVP,” arXiv preprint arXiv:1605.08803.
  • Dolatabadi, H. M., Erfani, S., and Leckie, C. (2020), “Invertible Generative Modeling Using Linear Rational Splines,” in International Conference on Artificial Intelligence and Statistics, pp. 4236–4246.
  • Dongarra, J., and Sullivan, F. (2000), “Guest Editors’ Introduction: The Top 10 Algorithms,” Computing in Science & Engineering, 2, 22–23. DOI: 10.1109/MCISE.2000.814652.
  • Dunson, D. B., and Johndrow, J. (2020), “The Hastings Algorithm at Fifty,” Biometrika, 107, 1–23. DOI: 10.1093/biomet/asz066.
  • Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019), “Neural Spline Flows,” in Advances in Neural Information Processing Systems (Vol. 32).
  • Earl, D. J., and Deem, M. W. (2005), “Parallel Tempering: Theory, Applications, and New Perspectives,” Physical Chemistry Chemical Physics, 7, 3910–3916. DOI: 10.1039/b509983h.
  • Falcioni, M., and Deem, M. W. (1999), “A Biased Monte Carlo Scheme for Zeolite Structure Solution,” The Journal of Chemical Physics, 110, 1754–1766. DOI: 10.1063/1.477812.
  • Gao, Y., Huang, J., Jiao, Y., Liu, J., Lu, X., and Yang, Z. (2022), “Deep Generative Learning via Euler Particle Transport,” in Mathematical and Scientific Machine Learning, pp. 336–368.
  • Gao, Y., Jiao, Y., Wang, Y., Wang, Y., Yang, C., and Zhang, S. (2019), “Deep Generative Learning via Variational Gradient Flow,” in International Conference on Machine Learning, pp. 2093–2101. PMLR.
  • Ge, R., Lee, H., and Risteski, A. (2018), “Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition,” arXiv preprint arXiv:1812.00793.
  • Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2014), Bayesian Data Analysis, Boca Raton, FL: Chapman & Hall/CRC.
  • Geyer, C. J. (1991), “Markov Chain Monte Carlo Maximum Likelihood,” in 23rd Symposium on the Interface.
  • Geyer, C. J., and Thompson, E. A. (1995), “Annealing Markov Chain Monte Carlo with Applications to Ancestral Inference,” Journal of the American Statistical Association, 90, 909–920. DOI: 10.1080/01621459.1995.10476590.
  • Gilks, W., Richardson, S., and Spiegelhalter, D. (1995), Markov Chain Monte Carlo in Practice, Boca Raton, FL: Chapman & Hall/CRC.
  • Goodfellow, I., Bengio, Y., and Courville, A. (2016), Deep Learning, Cambridge, MA: MIT Press.
  • Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. (2006), “A Kernel Method for the Two-Sample-Problem,” in Advances in Neural Information Processing Systems (Vol. 19).
  • Hastings, W. K. (1970), “Monte Carlo Sampling Methods using Markov Chains and Their Applications,” Biometrika, 57, 97–109. DOI: 10.1093/biomet/57.1.97.
  • Hoffman, M., Sountsov, P., Dillon, J. V., Langmore, I., Tran, D., and Vasudevan, S. (2019), “NeuTra-Lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport,” arXiv preprint, arXiv:1903.03704.
  • Huang, J., Jiao, Y., Kang, L., Liao, X., Liu, J., and Liu, Y. (2021), “Schrödinger–Föllmer Sampler: Sampling Without Ergodicity,” arXiv preprint arXiv:2106.10880.
  • Kingma, D. P., and Ba, J. (2015), “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations, pp. 1–13.
  • Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. (2016), “Improved Variational Inference with Inverse Autoregressive Flow,” in Advances in Neural Information Processing Systems (Vol. 29).
  • Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983), “Optimization by Simulated Annealing,” Science, 220, 671–680. DOI: 10.1126/science.220.4598.671.
  • Kofke, D. A. (2002), “On the Acceptance Probability of Replica-Exchange Monte Carlo Trials,” The Journal of Chemical Physics, 117, 6911–6914. DOI: 10.1063/1.1507776.
  • Kone, A., and Kofke, D. A. (2005), “Selection of Temperature Intervals for Parallel-Tempering Simulations,” The Journal of Chemical Physics, 122, 206101. DOI: 10.1063/1.1917749.
  • Levine, R. A., and Casella, G. (2001), “Implementations of the Monte Carlo EM Algorithm,” Journal of Computational and Graphical Statistics, 10, 422–439. DOI: 10.1198/106186001317115045.
  • Ley, C., and Swan, Y. (2013), “Stein’s Density Approach and Information Inequalities,” Electronic Communications in Probability, 18, 1–14. DOI: 10.1214/ECP.v18-2578.
  • Liu, Q., Xu, J., Jiang, R., and Wong, W. H. (2021), “Density Estimation Using Deep Generative Neural Networks,” Proceedings of the National Academy of Sciences, 118, e2101344118. DOI: 10.1073/pnas.2101344118.
  • Liu, Z., Luo, P., Wang, X., and Tang, X. (2015), “Deep Learning Face Attributes in the Wild,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738.
  • Marinari, E., and Parisi, G. (1992), “Simulated Tempering: A New Monte Carlo Scheme,” Europhysics Letters, 19, 451. DOI: 10.1209/0295-5075/19/6/002.
  • Marzouk, Y., Moselhy, T., Parno, M., and Spantini, A. (2016), “Sampling via Measure Transport: An Introduction,” in Handbook of Uncertainty Quantification, eds. R. Ghanem, D. Higdon, and H. Owhadi, pp. 785–825, Cham: Springer.
  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), “Equation of State Calculations by Fast Computing Machines,” The Journal of Chemical Physics, 21, 1087–1092. DOI: 10.1063/1.1699114.
  • Neal, R. M. (1996), “Sampling from Multimodal Distributions using Tempered Transitions,” Statistics and Computing, 6, 353–366. DOI: 10.1007/BF00143556.
  • Nelsen, R. B. (2006), An Introduction to Copulas, New York: Springer.
  • Owen, A. B. (2013), Monte Carlo Theory, Methods and Examples. Available at https://artowen.su.domains/mc/
  • Pang, B., Han, T., Nijkamp, E., Zhu, S.-C., and Wu, Y. N. (2020), “Learning Latent Space Energy-based Prior Model,” in Advances in Neural Information Processing Systems (Vol. 33).
  • Papamakarios, G., Pavlakou, T., and Murray, I. (2017), “Masked Autoregressive Flow for Density Estimation,” in Advances in Neural Information Processing Systems (Vol. 30).
  • Qiu, Y., and Wang, X. (2021), “ALMOND: Adaptive Latent Modeling and Optimization via Neural Networks and Langevin Diffusion,” Journal of the American Statistical Association, 116, 1224–1236. DOI: 10.1080/01621459.2019.1691563.
  • Raginsky, M., Rakhlin, A., and Telgarsky, M. (2017), “Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis,” in Conference on Learning Theory, pp. 1674–1703. PMLR.
  • Rezende, D., and Mohamed, S. (2015), “Variational Inference with Normalizing Flows,” in International Conference on Machine Learning, pp. 1530–1538. PMLR.
  • Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
  • Romano, Y., Sesia, M., and Candès, E. (2020), “Deep Knockoffs,” Journal of the American Statistical Association, 115, 1861–1872. DOI: 10.1080/01621459.2019.1660174.
  • Salakhutdinov, R. (2015), “Learning Deep Generative Models,” Annual Review of Statistics and Its Application, 2, 361–385. DOI: 10.1146/annurev-statistics-010814-020120.
  • Santambrogio, F. (2017), “{Euclidean, Metric, and Wasserstein} Gradient Flows: An Overview,” Bulletin of Mathematical Sciences, 7, 87–154.
  • Sullivan, T. J. (2015), Introduction to Uncertainty Quantification, Cham: Springer.
  • Sun, Y., Song, Q., and Liang, F. (2021), “Consistent Sparse Deep Learning: Theory and Computation,” Journal of the American Statistical Association, accepted. DOI: 10.1080/01621459.2021.1895175.
  • Swendsen, R. H., and Wang, J.-S. (1986), “Replica Monte Carlo Simulation of Spin-Glasses,” Physical Review Letters, 57, 2607. DOI: 10.1103/PhysRevLett.57.2607.
  • Tabak, E. G., and Turner, C. V. (2013), “A Family of Nonparametric Density Estimation Algorithms,” Communications on Pure and Applied Mathematics, 66, 145–164. DOI: 10.1002/cpa.21423.
  • Tabak, E. G., and Vanden-Eijnden, E. (2010), “Density Estimation by Dual Ascent of the Log-Likelihood,” Communications in Mathematical Sciences, 8, 217–233. DOI: 10.4310/CMS.2010.v8.n1.a11.
  • Vempala, S., and Wibisono, A. (2019), “Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices,” in Advances in Neural Information Processing Systems (Vol. 32).
  • Villani, C. (2009), Optimal Transport: Old and New, Berlin: Springer.
  • Vousden, W., Farr, W. M., and Mandel, I. (2016), “Dynamic Temperature Selection for Parallel Tempering in Markov Chain Monte Carlo Simulations,” Monthly Notices of the Royal Astronomical Society, 455, 1919–1937. DOI: 10.1093/mnras/stv2422.
  • Wei, G. C., and Tanner, M. A. (1990), “A Monte Carlo Implementation of the EM Algorithm and the Poor Man’s Data Augmentation Algorithms,” Journal of the American Statistical Association, 85, 699–704. DOI: 10.1080/01621459.1990.10474930.
  • Woodard, D. B., Schmidler, S. C., and Huber, M. (2009), “Conditions for Rapid Mixing of Parallel and Simulated Tempering on Multimodal Distributions,” The Annals of Applied Probability, 19, 617–640. DOI: 10.1214/08-AAP555.
  • Xiao, H., Rasul, K., and Vollgraf, R. (2017), “Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms,” arXiv preprint arXiv:1708.07747.
  • Yuan, Y., Deng, Y., Zhang, Y., and Qu, A. (2020), “Deep Learning from a Statistical Perspective,” Stat, 9, e294. DOI: 10.1002/sta4.294.
  • Zheng, Z. (2003), “On Swapping and Simulated Tempering Algorithms,” Stochastic Processes and their Applications, 104, 131–154. DOI: 10.1016/S0304-4149(02)00232-6.
  • Zhou, X., Jiao, Y., Liu, J., and Huang, J. (2021), “A Deep Generative Approach to Conditional Sampling,” Journal of the American Statistical Association, accepted. DOI: 10.1080/01621459.2021.2016424.
