ABSTRACT
Variational methods are attractive for Bayesian inference when exact inference is impractical. They approximate a target distribution (either the posterior or an augmented posterior) using a simpler distribution that is selected to balance accuracy with computational feasibility. Here, we approximate an element-wise parametric transformation of the target distribution as multivariate Gaussian or skew-normal. Approximations of this kind are implicit copula models for the original parameters, with a Gaussian or skew-normal copula function and flexible parametric margins. A key observation is that their adoption can improve the accuracy of variational inference in high dimensions at limited or no additional computational cost. We consider the Yeo–Johnson and inverse G&H transformations, along with sparse factor structures for the scale matrix of the Gaussian or skew-normal. We also show how to implement efficient reparameterization gradient methods for these copula-based approximations. The efficacy of the approach is illustrated by computing posterior inference for three different models using six real datasets. In each case, we show that our proposed copula model distributions are more accurate variational approximations than Gaussian or skew-normal distributions, with little or no increase in computational cost. Supplementary materials, comprising an online appendix, MATLAB code to implement the method, and the datasets employed, are available online.
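To make the construction concrete, the following is a minimal Python sketch of drawing from a Gaussian copula approximation built through the Yeo–Johnson transformation. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names, the restriction of each transformation parameter to (0, 2), and the factor form B B' + diag(d)^2 for the scale matrix are choices made here for the example.

```python
import numpy as np

def yeo_johnson(theta, lam):
    """Element-wise Yeo-Johnson transform (assumes 0 < lam < 2)."""
    theta = np.asarray(theta, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), theta.shape)
    out = np.empty_like(theta)
    pos = theta >= 0
    out[pos] = ((theta[pos] + 1.0) ** lam[pos] - 1.0) / lam[pos]
    out[~pos] = -((1.0 - theta[~pos]) ** (2.0 - lam[~pos]) - 1.0) / (2.0 - lam[~pos])
    return out

def yeo_johnson_inverse(psi, lam):
    """Element-wise inverse of the Yeo-Johnson transform (assumes 0 < lam < 2)."""
    psi = np.asarray(psi, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), psi.shape)
    out = np.empty_like(psi)
    pos = psi >= 0
    out[pos] = (lam[pos] * psi[pos] + 1.0) ** (1.0 / lam[pos]) - 1.0
    out[~pos] = 1.0 - (1.0 - (2.0 - lam[~pos]) * psi[~pos]) ** (1.0 / (2.0 - lam[~pos]))
    return out

def sample_copula_approx(mu, B, d, lam, n_draws, rng):
    """Draw from the approximation via the reparameterization
    psi = mu + B z + d * eps, theta = t^{-1}(psi),
    so that psi is Gaussian with covariance B B' + diag(d)^2 (assumed factor form)."""
    m, k = B.shape
    z = rng.standard_normal((n_draws, k))    # factor shocks
    eps = rng.standard_normal((n_draws, m))  # idiosyncratic shocks
    psi = mu + z @ B.T + eps * d             # Gaussian draws on the transformed scale
    return yeo_johnson_inverse(psi, lam)     # map back to the original parameters

# Hypothetical usage: 5 parameters, 2 factors.
rng = np.random.default_rng(0)
lam = np.array([0.5, 1.5, 1.0, 0.8, 1.2])
draws = sample_copula_approx(np.zeros(5), rng.standard_normal((5, 2)),
                             np.ones(5), lam, n_draws=1000, rng=rng)
```

Because the draws are a deterministic function of standard normal variates, the same mapping can be differentiated with respect to the variational parameters, which is what a reparameterization gradient requires. Setting every element of `lam` to 1 makes the transformation the identity and recovers an ordinary Gaussian approximation with factor covariance.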
Acknowledgments
The authors would like to thank Dr. Linda Tan for providing the MCMC output for the examples in Section 4, and Prof. Richard Gerlach and the review team for comments that helped improve the article.
Supplementary Materials
Supplementary materials contain:
smith_loaiza_maya_nott_webappend.pdf: An online appendix in three parts. Part A specifies the pair-copula used in Section 3.2; Part B derives the four derivatives in Appendix B; Part C details two key MATLAB functions provided.
smith_loaiza_maya_nott_code.zip: MATLAB code for implementing our method and reproducing the results in the article. README files are included.
Notes
1 Here the “vech” operator is the half-vectorization of a rectangular matrix, defined for an matrix A with n > K as with for .
2 Note that is the distribution function of evaluated at vt, and is the distribution function of evaluated at vs.
3 This is not to be confused with asymmetry of the marginal distributions.