Abstract
In the big data era, there is a growing need to model the main features of large and non-trivial data sets. This paper proposes a Bayesian nonparametric prior for situations where the data are divided into different units with different densities, allowing information to be pooled across the groups. Leisen and Lijoi [(2011), ‘Vectors of Poisson–Dirichlet processes’, J. Multivariate Anal., 102, 482–495] introduced a bivariate vector of random probability measures with Poisson–Dirichlet marginals, where the dependence is induced through a Lévy copula. In this paper, the same approach is used to generalise such a vector to the multivariate setting. A first important contribution is the derivation of the Laplace functional transform, which is non-trivial in the multivariate setting. The Laplace transform is the basis for deriving the exchangeable partition probability function (EPPF) and, as a second contribution, we provide an expression of the EPPF for the multivariate setting. Finally, a novel Markov chain Monte Carlo (MCMC) algorithm for evaluating the EPPF is introduced and tested. In particular, numerical illustrations of the clustering behaviour of the new prior are provided.
Funding
The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement no. 630677.
Notes
1. Recall that a sequence of measures $(m_i)_{i \ge 1}$ converges, in the $w^{\#}$-topology, to a measure $m$ if and only if $m_i(A) \to m(A)$ for any bounded Borel set $A$ such that $m(\partial A) = 0$. See Daley and Vere-Jones (2003) for further details.