
A data-driven selection of the number of clusters in the Dirichlet allocation model via Bayesian mixture modelling

Pages 2848-2870 | Received 02 Feb 2019, Accepted 10 Jul 2019, Published online: 18 Jul 2019
 

ABSTRACT

In this paper, we consider a Bayesian mixture model in which the mixture weights can be integrated out, yielding a procedure in which the number of clusters is treated as an unknown quantity. To determine the clusters and estimate the parameters of interest, we develop an MCMC algorithm called the sequential data-driven allocation sampler. In this algorithm, a single observation has a non-zero probability of creating a new cluster, and a set of observations may create a new cluster through split-merge moves. The split-merge moves are built on a sequential allocation procedure that uses allocation probabilities, which are calculated from the Kullback–Leibler divergence between the posterior distribution given the previously allocated observations and the posterior distribution that also includes a ‘new’ observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.
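The sequential allocation step described above can be illustrated with a minimal sketch. The abstract specifies neither the likelihood nor the exact mapping from Kullback–Leibler divergence to allocation probabilities, so everything below rests on stated assumptions: univariate Normal data with known variance, a conjugate Normal prior on each cluster mean (the constants `SIGMA2`, `MU0` and `TAU02` are hypothetical), the divergence taken from the previous posterior to the updated posterior, and a hypothetical rule that weights each candidate cluster by exp(−KL). The helpers `posterior`, `kl_normal`, `allocation_probs` and `sequential_split` are illustrative names, not the authors' code.

```python
import math
import random
from typing import List, Sequence, Tuple

# ASSUMED model, not taken from the paper: x ~ N(mu, SIGMA2) with known
# variance, and a conjugate prior mu ~ N(MU0, TAU02) for each cluster mean.
SIGMA2 = 1.0   # hypothetical known observation variance
MU0 = 0.0      # hypothetical prior mean
TAU02 = 10.0   # hypothetical prior variance


def posterior(xs: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, variance) of the Normal posterior of mu given xs."""
    precision = 1.0 / TAU02 + len(xs) / SIGMA2
    mean = (MU0 / TAU02 + sum(xs) / SIGMA2) / precision
    return mean, 1.0 / precision


def kl_normal(m1: float, v1: float, m2: float, v2: float) -> float:
    """KL( N(m1, v1) || N(m2, v2) ) for univariate Normal distributions."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def allocation_probs(clusters: List[List[float]], x_new: float) -> List[float]:
    """HYPOTHETICAL rule: weight each cluster by exp(-KL(old || new)),
    where 'old' is the posterior from the cluster's current members and
    'new' is the posterior after adding x_new; then normalise."""
    weights = []
    for xs in clusters:
        m_old, v_old = posterior(xs)
        m_new, v_new = posterior(list(xs) + [x_new])
        weights.append(math.exp(-kl_normal(m_old, v_old, m_new, v_new)))
    total = sum(weights)
    return [w / total for w in weights]


def sequential_split(xs: Sequence[float], i: int, j: int,
                     rng: random.Random) -> List[List[float]]:
    """Seed two clusters with xs[i] and xs[j], then allocate the remaining
    observations one at a time, sampling from allocation_probs."""
    clusters = [[xs[i]], [xs[j]]]
    for k, x in enumerate(xs):
        if k in (i, j):
            continue
        p = allocation_probs(clusters, x)
        choice = 0 if rng.random() < p[0] else 1
        clusters[choice].append(x)
    return clusters
```

Under these assumptions the divergence is small when the new observation barely moves a cluster's posterior, so nearby clusters receive most of the probability mass. In a full split-merge sampler, a proposed split such as the one returned by `sequential_split` would typically still be accepted or rejected by a Metropolis–Hastings step, which this sketch omits.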


Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

C. A. B. Pereira thanks the Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq, for support [grant number 308776/2014-3].
