Abstract
In this paper we introduce a Bayesian mixture model with an unknown number of components for partitioning gene expression data. Inference on all unknown parameters is carried out with the proposed data-driven Markov chain Monte Carlo (MCMC) algorithm, which is essentially a Metropolis–Hastings-within-Gibbs sampler. The Metropolis–Hastings step moves the number of partitions k to a neighboring value (k − 1 or k + 1) through a pair of split-merge moves. Our splitting strategy is data-driven: allocation probabilities are computed from the marginal likelihood function given the previously allocated observations. Conditional on k, the partition labels are updated via Gibbs sampling. The two main advantages of the proposed algorithm are that it is easy to implement and that the acceptance probability of the split-merge moves depends only on the observed data. We examine the performance of the proposed algorithm on simulated data and then analyze two publicly available gene expression data sets.
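The data-driven split described above, in which each observation is allocated using marginal-likelihood-based probabilities computed from the observations already allocated, can be illustrated with a small sketch. The abstract does not give the model details, so the following is a hypothetical illustration and not the authors' implementation. It assumes a univariate Gaussian likelihood with known variance and a conjugate normal prior on each component mean, so that the ratio of marginal likelihoods reduces to a posterior predictive density. It also assumes the split is seeded by two randomly chosen "anchor" observations. The names `log_predictive` and `sequential_split` and all default hyperparameters are invented for this example.

```python
import math
import random


def log_predictive(x, n, s, mu0=0.0, tau0_sq=1.0, sigma_sq=1.0):
    """Log posterior predictive density of x given n points with sum s.

    Assumed (hypothetical) model: x ~ N(mu, sigma_sq), mu ~ N(mu0, tau0_sq).
    This equals log p(D + {x}) - log p(D), a ratio of marginal likelihoods.
    """
    prec = 1.0 / tau0_sq + n / sigma_sq            # posterior precision of mu
    mean = (mu0 / tau0_sq + s / sigma_sq) / prec   # posterior mean of mu
    var = 1.0 / prec + sigma_sq                    # predictive variance
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def sequential_split(data, members, rng):
    """Propose splitting one cluster (a list of indices into data) into two.

    Returns (labels, log_q): labels maps each index to sub-cluster 0 or 1,
    and log_q is the log probability of this allocation given the chosen
    anchors and visiting order (the proposal term for an MH ratio).
    """
    i, j = rng.sample(members, 2)          # hypothetical anchor choice
    labels = {i: 0, j: 1}
    n = [1, 1]                             # sizes of the two sub-clusters
    s = [data[i], data[j]]                 # running sums (sufficient stats)
    rest = [m for m in members if m not in (i, j)]
    rng.shuffle(rest)                      # random allocation order
    log_q = 0.0
    for m in rest:
        x = data[m]
        # Predictive density under each sub-cluster, given points so far.
        lp = [log_predictive(x, n[c], s[c]) for c in (0, 1)]
        top = max(lp)
        log_norm = top + math.log(sum(math.exp(v - top) for v in lp))
        log_p = [v - log_norm for v in lp]  # normalized log probabilities
        c = 0 if rng.random() < math.exp(log_p[0]) else 1
        log_q += log_p[c]
        labels[m] = c
        n[c] += 1
        s[c] += x
    return labels, log_q


rng = random.Random(0)
data = [-2.1, -1.9, -2.3, 2.0, 2.2, 1.8]
labels, log_q = sequential_split(data, list(range(len(data))), rng)
print(labels, log_q)
```

Because each observation is scored only against sub-clusters built from data already allocated, the proposal adapts to the observed data rather than splitting at random. Under these assumptions, `log_q` is the proposal term that a reversible split-merge sampler would place in its Metropolis–Hastings acceptance ratio.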
Acknowledgments
We thank the editor and the referees for their comments, suggestions and criticisms, which have helped to improve this article. The first author acknowledges the Brazilian institution CNPq. F. Louzada acknowledges the Brazilian institutions CNPq and FAPESP.
Disclosure statement
No potential conflict of interest was reported by the authors.