Original Articles

A finite mixture approach to joint clustering of individuals and multivariate discrete outcomes

Pages 2186-2206 | Received 21 Jan 2016, Accepted 20 Apr 2017, Published online: 08 May 2017

