Journal of Computational and Graphical Statistics Volume 31, 2022 - Issue 1

Model Selection

Block-Wise Variable Selection for Clustering Via Latent States of Mixture Models

Beomseok Seo, Department of Statistics, The Pennsylvania State University, University Park, PA (corresponding author)

Lin Lin, Department of Statistics, The Pennsylvania State University, University Park, PA

Jia Li, Department of Statistics, The Pennsylvania State University, University Park, PA

Pages 138-150 | Received 06 Mar 2020, Accepted 03 Sep 2021, Published online: 17 Nov 2021

https://doi.org/10.1080/10618600.2021.1982724

References

Amit, I., Garber, M., Chevrier, N., Leite, A. P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J. K., Li, W., Zuk, O., Schubert, L. A., Birditt, B., Shay, T., Goren, A., Zhang, X., Smith, Z., Deering, R., McDonald, R. C., Cabili, M., Bernstein, B. E., Rinn, J. L., Meissner, A., Root, D. E., Hacohen, N., and Regev, A. (2009), “Unbiased Reconstruction of a Mammalian Transcriptional Network Mediating Pathogen Responses,” Science, 326, 257–263. DOI: 10.1126/science.1179050.
Andrews, J., and McNicholas, P. (2013), “vscc: Variable Selection for Clustering and Classification,” R package version 0.2.
Belkin, M., and Niyogi, P. (2002), “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering,” in Advances in Neural Information Processing Systems, pp. 585–591.
Ben-Hur, A., and Guyon, I. (2003), “Detecting Stable Clusters Using Principal Component Analysis,” in Functional Genomics, pp. 159–182. Springer.
Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Series B, 57, 289–300. DOI: 10.1111/j.2517-6161.1995.tb02031.x.
Bhattacharyya, A. (1943), “On a Measure of Divergence Between Two Statistical Populations Defined by Their Probability Distributions,” Bulletin of the Calcutta Mathematical Society, 35, 99–109.
Cai, D., Zhang, C., and He, X. (2010), “Unsupervised Feature Selection for Multi-Cluster Data,” Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 333–342. ACM. DOI: 10.1145/1835804.1835848.
Chang, W. (1983), “On Using Principal Components Before Separating a Mixture of Two Multivariate Normal Distributions,” Applied Statistics, 32, 267–275. DOI: 10.2307/2347949.
Chinchor, N. (1992), “The Statistical Significance of the MUC-4 Results,” in Proceedings of the 4th Conference on Message Understanding, McLean, VA. Stroudsburg, PA: Association for Computational Linguistics, pp. 30–50.
Constantinopoulos, C., Titsias, M. K., and Likas, A. (2006), “Bayesian Feature and Model Selection for Gaussian Mixture Models,” IEEE Transactions on Pattern Analysis & Machine Intelligence, 28, 1013–1018.
Davies, D. L., and Bouldin, D. W. (1979), “A Cluster Separation Measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1, 224–227. DOI: 10.1109/TPAMI.1979.4766909.
Dy, J. G., and Brodley, C. E. (2004), “Feature Selection for Unsupervised Learning,” Journal of Machine Learning Research, 5, 845–889.
Fop, M., and Murphy, T. B. (2018), “Variable Selection Methods for Model-Based Clustering,” Statistics Surveys, 12, 18–65. DOI: 10.1214/18-SS119.
Fraley, C., and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation,” Journal of the American Statistical Association, 97, 611–631. DOI: 10.1198/016214502760047131.
Galimberti, G., and Soffritti, G. (2007), “Model-Based Methods to Identify Multiple Cluster Structures in a Data Set,” Computational Statistics & Data Analysis, 52, 520–536.
Guyon, I., and Elisseeff, A. (2003), “An Introduction to Variable and Feature Selection,” Journal of Machine Learning Research, 3, 1157–1182.
Hackstadt, A. J., and Hess, A. M. (2009), “Filtering for Increased Power for Microarray Data Analysis,” BMC Bioinformatics, 10, 11. DOI: 10.1186/1471-2105-10-11.
He, X., Cai, D., and Niyogi, P. (2006), “Laplacian Score for Feature Selection,” in Advances in Neural Information Processing Systems, Vancouver, BC, Canada, eds. Yair Weiss, Cambridge, MA: The MIT Press, pp. 507–514.
Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J.-B., Lönnerberg, P., and Linnarsson, S. (2011), “Characterization of the Single-Cell Transcriptional Landscape by Highly Multiplex RNA-seq,” Genome Research, 21, 1160–1167. DOI: 10.1101/gr.110882.110.
Kohavi, R., and John, G. H. (1997), “Wrappers for Feature Subset Selection,” Artificial Intelligence, 97, 273–324. DOI: 10.1016/S0004-3702(97)00043-X.
Langfelder, P., Zhang, B., and Horvath, S. (2007), “Defining Clusters From a Hierarchical Cluster Tree: The Dynamic Tree Cut Package for R,” Bioinformatics, 24, 719–720. DOI: 10.1093/bioinformatics/btm563.
Lawler, E. L. (1985), The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization, Wiley-Interscience Series in Discrete Mathematics.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998), “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, 86, 2278–2324. DOI: 10.1109/5.726791.
Lee, H., and Li, J. (2012), “Variable Selection for Clustering by Separability Based on Ridgelines,” Journal of Computational and Graphical Statistics, 21, 315–336. DOI: 10.1080/10618600.2012.679226.
Li, J. (2005), “Clustering Based on a Multilayer Mixture Model,” Journal of Computational and Graphical Statistics, 14, 547–568. DOI: 10.1198/106186005X59586.
Li, J., Ray, S., and Lindsay, B. (2007), “A Nonparametric Statistical Approach to Clustering Via Mode Identification,” Journal of Machine Learning Research, 8, 1687–1723.
Li, J., Seo, B., and Lin, L. (2019), “Optimal Transport, Mean Partition, and Uncertainty Assessment in Cluster Analysis,” Statistical Analysis and Data Mining: The ASA Data Science Journal, 12, 359–377. DOI: 10.1002/sam.11418.
Li, Z., Yang, Y., Liu, J., Zhou, X., and Lu, H. (2012), “Unsupervised Feature Selection Using Nonnegative Spectral Analysis,” in Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada. Palo Alto, CA: The AAAI Press.
Lin, L., Chan, C., and West, M. (2016), “Discriminative Variable Subsets in Bayesian Classification With Mixture Models, With Application in Flow Cytometry Studies,” Biostatistics, 17, 40–53. DOI: 10.1093/biostatistics/kxv021.
Lin, L., and Li, J. (2017), “Clustering With Hidden Markov Model on Variable Blocks,” The Journal of Machine Learning Research, 18, 3913–3961.
Liu, H., Xu, M., Gu, H., Gupta, A., Lafferty, J., and Wasserman, L. (2011), “Forest Density Estimation,” Journal of Machine Learning Research, 12, 907–951.
Liu, J. S., Zhang, J. L., Palumbo, M. J., and Lawrence, C. E. (2003a), “Bayesian Clustering With Variable and Transformation Selections,” Bayesian Statistics, 7, 249–275.
Liu, T., Liu, S., Chen, Z., and Ma, W.-Y. (2003b), “An Evaluation on Feature Selection for Text Clustering,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 488–495.
Marbac, M., and Sedki, M. (2018), “VarSelLCM: An R/C++ Package for Variable Selection in Model-Based Clustering of Mixed-Data With Missing Values,” Bioinformatics, 35, 1255–1257. DOI: 10.1093/bioinformatics/bty786.
Marbac, M., and Vandewalle, V. (2019), “A Tractable Multi-Partitions Clustering,” Computational Statistics & Data Analysis, 132, 167–179.
Miao, J., and Niu, L. (2016), “A Survey on Feature Selection,” Procedia Computer Science, 91, 919–926. DOI: 10.1016/j.procs.2016.07.111.
Padilla, O. H. M., Sharpnack, J., Scott, J. G., and Tibshirani, R. J. (2017), “The DFS Fused Lasso: Linear-Time Denoising Over General Graphs,” Journal of Machine Learning Research, 18, 176–1.
Pan, W., and Shen, X. (2007), “Penalized Model-Based Clustering With Application to Variable Selection,” Journal of Machine Learning Research, 8, 1145–1164.
Qian, M., and Zhai, C. (2013), “Robust Unsupervised Feature Selection,” in Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China. Menlo Park, CA: AAAI Press.
Raftery, A., and Dean, N. (2006), “Variable Selection for Model-Based Clustering,” Journal of the American Statistical Association, 101, 168–178. DOI: 10.1198/016214506000000113.
Raileanu, L. E., and Stoffel, K. (2004), “Theoretical Comparison Between the Gini Index and Information Gain Criteria,” Annals of Mathematics and Artificial Intelligence, 41, 77–93. DOI: 10.1023/B:AMAI.0000018580.96245.c6.
Rand, W. M. (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850. DOI: 10.1080/01621459.1971.10482356.
Rousseeuw, P. J. (1987), “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis,” Journal of Computational and Applied Mathematics, 20, 53–65. DOI: 10.1016/0377-0427(87)90125-7.
Scrucca, L., and Raftery, A. E. (2018), “clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R,” Journal of Statistical Software, 84. DOI: 10.18637/jss.v084.i01.
Tang, F., Lao, K., and Surani, M. A. (2011), “Development and Applications of Single-Cell Transcriptome Analysis,” Nature Methods, 8, S6. DOI: 10.1038/nmeth.1557.
Thorndike, R. L. (1953), “Who Belongs in the Family?” Psychometrika, 18, 267–276. DOI: 10.1007/BF02289263.
Williams, G., Huang, J., Chen, X., Wang, Q., and Xiao, L. (2014), “wskm: Weighted k-Means Clustering,” R package version, 1:19.
Witten, D. M., and Tibshirani, R. (2010), “A Framework for Feature Selection in Clustering,” Journal of the American Statistical Association, 105, 713–726. DOI: 10.1198/jasa.2010.tm09415.
Witten, D. M., and Tibshirani, R. (2013), “sparcl: Perform Sparse Hierarchical Clustering and Sparse k-Means Clustering,” R package version, 1.
Wolfe, J. H. (1970), “Pattern Clustering by Multivariate Mixture Analysis,” Multivariate Behavioral Research, 5, 329–350. DOI: 10.1207/s15327906mbr0503_6.
Wu, C., Kwon, S., Shen, X., and Pan, W. (2016), “A New Algorithm and Theory for Penalized Regression-Based Clustering,” The Journal of Machine Learning Research, 17, 6479–6503.
Xie, B., Pan, W., and Shen, X. (2008), “Penalized Model-Based Clustering With Cluster-Specific Diagonal Covariance Matrices and Grouped Variables,” Electronic Journal of Statistics, 2, 168–212. DOI: 10.1214/08-EJS194.
Xie, J., Girshick, R., and Farhadi, A. (2016), “Unsupervised Deep Embedding for Clustering Analysis,” in International Conference on Machine Learning, pp. 478–487.
Yan, L., Yang, M., Guo, H., Yang, L., Wu, J., Li, R., Liu, P., Lian, Y., Zheng, X., Yan, J., et al. (2013), “Single-Cell RNA-seq Profiling of Human Preimplantation Embryos and Embryonic Stem Cells,” Nature Structural & Molecular Biology, 20, 1131–1139.
Yeung, K., and Ruzzo, W. (2001), “Principal Component Analysis for Clustering Gene Expression Data,” Bioinformatics, 17, 763–774. DOI: 10.1093/bioinformatics/17.9.763.
Yu, L., and Liu, H. (2003), “Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC. Menlo Park, CA: AAAI Press, pp. 856–863.
Zhu, P., Zhu, W., Hu, Q., Zhang, C., and Zuo, W. (2017), “Subspace Clustering Guided Unsupervised Feature Selection,” Pattern Recognition, 66, 364–374. DOI: 10.1016/j.patcog.2017.01.016.
