Theory and Methods

Skeleton Clustering: Dimension-Free Density-Aided Clustering

Pages 1124–1135 | Received 21 Apr 2021, Accepted 13 Jan 2023, Published online: 06 Mar 2023

References

  • Almodovar-Rivera, I. A., and Maitra, R. (2020), “Kernel-Estimated Nonparametric Overlap-based Syncytial Clustering,” Journal of Machine Learning Research, 21, 1–54.
  • Amenta, N., Attali, D., and Devillers, O. (2007), “Complexity of Delaunay Triangulation for Points on Lower-Dimensional Polyhedra,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, pp. 1106–1113, USA: Society for Industrial and Applied Mathematics.
  • Aragam, B., Dan, C., Xing, E. P., and Ravikumar, P. (2020), “Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering,” The Annals of Statistics, 48, 2277–2302. DOI: 10.1214/19-AOS1887.
  • Azzalini, A., and Torelli, N. (2007), “Clustering via Nonparametric Density Estimation,” Statistics and Computing, 17, 71–80.
  • Bachem, O., Lucic, M., and Krause, A. (2017), “Practical Coreset Constructions for Machine Learning,” arXiv preprint. DOI: 10.48550/arXiv.1703.06476.
  • Baudry, J.-P., Raftery, A. E., Celeux, G., Lo, K., and Gottardo, R. (2010), “Combining Mixture Components for Clustering,” Journal of Computational and Graphical Statistics, 19, 332–353. DOI: 10.1198/jcgs.2010.08111.
  • Bentley, J. L. (1975), “Multidimensional Binary Search Trees Used for Associative Searching,” Communications of the ACM, 18, 509–517.
  • de Berg, M., Cheong, O., van Kreveld, M., and Overmars, M. (2008), Computational Geometry: Algorithms and Applications (3rd ed.), Santa Clara, CA: Springer-Verlag TELOS.
  • Brinkman, R. R., Gasparetto, M., Lee, S. J. J., Ribickas, A. J., Perkins, J., Janssen, W., Smiley, R., and Smith, C. (2007), “High-Content Flow Cytometry and Temporal Data Analysis for Defining a Cellular Signature of Graft-Versus-Host Disease,” Biology of Blood and Marrow Transplantation, 13, 691–700.
  • Campello, R. J., Moulavi, D., Zimek, A., and Sander, J. (2015), “Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection,” ACM Transactions on Knowledge Discovery from Data, 10, 1–51.
  • Carreira-Perpinán, M. A. (2015), “A Review of Mean-Shift Algorithms for Clustering,” arXiv preprint arXiv:1503.00687.
  • Chacón, J. E. (2015), “A Population Background for Nonparametric Density-Based Clustering,” Statistical Science, 30, 518–532. DOI: 10.1214/15-STS526.
  • Chacón, J. E. (2019), “Mixture Model Modal Clustering,” Advances in Data Analysis and Classification, 13, 379–404.
  • Chacón, J. E., and Duong, T. (2013), “Data-Driven Density Derivative Estimation, with Applications to Nonparametric Clustering and Bump Hunting,” Electronic Journal of Statistics, 7, 499–532.
  • Chacón, J. E., Duong, T., and Wand, M. P. (2011), “Asymptotics for General Multivariate Kernel Density Derivative Estimators,” Statistica Sinica, 21, 807–840.
  • Chaudhuri, K., and Dasgupta, S. (2010), “Rates of Convergence for the Cluster Tree,” in Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, NIPS’10, pp. 343–351, Red Hook, NY, USA. Curran Associates Inc.
  • Chaudhuri, K., Dasgupta, S., Kpotufe, S., and von Luxburg, U. (2014), “Consistent Procedures for Cluster Tree Estimation and Pruning,” IEEE Transactions on Information Theory, 60, 7900–7912.
  • Chazelle, B. (1993), “An Optimal Convex Hull Algorithm in Any Fixed Dimension,” Discrete & Computational Geometry, 10, 377–409. DOI: 10.1007/BF02573985.
  • Chen, Y.-C. (2017), “A Tutorial on Kernel Density Estimation and Recent Advances,” Biostatistics and Epidemiology, 1, 161–187.
  • Chen, Y. C., Genovese, C. R., and Wasserman, L. (2016), “A Comprehensive Approach to Mode Clustering,” Electronic Journal of Statistics, 10, 210–241.
  • Cheng, Y. (1995), “Mean Shift, Mode Seeking, and Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790–799.
  • Cuevas, A., Febrero, M., and Fraiman, R. (2000), “Estimating the Number of Clusters,” Canadian Journal of Statistics, 28, 367–382.
  • Cuevas, A., Febrero, M., and Fraiman, R. (2001), “Cluster Analysis: A Further Approach based on Density Estimation,” Computational Statistics and Data Analysis, 36, 441–459.
  • Delaunay, B. (1934), “Sur la sphère vide. A la mémoire de Georges Voronoï,” Bulletin de l’Académie des Sciences de l’URSS. Classe des sciences mathématiques et naturelles, 6, 793–800.
  • Eldridge, J., Belkin, M., and Wang, Y. (2015), “Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering,” Proceedings of Machine Learning Research (Vol. 40), pp. 588–606, Paris, France, 03–06 Jul 2015, PMLR.
  • Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996), “A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231, AAAI Press.
  • Fraley, C., and Raftery, A. E. (2002), “Model-based Clustering, Discriminant Analysis, and Density Estimation,” Journal of the American Statistical Association, 97, 611–631.
  • Fred, A. L. N., and Jain, A. K. (2005), “Combining Multiple Clusterings Using Evidence Accumulation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 835–850. DOI: 10.1109/TPAMI.2005.113.
  • Fukunaga, K., and Hostetler, L. (1975), “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition,” IEEE Transactions on Information Theory, 21, 32–40.
  • Hartigan, J. A., and Hartigan, P. M. (1985), “The Dip Test of Unimodality,” The Annals of Statistics, 13, 70–84.
  • Hartigan, J. A., and Wong, M. A. (1979), “Algorithm AS 136: A K-Means Clustering Algorithm,” Applied Statistics, 28, 100–108.
  • Hennig, C. (2010), “Methods for Merging Gaussian Mixture Components,” Advances in Data Analysis and Classification, 4, 3–34.
  • Heskes, T. (2001), “Self-Organizing Maps, Vector Quantization, and Mixture Modeling,” IEEE Transactions on Neural Networks, 12, 1299–1305. DOI: 10.1109/72.963766.
  • Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.
  • Kim, J., Chen, Y.-C., Balakrishnan, S., Rinaldo, A., and Wasserman, L. (2016), “Statistical Inference for Cluster Trees,” in Advances in Neural Information Processing Systems (Vol. 29), eds. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, pp. 1839–1847, Curran Associates, Inc.
  • Li, J. (2005), “Clustering based on a Multilayer Mixture Model,” Journal of Computational and Graphical Statistics, 14, 547–568. DOI: 10.1198/106186005X59586.
  • Li, J., Ray, S., and Lindsay, B. G. (2007), “A Nonparametric Statistical Approach to Clustering via Mode Identification,” Journal of Machine Learning Research, 8, 1687–1723.
  • Lloyd, S. P. (1982), “Least Squares Quantization in PCM,” IEEE Transactions on Information Theory, 28, 129–137.
  • Lo, K., Brinkman, R. R., and Gottardo, R. (2008), “Automated Gating of Flow Cytometry Data via Robust Model-based Clustering,” Cytometry Part A, 73, 321–332.
  • Maitra, R. (2009), “Initializing Partition-Optimization Algorithms,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6, 144–157. DOI: 10.1109/TCBB.2007.70244.
  • Mason, D. M., and Polonik, W. (2009), “Asymptotic Normality of Plug-in Level Set Estimates,” The Annals of Applied Probability, 19, 1108–1142.
  • Menardi, G., and Azzalini, A. (2014), “An Advancement in Clustering via Nonparametric Density Estimation,” Statistics and Computing, 24, 753–767.
  • Nugent, R., and Stuetzle, W. (2010), “Clustering with Confidence: A Low-Dimensional Binning Approach,” in Classification as a Tool for Research, eds. H. Locarek-Junge and C. Weihs, pp. 117–125, Berlin: Springer.
  • Peterson, A. D., Ghosh, A. P., and Maitra, R. (2018), “Merging k-means with Hierarchical Clustering for Identifying General-Shaped Groups,” Stat, 7, e172.
  • Polianskii, V., and Pokorny, F. T. (2020), “Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’20, pp. 2154–2164, New York, NY: Association for Computing Machinery. DOI: 10.1145/3394486.3403266.
  • Pollard, D. (1982), “A Central Limit Theorem for k-Means Clustering,” The Annals of Probability, 10, 919–926.
  • Rand, W. M. (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850.
  • Ray, S., and Lindsay, B. G. (2005), “The Topography of Multivariate Normal Mixtures,” The Annals of Statistics, 33, 2042–2065. DOI: 10.1214/009053605000000417.
  • Rinaldo, A., Singh, A., Nugent, R., and Wasserman, L. (2012), “Stability of Density-based Clustering,” Journal of Machine Learning Research, 13, 905–948.
  • Scott, D. W. (2015), Multivariate Density Estimation: Theory, Practice, and Visualization, Hoboken, NJ: Wiley.
  • Scrucca, L. (2016), “Identifying Connected Components in Gaussian Finite Mixture Models for Clustering,” Computational Statistics & Data Analysis, 93, 5–17.
  • Shin, J., Rinaldo, A., and Wasserman, L. (2019), “Predictive Clustering,” arXiv preprint. DOI: 10.48550/arXiv.1903.08125.
  • Stuetzle, W., and Nugent, R. (2010), “A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density,” Journal of Computational and Graphical Statistics, 19, 397–418.
  • Tibshirani, R., and Walther, G. (2005), “Cluster Validation by Prediction Strength,” Journal of Computational and Graphical Statistics, 14, 511–528. DOI: 10.1198/106186005X59243.
  • Tibshirani, R., Walther, G., and Hastie, T. (2001), “Estimating the Number of Clusters in a Data Set via the Gap Statistic,” Journal of the Royal Statistical Society, Series B, 63, 411–423.
  • Tsimidou, M., Macrae, R., and Wilson, I. (1987), “Authentication of Virgin Olive Oils Using Principal Component Analysis of Triglyceride and Fatty Acid Profiles: Part 1—Classification of Greek Olive Oils,” Food Chemistry, 25, 227–239.
  • Turner, P., Liu, J., and Rigollet, P. (2020), “A Statistical Perspective on Coreset Density Estimation,” arXiv preprint. DOI: 10.48550/arXiv.2011.04907.
  • Voronoi, G. (1908), “Recherches sur les parallélloèdres primitifs,” Journal für die reine und angewandte Mathematik, 134, 198–287.
  • Walesiak, M., and Dudek, A. (2020), “The Choice of Variable Normalization Method in Cluster Analysis,” in Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development During Global Challenges, ed. K. S. Soliman, pp. 325–340, International Business Information Management Association (IBIMA).
  • Wasserman, L. (2006), All of Nonparametric Statistics, New York: Springer. DOI: 10.1007/0-387-30623-4.
  • Weber, R., Schek, H.-J., and Blott, S. (1998), “A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces,” in Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pp. 194–205, San Francisco, CA: Morgan Kaufmann Publishers Inc.
