Theory and Methods

Skeleton Clustering: Dimension-Free Density-Aided Clustering

Pages 1124–1135 | Received 21 Apr 2021, Accepted 13 Jan 2023, Published online: 06 Mar 2023

References

  • Almodovar-Rivera, I. A., and Maitra, R. (2020), “Kernel-Estimated Nonparametric Overlap-based Syncytial Clustering,” Journal of Machine Learning Research, 21, 1–54.
  • Amenta, N., Attali, D., and Devillers, O. (2007), “Complexity of Delaunay Triangulation for Points on Lower-Dimensional Polyhedra,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, pp. 1106–1113, USA: Society for Industrial and Applied Mathematics.
  • Aragam, B., Dan, C., Xing, E. P., and Ravikumar, P. (2020), “Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering,” The Annals of Statistics, 48, 2277–2302. DOI: 10.1214/19-AOS1887.
  • Azzalini, A., and Torelli, N. (2007), “Clustering via Nonparametric Density Estimation,” Statistics and Computing, 17, 71–80.
  • Bachem, O., Lucic, M., and Krause, A. (2017), “Practical Coreset Constructions for Machine Learning,” arXiv preprint. DOI: 10.48550/arXiv.1703.06476.
  • Baudry, J.-P., Raftery, A. E., Celeux, G., Lo, K., and Gottardo, R. (2010), “Combining Mixture Components for Clustering,” Journal of Computational and Graphical Statistics, 19, 332–353. DOI: 10.1198/jcgs.2010.08111.
  • Bentley, J. L. (1975), “Multidimensional Binary Search Trees Used for Associative Searching,” Communications of the ACM, 18, 509–517.
  • de Berg, M., Cheong, O., van Kreveld, M., and Overmars, M. (2008), Computational Geometry: Algorithms and Applications (3rd ed.), Santa Clara, CA: Springer-Verlag TELOS.
  • Brinkman, R. R., Gasparetto, M., Lee, S. J. J., Ribickas, A. J., Perkins, J., Janssen, W., Smiley, R., and Smith, C. (2007), “High-Content Flow Cytometry and Temporal Data Analysis for Defining a Cellular Signature of Graft-Versus-Host Disease,” Biology of Blood and Marrow Transplantation, 13, 691–700.
  • Campello, R. J., Moulavi, D., Zimek, A., and Sander, J. (2015), “Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection,” ACM Transactions on Knowledge Discovery from Data, 10, 1–51.
  • Carreira-Perpinán, M. A. (2015), “A Review of Mean-Shift Algorithms for Clustering,” arXiv preprint arXiv:1503.00687.
  • Chacón, J. E. (2015), “A Population Background for Nonparametric Density-Based Clustering,” Statistical Science, 30, 518–532. DOI: 10.1214/15-STS526.
  • Chacón, J. E. (2019), “Mixture Model Modal Clustering,” Advances in Data Analysis and Classification, 13, 379–404.
  • Chacón, J. E., and Duong, T. (2013), “Data-Driven Density Derivative Estimation, with Applications to Nonparametric Clustering and Bump Hunting,” Electronic Journal of Statistics, 7, 499–532.
  • Chacón, J. E., Duong, T., and Wand, M. P. (2011), “Asymptotics for General Multivariate Kernel Density Derivative Estimators,” Statistica Sinica, 21, 807–840.
  • Chaudhuri, K., and Dasgupta, S. (2010), “Rates of Convergence for the Cluster Tree,” in Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, NIPS’10, pp. 343–351, Red Hook, NY, USA. Curran Associates Inc.
  • Chaudhuri, K., Dasgupta, S., Kpotufe, S., and von Luxburg, U. (2014), “Consistent Procedures for Cluster Tree Estimation and Pruning,” IEEE Transactions on Information Theory, 60, 7900–7912.
  • Chazelle, B. (1993), “An Optimal Convex Hull Algorithm in Any Fixed Dimension,” Discrete & Computational Geometry, 10, 377–409. DOI: 10.1007/BF02573985.
  • Chen, Y.-C. (2017), “A Tutorial on Kernel Density Estimation and Recent Advances,” Biostatistics and Epidemiology, 1, 161–187.
  • Chen, Y. C., Genovese, C. R., and Wasserman, L. (2016), “A Comprehensive Approach to Mode Clustering,” Electronic Journal of Statistics, 10, 210–241.
  • Cheng, Y. (1995), “Mean Shift, Mode Seeking, and Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790–799.
  • Cuevas, A., Febrero, M., and Fraiman, R. (2000), “Estimating the Number of Clusters,” Canadian Journal of Statistics, 28, 367–382.
  • Cuevas, A., Febrero, M., and Fraiman, R. (2001), “Cluster Analysis: A Further Approach based on Density Estimation,” Computational Statistics and Data Analysis, 36, 441–459.
  • Delaunay, B. (1934), “Sur la sphère vide. A la mémoire de Georges Voronoï,” Bulletin de l’Académie des Sciences de l’URSS. Classe des sciences mathématiques et naturelles, 6, 793–800.
  • Eldridge, J., Belkin, M., and Wang, Y. (2015), “Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering,” Proceedings of Machine Learning Research (Vol. 40), pp. 588–606, Paris, France, 03–06 Jul 2015, PMLR.
  • Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996), “A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231, AAAI Press.
  • Fraley, C., and Raftery, A. E. (2002), “Model-based Clustering, Discriminant Analysis, and Density Estimation,” Journal of the American Statistical Association, 97, 611–631.
  • Fred, A. L. N., and Jain, A. K. (2005), “Combining Multiple Clusterings Using Evidence Accumulation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 835–850. DOI: 10.1109/TPAMI.2005.113.
  • Fukunaga, K., and Hostetler, L. (1975), “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition,” IEEE Transactions on Information Theory, 21, 32–40.
  • Hartigan, J. A., and Hartigan, P. M. (1985), “The Dip Test of Unimodality,” The Annals of Statistics, 13, 70–84.
  • Hartigan, J. A., and Wong, M. A. (1979), “Algorithm AS 136: A K-Means Clustering Algorithm,” Applied Statistics, 28, 100–108.
  • Hennig, C. (2010), “Methods for Merging Gaussian Mixture Components,” Advances in Data Analysis and Classification, 4, 3–34.
  • Heskes, T. (2001), “Self-Organizing Maps, Vector Quantization, and Mixture Modeling,” IEEE Transactions on Neural Networks, 12, 1299–1305. DOI: 10.1109/72.963766.
  • Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.
  • Kim, J., Chen, Y.-C., Balakrishnan, S., Rinaldo, A., and Wasserman, L. (2016), “Statistical Inference for Cluster Trees,” in Advances in Neural Information Processing Systems (Vol. 29), eds. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, pp. 1839–1847, Curran Associates, Inc.
  • Li, J. (2005), “Clustering based on a Multilayer Mixture Model,” Journal of Computational and Graphical Statistics, 14, 547–568. DOI: 10.1198/106186005X59586.
  • Li, J., Ray, S., and Lindsay, B. G. (2007), “A Nonparametric Statistical Approach to Clustering via Mode Identification,” Journal of Machine Learning Research, 8, 1687–1723.
  • Lloyd, S. P. (1982), “Least Squares Quantization in PCM,” IEEE Transactions on Information Theory, 28, 129–137.
  • Lo, K., Brinkman, R. R., and Gottardo, R. (2008), “Automated Gating of Flow Cytometry Data via Robust Model-based Clustering,” Cytometry Part A, 73, 321–332.
  • Maitra, R. (2009), “Initializing Partition-Optimization Algorithms,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6, 144–157. DOI: 10.1109/TCBB.2007.70244.
  • Mason, D. M., and Polonik, W. (2009), “Asymptotic Normality of Plug-in Level Set Estimates,” The Annals of Applied Probability, 19, 1108–1142.
  • Menardi, G., and Azzalini, A. (2014), “An Advancement in Clustering via Nonparametric Density Estimation,” Statistics and Computing, 24, 753–767.
  • Nugent, R., and Stuetzle, W. (2010), “Clustering with Confidence: A Low-Dimensional Binning Approach,” in Classification as a Tool for Research, eds. H. Locarek-Junge and C. Weihs, pp. 117–125, Berlin: Springer.
  • Peterson, A. D., Ghosh, A. P., and Maitra, R. (2018), “Merging k-means with Hierarchical Clustering for Identifying General-Shaped Groups,” Stat, 7, e172.
  • Polianskii, V., and Pokorny, F. T. (2020), “Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’20, pp. 2154–2164, New York, NY: Association for Computing Machinery. DOI: 10.1145/3394486.3403266.
  • Pollard, D. (1982), “A Central Limit Theorem for k-Means Clustering,” The Annals of Probability, 10, 919–926.
  • Rand, W. M. (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850.
  • Ray, S., and Lindsay, B. G. (2005), “The Topography of Multivariate Normal Mixtures,” The Annals of Statistics, 33, 2042–2065. DOI: 10.1214/009053605000000417.
  • Rinaldo, A., Singh, A., Nugent, R., and Wasserman, L. (2012), “Stability of Density-based Clustering,” Journal of Machine Learning Research, 13, 905–948.
  • Scott, D. W. (2015), Multivariate Density Estimation: Theory, Practice, and Visualization, Hoboken, NJ: Wiley.
  • Scrucca, L. (2016), “Identifying Connected Components in Gaussian Finite Mixture Models for Clustering,” Computational Statistics & Data Analysis, 93, 5–17.
  • Shin, J., Rinaldo, A., and Wasserman, L. (2019), “Predictive Clustering,” arXiv preprint. DOI: 10.48550/arXiv.1903.08125.
  • Stuetzle, W., and Nugent, R. (2010), “A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density,” Journal of Computational and Graphical Statistics, 19, 397–418.
  • Tibshirani, R., and Walther, G. (2005), “Cluster Validation by Prediction Strength,” Journal of Computational and Graphical Statistics, 14, 511–528. DOI: 10.1198/106186005X59243.
  • Tibshirani, R., Walther, G., and Hastie, T. (2001), “Estimating the Number of Clusters in a Data Set via the Gap Statistic,” Journal of the Royal Statistical Society, Series B, 63, 411–423.
  • Tsimidou, M., Macrae, R., and Wilson, I. (1987), “Authentication of Virgin Olive Oils Using Principal Component Analysis of Triglyceride and Fatty Acid Profiles: Part 1—Classification of Greek Olive Oils,” Food Chemistry, 25, 227–239.
  • Turner, P., Liu, J., and Rigollet, P. (2020), “A Statistical Perspective on Coreset Density Estimation,” arXiv preprint. DOI: 10.48550/arXiv.2011.04907.
  • Voronoi, G. (1908), “Recherches sur les parallélloèdres primitifs,” Journal für die reine und angewandte Mathematik, 134, 198–287.
  • Walesiak, M., and Dudek, A. (2020), “The Choice of Variable Normalization Method in Cluster Analysis,” in Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development During Global Challenges, ed. K. S. Soliman, pp. 325–340, International Business Information Management Association (IBIMA).
  • Wasserman, L. (2006), All of Nonparametric Statistics, New York: Springer. DOI: 10.1007/0-387-30623-4.
  • Weber, R., Schek, H.-J., and Blott, S. (1998), “A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces,” in Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pp. 194–205, San Francisco, CA: Morgan Kaufmann Publishers Inc.
