Research Article

A Penalized Criterion for Selecting the Number of Clusters for K-Medians

Received 20 Jun 2023, Accepted 24 Feb 2024, Published online: 29 Mar 2024
