Journal of Computational and Graphical Statistics Latest Articles


Research Article

A Penalized Criterion for Selecting the Number of Clusters for K-Medians

Antoine Godichon-Baggioni, Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne-Université, Paris, France

Sobihan Surendran, Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne-Université, Paris, France. Correspondence: [email protected]

Received 20 Jun 2023, Accepted 24 Feb 2024, Published online: 29 Mar 2024

https://doi.org/10.1080/10618600.2024.2325458

References

Arlot, S., and Massart, P. (2009), “Data-Driven Calibration of Penalties for Least-Squares Regression,” Journal of Machine Learning Research, 10, 245–279.
Bartlett, P. L., Linder, T., and Lugosi, G. (1998), “The Minimax Distortion Redundancy in Empirical Quantizer Design,” IEEE Transactions on Information Theory, 44, 1802–1813. DOI: 10.1109/18.705560.
Baudry, J.-P., Maugis, C., and Michel, B. (2012), “Slope Heuristics: Overview and Implementation,” Statistics and Computing, 22, 455–470. DOI: 10.1007/s11222-011-9236-1.
Bello, D. Z., Valk, M., and Cybis, G. B. (2023), “Towards u-statistics Clustering Inference for Multiple Groups,” Journal of Statistical Computation and Simulation, 94, 1–19. DOI: 10.1080/00949655.2023.2239978.
Berkhin, P. (2006), “A Survey of Clustering Data Mining Techniques,” in Grouping Multidimensional Data, pp. 25–71, Berlin: Springer.
Bezdek, J. C. (2013), Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Springer.
Birgé, L., and Massart, P. (2007), “Minimal Penalties for Gaussian Model Selection,” Probability Theory and Related Fields, 138, 33–73. DOI: 10.1007/s00440-006-0011-8.
Brault, V., Baudry, J.-P., Maugis, C., Michel, B., and Brault, M. V. (2011), “Package ‘capushe’.”
Cardot, H., Cénac, P., and Monnez, J.-M. (2012), “A Fast and Recursive Algorithm for Clustering Large Datasets with k-medians,” Computational Statistics & Data Analysis, 56, 1434–1449. DOI: 10.1016/j.csda.2011.11.019.
Cardot, H., Cénac, P., and Zitt, P.-A. (2013), “Efficient and Fast Estimation of the Geometric Median in Hilbert Spaces with an Averaged Stochastic Gradient Algorithm,” Bernoulli, 19, 18–43. DOI: 10.3150/11-BEJ390.
Cardot, H., and Godichon-Baggioni, A. (2017), “Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis,” Test, 26, 461–480. DOI: 10.1007/s11749-016-0519-x.
Cesa-Bianchi, N., and Lugosi, G. (1999), “Minimax Regret Under Log Loss for General Classes of Experts,” in Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp. 12–18. DOI: 10.1145/307400.307407.
Duflo, M. (1997), Random Iterative Models, Stochastic Modelling and Applied Probability (Vol. 34), New York: Springer.
Dunn, J. C. (1973), “A Fuzzy Relative of the ISODATA Process and its Use in Detecting Compact Well-Separated Clusters,” Journal of Cybernetics, 3, 32–57. DOI: 10.1080/01969727308546046.
Fischer, A. (2011), “On the Number of Groups in Clustering,” Statistics & Probability Letters, 81, 1771–1781. DOI: 10.1016/j.spl.2011.07.005.
Forgy, E. W. (1965), “Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classifications,” Biometrics, 21, 768–769.
Gagolewski, M., Bartoszuk, M., and Cena, A. (2016), “Genie: A New, Fast, and Outlier-Resistant Hierarchical Clustering Algorithm,” Information Sciences, 363, 8–23. DOI: 10.1016/j.ins.2016.05.003.
Gersho, A., and Gray, R. M. (2012), Vector Quantization and Signal Compression (Vol. 159), New York: Springer.
Haldane, J. (1948), “Note on the Median of a Multivariate Distribution,” Biometrika, 35, 414–417. DOI: 10.1093/biomet/35.3-4.414.
Hoeffding, W. (1994), “Probability Inequalities for Sums of Bounded Random Variables,” in The Collected Works of Wassily Hoeffding, pp. 409–426, New York: Springer.
Huang, H., Liu, Y., Yuan, M., and Marron, J. (2015), “Statistical Significance of Clustering Using Soft Thresholding,” Journal of Computational and Graphical Statistics, 24, 975–993. DOI: 10.1080/10618600.2014.948179.
Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218. DOI: 10.1007/BF01908075.
Jain, A. K., and Dubes, R. C. (1988), Algorithms for Clustering Data, Upper Saddle River, NJ: Prentice-Hall, Inc.
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999), “Data Clustering: A Review,” ACM Computing Surveys (CSUR), 31, 264–323. DOI: 10.1145/331499.331504.
Kaufman, L., and Rousseeuw, P. J. (2009), Finding Groups in Data: An Introduction to Cluster Analysis, Hoboken, NJ: Wiley.
Kemperman, J. (1987), “The Median of a Finite Measure on a Banach Space,” in Statistical Data Analysis Based on the L1-Norm and Related Methods (Neuchâtel, 1987), pp. 217–230.
Linder, T. (2000), “On the Training Distortion of Vector Quantizers,” IEEE Transactions on Information Theory, 46, 1617–1623. DOI: 10.1109/18.850705.
Liu, Y., Hayes, D. N., Nobel, A., and Marron, J. S. (2008), “Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data,” Journal of the American Statistical Association, 103, 1281–1293. DOI: 10.1198/016214508000000454.
MacQueen, J. (1967), “Classification and Analysis of Multivariate Observations,” in 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297.
Massart, P. (2007), Concentration Inequalities and Model Selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII-2003, Berlin: Springer.
McDiarmid, C. (1989), “On the Method of Bounded Differences,” Surveys in Combinatorics, 141, 148–188.
Mirkin, B. (1996), Mathematical Classification and Clustering (Vol. 11), New York: Springer.
Polyak, B. T., and Juditsky, A. B. (1992), “Acceleration of Stochastic Approximation by Averaging,” SIAM Journal on Control and Optimization, 30, 838–855. DOI: 10.1137/0330046.
Rand, W. M. (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850. DOI: 10.1080/01621459.1971.10482356.
Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
Ruppert, D. (1985), “A Newton-Raphson Version of the Multivariate Robbins-Monro Procedure,” The Annals of Statistics, 13, 236–245. DOI: 10.1214/aos/1176346589.
Späth, H. (1980), Cluster Analysis Algorithms for Data Reduction and Classification of Objects, Chichester: Ellis Horwood.
Tibshirani, R., Walther, G., and Hastie, T. (2001), “Estimating the Number of Clusters in a Data Set via the Gap Statistic,” Journal of the Royal Statistical Society, Series B, 63, 411–423. DOI: 10.1111/1467-9868.00293.
Vardi, Y., and Zhang, C.-H. (2000), “The Multivariate L1-median and Associated Data Depth,” Proceedings of the National Academy of Sciences, 97, 1423–1426. DOI: 10.1073/pnas.97.4.1423.
Weiszfeld, E. (1937), “Sur le point pour lequel la somme des distances de n points donnés est minimum,” Tohoku Mathematical Journal, First Series, 43, 355–386.
