Original Articles

Estimating the Number of Clusters Using Cross-Validation

Pages 162-173 | Received 09 Feb 2017, Accepted 17 Jul 2019, Published online: 30 Sep 2019
 

Abstract

Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong modeling assumptions. This article proposes a data-driven approach to estimate the number of clusters based on a novel form of cross-validation. The proposed method differs from ordinary cross-validation, because clustering is fundamentally an unsupervised learning problem. Simulation and real data analysis results show that the proposed method outperforms existing methods, especially in high-dimensional settings with heterogeneous or heavy-tailed noise. In a yeast cell cycle dataset, the proposed method finds a parsimonious clustering with interpretable gene groupings. Supplementary materials for this article are available online.

Acknowledgments

We thank Rob Tibshirani for getting us started on this problem and for providing code for some initial simulations. We thank Art Owen for providing us with a summary of the relevant theory on k-means clustering, and for giving us feedback on our theoretical results. We also thank Cliff Hurvich, Josh Reed, and Jeff Simonoff for providing comments on an early draft of this article and for suggesting further avenues of inquiry.
