Multivariate and Dependent Data

Improving Spectral Clustering Using the Asymptotic Value of the Normalized Cut

Pages 980-992 | Received 12 Jan 2018, Accepted 03 Mar 2019, Published online: 20 May 2019
 

Abstract

Spectral clustering (SC) is a popular and versatile clustering method based on a relaxation of the normalized graph cut objective. Despite its popularity, selecting the number of clusters and tuning the important scaling parameter remain challenging problems in practical applications of SC. Popular heuristics have been proposed, but corresponding theoretical results are scarce. In this article, we investigate the asymptotic value of the normalized cut for an increasing sample assumed to arise from an underlying probability distribution. Based on this, we find strong connections between spectral and density clustering. This enables us to provide recommendations for selecting the number of clusters and setting the scaling parameter in a data-driven manner. An algorithm inspired by these recommendations is proposed, which we have found to exhibit strong performance in a range of applied domains. An R implementation of the algorithm is available from https://github.com/DavidHofmeyr/spuds. Supplementary materials for this article are available online.
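
The normalized cut objective and its spectral relaxation, as named in the abstract, can be illustrated with a minimal Python sketch. This is not the article's algorithm or the spuds implementation. It assumes a Gaussian similarity whose bandwidth `sigma` plays the role of the scaling parameter, a two-cluster split by the sign of the second eigenvector of the symmetric normalized Laplacian, and function names (`gaussian_affinity`, `normalized_cut`, `spectral_bipartition`) that are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Gaussian similarity matrix; sigma is the assumed scaling parameter."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W


def normalized_cut(W, labels):
    """Sum over clusters of cut(C, complement of C) / vol(C)."""
    degrees = W.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        inside = labels == k
        cut = W[inside][:, ~inside].sum()  # edge weight leaving the cluster
        vol = degrees[inside].sum()        # total degree inside the cluster
        total += cut / vol
    return total


def spectral_bipartition(W):
    """Relaxed two-way normalized cut via the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Second-smallest eigenvector, rescaled by D^{-1/2}, then split by sign.
    v = eigvecs[:, 1] * d_inv_sqrt
    return (v > 0).astype(int)


# Illustrative data (an assumption, not from the article): two small groups.
group = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
X = np.vstack([group, group + np.array([4.0, 0.0])])

W = gaussian_affinity(X, sigma=1.0)
labels = spectral_bipartition(W)
bad_labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])

print("spectral labels:", labels)
print("normalized cut (spectral):", normalized_cut(W, labels))
print("normalized cut (alternating):", normalized_cut(W, bad_labels))
```

Changing `sigma` in this sketch changes the similarity graph and therefore the value of the normalized cut, which is one way to see why the choice of scaling parameter matters in practice.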

Acknowledgments

The author thanks Dr. Nicos Pavlidis for his valuable comments on this work. He is also very grateful to the anonymous reviewers, whose recommendations have led to substantial improvements in the quality of the work.

Notes

1 We use the implementation provided by the authors, taken from http://www.vision.caltech.edu/lihi/Demos/SelfTuningClustering.html.

2 UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets.html): Opt. Digits, M.F. Digits, Satellite, Pen Digits, Image Seg., Wine, Seeds, Iris, Glass, Dermatology, Breast Cancer, Control Chart, Mammography, Parkinsons, Libras.

3 The Elements of Statistical Learning (https://web.stanford.edu/~hastie/ElemStatLearn/data.html): Phoneme.

4 The Yale faces database (https://cervisia.org/machine_learning_data.php): Yale.

5 CRAN package pdfCluster (https://CRAN.R-project.org/package=pdfCluster): Olive Oil.
