Multivariate and Dependent Data

Improving Spectral Clustering Using the Asymptotic Value of the Normalized Cut

Pages 980-992 | Received 12 Jan 2018, Accepted 03 Mar 2019, Published online: 20 May 2019
 

Abstract

Spectral clustering (SC) is a popular and versatile clustering method based on a relaxation of the normalized graph cut objective. Despite its popularity, selecting the number of clusters and tuning the important scaling parameter remain challenging problems in practical applications of SC. Popular heuristics have been proposed, but corresponding theoretical results are scarce. In this article, we investigate the asymptotic value of the normalized cut for an increasing sample assumed to arise from an underlying probability distribution. Based on this, we find strong connections between spectral and density clustering. This enables us to provide recommendations for selecting the number of clusters and setting the scaling parameter in a data-driven manner. An algorithm inspired by these recommendations is proposed, which we have found to exhibit strong performance in a range of applied domains. An R implementation of the algorithm is available from https://github.com/DavidHofmeyr/spuds. Supplementary materials for this article are available online.

Acknowledgments

The author thanks Dr. Nicos Pavlidis for his valuable comments on this work. He is also very grateful to the anonymous reviewers, whose recommendations have led to substantial improvements in the quality of the work.

Notes

1 We use the implementation provided by the authors, taken from http://www.vision.caltech.edu/lihi/Demos/SelfTuningClustering.html.

2 UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets.html): Opt. Digits, M.F. Digits, Satellite, Pen Digits, Image Seg., Wine, Seeds, Iris, Glass, Dermatology, Breast Cancer, Control Chart, Mammography, Parkinsons, Libras.

3 The Elements of Statistical Learning (https://web.stanford.edu/~hastie/ElemStatLearn/data.html): Phoneme.

4 The Yale faces database (https://cervisia.org/machine_learning_data.php): Yale.

5 CRAN package pdfCluster (https://CRAN.R-project.org/package=pdfCluster): Olive Oil.
