367
Views
10
CrossRef citations to date
0
Altmetric
Spatial, Graph, and Dependent Data Methodology

Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering

, , &
Pages 422-441 | Received 22 Apr 2019, Accepted 10 Sep 2020, Published online: 21 Oct 2020
 

Abstract

Our problem of interest is to cluster vertices of a graph by identifying underlying community structure. Among various vertex clustering approaches, spectral clustering is one of the most popular methods because it is easy to implement while often outperforming more traditional clustering algorithms. However, there are two inherent model selection problems in spectral clustering, namely estimating both the embedding dimension and number of clusters. This article attempts to address the issue by establishing a novel model selection framework specifically for vertex clustering on graphs under a stochastic block model. The first contribution is a probabilistic model which approximates the distribution of the extended spectral embedding of a graph. The model is constructed based on a theoretical result of asymptotic normality for the informative part of the embedding, and on simulation results providing a conjecture for the limiting behavior of the redundant part of the embedding. The second contribution is a simultaneous model selection framework. In contrast with traditional approaches, our model selection procedure estimates embedding dimension and number of clusters simultaneously. Based on our conjectured distributional model, a theorem on the consistency of the estimates of model parameters is presented, providing support for the utility of our method. Algorithms for our simultaneous model selection (SMS) for vertex clustering are proposed, demonstrating superior performance in simulation experiments. We illustrate our method via application to a collection of brain graphs.

Acknowledgments

The authors thank two referees and an associate editor for their time and effort; this article is significantly improved based on their thorough and constructive reports. The required R source codes and the data to reproduce the results in this article can be found in https://github.com/youngser/dhatkhat.

Additional information

Funding

This work was partially supported by DARPA D3M contract FA8750-17-2-0112 and by the Naval Engineering Education Consortium through Office of Naval Research award number N00174-19-1-0011.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.