Research Articles

Spatially–encouraged spectral clustering: a technique for blending map typologies and regionalization

Pages 2356-2373 | Received 20 Apr 2018, Accepted 21 May 2021, Published online: 05 Jul 2021
 

ABSTRACT

Clustering is a central concern in geographic data science and reflects a large, active domain of research. In spatial clustering, it is often challenging to balance two kinds of ‘goodness of fit’: clusters should have ‘feature’ homogeneity, in that they aim to represent one ‘type’ of observation, and also ‘geographic’ coherence, in that they aim to represent some detected geographical ‘place’. This divides ‘map typologization’ studies, common in geodemographics, from ‘regionalization’ studies, common in spatial optimization and statistics. Recent attempts to simultaneously typologize and regionalize data into clusters with both feature homogeneity and geographic coherence have faced conceptual and computational challenges. Fortunately, new work on spectral clustering can address both regionalization and typologization tasks within the same framework. This research develops a novel kernel combination method for use within spectral clustering that allows analysts to blend smoothly between feature homogeneity and geographic coherence. I explore the formal properties of two kernel combination methods and recommend multiplicative kernel combination with spectral clustering. Altogether, spatially encouraged spectral clustering is shown to be a novel kernel combination clustering method that can address both regionalization and typologization tasks in order to reveal the geographies latent in spatially structured data.
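
To illustrate the idea of multiplicative kernel combination described in the abstract, the following is a minimal sketch rather than the paper's implementation. It assumes an RBF kernel on the features and a negative exponential kernel on geographic distance; the function name `blended_affinity`, the parameters `gamma` and `bandwidth`, and the synthetic data are all hypothetical choices made for this example. The two kernels are multiplied elementwise and passed to scikit-learn's `SpectralClustering` as a precomputed affinity.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel


def blended_affinity(X, coords, gamma=1.0, bandwidth=1.0):
    """Elementwise product of a feature kernel and a spatial kernel.

    Both kernel forms are assumptions made for illustration only.
    """
    # Feature kernel: RBF similarity between attribute vectors, in (0, 1].
    feature_kernel = rbf_kernel(X, gamma=gamma)
    # Spatial kernel: negative exponential of geographic distance, in (0, 1].
    spatial_kernel = np.exp(-euclidean_distances(coords) / bandwidth)
    # Multiplicative combination: a pair is similar only if it is
    # similar in features AND close in space.
    return feature_kernel * spatial_kernel


# Synthetic stand-in data: 60 observations, 3 features, 2-D locations.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(60, 2))
X = rng.normal(size=(60, 3))

affinity = blended_affinity(X, coords, gamma=0.5, bandwidth=2.0)
labels = SpectralClustering(
    n_clusters=4, affinity="precomputed", random_state=0
).fit_predict(affinity)
```

Under these assumptions, a larger `bandwidth` flattens the spatial kernel so that the clusters depend more on features, while a smaller `bandwidth` pushes the solution toward geographically compact clusters.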

Data and codes availability statement

All code and documentation for the plots and algorithms in this paper are made available on the Open Science Framework (https://doi.org/10.17605/OSF.IO/FCS5X). Furthermore, a generalized spatially encouraged spectral clustering algorithm has been made available in the PySAL package (Rey and Anselin 2007) as part of the spopt subpackage. The algorithm depends primarily on NumPy (van der Walt et al. 2011) and scikit-learn (Pedregosa et al. 2011).

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1. Further types of ‘core detection’ (Aldstadt and Getis 2006, Murray et al. 2014, Kim et al. 2017) or ‘boundary detection’ (Jacquez et al. 2008, Dean et al. 2018, Dong et al. 2018) allow for ‘non-exhaustive’ partitions, where observations can evade cluster assignments. These are not of interest here: in the terminology of Kim et al. (2017), only ‘districting’ methods are considered.

2. Numerically, it is common to use a kernel function, such as the negative exponential kernel, and standardize the resulting values to between 0 and 1.

3. However, minimum size and shape regularity are not parameterized directly, as they are in other methods (Duque et al. 2012, Li et al. 2014).

4. In their specific case, Yuan et al. (2015) cluster principal components derived from many mean-centered and unit-deviation standardized covariates. However, τ² is not intended to stand in for the empirical variance of X in general: X may be N×P with a different variance for each feature, whereas τ² is a scalar used for all P features.

5. This algorithm is made available post-publication in PySAL, the Python spatial analysis library (Rey and Anselin 2007), and is built primarily using NumPy (van der Walt et al. 2011) and scikit-learn (Pedregosa et al. 2011).

6. The binarized contiguity kernel is used here for simplicity. Each row of uses the Aη connectivity matrix, connecting observations with maximum path order η, since the non-binary exponential kernel behaves substantively similarly to η=0.

7. Further, this has similar semantics to the Queen contiguity matrix used in the previous example: the Delaunay triangulation is the dual graph of a Voronoi diagram for the Airbnb listings, just as the Queen contiguity graph is a kind of dual graph for the Texas counties. Their order statistics, both at first order and higher, are also similar. Alternative spatial kernels, such as k-nearest-neighbor or distance-weighted kernels, could also be used.

8. Precisely, I set aside 25% of the listings, compute their nearest geographic cluster, and predict their price using the mean cluster price. I am grateful to an anonymous reviewer for proposing this method.
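
The hold-out check described in note 8 could be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: ‘nearest geographic cluster’ is interpreted here as the cluster of the nearest training listing, the helper `holdout_cluster_mean_error` and the `cluster_fn` argument are hypothetical, the data are synthetic, and plain k-means on coordinates stands in for the clustering method under evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def holdout_cluster_mean_error(coords, price, cluster_fn, test_size=0.25, seed=0):
    """RMSE of predicting held-out prices with the mean price of the nearest cluster."""
    # Set aside a share of the listings (25% by default).
    c_train, c_test, p_train, p_test = train_test_split(
        coords, price, test_size=test_size, random_state=seed
    )
    # Cluster the training listings; cluster_fn is a hypothetical hook
    # for whichever clustering method is being evaluated.
    labels = cluster_fn(c_train, p_train)
    # Assumption: the "nearest geographic cluster" of a held-out listing
    # is the cluster of its nearest training listing.
    nearest = KNeighborsClassifier(n_neighbors=1).fit(c_train, labels)
    test_labels = nearest.predict(c_test)
    # Predict each held-out price with the mean training price of its cluster.
    cluster_means = {k: p_train[labels == k].mean() for k in np.unique(labels)}
    predicted = np.array([cluster_means[k] for k in test_labels])
    return float(np.sqrt(np.mean((p_test - predicted) ** 2)))


def kmeans_stand_in(c_train, p_train):
    # Stand-in clustering on coordinates only; not the paper's method.
    return KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(c_train)


# Synthetic listings: price rises from west to east, plus noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(200, 2))
price = 100 + 50 * coords[:, 0] + rng.normal(scale=5, size=200)

rmse = holdout_cluster_mean_error(coords, price, kmeans_stand_in)
```

Swapping `kmeans_stand_in` for another function with the same signature would allow different clustering methods to be compared on the same held-out listings.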

Additional information

Funding

This material is based upon work supported by the National Science Foundation under Grant No. 1733705, as well as by the Alan Turing Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation and/or the Alan Turing Institute.

Notes on contributors

Levi John Wolf

Levi John Wolf is a Senior Lecturer at the University of Bristol and a Fellow with the Alan Turing Institute. He develops new concepts, methods, and measures to analyse and understand inequality and segregation in cities. He is also a maintainer of many open source spatial analysis software projects.
