Abstract
Determining the optimal number of regions is a challenging issue in regionalization. Although cluster validity indices developed for non-spatial clustering have been used to determine the optimal number of regions, they often neglect the spatial contiguity constraints of regionalization. Consequently, different regionalization results can share the same validity index value, which makes the identified optimal number of regions less reliable. To overcome this limitation, this study proposes a spatially constrained statistical approach for determining the optimal number of regions using two metrics: (i) a permutation-based variance for measuring the homogeneity within regions and (ii) a proportion index based on spatially constrained k-nearest neighbors to quantify the separation between regions. Furthermore, a distance-based method is employed to balance these two metrics and automatically determine the optimal number of regions. Experimental results on five synthetic datasets and on US presidential election and climate datasets show that the proposed statistical approach outperforms three widely used cluster validity indices in determining the optimal number of regions. The approach is straightforward to implement and can effectively reduce subjectivity in regionalization.
Data and codes availability statement
The data and code that support the findings of this study are available on figshare.com at the public link: https://doi.org/10.6084/m9.figshare.24321121.v1.
Author contributions
Qiliang Liu and Yuxuan Chen conceived and designed the presented idea. Yuxuan Chen implemented the experiments and analysed the results. Qiliang Liu and Yuxuan Chen wrote the manuscript. Jie Yang and Xinghua Cheng helped design the method. Min Deng reviewed the manuscript and provided comments.
Acknowledgements
We gratefully acknowledge the comments from the editor and the reviewers.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Funding
Notes on contributors
Yuxuan Chen
Yuxuan Chen is currently a Ph.D. candidate at Central South University. His research interests focus on spatio-temporal data mining.
Qiliang Liu
Qiliang Liu is currently a professor at Central South University, Hunan, China. His research interests focus on multi-scale spatio-temporal data mining and spatio-temporal statistics. He has published more than 50 peer-reviewed journal articles in these areas.
Jie Yang
Jie Yang is currently a Ph.D. candidate at Central South University. His research interests focus on spatio-temporal statistics.
Xinghua Cheng
Xinghua Cheng is currently a Ph.D. candidate at the University of Connecticut. His research interests focus on spatial analysis.
Min Deng
Min Deng is currently a professor at Central South University and the associate dean of the School of Geosciences and Info-Physics. His research interests are map generalization and spatio-temporal data analysis and mining.