ABSTRACT
Geographically Weighted Regression (GWR) is a widely used tool for exploring the spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration computationally intensive. Current open-source GWR software can handle at most approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose FastGWR, a highly scalable, open-source implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR combines optimized memory usage with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-source GWR software packages, running up to thousands of times faster on a standard desktop.
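The reason GWR calibration parallelizes well can be illustrated with a minimal sketch. This is not the FastGWR code: it assumes a single predictor, a fixed-bandwidth Gaussian kernel, and plain Python in place of MPI, and the names gwr_local_fit, split_indices, and calibrate_chunk are hypothetical. Each location gets its own weighted least-squares fit that does not depend on the fit at any other location, so the list of locations can be divided among workers (MPI ranks in an HPC setting). Weights are computed on the fly, one location at a time, so the sketch never stores an n-by-n weight matrix.

```python
import math


def gwr_local_fit(coords, x, y, i, bandwidth):
    """Fit y = b0 + b1 * x at location i by kernel-weighted least squares."""
    ui, vi = coords[i]
    sw = swx = swy = swxx = swxy = 0.0
    for (u, v), xj, yj in zip(coords, x, y):
        d = math.hypot(u - ui, v - vi)
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel (assumed)
        sw += w
        swx += w * xj
        swy += w * yj
        swxx += w * xj * xj
        swxy += w * xj * yj
    # Solve the 2x2 weighted normal equations in closed form.
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def split_indices(n, n_workers):
    """Split range(n) into contiguous, near-equal chunks, one per worker."""
    base, extra = divmod(n, n_workers)
    chunks, start = [], 0
    for r in range(n_workers):
        size = base + (1 if r < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def calibrate_chunk(coords, x, y, indices, bandwidth):
    """Local estimates for one worker's share of the locations."""
    return [gwr_local_fit(coords, x, y, i, bandwidth) for i in indices]


# Synthetic data on a 5x5 grid (illustrative only, not the Zillow data).
coords = [(float(k % 5), float(k // 5)) for k in range(25)]
x = [float((7 * k) % 11) for k in range(25)]
y = [2.0 + 3.0 * xk for xk in x]

# Calibrating all locations at once and calibrating chunk by chunk
# give the same estimates, because each local fit is independent.
serial = calibrate_chunk(coords, x, y, range(len(coords)), bandwidth=1.5)
chunks = split_indices(len(coords), n_workers=4)
chunked = [est for idx in chunks
           for est in calibrate_chunk(coords, x, y, idx, bandwidth=1.5)]
```

In an MPI program, each rank would call calibrate_chunk on its own chunk and the results would be gathered on one rank. The loop over chunks here only stands in for that distribution step.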
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
5. gwrr is also an open-source GWR package. However, its bandwidth search optimizes only the cross-validation score rather than AICc, so its results are not comparable.
9. Computation times are based on an average of five iterations to account for any small variations that can occur due to passive consumption of computing resources by the operating system.
Additional information
Funding
Notes on contributors
Ziqi Li
Ziqi Li is a PhD student in the School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, USA. His current research interest is developing geographically weighted statistical methods and tools for better understanding spatially non-stationary processes. He is also one of the developers of mgwr, an open-source Python package for calibrating multi-scale and traditional GWR models.
A. Stewart Fotheringham
A. Stewart Fotheringham is Professor of computational spatial science in the School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, USA. He is also a Distinguished Scientist in the Julie Ann Wrigley Global Institute of Sustainability. His research interests are in the analysis of spatial data sets using statistical, mathematical, and computational methods. He is well known in the fields of spatial interaction modeling and local statistical analysis, the latter as one of the developers of geographically weighted regression (GWR). He has substantive interests in health data, crime patterns, retailing, and migration.
Wenwen Li
Wenwen Li is Associate Professor and Director of the Cyberinfrastructure and Computational Intelligence (CICI) Lab in the School of Geographical Sciences and Urban Planning at Arizona State University. Her research interests are cyberinfrastructure, space-time big data analytics, and machine learning. She led the team that developed PolarHub, a large-scale web crawling engine for distributed geospatial data, and PolarGlobe, a web-based scientific visualization tool for Earth science data.
Taylor Oshan
Taylor Oshan is an assistant professor in the Center for Geospatial Information Science within the Department of Geographical Science at the University of Maryland. His current research focuses on adapting spatial analysis methods for large heterogeneous datasets and applying them to reveal how complex relationships change over space and time, especially within the context of cities. He is also broadly interested in spatial statistics, spatial data science, geocomputation, and the development of open-source tools.