Research Articles

Fast Geographically Weighted Regression (FastGWR): a scalable algorithm to investigate spatial process heterogeneity in millions of observations

Ziqi Li, A. Stewart Fotheringham, Wenwen Li & Taylor Oshan
Pages 155-175 | Received 15 May 2018, Accepted 05 Sep 2018, Published online: 05 Oct 2018
 

ABSTRACT

Geographically Weighted Regression (GWR) is a widely used tool for exploring the spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration computationally intensive. Current open-source GWR software can handle at most approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR combines memory optimization with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-source GWR software packages, running up to thousands of times faster on a standard desktop.
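
To make the idea of location-specific estimation concrete, the following is a minimal sketch in Python with NumPy. It is not the FastGWR implementation. It assumes a fixed Gaussian kernel and a known bandwidth, and the names `gaussian_weights`, `local_fit`, `gwr_chunk`, and `gwr` are hypothetical helpers introduced only for illustration. Each location gets its own weighted least-squares fit, and the locations are split into independent chunks to suggest how the work could be divided among parallel workers without ever forming an n-by-n weight matrix.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of every observation relative to location i.

    Assumption: a fixed-bandwidth Gaussian kernel; other kernels are possible.
    """
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def local_fit(X, y, coords, i, bandwidth):
    """Weighted least squares at location i; returns the local coefficients."""
    w = gaussian_weights(coords, i, bandwidth)
    Xw = X * w[:, None]  # each row of X scaled by its weight
    # Solve (X^T W X) beta = X^T W y for this location only.
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def gwr_chunk(X, y, coords, indices, bandwidth):
    """Local coefficients for a subset of locations (one worker's share)."""
    return np.array([local_fit(X, y, coords, i, bandwidth) for i in indices])

def gwr(X, y, coords, bandwidth, n_workers=4):
    """Split the locations into n_workers chunks and fit each independently.

    The chunks are processed in a plain loop here; they share no state,
    so each could in principle be handed to a separate process.
    Assumes n_workers does not exceed the number of observations.
    """
    chunks = np.array_split(np.arange(len(y)), n_workers)
    return np.vstack([gwr_chunk(X, y, coords, c, bandwidth) for c in chunks])
```

Because each local fit needs only one length-n weight vector at a time, memory grows with n rather than n squared, and because the chunks are independent, the loop in `gwr` is the natural place where a message-passing framework could distribute work across cores. How FastGWR itself organizes this is described in the full article, not in this sketch.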

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

5. gwrr is also an open-source GWR package. However, its bandwidth search optimizes only the cross-validation score rather than AICc, so its results are not comparable.

9. Computation times are based on an average of five iterations to account for any small variations that can occur due to passive consumption of computing resources by the operating system.

Additional information

Funding

This work was supported by the National Science Foundation [grant numbers 1455349 and 1758786].

Notes on contributors

Ziqi Li

Ziqi Li is a PhD student in the School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, USA. His current research interest is developing geographically weighted statistical methods and tools to better understand spatially non-stationary processes. He is also one of the developers of mgwr, an open-source Python package for calibrating multi-scale and traditional GWR models.

A. Stewart Fotheringham

A. Stewart Fotheringham is Professor of computational spatial science in the School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, USA. He is also a Distinguished Scientist in the Julie Ann Wrigley Global Institute of Sustainability. His research interests are in the analysis of spatial data sets using statistical, mathematical, and computational methods. He is well known in the fields of spatial interaction modeling and local statistical analysis, the latter as one of the developers of geographically weighted regression (GWR). He has substantive interests in health data, crime patterns, retailing, and migration.

Wenwen Li

Wenwen Li is Associate Professor and Director of the Cyberinfrastructure and Computational Intelligence (CICI) Lab in the School of Geographical Sciences and Urban Planning at Arizona State University. Her research interests are cyberinfrastructure, space-time big data analytics, and machine learning. She led the team that developed PolarHub, a large-scale web crawling engine for distributed geospatial data, and PolarGlobe, a web-based scientific visualization tool for Earth science data.

Taylor Oshan

Taylor Oshan is an assistant professor in the Center for Geospatial Information Science within the Department of Geographical Sciences at the University of Maryland. His current research is focused on adapting spatial analysis methods for large heterogeneous datasets and applying them to reveal how complex relationships change over space and time, especially within the context of cities. He is also broadly interested in spatial statistics, spatial data science, geocomputation, and the development of open-source tools.
