Original Articles

A hybrid parallel cellular automata model for urban growth simulation over GPU/CPU heterogeneous architectures

Pages 494-514 | Received 13 Aug 2014, Accepted 29 Mar 2015, Published online: 20 May 2015
 

Abstract

As an important spatiotemporal simulation approach and an effective tool for developing and examining spatial optimization strategies (e.g., land allocation and planning), geospatial cellular automata (CA) models often require multiple data layers and complicated algorithms to capture the complex dynamic processes of interest and the intricate relationships and interactions between those processes and their driving factors. Moreover, CA simulations may involve massive amounts of data, as high-resolution geospatial and non-spatial data are now widely available. Geospatial CA models can therefore be both computationally intensive and data intensive, demanding extensive computing time and vast memory space. Based on a hybrid parallelism that combines processes with discrete memory and threads with global memory, we developed a parallel geospatial CA model for urban growth simulation on a heterogeneous computer architecture composed of multiple central processing units (CPUs) and graphics processing units (GPUs). Experiments with datasets of California showed that the overall computing time for a 50-year simulation dropped from 13,647 seconds on a single CPU to 32 seconds using 64 GPU/CPU nodes. We conclude that the hybrid parallelism of geospatial CA over emerging heterogeneous computer architectures provides scalable solutions for complex simulations and optimizations with massive amounts of data that were previously infeasible, sometimes impossible, using individual computing approaches.
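The process-level (MPI-style) parallelism described in the abstract can be loosely sketched in plain Python: the study area is split into sub-domains, each "process" updates its band after receiving ghost rows from its neighbors, and the bands are reassembled. This is an illustrative toy, not the authors' code; the threshold growth rule (a cell urbanizes when at least 3 of its 8 neighbors are urban) is a placeholder for the paper's more elaborate transition rules, and real runs would use MPI processes plus GPU kernels rather than a sequential loop over bands.

```python
# Toy urban-growth CA with row-wise domain decomposition and ghost rows,
# mimicking the process-level parallelism of the hybrid model (illustrative
# sketch only; the paper's transition rules and MPI/GPU code are not shown).

def step_subdomain(block):
    """Apply one CA step to a sub-domain padded with one ghost row/column."""
    rows, cols = len(block), len(block[0])
    out = [row[:] for row in block]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            urban_neighbors = sum(
                block[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            if urban_neighbors >= 3:      # placeholder growth rule
                out[i][j] = 1
    return out

def simulate(grid, steps, n_parts):
    """Split the grid into n_parts row bands; exchange ghost rows each step."""
    n = len(grid)
    for _ in range(steps):
        # zero border so every interior cell has a full 8-cell neighborhood
        width = len(grid[0]) + 2
        padded = ([[0] * width]
                  + [[0] + row + [0] for row in grid]
                  + [[0] * width])
        bounds = [(k * n // n_parts, (k + 1) * n // n_parts)
                  for k in range(n_parts)]
        new_grid = []
        for lo, hi in bounds:             # each band = one "MPI process"
            band = padded[lo:hi + 2]      # includes ghost rows above/below
            band = step_subdomain(band)
            new_grid.extend(r[1:-1] for r in band[1:-1])
        grid = new_grid
    return grid
```

A useful property of this decomposition is that the result is independent of the number of bands, which is exactly what a correct ghost-row exchange must guarantee in the parallel implementation.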

Notes

1. The discrete memory architecture allocates separate memory spaces to computing units, and a computing unit cannot access other units’ memory spaces directly; any data exchange must be implemented through communication between units. The shared memory architecture, on the other hand, allows all units to access the same memory space, which eliminates the need for communication. However, when multiple units try to access (and especially to update) the same memory address, a locking mechanism must be invoked so that only one unit accesses the address at a time, which degrades performance.

2. Ranking from Top500.org as of June 2014.

3. The reading time refers to the time used for MPI processes to read the sub-domains of input data into the CPU memory, and does not include the time for CPU→GPU transfers.
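The serialization cost described in Note 1 can be demonstrated with ordinary threads: concurrent updates to a shared address must pass through a lock one at a time. This minimal sketch (not from the paper) uses Python's standard `threading` module as a stand-in for shared-memory computing units.

```python
# Minimal sketch of shared-memory contention: several threads update the
# same variable, so each update must acquire a lock, serializing access
# (the performance cost Note 1 describes). Illustrative only.

import threading

counter = 0
lock = threading.Lock()

def worker(n_updates):
    global counter
    for _ in range(n_updates):
        with lock:                 # only one thread may update at a time
            counter += 1

def run(n_threads, n_updates):
    """Run n_threads workers; return the final shared counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(n_updates,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Without the lock, the read-modify-write on `counter` could interleave and lose updates; with it, correctness is guaranteed but the updates execute one at a time no matter how many threads run.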

Additional information

Funding

This research was supported by the Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China [20130145120013]; the National Science Foundation of USA [OCI-1047916]. This research used resources of the Keeneland Computing Facility, which is supported by the National Science Foundation of USA [OCI-0910735].

