Abstract
Graphics processing units (GPUs) offer parallel computing power that would otherwise require a cluster of networked computers or a supercomputer. While writing kernel code is fairly straightforward, achieving efficiency and performance requires careful optimisation decisions and changes to the original serial algorithm. We introduce a parallel canonical ensemble Monte Carlo (MC) simulation that runs entirely on the GPU. In this paper, we describe two MC simulation codes of Lennard-Jones particles in the canonical ensemble: a single-CPU-core implementation and a parallel GPU implementation. Using the Compute Unified Device Architecture (CUDA), the parallel implementation enables the simulation of systems containing over 200,000 particles in a reasonable amount of time, allowing researchers to obtain more accurate simulation results. A remapping algorithm is introduced to balance the load on the device resources, and experimental results demonstrate that the efficiency of this algorithm is bounded by the available GPU resources. Our parallel implementation achieves a speedup of up to 15 times on a commodity GPU over our efficient single-core implementation for a system of 256k particles, with the speedup increasing with problem size. Furthermore, we describe our methods and strategies for optimising our implementation in detail.
Acknowledgements
The authors thank the anonymous reviewers for their insightful comments and invaluable feedback. This work has been supported by Wayne State University's Research Enhancement Program (REP) and by National Science Foundation (NSF) grants CBET-0730768 and OCI-1148168. The authors also thank NVIDIA® for donating some of the graphics cards used in this study.
Notes
1. Threads in the same warp do not need to be synchronised.
2. Although hundreds of millions of simulation steps are required to obtain scientifically accurate results, one million steps is sufficient to demonstrate the relative speedup of the GPU code.