Concurrent number cruncher: a GPU implementation of a general sparse linear solver

Luc Buatois ENSG/CRPG, Gocad Research Group, Nancy University, Rue du Doyen Roubault, BP40 54501, Vandoeuvre-les-Nancy, France; INRIA Lorraine, ALICE, BP 239 – 54506, Vandoeuvre-les-Nancy Cedex, France. Correspondence: [email protected]

Guillaume Caumon ENSG/CRPG, Gocad Research Group, Nancy University, Rue du Doyen Roubault, BP40 54501, Vandoeuvre-les-Nancy, France

Bruno Lévy INRIA Lorraine, ALICE, BP 239 – 54506, Vandoeuvre-les-Nancy Cedex, France

Abstract

A wide class of numerical methods requires solving linear systems in which the pattern of non-zero matrix coefficients can be arbitrary. These problems can greatly benefit from the highly multithreaded computational power and large memory bandwidth available on graphics processing units (GPUs), especially since dedicated general-purpose APIs such as close-to-metal (CTM) (AMD–ATI) and compute unified device architecture (CUDA) (NVIDIA) have appeared. CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage (BCRS), register blocking and vectorization) to implement a general-purpose sparse linear solver. Our implementation of the Jacobi-preconditioned conjugate gradient algorithm outperforms leading-edge CPU counterparts by up to a factor of 6.0, making it attractive for applications for which single precision is sufficient.
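To make the two ingredients named in the abstract concrete, the following is a minimal CPU-side sketch in Python of a Jacobi-preconditioned conjugate gradient solver operating on a matrix held in block compressed row storage. It is an illustration under stated assumptions, not the GPU implementation described in the paper: the function names (`bcrs_matvec`, `bcrs_diagonal`, `jacobi_pcg`), the 2 × 2 block size, the row-major block layout and the small test matrix are all hypothetical, the arithmetic is Python double precision rather than single precision, and register blocking and vectorization are not modelled.

```python
import math

def bcrs_matvec(n_block_rows, bs, row_ptr, col_idx, blocks, x):
    """Compute y = A x for A stored as dense bs-by-bs blocks in compressed rows."""
    y = [0.0] * (n_block_rows * bs)
    for bi in range(n_block_rows):
        # Blocks of block-row bi are stored at positions row_ptr[bi] .. row_ptr[bi+1]-1.
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]      # block-column index of this block
            blk = blocks[k]      # row-major list of bs*bs coefficients
            for r in range(bs):
                acc = 0.0
                for c in range(bs):
                    acc += blk[r * bs + c] * x[bj * bs + c]
                y[bi * bs + r] += acc
    return y

def bcrs_diagonal(n_block_rows, bs, row_ptr, col_idx, blocks):
    """Extract the scalar diagonal of A, used as the Jacobi preconditioner."""
    diag = [0.0] * (n_block_rows * bs)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            if col_idx[k] == bi:
                for r in range(bs):
                    diag[bi * bs + r] = blocks[k][r * bs + r]
    return diag

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual for the initial guess x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # apply the preconditioner M^-1 = diag(A)^-1
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical 4x4 symmetric positive definite matrix stored as four 2x2 blocks:
#   [4 1 1 0]
#   [1 3 0 1]
#   [1 0 5 1]
#   [0 1 1 4]
BS, N_BLOCK_ROWS = 2, 2
ROW_PTR = [0, 2, 4]
COL_IDX = [0, 1, 0, 1]
BLOCKS = [[4.0, 1.0, 1.0, 3.0], [1.0, 0.0, 0.0, 1.0],
          [1.0, 0.0, 0.0, 1.0], [5.0, 1.0, 1.0, 4.0]]

def apply_a(v):
    return bcrs_matvec(N_BLOCK_ROWS, BS, ROW_PTR, COL_IDX, BLOCKS, v)

DIAG = bcrs_diagonal(N_BLOCK_ROWS, BS, ROW_PTR, COL_IDX, BLOCKS)
B = apply_a([1.0, 2.0, 3.0, 4.0])   # right-hand side built from a known solution
SOLUTION, ITERATIONS = jacobi_pcg(apply_a, DIAG, B)
```

Storing small dense blocks rather than individual coefficients means one column index is read per block instead of one per non-zero, which is the kind of saving in index traffic that block storage is meant to provide; the solver itself only needs a matrix-vector product and the diagonal, so the storage format can be swapped without touching `jacobi_pcg`.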

Acknowledgements

The authors thank the members of the GOCAD research consortium for their support (www.gocad.org), Xavier Cavin, Bruno Stefanizzi from AMD–ATI, and all the GPGPU team from NVIDIA, especially Tyler Worden. We would also like to thank the anonymous reviewers for their valuable comments on our work.
