Section B

Exploring emerging manycore architectures for uncertainty quantification through embedded stochastic Galerkin methods

Pages 707-729 | Received 30 Jan 2013, Accepted 29 Aug 2013, Published online: 05 Dec 2013

Abstract

We explore approaches for improving the performance of intrusive or embedded stochastic Galerkin uncertainty quantification methods on emerging computational architectures. Our work is motivated by the trend of increasing disparity between floating-point throughput and memory access speed on emerging architectures, which requires the design of new algorithms with memory access patterns more commensurate with computer architecture capabilities. We first compare the traditional approach for implementing stochastic Galerkin methods to non-intrusive spectral projection methods employing high-dimensional sparse quadratures on relevant problems from computational mechanics, and demonstrate that the performance of stochastic Galerkin is reasonable. Several reorganizations of the algorithm with improved memory access patterns are described, and their performance is measured on contemporary manycore architectures. We demonstrate that these reorganizations can lead to improved performance for the matrix–vector products needed by iterative linear system solvers, and highlight further algorithm research that might lead to even greater performance.

2010 AMS Subject Classifications:

View correction statement:
Erratum

Acknowledgements

This work was supported by the LDRD program of Sandia National Laboratories.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the US Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Notes

1. In some cases, better performance can be obtained through nested Clenshaw–Curtis or Gauss–Patterson abscissas; however, we found these methods to be more expensive for a similar level of accuracy when applied to these problems.

2. For each algorithm, we compute the floating-point throughput rate by dividing the total number of floating-point operations by the total run-time. For the two flat CRS approaches, the total number of floating-point operations is exactly twice the total number of matrix non-zeros. For the dense-block multiply it is 2(P+1)²·nz (where nz is the number of non-zeros arising from the spatial discretization), and for the polynomial multiply algorithm it is (5·nc + P + 1)·nz, where nc is the number of non-zero triple-product values.
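The flop-count formulas in the note above can be sketched in code. The following is a minimal illustration, not from the paper; the function names and the helper for converting a measured run-time into GFLOP/s are our own:

```python
# Sketch of the throughput accounting described in Note 2.
# Notation follows the note: P+1 polynomial terms per block,
# nz spatial non-zeros, nc non-zero triple-product values.

def flat_crs_flops(total_nonzeros):
    # Flat CRS matrix-vector product: one multiply and one add
    # per stored non-zero of the fully assembled matrix.
    return 2 * total_nonzeros

def dense_block_flops(P, nz):
    # Dense (P+1) x (P+1) block at each spatial non-zero.
    return 2 * (P + 1) ** 2 * nz

def polynomial_multiply_flops(P, nz, nc):
    # Stochastic polynomial multiply applied at each spatial non-zero.
    return (5 * nc + P + 1) * nz

def throughput_gflops(flops, runtime_seconds):
    # Throughput = total flops divided by total run-time, in GFLOP/s.
    return flops / runtime_seconds / 1.0e9
```

For example, a dense-block multiply with P = 3 and nz = 10 performs 2·16·10 = 320 floating-point operations.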
