Section B

Exploring emerging manycore architectures for uncertainty quantification through embedded stochastic Galerkin methods

Pages 707-729 | Received 30 Jan 2013, Accepted 29 Aug 2013, Published online: 05 Dec 2013

Abstract

We explore approaches for improving the performance of intrusive or embedded stochastic Galerkin uncertainty quantification methods on emerging computational architectures. Our work is motivated by the trend of increasing disparity between floating-point throughput and memory access speed on emerging architectures, which requires the design of new algorithms with memory access patterns more commensurate with computer architecture capabilities. We first compare the traditional approach for implementing stochastic Galerkin methods to non-intrusive spectral projection methods employing high-dimensional sparse quadratures on relevant problems from computational mechanics, and demonstrate that the performance of stochastic Galerkin is reasonable. Several reorganizations of the algorithm with improved memory access patterns are described and their performance measured on contemporary manycore architectures. We demonstrate that these reorganizations can lead to improved performance for matrix–vector products needed by iterative linear system solvers, and highlight further algorithm research that might lead to even greater performance.

2010 AMS Subject Classifications:

View correction statement:
Erratum

Acknowledgements

This work was supported by the LDRD program of Sandia National Laboratories.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the US Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Notes

1. In some cases better performance can be obtained through nested Clenshaw–Curtis or Gauss–Patterson abscissas; however, we found these methods to be more expensive for a similar level of accuracy when applied to these problems.

2. For each algorithm, we compute the floating-point throughput rate by dividing the total number of floating-point operations by the total run-time. For the two flat CRS approaches, the total number of floating-point operations is exactly twice the total number of matrix non-zeros. For the dense-block multiply, it is 2(P+1)^2 nz (where nz is the number of non-zeros arising from the spatial discretization), and for the polynomial multiply algorithm it is (5 nc + P + 1) nz, where nc is the number of non-zero triple-product values.
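The flop counts in the note above can be sketched as simple helper functions. This is an illustrative sketch only: the function names and the example sizes (P, nz, nc, run time) are assumptions, not values or code from the paper; the formulas themselves are those stated in the note.

```python
# Hypothetical helpers for the flop-count formulas in Note 2.
# All names and example sizes are illustrative assumptions.

def flops_flat_crs(nnz_total):
    # Flat CRS: one multiply and one add per stored matrix non-zero.
    return 2 * nnz_total

def flops_dense_block(P, nz):
    # Dense-block multiply: each spatial non-zero carries a dense
    # (P+1) x (P+1) block, i.e. 2(P+1)^2 flops per spatial non-zero.
    return 2 * (P + 1) ** 2 * nz

def flops_poly_multiply(P, nz, nc):
    # Polynomial multiply: (5*nc + P + 1) flops per spatial non-zero,
    # where nc is the number of non-zero triple-product values.
    return (5 * nc + P + 1) * nz

def throughput_gflops(flops, seconds):
    # Throughput = total flops divided by total run-time.
    return flops / seconds / 1e9

# Made-up example: P = 15, one million spatial non-zeros, 0.25 s run time.
flops = flops_dense_block(15, 10**6)
print(flops)                              # 512000000
print(throughput_gflops(flops, 0.25))     # 2.048
```

As the example suggests, the dense-block count grows quadratically in the polynomial order P, which is why the note distinguishes it from the polynomial-multiply count that depends instead on the sparsity nc of the triple products.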

