
Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations

Pages 311-329 | Received 08 Nov 2006, Accepted 20 Feb 2007, Published online: 06 Apr 2009
 

Abstract

The design of the hierarchical hybrid grids (HHG) framework is motivated by the desire to achieve high performance in large-scale, parallel finite element simulations on supercomputers. Achieving this goal requires careful analysis of the low-level, computationally intensive algorithms used to implement the library. This analysis is primarily concerned with identifying and removing bottlenecks that limit the serial performance of multigrid component algorithms such as smoothing and residual calculation. To aid in this investigation, two metrics have been developed: the balance metric (BM) and the loads per miss metric (LPMM). Each metric makes assumptions about how the data structures and algorithms interact with the processors and memory subsystems of the architectures on which they are implemented. Applying these metrics yields performance predictions that can be compared with measured results to determine the actual characteristics of an algorithm/data structure combination on a given platform. This information can then guide optimizations that increase performance.
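The idea behind balance-style and loads-per-miss estimates can be illustrated with a small calculation. This is a sketch under stated assumptions, not the paper's definitions of BM or LPMM: it takes the common form in which machine balance is sustainable memory bandwidth divided by peak flop rate and code balance is memory traffic per flop, and it treats loads per miss simply as load instructions divided by cache misses. The function names `balance_prediction` and `loads_per_miss`, and all machine and kernel numbers, are hypothetical.

```python
# Hypothetical sketch: a balance-style performance bound and a loads-per-miss
# ratio. These are common textbook forms assumed for illustration only; they
# are not the formulas used for BM and LPMM in the paper.


def balance_prediction(peak_mflops: float, bandwidth_mwords: float,
                       words_per_update: float, flops_per_update: float) -> float:
    """Predicted MFlop/s = peak * min(1, machine balance / code balance)."""
    machine_balance = bandwidth_mwords / peak_mflops    # words/flop the memory system can supply
    code_balance = words_per_update / flops_per_update  # words/flop the kernel demands
    return peak_mflops * min(1.0, machine_balance / code_balance)


def loads_per_miss(loads_per_update: float, misses_per_update: float) -> float:
    """Number of load instructions served per cache miss."""
    return loads_per_update / misses_per_update


# Hypothetical machine: 6000 MFlop/s peak, 800 MWords/s sustained bandwidth.
# Hypothetical kernel: 2 words of memory traffic and 9 flops per grid-point update.
predicted = balance_prediction(6000.0, 800.0, 2.0, 9.0)
print(round(predicted))  # 3600 (memory-bound: 60% of peak)

# 2D five-point Gauss-Seidel update: 5 loads (4 neighbours + right-hand side).
# With 8 doubles per cache line, two different cache assumptions give two counts:
LINE = 8
lpmm_rows_cached = loads_per_miss(5, 2 / LINE)  # only row i+1 of u and f stream in
lpmm_no_reuse = loads_per_miss(5, 4 / LINE)     # rows i-1, i, i+1 of u and f all miss
print(lpmm_rows_cached, lpmm_no_reuse)  # 20.0 10.0
```

Changing only the assumed cache behavior halves the loads-per-miss count, which is why such predictions are only useful when checked against measured results on the target platform.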

In this paper, we first present an overview of the HHG framework. Next, we introduce the details of the two performance metrics. These metrics are then applied to three different data structures used to implement a Gauß–Seidel smoothing algorithm. We give performance results and interpret how these data structures interact with several relevant supercomputing architectures. Finally, we briefly discuss some performance results of the HHG framework and close with concluding remarks.
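A lexicographic Gauß–Seidel smoothing sweep, together with the residual computation mentioned above, can be sketched on a single structured block. This is a hypothetical illustration, not one of the three data structures studied in the paper: it assumes a cube of n³ points stored in one flat array with k varying fastest, Dirichlet values held in the outermost layer, and a 7-point finite-difference Laplacian chosen for simplicity (the stencils arising in the paper's finite element setting may differ). The helper names `idx`, `gauss_seidel_sweep` and `residual_norm` are made up for this example.

```python
import math


def idx(i, j, k, n):
    """Flat index of point (i, j, k) in an n*n*n array stored with k fastest."""
    return (i * n + j) * n + k


def gauss_seidel_sweep(u, f, n, h):
    """One lexicographic Gauss-Seidel sweep for -Laplace(u) = f, 7-point stencil.

    Dirichlet values live in the outermost layer of u and are never updated.
    Updated values are used immediately, which is what makes this Gauss-Seidel.
    """
    h2 = h * h
    si, sj = n * n, n  # strides in i and j; the stride in k is 1
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                c = idx(i, j, k, n)
                u[c] = (h2 * f[c]
                        + u[c - si] + u[c + si]
                        + u[c - sj] + u[c + sj]
                        + u[c - 1] + u[c + 1]) / 6.0


def residual_norm(u, f, n, h):
    """Euclidean norm of r = f - A u over the interior points."""
    inv_h2 = 1.0 / (h * h)
    si, sj = n * n, n
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                c = idx(i, j, k, n)
                au = (6.0 * u[c] - u[c - si] - u[c + si]
                      - u[c - sj] - u[c + sj] - u[c - 1] - u[c + 1]) * inv_h2
                total += (f[c] - au) ** 2
    return math.sqrt(total)


# Hypothetical test problem: 7 interior points per direction, f = 1, u = 0 on the boundary.
n = 9
h = 1.0 / (n - 1)
u = [0.0] * n**3
f = [1.0] * n**3
r0 = residual_norm(u, f, n, h)
for _ in range(10):
    gauss_seidel_sweep(u, f, n, h)
r10 = residual_norm(u, f, n, h)
print(f"residual before: {r0:.3f}, after 10 sweeps: {r10:.3f}")
```

The flat layout makes the constant strides (1, n, n²) of each neighbour access explicit; a different storage scheme for the same sweep changes how these accesses map onto cache lines, even though the arithmetic is identical.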

Notes

†† As we will see, it is important to consider not only what algorithm is being analyzed, but also how that algorithm is implemented, since this affects the way in which data are accessed.

‡‡ The current HHG implementation is designed to accommodate three-dimensional grids. The two-dimensional example is included because it is easier to visualize.

¶¶ The example given here is only one possibility for counting this metric. If we were to make different assumptions about the cache behavior of the algorithm, we would get a different count.

∥∥ Recall that all three algorithms have the same spatial locality by design.

Additional information

Notes on contributors

G. Wellein

F. Hülsemann

U. Rüde
