Original Articles

Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations

Pages 311-329 | Received 08 Nov 2006, Accepted 20 Feb 2007, Published online: 06 Apr 2009
 

Abstract

The design of the hierarchical hybrid grids (HHG) framework is motivated by the desire to achieve high performance on large-scale, parallel, finite element simulations on supercomputers. Realizing this goal requires careful analysis of the low-level, computationally intensive algorithms used in implementing the library. This analysis is primarily concerned with identifying and removing bottlenecks that limit the serial performance of multigrid component algorithms such as smoothing and residual error calculation. To aid in this investigation, two metrics have been developed: the balance metric (BM) and the loads per miss metric (LPMM). Each of these metrics makes assumptions about how various data structures and algorithms interact with the underlying memory subsystems and processors of the architectures on which they are implemented. Applying these metrics generates performance predictions that can be compared with measured results to determine the actual characteristics of an algorithm/data structure combination on a given platform. This information can then be used to increase performance.

In this paper, we first present an overview of the HHG framework. Next, we introduce the details of the two performance metrics. These metrics are then applied to three different data structures used to implement a Gauß–Seidel smoothing algorithm. Performance results and an interpretation of the underlying interactions of the data structures with several relevant supercomputing architectures are given. Finally, we present a brief discussion of some performance results of the HHG framework, followed by some concluding remarks.
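For readers unfamiliar with the smoother named above, the following is a minimal sketch of a lexicographic Gauß–Seidel sweep and a residual calculation for the 2D Poisson problem with a 5-point stencil. It is an illustration only, not the HHG implementation: the nested-list grid layout, the model problem, and the function names `gauss_seidel_sweep` and `residual_max_norm` are all assumptions made here for the example.

```python
def gauss_seidel_sweep(u, f, h):
    """One in-place lexicographic Gauss-Seidel sweep for -Laplace(u) = f.

    u and f are square nested lists (assumed layout); boundary values of u
    are treated as fixed Dirichlet data and are never modified.
    """
    n = len(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Neighbours at (i-1, j) and (i, j-1) already hold updated values.
            u[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                + h * h * f[i][j]
            )
    return u


def residual_max_norm(u, f, h):
    """Maximum norm of the residual f - A*u over the interior points."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            a_u = (
                4.0 * u[i][j]
                - u[i - 1][j] - u[i + 1][j] - u[i][j - 1] - u[i][j + 1]
            ) / (h * h)
            worst = max(worst, abs(f[i][j] - a_u))
    return worst


if __name__ == "__main__":
    n = 9                      # grid points per direction (hypothetical size)
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]
    f = [[1.0] * n for _ in range(n)]
    print("residual before:", residual_max_norm(u, f, h))
    for _ in range(50):
        gauss_seidel_sweep(u, f, h)
    print("residual after: ", residual_max_norm(u, f, h))
```

Both loops perform only a handful of floating-point operations per grid point while reading several neighbouring values, which is why the way the grid is stored in memory can matter as much as the arithmetic itself.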

Notes

† As we will see, it is important to consider not only what algorithm is being analyzed, but also how that algorithm is implemented, since this affects the way in which data are accessed.

‡ The current HHG implementation is designed to accommodate three-dimensional grids. The two-dimensional example is included because it is easier to visualize.

¶ The example given here is only one possibility for counting this metric. If we were to make different assumptions about the cache behavior of the algorithm, we would get a different count.

∥ Recall that all three algorithms have the same spatial locality by design.

Additional information

Notes on contributors

G. Wellein


F. Hülsemann


U. Rüde

