Abstract
Data parallelism, in which the same operation is performed on many elements of an n-dimensional array, is one of the most powerful methods of extracting parallelism in scientific computation. One form of data parallelism involves defining a sequence of parallel wavefronts of a computation. Each wavefront consists of an (n - 1)-dimensional subarray of the evaluated array, and all elements of a wavefront are evaluated simultaneously. Different wavefronts result in different performance, so the question arises of how to determine the wavefronts that minimize the computation time. Wavefront determination should also define the allocation of wavefront elements to processors.
In this paper we present efficient algorithms for determining the optimum wavefront and for partitioning it into sections assigned to individual processors. The presented algorithms are applicable to computations that are defined over arrays of two or more dimensions and are executed on distributed-memory machines interconnected into a one- or two-dimensional processor array.
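To make the notion of a wavefront concrete, the following minimal sketch enumerates the anti-diagonal wavefronts of a two-dimensional array and splits one wavefront into contiguous sections, one per processor. It is an illustration only and not the optimization algorithm of this paper: the choice of anti-diagonals, the recurrence a[i][j] = a[i-1][j] + a[i][j-1], and the names `antidiagonal_wavefronts`, `partition_wavefront`, and `sweep` are all assumptions introduced here.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def antidiagonal_wavefronts(rows: int, cols: int) -> List[List[Cell]]:
    """Group the cells of a rows x cols array into wavefronts i + j = k.

    Assumption: each cell depends only on its upper and left neighbours,
    so all cells on one anti-diagonal can be evaluated simultaneously.
    """
    fronts: List[List[Cell]] = []
    for k in range(rows + cols - 1):
        lo = max(0, k - cols + 1)   # smallest valid row index on diagonal k
        hi = min(rows - 1, k)       # largest valid row index on diagonal k
        fronts.append([(i, k - i) for i in range(lo, hi + 1)])
    return fronts


def partition_wavefront(front: List[Cell], num_procs: int) -> List[List[Cell]]:
    """Split one wavefront into num_procs contiguous, near-equal sections."""
    base, extra = divmod(len(front), num_procs)
    sections: List[List[Cell]] = []
    start = 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)  # first `extra` sections get one more cell
        sections.append(front[start:start + size])
        start += size
    return sections


def sweep(rows: int, cols: int) -> List[List[int]]:
    """Evaluate a[i][j] = a[i-1][j] + a[i][j-1] wavefront by wavefront.

    Row 0 and column 0 are boundary values fixed at 1 (hypothetical data).
    The inner loop is sequential here; on a parallel machine its
    iterations are independent and could run concurrently.
    """
    a = [[1] * cols for _ in range(rows)]
    for front in antidiagonal_wavefronts(rows, cols):
        for i, j in front:
            if i > 0 and j > 0:
                a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

In this sketch the wavefront lengths grow and then shrink across the sweep, so an even split of each front leaves some processors idle on the short fronts. That imbalance is one reason the choice of wavefront and of its partitioning affects the total computation time.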
Notes
∗This work was sponsored in part by IBM Corp. under the Development Grant, by NSF under grant CCR-8920694, and by ONR under grant N00014-93-1-0076. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.