Abstract
The data dependence method for implementing algorithms on processor arrays is based on linear transformations of the data dependencies and the index set of the algorithm. The implementations obtained with this method have regular data flow, but the method alone cannot minimize the array size or otherwise optimize the implementation.
More efficient arrays can be obtained by applying certain nonlinear transformations. The resulting implementations realize the algorithm with fewer processors than those obtained by linear transformations alone. These nonlinear implementations are derived from properties of certain special linear transformations. Properties such as symmetry of the implementation, and pairs of processors whose active and inactive time instants are interchanged, are defined in terms of the data dependence method.
The two-dimensional hexagonal (hex) matrix multiplication algorithm has all of these properties. The array implementation in this example is folded, yielding a triangular array. It is shown that the folding can be performed by a nonlinear mapping that modifies the linear mapping which produced the original parallel implementation. The implementation is proved correct, but its data flow is nonregular.
An improved version of the same idea is also presented. This implementation likewise uses nonlinear transformations, but achieves regular data flow and a parallel/systolic implementation.