Abstract
The systolic concept in the parallel architecture design proposed by the H. T. Kung [1,2] obtains high throughput and speedups. The linear array for the matrix vector multiplication executes the algorithm in 2n − 1 time steps using 2n − 1 processors. Although the speedup obtained is very high, the efficiency is very poor (typical values of 25% efficiency for problem size greater than 10). H. T. Kung proposed an idea for a linear systolic array using two data streams flowing in opposite directions. However, the processors in the array perform operations in every second time moment.
Attempts to improve this design have been made by many researchers. Nonlinear and folding transformations techniques [3,4,5] only decrease the number of processors used to half the size, but do not affect the time.
We propose the use of a fast linear systolic computation procedure to obtain a solution that uses 3n/2 processors and executes the algorithm in 3n/2 time steps for the same cells, the same communication and the same regular data flow as the H. T. Kung linear array. Only the algorithm is restructured and more efficiently organized. Now the processors are utilized in every time step and no idle steps are required.
1University “Kiril i Metodij” of Skopje, PMF Institut za Informatika, Arhimedova 5, p.f. 162, YU-91000 Skopje, Macedonia, Yugoslavia.
2Jozef Stefan Institute, Jamova 39, YU-61111 Ljubljana, Slovenia.
1University “Kiril i Metodij” of Skopje, PMF Institut za Informatika, Arhimedova 5, p.f. 162, YU-91000 Skopje, Macedonia, Yugoslavia.
2Jozef Stefan Institute, Jamova 39, YU-61111 Ljubljana, Slovenia.
Notes
1University “Kiril i Metodij” of Skopje, PMF Institut za Informatika, Arhimedova 5, p.f. 162, YU-91000 Skopje, Macedonia, Yugoslavia.
2Jozef Stefan Institute, Jamova 39, YU-61111 Ljubljana, Slovenia.