Abstract
Bidirectional linear arrays (BLAs) have been derived either heuristically for various algorithms or by following the "systolic principle" introduced by H. T. Kung and C. E. Leiserson. This principle is restrictive, since the data items in each data stream have to enter the array at every second time moment. This was regarded as "the only" design that enables each data item to meet all the elements of the other data stream.
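The every-second-time-moment constraint can be illustrated with a minimal sketch in Python. It assumes a simplified model that is not taken from the paper: two streams travel in opposite directions at one cell per time moment, and two items "meet" when they occupy the same cell at the same time. The function name `meeting_pairs` and its parameters are hypothetical.

```python
def meeting_pairs(num_items, num_cells, spacing):
    """Return the set of (i, j) index pairs whose items share a cell at some time.

    Assumed model: item i of the left-to-right stream enters cell 0 at time
    spacing * i; item j of the right-to-left stream enters the last cell at
    time spacing * j. Both advance one cell per time moment.
    """
    pairs = set()
    # After this time every item has left the array.
    last_time = spacing * (num_items - 1) + num_cells
    for t in range(last_time + 1):
        for i in range(num_items):
            x_pos = t - spacing * i  # cell of left-to-right item i
            if not 0 <= x_pos < num_cells:
                continue  # item i is not inside the array at time t
            for j in range(num_items):
                y_pos = (num_cells - 1) - (t - spacing * j)  # right-to-left item j
                if x_pos == y_pos:
                    pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    items, cells = 3, 5
    every_second = meeting_pairs(items, cells, spacing=2)
    consecutive = meeting_pairs(items, cells, spacing=1)
    print(len(every_second), "of", items * items, "pairs meet with spacing 2")
    print(len(consecutive), "of", items * items, "pairs meet with spacing 1")
```

In this model, items entering at consecutive time moments pass each other between cells whenever their indices have different parity, so only some of the pairs meet. With a spacing of two time moments and enough cells, every pair meets.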
We introduce a new design and show that the bidirectional linear array can be organized with data items that enter the array at consecutive time moments. The standard IPS cell is slightly modified to compute two pipelineable IPS operations instead of one, and a special resident memory is included in the cell. This enables each data item both to flow along the data path and to be delayed in the same cell for the next time moment, so that it can meet all the data items from the other data stream. The cell and the algorithm are discussed in full detail, and the design is then compared to the standard BLA design.
It is shown that the fastest systolic BLA design offers a speedup of 2, uses half as many processors as the standard BLA design, and has an efficiency 4 times greater. Unlike in standard BLA designs, the maximum efficiency of 100% can be reached in all the cells. MA/AR filters are presented as examples in full detail. The fastest BLA design can be implemented on all available processor arrays. Even the systolic WARP cell can emulate this cell, although it was designed as an implementation of the standard BLAs.
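The factor of 4 follows from the other two figures under the usual definition of efficiency as speedup divided by the number of processors. The symbols below are assumptions introduced for illustration only: $T_1$ is the sequential execution time, $T$ and $P$ are the execution time and processor count of the standard BLA design, and primed symbols refer to the fastest BLA design.

```latex
E = \frac{T_1}{P \, T}, \qquad
T' = \frac{T}{2}, \qquad
P' = \frac{P}{2}
\quad\Longrightarrow\quad
E' = \frac{T_1}{P' \, T'}
   = \frac{T_1}{\frac{P}{2} \cdot \frac{T}{2}}
   = 4 \, \frac{T_1}{P \, T}
   = 4E .
```

Halving the execution time and halving the processor count each contribute a factor of 2 to the efficiency.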