Abstract
In this paper, we extend our previous work on reducing the transmission cost of parallel pipelined messages distributed in a block-cyclic fashion. We apply the same transmission strategy, but here we aim to reduce the index computation overhead. More specifically, we show how to reduce the computations required to determine the interprocessor communication cost, and we introduce a more efficient, index-based use of memory.