Abstract
This article proposes a new, efficient modular division algorithm suitable for systolic implementation, together with its systolic architecture. With a new exit condition for the while loop and a new method of updating a control variable, the new algorithm reduces the average number of iterations by more than 14.3% compared to the algorithm proposed by Chen, Bai and Chen. Based on the new algorithm, we design a fast systolic architecture with an optimised core computing cell. Compared to the architecture proposed by Chen, Bai and Chen, our systolic architecture reduces the critical path delay by about 18% and the total computation time for one modular division by almost 30%, at the cost of about 1% more cells. Moreover, with the addition of a flag signal and three logic gates, the proposed systolic architecture can also perform Montgomery modular multiplication, thereby realising a fast unified modular divider/multiplier.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (60673071).
Notes
1. In expressions involving bit variables, a_i b_i for bitwise AND of a_i and b_i, ā_i for bitwise NOT of a_i, | for bitwise OR, ⊕ for bitwise exclusive OR, for bitwise exclusive NOR.
2. T(x) denotes the delay of an input x or a logic element x. Mux stands for multiplexer.