Abstract
Standard RSA relies on multiple big-number modular exponentiation operations, and a longer key length is required for better protection. This imposes a hefty time penalty on encryption and decryption. In this study, we analysed and developed an improved parallel algorithm (PMKRSA) based on the idea of splitting the plaintext into multiple chunks and encrypting the chunks using multiple key pairs. The algorithm in our new scheme lends itself so naturally to parallelised implementation that we also investigated its parallelisation in a GPU environment. In the following, the structure of our new scheme is outlined and its correctness is proved mathematically. Then, with the algorithm implemented and optimised on both CPU and CPU+GPU platforms, we showed that our algorithm shortens the computational time considerably and that it has a security advantage over standard RSA, as it is invulnerable to the common attacks. Finally, we also demonstrated through simulation the feasibility of using our algorithm to encrypt large files. The results show that, over the set of file sizes 1 MB, 10 MB, 25 MB, 50 MB and 100 MB, the average encryption and decryption times of the CPU version are 0.2476 s and 9.4476 s, and those of the CPU+GPU version are 0.0009 s and 0.0618 s, respectively.
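The chunk-and-multiple-key-pair idea summarised above can be illustrated with a minimal sketch. This is not the PMKRSA scheme as specified in the paper: the toy primes, the 2-byte chunk size, the round-robin key assignment and the helper names (make_key, split_chunks, encrypt, decrypt) are all assumptions made for illustration, and the parameters are far too small to be secure. The thread pool only shows that chunks can be processed independently; it is not the CPU or GPU implementation measured in the study.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 2  # bytes per chunk (assumed); chunk values stay below every modulus


def make_key(p, q, e):
    """Build a toy RSA key (n, e, d) from two small primes. NOT secure."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return n, e, d


# Several independent toy key pairs (hypothetical values).
KEYS = [
    make_key(1009, 1013, 17),
    make_key(1019, 1021, 7),
    make_key(1031, 1033, 11),
]


def split_chunks(data):
    """Split bytes into integers of at most CHUNK bytes each."""
    return [int.from_bytes(data[i:i + CHUNK], "big")
            for i in range(0, len(data), CHUNK)]


def encrypt(data):
    """Encrypt chunk i with key pair i mod len(KEYS); chunks are independent."""
    def enc(item):
        i, m = item
        n, e, _ = KEYS[i % len(KEYS)]
        return pow(m, e, n)

    with ThreadPoolExecutor() as pool:
        cipher = list(pool.map(enc, enumerate(split_chunks(data))))
    return cipher, len(data)


def decrypt(cipher, length):
    """Decrypt each chunk with its matching private exponent and reassemble."""
    def dec(item):
        i, c = item
        n, _, d = KEYS[i % len(KEYS)]
        size = min(CHUNK, length - i * CHUNK)  # last chunk may be shorter
        return pow(c, d, n).to_bytes(size, "big")

    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(dec, enumerate(cipher)))


cipher, length = encrypt(b"hello RSA")
recovered = decrypt(cipher, length)
```

Because each chunk depends only on its own key pair, the per-chunk exponentiations have no data dependencies on one another, which is the property that makes this kind of scheme a natural fit for parallel hardware.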
Acknowledgements
The computation in this paper was done on the USBC cluster (see Note 8) provided by Beijing Normal University-Hong Kong Baptist University-United International College (UIC) Department of Statistics.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 CUDA: Parallel computing platform and programming model developed by NVIDIA for GPGPU.
2 An API that supports multi-platform shared-memory parallel programming in C/C++ and Fortran.
3 We define and as the exponentiation in encryption and the exponentiation in decryption respectively for the matrix form plaintext, ° is the symbol of the Hadamard product.
4 For more details on OpenSSL, see: https://github.com/openssl/openssl
5 GMP: https://gmplib.org/
6 According to the Linux Programmer's Manual, /dev/random provides output only when it has collected enough physical entropy. To ensure that enough entropy is available, we use haveged to generate additional entropy.
7 We use only 2 cores and 4 cores to simulate a personal computer configuration. A server processor usually runs at a lower frequency than a PC processor, so the effect of CPU acceleration would be more pronounced on a PC.
8 USBC is short for BNU-HKBU United International College Statistics Bayes Cluster