Area-Efficient Hardware Design of Modular Exponentiation based on Montgomery Multiplier for RSA Cryptosystem

Richard Boateng Nti1 and Kwangki Ryoo1

1 Graduate School of Information & Communications, Hanbat National University, 125 Dongseodaero, Yuseong-Gu, Daejeon 34158, Republic of Korea


Abstract. Modular exponentiation is the most time-consuming mathematical operation in public key cryptosystems such as RSA. The primary operation of the RSA cryptosystem is modular exponentiation, computed by repeated modular multiplication. Fast modular multiplication algorithms have been proposed to speed up encryption/decryption while minimizing area. However, the Montgomery algorithm is limited by the carry propagation delay that arises when adding long operands. In this paper, we propose a hardware structure that simplifies the operation of the Q logic, coupled with a compact carry-save adder (CSA), in the Montgomery multiplier. The resulting design was applied to modular exponentiation for lightweight RSA applications. Synthesis results showed that the new multiplier design achieved a reduced hardware area and, consequently, an area-efficient modular exponentiation design. The modular exponentiation design achieved a frequency of 452.49 MHz with 85K gate equivalents (GE) in 130 nm CMOS technology.

Keywords: Public key cryptosystem, RSA, Carry-Save Adder (CSA), Montgomery multiplication, Modular exponentiation.

1 Introduction

Maintaining privacy and data integrity is essential from a security standpoint, and it can be achieved through two techniques: authentication and cryptography [1]. Public key cryptosystems are vital for information security. RSA is the most widely used public key algorithm; it computes modular exponentiation through repeated modular multiplication [2]. Modular multiplication of large numbers is time-consuming, and the Montgomery algorithm is commonly used as its core. The Montgomery algorithm determines the quotient by replacing trial division by the modulus with a series of additions and shift operations [3]. However, each addition of long operands suffers from carry propagation delay. To avoid this delay, several approaches based on carry-save addition have been proposed. Depending on how the operands are represented, these approaches can be divided into two categories.
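The shift-and-add idea can be illustrated with a minimal Python sketch of bit-serial (radix-2) Montgomery multiplication. It is a software model assumed for illustration, not the hardware described in this paper; the function name `montgomery_radix2` is hypothetical, and the sketch assumes an odd modulus n < 2^k with 0 <= a, b < n. Each iteration adds a partial product, adds n when the running sum is odd so that it becomes divisible by 2, and then halves it, so no trial division by n is ever performed.

```python
def montgomery_radix2(a: int, b: int, n: int, k: int) -> int:
    """Bit-serial (radix-2) Montgomery multiplication (illustrative model).

    Returns a * b * 2^(-k) mod n, assuming n is odd, n < 2^k and 0 <= a, b < n.
    """
    assert n % 2 == 1, "Montgomery reduction requires an odd modulus"
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1      # scan multiplier bits from LSB to MSB
        s += a_i * b            # add the partial product a_i * b
        q = s & 1               # quotient bit: 1 if s is odd
        s += q * n              # adding odd n makes s even; no trial division
        s >>= 1                 # exact division by 2 (a right shift)
    # The invariant s < 2n holds, so one conditional subtraction suffices.
    if s >= n:
        s -= n
    return s
```

For example, with n = 13 and k = 4, `montgomery_radix2(7, 5, 13, 4)` returns 3, which equals 7·5·16^(-1) mod 13. The extra factor 2^(-k) in the result is why operands are usually mapped into and out of a Montgomery domain during exponentiation.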

In the first approach, the intermediate results are kept in carry-save form to avoid carry propagation [4], while the input and output operands of the algorithm (i.e., A, B, N and S) remain in binary representation. A final conversion from carry-save form to binary form must therefore be performed, at the extra cost of a carry-propagate adder (CPA) [3]. This increases the total computation time and makes the throughput rate dependent on the operand length.
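To make the first approach concrete, here is a hedged Python sketch that keeps the running result as a (sum, carry) pair while A, B and N stay in binary. The names `csa` and `montgomery_carry_save` are hypothetical, and the structure (two levels of 3:2 CSAs per iteration) is an assumption chosen for illustration, not a reproduction of any design cited here. The quotient bit needs only the least significant bits of the three addends, and a full carry-propagate addition appears only once, after the loop.

```python
from typing import Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save adder on whole words: x + y + z == sum + carry."""
    total = x ^ y ^ z                              # per-bit sum, no carry ripple
    carry = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, weight 2
    return total, carry


def montgomery_carry_save(a: int, b: int, n: int, k: int) -> int:
    """Radix-2 Montgomery product a * b * 2^(-k) mod n with a carry-save accumulator.

    Assumes odd n < 2^k and 0 <= a, b < n; inputs and output are plain binary.
    """
    s, c = 0, 0                        # running result held as a (sum, carry) pair
    for i in range(k):
        pp = b if (a >> i) & 1 else 0  # partial product a_i * b
        # Q logic: the parity of s + c + pp is the XOR of the three LSBs.
        q = (s ^ c ^ pp) & 1
        s, c = csa(s, c, pp)              # CSA level 1
        s, c = csa(s, c, n if q else 0)   # CSA level 2
        # The carry LSB is always 0 and s + c is even, so both LSBs are 0
        # and shifting each vector separately is an exact halving.
        s, c = s >> 1, c >> 1
    result = s + c                     # final carry-propagate addition (CPA)
    return result - n if result >= n else result
```

The single `s + c` line after the loop models the CPA whose delay grows with the operand length.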

In contrast, some works use a 5-to-2 carry-save adder (CSA) [5], built as a three-level CSA tree, to deal with this problem without performing format conversion. This second approach avoids repeatedly converting each Montgomery multiplication output into an input operand by keeping all input and output operands in carry-save form; only the final result is converted. McIvor et al. [6] proposed two variants of the Montgomery algorithm, both using CSAs, to compute the exponentiation. One is based on a five-to-two CSA and the other on a four-to-two CSA plus a multiplexer. They perform a Montgomery multiplication in only k + 1 and k + 2 clock cycles, respectively, where k is the operand bit length. The latter approach, MM42, reduces the number of input operands but introduces extra multiplexers and select signals to choose the desired operands, and it also requires more registers to store the combined inputs.
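A 5-to-2 CSA can be modelled as three cascaded levels of 3:2 carry-save adders. The Python sketch below is a hypothetical software model (`csa` and `csa_5_to_2` are illustrative names). As an assumed example of where five operands arise: if the multiplicand B is held as a carry-save pair (B1, B2) and a_i is a single multiplier bit, one Montgomery iteration adds S, C, a_i·B1, a_i·B2 and q·N. Each level is a row of independent full adders, so no carry ripples across the word.

```python
from typing import Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """One level of 3:2 carry-save adders: x + y + z == sum + carry."""
    total = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return total, carry


def csa_5_to_2(v: int, w: int, x: int, y: int, z: int) -> Tuple[int, int]:
    """Reduce five operands to a (sum, carry) pair with a three-level CSA tree."""
    s1, c1 = csa(v, w, x)    # level 1: five operands -> four (s1, c1, y, z)
    s2, c2 = csa(s1, c1, y)  # level 2: four operands -> three (s2, c2, z)
    return csa(s2, c2, z)    # level 3: three operands -> two
```

For instance, `s, c = csa_5_to_2(1, 2, 3, 4, 5)` yields a pair with s + c == 15. However wide the operands are, the tree's depth stays at three full-adder delays.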

In this paper, we focus on the hardware design of an efficient Montgomery multiplier with a two-level adder. A simplified Q_logic that operates at the bit level was designed, which accounts for a reduction in hardware area. The proposed Montgomery multiplier is then applied in the H-algorithm to compute modular exponentiation.
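The exponentiation step can be sketched in Python as left-to-right square-and-multiply built on Montgomery multiplication. We assume here that the H-algorithm denotes this MSB-first binary method; the function names `mont_mul` and `mont_exp_h`, and the choice R = 2^k with k the bit length of n, are illustrative assumptions, not the paper's hardware. The message is converted into the Montgomery domain once (via R^2 mod n) and converted back once at the end, so every intermediate step is a single Montgomery multiplication.

```python
def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Radix-2 Montgomery product a * b * 2^(-k) mod n (odd n < 2^k, a, b < n)."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b   # add partial product
        s += (s & 1) * n          # make the sum even by adding n if needed
        s >>= 1                   # exact halving
    return s - n if s >= n else s


def mont_exp_h(m: int, e: int, n: int) -> int:
    """m^e mod n by MSB-first square-and-multiply on Montgomery products.

    Assumption: 'H-algorithm' = left-to-right (MSB-first) binary exponentiation.
    """
    assert n > 1 and n % 2 == 1, "Montgomery arithmetic needs an odd modulus > 1"
    k = n.bit_length()                        # R = 2^k > n
    r_mod_n = (1 << k) % n                    # Montgomery form of 1
    r2_mod_n = (1 << (2 * k)) % n             # R^2 mod n, to enter the domain
    m_bar = mont_mul(m % n, r2_mod_n, n, k)   # m * R mod n
    x = r_mod_n
    for bit in bin(e)[2:]:                    # exponent bits from MSB to LSB
        x = mont_mul(x, x, n, k)              # square
        if bit == "1":
            x = mont_mul(x, m_bar, n, k)      # multiply
    return mont_mul(x, 1, n, k)               # leave the domain: x * R^(-1) mod n
```

With the textbook RSA parameters n = 3233, e = 17 and d = 2753, `mont_exp_h(65, 17, 3233)` returns 2790, and `mont_exp_h(2790, 2753, 3233)` recovers 65.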

Section 2 reviews the radix-2 Montgomery multiplication and modular exponentiation algorithms. Section 3 presents the proposed design of the Montgomery multiplier. Section 4 compares different Montgomery multipliers as well as modular exponentiation designs. Finally, Section 5 draws concluding remarks.

2 Modular Multiplication and Exponentiation Algorithms

Montgomery multiplication is an algorithm for computing the modular product of two integers A and B modulo N; strictly, it returns A·B·R^(-1) mod N, where R = 2^k for a k-bit odd modulus N. Algorithm 1 shows the radix-2 version of the algorithm. Given two integers a and b; where a, b