# I. Introduction

As telecommunication networks have grown explosively and the Internet has become increasingly popular, security over the network has become the main concern for services such as electronic commerce [1]. The fundamental security requirements include confidentiality, authentication, data integrity, and non-repudiation. Cryptography plays an important role in the security of data: it enables sensitive information to be stored or transmitted across insecure networks so that unauthorized persons cannot read it. The urgent need for secure exchange of digital data has resulted in a large number of encryption algorithms, which can be classified into two groups: symmetric-key (private-key) algorithms and asymmetric-key (public-key) algorithms [2]. Many systems use public-key cryptography to provide such security services, and the algorithm developed by Rivest, Shamir, and Adleman (RSA) [3] is one of the most widely adopted public-key algorithms at present.

Since RSA is considered an efficient and optimized solution for public-key cryptography, we have implemented the Commutative RSA (CRSA) approach for authenticating data communication between Multiple Input Multiple Output (MIMO) or transceiver systems. In most existing data authentication or security systems, authentication is accomplished by a key exchange approach, which increases the key exchange overhead. Moreover, encryption and decryption are required at every terminal, and if the general RSA approach is applied in that case, data authentication and security could be violated. Therefore, in order to achieve data security with individual encryption and decryption at each terminal, without affecting the security or integrity of the data, a modified RSA has been developed; this mechanism is termed Commutative RSA (CRSA).

RSA is the most widely used public-key cryptosystem.
An RSA operation is a modular exponentiation, which requires repeated modular multiplications. The Montgomery multiplication algorithm [4] is among the most efficient modular multiplication algorithms available. It replaces trial division by the modulus with a series of additions and divisions by a power of two. It is therefore well suited to hardware implementation and forms the basis of many of the currently reported RSA hardware architectures [5][6][7]. To date, several techniques have been proposed to avoid carry propagation during the addition stages of the computation, as this is a key factor in determining performance. One approach, proposed by Elbirt and Paar [6], is to break these additions into x-bit stages, where x is an optimal bit length chosen to take advantage of the fast carry chains available on modern FPGAs. A drawback of this approach is that the resulting circuits can be very heavily technology and implementation dependent. For example, it is unlikely that a design created in this manner for a specific FPGA family will show the same speed advantages if migrated to a modern ASIC technology or, indeed, to an alternative type of FPGA or Programmable Logic Device (PLD). An alternative approach, presented by Blum and Paar [7], is based on FPGA systolic array multiplier architectures with varying processing element sizes, namely 4, 8 and 16 bits. However, these systems are again tailored specifically to the Xilinx FPGA series.

The operands, such as the plain text of a message, the cipher text, or possibly a partially ciphered text, are usually large. To improve the time requirements of the encryption and decryption operations, it is therefore essential to minimize the number of multiplications performed and to reduce the time required for a single multiplication. There are various algorithms that implement multiplication.
However, considering the versatility and robustness of the Montgomery multiplication approach, we have used the Montgomery multiplication algorithm. Its most attractive feature is that it computes modular multiplications without trial division. The RSA algorithm and the Diffie-Hellman key exchange scheme need modular exponentiation, which binary or m-ary methods break into a series of modular multiplications and squarings; Montgomery's algorithm speeds up each of these operations. The efficient implementation of this long-word-length multiplication is crucial for the performance of public-key cryptography such as our proposed CRSA.

Exponentiation with a large modulus, usually accomplished by repeated modular multiplications, is widely used in public-key cryptosystems for secure data communication, and it is considerably time-consuming. As a result, the throughput of an RSA cryptosystem depends entirely on the speed of a single multiplication and on the number of multiplications performed. To speed up the computation, the Montgomery multiplication algorithm is used to relax the process of quotient determination, and carry-save addition (CSA) is employed to reduce the critical path delay. One way to achieve this is to use carry-save adders (CSAs) to perform the addition stages of Montgomery's algorithm. For example, Kim et al. [8] used two levels of carry-save logic (CSL) and a 32-bit carry-propagate adder along with a 32 × 32-bit shift register in order to perform the 1024-bit additions required. Bunimov et al. [9] improved on this by replacing one level of CSL with a look-up table.
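The way a binary method breaks an exponentiation into multiplications can be sketched in software. The following Python sketch is an illustration only and not the hardware design of this paper; the function name `mod_exp_binary` and the multiplication counter are assumptions introduced here. It scans the exponent from the most significant bit, squaring at every bit and multiplying when the bit is 1, so a k-bit exponent costs at most 2k modular multiplications. The plain `%` reduction stands in for the Montgomery multiplier.

```python
from typing import Tuple


def mod_exp_binary(base: int, exponent: int, modulus: int) -> Tuple[int, int]:
    """Left-to-right binary (square-and-multiply) modular exponentiation.

    Returns (base**exponent mod modulus, number of modular multiplications).
    Illustrative sketch: '%' plays the role of the hardware modular multiplier.
    """
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")

    result = 1 % modulus
    multiplications = 0
    for bit in bin(exponent)[2:]:          # exponent bits, most significant first
        result = (result * result) % modulus   # one squaring per exponent bit
        multiplications += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for a 1 bit
            multiplications += 1
    return result, multiplications


if __name__ == "__main__":
    value, count = mod_exp_binary(7, 65537, 3233)
    print("result:", value, "modular multiplications:", count)
```

Because every one of these multiplications operates on full-length operands, both levers named in the text matter: fewer multiplications (the exponentiation method) and a faster single multiplication (the Montgomery multiplier).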
To achieve data security and authentication among multiple MIMO or transceiver terminals with the proposed Commutative RSA cryptographic algorithm, we have implemented an enhanced and optimized novel data authentication architecture, the Commutative RSA algorithm with multiple MIMO or transceiver systems, and simulated it on FPGA devices. In this approach, three FPGA cores are considered in the simulation framework, and RSA encryption and decryption are simulated at every terminal. The developed architecture uses Montgomery modular multiplication to speed up the computation and to relax the process of quotient determination, and carry-save addition to reduce the critical path delay. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. To make the system compatible with Very Large Scale Integration (VLSI) and to obtain optimized performance, the system architecture has been developed around a Radix-2 Montgomery multiplier. We have implemented two different CRSA architectures: one is based on a Serial Montgomery multiplier and the other on a Parallel Montgomery multiplier. The delay, frequency, efficiency, power consumption, and throughput of both architectures have been calculated, and we have found that the proposed Parallel Montgomery (PM) based CRSA core performs better than the Serial Montgomery (SM) based CRSA core.

The remainder of the paper is organized as follows. Section 2 briefly discusses the literature surveyed for this research work, with emphasis on the RSA algorithm and the implementation of Montgomery multiplication with a Radix-2 architecture. Section 3 discusses the proposed Commutative RSA algorithm and presents the mathematical derivation of the CRSA approach.
Section 4 presents the proposed Commutative RSA core based on serial Montgomery and parallel Montgomery multipliers. The hardware implementation is presented in Section 5, followed by Section 6, which presents the results and analysis of the research work. The conclusion is given in the last section.

# II. Related Works

Gustavo D. Sutter et al. [10] optimized Montgomery's multiplication and proposed architectures to perform the least-significant-bit-first and the most-significant-bit-first algorithms. The developed architecture has the following distinctive characteristics: 1) use of a digit-serial approach for Montgomery multiplication; 2) conversion of the CSA representation of the intermediate multiplication using carry-skip addition, which allows the critical path to be reduced, albeit with a small area-speed penalty; and 3) precomputation of the quotient value in Montgomery's iteration. The researchers presented results for Xilinx Virtex 5 and for 0.18-µm application-specific integrated circuit technologies. Jin-Hua Hong and Cheng-Wen Wu [11] proposed a radix-4 modular multiplication algorithm based on Montgomery's algorithm, and a fast radix-4 modular exponentiation algorithm for the RSA public-key cryptosystem. The proposed multiplier is four times faster than a direct radix-2 implementation of Montgomery's algorithm, and extending the design to a larger modulus is straightforward. High-radix bit-level and digit-level modular multipliers have also been discussed. C. McIvor et al. [12] presented modified Montgomery multiplication and associated RSA modular exponentiation algorithms and circuit architectures. The practical approach presented is based on a reformulation of the solution to modular multiplication within the context of RSA exponentiation. Alexandre F. Tenca and Çetin K. Koç [13] presented a scalable architecture for the computation of modular multiplication based on the Montgomery multiplication algorithm.
A word-based version of Montgomery multiplication is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Marcelo E. Kaihara and Naofumi Takagi [14] proposed a mixed radix-4/2 algorithm for modular multiplication/division with a large modulus, suitable for VLSI implementation. Modular multiplication is based on the Montgomery multiplication algorithm and modular division on the extended binary GCD algorithm. The researchers exploit the similarities between the two algorithms and modify them so that almost all hardware components are shared by both operations. Koç et al. [15] studied the operations involved in computing the Montgomery product, described several high-speed, space-efficient algorithms for computing MonPro(a, b), and analyzed their time and space requirements. Their focus is to collect several alternatives for Montgomery multiplication, three of which are new. However, the researchers do not compare the Montgomery techniques with other modular multiplication approaches. Ching-Chao Yang et al. [16] proposed a new algorithm based on Montgomery's algorithm to calculate modular multiplication, the core arithmetic operation in an RSA cryptosystem. The modified algorithm eliminates over-large residues and has a very short critical path delay, which yields very high-speed processing. The researchers implemented a 512-bit single-chip RSA processor based on the modified algorithm with the Compass 0.6-µm SPDM CMOS cell library. Guilherme Perin et al. [18] compared two Montgomery modular multiplication architectures, one systolic and one multiplexed. Both implementations target FPGA devices. Modular multiplication is employed in modular exponentiation, which is the most important operation of some public-key cryptographic algorithms, including the most popular of them, RSA.
The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of processing elements. Fournaris and Koufopavlou [19] observe that RSA has gained wide acceptance and is well used in many security applications, and that its main mathematical function, modular exponentiation, is demanding in terms of speed. They proposed a systolic, scalable, redundant carry-save modular multiplier and an RSA encryption architecture using the Montgomery modular multiplication algorithm. Perovic et al. [23] presented an FPGA implementation of the RSA algorithm with a 1024-bit key, and examined synthesis results such as resource occupancy and maximal operating frequency. Along with the strong momentum of the shift from single-core to multicore systems, Zhimin Chen et al. [17] presented a parallel software implementation of Montgomery multiplication for multicore systems. Their analysis shows that the proposed scheme, pSHS, partitions the task in a balanced way so that each core has the same amount of work. They also analyze the impact of inter-core communication overhead on the performance of pSHS. The analysis reveals that pSHS offers high performance, scales over different numbers of cores, and remains stable when the communication latency changes.

# III. Proposed System

This paper proposes a robust and optimized system architecture for implementing the Commutative RSA algorithm for data authentication among multiple MIMO terminals (here simulated on FPGA devices). To facilitate secure data communication among multiple MIMO or transceiver systems, a novel commutative RSA approach, in which the order in which encryption is performed does not affect the result of the encryption, has been implemented and simulated on multiple FPGA devices. To optimize the performance of the system with minimum area and higher speed, the Montgomery modular multiplication mechanism has been adopted with a Radix-2 multiplication architecture. We have implemented both Serial Montgomery and Parallel Montgomery based CRSA cryptography cores, with the goal of lower memory occupancy, higher speed, higher throughput, and lower power consumption.

# a) Commutative RSA

A secure plane is realizable provided the data communicated over the plane is protected against collusion. The use of cryptographic techniques is generally preferred; hence the scheme proposed in this paper (SMFCP) adopts the commutative RSA algorithm. SMFCP considers two prime numbers $\mathrm{Prime}_1^{CRSA}$ and $\mathrm{Prime}_2^{CRSA}$ initialized amongst all the group members. Let $A$ and $B$ represent the group members required to communicate over the secure plane. To compute the encryption and decryption key pairs of the commutative RSA algorithm, the properties $\mathrm{Prop\_N}^{CRSA}$ and $\mathrm{Prop\_\Phi}^{CRSA}$ are computed using the following equations:

$$\mathrm{Prop\_N}^{CRSA} = \mathrm{Prime}_1^{CRSA} \times \mathrm{Prime}_2^{CRSA} \tag{1}$$

$$\mathrm{Prop\_\Phi}^{CRSA} = \left(\mathrm{Prime}_1^{CRSA} - 1\right) \times \left(\mathrm{Prime}_2^{CRSA} - 1\right) \tag{2}$$

Since both members use the same primes, it is clear from the above equations that, for $A$ and $B$,

$$\mathrm{Prop\_N}_A^{CRSA} = \mathrm{Prop\_N}_B^{CRSA} \tag{3}$$

$$\mathrm{Prop\_\Phi}_A^{CRSA} = \mathrm{Prop\_\Phi}_B^{CRSA} \tag{4}$$

The encryption key pairs of $A$ and $B$, represented as $(\mathrm{Prop\_E}_A^{CRSA}, \mathrm{Prop\_N}_A^{CRSA})$ and $(\mathrm{Prop\_E}_B^{CRSA}, \mathrm{Prop\_N}_B^{CRSA})$, are to be obtained. $\mathrm{Prop\_E}^{CRSA}$ is obtained by randomly selecting a number that is coprime to $\mathrm{Prop\_\Phi}^{CRSA}$, or in other terms:

$$\gcd\left(\mathrm{Prop\_E}^{CRSA}, \mathrm{Prop\_\Phi}^{CRSA}\right) = 1 \tag{5}$$

where $\gcd(x, y)$ represents the greatest common divisor of two variables $x$ and $y$. The decryption key pairs of $A$ and $B$ are represented by $(\mathrm{Prop\_D}_A^{CRSA}, \mathrm{Prop\_N}_A^{CRSA})$ and $(\mathrm{Prop\_D}_B^{CRSA}, \mathrm{Prop\_N}_B^{CRSA})$, and the property $\mathrm{Prop\_D}^{CRSA}$ is computed as

$$\mathrm{Prop\_D}^{CRSA} = \left(\mathrm{Prop\_E}^{CRSA}\right)^{-1} \bmod \mathrm{Prop\_\Phi}^{CRSA} \tag{6}$$

Let $\mathrm{Enc}_X$ represent the encrypted data $X$. The encryption operation is defined as follows:

$$\mathrm{Enc}_X = X^{\mathrm{Prop\_E}^{CRSA}} \bmod \mathrm{Prop\_N}^{CRSA} \tag{7}$$

The commutative RSA decryption operation on the encrypted data $\mathrm{Enc}_X$ is defined as

$$\mathrm{Dec}_X = \left(\mathrm{Enc}_X\right)^{\mathrm{Prop\_D}^{CRSA}} \bmod \mathrm{Prop\_N}^{CRSA} \tag{8}$$

# b) Commutative property of RSA Algorithm

The commutative property of the RSA algorithm adopted in SMFCP is proved if data $X$ encrypted by $A$ and then encrypted by $B$ gives the same result as encryption by $B$ followed by encryption by $A$, i.e.,

$$\mathrm{Enc}_B\left(\mathrm{Enc}_X^A\right) \equiv \mathrm{Enc}_A\left(\mathrm{Enc}_X^B\right) \tag{9}$$

$$\mathrm{Enc}_B\left(X^{\mathrm{Prop\_E}_A^{CRSA}} \bmod \mathrm{Prop\_N}_A^{CRSA}\right) \equiv \mathrm{Enc}_A\left(X^{\mathrm{Prop\_E}_B^{CRSA}} \bmod \mathrm{Prop\_N}_B^{CRSA}\right) \tag{10}$$

$$X^{\left(\mathrm{Prop\_E}_A^{CRSA} \times \mathrm{Prop\_E}_B^{CRSA}\right)} \bmod \mathrm{Prop\_N}_A^{CRSA} = X^{\left(\mathrm{Prop\_E}_B^{CRSA} \times \mathrm{Prop\_E}_A^{CRSA}\right)} \bmod \mathrm{Prop\_N}_B^{CRSA} \tag{11}$$

As $\mathrm{Prop\_N}_A^{CRSA} = \mathrm{Prop\_N}_B^{CRSA}$, it can be concluded that

$$X^{\left(\mathrm{Prop\_E}_A^{CRSA} \times \mathrm{Prop\_E}_B^{CRSA}\right)} \bmod \mathrm{Prop\_N}_A^{CRSA} = X^{\left(\mathrm{Prop\_E}_B^{CRSA} \times \mathrm{Prop\_E}_A^{CRSA}\right)} \bmod \mathrm{Prop\_N}_A^{CRSA} \tag{12}$$

and hence $\mathrm{Enc}_B\left(\mathrm{Enc}_X^A\right) \equiv \mathrm{Enc}_A\left(\mathrm{Enc}_X^B\right)$.

# IV. Proposed Commutative RSA Core Based on Serial Montgomery and Parallel Montgomery

The dominant goal of this research work is to implement and illustrate the efficiency and robustness of the commutative RSA cryptography approach for multiple MIMO or transceiver systems. For this purpose, we have implemented a Commutative RSA cryptography core among multiple FPGA devices. To optimize performance as well as memory occupancy, a system architecture based on Radix-2 Montgomery modular multiplication has been developed. This implementation reduces memory occupancy and increases speed many fold. The implemented approaches are discussed in the following sections.

# a) Montgomery Algorithm

Montgomery multiplication [20] is an efficient method for modular multiplication with an arbitrary modulus, particularly suitable for implementation on general-purpose computers and embedded microprocessors. The method is based on a representation of the residue class modulo $M$. The algorithm uses simple divisions by a power of two instead of the divisions by $M$ that are used in a conventional modular operation. Montgomery multiplication (MM) is the basic operation used in modular exponentiation, which is required in the Diffie-Hellman and RSA public-key cryptosystems. Montgomery's modular multiplication algorithm employs only simple additions, subtractions, and shift operations to avoid trial division, a critical and time-consuming operation in conventional modular multiplication. The price paid is the need to convert operands into and out of Montgomery's domain, which is almost negligible in some particular applications such as cryptosystems. Writing $A' \equiv A \cdot 2^{n} \pmod{M}$ for the Montgomery representation of $A$, the Montgomery product can be written mathematically as:

$$MP(A', B', M) = A' \cdot B' \cdot 2^{-n} = (A \cdot 2^{n}) \cdot (B \cdot 2^{n}) \cdot 2^{-n} = A \cdot B \cdot 2^{n} = (A \cdot B)' \pmod{M}. \tag{14}$$

The conversion between the two domains can be done using the same Montgomery operation, in particular $A' = MP(A, 2^{2n} \bmod M, M)$ and $A = MP(A', 1, M)$, where $2^{2n} \bmod M$ can be precomputed. Despite the initial conversion cost, we achieve an advantage over ordinary multiplication if we do many Montgomery multiplications followed by an inverse conversion at the end, which is the case, for example, in our proposed RSA.

# b) Radix-2 Modular Multiplier

The optimized algorithm for the Radix-2 modular multiplier for Montgomery multiplication is given as follows:

Input: odd $M$, $n = \lfloor \log_2 M \rfloor + 1$, (15)
$A = \sum_{i=0}^{n-1} a_i \cdot 2^{i}$, with $0 \le A, B < M$

Output: $X = MP(A, B, M) \equiv A \cdot B \cdot 2^{-n} \pmod{M}$, $0 \le X < M$ (16)

1.1 $X[0] = 0$; (18)
1.2 for $i = 0$ to $n - 1$ do
1.3 $q_i = (a_i \cdot b_0) \oplus X[i]_0$; $X[i+1] = \dfrac{X[i] + a_i \cdot B + q_i \cdot M}{2}$; (19)
1.4 if $X[n] > M$ then
1.5 $X[n] = X[n] - M$;
1.6 return $X = X[n]$ (20)

The above algorithm is the pseudocode for Radix-2 Montgomery multiplication, where we choose $n = \lfloor \log_2 M \rfloor + 1$; $n$ is the size of $M$ in bits. Here $b_0$ and $X[i]_0$ denote the least significant bits of $B$ and $X[i]$. The verification of the algorithm may be presented as follows. Consider $X[i]$ given as

$$X[i] \equiv \frac{1}{2^{i}} \left( \sum_{j=0}^{i-1} a_j \cdot 2^{j} \right) \cdot B \pmod{M}, \tag{21}$$

with $X[0] = 0$. Then

$$X[n] \equiv A \cdot B \cdot 2^{-n} \pmod{M} = MP(A, B, M). \tag{22}$$

$X[n]$ can be computed iteratively using the following dependence:

$$X[i+1] \equiv \frac{1}{2^{i+1}} \left( \sum_{j=0}^{i} a_j \cdot 2^{j} \right) \cdot B \tag{23}$$

$$\equiv \frac{1}{2^{i+1}} \left( \sum_{j=0}^{i-1} a_j \cdot 2^{j} + a_i \cdot 2^{i} \right) \cdot B \tag{24}$$

$$\equiv \frac{1}{2} \left( \frac{1}{2^{i}} \left( \sum_{j=0}^{i-1} a_j \cdot 2^{j} \right) \cdot B + a_i \cdot B \right) \tag{25}$$

$$\equiv \frac{1}{2} \left( X[i] + a_i \cdot B \right) \pmod{M}.$$

Therefore, depending on the parity of $X[i] + a_i \cdot B$, we compute $X[i+1]$ as

$$X[i+1] = \frac{X[i] + a_i \cdot B}{2} \quad \text{or} \quad X[i+1] = \frac{X[i] + a_i \cdot B + M}{2}$$

so as to make the numerator divisible by 2. Since $B < M$ and $X[0] = 0$, one has $0 \le X[i] < 2M$ for all $0 \le i < n$. In References [21] and [22], the result of a Montgomery multiplication is shown to satisfy $A \cdot B \cdot 2^{-n} \pmod{M} < 2M$ when $A, B < 2M$ and $2^{n} > 4M$. As a result, by redefining $n$ to be the smallest integer such that $2^{n} > 4M$, the subtraction at the end of the algorithm can be avoided and the output of the multiplication can be used directly as an input to the next Montgomery multiplication.

# c) Modular Multiplication Algorithms

In RSA, the public encryption key is a pair of positive integers (E, N) and the private decryption key is another pair of positive integers (D, N).
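The key pairs (E, N) and (D, N), together with the commutative property derived in Section III, can be checked numerically on a toy example. The Python sketch below is illustrative only: the primes 61 and 53, the exponents 17 and 7, the message value 65, and the helper names `make_key_pair`, `encrypt`, and `decrypt` are all assumptions chosen here for readability and are far too small for real use. Both members A and B share one modulus N, as the scheme requires, and each picks its own exponent coprime to Phi. The modular inverse uses the three-argument `pow` with exponent -1, which needs Python 3.8 or later.

```python
from math import gcd

# Toy primes shared by all group members (assumed values, for illustration only).
PRIME_1, PRIME_2 = 61, 53
N = PRIME_1 * PRIME_2                  # shared modulus, Eq. (1)
PHI = (PRIME_1 - 1) * (PRIME_2 - 1)    # Eq. (2)


def make_key_pair(e: int):
    """Return (E, D) for the shared modulus; E must be coprime to PHI, Eq. (5)."""
    if gcd(e, PHI) != 1:
        raise ValueError("encryption exponent must be coprime to PHI")
    d = pow(e, -1, PHI)                # modular inverse, Eq. (6)
    return e, d


def encrypt(x: int, e: int) -> int:
    return pow(x, e, N)                # Eq. (7)


def decrypt(c: int, d: int) -> int:
    return pow(c, d, N)                # Eq. (8)


E_A, D_A = make_key_pair(17)           # member A
E_B, D_B = make_key_pair(7)            # member B

X = 65                                 # message, must satisfy 0 <= X < N
cipher_ab = encrypt(encrypt(X, E_A), E_B)   # A encrypts first, then B
cipher_ba = encrypt(encrypt(X, E_B), E_A)   # B encrypts first, then A

if __name__ == "__main__":
    print("order-independent ciphertext:", cipher_ab == cipher_ba)
    print("recovered:", decrypt(decrypt(cipher_ab, D_A), D_B) == X)
```

Because both layers use the same modulus, the two exponents simply multiply in the exponent of X, so either member can also remove its own layer first during decryption.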
To encrypt a message using the key (E, N), the following structural approach has been implemented. Fig. 1 represents the Serial Montgomery multiplication, whereas the Parallel Montgomery multiplication is presented in Fig. 2; the latter encompasses two Montgomery multipliers connected in parallel. In our research work, we have implemented a Radix-2 modular multiplier based multiplication architecture. A brief description of the employed algorithm is as follows:

[Figure: Montgomery multiplication architecture. Block labels: CONTROLLER, MUX22, MPRODUCT1, SQUARE1, SAMMM1, SAMMM2, MODULUS M, SQUARE1.1, MPRODUCT1.1]

# V. Hardware Design

Fig. 2, presented earlier, shows the architecture of a 32-bit RSA processor based on the proposed Commutative RSA algorithm. We use four 32-bit linear shift registers to store the operands needed to compute a 32-bit RSA operation. The operation of the RSA processor is as follows. In the initial stage, the commutative RSA operands are loaded serially into the shift registers through an input buffer. While loading message M into the text register, we shift the exponent register until its first nonzero bit is the most significant bit, and count the number of bits of the exponent, $\log_2 E$. After the initial stage, we start the multiplier. Once the first output bit of the multiplier is ready, we start the Montgomery module immediately. The execution times of the carry-propagation adder (CPA), the multiplier, and the Montgomery module therefore almost completely overlap, and the functional units of our design are fully utilized during computation.

Carry-Propagation Adder and Serial-Parallel Multiplier: The carry-propagation adder converts the carry-save form of the output from the Montgomery module to non-redundant binary form. It generates one output bit per cycle for the serial-parallel multiplier for the next iteration. The serial-parallel multiplier is used to realize the multiplication and squaring of two (n + 1)-bit numbers.
It first generates the n + 2 lower bits of a product serially to the Montgomery module, and then it stops and holds the n higher bits of the product. The n higher bits of the product are added to the output of the Montgomery module to obtain the modular multiplication result. The multiplier itself is a linear array type with a special input circuit. When the multiplier is generating a product of two numbers, the parallel input M0 is ready in the text register and the other operand can arrive serially. However, if we want to square a number, a serial input of the operand will make the multiplier fail. We solved this problem by scheduling the serial input operands and inserting zeros to avert the failure of the squaring operation.

Montgomery Module: The Montgomery module is shown in Fig. 2, and the overall operation of Montgomery modular multiplication has already been presented in previous sections. The variable X[0] refers to the n + 2 lower bits of the product from the multiplier. X[0] enters the Montgomery module serially, one bit per cycle, from the lowest bit to the highest bit. The reduction step is a shift-and-add operation that is very similar to the basic step of a multiplication. The quotient determination is a parity decision on the sum of the intermediate result and the carry. This can be done simply by an exclusive-OR gate whose inputs are X[i] and the LSB of the intermediate result of the previous iteration. After n + 2 iterations, the Montgomery module adds X[n + 2] and the n higher bits of the product from the multiplier. The result is then sent to the carry-propagation adder for the next modular multiplier iteration.

In this work, we have developed two CRSA cryptography cores. The first is the Serial Montgomery multiplier based design, while the second is the optimized Parallel Montgomery based CRSA cryptography core.
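The shift-and-add reduction with a parity-based quotient decision can be mirrored in a few lines of software. The following Python sketch is a behavioural model of the Radix-2 Montgomery multiplication recurrence from Section IV, not a description of the VHDL design; the function names `mont_mul_radix2`, `to_mont`, and `from_mont` are assumptions introduced here. One assumption differs slightly from the printed pseudocode: the final correction uses `>=` rather than `>`, so that the result is always strictly below M.

```python
def mont_mul_radix2(a: int, b: int, m: int) -> int:
    """Bit-serial Radix-2 Montgomery product: a * b * 2^(-n) mod m.

    n is the bit length of the odd modulus m, and 0 <= a, b < m.
    """
    if m <= 0 or m % 2 == 0:
        raise ValueError("modulus must be odd and positive")
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must satisfy 0 <= a, b < m")

    n = m.bit_length()                     # n = floor(log2 m) + 1
    x = 0                                  # X[0] = 0
    for i in range(n):
        a_i = (a >> i) & 1                 # i-th bit of A
        q_i = (a_i & b & 1) ^ (x & 1)      # parity decision: one AND, one XOR
        x = (x + a_i * b + q_i * m) >> 1   # numerator is even, so the shift is exact
    if x >= m:                             # x < 2m here, so one subtraction suffices
        x -= m
    return x


def to_mont(a: int, m: int) -> int:
    """Convert a into the Montgomery domain: a * 2^n mod m."""
    n = m.bit_length()
    return mont_mul_radix2(a, pow(2, 2 * n, m), m)


def from_mont(a_bar: int, m: int) -> int:
    """Convert back to the ordinary domain by multiplying with 1."""
    return mont_mul_radix2(a_bar, 1, m)


if __name__ == "__main__":
    m, a, b = 3233, 1234, 567
    product = from_mont(mont_mul_radix2(to_mont(a, m), to_mont(b, m), m), m)
    print("matches a*b mod m:", product == (a * b) % m)
```

In hardware the loop body corresponds to one clock cycle, and the addition is kept in carry-save form; the Python integers hide that detail but follow the same arithmetic.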
In the parallel Montgomery approach, two Montgomery multipliers are used in parallel. The results obtained after implementation are summarized in the following sections.

# VI. Results

The commutative RSA core, whose details were presented in earlier sections, has been implemented on multiple FPGA devices for simulation and illustration of data authenticity among multiple user terminals in a communication environment. The proposed commutative RSA cryptography core has been simulated with three individual FPGA devices, where the FPGA cores represent the MIMO or multiple transceiver terminals in a multiuser communication environment. The design has been coded in VHDL and simulated using Xilinx Design Suite 14.3, targeting a Virtex-5 xc5vlx330t-2-ff1738 FPGA. Two systems have been developed, as mentioned earlier: the Serial Montgomery based cryptography core and our proposed Parallel Montgomery based cryptography core. The results obtained for both architectures have been compared. Considering performance parameters such as memory occupancy, speed, power consumption, delay, and throughput, it has been found that the Parallel Montgomery (PM) based Commutative RSA implementation performs better than the Serial Montgomery (SM) based one. The delay in the Parallel Montgomery based CRSA is 13.78% lower than in the Serial Montgomery based CRSA cryptography core. Similarly, the throughput of the Parallel Montgomery based CRSA is 12.11% higher than that of the Serial Montgomery based CRSA architecture. The trade-off in power consumption is very small: it is only 0.03% higher in the Parallel Montgomery based CRSA.

The simulation results for encryption and decryption obtained with the Serial Montgomery based CRSA core are presented in Fig. 3 and Fig. 4 respectively. The functional verification of the Parallel Montgomery based CRSA cryptographic core is shown in Fig. 5 and Fig. 6. The graphical comparison of the performance of the Serial and Parallel Montgomery based Commutative RSA architectures is presented in Fig. 7 to Fig. 10.

# VII. Conclusion

A novel public-key cryptography technique for security and authentication, called Commutative RSA, has been implemented for multiple MIMO or transceiver terminals to achieve data security in a multiuser communication environment. The commutative RSA approach has been implemented with multiple FPGA cores, each of which functions as an individual transceiver terminal and performs its encryption and decryption individually without affecting the original data. Two approaches based on Montgomery multiplication with a Radix-2 multiplier have been designed, and individual modules for Serial Montgomery (SM) and Parallel Montgomery (PM) have been simulated. The results obtained have been compared, and it has been found that the proposed Parallel Montgomery architecture performs better than Serial Montgomery. The proposed PM based CRSA cryptography core exhibits 12.1% higher throughput than the Serial Montgomery based CRSA. Similarly, the frequency, or speed, of the proposed system is also higher. The proposed system exhibits a trade-off of 0.03% in power consumption. Considering these aspects, it can be stated that the proposed Parallel Montgomery based Commutative RSA performs better than the Serial Montgomery based implementation.

![Figure 1: Serial Montgomery Multiplication Architecture](image-3.png)

![Figure 3: Simulation Waveforms Using Serial Montgomery based CRSA: Encryption](image-5.png)

![Figure 4: Simulation Waveforms Using Serial Montgomery based CRSA: Decryption](image-6.png)

Table 1: Serial and Parallel Montgomery based CRSA cryptography core (device utilization)

| Cryptography core | CRSA | CRSA |
|---|---|---|
| Circuit | Serial Montgomery | Parallel Montgomery |
| Device | xc5vlx330t-2-ff1738 | xc5vlx330t-2-ff1738 |
| Slice LUT | 913 | 844 |
| LUT used as logic | 913 | 813 |
| Occupied slices | 290 | 311 |

Table 2: Power consumption of the Serial and proposed Parallel Montgomery based Commutative RSA cryptography core

| Cryptography core | CRSA | CRSA |
|---|---|---|
| Circuit | Serial Montgomery | Parallel Montgomery |
| Device | xc5vlx330t-2-ff1738 | xc5vlx330t-2-ff1738 |
| Static power (mW) | 3516.7 | 3516.75 |
| Dynamic power (mW) | 4.76 | 5.72 |
| Total power (mW) | 3521.46 | 3522.47 |

Table 3: Frequency, Delay and Throughput Analysis for Serial and Parallel Montgomery Based CRSA Cryptography Core

| Cryptography core | CRSA | CRSA |
|---|---|---|
| Circuit | Serial Montgomery | Parallel Montgomery |
| Device | xc5vlx330t-2-ff1738 | xc5vlx330t-2-ff1738 |
| Frequency (MHz) | 199.57 | 227.08 |
| Delay (ns) | 5.01 | 4.40 |
| Throughput (kbps) | 779.58 | 887.02 |

© 2013 Global Journals Inc. (US)

# References

* [1] B. Premkumar, "An RNS to binary converter in 2n+1, 2n, 2n-1 moduli set," IEEE Trans. Circuits Syst. II, vol. 39, July 1992.
* [2] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley & Sons, 1996.
* [3] R. L. Rivest, A. Shamir, L. Adleman, "A method for obtaining digital signature and public-key cryptosystems," Commun. ACM, vol. 21, no. 2, Feb. 1978.
* [4] P. L. Montgomery, "Multiplication without Trial Division," Math. Computation, vol. 44, 1985.
* [5] S. E. Eldridge, C. D. Walter, "Hardware Implementation of Montgomery's Multiplication Algorithm," IEEE Trans. Comput., vol. 42, 1993.
* [6] A. J. Elbirt, C. Paar, "Towards an FPGA Architecture Optimized for Public-Key Algorithms," SPIE Symposium on Voice, Video and Communications, Sept. 1999.
* [7] T. Blum, C. Paar, "Montgomery Exponentiation on Re-configurable Hardware," Proc. 14th Symposium on Computer Arithmetic, 1999.
* [8] Y. S. Kim, W. S. Kang, J. R. Choi, "Implementation of 1024-bit processor for RSA cryptosystem."
* [9] V. Bunimov, M. Schimmler, B. Tolg, "A Complexity-Effective Version of Montgomery's Algorithm," Workshop on Complexity Effective Designs (WECD02), May 2002.
* [10] Gustavo D. Sutter, Jean-Pierre, José Luis, "Modular Multiplication and Exponentiation Architectures for Fast RSA Cryptosystem Based on Digit Serial Computation," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, July 2011.
* [11] Jin-Hua Hong, Cheng-Wen Wu, "Cellular-Array Modular Multiplier for Fast RSA Public-Key Cryptosystem Based on Modified Booth's Algorithm," IEEE Transactions on Very Large Scale Integration Systems, vol. 11, no. 3, June 2003.
* [12] M. McIvor, J. V. McLoone, McCanny, "Modified Montgomery modular multiplication and RSA exponentiation techniques," IEE Proceedings online no. 20040791, doi: 10.1049/ip-cdt:20040791.
* [13] Alexandre F. Tenca, Çetin K. Koç, "A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm," IEEE Transactions on Computers, vol. 52, no. 9, September 2003.
* [14] Marcelo E. Kaihara, Naofumi Takagi, "A Hardware Algorithm for Modular Multiplication/Division," IEEE Transactions on Computers, vol. 54, no. 1, January 2005.
* [15] C. K. Koç, Tolga Acar, B. S. Kaliski, "Analyzing and comparing Montgomery multiplication algorithms," IEEE Micro, vol. 16, no. 3, Jun 1996.
* [16] Ching-Chao Yang, Tian-Sheuan Chang, Chein-Wei Jen, "A New RSA Cryptosystem Hardware Design Based on Montgomery's Algorithm," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 45, no. 7, July 1998.
* [17] Zhimin Chen, Patrick Schaumont, "A Parallel Implementation of Montgomery Multiplication on Multicore Systems: Algorithm, Analysis, and Prototype," IEEE Transactions on Computers, vol. 60, no. 12, December 2011.
* [18] Guilherme Perin, Daniel Gomes Mesquita, João Baptista Martins, "Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation," International Journal of Reconfigurable Computing, 2011, doi: 10.1155/2011/127147.
* [19] P. Fournaris, O. Koufopavlou, "A new RSA encryption architecture and hardware implementation based on optimized Montgomery multiplication," Proc. IEEE ISCAS, May 23-26, 2005.
* [20] P. L. Montgomery, "Modular Multiplication without Trial Division," Math. of Computation, vol. 44, no. 170, Apr. 1985.
* [21] L. Batina, G. Muurling, "Montgomery in Practice: How to Do It More Efficiently in Hardware," Proc. Cryptographer's Track at the RSA Conf., Topics in Cryptology (CT-RSA '02), Feb. 2002.
* [22] D. Walter, "Precise Bounds for Montgomery Modular Multiplication and Some Potentially Insecure RSA Moduli," Proc. Cryptographer's Track at the RSA Conf., Topics in Cryptology (CT-RSA '02), Feb. 2002.
* [23] N. S. Perovic, M. Popovic-Bozovic, "FPGA implementation of RSA cryptoalgorithm using shift and carry algorithm," 20th Telecommunications Forum (TELFOR), 20-22 Nov. 2012.