# Introduction

ryptography is the branch of computer science that deals with security. It supports operations such as encryption and decryption. The cryptography is implemented in the form of hash functions, symmetric key algorithms, and public key algorithms. The symmetric and public key algorithms are used for encryption and decryption while hash functions are one way functions as they don't allow the retrieval of processed data. As MD5 and SHA are the two mostly used algorithms in the industry, this paper focuses on secure hash algorithms. MD5 can avoid collision attacks [1] with computational feasibility while SHA -1 attacks also computationally expensive [2]. As SHA-1 is not fully secure, the SHA -2 was introduced [3]. At hardware level in order to improve the performance, GPPs (General Purpose Processors) are used. SHA improvement has been done [10], [11]. This paper introduces the implementation of SHA algorithm at hardware level using techniques described here. They are pipeline techniques [5], [18]; embedded memories used to store constant values [8]; improved addition and balanced delays [4], [12]; unrolling techniques [5], [9], [10], [12]; balanced carry save address and parallel counters [4], [5], [7].

This paper proposes two architectures that can be used with hardware. This is meant for achieving high throughput. The results of implementation of SHA-1and SHA-2 algorithms at hardware level provide more speed when compared with software implementations.


# II.


# Hash Functions

Since from its inception in 1993, the hash algorithms are improved further to have SHA, SHA-1 and SHA-2. The original SHA was revised in 1995 [15] and named as SHA-1 while SHA-2 WAS INTRODUCED IN 2001 which makes use of DM thus making it more robust to security attacks. SHA functions are available with 128, 256 and 512 bits. From the given input message SHA-1 can produce 160 bit message digest as output. Final DM of 256 bits is the output of SHA 256. The computation of SHA 512 is identical to that of SHA 256. The difference is in the size of operands that means it uses 64 bits instead of 32 bits. Moreover the DM of this algorithm has 512 bits the logical function used are also different [15]. Fig. 1 and Fig. 2 show the round calculations of SHA-1 and SHA-2. SHA-1 needs 80 rounds while SHA 256 uses 64 rounds. Each round of SHA-1 requires value from previous round. As rounds are data dependent, it is essential that rounds are carried out sequentially. By unrolling each round computations [10] attempts to speed it up. Another approach increases throughput as it makes use of pipelined structure [11]. As described in [13], high throughput is achieved in SHA-1. Operations rescheduling with respect to this paper has operations rescheduling, hash value initialization, and improved Hash Value Function. The whole computation of SHA-1 is in the A as the rest do not need any computation. The required values are provided by previous round values of various A to D. By adding zero to initialization vector, the internal hash value of first data block can be initialized. Later on this value is loaded into internal registers through multiplexer. In this case instead of the value to the register, it is set to zero as described in [6]. Improving hash value addition is done after all the rounds have been computed for a given data block.

Here the internal variables are to be added to the current DM and it needs four additional adders. Finally SHA-1 data block expansion is described here. 512 bits of each data block is expanded in hardware for efficiency reasons as described in [1]. It can be implemented using XOR operations and also registers. As done in SHA-1, the functional rescheduling can also be applied to SHA-2 as well. However, its computational complexity of it is more. In each round values are calculated as and when required. As described in [19], the part that has to be computed is identified. With respect to operational rescheduling values for B, C, D, F, G and H are obtained directly. However, A and E values can't be computed until they are computed in the previous round. With respect to hash value addition and initialization, similar to SHA-1, the internal variables of SHA-2 also have to be added to the DM. It needs eight adders. For each SHA 256 and SHA 512 of 32 bits and 64 bits respectively an adder is required. The empirical results reveal that DM addition with a shift is more efficient. With respect to SHA-2 data block expansion done by data block expansion unit, it is similar to SHA128 in terms of computations. The XOR operation is replaced by arithmetic addition.


# III.


# Implementation

The SHA designs described above has been implemented as processor cores on a Xilinx VIRTEX II Pro FPGA. The FPGA embedded RAMs (BRAMs) are used in order to implement ROM used to store SHA256 and SHA512. Register -based structures can also be used alternatively. One is based on circular fashion in which memory blocks addressed while the other one is based on FIFOs (First -inputs -first -outputs).


# IV. Performance Analysis and Related Work

The resulting cores have been implemented in different Xilinx devices in order to compare the architectural gains of the proposed SHA structures. SHA -1 core, SHA 256 core and SHA 512 core performance comparisons are provided in tables I, II and III respectively.


# Design

Lien [11] Lien [11] Our  Integration with Processor SHA algorithms that have been implemented are integrated with a processor known as MOLEN polymorphic processor and its operation [14], [16] is based on the coprocessor architectureal paradigm which allows SHA cores to be embedded in a reconfigurable coprocessor with the GPP. This implementation is similar to the one given in [17]. When compared with software implementations, it is capable of achieving throughputs such as 5 Mbit/s and 4 Mbit/s for SHA 256 and SHA 128 respectively and overall speed is increased by 150 times.


# VI.


# Conclusion

We have implemented SHA-1 and SHA-2 algorithms at hardware level. This achieves the reutilization and rescheduling of hardware in terms of area and speed. Critical path can be reduced with the help of operation rescheduling. It leads to the very good usage of pipeline structure. The SHA-2 which makes use of DM causes the reduction of reconfigurable resources. This also hides the extra clock cycle delay.

The results of implementation reveals that the hard ware implementation of the hash algorithms are many times better than the software implementations of the same. 
112![Fig. 1 : Shows round calculation of SHA-1](image-2.png "Fig. 1 : 1 CFig. 2 :")
![Final output is original data for first 16 rounds and computed values for the rest of rounds. b) Design for SHA 2.](image-3.png "")
1-Exp.CAST[20]Helion[21]Our-Cst.Our+IV
3V.
			F © 2012 Global Journals Inc. (US) 2012 July
			© 2012 Global Journals Inc. (US)
		
		
* 
	
		Finding MD5 collisions-A toy for a notebook
		
			Klima
		
	
		Cryptology ePrint Archive
		
			2005/075, 2005
		
	
* 
	
		Finding collisions in the full SHA-1
		
			XWang
		
		
			YLYin
		
		
			HYu
		
	
		Lecture Notes in Computer Science
		
			3621
			
			2005
			Springer
		
	
* 
	
		FIPS 180-2, secure hash standard (SHS)
		
			2002
		
		
			National Institute of Standards and Technology (NIST), MD
		
	
* 
	
		The design of a high speed ASIC unit for the hash function SHA-256 (384, 512)
		
			LDadda
		
		
			MMacchetti
		
		
			JOwen
		
	
		Proc. DATE
				DATE
		
			2004
			
		
* 
	
		Quasi-pipelined hash circuits
		
			MMacchetti
		
		
			LDadda
		
	
		Proc. IEEE Symp. Comput. Arithmetic
				IEEE Symp. Comput. Arithmetic
		
			2005
			
		
* 
	
		An ASIC design for a high speed implementation of the hash function SHA-256 (384, 512)
		
			LDadda
		
		
			MMacchetti
		
		
			JOwen
		
		
			DGarrett
		
		
			JLach
		
		
			CAZukowski
		
		
			Eds
		
	
		Proc. ACM Great Lakes Symp. VLSI
				ACM Great Lakes Symp. VLSI
		
			2004
			
		
* 
	
		Comparative analysis of the hardware implementations of hash functions SHA-1 and SHA-512
		
			TGrembowski
		
		
			RLien
		
		
			KGaj
		
		
			NNguyen
		
		
			PBellows
		
		
			JFlidr
		
		
			TLehman
		
		
			BSchott
		
	
		Lecture Notes in Computer Science
		ISC,A.H. Chan and V. D. Gligor
		
			2433
			
			2002
			Springer
		
	
* 
	
		Efficient singlechip implementation of SHA-384&SHA-512
		
			MMcloone
		
		
			JVMccanny
		
		
* 
	
		Proc. IEEE Int. Conf. Field-Program.Technol
				IEEE Int. Conf. Field-Program.Technol
		
			2002
			
		
* 
	
		Implementation of the SHA-2 hash family standard using FPGAs
		
			NSklavos
		
		
			OKoufopavlou
		
	
		J. Supercomput
		
			31
			
			2005
		
	
* 
	
		A 1 Gbit/s partially unrolled architecture of hash functions SHA-1 and SHA-512
		
			RLien
		
		
			TGrembowski
		
		
			KGaj
		
	
		Proc. CT-RSA
				CT-RSA
		
			2004
			
		
* 
	
		Networking data integrity: High speed architectures and hardware implementations
		
			NSklavos
		
		
			EAlexopoulos
		
		
			OGKoufopavlou
		
	
		Int. Arab J. Inf. Technol
		
			1
			
			2003
		
	
* 
	
		Optimisation of the SHA-2 family of hash functions on FPGAs
		
			RPMcevoy
		
		
			FMCrowe
		
		
			CCMurphy
		
		
			WPMarnane
		
	
		Proc
				null
		
			2006
			
		
* 
	
		Rescheduling for optimized SHA-1 calculation
		
			RChaves
		
		
			GKuzmanov
		
		
			LASousa
		
		
			SVassiliadis
		
	
		Proc. SAMOS Workshop Comput. Syst. Arch. Model. Simulation
				SAMOS Workshop Comput. Syst. Arch. Model. Simulation
		
			Jul. 2006
			
		
* 
	
		The MOLEN polymorphic processor
		
			SVassiliadis
		
		
			SWong
		
		
			GNGaydadjiev
		
		
			KBertels
		
		
			GKuzmanov
		
		
			EMPanainte
		
	
		IEEE Trans
		
	
* 
	
		Announcing the standard for secure hash standard
		
			Comput
		
	
		FIPS
		
			53
			11
			
			Nov. 2004. 15. 1993
		
		
			National Institute of Standards and Technology (NIST), MD
		
	
* 
	
		The MOLEN __-coded processor
		
			SVassiliadis
		
		
			SWong
		
		
			SDCotofana
		
	
		Proc. 11th Int. Conf. Field-Program. Logic Appl. (FPL)
				11th Int. Conf. Field-Program. Logic Appl. (FPL)
		
			Aug. 2001
			2147
			
		
* 
	
		Reconfigurable memory based AES coprocessor
		
			RChaves
		
		
			GKuzmanov
		
		
			SVassiliadis
		
		
			LASousa
		
	
		Proc. 13th Reconfigurable Arch. Workshop (RAW)
				13th Reconfigurable Arch. Workshop (RAW)
		
			Apr. 2006
			
		
* 
	
		Optimizing SHA-1 hash function for high throughput with a partial unrolling study
		
			HEMichail
		
		
			APKakarountas
		
		
			GNSelimis
		
		
			CEGoutis
		
	
		Lecture Notes in Computer Science
		PATMOS, V. Paliouras, J. Vounckx, and D. Verkest
		
			3728
			
			2005
			Springer
		
	
* 
	
		Improving SHA-2 hardware implementations
		
			RChaves
		
		
			GKuzmanov
		
		
			LASousa
		
		
			SVassiliadis
		
	
		Proc. Workshop Cryptograph. Hardw. Embedded Syst. (CHES)
				Workshop Cryptograph. Hardw. Embedded Syst. (CHES)
		
			Oct. 2006
			
		
* 
	
		New Techniques for Hardware Implementations of SHA