
# Dynamic Power Reduction in Modified Lifting Scheme Based DWT for Image Processing

Dr. C.Chandrasekhar<sup>1</sup> and Dr. S.Narayana Reddy<sup>2</sup>

<sup>1</sup> S.V.University, Tirupathi.

Received: 11 February 2012. Accepted: 2 March 2012. Published: 15 March 2012.

#### Abstract

Image compression is one of the major applications in image processing and imposes significant design challenges on VLSI design engineers in the design and development of low power, high speed architectures. DWT is used in image compression to transform the image from the spatial to the frequency domain. In this paper, a DWT architecture based on the lifting scheme is considered, and dynamic power reduction is achieved through suitable modifications to the architecture and the adoption of low power techniques. The interdependency of scaling and dilation coefficients is simplified to a single hierarchy, which reduces latency and increases throughput. A Wallace tree multiplier and a carry select adder are used in realizing the 1D DWT architecture. The hierarchy in the design makes it possible to adopt multi-stage and hierarchical clock gating, thus reducing dynamic power. Power gating and DVFS techniques are also adopted to optimize power dissipation. The modified lifting architecture operates at a maximum frequency of 290MHz and reduces power by more than 50%.


Index terms— Dynamic power dissipation, DWT, Lifting Scheme, Hierarchical design, low power design ASIC implementation.

# 1 Introduction

DWT is recommended by the JPEG2000 standard as it supports features like progressive transmission, higher compression and region of interest encoding schemes. Convolution based DWT or FIR filter bank based DWT architectures occupy a large area as they require a larger number of multipliers and adders, making the computations complex and time consuming. Mobile phones and similar handheld devices that support image/video applications demand high speed and low power architectures with reduced memory size for DWT processing. Several architectures that perform lifting based DWT are discussed in the literature. The general approach for 2-D DWT is to apply the 1-D DWT row-wise, which produces the L and H subbands, and then to process these subbands column-wise to get the LL, LH, HL and HH coefficients. Several architectures such as direct mapped [2], folded [3], and flipping [4] for single level and multi-level DWT have been proposed to implement 1-D lifting DWT. Many architectures that implement the two-dimensional separable forward DWT (2D-DWT) and inverse DWT (2D-IDWT) for 2D signals have been presented in the past [5], [6], [7] and [8]. These architectures consist of filters for performing the 1D-DWT and memory units for storing the results of the transformation. Streaming multimedia applications in which the DWT is present are characterized by high throughput requirements, which imposes the need to optimize the design of the filters in terms of speed. Moreover, portable multimedia devices require low power consumption to increase battery lifetime, and this can be achieved by minimizing the storage size and the number of memory accesses [9].

Low power DWT architectures based on pipelining and parallel processing have been discussed in [10] and [11]; in those works low power is achieved by modifying the architecture to reduce the number of computations, and the designs were implemented on FPGA. Many of the low power techniques reported in the literature [12], [13], [14] and [15] for DWT propose modifications at the architecture level to reduce power dissipation. Power reduction can be accomplished at various levels of abstraction, from the architecture level to the circuit level. Power reduction at the subsystem level or at the circuit level can be accomplished when ASIC design of the DWT architecture is performed. Much of the work reported in the literature is restricted to FPGA implementation. In this paper, the DWT architecture is considered as a test case to demonstrate dynamic power reduction techniques at various levels of abstraction. ASIC design of the DWT architecture, optimized for dynamic power reduction, is performed using 65nm TSMC libraries.

Author: HOD, Dept. of ECE, SVCET, Chittoor; Prof. & Head, Dept. of ECE, S.V.University, Tirupathi. E-mail: snreddysvu@yahoo.com

Section II discusses wavelet transforms, DWT architecture and dynamic low power reduction techniques. Section III discusses the proposed low power schemes for designing the DWT architecture subsystems. Section IV presents the ASIC implementation of the DWT architecture based on the low power schemes. Section V discusses implementation results and performance comparison, and Section VI presents the conclusion.

# 2 a) DWT and Low Power Schemes

In this section, the DWT architecture and low power schemes are presented. The lifting scheme based DWT architecture is considered as a test case for dynamic power reduction and is briefly discussed here.

## 3 i. DWT architecture

In wavelet analysis, signals are represented using a set of basis functions derived by shifting and scaling a single prototype function, referred to as the "mother wavelet", in time [16]. Wavelet transforms are closely related to tree structured digital filter banks and multiresolution analysis. A set of wavelet basis functions can be generated by translating and dilating the mother wavelet. A number of architectures have been proposed for calculation of the DWT [2], [3], [4], [5] and [6]. The architectures are mostly folded and can be broadly classified into serial architectures (where the inputs are supplied to the filters in a serial manner) and parallel architectures (where the inputs are supplied to the filters in a parallel manner). A methodology for implementing lifting-based DWT that reduces the memory requirements and the communication between the processors when the input is broken up into blocks is presented in [17].

In Figure 1 [17], the $z^{-1}$ blocks are delays, $\alpha$, $\beta$, $\gamma$, $\delta$, $\zeta$ are the lifting coefficients and the shaded blocks are registers. The 9/7 filter has been used for implementation; it requires four steps for lifting and one step for scaling. The input signal $x_i$ is split into two parts, the even part $x_{2i}$ and the odd part $x_{2i+1}$. The first step of lifting is then given by the equations [17]:

$$d_i^1 = \alpha\,(x_{2i} + x_{2i+2}) + x_{2i+1} \tag{1}$$

$$a_i^1 = \beta\,(d_i^1 + d_{i-1}^1) + x_{2i} \tag{2}$$

The second lifting step gives:

$$d_i^2 = \gamma\,(a_i^1 + a_{i+1}^1) + d_i^1 \tag{3}$$

$$a_i^2 = \delta\,(d_i^2 + d_{i-1}^2) + a_i^1 \tag{4}$$

Then scaling is performed and the following equations are obtained:

$$a_i = \zeta\, a_i^2 \tag{5}$$

$$d_i = d_i^2 / \zeta \tag{6}$$

The predict step helps determine the correlation between the two sets of data and predicts the odd data samples from the even ones. These samples are used in the update step for updating the present phase. Some of the properties of the original input data can also be maintained in the reduced set by constructing a new operator in the update step. The lifting coefficients have constant values of -1.58613, -0.0529, 0.882911, 0.44350, -1.1496 for $\alpha$, $\beta$, $\gamma$, $\delta$, $\zeta$ respectively. $a_i$ and $d_i$ are the DWT outputs after level 1 decomposition.
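The two lifting steps and the scaling step can be traced in software before committing to hardware. The following is a minimal sketch, not the authors' HDL model. It makes two assumptions that the text does not state: samples outside the signal are handled by replicating the edge value, and the scaling coefficient is taken as +1.1496 (the usual 9/7 value, whereas the text lists it with a minus sign). The function name `lifting_dwt97` is hypothetical.

```python
# Lifting coefficients as listed in the text. ZETA is taken as +1.1496
# (assumption: usual 9/7 scaling value; the text prints a minus sign).
ALPHA, BETA, GAMMA, DELTA, ZETA = -1.58613, -0.0529, 0.882911, 0.44350, 1.1496


def lifting_dwt97(x):
    """One level of 1-D 9/7 lifting DWT; returns (a, d) coefficient lists."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    even = [float(v) for v in x[0::2]]  # x_{2i}
    odd = [float(v) for v in x[1::2]]   # x_{2i+1}
    half = len(even)

    # Assumed boundary rule: replicate the edge value when i+1 or i-1
    # falls outside the signal.
    def nxt(seq, i):
        return seq[min(i + 1, half - 1)]

    def prv(seq, i):
        return seq[max(i - 1, 0)]

    # First lifting step
    d1 = [ALPHA * (even[i] + nxt(even, i)) + odd[i] for i in range(half)]
    a1 = [BETA * (d1[i] + prv(d1, i)) + even[i] for i in range(half)]
    # Second lifting step
    d2 = [GAMMA * (a1[i] + nxt(a1, i)) + d1[i] for i in range(half)]
    a2 = [DELTA * (d2[i] + prv(d2, i)) + a1[i] for i in range(half)]
    # Scaling step
    a = [ZETA * v for v in a2]
    d = [v / ZETA for v in d2]
    return a, d
```

For a constant input such as `lifting_dwt97([100] * 8)`, the high pass outputs come out close to zero and the low pass outputs close to 100 times the square root of 2, which is a quick sanity check on the coefficient values.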

## ii. Sources of power dissipation in CMOS VLSI circuits

Power consumption in CMOS digital circuits is divided into two major components (static and dynamic), as shown in Figure 2(a). Static power is due to leakage current and short circuit current; dynamic power is due to switching current. Power dissipation in CMOS increases exponentially with scaling of transistor size. Figure 2(b) shows the power dissipation in CMOS with technology scaling. Dynamic power dissipation dominated at the 250nm technology node; with technology scaling towards lower geometries (65nm and below), leakage power has increased significantly. However, dynamic power has also increased exponentially, owing to the increase in switching current and in the frequency of operation of CMOS circuits. There are various low power reduction techniques [18].

# 4 Global Journal of Researches in Engineering

# 5 Subsystem Designs for DWT Architecture

An adder is the most commonly used arithmetic block in the Central Processing Unit (CPU) of a microprocessor, a Digital Signal Processor (DSP), and even in a variety of ASICs. In a DWT processor, the adder is one of the important building blocks required to compute the DWT coefficients of the input signal. The multiplier used in a DWT processor also requires adders to add the partial products. Hence, the design and analysis of adders is considered in this section. Optimizing the speed and power of an adder is significant for improving the overall performance of the system. An adder is also subject to the power-delay trade-off: its power dissipation increases as its delay is reduced, and vice versa. There are various architectures for adder design. 4-bit adders can be of different types, among them the Carry Look Ahead Adder, Ripple Carry Adder, Carry Save Adder and Carry Select Adder.

Many digital signal processing operations, such as correlation, convolution, filtering and frequency analysis, require multiplication. Multiplication algorithms will be used to illustrate methods of designing different cells so that they fit into a larger structure. In order to introduce these designs, simple serial and parallel multipliers will be introduced. High-speed parallel multipliers are becoming one of the key components in RISCs (Reduced Instruction Set Computers), DSPs (Digital Signal Processors), graphics accelerators and so on. Parallel multipliers are used in data processors as well as in digital signal processors. Various multiplier architectures are reported in the literature; the Wallace tree, Booth multiplier, BZ-FAD multiplier, shift and add multiplier and array multiplier are the most popular for DSP applications.

In this work, the adders and multipliers are modeled in HDL and synthesized with Synopsys DC using TSMC 65nm CMOS libraries. The synthesis reports provide information on area, delay and power dissipation. The results obtained without low power techniques are presented in Table 1 and Table 2. Multipliers are designed using carry save adders. In order to reduce the power dissipation of the adder and multiplier, the multi-VDD technique is adopted. Reducing the supply voltage $V_{DD}$ reduces the power consumption with no effect on area. From the results obtained it is found that power consumption is a quadratic function of voltage ($P = f C V_{DD}^2$). A decrease in supply voltage increases the overall delay ($\text{Delay} = K V_{DD} / (V_{DD} - V_t)^{\alpha}$).
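The trade-off between the two relations quoted above can be made concrete with a few lines of arithmetic. This is a sketch only: the 1 pF capacitance, the 1.2 V and 0.9 V supplies, the 0.3 V threshold and the default exponent of 2 are hypothetical values chosen for illustration and are not taken from the synthesis reports. Only the 290 MHz clock comes from the paper.

```python
def dynamic_power(freq_hz, cap_farad, vdd):
    """Switching power P = f * C * VDD^2, in watts."""
    return freq_hz * cap_farad * vdd ** 2


def gate_delay(vdd, vt, k=1.0, alpha=2.0):
    """Delay = K * VDD / (VDD - Vt)^alpha; k and alpha are illustrative."""
    if vdd <= vt:
        raise ValueError("VDD must exceed the threshold voltage")
    return k * vdd / (vdd - vt) ** alpha


# Hypothetical operating conditions, for illustration only.
p_high = dynamic_power(290e6, 1e-12, 1.2)
p_low = dynamic_power(290e6, 1e-12, 0.9)
power_ratio = p_low / p_high                           # (0.9 / 1.2)^2
delay_ratio = gate_delay(0.9, 0.3) / gate_delay(1.2, 0.3)
```

Under these assumed values the lower supply dissipates a little over half the switching power but the delay grows, which is why multi-VDD is applied selectively rather than to the whole design.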

The lifting equations presented earlier, when realized using an HDL model, form a sequential process, as the scaling and dilation coefficients are dependent on previous samples, thus introducing latency. In order to increase throughput and reduce latency, modified equations are derived. The modified lifting equations eliminate the dependency of outputs on previous samples. The equations for $a_i$ and $d_i$ are obtained by successively substituting the lifting equations into one another, beginning with (4). The lifting coefficients were substituted and the results were scaled by multiplying by 256 and rounded off to avoid decimal values. The modified lifting scheme equations are:

a i = 294\*(8(6\*x 2i + 4\*x 2i-2 + x 2i-2 + 2i+4 + x 2i+4 + x 2i-4 + 4\*x 2i+2 …

These equations are obtained by taking the coefficients as common factors. The equations have an initial latency, as the input samples need to be stored before the DWT $a_i$ and $d_i$ coefficients are computed.
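The scaling by 256 described above amounts to converting each coefficient to an 8-bit fractional fixed-point integer. The sketch below illustrates that conversion; the helper names `to_fixed` and `fixed_multiply` are hypothetical, and rounding to nearest is an assumption, since the text only says the values were rounded off. Applied to the scaling coefficient 1.1496 it gives 294, which matches the factor visible in the modified equation.

```python
SCALE = 256  # scaling factor stated in the text (2**8)


def to_fixed(coeff, scale=SCALE):
    """Scale a real coefficient to an integer (assumed round-to-nearest)."""
    return round(coeff * scale)


def fixed_multiply(sample, coeff_fixed, scale=SCALE):
    """Integer multiply, then divide by the scale with rounding to nearest."""
    return (sample * coeff_fixed + scale // 2) // scale


# Coefficient magnitudes as listed in the text.
FIXED = {name: to_fixed(value) for name, value in [
    ("alpha", -1.58613), ("beta", -0.0529), ("gamma", 0.882911),
    ("delta", 0.44350), ("zeta", 1.1496),
]}
```

With this representation every multiplication in the datapath becomes an integer multiply followed by an 8-bit shift, so no fractional arithmetic is needed in hardware.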

The design of the low power architecture to reduce dynamic power dissipation is based on equations (7) and (8), from which the following observations are made. The proposed architecture shown in Figure 4 takes two inputs and gives two outputs per cycle. Data1 and Data2 are the odd and even input samples given to the hardware in a single clock cycle for 100% hardware utilization. This architecture is a very simple design compared to the other architectures suggested in [20], which have a complex control path to achieve 100% hardware utilization. The row processor and column processor shown in Figure 4 are realized using the modified lifting scheme based equations.

Figure: Row processor and column processor for modified lifting DWT

Based on the row and column processor architecture and the equations presented in (7) and (8), the top level model for the architecture is shown in Figure 4. A detailed data flow for the proposed architecture is also presented. The modified architecture consists of the following blocks: parallel input and serial output register, serial input and parallel output register, multipliers and adders, and control unit. The HDL model is developed and the design is verified for functionality using a test bench in ModelSim. The functionally correct HDL code is synthesized using Synopsys DC targeting the TSMC 65 nm library and technology files. The reports obtained are compiled and presented in Table 4. From the results tabulated in Table 4, it is found that the changes in architecture, which reduce the number of stages in the DWT computation, reduce dynamic power dissipation by 37%. However, the area increases due to the additional registers and intermediate storage units. The design is synthesized to obtain minimum delay and to meet the zero slack requirement.
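The two-samples-per-cycle input scheme described above can be sketched as a simple generator. This is an illustration of the data ordering only, not of the hardware: the function name `paired_samples` is hypothetical, and the assignment of the even sample to the first position of each pair is an assumption, since the text does not say which of Data1 and Data2 carries which sample.

```python
def paired_samples(samples):
    """Yield (cycle, even_sample, odd_sample): two inputs per clock cycle."""
    if len(samples) % 2:
        raise ValueError("expects an even number of samples")
    for cycle in range(len(samples) // 2):
        yield cycle, samples[2 * cycle], samples[2 * cycle + 1]


# Sixty-four input samples are consumed in thirty-two cycles.
pairs = list(paired_samples(list(range(64))))
```

Because both halves of the datapath receive a sample in every cycle, neither the even nor the odd branch idles, which is what the text means by 100% hardware utilization.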

In order to further reduce power dissipation, various other dynamic low power techniques are introduced for optimization. The simplest, general (or automatic) clock gating inserts a single clock gate for each register bank. Most tools permit the user to "split" register banks or to prevent clock gate "sharing" across unrelated register banks. To save even more dynamic power, advanced clock gating styles such as multi-stage and hierarchical clock gating can be used, depending on the design architecture and design requirements. The modified lifting DWT has common coefficients that need to be enabled at different instants of time, and hence the multi-stage clock gating technique is implemented. The 2D DWT architecture is built up from subsystems (multipliers, adders and registers), then the 1D DWT and finally the 2D DWT; to reduce power dissipation across this hierarchy, the hierarchical clock gating technique is adopted. Figure 5 shows the multi-stage clock gating technique introduced into the row processor. The enable adder signal enables all adders together; similarly, the enable reg signal enables all intermediate registers, thus saving power.

To implement the power gating technique, power gates and state retention registers are required. Power gating cells are required for turning blocks on and off. State retention registers are useful because, if the state of a shut down or "sleeping" block needs to be retained, the most automated method is the use of retention registers. These registers have a backup power supply connection that always remains on to hold the state of the register via a high threshold voltage latch built into the register. An isolation cell is required to ensure electrical and logical isolation of logic that is shut down from active logic in a design. This is required because, when a block is shut down, its internal signal levels transition to an unknown, floating state.
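
The interplay of power switch, retention register and isolation cell described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions rather than a description of any library cell: the class name `PowerGatedBlock`, the single-register state and the clamp-to-zero isolation value are all hypothetical simplifications.

```python
class PowerGatedBlock:
    """Behavioral model of a switchable block with retention and isolation."""

    def __init__(self, isolation_value=0):
        self.powered = True
        self.state = 0                      # main register, lost at power-down
        self._retained = None               # always-on shadow latch
        self.isolation_value = isolation_value

    def write(self, value):
        if not self.powered:
            raise RuntimeError("block is powered down")
        self.state = value

    def power_down(self):
        if self.powered:
            self._retained = self.state     # save state in the retention latch
            self.state = None               # main register becomes unknown
            self.powered = False

    def power_up(self):
        if not self.powered:
            self.powered = True
            self.state = self._retained     # restore the saved state

    def output(self):
        # The isolation cell clamps the output while the block is off, so
        # downstream active logic never sees a floating value.
        return self.state if self.powered else self.isolation_value
```

A typical sequence is `write(42)`, `power_down()`, during which `output()` returns the clamp value, and then `power_up()`, after which `output()` returns 42 again.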

Always-on cells are also required between switched and steady state blocks to ensure interoperability. Figure 7 shows the power gating logic for dynamic power reduction. Multiple voltages are used to drive the cells that are active or in standby. In the hierarchical design shown in Figure 6, the 1D DWT blocks are active during computation and inactive during data storage, and thus power gating is inserted. The most common approach to providing state retention during power gating is to replace a standard register with a retention register.

To achieve further improvements in power reduction without resorting to custom circuit techniques, Dynamic Voltage and Frequency Scaling (DVFS) can be used. DVFS is effective because of the following two facts:

- The amount of energy required to complete a task is proportional to the square of the supply voltage.
- The maximum frequency of any CMOS circuit is proportional to the supply voltage.

So if the supply voltage is decreased, there is a square-law reduction in the energy needed to complete a given task. However, the task takes longer to complete because of the linear reduction in frequency. Therefore, the principal gain with Dynamic Voltage and Frequency Scaling is with respect to dynamic power consumption.
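The two proportionalities above can be combined into a short calculation of relative energy, run time and average power. This is a sketch of the idealized relations only; the function name `dvfs_scaling` is hypothetical, and real circuits deviate from the linear frequency model near the threshold voltage.

```python
def dvfs_scaling(v_ratio):
    """Relative energy, run time and power when VDD is scaled by v_ratio."""
    if not 0 < v_ratio <= 1:
        raise ValueError("v_ratio must be in (0, 1]")
    energy = v_ratio ** 2        # energy per task is proportional to V^2
    run_time = 1.0 / v_ratio     # f_max proportional to V, so time ~ 1/V
    power = energy / run_time    # average power is therefore ~ V^3
    return energy, run_time, power
```

Halving the supply voltage, `dvfs_scaling(0.5)`, gives a quarter of the energy, twice the run time and one eighth of the average power under these idealized assumptions.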

Dynamic voltage and frequency scaling adjusts performance and energy consumption levels while the logic circuit is active. Reducing the processor frequency and voltage together yields quadratic energy savings. DVFS is an effective way of reducing CPU energy consumption while still providing the required computation power.

DVFS has been proven to be a highly effective technique for power minimization subject to a performance constraint. DVFS should consider not only the CPU power, but also the total system power dissipation. In this work, to realize the 2D DWT, multiple 1D DWT architectures are realized using the modified lifting scheme logic. Thus DVFS is adopted to minimize power dissipation.

DVFS computation for modified lifting DWT: the workload of a task, $W_{task}$, is defined as the total number of clock cycles required to compute the 1D DWT.
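Given a workload in clock cycles, a DVFS controller only needs the slowest operating point that still meets the deadline. The sketch below shows that selection. Everything numeric in it is hypothetical: the list of operating points, the 96-cycle workload and the 0.5 microsecond deadline are illustrative values, with only the 290 MHz maximum frequency taken from the paper.

```python
# Hypothetical (frequency in Hz, supply voltage in V) operating points.
OPERATING_POINTS = [(100e6, 0.8), (200e6, 1.0), (290e6, 1.2)]


def required_frequency(w_task_cycles, deadline_s):
    """Minimum clock frequency that finishes W_task cycles by the deadline."""
    return w_task_cycles / deadline_s


def pick_operating_point(w_task_cycles, deadline_s, points=OPERATING_POINTS):
    """Return the lowest-frequency point that still meets the deadline."""
    f_req = required_frequency(w_task_cycles, deadline_s)
    feasible = [p for p in points if p[0] >= f_req]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda p: p[0])


# Illustrative workload: 96 cycles to be completed within 0.5 microseconds.
chosen = pick_operating_point(96, 0.5e-6)
```

Here the required frequency is 192 MHz, so the 200 MHz point is selected and the block runs at the lower of the two feasible voltages.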

#### 6 ASIC Implementation and Result Analysis

The simulation results for the modified DWT are presented in this section. There are sixty-four inputs, each with a bit width of twenty bits. These inputs are sent serially to the DWT architecture. The DWT consists of registers, multiplexers, adders and multipliers. When the inputs are sent through the SIPO (serial input parallel output) register, the data is divided into even data and odd data, which are stored in temporary registers. When reset is high, the temporary registers hold zero; when reset is low, the input data is split into even data and odd data. The input data is read for sixty-four clock cycles, after which the data is processed according to the lifting scheme. The output data consists of low pass and high pass elements. This is the 1-D discrete wavelet transform.

#### 7 Implementation Results and Discussion

In this work, the ASIC design flow is restricted to synthesis only for the modified lifting DWT; low power libraries and low power IPs from Synopsys DesignWare are adopted for synthesis. The synthesis constraint file is set for low power synthesis, and the constraints are applied according to the commands set in that file. The low power constraints are supported only if the RTL is hierarchical and parallel in nature. The constraints for dynamic power reduction discussed earlier are set in a constraints file and used for synthesis, together with the TCL scripts for the DWT top module. Figure 10 shows the synthesis netlist obtained using 65nm technology and the interconnections used in the design along with the clock tree network. Figure 11 shows the synthesized netlist along with the clock tree network.

The RTL model developed for the modified lifting scheme based DWT architecture is remodeled for ASIC implementation. The design is synthesized using Design Compiler and timing analysis is carried out using PrimeTime. The design requires 42 input-output ports and 550 cells. The total combinational area is 21527.410 sq. µm and the non-combinational area is 10256.23 sq. µm. The total dynamic power is 498.36 µW. Due to the low power techniques adopted, the dynamic power dissipation is reduced by 19%. From the results obtained, the design of the architecture achieves a 37% power reduction, and the low power techniques presented in this section reduce power dissipation by 17%. Thus the maximum power reduction is achieved at the architecture abstraction level. The power saving achievable at various levels of hierarchy is demonstrated in this work: power reduction needs to be performed from the architecture level down to the circuit level.
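One way to relate the individual savings quoted above to an overall figure is to treat them as successive reductions, each applied to the power left by the previous stage. The paper does not state how its stages combine, so the multiplicative model in this sketch is an assumption, and the function name `combined_reduction` is hypothetical.

```python
def combined_reduction(stage_reductions):
    """Overall fractional reduction when each stage acts on what remains."""
    remaining = 1.0
    for reduction in stage_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


# Architecture-level saving followed by the low power techniques.
with_19 = combined_reduction([0.37, 0.19])   # about 0.49
with_17 = combined_reduction([0.37, 0.17])   # about 0.48
```

Under this assumed model, a 37% architectural saving followed by a 17% to 19% saving from the low power techniques gives an overall reduction just under one half.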


#### 8 Conclusion

In this work, a modified lifting based DWT is proposed, designed and implemented using the 65nm TSMC low power design library. Lifting based DWT is considered to illustrate the techniques that can be adopted to reduce dynamic power. Modifications at the architecture level as well as at other abstraction levels are considered for power reduction. Low power library cells from Synopsys DesignWare are considered for synthesis. TCL scripts for constraining the design for dynamic power reduction are developed. The RTL model developed is synthesized and its performance is estimated. From the results obtained, it is found that there is a total power reduction of 50% compared with direct implementation. The developed low power techniques can be adopted for other complex designs. Power dissipation can be reduced further at the physical design stage.












Figure: Simulation waveform of the modified lifting DWT, showing the signals clk, rst, data_in[20:0], temp_even[20:0], temp_odd[20:0], data_out_odd[20:0], data_out_even[20:0] and m1_out[20:0] to m6_out[20:0].

© 2012 Global Journals Inc. (US)



| Design stage | Low power tasks |
|---|---|
| RTL Power Constructs | Definition of power domain<br>Isolation behavior of a particular signal<br>Retention behavior of particular registers |
| RTL Simulation | Power domain simulation<br>Isolation logic simulation |
| Logic Synthesis | Create Power Domains<br>Clock Gating<br>Apply OpCond on blocks<br>Special cell Insertion<br>Retention Cell Synthesis<br>Compile<br>MV DFT |
| Physical Implementation | Voltage Area Creation<br>MTCMOS Insertion<br>Physical synthesis<br>Leakage optimization<br>MCMM<br>Scan reordering<br>MV aware CTS<br>MV aware Routing |
| Verification | RTL vs. Gates matching<br>Static Low Power Checks |
| Signoff | Parasitic Extraction<br>SI, Timing, Power Signoff<br>Power Network Analysis |








| Type of adder (16-bit) | No. of transistors | Power (µW) | Delay (ps) |
|------------------------|--------------------|------------|------------|
| Ripple carry adder     | 286                | 40.5505    | 600        |
| Carry save adder       | 92                 | 18.9241    | 74         |
| Carry select adder     | 102                | 16.897     | 65         |
| Carry look ahead adder | 621                | 55.1482    | 62         |
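
Since an adder trades power against delay, the power-delay product is a convenient single figure for comparing the candidates in the table. The sketch below computes it from the power and delay values as read from the table; the function names are hypothetical, and the product of microwatts and picoseconds is expressed in attojoules.

```python
# (power in microwatts, delay in picoseconds), as read from the adder table.
ADDERS = {
    "Ripple carry": (40.5505, 600),
    "Carry save": (18.9241, 74),
    "Carry select": (16.897, 65),
    "Carry look ahead": (55.1482, 62),
}


def power_delay_product(power_uw, delay_ps):
    """Power-delay product; microwatts x picoseconds = attojoules (1e-18 J)."""
    return power_uw * delay_ps


def rank_by_pdp(adders):
    """Adder names sorted from lowest to highest power-delay product."""
    return sorted(adders, key=lambda name: power_delay_product(*adders[name]))


ranking = rank_by_pdp(ADDERS)
```

On these figures the carry select adder has the lowest power-delay product, which is consistent with its use in the 1D DWT architecture.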


## Acknowledgement

The authors would like to acknowledge Dr. Cyril Prasanna Raj P for his valuable support and guidance extended in completion of this work.
- [Wu and Lin] A High-Performance and Memory-Efficient, Bing-Fei Wu, Chung-Fu Lin.
- [Lai et al. (2009)] 'A High-performance and Memory-Efficient VLSI Architecture with Parallel Scanning Method for 2-D Lifting-Based Discrete Wavelet Transform'. Yeong-Kang Lai, Lien-Fei Chen, Yui-Chih Shih. *IEEE Transaction on Consumer Electronics* May 2009. 55 (2).
- [Acharya and Chakrabarti ()] 'A Survey on Lifting-based Discrete Wavelet Transform Architectures'. Tinku Acharya, Chaitali Chakrabarti. *Journal of VLSI Signal Processing* 2006. 42.
- [Chakrabarti et al. ()] 'Architectures for wavelet transforms: A survey'. C Chakrabarti, M Vishwanath, R M Owens. *Journal of VLSI Signal Processing* 1996. 4 (2).
- [Catthoor et al. ()] *Custom Memory Management Methodology - Exploration of Memory Management Organization for Embedded Multimedia System Design*, F Catthoor, S Wuytack, E De Greff, F Balasa, L Nachtergale, A Vandecappele. 1998. Kluwer Academic Publishers.
- [Cyril Prasanna Raj P ()] 'Low power DWT for image compression'. Cyril Prasanna Raj P. *SASTech Journal* 2008. 7.
- [Liu et al. ()] 'Design and Implementation of a Progressive Image Coding Chip Based on the Lifted Wavelet Transform'. C C Liu, Y H Shiau, J M Jou. *Proc. of the 11th VLSI Design/CAD Symposium*, (Taiwan) 2000.
- [Shanthala et al.] 'Design and VLSI implementation of Pipelined Multiply Accumulate Unit'. Shanthala S, Cyril Prasanna Raj P, Dr S Y Kulkarni. *International Conference on Emerging Trends in Engineering and Technology (ICETET 09)*, 16th-18th December 2009, G.H. Raisoni College of Engineering, Nagpur (Maharashtra).
- [Marino ()] 'Efficient high-speed/low-power pipelined architecture for the direct 2-D discrete wavelet transform'. F Marino. *IEEE Trans. Circuits Systems* 2000. II (12).
- [Chakrabarti and Vishwanath (1995)] 'Efficient realizations of the discrete and continuous wavelet transforms: from single chip implementations to SIMD parallel computers'. C Chakrabarti, M Vishwanath. *IEEE Trans. Signal Processing* March 1995. 43 (3).
- [Zervas et al. (2001)] 'Evaluation of design alternatives for the 2-D discrete wavelet transform'. N D Zervas, G P Anagnostopoulos, V Spiliotopoulos, Y Andreopoulos, C E Goutis. *IEEE Trans. Circuits and Syst. Video Technol* December 2001. 11 (2).
- [Daubechies and Sweldens ()] 'Factoring Wavelet transforms into Lifting Schemes'. I Daubechies, W Sweldens. *The J. of Fourier Analysis and Applications* 1998. 4.
- [Huang et al. ()] 'Flipping Structure: An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform'. C T Huang, P C Tseng, L G Chen. *IEEE Transactions on Signal Processing*, 2004.
- [Park and Jung ()] 'High speed lattice based VLSI architecture of 2D discrete wavelet transform for real-time video signal processing'. T Park, S Jung. *IEEE Trans. Consumer Elect* 2002. 48 (4).
- [Lian et al. ()] 'Lifting Based Discrete Wavelet Transform Architecture for JPEG 2000'. C Lian, K F Chen, H H Chen, L G Chen. *IEEE International Symposium on Circuits and Systems*, (Sydney, Australia) 2001.
- [Darji et al.] 'Memory Efficient and Low power VLSI architecture for 2-D Lifting based DWT with Dual data Scan Technique'. A D Darji, A N Chandorkar, S N Merchant. *Recent Researches in Circuits, Systems and Signal Processing*.
- [Vaidyanathan ()] *Multirate Systems and Filter Banks*, P P Vaidyanathan. 1993. Englewood Cliffs: Prentice Hall.
- [Nagabushnam et al. ()] 'Design and FPGA Implementation of Modified Distributive Arithmetic Based DWT-IDWT Processor for Image Compression'. Nagabushnam, Cyril Prasanna Raj P, Ramachandra. *IEEE Trans. on circuit and systems for video Technology* December 2005 11. 2009. 15 (12). (European Journal of Scientific Research)
- [Weste and Harris ()] *CMOS VLSI Design - A Circuit and System Perspective*, Neil H E Weste, David Harris. 2005. Pearson Education. (3rd edition)
- [Vishwanath et al. (1995)] 'VLSI architectures for the discrete wavelet transform'. M Vishwanath, R M Owens, M J Irwin. *IEEE Trans. Circuits and Syst* May 1995. II (5).