Design and Implementation of Complexity Reduced Digital Signal Processors for Low Power Biomedical Applications
Eminaga, Y.

This is an electronic version of a PhD thesis awarded by the University of Westminster. © Miss Yaprak Eminaga, 2019.

Yaprak EMİNAĞA

A thesis submitted in partial fulfilment of the requirements of the University of Westminster for the degree of Doctor of Philosophy

June 2019
Dedicated to

my parents Özdal & Mustafa EMİNAĞA

and

to the loving memories of my grandparents

Bahire & Hüseyin KUBİLAY
I declare that the work presented in this thesis is my own, has not been submitted for any other award, is identical to the content of the electronic submission and that, to the best of my knowledge, it does not contain any material previously created by another person, except where due reference is made.

Yaprak EMİNAĞA
# Abstract

Wearable health monitoring systems capable of signal sensing, acquisition, local processing and transmission can provide remote care with supervised, independent living. A generic biopotential signal processing platform, for signals such as the Electrocardiogram (ECG) and the Electroencephalogram (EEG), consists of four main functional components. The signals acquired by the electrodes are amplified and preconditioned by (1) the Analog-Front-End (AFE) and then digitized by (2) the Analog-to-Digital Converter (ADC) for further processing. Local digital signal processing is usually handled by (3) a custom-designed Digital Signal Processor (DSP), which is responsible for any one, or a combination, of signal processing algorithms such as noise detection, noise/artefact removal, feature extraction, classification and compression. The digitally processed data is then transmitted via (4) the transmitter, which is known to be the most power-hungry block in the complete platform. All of these components must be designed and fitted into an integrated system where the area and power requirements are stringent. Therefore, the hardware complexity and power dissipation of each functional component are crucial aspects of designing and implementing a wearable monitoring platform. The work undertaken focuses on reducing the hardware complexity of a biosignal DSP and presents low hardware complexity solutions that can be employed in such wearable platforms.

A typical state-of-the-art system utilizes a Sigma-Delta (ΣΔ) ADC incorporating a ΣΔ modulator and a decimation filter, and state-of-the-art decimation filters employ high-order linear phase Finite-Impulse-Response (FIR) filters that increase the hardware complexity [1–5]. In this thesis, the novel use of minimum phase Infinite-Impulse-Response (IIR) decimators is proposed, where the hardware complexity is greatly reduced compared to conventional FIR decimators. The non-linear phase effects of these filters are also investigated, since phase non-linearity may distort the time domain representation of the signal being filtered, which is an undesirable effect for biopotential signals, especially when the fiducial characteristics carry diagnostic importance. In the case of ECG monitoring systems, the effect of the IIR filter phase non-linearity is found to be minimal and does not affect the diagnostic accuracy of the signals.
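
As a rough illustration of why such IIR decimators can be cheap, the sketch below evaluates the frequency response of a two-path all-pass half-band filter. It assumes the common form H(z) = 0.5 [A1(z^2) + z^-1 A2(z^2)] with first-order all-pass sections A(z) = (α + z^-1) / (1 + α z^-1), and borrows the coefficients α1 = 0.125 and α2 = 0.5625 quoted in the caption of Figure 3.4. The function names are illustrative only, and the exact structure developed in Chapter 3 may differ.

```python
import cmath
import math

ALPHA_1 = 0.125    # top-branch coefficient (value quoted in the Figure 3.4 caption)
ALPHA_2 = 0.5625   # bottom-branch coefficient (value quoted in the Figure 3.4 caption)


def allpass(alpha, z_inv):
    """First-order all-pass section A(z) = (alpha + z^-1) / (1 + alpha * z^-1)."""
    return (alpha + z_inv) / (1 + alpha * z_inv)


def halfband_response(omega):
    """Assumed structure: H(z) = 0.5 * [A1(z^2) + z^-1 * A2(z^2)] at z = e^{j*omega}."""
    z_inv = cmath.exp(-1j * omega)
    z_inv2 = z_inv * z_inv
    return 0.5 * (allpass(ALPHA_1, z_inv2) + z_inv * allpass(ALPHA_2, z_inv2))


def gain_db(omega):
    """Magnitude response in dB, floored to avoid log10(0) at the Nyquist frequency."""
    return 20 * math.log10(max(abs(halfband_response(omega)), 1e-12))


if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"omega = {frac:.2f}*pi   gain = {gain_db(frac * math.pi):9.2f} dB")
```

Under these assumptions the filter has unity gain at DC, a zero at the Nyquist frequency and half power at a quarter of the sampling rate. Both coefficients are short binary fractions (0.125 = 2^-3 and 0.5625 = 2^-1 + 2^-4), so each can be realized with shifts and adds instead of a general purpose multiplier.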

The work undertaken also proposes two methods for reducing the hardware complexity of a popular biosignal processing tool, the Discrete Wavelet Transform (DWT). General purpose multipliers are known to be hardware and power hungry in terms of the number of addition operations, or underlying building blocks such as full adders and half adders, that they require. A higher number of adders leads to an increase in power consumption, which grows with the clock frequency, supply voltage, switching activity and the resources utilized. The resources of a typical Field-Programmable Gate Array (FPGA) are Look-up Tables (LUTs), whereas those of a custom Digital Signal Processor (DSP) are the gate-level cells of standard cell libraries that are used to build adders [6]. The first proposed method replaces the hardware and power hungry general purpose multipliers and the coefficient memories with reconfigurable multiplier blocks that are composed of simple shift-add networks and multiplexers. This method substantially reduces the resource utilization as well as the power consumption of the system. The second proposed method is the design and implementation of the DWT filter banks using IIR filters, which require fewer arithmetic operations than the state-of-the-art FIR wavelets. This reduces the hardware complexity of the analysis filter bank of the DWT and can be employed in applications where reconstruction is not required. However, the synthesis filter bank for the IIR wavelet transform has a higher computational complexity than conventional FIR wavelet synthesis filter banks, since the filtered data sequence must be re-indexed, which can only be achieved with extra registers. This led to the proposal of a novel design that replaces the complex IIR based synthesis filter banks with FIR filters that approximate the associated IIR filters.
Finally, a comparative study is presented in which the hybrid IIR/FIR and FIR/FIR wavelet filter banks are deployed in a typical noise reduction scenario using wavelet thresholding techniques. It is concluded that the proposed hybrid IIR/FIR wavelet filter banks provide better denoising performance, reduced computational complexity and lower power consumption in comparison to their IIR/IIR and FIR/FIR counterparts.
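
The shift-add idea behind the multiplier-free blocks described above can be sketched in a few lines. The example below is only an illustration: it encodes an integer constant in Canonic Signed Digit (CSD) form and multiplies by it using shifts, additions and subtractions. It does not reproduce the reconfigurable multiplier blocks designed in the thesis, and the function names `csd_digits`, `shift_add_multiply` and `adder_cost` are hypothetical.

```python
def csd_digits(c):
    """Return the CSD digits of a non-negative integer, LSB first, each in {-1, 0, 1}."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)   # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d            # clears the current bit (may carry upwards)
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x, c):
    """Compute x * c using only shifts, additions and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def adder_cost(c):
    """Adders/subtractors needed for a direct CSD realization (no sharing)."""
    nonzero = sum(1 for d in csd_digits(c) if d != 0)
    return max(nonzero - 1, 0)


if __name__ == "__main__":
    # Constants taken from the captions of Figures 4.2 and 4.3.
    for coeff in (7, 27, 29, 39):
        print(coeff, csd_digits(coeff), "adders:", adder_cost(coeff),
              "check:", shift_add_multiply(5, coeff) == 5 * coeff)
```

For example, 7 is encoded as 8 - 1 and therefore costs a single subtractor rather than the two adders a plain binary expansion (4 + 2 + 1) would need. Sharing intermediate terms across several constants, as in the CSE and adder-graph methods listed for Chapter 4, can reduce the count further, but that is outside the scope of this sketch.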
# Acknowledgements

First and foremost, I would like to express my sincerest appreciation to my supervisor Dr. Adem Coşkun for his constant support, approachability, enthusiasm, and in-depth knowledge throughout my PhD. He has always been supportive, positive and, more importantly, understanding. It is an honour to be his first PhD student; thank you for making my PhD experience a very enjoyable one.

I would also like to express my deepest gratitude to my second supervisor Prof. İzzet Kale for his endless support, encouragement, ever-positive attitude and great knowledge throughout my PhD. Thank you for lighting my way for more than a decade, for always pointing me in the right direction, and for encouraging me to try my best even during the difficult times. It has been and always will be an honour to work under your guidance.

To Dağhan Özbilenler, thank you for making my life better and for helping me get over every obstacle I face. I could not have completed this journey without your endless support. I would also like to thank my whole family and my friends for having faith in me.

Last but not least, I would like to thank my parents for being my inspiration, for supporting me both financially and emotionally, and for believing in me throughout my whole life. I would not be the person I am, nor would I be where I am now, were it not for the endless compromises you made and your infinite love. My dear sister Başak and dear brother Barış, thank you for letting me know that I will never be alone while you are in my life, even thousands of miles away from home.
# List of Acronyms

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADC</td>
<td>Analog-to-Digital Converter</td>
</tr>
<tr>
<td>AFE</td>
<td>Analog-Front-End</td>
</tr>
<tr>
<td>AIQ</td>
<td>Algebraic Integer Quantization</td>
</tr>
<tr>
<td>AP</td>
<td>Action Potential</td>
</tr>
<tr>
<td>ASIC</td>
<td>Application Specific Integrated Circuit</td>
</tr>
<tr>
<td>AV</td>
<td>Atrioventricular</td>
</tr>
<tr>
<td>BPM</td>
<td>Beats Per Minute</td>
</tr>
<tr>
<td>CIC</td>
<td>Cascaded Integrator-Comb</td>
</tr>
<tr>
<td>CLB</td>
<td>Configurable Logic Block</td>
</tr>
<tr>
<td>CNS</td>
<td>Central Nervous System</td>
</tr>
<tr>
<td>CSC</td>
<td>Cross Spectral Coherence</td>
</tr>
<tr>
<td>CSD</td>
<td>Canonic Signed Digit</td>
</tr>
<tr>
<td>CSE</td>
<td>Common Sub-expression Elimination</td>
</tr>
<tr>
<td>CWT</td>
<td>Continuous Wavelet Transform</td>
</tr>
<tr>
<td>DA</td>
<td>Distributed Arithmetic</td>
</tr>
<tr>
<td>DAG</td>
<td>Directed Acyclic Graph</td>
</tr>
<tr>
<td>DCT</td>
<td>Discrete Cosine Transform</td>
</tr>
<tr>
<td>DFT</td>
<td>Discrete Fourier Transform</td>
</tr>
<tr>
<td>DR</td>
<td>Distortion Ratio</td>
</tr>
<tr>
<td>DSP</td>
<td>Digital Signal Processor</td>
</tr>
<tr>
<td>DWT</td>
<td>Discrete Wavelet Transform</td>
</tr>
<tr>
<td>ECG</td>
<td>Electrocardiogram</td>
</tr>
<tr>
<td>ECoG</td>
<td>Electrocorticogram</td>
</tr>
<tr>
<td>EEG</td>
<td>Electroencephalogram</td>
</tr>
<tr>
<td>EMG</td>
<td>Electromyogram</td>
</tr>
<tr>
<td>FB</td>
<td>Filter Bank</td>
</tr>
<tr>
<td>FF</td>
<td>Flip-Flop</td>
</tr>
<tr>
<td>FP</td>
<td>Fixed-Point</td>
</tr>
<tr>
<td>FIR</td>
<td>Finite Impulse Response</td>
</tr>
<tr>
<td>FPGA</td>
<td>Field-Programmable Gate Array</td>
</tr>
<tr>
<td>FT</td>
<td>Fourier Transform</td>
</tr>
<tr>
<td>GP</td>
<td>General Purpose</td>
</tr>
<tr>
<td>HB</td>
<td>Half-Band</td>
</tr>
<tr>
<td>IC</td>
<td>Integrated Circuit</td>
</tr>
<tr>
<td>IIR</td>
<td>Infinite Impulse Response</td>
</tr>
<tr>
<td>KCM</td>
<td>Constant Coefficient Multiplier</td>
</tr>
<tr>
<td>LIFO</td>
<td>Last-in First-out</td>
</tr>
<tr>
<td>LMS</td>
<td>Least Mean Square</td>
</tr>
<tr>
<td>LSB</td>
<td>Least Significant Bit</td>
</tr>
<tr>
<td>LTC</td>
<td>Long Term Condition</td>
</tr>
<tr>
<td>LUT</td>
<td>Look-up Table</td>
</tr>
<tr>
<td>MAC</td>
<td>Multiply-Accumulate</td>
</tr>
<tr>
<td>MAE</td>
<td>Maximum Absolute Error</td>
</tr>
<tr>
<td>MAG</td>
<td>Minimum Adder Graph</td>
</tr>
<tr>
<td>MCM</td>
<td>Multiple Constant Multiplication</td>
</tr>
<tr>
<td>MEMS</td>
<td>Micro-Electro-Mechanical Systems</td>
</tr>
<tr>
<td>MSB</td>
<td>Most Significant Bit</td>
</tr>
<tr>
<td>MSD</td>
<td>Minimum Signed Digit</td>
</tr>
<tr>
<td>MSE</td>
<td>Mean Square Error</td>
</tr>
<tr>
<td>ND(TDL)</td>
<td>Numerator-Denominator TDL</td>
</tr>
<tr>
<td>PAR</td>
<td>Place and Route</td>
</tr>
<tr>
<td>PCA</td>
<td>Principal Component Analysis</td>
</tr>
<tr>
<td>PDA</td>
<td>Personal Digital Assistant</td>
</tr>
<tr>
<td>PNS</td>
<td>Peripheral Nervous System</td>
</tr>
<tr>
<td>PR</td>
<td>Perfect Reconstruction</td>
</tr>
<tr>
<td>PSD</td>
<td>Power Spectral Density</td>
</tr>
<tr>
<td>PZP</td>
<td>Pole-Zero Plane</td>
</tr>
<tr>
<td>QMF</td>
<td>Quadrature Mirror Filter</td>
</tr>
<tr>
<td>QNPSD</td>
<td>Quantization Noise Power Spectral Density</td>
</tr>
<tr>
<td>RAG-n</td>
<td>Reduced Adder Graph</td>
</tr>
<tr>
<td>RAM</td>
<td>Random Access Memory</td>
</tr>
<tr>
<td>ReMB</td>
<td>Reconfigurable Multiplier Block</td>
</tr>
<tr>
<td>RF</td>
<td>Radio Frequency</td>
</tr>
<tr>
<td>RMSE</td>
<td>Root Mean Square Error</td>
</tr>
<tr>
<td>ROM</td>
<td>Read-Only Memory</td>
</tr>
<tr>
<td>SA</td>
<td>Sinoatrial</td>
</tr>
<tr>
<td>ΣΔ</td>
<td>Sigma-Delta</td>
</tr>
<tr>
<td>SCM</td>
<td>Single Constant Multiplication</td>
</tr>
<tr>
<td>SER</td>
<td>Signal-to-Error Ratio</td>
</tr>
<tr>
<td>SIMO</td>
<td>Single-Input-Multiple-Output</td>
</tr>
<tr>
<td>SISO</td>
<td>Single-Input-Single-Output</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal-to-Noise Ratio</td>
</tr>
<tr>
<td>SoC</td>
<td>System-on-Chip</td>
</tr>
<tr>
<td>STFT</td>
<td>Short-Time Fourier Transform</td>
</tr>
<tr>
<td>TDA</td>
<td>Tap-Delay Accumulate</td>
</tr>
<tr>
<td>TDL</td>
<td>Tapped Delay Line</td>
</tr>
<tr>
<td>TSMC</td>
<td>Taiwan Semiconductor Manufacturing Company</td>
</tr>
<tr>
<td>TWAC</td>
<td>T-Wave Alternans Challenge</td>
</tr>
<tr>
<td>VHDL</td>
<td>VHSIC Hardware Description Language</td>
</tr>
<tr>
<td>VLSI</td>
<td>Very-Large-Scale Integration</td>
</tr>
<tr>
<td>WBAN</td>
<td>Wireless Body Area Network</td>
</tr>
<tr>
<td>WT</td>
<td>Wavelet Transform</td>
</tr>
</tbody>
</table>
# Contents

**Abstract**  
**Acknowledgements**  
**List of Acronyms**  
**List of Figures**  
**List of Tables**  

## 1 Introduction

1.1 Wireless Body Area Network

1.1.1 WBAN General Architecture

1.1.2 State-of-the-art WBAN Systems with Local Processing

1.2 Objectives

1.3 Novel Contributions of the Work

1.4 Outline of the Thesis

1.5 Chapter Conclusion

## 2 Physiological Background and Wavelet Theory

2.1 Introduction to Physiological Signals

2.1.1 Electrocardiography (ECG)

2.1.2 Electroencephalography (EEG)

2.1.3 Electromyography (EMG)

2.2 Introduction to Wavelet Theory

2.2.1 Short-Time Fourier Transform

2.2.2 Wavelet Transform

2.3 Chapter Conclusions

## 3 Decimation Filter for Wearable ECG Monitoring Systems

3.1 Introduction

3.2 State-of-the-art Decimation Filters in Biomedical Applications

3.3 Proposed Decimation Filter Structure

3.3.1 The Slink Filter

3.3.2 Two-Path All-pass Based Half-Band IIR Filters

# List of Figures

1.1 The architecture of a wearable WBAN system, illustrating the typical three tier system

1.2 Main functional blocks of WBAN sensors; AFE, ΣΔ ADC, DSP, RF communications and Power Management

2.1 Conduction system of the heart and generation of an ECG signal by the temporal and spatial summation of APs [47]

2.2 Time domain features of an ECG signal: P, QRS, and T represent atrial depolarization, ventricular depolarization, and atrial and ventricular repolarization respectively [48]

2.3 Lateral view of the cerebral cortex with occipital, temporal, parietal and frontal lobes [52]

2.4 Example EEG recordings at (a) frontal, (b) temporal, (c) parietal and (d) occipital lobe, which were recorded while subject performed different motor/imagery tasks [55]

2.5 A schematic view of the activation of muscle fibres by the CNS impulse [52]

2.6 Example recording of EMG obtained from individual and combined finger movements of a healthy subject. Each five second interval (indicated with dashed lines) represents a different finger’s movement [65]

2.7 Time and frequency domain responses of a (a) Stationary and (b) Non-stationary signal

2.8 Example wavelet functions of different families

2.9 Time and frequency resolution of (a) Short-Time Fourier Transform and (b) Wavelet Transform [88]

2.10 DWT analysis and synthesis filter banks, for 3 level decomposition and reconstruction

3.1 Behavioural structure of the decimation filter, incorporating the 4th order Slink, two 2-path all-pass based HB IIR, and Slink compensation filters

3.2 The magnitude response of the fourth order Slink filter given in (3.1)

3.3 Two-path all-pass based HB IIR decimator structure, incorporating all-pass filters \( A_1(z) \) and \( A_2(z) \) in the top and bottom paths respectively

3.4 First order all-pass filter structures for (a) \( A_1(z) \) in the top branch with \( \alpha_1 = 0.125 \) and (b) \( A_2(z) \) in the bottom branch with \( \alpha_2 = 0.5625 \)

3.5 Magnitude response of the proposed Half-Band (HB) Infinite Impulse Response (IIR) filters presented in Figure 3.3.

3.6 Slink roll-off compensation filter structure with coefficient $\alpha_c = 0.03125$.

3.7 Full band magnitude response of the overall decimation filter, (a) at input rate (zoomed into the passband region) and (b) at the output rate (decimated) along with the Slink (blue) and Slink compensation (red) filters magnitude response.

3.8 The group delay of the two-path all-pass based HB IIR filter with the all-pass section coefficients $\alpha_1 = 0.125$ and $\alpha_2 = 0.5625$.

3.9 The two-path polyphase IIR filters normalized group delay response corrected using (a) a single section corrector and (b) a 4 section corrector along with the original filters (blue), and the phase compensators (red) group delay responses.

3.10 PSD of 10 seconds long (a) 12 lead recordings (from Lead I to Lead V6) of record twa55 in sinus rhythm obtained from TWAC Database and (b) 13 Lead II recordings from MIT-BIH Arrhythmia Database with various conduction abnormalities and beat morphologies. (AFIB: Atrial fibrillation, AFL: Atrial flutter, SBR: Sinus bradycardia, IVR: Idioventricular rhythm, SVTA: Supraventricular tachyarrhythmia, VFL: Ventricular flutter, VT: Ventricular tachycardia, BII: 2° heart block, PREX: Pre-excitation, B: Ventricular bigeminy, T: Ventricular trigeminy and P: Paced rhythm)

3.11 Power spectrum measurements within the signal bandwidth of 500 Hz for sampling rate of 64 kHz, (a) $\Sigma\Delta$ modulator output and (b) decimation chain output (without phase compensation (black) and with phase compensation (red)).

3.12 PSD of Lead II recording of record twa55 at sinus rhythm (blue) versus the group delay variation with (green) and without phase compensation (red) filters, $f_1 = 22.5 \text{ Hz}(\nu_1 = 0.0625)$ and $f_2 = 45 \text{ Hz}(\nu_2 = 0.125)$ indicated by the yellow lines at $f_s = 46.08 \text{ kHz}$.

3.13 Decimation chain output (black) versus the input (red) (a) without phase compensation and (b) with phase compensation.

3.14 Amplitude difference between the input and output of the decimation chain, without (original - red) and with compensation (corrected - black).

3.15 (a) Waveform Dissimilarity between the input and output of the overall decimation chain without (blue) and with group delay compensation (red). (b) Input/Output Distortion Ratios of the overall decimation chain without (blue) and with phase compensation (red).

3.16 Decimation chain output (black) versus the input (red) (a) without phase compensation and (b) with phase compensation (Ventricular Tachycardia).

3.17 Amplitude difference between the input and output of the decimation chain, without (red) and with compensation (black). (Ventricular Tachycardia)

3.18 (a) Waveform Dissimilarity between the input and output of the overall decimation chain without (blue) and with group delay compensation (red). (b) Input/Output Distortion Ratios of the overall decimation chain without (blue) and with phase compensation (red).

4.1 Parallel (a) Tapped Delay Line (TDL) and (b) Time Delay and Accumulate (TDA) filter architectures where the boxes highlight the multiplication blocks.

4.2 Constant multiplier structure for coefficients 7, 29, 39 (a) before and (b) after CSE technique.

4.3 Multiplier graph for 27 represented with (a) CSD and (b) method given in [127].

4.4 Adder graphs generated for coefficient set 1, 7, 16, 21 and 33 by (a) BHM and (b) RAG-n [127]

4.5 Xilinx 4-series (a) Configurable logic block and (b) simplified half slice [144].

4.6 (a) Basic structure with 2:1 multiplexer and 4-input LUT mapping for (b) addition, (c) subtraction and (d) addition/subtraction.

4.7 Example adder graph with $N = 8, k = 2$ and $n = 2$ [125].

4.8 Simplified half slice of Xilinx 7-series [146].

4.9 (a) Basic structure with 3:1/4:1 multiplexer and its 6-input LUT mapping for (b) addition, (c) subtraction and (d) addition/subtraction.

4.10 (a) Basic structure with two 2:1 muxes and (b) its mapping on a 5-input, 2-output LUT.

4.11 Estimated (a) MSE and (b) SER of the reconstructed output with various filter coefficient precision.

4.12 Frequency response of (a) fixed-point filters at each decomposition level, (b) Wavelet and Scaling filters with floating-point and 11-bit fixed-point coefficients (red), (c) Scaling and Wavelet function associated with db4, and (d) Pole-zero plane of the floating and 11-bit fixed-point coefficients of db4 scaling filter.

4.13 The ReMB designed for db4 wavelet filters.

4.14 Constant multiplier blocks designed for db4 filter coefficients. (a) Design2 and (b) Design3 [155].

4.15 Time-multiplexed TDL FIR filter implemented using (a) a general purpose multiplier and coefficient memory, and (b) the proposed ReMB block replacing multiplier and coefficient memory.

4.16 The controller for the lowpass analysis filter ($h_0(k)$).

4.17 One level analysis filter bank comprised of a lowpass ($h_0(k)$) and highpass ($h_1(k)$) time-multiplexed TDL filters with; (a) parallel multiplier and coefficient memory and (b) the proposed ReMB.

4.18 The controller for the analysis filter bank

5.1 Polyphase realization of two-channel IIR QMF bank.

5.2 9th order IIR lowpass (blue) and highpass (red) filters; Maximally flat with $K = 9$ and $N = 4$; (a) Magnitude response, (b) Pole-Zero Plane. Elliptic with $K = 1$ and $N = 0$; (c) Magnitude response, (d) Pole-Zero Plane. Intermediate with $K = 5$ and $N = 2$; (e) Magnitude response, (f) Pole-Zero Plane

5.3 Scaling and Wavelet functions for 9th order (a) Maximally flat, (b) Elliptic, and (c) Intermediate IIR wavelet filters.

5.4 One level IIR wavelet filter bank in polyphase structure with causal stable analysis and stable but anti-causal synthesis filters.

5.5 Implementation of non-causal time reversed allpass filter using Powell and Chau technique [169].

5.6 (a) One level hybrid IIR/FIR wavelet filter bank in polyphase structure with causal and stable IIR analysis and FIR synthesis filter banks.

5.7 $ilet3$; (a) Magnitude response and (b) Pole-Zero locations. $ilet5$; (c) Magnitude response and (d) Pole-Zero locations. (e) $ilet3$ Scaling and Wavelet functions, and (f) $ilet5$ Scaling and Wavelet functions.

5.8 Analysis filter bank responses for 5 level decomposition where D and A are the highpass and the lowpass branch responses. (a) $ilet3$, (b) $ilet5$ Wavelet Transform (WT), and (c) $db4$ WT

5.9 Truncated impulse response of $ilet3$, (a) Maximum error between the $H_0(z)$ and $H_0(z)$ which is the truncated IIR filter, and (b) Top figure; Magnitude response, Bottom figure; Magnitude error of the $H_0(z)$ (blue) and $H_0(z)$ (red) impulse response with $L_0 = 8$.

5.10 Truncated impulse response of $ilet5$, (a) Maximum error between the $H_0(z)$ and $H_0(z)$ which is the truncated IIR filter, and (b) Top figure; Magnitude response, Bottom figure; Magnitude error of the $H_0(z)$ (blue) and $H_0(z)$ (red) impulse response with $L_0 = 8$ and $L_1 = 16$.

5.11 (a) First level analysis filter bank (FB), (b) first level synthesis FB and (c) implementation of $A_0(z^{-1})$ with $L_0 = 8$ in floating point precision for $ilet3$.

5.12 (a) First level analysis FB, (b) first level synthesis FB, (c) implementation of $A_0(z^{-1})$ with $L_0 = 8$ and (d) implementation of $A_1(z^{-1})$ with $L_1 = 16$ in floating point precision for $ilet5$.

5.13 Timing diagram of $A_{0,1}(z^{-1})$ in Figures 5.11 and 5.12.

5.14 $ilet3$ FB performance for perfect reconstruction with $L_0 = 8$, (a) ECG record-232 input (red) vs reconstructed output (blue), (b) MAE (mV) for ECG data records, (c) EEG record-chb14 input (red) vs reconstructed output (blue), and (d) MAE (mV) for EEG data records.

5.15 $ilet5$ FB performance for perfect reconstruction with $L_0 = 8$ and $L_1 = 16$, (a) ECG record-232 input (red) vs reconstructed output (blue), (b) MAE (mV) for ECG data records, (c) EEG record-chb14 input (red) vs reconstructed output (blue), and (d) MAE (mV) for EEG data records.

5.16 *ilet3* wavelet analysis filters; (a) Magnitude responses, and (b) PZPs for floating-point ($\tilde{H}_0(z)$) and fixed-point ($\tilde{H}_0(z)$) coefficients, *ilet5* wavelet Analysis filters; (c) Magnitude responses, and (d) PZPs for floating point and fixed point coefficients.

5.17 Peak gain and quantization noise shaping for 1st order ND-TDL allpass structure [171].

5.18 (a) Gain $|P_t(z)|$ and (b) Output quantization noise power (QNPSD), for $A_0(z)$ of *ilet3* (blue), $A_0(z)$ of *ilet5* (red) and $A_1(z)$ of *ilet5* (black).

5.19 *ilet3* (a) One-level IIR synthesis filter bank architecture and (b) the timing diagram for controlling the operation of the synthesis filter bank.

5.20 *ilet5* One-level IIR synthesis filter bank architecture with the block processing method.

5.21 (a) Phase compensation error with $(L_0 - 1)^{th}$ order FIR filter for *ilet3* wavelet, (b) Magnitude of Linear Distortion Transfer Function ($D_L(z)$), (c) Magnitude of Aliasing Distortion Transfer Function ($D_A(z)$), and (d) the group delay of the analysis and synthesis filter banks.

5.22 *ilet3* wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter magnitude responses, (c) Analysis filter group delay, and (d) synthesis filter group delay.

5.23 Floating model of one-level hybrid IIR/FIR wavelet filter bank for *ilet3* wavelet.

5.24 (a) Phase compensation error with $(L_0 - 1)^{th}$ and $(L_1 - 1)^{th}$ order FIR filters for *ilet5* wavelet, (b) Magnitude of Linear Distortion Transfer Function ($D_L(z)$), (c) Magnitude of Aliasing Distortion Transfer Function ($D_A(z)$), and (d) the group delay of the analysis and synthesis filter banks.

5.25 *ilet5* wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter magnitude responses, (c) Analysis filter group delay, and (d) synthesis filter group delay.

5.26 Floating model of one-level hybrid IIR/FIR wavelet filter bank for *ilet5* wavelet.

5.27 Magnitude responses of; (a) *ilet3* synthesis filters and (b) *ilet5* synthesis filters with floating-point (blue-red) and fixed-point (black-green) coefficients.

5.28 Magnitude of; (a) linear, and (b) aliasing distortion transfer functions, and (c) the hybrid filter bank group delay, with floating- (blue) and fixed-point (red) coefficients for *ilet3* wavelet.

5.29 Magnitude of; (a) linear, and (b) aliasing distortion transfer functions, and (c) the hybrid filter bank group delay, with floating- (blue) and fixed-point (red) coefficients for *ilet5* wavelet.

5.30 Four seconds of (a) ECG data record-232 (top figure) and reconstruction error (bottom figure), and (b) EEG data record-chb14 (top figure) and reconstruction error (bottom figure).

5.31 (a) The multiplier free architecture of the Hybrid IIR/FIR wavelet filter bank for *ilet3*, and (b) the structure of the ReMB.

5.32 The structure of the controller designed for generating the ReMB control signals.

5.33 The multiplier free architecture of the one-level Hybrid IIR/FIR wavelet filter bank for ilet5

5.34 Structure of (a) ReMB0 designed for $R_0(z)$ and (b) ReMB1 designed for $R_1(z)$ of the ilet5 Hybrid IIR/FIR wavelet filter bank.

5.35 The controller designed for generating the ReMB control signals.

5.36 The controller designed for generating the ReMB control signals.

5.37 The block diagram of the DWT based denoising method.

5.38 A 5 second segment of the (a) clean record ‘105’, (b) the generated EMG noise (c) generated baseline wander and (d) the noisy record with an SNR of -8 dB

5.39 Average (a) SNR Improvement (dB), and (b) MSE, after wavelet denoising with ilet3, ilet5, db4, db6, db8, sym4 and coif4

B.1 (a) State diagram and (b) logic design, of the 3-bit up/down counter employed in the design of ilet3 IIR synthesis filter bank to generate the required addresses for the dual-port RAM.

B.2 (a) State diagram and (b) logic design, of the 4-bit up/down counter employed in the design of ilet5 IIR synthesis filter bank to generate the required addresses for the dual-port RAM.

C.1 Seven level analysis filter bank responses of (a) ilet3, (b) ilet5, (c) db4 and (d) db6.

C.1 Seven level analysis filter bank responses of (e) db8, (f) sym4, and (g) coif4.
# List of Tables

<table>
<thead>
<tr>
<th>Table</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>2.1</td>
<td>Normal values of a Lead II ECG features of a healthy subject in sinus rhythm at 60 BPM [49].</td>
</tr>
<tr>
<td>2.2</td>
<td>Typical Adult Human Scalp EEG Waves and Associated Frequency Ranges and Body Activities [54].</td>
</tr>
<tr>
<td>3.1</td>
<td>Variation in the group delay in normalized frequency bands of $\nu_{B1} = 0 - 0.0625$ and $\nu_{B2} = 0 - 0.125$.</td>
</tr>
<tr>
<td>3.2</td>
<td>Two-path All-pass based HB IIR Filter Characteristics Comparison with the State-of-the-art.</td>
</tr>
<tr>
<td>4.1</td>
<td>Lowpass and Highpass, Analysis and Synthesis Filter coefficients.</td>
</tr>
<tr>
<td>4.2</td>
<td>Fixed-point (11-bit) db4 wavelet filter coefficients, their adder costs and shift-add format used to design the proposed ReMB.</td>
</tr>
<tr>
<td>4.3</td>
<td>CSD encoded coefficients and common sub-expressions.</td>
</tr>
<tr>
<td>4.4</td>
<td>Select line (S0:S4) values for multiplexers given in Figures 4.13 and 4.15 (b) to generate the lowpass analysis filter coefficients.</td>
</tr>
<tr>
<td>4.5</td>
<td>Resource utilization of individual MCM blocks that are designed using the ReMB, CSE and DAG fusion methods as well as the Xilinx Multiplier LogiCORE™.</td>
</tr>
<tr>
<td>4.6</td>
<td>Resource utilization of the time-multiplexed TDL Filters with the proposed ReMB and the Xilinx Multiplier after Place and Route on Xilinx Kintex-7 device.</td>
</tr>
<tr>
<td>4.7</td>
<td>Resource Utilization and Power Consumption of the Multiplier Free db4 Filter Bank Architectures.</td>
</tr>
<tr>
<td>5.1</td>
<td>Average Error Measures for evaluating the implemented three level IIR/IIR ilet3 and ilet5 wavelet filter banks.</td>
</tr>
<tr>
<td>5.2</td>
<td>The average MSE, QNPSD, and SER metrics obtained for White Gaussian Noise, ECG and EEG data with ilet3 and ilet5 filter banks.</td>
</tr>
<tr>
<td>5.3</td>
<td>The average QNPSD obtained for approximation $(A_3, A_2, A_1)$ and detail $(D_1, D_2, D_3)$ coefficients at decomposition levels 3, 2, and 1 for White Gaussian Noise, ECG and EEG data with ilet3 and ilet5 filter banks.</td>
</tr>
<tr>
<td>5.4</td>
<td>Fixed-point (9-bit) ilet3 and ilet5 wavelet filter coefficients, their adder costs and shift-add format used to design the constant multiplications.</td>
</tr>
<tr>
<td>5.5</td>
<td>Resource Utilization and Power Consumption of the ilet3 and ilet5 Analysis IIR Filter Bank Architectures.</td>
</tr>
<tr>
<td>5.6</td>
<td>Resource Utilization and Power Consumption of the Multiplier Free ilet3 and ilet5 IIR/IIR Filter Bank Architectures.</td>
</tr>
<tr>
<td>5.7</td>
<td>Average Error Measures for evaluating the implemented three level hybrid IIR/FIR ilet3 and ilet5 wavelet filter banks.</td>
</tr>
<tr>
<td>5.8</td>
<td>Average Error Measures for Evaluating Three Level Hybrid IIR/FIR ilet3 and ilet5 Wavelet Filter Banks with Finite-Precision Filter Coefficients.</td>
</tr>
<tr>
<td>5.9</td>
<td>FIR Synthesis Filters’ Coefficients for ilet3 and ilet5 Wavelets.</td>
</tr>
<tr>
<td>5.10</td>
<td>Control signals for the ReMB designed for $R_0(z)$ of ilet3 Hybrid IIR/FIR wavelet filter bank.</td>
</tr>
<tr>
<td>5.11</td>
<td>Control signals for the ReMB designed for $R_0(z)$ of ilet5 Hybrid IIR/FIR wavelet filter bank.</td>
</tr>
<tr>
<td>5.12</td>
<td>Control signals for the ReMB designed for $R_1(z)$ of ilet5 Hybrid IIR/FIR wavelet filter bank.</td>
</tr>
<tr>
<td>5.13</td>
<td>Resource Utilization and Power Consumption of the Multiplier Free ilet3 and ilet5 IIR/FIR Filter Bank Architectures.</td>
</tr>
<tr>
<td>5.14</td>
<td>SNR improvement (dB) and MSE after wavelet denoising the four noisy ECG records with input SNR of -8 dB.</td>
</tr>
<tr>
<td>A.1</td>
<td>Truth tables used to design the db4 lowpass filter $h_0(k)$ ReMB controller.</td>
</tr>
<tr>
<td>A.2</td>
<td>Truth tables used to design the db4 highpass filter $h_1(k)$ ReMB controller.</td>
</tr>
<tr>
<td>B.1</td>
<td>Truth tables used to design the ilet3 $R_0(z)$ ReMB controller.</td>
</tr>
<tr>
<td>B.2</td>
<td>Truth tables used to design the ilet5 $R_0(z)$ ReMB controller.</td>
</tr>
<tr>
<td>B.3</td>
<td>Truth tables used to design the ilet5 $R_1(z)$ ReMB controller.</td>
</tr>
</tbody>
</table>

Chapter 1

Introduction

There is a growing demand for managing people with Long Term Conditions (LTCs) such as cardiovascular diseases, hypertension, diabetes, cancer, and epilepsy. Currently 15 million people in England suffer from one or more such chronic diseases [7]. In addition, more than 17% of the UK population is aged 65 and over, and this proportion is predicted to rise due to the ageing population demographics [8]. The unavoidable result of this population ageing trend is an increase in the number of chronically ill patients, which is projected to rise by 3 million within the next two decades [7]. Another inevitable result is the increase in the annual cost of health and social care services. According to the Department of Health, in 2012 the care and treatment of patients with LTCs accounted for 50% of general practitioner appointments, 64% of outpatient appointments and 70% of healthcare expenditure in England [9].

These facts emphasize the importance of long-term unobtrusive physiological monitoring for early-stage diagnosis of health issues, as well as for decreasing healthcare costs by delivering healthcare services at the patient’s home without affecting their daily routines [10]. They therefore act as significant catalysts for developing clinically effective and cost-effective healthcare solutions that use recent technological advancements. The rapid improvement in the semiconductor industry enables innovative technologies to be manufactured and integrated with increasing computational performance and diminishing size. Wearable systems, such as the Wireless Body Area Network (WBAN), can provide remote care of elderly and chronically ill people, which is also convenient for the healthcare provider. These systems can also deliver supervised but independent living and improved healthcare quality to their users, while reducing the cost of health services through fewer General Practitioner visits and hospital admissions and more efficient chronic disease management [11]. Typical applications of medical wearable systems include continuous LTC monitoring for chronically ill patients, remote postoperative rehabilitation, daily activity monitoring, fall and movement detection, location tracking for elderly people and assessment and/or enhancement of sporting technique [12].

The benefits of wearable systems also bring new challenges in the development and acceptance of these systems. Compared to non-ambulatory monitoring systems, wearable human assistive systems are in close contact with users for a longer time, and therefore they should not restrict the users’ lives. In addition, because ambulatory monitoring does not take place in a steady and controlled environment, wearable systems must operate with optimum performance in real time. While reliability, comfort, and usability are of great importance, security and acceptance are equally significant [13].

1.1 Wireless Body Area Network

A WBAN is a wireless networking technology that interconnects miniaturised and autonomous sensor nodes or routers in, on or around a human body, monitoring body function during normal activities for sporting, health and emergency applications [14]. During the last few decades the WBAN has become a main research focus in medical diagnostics and personal healthcare due to the growing demand for long-term real-time health monitoring and proactive healthcare [15, 16]. With advancing technology, the aim is to replace bulky wired devices with small, low-weight, low-cost, portable and wireless ones, in order to provide real-time feedback of useful medical information to medical servers, and mobility and comfort to the users. Smart wearable systems can measure different physiological parameters including bio-potential signals, skin and body temperature, heart rate, blood pressure, respiration and blood oxygen saturation (SpO₂) [17].

Wearable WBAN systems can be categorized into two groups according to where the signal processing takes place: remotely (off-site) or locally on the sensor node (on-site). The first group comprises the off-site systems, in which the physiological monitoring system acquires the raw data and either stores the data locally for off-line processing or transmits them through a wireless communication link to a remote site for real-time processing. Some examples of research prototypes include Lifeguard [18], LiveNet [19], and RTWPMS [20]. These prototypes are capable of measuring multiple body parameters such as blood pressure, SpO₂, heart rate and body temperature. They integrate medical sensors, a wearable device such as a Personal Digital Assistant (PDA) and a remote device such as a Tablet PC, with the communication links between the medical sensors and the wearable device implemented through wires. What they have in common is that they are all bulky wired systems which are not compatible with the present understanding of the term “wearable” and do not have signal processing capabilities.

The second group comprises the on-site systems, in which the medical sensor nodes process the acquired data in real time using their on-node processing abilities and either transmit them to a remote terminal or perform corresponding operations, such as detection and classification [21]. The challenge with on-site processing is the limited energy supply and computational capability, which limit the complexity of the algorithms deployed on the sensor node. This work focuses on the design and development of a low-power, low-complexity digital signal processor that can be employed in the medical sensor nodes of an on-site wearable physiological monitoring system.

1.1.1 WBAN General Architecture

A generic WBAN system is composed of three tiers, which are the medical sensor, personal terminal, and remote terminal tiers [13]. The medical sensing tier involves the medical sensors, which acquire physical and chemical quantities, apply real-time analog and digital signal processing and then transmit the processed signals to a personal terminal via wireless body communication networks. The second tier is usually a base station or a smart phone which relays information between the medical sensor node and the remote terminal. Finally, the remote terminal stores or manages the received data and can link with a medical server, an emergency server and/or informal caregivers for the required medical response. The architecture of a WBAN system with three tiers is illustrated in Figure 1.1.

![Figure 1.1: The architecture of a wearable WBAN system, illustrating the typical three tier system](image)

The medical sensing tier generally consists of battery powered medical sensor nodes including analog and digital components which are desired to have high acquisition and processing accuracy while being low-cost, small in size, and power efficient. A generic medical sensor consists of four main functional components as shown in Figure 1.2.

![Figure 1.2: Main functional blocks of WBAN sensors; AFE, \(\Sigma\Delta\) ADC, DSP, RF communications and Power Management](image)

The biopotential signals acquired by the electrodes are amplified and preconditioned by the Analog-Front-End (AFE) block and then digitized by the Analog-to-Digital Converter (ADC) for further processing. Digital signal processing is usually handled by a micro-controller and/or a custom designed Digital Signal Processor (DSP). The digitally processed signals are then transmitted to the second tier via the Radio Frequency (RF) communication block, which requires a wireless communication module on the sensor node using Wi-Fi, Bluetooth or ZigBee. Finally, the power management block handles the power management of all these components. Operating lifetime is one of the main obstacles in sensor design, and the highest power is consumed by the wireless communication block [23, 24]. Since the medical sensor has limited computing power and energy supply, it is necessary to handle some of the signal processing, such as noise reduction, feature detection and classification, on the sensor node in order to avoid transmitting unnecessary information (e.g. motion artifacts). This is a trade-off between the power consumption of the wireless communication and that of the signal processing.
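The trade-off between transmission and local processing can be made concrete with a back-of-the-envelope energy budget. The following Python sketch is illustrative only: the function names and the per-bit energy figures are hypothetical placeholders, not measurements from this work or from [23, 24]. It assumes that every acquired bit is processed locally and that only a fraction of the bits is then transmitted.

```python
def window_energy_nj(bits_acquired, kept_fraction, tx_nj_per_bit, proc_nj_per_bit):
    """Energy (nJ) spent on one acquisition window.

    Every acquired bit is processed locally; only `kept_fraction` of the
    bits (between 0.0 and 1.0) is transmitted afterwards.
    All per-bit energies are hypothetical placeholders.
    """
    processing = bits_acquired * proc_nj_per_bit
    transmission = bits_acquired * kept_fraction * tx_nj_per_bit
    return processing + transmission


def local_processing_saves_energy(kept_fraction, tx_nj_per_bit, proc_nj_per_bit):
    """True when processing costs less than the transmission energy it avoids."""
    return proc_nj_per_bit < (1.0 - kept_fraction) * tx_nj_per_bit


# Hypothetical example: 12000 bits per window, 100 nJ/bit to transmit,
# 10 nJ/bit to process, and 25 % of the data kept after local processing.
raw_streaming = window_energy_nj(12000, 1.0, 100, 0)
with_local_dsp = window_energy_nj(12000, 0.25, 100, 10)
```

Under these assumed figures local processing pays off, but the conclusion reverses when the processing removes only a small fraction of the data, which is why the complexity of the on-node DSP matters.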

1.1.2 State-of-the-art WBAN Systems with Local Processing

Over the last two decades, wearable physiological monitoring sensors with embedded digital signal processing capabilities have been the major research focus in the wearable health monitoring field, which has produced many research prototypes as well as commercially available products. With the developing semiconductor industry, wearable devices embed digital signal processing into the hardware, so that local processing takes place before any data transmission. The advantages of on-sensor signal processing are: (1) reduced system power consumption, (2) robust and autonomous results with minimized system latency, (3) increased quality of the recorded signals, and (4) a decreased amount of data to be transmitted and thus less off-line processing [25]. In the last decade, different Application Specific Integrated Circuit (ASIC) and System-on-Chip (SoC) designs have been proposed, each with different analog and digital processing capabilities.

Some designs are implemented for multi-signal processing and can be deployed in various applications. In [23] a digital signal processing platform for ambulatory monitoring systems was proposed. The platform provides power and performance reconfigurability, which allows it to be employed in different biopotential signal processing applications, and has four operation modes: (1) high performance, (2) low power, (3) data collection, and (4) sleep mode. In another study, a biosignal acquisition system is proposed that consists of three separate chips for the AFE, DSP and transceiver, fabricated in a TSMC 0.18 µm standard CMOS process [26]. The total power consumption is 1.5 mW on a supply voltage of 1.2 V, of which the digital back-end processor accounts for 460 µW.

On the other hand, several studies have presented signal-specific DSP designs, concentrating mainly on Electroencephalogram (EEG) and Electrocardiogram (ECG) applications. Verma et al. [27] presented a SoC for continuous real-time detection of seizure onset in epilepsy patients. Each EEG channel corresponds to a SoC comprising an AFE, an ADC and a DSP. The DSP performs local feature extraction, which is followed by wireless transmission of the extracted data for classification. Each SoC is reported to be implemented in a 0.18 µm CMOS process and to consume 120 µW on a supply voltage of 1 V. In another study, an 8-channel scalable EEG acquisition SoC which integrates an AFE and a personalized seizure onset detector and classifier is presented. It is implemented in 0.18 µm CMOS technology and shows an energy efficiency of 2 µJ/classification at 128 classifications/s. Another prototype device assembles a Micro-Electro-Mechanical Systems (MEMS) sensor for nasal airflow measurement, an Integrated Circuit (IC) for sleep apnea detection and autonomous scoring based on time-domain signal processing, and a wireless transmitter. The IC is fabricated in 0.5 µm CMOS technology and integrates an AFE, a breathing rhythm detection block and a time-to-digital converter; it dissipates 33 µW on a 5 V supply voltage. Page et al. demonstrated a Field-Programmable Gate Array (FPGA) implementation and an ASIC design in 65 nm CMOS of a personalized seizure detector and classifier. The ASIC dissipates between 37 nW and 77 nW when running at its nominal and maximum frequencies of 484 Hz and 1 GHz, respectively, with a 1 V supply voltage.

In more recent studies, neural signal processors have been demonstrated in which the Discrete Wavelet Transform (DWT) is employed for purposes such as feature extraction and data compression. One such design is a multiplier-free DWT processor for a 32-channel neural recording system, implemented in a 0.18 µm CMOS process. The processor performs data compression through a four-level DWT decomposition and is reported to consume 37 µW at a 6.4 MHz operating frequency. Finally, another 32-channel neural signal processor mapped on 0.18 µm CMOS technology is reported by Yang et al., with spike detection, feature extraction and spike classification capabilities. The processor employs Haar DWT based feature extraction and consumes 24 µW at a 160 kHz operating frequency.

In addition to EEG applications, the following are some examples of ECG processors from the open literature. In 2011, Liu et al. [33] proposed an ECG signal-processing ASIC in 0.18 µm CMOS technology that employs a wavelet transform algorithm for real-time artefact removal, QRS detection, and heart rate prediction and classification, with a total power consumption of 29 µW on a supply voltage of 1 V. In [34] a 3-lead wireless ECG SoC is demonstrated that integrates an AFE, an RF transceiver and an application-specific micro-controller that can be programmed to perform heart rate detection. The SoC is implemented in 0.13 µm CMOS technology and can operate in two modes, heart rate detection and raw data transmission, during which the power consumption is 17.4 µW and 74.8 µW respectively on a supply voltage of 0.7 V. In another study, Min et al. [35] presented an ECG detector chip fabricated in 0.35 µm CMOS technology, employing an ADC, to be deployed in implantable cardiac pacemakers. The detector is based on the wavelet transform and consumes 19 µW on a supply voltage of 3 V. In [36] an ECG processor is proposed that employs the wavelet transform and is capable of real-time ECG recording and QRS, P and T wave detection. It is implemented in 0.18 µm CMOS technology and consumes 457 nW at a 0.5 V supply. An event-driven ADC with a real-time QRS detector, in 0.13 µm CMOS technology, for ambulatory ECG monitoring applications is presented by Zhang et al. [37]. The detector employs two algorithms, (1) pulse triggered and (2) time-assisted pulse triggered QRS detection, and together with the ADC it consumes 220 nW at a 300 mV supply.

Kim et al. [38] described a mixed signal SoC which integrates an AFE and a custom designed DSP for continuous and real-time 3-channel ECG acquisition, heart beat detection using wavelet transforms and motion artefact removal, as well as electrode-tissue impedance measurement. The proposed SoC can operate in three modes, which are data acquisition, heart beat detection and accurate R peak detection. The digital back end can be programmed to perform two motion artefact removal algorithms (i.e. adaptive Least Mean Square (LMS) filtering and Principal Component Analysis (PCA)) prior to accurate R peak detection. It is implemented in 0.18 µm CMOS technology and its power consumption is between 32 µW and 82.4 µW at a supply voltage of 1.2 V, depending on the selected mode. Zou et al. [39] presented an ASIC fabricated in TSMC 65 nm CMOS technology for ECG acquisition, data compression and R-peak detection applications. The ASIC embeds data compression and R peak detection algorithms based on the wavelet transform. The chip is tested in two modes with two different supply voltages. The first mode is the recording mode, in which the approximation of the measured signals is stored in the memory bank and compressed before transmission. In this mode the ASIC consumes 49 µW and 312 µW with 0.7 V and 1.2 V supplies, respectively, at an operating frequency of 9 kHz. The second mode is the detection mode, where a detection algorithm is applied to the acquired signal and the output is compressed for wireless transmission. The second mode dissipates less power than the first: 33 µW and 233 µW for 0.7 V and 1.2 V supply voltages, respectively. In contrast, the transmitter on its own consumes 37.2 mW with a 2 V supply voltage. In [40] an ASIC in 65 nm CMOS technology is presented which implements adaptive ECG feature extraction and delineation algorithms. The total power consumption of the ASIC is reported as 614 µW. An improved version of this processor is presented in [41], which provides additional features such as statistical analysis based feature extraction and classification based on a naive Bayes classifier. A similar technology is used for implementation and the power consumption is reported to be 2.78 µW at a 10 kHz operating frequency with a supply voltage of 1 V. Furthermore, in a more recent study an efficient ECG processor is designed for QRS detection which employs a Haar wavelet based DWT with a multiplier-free structure. It is implemented in 0.18 µm CMOS technology and is reported to consume 410 nW with a supply voltage of 1 V [42].
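To illustrate why the Haar wavelet lends itself to multiplier-free hardware, the sketch below implements an integer Haar transform (the so-called S-transform) with additions, subtractions and one-bit shifts only. It is a generic textbook variant given here as an assumption-laden illustration, not the architecture reported in [42]; all function names are hypothetical.

```python
def haar_analysis(x):
    """One level of an integer Haar transform using only add, subtract and shift."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = a - b            # detail: difference of the sample pair
        s = b + (d >> 1)     # approximation: floor((a + b) / 2), no multiplier
        approx.append(s)
        detail.append(d)
    return approx, detail


def haar_synthesis(approx, detail):
    """Exact inverse of haar_analysis."""
    x = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        a = b + d
        x.extend((a, b))
    return x


def haar_dwt(x, levels):
    """Multi-level decomposition; returns the final approximation and the details per level."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    return approx, details


def haar_idwt(approx, details):
    """Reconstruct the signal from the output of haar_dwt."""
    x = list(approx)
    for d in reversed(details):
        x = haar_synthesis(x, d)
    return x
```

Because each step is undone exactly by its inverse, integer inputs are reconstructed without error. Wavelets with longer filters need non-trivial coefficient multiplications, which is the hardware cost the remainder of this thesis sets out to reduce.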

The aforementioned systems are all capable of achieving low power consumption; however, it is difficult to make a fair comparison among them due to the different technologies, operating frequencies and numbers of channels they use, as well as the modalities they employ. Although there are many other solutions besides the examples provided, there is still room for optimization in hardware and power consumption, which will enable the use of sophisticated techniques in mobile health monitoring systems. As reported in [42], DWT filter banks with higher filter orders than the Haar wavelet filters are avoided due to the stringent hardware and power constraints of portable medical devices. This motivates the investigation of different techniques for implementing low-complexity DWT filter banks with different wavelet functions.

1.2 Objectives

Wearable health monitoring systems have very stringent area and power requirements, as they are desired to be small in area and to operate for a long time. As mentioned above, the highest power is consumed by the wireless communication block [23, 24]. Since the medical sensor has limited computing power and energy supply, the aim over the last decade has been to incorporate as much local signal processing as possible in order to reduce the amount of data transmitted. Therefore, the aim of this research is to investigate alternative solutions to the state-of-the-art in the biomedical signal processing literature in order to reduce the hardware complexity of a biomedical DSP while improving its power consumption. Based on the literature review, the objectives for this research can be listed as follows:

- To investigate the applicability of infinite impulse response filters in the decimation chain of a biomedical signal acquisition system due to their hardware simplicity and minimized power consumption. There is a trade-off between hardware simplicity and the non-linear phase. Thus, the subsequent aim is to investigate and evaluate the non-linear phase effects of the all-pass based infinite impulse response filters.

- To analyse implementation cost of wavelet transform and investigate a more hardware and power efficient implementation alternative. Wavelet transform is a popular tool employed in biomedical applications for decomposing signals into different frequency bands where noise reduction and detection algorithms can be applied. Furthermore, reconfigurable multiplier blocks have been used in fixed coefficient filtering and transforms which significantly reduced hardware utilization for FPGA implementations. Thus, employing reconfigurable multiplier blocks in wavelet transform filters will contribute to power reduction of the transform in biomedical applications.

- Infinite impulse response filters are hardware efficient, thus it is aimed to investigate, design and implement an infinite impulse response filter based wavelet transform that can be employed in biomedical signal processing applications. Infinite impulse response filters are well known to achieve similar or better filtering performance with lower filter orders compared to finite impulse response filters. Low order filters lead to reduced hardware complexity and power consumption since the number of power hungry arithmetic operations is lower. Furthermore, the better frequency selectivity that can be achieved with simple structures provides an advantage, and it is therefore also aimed to investigate the denoising performance of this novel implementation of the wavelet families in the analysis of biopotential signals.

- The IIR wavelet analysis filter banks exhibit non-linear phase, which introduces unwanted distortions in the reconstructed data. A typical FIR based perfect reconstruction filter bank overcomes this problem by simply time-reversing the filter coefficients, which eliminates the non-linear phase of the analysis filter bank. However, due to the feedback path in IIR filter structures, the time-reversal of the filter coefficients is not a straightforward and simple operation. Therefore, the subsequent aim of this thesis is to overcome the phase non-linearity of the IIR wavelet filter banks by investigating different approaches.
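The time-reversal argument in the last objective can be checked numerically for the FIR case. The sketch below is a minimal illustration under stated assumptions: the 4-tap integer filter `h` is hypothetical and is not a wavelet filter from this thesis, and only a single analysis/synthesis branch is modelled, without downsampling. Cascading a filter with its time-reversed copy yields a symmetric overall impulse response, which corresponds to linear phase (a pure delay).

```python
def convolve(a, b):
    """Direct-form linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def is_symmetric(seq):
    """A symmetric impulse response implies linear phase."""
    return list(seq) == list(seq)[::-1]


h = [1, 3, 3, -1]           # hypothetical asymmetric (non-linear phase) FIR filter
h_reversed = h[::-1]        # time-reversed copy, as used on the synthesis side
cascade = convolve(h, h_reversed)
```

For an IIR filter the impulse response is infinitely long, so the reversed filter is anti-causal and cannot be obtained by simply reordering a finite coefficient list, which is the difficulty this objective refers to.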

1.3 Novel Contributions of the Work

The novel contributions of the research conducted and resulting publications are listed below.

- This work presents the novel use of a decimation filter chain incorporating two-path HB all-pass based IIR filters in ECG data acquisition systems, with high filtering performance, a completely multiplier free hardware structure and low power consumption. The proposed structure greatly reduces hardware complexity compared to the existing decimation filters used in this field, which in turn reduces power consumption.

- The non-linear phase effects of the described filters on ECG signals, which may introduce signal distortions leading to misdiagnosis, are investigated, which, to the best of the author’s knowledge, is a first in the open literature. This work concludes that the proposed design introduces minimal distortion to the signal of interest and would not affect critical diagnosis, and thus that it is an efficient approach to the decimation process in high resolution biomedical data conversion and acquisition applications.

- A novel Reconfigurable Multiplier Block (ReMB) structure targeting FPGA technologies is proposed for the 8-tap Daubechies filters that have been widely employed in biomedical signal processing applications. Unlike the state-of-the-art, this structure employs Look-up Tables (LUTs) with an increased number of inputs, which increases the reconfigurability of the system. The proposed structure greatly reduces hardware complexity compared to a general purpose multiplier, which in turn reduces power consumption. The method introduced can also be employed in the design of other wavelet functions.

- An IIR filter based Discrete Wavelet Transform for biomedical signal processing applications and the effect of the phase non-linearity of the analysis IIR Wavelet Filter Bank are investigated. In the open literature, IIR wavelets have not been used in real-time continuous data processing applications which require perfect reconstruction, due to the problems that arise from their non-linear phase. In this thesis, a novel solution of Hybrid IIR/FIR Wavelet Filter banks is proposed which enables the use of IIR wavelets in continuous biomedical signal processing applications while achieving minimal computational complexity, low power consumption and near-perfect reconstruction. As a result of this research;

- Two novel hybrid IIR/FIR wavelet filter banks are proposed and designed to be deployed in biomedical signal processing applications.

- The floating- and fixed-point structures of the proposed hybrid filter banks are modelled in which the quantization noise effects in the filter bank datapath and filter datapath itself are investigated.

- The proposed hybrid systems are implemented on an FPGA device using the novel method of designing ReMBs which further reduces the complexity of the proposed systems.

- The novel use of an IIR filter based Wavelet Transform for ECG signal denoising is proposed. The denoising performance of the proposed filter banks is compared with that of the conventional FIR based wavelet transform. The proposed designs provide better performance in terms of improved Signal-to-Noise Ratio (SNR) and Mean Square Error (MSE). IIR wavelets are shown to achieve better frequency selectivity than the conventional FIR wavelets, which provides a significant advantage in denoising applications.

- List of publications relevant to this work:
  - Y. Eminaga, A. Coskun, S. A. Moschos and I. Kale, “Low Complexity All-


- List of awards received from the attended conferences:
  - The Gold Leaf Certificate awarded for the best written and presented paper in the 11th Conference on PhD Research in Microelectronics and Electronics (PRIME), 2015.
  - The Best Presentation Certificate awarded in the Doctoral Conference organized by the Faculty of Science and Technology at the University of Westminster, 2017.


1.4 Outline of the Thesis

This thesis consists of six chapters; the remaining five are outlined below.

Chapter 2 is composed of two main parts. The first part introduces background information on the physiological signals that are most commonly recorded for diagnostic purposes inside and outside clinical environments. This is further divided into three sections providing detailed information about the physiological origin, morphological characteristics, recording techniques and applications of the ECG, EEG and Electromyogram (EMG) signals. The second part provides a review of wavelet theory and explains its main differences from, and advantages over, the Short-Time Fourier Transform (STFT). Here the two main types of wavelet transform, the Continuous Wavelet Transform (CWT) and the DWT, are introduced, and the DWT is described in terms of digital filter banks.

Chapter 3 presents the proposed decimation filter chain, which comprises a Slink filter, two two-path all-pass based HB IIR filters and a Slink roll-off compensator. A review of the state-of-the-art decimation filters used in biomedical applications is presented, followed by a detailed description of the decimation filter chain structure along with its magnitude characteristics. A brief description of phase linearity and group delay is provided. The phase characteristics of the two-path all-pass based HB IIR filter are analysed and the phase compensation method used is described. The chapter further describes the ECG databases used for simulation purposes and concludes with a discussion of the simulation results.

Chapter 4 presents the proposed ReMB structure for 8-tap Daubechies wavelet transform filters, which replaces the multiplication operation with a reconfigurable shift-add network using multiplexers optimized for FPGA implementation. The chapter provides a brief introduction to the different methods used for multiplier-block implementation and gives some examples from the open literature. The method used for implementing the proposed design is described, followed by the implementation results in terms of hardware utilization and estimated dynamic power consumption. The proposed design is compared with alternative designs obtained through state-of-the-art methods and with a general purpose multiplier. In addition, the comparisons are extended to include designs from the open literature implemented for the db4 filters in order to evaluate the efficiency of the proposed system. Chapter 4 ends with discussions and conclusions.
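As background to the shift-add networks mentioned above, the following sketch shows how multiplication by a fixed integer coefficient can be replaced by shifts, additions and subtractions using Canonical Signed Digit (CSD) encoding. It is a generic illustration with hypothetical function names and does not model the multiplexers or the reconfigurability of the proposed ReMB; the adder cost here is simply the number of non-zero CSD digits minus one.

```python
def to_csd(value):
    """CSD digits of an integer, least significant first, each in {-1, 0, 1}.

    No two adjacent digits are non-zero, which minimizes the number of
    add/subtract operations among signed-digit representations.
    """
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def shift_add_multiply(x, coeff):
    """Compute x * coeff using only shifts, additions and subtractions."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


def adder_cost(coeff):
    """Adders/subtractors needed for one coefficient in this simple scheme."""
    nonzero = sum(1 for d in to_csd(coeff) if d != 0)
    return max(nonzero - 1, 0)
```

For example, the coefficient 7 is encoded as 8 - 1, so `shift_add_multiply(x, 7)` needs one subtractor instead of the two adders implied by the binary form 4 + 2 + 1. Sharing such partial terms across several coefficients is what multiplier-block methods exploit.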

Chapter 5 provides an introduction on how to employ IIR filters for the wavelet transform in biomedical applications. A procedure for designing IIR wavelets is presented where detailed mathematical steps are provided along with design examples. Furthermore, the synthesis filter bank implementation problems due to the need for anti-causal filtering are addressed and two methods are presented to overcome them. This chapter provides detailed design and fixed-point implementation considerations regarding the analysis and synthesis filter banks of the IIR wavelets as well as the computational complexity and the estimated power dissipation of the proposed systems. In addition, the application of the designed wavelet filter banks in an ECG signal denoising scenario is presented and their performance is compared to that of the state-of-the-art wavelet filter banks. The chapter is concluded with the conclusions section.

Chapter 6 provides the summary and the evaluation of the study presented in this thesis. The conclusions drawn, possible extensions of this work and the future work are presented in detail.

1.5 Chapter Conclusion

Chapter 1 provided introductory knowledge on WBAN systems and the challenges faced by both users and engineers. The aims, contributions and publications along with the outline of the thesis were also summarised.
Chapter 2

Physiological Background and Wavelet Theory

2.1 Introduction to Physiological Signals

Physiological signal monitoring has been one of the most widely employed methods for patient observation and diagnosis. The monitoring systems are usually used in a clinical environment where admission of the patient is required for continuous real-time monitoring. Portable recorders, such as the Holter monitor, have been widely used for monitoring outside the hospital; however, they do not provide real-time data that can be used for taking immediate action in emergency situations. As stated in the previous sections, emerging wireless and low-power technologies can allow employment of these monitoring systems outside the clinical environment and offer a level of comfort to the users while supplying real-time diagnostic data to healthcare professionals, healthcare systems and caregivers.
This chapter provides background information on the most widely used physiological signals for diagnostic purposes that can be monitored inside and outside the clinical environment. These signals are the ECG, EEG and EMG waves that can be measured through the body surface. They are generated by ion exchange through the cell membrane, which results in a cycle of the cellular potential, known as the Action Potential (AP), in different types of cells [43].

2.1.1 Electrocardiography (ECG)

A human heart consists of four chambers which are right and left, atria and ventricles, responsible for the collection and transport of blood throughout the body. The right side of the heart collects the deoxygenated blood from the systemic veins and pumps it towards the lungs whereas the left side collects the oxygenated blood from the pulmonary veins and pumps it to the rest of the body [44].

Cardiac excitation involves the generation of electrical impulses, referred to as APs, by individual cells due to the electrical current flow across the cell membranes and their conduction to neighbouring cells [45]. The conduction system of the heart consists of three groups of specialized cells which are known as the Sinoatrial (SA) Node, the Atrioventricular (AV) Node and the bundle of His. Cells within the SA node, also known as pacemaker cells, are located in the right atrium, have the fastest rate of AP generation and drive the rest of the heart at this rate [46]. Thus, electrical activation of the heart starts within the SA node, which fires electrical impulses that propagate through the AV Node and the bundle of His down to the Purkinje fibres [46].

A recorded ECG signal represents the temporal and spatial summation of the action potentials generated by these specialised groups of cells through the conduction system during a cardiac cycle [47]. A simplified anatomy of the heart and the generated AP shapes and durations for specialised cardiac cells are illustrated in Figure 2.1. The figure shows that different parts of the cardiac conduction system give rise to APs with different shapes and durations, at different times and different locations. In this figure, the APs generated by each group of cells are colour coded and their contribution to an ECG waveform (bottom right corner) during a cardiac cycle can be clearly seen. For instance, the two pink APs generated by the ventricular muscles contribute towards the generation of the QRS complex and the T wave, which are important features of the ECG signal and will be explained in the section below.

Figure 2.1: Conduction system of the heart and generation of an ECG signal by the temporal and spatial summation of APs [47].
ECG Waveform Morphology

ECG signals reflect the cardiac activation of the heart measured between any two points on the body surface using electrodes. A standard clinical ECG signal is composed of waves at different frequency bands, each reflecting the electrical activation of different parts of the heart. These waves are known as the P waves, QRS complex (combination of Q, R and S waves), and T waves, representing the atrial depolarization, ventricular depolarization and repolarization, respectively. The segments and intervals between these waves such as ST segment as well as PQ/PR interval, QRS width, QT and RR interval also represent different cardiac events and carry diagnostic information. Figure 2.2 depicts a typical ECG waveform where each wave and interval is presented.

![ECG Waveform](image)

Figure 2.2: Time domain features of an ECG signal: P, QRS, and T represent atrial depolarization, ventricular depolarization, and atrial and ventricular repolarization respectively [48].

These waves have extremely low amplitudes ranging from 100 µV to 5 mV and a low diagnostic frequency bandwidth of 0.05 to 100 Hz [49]. The standard clinical features of the ECG waves for a healthy adult male in sinus rhythm are presented in Table 2.1 [49]; they depend on several factors such as age, gender, heart rate, respiration patterns and diseases [49].
Table 2.1: Normal values of Lead II ECG features of a healthy subject in sinus rhythm at 60 BPM [49].

<table>
<thead>
<tr>
<th>Feature</th>
<th>Normal Value</th>
<th>Normal Limit</th>
</tr>
</thead>
<tbody>
<tr>
<td>P width</td>
<td>110 ms</td>
<td>± 20 ms</td>
</tr>
<tr>
<td>PQ/PR interval</td>
<td>160 ms</td>
<td>± 40 ms</td>
</tr>
<tr>
<td>QRS width</td>
<td>100 ms</td>
<td>± 20 ms</td>
</tr>
<tr>
<td>QT interval</td>
<td>400 ms</td>
<td>± 40 ms</td>
</tr>
<tr>
<td>ST segment</td>
<td>70 ms</td>
<td>± 10 ms</td>
</tr>
<tr>
<td>P amplitude</td>
<td>0.15 mV</td>
<td>± 0.05 mV</td>
</tr>
<tr>
<td>QRS height</td>
<td>1.5 mV</td>
<td>± 0.5 mV</td>
</tr>
<tr>
<td>ST level</td>
<td>0 mV</td>
<td>± 0.1 mV</td>
</tr>
<tr>
<td>T amplitude</td>
<td>0.3 mV</td>
<td>± 0.2 mV</td>
</tr>
</tbody>
</table>

ECG Recording Techniques and Applications

Standard clinical ECG instrumentation consists of ten surface electrodes placed on the chest, limbs and left leg. This system records the 12-lead ECG, which is the most commonly used method for diagnostic purposes. In wearable WBAN systems usually 1-, 3- and 6-lead ECGs are recorded by employing two and three electrodes respectively, in order to minimize the system complexity and power dissipation [34]. The ECG is a fundamental component in patient monitoring and diagnosis; thus accurate ECG signal acquisition and its precise analysis are of great importance and have been the subject of numerous research works [33, 41, 50]. Diagnosis of cardiac health conditions mostly relies on the assessment of ECG data based on the interbeat timing and wave amplitudes. Different types of arrhythmias can be distinguished by morphological and beat-to-beat interval variations and/or missing beats. These morphological abnormalities can sometimes be fatal and often occur sporadically, which requires long-term and real-time monitoring [26, 33, 34, 38, 40]. The ECG is also employed in cardiac implants such as pacemakers, where the detection of abnormalities is used for triggering stimulations by the implanted device in order to regulate the functioning of the heart [35].
2.1.2 Electroencephalography (EEG)

The nervous system is responsible for the collection, transmission and processing of information from various body parts and comprises two subsystems. The first is the Central Nervous System (CNS), which comprises the brain and spinal cord. The cerebral cortex is the largest part of the brain, which is located at the outermost layer and consists of two hemispheres responsible for vital functions such as movement, perception, learning and speaking [51]. Each hemisphere is divided into four cortical lobes: (1) occipital, (2) temporal, (3) parietal and (4) frontal lobes. Figure 2.3 shows one cortical half with the four major lobes. The frontal lobe is responsible for the control of voluntary movements such as speaking and finger movement, whereas hearing, seeing and sensing are controlled by the temporal, occipital and parietal lobes, respectively [51]. The second subsystem is the Peripheral Nervous System (PNS), comprising the nerves responsible for transmitting information between the CNS and other organs in both directions [52].

![Lateral view of the cerebral cortex with occipital, temporal, parietal and frontal lobes](image.png)

Figure 2.3: Lateral view of the cerebral cortex with occipital, temporal, parietal and frontal lobes [52].

EEG is the detection and recording of the electrical activity of the brain along the scalp, produced by the firing of the cortical neurons which are the main functional cell type in the cerebral cortex [53]. Similar to the cardiac cells, a positive deflection in the membrane potential across the cortical neurons results in the generation of APs which propagate through the PNS, creating a communication link between the CNS and the PNS peripherals. The electrical activity of a single cortical neuron does not have sufficient strength to be recorded by surface electrodes, since it is attenuated by the thick layers of tissue (fluids, bones and skin) around the cerebral cortex [51]. Thus, a recorded EEG signal represents the summation of the synchronous activity of numerous cortical neurons.

The brain is the driving force of the nervous system and many brain disorders are diagnosed by critical evaluation of recorded EEG signals; therefore, accurate recording of the brain activity is a significant task.

**EEG Rhythms and Waveforms**

Typical adult human scalp EEG signals have amplitudes ranging from 10 to 300 µV and a frequency spectrum ranging from 0 to 100 Hz, both of which can vary between individuals as well as with the state of brain activity [54]. Figure 2.4 presents 60-second epochs of EEG signals recorded from the centre of the four cortical lobes.

![Example EEG recordings](image_url)

Figure 2.4: Example EEG recordings at (a) frontal, (b) temporal, (c) parietal and (d) occipital lobe, which were recorded while the subject performed different motor/imagery tasks [55].

The EEG signal is composed of five major waves with different frequency ranges that are associated with different brain activities. These waves are also referred to as rhythms due to their oscillatory and repetitive behaviour. As shown in Table 2.2, when the brain is in an active state, such as thinking and focusing, the cerebral cortex is actively processing information; therefore, the rhythms observed during these activities have high frequencies. At these high frequencies the activation of the cortical neurons is asynchronous, which results in low-amplitude rhythms (Beta, Gamma). In contrast, an inactive state of the brain, such as drowsiness or deep sleep, leads to low-frequency but high-amplitude rhythms (Delta, Theta, Alpha).

Table 2.2: Typical Adult Human Scalp EEG Waves and Associated Frequency Ranges and Body Activities

<table>
<thead>
<tr>
<th>EEG Wave</th>
<th>Primarily Associated Brain Activity</th>
<th>Frequency Band (Hz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delta (δ)</td>
<td>Deep sleep; infants</td>
<td>0.5 - 4</td>
</tr>
<tr>
<td>Theta (θ)</td>
<td>Sleeping in adults; deep meditation; children</td>
<td>4 - 8</td>
</tr>
<tr>
<td>Alpha (α)</td>
<td>Resting in adults, eyes closed</td>
<td>8 - 13</td>
</tr>
<tr>
<td>Beta (β)</td>
<td>Active thinking; active attention; focusing; problem solving</td>
<td>13 - 30</td>
</tr>
<tr>
<td>Gamma (γ)</td>
<td>Event-related synchronization (ERS)</td>
<td>30 - 45+</td>
</tr>
</tbody>
</table>
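
As an illustration of how the band edges in Table 2.2 can be used in software, the following minimal Python sketch maps a frequency to the name of the corresponding rhythm. The names `EEG_BANDS` and `eeg_band` are hypothetical, and the treatment of the band edges (each band includes its lower edge and excludes its upper edge, with Gamma left open-ended) is an assumption, since the table does not state which band owns a shared edge.

```python
# Band edges in Hz taken from Table 2.2. Gamma is listed as "30 - 45+",
# so its upper edge is treated as open-ended here (assumption).
EEG_BANDS = [
    ("delta", 0.5, 4.0),
    ("theta", 4.0, 8.0),
    ("alpha", 8.0, 13.0),
    ("beta", 13.0, 30.0),
    ("gamma", 30.0, float("inf")),
]


def eeg_band(frequency_hz):
    """Return the Table 2.2 rhythm name for a frequency, or None below 0.5 Hz."""
    for name, low, high in EEG_BANDS:
        # Half-open intervals [low, high) are an assumption of this sketch.
        if low <= frequency_hz < high:
            return name
    return None


for f in (2.0, 6.0, 10.0, 20.0, 40.0):
    print(f, "Hz ->", eeg_band(f))
```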

Recording Techniques and Applications

The EEG can be recorded either non-invasively or invasively. Non-invasive recording is achieved by placing surface electrodes on the scalp, whereas invasive recording is achieved by directly placing subdural electrodes into the brain tissue. The invasive EEG recording is also known as the Electrocorticogram (ECoG). ECoG signals are usually less contaminated with artefacts and have higher amplitudes; however, the method is invasive and not suitable for ambulatory applications. Non-ambulatory clinical EEG recording systems commonly use a standardized method for electrode placement, also known as the International 10/20 system, in which 64 or more electrodes are placed along the scalp [56]. On the other hand, ambulatory EEG recording systems usually employ fewer electrodes due to power and hardware limitations [27–30]. EEG is an important measure for seizure detection in epilepsy patients and for the diagnosis of sleep disorders [29, 57], as well as for the development of brain-computer interfaces [58, 59]. Detecting the onset of epileptic seizures can be a challenging task and necessitates long-term monitoring, traditionally in a controlled clinical environment where the patient has to be motionless for a prolonged period of time. This causes discomfort to the patient, increasing the demand for more convenient methods that can be easily used. These include off-line [60, 61] and on-line recording systems [27, 28, 30, 62], both offering extended recording time with satisfactory signal quality while increasing the level of patient comfort and reducing health expenses.

2.1.3 Electromyography (EMG)

EMG is a technique for recording the electrical activity of the skeletal muscles during voluntary and reflexive movements. On receiving an impulse from the CNS, skeletal muscle fibres are activated, generating muscle fibre APs that propagate along the length of the fibres. The spatial and temporal summation of APs from different muscle fibres results in an electrical potential that can be measured between two points on the body surface [63]. Figure 2.5 presents a schematic view of the activation of muscle fibres.
EMG Signal Characteristics

The amplitude and frequency of EMG signals range from 100 µV to 90 mV and from 25 Hz to several kilohertz, respectively, depending on the muscle group of interest.

An example EMG obtained from finger movements of a healthy subject generated by a two-channel surface EMG system is presented in Figure 2.6.

Figure 2.6: Example recording of EMG obtained from individual and combined finger movements of a healthy subject. Each five second interval (indicated with dashed lines) represents different finger’s movement.
EMG Recording Techniques and Applications

As with EEG, EMG can be recorded invasively and non-invasively using needle and surface electrodes, respectively. Invasive EMG recording is achieved by direct placement of needle electrodes in specific muscle fibres of interest and has the advantage of being less prone to artefacts. However, invasive measurements are impractical and discomforting for patients. On the other hand, EMG recording via surface electrodes is convenient for ambulatory wearable applications where no clinical admission is required and patients can function in their comfort zone. However, surface EMG signals have weaker signal amplitudes due to the attenuation by extra tissue on the signal path and thus, they are more likely to be contaminated by artefacts. This necessitates improved acquisition and processing methods for increased EMG signal interpretation accuracy both for diagnostic and research purposes.

Wireless EMG sensors are being deployed in a wide range of research areas along with the traditional diagnostic applications [65–67]. These include gesture recognition, prosthetic control and rehabilitation [68–73]. These research outputs offer multi-channel recording modalities for long-term monitoring, while Brunelli et al. [71] also employ on-node signal processing capabilities.

2.2 Introduction to Wavelet Theory

Over the past three decades, the WT has been a very popular tool for the time-frequency analysis of non-stationary signals whose spectral content varies in time. The WT provides an adaptive, multiresolution analysis, which makes it a very successful signal processing technique that has been used in various fields including biomedicine, geophysics, telecommunications, and image and video coding. In Fourier analysis, a time domain signal is expanded onto orthogonal basis functions, sine and cosine waves, in order to define its spectral content. Although sinusoids have perfectly compact support in the frequency domain and thus provide very good frequency localization, they are global (i.e. have infinite support) in time. This means that a frequency component observed in the frequency domain cannot be localized in the time domain. In some applications such as biomedical diagnostics, it is crucial to localize frequency variations over time; thus the Fourier Transform (FT) cannot be used in such applications to analyse non-stationary signals where both time and frequency information are critical. An alternative solution to this problem, known as the STFT, was introduced by Gabor [74] in 1946. In this method a signal is segmented into short intervals using a windowing technique and the FT is applied to each signal segment. Each segment of a non-stationary signal is treated as stationary, i.e. its statistics are assumed to remain unchanged for its duration. However, this method suffers from a lack of resolution due to the fixed window length used throughout the analysis. The spectral content of many real-world signals changes rapidly; therefore, finding an appropriate short-time window during which the signal is stationary can be a challenging task. The choice of a very narrow window results in poor frequency resolution (i.e. localization), whereas a wide window results in poor time resolution and may invalidate the stationarity assumption within the window. This drawback of the STFT was overcome by the introduction of the WT, which analyses signals with varying window lengths. The WT provides a multiresolution analysis by shrinking the window length at high frequencies and expanding it at low frequencies. Alfred Haar [75] was the first to present the concept of wavelets in 1909, by introducing the simplest wavelets, known as the *Haar* wavelets [76]. Following Haar, many applied physicists and mathematicians such as Morlet [77], Grossmann [77] and Meyer [78] studied wavelets as an alternative to Fourier-based analysis techniques for many years. However, it was not until the late 1980s that the connection between wavelets and signal processing was established [79, 80]. Since then, the WT has attracted great research interest, especially in the field of digital signal and image processing.

2.2.1 Short-Time Fourier Transform

The well-known FT is a mathematical tool that is used to transform a time domain signal into a frequency domain signal. The FT of any signal $x(t)$ is defined as:

$$X_{FT}(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt \quad (2.1)$$

where $\omega = 2\pi f$. As (2.1) states, the FT expands any signal onto orthogonal basis functions of sinusoids. However, the FT does not provide information on how the frequency content of the signal varies with time. In other words, one cannot determine whether the frequency information of a signal obtained through the FT is continuously present throughout the time of observation or only at certain intervals, which can be easily seen in the time-domain representation [81].

Now, let us consider a signal $x_s(t)$ given in (2.2), which is a multi-tone sinusoid with frequencies of 20 Hz, 100 Hz, 200 Hz and 400 Hz.

$$x_s(t) = \cos (2\pi 20t) + \cos (2\pi 100t) + \cos (2\pi 200t) + \cos (2\pi 400t) \quad (2.2)$$

The signal has constant frequency content over an indefinite time span; thus, \( x_s(t) \) is stationary. Another multi-tone signal \( x_{ns}(t) \), defined over a one-second observation window with the same frequency content as \( x_s(t) \), is given in (2.3).

\[
x_{ns}(t) = \begin{cases} 
\cos(2\pi 20t) & 0 < t \leq 0.25\,\text{s} \\
\cos(2\pi 100t) & 0.25 < t \leq 0.5\,\text{s} \\
\cos(2\pi 200t) & 0.5 < t \leq 0.75\,\text{s} \\
\cos(2\pi 400t) & 0.75 < t \leq 1\,\text{s} 
\end{cases} \quad (2.3)
\]

Although \( x_s(t) \) and \( x_{ns}(t) \) have the same spectral components, the frequency content of \( x_{ns}(t) \) varies over finite time spans within the observation window which results in a different time-domain signal and it effectively becomes non-stationary within that observation window. The time domain and frequency domain responses of these signals are given in Figure 2.7.

![Time and frequency domain responses](image)

Figure 2.7: Time and frequency domain responses of a (a) stationary and (b) non-stationary signal.
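
The comparison between the two signals can also be checked numerically. The following minimal Python sketch samples \( x_s(t) \) and \( x_{ns}(t) \) and evaluates the magnitude of their discrete Fourier transform at the four tone frequencies and at a control frequency of 300 Hz. The sampling rate of 1 kHz, the control frequency and the helper names are assumptions made for illustration, and the segment boundaries of (2.3) are approximated with half-open sample ranges. Both signals show spectral peaks at the same four frequencies (with different magnitudes) and nothing at the control frequency, so the spectrum alone does not reveal when each tone is present.

```python
import cmath
import math

FS = 1000                      # sampling rate in Hz (assumed, not given in the text)
N = 1000                       # one-second observation window
TONES = (20, 100, 200, 400)    # tone frequencies of (2.2) and (2.3) in Hz


def stationary(n):
    """Sample n of x_s(t) in (2.2): all four tones are present at all times."""
    t = n / FS
    return sum(math.cos(2.0 * math.pi * f * t) for f in TONES)


def non_stationary(n):
    """Sample n of x_ns(t) in (2.3): one tone per quarter of the window."""
    t = n / FS
    return math.cos(2.0 * math.pi * TONES[4 * n // N] * t)


def dft_magnitude(samples, k):
    """Magnitude of DFT bin k; with FS == N, bin k corresponds to k Hz."""
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / len(samples))
              for n, x in enumerate(samples))
    return abs(acc)


x_s = [stationary(n) for n in range(N)]
x_ns = [non_stationary(n) for n in range(N)]

for f in TONES + (300,):
    print(f, "Hz:", round(dft_magnitude(x_s, f), 3), round(dft_magnitude(x_ns, f), 3))
```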

Carrying out a spectral analysis using the FT with a one-second window provides little information about the time-domain characteristics of the signal \( x_{ns}(t) \). Thus, shorter window lengths are required, with the assumption that the signal will be stationary over that duration. In applications where the time information as well as the frequency information of the signal is needed, the STFT can be used to localize frequency variations over time. The STFT of any signal \( x(t) \) is defined as:

\[
X_{STFT}(\tau, \omega) = \int_{-\infty}^{\infty} x(t) h(t - \tau) e^{-j\omega t} dt \quad (2.4)
\]

where \( h(t) \) is the window function. The STFT uses a fixed-length window \( h(t) \) which is shifted along the time axis in order to analyse the spectral content of the signal in the windowed interval. Although this is a popular time-frequency method, it suffers from a trade-off between time and frequency resolution. Due to the fixed-length window, the whole signal has to be analysed with the same time and frequency resolution. Therefore, the accuracy of this method is limited by the size and shape of the window function employed. Biomedical signals are non-stationary and are composed of high-frequency spectral components which are closely spaced in time and long-lasting low-frequency components that are closely spaced in frequency. Such high-frequency components can only be analysed with very short windows that have poor frequency resolution, whereas low-frequency components require the better frequency resolution of a longer window. Thus, the STFT is not the optimal technique for this type of analysis.
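
The resolution trade-off can be illustrated with a small numerical experiment on the non-stationary signal of (2.3). The sketch below is a minimal discrete approximation of (2.4), assuming a rectangular window, a sampling rate of 1 kHz and hypothetical helper names; only magnitudes are computed. A 50 ms window has a coarse bin spacing of 20 Hz but identifies which tone is active in each quarter of a second, whereas a 500 ms window has a fine bin spacing of 2 Hz but, when it straddles \( t = 0.5 \) s, reports the 100 Hz and 200 Hz tones with equal magnitude and cannot tell which one occurs first.

```python
import cmath
import math

FS = 1000                       # sampling rate in Hz (assumed)
TONES = (20, 100, 200, 400)     # one tone per quarter second, as in (2.3)


def x_ns(n):
    """Sample n of the non-stationary signal of (2.3)."""
    tone = TONES[min(4 * n // FS, 3)]
    return math.cos(2.0 * math.pi * tone * n / FS)


def stft_frame(start, length):
    """Magnitudes of a discrete form of (2.4) for one window position.

    A rectangular window of `length` samples starting at sample `start` is
    assumed; bin k corresponds to k * FS / length Hz.
    """
    frame = [x_ns(start + i) for i in range(length)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / length)
                for i, x in enumerate(frame)))
        for k in range(length // 2 + 1)
    ]


def dominant_hz(start, length):
    """Frequency of the largest-magnitude bin for one window position."""
    mags = stft_frame(start, length)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * FS / length


# Short window (50 ms, 20 Hz bin spacing): each quarter second is resolved in time.
for start in (100, 300, 600, 800):
    print("short window at", start / FS, "s ->", dominant_hz(start, 50), "Hz")

# Long window (500 ms, 2 Hz bin spacing) straddling t = 0.5 s: both tones appear.
long_mags = stft_frame(250, 500)
print("long window: 100 Hz ->", round(long_mags[50], 3),
      ", 200 Hz ->", round(long_mags[100], 3))
```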

### 2.2.2 Wavelet Transform

Wavelet analysis utilizes wavelets to transform a signal into a time-frequency representation for easier interpretation and processing in comparison to the FT. A wavelet is a limited-duration, oscillatory, wave-like function that has zero mean and whose energy is concentrated in time. Wavelets have different shapes that change in accordance with their additional mathematical properties [82]. The WT calculates the correlation between a signal and the selected wavelet, which results in large-magnitude wavelet coefficients where the shapes of the signal and the wavelet resemble each other, and low-magnitude coefficients otherwise. The selected wavelet is generally referred to as the *mother wavelet*. There are many different mother wavelets such as Daubechies (db), Coiflets (coif) and Symmlet (sym), some examples of which are shown in Figure 2.8 [83].

![Wavelet Functions](image)

Figure 2.8: Wavelet functions of (a) db4, (b) coif4 and (c) sym4.

Selection of the mother wavelet depends on the application requirements, and some research has been reported on the selection of optimal wavelets for biomedical signal processing [84–86]. The WT can be divided into two main groups known as the CWT and the DWT, which are briefly described in the following sections.

**Continuous Wavelet Transform**

The CWT is achieved by convolving a signal with translated and dilated versions of the mother wavelet, generating wavelet coefficients. The temporally shifted and scaled versions of the mother wavelet form the wavelet basis functions $\psi_{a,\tau}(t)$, defined in (2.5):

$$\psi_{a,\tau}(t) = \frac{1}{\sqrt{a}} \psi \left( \frac{t - \tau}{a} \right) \quad (2.5)$$

where $\psi(t)$ is the mother wavelet (i.e. wavelet function), $a \in \mathbb{R}$, $a > 0$, is the scaling factor, $\tau$ is the amount of shift in time and $\frac{1}{\sqrt{a}}$ is the factor required for energy preservation, so that the wavelets have the same energy at every scale. For this transform $a$ and $\tau$ vary continuously over $\mathbb{R}$ and the CWT of a function $x(t)$ can be calculated as

$$W_\psi(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t) \psi\left(\frac{t - \tau}{a}\right) dt \quad (2.6)$$

where $W_\psi(a, \tau)$ are the generated wavelet coefficients. (2.6) can be realized as convolution, and it appears that $\psi(t)$ has a bandpass spectrum due to admissibility condition, which implies that the FT of $\psi(t)$ ($\Psi(\omega)$) vanishes at $\omega = 0$ [87]. Further mathematical derivations and proofs can be found in [82, 83]. Hence, CWT is computed by filtering the signal with dilated bandpass filters. The scaling factor $a$ is directly related to the spectrum analysing window size. A small scale factor compresses the wavelet in time, where rapidly changing details can be captured corresponding to high frequency components. Whereas a larger scale factor results in a stretched wavelet, capturing slowly changing coarser features of the signal and this corresponds to low frequency components. By varying the scale factor, the whole spectrum with varying window sizes can be analysed. In addition, changing $\tau$ moves the time localization centre of $\psi_{a,\tau}(t)$, where each $\psi_{a,\tau}(t)$ is centred around $\tau$. This way, one-dimensional time-domain data is transformed into two-dimensions. Therefore, CWT provides good spectral resolution for low frequency components and good temporal resolution for high frequency components. The time-frequency resolution of both STFT and CWT are presented in Figure 2.9 by using Heisenberg boxes [88].
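
The relationship between scale and frequency can be illustrated numerically. The sketch below approximates (2.6) by a Riemann sum for a single $(a, \tau)$ pair. The Mexican-hat function is used as the mother wavelet purely as an assumed example (it is not one of the wavelets named in this chapter), and the sampling rate, record length and helper names are likewise assumptions. Since the spectrum of this wavelet peaks at $a\omega = \sqrt{2}$, the scale matched to a frequency $f$ is taken as $a = \sqrt{2}/(2\pi f)$. The small scale responds strongly to the 50 Hz tone and weakly to the 5 Hz tone, and the opposite holds for the large scale.

```python
import math

FS = 1000.0                          # sampling rate in Hz (assumed)
T = [n / FS for n in range(2000)]    # two seconds of sample instants
TAU = 1.0                            # analyse around the middle of the record


def mexican_hat(u):
    """Mexican-hat mother wavelet (assumed choice; zero mean, localized in time)."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def cwt_coefficient(x, a, tau=TAU):
    """Riemann-sum approximation of (2.6) for a single (a, tau) pair."""
    total = sum(xn * mexican_hat((t - tau) / a) for xn, t in zip(x, T))
    return total / (math.sqrt(a) * FS)   # 1/sqrt(a) factor and dt = 1/FS


def tone(freq_hz):
    """Unit-amplitude cosine at freq_hz sampled at the instants in T."""
    return [math.cos(2.0 * math.pi * freq_hz * t) for t in T]


def scale_for(freq_hz):
    """Scale at which the Mexican-hat spectrum peaks at freq_hz."""
    return math.sqrt(2.0) / (2.0 * math.pi * freq_hz)


small_a, large_a = scale_for(50.0), scale_for(5.0)
for name, a in (("small scale", small_a), ("large scale", large_a)):
    fast = abs(cwt_coefficient(tone(50.0), a))
    slow = abs(cwt_coefficient(tone(5.0), a))
    print(f"{name}: |W| for 50 Hz tone = {fast:.4f}, for 5 Hz tone = {slow:.4f}")
```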
Discrete Wavelet Transform

Unlike the Continuous Wavelet Transform (CWT), which employs continuous scale and shift values, the Discrete Wavelet Transform (DWT) discretizes these values such that $a^{-1} = a_0^j$ and $\tau = k a_0^{-j} \tau_0$, where $j, k$ are integers and $a_0 > 1, \tau_0 > 0$ are fixed. Substituting these values into (2.5), the discretized wavelet function becomes:

$$\psi_{j,k}(n) = a_0^{j/2} \psi \left( a_0^j n - k \tau_0 \right) \quad (2.7)$$

The choice of $\psi$, $a_0$ and $\tau_0$ is a significant factor in practical implementations of the DWT. The selection of $a_0 = 2$ and $\tau_0 = 1$ results in a wavelet orthonormal basis which yields a perfect tiling of the time-frequency plane, as demonstrated in Figure 2.9 (b) [82]. A discrete wavelet function with the aforementioned values is defined by:

$$\psi_{j,k}(n) = 2^{j/2} \psi \left( 2^j n - k \right) \quad (2.8)$$

and the wavelet coefficients of a discrete signal $x(n)$ for $n = 0, 1, \ldots, M - 1$ can be written as:

$$W_{\psi}(j,k) = \frac{1}{\sqrt{M}} \sum_{n} x(n) 2^{j/2} \psi(2^j n - k) \quad (2.9)$$

where $\frac{1}{\sqrt{M}}$ is a normalizing term ensuring the recovery of the signal transformed into the wavelet domain. As mentioned previously, the wavelet function has a bandpass spectrum; thus the wavelet coefficients calculated up to scale $2^j$ do not cover the whole signal spectrum and it is necessary to add the low frequencies of the spectrum as well. This can be achieved via the introduction of the scaling function $\phi(n)$, which has a similar definition to the wavelet function but with different scaling parameters, as shown in (2.10). The scaling coefficients in (2.11) are obtained by replacing $\psi_{j,k}(n)$ with $\phi_{j_0,k}(n)$ in (2.9).

$$\phi_{j_0,k}(n) = 2^{j_0/2} \phi(2^{j_0} n - k) \quad (2.10)$$

$$W_{\phi}(j_0, k) = \frac{1}{\sqrt{M}} \sum_n x(n) 2^{j_0/2} \phi(2^{j_0} n - k) \quad (2.11)$$

where $j \geq j_0$. In practical applications $j_0$ is selected as 0 and the length of the input data is selected to be a power of 2 such that $M = 2^J$, where $J$ is an integer. The computation of $W_{\psi}(j, k)$ and $W_{\phi}(j_0, k)$ is referred to as the Forward DWT. The original signal $x(n)$ can be recovered from the computed wavelet and scaling coefficients using (2.12), and this operation is referred to as the Inverse DWT.

$$x(n) = \frac{1}{\sqrt{M}} \left( \sum_k W_{\phi}(j_0, k) \phi_{j_0,k}(n) + \sum_{j=j_0}^{J-1} \sum_k W_{\psi}(j, k) \psi_{j,k}(n) \right) \quad (2.12)$$

The wavelet function can be expressed as a weighted sum of scaled and shifted scaling functions:

$$\psi(n) = \sum_p h_\psi(p) \sqrt{2} \phi(2n - p) \quad (2.13)$$

where $p$ is an integer dummy variable. If $\psi(n)$ is scaled by $2^j$ and shifted by $k$, the new shifted and scaled wavelet function becomes:

$$\psi(2^j n - k) = \sum_m h_\psi(m - 2k) \sqrt{2} \phi(2^{j+1} n - m) \quad (2.14)$$

where $m = 2k + p$. Therefore, by substituting (2.14) into (2.9), the wavelet coefficients can be expressed in terms of the scaled and shifted versions of the scaling functions, as given by (2.15).

$$W_\psi(j, k) = \sum_m h_\psi(m - 2k) W_\phi(j + 1, m) \quad (2.15)$$

where $W_\phi(j + 1, m) = \frac{1}{\sqrt{M}} \sum_n x(n) 2^{(j+1)/2} \phi(2^{j+1} n - m)$. Similarly, the scaling coefficients can be obtained using (2.16).

$$W_\phi(j, k) = \sum_m h_\phi(m - 2k) W_\phi(j + 1, m) \quad (2.16)$$

Observing (2.15) and (2.16), it can be seen that these are convolution operations with the functions \( h_\phi(m) \) and \( h_\psi(m) \) which are simply the impulse responses of a set of analysis filters. Therefore, the calculation of both wavelet and scaling coefficients can be realized as octave band filtering also known as subband coding which decomposes a signal into octave frequency bands by consecutive stages of filtering and subsampling.
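
The recursions in (2.15) and (2.16) can be sketched in a few lines of Python. The Haar filters are used here only as the simplest assumed example of \( h_\phi \) and \( h_\psi \), the input values are made up, and the function name `analysis_step` is hypothetical. With two-tap filters no boundary handling is needed; longer filters would require an extension of the input at the edges, which is not covered by this sketch.

```python
import math

S = 1.0 / math.sqrt(2.0)
H_PHI = [S, S]     # Haar scaling (lowpass) filter, assumed for illustration
H_PSI = [S, -S]    # Haar wavelet (highpass) filter, assumed for illustration


def analysis_step(w_phi_fine, h):
    """Evaluate sum_m h(m - 2k) * W_phi(j + 1, m) for every k, as in (2.15)/(2.16)."""
    half = len(w_phi_fine) // 2
    return [
        sum(h[p] * w_phi_fine[2 * k + p] for p in range(len(h)))   # p = m - 2k
        for k in range(half)
    ]


w_phi_fine = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # W_phi(j + 1, m), made-up values
w_phi = analysis_step(w_phi_fine, H_PHI)   # scaling coefficients W_phi(j, k)
w_psi = analysis_step(w_phi_fine, H_PSI)   # wavelet coefficients W_psi(j, k)
print(w_phi)
print(w_psi)
```

Each output sequence has half the length of the input, which reflects the subsampling by two built into the index \( m - 2k \), and for these orthonormal filters the total energy of the two outputs equals that of the input.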

**Implementation of Discrete Wavelet Transform**

The DWT can be implemented by two-channel quadrature mirror filter banks with lowpass filter \( h_0(n) \) (i.e. \( h_\phi(-n) \)) and highpass filter \( h_1(n) \) (i.e. \( h_\psi(-n) \)). The output of each filter is downsampled by 2, where the outputs of the lowpass and highpass branches are known as the approximation coefficients (i.e. scaling coefficients), covering the lower half of the input signal band, and the detail coefficients (i.e. wavelet coefficients), covering the upper half of the input signal band, respectively. For the following stages, the approximation coefficients obtained from the previous stage are further decomposed with the same lowpass and highpass filters. Decomposition, also known as analysis, is an iterative process and provides coarser resolution in time and finer resolution in frequency at each stage. Approximation \((cA_j(n))\) and detail \((cD_j(n))\) coefficients can be calculated as,

\[
cA_{j+1}(n) = \sum_{k=-\infty}^{\infty} cA_j(k) h_0(2n - k) \tag{2.17}
\]

and

\[
cD_{j+1}(n) = \sum_{k=-\infty}^{\infty} cA_j(k) h_1(2n - k). \tag{2.18}
\]

In order to reconstruct the decomposed signal, the procedure applied in the analysis stage is repeated in the reverse direction by upsampling the detail and approximation coefficients, followed by a filtering operation using another set of lowpass \((g_0(n))\) and highpass \((g_1(n))\) filters, where the outputs of the lowpass and highpass branches are added. This procedure, also known as synthesis, is repeated until the original signal length is recovered. The complete analysis and synthesis filter banks for DWT calculation are shown in Figure 2.10.

![Diagram](image)

(a) Analysis Filter Bank  
(b) Synthesis Filter Bank

Figure 2.10: DWT analysis and synthesis filter banks, for 3 level decomposition and reconstruction.
2.3 Chapter Conclusions

This chapter provided physiological information on the most widely used biopotential signals in the field of diagnostics and monitoring. The physiological origins of ECG, EEG and EMG signals, along with their morphological characteristics and different measuring techniques, are presented. Feasible and accurate measurements of these signals are of great importance since they are used in a range of applications, examples of which are provided for each signal. In addition, the second part of the chapter provides a brief introduction to the wavelet transform and its differences compared to the well-known STFT. WT is a powerful tool in signal processing applications which has the advantage of variable length windows for analysing different frequency components. Thus, it is a superior transform for non-stationary signal applications where signal properties vary over time. WT can be divided into two main types, which are CWT and DWT. The DWT can be realized as a tree-form filter bank which provides a practical implementation of the wavelet transform. Chapters 4 and 5 provide further information on filter bank implementations and their properties.
Chapter 3

Decimation Filter for Wearable ECG Monitoring Systems

3.1 Introduction

In Chapter 1, the main functional blocks of a wearable WBAN sensor are presented. Recalling Figure 1.2, the second block is an ADC which is required to have sufficient resolution and minimal quantization noise for accurate digitization. Due to the low amplitude characteristics of the biopotential signals described in Chapter 2, the ADC to be deployed needs to have 10 to 16 bits of resolution. Thus, the Sigma-Delta (ΣΔ) oversampled ADCs are well suited for the digitization process, since they provide high resolution and dynamic range for low-bandwidth biomedical signals with simple hardware architectures [89]. The ΣΔ ADCs incorporate two sections which are the modulator and the digital filter. The modulator reduces the quantization noise in the signal band by moving it to higher frequencies by means of oversampling and noise shaping. Digital processing of the raw high-rate bit stream from the modulator is computationally complex, resulting in high power consumption and expensive digital circuitry. Therefore, a decimation filter is used to reduce the sampling rate while preserving the resolution by filtering the high frequency noise and downsampling.

This chapter presents a decimation filter chain for ECG signal acquisition which provides a highly efficient filtering performance by introducing minimal signal distortion. The proposed design employs cascaded Slink filter (generally spuriously referred to as a Cascaded Integrator-Comb (CIC) filter [90,91]), two path all-pass based HB IIR filters and a Slink roll-off compensator [92–98]. IIR filters are renowned for their computational efficiency however at the expense of having a non-linear phase response which may result in waveform distortions in the time-domain. The work reported here also studies the non-linear phase effects of these filters on ECG data. For further investigation phase compensation filters are implemented using low-complexity cascaded all-pass filter sections and results from the original and phase corrected filters are comparatively studied. To the best knowledge of the authors this is a first in the biomedical signal processing literature. The rest of this chapter provides a brief summary of the state-of-the-art decimation filters used in biomedical applications. The detailed structure of the proposed decimation filter along with the magnitude, phase and group delay characteristics are provided. The following section further provides information on phase compensation filter design and introduces the databases used for simulation, test and evaluation purposes. The chapter concludes with the results and discussion sections.
3.2 State-of-the-art Decimation Filters in Biomedical Applications

In the open literature, several decimation filters are proposed for biosignal acquisition systems, some of which employ multi-stage Finite Impulse Response (FIR) decimation filters with very high filter orders, therefore necessitating extra processing in terms of arithmetic operations [1–4,26]. In [5,99] single stage FIR decimation filters are used, in which the filter coefficients are optimized in order to match the analog loop filter response of the modulator. Single stage decimation with a high decimation rate requires a sharper transition, higher stopband attenuation and thus a higher filter order. Canonic Signed Digit (CSD) coefficients are another alternative for reducing the complexity of the decimation filters. This representation uses a series of subtractions and additions in order to accomplish the multiplication operation. Although the number of multiplier units is decreased by this method, the need for high order digital filters in the decimation stage is not avoided [100,101]. The aforementioned filters are all designed by using high order FIR filters which inevitably increase the power consumption and hardware complexity. In [102–106], single stage Slink decimators are employed. However, using a single Slink filter with a high decimation ratio comes at a high cost, as it does not easily provide the required stopband attenuation for suppressing the high frequency noise, and exhibits a high roll-off in the passband region which degrades the in-band signal features. Furthermore, in [101,107] three stage decimation chains each employing a Slink, an FIR and an IIR filter are proposed. However, due to the non-linear phase of the IIR filter, an additional equalizer is applied following the decimation chain. Thus, even though the overall filter order is reduced compared to their FIR counterparts, the hardware complexity and power consumption are increased due to the equalizer.

3.3 Proposed Decimation Filter Structure

The proposed decimator (filter followed by downsampling) is designed to achieve a decimation ratio of 128 in order to demodulate the 1-bit output stream from a third-order ΣΔ modulator. It is composed of four cascaded multiplier free stages, which are a fourth-order Slink filter, two fifth-order two-path all-pass based HB IIR filters and a first order Slink compensation filter. The overall decimator is designed to achieve 0.04dB passband ripples and −74dB stopband attenuation. The behavioural structure of the cascaded decimation filter incorporating the decimation ratios at each stage is shown in Figure 3.1.

![Diagram](image.png)

Figure 3.1: Behavioural structure of the decimation filter, incorporating the 4th order Slink, two 2-path all-pass based HB IIR, and Slink compensation filters

3.3.1 The Slink Filter

A fourth order Slink filter with a decimation ratio of 32 is the first stage of the proposed design that has linear phase with a multiplier free structure and a z-domain transfer function given in (3.1) [90].
The gain of the filter is required to roll off at a faster rate than the $\Sigma\Delta$ modulation noise rises, and therefore the order of the filter has to be one higher than the order of the $\Sigma\Delta$ modulator. A filter with a higher order causes a higher roll-off in the band of interest and a lower order will not provide the required stopband characteristics, resulting in aliasing in the signal band [108]. The Slink filter incorporates two sections with the opportunity of shifting the downsampler in between these two sections. The two sections are made up of four cascaded accumulators and four cascaded differencers. The shifted downsampler enables the differencers to operate at a rate 32 times slower than the input, reducing the power consumption, which makes the filter well suited for applications requiring high decimation factors, low circuit complexity and low power consumption [91]. The magnitude response of the Slink filter is presented in Figure 3.2.

![Figure 3.2: The magnitude response of the fourth order Slink filter given in (3.1).](image)

The Slink filter is behaviourally equivalent to four cascaded 32-point moving averagers followed by a 32:1 sample rate decimation. Hence, as it can be seen, the magnitude response of the Slink filter resembles a $\text{sinc}(x)$ function and has zeros at multiples of \( \nu = \frac{1}{32} \).

$$H_{\text{slink}}(z) = \frac{1}{32^4} \left( \frac{1 - z^{-32}}{1 - z^{-1}} \right)^4 \quad (3.1)$$
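The equivalence between the accumulator/differencer form and the cascaded moving-average form of (3.1) can be checked numerically. The sketch below is a behavioural illustration only: the function names, the random 1-bit test stream and the use of 64-bit integers (rather than a fixed word length with wrap-around arithmetic) are assumptions and do not represent the fixed-point model used in this work.

```python
import numpy as np

def slink_decimate(x, R=32, N=4):
    """N-th order Slink decimator: N accumulators, R:1 downsampler, N differencers."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(N):
        y = np.cumsum(y)               # accumulator 1/(1 - z^-1) at the input rate
    y = y[::R]                         # downsampler placed between the two sections
    for _ in range(N):
        y = np.diff(y, prepend=0)      # differencer (1 - z^-1) at the output rate
    return y / float(R) ** N           # gain normalisation 1/R^N

def moving_average_reference(x, R=32, N=4):
    """Direct form of (3.1): N cascaded R-point moving sums, then R:1 decimation."""
    h = np.ones(1, dtype=np.int64)
    for _ in range(N):
        h = np.convolve(h, np.ones(R, dtype=np.int64))
    full = np.convolve(np.asarray(x, dtype=np.int64), h)[:len(x)]
    return full[::R] / float(R) ** N

rng = np.random.default_rng(0)
bits = rng.choice(np.array([-1, 1]), size=32 * 64)   # stand-in for a 1-bit stream
out_fast = slink_decimate(bits)
out_ref = moving_average_reference(bits)
dc_out = slink_decimate(np.ones(32 * 64, dtype=np.int64))
```

Both routines return the same 64 output samples for the 2048 input samples, while the differencers in `slink_decimate` only run once per 32 input samples.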

### 3.3.2 Two-Path All-pass Based Half-Band IIR Filters

In order to provide computationally efficient second and third stages of the decimation chain shown in Figure 3.1, very high fidelity minimum phase two-path all-pass based HB IIR filters are designed where at each stage the input signal is decimated by two. The filters described here are formed by two-parallel paths composed of second order all-pass filters with multiplier free structures. The delayer in the bottom path of the recursive polyphase structure creates an increasing phase difference between the two paths. This difference reaches 90° at \( \nu_c \) and 180° at \( \nu_s \) which define the normalized cut-off and the stopband frequencies respectively. Since \( A_i(z) \), given in (3.2), is rational and sparse, the downsamplers can be shifted before the all-pass sections by using the Noble identity [109] in order to avoid unnecessary operations as shown in Figure 3.3.

![Figure 3.3: Two-path all-pass based HB IIR decimator structure, incorporating all-pass filters \( A_1(z) \) and \( A_2(z) \) in the top and bottom paths respectively.](image-url)

The coefficients of the all-pass sections in the second and the third decimation stages are designed to be powers of two or sums of two powers of two (\( \alpha_1 = 0.125 = 2^{-3} \) and \( \alpha_2 = 0.5625 = 2^{-1} + 2^{-4} \)) [92]. These coefficients eliminate the need for multipliers by replacing them with shift and add operations, as shown in Figure 3.4 (a) and (b), where " \( \gg \) " represents a binary shift operation along with a number indicating the number of shifts. The z-domain transfer function for both all-pass sections is [92, 110],

\[
A_i(z) = \frac{\alpha_i + z^{-2}}{1 + \alpha_i z^{-2}} \quad (3.2)
\]
where $i = 1, 2$ and $\alpha_i$ represents the coefficient of the corresponding all-pass filter.
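The multiplier free scaling can be illustrated with integer shifts. This is a sketch under the assumption that $\alpha_2$ is decomposed as $2^{-1} + 2^{-4}$; the exact shift arrangement of Figure 3.4 may differ, and the function names and sample value are hypothetical.

```python
def times_alpha1(x: int) -> int:
    """x * 0.125 as a single right shift (0.125 = 2**-3)."""
    return x >> 3

def times_alpha2(x: int) -> int:
    """x * 0.5625 as two shifts and one add (0.5625 = 2**-1 + 2**-4)."""
    return (x >> 1) + (x >> 4)

sample = 1600                       # hypothetical fixed-point sample value
scaled1 = times_alpha1(sample)      # 200
scaled2 = times_alpha2(sample)      # 900
```

For inputs that are not multiples of 16 the shifts truncate, so the result can differ from the exact product by a few least significant bits.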

![Figure 3.4: First order all-pass filter structures for (a) $A_1(z)$ in the top branch with $\alpha_1 = 0.125$ and (b) $A_2(z)$ in the bottom branch with $\alpha_2 = 0.5625$.](image)

Each of the proposed HB IIR filters achieves $0.47 \mu dB$ passband ripples in the region of signal activity and $-70 dB$ stopband attenuation with minimal hardware complexity. The magnitude response of these filters is presented in Figure 3.5.
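The magnitude response of the two-path structure can be evaluated directly from (3.2). The sketch below assumes the usual two-path form $H(z) = \frac{1}{2}\left[A_1(z) + z^{-1}A_2(z)\right]$ before the downsampler is moved in front of the all-pass sections; the function names and the frequency grid are illustrative choices.

```python
import numpy as np

def allpass(alpha, z):
    """A_i(z) = (alpha + z^-2) / (1 + alpha z^-2), as in (3.2)."""
    zi2 = z ** -2
    return (alpha + zi2) / (1.0 + alpha * zi2)

def halfband_response(nu, a1=0.125, a2=0.5625):
    """Two-path half-band response (assumed form): 0.5 * [A1(z) + z^-1 A2(z)]."""
    z = np.exp(2j * np.pi * nu)
    return 0.5 * (allpass(a1, z) + z ** -1 * allpass(a2, z))

nu = np.linspace(0.0, 0.5, 1001)          # normalized frequency, 0 to Nyquist
H = halfband_response(nu)
mag_db = 20.0 * np.log10(np.maximum(np.abs(H), 1e-12))

# Power complementarity of the all-pass sum: |H(nu)|^2 + |H(0.5 - nu)|^2 = 1.
power_sum = np.abs(H) ** 2 + np.abs(H[::-1]) ** 2
```

Because the structure is power complementary, the passband ripple follows from the stopband attenuation: a stopband gain of $-70$ dB corresponds to a passband loss of roughly $0.4\ \mu$dB, which is of the same order as the ripple quoted above.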

![Figure 3.5: Magnitude response of the proposed HB IIR filters presented in Figure 3.3](image)
3.3.3 The Slink Roll-off Compensation Filter

The Slink compensator is the last stage in the proposed design, exhibiting a passband response that is the inverse of the Slink filter in order to compensate for the amplitude roll-off in the band of interest (dc to half Nyquist) caused by the Slink filter. The compensator has a structure similar to the all-pass sections given in Figure 3.4, although only one delayer is required, as shown in Figure 3.6 [90, 111]. This filter also has a multiplier free structure, since its only coefficient is designed as a power of two and can be implemented as a shift operation.

Figure 3.6: Slink roll-off compensation filter structure with coefficient $\alpha_c = 0.03125$.

3.3.4 Decimation Chain Magnitude Response

The decimation filters introduced in the previous sections were implemented using MATLAB in order to validate the overall system performance. Figure 3.7 (a) and (b) present the fullband magnitude response of the decimation chain at the oversampled and output data rate, respectively.
In most ECG signal applications, the desired dynamic range can be up to 60 dB and the minimum required resolution is 8 bits [22]. As can be observed from Figure 3.7 (a), the overall decimation chain achieves 0.04 dB passband ripples and –74 dB stopband attenuation. This fits the specifications of the application as it provides the required signal resolution and attenuates the high frequency quantization noise. Magnitude characteristics of the Slink (blue) and Slink compensation (red) filters in the band of interest are presented in Figure 3.7 (b), in which the inverse Slink characteristics of the Slink compensator can be easily seen. The Slink roll-off is compensated up to a certain ripple size by using only a first order multiplier free allpass IIR filter.

### 3.4 Phase Characteristics of All-pass Based Half-band Polyphase IIR Filter

IIR filters are well known for their non-linear phase response which should be compensated in order to avoid any phase distortion on the signal to be processed. The filter group delay is a measure of the linearity of the filter's phase response, and is defined as the negative derivative of the phase of the system with respect to frequency, i.e. $\tau(\nu) = -\frac{1}{2\pi} \frac{d\phi(\nu)}{d\nu}$ where $\phi(\nu)$ is the filter phase and $\nu = f/f_s$ is the normalized frequency, $f_s$ being the sampling frequency. The second stage and third stage filters (Figure 3.1) are chosen to have the same structural and coefficient content, as depicted in Figure 3.3 and Figure 3.4, and their contribution to the overall group delay is,

$$
\tau_{IIR}(\nu) = \underbrace{N_1 \tau_{HB}(N_1 \nu)}_{2^{\text{nd}}\ \text{stage HB IIR}} + \underbrace{2N_1 \tau_{HB}(2N_1 \nu)}_{3^{\text{rd}}\ \text{stage HB IIR}} \quad (3.3)
$$

and

$$
\tau_{HB}(\nu) = \frac{1}{2} + 2 \sum_{k=1}^{K} \tau_k(\nu) + \Delta \quad (3.4)
$$

where $N_1$ is the downsampling ratio at the first stage of the decimation filter and $K$ is
the number of paths in the filter structure. In (3.4), $\frac{1}{2}$ is the group delay contribution of
the delayer used in the bottom path of the polyphase structure and $\tau_k(\nu)$ represents the
group delay contribution of the $k^{th}$ all-pass filter with coefficient $\alpha_k$ to the overall group
delay of the all-pass based HB IIR decimation filter $\tau_{HB}(\nu)$, which can be formulated
as follows,

$$
\tau_k(\nu) = 2 \frac{1 - \alpha_k^2}{1 + \alpha_k^2 + 2\alpha_k \cos(4\pi \nu)} \quad (3.5)
$$

and $\Delta = \sum \delta(\nu_z)$ is the summation of the delta functions in the group delay occurring at frequencies $\nu_z > 0.25$ corresponding to the filter zeros. Since the frequency components beyond the cut-off frequency correspond to the filter's stopband region, the parameter $\Delta$ in (3.4) can be ignored. The total group delay for the aforementioned filters results in a bell-like shape that can be formulated as $\tau_{HB}(\nu) = \frac{1}{2} + \tau_1(\nu) + \tau_2(\nu)$ according to (3.4), for $K = 2$ and ignoring the delta functions in the filter's stopband. Here $\tau_1(\nu)$ and $\tau_2(\nu)$ are the group delay functions of the all-pass filters in the top and bottom branches. The group delay of the two-path all-pass based HB IIR filter with non-linear phase response is presented in Figure 3.8.
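As a numerical cross-check of the all-pass group delay in (3.5), the closed form can be compared with the negative phase derivative of $A_k(z)$ from (3.2). The sketch below covers a single all-pass section only; the grid size and function names are assumptions made for illustration.

```python
import numpy as np

def tau_closed_form(nu, alpha):
    """Group delay of one all-pass section as given in (3.5), in samples."""
    return 2.0 * (1.0 - alpha ** 2) / (1.0 + alpha ** 2 + 2.0 * alpha * np.cos(4.0 * np.pi * nu))

def tau_numerical(nu, alpha):
    """Negative derivative of the unwrapped phase of A_k(z) with respect to omega."""
    z = np.exp(2j * np.pi * nu)
    A = (alpha + z ** -2) / (1.0 + alpha * z ** -2)
    phase = np.unwrap(np.angle(A))
    return -np.gradient(phase, 2.0 * np.pi * nu)

nu = np.linspace(0.0, 0.5, 5001)
tau1 = tau_closed_form(nu, 0.125)      # top branch, alpha_1
tau2 = tau_closed_form(nu, 0.5625)     # bottom branch, alpha_2
tau1_num = tau_numerical(nu, 0.125)
tau2_num = tau_numerical(nu, 0.5625)
peak_nu = nu[np.argmax(tau2)]          # location of the group delay peak
```

Both sections peak at $\nu = 0.25$, which is what gives the overall group delay its bell-like shape; the section with the larger coefficient contributes the sharper peak.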

![Figure 3.8: The group delay of the two-path all-pass based HB IIR filter with the all-pass section coefficients $\alpha_1 = 0.125$ and $\alpha_2 = 0.5625$.]

### 3.5 Phase Compensation

Linear phase is desired for most systems, especially for biomedical applications, since the temporal characteristics of a biosignal are of great importance for diagnostic purposes. A non-linear phase results in a non-constant group delay, which introduces different time delays at different frequencies and may distort the time domain amplitude of a signal. Phase non-linearity can be compensated by designing cascaded all-pass correctors with a phase response which is approximately opposite to that of the polyphase filter [112]. The aim is to achieve an almost constant group delay response in the band of interest. A general transfer function for a $K$ section compensator is,
\[ H_c(z) = \frac{\alpha_{c0} + z^{-2}}{1 + \alpha_{c0} z^{-2}} \prod_{m=1}^{K-1} \frac{\alpha_{cm} + z^{-2m-1}}{1 + \alpha_{cm} z^{-2m-1}} \tag{3.6} \]

where \( \alpha_{c0} \) and \( \alpha_{cm} \) are the coefficients of the single second order and the higher order correctors, respectively [112]. In order to study the non-linear phase effects of the proposed all-pass based HB IIR filters, a second order single corrector and a four section compensator are designed according to the detailed algorithm provided in [112]. Due to the high sampling rate at the second stage of decimation, the phase non-linearity in the band of interest (i.e. the group delay peak-to-peak difference) is small; however, it grows quickly as the sampling rate reduces at the third stage. Thus, a single all-pass corrector is sufficient at the second stage, whereas a 4 section corrector is required for the third stage in order to achieve a better correction in the band of interest. In this study, the phase is compensated up to \( \nu_c = 0.125 \), which reduces the group delay peak-to-peak difference in the band of interest from 0.431 to 0.0041. In other words, the peak-to-peak error is reduced by a factor of 106.3 [112],

\[ K = \frac{\tau_c(\nu)_{\max} - \tau_c(\nu)_{\min}}{\tau_a(\nu)_{\max} - \tau_a(\nu)_{\min}}, \quad \text{where} \quad 0 \leq \nu < \nu_c \tag{3.7} \]

where \( \tau_c(\nu) \) and \( \tau_a(\nu) \) are group delays before and after phase correction, respectively.
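The improvement factor in (3.7) is a ratio of peak-to-peak group delay variations restricted to the band $0 \leq \nu < \nu_c$. A minimal sketch is given below; the function name and the small arrays are hypothetical placeholders and are not group delay data from this work.

```python
import numpy as np

def improvement_factor(nu, tau_before, tau_after, nu_c=0.125):
    """Ratio of in-band peak-to-peak group delay before and after correction, as in (3.7)."""
    band = (nu >= 0.0) & (nu < nu_c)
    return np.ptp(tau_before[band]) / np.ptp(tau_after[band])

# Hypothetical values for illustration only; the last point lies outside the band.
nu = np.array([0.0, 0.05, 0.1, 0.2])
tau_before = np.array([1.0, 1.2, 1.4, 9.0])
tau_after = np.array([2.0, 2.01, 2.02, 5.0])
K_factor = improvement_factor(nu, tau_before, tau_after)
```

Only the variation of the group delay enters the ratio, so the constant extra delay added by the corrector does not affect the result.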

Figure 3.9 (a) presents the normalized group delay responses of the original filter (blue) versus the phase corrected filter (black) via a single section all-pass corrector (red). Similarly, Figure 3.9 (b) illustrates the group delay responses of the original (blue) and corrected filter (black) with 4 section all-pass corrector (red).
Figure 3.9: The two-path polyphase IIR filters normalized group delay response corrected using (a) a single section corrector and (b) a 4 section corrector along with the original filters (blue), and the phase compensators (red) group delay responses.

3.6 T-wave Alternans Challenge and MIT-BIH Arrhythmia Database

For evaluating the performance of the proposed decimation filter with ECG data, 10 second long recordings from two databases are used. The T-Wave Alternans Challenge (TWAC) Database provides ECG recordings at sinus rhythm which are acquired from healthy subjects by a standard 12-lead ECG and digitized at 500 Hz [113]. In addition, the MIT-BIH database provides ECG data with a wide range of arrhythmia and beat morphology variations, which are recorded via a two-channel recorder and digitized at 360 Hz [114, 115]. Spectral analysis was carried out for both datasets by averaging the Power Spectral Densities (PSDs) of each data set for each lead in order to determine the frequency content distribution of the data records. Results from the TWAC Database records showed that more than 99% of the average power is concentrated in the frequency band of $0 - 50 \text{ Hz}$ $(\nu = 0 - 0.1)$. Figure 3.10 (a) presents the average power of the 12 leads (labelled I to V6) in the corresponding frequency band for record twa55. Figure 3.10 (b) shows the average power in several frequency bands of Lead II records of 13 different heart conditions, obtained from the MIT-BIH Arrhythmia database. These heart conditions, such as atrial fibrillation, ventricular fibrillation and tachycardia, are chosen according to their density of occurrence. In the presence of different arrhythmias and conduction abnormalities, similar to the previous observation, more than 99% of the average power is concentrated in the frequency band of $0 - 45 \text{ Hz}$ $(\nu = 0 - 0.125)$. The $60 \text{ Hz}$ power-line interference (USA standard) and its second harmonic can be observed as peaks in the corresponding plots.

Figure 3.10: PSD of 10 seconds long (a) 12 lead recordings (from Lead I to Lead V6) of record twa55 in sinus rhythm obtained from TWAC Database and (b) 13 Lead II recordings from MIT-BIH Arrhythmia Database with various conduction abnormalities and beat morphologies. (AFIB : Atrial fibrillation , AFL: Atrial flutter, SBR: Sinus bradycardia, IVR: Idioventricular rhythm, SVTA: Supraventricular tachyarrhythmia, VFL: Ventricular flutter, VT: Ventricular tachycardia, BII: 2$^{nd}$ heart block, PREX: Pre-excitation, B: Ventricular bigeminy, T: Ventricular trigeminy and P: Paced rhythm)

In addition, the spectral resemblance between each of the leads, and between consecutive 10 second time segments of the same lead, is studied for both sets of data by measuring the Cross Spectral Coherence (CSC), which is defined as [49],

\[ C_{xy} = \frac{|P_{xy}|^2}{P_{xx}P_{yy}} \tag{3.8} \]

where \( P_{xx} \) and \( P_{yy} \) are the power spectral estimates of \( x \) and \( y \) respectively, and \( P_{xy} \) is the cross spectral estimate of \( x \) and \( y \). The coherence value shows the degree of spectral resemblance of \( x \) to \( y \) at each frequency, which ranges between 0 and 1 [49]. The CSC results are almost 1 for each lead and different time instances showing very high spectral resemblance amongst each other. Therefore, the performance of the decimation filters on one channel of a 10 second segment of an ECG signal is a good indication for its performance with respect to the overall system and longer recordings.
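The coherence in (3.8) is only informative when the spectral estimates are averaged over several segments, since a single-segment estimate is identically one. The sketch below assumes a simple averaged-periodogram estimator with non-overlapping Hann-windowed segments; the estimator settings, function name and random test signals are assumptions and are not taken from this work.

```python
import numpy as np

def coherence(x, y, nseg=256):
    """Magnitude-squared coherence (3.8) from segment-averaged spectral estimates."""
    win = np.hanning(nseg)
    nblocks = min(len(x), len(y)) // nseg
    xs = np.reshape(np.asarray(x, dtype=float)[:nblocks * nseg], (nblocks, nseg))
    ys = np.reshape(np.asarray(y, dtype=float)[:nblocks * nseg], (nblocks, nseg))
    X = np.fft.rfft(xs * win, axis=1)
    Y = np.fft.rfft(ys * win, axis=1)
    Pxx = np.mean(np.abs(X) ** 2, axis=0)       # power spectral estimate of x
    Pyy = np.mean(np.abs(Y) ** 2, axis=0)       # power spectral estimate of y
    Pxy = np.mean(np.conj(X) * Y, axis=0)       # cross spectral estimate
    return np.abs(Pxy) ** 2 / (Pxx * Pyy)

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
c_scaled = coherence(x, 0.5 * x)                     # same waveform, different gain
c_indep = coherence(x, rng.standard_normal(5000))    # unrelated signal
```

A scaled copy gives a coherence of one at every frequency, which reflects the insensitivity of the measure to gain, whereas unrelated signals give values close to zero.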

### 3.7 Error Measures

The performance of the proposed filter without and with phase correction can be quantified by measuring the filter input and output signal similarities in time domain (after time alignment) by calculating the cross correlation and Root Mean Square Error (RMSE) which are defined in (3.9) and (3.10), respectively [49, 116]. The correlation coefficient values are normalised between 0 and 1 which indicate no similarity and exact match, respectively. The complement of the correlation coefficient is used as a measure for the waveform dissimilarities and is insensitive to variations in the gain [117]. Thus, the RMSE is used to measure the amplitude variations of the input \( x(k) \) and output \( y(k) \).

\[ R(x, y) = \frac{C_{xy}}{\sqrt{C_{xx}C_{yy}}} \tag{3.9} \]
where, $C_{xy}$ is the covariance between $x(k)$ and $y(k)$, $C_{xx}$ and $C_{yy}$ are the auto-covariance of $x(k)$ and $y(k)$, respectively.

$$RMSE = \frac{1}{K} \sqrt{\sum_{k=1}^{K} (x(k) - y(k))^2} \quad (3.10)$$

where, $K$ is the number of samples, $x(k)$ is the ECG data record, and $y(k)$ is the output from the decimation chain. The spectral similarities of the corresponding signals are quantified by calculating the Distortion Ratio (DR) and the CSC given in (3.11) and (3.8) respectively. The CSC is a measure that is not affected by the signal morphology and amplitude, quantifying the steadiness of the input and output signal phases as a function of frequency [117].

$$DR = \sqrt{\frac{\sum (Y(f) - X(f))^2}{\sum X(f)^2}} \quad (3.11)$$

where $X(f)$ and $Y(f)$ are the spectral magnitude of the input and output of the decimation chain.
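The three measures in (3.9) to (3.11) can be sketched as follows. The RMSE is implemented exactly as printed in (3.10), with the factor $1/K$ outside the square root, which differs from the more common definition where $1/K$ is inside the root. The function names and the synthetic test signals are assumptions made for illustration.

```python
import numpy as np

def correlation_coefficient(x, y):
    """R(x, y) of (3.9): covariance normalised by the auto-covariances."""
    xc, yc = x - np.mean(x), y - np.mean(y)
    return np.sum(xc * yc) / np.sqrt(np.sum(xc * xc) * np.sum(yc * yc))

def rmse_as_printed(x, y):
    """RMSE of (3.10) as printed: (1/K) * sqrt(sum of squared differences)."""
    return np.sqrt(np.sum((x - y) ** 2)) / len(x)

def distortion_ratio(x, y):
    """DR of (3.11) from the spectral magnitudes of input x and output y."""
    X = np.abs(np.fft.rfft(x))
    Y = np.abs(np.fft.rfft(y))
    return np.sqrt(np.sum((Y - X) ** 2) / np.sum(X ** 2))

t = np.arange(100) / 100.0
x = np.sin(2.0 * np.pi * 5.0 * t)          # synthetic input, not ECG data
r_gain = correlation_coefficient(x, 2.0 * x)
dissimilarity = 1.0 - r_gain               # complement of the correlation coefficient
rmse_offset = rmse_as_printed(x, x + 0.1)
dr_gain = distortion_ratio(x, 2.0 * x)
```

A pure gain change leaves the dissimilarity at zero but produces a distortion ratio of one, which illustrates why the waveform measure is complemented by amplitude-sensitive measures.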

### 3.8 Experimental Results

In order to test and evaluate the performance of the proposed decimation filter for ECG signal acquisition, ECG data, described in Section 3.6, are fed into a single-loop 3rd order 1-bit Σ∆ modulator and modulated at a rate of 128 times faster than the Nyquist sampling frequency which are then filtered and decimated. The behavioural fixed-point models of the aforementioned filters were implemented through MATLAB. The simulations are run for the 10 second long Lead II recordings of two different databases. First dataset includes 6 recordings (records twa39, twa46, twa55, twa60, twa90 and
Both datasets are decimated by the decimation filters described in Section 3.3 with and without phase compensation. The decimation filter performance is first evaluated by measuring the SNR at the output of the \( \Sigma \Delta \) modulator and of the decimation chain without and with compensation, at the sampling rates of 46.08 kHz and 64 kHz. The signal bandwidths are chosen as 360 Hz and 500 Hz for the aforementioned sampling conditions, respectively. The most distinguishable feature of an ECG signal is the QRS complex (see Figure 2.2), whose frequency content can extend up to 50 Hz [26, 33, 118, 119]. Any added noise in this bandwidth will distort the time domain signal and will be most visible in the QRS complex. Thus, a 50 Hz sine wave with an added 60 dB white noise is fed to the \( \Sigma \Delta \) modulator to demonstrate the effect of \( \Sigma \Delta \) modulation and decimation filtering in the signal bandwidth of 50 Hz. The SNRs (in the band of interest) at the output of the modulator and of the decimation chains without and with phase compensation at the sampling rate of 64 kHz were 80.2 dB, 80.1 dB and 80.1 dB, respectively. The proposed decimation filter therefore preserves the SNR in the band of interest after the decimation process. In addition, the SNRs obtained after decimating the modulated signal with the decimation filters without and with compensation demonstrated that the compensation filters do not contribute to the in-band noise attenuation. Figures 3.11(a) and 3.11(b) present the Power Spectral Densities (PSDs) and SNRs of the modulator and decimator outputs, respectively.
Figure 3.11: Power spectrum measurements within the signal bandwidth of 500 Hz for sampling rate of 64 kHz, (a) ΣΔ modulator output and (b) decimation chain output (without phase compensation (black) and with phase compensation (red)).

The normalized group delay introduced by the final three stages of the decimation chain (two all-pass based HB IIR filters and a Slink compensator), the compensated group delay (up to ν = 0.125), the filter magnitude response at the decimation chain output and the PSD of a Lead II ECG record (twa55) from a healthy subject are shown in Figure 3.12.

Figure 3.12: PSD of Lead II recording of record twa55 at sinus rhythm (blue) versus the group delay variation with (green) and without phase compensation (red) filters, $f_1 = 22.5 \text{ Hz}(\nu_1 = 0.0625)$ and $f_2 = 45 \text{ Hz}(\nu_2 = 0.125)$ indicated by the yellow lines at $f_{s2} = 46.08 \text{ kHz}$.

The variations in the total group delay of the filters are calculated in the specified frequency bands (yellow lines in Figure 3.12). Table 3.1 provides the group delay variations in number of samples and in microseconds at two different sampling rates, \( f_{s1} = 64 \text{ kHz} \) and \( f_{s2} = 46.08 \text{ kHz} \), for the TWAC Database and the MIT-BIH Arrhythmia Database, respectively.

Table 3.1: Variation in the group delay in normalized frequency bands of \( \nu B_1 = 0 - 0.0625 \) and \( \nu B_2 = 0 - 0.125 \).

<table>
<thead>
<tr>
<th>Normalized Frequency band ( (\nu B) )</th>
<th colspan="2">Without compensation</th>
<th>With compensation</th>
</tr>
<tr>
<th></th>
<th>Samples</th>
<th>µSec ( (f_{s2}) )</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>0 – 0.0625</td>
<td>1.86</td>
<td>40.28</td>
<td></td>
</tr>
<tr>
<td>0 – 0.125</td>
<td>7.61</td>
<td>165.15</td>
<td></td>
</tr>
</tbody>
</table>

\(^1 f_{s1} = 64 \text{ kHz} \)
\(^2 f_{s2} = 46.08 \text{ kHz} \)

As mentioned in Section 3.6 and shown in Figure 3.10, more than 99 % of the power of the ECG data is concentrated in the frequency range of \( \nu = 0 - 0.125 \) for both datasets, thus the group delay variations are explored in this frequency range. According to the upper and lower limits of the wave durations given in Table 2.1, the group delay variation caused by the decimation filters is relatively low for frequencies up to \( \nu = 0.125 \). The maximum variations the signals are exposed to are 165 µs \( (f_{s2} = 46.08 \text{ kHz}) \) and 119 µs \( (f_{s1} = 64 \text{ kHz}) \) without compensation, and 10.63 µs \( (f_{s2} = 46.08 \text{ kHz}) \) and 7.66 µs \( (f_{s1} = 64 \text{ kHz}) \) with compensation. These deviations are low relative to the wave durations and normal limits for standard ECG features, thus the group delay variation due to phase non-linearity without compensation in this frequency range is negligible.
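The conversion between group delay variation in samples and in microseconds is a division by the oversampled rate. The short sketch below reproduces the uncompensated figures quoted above from the 7.61 sample variation in Table 3.1; the function and variable names are illustrative.

```python
def samples_to_microseconds(samples: float, fs_hz: float) -> float:
    """Convert a delay expressed in samples at rate fs_hz into microseconds."""
    return samples / fs_hz * 1e6

osr = 128                              # oversampling ratio of the modulator
fs1 = osr * 500                        # TWAC data: 500 Hz -> 64 kHz
fs2 = osr * 360                        # MIT-BIH data: 360 Hz -> 46.08 kHz

delay_fs2_us = samples_to_microseconds(7.61, fs2)   # about 165 microseconds
delay_fs1_us = samples_to_microseconds(7.61, fs1)   # about 119 microseconds
```

The same variation in samples therefore corresponds to a shorter time at the higher sampling rate.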

The first experiment incorporated the data obtained from healthy subjects, which are sampled at 500 Hz and oversampled by a factor of 128 (i.e. 64 kHz). Figures 3.13(a) and 3.13(b) illustrate the input (twa55) and the output from the decimation chain without and with phase compensation, respectively. As presented in Figure 3.13, there is no visible difference between the two filter outputs; the amplitude error itself is illustrated in Figure 3.14. In Figure 3.14, a relatively large decaying overshoot can be observed, which is due to the transient response of the IIR filters.

![Decimation chain output without phase compensation](image1.png)

**Figure 3.13:** Decimation chain output (black) versus the input (red) (a) without phase compensation and (b) with phase compensation.

![Amplitude difference](image2.png)

**Figure 3.14:** Amplitude difference between the input and output of the decimation chain, without (original - red) and with compensation (corrected - black).

The average of the six RMSE values calculated according to (3.10) for the data records obtained from the decimation chain without compensation is 15 nV, with a maximum of 20 nV and a minimum of 10 nV; for the set with compensation it is 13.2 nV, with a maximum of 15.5 nV and a minimum of 0.7 nV. The most obvious amplitude differences are observed at the QRS peaks, where the frequency content of the QRS complex is exposed to a higher group delay. The largest peak amplitude errors across the six records demodulated with the decimation filter without and with phase correction are 43 nV and 44 nV, respectively. According to the normal variation limits given for the ECG wave amplitudes (Table 2.1), both the mean errors and the maximum QRS peak errors are acceptable since they are relatively small. In addition, the overall means of the input/output dissimilarity, calculated as the complement of the correlation coefficient in (3.9), are $0.2 \times 10^{-3}\%$ and $0.13 \times 10^{-3}\%$ for the uncompensated and compensated filters, respectively, and the waveform dissimilarities for each data record are shown in Figure 3.15(a).

Finally, the spectral similarity is measured by calculating the distortion ratio as well as the CSC between the input and the output. The DR is calculated for the six ECG records using (3.11) and is presented in Figure 3.15(b); the means are 4.8% and 3.3% for the filters without and with compensation, respectively. The CSC between the input and the output is also calculated, giving values that are approximately one for most of the frequencies and showing high spectral coherence between the input and output.

The above mentioned experiment is repeated for the data obtained from MIT-BIH which are digitized at $360 \ Hz$ and oversampled by a factor of 128 (i.e. $46.08 \ kHz$).

The average of the RMSE values obtained from the decimation chain without compensation is $83.5 \ nV$ and for the one with compensation is $75 \ nV$. The largest peak amplitude errors across the 13 records demodulated with the decimation filter without and with phase correction are $48 \ nV$ and $60 \ nV$, respectively. A 10 second long Lead II recording from a patient suffering from ventricular tachycardia (record 213), after processing with the decimation filters discussed above, is presented in Figure 3.16. The error measures between the input and the output from the original and compensated filters are given in Figure 3.17.

![Figure 3.16: Decimation chain output (black) versus the input (red) (a) without phase compensation and (b) with phase compensation (Ventricular Tachycardia).]
Figure 3.17: Amplitude difference between the input and output of the decimation chain, without (red) and with compensation (black). (Ventricular Tachycardia)

Figures 3.18 (a) and (b) show the waveform dissimilarity and the measured distortion ratio for these records. The mean DR is calculated to be 5.3% and 3.7% for the decimation filter without and with compensation, respectively. Also, the overall mean of the input/output dissimilarity is estimated to be $0.4 \times 10^{-3}\%$ and $0.16 \times 10^{-3}\%$ for the uncompensated and compensated filters, respectively. The group delay compensation therefore improves the performance by lowering the waveform dissimilarity.

Figure 3.18: (a) Waveform Dissimilarity between the input and output of the overall decimation chain without (blue) and with group delay compensation (red). (b) Input/Output Distortion Ratios of the overall decimation chain without (blue) and with phase compensation (red).

The performance of the proposed polyphase filter is also compared to all-pass based polyphase filters designed according to the algorithms provided in the state-of-the-art. Cascading multiple polyphase structures improves the stopband attenuation while keeping the cost and complexity low, however at the cost of larger passband ripples and increased group delay. In order to achieve a better stopband attenuation, a cascade of two of the proposed polyphase filters is implemented \[91\].

In addition, two further filters are implemented: a two-path polyphase filter with two cascaded all-pass filters in both paths, designed according to the algorithm provided in \[120\], and the filter designed in \[121\], which incorporates four cascaded two-path polyphase structures with single all-pass filters. Table 3.2 shows the filter specifications, the number of adders and multipliers required and the group delay introduced by each filter.

Table 3.2: Two-path All-pass based HB IIR Filter Characteristics Comparison with the State-of-the-art

<table>
<thead>
<tr>
<th rowspan="2">Filter</th>
<th colspan="3">Filter Specifications</th>
<th rowspan="2">Cost</th>
<th rowspan="2">Group Delay Variations (samples)</th>
</tr>
<tr>
<th>Order</th>
<th>\(\epsilon_p\) (dB)</th>
<th>\(\epsilon_s\) (dB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proposed</td>
<td>5</td>
<td>0.47 \(\mu\)</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>[91]</td>
<td>10</td>
<td>1 \(\mu\)</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>[120]</td>
<td>9</td>
<td>1.17 \(\mu\)</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>[121]</td>
<td>12</td>
<td>0.1</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

1. \(\epsilon_p\): Passband Ripples  
2. \(\epsilon_s\): Stopband Attenuation  
3. \(\nu_T\): Normalized Transition Bandwidth

### 3.9 Analysis

The calculated SNR at the modulator and decimation filter outputs, shown in Figure 3.11, demonstrates that the proposed design, with or without compensation, successfully filters the out-of-band quantization noise while preserving the SNR in the signal band without detrimental group delay distortion. Figure 3.12 shows that the use of phase compensation filters reduces the group delay variation in the frequency band $\nu = 0 - 0.125$, which directly affects the amount of time delay each frequency component is exposed to. Furthermore, Table 3.1 presents the group delay variations in the frequency bands $\nu = 0 - 0.0625$ and $\nu = 0 - 0.125$ in terms of both samples and $\mu$sec, since the proposed filters are designed as discrete-time digital filters. The sample values can be converted into seconds by dividing them by the oversampling frequencies, which are $f_{s1} = 46.08kHz$ and $f_{s2} = 64kHz$ for the MIT-BIH Arrhythmia Database and the TWAC Database, respectively. To give a more detailed understanding of Table 3.1, consider its fourth and sixth columns, which present the group delay variation in $\mu$sec for the decimation filters without and with phase compensation in the bands of interest. In the band $\nu = 0 - 0.0625$ the group delay variation of the proposed decimation chain is measured as 29 $\mu$sec, which indicates that if a frequency component at DC ($\nu = 0$) is delayed by 0 seconds, the frequency component at $\nu = 0.0625$ will be delayed by 29 $\mu$sec. The use of phase compensation filters reduces this variation to 3.44 $\mu$sec. Group delay variation misaligns the frequency components of the time-domain signal, which results in signal distortion; thus, a lower group delay variation indicates less distortion. Although phase compensation reduces the group delay variation, the values obtained without phase compensation are already low compared with the wave durations and normal limits of standard ECG features presented in Table 2.1.
This provides a good indication that phase compensation is not compulsory for this application; nevertheless, further simulations are carried out in order to evaluate the effect of phase non-linearity on different ECG data records. The results of these simulations are presented in Figures 3.15 and 3.18, which show the waveform dissimilarity and distortion ratio for each ECG data record of the two aforementioned databases. In both figures, the blue and red bars present the results obtained without and with phase compensation, respectively.

In Figure 3.15 (a) the waveform dissimilarity for six TWAC ECG data records is presented, and it can be observed that the use of phase compensation filters reduces the waveform dissimilarity. Amongst these six records, twa90 shows higher values, which is related to the power of the signal. The twa90 record is observed to have lower signal power, calculated as the average of the absolute squares of the time-domain samples, than the other records, which shows that the signal of interest is more affected by high frequency noise. As the group delay variation increases at frequencies above $\nu = 0.125$, this leads to increased signal distortion and thus increased waveform dissimilarity. However, the values presented in this figure are in the order of $10^{-4}$, which is a negligible mismatch. In addition, Figure 3.15 (b) presents the distortion ratio, which phase compensation reduces, as expected. The twa46 and twa90 records present higher ratios than the other signals, which is due to their lower signal power. As the DR is the ratio of the power of the difference between the magnitudes of the original data and the decimation filtered data to the power of the signal, the DR values can vary for different signals. The DR results are also dependent on the frequency content of the signals. When the signal has more high frequency components it faces more distortion due to the increasing group delay variation, which affects the effectiveness of the phase compensators, as can be seen from the difference between the red bars for these two data records. Although phase compensation filters reduce the DR and waveform dissimilarity, the values obtained without phase compensation are negligible. Therefore, it is not crucial to use phase compensation filters, which would increase the hardware and algorithm complexity.

Furthermore, Figures 3.18 (a) and (b) present the waveform dissimilarity and DR for the ECG data records obtained from the MIT-BIH Arrhythmia Database. Similarly, phase compensation filters reduce the signal distortion for all data records. The two conditions, *Ventricular bigeminy* ($B$) and *2° heart block* ($BII$), presented in both figures exhibit relatively higher waveform dissimilarity and DR compared to the other cardiac conditions. The power of these signals is calculated to be lower than that of the rest of the signals, which leads to a relatively higher distortion ratio and waveform dissimilarity, since according to (3.11) decreasing signal power increases the distortion ratio and waveform dissimilarity.
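The conversion between samples and seconds used in the discussion of Table 3.1 amounts to a division by the oversampling frequency. A minimal sketch is given below, using the two oversampling rates quoted in the text; the function and constant names are illustrative and are not taken from the thesis.

```python
def samples_to_microseconds(delay_samples: float, fs_hz: float) -> float:
    """Convert a delay expressed in samples at rate fs_hz into microseconds."""
    return delay_samples / fs_hz * 1e6


def microseconds_to_samples(delay_us: float, fs_hz: float) -> float:
    """Convert a delay in microseconds into samples at rate fs_hz."""
    return delay_us * 1e-6 * fs_hz


# Oversampling rates quoted in the text (the names are hypothetical).
FS_MIT_BIH_HZ = 46_080.0
FS_TWAC_HZ = 64_000.0

for name, fs in (("MIT-BIH", FS_MIT_BIH_HZ), ("TWAC", FS_TWAC_HZ)):
    one_sample_us = samples_to_microseconds(1.0, fs)
    print(f"{name}: one sample lasts {one_sample_us:.3f} microseconds")
```

A group delay variation given in samples therefore maps to a different time value for each database, which is why Table 3.1 reports both units.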

In the simulation results obtained using two ECG data sets of different diagnostic importance, the spectral similarity for both sets of data (healthy and unhealthy) is determined to be approximately the same, owing to the passband characteristics of the decimation filters. The morphological similarity between the filter input and output is very high even for the filters without compensation, in the case of both normal sinus rhythm and arrhythmia. In addition, since the mean amplitude error is negligibly small and almost the same for both filters, the use of the compensation filter is not necessary, thus saving power and cost by avoiding extra hardware. This is due to the fact that more than 99 % of the signal energy is concentrated in the frequency range $\nu = 0 - 0.125$, where the group delay variations are minimal. Although different results are obtained for each data set, it should be noted that biopotential signals are non-stationary and there is no single best solution for them. The overall results show that the proposed design does not cause any drastic distortion that might lead to misdiagnosis.

The comparisons with the state-of-the-art filters and with the filters implemented according to the state-of-the-art algorithms, presented in Table 3.2, demonstrate that the proposed polyphase structures are superior to the others, since they provide very small passband ripples (\(0.47 \mu dB\)) and sufficient stopband attenuation with negligible group delay variation in the band of interest. The proposed design requires no multipliers, which makes it superior to the filter designed in [120], whilst the two exhibit almost the same magnitude characteristics. Although [121] is a multiplier-free design as well, it requires almost twice the number of adders in order to achieve a stopband attenuation of \(-80\) \(\text{dB}\) and suffers from large passband ripples. In addition, the group delay variation introduced by the proposed design is almost half of that introduced by [120] and [121]. Cascading two polyphase structures provides a very high stopband attenuation and micro-dB passband ripples. However, the performance of a single polyphase filter incorporated in a decimation chain is sufficient to attenuate the out-of-band quantization noise and prevent aliasing; therefore there is no need for higher order filters with increased group delay variation.

### 3.10 Conclusions

This chapter presented the design of a decimation chain composed of a fourth order Sinc filter, two fifth order allpass based polyphase halfband IIR filters and a first order Sinc compensator to be employed in ECG monitoring applications. The proposed decimation chain is a low-complexity design which can be implemented without multipliers, since the allpass filter coefficients are simply powers of 2 and can be implemented with shift-add operations. The state-of-the-art decimation filters used in biomedical applications employ high order FIR filters, which increases the complexity of the decimators. Thus, the proposed design reduces complexity and offers an attractive alternative to the existing solutions in the literature. Furthermore, it is demonstrated that the phase non-linearity of the IIR polyphase filters does not cause significant distortion of the morphological and spectral characteristics of the input ECG signal. This is due to the very narrow and low frequency range corresponding to the physiologically significant frequencies of ECG signals. In other words, these frequency bands are close to DC, where the group delay variation is already minimal (minimum phase filter) without using any group delay compensator. The high spectral coherence, high morphological correlation and low $nV$ error between the input and the output signals confirm that the IIR polyphase filter introduces minimal distortion to the signal, which would not affect critical diagnosis; therefore, phase compensation is not a must for such an application. The work reported in this chapter is that of a decimation filter for use in ECG data acquisition systems, with very efficient filtering performance, low power and low complexity. The proposed design meets the requirements of demodulating a $\Sigma \Delta$ modulator output while preserving the diagnostically important morphological features of the ECG signal.

## Chapter 4: Investigation of Hardware Efficient Implementation Methods for Wavelet Filter Banks

### 4.1 Introduction

Implementation of DSP algorithms requires intensive multiply-add operations, which makes the realization of the associated hardware critical for power limited applications. General purpose multipliers are the most widely used building blocks; however, they are known to be hardware and power inefficient \[122\]. Hence, a vast amount of research is being carried out to reduce their complexity. Nevertheless, general purpose multipliers are not required for multiplications with fixed constants, known as Single Constant Multiplication (SCM) or Multiple Constant Multiplication (MCM). Therefore, they can be replaced with bit-wise shifts and less complex adders and/or subtractors in order to reduce the hardware complexity and power consumption of the system. As adders and subtractors have similar hardware complexity, both will be referred to as adders in the rest of the document. The complexity of such shift-add networks is dependent on the quantity of adders, since shifting can be realised by hard-wiring for bit-parallel arithmetic operations, without any additional hardware [123]. Although replacement of multipliers with fixed shift-add networks decreases the hardware cost, different design methodologies and techniques have been studied over the years in order to optimize the resource utilization of these networks both for parallel and time-multiplexed architectures [122][131].
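The replacement of a constant multiplication by shifts and adders can be illustrated with a short software model. The sketch below is only a behavioural illustration in Python; the function names are hypothetical, and in hardware the shifts would be hard-wired rather than computed.

```python
def times_7(x: int) -> int:
    """Multiply by 7 as 8x - x: one shift and one subtractor."""
    return (x << 3) - x


def times_19(x: int) -> int:
    """Multiply by 19 as 16x + 2x + x: two shifts and two adders."""
    return (x << 4) + (x << 1) + x


for sample in (1, 5, -3):
    print(sample, times_7(sample), times_19(sample))
```

Counting only the adders and subtractors, as the text does, the constant 7 costs one adder and the constant 19 costs two in this particular decomposition.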

Most discrete transforms, such as the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT), as well as digital filters employ constant multiplications that can be implemented as shift-add operations. Discrete wavelet analysis is another example of a discrete transform that can be realised as a filter bank. It employs fixed coefficient filters associated with a selected mother wavelet, hence it can benefit from shift-add network topologies. To the best of the author's knowledge, the replacement of filter multipliers with these networks has not been investigated in depth in the biomedical signal processing literature. This chapter presents a hardware efficient reconfigurable constant multiplier block structure and its FPGA implementation that can be employed in time-multiplexed filter structures for wavelet analysis of biomedical signals such as ECG, EEG, and EMG. This design is based on efficiently employing dedicated resources of the FPGA as presented in [124][125], extended to take advantage of newer FPGA technology. The rest of this chapter provides a brief summary of existing methods for designing multiplier blocks in parallel and time-multiplexed filter architectures. The detailed structure of the implemented design and its comparison to the general purpose parallel multiplier are demonstrated. Furthermore, a DWT FE is also designed and implemented using an FPGA, and its hardware resource utilization and power consumption are compared with state-of-the-art designs that are also multiplier free. The chapter concludes with a results and comparisons section.

### 4.2 Parallel Architectures and Multiplier Blocks

In a parallel filter architecture as shown in Figure 4.1, multiplications are performed concurrently. When dealing with constant multiplications, each general-purpose multiplier shown in the dashed line boxes in Figure 4.1 can be replaced with a separate fixed shift-add network.

Figure 4.1: Parallel (a) Tapped Delay Line (TDL) and (b) Time Delay and Accumulate (TDA) filter architectures where the boxes highlight the multiplication blocks.

Minimization of the number of adders used in each network is known as the SCM problem. The coefficients can be represented in signed digit format, where the number of non-zero terms determines the number of adders required to generate the corresponding coefficient. However, instead of implementing separate SCMs, Bull et al. introduced the primitive operator filter technique, also known as MCM or multiplier blocks, to further reduce the hardware utilization. This method combines multiple constants and shares the intermediate adder results between the coefficients, generating the coefficient products concurrently. Over the last three decades many different optimization algorithms have been proposed for reducing the resource utilization of multiplier blocks for parallel architectures. These can be broadly divided into two approaches: Common Sub-expression Elimination (CSE) and Directed Acyclic Graph (DAG).

#### 4.2.1 Common Sub-expression Elimination (CSE) Technique

The CSE is an optimization technique that maximizes sharing of common sub-expressions by exploiting repeating digit patterns among coefficients that are in the CSD format. CSD is a subset of the signed digit number system, in which constants are represented using the signed digits $-1$, 0, and 1. It provides a unique representation for any n-bit two's complement number and gives the minimum number of non-zero digits compared to both the two's complement and the general signed digit formats. The CSD algorithm searches for a bit string of the form '011...11' containing at least two consecutive ones and replaces it with '100...0$\overline{1}$' of the same length, where $\overline{1}$ corresponds to $-1$. The procedure runs from the least significant bit (LSB) towards the most significant bit (MSB) of a constant and continues until no two adjacent digits are non-zero.
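The CSD recoding described above can be sketched in a few lines. The version below is an assumption-laden illustration: it uses an equivalent arithmetic formulation (inspecting the value modulo 4 from the LSB upwards) rather than the literal string search and replace, it handles non-negative integers only, and the function names and the use of 'N' to print the digit $-1$ are hypothetical choices.

```python
from typing import List


def to_csd(value: int) -> List[int]:
    """Return the CSD digits (each -1, 0 or 1) of a non-negative integer, MSB first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []  # collected LSB first
    while value != 0:
        if value & 1:
            # value % 4 == 1 gives digit +1; value % 4 == 3 starts a run of
            # ones and gives digit -1, pushing a carry towards the MSB.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits[::-1] or [0]


def csd_to_string(digits: List[int]) -> str:
    """Render CSD digits as text, writing the digit -1 as 'N'."""
    return "".join("N" if d == -1 else str(d) for d in digits)


for coefficient in (7, 29, 39):
    print(coefficient, csd_to_string(to_csd(coefficient)))
```

For 7, 29 and 39 this prints 100N, 100N01 and 10100N, i.e. 8 - 1, 32 - 4 + 1 and 32 + 8 - 1.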

In order to explain the CSE procedure, let us consider three coefficient values, 7, 29 and 39, and their CSD encodings $00100\overline{1}$, $100\overline{1}01$, and $10100\overline{1}$, respectively. According to the CSE method, the most common sub-expression among these coefficients is $100\overline{1}$, and this can be shared to reduce the number of adders required. Figure 4.2 shows the structure of a constant multiplier before and after sub-expression sharing, where the dashed line box indicates the common expression. The amount of reduction in the number of adders is dependent both on the number of constants and on their word-length.
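The saving obtained by sharing a sub-expression for the coefficients 7, 29 and 39 can be checked with a small behavioural model. The sketch below shares the term 8x - x = 7x between all three outputs; it is one possible wiring chosen for illustration and is not claimed to be the exact structure drawn in Figure 4.2. The function name is hypothetical.

```python
def shared_multiplier_block(x: int) -> dict:
    """Compute 7x, 29x and 39x with three adders by sharing the term 7x."""
    shared = (x << 3) - x            # adder 1: 8x - x = 7x (the shared pattern)
    times_29 = (shared << 2) + x     # adder 2: 28x + x
    times_39 = (x << 5) + shared     # adder 3: 32x + 7x
    return {7: shared, 29: times_29, 39: times_39}


products = shared_multiplier_block(3)
print(products)
```

Implemented separately from their CSD forms, the three coefficients would need one, two and two adders respectively (five in total), so sharing the common term removes two adders in this example.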

![Figure 4.2: Constant multiplier structure for coefficients 7, 29, 39 (a) before and (b) after CSE technique.](image)

Over the years different techniques were introduced to improve the identification of the most appropriate common sub-expression for elimination. Hartley's first algorithm in [129] performs a recursive search in which the most frequently appearing sub-expressions with two non-zero digits are extracted both from the coefficient itself and across all the coefficients, whereas Pasko's algorithm [133] searches for patterns with the most non-zero digits (i.e. the largest) and selects the one with the highest occurrence frequency. Unlike these techniques, which are based on a horizontal search, Jang et al. [140] introduced a vertical search algorithm for common sub-expressions. Some other studies combined the two methods, searching for repeated digit patterns with a minimum of two non-zero digits in both vertical and horizontal directions [130,141].

#### 4.2.2 Directed Acyclic Graph (DAG)

Adder graphs, also known as DAGs, use vertices (nodes) and edges to represent two-input adders and binary shifts, respectively. A typical adder graph has an initial vertex (not an adder) that is assigned the value 1 and a terminal vertex which gives the desired output value. The output of each intermediate vertex corresponds to a partial sum, also known as a fundamental. Each edge can be assigned any negative or positive integer power of two, representing the multiplication value. In terms of multiplication, the input fed into the initial vertex can be named the multiplicand, whereas the output from the terminal vertex is the product. In order to have integer coefficients, a scaling of $2^n$ can be applied to fixed-point filter coefficients whose LSB has weight $2^{-n}$, and the output can be rescaled by $2^{-n}$ at no additional cost \[127\]. For further understanding, an example of the SCM problem is presented in Figure 4.3, in which the number 27 is represented using CSD and the method given in \[127\].

![Figure 4.3: Multiplier graph for 27 represented with (a) CSD and (b) the method given in \[127\].](image)

The concept of the adder graph method for implementing multiplier blocks is similar to CSE, except that common fundamentals are shared among the coefficients instead of common sub-expressions. However, unlike CSE, fundamentals are not required to be in a certain number format to obtain an optimal solution. Different heuristic approaches have been published for finding the minimal solution for multiplier blocks using this method. Adder graphs were first employed by Bull and Horrocks (BHM algorithm) \[122\], where a set of fundamentals is created and used as the input of a new vertex to create a new set of values. The BHM algorithm was further optimized by Dempster and Macleod, who introduced the Reduced Adder Graph (RAG-n) algorithm \[127\], which utilizes pre-stored Minimum Adder Graph (MAG) tables to generate graphs of coefficients from already existing fundamentals. Unlike the BHM, RAG-n can generate partial sums by using values greater than the sum itself (e.g. 7 = 8 - 1, rather than 7 = 4 + 2 + 1) and uses only odd values, to maximize the number of fundamentals. Figure 4.4 presents two adder graphs generated by the BHM and the RAG-n algorithms for the coefficient set 1, 7, 16, 21 and 33, demonstrating the fundamental sharing.

Figure 4.4: Adder graphs generated for coefficient set 1, 7, 16, 21 and 33 by (a) BHM and (b) RAG-n \[127\]

### 4.3 Time-multiplexed Architectures and Multiplier Blocks

For applications where the sample rate is lower than the maximum obtainable clock frequency, time-multiplexing is an efficient way of fully utilizing FPGA resources and reducing hardware cost. Time-multiplexed structures re-use hardware resources via time-sharing, which reduces the number of adders and multipliers employed. These designs are resource efficient compared to the parallel ones, and different filter structures such as FIR, IIR and polyphase filters, as well as filter banks, can be implemented as time-multiplexed structures. Figure 4.15 (a) shows a conventional time-multiplexed Tapped Delay Line (TDL) FIR architecture. Similar to the parallel architectures, the general-purpose multipliers as well as the coefficient memory can be substituted with a multiplier block for complexity and area reduction [125,142]. Several studies employed multiplier blocks in time-multiplexed architectures by introducing multiplexers, which add reconfigurability to the multiplier blocks by providing the flexibility of choosing between different coefficients. Turner et al. [143] explore the design of Reconfigurable Multiplier Blocks (ReMBs) on FPGA devices by exploiting the resource redundancy created while mapping algorithms onto FPGAs via the CSE method. Based on a similar idea, Demirsoy et al. [125] developed an optimization algorithm for time-multiplexed architectures that takes advantage of the dedicated FPGA resources using the DAG method. Demirsoy's algorithm is based on the creation of a table of the DAGs for all possible fundamentals and searches for the ones that can be shared efficiently. The algorithm is designed to provide an optimal solution for Single-Input-Single-Output (SISO) and Single-Input-Multiple-Output (SIMO) systems such as time-multiplexed filter banks. On the other hand, in [142] a more generalized algorithm based on the fusion of single coefficient DAGs is introduced, which provides area-efficient ReMB designs for ASIC implementations.
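The idea of time-multiplexing a tapped delay line FIR filter can be illustrated with a behavioural model in which a single multiply-accumulate unit is visited once per tap for every input sample. This is only a software sketch under simplifying assumptions (integer data, one inner-loop iteration standing in for one clock cycle); the function name and the example coefficients are hypothetical and it does not model any specific architecture from the cited works.

```python
def time_multiplexed_fir(samples, coefficients):
    """Behavioural model of a TDL FIR filter with one shared multiplier."""
    delay_line = [0] * len(coefficients)
    outputs = []
    for sample in samples:
        # Shift the new sample into the tapped delay line.
        delay_line = [sample] + delay_line[:-1]
        accumulator = 0
        # Each iteration models one clock cycle of the shared multiplier:
        # the coefficient for the current tap is selected, multiplied and
        # accumulated, so len(coefficients) cycles are spent per sample.
        for tap, coefficient in enumerate(coefficients):
            accumulator += coefficient * delay_line[tap]
        outputs.append(accumulator)
    return outputs


print(time_multiplexed_fir([1, 0, 0, 0], [7, 29, 39]))
```

In a ReMB based design, the multiplication inside the inner loop would be carried out by a reconfigurable shift-add network whose multiplexer select lines choose the coefficient for the current cycle.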

### 4.4 Reconfigurable Multiplier Blocks for FPGAs

#### 4.4.1 ReMBs for 4-series FPGAs

The older series of Xilinx FPGAs such as the Virtex-4 are composed of basic building blocks including Configurable Logic Blocks (CLBs), DSPs, programmable interconnect, and I/O. Each CLB contains four slices, where each slice has two 4-input LUTs, carry-chain logic, and two flip-flops. A simplified schematic of a Xilinx 4-series half-slice is given in Figure 4.5.

![Figure 4.5: Xilinx 4-series (a) Configurable logic block and (b) simplified half slice](image)

A 1-bit full adder can be implemented using the dedicated carry-chain logic and an LUT for the remaining XOR gate. The concept of reconfigurability in multiplier blocks employs a multiplexer whose output is connected to at least one input of the adder. The multiplexer can be implemented using the unused pins of the LUT that is used for the XOR gate. This idea was first introduced by Turner et al. [143], in which a 4-input LUT followed by the dedicated fast carry logic is used to implement reconfigurable multiplier blocks with the CSE technique. Later, Demirsoy et al. [125] proposed an improved algorithm by using "basic structures" fused into the adder graph technique, where the structure of the ReMB is not dependent on the coefficient representations.

The main aim of both techniques is to make full use of FPGA logic elements in order to create a set of coefficients for SISO and SIMO systems.

A basic structure is simply a two input adder with at least one of its inputs connected to a multiplexer that can be implemented with a 4-input LUT. Due to the available FPGA logic technology, Demirsoy’s algorithm is based on the use of 2:1 multiplexers (muxes) which can be implemented on the FPGA slices with no additional cost. Figure 4.6 shows a basic structure and its mapping on Virtex-4 FPGA for different add/subtract operations [125].

\[ \text{(a) Sum} = \begin{cases} 2^{a} A \pm 2^{b_0} B_0 & \text{for } S = 0 \\ 2^{a} A \pm 2^{b_1} B_1 & \text{for } S = 1 \end{cases} \]

\[ \text{(b) Sum} = \begin{cases} 2^{a} A + 2^{b_0} B_0 & \text{for } S = 0 \\ 2^{a} A + 2^{b_1} B_1 & \text{for } S = 1 \end{cases} \]

\[ \text{(c) Sum} = \begin{cases} 2^{a} A - 2^{b_0} B_0 & \text{for } S = 0 \\ 2^{a} A - 2^{b_1} B_1 & \text{for } S = 1 \end{cases} \]

\[ \text{(d) Sum} = \begin{cases} 2^{a} A + 2^{b_0} B_0 & \text{for } S = 0 \\ 2^{a} A - 2^{b_1} B_1 & \text{for } S = 1 \end{cases} \]

Figure 4.6: (a) Basic structure with 2:1 multiplexer and 4-input LUT mapping for (b) addition, (c) subtraction and (d) addition/subtraction.

The basic structure given in Figure 4.6 (a) is capable of generating two different sums at the output, depending on the select line ($S$) of the mux. If the adder can operate in a switchable adder/subtractor mode controlled via the 'Add/Sub' input, then the basic structure generates four different outputs, provided that there is a separate control signal for the operation of the adder. The '≪' sign at the LUT inputs represents a bit-wise left shift by the corresponding positive or negative value (i.e. \( a, \pm b_0, \) and \( \pm b_1 \)), which is equivalent to multiplying the inputs \( A, B_0, \) and \( B_1 \) by powers of two. The \( C_{in} \) input of the basic structure given in Figure 4.6 represents the carry bit from the previous, less significant stage or the initial carry bit in an n-bit full adder carry chain. A basic structure with a 2:1 mux implements \( \text{Sum}_2 = 2^a A \pm 2^{b_0} B_0 \) or \( \text{Sum}_2 = 2^a A \pm 2^{b_1} B_1 \), thus it produces different fundamentals (i.e. partial sums) for the multiplier block. On an FPGA, the adder/subtractor has limited functionality, since the mux select line \( (S) \) controls both the mux and the operation of the adder/subtractor, as shown in Figure 4.6(d). Therefore, a basic structure can only produce two different partial products in an FPGA design. The inputs of the muxes can be connected to the input of the ReMB, to the output of another basic structure, or to ground. In order to implement a set of coefficients, a number of these basic structures can be interconnected in chain form (i.e. horizontally cascaded) and tree form (i.e. inputs of a mux connected to the output of another basic structure). The number of coefficients generated at the output depends on the basic structure topology, the number of basic structures and how they are interconnected. For example, if two basic structures of the type given in Figure 4.6 (b), each with 2 different outputs, are interconnected, then the output set size is 4 \( (2 \times 2) \).
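The behaviour of a basic structure with a 2:1 mux, and the way two of them in a chain yield four coefficients, can be modelled as follows. This is a behavioural sketch only: the function name, the mode strings, the restriction to non-negative shifts, the polarity chosen for the add/subtract mode (S = 1 subtracts) and the shift values in the example are all assumptions made for illustration.

```python
def basic_structure(a, b0, b1, a_shift, b0_shift, b1_shift, select, mode="add"):
    """Model Sum = 2^a_shift * a (+/-) 2^b_shift * b_selected for a 2:1 mux."""
    if mode not in ("add", "sub", "addsub"):
        raise ValueError("mode must be 'add', 'sub' or 'addsub'")
    # The select line chooses which shifted B input reaches the adder.
    b_term = (b1 << b1_shift) if select else (b0 << b0_shift)
    a_term = a << a_shift
    # In 'addsub' mode the same select line also sets the operation
    # (assumed polarity: select = 1 subtracts).
    subtract = mode == "sub" or (mode == "addsub" and select == 1)
    return a_term - b_term if subtract else a_term + b_term


x = 1  # unit input, so each output equals the coefficient it implements

# One structure in add/subtract mode: 8x + x or 8x - x.
stage1 = [basic_structure(x, x, x, 3, 0, 0, s, mode="addsub") for s in (0, 1)]

# Chain of two structures: the first output feeds input A of the second.
chain = sorted(
    basic_structure(first, x, x, 1, 0, 2, s2, mode="add")
    for first in stage1
    for s2 in (0, 1)
)
print(stage1, chain)
```

With these example shifts the single structure yields the coefficients 9 and 7, and the two-stage chain yields four distinct coefficients (15, 18, 19 and 22), matching the 2 x 2 output set size mentioned in the text.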
To find a valid ReMB design for a target coefficient set, it is critical to determine the required depth of the design, \( \text{depth}_{\text{remb}} \), and the adder cost of each coefficient. The depth of a design represents the number of cascaded stages required to obtain the required number of coefficients, which can be calculated using (4.1).

\[ N = n^{2^{k}-1} \tag{4.1} \]

where \( N \) is the size of the coefficient set at the output node, \( n \) is the maximum number of fundamentals that can be generated with a basic structure (i.e. \( n = 2 \)) and \( k \) is the number of cascaded basic structures (i.e. layers). In addition, in order to generate each coefficient, a minimum number of cascaded adders is required on each path between the input and output nodes, known as the adder cost (\( \text{cost}_{\text{adder}} \)) \[145\]. The adder cost of each coefficient can be calculated from the CSD representation of the coefficients using (4.2).

\[ \text{cost}_{\text{adder}} = \left\lceil \log_2(nz(C)) \right\rceil \tag{4.2} \]

where \( nz(C) \) represents number of non-zero terms for CSD coefficients. It is important to note that minimum adder depth for a design may require more adders to be employed than the required number for a minimal adder solution. Thus, following these requisites, ReMB depth can be generalized using (4.3) \[125\].

\[ \text{ReMB}_{\text{depth}} = \max\left(\min(k), \max(\text{cost}_{\text{adder}})\right) \tag{4.3} \]

Let us consider a set of coefficients \( C_1 = \{7, 19\} \) with set size \( N = 2 \). The adder cost for 7 (\( 8 - 1 \)) is one, whereas for 19 (\( 16 + 2 + 1 \)) it is two. Thus, a ReMB with a depth of two should be designed. On the other hand, a set \( C_2 = \{9, 15, 33, 63\} \) with \( N = 4 \), which consists of coefficients with an adder cost of one, requires a ReMB depth of two as well. An example adder graph showing chain and tree form basic structure interconnections, along with the maximum number of outputs, is given in Figure 4.7 \[125\].
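The two worked examples above can be reproduced with a short sketch of (4.2) and (4.3). It assumes that the logarithm in (4.2) is rounded up, since this is what reproduces the adder costs of one and two quoted for 7 and 19, and it takes the minimum number of layers as an argument supplied by the caller rather than deriving it from (4.1). All function names are hypothetical.

```python
def csd_nonzero_count(coefficient: int) -> int:
    """Count the non-zero digits in the CSD form of a positive integer."""
    count = 0
    value = coefficient
    while value != 0:
        if value & 1:
            value -= 2 - (value & 3)  # remove a +1 or -1 CSD digit
            count += 1
        value >>= 1
    return count


def adder_cost(coefficient: int) -> int:
    """Adder cost per (4.2), assuming the logarithm is rounded up."""
    nonzero = csd_nonzero_count(coefficient)
    if nonzero == 0:
        raise ValueError("coefficient must be a positive integer")
    return (nonzero - 1).bit_length()  # integer form of ceil(log2(nonzero))


def remb_depth(coefficients, min_layers: int) -> int:
    """ReMB depth per (4.3); min_layers is supplied by the caller."""
    return max(min_layers, max(adder_cost(c) for c in coefficients))


print(remb_depth([7, 19], min_layers=1))
print(remb_depth([9, 15, 33, 63], min_layers=2))
```

Both calls return a depth of two: in the first set the depth is dictated by the adder cost of 19, and in the second by the number of layers needed to offer four coefficients.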

#### 4.4.2 ReMBs for 7-series FPGAs

In the previous section, the concept of a ReMB utilizing 4-input LUTs to implement a 2:1 mux and a full adder was presented. More recent FPGAs such as the Kintex-7 replace the 4-input LUTs with 6-input LUTs, which increases the number of logic functions that can be realized by a single LUT by a factor of $2^{48}$ (from $2^{16}$ to $2^{64}$). A simplified half-slice of a Xilinx 7-series FPGA is given in Figure 4.8.

Figure 4.8: Simplified half slice of Xilinx 7-series FPGA.

In addition, the new LUTs can be configured either with six inputs and one output or with five or fewer inputs and two outputs. With the increased number of LUT input ports, the method introduced above can be expanded by replacing the 2:1 mux in the basic structure with either a 3:1 or a 4:1 mux at no additional cost; the modified basic structure is given in Figure 4.9 (a).

$$
\begin{align*}
\text{(a) Sum} &= \begin{cases} 
2^a A + 2^b B_0 & \text{for } S_0 = 0, S_1 = 0 \\
2^a A + 2^b B_1 & \text{for } S_0 = 1, S_1 = 0 \\
2^a A + 2^b B_2 & \text{for } S_0 = 0, S_1 = 1 
\end{cases}
\end{align*}
$$

$$
\begin{align*}
\text{(b) Sum} &= \begin{cases} 
2^a A + 2^b B_0 & \text{for } S_0 = 0, S_1 = 0 \\
2^a A + 2^b B_1 & \text{for } S_0 = 1, S_1 = 0 \\
2^a A + 2^b B_2 & \text{for } S_0 = 0, S_1 = 1 
\end{cases}
\end{align*}
$$

$$
\begin{align*}
\text{(c) Sum} &= \begin{cases} 
2^a A - 2^b B_0 & \text{for } S_0 = 0, S_1 = 0 \\
2^a A - 2^b B_1 & \text{for } S_0 = 1, S_1 = 0 \\
2^a A - 2^b B_2 & \text{for } S_0 = 0, S_1 = 1 
\end{cases}
\end{align*}
$$

$$
\begin{align*}
\text{(d) Sum} &= \begin{cases} 
2^a A + 2^b B_0 & \text{for } S_0 = 0, S_1 = 0 \\
2^a A - 2^b B_1 & \text{for } S_0 = 1, S_1 = 0 \\
2^a A + 2^b B_1 & \text{for } S_0 = 0, S_1 = 1 \\
2^a A - 2^b B_2 & \text{for } S_0 = 1, S_1 = 1 
\end{cases}
\end{align*}
$$

Figure 4.9: (a) Basic structure with 3:1/4:1 multiplexer and its 6-input LUT mapping for (b) addition, (c) subtraction and (d) addition/subtraction.

If the adder structure is either an adder or a subtractor, then the basic structure can produce three distinct results, as presented in Figures 4.9 (b) and (c), respectively. Figure 4.9 (b) represents an addition operation coupled with a 3:1 mux, where signals \( S_1 \) and \( S_0 \) represent the MSB and LSB of the mux control signal. In this configuration, \( S_1 = S_0 = 0 \) selects the first input of the mux, \( B_0 \) scaled by \( 2^b \), which is added to \( 2^a A \) and results in one of the three possible outputs, \( \text{Sum} = 2^a A + 2^b B_0 \). Similarly, Figure 4.9 (c) presents a subtraction operation in which the inputs are inverted and the subtractor is fed with a 'hot one' through the carry input of the LSB. For \( S_1 = 0 \) and \( S_0 = 1 \) the output of the mux is \( 2^b B_1 \), which is subtracted from \( 2^a A \), resulting in \( \text{Sum} = 2^a A - 2^b B_1 \). If, instead, the adder is a switchable adder/subtractor, then the basic structure can generate four distinct outputs, as shown in Figure 4.9 (d). A 4:1 mux requires 6 input ports, two for the select lines and four for the data inputs. Since the 7-series FPGAs consist of only 6-input LUTs, an extra input would require an additional LUT in order to realize the addition operation, which is not desired. Therefore, one of the inputs of the mux must be shared. Although input sharing might seem redundant and unnecessary, it actually provides more reconfigurability to the design. In addition, by switching the ports to which the input data are connected on the mux, three distinct combinations of output sets are obtained with the same inputs, because the mux select line (\( S_0 \) or \( S_1 \)) controls both the mux and the operation of the adder/subtractor. This simple modification adds more reconfigurability to the basic structure and reduces the hardware complexity further, especially for high order filters. Furthermore, the flexibility of using a 6-input LUT as a 5-input, 2-output LUT makes it possible to add two 2:1 mux outputs. This allows fundamentals such as 5 \((4 + 1)\) and 15 \((16 - 1)\) to be realized in one structure, which was not possible before.
The new structure and its mapping on an LUT with switchable adder/subtractor is demonstrated in Figure 4.10 and each input can be
added or subtracted similar to Figures 4.9 (b) and (c). Due to the dedicated resources of FPGA, adders in basic structures are implemented as ripple-carry adders.

\[
\begin{aligned}
\text{(a)} \quad \text{Sum} &= \begin{cases} 
2^a A_0 + 2^b B_0 & \text{for } S = 0 \\
2^a A_1 + 2^b B_1 & \text{for } S = 1 
\end{cases} \\
\text{(b)} \quad \text{Sum} &= \begin{cases} 
2^a A_0 + 2^b B_0 & \text{for } S = 0 \\
2^a A_1 + 2^b B_1 & \text{for } S = 1 
\end{cases}
\end{aligned}
\]

Figure 4.10: (a) Basic structure with two 2:1 muxes and (b) its mapping on a 5-input, 2-output LUT.
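
The behaviour of the structures in Figures 4.9 (d) and 4.10 can be checked with a short software model. The following Python sketch is a behavioural illustration only and does not represent the LUT mapping itself; the function names, the argument layout and the explicit `subtract` flag of the two-mux structure are assumptions introduced for this example.

```python
def basic_structure_4to1(a_in, b_in, a, b, s1, s0):
    """Behavioural model of Figure 4.9 (d).

    b_in = (B0, B1, B2). B1 is wired to two mux ports (input sharing)
    and the select LSB s0 also acts as the Add/Sub control.
    """
    b0, b1, b2 = b_in
    mux_ports = (b0, b1, b1, b2)          # four ports, three distinct inputs
    selected = mux_ports[(s1 << 1) | s0]  # s1 is the MSB, s0 the LSB
    if s0:                                # s0 = 1 selects subtraction
        return (a_in << a) - (selected << b)
    return (a_in << a) + (selected << b)  # s0 = 0 selects addition


def basic_structure_two_mux(a_pair, b_pair, a, b, s, subtract=False):
    """Behavioural model of Figure 4.10: two 2:1 muxes share select line s."""
    a_sel, b_sel = a_pair[s], b_pair[s]
    if subtract:
        return (a_sel << a) - (b_sel << b)
    return (a_sel << a) + (b_sel << b)


x = 7
# Fundamentals 5 = 4 + 1 and 15 = 16 - 1 produced by the same structure.
five_x = basic_structure_two_mux((x << 2, x << 4), (x, x), 0, 0, s=0)
fifteen_x = basic_structure_two_mux((x << 2, x << 4), (x, x), 0, 0, s=1, subtract=True)
print(five_x, fifteen_x)  # 35 105
```

In this model the shared input $B_1$ appears on two mux ports, so the four select codes yield $+B_0$, $-B_1$, $+B_1$ and $-B_2$, matching the four cases listed for Figure 4.9 (d).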

### 4.5 ReMB for Daubechies Filters in Biomedical Applications

As mentioned earlier in this chapter, the DWT can be implemented as a filter bank with a tree structure which employs constant-coefficient scaling and wavelet filters. The coefficients of these filters are irrational and vary according to the selection of the mother wavelet. The Daubechies-4 \((db_4)\) wavelet, with four vanishing moments and eight coefficients, is a popular wavelet that has been used in many different biomedical signal processing applications. These applications include ECG denoising [147, 148] and feature extraction [149, 150], EEG spike detection [151, 152] and artifact removal [153], and motion detection using EMG [154]. Due to the wide application area of \(db_4\), an ReMB is designed and implemented for a Kintex-7 FPGA using the principles described in the previous sections. However, ReMBs can also be designed for other wavelets.

4.5.1 \(db_4\) Filter Coefficient Quantization

A floating-point implementation of the wavelet transform leads to perfect reconstruction (PR) FBs, meaning that the reconstructed output perfectly matches the input data. The accuracy of the DWT therefore depends on the precision of the decomposition and reconstruction filter coefficients, as well as on the precision of the FB internal data-path. Quantization of the filter coefficients deteriorates the filter characteristics and, in turn, the PR property of the DWT. In addition, the coefficient word-length plays a significant role in the design of the ReMBs, since the structure of the multiplier block depends on the desired coefficient precision. Longer word-lengths result in an increased number of adders and thus higher resource utilization.

On the other hand, an insufficient number of bits will deteriorate the filter characteristics and the smoothness of the wavelet and scaling functions, as quantization with lower precision displaces the filter zeros away from $z = -1$ (i.e. the Nyquist frequency). In order to evaluate the effect of filter coefficient quantization, the input and output data as well as the internal data-path of the DWT FB are designed to have floating-point precision, whereas the decomposition and reconstruction filter coefficients are quantized with various precisions and employed in multi-level decomposition and reconstruction. The coefficient word-lengths are varied from 7 bits (6 fractional bits and 1 integer bit) to 15 bits (14 fractional bits and 1 integer bit). Two realistic noise-free synthetic signals, one ECG and one EEG, are simulated and fed as input data to the DWT FB for evaluation purposes. The synthetic ECG waveform is generated using the model introduced by McSharry et al. [48], whereas the synthetic EEG waveform is simulated by feeding white noise into a $7^{th}$-order AR model which is established from a real noise-free EEG record obtained from Physionet [114]. The MSE and the Signal-to-Error Ratio (SER) between the floating-point inputs and the reconstructed outputs of the fixed-point coefficient DWT FB are measured. Figure 4.11 shows the measured error values for different coefficient word-lengths for both ECG (blue) and EEG (red) data with a 5-level DWT. As can be observed, increasing the coefficient word-length decreases the introduced MSE and improves the SER.
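
As an illustration of the two error measures, the sketch below computes an MSE and an SER in decibels for a reference signal and its reconstruction. The exact definitions used for Figure 4.11 are not restated in this section, so the formulas here (mean squared error, and signal energy divided by error energy) are assumed, commonly used forms, and the function names and test signal are hypothetical.

```python
import math


def mse_db(reference, reconstructed):
    """Mean squared error between two equal-length sequences, in dB."""
    errors = [r - y for r, y in zip(reference, reconstructed)]
    mse = sum(e * e for e in errors) / len(errors)
    return 10.0 * math.log10(mse)


def ser_db(reference, reconstructed):
    """Signal-to-error ratio (signal energy over error energy), in dB."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - y) ** 2 for r, y in zip(reference, reconstructed))
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: a unit-amplitude signal reconstructed with a 0.001 offset.
reference = [1.0, -1.0, 1.0, -1.0]
reconstructed = [r + 0.001 for r in reference]
print(mse_db(reference, reconstructed))  # about -60 dB
print(ser_db(reference, reconstructed))  # about 60 dB
```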

Figure 4.11: Estimated (a) MSE and (b) SER of the reconstructed output with various filter coefficient precision.

Figure 4.12 (a) presents the filter responses at each decomposition level, where D1-D5 are the highpass branch responses at levels 1 to 5, respectively, and A5 is the lowpass branch response at level 5. Figure 4.12 (b) compares the wavelet and scaling filter responses, whereas Figure 4.12 (c) presents the $db_4$ scaling and wavelet functions obtained with the floating-point (blue) and 11-bit (red) coefficients. Finally, Figure 4.12 (d) shows the Pole-Zero Plane (PZP) of the floating-point and fixed-point scaling filters. Approximately -72 dB MSE and 140 dB SER are observed with a coefficient word-length of 11 bits (10 fractional bits). In addition, Figures 4.12 (b) and (c) demonstrate that coefficients represented with 11 bits preserve the smoothness and the characteristics of the wavelet and scaling filters and functions. The smoothness is directly linked to the number of zeros at the Nyquist frequency, and the PZP of the quantized filter demonstrates that two zeros moved towards $z = -1$ while the remaining two are still closely placed. Therefore, the error introduced at the reconstructed output with 11-bit coefficients is considered negligible for this study, as it is not observable with the naked eye and the wavelet properties are preserved.

Figure 4.12: Frequency response of (a) fixed-point filters at each decomposition level, (b) Wavelet and Scaling filters with floating-point and 11-bit fixed-point coefficients (red), (c) Scaling and Wavelet function associated with $db4$, and (d) Pole-zero plane of the floating and 11-bit fixed-point coefficients of $db4$ scaling filter.
4.5.2 ReMB Structures for \( db4 \) Filters

The fixed-point \( db4 \) filter coefficients and their scaled (by \( 2^{10} \)) integer values are given in Table 4.1.

<table>
<thead>
<tr><th></th><th colspan="2">Synthesis low-pass filter</th></tr>
<tr><th>Coefficient</th><th>Fixed-point</th><th>Integer (scaled by $2^{10}$)</th></tr>
</thead>
<tbody>
<tr><td>b0</td><td>0.23046875</td><td>236</td></tr>
<tr><td>b1</td><td>0.71484375</td><td>732</td></tr>
<tr><td>b2</td><td>0.630859375</td><td>646</td></tr>
<tr><td>b3</td><td>-0.0283203125</td><td>-29</td></tr>
<tr><td>b4</td><td>-0.1875</td><td>-192</td></tr>
<tr><td>b5</td><td>0.03125</td><td>32</td></tr>
<tr><td>b6</td><td>0.033203125</td><td>34</td></tr>
<tr><td>b7</td><td>-0.0107421875</td><td>-11</td></tr>
</tbody>
</table>
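
The integer values above can be reproduced by rounding the double-precision $db_4$ scaling filter coefficients to 10 fractional bits. The following Python sketch assumes round-to-nearest quantization and uses the commonly published double-precision $db_4$ low-pass coefficients; both the rounding rule and the names `DB4_LOWPASS` and `quantize` are assumptions made for this illustration.

```python
# Commonly published double-precision db4 low-pass (scaling) coefficients.
DB4_LOWPASS = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]


def quantize(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    integers = [round(c * scale) for c in coeffs]
    fixed_point = [i / scale for i in integers]
    return integers, fixed_point


integers, fixed_point = quantize(DB4_LOWPASS, 10)
print(integers)     # [236, 732, 646, -29, -192, 32, 34, -11]
print(fixed_point)

# The alternating-sign sum is the filter response at z = -1 (Nyquist).
nyquist_response = sum(c if k % 2 == 0 else -c for k, c in enumerate(integers))
print(nyquist_response)  # 0
```

The alternating-sign sum of the integer coefficients is exactly zero, which means that the 11-bit filter still has at least one zero located exactly at $z = -1$.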

The lowpass \( (h_0(k)) \) and highpass \( (h_1(k)) \) \( db4 \) filters employed in both the synthesis and analysis FBs are power complementary, which means that all four filters have the same coefficients but with alternating signs. Thus, there are only eight distinct coefficients, and a single ReMB structure is designed for all filters using the basic structures introduced in Subsection 4.4.2, with an additional 2:1 mux at the ReMB output in order to select between positive and negative coefficients. First, \( \text{ReMB}_{\text{depth}} \) is calculated according to (4.1) with \( n = 4 \) and \( N = 8 \), which shows that 2 cascaded stages are required. Then, the maximum adder cost over all eight coefficients is calculated to be 2. Thus, (4.3) states that the depth of the ReMB is 2. Although \( \text{ReMB}_{\text{depth}} \) provides a topological minimum, it does not consider the operations required by each coefficient or the limitations of each basic structure in an FPGA implementation. Therefore, in order to combine the fundamentals of all coefficients, three basic structures are interconnected in a tree structure and the LSBs of the mux select lines are used to control the adder/subtractor operation. The adder costs and shift-add operations for the $db_4$ coefficients are given in Table 4.2, and the structure of the proposed ReMB (referred to as Design 1 in the rest of this section) is presented in Figure 4.13.

**Table 4.2**: Fixed-point (11-bit) $db_4$ wavelet filter coefficients, their adder costs and shift-add format used to design the proposed ReMB.

<table>
<thead>
<tr>
<th>$z$</th>
<th>Adder Cost</th>
<th>Shift-Add</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>2</td>
<td>$2^2(2 + 1) - 1$</td>
</tr>
<tr>
<td>34</td>
<td>1</td>
<td>$2(2^4 + 1)$</td>
</tr>
<tr>
<td>32</td>
<td>0</td>
<td>$2^5$</td>
</tr>
<tr>
<td>192</td>
<td>1</td>
<td>$2^6(2 + 1)$</td>
</tr>
<tr>
<td>29</td>
<td>2</td>
<td>$2^2(4 + 1) + (2^3 + 1)$</td>
</tr>
<tr>
<td>646</td>
<td>2</td>
<td>$2(2^6(4 + 1) + (2 + 1))$</td>
</tr>
<tr>
<td>732</td>
<td>2</td>
<td>$2^2(2^6(2 + 1) - (2^3 + 1))$</td>
</tr>
<tr>
<td>236</td>
<td>2</td>
<td>$2^2(2^2(16 - 1) - 1)$</td>
</tr>
</tbody>
</table>
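
The shift-add format and the adder cost of each coefficient can be checked mechanically. The Python sketch below builds each coefficient as a small expression tree of shifts, additions and subtractions, written so that every expression evaluates to its coefficient, and then evaluates both the resulting constant and the number of cascaded adders. The tree representation and all helper names are hypothetical and serve only as an illustration.

```python
X = ("x",)  # the multiplier-block input


def shl(expr, k):
    return ("shl", expr, k)


def add(left, right):
    return ("add", left, right)


def sub(left, right):
    return ("sub", left, right)


def evaluate(expr, x):
    """Evaluate a shift-add expression tree for input x."""
    op = expr[0]
    if op == "x":
        return x
    if op == "shl":
        return evaluate(expr[1], x) << expr[2]
    left, right = evaluate(expr[1], x), evaluate(expr[2], x)
    return left + right if op == "add" else left - right


def adder_depth(expr):
    """Maximum number of adders cascaded from the input to the output."""
    op = expr[0]
    if op == "x":
        return 0
    if op == "shl":  # shifts are wiring only
        return adder_depth(expr[1])
    return 1 + max(adder_depth(expr[1]), adder_depth(expr[2]))


three, five = add(shl(X, 1), X), add(shl(X, 2), X)
nine, fifteen = add(shl(X, 3), X), sub(shl(X, 4), X)

SHIFT_ADD = {
    11: sub(shl(three, 2), X),
    34: shl(add(shl(X, 4), X), 1),
    32: shl(X, 5),
    192: shl(three, 6),
    29: add(shl(five, 2), nine),
    646: shl(add(shl(five, 6), three), 1),
    732: shl(sub(shl(three, 6), nine), 2),
    236: shl(sub(shl(fifteen, 2), X), 2),
}

for coeff, expr in SHIFT_ADD.items():
    print(coeff, evaluate(expr, 1), adder_depth(expr))
```

Evaluating each tree with an input of 1 returns the coefficient itself, and the reported depth corresponds to the adder cost column.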

**Figure 4.13**: The ReMB designed for $db_4$ wavelet filters.

Design 1 is targeted at FPGA platforms; it takes advantage of the dedicated fast carry logic and implements multiplexers at no additional cost, as described before. The operation of the adders is controlled by the LSBs of the 2-bit mux select lines, S0:S2, which are denoted as ‘Add/Sub’. The adder performs an addition when Add/Sub = 0 and a subtraction when Add/Sub = 1. However, when non-FPGA technologies are targeted, the ReMB can be redesigned with the increased flexibility of using larger muxes. In FPGA platforms, the resource cost of an individual mux with more than three inputs is comparable to that of an adder. For instance, a 4-bit 4:1 mux and a 4-bit full adder utilize four LUTs each, whereas in non-FPGA technologies a multiplexer is relatively cheap. Therefore, a second multiplier block is designed using the CSE method by searching for common patterns in the CSD representations of the coefficients, as shown in Table 4.3.

Table 4.3: CSD encoded coefficients and common sub-expressions

<table>
<thead>
<tr><th>Coefficient</th><th>$2^{10}$</th><th>$2^9$</th><th>$2^8$</th><th>$2^7$</th><th>$2^6$</th><th>$2^5$</th><th>$2^4$</th><th>$2^3$</th><th>$2^2$</th><th>$2^1$</th><th>$2^0$</th></tr>
</thead>
<tbody>
<tr><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>$\bar{1}$</td></tr>
<tr><td>34</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>
<tr><td>32</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>192</td><td>0</td><td>0</td><td>1</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>29</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>1</td></tr>
<tr><td>646</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>$\bar{1}$</td><td>0</td></tr>
<tr><td>732</td><td>1</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>0</td></tr>
<tr><td>236</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>$\bar{1}$</td><td>0</td><td>0</td></tr>
</tbody>
</table>

In Table 4.3, $\bar{1}$ denotes the digit $-1$.
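
The CSD representation of each coefficient is unique and can be generated with the standard non-adjacent-form recoding. The following Python sketch is one possible way to produce the digits; the function names and the use of `N` to print the digit $-1$ are choices made for this example only.

```python
def csd_digits(n):
    """Return the CSD digits of a non-negative integer, LSB first."""
    digits = []
    while n:
        if n & 1:
            digit = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n >>= 1
    return digits


def format_csd(n, width=11):
    """Render the digits MSB first, printing the digit -1 as 'N'."""
    digits = csd_digits(n)
    digits += [0] * (width - len(digits))
    symbols = {1: "1", 0: "0", -1: "N"}
    return " ".join(symbols[d] for d in reversed(digits))


for coeff in (11, 34, 32, 192, 29, 646, 732, 236):
    print(f"{coeff:>3}: {format_csd(coeff)}")
```

Each printed row lists the digits from $2^{10}$ down to $2^0$. No two non-zero digits are adjacent, which is the defining property of the CSD form.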

Two common patterns are detected (shaded areas in Table 4.3) and shared across all coefficients. This design, shown in Figure 4.14 (a) and referred to as Design 2 in the rest of this section, employs three adders, one 2:1 mux and two 6:1 muxes. Both structures are compared to the structure created with the DAG fusion algorithm using the SPIRAL online tool (referred to as Design 3 in the rest of this section), which aims to minimize the adder cost. Design 3 is given in Figure 4.14 (b) and employs three adders, three 2:1 muxes and one 6:1 mux. For both designs, a controller is responsible for generating the select signals for the muxes and the ‘Add/Sub’ signals for the adders.

Figure 4.14: Constant multiplier blocks designed for db4 filter coefficients. (a) Design 2 and (b) Design 3

4.5.3 db4 Filters and DWT Filter Bank Architectures

Biomedical signals have frequency bands of up to a few kHz, hence they require comparatively low operating frequencies. Recent FPGAs can be clocked several orders of magnitude faster, therefore time-multiplexed architectures can easily be used, and in this way the hardware utilization of the DWT can be greatly reduced. A conventional time-multiplexed TDL filter is composed of an input memory, a coefficient memory and a single Multiply-Accumulate unit with a General Purpose (GP) multiplier. Such a filter operates sequentially: at every cycle, the incoming data is multiplied with one coefficient stored in the memory, and this process is controlled by a simple control unit, typically a counter. Each generated product is accumulated with the previous ones by using an accumulator and a register. The structure of the conventional filter is presented in Figure 4.15 (a).
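
The sequential operation of the conventional filter can be sketched in software as follows. This is a behavioural illustration under simplifying assumptions (integer data, one inner-loop iteration per clock cycle); the function and variable names are hypothetical, and the coefficient list is the integer lowpass analysis filter used in this chapter.

```python
def time_multiplexed_fir(samples, coeff_rom):
    """Behavioural model of a time-multiplexed TDL filter with one MAC unit."""
    n_taps = len(coeff_rom)
    input_memory = [0] * n_taps
    outputs = []
    for sample in samples:
        input_memory = [sample] + input_memory[:-1]  # shift in the new sample
        accumulator = 0
        for address in range(n_taps):  # counter acting as the control unit
            product = coeff_rom[address] * input_memory[address]  # GP multiplier
            accumulator += product      # accumulator and register
        outputs.append(accumulator)
    return outputs


H0 = [-11, 34, 32, -192, -29, 646, 732, 236]  # integer h0(k) coefficients
impulse = [1] + [0] * 7
print(time_multiplexed_fir(impulse, H0))  # the impulse response equals H0
```

One output sample therefore costs as many clock cycles as there are taps, which is acceptable when the clock frequency is far above the signal bandwidth.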

On the other hand, the proposed ReMB produces the result of the input and coefficient multiplication at each clock cycle, which eliminates the need for a coefficient memory. As can be observed from Figure 4.15 (b), a multiplexer placed after the ReMB is responsible for selecting between the generated result and its complement (i.e. between the positive and the negative coefficient). Here, the controller is responsible for addressing the correct coefficient for each tap by generating the correct control signals for the multiplexers and adders/subtractors employed in the ReMB as well as for the multiplexer after it.
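
A software analogue of the ReMB-based filter is sketched below. It is a behavioural model only: the dictionary lookup stands in for the decoder-driven mux select lines and does not reproduce the actual select values, the sign handling stands in for the output 2:1 mux, and all names are hypothetical. Each product is formed purely from shifts, additions and subtractions.

```python
# Shift-add networks for the eight coefficient magnitudes.
REMB = {
    11: lambda x: (((x << 1) + x) << 2) - x,
    34: lambda x: ((x << 4) + x) << 1,
    32: lambda x: x << 5,
    192: lambda x: ((x << 1) + x) << 6,
    29: lambda x: (((x << 2) + x) << 2) + ((x << 3) + x),
    646: lambda x: ((((x << 2) + x) << 6) + ((x << 1) + x)) << 1,
    732: lambda x: ((((x << 1) + x) << 6) - ((x << 3) + x)) << 2,
    236: lambda x: ((((x << 4) - x) << 2) - x) << 2,
}

H0 = [-11, 34, 32, -192, -29, 646, 732, 236]  # integer h0(k) coefficients


def remb_fir(samples, coeffs=H0):
    """Time-multiplexed FIR filter without a multiplier or coefficient memory."""
    n_taps = len(coeffs)
    input_memory = [0] * n_taps
    outputs = []
    for sample in samples:
        input_memory = [sample] + input_memory[:-1]
        accumulator = 0
        for tap in range(n_taps):  # counter value addresses the tap
            product = REMB[abs(coeffs[tap])](input_memory[tap])
            # Output mux: pass the product or its negated value.
            accumulator += -product if coeffs[tap] < 0 else product
        outputs.append(accumulator)
    return outputs


print(remb_fir([1] + [0] * 7))  # the impulse response equals H0
```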

The controller is simply an up-counter followed by a decoder: the counter generates the address that controls the input memory, and the decoder decodes these addresses to generate the mux select lines. Table 4.4 presents the select line values required to generate the lowpass analysis filter \((h_0(k))\) coefficients. \(S_0 : S_3\) correspond to the select lines of the muxes given in Figure 4.13 and \(S_{4-h_0}\) is the select line of the mux at the output of the ReMB given in Figure 4.15 (b). Select line values 0, 1, 2 and 3 choose the mux inputs from top to bottom, with 0 selecting the top input and 3 the bottom input, and \(X\) is a Don’t Care indicating that the corresponding muxes and adders are not employed in the generation of that coefficient.

Table 4.4: Select line (S0:S4) values for multiplexers given in Figures 4.13 and 4.15 (b) to generate the lowpass analysis filter coefficients.

<table>
<thead>
<tr>
<th>(Z)</th>
<th>(S_0)</th>
<th>(S_1)</th>
<th>(S_2)</th>
<th>(S_3)</th>
<th>(S_{4-h_0})</th>
</tr>
</thead>
<tbody>
<tr>
<td>(b_0)</td>
<td>-11</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>(b_1)</td>
<td>34</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>(b_2)</td>
<td>32</td>
<td>3</td>
<td>(X)</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>(b_3)</td>
<td>-192</td>
<td>3</td>
<td>1</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>(b_4)</td>
<td>-29</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>(b_5)</td>
<td>646</td>
<td>0</td>
<td>2</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>(b_6)</td>
<td>732</td>
<td>2</td>
<td>1</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>(b_7)</td>
<td>236</td>
<td>1</td>
<td>3</td>
<td>1</td>
<td>2</td>
</tr>
</tbody>
</table>

Figure 4.16 presents the structure of the control unit designed for controlling \(h_0(k)\), where \(S_{x1}\) and \(S_{x0}\) represent the MSB and LSB of the mux select line \(S_x\), respectively. In addition, \(A\), \(B\) and \(C\) represent the decoded counter output, from MSB to LSB, respectively, while $\bar{A}$, $\bar{B}$ and $\bar{C}$ are their complements. The truth tables used to generate this controller are presented in Appendix A, Table A.1.


Figure 4.16: The controller for the lowpass analysis filter ($h_0(k)$).

The filter structures presented in Figure 4.15 are employed in two tree-structured one-level analysis FBs as the lowpass ($h_0(k)$) and the highpass ($h_1(k)$) $db4$ filters. The architectures of both FBs are shown in Figures 4.17 (a) and (b). The conventional FB with GP multipliers uses two separate coefficient Read-Only Memories (ROMs), a single input memory and a controller. For the FB given in Figure 4.17 (b), on the other hand, a bigger decoder is designed to simultaneously control both ReMBs in $h_0(k)$ and $h_1(k)$ and the input memory; it replaces the two ROMs employed in Figure 4.17 (a) and is presented in Figure 4.18. The truth tables used to design the ReMB of $h_1(k)$ are presented in Appendix A, Table A.2.

Figure 4.17: One-level analysis filter bank comprised of lowpass ($h_0(k)$) and highpass ($h_1(k)$) time-multiplexed TDL filters with (a) parallel multiplier and coefficient memory and (b) the proposed ReMB.

Figure 4.18: The controller for the analysis filter bank.
4.6 Hardware Validation and Cost Assessment

For hardware validation, cost assessment and performance evaluation, the three aforementioned multiplier block designs (Figures 4.13 and 4.14) and the 1-level analysis filter banks employing conventional FIR filters and filters with the proposed ReMB (Figure 4.17) are synthesized and implemented on a Kintex-7 (xc7k325tffg900) FPGA in Vivado v16.2, using the System Generator for DSP in the Matlab/Simulink environment. The resource utilization of the multiplier blocks in terms of LUT and slice counts, as well as the critical path delay ($\text{Delay}_{\text{cp}}$) in terms of the adder operation time, $\tau_a$, are reported. In addition, the filter bank resource utilization is presented in terms of LUTs and flip-flops, and the maximum operating frequencies and power consumption figures are compared with the designs available in the open literature.

4.6.1 Multiplier Block Cost Assessment

The multiplier block structures introduced in Subsection 4.5.2 and presented in Figures 4.13 (Design 1) and 4.14 (Design 2 and Design 3) each employ three adders. However, while Design 1 and Design 2 have an adder cost (i.e. the maximum number of adders cascaded from the input to the output) of two, the adder cost of Design 3 is three, which is a significant factor for the data-path delay and power consumption of the multiplier blocks. All three structures are synthesized, placed and routed in Vivado v16.2, and signed 8-bit and signed 11-bit data are fed as inputs to each multiplier structure. It should be noted that the internal data-path of the multiplier structure is kept at full precision in order to avoid quantization errors. Therefore, Design 1 employs a 15-bit and a 16-bit adder/subtractor in the first stage and a 23-bit adder/subtractor in the second stage of the ReMB. Table 4.5 presents the resource utilization figures after place and route in terms of LUTs and slices, which are compared to each other as well as to the off-the-shelf Xilinx Multiplier LogiCORE™ (version 12.0) [157].

Table 4.5: Resource utilization of individual MCM blocks that are designed using the ReMB, CSE and DAG fusion methods as well as the Xilinx Multiplier LogiCORE™.

<table>
<thead>
<tr><th></th><th>Design 1</th><th>Design 2</th><th>Design 3</th><th>Xilinx Multiplier LogiCORE™</th></tr>
</thead>
<tbody>
<tr><td>Input word-length (bits)</td><td>8 / 11</td><td>8 / 11</td><td>8 / 11</td><td></td></tr>
<tr><td>LUT</td><td>70 / 82</td><td>102 / 122</td><td>117 / 144</td><td></td></tr>
<tr><td>Slice</td><td>21 / 26</td><td>31 / 34</td><td>34 / 41</td><td></td></tr>
<tr><td>$\text{Delay}_{\text{cp}}$</td><td>$2\tau_a$</td><td>$2\tau_a$</td><td>$3\tau_a$</td><td></td></tr>
</tbody>
</table>

As mentioned earlier, Design 1 targets FPGA platforms and uses the dedicated logic efficiently owing to the placement of multiplexers before adders. Thus, it utilizes the fewest resources of all the structures. For an 11-bit input, which is a typical precision for biomedical signals, Design 1 employs 82 LUTs and 26 slices, which is 51\% less than the Xilinx Multiplier LogiCORE™. Design 2, on the other hand, reduces the resource utilization by 27.8\% compared to the Xilinx Multiplier LogiCORE™. Design 2 also employs fewer muxes than Design 3 and provides a better solution, which makes it a better candidate for non-FPGA platforms when implementing db4 filters for the WT. In other words, the DAG fusion algorithm is not optimal in terms of adder cost and the number of multiplexers used, and it requires a more sophisticated controller. Nevertheless, the results demonstrate that Design 1 is the best structure for the db4 filters on FPGA platforms and can be used to replace the parallel multipliers and coefficient memory.
4.6.2 Time-Multiplexed FIR Filters Cost Assessment

The Xilinx Multiplier LogiCORE™ and Design 1 are also employed in the time-multiplexed TDL filter architectures shown in Figures 4.15 (a) and (b), respectively, and their resource utilizations are compared. The resource utilization of each architecture after place and route is given in Table 4.6 in terms of LUTs and Flip-Flops (FFs). In addition, the critical path delay ($\text{Delay}_{\text{cp}}$) of the multiplier and the multiplier block is expressed in terms of the adder and multiplier operation times, \( \tau_a \) and \( \tau_m \), respectively.

Table 4.6: Resource utilization of the time-multiplexed TDL filters with the proposed ReMB and the Xilinx Multiplier after place and route on a Xilinx Kintex-7 device.

<table>
<thead>
<tr>
<th>Filter Resource Utilization</th>
<th>Figure 4.15 (a)</th>
<th>Figure 4.15 (b)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LUT</td>
<td>219</td>
<td>133</td>
</tr>
<tr>
<td>FF</td>
<td>141</td>
<td>108</td>
</tr>
<tr>
<td>Slices</td>
<td>93</td>
<td>57</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Multiplier Resource Utilization</th>
<th>Xilinx Multiplier</th>
<th>ReMB</th>
</tr>
</thead>
<tbody>
<tr>
<td>LUT</td>
<td>169</td>
<td>98(^1)</td>
</tr>
<tr>
<td>Max. Frequency (MHz)</td>
<td>82</td>
<td>82</td>
</tr>
</tbody>
</table>

\(^1\)This number includes the 2:1 mux placed after the ReMB, which selects between the positive and negative coefficients.

It should be noted that pipelining is not considered for these implementations; for the ReMB, however, it can be achieved by adding registers after the adders and multiplexers [125]. Table 4.6 shows that the resource utilization of Design 1 is lower than that of the reference design. The cost of the reference design is 219 LUTs, 141 FFs and 93 slices, of which the multiplier on its own costs 169 LUTs. Design 1, on the other hand, demonstrates high savings against the reference design: the overall filter cost is reduced by 33%, which is achieved by a 39% reduction in LUTs and a 23% reduction in FFs, while the ReMB on its own utilizes 43% fewer LUTs. In terms of critical path delay, multiplexer delays are not included in the path delay in FPGA implementations since the multiplexers are embedded into the LUTs; it is therefore only critical to consider the logic depth of the adders. Multiplier blocks have a reduced logic depth compared to multipliers, which reduces the critical path delay. Thus, Design 1 offers the best solution for db4 filters in FPGA-targeted applications.
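
The filter-level savings quoted above follow directly from the LUT and FF counts in Table 4.6. The short Python sketch below reproduces the arithmetic; it assumes that the overall 33% figure counts LUTs and FFs together, and the helper name is hypothetical.

```python
def saving_percent(reference, proposed):
    """Relative saving of the proposed design with respect to the reference."""
    return 100.0 * (reference - proposed) / reference


lut_saving = saving_percent(219, 133)                    # filter LUTs
ff_saving = saving_percent(141, 108)                     # filter FFs
overall_saving = saving_percent(219 + 141, 133 + 108)    # LUTs and FFs combined

print(f"LUT: {lut_saving:.1f}%  FF: {ff_saving:.1f}%  overall: {overall_saving:.1f}%")
# LUT: 39.3%  FF: 23.4%  overall: 33.1%
```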

4.6.3 1-level Analysis Filter Bank Cost Assessment

The time-multiplexed filters implemented using the proposed ReMB and the general-purpose multiplier are employed in a 1-D analysis DWT filter bank in order to provide insight into a whole system. For this purpose, a conventional tree architecture is implemented in which two analysis filters ($h_0(k)$ and $h_1(k)$) perform one level of decomposition. The structures of the filter bank employing the multiplier-free filters and of the control unit are presented in Figures 4.17 and 4.18, respectively. In order to validate the performance of the system, an 8-bit random signal and an 11-bit ECG data record from the MIT-BIH Arrhythmia database are used. The resource utilization after place and route for both filter banks is presented in Table 4.7 in terms of LUTs, registers, adders and multipliers. In addition, Table 4.7 compares the resource utilization, maximum clock frequency and dynamic power consumption (where available) of other multiplier-free $db_4$ analysis filter bank implementations from the open literature with those of the proposed ReMB implementation. System power consumption is estimated at a clock frequency of 50 MHz in order to have a fair comparison with the literature, and the Xilinx Power Estimator tool is used for a more accurate analysis [158]. The one-level analysis FB employing the proposed ReMB utilizes 255 LUTs and 129 FFs and achieves a maximum operating frequency of 90.38 MHz, while its power consumption is estimated as 3.404 mW at a 50 MHz clock frequency for 11-bit ECG input data.

In [159], Wahid et al. presented a matrix-based Algebraic Integer Quantization (AIQ) mapped architecture and a conventional fixed-point 1-level decomposition $db_4$ filter bank architecture for image processing applications. The $db_4$ transform coefficients are implemented using the CSD representation to replace multipliers with shift-add networks. It is reported that 22 adders are required, and the hardware cost is listed as 734 and 692 LUTs for the AIQ and fixed-point (FP) based implementations, respectively, for coefficients with 10-bit precision. A more recent study by Hasan et al. [160] proposed two lifting-based architectures for the $db_4$ wavelet filters. Here, the filter coefficients were divided into lifting steps and shift-add networks were used to implement two 1-level decomposition FBs without multipliers. The lifting scheme lowers the resource utilization of DWT FBs since polyphase matrices are used and the structure halves the computational complexity. However, as the number of wavelet filter coefficients increases, the calculation of the lifting coefficients becomes more sophisticated and more arithmetic operations are required to represent them, which increases their complexity. The resource utilizations of the two FBs were reported as 470 LUTs and 133 registers for Scheme 1 and 389 LUTs and 101 registers for Scheme 2.

The proposed ReMB-based filter bank employs 216 LUTs and 108 registers, whereas the GP multiplier based FB employs 315 LUTs and 137 registers, for 8-bit input data. These results demonstrate that, although a conventional tree filter bank structure is used, the proposed structure requires the least hardware resources: compared to the two schemes of Hasan et al., hardware complexity reductions of 46% and 34% are achieved, respectively. It is critical to note that these savings are achieved through the ReMB itself, and no filter bank architecture optimization such as a polyphase implementation is used. In [159], the power consumption figures of the two implemented architectures were also reported as 6.8 and 8.1 mW, respectively, whereas the power consumption of the proposed and the reference designs is 2.93 and 4.41 mW, respectively. However, the technology used in the literature is relatively old compared to the technology used in this work, so it is not fair to compare these power figures. Therefore, the power consumption figures are compared against the reference analysis filter bank: the reference design dissipates 50.5% and 62% more dynamic power than the ReMB-based design for 8-bit and 11-bit input data, respectively. This demonstrates that the power consumption of the parallel multiplier grows faster with increasing input word-length than that of the ReMB.
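
The percentages in this comparison can be traced back to the reported figures. The Python sketch below shows one reading that reproduces them; it is an assumption that the 46% and 34% LUT reductions refer to the 255-LUT (11-bit) ReMB filter bank and that the 50.5% and 62% power figures are expressed relative to the ReMB-based design. The helper names are hypothetical.

```python
def reduction_percent(reference, proposed):
    """Saving expressed relative to the reference design."""
    return 100.0 * (reference - proposed) / reference


def excess_percent(reference, proposed):
    """Extra cost of the reference expressed relative to the proposed design."""
    return 100.0 * (reference - proposed) / proposed


# LUT counts: lifting Scheme 1 and Scheme 2 versus the ReMB filter bank.
lut_vs_scheme1 = reduction_percent(470, 255)
lut_vs_scheme2 = reduction_percent(389, 255)

# Dynamic power in mW at 50 MHz: reference versus ReMB-based filter bank.
power_excess_8bit = excess_percent(4.41, 2.93)
power_excess_11bit = excess_percent(5.52, 3.404)
power_reduction_8bit = reduction_percent(4.41, 2.93)
power_reduction_11bit = reduction_percent(5.52, 3.404)

print(f"{lut_vs_scheme1:.1f} {lut_vs_scheme2:.1f}")                # 45.7 34.4
print(f"{power_excess_8bit:.1f} {power_excess_11bit:.1f}")         # 50.5 62.2
print(f"{power_reduction_8bit:.1f} {power_reduction_11bit:.1f}")   # 33.6 38.3
```

Expressed relative to the reference design, the same power figures correspond to reductions of about 34% and 38% for 8-bit and 11-bit input data, respectively.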
Table 4.7: Resource Utilization and Power Consumption of the Multiplier Free db4 Filter Bank Architectures

<table>
<thead>
<tr>
<th>Architecture</th>
<th>Matrix</th>
<th>Matrix</th>
<th>DA$^4$</th>
<th>Lifting</th>
<th>Lifting</th>
<th>Time-multiplexed</th>
<th>Time-multiplexed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Input word length (bits)</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>11</td>
</tr>
<tr>
<td>Adders</td>
<td>22</td>
<td>27</td>
<td>-</td>
<td>27</td>
<td>24</td>
<td>2</td>
<td>8</td>
</tr>
<tr>
<td>Multipliers</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>LUTs</td>
<td>734</td>
<td>692</td>
<td>614</td>
<td>470</td>
<td>389</td>
<td>315</td>
<td>417</td>
</tr>
<tr>
<td>Registers</td>
<td>180</td>
<td>-</td>
<td>-</td>
<td>133</td>
<td>101</td>
<td>137</td>
<td>161</td>
</tr>
<tr>
<td>Max. Frequency (MHz)</td>
<td>69</td>
<td>-</td>
<td>149</td>
<td>63</td>
<td>112</td>
<td>89</td>
<td>108.3</td>
</tr>
<tr>
<td>Power (mW)$^5$</td>
<td>6.8</td>
<td>8.1</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>4.41</td>
<td>5.52</td>
</tr>
<tr>
<td>Device</td>
<td>Cyclone II</td>
<td>Cyclone II</td>
<td>Stratix II</td>
<td>Virtex-6</td>
<td>Virtex-6</td>
<td>Kintex-7</td>
<td>Kintex-7</td>
</tr>
</tbody>
</table>

$^1$ Algebraic Integer Quantization (AIQ)  $^2$ Finite-Precision (FP)  $^3$ Conventional Filter Bank (CFB)  $^4$ Distributed Arithmetic (DA)

$^5$ Measured at 50 MHz
4.7 Summary

An efficient implementation of the $db_4$ wavelet and scaling filters that employs a specifically designed ReMB is presented here. It is shown that the addition of multiplexers to shift-add networks provides reconfigurability to the well-known constant multiplication blocks. By taking advantage of recent FPGA technologies with 6-input LUTs, 3:1/4:1 muxes are employed in the design of ReMBs at no additional hardware cost, which updates the concepts proposed in the open literature. In order to evaluate the resource and power efficiency of the proposed structure, the ReMB is employed in time-multiplexed FIR filters and conventional DWT FBs, which are implemented on a Kintex-7 FPGA platform and compared to reference designs employing parallel multipliers and to designs from the open literature. Although there is a substantially diverse literature on efficient FPGA and Very-Large-Scale Integration (VLSI) implementations of the wavelet transform, to the best of the author’s knowledge, the application of reconfigurable multiplier blocks with a structure optimized for FPGA platforms has not been investigated in the field of biomedical signal processing. The replacement of multipliers in the DWT with shift-add networks has been the subject of research in image processing and image compression applications; however, reconfigurable constant multiplications have not been studied. As the results demonstrate, the proposed ReMB substantially reduces the resource utilization compared to parallel multipliers. The ratio of the savings increases with increasing input word-length, as the number of adders in the parallel multiplier increases while the number of adders in the ReMB remains the same. When the ReMB is employed in a time-multiplexed FIR filter architecture, the additional 2:1 mux and the controller increase the hardware utilization of the filter structure; however, the proposed design still utilizes 39% fewer LUTs than the reference design. Furthermore, the 1-level analysis filter bank cost assessment results also demonstrate that the proposed system substantially improves the resource utilization and power consumption compared to the open literature and the conventional reference design.

4.8 Conclusions

In this chapter an alternative implementation method is investigated in order to reduce the hardware complexity and power consumption of the DWT filter banks. A practical approach referred to as ReMB, which replaces the conventional GP parallel multipliers, is presented. This approach takes advantage of the constant coefficients and replaces each multiplication operation with simpler shift and add operations. From the work presented in this chapter, it can be concluded that the proposed ReMB approach simplifies the implementation of the DWT wavelet filter banks and that hardware and power reductions of up to 50% can be achieved.

The performance of the ReMB approach is also compared to the state-of-the-art multiplierless implementation solutions for the presented wavelet family and it can be observed that the presented approach achieves the highest savings with no other architectural or algorithmic optimization. These savings are simply related to the reduction in the number of addition operations with the aid of large (3:1/4:1) multiplexers, which come at no additional cost in FPGA implementations. The FPGA implementation results indicate that the proposed approach is low-cost and power efficient compared to other FPGA implementations. Therefore, it can be concluded that the ReMB structures are suitable for DWT filter banks and can be used for ASIC implementations and employed in low-cost embedded platforms for ambulatory physiological signal monitoring and analysis.
Chapter 5

IIR Wavelet Filter Banks for Biomedical Signal Processing

5.1 Introduction

As mentioned in Chapter 2, the DWT can be realised as a two-channel PR quadrature mirror filter (QMF) bank in which the input data is decomposed by iterating the lowpass branch of the analysis filter bank, and reconstructed through the synthesis filter bank. Although the most commonly used wavelets are realised with non-recursive filters that have a finite impulse response, recursive filters with an infinite impulse response can also be used for implementing wavelet filter banks. In DSP applications, IIR filters have an advantage over their FIR counterparts, as they can achieve comparable filter specifications such as passband ripple, stopband attenuation and transition bandwidth with much lower filter orders, which leads to fewer arithmetic operations, lower memory requirements and hence lower system delay. Furthermore, realizing IIR filters with polyphase networks composed of allpass filters further reduces the computational burden and makes the filter more robust to coefficient quantization. Thus, IIR filters are more desirable for low-power and low-complexity applications where coefficient precision is a significant factor to consider. In the wavelet transform literature, a vast amount of research has employed FIR filter banks in many application areas such as biomedical, communication, audio, image and video processing [76,79,80,148,152,162], whereas IIR wavelet filter banks have been studied relatively less and mostly in image processing and compression applications [163–166].

The use of non-linear phase IIR filters in the analysis filter bank generally leads to synthesis filters that are either causal but unstable or stable but non-causal. Therefore, the design of IIR based filter banks with the PR property is more challenging than that of their FIR based counterparts, which is the main reason for the limited application area and interest. However, the computational simplicity of IIR filter based analysis filter banks makes them an appealing alternative to FIR filter based ones, which motivated the study presented in this chapter.

The rest of this chapter presents the desired properties and design procedure of orthogonal IIR wavelet analysis filter banks which are realised as parallel connections of real allpass filters. The IIR wavelet analysis filter design problem is thereby reduced to allpass filter design, for which the Remez exchange algorithm [167] based on eigenvalue decomposition is used. Furthermore, the problem of non-causal IIR synthesis filters is investigated and a novel hybrid solution is proposed where the analysis and synthesis filter banks employ IIR and FIR filters, respectively. To the best of the author's knowledge, this hybrid solution is the first in the biomedical signal processing and wavelet literature to offer reduced hardware complexity solutions for DWT implementation in portable health monitoring systems with limited size and power. Validation and cost assessment studies are carried out and comparative results are also presented.

### 5.2 Orthogonal IIR Wavelet Analysis Filter Banks

The analysis part of a two-channel PR IIR filter bank can be realised with a halfband lowpass and a halfband highpass filter denoted by $H_0(z)$ and $H_1(z)$, respectively. These filters are based on the parallel connection of two real allpass filters ($A_0(z)$ and $A_1(z)$), where the transform matrix is given by (5.1).

$$
H(z) = \begin{bmatrix}
H_0(z) \\
H_1(z)
\end{bmatrix}
= \frac{1}{2} \begin{bmatrix}
A_0(z^2) + z^{-1}A_1(z^2) \\
A_0(z^2) - z^{-1}A_1(z^2)
\end{bmatrix}
$$

(5.1)

where $A_0(z)$ and $A_1(z)$ are $M^{th}$ order allpass filters with real coefficients $\alpha_m$ and a general transfer function,

$$
A(z) = z^{-M} \frac{\sum_{m=0}^{M} \alpha_m z^{m}}{\sum_{m=0}^{M} \alpha_m z^{-m}}
$$

(5.2)

As can be observed from (5.1), $H_0(z)$ and $H_1(z)$ are power complementary filters, since they satisfy (5.3) on the unit circle, where $z = e^{j\omega}$.

$$
|H_0(e^{j\omega})|^2 + |H_1(e^{j\omega})|^2 = 1
$$

(5.3)
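The power complementary property in (5.3) can be checked numerically. The following is a minimal Python sketch, assuming first-order allpass sections of the form $(z^{-1} + \alpha)/(1 + \alpha z^{-1})$ and, purely for illustration, the coefficient values 0.106 and 0.528 quoted for the fifth-order design in Section 5.4. The function names are hypothetical and not part of the original work.

```python
import cmath
import math


def allpass(z, alpha):
    """First-order real allpass A(z) = (z^-1 + alpha) / (1 + alpha z^-1) (assumed form)."""
    zi = 1.0 / z
    return (zi + alpha) / (1.0 + alpha * zi)


def analysis_pair(omega, a0, a1):
    """H0 and H1 of (5.1) evaluated at z = exp(j*omega)."""
    z = cmath.exp(1j * omega)
    top = allpass(z * z, a0)           # A0(z^2)
    bottom = allpass(z * z, a1) / z    # z^-1 A1(z^2)
    return 0.5 * (top + bottom), 0.5 * (top - bottom)


def power_sum(omega, a0=0.106, a1=0.528):
    """|H0|^2 + |H1|^2, which (5.3) states should equal one."""
    h0, h1 = analysis_pair(omega, a0, a1)
    return abs(h0) ** 2 + abs(h1) ** 2


if __name__ == "__main__":
    for k in range(5):
        w = k * math.pi / 4
        print(f"omega = {w:.3f}  |H0|^2 + |H1|^2 = {power_sum(w):.12f}")
```

The sum stays at one for any stable real allpass pair, because each branch has unit magnitude on the unit circle and the cross terms of the two filters cancel.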

A one-level IIR QMF filter bank based on the polyphase components $A_0(z)$ and $A_1(z)$ is shown in Figure 5.1.

The scaling and wavelet functions associated with the aforementioned filters can be obtained by iterating the filter bank \( J \) times on its lowpass branch as shown in (5.4). This results in the transfer functions \( \Phi(z) \) and \( \Psi(z) \) with lowpass and bandpass spectra, whose impulse responses are the scaling \( (\phi(n)) \) and wavelet \( (\psi(n)) \) functions, respectively.

\[
\Phi(z) = \prod_{j=0}^{J-1} H_0 \left( z^{2^j} \right)
\]
\[
\Psi(z) = H_1 \left( z^{2^{J-1}} \right) \prod_{j=0}^{J-2} H_0 \left( z^{2^j} \right) \quad (5.4)
\]
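The products in (5.4) can be evaluated directly on the unit circle. The sketch below is illustrative only: it assumes first-order allpass sections with the coefficients 0.106 and 0.528 from Section 5.4, and the helper names are hypothetical. It shows that, for such a filter pair, the iterated lowpass response has unit gain at $\omega = 0$ while the iterated bandpass response vanishes there.

```python
import cmath
import math


def allpass(z, alpha):
    """First-order real allpass (z^-1 + alpha) / (1 + alpha z^-1) (assumed form)."""
    zi = 1.0 / z
    return (zi + alpha) / (1.0 + alpha * zi)


def h0_h1(omega, a0, a1):
    """Halfband lowpass and highpass responses built from two allpass branches."""
    z = cmath.exp(1j * omega)
    top = allpass(z * z, a0)
    bottom = allpass(z * z, a1) / z
    return 0.5 * (top + bottom), 0.5 * (top - bottom)


def scaling_response(omega, levels, a0=0.106, a1=0.528):
    """Phi(e^{j omega}) = prod_{j=0}^{J-1} H0(e^{j 2^j omega})."""
    out = 1.0 + 0.0j
    for j in range(levels):
        out *= h0_h1((2 ** j) * omega, a0, a1)[0]
    return out


def wavelet_response(omega, levels, a0=0.106, a1=0.528):
    """Psi(e^{j omega}) = H1(e^{j 2^{J-1} omega}) prod_{j=0}^{J-2} H0(e^{j 2^j omega})."""
    out = h0_h1((2 ** (levels - 1)) * omega, a0, a1)[1]
    for j in range(levels - 1):
        out *= h0_h1((2 ** j) * omega, a0, a1)[0]
    return out


if __name__ == "__main__":
    J = 3
    for w in (0.0, math.pi / 16, math.pi / 8, math.pi / 4):
        print(f"omega = {w:.4f}  |Phi| = {abs(scaling_response(w, J)):.6f}"
              f"  |Psi| = {abs(wavelet_response(w, J)):.6f}")
```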

It is well known that the regularity of a wavelet is a very important property, since it defines the smoothness of the wavelet function and is useful for detecting discontinuities in signals, such as electrode movements in biopotential signal recordings. It is directly related to the number of vanishing moments of the wavelet, which is the number of times the wavelet spectrum vanishes (goes to zero) at \( \omega = 0 \), i.e. \( \left| \Psi(e^{j\omega}) \right|_{\omega=0} = 0 \) where \( z = e^{j\omega} \). Therefore, the aforementioned \( H_0(z) \) and \( H_1(z) \) require an additional flatness condition as shown in (5.5) [163].

\[
\left. \frac{\partial^k H_1(e^{j\omega})}{\partial \omega^k} \right|_{\omega=0} = \left. \frac{\partial^k H_0(e^{j\omega})}{\partial \omega^k} \right|_{\omega=\pi} = 0 \quad \text{for} \quad k = 0, 1, \ldots, K - 1 \quad (5.5)
\]
where $K$ corresponds to the number of zeros of $H_1(z)$ at $z = 1$ and of $H_0(z)$ at $z = -1$, i.e. the Nyquist frequency. In other words, the associated wavelet function has $K$ consecutive vanishing moments [164]. The power complementary property given in (5.3) reduces the design procedure to the design of $H_0(z)$. For a given filter order, a trade-off exists between frequency resolution and wavelet regularity. Therefore, it is critical to identify the needs of the application and select the best possible frequency selectivity for a given flatness condition [165]. In this study, the IIR wavelet design methodology introduced by Zhang et al. [165] is adopted for implementing the IIR wavelet filters with the selected filter and regularity order, and the design steps are introduced in this section. This methodology employs the Remez exchange algorithm based on the eigenvalue problem in order to calculate the allpass filter coefficients with an added flatness condition. Recalling (5.1), $H_0(z)$ is defined as the parallel sum of the two allpass filters $A_0(z)$ and $A_1(z)$, and it can be rewritten as,

$$H_0(z) = \frac{1}{2} A_0(z^2) \left( 1 + z^{-1} U(z^2) \right)$$

(5.6)

where,

$$U(z^2) = \frac{A_1(z^2)}{A_0(z^2)} = z^{-2M} \frac{\sum_{m=0}^{M} \alpha_m z^{2m}}{\sum_{m=0}^{M} \alpha_m z^{-2m}}$$

(5.7)

is an allpass filter with real coefficients $\alpha_m$ and $\alpha_0 = 1$. The phase response of $z^{-1} U(z^2)$ can be expressed as,
\[
\theta(\omega) = -(1 + 2M)\omega + 2 \tan^{-1} \frac{\sum_{m=0}^{M} \alpha_m \sin 2m\omega}{\sum_{m=0}^{M} \alpha_m \cos 2m\omega} \quad (5.8a)
\]

\[
= 2 \tan^{-1} \frac{\sum_{m=0}^{M} \alpha_m \sin (2m - M - 0.5) \omega}{\sum_{m=0}^{M} \alpha_m \cos (2m - M - 0.5) \omega}. \quad (5.8b)
\]

Following this, the frequency response of \(H_0(z)\) can be calculated by evaluating (5.1) on the unit circle and the magnitude response can be written as,

\[
|H_0(e^{j\omega})| = \cos \frac{\theta(\omega)}{2} \quad (5.9)
\]

The required flatness condition is obtained by substituting (5.9) into (5.5), which simplifies to,

\[
\sum_{m=0}^{M} \alpha_m (2m - M - 0.5)^{2n-1} = 0 \quad \text{for} \quad n = 1, 2, \ldots, N, \quad (5.10)
\]

for \(\omega = \pi\), where \(N = \frac{K - 1}{2}\) with \(K\) odd, so that \(N\) is an integer satisfying \(0 \leq N \leq M\), and \(M\) is the allpass filter order. In order to achieve maximum regularity, which leads to a maximally flat \(H_0(z)\), \(N\) should be equal to the allpass filter order \(M\). For cases when \(0 \leq N < M\), the Remez algorithm is used to achieve an equiripple magnitude response by approximating either the passband \([0, \omega_p]\) or the stopband \([\omega_s, \pi]\) ripples. For a half-band IIR filter implemented with polyphase components, \(\omega_p + \omega_s = \pi\), hence its magnitude response is characterized by,

\[
|H_0(e^{j\omega_p})|^2 + |H_0(e^{j(\pi-\omega_p)})|^2 = 1. \quad (5.11)
\]

Therefore, the approximation of the passband region is sufficient for calculating the filter coefficients, which further simplifies the IIR wavelet filter design procedure. The first step in the Remez algorithm is the identification of the extremal frequencies (\(\omega_i\)) in the band of interest, where the magnitude response is iteratively recalculated to minimize the magnitude error. Using (5.8b), the Remez problem becomes,

$$
\tan \frac{\theta(\omega_i)}{2} = \frac{\sum_{m=0}^{M} \alpha_m \sin(2m - M - 0.5) \omega_i}{\sum_{m=0}^{M} \alpha_m \cos(2m - M - 0.5) \omega_i} = (-1)^i \delta
$$

(5.12)

which can be written in matrix form such that $PA = \delta QA$ and solved as a generalized eigenvalue problem, where $\delta$ is an eigenvalue and $A$ is the corresponding eigenvector with the allpass filter coefficients. $P$ and $Q$ matrices are given in (5.13) and (5.14).

$$
P = \begin{bmatrix}
(-0.5 - M) & (1.5 - M) & \cdots & (M - 0.5) \\
(-0.5 - M)^3 & (1.5 - M)^3 & \cdots & (M - 0.5)^3 \\
\vdots & \vdots & \ddots & \vdots \\
(-0.5 - M)^{2N-1} & (1.5 - M)^{2N-1} & \cdots & (M - 0.5)^{2N-1} \\
\sin(-M - 0.5) \omega_0 & \sin(1.5 - M) \omega_0 & \cdots & \sin(M - 0.5) \omega_0 \\
-\sin(-M - 0.5) \omega_1 & -\sin(1.5 - M) \omega_1 & \cdots & -\sin(M - 0.5) \omega_1 \\
\vdots & \vdots & \ddots & \vdots \\
\gamma \sin(-M - 0.5) \Theta & \gamma \sin(1.5 - M) \Theta & \cdots & \gamma \sin(M - 0.5) \Theta
\end{bmatrix}
$$  \hspace{1cm} (5.13)

$$
Q = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 \\
\cos(-M - 0.5) \omega_0 & \cos(1.5 - M) \omega_0 & \cdots & \cos(M - 0.5) \omega_0 \\
\cos(-M - 0.5) \omega_1 & \cos(1.5 - M) \omega_1 & \cdots & \cos(M - 0.5) \omega_1 \\
\vdots & \vdots & \ddots & \vdots \\
\cos(-M - 0.5) \Theta & \cos(1.5 - M) \Theta & \cdots & \cos(M - 0.5) \Theta
\end{bmatrix}
$$  \hspace{1cm} (5.14)
where $\gamma = (-1)^{M-N}$ and $\Theta = \omega_{M-N}$. In order to find the optimal allpass filter coefficients, i.e. the vector $A$, the eigenvalue with the smallest absolute value should be found. Once the coefficients are identified, the poles of $U(z)$ are computed. Since $U(z) = A_1(z)/A_0(z)$ by (5.7), the poles of $U(z)$ inside the unit circle are assigned as poles of $A_1(z)$, and the reciprocals of the poles outside the unit circle are assigned as poles of $A_0(z)$.
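As a small worked instance of the flatness condition (5.10) and the pole assignment, consider the lowest-order maximally flat case $M = N = 1$ (i.e. $K = 3$). This case is chosen here only for illustration. With $\alpha_0 = 1$, (5.10) reduces to a single equation, and the allpass function $U(z)$ follows from the ratio form of an allpass transfer function:

$$
\alpha_0(-1.5) + \alpha_1(0.5) = 0 \;\Rightarrow\; \alpha_1 = 3\alpha_0 = 3,
\qquad
U(z) = z^{-1}\,\frac{1 + 3z}{1 + 3z^{-1}} = \frac{3 + z^{-1}}{1 + 3z^{-1}}.
$$

The single pole of $U(z)$ lies at $z = -3$, outside the unit circle. Since $U(z) = A_1(z)/A_0(z)$, this pole is a zero of $A_0(z)$, so $A_0(z)$ has its pole at the reciprocal location $z = -1/3$ and $A_1(z) = 1$. This agrees with the single-coefficient design with $\alpha_0 = 1/3$ and $K = 3$ used in Section 5.4.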

The aforementioned algorithm is used to generate three $9^{th}$ order IIR wavelet filters with different regularities (i.e. numbers of zeros at the Nyquist frequency) as examples. For all filters, the allpass filter order $M = 4$ and the passband edge frequency $\omega_p = 0.4\pi$ (i.e. $\nu_p = 0.2$ where $\omega = 2\pi\nu$) are selected. The first design, presented in Figure 5.2a, exhibits a maximally flat filter response (i.e. a Butterworth filter) with $K = 9$ (i.e. $N = 4$). As can be observed in Figure 5.2a, all filter zeros are placed at $z = -1$ and $z = 1$ for $H_0(z)$ (blue) and $H_1(z)$ (red), respectively. For a $9^{th}$ order filter, nine zeros at $z = -1$ (i.e. nine vanishing moments) is the maximum achievable regularity, thus the filter is maximally flat. The second extreme case, with no flatness condition imposed on the IIR wavelet filter, is presented in Figure 5.2b. Here, $K = 1$ and the resulting filter exhibits equiripple passband and stopband characteristics, which is no different from a conventional elliptic IIR filter. From these two extreme examples, the trade-off between regularity and frequency selectivity can be observed. In other words, as the number of zeros at $z = -1$ and $z = 1$ increases, the transition bandwidth of $H_0(z)$ and $H_1(z)$ gets wider, which degrades the frequency selectivity of the halfband filters, and vice versa. Finally, an intermediate example is presented in Figure 5.2c with $K = 5$.

Figure 5.2: 9\textsuperscript{th} order IIR lowpass (blue) and highpass (red) filters; Maximally flat with $K = 9$ and $N = 4$; (a) Magnitude response, (b) Pole-Zero Plane
Elliptic with $K = 1$ and $N = 0$; (c) Magnitude response, (d) Pole-Zero Plane
Intermediate with $K = 5$ and $N = 2$; (e) Magnitude response, (f) Pole-Zero Plane
The wavelet and scaling functions associated with the maximally flat, elliptic and intermediate filters are presented in Figures 5.3a, 5.3b, and 5.3c, respectively. It can be observed that the wavelet and scaling functions associated with the maximally flat design decay faster than those of the other two designs, which is due to the higher number of vanishing moments and illustrates the effect of an increased number of vanishing moments on the wavelet decay speed.

Figure 5.3: Scaling and Wavelet functions for 9th order (a) Maximally flat, (b) Elliptic, and (c) Intermediate IIR wavelet filters.
5.3 IIR Wavelet Synthesis Filter Banks

In wavelet theory one of the most important properties of the implemented filter bank is the perfect reconstruction of a decomposed signal. Recalling Figure 5.1, the relationship between the filter bank input and the reconstructed output for a PR QMF wavelet filter bank can be expressed as:

\[
\hat{X}(z) = X(z)D_L(z) + X(-z)D_A(z)
\]  

(5.15)

where \(X(z)\) and \(\hat{X}(z)\) are the input and the reconstructed output of the filter bank, respectively, \(X(-z)\) is the alias component of \(X(z)\) resulting from the downsampling operation, \(D_A(z)\) is the distortion transfer function caused by aliasing and \(D_L(z)\) is the linear distortion transfer function, which relates to the amplitude and phase characteristics of the filters. \(D_A(z)\) is defined by (5.16).

\[
D_A(z) = \frac{1}{2} \left( H_0(-z)G_0(z) + H_1(-z)G_1(z) \right)
\]

(5.16)

The aliasing distortion is cancelled if \(D_A(z) = 0\), which can be achieved with the synthesis filters \(G_0(z) = H_1(-z)\) and \(G_1(z) = -H_0(-z)\). In addition, if the analysis filters satisfy \(H_0(z) = H_1(-z)\), they are called quadrature mirror filters and the filter bank is called a QMF bank. The linear distortion transfer function is described as:

\[
D_L(z) = \frac{1}{2} \left( H_0(z)G_0(z) + H_1(z)G_1(z) \right)
\]

(5.17)
and in order to have $\hat{X}(z) = X(z)$ up to a scale factor and a delay, $D_L(z) = cz^{-d}$ is required, where $c$ is a constant and $d$ is the delay introduced by the system. QMF banks therefore guarantee complete aliasing cancellation, but the condition $D_L(z) = cz^{-d}$ is only met if the designed filters have linear phase. As described in Section 5.2, the analysis as well as the synthesis filters can be represented with polyphase components, which are presented in Figure 5.1 and defined in (5.18).

\[ H(z) = \begin{bmatrix} H_0(z) \\ H_1(z) \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} A_0(z^2) \\ z^{-1}A_1(z^2) \end{bmatrix} \]  
\[ G(z) = \begin{bmatrix} G_0(z) & G_1(z) \end{bmatrix} = \frac{1}{2} \begin{bmatrix} z^{-1}A_1(z^2) & A_0(z^2) \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]  

(5.18a)  
(5.18b)

By substituting (5.18) into (5.16) and (5.17), the aliasing and linear distortion functions of the analysis-synthesis filter bank become;

\[ D_A(z) = 0 \]  
\[ D_L(z) = \frac{1}{2} z^{-1}A_0(z^2)A_1(z^2) \]  

(5.19a)  
(5.19b)

where (5.19a) shows complete alias cancellation, while (5.19b) is an allpass function, which indicates that the amplitude distortion is eliminated. However, perfect signal reconstruction depends on the phase responses of $A_0(z)$ and $A_1(z)$, thus phase distortion persists unless a linear phase is obtained between the input and the output of the system. Theoretically, the phase distortion can be cancelled by a synthesis filter bank whose polyphase components have the inverse transfer functions of the allpass filters of the analysis filter bank, as given by (5.20) and presented in Figure 5.4. This is a common practice for FIR filter banks, since the inverse of an FIR filter transfer function is easily achieved by time-reversal (i.e. flipping) of the filter coefficients and, as the system poles are all placed at \( z = 0 \), the filter stability is not affected.

\[
H(z) = \begin{bmatrix}
H_0(z) \\
H_1(z)
\end{bmatrix} = \frac{1}{2} \begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix} \begin{bmatrix}
A_0(z^2) \\
z^{-1}A_1(z^2)
\end{bmatrix}
\] (5.20a)

\[
G(z) = \begin{bmatrix}
H_0(z^{-1}) & H_1(z^{-1})
\end{bmatrix} = \frac{1}{2} \begin{bmatrix}
A_0(z^{-2}) & zA_1(z^{-2})
\end{bmatrix} \begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix}
\] (5.20b)

Figure 5.4: One level IIR wavelet filter bank in polyphase structure with causal stable analysis and stable but anti-causal synthesis filters.

Meanwhile, the linear distortion transfer function with the new synthesis filters is obtained by inserting (5.20a) and (5.20b) into (5.17) which becomes a function of both analysis and synthesis filters.

\[
D_L(z) = \frac{1}{2} \left( A_0(z^2)A_0(z^{-2}) + A_1(z^2)A_1(z^{-2}) \right) = z^{-d}
\] (5.21)
The products $A_0(z^2)A_0(z^{-2})$ and $A_1(z^2)A_1(z^{-2})$ show that the phase non-linearity is compensated, in addition to the cancellation of the amplitude and aliasing distortion. However, as can be observed from (5.20b), the time reversal of the IIR filter impulse response leads to either causal and unstable or anti-causal and stable filters, thus the implementation of the synthesis filters is not as straightforward as for FIR synthesis filter banks. This work presents two solutions for realising the anti-causal synthesis filters: the first employs the block processing method introduced by Powell and Chau [169], and the second uses FIR approximations of the anti-causal IIR synthesis filters based on analytical expressions [168] to achieve novel hybrid IIR/FIR wavelet systems for low-complexity biomedical applications.
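The distortion functions can be checked numerically on the unit circle. The sketch below is a minimal illustration, assuming first-order allpass sections with the coefficients 0.106 and 0.528 from Section 5.4 and hypothetical helper names. It evaluates (5.16) and (5.17) for the polyphase filters of (5.18), compares the result with (5.19), and confirms that each product $A(z^2)A(z^{-2})$ appearing in (5.21) equals one.

```python
import cmath
import math


def allpass(z, alpha):
    """First-order real allpass (z^-1 + alpha) / (1 + alpha z^-1) (assumed form)."""
    zi = 1.0 / z
    return (zi + alpha) / (1.0 + alpha * zi)


def filters(z, a0, a1):
    """Analysis (H0, H1) and synthesis (G0, G1) filters of (5.18) at a point z."""
    p0 = allpass(z * z, a0)         # A0(z^2)
    p1 = allpass(z * z, a1) / z     # z^-1 A1(z^2)
    h0, h1 = 0.5 * (p0 + p1), 0.5 * (p0 - p1)
    g0, g1 = 0.5 * (p1 + p0), 0.5 * (p1 - p0)
    return h0, h1, g0, g1


def distortions(omega, a0=0.106, a1=0.528):
    """Return D_A from (5.16), D_L from (5.17) and the closed form (5.19b)."""
    z = cmath.exp(1j * omega)
    h0, h1, g0, g1 = filters(z, a0, a1)
    h0m, h1m, _, _ = filters(-z, a0, a1)        # H0(-z), H1(-z)
    d_alias = 0.5 * (h0m * g0 + h1m * g1)
    d_lin = 0.5 * (h0 * g0 + h1 * g1)
    closed_form = 0.5 * allpass(z * z, a0) * allpass(z * z, a1) / z
    return d_alias, d_lin, closed_form


def time_reversed_product(omega, alpha):
    """A(z^2) A(z^-2) on the unit circle, the building block of (5.21)."""
    z = cmath.exp(1j * omega)
    return allpass(z * z, alpha) * allpass(1.0 / (z * z), alpha)


if __name__ == "__main__":
    for w in (0.1, 0.7, 1.5, 2.9):
        d_a, d_l, ref = distortions(w)
        print(f"omega = {w}: |D_A| = {abs(d_a):.2e}, |D_L| = {abs(d_l):.6f}, "
              f"|D_L - (5.19b)| = {abs(d_l - ref):.2e}")
```

The magnitude of $D_L$ is constant while its phase is not linear, which is the residual phase distortion that the time-reversed synthesis filters are meant to remove.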

5.3.1 The Block Processing Technique

In order to obtain a causal and stable implementation of the synthesis filter bank for infinite length input signals, the block processing technique introduced by Powell and Chau [169], which was originally proposed to achieve approximately linear phase IIR filters, is employed in this work. The feedback path in the IIR filter structure results in an impulse response of infinite duration that continues with decaying amplitude. However, a physical realisation of such a filter has a response of finite duration, since the parasitic oscillations cannot be fully represented with the resolution supported by finite precision. Therefore, truncation occurs naturally due to hardware limitations, which indicates that the filter response can be approximated with a truncated version of the impulse response. The length of the truncated impulse response depends on the filter order, since higher orders result in longer impulse responses.

Truncation enables the implementation of non-causal time-reversed filters in real time by using the overlap-add method [170]. In this method, an infinite length input sequence is divided into small non-overlapping blocks with a sufficient number of samples, and each block is filtered by two causal and stable IIR filters. The outputs are then truncated and combined to obtain the continuous output sequence. The size of the input blocks is proportional to the length of the impulse response. In theory, the length of the input blocks \( L \) must be larger than the length of the physical realization of the impulse response \( h(k) \) so that the filter output length is less than or equal to \( 2L - 1 \), where (5.22) shows the segmented input,

\[
x(k) = \sum_{n=0}^{\infty} x_n(k) \quad \text{where}
\]

\[
x_n(k) = \begin{cases} 
x(k) & nL \leq k \leq (n + 1)L - 1 \\
0 & \text{elsewhere}
\end{cases}
\]  

(5.22)

where \( x(k) \) is the infinitely long input and \( x_n(k) \) represents the non-overlapping blocks of length \( L \). To compute the non-causal time reversed filter output, each block of input sequence is convolved with the time reversed impulse response \( h(-k) \) with the assumption that \( h(-k) \) is of finite length and equals 0 for \( k > L \). Then the output of each block is computed by the addition of two \( 2L - 1 \) length convolution outputs from the adjacent input blocks of length \( L \) as shown in (5.23).
\begin{align*}
y_n(k) &= x(k) \ast h(-k) \quad \text{for } nL \leq k \leq (n+1)L - 1 \\
&= \sum_{m=k}^{k+L} x(m) h(k-m) \\
&= \sum_{m=k}^{(n+1)L-1} x(m) h(k-m) + \sum_{m=(n+1)L}^{k+L} x(m) h(k-m) \\
&= h(-k) \ast x_n(k) + h(-k) \ast x_{n+1}(k) \\
&= X_n^L + X_{n+1}^T \\
y(k) &= \sum_{n=0}^{\infty} y_n(k)
\end{align*}

(5.23)

where \( y_n(k) \) is the output of each input block, \( X_n^L \) and \( X_{n+1}^T \) consist of the first and the last \( L \) outputs of the adjacent input blocks, respectively, and \( y(k) \) is the non-causal time-reversed filter output [169]. Thus, the real-time implementation of this procedure requires the following steps:

1. Store the input sequence in an \( L \)-length Last-in First-out (LIFO) register to obtain the time-reversed version of each input segment, \( x_n(-k) \).

2. Compute \( 2L \)-length output sections by filtering \( x_n(-k) \) with \( a_{0,1}(k) \), i.e. \( a_{0,1}(k) \ast x_n(-k) \).

3. Add the last \( L \) output samples of the current sequence to the first \( L \) output samples of the previous sequence, which can be achieved by introducing a delay of \( 2L \) samples into the path, resulting in \( y_n(-k) \).

4. Finally, store the \( L \) samples of \( y_n(-k) \) in an \( L \)-length LIFO register, which provides the time-reversed version, \( y_n(k) \).
Figure 5.5: Implementation of non-causal time reversed allpass filter using Powell and Chau technique.

Figure 5.5 presents the block diagram for standalone realization of $A_{0,1}(z^{-1})$ filter which comprises two identical copies of $A_{0,1}(z)$ filter, two $L$-length LIFO registers, 2$L$-length shift register and an adder.
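The four steps above can be sketched in software as follows. This is a simplified offline illustration rather than the hardware structure of Figure 5.5: it assumes a single first-order allpass section, a record whose length is a multiple of $L$, and Python lists in place of the LIFO and shift registers. All function names are hypothetical. The blockwise result is compared against a reference obtained by reversing the whole record, filtering it and reversing it back.

```python
import math


def allpass_filter(x, alpha):
    """Causal first-order allpass: y[n] = alpha*x[n] + x[n-1] - alpha*y[n-1]."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = alpha * v + x_prev - alpha * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y


def anticausal_reference(x, alpha):
    """Offline reference: reverse the whole record, filter, reverse back."""
    return allpass_filter(x[::-1], alpha)[::-1]


def anticausal_blockwise(x, alpha, L):
    """Blockwise approximation of A(z^-1) applied to x (len(x) a multiple of L)."""
    y = [0.0] * len(x)
    for n in range(len(x) // L):
        block = x[n * L:(n + 1) * L]
        # Step 1: time-reverse the block (LIFO) and pad L zeros to capture the tail.
        reversed_block = block[::-1] + [0.0] * L
        # Step 2: 2L outputs from the causal, stable allpass filter.
        out = allpass_filter(reversed_block, alpha)
        # Steps 3 and 4: overlap-add in reversed time, stored in natural order.
        for i, v in enumerate(out):
            k = (n + 1) * L - 1 - i
            if k >= 0:
                y[k] += v
    return y


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    signal = [math.sin(0.3 * k) for k in range(128)]
    reference = anticausal_reference(signal, 0.528)
    for block_len in (8, 16, 32):
        approx = anticausal_blockwise(signal, 0.528, block_len)
        print(f"L = {block_len:2d}  max error = {max_abs_diff(reference, approx):.3e}")
```

The error comes only from the part of the impulse response that extends beyond two adjacent blocks, so it shrinks geometrically as $L$ grows.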

5.3.2 Approximation of anti-causal IIR filters by causal FIR filters

One way of implementing $A_0(z^{-2})$ and $A_1(z^{-2})$ causally is the block processing method presented above. In this method the allpass filter impulse responses are truncated to a certain number of samples in order to approximate the anti-causal filtering operations with stable and causal filters. The selected number of samples corrects the phase distortion created by the analysis allpass filters up to a certain error. The block processing method is equivalent to having an FIR filter of order $L$ which approximates the amplitude and phase characteristics of the allpass filter, but whose coefficients can simply be flipped without concerns about instability or anti-causality. These FIR filter coefficients can be calculated with the simple polynomial expressions given in (5.24), which are the Markov parameters of the state-space models. In (5.24a) the transfer functions of the first order allpass filters \( A_{0,1}(z) \) are presented, whereas (5.24b) and (5.24c) give the truncated and the time-reversed truncated impulse responses of the allpass filters \( A_{0,1}(z) \), respectively [168].

\[
A_{0,1}(z) = \frac{z^{-1} + \alpha_{0,1}}{1 + \alpha_{0,1}z^{-1}} \tag{5.24a}
\]

\[
\tilde{A}_{0,1}(z) = \alpha_{0,1} + (-1)^{L_{0,1}-1} \alpha_{0,1}^{L_{0,1}-1} z^{-L_{0,1}} + (1 - \alpha_{0,1}^2) \sum_{k=0}^{L_{0,1}-2} (-1)^k \alpha_{0,1}^k z^{-(k+1)} \tag{5.24b}
\]

\[
R_{0,1}(z) = \alpha_{0,1} z^{-L_{0,1}} + (-1)^{L_{0,1}-1} \alpha_{0,1}^{L_{0,1}-1} + (1 - \alpha_{0,1}^2) \sum_{k=1}^{L_{0,1}-1} (-1)^{L_{0,1}-k-1} \alpha_{0,1}^{L_{0,1}-k-1} z^{-k} \tag{5.24c}
\]

\[
= (1 + \alpha_{0,1} z^{-1}) \sum_{k=0}^{L_{0,1}-1} (-1)^k \alpha_{0,1}^k z^{-(L_{0,1}-k-1)}
\]

where \( k \) is an integer, \( \tilde{A}_{0,1}(z) \) is the FIR approximation of the allpass filters \( A_{0,1}(z) \), and \( R_{0,1}(z) = z^{-L_{0,1}} \tilde{A}_{0,1}(z^{-1}) \) is its time-reversed and delayed version, which can be used as the synthesis filter. It can therefore be shown analytically that the transfer functions of both the top and bottom branches of the filter bank satisfy (5.25), ensuring phase compensation up to a certain error.

\[
A_0(z) R_0(z) = z^{-L_0} - (-1)^{L_0} \varepsilon(\alpha_0, L_0) \tag{5.25a}
\]

\[
A_1(z) R_1(z) = z^{-L_1} - (-1)^{L_1} \varepsilon(\alpha_1, L_1) \tag{5.25b}
\]

which are derived from the polynomial factorization relation;

\[
(z^{-1} + \alpha_{0,1}) \sum_{k=0}^{L_{0,1}-1} (-1)^k \alpha_{0,1}^k z^{-(L_{0,1}-k-1)} = z^{-L_{0,1}} - (-1)^{L_{0,1}} \alpha_{0,1}^{L_{0,1}} \tag{5.26}
\]

where $L_0$ and $L_1$ are the FIR filter orders. (5.25a) and (5.25b) indicate that the analysis and synthesis paths are pure delays with a near-PR property up to the errors $\varepsilon(\alpha_0, L_0)$ and $\varepsilon(\alpha_1, L_1)$, respectively, where $\varepsilon(\alpha_{0,1}, L_{0,1}) = \alpha_{0,1}^{L_{0,1}}$. There is a trade-off between the system delay and the compensation error: the error can be substantially reduced by increasing the FIR filter order, which increases the system delay as well as the computational complexity. Moreover, the required length of $R_{0,1}(z)$ depends on the absolute value of the allpass coefficient. As the absolute value of the coefficient grows towards one, longer FIR filters are required to sufficiently approximate the first order allpass filter. When the two allpass filters differ, this leads to data alignment problems between the top and the bottom branches, therefore extra registers need to be used in the branch that employs the lower order FIR filter. The amount of the delay should be equal to the delay difference between $R_0(z)$ and $R_1(z)$, i.e. $L_1 - L_0$, assuming $L_1 \geq L_0$. The resulting hybrid IIR/FIR wavelet filter banks are presented in Figure 5.6.

Figure 5.6: (a) One level hybrid IIR/FIR wavelet filter bank in polyphase structure with causal and stable IIR analysis and FIR synthesis filter banks.
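The relations (5.24c), (5.25) and (5.26) can be verified with plain polynomial arithmetic. The sketch below is illustrative, with hypothetical function names and an arbitrarily chosen order $L = 8$. Polynomials are stored as coefficient lists in ascending powers of $z^{-1}$. Because $R(z)$ contains the factor $(1 + \alpha z^{-1})$, the denominator of the allpass filter cancels and the cascade $A(z)R(z)$ reduces to a delay plus a single error term.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def geometric_factor(alpha, L):
    """Coefficients of sum_{k=0}^{L-1} (-alpha)^k z^-(L-k-1)."""
    return [(-alpha) ** (L - 1 - j) for j in range(L)]


def synthesis_fir(alpha, L):
    """R(z) from the factored form of (5.24c): (1 + alpha z^-1) times the sum."""
    return poly_mul([1.0, alpha], geometric_factor(alpha, L))


def analysis_synthesis_cascade(alpha, L):
    """A(z) R(z): the allpass denominator cancels, leaving (alpha + z^-1) times the sum."""
    return poly_mul([alpha, 1.0], geometric_factor(alpha, L))


if __name__ == "__main__":
    alpha, L = 0.528, 8     # coefficient from Section 5.4, order chosen for illustration
    print("R(z) coefficients:", [round(c, 6) for c in synthesis_fir(alpha, L)])
    print("A(z)R(z) coefficients:",
          [round(c, 6) for c in analysis_synthesis_cascade(alpha, L)])
    print("error term -(-alpha)^L:", -((-alpha) ** L))
```

Only the first and last coefficients of the cascade are non-zero: the last is the unit delay term $z^{-L}$ and the first is the residual $-(-1)^{L}\alpha^{L}$ of (5.26).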
5.4 Design of IIR Wavelet Analysis Filters for Biomedical Applications

In the biomedical signal processing literature, a vast amount of research has employed different wavelet families for various applications and biosignals. The selection of the mother wavelet, as well as its design, depends on the application of interest. Peak and discontinuity detection applications require mother wavelets with a high number of vanishing moments, since they are capable of identifying abrupt changes even if these are not visible to the human eye. This directly relates to the differentiability of the wavelet polynomial [80]. On the other hand, the correlation of the wavelet function with the signal of interest (i.e. their similarity to each other) is another selection/design criterion for the mother wavelet, which is important for feature detection and denoising applications. Furthermore, the frequency selectivity of the wavelet filters is another critical property to take into consideration for denoising applications. Based on these criteria, as stated in the previous chapter, the Daubechies family, and especially the mother wavelet $db_4$ with four vanishing moments and 8 distinct filter coefficients, is the most popular mother wavelet employed in various biomedical applications. This creates a strong reference for the design of IIR wavelets, where frequency and vanishing moment characteristics similar to those of the $db_4$ filters are targeted. Therefore, two IIR filters are designed using the method introduced in Section 5.2: one with $K = 3$ (i.e. 3 zeros located at $z = -1$) and one single coefficient allpass filter ($\alpha_0 = 1/3$), and one with $K = 5$ (i.e. 5 zeros located at $z = -1$) and two single coefficient allpass filters ($\alpha_0 = 0.106$ and $\alpha_1 = 0.528$). For both filters, allpass filter order $M = 2$ and passband edge frequency \( \omega_p = 0.4\pi \) are selected. The 3rd and 5th order maximally flat IIR analysis filters will be referred to as \( \text{ilet}3 \) and \( \text{ilet}5 \), and their magnitude responses, pole-zero locations and scaling and wavelet functions are presented in Figure 5.7.

Figure 5.7: \( \text{ilet}3 \); (a) Magnitude response and (b) Pole-Zero locations. \( \text{ilet}5 \); (c) Magnitude response and (d) Pole-Zero locations. (e) \( \text{ilet}3 \) Scaling and Wavelet functions, and (f) \( \text{ilet}5 \) Scaling and Wavelet functions.
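The stated numbers of zeros at $z = -1$ can be cross-checked from the quoted coefficients. The sketch below is illustrative and rests on two assumptions that are not spelled out at this point in the text: each allpass section has the first-order form of (5.24a), and the second branch of ilet3 is a pure delay ($A_1(z) = 1$). It forms the numerator of $H_0(z)$ over the common denominator and compares it, after normalisation, with the binomial coefficients of $(1 + z^{-1})^K$. The function names are hypothetical.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Add two coefficient lists of possibly different lengths."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


def h0_numerator(a0, a1=None):
    """Numerator of A0(z^2) + z^-1 A1(z^2) over the common denominator.

    a1=None models a pure-delay second branch (A1 = 1), assumed here for ilet3.
    """
    n0, d0 = [a0, 0.0, 1.0], [1.0, 0.0, a0]     # A0(z^2) = (a0 + z^-2)/(1 + a0 z^-2)
    if a1 is None:
        n1, d1 = [1.0], [1.0]
    else:
        n1, d1 = [a1, 0.0, 1.0], [1.0, 0.0, a1]
    first = poly_mul(n0, d1)
    second = [0.0] + poly_mul(n1, d0)           # leading zero implements z^-1
    return poly_add(first, second)


def normalised(coeffs):
    """Scale the coefficient list so that its first entry is one."""
    return [c / coeffs[0] for c in coeffs]


if __name__ == "__main__":
    print("ilet3:", [round(c, 3) for c in normalised(h0_numerator(1.0 / 3.0))])
    print("ilet5:", [round(c, 3) for c in normalised(h0_numerator(0.106, 0.528))])
    print("binomial K=3:", [comb(3, k) for k in range(4)])
    print("binomial K=5:", [comb(5, k) for k in range(6)])
```

The ilet3 numerator matches $(1 + z^{-1})^3$ exactly, while the ilet5 numerator matches $(1 + z^{-1})^5$ only approximately because the quoted coefficients are rounded to three decimal places.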
The analysis filter bank responses over different decomposition levels, 5 in this case, are also presented in Figure 5.8 for both the *ilet* IIR and the *db*4 analysis filter banks. The magnitude response similarities between the *ilet*3 and *db*4 based filter banks can be easily observed, although *ilet*3 achieves a wider passband and a steeper transition. The *ilet*3 wavelet is a special variation of the IIR wavelets since, unlike its counterparts, only one allpass filter is employed in its polyphase structure and thus the analysis filter bank is computationally very efficient. Since its wavelet and scaling filters achieve frequency characteristics similar to those of the *db*4 filters, *ilet*3 can be an attractive alternative to *db*4. On the other hand, the *ilet*5 based filter bank exhibits better frequency selectivity than the others.

Figure 5.8: Analysis filter bank responses for 5 level decomposition where D and A are the highpass and the lowpass branch responses. (a) *ilet*3, (b) *ilet*5 WT and (c) *db*4 WT.
5.5 IIR/IIR Wavelet Filter Banks for Biomedical Applications

As mentioned in Section 5.3, implementation of the IIR synthesis filter bank requires more attention due to the anti-causal synthesis filters. To overcome the causality problem two methods, the block processing and the FIR approximation methods are introduced. In this section, the block processing method is employed in order to design the synthesis filter banks for the IIR analysis filter banks, \textit{ilet}3 and \textit{ilet}5, discussed in the previous section. Therefore, two filter bank structures are designed for the \textit{ilet}3 and \textit{ilet}5 wavelets and these filter banks are referred to as IIR/IIR wavelet filter banks, as both the analysis and the synthesis filters are IIR filters.

5.5.1 Floating-Point Models

The block processing method requires the identification of the input block sizes that will be used for the Simulink model implementation. The first step of block size identification is impulse response truncation. Since the IIR filter banks are implemented as polyphase structures, the impulse responses of the allpass filters in the polyphase branches are truncated and the effects on the overall filter response are evaluated by calculating the maximum error introduced to the overall filter impulse response as a function of $L_0$ and $L_1$ using (5.27).

\[
\epsilon (L_0, L_1) = \max \left( | H_0(e^{j\omega}) | - | \tilde{H}_0(e^{j\omega}) | \right) \quad (5.27)
\]

where \( H_0(e^{j\omega}) = \frac{1}{2}(A_0(e^{2j\omega}) + e^{-j\omega}A_1(e^{2j\omega})) \) and \( \tilde{H}_0(e^{j\omega}) \) are the original and the truncated IIR filters, respectively and \( L_0 \) and \( L_1 \) are the block sizes for \( A_0(z) \) and \( A_1(z) \), respectively. Since the wavelet and the scaling filters are halfband multirate filters, the noble identities are used. Thus, allpass sections operate at half the input sampling rate, which means that zeros in between impulse response samples are eliminated and the block sizes can be halved. Figures 5.9 and 5.10 present the maximum error \( \epsilon(L_0, L_1) \), magnitude responses and error magnitude of \( ilet3 \) and \( ilet5 \) scaling filters.

Figure 5.9: Truncated impulse response of \( ilet3 \), (a) Maximum error between the \( H_0(z) \) and \( \tilde{H}_0(z) \) which is the truncated IIR filter, and (b) Top figure; Magnitude response, Bottom figure; Magnitude error of the \( H_0(z) \) (blue) and \( \tilde{H}_0(z) \) (red) impulse response with \( L_0 = 8 \).

Figure 5.10: Truncated impulse response of \( ilet5 \), (a) Maximum error between the \( H_0(z) \) and \( \tilde{H}_0(z) \) which is the truncated IIR filter, and (b) Top figure; Magnitude response, Bottom figure; Magnitude error of the \( H_0(z) \) (blue) and \( \tilde{H}_0(z) \) (red) impulse response with \( L_0 = 8 \) and \( L_1 = 16 \).
A block size of $L_0 = 8$ is selected for $ilet3$, as it employs only one allpass component. In other words, the size of the LIFO registers is eight, which results in a maximum magnitude error of $\epsilon(L_0) \approx -70\,dB$. On the other hand, $L_0 = 8$ and $L_1 = 16$ are selected for the two $ilet5$ allpass sections, respectively, which leads to $\epsilon(L_0, L_1) \approx -85\,dB$. Both maximum error magnitudes are negligible, since such a small magnitude mismatch in the filter responses still yields near-PR filter banks suitable for biomedical signal processing applications. However, there is a trade-off between the block size and the computational complexity. As can be observed from Figures 5.9(a) and 5.10(a), increasing the block size diminishes the maximum error but at the cost of increased memory requirements, read/write operations, and power consumption. Therefore, the block size selection can vary depending on the available hardware resources as well as the target application requirements.
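The truncation analysis above can be sketched numerically. The following is a minimal illustration rather than the model used in this work: it assumes the first order allpass form $A_0(z) = (\alpha_0 + z^{-1})/(1 + \alpha_0 z^{-1})$, takes the second polyphase branch of $ilet3$ as a pure delay, and evaluates the error of (5.27) on a dense frequency grid. The function names and the grid size are hypothetical.

```python
import numpy as np
from scipy.signal import freqz, lfilter

def allpass_impulse(alpha, n_taps):
    """First n_taps impulse response samples of A(z) = (alpha + z^-1) / (1 + alpha z^-1)."""
    impulse = np.zeros(n_taps)
    impulse[0] = 1.0
    return lfilter([alpha, 1.0], [1.0, alpha], impulse)

def truncation_error_db(alpha0, L0, n_freq=2048):
    """Peak magnitude mismatch (dB) between H0 and its truncated version.

    Assumes H0(z) = 0.5 * (A0(z^2) + z^-1), i.e. a pure-delay second branch.
    """
    w = np.linspace(0.0, np.pi, n_freq)
    delay = np.exp(-1j * w)
    # Exact allpass evaluated at z^2 (coefficients interleaved with zeros).
    _, a0_exact = freqz([alpha0, 0.0, 1.0], [1.0, 0.0, alpha0], worN=w)
    # Truncated allpass: keep L0 samples at the low rate, then upsample by two.
    h_trunc = np.zeros(2 * L0 - 1)
    h_trunc[::2] = allpass_impulse(alpha0, L0)
    _, a0_trunc = freqz(h_trunc, [1.0], worN=w)
    h0 = 0.5 * (a0_exact + delay)
    h0_trunc = 0.5 * (a0_trunc + delay)
    err = np.max(np.abs(np.abs(h0) - np.abs(h0_trunc)))
    return 20.0 * np.log10(err)

err_L4 = truncation_error_db(1.0 / 3.0, 4)
err_L8 = truncation_error_db(1.0 / 3.0, 8)
print(f"L0 = 4: {err_L4:.1f} dB, L0 = 8: {err_L8:.1f} dB")
```

Sweeping `L0` in this sketch shows the same trend as Figure 5.9(a): the error falls as the block size grows, while the storage needed for the block grows with it.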

The aforementioned filter banks designed for the $ilet3$ and $ilet5$ wavelets are modelled in the MATLAB Simulink environment, in which two three-level filter banks with floating point precision are implemented in order to evaluate the effect of the block sizes on the filter banks’ near-PR properties. It should be noted that these models are straightforward implementations where a tree structure is employed without any optimization or time-multiplexing considerations. Figures 5.11(a) and 5.12(a) present the structure of the first level analysis filter bank for $ilet3$ and $ilet5$, respectively. For both filter banks, Numerator-Denominator TDL (ND(TDL)) allpass structures are employed with two adders, two registers and one multiplier. The selection of the allpass structure relates to the fixed point implementation, which will be explained in the following section. In Figures 5.11(b) and 5.12(b) a single level synthesis bank is presented for $ilet3$ and $ilet5$, respectively, where the time-reversed allpass filters are annotated as ‘A0inv’ and ‘A1inv’. Similar to the analysis filter bank, only one branch of the polyphase structure employs a time-reversed allpass filter for $ilet3$, whereas the second branch is a pure delay. This provides higher computational efficiency compared to the $ilet5$ synthesis filter bank, since the second branch does not employ a time-reversed implementation. Figures 5.11(c), 5.12(c) and 5.12(d) depict the structure of the time-reversed allpass filters, ‘A0inv’ and ‘A1inv’. As can be observed, the time-reversed allpass implementation requires a correct timing scheme in order to operate correctly, which includes the correct switching times for the switches and the activation times for the filters $A_{0,1}(z)$.

Figure 5.11: (a) First level analysis FB, (b) first level synthesis FB and (c) implementation of $A_{0}(z^{-1})$ with $L_{0} = 8$ in floating point precision for $ilet3$. 

Figure 5.12: (a) First level analysis FB, (b) first level synthesis FB, (c) implementation of $A_0(z^{-1})$ with $L_0 = 8$ and (d) implementation of $A_1(z^{-1})$ with $L_1 = 16$ in floating point precision for $ilet5$. 

A generalized timing diagram for $A_{0,1}(z^{-1})$ is presented in Figure 5.13, where Switch represents the timing of the switches, which need to toggle every $L_0$ or $L_1$ samples, and Enable Bottom and Enable Top enable the allpass filters in the bottom and top branches, respectively, every $(2L_{0,1} - 1)$ samples. The top branch is enabled $L_{0,1}$ samples after the bottom branch in order to correctly overlap-add the $(2L_{0,1} - 1)$-length output sequence as described in Section 5.3.
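The reverse, filter, reverse and overlap-add sequence described above can be illustrated in software. The sketch below is a generic block time-reversal scheme under stated assumptions, not the Simulink model itself: it assumes the allpass form $A(z) = (\alpha + z^{-1})/(1 + \alpha z^{-1})$, an input length that is a multiple of $L$, and hypothetical function names. Each $L$-sample block is read out in LIFO order, filtered causally for $2L - 1$ samples, reversed again and overlap-added, and the result is compared with the anti-causal response computed on the whole record.

```python
import numpy as np
from scipy.signal import lfilter

def allpass(alpha, x):
    """Causal first order allpass A(z) = (alpha + z^-1) / (1 + alpha z^-1)."""
    return lfilter([alpha, 1.0], [1.0, alpha], x)

def anticausal_reference(alpha, x, tail):
    """A(z^-1) applied to the whole record: reverse, filter, reverse."""
    padded = np.concatenate([x[::-1], np.zeros(tail)])
    return allpass(alpha, padded)[::-1]      # index i corresponds to time i - tail

def anticausal_blocks(alpha, x, L, tail):
    """Block version: reverse each L-sample block, filter, reverse, overlap-add."""
    assert len(x) % L == 0 and tail >= L - 1
    y = np.zeros(len(x) + tail)              # index i corresponds to time i - tail
    for m in range(len(x) // L):
        block = x[m * L:(m + 1) * L][::-1]   # LIFO read-out of the block
        seg = allpass(alpha, np.concatenate([block, np.zeros(L - 1)]))
        start = m * L - (L - 1) + tail       # segment starts L-1 samples before the block
        y[start:start + 2 * L - 1] += seg[::-1]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
tail = 64
ref = anticausal_reference(1.0 / 3.0, x, tail)
err = {L: np.max(np.abs(anticausal_blocks(1.0 / 3.0, x, L, tail) - ref)[tail:])
       for L in (4, 8, 16)}
print(err)
```

The residual error in this sketch comes only from the part of the impulse response that falls outside the $(2L - 1)$-sample segment, which is why it shrinks as $L$ increases.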

This design is tested and evaluated based on the preservation of the perfect reconstruction property in order to observe the effects of block processing. Three different types of test signals are fed into the system: white Gaussian data with varying variances, and real ECG and EEG data collected from Physionet [114]. For the white Gaussian input, 100 Monte Carlo simulations are performed, while 14 different ECG and EEG records are used to obtain the average $MSE$ and $SER$ via (5.28).

\[
MSE = \frac{\sum_{k=0}^{K-1} |x(k) - \hat{x}(k)|^2}{K} \quad (5.28a)
\]

\[
SER = 10 \log_{10} \left( \frac{\sum_{k=0}^{K-1} x(k)^2}{\sum_{k=0}^{K-1} |x(k) - \hat{x}(k)|^2} \right) \quad (5.28b)
\]

where $x(k)$ and $\hat{x}(k)$ are the input and the reconstructed output of the filter bank, and $K$ is the number of samples of $x(k)$. 
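As a minimal sketch, the error measures of (5.28), together with the maximum absolute error used in the figures that follow, can be computed as shown below. The function names and the toy signal are illustrative assumptions only.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error between input x and reconstruction x_hat, as in (5.28a)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.mean(np.abs(x - x_hat) ** 2)

def ser_db(x, x_hat):
    """Signal-to-error ratio in dB, as in (5.28b)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(np.abs(x - x_hat) ** 2))

def max_abs_error(x, x_hat):
    """Maximum absolute error between input and reconstruction."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.max(np.abs(x - x_hat))

# Toy example: a constant reconstruction offset of 0.01.
x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = x + 0.01
print(mse(x, x_hat), ser_db(x, x_hat), max_abs_error(x, x_hat))
```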

Figures 5.14(a) and 5.15(a) present a 10 second segment of the input and the output of the \textit{ilet}3 and \textit{ilet}5 systems, respectively, for ECG record-232 obtained from the MIT-BIH Arrhythmia Database, whereas Figures 5.14(b) and 5.15(b) show the Maximum Absolute Error (MAE) between the input and the output for 14 ECG records.

Similarly, Figures 5.14(c) and 5.15(c) present a 10 second segment of the input and the output of the \textit{ilet}3 and \textit{ilet}5 IIR/IIR wavelet filter banks, respectively, for EEG record-chb14 obtained from the CHB-MIT Scalp EEG Database, whereas Figures 5.14(d) and 5.15(d) show the MAE between the input and the output for 14 EEG records. As can be seen, the three-level filter banks achieve near PR, and the introduced reconstruction error is not observable.

Figure 5.15: ilet5 FB performance for perfect reconstruction with $L_0 = 8$ and $L_1 = 16$, (a) ECG record-232 input(red) vs reconstructed output (blue), (b) MAE (mV) for ECG data records, (c) EEG record-chb14 input(red) vs reconstructed output (blue), and (d) MAE (mV) for EEG data records.

Furthermore, Table 5.1 summarizes the average error metrics obtained for each input data type. An average SER of 65 dB and 88 dB is obtained for all inputs with the ilet3 and ilet5 analysis and synthesis filter banks, respectively. The corresponding reconstruction error is negligible: it will not be visible to the human eye and will not lead to a misinterpretation of the biomedical data. This confirms the $L_{0,1}$ selections for both filter banks; thus, fixed-point implementations are developed to further analyse real-life implications.

Table 5.1: Average Error Measures for evaluating the implemented three level IIR/IIR $ilet_3$ and $ilet_5$ wavelet filter banks.

<table>
<thead>
<tr>
<th rowspan="2">Error measurement</th>
<th colspan="3">ilet3</th>
<th colspan="3">ilet5</th>
</tr>
<tr>
<th>SER (dB)</th>
<th>MSE</th>
<th>MAE (mV)</th>
<th>SER (dB)</th>
<th>MSE</th>
<th>MAE (mV)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gaussian White Noise</td>
<td>65</td>
<td>$1.09 \times 10^{-7}$</td>
<td>$2.4 \times 10^{-3}$</td>
<td>87.8</td>
<td>$5.5 \times 10^{-10}$</td>
<td>$3 \times 10^{-4}$</td>
</tr>
<tr>
<td>ECG</td>
<td>65.7</td>
<td>$7.73 \times 10^{-8}$</td>
<td>$3.5 \times 10^{-3}$</td>
<td>89.2</td>
<td>$3 \times 10^{-9}$</td>
<td>$2.9 \times 10^{-4}$</td>
</tr>
<tr>
<td>EEG</td>
<td>65.4</td>
<td>$1.05 \times 10^{-9}$</td>
<td>$5.6 \times 10^{-4}$</td>
<td>89.6</td>
<td>$3.8 \times 10^{-12}$</td>
<td>$4.8 \times 10^{-5}$</td>
</tr>
</tbody>
</table>

5.5.2 Fixed-Point Models

It is well known that floating point implementations are not practical due to the limited hardware and power resources. Therefore, the previously introduced models are converted into fixed-point precision in terms of the filter coefficients, the filter datapath and the filter bank datapath. Here, the implications of the coefficient and datapath quantization are presented, together with the optimum solution for balancing the trade-off between performance and hardware and power efficiency.

The proposed wavelet filter banks are solely polyphase structures, which eliminates the sensitivity to coefficient quantization exhibited by direct form IIR filters; therefore, shorter coefficient word-lengths can be selected as opposed to their FIR counterparts. Since allpass filters are the smallest building blocks of the overall filter banks, the first step in the fixed-point conversion is the allpass coefficient quantization. The proposed analysis and synthesis filter banks employ three distinct coefficients, which are $\alpha_0 = 1/3$ for $ilet_3$ and $\alpha_0 = 0.10557$ and $\alpha_1 = 0.52786$ for $ilet_5$. The effect of coefficient quantization is evaluated by computing the filters’ magnitude response mismatch using (5.29).
\[
E(e^{j\omega}) = 10\log_{10} \left( \frac{\left( |H(e^{j\omega})| - |\tilde{H}(e^{j\omega})| \right)^2}{N_{DFT}} \right) \quad (5.29)
\]

where \( |H(e^{j\omega})| \) and \( |\tilde{H}(e^{j\omega})| \) are the magnitude responses of the IIR filters formed with floating-point and fixed-point allpass coefficients, respectively, and \( N_{DFT} \) is the length of the DFT. In order to obtain a fair comparison with the FIR implementation introduced in Chapter 4, an allpass coefficient word-length of 9 bits with 8 fractional bits is used, resulting in a filter magnitude mismatch of \( \approx -70 \, dB \). Figure 5.16 presents the magnitude responses as well as the PZPs of the ilet3 and ilet5 wavelet and scaling filters with floating-point and fixed-point allpass coefficients.
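The coefficient quantization step can be sketched as follows. This is an illustrative approximation under stated assumptions: round-to-nearest quantization to 8 fractional bits, the allpass form $A_i(z^2) = (\alpha_i + z^{-2})/(1 + \alpha_i z^{-2})$, a pure-delay second branch for ilet3, and a mismatch measure in the spirit of (5.29), taken here as the peak squared magnitude difference normalised by a hypothetical DFT length of 1024. The function names are not from the original work.

```python
import numpy as np
from scipy.signal import freqz

def quantize_coeff(alpha, frac_bits=8):
    """Round a coefficient to the nearest multiple of 2**-frac_bits."""
    return np.round(alpha * 2**frac_bits) / 2**frac_bits

def allpass_z2(alpha, w):
    """Frequency response of A(z^2) = (alpha + z^-2) / (1 + alpha z^-2)."""
    _, resp = freqz([alpha, 0.0, 1.0], [1.0, 0.0, alpha], worN=w)
    return resp

def scaling_filter(w, alpha0, alpha1=None):
    """H0 = 0.5 * (A0(z^2) + z^-1 A1(z^2)); alpha1=None means a pure-delay branch."""
    branch1 = np.exp(-1j * w)
    if alpha1 is not None:
        branch1 = branch1 * allpass_z2(alpha1, w)
    return 0.5 * (allpass_z2(alpha0, w) + branch1)

def peak_mismatch_db(w, float_coeffs, n_dft):
    """Peak of the squared magnitude mismatch, normalised by n_dft, in dB."""
    fixed = [quantize_coeff(a) for a in float_coeffs]
    diff = np.abs(scaling_filter(w, *float_coeffs)) - np.abs(scaling_filter(w, *fixed))
    return 10.0 * np.log10(np.max(diff**2) / n_dft)

N_DFT = 1024  # hypothetical DFT length, not taken from the text
w = np.linspace(0.0, np.pi, N_DFT)
ilet3_db = peak_mismatch_db(w, (1.0 / 3.0,), N_DFT)
ilet5_db = peak_mismatch_db(w, (0.10557, 0.52786), N_DFT)
print(quantize_coeff(1.0 / 3.0), quantize_coeff(0.10557), quantize_coeff(0.52786))
print(ilet3_db, ilet5_db)
```

Because the quantized sections remain allpass, the mismatch in this sketch arises only from the changed phase relationship between the two polyphase branches.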

Figure 5.16: ilet3 wavelet analysis filters; (a) Magnitude responses and (b) PZPs for floating-point \( H_0(z) \) and fixed-point \( \tilde{H}_0(z) \) coefficients, ilet5 wavelet Analysis filters; (c) Magnitude responses, and (d) PZPs for floating point and fixed point coefficients.
A fixed-point conversion results in arithmetic round-off errors and arithmetic overflow. The round-off errors occur due to the shorter precision used in the datapath and degrade the filter performance in a linear manner. Arithmetic overflows are more critical; they occur when the filter’s dynamic range exceeds the maximum dynamic range supported by the fixed point precision. Such overflows can make the system behave non-linearly, which results in erroneous outputs. The dynamic range of a filter is related to its gain, and this gain depends on the filter structure being used. Allpass filters can be implemented with different structures, each of which has different gain and quantization noise characteristics that vary with the filter coefficient and the input frequency content. In [171] the effects of datapath quantization for four different allpass structures are analysed using a linear quantization noise model, in which the quantization noise is modelled as white additive noise that is uncorrelated with the quantized input and is added at each adder and multiplier. In this study, a similar analysis is carried out and the ND(TDL) allpass filter structure, presented in Figure 5.17, is selected as it offers the best gain and quantization noise figures.

![Diagram of allpass filter structure](image)

Figure 5.17: Peak gain and quantization noise shaping for 1st order ND-TDL allpass structure [171].

\[
\begin{aligned}
|P_1(z)| &= \frac{1 - z^{-2}}{1 + \alpha z^{-1}} \\
|P_2(z)| &= \frac{\alpha + z^{-1}}{1 + \alpha z^{-1}}
\end{aligned} \quad (5.30)
\]

\[
\begin{aligned}
|H_{N_1}(z)| &= \frac{1}{1 + \alpha z^{-1}} \\
|H_{N_2}(z)| &= \frac{\alpha}{1 + \alpha z^{-1}} \\
|H_{N_3}(z)| &= \frac{1}{1 + \alpha z^{-1}}
\end{aligned} \quad (5.31)
\]
The peak internal values of the implemented first order allpass structures are determined at the output of each adder, since the coefficient values are smaller than one and adders are the only arithmetic operations contributing towards the internal gain growth. Thus, the transfer functions $P_1(z)$ and $P_2(z)$ are derived and presented in (5.30). Evaluating $P_1(z)$ and $P_2(z)$ for the coefficient values 0.33, 0.1056 and 0.5279, the resulting magnitude responses of $P_2(z)$ (i.e. $|P_2(e^{j\omega})|$) are uniform and one as expected, while the magnitude responses of $|P_1(e^{j\omega})|$ are presented in Figure 5.18 (a).

![Figure 5.18: (a) Gain $|P_1(z)|$ and (b) Output quantization noise power (QNPSD), for $A_0(z)$ of ilet3 (blue), $A_0(z)$ of ilet5 (red) and $A_1(z)$ of ilet5 (black).](image)

As can be observed, the maximum gain of $|P_1(e^{j\omega})|$ in the $L_1$-norm sense is 2 at half-Nyquist for all three coefficients. In other words, the magnitude response of the transfer function of the internal calculations is limited to a finite value of 2 at any frequency and is independent of the coefficient values, under the assumption that the coefficients are smaller than one. This implies that if the input signal has frequency components close to half-Nyquist, the gain of the internal operations will be doubled, and one extra guard bit is required at each arithmetic operator in order to prevent arithmetic overflow due to the bit growth. As mentioned above, the datapath word-length is evaluated by adding a white noise input at each arithmetic node and computing the quantization noise power at the output of each allpass filter. For this purpose, a convergent quantizer is used after the multiplier to quantize the datapath back to the selected number of fractional bits while keeping the added guard bit, so that the quantizer alone defines the arithmetic round-off error. In this work, the effect of convergent quantization on the \( \text{ND(TDL)} \) filter’s internal datapath is evaluated by feeding it with white Gaussian noise represented by 12 fractional bits and quantizing the datapath to the same number of fractional bits as the input. The resulting quantization noise power at the output of the allpass filters with the aforementioned coefficients is presented in Figure 5.18(b); it ranges between \(-107 \, \text{dB}\) and \(-97 \, \text{dB}\), depending on the input frequency content and the value of the allpass coefficient. As can be observed, the average noise power level increases with increasing coefficient values, which is a result of the filter poles moving closer to the unit circle. Also, the quantization noise will go to infinity if the coefficient values approach one, since there is no counter effect of the numerator of \( P_1(z) \) [171]. If the filter coefficients approach one, the increase in quantization noise power could be compensated with a few additional fractional bits.
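The peak gain statement can be checked numerically from (5.30). The sketch below simply evaluates the two transfer functions on a dense frequency grid for the three coefficient values; the grid size and function names are assumptions made for illustration.

```python
import numpy as np

def p1_gain(alpha, w):
    """|P1(e^{jw})| with P1(z) = (1 - z^-2) / (1 + alpha z^-1), from (5.30)."""
    z_inv = np.exp(-1j * w)
    return np.abs((1.0 - z_inv**2) / (1.0 + alpha * z_inv))

def p2_gain(alpha, w):
    """|P2(e^{jw})| with P2(z) = (alpha + z^-1) / (1 + alpha z^-1), from (5.30)."""
    z_inv = np.exp(-1j * w)
    return np.abs((alpha + z_inv) / (1.0 + alpha * z_inv))

w = np.linspace(0.0, np.pi, 1 << 14)
alphas = (1.0 / 3.0, 0.10557, 0.52786)
peaks = {a: p1_gain(a, w).max() for a in alphas}
peak_freqs = {a: w[np.argmax(p1_gain(a, w))] / np.pi for a in alphas}
print(peaks)
print(peak_freqs)  # peak location as a fraction of the Nyquist frequency
```

In this sketch the peak of $|P_1|$ equals 2 for every coefficient, which supports the single guard bit; the peak sits where $\cos\omega = -\alpha$, so it stays close to half-Nyquist for small coefficients and moves above it as $\alpha$ grows.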

Finally, since the lowpass and highpass filters are implemented as polyphase structures and the outputs of each branch do not grow beyond the input range, the summation of the allpass structures only requires one extra bit to fully represent the output. Based on the above evaluations, three-level analysis and synthesis filter banks are implemented, where the block processing method is used for the synthesis filter bank and the finite-wordlength effects are evaluated at the output of the filter bank. In addition, the approximation and detail coefficients are crucial elements of the wavelet transform that are used for denoising and detection applications. Thus, the quantization effects on these coefficients are also evaluated individually. For this purpose, three different types of data are employed: white Gaussian noise and the ECG and EEG datasets presented in Figures 5.14 and 5.15. For the ECG data records, the input data is represented with an 11-bit wordlength with 8 fractional bits, whereas the white Gaussian noise and EEG data are represented with 14 bits with 12 fractional bits. The quantization is applied at the internal datapath of each allpass filter, where 8 fractional bits are discarded. Table 5.2 presents the average MSE, Quantization Noise Power (QNPSD), and SER at the output for both the \textit{ilet3} and \textit{ilet5} wavelet filter banks, whereas Table 5.3 presents the average QNPSD values for the approximation and detail coefficients ($A_3, A_2, A_1$ and $D_3, D_2, D_1$) at decomposition levels 3, 2, and 1, respectively.
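The convergent quantizer referred to above can be sketched as round-half-to-even rounding onto a fixed-point grid. The example below is a simplified illustration only: it ignores overflow handling and the guard bit, uses 8 fractional bits as an example word-length, and compares the measured round-off noise power with the standard uniform-noise estimate of $\text{LSB}^2/12$, which is a textbook approximation rather than a result of this work.

```python
import numpy as np

def convergent_quantize(x, frac_bits):
    """Round to frac_bits fractional bits, with ties going to the even code."""
    scale = 2.0 ** frac_bits
    # np.round rounds exact halves to the nearest even integer.
    return np.round(np.asarray(x, dtype=float) * scale) / scale

FRAC_BITS = 8  # example word-length only
lsb = 2.0 ** -FRAC_BITS

# Ties: 0.5, 1.5 and 2.5 LSB round to 0, 2 and 2 LSB respectively.
ties = convergent_quantize(np.array([0.5, 1.5, 2.5]) * lsb, FRAC_BITS) / lsb

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
err = convergent_quantize(x, FRAC_BITS) - x
noise_power_db = 10.0 * np.log10(np.mean(err ** 2))
ratio = np.mean(err ** 2) / (lsb ** 2 / 12.0)
print(ties, noise_power_db, ratio)
```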

Table 5.2: The average MSE, QNPSD, and SER metrics obtained for White Gaussian Noise, ECG and EEG data with \textit{ilet3} and \textit{ilet5} filter banks.

<table>
<thead>
<tr>
<th rowspan="2">Error measurement</th>
<th colspan="3">\textit{ilet3}</th>
<th>\textit{ilet5}</th>
</tr>
<tr>
<th>MSE</th>
<th>QNPSD (dB)</th>
<th>SER (dB)</th>
<th>MSE</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gaussian White Noise</td>
<td>$3.5 \times 10^{-8}$</td>
<td>-74.5</td>
<td>69.7</td>
<td>$6 \times 10^{-8}$</td>
</tr>
<tr>
<td>ECG</td>
<td>$6.4 \times 10^{-6}$</td>
<td>-52.4</td>
<td>45.3</td>
<td>$8.6 \times 10^{-6}$</td>
</tr>
<tr>
<td>EEG</td>
<td>$3.6 \times 10^{-8}$</td>
<td>-74.5</td>
<td>50.3</td>
<td>$5.7 \times 10^{-8}$</td>
</tr>
</tbody>
</table>

Table 5.3: The average QNPSD obtained for the approximation ($A_3, A_2, A_1$) and detail ($D_3, D_2, D_1$) coefficients at decomposition levels 3, 2, and 1 for White Gaussian Noise, ECG and EEG data with \textit{ilet3} and \textit{ilet5} filter banks.

<table>
<thead>
<tr>
<th>\textit{ilet3}</th>
<th></th>
<th>\textit{ilet5}</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>White Noise (dB)</td>
<td>ECG (dB)</td>
<td>EEG (dB)</td>
</tr>
<tr>
<td>A3</td>
<td>-90.1</td>
<td>-68.1</td>
<td>-92</td>
</tr>
<tr>
<td>A2</td>
<td>-86.3</td>
<td>-65</td>
<td>-88.8</td>
</tr>
<tr>
<td>A1</td>
<td>-84.6</td>
<td>-62.2</td>
<td>-86</td>
</tr>
<tr>
<td>D3</td>
<td>-93.5</td>
<td>-75.1</td>
<td>-98.5</td>
</tr>
<tr>
<td>D2</td>
<td>-89.8</td>
<td>-69.5</td>
<td>-93.2</td>
</tr>
<tr>
<td>D1</td>
<td>-84.6</td>
<td>-62.2</td>
<td>-86</td>
</tr>
</tbody>
</table>
It should be noted that at each level the input data fed to the analysis filters is scaled down by two, and the fractional bit growth is not truncated in order not to lose precision in the approximation and detail coefficients. Therefore, decomposition levels 1, 2, and 3 employ 9, 10, and 11 fractional bits for the ECG data and 13, 14, and 15 fractional bits for the white Gaussian noise and EEG data. The theoretical analysis demonstrated that when white Gaussian noise with 12 fractional bits is fed through the allpass filters, the expected allpass filter output quantization noise power for the \(ilet3\) and \(ilet5\) floating-point filter coefficients ranges between \(-107\,\text{dB}\) and \(-97\,\text{dB}\).

Observing Table 5.3, which presents the QNPSD of the lowpass and highpass outputs for all three types of input data, it can be seen that the average QNPSD values for the white Gaussian noise and EEG datasets are close but not equal to the theoretical values. This is expected, as the fractional bit length increases at every decomposition level, which decreases the QNPSD of the approximation and detail coefficients. However, the structure of the input signal also has a significant effect on the theoretical QNPSD calculated, which can cause a deviation of up to \(\pm 10\%\) [17]. In addition, although the EEG and white noise data have the same wordlengths, it can be easily observed that the QNPSD values of the EEG data records are lower. This is due to the spectral content of EEG signals, as most of the signal power is concentrated at the low frequencies where the QNPSD is much lower, as shown in Figure 5.18. Thus, the presented results are in the expected noise power range. On the other hand, the noise power at the filter bank output is higher than the internal noise power, since quantization is also applied in the synthesis filter bank, but it remains negligible.
5.5.3 Hardware Validation and Cost Assessment

For hardware validation, cost assessment and performance evaluation, the *ilet3* and *ilet5* IIR wavelet filter banks (Figures 5.11 and 5.12) are designed as tree structured 1-level filter banks, and are synthesized and placed and routed (PAR) on a Kintex-7 (xc7k325tffg900) FPGA in Vivado v16.2, using the System Generator for DSP in Matlab/Simulink. The filter bank resource utilization is presented in terms of LUTs, flip-flops and block Random Access Memories (RAMs), along with the maximum operating frequencies and power consumption figures. The implementation of the analysis filter banks is similar to the structures presented in Figures 5.11(a) and 5.12(a) and, as mentioned earlier, the ND(TDL) filter structures are used for the allpass sections. In addition, the filter coefficients are represented in CSD format and the CSE method is used to implement them with hard-wired shifts and a minimum number of adders. On the other hand, the synthesis filter bank implementation requires more attention, as the timing and storage of the incoming data are critical. In order to implement the LIFO registers and the required data segmentation as described by (5.22), two dual port RAMs are used, one at the input and one at the output of the synthesis filter bank. The following subsections provide details of the synthesis filter bank architecture and timing diagrams, as well as the resource and power figures of the complete filter banks.

IIR Analysis Filter Bank Architecture

As discussed previously, the IIR analysis filter banks are implemented using two-branch polyphase structures which employ allpass sections to implement the scaling ($H_0(z)$) and the wavelet ($H_1(z)$) filters. The allpass sections are designed with IIR structures employing two registers, two adders and one multiplier. As the filter coefficients are designed according to the requirements of the target application, they are treated as constant multiplications; thus, they are represented in CSD format and implemented as hard-wired shifts and adds using the CSE method introduced in Chapter 4. Hence the analysis filter bank is multiplier free, where the ilet3 analysis filter bank employs six adders and three registers and the ilet5 analysis filter bank employs ten adders and five registers, as each polyphase branch requires one first order allpass filter. Based on the analysis provided in the previous section, the gain of the allpass filters for both ilet3 and ilet5 is one, and one guard bit is used for the adders to prevent any overflow that might lead to an unstable system. Table 5.4 presents the fixed-point coefficients along with the adder cost and the shift-add network used to implement the SCMs.

Table 5.4: Fixed-point (9-bit) ilet3 and ilet5 wavelet filter coefficients, their adder costs and shift-add format used to design the constant multiplications.

<table>
<thead>
<tr>
<th></th>
<th>ilet3</th>
<th>ilet5</th>
</tr>
</thead>
<tbody>
<tr>
<td>Z</td>
<td>0.33203125</td>
<td>0.10546875</td>
</tr>
<tr>
<td>Adder Cost</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>Shift-Add</td>
<td>((2^2 + 1)(2^4 + 1))</td>
<td>(2^5 - (2^2 + 1))</td>
</tr>
</tbody>
</table>
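The shift-add realisation of the constant multiplications can be illustrated with integer arithmetic. With 8 fractional bits the tabulated coefficients correspond to the integers $85 = 0.33203125 \times 2^8$ and $27 = 0.10546875 \times 2^8$; one two-adder factorisation of each is sketched below. The function names are hypothetical and the final scaling by $2^{-8}$ is left out, since in hardware it is only a reinterpretation of the binary point.

```python
def times_85(x: int) -> int:
    """85 * x with two adders: (2^2 + 1) * (2^4 + 1) = 5 * 17."""
    t = (x << 2) + x          # 5x
    return (t << 4) + t       # 80x + 5x = 85x

def times_27(x: int) -> int:
    """27 * x with two adders/subtractors: 2^5 - (2^2 + 1) = 32 - 5."""
    return (x << 5) - ((x << 2) + x)

sample = 100
print(times_85(sample) / 256, times_27(sample) / 256)
```

Each product reuses the shared term $2^2 + 1$, which is the kind of common subexpression that a CSE approach looks for.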

Table 5.5 presents the resource utilization and power consumption figures estimated at 50 MHz for the one-level analysis filter banks for the ilet3 and ilet5 wavelets. As expected, the ilet3 analysis filter bank employs fewer resources than the ilet5 analysis filter bank, achieving 32% resource savings and a 24% improvement in power consumption for typical 11-bit ECG data.
The resource utilization of the multiplierless db4 analysis filter bank was presented in Chapter 4 and is compared here with that of the proposed IIR wavelet analysis filter banks. It can be observed that the resources utilized are reduced substantially, by 61% and 43% with the ilet3 and ilet5 analysis filter banks, respectively, which translates into power consumption improvements of 55% and 21.5%, respectively.

IIR Synthesis Filter Bank Architecture

The fixed-point Simulink model of the synthesis filter banks for both ilets is discussed in Section 5.5.1, where LIFO registers and switches are used for storing and flipping the data in time. For the hardware implementation, the LIFO registers and switches are replaced with dual port RAMs. The dual port RAMs are configured not to read during write mode, and the data addresses are controlled by up-down counters. Hence, the RAM writes into the addresses while the counter is counting up and reads from the addresses while the counter is counting down. During the write operation the outputs of the RAMs are reset back to zero. The input ports are accessed concurrently with a delay of $L$ samples, which is the selected block size. This configuration generates the required input signal segmentation. The dual-RAM outputs are then filtered by identical allpass filters and added up to obtain the overlap-added output. Finally, an additional dual-port RAM is employed to reverse the time-reversing operation and to obtain the required output. For the \textit{ilet}3 wavelet the block size $L_0 = 8$ is selected; thus, the dual-port RAM requires 16 addresses, eight for the top branch and eight for the bottom branch. For this purpose, a 3-bit up-down counter is designed, where the three output bits are extended with an MSB of 0 or 1 to address locations 0 to 7 and 8 to 15, respectively. The structure of a one-level IIR synthesis filter bank for the \textit{ilet}3 wavelet is given in Figure 5.19(a).
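The address generation described above can be mimicked in software. The following is a simplified behavioural sketch, not the implemented hardware: it models a single-clock RAM as a Python list, ignores the $L$-sample offset between the two ports and the reset of the outputs, and uses hypothetical function names. It shows how counting up during writes and down during reads yields the LIFO (time-reversed) read-out, and how the MSB extension separates the two branches.

```python
def updown_counter_addresses(n_bits, msb):
    """Write addresses (counting up) and read addresses (counting down).

    The n_bits counter value is extended with one MSB that selects the
    top (msb = 0) or bottom (msb = 1) half of the RAM.
    """
    size = 1 << n_bits
    write_addrs = [(msb << n_bits) | c for c in range(size)]            # DIR = up
    read_addrs = [(msb << n_bits) | c for c in reversed(range(size))]   # DIR = down
    return write_addrs, read_addrs

def lifo_block(ram, block, n_bits, msb):
    """Write one block while counting up, then read it back while counting down."""
    write_addrs, read_addrs = updown_counter_addresses(n_bits, msb)
    for addr, sample in zip(write_addrs, block):
        ram[addr] = sample
    return [ram[addr] for addr in read_addrs]

ram = [0] * 16                                   # 16 addresses for L0 = 8
top = lifo_block(ram, list(range(10, 18)), n_bits=3, msb=0)
bottom = lifo_block(ram, list(range(20, 28)), n_bits=3, msb=1)
print(top, bottom)
```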

Figure 5.19: \textit{ilet}3 (a) One-level IIR synthesis filter bank architecture and (b) the timing diagram for controlling the operation of the synthesis filter bank.
Furthermore, the timing diagram for the synthesis filter bank is presented in Figure 5.19(b), where ‘CLK’ is the clock driving the synthesis filter bank, ‘DIR’ determines the direction of the up-down counter, ‘WR\_a,b’ are the write enables for the top and the bottom input ports, respectively, and similarly ‘RST\_a,b’ are the resets for the top and the bottom output ports. The architecture of the \textit{ilet}5 IIR synthesis filter bank is presented in Figure 5.20.

![Diagram](image)

Figure 5.20: \textit{ilet}5 One-level IIR synthesis filter bank architecture with the block processing method.

Here, unlike the \textit{ilet}3 design, each polyphase branch of the synthesis filter bank employs the block processing method. The dual-port RAMs in the top branch have 16 addresses, whereas the dual-port RAMs of the bottom branch have 32 addresses, due to the block sizes $L_0 = 8$ and $L_1 = 16$ determined previously. The addresses for the top branch are generated in the same way as in the \textit{ilet}3 design, whereas the addresses for the bottom branch are generated by a 4-bit up-down counter whose output is extended with an MSB of 0 or 1 in order to obtain addresses from 0 to 15 and 16 to 31, respectively. The state diagrams and combinational logic of the 3- and 4-bit up-down counters are presented in Appendix B, Figures B.1 and B.2. The same timing principles used for *ilet3* are employed for this structure as well. Finally, Table 5.6 presents the number of adders and multipliers as well as the resource utilization, maximum operating frequency and estimated power consumption of the *ilet3* and *ilet5* one-level IIR/IIR wavelet filter banks.

Table 5.6: Resource Utilization and Power Consumption of the Multiplier Free *ilet3* and *ilet5* IIR/IIR Filter Bank Architectures.

<table>
<thead>
<tr>
<th></th>
<th><em>ilet3</em></th>
<th><em>ilet5</em></th>
</tr>
</thead>
<tbody>
<tr>
<td>Input word length (bits)</td>
<td>8</td>
<td>11</td>
</tr>
<tr>
<td>Adders</td>
<td>17</td>
<td>17</td>
</tr>
<tr>
<td>Multipliers</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>LUTs</td>
<td>220</td>
<td>288</td>
</tr>
<tr>
<td>Registers</td>
<td>139</td>
<td>175</td>
</tr>
<tr>
<td>Max. Frequency (MHz)</td>
<td>76.4</td>
<td>75.6</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>5.333</td>
<td>6.437</td>
</tr>
</tbody>
</table>

Compared to the analysis filter bank resource utilization and power consumption figures from Table 5.5, a relatively large increase in the number of LUTs and FFs used, as well as in the estimated power consumption, can be easily observed. For the *ilet5* IIR/IIR filter bank with an 11-bit input, the synthesis filter bank employs almost three times more resources, in addition to the block RAMs of the FPGA, which corresponds to approximately three times more power consumption. Although the block processing method enables the causal implementation of the anti-causal filters, the hardware evaluation results show that this method is not efficient in terms of either hardware complexity or power consumption.
5.6 Hybrid IIR/FIR Wavelet Filter Banks for Biomedical Applications

In Section 5.5 two filter bank architectures are presented where the synthesis filter banks are implemented using the block processing method. The cost and power estimation studies demonstrated that the use of dual-port RAMs, as well as the increased number of arithmetic operations in the synthesis filter bank, reduces the likelihood of these architectures being employed in portable biomedical devices. A more attractive alternative, in which the anti-causal IIR synthesis filters are replaced with FIR filters, is therefore proposed. Following the method and the theory introduced in Section 5.3, two hybrid IIR/FIR filter banks are designed and implemented for the ilet3 and ilet5 wavelets to be employed in low-complexity biomedical signal processing applications. Since the analysis IIR filters are already designed, the first step of the hybrid filter bank design procedure is the determination of the FIR filters’ order so as to minimize the phase compensation error defined as $\varepsilon(\alpha_i, L_i)$ while maintaining low complexity and a reasonable system delay. After the FIR filters’ order is determined, the floating point model is validated by feeding the system with random biomedical signals and measuring the error introduced in the reconstructed signal. In order to implement the system on a hardware platform, an FPGA in this case, the floating-point filter banks are converted into fixed-point filter banks and later synthesized in order to evaluate the hardware complexity and estimate the power consumption.
5.6.1 Hybrid IIR/FIR Wavelet Filter Banks - Floating Point

In Section 5.4, two ilet wavelets are designed for the analysis filter banks: ilet3, with one first order allpass filter, and ilet5, with two first order allpass filters. The structures of the analysis filter banks are also shown in Figures 5.11 and 5.12.

In order to implement the hybrid IIR/FIR wavelet filter banks, the FIR synthesis filters for each allpass structure are designed using (5.24c). Figure 5.21(a) presents the phase compensation error, measured in terms of the MSE between the IIR and the approximated FIR filters, for FIR filter lengths varying from $L_0 = 2$ to $L_0 = 11$.

Figure 5.21: (a) Phase compensation error with $(L_0 - 1)^{th}$ order FIR filter for ilet3 wavelet, (b) Magnitude of Linear Distortion Transfer Function ($D_L(z)$), (c) Magnitude of Aliasing Distortion Transfer Function ($D_A(z)$), and (d) the group delay of the analysis and synthesis filter banks.
There is an inverse relationship between the filter order and the amount of error introduced: increasing the FIR filter order reduces the error between the two filters. As depicted in Figure 5.21(a), the magnitude of this error is linear on a logarithmic scale with respect to the number of coefficients selected for the FIR filter. This is simply the result of the exponential decay of the error $\varepsilon(\alpha_{0,1}, L_{0,1}) = \alpha_{0,1}^{L_{0,1}}$ given in (5.26). In addition, the magnitude response of an ideal IIR allpass filter equals unity, which is due to the trigonometric relationship between the numerator and denominator polynomials, as $z = e^{j2\pi\nu} = \cos(2\pi\nu) + j\sin(2\pi\nu)$ by Euler's formula. As demonstrated by (5.24b), this ratio of polynomials is approximated by a single polynomial (i.e. an FIR filter), which requires a compromise in the accuracy of the unity magnitude response. Therefore, the filter magnitude mismatch (i.e. the error between the ideal allpass IIR and the approximated FIR magnitude responses) carries a trigonometric characteristic and approximates a sinusoidal response, as can be observed in Figure 5.21(b). Furthermore, Figures 5.21(b) and 5.21(c) present the magnitudes of the linear distortion and aliasing distortion transfer functions defined by (5.16) and (5.17), respectively, where two filter lengths, $L_0 = 7$ (blue) and $L_0 = 11$ (red), are used to observe their effects. Finally, Figure 5.21(d) presents the overall group delay of the analysis-synthesis filter banks. An FIR filter length of $L_0 = 11$ provides a better approximation of the IIR analysis allpass filters, as the magnitude of the linear distortion transfer function is $\approx 1$, the magnitude of the aliasing distortion function is $\approx -120 \, dB$ and the group delay variation of the filter bank is of the order of $10^{-4}$. However, there is a trade-off between the performance and the computational complexity. Employing a $10^{th}$ order FIR filter requires extra coefficients and higher precision, which increases the complexity of the synthesis filter; therefore $L_0 = 7$ is selected to balance this trade-off while maintaining the requirements for a near-PR filter bank. The magnitude and group delay responses of the analysis $(H_0(z), H_1(z))$ and the synthesis $(G_0(z), G_1(z))$ filters, implemented using $A_0(z)$ with $\alpha_0 = 1/3$ and the proposed 6th order FIR filter $(R_0(z))$, are presented in Figure 5.22.
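
The exponential decay in (5.26) can be checked numerically. The short Python sketch below is illustrative only: the function name `phase_error_db` and the choice to express the error as $20\log_{10}$ of $\alpha^{L}$ are assumptions made here, and the printed values are not the MSE figures plotted in Figure 5.21(a).

```python
import math

def phase_error_db(alpha, length):
    """Return 20*log10(alpha**length), the error of (5.26) on a dB scale."""
    return 20.0 * math.log10(alpha ** length)

alpha0 = 1.0 / 3.0  # ilet3 allpass coefficient quoted in the text
errors = {L: phase_error_db(alpha0, L) for L in range(2, 12)}

# On a dB scale the error falls by the same amount for every extra coefficient.
step = errors[3] - errors[2]
for L in range(2, 12):
    print(f"L0 = {L:2d}: {errors[L]:8.2f} dB")
print(f"change per extra coefficient: {step:.2f} dB")
```

For $\alpha_0 = 1/3$ each additional coefficient lowers this error measure by about 9.5 dB, which is the straight line on a logarithmic axis described above.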

Figure 5.22: (a) Analysis filter magnitude responses, (b) synthesis filter magnitude responses, (c) analysis filter group delay, and (d) synthesis filter group delay.

The similarity between the analysis and synthesis filters’ magnitude responses is evident, although $G_0(z)$ and $G_1(z)$ exhibit smaller stopband attenuation. In addition, the group delay responses of the analysis and the synthesis filters are almost the opposite of each other, ensuring phase compensation up to a certain error. The structure of the one-level hybrid wavelet filter bank is presented in Figure 5.23.

Figure 5.23: Floating-point model of the one-level hybrid IIR/FIR wavelet filter bank for the ilet3 wavelet.

In addition, Figure 5.24(a) presents the effect of $L_{0,1}$ selections on the magnitude responses of the $R_{0,1}(z)$ designed for the ilet5 wavelet, and the degree of deviation from the analysis allpass filters $A_{0,1}(z)$ in terms of MSE.

Figure 5.24: (a) Phase compensation error with $(L_0 - 1)^{th}$ and $(L_1 - 1)^{th}$ order FIR filters for ilet5 wavelet, (b) Magnitude of Linear Distortion Transfer Function ($D_L(z)$), (c) Magnitude of Aliasing Distortion Transfer Function ($D_A(z)$), and (d) the group delay of the analysis and synthesis filter banks.
Unlike the *ilet*3 wavelet, the *ilet*5 wavelet employs two first-order allpass filters, where the coefficient of $A_0(z)$ is smaller than that of the *ilet*3 allpass filter and the coefficient of $A_1(z)$ is larger. This is reflected in the MSE values, since selecting $L_{0,1} = 7$ results in $-140$ dB and $-50$ dB mismatch for $A_0(z)$ and $A_1(z)$, respectively. Therefore, a smaller $L_0$ and a larger $L_1$ need to be selected. Figures 5.24(b), 5.24(c), and 5.24(d) compare the effect of the $L_{0,1}$ selection and, based on these comparisons, $L_0 = 5$ and $L_1 = 11$ are selected for the synthesis FIR filters. The magnitude and group delay responses of the analysis and the synthesis filters are presented in Figure 5.25.

**Figure 5.25:** *ilet*5 wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter magnitude responses, (c) Analysis filter group delay, and (d) synthesis filter group delay.
Similar to the ilet3 wavelet and scaling filters, the synthesis filters’ magnitude responses exhibit smaller stopband attenuation, which is $-68$ dB. The structure of the one-level hybrid wavelet filter bank is presented in Figure 5.26.

Figure 5.26: Floating-point model of the one-level hybrid IIR/FIR wavelet filter bank for the ilet5 wavelet.

Furthermore, three-level floating-point hybrid IIR/FIR wavelet filter banks are implemented for the ilet3 and ilet5 wavelets and are fed with the same dataset used in the previous section, in order to evaluate the effect of the selected FIR filter lengths on the PR property of the filter banks. For this purpose, MSE, MAE, and SER are calculated. The results are summarized in Table 5.7, where the presented values demonstrate that a negligible amount of reconstruction error is introduced to the three data types with different frequency content.

Table 5.7: Average Error Measures for evaluating the implemented three level hybrid IIR/FIR ilet3 and ilet5 wavelet filter banks.

<table>
<thead>
<tr>
<th>Error measurement</th>
<th>ilet3</th>
<th>ilet5</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>SER (dB)</td>
<td>MSE (mV)</td>
</tr>
<tr>
<td>Gaussian White Noise</td>
<td>67</td>
<td>$5.5 \times 10^{-10}$</td>
</tr>
<tr>
<td>ECG</td>
<td>63</td>
<td>$1.4 \times 10^{-6}$</td>
</tr>
<tr>
<td>EEG</td>
<td>64</td>
<td>$1.6 \times 10^{-9}$</td>
</tr>
</tbody>
</table>

Increasing the FIR filter orders makes the linear distortion arbitrarily small, hence the hybrid filter banks can achieve better PR performance. However, as the FIR filter order increases, the values of the extra coefficients get smaller, which increases the hardware and computational complexity of the hybrid filter banks, as longer coefficient word-lengths are required to represent the small coefficients. Otherwise, these coefficients are rounded to zero and do not contribute to the PR performance of the filter bank. The values presented in Table 5.7 show that, with the selected synthesis FIR filter orders, the filter bank output is near-perfectly reconstructed.
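
The word-length argument can be illustrated with a short sketch. It assumes, as an illustration rather than a restatement of (5.24c), that the FIR taps follow the impulse response of a first-order allpass section $A(z) = (\alpha + z^{-1})/(1 + \alpha z^{-1})$, whose tail terms are $(1-\alpha^2)(-\alpha)^k$. The helper name `useful_tail_terms` is hypothetical.

```python
def useful_tail_terms(alpha, frac_bits, max_terms=64):
    """Count the tail terms (1 - alpha^2) * (-alpha)^k that survive rounding
    to `frac_bits` fractional bits, stopping at the first term that rounds to zero."""
    scale = 1 << frac_bits
    count = 0
    for k in range(max_terms):
        tap = (1.0 - alpha * alpha) * (-alpha) ** k
        if round(tap * scale) == 0:   # tap is below half an LSB: it is lost
            break
        count += 1
    return count

alpha0 = 1.0 / 3.0  # ilet3 allpass coefficient quoted in the text
for bits in (9, 12, 15):
    print(bits, "fractional bits ->", useful_tail_terms(alpha0, bits), "non-zero tail terms")
```

Under these assumptions, nine fractional bits leave seven non-zero tail terms for $\alpha_0 = 1/3$; longer filters only pay off if the coefficient word-length grows with them.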

5.6.2 Hybrid IIR/FIR Wavelet Filter Banks - Fixed Point

Fixed-point conversion of the analysis filter banks for both \( \text{ilet}_3 \) and \( \text{ilet}_5 \) wavelets is presented in Section 5.5, where the allpass filter coefficients are quantized to 9 bits. Since FIR filters are more sensitive to coefficient quantization and require longer word-lengths than the allpass-based halfband polyphase IIR filters, the FIR filter coefficients are quantized to 10 bits (one sign and nine fractional bits). The FIR synthesis filter \( R_0(z) \) for the \( \text{ilet}_3 \) wavelet employs eight distinct coefficients, whereas the synthesis filters \( R_0(z) \) and \( R_1(z) \) for the \( \text{ilet}_5 \) wavelet employ six and twelve distinct coefficients, respectively. The magnitude responses of the synthesis filters with floating-point \( (G_{0,1}(z)) \) and fixed-point coefficients \( (\tilde{G}_{0,1}(z)) \) are presented in Figure 5.27.

Figure 5.27: Magnitude responses of (a) \( \text{ilet}_3 \) synthesis filters and (b) \( \text{ilet}_5 \) synthesis filters with floating-point (blue-red) and fixed-point (black-green) coefficients.
The coefficient quantization leads to a 10 \( dB \) deterioration in the synthesis filters’ stopband attenuation. Unlike in conventional digital filter designs, the effect of this deterioration is evaluated by recomputing the linear and aliasing distortion transfer functions introduced in Section 5.3, as well as the overall filter bank group delay, using the scaling and wavelet analysis and synthesis filters with the quantized coefficients. These are presented in Figures 5.28 and 5.29 for the \( ilet3 \) and \( ilet5 \) wavelets, respectively.

Figure 5.28: Magnitude of: (a) linear, and (b) aliasing distortion transfer functions, and (c) the hybrid filter bank group delay, with floating- (blue) and fixed-point (red) coefficients for \( ilet3 \) wavelet.
Although the floating-point IIR/FIR hybrid wavelet filter banks produce equiripple responses, as outlined in Subsection 5.6.1, their finite-precision counterparts deviate from the equiripple characteristics. In Figures 5.28 and 5.29, this deviation can be easily observed in the red plots. As presented in Section 5.3, the linear and aliasing distortion transfer functions are calculated using (5.16) and (5.17), which result in deterministic responses, whereas the changes in the magnitude and group delay responses after filter coefficient quantization are non-deterministic and cannot be expressed analytically. In addition, the ripple characteristics, such as their shape and amplitude, differ according to the selected loss of precision method. In this study, the loss of precision method is empirically determined according to the peak error magnitude, and convergent rounding is employed since it provides the least peak error for both the linear and aliasing distortion transfer functions. Increasing the precision of the coefficients (more fractional bits) leads to smaller quantization errors and eventually results in responses similar to those obtained with the floating-point coefficients. However, the quantization error introduced is negligible: the magnitude of the linear distortion functions is $\approx 1$ (with a deviation of $-56.48$ dB) and the group delays are almost constant (deviation of $-33$ dB) for both wavelets. Also, on careful observation of these figures, the magnitude responses of the aforementioned transfer functions are symmetric around half-Nyquist ($\nu = 0.25$), as the allpass IIR and approximated FIR filters are half-band filters. Furthermore, the effect of coefficient quantization is also quantified by employing the same approach as for the floating-point hybrid wavelet filter banks, where the error between the reconstructed data and the input is measured, and the results are presented in Table 5.8. In general, an increase in the MSE and MAE values can be easily observed when compared with the results in Table 5.7. However, regardless of the increase in noise power, the features of the biomedical data, such as the amplitude and fiducial points, are not affected; therefore there is no observable distortion.
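
To make the rounding choice concrete, the sketch below quantizes values to nine fractional bits with three common loss of precision rules. It is illustrative only: the function names are chosen here, and the peak-error comparison of the distortion transfer functions reported above is not reproduced.

```python
import math

FRAC_BITS = 9              # nine fractional bits, as used for the FIR coefficients
SCALE = 1 << FRAC_BITS

def quantize_truncate(x):
    """Drop the bits below the LSB (round towards minus infinity)."""
    return math.floor(x * SCALE) / SCALE

def quantize_half_up(x):
    """Round to nearest, ties towards plus infinity."""
    return math.floor(x * SCALE + 0.5) / SCALE

def quantize_convergent(x):
    """Round to nearest, ties to the even integer (convergent rounding)."""
    return round(x * SCALE) / SCALE   # Python's round() resolves ties to even

# Values lying exactly halfway between two quantization levels.
ties = [(n + 0.5) / SCALE for n in range(-4, 4)]
bias_half_up = sum(quantize_half_up(t) - t for t in ties)
bias_convergent = sum(quantize_convergent(t) - t for t in ties)
print("accumulated tie error, half-up:   ", bias_half_up)
print("accumulated tie error, convergent:", bias_convergent)
```

The tie errors of convergent rounding cancel on average, whereas round-half-up accumulates a positive bias; truncation is included only for comparison.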

Table 5.8: Average Error Measures for Evaluating Three Level Hybrid IIR/FIR ilet3 and ilet5 Wavelet Filter Banks with Finite-Precision Filter Coefficients.

<table>
<thead>
<tr>
<th>Error measurement</th>
<th colspan="2">ilet3</th>
<th colspan="2">ilet5</th>
</tr>
<tr>
<th></th>
<th>SER (dB)</th>
<th>MSE</th>
<th>SER (dB)</th>
<th>MSE</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gaussian White Noise</td>
<td>50.6</td>
<td>$2.9 \times 10^{-5}$</td>
<td>52</td>
<td>$2.1 \times 10^{-5}$</td>
</tr>
<tr>
<td>ECG</td>
<td>50</td>
<td>$2.5 \times 10^{-5}$</td>
<td>49.4</td>
<td>$2.5 \times 10^{-5}$</td>
</tr>
<tr>
<td>EEG</td>
<td>51.1</td>
<td>$2.64 \times 10^{-8}$</td>
<td>49.6</td>
<td>$4.2 \times 10^{-8}$</td>
</tr>
</tbody>
</table>
Figure 5.30: Four seconds of (a) ECG data record-232 (top figure) and reconstruction error (bottom figure), and (b) EEG data record-chb14 (top figure) and reconstruction error (bottom figure).

Figure 5.30 presents four seconds of reconstructed ECG and EEG data after being fed into a three-level $ilet_3$ IIR/FIR hybrid wavelet filter bank with finite-precision coefficients. The top, middle and bottom plots in both Figures 5.30(a) and (b) are the input signal, the output signal and the reconstruction error introduced by the system, respectively. In addition, it can be seen that the three-level $ilet_3$ wavelet filter bank introduces a constant delay of 126 samples, or 0.35 seconds at a sampling rate of 360 Hz, which is the time required for the filter bank to enter the steady state. This delay is a fraction of a second, which is negligible for monitoring purposes since, in real-time applications, the output delay should not be perceivable by the user [172,173].

5.6.3 Hardware Validation and Cost Assessment

For hardware validation, cost assessment and performance evaluation, the $ilet_3$ and $ilet_5$ hybrid IIR/FIR wavelet filter banks are designed as a one-level tree structure with polyphase components, as presented in Figures 5.31(a) and 5.33(a), and are synthesized and placed and routed (PAR) on a Kintex-7 (xc7k325tf900) FPGA in Vivado v16.2, using the System Generator for DSP in Matlab/Simulink. For the hybrid systems, the analysis allpass sections are designed to have the ND(TDL) IIR structure with two adders, two registers and one multiplier, whereas the synthesis FIR filters are implemented as time-multiplexed structures, which conventionally employ an input memory, a coefficient memory, a multiplier and an accumulator. The allpass filter coefficients are represented in CSD format and the CSE method is used to implement them using hard-wired shifts and a minimum number of adders. Since the coefficient multiplications are the most hardware and power demanding arithmetic operations, the ReMB method presented in Chapter 4 is used to replace the coefficient memory and the multipliers of the synthesis filters. The FIR synthesis filter \( R_0(z) \) for the \( ilet_3 \) wavelet employs eight distinct coefficients, whereas the synthesis filters \( R_0(z) \) and \( R_1(z) \) for the \( ilet_5 \) wavelet employ six and twelve distinct coefficients, respectively. In order to design the ReMBs, the aforementioned quantized coefficients are scaled by \( 2^9 \) to obtain integer values. The fixed-point coefficients and their integer representations are listed in Table 5.9.
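
A minimal sketch of the canonic signed digit (CSD) recoding mentioned above is given below. It uses a standard textbook recoding and only illustrates how a constant multiplication maps to shifts and adders; the function names are chosen here, and the sketch does not model the CSE step or the ReMB structure of Chapter 4.

```python
def to_csd(n):
    """Return the CSD digits of integer n, least significant digit first.
    Each digit is -1, 0 or +1 and no two adjacent digits are non-zero."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def shift_add_multiply(x, digits):
    """Multiply integer x by the constant encoded in `digits` using only shifts and adds."""
    acc = 0
    for shift, d in enumerate(digits):
        if d:
            acc += d * (x << shift)
    return acc

coeff = 455                      # one of the integer coefficients of Table 5.9
digits = to_csd(coeff)
adders = sum(1 for d in digits if d) - 1
print(digits, "adders:", adders)
print(shift_add_multiply(3, digits) == 3 * coeff)
```

For example, 455 = 512 - 64 + 8 - 1 has four non-zero CSD digits, so a stand-alone shift-and-add multiplier for this constant needs three adders.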

Table 5.9: FIR Synthesis Filters’ Coefficients for \( ilet_3 \) and \( ilet_5 \) Wavelets.

<table>
<thead>
<tr>
<th></th>
<th>\( ilet_3 \)</th>
<th></th>
<th>\( ilet_5 \)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Fixed-point</td>
<td>Integer</td>
<td>Fixed-point</td>
</tr>
<tr>
<td>( R_0(z) )</td>
<td>0.001953125</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>( R_1(z) )</td>
<td>-0.00390625</td>
<td>-2</td>
<td>-0.001953125</td>
</tr>
<tr>
<td>( R_2(z) )</td>
<td>0.01171875</td>
<td>6</td>
<td>0.01171875</td>
</tr>
<tr>
<td>( R_3(z) )</td>
<td>-0.033203125</td>
<td>-17</td>
<td>-0.103515625</td>
</tr>
<tr>
<td>( R_4(z) )</td>
<td>0.099609375</td>
<td>51</td>
<td>0.98828125</td>
</tr>
<tr>
<td>( R_5(z) )</td>
<td>-0.296875</td>
<td>-152</td>
<td>0.10546875</td>
</tr>
<tr>
<td>( R_6(z) )</td>
<td>0.888671875</td>
<td>455</td>
<td>-</td>
</tr>
<tr>
<td>( R_7(z) )</td>
<td>0.333984375</td>
<td>171</td>
<td>-</td>
</tr>
<tr>
<td>( R_8(z) )</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>( R_9(z) )</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>( R_{10}(z) )</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>( R_{11}(z) )</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</tbody>
</table>
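
The ilet3 integer column of Table 5.9 can be cross-checked with a few lines of Python. The sketch assumes that the synthesis FIR filter is the time-reversed, truncated impulse response of the first-order allpass section $A_0(z) = (\alpha_0 + z^{-1})/(1 + \alpha_0 z^{-1})$ with $\alpha_0 = 1/3$ and $L_0 = 7$. This is an interpretation used here for illustration, since (5.24c) is not restated in this section, and the function name is hypothetical.

```python
def allpass_fir_integers(alpha, length, frac_bits):
    """Truncate the allpass impulse response to `length` tail terms plus the
    leading alpha term, reverse it, and scale to integers with 2**frac_bits."""
    taps = [alpha] + [(1.0 - alpha * alpha) * (-alpha) ** k for k in range(length)]
    scale = 1 << frac_bits
    # round() resolves ties to even, i.e. convergent rounding
    return [round(t * scale) for t in reversed(taps)]

ints = allpass_fir_integers(alpha=1.0 / 3.0, length=7, frac_bits=9)
fixed = [n / 2 ** 9 for n in ints]
print(ints)
print(fixed)
```

Under these assumptions the sketch returns the integers 1, -2, 6, -17, 51, -152, 455 and 171 listed for ilet3, together with the corresponding fixed-point values.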
Figure 5.31(a) presents the architecture used to design a one-level hybrid IIR/FIR wavelet filter bank for the ilet3 wavelet, whereas Figures 5.31(b) and 5.32 demonstrate the structure of the corresponding ReMB and its controller. As $R_0(z)$ employs eight coefficients, the controller is a 3-bit counter whose output is decoded into the single bits $A$, $B$, and $C$ (MSB to LSB), and their logic combinations are used to generate the control signals $S_0$, $S_1$, and $S_2$.

Figure 5.31: (a) The multiplier free architecture of the Hybrid IIR/FIR wavelet filter bank for ilet3, and (b) the structure of the ReMB.
Figure 5.32: The structure of the controller designed for generating the ReMB control signals.

Table 5.10 presents these control signals, where $S_{x1}$ and $S_{x0}$ represent the MSB and LSB of $S_x$, respectively. The truth tables used to design the controller logic are provided in Appendix B, Table B.1. The maximum adder depth required by the *ilet3* FIR filter coefficients is three; hence, the ReMB has an adder depth of three and employs three adders. The overall filter bank is multiplier free and incorporates twelve adders, of which six are used for the analysis filter bank and the remaining six for the synthesis filter bank.
Table 5.10: Control signals for the ReMB designed for $R_0(z)$ of $ilet3$ Hybrid IIR/FIR wavelet filter bank.

<table>
<thead>
<tr>
<th>Coefficient</th>
<th colspan="3">Counter</th>
<th>$S_0$</th>
</tr>
<tr>
<th>Z</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>$S_{01}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>-2</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>6</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>-17</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>51</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>-152</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>455</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>171</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
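
The time-multiplexed operation driven by the 3-bit counter can be mimicked in software. The following behavioural sketch is a simplified model, not the hardware of Figure 5.31: a plain integer multiplication stands in for the ReMB, the control signals $S_0$ to $S_2$ are not modelled, and the tap ordering and all names are chosen here for illustration.

```python
COEFFS = [1, -2, 6, -17, 51, -152, 455, 171]   # ilet3 integers from Table 5.9

def time_multiplexed_fir(samples, coeffs=COEFFS):
    """Compute one output per input sample using a single multiply-accumulate
    unit that is reused len(coeffs) times, one coefficient per clock cycle."""
    delay_line = [0] * len(coeffs)     # input memory
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        acc = 0                        # accumulator cleared for each output
        for counter in range(len(coeffs)):   # 3-bit counter: 0..7
            acc += coeffs[counter] * delay_line[counter]
        outputs.append(acc)
    return outputs

def direct_fir(samples, coeffs=COEFFS):
    """Reference: direct-form convolution truncated to the input length."""
    return [sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(samples))]

x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
print(time_multiplexed_fir(x) == direct_fir(x))
```

Because the coefficients are scaled by $2^9$, the accumulator output would be shifted right by nine bits to return to the fixed-point scale.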

Furthermore, Figure 5.33 presents the architecture used to design a one-level hybrid IIR/FIR wavelet filter bank for the $ilet5$ wavelet, whereas Figures 5.34(a) and 5.34(b) demonstrate the structures of the ReMBs of $R_0(z)$ (ReMB$_0$) and $R_1(z)$ (ReMB$_1$), respectively, employed in the synthesis filter bank. In addition, the control signals required for the correct operation of the ReMBs and the structures of the controllers are presented in Table 5.11 and Figure 5.35 for ReMB$_0$, and in Table 5.12 and Figure 5.36 for ReMB$_1$. As mentioned before, $R_0(z)$ has six coefficients, hence a 3-bit counter is employed, whereas $R_1(z)$ has twelve coefficients, hence a 4-bit counter is employed and the control logic is more sophisticated. The truth tables used to design the controller logic are provided in Appendix B, Tables B.2 and B.3, for $R_0(z)$ and $R_1(z)$, respectively. ReMB$_0$ and ReMB$_1$ employ three and four adders, respectively; the overall filter bank is multiplier free and incorporates 21 adders, of which six are used in the analysis filter bank and the remaining 15 in the synthesis filter bank.
Figure 5.33: The multiplier free architecture of the one-level Hybrid IIR/FIR wavelet filter bank for \( ilet_5 \).
Figure 5.34: Structure of (a) ReMB₀ designed for \( R₀(z) \) and (b) ReMB₁ designed for \( R₁(z) \) of the \( ilet5 \) Hybrid IIR/FIR wavelet filter bank.
Table 5.11: Control signals for the ReMB designed for $R_0(z)$ of $ilet_5$ Hybrid IIR/FIR wavelet filter bank.

<table>
<thead>
<tr>
<th>Coefficient</th>
<th colspan="3">Counter</th>
<th>$S_0$</th>
</tr>
<tr>
<th>Z</th>
<th>$A$</th>
<th>$B$</th>
<th>$C$</th>
<th>$S_{01}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>-1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>6</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>-53</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>506</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>-54</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Figure 5.35: The controller designed for generating the ReMB control signals.

Table 5.12: Control signals for the ReMB designed for $R_1(z)$ of $ilet_5$ Hybrid IIR/FIR wavelet filter bank.

<table>
<thead>
<tr>
<th>Coefficient</th>
<th colspan="4">Counter</th>
<th colspan="2">$S_0$</th>
<th>$S_1$</th>
</tr>
<tr>
<th>Z</th>
<th>$A$</th>
<th>$B$</th>
<th>$C$</th>
<th>$D$</th>
<th>$S_{01}$</th>
<th>$S_{00}$</th>
<th>$S_{11}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>X</td>
</tr>
<tr>
<td>-1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>X</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>-4</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
<tr>
<td>8</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>-15</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>X</td>
</tr>
<tr>
<td>29</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>-54</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
<tr>
<td>103</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>-195</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>369</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>270</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>X</td>
</tr>
</tbody>
</table>
Figure 5.36: The controller designed for generating the ReMB control signals.
The filter bank resource utilizations are presented in Table 5.13 in terms of LUTs and flip-flops, along with the maximum operating frequencies and the estimated power consumption at a 50 MHz clock frequency. Unlike the IIR/IIR wavelet filter banks, the hybrid IIR/FIR wavelet filter banks do not employ block RAMs, which is reflected in the estimated power consumption figures: for the \( ilet3 \) and \( ilet5 \) wavelets with 11-bit input, this results in 37.5% and 55.7% improvement in the estimated power consumption, respectively. The proposed hybrid systems also increase the maximum achievable operating frequency, which demonstrates that the maximum path delays are shorter. Therefore, the hybrid systems become an attractive alternative to the IIR/IIR wavelet filter banks.

Table 5.13: Resource Utilization and Power Consumption of the Multiplier Free \( ilet3 \) and \( ilet5 \) IIR/FIR Filter Bank Architectures.

<table>
<thead>
<tr>
<th></th>
<th colspan="2">\( ilet3 \)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Input word length (bits)</strong></td>
<td>8</td>
<td>11</td>
</tr>
<tr>
<td><strong>Adders</strong></td>
<td>12</td>
<td>12</td>
</tr>
<tr>
<td><strong>Multipliers</strong></td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>LUTs</strong></td>
<td>229</td>
<td>272</td>
</tr>
<tr>
<td><strong>Registers</strong></td>
<td>210</td>
<td>258</td>
</tr>
<tr>
<td><strong>Max. Frequency (MHz)</strong></td>
<td>81</td>
<td>85</td>
</tr>
<tr>
<td><strong>Power (mW)</strong></td>
<td>3.93</td>
<td>4.681</td>
</tr>
</tbody>
</table>

5.7 Hybrid IIR/FIR Wavelet Filter Banks for ECG Signal Denoising

In the biomedical signal processing literature, the DWT is used for various purposes, and denoising is one popular application that has received considerable attention in biomedical signal noise reduction. The theoretical and experimental studies proved that the hybrid IIR/FIR wavelet filter banks are capable of maintaining near-perfect reconstruction with the right selection of parameters. However, it is also important to evaluate their performance in noise removal scenarios. ECG signals are usually contaminated with noise whose spectrum overlaps with the signal spectrum, so conventional filtering techniques are insufficient to remove it. The DWT is a popular tool in the field of non-stationary signal processing that provides simultaneous time and frequency information, and has been used to detect and remove the overlapping noise from the signal. In the ECG denoising literature, a vast amount of research has employed FIR filter banks with various wavelet families, the most popular being the Daubechies (e.g. db4), Symmlets (e.g. sym4) and Coiflets (e.g. coif4) [156, 174–177]. However, IIR wavelet filter bank studies are less extensive and limited to image processing and compression applications [166,176]. This section presents the application of the proposed hybrid IIR/FIR wavelet filter banks to ECG signal denoising. To the best of the authors' knowledge, this is a first in the wavelet literature for ECG denoising, and the results demonstrate that the proposed hybrid IIR/FIR DWT filter banks achieve better denoising performance in terms of SNR improvement and MSE, with reduced arithmetic operation complexity, compared to the conventional FIR/FIR wavelets.

5.7.1 Method

There are various types of noise, such as powerline interference, baseline wander, and muscle contraction artifacts, that are assumed to be additive and independent from the ECG signal. The noisy signal is therefore generally modelled as $x_n(n) = x_c(n) + e(n)$, where \(x_n(n), x_c(n),\) and \(e(n)\) are the noisy ECG, the clean ECG and the composite noise, respectively. While powerline interference can be eliminated by a digital notch filter, the spectra of the other noise sources overlap with the spectrum of the ECG, which makes them difficult to remove with conventional filtering techniques. In such circumstances, wavelet thresholding can be employed, where the noisy signal is decomposed into several levels, thresholded and reconstructed [179]. The block diagram of the DWT based denoising is presented in Figure 5.37 and the denoising steps are summarized below.

- Decompose the noisy ECG signal into 7 levels.
- Compute threshold for each detail coefficient (i.e. \(cD_j(k)\); outputs of \(h_1(k)\) at each level) and apply the selected thresholding technique to remove EMG noise.
- Nullify the approximation coefficients of the final decomposition level (i.e. \(cA_6(k)\); \(h_0(k)\) output at level 7) to remove the baseline wander.
- Reconstruct the thresholded detail coefficients to obtain the denoised signal.
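
The four steps above can be sketched in a few lines of Python. To keep the example self-contained, the orthogonal Haar wavelet is used as a stand-in for the wavelet filter banks compared in this section, and a fixed threshold `lam` replaces the SURE rule. Both substitutions, together with all function names, are assumptions made for illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_analysis(x):
    """One analysis level: lowpass (approximation) and highpass (detail) outputs."""
    approx = [(x[2 * k] + x[2 * k + 1]) * S for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) * S for k in range(len(x) // 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """One synthesis level, the exact inverse of haar_analysis."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * S, (a - d) * S])
    return x

def soft(c, lam):
    """Soft thresholding of a single coefficient."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

def wavelet_denoise(x, levels=7, lam=0.0, remove_baseline=True):
    if len(x) % (1 << levels):
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):                      # step 1: decompose
        approx, d = haar_analysis(approx)
        details.append(d)
    details = [[soft(c, lam) for c in d] for d in details]   # step 2: threshold
    if remove_baseline:                          # step 3: nullify approximation
        approx = [0.0] * len(approx)
    for d in reversed(details):                  # step 4: reconstruct
        approx = haar_synthesis(approx, d)
    return approx

n = 128
clean = [math.sin(2.0 * math.pi * 4.0 * k / n) for k in range(n)]
offset = [c + 0.5 for c in clean]        # constant offset as a crude baseline
restored = wavelet_denoise(offset, levels=7, lam=0.0)
print(max(abs(r - c) for r, c in zip(restored, clean)))
```

With `lam=0.0` only the baseline removal is active, so the constant offset is removed while the sinusoid is left essentially unchanged.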

For this study, soft thresholding, given in (5.32a) [180], is used, where the threshold \(\lambda\) is computed using the Rigorous SURE (Stein’s Unbiased Risk Estimator) criterion given by (5.32b) [181].

Figure 5.37: The block diagram of the DWT based denoising method.
\[ \tilde{cD}_j = \begin{cases} \text{sign}(cD_j)(|cD_j| - \lambda) & |cD_j| \geq \lambda \\ 0 & |cD_j| < \lambda \end{cases} \tag{5.32a} \]

\[ \lambda = \sigma \sqrt{\min(\epsilon)} \tag{5.32b} \]

\[ \epsilon = \frac{(K - 2i) + cD_j^2 (K - 1) + \sum_{i=1}^{K} cD_j}{K} \tag{5.32c} \]

where \( cD_j \) and \( \tilde{cD}_j \) are the original and denoised detail coefficients at level \( j \), respectively, \( \sigma = \text{median}(|cD_1|)/0.6745 \) is the estimated noise standard deviation obtained from the level 1 detail coefficients, \( K \) is the number of detail coefficients at each level, and \( i = 1, 2, \ldots, K \). The thresholding method and threshold criterion are empirically determined: soft thresholding is well known for delivering smoother outputs, and the Rigorous SURE threshold selection scheme is known for successfully identifying the small details of a signal overlapped with noise. A good comparison of different threshold selection and thresholding methods can be found in [182].
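
A minimal Python sketch of the thresholding step is given below. The soft-thresholding function follows (5.32a). The threshold selection uses the widely used sorted-squared-coefficient form of the rigorous SURE rule (the form implemented by the `rigrsure` option of MATLAB's `thselect`), which is assumed here to be what (5.32b) and (5.32c) refer to and may differ from them in notation. Normalising the coefficients by $\sigma$ before selecting the threshold is likewise an assumption, and all function names are illustrative.

```python
import math
import statistics

def soft_threshold(c, lam):
    """Soft thresholding of one coefficient, as in (5.32a)."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

def sure_threshold(coeffs):
    """Rigorous SURE threshold for coefficients assumed to have unit noise variance."""
    squares = sorted(c * c for c in coeffs)
    K = len(squares)
    best_risk, best_square = float("inf"), 0.0
    running_sum = 0.0
    for i, sq in enumerate(squares, start=1):
        running_sum += sq
        # estimated risk if the i-th smallest magnitude is used as the threshold
        risk = (K - 2 * i + running_sum + (K - i) * sq) / K
        if risk < best_risk:
            best_risk, best_square = risk, sq
    return math.sqrt(best_square)

def noise_sigma(cD1):
    """Robust noise estimate from the level 1 detail coefficients."""
    return statistics.median([abs(c) for c in cD1]) / 0.6745

def denoise_level(cDj, sigma):
    """Normalise by sigma, pick the SURE threshold, then soft-threshold."""
    lam = sigma * sure_threshold([c / sigma for c in cDj])
    return [soft_threshold(c, lam) for c in cDj], lam

cD = [0.1, -0.2, 5.0, 0.05]
thresholded, lam = denoise_level(cD, sigma=1.0)
print(lam, thresholded)
```

In this toy example the large coefficient, which plays the role of a QRS-like feature, is shrunk but kept, while the small coefficients are set to zero.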

5.7.2 Generated ECG data and Synthetic Noise Sources

Four raw ECG records (‘103’, ‘105’, ‘109’, and ‘118’) are randomly taken from the MIT-BIH arrhythmia database [114] and resampled to 256 Hz. In order to obtain clean control data, preprocessing stages are applied, including notch and highpass filtering (cut-off frequency \( f_c = 0.5 \) Hz), to remove the 60 Hz powerline interference and baseline wander, respectively. Then, the EMG interference \( x_\epsilon(k) \) is modelled as white Gaussian noise, whereas the baseline wander is modelled as an additive combination of deterministic and random data with frequency content below 1 Hz, as shown in (5.33).

\[ x_{bw}(k) = \sum_{i=1}^{P} \sin \left( 2\pi k \frac{f_i}{f_s} \right) + W(k) \tag{5.33} \]

where \(0 < f_i \leq 1\) for \(i = 1, 2, \ldots, P\), \(f_s\) is the sampling frequency and \(W(k)\) is lowpass filtered \((f_c = 1\) Hz\) white Gaussian noise. Thus, the composite noise is obtained by \(e(k) = A(x_\epsilon(k) + x_{bw}(k))\), where \(A\) is the input noise scaling factor that is determined by the desired input \(\text{SNR}\) and is computed via:

\[
A = \sqrt{\frac{\sum_{k=1}^{K} \left| x_c(k) \right|^2}{\sum_{k=1}^{K} \left| e(k) \right|^2}} \, 10^{-\text{SNR}/10} \tag{5.34}
\]
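
The noise scaling can be sketched as follows. The sketch assumes that the power ratio is taken between the clean signal and the unscaled composite noise and that the factor $10^{-\text{SNR}/10}$ acts on this power ratio, i.e. inside the square root, so that the resulting input SNR equals the requested value. The sinusoidal stand-in for the clean ECG, the single baseline-wander sinusoid, and all names are illustrative.

```python
import math
import random

def power(sig):
    """Sum of squared samples."""
    return sum(v * v for v in sig)

def noise_gain(clean, raw_noise, snr_db):
    """Gain A for which clean + A * raw_noise has the requested input SNR (dB)."""
    return math.sqrt(power(clean) / power(raw_noise) * 10.0 ** (-snr_db / 10.0))

random.seed(1)
fs = 256                                   # sampling frequency used in this section
n = 4 * fs
clean = [math.sin(2.0 * math.pi * 1.2 * k / fs) for k in range(n)]   # stand-in for a clean ECG
emg = [random.gauss(0.0, 1.0) for _ in range(n)]                     # white Gaussian EMG model
bw = [math.sin(2.0 * math.pi * 0.3 * k / fs) for k in range(n)]      # one sinusoid of (5.33), W(k) omitted
raw_noise = [a + b for a, b in zip(emg, bw)]

A = noise_gain(clean, raw_noise, snr_db=-8.0)
e = [A * v for v in raw_noise]
noisy = [c + v for c, v in zip(clean, e)]
achieved_db = 10.0 * math.log10(power(clean) / power(e))
print(round(achieved_db, 6))
```

The achieved SNR is recomputed from the scaled noise as a sanity check on the chosen gain.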

The ECG signal denoising performance of the ilet3 and ilet5 hybrid IIR/FIR filter banks, as well as the \(db4\), \(db6\), \(db8\), \(\text{sym4}\), and \(coif4\) FIR/FIR wavelet filter banks, is evaluated and compared by computing the \(\text{MSE}\) and the \(\text{SNR}\) improvement. In this study, the MSE results are used to evaluate the amount of signal distortion introduced after denoising, whereas the SNR improvement results represent the ratio of improvement in the noise power with respect to the signal power.

\[
\text{SNR}_{imp} = \frac{\sum_{k=1}^{K} \left| x_{n}(k) - x_{c}(k) \right|^2}{\sum_{k=1}^{K} \left| x_{d}(k) - x_{c}(k) \right|^2} \tag{5.35}
\]

where \(x_d(k)\) is the denoised ECG signal. In Figure 5.38 a 5 second segment of the clean record ‘105’, the generated EMG noise, the baseline wander and the noisy record with an SNR of -8 dB are presented.
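
The two performance measures can be computed as in the sketch below. Expressing the ratio of (5.35) in dB through $10\log_{10}(\cdot)$ is an assumption made here, motivated by the dB units used when the results are reported. The function names and the toy data are illustrative.

```python
import math

def mse(clean, denoised):
    """Mean square error between the denoised and the clean signal."""
    return sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)

def snr_improvement_db(noisy, clean, denoised):
    """Ratio of (5.35), noise power before over noise power after denoising, in dB."""
    before = sum((n - c) ** 2 for n, c in zip(noisy, clean))
    after = sum((d - c) ** 2 for d, c in zip(denoised, clean))
    return 10.0 * math.log10(before / after)

clean = [0.0, 1.0, 0.0, -1.0]
noisy = [0.2, 1.2, -0.2, -0.8]
denoised = [0.1, 1.1, -0.1, -0.9]
print(snr_improvement_db(noisy, clean, denoised), mse(clean, denoised))
```

In the toy data the residual error amplitude is halved, which corresponds to a fourfold reduction in noise power.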
5.8 Results and Discussions

In Figure 5.38, it is shown that an input SNR of -8 dB results in an ECG signal which is buried in noise, where the QRS peaks are barely visible and the lower-amplitude ECG characteristics, such as the T and P waves, cannot be easily distinguished. Thus, a low input SNR indicates that the ECG signal cannot be used for diagnostic purposes, which necessitates noise removal. On the other hand, ECG recordings do not always suffer from strong noise; higher SNR levels are therefore also required to mimic real-life applications. Accordingly, four 60 second long records (‘103’, ‘105’, ‘109’, and ‘118’) are contaminated by adding the synthetically generated EMG and baseline wander, with SNR ranging from $-12$ to $16$ dB. For each data record and at each SNR, 100 Monte Carlo simulations are performed and the average SNR improvement and MSE are computed. The results obtained after denoising the noisy record ‘105’ are presented in Figure 5.39.
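
The Monte Carlo procedure can be outlined as in the sketch below. It is a simplified harness under several assumptions: a moving average stands in for the wavelet denoiser, only white Gaussian noise is added, a sinusoid replaces the ECG record, and the 4 dB step between SNR levels is chosen arbitrarily. All names are hypothetical.

```python
import math
import random

def power(sig):
    return sum(v * v for v in sig)

def moving_average(x, width=5):
    """Placeholder denoiser (hypothetical stand-in for the wavelet method)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def monte_carlo(clean, denoise, snr_levels_db, runs=100, seed=0):
    """Average SNR improvement (dB) and MSE over `runs` noise realisations per level."""
    rng = random.Random(seed)
    results = {}
    for snr_db in snr_levels_db:
        imp_sum = 0.0
        mse_sum = 0.0
        for _ in range(runs):
            noise = [rng.gauss(0.0, 1.0) for _ in clean]
            gain = math.sqrt(power(clean) / power(noise) * 10.0 ** (-snr_db / 10.0))
            noisy = [c + gain * v for c, v in zip(clean, noise)]
            out = denoise(noisy)
            err_in = sum((n - c) ** 2 for n, c in zip(noisy, clean))
            err_out = sum((o - c) ** 2 for o, c in zip(out, clean))
            imp_sum += 10.0 * math.log10(err_in / err_out)
            mse_sum += err_out / len(clean)
        results[snr_db] = (imp_sum / runs, mse_sum / runs)
    return results

fs = 256
clean = [math.sin(2.0 * math.pi * k / fs) for k in range(2 * fs)]
levels = list(range(-12, 17, 4))     # -12, -8, ..., 16 dB (step size is an assumption)
results = monte_carlo(clean, moving_average, levels, runs=100)
for snr_db in levels:
    print(snr_db, results[snr_db])
```

Replacing `moving_average` with a wavelet-based denoiser and the sinusoid with an ECG record gives the kind of averaged curves reported for each wavelet.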

Figure 5.39: Average (a) SNR Improvement (dB), and (b) MSE, after wavelet denoising with *ilet*3, *ilet*5, $db$4, $db$6, $db$8, $sym$4 and $coif$4.

Observing Figure 5.39(a), it can be seen that the SNR improvement achieved after wavelet thresholding with the different wavelet families follows a similar pattern. The noise reduction method works more effectively when the noise power is high, which is due to the decreasing correlation between the signal and the wavelet. The wavelet transform simply performs a correlation analysis between the signal and the wavelet function, and is expected to produce wavelet coefficients with minimal amplitude, which typically correspond to the noise. In this way, a considerable amount of noise is suppressed. The basic idea behind this method is that large-magnitude wavelet coefficients typically correspond to the sharp features of the signal, such as the QRS complex, which are preserved. However, this leads to signal distortion, especially in the relatively smoother features such as the T and P waves. Thus, measurement of the SNR improvement is not sufficient to determine the effectiveness of the noise reduction, since the time-domain characteristics of an ECG signal are diagnostically important. Hence, the MSE after denoising is calculated and presented in Figure 5.39(b). In this case, it can be observed that higher noise power results in increased MSE, as the wavelet thresholding method cannot distinguish between the signal and noise components as effectively and some of the signal characteristics are distorted. Therefore, although the SNR improvement values present promising results, the important features of the ECG signal are affected. Nevertheless, the presented MSE values are small and the distortion is negligible for this application example. The aim of this study is to compare the denoising performance as well as the computational complexity of the proposed IIR filter based wavelet filter banks with the conventional wavelet filter banks. Therefore, it is important to highlight that the SNR improvement and MSE values obtained in this study may not be optimal and can be improved with alternative or additional noise removal algorithms, such as adaptive filtering and Independent Component Analysis (ICA).

From Figure 5.39, it can be observed that the ilet5 hybrid IIR/FIR wavelet filter bank (red) provides the highest SNR improvement and the lowest MSE when compared to the others, whilst the coif4 FIR/FIR wavelet filter bank (burgundy) provides the second highest SNR improvement and the second lowest MSE. Table 5.14 presents the detailed average SNR improvement figures (in dB) and the MSE values obtained after denoising four noisy ECG records with an input SNR of -8 dB using the aforementioned wavelet filter banks.

Table 5.14: SNR improvement (dB) and MSE after wavelet denoising the four noisy ECG records with input SNR of -8 dB.

<table>
<thead>
<tr>
<th>Record</th>
<th colspan="2">SNR Improvement (dB)</th>
</tr>
<tr>
<th></th>
<th>ilet3</th>
<th>ilet5</th>
</tr>
</thead>
<tbody>
<tr>
<td>'103'</td>
<td>11.95</td>
<td><strong>12.5</strong></td>
</tr>
<tr>
<td>'105'</td>
<td>12.5</td>
<td><strong>13.17</strong></td>
</tr>
<tr>
<td>'109'</td>
<td>13</td>
<td><strong>13.78</strong></td>
</tr>
<tr>
<td>'118'</td>
<td>12.18</td>
<td><strong>12.86</strong></td>
</tr>
</tbody>
</table>

In Table 5.14 the SNR improvement and MSE values for all data records vary with the mother wavelet while the threshold method and rule are kept the same. This indicates that the performance of the well-known wavelet thresholding based noise removal method can be improved by using different wavelet families. This is due to the differing filter characteristics, such as transition bandwidth and passband ripple, which are directly related to the filtering performance. The analysis filter bank responses are presented in Appendix C. These figures show that the highpass branches of the analysis filter banks of ilet5 and coif4 exhibit wider passband regions with sharper transition bands compared to the others. Hence, ilet5 provides the highest SNR improvement and the lowest MSE values for all the data records, which are shown in bold in Table 5.14. The ilet5 hybrid filter bank is followed by coif4, under both high and relatively low input noise power, due to the better frequency selectivity achieved. However, the computational complexity of these filter banks needs attention: the ilet5 hybrid IIR/FIR wavelet filter bank achieves better frequency selectivity by employing only 20 coefficient multiplications, whereas the coif4 FIR/FIR wavelet filter bank requires 48, with the assumption that the filter banks are implemented as polyphase structures. Although an efficient method of implementing the FIR/FIR wavelet filter banks is proposed in Chapter 4, the same method is also applied to the hybrid IIR/FIR wavelet filter banks, which makes them superior to their FIR/FIR counterparts for high order wavelets such as coif4. In addition, the db4, sym4, db6 and db8 analysis and synthesis filters employ 16, 16, 24 and 32 coefficient multiplications, respectively (again assuming polyphase structures), whereas the ilet3 hybrid filter bank employs only 9 and leads to better SNR improvement and lower MSE at the output. While SNR improvement is an important factor in denoising applications, it is important to note that a lower MSE indicates smaller signal distortion after denoising, which is a significant factor for diagnostic applications. In terms of computational complexity, since the analysis filter banks of the proposed hybrid filter banks are implemented using allpass based halfband polyphase IIR filters, the arithmetic and storage complexity of the hybrid filter banks remains lower than that of the FIR wavelet filter banks, by a margin that depends on the selected mother wavelet and the FIR filter structure. Also, it is well known that, for fixed-point implementations, FIR filters are more sensitive to coefficient quantization and therefore require longer word-lengths than the allpass based halfband polyphase IIR filters, which further increases the system complexity.
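The multiplication counts quoted above can be turned into relative savings with a few lines of arithmetic. The sketch below uses only the counts stated in the text (polyphase implementation assumed); the dictionary and function names are illustrative.

```python
# Coefficient multiplications per filter bank, as quoted in the text
# (polyphase implementation assumed).
MULTS = {"ilet3": 9, "ilet5": 20, "db4": 16, "sym4": 16,
         "db6": 24, "db8": 32, "coif4": 48}

def saving_percent(proposed, reference):
    """Relative reduction in multiplications of `proposed` versus `reference`."""
    return 100.0 * (MULTS[reference] - MULTS[proposed]) / MULTS[reference]

for ref in ("db4", "sym4", "db6", "db8", "coif4"):
    print(f"ilet3 vs {ref}: {saving_percent('ilet3', ref):.1f}% fewer multiplications")
print(f"ilet5 vs coif4: {saving_percent('ilet5', 'coif4'):.1f}% fewer multiplications")
```

These percentages cover multiplications only; adders, delay elements and word-lengths also contribute to the overall hardware cost.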

5.9 Conclusions

In this chapter, the design methodology for orthonormal IIR wavelets introduced by Zhang et al. [165] is described in detail. Using this method two IIR wavelet filters, ilet3 and ilet5, are designed and their corresponding filter banks are implemented in floating and fixed point precision. In order to eliminate the non-linear phase effects and achieve near perfect reconstruction, the synthesis filters need to be the time-reversed versions of the analysis filters, which results in non-causal filters. Although this is possible for off-line processing with finite length input, extra care needs to be taken for real-time implementation with infinitely long input sequences. Thus, two methods for realising the non-causal filters are introduced. The first method incorporates a block processing technique where the input is divided into small blocks and time-reversal of the input is achieved with the help of dual-port RAMs. Then, the time reversed input blocks are filtered with allpass sections, added and time reversed again to obtain the required output. The effects of the block sizes are investigated and it is concluded that $L_0 = 8$ for ilet3, and $L_0 = 8$ and $L_1 = 16$ for ilet5, are sufficient to achieve near perfect reconstruction. These filter banks are later converted into fixed point precision and implemented in the MATLAB/Simulink environment where the quantization effects are evaluated for white Gaussian noise, ECG and EEG data. The evaluated systems are also synthesized and placed and routed on a Kintex-7 FPGA device in order to obtain the resource utilization, maximum operating frequency and estimated power consumption figures. The second method for implementing the synthesis filter banks incorporates the design and implementation of FIR approximations of the required anti-causal IIR synthesis filters. The lengths of the FIR filters are determined by evaluating the perfect reconstruction properties of the filter, and $L_0 = 8$ for ilet3, and $L_0 = 7$ and $L_1 = 11$ for ilet5, are selected. Similarly, the floating point models are converted into fixed-point precision and later synthesized and placed and routed on a Kintex-7 FPGA. The FPGA synthesis results demonstrated that the hybrid systems are less power demanding and are attractive alternatives to the IIR/IIR wavelet filter banks.
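The time-reversal idea behind the first method can be illustrated on a single finite block. The sketch below is a simplification under stated assumptions: it uses one first-order allpass section with an arbitrary coefficient `a = 0.5` rather than the ilet3 or ilet5 filters, and it processes one zero-padded block instead of a continuous stream of blocks. It shows that "reverse, filter causally, reverse again" realises the anti-causal filter $A(z^{-1})$, which undoes the phase of $A(z)$.

```python
def allpass(x, a):
    """Causal first-order allpass A(z) = (a + z^-1) / (1 + a z^-1), zero initial state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        out = a * (sample - y_prev) + x_prev   # one multiplication per sample
        y.append(out)
        x_prev, y_prev = sample, out
    return y

def anticausal_allpass(block, a):
    """Apply A(z^-1) to a finite block: reverse, filter causally, reverse again."""
    return allpass(block[::-1], a)[::-1]

a = 0.5                       # illustrative coefficient, not a thesis value
x = [0.0] * 64                # short test signal followed by trailing zeros
x[5], x[6], x[7] = 1.0, -2.0, 0.5
analysed = allpass(x, a)                      # "analysis" side, non-linear phase
reconstructed = anticausal_allpass(analysed, a)
max_err = max(abs(r - s) for r, s in zip(reconstructed, x))
print(max_err)
```

The reconstruction is only approximate because the tail of the allpass response that falls outside the block is lost. This is why the block length matters: it has to be long enough for that tail to decay to a negligible level.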

Furthermore, the proposed hybrid IIR/FIR wavelet filter banks as well as the state-of-the-art FIR/FIR wavelet filter banks are employed in an ECG denoising application based on the DWT denoising method. The results of this study demonstrated that the proposed systems provide better frequency selectivity, and hence better denoising performance, than the state-of-the-art FIR filter based systems, with reduced computational complexity. Therefore, the work presented in this chapter concludes that the IIR/IIR wavelet filter banks are computationally complex and have high memory requirements, whereas their hybrid counterparts provide a more efficient implementation.

In addition, it is demonstrated that the hybrid systems are less computationally complex and require less power than the FIR based filter banks while providing better filter specifications that are useful for applications such as denoising. It should be noted that DWT based denoising methods are not always sufficient to provide the required output SNRs for biomedical signals; thus they are used along with other techniques, such as adaptive filtering, which are hardware and power demanding. The reduction in the complexity and power requirements of the DWT filter bank achieved by the proposed hybrid systems frees resources for such additional methods and enables their employment in power limited portable health monitoring applications.
Chapter 6

Conclusions and Future Work

6.1 Conclusions

Wearable, mobile health monitoring systems have received a considerable amount of attention in recent years as a means of providing remote, supervised and independent living for patients suffering from long term medical conditions. The key to achieving this is to provide continuous and real time monitoring by using on-body sensors that have on-site processing abilities and are capable of long-term transmission. However, these systems face the challenge of operating in a resource and power constrained environment, which requires low-complexity design solutions. With this motivation, this research proposed the design and implementation of complexity reduced digital signal processor solutions for biomedical applications that reduce the power consumption and can be employed in power limited portable health monitoring devices.

The work presented in this thesis first proposed the design of a low-complexity multiplier free decimation chain composed of a fourth order Slink filter, two fifth order allpass based polyphase halfband IIR filters and a first order allpass based Slink compensator. The decimation filters used in biomedical applications typically employ high order FIR filters, which increases the complexity of the decimators and hence of the digital signal processor. The proposed decimator reduces this complexity and offers an attractive alternative to existing solutions in the literature by replacing the very high order FIR based decimators with IIR decimators which employ only two distinct coefficients that are implemented with simple shift and add operations. The proposed design is tested and evaluated by feeding real ECG data through a $\Sigma\Delta$ modulator, whose output is then filtered and decimated by the proposed decimation chain. The phase non-linearity of the IIR filters is an important factor that requires attention. Therefore, the phase non-linearity effects of the proposed decimator are evaluated by measuring the time- and frequency-domain distortion introduced to the filtered data. It is demonstrated that the phase non-linearity of the IIR polyphase filters does not cause significant distortion of the morphological and spectral characteristics of the input ECG signal. This is due to the very narrow and low frequency range corresponding to the physiologically significant frequencies of the ECG signals. In other words, these frequency bands of interest are close to DC, where the group delay variation is already minimal (minimum phase filter). The high spectral coherence, high morphological correlation and low error between the input and the output signals indicate that the IIR polyphase filter introduces minimal distortion to the signal, which would not affect critical diagnosis; therefore, phase compensation is not a must for such applications.
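A single allpass based polyphase halfband stage of the kind summarised above can be sketched as follows. This is an illustration under stated assumptions, not the thesis design: the two coefficients `A0` and `A1` are hypothetical values chosen only because they are sums of powers of two (and hence shift-and-add friendly), and the Slink filter and compensator stages are omitted. With one first-order allpass section per branch the overall transfer function $H(z) = \tfrac{1}{2}[A_0(z^2) + z^{-1}A_1(z^2)]$ is fifth order and uses two distinct coefficients.

```python
def allpass(x, a):
    """First-order allpass A(z) = (a + z^-1) / (1 + a z^-1), one multiply per sample."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        out = a * (sample - y_prev) + x_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y

def halfband_decimate(x, a0, a1):
    """Decimate by 2 with H(z) = 0.5 * [A0(z^2) + z^-1 A1(z^2)] in polyphase form."""
    even = x[0::2]                            # x[2m]
    odd = ([0.0] + x[1::2])[:len(even)]       # x[2m-1]: odd samples, delayed by one
    upper = allpass(even, a0)                 # both branches run at the low rate
    lower = allpass(odd, a1)
    return [0.5 * (u + l) for u, l in zip(upper, lower)]

# Hypothetical shift-and-add friendly coefficients (not the thesis values).
A0 = 2 ** -3                 # 0.125
A1 = 2 ** -1 + 2 ** -4       # 0.5625

dc = halfband_decimate([1.0] * 400, A0, A1)
nyquist = halfband_decimate([(-1.0) ** n for n in range(400)], A0, A1)
print(dc[-1], nyquist[-1])   # DC passes, the highest input frequency is rejected
```

Because both allpass sections operate after the downsampling, each output sample costs two multiplications, and each of these reduces to a handful of shifts and additions in fixed-point hardware.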

In a typical on-body sensor the ADC is followed by the digital signal processing operations. The DWT is one of the most popular tools employed in the biomedical signal processing literature and it is commonly realised as tree-structured filter banks composed of highpass and lowpass filters derived from wavelet functions, also known as mother wavelets. These filters are conventionally implemented as FIR filters which employ intensive multiply-add operations. The complexity of the DWT is reported to be high, which limits its application in area and power limited on-body sensor nodes. Therefore, in Chapter 4 an efficient implementation method for the DWT filters is proposed in which the resource and power hungry multipliers are replaced by specifically designed ReMBs. In this method, the constant coefficient multiplications of the scaling and wavelet filters are replaced with shift-add networks, with the addition of multiplexers, in a time-multiplexed FIR filter structure. It is shown that the addition of the multiplexers introduces reconfigurability to the well known constant multiplication blocks. By taking advantage of recent FPGA technologies having 6-input LUTs, 3:1/4:1 muxes are employed in the design of ReMBs at no additional hardware cost, which updates the concepts proposed in the open literature. The proposed novel solution for the DWT filter bank implementation is employed in the design of the db4 mother wavelet based filter bank in order to evaluate the resource and power efficiency of the proposed method. For this purpose an ReMB is specifically designed for the db4 filters and employed in time-multiplexed FIR filters within a conventional DWT filter bank. The proposed design is then implemented on a Kintex-7 FPGA platform and compared to the reference designs employing parallel multipliers and to the other multiplier block designs presented in the open literature. Although there is a wide literature on efficient FPGA and VLSI implementations of the wavelet transform, to the best of the author's knowledge, the application of reconfigurable multiplier blocks with a structure optimized for FPGA platforms has not been investigated in the field of biomedical signal processing. The replacement of multipliers in the DWT with shift-add networks has been the subject of research in image processing and image compression applications; however, reconfigurable constant multiplications have not been studied. As the results demonstrated, the proposed ReMB substantially reduces the resource utilization when compared to the parallel multipliers. The savings ratio increases with increasing input word-length, as the number of adders in the parallel multiplier increases while the number of adders in the ReMB remains the same. Furthermore, the 1-level analysis filter bank cost assessment results also demonstrated that the proposed system substantially improves the resource utilization and power consumption compared to the open literature and the conventional reference design. The FPGA implementation results indicate that the proposed structure of the multiplier block is low-cost and power efficient compared to other FPGA implementations. Thus, such structures are suitable for DWT filter banks and can be used for ASIC implementations and employed in low-cost embedded platforms for ambulatory physiological signal monitoring and analysis.
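The principle of a reconfigurable multiplier block, a shift-add network whose multiplexers select which constant is applied, can be shown with a toy example. The constants 5, 7 and 9 below are hypothetical and chosen only for readability; they are not the quantized db4 coefficients, and the structure is far simpler than the ReMBs designed in Chapter 4.

```python
def remb(x, sel):
    """Toy reconfigurable multiplier block: one adder/subtractor fed by muxes.

    sel = 0 -> 5*x, sel = 1 -> 7*x, sel = 2 -> 9*x (hypothetical constants).
    """
    shift = (2, 3, 3)[sel]                   # mux: route x << 2 or x << 3 to the adder
    subtract = (False, True, False)[sel]     # control bit: add or subtract x
    shifted = x << shift                     # a shift is free in hardware (wiring)
    return shifted - x if subtract else shifted + x

print([remb(3, sel) for sel in range(3)])    # three constants from a single adder
```

In a time-multiplexed FIR filter the select input would be driven by a small controller that steps through the coefficients, so one adder network serves every tap instead of one multiplier per tap.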

Although an efficient implementation of the FIR filter based DWT filter banks is introduced in Chapter 4, Chapter 3 showed that IIR filters are attractive alternatives to FIR filters in terms of complexity reduction. Therefore, in Chapter 5 the design and implementation of IIR filter based DWT filter banks are proposed for further complexity reduction. For this purpose two IIR wavelet filters, ilet3 and ilet5, are designed and implemented in floating and fixed point precision. The IIR wavelet filters employed in the analysis filter bank suffer from non-linear phase, and its effects must be eliminated in order to reconstruct the decomposed input perfectly. In order to eliminate the non-linear phase effects and achieve near perfect reconstruction, the synthesis filters need to be the time-reversed versions of the analysis filters, which results in non-causal filters. Thus, two methods for realising the non-causal filters are introduced. The first method incorporates a block processing technique where the input is divided into small blocks and time-reversal of the input is achieved with the help of dual-port RAMs. Then, the time reversed input blocks are filtered with allpass sections, added and time reversed again to obtain the required output. These filter banks are later converted into fixed point precision and implemented in the MATLAB/Simulink environment where the quantization effects are evaluated for white Gaussian noise, ECG and EEG data. The evaluated systems are also synthesized and placed and routed on a Kintex-7 FPGA device in order to evaluate the resource utilization and power consumption of the IIR/IIR filter banks. However, the implementation results demonstrated that employing the block processing method increases the hardware complexity as well as the power consumption when compared to the FIR filter based DWT filter banks, which does not serve the purpose of this research. Therefore, a second method for implementing the synthesis filter banks is proposed, which incorporates the design and implementation of FIR approximations of the required anti-causal IIR synthesis filters. The lengths of the FIR filters are determined by evaluating the perfect reconstruction properties of the filter, and the floating point and fixed-point models are implemented. The fixed point models are then synthesized and placed and routed on a Kintex-7 FPGA. The filter banks composed of IIR analysis and FIR synthesis filters are referred to as hybrid IIR/FIR wavelet filter banks. The FPGA synthesis results demonstrated that the hybrid systems are less power demanding and are attractive alternatives to the IIR/IIR wavelet filter banks. The proposed hybrid IIR/FIR wavelet filter banks and the state-of-the-art FIR/FIR wavelet filter banks are employed in an ECG denoising application based on the DWT denoising method. The results of this study demonstrated that the proposed systems provide better frequency selectivity, and hence better denoising performance, than the state-of-the-art FIR filter based systems, with reduced computational complexity. Therefore, the work presented in Chapter 5 concluded that the IIR/IIR wavelet filter banks are computationally complex and have high memory requirements, whereas their hybrid counterparts provide a more efficient implementation. In addition, it is demonstrated that the hybrid systems are less computationally complex and require less power than the FIR based filter banks while providing better filter specifications that are useful for applications such as denoising. The reduction in the complexity and power requirements of the DWT filter bank achieved by the proposed hybrid systems frees resources for additional methods and enables their employment in power limited portable health monitoring applications.

6.2 Future Work

The work presented in this research demonstrated that the hardware complexity and power consumption of a biomedical signal processor can be substantially reduced when compared to the state-of-the-art design solutions for the decimation filters and the DWT filter banks. The hardware utilization and power consumption of the proposed designs are evaluated on a Kintex-7 FPGA; however, the designs are also suitable for ASIC implementations to be employed in low-cost embedded platforms for ambulatory physiological signal monitoring and analysis. Therefore, the design of a custom ASIC using the proposed systems is the next step towards creating low-cost and low-complexity DSPs for wearable body area networks.

The work presented in this thesis can be further extended by incorporating various implementation techniques for reducing the hardware complexity of DWT filter banks that have to be applied over several decomposition and reconstruction levels. As the analysis and synthesis filters are the same for each level of the DWT and the required system operating frequency is rather low, time-multiplexed architectures can be employed in which only one set of analysis and synthesis filters is required, reducing the hardware resources. However, time-multiplexed designs require additional memory, and additional memory read and write operations, in order to access the intermediate detail and approximation coefficients of the filter bank; this will increase the power consumption of the overall system. Therefore, alternative methods need to be investigated too.
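The time-multiplexed organisation described above can be sketched in software as one analysis stage that is reused for every decomposition level, with a buffer standing in for the memory that holds the intermediate approximation coefficients. The Haar filter pair is used here purely for brevity and is an assumption of this sketch; it is not one of the filter banks studied in the thesis.

```python
import math

def analysis_stage(x):
    """One shared analysis filter pair followed by downsampling by 2 (Haar, for brevity)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def time_multiplexed_dwt(x, levels):
    """Reuse the single analysis stage for every level.

    `buffer` models the memory that is read and rewritten at each level,
    which is the extra storage and access cost noted in the text.
    """
    buffer, details = list(x), []
    for _ in range(levels):
        buffer, d = analysis_stage(buffer)
        details.append(d)
    return buffer, details

approx, details = time_multiplexed_dwt([1.0] * 8, 3)
print(approx, [len(d) for d in details])
```

Each level processes half as many samples as the previous one, so a single filter pair clocked at the input rate has enough idle cycles to serve all levels; the price is the buffer traffic.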

Chapter 3 covers the design of multi-stage IIR filter based decimation filters to be employed in biomedical applications. Due to the importance of the time domain characteristics of biomedical data, a non-linear phase analysis is carried out in which ECG data records are used for test purposes. The presented results demonstrated that the non-linear phase of the allpass based half-band polyphase IIR filters results in only a negligible amount of distortion in the ECG signal. However, due to time limitations the research only considered ECG signals, which are commonly used in ambulatory monitoring applications. In the future, the study can be extended to consider EEG signals, which have smaller amplitudes and narrower bandwidth, and EMG signals, which have larger amplitude and wider bandwidth compared to the ECG signals.

Furthermore, in this thesis, a novel class of IIR/FIR hybrid wavelet filter banks is proposed where the FIR filters are designed using the impulse response truncation method. The constructed synthesis filters can approximately eliminate the non-linear phase effects and the amplitude distortion caused by the analysis filter bank. However, the selection of the FIR filter lengths is arbitrary and there is a trade-off between low complexity and low delay on the one hand and the minimization of the amplitude, phase and aliasing distortion on the other. The impulse response truncation method is not optimal in the sense of the $H_\infty$-norm. In other words, the selected filter lengths $L_{0,1}$ do not provide the minimal magnitude error. Therefore, this work can be further extended by investigating different approximation methods such as Linear Matrix Inequality (LMI) and Balanced Model Truncation (BMT), which can potentially minimise the peak error with shorter filter lengths [183, 184].
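The impulse response truncation method referred to above can be illustrated on a single first-order allpass section. This is a sketch under stated assumptions: the coefficient `a = 0.5` and the tap count are arbitrary, and the error measure shown is the energy of the discarded tail, not the peak (worst-case) error that the alternative methods would target.

```python
def allpass_impulse_response(a, length):
    """First `length` samples of h[n] for A(z) = (a + z^-1) / (1 + a z^-1)."""
    h, x_prev, y_prev = [], 0.0, 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = a * (x - y_prev) + x_prev
        h.append(y)
        x_prev, y_prev = x, y
    return h

def truncated_anticausal_fir(a, taps):
    """FIR approximation of A(z^-1): truncated response, time reversed.

    The result is causal and carries a delay of taps - 1 samples.
    """
    return allpass_impulse_response(a, taps)[::-1]

def truncation_error_energy(a, taps):
    """Energy of the discarded tail; an allpass response has unit total energy."""
    return 1.0 - sum(c * c for c in allpass_impulse_response(a, taps))

print(truncated_anticausal_fir(0.5, 3))
print(truncation_error_energy(0.5, 8))
```

For this first-order section the tail energy is $(1 - a^2)\,a^{2L-2}$ for $L$ taps, so each extra tap buys a fixed reduction in error at the cost of one more coefficient and one more sample of delay, which is the trade-off described in the text.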

A typical biomedical signal processor can be used for various applications, such as signal detection, artifact removal and/or classification. The low-complexity and low-power solutions proposed in this work can be easily extended to many other fields where the DWT is employed in conjunction with other detection, noise reduction and/or classification algorithms. Further investigation is required as these methods are usually computationally complex and power demanding. For instance, the adaptive filtering techniques used to eliminate ocular artifacts from the EEG, as in [185], involve iterative processing and updating of variables, and the independent component analysis used for artifact removal and classification of EEG signals, as in [186], involves matrix computations.
References




Appendix A

The truth tables used to design the controllers of the ReMBs designed for the lowpass ($h_0(k)$) and highpass ($h_1(k)$) db4 analysis filters are presented in Tables A.1 and A.2.

Table A.1: Truth tables used to design the $db4$ lowpass filter $h_0(k)$ ReMB controller.

<table>
<thead>
<tr>
<th>$AB \backslash C$</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>$\bar{X}$</td>
<td>1</td>
</tr>
<tr>
<td>11</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>$AB \backslash C$</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>11</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

$S_{01} = \bar{A}B + A\bar{C}$

$S_{00} = \bar{A}\bar{C} + BC$

$S_{11} = ABC$

<table>
<thead>
<tr>
<th>$AB \backslash C$</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>$\bar{X}$</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>11</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>$AB \backslash C$</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>11</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

$S_{10} = \bar{A}\bar{C} + B$

$S_{21} = \bar{A}B + B\bar{C} + A\bar{B}C$

$S_{20} = \bar{A}\bar{B} + \bar{A}\bar{C} + ABC$

<table>
<thead>
<tr>
<th>$AB \backslash C$</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>11</td>
<td>$\bar{X}$</td>
<td>1</td>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>$AB \backslash C$</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>11</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

$S_{31} = AB$

$S_{30} = B\bar{C}$

$S_{4-h_0} = \bar{A}\bar{B} + \bar{C}$
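A minimised select equation can be checked against its Karnaugh map by brute force. The sketch below evaluates $S_{01} = \bar{A}B + A\bar{C}$ from Table A.1 for every input combination and lists the result with $AB$ in Gray-code order. It assumes that the first map in the table belongs to $S_{01}$ and that the cell printed with an X at $AB = 01$, $C = 0$ is a don't-care that the minimisation has taken as 1; both are readings of the extracted table rather than statements from the thesis.

```python
def s01(a, b, c):
    """S01 = A'B + AC' from Table A.1 (db4 lowpass ReMB controller)."""
    return int((not a and b) or (a and not c))

def karnaugh_rows(func):
    """Return {AB: (f at C=0, f at C=1)} with AB in Gray-code order."""
    return {f"{a}{b}": (func(a, b, 0), func(a, b, 1))
            for a, b in ((0, 0), (0, 1), (1, 1), (1, 0))}

for ab, row in karnaugh_rows(s01).items():
    print(ab, row)
```

The same helper can be pointed at any of the other three-variable select equations by swapping in the corresponding function.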
Table A.2: Truth tables used to design the db4 highpass filter $h_1(k)$ ReMB controller.

<table>
<thead>
<tr>
<th></th>
<th>$C$</th>
<th>$C$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>$AB$</td>
<td>00</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>01</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>1</td>
<td>X</td>
</tr>
</tbody>
</table>

$S_{01} = AB + AC$

$S_{00} = AC + BC$

$S_{11} = \overline{AB}C$

<table>
<thead>
<tr>
<th></th>
<th>$C$</th>
<th>$C$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>$AB$</td>
<td>00</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>01</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>1</td>
<td>X</td>
</tr>
</tbody>
</table>

$S_{10} = B + AC$

$S_{21} = \overline{AB}C + \overline{A}B\overline{C}$

$S_{20} = AB + AC + \overline{A}B\overline{C}$

<table>
<thead>
<tr>
<th></th>
<th>$C$</th>
<th>$C$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>$AB$</td>
<td>00</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>01</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

$S_{31} = \overline{A}B$

$S_{30} = BC$

$S_{4-h_1} = \overline{A} + \overline{B} + C$

Appendix B

The state diagram and the logic design of the 3-bit up/down counter employed in the design of the ilet3 IIR synthesis filter bank to generate the required addresses for the dual-port RAM are shown in Figures B.1(a) and B.1(b), respectively, where A, B, and C are the output bits from MSB to LSB. The ‘DIR’ input determines the direction of the count: if DIR = 1 the counter counts up, and if DIR = 0 it counts down. The state diagram and the logic design of the 4-bit up/down counter employed in the design of the ilet5 IIR synthesis filter bank to generate the required addresses for the dual-port RAM are shown in Figures B.2(a) and B.2(b), respectively. The truth tables used to design the controllers of the ReMBs for $R_0(z)$ of the ilet3 FIR synthesis filter and for $R_0(z)$ and $R_1(z)$ of the ilet5 FIR synthesis filters are presented in Tables B.1, B.2 and B.3, respectively.
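The role of the up/down counter in time reversal can be modelled in a few lines: a block is written at ascending addresses and read back at descending ones. This is a behavioural sketch only. The wrap-around at the ends of the count range and the way the direction is switched between writing and reading are assumptions of the sketch, since the state diagrams themselves are given in the figures rather than in the text.

```python
def next_state(state, direction, bits=3):
    """Up/down counter: DIR = 1 counts up, DIR = 0 counts down (wrap-around assumed)."""
    step = 1 if direction == 1 else -1
    return (state + step) % (1 << bits)

def reverse_block(block, bits=3):
    """Write a block at ascending addresses, then read it at descending ones."""
    ram = [None] * (1 << bits)           # stands in for the dual-port RAM
    addr = 0
    for sample in block:                 # DIR = 1 while writing
        ram[addr] = sample
        addr = next_state(addr, 1, bits)
    addr = next_state(addr, 0, bits)     # step back to the last written address
    out = []
    for _ in block:                      # DIR = 0 while reading
        out.append(ram[addr])
        addr = next_state(addr, 0, bits)
    return out

print(reverse_block([10, 11, 12, 13, 14, 15, 16, 17]))
```

The block length must not exceed the address range, which is 8 for the 3-bit counter and 16 for the 4-bit counter.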

Figure B.1: (a) State diagram and (b) logic design of the 3-bit up/down counter employed in the design of the *ilet3* IIR synthesis filter bank to generate the required addresses for the dual-port RAM.

Figure B.2: (a) State diagram and (b) logic design of the 4-bit up/down counter employed in the design of the *ilet5* IIR synthesis filter bank to generate the required addresses for the dual-port RAM.

Table B.1: Truth tables used to design the *ilet3* $R_0(z)$ ReMB controller.

<table>
<thead>
<tr>
<th>$A$</th>
<th>$B$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

$S_{01} = AB + AC$  
$S_{00} = \bar{A}B + \bar{A}C + AB\bar{C}$  
$S_{11} = \bar{A}\bar{B} + \bar{A}\bar{C}$

<table>
<thead>
<tr>
<th>$A$</th>
<th>$B$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

$S_{10} = \bar{C}(\bar{A} \oplus \bar{B})$  
$S_{21} = \bar{A}\bar{C} + \bar{A}\bar{B}$  
$S_{20} = \bar{A}\bar{B} + \bar{A}\bar{C} + \bar{B}C$

Table B.2: Truth tables used to design the *ilet5* $R_0(z)$ ReMB controller.

<table>
<thead>
<tr>
<th>$A$</th>
<th>$B$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

$S_{01} = \bar{C}$  
$S_{00} = \bar{A}\bar{B}$  
$S_1 = AC + BC$

<table>
<thead>
<tr>
<th>$A$</th>
<th>$B$</th>
<th>$C$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

$S_{21} = A\bar{C}$  
$S_{20} = A \oplus C$

Table B.3: Truth tables used to design the *ilet5* $R_1(z)$ ReMB controller.

<table>
<thead>
<tr>
<th>$AB$</th>
<th>$CD$</th>
<th></th>
<th></th>
<th></th>
<th>$CD$</th>
<th></th>
<th></th>
<th></th>
<th>$CD$</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>01</td>
<td>11</td>
<td>10</td>
<td></td>
<td>00</td>
<td>01</td>
<td>11</td>
<td>10</td>
<td></td>
<td>00</td>
<td>01</td>
<td>11</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>00</td>
<td>1</td>
<td>1</td>
<td>X</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td></td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td></td>
<td>1</td>
<td>X</td>
<td>X</td>
</tr>
<tr>
<td>11</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td></td>
<td>1</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

$S_{01} = \overline{A}C + \overline{A}\overline{B} + BC\overline{D}$

$S_{00} = A\overline{D} + BC$

$S_{11} = \overline{A} + \overline{C}\overline{D}$

$S_{10} = \overline{A}C + D$

$S_{21} = \overline{A} + D$

$S_{20} = \overline{A}\overline{B} + \overline{C}\overline{D}$

$S_{31} = \overline{C}$

$S_{30} = \overline{A}C$

$S_{41} = \overline{A}B\overline{C} + \overline{A}D + CD$

$S_{40} = \overline{A}\overline{B}CD$

$S_{5} = \overline{A}D + \overline{C}D$

Appendix C

This appendix presents the analysis filter bank responses of the 7-level DWT filter banks that are used in Section 5.7 for ECG signal denoising. D1 to D7 represent the filter responses of the highpass branch at levels 1 to 7 and A7 represents the filter response of the lowpass branch at level 7.

Figure C.1: Seven level analysis filter bank responses of (a) ilet3, (b) ilet5, (c) db4 and (d) db6.
Figure C.1 (continued): Seven level analysis filter bank responses of (e) db8, (f) sym4, and (g) coif4.