Performance Analysis of Non Local Means Algorithm using Hardware Accelerators

Antony, Daniel Sanju

dc.contributor.advisor	Rathna, G N
dc.contributor.author	Antony, Daniel Sanju
dc.date.accessioned	2017-12-16T08:38:53Z
dc.date.accessioned	2018-07-31T04:57:02Z
dc.date.available	2017-12-16T08:38:53Z
dc.date.available	2018-07-31T04:57:02Z
dc.date.issued	2017-12-16
dc.date.submitted	2016
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/2932
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/3794/G27791-Abs.pdf	en_US
dc.description.abstract	Image de-noising forms an integral part of image processing. It is used as a standalone algorithm for improving the quality of images obtained from a camera, and as an initial stage for image processing applications such as face recognition and super resolution. Non Local Means (NL-Means) and the Bilateral Filter are two computationally complex de-noising algorithms that provide good de-noising results. Because of their computational complexity, real-time applications of these filters are limited. In this thesis, we propose the use of hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) to speed up filter execution and to implement the filters efficiently. The GPU-based implementation of these filters is carried out using Open Computing Language (OpenCL). The basic objective of this research is to perform high-speed de-noising without compromising on quality. Here we implement a basic NL-Means filter, a Fast NL-Means filter, and a Bilateral filter using Gauss Polynomial decomposition on the GPU. We also propose a modification to the existing NL-Means algorithm and the Gauss Polynomial Bilateral filter: instead of the Gaussian spatial kernel used in the standard algorithm, a box spatial kernel is introduced to improve the execution speed of the algorithm. This research work is a step towards making real-time implementation of these algorithms possible. The results show that the NL-Means implementation on the GPU using OpenCL is about 25x faster than a regular CPU-based implementation for larger images (1024x1024). For Fast NL-Means, the GPU-based implementation is about 90x faster than the CPU implementation. Even with the improved execution time, embedded system applications of NL-Means are limited by the power and thermal restrictions of the GPU device. In order to create a faster, low-power implementation, we have implemented the algorithm on an FPGA. 
FPGAs are reconfigurable devices and enable us to create a custom architecture for the parallel execution of the algorithm. It was found that, for smaller images (256x256), the FPGA implementation is about 200x faster than the CPU implementation and about 25x faster than the GPU implementation. Moreover, the power requirement of the FPGA design of the algorithm (0.53W) is much lower than that of the CPU (30W) and the GPU (200W).	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G27791	en_US
dc.subject	Algorithm using Open Computing Language	en_US
dc.subject	Image Denoising	en_US
dc.subject	Non Local Means Algorithm	en_US
dc.subject	OpenCL	en_US
dc.subject	Field-Programmable Gate Array (FPGA)	en_US
dc.subject	Open Computing Language	en_US
dc.subject	GPU	en_US
dc.subject	Graphics Processing Unit	en_US
dc.subject	Additive White Gaussian Noise (AWGN)	en_US
dc.subject	NL-Means Algorithm	en_US
dc.subject.classification	Electrical Engineering	en_US
dc.title	Performance Analysis of Non Local Means Algorithm using Hardware Accelerators	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US

Files in this item

Name: G27791.pdf
Size: 23.34Mb
Format: PDF

This item appears in the following Collection(s)

Electrical Engineering (EE) [448]
