Knowledge-driven training of deep models for better reconstruction and recognition

Pandey, Ram Krishna

dc.contributor.advisor	Ramakrishnan, A G
dc.contributor.author	Pandey, Ram Krishna
dc.date.accessioned	2021-07-08T10:22:15Z
dc.date.available	2021-07-08T10:22:15Z
dc.date.submitted	2020
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5191
dc.description.abstract	This thesis aims to solve many interesting and challenging problems efficiently by incorporating appropriate image processing techniques into a deep learning framework. We have proposed, implemented and tested efficient and effective solutions for tasks such as image reconstruction and recognition. We have shown that the performance of any deep architecture can be improved at three different levels: (i) the input, (ii) the architecture and (iii) the objective or loss function. While developing algorithms, our focus has been on designing architectures that can optimally exploit the gains from our exploration at all these levels. In the first part, we propose different techniques to enhance the quality of low-resolution document images (particularly binary ones) for better human readability and OCR performance. We start with a comprehensive study of low-resolution images, showing that OCR performance can be improved by increasing the resolution of the document images. Next, we improve the quality of low-resolution, down-sampled document images and, finally, that of real low-resolution document images. We achieve significant enhancement in the quality of the reconstructed images: humans find the reconstructed high-resolution images easy to read, and OCR recognition accuracy is also significantly improved. (Exploration of the input and architecture.) In the second part, we propose different techniques to improve the quality of low-resolution natural images. Here we fuse multiple interpolations in a deep network to obtain better reconstruction. This idea of fusing multiple interpolations can be applied to various computer vision and image processing tasks, and it suggests that traditional algorithms can be combined in a deep framework to obtain better reconstruction.
(Exploration of the input and architecture.) In the third part, we propose mean square Canny error (MSCE) as a new loss function that improves the performance of any existing deep architecture (super-resolution or denoising) that previously used mean square error (MSE) as its loss function. Many works in the literature use MSE in various super-resolution and denoising tasks. Our main goal in proposing this loss function is to improve all such existing super-resolution and denoising algorithms that use MSE as the loss function, without incurring any additional cost during inference. (Exploration of the objective function.) The fourth part of the thesis addresses practical applications of deep learning to some computer vision tasks. This part suggests that increasing the width and depth of a deep network is not always the best way to obtain an optimal (lightweight or less complex) model. We show that feeding the gradient and/or the Laplacian of the input image can improve the performance of facial emotion classifiers by a good margin, without incurring additional overhead during inference. This allows us to find a lightweight and computationally efficient model without compromising classification accuracy. For the task of real-time artistic style transfer, we propose techniques that make it computationally more efficient without much decrease in the perceptual quality of the reconstructed artistic images. We propose depth-wise separable convolution (DepSep) in place of standard convolution, and nearest neighbor (NN) interpolation in place of transposed convolution. We also explore concatenating nearest neighbor and bilinear (Bil) interpolations in place of transposed convolution. The stylized images from the modified architectures are perceptually similar in quality to those from the original architecture.
The reduction in the computational complexity of our architectures is confirmed by reductions in testing time of 26.1%, 39.1%, and 57.1% for the DepSep, DepSep-NN-Bil, and DepSep-NN variants, respectively. (Explorations of the input, architecture and objective.)	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;G29698
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	deep learning	en_US
dc.subject	image reconstruction	en_US
dc.subject	image recognition	en_US
dc.subject	OCR	en_US
dc.subject	mean square Canny error	en_US
dc.subject	DepSep	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering	en_US
dc.title	Knowledge-driven training of deep models for better reconstruction and recognition	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
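
The abstract states that multiple interpolations are fused inside a deep network, and that the style-transfer variants replace transposed convolution with nearest neighbor interpolation or with a concatenation of nearest neighbor and bilinear interpolations. The record does not say which interpolations are fused or where in the network this happens, so the following is only a minimal sketch under that assumption: a low-resolution grayscale image is upsampled by an integer factor with nearest neighbor and bilinear interpolation, and the results are stacked as channels that a convolutional network could take as input. The function names and the align-corners sampling grid are illustrative choices, not the thesis implementation.

```python
from typing import List

Image = List[List[float]]  # grayscale image as a list of rows


def upsample_nearest(img: Image, scale: int) -> Image:
    """Nearest neighbor upsampling by an integer factor: each pixel becomes a scale x scale block."""
    h, w = len(img), len(img[0])
    return [[img[r // scale][c // scale] for c in range(w * scale)]
            for r in range(h * scale)]


def upsample_bilinear(img: Image, scale: int) -> Image:
    """Bilinear upsampling by an integer factor on an align-corners grid
    (the corner pixels of the output coincide with those of the input)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for r in range(out_h):
        # Map the output row back onto the input grid.
        y = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            x = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate horizontally on the two neighboring rows, then vertically.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def fuse_interpolations(img: Image, scale: int) -> List[Image]:
    """Stack interpolated versions as channels (C x H x W) to form a multi-channel network input."""
    return [upsample_nearest(img, scale), upsample_bilinear(img, scale)]


if __name__ == "__main__":
    low_res = [[0.0, 2.0], [4.0, 6.0]]
    for channel in fuse_interpolations(low_res, 2):
        print([[round(v, 2) for v in row] for row in channel])
```

The interpolations themselves have no trainable parameters; in a network, the stacked channels would be followed by a learned convolution that decides how to weight each one.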
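
The record describes mean square Canny error (MSCE) only as a loss that replaces MSE during training and adds no inference cost. Assuming it combines the pixel-wise MSE with a weighted MSE between edge maps of the prediction and the target, that is L = MSE(pred, target) + lam * MSE(E(pred), E(target)) for some edge operator E, it could be sketched as below. Two assumptions are loud here: the weight lam (hypothetical, default 0.1) and the edge operator. A full Canny detector is too long for a short sketch, so a Sobel gradient magnitude is swapped in as a stand-in edge_fn; it is not Canny. Because the edge term appears only in the loss, a network trained this way is run at inference exactly as an MSE-trained one would be.

```python
from typing import Callable, List

Image = List[List[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def sobel_magnitude(img: Image) -> Image:
    """Stand-in edge map (NOT a Canny detector): Sobel gradient magnitude,
    left at zero on the one-pixel border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


def msce_like_loss(pred: Image, target: Image,
                   edge_fn: Callable[[Image], Image] = sobel_magnitude,
                   lam: float = 0.1) -> float:
    """Assumed form: pixel MSE plus lam times the MSE between edge maps.
    Both lam and the default edge_fn are hypothetical choices."""
    return mse(pred, target) + lam * mse(edge_fn(pred), edge_fn(target))


if __name__ == "__main__":
    target = [[0.0, 0.0, 1.0] for _ in range(3)]  # vertical step edge
    flat = [[0.0, 0.0, 0.0] for _ in range(3)]    # prediction that loses the edge
    print(mse(flat, target), msce_like_loss(flat, target))
```

With these inputs the edge term raises the loss above the plain MSE, because the prediction misses an edge that is present in the target; a prediction that matched the target exactly would score zero under both.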
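
For the facial emotion classifiers, the abstract says the gradient and/or the Laplacian of the input image are fed to the network. One plausible reading, sketched here under stated assumptions, is to append these maps to the image as extra input channels. The central-difference gradient magnitude, the 4-neighbor Laplacian kernel, the replicate border handling and the helper names are all illustrative choices; the record does not say which operators the thesis uses.

```python
import math
from typing import List

Image = List[List[float]]


def _pixel(img: Image, r: int, c: int) -> float:
    """Read a pixel with replicate padding (out-of-range indices clamp to the border)."""
    h, w = len(img), len(img[0])
    return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]


def gradient_magnitude(img: Image) -> Image:
    """Central-difference gradient magnitude."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            gx = (_pixel(img, r, c + 1) - _pixel(img, r, c - 1)) / 2.0
            gy = (_pixel(img, r + 1, c) - _pixel(img, r - 1, c)) / 2.0
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def laplacian(img: Image) -> Image:
    """4-neighbor discrete Laplacian: up + down + left + right - 4 * center."""
    h, w = len(img), len(img[0])
    return [[_pixel(img, r - 1, c) + _pixel(img, r + 1, c)
             + _pixel(img, r, c - 1) + _pixel(img, r, c + 1)
             - 4.0 * img[r][c]
             for c in range(w)] for r in range(h)]


def with_edge_channels(img: Image) -> List[Image]:
    """Stack the image with its gradient magnitude and Laplacian (C x H x W)."""
    return [img, gradient_magnitude(img), laplacian(img)]


if __name__ == "__main__":
    patch = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]
    for name, channel in zip(("image", "gradient", "laplacian"), with_edge_channels(patch)):
        print(name, channel)
```

Only the first convolution of a classifier would need to change to accept three input channels instead of one; the rest of the architecture can stay as it is.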
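
The saving from DepSep can be reasoned about by counting multiply-accumulate operations (MACs). For a k x k convolution mapping C_in to C_out channels on an H x W output, a standard convolution costs k^2 * C_in * C_out * H * W MACs, while a depth-wise separable one (a k x k depth-wise convolution followed by a 1 x 1 point-wise convolution) costs k^2 * C_in * H * W + C_in * C_out * H * W, a ratio of 1/C_out + 1/k^2. The layer sizes in the example are hypothetical, not taken from the style-transfer network in the thesis, and bias terms are ignored.

```python
def conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """MACs of a standard k x k convolution producing an h x w x c_out output (bias ignored)."""
    return k * k * c_in * c_out * h * w


def depsep_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """MACs of a depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 128 -> 128 channels, 64 x 64 output
    std = conv_macs(3, 128, 128, 64, 64)
    sep = depsep_macs(3, 128, 128, 64, 64)
    print(std, sep, round(sep / std, 4))  # prints 603979776 71827456 0.1189
```

For these sizes the separable layer needs about 11.9% of the MACs of the standard one. A per-layer MAC ratio like this is not a predicted speed-up: end-to-end testing time, such as the 26.1% reduction reported for the DepSep variant, also depends on memory traffic and on the rest of the network.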

Files in this item

Name: G29698-pandey.pdf
Size: 113.6 MB
Format: PDF
Description: Thesis full text

This item appears in the following Collection(s)

Electrical Engineering (EE) [448]