Knowledge-driven training of deep models for better reconstruction and recognition

Pandey, Ram Krishna

Abstract

This thesis aims to solve several interesting and challenging problems efficiently by incorporating appropriate image processing techniques into a deep learning framework. We have proposed, implemented and tested efficient and effective solutions for tasks such as image reconstruction and recognition. We have shown that the performance of any deep architecture can be improved at three different levels: (i) the input, (ii) the architecture and (iii) the objective or loss function. While developing algorithms, our focus has been on designing architectures that can optimally utilize the advantages of our exploration at all these levels.

In the first part, we propose different techniques to enhance the quality of low-resolution document images (particularly binary ones) for better human readability and OCR performance. We start with a comprehensive study on low-resolution images, showing that the performance of OCR can be improved by increasing the resolution of the document images. Next, we improve the quality of low-resolution, down-sampled document images and finally that of real low-resolution document images. We have achieved significant enhancement in the quality of the reconstructed images: humans find the reconstructed high-resolution images easy to read, and the OCR recognition accuracy is also significantly improved. (Exploration of the input and the architecture.)

In the second part, we have proposed different techniques to improve the quality of low-resolution natural images. Here we have fused multiple interpolations in a deep network to obtain better reconstruction. This idea of fusing multiple interpolations can be applied to various computer vision and image processing tasks. It suggests that traditional algorithms can be combined in a deep framework to obtain better reconstruction. (Exploration of the input and the architecture.)

In the third part, we have proposed the mean square Canny error (MSCE) as a new loss function that improves the performance of any existing deep architecture (super-resolution or denoising) that earlier used the mean square error (MSE) as a loss function. Many works in the literature use the mean square error in various super-resolution and denoising tasks. Our main goal in proposing this loss function is to improve all these existing algorithms (super-resolution or denoising) that use the mean square error as a loss function, without incurring additional cost during inference. (Exploration of the objective function.)

The fourth part of the thesis addresses practical applications of deep learning to some computer vision tasks. This part suggests that increasing the width and depth of a deep network is not always the better approach when searching for an optimal model (lightweight or less complex). We have shown that feeding the gradient and/or the Laplacian of the input image can improve the performance of facial emotion classifiers by a good margin, without incurring additional overhead during inference. This allows us to find a lightweight and computationally efficient model without compromising the classification accuracy. In another task, real-time artistic style transfer, we have proposed techniques to make it computationally more efficient without much decrease in the perceptual quality of the reconstructed artistic images. We have proposed the use of depth-wise separable convolution (DepSep) in place of convolution and nearest neighbour (NN) interpolation in place of transposed convolution. We have also explored the concatenation of nearest neighbour and bilinear (Bil) interpolations in place of transposed convolution. The stylized images from the modified architectures are perceptually similar in quality to those from the original architecture. The decrease in the computational complexity of our architectures is validated by the decrease in the testing time by 26.1%, 39.1% and 57.1% for the DepSep, DepSep-NN-Bil and DepSep-NN variants, respectively. (Exploration of the input, the architecture and the objective.)
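
As a rough illustration of why replacing a standard convolution with a depth-wise separable one reduces computational cost, the sketch below compares the weight counts of the two layer types. It is not taken from the thesis: the function names and the layer sizes (a 3 x 3 kernel with 128 input and 128 output channels) are hypothetical values chosen only for the example, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration (not from the thesis).
k, c_in, c_out = 3, 128, 128

standard = conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))  # 147456 17536 0.119
```

The ratio of weight counts is 1/c_out + 1/k², so for this assumed layer the separable version needs about 12% of the weights. A smaller weight count does not translate directly into a proportional drop in running time, so these figures are not comparable with the measured testing-time reductions reported in the abstract.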

URI

https://etd.iisc.ac.in/handle/2005/5191

Collections

Electrical Engineering (EE) [357]