Knowledge-driven training of deep models for better reconstruction and recognition
Abstract
This thesis addresses several challenging image reconstruction and recognition problems by incorporating
appropriate image processing techniques into a deep learning framework. We have proposed, implemented,
and tested efficient and effective solutions for these tasks, and we have shown that the performance
of a deep architecture can be improved at three different levels: (i) the input, (ii) the architecture,
and (iii) the objective or loss function. While developing our algorithms, our focus has been on
designing architectures that optimally exploit the advantages of our exploration at all three levels.
In the first part, we propose techniques to enhance the quality of low-resolution document
images (particularly binary images) for better human readability and OCR performance. We start
with a comprehensive study of low-resolution document images, showing that OCR performance
improves as the resolution of the document images increases. We then improve the quality of
synthetically down-sampled low-resolution document images, and finally that of real low-resolution
document images. The reconstructed high-resolution images are significantly enhanced: humans find
them easier to read, and OCR accuracy also improves substantially. (exploration of the input and architecture)
In the second part, we have proposed techniques to improve the quality of low-resolution
natural images. Here, we fuse multiple interpolations of the input in a deep network to obtain a better
reconstruction. This idea of fusing multiple interpolations can be applied to various computer vision
and image processing tasks, suggesting that traditional algorithms can be combined within a deep
framework to obtain better reconstructions, as sketched below. (exploration of the input and architecture)
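A minimal sketch of the interpolation-fusion idea, assuming a PyTorch setting: several classical upsamplings (nearest-neighbour, bilinear, bicubic) of the same low-resolution image are concatenated channel-wise and passed to a small convolutional network that learns to fuse them. The layer sizes and residual formulation here are illustrative assumptions, not the thesis's exact architecture.

```python
# Illustrative sketch (not the thesis architecture): fuse several classical
# interpolations of a low-resolution image inside a small CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterpolationFusionSR(nn.Module):
    def __init__(self, scale=4, channels=3):
        super().__init__()
        self.scale = scale
        # Three interpolations concatenated -> 3 * channels input channels.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Classical upsamplings of the same low-resolution input.
        nn_up = F.interpolate(x, scale_factor=self.scale, mode="nearest")
        bil_up = F.interpolate(x, scale_factor=self.scale, mode="bilinear",
                               align_corners=False)
        bic_up = F.interpolate(x, scale_factor=self.scale, mode="bicubic",
                               align_corners=False)
        fused_input = torch.cat([nn_up, bil_up, bic_up], dim=1)
        # Predict a residual on top of the bilinear estimate.
        return bil_up + self.fuse(fused_input)

model = InterpolationFusionSR()
lr_batch = torch.rand(1, 3, 32, 32)  # dummy low-resolution batch
sr_batch = model(lr_batch)           # -> shape (1, 3, 128, 128)
```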
In the third part, we have proposed the mean square Canny error (MSCE) as a new loss function
that improves the performance of existing deep architectures (for super-resolution or denoising)
that previously used the mean square error (MSE) as the loss function. Many works in the literature
use the mean square error for various super-resolution and denoising tasks. Our main goal in proposing
this loss function is to improve all such existing algorithms without incurring any additional cost
during inference. (exploration of the objective function)
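A minimal sketch of an edge-aware loss in the spirit of MSCE, assuming a PyTorch training loop. The thesis measures the error between Canny edge maps; since Canny itself is not differentiable, this illustration substitutes a Sobel gradient magnitude as the edge map, and the weighting factor `alpha` and single-channel assumption are my own, not values from the thesis.

```python
# Sketch of an edge-aware loss in the spirit of MSCE. Canny is replaced here
# by a differentiable Sobel edge magnitude; `alpha` is an assumed weight.
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Differentiable edge-magnitude map of a single-channel image batch (N,1,H,W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_aware_mse(pred, target, alpha=0.5):
    """Pixel MSE plus MSE between edge maps (a stand-in for MSE + MSCE)."""
    pixel_loss = F.mse_loss(pred, target)
    edge_loss = F.mse_loss(sobel_edges(pred), sobel_edges(target))
    return pixel_loss + alpha * edge_loss
```

Because the extra term only changes the training objective, the network itself is unchanged, which is why no additional cost is incurred at inference time.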
The fourth part of the thesis addresses practical applications of deep learning to selected computer
vision tasks. It shows that increasing the width and depth of a deep network is not always the best
way to obtain an optimal (lightweight or less complex) model. We have shown that feeding the
gradient and/or the Laplacian of the input image can improve the performance of facial emotion
classifiers by a good margin, without incurring additional overhead during inference. This allows us
to find a lightweight, computationally efficient model without compromising classification accuracy.
In another task, real-time artistic style transfer, we have proposed techniques to make the network
computationally more efficient with little loss in the perceptual quality of the reconstructed artistic
images. We propose the use of depth-wise separable convolution (DepSep) in place of standard
convolution and nearest-neighbour (NN) interpolation in place of transposed convolution, and we
also explore the concatenation of nearest-neighbour and bilinear (Bil) interpolations in place of
transposed convolution. The stylized images from the modified architectures are perceptually similar
in quality to those from the original architecture, while the testing time decreases by 26.1%, 39.1%,
and 57.1% for the DepSep, DepSep-NN-Bil, and DepSep-NN variants, respectively, confirming the
reduction in computational complexity. (explorations of the input, architecture and the objective)
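A minimal sketch of the input-level idea for emotion classification, assuming OpenCV preprocessing: the Sobel gradient magnitude and the Laplacian of the grayscale face image are stacked with the original image as extra input channels before being fed to the classifier. The kernel sizes and normalization are illustrative assumptions; the classifier itself is not shown.

```python
# Illustrative preprocessing (kernel sizes are assumptions): stack the input
# face image with its gradient magnitude and Laplacian as extra channels.
import cv2
import numpy as np

def with_gradient_and_laplacian(gray):
    """gray: H x W uint8 face image -> H x W x 3 float32 channel stack."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad_mag = cv2.magnitude(gx, gy)
    lap = cv2.Laplacian(gray, cv2.CV_32F, ksize=3)
    stacked = np.stack([gray.astype(np.float32), grad_mag, lap], axis=-1)
    # Normalize each channel to [0, 1] before feeding the classifier.
    stacked -= stacked.min(axis=(0, 1), keepdims=True)
    stacked /= stacked.max(axis=(0, 1), keepdims=True) + 1e-8
    return stacked
```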
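For the style-transfer modifications, the following building-block sketches show, under assumed layer sizes, a depth-wise separable convolution replacing a standard convolution and nearest-neighbour upsampling followed by a convolution replacing a stride-2 transposed convolution; the full style-transfer network and the NN-plus-bilinear concatenation variant are not reproduced here.

```python
# Building-block sketches (layer sizes are illustrative, not the exact
# style-transfer network): depth-wise separable convolution in place of a
# standard convolution, and nearest-neighbour upsampling plus convolution
# in place of a transposed convolution.
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # Depth-wise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch)
        # Point-wise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class NNUpsampleConv(nn.Module):
    """Replaces a stride-2 transposed convolution with NN upsampling + conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = DepthwiseSeparableConv(in_ch, out_ch)

    def forward(self, x):
        return self.conv(self.up(x))
```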