
dc.contributor.advisor: Ramakrishnan, A G
dc.contributor.author: Pandey, Ram Krishna
dc.date.accessioned: 2021-07-08T10:22:15Z
dc.date.available: 2021-07-08T10:22:15Z
dc.date.submitted: 2020
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5191
dc.description.abstract: This thesis aims to efficiently solve several interesting and challenging problems by incorporating appropriate image processing techniques in a deep learning framework. We have proposed, implemented and tested efficient and effective solutions for tasks such as image reconstruction and recognition. We have shown that the performance of any deep architecture can be improved at three different levels: (i) the input, (ii) the architecture and (iii) the objective or loss function. While developing algorithms, our focus has been on designing architectures that can optimally utilize the advantages of our exploration at all these levels.

In the first part, we propose different techniques to enhance the quality of low-resolution document images (particularly binary ones) for better human readability and OCR performance. We start with a comprehensive study on low-resolution images, showing that the performance of OCR can be improved by increasing the resolution of the document images. Next, we improve the quality of low-resolution, down-sampled document images, and finally that of real low-resolution document images. We have achieved significant enhancement in the quality of the reconstructed images: humans find the reconstructed high-resolution images easy to read, and the OCR recognition accuracy is also significantly improved. (Exploration of the input and architecture.)

In the second part, we have proposed different techniques to improve the quality of low-resolution natural images. Here we have fused multiple interpolations in a deep network to obtain better reconstruction. This idea of fusing multiple interpolations can be applied to various computer vision and image processing tasks. It suggests that traditional algorithms can be combined in a deep framework to obtain better reconstruction. (Exploration of the input and architecture.)

In the third part, we have proposed the mean square Canny error (MSCE) as a new loss function that improves the performance of any existing deep architecture (super-resolution or denoising) that earlier used the mean square error (MSE) as its loss function. Many works in the literature use the mean square error for various super-resolution and denoising tasks. Our main goal in proposing this loss function is to improve all these existing algorithms (super-resolution or denoising) that use the mean square error as the loss function, without incurring additional cost during inference. (Exploration of the objective function.)

The fourth part of the thesis addresses practical applications of deep learning to some computer vision tasks. This part suggests that increasing the width and depth of a deep network is not always the better approach when searching for an optimal model (lightweight or less complex). We have shown that feeding the gradient and/or the Laplacian of the input image can improve the performance of facial emotion classifiers by a good margin, without incurring additional overhead during inference. This allows us to find a lightweight and computationally efficient model without compromising the classification accuracy. In another task, real-time artistic style transfer, we have proposed techniques to make it computationally more efficient without much decrease in the perceptual quality of the reconstructed artistic images. We have proposed the use of depth-wise separable convolution (DepSep) in place of convolution, and nearest neighbour (NN) interpolation in place of transposed convolution. We have also explored the concatenation of nearest neighbour and bilinear (Bil) interpolations in place of transposed convolution. The stylized images from the modified architectures are perceptually similar in quality to those from the original architecture. The decrease in the computational complexity of our architectures is validated by the decrease in the testing time by 26.1%, 39.1%, and 57.1% for the DepSep, DepSep-NN-Bil, and DepSep-NN variants, respectively. (Explorations of the input, architecture and the objective.) [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ;G29698
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: deep learning [en_US]
dc.subject: image reconstruction [en_US]
dc.subject: image recognition [en_US]
dc.subject: OCR [en_US]
dc.subject: mean square Canny error [en_US]
dc.subject: DepSep [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering [en_US]
dc.title: Knowledge-driven training of deep models for better reconstruction and recognition [en_US]
dc.type: Thesis [en_US]
dc.degree.name: PhD [en_US]
dc.degree.level: Doctoral [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]

