Show simple item record

dc.contributor.advisor  Venkatesh, Babu R
dc.contributor.author  Karmali, Tejan
dc.date.accessioned  2022-11-07T06:25:52Z
dc.date.available  2022-11-07T06:25:52Z
dc.date.submitted  2022
dc.identifier.uri  https://etd.iisc.ac.in/handle/2005/5899
dc.description.abstract  The exponential rise in the availability of data over the past decade has fuelled research in deep learning. While supervised deep learning models achieve near-human performance using annotated data, this performance comes at the additional cost of annotation, and annotations may themselves be ambiguous due to human error. Whereas an image classification task assigns a single label to the whole image, increasing the granularity of the task to landmark estimation requires the annotator to pinpoint each landmark accurately. The self-supervised learning (SSL) paradigm overcomes these concerns by using pretext-task-based objectives to learn from large-scale unannotated data. In this work, we show how to extract relevant signals from pretrained self-supervised networks for (a) the discriminative task of landmark estimation under limited annotations, and (b) improving the perceptual quality of images generated by Generative Adversarial Networks (GANs).

In the first part, we demonstrate the emergent correspondence-tracking properties of the non-contrastive SSL framework. Using this as supervision, we propose LEAD, an approach to discover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection learn dense (pixel-level) feature representations from an image, which are then used to learn landmarks in a semi-supervised manner. While there have been advances in self-supervised learning of image features for instance-level tasks like classification, these methods do not ensure dense equivariant representations, a property of particular interest for dense prediction tasks like landmark estimation. We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion, following a two-stage training procedure: first, we train a network using the BYOL objective, which operates at the instance level; the correspondences obtained through this network are then used to train a dense and compact representation of the image with a lightweight network. We show that such a prior in the feature extractor helps landmark detection even under a drastically limited number of annotations, while also improving generalization across scale variations.

Next, we utilize the rich feature space of the SSL framework as a "naturalness" prior to alleviate the generation of unnatural images by GANs, a popular class of generative models. Progress in GANs has enabled the generation of high-resolution photorealistic images of astonishing quality. StyleGANs allow compelling attribute modification on such images via mathematical operations on the latent style vectors in the W/W+ space, which effectively modulate the rich hierarchical representations of the generator. Such operations have recently been generalized beyond the attribute swapping of the original StyleGAN paper to include interpolations. Despite many significant improvements, StyleGANs are still seen to generate unnatural images. The quality of the generated images is a function of (a) the richness of the hierarchical representations learned by the generator, and (b) the linearity and smoothness of the style spaces. We propose the Hierarchical Semantic Regularizer (HSR), which aligns the hierarchical representations learnt by the generator with the corresponding powerful features learned by networks pretrained on large amounts of data. HSR not only improves generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images. To demonstrate improved linearity, we propose a novel metric, the Attribute Linearity Score (ALS). A significant reduction in the generation of unnatural images is corroborated by a 15% improvement in the Perceptual Path Length (PPL) metric across different standard datasets, while simultaneously improving the linearity of attribute changes in attribute-editing tasks.
dc.language.iso  en_US
dc.rights  I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject  Self-supervised learning
dc.subject  Landmark Estimation
dc.subject  Image Synthesis
dc.subject  Deep Learning
dc.subject.classification  Research Subject Categories::TECHNOLOGY::Information technology::Computer science
dc.title  Landmark Estimation and Image Synthesis Guidance using Self-Supervised Networks
dc.type  Thesis
dc.degree.name  MTech (Res)
dc.degree.level  Masters
dc.degree.grantor  Indian Institute of Science
dc.degree.discipline  Engineering
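
The two-stage LEAD approach described in the abstract uses correspondences that emerge from a non-contrastive SSL backbone as supervision for a lightweight dense network. The sketch below illustrates, under stated assumptions, how such pseudo-correspondences could be extracted by nearest-neighbour matching of dense features in PyTorch; the function name, the `backbone` interface, and its output shape are assumptions made for illustration, not the thesis' actual code.

    import torch
    import torch.nn.functional as F

    def dense_correspondences(backbone, img_a, img_b):
        """Match every spatial location of img_a to its nearest neighbour in img_b
        using cosine similarity of dense backbone features.

        Assumes `backbone` maps a batch of images to feature maps of shape
        (B, C, H, W); this interface is illustrative, not the thesis' API.
        """
        with torch.no_grad():
            feat_a = F.normalize(backbone(img_a), dim=1)  # unit-norm feature per location
            feat_b = F.normalize(backbone(img_b), dim=1)
        b, c, h, w = feat_a.shape
        fa = feat_a.flatten(2).transpose(1, 2)            # (B, H*W, C)
        fb = feat_b.flatten(2)                            # (B, C, H*W)
        sim = torch.bmm(fa, fb)                           # pairwise cosine similarities
        return sim.argmax(dim=-1).view(b, h, w)           # best match in img_b per location of img_a

An index map obtained this way could then serve as the target when distilling a dense, compact representation into the lightweight network mentioned in the abstract.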
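
The Hierarchical Semantic Regularizer (HSR) described in the abstract aligns the generator's intermediate representations with features learned by networks pretrained on large amounts of data. Below is a minimal sketch of one way such a feature-alignment regularizer could be written; the module structure, the 1x1 projection heads, and the smooth-L1 loss are assumptions made for illustration, not the thesis' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureAlignmentRegularizer(nn.Module):
        """Pulls selected generator activations towards features of a frozen,
        pretrained encoder via small learned projection heads (illustrative)."""

        def __init__(self, gen_channels, tgt_channels):
            super().__init__()
            # one 1x1 projection head per generator level being regularized
            self.heads = nn.ModuleList(
                nn.Conv2d(g, t, kernel_size=1) for g, t in zip(gen_channels, tgt_channels)
            )

        def forward(self, gen_feats, tgt_feats):
            # gen_feats / tgt_feats: lists of (B, C_i, H_i, W_i) tensors, one pair per level
            loss = torch.zeros((), device=gen_feats[0].device)
            for head, g, t in zip(self.heads, gen_feats, tgt_feats):
                g = head(g)
                # match the spatial resolution of the (frozen) target features
                g = F.interpolate(g, size=t.shape[-2:], mode="bilinear", align_corners=False)
                loss = loss + F.smooth_l1_loss(g, t.detach())
            return loss / len(self.heads)

During training, a term like this would be added to the usual GAN objective with the pretrained encoder kept frozen, so that only the generator and the small projection heads receive gradients from it.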

