Deep Learning-based object category understanding is an important and active area of research
in Computer Vision. Work in this area has predominantly focused on the portion of the
depiction spectrum consisting of photographic images. However, depictions at the other end of
the spectrum, freehand sketches, are a fascinating visual representation and worthy of study
in themselves. In this thesis, we present deep-learning approaches for sketch analysis, sketch
synthesis and modelling sketch-driven cognitive processes.
On the analysis front, we first focus on the problem of recognizing hand-drawn line sketches
of objects. We propose a deep Recurrent Neural Network architecture with a novel loss formulation
for sketch object recognition. Our approach achieves state-of-the-art results on a
large-scale sketch dataset. We also show that the inherently online nature of our framework is
especially suitable for on-the-fly recognition of objects as they are being drawn.
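To make this concrete, a minimal recurrent recognizer is sketched below in PyTorch; the GRU, the (dx, dy, pen-state) stroke encoding and all sizes are illustrative assumptions rather than the thesis architecture, but the per-timestep readout shows why such a framework supports recognition while drawing is still in progress.

    # Minimal sketch (assumed encoding and sizes): a GRU consumes a sketch as a
    # sequence of pen states and emits class logits at every timestep, which is
    # what enables on-the-fly recognition of a partially drawn object.
    import torch
    import torch.nn as nn

    class SketchRNN(nn.Module):
        def __init__(self, num_classes, hidden=256):
            super().__init__()
            self.gru = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, strokes):          # strokes: (B, T, 3) = (dx, dy, pen)
            feats, _ = self.gru(strokes)     # (B, T, hidden)
            return self.head(feats)          # per-timestep logits (B, T, C)

    model = SketchRNN(num_classes=160)
    partial = torch.randn(1, 40, 3)          # 40 stroke points drawn so far
    logits = model(partial)[:, -1]           # prediction from the latest timestep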
We then move beyond object-level label prediction to the relatively harder problem of parsing
sketched objects, i.e. given a freehand object sketch, determine its salient attributes (e.g.
category, semantic parts, pose). To this end, we propose SketchParse, the first deep-network
architecture for fully automatic parsing of freehand object sketches. We subsequently demonstrate
SketchParse's abilities (i) on two challenging large-scale sketch datasets, (ii) in parsing
unseen, semantically related object categories, and (iii) in improving fine-grained sketch-based
image retrieval. As a novel application, we also illustrate how SketchParse's output can be used
to generate caption-style descriptions for hand-drawn sketches.
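A hedged illustration of this style of two-level parsing follows: a shared convolutional encoder routes each sketch to a per-group part-labelling branch. The specific layers, the router and all sizes are assumptions for illustration, not the SketchParse architecture itself.

    # Assumed two-level parser: shared encoder, coarse router, per-group
    # "expert" heads producing per-pixel part logits.
    import torch
    import torch.nn as nn

    class ParseNet(nn.Module):
        def __init__(self, num_groups, parts_per_group):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
            self.router = nn.Linear(64, num_groups)
            self.experts = nn.ModuleList(
                [nn.Conv2d(64, p, 1) for p in parts_per_group])

        def forward(self, sketch):                     # sketch: (B, 1, H, W)
            f = self.encoder(sketch)
            group = self.router(f.mean(dim=(2, 3))).argmax(1)
            return [self.experts[int(g)](f[i:i + 1])   # per-pixel part labels
                    for i, g in enumerate(group)]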
On the synthesis front, we design generative models for sketches via Generative Adversarial
Networks (GANs). Keeping the limited size of sketch datasets in mind, we propose DeLiGAN,
a novel architecture for diverse and limited training data scenarios. In our approach, we
reparameterize the latent generative space as a mixture model and learn the mixture model's
parameters along with those of the GAN. This seemingly simple modification to the vanilla GAN
framework is surprisingly effective and results in models which enable diversity in generated
samples despite being trained with limited data. We show that DeLiGAN generates diverse samples
not just for hand-drawn sketches but for other image modalities as well. To quantitatively
characterize the intra-class diversity of generated samples, we also introduce a modified version of
"inception-score", a measure which has been found to correlate well with human assessment of
generated samples.
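The core DeLiGAN modification admits a compact sketch. In the illustrative PyTorch snippet below, the component count, dimensionality and initialization are assumptions; the point is that z = mu_i + sigma_i * eps keeps sampling differentiable, so the mixture parameters receive gradients alongside the generator's.

    # Latent mixture reparameterization (assumed sizes): z is drawn from a
    # learned mixture of Gaussians rather than a single fixed prior.
    import torch
    import torch.nn as nn

    class MixtureLatent(nn.Module):
        def __init__(self, num_components=50, dim=100):
            super().__init__()
            self.mu = nn.Parameter(torch.randn(num_components, dim))
            self.sigma = nn.Parameter(torch.full((num_components, dim), 0.2))

        def forward(self, batch):
            idx = torch.randint(0, self.mu.size(0), (batch,))  # pick components
            eps = torch.randn(batch, self.mu.size(1))
            return self.mu[idx] + self.sigma[idx] * eps        # reparameterized z

    latent = MixtureLatent()
    z = latent(64)   # feed to any generator; gradients reach mu and sigma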
We subsequently present an approach for synthesizing minimally discriminative sketch-based
object representations which we term category-epitomes. The synthesis procedure concurrently
provides a natural measure for quantifying the sparseness underlying the original sketch, which
we term epitome-score. We show that the category-level distribution of epitome-scores can be
used to characterize the level of detail generally required for recognizing object categories.
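Purely as an assumption-laden illustration (the classify function here is a hypothetical recognizer, and the greedy removal order is not the thesis procedure), one way to operationalize such a score is the fraction of strokes that must be kept for the sketch to remain recognizable:

    # Hypothetical epitome-style score: greedily drop strokes while a
    # recognizer still predicts the original category; the surviving
    # fraction acts as a sparseness measure (lower = sparser epitome).
    def epitome_score(strokes, classify):
        target = classify(strokes)
        kept = list(strokes)
        for s in list(strokes):
            trial = [k for k in kept if k is not s]
            if trial and classify(trial) == target:
                kept = trial
        return len(kept) / len(strokes)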
On the cognitive process modelling front, we analyze the results of a free-viewing eye fixation
study conducted on freehand sketches. The analysis reveals that eye fixation sequences exhibit
marked consistency within a sketch, across sketches of a category and even across suitably
grouped sets of categories. This multi-level consistency is remarkable given the variability in
depiction and extreme image content sparsity that characterizes hand-drawn object sketches.
We show that the multi-level consistency in the fixation data can be exploited to predict a
sketch's category given only its fixation sequence and to build a computational model which
predicts part-labels underlying the eye fixations on objects.
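A minimal version of such a fixation-driven category predictor might look as follows; the LSTM, the (x, y, duration) fixation encoding and all sizes are illustrative assumptions.

    # Assumed fixation classifier: an LSTM over a viewer's fixation sequence
    # predicts the sketch's category from gaze behaviour alone.
    import torch
    import torch.nn as nn

    class FixationClassifier(nn.Module):
        def __init__(self, num_categories, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=3, hidden_size=hidden,
                                batch_first=True)
            self.head = nn.Linear(hidden, num_categories)

        def forward(self, fixations):      # fixations: (B, T, 3) = (x, y, dur)
            _, (h, _) = self.lstm(fixations)
            return self.head(h[-1])        # logits from the final hidden state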
The ability of machine-based agents to play games in human-like fashion is considered a
benchmark of progress in AI. Motivated by this observation, we introduce the first computational
model for Pictionary, the popular word-guessing social game. We first introduce
Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary,
Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data and gathers
open-ended guess-words from human guessers. To mimic humans playing Pictionary, we
propose a deep neural model which generates guess-words in response to temporally evolving
human-drawn sketches. The model even makes human-like mistakes while guessing, further
strengthening its resemblance to human players. We evaluate the model on the large-scale guess-word dataset
generated via the Sketch-QA task and compare it with various baselines. We also conduct a Visual
Turing Test to obtain human impressions of the guess-words generated by humans and our
model. The promising experimental results demonstrate the challenges and opportunities in
building computational models for Pictionary and similarly themed games.
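For concreteness, a hedged sketch of such a guesser follows: a recurrent encoder consumes the accumulating strokes and a vocabulary head re-scores guess-words at every timestep, so guesses evolve as the drawing grows. The stroke encoding, sizes and vocabulary are illustrative assumptions, not the thesis model.

    # Assumed Pictionary-style guesser: per-timestep guess-word logits over
    # an incrementally growing stroke sequence.
    import torch
    import torch.nn as nn

    class Guesser(nn.Module):
        def __init__(self, vocab_size, hidden=256):
            super().__init__()
            self.gru = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
            self.vocab = nn.Linear(hidden, vocab_size)

        def forward(self, strokes):            # strokes: (B, T, 3)
            feats, _ = self.gru(strokes)
            return self.vocab(feats)           # guess-word logits per timestep

    guesser = Guesser(vocab_size=10000)
    drawing_so_far = torch.randn(1, 25, 3)
    top_guess = guesser(drawing_so_far)[0, -1].argmax()  # current best guess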