Analyzing humans and their activities plays a central role in computer vision. This requires machine learning models to encapsulate both the diverse poses and appearances exhibited by humans. Estimating the 3D poses of highly deformable humans from monocular RGB images remains an important, challenging, and unsolved problem with applications in human-robot interaction, augmented reality, and gaming. Another important task is to identify the same human targets across camera viewpoints in a wide-area video surveillance setup, which requires learning discriminative and robust representations of human appearance under large variations in pose, background, and illumination. In this thesis, we study several computer vision problems under the theme of estimating human pose and modeling appearance from monocular images.
Estimating 3D pose from a single image is a classical ill-posed inverse problem, as a single view provides no depth information. Supervised approaches perform well in this setting by guiding the model toward plausible poses; however, beyond the impracticality of assuming a labeled dataset, such approaches tend to generalize poorly to unseen datasets. We therefore formulate the problem as an unsupervised learning task and propose a novel framework consisting of a series of differentiable transformations that act as a suitable bottleneck, encouraging effective pose disentanglement. Furthermore, the proposed adaptation technique enables learning from in-the-wild videos beyond laboratory settings, resulting in superior generalizability across diverse and unseen environments.
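The abstract leaves the exact architecture unspecified; as a rough illustration only, the minimal PyTorch sketch below shows how a series of differentiable transformations (a rigid camera rotation followed by perspective projection) can act as a parameter-free bottleneck between a predicted canonical 3D pose and an observable 2D projection. All names (`PoseEncoder`, `axis_angle_to_matrix`, `project`) and constants (feature dimension, joint count, depth offset) are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn

def axis_angle_to_matrix(v, eps=1e-8):
    """Differentiable Rodrigues formula: axis-angle (B, 3) -> rotation (B, 3, 3)."""
    theta = v.norm(dim=-1, keepdim=True).clamp(min=eps)      # rotation angle (B, 1)
    k = v / theta                                            # unit rotation axis
    K = torch.zeros(v.shape[0], 3, 3, device=v.device)       # skew-symmetric matrix of k
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    I = torch.eye(3, device=v.device).expand_as(K)
    s, c = torch.sin(theta)[..., None], torch.cos(theta)[..., None]
    return I + s * K + (1 - c) * (K @ K)

class PoseEncoder(nn.Module):
    """Predicts a view-invariant canonical 3D pose plus a camera rotation from
    image features; the pose must pass through the projection bottleneck
    below to explain the 2D evidence."""
    def __init__(self, feat_dim=512, n_joints=17):
        super().__init__()
        self.n_joints = n_joints
        self.pose_head = nn.Linear(feat_dim, n_joints * 3)   # canonical 3D pose
        self.cam_head = nn.Linear(feat_dim, 3)                # axis-angle camera

    def forward(self, feats):
        pose3d = self.pose_head(feats).view(-1, self.n_joints, 3)
        rot = axis_angle_to_matrix(self.cam_head(feats))
        return pose3d, rot

def project(pose3d, rot, depth_offset=10.0):
    """The differentiable, parameter-free bottleneck: rotate the canonical
    pose into the camera frame, then apply perspective projection."""
    cam = pose3d @ rot.transpose(1, 2)
    z = cam[..., 2:3] + depth_offset                          # keep joints in front of the camera
    return cam[..., :2] / z                                   # (B, n_joints, 2)
```

Because the bottleneck itself has no learnable parameters, the only way for the encoder to explain 2D evidence consistently across views is to disentangle a stable 3D pose from camera and appearance factors, which is the intuition the paragraph above appeals to.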
3D pose estimation models discard variations in the human body, e.g., shape and appearance, which could help solve related tasks such as body-part segmentation. As a next step, we design a single part-based 2D puppet model that relies on human pose articulation constraints and a set of unpaired 3D poses to estimate both 3D poses and part segments from human-centric images. Unlike our previous work, the proposed part-based model allows us to operate on videos with diverse camera movements.
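As a rough sketch of the 2D "puppet" idea, the function below renders soft part-segmentation maps as capsules around the bones of a projected 2D skeleton; since every step is differentiable, gradients from a segmentation or reconstruction loss can flow back into the pose. The bone list, Gaussian falloff, and capsule width here are hypothetical choices, not the model designed in the thesis.

```python
import torch

BONES = [(0, 1), (1, 2), (2, 3)]  # hypothetical (parent, child) joint pairs

def part_maps(joints2d, h=64, w=64, width=0.05):
    """joints2d: (B, J, 2) in [-1, 1]. Returns (B, len(BONES), h, w):
    one soft, capsule-shaped segmentation map per bone."""
    dev = joints2d.device
    ys = torch.linspace(-1, 1, h, device=dev)
    xs = torch.linspace(-1, 1, w, device=dev)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).view(1, -1, 2)        # pixel coords (1, h*w, 2)
    maps = []
    for a, b in BONES:
        p, q = joints2d[:, a:a+1], joints2d[:, b:b+1]          # bone endpoints (B, 1, 2)
        d = q - p
        t = ((grid - p) * d).sum(-1) / (d * d).sum(-1).clamp(min=1e-8)
        proj = p + t.clamp(0, 1).unsqueeze(-1) * d             # nearest point on the bone segment
        dist2 = ((grid - proj) ** 2).sum(-1)                   # squared distance to the bone
        maps.append(torch.exp(-dist2 / (2 * width ** 2)))      # Gaussian falloff around the bone
    return torch.stack(maps, dim=1).view(-1, len(BONES), h, w)
```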
The approaches above cast 3D pose estimation as a task of disentangling human pose and appearance. In contrast, our subsequent work casts 3D pose learning as a cross-modal alignment problem. We assume access to an unpaired pool of short natural action videos and 3D pose sequences, drawn from the input and output modalities respectively. We introduce a novel technique for self-supervised alignment across these modalities that preserves higher-order non-local relations in a pre-learned latent pose space, attaining superior generalizability over the state of the art.
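One simple way to make "preserving higher-order non-local relations" concrete is to require that the pairwise relation matrix of a batch of latent pose codes survive the cross-modal mapping. The sketch below, with cosine relations, an MSE penalty, and the names `relation_matrix` and `relation_preserving_loss`, is an illustrative formulation under that assumption, not the exact objective of the thesis.

```python
import torch
import torch.nn.functional as F

def relation_matrix(z):
    """Pairwise cosine similarities within a batch: the non-local,
    higher-order structure of the samples in latent space."""
    z = F.normalize(z, dim=-1)
    return z @ z.t()                                   # (B, B)

def relation_preserving_loss(z_pose, z_mapped):
    """z_pose: codes in the pre-learned latent pose space; z_mapped: the same
    samples after a round trip through the video modality. Matching the two
    relation matrices constrains the alignment without paired supervision."""
    return F.mse_loss(relation_matrix(z_pose), relation_matrix(z_mapped))
```

Because the constraint is defined over relations among samples rather than over individual samples, it remains usable even though no video is paired with any particular pose sequence.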
Unsupervised person re-identification (re-ID) aims to match identities across non-overlapping cameras without assuming any labels during training. We propose a two-stage training strategy for this task. First, we train a deep network on a carefully designed pose-transformed dataset, obtained by generating multiple perturbations in the pose space for each original image. Next, using the proposed discriminative clustering algorithm, the network learns to attend to the fundamental aspects of feature learning: compact clusters with low intra-cluster and high inter-cluster variation, mapping similar features closer together. Experiments on large-scale re-ID datasets demonstrate the superiority of our method against state-of-the-art approaches.
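A hedged sketch of what such a second stage can look like follows: cluster the current features to obtain pseudo-labels, then optimize a loss that rewards exactly the two properties named above, low intra-cluster and high inter-cluster variation. DBSCAN, the hinge margin, and the centroid-based formulation are stand-in choices for illustration, not the discriminative clustering algorithm proposed in the thesis.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

def pseudo_labels(features, eps=0.5):
    """Cluster L2-normalised features; DBSCAN marks outliers with label -1."""
    feats = F.normalize(features, dim=1).cpu().numpy()
    return DBSCAN(eps=eps, min_samples=4).fit_predict(feats)

def cluster_loss(feats, labels, margin=0.3):
    """Compactness: pull each feature to its cluster centroid (intra term).
    Separation: push distinct centroids at least `margin` apart (inter term)."""
    feats = F.normalize(feats, dim=1)
    labels = torch.as_tensor(labels, device=feats.device)
    uniq = [l for l in labels.unique().tolist() if l >= 0]   # ignore outliers
    cents = torch.stack([feats[labels == l].mean(0) for l in uniq])
    intra = torch.stack([
        (feats[labels == l] - cents[i]).pow(2).sum(1).mean()
        for i, l in enumerate(uniq)
    ]).mean()
    dists = torch.cdist(cents, cents)                        # centroid-to-centroid distances
    off_diag = ~torch.eye(len(uniq), dtype=torch.bool, device=feats.device)
    inter = F.relu(margin - dists[off_diag]).mean()          # hinge on centroid separation
    return intra + inter
```

Alternating between re-clustering and optimizing such a loss progressively sharpens the pseudo-labels, which is the usual motivation for clustering-based unsupervised re-ID pipelines.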