
dc.contributor.advisor: Chakraborty, Anirban
dc.contributor.author: Seth, Siddharth
dc.date.accessioned: 2022-07-27T07:12:23Z
dc.date.available: 2022-07-27T07:12:23Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5801
dc.description.abstract: Analyzing humans and their activities plays a central role in computer vision. This requires machine learning models to capture both the diverse poses and the appearances exhibited by humans. Estimating the 3D poses of highly deformable humans from monocular RGB images remains an important, challenging, and unsolved problem with applications in human-robot interaction, augmented reality, the gaming industry, and beyond. Another important task is to identify the same human targets across camera viewpoints in a wide-area video surveillance setup, which requires learning discriminative and robust representations of human appearance under large variations in pose, background, and illumination. In this thesis, we study several computer vision problems under the theme of estimating human pose and modeling appearance from monocular images. Estimating the 3D pose from a single image is a classical ill-posed inverse problem, since a single view provides no depth information. In such scenarios, supervised approaches perform well by guiding the model towards plausible poses. However, beyond the often impractical assumption that a labeled dataset is available, such approaches suffer from poor generalization to unseen datasets. We therefore formulate the problem as an unsupervised learning task and propose a novel framework consisting of a series of differentiable transformations that act as a suitable bottleneck, stimulating effective pose disentanglement. Furthermore, the proposed adaptation technique enables learning from in-the-wild videos beyond laboratory settings, resulting in superior generalizability across diverse and unseen environments. These 3D pose estimation models discard variations of the human body, e.g., shape and appearance, that could help solve related tasks such as body-part segmentation. As a next step, we design a single part-based 2D puppet model that relies on human pose articulation constraints and a set of unpaired 3D poses to estimate both 3D poses and part segments from human-centric images. Unlike our previous work, the proposed part-based model allows us to operate on videos with diverse camera movements. The approaches above cast the 3D pose estimation problem as a task of disentangling human pose and appearance. In contrast, our subsequent work casts 3D pose learning as a cross-modal alignment problem. We assume the availability of an unpaired pool of short natural action videos and 3D pose sequences, drawn from the input and output modalities, respectively. We introduce a novel technique for self-supervised alignment across these modalities that preserves higher-order non-local relations in a pre-learned latent pose space, attaining superior generalizability over the state of the art. Unsupervised person re-identification (re-ID) aims to match identities across non-overlapping cameras without assuming any labels during training. We propose a two-stage training strategy for this task. First, we train a deep network on an expertly designed pose-transformed dataset, obtained by generating multiple perturbations in the pose space for each original image. Next, using the proposed discriminative clustering algorithm, the network learns to attend to the fundamental aspects of feature learning: compact clusters with low intra-cluster and high inter-cluster variation, thereby mapping similar features closer together. Experiments on large-scale re-ID datasets demonstrate the superiority of our method over state-of-the-art approaches.
dc.language.iso: en_US
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now and hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Self-Supervised Learning
dc.subject: Computer Vision
dc.subject: 3D Human Pose Estimation
dc.subject: Person Re-identification
dc.subject: Deep Learning
dc.subject: Convolutional Neural Network
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science
dc.title: Learning to Perceive Humans From Appearance and Pose
dc.type: Thesis
dc.degree.name: MTech (Res)
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering

