dc.contributor.advisor | Chakraborty, Anirban | |
dc.contributor.author | Jambigi, Chaitra | |
dc.date.accessioned | 2022-11-07T06:20:47Z | |
dc.date.available | 2022-11-07T06:20:47Z | |
dc.date.submitted | 2022 | |
dc.identifier.uri | https://etd.iisc.ac.in/handle/2005/5898 | |
dc.description.abstract | With rapid technological advances, video surveillance systems are now commonly deployed in public places such as malls and airports as well as across private residential areas. These systems play a critical role in ensuring safety and security against criminal/anomalous activities. ‘Person Re-Identification’ (re-ID) is a key component of such a system and is well-studied in modern computer vision literature. The task of person re-ID is typically posed as an instance retrieval problem in a large wide-area network of cameras with non-overlapping fields of view (FoV). When presented with an image of a person of interest (query) as observed in any given camera, the goal is to retrieve all image instances of the target with the same identity from all other cameras (gallery) in the network. Despite extensive research in this area, there is still a gap between the efficacy of existing re-ID frameworks under laboratory settings and their real-world deployability, which necessitates the development of practical solutions for person re-ID. In this thesis, we explore two such research directions to build robust and scalable person re-ID models.
The first part of the thesis proposes a solution for the challenging and open problem of Visible-Thermal Person Re-ID (VT Re-ID). In this cross-modal retrieval problem, the query image of a target (in dark/low-light conditions) is captured using a thermal imaging camera and the re-ID system needs to search and retrieve observations corresponding to the same identity from the gallery set, which is composed of visible spectrum images of various targets captured using standard RGB cameras in well-lit environments. Such a system has major applications in night-time surveillance and enables round-the-clock monitoring of places of interest. Existing cross-modal re-ID methods align the modalities via adversarial learning or complex feature extraction modules that rely heavily on domain knowledge. We propose a simple but effective framework, MMD-ReID, to explicitly reduce the modality gap. MMD-ReID takes inspiration from ‘Maximum Mean Discrepancy’ (MMD), a statistical tool that determines the distance between two distributions. Our method uses a novel margin-based formulation to match class-conditional feature distributions of the visible and thermal samples to minimize intra-class distances while maintaining feature discriminability across identities. Extensive experiments show that our method outperforms state-of-the-art approaches by significant margins.
The second part of the thesis attempts to solve the more challenging problem of Domain Generalization (DG) in person re-ID. Most existing re-ID models are trained and tested on the same dataset and perform poorly when evaluated on a new dataset (domain) without any explicit fine-tuning using annotated data samples from the latter. Recent multi-source DG methods use meta-learning approaches, which are prone to overfitting on the seen domains. To overcome this, we propose a novel strategy based on a supervised contrastive learning framework for learning domain-agnostic features. Our method attempts to model domain variations by creating hallucinated ‘positive’ samples that realistically mimic the perturbations one expects from domain-shift. We empirically show that by using our proposed pool of perturbation strategies, we are able to learn more generalizable features, thereby achieving state-of-the-art performance across unseen domains. We also hypothesize that training on a related, auxiliary task that is preserved across domains can help in learning robust features. With attribute prediction as the chosen auxiliary task, we experimentally show that such training indeed leads to better generalization of the learnt model. | en_US |
dc.language.iso | en_US | en_US |
dc.rights | I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation | en_US |
dc.subject | Computer vision | en_US |
dc.subject | Person reidentification | en_US |
dc.subject | Domain generalization | en_US |
dc.subject | Cross modal person re-identification | en_US |
dc.subject | Video surveillance systems | en_US |
dc.subject | Person re-ID models | en_US |
dc.subject.classification | Research Subject Categories::TECHNOLOGY::Information technology::Computer science | en_US |
dc.title | Towards Robust and Scalable Video Surveillance: Cross-modal and Domain Generalizable Person Re-identification | en_US |
dc.type | Thesis | en_US |
dc.degree.name | MTech (Res) | en_US |
dc.degree.level | Masters | en_US |
dc.degree.grantor | Indian Institute of Science | en_US |
dc.degree.discipline | Engineering | en_US |