Towards Robust and Scalable Video Surveillance: Cross-modal and Domain Generalizable Person Re-identification

Jambigi, Chaitra

dc.contributor.advisor	Chakraborty, Anirban
dc.contributor.author	Jambigi, Chaitra
dc.date.accessioned	2022-11-07T06:20:47Z
dc.date.available	2022-11-07T06:20:47Z
dc.date.submitted	2022
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5898
dc.description.abstract	With rapid technological advances, video surveillance systems are now widely deployed in public places such as malls and airports, as well as in private residential areas. These systems play a critical role in ensuring safety and security against criminal or anomalous activities. ‘Person Re-Identification’ (re-ID) is a key component of such a system and is well studied in modern computer vision literature. The task of person re-ID is typically posed as an instance retrieval problem in a large wide-area network of cameras with non-overlapping fields of view (FoVs). When presented with an image of a person of interest (query) as observed in any given camera, the goal is to retrieve all image instances of the target with the same identity from all other cameras (gallery) in the network. Despite extensive research in this area, there is still a gap between the efficacy of existing re-ID frameworks in laboratory settings and their real-world deployability, which necessitates the development of practical solutions for person re-ID. In this thesis, we explore two such research directions to build robust and scalable person re-ID models. The first part of the thesis proposes a solution for the challenging and open problem of Visible-Thermal Person Re-ID (VT Re-ID). In this cross-modal retrieval problem, the query image of a target (in dark or low-light conditions) is captured using a thermal imaging camera, and the re-ID system needs to search for and retrieve observations corresponding to the same identity from the gallery set, which is composed of visible-spectrum images of various targets captured using standard RGB cameras in well-lit environments. Such a system has major applications in night-time surveillance and enables round-the-clock monitoring of places of interest. Existing cross-modal re-ID methods align the modalities via adversarial learning or complex feature extraction modules that rely heavily on domain knowledge. 
We propose a simple but effective framework, MMD-ReID, to explicitly reduce the modality gap. MMD-ReID takes inspiration from ‘Maximum Mean Discrepancy’ (MMD), a statistical tool that measures the distance between two distributions. Our method uses a novel margin-based formulation to match the class-conditional feature distributions of the visible and thermal samples, minimizing intra-class distances while maintaining feature discriminability across identities. Extensive experiments show that our method outperforms state-of-the-art approaches by significant margins. The second part of the thesis addresses the more challenging problem of Domain Generalization (DG) in person re-ID. Most existing re-ID models are trained and tested on the same dataset and perform poorly when evaluated on a new dataset (domain) without explicit fine-tuning on annotated samples from that dataset. Recent multi-source DG methods use meta-learning approaches, which are prone to overfitting on the seen domains. To overcome this, we propose a novel strategy based on a supervised contrastive learning framework for learning domain-agnostic features. Our method attempts to model domain variations by creating hallucinated ‘positive’ samples that realistically mimic the perturbations one expects from domain shift. We empirically show that, by using our proposed pool of perturbation strategies, we are able to learn more generalizable features, thereby achieving state-of-the-art performance across unseen domains. We also hypothesize that training on a related, auxiliary task that is preserved across domains can help in learning robust features. With attribute prediction as the chosen auxiliary task, we experimentally show that such training indeed leads to better generalization of the learnt model.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Computer vision	en_US
dc.subject	Person reidentification	en_US
dc.subject	Domain generalization	en_US
dc.subject	Cross modal person re-identification	en_US
dc.subject	Video surveillance systems	en_US
dc.subject	Person re-ID models	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Computer science	en_US
dc.title	Towards Robust and Scalable Video Surveillance: Cross-modal and Domain Generalizable Person Re-identification	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
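
The abstract describes Maximum Mean Discrepancy (MMD) as a statistical tool that measures the distance between two distributions. The following is a minimal sketch of the plain (biased) squared-MMD estimate with a Gaussian kernel, using only the Python standard library. It is not the thesis's margin-based, class-conditional formulation: the function names, the kernel bandwidth `sigma`, and the synthetic "visible" and "thermal" feature vectors are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (sigma is an assumed bandwidth)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))

    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sample(mean, n, dim, rng):
    """Draw n synthetic feature vectors from an isotropic Gaussian (illustrative data only)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
visible = sample(0.0, 50, 4, rng)        # hypothetical visible-spectrum features
thermal_near = sample(0.0, 50, 4, rng)   # same underlying distribution
thermal_far = sample(3.0, 50, 4, rng)    # shifted distribution (a "modality gap")

same = mmd_squared(visible, thermal_near)
shifted = mmd_squared(visible, thermal_far)
print(f"MMD^2, matching distributions: {same:.4f}")
print(f"MMD^2, shifted distributions:  {shifted:.4f}")
```

In this sketch the estimate is small when both samples come from the same distribution and grows when one sample is shifted, which is the property a method would exploit when using MMD as a training signal to pull two modalities together.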

Files in this item

Name:: Towards_Robust_and_Scalable_Pe ...
Size:: 8.627Mb
Format:: PDF
Description:: Thesis full text

This item appears in the following Collection(s)

Department of Computational and Data Sciences (CDS) [102]
