Generalizing Cross-domain Retrieval Algorithms
Abstract
Cross-domain retrieval is an important research topic because of its wide range of applications, for example in e-commerce and forensics. It addresses the problem of retrieving data from a search set when the query belongs to one domain and the search database contains samples from a different domain. Several algorithms have been proposed in recent literature to address this task. In this thesis, we address some of the challenges in cross-domain retrieval, specifically for the application of sketch-based image retrieval (SBIR).
Traditionally, cross-domain retrieval algorithms assume that both the training and test data belong to the same set of seen classes, which is quite restrictive. Such models can only retrieve data from the two specific domains on which they have been trained, and cannot generalize to new domains or new classes during retrieval. In the real world, however, new object classes are continuously being discovered, so it is necessary to design algorithms that can generalize to previously unseen classes. In addition, a practically useful retrieval model should be able to perform retrieval between any two data domains, whether or not those domains were used for training. In our work, we observe a significant decrease in the performance of existing approaches in these generalized retrieval scenarios, once such simplifying assumptions are removed. In this thesis, we aim to address these and related challenges, so as to make cross-domain retrieval models better suited for real-life applications.
We first consider a class-wise generalized protocol, where the query data during retrieval may belong to unseen classes. Following the nomenclature used in classification problems, we refer to this as zero-shot cross-modal retrieval, and we propose an add-on ranking module to improve the performance of existing cross-modal methods in the literature. This work is applicable to different modalities (e.g., text and image), in addition to different domains (e.g., image and RGBD data). Next, we focus on developing an end-to-end framework, named StyleGuide, which addresses the task of sketch-based image retrieval under this zero-shot retrieval condition. This thesis also explores the effects of class imbalance in the training data, which is a challenging aspect of designing any machine learning algorithm. Data imbalance is inherently present in real-world datasets, and we show that it adversely affects the performance of existing sketch-based image retrieval approaches. A robust adaptive margin-based regularizer is proposed as a potential solution to this challenge. A style-augmented SBIR system is also proposed, as an extended use case of the SBIR problem. We then introduce a novel protocol termed universal cross-domain retrieval (UCDR), which extends zero-shot cross-modal retrieval to generalized query domains. Here, the query may belong to an unseen domain as well as an unseen class, thus further generalizing the retrieval model. A mix-up based class-neighbourhood aware network, SnMpNet, is proposed to address this protocol. Finally, we conclude the thesis by summarizing the research findings and discussing future research directions.
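
The two generalized protocols described above differ only in what is held out from training. The following is a minimal Python sketch of that difference, not the thesis's actual data pipeline: the `Sample` type, the function names `zero_shot_split` and `ucdr_split`, and the toy class and domain names are all hypothetical, introduced here purely for illustration. The composition of the search set (gallery) is not specified in this abstract, so it is left out.

```python
from typing import List, NamedTuple, Set, Tuple


class Sample(NamedTuple):
    """Hypothetical record: one data item with its class label and domain."""
    item_id: str
    label: str
    domain: str


def zero_shot_split(
    samples: List[Sample], unseen_classes: Set[str]
) -> Tuple[List[Sample], List[Sample]]:
    """Zero-shot protocol: queries come from classes never seen in training."""
    train = [s for s in samples if s.label not in unseen_classes]
    queries = [s for s in samples if s.label in unseen_classes]
    return train, queries


def ucdr_split(
    samples: List[Sample], unseen_classes: Set[str], unseen_domain: str
) -> Tuple[List[Sample], List[Sample]]:
    """UCDR-style protocol: queries are unseen in both class and domain."""
    # Training excludes every held-out class and the held-out domain.
    train = [
        s for s in samples
        if s.label not in unseen_classes and s.domain != unseen_domain
    ]
    # Queries must be held out along both axes at once.
    queries = [
        s for s in samples
        if s.label in unseen_classes and s.domain == unseen_domain
    ]
    return train, queries
```

With toy data, a sketch of a class held out from training would become a query under both protocols only if its domain is also the held-out one in the UCDR case; samples that are unseen along just one axis are used neither for training nor as queries in `ucdr_split`.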