Cross-Modal Retrieval and Hashing
The objective of cross-modal retrieval is to retrieve relevant items from one modality (say, images), given a query from another modality (say, text documents). Cross-modal retrieval has various applications, such as image-sketch, audio-visual, and near-infrared-RGB matching. The different feature representations of the two modalities and the absence of paired correspondences, among other factors, make this a very challenging problem. In this thesis, we examine the cross-modal retrieval problem from several aspects and propose methodologies to address each of them.
• In the first work, we propose a novel framework that can work with unpaired data from the two modalities. The method has two steps: a hash code learning stage followed by a hash function learning stage. It can also generate unified hash representations in a post-processing stage for even better performance. Finally, we investigate, formulate and address the cross-modal hashing problem in the presence of missing similarity information between the data items.
• In the second work, we investigate how to make cross-modal hashing algorithms scalable so that they can handle large amounts of training data, and propose two solutions. The first approach builds on a mini-batch realization of the previously formulated objective, and the second is based on matrix factorization. We also investigate whether it is possible to build a hashing-based approach without learning a hash function, as is typically done in the literature. Finally, we propose a strategy by which an already trained cross-modal approach can be adapted and updated to account for the real-life scenario of an increasing label space, without retraining the entire model from scratch.
• In the third work, we explore semi-supervised approaches for cross-modal retrieval. We first propose a novel framework that predicts the labels of the unlabeled data using complementary information from the different modalities. The framework can be used as an add-on with any baseline cross-modal algorithm. The second approach estimates the labels of the unlabeled data using a nearest-neighbor strategy, and then trains a network with skip connections to predict the true labels.
• In the fourth work, we investigate the cross-modal problem in an incremental multiclass scenario, where new data may contain previously unseen categories. We propose a novel incremental cross-modal hashing algorithm that can adapt itself to handle incoming data of new categories. At every stage, a small amount of old-category data, termed exemplars, is used so that the old data is not forgotten while learning from the new incoming data.
• Finally, we investigate the effect of label corruption on cross-modal algorithms. We first study recently proposed training paradigms that focus on small-loss samples to build noise-resistant image classification models, and improve upon them using techniques such as self-supervision and relabeling of large-loss samples. Next, we extend this work to cross-modal retrieval under noisy data.
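As a rough illustration of how hashing-based cross-modal retrieval operates at query time, the sketch below maps a text query and a few image feature vectors to short binary codes and ranks the images by Hamming distance. It is a minimal sketch under stated assumptions, not the method proposed in the thesis: the projection matrices are hand-picked placeholders standing in for learned hash functions, and the names `sign_hash`, `hamming`, and `retrieve` are hypothetical.

```python
from typing import List, Sequence, Tuple


def sign_hash(features: Sequence[float], projection: Sequence[Sequence[float]]) -> int:
    """Map a feature vector to a binary code packed into an int.

    Bit k is 1 when the k-th projection row has a non-negative dot product
    with the features. The projection stands in for a learned hash function.
    """
    code = 0
    for row in projection:
        activation = sum(w * x for w, x in zip(row, features))
        code = (code << 1) | (1 if activation >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database_codes: Sequence[int], top_k: int) -> List[Tuple[int, int]]:
    """Return (index, distance) for the top_k database codes nearest to the query."""
    scored = [(idx, hamming(query_code, code)) for idx, code in enumerate(database_codes)]
    # Sort by distance, breaking ties by database index for determinism.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Assumption: toy, hand-picked projections (one per modality), not learned ones.
TEXT_PROJECTION = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.2]]
IMAGE_PROJECTION = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]

# Assumption: toy image features; real features would come from an image encoder.
image_features = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [1.0, -1.0, 2.0]]
image_codes = [sign_hash(f, IMAGE_PROJECTION) for f in image_features]

# A text query is hashed with the text-side projection, then compared
# against image codes in the shared Hamming space.
text_query = [1.0, 0.0]
query_code = sign_hash(text_query, TEXT_PROJECTION)
ranking = retrieve(query_code, image_codes, top_k=2)
print(ranking)
```

The point of the design is that each modality has its own hash function while both map into a common code space, so comparing a text query with image items reduces to cheap bitwise operations.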