dc.contributor.advisor | Biswas, Soma | |
dc.contributor.author | Mandal, Devraj | |
dc.date.accessioned | 2020-11-19T07:35:35Z | |
dc.date.available | 2020-11-19T07:35:35Z | |
dc.date.submitted | 2020 | |
dc.identifier.uri | https://etd.iisc.ac.in/handle/2005/4685 | |
dc.description.abstract | The objective of cross-modal retrieval is to retrieve relevant items from one modality (say image), given a query from another modality (say textual document). Cross-modal retrieval has various applications like matching image-sketch, audio-visual, near infrared-RGB, etc. Different feature representations of the two modalities, absence of paired correspondences, etc. make this a very challenging problem. In this thesis, we have extensively looked at the cross-modal retrieval problem from different aspects and proposed methodologies to address them.
• In the first work, we propose a novel framework, which can work with unpaired data of the two modalities. The method has two steps, consisting of a hash code learning stage followed by a hash function learning stage. The method can also generate unified hash representations in a post-processing stage for even better performance. Finally, we investigate, formulate and address the cross-modal hashing problem in the presence of missing similarity information between the data items.
• In the second work, we investigate how to make cross-modal hashing algorithms scalable so that they can handle large amounts of training data, and propose two solutions. The first approach builds on a mini-batch realization of the previously formulated objective and the second is based on matrix factorization. We also investigate whether it is possible to build a hashing-based approach without the need to learn a hash function, as is typically done in the literature. Finally, we propose a strategy so that an already trained cross-modal approach can be adapted and updated to take into account the real-life scenario of an increasing label space, without retraining the entire model from scratch.
• In the third work, we explore semi-supervised approaches for cross-modal retrieval. We first propose a novel framework, which can predict the labels of the unlabeled data using complementary information from the different modalities. The framework can be used as an add-on with any baseline cross-modal algorithm. The second approach estimates the labels of the unlabeled data using a nearest-neighbor strategy, and then trains a network with skip connections to predict the true labels.
• In the fourth work, we investigate the cross-modal problem in an incremental multiclass scenario, where new data may contain previously unseen categories. We propose a novel incremental cross-modal hashing algorithm, which can adapt itself to handle incoming data of new categories. At every stage, a small amount of old-category data, termed exemplars, is used so as not to forget the old data while learning from the new incoming data.
• Finally, we investigate the effect of label corruption on cross-modal algorithms. We first study the recently proposed training paradigms, which focus on small-loss samples to build noise-resistant image classification models, and improve upon them using techniques like self-supervision and relabeling of large-loss samples. Next, we extend this work to cross-modal retrieval under noisy data. | en_US |
dc.language.iso | en_US | en_US |
dc.rights | I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation | en_US |
dc.subject | cross-modal retrieval | en_US |
dc.subject | hashing | en_US |
dc.subject | incremental learning | en_US |
dc.subject | learning with noisy labels | en_US |
dc.subject | semi-supervised learning | en_US |
dc.subject | Cross-Modal Retrieval and Hashing | en_US |
dc.subject | Scalability in Cross-Modal Retrieval | en_US |
dc.subject | Cross-Modal Retrieval under Noisy Labels | en_US |
dc.subject | Cross-Modal Retrieval under Incremental Multi-Class Setting | en_US |
dc.subject.classification | Research Subject Categories::TECHNOLOGY::Information technology::Computer science | en_US |
dc.title | Cross-Modal Retrieval and Hashing | en_US |
dc.type | Thesis | en_US |
dc.degree.name | PhD | en_US |
dc.degree.level | Doctoral | en_US |
dc.degree.grantor | Indian Institute of Science | en_US |
dc.degree.discipline | Engineering | en_US |