Improved Generative Models for Zero-Shot Object Recognition
Abstract
Automatically recognizing objects from their images is an important research area in computer vision. In the conventional setting, test images are classified into one of the classes seen during training. In the real world, however, new categories arise dynamically; for example, new species of animals and plants continue to be discovered. To address this issue, zero-shot learning (ZSL) aims to recognize objects from categories that have not been encountered during training. If the system has no a priori knowledge of whether the input belongs to a seen or an unseen class, the problem is even more challenging and is referred to as generalized zero-shot learning (GZSL). Existing methods for ZSL and GZSL use higher-level class descriptions (attributes) of both seen and unseen classes to bridge the knowledge gap of the classifiers. Recently, generative models have achieved impressive performance on this problem by generating synthetic visual features from the attributes of unseen classes, thus transforming ZSL/GZSL into a traditional supervised classification task. However, this approach has some limitations, and this thesis aims to address two of them. Specifically, this thesis makes two main contributions, as described below.
First, a new integrated classifier is proposed. It has two main advantages over standard classification using the generated features: (i) the classifier is trained simultaneously with the generator, which eliminates the need for an additional classifier for new object categories, and (ii) since the classifier is learned alongside the generation process, it facilitates the generation of more discriminative and useful features. Extensive experiments on four standard ZSL datasets show the effectiveness of the proposed approach. We also show that the proposed integrated classifier is general and can be used with any existing generative model, such as a CVAE, to improve its performance. The number of images that need to be generated is also small.
The second contribution is a novel bidirectional generative model for generating data for the unseen classes. Generative models have achieved impressive performance by learning the mapping from the attribute space to the feature space. Here, we propose to derive semantic inferences from images and use them for generation, which enables us to capture bidirectional information, i.e., both the visual-to-semantic and the semantic-to-visual mappings. Specifically, we propose a Semantic Embedding module, which not only provides image-specific semantic information to the generative model so that it generates better features, but also ensures that the generated features can be mapped back to the correct semantic space. We combine it with both the standard softmax classifier and the proposed integrated classifier and analyze their performance. Extensive evaluation on several datasets shows the effectiveness of the proposed framework.
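As a rough illustration of the ideas summarized above, the following toy sketch generates synthetic features for unseen classes from their attributes, maps them back to the semantic space, and then treats recognition as ordinary supervised classification. It is not the model proposed in this thesis: the linear generator `G`, the linear embedding `E`, the two-dimensional attributes, the class names, and the nearest-centroid classifier are all hypothetical stand-ins chosen so that the example runs with the Python standard library alone.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def generate(attribute, gen_weights, rng, noise=0.05):
    """Semantic -> visual: toy linear generator plus Gaussian noise (assumption)."""
    return [x + rng.gauss(0.0, noise) for x in matvec(gen_weights, attribute)]

def embed(feature, emb_weights):
    """Visual -> semantic: toy linear semantic embedding (assumption)."""
    return matvec(emb_weights, feature)

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical attribute vectors for two unseen classes.
unseen = {"zebra": [1.0, 0.0], "dolphin": [0.0, 1.0]}

# Hypothetical weights; in practice both maps would be learned on seen classes.
G = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # attributes (2-D) -> features (3-D)
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # features (3-D) -> attributes (2-D)

rng = random.Random(0)

# Step 1: synthesize features for each unseen class from its attributes.
synthetic = {
    name: [generate(attr, G, rng) for _ in range(20)]
    for name, attr in unseen.items()
}

# Step 2: check that generated features map back near their source attributes.
cycle_error = max(
    sq_dist(embed(feature, E), unseen[name])
    for name, features in synthetic.items()
    for feature in features
)

# Step 3: fit a supervised classifier (here, nearest centroid) on synthetic data.
centroids = {name: centroid(features) for name, features in synthetic.items()}

def classify(feature):
    """Return the unseen class whose synthetic centroid is closest."""
    return min(centroids, key=lambda name: sq_dist(feature, centroids[name]))

prediction = classify([0.9, 0.1, 0.4])
print(prediction, cycle_error)
```

In this toy setup the embedding is an exact left inverse of the generator, so the cycle error reflects only the injected noise; with learned networks it would instead serve as a training signal or a diagnostic.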