Multimodal Deep Learning for Multi-Label Classification and Ranking Problems
In recent years, deep neural network models have been shown to outperform many state-of-the-art algorithms. One reason is that unsupervised pretraining of multi-layered deep neural networks learns better features, which in turn improves performance on many supervised tasks. These models not only automate the feature extraction process but also provide robust features for various machine learning tasks. However, unsupervised pretraining and feature extraction with multi-layered networks are restricted to the input features and do not extend to the output. The performance of many supervised learning algorithms (or models) depends on how well they handle the output dependencies [Dembczyński et al., 2012]. Adapting standard neural networks to handle these output dependencies for a specific type of problem has been an active area of research [Zhang and Zhou, 2006, Ribeiro et al., 2012]. On the other hand, inference over multimodal data is considered a difficult problem in machine learning, and recently 'deep multimodal neural networks' have shown significant results [Ngiam et al., 2011, Srivastava and Salakhutdinov, 2012]. These models perform very well on several problems, such as classification with complete or missing modality data and generation of the missing modality. In this work, we consider three nontrivial supervised learning tasks, listed in order of increasing output complexity: (i) multi-class classification (MCC), (ii) multi-label classification (MLC) and (iii) label ranking (LR). Multi-class classification predicts one class for every instance, multi-label classification predicts more than one class for every instance, and label ranking assigns a rank to each label for every instance. Most prior work in this field centres on formulating new error functions that force the network to capture the output dependencies.
The aim of our work is to adapt neural networks to implicitly handle feature extraction (dependencies) for the output within the network structure, removing the need for hand-crafted error functions. We show that multimodal deep architectures can be adapted to these types of problems (or data) by treating the labels as one of the modalities. This also brings unsupervised pretraining to the output side, alongside the input. We show that these models not only outperform standard deep neural networks, but also outperform standard domain-specific adaptations of neural networks under various metrics over the several data sets we considered. We observe that the advantage of our models over the others grows as the complexity of the output (and hence of the problem) increases.
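The labels-as-a-modality idea can be illustrated with a minimal sketch: concatenate the feature vector and the label vector into one joint input, learn a shared hidden representation that reconstructs both modalities, and at prediction time impute the missing label modality with zeros and read the labels off the reconstruction. All layer sizes, the sigmoid activations, the squared-error objective, and the zero-imputation trick below are illustrative assumptions for exposition, not the exact architecture or training procedure of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class JointAutoencoder:
    """Toy two-modality autoencoder: modality 1 = features, modality 2 = labels."""

    def __init__(self, n_features, n_labels, n_hidden=16, lr=0.5):
        n_in = n_features + n_labels
        self.W_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.n_features = n_features
        self.lr = lr

    def _forward(self, v):
        h = sigmoid(v @ self.W_enc)   # shared representation over both modalities
        r = sigmoid(h @ self.W_dec)   # reconstruct features AND labels jointly
        return h, r

    def fit(self, X, Y, epochs=300):
        V = np.hstack([X, Y])         # labels enter the network as a modality
        for _ in range(epochs):
            h, r = self._forward(V)
            err = r - V
            # backprop of squared reconstruction error through two sigmoid layers
            d_r = err * r * (1 - r)
            d_h = (d_r @ self.W_dec.T) * h * (1 - h)
            self.W_dec -= self.lr * (h.T @ d_r) / len(V)
            self.W_enc -= self.lr * (V.T @ d_h) / len(V)

    def predict_labels(self, X):
        # At test time the label modality is missing: impute it with zeros,
        # reconstruct, and read the label block of the reconstruction.
        n_labels = self.W_enc.shape[0] - self.n_features
        V = np.hstack([X, np.zeros((len(X), n_labels))])
        _, r = self._forward(V)
        return r[:, self.n_features:]

# Hypothetical usage on synthetic multi-label data.
X = rng.random((50, 8))
Y = (X[:, :3] > 0.5).astype(float)    # 3 labels derived from the features
model = JointAutoencoder(n_features=8, n_labels=3)
model.fit(X, Y)
P = model.predict_labels(X)           # scores in [0, 1], one column per label
```

Because the shared hidden layer must reconstruct all labels at once, correlations between labels are captured in the representation itself rather than in a hand-crafted loss, which is the point the abstract makes.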
Showing items related by title, author, creator and subject.
Srinivas, Suraj (2018-05-22) Deep neural networks with millions of parameters are at the heart of many state of the art computer vision models. However, recent works have shown that models with much smaller number of parameters can often perform just ...
Mahale, Gopinath Vasanth (2017-10-31) Face recognition is a field of biometrics that deals with identification of subjects based on features present in the images of their faces. The factors that make face recognition popular and favorite as compared to other ...
Vijaya Kumar, M (2013-04-29) The present work focuses on the two areas of investigation: system identification of helicopter and design of controller for the helicopter. Helicopter system identification, the first subject of investigation in this ...