Gaussian processes for learning problems with related outputs
Abstract
Gaussian processes (GPs) have attracted much attention in the machine learning community due to their good generalization performance and principled handling of predictive uncertainty. GPs provide a Bayesian non-parametric approach to learning in kernel machines. In this work, we apply GPs to three learning problems with related outputs: ordinal regression, structured prediction, and multi-task learning.
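As background, the GP regression posterior underlying all three problem settings can be sketched in a few lines. The following is a minimal illustration of the standard textbook predictive equations with a squared-exponential kernel; the kernel settings and data are purely illustrative, not taken from this work:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: k(a, b) = variance * exp(-||a - b||^2 / (2 l^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-d2 / (2.0 * lengthscale ** 2))

def gp_posterior(X, y, Xs, noise=1e-2):
    # Standard GP regression posterior:
    #   mean = K_*^T (K + s^2 I)^{-1} y
    #   var  = K_** - K_*^T (K + s^2 I)^{-1} K_*
    # computed stably via a Cholesky factorization of (K + s^2 I).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kss = rbf_kernel(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss - v.T @ v)
    return mean, var

# Illustrative data: the posterior mean interpolates noisy training targets.
X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()
mean, var = gp_posterior(X, y, X)
```

With a small noise variance, the predictive mean at the training inputs stays close to the targets while the predictive variance there shrinks toward the noise level.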
The ordinal regression problem arises in situations where examples are rated on an ordinal scale. State-of-the-art Gaussian process ordinal regression approaches perform model selection using an approximate measure based on the marginal likelihood. We provide an approach that instead uses an exact measure based on leave-one-out cross-validation for model selection. Both of these approaches rely on approximate inference techniques at prediction time. We then propose a new approach that performs ordinal regression with GPs in a simple and exact way: it avoids complex approximate inference techniques and yields a valid probability distribution over the ordinal outputs. We also propose a sparse GP model for ordinal regression that reduces training time, inference time, and storage requirements. In many real-world applications, labeled ordinal data are scarce while unlabeled data are abundant; we therefore propose a novel semi-supervised learning approach for ordinal regression with GPs based on the idea of distribution matching. Numerical experiments on ordinal datasets demonstrate the effectiveness of the proposed approaches.
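A probability distribution over ordinal outputs is commonly built by slicing a latent function value through a set of ordered thresholds, as in cumulative-link ordinal likelihoods for GPs. The sketch below illustrates that construction only; the threshold values are made up, and this is not necessarily the exact likelihood used in this work:

```python
import math

def std_norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probs(f, thresholds, sigma=1.0):
    # Cumulative-link ordinal likelihood:
    #   P(y = r | f) = Phi((b_r - f)/sigma) - Phi((b_{r-1} - f)/sigma)
    # with implicit boundary thresholds b_0 = -inf and b_R = +inf,
    # so the probabilities over the R categories sum to one.
    b = [-math.inf] + list(thresholds) + [math.inf]
    return [std_norm_cdf((b[r + 1] - f) / sigma) - std_norm_cdf((b[r] - f) / sigma)
            for r in range(len(b) - 1)]

# Illustrative: 4 ordinal categories cut at thresholds -1, 0, 1;
# a latent value f = 0.3 falls in the third category.
p = ordinal_probs(0.3, thresholds=[-1.0, 0.0, 1.0])
```

Because adjacent categories share thresholds, the ordering of the labels is encoded directly in the likelihood rather than treated as unstructured multi-class output.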
The structured prediction problem arises in applications such as natural language processing and computer vision, where the output consists of several inter-related components. We propose a GP-based approach built on a pseudo-likelihood model. The model can efficiently capture long-range dependencies among the output components and also handles datasets with partially missing labels. We present efficient training and inference algorithms for the proposed model. Capturing long-range dependencies yields better generalization performance, as the experimental results show.
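A pseudo-likelihood replaces the joint distribution over all output components by a product of per-node conditionals given the neighbouring labels, so each factor is cheap to normalize. The sketch below shows that general idea on a binary chain with a simple agreement coupling; the scoring model is illustrative and not the model proposed in this work:

```python
import math

def pseudo_log_likelihood(y, unary, w):
    # y: labels in {0, 1}; unary[i][k]: score of label k at node i;
    # w: coupling that rewards equal labels on neighbouring chain nodes.
    # Returns sum_i log P(y_i | y_{N(i)}), each conditional normalized
    # over the two labels at node i only (no global partition function).
    n = len(y)
    ll = 0.0
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        scores = [unary[i][k] + sum(w * (k == y[j]) for j in nbrs)
                  for k in (0, 1)]
        log_z = math.log(sum(math.exp(s) for s in scores))
        ll += scores[y[i]] - log_z
    return ll

# Illustrative: with uniform unary scores and a positive coupling,
# an all-equal labeling scores higher than one with a flipped label.
pll_same = pseudo_log_likelihood([0, 0, 0], [[0.0, 0.0]] * 3, w=1.0)
pll_diff = pseudo_log_likelihood([0, 1, 0], [[0.0, 0.0]] * 3, w=1.0)
```

Because each conditional needs only a node's neighbourhood, extending the neighbourhood to distant nodes (to capture long-range dependencies) keeps normalization tractable, and nodes with missing labels can simply be dropped from the product.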
Multi-task learning solves multiple related learning problems jointly by sharing some common structure for improved generalization performance. We propose a flexible GP-based approach that captures task similarity by selecting the features common to all tasks. This is achieved with a hierarchical model that places a multi-Laplacian prior over the hyper-parameters. By identifying the relevant features shared across tasks, the proposed approach exhibits better generalization performance than other GP-based approaches on real-world datasets.
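The effect of selecting features common to all tasks can be illustrated with a simple non-Bayesian stand-in: a row-wise group-sparsity penalty on a features-by-tasks weight matrix zeroes out whole features for every task at once. This is only an illustration of shared feature selection, not the hierarchical multi-Laplacian model proposed above; all data and settings are made up:

```python
import numpy as np

def multitask_group_lasso(X_list, y_list, lam=0.1, lr=0.1, iters=500):
    # W: (d, T) weights, one column per task. The penalty
    # lam * sum_d ||W[d, :]||_2 couples the tasks: a feature is either
    # kept (nonzero row) or discarded (zero row) for all tasks jointly.
    # Solved by proximal gradient descent on the summed squared losses.
    d, T = X_list[0].shape[1], len(X_list)
    W = np.zeros((d, T))
    for _ in range(iters):
        G = np.zeros_like(W)
        for t in range(T):
            r = X_list[t] @ W[:, t] - y_list[t]
            G[:, t] = X_list[t].T @ r / len(y_list[t])
        W -= lr * G
        # Proximal step for the row-wise group penalty (block soft-threshold).
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W *= np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    return W

# Illustrative: two tasks that both depend only on feature 0.
rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((100, 3)), rng.standard_normal((100, 3))
y1 = X1[:, 0] + 0.1 * rng.standard_normal(100)
y2 = 0.8 * X2[:, 0] + 0.1 * rng.standard_normal(100)
W = multitask_group_lasso([X1, X2], [y1, y2])
```

The recovered weight matrix keeps a large row for the shared relevant feature while the rows for irrelevant features are driven toward zero for both tasks simultaneously, which is the behaviour the shared-sparsity prior is designed to induce.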

