dc.description.abstract | In many applications, we observe large volumes of data supported on irregular (non-Euclidean) domains. In graph signal processing (GSP) and graph machine learning (GML), data is indexed using the nodes of a graph and interactions between the data points is modeled through its edges. That is, data is viewed as graph signals. GSP and GML methods leverage the underlying graph structure while processing data to extract meaningful information. Oftentimes, we gather multi-view or multi-domain data, which forms the central focus on the thesis. Multi-domain graph data have more than one irregular domain associated with it, whereas multi-view graph signals are complementary data acquired about the same source. One of the main challenges in processing such data using existing GSP or GML methods is high computational costs and huge memory requirements, which make them difficult when dealing with large-scale graph data. Therefore, this thesis aims at developing algorithms to process multi-domain and multi-view graph data with a focus to graph topology inference, data imputation, and clustering. Specifically, in this thesis, to process multi-domain graph signals, we leverage the so-called product graph structure to develop computationally efficient algorithms for graph learning, graph signal reconstruction when the node-to-signal correspondence information is not available, and Bayesian inference with active learning. Next, we focus on clustering multi-view graph data into their corresponding subspaces or manifolds, but from sketched (i.e., compressed) data.
We first focus on graph topology inference from multi-domain data. To do so, we model multi-domain data as a product graph signal formed by the Cartesian product of two small graph factors. Assuming the muti-domain graph signal is smooth on the underlying Cartesian product graph, we pose the graph learning problem as the problem of estimating its graph factors. Specifically, we estimate graph factors with sparsity and multi-component spectral constraints to capture the local interactions and clusters in the multi-domain data. While working with graph factors is computationally attractive, not all graphs can be expressed as the Cartesian product of two or more smaller graphs. So, we also focus on approximately factorizing a graph as the Cartesian product of its graph factors.
Next, we consider the graph signal reconstruction (GSR) problem from a subset of observations on a product graph. Specifically, we consider a case where the one-to-one mapping between the observed graph signals and the corresponding node indices is missing due to privacy or data gathering constraints. In other words, the node indices of the observed graph signals are missing, or equivalently, the observed graph signals are permuted, but with an unknown permutation. Given such data, the goal is to reconstruct the true product graph signal. When the underlying data permutation is ignored, the performance of the traditional GSR methods deteriorates as they assume that the node indices of the observed graph signals are known. Therefore, we develop a GSR method that jointly estimates the product graph signal and the underlying permutation. Furthermore, assuming that the product graph signal is bandlimited, we derive conditions for perfect recovery.
We then focus on Bayesian inference on product graphs. Specifically, given noisy product graph signals observed on a subset of nodes of the product graph, we aim to estimate the graph signals on the remaining nodes along with quantifying uncertainty of the estimate that we use for active learning, i.e., which samples or labels to aquire next having observed a few samples. To do so, we follow a Gaussian process (GP) framework, where we model the graph signal as an unknown function with a GP prior having an inductive bias about the underlying product graph, and propose an estimator as the mean of the posterior probability distribution function with its variance quantifying the uncertainty of the unobserved graph signals. We use the uncertainty in the estimate to perform active learning on the product graph.
In the last part of the thesis, we consider on multi-view subspace clustering (MvSC), where the goal is to cluster the multi-view data points into their corresponding subspaces or manifolds by combining the information from different views using compressed or sketched data. In this thesis, assuming that data lie on a union of linear subspaces or non-linear manifolds, we propose three MvSC algorithms and derive bounds for worst-case representation error incurred due to sketching. The proposed algorithms are scalable and incur less computational costs than the traditional MvSC methods. | en_US |