Sampling, Recovery, and Learning Methods for Graphs and Simplicial Complexes
Abstract
Many datasets are often supported by irregular non-Euclidean domains. Graph signal processing (GSP) and topological signal processing (TSP) enables effective processing of such data by modeling interactions among data points using pairwise relationships (as edges) in graphs and higher-order relationships (as simplicies) in simplicial complexes. In practice, we encounter large-scale high-dimensional data. Thus, for effective inference, pre-processing methods like sampling (selecting a representative subset), dimensionality reduction, and representation learning become essential. This thesis presents several results addressing challenges in sampling and representation learning over graphs and simplicial complexes. In the first part of the thesis, we focus on sampling and recovery problems on graphs and simplicial complexes, wherein the main goal is to recover non-bandlimited localized signals on graphs, multi-order signals on simplicial complexes and signals over product cell structures. In the second part of the thesis, we focus on representation learning, wherein we develop algebraic and deep learning methods that leverage the underlying topological structure.
We first consider the problem of localizing a sparse non-bandlimited source that induces diffusive fields on graphs from subsampled observations. Specifically, we model diffusive fields on graphs using first-order heat diffusion equation driven by an initial field and an external time-invariant input. When the field induced on the graph is only due to the initial or external input, we develop a method to recover the underlying sources using a simple least squares estimator without imposing any band-limiting or sparsity constraints. When the field induced is due to the presence of both external and initial fields, we develop a least squares estimator by constraining only one of the sources as bandlimited. Next, we consider sampling and recovery on simplicial complexes. In particular, we focus on recovering multi-order bandlimited simplicial signals of one order higher and one order lower by subsampling a simplicial signal of a certain order. To do so, we propose an aggregation sampling scheme for simplicial complexes based on the Hodge Laplacian matrix. Then we discuss a method for recovery using simple least squares estimator. We present theoretical conditions on the number of aggregations and size of the sampling set required for faithful reconstruction as a function of the bandwidth of simplicial signals to be recovered. Lastly, we consider recovery of signals over product cell structures. In particular, we develop a method to recover the edge and node signals on product cell complexes in a computational efficient way. To do so, we consider the product cell complex as a topological structure formed by the Cartesian product of two simplicial complexes and then focus on the problem of recovering the edge signals and node signals on the product complex by subsampling signals on factor simplicial complexes. We present a simple least squares solution for estimating the edge and node signals on the product complexes. Experiments on diversified datasets demonstrates that the proposed techniques are simple and are effective in recovering sparse graph signals, multi-order simplicial signals, and signals over product cell structures.
In the second part of the thesis, we first consider the problem of multi-view dimensionality reduction for tensor data on graphs. In particular, we develop multi-view dimensionality reduction techniques for tensor data, namely, canonical correlation analysis and multi-view canonical correlation analysis on graphs to preserve the intrinsic structure in data and effectively leverage the graph structure in the common source that generates the multi-view data. A key limitation of directly applying traditional vectorization-based canonical correlation analysis algorithms to tensor data is that it destroys the inherent structural properties in data and often results in very high-dimensional representations leading to the curse of dimensionality. To begin with, we discuss tensor based canonical correlation analysis on graphs for two-view data leveraging the underlying graph structure in the common source (aka the projected low-dimensional data). Specifically, to promote the smoothness of projected low-dimensional data on graphs, we discuss a tensor based graph regularizer. Then, for multi-view data with more than two-views we discuss multi-view tensor canonical correlation analysis on graphs.
Further, to effectively capture higher-order interactions while clustering, we next focus on clustering on simplicial complexes. Specifically, we develop a spectral clustering algorithm to cluster the nodes in a simplicial complex using 2-simplices (or, filled triangles). To do so, we introduce the simplicial Cheeger inequality that relates the conductance to the second smallest eigenvalue of the simplicial Laplacian operator, which captures the relationships between nodes through 2-simplices. Leveraging this connection, we proposed a spectral clustering algorithm for simplicial complexes.
Finally, we present a deep representation learning model with simplicial complexes as an inductive bias. Specifically, we develop simplicial neural networks for learning representations of simplices in a simplicial complexes. In particular, we develop a simplicial ARMA filter based neural networks for simplicial complexes. To begin with, we first introduce simplicial ARMA filters whose frequency responses are rational and are capable of modeling a wide range of frequency responses for simplicial complexes. Leveraging this, we then develop the neural network architecture for obtaining the embeddings. Extensive experiments on downstream machine learning tasks, such as classification, clustering, and fake news detection, prove the efficacy of the proposed algebraic based representation learning algorithms in preserving intrinsic data structure and graph structure. Similarly, experiments on simplicial-level tasks, such as trajectory prediction and simplicial closure, demonstrate the effectiveness of deep representation learning algorithms in modeling a wide range of frequency responses.