
    Going Beyond Pairwise Relationships: Representation Learning on Simplicial Complexes

    Thesis full text (19.09 MB)
    Author
    Gurugubelli, Sravanthi
    Abstract
    Relational data appear naturally in many real-world applications and can be modeled using mathematical structures known as graphs. Graphs model entities as nodes and their pairwise relations or interactions as edges. For example, a social network may be represented as a graph where users are nodes and their connections are edges. Learning node representations allows us to perform tasks such as classification and regression on graph data, like suggesting potential friends on a social network. However, interactions often extend beyond pairs, like groups forming on social networking platforms. These higher-order interactions can be represented using simplicial complexes, which are higher-order generalizations of graphs. A simplex of order k represents interactions between k+1 entities, with simplex neighborhoods defined by boundary, coboundary, upper, and lower adjacency operators. Due to the non-Euclidean nature of such data, traditional machine learning models cannot be readily used. Therefore, to perform tasks like classification and regression on data defined on simplicial complexes, we need to learn simplex representations. This process, known as simplicial representation learning, involves embedding simplices into Euclidean space while preserving the inherent structural information of the simplicial complex. One of the popular approaches to learning simplex embeddings is through simplicial neural networks (SNNs). SNNs are neural network models that capture structure-aware embeddings of simplices through sequential layers of non-linear parametric functions, updating simplex embeddings via feature aggregation from neighboring simplices. However, SNNs are parameter-heavy, requiring large amounts of labeled training data, significant computational resources, and long training times. Furthermore, they lack interpretability, since they operate as black boxes that learn entirely from the provided data.
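The boundary and adjacency operators mentioned above can be made concrete with a tiny example. The sketch below (illustrative only, not the thesis's code) builds the boundary matrices of a complex with one filled triangle and checks the defining identity of boundary operators, B1 B2 = 0; the Hodge 1-Laplacian it assembles combines the lower and upper adjacencies of the edges:

```python
import numpy as np

# Toy complex: nodes {0,1,2}, edges (0,1),(0,2),(1,2), one filled
# triangle (0,1,2). Simplices oriented by increasing vertex order.

# B1: node-to-edge incidence (order-1 boundary operator)
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]])

# B2: edge-to-triangle incidence (order-2 boundary operator)
B2 = np.array([[ 1],
               [-1],
               [ 1]])

# Fundamental identity: the boundary of a boundary is zero.
assert np.allclose(B1 @ B2, 0)

# Hodge 1-Laplacian: lower adjacency (via B1) plus upper adjacency (via B2).
L1 = B1.T @ B1 + B2 @ B2.T
```

Operators like `L1` are what SNN layers repeatedly apply when aggregating features from neighboring simplices.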
In this thesis, we address the limitations of SNNs by introducing three simplicial representation learning models. Each model is designed to target a specific shortcoming of SNNs. Furthermore, when the underlying structure of the simplicial complex is unknown, existing SNNs often make oversimplified assumptions about the structure. They may only consider the available graph structure, neglecting higher-order interactions, or assume all graph cliques to be simplices. These assumptions can be overly simplistic and potentially lead to inaccurate simplicial embeddings. To address this, we propose a model for learning simplicial complexes from simplicial data and graph structure. The computational complexity of SNNs is primarily due to the need to aggregate features from different types of neighboring simplices during each training epoch. To overcome this issue, we introduce a computationally efficient neural network model for simplicial representation learning. Unlike traditional SNNs, the training time of this model does not increase exponentially with the size of the simplicial complex. Instead, it remains almost independent of both the size and order of the simplicial complex. This neural model, known as the simplicial-aware neural network (SaNN), operates by precomputing simplicial features prior to the training process. Despite its simplified structure, the model is designed to preserve the expressiveness of simplex embeddings. We theoretically analyze the conditions under which the embeddings of SaNN are as expressive as those of the most expressive existing SNN. Experimental results demonstrate that SaNN performs on par with existing SNNs but with significantly less runtime. For very large simplicial complexes, where existing SNNs run out of memory, SaNN continues to perform well due to its reduced complexity and fewer parameters. The performance of SNNs is compromised when large amounts of labeled data are not available.
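The key idea behind SaNN — precomputing aggregated simplicial features once, before training — can be sketched as follows. This is a minimal illustration under assumed placeholder adjacencies, not the thesis's implementation; the function name and shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 edges with 4-dimensional features, plus two
# placeholder (normalized) neighborhood operators for the edges.
X = rng.standard_normal((3, 4))       # edge features
A_low = np.eye(3)                     # placeholder lower adjacency
A_up = np.ones((3, 3)) / 3.0          # placeholder upper adjacency

def precompute_sann_features(X, ops, depth=2):
    """Aggregate features over each neighborhood operator *before* training
    and concatenate the multi-hop summaries. The trainable part is then
    just a small MLP on these fixed features, so per-epoch cost no longer
    depends on repeated neighborhood aggregation."""
    feats = [X]
    for A in ops:
        H = X
        for _ in range(depth):
            H = A @ H                 # k-hop aggregation, done once, offline
            feats.append(H)
    return np.concatenate(feats, axis=1)

Z = precompute_sann_features(X, [A_low, A_up])   # shape (3, 20)
```

Because `Z` is computed once, training time becomes nearly independent of the complex's size and order, matching the abstract's description.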
To address this, we introduce a simplicial scattering network (SSN) based on scattering transforms, providing an unsupervised approach to simplicial representation learning. Scattering transforms decompose a signal into coefficients representing different frequency scales, creating a multi-scale representation of the signal that captures both low-frequency and high-frequency information. We adapt scattering transforms to process signals defined over simplicial complexes. SSN employs lazy random walk matrices and a sequentially designed filter bank to capture different frequency components in the input data, generating task-agnostic, simplicial-aware representations. This approach overcomes the information loss during feature aggregation seen in both SNNs and SaNN, where the aggregation step acts as a low-pass filter, neglecting high-frequency components. SSN retains the essential properties of traditional scattering transforms, including invariance to permutations of simplices within simplicial complexes, non-expansiveness, and robustness to structural perturbations in simplicial complexes. We also demonstrate that leveraging higher-order simplices while learning representations improves the stability of output representations to perturbations in the simplicial complex. Through numerical experiments, we demonstrate the expressivity of the model, showing that it performs comparably to, or even outperforms, existing SNNs while maintaining computational efficiency. Despite the advantages of SaNN and SSN, these models are still not interpretable as a result of being neural network-based. To overcome this limitation, we introduce a Gaussian process (GP)-based model, which we refer to as the simplicial-aware GP (SaGP), for probabilistic simplicial representation learning.
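A scattering layer of the kind described can be sketched with a lazy random walk and dyadic wavelets. This is an assumed, simplified form for illustration (not the exact SSN filter bank): the wavelets P^(2^(j-1)) - P^(2^j) isolate frequency bands, and the pointwise modulus followed by averaging yields permutation-invariant coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy adjacency between three simplices and a signal on them.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
# Lazy random walk: P = (I + D^{-1} A) / 2
P = 0.5 * (np.eye(3) + A / A.sum(1, keepdims=True))
x = rng.standard_normal(3)

def scatter(x, P, num_scales=3):
    coeffs = [x.mean()]                  # low-pass summary
    for j in range(1, num_scales + 1):
        # Band-pass wavelet at dyadic scale j
        wav = np.linalg.matrix_power(P, 2 ** (j - 1)) \
            - np.linalg.matrix_power(P, 2 ** j)
        coeffs.append(np.abs(wav @ x).mean())   # modulus + averaging
    return np.array(coeffs)

c = scatter(x, P)                        # 4 multi-scale coefficients
```

Unlike pure low-pass aggregation, the band-pass wavelets retain high-frequency content, which is the information-loss point the abstract raises.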
SaGP captures higher-order relationships by modifying GPs defined over simplicial data to leverage local structural neighborhood information from all four types of adjacencies, with learnable weights that determine the importance of each kind of adjacency. Moreover, we extend SaGP to model latent functions over open triangles, ensuring that these latent functions are GPs and demonstrating the invariance of the associated kernel to the ordering of edges in the triangles. Theoretical evidence suggests that the inclusion of higher-order information and information from boundary and coboundary simplices improves the stability of the embeddings against structural perturbations. Experimental evaluations on various tasks using real-world datasets show that SaGP outperforms traditional machine learning, graph, and higher-order graph neural network models. The probabilistic nature of SaGP allows for modeling uncertainties in predictions. This enables better sampling of training data and results in good performance even with limited data. Implementing simplicial representation learning models typically requires knowledge of the underlying simplicial complex. However, in some situations, we may only have access to the attributes of simplices and the basic graph structure, not the higher-order interactions. We therefore conclude the thesis by proposing a method to learn a simplicial complex from simplex data and graph structure, with a specific focus on second-order simplicial complexes. We propose a probabilistic model to link edge flow signals with the underlying structure while accounting for the influence of data over nodes and triangles. Specifically, we introduce latent variables that represent node and triangle signals and assume these follow a prior distribution that encourages smooth transitions across interconnected nodes or triangles.
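One way to picture a structure-aware GP kernel with learnable adjacency weights is the sketch below. This is an assumed form for illustration only (the adjacencies, weights, and function name are placeholders, not the thesis's SaGP kernel): features are diffused through a weighted combination of adjacency operators before a standard RBF kernel is applied:

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.standard_normal((3, 2))               # edge features
A_types = [np.eye(3), np.ones((3, 3)) / 3.0]  # placeholder adjacency types
w = np.array([0.7, 0.3])                      # learnable adjacency weights

def simplicial_rbf_kernel(X, A_types, w, lengthscale=1.0):
    """RBF kernel on features diffused by a learnable-weighted sum of
    adjacency operators, so the covariance reflects simplicial structure."""
    S = sum(wi * Ai for wi, Ai in zip(w, A_types))
    Z = S @ X                                  # structure-aware diffusion
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * lengthscale ** 2))

K = simplicial_rbf_kernel(X, A_types, w)       # 3x3 covariance matrix
```

Because the model is a GP, the posterior over `K` yields predictive variances, which is what enables the uncertainty-aware training-data sampling the abstract mentions.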
Given the observed edge flow and the underlying graph structure, we formulate an optimization problem based on maximum a posteriori (MAP) estimation. This problem simultaneously infers the higher-order structure (specifically, the filled triangles) and the latent signals of nodes and triangles within the graph. Finally, the thesis closes with interesting directions for future research. These include representation learning for time-evolving simplicial complexes and updating GP predictions for such dynamic simplicial complexes. Future work could also focus on multi-modal simplicial representation learning and developing generative models for simplicial complexes, as well as on more sophisticated methods for simplicial complex learning that incorporate constraints to learn specific properties, such as multiple components or a core-periphery structure.
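The MAP flavor of such a formulation can be illustrated in a heavily simplified, nodes-only setting (an assumption for illustration, not the thesis's joint node/triangle model): a Gaussian smoothness prior p(f) ∝ exp(-λ fᵀLf) over a graph Laplacian L, combined with Gaussian observations y, gives the closed-form MAP estimate f* = (I + 2λL)⁻¹ y:

```python
import numpy as np

# Path graph on 3 nodes and its Laplacian.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A

y = np.array([1.0, 0.0, -1.0])   # noisy node observations
lam = 0.5                        # prior strength

# MAP estimate: argmin_f ||y - f||^2 + 2*lam * f^T L f
f_map = np.linalg.solve(np.eye(3) + 2 * lam * L, y)
```

The estimate trades data fidelity against smoothness across connected nodes, which is exactly the role the smoothness prior plays in the structure-learning model described above.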
    URI
    https://etd.iisc.ac.in/handle/2005/7043
    Collections
    • Electrical Communication Engineering (ECE) [406]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by Atmire NV