Graph Neural Networks with Parallel Local Neighborhood Aggregations
Abstract
Graph neural networks (GNNs) have become popular in recent years for processing and analyzing graph-structured data. Built from message-passing layers that aggregate information from local neighborhoods, GNN architectures learn low-dimensional graph-level or node-level embeddings that are useful for a range of downstream machine learning tasks. In this thesis, we focus on GNN architectures that perform parallel neighborhood aggregations (PA-GNNs, for short) for two tasks, namely, graph classification and link prediction. Such architectures have the natural advantage of reduced training and inference time because neighborhood aggregation is done before training. In contrast, GNNs that perform neighborhood aggregation sequentially during training (referred to as SA-GNNs) have a runtime that depends on the number of edges in the graph.
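To illustrate what "aggregation before training" can look like, the following is a minimal sketch and not the architecture proposed in this thesis. It assumes plain sum aggregation over an adjacency list and precomputes the features X, AX, ..., A^K X once, so that a downstream model could be trained on these fixed inputs without touching the graph again. The function names `aggregate_once` and `precompute_hops` are hypothetical.

```python
def aggregate_once(neighbors, feats):
    """One round of sum aggregation: each node sums its neighbors' feature vectors."""
    dim = len(feats[0])
    out = []
    for nbrs in neighbors:
        acc = [0.0] * dim
        for j in nbrs:
            for d in range(dim):
                acc[d] += feats[j][d]
        out.append(acc)
    return out


def precompute_hops(neighbors, feats, num_hops):
    """Return [X, AX, ..., A^K X], computed once before any training starts.

    Assumption: sum aggregation with an unnormalized adjacency matrix A.
    """
    hops = [[[float(x) for x in row] for row in feats]]
    for _ in range(num_hops):
        hops.append(aggregate_once(neighbors, hops[-1]))
    return hops


# Path graph 0 - 1 - 2 with one scalar feature per node.
neighbors = [[1], [0, 2], [1]]
feats = [[1.0], [2.0], [3.0]]
hops = precompute_hops(neighbors, feats, num_hops=2)
print(hops[1])  # one-hop sums: [[2.0], [4.0], [2.0]]
print(hops[2])  # two-hop sums: [[4.0], [4.0], [4.0]]
```

In this sketch the cost that depends on the number of edges is paid once in `precompute_hops`. A sequential scheme would instead repeat the aggregation inside every training step.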
In the first part of the thesis, we propose a generic model for GNNs with parallel neighborhood aggregation and theoretically characterize the power of PA-GNN models to discriminate between two non-isomorphic graphs. We provide conditions under which they are provably as powerful as the well-known Weisfeiler-Lehman graph isomorphism test. We then specialize the generic PA-GNN model and propose a GNN architecture, which we call SPIN: simple and parallel graph isomorphism network. In addition to the theoretical characterization of the developed PA-GNN model, we present numerical experiments on a variety of real-world benchmark graph classification datasets related to social networks, chemical molecular data, and brain networks. The proposed model achieves state-of-the-art performance with reduced training and inference time.
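For readers unfamiliar with the Weisfeiler-Lehman test used as the reference point above, the following is a minimal sketch of the standard one-dimensional color refinement procedure. It is not the thesis's PA-GNN model. It assumes unlabeled graphs given as adjacency lists, and the function name `wl_colors` is hypothetical. Two graphs whose final color histograms differ are certainly non-isomorphic. Equal histograms are inconclusive.

```python
from collections import Counter


def wl_colors(neighbors, num_iters):
    """Histogram of node colors after num_iters rounds of 1-WL refinement.

    A node's new color is the pair (own color, sorted multiset of neighbor
    colors). Colors are nested tuples, so they are comparable across graphs
    without a shared relabeling table. Suitable for small examples only,
    since the tuples grow with every iteration.
    """
    colors = [()] * len(neighbors)  # every node starts with the same color
    for _ in range(num_iters):
        colors = [
            (colors[v], tuple(sorted(colors[u] for u in neighbors[v])))
            for v in range(len(neighbors))
        ]
    return Counter(colors)


# A triangle and a 3-node path are told apart after one round (degrees differ).
triangle = [[1, 2], [0, 2], [0, 1]]
path = [[1], [0, 2], [1]]
print(wl_colors(triangle, 1) != wl_colors(path, 1))  # True

# A 6-cycle and two disjoint triangles are both 2-regular: 1-WL cannot separate them.
hexagon = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
two_triangles = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
print(wl_colors(hexagon, 3) == wl_colors(two_triangles, 3))  # True
```

The second example shows the known limit of the test: a model that is as powerful as this procedure also inherits its inability to separate such regular graphs.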
In the second part of the thesis, we propose a computational method for drug repurposing by presenting a deep learning model that captures the complex interactions between drugs, diseases, genes, and anatomies in a large-scale interactome with over 1.4 million connections. Specifically, we propose a PA-GNN-based drug repurposing architecture, which we call GDRnet, to screen a large database of clinically approved drugs and predict potential drugs that can be repurposed for novel diseases. GNN-based machine learning models are a natural choice for computational drug repurposing because of their ability to capture the underlying structural information in such complex biological networks. The proposed PA-GNN architecture is computationally attractive, and we present results from numerical experiments that show the efficacy of GNNs for drug repurposing. We also provide numerical experimental results on drug repurposing for coronavirus diseases (including COVID-19), where many of the drugs predicted by the proposed model are considered mainstay treatments.