Representing Networks: Centrality, Node Embeddings, Community Outliers and Graph Representation
Abstract
Networks are ubiquitous. We start our technical work in this thesis by exploring the classical concept of node centrality (also known as influence measure) in information networks. Like clustering, node centrality is also an ill-posed problem. There exist several heuristics and algorithms to compute the centrality of a node in a graph, but there is no formal definition of centrality available in the network science literature. Lately, researchers have proposed axiomatic frameworks for the centrality of a node in a network. However, these existing formal frameworks are not generic in nature in terms of characterizing the space of influence measures in complex networks. In this work, we propose a set of six axioms in order to capture most of the intrinsic properties that any influence measure ideally should satisfy. We also characterize existing measures of centrality with respect to this framework.
Next, we focus more on the representation learning on networks. Network embedding is required as real life networks are large, extremely sparse and discrete in nature. We investigate the problem of unsupervised node representation in attributed networks through informative random walk. Edges are also useful for various downstream network mining tasks, but most of the existing homogeneous network representation learning approaches focus on embedding the nodes of a graph. So, we propose a novel unsupervised algorithm to embed the edges of a network, through the application of the classical concept of line graph of a network. The optimization framework of edge embedding connects to the concept of node centrality in the representation learning framework. Also, we conduct research on attributed hypergraphs. We propose a novel hypergraph neural network to represent and classify hypernodes.
Outlier analysis is another important problem in network science. All the real-world networks contain outlier nodes to some extent. Empirically we have shown that outliers can affect the quality of network embedding if not handled properly. So, we integrate the process of network embedding and outlier detection into a single framework. In this research thread, we first propose a matrix factorization based approach which minimizes the effect of outlier nodes in the framework of attributed network embedding. Next, we propose two neural network architectures, based on L2 regularization and adversarial training respectively, to minimize the effect of outliers on node embedding of an attributed network. Further, extending the concept of support vector data description, we propose a novel algorithm which integrates node embedding, community detection and outlier detection into a single optimization framework by exploiting the link structure of a graph.
In the last part of the thesis, we focus on graph level representation and tasks. First, we propose a supervised graph neural network based algorithm with hierarchical pooling strategy to classify a graph from a set of graphs. Next, we propose a novel GNN based algorithm for the unsupervised representation of a graph from a set of graphs, so that similar graphs are represented closely in the embedding space and dissimilar graphs are separated away.