Leveraging KG Embeddings for Knowledge Graph Question Answering
Abstract
Knowledge graphs (KGs) are multi-relational graphs consisting of entities as nodes and relations
among them as typed edges. The goal of knowledge graph question answering (KGQA) is to
answer natural language queries posed over the KG. These can be simple factoid questions
such as “What is the currency of USA?” or more complex queries such as “Who
was the president of USA after World War II?”. Multiple systems have been proposed in the
literature to perform KGQA, including question decomposition, semantic parsing, and even graph
neural network-based methods.
In a separate line of research, KG embedding methods (KGEs) have been proposed to
embed the entities and relations of a KG in a low-dimensional vector space. These methods
aim to learn representations that can then be utilized by various scoring functions to predict the
plausibility of triples (facts) in the KG. Applications of KG embeddings include link prediction
and KG completion. Such KG embedding methods, even though highly relevant, have so far not
been explored for KGQA.
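To make the scoring-function idea concrete, the following is a minimal sketch of the ComplEx scoring function, which represents each entity and relation as a complex-valued vector and scores a triple (h, r, t) as Re(<h, r, conj(t)>); the embedding layout (real parts followed by imaginary parts in one tensor) is an implementation choice for this sketch, not part of the method.

    import torch

    def complex_score(h, r, t):
        # ComplEx plausibility Re(<h, r, conj(t)>) of a triple (h, r, t).
        # Each embedding has size 2d: d real parts, then d imaginary parts.
        h_re, h_im = h.chunk(2, dim=-1)
        r_re, r_im = r.chunk(2, dim=-1)
        t_re, t_im = t.chunk(2, dim=-1)
        return (h_re * r_re * t_re
                + h_im * r_re * t_im
                + h_re * r_im * t_im
                - h_im * r_im * t_re).sum(dim=-1)

A higher score indicates a more plausible fact; link prediction ranks all candidate tail entities t by this score.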
In this work, we focus on two aspects of KGQA: (i) temporal reasoning, and (ii) KG incompleteness. We leverage recent advances in KG embeddings to improve model reasoning in
the temporal domain, and exploit the robustness of embeddings to KG sparsity to improve
question answering performance over incomplete KGs. We do this through the following contributions:
Improving Multi-Hop KGQA using KG Embeddings
We first tackle a subset of KGQA queries: multi-hop KGQA. We propose EmbedKGQA, a
method that uses ComplEx embeddings and the ComplEx scoring function to answer these queries. We find
that EmbedKGQA is particularly effective at KGQA over sparse KGs, and it also relaxes the
requirement of selecting answers from a pre-specified local neighborhood, an undesirable constraint imposed by GNN-based methods for this task. Experiments show that EmbedKGQA is superior
to several GNN-based methods on incomplete KGs across a variety of dataset scales.
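Schematically, EmbedKGQA embeds the question with a language model, projects it into the same complex space as the entities, and treats the question embedding as the relation in the ComplEx score; the answer is the highest-scoring entity. The sketch below assumes precomputed embeddings and omits the question encoder; all names are illustrative.

    import torch

    def rank_answers(head, q_emb, ent_matrix):
        # head: (2d,) ComplEx embedding of the question's topic entity.
        # q_emb: (2d,) question embedding projected into ComplEx space.
        # ent_matrix: (N, 2d) embeddings of all N entities in the KG.
        h_re, h_im = head.chunk(2, dim=-1)
        q_re, q_im = q_emb.chunk(2, dim=-1)
        e_re, e_im = ent_matrix.chunk(2, dim=-1)
        # complex product h * q, then Re(<h * q, conj(e)>) for every e
        hq_re = h_re * q_re - h_im * q_im
        hq_im = h_re * q_im + h_im * q_re
        return e_re @ hq_re + e_im @ hq_im  # (N,) scores

    # answer = rank_answers(head, q_emb, ent_matrix).argmax()

Because every entity in the KG is scored, the answer need not lie in a small neighborhood of the topic entity.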
Question Answering over Temporal Knowledge Graphs
We then extend our method to temporal knowledge graphs (TKGs), where each edge in the KG
is accompanied by a time scope (i.e., start and end times). Here, instead of KGEs, we use
temporal KGEs (TKGEs), which enable the model to exploit these time annotations and
perform temporal reasoning. We also propose a new dataset, CronQuestions, which is one of
the largest publicly available temporal KGQA datasets, with over 400k template-based temporal
reasoning questions. Through extensive experiments, we show the superiority of our method,
CronKGQA, over several language-model baselines on the challenging task of temporal KGQA
on CronQuestions.
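In TKGE scoring functions such as TComplEx, the timestamp receives its own complex embedding, which modulates the relation before the usual ComplEx product. A minimal sketch, using the same illustrative real/imaginary layout as above:

    import torch

    def tcomplex_score(h, r, t, tau):
        # Re(<h, r * tau, conj(t)>): tau is a complex embedding of the
        # timestamp that modulates the relation embedding r.
        h_re, h_im = h.chunk(2, dim=-1)
        r_re, r_im = r.chunk(2, dim=-1)
        t_re, t_im = t.chunk(2, dim=-1)
        s_re, s_im = tau.chunk(2, dim=-1)
        rt_re = r_re * s_re - r_im * s_im  # complex product r * tau
        rt_im = r_re * s_im + r_im * s_re
        return (h_re * rt_re * t_re
                + h_im * rt_re * t_im
                + h_re * rt_im * t_im
                - h_im * rt_im * t_re).sum(dim=-1)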
Sequence-to-Sequence Knowledge Graph Completion and Question Answering
So far, integrating KGEs into the KGQA pipeline has required separate training of the KGE
and KGQA modules. In this work, we show that an off-the-shelf encoder-decoder Transformer
model can serve as a scalable and versatile KGE model, obtaining state-of-the-art results for
KG link prediction and incomplete KG question answering. We achieve this by posing KG link
prediction as a sequence-to-sequence task and exchanging the triple-scoring approach taken by
prior KGE methods for autoregressive decoding. This simple but powerful method reduces
model size by up to 98% compared to conventional KGE models while keeping inference time
tractable. It also allows us to answer a variety of KGQA queries without being restricted by
query type.
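As a sketch of the interface, link prediction as sequence-to-sequence looks roughly as follows with a HuggingFace encoder-decoder; the checkpoint name and the “predict tail: subject | relation” verbalization are hypothetical, and a model trained on verbalized KG triples (rather than an off-the-shelf checkpoint) is assumed.

    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # Illustrative checkpoint; in practice the model is trained on
    # verbalized triples so it knows the KG's entity names.
    tok = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # A link-prediction query verbalized as text (hypothetical format).
    query = "predict tail: barack obama | position held"
    inputs = tok(query, return_tensors="pt")
    out = model.generate(**inputs, num_beams=5, max_new_tokens=16)
    print(tok.decode(out[0], skip_special_tokens=True))

Since the answer is decoded token by token rather than selected by scoring every entity, the parameter count no longer grows with a per-entity embedding table, which is where much of the size reduction comes from.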