Optimization of Traversal Queries on Distributed Graph Stores

Sharma, Abhilash

dc.contributor.advisor	Simmhan, Yogesh
dc.contributor.author	Sharma, Abhilash
dc.date.accessioned	2021-10-06T09:52:09Z
dc.date.available	2021-10-06T09:52:09Z
dc.date.submitted	2018
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5397
dc.description.abstract	In this era of Big Data Analytics, much of the semi-structured data has inherent interconnectivity between representative entities. These are increasingly being modeled as property graphs because of the semantic advantages of the associations/edges formed between entities/vertices and their attributes that helps with their analysis. These graphs scale to billions of edges thereby requiring distributed frameworks. Giraph, GoFFish, Blogel are some of the distributed frameworks for graph analytics. These are generally used for long-running analytical queries that may take minutes or hours, while short term interactive querying requires low-latency O(seconds) responses. In particular, traversal queries over a property graph is used to retrieve paths or neighborhoods from distributed property graph stores rapidly, for interactive analysis. Horton+, Titan, and GoDB are some of the distributed frameworks for traversal queries. Efficient execution of traversal queries presents a challenge because of the unstructured and irregular access patterns of property graphs. This requires strategies to reduce the number of vertices and edges to visit and evaluated for a predicate speci fied in the query, while also reducing the network overheads for executing across parts of the graph present on different machines in a distributed setup. GoDB proposed a heuristics-based cost model that uses graph statistics to estimate the number of elements to be visited and the network overhead for each plan and selects the optimal plan. However, the large sizes of the in-memory Java objects used to hold the graph entities leads to the use of a large number of distributed machines, which in turn increases the network overhead. We leverage recent work on compressed data structures like Succinct to model the subgraph data in GoDBX to sharply reduce the memory usage and hence network latency across machines, while minimizing the increase in computational latency. We extend GoDB's cost model to operate on both compressed and uncompressed forms of real world property graphs, and o er detailed performance experiments to evaluate the scalability of GoDBX and GoDB, relative to Titan, for graphs with up to billions of edges	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;G29301
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	graph analytics	en_US
dc.subject	GoDBX	en_US
dc.subject	traversal queries	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Computer science	en_US
dc.title	Optimization of Traversal Queries on Distributed Graph Stores	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name:: G29301.pdf
Size:: 3.141Mb
Format:: PDF
Description:: Thesis full text

View/Open

This item appears in the following Collection(s)

Department of Computational and Data Sciences (CDS) [102]

Show simple item record