Show simple item record

dc.contributor.advisorSimmhan, Yogesh
dc.contributor.authorSharma, Abhilash
dc.date.accessioned2021-10-06T09:52:09Z
dc.date.available2021-10-06T09:52:09Z
dc.date.submitted2018
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/5397
dc.description.abstractIn this era of Big Data Analytics, much of the semi-structured data has inherent interconnectivity between representative entities. These are increasingly being modeled as property graphs because of the semantic advantages of the associations/edges formed between entities/vertices and their attributes that helps with their analysis. These graphs scale to billions of edges thereby requiring distributed frameworks. Giraph, GoFFish, Blogel are some of the distributed frameworks for graph analytics. These are generally used for long-running analytical queries that may take minutes or hours, while short term interactive querying requires low-latency O(seconds) responses. In particular, traversal queries over a property graph is used to retrieve paths or neighborhoods from distributed property graph stores rapidly, for interactive analysis. Horton+, Titan, and GoDB are some of the distributed frameworks for traversal queries. Efficient execution of traversal queries presents a challenge because of the unstructured and irregular access patterns of property graphs. This requires strategies to reduce the number of vertices and edges to visit and evaluated for a predicate speci fied in the query, while also reducing the network overheads for executing across parts of the graph present on different machines in a distributed setup. GoDB proposed a heuristics-based cost model that uses graph statistics to estimate the number of elements to be visited and the network overhead for each plan and selects the optimal plan. However, the large sizes of the in-memory Java objects used to hold the graph entities leads to the use of a large number of distributed machines, which in turn increases the network overhead. We leverage recent work on compressed data structures like Succinct to model the subgraph data in GoDBX to sharply reduce the memory usage and hence network latency across machines, while minimizing the increase in computational latency. We extend GoDB's cost model to operate on both compressed and uncompressed forms of real world property graphs, and o er detailed performance experiments to evaluate the scalability of GoDBX and GoDB, relative to Titan, for graphs with up to billions of edgesen_US
dc.language.isoen_USen_US
dc.relation.ispartofseries;G29301
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertationen_US
dc.subjectgraph analyticsen_US
dc.subjectGoDBXen_US
dc.subjecttraversal queriesen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Computer scienceen_US
dc.titleOptimization of Traversal Queries on Distributed Graph Storesen_US
dc.typeThesisen_US
dc.degree.nameMSc Enggen_US
dc.degree.levelMastersen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US


Files in this item

This item appears in the following Collection(s)

Show simple item record