• Login
    View Item 
    •   etd@IISc
    • Division of Interdisciplinary Research
    • Department of Computational and Data Sciences (CDS)
    • View Item
    •   etd@IISc
    • Division of Interdisciplinary Research
    • Department of Computational and Data Sciences (CDS)
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Optimization of Traversal Queries on Distributed Graph Stores

    View/Open
    Thesis full text (3.141Mb)
    Author
    Sharma, Abhilash
    Metadata
    Show full item record
    Abstract
    In this era of Big Data Analytics, much of the semi-structured data has inherent interconnectivity between representative entities. These are increasingly being modeled as property graphs because of the semantic advantages of the associations/edges formed between entities/vertices and their attributes that helps with their analysis. These graphs scale to billions of edges thereby requiring distributed frameworks. Giraph, GoFFish, Blogel are some of the distributed frameworks for graph analytics. These are generally used for long-running analytical queries that may take minutes or hours, while short term interactive querying requires low-latency O(seconds) responses. In particular, traversal queries over a property graph is used to retrieve paths or neighborhoods from distributed property graph stores rapidly, for interactive analysis. Horton+, Titan, and GoDB are some of the distributed frameworks for traversal queries. Efficient execution of traversal queries presents a challenge because of the unstructured and irregular access patterns of property graphs. This requires strategies to reduce the number of vertices and edges to visit and evaluated for a predicate speci fied in the query, while also reducing the network overheads for executing across parts of the graph present on different machines in a distributed setup. GoDB proposed a heuristics-based cost model that uses graph statistics to estimate the number of elements to be visited and the network overhead for each plan and selects the optimal plan. However, the large sizes of the in-memory Java objects used to hold the graph entities leads to the use of a large number of distributed machines, which in turn increases the network overhead. We leverage recent work on compressed data structures like Succinct to model the subgraph data in GoDBX to sharply reduce the memory usage and hence network latency across machines, while minimizing the increase in computational latency. We extend GoDB's cost model to operate on both compressed and uncompressed forms of real world property graphs, and o er detailed performance experiments to evaluate the scalability of GoDBX and GoDB, relative to Titan, for graphs with up to billions of edges
    URI
    https://etd.iisc.ac.in/handle/2005/5397
    Collections
    • Department of Computational and Data Sciences (CDS) [102]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     

    Browse

    All of etd@IIScCommunities & CollectionsTitlesAuthorsAdvisorsSubjectsBy Thesis Submission DateThis CollectionTitlesAuthorsAdvisorsSubjectsBy Thesis Submission Date

    My Account

    LoginRegister

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV